Saturday, August 20, 2016

Linux veth device benchmark with high and low MTU.

Veth devices were created --

ip link add name tstveth mtu 65535 type veth peer name tstveth0 mtu 65535

65535 is the highest MTU these devices support.

The two ends were put in different namespaces (tstveth0 was moved to a new one) --

ip netns add transfertest

ip link set netns transfertest dev tstveth0

IP addresses were added to both ends --

ip netns exec transfertest ip a add fc00::2:2/112 dev tstveth0
ip a add fc00::2:1/112 dev tstveth

sshd was listening on fc00::2:1 with compression disabled.

These commands were run to measure the throughput of ssh and the sys CPU utilization --

ip netns exec transfertest ssh -i /etc/mypc/my.key -p 80 de@fc00::2:1 dd bs=10M if=/dev/zero > /dev/null
ip netns exec transfertest ssh -i /etc/mypc/my.key -p 80 de@fc00::2:1 dd bs=10M count=103 if=/dev/zero > /dev/null

With the 65535 MTU, sys CPU utilization was between 6% and 7%, occasionally 5%. Throughput was around 140 MB/s.

When the MTU was lowered to 1500, sys CPU utilization dropped to between 5% and 6%, occasionally reaching 7%, and the throughput was a little higher.

I put the difference down to chance; overall, the MTU doesn't make a difference.
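
To put the two MTUs in perspective, here is a rough back-of-the-envelope sketch of how many packets per second each would need at the measured throughput. It assumes plain IPv6 + TCP headers without options and ignores segmentation offloads, neither of which was checked in the test, so the numbers are only an upper bound on the per-packet work. The function name is made up for this illustration.

```python
# Rough per-packet arithmetic. Assumptions (not verified in the test above):
# IPv6 + TCP with no header options, and no segmentation offload.
IPV6_HEADER = 40  # bytes, fixed IPv6 header
TCP_HEADER = 20   # bytes, TCP header without options


def packets_per_second(throughput_bytes: float, mtu: int) -> float:
    """Packets needed per second to carry throughput_bytes of payload per second."""
    payload = mtu - IPV6_HEADER - TCP_HEADER
    return throughput_bytes / payload


if __name__ == "__main__":
    throughput = 140 * 10**6  # ~140 MB/s, the figure measured above
    for mtu in (1500, 65535):
        print(mtu, round(packets_per_second(throughput, mtu)))
```

If the packet count really mattered here, the large MTU should have shown a clear advantage; since it did not, the bottleneck is presumably somewhere this sketch does not model.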

Wednesday, April 13, 2016

Enable Electrolysis/e10s/multiprocessing on Firefox 45/ESR

Open "about:config" in the URL bar.

Search for "browser.tabs.remote.autostart" and set it to true, then restart.

That's it!

To check that it's really enabled, open about:support; Multiprocess Windows should be set to true.

Tuesday, March 29, 2016

Apache vs Nginx benchmark/How to make Apache faster than Nginx.

Unlike in other benchmarks, both Apache and Nginx have been tuned for best performance at the task they're doing (serving static content).
The tests were run using Apache's ab utility.

Results --

Direct hits --

Rewrite hits --

Tests done --

There are 2 sets of tests done --
  • Hitting URLs with rewrite rules
  • Hitting URLs with file paths directly.
For each of these, tests were done at concurrency levels from 1000 to 20000.
In the charts, legend entries for tests that hit URLs with file paths directly are suffixed with _direct, while those for the rewrite rules are suffixed with _rewrite.

Configurations --

Nginx --

user nginx nginx;
worker_processes 4;
events {
    multi_accept on;
    worker_connections 11000;
}

error_log /var/log/nginx16/nginx.log;

http {
    server {
        access_log off;
        listen [::]:80 backlog=2 so_keepalive=60:60:0;
        root /home/nginx;
        server_name RHEL6;
        sendfile on;
        reset_timedout_connection on;
        server_tokens off;
        open_file_cache max=20 inactive=99999;
        open_file_cache_min_uses 1;
        open_file_cache_valid 99999;
        open_file_cache_errors on;
        log_not_found off;

        # map the test URLs onto the single-digit file names
        rewrite ^/file1$ /0 last;
        rewrite ^/file2$ /1 last;
        rewrite ^/file3$ /2 last;
        rewrite ^/file4$ /3 last;
        rewrite "^/list([234]){0,1}/(.*)$" /$2 last;

        # deny everything by default, then allow only the test URLs
        location / {
            deny all;
        }
        location ~ ^/[0-9]$ {
            allow all;
        }
        location = /hello.php {
            allow all;
        }
        location ~ ^/file[1234]$ {
            allow all;
        }
        location ~ "^/list([234]){0,1}/[0-9]$" {
            allow all;
        }
    }
}

Apache tuned --

ServerRoot /opt/rh/httpd24/root/usr/lib64/httpd/
LoadModule authn_core_module modules/
LoadModule authz_core_module modules/
LoadModule unixd_module modules/
LoadModule rewrite_module modules/
LoadModule mpm_event_module modules/
#LoadModule php5_module /usr/lib64/httpd/modules/

Listen [::]:80 http
ListenBackLog 2
MaxConnectionsPerChild 0
MaxMemFree 0
ServerLimit 4
StartServers 4
# optimized for concurrency
#MaxRequestWorkers 10000
#ThreadLimit 2500
#ThreadsPerChild 2500
#MaxSpareThreads 10000
#MinSpareThreads 10000

# optimized for benchmark/HTTP header throughput
MaxRequestWorkers 100
ThreadLimit 25
ThreadsPerChild 25
MaxSpareThreads 100
MinSpareThreads 100

DocumentRoot /home/apache
ServerName RHEL6
User apache
Group apache
ErrorLog /var/log/httpd24/apache.log

LogLevel alert
AcceptPathInfo off
ContentDigest off
FileETag Inode Mtime
KeepAlive on
KeepAliveTimeout 60
MaxKeepAliveRequests 0
ServerTokens Full
TimeOut 5
EnableMMAP on
EnableSendfile on
ExtendedStatus off
LimitInternalRecursion 1
MaxRangeOverlaps none
MaxRangeReversals none
MergeTrailers off
Mutex pthread rewrite-map

RewriteEngine on
RewriteRule ^/file1 /0 [END,PT]
RewriteRule ^/file2 /1 [END,PT]
RewriteRule ^/file3 /2 [END,PT]
RewriteRule ^/file4 /3 [END,PT]
RewriteRule ^/list([234]){0,1}/(.*) /$2 [END,PT]

AllowOverride none
Options -FollowSymLinks
Require all denied
#SetHandler php5-script
Require all granted
Require all granted

Test commands --

echo -n list/1 list3/8 file1 list2/5 list2/0 file2 file4 list/7 list4/6 list3/3 | xargs -r -P 0 -n 1 -d ' ' -I {} /bin/bash -c 'ab -c -k -g /home/de//_$$ -s 1 -t 60 -n 9999999 -r http://[fc00::1:2]/{} &> /home/de//_stdout_$$'

echo -n {9..0} | xargs -r -P 0 -n 1 -d ' ' -I {} /bin/bash -c 'ab -c -k -g /home/de//_$$ -s 1 -t 60 -n 9999999 -r http://[fc00::1:2]/{} &> /home/de//_stdoout_$$'

The output of ab, along with the report for each link served, has been uploaded.
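
Since several ab instances run in parallel, the total requests/second is the sum of what each instance reports. The following is a minimal sketch of how the saved stdout could be totalled, assuming the standard "Requests per second:" summary line that ab prints. The function names are made up for this illustration, and how the files are read is left out because it depends on where the output was stored.

```python
import re

# ab prints a summary line of the form:
#   Requests per second:    <number> [#/sec] (mean)
RPS_PATTERN = re.compile(
    r"^Requests per second:\s+([0-9.]+)\s+\[#/sec\]", re.MULTILINE
)


def requests_per_second(ab_output: str):
    """Return the mean requests/second from one ab run, or None if absent."""
    match = RPS_PATTERN.search(ab_output)
    return float(match.group(1)) if match else None


def total_throughput(outputs):
    """Sum requests/second over several ab runs that executed in parallel."""
    values = (requests_per_second(text) for text in outputs)
    return sum(v for v in values if v is not None)
```

Runs whose output lacks the summary line (for example because ab aborted) are skipped rather than counted as zero, so it is worth checking how many files actually matched.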

Test strategy --

Concurrency is the main thing we need to test.
We need to simulate a situation with many low-bandwidth clients, all of which need to be served concurrently so that no connection stalls. So we'll just increase the number of concurrent requests sent to the server, using all of the test machine's bandwidth.
Multiple TCP connections must be established in parallel; each of these connections is reused to send multiple requests (keep-alive is turned on). Since the TCP connections are already established, we're not benchmarking the kernel. Speaking of which, I could not get Nginx's keepalive turned off; that may be the reason why Nginx was so fast in other benchmarks.
Since all webservers use sendfile() for delivering files, it's pointless to make the files large; again, we're not benchmarking the kernel. We're interested in how quickly the server creates HTTP headers.

Concurrency vs throughput (total requests/second).

Serving requests serially, as opposed to concurrently, is more efficient because concurrency adds context switching and management overhead. We can't do anything about context switching, but the management overhead and the efficiency of constructing HTTP headers are what we want to benchmark.
But concurrency matters more than throughput.
If the server is independent, i.e. only its CPU resources are used (the network, disk I/O or a separate server such as a database are not the bottleneck, as in these benchmarks), then increasing the webserver's concurrency reduces the efficiency of the CPU cycles because of the context switching overhead. In these cases it's better to start serving the next request only when the current one has finished.
When there is a bottleneck inside the server, like the network or the disk (we'll use these as the example for this paragraph), and the requests spend quite a lot of time reading from the disk or network, other requests will time out or take too long to respond. The longer the queue, the more likely this is. In these situations, increasing the concurrency lets the server serve multiple requests in parallel, sending progress to each user, albeit more slowly than when serving a single user at a time, but without timing anyone out.
Another kind of bottleneck is towards the end user. A classic example is downloading files, where the client's network or the Internet is the bottleneck. If we serve 1 download at a time, we won't be able to use all our hardware (disk, network etc.) to the fullest, since the user's Internet connection is the bottleneck; to use it to the fullest we have to serve multiple clients. In this kind of situation, resource utilization IS about concurrency, and concurrent efficiency becomes more important as the gap between the server's network speed and the client's network speed grows, because that means the server can serve more clients in parallel.
A similar bottleneck arises when there are multiple (physical) backend servers, each of which can handle, say, N queries in parallel. If there are X such servers, then the webserver must serve N*X requests in parallel to get the maximum utilization of the backend servers.
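
The last two cases come down to simple arithmetic. The sketch below shows it with made-up function names and parameters; the units only need to be consistent between the two bandwidth arguments.

```python
import math


def clients_to_saturate(server_bandwidth: float, client_bandwidth: float) -> int:
    """Concurrent downloads needed before the server's own link becomes the limit."""
    return math.ceil(server_bandwidth / client_bandwidth)


def backend_concurrency(queries_per_backend: int, backends: int) -> int:
    """N * X: requests the webserver must keep in flight to keep every backend busy."""
    return queries_per_backend * backends
```

For instance, with a hypothetical 1000 Mbit/s server link and 8 Mbit/s clients, clients_to_saturate(1000, 8) gives 125 parallel downloads; the wider the gap between the two speeds, the more concurrency the server needs.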

Cheating web servers --

Suppose we have a timeout of x seconds.
If the webserver is not serving the requests concurrently (to reduce context switching and management overhead and so increase throughput), some of the requests will be answered within x seconds, while the others will time out.
A webserver which serves requests concurrently will have all the requests time out once the load exceeds a certain value.
A webserver with poor concurrency is designed for benchmarks and will only perform well in benchmarks.
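
The difference between the two behaviours can be illustrated with a toy model. This is an idealised sketch, not a measurement: it assumes every request needs the same service time, that the concurrent server shares its capacity perfectly evenly, and that context switching costs nothing. All names and numbers are made up for the illustration.

```python
def serial_completion_times(n_requests, service_time):
    """FIFO, one at a time: request i finishes after (i + 1) service times."""
    return [(i + 1) * service_time for i in range(n_requests)]


def concurrent_completion_times(n_requests, service_time):
    """Idealised fair sharing: every request progresses at 1/n speed,
    so all of them finish together after n service times."""
    return [n_requests * service_time] * n_requests


def served_in_time(completion_times, timeout):
    """Count the requests that complete within the timeout."""
    return sum(1 for t in completion_times if t <= timeout)


if __name__ == "__main__":
    n, service, timeout = 10, 1.0, 5.0  # hypothetical load and timeout x
    print("serial:", served_in_time(serial_completion_times(n, service), timeout))
    print("concurrent:", served_in_time(concurrent_completion_times(n, service), timeout))
```

In this model the serial server always gets some requests in under the timeout, which looks good in a benchmark that counts successful requests, while the concurrent server either serves everyone in time or no one.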

Apache vs Nginx in concurrency --

Nginx also appears to serve requests concurrently, but with less concurrency than Apache. With Apache the responses came within 21 ms or less, whereas with Nginx they came in under 400 ms. For Apache, the distribution of the number of requests served against the interval in which they were served was not recorded below 100 ms, so it may be cheating too.
Because the client machine is a limiting factor (it's 10 years old), Apache may actually be a lot faster than Nginx.

Verdict --

Apache is the clear winner.
If you switched to Nginx for the speed, your assumptions were wrong. Apache has a bigger toolkit, is faster and at the same time is security oriented. And in case you're wondering about the larger number of CVEs for Apache, it's because it has a bigger toolkit and is older.
So it's all about what purpose you tweak Apache for.
Personally, I don't understand the purpose of the Nginx project. If they wanted something optimized, they should rather have contributed to Apache, forked the project or created modules (something which Nginx doesn't even support) instead of creating a rival.