Nginx large file download problems

On 2013-03-12, in work, by netoearth

Let's take the example of a dedicated server with 1Gbps of bandwidth and three 2TB disks. Keep the first disk dedicated to the OS and tmp. For the other two disks you can create a software RAID (for me, it worked better than the on-board hardware RAID). Otherwise, you need to divide your files equally across the independent disks. The idea is to have both disks share the read/write load equally. Software RAID-0 is the best option.
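For reference, a minimal sketch of creating such a two-disk software RAID-0 with mdadm on Linux (the device names /dev/sdb and /dev/sdc are assumptions; /dev/md0 and /raidmount match the commands used later in this article):

    # Stripe the two data disks into a single RAID-0 array (no redundancy)
    mdadm --create /dev/md0 --level=0 --raid-devices=2 /dev/sdb /dev/sdc

    # Put a filesystem on it and mount it where the large files will live
    mkfs.ext4 /dev/md0
    mkdir -p /raidmount
    mount /dev/md0 /raidmount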

Nginx Conf

There are two ways to achieve a high level of performance using nginx.

  1. use directio

    aio on;
    directio 512;
    output_buffers 1 8m;

    “This option will require you to have a good amount of RAM”: around 12-16GB is needed (see the vhost sketch after this list for where these directives go).

  2. userland io

    output_buffers 1 2m;

    “Make sure you have set readahead to 4-6MB for the software RAID mount”: blockdev --setra 4096 /dev/md0 (or the independent disk mount).

    This setting makes optimal use of the system file cache and requires much less RAM: around 8GB is needed.
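As referenced in the two options above, here is a minimal sketch of how either set of directives might sit inside a vhost serving the downloads (the server_name, location paths and root are placeholders; pick one option, not both):

    server {
        listen 80;
        server_name downloads.example.com;   # placeholder

        # Option 1: direct I/O with AIO (needs the larger output buffers / more RAM)
        location /files-directio/ {
            root /raidmount;
            aio on;
            directio 512;            # reads of files above this size bypass the page cache
            output_buffers 1 8m;
        }

        # Option 2: userland I/O relying on the page cache plus the raised readahead
        location /files-cached/ {
            root /raidmount;
            output_buffers 1 2m;
        }
    }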

Common Notes:

  • keep “sendfile off;”

You may also want to use bandwidth throttling to support hundreds of connections over the available bandwidth. Each download connection will use 4MB of active RAM.

        limit_rate_after 2m;
        limit_rate 100k;

Both of the above solutions will scale easily to 1k+ simultaneous users on a 3-disk server, assuming you have 1Gbps of bandwidth and each connection is throttled at 1Mbps. Additional setup is needed to optimize disk writes without affecting reads much.

Make all uploads go to the main OS disk on a mount, say /tmpuploads. This ensures there is no intermittent disturbance while heavy reads are going on. Then move the file from /tmpuploads using the “dd” command with oflag=direct, something like:

dd if=/tmpuploads/<myfile> of=/raidmount/uploads/<myfile> oflag=direct bs=8196k

Another article:

Nginx on a 1Gbps interface serving large files.



Our story started when we upgraded the bandwidth for our file servers to 1Gbps. We analyzed the RRD charts and found out that bandwidth usage was almost the same as before the upgrade. 1Gbps service is not cheap, but cost was not the problem we encountered. We wanted to provide a better service for customers, so it was a new puzzle for us.

First of all, we made sure that the interface operates in full-duplex mode. Then we checked the ports on the switch. We did some download/upload tests. We talked with the datacenter's support, and so on. In a nutshell, it was a slow and boring process. However, the results of these checks were quite interesting.

We found that after ~900 active connections, the speed of new sessions sticks at a few kBps.

It was surprising. It was a powerful server with an Intel Xeon L5410, 6GB of memory and a disk array of 20 SATA HDDs on a hardware RAID controller. The software stack was chosen and designed to run very demanding workloads: FreeBSD as the OS and Nginx as the web server. FreeBSD is an advanced operating system for modern servers and is known for its high performance with network applications. Nginx is the third most popular web server in the world. It can handle static content (such as images, styles, js) with lightning speed and greatly improve the speed of a site. Now we know that when it comes to serving large files, Nginx's behavior is not always predictable.

We determined that the root of the problem was in Nginx. That conclusion was based on a simple test. When the web server reached 900 active connections, we tried to download a test file. The download speed was ~20kBps. At the same time, we started to download the same file via FTP. The FTP speed was more than 20MB/s.

The reason lies in the architecture of the web server. Nginx is based on an asynchronous model. It uses kernel facilities such as epoll, kqueue, real-time signals, etc. You can find out more here: http://people.freebsd.org/~jlemon/papers/kqueue.pdf. The asynchronous model saves system resources and allows you to handle thousands of network connections very fast and effectively. However, Nginx applies this concept without regard to disk I/O. The disk may be busy with a lot of simultaneous read or write requests (keep in mind its limited random seek speed), and when an Nginx worker tries to read data from that disk, it freezes. This is what happens when Nginx is serving large files: the worker processes are constantly blocked in I/O wait. I must say that there is no silver bullet for this kind of situation. The best result can only be obtained by experimenting with different parameters.

Here are steps that we did to fix the issue:

1) Increasing the number of worker processes is the first thing to begin with. There is a base formula from which to start: the number of worker processes should be equal to the number of CPU cores in the system plus one. That's exactly what we did. The L5410 has 4 cores, each running 4 threads, so we set "worker_processes 17;". The situation became slightly better, but at certain points all 17 workers were working on the same disk, which caused delays, and Nginx response time could grow to half a minute.
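In nginx.conf terms, that rule of thumb is simply:

    # 4 cores x 4 threads + 1, per the formula above; tune for your own CPU
    worker_processes 17;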

2) The next step was to choose a buffer size. Buffering helps to reduce the number of seek operations on the disk and decreases process latency. For us, the best combination was sndbuf=32K with output_buffers 1 512k. These parameters significantly increased server throughput, but we were still far from using the bandwidth at full capacity.
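A minimal sketch of where those two values go (the listen address is a placeholder):

    server {
        listen 203.0.113.10:80 sndbuf=32k;   # per-connection socket send buffer
        output_buffers 1 512k;               # one 512 KB output buffer per request
        # ... rest of the vhost
    }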

3) After that, we concentrated on the sendfile directive. This directive helps to decrease memory usage because it avoids double copying of data between kernel and user space (more about this can be found at http://www.freebsd.org/cgi/man.cgi?query=sendfile&sektion=2). For files larger than about 4MB, it is better to disable this option entirely. We had small files as well, so we used the directio directive to disable sendfile only for big files. Disabling sendfile caused log errors like "upstream prematurely closed connection while reading response header from upstream, client" to appear; the reason for these, again, was locking on the disks. With sendfile enabled there were no such errors, but after disabling it they appeared.
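A sketch of the combination described here: sendfile stays on for small files, while directio takes over for files above the threshold (nginx stops using sendfile for such files). The 4m threshold follows the text; the location path is a placeholder:

    location /files/ {
        sendfile on;            # zero-copy path for the small files
        directio 4m;            # files >= 4 MB are read with O_DIRECT instead of sendfile
        output_buffers 1 512k;
    }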

4) We tried several configuration options in the first three steps, but couldn't get the desired result. In some cases we were losing part of the connections but used the entire bandwidth; in other cases we didn't use the full bandwidth, but everything was stable. To avoid locks on the disk, we enabled the aio kernel module and recompiled Nginx with aio support. Aio showed good results, but it had its limitations: disk I/O increased, as did the load averages on the servers.
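On FreeBSD, this step roughly amounts to loading the aio kernel module and rebuilding Nginx with file-AIO support; a sketch (any other configure options are whatever your existing build uses):

    # Load the aio module now and keep it across reboots
    kldload aio
    echo 'aio_load="YES"' >> /boot/loader.conf

    # Rebuild nginx with AIO support, then set "aio on;" in nginx.conf
    ./configure --with-file-aio
    make && make install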

5) Given all this experience, we decided to dramatically change the entire file-server schema. We decided to run one Nginx instance per disk. The main Nginx instance (we call it the connection manager) only redirects connections to the responsible instance, depending on the path to the file. It uses aio and never freezes. The per-disk instances use direct I/O, so even if one disk is busy it does not affect requests to the others. With the following configuration we reached 1Gbps.

_______________________________________________________________________________________________________

nginx_main config

user www;
worker_processes 16;
worker_rlimit_nofile 600000;
error_log logs/error.log;

events {
    worker_connections 51200;
    use kqueue;
}

http {
    # Global settings
    include mime.types;
    default_type application/octet-stream;
    sendfile off;
    aio on;
    output_buffers 1 512k;
    keepalive_timeout 15;
    send_timeout 30s;
    tcp_nopush off;
    tcp_nodelay on;
    gzip on;
    client_body_temp_path /storage/s1/tmp 1 2;
    client_max_body_size 102400m;
    reset_timedout_connection on;
    server_names_hash_bucket_size 512;

    log_format main '$remote_addr - $remote_user [$time_local] "$request" '
                    '$status $body_bytes_sent "$http_referer" '
                    '"$http_user_agent"';

    # Main virtual host
    server {
        listen ip1:80 sndbuf=32K;
        # Put here your vhost stuff
        # and do not forget to change ip
    }
}
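The main vhost body above is left as a stub; a minimal sketch of the routing the connection manager might do (the /s1/ path prefix is an assumption, and ip2 stands for the address the per-disk instance below listens on):

    # Inside the connection manager's server block: hand requests for files
    # stored on disk s1 over to the nginx instance bound to that disk.
    location /s1/ {
        proxy_pass http://ip2:80;
        # Alternatively, send the client straight to the disk instance:
        # return 302 http://ip2$request_uri;
    }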

__________________________________________________________________________________________________
nginx_hdd1 config

user www;
worker_processes 2;
worker_priority -10;
worker_rlimit_nofile 600000;
error_log logs/error.log;

events {
    worker_connections 51200;
    use kqueue;
}

http {
    # Global settings
    include mime.types;
    default_type application/octet-stream;
    sendfile on;
    aio off;
    output_buffers 1 2m;
    keepalive_timeout 15;
    send_timeout 30s;
    tcp_nopush on;
    tcp_nodelay on;
    client_body_temp_path /storage/s1/tmp 1 2;
    client_max_body_size 102400m;
    reset_timedout_connection on;

    # Logging
    log_format main '$remote_addr - $remote_user [$time_local] "$request" '
                    '$status $body_bytes_sent "$http_referer" '
                    '"$http_user_agent"';

    # Virtual host for s1 disk
    server {
        listen ip2;
        # Put here your vhost stuff
        # and do not forget to change ip
    }
}

 

 