Every serious developer has heard of the C10k problem: the networking challenge of handling 10,000 simultaneous connections (roughly, 10,000 concurrent users) on a single server. While accomplishing this feat is an incredible technical achievement and a cost-effective solution for web sites of a certain scale, it requires many infrastructural changes. For some people, solving the C10k problem might involve switching their web server; for others, it might involve using a different programming language or paradigm.
In this blog post, I will not claim to solve the C10k problem. Instead, I will show you how to support 2,000 concurrent connections on a static site by tuning Linux (Ubuntu 14.04) and Apache (2.4.7). Armed with a $20 a month DigitalOcean droplet, I was able to maintain 2,000 concurrent connections and 1,900 hits per second on this blog. Sustained over an entire day, this rate translates to roughly 164,000,000 total requests, which is more than enough to deal with seasonal load or a popular article on Hacker News.
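The daily figure follows directly from the sustained rate. A quick check with shell arithmetic, assuming the 1,900 hits per second held for a full 86,400-second day (the variable names are only illustrative):

```shell
# Sustained hits per second measured in the benchmark, times seconds in a day.
hits_per_second=1900
seconds_per_day=$((24 * 60 * 60))
requests_per_day=$((hits_per_second * seconds_per_day))
echo "$requests_per_day"   # prints 164160000
```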
Open File Limits
In the UNIX abstraction of a computer, all input and output operations are performed on files. This design decision was made to present programmers with a unified I/O interface, leaving all of the hard work to the operating system. As a result, each new network connection we want to accommodate corresponds to an open file, and the operating system maintains a limit, configurable per user, on the number of files a process can have “open” at any given time. You can check this setting on the command line by running ulimit -n (as root).
If you’re running a stock Ubuntu machine, the output of this will likely be 1,024. You can then check to see how many file descriptors you already have open by running:
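One way to get such a count, assuming lsof is installed, is to count its output lines; on Linux the kernel also exposes a system-wide tally in /proc/sys/fs/file-nr. This is a sketch of two common approaches, not necessarily the command behind the number quoted below:

```shell
# Rough count of open files as seen by lsof. This includes memory-mapped
# files and working directories, so it overstates descriptor usage.
if command -v lsof >/dev/null 2>&1; then
  lsof 2>/dev/null | wc -l
fi

# Kernel-wide view on Linux: allocated handles, unused handles, system maximum.
if [ -r /proc/sys/fs/file-nr ]; then
  read -r allocated unused maximum < /proc/sys/fs/file-nr
  echo "allocated=$allocated maximum=$maximum"
fi
```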
On a stock Ubuntu 14.04 machine, this last command outputs 568, limiting me to 456 additional file descriptors. If I want to support 2,000 simultaneous HTTP connections, I’m more than 1,500 short!
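Spelled out with shell arithmetic, using the stock limit of 1,024, the 568 descriptors already counted, and this post's target of 2,000 connections (one descriptor per connection is the simplifying assumption):

```shell
limit=1024      # stock open files limit
in_use=568      # descriptors already open
target=2000     # simultaneous connections we want to serve

headroom=$((limit - in_use))       # descriptors still available
shortfall=$((target - headroom))   # how many more we would need
echo "headroom=$headroom shortfall=$shortfall"
```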
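To change this setting, we want to modify the root user’s open files limit (Apache’s parent process starts as root, and its worker processes inherit the limit). To do so, add the following lines to /etc/security/limits.conf:
root soft nofile 100000
root hard nofile 100000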
The first setting is considered the “soft”, or initial, value. A running process can adjust its own open files limit up to the “hard” limit, which corresponds to the second setting. We’ll set these to 100,000 to allow the root user to open up to 100,000 files at a time if it really wants to, which should be more than enough for handling 2,000 concurrent connections.
In order for these settings to actually be enforced, we’ll have to add a new module to the system’s PAM (Pluggable Authentication Modules) configuration. In /etc/pam.d/common-session, add this line:
session required pam_limits.so
This module will ensure that the root user’s open file limits are set to the settings in /etc/security/limits.conf upon login.
After restarting Ubuntu, ulimit -n should now output 100,000.
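It can also be worth confirming what the kernel is actually enforcing for a given process, since a shell's limit and a daemon's limit are not guaranteed to match. A minimal sketch, assuming a Linux /proc filesystem; the helper name show_nofile_limit is hypothetical:

```shell
# Print the "Max open files" row (soft and hard limit) for a given PID.
# Relies on the Linux /proc filesystem; prints nothing if it is unavailable.
show_nofile_limit() {
  pid="$1"
  if [ -r "/proc/$pid/limits" ]; then
    grep 'Max open files' "/proc/$pid/limits"
  fi
}

# Limits of the current shell, as reported by the shell itself.
echo "soft: $(ulimit -S -n)  hard: $(ulimit -H -n)"

# Limits the kernel is enforcing for this shell process.
show_nofile_limit "$$"
```

To inspect the server rather than the shell, pass the PID of a running Apache process in place of $$.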
In order to have Apache scale quickly to the number of connections, we’ll be using its MPM Worker module. This module turns Apache into both a multi-process and multi-threaded server. For each incoming request, Apache will use a free thread on one of its running processes. When all threads have been exhausted, Apache will create a new process. This type of system uses fewer resources than a process-per-request system because threads are cheaper to create than processes and do not carry the memory overhead of a new process.
To enable this module, run sudo a2enmod mpm_worker.
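Apache loads only one MPM at a time, so it helps to see which one is currently enabled first. A minimal sketch, assuming Ubuntu's layout in which each enabled module has a .load entry in /etc/apache2/mods-enabled; the helper name enabled_mpms is hypothetical:

```shell
# List the MPM modules currently enabled by looking for mpm_*.load
# entries in Apache's mods-enabled directory (Ubuntu layout assumed).
enabled_mpms() {
  dir="${1:-/etc/apache2/mods-enabled}"
  for f in "$dir"/mpm_*.load; do
    # Skip the unexpanded pattern when nothing matches.
    [ -e "$f" ] || continue
    basename "$f" .load
  done
}

enabled_mpms
```

If this prints mpm_event or mpm_prefork, that module would need to be disabled with sudo a2dismod before mpm_worker can be enabled.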
Now, you’ll want to modify /etc/apache2/mods-enabled/mpm_worker.conf so that the worker limits allow up to 2,048 simultaneous connections.
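A sketch of one worker configuration that reaches 2,048 connections, as 32 processes of 64 threads each. Every value here is an assumption chosen for illustration, not necessarily the settings behind the benchmark below:

```apache
<IfModule mpm_worker_module>
    # Upper bounds: ServerLimit x ThreadsPerChild caps MaxRequestWorkers.
    ServerLimit              32
    ThreadLimit              64

    # Processes started at launch and the idle-thread window (assumed values).
    StartServers              2
    MinSpareThreads          25
    MaxSpareThreads          75

    # Threads per process; must not exceed ThreadLimit.
    ThreadsPerChild          64

    # 32 processes x 64 threads = 2,048 simultaneous connections.
    MaxRequestWorkers      2048

    # 0 means a process is never recycled based on connection count.
    MaxConnectionsPerChild    0
</IfModule>
```

Running sudo apache2ctl configtest after editing checks the syntax before any restart.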
Afterward, restart Apache with sudo service apache2 restart. Theoretically, we now have Apache configured to serve up to 2,048 simultaneous connections.
Benchmarking with Blitz.io
Blitz.io is a wonderful site for benchmarking web servers because they provide real internet load from many, possibly international, client machines and offer a free tier. Using their product, we can very quickly generate 2,000 simultaneous connections for a web site.
After a test run with 2,000 simultaneous connections originating from California, I was shown these results:
The test started with a single connection and progressed to 2,000 over the course of 2 minutes. Overall, there were 0 timeouts and 0 errors. On an unconfigured machine receiving 1,000 simultaneous Blitz.io connections, I received an error rate of 13.0%, and 6% of requests timed out. Configuration makes a big difference.
This post should leave you with the right tools to achieve similar scalability feats. Although I didn’t achieve 10,000 simultaneous connections with Apache, I was able to sustain a more realistic 2,000 connections, which is probably more than enough for Hacker News.