Speeding up dynamic websites via an nginx proxy

Posted by Steve on Sun 25 Oct 2009 at 13:17

Many of us are familiar with the use of Apache for hosting websites. It might not be the fastest webserver but it is extraordinarily popular, extremely flexible, and a great choice for most people. However there are times when it can struggle, and placing a proxy in front of it can be useful.

nginx is a very small, fast, and efficient HTTP server with a lot of built-in smarts that allow it to work as a reverse proxy. It isn't limited to HTTP either: it also supports SMTP.

I've recently updated this site to change the way it works - as it has been struggling - and this brief introduction is mostly the documentation on the changes I made.

The site itself, like many, is made up of a mixture of static resources and dynamically generated content. In our case our dynamic content is produced by a collection of Perl CGI scripts. The changes described in this brief introduction to using nginx would apply to any site that had a mixture of static & dynamic resources, so could apply equally well to a Ruby on Rails or PHP-based site.

Over the past couple of years I've noticed that Apache 2.x would perform well on an average day, but start to struggle and perform less well when under load. Apart from adding more memory to this host I wanted to change the setup to increase the number of connections and hits it could withstand.

My plan was:

  • Leave Apache's configuration as untouched as possible.
    • In case there were problems I wanted to be able to revert any changes easily.
  • Leave Apache to serve all the dynamic content as it did currently.
  • Place a dedicated smaller, faster, and simpler HTTP server to serve static resources.
    • With the expectation that this would leave Apache to handle the rest of the traffic which it would be able to do without being so distracted.

There are several ways this plan could have been executed, and the two most obvious were:

Shifting Resources Elsewhere

We could split the serving of static resources by moving them. For example rather than serving and hosting http://www.debian-administration.org/images/logo.png we could move that to a different domain, such as http://images.debian-administration.org/logo.png.

We could have created another sub-domain "static." to host any other content, such as CSS and Javascript files.

This would allow us to easily configure a second webserver to handle the static content (perhaps on a different host, but most likely on the same one). The downside to this approach would be the required updates to our site code, templates, and other files.

Introduce A Proxy

To avoid moving resources around, and the overhead this would entail, the simplest solution would be to place a proxy in front of Apache. This would examine the incoming HTTP request and dispatch it to either:

  • Apache if it were a request for /cgi-bin/
  • Another dedicated server for all static resources (e.g. *.gif, *.png)
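As a rough illustration only, the dispatch decision above can be modelled as a small shell function. This is a toy sketch, not how nginx works internally: the name route_request is made up for the example, and sending unmatched requests to Apache is an assumption (it matches the configuration used later in this article).

```shell
# Toy model of the proxy's routing decision (illustration only).
# route_request is a hypothetical helper name, not an nginx feature.
route_request() {
    case "$1" in
        /cgi-bin/*)   echo "apache" ;;  # dynamic content goes to Apache
        *.gif|*.png)  echo "static" ;;  # static resources are served directly
        *)            echo "apache" ;;  # assumption: anything else goes to Apache
    esac
}

route_request /cgi-bin/index.cgi
route_request /images/logo.png
```

The point is simply that the decision is made per request, from the request path alone, before any backend is involved.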

The decision to use nginx was pretty simple. There are a few different proxies out there which are well regarded (including pound, which we've previously introduced for simple load-balancing), but nginx looked like the most likely candidate because it focuses upon being both a fast HTTP server and a proxy.

Because nginx works as both a proxy and an HTTP server, it cuts down the software we must use. Had we chosen a dedicated proxy-only tool we'd have needed to have three servers running:

  • The proxy to receive requests.
  • Apache2 for serving the dynamic content.
  • HTTP server for static content.

With nginx in place we have a simpler setup with only two servers running:

  • nginx to accept requests and immediately serve static content.
  • Apache to receive the dynamic requests that nginx didn't want to handle.

Installing nginx

The installation of nginx was as simple as we'd expect upon a Debian GNU/Linux host:

aptitude install nginx

Once installed, the configuration files are all located beneath the directory /etc/nginx. As with the Debian apache2 packages you're expected to place sites you'd like to be enabled into configuration files beneath a sites-enabled directory.

We'll not dwell on the nginx configuration files too much - the main one is /etc/nginx/nginx.conf and is pretty readable and sensible. The only change we need to make is to remove the file /etc/nginx/sites-enabled/default.

Configuring nginx & apache2

Configuring nginx itself is very simple, and our actual setup will consist of two parts:

  • Configuring nginx to listen upon port 80, and forward some requests to Apache.
  • Changing the Apache configuration so that it no longer listens upon *:80, instead another port will be used.

Our site is made up of two virtual hosts: our main site, and our planet. The latter is by far the simpler one as it has no dynamic component to it, merely static files.

The setup of the static site consists of creating a configuration file for it at /etc/nginx/sites-available/planet.conf with the following content:

#
#  planet.debian-administration.org is 100% static, so nginx can
# serve it all directly.
#
server {
	listen :80;

	server_name  planet.debian-administration.org;

        access_log   /home/www/planet.debian-administration.org/logs/access.log;

	root   /home/www/planet.debian-administration.org/htdocs/;
}

This is sufficient for nginx to serve the virtual host planet.debian-administration.org from the directory /home/www/planet.debian-administration.org/htdocs - and log incoming requests to an appropriate file.
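Since the file above was created beneath sites-available, it needs to appear beneath sites-enabled before nginx will read it. A minimal sketch of one way to do that, assuming the Debian layout described earlier; enable_site is a made-up helper name and NGINX_DIR is only there so the function can be tried against a scratch directory:

```shell
# NGINX_DIR defaults to the Debian layout; override it to experiment safely.
NGINX_DIR="${NGINX_DIR:-/etc/nginx}"

# enable_site is a hypothetical helper, not something the nginx package ships.
# It links a file from sites-available into sites-enabled.
enable_site() {
    ln -sf "$NGINX_DIR/sites-available/$1" "$NGINX_DIR/sites-enabled/$1"
}

# Typical use, as root:
#   enable_site planet.conf
#   nginx -t    # check that the configuration parses before (re)starting
```

Using a symlink keeps a single copy of the file, so disabling the site later is just a matter of removing the link.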

The dynamic handling of our main site is a little more complex. The contents of the /etc/nginx/sites-enabled/d-a.conf configuration file I came up with look like this:

#
#  This configuration file handles our main site - it attempts to
# serve content directly when it is static, and otherwise pass to
# an instance of Apache running upon 127.0.0.1:8080.
#
server {
	listen :80;

	server_name  www.debian-administration.org debian-administration.org;
        access_log  /var/log/nginx/d-a.proxied.log;

        #
        # Serve directly:  /images/ + /css/ + /js/
        #
	location ~ ^/(images|css|js)/ {
		root   /home/www/www.debian-administration.org/htdocs/;
		access_log  /var/log/nginx/d-a.direct.log ;
	}

	#
	# Serve directly: *.js, *.css, *.rdf, *.xml, *.ico, etc.
	#
	location ~* \.(js|css|rdf|xml|ico|txt|gif|jpg|png|jpeg)$ {
		root   /home/www/www.debian-administration.org/htdocs/;
		access_log  /var/log/nginx/d-a.direct.log ;
	}


        #
        # Proxy all remaining content to Apache
        #
        location / {

            proxy_pass         http://127.0.0.1:8080/;
            proxy_redirect     off;

            proxy_set_header   Host             $host;
            proxy_set_header   X-Real-IP        $remote_addr;
            proxy_set_header   X-Forwarded-For  $proxy_add_x_forwarded_for;

            client_max_body_size       10m;
            client_body_buffer_size    128k;

            proxy_connect_timeout      90;
            proxy_send_timeout         90;
            proxy_read_timeout         90;

            proxy_buffer_size          4k;
            proxy_buffers              4 32k;
            proxy_busy_buffers_size    64k;
            proxy_temp_file_write_size 64k;
        }
}

This configuration file has several points of interest, but for full details you'll need to consult the nginx documentation. The most obvious sections of interest are the rules which determine which content is handled directly by nginx.

You'll see that we have two different rules:

  • A rule which says anything beneath /images should be handled directly.
  • Another rule which says regardless of location *.png will always be handled directly.

These rules might seem redundant but it is better to be explicit about our intentions. The rest of the file contains settings for the forwarding of all other requests to the local Apache instance - I made no changes to the sample configuration here.

The other point to note is that I log incoming requests to two files, depending on whether they were proxied to our Apache instance or handled directly. This isn't really required but it gives an idea of which requests are going where.
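With the two logfiles in place, a quick count gives a rough idea of how the traffic is split. The following is only a sketch: log_split is a made-up helper, and it assumes one request per line in each log.

```shell
# log_split is a hypothetical helper: pass it the "direct" log, then the
# "proxied" log, and it reports how many requests each one recorded.
log_split() {
    direct=$(wc -l < "$1" | tr -d ' ')
    proxied=$(wc -l < "$2" | tr -d ' ')
    total=$((direct + proxied))
    if [ "$total" -eq 0 ]; then
        echo "no requests logged"
        return 0
    fi
    # Integer arithmetic only, so the percentage is rounded down.
    echo "direct: $direct, proxied: $proxied, direct share: $((100 * direct / total))%"
}

# Typical use:
#   log_split /var/log/nginx/d-a.direct.log /var/log/nginx/d-a.proxied.log
```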

With these two configuration files in place we're almost done; we just need to ensure that Apache is no longer going to claim port 80 as its own. We do this by modifying /etc/apache2/ports.conf to read:

NameVirtualHost *:8080
Listen 8080

<IfModule mod_ssl.c>
    # SSL name based virtual hosts are not yet supported, therefore no
    # NameVirtualHost statement here
    Listen 443
</IfModule>

This will ensure that Apache binds to port 8080 and not port 80. We then make matching changes to our virtual host files. For example /etc/apache2/sites-enabled/debian-administration.org:

#  Debian Administration domain.
#
<VirtualHost *:8080>
        ServerAdmin webmaster@debian-administration.org
        ServerName www.debian-administration.org
        DirectoryIndex index.cgi index.html

        DocumentRoot /home/www/www.debian-administration.org/htdocs/
        ...
        ...

With these changes in place we can switch to using our proxy:

/etc/init.d/apache2 stop
/etc/init.d/nginx start
/etc/init.d/apache2 start

(We stop apache2 so that port 80 becomes available, then start nginx which will use that port, and finally restart apache2 so that it will be available on port 8080 such that nginx can talk to it.)

Note: In this example Apache is listening on port 8080 on all IPs rather than just 127.0.0.1:8080 - I later changed this.
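To see which addresses are actually bound after the restart, the output of netstat -tln (or ss -ltn) can be filtered by port. A minimal sketch; listeners_on is a made-up helper name, and it relies on the local address being the fourth column, which is the case for both tools:

```shell
# listeners_on is a hypothetical helper: it reads `netstat -tln` (or `ss -ltn`)
# output on stdin and prints the local addresses bound to the given port.
listeners_on() {
    awk -v suffix=":$1" '
        # Compare the tail of column 4 with ":PORT", so that port 80
        # does not accidentally match 8080.
        length($4) >= length(suffix) &&
        substr($4, length($4) - length(suffix) + 1) == suffix { print $4 }
    '
}

# Typical use:
#   netstat -tln | listeners_on 8080
```

An address of 0.0.0.0:8080 means Apache is reachable on every interface, while 127.0.0.1:8080 means only local clients such as nginx can connect to it.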

Problems Experienced

Once deployed there were two problems which were initially apparent:

  • Lack of IPv6 support.
  • Incorrect IP addresses being logged.

Unfortunately the version of nginx available in the Lenny release of Debian did not contain any IPv6 support - which was a real shame as we've been running upon IPv6 for quite some time now. (About 3% of our visitors use native IPv6, including myself, and I didn't want to lose them.)

The solution to the IPv6 problem was to backport the package available in Debian's unstable distribution (a painless process). Once this was done the nginx configuration file could be updated to read:

  # Listen on both IPv6 & IPv4.
  listen [::]:80;

The second problem was that Apache now received every connection from nginx on the local host rather than directly from the outside world. This meant it would believe each incoming request was made from the IP address 127.0.0.1.

Happily there was a very simple solution to this problem: the libapache2-mod-rpaf module for Apache 2.x, which allows the real IP address to be visible to our site and the logfiles.

The RPAF module takes the IP address which initiated the original connection, and which nginx placed in an X-Forwarded-For header, and ensures this IP address is available to our dynamic scripts & apache logfiles.

Applying this solution was as simple as:

aptitude install libapache2-mod-rpaf
a2enmod rpaf
/etc/init.d/apache2 force-reload

Once this was done our incoming connections were logged correctly, and our code would see the real IP address for each connection rather than the loopback address of the proxy host.
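For reference, the header value Apache receives comes from the $proxy_add_x_forwarded_for variable used in the nginx configuration earlier. Its behaviour can be mimicked in a few lines of shell. This is only a model of the documented behaviour, and forwarded_for is a made-up name:

```shell
# forwarded_for mimics what $proxy_add_x_forwarded_for expands to:
#   $1 = the X-Forwarded-For value sent by the client (may be empty)
#   $2 = the address of the client connecting to nginx ($remote_addr)
forwarded_for() {
    if [ -n "$1" ]; then
        echo "$1, $2"   # append to whatever the client already sent
    else
        echo "$2"       # no existing header: just the client address
    fi
}

forwarded_for "" "192.0.2.10"
forwarded_for "198.51.100.7" "192.0.2.10"
```

The addresses above are documentation-only examples. Because a client can send its own X-Forwarded-For header, only the entry added by our own proxy can really be trusted.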

Potential Changes

As you can see from the posted configuration files all incoming requests on port 80 will be either handled directly or proxied - but I made no changes to the handling of port 443, or SSL requests.

We've offered SSL for a significant length of time but few visitors use it, so I elected to leave this setup as-is.

If the situation changes then nginx will be updated to proxy SSL requests too - it has support for this, as a perusal of the documentation will suggest.

So far it is too early to tell if this solution has increased our scalability, but I'm very optimistic. Resource usage has certainly fallen and the combination of nginx and apache is a good one that isn't too complex.

 

 


Posted by stab (89.173.xx.xx) on Sun 25 Oct 2009 at 16:59
what is the performance improvement?
i'm interested in stress test results, and also the normal operation, if available


Posted by Steve (2001:0xx:0xx:0xxx:0xxx:0xxx:xx) on Sun 25 Oct 2009 at 20:50

Sadly it didn't occur to me to take benchmarks before & after - with ab or similar - but I can say page load times have decreased a noticeable amount, and memory usage is significantly lower.

Time will tell if this will prevent the previous OOM problems, but I'm cautiously optimistic.

Steve


Posted by Anonymous (70.95.xx.xx) on Sun 8 Aug 2010 at 19:15
I don't have exact figures but from personal experience it's significant, depending on load levels. Nginx is marginally faster than Apache at static content, but negligibly so. However as load ramps up that difference becomes quite pronounced, especially as Apache spawns increasing numbers of processes to cope with the load which sucks up both memory and CPU cycles. Nginx doesn't need to do that.
Under "extreme" load I've seen mid-range servers running out of bandwidth before processing power, whereas under Apache the server would swap to death.

An alternative to proxying to Apache is to install and configure FastCGI as a series of daemons and proxy requests for dynamic content to them to handle. That also helps get away from a situation where Apache can swap itself to death.


Posted by Anonymous (200.120.xx.xx) on Sun 25 Oct 2009 at 20:18
Great article!


Posted by Anonymous (200.123.xx.xx) on Mon 26 Oct 2009 at 03:48
Interesting article, thanks a lot. It's a nice workaround to apply.

But have you investigated why Apache behaves so badly under heavy load? Is that expected? Couldn't that be improved? Perhaps looking deeper into this is the real solution.


Posted by Steve (2001:0xx:0xx:0xxx:0xxx:0xxx:xx) on Mon 26 Oct 2009 at 20:09

I've never noticed Apache itself leak - the issues I've seen have always been that it has used a lot of memory, and launched lots of children.

I think the problem is mostly that the server and the scripts are memory intensive, not that Apache itself is leaking memory - though of course I could be mistaken.

Steve


Posted by Anonymous (190.231.xx.xx) on Mon 26 Oct 2009 at 20:05
What about HTTP/1.0? I tested this solution using apache+php+joomla in the backend, but had "http-encode chunked" problems... I cannot modify the client's PHP code...


Posted by Steve (2001:0xx:0xx:0xxx:0xxx:0xxx:xx) on Mon 26 Oct 2009 at 20:07

I ruled that out because I would have had to have three pieces of software (Apache + httpd + Proxy) rather than two (Apache + nginx).

Steve


Posted by Anonymous (207.172.xx.xx) on Mon 26 Oct 2009 at 23:50
that rpaf module is quite handy, thanks for pointer!


Posted by Anonymous (194.247.xx.xx) on Tue 27 Oct 2009 at 11:44
How does this compare to deploying squid on top of Apache using "content accelerator" mode?


Posted by simonw (84.45.xx.xx) on Wed 28 Oct 2009 at 02:23
Squid? I'd want to know how Varnish compares.

But both squid and varnish support IPv6. I assume if expiry times are set on cacheable objects Varnish probably wouldn't need a configuration file beyond the listen on 80 request from localhost:8080.

But no benchmarks makes the whole thing pretty surreal.

Come on Steve, out of the box no tweaking, which gives you the most whiz - Apache, Varnish, Squid or NGINX?


Posted by Steve (2001:0xx:0xx:0xxx:0xxx:0xxx:xx) on Wed 28 Oct 2009 at 08:53

I wish I knew!

I suspect Varnish would win, but perhaps this is a good time to set up a new virtual machine and find out for real ..

Steve


Posted by Anonymous (91.98.xx.xx) on Mon 2 Nov 2009 at 15:10
Thanks for your good tutorial :)
Did anyone implement mod_cache for speeding up dynamic pages??
Thanks in advance


Posted by Anonymous (83.241.xx.xx) on Fri 4 Dec 2009 at 10:25
It's possible to cache the dynamic content with nginx, by using proxy_cache_path, proxy_cache, proxy_cache_valid etc.
Supports file based and memcached based caching.

file based example:
proxy_cache_path /some/directory levels=1:2 keys_zone=one:10m;


Posted by Anonymous (83.241.xx.xx) on Fri 26 Feb 2010 at 12:04
I agree, the tutorial should be extended to cover caching the dynamic pages as well!

Problem is that it's not that simple. Some cookies may be left out for the unique key, even some GET params (like a random= which often can be present). My experience tells me Varnish is far superior to do this with because of its endless possibilities of configuration - but I might be wrong ;)

http://3molo.blogspot.com


Posted by Anonymous (81.109.xx.xx) on Thu 19 Nov 2009 at 16:40
If all you use apache for is plain CGI, why not serve this from nginx as well and drop apache?

Can you post a benchmark against apache (static + dynamic content) and against nginx (same content)?


Posted by Anonymous (195.49.xx.xx) on Sun 29 Nov 2009 at 15:51
nginx is much faster. I've tested it; it was about 90-500% faster than apache.


Posted by Anonymous (70.95.xx.xx) on Sun 8 Aug 2010 at 19:17
nginx comes with a caching module (like Apache). Have you considered enabling that and setting it to cache some of your obvious static content like images and css?


Posted by angelabad (85.87.xx.xx) on Mon 20 Sep 2010 at 01:17
Hi Steve, I made a Spanish translation of this article at: http://www.pastelero.net/2010/07/17-acelerando-websites-dinmicos-a-travs-de-un-proxy-nginx/

Thanks for your docs!


Posted by Anonymous (50.9.xx.xx) on Mon 23 Jul 2012 at 23:38
Hi Steve. Thanks for your tutorial. 3 years later, and it continues to be relevant. I wanted to point out that as of today (July 23rd, 2012), the notation you use for the server port in the d-a.conf configuration file "listen :80;" causes an error: "nginx error no host in ":80" of the "listen" directive" when starting up nginx. The correct notation for the current nginx release is "listen 80;"
