Some simple Apache optimisations
Posted by Steve on Mon 18 Jul 2005 at 10:02
Apache is the world's most popular webserver, powering over half the websites on the internet. It is a stable and reliable platform, but sometimes it struggles under a lot of load. Here we'll look at a couple of simple changes to increase performance when handling a lot of traffic.
None of these tips are revolutionary, but combined they have allowed this site to stay up through two slashdottings. If you've not heard the term before, a slashdotting is what happens when a popular website such as Slashdot links to a smaller site: suddenly there are thousands of visitors all coming to your site at once. The sudden and sustained increase in incoming requests can easily overload a server.
Frequently the Slashdot effect will knock a site over, either because there's insufficient bandwidth to handle the incoming connections, or because the webserver isn't set up to handle such a large load. This site has survived two such links, most recently when a single article received 16,000 readers in the space of a couple of hours.
So, what can we do to tune Apache? There are several small and large changes that can be made. Depending upon your server some or all of these may not be appropriate, but they've worked for me both here and on other sites I've set up. (At times like this I feel like pimping out my server handholding and remote maintenance services .. ;)
DNS Lookups
The single biggest source of slowdown in most webservers is the time required to perform DNS lookups.
Typically a webserver will record the full hostname of each incoming client connection in its access.log. Resolving each address can eat a significant chunk of time, even with a DNS cache.
Disabling DNS lookups by ensuring your Apache setup contains "HostnameLookups Off" inside either /etc/apache/httpd.conf, or /etc/apache2/apache2.conf can immediately make your server capable of handling more traffic.
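If you'd like to confirm the setting from a script, here is a minimal sketch. The function name hostname_lookups_off is my own invention; it only inspects the single file you pass it and does not follow any Include directives.

```shell
# Succeeds (exit status 0) if the given Apache config file contains an
# uncommented "HostnameLookups Off" line. Apache directives are
# case-insensitive, so the match is too. Include files are NOT followed.
hostname_lookups_off() {
    grep -Eiq '^[[:space:]]*HostnameLookups[[:space:]]+Off([[:space:]]|$)' "$1"
}
```

For example: hostname_lookups_off /etc/apache2/apache2.conf && echo "DNS lookups are disabled"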
You might be concerned that this will make your server log files less readable, and affect any log file analysis you might wish to perform. Thankfully the Debian Apache package ships with the logresolve tool, which performs the hostname lookups on an existing log file and writes out a resolved copy.
If you use webalizer or Awstats, you can run logresolve to add in the hostnames before the statistics are generated.
I use webalizer to produce my site's statistics and simply instruct it to read its logfile from access.log.resolved instead of the more typical access.log. I produce this file once a day, just before generating the statistics, with the following small script:
#!/bin/sh
# Resolve each site's access.log, then regenerate its webalizer statistics.
cd /home/www/www.site1.com/logs || exit 1
logresolve < access.log > access.log.resolved
/usr/bin/webalizer -q
cd /home/www/www.site2.com/logs || exit 1
logresolve < access.log > access.log.resolved
/usr/bin/webalizer -q
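With more than a couple of sites, the same job can be written as a loop. This is a sketch that assumes every site lives in its own directory under one base directory, as in the /home/www layout above. The function name resolve_all_sites and the LOGRESOLVE and WEBALIZER variables are hypothetical; the variables simply let the commands be swapped out, and default to the tools the script above calls.

```shell
# Hypothetical override hooks; the defaults are the original script's tools.
LOGRESOLVE=${LOGRESOLVE:-logresolve}
WEBALIZER=${WEBALIZER:-/usr/bin/webalizer}

# For every BASE/<site>/logs directory containing an access.log, write a
# resolved copy and then run webalizer from inside that directory.
resolve_all_sites() {
    base=$1
    for dir in "$base"/*/logs; do
        # Also skips the literal, unexpanded glob when no sites exist.
        [ -f "$dir/access.log" ] || continue
        (
            # Subshell, so the cd does not leak into the next iteration.
            cd "$dir" || exit 1
            "$LOGRESOLVE" < access.log > access.log.resolved
            "$WEBALIZER" -q
        )
    done
}
```

For example, calling resolve_all_sites /home/www from a daily cron job would cover every site without editing the script each time one is added.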
MaxClients
When Apache starts up it will create a number of listening processes, each of which will handle a given number of clients then exit.
(This process is complicated somewhat by the different MPM models available in Apache2 - but in general it's a fair statement.)
If you have a lot of incoming clients you can immediately handle more just by raising the relevant limits.
If your server has reached the limit of what it can handle you'll see something like this in your error.log file:
[error] server reached MaxClients setting, consider raising the MaxClients setting
The settings look like this, although if you're using Apache2 you'll discover that your apache2.conf file contains several versions of these settings, one for each of the process models available:
StartServers 5
MinSpareServers 5
MaxSpareServers 10
MaxClients 35
MaxRequestsPerChild 0
The way to adjust these is to raise the numbers gradually, a small amount at a time (note that MaxRequestsPerChild 0 means "unlimited", so that one is not a number to increase). This should allow you to handle more simultaneous clients, at the expense of running more processes. There's a fine balance to be maintained between running enough processes to handle the traffic, and running so many that your server slows down due to increased load.
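Before raising anything, it's worth checking how often the limit is actually being hit. Here is a small sketch that counts the warning shown above; the function name maxclients_hits is my own choosing.

```shell
# Print how many times Apache logged that it hit the MaxClients limit.
maxclients_hits() {
    grep -c 'server reached MaxClients setting' "$1"
    # grep -c exits 1 when the count is 0, which is not an error here;
    # only a status above 1 (e.g. an unreadable file) counts as failure.
    [ $? -le 1 ]
}
```

For example, assuming the usual Debian location: maxclients_hits /var/log/apache2/error.log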
Adjusting these settings appropriately will almost certainly be the single most useful change you can make to your server, but it's hard to give appropriate numbers. It really will depend upon your server, and what else you're running.
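As a starting point, one commonly quoted rule of thumb, offered here as an assumption rather than a guarantee, is to divide the memory you can spare for Apache by the memory each child process uses. A sketch of that arithmetic, with a hypothetical function name and illustrative figures only:

```shell
# Rough MaxClients estimate; all figures are in megabytes and must be
# measured on your own server.
estimate_maxclients() {
    total_mb=$1      # physical RAM in the machine
    reserved_mb=$2   # RAM needed by everything else (database, OS, ...)
    per_child_mb=$3  # typical resident size of one Apache child process
    # Integer division rounds down, erring towards fewer processes.
    echo $(( (total_mb - reserved_mb) / per_child_mb ))
}
```

With 1024MB of RAM, 256MB reserved for everything else and children of around 20MB each, estimate_maxclients 1024 256 20 suggests a MaxClients of about 38. The per-child figure can be estimated from the RSS column that ps reports for the Apache processes.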
KeepAlive
Using KeepAlive is closely related to the MaxClients setting above.
Essentially, KeepAlive keeps each connection open for a short time after a request, so that it can receive a potential follow-up request. If a client wishes to make several requests to your server, it can send them over one connection without having to open multiple distinct connections.
In this scenario KeepAlive is a useful optimisation, but it can mean that you have a lot of connections open uselessly waiting for followup requests which never occur.
A possible solution here is to allow KeepAlive, but only for a few seconds. This means that any client which requests another page quickly will receive it over the same connection, but if it doesn't, the connection is closed, freeing that process to handle another client instead.
To do this use:
#
#  Keep connections alive, but only for two seconds.
#
KeepAlive On
KeepAliveTimeout 2
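To see why a short timeout helps, consider a rough back-of-the-envelope model (an assumption for illustration, not a measurement): if new clients arrive at a steady rate and most never send a follow-up request, each one holds a process idle for roughly KeepAliveTimeout seconds. The function name is hypothetical.

```shell
# Approximate number of processes sitting idle in KeepAlive at any moment:
# arrival rate multiplied by how long each idle connection is held open.
idle_keepalive_slots() {
    rate=$1     # new client connections per second
    timeout=$2  # KeepAliveTimeout, in seconds
    echo $(( rate * timeout ))
}
```

At 10 new clients a second, a 15 second timeout keeps around 150 processes waiting, far more than the MaxClients 35 shown earlier, whereas a 2 second timeout needs only about 20.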
Deny Overrides
Another common source of slowdown in Apache is the use of .htaccess files to change Apache's behaviour.
Many settings can be altered on a per-directory basis using these files, but looking for them and reading them will cause the server to slow down, and do more work than it really needs to.
For example, consider a request for a static .html page stored in prefix/some/long/path/.
This file should be something that Apache can serve quickly; there's nothing (obviously) dynamic about it. But if you allow the use of .htaccess override files then Apache must scan for and process:
- prefix/.htaccess
- prefix/some/.htaccess
- prefix/some/long/.htaccess
- prefix/some/long/path/.htaccess
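The number of lookups grows with the depth of the path. The following sketch, using a hypothetical helper name and the prefix/some/long/path example above, prints the files Apache would have to look for. It is a simplified model that starts at the given prefix, whereas Apache itself checks every directory on the path for which overrides are allowed.

```shell
# Print each .htaccess path checked when serving a file in PREFIX/REL.
# REL is the file's directory relative to PREFIX, with no leading slash.
htaccess_candidates() {
    prefix=$1
    rel=$2
    path=$prefix
    echo "$path/.htaccess"
    # Split REL on "/" and descend one directory at a time.
    old_ifs=$IFS
    IFS=/
    for part in $rel; do
        path=$path/$part
        echo "$path/.htaccess"
    done
    IFS=$old_ifs
}
```

Running htaccess_candidates prefix some/long/path prints exactly the four paths listed above.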
Setting "AllowOverride None" inside any virtual hosts or directory directives you might have will disable this searching and reduce the amount of file testing and reading your server will need.
Of course you will often discover that some directories need specific settings. The solution here is to move those settings out of .htaccess files and into the main Apache configuration directly, for example inside a matching <Directory> section.
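Before turning overrides off, you'll want to know which .htaccess files exist so their settings can be moved. A minimal sketch, where the function name is my own and the document root is whatever you pass in:

```shell
# List every .htaccess file below the given directory, one per line.
find_htaccess() {
    find "$1" -type f -name .htaccess -print
}
```

For example: find_htaccess /var/www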
Compress Content
Compress your content with mod_deflate, or mod_gzip, if you can.
Whilst compression carries some CPU overhead, when serving a lot of mostly static content network saturation is usually a bigger problem than CPU load.
If you do run into CPU load issues, the compression is easy to disable again.
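To judge whether compression is worth it for your content, you can compress a representative file locally and compare sizes. This is a sketch: gzip uses the same deflate algorithm as mod_deflate, but the byte counts Apache actually sends will differ somewhat, and the function name is hypothetical.

```shell
# Print the gzip-compressed size of FILE as a whole-number percentage
# of its original size. Fails (status 1) for an empty file.
compressed_percent() {
    # tr strips the padding some wc implementations add.
    original=$(wc -c < "$1" | tr -d ' ')
    [ "$original" -gt 0 ] || return 1
    compressed=$(gzip -c "$1" | wc -c | tr -d ' ')
    echo $(( compressed * 100 / original ))
}
```

A result of 25, for instance, means the compressed copy is a quarter of the original size.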
Remove Debugging Logs
Many Apache modules such as mod_rewrite (used for making prettier URLs) or mod_security (a simple security module) allow you to set up log files useful for debugging problems.
If you're happy that your setup is working correctly then you no longer need these debugging logs, so entries such as the following should be removed:
RewriteLog /tmp/rewrite.log
SecFilterDebugLog /var/log/apache2/modsec_debug_log
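To make sure no debugging log directives remain active, a quick grep over your configuration files helps. A sketch, with a hypothetical function name, that reports uncommented RewriteLog and SecFilterDebugLog lines along with their file name and line number:

```shell
# Print file:line:text for each active RewriteLog or SecFilterDebugLog
# directive in the given config files. Commented-out lines are ignored,
# and RewriteLogLevel is not matched because whitespace must follow.
find_debug_logs() {
    grep -EiHn '^[[:space:]]*(RewriteLog|SecFilterDebugLog)[[:space:]]' "$@"
    # grep exits 1 when nothing matches, which is the outcome we want.
    [ $? -le 1 ]
}
```

For example: find_debug_logs /etc/apache2/apache2.conf /etc/apache2/sites-enabled/*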
Hopefully these small tips will help you set up your server to handle more load, and perform more efficiently if you get slashdotted.
If you routinely suffer from heavy load these tips might not be enough; instead you might need to consider:
- Having multiple webservers, each sharing a common back end if you're running a database-driven site.
- Installing a web cache in front of your server to avoid the overhead of generating a lot of identical content to visitors.
Both of these solutions will ease the load on your servers, but they are overkill for smaller sites.
If you have any tips of your own to share feel free to leave them in the comments!
- AWStats (and AFAIK webalizer too) can read logfiles with unresolved IP addresses, and do the resolve in the analyzing process - last time I used logresolve was many years ago :)
- the example url has a typo at the end, hmtl instead of html, not that it would matter too much :)
Also one more thing to optimize, I've often found that Server Side Includes parsing is on for any .html file with the setting
AddHandler server-parsed .shtml .html
Although this is convenient for the webmaster, it can also cost some performance, since Apache then has to parse every static .html file for SSI commands. It's better to stick with .shtml or, if a large existing site on a shared box relies on it, to put the AddHandler line in that site's VirtualHost section only.
--
root.log
Thanks for the comment, I've fixed the typo you spotted.
You're right that SSI can also cause a slowdown, but I was under the impression that these were disabled by default - so I'd assume that if they were enabled for a reason the slowdown from them would be anticipated.
You're correct to point out that most of the logfile analysis packages will manage their own DNS lookups. I know that Awstats in particular has a few different options for that, and will cache hostname+IP addresses to avoid having to lookup previously resolved names.
Steve
-- Steve.org.uk
Would that involve using "Content Negotiation"?
I'd guess that the overhead of searching for matching documents might be almost enough to outweigh the benefit .. although without full details it's hard to test/know.
Steve
Is there any point in doing so?
Steve
Ahhh but here it's different, here I'm preventing people from seeing exactly which IP address is linked to each named account.
(Although that's a bit misguided in the case of anonymous users anyway).
In the server statistics that information isn't present - so the inclusion of "real" IP addresses doesn't involve any information leakage.
(And of course site administrators here see the full addresses...)
Steve
Advantages: memory-speed lookups for any file
Disadvantages: uses memory
The only time I have found RAM disks useful to boost performance (and by RAM disk I do of course mean a tmpfs mount, NOT old-school RAM disks) is when you need to write some temp files.
Read-only access is already fast.
Agreed.
That said, there is an experimental Apache module, mod_mmap_static, which caches specific files. It could be used if you were sure you wanted to cache a particular document, while avoiding the overhead of a RAM disk.
Steve
But remember that if you mmap too much data you could run out of address space for your temporary variables. Also, you need a file handle to be able to mmap a file...
Thanks in advance. My e-mail is activty@sciarada.net
You'd be better off asking elsewhere, like maybe the debian-user mailing list, or one of the relevant newsgroups.
Steve