Weblog entry #211 for Steve
Over the past few months this site has become a lot less reliable than I would wish. This unreliability has been caused by two things:
- Kernel issues.
- Site issues.
The kernel the host has been running, until recently, was the stock Lenny AMD64 kernel. This would frequently hang with messages of the form:
task master:26085 blocked for more than 120 seconds.
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
I appear to have solved these issues by upgrading to a locally compiled kernel of a more recent revision.
The second class of problems seems to be self-inflicted. The machine hosting this site is an Athlon64 X2 3800 with 2GB of RAM and 2x 200GB drives. Unfortunately it has recently started suffering from the dreaded OOM-killer.
I intend to spend a few hours over the next few days to reduce the memory used by the server - via a combination of reverse proxying, local caching, and apache/mod_perl tweaks.
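As a rough illustration of why the apache/mod_perl tweaks matter, here is a back-of-the-envelope sketch of how many Apache workers fit in RAM before the OOM-killer becomes likely. The per-worker and reserved figures are assumptions made up for the example, not measurements from this host, and `max_apache_workers` is a hypothetical helper.

```python
def max_apache_workers(total_mb, reserved_mb, per_worker_mb):
    """Return how many Apache workers fit in RAM after reserving
    memory for everything else on the box (database, cache, OS)."""
    if per_worker_mb <= 0:
        raise ValueError("per_worker_mb must be positive")
    available = total_mb - reserved_mb
    # Never report a negative worker count if the reservation exceeds RAM.
    return max(available // per_worker_mb, 0)


# Assumed figures: 512 MB reserved, 50 MB per mod_perl worker.
print(max_apache_workers(2048, 512, 50))  # 30 workers with 2GB
print(max_apache_workers(4096, 512, 50))  # 71 workers with 4GB
```

Whatever the real numbers turn out to be, the point is the same: the worker limit should be derived from measured per-process memory rather than left at a default, and shrinking the per-worker footprint helps as much as adding RAM.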
Additionally I've begged the provider, my employer, to up the memory. So in the next week that will be increased to 4GB.
A combination of code tweaks and increased memory should hopefully restore normal service.
Comments on this Entry
[ Send Message | View Steve's Scratchpad | View Weblogs ]
And hopefully now the use of nginx as a proxy to apache2 will have given a further boost - even with correct IP logging!
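For readers wondering what "correct IP logging" involves: once nginx sits in front, apache2 sees every request as coming from the proxy, so the real client address has to be recovered from a header the proxy adds. The sketch below assumes nginx runs on the same host and sets `X-Real-IP` and/or `X-Forwarded-For` (for example via `proxy_set_header`); `client_ip` and its arguments are hypothetical names, and the site's actual code is not shown here.

```python
def client_ip(remote_addr, headers, trusted_proxies=("127.0.0.1",)):
    """Pick the address to log for a request.

    Forwarding headers are only believed when the connection really
    came from a trusted proxy; otherwise a client could forge them.
    """
    if remote_addr not in trusted_proxies:
        return remote_addr
    real_ip = headers.get("X-Real-IP", "").strip()
    if real_ip:
        return real_ip
    # The proxy appends the address it saw, so the last hop is the
    # only entry that was written by something we trust.
    forwarded = headers.get("X-Forwarded-For", "")
    hops = [h.strip() for h in forwarded.split(",") if h.strip()]
    return hops[-1] if hops else remote_addr


print(client_ip("127.0.0.1", {"X-Real-IP": "192.0.2.7"}))
```

The headers argument is treated as a plain dict here; a real framework would normally offer case-insensitive header lookup.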
[ Parent | Reply to this comment ]
[ Send Message | View Steve's Scratchpad | View Weblogs ]
As documented in "Speeding up dynamic websites via an nginx proxy".
[ Parent | Reply to this comment ]
[ Send Message | View Steve's Scratchpad | View Weblogs ]
Not very much:
- MySQL - for the articles, comments, and similar.
- Memcached for caching.
- Apache2 for CGI handling.
- nginx for front-end & static file serving (+as of tonight)
- exim4 for sending out comment notification emails.
- Monit for service monitoring.
- Munin for monitoring & history.
Half the problem comes about because some of the site code is naive, and the other half from badly behaved spiders which start spidering all links on the site - with blatant disregard for speed limits and broken links.
I was tempted to post a full process list, but that might be dull. Instead:
debian-administration:~# ps -ef | wc -l
66
Under load the apache instances spiral, but that's a conscious choice and mostly a good thing. (Not sure what nginx will do, as that is almost brand new.)
[ Parent | Reply to this comment ]
[ Send Message | View Steve's Scratchpad | View Weblogs ]
Thanks for that report - I've fixed things now.
I previously tested the incoming IP address and if it had ":" in it then I decided you were using IPv6.
Now nginx reports all IPv4 addresses as ::ffff:1.2.3.4 - so I needed to exclude anything with an ::ffff: prefix.
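The fix described above can be sketched as follows. This is an illustration in Python rather than the site's own code; both function names are made up for the example. The first function mirrors the string test as described, the second does the same job with the standard `ipaddress` module, which also copes with malformed input.

```python
import ipaddress


def is_ipv6_naive(addr):
    """String test as described: contains ':' but is not an
    IPv4 address wrapped in the ::ffff: prefix."""
    return ":" in addr and not addr.lower().startswith("::ffff:")


def is_ipv6_visitor(addr):
    """Same decision using the ipaddress module."""
    try:
        ip = ipaddress.ip_address(addr)
    except ValueError:
        return False  # not a valid address at all
    if ip.version == 4:
        return False
    # ipv4_mapped is None unless the address is ::ffff:a.b.c.d
    return ip.ipv4_mapped is None


for addr in ("1.2.3.4", "::ffff:1.2.3.4", "2001:db8::1"):
    print(addr, is_ipv6_naive(addr), is_ipv6_visitor(addr))
```

Only the last of the three sample addresses is reported as a genuine IPv6 visitor by either function.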
[ Parent | Reply to this comment ]
504 Gateway Time-out
But the comment got through. I am accessing this via Iceweasel 3.0.6 from Lenny. I use NoScript and CookieSafe, allowing scripts and cookies from this site, but I don't think that affects this. I think I had the same problem last time.
nginx/0.7.62
[ Parent | Reply to this comment ]