Weblog entry #211 for Steve
Over the past few months this site has become a lot less reliable than I would wish. This unreliability has been caused by two things:
- Kernel issues.
- Site issues.
The kernel the host has been running, until recently, was the stock Lenny AMD64 kernel. This would frequently hang with messages of the form:
task master:26085 blocked for more than 120 seconds.
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
I appear to have solved these issues by upgrading to a locally compiled kernel of a more recent revision.
The second class of problems seems to be self-inflicted. The machine hosting this site is an Athlon64 X2 3800 with 2GB of RAM and 2x 200GB drives. Unfortunately it has recently started running out of memory and falling victim to the dreaded OOM-killer.
I intend to spend a few hours over the next few days reducing the memory used by the server - via a combination of reverse proxying, local caching, and Apache/mod_perl tweaks.
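As a rough illustration of why the worker count matters for memory, here is a minimal Python sketch that estimates how many Apache/mod_perl worker processes fit in RAM. The reserved amount (512MB) and the per-worker size (60MB) are hypothetical numbers chosen for the example, not measurements from this host, and the function name is made up for the sketch.

```python
def max_workers(total_mb, reserved_mb, per_worker_mb):
    """Estimate how many worker processes fit in RAM.

    total_mb      -- physical memory in the machine
    reserved_mb   -- memory set aside for the OS, database, mail, etc. (assumed)
    per_worker_mb -- resident size of one Apache/mod_perl worker (assumed)
    """
    if per_worker_mb <= 0:
        raise ValueError("per_worker_mb must be positive")
    usable = total_mb - reserved_mb
    # Never return a negative count if the reservation exceeds total RAM.
    return max(usable // per_worker_mb, 0)


if __name__ == "__main__":
    # Compare the current 2GB machine with the planned 4GB upgrade.
    for total in (2048, 4096):
        workers = max_workers(total, reserved_mb=512, per_worker_mb=60)
        print(f"{total}MB RAM -> at most {workers} workers")
```

A figure like this is only a starting point for a setting such as Apache's MaxClients. Putting a lightweight reverse proxy in front of the heavy mod_perl processes is one common way to get by with fewer of them.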
Additionally, I've begged the provider, my employer, to up the memory, so within the next week it will be increased to 4GB.
A combination of code tweaks and increased memory should hopefully restore normal service.