Redeploying Debian-Administration.org ...
Posted by Steve on Mon 21 Jan 2013 at 12:59
For the past nine years this site has been hosted upon a single dedicated server, graciously donated by my employer Bytemark. Over time it has been upgraded, but despite that it has become apparent that a single server wasn't sufficient, unless it was a huge server - so with that in mind I've recently redeployed this site in a mini-cluster.
On the face of it this site should be fast: it hosts content which is admirably suited for caching, and it doesn't change too often. The entire contents of the MySQL database hosting the site can easily fit in RAM, and yet the performance has been degrading for many months.
I run this site as a hobby, but it was becoming frustrating having to wait for the slowly loading pages, and the all too frequent "Server On Fire" error message.
When it came to scaling there were a few options. I went with what I thought was the simplest, which was to split the site into logical components, each of which would be handled differently:
- The database, which stores the articles, comments, and similar data.
- The application server which drives the site, interfacing between the clients and the database.
- The "Planet" aggregation server, which essentially builds a static HTML file every few minutes, and serves that static file.
The planet-site was the most basic to move. I created two virtual machines, added a floating IP address, and configured them. These machines have a simple cronjob to rebuild the planet, and it is then served by nginx. There is no caching in place because nginx is fast.
The database was similarly straightforward. I installed the MySQL database on two new hosts, and configured them in master-master mode such that either of them could accept writes - which would then be replicated to the other. In the normal course of events only a single one would be used - but when that dies the other is ready to be used.
FACT: All servers die, it is just a question of how often it will happen, and how painful the recovery will be.
Finally the application servers needed to be designed. Here I went overboard. This site was originally created to document the things I was struggling to remember, or to discuss and share knowledge of how I thought things should work. With that in mind I was absolutely happy to use this site itself as another experiment - downtime would be embarrassing, and annoying, but this site is one I know well and has a decent level of traffic which makes it a great playground.
(Despite the near year-long hiatus, search-engine spiders are relentless, and contributed hugely to the site slowdown.)
So rather than taking the cheap way out, using a hardware load-balancer, I came up with an interesting layout for the application servers:
- Create four application-servers, each of which will run Apache to serve the content.
- Have a single floating IP address which any one of four machines can claim.
- On whichever machine holds the floating IP run Pound on :443, to handle the SSL magic and then pass the request on to :80.
- On port 80 of the floating IP we have varnish listening. Varnish is the well-known caching reverse proxy server.
- Varnish can talk directly to all four of the back-end machines, using its built in load-balancing facilities.
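To make the load-balancing step concrete, here is a minimal Python sketch of round-robin selection that skips back-ends which are down. It is only an illustration of the idea: Varnish does this itself with its built-in facilities, and the class and method names below are hypothetical, not anything from the site's configuration.

```python
class RoundRobinDirector:
    """Hand out healthy back-ends in rotation, skipping any marked as down."""

    def __init__(self, backends):
        self.backends = list(backends)
        self.healthy = set(self.backends)
        self._position = 0

    def mark_down(self, backend):
        self.healthy.discard(backend)

    def mark_up(self, backend):
        if backend in self.backends:
            self.healthy.add(backend)

    def pick(self):
        # Try each back-end at most once per call, continuing from the last pick.
        for _ in range(len(self.backends)):
            candidate = self.backends[self._position]
            self._position = (self._position + 1) % len(self.backends)
            if candidate in self.healthy:
                return candidate
        raise RuntimeError("no healthy back-ends available")


director = RoundRobinDirector(["web1", "web2", "web3", "web4"])
director.mark_down("web2")  # pretend web2 has just died
picks = [director.pick() for _ in range(6)]
print(picks)
```

With web2 marked as down the requests rotate over the three remaining hosts, which is the behaviour we want from whichever machine currently holds the floating IP.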
This scheme is a little more complex than just using a load-balancer, but it avoids a single point of failure in a way that a more traditional load-balancer wouldn't. If all traffic had to pass through one dedicated load-balancer on its way to the Apache hosts, then the loss of that load-balancer would cause the site to immediately become unavailable.
On this basis my initial temptation to use three machines for Apache and one for Load-Balancing purposes was ruled out early on.
In the current scheme each of the four hosts can claim the floating IP, and that means that each of them can become the load-balancer. Regardless of which host is the "master" the load-balancer will be able to proxy content between visitors and whichever of the back-end servers are online.
[Diagram: a simplified view of the new setup.]
Of course life is hard so there were some other challenges. In the past the content of the site was dynamic, but featured layers of caching. Because the caching was local-only that had to be removed to avoid getting into a situation where different hosts were serving different content:
- web4 sees a new article - it invalidates its local cache - this means it is now current and up-to-date.
- web1, web2, & web3 don't see the cache invalidation, so they serve stale content.
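The stale-content problem can be reproduced in a few lines. The following is a toy Python simulation, assuming each host keeps a private in-memory cache of the front page; the class and its methods are hypothetical and have nothing to do with the real site code.

```python
class LocalCacheHost:
    """An application server with a private, local-only cache."""

    def __init__(self, name, database):
        self.name = name
        self.database = database  # shared by every host
        self.cache = {}           # private to this host

    def front_page(self):
        if "front_page" not in self.cache:
            # Cache a snapshot of the article list as it is right now.
            self.cache["front_page"] = list(self.database)
        return self.cache["front_page"]

    def publish(self, title):
        self.database.append(title)
        # Only this host's cache is invalidated; the others never hear about it.
        self.cache.pop("front_page", None)


database = ["old article"]
hosts = {name: LocalCacheHost(name, database)
         for name in ("web1", "web2", "web3", "web4")}

for host in hosts.values():
    host.front_page()  # warm every local cache

hosts["web4"].publish("new article")
views = {name: host.front_page() for name, host in hosts.items()}
print(views)
```

After the publish only web4 shows the new article; the other three keep serving their old snapshot until something invalidates it.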
There were a couple of different solutions here. I could have promoted the caching to an external, shared, memcached instance. That would allow all the cache hits, purges, and reads to come from a separate source - but given the changes I had to make I decided I would remove the local caching entirely. If four servers can't keep up I'm doing something wrong!
So, what stops the new site from melting? Two things:
- We have four instances of Apache running, so each one will receive 1/4 of the prior peak-load.
- We cache at the proxy layer.
  - The varnish installation not only works as a load-balancer, it also caches as much content as it can.
The site code-base has been reworked to avoid serving cookies for anonymous visitors - this means that the 90% of our content which is viewed by search-engines, and anonymous viewers, can come from the cache.
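Why cookies matter so much can be shown with a simplified model of the decision a shared cache makes. This Python sketch roughly mirrors the stock behaviour of Varnish (requests carrying cookies, and responses setting them, are not served from the cache), but it is an assumption-laden illustration; the site's real VCL rules aren't shown here and the function name is made up.

```python
def is_cacheable(method, request_headers, response_headers):
    """Return True when a shared cache could safely store and reuse the response."""
    request = {name.lower() for name in request_headers}
    response = {name.lower() for name in response_headers}

    if method not in ("GET", "HEAD"):
        return False  # writes must always reach the application
    if "cookie" in request or "authorization" in request:
        return False  # the page may be personalised for this visitor
    if "set-cookie" in response:
        return False  # the response is specific to one visitor
    return True


anonymous = is_cacheable("GET", {"User-Agent": "spider"}, {"Content-Type": "text/html"})
logged_in = is_cacheable("GET", {"Cookie": "session=abc"}, {"Content-Type": "text/html"})
sets_cookie = is_cacheable("GET", {}, {"Set-Cookie": "session=abc"})
print(anonymous, logged_in, sets_cookie)
```

As long as the application never hands a cookie to an anonymous visitor, every anonymous request falls into the first case and can be answered without touching Apache.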
The final advantage of running Varnish on the shared IP, instead of running four copies behind a load-balancer, is that there is only ever one instance of Varnish running - and it runs on the well-known shared IP address - so when we need to send a flush command we only do it once to a known address. The cache invalidation doesn't need to care about how many back-end hosts there are.
This actually updates our network diagram - the services the floating IP runs are now:
- :80 - Varnish
- :443 - Pound [which routes to varnish, after handling SSL.]
- :6082 - varnish admin [used solely to receive "flush now" requests from the VLAN.]
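For the curious, a "flush now" request to the admin port might look something like the Python sketch below. It assumes the admin port is reachable without authentication and that replies follow the Varnish CLI format as I understand it (a "status length" header line, then the body and a newline). The function names are hypothetical and the command to send is deliberately left as a parameter, because it differs between Varnish versions.

```python
import io
import socket


def read_response(stream):
    """Read one CLI reply: a 'status length' header line, then the body."""
    header = stream.readline().split()
    status, length = int(header[0]), int(header[1])
    body = stream.read(length + 1)  # the body is followed by one newline
    return status, body[:length].decode("utf-8", "replace")


def send_flush(host, port, command, timeout=5.0):
    """Send one admin command; return True if Varnish answered with 200."""
    with socket.create_connection((host, port), timeout=timeout) as conn:
        stream = conn.makefile("rwb")
        status, _banner = read_response(stream)  # greeting sent on connect
        if status == 107:
            raise RuntimeError("admin port wants authentication; not handled here")
        stream.write(command.encode("utf-8") + b"\n")
        stream.flush()
        status, _body = read_response(stream)
        return status == 200


# Offline check of the reply parser against a canned reply (no network needed).
status, body = read_response(io.BytesIO(b"200 2       \nOK\n"))
print(status, body)
```

A call would then be something like send_flush("192.0.2.10", 6082, "ban.url ."), where both the address and the command are placeholders: check the CLI documentation of the installed Varnish version for the right command.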
There were some minor loose ends involved in the migration; in a similar way to the (old) cache invalidation we assumed that when a new article was published we could regenerate the RSS feeds. In the new deployment that caused issues, as the hosts could each have out of sync RSS feeds.
My solution to this problem was to add a cronjob which regenerates the RSS feeds every five minutes; if the new feed differs from the old feed the cache is flushed. This means that on the publication of a new article the cache is potentially flushed four times, once by each application server - but that's a small price to pay.
(There are also rules in place to always cache the RSS feeds, stripping cookies, etc. These rules are pretty site-specific and will almost certainly evolve.)
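The compare-then-flush logic of the feed cronjob can be sketched as follows. This is a minimal Python illustration, not the actual job: the function name, the feed path, and the flush_cache callback are all assumptions standing in for whatever the real script does.

```python
import tempfile
from pathlib import Path


def update_feed(feed_path, new_content, flush_cache):
    """Write the regenerated feed and flush the cache only if it changed."""
    path = Path(feed_path)
    old_content = path.read_bytes() if path.exists() else None
    if old_content == new_content:
        return False  # nothing new, leave the cache alone
    path.write_bytes(new_content)
    flush_cache()
    return True


flushes = []
record_flush = lambda: flushes.append("flush")

with tempfile.TemporaryDirectory() as tmp:
    feed = Path(tmp) / "index.rss"  # hypothetical feed location
    first = update_feed(feed, b"<rss>one</rss>", record_flush)
    second = update_feed(feed, b"<rss>one</rss>", record_flush)
    third = update_feed(feed, b"<rss>two</rss>", record_flush)

print(first, second, third, len(flushes))
```

An unchanged feed costs nothing beyond the regeneration itself, so running the job every five minutes on every host is cheap.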
To keep an eye on the servers I've got one final machine which runs "misc" services - most notably it runs a simple dashboard which is very similar to the one documented in the article "Building a simple dashboard with redis and the node.js server".
The dashboard receives events when different things happen across all the machines in the cluster:
- A host is rebooted.
- RSS feeds are updated.
- The cache is flushed.
- A new article is published.
- A user logs in / logs out / creates a new account.
- A poll vote occurs.
- A new comment is submitted.
The dashboard allows me a near-realtime update on the status of the cluster.
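As a rough idea of what an event might look like, here is a small Python sketch which builds a JSON message describing one event. The field names and event kinds are invented for illustration, and the transport is left as a pluggable callable because how the messages actually reach the dashboard isn't described here.

```python
import json
import socket
import time


def build_event(kind, detail="", host=None, now=None):
    """Serialise one dashboard event as a JSON string."""
    return json.dumps({
        "host": host or socket.gethostname(),
        "kind": kind,      # e.g. "cache-flush", "rss-update", "new-comment"
        "detail": detail,
        "time": int(now if now is not None else time.time()),
    }, sort_keys=True)


def report(kind, detail, publish, **extra):
    """Build an event and hand it to whatever transport 'publish' wraps."""
    publish(build_event(kind, detail, **extra))


received = []  # stands in for the real dashboard
report("cache-flush", "rss feed changed", received.append, host="web4", now=0)
print(received[0])
```

Keeping the message format separate from the transport means the same helper can be called from cronjobs and from the application itself.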
In conclusion, this site was previously hosted upon a single machine with 8GB RAM and two 500GB drives. It struggled, but now it should no longer do so. The cluster comprises nine new hosts:
- 2 x DB servers.
  - 2GB RAM & 50GB disk.
- 2 x Planet servers.
  - 2GB RAM & 20GB disk.
- 4 x Application / Cache hosts.
  - 4GB RAM & 20GB disk.
- 1 x Misc host / status panel.
  - 4GB RAM & 50GB disk.
The new deployment should scale pretty much indefinitely now. If the site is slow then I'll tune the databases. If there is too much load I'll add more application servers. If the planet gets popular I'll add varnish there too.
To ease the administrative burden the setup of the hosts is all automated via my Slaughter tool - and the recipes used to carry out the admin work are documented.
Finally because I've been annoyed too much by the narrow layout, on a site where the content should be king, I've removed the left-panel, and made the right-panel collapsible. That allows more room for the delicious crunchy text.
I hope this was an interesting entry; it is probably the one that has taken the most effort to engineer, plan, and execute in the history of the site.