Weblog entry #279 for simonw

Fun with Reverse Proxies
Posted by simonw on Tue 23 Sep 2008 at 01:08

To resolve a bandwidth issue over a link, I thought I'd set up a caching reverse proxy. How hard can it be?

We already had Squid doing the same job on another server. Just before setting up the new server I upgraded the existing Squid box to Etch, and discovered (in pre-upgrade testing) that the entire syntax for reverse proxies changed between Squid 2.5 and 2.6 (see the Squid Reverse Proxy Wiki Page).

The config file also seemed somewhat arcane. You set up a list of cache_peer entries where the web server you want to reverse proxy is at the top of the tree of peers, and then ensure that content is sent to the right cache_peer by adding cache_peer_access or cache_peer_domain records. There are lots of confusing options available, and little advice on selecting between them (you do know the innards of HTTP back to front, right?).

The performance of Squid as a reverse proxy seemed odd, as it repeatedly requested large files I would have expected it to get from its cache. This resulted in a significantly worse byte hit rate than I was hoping for.

Disappointed with Squid, but at least with a working solution in hand, I tried "varnish" 2.0beta1, which was written from scratch to be a reverse proxy, whereas Squid evolved to be every proxy you ever wanted. Since this was some old version of CentOS I had to build from source, which felt like a return to the early 1990s.

The architecture of varnish is interesting: it creates one huge memory mapped file, sticks the whole content of the cache in that file (in various careful ways), and lets the kernel do the memory management. Even in the limited testing I did, varnish was noticeably faster at serving large files, since it would cache them in RAM, whereas Squid deliberately tries not to cache large files in RAM (it has a parameter to try to avoid it, which I'd tuned upwards!). Obviously one can tune Squid to cache larger files in memory, but that just adds to the administrator's work, and seems just the kind of decision that is best left to a computer rather than a system administrator! (See the Varnish Architect Notes.)
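
To make the memory mapped file idea concrete, here is a minimal Python sketch. It is a toy illustration only and not how varnish actually lays out its storage: the class name MmapCache, the bump allocator and the lack of any eviction are all assumptions made for the example. The point it shows is that the cache just writes into one big mapping and leaves it to the kernel to decide which pages stay in RAM.

```python
import mmap
import os
import tempfile


class MmapCache:
    """Toy object cache whose bodies live in one memory-mapped file.

    Hypothetical illustration; not varnish's real storage layout.
    """

    def __init__(self, path, size):
        self.size = size
        # Pre-size the backing file, then map the whole thing into memory.
        self._fd = os.open(path, os.O_RDWR | os.O_CREAT)
        os.ftruncate(self._fd, size)
        self._map = mmap.mmap(self._fd, size)
        self._index = {}      # key -> (offset, length)
        self._next_free = 0   # naive bump allocator, no eviction

    def put(self, key, body):
        """Store body; return False when the file has no room left."""
        if self._next_free + len(body) > self.size:
            return False
        start = self._next_free
        # Writing into the mapping is an ordinary memory write; the kernel
        # decides when the dirty pages reach the disk.
        self._map[start:start + len(body)] = body
        self._index[key] = (start, len(body))
        self._next_free += len(body)
        return True

    def get(self, key):
        """Return the cached bytes, or None on a miss."""
        entry = self._index.get(key)
        if entry is None:
            return None
        start, length = entry
        return bytes(self._map[start:start + length])

    def close(self):
        self._map.close()
        os.close(self._fd)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        cache = MmapCache(os.path.join(tmp, "storage.bin"), 1024 * 1024)
        cache.put("/video.wmv", b"x" * 4096)
        print(len(cache.get("/video.wmv")))
        cache.close()
```

Nothing in the sketch decides what stays in RAM, which is the part of the design that appealed to me: that decision is the kernel's, not the administrator's.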

Basically varnish "just worked" (initially).

I didn't need any "VCL" (read: configuration file); the default behaviour of varnish 2.0beta1 was fine. All I did was tell it to listen on "1.2.3.4:80" and use "5.6.7.8:80" as the backend, with a 2GB file in /var/lib/varnish as the cache. This was all done with $DAEMON_OPTS, which, unwound, boils down to something like:

ulimit -n 131072
/usr/local/sbin/varnishd -a 1.2.3.4:80 \
             -T localhost:6082 \
             -b 5.6.7.8:80 \
             -u varnish -g varnish \
             -n instance1 \
             -s file,/var/lib/varnish/instance1/varnish_storage.bin,2G

I packed up and went home happy, where I could now download video and MP3s much faster. I was a bit annoyed that, because the machine was 32 bit, I had only 2GB of cache. But odd things were happening: it seems the virtualisation software (OpenVZ) hosting the proxy doesn't play nicely with memory mapped files (and I had 2 x 2GB of them; I did mention I had two servers I needed to proxy, right!?). It thus failed to free up memory in a timely fashion when the OS needed it for minor tasks like letting root log in and monitor performance. What testing I did suggested that varnish, whilst good for cached content, wasn't any better at avoiding re-requesting those big files from the backend servers.

At this point I abandoned varnish and switched back to Squid. A quick query to the Squid mailing list got me a bit of configuration I hadn't seen before, which simplified my set-up.

Basically I have two IP addresses on the proxy server (1.2.3.4 and 1.2.3.5), and two backend servers (backend1 and backend2) with two different sets of domains hosted on them. So one set of domains is pointed at 1.2.3.4 instead of backend1, and the other set is pointed at 1.2.3.5 instead of backend2. One could do this with two Squid instances, but with a little fiddling with Squid ACLs one instance is sufficient.

The relevant bit of squid.conf:

# One ACL per listening address on the proxy
acl firstproxy myip 1.2.3.4/32
acl secondproxy myip 1.2.3.5/32

# Requests arriving on 1.2.3.4 go to backend1 only
cache_peer  backend1 parent 80 7 no-query originserver no-digest
cache_peer_access backend1 allow firstproxy
cache_peer_access backend1 deny secondproxy

# Requests arriving on 1.2.3.5 go to backend2 only
cache_peer  backend2 parent 80 7 no-query originserver no-digest
cache_peer_access backend2 allow secondproxy
cache_peer_access backend2 deny firstproxy
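
The routing this configuration is meant to produce can be modelled in a few lines of Python. This is only a sketch of the intent, not of Squid's internals: the function names are made up for the example, each ACL is reduced to a single address, and a request matching no rule is simply refused, which may differ from how Squid itself treats an unmatched request.

```python
# Toy model of the two-backend routing: each peer has an ordered list of
# (action, acl_name) rules, and the first matching rule decides.
ACLS = {
    "firstproxy": "1.2.3.4",
    "secondproxy": "1.2.3.5",
}

PEER_RULES = {
    "backend1": [("allow", "firstproxy"), ("deny", "secondproxy")],
    "backend2": [("allow", "secondproxy"), ("deny", "firstproxy")],
}


def peer_allowed(peer, local_ip):
    """Return True if a request received on local_ip may use this peer."""
    for action, acl in PEER_RULES[peer]:
        if ACLS[acl] == local_ip:
            return action == "allow"
    # Assumption of this sketch: a request matching no rule is refused.
    return False


def select_peer(local_ip):
    """Return the first peer allowed for the address the request arrived on."""
    for peer in PEER_RULES:
        if peer_allowed(peer, local_ip):
            return peer
    return None


if __name__ == "__main__":
    for ip in ("1.2.3.4", "1.2.3.5"):
        print(ip, "->", select_peer(ip))
```

Traffic arriving on 1.2.3.4 only ever reaches backend1 and traffic arriving on 1.2.3.5 only ever reaches backend2, which is all the two ACLs need to achieve.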

Conclusions

Both Squid and Varnish can be configured as HTTP accelerators (caching reverse proxies). Most of the other software I found (Pound, Nginx) acted as non-caching proxies.

Varnish looks to be the product of choice if you can make it work, particularly if you need to get into the nitty gritty of how a request is handled ("VCL" provides the ability to rewrite the proxying process in great detail). I would think Varnish would also be better as a simple install-and-forget proxy in front of a slow web server product like Zope. Varnish was easier to configure and better documented for the task at hand. Squid documentation often mixed syntax from before and after the change for reverse proxies.

Varnish's demands on the OS may defeat overly simplistic virtualisation technologies. You want to be using varnish in places where you control the choice of OS and hardware, so you can pick a 64-bit OS and lots of RAM, and avoid undue complexity in the set-up. The varnish folk claim it should scale better on multiple CPUs, but I didn't find any benchmarks.

Both products are free software and are packaged in Debian (although the Debian versions are somewhat old), and I found excellent advice and helpful people in the relevant support forums of both. The varnish IRC folk were especially helpful, as was Amos on the Squid mailing list. Both products need better documentation; I made some tentative efforts to add more to the Squid wiki.

Neither proxy did as well on the byte hit rate as I'd expected. Clearly something is varying between the HTTP requests for large objects (mostly WMV files) that forces them to be fetched multiple times from the backend servers to the proxy, above and beyond the usual "Vary" header. Tips on how to diagnose such issues would be appreciated. Do I have to read more tcpdump output?
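
One starting point that avoids tcpdump is the proxy's own access log. The Python sketch below assumes Squid's native access.log format (timestamp, elapsed time, client, result code/status, bytes, method, URL, ...), treats any result code containing "HIT" as served from cache, and uses invented sample lines; the function names are made up for the example. It computes a byte hit rate and lists the URLs that were fetched from the backend more than once.

```python
from collections import Counter


def parse_line(line):
    """Return (result_code, size_in_bytes, url), or None if malformed."""
    fields = line.split()
    if len(fields) < 7:
        return None
    result_code = fields[3].split("/")[0]   # e.g. TCP_MISS from TCP_MISS/200
    try:
        size = int(fields[4])
    except ValueError:
        return None
    return result_code, size, fields[6]


def summarise(lines):
    """Return (byte hit rate, {url: miss count}) for URLs missed repeatedly."""
    total_bytes = 0
    hit_bytes = 0
    misses = Counter()
    for line in lines:
        parsed = parse_line(line)
        if parsed is None:
            continue
        result_code, size, url = parsed
        total_bytes += size
        # Assumption: any result code containing "HIT" came from the cache.
        if "HIT" in result_code:
            hit_bytes += size
        else:
            misses[url] += 1
    rate = hit_bytes / total_bytes if total_bytes else 0.0
    repeated = {url: n for url, n in misses.items() if n > 1}
    return rate, repeated


# Invented sample lines in the assumed log format, for illustration only.
SAMPLE = """\
1222128480.123 900 10.0.0.1 TCP_MISS/200 1000 GET http://example.com/a.wmv - FIRST_UP_PARENT/backend1 video/x-ms-wmv
1222128490.456 12 10.0.0.2 TCP_HIT/200 1000 GET http://example.com/b.mp3 - NONE/- audio/mpeg
1222128500.789 870 10.0.0.3 TCP_MISS/200 2000 GET http://example.com/a.wmv - FIRST_UP_PARENT/backend1 video/x-ms-wmv
""".splitlines()

if __name__ == "__main__":
    rate, repeated = summarise(SAMPLE)
    print("byte hit rate:", rate)
    print("repeatedly missed:", repeated)
```

The URLs in the second result are the ones worth chasing: comparing the request headers of two misses for the same URL should show what is varying between them.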

There are a lot of broken multimedia players out there.

I'm grateful I don't have to proxy SSL just yet!

 

Comments on this Entry

Posted by diveli (150.101.xx.xx) on Tue 23 Sep 2008 at 03:18
Cool article!

We just set up reverse proxies too for a rather big client/campaign, and used PowerDNS with geo-IP functionality to direct users from specific regions to specific international reverse proxies. All squid3.


We did the SSL by terminating SSL at the proxy with the relevant certs etc, and a secure VPN from the proxy to the application servers behind. Seems to 'just work.'

Posted by simonw (84.45.xx.xx) on Tue 23 Sep 2008 at 14:04
The secure VPN sounds sensible. Terminating the tunnels in Squid is easy, but the steps to use Squid to recheck the SSL certificate when it sends the SSL request on looked "interesting". I'm sure it is easy for those who do SSL all day.

One of the minor annoyances in testing was that the packaged Squid in Debian Lenny didn't have the SSL modules enabled. But this may be a good thing if it encourages other clean solutions.

Anyway, check out Varnish. I'd seen the odd bad comment, or suggestion that it wasn't feature complete enough, but I was impressed.
