Simple webserver load balancing with pound
Posted by Steve on Tue 20 Sep 2005 at 11:38
There are times when having only a single webserver is insufficient to handle the amount of traffic, or load, you're receiving. In this situation you have several options. If you have the ability to add new webservers into your setup then using pound might be a good approach.
For load-balancing there are several common solutions, depending upon your requirements:
- Buy a piece of dedicated load-balancing hardware.
- Use a simple solution, such as:
- Round-robin DNS.
- Load balancing with a software solution, such as pound.
The first solution might be the best one, but if you don't have the money to spend on dedicated hardware (after buying another server) then a software-only solution might be your only option.
Choosing between the other solutions will be a matter of knowing why you're using load balancing:
- To spread the load amongst a number of machines/locations.
- To provide redundancy in case one machine/server fails.
Using round-robin DNS gives you the ability to set up a pair, or more, of machines and have users "randomly" connect to a different host. This is simple and reasonably effective, but it doesn't give you much redundancy: if one machine fails then some users will still be sent to that host, and will receive errors.
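To make the redundancy problem concrete, here is a rough sketch of round-robin DNS modelled as a plain rotation over a list of addresses. The addresses and the function name are made up for illustration, and real resolvers and caches behave less tidily than this.

```python
from itertools import cycle

def failed_fraction(hosts, down, requests):
    """Fraction of requests that land on a dead host under plain rotation."""
    rotation = cycle(hosts)  # addresses are handed out in turn, dead or not
    failures = sum(1 for _ in range(requests) if next(rotation) in down)
    return failures / requests

# Hypothetical three-host pool with one machine switched off.
hosts = ["192.0.2.1", "192.0.2.2", "192.0.2.3"]
print(failed_fraction(hosts, down={"192.0.2.2"}, requests=300))
```

With one of three hosts down, roughly a third of the requests in this model still go to the dead machine, because nothing in the rotation knows it has failed.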
A simple non-DNS based load balancing setup will look something like this:
Here there is a publicly visible host at the front: the main machine to which users connect, www.example.com. Behind that is the actual cluster, and incoming connections will be routed to one of those machines via "magic".
The magic involved might be a load balancing piece of hardware, an Apache module, or something else. In our case it will be an installation of the Pound software.
pound is very simple to understand and use. It is configured with a list of machines in the cluster, and accepts incoming HTTP connections. When a request comes in it will be sent to one of the hosts in the pool.
If your server uses some form of state management, such as cookies, and it is important for a particular client to stay with a particular host for the duration of its connections, then this can also be accommodated.
Installing the software is simple:
apt-get install pound
Once installed you can configure it by modifying the /etc/pound/pound.cfg file. Note that by default the package will be installed in a disabled state. Once you've configured the software appropriately you must enable it by changing the file /etc/default/pound.
The initial version looks like this:
# Defaults for pound initscript
# sourced by /etc/init.d/pound
# installed at /etc/default/pound by the maintainer scripts
# prevent startup with default configuration
# set the below varible to 1 in order to allow pound to start
startup=0
The configuration of pound comes in three parts:
- Setting global options, such as which port to bind upon, etc.
- Setting up the list of machines in the cluster, to which requests will be forwarded.
- Setting up any state-parameters.
The global options will likely already be set up to your satisfaction; the only thing you will probably have to change is the IP address to bind to. This can be set with something like this:
ListenHTTP 123.123.123.123,80
pound has the ability to read the HTTP connections and take decisions based upon the requested URI. This allows you to send some requests, such as all those beneath http://example.com/images to a particular host. Here we will ignore this, and other options (such as SSL proxying).
Ignoring special handling, then, you'll define your list of machines via settings such as this:
UrlGroup ".*"
BackEnd 192.168.1.1,80,1
BackEnd 192.168.1.2,80,1
BackEnd 192.168.1.3,80,1
EndGroup
Here the UrlGroup prologue means that this setting applies to all incoming URLs (".*" is a regular expression applied against the incoming request URI). The BackEnd settings are a list of IP addresses, ports, and priorities.
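As a rough illustration of matching a request URI against a list of group patterns, here is a small Python sketch. The /images/ group and its backend address are invented for the example, and the sketch assumes that the first group whose pattern matches wins. Python's re module is also not the regex engine pound itself uses, so treat this as a model of the idea rather than of pound's exact behaviour.

```python
import re

# Hypothetical groups, checked in order: (pattern, backends).
URL_GROUPS = [
    (re.compile(r"/images/.*"), ["192.168.1.10:80"]),
    (re.compile(r".*"), ["192.168.1.1:80", "192.168.1.2:80", "192.168.1.3:80"]),
]

def backends_for(uri):
    """Return the backends of the first group whose pattern matches the URI."""
    for pattern, backends in URL_GROUPS:
        if pattern.match(uri):
            return backends
    return []

print(backends_for("/images/logo.png"))
print(backends_for("/index.html"))
```

Because ".*" matches any URI, a catch-all group like the one in the article only makes sense as the last entry when more specific groups are present.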
The priorities are used to express the relative power of the webserver at the given IP address. The acceptable values are 1-9, and those servers listed with a higher priority will receive more connections.
For example, if you have two hosts in your cluster (192.168.1.{1,100}), and the machine 192.168.1.100 is twice as powerful as the other, you could use the following to make sure it gets twice as many incoming connections:
UrlGroup ".*"
BackEnd 192.168.1.1,80,1
BackEnd 192.168.1.100,80,2
EndGroup
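To see what a 1:2 priority split means in practice, the following sketch simulates a weighted random choice between the two hosts. This assumes priorities behave like simple selection weights; the article does not describe pound's internal algorithm, so the function names and the approach here are illustrative only.

```python
import random
from collections import Counter

# Address -> priority, mirroring the two-host example.
BACKENDS = {"192.168.1.1": 1, "192.168.1.100": 2}

def pick_backend(rng):
    """Pick one backend, with probability proportional to its priority."""
    hosts = list(BACKENDS)
    weights = [BACKENDS[host] for host in hosts]
    return rng.choices(hosts, weights=weights, k=1)[0]

def simulate(requests, seed=0):
    """Count how many of the simulated requests each backend receives."""
    rng = random.Random(seed)  # fixed seed so runs are repeatable
    return Counter(pick_backend(rng) for _ in range(requests))

print(simulate(30000))
```

Over a large number of requests the higher-priority host ends up with about twice as many connections, though any short run can deviate from that ratio.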
pound will keep track of the status of each of the hosts in the cluster. This means it won't send requests to hosts which have failed. You can configure this checking period with a setting such as:
# Check backend machines every half-minute
Alive 30
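If you want to probe your backends by hand, something like the sketch below can help. It assumes that being able to open a TCP connection is a good enough sign of life, which is only one possible kind of check and not necessarily what pound does internally. The function names are made up for this example.

```python
import socket

def is_alive(host, port, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Refused, unreachable, or timed out: treat the backend as down.
        return False

def live_backends(backends, timeout=2.0):
    """Filter a list of (host, port) pairs down to those that accept connections."""
    return [(host, port) for host, port in backends if is_alive(host, port, timeout)]
```

Calling live_backends([("192.168.1.1", 80), ("192.168.1.2", 80)]) would return only the pairs that currently accept connections. Note that a successful connection says nothing about whether the webserver is returning sensible pages.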
The only other thing you need to do is to consider how to maintain state. HTTP is a stateless protocol, and to add the illusion of state to it there are several different options in common use:
- The use of session cookies.
- The use of HTTP "Basic Authentication".
- Session parameters appended to all requests.
pound can handle any of these options, but you must tell it which to use. In the case of cookie-based sessions you must also specify the name of the cookie which is being used.
To specify the session type you must add the Session setting to your UrlGroup stanza. The available options are:
- IP
The session is kept based on client IP address. Specify this as follows:
Session IP N
- BASIC
The session is based upon HTTP "Basic Authentication", use it as follows:
Session BASIC N
- URL
The session is specified by a parameter appended to all URLs. You specify the name as follows:
Session URL phpsession N
- COOKIE
The sessions are maintained by a cookie passed with each connection. You specify the cookie name as follows:
Session COOKIE cookie-name N
The "N" value is the length of time, in seconds, for which a session will be maintained. Once that time has elapsed the client may be passed to another back-end machine.
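The idea behind all four session types is the same: some key taken from the request (the client IP, the Basic Authentication credentials, a URL parameter, or a cookie value) is remembered along with the backend it was sent to. A minimal sketch of such a table follows. The class name is hypothetical, new sessions are simply handed out in rotation, and the timer is assumed to restart on every request, which may not match pound's own bookkeeping.

```python
import itertools

class StickyTable:
    """Remember which backend each session key was sent to, for ttl seconds."""

    def __init__(self, backends, ttl):
        self._rotation = itertools.cycle(backends)  # simplistic choice for new sessions
        self._ttl = ttl
        self._entries = {}  # session key -> (backend, time last seen)

    def route(self, key, now):
        """Return the backend for this key at time `now` (in seconds)."""
        entry = self._entries.get(key)
        if entry is not None and now - entry[1] <= self._ttl:
            backend = entry[0]              # session still valid: stay put
        else:
            backend = next(self._rotation)  # new or expired: pick again
        self._entries[key] = (backend, now)
        return backend

table = StickyTable(["192.168.1.1", "192.168.1.2", "192.168.1.3"], ttl=60)
print(table.route("cookie-abc", now=0))
print(table.route("cookie-abc", now=30))
print(table.route("cookie-abc", now=500))
```

Passing the time in explicitly keeps the example easy to follow: the second call falls inside the 60 second window and sticks to the same host, while the third arrives long after the session has lapsed and is free to go elsewhere.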
A complete example, using a cookie named "auth" with sessions lasting for an hour (3600 seconds), would look like this:
UrlGroup ".*"
BackEnd 192.168.0.11,20,1
BackEnd 192.168.0.11,21,1
Session COOKIE auth 3600
EndGroup
There are many more options you can tweak in pound and the man page does a good job of explaining them - especially combined with the homepage.
To read the man page run:
man pound
I guess it depends a lot on the kind of content you have as much as anything else.
For static HTML pages, or images, you could either use NFS, or you could rely upon something like rsync to allow all the hosts to have their own local copy of the content - and have it kept in sync.
(If you have customers uploading content you could have a dedicated host upload.example.com which will trigger a new rsync run; I'm not sure how they would respond to having to wait for the sync to complete before their content goes live though ..)
Personally I've used NFS before with good success, although you're correct in saying that's another potential point of failure. With backups, and a fast mirror you could avoid losing all your content though. I guess the risk of that is a hard one to qualify. (And again it depends on whether you're looking at balancing for redundancy, or for better load handling).
For database-driven sites you might find that having all the hosts in the pool connecting to a single server becomes a bottleneck, so looking at database replication is a good idea.
Steve
For potentially bidirectional content replication, unison might be a better option than rsync.
I use it to replicate the web app content and PHP sessions across two machines.
I would love an article on this subject! :)
As for a single point of failure; you're already using a single machine with pound.
Another pound instance makes my life easier on a small masquerading router, a firewall for ~10 machines, 4 of which are different development web servers, so all pages can be previewed to customers on the standard port 80, whichever server is used for development ;)
http://www.linuxvirtualserver.org/software/ipvs.html
If it isn't the same, I would be glad if someone can explain the difference.
If it does the same thing, wouldn't IPVS perform better, being a kernel module?
Well, you may want to read the last section of
http://www.linux.com/article.pl?sid=05/07/27/1729229
To sum up, LVS and Pound work on different OSI levels
(see http://en.wikipedia.org/wiki/OSI_Model )
i.e. linux 23.21.34.12 --> win2003 iis 31.222.134.23, win2003 iis 29.56.234.3
Pound is basically a proxy front-end so it'll listen and redirect connections anywhere else. It just doesn't often make sense to run it in the way you're describing.
Is it possible to build VPS cluster load balancing like annoknips.com does?
Here is a description of it:
http://server483.annoknips.com/press.php?view=1&articleid=9
Thanks for writing.
Bye
# Let's assume our server's public IP is 202.54.1.5.
# Pound will run on 202.54.1.5, port 80. Let us call this the Pound Server.
# An HTTP request, Http-Request-A, from a client (an Internet browser) comes into the Pound Server.
# The Pound Server forwards Http-Request-A to one of a list of internal hosts, e.g. 192.168.1.11, 192.168.1.12, …, all on port 80. Let's say it picks Host-192.168.1.11.
# After Host-192.168.1.11 processes it, how does Http-Response-A flow back to that client?
– Does it flow back DIRECTLY to the client without going through the Pound Server?
OR
– Does it flow back first to the Pound Server and then to the client?
I just want to find out whether there is a network bandwidth toll (incoming and outgoing) at the Pound Server when it is used as a load balancer in a cloud environment, since cloud providers such as Amazon or Azure charge for incoming as well as outgoing data transfer. All I want is the load-balancing features of the Pound Server inside the high-availability environment of the cloud.
Anyone, please help. Thanks a lot in advance.
But I have a question about those backend servers? How do you ensure they serve the same content? Especially when customers manage their web sites using FTP :)
Is using NFS the right solution? Or what do you use? Using NFS again gives you 1 point of failure and possibly IO bottleneck...