Simple webserver load balancing with pound

Posted by Steve on Tue 20 Sep 2005 at 11:38

There are times when having only a single webserver is insufficient to handle the amount of traffic, or load, you're receiving. In this situation you have several options. If you have the ability to add new webservers into your setup then using pound might be a good approach.

For load-balancing there are several common solutions, depending upon your requirements:

  • Buy a piece of dedicated load-balancing hardware.
  • Use a simple solution, such as:
    • Round-robin DNS.
    • Load balancing with a software solution, such as pound.

The first solution might be the best one, but if you don't have the money to spend on dedicated hardware (after buying another server) then a software-only solution might be your only option.

Choosing between the other solutions will be a matter of knowing why you're using load balancing:

  • To spread the load amongst a number of machines/locations.
  • To provide redundancy in case one machine/server fails.

Using round-robin DNS gives you the ability to set up a pair, or more, of machines and have users "randomly" connect to a different host. This is simple and reasonably effective; however, it doesn't give you much redundancy. (If one machine fails then some users will still be sent to that host, and will receive errors.)
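To make the redundancy problem concrete, here is a minimal sketch in Python. It assumes a strict rotation through three made-up addresses (real DNS servers and resolvers may order and cache answers differently), and the function names are invented for the illustration:

```python
from itertools import cycle

# Hypothetical addresses standing in for the A records of www.example.com.
HOSTS = ["192.0.2.10", "192.0.2.11", "192.0.2.12"]


def round_robin(hosts, requests):
    """Hand out hosts in strict rotation, roughly as a rotating DNS answer does."""
    rotation = cycle(hosts)
    return [next(rotation) for _ in range(requests)]


def failed_share(assignments, dead_host):
    """Fraction of clients that were pointed at a host which is down."""
    return assignments.count(dead_host) / len(assignments)


assignments = round_robin(HOSTS, 9)
# DNS has no idea that 192.0.2.11 has died, so it keeps handing it out.
print(failed_share(assignments, "192.0.2.11"))
```

With three hosts in the rotation, a third of the clients are still directed at the dead machine until somebody edits the DNS records.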

A simple non-DNS based load balancing setup will look something like this:

[Figure: simple cluster diagram]

Here you can see there is a publicly visible host at the front. This will be the main machine to which users connect, www.example.com. Behind that is the actual cluster - incoming connections will be routed to one of those machines via "magic".

The magic involved might be a load balancing piece of hardware, an Apache module, or something else. In our case it will be an installation of the Pound software.

pound is very simple to understand and use. It is configured with a list of machines in the cluster, and accepts incoming HTTP-connections. When a request comes in it will be sent to one of the hosts in the pool.

If your server uses some form of state management, such as cookies, and it is important for a particular client to stay with a particular host for the duration of its connections, then this can also be accommodated.

Installing the software is simple:

apt-get install pound

Once installed you can configure it by modifying the /etc/pound/pound.cfg file. Note that by default the package will be installed in a disabled state. Once you've configured the software appropriately you must enable it by changing startup=0 to startup=1 in the file /etc/default/pound.

The initial version looks like this:

# Defaults for pound initscript
# sourced by /etc/init.d/pound
# installed at /etc/default/pound by the maintainer scripts

# prevent startup with default configuration
# set the below varible to 1 in order to allow pound to start
startup=0

The configuration of pound comes in three parts:

  • Setting global options, such as which port to bind upon, etc.
  • Setting up the list of machines in the cluster, to which requests will be forwarded.
  • Setting up any state-parameters.

The global options will likely be set up already to your satisfaction; the only thing you will probably have to change is the IP address to bind to. This can be set with something like this:

ListenHTTP 123.123.123.123,80

pound has the ability to read the HTTP connections and take decisions based upon the requested URI. This allows you to send some requests, such as all those beneath http://example.com/images to a particular host. Here we will ignore this, and other options (such as SSL proxying).

Ignoring special handling, then, you'll define your list of machines via settings such as this:

UrlGroup ".*"
BackEnd 192.168.1.1,80,1
BackEnd 192.168.1.2,80,1
BackEnd 192.168.1.3,80,1
EndGroup

Here the UrlGroup prologue means that this setting applies to all incoming URLs (".*" is a regular expression applied against the incoming request URI). The BackEnd settings are a list of IP addresses, ports, and priorities.
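The way a regular expression selects a group can be sketched in a few lines of Python. This is only an illustration: the /images/ group, the address 192.168.1.10, the function name and the "first matching group wins" rule are all assumptions made for the example, not a description of pound's internals:

```python
import re

# Hypothetical groups, checked in order; the first matching pattern wins.
# The ordering rule is an assumption made for this sketch.
URL_GROUPS = [
    (re.compile(r"/images/.*"), ["192.168.1.10"]),
    (re.compile(r".*"), ["192.168.1.1", "192.168.1.2", "192.168.1.3"]),
]


def backends_for(uri):
    """Return the backend list of the first group whose pattern matches the URI."""
    for pattern, backends in URL_GROUPS:
        if pattern.match(uri):
            return backends
    return []


print(backends_for("/images/logo.png"))
print(backends_for("/index.html"))
```

Because ".*" matches any request URI, a group using it acts as a catch-all, which is why it is the only group needed in the simple setup above.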

The priorities are used to express the relative power of the webserver at the given IP address. The acceptable values are 1-9, and those servers listed with a higher priority will receive more connections.

For example, if you have two hosts in your cluster (192.168.1.{1,100}), and the machine 192.168.1.100 is twice as powerful as the other, you could use the following to make sure it gets twice as many incoming connections:

UrlGroup ".*"
BackEnd 192.168.1.1,80,1
BackEnd 192.168.1.100,80,2
EndGroup
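The effect of those priorities can be sketched as a weighted rotation. The Python below is not how pound is implemented (the article doesn't describe its selection algorithm); it simply repeats each backend in a pool according to its priority and counts where 300 connections would land. The function names are made up for the example:

```python
from collections import Counter
from itertools import cycle

# (address, port, priority) tuples mirroring the BackEnd lines above.
BACKENDS = [("192.168.1.1", 80, 1), ("192.168.1.100", 80, 2)]


def weighted_rotation(backends):
    """Repeat each backend 'priority' times, then rotate through the result."""
    pool = [(host, port) for host, port, priority in backends for _ in range(priority)]
    return cycle(pool)


def distribute(backends, requests):
    """Count how many of 'requests' connections each backend would receive."""
    rotation = weighted_rotation(backends)
    return Counter(next(rotation) for _ in range(requests))


counts = distribute(BACKENDS, 300)
print(counts[("192.168.1.100", 80)], counts[("192.168.1.1", 80)])  # prints "200 100"
```

The machine with priority 2 ends up with twice as many connections as the one with priority 1, which is the ratio the configuration asks for.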

pound will keep track of the status of each of the hosts in the cluster. This means it won't send requests to hosts which have failed. You can configure this checking period with a setting such as:

# Check backend machines every half-minute
Alive 30
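As a rough illustration of what such a periodic check involves, here is a sketch in Python. The article doesn't say how pound probes a host, so the plain TCP connect used here is an assumption, and tcp_probe and live_backends are names invented for the example:

```python
import socket


def tcp_probe(host, port, timeout=2.0):
    """Report whether a TCP connection to host:port can be opened."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def live_backends(backends, probe=tcp_probe):
    """Return only the (host, port) pairs for which the probe succeeds."""
    return [(host, port) for host, port in backends if probe(host, port)]
```

A balancer would call something like live_backends every 30 seconds, matching the Alive setting, and only forward requests to the hosts it returns. Passing the probe in as a parameter makes it easy to swap in a different check, or a fake one for testing.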

The only other thing you need to do is to consider how to maintain state. HTTP is a stateless protocol, and to add the illusion of state to it there are several different options in common use:

  • The use of session cookies.
  • The use of HTTP "Basic Authentication".
  • Session parameters appended to all requests.

pound can handle any of these options, but you must tell it which to use. In the case of cookie-based sessions you must also specify the name of the cookie which is being used.

To specify the session type you must add the Session setting to your UrlGroup stanza. The available options are:

IP

The session is kept based on client IP address. Specify this as follows:

Session IP N

BASIC

The session is based upon HTTP "Basic Authentication", use it as follows:

Session BASIC N

URL

The session is specified by a parameter appended to all URLs. You specify the name as follows:

Session URL phpsession N

COOKIE

The sessions are maintained by a cookie passed with each connection. You specify the cookie name as follows:

Session COOKIE cookie-name N

The "N" value is how long, in seconds, a session will be maintained. Once that time has passed the client may be passed to another back-end machine.
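The idea of keeping a client on one back-end for N seconds can be sketched as a small lookup table. This Python is an illustration only: the class and method names are invented, the session key stands in for whatever the Session setting selects (IP address, cookie value and so on), and restarting the timer on every request is an assumption, since the article doesn't say when pound starts counting:

```python
import time


class StickyTable:
    """Remember which backend a session key was sent to, for ttl seconds."""

    def __init__(self, backends, ttl, clock=time.monotonic):
        self.backends = list(backends)
        self.ttl = ttl
        self.clock = clock
        self.entries = {}  # session key -> (backend, time last seen)
        self.next_index = 0

    def _pick(self):
        """Choose the next backend in simple rotation."""
        backend = self.backends[self.next_index % len(self.backends)]
        self.next_index += 1
        return backend

    def route(self, key):
        """Return the remembered backend for key, or pick a new one if expired."""
        now = self.clock()
        entry = self.entries.get(key)
        if entry is not None and now - entry[1] <= self.ttl:
            backend = entry[0]
        else:
            backend = self._pick()
        # Assumption: every request restarts the session timer.
        self.entries[key] = (backend, now)
        return backend
```

A client whose requests keep arriving within the time limit always reaches the same host; once it has been quiet for longer than that, its next request is treated like a new visitor's.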

A complete example, using the cookie name "auth" lasting for an hour would look like this then:

UrlGroup ".*"
BackEnd 192.168.0.11,20,1
BackEnd 192.168.0.11,21,1
Session COOKIE auth 3600
EndGroup

There are many more options you can tweak in pound and the man page does a good job of explaining them - especially combined with the homepage.

To read the man page run:

man pound

 

 


Posted by marki (82.119.xx.xx) on Tue 20 Sep 2005 at 13:14
Nice article, thanks.
But I have a question about those backend servers? How do you ensure they serve the same content? Especially when customers manage their web sites using FTP :)
Is using NFS the right solution? Or what do you use? Using NFS again gives you 1 point of failure and possibly IO bottleneck...

Posted by Steve (82.41.xx.xx) on Tue 20 Sep 2005 at 13:20
I guess it depends a lot on the kind of content you have as much as anything else.

For static HTML pages, or images, you could either use NFS, or you could rely upon something like rsync to allow all the hosts to have their own local copy of the content - and have it kept in sync.

(If you have customers uploading content you could have a dedicated host upload.example.com which will trigger a new rsync run; I'm not sure how they would respond to having to wait for the sync to complete before their content goes live though ..)

Personally I've used NFS before with good success, although you're correct in saying that's another potential point of failure. With backups, and a fast mirror you could avoid losing all your content though. I guess the risk of that is a hard one to qualify. (And again it depends on whether you're looking at balancing for redundancy, or for better load handling).

For database-driven sites you might find that having all the hosts in the pool connecting to a single server becomes a bottleneck, so looking at database replication is a good idea.

Steve
--

Posted by Anonymous (81.57.xx.xx) on Wed 21 Sep 2005 at 16:37

For potentially bidirectional content replication, unison might be a better option than rsync.

I use it to replicate the web app content and php sessions over two machines.

Posted by Anonymous (213.104.xx.xx) on Tue 20 Sep 2005 at 14:39
The standard solution is to have the backend storage mounted via NFS, but you could potentially use a clustered file system such as GFS or OCFS2.

Posted by Anonymous (203.160.xx.xx) on Wed 21 Sep 2005 at 04:31
And what about codafs? Is it stable enough yet?
I would love an article on this subject! :)

Posted by Anonymous (213.216.xx.xx) on Sun 25 Sep 2005 at 11:25
I believe the coda utilities are only still found in experimental branch and the packages are still broken with a very old bug report unfixed.

Posted by Kellen (68.15.xx.xx) on Wed 21 Sep 2005 at 02:28
You can also cluster samba servers behind the apache instances.

As for a single point of failure; you're already using a single machine with pound.

Posted by sh4rk (212.39.xx.xx) on Mon 26 Sep 2005 at 07:38
Marki try with drbd for apache home directories http://oss.linbit.com/drbd/ cool stuff

Posted by matej (158.193.xx.xx) on Wed 21 Sep 2005 at 08:19
I use pound to serve different kinds of traffic to different webservers, e.g. all static content, images, js, html goes to a very light-weight server, while php-cgi (& other cgi) goes to apache on another machine.
Another pound instance makes my life easier on a small masquerading router, a firewall for ~10 machines, 4 of which are different development web-servers.. so all pages can be previewed to customers on standard port 80, whatever server is used for development ;)

Posted by Anonymous (62.253.xx.xx) on Fri 16 Jun 2006 at 11:24
It seems to me that this is similar to what Linux Virtual Server (IPVS) does
http://www.linuxvirtualserver.org/software/ipvs.html

If it isn't the same, I would be glad if someone can explain the difference.
If it does the same thing, wouldn't IPVS perform better, being a kernel module?

Posted by Anonymous (213.163.xx.xx) on Mon 31 Jul 2006 at 09:49
"I would be glad if someone can explain the difference"

Well, you may want to read the last section of
http://www.linux.com/article.pl?sid=05/07/27/1729229

To sum up, LVS and Pound work on different OSI levels
(see http://en.wikipedia.org/wiki/OSI_Model )

Posted by Anonymous (213.163.xx.xx) on Tue 1 Aug 2006 at 16:09
Oh, and 360 seconds is not exactly an hour...

Posted by Anonymous (84.90.xx.xx) on Sat 10 May 2008 at 00:34
yep. 3600 are.

Posted by Anonymous (77.8.xx.xx) on Mon 17 Nov 2008 at 13:55
is it possible to run POUND as load balancer on a linux vserver and serve two other windows vserver on another subnet in the internet?
i.e. linux 23.21.34.12 --> win2003 iis 31.222.134.23, win2003 iis 29.56.234.3

Posted by Anonymous (80.68.xx.xx) on Mon 17 Nov 2008 at 13:58
If it can reach them it'll work just fine - it doesn't matter where they are or what they're running.

Pound is basically a proxy front-end so it'll listen and redirect connections anywhere else. It just doesn't often make sense to run it in the way you're describing.

Posted by Anonymous (77.8.xx.xx) on Fri 30 Jan 2009 at 13:09
Hi

is it possible to make that VPS Cluster Loadbalancing like annoknips.com do?
here is description of it.
http://server483.annoknips.com/press.php?view=1&articleid=9
thanks for writing.

bye

Posted by Anonymous (120.140.xx.xx) on Thu 14 Apr 2011 at 10:43
I am new to reverse proxying with Pound, and I would like to know how the client HTTP request & response data flow in a scenario such as the one below:

# Let us assume our server's public IP is 202.54.1.5.

# Pound will run on 202.54.1.5 port 80. Let us call this the Pound Server.

# An Http-Request-A from a client (Internet browser) comes into the Pound Server.

# The Pound Server will forward Http-Request-A to one of a list of internal hosts, e.g. 192.168.1.11, 192.168.1.12, …, all on port 80. Let us say it picks Host-192.168.1.11.

# After Host-192.168.1.11 processes it, how does Http-Response-A flow back to that client (Internet browser)?

– Does it flow back DIRECTLY to that client without going through the Pound Server?
OR
– Does it flow back first to the Pound Server and then to that client?

I just want to investigate whether there is a network bandwidth toll (incoming & outgoing) at the Pound Server when it is used as a load balancer in a cloud environment, since cloud providers such as Amazon or Azure charge for incoming as well as outgoing data transfer. All I want is the load balancing features of Pound inside the high-availability nature of the cloud.

Anyone, please help. Thanks a lot in advance.
