Using the HAProxy load-balancer for increased availability

Posted by Steve on Wed 27 Aug 2014 at 14:12

HAProxy is a TCP/HTTP load-balancer, allowing you to route incoming traffic destined for one address to a number of different back-ends. The routing is very flexible and it can be a useful component of a high-availability setup.

The last time we looked at load-balancers was in 2005, when we briefly examined webserver load-balancing with pound.

HAProxy is a little more flexible than pound when it comes to configuration, and in this article we'll show how it can be used to balance traffic between the internet-at-large and a small number of local webservers, along with some of the more advanced facilities it supports.

The general use-case for a load-balancer is to present a service on the network which is actually fulfilled by a number of different back-end hosts: incoming traffic is accepted upon a single IP address, then passed along to one of those back-ends to be handled.

Splitting traffic like this allows a service to scale pretty well, barring any other limiting factors (such as a shared filesystem, or a single database host).

HAProxy can be used to route traffic regardless of the protocol; for example it could provide load-balancing to:

  • Webservers
    • Such as a number of hosts running Apache, nginx, lighttpd, etc.
  • Mail servers
    • Such as a small pool of hosts running postfix, exim4, qpsmtpd, etc.
  • Arbitrary TCP services
    • Such as APIs implemented in go, lua, or node.js.

I'd imagine the most popular use-case, though, would be directing traffic to webservers. In this next example we'll show how connections made to a single IP address can be passed to four back-end hosts.

Getting Started with HAProxy

To get started, first install HAProxy. Depending on the release of Debian you're running, you might find you need to enable the backports repository first.

aptitude install haproxy

If you see a "package not found" response, do consult the Debian package search results for a clue.
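On wheezy, for example, the package lives in the backports repository. The following is only a sketch of enabling it - the mirror URL is an assumption, so substitute your preferred one:

```shell
# Enable wheezy-backports (mirror URL is an assumption - use your own)
echo "deb http://http.debian.net/debian wheezy-backports main" \
    >> /etc/apt/sources.list
aptitude update

# Install haproxy from the backports target release
aptitude -t wheezy-backports install haproxy
```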

Once installed the configuration is carried out solely by editing the configuration file /etc/haproxy/haproxy.cfg.

The following example is perhaps the simplest useful configuration, listening for incoming HTTP requests on port 80, and distributing those requests to one of four back-end hosts:

global
        log     /dev/log    local0
        log     /dev/log    local1 notice
        chroot  /var/lib/haproxy
        user    haproxy
        group   haproxy
        maxconn 1024
        daemon

defaults
        log     global
        mode    http
        option  httplog
        option  dontlognull

#
#  Listen on *:80 - Send traffic to the backend named "apache"
#
frontend www-http
    bind *:80
    default_backend apache

#
# Back-end definition.
#
backend apache
    mode http
    balance roundrobin
    server web1 10.0.0.10:8080
    server web2 10.0.0.20:8080
    server web3 10.0.0.30:8080
    server web4 10.0.0.40:8080

With this configuration file in place the service can be restarted to make it live, and the configuration file will be tested for errors before that occurs:

# service haproxy restart
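Before restarting, you can also ask HAProxy to test the file explicitly; the -c flag parses the configuration and reports any errors without starting the proxy:

```shell
# Validate the configuration file without launching the daemon
haproxy -c -f /etc/haproxy/haproxy.cfg
```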

This example is pretty similar to the one we demonstrated with pound, so many years ago; however HAProxy has many useful additions which we can now explore.

Obviously for this example to be useful to you it must be updated to refer to the real backends, and they must be reachable from the host you're running the proxy upon. In this case our traffic is passed to port 8080 on a number of hosts in the 10.0.0.0/24 network. In my case I tend to run a small VPN to allow members of a VLAN to communicate securely. Even though I trust my hosting company I see no reason that my traffic should be sniffed.

Equally, although this example will give you increased availability, because any failing backend will be removed from the pool, it won't provide high-availability, because the proxy itself is now a single point of failure.

To use HAProxy for high-availability it should be coupled with IP failover to remove itself as a single point of failure.
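One common approach is to run a second proxy host and float a virtual IP between the pair with keepalived. The following is only a sketch, with a hypothetical interface name and example addresses, not a tested setup:

```
# /etc/keepalived/keepalived.conf on the primary proxy (sketch only)
vrrp_instance VI_1 {
    state MASTER          # the standby host would use "state BACKUP"
    interface eth0        # assumption: adjust to your actual NIC
    virtual_router_id 51
    priority 101          # the standby gets a lower priority, e.g. 100
    advert_int 1
    virtual_ipaddress {
        192.0.2.100       # the floating service IP (example address)
    }
}
```

If the primary proxy dies, the standby claims the floating IP, so clients keep reaching a working HAProxy instance.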

Load-Balancing Modes

The simple example listed previously routed traffic between the various backend hosts in simple rotation.

There are various different options, which may be specified via the "balance" directive, in the backend section. The three most common approaches are:

  • Distributing each request in turn to the next server:
    • balance roundrobin
  • Distributing each incoming request to the least-loaded backend we have:
    • balance leastconn
  • Distributing each request to a particular server, based upon a hash of the source IP making that request:
    • balance source

Of these options only "balance source" requires any real discussion. This method will ensure that a request from the IP address 1.2.3.4 will always go to the same backend, assuming it remains alive. This allows you to sidestep any issues with cookie persistence if sessions are stored locally.

The roundrobin mode also allows you to assign weights to the backends, such that bigger hosts can receive more of the traffic. The following example has four hosts, two of which have more RAM/CPU to burn, and receive more of the traffic:

backend apache
    mode http
    balance roundrobin
    server web1 10.0.0.10:8080  weight 20
    server web2 10.0.0.20:8080  weight 20
    server web3 10.0.0.30:8080  weight 10
    server web4 10.0.0.40:8080  weight 10

The "weight" parameter adjusts a server's weight relative to the other servers. Each server receives a load proportional to its weight relative to the sum of all weights, so the higher the weight, the higher the load. By giving the first two servers weights twice those of the last two, we should see them handle twice as many requests.

Health Checks

HAProxy will notice if a back-end disappears entirely, because it will fail to connect to it.

Beyond that, though, you might wish to determine programmatically whether a host should remain in the pool. The way you do that is by defining the URL that the proxy will poll.

Each backend host can have a URI defined which will be used to determine whether the host is alive - if that URI fails to return an "HTTP 200 OK" response then the host will be removed from the pool, and receive no new connections.

The following example will request the file /check.php, sending the correct HTTP host header, against each of the named servers.

backend apache
    mode http
    balance roundrobin
    option httpchk HEAD /check.php HTTP/1.1\r\nHost:\ example.com
    server web1 10.0.0.10:8080 check
    server web2 10.0.0.20:8080 check
    server web3 10.0.0.30:8080 check
    server web4 10.0.0.40:8080 check

Although we've not yet talked about load-balancing TCP connections, rather than HTTP connections, this next example shows how you could test that a Redis server is still working:

backend redis
   option tcp-check
   tcp-check send PING\r\n
   tcp-check expect string +PONG
   tcp-check send info\ replication\r\n
   tcp-check expect string role:master
   tcp-check send QUIT\r\n
   tcp-check expect string +OK
   server R1 10.0.0.11:6379 check inter 1s
   server R2 10.0.0.12:6379 check inter 1s

This example first sends a "PING" string, expecting a "PONG" reply, then tests that the remote host is a Redis master. As you can see this is both simple to configure and extraordinarily powerful.

If you consult the HAProxy documentation you'll find further details which can be used to specify the number of failures required to remove a host, and tweak things such that post-failure a host must respond positively a specific number of times before it is reintroduced, avoiding flaps as services come and go in quick succession.
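For example, the polling interval, failure count, and recovery count can all be set per-server via the inter, fall, and rise parameters. The values below are purely illustrative, not recommendations:

```
backend apache
    mode http
    balance roundrobin
    option httpchk HEAD /check.php HTTP/1.1\r\nHost:\ example.com
    # Poll every 2s; remove after 3 failed checks; only reintroduce
    # after 2 consecutive successful checks, to avoid flapping.
    server web1 10.0.0.10:8080 check inter 2s fall 3 rise 2
    server web2 10.0.0.20:8080 check inter 2s fall 3 rise 2
```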

Adding Gzip Support

Although it isn't possible to rewrite incoming requests, or arbitrarily massage the output received from the backend hosts, one thing that HAProxy does support is adding Gzip compression between itself and the requesting client.

To enable this update your front-end definition:

frontend www-http
    bind *:80
    compression algo gzip
    compression type text/html text/plain text/javascript application/javascript application/xml text/css
    default_backend apache

Obviously there is a trade-off to be made here:

  • With no compression you'll serve more network traffic.
  • With compression enabled your CPU will have to do more work (to perform the compression).

In the general case HAProxy has sufficiently low overhead that it is probably a good idea to enable compression. If your load starts to rise too high you might want to reconsider, though. This is a common server-tuning consideration.

(Adding single/static headers is supported, but complex rewrites are not possible.)
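For example, a single static response header could be added in the front-end via the rspadd directive (the header name and value here are purely illustrative):

```
frontend www-http
    bind *:80
    # Add one static header to every response passing through the proxy
    rspadd X-Served-By:\ lb1
    default_backend apache
```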

DoS protection

HAProxy can be used to protect servers from particular kinds of attack, most notably the "slowloris" attack, where a remote host ties open multiple connections to your server and simply sends its requests very, very slowly.

To mitigate against this you'd add timeout options:

defaults
        log     global
        mode    http
        option  httplog
        option  dontlognull
        timeout http-request 5s
        timeout connect 5s
        timeout server 10s
        timeout client 30s

Here we've set up some timeout values which seem sane: if a remote client makes a request to your server that takes longer than five seconds, for example, the connection will be closed.

Further documentation on the timeout options is available on the HAProxy website, along with other notes on connection-counting.

If you have a sufficiently recent version of HAProxy it can be configured to keep a running count of the connections initiated by remote IP addresses - protecting you from a single host attempting to open many connections.

The version of HAProxy available to Debian's stable release, as a backport, doesn't currently support this connection-tracking, but you can install a later 1.5 version via the haproxy.debian.net site.

With a suitably recent release of HAProxy the following definition will allow you to reject more than ten simultaneous connections from a single source:

#
#  Listen on *:80 - Send traffic to the backend named "apache"
#
frontend www-http
    bind *:80

    # Table definition.
    stick-table type ip size 100k expire 30s store conn_cur

    # Reject the new connection if this client already has ten connections open
    tcp-request connection reject if { src_conn_cur ge 10 }
    tcp-request connection track-sc1 src

    default_backend apache

Final Example

The final version of our sample configuration file would look like this, taking advantage of all the options we've covered so far.

global
        log /dev/log    local0
        log /dev/log    local1 notice
        chroot /var/lib/haproxy
        user haproxy
        group haproxy
        daemon
        maxconn 1024

defaults
        log     global
        mode    http
        option  httplog
        option  dontlognull
        timeout http-request 5s
        timeout connect 5s
        timeout server 10s
        timeout client 30s


#
#  *:80
#
frontend www-http
    bind *:80
    reqadd X-Forwarded-Proto:\ http
    compression algo gzip
    compression type text/html text/plain text/javascript application/javascript application/xml text/css
    default_backend apache


#
# *:443
#
listen www-https
    bind :443 ssl crt /etc/haproxy/ssl.pem  no-tls-tickets ciphers ECDHE-RSA-AES128-GCM-SHA256:ECDHE-RSA-AES256-GCM-SHA384:ECDHE-RSA-AES128-SHA256:ECDHE-RSA-AES128-SHA:ECDHE-RSA-AES256-SHA384:ECDHE-RSA-AES256-SHA:ECDHE-RSA-RC4-SHA:AES128-GCM-SHA256:AES256-GCM-SHA384:AES128-SHA256:AES128-SHA:AES256-SHA256:AES256-SHA:RC4-SHA
    rspadd Strict-Transport-Security:\ max-age=31536000
    reqadd X-Forwarded-Proto:\ https
    compression algo gzip
    compression type text/html text/plain text/javascript application/javascript application/xml text/css
    default_backend apache


#
# Back-End definition.
#
backend apache
    mode    http
    balance leastconn
    option  http-server-close
    timeout http-keep-alive 3000
    option  forwardfor
    option  httpchk HEAD /check.php HTTP/1.1\r\nHost:\ www.example.org
    server  web1 10.0.0.10:8080 check
    server  web2 10.0.0.20:8080 check
    server  web3 10.0.0.30:8080 check
    server  web4 10.0.0.40:8080 check

There is a lot more to HAProxy than this brief introduction has covered, such as the SSL support, but I hope it was useful regardless.
Posted by Steve (2.126.xx.xx) on Thu 28 Aug 2014 at 07:05

This article was updated just now, to note that you might want to use Debian backports to get the package, as it is not included in Debian Stable / wheezy.

You have two choices: use the wheezy-backports repository, or install the newer packages from haproxy.debian.net.

For a final link do feel free to take a look at this complex haproxy.cfg, which demonstrates many things, including rewriting request URLs, and matching backends dynamically.

Steve


Posted by Anonymous (88.162.xx.xx) on Tue 2 Sep 2014 at 23:57
Hi,
In the final example there is a common issue: "timeout http-keep-alive" should be defined in the frontend, not in the backend, otherwise it is ineffective. Or better, add it to the defaults section to prevent such issues.
Thanks to the http-request timeout in the defaults, keep-alive connections will be closed in 5 seconds with the current configuration, which reduces the DoS risks.

There is still an exception where "timeout http-keep-alive" can be used in a backend, but it's quite uncommon: using a TCP frontend with an HTTP backend.

Cyril Bonté.


Posted by oxtan (62.195.xx.xx) on Thu 11 Sep 2014 at 20:51
how do you consolidate your apache log files when using such a setup?


Posted by Steve (2.126.xx.xx) on Thu 11 Sep 2014 at 20:53

Personally I prefix the logfile with the hostname, and suffix with the date.

That way I can use rsync to pull them all together from:

host1.2014-09-11
host2.2014-09-11
host3.2014-09-11
host4.2014-09-11

Using cronolog makes that easy.

Steve

