Policy routing

Posted by Utumno on Wed 29 Mar 2006 at 07:21

Here's a brief tutorial on how to connect a single server to 'the Internet' over multiple physical connections and route different services over different interfaces using a mechanism called 'policy routing'.

The Situation
I've got a home-built machine running Debian sid. It serves as my desktop, and also runs my personal weblog, an SSH server and a small forum. Everything is connected to the Internet through a DSL line (PPPoE mode; for a description of all possible DSL modes see the DSL-HOWTO).

Link speed is 1Mbps download / 64 Kbps upload. However, I can make 4 concurrent PPPoE connections and each one is going to achieve this speed.

The Problem
My connection is a bit too slow to run the forum, use P2P and also comfortably connect through SSH from the office (evil grin). However, I've got 3 connections sitting unused. So, the idea is to combine all 4 and either use some kind of load-balancing setup or route given services through separate interfaces.

The Effect
At the end of this article, we are going to arrive at a setup where the server is connected through two PPPoE connections. The only open ports on ppp0 are 80 and 22 ( also some icmp ) and on ppp1 - tcp/udp 4662, Overnet server port, and tcp 4001, MLDonkey GUI port.

All packets sent by user 'mldonkey' are routed through ppp1, while all the rest is routed through ppp0. Using P2P no longer interferes with the rest of my networking activity.

The News
I am going to use just two connections, both of them through the same ISP and gateway. The good news is that this is by no means a limitation of Linux; in fact, one can set up n connections to n different providers at the same time. Furthermore, besides routing decisions based on the owner of the process that sends the packets (as in my case), one can route packets based on many other criteria, like the TOS field, destination/source IP or incoming interface. One can even achieve a load-balancing setup by randomizing the route.
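As a rough sketch of the load-balancing idea (it is not the setup used in the rest of this article), a multipath default route lists one 'nexthop' per link. The helper below only prints the command instead of running it. The function name and the gateway addresses (documentation-range placeholders) are assumptions made for illustration, and the kernel needs IP_ROUTE_MULTIPATH for such a route to be accepted.

```shell
# Print (do not run) a multipath default route with one equal-weight
# nexthop per "gateway:device" pair given as an argument.
# multipath_cmd and the example addresses are illustrative only.
multipath_cmd() {
    cmd="ip route add default scope global"
    for pair in "$@"; do
        gw=${pair%%:*}     # text before the first ':'
        dev=${pair#*:}     # text after the first ':'
        cmd="$cmd nexthop via $gw dev $dev weight 1"
    done
    printf '%s\n' "$cmd"
}

multipath_cmd 192.0.2.1:ppp0 198.51.100.1:ppp1
```

Changing a 'weight' value would shift the share of traffic each link receives.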

Stuff we need
First of all, we need some options enabled in the kernel:
IP_ADVANCED_ROUTER
IP_MULTIPLE_TABLES
IP_ROUTE_FWMARK
and possibly IP_ROUTE_MULTIPATH if you're aiming at a load-balanced setup. All of them can be found in Networking -> Networking Support -> Networking Options -> TCP/IP networking in 2.6.15 kernel configuration. The 2.6.8 shipped with Sarge contains all of those compiled as modules.
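A quick way to see whether a given kernel was built with these options is to grep its config file. This is a minimal sketch: check_kernel_opts is a hypothetical helper name, and the config path is passed in explicitly (Debian kernels usually ship one as /boot/config-VERSION). On kernels newer than the ones discussed here, some of these options may no longer exist as separate switches, so 'not set' is not necessarily a problem.

```shell
# Report which of the needed options are set in a kernel config file.
# check_kernel_opts is a hypothetical helper, not a standard tool.
check_kernel_opts() {
    config=$1
    for opt in IP_ADVANCED_ROUTER IP_MULTIPLE_TABLES IP_ROUTE_FWMARK IP_ROUTE_MULTIPATH; do
        # A set option appears as CONFIG_<name>=y (built in) or =m (module).
        if grep -Eq "^CONFIG_${opt}=(y|m)\$" "$config"; then
            echo "$opt: enabled"
        else
            echo "$opt: not set"
        fi
    done
}

# Usage (assumes the config file exists at the usual Debian location):
# check_kernel_opts "/boot/config-$(uname -r)"
```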

We also need the 'iptables' userspace tools and the excellent 'iproute2' suite written by Alexey Kuznetsov:
apt-get install iproute iptables


Step 1
First, we are going to bring up two concurrent PPPoE connections.

We will use two separate network cards on the server and an ADSL router with a 4-port switch (running in 'bridging' mode). We're going to use eth0 to make PPPoE connection 'ppp0', and eth1 <--> ppp1. I am going to assume the server already has one working PPPoE connection, and that the connection was configured with the standard Debian utility 'pppoeconf'.
'pppoeconf' creates a configuration file in /etc/ppp/peers/, by default named 'dsl-provider'. Here are its contents (without the comments; I accepted all of pppoeconf's defaults):
noipdefault
usepeerdns
defaultroute
hide-password
lcp-echo-interval 20
lcp-echo-failure 3
connect /bin/true
noauth
persist
mtu 1492
noaccomp
default-asyncmap
plugin rp-pppoe.so eth0
user "your username here"
With this configuration, the connection can be brought up with the command
pon dsl-provider
So, in order to create a second connection,
1) connect eth1 to a port in your DSL router ( doh! )
2) create the file '/etc/ppp/peers/dsl-provider2', which looks like this
noipdefault
usepeerdns
#defaultroute
hide-password
lcp-echo-interval 20
lcp-echo-failure 3
connect /bin/true
noauth
persist
mtu 1492
noaccomp
default-asyncmap
plugin rp-pppoe.so eth1
user "your username here"
i.e. the differences are:
- 'defaultroute' is commented out
- the second-to-last line tells the Roaring Penguin 'rp-pppoe.so' driver to connect through eth1.
3) create the second connection with
pon dsl-provider2
4) make this setup permanent across reboots by adding
auto dsl-provider2
iface dsl-provider2 inet ppp
     provider dsl-provider2
     pre-up /sbin/ifconfig eth1 up # line maintained by pppoeconf

auto eth1
iface eth1 inet manual
to /etc/network/interfaces. At this point it is beneficial to use ifrename or udev to make interface names consistent across reboots.

At this time we should have two independent PPPoE connections. The 'ppp1' is useless, though, because the only routing table that is currently used - 'main' - tells the system to route everything through ppp0: ( MY.GA.TE.WAY is, obviously, IP of my gateway )
angband:/etc/ppp/peers# route -n
Kernel IP routing table
Destination     Gateway     Genmask         Flags Metric Ref    Use Iface
MY.GA.TE.WAY    0.0.0.0     255.255.255.255 UH    0      0        0 ppp0
MY.GA.TE.WAY    0.0.0.0     255.255.255.255 UH    0      0        0 ppp1
10.0.0.0        0.0.0.0     255.0.0.0       U     0      0        0 eth2
0.0.0.0         0.0.0.0     0.0.0.0         U     0      0        0 ppp0


Step 2
Now we will create two additional routing tables 'PPP0' and 'PPP1'. As root:

1) create aliases for the routing tables
angband:/etc# echo 200 PPP0 >> /etc/iproute2/rt_tables
angband:/etc# echo 201 PPP1 >> /etc/iproute2/rt_tables
2) use 'ip' to populate table PPP0: just add a route to the gateway, and then a default route through ppp0:
angband:/etc# ip route add MY.GA.TE.WAY dev ppp0 table PPP0
angband:/etc# ip route add default via MY.GA.TE.WAY dev ppp0 table PPP0
3) the same for PPP1
angband:/etc# ip route add MY.GA.TE.WAY dev ppp1 table PPP1
angband:/etc# ip route add default via MY.GA.TE.WAY dev ppp1 table PPP1
You now can list the contents of your new routing tables with
angband:/etc/iproute2# ip route list table PPP0
MY.GA.TE.WAY dev ppp0  scope link 
default via MY.GA.TE.WAY dev ppp0 
angband:/etc/iproute2# ip route list table PPP1
MY.GA.TE.WAY dev ppp1  scope link 
default via MY.GA.TE.WAY dev ppp1 
At this point, 'ppp1' is still useless because the new routing tables are not used at all yet.
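One small caveat with the 'echo ... >>' lines from step 1: running them a second time appends duplicate entries to rt_tables. A minimal sketch of a guard follows; add_rt_alias is a hypothetical helper name, and the file path is passed in so that the function can be tried on a scratch file first.

```shell
# Append "<id> <name>" to an rt_tables-style file unless a table with
# that name is already listed. add_rt_alias is a hypothetical helper.
add_rt_alias() {
    file=$1
    id=$2
    name=$3
    # Match "<number><whitespace><name>" on a line of its own.
    if grep -Eq "^[0-9]+[[:space:]]+${name}\$" "$file"; then
        return 0
    fi
    printf '%s %s\n' "$id" "$name" >> "$file"
}

# Usage, as root:
# add_rt_alias /etc/iproute2/rt_tables 200 PPP0
# add_rt_alias /etc/iproute2/rt_tables 201 PPP1
```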

Step 3
So, how do we now route mldonkey's packets through ppp1? First, we mark all such packets with the following rule in the 'mangle' table:
iptables -t mangle -A OUTPUT -m owner --uid-owner 108 -j MARK --set-mark 1
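'108' is the user id of mldonkey. The above rule stamps every packet produced by the user with that uid with a so-called 'FWMARK' equal to 1. (For locally generated packets the kernel re-checks the route once the mark has been set in the mangle OUTPUT chain, which is what lets the mark influence routing.)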
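The uid '108' is specific to my machine. A minimal sketch that looks the uid up by user name instead of hard-coding it is shown below; mark_rule_for_user is a hypothetical helper name, and it only prints the iptables command rather than running it.

```shell
# Print the mangle rule that marks packets owned by a named user.
# mark_rule_for_user is a hypothetical helper; nothing is executed.
mark_rule_for_user() {
    user=$1
    mark=$2
    # id -u exits non-zero if the user does not exist.
    uid=$(id -u "$user") || return 1
    printf 'iptables -t mangle -A OUTPUT -m owner --uid-owner %s -j MARK --set-mark %s\n' "$uid" "$mark"
}

# Usage (assumes a local user called 'mldonkey' exists):
# mark_rule_for_user mldonkey 1
```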

A sidenote: here we took advantage of the fact that MLdonkey, as it is packaged in Debian, runs as a dedicated user 'mldonkey'. But what if you need to route some other system service that does not have its own user and runs as root, say, SSHd? Use the '--cmd-owner' parameter:
iptables -t mangle -A OUTPUT -m owner --cmd-owner sshd -j MARK --set-mark 1 
( another sidenote: AFAIK, the '--cmd-owner' flag does not work in recent ( >= 2.6.15 ) kernels )


Second, some of the promised 'policy routing' :
ip rule add fwmark 1 pri 100 table PPP1
That in turn tells the kernel to use table 'PPP1' when routing all packets marked with an FWMARK equal to 1.

However, there are a few more tricks we have to perform before this setup starts to work. For one, it is possible that mldonkey's outgoing packets are already stamped with a source address that differs from the address of the interface they're going out on. We have to remedy that using NAT:
iptables -t nat -A POSTROUTING -o ppp1 -j SNAT --to-source=I.P.OF.PPP1
where I.P.OF.PPP1 is, of course, ppp1's IP.
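When the address of ppp1 is dynamic, the MASQUERADE target is a common alternative to SNAT, because it uses whatever address the outgoing interface currently has. The sketch below prints one rule or the other depending on whether an address is supplied; nat_rule is a hypothetical helper name, the address is a documentation-range placeholder, and nothing is executed.

```shell
# Print a POSTROUTING rule for an outgoing interface: SNAT when the
# address is known, MASQUERADE otherwise. nat_rule is a hypothetical
# helper; it prints the command instead of running it.
nat_rule() {
    dev=$1
    addr=${2:-}   # optional; empty means "address not known"
    if [ -n "$addr" ]; then
        printf 'iptables -t nat -A POSTROUTING -o %s -j SNAT --to-source %s\n' "$dev" "$addr"
    else
        printf 'iptables -t nat -A POSTROUTING -o %s -j MASQUERADE\n' "$dev"
    fi
}

nat_rule ppp1 203.0.113.7
nat_rule ppp1
```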

One more problem is that we have to disable rp_filter:
echo 0 > /proc/sys/net/ipv4/conf/ppp1/rp_filter 
rp_filter is a feature that automatically rejects incoming packets if the routing table entry for their source address doesn't match the network interface they're arriving on. Normally this has security advantages because it prevents so-called IP spoofing, but in our situation (several IP addresses on different interfaces) it can cause problems.
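To see which interfaces currently have rp_filter enabled, the per-interface files can simply be listed. This is a minimal sketch; show_rp_filter is a hypothetical helper name, and the directory argument exists only so the function can be pointed at a test tree. How the 'all' entry and the per-interface entry combine depends on the kernel version, so it is worth looking at both.

```shell
# List the rp_filter setting of every interface found under a conf
# directory (default: the live /proc tree). show_rp_filter is a
# hypothetical helper name.
show_rp_filter() {
    base=${1:-/proc/sys/net/ipv4/conf}
    for dir in "$base"/*/; do
        f="${dir}rp_filter"
        [ -r "$f" ] || continue      # skip entries without the file
        printf '%s=%s\n' "$(basename "$dir")" "$(cat "$f")"
    done
}

show_rp_filter
```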

Next, we have to make sure that packets sent from the address of interface X go back out through that interface:
ip rule add from I.P.OF.PPP0 pri 200 table PPP0
ip rule add from I.P.OF.PPP1 pri 300 table PPP1
The 'pri' (short for 'priority') parameter controls the order in which the rules are applied. The routing algorithm goes from priority 0 upwards and applies the first matching rule. Notice that those two rules have to have a higher priority number than the 'fwmark' one (also notice that, somewhat illogically, a 'higher priority' number here really means 'lower importance').

The rule table should now look like this:
angband:/etc/iproute2# ip rule list
0:      from all lookup local 
100:    from all fwmark 0x1 lookup PPP1 
200:    from I.P.OF.PPP0 lookup PPP0 
300:    from I.P.OF.PPP1 lookup PPP1 
32766:  from all lookup main 
32767:  from all lookup default 
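Since the ordering matters, it can be handy to check it from a script. The sketch below reads 'ip rule list' output on stdin and prints the priority of the first rule containing a given substring; rule_priority is a hypothetical helper name, and it assumes the output layout shown in the listing above.

```shell
# Read `ip rule list` output on stdin and print the priority of the
# first rule whose text contains the given substring.
# rule_priority is a hypothetical helper name.
rule_priority() {
    awk -v pat="$1" 'index($0, pat) { sub(/:$/, "", $1); print $1; exit }'
}

# Usage: compare the fwmark rule against a source-address rule.
# mark_pri=$(ip rule list | rule_priority "fwmark 0x1")
# from_pri=$(ip rule list | rule_priority "lookup PPP0")
# [ "$mark_pri" -lt "$from_pri" ] && echo "fwmark rule is consulted first"
```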


At this point, everything should work correctly: all services ( except mldonkey ) should work through ppp0 normally, and if you start mldonkey and listen on ppp1:
/etc/init.d/mldonkey-server start
tcpdump -i ppp1
you should see it working on ppp1.

Now if anyone initiates communication from outside using the IP of interface ppp1, the replies will also go out through ppp1. Thus, even though an SSH or Apache connection initiated from inside the server is always going to use ppp0, from outside you can connect to both of them through either ppp0 or ppp1.

Step 4
Let's now put everything together and make this setup permanent across reboots. While we are at it, let's also apply some firewalling so that only ports we actually need open are open.

I am going to achieve that in what is arguably not the best way, with two scripts. One (/etc/init.d/my_initscript) will be added to the initscripts; it sets up firewalling and the basic rules for our multipath setup. The other (/usr/local/bin/check_ip) will run as a cronjob; it checks whether ppp0 and ppp1 are still up, brings them back up if not, and adjusts the routing rules that depend on the IPs of ppp0 and ppp1 (my IP is dynamic and my ISP kicks me out once every 3 days).

Here goes the initscript, I hope the comments inside are sufficient:
#!/bin/sh

case "$1" in
        start)

echo "Setting up firewall rules..."


IPTABLES=/sbin/iptables
INTERNAL_IFACE=eth2
EXTERNAL_IFACE0=ppp0
EXTERNAL_IFACE1=ppp1
INTERNAL_IP=10.0.0.1
INTERNAL_NETWORK=10.0.0.0/24

# Start with a tough policy.
$IPTABLES -P INPUT DROP
$IPTABLES -P FORWARD DROP
$IPTABLES -P OUTPUT ACCEPT

# clean up
$IPTABLES -F
$IPTABLES -X
$IPTABLES -Z

# Filtering section
# INPUT chain
# We want to allow ONLY:
#  1. local (loopback) traffic
#  2. traffic from the Internet that is part of an existing connection (no new connections)

$IPTABLES -A INPUT -i lo -j ACCEPT
$IPTABLES -A INPUT -i $EXTERNAL_IFACE0 -m state --state ESTABLISHED,RELATED -j ACCEPT
$IPTABLES -A INPUT -i $EXTERNAL_IFACE1 -m state --state ESTABLISHED,RELATED -j ACCEPT

# on ppp0, allow SSH and HTTP
$IPTABLES -A INPUT -p tcp -m tcp --dport 22   -i $EXTERNAL_IFACE0 -j ACCEPT
$IPTABLES -A INPUT -p tcp -m tcp --dport 80   -i $EXTERNAL_IFACE0 -j ACCEPT

# in ppp1, allow mldonkey
$IPTABLES -A INPUT -p tcp -m tcp --dport 4662 -i $EXTERNAL_IFACE1 -j ACCEPT
$IPTABLES -A INPUT -p udp -m udp --dport 4662 -i $EXTERNAL_IFACE1 -j ACCEPT
$IPTABLES -A INPUT -p tcp -m tcp --dport 4001 -i $EXTERNAL_IFACE1 -j ACCEPT

# everywhere, allow ping and traceroute
$IPTABLES -A INPUT -p icmp -m icmp  --icmp-type 0               -j ACCEPT
$IPTABLES -A INPUT -p icmp -m icmp  --icmp-type 8               -j ACCEPT
$IPTABLES -A INPUT -p icmp -m icmp  --icmp-type 3               -j ACCEPT
$IPTABLES -A INPUT -p icmp -m icmp  --icmp-type 11              -j ACCEPT
$IPTABLES -A INPUT -p icmp -m icmp  --icmp-type 30              -j ACCEPT
$IPTABLES -A INPUT -p udp  -m state --state ESTABLISHED         -j ACCEPT
$IPTABLES -A INPUT -p icmp -m state --state RELATED,ESTABLISHED -j ACCEPT


# limit the number of incoming connections on port 22 ( SSH ) to 3 attempts a minute
$IPTABLES -I INPUT -p tcp --dport 22 -i $EXTERNAL_IFACE0 -m state --state NEW -m recent --set
$IPTABLES -I INPUT -p tcp --dport 22 -i $EXTERNAL_IFACE0 -m state --state NEW -m recent --update --seconds 60 --hitcount 3 -j DROP

# allow all ports on the internal interface
$IPTABLES -A INPUT -p udp -m udp -i $INTERNAL_IFACE --dport 1:65000 -j ACCEPT
$IPTABLES -A INPUT -p tcp -m tcp -i $INTERNAL_IFACE --dport 1:65000 -j ACCEPT

# mark all packets from mldonkey ( uid=108 ) so that later on we can route them thru ppp1
$IPTABLES -t mangle -A OUTPUT -m owner --uid-owner 108 -j MARK --set-mark 1

# switch off rp_filter ( otherwise packets coming back thru ppp1 get dropped by it )
echo 0 > /proc/sys/net/ipv4/conf/all/rp_filter

# hack: go around the case that we can get assigned the same IP 
# as we had before the reboot
DATE=`date +"%F %r"`
echo "$DATE REBOOT 1.1.1.1" >> /var/log/check_ppp0
echo "$DATE REBOOT 1.1.1.1" >> /var/log/check_ppp1

# call the cronjob script to complete the work for us
# ( for example, set up the NAT rule - I can't do it here
# because I don't know ppp1's IP yet )
echo "Saving new ip..."
/usr/local/bin/check_ip 2> /var/log/check_errors

# add a routing policy to route all packets marked with a '1' thru the PPP1 table
ip rule add fwmark 1 pri 100 table PPP1

;;

stop)

;;

esac
Let's call this script '/etc/init.d/my_initscript' and add it to rc.d:
update-rc.d my_initscript defaults
And here's the /usr/local/bin/check_ip script:
#!/bin/bash
# ( bash rather than sh: the script uses 'let' and '[[ ]]' )

DIGITS="[0-9]\{1,3\}"
IP="$DIGITS\.$DIGITS\.$DIGITS\.$DIGITS"
DATE=`date +"%F %r"`
SLEEP=10
MAXTRIES=5

###############################################################################
####  check if PPP0 is up, if not, bring it up and remember its new IP
 
DEVICE0="ppp0"
LOGFILE0="/var/log/check_${DEVICE0}"
CURRENT_IP0=`/sbin/ifconfig $DEVICE0 | sed -n "s/.*addr:\($IP\) .*/\1/p"`

if [ "x$CURRENT_IP0" = "x" ]
then
        pon dsl-provider > /dev/null
        sleep $SLEEP
        CURRENT_IP0=`/sbin/ifconfig $DEVICE0 | sed -n "s/.*addr:\($IP\) .*/\1/p"`
        COUNTER=0

        while [ "x$CURRENT_IP0" = "x" -a $COUNTER -le $MAXTRIES ]
        do
                echo "$DATE Waiting for device $DEVICE0 for the $COUNTER time..." >> $LOGFILE0
                let "COUNTER += 1"
                sleep $SLEEP
                CURRENT_IP0=`/sbin/ifconfig $DEVICE0 | sed -n "s/.*addr:\($IP\) .*/\1/p"`
        done

        if [ $COUNTER -gt $MAXTRIES ]
        then
                echo "$DATE Failed to bring up device $DEVICE0, giving up..." >> $LOGFILE0
                exit 1
        fi
fi

if [[ -e $LOGFILE0 ]]
then
        LAST_IP0=`cat $LOGFILE0 | grep $IP | tail -1 | sed -n "s/.*\ \(.*\)/\1/p"`
fi

###############################################################################
####  same for PPP1

DEVICE1="ppp1"
LOGFILE1="/var/log/check_${DEVICE1}"
CURRENT_IP1=`/sbin/ifconfig $DEVICE1 | sed -n "s/.*addr:\($IP\) .*/\1/p"`


if [ "x$CURRENT_IP1" = "x" ]
then
        pon dsl-provider2 > /dev/null
        sleep $SLEEP
        CURRENT_IP1=`/sbin/ifconfig $DEVICE1 | sed -n "s/.*addr:\($IP\) .*/\1/p"`
        COUNTER=0

        while [ "x$CURRENT_IP1" = "x" -a $COUNTER -le $MAXTRIES ]
        do
                echo "$DATE Waiting for device $DEVICE1 for the $COUNTER time..." >> $LOGFILE1
                let "COUNTER += 1"
                CURRENT_IP1=`/sbin/ifconfig $DEVICE1 | sed -n "s/.*addr:\($IP\) .*/\1/p"`
                sleep $SLEEP
        done

        if [ $COUNTER -gt $MAXTRIES ]
        then
                echo "$DATE Failed to bring up device $DEVICE1, giving up..." >> $LOGFILE1
                exit 1
        fi
fi


if [[ -e $LOGFILE1 ]]
then
        LAST_IP1=`cat $LOGFILE1 | grep $IP | tail -1 | sed -n "s/.*\ \(.*\)/\1/p"`
fi

###############################################################################
####  Save new IP of ppp1; re-create ip rules that depend on ppp1's IP

if [ "x$LAST_IP1" != "x$CURRENT_IP1" ]
then
        echo "$DATE $CURRENT_IP1" >> $LOGFILE1

        ip rule del from $LAST_IP1    pri 300 table PPP1
        ip rule add from $CURRENT_IP1 pri 300 table PPP1

        GATEWAY1=`/sbin/ifconfig $DEVICE1 | sed -n "s/.*P-t-P:\(.*\)\ .*/\1/p"`

        ip route add $GATEWAY1 dev $DEVICE1 table PPP1
        ip route add default via $GATEWAY1 dev $DEVICE1 table PPP1

        iptables -t nat -D POSTROUTING 1
        iptables -t nat -A POSTROUTING -o $DEVICE1 -j SNAT --to-source=$CURRENT_IP1
fi

###############################################################################
####  same for ppp0 + save its new IP in an external server.

if [ "x$LAST_IP0" != "x$CURRENT_IP0" ]
then
        # save my new ip to my server with a static ip. 
        # CGI scripts there redirect the traffic back to my home server.
        # I know I can do that with DynDNS, but somehow like to do 
        # everything by myself :) Here we have to have authorized_keys
        # set up for this to work.
        echo $CURRENT_IP0 | ssh MY_SERVER_WITH_STATIC_IP 'cat > ~/html/server/current_ip'
        RET=$?
        echo `date +"%F %r"` "new ip saved, ssh returned $RET" >> $LOGFILE0
        echo "$DATE $CURRENT_IP0" >> $LOGFILE0

        ip rule del from $LAST_IP0    pri 200 table PPP0
        ip rule add from $CURRENT_IP0 pri 200 table PPP0

        GATEWAY0=`/sbin/ifconfig $DEVICE0 | sed -n "s/.*P-t-P:\(.*\)\ .*/\1/p"`

        ip route add $GATEWAY0 dev $DEVICE0 table PPP0
        ip route add default via $GATEWAY0 dev $DEVICE0 table PPP0
fi
Add this script to root's cronjob with 'crontab -e' :
# m h  dom mon dow   command
# every 5 minutes check if we are still connected to DSL; 
# if not, reconnect and save the new IP to MY_SERVER_WITH_STATIC_IP.
*/5 * * * * /usr/local/bin/check_ip 2> /var/log/check_errors
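check_ip picks the addresses out of ifconfig output by matching 'addr:' and 'P-t-P:', and that layout is not guaranteed to stay the same across net-tools versions. As a hedged alternative, the sketch below parses the one-line format of 'ip -4 -o addr show'. parse_ppp_addr is a hypothetical helper name, and the input layout it expects ('inet ADDRESS peer PEER/PREFIX ...') is an assumption that should be checked against the output on your own system.

```shell
# Read one-line-per-address output (as printed by `ip -4 -o addr show`)
# on stdin and print the local address and, if present, the peer.
# The expected input layout is an assumption; parse_ppp_addr is a
# hypothetical helper name.
parse_ppp_addr() {
    awk '{
        local_ip = ""; peer = ""
        for (i = 1; i < NF; i++) {
            if ($i == "inet") local_ip = $(i + 1)
            if ($i == "peer") peer = $(i + 1)
        }
        sub(/\/.*/, "", local_ip)   # drop any /prefix suffix
        sub(/\/.*/, "", peer)
        if (local_ip == "") next
        if (peer != "") print local_ip, peer
        else print local_ip
    }'
}

# Usage:
# set -- $(ip -4 -o addr show dev ppp0 | parse_ppp_addr)
# CURRENT_IP0=$1; GATEWAY0=$2
```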
Voilà, we are done!

TODO
- implement the scripts from the last section as an if-up script and a small cronjob which only checks if the interface is still up, and if not, brings it up with 'ifup interface'.
- implement ideas from Advanced Routing HOWTO, Section 15.8, to shape traffic on ppp0
- make everything more robust
- rather than running a cronjob every 5 minutes and potentially being down for those 5 minutes, it would be better to be notified when a network interface goes down. Does anybody have suggestions on how that could be done?
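One possible direction for the first and last items is a hook script run by pppd when a link comes up. The following is only a sketch under stated assumptions: it assumes Debian's ip-up wrapper runs the scripts in /etc/ppp/ip-up.d/ with PPP_IFACE, PPP_LOCAL and PPP_REMOTE exported, the file name 'policy-routing' and the function name are made up, and the table names and priorities are the ones used earlier in this article. It does not remove a stale 'from OLD_IP' rule; a matching ip-down hook would have to do that.

```shell
#!/bin/sh
# Sketch of a hook for /etc/ppp/ip-up.d/ (hypothetical file name:
# policy-routing). Assumes PPP_IFACE, PPP_LOCAL and PPP_REMOTE are
# exported by the ip-up wrapper. Set RUN=echo to print the commands
# instead of executing them.
RUN=${RUN:-}

policy_routing_up() {
    iface=$1
    local_ip=$2
    gateway=$3
    case "$iface" in
        ppp0) table=PPP0; pri=200 ;;
        ppp1) table=PPP1; pri=300 ;;
        *)    return 0 ;;          # not one of our links
    esac
    $RUN ip route replace "$gateway" dev "$iface" table "$table"
    $RUN ip route replace default via "$gateway" dev "$iface" table "$table"
    $RUN ip rule add from "$local_ip" pri "$pri" table "$table"
}

policy_routing_up "${PPP_IFACE:-}" "${PPP_LOCAL:-}" "${PPP_REMOTE:-}"
```

Running it by hand with RUN=echo and the three variables set shows what would be executed without touching the routing tables.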

Further reading
Advanced Routing HOWTO (especially chapters 3,4,11,15)
Multiple Default Gateways under Linux with iproute2
Nano-Howto on how to use more than one independent Internet connection.
IP Command Reference Can also be found in /usr/share/doc/iproute/ip-cref.*.gz
Shorewall and Multiple Internet Connections

 

 


Posted by duke (161.53.xx.xx) on Wed 29 Mar 2006 at 11:09
I'm building an advanced router with 3 ADSL connections and load balancing. My modems are also in bridge mode and I must properly reconfigure the routing when IPCP changes my IP or when an interface goes down. The solution with crontab is not OK for me because the script only runs every 5 minutes. For all this stuff (new IP, interface going down) you can trigger different scripts (QoS, firewall, routing configuration) with the ip-up, ip-pre-up and ip-down scripts.


Posted by Utumno (60.248.xx.xx) on Fri 31 Mar 2006 at 12:21
ok, thanks. 5 years of experience with Debian, and I've never used the pre/post-up/down thingies...


Posted by Anonymous (71.252.xx.xx) on Fri 21 Apr 2006 at 14:15
Two very good books on Routing with Linux.


"Policy Routing Using Linux" by Matthew G. Marsh (ISBN: 0672320525).
http://www.policyrouting.org/
http://www.bestwebbuys.com/books/compare/isbn/0672320525/isrc/b-compare-srchbox


"Linux Routers: A Primer for Network Administrators, 2nd Edition" by Tony Mancill (ISBN: 0130090263).
http://mancill.com/linuxrouters/index.html
http://www.bestwebbuys.com/books/compare/isbn/0130090263/isrc/b-home-search


Posted by Anonymous (146.145.xx.xx) on Fri 20 Apr 2007 at 15:00
Great article! I see that you are just routing traffic that is on your firewall though. I would like to know what it takes to route traffic that is behind the firewall? For example, I would like to route traffic from say 192.168.1.5 through ppp1. I thought it would be as simple as doing iptables -t mangle -A PREROUTING -s 192.168.1.5 -j MARK --set-mark=1, but that doesn't work. Any ideas? Thanks again for the great article.


Posted by Anonymous (193.191.xx.xx) on Fri 22 Aug 2008 at 14:27
Very nice article.

A good way to implement link monitoring would be to use ip monitor and rtmon. These can also watch a routing table.

http://linux-ip.net/gl/ip-cref/node152.html

Monitoring should be started before bringing network up.

Rgds.

Gmiga

