Weblog entry #9 for fugit

Problem with Bonding and Vlan on Wheezy
Posted by fugit on Thu 26 Dec 2013 at 18:15
The Problem: The Bonding and Vlan configuration that worked with Openvz under squeeze fails on wheezy. The symptom is that only the vlan with the default gateway set works. I can move the default gateway to any vlan and that vlan will work. The other vlans work only when communicating with machines on the same vlan. Using tcpdump/wireshark I confirmed that traffic is coming in but never making it out the default GW unless it arrived on the vlan with the default gateway. On the squeeze servers you can see the traffic going out the default GW.

The Solution:
Turns out you need to set net.ipv4.conf.default.rp_filter = 2 (or 0 for no spoof protection). The strict filter (1) breaks every vlan that is not on the default gw. More details and links will be posted later. Unfortunately I didn't find the links with the solution until I had already found that the issue was net.ipv4.conf.default.rp_filter. The text in the sysctl.conf file says "Uncomment the next two lines to enable Spoof protection (reverse-path filter)." This pretty clearly was the issue. Like most problems it seems pretty obvious once you have the solution.

I originally missed this in testing because you need to restart networking after making the changes. Sadly I tested twice to make sure the changes I had made were not causing the problem. The first test failed because I had not restarted the network or the server after reverting the changes to rp_filter. I have no idea how I missed it the second time, on a clean build of a new server: when built from scratch with defaults, rp_filter = 0. After building the server and only changing the network config it presented the same symptoms, so obviously I made a change or missed something. Hopefully this post will save someone else some time.
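A minimal sketch of the fix described above, assuming a stock /etc/sysctl.conf and root access. The helper name rp_filter_settings is hypothetical. The "all" line is not part of the original fix; it is included only on the assumption that both of the lines mentioned in the sysctl.conf comment had been uncommented. The function only prints the lines, so nothing changes until you add them to the file yourself.

```shell
#!/bin/sh
# Print the sysctl.conf lines for a given rp_filter mode.
# 0 = no source validation, 1 = strict, 2 = loose.
rp_filter_settings() {
    mode="$1"
    case "$mode" in
        0|1|2) ;;
        *) echo "rp_filter mode must be 0, 1 or 2" >&2; return 1 ;;
    esac
    # Assumption: both lines from the sysctl.conf comment are in use,
    # so both are written with the same value.
    printf 'net.ipv4.conf.all.rp_filter = %s\n' "$mode"
    printf 'net.ipv4.conf.default.rp_filter = %s\n' "$mode"
}

rp_filter_settings 2
```

To apply it, put the two printed lines in /etc/sysctl.conf in place of the strict ones, run sysctl -p, and then restart networking (or the server) as described above.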

Cisco Setup:
Cisco Hardware
We are using Cisco Nexus 7000 switches with a gigabit ethernet module that supports 802.3ad. For more information regarding the different bonding options you can check out this link

Set up the port channel:

interface port-channel170
  description servername01
  switchport mode trunk
  switchport trunk allowed vlan 45,48-49
  vpc 170
Configure the physical interfaces on the cisco switch:

interface Ethernet1/11
  description servername#1
  switchport mode trunk
  switchport trunk allowed vlan 45,48-49
  spanning-tree port type edge
  channel-group 170 mode active
  no shutdown

interface Ethernet3/11
  description servername#2
  switchport mode trunk
  switchport trunk allowed vlan 45,48-49
  spanning-tree port type edge
  channel-group 170 mode active
  no shutdown
...
Make sure the "switchport trunk allowed vlan" line lists the vlans you are going to use on the linux server. Until these matched nothing worked for me.
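A small sketch for checking that match by hand, assuming you copy the allowed-vlan list from the switch config and pass the vlan IDs from /etc/network/interfaces as arguments. The function names expand_vlan_list and missing_vlans are hypothetical helpers, not part of any package.

```shell
#!/bin/sh
# Expand a Cisco-style allowed-vlan list such as "45,48-49" into one ID per line.
expand_vlan_list() {
    echo "$1" | tr ',' '\n' | while IFS=- read -r first last; do
        [ -n "$first" ] || continue
        last="${last:-$first}"      # a single ID is a range of one
        id="$first"
        while [ "$id" -le "$last" ]; do
            echo "$id"
            id=$((id + 1))
        done
    done
}

# Print every vlan ID used on the server that the trunk does not allow.
# $1 = allowed list from the switch, remaining args = vlan IDs used on Linux.
missing_vlans() {
    allowed="$(expand_vlan_list "$1")"
    shift
    for id in "$@"; do
        echo "$allowed" | grep -qx "$id" || echo "$id"
    done
}

missing_vlans "45,48-49" 45 48 49      # prints nothing: all three are allowed
missing_vlans "45,48-49" 45 50         # prints 50
```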

Server Hardware: The current server we are using is a DL360pG8, which has a 4 port broadcom card using the tg3 driver. This card has had several reported issues. To rule this out I later installed a base wheezy install on an older server that was known to work with our configuration under squeeze and our current Nexus 7000 switch. This produced the same issues reported here. Before building that server I had also tried the backport kernel to further rule out drivers.
lspci | grep -i broad
03:00.0 Ethernet controller: Broadcom Corporation NetXtreme BCM5719 Gigabit Ethernet PCIe (rev 01)
...
Linux Network Config:
Install the required packages and load the bonding module
apt-get install vlan ifenslave
Interfaces Config: /etc/network/interfaces
# The loopback network interface
auto lo
iface lo inet loopback

# The primary network interface
#allow-hotplug eth0

auto bond0
iface bond0 inet manual
        #bond-mode 802.3ad
        bond-mode 4
        bond-miimon 100
        bond_downdelay 200
        bond_updelay 200
        bond_xmit_hash_policy layer2+3
        bond_lacp_rate slow
        slaves eth0 eth1 eth2 eth3

auto vlan45
iface vlan45 inet static
        vlan_raw_device bond0  
        address 10.200.45.155  
        netmask 255.255.255.0  
        network 10.200.45.0
        broadcast 10.200.45.255

auto vlan48
iface vlan48 inet static
        vlan_raw_device bond0  
        address 10.200.48.121  
        netmask 255.255.255.0  
        network 10.200.48.0
        broadcast 10.200.48.255
        gateway 10.200.48.1

auto vlan49
iface vlan49 inet static
        vlan_raw_device bond0  
        address 10.200.49.155  
        netmask 255.255.255.0  
        network 10.200.49.0
        broadcast 10.200.49.255
I had also read posts from people having problems with the "pretty" or easy to read version above, so I also tried the configuration below, with the same results.
# The loopback network interface
auto lo
iface lo inet loopback

# The primary network interface
#allow-hotplug eth0

auto bond0
iface bond0 inet manual
        #bond-mode 802.3ad
        bond-mode 4
        bond-miimon 100
        bond_xmit_hash_policy layer2+3
        bond_lacp_rate slow
        slaves eth0 eth1 eth2 eth3

auto bond0.45
iface bond0.45 inet static
        address 10.200.45.155
        netmask 255.255.255.0

auto bond0.48
iface bond0.48 inet static
        address 10.200.48.121
        netmask 255.255.255.0   
        gateway 10.200.48.1

auto bond0.49
iface bond0.49 inet static
        address 10.200.49.155
        netmask 255.255.255.0
Troubleshooting:
On Linux
ServerName# cat /proc/net/bonding/bond0
Ethernet Channel Bonding Driver: v3.7.1 (April 27, 2011)

Bonding Mode: IEEE 802.3ad Dynamic link aggregation
Transmit Hash Policy: layer2+3 (2)
MII Status: up
MII Polling Interval (ms): 100  
Up Delay (ms): 200
Down Delay (ms): 200

802.3ad info
LACP rate: slow
Min links: 0
Aggregator selection policy (ad_select): stable
Active Aggregator Info:
        Aggregator ID: 1
        Number of ports: 4
        Actor Key: 17
        Partner Key: 32938
        Partner Mac Address: 00:23:04:ee:be:0a

Slave Interface: eth0
MII Status: up
Speed: 1000 Mbps
Duplex: full
Link Failure Count: 0
Permanent HW addr: d8:9d:67:2c:aa:24
Aggregator ID: 1
Slave queue ID: 0

Slave Interface: eth1
MII Status: up
Speed: 1000 Mbps
Duplex: full
Link Failure Count: 0
Permanent HW addr: d8:9d:67:2c:aa:25
Aggregator ID: 1
Slave queue ID: 0

Slave Interface: eth2
MII Status: up
Speed: 1000 Mbps
Duplex: full
Link Failure Count: 0
Permanent HW addr: d8:9d:67:2c:aa:26
Aggregator ID: 1
Slave queue ID: 0

Slave Interface: eth3
MII Status: up
Speed: 1000 Mbps
Duplex: full
Link Failure Count: 0
Permanent HW addr: d8:9d:67:2c:aa:27
Aggregator ID: 1
Slave queue ID: 0
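The thing to look for in the output above is that every slave reports the same Aggregator ID as the active aggregator; in 802.3ad mode only the active aggregator carries traffic. A rough sketch of that check, assuming the output format shown above (stray_slaves is a hypothetical helper name):

```shell
#!/bin/sh
# List slaves whose Aggregator ID differs from the active aggregator.
# $1 = bonding status file (default /proc/net/bonding/bond0).
stray_slaves() {
    awk '
        # First Aggregator ID seen, before any slave section, is the active one.
        /Aggregator ID:/ && !slave { active = $NF; next }
        /^Slave Interface:/        { slave = $NF; next }
        /Aggregator ID:/ && slave  { if ($NF != active) print slave }
    ' "${1:-/proc/net/bonding/bond0}"
}

# Only run against the live file when the bond actually exists.
if [ -r /proc/net/bonding/bond0 ]; then
    stray_slaves
fi
```

For the bond0 output above this prints nothing, since eth0 to eth3 all sit in aggregator 1.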
ServerName# modinfo bonding
filename:       /lib/modules/3.2.0-4-amd64/kernel/drivers/net/bonding/bonding.ko
alias:          rtnl-link-bond  
author:         Thomas Davis, tadavis@lbl.gov and many others
description:    Ethernet Channel Bonding Driver, v3.7.1
version:        3.7.1
license:        GPL
srcversion:     0384DF6574E0ED31BA573D8
depends:
intree:         Y
vermagic:       3.2.0-4-amd64 SMP mod_unload modversions
parm:           max_bonds:Max number of bonded devices (int)
parm:           tx_queues:Max number of transmit queues (default = 16) (int)
parm:           num_grat_arp:Number of peer notifications to send on failover event (alias of num_unsol_na) (int)
parm:           num_unsol_na:Number of peer notifications to send on failover event (alias of num_grat_arp) (int)
parm:           miimon:Link check interval in milliseconds (int)
parm:           updelay:Delay before considering link up, in milliseconds (int)
parm:           downdelay:Delay before considering link down, in milliseconds (int)
parm:           use_carrier:Use netif_carrier_ok (vs MII ioctls) in miimon; 0 for off, 1 for on (default) (int)
parm:           mode:Mode of operation; 0 for balance-rr, 1 for active-backup, 2 for balance-xor, 3 for broadcast, 4 for 802.3ad, 5 for balance-tlb, 6 for balance-alb (charp)
parm:           primary:Primary network device to use (charp)
parm:           primary_reselect:Reselect primary slave once it comes up; 0 for always (default), 1 for only if speed of primary is better, 2 for only on active slave failure (charp)
parm:           lacp_rate:LACPDU tx rate to request from 802.3ad partner; 0 for slow, 1 for fast (charp)
parm:           ad_select:803.ad aggregation selection logic; 0 for stable (default), 1 for bandwidth, 2 for count (charp)
parm:           min_links:Minimum number of available links before turning on carrier (int)
parm:           xmit_hash_policy:balance-xor and 802.3ad hashing method; 0 for layer 2 (default), 1 for layer 3+4, 2 for layer 2+3 (charp)
parm:           arp_interval:arp interval in milliseconds (int)
parm:           arp_ip_target:arp targets in n.n.n.n form (array of charp)
parm:           arp_validate:validate src/dst of ARP probes; 0 for none (default), 1 for active, 2 for backup, 3 for all (charp)
parm:           fail_over_mac:For active-backup, do not set all slaves to the same MAC; 0 for none (default), 1 for active, 2 for follow (charp)
parm:           all_slaves_active:Keep all frames received on an interfaceby setting active flag for all slaves; 0 for never (default), 1 for always. (int)
parm:           resend_igmp:Number of IGMP membership reports to send on link failure (int)
I also used tcpdump to determine where the connections were getting lost, and looked at the capture using wireshark.
tcpdump -i any -U not port 22 -w /tmp/tcpdump_any_20131220.dump
This showed that traffic was coming in with no problem and everything was working, except when you connected to a vlan that did not have the default gw from a machine that was not on that vlan. This makes it look like a routing issue within the OS. If anyone would find the dump lines interesting let me know and I can dig them up and post them.
Routing table.
route -n
Kernel IP routing table
Destination     Gateway         Genmask         Flags Metric Ref    Use Iface
0.0.0.0         10.200.48.1     0.0.0.0         UG    0      0        0 vlan48
10.200.45.0     0.0.0.0         255.255.255.0   U     0      0        0 vlan45
10.200.48.0     0.0.0.0         255.255.255.0   U     0      0        0 vlan48
10.200.49.0     0.0.0.0         255.255.255.0   U     0      0        0 vlan49
 
ip route list
default via 10.200.48.1 dev vlan48
10.200.45.0/24 dev vlan45  proto kernel  scope link  src 10.200.45.155
10.200.48.0/24 dev vlan48  proto kernel  scope link  src 10.200.48.121
10.200.49.0/24 dev vlan49  proto kernel  scope link  src 10.200.49.155
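Read against the routing table above, the symptoms line up with the reverse-path filter. The following is only a toy model of that check, not kernel code: it hard-codes the four routes above, assumes every connected network is a /24, and uses 192.168.10.7 as a made-up client address outside all three vlans. The function names are hypothetical.

```shell
#!/bin/sh
# Print the interface the routing table above would use to reach address $1.
route_iface() {
    case "$1" in
        10.200.45.*) echo vlan45 ;;
        10.200.48.*) echo vlan48 ;;
        10.200.49.*) echo vlan49 ;;
        *)           echo vlan48 ;;   # default via 10.200.48.1
    esac
}

# rp_check MODE SRC_ADDR IN_IFACE -> prints "accept" or "drop".
rp_check() {
    mode="$1"; src="$2"; in_if="$3"
    case "$mode" in
        0) echo accept ;;   # no source validation
        1) # strict: the route back to the source must use the arrival interface
           if [ "$(route_iface "$src")" = "$in_if" ]; then
               echo accept
           else
               echo drop
           fi ;;
        2) # loose: the source only has to be reachable via some interface,
           # which the default route always satisfies in this table
           echo accept ;;
    esac
}

rp_check 1 192.168.10.7 vlan45   # drop: the way back is the default route on vlan48
rp_check 1 192.168.10.7 vlan48   # accept
rp_check 1 10.200.45.20 vlan45   # accept: same-vlan traffic still works
rp_check 2 192.168.10.7 vlan45   # accept
```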
On Cisco
show interface port-channel 170
port-channel170 is up
 vPC Status: Up, vPC number: 170
  Hardware: Port-Channel, address: 44d3.cae5.50a2 (bia 44d3.cae5.50a2)
  Description: servername
  MTU 1500 bytes, BW 2000000 Kbit, DLY 10 usec
  reliability 255/255, txload 1/255, rxload 1/255
  Encapsulation ARPA
  Port mode is trunk
  full-duplex, 1000 Mb/s
  Input flow-control is off, output flow-control is off
  Switchport monitor is off
  EtherType is 0x8100
  Members in this channel: Eth1/11, Eth3/11
  Last clearing of "show interface" counters never
  52 interface resets
  30 seconds input rate 80 bits/sec, 0 packets/sec
  30 seconds output rate 1832 bits/sec, 2 packets/sec
  Load-Interval #2: 5 minute (300 seconds)
    input rate 112 bps, 0 pps; output rate 1.94 Kbps, 2 pps
  RX
    380152 unicast packets  113302 multicast packets  3248 broadcast packets
    496720 input packets  88421937 bytes
    0 jumbo packets  0 storm suppression packets
    0 runts  0 giants  0 CRC  0 no buffer
    0 input error  0 short frame  0 overrun   0 underrun  0 ignored
    0 watchdog  0 bad etype drop  0 bad proto drop  0 if down drop
    0 input with dribble  0 input discard
    0 Rx pause
Loaded Modules
lsmod | egrep '8021q|loop|bond'
8021q                  19291  0
garp                   13193  1 8021q
bonding                79169  0
loop                   22641  0
Links:
discard packets when the route for outbound traffic differs from the route of incoming traffic
linux_vlan_routing
openvz on debian
ubuntu bug report where I found my answer
bonding on debian
bonding on wheezy
bonding on wheezy
broadcom related post tg3
openvz on wheezy
Conclusion:
When you make changes via sysctl and use '-p' to load them, don't forget to restart networking or the server. When you are in the thick of it, remember to make your changes one step at a time so you can find the problem. Don't assume your first hunch is the answer.
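The most likely reason the restart matters is that the "default" value is only copied to an interface when that interface is created, so vlan interfaces that already exist keep their old setting. The kernel also uses the higher of the "all" value and the interface's own value when validating. A sketch for checking what each interface actually ended up with, assuming the usual /proc layout (both function names are hypothetical):

```shell
#!/bin/sh
# Effective rp_filter for an interface: the higher of "all" and its own value.
effective_rp_filter() {
    if [ "$1" -gt "$2" ]; then echo "$1"; else echo "$2"; fi
}

# Print "interface own-value effective-value" for every interface.
# $1 = conf directory (default /proc/sys/net/ipv4/conf).
show_rp_filter() {
    conf="${1:-/proc/sys/net/ipv4/conf}"
    [ -r "$conf/all/rp_filter" ] || return 0
    all="$(cat "$conf/all/rp_filter")"
    for dir in "$conf"/*/; do
        name="$(basename "$dir")"
        # "all" and "default" are settings, not real interfaces.
        case "$name" in all|default) continue ;; esac
        own="$(cat "$dir/rp_filter")"
        echo "$name $own $(effective_rp_filter "$all" "$own")"
    done
}

show_rp_filter
```

Run it after sysctl -p and again after restarting networking; if the vlan lines still show 1 in the last column, the change has not reached them yet.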

 

Comments on this Entry

Posted by fugit (199.2.xx.xx) on Fri 27 Dec 2013 at 20:23
I have also tried downgrading to squeeze packages, which did not fix the problem.

dpkg --list | egrep 'ifensl|iproute|vlan'
ii ifenslave-2.6 1.1.0-17 amd64
ii iproute 20100519-3 amd64
ii vlan 1.9-3 amd64

Linux debnyintvz01 2.6.32-5-amd64 #1 SMP Mon Sep 23 22:14:43 UTC 2013 x86_64 GNU/Linux
