Weblog entry #9 for fugit
It turns out you need to set net.ipv4.conf.default.rp_filter = 2 (or 0 for no spoof protection). The strict filter (rp_filter = 1) breaks VLANs that are not on the default gateway. More details and links will be posted later. Unfortunately, I didn't find the links with the solution until I had already worked out that the issue was net.ipv4.conf.default.rp_filter.

I originally missed this in testing because you need to restart networking (or the server) after making the change. I am not sure how I missed it when rebuilding a new, clean server with wheezy: when built from scratch with defaults, rp_filter = 0. Like most problems, it seems pretty obvious once you have the solution. The text in the sysctl.conf file says "Uncomment the next two lines to enable Spoof protection (reverse-path filter)." This pretty clearly was the issue.

Sadly, I tested twice to make sure the changes I had made were not causing the problem. The first test failed because I had not restarted the network or the server after reverting the changes to rp_filter. The second time, I have no idea how I missed it on a clean build of a new server: after building the server and changing only the network config, it presented the same symptoms, so obviously I made a change or missed something. Hopefully this post will save someone else some time.
We are using a Cisco Nexus 7000 switch with a gigabit Ethernet module that supports 802.3ad. For more information regarding the different bonding options, you can check out this link
Set up the port channel:
interface port-channel170
  description servername01
  switchport mode trunk
  switchport trunk allowed vlan 45,48-49
  vpc 170

Configure the physical interfaces on the Cisco switch:
interface Ethernet1/11
  description servername#1
  switchport mode trunk
  switchport trunk allowed vlan 45,48-49
  spanning-tree port type edge
  channel-group 170 mode active
  no shutdown

interface Ethernet3/11
  description servername#2
  switchport mode trunk
  switchport trunk allowed vlan 45,48-49
  spanning-tree port type edge
  channel-group 170 mode active
  no shutdown
...

Make sure the "switchport trunk allowed vlan" list includes the VLANs you are going to use on the Linux server. Until these matched, nothing worked for me.
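One way to double-check that the trunk and the server agree is to compare the two VLAN lists mechanically. The following is only a minimal sketch, assuming Python 3 is available; the function names `expand_vlan_list` and `vlans_in_interfaces` and the sample text are made up for illustration, and the parser only understands the `vlanNN` and `bond0.NN` naming styles used in this post.

```python
import re


def expand_vlan_list(spec):
    """Expand a Cisco-style VLAN list such as '45,48-49' into a set of ints."""
    vlans = set()
    for part in spec.split(","):
        part = part.strip()
        if not part:
            continue
        if "-" in part:
            low, high = (int(x) for x in part.split("-", 1))
            vlans.update(range(low, high + 1))
        else:
            vlans.add(int(part))
    return vlans


def vlans_in_interfaces(text):
    """Collect VLAN IDs from 'iface vlanNN ...' or 'iface <dev>.NN ...' lines."""
    pattern = re.compile(r"^\s*iface\s+(?:vlan(\d+)|\S+\.(\d+))\s", re.MULTILINE)
    return {int(a or b) for a, b in pattern.findall(text)}


# Hypothetical sample: the stanza headers from an /etc/network/interfaces file.
SAMPLE = """
iface bond0 inet manual
iface vlan45 inet static
iface vlan48 inet static
iface vlan49 inet static
"""

allowed = expand_vlan_list("45,48-49")
configured = vlans_in_interfaces(SAMPLE)
# VLANs configured on the server but not allowed on the trunk.
print(sorted(configured - allowed))
```

An empty list means every VLAN configured on the server is allowed on the trunk; anything printed is a VLAN the switch would not pass.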
Server Hardware:

The current server we are using is a DL360pG8, which has a Broadcom tg3 4-port card. This card has had several reported issues. To rule it out, I later installed a base wheezy system on an older server that was known to work with our configuration under squeeze and our current Nexus 7000 switch. It produced the same issues reported here. Before building that new server, I had also tried the backports kernel to further rule out drivers.
lspci | grep -i broad
03:00.0 Ethernet controller: Broadcom Corporation NetXtreme BCM5719 Gigabit Ethernet PCIe (rev 01)
...

Linux Network Config:
Install the required packages and load the bonding module:
apt-get install vlan ifenslave

Interfaces Config: /etc/network/interfaces
# The loopback network interface
auto lo
iface lo inet loopback

# The primary network interface
#allow-hotplug eth0
auto bond0
iface bond0 inet manual
    #bond-mode 802.3ad
    bond-mode 4
    bond-miimon 100
    bond_downdelay 200
    bond_updelay 200
    bond_xmit_hash_policy layer2+3
    bond_lacp_rate slow
    slaves eth0 eth1 eth2 eth3

auto vlan45
iface vlan45 inet static
    vlan_raw_device bond0
    address 10.200.45.155
    netmask 255.255.255.0
    network 10.200.45.0
    broadcast 10.200.45.255

auto vlan48
iface vlan48 inet static
    vlan_raw_device bond0
    address 10.200.48.121
    netmask 255.255.255.0
    network 10.200.48.0
    broadcast 10.200.48.255
    gateway 10.200.48.1

auto vlan49
iface vlan49 inet static
    vlan_raw_device bond0
    address 10.200.49.155
    netmask 255.255.255.0
    network 10.200.49.0
    broadcast 10.200.49.255

I had also read posts about people having problems with the "pretty", easy-to-read version above, so I also tried the configuration below, with the same results.
# The loopback network interface
auto lo
iface lo inet loopback

# The primary network interface
#allow-hotplug eth0
auto bond0
iface bond0 inet manual
    #bond-mode 802.3ad
    bond-mode 4
    bond-miimon 100
    bond_xmit_hash_policy layer2+3
    bond_lacp_rate slow
    slaves eth0 eth1 eth2 eth3

auto bond0.45
iface bond0.45 inet static
    address 10.200.45.155
    netmask 255.255.255.0

auto bond0.48
iface bond0.48 inet static
    address 10.200.48.121
    netmask 255.255.255.0
    gateway 10.200.48.1

auto bond0.49
iface bond0.49 inet static
    address 10.200.49.155
    netmask 255.255.255.0

Troubleshooting:
ServerName# cat /proc/net/bonding/bond0
Ethernet Channel Bonding Driver: v3.7.1 (April 27, 2011)

Bonding Mode: IEEE 802.3ad Dynamic link aggregation
Transmit Hash Policy: layer2+3 (2)
MII Status: up
MII Polling Interval (ms): 100
Up Delay (ms): 200
Down Delay (ms): 200

802.3ad info
LACP rate: slow
Min links: 0
Aggregator selection policy (ad_select): stable
Active Aggregator Info:
    Aggregator ID: 1
    Number of ports: 4
    Actor Key: 17
    Partner Key: 32938
    Partner Mac Address: 00:23:04:ee:be:0a

Slave Interface: eth0
MII Status: up
Speed: 1000 Mbps
Duplex: full
Link Failure Count: 0
Permanent HW addr: d8:9d:67:2c:aa:24
Aggregator ID: 1
Slave queue ID: 0

Slave Interface: eth1
MII Status: up
Speed: 1000 Mbps
Duplex: full
Link Failure Count: 0
Permanent HW addr: d8:9d:67:2c:aa:25
Aggregator ID: 1
Slave queue ID: 0

Slave Interface: eth2
MII Status: up
Speed: 1000 Mbps
Duplex: full
Link Failure Count: 0
Permanent HW addr: d8:9d:67:2c:aa:26
Aggregator ID: 1
Slave queue ID: 0

Slave Interface: eth3
MII Status: up
Speed: 1000 Mbps
Duplex: full
Link Failure Count: 0
Permanent HW addr: d8:9d:67:2c:aa:27
Aggregator ID: 1
Slave queue ID: 0
filename:       /lib/modules/3.2.0-4-amd64/kernel/drivers/net/bonding/bonding.ko
alias:          rtnl-link-bond
author:         Thomas Davis, firstname.lastname@example.org and many others
description:    Ethernet Channel Bonding Driver, v3.7.1
version:        3.7.1
license:        GPL
srcversion:     0384DF6574E0ED31BA573D8
depends:
intree:         Y
vermagic:       3.2.0-4-amd64 SMP mod_unload modversions
parm:           max_bonds:Max number of bonded devices (int)
parm:           tx_queues:Max number of transmit queues (default = 16) (int)
parm:           num_grat_arp:Number of peer notifications to send on failover event (alias of num_unsol_na) (int)
parm:           num_unsol_na:Number of peer notifications to send on failover event (alias of num_grat_arp) (int)
parm:           miimon:Link check interval in milliseconds (int)
parm:           updelay:Delay before considering link up, in milliseconds (int)
parm:           downdelay:Delay before considering link down, in milliseconds (int)
parm:           use_carrier:Use netif_carrier_ok (vs MII ioctls) in miimon; 0 for off, 1 for on (default) (int)
parm:           mode:Mode of operation; 0 for balance-rr, 1 for active-backup, 2 for balance-xor, 3 for broadcast, 4 for 802.3ad, 5 for balance-tlb, 6 for balance-alb (charp)
parm:           primary:Primary network device to use (charp)
parm:           primary_reselect:Reselect primary slave once it comes up; 0 for always (default), 1 for only if speed of primary is better, 2 for only on active slave failure (charp)
parm:           lacp_rate:LACPDU tx rate to request from 802.3ad partner; 0 for slow, 1 for fast (charp)
parm:           ad_select:803.ad aggregation selection logic; 0 for stable (default), 1 for bandwidth, 2 for count (charp)
parm:           min_links:Minimum number of available links before turning on carrier (int)
parm:           xmit_hash_policy:balance-xor and 802.3ad hashing method; 0 for layer 2 (default), 1 for layer 3+4, 2 for layer 2+3 (charp)
parm:           arp_interval:arp interval in milliseconds (int)
parm:           arp_ip_target:arp targets in n.n.n.n form (array of charp)
parm:           arp_validate:validate src/dst of ARP probes; 0 for none (default), 1 for active, 2 for backup, 3 for all (charp)
parm:           fail_over_mac:For active-backup, do not set all slaves to the same MAC; 0 for none (default), 1 for active, 2 for follow (charp)
parm:           all_slaves_active:Keep all frames received on an interfaceby setting active flag for all slaves; 0 for never (default), 1 for always. (int)
parm:           resend_igmp:Number of IGMP membership reports to send on link failure (int)

I also used tcpdump to determine where the connections were getting lost, and looked at the captures in Wireshark:

tcpdump -i any -U not port 22 -w /tmp/tcpdump_any_20131220.dump

This showed that traffic was arriving without problems and that everything worked, except when connecting to an address on a VLAN that did not have the default gateway from a host that was not on that VLAN. That makes it look like a routing issue within the OS. If anyone would find the dump lines interesting, let me know and I can dig them up and post them.
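The symptom fits how reverse-path filtering decides whether to accept a packet. Below is a rough sketch in Python of that decision, using the routes from this server's routing table. It is a simplified model and not the kernel's actual code; the function names and the client address 10.200.60.10 (a host on some other routed subnet) are assumptions made up for illustration.

```python
import ipaddress

# Routes taken from this server's routing table: (prefix, device).
ROUTES = [
    ("0.0.0.0/0", "vlan48"),        # default via 10.200.48.1
    ("10.200.45.0/24", "vlan45"),
    ("10.200.48.0/24", "vlan48"),
    ("10.200.49.0/24", "vlan49"),
]


def route_device(address, routes=ROUTES):
    """Return the device of the longest-prefix route matching address, or None."""
    ip = ipaddress.ip_address(address)
    best = None
    for prefix, dev in routes:
        net = ipaddress.ip_network(prefix)
        if ip in net and (best is None or net.prefixlen > best[0].prefixlen):
            best = (net, dev)
    return best[1] if best else None


def rp_filter_accepts(source, in_dev, mode, routes=ROUTES):
    """Simplified rp_filter model: 0 = off, 1 = strict, 2 = loose."""
    if mode == 0:
        return True
    back = route_device(source, routes)
    if mode == 1:
        # Strict: the route back to the source must use the arrival interface.
        return back == in_dev
    # Loose: any route back to the source is enough.
    return back is not None


# Hypothetical client on another subnet talking to 10.200.45.155 (arrives on vlan45).
for mode in (1, 2):
    print(mode, rp_filter_accepts("10.200.60.10", "vlan45", mode))
```

In this model the strict mode rejects the packet because the only route back to the client is the default route on vlan48, while the same client reaching 10.200.48.121 on vlan48 is accepted. That matches what the capture showed: packets arrive, but only the VLAN holding the default gateway answers hosts from outside.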
route -n
Kernel IP routing table
Destination     Gateway         Genmask         Flags Metric Ref    Use Iface
0.0.0.0         10.200.48.1     0.0.0.0         UG    0      0        0 vlan48
10.200.45.0     0.0.0.0         255.255.255.0   U     0      0        0 vlan45
10.200.48.0     0.0.0.0         255.255.255.0   U     0      0        0 vlan48
10.200.49.0     0.0.0.0         255.255.255.0   U     0      0        0 vlan49
ip route list
default via 10.200.48.1 dev vlan48
10.200.45.0/24 dev vlan45  proto kernel  scope link  src 10.200.45.155
10.200.48.0/24 dev vlan48  proto kernel  scope link  src 10.200.48.121
10.200.49.0/24 dev vlan49  proto kernel  scope link  src 10.200.49.155

On Cisco
show interface port-channel 170
port-channel170 is up
 vPC Status: Up, vPC number: 170
  Hardware: Port-Channel, address: 44d3.cae5.50a2 (bia 44d3.cae5.50a2)
  Description: servername
  MTU 1500 bytes, BW 2000000 Kbit, DLY 10 usec
  reliability 255/255, txload 1/255, rxload 1/255
  Encapsulation ARPA
  Port mode is trunk
  full-duplex, 1000 Mb/s
  Input flow-control is off, output flow-control is off
  Switchport monitor is off
  EtherType is 0x8100
  Members in this channel: Eth1/11, Eth3/11
  Last clearing of "show interface" counters never
  52 interface resets
  30 seconds input rate 80 bits/sec, 0 packets/sec
  30 seconds output rate 1832 bits/sec, 2 packets/sec
  Load-Interval #2: 5 minute (300 seconds)
    input rate 112 bps, 0 pps; output rate 1.94 Kbps, 2 pps
  RX
    380152 unicast packets  113302 multicast packets  3248 broadcast packets
    496720 input packets  88421937 bytes
    0 jumbo packets  0 storm suppression packets
    0 runts  0 giants  0 CRC  0 no buffer
    0 input error  0 short frame  0 overrun  0 underrun  0 ignored
    0 watchdog  0 bad etype drop  0 bad proto drop  0 if down drop
    0 input with dribble  0 input discard
    0 Rx pause

Loaded Modules
lsmod | egrep '8021q|loop|bond'
8021q                  19291  0
garp                   13193  1 8021q
bonding                79169  0
loop                   22641  0

Links:
discard packets when the route for outbound traffic differs from the route of incoming traffic
openvz on debian
Ubuntu bug report where I found my answer
bonding on debian
bonding on wheezy
bonding on wheezy
broadcom related post tg3
openvz on wheezy
When you make changes via sysctl and load them with '-p', don't forget to restart networking or the server. When you are in the thick of it, remember to make your changes one step at a time so you can find the problem. Don't assume your first hunch is the answer.
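The likely reason the restart matters is that the "default" value is only copied to an interface when that interface is created, so VLAN interfaces that already exist keep their old setting. A small sketch, assuming Python 3 and the usual /proc/sys/net/ipv4/conf layout, that lists what each interface is really using; the function name `effective_rp_filter` is made up for illustration. For source validation the kernel uses the larger of the "all" value and the per-interface value, which is what the second number shows.

```python
import os


def effective_rp_filter(conf_dir="/proc/sys/net/ipv4/conf"):
    """Map each interface to (own rp_filter value, effective rp_filter value)."""

    def read(name):
        with open(os.path.join(conf_dir, name, "rp_filter")) as handle:
            return int(handle.read().strip())

    all_value = read("all")
    result = {}
    for name in sorted(os.listdir(conf_dir)):
        # "all" and "default" are settings templates, not real interfaces.
        if name in ("all", "default"):
            continue
        if not os.path.isfile(os.path.join(conf_dir, name, "rp_filter")):
            continue
        own = read(name)
        # Source validation uses max(conf/all, conf/<interface>).
        result[name] = (own, max(all_value, own))
    return result
```

Running `effective_rp_filter()` on the server before and after the networking restart shows whether vlan45, vlan48 and vlan49 actually picked up the new value, instead of trusting that 'sysctl -p' took effect everywhere.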