Network resilience

Each server is connected to two different Ethernet switches for resilience, so a switch failure is not an emergency.

Warning: Configuring bonding may not work the first time. Before you start, confirm that you have a working out-of-band connection (serial / LOM) to your server, so that you can recover the system if you’re left without normal network access to it.

To configure on Ubuntu 11.10

The ifenslave package needs to be installed

# apt-get install ifenslave

The bonding module needs to be loaded at boot

# echo "bonding" >> /etc/modules
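Running the echo above more than once leaves duplicate lines in /etc/modules. A minimal sketch of a repeatable version follows; the function name ensure_module_listed is made up for illustration, and the second argument exists only so the function can be tried against a scratch file first.

```shell
# Append a module name to a modules file only if it is not already listed.
# ensure_module_listed is a hypothetical helper, not part of any package.
ensure_module_listed() {
    module="$1"
    file="${2:-/etc/modules}"
    # -x matches the whole line, -F treats the name as a fixed string.
    grep -qxF "$module" "$file" 2>/dev/null || echo "$module" >> "$file"
}
```

Run as root: ensure_module_listed bonding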

Edit /etc/network/interfaces

auto eth0
iface eth0 inet manual
 bond-master bond0
 bond-primary eth0 eth1

auto eth1
iface eth1 inet manual
 bond-master bond0
 bond-primary eth0 eth1

auto bond0
iface bond0 inet static
 slaves eth0 eth1
 bond-mode 6
 bond-miimon 100
 address ..... netmask etc as per original eth0

Check the bonding is working

# cat /proc/net/bonding/bond0
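The output is fairly long, so it can help to pull out just the lines that matter. The following is a rough sketch, assuming the usual "Currently Active Slave:", "Slave Interface:" and "MII Status:" lines in the bonding status file; the function names are made up for illustration, and the optional argument lets you point them at a saved copy of the output.

```shell
# Hypothetical helpers for reading the bonding status file.
# Both default to bond0 and accept an alternative file as the first argument.

# Print the interface the bond is currently using as its active slave.
bond_active_slave() {
    awk -F': ' '/^Currently Active Slave:/ { print $2 }' \
        "${1:-/proc/net/bonding/bond0}"
}

# Print the name of every slave whose MII status is not "up".
# The first "MII Status:" line belongs to the bond itself and is skipped,
# because no "Slave Interface:" line has been seen at that point.
bond_down_slaves() {
    awk '
        /^Slave Interface:/ { slave = $3 }
        /^MII Status:/ && slave != "" && $3 != "up" { print slave }
    ' "${1:-/proc/net/bonding/bond0}"
}
```

With both links healthy, bond_down_slaves should print nothing. Unplugging one cable and running it again is a simple way to confirm that the failure is detected.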

More information

The switches are independent of each other and treat the two links as separate connections. It is important that any particular MAC address is not regularly sourced from both of the links, as this will cause MAC table churn on the switches.

The functionality in operating systems for treating two links as part of the same interface is often geared towards bandwidth aggregation rather than resilience, so care needs to be taken to ensure that both links aren’t regularly sourcing the same MAC address.

More information (Linux specific)

The Linux bonding driver is documented at http://www.kernel.org/doc/Documentation/networking/bonding.txt

https://help.ubuntu.com/community/LinkAggregation has details of how to do network resilience on Ubuntu. Note that the driver is primarily intended for increasing bandwidth, not for resilience.

Do not use bond mode 0 (balance-rr), 2 (balance-xor), or 3 (broadcast), as these source frames with the same MAC address on both ports.

Modes 1 (active-backup), 4 (802.3ad), 5 (balance-tlb), and 6 (balance-alb) are potentially acceptable. Of these, 4 is pointless: we can’t run ports on two different switches as part of an 802.3ad bundle, and in testing fail-back leads to other failures. Modes 1, 5, and 6 offer simplicity, transmit load balancing, and transmit and receive load balancing respectively.
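The mode guidance above can be turned into a quick check. This is only a sketch: the function names are made up for illustration, and it assumes the driver reports the current mode as a "name number" pair (for example "balance-alb 6") in /sys/class/net/bond0/bonding/mode.

```shell
# Classify a bonding mode number according to the guidance above.
# bond_mode_verdict and check_bond_mode are hypothetical helpers.
bond_mode_verdict() {
    case "$1" in
        0|2|3) echo "unsafe: same MAC address sourced on both links" ;;
        4)     echo "unsuitable: 802.3ad cannot span two independent switches" ;;
        1|5|6) echo "acceptable" ;;
        *)     echo "unknown mode: $1"; return 1 ;;
    esac
}

# Read the "name number" pair from the mode file and classify it.
# Defaults to bond0; another file can be given as the first argument.
check_bond_mode() {
    mode_file="${1:-/sys/class/net/bond0/bonding/mode}"
    read -r name number < "$mode_file" || return 1
    echo "$name ($number): $(bond_mode_verdict "$number")"
}
```

On a server configured with bond-mode 6, check_bond_mode would be expected to report the mode as acceptable.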