Two of the major characteristics of Ethernet bundles (LACP) are availability and flexibility: a bundle survives a single port failure, and ports can be added to or removed from a bundle without affecting its overall availability.
Most devices that support Ethernet bundles set the "System ID" of the bundle (the bundle identifier in LACP) to a MAC address that differs from the MAC addresses of the member ports. The reason is that ports can be added and removed, so reusing a member port's MAC address could create duplicates once that port is removed from the bundle.
The DPDK bonding driver allows the bonding MAC address to be set (rte_eth_bond_mac_address_set), but that address is not used as the System ID: the System ID is always set to the MAC address of the "aggregator" port, which is the first port added to the bundle.
When that port is removed, the bundle automatically reconfigures itself with a different aggregator port, and so the System ID it advertises in LACP packets changes. The peer then shuts the "old" bundle down, because nobody is sending that ID anymore, and sets up a new bundle with the new identifier. Either way, connectivity is interrupted. So removing a single port brings the whole bundle down, which contradicts the main characteristics of bundles.
I could not find a way to keep the bond up while removing the aggregator port. Is it possible?
What I did manage to do is ensure that, if a bonding MAC address is specified, it is used as the System ID in LACP packets (which allows the aggregator port to be removed smoothly). If none is specified, the current behaviour (aggregator port MAC) is kept.
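To make the intended behaviour concrete, here is a small standalone model of the System ID selection described above. It is only a sketch and does not use DPDK: the types and functions (struct bundle, bundle_add, bundle_remove, bundle_system_id) are hypothetical names invented for illustration, and the "first remaining member" rule is a simplified stand-in for the driver's aggregator selection.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define MAC_LEN 6
#define MAX_MEMBERS 8

/* Hypothetical model types; these are NOT the DPDK structures. */
struct mac { uint8_t b[MAC_LEN]; };

struct bundle {
    struct mac member_mac[MAX_MEMBERS]; /* member port MACs, in the order added */
    int n_members;
    bool user_mac_set;                  /* true once an explicit bundle MAC is configured */
    struct mac user_mac;
};

static bool mac_equal(const struct mac *a, const struct mac *b)
{
    return memcmp(a->b, b->b, MAC_LEN) == 0;
}

/* Append a member port to the bundle. */
static void bundle_add(struct bundle *bd, struct mac m)
{
    assert(bd->n_members < MAX_MEMBERS);
    bd->member_mac[bd->n_members++] = m;
}

/* Remove the member at index idx, preserving the order of the others. */
static void bundle_remove(struct bundle *bd, int idx)
{
    assert(idx >= 0 && idx < bd->n_members);
    memmove(&bd->member_mac[idx], &bd->member_mac[idx + 1],
            (size_t)(bd->n_members - idx - 1) * sizeof(struct mac));
    bd->n_members--;
}

/*
 * System ID advertised in LACP packets (model only):
 *  - the explicitly configured bundle MAC, if one was set;
 *  - otherwise the MAC of the first remaining member, which stands in
 *    for the aggregator port.
 */
static struct mac bundle_system_id(const struct bundle *bd)
{
    if (bd->user_mac_set)
        return bd->user_mac;
    assert(bd->n_members > 0);
    return bd->member_mac[0];
}
```

In this model, a bundle with two members and no explicit MAC reports the first member's MAC as its System ID, and that value changes as soon as bundle_remove(&bd, 0) is called. With user_mac_set true, the same removal leaves the System ID unchanged, which is the behaviour I am after.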
However, I could only achieve this by changing the driver code. These are the changes against DPDK 21.11:
--- rte_eth_bond_8023ad.c
+++ rte_eth_bond_8023ad.c.new
@@ -866,7 +866,7 @@ bond_mode_8023ad_periodic_cb(void *arg)
struct bond_dev_private *internals = bond_dev->data->dev_private;
struct port *port;
struct rte_eth_link link_info;
- struct rte_ether_addr slave_addr;
+ struct rte_ether_addr bond_addr;
struct rte_mbuf *lacp_pkt = NULL;
uint16_t slave_id;
uint16_t i;
@@ -893,7 +893,7 @@ bond_mode_8023ad_periodic_cb(void *arg)
key = 0;
}
- rte_eth_macaddr_get(slave_id, &slave_addr);
+ rte_eth_macaddr_get(bond_dev->data->port_id, &bond_addr);
port = &bond_mode_8023ad_ports[slave_id];
key = rte_cpu_to_be_16(key);
@@ -905,8 +905,8 @@ bond_mode_8023ad_periodic_cb(void *arg)
SM_FLAG_SET(port, NTT);
}
- if (!rte_is_same_ether_addr(&port->actor.system, &slave_addr)) {
- rte_ether_addr_copy(&slave_addr, &port->actor.system);
+ if (!rte_is_same_ether_addr(&port->actor.system, &bond_addr)) {
+ rte_ether_addr_copy(&bond_addr, &port->actor.system);
if (port->aggregator_port_id == slave_id)
SM_FLAG_SET(port, NTT);
}
@@ -1186,7 +1186,7 @@ bond_mode_8023ad_mac_address_update(stru
if (rte_is_same_ether_addr(&slave_addr, &slave->actor.system))
continue;
- rte_ether_addr_copy(&slave_addr, &slave->actor.system);
+ rte_ether_addr_copy(bond_dev->data->mac_addrs, &slave->actor.system);
/* Do nothing if this port is not an aggregator. In other case
* Set NTT flag on every port that use this aggregator. */
if (slave->aggregator_port_id != slave_id)
So, back to the question: is there an easier way to remove the aggregator port from a bundle without affecting the overall bundle availability, or is a patch required?
Thanks