From: Kyle Larose
To: "dev@dpdk.org"
CC: Declan Doherty
Date: Thu, 23 Nov 2017 16:04:13 +0000
Subject: [dpdk-dev] rte_eth_bond: Problem with link failure and 8023AD

Hello,

I've been testing my LAG implemented with the DPDK eth_bond PMD. As part of
my fault tolerance testing, I want to ensure that if a link is flapping up
and down continuously, impact to service is minimal. My findings are that
the LAG is rendered inoperable if a certain link is flapping. Details below.

Setup:
- 4x10G X710 links in an 8023ad LAG connected to a switch.
- Under normal operations, the LAG is steady, traffic is balanced, etc.

Problem:
If I take down a link on the switch corresponding to the "aggregator" link
in the DPDK LAG, then bring it back up, every link in the LAG goes from
distributing to not distributing and back to distributing. This causes
unnecessary loss of service.
A single link failure, regardless of whether or not it's the aggregator
link, should not change the state of the other links. Consider what would
happen if there were a hardware fault on that link, or its signal were bad:
it's possible for it to be stuck flapping up and down. This would lead to
complete loss of service on the LAG, despite there being three stable links
remaining.

Analysis:
- The switch is showing that the system id is changing when the link flaps.
  It's going from 00:00:00:00:00:00 to the aggregator's MAC. This is not
  good. Why is it happening? It's because by default we seem to be using the
  "AGG_BANDWIDTH" selection algorithm, which is broken: it takes a slave
  index and uses that as the index into the 8023ad ports array, which is
  indexed by the DPDK port number. It should translate the slave index into
  a DPDK port number using the slaves[] array.
- Aside from the above, if you look, the default is supposed to be
  AGG_STABLE, according to bond_mode_8023ad_conf_get_default. However,
  bond_mode_8023ad_conf_assign does not actually copy out the selection
  algorithm, so it just uses 0, which happens to be AGG_BANDWIDTH.
- I fixed the above, but still faced two more issues:
  1) The system id changes when the aggregator changes, which can lead to
     the problem.
  2) When the link fails, it is "deactivated" in the LAG via
     bond_mode_8023ad_deactivate_slave. There is a block in there dedicated
     to the case where the aggregator is disabled. In that case, it
     explicitly unselects each slave sharing that aggregator. This causes
     them to fall back to the DETACHED state in the mux machine -- i.e. they
     are no longer aggregating at all, until the state machine runs through
     the LACP exchange with the partner again.

Possible fix:
1) Change bond_mode_8023ad_conf_assign to actually copy out the selection
   algorithm.
2) Ensure that all members of a LAG have the same system id (i.e.
   choose the LAG's MAC address)
3) Do not detach the other members when the aggregator's link state goes
   down.

Note:
1) We should fix AGG_BANDWIDTH and AGG_COUNT separately.
2) I can't see any reason why the system id should be equal to the MAC of
   the aggregator. It's intended to represent the system to which the LAG
   belongs, not the aggregator itself. The aggregator is represented by the
   operational key. So, it should be fine to use the LAG's MAC address,
   which is fixed at init, as the system id for all possible aggregators.
3) I think not detaching is the correct approach. There is nothing in my
   reading of 802.1Q or the 802.1AX LACP specification that implies we
   should do this. There is a blurb about changes in parameters which lead
   to the change in aggregator forcing the unselected transition, but I
   don't think that needs to apply here. I'm fairly certain they're talking
   about changing the operational key, etc.

How does everyone feel about this? Am I crazy in requiring this
functionality? What about the proposed fix? Does it sound reasonable, or am
I going to break the state machine somehow?

Thanks,

Kyle
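
P.S. To make the two configuration problems from the analysis concrete, here
is a minimal standalone sketch. It is not the bonding PMD's code: the structs,
the ports[] table, and every function name below are hypothetical stand-ins
that only model the description above (a table indexed by DPDK port number, a
slaves[] array mapping slave index to port number, and an assign routine that
must copy the selection algorithm). Only the enum value names come from the
discussion above.

```c
#include <assert.h>
#include <stdint.h>

#define MAX_PORTS 8 /* hypothetical table size for this sketch */

/* Same ordering as described above: AGG_BANDWIDTH happens to be 0. */
enum agg_selection { AGG_BANDWIDTH, AGG_COUNT, AGG_STABLE };

/* Hypothetical per-port state, one entry per DPDK port number. */
struct port_state {
    uint32_t bandwidth_mbps;
};

/* Hypothetical bonded device: slaves[] maps slave index -> port number. */
struct bond {
    uint16_t slaves[MAX_PORTS];
    uint16_t slave_count;
};

/* Table indexed by DPDK port number, NOT by slave index. */
static struct port_state ports[MAX_PORTS];

/* Models the reported bug: the slave index is used directly as a port number. */
static uint32_t bandwidth_buggy(const struct bond *b, uint16_t slave_idx)
{
    (void)b;
    return ports[slave_idx].bandwidth_mbps;
}

/* Models the proposed translation through slaves[] before the lookup. */
static uint32_t bandwidth_fixed(const struct bond *b, uint16_t slave_idx)
{
    assert(slave_idx < b->slave_count);
    uint16_t port_id = b->slaves[slave_idx];
    assert(port_id < MAX_PORTS);
    return ports[port_id].bandwidth_mbps;
}

/* Hypothetical, cut-down configuration with just two fields. */
struct conf {
    uint32_t fast_periodic_ms;
    enum agg_selection agg_selection;
};

/* Assign routine that copies every field, including the selection algorithm. */
static void conf_assign(struct conf *dst, const struct conf *src)
{
    dst->fast_periodic_ms = src->fast_periodic_ms;
    dst->agg_selection = src->agg_selection; /* the copy reported as missing */
}
```

With slaves = {4, 5, 6, 7}, the buggy lookup for slave index 0 reads the entry
for port 0, which is not a member of the LAG at all, while the translated
lookup reads the entry for port 4. Likewise, without the last assignment in
conf_assign, a zero-initialised destination would silently keep AGG_BANDWIDTH
even though the intended default is AGG_STABLE.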