From mboxrd@z Thu Jan 1 00:00:00 1970
From: "Zhangkun (K)"
To: "dev@dpdk.org", "tomaszx.kulasek@intel.com"
CC: Liuyongan, "Chengwentao (Vintorcheng)", zhouzhengwu
Date: Mon, 24 Jul 2017 02:46:08 +0000
Message-ID: <0FCB215400789046A95ECE23C3E46C51908DBA89@DGGEMA502-MBX.china.huawei.com>
Subject: [dpdk-dev] LACP bond link broken chain due to the timeout
List-Id: DPDK patches and discussions

Hi all,

I am running a long-duration, high-traffic test on a DPDK LACP bond, and the LACP link breaks during the test. The DPDK log is as follows:

Bond 1: slave id 0 distributing stopped.
Bond 1: slave id 1 distributing stopped.

From my analysis of the code, LACP protocol packets are handled by the eal-intr-thread thread. The same thread also handles other driver interrupt events and queries the NIC state. Each poll in that thread can wait for a long time, which causes the switch side to time out and disconnect the link.

On a generic x86 OS server, has it been considered that the eal-intr-thread thread may not get scheduled in time, which would result in an LACP timeout and a broken link?

Or has it been considered to run LACP bond processing on a dedicated thread, not shared with other work?
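
To make the timing concern concrete, here is a minimal sketch in plain C. It is not DPDK code: the function name and the sample timestamps are hypothetical, and only the timer constants come from IEEE 802.1AX (fast periodic 1 s, short timeout 3 s, slow periodic 30 s, long timeout 90 s). It assumes the partner switch restarts its timer on every LACPDU it receives, and it reports the first transmission that arrives after that timer would already have run out.

```c
#include <stddef.h>
#include <stdint.h>

/* IEEE 802.1AX timer values, in milliseconds. */
#define LACP_FAST_PERIODIC_MS  1000u  /* LACPDU interval with short timeout */
#define LACP_SHORT_TIMEOUT_MS  3000u  /* 3 x fast periodic time */
#define LACP_SLOW_PERIODIC_MS 30000u  /* LACPDU interval with long timeout */
#define LACP_LONG_TIMEOUT_MS  90000u  /* 3 x slow periodic time */

/*
 * Hypothetical helper, for illustration only.
 *
 * tx_ms[] holds the times (in ms, ascending) at which the thread that
 * services LACP actually managed to send an LACPDU. The partner restarts
 * its timer on each LACPDU received, so the link is at risk whenever the
 * gap between two consecutive transmissions exceeds timeout_ms.
 *
 * Returns the index of the first transmission that comes too late,
 * or -1 if every gap stays within the timeout.
 */
static int first_late_lacpdu(const uint32_t *tx_ms, size_t n,
                             uint32_t timeout_ms)
{
    for (size_t i = 1; i < n; i++) {
        if (tx_ms[i] - tx_ms[i - 1] > timeout_ms)
            return (int)i;
    }
    return -1;
}
```

For example, with transmissions at 0, 1000, 2000, 5500 and 6500 ms, the thread was stalled for 3.5 s between the third and fourth LACPDU. With the short timeout the function returns 3, since that gap exceeds 3 s. With the long timeout it returns -1. The sketch covers only the direction from the DPDK host to the switch.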