From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mga14.intel.com (mga14.intel.com [192.55.52.115]) by dpdk.org (Postfix) with ESMTP id E68E44A63 for ; Mon, 26 Oct 2015 06:25:07 +0100 (CET) Received: from fmsmga003.fm.intel.com ([10.253.24.29]) by fmsmga103.fm.intel.com with ESMTP; 25 Oct 2015 22:25:07 -0700 X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="5.20,200,1444719600"; d="scan'208";a="587945721" Received: from fmsmsx108.amr.corp.intel.com ([10.18.124.206]) by FMSMGA003.fm.intel.com with ESMTP; 25 Oct 2015 22:25:07 -0700 Received: from fmsmsx116.amr.corp.intel.com (10.18.116.20) by FMSMSX108.amr.corp.intel.com (10.18.124.206) with Microsoft SMTP Server (TLS) id 14.3.248.2; Sun, 25 Oct 2015 22:25:06 -0700 Received: from shsmsx104.ccr.corp.intel.com (10.239.110.15) by fmsmsx116.amr.corp.intel.com (10.18.116.20) with Microsoft SMTP Server (TLS) id 14.3.248.2; Sun, 25 Oct 2015 22:25:06 -0700 Received: from shsmsx102.ccr.corp.intel.com ([169.254.2.253]) by SHSMSX104.ccr.corp.intel.com ([169.254.5.194]) with mapi id 14.03.0248.002; Mon, 26 Oct 2015 13:25:04 +0800 From: "Lu, Wenzhuo" To: Tim Shearer Thread-Topic: [dpdk-dev] [PATCH] librte: Link status interrupt race condition, IGB E1000 Thread-Index: AQHRD3h2vMSP7/V8K0m1FLeV5prLKZ59O/Sw Date: Mon, 26 Oct 2015 05:25:04 +0000 Message-ID: <6A0DE07E22DDAD4C9103DF62FEBC0909020A2AE8@shsmsx102.ccr.corp.intel.com> References: <33526A3108217C45B7DAFFA5277E4B67485277B3@mbx024-e1-nj-2.exch024.domain.local> <6060754.bW1Xky3pZ1@xps13> In-Reply-To: <6060754.bW1Xky3pZ1@xps13> Accept-Language: en-US Content-Language: en-US X-MS-Has-Attach: X-MS-TNEF-Correlator: x-titus-metadata-40: eyJDYXRlZ29yeUxhYmVscyI6IiIsIk1ldGFkYXRhIjp7Im5zIjoiaHR0cDpcL1wvd3d3LnRpdHVzLmNvbVwvbnNcL0ludGVsIiwiaWQiOiI0YzA4MzFmNi01YzYzLTRiNjctYmRiZC1jZjdjNzk2NGM2OGMiLCJwcm9wcyI6W3sibiI6IkludGVsRGF0YUNsYXNzaWZpY2F0aW9uIiwidmFscyI6W3sidmFsdWUiOiJDVFBfUFVCTElDIn1dfV19LCJTdWJqZWN0TGFiZWxzIjpbXSwiVE1DVmVyc2lvbiI6IjE1LjQuMTAuMTkiLCJUcnVzdGVkTGFiZWxIYXNoIjoiVjBURDZRUm9JYWpzTXBYNTFVajdcL3JFVnJ3dmpiNHZhcUREaUpFRWF1UjQ9In0= x-inteldataclassification: CTP_PUBLIC x-originating-ip: [10.239.127.40] Content-Type: text/plain; charset="us-ascii" Content-Transfer-Encoding: quoted-printable MIME-Version: 1.0 Cc: "dev@dpdk.org" Subject: Re: [dpdk-dev] [PATCH] librte: Link status interrupt race condition, IGB E1000 X-BeenThere: dev@dpdk.org X-Mailman-Version: 2.1.15 Precedence: list List-Id: patches and discussions about DPDK List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Mon, 26 Oct 2015 05:25:08 -0000 Hi Tim, > -----Original Message----- > From: Thomas Monjalon [mailto:thomas.monjalon@6wind.com] > Sent: Monday, October 26, 2015 6:56 AM > To: Lu, Wenzhuo > Cc: dev@dpdk.org; Tim Shearer > Subject: Re: [dpdk-dev] [PATCH] librte: Link status interrupt race condit= ion, > IGB E1000 >=20 > Wenzhuo, > Please could you have a look? > Thanks >=20 > 2015-09-24 20:44, Tim Shearer: > > I encountered an issue with DPDK 2.1.0 which occasionally causes the l= ink > status interrupt callback not to be called after the interface is started= for the > first time. I traced the problem back to the function eth_igb_link_update= (), > which is used to determine if the link has changed state since the previo= us > time it was called. It appears that this function can be called simultane= ously > from two different threads: > > > > (1) From the main application/configuration thread, via rte_eth_dev_sta= rt() > - pointed to by (*dev->dev_ops->link_update) > > (2) From the eal interrupt thread, via eth_igb_interrupt_action(), to c= heck > if the link state has transitioned up or down. The user callback is only > executed if the link has changed state. > > > > The race condition manifests itself as follows: > > - Main thread configures the interface with link status interrupt (LSI= ) > enabled, sets up the queues etc. > > - Main thread calls rte_eth_dev_start. The interface is started and th= en we > call eth_igb_link_update() > > - While in this call, the link goes up. Accordingly, we detect the tr= ansition, > and write the new link state (up) into the global rte_eth_dev struct > > - The interrupt fires, which also drops into the eth_igb_link_update > function, finds that the global link status has already been set to up (n= o > change) > > - Therefore, the handler thinks the interrupt was spurious, and the > callback doesn't get called. > > > > I suspect that rte_eth_dev_start shouldn't be checking the link state i= f > interrupts are enabled. Would someone mind taking a quick look at the > patch below? > > > > Thanks! > > Tim > > > > --- a/lib/librte_ether/rte_ethdev.c > > +++ b/lib/librte_ether/rte_ethdev.c > > @@ -1300,7 +1300,7 @@ rte_eth_dev_start(uint8_t port_id) > > > > rte_eth_dev_config_restore(port_id); > > > > - if (dev->data->dev_conf.intr_conf.lsc !=3D 0) { > > + if (dev->data->dev_conf.intr_conf.lsc =3D=3D 0) { > > FUNC_PTR_OR_ERR_RET(*dev->dev_ops->link_update, - > ENOTSUP); > > (*dev->dev_ops->link_update)(dev, 0); > > } > > > > >=20 I think you're right. To my opinion, this if is added to avoid the race con= dition. So, it should be " dev->data->dev_conf.intr_conf.lsc =3D=3D 0". It = means if the interrupts are not enabled, we'd update the link when starting= , if not we can leave it the interrupt handler. Seems it's not an igb specific but common issue.=20