From: "Zhang, Qi Z"
To: "Bly, Mike", "dev@dpdk.org", "users@dpdk.org"
Subject: RE: IXGBE LSC IRQ issue
Date: Sun, 13 Mar 2022 12:22:31 +0000
List-Id: DPDK patches and discussions

This is not a patch but an issue report; I suggest filing a ticket on Bugzilla: https://bugs.dpdk.org/

 

 

From: Bly, Mike <mbly@ciena.com>
Sent: Saturday, March 5, 2022 9:42 AM
To: dev@dpdk.org; users@dpdk.org
Subject: IXGBE LSC IRQ issue

 

Hello,

 

We recently ran into an issue with DPDK 20.11 for the IXGBE driver operating in 10G BASE-T mode. We have been able to replicate this behavior using dpdk-testpmd and do not see any recent/pertinent updates, so we are hopeful someone may be able to advise based on the information provided below. On the surface, based on our investigation, it would appear the current link-down transition logic does not correctly preserve IRQ mask configurations, specifically LSC, when a link partner causes some sort of slow or bounced link down event.

Background:
We recently started using a new 3rd party traffic generator card for testing our application. We found when using this card in 10G BASE-T mode and toggling link up/down, it would correctly cause our application to detect the port to be down in our DPDK design. However, the link down event handling by the DPDK IXGBE driver appears to permanently disable its LSC IRQ detection on the first port down event such that any subsequent link up or down events from the external test card on this port would no longer be detected. The only way to restore link up was to restart the DPDK port in our design (stop/start). Having looked at this a bit, we switched over to the classic testpmd application and observed the exact same behavior.

 

Here is the data we believe you would find interesting:

 

NIC in question:

 

# lspci -D -nn | grep -F [0200] | grep 552
0000:03:00.0 Ethernet controller [0200]: Intel Corporation Ethernet Connection X552/X557-AT 10GBASE-T [8086:15ad]
0000:03:00.1 Ethernet controller [0200]: Intel Corporation Ethernet Connection X552/X557-AT 10GBASE-T [8086:15ad]
# dpdk-devbind.py -s | grep 552
0000:03:00.0 'Ethernet Connection X552/X557-AT 10GBASE-T 15ad' drv=vfio-pci unused=uio_pci_generic
0000:03:00.1 'Ethernet Connection X552/X557-AT 10GBASE-T 15ad' drv=vfio-pci unused=uio_pci_generic

 

We made the following debug logging changes to try and capture interesting data to share:

 

diff --git a/drivers/net/ixgbe/ixgbe_ethdev.c b/drivers/net/ixgbe/ixgbe_ethdev.c
index 5a30c39593..75a9f9163b 100644
--- a/drivers/net/ixgbe/ixgbe_ethdev.c
+++ b/drivers/net/ixgbe/ixgbe_ethdev.c
@@ -4497,7 +4497,7 @@ ixgbe_dev_interrupt_get_status(struct rte_eth_dev *dev)

     /* read-on-clear nic registers here */
     eicr = IXGBE_READ_REG(hw, IXGBE_EICR);
-    PMD_DRV_LOG(DEBUG, "eicr %x", eicr);
+    PMD_DRV_LOG(ERR, "eicr %x", eicr);

     intr->flags = 0;

@@ -4614,7 +4613,7 @@ ixgbe_dev_interrupt_action(struct rte_eth_dev *dev)
          }
     }

-    PMD_DRV_LOG(DEBUG, "enable intr immediately");
+    PMD_DRV_LOG(ERR, "enable intr immediately, mask: 0x%08x, orig: 0x%08x, flags: 0x%08x", intr->mask, intr->mask_original, intr->flags);
     ixgbe_enable_intr(dev);

     return 0;
@@ -4648,7 +4647,9 @@ ixgbe_dev_interrupt_delayed_handler(void *param)

     ixgbe_disable_intr(hw);

-    eicr = IXGBE_READ_REG(hw, IXGBE_EICR);
+   eicr = IXGBE_READ_REG(hw, IXGBE_EICR);
+   PMD_DRV_LOG(ERR, "in delay func: eicr 0x%08x", eicr);
+   PMD_DRV_LOG(ERR, "enable intr delayed, mask: 0x%08x, orig: 0x%08x, flags: 0x%08x", intr->mask, intr->mask_original, intr->flags);
     if (eicr & IXGBE_EICR_MAILBOX)
          ixgbe_pf_mbx_process(dev);

 

With the above "log-err" additions, we have provided the following results. The first set of data below was generated using an older 3rd party traffic generator card to provide "good" results that show the IXGBE driver working correctly. Following that are the non-working (bad) logging results for the new traffic generator card. Both 3rd party cards correctly transition between down and up states.

 

 

######################################################################
# good sequence, both down detection and then up detection
######################################################################
# port transition from up to down
<27>1 2022-03-05T00:12:11.415436+00:00 - - ixgbe_dev_interrupt_get_status(): eicr 100000
<27>1 2022-03-05T00:12:11.415489+00:00 - - ixgbe_dev_interrupt_action(): enable intr immediately, mask: 0x02200000, orig: 0x02300000, flags: 0x00000001
<27>1 2022-03-05T00:12:11.425448+00:00 - - ixgbe_dev_interrupt_get_status(): eicr 2000000
<27>1 2022-03-05T00:12:11.446191+00:00 - - ixgbe_dev_interrupt_action(): enable intr immediately, mask: 0x02200000, orig: 0x02300000, flags: 0x00000000
<27>1 2022-03-05T00:12:15.415600+00:00 - - ixgbe_dev_interrupt_delayed_handler(): in delay func: eicr 0x00000000
<27>1 2022-03-05T00:12:15.415655+00:00 - - ixgbe_dev_interrupt_delayed_handler(): enable intr delayed, mask: 0x02200000, orig: 0x02300000, flags: 0x00000000

# port transition from down to up
<27>1 2022-03-05T00:12:33.856734+00:00 - - ixgbe_dev_interrupt_get_status(): eicr 2000000
<27>1 2022-03-05T00:12:33.877463+00:00 - - ixgbe_dev_interrupt_action(): enable intr immediately, mask: 0x02300000, orig: 0x00000000, flags: 0x00000000
<27>1 2022-03-05T00:12:34.203274+00:00 - - ixgbe_dev_interrupt_get_status(): eicr 100000
<27>1 2022-03-05T00:12:34.207905+00:00 - - ixgbe_dev_interrupt_action(): enable intr immediately, mask: 0x02200000, orig: 0x02300000, flags: 0x00000001
<27>1 2022-03-05T00:12:35.207994+00:00 - - ixgbe_dev_interrupt_delayed_handler(): in delay func: eicr 0x00100000
<27>1 2022-03-05T00:12:35.208027+00:00 - - ixgbe_dev_interrupt_delayed_handler(): enable intr delayed, mask: 0x02200000, orig: 0x02300000, flags: 0x00000001
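As an aid to reading the mask and eicr values in these log lines, here is a small sketch in C. It assumes that bit 20 (0x00100000) is the link-status-change (LSC) cause, which is consistent with the "eicr 100000" entries logged at each link transition. The macro and helper names are made up for illustration and are not taken from the driver.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumption: bit 20 is the LSC cause, matching the "eicr 100000" log lines. */
#define LSC_BIT 0x00100000u

/* True when a logged eicr or mask value has the LSC bit set. */
static bool has_lsc(uint32_t value)
{
	return (value & LSC_BIT) != 0;
}

/* Bits present in the saved mask ("orig") but absent from the live mask. */
static uint32_t dropped_bits(uint32_t orig, uint32_t mask)
{
	return orig & ~mask;
}

/* Sanity-check the helpers against values copied from the good sequence. */
static void check_against_good_log(void)
{
	assert(has_lsc(0x100000u));      /* eicr on the link transition */
	assert(!has_lsc(0x2000000u));    /* the other logged eicr value */
	assert(has_lsc(0x02300000u));    /* "orig": LSC enabled */
	assert(!has_lsc(0x02200000u));   /* "mask": LSC temporarily disabled */
}
```

Applied to the first up-to-down entry, dropped_bits(0x02300000, 0x02200000) yields 0x00100000, so in the good sequence only the LSC bit is masked while the delayed handler is pending.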

=  

######################################################################
# bad sequence, detects down event, but does not see the up event
######################################################################
# port transition from up to down
<27>1 2022-03-05T00:13:00.377072+00:00 - - ixgbe_dev_interrupt_get_status(): eicr 100000
<27>1 2022-03-05T00:13:00.377127+00:00 - - ixgbe_dev_interrupt_action(): enable intr immediately, mask: 0x02200000, orig: 0x02300000, flags: 0x00000001
<27>1 2022-03-05T00:13:00.643788+00:00 - - ixgbe_dev_interrupt_get_status(): eicr 2100000
<27>1 2022-03-05T00:13:00.664603+00:00 - - ixgbe_dev_interrupt_action(): enable intr immediately, mask: 0x02200000, orig: 0x02200000, flags: 0x00000001
<27>1 2022-03-05T00:13:01.664703+00:00 - - ixgbe_dev_interrupt_delayed_handler(): in delay func: eicr 0x00000000
<27>1 2022-03-05T00:13:01.664738+00:00 - - ixgbe_dev_interrupt_delayed_handler(): enable intr delayed, mask: 0x02200000, orig: 0x02200000, flags: 0x00000001
<27>1 2022-03-05T00:13:04.377237+00:00 - - ixgbe_dev_interrupt_delayed_handler(): in delay func: eicr 0x00000000
<27>1 2022-03-05T00:13:04.377269+00:00 - - ixgbe_dev_interrupt_delayed_handler(): enable intr delayed, mask: 0x02200000, orig: 0x00000000, flags: 0x00000000

# port transition from down to up
<nothing happens as LSC IRQ is not enabled due to above link-down sequence>
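One possible reading of the bad sequence, offered as a toy model rather than as the driver's actual code: the timestamps show two LSC-flagged interrupts (00:13:00.377 and 00:13:00.664) followed by two delayed-handler runs roughly 1 s and 4 s later. If each LSC interrupt saves the current mask into "orig" and clears the LSC bit, and each delayed handler restores "orig" and then zeroes it, the second save overwrites the good copy and the second restore leaves the mask at zero. The struct, function names, and the save/restore steps below are assumptions inferred from the logged mask/orig values.

```c
#include <assert.h>
#include <stdint.h>

#define LSC_BIT 0x00100000u /* assumed link-status-change bit, as in the logs */

/* Hypothetical stand-in for the driver's per-port interrupt bookkeeping. */
struct intr_model {
	uint32_t mask;          /* causes currently enabled */
	uint32_t mask_original; /* copy saved while a delayed handler is pending */
};

/* Assumed LSC path: remember the mask, then disable only LSC. */
static void model_lsc_interrupt(struct intr_model *m)
{
	m->mask_original = m->mask;
	m->mask &= ~LSC_BIT;
}

/* Assumed delayed-handler path: restore the saved mask, clear the copy. */
static void model_delayed_handler(struct intr_model *m)
{
	m->mask = m->mask_original;
	m->mask_original = 0;
}

/* One LSC interrupt followed by its delayed handler (the "good" pattern). */
static uint32_t replay_single(uint32_t initial_mask)
{
	struct intr_model m = { .mask = initial_mask, .mask_original = 0 };

	assert(initial_mask & LSC_BIT); /* LSC must start out enabled */
	model_lsc_interrupt(&m);
	model_delayed_handler(&m);
	return m.mask;
}

/* Two LSC interrupts before either delayed handler runs (the "bad" pattern).
 * The values in the comments assume initial_mask == 0x02300000. */
static uint32_t replay_overlapping(uint32_t initial_mask)
{
	struct intr_model m = { .mask = initial_mask, .mask_original = 0 };

	assert(initial_mask & LSC_BIT);
	model_lsc_interrupt(&m);   /* mask 0x02200000, orig 0x02300000 */
	model_lsc_interrupt(&m);   /* orig overwritten with 0x02200000 */
	model_delayed_handler(&m); /* mask 0x02200000, orig 0x00000000 */
	model_delayed_handler(&m); /* mask 0x00000000: LSC never re-enabled */
	return m.mask;
}
```

With an initial mask of 0x02300000, replay_single() ends with LSC enabled again, while replay_overlapping() ends with an all-zero mask. The intermediate values match the mask/orig pairs logged in the bad sequence, which is why this overlap seems worth checking against the driver source.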

 

Let me know what additional data can be provided to help root cause this.

 

-Mike

 
