From mboxrd@z Thu Jan 1 00:00:00 1970
From: Juraj Linkeš
Date: Mon, 3 Apr 2023 11:27:23 +0200
Subject: Re: Testpmd/l3fwd port shutdown failure on Arm Altra systems
To: "Xing, Beilei"
Cc: "Singh, Aman Deep", "Zhang, Yuying", "Yang, Qiming", "dev@dpdk.org", Ruifeng Wang, "Zhang, Lijian", Honnappa Nagarahalli
References: <6d232783fb654d0485f8788c027bd70b@pantheon.tech>
List-Id: DPDK patches and discussions
MIME-Version: 1.0
Content-Type: multipart/alternative; boundary="00000000000044972e05f86b2b1f"

--00000000000044972e05f86b2b1f
Content-Type: text/plain; charset="UTF-8"

Hello Qiming, Beilei,

Could you please help us debug this issue? Anything that would help us get to the bottom of what could go wrong during port init/cleanup would be appreciated: extra EAL/testpmd options or even code changes (such as where we could add extra debug messages).

Thanks,
Juraj

On Wed, Mar 8, 2023 at 7:25 AM Juraj Linkeš <juraj.linkes@pantheon.tech> wrote:
Hello Qiming, Beilei,

Another reminder - are you looking at this by any chance?

The short, high-level description is that testpmd/l3fwd breaks a link
between two servers while VPP (using DPDK) doesn't. This leads us to believe there's a problem in testpmd/l3fwd or in the i40e driver in DPDK.

Thanks,
Juraj

On Tue, Feb 21, 2023 at 12:18 PM Juraj Linkeš
<juraj.linkes@pantheon.tech> wrote:
>
> Hi Qiming,
>
> Just a friendly reminder, would you please take a look?
>
> Thanks,
> Juraj
>
>
> On Tue, Feb 7, 2023 at 3:10 AM Xing, Beilei <beilei.xing@intel.com> wrote:
> >
> > Hi Qiming,
> >
> > Could you please help with this? Thanks.
> >
> > BR,
> > Beilei
> >
> > > -----Original Message-----
> > > From: Juraj Linkeš <juraj.linkes@pantheon.tech>
> > > Sent: Monday, February 6, 2023 4:53 PM
> > > To: Singh, Aman Deep <aman.deep.singh@intel.com>; Zhang, Yuying
> > > <yuying.zhang@intel.com>; Xing, Beilei <beilei.xing@intel.com>
> > > Cc: dev@dpdk.org; Ruifeng Wang <Ruifeng.Wang@arm.com>; Zhang, Lijian
> > > <Lijian.Zhang@arm.com>; Honnappa Nagarahalli
> > > <Honnappa.Nagarahalli@arm.com>
> > > Subject: Re: Testpmd/l3fwd port shutdown failure on Arm Altra systems
> > >
> > > Hello i40e and testpmd maintainers,
> > >
> > > A gentle reminder - would you please advise how to debug the issue described
> > > below?
> > >
> > > Thanks,
> > > Juraj
> > >
> > > On Fri, Jan 20, 2023 at 1:07 PM Juraj Linkeš <juraj.linkes@pantheon.tech>
> > > wrote:
> > > >
> > > > Adding the logfile.
> > > >
> > > >
> > > >
> > > > One thing that's in the logs but that I didn't explicitly mention is the DPDK version
> > > > we've tried this with:
> > > >
> > > > EAL: RTE Version: 'DPDK 22.07.0'
> > > >
> > > >
> > > >
> > > > We also tried earlier versions going back to 21.08, with no luck. I also did a
> > > > quick check on 22.11, also with no luck.
> > > >
> > > >
> > > >
> > > > Juraj
> > > >
> > > >
> > > >
> > > > From: Juraj Linkeš
> > > > Sent: Friday, January 20, 2023 12:56 PM
> > > > To: 'aman.deep.singh@intel.com' <aman.deep.singh@intel.com>;
> > > > 'yuying.zhang@intel.com' <yuying.zhang@intel.com>; Xing, Beilei
> > > > <beilei.xing@intel.com>
> > > > Cc: dev@dpdk.org; Ruifeng Wang <Ruifeng.Wang@arm.com>; 'Lijian Zhang'
> > > > <Lijian.Zhang@arm.com>; 'Honnappa Nagarahalli'
> > > > <Honnappa.Nagarahalli@arm.com>
> > > > Subject: Testpmd/l3fwd port shutdown failure on Arm Altra systems
> > > >
> > > >
> > > >
> > > > Hello i40e and testpmd maintainers,
> > > >
> > > >
> > > >
> > > > We're hitting an issue with DPDK testpmd on Ampere Altra servers in the FD.io
> > > > lab.
> > > >
> > > >
> > > >
> > > > A bit of background: along with VPP performance tests (VPP uses DPDK),
> > > > we're running a small number of basic DPDK testpmd and l3fwd tests in FD.io
> > > > as well. This is to catch any performance differences due to VPP updating its
> > > > DPDK version.
> > > >
> > > >
> > > >
> > > > We're running both l3fwd tests and testpmd tests. The Altra servers are
> > > > two-socket and the topology is TG -> DUT1 -> DUT2 -> TG. Traffic flows in both
> > > > directions, but nothing gets forwarded (with a slight caveat; put a pin in this).
> > > > There's nothing special in the tests, just forwarding traffic. The NIC we're
> > > > testing is xl710-QDA2.
> > > >
> > > >
> > > >
> > > > The same tests are passing on all other testbeds: we have various two-node
> > > > (1 DUT, 1 TG) and three-node (2 DUT, 1 TG) Intel and Arm testbeds with
> > > > various NICs (Intel 700 and 800 series; the Intel testbeds use some
> > > > Mellanox NICs as well). We don't have another three-node topology with the
> > > > same NIC for comparison though, so it looks like something specific to
> > > > testpmd/l3fwd and xl710-QDA2 on Altra servers.
> > > >
> > > >
> > > >
> > > > VPP performance tests are passing, but l3fwd and testpmd fail. This leads us
> > > > to believe it's a software issue, but there could be something wrong with the
> > > > hardware. I'll talk about testpmd from now on, but as far as we can tell, the
> > > > behavior is the same for testpmd and l3fwd.
> > > >
> > > >
> > > >
> > > > Getting back to the caveat mentioned earlier, there seems to be something
> > > > wrong with port shutdown. When running testpmd on a testbed that hasn't
> > > > been used for a while, it seems that all ports are up right away (we don't see
> > > > any "Port 0|1: link state change event") and the setup works fine (forwarding
> > > > works). After restarting testpmd (restarting on one server is sufficient), the
> > > > ports between DUT1 and DUT2 (but not between DUTs and TG) go down and
> > > > are not usable in DPDK, VPP or in Linux (with the i40e kernel driver) for a while
> > > > (measured in minutes, sometimes tens of minutes; the duration is seemingly
> > > > random). The ports eventually recover and can be used again, but there's
> > > > nothing in syslog suggesting what happened.
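A minimal sketch for timing how long a port takes to recover while it is bound to the i40e kernel driver. It assumes the interface exposes the standard Linux sysfs carrier file (/sys/class/net/<iface>/carrier, which reads 1 when the link is up). The function name and the interface name in the usage note are hypothetical and not taken from the testbed.

```python
import time
from pathlib import Path


def wait_for_carrier(carrier_file, timeout_s=1800.0, interval_s=1.0):
    """Poll a sysfs-style carrier file until it reads "1".

    Returns the number of seconds waited, or None if timeout_s elapsed first.
    """
    path = Path(carrier_file)
    start = time.monotonic()
    while True:
        try:
            link_up = path.read_text().strip() == "1"
        except OSError:
            # The read fails when the file is missing or the interface is
            # administratively down; treat both as "link not up yet".
            link_up = False
        waited = time.monotonic() - start
        if link_up:
            return waited
        if waited >= timeout_s:
            return None
        time.sleep(interval_s)
```

For example, wait_for_carrier("/sys/class/net/eth0/carrier") could be started right after testpmd exits and the port is rebound to the kernel driver (eth0 is a placeholder name). Logging the returned duration over several restarts would show whether the recovery time is really random or clusters around particular values.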
> > > >
> > > >
> > > >
> > > > What seems to be happening is that testpmd puts the ports into some faulty state.
> > > > This only happens on the DUT1 -> DUT2 link though (the ports between the
> > > > two testpmds), not on the TG -> DUT1 link (the TG port is left alone).
> > > >
> > > >
> > > >
> > > > Some more info:
> > > >
> > > > We've come across the issue with this configuration:
> > > >
> > > > OS: Ubuntu 20.04 with kernel 5.4.0-65-generic.
> > > >
> > > > Old NIC firmware, never upgraded: 6.01 0x800035da 1.1747.0.
> > > >
> > > > Driver versions: i40e 2.17.15 and iavf 4.3.19.
> > > >
> > > >
> > > >
> > > > As well as with this configuration:
> > > >
> > > > OS: Ubuntu 22.04 with kernel 5.15.0-46-generic.
> > > >
> > > > Updated firmware: 8.30 0x8000a4ae 1.2926.0.
> > > >
> > > > Drivers: i40e 2.19.3 and iavf 4.5.3.
> > > >
> > > >
> > > >
> > > > Unsafe noiommu mode is disabled:
> > > >
> > > > cat /sys/module/vfio/parameters/enable_unsafe_noiommu_mode
> > > >
> > > > N
> > > >
> > > >
> > > >
> > > > We used DPDK 22.07 in manual testing and built it on the DUTs, using a generic
> > > > build:
> > > >
> > > > meson -Dexamples=l3fwd -Dc_args=-DRTE_LIBRTE_I40E_16BYTE_RX_DESC=y
> > > > -Dplatform=generic build
> > > >
> > > >
> > > >
> > > > We're running testpmd with this command:
> > > >
> > > > sudo build/app/dpdk-testpmd -v -l 1,2 -a 0004:04:00.1 -a 0004:04:00.0
> > > > --in-memory -- -i --forward-mode=io --burst=64 --txq=1 --rxq=1
> > > > --tx-offloads=0x0 --numa --auto-start --total-num-mbufs=32768
> > > > --nb-ports=2 --portmask=0x3 --max-pkt-len=1518 --mbuf-size=16384
> > > > --nb-cores=1
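As a quick sanity check of the port selection in that command, here is a small sketch that decodes a hexadecimal port mask into port indices, assuming bit N of the mask selects port N. The helper name is hypothetical.

```python
def ports_in_mask(portmask):
    """Return the port indices whose bits are set in a hexadecimal port mask string."""
    mask = int(portmask, 16)
    return [bit for bit in range(mask.bit_length()) if (mask >> bit) & 1]


# --portmask=0x3 has bits 0 and 1 set, which matches --nb-ports=2.
print(ports_in_mask("0x3"))  # [0, 1]
```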
> > > >
> > > >
> > > >
> > > > And l3fwd (with different MACs on the other server):
> > > >
> > > > sudo /tmp/openvpp-testing/dpdk/build/examples/dpdk-l3fwd -v -l 1,2 -a
> > > > 0004:04:00.0 -a 0004:04:00.1 --in-memory -- --parse-ptype
> > > > --eth-dest="0,40:a6:b7:85:e7:79" --eth-dest="1,3c:fd:fe:c3:e7:a1"
> > > > --config="(0, 0, 2),(1, 0, 2)" -P -L -p 0x3
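To make the queue-to-core mapping in that command explicit, here is a minimal sketch that splits the --config value into (port, queue, lcore) triples, which is the order l3fwd documents for this option. The parser and its name are hypothetical and not part of DPDK.

```python
import re


def parse_l3fwd_config(config):
    """Split an l3fwd --config string into (port, queue, lcore) integer tuples."""
    triples = re.findall(r"\(\s*(\d+)\s*,\s*(\d+)\s*,\s*(\d+)\s*\)", config)
    return [tuple(int(field) for field in triple) for triple in triples]


# Both ports use RX queue 0, and both queues are polled by lcore 2.
print(parse_l3fwd_config("(0, 0, 2),(1, 0, 2)"))  # [(0, 0, 2), (1, 0, 2)]
```

With -l 1,2 this leaves lcore 1 as the main lcore and lcore 2 as the single forwarding core, the same split that --nb-cores=1 gives in the testpmd command.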
> > > >
> > > >
> > > >
> > > > We tried adding logs with --log-level=pmd,debug and --no-lsc-interrupt, but
> > > > that didn't reveal anything helpful, as far as we can tell. Please have a look at
> > > > the attached log. The faulty port is port 0 (it starts out as down, then we waited
> > > > around 25 minutes for it to go up, and then we shut down testpmd).
> > > >
> > > >
> > > >
> > > > We'd like to ask for pointers on what could be the cause or how to debug
> > > > this issue further.
> > > >
> > > >
> > > >
> > > > Thanks,
> > > > Juraj
--00000000000044972e05f86b2b1f--