Hi Qiming,

It’s not an issue in VPP. It’s an XL710 NIC link-down issue seen with DPDK testpmd.

The Ethernet links between two XL710 NICs occasionally go down in the FD.io VPP lab.

 

If possible, could we discuss this issue this afternoon?

 

Thanks.

From: Yang, Qiming <qiming.yang@intel.com>
Sent: Tuesday, April 4, 2023 9:47 AM
To: Juraj Linkeš <juraj.linkes@pantheon.tech>; Xing, Beilei <beilei.xing@intel.com>
Cc: Singh, Aman Deep <aman.deep.singh@intel.com>; Zhang, Yuying <yuying.zhang@intel.com>; dev@dpdk.org; Ruifeng Wang <Ruifeng.Wang@arm.com>; Lijian Zhang <Lijian.Zhang@arm.com>; Honnappa Nagarahalli <Honnappa.Nagarahalli@arm.com>
Subject: RE: Testpmd/l3fwd port shutdown failure on Arm Altra systems

 

Hi, Juraj

I’m not familiar with VPP. Can I narrow down your question? Do you mean that testpmd and l3fwd crash when you run them with these commands on an Arm system?

> sudo build/app/dpdk-testpmd -v -l 1,2 -a 0004:04:00.1 -a 0004:04:00.0
> --in-memory -- -i --forward-mode=io --burst=64 --txq=1 --rxq=1
> --tx-offloads=0x0 --numa --auto-start --total-num-mbufs=32768
> --nb-ports=2 --portmask=0x3 --max-pkt-len=1518 --mbuf-size=16384
> --nb-cores=1
>
> And l3fwd (with different macs on the other server):
>
> sudo /tmp/openvpp-testing/dpdk/build/examples/dpdk-l3fwd -v -l 1,2 -a
> 0004:04:00.0 -a 0004:04:00.1 --in-memory -- --parse-ptype
> --eth-dest="0,40:a6:b7:85:e7:79" --eth-dest="1,3c:fd:fe:c3:e7:a1"
> --config="(0, 0, 2),(1, 0, 2)" -P -L -p 0x3

Qiming

 

From: Juraj Linkeš <juraj.linkes@pantheon.tech>
Sent: Monday, April 3, 2023 5:27 PM
To: Xing, Beilei <beilei.xing@intel.com>
Cc: Singh, Aman Deep <aman.deep.singh@intel.com>; Zhang, Yuying <yuying.zhang@intel.com>; Yang, Qiming <qiming.yang@intel.com>; dev@dpdk.org; Ruifeng Wang <Ruifeng.Wang@arm.com>; Zhang, Lijian <Lijian.Zhang@arm.com>; Honnappa Nagarahalli <Honnappa.Nagarahalli@arm.com>
Subject: Re: Testpmd/l3fwd port shutdown failure on Arm Altra systems

 

Hello Qiming, Beilei,

 

Could you please help us debug this issue? Anything that would help us get to the bottom of what could go wrong during port init/cleanup would be appreciated: extra EAL/testpmd options or even code changes (such as where we could add extra debug messages).

 

Thanks,

Juraj

 

On Wed, Mar 8, 2023 at 7:25 AM Juraj Linkeš <juraj.linkes@pantheon.tech> wrote:

Hello Qiming, Beilei,

Another reminder - are you looking at this by any chance?

The short, high-level description is that testpmd/l3fwd breaks a link
between two servers while VPP (using DPDK) doesn't. This leads us to
believe there's a problem with testpmd, l3fwd or the i40e driver in DPDK.

Thanks,
Juraj

On Tue, Feb 21, 2023 at 12:18 PM Juraj Linkeš
<juraj.linkes@pantheon.tech> wrote:
>
> Hi Qiming,
>
> Just a friendly reminder, would you please take a look?
>
> Thanks,
> Juraj
>
>
> On Tue, Feb 7, 2023 at 3:10 AM Xing, Beilei <beilei.xing@intel.com> wrote:
> >
> > Hi Qiming,
> >
> > Could you please help on this? Thanks.
> >
> > BR,
> > Beilei
> >
> > > -----Original Message-----
> > > From: Juraj Linkeš <juraj.linkes@pantheon.tech>
> > > Sent: Monday, February 6, 2023 4:53 PM
> > > To: Singh, Aman Deep <aman.deep.singh@intel.com>; Zhang, Yuying
> > > <yuying.zhang@intel.com>; Xing, Beilei <beilei.xing@intel.com>
> > > Cc: dev@dpdk.org; Ruifeng Wang <Ruifeng.Wang@arm.com>; Zhang, Lijian
> > > <Lijian.Zhang@arm.com>; Honnappa Nagarahalli
> > > <Honnappa.Nagarahalli@arm.com>
> > > Subject: Re: Testpmd/l3fwd port shutdown failure on Arm Altra systems
> > >
> > > Hello i40e and testpmd maintainers,
> > >
> > > A gentle reminder: would you please advise on how to debug the issue described
> > > below?
> > >
> > > Thanks,
> > > Juraj
> > >
> > > On Fri, Jan 20, 2023 at 1:07 PM Juraj Linkeš <juraj.linkes@pantheon.tech>
> > > wrote:
> > > >
> > > > Adding the logfile.
> > > >
> > > >
> > > >
> > > > One thing that's in the logs but that I didn't mention explicitly is the DPDK version
> > > we've tried this with:
> > > >
> > > > EAL: RTE Version: 'DPDK 22.07.0'
> > > >
> > > >
> > > >
> > > > We also tried earlier versions going back to 21.08, with no luck. I also did a
> > > quick check on 22.11, also with no luck.
> > > >
> > > >
> > > >
> > > > Juraj
> > > >
> > > >
> > > >
> > > > From: Juraj Linkeš
> > > > Sent: Friday, January 20, 2023 12:56 PM
> > > > To: 'aman.deep.singh@intel.com' <aman.deep.singh@intel.com>;
> > > > 'yuying.zhang@intel.com' <yuying.zhang@intel.com>; Xing, Beilei
> > > > <beilei.xing@intel.com>
> > > > Cc: dev@dpdk.org; Ruifeng Wang <Ruifeng.Wang@arm.com>; 'Lijian Zhang'
> > > > <Lijian.Zhang@arm.com>; 'Honnappa Nagarahalli'
> > > > <Honnappa.Nagarahalli@arm.com>
> > > > Subject: Testpmd/l3fwd port shutdown failure on Arm Altra systems
> > > >
> > > >
> > > >
> > > > Hello i40e and testpmd maintainers,
> > > >
> > > >
> > > >
> > > > We're hitting an issue with DPDK testpmd on Ampere Altra servers in the FD.io
> > > lab.
> > > >
> > > >
> > > >
> > > > A bit of background: along with VPP performance tests (which use DPDK),
> > > we're also running a small number of basic DPDK testpmd and l3fwd tests in FD.io.
> > > This is to catch any performance differences caused by VPP updating its
> > > DPDK version.
> > > >
> > > >
> > > >
> > > > We're running both l3fwd and testpmd tests. The Altra servers are two-socket
> > > and the topology is TG -> DUT1 -> DUT2 -> TG. Traffic flows in both
> > > directions, but nothing gets forwarded (with a slight caveat; put a pin in this).
> > > There's nothing special in the tests, just forwarding traffic. The NIC we're
> > > testing is the XL710-QDA2.
> > > >
> > > >
> > > >
> > > > The same tests are passing on all other testbeds: we have various two-node
> > > (1 DUT, 1 TG) and three-node (2 DUT, 1 TG) Intel and Arm testbeds with
> > > various NICs (Intel 700 and 800 series; the Intel testbeds also use some
> > > Mellanox NICs). We don't have another three-node topology with the same
> > > NIC, though, so it looks like something specific to testpmd/l3fwd and the
> > > XL710-QDA2 on Altra servers.
> > > >
> > > >
> > > >
> > > > VPP performance tests are passing, but l3fwd and testpmd fail. This leads us
> > > to believe it's a software issue, but there could be something wrong with the
> > > hardware. I'll talk about testpmd from now on, but as far as we can tell, the
> > > behavior is the same for testpmd and l3fwd.
> > > >
> > > >
> > > >
> > > > Getting back to the caveat mentioned earlier, there seems to be something
> > > wrong with port shutdown. When we run testpmd on a testbed that hasn't
> > > been used for a while, all ports seem to be up right away (we don't see
> > > any "Port 0|1: link state change event") and the setup works fine (forwarding
> > > works). After restarting testpmd (restarting on one server is sufficient), the
> > > ports between DUT1 and DUT2 (but not between the DUTs and the TG) go down and
> > > are not usable in DPDK, VPP or Linux (with the i40e kernel driver) for a while
> > > (measured in minutes, sometimes dozens of minutes; the duration is seemingly
> > > random). The ports eventually recover and can be used again, but there's
> > > nothing in syslog suggesting what happened.
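One way to put numbers on the "measured in minutes" outage is to poll the kernel's view of the link while the DUT1/DUT2 ports are bound to the i40e kernel driver. The sketch below is a hypothetical helper (the name watch_link and the SYSFS_NET override are illustrative assumptions, not part of any existing tool). It reads the standard /sys/class/net/<iface>/operstate attribute and prints a timestamped line only when the state changes. It cannot observe ports bound to vfio-pci for DPDK, since the netdev does not exist then; those samples are reported as "missing".

```shell
#!/bin/sh
# Hypothetical helper (sketch): log link-state transitions for one netdev
# while it is bound to the kernel i40e driver.
# Usage: watch_link IFACE [INTERVAL_SECONDS] [MAX_SAMPLES]
#   MAX_SAMPLES=0 (the default) polls until interrupted.
# SYSFS_NET is an assumed override for the sysfs root, used only for testing.
watch_link() {
    iface=$1
    interval=${2:-1}
    max=${3:-0}
    base=${SYSFS_NET:-/sys/class/net}
    prev=""
    n=0
    while [ "$max" -eq 0 ] || [ "$n" -lt "$max" ]; do
        # operstate reads "up", "down", etc.; the netdev disappears when the
        # port is bound to vfio-pci, which is reported as "missing".
        state=$(cat "$base/$iface/operstate" 2>/dev/null || echo missing)
        if [ "$state" != "$prev" ]; then
            # Print only on change, so the gap between a "down" line and
            # the next "up" line is the outage duration.
            echo "$(date '+%Y-%m-%dT%H:%M:%S') $iface $state"
            prev=$state
        fi
        n=$((n + 1))
        sleep "$interval"
    done
}
```

For example, `watch_link enp1s0f0 1 | tee link.log` on each DUT (the interface name is a placeholder) would record when the link drops after a testpmd restart and when it comes back.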
> > > >
> > > >
> > > >
> > > > What seems to be happening is that testpmd puts the ports into some faulty state.
> > > This only happens on the DUT1 -> DUT2 link though (the ports between the
> > > two testpmds), not on the TG -> DUT1 link (the TG port is unaffected).
> > > >
> > > >
> > > >
> > > > Some more info:
> > > >
> > > > We've come across the issue with this configuration:
> > > >
> > > > OS: Ubuntu 20.04 with kernel 5.4.0-65-generic.
> > > >
> > > > Old NIC firmware, never upgraded: 6.01 0x800035da 1.1747.0.
> > > >
> > > > Driver versions: i40e 2.17.15 and iavf 4.3.19.
> > > >
> > > >
> > > >
> > > > As well as with this configuration:
> > > >
> > > > OS: Ubuntu 22.04 with kernel 5.15.0-46-generic.
> > > >
> > > > Updated firmware: 8.30 0x8000a4ae 1.2926.0.
> > > >
> > > > Driver versions: i40e 2.19.3 and iavf 4.5.3.
> > > >
> > > >
> > > >
> > > > Unsafe noiommu mode is disabled:
> > > >
> > > > cat /sys/module/vfio/parameters/enable_unsafe_noiommu_mode
> > > >
> > > > N
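To make the configuration lists above easy to reproduce on each DUT, the driver and firmware versions can be read with `ethtool -i` while a port is bound to the kernel driver. The following is a minimal sketch under that assumption; the function names summarise_ethtool_i and noiommu_mode are hypothetical. The first condenses `ethtool -i` output read from stdin (live or from a saved copy) into one line, and the second prints the vfio noiommu setting checked above, or "n/a" if the vfio module is not loaded.

```shell
#!/bin/sh
# Hypothetical helper (sketch): condense `ethtool -i IFACE` output from stdin
# into "<bus-info> driver=<driver> <version> fw=<firmware-version>".
summarise_ethtool_i() {
    # Split each "key: value" line on the first ": " separator.
    awk -F': ' '
        $1 == "driver"           { drv = $2 }
        $1 == "version"          { ver = $2 }
        $1 == "firmware-version" { fw  = $2 }
        $1 == "bus-info"         { bus = $2 }
        END { printf "%s driver=%s %s fw=%s\n", bus, drv, ver, fw }'
}

# Hypothetical helper (sketch): print the vfio unsafe noiommu setting
# ("N" or "Y"), or "n/a" if the vfio module parameter is not present.
noiommu_mode() {
    cat /sys/module/vfio/parameters/enable_unsafe_noiommu_mode 2>/dev/null || echo n/a
}
```

For example, `ethtool -i <iface> | summarise_ethtool_i; noiommu_mode` run on DUT1 and DUT2 would give a one-line-per-port record of the driver and firmware versions that can be pasted into a report like this one.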
> > > >
> > > >
> > > >
> > > > We used DPDK 22.07 in manual testing and built it on the DUTs using the generic
> > > build:
> > > >
> > > > meson -Dexamples=l3fwd -Dc_args=-DRTE_LIBRTE_I40E_16BYTE_RX_DESC=y
> > > > -Dplatform=generic build
> > > >
> > > >
> > > >
> > > > We're running testpmd with this command:
> > > >
> > > > sudo build/app/dpdk-testpmd -v -l 1,2 -a 0004:04:00.1 -a 0004:04:00.0
> > > > --in-memory -- -i --forward-mode=io --burst=64 --txq=1 --rxq=1
> > > > --tx-offloads=0x0 --numa --auto-start --total-num-mbufs=32768
> > > > --nb-ports=2 --portmask=0x3 --max-pkt-len=1518 --mbuf-size=16384
> > > > --nb-cores=1
> > > >
> > > >
> > > >
> > > > And l3fwd (with different macs on the other server):
> > > >
> > > > sudo /tmp/openvpp-testing/dpdk/build/examples/dpdk-l3fwd -v -l 1,2 -a
> > > > 0004:04:00.0 -a 0004:04:00.1 --in-memory -- --parse-ptype
> > > > --eth-dest="0,40:a6:b7:85:e7:79" --eth-dest="1,3c:fd:fe:c3:e7:a1"
> > > > --config="(0, 0, 2),(1, 0, 2)" -P -L -p 0x3
> > > >
> > > >
> > > >
> > > > We tried running with --log-level=pmd,debug and --no-lsc-interrupt, but
> > > that didn't reveal anything helpful, as far as we can tell; please have a look at
> > > the attached log. The faulty port is port0 (it starts out down; we waited
> > > around 25 minutes for it to come up and then shut down testpmd).
> > > >
> > > >
> > > >
> > > > We'd like to ask for pointers on what could be the cause or how to debug
> > > this issue further.
> > > >
> > > >
> > > >
> > > > Thanks,
> > > > Juraj
