From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mails.dpdk.org (mails.dpdk.org [217.70.189.124]) by inbox.dpdk.org (Postfix) with ESMTP id 0230C4242F; Fri, 20 Jan 2023 12:56:30 +0100 (CET) Received: from mails.dpdk.org (localhost [127.0.0.1]) by mails.dpdk.org (Postfix) with ESMTP id 27D5D40150; Fri, 20 Jan 2023 12:56:28 +0100 (CET) Received: from mailgw02.pantheon.sk (mailgw01.pantheon.sk [46.229.239.26]) by mails.dpdk.org (Postfix) with ESMTP id 96489400D5 for ; Fri, 20 Jan 2023 12:56:26 +0100 (CET) Received: from mailgw02.pantheon.sk (localhost.localdomain [127.0.0.1]) by mailgw02.pantheon.sk (Proxmox) with ESMTP id 0FC1F181C5B; Fri, 20 Jan 2023 12:56:26 +0100 (CET) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=pantheon.tech; h=cc:cc:content-type:content-type:date:from:from:message-id :mime-version:reply-to:subject:subject:to:to; s=dkim; bh=X/x2DUi IZYVED9uZ9Rv0X6uRWBWaukT8TzuJe50x+ck=; b=pL7XWeVtRuXjcR5dvOxinEB kEUnQ+fKmATM8JwDlTcUh5KgdU5Q2w5XznbMI5Gwd6KiPQl3wyF5+aklUsozEKDy d2ni1KARh8dVwkyXhbFXUOr3bSzGSYtO6lLNDhthEmoD2F5WuuenjJSpRcG34isT 1mRJLefZvCds+lyMUJ4cgZm/nQjB0vdLgaQ/5sSHET20cnuHxhcRKSpKB/DJPk8+ PRXiNxbVX2ZYGBZlbZNVlhEpKV2t5gofgaZ5scVJZdBwgZJWMf2AOFGzbXV24AHo rjKJV8Yo63HwH84IqhHE61ITS7JgxYZ1thV8D0Qv0/lO/N3V7MJjyiWWSSbcjWw= = From: =?iso-8859-2?Q?Juraj_Linke=B9?= To: "aman.deep.singh@intel.com" , "yuying.zhang@intel.com" , "Xing, Beilei" CC: "dev@dpdk.org" , Ruifeng Wang , "Lijian Zhang" , Honnappa Nagarahalli Subject: Testpmd/l3fwd port shutdown failure on Arm Altra systems Thread-Topic: Testpmd/l3fwd port shutdown failure on Arm Altra systems Thread-Index: AdkstfZMJuG4FgjwQ5uoucTY95wFcw== Date: Fri, 20 Jan 2023 11:56:20 +0000 Message-ID: Accept-Language: en-US Content-Language: en-US X-MS-Has-Attach: X-MS-TNEF-Correlator: x-originating-ip: [10.101.4.10] Content-Type: multipart/alternative; boundary="_000_d1851963e4ab4ccab41789a643a68d1epantheontech_" MIME-Version: 1.0 X-BeenThere: dev@dpdk.org X-Mailman-Version: 2.1.29 
Precedence: list List-Id: DPDK patches and discussions List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: dev-bounces@dpdk.org

--_000_d1851963e4ab4ccab41789a643a68d1epantheontech_
Content-Type: text/plain; charset="iso-8859-2"
Content-Transfer-Encoding: quoted-printable

Hello i40e and testpmd maintainers,

We're hitting an issue with DPDK testpmd on Ampere Altra servers in the FD.io lab.

A bit of background: along with VPP performance tests (VPP uses DPDK), we're running a small number of basic DPDK testpmd and l3fwd tests in FD.io as well. This is to catch any performance differences due to VPP updating its DPDK version.

We're running both l3fwd tests and testpmd tests. The Altra servers are two-socket and the topology is TG -> DUT1 -> DUT2 -> TG. Traffic flows in both directions, but nothing gets forwarded (with a slight caveat - put a pin in this). There's nothing special in the tests, just forwarding traffic. The NIC we're testing is XL710-QDA2.

The same tests are passing on all other testbeds - we have various two-node (1 DUT, 1 TG) and three-node (2 DUT, 1 TG) Intel and Arm testbeds with various NICs (Intel 700 and 800 series, and the Intel testbeds use some Mellanox NICs as well). We don't have quite the same combination of another three-node topology with the same NIC though, so it looks like something specific to testpmd/l3fwd and XL710-QDA2 on Altra servers.

VPP performance tests are passing, but l3fwd and testpmd fail. This leads us to believe it's a software issue, but there could be something wrong with the hardware. I'll talk about testpmd from now on, but as far as we can tell, the behavior is the same for testpmd and l3fwd.

Getting back to the caveat mentioned earlier, there seems to be something wrong with port shutdown.
When running testpmd on a testbed that hasn't been used for a while, it seems that all ports are up right away (we don't see any "Port 0|1: link state change event") and the setup works fine (forwarding works). After restarting testpmd (restarting on one server is sufficient), the ports between DUT1 and DUT2 (but not between DUTs and TG) go down and are not usable in DPDK, VPP or in Linux (with the i40e kernel driver) for a while (measured in minutes, sometimes dozens of minutes; the duration is seemingly random). The ports eventually recover and can be used again, but there's nothing in syslog suggesting what happened.

What seems to be happening is that testpmd puts the ports into some faulty state. This only happens on the DUT1 -> DUT2 link though (the ports between the two testpmds), not on the TG -> DUT1 link (the TG port is left alone).

Some more info:
We've come across the issue with this configuration:
OS: Ubuntu 20.04 with kernel 5.4.0-65-generic.
Old NIC firmware, never upgraded: 6.01 0x800035da 1.1747.0.
Driver versions: i40e 2.17.15 and iavf 4.3.19.

As well as with this configuration:
OS: Ubuntu 22.04 with kernel 5.15.0-46-generic.
Updated firmware: 8.30 0x8000a4ae 1.2926.0.
Drivers: i40e 2.19.3 and iavf 4.5.3.
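
To put numbers on the "minutes, sometimes dozens of minutes" recovery window described above, the link state could be sampled while the port is bound to the i40e kernel driver. The following is a minimal Python sketch, not part of the original test setup. It assumes the port is visible as a Linux netdev and reads the standard /sys/class/net/<iface>/operstate attribute; all function names are hypothetical and the interface name has to be supplied by the caller.

```python
import time
from pathlib import Path
from typing import Iterable, Iterator, List, Tuple


def read_operstate(iface: str, sysfs_root: str = "/sys/class/net") -> str:
    """Return the kernel's view of the link, e.g. 'up' or 'down'."""
    return (Path(sysfs_root) / iface / "operstate").read_text().strip()


def poll(iface: str, interval: float = 1.0, count: int = 10) -> Iterator[Tuple[float, str]]:
    """Yield (timestamp, operstate) samples, one every `interval` seconds."""
    for _ in range(count):
        yield time.time(), read_operstate(iface)
        time.sleep(interval)


def link_transitions(samples: Iterable[Tuple[float, str]]) -> List[Tuple[float, str]]:
    """Keep only the samples at which the state differs from the previous one."""
    transitions = []
    last = None
    for ts, state in samples:
        if state != last:
            transitions.append((ts, state))
            last = state
    return transitions


def down_durations(transitions: Iterable[Tuple[float, str]]) -> List[float]:
    """Return the length in seconds of every completed down -> up period."""
    durations = []
    down_since = None
    for ts, state in transitions:
        if state == "down" and down_since is None:
            down_since = ts
        elif state == "up" and down_since is not None:
            durations.append(ts - down_since)
            down_since = None
    return durations
```

Calling down_durations(link_transitions(poll(iface, 1.0, 3600))) right after stopping testpmd and rebinding the port would give the outage length with one-second resolution over an hour. Repeating this across restarts would show whether the duration really is random or clusters around particular values.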
Unsafe noiommu mode is disabled:
cat /sys/module/vfio/parameters/enable_unsafe_noiommu_mode
N

We used DPDK 22.07 in manual testing and built it on the DUTs, using a generic build:
meson -Dexamples=l3fwd -Dc_args=-DRTE_LIBRTE_I40E_16BYTE_RX_DESC=y -Dplatform=generic build

We're running testpmd with this command:
sudo build/app/dpdk-testpmd -v -l 1,2 -a 0004:04:00.1 -a 0004:04:00.0 --in-memory -- -i --forward-mode=io --burst=64 --txq=1 --rxq=1 --tx-offloads=0x0 --numa --auto-start --total-num-mbufs=32768 --nb-ports=2 --portmask=0x3 --max-pkt-len=1518 --mbuf-size=16384 --nb-cores=1

And l3fwd (with different MACs on the other server):
sudo /tmp/openvpp-testing/dpdk/build/examples/dpdk-l3fwd -v -l 1,2 -a 0004:04:00.0 -a 0004:04:00.1 --in-memory -- --parse-ptype --eth-dest="0,40:a6:b7:85:e7:79" --eth-dest="1,3c:fd:fe:c3:e7:a1" --config="(0, 0, 2),(1, 0, 2)" -P -L -p 0x3

We tried adding logs with --log-level=pmd,debug and --no-lsc-interrupt, but that didn't reveal anything helpful, as far as we can tell - please have a look at the attached log. The faulty port is port 0 (it starts out as down; we waited around 25 minutes for it to come up and then shut down testpmd).

We'd like to ask for pointers on what could be the cause or how to debug this issue further.

Thanks,
Juraj

--_000_d1851963e4ab4ccab41789a643a68d1epantheontech_
Content-Type: text/html; charset="iso-8859-2"
Content-Transfer-Encoding: quoted-printable

--_000_d1851963e4ab4ccab41789a643a68d1epantheontech_--
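
When reproducing the l3fwd invocation quoted in the message, it can help to confirm that the arguments agree with each other before suspecting the NIC. The following is a minimal Python sketch under stated assumptions: it treats each --config entry as a (port, queue, lcore) triple and -p as a hexadecimal port bitmask, which is how the l3fwd sample application documents them. The helper names are hypothetical and are not part of DPDK.

```python
import re
from typing import List, Set, Tuple


def ports_from_mask(mask: int) -> List[int]:
    """Return the port indices whose bit is set in a portmask such as 0x3."""
    return [bit for bit in range(mask.bit_length()) if (mask >> bit) & 1]


def parse_l3fwd_config(config: str) -> List[Tuple[int, int, int]]:
    """Parse '(port, queue, lcore),(...)' into a list of integer triples."""
    triples = re.findall(r"\(\s*(\d+)\s*,\s*(\d+)\s*,\s*(\d+)\s*\)", config)
    return [(int(p), int(q), int(c)) for p, q, c in triples]


def check_l3fwd_args(config: str, portmask: int, lcores: Set[int]) -> List[str]:
    """Return a list of human-readable inconsistencies (empty if none found)."""
    problems = []
    enabled = set(ports_from_mask(portmask))
    for port, queue, lcore in parse_l3fwd_config(config):
        if port not in enabled:
            problems.append(f"port {port} (queue {queue}) is not enabled by the portmask")
        if lcore not in lcores:
            problems.append(f"lcore {lcore} for port {port} is not in the EAL core list")
    return problems
```

With the values from the message, check_l3fwd_args("(0, 0, 2),(1, 0, 2)", 0x3, {1, 2}) finds no inconsistencies: both ports are enabled by -p 0x3 and both queues are served by lcore 2, which is in -l 1,2. This does not explain the link going down, but it rules out one class of setup mistake.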