From: Juraj Linkeš
Date: Wed, 8 Mar 2023 07:25:32 +0100
Subject: Re: Testpmd/l3fwd port shutdown failure on Arm Altra systems
To: "Xing, Beilei"
Cc: "Singh, Aman Deep", "Zhang, Yuying", "Yang, Qiming", "dev@dpdk.org", Ruifeng Wang, "Zhang, Lijian", Honnappa Nagarahalli

Hello Qiming, Beilei,

Another reminder - are you looking at this by any chance?

In short, testpmd/l3fwd breaks the link between two servers while VPP (which uses DPDK) doesn't. This leads us to believe there's a problem with testpmd/l3fwd or the i40e driver in DPDK.

Thanks,
Juraj

On Tue, Feb 21, 2023 at 12:18 PM Juraj Linkeš wrote:
>
> Hi Qiming,
>
> Just a friendly reminder, would you please take a look?
>
> Thanks,
> Juraj
>
>
> On Tue, Feb 7, 2023 at 3:10 AM Xing, Beilei wrote:
> >
> > Hi Qiming,
> >
> > Could you please help with this? Thanks.
> >
> > BR,
> > Beilei
> >
> > > -----Original Message-----
> > > From: Juraj Linkeš
> > > Sent: Monday, February 6, 2023 4:53 PM
> > > To: Singh, Aman Deep; Zhang, Yuying; Xing, Beilei
> > > Cc: dev@dpdk.org; Ruifeng Wang; Zhang, Lijian; Honnappa Nagarahalli
> > > Subject: Re: Testpmd/l3fwd port shutdown failure on Arm Altra systems
> > >
> > > Hello i40e and testpmd maintainers,
> > >
> > > A gentle reminder - would you please advise how to debug the issue described
> > > below?
> > >
> > > Thanks,
> > > Juraj
> > >
> > > On Fri, Jan 20, 2023 at 1:07 PM Juraj Linkeš wrote:
> > > >
> > > > Adding the logfile.
> > > >
> > > > One thing that's in the logs but that I didn't explicitly mention is the
> > > > DPDK version we've tried this with:
> > > >
> > > > EAL: RTE Version: 'DPDK 22.07.0'
> > > >
> > > > We also tried earlier versions going back to 21.08, with no luck. I also did
> > > > a quick check on 22.11, also with no luck.
> > > >
> > > > Juraj
> > > >
> > > > From: Juraj Linkeš
> > > > Sent: Friday, January 20, 2023 12:56 PM
> > > > To: 'aman.deep.singh@intel.com'; 'yuying.zhang@intel.com'; Xing, Beilei
> > > > Cc: dev@dpdk.org; Ruifeng Wang; 'Lijian Zhang'; 'Honnappa Nagarahalli'
> > > > Subject: Testpmd/l3fwd port shutdown failure on Arm Altra systems
> > > >
> > > > Hello i40e and testpmd maintainers,
> > > >
> > > > We're hitting an issue with DPDK testpmd on Ampere Altra servers in the
> > > > FD.io lab.
> > > >
> > > > A bit of background: along with the VPP performance tests (VPP uses DPDK),
> > > > we're running a small number of basic DPDK testpmd and l3fwd tests in FD.io
> > > > as well. This is to catch any performance differences due to VPP updating
> > > > its DPDK version.
> > > >
> > > > We're running both l3fwd tests and testpmd tests. The Altra servers are
> > > > two-socket, the topology is TG -> DUT1 -> DUT2 -> TG and traffic flows in
> > > > both directions, but nothing gets forwarded (with a slight caveat - put a
> > > > pin in this). There's nothing special in the tests, just forwarding traffic.
> > > > The NIC we're testing is the xl710-QDA2.
> > > >
> > > > The same tests are passing on all other testbeds - we have various two-node
> > > > (1 DUT, 1 TG) and three-node (2 DUT, 1 TG) Intel and Arm testbeds with
> > > > various NICs (Intel 700 and 800 series; the Intel testbeds use some
> > > > Mellanox NICs as well). We don't have another three-node topology with the
> > > > same NIC, though, so it looks like something with testpmd/l3fwd and the
> > > > xl710-QDA2 on Altra servers.
> > > >
> > > > VPP performance tests are passing, but l3fwd and testpmd fail. This leads
> > > > us to believe it's a software issue, but there could be something wrong with
> > > > the hardware. I'll talk about testpmd from now on, but as far as we can
> > > > tell, the behavior is the same for testpmd and l3fwd.
> > > >
> > > > Getting back to the caveat mentioned earlier, there seems to be something
> > > > wrong with port shutdown. When running testpmd on a testbed that hasn't
> > > > been used for a while, all ports seem to be up right away (we don't see
> > > > any "Port 0|1: link state change event") and the setup works fine
> > > > (forwarding works). After restarting testpmd (restarting on one server is
> > > > sufficient), the ports between DUT1 and DUT2 (but not those between the
> > > > DUTs and the TG) go down and are not usable in DPDK, VPP or Linux (with
> > > > the i40e kernel driver) for a while (measured in minutes, sometimes dozens
> > > > of minutes; the duration is seemingly random). The ports eventually
> > > > recover and can be used again, but there's nothing in syslog suggesting
> > > > what happened.
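One way to put a number on "for a while" is to poll the carrier state of the DUT1 -> DUT2 ports from Linux while they are bound to the i40e kernel driver. This is a minimal sketch, assuming the interface was brought up first (`ip link set dev <iface> up`), because reading `/sys/class/net/<iface>/carrier` fails while the interface is administratively down; `wait_for_carrier` is a hypothetical helper, not part of DPDK or the i40e driver.

```shell
#!/bin/sh
# Hypothetical helper (not part of DPDK or i40e): poll a sysfs carrier file
# until it reads "1", then print the number of seconds waited.
# Assumes the port is bound to the i40e kernel driver and administratively up,
# e.g. carrier_file=/sys/class/net/<iface>/carrier.
wait_for_carrier() {
    carrier_file=$1
    timeout=${2:-3600}   # give up after this many seconds
    interval=${3:-5}     # seconds between polls
    start=$(date +%s)
    while :; do
        elapsed=$(( $(date +%s) - start ))
        # A failed read (interface down, file missing) counts as "no carrier".
        if [ "$(cat "$carrier_file" 2>/dev/null)" = "1" ]; then
            echo "$elapsed"
            return 0
        fi
        if [ "$elapsed" -ge "$timeout" ]; then
            echo "no carrier after ${elapsed}s" >&2
            return 1
        fi
        sleep "$interval"
    done
}
```

For example, running `wait_for_carrier /sys/class/net/<iface>/carrier` right after testpmd is stopped would print roughly how long the link stayed unusable from that point on; a read error is deliberately treated the same as carrier 0 so the loop keeps polling.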
> > > >
> > > > What seems to be happening is that testpmd puts the ports into some faulty
> > > > state. This only happens on the DUT1 -> DUT2 link, though (the ports
> > > > between the two testpmds), not on the TG -> DUT1 link (the TG port is left
> > > > alone).
> > > >
> > > > Some more info:
> > > >
> > > > We've come across the issue with this configuration:
> > > >
> > > > OS: Ubuntu 20.04 with kernel 5.4.0-65-generic.
> > > > Old NIC firmware, never upgraded: 6.01 0x800035da 1.1747.0.
> > > > Driver versions: i40e 2.17.15 and iavf 4.3.19.
> > > >
> > > > As well as with this configuration:
> > > >
> > > > OS: Ubuntu 22.04 with kernel 5.15.0-46-generic.
> > > > Updated firmware: 8.30 0x8000a4ae 1.2926.0.
> > > > Drivers: i40e 2.19.3 and iavf 4.5.3.
> > > >
> > > > Unsafe noiommu mode is disabled:
> > > >
> > > > cat /sys/module/vfio/parameters/enable_unsafe_noiommu_mode
> > > > N
> > > >
> > > > We used DPDK 22.07 in manual testing and built it on the DUTs, using a
> > > > generic build:
> > > >
> > > > meson -Dexamples=l3fwd -Dc_args=-DRTE_LIBRTE_I40E_16BYTE_RX_DESC=y
> > > > -Dplatform=generic build
> > > >
> > > > We're running testpmd with this command:
> > > >
> > > > sudo build/app/dpdk-testpmd -v -l 1,2 -a 0004:04:00.1 -a 0004:04:00.0
> > > > --in-memory -- -i --forward-mode=io --burst=64 --txq=1 --rxq=1
> > > > --tx-offloads=0x0 --numa --auto-start --total-num-mbufs=32768
> > > > --nb-ports=2 --portmask=0x3 --max-pkt-len=1518 --mbuf-size=16384
> > > > --nb-cores=1
> > > >
> > > > And l3fwd (with different MACs on the other server):
> > > >
> > > > sudo /tmp/openvpp-testing/dpdk/build/examples/dpdk-l3fwd -v -l 1,2 -a
> > > > 0004:04:00.0 -a 0004:04:00.1 --in-memory -- --parse-ptype
> > > > --eth-dest="0,40:a6:b7:85:e7:79" --eth-dest="1,3c:fd:fe:c3:e7:a1"
> > > > --config="(0, 0, 2),(1, 0, 2)" -P -L -p 0x3
> > > >
> > > > We tried adding logs with --log-level=pmd,debug and --no-lsc-interrupt,
> > > > but that didn't reveal anything helpful, as far as we can tell - please
> > > > have a look at the attached log. The faulty port is port0 (it starts out as
> > > > down; we then waited around 25 minutes for it to go up, and then we shut
> > > > down testpmd).
> > > >
> > > > We'd like to ask for pointers on what could be the cause or how to debug
> > > > this issue further.
> > > >
> > > > Thanks,
> > > > Juraj
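The kernel, driver and firmware versions listed in the two configurations above can be captured from a port bound to the kernel i40e driver with `uname -r` and `ethtool -i <iface>`; the firmware strings there appear to follow the `firmware-version` format that `ethtool -i` prints for i40e ports. This is a minimal sketch of a filter for that output, assuming ethtool's usual `key: value` lines; `parse_ethtool_i` is a hypothetical helper name.

```shell
#!/bin/sh
# Hypothetical helper (not an ethtool or DPDK feature): read `ethtool -i`
# output on stdin and keep only the driver, driver version and firmware
# version, printed as key=value. Assumes ethtool's "key: value" line format.
parse_ethtool_i() {
    awk -F': ' '$1 == "driver" || $1 == "version" || $1 == "firmware-version" {
        print $1 "=" $2
    }'
}
```

For example, `ethtool -i <iface> | parse_ethtool_i` run next to `uname -r` records the same fields as each configuration listed above, which makes it easier to compare testbeds side by side.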