DPDK patches and discussions
From: "Juraj Linkeš" <juraj.linkes@pantheon.tech>
To: "Xing, Beilei" <beilei.xing@intel.com>
Cc: "Singh, Aman Deep" <aman.deep.singh@intel.com>,
	"Zhang, Yuying" <yuying.zhang@intel.com>,
	 "Yang, Qiming" <qiming.yang@intel.com>,
	"dev@dpdk.org" <dev@dpdk.org>,
	 Ruifeng Wang <Ruifeng.Wang@arm.com>,
	"Zhang, Lijian" <Lijian.Zhang@arm.com>,
	 Honnappa Nagarahalli <Honnappa.Nagarahalli@arm.com>
Subject: Re: Testpmd/l3fwd port shutdown failure on Arm Altra systems
Date: Mon, 3 Apr 2023 11:27:23 +0200
Message-ID: <CAOb5WZZHA4r0jGSeitXy04NztZyF+KcrfTRq7+bso3_ZVmbG3w@mail.gmail.com>
In-Reply-To: <CAOb5WZay32GUp3VaZ9yMWXKWJpovLtVM_JicT0j0-Gc-Yga1ew@mail.gmail.com>

Hello Qiming, Beilei,

Could you please help us debug this issue? Any pointers on what could go
wrong during port init/cleanup would be appreciated: extra EAL/testpmd
options, or even code changes (such as where we could add extra debug
messages).

Thanks,
Juraj

On Wed, Mar 8, 2023 at 7:25 AM Juraj Linkeš <juraj.linkes@pantheon.tech>
wrote:

> Hello Qiming, Beilei,
>
> Another reminder - are you looking at this by any chance?
>
> The short, high-level description is that testpmd/l3fwd breaks a link
> between two servers while VPP (using DPDK) doesn't. This leads us to
> believe there's a problem in testpmd, l3fwd or the i40e driver in DPDK.
>
> Thanks,
> Juraj
>
> On Tue, Feb 21, 2023 at 12:18 PM Juraj Linkeš
> <juraj.linkes@pantheon.tech> wrote:
> >
> > Hi Qiming,
> >
> > Just a friendly reminder, would you please take a look?
> >
> > Thanks,
> > Juraj
> >
> >
> > > On Tue, Feb 7, 2023 at 3:10 AM Xing, Beilei <beilei.xing@intel.com> wrote:
> > >
> > > Hi Qiming,
> > >
> > > Could you please help on this? Thanks.
> > >
> > > BR,
> > > Beilei
> > >
> > > > -----Original Message-----
> > > > From: Juraj Linkeš <juraj.linkes@pantheon.tech>
> > > > Sent: Monday, February 6, 2023 4:53 PM
> > > > To: Singh, Aman Deep <aman.deep.singh@intel.com>; Zhang, Yuying
> > > > <yuying.zhang@intel.com>; Xing, Beilei <beilei.xing@intel.com>
> > > > Cc: dev@dpdk.org; Ruifeng Wang <Ruifeng.Wang@arm.com>; Zhang, Lijian
> > > > <Lijian.Zhang@arm.com>; Honnappa Nagarahalli
> > > > <Honnappa.Nagarahalli@arm.com>
> > > > Subject: Re: Testpmd/l3fwd port shutdown failure on Arm Altra systems
> > > >
> > > > Hello i40e and testpmd maintainers,
> > > >
> > > > > A gentle reminder - would you please advise how to debug the issue
> > > > > described below?
> > > >
> > > > Thanks,
> > > > Juraj
> > > >
> > > > > On Fri, Jan 20, 2023 at 1:07 PM Juraj Linkeš <juraj.linkes@pantheon.tech> wrote:
> > > > >
> > > > > Adding the logfile.
> > > > >
> > > > >
> > > > >
> > > > > > One thing that's in the logs but that I didn't explicitly mention is
> > > > > > the DPDK version we've tried this with:
> > > > >
> > > > > EAL: RTE Version: 'DPDK 22.07.0'
> > > > >
> > > > >
> > > > >
> > > > > > We also tried earlier versions going back to 21.08, with no luck. I
> > > > > > also did a quick check on 22.11, again with no luck.
> > > > >
> > > > >
> > > > >
> > > > > Juraj
> > > > >
> > > > >
> > > > >
> > > > > From: Juraj Linkeš
> > > > > Sent: Friday, January 20, 2023 12:56 PM
> > > > > To: 'aman.deep.singh@intel.com' <aman.deep.singh@intel.com>;
> > > > > 'yuying.zhang@intel.com' <yuying.zhang@intel.com>; Xing, Beilei
> > > > > <beilei.xing@intel.com>
> > > > > > Cc: dev@dpdk.org; Ruifeng Wang <Ruifeng.Wang@arm.com>; 'Lijian Zhang'
> > > > > > <Lijian.Zhang@arm.com>; 'Honnappa Nagarahalli'
> > > > > > <Honnappa.Nagarahalli@arm.com>
> > > > > Subject: Testpmd/l3fwd port shutdown failure on Arm Altra systems
> > > > >
> > > > >
> > > > >
> > > > > Hello i40e and testpmd maintainers,
> > > > >
> > > > >
> > > > >
> > > > > > We're hitting an issue with DPDK testpmd on Ampere Altra servers in the
> > > > > > FD.io lab.
> > > > >
> > > > >
> > > > >
> > > > > > A bit of background: along with VPP performance tests (which use DPDK),
> > > > > > we're running a small number of basic DPDK testpmd and l3fwd tests in
> > > > > > FD.io as well. This is to catch any performance differences due to VPP
> > > > > > updating its DPDK version.
> > > > >
> > > > >
> > > > >
> > > > > > We're running both l3fwd tests and testpmd tests. The Altra servers are
> > > > > > two-socket and the topology is TG -> DUT1 -> DUT2 -> TG. Traffic flows
> > > > > > in both directions, but nothing gets forwarded (with a slight caveat -
> > > > > > put a pin in this). There's nothing special in the tests, just
> > > > > > forwarding traffic. The NIC we're testing is xl710-QDA2.
> > > > >
> > > > >
> > > > >
> > > > > > The same tests are passing on all other testbeds - we have various
> > > > > > two-node (1 DUT, 1 TG) and three-node (2 DUT, 1 TG) Intel and Arm
> > > > > > testbeds with various NICs (Intel 700 and 800 series; the Intel
> > > > > > testbeds use some Mellanox NICs as well). We don't have another
> > > > > > three-node topology with the same NIC though, so it looks like
> > > > > > something specific to testpmd/l3fwd and xl710-QDA2 on Altra servers.
> > > > >
> > > > >
> > > > >
> > > > > > VPP performance tests are passing, but l3fwd and testpmd fail. This
> > > > > > leads us to believe that it's a software issue, but there could be
> > > > > > something wrong with the hardware. I'll talk about testpmd from now on,
> > > > > > but as far as we can tell, the behavior is the same for testpmd and
> > > > > > l3fwd.
> > > > >
> > > > >
> > > > >
> > > > > > Getting back to the caveat mentioned earlier, there seems to be
> > > > > > something wrong with port shutdown. When running testpmd on a testbed
> > > > > > that hasn't been used for a while, it seems that all ports are up right
> > > > > > away (we don't see any "Port 0|1: link state change event") and the
> > > > > > setup works fine (forwarding works). After restarting testpmd
> > > > > > (restarting on one server is sufficient), the ports between DUT1 and
> > > > > > DUT2 (but not between DUTs and TG) go down and are not usable in DPDK,
> > > > > > VPP or in Linux (with i40e kernel driver) for a while (measured in
> > > > > > minutes, sometimes tens of minutes; the duration is seemingly random).
> > > > > > The ports eventually recover and can be used again, but there's nothing
> > > > > > in syslog suggesting what happened.
> > > > >
> > > > >
> > > > >
> > > > > > What seems to be happening is that testpmd puts the ports into some
> > > > > > faulty state. This only happens on the DUT1 -> DUT2 link though (the
> > > > > > ports between the two testpmds), not on the TG -> DUT1 link (the TG
> > > > > > port is left alone).
> > > > >
> > > > >
> > > > >
> > > > > Some more info:
> > > > >
> > > > > We've come across the issue with this configuration:
> > > > >
> > > > > > OS: Ubuntu 20.04 with kernel 5.4.0-65-generic.
> > > > > >
> > > > > > Old NIC firmware, never upgraded: 6.01 0x800035da 1.1747.0.
> > > > > >
> > > > > > Driver versions: i40e 2.17.15 and iavf 4.3.19.
> > > > >
> > > > >
> > > > >
> > > > > As well as with this configuration:
> > > > >
> > > > > > OS: Ubuntu 22.04 with kernel 5.15.0-46-generic.
> > > > >
> > > > > Updated firmware: 8.30 0x8000a4ae 1.2926.0.
> > > > >
> > > > > Drivers: i40e 2.19.3 and iavf 4.5.3.
> > > > >
> > > > >
> > > > >
> > > > > Unsafe noiommu mode is disabled:
> > > > >
> > > > > cat /sys/module/vfio/parameters/enable_unsafe_noiommu_mode
> > > > >
> > > > > N
> > > > >
> > > > >
> > > > >
> > > > > > We used DPDK 22.07 in manual testing and built it on the DUTs, using a
> > > > > > generic build:
> > > > >
> > > > > meson -Dexamples=l3fwd -Dc_args=-DRTE_LIBRTE_I40E_16BYTE_RX_DESC=y
> > > > > -Dplatform=generic build
> > > > >
> > > > >
> > > > >
> > > > > We're running testpmd with this command:
> > > > >
> > > > > > sudo build/app/dpdk-testpmd -v -l 1,2 -a 0004:04:00.1 -a 0004:04:00.0
> > > > > > --in-memory -- -i --forward-mode=io --burst=64 --txq=1 --rxq=1
> > > > > > --tx-offloads=0x0 --numa --auto-start --total-num-mbufs=32768
> > > > > > --nb-ports=2 --portmask=0x3 --max-pkt-len=1518 --mbuf-size=16384
> > > > > > --nb-cores=1
> > > > >
> > > > >
> > > > >
> > > > > > And l3fwd (with different MACs on the other server):
> > > > > >
> > > > > > sudo /tmp/openvpp-testing/dpdk/build/examples/dpdk-l3fwd -v -l 1,2 -a
> > > > > > 0004:04:00.0 -a 0004:04:00.1 --in-memory -- --parse-ptype
> > > > > > --eth-dest="0,40:a6:b7:85:e7:79" --eth-dest="1,3c:fd:fe:c3:e7:a1"
> > > > > > --config="(0, 0, 2),(1, 0, 2)" -P -L -p 0x3
> > > > >
> > > > >
> > > > >
> > > > > > We tried adding logs with --log-level=pmd,debug and --no-lsc-interrupt,
> > > > > > but that didn't reveal anything helpful, as far as we can tell - please
> > > > > > have a look at the attached log. The faulty port is port0 (it starts
> > > > > > out as down, then we waited for around 25 minutes for it to go up and
> > > > > > then we shut down testpmd).
> > > > >
> > > > >
> > > > >
> > > > > > We'd like to ask for pointers on what could be the cause or how to
> > > > > > debug this issue further.
> > > > >
> > > > >
> > > > >
> > > > > Thanks,
> > > > > Juraj
>
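
The recovery delay described in the original report (minutes, sometimes
tens of minutes, seemingly random) could be quantified with a small poller.
What follows is a minimal sketch, not something taken from this thread. It
assumes the affected port has been rebound to the i40e kernel driver and
brought administratively up, so that it appears under /sys/class/net. The
helper names read_operstate and wait_for_link_up are made up for
illustration.

```python
import time
from pathlib import Path


def read_operstate(iface, sysfs_root="/sys/class/net"):
    """Return the kernel's operstate for iface ("up", "down", ...).

    Returns "absent" if the interface has no sysfs entry, e.g. because the
    port is still bound to a DPDK-compatible driver such as vfio-pci.
    """
    path = Path(sysfs_root) / iface / "operstate"
    try:
        return path.read_text().strip()
    except OSError:
        return "absent"


def wait_for_link_up(iface, timeout_s=3600.0, interval_s=1.0,
                     sysfs_root="/sys/class/net"):
    """Poll operstate until it reads "up".

    Returns the elapsed time in seconds, or None if timeout_s passes first.
    The interface must be administratively up (ip link set <iface> up),
    otherwise operstate stays "down" regardless of the physical link.
    """
    start = time.monotonic()
    while True:
        if read_operstate(iface, sysfs_root) == "up":
            return time.monotonic() - start
        if time.monotonic() - start >= timeout_s:
            return None
        time.sleep(interval_s)
```

For example, calling wait_for_link_up("enP4p4s0f0") right after stopping
testpmd and rebinding the port would give the recovery time in seconds; the
interface name here is hypothetical. Collecting that number over several
restarts would show whether the delay correlates with anything (firmware,
which DUT was restarted, how long testpmd ran).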
