DPDK patches and discussions
From: Stephen Hemminger <stephen@networkplumber.org>
To: "Michał Krawczyk" <mk@semihalf.com>
Cc: Amiya Mohakud <amohakud@paloaltonetworks.com>, dev <dev@dpdk.org>,
	Sachin Kanoje <skanoje@paloaltonetworks.com>,
	Megha Punjani <mpunjani@paloaltonetworks.com>,
	Sharad Saha <ssaha@paloaltonetworks.com>,
	Eswar Sadaram <esadaram@paloaltonetworks.com>,
	"Brandes, Shai" <shaibran@amazon.com>,
	ena-dev <ena-dev@semihalf.com>
Subject: Re: DPDK:20.11.1: net/ena crash while fetching xstats
Date: Tue, 19 Apr 2022 15:25:39 -0700	[thread overview]
Message-ID: <20220419152539.3f39a704@hermes.local> (raw)
In-Reply-To: <CAJMMOfPpvuBPJT2BihhSQSt9Rwx5cFphpKmfRAN71QJ6VMAxrw@mail.gmail.com>

On Tue, 19 Apr 2022 22:27:32 +0200
Michał Krawczyk <mk@semihalf.com> wrote:

> Tue, 19 Apr 2022 at 17:01 Stephen Hemminger
> <stephen@networkplumber.org> wrote:
> >
> > On Tue, 19 Apr 2022 14:10:23 +0200
> > Michał Krawczyk <mk@semihalf.com> wrote:
> >  
> > > Mon, 18 Apr 2022 at 17:19 Amiya Mohakud
> > > <amohakud@paloaltonetworks.com> wrote:  
> > > >
> > > > + Megha, Sharad and Eswar.
> > > >
> > > > On Mon, Apr 18, 2022 at 2:03 PM Amiya Mohakud <amohakud@paloaltonetworks.com> wrote:  
> > > >>
> > > >> Hi Michal/DPDK-Experts,
> > > >>
> > > >> I am facing an issue in the net/ena driver while fetching extended stats (xstats). DPDK seems to segfault with the backtrace below.
> > > >>
> > > >> DPDK Version: 20.11.1
> > > >> ENA version: 2.2.1
> > > >>
> > > >>
> > > >> Using host libthread_db library "/lib64/libthread_db.so.1".
> > > >>
> > > >> Core was generated by `/opt/dpfs/usr/local/bin/brdagent'.
> > > >>
> > > >> Program terminated with signal SIGSEGV, Segmentation fault.
> > > >>
> > > >> #0  __memmove_avx_unaligned_erms () at ../sysdeps/x86_64/multiarch/memmove-vec-unaligned-erms.S:232
> > > >>
> > > >> 232             VMOVU   %VEC(0), (%rdi)
> > > >>
> > > >> [Current thread is 1 (Thread 0x7fffed93a400 (LWP 5060))]
> > > >>
> > > >>
> > > >> Thread 1 (Thread 0x7fffed93a400 (LWP 5060)):
> > > >>
> > > >> #0  __memmove_avx_unaligned_erms () at ../sysdeps/x86_64/multiarch/memmove-vec-unaligned-erms.S:232
> > > >>
> > > >> #1  0x00007ffff3c246df in ena_com_handle_admin_completion () from ../lib64/../../lib64/libdpdk.so.20
> > > >>
> > > >> #2  0x00007ffff3c1e7f5 in ena_interrupt_handler_rte () from ../lib64/../../lib64/libdpdk.so.20
> > > >>
> > > >> #3  0x00007ffff3519902 in eal_intr_thread_main () from /../lib64/../../lib64/libdpdk.so.20
> > > >>
> > > >> #4  0x00007ffff510714a in start_thread (arg=<optimized out>) at pthread_create.c:479
> > > >>
> > > >> #5  0x00007ffff561ff23 in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95
> > > >>
> > > >>
> > > >>
> > > >>
> > > >> Background:
> > > >>
> > > >> This used to work fine with DPDK 19.11.3; no crash was observed with that version. After upgrading to DPDK 20.11.1, DPDK crashes with the above trace.
> > > >> It looks to me like a DPDK issue.
> > > >> I can see multiple fixes/patches in the net/ena area, but I am not able to identify which patch would fix this exact issue.
> > > >>
> > > >> For example: http://git.dpdk.org/dpdk/diff/?h=releases&id=aab58857330bb4bd03f6699bf1ee716f72993774
> > > >> https://inbox.dpdk.org/dev/20210430125725.28796-6-mk@semihalf.com/T/#me99457c706718bb236d1fd8006ee7a0319ce76fc
> > > >>
> > > >>
> > > >> Could you please help here and let me know which patch could fix this issue?
> > > >>  
> > >
> > > + Shai Brandes and ena-dev
> > >
> > > Hi Amiya,
> > >
> > > Thanks for reaching out. Could you please provide us with more
> > > details regarding the reproduction? I cannot reproduce this on my
> > > setup with DPDK v20.11.1 when using testpmd and probing for the xstats.
> > >
> > > =======================================================================
> > > [ec2-user@<removed> dpdk]$ sudo ./build/app/dpdk-testpmd -- -i
> > > EAL: Detected 8 lcore(s)
> > > EAL: Detected 1 NUMA nodes
> > > EAL: Detected static linkage of DPDK
> > > EAL: Multi-process socket /var/run/dpdk/rte/mp_socket
> > > EAL: Selected IOVA mode 'PA'
> > > EAL: No available hugepages reported in hugepages-1048576kB
> > > EAL: Probing VFIO support...
> > > EAL:   Invalid NUMA socket, default to 0
> > > EAL:   Invalid NUMA socket, default to 0
> > > EAL: Probe PCI driver: net_ena (1d0f:ec20) device: 0000:00:06.0 (socket 0)
> > > EAL: No legacy callbacks, legacy socket not created
> > > Interactive-mode selected
> > > ena_mtu_set(): Set MTU: 1500
> > >
> > > testpmd: create a new mbuf pool <mb_pool_0>: n=203456, size=2176, socket=0
> > > testpmd: preferred mempool ops selected: ring_mp_mc
> > >
> > > Warning! port-topology=paired and odd forward ports number, the last
> > > port will pair with itself.
> > >
> > > Configuring Port 0 (socket 0)
> > > Port 0: <removed>
> > > Checking link statuses...
> > > Done
> > > Error during enabling promiscuous mode for port 0: Operation not
> > > supported - ignore  
> > > testpmd> start  
> > > io packet forwarding - ports=1 - cores=1 - streams=1 - NUMA support
> > > enabled, MP allocation mode: native
> > > Logical Core 1 (socket 0) forwards packets on 1 streams:
> > >   RX P=0/Q=0 (socket 0) -> TX P=0/Q=0 (socket 0) peer=02:00:00:00:00:00
> > >
> > >   io packet forwarding packets/burst=32
> > >   nb forwarding cores=1 - nb forwarding ports=1
> > >   port 0: RX queue number: 1 Tx queue number: 1
> > >     Rx offloads=0x0 Tx offloads=0x0
> > >     RX queue: 0
> > >       RX desc=0 - RX free threshold=0
> > >       RX threshold registers: pthresh=0 hthresh=0  wthresh=0
> > >       RX Offloads=0x0
> > >     TX queue: 0
> > >       TX desc=0 - TX free threshold=0
> > >       TX threshold registers: pthresh=0 hthresh=0  wthresh=0
> > >       TX offloads=0x0 - TX RS bit threshold=0  
> > > testpmd> show port xstats 0  
> > > ###### NIC extended statistics for port 0
> > > rx_good_packets: 1
> > > tx_good_packets: 1
> > > rx_good_bytes: 42
> > > tx_good_bytes: 42
> > > rx_missed_errors: 0
> > > rx_errors: 0
> > > tx_errors: 0
> > > rx_mbuf_allocation_errors: 0
> > > rx_q0_packets: 1
> > > rx_q0_bytes: 42
> > > rx_q0_errors: 0
> > > tx_q0_packets: 1
> > > tx_q0_bytes: 42
> > > wd_expired: 0
> > > dev_start: 1
> > > dev_stop: 0
> > > tx_drops: 0
> > > bw_in_allowance_exceeded: 0
> > > bw_out_allowance_exceeded: 0
> > > pps_allowance_exceeded: 0
> > > conntrack_allowance_exceeded: 0
> > > linklocal_allowance_exceeded: 0
> > > rx_q0_cnt: 1
> > > rx_q0_bytes: 42
> > > rx_q0_refill_partial: 0
> > > rx_q0_bad_csum: 0
> > > rx_q0_mbuf_alloc_fail: 0
> > > rx_q0_bad_desc_num: 0
> > > rx_q0_bad_req_id: 0
> > > tx_q0_cnt: 1
> > > tx_q0_bytes: 42
> > > tx_q0_prepare_ctx_err: 0
> > > tx_q0_linearize: 0
> > > tx_q0_linearize_failed: 0
> > > tx_q0_tx_poll: 1
> > > tx_q0_doorbells: 1
> > > tx_q0_bad_req_id: 0
> > > tx_q0_available_desc: 1022
> > > =======================================================================
> > >
> > > I think you may be seeing the regression because of the new xstats (ENI
> > > limiters), which were added after DPDK v19.11 (mainline commit:
> > > 45718ada5fa12619db4821646ba964a2df365c68), but I'm not sure yet why it
> > > triggers in your setup.
> > >
> > > In particular, I've got a few questions below.
> > >
> > > 1. Is the application you're using single-process or multi-process?
> > > If multi-process, from which process are you probing for the xstats?
> > > 2. Have you tried running the latest DPDK v20.11 LTS release?
> > > 3. What kernel module are you using (igb_uio/vfio-pci)?
> > > 4. On what AWS instance type was it reproduced?
> > > 5. Is the segfault happening the first time you query the xstats?
> > >
> > > If you've got any other information which could be useful, please
> > > share it; it will help us resolve the cause of the issue.
> > >
> > > Thanks,
> > > Michal
> > >  
> > > >>
> > > >> Regards
> > > >> Amiya  
> >
> > Try getting xstats in a secondary process.
> > I think that is where the bug was found.  
> 
> Thanks Stephen, indeed the issue reproduces in the secondary process.
> 
> Basically, ENA v2.2.1 is not MP aware, meaning it cannot be used safely
> from a secondary process. The main obstacle is the admin queue, which is
> used for processing hardware requests and which can be used safely only
> from the primary process. It's not strictly a bug, as we weren't
> advertising 'MP Awareness' in the PMD features list; it's more a lack of
> proper MP support.
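
For reference, a minimal sketch of the kind of secondary-process xstats
query under discussion; this is illustrative code only, not taken from this
thread or from the ena PMD. It assumes a primary process is already running
and owns port 0, and that this snippet is launched with the EAL option
--proc-type=secondary so it attaches to that primary.

#include <inttypes.h>
#include <stdio.h>
#include <stdlib.h>

#include <rte_eal.h>
#include <rte_ethdev.h>

int
main(int argc, char **argv)
{
    uint16_t port_id = 0;
    int i, n;

    /* EAL arguments are expected to contain --proc-type=secondary so this
     * process attaches to the already-running primary. */
    if (rte_eal_init(argc, argv) < 0) {
        fprintf(stderr, "EAL init failed\n");
        return 1;
    }

    /* Calling with NULL/0 only asks for the number of xstats. */
    n = rte_eth_xstats_get(port_id, NULL, 0);
    if (n <= 0) {
        fprintf(stderr, "no xstats reported for port %u\n",
                (unsigned int)port_id);
        return 1;
    }

    struct rte_eth_xstat *values = calloc(n, sizeof(*values));
    struct rte_eth_xstat_name *names = calloc(n, sizeof(*names));
    if (values == NULL || names == NULL)
        return 1;

    /* On ena v2.2.x some of these counters (the ENI limiters) are fetched
     * through the admin queue, which, per the explanation above, is only
     * safe to drive from the primary process. */
    n = rte_eth_xstats_get(port_id, values, n);
    rte_eth_xstats_get_names(port_id, names, n);

    for (i = 0; i < n; i++)
        printf("%s: %" PRIu64 "\n", names[i].name, values[i].value);

    free(values);
    free(names);
    rte_eal_cleanup();
    return 0;
}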

The driver should report an error, not crash.
Could you fix that?
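
To illustrate what "report an error, not crash" could look like at the PMD
level, here is a minimal sketch. It is an assumption on my part, not the
actual ena code nor the eventual upstream fix: the idea is simply to gate
any op that needs the admin queue on the EAL process type and fail cleanly
in secondary processes. Wiring the function into the dev_ops table is
omitted.

#include <errno.h>

#include <rte_eal.h>
#include <rte_ethdev.h>

/* Hypothetical guard, named example_* on purpose: not the real ena op. */
static int
example_xstats_get(struct rte_eth_dev *dev, struct rte_eth_xstat *xstats,
                   unsigned int n)
{
    /* The admin queue belongs to the primary process; refuse to issue
     * device commands from anywhere else instead of crashing. */
    if (rte_eal_process_type() != RTE_PROC_PRIMARY)
        return -ENOTSUP;

    /* ... primary-only path: query the device and fill xstats ... */
    (void)dev;
    (void)xstats;
    (void)n;
    return 0;
}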




Thread overview: 8+ messages
2022-04-18  8:33 Amiya Mohakud
2022-04-18 15:18 ` Amiya Mohakud
2022-04-19 12:10   ` Michał Krawczyk
2022-04-19 15:01     ` Stephen Hemminger
2022-04-19 20:27       ` Michał Krawczyk
2022-04-19 22:25         ` Stephen Hemminger [this message]
2022-04-19 23:09         ` Stephen Hemminger
2022-04-20  8:37           ` Amiya Mohakud
