Date: Tue, 19 Apr 2022 15:25:39 -0700
From: Stephen Hemminger
To: Michał Krawczyk
Cc: Amiya Mohakud, dev, Sachin Kanoje, Megha Punjani, Sharad Saha, Eswar Sadaram, "Brandes, Shai", ena-dev
Subject: Re: DPDK:20.11.1: net/ena crash while fetching xstats
Message-ID: <20220419152539.3f39a704@hermes.local>
References: <20220419080150.2511dee2@hermes.local>
List-Id: DPDK patches and discussions

On Tue, 19 Apr 2022 22:27:32 +0200
Michał Krawczyk wrote:

> On Tue, 19 Apr 2022 at 17:01, Stephen Hemminger wrote:
> >
> > On Tue, 19 Apr 2022 14:10:23 +0200
> > Michał Krawczyk wrote:
> >
> > > On Mon, 18 Apr 2022 at 17:19, Amiya Mohakud wrote:
> > > >
> > > > + Megha, Sharad and Eswar.
> > > >
> > > > On Mon, Apr 18, 2022 at 2:03 PM Amiya Mohakud wrote:
> > > >>
> > > >> Hi Michal/DPDK-Experts,
> > > >>
> > > >> I am facing an issue in the net/ena driver while fetching extended
> > > >> stats (xstats). DPDK seems to segfault with the backtrace below.
> > > >>
> > > >> DPDK Version: 20.11.1
> > > >> ENA version: 2.2.1
> > > >>
> > > >> Using host libthread_db library "/lib64/libthread_db.so.1".
> > > >> Core was generated by `/opt/dpfs/usr/local/bin/brdagent'.
> > > >> Program terminated with signal SIGSEGV, Segmentation fault.
> > > >> #0 __memmove_avx_unaligned_erms () at ../sysdeps/x86_64/multiarch/memmove-vec-unaligned-erms.S:232
> > > >> 232 VMOVU %VEC(0), (%rdi)
> > > >> [Current thread is 1 (Thread 0x7fffed93a400 (LWP 5060))]
> > > >>
> > > >> Thread 1 (Thread 0x7fffed93a400 (LWP 5060)):
> > > >> #0 __memmove_avx_unaligned_erms () at ../sysdeps/x86_64/multiarch/memmove-vec-unaligned-erms.S:232
> > > >> #1 0x00007ffff3c246df in ena_com_handle_admin_completion () from ../lib64/../../lib64/libdpdk.so.20
> > > >> #2 0x00007ffff3c1e7f5 in ena_interrupt_handler_rte () from ../lib64/../../lib64/libdpdk.so.20
> > > >> #3 0x00007ffff3519902 in eal_intr_thread_main () from /../lib64/../../lib64/libdpdk.so.20
> > > >> #4 0x00007ffff510714a in start_thread (arg=) at pthread_create.c:479
> > > >> #5 0x00007ffff561ff23 in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95
> > > >>
> > > >> Background:
> > > >>
> > > >> This used to work fine with DPDK 19.11.3 (no crash was observed
> > > >> with that version), but after upgrading to DPDK 20.11.1, DPDK
> > > >> crashes with the above trace.
> > > >> It looks to me like a DPDK issue.
> > > >> I can see multiple fixes/patches in the net/ena area, but I am not
> > > >> able to identify which patch would fix this issue.
> > > >>
> > > >> For example: http://git.dpdk.org/dpdk/diff/?h=releases&id=aab58857330bb4bd03f6699bf1ee716f72993774
> > > >> https://inbox.dpdk.org/dev/20210430125725.28796-6-mk@semihalf.com/T/#me99457c706718bb236d1fd8006ee7a0319ce76fc
> > > >>
> > > >> Could you please help here and let me know which patch could fix this issue?
> > > >>
> > >
> > > + Shai Brandes and ena-dev
> > >
> > > Hi Amiya,
> > >
> > > Thanks for reaching out. Could you please provide us with more
> > > details regarding the reproduction? I cannot reproduce this on my
> > > setup for DPDK v20.11.1 when using testpmd and probing for the xstats.
> > >
> > > =======================================================================
> > > [ec2-user@ dpdk]$ sudo ./build/app/dpdk-testpmd -- -i
> > > EAL: Detected 8 lcore(s)
> > > EAL: Detected 1 NUMA nodes
> > > EAL: Detected static linkage of DPDK
> > > EAL: Multi-process socket /var/run/dpdk/rte/mp_socket
> > > EAL: Selected IOVA mode 'PA'
> > > EAL: No available hugepages reported in hugepages-1048576kB
> > > EAL: Probing VFIO support...
> > > EAL: Invalid NUMA socket, default to 0
> > > EAL: Invalid NUMA socket, default to 0
> > > EAL: Probe PCI driver: net_ena (1d0f:ec20) device: 0000:00:06.0 (socket 0)
> > > EAL: No legacy callbacks, legacy socket not created
> > > Interactive-mode selected
> > > ena_mtu_set(): Set MTU: 1500
> > >
> > > testpmd: create a new mbuf pool : n=203456, size=2176, socket=0
> > > testpmd: preferred mempool ops selected: ring_mp_mc
> > >
> > > Warning! port-topology=paired and odd forward ports number, the last
> > > port will pair with itself.
> > >
> > > Configuring Port 0 (socket 0)
> > > Port 0:
> > > Checking link statuses...
> > > Done
> > > Error during enabling promiscuous mode for port 0: Operation not
> > > supported - ignore
> > > testpmd> start
> > > io packet forwarding - ports=1 - cores=1 - streams=1 - NUMA support
> > > enabled, MP allocation mode: native
> > > Logical Core 1 (socket 0) forwards packets on 1 streams:
> > > RX P=0/Q=0 (socket 0) -> TX P=0/Q=0 (socket 0) peer=02:00:00:00:00:00
> > >
> > > io packet forwarding packets/burst=32
> > > nb forwarding cores=1 - nb forwarding ports=1
> > > port 0: RX queue number: 1 Tx queue number: 1
> > > Rx offloads=0x0 Tx offloads=0x0
> > > RX queue: 0
> > > RX desc=0 - RX free threshold=0
> > > RX threshold registers: pthresh=0 hthresh=0 wthresh=0
> > > RX Offloads=0x0
> > > TX queue: 0
> > > TX desc=0 - TX free threshold=0
> > > TX threshold registers: pthresh=0 hthresh=0 wthresh=0
> > > TX offloads=0x0 - TX RS bit threshold=0
> > > testpmd> show port xstats 0
> > > ###### NIC extended statistics for port 0
> > > rx_good_packets: 1
> > > tx_good_packets: 1
> > > rx_good_bytes: 42
> > > tx_good_bytes: 42
> > > rx_missed_errors: 0
> > > rx_errors: 0
> > > tx_errors: 0
> > > rx_mbuf_allocation_errors: 0
> > > rx_q0_packets: 1
> > > rx_q0_bytes: 42
> > > rx_q0_errors: 0
> > > tx_q0_packets: 1
> > > tx_q0_bytes: 42
> > > wd_expired: 0
> > > dev_start: 1
> > > dev_stop: 0
> > > tx_drops: 0
> > > bw_in_allowance_exceeded: 0
> > > bw_out_allowance_exceeded: 0
> > > pps_allowance_exceeded: 0
> > > conntrack_allowance_exceeded: 0
> > > linklocal_allowance_exceeded: 0
> > > rx_q0_cnt: 1
> > > rx_q0_bytes: 42
> > > rx_q0_refill_partial: 0
> > > rx_q0_bad_csum: 0
> > > rx_q0_mbuf_alloc_fail: 0
> > > rx_q0_bad_desc_num: 0
> > > rx_q0_bad_req_id: 0
> > > tx_q0_cnt: 1
> > > tx_q0_bytes: 42
> > > tx_q0_prepare_ctx_err: 0
> > > tx_q0_linearize: 0
> > > tx_q0_linearize_failed: 0
> > > tx_q0_tx_poll: 1
> > > tx_q0_doorbells: 1
> > > tx_q0_bad_req_id: 0
> > > tx_q0_available_desc: 1022
> > > =======================================================================
> > >
> > > I think you are seeing the regression because of the new xstats (ENI
> > > limiters), which were added after DPDK v19.11 (mainline commit:
> > > 45718ada5fa12619db4821646ba964a2df365c68), but I'm not sure why you
> > > are hitting it.
> > >
> > > In particular, I have a few questions:
> > >
> > > 1. Is the application you're using single-process or multi-process?
> > > If multi-process, from which process are you probing for the xstats?
> > > 2. Have you tried running the latest DPDK v20.11 LTS?
> > > 3. What kernel module are you using (igb_uio/vfio-pci)?
> > > 4. On what AWS instance type was it reproduced?
> > > 5. Does the segfault happen the first time you request the xstats?
> > >
> > > If you've got any other information which could be useful, please
> > > share it; it will help us find the cause of the issue.
> > >
> > > Thanks,
> > > Michal
> > >
> > > >>
> > > >> Regards
> > > >> Amiya
> >
> > Try getting xstats in secondary process.
> > I think that is where the bug was found.
>
> Thanks Stephen, indeed the issue reproduces in the secondary process.
>
> Basically ENA v2.2.1 is not MP aware, meaning it cannot be used safely
> from the secondary process. The main obstacle is the admin queue, which
> processes the hardware requests and can be used safely only from the
> primary process. It's not strictly a bug, as we weren't exposing
> 'MP Awareness' in the PMD features list; it's more a lack of proper
> MP support.

The driver should report an error, not crash. Could you fix that?
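
Roughly what I have in mind is a guard in front of anything that goes through the admin queue. The sketch below is self-contained and is not a patch against the driver: the demo_* names, the enum and the struct are made up for illustration, and the five counters just stand in for the ENI limiters. In the PMD the check would presumably be rte_eal_process_type() != RTE_PROC_PRIMARY.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the EAL process type. In a real PMD the check
 * would presumably be rte_eal_process_type() != RTE_PROC_PRIMARY. */
enum demo_proc_type { DEMO_PROC_PRIMARY, DEMO_PROC_SECONDARY };

#define DEMO_ENI_STATS_NUM 5

/* Hypothetical device context; this is not the real ENA adapter layout. */
struct demo_dev {
	enum demo_proc_type proc_type;
	uint64_t eni_stats[DEMO_ENI_STATS_NUM]; /* pretend hardware counters */
};

/* Stand-in for a request that would go through the admin queue. */
static void demo_admin_get_eni_stats(const struct demo_dev *dev,
				     uint64_t *out, size_t n)
{
	for (size_t i = 0; i < n; i++)
		out[i] = dev->eni_stats[i];
}

/* Returns the number of counters written, or -ENOTSUP when the caller is
 * not the primary process and the admin queue must not be touched. */
int demo_xstats_get(const struct demo_dev *dev, uint64_t *out, size_t n)
{
	assert(dev != NULL && out != NULL);

	/* Refuse early instead of issuing an admin command that cannot
	 * complete safely from this process. */
	if (dev->proc_type != DEMO_PROC_PRIMARY)
		return -ENOTSUP;

	if (n > DEMO_ENI_STATS_NUM)
		n = DEMO_ENI_STATS_NUM;
	demo_admin_get_eni_stats(dev, out, n);
	return (int)n;
}
```

With something like this, a secondary process asking for xstats gets a negative errno back and the output buffer is left untouched, which the application can log or ignore.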