Date: Tue, 19 Apr 2022 08:01:50 -0700
From: Stephen Hemminger
To: Michał Krawczyk
Cc: Amiya Mohakud, dev, Sachin Kanoje, Megha Punjani, Sharad Saha,
 Eswar Sadaram, "Brandes, Shai", ena-dev
Subject: Re: DPDK:20.11.1: net/ena crash while fetching xstats
Message-ID: <20220419080150.2511dee2@hermes.local>
List-Id: DPDK patches and discussions

On Tue, 19 Apr 2022 14:10:23 +0200, Michał Krawczyk wrote:

> On Mon, 18 Apr 2022 at 17:19, Amiya Mohakud wrote:
> >
> > + Megha, Sharad and Eswar.
> >
> > On Mon, Apr 18, 2022 at 2:03 PM Amiya Mohakud wrote:
> >>
> >> Hi Michal/DPDK-Experts,
> >>
> >> I am facing an issue in the net/ena driver while fetching extended
> >> stats (xstats). DPDK seems to segfault with the backtrace below.
> >>
> >> DPDK version: 20.11.1
> >> ENA version: 2.2.1
> >>
> >> Using host libthread_db library "/lib64/libthread_db.so.1".
> >>
> >> Core was generated by `/opt/dpfs/usr/local/bin/brdagent'.
> >>
> >> Program terminated with signal SIGSEGV, Segmentation fault.
> >>
> >> #0  __memmove_avx_unaligned_erms () at ../sysdeps/x86_64/multiarch/memmove-vec-unaligned-erms.S:232
> >> 232        VMOVU %VEC(0), (%rdi)
> >> [Current thread is 1 (Thread 0x7fffed93a400 (LWP 5060))]
> >>
> >> Thread 1 (Thread 0x7fffed93a400 (LWP 5060)):
> >> #0  __memmove_avx_unaligned_erms () at ../sysdeps/x86_64/multiarch/memmove-vec-unaligned-erms.S:232
> >> #1  0x00007ffff3c246df in ena_com_handle_admin_completion () from ../lib64/../../lib64/libdpdk.so.20
> >> #2  0x00007ffff3c1e7f5 in ena_interrupt_handler_rte () from ../lib64/../../lib64/libdpdk.so.20
> >> #3  0x00007ffff3519902 in eal_intr_thread_main () from /../lib64/../../lib64/libdpdk.so.20
> >> #4  0x00007ffff510714a in start_thread (arg=<optimized out>) at pthread_create.c:479
> >> #5  0x00007ffff561ff23 in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95
> >>
> >> Background:
> >>
> >> This used to work fine with DPDK 19.11.3: no crash was observed with
> >> that version. After upgrading to DPDK 20.11.1, DPDK crashes with the
> >> above trace.
> >> It looks to me like a DPDK issue.
> >> I can see multiple fixes/patches in the net/ena area, but I am not
> >> able to identify which patch would fix this issue.
> >>
> >> For example:
> >> http://git.dpdk.org/dpdk/diff/?h=releases&id=aab58857330bb4bd03f6699bf1ee716f72993774
> >> https://inbox.dpdk.org/dev/20210430125725.28796-6-mk@semihalf.com/T/#me99457c706718bb236d1fd8006ee7a0319ce76fc
> >>
> >> Could you please help here and let me know which patch would fix
> >> this issue.
> >>
>
> + Shai Brandes and ena-dev
>
> Hi Amiya,
>
> Thanks for reaching out. Could you please provide more details about
> the reproduction? I cannot reproduce this on my setup for DPDK
> v20.11.1 when using testpmd and probing for the xstats.
>
> ======================================================================
> [ec2-user@ dpdk]$ sudo ./build/app/dpdk-testpmd -- -i
> EAL: Detected 8 lcore(s)
> EAL: Detected 1 NUMA nodes
> EAL: Detected static linkage of DPDK
> EAL: Multi-process socket /var/run/dpdk/rte/mp_socket
> EAL: Selected IOVA mode 'PA'
> EAL: No available hugepages reported in hugepages-1048576kB
> EAL: Probing VFIO support...
> EAL: Invalid NUMA socket, default to 0
> EAL: Invalid NUMA socket, default to 0
> EAL: Probe PCI driver: net_ena (1d0f:ec20) device: 0000:00:06.0 (socket 0)
> EAL: No legacy callbacks, legacy socket not created
> Interactive-mode selected
> ena_mtu_set(): Set MTU: 1500
>
> testpmd: create a new mbuf pool : n=203456, size=2176, socket=0
> testpmd: preferred mempool ops selected: ring_mp_mc
>
> Warning! port-topology=paired and odd forward ports number, the last
> port will pair with itself.
>
> Configuring Port 0 (socket 0)
> Port 0:
> Checking link statuses...
> Done
> Error during enabling promiscuous mode for port 0: Operation not
> supported - ignore
> testpmd> start
> io packet forwarding - ports=1 - cores=1 - streams=1 - NUMA support
> enabled, MP allocation mode: native
> Logical Core 1 (socket 0) forwards packets on 1 streams:
>   RX P=0/Q=0 (socket 0) -> TX P=0/Q=0 (socket 0) peer=02:00:00:00:00:00
>
>   io packet forwarding packets/burst=32
>   nb forwarding cores=1 - nb forwarding ports=1
>   port 0: RX queue number: 1 Tx queue number: 1
>     Rx offloads=0x0 Tx offloads=0x0
>     RX queue: 0
>       RX desc=0 - RX free threshold=0
>       RX threshold registers: pthresh=0 hthresh=0 wthresh=0
>       RX Offloads=0x0
>     TX queue: 0
>       TX desc=0 - TX free threshold=0
>       TX threshold registers: pthresh=0 hthresh=0 wthresh=0
>       TX offloads=0x0 - TX RS bit threshold=0
> testpmd> show port xstats 0
> ###### NIC extended statistics for port 0
> rx_good_packets: 1
> tx_good_packets: 1
> rx_good_bytes: 42
> tx_good_bytes: 42
> rx_missed_errors: 0
> rx_errors: 0
> tx_errors: 0
> rx_mbuf_allocation_errors: 0
> rx_q0_packets: 1
> rx_q0_bytes: 42
> rx_q0_errors: 0
> tx_q0_packets: 1
> tx_q0_bytes: 42
> wd_expired: 0
> dev_start: 1
> dev_stop: 0
> tx_drops: 0
> bw_in_allowance_exceeded: 0
> bw_out_allowance_exceeded: 0
> pps_allowance_exceeded: 0
> conntrack_allowance_exceeded: 0
> linklocal_allowance_exceeded: 0
> rx_q0_cnt: 1
> rx_q0_bytes: 42
> rx_q0_refill_partial: 0
> rx_q0_bad_csum: 0
> rx_q0_mbuf_alloc_fail: 0
> rx_q0_bad_desc_num: 0
> rx_q0_bad_req_id: 0
> tx_q0_cnt: 1
> tx_q0_bytes: 42
> tx_q0_prepare_ctx_err: 0
> tx_q0_linearize: 0
> tx_q0_linearize_failed: 0
> tx_q0_tx_poll: 1
> tx_q0_doorbells: 1
> tx_q0_bad_req_id: 0
> tx_q0_available_desc: 1022
> ======================================================================
>
> I think that you
can see the regression because of the new xstats (the ENI
> limiters), which were added after DPDK v19.11 (mainline commit:
> 45718ada5fa12619db4821646ba964a2df365c68), but I'm not sure why you
> are hitting it.
>
> In particular, I've got a few questions:
>
> 1. Is the application you're using single-process or multi-process?
>    If multi-process, from which process are you probing for the xstats?
> 2. Have you tried running the latest DPDK v20.11 LTS?
> 3. What kernel module are you using (igb_uio/vfio-pci)?
> 4. On what AWS instance type was it reproduced?
> 5. Does the segfault happen the first time you fetch the xstats?
>
> If you've got any other information which could be useful, please
> share it; it will help us resolve the cause of the issue.
>
> Thanks,
> Michal
>
> >>
> >> Regards
> >> Amiya

Try getting the xstats in a secondary process. I think that is where
the bug was found.
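For what it's worth, the secondary-process check can be done with testpmd
itself, since DPDK's multi-process model is exposed through the EAL
`--proc-type` flag. This is only a sketch of one possible reproduction,
assuming an ENA port already bound for DPDK (the core lists, build path,
and device are illustrative, not taken from the report above):

```
# Terminal 1: start the primary process, which owns the device and
# sets up the shared memory/config that secondaries attach to.
sudo ./build/app/dpdk-testpmd -l 0-1 --proc-type=primary -- -i

# Terminal 2: attach a secondary process to the same running instance.
sudo ./build/app/dpdk-testpmd -l 2-3 --proc-type=secondary -- -i

# In the secondary testpmd prompt, probe the extended stats:
#   testpmd> show port xstats 0
```

If the bug is in the secondary-process xstats path, the crash would be
expected in the second terminal rather than the first; re-testing the
same steps on the latest 20.11 LTS would then show whether a backported
net/ena fix already covers it.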