* DPDK:20.11.1: net/ena crash while fetching xstats
@ 2022-04-18 8:33 Amiya Mohakud
  2022-04-18 15:18 ` Amiya Mohakud
  0 siblings, 1 reply; 8+ messages in thread
From: Amiya Mohakud @ 2022-04-18 8:33 UTC
To: mk, dev; +Cc: Amiya Mohakud, Sachin Kanoje

Hi Michal/DPDK-Experts,

I am facing an issue in the net/ena driver while fetching extended stats
(xstats). DPDK segfaults with the backtrace below.

DPDK Version: 20.11.1
ENA version: 2.2.1

Using host libthread_db library "/lib64/libthread_db.so.1".
Core was generated by `/opt/dpfs/usr/local/bin/brdagent'.
Program terminated with signal SIGSEGV, Segmentation fault.
#0  __memmove_avx_unaligned_erms () at ../sysdeps/x86_64/multiarch/memmove-vec-unaligned-erms.S:232
232     VMOVU %VEC(0), (%rdi)
[Current thread is 1 (Thread 0x7fffed93a400 (LWP 5060))]

Thread 1 (Thread 0x7fffed93a400 (LWP 5060)):
#0  __memmove_avx_unaligned_erms () at ../sysdeps/x86_64/multiarch/memmove-vec-unaligned-erms.S:232
#1  0x00007ffff3c246df in ena_com_handle_admin_completion () from ../lib64/../../lib64/libdpdk.so.20
#2  0x00007ffff3c1e7f5 in ena_interrupt_handler_rte () from ../lib64/../../lib64/libdpdk.so.20
#3  0x00007ffff3519902 in eal_intr_thread_main () from /../lib64/../../lib64/libdpdk.so.20
#4  0x00007ffff510714a in start_thread (arg=<optimized out>) at pthread_create.c:479
#5  0x00007ffff561ff23 in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95

*Background:*

- This used to work fine with DPDK 19.11.3 (no crash was observed with
  that version), but after upgrading to DPDK 20.11.1, DPDK crashes with
  the above trace.
- It looks to me like a DPDK issue.
- I can see multiple fixes/patches in the net/ena area, but I am not able
  to identify which patch would fix this issue.

For example:
http://git.dpdk.org/dpdk/diff/?h=releases&id=aab58857330bb4bd03f6699bf1ee716f72993774
https://inbox.dpdk.org/dev/20210430125725.28796-6-mk@semihalf.com/T/#me99457c706718bb236d1fd8006ee7a0319ce76fc

Could you please help and let me know which patch could fix this issue?

Regards
Amiya

* Re: DPDK:20.11.1: net/ena crash while fetching xstats
  2022-04-18 8:33 DPDK:20.11.1: net/ena crash while fetching xstats Amiya Mohakud
@ 2022-04-18 15:18 ` Amiya Mohakud
  2022-04-19 12:10 ` Michał Krawczyk
  0 siblings, 1 reply; 8+ messages in thread
From: Amiya Mohakud @ 2022-04-18 15:18 UTC
To: dev; +Cc: Sachin Kanoje, Megha Punjani, Sharad Saha, Eswar Sadaram

+ Megha, Sharad and Eswar.

On Mon, Apr 18, 2022 at 2:03 PM Amiya Mohakud <amohakud@paloaltonetworks.com> wrote:

> Hi Michal/DPDK-Experts,
>
> I am facing one issue in net/ena driver while fetching extended stats
> (xstats). The DPDK seems to segfault with below backtrace.
>
> DPDK Version: 20.11.1
> ENA version: 2.2.1
> [...]

* Re: DPDK:20.11.1: net/ena crash while fetching xstats
  2022-04-18 15:18 ` Amiya Mohakud
@ 2022-04-19 12:10 ` Michał Krawczyk
  2022-04-19 15:01 ` Stephen Hemminger
  0 siblings, 1 reply; 8+ messages in thread
From: Michał Krawczyk @ 2022-04-19 12:10 UTC
To: Amiya Mohakud
Cc: dev, Sachin Kanoje, Megha Punjani, Sharad Saha, Eswar Sadaram, Brandes, Shai, ena-dev

On Mon, 18 Apr 2022 at 17:19, Amiya Mohakud <amohakud@paloaltonetworks.com> wrote:
>
> + Megha, Sharad and Eswar.
>
> On Mon, Apr 18, 2022 at 2:03 PM Amiya Mohakud <amohakud@paloaltonetworks.com> wrote:
>>
>> Hi Michal/DPDK-Experts,
>>
>> I am facing one issue in net/ena driver while fetching extended stats (xstats). The DPDK seems to segfault with below backtrace.
>> [...]
>> Could you please help here and let me know what patch could fix this issue.

+ Shai Brandes and ena-dev

Hi Amiya,

Thanks for reaching out. Could you please provide us with more details
regarding the reproduction? I cannot reproduce this on my setup with
DPDK v20.11.1 when using testpmd and probing for the xstats.

=======================================================================
[ec2-user@<removed> dpdk]$ sudo ./build/app/dpdk-testpmd -- -i
EAL: Detected 8 lcore(s)
EAL: Detected 1 NUMA nodes
EAL: Detected static linkage of DPDK
EAL: Multi-process socket /var/run/dpdk/rte/mp_socket
EAL: Selected IOVA mode 'PA'
EAL: No available hugepages reported in hugepages-1048576kB
EAL: Probing VFIO support...
EAL: Invalid NUMA socket, default to 0
EAL: Invalid NUMA socket, default to 0
EAL: Probe PCI driver: net_ena (1d0f:ec20) device: 0000:00:06.0 (socket 0)
EAL: No legacy callbacks, legacy socket not created
Interactive-mode selected
ena_mtu_set(): Set MTU: 1500

testpmd: create a new mbuf pool <mb_pool_0>: n=203456, size=2176, socket=0
testpmd: preferred mempool ops selected: ring_mp_mc

Warning! port-topology=paired and odd forward ports number, the last port will pair with itself.

Configuring Port 0 (socket 0)
Port 0: <removed>
Checking link statuses...
Done
Error during enabling promiscuous mode for port 0: Operation not supported - ignore
testpmd> start
io packet forwarding - ports=1 - cores=1 - streams=1 - NUMA support enabled, MP allocation mode: native
Logical Core 1 (socket 0) forwards packets on 1 streams:
  RX P=0/Q=0 (socket 0) -> TX P=0/Q=0 (socket 0) peer=02:00:00:00:00:00

  io packet forwarding packets/burst=32
  nb forwarding cores=1 - nb forwarding ports=1
  port 0: RX queue number: 1 Tx queue number: 1
    Rx offloads=0x0 Tx offloads=0x0
    RX queue: 0
      RX desc=0 - RX free threshold=0
      RX threshold registers: pthresh=0 hthresh=0 wthresh=0
      RX Offloads=0x0
    TX queue: 0
      TX desc=0 - TX free threshold=0
      TX threshold registers: pthresh=0 hthresh=0 wthresh=0
      TX offloads=0x0 - TX RS bit threshold=0
testpmd> show port xstats 0
###### NIC extended statistics for port 0
rx_good_packets: 1
tx_good_packets: 1
rx_good_bytes: 42
tx_good_bytes: 42
rx_missed_errors: 0
rx_errors: 0
tx_errors: 0
rx_mbuf_allocation_errors: 0
rx_q0_packets: 1
rx_q0_bytes: 42
rx_q0_errors: 0
tx_q0_packets: 1
tx_q0_bytes: 42
wd_expired: 0
dev_start: 1
dev_stop: 0
tx_drops: 0
bw_in_allowance_exceeded: 0
bw_out_allowance_exceeded: 0
pps_allowance_exceeded: 0
conntrack_allowance_exceeded: 0
linklocal_allowance_exceeded: 0
rx_q0_cnt: 1
rx_q0_bytes: 42
rx_q0_refill_partial: 0
rx_q0_bad_csum: 0
rx_q0_mbuf_alloc_fail: 0
rx_q0_bad_desc_num: 0
rx_q0_bad_req_id: 0
tx_q0_cnt: 1
tx_q0_bytes: 42
tx_q0_prepare_ctx_err: 0
tx_q0_linearize: 0
tx_q0_linearize_failed: 0
tx_q0_tx_poll: 1
tx_q0_doorbells: 1
tx_q0_bad_req_id: 0
tx_q0_available_desc: 1022
=======================================================================

I think you are seeing this regression because of the new xstats (ENI
limiters), which were added after DPDK v19.11 (mainline commit:
45718ada5fa12619db4821646ba964a2df365c68), but I'm not sure why you are
hitting it.

In particular, I have a few questions:

1. Is your application single-process or multi-process? If it is
   multi-process, from which process are you probing for the xstats?
2. Have you tried running the latest DPDK v20.11 LTS?
3. What kernel module are you using (igb_uio/vfio-pci)?
4. On what AWS instance type was it reproduced?
5. Does the segfault happen the first time you request the xstats?

If you have any other information that could be useful, please share it;
it will help us find the cause of the issue.

Thanks,
Michal

* Re: DPDK:20.11.1: net/ena crash while fetching xstats
  2022-04-19 12:10 ` Michał Krawczyk
@ 2022-04-19 15:01 ` Stephen Hemminger
  2022-04-19 20:27 ` Michał Krawczyk
  0 siblings, 1 reply; 8+ messages in thread
From: Stephen Hemminger @ 2022-04-19 15:01 UTC
To: Michał Krawczyk
Cc: Amiya Mohakud, dev, Sachin Kanoje, Megha Punjani, Sharad Saha, Eswar Sadaram, Brandes, Shai, ena-dev

On Tue, 19 Apr 2022 14:10:23 +0200
Michał Krawczyk <mk@semihalf.com> wrote:

> Thanks for reaching me out. Could you please provide us with more
> details regarding the reproduction? I cannot reproduce this on my
> setup for DPDK v20.11.1 when using testpmd and probing for the xstats.
> [...]
> 1. Is the application you're using the single-process or multiprocess?
> If so, from which process are you probing for the xstats?
> [...]

Try getting xstats in secondary process.
I think that is where the bug was found.

* Re: DPDK:20.11.1: net/ena crash while fetching xstats
  2022-04-19 15:01 ` Stephen Hemminger
@ 2022-04-19 20:27 ` Michał Krawczyk
  2022-04-19 22:25 ` Stephen Hemminger
  2022-04-19 23:09 ` Stephen Hemminger
  0 siblings, 2 replies; 8+ messages in thread
From: Michał Krawczyk @ 2022-04-19 20:27 UTC
To: Stephen Hemminger
Cc: Amiya Mohakud, dev, Sachin Kanoje, Megha Punjani, Sharad Saha, Eswar Sadaram, Brandes, Shai, ena-dev

On Tue, 19 Apr 2022 at 17:01, Stephen Hemminger <stephen@networkplumber.org> wrote:
>
> Try getting xstats in secondary process.
> I think that is where the bug was found.

Thanks Stephen, indeed the issue reproduces in the secondary process.

Basically, ENA v2.2.1 is not MP aware, meaning it cannot be used safely
from a secondary process. The main obstacle is the admin queue, which
processes the hardware requests and can be used safely only from the
primary process. It's not strictly a bug, as we weren't exposing
'MP Awareness' in the PMD features list; it's more a lack of proper MP
support.

The latest ENA PMD release should be MP safe. We currently don't have a
PMD backport ready for the older LTS release (but we're planning to do
so for ENA v2.6.0 on the amzn-drivers repository:
https://github.com/amzn/amzn-drivers/tree/master/userspace/dpdk).

Here is the list of patches, added across the ENA PMD releases, that are
connected with the MP support:

net/ena: make ethdev references multi-process safe
aab58857330bb4bd03f6699bf1ee716f72993774

net/ena: disable ops not supported by secondary process
39ecdd3dfa15d5ac591ce8d77d362480bff32355

net/ena: proxy AQ calls to primary process (this is the critical patch)
e3595539e0e03f0dbb81904f8edaaef0447a4f62

net/ena: enable stats for multi-process mode
3aa3fa851f58873457bdc5c387d0e5956f812322

net/ena/base: make IO memzone unique per port
850e1bb1c72b3d1163b2857ab7a02af11ba29c40

Thanks,
Michal

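To illustrate the idea behind the "proxy AQ calls to primary process" patch named above, here is a minimal, self-contained C sketch. It is not the ENA PMD's code: every name in it (`proc_type`, `fake_primary`, `fake_ipc_request`, `fake_read_counter`) is hypothetical, and the "IPC" is modelled as a plain function call so the example can run on its own. It only shows the shape of the approach: the admin queue is touched in one place, owned by the primary, and a secondary reaches it through a request instead of accessing it directly.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the DPDK process role. */
enum proc_type { PROC_PRIMARY, PROC_SECONDARY };

/* Hypothetical state owned by the primary process. */
struct fake_primary {
    uint64_t eni_counter;  /* value that only an admin-queue command can read */
    unsigned aq_commands;  /* admin-queue commands issued so far */
    unsigned proxied;      /* requests that arrived from a secondary */
};

/* Hypothetical reply carried back to the caller. */
struct fake_reply {
    int status;
    uint64_t value;
};

/* Runs in the primary only: the single place that issues admin-queue commands. */
static struct fake_reply fake_primary_read_counter(struct fake_primary *p)
{
    assert(p != NULL);
    struct fake_reply rep = { 0, p->eni_counter };
    p->aq_commands++;
    return rep;
}

/* Stand-in for an IPC round trip from a secondary to the primary. */
static struct fake_reply fake_ipc_request(struct fake_primary *p)
{
    p->proxied++;
    return fake_primary_read_counter(p);
}

/* Entry point usable from either process role; returns 0 or a negative errno. */
static int fake_read_counter(enum proc_type self, struct fake_primary *p,
                             uint64_t *out)
{
    struct fake_reply rep;

    if (p == NULL || out == NULL)
        return -EINVAL;

    /* The primary calls the handler directly; a secondary goes through the proxy. */
    rep = (self == PROC_PRIMARY) ? fake_primary_read_counter(p)
                                 : fake_ipc_request(p);
    if (rep.status != 0)
        return rep.status;

    *out = rep.value;
    return 0;
}
```

In DPDK itself the round trip would presumably go through the EAL multi-process IPC API (for example `rte_mp_request_sync()`), and the process role would come from `rte_eal_process_type()`; how the actual patch is structured is not stated in this thread.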
* Re: DPDK:20.11.1: net/ena crash while fetching xstats
  2022-04-19 20:27 ` Michał Krawczyk
@ 2022-04-19 22:25 ` Stephen Hemminger
  2022-04-19 23:09 ` Stephen Hemminger
  1 sibling, 0 replies; 8+ messages in thread
From: Stephen Hemminger @ 2022-04-19 22:25 UTC
To: Michał Krawczyk
Cc: Amiya Mohakud, dev, Sachin Kanoje, Megha Punjani, Sharad Saha, Eswar Sadaram, Brandes, Shai, ena-dev

On Tue, 19 Apr 2022 22:27:32 +0200
Michał Krawczyk <mk@semihalf.com> wrote:

> Thanks Stephen, indeed the issue reproduces in the secondary process.
>
> Basically ENA v2.2.1 is not MP aware, meaning it cannot be used safely
> from the secondary process. The main obstacle is the admin queue which
> is used for processing the hardware requests which can be used safely
> only from the primary process. It's not strictly a bug, as we weren't
> exposing 'MP Awareness' in the PMD features list, it's more like a
> lack of proper MP support.

Driver should report error. Not crash.
Could you fix that?

* Re: DPDK:20.11.1: net/ena crash while fetching xstats
From: Stephen Hemminger @ 2022-04-19 23:09 UTC
To: Michał Krawczyk
Cc: Amiya Mohakud, dev, Sachin Kanoje, Megha Punjani, Sharad Saha, Eswar Sadaram, Brandes, Shai, ena-dev

On Tue, 19 Apr 2022 22:27:32 +0200
Michał Krawczyk <mk@semihalf.com> wrote:

> Thanks Stephen, indeed the issue reproduces in the secondary process.
>
> Basically ENA v2.2.1 is not MP aware, meaning it cannot be used safely
> from the secondary process. The main obstacle is the admin queue which
> is used for processing the hardware requests which can be used safely
> only from the primary process. It's not strictly a bug, as we weren't
> exposing 'MP Awareness' in the PMD features list, it's more like a
> lack of proper MP support.
>
> The latest ENA PMD release should be MP safe. We currently don't have
> PMD backport ready for the older LTS release (but we're planning to do
> so for ENA v2.6.0 on the amzn-drivers repository:
> https://github.com/amzn/amzn-drivers/tree/master/userspace/dpdk).

I wish that ENA did not have its own versioning scheme.
Driver versions are meaningful only to the driver writer/vendor; they
don't help the end user.

Since backporting is not part of the stable process, I suggest doing what
XDP did for 21.11 and earlier releases.

diff --git a/drivers/net/ena/ena_ethdev.c b/drivers/net/ena/ena_ethdev.c
index 634c97acf60d..3778349f3fe9 100644
--- a/drivers/net/ena/ena_ethdev.c
+++ b/drivers/net/ena/ena_ethdev.c
@@ -3212,6 +3212,12 @@ static int ena_rx_queue_intr_disable(struct rte_eth_dev *dev,
 static int eth_ena_pci_probe(struct rte_pci_driver *pci_drv __rte_unused,
 	struct rte_pci_device *pci_dev)
 {
+	if (rte_eal_process_type() == RTE_PROC_SECONDARY) {
+		PMD_INIT_LOG(ERR,
+			"Ena PMD does not support secondary processes\n");
+		return -ENOTSUP;
+	}
+
 	return rte_eth_dev_pci_generic_probe(pci_dev,
 		sizeof(struct ena_adapter), eth_ena_dev_init);
 }

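The control flow of the probe-time guard in the patch above can be modelled without DPDK. This is only a minimal sketch under stated assumptions: `enum proc_type` and `model_pci_probe()` are hypothetical stand-ins, not driver code. In the real driver the check is `rte_eal_process_type() == RTE_PROC_SECONDARY` inside `eth_ena_pci_probe()`, and the success path continues into `rte_eth_dev_pci_generic_probe()`.

```c
#include <errno.h>
#include <stdio.h>

/* Hypothetical stand-in for DPDK's enum rte_proc_type_t. */
enum proc_type {
	PROC_PRIMARY,
	PROC_SECONDARY
};

/*
 * Models the guard added to eth_ena_pci_probe(): a secondary process gets
 * an error at probe time instead of a device it cannot use safely.
 * A return of 0 stands in for the normal probe path.
 */
int model_pci_probe(enum proc_type type)
{
	if (type == PROC_SECONDARY) {
		fprintf(stderr, "ENA PMD does not support secondary processes\n");
		return -ENOTSUP;
	}
	return 0;
}
```

In this model a secondary process receives `-ENOTSUP` from the probe, which is the "report an error, do not crash" behaviour asked for earlier in the thread.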
* Re: DPDK:20.11.1: net/ena crash while fetching xstats
From: Amiya Mohakud @ 2022-04-20 8:37 UTC
To: Stephen Hemminger
Cc: Michał Krawczyk, dev, Sachin Kanoje, Megha Punjani, Sharad Saha, Eswar Sadaram, Brandes, Shai, ena-dev

Hi Stephen and Michal,

Thanks a lot for all the discussion and progress made on this. I appreciate it.
Sorry for the late reply. To answer your questions:

1. Is the application you're using the single-process or multiprocess?
If so, from which process are you probing for the xstats?
>> The system has both primary and secondary processes running, but the
stats are being fetched from the *primary process* only. I'm not sure
whether the mere presence of secondary processes can cause the crash even
when stats are fetched only from the primary process. Can we confirm this
from the code?

2. Have you tried running latest DPDK v20.11 LTS?
>> It's DPDK v20.11.1. I did not try with the latest 20.11 LTS.

3. What kernel module are you using (igb_uio/vfio-pci)?
>> It's igb_uio.

4. On what AWS instance type it was reproduced?
>> It's c5n.2xlarge (8 cores; 1 primary process and 6 secondary processes).

5. Is the Seg Fault happening the first time you call for the xstats?
>> Yes, that's correct.

Regards
Amiya

On Wed, Apr 20, 2022 at 4:39 AM Stephen Hemminger <stephen@networkplumber.org> wrote:

> On Tue, 19 Apr 2022 22:27:32 +0200
> Michał Krawczyk <mk@semihalf.com> wrote:
>
> > Thanks Stephen, indeed the issue reproduces in the secondary process.
> >
> > Basically ENA v2.2.1 is not MP aware, meaning it cannot be used safely
> > from the secondary process. The main obstacle is the admin queue which
> > is used for processing the hardware requests which can be used safely
> > only from the primary process. It's not strictly a bug, as we weren't
> > exposing 'MP Awareness' in the PMD features list, it's more like a
> > lack of proper MP support.
> >
> > The latest ENA PMD release should be MP safe. We currently don't have
> > PMD backport ready for the older LTS release (but we're planning to do
> > so for ENA v2.6.0 on the amzn-drivers repository:
> > https://github.com/amzn/amzn-drivers/tree/master/userspace/dpdk).
>
> I wish that ENA did not have its own versioning scheme.
> Driver versions are meaningful only to the driver writer/vendor; they
> don't help the end user.
>
> Since backporting is not part of the stable process, I suggest doing what
> XDP did for 21.11 and earlier releases.
>
> diff --git a/drivers/net/ena/ena_ethdev.c b/drivers/net/ena/ena_ethdev.c
> index 634c97acf60d..3778349f3fe9 100644
> --- a/drivers/net/ena/ena_ethdev.c
> +++ b/drivers/net/ena/ena_ethdev.c
> @@ -3212,6 +3212,12 @@ static int ena_rx_queue_intr_disable(struct rte_eth_dev *dev,
>  static int eth_ena_pci_probe(struct rte_pci_driver *pci_drv __rte_unused,
>  	struct rte_pci_device *pci_dev)
>  {
> +	if (rte_eal_process_type() == RTE_PROC_SECONDARY) {
> +		PMD_INIT_LOG(ERR,
> +			"Ena PMD does not support secondary processes\n");
> +		return -ENOTSUP;
> +	}
> +
>  	return rte_eth_dev_pci_generic_probe(pci_dev,
>  		sizeof(struct ena_adapter), eth_ena_dev_init);
>  }

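Until a driver with proper MP support is in use, one way to act on the explanation quoted above from the application side is to make sure xstats are requested only from the primary process. The sketch below is an illustration under assumptions, not code from the thread: `enum proc_type`, `xstats_fetch_fn`, `guarded_xstats_get()` and `sample_fetch()` are hypothetical names. In a real application the process type would come from `rte_eal_process_type()` and the callback would wrap `rte_eth_xstats_get()`. It does not settle whether idle secondary processes alone can trigger the crash.

```c
#include <errno.h>
#include <stdint.h>

/* Hypothetical stand-in for DPDK's enum rte_proc_type_t. */
enum proc_type {
	PROC_PRIMARY,
	PROC_SECONDARY
};

/* Hypothetical callback type; a real one would wrap rte_eth_xstats_get(). */
typedef int (*xstats_fetch_fn)(uint16_t port_id, uint64_t *values,
			       unsigned int n);

/*
 * Calls the fetch callback only in the primary process and returns
 * -ENOTSUP otherwise, so the driver is never entered from a secondary.
 */
int guarded_xstats_get(enum proc_type self, xstats_fetch_fn fetch,
		       uint16_t port_id, uint64_t *values, unsigned int n)
{
	if (self != PROC_PRIMARY)
		return -ENOTSUP;
	return fetch(port_id, values, n);
}

/*
 * Sample callback for illustration only: fills each slot with its index
 * and reports how many values were written.
 */
int sample_fetch(uint16_t port_id, uint64_t *values, unsigned int n)
{
	(void)port_id;
	for (unsigned int i = 0; i < n; i++)
		values[i] = i;
	return (int)n;
}
```

Routing every xstats call site through one wrapper like this also gives a single place to log the calling process type, which may help confirm where the requests really originate.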
Thread overview: 8+ messages
2022-04-18 8:33 DPDK:20.11.1: net/ena crash while fetching xstats Amiya Mohakud
2022-04-18 15:18 ` Amiya Mohakud
2022-04-19 12:10 ` Michał Krawczyk
2022-04-19 15:01 ` Stephen Hemminger
2022-04-19 20:27 ` Michał Krawczyk
2022-04-19 22:25 ` Stephen Hemminger
2022-04-19 23:09 ` Stephen Hemminger
2022-04-20 8:37 ` Amiya Mohakud