DPDK usage discussions
From: Stephen Hemminger <stephen@networkplumber.org>
To: Gabor LENCSE <lencse@hit.bme.hu>
Cc: users@dpdk.org
Subject: Re: rte_exit() does not terminate the program -- is it a bug or a new feature?
Date: Fri, 15 Sep 2023 14:33:05 -0700	[thread overview]
Message-ID: <20230915143305.0ac313c6@hermes.local> (raw)
In-Reply-To: <35b55d11-bb67-2363-6f0a-0fb9667ebe6d@hit.bme.hu>

On Fri, 15 Sep 2023 20:28:44 +0200
Gabor LENCSE <lencse@hit.bme.hu> wrote:

> Dear Stephen,
> 
> Thank you very much for your answer!
> 
> > Please get a backtrace. A simple way is to attach gdb to that process.  
> 
> I have recompiled siitperf with the "-g" compiler option and ran it 
> under gdb. When the program hung, I pressed Ctrl-C and issued a "bt" 
> command, which, of course, displayed the call stack of the main thread. 
> Then I collected some information with the "info threads" command, 
> switched to each available thread in turn, and issued a "bt" command 
> for those that represented my send() and receive() functions (I 
> identified them by their LWP numbers). Here are the results:
> 
> root@x033:~/siitperf# gdb ./build/siitperf-tp
> GNU gdb (Ubuntu 12.1-0ubuntu1~22.04) 12.1
> Copyright (C) 2022 Free Software Foundation, Inc.
> License GPLv3+: GNU GPL version 3 or later 
> <http://gnu.org/licenses/gpl.html>
> This is free software: you are free to change and redistribute it.
> There is NO WARRANTY, to the extent permitted by law.
> Type "show copying" and "show warranty" for details.
> This GDB was configured as "x86_64-linux-gnu".
> Type "show configuration" for configuration details.
> For bug reporting instructions, please see:
> <https://www.gnu.org/software/gdb/bugs/>.
> Find the GDB manual and other documentation resources online at:
>      <http://www.gnu.org/software/gdb/documentation/>.
> 
> For help, type "help".
> Type "apropos word" to search for commands related to "word"...
> Reading symbols from ./build/siitperf-tp...
> (gdb) set args 84 8000000 60 2000 2 2
> (gdb) run
> Starting program: /root/siitperf/build/siitperf-tp 84 8000000 60 2000 2 2
> [Thread debugging using libthread_db enabled]
> Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
> EAL: Detected CPU lcores: 56
> EAL: Detected NUMA nodes: 4
> EAL: Detected shared linkage of DPDK
> [New Thread 0x7ffff49c0640 (LWP 24747)]
> EAL: Multi-process socket /var/run/dpdk/rte/mp_socket
> [New Thread 0x7ffff41bf640 (LWP 24748)]
> EAL: Selected IOVA mode 'PA'
> EAL: No free 2048 kB hugepages reported on node 0
> EAL: No free 2048 kB hugepages reported on node 1
> EAL: No free 2048 kB hugepages reported on node 2
> EAL: No free 2048 kB hugepages reported on node 3
> EAL: No available 2048 kB hugepages reported
> EAL: VFIO support initialized
> [New Thread 0x7ffff39be640 (LWP 24749)]
> [New Thread 0x7ffff31bd640 (LWP 24750)]
> [New Thread 0x7ffff29bc640 (LWP 24751)]
> [New Thread 0x7ffff21bb640 (LWP 24752)]
> EAL: Probe PCI driver: net_ice (8086:159b) device: 0000:98:00.0 (socket 2)
> ice_load_pkg_type(): Active package is: 1.3.26.0, ICE OS Default Package 
> (single VLAN mode)
> EAL: Probe PCI driver: net_ice (8086:159b) device: 0000:98:00.1 (socket 2)
> ice_load_pkg_type(): Active package is: 1.3.26.0, ICE OS Default Package 
> (single VLAN mode)
> [New Thread 0x7ffff19ba640 (LWP 24753)]
> TELEMETRY: No legacy callbacks, legacy socket not created
> ice_set_rx_function(): Using AVX2 Vector Rx (port 0).
> ice_set_rx_function(): Using AVX2 Vector Rx (port 1).
> Info: Left port and Left Sender CPU core belong to the same NUMA node: 2
> Info: Right port and Right Receiver CPU core belong to the same NUMA node: 2
> Info: Right port and Right Sender CPU core belong to the same NUMA node: 2
> Info: Left port and Left Receiver CPU core belong to the same NUMA node: 2
> Info: Testing initiated at 2023-09-15 18:06:05
> Reverse frames received: 394340224
> Forward frames received: 421381420
> Info: Forward sender's sending took 70.3073795726 seconds.
> EAL: Error - exiting with code: 1
>    Cause: Forward sending exceeded the 60.0006000000 seconds limit, the 
> test is invalid.
> Info: Reverse sender's sending took 74.9384769772 seconds.
> EAL: Error - exiting with code: 1
>    Cause: Reverse sending exceeded the 60.0006000000 seconds limit, the 
> test is invalid.
> ^C
> Thread 1 "siitperf-tp" received signal SIGINT, Interrupt.
> 0x00007ffff7d99dd2 in rte_eal_wait_lcore () from 
> /lib/x86_64-linux-gnu/librte_eal.so.22
> (gdb) bt
> #0  0x00007ffff7d99dd2 in rte_eal_wait_lcore () from 
> /lib/x86_64-linux-gnu/librte_eal.so.22
> #1  0x000055555559929e in Throughput::measure (this=0x7fffffffe300, 
> leftport=0, rightport=1) at throughput.cc:3743
> #2  0x0000555555557b20 in main (argc=7, argv=0x7fffffffe5b8) at 
> main-tp.cc:34
> (gdb) info threads
>    Id   Target Id                                           Frame
> * 1    Thread 0x7ffff77cac00 (LWP 24744) "siitperf-tp" 
> 0x00007ffff7d99dd2 in rte_eal_wait_lcore ()
>     from /lib/x86_64-linux-gnu/librte_eal.so.22
>    2    Thread 0x7ffff49c0640 (LWP 24747) "eal-intr-thread" 
> 0x00007ffff7a32fde in epoll_wait (epfd=6, events=0x7ffff49978d0,
>      maxevents=3, timeout=-1) at ../sysdeps/unix/sysv/linux/epoll_wait.c:30
>    3    Thread 0x7ffff41bf640 (LWP 24748) "rte_mp_handle" 
> __recvmsg_syscall (flags=0, msg=0x7ffff41965c0, fd=9)
>      at ../sysdeps/unix/sysv/linux/recvmsg.c:27
>    4    Thread 0x7ffff39be640 (LWP 24749) "lcore-worker-1" 
> 0x00007ffff7d99dd2 in rte_eal_wait_lcore ()
>     from /lib/x86_64-linux-gnu/librte_eal.so.22
>    5    Thread 0x7ffff31bd640 (LWP 24750) "lcore-worker-5" 
> __GI___libc_read (nbytes=1, buf=0x7ffff31947cf, fd=40)
>      at ../sysdeps/unix/sysv/linux/read.c:26
>    6    Thread 0x7ffff29bc640 (LWP 24751) "lcore-worker-9" 
> 0x00007ffff7d99dd2 in rte_eal_wait_lcore ()
>     from /lib/x86_64-linux-gnu/librte_eal.so.22
>    7    Thread 0x7ffff21bb640 (LWP 24752) "lcore-worker-13" 
> __GI___libc_read (nbytes=1, buf=0x7ffff21927cf, fd=48)
>      at ../sysdeps/unix/sysv/linux/read.c:26
>    8    Thread 0x7ffff19ba640 (LWP 24753) "telemetry-v2" 
> 0x00007ffff7a3460f in __libc_accept (fd=58, addr=..., len=0x0)
>      at ../sysdeps/unix/sysv/linux/accept.c:26
> (gdb) thread 1
> [Switching to thread 1 (Thread 0x7ffff77cac00 (LWP 24744))]
> #0  0x00007ffff7d99dd2 in rte_eal_wait_lcore () from 
> /lib/x86_64-linux-gnu/librte_eal.so.22
> (gdb) thread 2
> [Switching to thread 2 (Thread 0x7ffff49c0640 (LWP 24747))]
> #0  0x00007ffff7a32fde in epoll_wait (epfd=6, events=0x7ffff49978d0, 
> maxevents=3, timeout=-1)
>      at ../sysdeps/unix/sysv/linux/epoll_wait.c:30
> 30      ../sysdeps/unix/sysv/linux/epoll_wait.c: No such file or directory.
> (gdb) thread 3
> [Switching to thread 3 (Thread 0x7ffff41bf640 (LWP 24748))]
> #0  __recvmsg_syscall (flags=0, msg=0x7ffff41965c0, fd=9) at 
> ../sysdeps/unix/sysv/linux/recvmsg.c:27
> 27      ../sysdeps/unix/sysv/linux/recvmsg.c: No such file or directory.
> (gdb) thread 4
> [Switching to thread 4 (Thread 0x7ffff39be640 (LWP 24749))]
> #0  0x00007ffff7d99dd2 in rte_eal_wait_lcore () from 
> /lib/x86_64-linux-gnu/librte_eal.so.22
> (gdb) bt
> #0  0x00007ffff7d99dd2 in rte_eal_wait_lcore () from 
> /lib/x86_64-linux-gnu/librte_eal.so.22
> #1  0x00007ffff7d99f97 in rte_eal_mp_wait_lcore () from 
> /lib/x86_64-linux-gnu/librte_eal.so.22
> #2  0x00007ffff7da99ee in rte_service_finalize () from 
> /lib/x86_64-linux-gnu/librte_eal.so.22
> #3  0x00007ffff7db0404 in rte_eal_cleanup () from 
> /lib/x86_64-linux-gnu/librte_eal.so.22
> #4  0x00007ffff7d9d0b7 in rte_exit () from 
> /lib/x86_64-linux-gnu/librte_eal.so.22
> #5  0x000055555558e685 in send (par=0x7fffffffde00) at throughput.cc:1562
> #6  0x00007ffff7d94a18 in ?? () from /lib/x86_64-linux-gnu/librte_eal.so.22
> #7  0x00007ffff79a1b43 in start_thread (arg=<optimized out>) at 
> ./nptl/pthread_create.c:442
> #8  0x00007ffff7a33a00 in clone3 () at 
> ../sysdeps/unix/sysv/linux/x86_64/clone3.S:81
> (gdb) thread 5
> [Switching to thread 5 (Thread 0x7ffff31bd640 (LWP 24750))]
> #0  __GI___libc_read (nbytes=1, buf=0x7ffff31947cf, fd=40) at 
> ../sysdeps/unix/sysv/linux/read.c:26
> 26      ../sysdeps/unix/sysv/linux/read.c: No such file or directory.
> (gdb) bt
> #0  __GI___libc_read (nbytes=1, buf=0x7ffff31947cf, fd=40) at 
> ../sysdeps/unix/sysv/linux/read.c:26
> #1  __GI___libc_read (fd=40, buf=0x7ffff31947cf, nbytes=1) at 
> ../sysdeps/unix/sysv/linux/read.c:24
> #2  0x00007ffff7d9490c in ?? () from /lib/x86_64-linux-gnu/librte_eal.so.22
> #3  0x00007ffff79a1b43 in start_thread (arg=<optimized out>) at 
> ./nptl/pthread_create.c:442
> #4  0x00007ffff7a33a00 in clone3 () at 
> ../sysdeps/unix/sysv/linux/x86_64/clone3.S:81
> (gdb) thread 6
> [Switching to thread 6 (Thread 0x7ffff29bc640 (LWP 24751))]
> #0  0x00007ffff7d99dd2 in rte_eal_wait_lcore () from 
> /lib/x86_64-linux-gnu/librte_eal.so.22
> (gdb) bt
> #0  0x00007ffff7d99dd2 in rte_eal_wait_lcore () from 
> /lib/x86_64-linux-gnu/librte_eal.so.22
> #1  0x00007ffff7d99f97 in rte_eal_mp_wait_lcore () from 
> /lib/x86_64-linux-gnu/librte_eal.so.22
> #2  0x00007ffff7da99ee in rte_service_finalize () from 
> /lib/x86_64-linux-gnu/librte_eal.so.22
> #3  0x00007ffff7db0404 in rte_eal_cleanup () from 
> /lib/x86_64-linux-gnu/librte_eal.so.22
> #4  0x00007ffff7d9d0b7 in rte_exit () from 
> /lib/x86_64-linux-gnu/librte_eal.so.22
> #5  0x000055555558e685 in send (par=0x7fffffffde80) at throughput.cc:1562
> #6  0x00007ffff7d94a18 in ?? () from /lib/x86_64-linux-gnu/librte_eal.so.22
> #7  0x00007ffff79a1b43 in start_thread (arg=<optimized out>) at 
> ./nptl/pthread_create.c:442
> #8  0x00007ffff7a33a00 in clone3 () at 
> ../sysdeps/unix/sysv/linux/x86_64/clone3.S:81
> (gdb) thread 7
> [Switching to thread 7 (Thread 0x7ffff21bb640 (LWP 24752))]
> #0  __GI___libc_read (nbytes=1, buf=0x7ffff21927cf, fd=48) at 
> ../sysdeps/unix/sysv/linux/read.c:26
> 26      in ../sysdeps/unix/sysv/linux/read.c
> (gdb) bt
> #0  __GI___libc_read (nbytes=1, buf=0x7ffff21927cf, fd=48) at 
> ../sysdeps/unix/sysv/linux/read.c:26
> #1  __GI___libc_read (fd=48, buf=0x7ffff21927cf, nbytes=1) at 
> ../sysdeps/unix/sysv/linux/read.c:24
> #2  0x00007ffff7d9490c in ?? () from /lib/x86_64-linux-gnu/librte_eal.so.22
> #3  0x00007ffff79a1b43 in start_thread (arg=<optimized out>) at 
> ./nptl/pthread_create.c:442
> #4  0x00007ffff7a33a00 in clone3 () at 
> ../sysdeps/unix/sysv/linux/x86_64/clone3.S:81
> (gdb) thread 8
> [Switching to thread 8 (Thread 0x7ffff19ba640 (LWP 24753))]
> #0  0x00007ffff7a3460f in __libc_accept (fd=58, addr=..., len=0x0) at 
> ../sysdeps/unix/sysv/linux/accept.c:26
> 26      ../sysdeps/unix/sysv/linux/accept.c: No such file or directory.
> (gdb) thread 9
> Unknown thread 9.
> (gdb)
> 
> Some additional information from the siitperf.conf file:
> 
> CPU-L-Send 1 # Left Sender runs on this core
> CPU-R-Recv 5 # Right Receiver runs on this core
> CPU-R-Send 9 # Right Sender runs on this core
> CPU-L-Recv 13 # Left Receiver runs on this core
> 
> Therefore, the send() functions are the ones still running, on CPU 
> cores 1 and 9. And they fully utilize their cores (as does the main 
> program on core 0); I checked this earlier.
> 
> This is a -- perhaps -- relevant part from the code of the main program:
> 
>        // wait until active senders and receivers finish
>        if ( forward ) {
>          rte_eal_wait_lcore(cpu_left_sender);
>          rte_eal_wait_lcore(cpu_right_receiver);
>        }
>        if ( reverse ) {
>          rte_eal_wait_lcore(cpu_right_sender);
>          rte_eal_wait_lcore(cpu_left_receiver);
>        }
> 
> It seems as if the two send() functions, as well as the main program, 
> are busy-waiting inside rte_eal_wait_lcore(), but I have no idea why. 
> If the sender that sent frames in the forward direction is there, and 
> the main program is there too, then, IMHO, the 
> rte_eal_wait_lcore(cpu_left_sender) call should return.
> 
> Am I wrong?
> 
> > I suspect that since rte_exit() calls the internal eal_cleanup function,
> > and that calls close in the driver, the ICE driver close function has
> > a bug. Perhaps the ice close function does not correctly handle the
> > case where the device has not started.  
> 
> Yes, your hypothesis was confirmed: both of my send() functions were 
> stuck in the rte_eal_cleanup() function. :-)
> 
> However, I am not sure which device you meant. But I think the devices 
> are initialized properly, because the program executes (and finishes) 
> successfully if I halve the frame rate:
> 
> root@x033:~/siitperf# ./build/siitperf-tp 84 4000000 6 2000 2 2
> EAL: Detected CPU lcores: 56
> EAL: Detected NUMA nodes: 4
> EAL: Detected shared linkage of DPDK
> EAL: Multi-process socket /var/run/dpdk/rte/mp_socket
> EAL: Selected IOVA mode 'PA'
> EAL: No free 2048 kB hugepages reported on node 0
> EAL: No free 2048 kB hugepages reported on node 1
> EAL: No free 2048 kB hugepages reported on node 2
> EAL: No free 2048 kB hugepages reported on node 3
> EAL: No available 2048 kB hugepages reported
> EAL: VFIO support initialized
> EAL: Probe PCI driver: net_ice (8086:159b) device: 0000:98:00.0 (socket 2)
> ice_load_pkg_type(): Active package is: 1.3.26.0, ICE OS Default Package 
> (single VLAN mode)
> EAL: Probe PCI driver: net_ice (8086:159b) device: 0000:98:00.1 (socket 2)
> ice_load_pkg_type(): Active package is: 1.3.26.0, ICE OS Default Package 
> (single VLAN mode)
> TELEMETRY: No legacy callbacks, legacy socket not created
> ice_set_rx_function(): Using AVX2 Vector Rx (port 0).
> ice_set_rx_function(): Using AVX2 Vector Rx (port 1).
> Info: Left port and Left Sender CPU core belong to the same NUMA node: 2
> Info: Right port and Right Receiver CPU core belong to the same NUMA node: 2
> Info: Right port and Right Sender CPU core belong to the same NUMA node: 2
> Info: Left port and Left Receiver CPU core belong to the same NUMA node: 2
> Info: Testing initiated at 2023-09-15 17:43:11
> Info: Reverse sender's sending took 5.9999998420 seconds.
> Reverse frames sent: 24000000
> Info: Forward sender's sending took 5.9999999023 seconds.
> Forward frames sent: 24000000
> Forward frames received: 24000000
> Reverse frames received: 24000000
> Info: Test finished.
> root@x033:~/siitperf#
> 
> (The only problem with this workaround is that I want to use a binary 
> search to determine the performance limit of the tester, and if the 
> tester does not stop, my bash shell script waits for it forever.)
> 
> So, what should I do next?

I am not sure what the tx and rx polling loops look like in your
application, but they need some way of being forced to exit, and you need
to set that flag before calling rte_exit().

See the l2fwd example and its force_quit flag for how this is usually done.


Thread overview: 8+ messages
2023-09-15  8:24 Gabor LENCSE
2023-09-15 15:06 ` Stephen Hemminger
2023-09-15 18:28   ` Gabor LENCSE
2023-09-15 21:33     ` Stephen Hemminger [this message]
2023-09-17 19:37       ` Gabor LENCSE
2023-09-17 21:27         ` Stephen Hemminger
2023-09-18 18:23           ` Gabor LENCSE
2023-09-18 18:49             ` Stephen Hemminger
