DPDK usage discussions
* rte_exit() does not terminate the program -- is it a bug or a new feature?
@ 2023-09-15  8:24 Gabor LENCSE
  2023-09-15 15:06 ` Stephen Hemminger
  0 siblings, 1 reply; 8+ messages in thread
From: Gabor LENCSE @ 2023-09-15  8:24 UTC (permalink / raw)
  To: users


Dear DPDK Developers and Users,

I have run into the following issue with my RFC 8219 compliant SIIT and 
stateful NAT64/NAT44 tester, siitperf: 
https://github.com/lencsegabor/siitperf

Its main program starts two sending threads and two receiving threads on 
their exclusively used CPU cores using the rte_eal_remote_launch() 
function, e.g., the code is as follows:

           // start left sender
           if ( rte_eal_remote_launch(send, &spars1, cpu_left_sender) )
             std::cout << "Error: could not start Left Sender." << std::endl;

When the test frame sending is finished, the senders check the sending 
time, and if the allowed time was significantly exceeded, the sender 
gives an error message and terminates (itself and also the main program) 
using the rte_exit() function.

This is the code:

   elapsed_seconds = (double)(rte_rdtsc()-start_tsc)/hz;
   printf("Info: %s sender's sending took %3.10lf seconds.\n", side, elapsed_seconds);
   if ( elapsed_seconds > duration*TOLERANCE )
     rte_exit(EXIT_FAILURE, "%s sending exceeded the %3.10lf seconds limit, the test is invalid.\n", side, duration*TOLERANCE);
   printf("%s frames sent: %lu\n", side, sent_frames);

   return 0;

The above code worked as I expected when I used siitperf under Debian 
9.13 with DPDK 16.11.11-1+deb9u2. It always displayed the execution time 
of the test frame sending, and if the allowed time was significantly 
exceeded, it gave an error message and terminated, so the sender did not 
print out the number of sent frames. The main program was also terminated 
due to the call of this function: it did not write out the 
"Info: Test finished." message.

However, when I updated siitperf to use it on Ubuntu 22.04 with DPDK 
version "21.11.3-0ubuntu0.22.04.1 amd64", I experienced something 
rather strange:

In the case when the sending time is significantly exceeded, I get the 
following messages from the program (I copy the full output here, as it 
may be useful):

root@x033:~/siitperf# cat temp.out
EAL: Detected CPU lcores: 56
EAL: Detected NUMA nodes: 4
EAL: Detected shared linkage of DPDK
EAL: Multi-process socket /var/run/dpdk/rte/mp_socket
EAL: Selected IOVA mode 'PA'
EAL: No free 2048 kB hugepages reported on node 0
EAL: No free 2048 kB hugepages reported on node 1
EAL: No free 2048 kB hugepages reported on node 2
EAL: No free 2048 kB hugepages reported on node 3
EAL: No available 2048 kB hugepages reported
EAL: VFIO support initialized
EAL: Probe PCI driver: net_ice (8086:159b) device: 0000:98:00.0 (socket 2)
ice_load_pkg_type(): Active package is: 1.3.26.0, ICE OS Default Package 
(single VLAN mode)
EAL: Probe PCI driver: net_ice (8086:159b) device: 0000:98:00.1 (socket 2)
ice_load_pkg_type(): Active package is: 1.3.26.0, ICE OS Default Package 
(single VLAN mode)
TELEMETRY: No legacy callbacks, legacy socket not created
ice_set_rx_function(): Using AVX2 Vector Rx (port 0).
ice_set_rx_function(): Using AVX2 Vector Rx (port 1).
Info: Left port and Left Sender CPU core belong to the same NUMA node: 2
Info: Right port and Right Receiver CPU core belong to the same NUMA node: 2
Info: Right port and Right Sender CPU core belong to the same NUMA node: 2
Info: Left port and Left Receiver CPU core belong to the same NUMA node: 2
Info: Testing initiated at 2023-09-15 07:50:17
EAL: Error - exiting with code: 1
   Cause: Forward sending exceeded the 60.0006000000 seconds limit, the 
test is invalid.
EAL: Error - exiting with code: 1
   Cause: Reverse sending exceeded the 60.0006000000 seconds limit, the 
test is invalid.
root@x033:~/siitperf#

The rte_exit() function seems to work, as the error message appears and 
the number of sent frames is not displayed; however, the "Info: ..." 
message about the sending time (printed earlier in the code) is 
missing! This is rather strange!

What is worse, the program does not stop, but *the sender threads and 
the main program remain running (forever)*.

Here is the output of the "top" command:

top - 07:54:24 up 1 day, 14:12,  2 users, load average: 3.02, 2.41, 2.10
Tasks: 591 total,   2 running, 589 sleeping,   0 stopped,   0 zombie
%Cpu0  :100.0 us,  0.0 sy,  0.0 ni,  0.0 id,  0.0 wa,  0.0 hi, 0.0 si,  
0.0 st
%Cpu1  :100.0 us,  0.0 sy,  0.0 ni,  0.0 id,  0.0 wa,  0.0 hi, 0.0 si,  
0.0 st
%Cpu2  :  0.0 us,  0.0 sy,  0.0 ni, 94.1 id,  0.0 wa,  0.0 hi, 5.9 si,  
0.0 st
%Cpu3  :  0.0 us,  0.0 sy,  0.0 ni,100.0 id,  0.0 wa,  0.0 hi, 0.0 si,  
0.0 st
%Cpu4  :  0.0 us,  0.0 sy,  0.0 ni,100.0 id,  0.0 wa,  0.0 hi, 0.0 si,  
0.0 st
%Cpu5  :  0.0 us,  0.0 sy,  0.0 ni,100.0 id,  0.0 wa,  0.0 hi, 0.0 si,  
0.0 st
%Cpu6  :  0.0 us,  0.0 sy,  0.0 ni,100.0 id,  0.0 wa,  0.0 hi, 0.0 si,  
0.0 st
%Cpu7  :  0.0 us,  0.0 sy,  0.0 ni,100.0 id,  0.0 wa,  0.0 hi, 0.0 si,  
0.0 st
%Cpu8  :  0.0 us,  0.0 sy,  0.0 ni,100.0 id,  0.0 wa,  0.0 hi, 0.0 si,  
0.0 st
%Cpu9  :100.0 us,  0.0 sy,  0.0 ni,  0.0 id,  0.0 wa,  0.0 hi, 0.0 si,  
0.0 st
%Cpu10 :  0.0 us,  0.0 sy,  0.0 ni,100.0 id,  0.0 wa,  0.0 hi, 0.0 si,  
0.0 st
%Cpu11 :  0.0 us,  0.0 sy,  0.0 ni,100.0 id,  0.0 wa,  0.0 hi, 0.0 si,  
0.0 st
%Cpu12 :  0.0 us,  0.0 sy,  0.0 ni,100.0 id,  0.0 wa,  0.0 hi, 0.0 si,  
0.0 st
%Cpu13 :  0.0 us,  0.0 sy,  0.0 ni,100.0 id,  0.0 wa,  0.0 hi, 0.0 si,  
0.0 st

CPU0 is the main core; the left sender and the right sender use CPU1 and 
CPU9, respectively. (The receivers, which used CPU5 and CPU13, have 
already terminated due to their timeout.)

*Thus, rte_exit() behaves differently now: it used to terminate the 
main program, but now it does not.* (And it also suppresses some 
previously printed output.)

Is it a bug or a new feature?

*How could I achieve the old behavior?* (Or at least the termination of 
the main program by the sender threads?)

Thank you very much for your guidance!

Best regards,

Gábor Lencse




^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: rte_exit() does not terminate the program -- is it a bug or a new feature?
  2023-09-15  8:24 rte_exit() does not terminate the program -- is it a bug or a new feature? Gabor LENCSE
@ 2023-09-15 15:06 ` Stephen Hemminger
  2023-09-15 18:28   ` Gabor LENCSE
  0 siblings, 1 reply; 8+ messages in thread
From: Stephen Hemminger @ 2023-09-15 15:06 UTC (permalink / raw)
  To: Gabor LENCSE; +Cc: users

On Fri, 15 Sep 2023 10:24:01 +0200
Gabor LENCSE <lencse@hit.bme.hu> wrote:

> Dear DPDK Developers and Users,
> 
> I have run into the following issue with my RFC 8219 compliant SIIT and 
> stateful NAT64/NAT44 tester, siitperf: 
> https://github.com/lencsegabor/siitperf
> 
> Its main program starts two sending threads and two receiving threads on 
> their exclusively used CPU cores using the rte_eal_remote_launch() 
> function, e.g., the code is as follows:
> 
>            // start left sender
>            if ( rte_eal_remote_launch(send, &spars1, cpu_left_sender) )
>              std::cout << "Error: could not start Left Sender." << std::endl;
> 
> When the test frame sending is finished, the senders check the sending 
> time, and if the allowed time was significantly exceeded, the sender 
> gives an error message and terminates (itself and also the main program) 
> using the rte_exit() function.
> 
> This is the code:
> 
>    elapsed_seconds = (double)(rte_rdtsc()-start_tsc)/hz;
>    printf("Info: %s sender's sending took %3.10lf seconds.\n", side, elapsed_seconds);
>    if ( elapsed_seconds > duration*TOLERANCE )
>      rte_exit(EXIT_FAILURE, "%s sending exceeded the %3.10lf seconds limit, the test is invalid.\n", side, duration*TOLERANCE);
>    printf("%s frames sent: %lu\n", side, sent_frames);
> 
>    return 0;
> 
> The above code worked as I expected when I used siitperf under Debian 
> 9.13 with DPDK 16.11.11-1+deb9u2. It always displayed the execution time 
> of the test frame sending, and if the allowed time was significantly 
> exceeded, it gave an error message and terminated, so the sender did not 
> print out the number of sent frames. The main program was also terminated 
> due to the call of this function: it did not write out the 
> "Info: Test finished." message.
> 
> However, when I updated siitperf to use it with Ubuntu 22.04 with DPDK 
> version "21.11.3-0ubuntu0.22.04.1 amd64", then I experienced something 
> rather strange:
> 
> In the case, when the sending time is significantly exceeded, I get the 
> following messages from the program (I copy here the full output, as it 
> may be useful):
> 
> root@x033:~/siitperf# cat temp.out
> EAL: Detected CPU lcores: 56
> EAL: Detected NUMA nodes: 4
> EAL: Detected shared linkage of DPDK
> EAL: Multi-process socket /var/run/dpdk/rte/mp_socket
> EAL: Selected IOVA mode 'PA'
> EAL: No free 2048 kB hugepages reported on node 0
> EAL: No free 2048 kB hugepages reported on node 1
> EAL: No free 2048 kB hugepages reported on node 2
> EAL: No free 2048 kB hugepages reported on node 3
> EAL: No available 2048 kB hugepages reported
> EAL: VFIO support initialized
> EAL: Probe PCI driver: net_ice (8086:159b) device: 0000:98:00.0 (socket 2)
> ice_load_pkg_type(): Active package is: 1.3.26.0, ICE OS Default Package 
> (single VLAN mode)
> EAL: Probe PCI driver: net_ice (8086:159b) device: 0000:98:00.1 (socket 2)
> ice_load_pkg_type(): Active package is: 1.3.26.0, ICE OS Default Package 
> (single VLAN mode)
> TELEMETRY: No legacy callbacks, legacy socket not created
> ice_set_rx_function(): Using AVX2 Vector Rx (port 0).
> ice_set_rx_function(): Using AVX2 Vector Rx (port 1).
> Info: Left port and Left Sender CPU core belong to the same NUMA node: 2
> Info: Right port and Right Receiver CPU core belong to the same NUMA node: 2
> Info: Right port and Right Sender CPU core belong to the same NUMA node: 2
> Info: Left port and Left Receiver CPU core belong to the same NUMA node: 2
> Info: Testing initiated at 2023-09-15 07:50:17
> EAL: Error - exiting with code: 1
>    Cause: Forward sending exceeded the 60.0006000000 seconds limit, the 
> test is invalid.
> EAL: Error - exiting with code: 1
>    Cause: Reverse sending exceeded the 60.0006000000 seconds limit, the 
> test is invalid.
> root@x033:~/siitperf#
> 
> The rte_exit() function seems to work, as the error message appears, and 
> the number of sent frames is not displayed, however, the "Info: ..." 
> message about the sending time (printed out earlier in the code) is 
> missing! This is rather strange!
> 
> What is worse, the program does not stop, but *the sender threads and 
> the main program remain running (forever)*.
> 
> Here is the output of the "top" command:
> 
> top - 07:54:24 up 1 day, 14:12,  2 users, load average: 3.02, 2.41, 2.10
> Tasks: 591 total,   2 running, 589 sleeping,   0 stopped,   0 zombie
> %Cpu0  :100.0 us,  0.0 sy,  0.0 ni,  0.0 id,  0.0 wa,  0.0 hi, 0.0 si,  
> 0.0 st
> %Cpu1  :100.0 us,  0.0 sy,  0.0 ni,  0.0 id,  0.0 wa,  0.0 hi, 0.0 si,  
> 0.0 st
> %Cpu2  :  0.0 us,  0.0 sy,  0.0 ni, 94.1 id,  0.0 wa,  0.0 hi, 5.9 si,  
> 0.0 st
> %Cpu3  :  0.0 us,  0.0 sy,  0.0 ni,100.0 id,  0.0 wa,  0.0 hi, 0.0 si,  
> 0.0 st
> %Cpu4  :  0.0 us,  0.0 sy,  0.0 ni,100.0 id,  0.0 wa,  0.0 hi, 0.0 si,  
> 0.0 st
> %Cpu5  :  0.0 us,  0.0 sy,  0.0 ni,100.0 id,  0.0 wa,  0.0 hi, 0.0 si,  
> 0.0 st
> %Cpu6  :  0.0 us,  0.0 sy,  0.0 ni,100.0 id,  0.0 wa,  0.0 hi, 0.0 si,  
> 0.0 st
> %Cpu7  :  0.0 us,  0.0 sy,  0.0 ni,100.0 id,  0.0 wa,  0.0 hi, 0.0 si,  
> 0.0 st
> %Cpu8  :  0.0 us,  0.0 sy,  0.0 ni,100.0 id,  0.0 wa,  0.0 hi, 0.0 si,  
> 0.0 st
> %Cpu9  :100.0 us,  0.0 sy,  0.0 ni,  0.0 id,  0.0 wa,  0.0 hi, 0.0 si,  
> 0.0 st
> %Cpu10 :  0.0 us,  0.0 sy,  0.0 ni,100.0 id,  0.0 wa,  0.0 hi, 0.0 si,  
> 0.0 st
> %Cpu11 :  0.0 us,  0.0 sy,  0.0 ni,100.0 id,  0.0 wa,  0.0 hi, 0.0 si,  
> 0.0 st
> %Cpu12 :  0.0 us,  0.0 sy,  0.0 ni,100.0 id,  0.0 wa,  0.0 hi, 0.0 si,  
> 0.0 st
> %Cpu13 :  0.0 us,  0.0 sy,  0.0 ni,100.0 id,  0.0 wa,  0.0 hi, 0.0 si,  
> 0.0 st
> 
> CPU0 is the main core, the left sender and the right sender use CPU1 and 
> CPU9, respectively. (The receivers that used CPU5 and CPU13 already 
> terminated due to their timeout.)
> 
> *Thus, rte_exit() behaves differently now: it used to terminate the 
> main program, but now it does not.* (And it also suppresses some 
> previously printed output.)
> 
> Is it a bug or a new feature?
> 
> *How could I achieve the old behavior?* (Or at least the termination of 
> the main program by sender threads?)
> 
> Thank you very much for your guidance!
> 
> Best regards,
> 
> Gábor Lencse


Please get a backtrace. Simple way is to attach gdb to that process.
I suspect that since rte_exit() calls the internal eal_cleanup function,
and that calls close in the driver, the ICE driver close function has
a bug. Perhaps the ice close function does not correctly handle the case
where the device has not started.

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: rte_exit() does not terminate the program -- is it a bug or a new feature?
  2023-09-15 15:06 ` Stephen Hemminger
@ 2023-09-15 18:28   ` Gabor LENCSE
  2023-09-15 21:33     ` Stephen Hemminger
  0 siblings, 1 reply; 8+ messages in thread
From: Gabor LENCSE @ 2023-09-15 18:28 UTC (permalink / raw)
  To: Stephen Hemminger; +Cc: users

Dear Stephen,

Thank you very much for your answer!

> Please get a backtrace. Simple way is to attach gdb to that process.

I have recompiled siitperf with the "-g" compiler option and executed it 
from gdb. When the program stopped, I pressed Ctrl-C and issued a "bt" 
command, but of course, it displayed the call stack of the main thread. 
Then I collected some information about the threads using the "info 
threads" command and after that I switched to all available threads, and 
issued a "bt" command for those that represented my send() and receive() 
functions (I identified them using their LWP number). Here are the results:

root@x033:~/siitperf# gdb ./build/siitperf-tp
GNU gdb (Ubuntu 12.1-0ubuntu1~22.04) 12.1
Copyright (C) 2022 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later 
<http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.
Type "show copying" and "show warranty" for details.
This GDB was configured as "x86_64-linux-gnu".
Type "show configuration" for configuration details.
For bug reporting instructions, please see:
<https://www.gnu.org/software/gdb/bugs/>.
Find the GDB manual and other documentation resources online at:
     <http://www.gnu.org/software/gdb/documentation/>.

For help, type "help".
Type "apropos word" to search for commands related to "word"...
Reading symbols from ./build/siitperf-tp...
(gdb) set args 84 8000000 60 2000 2 2
(gdb) run
Starting program: /root/siitperf/build/siitperf-tp 84 8000000 60 2000 2 2
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
EAL: Detected CPU lcores: 56
EAL: Detected NUMA nodes: 4
EAL: Detected shared linkage of DPDK
[New Thread 0x7ffff49c0640 (LWP 24747)]
EAL: Multi-process socket /var/run/dpdk/rte/mp_socket
[New Thread 0x7ffff41bf640 (LWP 24748)]
EAL: Selected IOVA mode 'PA'
EAL: No free 2048 kB hugepages reported on node 0
EAL: No free 2048 kB hugepages reported on node 1
EAL: No free 2048 kB hugepages reported on node 2
EAL: No free 2048 kB hugepages reported on node 3
EAL: No available 2048 kB hugepages reported
EAL: VFIO support initialized
[New Thread 0x7ffff39be640 (LWP 24749)]
[New Thread 0x7ffff31bd640 (LWP 24750)]
[New Thread 0x7ffff29bc640 (LWP 24751)]
[New Thread 0x7ffff21bb640 (LWP 24752)]
EAL: Probe PCI driver: net_ice (8086:159b) device: 0000:98:00.0 (socket 2)
ice_load_pkg_type(): Active package is: 1.3.26.0, ICE OS Default Package 
(single VLAN mode)
EAL: Probe PCI driver: net_ice (8086:159b) device: 0000:98:00.1 (socket 2)
ice_load_pkg_type(): Active package is: 1.3.26.0, ICE OS Default Package 
(single VLAN mode)
[New Thread 0x7ffff19ba640 (LWP 24753)]
TELEMETRY: No legacy callbacks, legacy socket not created
ice_set_rx_function(): Using AVX2 Vector Rx (port 0).
ice_set_rx_function(): Using AVX2 Vector Rx (port 1).
Info: Left port and Left Sender CPU core belong to the same NUMA node: 2
Info: Right port and Right Receiver CPU core belong to the same NUMA node: 2
Info: Right port and Right Sender CPU core belong to the same NUMA node: 2
Info: Left port and Left Receiver CPU core belong to the same NUMA node: 2
Info: Testing initiated at 2023-09-15 18:06:05
Reverse frames received: 394340224
Forward frames received: 421381420
Info: Forward sender's sending took 70.3073795726 seconds.
EAL: Error - exiting with code: 1
   Cause: Forward sending exceeded the 60.0006000000 seconds limit, the 
test is invalid.
Info: Reverse sender's sending took 74.9384769772 seconds.
EAL: Error - exiting with code: 1
   Cause: Reverse sending exceeded the 60.0006000000 seconds limit, the 
test is invalid.
^C
Thread 1 "siitperf-tp" received signal SIGINT, Interrupt.
0x00007ffff7d99dd2 in rte_eal_wait_lcore () from 
/lib/x86_64-linux-gnu/librte_eal.so.22
(gdb) bt
#0  0x00007ffff7d99dd2 in rte_eal_wait_lcore () from 
/lib/x86_64-linux-gnu/librte_eal.so.22
#1  0x000055555559929e in Throughput::measure (this=0x7fffffffe300, 
leftport=0, rightport=1) at throughput.cc:3743
#2  0x0000555555557b20 in main (argc=7, argv=0x7fffffffe5b8) at 
main-tp.cc:34
(gdb) info threads
   Id   Target Id                                           Frame
* 1    Thread 0x7ffff77cac00 (LWP 24744) "siitperf-tp" 
0x00007ffff7d99dd2 in rte_eal_wait_lcore ()
    from /lib/x86_64-linux-gnu/librte_eal.so.22
   2    Thread 0x7ffff49c0640 (LWP 24747) "eal-intr-thread" 
0x00007ffff7a32fde in epoll_wait (epfd=6, events=0x7ffff49978d0,
     maxevents=3, timeout=-1) at ../sysdeps/unix/sysv/linux/epoll_wait.c:30
   3    Thread 0x7ffff41bf640 (LWP 24748) "rte_mp_handle" 
__recvmsg_syscall (flags=0, msg=0x7ffff41965c0, fd=9)
     at ../sysdeps/unix/sysv/linux/recvmsg.c:27
   4    Thread 0x7ffff39be640 (LWP 24749) "lcore-worker-1" 
0x00007ffff7d99dd2 in rte_eal_wait_lcore ()
    from /lib/x86_64-linux-gnu/librte_eal.so.22
   5    Thread 0x7ffff31bd640 (LWP 24750) "lcore-worker-5" 
__GI___libc_read (nbytes=1, buf=0x7ffff31947cf, fd=40)
     at ../sysdeps/unix/sysv/linux/read.c:26
   6    Thread 0x7ffff29bc640 (LWP 24751) "lcore-worker-9" 
0x00007ffff7d99dd2 in rte_eal_wait_lcore ()
    from /lib/x86_64-linux-gnu/librte_eal.so.22
   7    Thread 0x7ffff21bb640 (LWP 24752) "lcore-worker-13" 
__GI___libc_read (nbytes=1, buf=0x7ffff21927cf, fd=48)
     at ../sysdeps/unix/sysv/linux/read.c:26
   8    Thread 0x7ffff19ba640 (LWP 24753) "telemetry-v2" 
0x00007ffff7a3460f in __libc_accept (fd=58, addr=..., len=0x0)
     at ../sysdeps/unix/sysv/linux/accept.c:26
(gdb) thread 1
[Switching to thread 1 (Thread 0x7ffff77cac00 (LWP 24744))]
#0  0x00007ffff7d99dd2 in rte_eal_wait_lcore () from 
/lib/x86_64-linux-gnu/librte_eal.so.22
(gdb) thread 2
[Switching to thread 2 (Thread 0x7ffff49c0640 (LWP 24747))]
#0  0x00007ffff7a32fde in epoll_wait (epfd=6, events=0x7ffff49978d0, 
maxevents=3, timeout=-1)
     at ../sysdeps/unix/sysv/linux/epoll_wait.c:30
30      ../sysdeps/unix/sysv/linux/epoll_wait.c: No such file or directory.
(gdb) thread 3
[Switching to thread 3 (Thread 0x7ffff41bf640 (LWP 24748))]
#0  __recvmsg_syscall (flags=0, msg=0x7ffff41965c0, fd=9) at 
../sysdeps/unix/sysv/linux/recvmsg.c:27
27      ../sysdeps/unix/sysv/linux/recvmsg.c: No such file or directory.
(gdb) thread 4
[Switching to thread 4 (Thread 0x7ffff39be640 (LWP 24749))]
#0  0x00007ffff7d99dd2 in rte_eal_wait_lcore () from 
/lib/x86_64-linux-gnu/librte_eal.so.22
(gdb) bt
#0  0x00007ffff7d99dd2 in rte_eal_wait_lcore () from 
/lib/x86_64-linux-gnu/librte_eal.so.22
#1  0x00007ffff7d99f97 in rte_eal_mp_wait_lcore () from 
/lib/x86_64-linux-gnu/librte_eal.so.22
#2  0x00007ffff7da99ee in rte_service_finalize () from 
/lib/x86_64-linux-gnu/librte_eal.so.22
#3  0x00007ffff7db0404 in rte_eal_cleanup () from 
/lib/x86_64-linux-gnu/librte_eal.so.22
#4  0x00007ffff7d9d0b7 in rte_exit () from 
/lib/x86_64-linux-gnu/librte_eal.so.22
#5  0x000055555558e685 in send (par=0x7fffffffde00) at throughput.cc:1562
#6  0x00007ffff7d94a18 in ?? () from /lib/x86_64-linux-gnu/librte_eal.so.22
#7  0x00007ffff79a1b43 in start_thread (arg=<optimized out>) at 
./nptl/pthread_create.c:442
#8  0x00007ffff7a33a00 in clone3 () at 
../sysdeps/unix/sysv/linux/x86_64/clone3.S:81
(gdb) thread 5
[Switching to thread 5 (Thread 0x7ffff31bd640 (LWP 24750))]
#0  __GI___libc_read (nbytes=1, buf=0x7ffff31947cf, fd=40) at 
../sysdeps/unix/sysv/linux/read.c:26
26      ../sysdeps/unix/sysv/linux/read.c: No such file or directory.
(gdb) bt
#0  __GI___libc_read (nbytes=1, buf=0x7ffff31947cf, fd=40) at 
../sysdeps/unix/sysv/linux/read.c:26
#1  __GI___libc_read (fd=40, buf=0x7ffff31947cf, nbytes=1) at 
../sysdeps/unix/sysv/linux/read.c:24
#2  0x00007ffff7d9490c in ?? () from /lib/x86_64-linux-gnu/librte_eal.so.22
#3  0x00007ffff79a1b43 in start_thread (arg=<optimized out>) at 
./nptl/pthread_create.c:442
#4  0x00007ffff7a33a00 in clone3 () at 
../sysdeps/unix/sysv/linux/x86_64/clone3.S:81
(gdb) thread 6
[Switching to thread 6 (Thread 0x7ffff29bc640 (LWP 24751))]
#0  0x00007ffff7d99dd2 in rte_eal_wait_lcore () from 
/lib/x86_64-linux-gnu/librte_eal.so.22
(gdb) bt
#0  0x00007ffff7d99dd2 in rte_eal_wait_lcore () from 
/lib/x86_64-linux-gnu/librte_eal.so.22
#1  0x00007ffff7d99f97 in rte_eal_mp_wait_lcore () from 
/lib/x86_64-linux-gnu/librte_eal.so.22
#2  0x00007ffff7da99ee in rte_service_finalize () from 
/lib/x86_64-linux-gnu/librte_eal.so.22
#3  0x00007ffff7db0404 in rte_eal_cleanup () from 
/lib/x86_64-linux-gnu/librte_eal.so.22
#4  0x00007ffff7d9d0b7 in rte_exit () from 
/lib/x86_64-linux-gnu/librte_eal.so.22
#5  0x000055555558e685 in send (par=0x7fffffffde80) at throughput.cc:1562
#6  0x00007ffff7d94a18 in ?? () from /lib/x86_64-linux-gnu/librte_eal.so.22
#7  0x00007ffff79a1b43 in start_thread (arg=<optimized out>) at 
./nptl/pthread_create.c:442
#8  0x00007ffff7a33a00 in clone3 () at 
../sysdeps/unix/sysv/linux/x86_64/clone3.S:81
(gdb) thread 7
[Switching to thread 7 (Thread 0x7ffff21bb640 (LWP 24752))]
#0  __GI___libc_read (nbytes=1, buf=0x7ffff21927cf, fd=48) at 
../sysdeps/unix/sysv/linux/read.c:26
26      in ../sysdeps/unix/sysv/linux/read.c
(gdb) bt
#0  __GI___libc_read (nbytes=1, buf=0x7ffff21927cf, fd=48) at 
../sysdeps/unix/sysv/linux/read.c:26
#1  __GI___libc_read (fd=48, buf=0x7ffff21927cf, nbytes=1) at 
../sysdeps/unix/sysv/linux/read.c:24
#2  0x00007ffff7d9490c in ?? () from /lib/x86_64-linux-gnu/librte_eal.so.22
#3  0x00007ffff79a1b43 in start_thread (arg=<optimized out>) at 
./nptl/pthread_create.c:442
#4  0x00007ffff7a33a00 in clone3 () at 
../sysdeps/unix/sysv/linux/x86_64/clone3.S:81
(gdb) thread 8
[Switching to thread 8 (Thread 0x7ffff19ba640 (LWP 24753))]
#0  0x00007ffff7a3460f in __libc_accept (fd=58, addr=..., len=0x0) at 
../sysdeps/unix/sysv/linux/accept.c:26
26      ../sysdeps/unix/sysv/linux/accept.c: No such file or directory.
(gdb) thread 9
Unknown thread 9.
(gdb)

Some additional information from the siitperf.conf file:

CPU-L-Send 1 # Left Sender runs on this core
CPU-R-Recv 5 # Right Receiver runs on this core
CPU-R-Send 9 # Right Sender runs on this core
CPU-L-Recv 13 # Left Receiver runs on this core

Therefore, the "send()" functions are the ones that remain running on 
CPU cores 1 and 9. And they fully utilize their cores (as does the main 
program with core 0); I checked this earlier.

This is a -- perhaps -- relevant part from the code of the main program:

       // wait until active senders and receivers finish
       if ( forward ) {
         rte_eal_wait_lcore(cpu_left_sender);
         rte_eal_wait_lcore(cpu_right_receiver);
       }
       if ( reverse ) {
         rte_eal_wait_lcore(cpu_right_sender);
         rte_eal_wait_lcore(cpu_left_receiver);
       }

It seems to me as if the two send functions and also the main program 
were actively waiting in the rte_eal_wait_lcore() function. But I have 
no idea why. If the sender that sent frames in the forward direction is 
there, and the main program is there too, then, IMHO, the 
rte_eal_wait_lcore(cpu_left_sender) call should finish.

Am I wrong?

> I suspect that since rte_exit() calls the internal eal_cleanup function,
> and that calls close in the driver, the ICE driver close function has
> a bug. Perhaps the ice close function does not correctly handle the case
> where the device has not started.

Yes, your hypothesis was confirmed: both of my send() functions were in 
the rte_eal_cleanup() function. :-)

However, I am not sure which device you meant. But I think they are 
initialized properly, because I can ensure a successful execution (and 
termination) of the program by halving the frame rate:

root@x033:~/siitperf# ./build/siitperf-tp 84 4000000 6 2000 2 2
EAL: Detected CPU lcores: 56
EAL: Detected NUMA nodes: 4
EAL: Detected shared linkage of DPDK
EAL: Multi-process socket /var/run/dpdk/rte/mp_socket
EAL: Selected IOVA mode 'PA'
EAL: No free 2048 kB hugepages reported on node 0
EAL: No free 2048 kB hugepages reported on node 1
EAL: No free 2048 kB hugepages reported on node 2
EAL: No free 2048 kB hugepages reported on node 3
EAL: No available 2048 kB hugepages reported
EAL: VFIO support initialized
EAL: Probe PCI driver: net_ice (8086:159b) device: 0000:98:00.0 (socket 2)
ice_load_pkg_type(): Active package is: 1.3.26.0, ICE OS Default Package 
(single VLAN mode)
EAL: Probe PCI driver: net_ice (8086:159b) device: 0000:98:00.1 (socket 2)
ice_load_pkg_type(): Active package is: 1.3.26.0, ICE OS Default Package 
(single VLAN mode)
TELEMETRY: No legacy callbacks, legacy socket not created
ice_set_rx_function(): Using AVX2 Vector Rx (port 0).
ice_set_rx_function(): Using AVX2 Vector Rx (port 1).
Info: Left port and Left Sender CPU core belong to the same NUMA node: 2
Info: Right port and Right Receiver CPU core belong to the same NUMA node: 2
Info: Right port and Right Sender CPU core belong to the same NUMA node: 2
Info: Left port and Left Receiver CPU core belong to the same NUMA node: 2
Info: Testing initiated at 2023-09-15 17:43:11
Info: Reverse sender's sending took 5.9999998420 seconds.
Reverse frames sent: 24000000
Info: Forward sender's sending took 5.9999999023 seconds.
Forward frames sent: 24000000
Forward frames received: 24000000
Reverse frames received: 24000000
Info: Test finished.
root@x033:~/siitperf#

(The only problem with this trick is that I want to use a binary search 
to determine the performance limit of the tester. And if the tester does 
not stop, then my bash shell script waits for it forever.)

So, what should I do next?

Best regards,

Gábor



^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: rte_exit() does not terminate the program -- is it a bug or a new feature?
  2023-09-15 18:28   ` Gabor LENCSE
@ 2023-09-15 21:33     ` Stephen Hemminger
  2023-09-17 19:37       ` Gabor LENCSE
  0 siblings, 1 reply; 8+ messages in thread
From: Stephen Hemminger @ 2023-09-15 21:33 UTC (permalink / raw)
  To: Gabor LENCSE; +Cc: users

On Fri, 15 Sep 2023 20:28:44 +0200
Gabor LENCSE <lencse@hit.bme.hu> wrote:

> Dear Stephen,
> 
> Thank you very much for your answer!
> 
> > Please get a backtrace. Simple way is to attach gdb to that process.  
> 
> I have recompiled siitperf with the "-g" compiler option and executed it 
> from gdb. When the program stopped, I pressed Ctrl-C and issued a "bt" 
> command, but of course, it displayed the call stack of the main thread. 
> Then I collected some information about the threads using the "info 
> threads" command and after that I switched to all available threads, and 
> issued a "bt" command for those that represented my send() and receive() 
> functions (I identified them using their LWP number). Here are the results:
> 
> root@x033:~/siitperf# gdb ./build/siitperf-tp
> GNU gdb (Ubuntu 12.1-0ubuntu1~22.04) 12.1
> Copyright (C) 2022 Free Software Foundation, Inc.
> License GPLv3+: GNU GPL version 3 or later 
> <http://gnu.org/licenses/gpl.html>
> This is free software: you are free to change and redistribute it.
> There is NO WARRANTY, to the extent permitted by law.
> Type "show copying" and "show warranty" for details.
> This GDB was configured as "x86_64-linux-gnu".
> Type "show configuration" for configuration details.
> For bug reporting instructions, please see:
> <https://www.gnu.org/software/gdb/bugs/>.
> Find the GDB manual and other documentation resources online at:
>      <http://www.gnu.org/software/gdb/documentation/>.
> 
> For help, type "help".
> Type "apropos word" to search for commands related to "word"...
> Reading symbols from ./build/siitperf-tp...
> (gdb) set args 84 8000000 60 2000 2 2
> (gdb) run
> Starting program: /root/siitperf/build/siitperf-tp 84 8000000 60 2000 2 2
> [Thread debugging using libthread_db enabled]
> Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
> EAL: Detected CPU lcores: 56
> EAL: Detected NUMA nodes: 4
> EAL: Detected shared linkage of DPDK
> [New Thread 0x7ffff49c0640 (LWP 24747)]
> EAL: Multi-process socket /var/run/dpdk/rte/mp_socket
> [New Thread 0x7ffff41bf640 (LWP 24748)]
> EAL: Selected IOVA mode 'PA'
> EAL: No free 2048 kB hugepages reported on node 0
> EAL: No free 2048 kB hugepages reported on node 1
> EAL: No free 2048 kB hugepages reported on node 2
> EAL: No free 2048 kB hugepages reported on node 3
> EAL: No available 2048 kB hugepages reported
> EAL: VFIO support initialized
> [New Thread 0x7ffff39be640 (LWP 24749)]
> [New Thread 0x7ffff31bd640 (LWP 24750)]
> [New Thread 0x7ffff29bc640 (LWP 24751)]
> [New Thread 0x7ffff21bb640 (LWP 24752)]
> EAL: Probe PCI driver: net_ice (8086:159b) device: 0000:98:00.0 (socket 2)
> ice_load_pkg_type(): Active package is: 1.3.26.0, ICE OS Default Package 
> (single VLAN mode)
> EAL: Probe PCI driver: net_ice (8086:159b) device: 0000:98:00.1 (socket 2)
> ice_load_pkg_type(): Active package is: 1.3.26.0, ICE OS Default Package 
> (single VLAN mode)
> [New Thread 0x7ffff19ba640 (LWP 24753)]
> TELEMETRY: No legacy callbacks, legacy socket not created
> ice_set_rx_function(): Using AVX2 Vector Rx (port 0).
> ice_set_rx_function(): Using AVX2 Vector Rx (port 1).
> Info: Left port and Left Sender CPU core belong to the same NUMA node: 2
> Info: Right port and Right Receiver CPU core belong to the same NUMA node: 2
> Info: Right port and Right Sender CPU core belong to the same NUMA node: 2
> Info: Left port and Left Receiver CPU core belong to the same NUMA node: 2
> Info: Testing initiated at 2023-09-15 18:06:05
> Reverse frames received: 394340224
> Forward frames received: 421381420
> Info: Forward sender's sending took 70.3073795726 seconds.
> EAL: Error - exiting with code: 1
>    Cause: Forward sending exceeded the 60.0006000000 seconds limit, the 
> test is invalid.
> Info: Reverse sender's sending took 74.9384769772 seconds.
> EAL: Error - exiting with code: 1
>    Cause: Reverse sending exceeded the 60.0006000000 seconds limit, the 
> test is invalid.
> ^C
> Thread 1 "siitperf-tp" received signal SIGINT, Interrupt.
> 0x00007ffff7d99dd2 in rte_eal_wait_lcore () from 
> /lib/x86_64-linux-gnu/librte_eal.so.22
> (gdb) bt
> #0  0x00007ffff7d99dd2 in rte_eal_wait_lcore () from 
> /lib/x86_64-linux-gnu/librte_eal.so.22
> #1  0x000055555559929e in Throughput::measure (this=0x7fffffffe300, 
> leftport=0, rightport=1) at throughput.cc:3743
> #2  0x0000555555557b20 in main (argc=7, argv=0x7fffffffe5b8) at 
> main-tp.cc:34
> (gdb) info threads
>    Id   Target Id                                           Frame
> * 1    Thread 0x7ffff77cac00 (LWP 24744) "siitperf-tp" 
> 0x00007ffff7d99dd2 in rte_eal_wait_lcore ()
>     from /lib/x86_64-linux-gnu/librte_eal.so.22
>    2    Thread 0x7ffff49c0640 (LWP 24747) "eal-intr-thread" 
> 0x00007ffff7a32fde in epoll_wait (epfd=6, events=0x7ffff49978d0,
>      maxevents=3, timeout=-1) at ../sysdeps/unix/sysv/linux/epoll_wait.c:30
>    3    Thread 0x7ffff41bf640 (LWP 24748) "rte_mp_handle" 
> __recvmsg_syscall (flags=0, msg=0x7ffff41965c0, fd=9)
>      at ../sysdeps/unix/sysv/linux/recvmsg.c:27
>    4    Thread 0x7ffff39be640 (LWP 24749) "lcore-worker-1" 
> 0x00007ffff7d99dd2 in rte_eal_wait_lcore ()
>     from /lib/x86_64-linux-gnu/librte_eal.so.22
>    5    Thread 0x7ffff31bd640 (LWP 24750) "lcore-worker-5" 
> __GI___libc_read (nbytes=1, buf=0x7ffff31947cf, fd=40)
>      at ../sysdeps/unix/sysv/linux/read.c:26
>    6    Thread 0x7ffff29bc640 (LWP 24751) "lcore-worker-9" 
> 0x00007ffff7d99dd2 in rte_eal_wait_lcore ()
>     from /lib/x86_64-linux-gnu/librte_eal.so.22
>    7    Thread 0x7ffff21bb640 (LWP 24752) "lcore-worker-13" 
> __GI___libc_read (nbytes=1, buf=0x7ffff21927cf, fd=48)
>      at ../sysdeps/unix/sysv/linux/read.c:26
>    8    Thread 0x7ffff19ba640 (LWP 24753) "telemetry-v2" 
> 0x00007ffff7a3460f in __libc_accept (fd=58, addr=..., len=0x0)
>      at ../sysdeps/unix/sysv/linux/accept.c:26
> (gdb) thread 1
> [Switching to thread 1 (Thread 0x7ffff77cac00 (LWP 24744))]
> #0  0x00007ffff7d99dd2 in rte_eal_wait_lcore () from 
> /lib/x86_64-linux-gnu/librte_eal.so.22
> (gdb) thread 2
> [Switching to thread 2 (Thread 0x7ffff49c0640 (LWP 24747))]
> #0  0x00007ffff7a32fde in epoll_wait (epfd=6, events=0x7ffff49978d0, 
> maxevents=3, timeout=-1)
>      at ../sysdeps/unix/sysv/linux/epoll_wait.c:30
> 30      ../sysdeps/unix/sysv/linux/epoll_wait.c: No such file or directory.
> (gdb) thread 3
> [Switching to thread 3 (Thread 0x7ffff41bf640 (LWP 24748))]
> #0  __recvmsg_syscall (flags=0, msg=0x7ffff41965c0, fd=9) at 
> ../sysdeps/unix/sysv/linux/recvmsg.c:27
> 27      ../sysdeps/unix/sysv/linux/recvmsg.c: No such file or directory.
> (gdb) thread 4
> [Switching to thread 4 (Thread 0x7ffff39be640 (LWP 24749))]
> #0  0x00007ffff7d99dd2 in rte_eal_wait_lcore () from 
> /lib/x86_64-linux-gnu/librte_eal.so.22
> (gdb) bt
> #0  0x00007ffff7d99dd2 in rte_eal_wait_lcore () from 
> /lib/x86_64-linux-gnu/librte_eal.so.22
> #1  0x00007ffff7d99f97 in rte_eal_mp_wait_lcore () from 
> /lib/x86_64-linux-gnu/librte_eal.so.22
> #2  0x00007ffff7da99ee in rte_service_finalize () from 
> /lib/x86_64-linux-gnu/librte_eal.so.22
> #3  0x00007ffff7db0404 in rte_eal_cleanup () from 
> /lib/x86_64-linux-gnu/librte_eal.so.22
> #4  0x00007ffff7d9d0b7 in rte_exit () from 
> /lib/x86_64-linux-gnu/librte_eal.so.22
> #5  0x000055555558e685 in send (par=0x7fffffffde00) at throughput.cc:1562
> #6  0x00007ffff7d94a18 in ?? () from /lib/x86_64-linux-gnu/librte_eal.so.22
> #7  0x00007ffff79a1b43 in start_thread (arg=<optimized out>) at 
> ./nptl/pthread_create.c:442
> #8  0x00007ffff7a33a00 in clone3 () at 
> ../sysdeps/unix/sysv/linux/x86_64/clone3.S:81
> (gdb) thread 5
> [Switching to thread 5 (Thread 0x7ffff31bd640 (LWP 24750))]
> #0  __GI___libc_read (nbytes=1, buf=0x7ffff31947cf, fd=40) at 
> ../sysdeps/unix/sysv/linux/read.c:26
> 26      ../sysdeps/unix/sysv/linux/read.c: No such file or directory.
> (gdb) bt
> #0  __GI___libc_read (nbytes=1, buf=0x7ffff31947cf, fd=40) at 
> ../sysdeps/unix/sysv/linux/read.c:26
> #1  __GI___libc_read (fd=40, buf=0x7ffff31947cf, nbytes=1) at 
> ../sysdeps/unix/sysv/linux/read.c:24
> #2  0x00007ffff7d9490c in ?? () from /lib/x86_64-linux-gnu/librte_eal.so.22
> #3  0x00007ffff79a1b43 in start_thread (arg=<optimized out>) at 
> ./nptl/pthread_create.c:442
> #4  0x00007ffff7a33a00 in clone3 () at 
> ../sysdeps/unix/sysv/linux/x86_64/clone3.S:81
> (gdb) thread 6
> [Switching to thread 6 (Thread 0x7ffff29bc640 (LWP 24751))]
> #0  0x00007ffff7d99dd2 in rte_eal_wait_lcore () from 
> /lib/x86_64-linux-gnu/librte_eal.so.22
> (gdb) bt
> #0  0x00007ffff7d99dd2 in rte_eal_wait_lcore () from 
> /lib/x86_64-linux-gnu/librte_eal.so.22
> #1  0x00007ffff7d99f97 in rte_eal_mp_wait_lcore () from 
> /lib/x86_64-linux-gnu/librte_eal.so.22
> #2  0x00007ffff7da99ee in rte_service_finalize () from 
> /lib/x86_64-linux-gnu/librte_eal.so.22
> #3  0x00007ffff7db0404 in rte_eal_cleanup () from 
> /lib/x86_64-linux-gnu/librte_eal.so.22
> #4  0x00007ffff7d9d0b7 in rte_exit () from 
> /lib/x86_64-linux-gnu/librte_eal.so.22
> #5  0x000055555558e685 in send (par=0x7fffffffde80) at throughput.cc:1562
> #6  0x00007ffff7d94a18 in ?? () from /lib/x86_64-linux-gnu/librte_eal.so.22
> #7  0x00007ffff79a1b43 in start_thread (arg=<optimized out>) at 
> ./nptl/pthread_create.c:442
> #8  0x00007ffff7a33a00 in clone3 () at 
> ../sysdeps/unix/sysv/linux/x86_64/clone3.S:81
> (gdb) thread 7
> [Switching to thread 7 (Thread 0x7ffff21bb640 (LWP 24752))]
> #0  __GI___libc_read (nbytes=1, buf=0x7ffff21927cf, fd=48) at 
> ../sysdeps/unix/sysv/linux/read.c:26
> 26      in ../sysdeps/unix/sysv/linux/read.c
> (gdb) bt
> #0  __GI___libc_read (nbytes=1, buf=0x7ffff21927cf, fd=48) at 
> ../sysdeps/unix/sysv/linux/read.c:26
> #1  __GI___libc_read (fd=48, buf=0x7ffff21927cf, nbytes=1) at 
> ../sysdeps/unix/sysv/linux/read.c:24
> #2  0x00007ffff7d9490c in ?? () from /lib/x86_64-linux-gnu/librte_eal.so.22
> #3  0x00007ffff79a1b43 in start_thread (arg=<optimized out>) at 
> ./nptl/pthread_create.c:442
> #4  0x00007ffff7a33a00 in clone3 () at 
> ../sysdeps/unix/sysv/linux/x86_64/clone3.S:81
> (gdb) thread 8
> [Switching to thread 8 (Thread 0x7ffff19ba640 (LWP 24753))]
> #0  0x00007ffff7a3460f in __libc_accept (fd=58, addr=..., len=0x0) at 
> ../sysdeps/unix/sysv/linux/accept.c:26
> 26      ../sysdeps/unix/sysv/linux/accept.c: No such file or directory.
> (gdb) thread 9
> Unknown thread 9.
> (gdb)
> 
> Some additional information from the siitperf.conf file:
> 
> CPU-L-Send 1 # Left Sender runs on this core
> CPU-R-Recv 5 # Right Receiver runs on this core
> CPU-R-Send 9 # Right Sender runs on this core
> CPU-L-Recv 13 # Left Receiver runs on this core
> 
> Therefore, the "send()" functions are the ones that remain running on 
> CPU cores 1 and 9. And they fully utilize their cores (as well as the 
> main program does with core 0), I have checked it earlier.
> 
> This is a -- perhaps -- relevant part from the code of the main program:
> 
>        // wait until active senders and receivers finish
>        if ( forward ) {
>          rte_eal_wait_lcore(cpu_left_sender);
>          rte_eal_wait_lcore(cpu_right_receiver);
>        }
>        if ( reverse ) {
>          rte_eal_wait_lcore(cpu_right_sender);
>          rte_eal_wait_lcore(cpu_left_receiver);
>        }
> 
> It seems to me as if the two send functions and also the main program 
> would be actively waiting in the rte_eal_wait_lcore() function. But I 
> have no idea, why. If the sender that sent frames in the forward 
> direction is there and the main program is there, then, IMHO, the 
> rte_eal_wait_lcore(cpu_left_sender); function should finish.
> 
> Am I wrong?
> 
> > I suspect that since rte_exit() calls the internal eal_cleanup function,
> > and that calls close in the driver, the ICE driver close function has
> > a bug. Perhaps the ice close function does not correctly handle the case
> > where the device has not started.
> 
> Yes, your hypothesis was confirmed: both of my send() functions were in 
> the rte_eal_cleanup() function. :-)
> 
> However, I am not sure which device you meant. But I think they are 
> initialized properly, because I can ensure a successful execution (and 
> termination) of the program by halving the frame rate:
> 
> root@x033:~/siitperf# ./build/siitperf-tp 84 4000000 6 2000 2 2
> EAL: Detected CPU lcores: 56
> EAL: Detected NUMA nodes: 4
> EAL: Detected shared linkage of DPDK
> EAL: Multi-process socket /var/run/dpdk/rte/mp_socket
> EAL: Selected IOVA mode 'PA'
> EAL: No free 2048 kB hugepages reported on node 0
> EAL: No free 2048 kB hugepages reported on node 1
> EAL: No free 2048 kB hugepages reported on node 2
> EAL: No free 2048 kB hugepages reported on node 3
> EAL: No available 2048 kB hugepages reported
> EAL: VFIO support initialized
> EAL: Probe PCI driver: net_ice (8086:159b) device: 0000:98:00.0 (socket 2)
> ice_load_pkg_type(): Active package is: 1.3.26.0, ICE OS Default Package 
> (single VLAN mode)
> EAL: Probe PCI driver: net_ice (8086:159b) device: 0000:98:00.1 (socket 2)
> ice_load_pkg_type(): Active package is: 1.3.26.0, ICE OS Default Package 
> (single VLAN mode)
> TELEMETRY: No legacy callbacks, legacy socket not created
> ice_set_rx_function(): Using AVX2 Vector Rx (port 0).
> ice_set_rx_function(): Using AVX2 Vector Rx (port 1).
> Info: Left port and Left Sender CPU core belong to the same NUMA node: 2
> Info: Right port and Right Receiver CPU core belong to the same NUMA node: 2
> Info: Right port and Right Sender CPU core belong to the same NUMA node: 2
> Info: Left port and Left Receiver CPU core belong to the same NUMA node: 2
> Info: Testing initiated at 2023-09-15 17:43:11
> Info: Reverse sender's sending took 5.9999998420 seconds.
> Reverse frames sent: 24000000
> Info: Forward sender's sending took 5.9999999023 seconds.
> Forward frames sent: 24000000
> Forward frames received: 24000000
> Reverse frames received: 24000000
> Info: Test finished.
> root@x033:~/siitperf#
> 
> (The only problem with this trick is that I want to use a binary search 
> to determine the performance limit of the tester. And if the tester does 
> not stop, then my bash shell script waits for it forever.)
> 
> So, what should I do next?

Not sure what the tx and rx polling loops look like in your application.
But they need to have some way of forcing exit, and you need to set that
flag before calling rte_exit().

See l2fwd and force_quit flag for an example.
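
The pattern looks roughly like this (a simplified sketch of the l2fwd
idea; the names here are illustrative, not taken from l2fwd or siitperf):

/* A global flag tested by every polling loop. l2fwd sets it from a
 * SIGINT/SIGTERM handler installed in main(); it can equally be set when
 * a fatal condition is detected, so that all workers leave their loops
 * before the main lcore calls rte_exit(). */
#include <stdbool.h>
#include <stdint.h>
#include <rte_ethdev.h>
#include <rte_mbuf.h>

static volatile bool force_quit = false;

static int rx_loop(void *arg)
{
	uint16_t port_id = *(uint16_t *)arg;
	struct rte_mbuf *bufs[32];

	while (!force_quit) {			/* the loop always has a way out */
		uint16_t nb = rte_eth_rx_burst(port_id, 0, bufs, 32);
		for (uint16_t i = 0; i < nb; i++)
			rte_pktmbuf_free(bufs[i]);	/* this sketch just drops the frames */
	}
	return 0;
}

/* On a fatal condition, from the main lcore:
 *	force_quit = true;		(let every worker loop terminate)
 *	rte_eal_mp_wait_lcore();	(join the workers)
 *	rte_exit(EXIT_FAILURE, ...);	(clean up and exit only afterwards)
 */

The point is simply that every worker loop re-checks the flag and can be
asked to stop before cleanup starts.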

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: rte_exit() does not terminate the program -- is it a bug or a new feature?
  2023-09-15 21:33     ` Stephen Hemminger
@ 2023-09-17 19:37       ` Gabor LENCSE
  2023-09-17 21:27         ` Stephen Hemminger
  0 siblings, 1 reply; 8+ messages in thread
From: Gabor LENCSE @ 2023-09-17 19:37 UTC (permalink / raw)
  To: Stephen Hemminger; +Cc: users


Dear Stephen,

Thank you very much for your answer. Please see my answers inline.

> Not sure what the tx and rx polling loops look like in your application.

In short: they surely finish.

In more detail:
- The receivers finish due to their timeout. In my case the receiving 
threads finished OK. (Their CPU cores became idle.)
- The sender threads check whether sending happened within the allowed 
time only AFTER all test frames were sent, that is, after the sending 
loop finished. (It is done this way to spare one branch instruction in 
the innermost sending loop; a rough sketch follows below.) So the sending 
loop has already finished when rte_exit() is called. *My problem is that 
calling the rte_exit() function does not terminate the application.*
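
For illustration, the structure is roughly the following (a simplified 
sketch, not the actual send() code of siitperf; frames_to_send, 
frame_rate, pkts, n_pkts and port_id are placeholder names):

   uint64_t start_tsc = rte_rdtsc();
   uint64_t hz = rte_get_tsc_hz();
   for ( uint64_t i = 0; i < frames_to_send; i++ ) {
     uint64_t due = start_tsc + i * hz / frame_rate;  // scheduled send time of frame i
     while ( rte_rdtsc() < due )
       ;                                              // pace the frame rate, no timeout test here
     while ( !rte_eth_tx_burst(port_id, 0, &pkts[i % n_pkts], 1) )
       ;                                              // retry until the NIC accepts the frame
   }
   // the elapsed time is checked only once, after the loop:
   elapsed_seconds = (double)(rte_rdtsc()-start_tsc)/hz;
   if ( elapsed_seconds > duration*TOLERANCE )
     rte_exit(EXIT_FAILURE, "...");                   // the call that does not terminate the program

So the inner loop only pays for the pacing and tx-retry branches per 
frame; the timeout comparison happens once at the end, and that is where 
rte_exit() gets called from the worker lcore.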

> But they need to have some way of forcing exit, and you need to set that
> flag before calling rte_exit().
>
> See l2fwd and force_quit flag for an example.

I have looked into its source code. I understand that it sets the 
"force_quit" flag when it receives a SIGINT or SIGTERM signal, and a 
nonzero value of the "force_quit" flag makes the while loop of 
"l2fwd_main_loop(void)" finish.

However, l2fwd also uses the "rte_exit()" function to terminate the 
program. The only difference is that it calls the "rte_exit()" function 
from the main program, and I do so in a thread started by the 
"rte_eal_remote_launch()" function.

Is there any constraint on the usage of the "rte_eal_remote_launch()" 
function? (E.g., that it may be called only from the main thread? I did 
not see anything like that in the documentation.)

Best regards,

Gábor


^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: rte_exit() does not terminate the program -- is it a bug or a new feature?
  2023-09-17 19:37       ` Gabor LENCSE
@ 2023-09-17 21:27         ` Stephen Hemminger
  2023-09-18 18:23           ` Gabor LENCSE
  0 siblings, 1 reply; 8+ messages in thread
From: Stephen Hemminger @ 2023-09-17 21:27 UTC (permalink / raw)
  To: Gabor LENCSE; +Cc: users

On Sun, 17 Sep 2023 21:37:30 +0200
Gabor LENCSE <lencse@hit.bme.hu> wrote:

> However, l2fwd also uses the "rte_exit()" function to terminate the 
> program. The only difference is that it calls the "rte_exit()" function 
> from the main program, and I do so in a thread started by the 
> "rte_eal_remote_launch()" function.

Calling rte_exit() in a thread other than the main thread won't work
because the cleanup code calls rte_eal_cleanup(), and inside that it ends
up waiting for all workers.  Since the thread you are calling from
is a worker, it ends up waiting for itself.

rte_exit()
	rte_eal_cleanup()
		rte_service_finalize()
			rte_eal_mp_wait_lcore()


void
rte_eal_mp_wait_lcore(void)
{
	unsigned lcore_id;

	RTE_LCORE_FOREACH_WORKER(lcore_id) {
		rte_eal_wait_lcore(lcore_id);
	}
}

Either the service handling needs to be smarter, the rte_exit() function
should check whether it is called from the main lcore, and/or the
documentation needs an update. It is not a simple fix, because in order
to safely do the cleanup logic all threads have to have gone to a
quiescent state.
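
Until then, a workaround on the application side is to let the worker
only report the failure through its return value and to call rte_exit()
from the main lcore only. A minimal sketch (illustrative, with most
error handling omitted):

#include <stdlib.h>
#include <rte_eal.h>
#include <rte_launch.h>
#include <rte_lcore.h>
#include <rte_debug.h>

static int send_worker(void *arg)
{
	(void)arg;
	/* ... send the test frames, measure the elapsed time ... */
	int deadline_exceeded = 0;	/* stands for: elapsed_seconds > duration*TOLERANCE */
	if (deadline_exceeded)
		return -1;	/* report the error; do NOT call rte_exit() on a worker lcore */
	return 0;
}

int main(int argc, char **argv)
{
	if (rte_eal_init(argc, argv) < 0)
		rte_exit(EXIT_FAILURE, "EAL initialization failed\n");

	unsigned lcore_id = rte_get_next_lcore(-1, 1, 0);	/* first worker lcore */
	rte_eal_remote_launch(send_worker, NULL, lcore_id);

	/* rte_eal_wait_lcore() returns the value returned by the worker function. */
	if (rte_eal_wait_lcore(lcore_id) < 0)
		rte_exit(EXIT_FAILURE, "sending exceeded the time limit, the test is invalid\n");

	rte_eal_cleanup();
	return 0;
}

With this structure the process also exits with a nonzero status when the
test is invalid, which a driving shell script can detect.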
	

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: rte_exit() does not terminate the program -- is it a bug or a new feature?
  2023-09-17 21:27         ` Stephen Hemminger
@ 2023-09-18 18:23           ` Gabor LENCSE
  2023-09-18 18:49             ` Stephen Hemminger
  0 siblings, 1 reply; 8+ messages in thread
From: Gabor LENCSE @ 2023-09-18 18:23 UTC (permalink / raw)
  To: Stephen Hemminger; +Cc: users

Dear Stephen,

Thank you very much for your reply. Please see my answers inline.

On 9/17/2023 11:27 PM, Stephen Hemminger wrote:
> On Sun, 17 Sep 2023 21:37:30 +0200
> Gabor LENCSE <lencse@hit.bme.hu> wrote:
>
>> However, l2fwd also uses the "rte_exit()" function to terminate the
>> program. The only difference is that it calls the "rte_exit()" function
>> from the main program, and I do so in a thread started by the
>> "rte_eal_remote_launch()" function.
> Calling rte_exit() in a thread other than the main thread won't work
> because the cleanup code calls rte_eal_cleanup(), and inside that it ends
> up waiting for all workers.  Since the thread you are calling from
> is a worker, it ends up waiting for itself.
>
> rte_exit()
> 	rte_eal_cleanup()
> 		rte_service_finalize()
> 			rte_eal_mp_wait_lcore()
>
>
> void
> rte_eal_mp_wait_lcore(void)
> {
> 	unsigned lcore_id;
>
> 	RTE_LCORE_FOREACH_WORKER(lcore_id) {
> 		rte_eal_wait_lcore(lcore_id);
> 	}
> }

Thank you very much for the detailed explanation!

I have modified the send function of siitperf (just at one point as a 
quick hack) and also the bash shell script. Now it works well and 
produces meaningful self-test results without any problem with program 
termination. So, the issue is solved. :-)

Of course, I will review all my rte_exit calls... It'll take a while...

I am just curious, as I have no idea why my old code worked all right 
with DPDK 16.11. Has rte_exit() been changed since then?

(But please do not spend too much time on this question!)

> Either service handling needs to be smarter, the rte_exit() function
> check if it is called from main lcore, and/or documentation needs update.
> Not a simple fix because in order to safely do the cleanup logic
> all threads have to gone to a quiescent state.

Yes, I think it definitely SHOULD be mentioned in the DPDK documentation 
that the rte_exit() function may be called only from the main lcore.

Now I have learned it once and for all, but it will be useful for a lot 
of DPDK users. :-)

Once again, thank you very much for your help!

Best regards,

Gábor

p.s.: I have another, even worse problem with siitperf running with the 
new versions of DPDK. I will report it with a different subject line.


> 	

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: rte_exit() does not terminate the program -- is it a bug or a new feature?
  2023-09-18 18:23           ` Gabor LENCSE
@ 2023-09-18 18:49             ` Stephen Hemminger
  0 siblings, 0 replies; 8+ messages in thread
From: Stephen Hemminger @ 2023-09-18 18:49 UTC (permalink / raw)
  To: Gabor LENCSE; +Cc: users

On Mon, 18 Sep 2023 20:23:25 +0200
Gabor LENCSE <lencse@hit.bme.hu> wrote:

> Of course, I will review all my rte_exit calls... It'll take a while...
> 
> I am just curious, as I have no idea, why my old code worked all right 
> with DPDK 16.11. Has rte_exit() been changed since then?

Older versions of DPDK did not call eal_cleanup() and did not
have service lcores. I think service lcores were new in 18.11.
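
Roughly (a paraphrase of the behaviour discussed in this thread, not the
literal DPDK sources; the function names below are illustrative):

#include <stdio.h>
#include <stdlib.h>
#include <rte_eal.h>

/* What the 16.11-era rte_exit() effectively did: print the cause, exit. */
static void rte_exit_16_11(int exit_code, const char *cause)
{
	fprintf(stderr, "EAL: Error - exiting with code: %d\n  Cause: %s", exit_code, cause);
	exit(exit_code);
}

/* What rte_exit() effectively does now: it additionally runs
 * rte_eal_cleanup(), which via rte_service_finalize() and
 * rte_eal_mp_wait_lcore() waits for every worker lcore, including the
 * worker that called it, hence the hang observed in this thread. */
static void rte_exit_21_11(int exit_code, const char *cause)
{
	fprintf(stderr, "EAL: Error - exiting with code: %d\n  Cause: %s", exit_code, cause);
	rte_eal_cleanup();
	exit(exit_code);
}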

^ permalink raw reply	[flat|nested] 8+ messages in thread

end of thread, other threads:[~2023-09-18 18:49 UTC | newest]

Thread overview: 8+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2023-09-15  8:24 rte_exit() does not terminate the program -- is it a bug or a new feature? Gabor LENCSE
2023-09-15 15:06 ` Stephen Hemminger
2023-09-15 18:28   ` Gabor LENCSE
2023-09-15 21:33     ` Stephen Hemminger
2023-09-17 19:37       ` Gabor LENCSE
2023-09-17 21:27         ` Stephen Hemminger
2023-09-18 18:23           ` Gabor LENCSE
2023-09-18 18:49             ` Stephen Hemminger
