From: Stephen Hemminger <stephen@networkplumber.org>
To: Gabor LENCSE <lencse@hit.bme.hu>
Cc: users@dpdk.org
Subject: Re: rte_exit() does not terminate the program -- is it a bug or a new feature?
Date: Fri, 15 Sep 2023 08:06:08 -0700 [thread overview]
Message-ID: <20230915080608.724f0102@hermes.local> (raw)
In-Reply-To: <3ef90e53-c28b-09e8-e3f3-2b78727114ff@hit.bme.hu>
On Fri, 15 Sep 2023 10:24:01 +0200
Gabor LENCSE <lencse@hit.bme.hu> wrote:
> Dear DPDK Developers and Users,
>
> I have run into the following issue with my RFC 8219 compliant SIIT and
> stateful NAT64/NAT44 tester, siitperf:
> https://github.com/lencsegabor/siitperf
>
> Its main program starts two sending threads and two receiving threads on
> exclusively used CPU cores using the rte_eal_remote_launch() function;
> for example:
>
> // start left sender
> if ( rte_eal_remote_launch(send, &spars1, cpu_left_sender) )
>     std::cout << "Error: could not start Left Sender." << std::endl;
>
> When the test frame sending is finished, the senders check the sending
> time, and if the allowed time was significantly exceeded, the sender
> gives an error message and terminates (itself and also the main program)
> using the rte_exit() function.
>
> This is the code:
>
> elapsed_seconds = (double)(rte_rdtsc()-start_tsc)/hz;
> printf("Info: %s sender's sending took %3.10lf seconds.\n", side,
>        elapsed_seconds);
> if ( elapsed_seconds > duration*TOLERANCE )
>     rte_exit(EXIT_FAILURE, "%s sending exceeded the %3.10lf seconds limit, the test is invalid.\n",
>              side, duration*TOLERANCE);
> printf("%s frames sent: %lu\n", side, sent_frames);
>
> return 0;
> The above code worked as I expected while I used siitperf under Debian
> 9.13 with DPDK 16.11.11-1+deb9u2. It always displayed the execution time
> of test frame sending, and if the allowed time was significantly
> exceeded, it gave an error message and terminated, so the sender did not
> print out the number of sent frames. The main program was also
> terminated by the call of this function: it did not write out the
> "Info: Test finished." message.
>
> However, when I updated siitperf to use it with Ubuntu 22.04 with DPDK
> version "21.11.3-0ubuntu0.22.04.1 amd64", then I experienced something
> rather strange:
>
> When the sending time is significantly exceeded, I get the following
> messages from the program (I copy the full output here, as it may be
> useful):
>
> root@x033:~/siitperf# cat temp.out
> EAL: Detected CPU lcores: 56
> EAL: Detected NUMA nodes: 4
> EAL: Detected shared linkage of DPDK
> EAL: Multi-process socket /var/run/dpdk/rte/mp_socket
> EAL: Selected IOVA mode 'PA'
> EAL: No free 2048 kB hugepages reported on node 0
> EAL: No free 2048 kB hugepages reported on node 1
> EAL: No free 2048 kB hugepages reported on node 2
> EAL: No free 2048 kB hugepages reported on node 3
> EAL: No available 2048 kB hugepages reported
> EAL: VFIO support initialized
> EAL: Probe PCI driver: net_ice (8086:159b) device: 0000:98:00.0 (socket 2)
> ice_load_pkg_type(): Active package is: 1.3.26.0, ICE OS Default Package (single VLAN mode)
> EAL: Probe PCI driver: net_ice (8086:159b) device: 0000:98:00.1 (socket 2)
> ice_load_pkg_type(): Active package is: 1.3.26.0, ICE OS Default Package (single VLAN mode)
> TELEMETRY: No legacy callbacks, legacy socket not created
> ice_set_rx_function(): Using AVX2 Vector Rx (port 0).
> ice_set_rx_function(): Using AVX2 Vector Rx (port 1).
> Info: Left port and Left Sender CPU core belong to the same NUMA node: 2
> Info: Right port and Right Receiver CPU core belong to the same NUMA node: 2
> Info: Right port and Right Sender CPU core belong to the same NUMA node: 2
> Info: Left port and Left Receiver CPU core belong to the same NUMA node: 2
> Info: Testing initiated at 2023-09-15 07:50:17
> EAL: Error - exiting with code: 1
> Cause: Forward sending exceeded the 60.0006000000 seconds limit, the test is invalid.
> EAL: Error - exiting with code: 1
> Cause: Reverse sending exceeded the 60.0006000000 seconds limit, the test is invalid.
> root@x033:~/siitperf#
>
> The rte_exit() function seems to work, as the error message appears and
> the number of sent frames is not displayed; however, the "Info: ..."
> message about the sending time (printed out earlier in the code) is
> missing! This is rather strange!
>
> What is worse, the program does not stop, but *the sender threads and
> the main program remain running (forever)*.
>
> Here is the output of the "top" command:
>
> top - 07:54:24 up 1 day, 14:12, 2 users, load average: 3.02, 2.41, 2.10
> Tasks: 591 total, 2 running, 589 sleeping, 0 stopped, 0 zombie
> %Cpu0  :100.0 us,  0.0 sy,  0.0 ni,  0.0 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
> %Cpu1  :100.0 us,  0.0 sy,  0.0 ni,  0.0 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
> %Cpu2  :  0.0 us,  0.0 sy,  0.0 ni, 94.1 id,  0.0 wa,  0.0 hi,  5.9 si,  0.0 st
> %Cpu3  :  0.0 us,  0.0 sy,  0.0 ni,100.0 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
> %Cpu4  :  0.0 us,  0.0 sy,  0.0 ni,100.0 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
> %Cpu5  :  0.0 us,  0.0 sy,  0.0 ni,100.0 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
> %Cpu6  :  0.0 us,  0.0 sy,  0.0 ni,100.0 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
> %Cpu7  :  0.0 us,  0.0 sy,  0.0 ni,100.0 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
> %Cpu8  :  0.0 us,  0.0 sy,  0.0 ni,100.0 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
> %Cpu9  :100.0 us,  0.0 sy,  0.0 ni,  0.0 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
> %Cpu10 :  0.0 us,  0.0 sy,  0.0 ni,100.0 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
> %Cpu11 :  0.0 us,  0.0 sy,  0.0 ni,100.0 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
> %Cpu12 :  0.0 us,  0.0 sy,  0.0 ni,100.0 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
> %Cpu13 :  0.0 us,  0.0 sy,  0.0 ni,100.0 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
>
> CPU0 is the main core, the left sender and the right sender use CPU1 and
> CPU9, respectively. (The receivers that used CPU5 and CPU13 already
> terminated due to their timeout.)
>
> *Thus, rte_exit() behaves differently now: it used to terminate the
> main program, but now it does not.* (It also suppresses some
> previously printed output.)
>
> Is it a bug or a new feature?
>
> *How could I achieve the old behavior?* (Or at least the termination of
> the main program by the sender threads?)
>
> Thank you very much for your guidance!
>
> Best regards,
>
> Gábor Lencse
Please get a backtrace. A simple way is to attach gdb to that process.
I suspect that, since rte_exit() calls the internal eal_cleanup function
and that calls close in the driver, the ICE driver's close function has
a bug. Perhaps the ice close function does not correctly handle the case
where the device has not been started.
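Stephen's suggestion can be carried out along these lines (a sketch: it assumes the stuck tester is still running and that `pgrep -f siitperf` matches only that process; adjust the name lookup to your setup):

```shell
# Attach gdb to the stuck siitperf process and dump a backtrace of
# every thread, then detach without killing the process.  --batch
# runs the -ex commands non-interactively and exits when done.
gdb -p "$(pgrep -f siitperf | head -n1)" --batch \
    -ex 'set pagination off' \
    -ex 'thread apply all bt'
```

The backtrace of the lcore threads spinning at 100% should show where eal_cleanup (or the driver's close path) is blocked.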
Thread overview: 8+ messages
2023-09-15 8:24 Gabor LENCSE
2023-09-15 15:06 ` Stephen Hemminger [this message]
2023-09-15 18:28 ` Gabor LENCSE
2023-09-15 21:33 ` Stephen Hemminger
2023-09-17 19:37 ` Gabor LENCSE
2023-09-17 21:27 ` Stephen Hemminger
2023-09-18 18:23 ` Gabor LENCSE
2023-09-18 18:49 ` Stephen Hemminger