From: Gabor LENCSE <lencse@hit.bme.hu>
To: users@dpdk.org
Date: Fri, 15 Sep 2023 10:24:01 +0200
Subject: rte_exit() does not terminate the program -- is it a bug or a new feature?

Dear DPDK Developers and Users,

I have run into the following issue with my RFC 8219 compliant SIIT and stateful NAT64/NAT44 tester, siitperf: https://github.com/lencsegabor/siitperf

Its main program starts two sender threads and two receiver threads on their exclusively used CPU cores with the rte_eal_remote_launch() function; for example:

          // start left sender
          if ( rte_eal_remote_launch(send, &spars1, cpu_left_sender) )
            std::cout << "Error: could not start Left Sender." << std::endl;

When test frame sending finishes, each sender checks the elapsed sending time; if the allowed time was significantly exceeded, it prints an error message and terminates (both itself and the main program) with the rte_exit() function.

This is the code:

  elapsed_seconds = (double)(rte_rdtsc()-start_tsc)/hz;
  printf("Info: %s sender's sending took %3.10lf seconds.\n", side, elapsed_seconds);
  if ( elapsed_seconds > duration*TOLERANCE )
    rte_exit(EXIT_FAILURE, "%s sending exceeded the %3.10lf seconds limit, the test is invalid.\n", side, duration*TOLERANCE);
  printf("%s frames sent: %lu\n", side, sent_frames);

  return 0;

The above code worked as I expected while I used siitperf under Debian 9.13 with DPDK 16.11.11-1+deb9u2. It always displayed the execution time of test frame sending, and if the allowed time was significantly exceeded, it printed an error message and terminated, so the sender did not print the number of sent frames. The main program was also terminated by this call: it did not write out the "Info: Test finished." message.

However, when I updated siitperf to run under Ubuntu 22.04 with DPDK version "21.11.3-0ubuntu0.22.04.1 amd64", I experienced something rather strange:

When the sending time is significantly exceeded, I get the following messages from the program (I copy the full output here, as it may be useful):

root@x033:~/siitperf# cat temp.out
EAL: Detected CPU lcores: 56
EAL: Detected NUMA nodes: 4
EAL: Detected shared linkage of DPDK
EAL: Multi-process socket /var/run/dpdk/rte/mp_socket
EAL: Selected IOVA mode 'PA'
EAL: No free 2048 kB hugepages reported on node 0
EAL: No free 2048 kB hugepages reported on node 1
EAL: No free 2048 kB hugepages reported on node 2
EAL: No free 2048 kB hugepages reported on node 3
EAL: No available 2048 kB hugepages reported
EAL: VFIO support initialized
EAL: Probe PCI driver: net_ice (8086:159b) device: 0000:98:00.0 (socket 2)
ice_load_pkg_type(): Active package is: 1.3.26.0, ICE OS Default Package (single VLAN mode)
EAL: Probe PCI driver: net_ice (8086:159b) device: 0000:98:00.1 (socket 2)
ice_load_pkg_type(): Active package is: 1.3.26.0, ICE OS Default Package (single VLAN mode)
TELEMETRY: No legacy callbacks, legacy socket not created
ice_set_rx_function(): Using AVX2 Vector Rx (port 0).
ice_set_rx_function(): Using AVX2 Vector Rx (port 1).
Info: Left port and Left Sender CPU core belong to the same NUMA node: 2
Info: Right port and Right Receiver CPU core belong to the same NUMA node: 2
Info: Right port and Right Sender CPU core belong to the same NUMA node: 2
Info: Left port and Left Receiver CPU core belong to the same NUMA node: 2
Info: Testing initiated at 2023-09-15 07:50:17
EAL: Error - exiting with code: 1
  Cause: Forward sending exceeded the 60.0006000000 seconds limit, the test is invalid.
EAL: Error - exiting with code: 1
  Cause: Reverse sending exceeded the 60.0006000000 seconds limit, the test is invalid.
root@x033:~/siitperf#

The rte_exit() function seems to work, as the error message appears and the number of sent frames is not displayed; however, the "Info: ..." message about the sending time (printed earlier in the code) is missing! This is rather strange!

What is worse, the program does not stop, but the sender threads and the main program remain running (forever).

Here is the output of the "top" command:

top - 07:54:24 up 1 day, 14:12,  2 users,  load average: 3.02, 2.41, 2.10
Tasks: 591 total,   2 running, 589 sleeping,   0 stopped,   0 zombie
%Cpu0  :100.0 us,  0.0 sy,  0.0 ni,  0.0 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
%Cpu1  :100.0 us,  0.0 sy,  0.0 ni,  0.0 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
%Cpu2  :  0.0 us,  0.0 sy,  0.0 ni, 94.1 id,  0.0 wa,  0.0 hi,  5.9 si,  0.0 st
%Cpu3  :  0.0 us,  0.0 sy,  0.0 ni,100.0 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
%Cpu4  :  0.0 us,  0.0 sy,  0.0 ni,100.0 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
%Cpu5  :  0.0 us,  0.0 sy,  0.0 ni,100.0 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
%Cpu6  :  0.0 us,  0.0 sy,  0.0 ni,100.0 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
%Cpu7  :  0.0 us,  0.0 sy,  0.0 ni,100.0 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
%Cpu8  :  0.0 us,  0.0 sy,  0.0 ni,100.0 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
%Cpu9  :100.0 us,  0.0 sy,  0.0 ni,  0.0 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
%Cpu10 :  0.0 us,  0.0 sy,  0.0 ni,100.0 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
%Cpu11 :  0.0 us,  0.0 sy,  0.0 ni,100.0 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
%Cpu12 :  0.0 us,  0.0 sy,  0.0 ni,100.0 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
%Cpu13 :  0.0 us,  0.0 sy,  0.0 ni,100.0 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st

CPU0 is the main core, the left sender and the right sender use CPU1 and CPU9, respectively. (The receivers that used CPU5 and CPU13 already terminated due to their timeout.)

Thus, rte_exit() behaves differently now: it used to terminate the main program, but now it does not. (And it also suppresses some previously written output.)

Is it a bug or a new feature?

How could I achieve the old behavior? (Or at least the termination of the main program by sender threads?)

Thank you very much for your guidance!

Best regards,

Gábor Lencse


