Subject: Re: rte_exit() does not terminate the program -- is it a bug or a new feature?
From: Gabor LENCSE <lencse@hit.bme.hu>
To: Stephen Hemminger
Cc: users@dpdk.org
Date: Fri, 15 Sep 2023 20:28:44 +0200

Dear Stephen,

Thank you very much for your answer!

> Please get a backtrace. Simple way is to attach gdb to that process.

I recompiled siitperf with the "-g" compiler option and ran it under gdb. When the program got stuck, I pressed Ctrl-C and issued a "bt" command, but of course that displayed only the call stack of the main thread. So I listed the threads with the "info threads" command, then switched to each thread in turn and issued a "bt" command for the ones that represented my send() and receive() functions (I identified them by their LWP numbers). Here are the results:

root@x033:~/siitperf# gdb ./build/siitperf-tp
GNU gdb (Ubuntu 12.1-0ubuntu1~22.04) 12.1
Copyright (C) 2022 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.
Type "show copying" and "show warranty" for details.
This GDB was configured as "x86_64-linux-gnu".
Type "show configuration" for configuration details. For bug reporting instructions, please see: . Find the GDB manual and other documentation resources online at:     . For help, type "help". Type "apropos word" to search for commands related to "word"... Reading symbols from ./build/siitperf-tp... (gdb) set args 84 8000000 60 2000 2 2 (gdb) run Starting program: /root/siitperf/build/siitperf-tp 84 8000000 60 2000 2 2 [Thread debugging using libthread_db enabled] Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1". EAL: Detected CPU lcores: 56 EAL: Detected NUMA nodes: 4 EAL: Detected shared linkage of DPDK [New Thread 0x7ffff49c0640 (LWP 24747)] EAL: Multi-process socket /var/run/dpdk/rte/mp_socket [New Thread 0x7ffff41bf640 (LWP 24748)] EAL: Selected IOVA mode 'PA' EAL: No free 2048 kB hugepages reported on node 0 EAL: No free 2048 kB hugepages reported on node 1 EAL: No free 2048 kB hugepages reported on node 2 EAL: No free 2048 kB hugepages reported on node 3 EAL: No available 2048 kB hugepages reported EAL: VFIO support initialized [New Thread 0x7ffff39be640 (LWP 24749)] [New Thread 0x7ffff31bd640 (LWP 24750)] [New Thread 0x7ffff29bc640 (LWP 24751)] [New Thread 0x7ffff21bb640 (LWP 24752)] EAL: Probe PCI driver: net_ice (8086:159b) device: 0000:98:00.0 (socket 2) ice_load_pkg_type(): Active package is: 1.3.26.0, ICE OS Default Package (single VLAN mode) EAL: Probe PCI driver: net_ice (8086:159b) device: 0000:98:00.1 (socket 2) ice_load_pkg_type(): Active package is: 1.3.26.0, ICE OS Default Package (single VLAN mode) [New Thread 0x7ffff19ba640 (LWP 24753)] TELEMETRY: No legacy callbacks, legacy socket not created ice_set_rx_function(): Using AVX2 Vector Rx (port 0). ice_set_rx_function(): Using AVX2 Vector Rx (port 1). Info: Left port and Left Sender CPU core belong to the same NUMA node: 2 Info: Right port and Right Receiver CPU core belong to the same NUMA node: 2 Info: Right port and Right Sender CPU core belong to the same NUMA node: 2 Info: Left port and Left Receiver CPU core belong to the same NUMA node: 2 Info: Testing initiated at 2023-09-15 18:06:05 Reverse frames received: 394340224 Forward frames received: 421381420 Info: Forward sender's sending took 70.3073795726 seconds. EAL: Error - exiting with code: 1   Cause: Forward sending exceeded the 60.0006000000 seconds limit, the test is invalid. Info: Reverse sender's sending took 74.9384769772 seconds. EAL: Error - exiting with code: 1   Cause: Reverse sending exceeded the 60.0006000000 seconds limit, the test is invalid. ^C Thread 1 "siitperf-tp" received signal SIGINT, Interrupt. 
0x00007ffff7d99dd2 in rte_eal_wait_lcore () from /lib/x86_64-linux-gnu/librte_eal.so.22
(gdb) bt
#0  0x00007ffff7d99dd2 in rte_eal_wait_lcore () from /lib/x86_64-linux-gnu/librte_eal.so.22
#1  0x000055555559929e in Throughput::measure (this=0x7fffffffe300, leftport=0, rightport=1) at throughput.cc:3743
#2  0x0000555555557b20 in main (argc=7, argv=0x7fffffffe5b8) at main-tp.cc:34
(gdb) info threads
  Id   Target Id                                           Frame
* 1    Thread 0x7ffff77cac00 (LWP 24744) "siitperf-tp"     0x00007ffff7d99dd2 in rte_eal_wait_lcore () from /lib/x86_64-linux-gnu/librte_eal.so.22
  2    Thread 0x7ffff49c0640 (LWP 24747) "eal-intr-thread" 0x00007ffff7a32fde in epoll_wait (epfd=6, events=0x7ffff49978d0, maxevents=3, timeout=-1) at ../sysdeps/unix/sysv/linux/epoll_wait.c:30
  3    Thread 0x7ffff41bf640 (LWP 24748) "rte_mp_handle"   __recvmsg_syscall (flags=0, msg=0x7ffff41965c0, fd=9) at ../sysdeps/unix/sysv/linux/recvmsg.c:27
  4    Thread 0x7ffff39be640 (LWP 24749) "lcore-worker-1"  0x00007ffff7d99dd2 in rte_eal_wait_lcore () from /lib/x86_64-linux-gnu/librte_eal.so.22
  5    Thread 0x7ffff31bd640 (LWP 24750) "lcore-worker-5"  __GI___libc_read (nbytes=1, buf=0x7ffff31947cf, fd=40) at ../sysdeps/unix/sysv/linux/read.c:26
  6    Thread 0x7ffff29bc640 (LWP 24751) "lcore-worker-9"  0x00007ffff7d99dd2 in rte_eal_wait_lcore () from /lib/x86_64-linux-gnu/librte_eal.so.22
  7    Thread 0x7ffff21bb640 (LWP 24752) "lcore-worker-13" __GI___libc_read (nbytes=1, buf=0x7ffff21927cf, fd=48) at ../sysdeps/unix/sysv/linux/read.c:26
  8    Thread 0x7ffff19ba640 (LWP 24753) "telemetry-v2"    0x00007ffff7a3460f in __libc_accept (fd=58, addr=..., len=0x0) at ../sysdeps/unix/sysv/linux/accept.c:26
(gdb) thread 1
[Switching to thread 1 (Thread 0x7ffff77cac00 (LWP 24744))]
#0  0x00007ffff7d99dd2 in rte_eal_wait_lcore () from /lib/x86_64-linux-gnu/librte_eal.so.22
(gdb) thread 2
[Switching to thread 2 (Thread 0x7ffff49c0640 (LWP 24747))]
#0  0x00007ffff7a32fde in epoll_wait (epfd=6, events=0x7ffff49978d0, maxevents=3, timeout=-1) at ../sysdeps/unix/sysv/linux/epoll_wait.c:30
30      ../sysdeps/unix/sysv/linux/epoll_wait.c: No such file or directory.
(gdb) thread 3
[Switching to thread 3 (Thread 0x7ffff41bf640 (LWP 24748))]
#0  __recvmsg_syscall (flags=0, msg=0x7ffff41965c0, fd=9) at ../sysdeps/unix/sysv/linux/recvmsg.c:27
27      ../sysdeps/unix/sysv/linux/recvmsg.c: No such file or directory.
(gdb) thread 4
[Switching to thread 4 (Thread 0x7ffff39be640 (LWP 24749))]
#0  0x00007ffff7d99dd2 in rte_eal_wait_lcore () from /lib/x86_64-linux-gnu/librte_eal.so.22
(gdb) bt
#0  0x00007ffff7d99dd2 in rte_eal_wait_lcore () from /lib/x86_64-linux-gnu/librte_eal.so.22
#1  0x00007ffff7d99f97 in rte_eal_mp_wait_lcore () from /lib/x86_64-linux-gnu/librte_eal.so.22
#2  0x00007ffff7da99ee in rte_service_finalize () from /lib/x86_64-linux-gnu/librte_eal.so.22
#3  0x00007ffff7db0404 in rte_eal_cleanup () from /lib/x86_64-linux-gnu/librte_eal.so.22
#4  0x00007ffff7d9d0b7 in rte_exit () from /lib/x86_64-linux-gnu/librte_eal.so.22
#5  0x000055555558e685 in send (par=0x7fffffffde00) at throughput.cc:1562
#6  0x00007ffff7d94a18 in ?? () from /lib/x86_64-linux-gnu/librte_eal.so.22
#7  0x00007ffff79a1b43 in start_thread (arg=<optimized out>) at ./nptl/pthread_create.c:442
#8  0x00007ffff7a33a00 in clone3 () at ../sysdeps/unix/sysv/linux/x86_64/clone3.S:81
(gdb) thread 5
[Switching to thread 5 (Thread 0x7ffff31bd640 (LWP 24750))]
#0  __GI___libc_read (nbytes=1, buf=0x7ffff31947cf, fd=40) at ../sysdeps/unix/sysv/linux/read.c:26
26      ../sysdeps/unix/sysv/linux/read.c: No such file or directory.
(gdb) bt
#0  __GI___libc_read (nbytes=1, buf=0x7ffff31947cf, fd=40) at ../sysdeps/unix/sysv/linux/read.c:26
#1  __GI___libc_read (fd=40, buf=0x7ffff31947cf, nbytes=1) at ../sysdeps/unix/sysv/linux/read.c:24
#2  0x00007ffff7d9490c in ?? () from /lib/x86_64-linux-gnu/librte_eal.so.22
#3  0x00007ffff79a1b43 in start_thread (arg=<optimized out>) at ./nptl/pthread_create.c:442
#4  0x00007ffff7a33a00 in clone3 () at ../sysdeps/unix/sysv/linux/x86_64/clone3.S:81
(gdb) thread 6
[Switching to thread 6 (Thread 0x7ffff29bc640 (LWP 24751))]
#0  0x00007ffff7d99dd2 in rte_eal_wait_lcore () from /lib/x86_64-linux-gnu/librte_eal.so.22
(gdb) bt
#0  0x00007ffff7d99dd2 in rte_eal_wait_lcore () from /lib/x86_64-linux-gnu/librte_eal.so.22
#1  0x00007ffff7d99f97 in rte_eal_mp_wait_lcore () from /lib/x86_64-linux-gnu/librte_eal.so.22
#2  0x00007ffff7da99ee in rte_service_finalize () from /lib/x86_64-linux-gnu/librte_eal.so.22
#3  0x00007ffff7db0404 in rte_eal_cleanup () from /lib/x86_64-linux-gnu/librte_eal.so.22
#4  0x00007ffff7d9d0b7 in rte_exit () from /lib/x86_64-linux-gnu/librte_eal.so.22
#5  0x000055555558e685 in send (par=0x7fffffffde80) at throughput.cc:1562
#6  0x00007ffff7d94a18 in ?? () from /lib/x86_64-linux-gnu/librte_eal.so.22
#7  0x00007ffff79a1b43 in start_thread (arg=<optimized out>) at ./nptl/pthread_create.c:442
#8  0x00007ffff7a33a00 in clone3 () at ../sysdeps/unix/sysv/linux/x86_64/clone3.S:81
(gdb) thread 7
[Switching to thread 7 (Thread 0x7ffff21bb640 (LWP 24752))]
#0  __GI___libc_read (nbytes=1, buf=0x7ffff21927cf, fd=48) at ../sysdeps/unix/sysv/linux/read.c:26
26      in ../sysdeps/unix/sysv/linux/read.c
(gdb) bt
#0  __GI___libc_read (nbytes=1, buf=0x7ffff21927cf, fd=48) at ../sysdeps/unix/sysv/linux/read.c:26
#1  __GI___libc_read (fd=48, buf=0x7ffff21927cf, nbytes=1) at ../sysdeps/unix/sysv/linux/read.c:24
#2  0x00007ffff7d9490c in ?? () from /lib/x86_64-linux-gnu/librte_eal.so.22
#3  0x00007ffff79a1b43 in start_thread (arg=<optimized out>) at ./nptl/pthread_create.c:442
#4  0x00007ffff7a33a00 in clone3 () at ../sysdeps/unix/sysv/linux/x86_64/clone3.S:81
(gdb) thread 8
[Switching to thread 8 (Thread 0x7ffff19ba640 (LWP 24753))]
#0  0x00007ffff7a3460f in __libc_accept (fd=58, addr=..., len=0x0) at ../sysdeps/unix/sysv/linux/accept.c:26
26      ../sysdeps/unix/sysv/linux/accept.c: No such file or directory.
(gdb) thread 9
Unknown thread 9.
(gdb)

Some additional information from the siitperf.conf file:

CPU-L-Send 1  # Left Sender runs on this core
CPU-R-Recv 5  # Right Receiver runs on this core
CPU-R-Send 9  # Right Sender runs on this core
CPU-L-Recv 13 # Left Receiver runs on this core

Therefore, the send() functions are the ones that are still running, on CPU cores 1 and 9. As I checked earlier, they fully utilize their cores (as does the main program on core 0).
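To double-check this, I could also dump the EAL's own view of the worker lcores. The helper below is only an illustrative sketch that is not part of siitperf (dump_lcore_states() is a made-up name): as far as I understand, rte_eal_wait_lcore() returns only when the awaited lcore goes back to the WAIT state, so any worker that never leaves RUNNING keeps all of its waiters spinning at 100% CPU.

    #include <stdio.h>
    #include <rte_launch.h>   /* rte_eal_get_lcore_state() */
    #include <rte_lcore.h>    /* RTE_LCORE_FOREACH_WORKER */

    /* Hypothetical debug helper: print the EAL state of every worker
     * lcore. A worker that is permanently RUNNING never satisfies
     * rte_eal_wait_lcore(), so everybody waiting for it spins too. */
    static void dump_lcore_states(void)
    {
        unsigned int lcore_id;

        RTE_LCORE_FOREACH_WORKER(lcore_id) {
            printf("lcore %u state: %s\n", lcore_id,
                   rte_eal_get_lcore_state(lcore_id) == RUNNING ?
                   "RUNNING" : "WAIT");
        }
    }

If my reading of the backtraces is right, such a helper would report lcores 1 and 9 as RUNNING indefinitely.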
This is a -- perhaps -- relevant part of the main program's code:

      // wait until active senders and receivers finish
      if ( forward ) {
        rte_eal_wait_lcore(cpu_left_sender);
        rte_eal_wait_lcore(cpu_right_receiver);
      }
      if ( reverse ) {
        rte_eal_wait_lcore(cpu_right_sender);
        rte_eal_wait_lcore(cpu_left_receiver);
      }

It seems to me that the two send() functions, as well as the main program, are actively waiting in the rte_eal_wait_lcore() function, but I do not understand why. If the sender that sent frames in the forward direction is there, and the main program is there, then IMHO the rte_eal_wait_lcore(cpu_left_sender) call should finish. Am I wrong?

> I suspect that since rte_exit() calls the internal eal_cleanup function
> and that calls close in the driver, that ICE driver close function has
> a bug. Perhaps the ice close function does not correctly handle the case
> where the device has not started.

Yes, your hypothesis was confirmed: both of my send() functions were in the rte_eal_cleanup() function. :-)

However, I am not sure which device you meant. I think the devices are initialized properly, because I can ensure a successful execution (and termination) of the program by halving the frame rate:

root@x033:~/siitperf# ./build/siitperf-tp 84 4000000 6 2000 2 2
EAL: Detected CPU lcores: 56
EAL: Detected NUMA nodes: 4
EAL: Detected shared linkage of DPDK
EAL: Multi-process socket /var/run/dpdk/rte/mp_socket
EAL: Selected IOVA mode 'PA'
EAL: No free 2048 kB hugepages reported on node 0
EAL: No free 2048 kB hugepages reported on node 1
EAL: No free 2048 kB hugepages reported on node 2
EAL: No free 2048 kB hugepages reported on node 3
EAL: No available 2048 kB hugepages reported
EAL: VFIO support initialized
EAL: Probe PCI driver: net_ice (8086:159b) device: 0000:98:00.0 (socket 2)
ice_load_pkg_type(): Active package is: 1.3.26.0, ICE OS Default Package (single VLAN mode)
EAL: Probe PCI driver: net_ice (8086:159b) device: 0000:98:00.1 (socket 2)
ice_load_pkg_type(): Active package is: 1.3.26.0, ICE OS Default Package (single VLAN mode)
TELEMETRY: No legacy callbacks, legacy socket not created
ice_set_rx_function(): Using AVX2 Vector Rx (port 0).
ice_set_rx_function(): Using AVX2 Vector Rx (port 1).
Info: Left port and Left Sender CPU core belong to the same NUMA node: 2
Info: Right port and Right Receiver CPU core belong to the same NUMA node: 2
Info: Right port and Right Sender CPU core belong to the same NUMA node: 2
Info: Left port and Left Receiver CPU core belong to the same NUMA node: 2
Info: Testing initiated at 2023-09-15 17:43:11
Info: Reverse sender's sending took 5.9999998420 seconds.
Reverse frames sent: 24000000
Info: Forward sender's sending took 5.9999999023 seconds.
Forward frames sent: 24000000
Forward frames received: 24000000
Reverse frames received: 24000000
Info: Test finished.
root@x033:~/siitperf#

(The only problem with this trick is that I want to use a binary search to determine the performance limit of the tester, and if the tester does not stop, my bash shell script waits for it forever; see also the sketch in the P.S. below.)

So, what should I do next?

Best regards,

Gábor
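P.S. If the root cause is really that calling rte_exit() on a worker lcore drags it into rte_eal_cleanup() -> rte_service_finalize() -> rte_eal_mp_wait_lcore(), where it then waits for itself (and for the other still-running sender) to reach the WAIT state, a workaround on my side could be to report the timeout through the lcore function's return value and let only the main lcore call rte_exit(). The following is a minimal sketch of that pattern under this assumption; send_worker(), the core id and the error check are made up for illustration and are not my actual siitperf code:

    #include <stdlib.h>      /* EXIT_FAILURE */
    #include <rte_eal.h>     /* rte_eal_init(), rte_eal_cleanup() */
    #include <rte_launch.h>  /* rte_eal_remote_launch(), rte_eal_wait_lcore() */
    #include <rte_debug.h>   /* rte_exit() */

    /* Worker lcore function: on timeout it returns an error code instead
     * of calling rte_exit(), which would run rte_eal_cleanup() ->
     * rte_eal_mp_wait_lcore() on a worker and make it wait for itself. */
    static int send_worker(void *arg)
    {
        (void)arg;
        int exceeded_limit = 1;   /* stand-in for the real elapsed-time check */
        return exceeded_limit ? 1 : 0;
    }

    int main(int argc, char **argv)
    {
        if (rte_eal_init(argc, argv) < 0)
            rte_exit(EXIT_FAILURE, "EAL initialization failed\n");

        unsigned int cpu_left_sender = 1;   /* illustrative core id */
        if (rte_eal_remote_launch(send_worker, NULL, cpu_left_sender) != 0)
            rte_exit(EXIT_FAILURE, "Could not launch the sender\n");

        /* rte_eal_wait_lcore() passes back the worker's return value, so
         * only the main lcore decides whether to abort the process. */
        if (rte_eal_wait_lcore(cpu_left_sender) != 0)
            rte_exit(EXIT_FAILURE, "Cause: sending exceeded the time limit, the test is invalid.\n");

        rte_eal_cleanup();
        return EXIT_SUCCESS;
    }

Since the main lcore then exits with a non-zero status when a sender reports a timeout, this would also unblock my binary-search script.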