Date: Fri, 15 Sep 2023 08:06:08 -0700
From: Stephen Hemminger
To: Gabor LENCSE
Cc: users@dpdk.org
Subject: Re: rte_exit() does not terminate the program -- is it a bug or a new feature?

On Fri, 15 Sep 2023 10:24:01 +0200
Gabor LENCSE wrote:

> Dear DPDK Developers and Users,
>
> I have met the following issue with my RFC 8219 compliant SIIT and
> stateful NAT64/NAT44 tester, siitperf:
> https://github.com/lencsegabor/siitperf
>
> Its main program starts two sending threads and two receiving threads on
> their exclusively used CPU cores using the rte_eal_remote_launch()
> function, e.g., the code is as follows:
>
>           // start left sender
>           if ( rte_eal_remote_launch(send, &spars1, cpu_left_sender) )
>             std::cout << "Error: could not start Left Sender." << std::endl;
>
> When the test frame sending is finished, the senders check the sending
> time, and if the allowed time was significantly exceeded, the sender
> gives an error message and terminates (itself and also the main program)
> using the rte_exit() function.
>
> This is the code:
>
>   elapsed_seconds = (double)(rte_rdtsc()-start_tsc)/hz;
>   printf("Info: %s sender's sending took %3.10lf seconds.\n", side, elapsed_seconds);
>   if ( elapsed_seconds > duration*TOLERANCE )
>     rte_exit(EXIT_FAILURE, "%s sending exceeded the %3.10lf seconds limit, the test is invalid.\n", side, duration*TOLERANCE);
>   printf("%s frames sent: %lu\n", side, sent_frames);
>
>   return 0;
>
> The above code worked as I expected, while I used siitperf under Debian
> 9.13 with DPDK 16.11.11-1+deb9u2. It always displayed the execution time
> of test frame sending, and if the allowed time was significantly exceeded,
> then it gave an error message, and it was terminated, thus the sender
> did not print out the number of sent frames. And also the main program
> was terminated due to the call of this function: it did not write out
> the "Info: Test finished." message.
>
> However, when I updated siitperf to use it with Ubuntu 22.04 with DPDK
> version "21.11.3-0ubuntu0.22.04.1 amd64", then I experienced something
> rather strange:
>
> In the case, when the sending time is significantly exceeded, I get the
> following messages from the program (I copy here the full output, as it
> may be useful):
>
> root@x033:~/siitperf# cat temp.out
> EAL: Detected CPU lcores: 56
> EAL: Detected NUMA nodes: 4
> EAL: Detected shared linkage of DPDK
> EAL: Multi-process socket /var/run/dpdk/rte/mp_socket
> EAL: Selected IOVA mode 'PA'
> EAL: No free 2048 kB hugepages reported on node 0
> EAL: No free 2048 kB hugepages reported on node 1
> EAL: No free 2048 kB hugepages reported on node 2
> EAL: No free 2048 kB hugepages reported on node 3
> EAL: No available 2048 kB hugepages reported
> EAL: VFIO support initialized
> EAL: Probe PCI driver: net_ice (8086:159b) device: 0000:98:00.0 (socket 2)
> ice_load_pkg_type(): Active package is: 1.3.26.0, ICE OS Default Package (single VLAN mode)
> EAL: Probe PCI driver: net_ice (8086:159b) device: 0000:98:00.1 (socket 2)
> ice_load_pkg_type(): Active package is: 1.3.26.0, ICE OS Default Package (single VLAN mode)
> TELEMETRY: No legacy callbacks, legacy socket not created
> ice_set_rx_function(): Using AVX2 Vector Rx (port 0).
> ice_set_rx_function(): Using AVX2 Vector Rx (port 1).
> Info: Left port and Left Sender CPU core belong to the same NUMA node: 2
> Info: Right port and Right Receiver CPU core belong to the same NUMA node: 2
> Info: Right port and Right Sender CPU core belong to the same NUMA node: 2
> Info: Left port and Left Receiver CPU core belong to the same NUMA node: 2
> Info: Testing initiated at 2023-09-15 07:50:17
> EAL: Error - exiting with code: 1
>   Cause: Forward sending exceeded the 60.0006000000 seconds limit, the test is invalid.
> EAL: Error - exiting with code: 1
>   Cause: Reverse sending exceeded the 60.0006000000 seconds limit, the test is invalid.
> root@x033:~/siitperf#
>
> The rte_exit() function seems to work, as the error message appears, and
> the number of sent frames is not displayed, however, the "Info: ..."
> message about the sending time (printed out earlier in the code) is
> missing! This is rather strange!
>
> What is worse, the program does not stop, but *the sender threads and
> the main program remain running (forever)*.
>
> Here is the output of the "top" command:
>
> top - 07:54:24 up 1 day, 14:12,  2 users,  load average: 3.02, 2.41, 2.10
> Tasks: 591 total,   2 running, 589 sleeping,   0 stopped,   0 zombie
> %Cpu0  :100.0 us,  0.0 sy,  0.0 ni,  0.0 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
> %Cpu1  :100.0 us,  0.0 sy,  0.0 ni,  0.0 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
> %Cpu2  :  0.0 us,  0.0 sy,  0.0 ni, 94.1 id,  0.0 wa,  0.0 hi,  5.9 si,  0.0 st
> %Cpu3  :  0.0 us,  0.0 sy,  0.0 ni,100.0 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
> %Cpu4  :  0.0 us,  0.0 sy,  0.0 ni,100.0 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
> %Cpu5  :  0.0 us,  0.0 sy,  0.0 ni,100.0 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
> %Cpu6  :  0.0 us,  0.0 sy,  0.0 ni,100.0 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
> %Cpu7  :  0.0 us,  0.0 sy,  0.0 ni,100.0 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
> %Cpu8  :  0.0 us,  0.0 sy,  0.0 ni,100.0 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
> %Cpu9  :100.0 us,  0.0 sy,  0.0 ni,  0.0 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
> %Cpu10 :  0.0 us,  0.0 sy,  0.0 ni,100.0 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
> %Cpu11 :  0.0 us,  0.0 sy,  0.0 ni,100.0 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
> %Cpu12 :  0.0 us,  0.0 sy,  0.0 ni,100.0 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
> %Cpu13 :  0.0 us,  0.0 sy,  0.0 ni,100.0 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
>
> CPU0 is the main core, the left sender and the right sender use CPU1 and
> CPU9, respectively. (The receivers that used CPU5 and CPU13 already
> terminated due to their timeout.)
>
> *Thus, rte_exit() behaves differently now: it used to terminate the
> main program but now it does not.* (And it also suppresses some
> previously sent output.)
>
> Is it a bug or a new feature?
>
> *How could I achieve the old behavior?* (Or at least the termination of
> the main program by sender threads?)
>
> Thank you very much for your guidance!
>
> Best regards,
>
> Gábor Lencse

Please get a backtrace. A simple way is to attach gdb to that process.

I suspect that since rte_exit() calls the internal eal_cleanup function, and
that calls close in the driver, the ICE driver close function has a bug.
Perhaps the ice close function does not correctly handle the case where the
device has not started.
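
Until a backtrace confirms where the exit is stuck, one possible interim
approach (an assumption, not something verified against this setup) is to
have each sender report failure through its return value and let the main
lcore decide whether to exit. The sketch below is plain C++ with no DPDK
calls; finish_sender() and overall_status() are hypothetical names, and the
tolerance value used with them is only inferred from the 60.0006 second limit
in the log. The explicit fflush() is there because stdout is fully buffered
when redirected to a file, which may be why the "Info: ..." line never
reached temp.out if the process hung before its buffers were flushed.

```cpp
#include <cassert>
#include <cstdio>
#include <cstdlib>
#include <vector>

// Hypothetical stand-in for the tail of a sender function: it prints the
// timing line, flushes it, and returns a status instead of exiting the
// whole process from a worker thread.
int finish_sender(const char *side, double elapsed_seconds,
                  double duration, double tolerance)
{
    assert(duration > 0.0 && tolerance >= 1.0);

    std::printf("Info: %s sender's sending took %3.10lf seconds.\n",
                side, elapsed_seconds);
    std::fflush(stdout);  // make sure the line survives a later hang

    if (elapsed_seconds > duration * tolerance) {
        std::fprintf(stderr,
                     "%s sending exceeded the %3.10lf seconds limit, "
                     "the test is invalid.\n",
                     side, duration * tolerance);
        return -1;  // signal failure to whoever waits for this worker
    }
    return 0;
}

// Hypothetical helper for the main thread: turn the collected worker
// return values into a single process exit status.
int overall_status(const std::vector<int> &worker_results)
{
    for (int r : worker_results)
        if (r != 0)
            return EXIT_FAILURE;
    return EXIT_SUCCESS;
}
```

In a DPDK program the main lcore could collect these values with
rte_eal_wait_lcore(), which returns the value returned by the function that
was launched on that lcore, and then call rte_exit() itself if any sender
failed. Whether exiting from the main lcore avoids the hang depends on where
the cleanup is actually stuck, which the backtrace should show.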