From: bugzilla@dpdk.org
To: dev@dpdk.org
Subject: [Bug 1298] memif disconnect thread vs. rx_burst worker thread(s) crash
Date: Thu, 12 Oct 2023 15:37:37 +0000

https://bugs.dpdk.org/show_bug.cgi?id=1298

Bug ID: 1298
Summary: memif disconnect thread vs. rx_burst worker thread(s) crash
Product: DPDK
Version: 21.11
Hardware: All
OS: All
Status: UNCONFIRMED
Severity: normal
Priority: Normal
Component: ethdev
Assignee: dev@dpdk.org
Reporter: bly454@gmail.com
Target Milestone: ---

Hello,

We have run into a timing issue between threads when using the memif
interface type and need some guidance.

Our application has a DPDK-based process operating (among other things) a
memif server interface. The problem is exposed when this memif interface
receives a memif.disconnect message from the remote client while in the
middle of an rte_eth_rx_burst() on the same memif interface. Because the IRQ
message handling runs on its own thread, separate from the DPDK worker
thread doing the rx_burst, this resulted in a crash. The backtraces are
shared below.

How does one ensure there are guard rails in place to gracefully exit the
rx-burst when a disconnect occurs? Or, how do we properly modify the code
such that we defer responding to the disconnect CB until after the rx-burst
operation has completed?
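
To make the second question concrete, the kind of deferral being asked about can be pictured as a small enter/exit gate around the burst. The following is a minimal sketch only, under stated assumptions: `struct rx_gate` and the `rx_gate_*` functions are hypothetical names that do not exist in DPDK or in the memif driver, and the sketch says nothing about how memif would actually be fixed. The worker announces itself before checking the connection state, and the disconnect path clears the state before checking for workers, so with sequentially consistent atomics at least one side always notices the other.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical per-queue gate; none of these names exist in DPDK or memif. */
struct rx_gate {
    atomic_bool connected; /* cleared by the disconnect path */
    atomic_int  in_burst;  /* worker threads currently inside a burst */
};

void rx_gate_init(struct rx_gate *g)
{
    atomic_init(&g->connected, true);
    atomic_init(&g->in_burst, 0);
}

/* Worker thread: call before touching the ring.
 * Returns false if the interface is disconnected and the burst must be skipped. */
bool rx_gate_enter(struct rx_gate *g)
{
    atomic_fetch_add(&g->in_burst, 1);  /* announce first ... */
    if (atomic_load(&g->connected))     /* ... then check the state */
        return true;
    atomic_fetch_sub(&g->in_burst, 1);  /* back out, ring may be going away */
    return false;
}

/* Worker thread: call after the burst has finished with the ring. */
void rx_gate_exit(struct rx_gate *g)
{
    int prev = atomic_fetch_sub(&g->in_burst, 1);
    assert(prev > 0); /* exit without a matching successful enter */
    (void)prev;
}

/* Disconnect path: stop new bursts, then report whether teardown is safe now.
 * Returns false while a burst is still in flight; the caller must retry later
 * instead of freeing the regions. */
bool rx_gate_try_disconnect(struct rx_gate *g)
{
    atomic_store(&g->connected, false);
    return atomic_load(&g->in_burst) == 0;
}
```

In this sketch the worker would wrap each burst in `rx_gate_enter()` / `rx_gate_exit()`, and the interrupt-side handler would free the regions only once `rx_gate_try_disconnect()` returns true. The cost is two atomic operations per burst on the fast path, which is part of why it is posed here as a question rather than a proposal.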

We are utilizing DPDK 21.11.2. I have diff’d dpdk-stable:22.11.3 in
./drivers/net/memif, but I do not see anything obvious that would address
this. I did a similar diff for dpdk:23.07, but do not see anything obvious
there either.

-Mike

(gdb) thread 1
[Switching to thread 1 (Thread 0x7f17e2813600 (LWP 470))]
#0  0x00007f17e374d225 in eth_memif_rx (queue=0x1189023b00, bufs=0x7f17e28100e8, nb_pkts=32) at ../git/drivers/net/memif/rte_eth_memif.c:338
338                     last_slot = __atomic_load_n(&ring->head, __ATOMIC_ACQUIRE);
(gdb) bt
#0  0x00007f17e374d225 in eth_memif_rx (queue=0x1189023b00, bufs=0x7f17e28100e8, nb_pkts=32) at ../git/drivers/net/memif/rte_eth_memif.c:338
#1  0x000000000047e6fb in rte_eth_rx_burst (nb_pkts=32, rx_pkts=0x7f17e28100e8, queue_id=0, port_id=<optimized out>) at /usr/include/rte_ethdev.h:5368
#2  pmd_main_loop () at ../git/swfw/api/src/swfwPmd.c:1086
#3  0x000000000047f309 in pmd_launch_one_lcore (dummy=<optimized out>) at ../git/my_process.c:1157
#4  0x00007f17f7070e7c in eal_thread_loop (arg=<optimized out>) at ../git/lib/eal/linux/eal_thread.c:146
#5  0x00007f17f4c3da72 in start_thread (arg=<optimized out>) at pthread_create.c:442
#6  0x00007f17f4cbf930 in clone3 () at ../sysdeps/unix/sysv/linux/x86_64/clone3.S:81
(gdb) l
333             ring_size = 1 << mq->log2_ring_size;
334             mask = ring_size - 1;
335
336             if (type == MEMIF_RING_C2S) {
337                     cur_slot = mq->last_head;
338                     last_slot = __atomic_load_n(&ring->head, __ATOMIC_ACQUIRE);
339             } else {
340                     cur_slot = mq->last_tail;
341                     last_slot = __atomic_load_n(&ring->tail, __ATOMIC_ACQUIRE);
342             }
(gdb) p ring->head
Cannot access memory at address 0x7f17d8e58006

(gdb) thread 19
[Switching to thread 19 (Thread 0x7f17f0804600 (LWP 468))]
#0  0x00007f17f4caf97b in __GI___close (fd=494) at ../sysdeps/unix/sysv/linux/close.c:27
27        return SYSCALL_CANCEL (close, fd);
(gdb) bt
#0  0x00007f17f4caf97b in __GI___close (fd=494) at ../sysdeps/unix/sysv/linux/close.c:27
#1  0x00007f17e374f01f in memif_free_regions (dev=dev@entry=0x7f17f727f000 <rte_eth_devices+99072>) at ../git/drivers/net/memif/rte_eth_memif.c:882
#2  0x00007f17e37475d0 in memif_disconnect (dev=0x7f17f727f000 <rte_eth_devices+99072>) at ../git/drivers/net/memif/memif_socket.c:623
#3  0x00007f17f7091bd2 in eal_intr_process_interrupts (nfds=<optimized out>, events=<optimized out>) at ../git/lib/eal/linux/eal_interrupts.c:1026
#4  eal_intr_handle_interrupts (totalfds=<optimized out>, pfd=20) at ../git/lib/eal/linux/eal_interrupts.c:1100
#5  eal_intr_thread_main (arg=<optimized out>) at ../git/lib/eal/linux/eal_interrupts.c:1172
#6  0x00007f17f4c3da72 in start_thread (arg=<optimized out>) at pthread_create.c:442
#7  0x00007f17f4cbf930 in clone3 () at ../sysdeps/unix/sysv/linux/x86_64/clone3.S:81
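
Read together, the two traces are consistent with the interrupt thread tearing down the shared-memory regions (thread 19, in memif_free_regions) while the worker (thread 1) still holds a ring pointer into them, which would explain "Cannot access memory" on ring->head. The traces only show the close() call, so the assumption here is that the region is also unmapped during that teardown. The stand-alone sketch below is not memif code: `region_map`, `region_unmap` and `page_is_mapped` are hypothetical helpers that only illustrate how a pointer stays numerically valid but stops being backed by memory once the mapping is removed. It relies on Linux msync() failing with ENOMEM for an unmapped range.

```c
#define _DEFAULT_SOURCE
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/mman.h>
#include <unistd.h>

/* Hypothetical stand-in for mapping a shared ring region. */
void *region_map(size_t len)
{
    void *p = mmap(NULL, len, PROT_READ | PROT_WRITE,
                   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    return p == MAP_FAILED ? NULL : p;
}

/* Hypothetical stand-in for the teardown done on disconnect. */
int region_unmap(void *p, size_t len)
{
    return munmap(p, len);
}

/* Probe whether the page at addr is still mapped, without dereferencing it.
 * msync() reports ENOMEM when the range is not mapped. */
bool page_is_mapped(void *addr)
{
    long page = sysconf(_SC_PAGESIZE);

    assert((uintptr_t)addr % (uintptr_t)page == 0); /* msync needs page alignment */
    if (msync(addr, (size_t)page, MS_ASYNC) == 0)
        return true;
    return errno != ENOMEM;
}
```

A worker that cached `ring` before `region_unmap()` ran would see the same pointer value afterwards, but `page_is_mapped(ring)` would then report false and any load through the pointer would fault, which matches the shape of the crash in thread 1.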
          

