From: Stephen Hemminger <stephen@networkplumber.org>
To: "Bly, Mike" <mbly@ciena.com>
Cc: "dev@dpdk.org" <dev@dpdk.org>, Jakub Grajciar <jgrajcia@cisco.com>
Subject: Re: memif thread race condition on memif.disconnect()
Date: Mon, 30 Oct 2023 12:18:28 -0700
Message-ID: <20231030121828.337c96b0@fedora>
In-Reply-To: <BYAPR04MB4325C64E8B14347B37EFF237CFCCA@BYAPR04MB4325.namprd04.prod.outlook.com>
On Wed, 11 Oct 2023 19:57:56 +0000
"Bly, Mike" <mbly@ciena.com> wrote:
> Hello,
>
> We have run into a timing issue between threads when using the memif
> interface type and need some guidance.
>
> Our application has a DPDK based process operating (among other
> things) a memif server interface. The problem is exposed when this
> memif interface receives a memif.disconnect message from the remote
> client, while in the middle of an rte_eth_rx_burst() on this same
> memif interface. Because the IRQ message handling runs on its own
> thread, separate from the DPDK worker thread doing the rx_burst, this
> resulted in a crash; the backtraces are shared below. How does
> one ensure there are guard rails in place to gracefully exit the
> rx-burst when a disconnect occurs? Or, how do we properly modify the
> code such that we defer responding to the disconnect CB after the
> rx-burst operation has completed?
>
> We are utilizing DPDK 21.11.2. I have diff'd dpdk-stable:22.11.3 in
> ./drivers/net/memif, but I do not see anything obvious that would
> address this. I did a similar diff for dpdk:23.07, but do not see
> anything obvious there either.
>
> -Mike
>
> (gdb) thread 1
> [Switching to thread 1 (Thread 0x7f17e2813600 (LWP 470))]
> #0  0x00007f17e374d225 in eth_memif_rx (queue=0x1189023b00, bufs=0x7f17e28100e8, nb_pkts=32) at ../git/drivers/net/memif/rte_eth_memif.c:338
> 338                     last_slot = __atomic_load_n(&ring->head, __ATOMIC_ACQUIRE);
> (gdb) bt
> #0  0x00007f17e374d225 in eth_memif_rx (queue=0x1189023b00, bufs=0x7f17e28100e8, nb_pkts=32) at ../git/drivers/net/memif/rte_eth_memif.c:338
> #1  0x000000000047e6fb in rte_eth_rx_burst (nb_pkts=32, rx_pkts=0x7f17e28100e8, queue_id=0, port_id=<optimized out>) at /usr/include/rte_ethdev.h:5368
> #2  pmd_main_loop () at ../git/swfw/api/src/swfwPmd.c:1086
> #3  0x000000000047f309 in pmd_launch_one_lcore (dummy=<optimized out>) at ../git/my_process.c:1157
> #4  0x00007f17f7070e7c in eal_thread_loop (arg=<optimized out>) at ../git/lib/eal/linux/eal_thread.c:146
> #5  0x00007f17f4c3da72 in start_thread (arg=<optimized out>) at pthread_create.c:442
> #6  0x00007f17f4cbf930 in clone3 () at ../sysdeps/unix/sysv/linux/x86_64/clone3.S:81
> (gdb) l
> 333             ring_size = 1 << mq->log2_ring_size;
> 334             mask = ring_size - 1;
> 335
> 336             if (type == MEMIF_RING_C2S) {
> 337                     cur_slot = mq->last_head;
> 338                     last_slot = __atomic_load_n(&ring->head, __ATOMIC_ACQUIRE);
> 339             } else {
> 340                     cur_slot = mq->last_tail;
> 341                     last_slot = __atomic_load_n(&ring->tail, __ATOMIC_ACQUIRE);
> 342             }
> (gdb) p ring->head
> Cannot access memory at address 0x7f17d8e58006
>
> (gdb) thread 19
> [Switching to thread 19 (Thread 0x7f17f0804600 (LWP 468))]
> #0  0x00007f17f4caf97b in __GI___close (fd=494) at ../sysdeps/unix/sysv/linux/close.c:27
> 27        return SYSCALL_CANCEL (close, fd);
> (gdb) bt
> #0  0x00007f17f4caf97b in __GI___close (fd=494) at ../sysdeps/unix/sysv/linux/close.c:27
> #1  0x00007f17e374f01f in memif_free_regions (dev=dev@entry=0x7f17f727f000 <rte_eth_devices+99072>) at ../git/drivers/net/memif/rte_eth_memif.c:882
> #2  0x00007f17e37475d0 in memif_disconnect (dev=0x7f17f727f000 <rte_eth_devices+99072>) at ../git/drivers/net/memif/memif_socket.c:623
> #3  0x00007f17f7091bd2 in eal_intr_process_interrupts (nfds=<optimized out>, events=<optimized out>) at ../git/lib/eal/linux/eal_interrupts.c:1026
> #4  eal_intr_handle_interrupts (totalfds=<optimized out>, pfd=20) at ../git/lib/eal/linux/eal_interrupts.c:1100
> #5  eal_intr_thread_main (arg=<optimized out>) at ../git/lib/eal/linux/eal_interrupts.c:1172
> #6  0x00007f17f4c3da72 in start_thread (arg=<optimized out>) at pthread_create.c:442
> #7  0x00007f17f4cbf930 in clone3 () at ../sysdeps/unix/sysv/linux/x86_64/clone3.S:81
>
I don't think the memif maintainer has been very active.
One possibility would be for the memif driver to support the removal
event interrupt. This would require changes to both the driver and the
application.
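
For illustration only, here is a rough sketch of the kind of handshake
that would let the disconnect path wait until an in-flight rx burst has
finished before the shared region is released. It is not the memif
driver's code and uses no DPDK API: struct guarded_queue and the
guarded_* functions are hypothetical names, the int array stands in for
the mapped ring, and a single worker thread per queue is assumed.

```c
#include <assert.h>
#include <sched.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a memif queue; not the driver's real struct. */
struct guarded_queue {
	atomic_bool stopping;	/* set by the control thread before teardown */
	atomic_bool in_burst;	/* set by the worker while it touches the ring */
	int *ring;		/* stand-in for the shared-memory region */
	size_t ring_len;
};

static void guarded_queue_init(struct guarded_queue *q, int *ring, size_t len)
{
	atomic_init(&q->stopping, false);
	atomic_init(&q->in_burst, false);
	q->ring = ring;
	q->ring_len = len;
}

/* Worker side: copy up to max entries, or return 0 once teardown has begun. */
static size_t guarded_rx_burst(struct guarded_queue *q, int *out, size_t max)
{
	size_t n = 0;

	/* Announce the burst first, then check for teardown (both seq_cst). */
	atomic_store(&q->in_burst, true);
	if (atomic_load(&q->stopping)) {
		atomic_store(&q->in_burst, false);
		return 0;
	}

	while (n < max && n < q->ring_len) {
		out[n] = q->ring[n];
		n++;
	}

	atomic_store(&q->in_burst, false);
	return n;
}

/* Control side: block new bursts, wait out the current one, then release. */
static void guarded_disconnect(struct guarded_queue *q)
{
	atomic_store(&q->stopping, true);
	while (atomic_load(&q->in_burst))
		sched_yield();

	/* Only now would it be safe to unmap or close the region. */
	q->ring = NULL;
	q->ring_len = 0;
}
```

Because both sides use sequentially consistent atomics, either the
worker observes stopping and backs off before touching the ring, or the
control thread observes in_burst and waits. The cost is that the control
thread spins while a burst is in progress, which may not be acceptable
on a shared interrupt thread; deferring the teardown to the worker
itself would be an alternative.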