Date: Mon, 30 Oct 2023 12:18:28 -0700
From: Stephen Hemminger
To: "Bly, Mike"
Cc: "dev@dpdk.org", Jakub Grajciar
Subject: Re: memif thread race condition on memif.disconnect()
Message-ID: <20231030121828.337c96b0@fedora>

On Wed, 11 Oct 2023 19:57:56 +0000
"Bly, Mike" wrote:

> Hello,
>
> We have run into a timing issue between threads when using the memif
> interface type and need some guidance.
>
> Our application has a DPDK-based process operating (among other
> things) a memif server interface. The problem is exposed when this
> memif interface receives a memif.disconnect message from the remote
> client while it is in the middle of an rte_eth_rx_burst() on the same
> memif interface. Because the IRQ message handling runs on its own
> thread, separate from the DPDK worker thread doing the rx_burst, this
> results in a crash. The backtraces are shared below.
> How does one ensure there are guard rails in place to gracefully exit
> the rx_burst when a disconnect occurs? Or, how do we properly modify
> the code so that we defer responding to the disconnect callback until
> the rx_burst operation has completed?
>
> We are using DPDK 21.11.2. I have diffed dpdk-stable:22.11.3 in
> ./drivers/net/memif, but I do not see anything obvious that would
> address this. I did a similar diff for dpdk:23.07, but do not see
> anything obvious there either.
>
> -Mike
>
> (gdb) thread 1
> [Switching to thread 1 (Thread 0x7f17e2813600 (LWP 470))]
> #0  0x00007f17e374d225 in eth_memif_rx (queue=0x1189023b00, bufs=0x7f17e28100e8, nb_pkts=32)
>     at ../git/drivers/net/memif/rte_eth_memif.c:338
> 338                     last_slot = __atomic_load_n(&ring->head, __ATOMIC_ACQUIRE);
> (gdb) bt
> #0  0x00007f17e374d225 in eth_memif_rx (queue=0x1189023b00, bufs=0x7f17e28100e8, nb_pkts=32)
>     at ../git/drivers/net/memif/rte_eth_memif.c:338
> #1  0x000000000047e6fb in rte_eth_rx_burst (nb_pkts=32, rx_pkts=0x7f17e28100e8, queue_id=0, port_id=)
>     at /usr/include/rte_ethdev.h:5368
> #2  pmd_main_loop () at ../git/swfw/api/src/swfwPmd.c:1086
> #3  0x000000000047f309 in pmd_launch_one_lcore (dummy=) at ../git/my_process.c:1157
> #4  0x00007f17f7070e7c in eal_thread_loop (arg=) at ../git/lib/eal/linux/eal_thread.c:146
> #5  0x00007f17f4c3da72 in start_thread (arg=) at pthread_create.c:442
> #6  0x00007f17f4cbf930 in clone3 () at ../sysdeps/unix/sysv/linux/x86_64/clone3.S:81
> (gdb) l
> 333             ring_size = 1 << mq->log2_ring_size;
> 334             mask = ring_size - 1;
> 335
> 336             if (type == MEMIF_RING_C2S) {
> 337                     cur_slot = mq->last_head;
> 338                     last_slot = __atomic_load_n(&ring->head, __ATOMIC_ACQUIRE);
> 339             } else {
> 340                     cur_slot = mq->last_tail;
> 341                     last_slot = __atomic_load_n(&ring->tail, __ATOMIC_ACQUIRE);
> 342             }
> (gdb) p ring->head
> Cannot access memory at address 0x7f17d8e58006
>
> (gdb) thread 19
> [Switching to thread 19 (Thread 0x7f17f0804600 (LWP 468))]
> #0  0x00007f17f4caf97b in __GI___close (fd=494) at ../sysdeps/unix/sysv/linux/close.c:27
> 27        return SYSCALL_CANCEL (close, fd);
> (gdb) bt
> #0  0x00007f17f4caf97b in __GI___close (fd=494) at ../sysdeps/unix/sysv/linux/close.c:27
> #1  0x00007f17e374f01f in memif_free_regions (dev=dev@entry=0x7f17f727f000 )
>     at ../git/drivers/net/memif/rte_eth_memif.c:882
> #2  0x00007f17e37475d0 in memif_disconnect (dev=0x7f17f727f000 )
>     at ../git/drivers/net/memif/memif_socket.c:623
> #3  0x00007f17f7091bd2 in eal_intr_process_interrupts (nfds=, events=<optimized out>)
>     at ../git/lib/eal/linux/eal_interrupts.c:1026
> #4  eal_intr_handle_interrupts (totalfds=, pfd=20) at ../git/lib/eal/linux/eal_interrupts.c:1100
> #5  eal_intr_thread_main (arg=) at ../git/lib/eal/linux/eal_interrupts.c:1172
> #6  0x00007f17f4c3da72 in start_thread (arg=) at pthread_create.c:442
> #7  0x00007f17f4cbf930 in clone3 () at ../sysdeps/unix/sysv/linux/x86_64/clone3.S:81

I don't think the memif maintainer has been very active.

One possibility would be for the memif driver to support the device
removal event interrupt (RTE_ETH_EVENT_INTR_RMV). This would require
changes to both the driver and the application.
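The backtraces show the interrupt thread running memif_free_regions() while the lcore is still reading ring->head in eth_memif_rx(). Whatever event the driver raises, the missing piece is an ordering guarantee: the shared memory must not be released until the datapath has acknowledged that it stopped touching it. Below is a minimal sketch of that handshake in plain C11 with pthreads, not DPDK code. All names (demo_port, demo_worker, demo_disconnect, demo_run) are hypothetical and do not exist in the memif driver or in DPDK. In a real application the stop request would presumably be set from a callback registered with rte_eth_dev_callback_register() for RTE_ETH_EVENT_INTR_RMV, which memif would first have to raise before freeing its regions; DPDK's RCU library (rte_rcu_qsbr) is an existing, more general mechanism for the same quiescent-state idea.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical stand-in for the shared-memory ring that memif
 * unmaps on disconnect. */
struct demo_ring {
	atomic_uint head;
};

/* Hypothetical handshake state shared by the datapath (lcore)
 * thread and the control (interrupt) thread. */
struct demo_port {
	struct demo_ring *ring;  /* freed only after worker_idle is set */
	atomic_bool stop_req;    /* control -> worker: stop polling */
	atomic_bool worker_idle; /* worker -> control: ring no longer used */
	atomic_ulong polls;      /* completed "rx bursts" */
};

/* Stand-in for the lcore loop that calls rte_eth_rx_burst(). */
static void *demo_worker(void *arg)
{
	struct demo_port *p = arg;

	while (!atomic_load(&p->stop_req)) {
		/* One "rx burst": read the producer index, as at line 338. */
		(void)atomic_load_explicit(&p->ring->head, memory_order_acquire);
		atomic_fetch_add(&p->polls, 1);
	}
	/* Every ring access above happens before this store, so once the
	 * control thread observes worker_idle it may free the ring. */
	atomic_store(&p->worker_idle, true);
	return NULL;
}

/* Stand-in for the disconnect path: request a stop, wait for the
 * acknowledgement, and only then release the shared memory. */
static void demo_disconnect(struct demo_port *p)
{
	atomic_store(&p->stop_req, true);
	while (!atomic_load(&p->worker_idle))
		; /* a real implementation would yield or bound this wait */
	free(p->ring);
	p->ring = NULL;
}

/* Runs one poll/disconnect cycle; returns 0 on success. */
static int demo_run(void)
{
	struct demo_port p = { .ring = calloc(1, sizeof(struct demo_ring)) };
	pthread_t tid;

	if (p.ring == NULL)
		return -1;
	atomic_init(&p.ring->head, 0);
	atomic_init(&p.stop_req, false);
	atomic_init(&p.worker_idle, false);
	atomic_init(&p.polls, 0);

	if (pthread_create(&tid, NULL, demo_worker, &p) != 0) {
		free(p.ring);
		return -1;
	}

	/* Make sure the disconnect arrives while the worker is polling. */
	while (atomic_load(&p.polls) == 0)
		;

	demo_disconnect(&p);
	pthread_join(tid, NULL);

	assert(atomic_load(&p.worker_idle));
	return p.ring == NULL ? 0 : -1;
}
```

Calling demo_run() should return 0: the ring is freed only after the worker has left its polling loop. The worker here simply exits; an application that keeps the lcore running would instead skip the affected port after acknowledging, and the driver side would need to wait for that acknowledgement before calling memif_free_regions(). Older toolchains may need -pthread to link.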