From: Yongseok Koh
To: Luca Boccassi
Cc: Brian Russell, dpdk stable
Date: Thu, 29 Nov 2018 15:11:21 -0800
Message-Id: <20181129231202.30436-87-yskoh@mellanox.com>
In-Reply-To: <20181129231202.30436-1-yskoh@mellanox.com>
References: <20181129231202.30436-1-yskoh@mellanox.com>
Subject: [dpdk-stable] patch 'eal/linux: handle UIO read failure in interrupt handler' has been queued to LTS release 17.11.5
List-Id: patches for DPDK stable branches

Hi,

FYI, your patch has been queued to LTS release 17.11.5

Note it hasn't been pushed to http://dpdk.org/browse/dpdk-stable yet.
It will be pushed if I get no objections before 12/01/18. So please
shout if anyone has objections.

Also note that after the patch there's a diff of the upstream commit vs the patch applied
to the branch. If the code is different (i.e. not only metadata diffs), due for example to
a change in context or macro names, please double check it.

Thanks.
Yongseok

---
From 06121f31e2304ea8d3bfea4de44a3d435426a97c Mon Sep 17 00:00:00 2001
From: Luca Boccassi
Date: Wed, 31 Oct 2018 18:39:45 +0000
Subject: [PATCH] eal/linux: handle UIO read failure in interrupt handler

[ upstream commit 349ac52bbc5264d774c7e28c62c4e3941055b9c4 ]

If a device is unplugged while an interrupt is pending, the
read call to the uio device to remove it from the poll wait list
can fail resulting in it being continually polled forever. This
change checks for the read failing and if so, unregisters
the device as an interrupt source and causes the wait list to be
rebuilt.

This race has been reported and observed in production.

Fixes: 0a45657a6794 ("pci: rework interrupt handling")

Signed-off-by: Brian Russell
Signed-off-by: Luca Boccassi
---
 lib/librte_eal/linuxapp/eal/eal_interrupts.c | 19 ++++++++++++++++++-
 1 file changed, 18 insertions(+), 1 deletion(-)

diff --git a/lib/librte_eal/linuxapp/eal/eal_interrupts.c b/lib/librte_eal/linuxapp/eal/eal_interrupts.c
index e1179b85d..c54b8823d 100644
--- a/lib/librte_eal/linuxapp/eal/eal_interrupts.c
+++ b/lib/librte_eal/linuxapp/eal/eal_interrupts.c
@@ -652,7 +652,7 @@ eal_intr_process_interrupts(struct epoll_event *events, int nfds)
 	bool call = false;
 	int n, bytes_read;
 	struct rte_intr_source *src;
-	struct rte_intr_callback *cb;
+	struct rte_intr_callback *cb, *next;
 	union rte_intr_read_buffer buf;
 	struct rte_intr_callback active_cb;
 
@@ -723,6 +723,23 @@ eal_intr_process_interrupts(struct epoll_event *events, int nfds)
 					"descriptor %d: %s\n",
 					events[n].data.fd,
 					strerror(errno));
+				/*
+				 * The device is unplugged or buggy, remove
+				 * it as an interrupt source and return to
+				 * force the wait list to be rebuilt.
+				 */
+				rte_spinlock_lock(&intr_lock);
+				TAILQ_REMOVE(&intr_sources, src, next);
+				rte_spinlock_unlock(&intr_lock);
+
+				for (cb = TAILQ_FIRST(&src->callbacks); cb;
+							cb = next) {
+					next = TAILQ_NEXT(cb, next);
+					TAILQ_REMOVE(&src->callbacks, cb, next);
+					free(cb);
+				}
+				free(src);
+				return -1;
 			} else if (bytes_read == 0)
 				RTE_LOG(ERR, EAL, "Read nothing from file "
 					"descriptor %d\n", events[n].data.fd);
-- 
2.11.0

---
  Diff of the applied patch vs upstream commit (please double-check if non-empty:
---
--- -	2018-11-29 15:01:49.052273191 -0800
+++ 0087-eal-linux-handle-UIO-read-failure-in-interrupt-handl.patch	2018-11-29 15:01:45.238958000 -0800
@@ -1,8 +1,10 @@
-From 349ac52bbc5264d774c7e28c62c4e3941055b9c4 Mon Sep 17 00:00:00 2001
+From 06121f31e2304ea8d3bfea4de44a3d435426a97c Mon Sep 17 00:00:00 2001
 From: Luca Boccassi
 Date: Wed, 31 Oct 2018 18:39:45 +0000
 Subject: [PATCH] eal/linux: handle UIO read failure in interrupt handler
 
+[ upstream commit 349ac52bbc5264d774c7e28c62c4e3941055b9c4 ]
+
 If a device is unplugged while an interrupt is pending, the
 read call to the uio device to remove it from the poll wait list
 can fail resulting in it being continually polled forever. This
@@ -12,7 +14,6 @@
 This race has been reported and observed in production.
 
 Fixes: 0a45657a6794 ("pci: rework interrupt handling")
-Cc: stable@dpdk.org
 
 Signed-off-by: Brian Russell
 Signed-off-by: Luca Boccassi
@@ -21,10 +22,10 @@
  1 file changed, 18 insertions(+), 1 deletion(-)
 
 diff --git a/lib/librte_eal/linuxapp/eal/eal_interrupts.c b/lib/librte_eal/linuxapp/eal/eal_interrupts.c
-index 39252a887..cbac451e1 100644
+index e1179b85d..c54b8823d 100644
 --- a/lib/librte_eal/linuxapp/eal/eal_interrupts.c
 +++ b/lib/librte_eal/linuxapp/eal/eal_interrupts.c
-@@ -700,7 +700,7 @@ eal_intr_process_interrupts(struct epoll_event *events, int nfds)
+@@ -652,7 +652,7 @@ eal_intr_process_interrupts(struct epoll_event *events, int nfds)
  	bool call = false;
  	int n, bytes_read;
  	struct rte_intr_source *src;
@@ -33,7 +34,7 @@
  	union rte_intr_read_buffer buf;
  	struct rte_intr_callback active_cb;
  
-@@ -780,6 +780,23 @@ eal_intr_process_interrupts(struct epoll_event *events, int nfds)
+@@ -723,6 +723,23 @@ eal_intr_process_interrupts(struct epoll_event *events, int nfds)
  					"descriptor %d: %s\n",
  					events[n].data.fd,
  					strerror(errno));
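
For anyone reviewing the callback cleanup loop in this patch: it walks a TAILQ while freeing its elements, so it has to save the successor before each TAILQ_REMOVE. The following is only a minimal standalone sketch of that pattern using plain <sys/queue.h>. The names demo_cb, demo_cb_list, demo_fill and demo_drain are hypothetical stand-ins and not DPDK types or functions, and the sketch leaves out the spinlock and the interrupt source list that the real code also handles.

```c
#include <assert.h>
#include <stdlib.h>
#include <sys/queue.h>

/* Hypothetical stand-in for a callback entry; not the DPDK structure. */
struct demo_cb {
	TAILQ_ENTRY(demo_cb) next;
	int id;
};

TAILQ_HEAD(demo_cb_list, demo_cb);

/* Append up to n entries to the list; returns how many were added. */
static int
demo_fill(struct demo_cb_list *list, int n)
{
	int added = 0;

	assert(list != NULL);
	for (int i = 0; i < n; i++) {
		struct demo_cb *cb = malloc(sizeof(*cb));

		if (cb == NULL)
			break;
		cb->id = i;
		TAILQ_INSERT_TAIL(list, cb, next);
		added++;
	}
	return added;
}

/*
 * Remove and free every entry. The successor is read before the
 * current entry is unlinked and freed, so the loop never touches
 * freed memory. Returns how many entries were freed.
 */
static int
demo_drain(struct demo_cb_list *list)
{
	struct demo_cb *cb, *next;
	int freed = 0;

	assert(list != NULL);
	for (cb = TAILQ_FIRST(list); cb != NULL; cb = next) {
		next = TAILQ_NEXT(cb, next);
		TAILQ_REMOVE(list, cb, next);
		free(cb);
		freed++;
	}
	return freed;
}
```

A caller would TAILQ_INIT() a demo_cb_list, fill it, and then call demo_drain(), after which TAILQ_EMPTY() holds. The loop is written out by hand, as in the patch, because a plain TAILQ_FOREACH would read the next pointer from an entry that has already been freed.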