From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <luca.boccassi@gmail.com>
Received: from mail-wr1-f65.google.com (mail-wr1-f65.google.com
 [209.85.221.65]) by dpdk.org (Postfix) with ESMTP id C02404CA1;
 Thu, 16 Aug 2018 15:50:54 +0200 (CEST)
Received: by mail-wr1-f65.google.com with SMTP id u12-v6so4239091wrr.4;
 Thu, 16 Aug 2018 06:50:54 -0700 (PDT)
X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed;
 d=1e100.net; s=20161025;
 h=x-gm-message-state:from:to:cc:subject:date:message-id:in-reply-to
 :references;
 bh=OoN1luTMKuIv4XtXNZbAo4G+yixREYqmXo693o6dr9w=;
 b=mMg1IbkSEMRj16T9JI6psVrrlAyayrcP5pgO8MlKeuakImBzcW7UZwQWkKRVAelvNl
 bJ7evH/LAfiGCSSMMF5ezQgTMPvHgm5BSePSRZyxuIBuq8mPUNr1Aq+a8NylSp2ff3ec
 y8A78HKc9nUACHT8/eY21nqB0Z8+3nuqPScQv+pfv6eAYpv2+AdhdtU6IzmFn1McEL2W
 MwlHXXO/5EYtPD8qib5O8l6g9PGtUUnb21T0d37H3IjkN/dLO7FnmFhLVz6bGG8eQr+N
 ty/o7xeNf3myG8T8IlnovRjaq7KSpdyRJFYk6kLpAh2pqQ69t7JQ7Alio4HVjAPIJFeY
 iy1g==
X-Gm-Message-State: AOUpUlH4OqZVo/MS0GZOe39brWcJFgpdmkJ/CoL+jZivl1AVcMysWYDp
 Ig5tQVqdpdBpSBRTBxpYPDLIOUssysg=
X-Google-Smtp-Source: AA+uWPwsfDbBj+42Ugf61gyC8OWfQZ5ePRtcg1vR01OZocHA3neXba5sZxMCUryhMybBUw3BilyMEQ==
X-Received: by 2002:adf:c890:: with SMTP id k16-v6mr697787wrh.6.1534427454171; 
 Thu, 16 Aug 2018 06:50:54 -0700 (PDT)
Received: from localhost ([2001:1be0:110d:fcfe:41aa:5bfa:6cf3:7531])
 by smtp.gmail.com with ESMTPSA id n8-v6sm18450369wrt.56.2018.08.16.06.50.52
 (version=TLS1_2 cipher=ECDHE-RSA-CHACHA20-POLY1305 bits=256/256);
 Thu, 16 Aug 2018 06:50:53 -0700 (PDT)
From: Luca Boccassi <bluca@debian.org>
To: dev@dpdk.org
Cc: maxime.coquelin@redhat.com, tiwei.bie@intel.com, yongwang@vmware.com,
 3chas3@gmail.com, bruce.richardson@intel.com, jianfeng.tan@intel.com,
 anatoly.burakov@intel.com, Luca Boccassi <bluca@debian.org>,
 stable@dpdk.org, Brian Russell <brussell@brocade.com>
Date: Thu, 16 Aug 2018 14:50:32 +0100
Message-Id: <20180816135032.28283-4-bluca@debian.org>
X-Mailer: git-send-email 2.18.0
In-Reply-To: <20180816135032.28283-1-bluca@debian.org>
References: <20180816135032.28283-1-bluca@debian.org>
Subject: [dpdk-dev] [PATCH 3/3] eal/linux: handle uio read failure in
	interrupt handler
X-BeenThere: dev@dpdk.org
X-Mailman-Version: 2.1.15
Precedence: list
List-Id: DPDK patches and discussions <dev.dpdk.org>
List-Unsubscribe: <https://mails.dpdk.org/options/dev>,
 <mailto:dev-request@dpdk.org?subject=unsubscribe>
List-Archive: <http://mails.dpdk.org/archives/dev/>
List-Post: <mailto:dev@dpdk.org>
List-Help: <mailto:dev-request@dpdk.org?subject=help>
List-Subscribe: <https://mails.dpdk.org/listinfo/dev>,
 <mailto:dev-request@dpdk.org?subject=subscribe>
X-List-Received-Date: Thu, 16 Aug 2018 13:50:55 -0000

If a device is unplugged while an interrupt is pending, the read
call on the uio device that removes it from the poll wait list can
fail, leaving the device to be polled forever. Check for the read
failing and, if it does, unregister the device as an interrupt
source and return so that the wait list is rebuilt.

This race has been reported and observed in production.

Fixes: 0a45657a6794 ("pci: rework interrupt handling")
Cc: stable@dpdk.org

Signed-off-by: Brian Russell <brussell@brocade.com>
Signed-off-by: Luca Boccassi <bluca@debian.org>
---
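Note for reviewers (not part of the commit message): a minimal,
standalone sketch of the failure mode, assuming level-triggered epoll.
It is not the EAL code. A pipe stands in for the uio file descriptor,
the names count_wakeups() and stale_fd_demo() are hypothetical, and
EPOLL_CTL_DEL stands in for the wait-list rebuild that the EAL
interrupt thread performs. As long as the pending event is never
consumed, every epoll_wait() call reports the descriptor again; once
the source is dropped, the wakeups stop.

```c
#include <stddef.h>
#include <sys/epoll.h>
#include <unistd.h>

/* Hypothetical helper: poll epfd 'rounds' times without blocking and
 * count how many of those calls reported a ready descriptor. */
int count_wakeups(int epfd, int rounds)
{
	struct epoll_event ev;
	int i, hits = 0;

	for (i = 0; i < rounds; i++)
		if (epoll_wait(epfd, &ev, 1, 0) > 0)
			hits++;
	return hits;
}

/* Hypothetical demo: a pipe stands in for the uio fd. Returns 0 on
 * success and stores the wakeup counts seen before and after the
 * source is removed from the epoll set. */
int stale_fd_demo(int *before, int *after)
{
	struct epoll_event ev = { .events = EPOLLIN };
	int pfd[2];
	int epfd;
	int ret = -1;

	if (pipe(pfd) < 0)
		return -1;
	epfd = epoll_create1(0);
	if (epfd < 0)
		goto out_pipe;

	ev.data.fd = pfd[0];
	if (epoll_ctl(epfd, EPOLL_CTL_ADD, pfd[0], &ev) < 0)
		goto out_epoll;

	/* Make the fd readable, then never read it: this models the
	 * read() that fails and so never clears the pending event. */
	if (write(pfd[1], "x", 1) != 1)
		goto out_epoll;
	*before = count_wakeups(epfd, 5);

	/* Drop the source, the stand-in for unregistering it and
	 * rebuilding the wait list. */
	if (epoll_ctl(epfd, EPOLL_CTL_DEL, pfd[0], NULL) < 0)
		goto out_epoll;
	*after = count_wakeups(epfd, 5);
	ret = 0;

out_epoll:
	close(epfd);
out_pipe:
	close(pfd[0]);
	close(pfd[1]);
	return ret;
}
```

Calling stale_fd_demo(&before, &after) from a small test program is
enough to compare the two counts.
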
 lib/librte_eal/linuxapp/eal/eal_interrupts.c | 19 ++++++++++++++++++-
 1 file changed, 18 insertions(+), 1 deletion(-)

diff --git a/lib/librte_eal/linuxapp/eal/eal_interrupts.c b/lib/librte_eal/linuxapp/eal/eal_interrupts.c
index 4076c6d6ca..34584db883 100644
--- a/lib/librte_eal/linuxapp/eal/eal_interrupts.c
+++ b/lib/librte_eal/linuxapp/eal/eal_interrupts.c
@@ -627,7 +627,7 @@ eal_intr_process_interrupts(struct epoll_event *events, int nfds)
 	bool call = false;
 	int n, bytes_read;
 	struct rte_intr_source *src;
-	struct rte_intr_callback *cb;
+	struct rte_intr_callback *cb, *next;
 	union rte_intr_read_buffer buf;
 	struct rte_intr_callback active_cb;
 
@@ -701,6 +701,23 @@ eal_intr_process_interrupts(struct epoll_event *events, int nfds)
 					"descriptor %d: %s\n",
 					events[n].data.fd,
 					strerror(errno));
+				/*
+				 * The device is unplugged or buggy, remove
+				 * it as an interrupt source and return to
+				 * force the wait list to be rebuilt.
+				 */
+				rte_spinlock_lock(&intr_lock);
+				TAILQ_REMOVE(&intr_sources, src, next);
+				rte_spinlock_unlock(&intr_lock);
+
+				for (cb = TAILQ_FIRST(&src->callbacks); cb;
+							cb = next) {
+					next = TAILQ_NEXT(cb, next);
+					TAILQ_REMOVE(&src->callbacks, cb, next);
+					free(cb);
+				}
+				free(src);
+				return -1;
 			} else if (bytes_read == 0)
 				RTE_LOG(ERR, EAL, "Read nothing from file "
 					"descriptor %d\n", events[n].data.fd);
-- 
2.18.0