DPDK patches and discussions
 help / color / mirror / Atom feed
* [dpdk-dev] [PATCH V18 0/5] add hot plug failure and auto bind handler
@ 2018-04-03 18:17 Jeff Guo
  2018-04-03 18:17 ` [dpdk-dev] [PATCH V18 1/5] bus: introduce device hot unplug handle Jeff Guo
                   ` (4 more replies)
  0 siblings, 5 replies; 28+ messages in thread
From: Jeff Guo @ 2018-04-03 18:17 UTC (permalink / raw)
  To: stephen, bruce.richardson, ferruh.yigit, konstantin.ananyev,
	gaetan.rivet, jingjing.wu, thomas, motih, harry.van.haaren,
	jianfeng.tan
  Cc: jblunck, shreyansh.jain, dev, jia.guo, helin.zhang

At the prior, device event monitor machenism have been introduced.  
But for device hot unplug, if we want our data path would not be break when
device hot plug in or out, we still need some preparatory measures to do
some preparation work for the device detach and attach, so that we will 
not encounter memory fault after device have been plug out of the system,
and also let user directly attach device which have been auto bind onto 
the specific kernel driver.

This patch set will introduces two APIs to do that failure and auto bind
handle for hot plug feature, and also use testpmd to show example how to
use these 2 APIs for process hot plug event, let the process could be
smoothly like below case:

1)hot plug removal: 
plugout->failure handle->stop forward->stop port->close port->detach port

2)hot plug insertion:
plugin->kernel driver auto bind->attach port->start port

with this machenism, every use such as fail-safe driver which have enable device
event monitor will be able to develop their own hotplug application.

patchset history:
v16->v15:
add document, add signal bus handler, refine the code to be more clear.

the prior patch history please check the patch set
"add device event monitor framework"

Jeff Guo (5):
  bus: introduce device hot unplug handle
  bus/pci: implement handle hot unplug operation
  eal: add failure handler mechanism for hot plug
  eal: add driver auto bind for hot insertion
  app/testpmd: use auto handle for hotplug

 app/test-pmd/testpmd.c                    | 178 ++++++++++++++++++++++++----
 app/test-pmd/testpmd.h                    |   9 ++
 doc/guides/rel_notes/release_18_05.rst    |   8 ++
 drivers/bus/pci/pci_common.c              |  42 +++++++
 drivers/bus/pci/pci_common_uio.c          |  33 ++++++
 drivers/bus/pci/private.h                 |  12 ++
 lib/librte_eal/bsdapp/eal/eal_dev.c       |   7 ++
 lib/librte_eal/common/include/rte_bus.h   |  15 +++
 lib/librte_eal/common/include/rte_dev.h   |  35 ++++++
 lib/librte_eal/linuxapp/eal/eal_dev.c     | 188 +++++++++++++++++++++++++++++-
 lib/librte_eal/linuxapp/igb_uio/igb_uio.c |   4 +
 lib/librte_eal/rte_eal_version.map        |   2 +
 12 files changed, 509 insertions(+), 24 deletions(-)

-- 
2.7.4

^ permalink raw reply	[flat|nested] 28+ messages in thread

* [dpdk-dev] [PATCH V18 1/5] bus: introduce device hot unplug handle
  2018-04-03 18:17 [dpdk-dev] [PATCH V18 0/5] add hot plug failure and auto bind handler Jeff Guo
@ 2018-04-03 18:17 ` Jeff Guo
  2018-04-04  4:31   ` Tan, Jianfeng
  2018-04-03 18:17 ` [dpdk-dev] [PATCH V18 2/5] bus/pci: implement handle hot unplug operation Jeff Guo
                   ` (3 subsequent siblings)
  4 siblings, 1 reply; 28+ messages in thread
From: Jeff Guo @ 2018-04-03 18:17 UTC (permalink / raw)
  To: stephen, bruce.richardson, ferruh.yigit, konstantin.ananyev,
	gaetan.rivet, jingjing.wu, thomas, motih, harry.van.haaren,
	jianfeng.tan
  Cc: jblunck, shreyansh.jain, dev, jia.guo, helin.zhang

As of device hot unplug, we need some preparatory measures so that we will
not encounter memory fault after device have been plug out of the system,
and also let we could recover the running data path but not been break.
This allows the buses to handle device hot unplug event.
In the following patch, will show how to handle the case for pci bus.

Signed-off-by: Jeff Guo <jia.guo@intel.com>
---
v16->v15:
split patch, and remove the ops from RTE_VERIFY
---
 lib/librte_eal/common/include/rte_bus.h | 15 +++++++++++++++
 1 file changed, 15 insertions(+)

diff --git a/lib/librte_eal/common/include/rte_bus.h b/lib/librte_eal/common/include/rte_bus.h
index 6fb0834..ecd8b1c 100644
--- a/lib/librte_eal/common/include/rte_bus.h
+++ b/lib/librte_eal/common/include/rte_bus.h
@@ -168,6 +168,19 @@ typedef int (*rte_bus_unplug_t)(struct rte_device *dev);
 typedef int (*rte_bus_parse_t)(const char *name, void *addr);
 
 /**
+ * Implementation specific hot unplug handler function which is responsible
+ * for handle the failure when hot unplug the device, guaranty the system
+ * would not crash in the case.
+ * @param dev
+ *	Pointer of the device structure.
+ *
+ * @return
+ *	0 on success.
+ *	!0 on error.
+ */
+typedef int (*rte_bus_handle_hot_unplug_t)(struct rte_device *dev);
+
+/**
  * Bus scan policies
  */
 enum rte_bus_scan_mode {
@@ -209,6 +222,8 @@ struct rte_bus {
 	rte_bus_plug_t plug;         /**< Probe single device for drivers */
 	rte_bus_unplug_t unplug;     /**< Remove single device from driver */
 	rte_bus_parse_t parse;       /**< Parse a device name */
+	rte_bus_handle_hot_unplug_t handle_hot_unplug; /**< handle when device
+							hot unplug */
 	struct rte_bus_conf conf;    /**< Bus configuration */
 	rte_bus_get_iommu_class_t get_iommu_class; /**< Get iommu class */
 };
-- 
2.7.4

^ permalink raw reply	[flat|nested] 28+ messages in thread

* [dpdk-dev] [PATCH V18 2/5] bus/pci: implement handle hot unplug operation
  2018-04-03 18:17 [dpdk-dev] [PATCH V18 0/5] add hot plug failure and auto bind handler Jeff Guo
  2018-04-03 18:17 ` [dpdk-dev] [PATCH V18 1/5] bus: introduce device hot unplug handle Jeff Guo
@ 2018-04-03 18:17 ` Jeff Guo
  2018-04-04  5:25   ` Tan, Jianfeng
  2018-04-03 18:17 ` [dpdk-dev] [PATCH V18 3/5] eal: add failure handler mechanism for hot plug Jeff Guo
                   ` (2 subsequent siblings)
  4 siblings, 1 reply; 28+ messages in thread
From: Jeff Guo @ 2018-04-03 18:17 UTC (permalink / raw)
  To: stephen, bruce.richardson, ferruh.yigit, konstantin.ananyev,
	gaetan.rivet, jingjing.wu, thomas, motih, harry.van.haaren,
	jianfeng.tan
  Cc: jblunck, shreyansh.jain, dev, jia.guo, helin.zhang

When handle device hot unplug event, remap a dummy memory to avoid
bus read/write error.

Signed-off-by: Jeff Guo <jia.guo@intel.com>
---
v16->v15;
split patch, merge some function to be simple
---
 drivers/bus/pci/pci_common.c     | 42 ++++++++++++++++++++++++++++++++++++++++
 drivers/bus/pci/pci_common_uio.c | 33 +++++++++++++++++++++++++++++++
 drivers/bus/pci/private.h        | 12 ++++++++++++
 3 files changed, 87 insertions(+)

diff --git a/drivers/bus/pci/pci_common.c b/drivers/bus/pci/pci_common.c
index 2a00f36..fa077ec 100644
--- a/drivers/bus/pci/pci_common.c
+++ b/drivers/bus/pci/pci_common.c
@@ -474,6 +474,47 @@ pci_find_device(const struct rte_device *start, rte_dev_cmp_t cmp,
 }
 
 static int
+pci_handle_hot_unplug(struct rte_device *dev)
+{
+	struct rte_pci_device *pdev;
+	int ret;
+
+	if (dev == NULL)
+		return -EINVAL;
+
+	pdev = RTE_DEV_TO_PCI(dev);
+
+	/* remap resources for devices */
+	switch (pdev->kdrv) {
+	case RTE_KDRV_VFIO:
+#ifdef VFIO_PRESENT
+		/* TODO */
+#endif
+		break;
+	case RTE_KDRV_IGB_UIO:
+	case RTE_KDRV_UIO_GENERIC:
+		if (rte_eal_using_phys_addrs()) {
+			/* map resources for devices that use uio */
+			ret = pci_uio_remap_resource(pdev);
+		}
+		break;
+	case RTE_KDRV_NIC_UIO:
+		ret = pci_uio_remap_resource(pdev);
+		break;
+	default:
+		RTE_LOG(DEBUG, EAL,
+			"  Not managed by a supported kernel driver, skipped\n");
+		ret = 1;
+		break;
+	}
+
+	if (ret != 0)
+		RTE_LOG(ERR, EAL, "failed to handle hot unplug of %s",
+			pdev->name);
+	return ret;
+}
+
+static int
 pci_plug(struct rte_device *dev)
 {
 	return pci_probe_all_drivers(RTE_DEV_TO_PCI(dev));
@@ -503,6 +544,7 @@ struct rte_pci_bus rte_pci_bus = {
 		.unplug = pci_unplug,
 		.parse = pci_parse,
 		.get_iommu_class = rte_pci_get_iommu_class,
+		.handle_hot_unplug = pci_handle_hot_unplug,
 	},
 	.device_list = TAILQ_HEAD_INITIALIZER(rte_pci_bus.device_list),
 	.driver_list = TAILQ_HEAD_INITIALIZER(rte_pci_bus.driver_list),
diff --git a/drivers/bus/pci/pci_common_uio.c b/drivers/bus/pci/pci_common_uio.c
index 54bc20b..468ade4 100644
--- a/drivers/bus/pci/pci_common_uio.c
+++ b/drivers/bus/pci/pci_common_uio.c
@@ -146,6 +146,39 @@ pci_uio_unmap(struct mapped_pci_resource *uio_res)
 	}
 }
 
+/* remap the PCI resource of a PCI device in private virtual memory */
+int
+pci_uio_remap_resource(struct rte_pci_device *dev)
+{
+	int i;
+	uint64_t phaddr;
+	void *map_address;
+
+	if (dev == NULL)
+		return -1;
+
+	/* Map all BARs */
+	for (i = 0; i != PCI_MAX_RESOURCE; i++) {
+		/* skip empty BAR */
+		phaddr = dev->mem_resource[i].phys_addr;
+		if (phaddr == 0)
+			continue;
+		pci_unmap_resource(dev->mem_resource[i].addr,
+				(size_t)dev->mem_resource[i].len);
+		map_address = pci_map_resource(
+				dev->mem_resource[i].addr, -1, 0,
+				(size_t)dev->mem_resource[i].len,
+				MAP_ANONYMOUS | MAP_FIXED);
+		if (map_address == MAP_FAILED) {
+			RTE_LOG(ERR, EAL,
+				"Cannot remap resource for device %s\n", dev->name);
+			return -1;
+		}
+	}
+
+	return 0;
+}
+
 static struct mapped_pci_resource *
 pci_uio_find_resource(struct rte_pci_device *dev)
 {
diff --git a/drivers/bus/pci/private.h b/drivers/bus/pci/private.h
index 88fa587..7a862ef 100644
--- a/drivers/bus/pci/private.h
+++ b/drivers/bus/pci/private.h
@@ -173,6 +173,18 @@ void pci_uio_free_resource(struct rte_pci_device *dev,
 		struct mapped_pci_resource *uio_res);
 
 /**
+ * remap the pci uio resource..
+ *
+ * @param dev
+ *   Point to the struct rte pci device.
+ * @return
+ *   - On success, zero.
+ *   - On failure, a negative value.
+ */
+int
+pci_uio_remap_resource(struct rte_pci_device *dev);
+
+/**
  * Map device memory to uio resource
  *
  * This function is private to EAL.
-- 
2.7.4

^ permalink raw reply	[flat|nested] 28+ messages in thread

* [dpdk-dev] [PATCH V18 3/5] eal: add failure handler mechanism for hot plug
  2018-04-03 18:17 [dpdk-dev] [PATCH V18 0/5] add hot plug failure and auto bind handler Jeff Guo
  2018-04-03 18:17 ` [dpdk-dev] [PATCH V18 1/5] bus: introduce device hot unplug handle Jeff Guo
  2018-04-03 18:17 ` [dpdk-dev] [PATCH V18 2/5] bus/pci: implement handle hot unplug operation Jeff Guo
@ 2018-04-03 18:17 ` Jeff Guo
  2018-04-04  2:58   ` Zhang, Qi Z
  2018-04-03 18:17 ` [dpdk-dev] [PATCH V18 4/5] eal: add driver auto bind for hot insertion Jeff Guo
  2018-04-03 18:17 ` [dpdk-dev] [PATCH V18 5/5] app/testpmd: use auto handle for hotplug Jeff Guo
  4 siblings, 1 reply; 28+ messages in thread
From: Jeff Guo @ 2018-04-03 18:17 UTC (permalink / raw)
  To: stephen, bruce.richardson, ferruh.yigit, konstantin.ananyev,
	gaetan.rivet, jingjing.wu, thomas, motih, harry.van.haaren,
	jianfeng.tan
  Cc: jblunck, shreyansh.jain, dev, jia.guo, helin.zhang

This patch introduces an API (rte_dev_handle_hot_unplug) to handle device
hot unplug event. When device be hot plug out, the device resource
become invalid, if this resource is still be unexpected read/write,
system will crash. The api let user register the hot unplug handler, when
hot plug failure occur, the working thread will be block until the uevent
mechanism successful recovery the memory and guaranty the application keep
running smoothly.

Signed-off-by: Jeff Guo <jia.guo@intel.com>
---
v16->v15:
add document and signal bus handler 
---
 doc/guides/rel_notes/release_18_05.rst    |   6 ++
 lib/librte_eal/common/include/rte_dev.h   |  19 +++++
 lib/librte_eal/linuxapp/eal/eal_dev.c     | 134 +++++++++++++++++++++++++++++-
 lib/librte_eal/linuxapp/igb_uio/igb_uio.c |   4 +
 lib/librte_eal/rte_eal_version.map        |   1 +
 5 files changed, 163 insertions(+), 1 deletion(-)

diff --git a/doc/guides/rel_notes/release_18_05.rst b/doc/guides/rel_notes/release_18_05.rst
index 37e00c4..3aacbf1 100644
--- a/doc/guides/rel_notes/release_18_05.rst
+++ b/doc/guides/rel_notes/release_18_05.rst
@@ -51,6 +51,12 @@ New Features
   * ``rte_dev_event_callback_register`` and ``rte_dev_event_callback_unregister``
     are for the user's callbacks register and unregister.
 
+* **Added hot plug failure handler.**
+
+  Added a failure handler machenism to handle hot plug removal.
+
+  * ``rte_dev_handle_hot_unplug`` for handle hot plug removel failure.
+
 API Changes
 -----------
 
diff --git a/lib/librte_eal/common/include/rte_dev.h b/lib/librte_eal/common/include/rte_dev.h
index 4c78938..7075e56 100644
--- a/lib/librte_eal/common/include/rte_dev.h
+++ b/lib/librte_eal/common/include/rte_dev.h
@@ -361,4 +361,23 @@ rte_dev_event_monitor_start(void);
  */
 int __rte_experimental
 rte_dev_event_monitor_stop(void);
+
+/**
+ * @warning
+ * @b EXPERIMENTAL: this API may change without prior notice
+ *
+ * It can be used to register the device signal bus handler, and save the
+ * current environment of each thread, when signal bus error invoke, the
+ * handler would restore the environment by long jmp to each working
+ * thread, then block the thread  to waiting until the memory recovery
+ * and remapping be finished, that would guaranty the system not crash
+ * when the device be hot unplug.
+ *
+ * @param none
+ * @return
+ *   - On success, zero.
+ *   - On failure, a negative value.
+ */
+int __rte_experimental
+rte_dev_handle_hot_unplug(void);
 #endif /* _RTE_DEV_H_ */
diff --git a/lib/librte_eal/linuxapp/eal/eal_dev.c b/lib/librte_eal/linuxapp/eal/eal_dev.c
index 9f2ee40..fabb37a 100644
--- a/lib/librte_eal/linuxapp/eal/eal_dev.c
+++ b/lib/librte_eal/linuxapp/eal/eal_dev.c
@@ -4,6 +4,9 @@
 
 #include <string.h>
 #include <unistd.h>
+#include <signal.h>
+#include <setjmp.h>
+#include <pthread.h>
 #include <sys/socket.h>
 #include <linux/netlink.h>
 
@@ -12,12 +15,17 @@
 #include <rte_dev.h>
 #include <rte_malloc.h>
 #include <rte_interrupts.h>
+#include <rte_bus.h>
+#include <rte_per_lcore.h>
 
 #include "eal_private.h"
 
 static struct rte_intr_handle intr_handle = {.fd = -1 };
 static bool monitor_started;
 
+pthread_mutex_t failure_recovery_lock;
+pthread_cond_t failure_recovery_cond;
+
 #define EAL_UEV_MSG_LEN 4096
 #define EAL_UEV_MSG_ELEM_LEN 128
 
@@ -29,6 +37,22 @@ enum eal_dev_event_subsystem {
 	EAL_DEV_EVENT_SUBSYSTEM_MAX
 };
 
+static RTE_DEFINE_PER_LCORE(sigjmp_buf, unplug_longjmp_env);
+
+static void sigbus_handler(int signum __rte_unused)
+{
+	RTE_LOG(DEBUG, EAL, "receive SIGBUS error!\n");
+	siglongjmp(RTE_PER_LCORE(unplug_longjmp_env), 1);
+}
+
+static int cmp_dev_name(const struct rte_device *dev,
+	const void *_name)
+{
+	const char *name = _name;
+
+	return strcmp(dev->name, name);
+}
+
 static int
 dev_uev_socket_fd_create(void)
 {
@@ -135,16 +159,114 @@ dev_uev_receive(int fd, struct rte_dev_event *uevent)
 	return 0;
 }
 
+static int
+dev_uev_remove_handler(struct rte_device *dev)
+{
+	struct rte_bus *bus = rte_bus_find_by_device_name(dev->name);
+	int ret;
+
+	if (!dev)
+		return -1;
+
+	if (bus->handle_hot_unplug) {
+		/**
+	 	 * call bus ops to handle hot unplug.
+	 	 */
+		ret = bus->handle_hot_unplug(dev);
+		if (ret) {
+			RTE_LOG(ERR, EAL,
+				"It cannot handle hot unplug for device (%s) "
+				"on the bus.\n ",
+				dev->name);
+			return ret;
+		}
+	}	
+	return 0;
+}
+
 static void
 dev_uev_process(__rte_unused void *param)
 {
 	struct rte_dev_event uevent;
+	struct rte_bus *bus;
+	struct rte_device *dev;
+	const char *busname;
+	int ret = 0;
 
 	if (dev_uev_receive(intr_handle.fd, &uevent))
 		return;
 
-	if (uevent.devname)
+	switch (uevent.subsystem) {
+	case EAL_DEV_EVENT_SUBSYSTEM_PCI:
+	case EAL_DEV_EVENT_SUBSYSTEM_UIO:
+		busname = "pci";
+		break;
+	default:
+		break;
+	}
+
+	if (uevent.devname) {
+		if (uevent.type == RTE_DEV_EVENT_REMOVE) {
+			bus = rte_bus_find_by_name(busname);
+			if (bus == NULL) {
+				RTE_LOG(ERR, EAL, "Cannot find bus (%s)\n",
+					uevent.devname);
+				return;
+			}
+			dev = bus->find_device(NULL, cmp_dev_name,
+					       uevent.devname);
+			if (dev == NULL) {
+				RTE_LOG(ERR, EAL,
+					"Cannot find unplugged device (%s)\n",
+					uevent.devname);
+				return;
+			}
+			ret = dev_uev_remove_handler(dev);
+			if (ret) {
+				RTE_LOG(ERR, EAL, "Driver cannot remap the "
+					"device (%s)\n",
+					dev->name);
+				return;
+			}
+			/* wake up all the threads */
+			pthread_cond_broadcast(&failure_recovery_cond);
+		}
+
 		dev_callback_process(uevent.devname, uevent.type);
+	}
+}
+
+int __rte_experimental
+rte_dev_handle_hot_unplug(void)
+{
+	struct sigaction act;
+	sigset_t mask;
+
+	/* set signal handlers */
+	memset(&act, 0x00, sizeof(struct sigaction));
+	act.sa_handler = sigbus_handler;
+	sigemptyset(&act.sa_mask);
+	act.sa_flags = SA_RESTART;
+	sigaction(SIGBUS, &act, NULL);
+	sigemptyset(&mask);
+	sigaddset(&mask, SIGBUS);
+	pthread_sigmask(SIG_UNBLOCK, &mask, NULL);
+
+	if (sigsetjmp(RTE_PER_LCORE(unplug_longjmp_env), 1)) {
+		/*
+		 * wait on condition variable
+		 * before failure recovery be finish.
+		 */
+		pthread_mutex_lock(&failure_recovery_lock);
+		RTE_LOG(DEBUG, EAL, "begin waiting for the failure handler.\n");
+		pthread_cond_wait(&failure_recovery_cond,
+				  &failure_recovery_lock);
+		RTE_LOG(DEBUG, EAL,
+			"come back from waiting for failure handler.\n");
+		pthread_mutex_unlock(&failure_recovery_lock);
+	}
+
+	return 0;
 }
 
 int __rte_experimental
@@ -169,6 +291,12 @@ rte_dev_event_monitor_start(void)
 		return -1;
 	}
 
+	/* initialize mutex and condition variable
+	 * to control failure recovery.
+	 */
+	pthread_mutex_init(&failure_recovery_lock, NULL);
+	pthread_cond_init(&failure_recovery_cond, NULL);
+
 	monitor_started = true;
 
 	return 0;
@@ -192,5 +320,9 @@ rte_dev_event_monitor_stop(void)
 	close(intr_handle.fd);
 	intr_handle.fd = -1;
 	monitor_started = false;
+
+	pthread_cond_destroy(&failure_recovery_cond);
+	pthread_mutex_destroy(&failure_recovery_lock);
+
 	return 0;
 }
diff --git a/lib/librte_eal/linuxapp/igb_uio/igb_uio.c b/lib/librte_eal/linuxapp/igb_uio/igb_uio.c
index 4cae4dd..293c310 100644
--- a/lib/librte_eal/linuxapp/igb_uio/igb_uio.c
+++ b/lib/librte_eal/linuxapp/igb_uio/igb_uio.c
@@ -344,6 +344,10 @@ igbuio_pci_release(struct uio_info *info, struct inode *inode)
 	struct rte_uio_pci_dev *udev = info->priv;
 	struct pci_dev *dev = udev->pdev;
 
+	/* check if device has been remove before release */
+	if ((&dev->dev.kobj)->state_remove_uevent_sent == 1)
+		return -1;
+
 	mutex_lock(&udev->lock);
 	if (--udev->refcnt > 0) {
 		mutex_unlock(&udev->lock);
diff --git a/lib/librte_eal/rte_eal_version.map b/lib/librte_eal/rte_eal_version.map
index d23f491..d37dd29 100644
--- a/lib/librte_eal/rte_eal_version.map
+++ b/lib/librte_eal/rte_eal_version.map
@@ -262,5 +262,6 @@ EXPERIMENTAL {
 
         rte_dev_event_callback_register;
         rte_dev_event_callback_unregister;
+	rte_dev_handle_hot_unplug;
 
 } DPDK_18.05;
-- 
2.7.4

^ permalink raw reply	[flat|nested] 28+ messages in thread

* [dpdk-dev] [PATCH V18 4/5] eal: add driver auto bind for hot insertion
  2018-04-03 18:17 [dpdk-dev] [PATCH V18 0/5] add hot plug failure and auto bind handler Jeff Guo
                   ` (2 preceding siblings ...)
  2018-04-03 18:17 ` [dpdk-dev] [PATCH V18 3/5] eal: add failure handler mechanism for hot plug Jeff Guo
@ 2018-04-03 18:17 ` Jeff Guo
  2018-04-03 18:17 ` [dpdk-dev] [PATCH V18 5/5] app/testpmd: use auto handle for hotplug Jeff Guo
  4 siblings, 0 replies; 28+ messages in thread
From: Jeff Guo @ 2018-04-03 18:17 UTC (permalink / raw)
  To: stephen, bruce.richardson, ferruh.yigit, konstantin.ananyev,
	gaetan.rivet, jingjing.wu, thomas, motih, harry.van.haaren,
	jianfeng.tan
  Cc: jblunck, shreyansh.jain, dev, jia.guo, helin.zhang

Normally we use driverctl or dpdk-devbind.py to bind kernel driver before
application running, but lack of an function to automatically bind driver
at runtime. This patch introduce a new API (rte_dev_bind_kernel_driver),
aim to let user call it to bind the specific kernel driver according
their own policy, that would preparing for the next step of attach device,
let app running smoothly when hotplug behavior occur.

Signed-off-by: Jeff Guo <jia.guo@intel.com>
---
v16->v15:
add document.
---
 doc/guides/rel_notes/release_18_05.rst  |  6 ++--
 lib/librte_eal/bsdapp/eal/eal_dev.c     |  7 +++++
 lib/librte_eal/common/include/rte_dev.h | 16 ++++++++++
 lib/librte_eal/linuxapp/eal/eal_dev.c   | 54 +++++++++++++++++++++++++++++++++
 lib/librte_eal/rte_eal_version.map      |  1 +
 5 files changed, 82 insertions(+), 2 deletions(-)

diff --git a/doc/guides/rel_notes/release_18_05.rst b/doc/guides/rel_notes/release_18_05.rst
index 3aacbf1..36e505c 100644
--- a/doc/guides/rel_notes/release_18_05.rst
+++ b/doc/guides/rel_notes/release_18_05.rst
@@ -51,11 +51,13 @@ New Features
   * ``rte_dev_event_callback_register`` and ``rte_dev_event_callback_unregister``
     are for the user's callbacks register and unregister.
 
-* **Added hot plug failure handler.**
+* **Added hot plug failure handler and kernel driver auto-bind func**
 
-  Added a failure handler machenism to handle hot plug removal.
+  Added a failure handler machenism to handle hot plug removal, and added an kernel driver
+  auto bind function for hot plug insertion. The list of new APIs:
 
   * ``rte_dev_handle_hot_unplug`` for handle hot plug removel failure.
+  * ``rte_dev_bind_kernel_driver`` for hot plug insertion.
 
 API Changes
 -----------
diff --git a/lib/librte_eal/bsdapp/eal/eal_dev.c b/lib/librte_eal/bsdapp/eal/eal_dev.c
index 1c6c51b..e953a87 100644
--- a/lib/librte_eal/bsdapp/eal/eal_dev.c
+++ b/lib/librte_eal/bsdapp/eal/eal_dev.c
@@ -19,3 +19,10 @@ rte_dev_event_monitor_stop(void)
 	RTE_LOG(ERR, EAL, "Device event is not supported for FreeBSD\n");
 	return -1;
 }
+
+int __rte_experimental
+rte_dev_bind_driver(const char *dev_name, enum rte_kernel_driver kdrv_type)
+{
+	RTE_LOG(ERR, EAL, "Bind driver is not supported for FreeBSD\n");
+	return -1;
+}
diff --git a/lib/librte_eal/common/include/rte_dev.h b/lib/librte_eal/common/include/rte_dev.h
index 7075e56..6829514 100644
--- a/lib/librte_eal/common/include/rte_dev.h
+++ b/lib/librte_eal/common/include/rte_dev.h
@@ -380,4 +380,20 @@ rte_dev_event_monitor_stop(void);
  */
 int __rte_experimental
 rte_dev_handle_hot_unplug(void);
+
+/**
+ * It can be used to bind a device to a specific type of kernel driver.
+ *
+ * @param dev_name
+ *  The device name.
+ * @param kdrv_type
+ *  The specific kernel driver's type.
+ *
+ * @return
+ *  - On success, zero.
+ *  - On failure, a negative value.
+ */
+int __rte_experimental
+rte_dev_bind_kernel_driver(const char *dev_name,
+			   enum rte_kernel_driver kdrv_type);
 #endif /* _RTE_DEV_H_ */
diff --git a/lib/librte_eal/linuxapp/eal/eal_dev.c b/lib/librte_eal/linuxapp/eal/eal_dev.c
index fabb37a..eb8275f 100644
--- a/lib/librte_eal/linuxapp/eal/eal_dev.c
+++ b/lib/librte_eal/linuxapp/eal/eal_dev.c
@@ -4,6 +4,7 @@
 
 #include <string.h>
 #include <unistd.h>
+#include <fcntl.h>
 #include <signal.h>
 #include <setjmp.h>
 #include <pthread.h>
@@ -326,3 +327,56 @@ rte_dev_event_monitor_stop(void)
 
 	return 0;
 }
+
+int __rte_experimental
+rte_dev_bind_kernel_driver(const char *dev_name,
+			   enum rte_kernel_driver kdrv_type)
+{
+	const char *kdrv_name = NULL;
+	char drv_override_path[1024];
+	int drv_override_fd = -1;
+
+	if (!dev_name || !kdrv_type)
+		return -1;
+
+	switch (kdrv_type) {
+	case RTE_KDRV_IGB_UIO:
+		kdrv_name = "igb_uio";
+		break;
+	case RTE_KDRV_VFIO:
+		kdrv_name = "vfio-pci";
+		break;
+	case RTE_KDRV_UIO_GENERIC:
+		kdrv_name = "uio_pci_generic";
+		break;
+	case RTE_KDRV_NIC_UIO:
+		RTE_LOG(ERR, EAL, "Don't support to bind nic uio driver.\n");
+		goto err;
+	default:
+		break;
+	}
+
+	snprintf(drv_override_path, sizeof(drv_override_path),
+		"/sys/bus/pci/devices/%s/driver_override", dev_name);
+
+	/* specify the driver for a device by writing to driver_override */
+	drv_override_fd = open(drv_override_path, O_WRONLY);
+	if (drv_override_fd < 0) {
+		RTE_LOG(ERR, EAL, "Cannot open %s: %s\n",
+			drv_override_path, strerror(errno));
+		goto err;
+	}
+
+	if (write(drv_override_fd, kdrv_name, sizeof(kdrv_name)) < 0) {
+		RTE_LOG(ERR, EAL,
+			"Error: bind failed - Cannot write "
+			"driver %s to device %s\n", kdrv_name, dev_name);
+		goto err;
+	}
+
+	close(drv_override_fd);
+	return 0;
+err:
+	close(drv_override_fd);
+	return -1;
+}
diff --git a/lib/librte_eal/rte_eal_version.map b/lib/librte_eal/rte_eal_version.map
index d37dd29..0c2ee3f 100644
--- a/lib/librte_eal/rte_eal_version.map
+++ b/lib/librte_eal/rte_eal_version.map
@@ -263,5 +263,6 @@ EXPERIMENTAL {
         rte_dev_event_callback_register;
         rte_dev_event_callback_unregister;
 	rte_dev_handle_hot_unplug;
+	rte_dev_bind_kernel_driver;
 
 } DPDK_18.05;
-- 
2.7.4

^ permalink raw reply	[flat|nested] 28+ messages in thread

* [dpdk-dev] [PATCH V18 5/5] app/testpmd: use auto handle for hotplug
  2018-04-03 18:17 [dpdk-dev] [PATCH V18 0/5] add hot plug failure and auto bind handler Jeff Guo
                   ` (3 preceding siblings ...)
  2018-04-03 18:17 ` [dpdk-dev] [PATCH V18 4/5] eal: add driver auto bind for hot insertion Jeff Guo
@ 2018-04-03 18:17 ` Jeff Guo
  2018-04-06 10:56   ` [dpdk-dev] [PATCH V19 0/4] add hot plug failure and auto bind handler Jeff Guo
  4 siblings, 1 reply; 28+ messages in thread
From: Jeff Guo @ 2018-04-03 18:17 UTC (permalink / raw)
  To: stephen, bruce.richardson, ferruh.yigit, konstantin.ananyev,
	gaetan.rivet, jingjing.wu, thomas, motih, harry.van.haaren,
	jianfeng.tan
  Cc: jblunck, shreyansh.jain, dev, jia.guo, helin.zhang

Use testpmd for example, to show how an application smoothly handle
failure when device be hot removal, and show how to auto bind kernal
driver to preparing attach device when device being hot insertion.

Signed-off-by: Jeff Guo <jia.guo@intel.com>
---
v16->v15:
refine some typo
---
 app/test-pmd/testpmd.c | 178 ++++++++++++++++++++++++++++++++++++++++++-------
 app/test-pmd/testpmd.h |   9 +++
 2 files changed, 164 insertions(+), 23 deletions(-)

diff --git a/app/test-pmd/testpmd.c b/app/test-pmd/testpmd.c
index 2faeb90..791378d 100644
--- a/app/test-pmd/testpmd.c
+++ b/app/test-pmd/testpmd.c
@@ -285,6 +285,8 @@ uint8_t lsc_interrupt = 1; /* enabled by default */
  */
 uint8_t rmv_interrupt = 1; /* enabled by default */
 
+#define HOT_PLUG_FOR_ALL_DEVICE -1
+#define ALL_CALLBACK -1
 uint8_t hot_plug = 0; /**< hotplug disabled by default. */
 
 /*
@@ -387,6 +389,8 @@ uint8_t bitrate_enabled;
 struct gro_status gro_ports[RTE_MAX_ETHPORTS];
 uint8_t gro_flush_cycles = GRO_DEFAULT_FLUSH_CYCLES;
 
+static struct hotplug_request_list hp_list;
+
 /* Forward function declarations */
 static void map_port_queue_stats_mapping_registers(portid_t pi,
 						   struct rte_port *port);
@@ -397,9 +401,12 @@ static int eth_event_callback(portid_t port_id,
 static int eth_dev_event_callback(char *device_name,
 				enum rte_dev_event_type type,
 				void *param);
-static int eth_dev_event_callback_register(void);
-static int eth_dev_event_callback_unregister(void);
+static int eth_dev_event_callback_register(portid_t port_id);
+static int eth_dev_event_callback_unregister(portid_t port_id);
 
+static bool in_hotplug_list(const char *dev_name);
+static int hotplug_list_add(struct rte_device *device,
+				enum rte_kernel_driver device_kdrv);
 
 /*
  * Check if all the ports are started.
@@ -1120,11 +1127,19 @@ run_pkt_fwd_on_lcore(struct fwd_lcore *fc, packet_fwd_t pkt_fwd)
 	uint64_t tics_datum;
 	uint64_t tics_current;
 	uint8_t idx_port, cnt_ports;
+	int ret;
 
 	cnt_ports = rte_eth_dev_count();
 	tics_datum = rte_rdtsc();
 	tics_per_1sec = rte_get_timer_hz();
 #endif
+	if (hot_plug) {
+		ret = rte_dev_handle_hot_unplug();
+		if (ret) {
+			printf("Can not setup handler for hot unplug!\n");
+			return;
+		}
+	}
 	fsm = &fwd_streams[fc->stream_idx];
 	nb_fs = fc->stream_nb;
 	do {
@@ -1863,15 +1878,24 @@ reset_port(portid_t pid)
 }
 
 static int
-eth_dev_event_callback_register(void)
+eth_dev_event_callback_register(portid_t port_id)
 {
-	int diag;
+	int ret;
+	char *device_name;
 
+	/* if port id equal -1, unregister event callbacks for all device. */
+	if (port_id == (portid_t)HOT_PLUG_FOR_ALL_DEVICE) {
+		device_name = NULL;
+	} else {
+		device_name = strdup(rte_eth_devices[port_id].device->name);
+		if (!device_name)
+			return -1;
+	}
 	/* register the device event callback */
-	diag = rte_dev_event_callback_register(NULL,
-		eth_dev_event_callback, NULL);
-	if (diag) {
-		printf("Failed to setup dev_event callback\n");
+	ret = rte_dev_event_callback_register(device_name,
+		eth_dev_event_callback, (void *)(intptr_t)port_id);
+	if (ret) {
+		printf("Failed to register device event callback.\n");
 		return -1;
 	}
 
@@ -1880,15 +1904,25 @@ eth_dev_event_callback_register(void)
 
 
 static int
-eth_dev_event_callback_unregister(void)
+eth_dev_event_callback_unregister(portid_t port_id)
 {
-	int diag;
+	int ret;
+	char *device_name;
+
+	/* if port id equal -1, unregister all device event callbacks */
+	if (port_id == (portid_t)HOT_PLUG_FOR_ALL_DEVICE) {
+		device_name = NULL;
+	} else {
+		device_name = strdup(rte_eth_devices[port_id].device->name);
+		if (!device_name)
+			return -1;
+	}
 
 	/* unregister the device event callback */
-	diag = rte_dev_event_callback_unregister(NULL,
-		eth_dev_event_callback, NULL);
-	if (diag) {
-		printf("Failed to setup dev_event callback\n");
+	ret = rte_dev_event_callback_unregister(device_name,
+		eth_dev_event_callback, (void *)(intptr_t)port_id);
+	if (ret) {
+		printf("Failed to unregister device event callback.\n");
 		return -1;
 	}
 
@@ -1911,6 +1945,8 @@ attach_port(char *identifier)
 	if (rte_eth_dev_attach(identifier, &pi))
 		return;
 
+	eth_dev_event_callback_register(pi);
+
 	socket_id = (unsigned)rte_eth_dev_socket_id(pi);
 	/* if socket_id is invalid, set to 0 */
 	if (check_socket_id(socket_id) < 0)
@@ -1922,6 +1958,12 @@ attach_port(char *identifier)
 
 	ports[pi].port_status = RTE_PORT_STOPPED;
 
+	if (hot_plug) {
+		hotplug_list_add(rte_eth_devices[pi].device,
+				 rte_eth_devices[pi].data->kdrv);
+		eth_dev_event_callback_register(pi);
+	}
+
 	printf("Port %d is attached. Now total ports is %d\n", pi, nb_ports);
 	printf("Done\n");
 }
@@ -1948,6 +1990,12 @@ detach_port(portid_t port_id)
 
 	nb_ports = rte_eth_dev_count();
 
+	if (hot_plug) {
+		hotplug_list_add(rte_eth_devices[port_id].device,
+				 rte_eth_devices[port_id].data->kdrv);
+		eth_dev_event_callback_register(port_id);
+	}
+
 	printf("Port '%s' is detached. Now total ports is %d\n",
 			name, nb_ports);
 	printf("Done\n");
@@ -1979,7 +2027,7 @@ pmd_test_exit(void)
 			RTE_LOG(ERR, EAL,
 				"fail to stop device event monitor.");
 
-		ret = eth_dev_event_callback_unregister();
+		ret = eth_dev_event_callback_unregister(ALL_CALLBACK);
 		if (ret)
 			RTE_LOG(ERR, EAL,
 				"fail to unregister all event callbacks.");
@@ -2068,6 +2116,31 @@ rmv_event_callback(void *arg)
 			dev->device->name);
 }
 
+static void
+rmv_dev_event_callback(uint16_t port_id)
+{
+	RTE_ETH_VALID_PORTID_OR_RET(port_id);
+
+	if (!in_hotplug_list(rte_eth_devices[port_id].device->name))
+		return;
+
+	stop_packet_forwarding();
+
+	stop_port(port_id);
+	close_port(port_id);
+	detach_port(port_id);
+}
+
+static void
+add_dev_event_callback(char *dev_name)
+{
+	if (!in_hotplug_list(dev_name))
+		return;
+
+	RTE_LOG(ERR, EAL, "add device: %s\n", dev_name);
+	attach_port(dev_name);
+}
+
 /* This function is used by the interrupt thread */
 static int
 eth_event_callback(portid_t port_id, enum rte_eth_event_type type, void *param,
@@ -2114,16 +2187,68 @@ eth_event_callback(portid_t port_id, enum rte_eth_event_type type, void *param,
 	return 0;
 }
 
+static bool
+in_hotplug_list(const char *dev_name)
+{
+	struct hotplug_request *hp_request = NULL;
+
+	TAILQ_FOREACH(hp_request, &hp_list, next) {
+		if (!strcmp(hp_request->dev_name, dev_name))
+			break;
+	}
+
+	if (hp_request)
+		return true;
+
+	return false;
+}
+
+static enum rte_kernel_driver
+get_hotplug_driver(const char *dev_name)
+{
+	struct hotplug_request *hp_request = NULL;
+
+	TAILQ_FOREACH(hp_request, &hp_list, next) {
+		if (!strcmp(hp_request->dev_name, dev_name))
+			return hp_request->dev_kdrv;
+	}
+	return -1;
+}
+
+static int
+hotplug_list_add(struct rte_device *device, enum rte_kernel_driver device_kdrv)
+{
+	struct hotplug_request *hp_request;
+
+	hp_request = rte_zmalloc("hoplug request",
+			sizeof(*hp_request), 0);
+	if (hp_request == NULL) {
+		fprintf(stderr, "%s can not alloc memory\n",
+			__func__);
+		return -ENOMEM;
+	}
+
+	hp_request->dev_name = device->name;
+	hp_request->dev_kdrv = device_kdrv;
+
+	TAILQ_INSERT_TAIL(&hp_list, hp_request, next);
+
+	return 0;
+}
+
 /* This function is used by the interrupt thread */
 static int
 eth_dev_event_callback(char *device_name, enum rte_dev_event_type type,
-			     __rte_unused void *arg)
+			     void *arg)
 {
 	int ret = 0;
 	static const char * const event_desc[] = {
 		[RTE_DEV_EVENT_ADD] = "add",
 		[RTE_DEV_EVENT_REMOVE] = "remove",
 	};
+	char *dev_name = malloc(strlen(device_name) + 1);
+
+	strcpy(dev_name, device_name);
 
 	if (type >= RTE_DEV_EVENT_MAX) {
 		fprintf(stderr, "%s called upon invalid event %d\n",
@@ -2139,16 +2264,18 @@ eth_dev_event_callback(char *device_name, enum rte_dev_event_type type,
 	case RTE_DEV_EVENT_REMOVE:
 		RTE_LOG(ERR, EAL, "The device: %s has been removed!\n",
 			device_name);
-		/* TODO: After finish failure handle, begin to stop
-		 * packet forward, stop port, close port, detach port.
-		 */
+		rmv_dev_event_callback((uint16_t)(uintptr_t)arg);
 		break;
 	case RTE_DEV_EVENT_ADD:
 		RTE_LOG(ERR, EAL, "The device: %s has been added!\n",
 			device_name);
-		/* TODO: After finish kernel driver binding,
-		 * begin to attach port.
+		/**
+		 * bind the driver to the device
+		 * before process of hot plug adding device
 		 */
+		rte_dev_bind_kernel_driver(dev_name,
+					   get_hotplug_driver(dev_name));
+		add_dev_event_callback(dev_name);
 		break;
 	default:
 		break;
@@ -2649,8 +2776,13 @@ main(int argc, char** argv)
 			rte_errno = EINVAL;
 			return -1;
 		}
-		eth_dev_event_callback_register();
-
+		if (TAILQ_EMPTY(&hp_list))
+			TAILQ_INIT(&hp_list);
+		RTE_ETH_FOREACH_DEV(port_id) {
+			hotplug_list_add(rte_eth_devices[port_id].device,
+					 rte_eth_devices[port_id].data->kdrv);
+			eth_dev_event_callback_register(port_id);
+		}
 	}
 
 	if (start_port(RTE_PORT_ALL) != 0)
diff --git a/app/test-pmd/testpmd.h b/app/test-pmd/testpmd.h
index 8fde68d..c619e11 100644
--- a/app/test-pmd/testpmd.h
+++ b/app/test-pmd/testpmd.h
@@ -63,6 +63,15 @@ typedef uint16_t streamid_t;
 #define TM_MODE			0
 #endif
 
+struct hotplug_request {
+	TAILQ_ENTRY(hotplug_request) next; /**< Callbacks list */
+	const char *dev_name;            /* request device name */
+	enum rte_kernel_driver dev_kdrv;            /* kernel driver binded */
+};
+
+/** @internal Structure to keep track of registered callbacks */
+TAILQ_HEAD(hotplug_request_list, hotplug_request);
+
 enum {
 	PORT_TOPOLOGY_PAIRED,
 	PORT_TOPOLOGY_CHAINED,
-- 
2.7.4

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [dpdk-dev] [PATCH V18 3/5] eal: add failure handler mechanism for hot plug
  2018-04-03 18:17 ` [dpdk-dev] [PATCH V18 3/5] eal: add failure handler mechanism for hot plug Jeff Guo
@ 2018-04-04  2:58   ` Zhang, Qi Z
  2018-04-06 10:53     ` Guo, Jia
  0 siblings, 1 reply; 28+ messages in thread
From: Zhang, Qi Z @ 2018-04-04  2:58 UTC (permalink / raw)
  To: Guo, Jia, stephen, Richardson, Bruce, Yigit, Ferruh, Ananyev,
	Konstantin, gaetan.rivet, Wu, Jingjing, thomas, motih,
	Van Haaren, Harry, Tan, Jianfeng
  Cc: jblunck, shreyansh.jain, dev, Guo, Jia, Zhang, Helin



> -----Original Message-----
> From: dev [mailto:dev-bounces@dpdk.org] On Behalf Of Jeff Guo
> Sent: Wednesday, April 4, 2018 2:17 AM
> To: stephen@networkplumber.org; Richardson, Bruce
> <bruce.richardson@intel.com>; Yigit, Ferruh <ferruh.yigit@intel.com>;
> Ananyev, Konstantin <konstantin.ananyev@intel.com>;
> gaetan.rivet@6wind.com; Wu, Jingjing <jingjing.wu@intel.com>;
> thomas@monjalon.net; motih@mellanox.com; Van Haaren, Harry
> <harry.van.haaren@intel.com>; Tan, Jianfeng <jianfeng.tan@intel.com>
> Cc: jblunck@infradead.org; shreyansh.jain@nxp.com; dev@dpdk.org; Guo, Jia
> <jia.guo@intel.com>; Zhang, Helin <helin.zhang@intel.com>
> Subject: [dpdk-dev] [PATCH V18 3/5] eal: add failure handler mechanism for
> hot plug
> 
> This patch introduces an API (rte_dev_handle_hot_unplug) to handle device
> hot unplug event. When device be hot plug out, the device resource become
> invalid, if this resource is still be unexpected read/write, system will crash. The
> api let user register the hot unplug handler, when hot plug failure occur, the
> working thread will be block until the uevent mechanism successful recovery
> the memory and guaranty the application keep running smoothly.
> 
> Signed-off-by: Jeff Guo <jia.guo@intel.com>
> ---
> v16->v15:
> add document and signal bus handler
> ---
>  doc/guides/rel_notes/release_18_05.rst    |   6 ++
>  lib/librte_eal/common/include/rte_dev.h   |  19 +++++
>  lib/librte_eal/linuxapp/eal/eal_dev.c     | 134
> +++++++++++++++++++++++++++++-
>  lib/librte_eal/linuxapp/igb_uio/igb_uio.c |   4 +
>  lib/librte_eal/rte_eal_version.map        |   1 +
>  5 files changed, 163 insertions(+), 1 deletion(-)
> 
> diff --git a/doc/guides/rel_notes/release_18_05.rst
> b/doc/guides/rel_notes/release_18_05.rst
> index 37e00c4..3aacbf1 100644
> --- a/doc/guides/rel_notes/release_18_05.rst
> +++ b/doc/guides/rel_notes/release_18_05.rst
> @@ -51,6 +51,12 @@ New Features
>    * ``rte_dev_event_callback_register`` and
> ``rte_dev_event_callback_unregister``
>      are for the user's callbacks register and unregister.
> 
> +* **Added hot plug failure handler.**
> +
> +  Added a failure handler machenism to handle hot plug removal.
> +
> +  * ``rte_dev_handle_hot_unplug`` for handle hot plug removel failure.
> +
>  API Changes
>  -----------
> 
> diff --git a/lib/librte_eal/common/include/rte_dev.h
> b/lib/librte_eal/common/include/rte_dev.h
> index 4c78938..7075e56 100644
> --- a/lib/librte_eal/common/include/rte_dev.h
> +++ b/lib/librte_eal/common/include/rte_dev.h
> @@ -361,4 +361,23 @@ rte_dev_event_monitor_start(void);
>   */
>  int __rte_experimental
>  rte_dev_event_monitor_stop(void);
> +
> +/**
> + * @warning
> + * @b EXPERIMENTAL: this API may change without prior notice
> + *
> + * It can be used to register the device signal bus handler, and save
> +the
> + * current environment of each thread, when signal bus error invoke,
> +the
> + * handler would restore the environment by long jmp to each working
> + * thread, then block the thread  to waiting until the memory recovery
> + * and remapping be finished, that would guaranty the system not crash
> + * when the device be hot unplug.
> + *
> + * @param none
> + * @return
> + *   - On success, zero.
> + *   - On failure, a negative value.
> + */
> +int __rte_experimental
> +rte_dev_handle_hot_unplug(void);
>  #endif /* _RTE_DEV_H_ */
> diff --git a/lib/librte_eal/linuxapp/eal/eal_dev.c
> b/lib/librte_eal/linuxapp/eal/eal_dev.c
> index 9f2ee40..fabb37a 100644
> --- a/lib/librte_eal/linuxapp/eal/eal_dev.c
> +++ b/lib/librte_eal/linuxapp/eal/eal_dev.c
> @@ -4,6 +4,9 @@
> 
>  #include <string.h>
>  #include <unistd.h>
> +#include <signal.h>
> +#include <setjmp.h>
> +#include <pthread.h>
>  #include <sys/socket.h>
>  #include <linux/netlink.h>
> 
> @@ -12,12 +15,17 @@
>  #include <rte_dev.h>
>  #include <rte_malloc.h>
>  #include <rte_interrupts.h>
> +#include <rte_bus.h>
> +#include <rte_per_lcore.h>
> 
>  #include "eal_private.h"
> 
>  static struct rte_intr_handle intr_handle = {.fd = -1 };  static bool
> monitor_started;
> 
> +pthread_mutex_t failure_recovery_lock;
> +pthread_cond_t failure_recovery_cond;
> +
>  #define EAL_UEV_MSG_LEN 4096
>  #define EAL_UEV_MSG_ELEM_LEN 128
> 
> @@ -29,6 +37,22 @@ enum eal_dev_event_subsystem {
>  	EAL_DEV_EVENT_SUBSYSTEM_MAX
>  };
> 
> +static RTE_DEFINE_PER_LCORE(sigjmp_buf, unplug_longjmp_env);
> +
> +static void sigbus_handler(int signum __rte_unused) {
> +	RTE_LOG(DEBUG, EAL, "receive SIGBUS error!\n");
> +	siglongjmp(RTE_PER_LCORE(unplug_longjmp_env), 1); }
> +
> +static int cmp_dev_name(const struct rte_device *dev,
> +	const void *_name)
> +{
> +	const char *name = _name;
> +
> +	return strcmp(dev->name, name);
> +}
> +
>  static int
>  dev_uev_socket_fd_create(void)
>  {
> @@ -135,16 +159,114 @@ dev_uev_receive(int fd, struct rte_dev_event
> *uevent)
>  	return 0;
>  }
> 
> +static int
> +dev_uev_remove_handler(struct rte_device *dev) {
> +	struct rte_bus *bus = rte_bus_find_by_device_name(dev->name);
> +	int ret;
> +
> +	if (!dev)
> +		return -1;
> +
> +	if (bus->handle_hot_unplug) {
> +		/**
> +	 	 * call bus ops to handle hot unplug.
> +	 	 */
> +		ret = bus->handle_hot_unplug(dev);
> +		if (ret) {
> +			RTE_LOG(ERR, EAL,
> +				"It cannot handle hot unplug for device (%s) "
> +				"on the bus.\n ",
> +				dev->name);
> +			return ret;
> +		}
> +	}
> +	return 0;
> +}
> +
>  static void
>  dev_uev_process(__rte_unused void *param)  {
>  	struct rte_dev_event uevent;
> +	struct rte_bus *bus;
> +	struct rte_device *dev;
> +	const char *busname;
> +	int ret = 0;
> 
>  	if (dev_uev_receive(intr_handle.fd, &uevent))
>  		return;
> 
> -	if (uevent.devname)
> +	switch (uevent.subsystem) {
> +	case EAL_DEV_EVENT_SUBSYSTEM_PCI:
> +	case EAL_DEV_EVENT_SUBSYSTEM_UIO:
> +		busname = "pci";
> +		break;
> +	default:
> +		break;
> +	}
> +
> +	if (uevent.devname) {
> +		if (uevent.type == RTE_DEV_EVENT_REMOVE) {
> +			bus = rte_bus_find_by_name(busname);
> +			if (bus == NULL) {
> +				RTE_LOG(ERR, EAL, "Cannot find bus (%s)\n",
> +					uevent.devname);
> +				return;
> +			}
> +			dev = bus->find_device(NULL, cmp_dev_name,
> +					       uevent.devname);
> +			if (dev == NULL) {
> +				RTE_LOG(ERR, EAL,
> +					"Cannot find unplugged device (%s)\n",
> +					uevent.devname);
> +				return;
> +			}
> +			ret = dev_uev_remove_handler(dev);
> +			if (ret) {
> +				RTE_LOG(ERR, EAL, "Driver cannot remap the "
> +					"device (%s)\n",
> +					dev->name);
> +				return;
> +			}
> +			/* wake up all the threads */
> +			pthread_cond_broadcast(&failure_recovery_cond);
> +		}
> +
>  		dev_callback_process(uevent.devname, uevent.type);
> +	}
> +}
> +
> +int __rte_experimental
> +rte_dev_handle_hot_unplug(void)
> +{
> +	struct sigaction act;
> +	sigset_t mask;
> +
> +	/* set signal handlers */
> +	memset(&act, 0x00, sizeof(struct sigaction));
> +	act.sa_handler = sigbus_handler;
> +	sigemptyset(&act.sa_mask);
> +	act.sa_flags = SA_RESTART;
> +	sigaction(SIGBUS, &act, NULL);
> +	sigemptyset(&mask);
> +	sigaddset(&mask, SIGBUS);
> +	pthread_sigmask(SIG_UNBLOCK, &mask, NULL);
> +
> +	if (sigsetjmp(RTE_PER_LCORE(unplug_longjmp_env), 1)) {
> +		/*
> +		 * wait on condition variable
> +		 * before failure recovery be finish.
> +		 */
> +		pthread_mutex_lock(&failure_recovery_lock);
> +		RTE_LOG(DEBUG, EAL, "begin waiting for the failure handler.\n");
> +		pthread_cond_wait(&failure_recovery_cond,
> +				  &failure_recovery_lock);
> +		RTE_LOG(DEBUG, EAL,
> +			"come back from waiting for failure handler.\n");
> +		pthread_mutex_unlock(&failure_recovery_lock);
> +	}
> +
> +	return 0;

Should we return the value of sigsetjmp, so application can know "rewind" happened and can do some clean up?

>  }
> 
>  int __rte_experimental
> @@ -169,6 +291,12 @@ rte_dev_event_monitor_start(void)
>  		return -1;
>  	}
> 
> +	/* initialize mutex and condition variable
> +	 * to control failure recovery.
> +	 */
> +	pthread_mutex_init(&failure_recovery_lock, NULL);
> +	pthread_cond_init(&failure_recovery_cond, NULL);
> +
>  	monitor_started = true;
> 
>  	return 0;
> @@ -192,5 +320,9 @@ rte_dev_event_monitor_stop(void)
>  	close(intr_handle.fd);
>  	intr_handle.fd = -1;
>  	monitor_started = false;
> +
> +	pthread_cond_destroy(&failure_recovery_cond);
> +	pthread_mutex_destroy(&failure_recovery_lock);
> +
>  	return 0;
>  }
> diff --git a/lib/librte_eal/linuxapp/igb_uio/igb_uio.c
> b/lib/librte_eal/linuxapp/igb_uio/igb_uio.c
> index 4cae4dd..293c310 100644
> --- a/lib/librte_eal/linuxapp/igb_uio/igb_uio.c
> +++ b/lib/librte_eal/linuxapp/igb_uio/igb_uio.c
> @@ -344,6 +344,10 @@ igbuio_pci_release(struct uio_info *info, struct inode
> *inode)
>  	struct rte_uio_pci_dev *udev = info->priv;
>  	struct pci_dev *dev = udev->pdev;
> 
> +	/* check if device has been remove before release */
> +	if ((&dev->dev.kobj)->state_remove_uevent_sent == 1)
> +		return -1;
> +
>  	mutex_lock(&udev->lock);
>  	if (--udev->refcnt > 0) {
>  		mutex_unlock(&udev->lock);
> diff --git a/lib/librte_eal/rte_eal_version.map
> b/lib/librte_eal/rte_eal_version.map
> index d23f491..d37dd29 100644
> --- a/lib/librte_eal/rte_eal_version.map
> +++ b/lib/librte_eal/rte_eal_version.map
> @@ -262,5 +262,6 @@ EXPERIMENTAL {
> 
>          rte_dev_event_callback_register;
>          rte_dev_event_callback_unregister;
> +	rte_dev_handle_hot_unplug;
> 
>  } DPDK_18.05;
> --
> 2.7.4

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [dpdk-dev] [PATCH V18 1/5] bus: introduce device hot unplug handle
  2018-04-03 18:17 ` [dpdk-dev] [PATCH V18 1/5] bus: introduce device hot unplug handle Jeff Guo
@ 2018-04-04  4:31   ` Tan, Jianfeng
  2018-04-06 10:54     ` Guo, Jia
  0 siblings, 1 reply; 28+ messages in thread
From: Tan, Jianfeng @ 2018-04-04  4:31 UTC (permalink / raw)
  To: Guo, Jia, stephen, Richardson, Bruce, Yigit, Ferruh, Ananyev,
	Konstantin, gaetan.rivet, Wu, Jingjing, thomas, motih,
	Van Haaren, Harry
  Cc: jblunck, shreyansh.jain, dev, Zhang, Helin



> -----Original Message-----
> From: Guo, Jia
> Sent: Wednesday, April 4, 2018 2:17 AM
> To: stephen@networkplumber.org; Richardson, Bruce; Yigit, Ferruh;
> Ananyev, Konstantin; gaetan.rivet@6wind.com; Wu, Jingjing;
> thomas@monjalon.net; motih@mellanox.com; Van Haaren, Harry; Tan,
> Jianfeng
> Cc: jblunck@infradead.org; shreyansh.jain@nxp.com; dev@dpdk.org; Guo,
> Jia; Zhang, Helin
> Subject: [PATCH V18 1/5] bus: introduce device hot unplug handle
> 
> As of device hot unplug, we need some preparatory measures so that we will
> not encounter memory fault after device have been plug out of the system,
> and also let we could recover the running data path but not been break.
> This allows the buses to handle device hot unplug event.
> In the following patch, will show how to handle the case for pci bus.

Squeeze this patch with the next one.

> 
> Signed-off-by: Jeff Guo <jia.guo@intel.com>
> ---
> v16->v15:
> split patch, and remove the ops from RTE_VERIFY
> ---
>  lib/librte_eal/common/include/rte_bus.h | 15 +++++++++++++++
>  1 file changed, 15 insertions(+)
> 
> diff --git a/lib/librte_eal/common/include/rte_bus.h
> b/lib/librte_eal/common/include/rte_bus.h
> index 6fb0834..ecd8b1c 100644
> --- a/lib/librte_eal/common/include/rte_bus.h
> +++ b/lib/librte_eal/common/include/rte_bus.h
> @@ -168,6 +168,19 @@ typedef int (*rte_bus_unplug_t)(struct rte_device
> *dev);
>  typedef int (*rte_bus_parse_t)(const char *name, void *addr);
> 
>  /**
> + * Implementation specific hot unplug handler function which is responsible
> + * for handle the failure when hot unplug the device, guaranty the system
> + * would not crash in the case.
> + * @param dev
> + *	Pointer of the device structure.
> + *
> + * @return
> + *	0 on success.
> + *	!0 on error.
> + */
> +typedef int (*rte_bus_handle_hot_unplug_t)(struct rte_device *dev);
> +
> +/**
>   * Bus scan policies
>   */
>  enum rte_bus_scan_mode {
> @@ -209,6 +222,8 @@ struct rte_bus {
>  	rte_bus_plug_t plug;         /**< Probe single device for drivers */
>  	rte_bus_unplug_t unplug;     /**< Remove single device from driver
> */
>  	rte_bus_parse_t parse;       /**< Parse a device name */
> +	rte_bus_handle_hot_unplug_t handle_hot_unplug; /**< handle when device
> +							hot unplug */

May be just rte_bus_hot_unplug_t hot_unplug /**< Handle hot unplug device event */

>  	struct rte_bus_conf conf;    /**< Bus configuration */
>  	rte_bus_get_iommu_class_t get_iommu_class; /**< Get iommu
> class */
>  };
> --
> 2.7.4

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [dpdk-dev] [PATCH V18 2/5] bus/pci: implement handle hot unplug operation
  2018-04-03 18:17 ` [dpdk-dev] [PATCH V18 2/5] bus/pci: implement handle hot unplug operation Jeff Guo
@ 2018-04-04  5:25   ` Tan, Jianfeng
  2018-04-06 10:57     ` Guo, Jia
  0 siblings, 1 reply; 28+ messages in thread
From: Tan, Jianfeng @ 2018-04-04  5:25 UTC (permalink / raw)
  To: Guo, Jia, stephen, Richardson, Bruce, Yigit, Ferruh, Ananyev,
	Konstantin, gaetan.rivet, Wu, Jingjing, thomas, motih,
	Van Haaren, Harry
  Cc: jblunck, shreyansh.jain, dev, Zhang, Helin



> -----Original Message-----
> From: Guo, Jia
> Sent: Wednesday, April 4, 2018 2:17 AM
> To: stephen@networkplumber.org; Richardson, Bruce; Yigit, Ferruh;
> Ananyev, Konstantin; gaetan.rivet@6wind.com; Wu, Jingjing;
> thomas@monjalon.net; motih@mellanox.com; Van Haaren, Harry; Tan,
> Jianfeng
> Cc: jblunck@infradead.org; shreyansh.jain@nxp.com; dev@dpdk.org; Guo,
> Jia; Zhang, Helin
> Subject: [PATCH V18 2/5] bus/pci: implement handle hot unplug operation
> 
> When handle device hot unplug event, remap a dummy memory to avoid
> bus read/write error.
> 
> Signed-off-by: Jeff Guo <jia.guo@intel.com>
> ---
> v16->v15;
> split patch, merge some function to be simple
> ---
>  drivers/bus/pci/pci_common.c     | 42
> ++++++++++++++++++++++++++++++++++++++++
>  drivers/bus/pci/pci_common_uio.c | 33
> +++++++++++++++++++++++++++++++
>  drivers/bus/pci/private.h        | 12 ++++++++++++
>  3 files changed, 87 insertions(+)
> 
> diff --git a/drivers/bus/pci/pci_common.c b/drivers/bus/pci/pci_common.c
> index 2a00f36..fa077ec 100644
> --- a/drivers/bus/pci/pci_common.c
> +++ b/drivers/bus/pci/pci_common.c
> @@ -474,6 +474,47 @@ pci_find_device(const struct rte_device *start,
> rte_dev_cmp_t cmp,
>  }
> 
>  static int
> +pci_handle_hot_unplug(struct rte_device *dev)
> +{
> +	struct rte_pci_device *pdev;
> +	int ret;
> +
> +	if (dev == NULL)
> +		return -EINVAL;
> +
> +	pdev = RTE_DEV_TO_PCI(dev);
> +
> +	/* remap resources for devices */
> +	switch (pdev->kdrv) {
> +	case RTE_KDRV_VFIO:
> +#ifdef VFIO_PRESENT
> +		/* TODO */
> +#endif

What's the difference between uio and vfio? We can just fall though?

> +		break;
> +	case RTE_KDRV_IGB_UIO:
> +	case RTE_KDRV_UIO_GENERIC:
> +		if (rte_eal_using_phys_addrs()) {

Why do we care about if we are using physical addresses?

> +			/* map resources for devices that use uio */
> +			ret = pci_uio_remap_resource(pdev);
> +		}
> +		break;
> +	case RTE_KDRV_NIC_UIO:
> +		ret = pci_uio_remap_resource(pdev);
> +		break;
> +	default:
> +		RTE_LOG(DEBUG, EAL,
> +			"  Not managed by a supported kernel driver, skipped\n");
> +		ret = 1;

-1 for such case?

> +		break;
> +	}
> +
> +	if (ret != 0)
> +		RTE_LOG(ERR, EAL, "failed to handle hot unplug of %s",
> +			pdev->name);
> +	return ret;
> +}
> +
> +static int
>  pci_plug(struct rte_device *dev)
>  {
>  	return pci_probe_all_drivers(RTE_DEV_TO_PCI(dev));
> @@ -503,6 +544,7 @@ struct rte_pci_bus rte_pci_bus = {
>  		.unplug = pci_unplug,
>  		.parse = pci_parse,
>  		.get_iommu_class = rte_pci_get_iommu_class,
> +		.handle_hot_unplug = pci_handle_hot_unplug,
>  	},
>  	.device_list = TAILQ_HEAD_INITIALIZER(rte_pci_bus.device_list),
>  	.driver_list = TAILQ_HEAD_INITIALIZER(rte_pci_bus.driver_list),
> diff --git a/drivers/bus/pci/pci_common_uio.c
> b/drivers/bus/pci/pci_common_uio.c
> index 54bc20b..468ade4 100644
> --- a/drivers/bus/pci/pci_common_uio.c
> +++ b/drivers/bus/pci/pci_common_uio.c
> @@ -146,6 +146,39 @@ pci_uio_unmap(struct mapped_pci_resource
> *uio_res)
>  	}
>  }
> 
> +/* remap the PCI resource of a PCI device in private virtual memory */
> +int
> +pci_uio_remap_resource(struct rte_pci_device *dev)

Why's this function uio specific? I think we can move it to pci_common.c.

> +{
> +	int i;
> +	uint64_t phaddr;
> +	void *map_address;
> +
> +	if (dev == NULL)
> +		return -1;
> +
> +	/* Map all BARs */

s/Map/Remap
 
> +	for (i = 0; i != PCI_MAX_RESOURCE; i++) {
> +		/* skip empty BAR */
> +		phaddr = dev->mem_resource[i].phys_addr;
> +		if (phaddr == 0)
> +			continue;

How about just simple:

if (dev->mem_resource[i].phys_addr == 0)

> +		pci_unmap_resource(dev->mem_resource[i].addr,
> +				(size_t)dev->mem_resource[i].len);
> +		map_address = pci_map_resource(
> +				dev->mem_resource[i].addr, -1, 0,
> +				(size_t)dev->mem_resource[i].len,
> +				MAP_ANONYMOUS | MAP_FIXED);
> +		if (map_address == MAP_FAILED) {
> +			RTE_LOG(ERR, EAL,
> +				"Cannot remap resource for device %s\n",
> dev->name);
> +			return -1;
> +		}
> +	}
> +
> +	return 0;
> +}
> +
>  static struct mapped_pci_resource *
>  pci_uio_find_resource(struct rte_pci_device *dev)
>  {
> diff --git a/drivers/bus/pci/private.h b/drivers/bus/pci/private.h
> index 88fa587..7a862ef 100644
> --- a/drivers/bus/pci/private.h
> +++ b/drivers/bus/pci/private.h
> @@ -173,6 +173,18 @@ void pci_uio_free_resource(struct rte_pci_device
> *dev,
>  		struct mapped_pci_resource *uio_res);
> 
>  /**
> + * remap the pci uio resource..
> + *
> + * @param dev
> + *   Point to the struct rte pci device.
> + * @return
> + *   - On success, zero.
> + *   - On failure, a negative value.
> + */
> +int
> +pci_uio_remap_resource(struct rte_pci_device *dev);

If we define 

> +
> +/**
>   * Map device memory to uio resource
>   *
>   * This function is private to EAL.
> --
> 2.7.4

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [dpdk-dev] [PATCH V18 3/5] eal: add failure handler mechanism for hot plug
  2018-04-04  2:58   ` Zhang, Qi Z
@ 2018-04-06 10:53     ` Guo, Jia
  0 siblings, 0 replies; 28+ messages in thread
From: Guo, Jia @ 2018-04-06 10:53 UTC (permalink / raw)
  To: Zhang, Qi Z, stephen, Richardson, Bruce, Yigit, Ferruh, Ananyev,
	Konstantin, gaetan.rivet, Wu, Jingjing, thomas, motih,
	Van Haaren, Harry, Tan, Jianfeng
  Cc: jblunck, shreyansh.jain, dev, Zhang, Helin



On 4/4/2018 10:58 AM, Zhang, Qi Z wrote:
>
>> -----Original Message-----
>> From: dev [mailto:dev-bounces@dpdk.org] On Behalf Of Jeff Guo
>> Sent: Wednesday, April 4, 2018 2:17 AM
>> To: stephen@networkplumber.org; Richardson, Bruce
>> <bruce.richardson@intel.com>; Yigit, Ferruh <ferruh.yigit@intel.com>;
>> Ananyev, Konstantin <konstantin.ananyev@intel.com>;
>> gaetan.rivet@6wind.com; Wu, Jingjing <jingjing.wu@intel.com>;
>> thomas@monjalon.net; motih@mellanox.com; Van Haaren, Harry
>> <harry.van.haaren@intel.com>; Tan, Jianfeng <jianfeng.tan@intel.com>
>> Cc: jblunck@infradead.org; shreyansh.jain@nxp.com; dev@dpdk.org; Guo, Jia
>> <jia.guo@intel.com>; Zhang, Helin <helin.zhang@intel.com>
>> Subject: [dpdk-dev] [PATCH V18 3/5] eal: add failure handler mechanism for
>> hot plug
>>
>> This patch introduces an API (rte_dev_handle_hot_unplug) to handle device
>> hot unplug event. When device be hot plug out, the device resource become
>> invalid, if this resource is still be unexpected read/write, system will crash. The
>> api let user register the hot unplug handler, when hot plug failure occur, the
>> working thread will be block until the uevent mechanism successful recovery
>> the memory and guaranty the application keep running smoothly.
>>
>> Signed-off-by: Jeff Guo <jia.guo@intel.com>
>> ---
>> v16->v15:
>> add document and signal bus handler
>> ---
>>   doc/guides/rel_notes/release_18_05.rst    |   6 ++
>>   lib/librte_eal/common/include/rte_dev.h   |  19 +++++
>>   lib/librte_eal/linuxapp/eal/eal_dev.c     | 134
>> +++++++++++++++++++++++++++++-
>>   lib/librte_eal/linuxapp/igb_uio/igb_uio.c |   4 +
>>   lib/librte_eal/rte_eal_version.map        |   1 +
>>   5 files changed, 163 insertions(+), 1 deletion(-)
>>
>> diff --git a/doc/guides/rel_notes/release_18_05.rst
>> b/doc/guides/rel_notes/release_18_05.rst
>> index 37e00c4..3aacbf1 100644
>> --- a/doc/guides/rel_notes/release_18_05.rst
>> +++ b/doc/guides/rel_notes/release_18_05.rst
>> @@ -51,6 +51,12 @@ New Features
>>     * ``rte_dev_event_callback_register`` and
>> ``rte_dev_event_callback_unregister``
>>       are for the user's callbacks register and unregister.
>>
>> +* **Added hot plug failure handler.**
>> +
>> +  Added a failure handler machenism to handle hot plug removal.
>> +
>> +  * ``rte_dev_handle_hot_unplug`` for handle hot plug removel failure.
>> +
>>   API Changes
>>   -----------
>>
>> diff --git a/lib/librte_eal/common/include/rte_dev.h
>> b/lib/librte_eal/common/include/rte_dev.h
>> index 4c78938..7075e56 100644
>> --- a/lib/librte_eal/common/include/rte_dev.h
>> +++ b/lib/librte_eal/common/include/rte_dev.h
>> @@ -361,4 +361,23 @@ rte_dev_event_monitor_start(void);
>>    */
>>   int __rte_experimental
>>   rte_dev_event_monitor_stop(void);
>> +
>> +/**
>> + * @warning
>> + * @b EXPERIMENTAL: this API may change without prior notice
>> + *
>> + * It can be used to register the device signal bus handler, and save
>> +the
>> + * current environment of each thread, when signal bus error invoke,
>> +the
>> + * handler would restore the environment by long jmp to each working
>> + * thread, then block the thread  to waiting until the memory recovery
>> + * and remapping be finished, that would guaranty the system not crash
>> + * when the device be hot unplug.
>> + *
>> + * @param none
>> + * @return
>> + *   - On success, zero.
>> + *   - On failure, a negative value.
>> + */
>> +int __rte_experimental
>> +rte_dev_handle_hot_unplug(void);
>>   #endif /* _RTE_DEV_H_ */
>> diff --git a/lib/librte_eal/linuxapp/eal/eal_dev.c
>> b/lib/librte_eal/linuxapp/eal/eal_dev.c
>> index 9f2ee40..fabb37a 100644
>> --- a/lib/librte_eal/linuxapp/eal/eal_dev.c
>> +++ b/lib/librte_eal/linuxapp/eal/eal_dev.c
>> @@ -4,6 +4,9 @@
>>
>>   #include <string.h>
>>   #include <unistd.h>
>> +#include <signal.h>
>> +#include <setjmp.h>
>> +#include <pthread.h>
>>   #include <sys/socket.h>
>>   #include <linux/netlink.h>
>>
>> @@ -12,12 +15,17 @@
>>   #include <rte_dev.h>
>>   #include <rte_malloc.h>
>>   #include <rte_interrupts.h>
>> +#include <rte_bus.h>
>> +#include <rte_per_lcore.h>
>>
>>   #include "eal_private.h"
>>
>>   static struct rte_intr_handle intr_handle = {.fd = -1 };  static bool
>> monitor_started;
>>
>> +pthread_mutex_t failure_recovery_lock;
>> +pthread_cond_t failure_recovery_cond;
>> +
>>   #define EAL_UEV_MSG_LEN 4096
>>   #define EAL_UEV_MSG_ELEM_LEN 128
>>
>> @@ -29,6 +37,22 @@ enum eal_dev_event_subsystem {
>>   	EAL_DEV_EVENT_SUBSYSTEM_MAX
>>   };
>>
>> +static RTE_DEFINE_PER_LCORE(sigjmp_buf, unplug_longjmp_env);
>> +
>> +static void sigbus_handler(int signum __rte_unused) {
>> +	RTE_LOG(DEBUG, EAL, "receive SIGBUS error!\n");
>> +	siglongjmp(RTE_PER_LCORE(unplug_longjmp_env), 1); }
>> +
>> +static int cmp_dev_name(const struct rte_device *dev,
>> +	const void *_name)
>> +{
>> +	const char *name = _name;
>> +
>> +	return strcmp(dev->name, name);
>> +}
>> +
>>   static int
>>   dev_uev_socket_fd_create(void)
>>   {
>> @@ -135,16 +159,114 @@ dev_uev_receive(int fd, struct rte_dev_event
>> *uevent)
>>   	return 0;
>>   }
>>
>> +static int
>> +dev_uev_remove_handler(struct rte_device *dev) {
>> +	struct rte_bus *bus = rte_bus_find_by_device_name(dev->name);
>> +	int ret;
>> +
>> +	if (!dev)
>> +		return -1;
>> +
>> +	if (bus->handle_hot_unplug) {
>> +		/**
>> +	 	 * call bus ops to handle hot unplug.
>> +	 	 */
>> +		ret = bus->handle_hot_unplug(dev);
>> +		if (ret) {
>> +			RTE_LOG(ERR, EAL,
>> +				"It cannot handle hot unplug for device (%s) "
>> +				"on the bus.\n ",
>> +				dev->name);
>> +			return ret;
>> +		}
>> +	}
>> +	return 0;
>> +}
>> +
>>   static void
>>   dev_uev_process(__rte_unused void *param)  {
>>   	struct rte_dev_event uevent;
>> +	struct rte_bus *bus;
>> +	struct rte_device *dev;
>> +	const char *busname;
>> +	int ret = 0;
>>
>>   	if (dev_uev_receive(intr_handle.fd, &uevent))
>>   		return;
>>
>> -	if (uevent.devname)
>> +	switch (uevent.subsystem) {
>> +	case EAL_DEV_EVENT_SUBSYSTEM_PCI:
>> +	case EAL_DEV_EVENT_SUBSYSTEM_UIO:
>> +		busname = "pci";
>> +		break;
>> +	default:
>> +		break;
>> +	}
>> +
>> +	if (uevent.devname) {
>> +		if (uevent.type == RTE_DEV_EVENT_REMOVE) {
>> +			bus = rte_bus_find_by_name(busname);
>> +			if (bus == NULL) {
>> +				RTE_LOG(ERR, EAL, "Cannot find bus (%s)\n",
>> +					uevent.devname);
>> +				return;
>> +			}
>> +			dev = bus->find_device(NULL, cmp_dev_name,
>> +					       uevent.devname);
>> +			if (dev == NULL) {
>> +				RTE_LOG(ERR, EAL,
>> +					"Cannot find unplugged device (%s)\n",
>> +					uevent.devname);
>> +				return;
>> +			}
>> +			ret = dev_uev_remove_handler(dev);
>> +			if (ret) {
>> +				RTE_LOG(ERR, EAL, "Driver cannot remap the "
>> +					"device (%s)\n",
>> +					dev->name);
>> +				return;
>> +			}
>> +			/* wake up all the threads */
>> +			pthread_cond_broadcast(&failure_recovery_cond);
>> +		}
>> +
>>   		dev_callback_process(uevent.devname, uevent.type);
>> +	}
>> +}
>> +
>> +int __rte_experimental
>> +rte_dev_handle_hot_unplug(void)
>> +{
>> +	struct sigaction act;
>> +	sigset_t mask;
>> +
>> +	/* set signal handlers */
>> +	memset(&act, 0x00, sizeof(struct sigaction));
>> +	act.sa_handler = sigbus_handler;
>> +	sigemptyset(&act.sa_mask);
>> +	act.sa_flags = SA_RESTART;
>> +	sigaction(SIGBUS, &act, NULL);
>> +	sigemptyset(&mask);
>> +	sigaddset(&mask, SIGBUS);
>> +	pthread_sigmask(SIG_UNBLOCK, &mask, NULL);
>> +
>> +	if (sigsetjmp(RTE_PER_LCORE(unplug_longjmp_env), 1)) {
>> +		/*
>> +		 * wait on condition variable
>> +		 * before failure recovery be finish.
>> +		 */
>> +		pthread_mutex_lock(&failure_recovery_lock);
>> +		RTE_LOG(DEBUG, EAL, "begin waiting for the failure handler.\n");
>> +		pthread_cond_wait(&failure_recovery_cond,
>> +				  &failure_recovery_lock);
>> +		RTE_LOG(DEBUG, EAL,
>> +			"come back from waiting for failure handler.\n");
>> +		pthread_mutex_unlock(&failure_recovery_lock);
>> +	}
>> +
>> +	return 0;
> Should we return the value of sigsetjmp, so application can know "rewind" happened and can do some clean up?
good idea. please check next version.
>>   }
>>
>>   int __rte_experimental
>> @@ -169,6 +291,12 @@ rte_dev_event_monitor_start(void)
>>   		return -1;
>>   	}
>>
>> +	/* initialize mutex and condition variable
>> +	 * to control failure recovery.
>> +	 */
>> +	pthread_mutex_init(&failure_recovery_lock, NULL);
>> +	pthread_cond_init(&failure_recovery_cond, NULL);
>> +
>>   	monitor_started = true;
>>
>>   	return 0;
>> @@ -192,5 +320,9 @@ rte_dev_event_monitor_stop(void)
>>   	close(intr_handle.fd);
>>   	intr_handle.fd = -1;
>>   	monitor_started = false;
>> +
>> +	pthread_cond_destroy(&failure_recovery_cond);
>> +	pthread_mutex_destroy(&failure_recovery_lock);
>> +
>>   	return 0;
>>   }
>> diff --git a/lib/librte_eal/linuxapp/igb_uio/igb_uio.c
>> b/lib/librte_eal/linuxapp/igb_uio/igb_uio.c
>> index 4cae4dd..293c310 100644
>> --- a/lib/librte_eal/linuxapp/igb_uio/igb_uio.c
>> +++ b/lib/librte_eal/linuxapp/igb_uio/igb_uio.c
>> @@ -344,6 +344,10 @@ igbuio_pci_release(struct uio_info *info, struct inode
>> *inode)
>>   	struct rte_uio_pci_dev *udev = info->priv;
>>   	struct pci_dev *dev = udev->pdev;
>>
>> +	/* check if device has been remove before release */
>> +	if ((&dev->dev.kobj)->state_remove_uevent_sent == 1)
>> +		return -1;
>> +
>>   	mutex_lock(&udev->lock);
>>   	if (--udev->refcnt > 0) {
>>   		mutex_unlock(&udev->lock);
>> diff --git a/lib/librte_eal/rte_eal_version.map
>> b/lib/librte_eal/rte_eal_version.map
>> index d23f491..d37dd29 100644
>> --- a/lib/librte_eal/rte_eal_version.map
>> +++ b/lib/librte_eal/rte_eal_version.map
>> @@ -262,5 +262,6 @@ EXPERIMENTAL {
>>
>>           rte_dev_event_callback_register;
>>           rte_dev_event_callback_unregister;
>> +	rte_dev_handle_hot_unplug;
>>
>>   } DPDK_18.05;
>> --
>> 2.7.4

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [dpdk-dev] [PATCH V18 1/5] bus: introduce device hot unplug handle
  2018-04-04  4:31   ` Tan, Jianfeng
@ 2018-04-06 10:54     ` Guo, Jia
  0 siblings, 0 replies; 28+ messages in thread
From: Guo, Jia @ 2018-04-06 10:54 UTC (permalink / raw)
  To: Tan, Jianfeng, stephen, Richardson, Bruce, Yigit, Ferruh,
	Ananyev, Konstantin, gaetan.rivet, Wu, Jingjing, thomas, motih,
	Van Haaren, Harry
  Cc: jblunck, shreyansh.jain, dev, Zhang, Helin

thanks.


On 4/4/2018 12:31 PM, Tan, Jianfeng wrote:
>
>> -----Original Message-----
>> From: Guo, Jia
>> Sent: Wednesday, April 4, 2018 2:17 AM
>> To: stephen@networkplumber.org; Richardson, Bruce; Yigit, Ferruh;
>> Ananyev, Konstantin; gaetan.rivet@6wind.com; Wu, Jingjing;
>> thomas@monjalon.net; motih@mellanox.com; Van Haaren, Harry; Tan,
>> Jianfeng
>> Cc: jblunck@infradead.org; shreyansh.jain@nxp.com; dev@dpdk.org; Guo,
>> Jia; Zhang, Helin
>> Subject: [PATCH V18 1/5] bus: introduce device hot unplug handle
>>
>> As of device hot unplug, we need some preparatory measures so that we will
>> not encounter memory fault after device have been plug out of the system,
>> and also let we could recover the running data path but not been break.
>> This allows the buses to handle device hot unplug event.
>> In the following patch, will show how to handle the case for pci bus.
> Squeeze this patch with the next one.
>
>> Signed-off-by: Jeff Guo <jia.guo@intel.com>
>> ---
>> v16->v15:
>> split patch, and remove the ops from RTE_VERIFY
>> ---
>>   lib/librte_eal/common/include/rte_bus.h | 15 +++++++++++++++
>>   1 file changed, 15 insertions(+)
>>
>> diff --git a/lib/librte_eal/common/include/rte_bus.h
>> b/lib/librte_eal/common/include/rte_bus.h
>> index 6fb0834..ecd8b1c 100644
>> --- a/lib/librte_eal/common/include/rte_bus.h
>> +++ b/lib/librte_eal/common/include/rte_bus.h
>> @@ -168,6 +168,19 @@ typedef int (*rte_bus_unplug_t)(struct rte_device
>> *dev);
>>   typedef int (*rte_bus_parse_t)(const char *name, void *addr);
>>
>>   /**
>> + * Implementation specific hot unplug handler function which is responsible
>> + * for handle the failure when hot unplug the device, guaranty the system
>> + * would not crash in the case.
>> + * @param dev
>> + *	Pointer of the device structure.
>> + *
>> + * @return
>> + *	0 on success.
>> + *	!0 on error.
>> + */
>> +typedef int (*rte_bus_handle_hot_unplug_t)(struct rte_device *dev);
>> +
>> +/**
>>    * Bus scan policies
>>    */
>>   enum rte_bus_scan_mode {
>> @@ -209,6 +222,8 @@ struct rte_bus {
>>   	rte_bus_plug_t plug;         /**< Probe single device for drivers */
>>   	rte_bus_unplug_t unplug;     /**< Remove single device from driver
>> */
>>   	rte_bus_parse_t parse;       /**< Parse a device name */
>> +	rte_bus_handle_hot_unplug_t handle_hot_unplug; /**< handle when device
>> +							hot unplug */
> May be just rte_bus_hot_unplug_t hot_unplug /**< Handle hot unplug device event */
>
>>   	struct rte_bus_conf conf;    /**< Bus configuration */
>>   	rte_bus_get_iommu_class_t get_iommu_class; /**< Get iommu
>> class */
>>   };
>> --
>> 2.7.4

^ permalink raw reply	[flat|nested] 28+ messages in thread

* [dpdk-dev] [PATCH V19 0/4] add hot plug failure and auto bind handler
  2018-04-03 18:17 ` [dpdk-dev] [PATCH V18 5/5] app/testpmd: use auto handle for hotplug Jeff Guo
@ 2018-04-06 10:56   ` Jeff Guo
  2018-04-06 10:56     ` [dpdk-dev] [PATCH V19 1/4] bus/pci: introduce device hot unplug handle Jeff Guo
                       ` (3 more replies)
  0 siblings, 4 replies; 28+ messages in thread
From: Jeff Guo @ 2018-04-06 10:56 UTC (permalink / raw)
  To: stephen, bruce.richardson, ferruh.yigit, konstantin.ananyev,
	gaetan.rivet, jingjing.wu, thomas, motih, harry.van.haaren,
	jianfeng.tan
  Cc: jblunck, shreyansh.jain, dev, jia.guo, helin.zhang

At the prior, device event monitor machenism have been introduced.
But for device hot unplug, if we want data path would not be break when
device hot plug in or out, we still need some preparatory measures to do
some preparation work for the device detach and attach, so that we will
not encounter memory fault after device have been plug out of the system,
and also let user directly attach device which have been auto bind onto
the specific kernel driver.

This patch set will introduces two APIs to do that failure and auto bind
handle for hot plug feature, and also use testpmd to show example how to
use these 2 APIs for process hot plug event, let the process could be
smoothly like below case:

1)hot plug removal:
plugout->failure handle->stop forward->stop port->close port->detach port

2)hot plug insertion:
plugin->kernel driver auto bind->attach port->start port

with this machenism, every user such as fail-safe driver or testpmd, if
enable device event monitor they will be able to develop their own
hotplug application.

patchset history:
v19->18:
note for limitation of multiple hotplug,fix some typo, sqeeze patch.

v18->v15:
add document, add signal bus handler, refine the code to be more clear.

the prior patch history please check the patch set
"add device event monitor framework"

Jeff Guo (4):
  bus/pci: introduce device hot unplug handle
  eal: add failure handler mechanism for hot plug
  eal: add driver auto bind for hot insertion
  app/testpmd: use auto handle for hotplug

 app/test-pmd/testpmd.c                  | 199 ++++++++++++++++++++++++++++----
 app/test-pmd/testpmd.h                  |   9 ++
 doc/guides/rel_notes/release_18_05.rst  |   8 ++
 drivers/bus/pci/pci_common.c            |  42 +++++++
 drivers/bus/pci/pci_common_uio.c        |  32 +++++
 drivers/bus/pci/private.h               |  12 ++
 kernel/linux/igb_uio/igb_uio.c          |   4 +
 lib/librte_eal/bsdapp/eal/eal_dev.c     |   7 ++
 lib/librte_eal/common/include/rte_bus.h |  15 +++
 lib/librte_eal/common/include/rte_dev.h |  35 ++++++
 lib/librte_eal/linuxapp/eal/eal_dev.c   | 194 ++++++++++++++++++++++++++++++-
 lib/librte_eal/rte_eal_version.map      |   2 +
 12 files changed, 534 insertions(+), 25 deletions(-)

-- 
2.7.4

^ permalink raw reply	[flat|nested] 28+ messages in thread

* [dpdk-dev] [PATCH V19 1/4] bus/pci: introduce device hot unplug handle
  2018-04-06 10:56   ` [dpdk-dev] [PATCH V19 0/4] add hot plug failure and auto bind handler Jeff Guo
@ 2018-04-06 10:56     ` Jeff Guo
  2018-04-09 17:47       ` Ananyev, Konstantin
  2018-04-06 10:56     ` [dpdk-dev] [PATCH V19 2/4] eal: add failure handler mechanism for hot plug Jeff Guo
                       ` (2 subsequent siblings)
  3 siblings, 1 reply; 28+ messages in thread
From: Jeff Guo @ 2018-04-06 10:56 UTC (permalink / raw)
  To: stephen, bruce.richardson, ferruh.yigit, konstantin.ananyev,
	gaetan.rivet, jingjing.wu, thomas, motih, harry.van.haaren,
	jianfeng.tan
  Cc: jblunck, shreyansh.jain, dev, jia.guo, helin.zhang

As of device hot unplug, we need some preparatory measures so that we will
not encounter memory fault after device be plug out of the system,
and also let we could recover the running data path but not been break.
This allows the buses to handle device hot unplug event.
The patch only enable the ops in pci bus, when handle device hot unplug
event, remap a dummy memory to avoid bus read/write error.
Other buses could accordingly implement this ops specific by themselves.

Signed-off-by: Jeff Guo <jia.guo@intel.com>
---
v19->v18:
fix some typo and squeeze patch
---
 drivers/bus/pci/pci_common.c            | 42 +++++++++++++++++++++++++++++++++
 drivers/bus/pci/pci_common_uio.c        | 32 +++++++++++++++++++++++++
 drivers/bus/pci/private.h               | 12 ++++++++++
 lib/librte_eal/common/include/rte_bus.h | 15 ++++++++++++
 4 files changed, 101 insertions(+)

diff --git a/drivers/bus/pci/pci_common.c b/drivers/bus/pci/pci_common.c
index 2a00f36..09192ed 100644
--- a/drivers/bus/pci/pci_common.c
+++ b/drivers/bus/pci/pci_common.c
@@ -474,6 +474,47 @@ pci_find_device(const struct rte_device *start, rte_dev_cmp_t cmp,
 }
 
 static int
+pci_handle_hot_unplug(struct rte_device *dev)
+{
+	struct rte_pci_device *pdev;
+	int ret;
+
+	if (dev == NULL)
+		return -EINVAL;
+
+	pdev = RTE_DEV_TO_PCI(dev);
+
+	/* remap resources for devices */
+	switch (pdev->kdrv) {
+	case RTE_KDRV_VFIO:
+#ifdef VFIO_PRESENT
+		/* TODO */
+#endif
+		break;
+	case RTE_KDRV_IGB_UIO:
+	case RTE_KDRV_UIO_GENERIC:
+		if (rte_eal_using_phys_addrs()) {
+			/* map resources for devices that use uio */
+			ret = pci_uio_remap_resource(pdev);
+		}
+		break;
+	case RTE_KDRV_NIC_UIO:
+		ret = pci_uio_remap_resource(pdev);
+		break;
+	default:
+		RTE_LOG(DEBUG, EAL,
+			"  Not managed by a supported kernel driver, skipped\n");
+		ret = -1;
+		break;
+	}
+
+	if (ret != 0)
+		RTE_LOG(ERR, EAL, "failed to handle hot unplug of %s",
+			pdev->name);
+	return ret;
+}
+
+static int
 pci_plug(struct rte_device *dev)
 {
 	return pci_probe_all_drivers(RTE_DEV_TO_PCI(dev));
@@ -503,6 +544,7 @@ struct rte_pci_bus rte_pci_bus = {
 		.unplug = pci_unplug,
 		.parse = pci_parse,
 		.get_iommu_class = rte_pci_get_iommu_class,
+		.handle_hot_unplug = pci_handle_hot_unplug,
 	},
 	.device_list = TAILQ_HEAD_INITIALIZER(rte_pci_bus.device_list),
 	.driver_list = TAILQ_HEAD_INITIALIZER(rte_pci_bus.driver_list),
diff --git a/drivers/bus/pci/pci_common_uio.c b/drivers/bus/pci/pci_common_uio.c
index 54bc20b..31a4094 100644
--- a/drivers/bus/pci/pci_common_uio.c
+++ b/drivers/bus/pci/pci_common_uio.c
@@ -146,6 +146,38 @@ pci_uio_unmap(struct mapped_pci_resource *uio_res)
 	}
 }
 
+/* remap the PCI resource of a PCI device in private virtual memory */
+int
+pci_uio_remap_resource(struct rte_pci_device *dev)
+{
+	int i;
+	void *map_address;
+
+	if (dev == NULL)
+		return -1;
+
+	/* Remap all BARs */
+	for (i = 0; i != PCI_MAX_RESOURCE; i++) {
+		/* skip empty BAR */
+		if (dev->mem_resource[i].phys_addr == 0)
+			continue;
+		pci_unmap_resource(dev->mem_resource[i].addr,
+				(size_t)dev->mem_resource[i].len);
+		map_address = pci_map_resource(
+				dev->mem_resource[i].addr, -1, 0,
+				(size_t)dev->mem_resource[i].len,
+				MAP_ANONYMOUS | MAP_FIXED);
+		if (map_address == MAP_FAILED) {
+			RTE_LOG(ERR, EAL,
+				"Cannot remap resource for device %s\n",
+				dev->name);
+			return -1;
+		}
+	}
+
+	return 0;
+}
+
 static struct mapped_pci_resource *
 pci_uio_find_resource(struct rte_pci_device *dev)
 {
diff --git a/drivers/bus/pci/private.h b/drivers/bus/pci/private.h
index 88fa587..7a862ef 100644
--- a/drivers/bus/pci/private.h
+++ b/drivers/bus/pci/private.h
@@ -173,6 +173,18 @@ void pci_uio_free_resource(struct rte_pci_device *dev,
 		struct mapped_pci_resource *uio_res);
 
 /**
+ * remap the pci uio resource..
+ *
+ * @param dev
+ *   Point to the struct rte pci device.
+ * @return
+ *   - On success, zero.
+ *   - On failure, a negative value.
+ */
+int
+pci_uio_remap_resource(struct rte_pci_device *dev);
+
+/**
  * Map device memory to uio resource
  *
  * This function is private to EAL.
diff --git a/lib/librte_eal/common/include/rte_bus.h b/lib/librte_eal/common/include/rte_bus.h
index 6fb0834..729ff34 100644
--- a/lib/librte_eal/common/include/rte_bus.h
+++ b/lib/librte_eal/common/include/rte_bus.h
@@ -168,6 +168,19 @@ typedef int (*rte_bus_unplug_t)(struct rte_device *dev);
 typedef int (*rte_bus_parse_t)(const char *name, void *addr);
 
 /**
+ * Implementation specific hot unplug handler function which is responsible
+ * for handle the failure when hot unplug the device, guaranty the system
+ * would not crash in the case.
+ * @param dev
+ *	Pointer of the device structure.
+ *
+ * @return
+ *	0 on success.
+ *	!0 on error.
+ */
+typedef int (*rte_bus_handle_hot_unplug_t)(struct rte_device *dev);
+
+/**
  * Bus scan policies
  */
 enum rte_bus_scan_mode {
@@ -209,6 +222,8 @@ struct rte_bus {
 	rte_bus_plug_t plug;         /**< Probe single device for drivers */
 	rte_bus_unplug_t unplug;     /**< Remove single device from driver */
 	rte_bus_parse_t parse;       /**< Parse a device name */
+	rte_bus_handle_hot_unplug_t handle_hot_unplug; /**< handle hot unplug
+							device event */
 	struct rte_bus_conf conf;    /**< Bus configuration */
 	rte_bus_get_iommu_class_t get_iommu_class; /**< Get iommu class */
 };
-- 
2.7.4

^ permalink raw reply	[flat|nested] 28+ messages in thread

* [dpdk-dev] [PATCH V19 2/4] eal: add failure handler mechanism for hot plug
  2018-04-06 10:56   ` [dpdk-dev] [PATCH V19 0/4] add hot plug failure and auto bind handler Jeff Guo
  2018-04-06 10:56     ` [dpdk-dev] [PATCH V19 1/4] bus/pci: introduce device hot unplug handle Jeff Guo
@ 2018-04-06 10:56     ` Jeff Guo
  2018-04-06 14:03       ` Zhang, Qi Z
  2018-04-09 17:42       ` Ananyev, Konstantin
  2018-04-06 10:56     ` [dpdk-dev] [PATCH V19 3/4] eal: add driver auto bind for hot insertion Jeff Guo
  2018-04-06 10:56     ` [dpdk-dev] [PATCH V19 4/4] app/testpmd: use auto handle for hotplug Jeff Guo
  3 siblings, 2 replies; 28+ messages in thread
From: Jeff Guo @ 2018-04-06 10:56 UTC (permalink / raw)
  To: stephen, bruce.richardson, ferruh.yigit, konstantin.ananyev,
	gaetan.rivet, jingjing.wu, thomas, motih, harry.van.haaren,
	jianfeng.tan
  Cc: jblunck, shreyansh.jain, dev, jia.guo, helin.zhang

This patch introduces an API (rte_dev_handle_hot_unplug) to handle device
hot unplug event. When device be hot plug out, the device resource
become invalid, if this resource is still be unexpected read/write,
system will crash. The api let user register the hot unplug handler, when
hot plug failure occur, the working thread will be block until the uevent
mechanism successful recovery the memory and guaranty the application keep
running smoothly.

Signed-off-by: Jeff Guo <jia.guo@intel.com>
---
v19->18:
add note for limitation of multiple hotplug
---
 doc/guides/rel_notes/release_18_05.rst  |   6 ++
 kernel/linux/igb_uio/igb_uio.c          |   4 +
 lib/librte_eal/common/include/rte_dev.h |  19 +++++
 lib/librte_eal/linuxapp/eal/eal_dev.c   | 140 +++++++++++++++++++++++++++++++-
 lib/librte_eal/rte_eal_version.map      |   1 +
 5 files changed, 169 insertions(+), 1 deletion(-)

diff --git a/doc/guides/rel_notes/release_18_05.rst b/doc/guides/rel_notes/release_18_05.rst
index cb9e050..2707e73 100644
--- a/doc/guides/rel_notes/release_18_05.rst
+++ b/doc/guides/rel_notes/release_18_05.rst
@@ -70,6 +70,12 @@ New Features
 
   Linux uevent is supported as backend of this device event notification framework.
 
+* **Added hot plug failure handler.**
+
+  Added a failure handler machenism to handle hot unplug device.
+
+  * ``rte_dev_handle_hot_unplug`` for handle hot unplug device failure.
+
 API Changes
 -----------
 
diff --git a/kernel/linux/igb_uio/igb_uio.c b/kernel/linux/igb_uio/igb_uio.c
index 4cae4dd..293c310 100644
--- a/kernel/linux/igb_uio/igb_uio.c
+++ b/kernel/linux/igb_uio/igb_uio.c
@@ -344,6 +344,10 @@ igbuio_pci_release(struct uio_info *info, struct inode *inode)
 	struct rte_uio_pci_dev *udev = info->priv;
 	struct pci_dev *dev = udev->pdev;
 
+	/* check if device has been remove before release */
+	if ((&dev->dev.kobj)->state_remove_uevent_sent == 1)
+		return -1;
+
 	mutex_lock(&udev->lock);
 	if (--udev->refcnt > 0) {
 		mutex_unlock(&udev->lock);
diff --git a/lib/librte_eal/common/include/rte_dev.h b/lib/librte_eal/common/include/rte_dev.h
index a5203e7..17c446d 100644
--- a/lib/librte_eal/common/include/rte_dev.h
+++ b/lib/librte_eal/common/include/rte_dev.h
@@ -361,4 +361,23 @@ rte_dev_event_monitor_start(void);
  */
 int __rte_experimental
 rte_dev_event_monitor_stop(void);
+
+/**
+ * @warning
+ * @b EXPERIMENTAL: this API may change without prior notice
+ *
+ * It can be used to register the device signal bus handler, and save the
+ * current environment for each thread, when signal bus error invoke, the
+ * handler would restore the environment by long jmp to each working
+ * thread previous locate, then block the thread to waiting until the memory
+ * recovery and remapping be finished, that would guaranty the system not
+ * crash when the device be hot unplug.
+ *
+ * @param none
+ * @return
+ *   - From a successful direct invocation, zero.
+ *   - From a call of siglongjmp(), non_zero.
+ */
+int __rte_experimental
+rte_dev_handle_hot_unplug(void);
 #endif /* _RTE_DEV_H_ */
diff --git a/lib/librte_eal/linuxapp/eal/eal_dev.c b/lib/librte_eal/linuxapp/eal/eal_dev.c
index 9478a39..84b7efc 100644
--- a/lib/librte_eal/linuxapp/eal/eal_dev.c
+++ b/lib/librte_eal/linuxapp/eal/eal_dev.c
@@ -4,6 +4,9 @@
 
 #include <string.h>
 #include <unistd.h>
+#include <signal.h>
+#include <setjmp.h>
+#include <pthread.h>
 #include <sys/socket.h>
 #include <linux/netlink.h>
 
@@ -13,12 +16,17 @@
 #include <rte_malloc.h>
 #include <rte_interrupts.h>
 #include <rte_alarm.h>
+#include <rte_bus.h>
+#include <rte_per_lcore.h>
 
 #include "eal_private.h"
 
 static struct rte_intr_handle intr_handle = {.fd = -1 };
 static bool monitor_started;
 
+pthread_mutex_t failure_recovery_lock;
+pthread_cond_t failure_recovery_cond;
+
 #define EAL_UEV_MSG_LEN 4096
 #define EAL_UEV_MSG_ELEM_LEN 128
 
@@ -32,6 +40,22 @@ enum eal_dev_event_subsystem {
 	EAL_DEV_EVENT_SUBSYSTEM_MAX
 };
 
+static RTE_DEFINE_PER_LCORE(sigjmp_buf, unplug_longjmp_env);
+
+static void sigbus_handler(int signum __rte_unused)
+{
+	RTE_LOG(DEBUG, EAL, "receive SIGBUS error!\n");
+	siglongjmp(RTE_PER_LCORE(unplug_longjmp_env), 1);
+}
+
+static int cmp_dev_name(const struct rte_device *dev,
+	const void *_name)
+{
+	const char *name = _name;
+
+	return strcmp(dev->name, name);
+}
+
 static int
 dev_uev_socket_fd_create(void)
 {
@@ -132,6 +156,31 @@ dev_uev_parse(const char *buf, struct rte_dev_event *event, int length)
 	return 0;
 }
 
+static int
+dev_uev_remove_handler(struct rte_device *dev)
+{
+	struct rte_bus *bus = rte_bus_find_by_device_name(dev->name);
+	int ret;
+
+	if (!dev)
+		return -1;
+
+	if (bus->handle_hot_unplug) {
+		/**
+		 * call bus ops to handle hot unplug.
+		 */
+		ret = bus->handle_hot_unplug(dev);
+		if (ret) {
+			RTE_LOG(ERR, EAL,
+				"It cannot handle hot unplug for device (%s) "
+				"on the bus.\n ",
+				dev->name);
+			return ret;
+		}
+	}
+	return 0;
+}
+
 static void
 dev_delayed_unregister(void *param)
 {
@@ -146,6 +195,9 @@ dev_uev_handler(__rte_unused void *param)
 	struct rte_dev_event uevent;
 	int ret;
 	char buf[EAL_UEV_MSG_LEN];
+	struct rte_bus *bus;
+	struct rte_device *dev;
+	const char *busname;
 
 	memset(&uevent, 0, sizeof(struct rte_dev_event));
 	memset(buf, 0, EAL_UEV_MSG_LEN);
@@ -170,11 +222,87 @@ dev_uev_handler(__rte_unused void *param)
 	RTE_LOG(DEBUG, EAL, "receive uevent(name:%s, type:%d, subsystem:%d)\n",
 		uevent.devname, uevent.type, uevent.subsystem);
 
-	if (uevent.devname)
+	switch (uevent.subsystem) {
+	case EAL_DEV_EVENT_SUBSYSTEM_PCI:
+	case EAL_DEV_EVENT_SUBSYSTEM_UIO:
+		busname = "pci";
+		break;
+	default:
+		break;
+	}
+
+	if (uevent.devname) {
+		if (uevent.type == RTE_DEV_EVENT_REMOVE) {
+			bus = rte_bus_find_by_name(busname);
+			if (bus == NULL) {
+				RTE_LOG(ERR, EAL, "Cannot find bus (%s)\n",
+					uevent.devname);
+				return;
+			}
+			dev = bus->find_device(NULL, cmp_dev_name,
+					       uevent.devname);
+			if (dev == NULL) {
+				RTE_LOG(ERR, EAL,
+					"Cannot find unplugged device (%s)\n",
+					uevent.devname);
+				return;
+			}
+			ret = dev_uev_remove_handler(dev);
+			if (ret) {
+				RTE_LOG(ERR, EAL, "Driver cannot remap the "
+					"device (%s)\n",
+					dev->name);
+				return;
+			}
+			/* wake up all the threads */
+			pthread_cond_broadcast(&failure_recovery_cond);
+		}
 		dev_callback_process(uevent.devname, uevent.type);
+	}
 }
 
 int __rte_experimental
+rte_dev_handle_hot_unplug(void)
+{
+	struct sigaction act;
+	sigset_t mask;
+	int ret = 0;
+
+	/* set signal handlers */
+	memset(&act, 0x00, sizeof(struct sigaction));
+	act.sa_handler = sigbus_handler;
+	sigemptyset(&act.sa_mask);
+	act.sa_flags = SA_RESTART;
+	sigaction(SIGBUS, &act, NULL);
+	sigemptyset(&mask);
+	sigaddset(&mask, SIGBUS);
+	pthread_sigmask(SIG_UNBLOCK, &mask, NULL);
+
+	ret = sigsetjmp(RTE_PER_LCORE(unplug_longjmp_env), 1);
+	if (ret) {
+		/*
+		 * Waitting for condition variable before failure recovery
+		 * finish. Now the limitation is only handle one device
+		 * hot plug, for multiple devices hotplug, need check if
+		 * the device belong to this working thread, then directly
+		 * call memory remaping, unrelated thread just keep going
+		 * their work by no interrupt from hotplug.
+		 * TODO: multiple device hotplug
+		 */
+		pthread_mutex_lock(&failure_recovery_lock);
+		RTE_LOG(DEBUG, EAL, "begin waiting for the failure handler.\n");
+		pthread_cond_wait(&failure_recovery_cond,
+					&failure_recovery_lock);
+		RTE_LOG(DEBUG, EAL,
+		       "come back from waiting for failure handler.\n");
+		pthread_mutex_unlock(&failure_recovery_lock);
+	}
+
+	return ret;
+}
+
+
+int __rte_experimental
 rte_dev_event_monitor_start(void)
 {
 	int ret;
@@ -196,6 +324,12 @@ rte_dev_event_monitor_start(void)
 		return -1;
 	}
 
+	/* initialize mutex and condition variable
+	 * to control failure recovery.
+	 */
+	pthread_mutex_init(&failure_recovery_lock, NULL);
+	pthread_cond_init(&failure_recovery_cond, NULL);
+
 	monitor_started = true;
 
 	return 0;
@@ -219,5 +353,9 @@ rte_dev_event_monitor_stop(void)
 	close(intr_handle.fd);
 	intr_handle.fd = -1;
 	monitor_started = false;
+
+	pthread_cond_destroy(&failure_recovery_cond);
+	pthread_mutex_destroy(&failure_recovery_lock);
+
 	return 0;
 }
diff --git a/lib/librte_eal/rte_eal_version.map b/lib/librte_eal/rte_eal_version.map
index fc5c62a..873ef38 100644
--- a/lib/librte_eal/rte_eal_version.map
+++ b/lib/librte_eal/rte_eal_version.map
@@ -262,5 +262,6 @@ EXPERIMENTAL {
 	rte_dev_event_monitor_stop;
 	rte_dev_event_callback_register;
 	rte_dev_event_callback_unregister;
+	rte_dev_handle_hot_unplug;
 
 } DPDK_18.02;
-- 
2.7.4

^ permalink raw reply	[flat|nested] 28+ messages in thread

* [dpdk-dev] [PATCH V19 3/4] eal: add driver auto bind for hot insertion
  2018-04-06 10:56   ` [dpdk-dev] [PATCH V19 0/4] add hot plug failure and auto bind handler Jeff Guo
  2018-04-06 10:56     ` [dpdk-dev] [PATCH V19 1/4] bus/pci: introduce device hot unplug handle Jeff Guo
  2018-04-06 10:56     ` [dpdk-dev] [PATCH V19 2/4] eal: add failure handler mechanism for hot plug Jeff Guo
@ 2018-04-06 10:56     ` Jeff Guo
  2018-04-06 10:56     ` [dpdk-dev] [PATCH V19 4/4] app/testpmd: use auto handle for hotplug Jeff Guo
  3 siblings, 0 replies; 28+ messages in thread
From: Jeff Guo @ 2018-04-06 10:56 UTC (permalink / raw)
  To: stephen, bruce.richardson, ferruh.yigit, konstantin.ananyev,
	gaetan.rivet, jingjing.wu, thomas, motih, harry.van.haaren,
	jianfeng.tan
  Cc: jblunck, shreyansh.jain, dev, jia.guo, helin.zhang

Normally we use driverctl or dpdk-devbind.py to bind kernel driver before
application running, but lack of an function to automatically bind driver
at runtime. This patch introduce a new API (rte_dev_bind_kernel_driver),
aim to let user call it to bind the specific kernel driver according
their own policy, that would preparing for the next step of attach device,
let app running smoothly when hotplug behavior occur.

Signed-off-by: Jeff Guo <jia.guo@intel.com>
---
v19->v18:
no change
---
 doc/guides/rel_notes/release_18_05.rst  |  8 +++--
 lib/librte_eal/bsdapp/eal/eal_dev.c     |  7 +++++
 lib/librte_eal/common/include/rte_dev.h | 16 ++++++++++
 lib/librte_eal/linuxapp/eal/eal_dev.c   | 54 +++++++++++++++++++++++++++++++++
 lib/librte_eal/rte_eal_version.map      |  1 +
 5 files changed, 83 insertions(+), 3 deletions(-)

diff --git a/doc/guides/rel_notes/release_18_05.rst b/doc/guides/rel_notes/release_18_05.rst
index 2707e73..f8822d3 100644
--- a/doc/guides/rel_notes/release_18_05.rst
+++ b/doc/guides/rel_notes/release_18_05.rst
@@ -70,11 +70,13 @@ New Features
 
   Linux uevent is supported as backend of this device event notification framework.
 
-* **Added hot plug failure handler.**
+* **Added hot plug failure handler and kernel driver auto-bind func**
 
-  Added a failure handler machenism to handle hot unplug device.
+  Added a failure handler machenism to handle hot unplug device, and added an kernel driver
+  auto bind function for hot plug insertion. The list of new APIs:
 
-  * ``rte_dev_handle_hot_unplug`` for handle hot unplug device failure.
+  * ``rte_dev_handle_hot_unplug`` for handle hot uplug device failure.
+  * ``rte_dev_bind_kernel_driver`` for hot plug insertion.
 
 API Changes
 -----------
diff --git a/lib/librte_eal/bsdapp/eal/eal_dev.c b/lib/librte_eal/bsdapp/eal/eal_dev.c
index 1c6c51b..e953a87 100644
--- a/lib/librte_eal/bsdapp/eal/eal_dev.c
+++ b/lib/librte_eal/bsdapp/eal/eal_dev.c
@@ -19,3 +19,10 @@ rte_dev_event_monitor_stop(void)
 	RTE_LOG(ERR, EAL, "Device event is not supported for FreeBSD\n");
 	return -1;
 }
+
+int __rte_experimental
+rte_dev_bind_driver(const char *dev_name, enum rte_kernel_driver kdrv_type)
+{
+	RTE_LOG(ERR, EAL, "Bind driver is not supported for FreeBSD\n");
+	return -1;
+}
diff --git a/lib/librte_eal/common/include/rte_dev.h b/lib/librte_eal/common/include/rte_dev.h
index 17c446d..35f45d3 100644
--- a/lib/librte_eal/common/include/rte_dev.h
+++ b/lib/librte_eal/common/include/rte_dev.h
@@ -380,4 +380,20 @@ rte_dev_event_monitor_stop(void);
  */
 int __rte_experimental
 rte_dev_handle_hot_unplug(void);
+
+/**
+ * It can be used to bind a device to a specific type of kernel driver.
+ *
+ * @param dev_name
+ *  The device name.
+ * @param kdrv_type
+ *  The specific kernel driver's type.
+ *
+ * @return
+ *  - On success, zero.
+ *  - On failure, a negative value.
+ */
+int __rte_experimental
+rte_dev_bind_kernel_driver(const char *dev_name,
+			   enum rte_kernel_driver kdrv_type);
 #endif /* _RTE_DEV_H_ */
diff --git a/lib/librte_eal/linuxapp/eal/eal_dev.c b/lib/librte_eal/linuxapp/eal/eal_dev.c
index 84b7efc..2ad7444 100644
--- a/lib/librte_eal/linuxapp/eal/eal_dev.c
+++ b/lib/librte_eal/linuxapp/eal/eal_dev.c
@@ -4,6 +4,7 @@
 
 #include <string.h>
 #include <unistd.h>
+#include <fcntl.h>
 #include <signal.h>
 #include <setjmp.h>
 #include <pthread.h>
@@ -359,3 +360,56 @@ rte_dev_event_monitor_stop(void)
 
 	return 0;
 }
+
+int __rte_experimental
+rte_dev_bind_kernel_driver(const char *dev_name,
+			   enum rte_kernel_driver kdrv_type)
+{
+	const char *kdrv_name = NULL;
+	char drv_override_path[1024];
+	int drv_override_fd = -1;
+
+	if (!dev_name || !kdrv_type)
+		return -1;
+
+	switch (kdrv_type) {
+	case RTE_KDRV_IGB_UIO:
+		kdrv_name = "igb_uio";
+		break;
+	case RTE_KDRV_VFIO:
+		kdrv_name = "vfio-pci";
+		break;
+	case RTE_KDRV_UIO_GENERIC:
+		kdrv_name = "uio_pci_generic";
+		break;
+	case RTE_KDRV_NIC_UIO:
+		RTE_LOG(ERR, EAL, "Don't support to bind nic uio driver.\n");
+		goto err;
+	default:
+		break;
+	}
+
+	snprintf(drv_override_path, sizeof(drv_override_path),
+		"/sys/bus/pci/devices/%s/driver_override", dev_name);
+
+	/* specify the driver for a device by writing to driver_override */
+	drv_override_fd = open(drv_override_path, O_WRONLY);
+	if (drv_override_fd < 0) {
+		RTE_LOG(ERR, EAL, "Cannot open %s: %s\n",
+			drv_override_path, strerror(errno));
+		goto err;
+	}
+
+	if (write(drv_override_fd, kdrv_name, sizeof(kdrv_name)) < 0) {
+		RTE_LOG(ERR, EAL,
+			"Error: bind failed - Cannot write "
+			"driver %s to device %s\n", kdrv_name, dev_name);
+		goto err;
+	}
+
+	close(drv_override_fd);
+	return 0;
+err:
+	close(drv_override_fd);
+	return -1;
+}
diff --git a/lib/librte_eal/rte_eal_version.map b/lib/librte_eal/rte_eal_version.map
index 873ef38..664a837 100644
--- a/lib/librte_eal/rte_eal_version.map
+++ b/lib/librte_eal/rte_eal_version.map
@@ -263,5 +263,6 @@ EXPERIMENTAL {
 	rte_dev_event_callback_register;
 	rte_dev_event_callback_unregister;
 	rte_dev_handle_hot_unplug;
+	rte_dev_bind_kernel_driver;
 
 } DPDK_18.02;
-- 
2.7.4

^ permalink raw reply	[flat|nested] 28+ messages in thread

* [dpdk-dev] [PATCH V19 4/4] app/testpmd: use auto handle for hotplug
  2018-04-06 10:56   ` [dpdk-dev] [PATCH V19 0/4] add hot plug failure and auto bind handler Jeff Guo
                       ` (2 preceding siblings ...)
  2018-04-06 10:56     ` [dpdk-dev] [PATCH V19 3/4] eal: add driver auto bind for hot insertion Jeff Guo
@ 2018-04-06 10:56     ` Jeff Guo
  2018-04-12  5:31       ` Matan Azrad
  3 siblings, 1 reply; 28+ messages in thread
From: Jeff Guo @ 2018-04-06 10:56 UTC (permalink / raw)
  To: stephen, bruce.richardson, ferruh.yigit, konstantin.ananyev,
	gaetan.rivet, jingjing.wu, thomas, motih, harry.van.haaren,
	jianfeng.tan
  Cc: jblunck, shreyansh.jain, dev, jia.guo, helin.zhang

Use testpmd for example, to show how an application smoothly handle
failure when device be hot removal, and show how to auto bind kernal
driver to preparing attach device when device being hot insertion.

Signed-off-by: Jeff Guo <jia.guo@intel.com>
---
v19->v18:
clean code
---
 app/test-pmd/testpmd.c | 199 +++++++++++++++++++++++++++++++++++++++++++------
 app/test-pmd/testpmd.h |   9 +++
 2 files changed, 184 insertions(+), 24 deletions(-)

diff --git a/app/test-pmd/testpmd.c b/app/test-pmd/testpmd.c
index d2c122a..d7fa913 100644
--- a/app/test-pmd/testpmd.c
+++ b/app/test-pmd/testpmd.c
@@ -285,6 +285,8 @@ uint8_t lsc_interrupt = 1; /* enabled by default */
  */
 uint8_t rmv_interrupt = 1; /* enabled by default */
 
+#define HOT_PLUG_FOR_ALL_DEVICE -1
+#define ALL_CALLBACK -1
 uint8_t hot_plug = 0; /**< hotplug disabled by default. */
 
 /*
@@ -387,6 +389,8 @@ uint8_t bitrate_enabled;
 struct gro_status gro_ports[RTE_MAX_ETHPORTS];
 uint8_t gro_flush_cycles = GRO_DEFAULT_FLUSH_CYCLES;
 
+static struct hotplug_request_list hp_list;
+
 /* Forward function declarations */
 static void map_port_queue_stats_mapping_registers(portid_t pi,
 						   struct rte_port *port);
@@ -397,9 +401,13 @@ static int eth_event_callback(portid_t port_id,
 static void eth_dev_event_callback(char *device_name,
 				enum rte_dev_event_type type,
 				void *param);
-static int eth_dev_event_callback_register(void);
-static int eth_dev_event_callback_unregister(void);
+static int eth_dev_event_callback_register(portid_t port_id);
+static int eth_dev_event_callback_unregister(portid_t port_id);
+
+static bool in_hotplug_list(const char *dev_name);
 
+static int hotplug_list_add(struct rte_device *device,
+				enum rte_kernel_driver device_kdrv);
 
 /*
  * Check if all the ports are started.
@@ -1120,11 +1128,17 @@ run_pkt_fwd_on_lcore(struct fwd_lcore *fc, packet_fwd_t pkt_fwd)
 	uint64_t tics_datum;
 	uint64_t tics_current;
 	uint8_t idx_port, cnt_ports;
+	int ret;
 
 	cnt_ports = rte_eth_dev_count();
 	tics_datum = rte_rdtsc();
 	tics_per_1sec = rte_get_timer_hz();
 #endif
+	if (hot_plug) {
+		ret = rte_dev_handle_hot_unplug();
+		if (ret)
+			printf("The device is being hot unplug!\n");
+	}
 	fsm = &fwd_streams[fc->stream_idx];
 	nb_fs = fc->stream_nb;
 	do {
@@ -1863,15 +1877,24 @@ reset_port(portid_t pid)
 }
 
 static int
-eth_dev_event_callback_register(void)
+eth_dev_event_callback_register(portid_t port_id)
 {
 	int ret;
+	char *device_name;
 
+	/* if port id equal -1, unregister event callbacks for all device. */
+	if (port_id == (portid_t)HOT_PLUG_FOR_ALL_DEVICE) {
+		device_name = NULL;
+	} else {
+		device_name = strdup(rte_eth_devices[port_id].device->name);
+		if (!device_name)
+			return -1;
+	}
 	/* register the device event callback */
-	ret = rte_dev_event_callback_register(NULL,
-		eth_dev_event_callback, NULL);
+	ret = rte_dev_event_callback_register(device_name,
+		eth_dev_event_callback, (void *)(intptr_t)port_id);
 	if (ret) {
-		printf("Failed to register device event callback\n");
+		printf("Failed to register device event callback.\n");
 		return -1;
 	}
 
@@ -1880,15 +1903,25 @@ eth_dev_event_callback_register(void)
 
 
 static int
-eth_dev_event_callback_unregister(void)
+eth_dev_event_callback_unregister(portid_t port_id)
 {
 	int ret;
+	char *device_name;
+
+	/* if port id equal -1, unregister all device event callbacks */
+	if (port_id == (portid_t)HOT_PLUG_FOR_ALL_DEVICE) {
+		device_name = NULL;
+	} else {
+		device_name = strdup(rte_eth_devices[port_id].device->name);
+		if (!device_name)
+			return -1;
+	}
 
 	/* unregister the device event callback */
-	ret = rte_dev_event_callback_unregister(NULL,
-		eth_dev_event_callback, NULL);
-	if (ret < 0) {
-		printf("Failed to unregister device event callback\n");
+	ret = rte_dev_event_callback_unregister(device_name,
+		eth_dev_event_callback, (void *)(intptr_t)port_id);
+	if (ret) {
+		printf("Failed to unregister device event callback.\n");
 		return -1;
 	}
 
@@ -1911,6 +1944,8 @@ attach_port(char *identifier)
 	if (rte_eth_dev_attach(identifier, &pi))
 		return;
 
+	eth_dev_event_callback_register(pi);
+
 	socket_id = (unsigned)rte_eth_dev_socket_id(pi);
 	/* if socket_id is invalid, set to 0 */
 	if (check_socket_id(socket_id) < 0)
@@ -1922,6 +1957,12 @@ attach_port(char *identifier)
 
 	ports[pi].port_status = RTE_PORT_STOPPED;
 
+	if (hot_plug) {
+		hotplug_list_add(rte_eth_devices[pi].device,
+				 rte_eth_devices[pi].data->kdrv);
+		eth_dev_event_callback_register(pi);
+	}
+
 	printf("Port %d is attached. Now total ports is %d\n", pi, nb_ports);
 	printf("Done\n");
 }
@@ -1933,6 +1974,12 @@ detach_port(portid_t port_id)
 
 	printf("Detaching a port...\n");
 
+	if (hot_plug) {
+		hotplug_list_add(rte_eth_devices[port_id].device,
+				 rte_eth_devices[port_id].data->kdrv);
+		eth_dev_event_callback_register(port_id);
+	}
+
 	if (!port_is_closed(port_id)) {
 		printf("Please close port first\n");
 		return;
@@ -1979,7 +2026,7 @@ pmd_test_exit(void)
 			RTE_LOG(ERR, EAL,
 				"fail to stop device event monitor.");
 
-		ret = eth_dev_event_callback_unregister();
+		ret = eth_dev_event_callback_unregister(ALL_CALLBACK);
 		if (ret)
 			RTE_LOG(ERR, EAL,
 				"fail to unregister all event callbacks.");
@@ -2068,6 +2115,40 @@ rmv_event_callback(void *arg)
 			dev->device->name);
 }
 
+static void
+rmv_dev_event_callback(void *arg)
+{
+	uint8_t port_id = (intptr_t)arg;
+
+	rte_eal_alarm_cancel(rmv_dev_event_callback, arg);
+
+	RTE_ETH_VALID_PORTID_OR_RET(port_id);
+	printf("removing port id:%u\n", port_id);
+
+	if (!in_hotplug_list(rte_eth_devices[port_id].device->name))
+		return;
+
+	stop_packet_forwarding();
+
+	stop_port(port_id);
+	close_port(port_id);
+	detach_port(port_id);
+}
+
+static void
+add_dev_event_callback(void *arg)
+{
+	char *dev_name = (char *)arg;
+
+	rte_eal_alarm_cancel(add_dev_event_callback, arg);
+
+	if (!in_hotplug_list(dev_name))
+		return;
+
+	RTE_LOG(ERR, EAL, "add device: %s\n", dev_name);
+	attach_port(dev_name);
+}
+
 /* This function is used by the interrupt thread */
 static int
 eth_event_callback(portid_t port_id, enum rte_eth_event_type type, void *param,
@@ -2114,31 +2195,96 @@ eth_event_callback(portid_t port_id, enum rte_eth_event_type type, void *param,
 	return 0;
 }
 
+static bool
+in_hotplug_list(const char *dev_name)
+{
+	struct hotplug_request *hp_request = NULL;
+
+	TAILQ_FOREACH(hp_request, &hp_list, next) {
+		if (!strcmp(hp_request->dev_name, dev_name))
+			break;
+	}
+
+	if (hp_request)
+		return true;
+
+	return false;
+}
+
+static enum rte_kernel_driver
+get_hotplug_driver(const char *dev_name)
+{
+	struct hotplug_request *hp_request = NULL;
+
+	TAILQ_FOREACH(hp_request, &hp_list, next) {
+		if (!strcmp(hp_request->dev_name, dev_name))
+			return hp_request->dev_kdrv;
+	}
+	return -1;
+}
+
+static int
+hotplug_list_add(struct rte_device *device, enum rte_kernel_driver device_kdrv)
+{
+	struct hotplug_request *hp_request;
+
+	hp_request = rte_zmalloc("hoplug request",
+			sizeof(*hp_request), 0);
+	if (hp_request == NULL) {
+		fprintf(stderr, "%s can not alloc memory\n",
+			__func__);
+		return -ENOMEM;
+	}
+
+	hp_request->dev_name = device->name;
+	hp_request->dev_kdrv = device_kdrv;
+
+	TAILQ_INSERT_TAIL(&hp_list, hp_request, next);
+
+	return 0;
+}
+
 /* This function is used by the interrupt thread */
 static void
 eth_dev_event_callback(char *device_name, enum rte_dev_event_type type,
-			     __rte_unused void *arg)
+				void *arg)
 {
+	static const char * const event_desc[] = {
+		[RTE_DEV_EVENT_ADD] = "add",
+		[RTE_DEV_EVENT_REMOVE] = "remove",
+	};
+	char *dev_name = malloc(strlen(device_name) + 1);
+
+	strcpy(dev_name, device_name);
+
 	if (type >= RTE_DEV_EVENT_MAX) {
 		fprintf(stderr, "%s called upon invalid event %d\n",
 			__func__, type);
 		fflush(stderr);
+	} else if (event_print_mask & (UINT32_C(1) << type)) {
+		printf("%s event\n",
+			event_desc[type]);
+		fflush(stdout);
 	}
 
 	switch (type) {
 	case RTE_DEV_EVENT_REMOVE:
-		RTE_LOG(ERR, EAL, "The device: %s has been removed!\n",
-			device_name);
-		/* TODO: After finish failure handle, begin to stop
-		 * packet forward, stop port, close port, detach port.
-		 */
+		if (rte_eal_alarm_set(100000,
+			rmv_dev_event_callback, arg))
+			fprintf(stderr, "Could not set up deferred "
+				"device removal\n");
 		break;
 	case RTE_DEV_EVENT_ADD:
-		RTE_LOG(ERR, EAL, "The device: %s has been added!\n",
-			device_name);
-		/* TODO: After finish kernel driver binding,
-		 * begin to attach port.
+		/**
+		 * bind the driver to the device
+		 * before process of hot plug adding device
 		 */
+		rte_dev_bind_kernel_driver(dev_name,
+					   get_hotplug_driver(dev_name));
+		if (rte_eal_alarm_set(100000,
+			add_dev_event_callback, dev_name))
+			fprintf(stderr, "Could not set up deferred "
+				"device add\n");
 		break;
 	default:
 		break;
@@ -2638,8 +2784,13 @@ main(int argc, char** argv)
 			rte_errno = EINVAL;
 			return -1;
 		}
-		eth_dev_event_callback_register();
-
+		if (TAILQ_EMPTY(&hp_list))
+			TAILQ_INIT(&hp_list);
+		RTE_ETH_FOREACH_DEV(port_id) {
+			hotplug_list_add(rte_eth_devices[port_id].device,
+					 rte_eth_devices[port_id].data->kdrv);
+			eth_dev_event_callback_register(port_id);
+		}
 	}
 
 	if (start_port(RTE_PORT_ALL) != 0)
diff --git a/app/test-pmd/testpmd.h b/app/test-pmd/testpmd.h
index 8fde68d..c619e11 100644
--- a/app/test-pmd/testpmd.h
+++ b/app/test-pmd/testpmd.h
@@ -63,6 +63,15 @@ typedef uint16_t streamid_t;
 #define TM_MODE			0
 #endif
 
+struct hotplug_request {
+	TAILQ_ENTRY(hotplug_request) next; /**< Callbacks list */
+	const char *dev_name;            /* request device name */
+	enum rte_kernel_driver dev_kdrv;            /* kernel driver binded */
+};
+
+/** @internal Structure to keep track of registered callbacks */
+TAILQ_HEAD(hotplug_request_list, hotplug_request);
+
 enum {
 	PORT_TOPOLOGY_PAIRED,
 	PORT_TOPOLOGY_CHAINED,
-- 
2.7.4

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [dpdk-dev] [PATCH V18 2/5] bus/pci: implement handle hot unplug operation
  2018-04-04  5:25   ` Tan, Jianfeng
@ 2018-04-06 10:57     ` Guo, Jia
  0 siblings, 0 replies; 28+ messages in thread
From: Guo, Jia @ 2018-04-06 10:57 UTC (permalink / raw)
  To: Tan, Jianfeng, stephen, Richardson, Bruce, Yigit, Ferruh,
	Ananyev, Konstantin, gaetan.rivet, Wu, Jingjing, thomas, motih,
	Van Haaren, Harry
  Cc: jblunck, shreyansh.jain, dev, Zhang, Helin



On 4/4/2018 1:25 PM, Tan, Jianfeng wrote:
>
>> -----Original Message-----
>> From: Guo, Jia
>> Sent: Wednesday, April 4, 2018 2:17 AM
>> To: stephen@networkplumber.org; Richardson, Bruce; Yigit, Ferruh;
>> Ananyev, Konstantin; gaetan.rivet@6wind.com; Wu, Jingjing;
>> thomas@monjalon.net; motih@mellanox.com; Van Haaren, Harry; Tan,
>> Jianfeng
>> Cc: jblunck@infradead.org; shreyansh.jain@nxp.com; dev@dpdk.org; Guo,
>> Jia; Zhang, Helin
>> Subject: [PATCH V18 2/5] bus/pci: implement handle hot unplug operation
>>
>> When handle device hot unplug event, remap a dummy memory to avoid
>> bus read/write error.
>>
>> Signed-off-by: Jeff Guo <jia.guo@intel.com>
>> ---
>> v16->v15;
>> split patch, merge some function to be simple
>> ---
>>   drivers/bus/pci/pci_common.c     | 42
>> ++++++++++++++++++++++++++++++++++++++++
>>   drivers/bus/pci/pci_common_uio.c | 33
>> +++++++++++++++++++++++++++++++
>>   drivers/bus/pci/private.h        | 12 ++++++++++++
>>   3 files changed, 87 insertions(+)
>>
>> diff --git a/drivers/bus/pci/pci_common.c b/drivers/bus/pci/pci_common.c
>> index 2a00f36..fa077ec 100644
>> --- a/drivers/bus/pci/pci_common.c
>> +++ b/drivers/bus/pci/pci_common.c
>> @@ -474,6 +474,47 @@ pci_find_device(const struct rte_device *start,
>> rte_dev_cmp_t cmp,
>>   }
>>
>>   static int
>> +pci_handle_hot_unplug(struct rte_device *dev)
>> +{
>> +	struct rte_pci_device *pdev;
>> +	int ret;
>> +
>> +	if (dev == NULL)
>> +		return -EINVAL;
>> +
>> +	pdev = RTE_DEV_TO_PCI(dev);
>> +
>> +	/* remap resources for devices */
>> +	switch (pdev->kdrv) {
>> +	case RTE_KDRV_VFIO:
>> +#ifdef VFIO_PRESENT
>> +		/* TODO */
>> +#endif
> What's the difference between uio and vfio? We can just fall though?
the VFIO mapping is functional different with uio, and the key is i 
don't implement vfio hotplug right now, so let it TODO.
>> +		break;
>> +	case RTE_KDRV_IGB_UIO:
>> +	case RTE_KDRV_UIO_GENERIC:
>> +		if (rte_eal_using_phys_addrs()) {
> Why do we care about if we are using physical addresses?
please check with the mapping function.
>> +			/* map resources for devices that use uio */
>> +			ret = pci_uio_remap_resource(pdev);
>> +		}
>> +		break;
>> +	case RTE_KDRV_NIC_UIO:
>> +		ret = pci_uio_remap_resource(pdev);
>> +		break;
>> +	default:
>> +		RTE_LOG(DEBUG, EAL,
>> +			"  Not managed by a supported kernel driver, skipped\n");
>> +		ret = 1;
> -1 for such case?
thanks.
>> +		break;
>> +	}
>> +
>> +	if (ret != 0)
>> +		RTE_LOG(ERR, EAL, "failed to handle hot unplug of %s",
>> +			pdev->name);
>> +	return ret;
>> +}
>> +
>> +static int
>>   pci_plug(struct rte_device *dev)
>>   {
>>   	return pci_probe_all_drivers(RTE_DEV_TO_PCI(dev));
>> @@ -503,6 +544,7 @@ struct rte_pci_bus rte_pci_bus = {
>>   		.unplug = pci_unplug,
>>   		.parse = pci_parse,
>>   		.get_iommu_class = rte_pci_get_iommu_class,
>> +		.handle_hot_unplug = pci_handle_hot_unplug,
>>   	},
>>   	.device_list = TAILQ_HEAD_INITIALIZER(rte_pci_bus.device_list),
>>   	.driver_list = TAILQ_HEAD_INITIALIZER(rte_pci_bus.driver_list),
>> diff --git a/drivers/bus/pci/pci_common_uio.c
>> b/drivers/bus/pci/pci_common_uio.c
>> index 54bc20b..468ade4 100644
>> --- a/drivers/bus/pci/pci_common_uio.c
>> +++ b/drivers/bus/pci/pci_common_uio.c
>> @@ -146,6 +146,39 @@ pci_uio_unmap(struct mapped_pci_resource
>> *uio_res)
>>   	}
>>   }
>>
>> +/* remap the PCI resource of a PCI device in private virtual memory */
>> +int
>> +pci_uio_remap_resource(struct rte_pci_device *dev)
> Why's this function uio specific? I think we can move it to pci_common.c.
not convince because not let vfio in this patch set.
>> +{
>> +	int i;
>> +	uint64_t phaddr;
>> +	void *map_address;
>> +
>> +	if (dev == NULL)
>> +		return -1;
>> +
>> +	/* Map all BARs */
> s/Map/Remap
>   
>> +	for (i = 0; i != PCI_MAX_RESOURCE; i++) {
>> +		/* skip empty BAR */
>> +		phaddr = dev->mem_resource[i].phys_addr;
>> +		if (phaddr == 0)
>> +			continue;
> How about just simple:
>
> if (dev->mem_resource[i].phys_addr == 0)
>
>> +		pci_unmap_resource(dev->mem_resource[i].addr,
>> +				(size_t)dev->mem_resource[i].len);
>> +		map_address = pci_map_resource(
>> +				dev->mem_resource[i].addr, -1, 0,
>> +				(size_t)dev->mem_resource[i].len,
>> +				MAP_ANONYMOUS | MAP_FIXED);
>> +		if (map_address == MAP_FAILED) {
>> +			RTE_LOG(ERR, EAL,
>> +				"Cannot remap resource for device %s\n",
>> dev->name);
>> +			return -1;
>> +		}
>> +	}
>> +
>> +	return 0;
>> +}
>> +
>>   static struct mapped_pci_resource *
>>   pci_uio_find_resource(struct rte_pci_device *dev)
>>   {
>> diff --git a/drivers/bus/pci/private.h b/drivers/bus/pci/private.h
>> index 88fa587..7a862ef 100644
>> --- a/drivers/bus/pci/private.h
>> +++ b/drivers/bus/pci/private.h
>> @@ -173,6 +173,18 @@ void pci_uio_free_resource(struct rte_pci_device
>> *dev,
>>   		struct mapped_pci_resource *uio_res);
>>
>>   /**
>> + * remap the pci uio resource..
>> + *
>> + * @param dev
>> + *   Point to the struct rte pci device.
>> + * @return
>> + *   - On success, zero.
>> + *   - On failure, a negative value.
>> + */
>> +int
>> +pci_uio_remap_resource(struct rte_pci_device *dev);
> If we define
>
>> +
>> +/**
>>    * Map device memory to uio resource
>>    *
>>    * This function is private to EAL.
>> --
>> 2.7.4

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [dpdk-dev] [PATCH V19 2/4] eal: add failure handler mechanism for hot plug
  2018-04-06 10:56     ` [dpdk-dev] [PATCH V19 2/4] eal: add failure handler mechanism for hot plug Jeff Guo
@ 2018-04-06 14:03       ` Zhang, Qi Z
  2018-04-06 14:24         ` Zhang, Qi Z
  2018-04-11 11:49         ` Guo, Jia
  2018-04-09 17:42       ` Ananyev, Konstantin
  1 sibling, 2 replies; 28+ messages in thread
From: Zhang, Qi Z @ 2018-04-06 14:03 UTC (permalink / raw)
  To: Guo, Jia, stephen, Richardson, Bruce, Yigit, Ferruh, Ananyev,
	Konstantin, gaetan.rivet, Wu, Jingjing, thomas, motih,
	Van Haaren, Harry, Tan, Jianfeng
  Cc: jblunck, shreyansh.jain, dev, Guo, Jia, Zhang, Helin

Hi Jeff:

> -----Original Message-----
> From: dev [mailto:dev-bounces@dpdk.org] On Behalf Of Jeff Guo
> Sent: Friday, April 6, 2018 6:57 PM
> To: stephen@networkplumber.org; Richardson, Bruce
> <bruce.richardson@intel.com>; Yigit, Ferruh <ferruh.yigit@intel.com>;
> Ananyev, Konstantin <konstantin.ananyev@intel.com>;
> gaetan.rivet@6wind.com; Wu, Jingjing <jingjing.wu@intel.com>;
> thomas@monjalon.net; motih@mellanox.com; Van Haaren, Harry
> <harry.van.haaren@intel.com>; Tan, Jianfeng <jianfeng.tan@intel.com>
> Cc: jblunck@infradead.org; shreyansh.jain@nxp.com; dev@dpdk.org; Guo, Jia
> <jia.guo@intel.com>; Zhang, Helin <helin.zhang@intel.com>
> Subject: [dpdk-dev] [PATCH V19 2/4] eal: add failure handler mechanism for
> hot plug
> 
> This patch introduces an API (rte_dev_handle_hot_unplug) to handle device
> hot unplug event. When device be hot plug out, the device resource become
> invalid, if this resource is still be unexpected read/write, system will crash. The
> api let user register the hot unplug handler, when hot plug failure occur, the
> working thread will be block until the uevent mechanism successful recovery
> the memory and guaranty the application keep running smoothly.
> 
> Signed-off-by: Jeff Guo <jia.guo@intel.com>
> ---
> v19->18:
> add note for limitation of multiple hotplug
> ---
>  doc/guides/rel_notes/release_18_05.rst  |   6 ++
>  kernel/linux/igb_uio/igb_uio.c          |   4 +
>  lib/librte_eal/common/include/rte_dev.h |  19 +++++
>  lib/librte_eal/linuxapp/eal/eal_dev.c   | 140
> +++++++++++++++++++++++++++++++-
>  lib/librte_eal/rte_eal_version.map      |   1 +
>  5 files changed, 169 insertions(+), 1 deletion(-)
> 
> diff --git a/doc/guides/rel_notes/release_18_05.rst
> b/doc/guides/rel_notes/release_18_05.rst
> index cb9e050..2707e73 100644
> --- a/doc/guides/rel_notes/release_18_05.rst
> +++ b/doc/guides/rel_notes/release_18_05.rst
> @@ -70,6 +70,12 @@ New Features
> 
>    Linux uevent is supported as backend of this device event notification
> framework.
> 
> +* **Added hot plug failure handler.**
> +
> +  Added a failure handler machenism to handle hot unplug device.
> +
> +  * ``rte_dev_handle_hot_unplug`` for handle hot unplug device failure.
> +
>  API Changes
>  -----------
> 
> diff --git a/kernel/linux/igb_uio/igb_uio.c b/kernel/linux/igb_uio/igb_uio.c
> index 4cae4dd..293c310 100644
> --- a/kernel/linux/igb_uio/igb_uio.c
> +++ b/kernel/linux/igb_uio/igb_uio.c
> @@ -344,6 +344,10 @@ igbuio_pci_release(struct uio_info *info, struct inode
> *inode)
>  	struct rte_uio_pci_dev *udev = info->priv;
>  	struct pci_dev *dev = udev->pdev;
> 
> +	/* check if device has been remove before release */
> +	if ((&dev->dev.kobj)->state_remove_uevent_sent == 1)
> +		return -1;
> +
>  	mutex_lock(&udev->lock);
>  	if (--udev->refcnt > 0) {
>  		mutex_unlock(&udev->lock);
> diff --git a/lib/librte_eal/common/include/rte_dev.h
> b/lib/librte_eal/common/include/rte_dev.h
> index a5203e7..17c446d 100644
> --- a/lib/librte_eal/common/include/rte_dev.h
> +++ b/lib/librte_eal/common/include/rte_dev.h
> @@ -361,4 +361,23 @@ rte_dev_event_monitor_start(void);
>   */
>  int __rte_experimental
>  rte_dev_event_monitor_stop(void);
> +
> +/**
> + * @warning
> + * @b EXPERIMENTAL: this API may change without prior notice
> + *
> + * It can be used to register the device signal bus handler, and save
> +the
> + * current environment for each thread, when signal bus error invoke,
> +the
> + * handler would restore the environment by long jmp to each working
> + * thread previous locate, then block the thread to waiting until the
> +memory
> + * recovery and remapping be finished, that would guaranty the system
> +not
> + * crash when the device be hot unplug.
> + *
> + * @param none
> + * @return
> + *   - From a successful direct invocation, zero.
> + *   - From a call of siglongjmp(), non_zero.
> + */
> +int __rte_experimental
> +rte_dev_handle_hot_unplug(void);
>  #endif /* _RTE_DEV_H_ */
> diff --git a/lib/librte_eal/linuxapp/eal/eal_dev.c
> b/lib/librte_eal/linuxapp/eal/eal_dev.c
> index 9478a39..84b7efc 100644
> --- a/lib/librte_eal/linuxapp/eal/eal_dev.c
> +++ b/lib/librte_eal/linuxapp/eal/eal_dev.c
> @@ -4,6 +4,9 @@
> 
>  #include <string.h>
>  #include <unistd.h>
> +#include <signal.h>
> +#include <setjmp.h>
> +#include <pthread.h>
>  #include <sys/socket.h>
>  #include <linux/netlink.h>
> 
> @@ -13,12 +16,17 @@
>  #include <rte_malloc.h>
>  #include <rte_interrupts.h>
>  #include <rte_alarm.h>
> +#include <rte_bus.h>
> +#include <rte_per_lcore.h>
> 
>  #include "eal_private.h"
> 
>  static struct rte_intr_handle intr_handle = {.fd = -1 };  static bool
> monitor_started;
> 
> +pthread_mutex_t failure_recovery_lock;
> +pthread_cond_t failure_recovery_cond;
> +
>  #define EAL_UEV_MSG_LEN 4096
>  #define EAL_UEV_MSG_ELEM_LEN 128
> 
> @@ -32,6 +40,22 @@ enum eal_dev_event_subsystem {
>  	EAL_DEV_EVENT_SUBSYSTEM_MAX
>  };
> 
> +static RTE_DEFINE_PER_LCORE(sigjmp_buf, unplug_longjmp_env);
> +
> +static void sigbus_handler(int signum __rte_unused) {
> +	RTE_LOG(DEBUG, EAL, "receive SIGBUS error!\n");
> +	siglongjmp(RTE_PER_LCORE(unplug_longjmp_env), 1); }
> +
> +static int cmp_dev_name(const struct rte_device *dev,
> +	const void *_name)
> +{
> +	const char *name = _name;
> +
> +	return strcmp(dev->name, name);
> +}
> +
>  static int
>  dev_uev_socket_fd_create(void)
>  {
> @@ -132,6 +156,31 @@ dev_uev_parse(const char *buf, struct rte_dev_event
> *event, int length)
>  	return 0;
>  }
> 
> +static int
> +dev_uev_remove_handler(struct rte_device *dev) {
> +	struct rte_bus *bus = rte_bus_find_by_device_name(dev->name);
> +	int ret;
> +
> +	if (!dev)
> +		return -1;
> +
> +	if (bus->handle_hot_unplug) {
> +		/**
> +		 * call bus ops to handle hot unplug.
> +		 */
> +		ret = bus->handle_hot_unplug(dev);
> +		if (ret) {
> +			RTE_LOG(ERR, EAL,
> +				"It cannot handle hot unplug for device (%s) "
> +				"on the bus.\n ",
> +				dev->name);
> +			return ret;
> +		}
> +	}
> +	return 0;
> +}
> +
>  static void
>  dev_delayed_unregister(void *param)
>  {
> @@ -146,6 +195,9 @@ dev_uev_handler(__rte_unused void *param)
>  	struct rte_dev_event uevent;
>  	int ret;
>  	char buf[EAL_UEV_MSG_LEN];
> +	struct rte_bus *bus;
> +	struct rte_device *dev;
> +	const char *busname;
> 
>  	memset(&uevent, 0, sizeof(struct rte_dev_event));
>  	memset(buf, 0, EAL_UEV_MSG_LEN);
> @@ -170,11 +222,87 @@ dev_uev_handler(__rte_unused void *param)
>  	RTE_LOG(DEBUG, EAL, "receive uevent(name:%s, type:%d,
> subsystem:%d)\n",
>  		uevent.devname, uevent.type, uevent.subsystem);
> 
> -	if (uevent.devname)
> +	switch (uevent.subsystem) {
> +	case EAL_DEV_EVENT_SUBSYSTEM_PCI:
> +	case EAL_DEV_EVENT_SUBSYSTEM_UIO:
> +		busname = "pci";
> +		break;
> +	default:
> +		break;
> +	}
> +
> +	if (uevent.devname) {
> +		if (uevent.type == RTE_DEV_EVENT_REMOVE) {
> +			bus = rte_bus_find_by_name(busname);
> +			if (bus == NULL) {
> +				RTE_LOG(ERR, EAL, "Cannot find bus (%s)\n",
> +					uevent.devname);
> +				return;
> +			}
> +			dev = bus->find_device(NULL, cmp_dev_name,
> +					       uevent.devname);
> +			if (dev == NULL) {
> +				RTE_LOG(ERR, EAL,
> +					"Cannot find unplugged device (%s)\n",
> +					uevent.devname);
> +				return;
> +			}
> +			ret = dev_uev_remove_handler(dev);
> +			if (ret) {
> +				RTE_LOG(ERR, EAL, "Driver cannot remap the "
> +					"device (%s)\n",
> +					dev->name);
> +				return;
> +			}
> +			/* wake up all the threads */
> +			pthread_cond_broadcast(&failure_recovery_cond);
> +		}
>  		dev_callback_process(uevent.devname, uevent.type);
> +	}
>  }
> 
>  int __rte_experimental
> +rte_dev_handle_hot_unplug(void)
> +{
> +	struct sigaction act;
> +	sigset_t mask;
> +	int ret = 0;
> +
> +	/* set signal handlers */
> +	memset(&act, 0x00, sizeof(struct sigaction));
> +	act.sa_handler = sigbus_handler;
> +	sigemptyset(&act.sa_mask);
> +	act.sa_flags = SA_RESTART;
> +	sigaction(SIGBUS, &act, NULL);
> +	sigemptyset(&mask);
> +	sigaddset(&mask, SIGBUS);
> +	pthread_sigmask(SIG_UNBLOCK, &mask, NULL);
> +
> +	ret = sigsetjmp(RTE_PER_LCORE(unplug_longjmp_env), 1);
> +	if (ret) {
> +		/*
> +		 * Waitting for condition variable before failure recovery
> +		 * finish. Now the limitation is only handle one device
> +		 * hot plug, for multiple devices hotplug, need check if
> +		 * the device belong to this working thread, then directly
> +		 * call memory remaping, unrelated thread just keep going
> +		 * their work by no interrupt from hotplug.
> +		 * TODO: multiple device hotplug
> +		 */
> +		pthread_mutex_lock(&failure_recovery_lock);
> +		RTE_LOG(DEBUG, EAL, "begin waiting for the failure handler.\n");
> +		pthread_cond_wait(&failure_recovery_cond,
> +					&failure_recovery_lock);
> +		RTE_LOG(DEBUG, EAL,
> +		       "come back from waiting for failure handler.\n");
> +		pthread_mutex_unlock(&failure_recovery_lock);

I think we should not assume phread_cond_wait always happen before pthread_cond_broadcast,
It is possible Sigbus just happen before remap in udev remove handler while pthread_cond_wait happens
after pthread_cond_broadcast, then we will wait forever.

I think we need a flag to sync
For example:

pthread_mutex_lock(&failure_recovery_lock);
if ( udev_remove_handle == 0 )
	pthread_cond_wait(&failure_recovery_cond, & failure_recovery_lock);
pthread_remove_handle = 0;
pthread_mutex_unlock(&failure_recovery_lock);

while at remove handler:
pthread_mutex_lock(&failure_recovery_lock);
pthread_remove_handle = 1;
pthread_cond_signel(&failure_recovery_cond);
pthread_mutex_unlock(&failure_recovery_lock);


Regards
Qi

> +	}
> +
> +	return ret;
> +}
> +
> +
> +int __rte_experimental
>  rte_dev_event_monitor_start(void)
>  {
>  	int ret;
> @@ -196,6 +324,12 @@ rte_dev_event_monitor_start(void)
>  		return -1;
>  	}
> 
> +	/* initialize mutex and condition variable
> +	 * to control failure recovery.
> +	 */
> +	pthread_mutex_init(&failure_recovery_lock, NULL);
> +	pthread_cond_init(&failure_recovery_cond, NULL);
> +
>  	monitor_started = true;
> 
>  	return 0;
> @@ -219,5 +353,9 @@ rte_dev_event_monitor_stop(void)
>  	close(intr_handle.fd);
>  	intr_handle.fd = -1;
>  	monitor_started = false;
> +
> +	pthread_cond_destroy(&failure_recovery_cond);
> +	pthread_mutex_destroy(&failure_recovery_lock);
> +
>  	return 0;
>  }
> diff --git a/lib/librte_eal/rte_eal_version.map
> b/lib/librte_eal/rte_eal_version.map
> index fc5c62a..873ef38 100644
> --- a/lib/librte_eal/rte_eal_version.map
> +++ b/lib/librte_eal/rte_eal_version.map
> @@ -262,5 +262,6 @@ EXPERIMENTAL {
>  	rte_dev_event_monitor_stop;
>  	rte_dev_event_callback_register;
>  	rte_dev_event_callback_unregister;
> +	rte_dev_handle_hot_unplug;
> 
>  } DPDK_18.02;
> --
> 2.7.4

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [dpdk-dev] [PATCH V19 2/4] eal: add failure handler mechanism for hot plug
  2018-04-06 14:03       ` Zhang, Qi Z
@ 2018-04-06 14:24         ` Zhang, Qi Z
  2018-04-11 11:50           ` Guo, Jia
  2018-04-11 11:49         ` Guo, Jia
  1 sibling, 1 reply; 28+ messages in thread
From: Zhang, Qi Z @ 2018-04-06 14:24 UTC (permalink / raw)
  To: Zhang, Qi Z, Guo, Jia, stephen, Richardson, Bruce, Yigit, Ferruh,
	Ananyev, Konstantin, gaetan.rivet, Wu, Jingjing, thomas, motih,
	Van Haaren, Harry, Tan, Jianfeng
  Cc: jblunck, shreyansh.jain, dev, Guo, Jia, Zhang, Helin

One more comment

> -----Original Message-----
> From: dev [mailto:dev-bounces@dpdk.org] On Behalf Of Zhang, Qi Z
> Sent: Friday, April 6, 2018 10:04 PM
> To: Guo, Jia <jia.guo@intel.com>; stephen@networkplumber.org; Richardson,
> Bruce <bruce.richardson@intel.com>; Yigit, Ferruh <ferruh.yigit@intel.com>;
> Ananyev, Konstantin <konstantin.ananyev@intel.com>;
> gaetan.rivet@6wind.com; Wu, Jingjing <jingjing.wu@intel.com>;
> thomas@monjalon.net; motih@mellanox.com; Van Haaren, Harry
> <harry.van.haaren@intel.com>; Tan, Jianfeng <jianfeng.tan@intel.com>
> Cc: jblunck@infradead.org; shreyansh.jain@nxp.com; dev@dpdk.org; Guo, Jia
> <jia.guo@intel.com>; Zhang, Helin <helin.zhang@intel.com>
> Subject: Re: [dpdk-dev] [PATCH V19 2/4] eal: add failure handler mechanism
> for hot plug
> 
> Hi Jeff:
> 
> > -----Original Message-----
> > From: dev [mailto:dev-bounces@dpdk.org] On Behalf Of Jeff Guo
> > Sent: Friday, April 6, 2018 6:57 PM
> > To: stephen@networkplumber.org; Richardson, Bruce
> > <bruce.richardson@intel.com>; Yigit, Ferruh <ferruh.yigit@intel.com>;
> > Ananyev, Konstantin <konstantin.ananyev@intel.com>;
> > gaetan.rivet@6wind.com; Wu, Jingjing <jingjing.wu@intel.com>;
> > thomas@monjalon.net; motih@mellanox.com; Van Haaren, Harry
> > <harry.van.haaren@intel.com>; Tan, Jianfeng <jianfeng.tan@intel.com>
> > Cc: jblunck@infradead.org; shreyansh.jain@nxp.com; dev@dpdk.org; Guo,
> > Jia <jia.guo@intel.com>; Zhang, Helin <helin.zhang@intel.com>
> > Subject: [dpdk-dev] [PATCH V19 2/4] eal: add failure handler mechanism
> > for hot plug
> >
> > This patch introduces an API (rte_dev_handle_hot_unplug) to handle
> > device hot unplug event. When device be hot plug out, the device
> > resource become invalid, if this resource is still be unexpected
> > read/write, system will crash. The api let user register the hot
> > unplug handler, 

The description is a little bit misleading , based on current implementation.
it does not a function that let user "register a handler", actually It let user to
"set a recover point" when some exception (like sigbus) happen due to hot unplug.
Maybe the function name could be changed also, 
for example: rte_dev_set_recover_point.

> when hot plug failure occur, the working thread will
> > be block until the uevent mechanism successful recovery the memory and
> guaranty the application keep running smoothly.
> >
> > Signed-off-by: Jeff Guo <jia.guo@intel.com>
> > ---
> > v19->18:
> > add note for limitation of multiple hotplug
> > ---
> >  doc/guides/rel_notes/release_18_05.rst  |   6 ++
> >  kernel/linux/igb_uio/igb_uio.c          |   4 +
> >  lib/librte_eal/common/include/rte_dev.h |  19 +++++
> >  lib/librte_eal/linuxapp/eal/eal_dev.c   | 140
> > +++++++++++++++++++++++++++++++-
> >  lib/librte_eal/rte_eal_version.map      |   1 +
> >  5 files changed, 169 insertions(+), 1 deletion(-)
> >
> > diff --git a/doc/guides/rel_notes/release_18_05.rst
> > b/doc/guides/rel_notes/release_18_05.rst
> > index cb9e050..2707e73 100644
> > --- a/doc/guides/rel_notes/release_18_05.rst
> > +++ b/doc/guides/rel_notes/release_18_05.rst
> > @@ -70,6 +70,12 @@ New Features
> >
> >    Linux uevent is supported as backend of this device event
> > notification framework.
> >
> > +* **Added hot plug failure handler.**
> > +
> > +  Added a failure handler machenism to handle hot unplug device.
> > +
> > +  * ``rte_dev_handle_hot_unplug`` for handle hot unplug device failure.
> > +
> >  API Changes
> >  -----------
> >
> > diff --git a/kernel/linux/igb_uio/igb_uio.c
> > b/kernel/linux/igb_uio/igb_uio.c index 4cae4dd..293c310 100644
> > --- a/kernel/linux/igb_uio/igb_uio.c
> > +++ b/kernel/linux/igb_uio/igb_uio.c
> > @@ -344,6 +344,10 @@ igbuio_pci_release(struct uio_info *info, struct
> > inode
> > *inode)
> >  	struct rte_uio_pci_dev *udev = info->priv;
> >  	struct pci_dev *dev = udev->pdev;
> >
> > +	/* check if device has been remove before release */
> > +	if ((&dev->dev.kobj)->state_remove_uevent_sent == 1)
> > +		return -1;
> > +
> >  	mutex_lock(&udev->lock);
> >  	if (--udev->refcnt > 0) {
> >  		mutex_unlock(&udev->lock);
> > diff --git a/lib/librte_eal/common/include/rte_dev.h
> > b/lib/librte_eal/common/include/rte_dev.h
> > index a5203e7..17c446d 100644
> > --- a/lib/librte_eal/common/include/rte_dev.h
> > +++ b/lib/librte_eal/common/include/rte_dev.h
> > @@ -361,4 +361,23 @@ rte_dev_event_monitor_start(void);
> >   */
> >  int __rte_experimental
> >  rte_dev_event_monitor_stop(void);
> > +
> > +/**
> > + * @warning
> > + * @b EXPERIMENTAL: this API may change without prior notice
> > + *
> > + * It can be used to register the device signal bus handler, and save
> > +the
> > + * current environment for each thread, when signal bus error invoke,
> > +the
> > + * handler would restore the environment by long jmp to each working
> > + * thread previous locate, then block the thread to waiting until the
> > +memory
> > + * recovery and remapping be finished, that would guaranty the system
> > +not
> > + * crash when the device be hot unplug.
> > + *
> > + * @param none
> > + * @return
> > + *   - From a successful direct invocation, zero.
> > + *   - From a call of siglongjmp(), non_zero.
> > + */
> > +int __rte_experimental
> > +rte_dev_handle_hot_unplug(void);
> >  #endif /* _RTE_DEV_H_ */
> > diff --git a/lib/librte_eal/linuxapp/eal/eal_dev.c
> > b/lib/librte_eal/linuxapp/eal/eal_dev.c
> > index 9478a39..84b7efc 100644
> > --- a/lib/librte_eal/linuxapp/eal/eal_dev.c
> > +++ b/lib/librte_eal/linuxapp/eal/eal_dev.c
> > @@ -4,6 +4,9 @@
> >
> >  #include <string.h>
> >  #include <unistd.h>
> > +#include <signal.h>
> > +#include <setjmp.h>
> > +#include <pthread.h>
> >  #include <sys/socket.h>
> >  #include <linux/netlink.h>
> >
> > @@ -13,12 +16,17 @@
> >  #include <rte_malloc.h>
> >  #include <rte_interrupts.h>
> >  #include <rte_alarm.h>
> > +#include <rte_bus.h>
> > +#include <rte_per_lcore.h>
> >
> >  #include "eal_private.h"
> >
> >  static struct rte_intr_handle intr_handle = {.fd = -1 };  static bool
> > monitor_started;
> >
> > +pthread_mutex_t failure_recovery_lock; pthread_cond_t
> > +failure_recovery_cond;
> > +
> >  #define EAL_UEV_MSG_LEN 4096
> >  #define EAL_UEV_MSG_ELEM_LEN 128
> >
> > @@ -32,6 +40,22 @@ enum eal_dev_event_subsystem {
> >  	EAL_DEV_EVENT_SUBSYSTEM_MAX
> >  };
> >
> > +static RTE_DEFINE_PER_LCORE(sigjmp_buf, unplug_longjmp_env);
> > +
> > +static void sigbus_handler(int signum __rte_unused) {
> > +	RTE_LOG(DEBUG, EAL, "receive SIGBUS error!\n");
> > +	siglongjmp(RTE_PER_LCORE(unplug_longjmp_env), 1); }
> > +
> > +static int cmp_dev_name(const struct rte_device *dev,
> > +	const void *_name)
> > +{
> > +	const char *name = _name;
> > +
> > +	return strcmp(dev->name, name);
> > +}
> > +
> >  static int
> >  dev_uev_socket_fd_create(void)
> >  {
> > @@ -132,6 +156,31 @@ dev_uev_parse(const char *buf, struct
> > rte_dev_event *event, int length)
> >  	return 0;
> >  }
> >
> > +static int
> > +dev_uev_remove_handler(struct rte_device *dev) {
> > +	struct rte_bus *bus = rte_bus_find_by_device_name(dev->name);
> > +	int ret;
> > +
> > +	if (!dev)
> > +		return -1;
> > +
> > +	if (bus->handle_hot_unplug) {
> > +		/**
> > +		 * call bus ops to handle hot unplug.
> > +		 */
> > +		ret = bus->handle_hot_unplug(dev);
> > +		if (ret) {
> > +			RTE_LOG(ERR, EAL,
> > +				"It cannot handle hot unplug for device (%s) "
> > +				"on the bus.\n ",
> > +				dev->name);
> > +			return ret;
> > +		}
> > +	}
> > +	return 0;
> > +}
> > +
> >  static void
> >  dev_delayed_unregister(void *param)
> >  {
> > @@ -146,6 +195,9 @@ dev_uev_handler(__rte_unused void *param)
> >  	struct rte_dev_event uevent;
> >  	int ret;
> >  	char buf[EAL_UEV_MSG_LEN];
> > +	struct rte_bus *bus;
> > +	struct rte_device *dev;
> > +	const char *busname;
> >
> >  	memset(&uevent, 0, sizeof(struct rte_dev_event));
> >  	memset(buf, 0, EAL_UEV_MSG_LEN);
> > @@ -170,11 +222,87 @@ dev_uev_handler(__rte_unused void *param)
> >  	RTE_LOG(DEBUG, EAL, "receive uevent(name:%s, type:%d,
> > subsystem:%d)\n",
> >  		uevent.devname, uevent.type, uevent.subsystem);
> >
> > -	if (uevent.devname)
> > +	switch (uevent.subsystem) {
> > +	case EAL_DEV_EVENT_SUBSYSTEM_PCI:
> > +	case EAL_DEV_EVENT_SUBSYSTEM_UIO:
> > +		busname = "pci";
> > +		break;
> > +	default:
> > +		break;
> > +	}
> > +
> > +	if (uevent.devname) {
> > +		if (uevent.type == RTE_DEV_EVENT_REMOVE) {
> > +			bus = rte_bus_find_by_name(busname);
> > +			if (bus == NULL) {
> > +				RTE_LOG(ERR, EAL, "Cannot find bus (%s)\n",
> > +					uevent.devname);
> > +				return;
> > +			}
> > +			dev = bus->find_device(NULL, cmp_dev_name,
> > +					       uevent.devname);
> > +			if (dev == NULL) {
> > +				RTE_LOG(ERR, EAL,
> > +					"Cannot find unplugged device (%s)\n",
> > +					uevent.devname);
> > +				return;
> > +			}
> > +			ret = dev_uev_remove_handler(dev);
> > +			if (ret) {
> > +				RTE_LOG(ERR, EAL, "Driver cannot remap the "
> > +					"device (%s)\n",
> > +					dev->name);
> > +				return;
> > +			}
> > +			/* wake up all the threads */
> > +			pthread_cond_broadcast(&failure_recovery_cond);
> > +		}
> >  		dev_callback_process(uevent.devname, uevent.type);
> > +	}
> >  }
> >
> >  int __rte_experimental
> > +rte_dev_handle_hot_unplug(void)
> > +{
> > +	struct sigaction act;
> > +	sigset_t mask;
> > +	int ret = 0;
> > +
> > +	/* set signal handlers */
> > +	memset(&act, 0x00, sizeof(struct sigaction));
> > +	act.sa_handler = sigbus_handler;
> > +	sigemptyset(&act.sa_mask);
> > +	act.sa_flags = SA_RESTART;
> > +	sigaction(SIGBUS, &act, NULL);
> > +	sigemptyset(&mask);
> > +	sigaddset(&mask, SIGBUS);
> > +	pthread_sigmask(SIG_UNBLOCK, &mask, NULL);
> > +
> > +	ret = sigsetjmp(RTE_PER_LCORE(unplug_longjmp_env), 1);
> > +	if (ret) {
> > +		/*
> > +		 * Waitting for condition variable before failure recovery
> > +		 * finish. Now the limitation is only handle one device
> > +		 * hot plug, for multiple devices hotplug, need check if
> > +		 * the device belong to this working thread, then directly
> > +		 * call memory remaping, unrelated thread just keep going
> > +		 * their work by no interrupt from hotplug.
> > +		 * TODO: multiple device hotplug
> > +		 */
> > +		pthread_mutex_lock(&failure_recovery_lock);
> > +		RTE_LOG(DEBUG, EAL, "begin waiting for the failure handler.\n");
> > +		pthread_cond_wait(&failure_recovery_cond,
> > +					&failure_recovery_lock);
> > +		RTE_LOG(DEBUG, EAL,
> > +		       "come back from waiting for failure handler.\n");
> > +		pthread_mutex_unlock(&failure_recovery_lock);
> 
> I think we should not assume phread_cond_wait always happen before
> pthread_cond_broadcast, It is possible Sigbus just happen before remap in
> udev remove handler while pthread_cond_wait happens after
> pthread_cond_broadcast, then we will wait forever.
> 
> I think we need a flag to sync
> For example:
> 
> pthread_mutex_lock(&failure_recovery_lock);
> if ( udev_remove_handle == 0 )
> 	pthread_cond_wait(&failure_recovery_cond, & failure_recovery_lock);
> pthread_remove_handle = 0; pthread_mutex_unlock(&failure_recovery_lock);
> 
> while at remove handler:
> pthread_mutex_lock(&failure_recovery_lock);
> pthread_remove_handle = 1;
> pthread_cond_signel(&failure_recovery_cond);
> pthread_mutex_unlock(&failure_recovery_lock);
> 
> 
> Regards
> Qi
> 
> > +	}
> > +
> > +	return ret;
> > +}
> > +
> > +
> > +int __rte_experimental
> >  rte_dev_event_monitor_start(void)
> >  {
> >  	int ret;
> > @@ -196,6 +324,12 @@ rte_dev_event_monitor_start(void)
> >  		return -1;
> >  	}
> >
> > +	/* initialize mutex and condition variable
> > +	 * to control failure recovery.
> > +	 */
> > +	pthread_mutex_init(&failure_recovery_lock, NULL);
> > +	pthread_cond_init(&failure_recovery_cond, NULL);
> > +
> >  	monitor_started = true;
> >
> >  	return 0;
> > @@ -219,5 +353,9 @@ rte_dev_event_monitor_stop(void)
> >  	close(intr_handle.fd);
> >  	intr_handle.fd = -1;
> >  	monitor_started = false;
> > +
> > +	pthread_cond_destroy(&failure_recovery_cond);
> > +	pthread_mutex_destroy(&failure_recovery_lock);
> > +
> >  	return 0;
> >  }
> > diff --git a/lib/librte_eal/rte_eal_version.map
> > b/lib/librte_eal/rte_eal_version.map
> > index fc5c62a..873ef38 100644
> > --- a/lib/librte_eal/rte_eal_version.map
> > +++ b/lib/librte_eal/rte_eal_version.map
> > @@ -262,5 +262,6 @@ EXPERIMENTAL {
> >  	rte_dev_event_monitor_stop;
> >  	rte_dev_event_callback_register;
> >  	rte_dev_event_callback_unregister;
> > +	rte_dev_handle_hot_unplug;
> >
> >  } DPDK_18.02;
> > --
> > 2.7.4

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [dpdk-dev] [PATCH V19 2/4] eal: add failure handler mechanism for hot plug
  2018-04-06 10:56     ` [dpdk-dev] [PATCH V19 2/4] eal: add failure handler mechanism for hot plug Jeff Guo
  2018-04-06 14:03       ` Zhang, Qi Z
@ 2018-04-09 17:42       ` Ananyev, Konstantin
  2018-04-11 11:34         ` Guo, Jia
  1 sibling, 1 reply; 28+ messages in thread
From: Ananyev, Konstantin @ 2018-04-09 17:42 UTC (permalink / raw)
  To: Guo, Jia, stephen, Richardson, Bruce, Yigit, Ferruh,
	gaetan.rivet, Wu, Jingjing, thomas, motih, Van Haaren, Harry,
	Tan, Jianfeng
  Cc: jblunck, shreyansh.jain, dev, Zhang, Helin

Hi Jeff,

> 
> This patch introduces an API (rte_dev_handle_hot_unplug) to handle device
> hot unplug event. When device be hot plug out, the device resource
> become invalid, if this resource is still be unexpected read/write,
> system will crash. The api let user register the hot unplug handler, when
> hot plug failure occur, the working thread will be block until the uevent
> mechanism successful recovery the memory and guaranty the application keep
> running smoothly.
> 
> Signed-off-by: Jeff Guo <jia.guo@intel.com>
> ---
> v19->18:
> add note for limitation of multiple hotplug
> ---
>  doc/guides/rel_notes/release_18_05.rst  |   6 ++
>  kernel/linux/igb_uio/igb_uio.c          |   4 +
>  lib/librte_eal/common/include/rte_dev.h |  19 +++++
>  lib/librte_eal/linuxapp/eal/eal_dev.c   | 140 +++++++++++++++++++++++++++++++-
>  lib/librte_eal/rte_eal_version.map      |   1 +
>  5 files changed, 169 insertions(+), 1 deletion(-)
> 
> diff --git a/doc/guides/rel_notes/release_18_05.rst b/doc/guides/rel_notes/release_18_05.rst
> index cb9e050..2707e73 100644
> --- a/doc/guides/rel_notes/release_18_05.rst
> +++ b/doc/guides/rel_notes/release_18_05.rst
> @@ -70,6 +70,12 @@ New Features
> 
>    Linux uevent is supported as backend of this device event notification framework.
> 
> +* **Added hot plug failure handler.**
> +
> +  Added a failure handler machenism to handle hot unplug device.
> +
> +  * ``rte_dev_handle_hot_unplug`` for handle hot unplug device failure.
> +
>  API Changes
>  -----------
> 
> diff --git a/kernel/linux/igb_uio/igb_uio.c b/kernel/linux/igb_uio/igb_uio.c
> index 4cae4dd..293c310 100644
> --- a/kernel/linux/igb_uio/igb_uio.c
> +++ b/kernel/linux/igb_uio/igb_uio.c
> @@ -344,6 +344,10 @@ igbuio_pci_release(struct uio_info *info, struct inode *inode)
>  	struct rte_uio_pci_dev *udev = info->priv;
>  	struct pci_dev *dev = udev->pdev;
> 
> +	/* check if device has been remove before release */
> +	if ((&dev->dev.kobj)->state_remove_uevent_sent == 1)
> +		return -1;
> +
>  	mutex_lock(&udev->lock);
>  	if (--udev->refcnt > 0) {
>  		mutex_unlock(&udev->lock);
> diff --git a/lib/librte_eal/common/include/rte_dev.h b/lib/librte_eal/common/include/rte_dev.h
> index a5203e7..17c446d 100644
> --- a/lib/librte_eal/common/include/rte_dev.h
> +++ b/lib/librte_eal/common/include/rte_dev.h
> @@ -361,4 +361,23 @@ rte_dev_event_monitor_start(void);
>   */
>  int __rte_experimental
>  rte_dev_event_monitor_stop(void);
> +
> +/**
> + * @warning
> + * @b EXPERIMENTAL: this API may change without prior notice
> + *
> + * It can be used to register the device signal bus handler, and save the
> + * current environment for each thread, when signal bus error invoke, the
> + * handler would restore the environment by long jmp to each working
> + * thread previous locate, then block the thread to waiting until the memory
> + * recovery and remapping be finished, that would guaranty the system not
> + * crash when the device be hot unplug.
> + *
> + * @param none
> + * @return
> + *   - From a successful direct invocation, zero.
> + *   - From a call of siglongjmp(), non_zero.
> + */
> +int __rte_experimental
> +rte_dev_handle_hot_unplug(void);
>  #endif /* _RTE_DEV_H_ */
> diff --git a/lib/librte_eal/linuxapp/eal/eal_dev.c b/lib/librte_eal/linuxapp/eal/eal_dev.c
> index 9478a39..84b7efc 100644
> --- a/lib/librte_eal/linuxapp/eal/eal_dev.c
> +++ b/lib/librte_eal/linuxapp/eal/eal_dev.c
> @@ -4,6 +4,9 @@
> 
>  #include <string.h>
>  #include <unistd.h>
> +#include <signal.h>
> +#include <setjmp.h>
> +#include <pthread.h>
>  #include <sys/socket.h>
>  #include <linux/netlink.h>
> 
> @@ -13,12 +16,17 @@
>  #include <rte_malloc.h>
>  #include <rte_interrupts.h>
>  #include <rte_alarm.h>
> +#include <rte_bus.h>
> +#include <rte_per_lcore.h>
> 
>  #include "eal_private.h"
> 
>  static struct rte_intr_handle intr_handle = {.fd = -1 };
>  static bool monitor_started;
> 
> +pthread_mutex_t failure_recovery_lock;
> +pthread_cond_t failure_recovery_cond;
> +
>  #define EAL_UEV_MSG_LEN 4096
>  #define EAL_UEV_MSG_ELEM_LEN 128
> 
> @@ -32,6 +40,22 @@ enum eal_dev_event_subsystem {
>  	EAL_DEV_EVENT_SUBSYSTEM_MAX
>  };
> 
> +static RTE_DEFINE_PER_LCORE(sigjmp_buf, unplug_longjmp_env);
> +
> +static void sigbus_handler(int signum __rte_unused)
> +{
> +	RTE_LOG(DEBUG, EAL, "receive SIGBUS error!\n");
> +	siglongjmp(RTE_PER_LCORE(unplug_longjmp_env), 1);
> +}
> +
> +static int cmp_dev_name(const struct rte_device *dev,
> +	const void *_name)
> +{
> +	const char *name = _name;
> +
> +	return strcmp(dev->name, name);
> +}
> +
>  static int
>  dev_uev_socket_fd_create(void)
>  {
> @@ -132,6 +156,31 @@ dev_uev_parse(const char *buf, struct rte_dev_event *event, int length)
>  	return 0;
>  }
> 
> +static int
> +dev_uev_remove_handler(struct rte_device *dev)
> +{
> +	struct rte_bus *bus = rte_bus_find_by_device_name(dev->name);
> +	int ret;
> +
> +	if (!dev)
> +		return -1;
> +
> +	if (bus->handle_hot_unplug) {
> +		/**
> +		 * call bus ops to handle hot unplug.
> +		 */
> +		ret = bus->handle_hot_unplug(dev);
> +		if (ret) {
> +			RTE_LOG(ERR, EAL,
> +				"It cannot handle hot unplug for device (%s) "
> +				"on the bus.\n ",
> +				dev->name);
> +			return ret;
> +		}
> +	}
> +	return 0;
> +}
> +
>  static void
>  dev_delayed_unregister(void *param)
>  {
> @@ -146,6 +195,9 @@ dev_uev_handler(__rte_unused void *param)
>  	struct rte_dev_event uevent;
>  	int ret;
>  	char buf[EAL_UEV_MSG_LEN];
> +	struct rte_bus *bus;
> +	struct rte_device *dev;
> +	const char *busname;
> 
>  	memset(&uevent, 0, sizeof(struct rte_dev_event));
>  	memset(buf, 0, EAL_UEV_MSG_LEN);
> @@ -170,11 +222,87 @@ dev_uev_handler(__rte_unused void *param)
>  	RTE_LOG(DEBUG, EAL, "receive uevent(name:%s, type:%d, subsystem:%d)\n",
>  		uevent.devname, uevent.type, uevent.subsystem);
> 
> -	if (uevent.devname)
> +	switch (uevent.subsystem) {
> +	case EAL_DEV_EVENT_SUBSYSTEM_PCI:
> +	case EAL_DEV_EVENT_SUBSYSTEM_UIO:
> +		busname = "pci";
> +		break;
> +	default:
> +		break;
> +	}
> +
> +	if (uevent.devname) {
> +		if (uevent.type == RTE_DEV_EVENT_REMOVE) {
> +			bus = rte_bus_find_by_name(busname);
> +			if (bus == NULL) {
> +				RTE_LOG(ERR, EAL, "Cannot find bus (%s)\n",
> +					uevent.devname);
> +				return;
> +			}
> +			dev = bus->find_device(NULL, cmp_dev_name,
> +					       uevent.devname);
> +			if (dev == NULL) {
> +				RTE_LOG(ERR, EAL,
> +					"Cannot find unplugged device (%s)\n",
> +					uevent.devname);
> +				return;
> +			}
> +			ret = dev_uev_remove_handler(dev);
> +			if (ret) {
> +				RTE_LOG(ERR, EAL, "Driver cannot remap the "
> +					"device (%s)\n",
> +					dev->name);
> +				return;
> +			}
> +			/* wake up all the threads */
> +			pthread_cond_broadcast(&failure_recovery_cond);
> +		}
>  		dev_callback_process(uevent.devname, uevent.type);
> +	}
>  }
> 
>  int __rte_experimental
> +rte_dev_handle_hot_unplug(void)
> +{
> +	struct sigaction act;
> +	sigset_t mask;
> +	int ret = 0;
> +
> +	/* set signal handlers */
> +	memset(&act, 0x00, sizeof(struct sigaction));
> +	act.sa_handler = sigbus_handler;
> +	sigemptyset(&act.sa_mask);
> +	act.sa_flags = SA_RESTART;
> +	sigaction(SIGBUS, &act, NULL);
> +	sigemptyset(&mask);
> +	sigaddset(&mask, SIGBUS);
> +	pthread_sigmask(SIG_UNBLOCK, &mask, NULL);

Ok, but it is really hard to predict which thread will get a sigbus,
even a control thread could be affected just by reading HW stats.
Can we make it a unified one, per process?


> +
> +	ret = sigsetjmp(RTE_PER_LCORE(unplug_longjmp_env), 1);

Hmm... and why we have to do that?
Why we can't call recover procedure (as I understand all it will do - 
mmap our unplugged dev VA to some dummy memory space) from sighandler itself,
and then just make thread to resume?


> +	if (ret) {
> +		/*
> +		 * Waitting for condition variable before failure recovery
> +		 * finish. Now the limitation is only handle one device
> +		 * hot plug, for multiple devices hotplug, need check if
> +		 * the device belong to this working thread, then directly
> +		 * call memory remaping, unrelated thread just keep going
> +		 * their work by no interrupt from hotplug.
> +		 * TODO: multiple device hotplug
> +		 */
> +		pthread_mutex_lock(&failure_recovery_lock);
> +		RTE_LOG(DEBUG, EAL, "begin waiting for the failure handler.\n");
> +		pthread_cond_wait(&failure_recovery_cond,
> +					&failure_recovery_lock);
> +		RTE_LOG(DEBUG, EAL,
> +		       "come back from waiting for failure handler.\n");
> +		pthread_mutex_unlock(&failure_recovery_lock);

That's a strange one - how do you know that you received SIGBUS because of HW unplug?
Might be some buggy code is just user tried to access some mmaped file beyond it's boundaries,
or so?
In that case your signal handler would be hanged forever.
Second thing, how do you know that dev_uev_handler() has fixed that particular device unplugging?
I think you have first figure out by fault address - is that one of our device addresses,
and if yes - identify which particular.
Then  do actual remap().
BTW, once again why we need to wipe current prog state with longjmp()?
    
Konstantin

> +	}
> +
> +	return ret;
> +}
> +
> +
> +int __rte_experimental
>  rte_dev_event_monitor_start(void)
>  {
>  	int ret;
> @@ -196,6 +324,12 @@ rte_dev_event_monitor_start(void)
>  		return -1;
>  	}
> 
> +	/* initialize mutex and condition variable
> +	 * to control failure recovery.
> +	 */
> +	pthread_mutex_init(&failure_recovery_lock, NULL);
> +	pthread_cond_init(&failure_recovery_cond, NULL);
> +
>  	monitor_started = true;
> 
>  	return 0;
> @@ -219,5 +353,9 @@ rte_dev_event_monitor_stop(void)
>  	close(intr_handle.fd);
>  	intr_handle.fd = -1;
>  	monitor_started = false;
> +
> +	pthread_cond_destroy(&failure_recovery_cond);
> +	pthread_mutex_destroy(&failure_recovery_lock);
> +
>  	return 0;
>  }
> diff --git a/lib/librte_eal/rte_eal_version.map b/lib/librte_eal/rte_eal_version.map
> index fc5c62a..873ef38 100644
> --- a/lib/librte_eal/rte_eal_version.map
> +++ b/lib/librte_eal/rte_eal_version.map
> @@ -262,5 +262,6 @@ EXPERIMENTAL {
>  	rte_dev_event_monitor_stop;
>  	rte_dev_event_callback_register;
>  	rte_dev_event_callback_unregister;
> +	rte_dev_handle_hot_unplug;
> 
>  } DPDK_18.02;
> --
> 2.7.4

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [dpdk-dev] [PATCH V19 1/4] bus/pci: introduce device hot unplug handle
  2018-04-06 10:56     ` [dpdk-dev] [PATCH V19 1/4] bus/pci: introduce device hot unplug handle Jeff Guo
@ 2018-04-09 17:47       ` Ananyev, Konstantin
  2018-04-11 11:37         ` Guo, Jia
  0 siblings, 1 reply; 28+ messages in thread
From: Ananyev, Konstantin @ 2018-04-09 17:47 UTC (permalink / raw)
  To: Guo, Jia, stephen, Richardson, Bruce, Yigit, Ferruh,
	gaetan.rivet, Wu, Jingjing, thomas, motih, Van Haaren, Harry,
	Tan, Jianfeng
  Cc: jblunck, shreyansh.jain, dev, Zhang, Helin



> -----Original Message-----
> From: Guo, Jia
> Sent: Friday, April 6, 2018 11:57 AM
> To: stephen@networkplumber.org; Richardson, Bruce <bruce.richardson@intel.com>; Yigit, Ferruh <ferruh.yigit@intel.com>; Ananyev,
> Konstantin <konstantin.ananyev@intel.com>; gaetan.rivet@6wind.com; Wu, Jingjing <jingjing.wu@intel.com>; thomas@monjalon.net;
> motih@mellanox.com; Van Haaren, Harry <harry.van.haaren@intel.com>; Tan, Jianfeng <jianfeng.tan@intel.com>
> Cc: jblunck@infradead.org; shreyansh.jain@nxp.com; dev@dpdk.org; Guo, Jia <jia.guo@intel.com>; Zhang, Helin <helin.zhang@intel.com>
> Subject: [PATCH V19 1/4] bus/pci: introduce device hot unplug handle
> 
> As of device hot unplug, we need some preparatory measures so that we will
> not encounter memory fault after device be plug out of the system,
> and also let we could recover the running data path but not been break.
> This allows the buses to handle device hot unplug event.
> The patch only enable the ops in pci bus, when handle device hot unplug
> event, remap a dummy memory to avoid bus read/write error.
> Other buses could accordingly implement this ops specific by themselves.
> 
> Signed-off-by: Jeff Guo <jia.guo@intel.com>
> ---
> v19->v18:
> fix some typo and squeeze patch
> ---
>  drivers/bus/pci/pci_common.c            | 42 +++++++++++++++++++++++++++++++++
>  drivers/bus/pci/pci_common_uio.c        | 32 +++++++++++++++++++++++++
>  drivers/bus/pci/private.h               | 12 ++++++++++
>  lib/librte_eal/common/include/rte_bus.h | 15 ++++++++++++
>  4 files changed, 101 insertions(+)
> 
> diff --git a/drivers/bus/pci/pci_common.c b/drivers/bus/pci/pci_common.c
> index 2a00f36..09192ed 100644
> --- a/drivers/bus/pci/pci_common.c
> +++ b/drivers/bus/pci/pci_common.c
> @@ -474,6 +474,47 @@ pci_find_device(const struct rte_device *start, rte_dev_cmp_t cmp,
>  }
> 
>  static int
> +pci_handle_hot_unplug(struct rte_device *dev)
> +{
> +	struct rte_pci_device *pdev;
> +	int ret;
> +
> +	if (dev == NULL)
> +		return -EINVAL;
> +
> +	pdev = RTE_DEV_TO_PCI(dev);
> +
> +	/* remap resources for devices */
> +	switch (pdev->kdrv) {
> +	case RTE_KDRV_VFIO:
> +#ifdef VFIO_PRESENT
> +		/* TODO */
> +#endif
> +		break;
> +	case RTE_KDRV_IGB_UIO:
> +	case RTE_KDRV_UIO_GENERIC:
> +		if (rte_eal_using_phys_addrs()) {
> +			/* map resources for devices that use uio */
> +			ret = pci_uio_remap_resource(pdev);
> +		}
> +		break;
> +	case RTE_KDRV_NIC_UIO:
> +		ret = pci_uio_remap_resource(pdev);
> +		break;
> +	default:
> +		RTE_LOG(DEBUG, EAL,
> +			"  Not managed by a supported kernel driver, skipped\n");
> +		ret = -1;
> +		break;
> +	}
> +
> +	if (ret != 0)
> +		RTE_LOG(ERR, EAL, "failed to handle hot unplug of %s",
> +			pdev->name);
> +	return ret;
> +}
> +
> +static int
>  pci_plug(struct rte_device *dev)
>  {
>  	return pci_probe_all_drivers(RTE_DEV_TO_PCI(dev));
> @@ -503,6 +544,7 @@ struct rte_pci_bus rte_pci_bus = {
>  		.unplug = pci_unplug,
>  		.parse = pci_parse,
>  		.get_iommu_class = rte_pci_get_iommu_class,
> +		.handle_hot_unplug = pci_handle_hot_unplug,
>  	},
>  	.device_list = TAILQ_HEAD_INITIALIZER(rte_pci_bus.device_list),
>  	.driver_list = TAILQ_HEAD_INITIALIZER(rte_pci_bus.driver_list),
> diff --git a/drivers/bus/pci/pci_common_uio.c b/drivers/bus/pci/pci_common_uio.c
> index 54bc20b..31a4094 100644
> --- a/drivers/bus/pci/pci_common_uio.c
> +++ b/drivers/bus/pci/pci_common_uio.c
> @@ -146,6 +146,38 @@ pci_uio_unmap(struct mapped_pci_resource *uio_res)
>  	}
>  }
> 
> +/* remap the PCI resource of a PCI device in private virtual memory */
> +int
> +pci_uio_remap_resource(struct rte_pci_device *dev)
> +{
> +	int i;
> +	void *map_address;
> +
> +	if (dev == NULL)
> +		return -1;
> +
> +	/* Remap all BARs */
> +	for (i = 0; i != PCI_MAX_RESOURCE; i++) {
> +		/* skip empty BAR */
> +		if (dev->mem_resource[i].phys_addr == 0)
> +			continue;
> +		pci_unmap_resource(dev->mem_resource[i].addr,
> +				(size_t)dev->mem_resource[i].len);
> +		map_address = pci_map_resource(
> +				dev->mem_resource[i].addr, -1, 0,
> +				(size_t)dev->mem_resource[i].len,
> +				MAP_ANONYMOUS | MAP_FIXED);

Wouldn't it be possible to use  mremap() here?
To do munmap/mmap in one go?


> +		if (map_address == MAP_FAILED) {
> +			RTE_LOG(ERR, EAL,
> +				"Cannot remap resource for device %s\n",
> +				dev->name);
> +			return -1;
> +		}
> +	}
> +
> +	return 0;
> +}
> +

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [dpdk-dev] [PATCH V19 2/4] eal: add failure handler mechanism for hot plug
  2018-04-09 17:42       ` Ananyev, Konstantin
@ 2018-04-11 11:34         ` Guo, Jia
  0 siblings, 0 replies; 28+ messages in thread
From: Guo, Jia @ 2018-04-11 11:34 UTC (permalink / raw)
  To: Ananyev, Konstantin, stephen, Richardson, Bruce, Yigit, Ferruh,
	gaetan.rivet, Wu, Jingjing, thomas, motih, Van Haaren, Harry,
	Tan, Jianfeng
  Cc: jblunck, shreyansh.jain, dev, Zhang, Helin

thanks your review , Konstantin, comment below.


On 4/10/2018 1:42 AM, Ananyev, Konstantin wrote:
> Hi Jeff,
>
>> This patch introduces an API (rte_dev_handle_hot_unplug) to handle device
>> hot unplug event. When device be hot plug out, the device resource
>> become invalid, if this resource is still be unexpected read/write,
>> system will crash. The api let user register the hot unplug handler, when
>> hot plug failure occur, the working thread will be block until the uevent
>> mechanism successful recovery the memory and guaranty the application keep
>> running smoothly.
>>
>> Signed-off-by: Jeff Guo <jia.guo@intel.com>
>> ---
>> v19->18:
>> add note for limitation of multiple hotplug
>> ---
>>   doc/guides/rel_notes/release_18_05.rst  |   6 ++
>>   kernel/linux/igb_uio/igb_uio.c          |   4 +
>>   lib/librte_eal/common/include/rte_dev.h |  19 +++++
>>   lib/librte_eal/linuxapp/eal/eal_dev.c   | 140 +++++++++++++++++++++++++++++++-
>>   lib/librte_eal/rte_eal_version.map      |   1 +
>>   5 files changed, 169 insertions(+), 1 deletion(-)
>>
>> diff --git a/doc/guides/rel_notes/release_18_05.rst b/doc/guides/rel_notes/release_18_05.rst
>> index cb9e050..2707e73 100644
>> --- a/doc/guides/rel_notes/release_18_05.rst
>> +++ b/doc/guides/rel_notes/release_18_05.rst
>> @@ -70,6 +70,12 @@ New Features
>>
>>     Linux uevent is supported as backend of this device event notification framework.
>>
>> +* **Added hot plug failure handler.**
>> +
>> +  Added a failure handler machenism to handle hot unplug device.
>> +
>> +  * ``rte_dev_handle_hot_unplug`` for handle hot unplug device failure.
>> +
>>   API Changes
>>   -----------
>>
>> diff --git a/kernel/linux/igb_uio/igb_uio.c b/kernel/linux/igb_uio/igb_uio.c
>> index 4cae4dd..293c310 100644
>> --- a/kernel/linux/igb_uio/igb_uio.c
>> +++ b/kernel/linux/igb_uio/igb_uio.c
>> @@ -344,6 +344,10 @@ igbuio_pci_release(struct uio_info *info, struct inode *inode)
>>   	struct rte_uio_pci_dev *udev = info->priv;
>>   	struct pci_dev *dev = udev->pdev;
>>
>> +	/* check if device has been remove before release */
>> +	if ((&dev->dev.kobj)->state_remove_uevent_sent == 1)
>> +		return -1;
>> +
>>   	mutex_lock(&udev->lock);
>>   	if (--udev->refcnt > 0) {
>>   		mutex_unlock(&udev->lock);
>> diff --git a/lib/librte_eal/common/include/rte_dev.h b/lib/librte_eal/common/include/rte_dev.h
>> index a5203e7..17c446d 100644
>> --- a/lib/librte_eal/common/include/rte_dev.h
>> +++ b/lib/librte_eal/common/include/rte_dev.h
>> @@ -361,4 +361,23 @@ rte_dev_event_monitor_start(void);
>>    */
>>   int __rte_experimental
>>   rte_dev_event_monitor_stop(void);
>> +
>> +/**
>> + * @warning
>> + * @b EXPERIMENTAL: this API may change without prior notice
>> + *
>> + * It can be used to register the device signal bus handler, and save the
>> + * current environment for each thread, when signal bus error invoke, the
>> + * handler would restore the environment by long jmp to each working
>> + * thread previous locate, then block the thread to waiting until the memory
>> + * recovery and remapping be finished, that would guaranty the system not
>> + * crash when the device be hot unplug.
>> + *
>> + * @param none
>> + * @return
>> + *   - From a successful direct invocation, zero.
>> + *   - From a call of siglongjmp(), non_zero.
>> + */
>> +int __rte_experimental
>> +rte_dev_handle_hot_unplug(void);
>>   #endif /* _RTE_DEV_H_ */
>> diff --git a/lib/librte_eal/linuxapp/eal/eal_dev.c b/lib/librte_eal/linuxapp/eal/eal_dev.c
>> index 9478a39..84b7efc 100644
>> --- a/lib/librte_eal/linuxapp/eal/eal_dev.c
>> +++ b/lib/librte_eal/linuxapp/eal/eal_dev.c
>> @@ -4,6 +4,9 @@
>>
>>   #include <string.h>
>>   #include <unistd.h>
>> +#include <signal.h>
>> +#include <setjmp.h>
>> +#include <pthread.h>
>>   #include <sys/socket.h>
>>   #include <linux/netlink.h>
>>
>> @@ -13,12 +16,17 @@
>>   #include <rte_malloc.h>
>>   #include <rte_interrupts.h>
>>   #include <rte_alarm.h>
>> +#include <rte_bus.h>
>> +#include <rte_per_lcore.h>
>>
>>   #include "eal_private.h"
>>
>>   static struct rte_intr_handle intr_handle = {.fd = -1 };
>>   static bool monitor_started;
>>
>> +pthread_mutex_t failure_recovery_lock;
>> +pthread_cond_t failure_recovery_cond;
>> +
>>   #define EAL_UEV_MSG_LEN 4096
>>   #define EAL_UEV_MSG_ELEM_LEN 128
>>
>> @@ -32,6 +40,22 @@ enum eal_dev_event_subsystem {
>>   	EAL_DEV_EVENT_SUBSYSTEM_MAX
>>   };
>>
>> +static RTE_DEFINE_PER_LCORE(sigjmp_buf, unplug_longjmp_env);
>> +
>> +static void sigbus_handler(int signum __rte_unused)
>> +{
>> +	RTE_LOG(DEBUG, EAL, "receive SIGBUS error!\n");
>> +	siglongjmp(RTE_PER_LCORE(unplug_longjmp_env), 1);
>> +}
>> +
>> +static int cmp_dev_name(const struct rte_device *dev,
>> +	const void *_name)
>> +{
>> +	const char *name = _name;
>> +
>> +	return strcmp(dev->name, name);
>> +}
>> +
>>   static int
>>   dev_uev_socket_fd_create(void)
>>   {
>> @@ -132,6 +156,31 @@ dev_uev_parse(const char *buf, struct rte_dev_event *event, int length)
>>   	return 0;
>>   }
>>
>> +static int
>> +dev_uev_remove_handler(struct rte_device *dev)
>> +{
>> +	struct rte_bus *bus = rte_bus_find_by_device_name(dev->name);
>> +	int ret;
>> +
>> +	if (!dev)
>> +		return -1;
>> +
>> +	if (bus->handle_hot_unplug) {
>> +		/**
>> +		 * call bus ops to handle hot unplug.
>> +		 */
>> +		ret = bus->handle_hot_unplug(dev);
>> +		if (ret) {
>> +			RTE_LOG(ERR, EAL,
>> +				"It cannot handle hot unplug for device (%s) "
>> +				"on the bus.\n ",
>> +				dev->name);
>> +			return ret;
>> +		}
>> +	}
>> +	return 0;
>> +}
>> +
>>   static void
>>   dev_delayed_unregister(void *param)
>>   {
>> @@ -146,6 +195,9 @@ dev_uev_handler(__rte_unused void *param)
>>   	struct rte_dev_event uevent;
>>   	int ret;
>>   	char buf[EAL_UEV_MSG_LEN];
>> +	struct rte_bus *bus;
>> +	struct rte_device *dev;
>> +	const char *busname;
>>
>>   	memset(&uevent, 0, sizeof(struct rte_dev_event));
>>   	memset(buf, 0, EAL_UEV_MSG_LEN);
>> @@ -170,11 +222,87 @@ dev_uev_handler(__rte_unused void *param)
>>   	RTE_LOG(DEBUG, EAL, "receive uevent(name:%s, type:%d, subsystem:%d)\n",
>>   		uevent.devname, uevent.type, uevent.subsystem);
>>
>> -	if (uevent.devname)
>> +	switch (uevent.subsystem) {
>> +	case EAL_DEV_EVENT_SUBSYSTEM_PCI:
>> +	case EAL_DEV_EVENT_SUBSYSTEM_UIO:
>> +		busname = "pci";
>> +		break;
>> +	default:
>> +		break;
>> +	}
>> +
>> +	if (uevent.devname) {
>> +		if (uevent.type == RTE_DEV_EVENT_REMOVE) {
>> +			bus = rte_bus_find_by_name(busname);
>> +			if (bus == NULL) {
>> +				RTE_LOG(ERR, EAL, "Cannot find bus (%s)\n",
>> +					uevent.devname);
>> +				return;
>> +			}
>> +			dev = bus->find_device(NULL, cmp_dev_name,
>> +					       uevent.devname);
>> +			if (dev == NULL) {
>> +				RTE_LOG(ERR, EAL,
>> +					"Cannot find unplugged device (%s)\n",
>> +					uevent.devname);
>> +				return;
>> +			}
>> +			ret = dev_uev_remove_handler(dev);
>> +			if (ret) {
>> +				RTE_LOG(ERR, EAL, "Driver cannot remap the "
>> +					"device (%s)\n",
>> +					dev->name);
>> +				return;
>> +			}
>> +			/* wake up all the threads */
>> +			pthread_cond_broadcast(&failure_recovery_cond);
>> +		}
>>   		dev_callback_process(uevent.devname, uevent.type);
>> +	}
>>   }
>>
>>   int __rte_experimental
>> +rte_dev_handle_hot_unplug(void)
>> +{
>> +	struct sigaction act;
>> +	sigset_t mask;
>> +	int ret = 0;
>> +
>> +	/* set signal handlers */
>> +	memset(&act, 0x00, sizeof(struct sigaction));
>> +	act.sa_handler = sigbus_handler;
>> +	sigemptyset(&act.sa_mask);
>> +	act.sa_flags = SA_RESTART;
>> +	sigaction(SIGBUS, &act, NULL);
>> +	sigemptyset(&mask);
>> +	sigaddset(&mask, SIGBUS);
>> +	pthread_sigmask(SIG_UNBLOCK, &mask, NULL);
> Ok, but it is really hard to predict which thread will get a sigbus,
> even a control thread could be affected just by reading HW stats.
> Can we make it a unified one, per process?
>
yes, the signal would random sent to each thread, i will let  the 
handler and the mask to be unified.
>> +
>> +	ret = sigsetjmp(RTE_PER_LCORE(unplug_longjmp_env), 1);
> Hmm... and why we have to do that?
> Why we can't call recover procedure (as I understand all it will do -
> mmap our unplugged dev VA to some dummy memory space) from sighandler itself,
> and then just make thread to resume?
>
you do invoke a good question, seem make thread resume but not jump 
could be avoid some clean work and make thing be easily.
>> +	if (ret) {
>> +		/*
>> +		 * Waitting for condition variable before failure recovery
>> +		 * finish. Now the limitation is only handle one device
>> +		 * hot plug, for multiple devices hotplug, need check if
>> +		 * the device belong to this working thread, then directly
>> +		 * call memory remaping, unrelated thread just keep going
>> +		 * their work by no interrupt from hotplug.
>> +		 * TODO: multiple device hotplug
>> +		 */
>> +		pthread_mutex_lock(&failure_recovery_lock);
>> +		RTE_LOG(DEBUG, EAL, "begin waiting for the failure handler.\n");
>> +		pthread_cond_wait(&failure_recovery_cond,
>> +					&failure_recovery_lock);
>> +		RTE_LOG(DEBUG, EAL,
>> +		       "come back from waiting for failure handler.\n");
>> +		pthread_mutex_unlock(&failure_recovery_lock);
> That's a strange one - how do you know that you received SIGBUS because of HW unplug?
> Might be some buggy code is just user tried to access some mmaped file beyond it's boundaries,
> or so?
> In that case your signal handler would be hanged forever.
> Second thing, how do you know that dev_uev_handler() has fixed that particular device unplugging?
> I think you have first figure out by fault address - is that one of our device addresses,
> and if yes - identify which particular.
> Then  do actual remap().
> BTW, once again why we need to wipe current prog state with longjmp()?
>      
> Konstantin
your are right , that would not handler the other cause well, and agree 
your second suggestion, we could use address of signal info to identify 
the device which is being unplug,  that would be handler the multiple
device hot unplug case, i will think about to implement it and remove 
the TODO comment.
>> +	}
>> +
>> +	return ret;
>> +}
>> +
>> +
>> +int __rte_experimental
>>   rte_dev_event_monitor_start(void)
>>   {
>>   	int ret;
>> @@ -196,6 +324,12 @@ rte_dev_event_monitor_start(void)
>>   		return -1;
>>   	}
>>
>> +	/* initialize mutex and condition variable
>> +	 * to control failure recovery.
>> +	 */
>> +	pthread_mutex_init(&failure_recovery_lock, NULL);
>> +	pthread_cond_init(&failure_recovery_cond, NULL);
>> +
>>   	monitor_started = true;
>>
>>   	return 0;
>> @@ -219,5 +353,9 @@ rte_dev_event_monitor_stop(void)
>>   	close(intr_handle.fd);
>>   	intr_handle.fd = -1;
>>   	monitor_started = false;
>> +
>> +	pthread_cond_destroy(&failure_recovery_cond);
>> +	pthread_mutex_destroy(&failure_recovery_lock);
>> +
>>   	return 0;
>>   }
>> diff --git a/lib/librte_eal/rte_eal_version.map b/lib/librte_eal/rte_eal_version.map
>> index fc5c62a..873ef38 100644
>> --- a/lib/librte_eal/rte_eal_version.map
>> +++ b/lib/librte_eal/rte_eal_version.map
>> @@ -262,5 +262,6 @@ EXPERIMENTAL {
>>   	rte_dev_event_monitor_stop;
>>   	rte_dev_event_callback_register;
>>   	rte_dev_event_callback_unregister;
>> +	rte_dev_handle_hot_unplug;
>>
>>   } DPDK_18.02;
>> --
>> 2.7.4

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [dpdk-dev] [PATCH V19 1/4] bus/pci: introduce device hot unplug handle
  2018-04-09 17:47       ` Ananyev, Konstantin
@ 2018-04-11 11:37         ` Guo, Jia
  0 siblings, 0 replies; 28+ messages in thread
From: Guo, Jia @ 2018-04-11 11:37 UTC (permalink / raw)
  To: Ananyev, Konstantin, stephen, Richardson, Bruce, Yigit, Ferruh,
	gaetan.rivet, Wu, Jingjing, thomas, motih, Van Haaren, Harry,
	Tan, Jianfeng
  Cc: jblunck, shreyansh.jain, dev, Zhang, Helin



On 4/10/2018 1:47 AM, Ananyev, Konstantin wrote:
>
>> -----Original Message-----
>> From: Guo, Jia
>> Sent: Friday, April 6, 2018 11:57 AM
>> To: stephen@networkplumber.org; Richardson, Bruce <bruce.richardson@intel.com>; Yigit, Ferruh <ferruh.yigit@intel.com>; Ananyev,
>> Konstantin <konstantin.ananyev@intel.com>; gaetan.rivet@6wind.com; Wu, Jingjing <jingjing.wu@intel.com>; thomas@monjalon.net;
>> motih@mellanox.com; Van Haaren, Harry <harry.van.haaren@intel.com>; Tan, Jianfeng <jianfeng.tan@intel.com>
>> Cc: jblunck@infradead.org; shreyansh.jain@nxp.com; dev@dpdk.org; Guo, Jia <jia.guo@intel.com>; Zhang, Helin <helin.zhang@intel.com>
>> Subject: [PATCH V19 1/4] bus/pci: introduce device hot unplug handle
>>
>> As of device hot unplug, we need some preparatory measures so that we will
>> not encounter memory fault after device be plug out of the system,
>> and also let we could recover the running data path but not been break.
>> This allows the buses to handle device hot unplug event.
>> The patch only enable the ops in pci bus, when handle device hot unplug
>> event, remap a dummy memory to avoid bus read/write error.
>> Other buses could accordingly implement this ops specific by themselves.
>>
>> Signed-off-by: Jeff Guo <jia.guo@intel.com>
>> ---
>> v19->v18:
>> fix some typo and squeeze patch
>> ---
>>   drivers/bus/pci/pci_common.c            | 42 +++++++++++++++++++++++++++++++++
>>   drivers/bus/pci/pci_common_uio.c        | 32 +++++++++++++++++++++++++
>>   drivers/bus/pci/private.h               | 12 ++++++++++
>>   lib/librte_eal/common/include/rte_bus.h | 15 ++++++++++++
>>   4 files changed, 101 insertions(+)
>>
>> diff --git a/drivers/bus/pci/pci_common.c b/drivers/bus/pci/pci_common.c
>> index 2a00f36..09192ed 100644
>> --- a/drivers/bus/pci/pci_common.c
>> +++ b/drivers/bus/pci/pci_common.c
>> @@ -474,6 +474,47 @@ pci_find_device(const struct rte_device *start, rte_dev_cmp_t cmp,
>>   }
>>
>>   static int
>> +pci_handle_hot_unplug(struct rte_device *dev)
>> +{
>> +	struct rte_pci_device *pdev;
>> +	int ret;
>> +
>> +	if (dev == NULL)
>> +		return -EINVAL;
>> +
>> +	pdev = RTE_DEV_TO_PCI(dev);
>> +
>> +	/* remap resources for devices */
>> +	switch (pdev->kdrv) {
>> +	case RTE_KDRV_VFIO:
>> +#ifdef VFIO_PRESENT
>> +		/* TODO */
>> +#endif
>> +		break;
>> +	case RTE_KDRV_IGB_UIO:
>> +	case RTE_KDRV_UIO_GENERIC:
>> +		if (rte_eal_using_phys_addrs()) {
>> +			/* map resources for devices that use uio */
>> +			ret = pci_uio_remap_resource(pdev);
>> +		}
>> +		break;
>> +	case RTE_KDRV_NIC_UIO:
>> +		ret = pci_uio_remap_resource(pdev);
>> +		break;
>> +	default:
>> +		RTE_LOG(DEBUG, EAL,
>> +			"  Not managed by a supported kernel driver, skipped\n");
>> +		ret = -1;
>> +		break;
>> +	}
>> +
>> +	if (ret != 0)
>> +		RTE_LOG(ERR, EAL, "failed to handle hot unplug of %s",
>> +			pdev->name);
>> +	return ret;
>> +}
>> +
>> +static int
>>   pci_plug(struct rte_device *dev)
>>   {
>>   	return pci_probe_all_drivers(RTE_DEV_TO_PCI(dev));
>> @@ -503,6 +544,7 @@ struct rte_pci_bus rte_pci_bus = {
>>   		.unplug = pci_unplug,
>>   		.parse = pci_parse,
>>   		.get_iommu_class = rte_pci_get_iommu_class,
>> +		.handle_hot_unplug = pci_handle_hot_unplug,
>>   	},
>>   	.device_list = TAILQ_HEAD_INITIALIZER(rte_pci_bus.device_list),
>>   	.driver_list = TAILQ_HEAD_INITIALIZER(rte_pci_bus.driver_list),
>> diff --git a/drivers/bus/pci/pci_common_uio.c b/drivers/bus/pci/pci_common_uio.c
>> index 54bc20b..31a4094 100644
>> --- a/drivers/bus/pci/pci_common_uio.c
>> +++ b/drivers/bus/pci/pci_common_uio.c
>> @@ -146,6 +146,38 @@ pci_uio_unmap(struct mapped_pci_resource *uio_res)
>>   	}
>>   }
>>
>> +/* remap the PCI resource of a PCI device in private virtual memory */
>> +int
>> +pci_uio_remap_resource(struct rte_pci_device *dev)
>> +{
>> +	int i;
>> +	void *map_address;
>> +
>> +	if (dev == NULL)
>> +		return -1;
>> +
>> +	/* Remap all BARs */
>> +	for (i = 0; i != PCI_MAX_RESOURCE; i++) {
>> +		/* skip empty BAR */
>> +		if (dev->mem_resource[i].phys_addr == 0)
>> +			continue;
>> +		pci_unmap_resource(dev->mem_resource[i].addr,
>> +				(size_t)dev->mem_resource[i].len);
>> +		map_address = pci_map_resource(
>> +				dev->mem_resource[i].addr, -1, 0,
>> +				(size_t)dev->mem_resource[i].len,
>> +				MAP_ANONYMOUS | MAP_FIXED);
> Wouldn't it be possible to use  mremap() here?
> To do munmap/mmap in one go?
>
since i don't thinks is need to add a new function to remap resource, so 
just use the exist map function with specif flag to handler it. what i 
want to say is if my implement have problem or what is
the mremap's  advantage for that, if so please let me know, thank.
>> +		if (map_address == MAP_FAILED) {
>> +			RTE_LOG(ERR, EAL,
>> +				"Cannot remap resource for device %s\n",
>> +				dev->name);
>> +			return -1;
>> +		}
>> +	}
>> +
>> +	return 0;
>> +}
>> +

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [dpdk-dev] [PATCH V19 2/4] eal: add failure handler mechanism for hot plug
  2018-04-06 14:03       ` Zhang, Qi Z
  2018-04-06 14:24         ` Zhang, Qi Z
@ 2018-04-11 11:49         ` Guo, Jia
  1 sibling, 0 replies; 28+ messages in thread
From: Guo, Jia @ 2018-04-11 11:49 UTC (permalink / raw)
  To: Zhang, Qi Z, stephen, Richardson, Bruce, Yigit, Ferruh, Ananyev,
	Konstantin, gaetan.rivet, Wu, Jingjing, thomas, motih,
	Van Haaren, Harry, Tan, Jianfeng
  Cc: jblunck, shreyansh.jain, dev, Zhang, Helin



On 4/6/2018 10:03 PM, Zhang, Qi Z wrote:
> Hi Jeff:
>
>> -----Original Message-----
>> From: dev [mailto:dev-bounces@dpdk.org] On Behalf Of Jeff Guo
>> Sent: Friday, April 6, 2018 6:57 PM
>> To: stephen@networkplumber.org; Richardson, Bruce
>> <bruce.richardson@intel.com>; Yigit, Ferruh <ferruh.yigit@intel.com>;
>> Ananyev, Konstantin <konstantin.ananyev@intel.com>;
>> gaetan.rivet@6wind.com; Wu, Jingjing <jingjing.wu@intel.com>;
>> thomas@monjalon.net; motih@mellanox.com; Van Haaren, Harry
>> <harry.van.haaren@intel.com>; Tan, Jianfeng <jianfeng.tan@intel.com>
>> Cc: jblunck@infradead.org; shreyansh.jain@nxp.com; dev@dpdk.org; Guo, Jia
>> <jia.guo@intel.com>; Zhang, Helin <helin.zhang@intel.com>
>> Subject: [dpdk-dev] [PATCH V19 2/4] eal: add failure handler mechanism for
>> hot plug
>>
>> This patch introduces an API (rte_dev_handle_hot_unplug) to handle device
>> hot unplug event. When device be hot plug out, the device resource become
>> invalid, if this resource is still be unexpected read/write, system will crash. The
>> api let user register the hot unplug handler, when hot plug failure occur, the
>> working thread will be block until the uevent mechanism successful recovery
>> the memory and guaranty the application keep running smoothly.
>>
>> Signed-off-by: Jeff Guo <jia.guo@intel.com>
>> ---
>> v19->18:
>> add note for limitation of multiple hotplug
>> ---
>>   doc/guides/rel_notes/release_18_05.rst  |   6 ++
>>   kernel/linux/igb_uio/igb_uio.c          |   4 +
>>   lib/librte_eal/common/include/rte_dev.h |  19 +++++
>>   lib/librte_eal/linuxapp/eal/eal_dev.c   | 140
>> +++++++++++++++++++++++++++++++-
>>   lib/librte_eal/rte_eal_version.map      |   1 +
>>   5 files changed, 169 insertions(+), 1 deletion(-)
>>
>> diff --git a/doc/guides/rel_notes/release_18_05.rst
>> b/doc/guides/rel_notes/release_18_05.rst
>> index cb9e050..2707e73 100644
>> --- a/doc/guides/rel_notes/release_18_05.rst
>> +++ b/doc/guides/rel_notes/release_18_05.rst
>> @@ -70,6 +70,12 @@ New Features
>>
>>     Linux uevent is supported as backend of this device event notification
>> framework.
>>
>> +* **Added hot plug failure handler.**
>> +
>> +  Added a failure handler machenism to handle hot unplug device.
>> +
>> +  * ``rte_dev_handle_hot_unplug`` for handle hot unplug device failure.
>> +
>>   API Changes
>>   -----------
>>
>> diff --git a/kernel/linux/igb_uio/igb_uio.c b/kernel/linux/igb_uio/igb_uio.c
>> index 4cae4dd..293c310 100644
>> --- a/kernel/linux/igb_uio/igb_uio.c
>> +++ b/kernel/linux/igb_uio/igb_uio.c
>> @@ -344,6 +344,10 @@ igbuio_pci_release(struct uio_info *info, struct inode
>> *inode)
>>   	struct rte_uio_pci_dev *udev = info->priv;
>>   	struct pci_dev *dev = udev->pdev;
>>
>> +	/* check if device has been remove before release */
>> +	if ((&dev->dev.kobj)->state_remove_uevent_sent == 1)
>> +		return -1;
>> +
>>   	mutex_lock(&udev->lock);
>>   	if (--udev->refcnt > 0) {
>>   		mutex_unlock(&udev->lock);
>> diff --git a/lib/librte_eal/common/include/rte_dev.h
>> b/lib/librte_eal/common/include/rte_dev.h
>> index a5203e7..17c446d 100644
>> --- a/lib/librte_eal/common/include/rte_dev.h
>> +++ b/lib/librte_eal/common/include/rte_dev.h
>> @@ -361,4 +361,23 @@ rte_dev_event_monitor_start(void);
>>    */
>>   int __rte_experimental
>>   rte_dev_event_monitor_stop(void);
>> +
>> +/**
>> + * @warning
>> + * @b EXPERIMENTAL: this API may change without prior notice
>> + *
>> + * It can be used to register the device signal bus handler, and save
>> +the
>> + * current environment for each thread, when signal bus error invoke,
>> +the
>> + * handler would restore the environment by long jmp to each working
>> + * thread previous locate, then block the thread to waiting until the
>> +memory
>> + * recovery and remapping be finished, that would guaranty the system
>> +not
>> + * crash when the device be hot unplug.
>> + *
>> + * @param none
>> + * @return
>> + *   - From a successful direct invocation, zero.
>> + *   - From a call of siglongjmp(), non_zero.
>> + */
>> +int __rte_experimental
>> +rte_dev_handle_hot_unplug(void);
>>   #endif /* _RTE_DEV_H_ */
>> diff --git a/lib/librte_eal/linuxapp/eal/eal_dev.c
>> b/lib/librte_eal/linuxapp/eal/eal_dev.c
>> index 9478a39..84b7efc 100644
>> --- a/lib/librte_eal/linuxapp/eal/eal_dev.c
>> +++ b/lib/librte_eal/linuxapp/eal/eal_dev.c
>> @@ -4,6 +4,9 @@
>>
>>   #include <string.h>
>>   #include <unistd.h>
>> +#include <signal.h>
>> +#include <setjmp.h>
>> +#include <pthread.h>
>>   #include <sys/socket.h>
>>   #include <linux/netlink.h>
>>
>> @@ -13,12 +16,17 @@
>>   #include <rte_malloc.h>
>>   #include <rte_interrupts.h>
>>   #include <rte_alarm.h>
>> +#include <rte_bus.h>
>> +#include <rte_per_lcore.h>
>>
>>   #include "eal_private.h"
>>
>>   static struct rte_intr_handle intr_handle = {.fd = -1 };  static bool
>> monitor_started;
>>
>> +pthread_mutex_t failure_recovery_lock;
>> +pthread_cond_t failure_recovery_cond;
>> +
>>   #define EAL_UEV_MSG_LEN 4096
>>   #define EAL_UEV_MSG_ELEM_LEN 128
>>
>> @@ -32,6 +40,22 @@ enum eal_dev_event_subsystem {
>>   	EAL_DEV_EVENT_SUBSYSTEM_MAX
>>   };
>>
>> +static RTE_DEFINE_PER_LCORE(sigjmp_buf, unplug_longjmp_env);
>> +
>> +static void sigbus_handler(int signum __rte_unused) {
>> +	RTE_LOG(DEBUG, EAL, "receive SIGBUS error!\n");
>> +	siglongjmp(RTE_PER_LCORE(unplug_longjmp_env), 1); }
>> +
>> +static int cmp_dev_name(const struct rte_device *dev,
>> +	const void *_name)
>> +{
>> +	const char *name = _name;
>> +
>> +	return strcmp(dev->name, name);
>> +}
>> +
>>   static int
>>   dev_uev_socket_fd_create(void)
>>   {
>> @@ -132,6 +156,31 @@ dev_uev_parse(const char *buf, struct rte_dev_event
>> *event, int length)
>>   	return 0;
>>   }
>>
>> +static int
>> +dev_uev_remove_handler(struct rte_device *dev) {
>> +	struct rte_bus *bus = rte_bus_find_by_device_name(dev->name);
>> +	int ret;
>> +
>> +	if (!dev)
>> +		return -1;
>> +
>> +	if (bus->handle_hot_unplug) {
>> +		/**
>> +		 * call bus ops to handle hot unplug.
>> +		 */
>> +		ret = bus->handle_hot_unplug(dev);
>> +		if (ret) {
>> +			RTE_LOG(ERR, EAL,
>> +				"It cannot handle hot unplug for device (%s) "
>> +				"on the bus.\n ",
>> +				dev->name);
>> +			return ret;
>> +		}
>> +	}
>> +	return 0;
>> +}
>> +
>>   static void
>>   dev_delayed_unregister(void *param)
>>   {
>> @@ -146,6 +195,9 @@ dev_uev_handler(__rte_unused void *param)
>>   	struct rte_dev_event uevent;
>>   	int ret;
>>   	char buf[EAL_UEV_MSG_LEN];
>> +	struct rte_bus *bus;
>> +	struct rte_device *dev;
>> +	const char *busname;
>>
>>   	memset(&uevent, 0, sizeof(struct rte_dev_event));
>>   	memset(buf, 0, EAL_UEV_MSG_LEN);
>> @@ -170,11 +222,87 @@ dev_uev_handler(__rte_unused void *param)
>>   	RTE_LOG(DEBUG, EAL, "receive uevent(name:%s, type:%d,
>> subsystem:%d)\n",
>>   		uevent.devname, uevent.type, uevent.subsystem);
>>
>> -	if (uevent.devname)
>> +	switch (uevent.subsystem) {
>> +	case EAL_DEV_EVENT_SUBSYSTEM_PCI:
>> +	case EAL_DEV_EVENT_SUBSYSTEM_UIO:
>> +		busname = "pci";
>> +		break;
>> +	default:
>> +		break;
>> +	}
>> +
>> +	if (uevent.devname) {
>> +		if (uevent.type == RTE_DEV_EVENT_REMOVE) {
>> +			bus = rte_bus_find_by_name(busname);
>> +			if (bus == NULL) {
>> +				RTE_LOG(ERR, EAL, "Cannot find bus (%s)\n",
>> +					uevent.devname);
>> +				return;
>> +			}
>> +			dev = bus->find_device(NULL, cmp_dev_name,
>> +					       uevent.devname);
>> +			if (dev == NULL) {
>> +				RTE_LOG(ERR, EAL,
>> +					"Cannot find unplugged device (%s)\n",
>> +					uevent.devname);
>> +				return;
>> +			}
>> +			ret = dev_uev_remove_handler(dev);
>> +			if (ret) {
>> +				RTE_LOG(ERR, EAL, "Driver cannot remap the "
>> +					"device (%s)\n",
>> +					dev->name);
>> +				return;
>> +			}
>> +			/* wake up all the threads */
>> +			pthread_cond_broadcast(&failure_recovery_cond);
>> +		}
>>   		dev_callback_process(uevent.devname, uevent.type);
>> +	}
>>   }
>>
>>   int __rte_experimental
>> +rte_dev_handle_hot_unplug(void)
>> +{
>> +	struct sigaction act;
>> +	sigset_t mask;
>> +	int ret = 0;
>> +
>> +	/* set signal handlers */
>> +	memset(&act, 0x00, sizeof(struct sigaction));
>> +	act.sa_handler = sigbus_handler;
>> +	sigemptyset(&act.sa_mask);
>> +	act.sa_flags = SA_RESTART;
>> +	sigaction(SIGBUS, &act, NULL);
>> +	sigemptyset(&mask);
>> +	sigaddset(&mask, SIGBUS);
>> +	pthread_sigmask(SIG_UNBLOCK, &mask, NULL);
>> +
>> +	ret = sigsetjmp(RTE_PER_LCORE(unplug_longjmp_env), 1);
>> +	if (ret) {
>> +		/*
>> +		 * Waitting for condition variable before failure recovery
>> +		 * finish. Now the limitation is only handle one device
>> +		 * hot plug, for multiple devices hotplug, need check if
>> +		 * the device belong to this working thread, then directly
>> +		 * call memory remaping, unrelated thread just keep going
>> +		 * their work by no interrupt from hotplug.
>> +		 * TODO: multiple device hotplug
>> +		 */
>> +		pthread_mutex_lock(&failure_recovery_lock);
>> +		RTE_LOG(DEBUG, EAL, "begin waiting for the failure handler.\n");
>> +		pthread_cond_wait(&failure_recovery_cond,
>> +					&failure_recovery_lock);
>> +		RTE_LOG(DEBUG, EAL,
>> +		       "come back from waiting for failure handler.\n");
>> +		pthread_mutex_unlock(&failure_recovery_lock);
> I think we should not assume phread_cond_wait always happen before pthread_cond_broadcast,
> It is possible Sigbus just happen before remap in udev remove handler while pthread_cond_wait happens
> after pthread_cond_broadcast, then we will wait forever.
>
> I think we need a flag to sync
> For example:
>
> pthread_mutex_lock(&failure_recovery_lock);
> if ( udev_remove_handle == 0 )
> 	pthread_cond_wait(&failure_recovery_cond, & failure_recovery_lock);
> pthread_remove_handle = 0;
> pthread_mutex_unlock(&failure_recovery_lock);
>
> while at remove handler:
> pthread_mutex_lock(&failure_recovery_lock);
> pthread_remove_handle = 1;
> pthread_cond_signel(&failure_recovery_cond);
> pthread_mutex_unlock(&failure_recovery_lock);
>
>
> Regards
> Qi
>
make sense, so i think that should be find a better way to handler it, 
such as use initiative remap to replace of partitive waiting might be 
better.
>> +	}
>> +
>> +	return ret;
>> +}
>> +
>> +
>> +int __rte_experimental
>>   rte_dev_event_monitor_start(void)
>>   {
>>   	int ret;
>> @@ -196,6 +324,12 @@ rte_dev_event_monitor_start(void)
>>   		return -1;
>>   	}
>>
>> +	/* initialize mutex and condition variable
>> +	 * to control failure recovery.
>> +	 */
>> +	pthread_mutex_init(&failure_recovery_lock, NULL);
>> +	pthread_cond_init(&failure_recovery_cond, NULL);
>> +
>>   	monitor_started = true;
>>
>>   	return 0;
>> @@ -219,5 +353,9 @@ rte_dev_event_monitor_stop(void)
>>   	close(intr_handle.fd);
>>   	intr_handle.fd = -1;
>>   	monitor_started = false;
>> +
>> +	pthread_cond_destroy(&failure_recovery_cond);
>> +	pthread_mutex_destroy(&failure_recovery_lock);
>> +
>>   	return 0;
>>   }
>> diff --git a/lib/librte_eal/rte_eal_version.map
>> b/lib/librte_eal/rte_eal_version.map
>> index fc5c62a..873ef38 100644
>> --- a/lib/librte_eal/rte_eal_version.map
>> +++ b/lib/librte_eal/rte_eal_version.map
>> @@ -262,5 +262,6 @@ EXPERIMENTAL {
>>   	rte_dev_event_monitor_stop;
>>   	rte_dev_event_callback_register;
>>   	rte_dev_event_callback_unregister;
>> +	rte_dev_handle_hot_unplug;
>>
>>   } DPDK_18.02;
>> --
>> 2.7.4

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [dpdk-dev] [PATCH V19 2/4] eal: add failure handler mechanism for hot plug
  2018-04-06 14:24         ` Zhang, Qi Z
@ 2018-04-11 11:50           ` Guo, Jia
  0 siblings, 0 replies; 28+ messages in thread
From: Guo, Jia @ 2018-04-11 11:50 UTC (permalink / raw)
  To: Zhang, Qi Z, stephen, Richardson, Bruce, Yigit, Ferruh, Ananyev,
	Konstantin, gaetan.rivet, Wu, Jingjing, thomas, motih,
	Van Haaren, Harry, Tan, Jianfeng
  Cc: jblunck, shreyansh.jain, dev, Zhang, Helin



On 4/6/2018 10:24 PM, Zhang, Qi Z wrote:
> One more comment
>
>> -----Original Message-----
>> From: dev [mailto:dev-bounces@dpdk.org] On Behalf Of Zhang, Qi Z
>> Sent: Friday, April 6, 2018 10:04 PM
>> To: Guo, Jia <jia.guo@intel.com>; stephen@networkplumber.org; Richardson,
>> Bruce <bruce.richardson@intel.com>; Yigit, Ferruh <ferruh.yigit@intel.com>;
>> Ananyev, Konstantin <konstantin.ananyev@intel.com>;
>> gaetan.rivet@6wind.com; Wu, Jingjing <jingjing.wu@intel.com>;
>> thomas@monjalon.net; motih@mellanox.com; Van Haaren, Harry
>> <harry.van.haaren@intel.com>; Tan, Jianfeng <jianfeng.tan@intel.com>
>> Cc: jblunck@infradead.org; shreyansh.jain@nxp.com; dev@dpdk.org; Guo, Jia
>> <jia.guo@intel.com>; Zhang, Helin <helin.zhang@intel.com>
>> Subject: Re: [dpdk-dev] [PATCH V19 2/4] eal: add failure handler mechanism
>> for hot plug
>>
>> Hi Jeff:
>>
>>> -----Original Message-----
>>> From: dev [mailto:dev-bounces@dpdk.org] On Behalf Of Jeff Guo
>>> Sent: Friday, April 6, 2018 6:57 PM
>>> To: stephen@networkplumber.org; Richardson, Bruce
>>> <bruce.richardson@intel.com>; Yigit, Ferruh <ferruh.yigit@intel.com>;
>>> Ananyev, Konstantin <konstantin.ananyev@intel.com>;
>>> gaetan.rivet@6wind.com; Wu, Jingjing <jingjing.wu@intel.com>;
>>> thomas@monjalon.net; motih@mellanox.com; Van Haaren, Harry
>>> <harry.van.haaren@intel.com>; Tan, Jianfeng <jianfeng.tan@intel.com>
>>> Cc: jblunck@infradead.org; shreyansh.jain@nxp.com; dev@dpdk.org; Guo,
>>> Jia <jia.guo@intel.com>; Zhang, Helin <helin.zhang@intel.com>
>>> Subject: [dpdk-dev] [PATCH V19 2/4] eal: add failure handler mechanism
>>> for hot plug
>>>
>>> This patch introduces an API (rte_dev_handle_hot_unplug) to handle
>>> device hot unplug event. When device be hot plug out, the device
>>> resource become invalid, if this resource is still be unexpected
>>> read/write, system will crash. The api let user register the hot
>>> unplug handler,
> The description is a little bit misleading , based on current implementation.
> it does not a function that let user "register a handler", actually It let user to
> "set a recover point" when some exception (like sigbus) happen due to hot unplug.
> Maybe the function name could be changed also,
> for example: rte_dev_set_recover_point.
make sense , will check the better way demonstrate that.
>> when hot plug failure occur, the working thread will
>>> be block until the uevent mechanism successful recovery the memory and
>> guaranty the application keep running smoothly.
>>> Signed-off-by: Jeff Guo <jia.guo@intel.com>
>>> ---
>>> v19->18:
>>> add note for limitation of multiple hotplug
>>> ---
>>>   doc/guides/rel_notes/release_18_05.rst  |   6 ++
>>>   kernel/linux/igb_uio/igb_uio.c          |   4 +
>>>   lib/librte_eal/common/include/rte_dev.h |  19 +++++
>>>   lib/librte_eal/linuxapp/eal/eal_dev.c   | 140
>>> +++++++++++++++++++++++++++++++-
>>>   lib/librte_eal/rte_eal_version.map      |   1 +
>>>   5 files changed, 169 insertions(+), 1 deletion(-)
>>>
>>> diff --git a/doc/guides/rel_notes/release_18_05.rst
>>> b/doc/guides/rel_notes/release_18_05.rst
>>> index cb9e050..2707e73 100644
>>> --- a/doc/guides/rel_notes/release_18_05.rst
>>> +++ b/doc/guides/rel_notes/release_18_05.rst
>>> @@ -70,6 +70,12 @@ New Features
>>>
>>>     Linux uevent is supported as backend of this device event
>>> notification framework.
>>>
>>> +* **Added hot plug failure handler.**
>>> +
>>> +  Added a failure handler machenism to handle hot unplug device.
>>> +
>>> +  * ``rte_dev_handle_hot_unplug`` for handle hot unplug device failure.
>>> +
>>>   API Changes
>>>   -----------
>>>
>>> diff --git a/kernel/linux/igb_uio/igb_uio.c
>>> b/kernel/linux/igb_uio/igb_uio.c index 4cae4dd..293c310 100644
>>> --- a/kernel/linux/igb_uio/igb_uio.c
>>> +++ b/kernel/linux/igb_uio/igb_uio.c
>>> @@ -344,6 +344,10 @@ igbuio_pci_release(struct uio_info *info, struct
>>> inode
>>> *inode)
>>>   	struct rte_uio_pci_dev *udev = info->priv;
>>>   	struct pci_dev *dev = udev->pdev;
>>>
>>> +	/* check if device has been remove before release */
>>> +	if ((&dev->dev.kobj)->state_remove_uevent_sent == 1)
>>> +		return -1;
>>> +
>>>   	mutex_lock(&udev->lock);
>>>   	if (--udev->refcnt > 0) {
>>>   		mutex_unlock(&udev->lock);
>>> diff --git a/lib/librte_eal/common/include/rte_dev.h
>>> b/lib/librte_eal/common/include/rte_dev.h
>>> index a5203e7..17c446d 100644
>>> --- a/lib/librte_eal/common/include/rte_dev.h
>>> +++ b/lib/librte_eal/common/include/rte_dev.h
>>> @@ -361,4 +361,23 @@ rte_dev_event_monitor_start(void);
>>>    */
>>>   int __rte_experimental
>>>   rte_dev_event_monitor_stop(void);
>>> +
>>> +/**
>>> + * @warning
>>> + * @b EXPERIMENTAL: this API may change without prior notice
>>> + *
>>> + * It can be used to register the device signal bus handler, and save
>>> +the
>>> + * current environment for each thread, when signal bus error invoke,
>>> +the
>>> + * handler would restore the environment by long jmp to each working
>>> + * thread previous locate, then block the thread to waiting until the
>>> +memory
>>> + * recovery and remapping be finished, that would guaranty the system
>>> +not
>>> + * crash when the device be hot unplug.
>>> + *
>>> + * @param none
>>> + * @return
>>> + *   - From a successful direct invocation, zero.
>>> + *   - From a call of siglongjmp(), non_zero.
>>> + */
>>> +int __rte_experimental
>>> +rte_dev_handle_hot_unplug(void);
>>>   #endif /* _RTE_DEV_H_ */
>>> diff --git a/lib/librte_eal/linuxapp/eal/eal_dev.c
>>> b/lib/librte_eal/linuxapp/eal/eal_dev.c
>>> index 9478a39..84b7efc 100644
>>> --- a/lib/librte_eal/linuxapp/eal/eal_dev.c
>>> +++ b/lib/librte_eal/linuxapp/eal/eal_dev.c
>>> @@ -4,6 +4,9 @@
>>>
>>>   #include <string.h>
>>>   #include <unistd.h>
>>> +#include <signal.h>
>>> +#include <setjmp.h>
>>> +#include <pthread.h>
>>>   #include <sys/socket.h>
>>>   #include <linux/netlink.h>
>>>
>>> @@ -13,12 +16,17 @@
>>>   #include <rte_malloc.h>
>>>   #include <rte_interrupts.h>
>>>   #include <rte_alarm.h>
>>> +#include <rte_bus.h>
>>> +#include <rte_per_lcore.h>
>>>
>>>   #include "eal_private.h"
>>>
>>>   static struct rte_intr_handle intr_handle = {.fd = -1 };  static bool
>>> monitor_started;
>>>
>>> +pthread_mutex_t failure_recovery_lock; pthread_cond_t
>>> +failure_recovery_cond;
>>> +
>>>   #define EAL_UEV_MSG_LEN 4096
>>>   #define EAL_UEV_MSG_ELEM_LEN 128
>>>
>>> @@ -32,6 +40,22 @@ enum eal_dev_event_subsystem {
>>>   	EAL_DEV_EVENT_SUBSYSTEM_MAX
>>>   };
>>>
>>> +static RTE_DEFINE_PER_LCORE(sigjmp_buf, unplug_longjmp_env);
>>> +
>>> +static void sigbus_handler(int signum __rte_unused) {
>>> +	RTE_LOG(DEBUG, EAL, "receive SIGBUS error!\n");
>>> +	siglongjmp(RTE_PER_LCORE(unplug_longjmp_env), 1); }
>>> +
>>> +static int cmp_dev_name(const struct rte_device *dev,
>>> +	const void *_name)
>>> +{
>>> +	const char *name = _name;
>>> +
>>> +	return strcmp(dev->name, name);
>>> +}
>>> +
>>>   static int
>>>   dev_uev_socket_fd_create(void)
>>>   {
>>> @@ -132,6 +156,31 @@ dev_uev_parse(const char *buf, struct
>>> rte_dev_event *event, int length)
>>>   	return 0;
>>>   }
>>>
>>> +static int
>>> +dev_uev_remove_handler(struct rte_device *dev) {
>>> +	struct rte_bus *bus = rte_bus_find_by_device_name(dev->name);
>>> +	int ret;
>>> +
>>> +	if (!dev)
>>> +		return -1;
>>> +
>>> +	if (bus->handle_hot_unplug) {
>>> +		/**
>>> +		 * call bus ops to handle hot unplug.
>>> +		 */
>>> +		ret = bus->handle_hot_unplug(dev);
>>> +		if (ret) {
>>> +			RTE_LOG(ERR, EAL,
>>> +				"It cannot handle hot unplug for device (%s) "
>>> +				"on the bus.\n ",
>>> +				dev->name);
>>> +			return ret;
>>> +		}
>>> +	}
>>> +	return 0;
>>> +}
>>> +
>>>   static void
>>>   dev_delayed_unregister(void *param)
>>>   {
>>> @@ -146,6 +195,9 @@ dev_uev_handler(__rte_unused void *param)
>>>   	struct rte_dev_event uevent;
>>>   	int ret;
>>>   	char buf[EAL_UEV_MSG_LEN];
>>> +	struct rte_bus *bus;
>>> +	struct rte_device *dev;
>>> +	const char *busname;
>>>
>>>   	memset(&uevent, 0, sizeof(struct rte_dev_event));
>>>   	memset(buf, 0, EAL_UEV_MSG_LEN);
>>> @@ -170,11 +222,87 @@ dev_uev_handler(__rte_unused void *param)
>>>   	RTE_LOG(DEBUG, EAL, "receive uevent(name:%s, type:%d,
>>> subsystem:%d)\n",
>>>   		uevent.devname, uevent.type, uevent.subsystem);
>>>
>>> -	if (uevent.devname)
>>> +	switch (uevent.subsystem) {
>>> +	case EAL_DEV_EVENT_SUBSYSTEM_PCI:
>>> +	case EAL_DEV_EVENT_SUBSYSTEM_UIO:
>>> +		busname = "pci";
>>> +		break;
>>> +	default:
>>> +		break;
>>> +	}
>>> +
>>> +	if (uevent.devname) {
>>> +		if (uevent.type == RTE_DEV_EVENT_REMOVE) {
>>> +			bus = rte_bus_find_by_name(busname);
>>> +			if (bus == NULL) {
>>> +				RTE_LOG(ERR, EAL, "Cannot find bus (%s)\n",
>>> +					uevent.devname);
>>> +				return;
>>> +			}
>>> +			dev = bus->find_device(NULL, cmp_dev_name,
>>> +					       uevent.devname);
>>> +			if (dev == NULL) {
>>> +				RTE_LOG(ERR, EAL,
>>> +					"Cannot find unplugged device (%s)\n",
>>> +					uevent.devname);
>>> +				return;
>>> +			}
>>> +			ret = dev_uev_remove_handler(dev);
>>> +			if (ret) {
>>> +				RTE_LOG(ERR, EAL, "Driver cannot remap the "
>>> +					"device (%s)\n",
>>> +					dev->name);
>>> +				return;
>>> +			}
>>> +			/* wake up all the threads */
>>> +			pthread_cond_broadcast(&failure_recovery_cond);
>>> +		}
>>>   		dev_callback_process(uevent.devname, uevent.type);
>>> +	}
>>>   }
>>>
>>>   int __rte_experimental
>>> +rte_dev_handle_hot_unplug(void)
>>> +{
>>> +	struct sigaction act;
>>> +	sigset_t mask;
>>> +	int ret = 0;
>>> +
>>> +	/* set signal handlers */
>>> +	memset(&act, 0x00, sizeof(struct sigaction));
>>> +	act.sa_handler = sigbus_handler;
>>> +	sigemptyset(&act.sa_mask);
>>> +	act.sa_flags = SA_RESTART;
>>> +	sigaction(SIGBUS, &act, NULL);
>>> +	sigemptyset(&mask);
>>> +	sigaddset(&mask, SIGBUS);
>>> +	pthread_sigmask(SIG_UNBLOCK, &mask, NULL);
>>> +
>>> +	ret = sigsetjmp(RTE_PER_LCORE(unplug_longjmp_env), 1);
>>> +	if (ret) {
>>> +		/*
>>> +		 * Waitting for condition variable before failure recovery
>>> +		 * finish. Now the limitation is only handle one device
>>> +		 * hot plug, for multiple devices hotplug, need check if
>>> +		 * the device belong to this working thread, then directly
>>> +		 * call memory remaping, unrelated thread just keep going
>>> +		 * their work by no interrupt from hotplug.
>>> +		 * TODO: multiple device hotplug
>>> +		 */
>>> +		pthread_mutex_lock(&failure_recovery_lock);
>>> +		RTE_LOG(DEBUG, EAL, "begin waiting for the failure handler.\n");
>>> +		pthread_cond_wait(&failure_recovery_cond,
>>> +					&failure_recovery_lock);
>>> +		RTE_LOG(DEBUG, EAL,
>>> +		       "come back from waiting for failure handler.\n");
>>> +		pthread_mutex_unlock(&failure_recovery_lock);
>> I think we should not assume phread_cond_wait always happen before
>> pthread_cond_broadcast, It is possible Sigbus just happen before remap in
>> udev remove handler while pthread_cond_wait happens after
>> pthread_cond_broadcast, then we will wait forever.
>>
>> I think we need a flag to sync
>> For example:
>>
>> pthread_mutex_lock(&failure_recovery_lock);
>> if ( udev_remove_handle == 0 )
>> 	pthread_cond_wait(&failure_recovery_cond, & failure_recovery_lock);
>> pthread_remove_handle = 0; pthread_mutex_unlock(&failure_recovery_lock);
>>
>> while at remove handler:
>> pthread_mutex_lock(&failure_recovery_lock);
>> pthread_remove_handle = 1;
>> pthread_cond_signel(&failure_recovery_cond);
>> pthread_mutex_unlock(&failure_recovery_lock);
>>
>>
>> Regards
>> Qi
>>
>>> +	}
>>> +
>>> +	return ret;
>>> +}
>>> +
>>> +
>>> +int __rte_experimental
>>>   rte_dev_event_monitor_start(void)
>>>   {
>>>   	int ret;
>>> @@ -196,6 +324,12 @@ rte_dev_event_monitor_start(void)
>>>   		return -1;
>>>   	}
>>>
>>> +	/* initialize mutex and condition variable
>>> +	 * to control failure recovery.
>>> +	 */
>>> +	pthread_mutex_init(&failure_recovery_lock, NULL);
>>> +	pthread_cond_init(&failure_recovery_cond, NULL);
>>> +
>>>   	monitor_started = true;
>>>
>>>   	return 0;
>>> @@ -219,5 +353,9 @@ rte_dev_event_monitor_stop(void)
>>>   	close(intr_handle.fd);
>>>   	intr_handle.fd = -1;
>>>   	monitor_started = false;
>>> +
>>> +	pthread_cond_destroy(&failure_recovery_cond);
>>> +	pthread_mutex_destroy(&failure_recovery_lock);
>>> +
>>>   	return 0;
>>>   }
>>> diff --git a/lib/librte_eal/rte_eal_version.map
>>> b/lib/librte_eal/rte_eal_version.map
>>> index fc5c62a..873ef38 100644
>>> --- a/lib/librte_eal/rte_eal_version.map
>>> +++ b/lib/librte_eal/rte_eal_version.map
>>> @@ -262,5 +262,6 @@ EXPERIMENTAL {
>>>   	rte_dev_event_monitor_stop;
>>>   	rte_dev_event_callback_register;
>>>   	rte_dev_event_callback_unregister;
>>> +	rte_dev_handle_hot_unplug;
>>>
>>>   } DPDK_18.02;
>>> --
>>> 2.7.4

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [dpdk-dev] [PATCH V19 4/4] app/testpmd: use auto handle for hotplug
  2018-04-06 10:56     ` [dpdk-dev] [PATCH V19 4/4] app/testpmd: use auto handle for hotplug Jeff Guo
@ 2018-04-12  5:31       ` Matan Azrad
  2018-04-13 10:48         ` Guo, Jia
  0 siblings, 1 reply; 28+ messages in thread
From: Matan Azrad @ 2018-04-12  5:31 UTC (permalink / raw)
  To: Jeff Guo, stephen, bruce.richardson, ferruh.yigit,
	konstantin.ananyev, gaetan.rivet, jingjing.wu, Thomas Monjalon,
	Mordechay Haimovsky, harry.van.haaren, jianfeng.tan
  Cc: jblunck, shreyansh.jain, dev, helin.zhang

Hi All

From: Jeff Guo, Friday, April 6, 2018 1:57 PM
> Use testpmd for example, to show how an application smoothly handle
> failure when device be hot removal, and show how to auto bind kernal driver
> to preparing attach device when device being hot insertion.

I saw the kmod of old devices were saved by name and the app hope that new device name will be as previous saved name.
How can the application know which driver should be bind for any new device?

Moreover, I think that the application should not deal with driver binding\remap and all this should be managed by the relevant PMDs - 
Looks like the bind mechanism should be managed directly by the PMDs and not annoying application to deal with it.

Am I missing something?

> Signed-off-by: Jeff Guo <jia.guo@intel.com>
> ---
> v19->v18:
> clean code
> ---
>  app/test-pmd/testpmd.c | 199
> +++++++++++++++++++++++++++++++++++++++++++------
>  app/test-pmd/testpmd.h |   9 +++
>  2 files changed, 184 insertions(+), 24 deletions(-)
> 
> diff --git a/app/test-pmd/testpmd.c b/app/test-pmd/testpmd.c index
> d2c122a..d7fa913 100644
> --- a/app/test-pmd/testpmd.c
> +++ b/app/test-pmd/testpmd.c
> @@ -285,6 +285,8 @@ uint8_t lsc_interrupt = 1; /* enabled by default */
>   */
>  uint8_t rmv_interrupt = 1; /* enabled by default */
> 
> +#define HOT_PLUG_FOR_ALL_DEVICE -1
> +#define ALL_CALLBACK -1
>  uint8_t hot_plug = 0; /**< hotplug disabled by default. */
> 
>  /*
> @@ -387,6 +389,8 @@ uint8_t bitrate_enabled;  struct gro_status
> gro_ports[RTE_MAX_ETHPORTS];  uint8_t gro_flush_cycles =
> GRO_DEFAULT_FLUSH_CYCLES;
> 
> +static struct hotplug_request_list hp_list;
> +
>  /* Forward function declarations */
>  static void map_port_queue_stats_mapping_registers(portid_t pi,
>  						   struct rte_port *port);
> @@ -397,9 +401,13 @@ static int eth_event_callback(portid_t port_id,  static
> void eth_dev_event_callback(char *device_name,
>  				enum rte_dev_event_type type,
>  				void *param);
> -static int eth_dev_event_callback_register(void);
> -static int eth_dev_event_callback_unregister(void);
> +static int eth_dev_event_callback_register(portid_t port_id); static
> +int eth_dev_event_callback_unregister(portid_t port_id);
> +
> +static bool in_hotplug_list(const char *dev_name);
> 
> +static int hotplug_list_add(struct rte_device *device,
> +				enum rte_kernel_driver device_kdrv);
> 
>  /*
>   * Check if all the ports are started.
> @@ -1120,11 +1128,17 @@ run_pkt_fwd_on_lcore(struct fwd_lcore *fc,
> packet_fwd_t pkt_fwd)
>  	uint64_t tics_datum;
>  	uint64_t tics_current;
>  	uint8_t idx_port, cnt_ports;
> +	int ret;
> 
>  	cnt_ports = rte_eth_dev_count();
>  	tics_datum = rte_rdtsc();
>  	tics_per_1sec = rte_get_timer_hz();
>  #endif
> +	if (hot_plug) {
> +		ret = rte_dev_handle_hot_unplug();
> +		if (ret)
> +			printf("The device is being hot unplug!\n");
> +	}
>  	fsm = &fwd_streams[fc->stream_idx];
>  	nb_fs = fc->stream_nb;
>  	do {
> @@ -1863,15 +1877,24 @@ reset_port(portid_t pid)  }
> 
>  static int
> -eth_dev_event_callback_register(void)
> +eth_dev_event_callback_register(portid_t port_id)
>  {
>  	int ret;
> +	char *device_name;
> 
> +	/* if port id equal -1, unregister event callbacks for all device. */

unregister => register
And I think that HOT_PLUG_FOR_ALL_DEVICE is descriptive enough.


> +	if (port_id == (portid_t)HOT_PLUG_FOR_ALL_DEVICE) {
> +		device_name = NULL;
> +	} else {
> +		device_name = strdup(rte_eth_devices[port_id].device-
> >name);
> +		if (!device_name)
> +			return -1;
> +	}
>  	/* register the device event callback */
> -	ret = rte_dev_event_callback_register(NULL,
> -		eth_dev_event_callback, NULL);
> +	ret = rte_dev_event_callback_register(device_name,
> +		eth_dev_event_callback, (void *)(intptr_t)port_id);
>  	if (ret) {
> -		printf("Failed to register device event callback\n");
> +		printf("Failed to register device event callback.\n");
>  		return -1;
>  	}

You missed device_name freeing.

> @@ -1880,15 +1903,25 @@ eth_dev_event_callback_register(void)
> 
> 
>  static int
> -eth_dev_event_callback_unregister(void)
> +eth_dev_event_callback_unregister(portid_t port_id)
>  {
>  	int ret;
> +	char *device_name;
> +
> +	/* if port id equal -1, unregister all device event callbacks */
> +	if (port_id == (portid_t)HOT_PLUG_FOR_ALL_DEVICE) {
> +		device_name = NULL;
> +	} else {
> +		device_name = strdup(rte_eth_devices[port_id].device-
> >name);
> +		if (!device_name)
> +			return -1;
> +	}
> 
>  	/* unregister the device event callback */
> -	ret = rte_dev_event_callback_unregister(NULL,
> -		eth_dev_event_callback, NULL);
> -	if (ret < 0) {
> -		printf("Failed to unregister device event callback\n");
> +	ret = rte_dev_event_callback_unregister(device_name,
> +		eth_dev_event_callback, (void *)(intptr_t)port_id);

Also here.

> +	if (ret) {
> +		printf("Failed to unregister device event callback.\n");
>  		return -1;
>  	}
> 
> @@ -1911,6 +1944,8 @@ attach_port(char *identifier)
>  	if (rte_eth_dev_attach(identifier, &pi))
>  		return;
> 
> +	eth_dev_event_callback_register(pi);
> +
>  	socket_id = (unsigned)rte_eth_dev_socket_id(pi);
>  	/* if socket_id is invalid, set to 0 */
>  	if (check_socket_id(socket_id) < 0)
> @@ -1922,6 +1957,12 @@ attach_port(char *identifier)
> 
>  	ports[pi].port_status = RTE_PORT_STOPPED;
> 
> +	if (hot_plug) {
> +		hotplug_list_add(rte_eth_devices[pi].device,
> +				 rte_eth_devices[pi].data->kdrv);
> +		eth_dev_event_callback_register(pi);
> +	}
> +
>  	printf("Port %d is attached. Now total ports is %d\n", pi, nb_ports);
>  	printf("Done\n");
>  }
> @@ -1933,6 +1974,12 @@ detach_port(portid_t port_id)
> 
>  	printf("Detaching a port...\n");
> 
> +	if (hot_plug) {
> +		hotplug_list_add(rte_eth_devices[port_id].device,
> +				 rte_eth_devices[port_id].data->kdrv);
> +		eth_dev_event_callback_register(port_id);
> +	}
> +
>  	if (!port_is_closed(port_id)) {
>  		printf("Please close port first\n");
>  		return;
> @@ -1979,7 +2026,7 @@ pmd_test_exit(void)
>  			RTE_LOG(ERR, EAL,
>  				"fail to stop device event monitor.");
> 
> -		ret = eth_dev_event_callback_unregister();
> +		ret = eth_dev_event_callback_unregister(ALL_CALLBACK);
>  		if (ret)
>  			RTE_LOG(ERR, EAL,
>  				"fail to unregister all event callbacks."); @@ -
> 2068,6 +2115,40 @@ rmv_event_callback(void *arg)
>  			dev->device->name);
>  }
> 
> +static void
> +rmv_dev_event_callback(void *arg)
> +{
> +	uint8_t port_id = (intptr_t)arg;
> +
> +	rte_eal_alarm_cancel(rmv_dev_event_callback, arg);
> +
> +	RTE_ETH_VALID_PORTID_OR_RET(port_id);
> +	printf("removing port id:%u\n", port_id);
> +
> +	if (!in_hotplug_list(rte_eth_devices[port_id].device->name))
> +		return;
> +
> +	stop_packet_forwarding();
> +
> +	stop_port(port_id);
> +	close_port(port_id);
> +	detach_port(port_id);

Did you check the synchronization with the RMV event callback of ethdev? 

> +}
> +
> +static void
> +add_dev_event_callback(void *arg)
> +{
> +	char *dev_name = (char *)arg;
> +
> +	rte_eal_alarm_cancel(add_dev_event_callback, arg);
> +
> +	if (!in_hotplug_list(dev_name))
> +		return;
> +
> +	RTE_LOG(ERR, EAL, "add device: %s\n", dev_name);
> +	attach_port(dev_name);
> +}
> +
>  /* This function is used by the interrupt thread */  static int
> eth_event_callback(portid_t port_id, enum rte_eth_event_type type, void
> *param, @@ -2114,31 +2195,96 @@ eth_event_callback(portid_t port_id,
> enum rte_eth_event_type type, void *param,
>  	return 0;
>  }
> 
> +static bool
> +in_hotplug_list(const char *dev_name)
> +{
> +	struct hotplug_request *hp_request = NULL;
> +
> +	TAILQ_FOREACH(hp_request, &hp_list, next) {
> +		if (!strcmp(hp_request->dev_name, dev_name))
> +			break;
> +	}
> +
> +	if (hp_request)
> +		return true;
> +
> +	return false;
> +}
> +
> +static enum rte_kernel_driver
> +get_hotplug_driver(const char *dev_name) {
> +	struct hotplug_request *hp_request = NULL;
> +
> +	TAILQ_FOREACH(hp_request, &hp_list, next) {
> +		if (!strcmp(hp_request->dev_name, dev_name))
> +			return hp_request->dev_kdrv;
> +	}
> +	return -1;
> +}
> +
> +static int
> +hotplug_list_add(struct rte_device *device, enum rte_kernel_driver
> +device_kdrv) {
> +	struct hotplug_request *hp_request;
> +
> +	hp_request = rte_zmalloc("hoplug request",
> +			sizeof(*hp_request), 0);
> +	if (hp_request == NULL) {
> +		fprintf(stderr, "%s can not alloc memory\n",
> +			__func__);
> +		return -ENOMEM;
> +	}
> +
> +	hp_request->dev_name = device->name;
> +	hp_request->dev_kdrv = device_kdrv;
> +
> +	TAILQ_INSERT_TAIL(&hp_list, hp_request, next);
> +
> +	return 0;
> +}
> +
>  /* This function is used by the interrupt thread */  static void
> eth_dev_event_callback(char *device_name, enum rte_dev_event_type
> type,
> -			     __rte_unused void *arg)
> +				void *arg)
>  {
> +	static const char * const event_desc[] = {
> +		[RTE_DEV_EVENT_ADD] = "add",
> +		[RTE_DEV_EVENT_REMOVE] = "remove",
> +	};
> +	char *dev_name = malloc(strlen(device_name) + 1);
> +
> +	strcpy(dev_name, device_name);
> +
>  	if (type >= RTE_DEV_EVENT_MAX) {
>  		fprintf(stderr, "%s called upon invalid event %d\n",
>  			__func__, type);
>  		fflush(stderr);
> +	} else if (event_print_mask & (UINT32_C(1) << type)) {
> +		printf("%s event\n",
> +			event_desc[type]);
> +		fflush(stdout);
>  	}
>

You need to use port_id_is_invalid() to check the port validity.
 
>  	switch (type) {
>  	case RTE_DEV_EVENT_REMOVE:
> -		RTE_LOG(ERR, EAL, "The device: %s has been removed!\n",
> -			device_name);
> -		/* TODO: After finish failure handle, begin to stop
> -		 * packet forward, stop port, close port, detach port.
> -		 */
> +		if (rte_eal_alarm_set(100000,
> +			rmv_dev_event_callback, arg))
> +			fprintf(stderr, "Could not set up deferred "
> +				"device removal\n");
>  		break;
>  	case RTE_DEV_EVENT_ADD:
> -		RTE_LOG(ERR, EAL, "The device: %s has been added!\n",
> -			device_name);
> -		/* TODO: After finish kernel driver binding,
> -		 * begin to attach port.
> +		/**
> +		 * bind the driver to the device
> +		 * before process of hot plug adding device
>  		 */
> +		rte_dev_bind_kernel_driver(dev_name,
> +					   get_hotplug_driver(dev_name));
> +		if (rte_eal_alarm_set(100000,
> +			add_dev_event_callback, dev_name))
> +			fprintf(stderr, "Could not set up deferred "
> +				"device add\n");
>  		break;
>  	default:
>  		break;
> @@ -2638,8 +2784,13 @@ main(int argc, char** argv)
>  			rte_errno = EINVAL;
>  			return -1;
>  		}
> -		eth_dev_event_callback_register();
> -
> +		if (TAILQ_EMPTY(&hp_list))
> +			TAILQ_INIT(&hp_list);
> +		RTE_ETH_FOREACH_DEV(port_id) {
> +			hotplug_list_add(rte_eth_devices[port_id].device,
> +					 rte_eth_devices[port_id].data-
> >kdrv);
> +			eth_dev_event_callback_register(port_id);
> +		}
>  	}
> 
>  	if (start_port(RTE_PORT_ALL) != 0)
> diff --git a/app/test-pmd/testpmd.h b/app/test-pmd/testpmd.h index
> 8fde68d..c619e11 100644
> --- a/app/test-pmd/testpmd.h
> +++ b/app/test-pmd/testpmd.h
> @@ -63,6 +63,15 @@ typedef uint16_t streamid_t;
>  #define TM_MODE			0
>  #endif
> 
> +struct hotplug_request {
> +	TAILQ_ENTRY(hotplug_request) next; /**< Callbacks list */
> +	const char *dev_name;            /* request device name */
> +	enum rte_kernel_driver dev_kdrv;            /* kernel driver binded */
> +};
> +
> +/** @internal Structure to keep track of registered callbacks */
> +TAILQ_HEAD(hotplug_request_list, hotplug_request);
> +
>  enum {
>  	PORT_TOPOLOGY_PAIRED,
>  	PORT_TOPOLOGY_CHAINED,
> --
> 2.7.4

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [dpdk-dev] [PATCH V19 4/4] app/testpmd: use auto handle for hotplug
  2018-04-12  5:31       ` Matan Azrad
@ 2018-04-13 10:48         ` Guo, Jia
  2018-04-13 14:58           ` Matan Azrad
  0 siblings, 1 reply; 28+ messages in thread
From: Guo, Jia @ 2018-04-13 10:48 UTC (permalink / raw)
  To: Matan Azrad, stephen, bruce.richardson, ferruh.yigit,
	konstantin.ananyev, gaetan.rivet, jingjing.wu, Thomas Monjalon,
	Mordechay Haimovsky, harry.van.haaren, jianfeng.tan
  Cc: jblunck, shreyansh.jain, dev, helin.zhang

hi matan


On 4/12/2018 1:31 PM, Matan Azrad wrote:
> Hi All
>
> From: Jeff Guo, Friday, April 6, 2018 1:57 PM
>> Use testpmd for example, to show how an application smoothly handle
>> failure when device be hot removal, and show how to auto bind kernal driver
>> to preparing attach device when device being hot insertion.
> I saw the kmod of old devices were saved by name and the app hope that new device name will be as previous saved name.
> How can the application know which driver should be bind for any new device?
about which driver should be bind, that is assume that the app record 
the previous saved driver type,  if the device name match the name list 
,just auto bind the corresponding saved driver,
that is aim to application could bind driver at run time, we want to let 
application not detect the hotplug behaviors and running smoothly, that 
will benefit for further failsafe integrate and
live migration.
> Moreover, I think that the application should not deal with driver binding\remap and all this should be managed by the relevant PMDs -
> Looks like the bind mechanism should be managed directly by the PMDs and not annoying application to deal with it.
>
> Am I missing something?
  partial agree, binding/remap could be managed by PMD, or more by 
hotplug framework. but i don't found the problem to exposure a api to 
binding according app's policy. because, when
device hot insertion, the PMDs default to bind a kernel driver, then 
before running testpmd , we bind specific driver by "dpdk-devbind.py" to 
override the driver, then run app. that override is
also the user behavior but not PMDs, so i think the different just at 
run time or setup time. i don't know if i convinced you here.
>> Signed-off-by: Jeff Guo <jia.guo@intel.com>
>> ---
>> v19->v18:
>> clean code
>> ---
>>   app/test-pmd/testpmd.c | 199
>> +++++++++++++++++++++++++++++++++++++++++++------
>>   app/test-pmd/testpmd.h |   9 +++
>>   2 files changed, 184 insertions(+), 24 deletions(-)
>>
>> diff --git a/app/test-pmd/testpmd.c b/app/test-pmd/testpmd.c index
>> d2c122a..d7fa913 100644
>> --- a/app/test-pmd/testpmd.c
>> +++ b/app/test-pmd/testpmd.c
>> @@ -285,6 +285,8 @@ uint8_t lsc_interrupt = 1; /* enabled by default */
>>    */
>>   uint8_t rmv_interrupt = 1; /* enabled by default */
>>
>> +#define HOT_PLUG_FOR_ALL_DEVICE -1
>> +#define ALL_CALLBACK -1
>>   uint8_t hot_plug = 0; /**< hotplug disabled by default. */
>>
>>   /*
>> @@ -387,6 +389,8 @@ uint8_t bitrate_enabled;  struct gro_status
>> gro_ports[RTE_MAX_ETHPORTS];  uint8_t gro_flush_cycles =
>> GRO_DEFAULT_FLUSH_CYCLES;
>>
>> +static struct hotplug_request_list hp_list;
>> +
>>   /* Forward function declarations */
>>   static void map_port_queue_stats_mapping_registers(portid_t pi,
>>   						   struct rte_port *port);
>> @@ -397,9 +401,13 @@ static int eth_event_callback(portid_t port_id,  static
>> void eth_dev_event_callback(char *device_name,
>>   				enum rte_dev_event_type type,
>>   				void *param);
>> -static int eth_dev_event_callback_register(void);
>> -static int eth_dev_event_callback_unregister(void);
>> +static int eth_dev_event_callback_register(portid_t port_id); static
>> +int eth_dev_event_callback_unregister(portid_t port_id);
>> +
>> +static bool in_hotplug_list(const char *dev_name);
>>
>> +static int hotplug_list_add(struct rte_device *device,
>> +				enum rte_kernel_driver device_kdrv);
>>
>>   /*
>>    * Check if all the ports are started.
>> @@ -1120,11 +1128,17 @@ run_pkt_fwd_on_lcore(struct fwd_lcore *fc,
>> packet_fwd_t pkt_fwd)
>>   	uint64_t tics_datum;
>>   	uint64_t tics_current;
>>   	uint8_t idx_port, cnt_ports;
>> +	int ret;
>>
>>   	cnt_ports = rte_eth_dev_count();
>>   	tics_datum = rte_rdtsc();
>>   	tics_per_1sec = rte_get_timer_hz();
>>   #endif
>> +	if (hot_plug) {
>> +		ret = rte_dev_handle_hot_unplug();
>> +		if (ret)
>> +			printf("The device is being hot unplug!\n");
>> +	}
>>   	fsm = &fwd_streams[fc->stream_idx];
>>   	nb_fs = fc->stream_nb;
>>   	do {
>> @@ -1863,15 +1877,24 @@ reset_port(portid_t pid)  }
>>
>>   static int
>> -eth_dev_event_callback_register(void)
>> +eth_dev_event_callback_register(portid_t port_id)
>>   {
>>   	int ret;
>> +	char *device_name;
>>
>> +	/* if port id equal -1, unregister event callbacks for all device. */
> unregister => register
> And I think that HOT_PLUG_FOR_ALL_DEVICE is descriptive enough.
>
ok.
>> +	if (port_id == (portid_t)HOT_PLUG_FOR_ALL_DEVICE) {
>> +		device_name = NULL;
>> +	} else {
>> +		device_name = strdup(rte_eth_devices[port_id].device-
>>> name);
>> +		if (!device_name)
>> +			return -1;
>> +	}
>>   	/* register the device event callback */
>> -	ret = rte_dev_event_callback_register(NULL,
>> -		eth_dev_event_callback, NULL);
>> +	ret = rte_dev_event_callback_register(device_name,
>> +		eth_dev_event_callback, (void *)(intptr_t)port_id);
>>   	if (ret) {
>> -		printf("Failed to register device event callback\n");
>> +		printf("Failed to register device event callback.\n");
>>   		return -1;
>>   	}
> You missed device_name freeing.
got it.
>> @@ -1880,15 +1903,25 @@ eth_dev_event_callback_register(void)
>>
>>
>>   static int
>> -eth_dev_event_callback_unregister(void)
>> +eth_dev_event_callback_unregister(portid_t port_id)
>>   {
>>   	int ret;
>> +	char *device_name;
>> +
>> +	/* if port id equal -1, unregister all device event callbacks */
>> +	if (port_id == (portid_t)HOT_PLUG_FOR_ALL_DEVICE) {
>> +		device_name = NULL;
>> +	} else {
>> +		device_name = strdup(rte_eth_devices[port_id].device-
>>> name);
>> +		if (!device_name)
>> +			return -1;
>> +	}
>>
>>   	/* unregister the device event callback */
>> -	ret = rte_dev_event_callback_unregister(NULL,
>> -		eth_dev_event_callback, NULL);
>> -	if (ret < 0) {
>> -		printf("Failed to unregister device event callback\n");
>> +	ret = rte_dev_event_callback_unregister(device_name,
>> +		eth_dev_event_callback, (void *)(intptr_t)port_id);
> Also here.
>
>> +	if (ret) {
>> +		printf("Failed to unregister device event callback.\n");
>>   		return -1;
>>   	}
>>
>> @@ -1911,6 +1944,8 @@ attach_port(char *identifier)
>>   	if (rte_eth_dev_attach(identifier, &pi))
>>   		return;
>>
>> +	eth_dev_event_callback_register(pi);
>> +
>>   	socket_id = (unsigned)rte_eth_dev_socket_id(pi);
>>   	/* if socket_id is invalid, set to 0 */
>>   	if (check_socket_id(socket_id) < 0)
>> @@ -1922,6 +1957,12 @@ attach_port(char *identifier)
>>
>>   	ports[pi].port_status = RTE_PORT_STOPPED;
>>
>> +	if (hot_plug) {
>> +		hotplug_list_add(rte_eth_devices[pi].device,
>> +				 rte_eth_devices[pi].data->kdrv);
>> +		eth_dev_event_callback_register(pi);
>> +	}
>> +
>>   	printf("Port %d is attached. Now total ports is %d\n", pi, nb_ports);
>>   	printf("Done\n");
>>   }
>> @@ -1933,6 +1974,12 @@ detach_port(portid_t port_id)
>>
>>   	printf("Detaching a port...\n");
>>
>> +	if (hot_plug) {
>> +		hotplug_list_add(rte_eth_devices[port_id].device,
>> +				 rte_eth_devices[port_id].data->kdrv);
>> +		eth_dev_event_callback_register(port_id);
>> +	}
>> +
>>   	if (!port_is_closed(port_id)) {
>>   		printf("Please close port first\n");
>>   		return;
>> @@ -1979,7 +2026,7 @@ pmd_test_exit(void)
>>   			RTE_LOG(ERR, EAL,
>>   				"fail to stop device event monitor.");
>>
>> -		ret = eth_dev_event_callback_unregister();
>> +		ret = eth_dev_event_callback_unregister(ALL_CALLBACK);
>>   		if (ret)
>>   			RTE_LOG(ERR, EAL,
>>   				"fail to unregister all event callbacks."); @@ -
>> 2068,6 +2115,40 @@ rmv_event_callback(void *arg)
>>   			dev->device->name);
>>   }
>>
>> +static void
>> +rmv_dev_event_callback(void *arg)
>> +{
>> +	uint8_t port_id = (intptr_t)arg;
>> +
>> +	rte_eal_alarm_cancel(rmv_dev_event_callback, arg);
>> +
>> +	RTE_ETH_VALID_PORTID_OR_RET(port_id);
>> +	printf("removing port id:%u\n", port_id);
>> +
>> +	if (!in_hotplug_list(rte_eth_devices[port_id].device->name))
>> +		return;
>> +
>> +	stop_packet_forwarding();
>> +
>> +	stop_port(port_id);
>> +	close_port(port_id);
>> +	detach_port(port_id);
> Did you check the synchronization with the RMV event callback of ethdev?
i guess you mean the RMV event from which PMDs have implement this 
hotplug event. right?
>> +}
>> +
>> +static void
>> +add_dev_event_callback(void *arg)
>> +{
>> +	char *dev_name = (char *)arg;
>> +
>> +	rte_eal_alarm_cancel(add_dev_event_callback, arg);
>> +
>> +	if (!in_hotplug_list(dev_name))
>> +		return;
>> +
>> +	RTE_LOG(ERR, EAL, "add device: %s\n", dev_name);
>> +	attach_port(dev_name);
>> +}
>> +
>>   /* This function is used by the interrupt thread */  static int
>> eth_event_callback(portid_t port_id, enum rte_eth_event_type type, void
>> *param, @@ -2114,31 +2195,96 @@ eth_event_callback(portid_t port_id,
>> enum rte_eth_event_type type, void *param,
>>   	return 0;
>>   }
>>
>> +static bool
>> +in_hotplug_list(const char *dev_name)
>> +{
>> +	struct hotplug_request *hp_request = NULL;
>> +
>> +	TAILQ_FOREACH(hp_request, &hp_list, next) {
>> +		if (!strcmp(hp_request->dev_name, dev_name))
>> +			break;
>> +	}
>> +
>> +	if (hp_request)
>> +		return true;
>> +
>> +	return false;
>> +}
>> +
>> +static enum rte_kernel_driver
>> +get_hotplug_driver(const char *dev_name) {
>> +	struct hotplug_request *hp_request = NULL;
>> +
>> +	TAILQ_FOREACH(hp_request, &hp_list, next) {
>> +		if (!strcmp(hp_request->dev_name, dev_name))
>> +			return hp_request->dev_kdrv;
>> +	}
>> +	return -1;
>> +}
>> +
>> +static int
>> +hotplug_list_add(struct rte_device *device, enum rte_kernel_driver
>> +device_kdrv) {
>> +	struct hotplug_request *hp_request;
>> +
>> +	hp_request = rte_zmalloc("hoplug request",
>> +			sizeof(*hp_request), 0);
>> +	if (hp_request == NULL) {
>> +		fprintf(stderr, "%s can not alloc memory\n",
>> +			__func__);
>> +		return -ENOMEM;
>> +	}
>> +
>> +	hp_request->dev_name = device->name;
>> +	hp_request->dev_kdrv = device_kdrv;
>> +
>> +	TAILQ_INSERT_TAIL(&hp_list, hp_request, next);
>> +
>> +	return 0;
>> +}
>> +
>>   /* This function is used by the interrupt thread */  static void
>> eth_dev_event_callback(char *device_name, enum rte_dev_event_type
>> type,
>> -			     __rte_unused void *arg)
>> +				void *arg)
>>   {
>> +	static const char * const event_desc[] = {
>> +		[RTE_DEV_EVENT_ADD] = "add",
>> +		[RTE_DEV_EVENT_REMOVE] = "remove",
>> +	};
>> +	char *dev_name = malloc(strlen(device_name) + 1);
>> +
>> +	strcpy(dev_name, device_name);
>> +
>>   	if (type >= RTE_DEV_EVENT_MAX) {
>>   		fprintf(stderr, "%s called upon invalid event %d\n",
>>   			__func__, type);
>>   		fflush(stderr);
>> +	} else if (event_print_mask & (UINT32_C(1) << type)) {
>> +		printf("%s event\n",
>> +			event_desc[type]);
>> +		fflush(stdout);
>>   	}
>>
> You need to use port_id_is_invalid() to check the port validity.
>   
ok
>>   	switch (type) {
>>   	case RTE_DEV_EVENT_REMOVE:
>> -		RTE_LOG(ERR, EAL, "The device: %s has been removed!\n",
>> -			device_name);
>> -		/* TODO: After finish failure handle, begin to stop
>> -		 * packet forward, stop port, close port, detach port.
>> -		 */
>> +		if (rte_eal_alarm_set(100000,
>> +			rmv_dev_event_callback, arg))
>> +			fprintf(stderr, "Could not set up deferred "
>> +				"device removal\n");
>>   		break;
>>   	case RTE_DEV_EVENT_ADD:
>> -		RTE_LOG(ERR, EAL, "The device: %s has been added!\n",
>> -			device_name);
>> -		/* TODO: After finish kernel driver binding,
>> -		 * begin to attach port.
>> +		/**
>> +		 * bind the driver to the device
>> +		 * before process of hot plug adding device
>>   		 */
>> +		rte_dev_bind_kernel_driver(dev_name,
>> +					   get_hotplug_driver(dev_name));
>> +		if (rte_eal_alarm_set(100000,
>> +			add_dev_event_callback, dev_name))
>> +			fprintf(stderr, "Could not set up deferred "
>> +				"device add\n");
>>   		break;
>>   	default:
>>   		break;
>> @@ -2638,8 +2784,13 @@ main(int argc, char** argv)
>>   			rte_errno = EINVAL;
>>   			return -1;
>>   		}
>> -		eth_dev_event_callback_register();
>> -
>> +		if (TAILQ_EMPTY(&hp_list))
>> +			TAILQ_INIT(&hp_list);
>> +		RTE_ETH_FOREACH_DEV(port_id) {
>> +			hotplug_list_add(rte_eth_devices[port_id].device,
>> +					 rte_eth_devices[port_id].data-
>>> kdrv);
>> +			eth_dev_event_callback_register(port_id);
>> +		}
>>   	}
>>
>>   	if (start_port(RTE_PORT_ALL) != 0)
>> diff --git a/app/test-pmd/testpmd.h b/app/test-pmd/testpmd.h index
>> 8fde68d..c619e11 100644
>> --- a/app/test-pmd/testpmd.h
>> +++ b/app/test-pmd/testpmd.h
>> @@ -63,6 +63,15 @@ typedef uint16_t streamid_t;
>>   #define TM_MODE			0
>>   #endif
>>
>> +struct hotplug_request {
>> +	TAILQ_ENTRY(hotplug_request) next; /**< Callbacks list */
>> +	const char *dev_name;            /* request device name */
>> +	enum rte_kernel_driver dev_kdrv;            /* kernel driver binded */
>> +};
>> +
>> +/** @internal Structure to keep track of registered callbacks */
>> +TAILQ_HEAD(hotplug_request_list, hotplug_request);
>> +
>>   enum {
>>   	PORT_TOPOLOGY_PAIRED,
>>   	PORT_TOPOLOGY_CHAINED,
>> --
>> 2.7.4

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [dpdk-dev] [PATCH V19 4/4] app/testpmd: use auto handle for hotplug
  2018-04-13 10:48         ` Guo, Jia
@ 2018-04-13 14:58           ` Matan Azrad
  0 siblings, 0 replies; 28+ messages in thread
From: Matan Azrad @ 2018-04-13 14:58 UTC (permalink / raw)
  To: Guo, Jia, stephen, bruce.richardson, ferruh.yigit,
	konstantin.ananyev, gaetan.rivet, jingjing.wu, Thomas Monjalon,
	Mordechay Haimovsky, harry.van.haaren, jianfeng.tan
  Cc: jblunck, shreyansh.jain, dev, helin.zhang

Hi Guo

From: Guo, Jia, Friday, April 13, 2018 1:48 PM
> hi matan
> 
> 
> On 4/12/2018 1:31 PM, Matan Azrad wrote:
> > Hi All
> >
> > From: Jeff Guo, Friday, April 6, 2018 1:57 PM
> >> Use testpmd for example, to show how an application smoothly handle
> >> failure when device be hot removal, and show how to auto bind kernal
> >> driver to preparing attach device when device being hot insertion.
> > I saw the kmod of old devices were saved by name and the app hope that
> new device name will be as previous saved name.
> > How can the application know which driver should be bind for any new
> device?
> about which driver should be bind, that is assume that the app record the
> previous saved driver type,  if the device name match the name list ,just auto
> bind the corresponding saved driver, that is aim to application could bind
> driver at run time,

Looks like you assume that the device is always plugged in while the DPDK application starts, this is not true.

Hot-plug means the device may be plugged-out all the time even in the DPDK initialization.

In this case the plug-in process cannot be completely done by this implementation.

> > Moreover, I think that the application should not deal with driver
> > binding\remap and all this should be managed by the relevant PMDs -
> Looks like the bind mechanism should be managed directly by the PMDs and
> not annoying application to deal with it.
> >
> > Am I missing something?
>   partial agree, binding/remap could be managed by PMD, or more by hotplug
> framework. but i don't found the problem to exposure a api to binding
> according app's policy.

If the application asked the device to be probed (by whithlist\blaclist options in EAL command line)
The device should be probed and do the correct binding no need to request it from the user again,
because the PMD should have all the knowledge to do the bind in probe time.


> because, when device hot insertion, the PMDs default
> to bind a kernel driver, then before running testpmd , we bind specific driver
> by "dpdk-devbind.py" to override the driver, then run app.
> that override is  also the user behavior but not PMDs, so i think the different just at run time
> or setup time. i don't know if i convinced you here.

Maybe this is the time to deprecate that's need(bind device by script before dpdk running)  and to forward the binding responsibility to the PMDs.
So if the user asked the device it obviously should be bind to DPDK(\userland) so do the binding by DPDK PMDs.



> > Did you check the synchronization with the RMV event callback of ethdev?
> i guess you mean the RMV event from which PMDs have implement this
> hotplug event. right? 

Yes

^ permalink raw reply	[flat|nested] 28+ messages in thread

end of thread, other threads:[~2018-04-13 14:58 UTC | newest]

Thread overview: 28+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2018-04-03 18:17 [dpdk-dev] [PATCH V18 0/5] add hot plug failure and auto bind handler Jeff Guo
2018-04-03 18:17 ` [dpdk-dev] [PATCH V18 1/5] bus: introduce device hot unplug handle Jeff Guo
2018-04-04  4:31   ` Tan, Jianfeng
2018-04-06 10:54     ` Guo, Jia
2018-04-03 18:17 ` [dpdk-dev] [PATCH V18 2/5] bus/pci: implement handle hot unplug operation Jeff Guo
2018-04-04  5:25   ` Tan, Jianfeng
2018-04-06 10:57     ` Guo, Jia
2018-04-03 18:17 ` [dpdk-dev] [PATCH V18 3/5] eal: add failure handler mechanism for hot plug Jeff Guo
2018-04-04  2:58   ` Zhang, Qi Z
2018-04-06 10:53     ` Guo, Jia
2018-04-03 18:17 ` [dpdk-dev] [PATCH V18 4/5] eal: add driver auto bind for hot insertion Jeff Guo
2018-04-03 18:17 ` [dpdk-dev] [PATCH V18 5/5] app/testpmd: use auto handle for hotplug Jeff Guo
2018-04-06 10:56   ` [dpdk-dev] [PATCH V19 0/4] add hot plug failure and auto bind handler Jeff Guo
2018-04-06 10:56     ` [dpdk-dev] [PATCH V19 1/4] bus/pci: introduce device hot unplug handle Jeff Guo
2018-04-09 17:47       ` Ananyev, Konstantin
2018-04-11 11:37         ` Guo, Jia
2018-04-06 10:56     ` [dpdk-dev] [PATCH V19 2/4] eal: add failure handler mechanism for hot plug Jeff Guo
2018-04-06 14:03       ` Zhang, Qi Z
2018-04-06 14:24         ` Zhang, Qi Z
2018-04-11 11:50           ` Guo, Jia
2018-04-11 11:49         ` Guo, Jia
2018-04-09 17:42       ` Ananyev, Konstantin
2018-04-11 11:34         ` Guo, Jia
2018-04-06 10:56     ` [dpdk-dev] [PATCH V19 3/4] eal: add driver auto bind for hot insertion Jeff Guo
2018-04-06 10:56     ` [dpdk-dev] [PATCH V19 4/4] app/testpmd: use auto handle for hotplug Jeff Guo
2018-04-12  5:31       ` Matan Azrad
2018-04-13 10:48         ` Guo, Jia
2018-04-13 14:58           ` Matan Azrad

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).