DPDK patches and discussions
* [dpdk-dev] [RFC 00/14] mlx5: support SubFunction
@ 2021-05-27 13:37 Xueming Li
  2021-05-27 13:37 ` [dpdk-dev] [RFC 01/14] common/mlx5: add common device driver Xueming Li
                   ` (10 more replies)
  0 siblings, 11 replies; 42+ messages in thread
From: Xueming Li @ 2021-05-27 13:37 UTC (permalink / raw)
  To: Viacheslav Ovsiienko; +Cc: dev, xuemingl

A SubFunction (SF) [1] is a portion of a PCI device. An SF netdev has its
own dedicated queues (txq, rxq). An SF shares PCI-level resources with
other SFs and/or with its parent PCI function. The auxiliary bus [2] is
the foundation of SF support.

This patch set introduces SubFunction support for the mlx5 PMDs,
covering the net, regex, vdpa and compress classes.

Depends-on: series-16904 ("bus/auxiliary: introduce auxiliary bus")

Version history:
  RFC:
 	initial version

[1] SubFunction in kernel:
https://lore.kernel.org/netdev/20201112192424.2742-1-parav@nvidia.com/

[2] Auxiliary bus:
http://patchwork.dpdk.org/project/dpdk/patch/20210510134732.2174-1-xuemingl@nvidia.com/


Thomas Monjalon (5):
  common/mlx5: move description of PCI sysfs functions
  vdpa/mlx5: fix driver name
  vdpa/mlx5: remove PCI specifics
  common/mlx5: get PCI device address from any bus
  vdpa/mlx5: support SubFunction

Xueming Li (9):
  common/mlx5: add common device driver
  net/mlx5: remove PCI dependency
  net/mlx5: migrate to bus-agnostic common driver
  regex/mlx5: migrate to common driver
  compress/mlx5: migrate to common driver
  common/mlx5: clean up legacy PCI bus driver
  bus/auxiliary: introduce auxiliary bus
  common/mlx5: support auxiliary bus
  net/mlx5: support SubFunction

 MAINTAINERS                                   |   5 +
 doc/guides/nics/mlx5.rst                      | 339 ++++++++++-
 drivers/bus/auxiliary/auxiliary_common.c      | 408 +++++++++++++
 drivers/bus/auxiliary/auxiliary_params.c      |  58 ++
 drivers/bus/auxiliary/linux/auxiliary.c       | 147 +++++
 drivers/bus/auxiliary/meson.build             |  11 +
 drivers/bus/auxiliary/private.h               | 120 ++++
 drivers/bus/auxiliary/rte_bus_auxiliary.h     | 199 +++++++
 drivers/bus/auxiliary/version.map             |  10 +
 drivers/bus/meson.build                       |   1 +
 drivers/common/mlx5/linux/meson.build         |   3 +
 .../common/mlx5/linux/mlx5_common_auxiliary.c | 188 ++++++
 drivers/common/mlx5/linux/mlx5_common_os.c    |  53 +-
 drivers/common/mlx5/linux/mlx5_common_os.h    |   4 -
 drivers/common/mlx5/linux/mlx5_common_verbs.c |  24 +-
 drivers/common/mlx5/mlx5_common.c             | 340 ++++++++++-
 drivers/common/mlx5/mlx5_common.h             | 176 +++++-
 drivers/common/mlx5/mlx5_common_pci.c         | 554 ++++--------------
 drivers/common/mlx5/mlx5_common_pci.h         |  77 ---
 drivers/common/mlx5/mlx5_common_private.h     |  50 ++
 drivers/common/mlx5/version.map               |  14 +-
 drivers/compress/mlx5/mlx5_compress.c         |  71 +--
 drivers/net/mlx5/linux/mlx5_ethdev_os.c       |  14 +-
 drivers/net/mlx5/linux/mlx5_os.c              | 193 ++++--
 drivers/net/mlx5/linux/mlx5_os.h              |   5 +-
 drivers/net/mlx5/mlx5.c                       |  97 +--
 drivers/net/mlx5/mlx5.h                       |  12 +-
 drivers/net/mlx5/mlx5_ethdev.c                |   2 +-
 drivers/net/mlx5/mlx5_mr.c                    |  46 +-
 drivers/net/mlx5/mlx5_rxmode.c                |   8 +-
 drivers/net/mlx5/mlx5_rxtx.h                  |   9 +-
 drivers/net/mlx5/mlx5_trigger.c               |  14 +-
 drivers/net/mlx5/mlx5_txq.c                   |   2 +-
 drivers/net/mlx5/windows/mlx5_os.c            |  20 +-
 drivers/regex/mlx5/mlx5_regex.c               |  49 +-
 drivers/regex/mlx5/mlx5_regex.h               |   1 -
 drivers/vdpa/mlx5/mlx5_vdpa.c                 | 128 ++--
 drivers/vdpa/mlx5/mlx5_vdpa.h                 |   1 -
 38 files changed, 2532 insertions(+), 921 deletions(-)
 create mode 100644 drivers/bus/auxiliary/auxiliary_common.c
 create mode 100644 drivers/bus/auxiliary/auxiliary_params.c
 create mode 100644 drivers/bus/auxiliary/linux/auxiliary.c
 create mode 100644 drivers/bus/auxiliary/meson.build
 create mode 100644 drivers/bus/auxiliary/private.h
 create mode 100644 drivers/bus/auxiliary/rte_bus_auxiliary.h
 create mode 100644 drivers/bus/auxiliary/version.map
 create mode 100644 drivers/common/mlx5/linux/mlx5_common_auxiliary.c
 delete mode 100644 drivers/common/mlx5/mlx5_common_pci.h
 create mode 100644 drivers/common/mlx5/mlx5_common_private.h

-- 
2.25.1



* [dpdk-dev] [RFC 01/14] common/mlx5: add common device driver
  2021-05-27 13:37 [dpdk-dev] [RFC 00/14] mlx5: support SubFunction Xueming Li
@ 2021-05-27 13:37 ` Xueming Li
  2021-06-10  9:51   ` Thomas Monjalon
                     ` (15 more replies)
  2021-05-27 13:37 ` [dpdk-dev] [RFC 02/14] common/mlx5: move description of PCI sysfs functions Xueming Li
                   ` (9 subsequent siblings)
  10 siblings, 16 replies; 42+ messages in thread
From: Xueming Li @ 2021-05-27 13:37 UTC (permalink / raw)
  To: Viacheslav Ovsiienko
  Cc: dev, xuemingl, Matan Azrad, Shahaf Shuler, Ray Kinsella, Neil Horman

To support the auxiliary bus, introduce a common device driver and
callbacks, which are meant to replace the current mlx5 common PCI bus driver.

The mlx5 common PCI bus driver is still used by the mlx5 eth, vDPA and regex
PMDs; it will be removed once all PMDs have migrated to the new common driver.
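
The registration and probe flow of such a common driver can be sketched
outside DPDK as below. This is only an illustration under stated
assumptions, not the code of this patch: every sketch_* name and the
class bit values are hypothetical, the real callbacks take a
struct rte_device, and the unloading of already-probed classes on
failure is left out.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/queue.h>

/* Hypothetical class bits; the real MLX5_CLASS_* values are not shown here. */
#define SKETCH_CLASS_NET   (UINT32_C(1) << 0)
#define SKETCH_CLASS_REGEX (UINT32_C(1) << 1)

/* Stand-in for the common device: one entry per probed device. */
struct sketch_device {
	const char *name;
	uint32_t classes_loaded; /* Bitmask of classes probed so far. */
};

typedef int (sketch_probe_t)(struct sketch_device *dev);

/* Stand-in for a class driver: a class bit plus its probe callback. */
struct sketch_class_driver {
	TAILQ_ENTRY(sketch_class_driver) next;
	uint32_t drv_class;
	const char *name;
	sketch_probe_t *probe;
};

static TAILQ_HEAD(sketch_drivers, sketch_class_driver) sketch_drivers_list =
	TAILQ_HEAD_INITIALIZER(sketch_drivers_list);

/* Class drivers add themselves to the list, typically from a constructor. */
static void
sketch_class_driver_register(struct sketch_class_driver *drv)
{
	assert(drv != NULL && drv->probe != NULL);
	TAILQ_INSERT_TAIL(&sketch_drivers_list, drv, next);
}

/* Probe every registered driver whose class bit was requested. */
static int
sketch_dev_probe(struct sketch_device *dev, uint32_t user_classes)
{
	struct sketch_class_driver *drv;
	uint32_t enabled = 0;
	int ret;

	if (user_classes == 0)
		user_classes = SKETCH_CLASS_NET; /* Default to net class. */
	TAILQ_FOREACH(drv, &sketch_drivers_list, next) {
		if ((drv->drv_class & user_classes) == 0)
			continue;
		ret = drv->probe(dev);
		if (ret < 0)
			return ret; /* Rollback of 'enabled' omitted here. */
		enabled |= drv->drv_class;
	}
	dev->classes_loaded |= enabled;
	return 0;
}

/* Trivial probe callback used by both example drivers. */
static int
sketch_probe_ok(struct sketch_device *dev)
{
	(void)dev;
	return 0;
}

static struct sketch_class_driver sketch_net_driver = {
	.drv_class = SKETCH_CLASS_NET,
	.name = "net",
	.probe = sketch_probe_ok,
};

static struct sketch_class_driver sketch_regex_driver = {
	.drv_class = SKETCH_CLASS_REGEX,
	.name = "regex",
	.probe = sketch_probe_ok,
};
```

Since the bus-specific part is reduced to thin wrappers, the same probe
path can serve a PCI device and, later in the series, an auxiliary device.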

Signed-off-by: Xueming Li <xuemingl@nvidia.com>
---
 drivers/common/mlx5/linux/mlx5_common_verbs.c |  21 +-
 drivers/common/mlx5/mlx5_common.c             | 322 +++++++++++++++++-
 drivers/common/mlx5/mlx5_common.h             | 133 ++++++++
 drivers/common/mlx5/mlx5_common_pci.c         | 165 ++++++++-
 drivers/common/mlx5/mlx5_common_private.h     |  42 +++
 drivers/common/mlx5/version.map               |   5 +
 6 files changed, 672 insertions(+), 16 deletions(-)
 create mode 100644 drivers/common/mlx5/mlx5_common_private.h

diff --git a/drivers/common/mlx5/linux/mlx5_common_verbs.c b/drivers/common/mlx5/linux/mlx5_common_verbs.c
index aa560f05f2..a49440ef72 100644
--- a/drivers/common/mlx5/linux/mlx5_common_verbs.c
+++ b/drivers/common/mlx5/linux/mlx5_common_verbs.c
@@ -10,11 +10,31 @@
 #include <sys/mman.h>
 #include <inttypes.h>
 
+#include <rte_errno.h>
+#include <rte_bus_pci.h>
+
+#include "mlx5_common_log.h"
+#include "mlx5_common_utils.h"
+#include "mlx5_common_private.h"
 #include "mlx5_autoconf.h"
 #include <mlx5_glue.h>
 #include <mlx5_common.h>
 #include <mlx5_common_mr.h>
 
+struct ibv_device *
+mlx5_get_ibv_device(const struct rte_device *dev)
+{
+	struct ibv_device *ibv = NULL;
+
+	if (mlx5_dev_is_pci(dev))
+		ibv = mlx5_get_pci_ibv_device(&RTE_DEV_TO_PCI_CONST(dev)->addr);
+	if (ibv == NULL) {
+		rte_errno = ENODEV;
+		DRV_LOG(ERR, "Verbs device not found: %s", dev->name);
+	}
+	return ibv;
+}
+
 /**
  * Register mr. Given protection domain pointer, pointer to addr and length
  * register the memory region.
@@ -68,4 +88,3 @@ mlx5_common_verbs_dereg_mr(struct mlx5_pmd_mr *pmd_mr)
 		memset(pmd_mr, 0, sizeof(*pmd_mr));
 	}
 }
-
diff --git a/drivers/common/mlx5/mlx5_common.c b/drivers/common/mlx5/mlx5_common.c
index 25e9f09108..f2e2a95ae0 100644
--- a/drivers/common/mlx5/mlx5_common.c
+++ b/drivers/common/mlx5/mlx5_common.c
@@ -8,11 +8,14 @@
 
 #include <rte_errno.h>
 #include <rte_mempool.h>
+#include <rte_class.h>
+#include <rte_malloc.h>
 
 #include "mlx5_common.h"
 #include "mlx5_common_os.h"
 #include "mlx5_common_log.h"
 #include "mlx5_common_pci.h"
+#include "mlx5_common_private.h"
 
 uint8_t haswell_broadwell_cpu;
 
@@ -41,6 +44,321 @@ static inline void mlx5_cpu_id(unsigned int level,
 
 RTE_LOG_REGISTER_DEFAULT(mlx5_common_logtype, NOTICE)
 
+/* Head of list of drivers. */
+static TAILQ_HEAD(mlx5_drivers, mlx5_class_driver) drivers_list =
+				TAILQ_HEAD_INITIALIZER(drivers_list);
+
+/* Head of devices. */
+static TAILQ_HEAD(mlx5_devices, mlx5_common_device) devices_list =
+				TAILQ_HEAD_INITIALIZER(devices_list);
+
+static const struct {
+	const char *name;
+	unsigned int drv_class;
+} mlx5_classes[] = {
+	{ .name = "vdpa", .drv_class = MLX5_CLASS_VDPA },
+	{ .name = "net", .drv_class = MLX5_CLASS_NET },
+	{ .name = "regex", .drv_class = MLX5_CLASS_REGEX },
+	{ .name = "compress", .drv_class = MLX5_CLASS_COMPRESS },
+};
+
+static int
+class_name_to_value(const char *class_name)
+{
+	unsigned int i;
+
+	for (i = 0; i < RTE_DIM(mlx5_classes); i++) {
+		if (strcmp(class_name, mlx5_classes[i].name) == 0)
+			return mlx5_classes[i].drv_class;
+	}
+	return -EINVAL;
+}
+
+static struct mlx5_class_driver *
+driver_get(uint32_t class)
+{
+	struct mlx5_class_driver *driver;
+
+	TAILQ_FOREACH(driver, &drivers_list, next) {
+		if ((uint32_t)driver->drv_class == class)
+			return driver;
+	}
+	return NULL;
+}
+
+static int
+parse_class_option(const struct rte_devargs *devargs)
+{
+	const char *cls;
+	struct rte_kvargs *kvlist = NULL;
+	int ret = 0;
+
+	if (devargs == NULL)
+		return 0;
+	if (devargs->cls != NULL) {
+		/* Global syntax. */
+		cls = devargs->cls->name;
+	} else {
+		/* Legacy syntax. */
+		kvlist = rte_kvargs_parse(devargs->args, NULL);
+		if (kvlist == NULL)
+			return -EINVAL;
+		cls = rte_kvargs_get(kvlist, RTE_DEVARGS_KEY_CLASS);
+	}
+	if (cls != NULL)
+		ret = class_name_to_value(cls);
+	if (kvlist != NULL)
+		rte_kvargs_free(kvlist);
+	return ret;
+}
+
+static const unsigned int mlx5_class_invalid_combinations[] = {
+	MLX5_CLASS_NET | MLX5_CLASS_VDPA,
+	/* New class combination should be added here. */
+};
+
+static int
+is_valid_class_combination(uint32_t user_classes)
+{
+	unsigned int i;
+
+	/* Verify if user specified unsupported combination. */
+	for (i = 0; i < RTE_DIM(mlx5_class_invalid_combinations); i++) {
+		if ((mlx5_class_invalid_combinations[i] & user_classes) ==
+		    mlx5_class_invalid_combinations[i])
+			return -EINVAL;
+	}
+	/* Not found any invalid class combination. */
+	return 0;
+}
+
+static bool
+device_class_enabled(const struct mlx5_common_device *device, uint32_t class)
+{
+	return (device->classes_loaded & class) > 0;
+}
+
+static bool
+mlx5_bus_match(const struct mlx5_class_driver *drv,
+	       const struct rte_device *dev)
+{
+	if (mlx5_dev_is_pci(dev))
+		return mlx5_dev_pci_match(drv, dev);
+	return true;
+}
+
+static struct mlx5_common_device *
+to_mlx5_device(const struct rte_device *rte_dev)
+{
+	struct mlx5_common_device *dev;
+
+	TAILQ_FOREACH(dev, &devices_list, next) {
+		if (rte_dev == dev->dev)
+			return dev;
+	}
+	return NULL;
+}
+
+static void
+dev_release(struct mlx5_common_device *dev)
+{
+	TAILQ_REMOVE(&devices_list, dev, next);
+	rte_free(dev);
+}
+
+static int
+drivers_remove(struct mlx5_common_device *dev, uint32_t enabled_classes)
+{
+	struct mlx5_class_driver *driver;
+	int local_ret = -ENODEV;
+	unsigned int i = 0;
+	int ret = 0;
+
+	enabled_classes &= dev->classes_loaded;
+	while (enabled_classes) {
+		driver = driver_get(RTE_BIT64(i));
+		if (driver != NULL) {
+			local_ret = driver->remove(dev->dev);
+			if (local_ret == 0)
+				dev->classes_loaded &= ~RTE_BIT64(i);
+			else if (ret == 0)
+				ret = local_ret;
+		}
+		enabled_classes &= ~RTE_BIT64(i);
+		i++;
+	}
+	if (local_ret != 0 && ret == 0)
+		ret = local_ret;
+	return ret;
+}
+
+static int
+drivers_probe(struct mlx5_common_device *dev, uint32_t user_classes)
+{
+	struct mlx5_class_driver *driver;
+	uint32_t enabled_classes = 0;
+	bool already_loaded;
+	int ret;
+
+	TAILQ_FOREACH(driver, &drivers_list, next) {
+		if ((driver->drv_class & user_classes) == 0)
+			continue;
+		if (!mlx5_bus_match(driver, dev->dev))
+			continue;
+		already_loaded = dev->classes_loaded & driver->drv_class;
+		if (already_loaded && driver->probe_again == 0) {
+			DRV_LOG(ERR, "Device %s is already probed",
+				dev->dev->name);
+			ret = -EEXIST;
+			goto probe_err;
+		}
+		ret = driver->probe(dev->dev);
+		if (ret < 0) {
+			DRV_LOG(ERR, "Failed to load driver %s",
+				driver->name);
+			goto probe_err;
+		}
+		enabled_classes |= driver->drv_class;
+	}
+	dev->classes_loaded |= enabled_classes;
+	return 0;
+probe_err:
+	/* Only unload drivers which were enabled
+	 * in this probe instance.
+	 */
+	drivers_remove(dev, enabled_classes);
+	return ret;
+}
+
+int
+mlx5_common_dev_probe(struct rte_device *eal_dev)
+{
+	struct mlx5_common_device *dev;
+	uint32_t user_class = 0;
+	bool new_device = false;
+	int ret;
+
+	DRV_LOG(INFO, "probe device \"%s\".", eal_dev->name);
+	ret = parse_class_option(eal_dev->devargs);
+	if (ret < 0)
+		return ret;
+	user_class = ret;
+	if (user_class == 0)
+		/* Default to net class. */
+		user_class = MLX5_CLASS_NET;
+	dev = to_mlx5_device(eal_dev);
+	if (!dev) {
+		dev = rte_zmalloc("mlx5_common_device", sizeof(*dev), 0);
+		if (!dev)
+			return -ENOMEM;
+		dev->dev = eal_dev;
+		TAILQ_INSERT_HEAD(&devices_list, dev, next);
+		new_device = true;
+	} else {
+		/* Validate combination here. */
+		ret = is_valid_class_combination(user_class |
+						 dev->classes_loaded);
+		if (ret != 0) {
+			DRV_LOG(ERR, "Unsupported mlx5 classes supplied.");
+			return ret;
+		}
+	}
+	ret = drivers_probe(dev, user_class);
+	if (ret)
+		goto class_err;
+	return 0;
+class_err:
+	if (new_device)
+		dev_release(dev);
+	return ret;
+}
+
+int
+mlx5_common_dev_remove(struct rte_device *eal_dev)
+{
+	struct mlx5_common_device *dev;
+	int ret;
+
+	dev = to_mlx5_device(eal_dev);
+	if (!dev)
+		return -ENODEV;
+	/* Matching device found, cleanup and unload drivers. */
+	ret = drivers_remove(dev, dev->classes_loaded);
+	if (ret == 0)
+		dev_release(dev);
+	return ret;
+}
+
+int
+mlx5_common_dev_dma_map(struct rte_device *dev, void *addr, uint64_t iova,
+			size_t len)
+{
+	struct mlx5_class_driver *driver = NULL;
+	struct mlx5_class_driver *temp;
+	struct mlx5_common_device *mdev;
+	int ret = -EINVAL;
+
+	mdev = to_mlx5_device(dev);
+	if (!mdev)
+		return -ENODEV;
+	TAILQ_FOREACH(driver, &drivers_list, next) {
+		if (!device_class_enabled(mdev, driver->drv_class) ||
+		    driver->dma_map == NULL)
+			continue;
+		ret = driver->dma_map(dev, addr, iova, len);
+		if (ret)
+			goto map_err;
+	}
+	return ret;
+map_err:
+	TAILQ_FOREACH(temp, &drivers_list, next) {
+		if (temp == driver)
+			break;
+		if (device_class_enabled(mdev, temp->drv_class) &&
+		    temp->dma_map && temp->dma_unmap)
+			temp->dma_unmap(dev, addr, iova, len);
+	}
+	return ret;
+}
+
+int
+mlx5_common_dev_dma_unmap(struct rte_device *dev, void *addr, uint64_t iova,
+			  size_t len)
+{
+	struct mlx5_class_driver *driver;
+	struct mlx5_common_device *mdev;
+	int local_ret = -EINVAL;
+	int ret = 0;
+
+	mdev = to_mlx5_device(dev);
+	if (!mdev)
+		return -ENODEV;
+	/* There is no unmap error recovery in current implementation. */
+	TAILQ_FOREACH_REVERSE(driver, &drivers_list, mlx5_drivers, next) {
+		if (!device_class_enabled(mdev, driver->drv_class) ||
+		    driver->dma_unmap == NULL)
+			continue;
+		local_ret = driver->dma_unmap(dev, addr, iova, len);
+		if (local_ret && (ret == 0))
+			ret = local_ret;
+	}
+	if (local_ret)
+		ret = local_ret;
+	return ret;
+}
+
+void
+mlx5_class_driver_register(struct mlx5_class_driver *driver)
+{
+	mlx5_common_driver_on_register_pci(driver);
+	TAILQ_INSERT_TAIL(&drivers_list, driver, next);
+}
+
+static void mlx5_common_driver_init(void)
+{
+	mlx5_common_pci_init();
+}
+
 static bool mlx5_common_initialized;
 
 /**
@@ -55,7 +373,7 @@ mlx5_common_init(void)
 		return;
 
 	mlx5_glue_constructor();
-	mlx5_common_pci_init();
+	mlx5_common_driver_init();
 	mlx5_common_initialized = true;
 }
 
@@ -214,3 +532,5 @@ mlx5_devx_alloc_uar(void *ctx, int mapping)
 exit:
 	return uar;
 }
+
+RTE_PMD_EXPORT_NAME(mlx5_class_driver, __COUNTER__);
diff --git a/drivers/common/mlx5/mlx5_common.h b/drivers/common/mlx5/mlx5_common.h
index 306f2f1ab7..0e67a36d0c 100644
--- a/drivers/common/mlx5/mlx5_common.h
+++ b/drivers/common/mlx5/mlx5_common.h
@@ -10,6 +10,7 @@
 #include <rte_pci.h>
 #include <rte_debug.h>
 #include <rte_atomic.h>
+#include <rte_rwlock.h>
 #include <rte_log.h>
 #include <rte_kvargs.h>
 #include <rte_devargs.h>
@@ -134,6 +135,11 @@ enum {
 	PCI_DEVICE_ID_MELLANOX_CONNECTX7BF = 0Xa2dc,
 };
 
+struct ibv_device;
+
+__rte_internal
+struct ibv_device *mlx5_get_ibv_device(const struct rte_device *dev);
+
 /* Maximum number of simultaneous unicast MAC addresses. */
 #define MLX5_MAX_UC_MAC_ADDRESSES 128
 /* Maximum number of simultaneous Multicast MAC addresses. */
@@ -242,4 +248,131 @@ extern uint8_t haswell_broadwell_cpu;
 __rte_internal
 void mlx5_common_init(void);
 
+/**
+ * Common Driver Interface
+ * The ConnectX common driver supports multiple classes: net, vdpa, regex,
+ * crypto and compress devices. This layer allows devices of several classes
+ * to be created on a single device by letting multiple class-specific
+ * drivers attach to the common driver.
+ *
+ *                        ----------------
+ *                        |   mlx5 class |
+ *                        |    drivers   |
+ *                        ----------------
+ *                               ||
+ *                        -----------------
+ *                        |     mlx5      |
+ *                        | common driver |
+ *                        -----------------
+ *                          |          |
+ *                 -----------        -----------------
+ *                 |   mlx5  |        |   mlx5        |
+ *                 | pci dev |        | auxiliary dev |
+ *                 -----------        -----------------
+ *
+ * - mlx5 pci bus driver binds to mlx5 PCI devices defined by PCI
+ *   ID table of all related mlx5 PCI devices.
+ * - mlx5 class driver such as net, vdpa, regex PMD defines its
+ *   specific PCI ID table and mlx5 bus driver probes matching
+ *   class drivers.
+ * - mlx5 common driver is the central place that validates supported
+ *   class combinations.
+ * - mlx5 common driver hides bus differences by resolving the device address
+ *   from devargs, locating the target RDMA device and probing with it.
+ */
+
+/**
+ * Initialization function for the driver called during device probing.
+ */
+typedef int (mlx5_class_driver_probe_t)(struct rte_device *dev);
+
+/**
+ * Uninitialization function for the driver called during hot-unplugging.
+ */
+typedef int (mlx5_class_driver_remove_t)(struct rte_device *dev);
+
+/**
+ * Driver-specific DMA mapping. After a successful call the device
+ * will be able to read/write from/to this segment.
+ *
+ * @param dev
+ *   Pointer to the device.
+ * @param addr
+ *   Starting virtual address of memory to be mapped.
+ * @param iova
+ *   Starting IOVA address of memory to be mapped.
+ * @param len
+ *   Length of memory segment being mapped.
+ * @return
+ *   - 0 On success.
+ *   - Negative value and rte_errno is set otherwise.
+ */
+typedef int (mlx5_class_driver_dma_map_t)(struct rte_device *dev, void *addr,
+					  uint64_t iova, size_t len);
+
+/**
+ * Driver-specific DMA un-mapping. After a successful call the device
+ * will not be able to read/write from/to this segment.
+ *
+ * @param dev
+ *   Pointer to the device.
+ * @param addr
+ *   Starting virtual address of memory to be unmapped.
+ * @param iova
+ *   Starting IOVA address of memory to be unmapped.
+ * @param len
+ *   Length of memory segment being unmapped.
+ * @return
+ *   - 0 On success.
+ *   - Negative value and rte_errno is set otherwise.
+ */
+typedef int (mlx5_class_driver_dma_unmap_t)(struct rte_device *dev, void *addr,
+					    uint64_t iova, size_t len);
+
+/** Device already probed can be probed again to check for new ports. */
+#define MLX5_DRV_PROBE_AGAIN 0x0004
+
+/**
+ * A structure describing a mlx5 common class driver.
+ */
+struct mlx5_class_driver {
+	TAILQ_ENTRY(mlx5_class_driver) next;
+	enum mlx5_class drv_class;                /**< Class of this driver. */
+	const char *name;                         /**< Driver name. */
+	mlx5_class_driver_probe_t *probe;         /**< Device Probe function. */
+	mlx5_class_driver_remove_t *remove;       /**< Device Remove function. */
+	mlx5_class_driver_dma_map_t *dma_map;	  /**< device dma map function. */
+	mlx5_class_driver_dma_unmap_t *dma_unmap; /**< device dma unmap function. */
+	const struct rte_pci_id *id_table;        /**< ID table, NULL terminated. */
+	uint32_t probe_again:1;
+	/**< Device already probed can be probed again to check new device. */
+	uint32_t intr_lsc:1; /**< Supports link state interrupt. */
+	uint32_t intr_rmv:1; /**< Supports device remove interrupt. */
+};
+
+/**
+ * Register a mlx5 device driver.
+ *
+ * @param driver
+ *   A pointer to a mlx5_class_driver structure describing the driver
+ *   to be registered.
+ */
+__rte_internal
+void
+mlx5_class_driver_register(struct mlx5_class_driver *driver);
+
+/**
+ * Test whether a device is a PCI bus device.
+ *
+ * @param dev
+ *   Pointer to device.
+ *
+ * @return
+ *   - True if the device devargs refer to a PCI bus device.
+ *   - False otherwise.
+ */
+__rte_internal
+bool
+mlx5_dev_is_pci(const struct rte_device *dev);
+
 #endif /* RTE_PMD_MLX5_COMMON_H_ */
diff --git a/drivers/common/mlx5/mlx5_common_pci.c b/drivers/common/mlx5/mlx5_common_pci.c
index 34747c4e07..53090173a2 100644
--- a/drivers/common/mlx5/mlx5_common_pci.c
+++ b/drivers/common/mlx5/mlx5_common_pci.c
@@ -3,11 +3,19 @@
  */
 
 #include <stdlib.h>
+
 #include <rte_malloc.h>
+#include <rte_devargs.h>
+#include <rte_errno.h>
 #include <rte_class.h>
 
 #include "mlx5_common_log.h"
 #include "mlx5_common_pci.h"
+#include "mlx5_common_private.h"
+
+static struct rte_pci_driver mlx5_common_pci_driver;
+
+/********** Legacy PCI bus driver, to be removed ********/
 
 struct mlx5_pci_device {
 	struct rte_pci_device *pci_dev;
@@ -282,8 +290,8 @@ drivers_probe(struct mlx5_pci_device *dev, struct rte_pci_driver *pci_drv,
  *   0 on success, a negative errno value otherwise and rte_errno is set.
  */
 static int
-mlx5_common_pci_probe(struct rte_pci_driver *pci_drv __rte_unused,
-		      struct rte_pci_device *pci_dev)
+mlx5_pci_probe(struct rte_pci_driver *pci_drv __rte_unused,
+	       struct rte_pci_device *pci_dev)
 {
 	struct mlx5_pci_device *dev;
 	uint32_t user_classes = 0;
@@ -336,7 +344,7 @@ mlx5_common_pci_probe(struct rte_pci_driver *pci_drv __rte_unused,
  *   0 on success, the function cannot fail.
  */
 static int
-mlx5_common_pci_remove(struct rte_pci_device *pci_dev)
+mlx5_pci_remove(struct rte_pci_device *pci_dev)
 {
 	struct mlx5_pci_device *dev;
 	int ret;
@@ -352,8 +360,8 @@ mlx5_common_pci_remove(struct rte_pci_device *pci_dev)
 }
 
 static int
-mlx5_common_pci_dma_map(struct rte_pci_device *pci_dev, void *addr,
-			uint64_t iova, size_t len)
+mlx5_pci_dma_map(struct rte_pci_device *pci_dev, void *addr,
+		 uint64_t iova, size_t len)
 {
 	struct mlx5_pci_driver *driver = NULL;
 	struct mlx5_pci_driver *temp;
@@ -385,8 +393,8 @@ mlx5_common_pci_dma_map(struct rte_pci_device *pci_dev, void *addr,
 }
 
 static int
-mlx5_common_pci_dma_unmap(struct rte_pci_device *pci_dev, void *addr,
-			  uint64_t iova, size_t len)
+mlx5_pci_dma_unmap(struct rte_pci_device *pci_dev, void *addr,
+		   uint64_t iova, size_t len)
 {
 	struct mlx5_pci_driver *driver;
 	struct mlx5_pci_device *dev;
@@ -419,10 +427,10 @@ static struct rte_pci_driver mlx5_pci_driver = {
 	.driver = {
 		.name = MLX5_PCI_DRIVER_NAME,
 	},
-	.probe = mlx5_common_pci_probe,
-	.remove = mlx5_common_pci_remove,
-	.dma_map = mlx5_common_pci_dma_map,
-	.dma_unmap = mlx5_common_pci_dma_unmap,
+	.probe = mlx5_pci_probe,
+	.remove = mlx5_pci_remove,
+	.dma_map = mlx5_pci_dma_map,
+	.dma_unmap = mlx5_pci_dma_unmap,
 };
 
 static int
@@ -486,7 +494,7 @@ pci_ids_table_update(const struct rte_pci_id *driver_id_table)
 	updated_table = calloc(num_ids, sizeof(*updated_table));
 	if (!updated_table)
 		return -ENOMEM;
-	if (TAILQ_EMPTY(&drv_list)) {
+	if (old_table == NULL) {
 		/* Copy the first driver's ID table. */
 		for (id_iter = driver_id_table; id_iter->vendor_id != 0;
 		     id_iter++, i++)
@@ -502,6 +510,7 @@ pci_ids_table_update(const struct rte_pci_id *driver_id_table)
 	/* Terminate table with empty entry. */
 	updated_table[i].vendor_id = 0;
 	mlx5_pci_driver.id_table = updated_table;
+	mlx5_common_pci_driver.id_table = updated_table;
 	mlx5_pci_id_table = updated_table;
 	if (old_table)
 		free(old_table);
@@ -520,6 +529,133 @@ mlx5_pci_driver_register(struct mlx5_pci_driver *driver)
 	TAILQ_INSERT_TAIL(&drv_list, driver, next);
 }
 
+/********** New common PCI bus driver ********/
+
+bool
+mlx5_dev_is_pci(const struct rte_device *dev)
+{
+	struct rte_devargs *da = dev->devargs;
+
+	if (da == NULL || da->bus == NULL)
+		return false;
+	return strcmp(da->bus->name, "pci") == 0;
+}
+
+struct ibv_device *
+mlx5_get_pci_ibv_device(const struct rte_pci_addr *addr)
+{
+	int n;
+	struct ibv_device **ibv_list = mlx5_glue->get_device_list(&n);
+	struct ibv_device *ibv_match = NULL;
+
+	if (!ibv_list) {
+		rte_errno = ENOSYS;
+		return NULL;
+	}
+	while (n-- > 0) {
+		struct rte_pci_addr pci_addr;
+
+		DRV_LOG(DEBUG, "Checking device \"%s\"..", ibv_list[n]->name);
+		if (mlx5_get_pci_addr(ibv_list[n]->ibdev_path, &pci_addr))
+			continue;
+		if (rte_pci_addr_cmp(addr, &pci_addr))
+			continue;
+		ibv_match = ibv_list[n];
+		break;
+	}
+	if (!ibv_match)
+		rte_errno = ENOENT;
+	mlx5_glue->free_device_list(ibv_list);
+	return ibv_match;
+}
+
+bool
+mlx5_dev_pci_match(const struct mlx5_class_driver *drv,
+		   const struct rte_device *dev)
+{
+	const struct rte_pci_device *pci_dev;
+	const struct rte_pci_id *id_table;
+
+	if (!mlx5_dev_is_pci(dev))
+		return false;
+	pci_dev = RTE_DEV_TO_PCI_CONST(dev);
+	for (id_table = drv->id_table; id_table->vendor_id != 0;
+	     id_table++) {
+		/* Check if device's ids match the class driver's ids. */
+		if (id_table->vendor_id != pci_dev->id.vendor_id &&
+		    id_table->vendor_id != RTE_PCI_ANY_ID)
+			continue;
+		if (id_table->device_id != pci_dev->id.device_id &&
+		    id_table->device_id != RTE_PCI_ANY_ID)
+			continue;
+		if (id_table->subsystem_vendor_id !=
+		    pci_dev->id.subsystem_vendor_id &&
+		    id_table->subsystem_vendor_id != RTE_PCI_ANY_ID)
+			continue;
+		if (id_table->subsystem_device_id !=
+		    pci_dev->id.subsystem_device_id &&
+		    id_table->subsystem_device_id != RTE_PCI_ANY_ID)
+			continue;
+		if (id_table->class_id != pci_dev->id.class_id &&
+		    id_table->class_id != RTE_CLASS_ANY_ID)
+			continue;
+		return true;
+	}
+	return false;
+}
+
+static int
+mlx5_common_pci_probe(struct rte_pci_driver *pci_drv __rte_unused,
+		      struct rte_pci_device *pci_dev)
+{
+	return mlx5_common_dev_probe(&pci_dev->device);
+}
+
+static int
+mlx5_common_pci_remove(struct rte_pci_device *pci_dev)
+{
+	return mlx5_common_dev_remove(&pci_dev->device);
+}
+
+static int
+mlx5_common_pci_dma_map(struct rte_pci_device *pci_dev, void *addr,
+			uint64_t iova, size_t len)
+{
+	return mlx5_common_dev_dma_map(&pci_dev->device, addr, iova, len);
+}
+
+static int
+mlx5_common_pci_dma_unmap(struct rte_pci_device *pci_dev, void *addr,
+			  uint64_t iova, size_t len)
+{
+	return mlx5_common_dev_dma_unmap(&pci_dev->device, addr, iova, len);
+}
+
+void
+mlx5_common_driver_on_register_pci(struct mlx5_class_driver *driver)
+{
+	if (driver->id_table != NULL) {
+		if (pci_ids_table_update(driver->id_table) != 0)
+			return;
+	}
+	if (driver->probe_again)
+		mlx5_common_pci_driver.drv_flags |= RTE_PCI_DRV_PROBE_AGAIN;
+	if (driver->intr_lsc)
+		mlx5_common_pci_driver.drv_flags |= RTE_PCI_DRV_INTR_LSC;
+	if (driver->intr_rmv)
+		mlx5_common_pci_driver.drv_flags |= RTE_PCI_DRV_INTR_RMV;
+}
+
+static struct rte_pci_driver mlx5_common_pci_driver = {
+	.driver = {
+		   .name = MLX5_PCI_DRIVER_NAME,
+	},
+	.probe = mlx5_common_pci_probe,
+	.remove = mlx5_common_pci_remove,
+	.dma_map = mlx5_common_pci_dma_map,
+	.dma_unmap = mlx5_common_pci_dma_unmap,
+};
+
 void mlx5_common_pci_init(void)
 {
 	const struct rte_pci_id empty_table[] = {
@@ -535,7 +671,7 @@ void mlx5_common_pci_init(void)
 	 */
 	if (mlx5_pci_id_table == NULL && pci_ids_table_update(empty_table))
 		return;
-	rte_pci_register(&mlx5_pci_driver);
+	rte_pci_register(&mlx5_common_pci_driver);
 }
 
 RTE_FINI(mlx5_common_pci_finish)
@@ -544,8 +680,9 @@ RTE_FINI(mlx5_common_pci_finish)
 		/* Constructor doesn't register with PCI bus if it failed
 		 * to build the table.
 		 */
-		rte_pci_unregister(&mlx5_pci_driver);
+		rte_pci_unregister(&mlx5_common_pci_driver);
 		free(mlx5_pci_id_table);
 	}
 }
+
 RTE_PMD_EXPORT_NAME(mlx5_common_pci, __COUNTER__);
diff --git a/drivers/common/mlx5/mlx5_common_private.h b/drivers/common/mlx5/mlx5_common_private.h
new file mode 100644
index 0000000000..72df9aef35
--- /dev/null
+++ b/drivers/common/mlx5/mlx5_common_private.h
@@ -0,0 +1,42 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright 2021 Mellanox Technologies, Ltd
+ */
+
+#ifndef _MLX5_COMMON_PRIVATE_H_
+#define _MLX5_COMMON_PRIVATE_H_
+
+#include <rte_pci.h>
+
+#include "mlx5_common.h"
+
+#ifdef __cplusplus
+extern "C" {
+#endif /* __cplusplus */
+
+/* Common bus driver: */
+
+struct mlx5_common_device {
+	struct rte_device *dev;
+	TAILQ_ENTRY(mlx5_common_device) next;
+	uint32_t classes_loaded;
+};
+
+int mlx5_common_dev_probe(struct rte_device *eal_dev);
+int mlx5_common_dev_remove(struct rte_device *eal_dev);
+int mlx5_common_dev_dma_map(struct rte_device *dev, void *addr, uint64_t iova,
+			    size_t len);
+int mlx5_common_dev_dma_unmap(struct rte_device *dev, void *addr, uint64_t iova,
+			      size_t len);
+
+/* Common PCI bus driver: */
+
+void mlx5_common_driver_on_register_pci(struct mlx5_class_driver *driver);
+bool mlx5_dev_pci_match(const struct mlx5_class_driver *drv,
+			const struct rte_device *dev);
+struct ibv_device *mlx5_get_pci_ibv_device(const struct rte_pci_addr *);
+
+#ifdef __cplusplus
+}
+#endif /* __cplusplus */
+
+#endif /* _MLX5_COMMON_PRIVATE_H_ */
diff --git a/drivers/common/mlx5/version.map b/drivers/common/mlx5/version.map
index db4f13f1f7..3c21719975 100644
--- a/drivers/common/mlx5/version.map
+++ b/drivers/common/mlx5/version.map
@@ -3,6 +3,8 @@ INTERNAL {
 
 	haswell_broadwell_cpu;
 
+	mlx5_class_driver_register;
+
 	mlx5_common_init;
 
 	mlx5_common_verbs_reg_mr; # WINDOWS_NO_EXPORT
@@ -10,6 +12,7 @@ INTERNAL {
 
 	mlx5_create_mr_ext;
 
+	mlx5_dev_is_pci;
 	mlx5_dev_to_pci_addr; # WINDOWS_NO_EXPORT
 
 	mlx5_devx_alloc_uar; # WINDOWS_NO_EXPORT
@@ -69,6 +72,8 @@ INTERNAL {
 
 	mlx5_free;
 
+	mlx5_get_ibv_device; # WINDOWS_NO_EXPORT
+
 	mlx5_get_ifname_sysfs; # WINDOWS_NO_EXPORT
 
 	mlx5_glue;
-- 
2.25.1



* [dpdk-dev] [RFC 02/14] common/mlx5: move description of PCI sysfs functions
  2021-05-27 13:37 [dpdk-dev] [RFC 00/14] mlx5: support SubFunction Xueming Li
  2021-05-27 13:37 ` [dpdk-dev] [RFC 01/14] common/mlx5: add common device driver Xueming Li
@ 2021-05-27 13:37 ` Xueming Li
  2021-05-27 13:37 ` [dpdk-dev] [RFC 03/14] net/mlx5: remove PCI dependency Xueming Li
                   ` (8 subsequent siblings)
  10 siblings, 0 replies; 42+ messages in thread
From: Xueming Li @ 2021-05-27 13:37 UTC (permalink / raw)
  To: Viacheslav Ovsiienko
  Cc: Thomas Monjalon, dev, xuemingl, Matan Azrad, Shahaf Shuler

From: Thomas Monjalon <thomas@monjalon.net>

The Linux-specific functions mlx5_get_pci_addr() and
mlx5_get_ifname_sysfs() are better described in the .h file.

The requirement for using mlx5_get_pci_addr() is made explicit:
the node /device must exist in the provided sysfs path.
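
The requirement above can be illustrated with a small standalone sketch.
It assumes that the PCI address is read from the "device/uevent" file
under the given sysfs path and that this file carries a line of the form
PCI_SLOT_NAME=dddd:bb:dd.f, which is how Linux exposes it for PCI
devices. All sketch_* names are hypothetical and this is not the driver
implementation.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical stand-in for struct rte_pci_addr. */
struct sketch_pci_addr {
	uint32_t domain;
	uint8_t bus;
	uint8_t devid;
	uint8_t function;
};

/* Parse one uevent line; return 0 only for a valid PCI_SLOT_NAME entry. */
static int
sketch_parse_slot_name(const char *line, struct sketch_pci_addr *out)
{
	static const char key[] = "PCI_SLOT_NAME=";
	unsigned int domain, bus, devid, function;

	assert(line != NULL && out != NULL);
	if (strncmp(line, key, sizeof(key) - 1) != 0)
		return -1;
	if (sscanf(line + sizeof(key) - 1, "%x:%x:%x.%x",
		   &domain, &bus, &devid, &function) != 4)
		return -1;
	if (bus > 0xff || devid > 0x1f || function > 0x7)
		return -1;
	out->domain = domain;
	out->bus = (uint8_t)bus;
	out->devid = (uint8_t)devid;
	out->function = (uint8_t)function;
	return 0;
}

/*
 * Resolve the PCI address behind a sysfs path such as an IB device path.
 * The path itself is not the PCI device: its "device" node is followed.
 */
static int
sketch_pci_addr_from_sysfs(const char *dev_path, struct sketch_pci_addr *out)
{
	char path[4096];
	char line[128];
	FILE *file;
	int len;
	int ret = -1;

	len = snprintf(path, sizeof(path), "%s/device/uevent", dev_path);
	if (len < 0 || (size_t)len >= sizeof(path))
		return -1;
	file = fopen(path, "r");
	if (file == NULL)
		return -1; /* No /device node: the requirement is not met. */
	while (fgets(line, sizeof(line), file) != NULL) {
		if (sscanf(line, "%*[^=]") == EOF)
			continue;
		if (sketch_parse_slot_name(line, out) == 0) {
			ret = 0;
			break;
		}
	}
	fclose(file);
	return ret;
}
```

A path without a /device node fails at fopen(), which is the case the
updated description warns about.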

Signed-off-by: Thomas Monjalon <thomas@monjalon.net>
---
 drivers/common/mlx5/linux/mlx5_common_os.c | 22 ------------------
 drivers/common/mlx5/mlx5_common.h          | 26 ++++++++++++++++++++++
 2 files changed, 26 insertions(+), 22 deletions(-)

diff --git a/drivers/common/mlx5/linux/mlx5_common_os.c b/drivers/common/mlx5/linux/mlx5_common_os.c
index ea0b71e425..ea6001e6b2 100644
--- a/drivers/common/mlx5/linux/mlx5_common_os.c
+++ b/drivers/common/mlx5/linux/mlx5_common_os.c
@@ -23,17 +23,6 @@
 const struct mlx5_glue *mlx5_glue;
 #endif
 
-/**
- * Get PCI information by sysfs device path.
- *
- * @param dev_path
- *   Pointer to device sysfs folder name.
- * @param[out] pci_addr
- *   PCI bus address output buffer.
- *
- * @return
- *   0 on success, a negative errno value otherwise and rte_errno is set.
- */
 int
 mlx5_dev_to_pci_addr(const char *dev_path,
 		     struct rte_pci_addr *pci_addr)
@@ -159,17 +148,6 @@ mlx5_translate_port_name(const char *port_name_in,
 	port_info_out->name_type = MLX5_PHYS_PORT_NAME_TYPE_UNKNOWN;
 }
 
-/**
- * Get kernel interface name from IB device path.
- *
- * @param[in] ibdev_path
- *   Pointer to IB device path.
- * @param[out] ifname
- *   Interface name output buffer.
- *
- * @return
- *   0 on success, a negative errno value otherwise and rte_errno is set.
- */
 int
 mlx5_get_ifname_sysfs(const char *ibdev_path, char *ifname)
 {
diff --git a/drivers/common/mlx5/mlx5_common.h b/drivers/common/mlx5/mlx5_common.h
index 0e67a36d0c..62a0dc4bad 100644
--- a/drivers/common/mlx5/mlx5_common.h
+++ b/drivers/common/mlx5/mlx5_common.h
@@ -208,8 +208,34 @@ check_cqe(volatile struct mlx5_cqe *cqe, const uint16_t cqes_n,
 	return MLX5_CQE_STATUS_SW_OWN;
 }
 
+/*
+ * Get PCI address from sysfs of a PCI-related device.
+ *
+ * @param[in] dev_path
+ *   The sysfs path should not point to the direct plain PCI device.
+ *   Instead, the node "/device/" is used to access the real device.
+ * @param[out] pci_addr
+ *   Parsed PCI address.
+ *
+ * @return
+ *   - 0 on success.
+ *   - Negative value and rte_errno is set otherwise.
+ */
 __rte_internal
 int mlx5_dev_to_pci_addr(const char *dev_path, struct rte_pci_addr *pci_addr);
+
+/*
+ * Get kernel network interface name from sysfs IB device path.
+ *
+ * @param[in] ibdev_path
+ *   The sysfs path to IB device.
+ * @param[out] ifname
+ *   Interface name output of size IF_NAMESIZE.
+ *
+ * @return
+ *   - 0 on success.
+ *   - Negative value and rte_errno is set otherwise.
+ */
 __rte_internal
 int mlx5_get_ifname_sysfs(const char *ibdev_path, char *ifname);
 
-- 
2.25.1



* [dpdk-dev] [RFC 03/14] net/mlx5: remove PCI dependency
  2021-05-27 13:37 [dpdk-dev] [RFC 00/14] mlx5: support SubFunction Xueming Li
  2021-05-27 13:37 ` [dpdk-dev] [RFC 01/14] common/mlx5: add common device driver Xueming Li
  2021-05-27 13:37 ` [dpdk-dev] [RFC 02/14] common/mlx5: move description of PCI sysfs functions Xueming Li
@ 2021-05-27 13:37 ` Xueming Li
  2021-05-27 13:37 ` [dpdk-dev] [RFC 04/14] net/mlx5: migrate to bus-agnostic common driver Xueming Li
                   ` (7 subsequent siblings)
  10 siblings, 0 replies; 42+ messages in thread
From: Xueming Li @ 2021-05-27 13:37 UTC (permalink / raw)
  To: Viacheslav Ovsiienko; +Cc: dev, xuemingl, Matan Azrad, Shahaf Shuler

To support more bus types, remove the PCI dependency from the net
driver where possible.

Signed-off-by: Xueming Li <xuemingl@nvidia.com>
---
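For illustration only, a reduced sketch of the lookup semantics that
mlx5_eth_find_next() gets in this patch: with a reference port, only
siblings (same shared context or same underlying device) match; with
NULL, any mlx5 port matches. All types and names below are
hypothetical stand-ins, not the driver's structures.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_PORTS 8 /* hypothetical, stands in for RTE_MAX_ETHPORTS */

/* Hypothetical, reduced view of an ethdev port. */
struct port_sketch {
	int used;           /* 0 mirrors RTE_ETH_DEV_UNUSED */
	int is_mlx5;        /* stands in for the driver name check */
	const void *device; /* underlying bus device */
	const void *shared; /* shared context, like priv->sh */
};

static struct port_sketch ports[MAX_PORTS];

/*
 * Return the first matching port at or after port_id,
 * or MAX_PORTS when there is none.
 */
static uint16_t
find_next_sketch(uint16_t port_id, const struct port_sketch *ref)
{
	assert(ref == NULL || ref->used);
	for (; port_id < MAX_PORTS; port_id++) {
		const struct port_sketch *p = &ports[port_id];

		if (!p->used)
			continue;
		if (ref != NULL) {
			/* Reference given: match siblings only. */
			if (p->shared == ref->shared ||
			    p->device == ref->device)
				break;
		} else if (p->is_mlx5) {
			/* No reference: match any mlx5 port. */
			break;
		}
	}
	return port_id < MAX_PORTS ? port_id : MAX_PORTS;
}
```

Passing NULL corresponds to the probe-time callers, where the private
data of the port being spawned is not yet usable as a reference.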
 drivers/common/mlx5/mlx5_common_pci.c   |  6 +--
 drivers/net/mlx5/linux/mlx5_ethdev_os.c |  2 +-
 drivers/net/mlx5/linux/mlx5_os.c        |  4 +-
 drivers/net/mlx5/mlx5.c                 | 53 ++++++++++++++++---------
 drivers/net/mlx5/mlx5.h                 |  8 ++--
 drivers/net/mlx5/mlx5_ethdev.c          |  2 +-
 drivers/net/mlx5/mlx5_mr.c              | 14 +++----
 drivers/net/mlx5/mlx5_trigger.c         | 12 +++---
 drivers/net/mlx5/mlx5_txq.c             |  2 +-
 drivers/net/mlx5/windows/mlx5_os.c      |  6 +--
 10 files changed, 58 insertions(+), 51 deletions(-)

diff --git a/drivers/common/mlx5/mlx5_common_pci.c b/drivers/common/mlx5/mlx5_common_pci.c
index 53090173a2..5a824dd50f 100644
--- a/drivers/common/mlx5/mlx5_common_pci.c
+++ b/drivers/common/mlx5/mlx5_common_pci.c
@@ -534,11 +534,7 @@ mlx5_pci_driver_register(struct mlx5_pci_driver *driver)
 bool
 mlx5_dev_is_pci(const struct rte_device *dev)
 {
-	struct rte_devargs *da = dev->devargs;
-
-	if (da == NULL || da->bus == NULL)
-		return false;
-	return strcmp(da->bus->name, "pci") == 0;
+	return strcmp(dev->bus->name, "pci") == 0;
 }
 
 struct ibv_device *
diff --git a/drivers/net/mlx5/linux/mlx5_ethdev_os.c b/drivers/net/mlx5/linux/mlx5_ethdev_os.c
index ddc1371aa9..6fdb310129 100644
--- a/drivers/net/mlx5/linux/mlx5_ethdev_os.c
+++ b/drivers/net/mlx5/linux/mlx5_ethdev_os.c
@@ -346,7 +346,7 @@ mlx5_find_master_dev(struct rte_eth_dev *dev)
 	priv = dev->data->dev_private;
 	domain_id = priv->domain_id;
 	MLX5_ASSERT(priv->representor);
-	MLX5_ETH_FOREACH_DEV(port_id, priv->pci_dev) {
+	MLX5_ETH_FOREACH_DEV(port_id, dev) {
 		struct mlx5_priv *opriv =
 			rte_eth_devices[port_id].data->dev_private;
 		if (opriv &&
diff --git a/drivers/net/mlx5/linux/mlx5_os.c b/drivers/net/mlx5/linux/mlx5_os.c
index 534a56a555..e8a97d4337 100644
--- a/drivers/net/mlx5/linux/mlx5_os.c
+++ b/drivers/net/mlx5/linux/mlx5_os.c
@@ -925,6 +925,7 @@ mlx5_dev_spawn(struct rte_device *dpdk_dev,
 	sh = mlx5_alloc_shared_dev_ctx(spawn, config);
 	if (!sh)
 		return NULL;
+	sh->numa_node = dpdk_dev->numa_node;
 	config->devx = sh->devx;
 #ifdef HAVE_MLX5DV_DR_ACTION_DEST_DEVX_TIR
 	config->dest_tir = 1;
@@ -1133,7 +1134,7 @@ mlx5_dev_spawn(struct rte_device *dpdk_dev,
 	 * Look for sibling devices in order to reuse their switch domain
 	 * if any, otherwise allocate one.
 	 */
-	MLX5_ETH_FOREACH_DEV(port_id, priv->pci_dev) {
+	MLX5_ETH_FOREACH_DEV(port_id, NULL) {
 		const struct mlx5_priv *opriv =
 			rte_eth_devices[port_id].data->dev_private;
 
@@ -2556,7 +2557,6 @@ mlx5_os_open_device(const struct mlx5_dev_spawn_data *spawn,
 	int dbmap_env;
 	int err = 0;
 
-	sh->numa_node = spawn->pci_dev->device.numa_node;
 	pthread_mutex_init(&sh->txpp.mutex, NULL);
 	/*
 	 * Configure environment variable "MLX5_BF_SHUT_UP"
diff --git a/drivers/net/mlx5/mlx5.c b/drivers/net/mlx5/mlx5.c
index d0faa45944..95ac43268b 100644
--- a/drivers/net/mlx5/mlx5.c
+++ b/drivers/net/mlx5/mlx5.c
@@ -1185,7 +1185,7 @@ mlx5_alloc_shared_dev_ctx(const struct mlx5_dev_spawn_data *spawn,
 	 */
 	err = mlx5_mr_btree_init(&sh->share_cache.cache,
 				 MLX5_MR_BTREE_CACHE_N * 2,
-				 spawn->pci_dev->device.numa_node);
+				 sh->numa_node);
 	if (err) {
 		err = rte_errno;
 		goto error;
@@ -1620,7 +1620,7 @@ mlx5_dev_close(struct rte_eth_dev *dev)
 		unsigned int c = 0;
 		uint16_t port_id;
 
-		MLX5_ETH_FOREACH_DEV(port_id, priv->pci_dev) {
+		MLX5_ETH_FOREACH_DEV(port_id, dev) {
 			struct mlx5_priv *opriv =
 				rte_eth_devices[port_id].data->dev_private;
 
@@ -2057,7 +2057,8 @@ void
 mlx5_set_min_inline(struct mlx5_dev_spawn_data *spawn,
 		    struct mlx5_dev_config *config)
 {
-	if (config->txq_inline_min != MLX5_ARG_UNSET) {
+	if (config->txq_inline_min != MLX5_ARG_UNSET &&
+	    spawn->pci_dev != NULL) {
 		/* Application defines size of inlined data explicitly. */
 		switch (spawn->pci_dev->id.device_id) {
 		case PCI_DEVICE_ID_MELLANOX_CONNECTX4:
@@ -2124,6 +2125,11 @@ mlx5_set_min_inline(struct mlx5_dev_spawn_data *spawn,
 			}
 		}
 	}
+	if (spawn->pci_dev == NULL) {
+		if (config->txq_inline_min == MLX5_ARG_UNSET)
+			config->txq_inline_min = MLX5_INLINE_HSIZE_NONE;
+		goto exit;
+	}
 	/*
 	 * We get here if we are unable to deduce
 	 * inline data size with DevX. Try PCI ID
@@ -2258,7 +2264,7 @@ mlx5_dev_check_sibling_config(struct mlx5_priv *priv,
 	if (sh->refcnt == 1)
 		return 0;
 	/* Find the device with shared context. */
-	MLX5_ETH_FOREACH_DEV(port_id, priv->pci_dev) {
+	MLX5_ETH_FOREACH_DEV(port_id, NULL) {
 		struct mlx5_priv *opriv =
 			rte_eth_devices[port_id].data->dev_private;
 
@@ -2286,33 +2292,42 @@ mlx5_dev_check_sibling_config(struct mlx5_priv *priv,
 
 /**
  * Look for the ethernet device belonging to mlx5 driver.
+ * If a device is specified, look for ports of the same PCI device/bonding.
  *
  * @param[in] port_id
  *   port_id to start looking for device.
- * @param[in] pci_dev
- *   Pointer to the hint PCI device. When device is being probed
- *   the its siblings (master and preceding representors might
- *   not have assigned driver yet (because the mlx5_os_pci_probe()
- *   is not completed yet, for this case match on hint PCI
- *   device may be used to detect sibling device.
+ * @param[in] odev
+ *   Device to detect sibling.
  *
  * @return
  *   port_id of found device, RTE_MAX_ETHPORT if not found.
  */
 uint16_t
-mlx5_eth_find_next(uint16_t port_id, struct rte_pci_device *pci_dev)
+mlx5_eth_find_next(uint16_t port_id, struct rte_eth_dev *odev)
 {
-	while (port_id < RTE_MAX_ETHPORTS) {
+	const struct mlx5_priv *opriv = NULL;
+
+	if (odev)
+		opriv = odev->data->dev_private;
+	for ( ; port_id < RTE_MAX_ETHPORTS; port_id++) {
 		struct rte_eth_dev *dev = &rte_eth_devices[port_id];
+		const struct mlx5_priv *priv;
 
-		if (dev->state != RTE_ETH_DEV_UNUSED &&
-		    dev->device &&
-		    (dev->device == &pci_dev->device ||
-		     (dev->device->driver &&
-		     dev->device->driver->name &&
-		     !strcmp(dev->device->driver->name, MLX5_PCI_DRIVER_NAME))))
+		if (dev->state == RTE_ETH_DEV_UNUSED)
+			continue;
+		priv = dev->data->dev_private;
+		if (odev != NULL) {
+			/* odev specified, find devices on same PCI/bonding. */
+			if (opriv->sh == priv->sh ||
+			    odev->device == dev->device)
+				break;
+		} else if (dev->device != NULL && dev->device->driver &&
+			dev->device->driver->name &&
+			!strcmp(dev->device->driver->name,
+				MLX5_PCI_DRIVER_NAME)) {
+			/* odev not specified, found all mlx5 devices. */
 			break;
-		port_id++;
+		}
 	}
 	if (port_id >= RTE_MAX_ETHPORTS)
 		return RTE_MAX_ETHPORTS;
diff --git a/drivers/net/mlx5/mlx5.h b/drivers/net/mlx5/mlx5.h
index 32b2817bf2..29a9b18887 100644
--- a/drivers/net/mlx5/mlx5.h
+++ b/drivers/net/mlx5/mlx5.h
@@ -1404,16 +1404,16 @@ int mlx5_proc_priv_init(struct rte_eth_dev *dev);
 void mlx5_proc_priv_uninit(struct rte_eth_dev *dev);
 int mlx5_udp_tunnel_port_add(struct rte_eth_dev *dev,
 			      struct rte_eth_udp_tunnel *udp_tunnel);
-uint16_t mlx5_eth_find_next(uint16_t port_id, struct rte_pci_device *pci_dev);
+uint16_t mlx5_eth_find_next(uint16_t port_id, struct rte_eth_dev *odev);
 int mlx5_dev_close(struct rte_eth_dev *dev);
 bool mlx5_is_hpf(struct rte_eth_dev *dev);
 void mlx5_age_event_prepare(struct mlx5_dev_ctx_shared *sh);
 
 /* Macro to iterate over all valid ports for mlx5 driver. */
-#define MLX5_ETH_FOREACH_DEV(port_id, pci_dev) \
-	for (port_id = mlx5_eth_find_next(0, pci_dev); \
+#define MLX5_ETH_FOREACH_DEV(port_id, dev) \
+	for (port_id = mlx5_eth_find_next(0, dev); \
 	     port_id < RTE_MAX_ETHPORTS; \
-	     port_id = mlx5_eth_find_next(port_id + 1, pci_dev))
+	     port_id = mlx5_eth_find_next(port_id + 1, dev))
 int mlx5_args(struct mlx5_dev_config *config, struct rte_devargs *devargs);
 struct mlx5_dev_ctx_shared *
 mlx5_alloc_shared_dev_ctx(const struct mlx5_dev_spawn_data *spawn,
diff --git a/drivers/net/mlx5/mlx5_ethdev.c b/drivers/net/mlx5/mlx5_ethdev.c
index 90baee5aa4..4654b85844 100644
--- a/drivers/net/mlx5/mlx5_ethdev.c
+++ b/drivers/net/mlx5/mlx5_ethdev.c
@@ -335,7 +335,7 @@ mlx5_dev_infos_get(struct rte_eth_dev *dev, struct rte_eth_dev_info *info)
 	if (priv->representor) {
 		uint16_t port_id;
 
-		MLX5_ETH_FOREACH_DEV(port_id, priv->pci_dev) {
+		MLX5_ETH_FOREACH_DEV(port_id, dev) {
 			struct mlx5_priv *opriv =
 				rte_eth_devices[port_id].data->dev_private;
 
diff --git a/drivers/net/mlx5/mlx5_mr.c b/drivers/net/mlx5/mlx5_mr.c
index e791b6338d..fcb475582d 100644
--- a/drivers/net/mlx5/mlx5_mr.c
+++ b/drivers/net/mlx5/mlx5_mr.c
@@ -290,23 +290,23 @@ mlx5_mr_update_ext_mp_cb(struct rte_mempool *mp, void *opaque,
 }
 
 /**
- * Finds the first ethdev that match the pci device.
+ * Finds the first ethdev that matches the device.
  * The existence of multiple ethdev per pci device is only with representors.
  * On such case, it is enough to get only one of the ports as they all share
  * the same ibv context.
  *
- * @param pdev
- *   Pointer to the PCI device.
+ * @param dev
+ *   Pointer to the device.
  *
  * @return
  *   Pointer to the ethdev if found, NULL otherwise.
  */
 static struct rte_eth_dev *
-pci_dev_to_eth_dev(struct rte_pci_device *pdev)
+dev_to_eth_dev(struct rte_device *dev)
 {
 	uint16_t port_id;
 
-	port_id = rte_eth_find_next_of(0, &pdev->device);
+	port_id = rte_eth_find_next_of(0, dev);
 	if (port_id == RTE_MAX_ETHPORTS)
 		return NULL;
 	return &rte_eth_devices[port_id];
@@ -336,7 +336,7 @@ mlx5_dma_map(struct rte_pci_device *pdev, void *addr,
 	struct mlx5_priv *priv;
 	struct mlx5_dev_ctx_shared *sh;
 
-	dev = pci_dev_to_eth_dev(pdev);
+	dev = dev_to_eth_dev(&pdev->device);
 	if (!dev) {
 		DRV_LOG(WARNING, "unable to find matching ethdev "
 				 "to PCI device %p", (void *)pdev);
@@ -386,7 +386,7 @@ mlx5_dma_unmap(struct rte_pci_device *pdev, void *addr,
 	struct mlx5_mr *mr;
 	struct mr_cache_entry entry;
 
-	dev = pci_dev_to_eth_dev(pdev);
+	dev = dev_to_eth_dev(&pdev->device);
 	if (!dev) {
 		DRV_LOG(WARNING, "unable to find matching ethdev "
 				 "to PCI device %p", (void *)pdev);
diff --git a/drivers/net/mlx5/mlx5_trigger.c b/drivers/net/mlx5/mlx5_trigger.c
index ae7fcca229..6c8a64ce03 100644
--- a/drivers/net/mlx5/mlx5_trigger.c
+++ b/drivers/net/mlx5/mlx5_trigger.c
@@ -697,7 +697,7 @@ mlx5_hairpin_bind_single_port(struct rte_eth_dev *dev, uint16_t rx_port)
 	uint32_t explicit;
 	uint16_t rx_queue;
 
-	if (mlx5_eth_find_next(rx_port, priv->pci_dev) != rx_port) {
+	if (mlx5_eth_find_next(rx_port, dev) != rx_port) {
 		rte_errno = ENODEV;
 		DRV_LOG(ERR, "Rx port %u does not belong to mlx5", rx_port);
 		return -rte_errno;
@@ -835,7 +835,7 @@ mlx5_hairpin_unbind_single_port(struct rte_eth_dev *dev, uint16_t rx_port)
 	int ret;
 	uint16_t cur_port = priv->dev_data->port_id;
 
-	if (mlx5_eth_find_next(rx_port, priv->pci_dev) != rx_port) {
+	if (mlx5_eth_find_next(rx_port, dev) != rx_port) {
 		rte_errno = ENODEV;
 		DRV_LOG(ERR, "Rx port %u does not belong to mlx5", rx_port);
 		return -rte_errno;
@@ -893,7 +893,6 @@ mlx5_hairpin_bind(struct rte_eth_dev *dev, uint16_t rx_port)
 {
 	int ret = 0;
 	uint16_t p, pp;
-	struct mlx5_priv *priv = dev->data->dev_private;
 
 	/*
 	 * If the Rx port has no hairpin configuration with the current port,
@@ -902,7 +901,7 @@ mlx5_hairpin_bind(struct rte_eth_dev *dev, uint16_t rx_port)
 	 * information updating.
 	 */
 	if (rx_port == RTE_MAX_ETHPORTS) {
-		MLX5_ETH_FOREACH_DEV(p, priv->pci_dev) {
+		MLX5_ETH_FOREACH_DEV(p, dev) {
 			ret = mlx5_hairpin_bind_single_port(dev, p);
 			if (ret != 0)
 				goto unbind;
@@ -912,7 +911,7 @@ mlx5_hairpin_bind(struct rte_eth_dev *dev, uint16_t rx_port)
 		return mlx5_hairpin_bind_single_port(dev, rx_port);
 	}
 unbind:
-	MLX5_ETH_FOREACH_DEV(pp, priv->pci_dev)
+	MLX5_ETH_FOREACH_DEV(pp, dev)
 		if (pp < p)
 			mlx5_hairpin_unbind_single_port(dev, pp);
 	return ret;
@@ -927,10 +926,9 @@ mlx5_hairpin_unbind(struct rte_eth_dev *dev, uint16_t rx_port)
 {
 	int ret = 0;
 	uint16_t p;
-	struct mlx5_priv *priv = dev->data->dev_private;
 
 	if (rx_port == RTE_MAX_ETHPORTS)
-		MLX5_ETH_FOREACH_DEV(p, priv->pci_dev) {
+		MLX5_ETH_FOREACH_DEV(p, dev) {
 			ret = mlx5_hairpin_unbind_single_port(dev, p);
 			if (ret != 0)
 				return ret;
diff --git a/drivers/net/mlx5/mlx5_txq.c b/drivers/net/mlx5/mlx5_txq.c
index 3e5e94444b..f68c0c61a9 100644
--- a/drivers/net/mlx5/mlx5_txq.c
+++ b/drivers/net/mlx5/mlx5_txq.c
@@ -816,7 +816,7 @@ txq_set_params(struct mlx5_txq_ctrl *txq_ctrl)
 	if (config->txqs_inline == MLX5_ARG_UNSET)
 		txqs_inline =
 #if defined(RTE_ARCH_ARM64)
-		(priv->pci_dev->id.device_id ==
+		(priv->pci_dev && priv->pci_dev->id.device_id ==
 			PCI_DEVICE_ID_MELLANOX_CONNECTX5BF) ?
 			MLX5_INLINE_MAX_TXQS_BLUEFIELD :
 #endif
diff --git a/drivers/net/mlx5/windows/mlx5_os.c b/drivers/net/mlx5/windows/mlx5_os.c
index 3fe3f55f49..1e3260c6b5 100644
--- a/drivers/net/mlx5/windows/mlx5_os.c
+++ b/drivers/net/mlx5/windows/mlx5_os.c
@@ -229,9 +229,6 @@ mlx5_os_open_device(const struct mlx5_dev_spawn_data *spawn,
 	struct mlx5_context *mlx5_ctx;
 
 	pthread_mutex_init(&sh->txpp.mutex, NULL);
-	/* Set numa node from pci probe */
-	sh->numa_node = spawn->pci_dev->device.numa_node;
-
 	/* Try to open device with DevX */
 	rte_errno = 0;
 	sh->ctx = mlx5_glue->open_device(spawn->phys_dev);
@@ -344,6 +341,7 @@ mlx5_dev_spawn(struct rte_device *dpdk_dev,
 	sh = mlx5_alloc_shared_dev_ctx(spawn, config);
 	if (!sh)
 		return NULL;
+	sh->numa_node = dpdk_dev->numa_node;
 	config->devx = sh->devx;
 	/* Initialize the shutdown event in mlx5_dev_spawn to
 	 * support mlx5_is_removed for Windows.
@@ -393,7 +391,7 @@ mlx5_dev_spawn(struct rte_device *dpdk_dev,
 	 * Look for sibling devices in order to reuse their switch domain
 	 * if any, otherwise allocate one.
 	 */
-	MLX5_ETH_FOREACH_DEV(port_id, priv->pci_dev) {
+	MLX5_ETH_FOREACH_DEV(port_id, NULL) {
 		const struct mlx5_priv *opriv =
 			rte_eth_devices[port_id].data->dev_private;
 
-- 
2.25.1



* [dpdk-dev] [RFC 04/14] net/mlx5: migrate to bus-agnostic common driver
  2021-05-27 13:37 [dpdk-dev] [RFC 00/14] mlx5: support SubFunction Xueming Li
                   ` (2 preceding siblings ...)
  2021-05-27 13:37 ` [dpdk-dev] [RFC 03/14] net/mlx5: remove PCI dependency Xueming Li
@ 2021-05-27 13:37 ` Xueming Li
  2021-05-27 13:37 ` [dpdk-dev] [RFC 05/14] regex/mlx5: migrate to " Xueming Li
                   ` (6 subsequent siblings)
  10 siblings, 0 replies; 42+ messages in thread
From: Xueming Li @ 2021-05-27 13:37 UTC (permalink / raw)
  To: Viacheslav Ovsiienko
  Cc: dev, xuemingl, Matan Azrad, Shahaf Shuler, Anatoly Burakov

To support SubFunction on the auxiliary bus, the common driver now
provides a bus-agnostic class driver.

This patch migrates the net driver to the new common driver.

Signed-off-by: Xueming Li <xuemingl@nvidia.com>
---
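For illustration only, a reduced sketch of the bus-agnostic callback
shape this patch moves to: the class driver callbacks receive the
generic device, and the PCI-specific path is selected by bus name at
probe time. Every type and name with a _sketch suffix is hypothetical,
and "auxiliary" as a bus name is an assumption.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical, reduced stand-ins for rte_bus and rte_device. */
struct bus_sketch {
	const char *name;
};
struct device_sketch {
	const char *name;
	const struct bus_sketch *bus;
};

/* Hypothetical class driver: callbacks take the generic device. */
struct class_driver_sketch {
	const char *name;
	int (*probe)(struct device_sketch *dev);
	int (*remove)(struct device_sketch *dev);
};

static int pci_probe_calls; /* counts PCI probes, for the example only */

static int
dev_is_pci_sketch(const struct device_sketch *dev)
{
	return strcmp(dev->bus->name, "pci") == 0;
}

static int
net_probe_sketch(struct device_sketch *dev)
{
	assert(dev != NULL && dev->bus != NULL);
	/* Bus-independent initialization would run here. */
	if (dev_is_pci_sketch(dev)) {
		pci_probe_calls++; /* stands in for the PCI probe path */
		return 0;
	}
	/* Other buses: nothing is spawned in this sketch. */
	return 0;
}

static int
net_remove_sketch(struct device_sketch *dev)
{
	(void)dev;
	return 0;
}

static struct class_driver_sketch net_driver_sketch = {
	.name = "mlx5_eth",
	.probe = net_probe_sketch,
	.remove = net_remove_sketch,
};
```

The point of the layout is that the common layer only ever sees the
generic device; a later bus type adds one more branch in the probe
callback without changing the callback signatures.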
 drivers/net/mlx5/linux/mlx5_os.c   | 46 +++++++++++++++++++++---------
 drivers/net/mlx5/linux/mlx5_os.h   |  3 --
 drivers/net/mlx5/mlx5.c            | 42 ++++++++++++---------------
 drivers/net/mlx5/mlx5.h            |  3 +-
 drivers/net/mlx5/mlx5_mr.c         | 36 +++++++++++------------
 drivers/net/mlx5/mlx5_rxtx.h       |  9 +++---
 drivers/net/mlx5/windows/mlx5_os.c | 14 ++++-----
 7 files changed, 80 insertions(+), 73 deletions(-)

diff --git a/drivers/net/mlx5/linux/mlx5_os.c b/drivers/net/mlx5/linux/mlx5_os.c
index e8a97d4337..e8e6b0d5c9 100644
--- a/drivers/net/mlx5/linux/mlx5_os.c
+++ b/drivers/net/mlx5/linux/mlx5_os.c
@@ -1977,14 +1977,6 @@ mlx5_os_pci_probe_pf(struct rte_pci_device *pci_dev,
 	struct mlx5_bond_info bond_info;
 	int ret = -1;
 
-	if (rte_eal_process_type() == RTE_PROC_PRIMARY)
-		mlx5_pmd_socket_init();
-	ret = mlx5_init_once();
-	if (ret) {
-		DRV_LOG(ERR, "unable to init PMD global data: %s",
-			strerror(rte_errno));
-		return -rte_errno;
-	}
 	errno = 0;
 	ibv_list = mlx5_glue->get_device_list(&ret);
 	if (!ibv_list) {
@@ -2417,21 +2409,18 @@ mlx5_os_pci_probe_pf(struct rte_pci_device *pci_dev,
 }
 
 /**
- * DPDK callback to register a PCI device.
+ * Callback to register a PCI device.
  *
  * This function spawns Ethernet devices out of a given PCI device.
  *
- * @param[in] pci_drv
- *   PCI driver structure (mlx5_driver).
  * @param[in] pci_dev
  *   PCI device information.
  *
  * @return
  *   0 on success, a negative errno value otherwise and rte_errno is set.
  */
-int
-mlx5_os_pci_probe(struct rte_pci_driver *pci_drv __rte_unused,
-		  struct rte_pci_device *pci_dev)
+static int
+mlx5_os_pci_probe(struct rte_pci_device *pci_dev)
 {
 	struct rte_eth_devargs eth_da = { .type = RTE_ETH_REPRESENTOR_NONE };
 	int ret = 0;
@@ -2470,6 +2459,35 @@ mlx5_os_pci_probe(struct rte_pci_driver *pci_drv __rte_unused,
 	return ret;
 }
 
+/**
+ * Common bus driver callback to probe a device.
+ *
+ * This function probes PCI bus device(s).
+ *
+ * @param[in] dev
+ *   Pointer to the generic device.
+ *
+ * @return
+ *   0 on success, a negative errno value otherwise and rte_errno is set.
+ */
+int
+mlx5_os_net_probe(struct rte_device *dev)
+{
+	int ret;
+
+	if (rte_eal_process_type() == RTE_PROC_PRIMARY)
+		mlx5_pmd_socket_init();
+	ret = mlx5_init_once();
+	if (ret) {
+		DRV_LOG(ERR, "unable to init PMD global data: %s",
+			strerror(rte_errno));
+		return -rte_errno;
+	}
+	if (mlx5_dev_is_pci(dev))
+		return mlx5_os_pci_probe(RTE_DEV_TO_PCI(dev));
+	return 0;
+}
+
 static int
 mlx5_config_doorbell_mapping_env(const struct mlx5_dev_config *config)
 {
diff --git a/drivers/net/mlx5/linux/mlx5_os.h b/drivers/net/mlx5/linux/mlx5_os.h
index 4ae7d0ef47..af7cbeb418 100644
--- a/drivers/net/mlx5/linux/mlx5_os.h
+++ b/drivers/net/mlx5/linux/mlx5_os.h
@@ -19,7 +19,4 @@ enum {
 
 #define MLX5_NAMESIZE IF_NAMESIZE
 
-#define PCI_DRV_FLAGS  (RTE_PCI_DRV_INTR_LSC | \
-			RTE_PCI_DRV_INTR_RMV | \
-			RTE_PCI_DRV_PROBE_AGAIN)
 #endif /* RTE_PMD_MLX5_OS_H_ */
diff --git a/drivers/net/mlx5/mlx5.c b/drivers/net/mlx5/mlx5.c
index 95ac43268b..3defdb2db3 100644
--- a/drivers/net/mlx5/mlx5.c
+++ b/drivers/net/mlx5/mlx5.c
@@ -12,9 +12,7 @@
 
 #include <rte_malloc.h>
 #include <ethdev_driver.h>
-#include <ethdev_pci.h>
 #include <rte_pci.h>
-#include <rte_bus_pci.h>
 #include <rte_common.h>
 #include <rte_kvargs.h>
 #include <rte_rwlock.h>
@@ -22,13 +20,13 @@
 #include <rte_string_fns.h>
 #include <rte_alarm.h>
 #include <rte_cycles.h>
+#include <rte_bus_pci.h>
 
 #include <mlx5_glue.h>
 #include <mlx5_devx_cmds.h>
 #include <mlx5_common.h>
 #include <mlx5_common_os.h>
 #include <mlx5_common_mp.h>
-#include <mlx5_common_pci.h>
 #include <mlx5_malloc.h>
 
 #include "mlx5_defs.h"
@@ -2335,23 +2333,23 @@ mlx5_eth_find_next(uint16_t port_id, struct rte_eth_dev *odev)
 }
 
 /**
- * DPDK callback to remove a PCI device.
+ * Callback to remove a device.
  *
- * This function removes all Ethernet devices belong to a given PCI device.
+ * This function removes all Ethernet devices belonging to a given device.
  *
- * @param[in] pci_dev
- *   Pointer to the PCI device.
+ * @param[in] dev
+ *   Pointer to the generic device.
  *
  * @return
  *   0 on success, the function cannot fail.
  */
 static int
-mlx5_pci_remove(struct rte_pci_device *pci_dev)
+mlx5_net_remove(struct rte_device *dev)
 {
 	uint16_t port_id;
 	int ret = 0;
 
-	RTE_ETH_FOREACH_DEV_OF(port_id, &pci_dev->device) {
+	RTE_ETH_FOREACH_DEV_OF(port_id, dev) {
 		/*
 		 * mlx5_dev_close() is not registered to secondary process,
 		 * call the close function explicitly for secondary process.
@@ -2442,19 +2440,17 @@ static const struct rte_pci_id mlx5_pci_id_map[] = {
 	}
 };
 
-static struct mlx5_pci_driver mlx5_driver = {
-	.driver_class = MLX5_CLASS_NET,
-	.pci_driver = {
-		.driver = {
-			.name = MLX5_PCI_DRIVER_NAME,
-		},
-		.id_table = mlx5_pci_id_map,
-		.probe = mlx5_os_pci_probe,
-		.remove = mlx5_pci_remove,
-		.dma_map = mlx5_dma_map,
-		.dma_unmap = mlx5_dma_unmap,
-		.drv_flags = PCI_DRV_FLAGS,
-	},
+static struct mlx5_class_driver mlx5_net_driver = {
+	.drv_class = MLX5_CLASS_NET,
+	.name = "mlx5_eth",
+	.id_table = mlx5_pci_id_map,
+	.probe = mlx5_os_net_probe,
+	.remove = mlx5_net_remove,
+	.dma_map = mlx5_net_dma_map,
+	.dma_unmap = mlx5_net_dma_unmap,
+	.probe_again = 1,
+	.intr_lsc = 1,
+	.intr_rmv = 1,
 };
 
 /* Initialize driver log type. */
@@ -2472,7 +2468,7 @@ RTE_INIT(rte_mlx5_pmd_init)
 	mlx5_set_cksum_table();
 	mlx5_set_swp_types_table();
 	if (mlx5_glue)
-		mlx5_pci_driver_register(&mlx5_driver);
+		mlx5_class_driver_register(&mlx5_net_driver);
 }
 
 RTE_PMD_EXPORT_NAME(net_mlx5, __COUNTER__);
diff --git a/drivers/net/mlx5/mlx5.h b/drivers/net/mlx5/mlx5.h
index 29a9b18887..27bb34e827 100644
--- a/drivers/net/mlx5/mlx5.h
+++ b/drivers/net/mlx5/mlx5.h
@@ -1680,8 +1680,7 @@ int mlx5_os_open_device(const struct mlx5_dev_spawn_data *spawn,
 			 const struct mlx5_dev_config *config,
 			 struct mlx5_dev_ctx_shared *sh);
 int mlx5_os_get_pdn(void *pd, uint32_t *pdn);
-int mlx5_os_pci_probe(struct rte_pci_driver *pci_drv __rte_unused,
-		       struct rte_pci_device *pci_dev);
+int mlx5_os_net_probe(struct rte_device *dev);
 void mlx5_os_dev_shared_handler_install(struct mlx5_dev_ctx_shared *sh);
 void mlx5_os_dev_shared_handler_uninstall(struct mlx5_dev_ctx_shared *sh);
 void mlx5_os_set_reg_mr_cb(mlx5_reg_mr_t *reg_mr_cb,
diff --git a/drivers/net/mlx5/mlx5_mr.c b/drivers/net/mlx5/mlx5_mr.c
index fcb475582d..2bce302eb5 100644
--- a/drivers/net/mlx5/mlx5_mr.c
+++ b/drivers/net/mlx5/mlx5_mr.c
@@ -7,7 +7,6 @@
 #include <rte_mempool.h>
 #include <rte_malloc.h>
 #include <rte_rwlock.h>
-#include <rte_bus_pci.h>
 
 #include <mlx5_common_mp.h>
 #include <mlx5_common_mr.h>
@@ -313,10 +312,10 @@ dev_to_eth_dev(struct rte_device *dev)
 }
 
 /**
- * DPDK callback to DMA map external memory to a PCI device.
+ * Callback to DMA map external memory to a device.
  *
- * @param pdev
- *   Pointer to the PCI device.
+ * @param rte_dev
+ *   Pointer to the generic device.
  * @param addr
  *   Starting virtual address of memory to be mapped.
  * @param iova
@@ -328,18 +327,18 @@ dev_to_eth_dev(struct rte_device *dev)
  *   0 on success, negative value on error.
  */
 int
-mlx5_dma_map(struct rte_pci_device *pdev, void *addr,
-	     uint64_t iova __rte_unused, size_t len)
+mlx5_net_dma_map(struct rte_device *rte_dev, void *addr,
+		 uint64_t iova __rte_unused, size_t len)
 {
 	struct rte_eth_dev *dev;
 	struct mlx5_mr *mr;
 	struct mlx5_priv *priv;
 	struct mlx5_dev_ctx_shared *sh;
 
-	dev = dev_to_eth_dev(&pdev->device);
+	dev = dev_to_eth_dev(rte_dev);
 	if (!dev) {
 		DRV_LOG(WARNING, "unable to find matching ethdev "
-				 "to PCI device %p", (void *)pdev);
+				 "to device %s", rte_dev->name);
 		rte_errno = ENODEV;
 		return -1;
 	}
@@ -362,10 +361,10 @@ mlx5_dma_map(struct rte_pci_device *pdev, void *addr,
 }
 
 /**
- * DPDK callback to DMA unmap external memory to a PCI device.
+ * Callback to DMA unmap external memory to a device.
  *
- * @param pdev
- *   Pointer to the PCI device.
+ * @param rte_dev
+ *   Pointer to the generic device.
  * @param addr
  *   Starting virtual address of memory to be unmapped.
  * @param iova
@@ -377,8 +376,8 @@ mlx5_dma_map(struct rte_pci_device *pdev, void *addr,
  *   0 on success, negative value on error.
  */
 int
-mlx5_dma_unmap(struct rte_pci_device *pdev, void *addr,
-	       uint64_t iova __rte_unused, size_t len __rte_unused)
+mlx5_net_dma_unmap(struct rte_device *rte_dev, void *addr,
+		   uint64_t iova __rte_unused, size_t len __rte_unused)
 {
 	struct rte_eth_dev *dev;
 	struct mlx5_priv *priv;
@@ -386,10 +385,10 @@ mlx5_dma_unmap(struct rte_pci_device *pdev, void *addr,
 	struct mlx5_mr *mr;
 	struct mr_cache_entry entry;
 
-	dev = dev_to_eth_dev(&pdev->device);
+	dev = dev_to_eth_dev(rte_dev);
 	if (!dev) {
-		DRV_LOG(WARNING, "unable to find matching ethdev "
-				 "to PCI device %p", (void *)pdev);
+		DRV_LOG(WARNING, "unable to find matching ethdev to device %s",
+			rte_dev->name);
 		rte_errno = ENODEV;
 		return -1;
 	}
@@ -399,9 +398,8 @@ mlx5_dma_unmap(struct rte_pci_device *pdev, void *addr,
 	mr = mlx5_mr_lookup_list(&sh->share_cache, &entry, (uintptr_t)addr);
 	if (!mr) {
 		rte_rwlock_read_unlock(&sh->share_cache.rwlock);
-		DRV_LOG(WARNING, "address 0x%" PRIxPTR " wasn't registered "
-				 "to PCI device %p", (uintptr_t)addr,
-				 (void *)pdev);
+		DRV_LOG(WARNING, "address 0x%" PRIxPTR " wasn't registered to device %s",
+			(uintptr_t)addr, rte_dev->name);
 		rte_errno = EINVAL;
 		return -1;
 	}
diff --git a/drivers/net/mlx5/mlx5_rxtx.h b/drivers/net/mlx5/mlx5_rxtx.h
index e168dd46f9..ad1144e218 100644
--- a/drivers/net/mlx5/mlx5_rxtx.h
+++ b/drivers/net/mlx5/mlx5_rxtx.h
@@ -16,7 +16,6 @@
 #include <rte_hexdump.h>
 #include <rte_spinlock.h>
 #include <rte_io.h>
-#include <rte_bus_pci.h>
 #include <rte_cycles.h>
 
 #include <mlx5_common.h>
@@ -48,10 +47,10 @@ int mlx5_queue_state_modify(struct rte_eth_dev *dev,
 /* mlx5_mr.c */
 
 void mlx5_mr_flush_local_cache(struct mlx5_mr_ctrl *mr_ctrl);
-int mlx5_dma_map(struct rte_pci_device *pdev, void *addr, uint64_t iova,
-		 size_t len);
-int mlx5_dma_unmap(struct rte_pci_device *pdev, void *addr, uint64_t iova,
-		   size_t len);
+int mlx5_net_dma_map(struct rte_device *rte_dev, void *addr, uint64_t iova,
+		     size_t len);
+int mlx5_net_dma_unmap(struct rte_device *rte_dev, void *addr, uint64_t iova,
+		       size_t len);
 
 /**
  * Get Memory Pool (MP) from mbuf. If mbuf is indirect, the pool from which the
diff --git a/drivers/net/mlx5/windows/mlx5_os.c b/drivers/net/mlx5/windows/mlx5_os.c
index 1e3260c6b5..2dfc957412 100644
--- a/drivers/net/mlx5/windows/mlx5_os.c
+++ b/drivers/net/mlx5/windows/mlx5_os.c
@@ -917,22 +917,22 @@ mlx5_match_devx_devices_to_addr(struct devx_device_bdf *devx_bdf,
 }
 
 /**
- * DPDK callback to register a PCI device.
+ * Driver callback to register a device.
  *
  * This function spawns Ethernet devices out of a given PCI device.
  *
- * @param[in] pci_drv
- *   PCI driver structure (mlx5_driver).
- * @param[in] pci_dev
- *   PCI device information.
+ * @param[in] rte_dev
+ *   Pointer to generic device.
+ * @param[in] ib_dev
+ *   Pointer to Verbs device.
  *
  * @return
  *   0 on success, a negative errno value otherwise and rte_errno is set.
  */
 int
-mlx5_os_pci_probe(struct rte_pci_driver *pci_drv __rte_unused,
-		  struct rte_pci_device *pci_dev)
+mlx5_os_net_probe(struct rte_device *rte_dev)
 {
+	struct rte_pci_device *pci_dev = RTE_DEV_TO_PCI(rte_dev);
 	struct devx_device_bdf *devx_bdf_devs, *orig_devx_bdf_devs;
 	/*
 	 * Number of found IB Devices matching with requested PCI BDF.
-- 
2.25.1



* [dpdk-dev] [RFC 05/14] regex/mlx5: migrate to common driver
  2021-05-27 13:37 [dpdk-dev] [RFC 00/14] mlx5: support SubFunction Xueming Li
                   ` (3 preceding siblings ...)
  2021-05-27 13:37 ` [dpdk-dev] [RFC 04/14] net/mlx5: migrate to bus-agnostic common driver Xueming Li
@ 2021-05-27 13:37 ` Xueming Li
  2021-05-27 13:37 ` [dpdk-dev] [RFC 06/14] compress/mlx5: " Xueming Li
                   ` (5 subsequent siblings)
  10 siblings, 0 replies; 42+ messages in thread
From: Xueming Li @ 2021-05-27 13:37 UTC (permalink / raw)
  To: Viacheslav Ovsiienko; +Cc: dev, xuemingl, Ori Kam

To support the auxiliary bus, upgrade the regex driver to use the
mlx5 common class driver structure.

Signed-off-by: Xueming Li <xuemingl@nvidia.com>
---
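For illustration only, a sketch of deriving the regexdev name from the
generic device name, as mlx5_regex_get_name() does after this patch.
The patch itself uses sprintf(); the bounded snprintf() variant and the
helper name below are substitutions made for the sketch. It assumes
the device name of a PCI device is the full domain:bus:devid.function
string.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Build "mlx5_regex_<devname>" into buf.
 * Returns 0 on success, -1 if the result does not fit in len bytes
 * (buf is then truncated but still NUL-terminated when len > 0).
 */
static int
regex_name_sketch(char *buf, size_t len, const char *devname)
{
	int n;

	assert(buf != NULL && devname != NULL);
	n = snprintf(buf, len, "mlx5_regex_%s", devname);
	return (n < 0 || (size_t)n >= len) ? -1 : 0;
}
```

Because probe and remove both rebuild the name from the same device,
the lookup by name in the remove path keeps working as long as both
sides use the same format.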
 drivers/regex/mlx5/mlx5_regex.c | 49 ++++++++++++---------------------
 drivers/regex/mlx5/mlx5_regex.h |  1 -
 2 files changed, 18 insertions(+), 32 deletions(-)

diff --git a/drivers/regex/mlx5/mlx5_regex.c b/drivers/regex/mlx5/mlx5_regex.c
index dcb2ced88e..9151e477c0 100644
--- a/drivers/regex/mlx5/mlx5_regex.c
+++ b/drivers/regex/mlx5/mlx5_regex.c
@@ -9,8 +9,8 @@
 #include <rte_regexdev.h>
 #include <rte_regexdev_core.h>
 #include <rte_regexdev_driver.h>
+#include <rte_bus_pci.h>
 
-#include <mlx5_common_pci.h>
 #include <mlx5_common.h>
 #include <mlx5_glue.h>
 #include <mlx5_devx_cmds.h>
@@ -76,15 +76,13 @@ mlx5_regex_engines_status(struct ibv_context *ctx, int num_engines)
 }
 
 static void
-mlx5_regex_get_name(char *name, struct rte_pci_device *pci_dev __rte_unused)
+mlx5_regex_get_name(char *name, struct rte_device *dev)
 {
-	sprintf(name, "mlx5_regex_%02x:%02x.%02x", pci_dev->addr.bus,
-		pci_dev->addr.devid, pci_dev->addr.function);
+	sprintf(name, "mlx5_regex_%s", dev->name);
 }
 
 static int
-mlx5_regex_pci_probe(struct rte_pci_driver *pci_drv __rte_unused,
-		     struct rte_pci_device *pci_dev)
+mlx5_regex_dev_probe(struct rte_device *rte_dev)
 {
 	struct ibv_device *ibv;
 	struct mlx5_regex_priv *priv = NULL;
@@ -94,16 +92,10 @@ mlx5_regex_pci_probe(struct rte_pci_driver *pci_drv __rte_unused,
 	int ret;
 	uint32_t val;
 
-	ibv = mlx5_os_get_ibv_device(&pci_dev->addr);
-	if (!ibv) {
-		DRV_LOG(ERR, "No matching IB device for PCI slot "
-			PCI_PRI_FMT ".", pci_dev->addr.domain,
-			pci_dev->addr.bus, pci_dev->addr.devid,
-			pci_dev->addr.function);
+	ibv = mlx5_get_ibv_device(rte_dev);
+	if (ibv == NULL)
 		return -rte_errno;
-	}
-	DRV_LOG(INFO, "PCI information matches for device \"%s\".",
-		ibv->name);
+	DRV_LOG(INFO, "Probe device \"%s\".", ibv->name);
 	ctx = mlx5_glue->dv_open_device(ibv);
 	if (!ctx) {
 		DRV_LOG(ERR, "Failed to open IB device \"%s\".", ibv->name);
@@ -146,7 +138,7 @@ mlx5_regex_pci_probe(struct rte_pci_driver *pci_drv __rte_unused,
 		priv->is_bf2 = 1;
 	/* Default RXP programming mode to Shared. */
 	priv->prog_mode = MLX5_RXP_SHARED_PROG_MODE;
-	mlx5_regex_get_name(name, pci_dev);
+	mlx5_regex_get_name(name, rte_dev);
 	priv->regexdev = rte_regexdev_register(name);
 	if (priv->regexdev == NULL) {
 		DRV_LOG(ERR, "Failed to register RegEx device.");
@@ -180,7 +172,7 @@ mlx5_regex_pci_probe(struct rte_pci_driver *pci_drv __rte_unused,
 		priv->regexdev->enqueue = mlx5_regexdev_enqueue_gga;
 #endif
 	priv->regexdev->dequeue = mlx5_regexdev_dequeue;
-	priv->regexdev->device = (struct rte_device *)pci_dev;
+	priv->regexdev->device = rte_dev;
 	priv->regexdev->data->dev_private = priv;
 	priv->regexdev->state = RTE_REGEXDEV_READY;
 	priv->mr_scache.reg_mr_cb = mlx5_common_verbs_reg_mr;
@@ -213,13 +205,13 @@ mlx5_regex_pci_probe(struct rte_pci_driver *pci_drv __rte_unused,
 }
 
 static int
-mlx5_regex_pci_remove(struct rte_pci_device *pci_dev)
+mlx5_regex_dev_remove(struct rte_device *rte_dev)
 {
 	char name[RTE_REGEXDEV_NAME_MAX_LEN];
 	struct rte_regexdev *dev;
 	struct mlx5_regex_priv *priv = NULL;
 
-	mlx5_regex_get_name(name, pci_dev);
+	mlx5_regex_get_name(name, rte_dev);
 	dev = rte_regexdev_get_device_by_name(name);
 	if (!dev)
 		return 0;
@@ -254,24 +246,19 @@ static const struct rte_pci_id mlx5_regex_pci_id_map[] = {
 	}
 };
 
-static struct mlx5_pci_driver mlx5_regex_driver = {
-	.driver_class = MLX5_CLASS_REGEX,
-	.pci_driver = {
-		.driver = {
-			.name = RTE_STR(MLX5_REGEX_DRIVER_NAME),
-		},
-		.id_table = mlx5_regex_pci_id_map,
-		.probe = mlx5_regex_pci_probe,
-		.remove = mlx5_regex_pci_remove,
-		.drv_flags = 0,
-	},
+static struct mlx5_class_driver mlx5_regex_driver = {
+	.drv_class = MLX5_CLASS_REGEX,
+	.name = RTE_STR(MLX5_REGEX_DRIVER_NAME),
+	.id_table = mlx5_regex_pci_id_map,
+	.probe = mlx5_regex_dev_probe,
+	.remove = mlx5_regex_dev_remove,
 };
 
 RTE_INIT(rte_mlx5_regex_init)
 {
 	mlx5_common_init();
 	if (mlx5_glue)
-		mlx5_pci_driver_register(&mlx5_regex_driver);
+		mlx5_class_driver_register(&mlx5_regex_driver);
 }
 
 RTE_LOG_REGISTER_DEFAULT(mlx5_regex_logtype, NOTICE)
diff --git a/drivers/regex/mlx5/mlx5_regex.h b/drivers/regex/mlx5/mlx5_regex.h
index 51a2101e53..45200bf937 100644
--- a/drivers/regex/mlx5/mlx5_regex.h
+++ b/drivers/regex/mlx5/mlx5_regex.h
@@ -59,7 +59,6 @@ struct mlx5_regex_db {
 struct mlx5_regex_priv {
 	TAILQ_ENTRY(mlx5_regex_priv) next;
 	struct ibv_context *ctx; /* Device context. */
-	struct rte_pci_device *pci_dev;
 	struct rte_regexdev *regexdev; /* Pointer to the RegEx dev. */
 	uint16_t nb_queues; /* Number of queues. */
 	struct mlx5_regex_qp *qps; /* Pointer to the qp array. */
-- 
2.25.1



* [dpdk-dev] [RFC 06/14] compress/mlx5: migrate to common driver
  2021-05-27 13:37 [dpdk-dev] [RFC 00/14] mlx5: support SubFunction Xueming Li
                   ` (4 preceding siblings ...)
  2021-05-27 13:37 ` [dpdk-dev] [RFC 05/14] regex/mlx5: migrate to " Xueming Li
@ 2021-05-27 13:37 ` Xueming Li
  2021-05-27 13:37 ` [dpdk-dev] [RFC 07/14] vdpa/mlx5: fix driver name Xueming Li
                   ` (4 subsequent siblings)
  10 siblings, 0 replies; 42+ messages in thread
From: Xueming Li @ 2021-05-27 13:37 UTC (permalink / raw)
  To: Viacheslav Ovsiienko
  Cc: dev, xuemingl, Matan Azrad, Fiona Trahe, Ashish Gupta

To support the auxiliary bus, upgrade the driver to use the mlx5 common
driver structure.

Signed-off-by: Xueming Li <xuemingl@nvidia.com>
---
 drivers/compress/mlx5/mlx5_compress.c | 71 ++++++---------------------
 1 file changed, 15 insertions(+), 56 deletions(-)
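
The remove path now finds the private data by comparing device
pointers rather than PCI addresses. A minimal standalone sketch of
that lookup follows, assuming a <sys/queue.h> TAILQ as in the driver.
The sketch_* names are hypothetical, and locking is left out for
brevity.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/queue.h>

/* Hypothetical stand-in for the generic device handle. */
struct sketch_device {
	const char *name;
};

/* Hypothetical per-device private data kept on a global list. */
struct sketch_priv {
	TAILQ_ENTRY(sketch_priv) next;
	const struct sketch_device *dev; /* Device this entry was probed for. */
};

TAILQ_HEAD(sketch_priv_list, sketch_priv);

/* Unlink and return the entry bound to dev, or NULL if there is none. */
static struct sketch_priv *
sketch_priv_detach(struct sketch_priv_list *list,
		   const struct sketch_device *dev)
{
	struct sketch_priv *priv;

	assert(list != NULL);
	/* TAILQ_FOREACH leaves priv NULL when the loop runs to the end. */
	TAILQ_FOREACH(priv, list, next)
		if (priv->dev == dev)
			break;
	if (priv != NULL)
		TAILQ_REMOVE(list, priv, next);
	return priv;
}
```

Pointer comparison works on any bus because it does not depend on the
address format of the underlying device.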

diff --git a/drivers/compress/mlx5/mlx5_compress.c b/drivers/compress/mlx5/mlx5_compress.c
index 80c564f10b..10d64a72d6 100644
--- a/drivers/compress/mlx5/mlx5_compress.c
+++ b/drivers/compress/mlx5/mlx5_compress.c
@@ -5,7 +5,7 @@
 #include <rte_malloc.h>
 #include <rte_log.h>
 #include <rte_errno.h>
-#include <rte_pci.h>
+#include <rte_bus_pci.h>
 #include <rte_spinlock.h>
 #include <rte_comp.h>
 #include <rte_compressdev.h>
@@ -13,7 +13,6 @@
 
 #include <mlx5_glue.h>
 #include <mlx5_common.h>
-#include <mlx5_common_pci.h>
 #include <mlx5_devx_cmds.h>
 #include <mlx5_common_os.h>
 #include <mlx5_common_devx.h>
@@ -37,7 +36,6 @@ struct mlx5_compress_xform {
 struct mlx5_compress_priv {
 	TAILQ_ENTRY(mlx5_compress_priv) next;
 	struct ibv_context *ctx; /* Device context. */
-	struct rte_pci_device *pci_dev;
 	struct rte_compressdev *cdev;
 	void *uar;
 	uint32_t pdn; /* Protection Domain number. */
@@ -711,23 +709,8 @@ mlx5_compress_hw_global_prepare(struct mlx5_compress_priv *priv)
 	return 0;
 }
 
-/**
- * DPDK callback to register a PCI device.
- *
- * This function spawns compress device out of a given PCI device.
- *
- * @param[in] pci_drv
- *   PCI driver structure (mlx5_compress_driver).
- * @param[in] pci_dev
- *   PCI device information.
- *
- * @return
- *   0 on success, 1 to skip this driver, a negative errno value otherwise
- *   and rte_errno is set.
- */
 static int
-mlx5_compress_pci_probe(struct rte_pci_driver *pci_drv,
-			struct rte_pci_device *pci_dev)
+mlx5_compress_dev_probe(struct rte_device *dev)
 {
 	struct ibv_device *ibv;
 	struct rte_compressdev *cdev;
@@ -736,24 +719,17 @@ mlx5_compress_pci_probe(struct rte_pci_driver *pci_drv,
 	struct mlx5_hca_attr att = { 0 };
 	struct rte_compressdev_pmd_init_params init_params = {
 		.name = "",
-		.socket_id = pci_dev->device.numa_node,
+		.socket_id = dev->numa_node,
 	};
 
-	RTE_SET_USED(pci_drv);
 	if (rte_eal_process_type() != RTE_PROC_PRIMARY) {
 		DRV_LOG(ERR, "Non-primary process type is not supported.");
 		rte_errno = ENOTSUP;
 		return -rte_errno;
 	}
-	ibv = mlx5_os_get_ibv_device(&pci_dev->addr);
-	if (ibv == NULL) {
-		DRV_LOG(ERR, "No matching IB device for PCI slot "
-			PCI_PRI_FMT ".", pci_dev->addr.domain,
-			pci_dev->addr.bus, pci_dev->addr.devid,
-			pci_dev->addr.function);
+	ibv = mlx5_get_ibv_device(dev);
+	if (ibv == NULL)
 		return -rte_errno;
-	}
-	DRV_LOG(INFO, "PCI information matches for device \"%s\".", ibv->name);
 	ctx = mlx5_glue->dv_open_device(ibv);
 	if (ctx == NULL) {
 		DRV_LOG(ERR, "Failed to open IB device \"%s\".", ibv->name);
@@ -769,7 +745,7 @@ mlx5_compress_pci_probe(struct rte_pci_driver *pci_drv,
 		rte_errno = ENOTSUP;
 		return -ENOTSUP;
 	}
-	cdev = rte_compressdev_pmd_create(ibv->name, &pci_dev->device,
+	cdev = rte_compressdev_pmd_create(ibv->name, dev,
 					  sizeof(*priv), &init_params);
 	if (cdev == NULL) {
 		DRV_LOG(ERR, "Failed to create device \"%s\".", ibv->name);
@@ -784,7 +760,6 @@ mlx5_compress_pci_probe(struct rte_pci_driver *pci_drv,
 	cdev->feature_flags = RTE_COMPDEV_FF_HW_ACCELERATED;
 	priv = cdev->data->dev_private;
 	priv->ctx = ctx;
-	priv->pci_dev = pci_dev;
 	priv->cdev = cdev;
 	priv->min_block_size = att.compress_min_block_size;
 	priv->sq_ts_format = att.sq_ts_format;
@@ -810,25 +785,14 @@ mlx5_compress_pci_probe(struct rte_pci_driver *pci_drv,
 	return 0;
 }
 
-/**
- * DPDK callback to remove a PCI device.
- *
- * This function removes all compress devices belong to a given PCI device.
- *
- * @param[in] pci_dev
- *   Pointer to the PCI device.
- *
- * @return
- *   0 on success, the function cannot fail.
- */
 static int
-mlx5_compress_pci_remove(struct rte_pci_device *pdev)
+mlx5_compress_dev_remove(struct rte_device *dev)
 {
 	struct mlx5_compress_priv *priv = NULL;
 
 	pthread_mutex_lock(&priv_list_lock);
 	TAILQ_FOREACH(priv, &mlx5_compress_priv_list, next)
-		if (rte_pci_addr_cmp(&priv->pci_dev->addr, &pdev->addr) != 0)
+		if (priv->cdev->device == dev)
 			break;
 	if (priv)
 		TAILQ_REMOVE(&mlx5_compress_priv_list, priv, next);
@@ -852,24 +816,19 @@ static const struct rte_pci_id mlx5_compress_pci_id_map[] = {
 	}
 };
 
-static struct mlx5_pci_driver mlx5_compress_driver = {
-	.driver_class = MLX5_CLASS_COMPRESS,
-	.pci_driver = {
-		.driver = {
-			.name = RTE_STR(MLX5_COMPRESS_DRIVER_NAME),
-		},
-		.id_table = mlx5_compress_pci_id_map,
-		.probe = mlx5_compress_pci_probe,
-		.remove = mlx5_compress_pci_remove,
-		.drv_flags = 0,
-	},
+static struct mlx5_class_driver mlx5_compress_driver = {
+	.drv_class = MLX5_CLASS_COMPRESS,
+	.name = RTE_STR(MLX5_COMPRESS_DRIVER_NAME),
+	.id_table = mlx5_compress_pci_id_map,
+	.probe = mlx5_compress_dev_probe,
+	.remove = mlx5_compress_dev_remove,
 };
 
 RTE_INIT(rte_mlx5_compress_init)
 {
 	mlx5_common_init();
 	if (mlx5_glue != NULL)
-		mlx5_pci_driver_register(&mlx5_compress_driver);
+		mlx5_class_driver_register(&mlx5_compress_driver);
 }
 
 RTE_LOG_REGISTER_DEFAULT(mlx5_compress_logtype, NOTICE)
-- 
2.25.1



* [dpdk-dev] [RFC 07/14] vdpa/mlx5: fix driver name
  2021-05-27 13:37 [dpdk-dev] [RFC 00/14] mlx5: support SubFunction Xueming Li
                   ` (5 preceding siblings ...)
  2021-05-27 13:37 ` [dpdk-dev] [RFC 06/14] compress/mlx5: " Xueming Li
@ 2021-05-27 13:37 ` Xueming Li
  2021-05-27 13:37 ` [dpdk-dev] [RFC 08/14] vdpa/mlx5: remove PCI specifics Xueming Li
                   ` (3 subsequent siblings)
  10 siblings, 0 replies; 42+ messages in thread
From: Xueming Li @ 2021-05-27 13:37 UTC (permalink / raw)
  To: Viacheslav Ovsiienko; +Cc: Thomas Monjalon, dev, xuemingl, Matan Azrad

From: Thomas Monjalon <thomas@monjalon.net>

Signed-off-by: Thomas Monjalon <thomas@monjalon.net>
---
 drivers/vdpa/mlx5/mlx5_vdpa.c | 9 +++++----
 1 file changed, 5 insertions(+), 4 deletions(-)
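
For illustration, the name macro can serve both as a token and as a
string when the stringify helper expands its argument first. The
sketch below assumes RTE_STR behaves like the two-level SKETCH_STR
here; all SKETCH_* and sketch_* names are hypothetical.

```c
#include <assert.h>
#include <string.h>

/* Two-level stringification: the outer macro expands x before '#' applies. */
#define SKETCH_STR_(x) #x
#define SKETCH_STR(x) SKETCH_STR_(x)

/* Single definition of the driver name, usable as a bare token. */
#define SKETCH_DRIVER_NAME vdpa_mlx5

/* Return non-zero if s equals the driver name derived from the macro. */
static int
sketch_name_matches(const char *s)
{
	assert(s != NULL);
	return strcmp(s, SKETCH_STR(SKETCH_DRIVER_NAME)) == 0;
}
```

With a single-level macro the result would be the macro's own name
rather than its value, which is why the indirection is needed.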

diff --git a/drivers/vdpa/mlx5/mlx5_vdpa.c b/drivers/vdpa/mlx5/mlx5_vdpa.c
index 8b5bfd8c3d..5ab7c525c2 100644
--- a/drivers/vdpa/mlx5/mlx5_vdpa.c
+++ b/drivers/vdpa/mlx5/mlx5_vdpa.c
@@ -24,6 +24,7 @@
 #include "mlx5_vdpa_utils.h"
 #include "mlx5_vdpa.h"
 
+#define MLX5_VDPA_DRIVER_NAME vdpa_mlx5
 
 #define MLX5_VDPA_DEFAULT_FEATURES ((1ULL << VHOST_USER_F_PROTOCOL_FEATURES) | \
 			    (1ULL << VIRTIO_F_ANY_LAYOUT) | \
@@ -834,7 +835,7 @@ static struct mlx5_pci_driver mlx5_vdpa_driver = {
 	.driver_class = MLX5_CLASS_VDPA,
 	.pci_driver = {
 		.driver = {
-			.name = "mlx5_vdpa",
+			.name = RTE_STR(MLX5_VDPA_DRIVER_NAME),
 		},
 		.id_table = mlx5_vdpa_pci_id_map,
 		.probe = mlx5_vdpa_pci_probe,
@@ -855,6 +856,6 @@ RTE_INIT(rte_mlx5_vdpa_init)
 		mlx5_pci_driver_register(&mlx5_vdpa_driver);
 }
 
-RTE_PMD_EXPORT_NAME(net_mlx5_vdpa, __COUNTER__);
-RTE_PMD_REGISTER_PCI_TABLE(net_mlx5_vdpa, mlx5_vdpa_pci_id_map);
-RTE_PMD_REGISTER_KMOD_DEP(net_mlx5_vdpa, "* ib_uverbs & mlx5_core & mlx5_ib");
+RTE_PMD_EXPORT_NAME(MLX5_VDPA_DRIVER_NAME, __COUNTER__);
+RTE_PMD_REGISTER_PCI_TABLE(MLX5_VDPA_DRIVER_NAME, mlx5_vdpa_pci_id_map);
+RTE_PMD_REGISTER_KMOD_DEP(MLX5_VDPA_DRIVER_NAME, "* ib_uverbs & mlx5_core & mlx5_ib");
-- 
2.25.1



* [dpdk-dev] [RFC 08/14] vdpa/mlx5: remove PCI specifics
  2021-05-27 13:37 [dpdk-dev] [RFC 00/14] mlx5: support SubFunction Xueming Li
                   ` (6 preceding siblings ...)
  2021-05-27 13:37 ` [dpdk-dev] [RFC 07/14] vdpa/mlx5: fix driver name Xueming Li
@ 2021-05-27 13:37 ` Xueming Li
  2021-05-27 13:37 ` [dpdk-dev] [RFC 09/14] common/mlx5: clean up legacy PCI bus driver Xueming Li
                   ` (2 subsequent siblings)
  10 siblings, 0 replies; 42+ messages in thread
From: Xueming Li @ 2021-05-27 13:37 UTC (permalink / raw)
  To: Viacheslav Ovsiienko; +Cc: Thomas Monjalon, dev, xuemingl, Matan Azrad

From: Thomas Monjalon <thomas@monjalon.net>

Signed-off-by: Thomas Monjalon <thomas@monjalon.net>
---
 drivers/vdpa/mlx5/mlx5_vdpa.c | 119 ++++++++++------------------------
 drivers/vdpa/mlx5/mlx5_vdpa.h |   1 -
 2 files changed, 34 insertions(+), 86 deletions(-)
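
The probe now disables RoCE first and then polls until the IB device
shows up again. A minimal standalone sketch of such a bounded retry
loop follows. The lookup callback and all sketch_* names are
hypothetical stand-ins, not the driver's API.

```c
#include <assert.h>
#include <stddef.h>
#include <unistd.h>

/* Hypothetical lookup callback: returns the device, or NULL if not ready. */
typedef void *(*sketch_lookup_t)(void *arg);

/*
 * Poll lookup() up to max_retries times, sleeping delay_us between
 * failed attempts. Returns the device, or NULL if it never appeared.
 */
static void *
sketch_wait_for_device(sketch_lookup_t lookup, void *arg,
		       int max_retries, unsigned int delay_us)
{
	void *found = NULL; /* Stays NULL if max_retries is not positive. */
	int retry;

	assert(lookup != NULL);
	for (retry = max_retries; retry > 0; --retry) {
		found = lookup(arg);
		if (found != NULL)
			break;
		usleep(delay_us);
	}
	return found;
}

/* Example lookup: fails until the counter pointed to by arg reaches zero. */
static void *
sketch_countdown_lookup(void *arg)
{
	int *remaining = arg;

	if (*remaining > 0) {
		(*remaining)--;
		return NULL;
	}
	return arg;
}
```

Initializing the result to NULL keeps the final check well defined
even if the retry count were zero.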

diff --git a/drivers/vdpa/mlx5/mlx5_vdpa.c b/drivers/vdpa/mlx5/mlx5_vdpa.c
index 5ab7c525c2..967234193f 100644
--- a/drivers/vdpa/mlx5/mlx5_vdpa.c
+++ b/drivers/vdpa/mlx5/mlx5_vdpa.c
@@ -11,12 +11,11 @@
 #include <rte_malloc.h>
 #include <rte_log.h>
 #include <rte_errno.h>
-#include <rte_pci.h>
 #include <rte_string_fns.h>
+#include <rte_bus_pci.h>
 
 #include <mlx5_glue.h>
 #include <mlx5_common.h>
-#include <mlx5_common_pci.h>
 #include <mlx5_devx_cmds.h>
 #include <mlx5_prm.h>
 #include <mlx5_nl.h>
@@ -552,34 +551,13 @@ mlx5_vdpa_sys_roce_disable(const char *addr)
 }
 
 static int
-mlx5_vdpa_roce_disable(struct rte_pci_addr *addr, struct ibv_device **ibv)
+mlx5_vdpa_roce_disable(struct rte_device *dev)
 {
-	char addr_name[64] = {0};
-
-	rte_pci_device_name(addr, addr_name, sizeof(addr_name));
 	/* Firstly try to disable ROCE by Netlink and fallback to sysfs. */
-	if (mlx5_vdpa_nl_roce_disable(addr_name) == 0 ||
-	    mlx5_vdpa_sys_roce_disable(addr_name) == 0) {
-		/*
-		 * Succeed to disable ROCE, wait for the IB device to appear
-		 * again after reload.
-		 */
-		int r;
-		struct ibv_device *ibv_new;
-
-		for (r = MLX5_VDPA_MAX_RETRIES; r; r--) {
-			ibv_new = mlx5_os_get_ibv_device(addr);
-			if (ibv_new) {
-				*ibv = ibv_new;
-				return 0;
-			}
-			usleep(MLX5_VDPA_USEC);
-		}
-		DRV_LOG(ERR, "Cannot much device %s after ROCE disable, "
-			"retries exceed %d", addr_name, MLX5_VDPA_MAX_RETRIES);
-		rte_errno = EAGAIN;
-	}
-	return -rte_errno;
+	if (mlx5_vdpa_nl_roce_disable(dev->name) != 0 &&
+	    mlx5_vdpa_sys_roce_disable(dev->name) != 0)
+		return -rte_errno;
+	return 0;
 }
 
 static int
@@ -647,44 +625,33 @@ mlx5_vdpa_config_get(struct rte_devargs *devargs, struct mlx5_vdpa_priv *priv)
 	DRV_LOG(DEBUG, "no traffic max is %u.", priv->no_traffic_max);
 }
 
-/**
- * DPDK callback to register a mlx5 PCI device.
- *
- * This function spawns vdpa device out of a given PCI device.
- *
- * @param[in] pci_drv
- *   PCI driver structure (mlx5_vpda_driver).
- * @param[in] pci_dev
- *   PCI device information.
- *
- * @return
- *   0 on success, 1 to skip this driver, a negative errno value otherwise
- *   and rte_errno is set.
- */
 static int
-mlx5_vdpa_pci_probe(struct rte_pci_driver *pci_drv __rte_unused,
-		    struct rte_pci_device *pci_dev __rte_unused)
+mlx5_vdpa_dev_probe(struct rte_device *dev)
 {
 	struct ibv_device *ibv;
 	struct mlx5_vdpa_priv *priv = NULL;
 	struct ibv_context *ctx = NULL;
 	struct mlx5_hca_attr attr;
+	int retry;
 	int ret;
 
-	ibv = mlx5_os_get_ibv_device(&pci_dev->addr);
-	if (!ibv) {
-		DRV_LOG(ERR, "No matching IB device for PCI slot "
-			PCI_PRI_FMT ".", pci_dev->addr.domain,
-			pci_dev->addr.bus, pci_dev->addr.devid,
-			pci_dev->addr.function);
+	if (mlx5_vdpa_roce_disable(dev) != 0) {
+		DRV_LOG(WARNING, "Failed to disable ROCE for \"%s\".",
+			dev->name);
 		return -rte_errno;
-	} else {
-		DRV_LOG(INFO, "PCI information matches for device \"%s\".",
-			ibv->name);
 	}
-	if (mlx5_vdpa_roce_disable(&pci_dev->addr, &ibv) != 0) {
-		DRV_LOG(WARNING, "Failed to disable ROCE for \"%s\".",
-			ibv->name);
+	/* Wait for the IB device to appear again after reload. */
+	for (retry = MLX5_VDPA_MAX_RETRIES; retry > 0; --retry) {
+		ibv = mlx5_get_ibv_device(dev);
+		if (ibv != NULL)
+			break;
+		usleep(MLX5_VDPA_USEC);
+	}
+	if (ibv == NULL) {
+		DRV_LOG(ERR, "Cannot get IB device after disabling RoCE for "
+				"\"%s\", retries exceed %d.",
+				dev->name, MLX5_VDPA_MAX_RETRIES);
+		rte_errno = EAGAIN;
 		return -rte_errno;
 	}
 	ctx = mlx5_glue->dv_open_device(ibv);
@@ -722,20 +689,18 @@ mlx5_vdpa_pci_probe(struct rte_pci_driver *pci_drv __rte_unused,
 	if (attr.num_lag_ports == 0)
 		priv->num_lag_ports = 1;
 	priv->ctx = ctx;
-	priv->pci_dev = pci_dev;
 	priv->var = mlx5_glue->dv_alloc_var(ctx, 0);
 	if (!priv->var) {
 		DRV_LOG(ERR, "Failed to allocate VAR %u.", errno);
 		goto error;
 	}
-	priv->vdev = rte_vdpa_register_device(&pci_dev->device,
-			&mlx5_vdpa_ops);
+	priv->vdev = rte_vdpa_register_device(dev, &mlx5_vdpa_ops);
 	if (priv->vdev == NULL) {
 		DRV_LOG(ERR, "Failed to register vDPA device.");
 		rte_errno = rte_errno ? rte_errno : EINVAL;
 		goto error;
 	}
-	mlx5_vdpa_config_get(pci_dev->device.devargs, priv);
+	mlx5_vdpa_config_get(dev->devargs, priv);
 	SLIST_INIT(&priv->mr_list);
 	pthread_mutex_init(&priv->vq_config_lock, NULL);
 	pthread_mutex_lock(&priv_list_lock);
@@ -754,26 +719,15 @@ mlx5_vdpa_pci_probe(struct rte_pci_driver *pci_drv __rte_unused,
 	return -rte_errno;
 }
 
-/**
- * DPDK callback to remove a PCI device.
- *
- * This function removes all vDPA devices belong to a given PCI device.
- *
- * @param[in] pci_dev
- *   Pointer to the PCI device.
- *
- * @return
- *   0 on success, the function cannot fail.
- */
 static int
-mlx5_vdpa_pci_remove(struct rte_pci_device *pci_dev)
+mlx5_vdpa_dev_remove(struct rte_device *dev)
 {
 	struct mlx5_vdpa_priv *priv = NULL;
 	int found = 0;
 
 	pthread_mutex_lock(&priv_list_lock);
 	TAILQ_FOREACH(priv, &priv_list, next) {
-		if (!rte_pci_addr_cmp(&priv->pci_dev->addr, &pci_dev->addr)) {
+		if (priv->vdev->device == dev) {
 			found = 1;
 			break;
 		}
@@ -831,17 +785,12 @@ static const struct rte_pci_id mlx5_vdpa_pci_id_map[] = {
 	}
 };
 
-static struct mlx5_pci_driver mlx5_vdpa_driver = {
-	.driver_class = MLX5_CLASS_VDPA,
-	.pci_driver = {
-		.driver = {
-			.name = RTE_STR(MLX5_VDPA_DRIVER_NAME),
-		},
-		.id_table = mlx5_vdpa_pci_id_map,
-		.probe = mlx5_vdpa_pci_probe,
-		.remove = mlx5_vdpa_pci_remove,
-		.drv_flags = 0,
-	},
+static struct mlx5_class_driver mlx5_vdpa_driver = {
+	.drv_class = MLX5_CLASS_VDPA,
+	.name = RTE_STR(MLX5_VDPA_DRIVER_NAME),
+	.id_table = mlx5_vdpa_pci_id_map,
+	.probe = mlx5_vdpa_dev_probe,
+	.remove = mlx5_vdpa_dev_remove,
 };
 
 RTE_LOG_REGISTER_DEFAULT(mlx5_vdpa_logtype, NOTICE)
@@ -853,7 +802,7 @@ RTE_INIT(rte_mlx5_vdpa_init)
 {
 	mlx5_common_init();
 	if (mlx5_glue)
-		mlx5_pci_driver_register(&mlx5_vdpa_driver);
+		mlx5_class_driver_register(&mlx5_vdpa_driver);
 }
 
 RTE_PMD_EXPORT_NAME(MLX5_VDPA_DRIVER_NAME, __COUNTER__);
diff --git a/drivers/vdpa/mlx5/mlx5_vdpa.h b/drivers/vdpa/mlx5/mlx5_vdpa.h
index 722c72b65e..2a04e36607 100644
--- a/drivers/vdpa/mlx5/mlx5_vdpa.h
+++ b/drivers/vdpa/mlx5/mlx5_vdpa.h
@@ -133,7 +133,6 @@ struct mlx5_vdpa_priv {
 	struct rte_vdpa_device *vdev; /* vDPA device. */
 	int vid; /* vhost device id. */
 	struct ibv_context *ctx; /* Device context. */
-	struct rte_pci_device *pci_dev;
 	struct mlx5_hca_vdpa_attr caps;
 	uint32_t pdn; /* Protection Domain number. */
 	struct ibv_pd *pd;
-- 
2.25.1



* [dpdk-dev] [RFC 09/14] common/mlx5: clean up legacy PCI bus driver
  2021-05-27 13:37 [dpdk-dev] [RFC 00/14] mlx5: support SubFunction Xueming Li
                   ` (7 preceding siblings ...)
  2021-05-27 13:37 ` [dpdk-dev] [RFC 08/14] vdpa/mlx5: remove PCI specifics Xueming Li
@ 2021-05-27 13:37 ` Xueming Li
  2021-05-27 14:01 ` [dpdk-dev] [RFC 10/14] bus/auxiliary: introduce auxiliary bus Xueming Li
  2021-06-10 10:33 ` [dpdk-dev] [RFC 00/14] mlx5: " Ferruh Yigit
  10 siblings, 0 replies; 42+ messages in thread
From: Xueming Li @ 2021-05-27 13:37 UTC (permalink / raw)
  To: Viacheslav Ovsiienko
  Cc: dev, xuemingl, Matan Azrad, Shahaf Shuler, Ray Kinsella, Neil Horman

Clean up the legacy PCI bus driver, since all mlx5 PMDs have moved to the
new common PCI bus driver.

Signed-off-by: Xueming Li <xuemingl@nvidia.com>
---
 drivers/common/mlx5/linux/mlx5_common_os.c |  28 --
 drivers/common/mlx5/linux/mlx5_common_os.h |   4 -
 drivers/common/mlx5/mlx5_common.c          |   1 -
 drivers/common/mlx5/mlx5_common_pci.c      | 433 +--------------------
 drivers/common/mlx5/mlx5_common_pci.h      |  77 ----
 drivers/common/mlx5/mlx5_common_private.h  |   1 +
 drivers/common/mlx5/version.map            |   4 -
 7 files changed, 3 insertions(+), 545 deletions(-)
 delete mode 100644 drivers/common/mlx5/mlx5_common_pci.h

diff --git a/drivers/common/mlx5/linux/mlx5_common_os.c b/drivers/common/mlx5/linux/mlx5_common_os.c
index ea6001e6b2..cd1c305cc1 100644
--- a/drivers/common/mlx5/linux/mlx5_common_os.c
+++ b/drivers/common/mlx5/linux/mlx5_common_os.c
@@ -401,31 +401,3 @@ mlx5_glue_constructor(void)
 		" libmlx5)");
 	mlx5_glue = NULL;
 }
-
-struct ibv_device *
-mlx5_os_get_ibv_device(struct rte_pci_addr *addr)
-{
-	int n;
-	struct ibv_device **ibv_list = mlx5_glue->get_device_list(&n);
-	struct ibv_device *ibv_match = NULL;
-
-	if (ibv_list == NULL) {
-		rte_errno = ENOSYS;
-		return NULL;
-	}
-	while (n-- > 0) {
-		struct rte_pci_addr paddr;
-
-		DRV_LOG(DEBUG, "Checking device \"%s\"..", ibv_list[n]->name);
-		if (mlx5_dev_to_pci_addr(ibv_list[n]->ibdev_path, &paddr) != 0)
-			continue;
-		if (rte_pci_addr_cmp(addr, &paddr) != 0)
-			continue;
-		ibv_match = ibv_list[n];
-		break;
-	}
-	if (ibv_match == NULL)
-		rte_errno = ENOENT;
-	mlx5_glue->free_device_list(ibv_list);
-	return ibv_match;
-}
diff --git a/drivers/common/mlx5/linux/mlx5_common_os.h b/drivers/common/mlx5/linux/mlx5_common_os.h
index 72d6bf828b..bce5a11c0f 100644
--- a/drivers/common/mlx5/linux/mlx5_common_os.h
+++ b/drivers/common/mlx5/linux/mlx5_common_os.h
@@ -289,8 +289,4 @@ mlx5_os_free(void *addr)
 	free(addr);
 }
 
-__rte_internal
-struct ibv_device *
-mlx5_os_get_ibv_device(struct rte_pci_addr *addr);
-
 #endif /* RTE_PMD_MLX5_COMMON_OS_H_ */
diff --git a/drivers/common/mlx5/mlx5_common.c b/drivers/common/mlx5/mlx5_common.c
index f2e2a95ae0..875668d72b 100644
--- a/drivers/common/mlx5/mlx5_common.c
+++ b/drivers/common/mlx5/mlx5_common.c
@@ -14,7 +14,6 @@
 #include "mlx5_common.h"
 #include "mlx5_common_os.h"
 #include "mlx5_common_log.h"
-#include "mlx5_common_pci.h"
 #include "mlx5_common_private.h"
 
 uint8_t haswell_broadwell_cpu;
diff --git a/drivers/common/mlx5/mlx5_common_pci.c b/drivers/common/mlx5/mlx5_common_pci.c
index 5a824dd50f..c1500f3a2b 100644
--- a/drivers/common/mlx5/mlx5_common_pci.c
+++ b/drivers/common/mlx5/mlx5_common_pci.c
@@ -8,431 +8,17 @@
 #include <rte_devargs.h>
 #include <rte_errno.h>
 #include <rte_class.h>
+#include <rte_pci.h>
+#include <rte_bus_pci.h>
 
 #include "mlx5_common_log.h"
-#include "mlx5_common_pci.h"
 #include "mlx5_common_private.h"
 
 static struct rte_pci_driver mlx5_common_pci_driver;
 
-/********** Legacy PCI bus driver, to be removed ********/
-
-struct mlx5_pci_device {
-	struct rte_pci_device *pci_dev;
-	TAILQ_ENTRY(mlx5_pci_device) next;
-	uint32_t classes_loaded;
-};
-
-/* Head of list of drivers. */
-static TAILQ_HEAD(mlx5_pci_bus_drv_head, mlx5_pci_driver) drv_list =
-				TAILQ_HEAD_INITIALIZER(drv_list);
-
-/* Head of mlx5 pci devices. */
-static TAILQ_HEAD(mlx5_pci_devices_head, mlx5_pci_device) devices_list =
-				TAILQ_HEAD_INITIALIZER(devices_list);
-
-static const struct {
-	const char *name;
-	unsigned int driver_class;
-} mlx5_classes[] = {
-	{ .name = "vdpa", .driver_class = MLX5_CLASS_VDPA },
-	{ .name = "net", .driver_class = MLX5_CLASS_NET },
-	{ .name = "regex", .driver_class = MLX5_CLASS_REGEX },
-	{ .name = "compress", .driver_class = MLX5_CLASS_COMPRESS },
-};
-
-static const unsigned int mlx5_class_combinations[] = {
-	MLX5_CLASS_NET,
-	MLX5_CLASS_VDPA,
-	MLX5_CLASS_REGEX,
-	MLX5_CLASS_COMPRESS,
-	MLX5_CLASS_NET | MLX5_CLASS_REGEX,
-	MLX5_CLASS_VDPA | MLX5_CLASS_REGEX,
-	MLX5_CLASS_NET | MLX5_CLASS_COMPRESS,
-	MLX5_CLASS_VDPA | MLX5_CLASS_COMPRESS,
-	MLX5_CLASS_REGEX | MLX5_CLASS_COMPRESS,
-	MLX5_CLASS_NET | MLX5_CLASS_REGEX | MLX5_CLASS_COMPRESS,
-	MLX5_CLASS_VDPA | MLX5_CLASS_REGEX | MLX5_CLASS_COMPRESS,
-	/* New class combination should be added here. */
-};
-
-static int
-class_name_to_value(const char *class_name)
-{
-	unsigned int i;
-
-	for (i = 0; i < RTE_DIM(mlx5_classes); i++) {
-		if (strcmp(class_name, mlx5_classes[i].name) == 0)
-			return mlx5_classes[i].driver_class;
-	}
-	return -EINVAL;
-}
-
-static struct mlx5_pci_driver *
-driver_get(uint32_t class)
-{
-	struct mlx5_pci_driver *driver;
-
-	TAILQ_FOREACH(driver, &drv_list, next) {
-		if (driver->driver_class == class)
-			return driver;
-	}
-	return NULL;
-}
-
-static int
-bus_cmdline_options_handler(__rte_unused const char *key,
-			    const char *class_names, void *opaque)
-{
-	int *ret = opaque;
-	char *nstr_org;
-	int class_val;
-	char *found;
-	char *nstr;
-	char *refstr = NULL;
-
-	*ret = 0;
-	nstr = strdup(class_names);
-	if (!nstr) {
-		*ret = -ENOMEM;
-		return *ret;
-	}
-	nstr_org = nstr;
-	found = strtok_r(nstr, ":", &refstr);
-	if (!found)
-		goto err;
-	do {
-		/* Extract each individual class name. Multiple
-		 * class key,value is supplied as class=net:vdpa:foo:bar.
-		 */
-		class_val = class_name_to_value(found);
-		/* Check if its a valid class. */
-		if (class_val < 0) {
-			*ret = -EINVAL;
-			goto err;
-		}
-		*ret |= class_val;
-		found = strtok_r(NULL, ":", &refstr);
-	} while (found);
-err:
-	free(nstr_org);
-	if (*ret < 0)
-		DRV_LOG(ERR, "Invalid mlx5 class options %s."
-			" Maybe typo in device class argument setting?",
-			class_names);
-	return *ret;
-}
-
-static int
-parse_class_options(const struct rte_devargs *devargs)
-{
-	const char *key = RTE_DEVARGS_KEY_CLASS;
-	struct rte_kvargs *kvlist;
-	int ret = 0;
-
-	if (devargs == NULL)
-		return 0;
-	kvlist = rte_kvargs_parse(devargs->args, NULL);
-	if (kvlist == NULL)
-		return 0;
-	if (rte_kvargs_count(kvlist, key))
-		rte_kvargs_process(kvlist, key, bus_cmdline_options_handler,
-				   &ret);
-	rte_kvargs_free(kvlist);
-	return ret;
-}
-
-static bool
-mlx5_bus_match(const struct mlx5_pci_driver *drv,
-	       const struct rte_pci_device *pci_dev)
-{
-	const struct rte_pci_id *id_table;
-
-	for (id_table = drv->pci_driver.id_table; id_table->vendor_id != 0;
-	     id_table++) {
-		/* Check if device's ids match the class driver's ids. */
-		if (id_table->vendor_id != pci_dev->id.vendor_id &&
-		    id_table->vendor_id != RTE_PCI_ANY_ID)
-			continue;
-		if (id_table->device_id != pci_dev->id.device_id &&
-		    id_table->device_id != RTE_PCI_ANY_ID)
-			continue;
-		if (id_table->subsystem_vendor_id !=
-		    pci_dev->id.subsystem_vendor_id &&
-		    id_table->subsystem_vendor_id != RTE_PCI_ANY_ID)
-			continue;
-		if (id_table->subsystem_device_id !=
-		    pci_dev->id.subsystem_device_id &&
-		    id_table->subsystem_device_id != RTE_PCI_ANY_ID)
-			continue;
-		if (id_table->class_id != pci_dev->id.class_id &&
-		    id_table->class_id != RTE_CLASS_ANY_ID)
-			continue;
-		return true;
-	}
-	return false;
-}
-
-static int
-is_valid_class_combination(uint32_t user_classes)
-{
-	unsigned int i;
-
-	/* Verify if user specified valid supported combination. */
-	for (i = 0; i < RTE_DIM(mlx5_class_combinations); i++) {
-		if (mlx5_class_combinations[i] == user_classes)
-			return 0;
-	}
-	/* Not found any valid class combination. */
-	return -EINVAL;
-}
-
-static struct mlx5_pci_device *
-pci_to_mlx5_device(const struct rte_pci_device *pci_dev)
-{
-	struct mlx5_pci_device *dev;
-
-	TAILQ_FOREACH(dev, &devices_list, next) {
-		if (dev->pci_dev == pci_dev)
-			return dev;
-	}
-	return NULL;
-}
-
-static bool
-device_class_enabled(const struct mlx5_pci_device *device, uint32_t class)
-{
-	return (device->classes_loaded & class) ? true : false;
-}
-
-static void
-dev_release(struct mlx5_pci_device *dev)
-{
-	TAILQ_REMOVE(&devices_list, dev, next);
-	rte_free(dev);
-}
-
-static int
-drivers_remove(struct mlx5_pci_device *dev, uint32_t enabled_classes)
-{
-	struct mlx5_pci_driver *driver;
-	int local_ret = -ENODEV;
-	unsigned int i = 0;
-	int ret = 0;
-
-	enabled_classes &= dev->classes_loaded;
-	while (enabled_classes) {
-		driver = driver_get(RTE_BIT64(i));
-		if (driver) {
-			local_ret = driver->pci_driver.remove(dev->pci_dev);
-			if (!local_ret)
-				dev->classes_loaded &= ~RTE_BIT64(i);
-			else if (ret == 0)
-				ret = local_ret;
-		}
-		enabled_classes &= ~RTE_BIT64(i);
-		i++;
-	}
-	if (local_ret)
-		ret = local_ret;
-	return ret;
-}
-
-static int
-drivers_probe(struct mlx5_pci_device *dev, struct rte_pci_driver *pci_drv,
-	      struct rte_pci_device *pci_dev, uint32_t user_classes)
-{
-	struct mlx5_pci_driver *driver;
-	uint32_t enabled_classes = 0;
-	bool already_loaded;
-	int ret;
-
-	TAILQ_FOREACH(driver, &drv_list, next) {
-		if ((driver->driver_class & user_classes) == 0)
-			continue;
-		if (!mlx5_bus_match(driver, pci_dev))
-			continue;
-		already_loaded = dev->classes_loaded & driver->driver_class;
-		if (already_loaded &&
-		    !(driver->pci_driver.drv_flags & RTE_PCI_DRV_PROBE_AGAIN)) {
-			DRV_LOG(ERR, "Device %s is already probed",
-				pci_dev->device.name);
-			ret = -EEXIST;
-			goto probe_err;
-		}
-		ret = driver->pci_driver.probe(pci_drv, pci_dev);
-		if (ret < 0) {
-			DRV_LOG(ERR, "Failed to load driver %s",
-				driver->pci_driver.driver.name);
-			goto probe_err;
-		}
-		enabled_classes |= driver->driver_class;
-	}
-	dev->classes_loaded |= enabled_classes;
-	return 0;
-probe_err:
-	/* Only unload drivers which are enabled which were enabled
-	 * in this probe instance.
-	 */
-	drivers_remove(dev, enabled_classes);
-	return ret;
-}
-
-/**
- * DPDK callback to register to probe multiple drivers for a PCI device.
- *
- * @param[in] pci_drv
- *   PCI driver structure.
- * @param[in] dev
- *   PCI device information.
- *
- * @return
- *   0 on success, a negative errno value otherwise and rte_errno is set.
- */
-static int
-mlx5_pci_probe(struct rte_pci_driver *pci_drv __rte_unused,
-	       struct rte_pci_device *pci_dev)
-{
-	struct mlx5_pci_device *dev;
-	uint32_t user_classes = 0;
-	bool new_device = false;
-	int ret;
-
-	ret = parse_class_options(pci_dev->device.devargs);
-	if (ret < 0)
-		return ret;
-	user_classes = ret;
-	if (user_classes) {
-		/* Validate combination here. */
-		ret = is_valid_class_combination(user_classes);
-		if (ret) {
-			DRV_LOG(ERR, "Unsupported mlx5 classes supplied.");
-			return ret;
-		}
-	} else {
-		/* Default to net class. */
-		user_classes = MLX5_CLASS_NET;
-	}
-	dev = pci_to_mlx5_device(pci_dev);
-	if (!dev) {
-		dev = rte_zmalloc("mlx5_pci_device", sizeof(*dev), 0);
-		if (!dev)
-			return -ENOMEM;
-		dev->pci_dev = pci_dev;
-		TAILQ_INSERT_HEAD(&devices_list, dev, next);
-		new_device = true;
-	}
-	ret = drivers_probe(dev, pci_drv, pci_dev, user_classes);
-	if (ret)
-		goto class_err;
-	return 0;
-class_err:
-	if (new_device)
-		dev_release(dev);
-	return ret;
-}
-
-/**
- * DPDK callback to remove one or more drivers for a PCI device.
- *
- * This function removes all drivers probed for a given PCI device.
- *
- * @param[in] pci_dev
- *   Pointer to the PCI device.
- *
- * @return
- *   0 on success, the function cannot fail.
- */
-static int
-mlx5_pci_remove(struct rte_pci_device *pci_dev)
-{
-	struct mlx5_pci_device *dev;
-	int ret;
-
-	dev = pci_to_mlx5_device(pci_dev);
-	if (!dev)
-		return -ENODEV;
-	/* Matching device found, cleanup and unload drivers. */
-	ret = drivers_remove(dev, dev->classes_loaded);
-	if (!ret)
-		dev_release(dev);
-	return ret;
-}
-
-static int
-mlx5_pci_dma_map(struct rte_pci_device *pci_dev, void *addr,
-		 uint64_t iova, size_t len)
-{
-	struct mlx5_pci_driver *driver = NULL;
-	struct mlx5_pci_driver *temp;
-	struct mlx5_pci_device *dev;
-	int ret = -EINVAL;
-
-	dev = pci_to_mlx5_device(pci_dev);
-	if (!dev)
-		return -ENODEV;
-	TAILQ_FOREACH(driver, &drv_list, next) {
-		if (device_class_enabled(dev, driver->driver_class) &&
-		    driver->pci_driver.dma_map) {
-			ret = driver->pci_driver.dma_map(pci_dev, addr,
-							 iova, len);
-			if (ret)
-				goto map_err;
-		}
-	}
-	return ret;
-map_err:
-	TAILQ_FOREACH(temp, &drv_list, next) {
-		if (temp == driver)
-			break;
-		if (device_class_enabled(dev, temp->driver_class) &&
-		    temp->pci_driver.dma_map && temp->pci_driver.dma_unmap)
-			temp->pci_driver.dma_unmap(pci_dev, addr, iova, len);
-	}
-	return ret;
-}
-
-static int
-mlx5_pci_dma_unmap(struct rte_pci_device *pci_dev, void *addr,
-		   uint64_t iova, size_t len)
-{
-	struct mlx5_pci_driver *driver;
-	struct mlx5_pci_device *dev;
-	int local_ret = -EINVAL;
-	int ret;
-
-	dev = pci_to_mlx5_device(pci_dev);
-	if (!dev)
-		return -ENODEV;
-	ret = 0;
-	/* There is no unmap error recovery in current implementation. */
-	TAILQ_FOREACH_REVERSE(driver, &drv_list, mlx5_pci_bus_drv_head, next) {
-		if (device_class_enabled(dev, driver->driver_class) &&
-		    driver->pci_driver.dma_unmap) {
-			local_ret = driver->pci_driver.dma_unmap(pci_dev, addr,
-								 iova, len);
-			if (local_ret && (ret == 0))
-				ret = local_ret;
-		}
-	}
-	if (local_ret)
-		ret = local_ret;
-	return ret;
-}
-
 /* PCI ID table is build dynamically based on registered mlx5 drivers. */
 static struct rte_pci_id *mlx5_pci_id_table;
 
-static struct rte_pci_driver mlx5_pci_driver = {
-	.driver = {
-		.name = MLX5_PCI_DRIVER_NAME,
-	},
-	.probe = mlx5_pci_probe,
-	.remove = mlx5_pci_remove,
-	.dma_map = mlx5_pci_dma_map,
-	.dma_unmap = mlx5_pci_dma_unmap,
-};
-
 static int
 pci_id_table_size_get(const struct rte_pci_id *id_table)
 {
@@ -509,7 +95,6 @@ pci_ids_table_update(const struct rte_pci_id *driver_id_table)
 	}
 	/* Terminate table with empty entry. */
 	updated_table[i].vendor_id = 0;
-	mlx5_pci_driver.id_table = updated_table;
 	mlx5_common_pci_driver.id_table = updated_table;
 	mlx5_pci_id_table = updated_table;
 	if (old_table)
@@ -517,20 +102,6 @@ pci_ids_table_update(const struct rte_pci_id *driver_id_table)
 	return 0;
 }
 
-void
-mlx5_pci_driver_register(struct mlx5_pci_driver *driver)
-{
-	int ret;
-
-	ret = pci_ids_table_update(driver->pci_driver.id_table);
-	if (ret)
-		return;
-	mlx5_pci_driver.drv_flags |= driver->pci_driver.drv_flags;
-	TAILQ_INSERT_TAIL(&drv_list, driver, next);
-}
-
-/********** New common PCI bus driver ********/
-
 bool
 mlx5_dev_is_pci(const struct rte_device *dev)
 {
diff --git a/drivers/common/mlx5/mlx5_common_pci.h b/drivers/common/mlx5/mlx5_common_pci.h
deleted file mode 100644
index de89bb98bc..0000000000
--- a/drivers/common/mlx5/mlx5_common_pci.h
+++ /dev/null
@@ -1,77 +0,0 @@
-/* SPDX-License-Identifier: BSD-3-Clause
- * Copyright 2020 Mellanox Technologies, Ltd
- */
-
-#ifndef _MLX5_COMMON_PCI_H_
-#define _MLX5_COMMON_PCI_H_
-
-/**
- * @file
- *
- * RTE Mellanox PCI Driver Interface
- * Mellanox ConnectX PCI device supports multiple class: net,vdpa,regex and
- * compress devices. This layer enables creating such multiple class of devices
- * on a single PCI device by allowing to bind multiple class specific device
- * driver to attach to mlx5_pci driver.
- *
- * -----------    ------------    -------------    ----------------
- * |   mlx5  |    |   mlx5   |    |   mlx5    |    |     mlx5     |
- * | net pmd |    | vdpa pmd |    | regex pmd |    | compress pmd |
- * -----------    ------------    -------------    ----------------
- *      \              \                    /              /
- *       \              \                  /              /
- *        \              \_--------------_/              /
- *         \_______________|   mlx5     |_______________/
- *                         | pci common |
- *                         --------------
- *                               |
- *                           -----------
- *                           |   mlx5  |
- *                           | pci dev |
- *                           -----------
- *
- * - mlx5 pci driver binds to mlx5 PCI devices defined by PCI
- *   ID table of all related mlx5 PCI devices.
- * - mlx5 class driver such as net, vdpa, regex PMD defines its
- *   specific PCI ID table and mlx5 bus driver probes matching
- *   class drivers.
- * - mlx5 pci bus driver is cental place that validates supported
- *   class combinations.
- */
-
-#ifdef __cplusplus
-extern "C" {
-#endif /* __cplusplus */
-
-#include <rte_pci.h>
-#include <rte_bus_pci.h>
-
-#include <mlx5_common.h>
-
-void mlx5_common_pci_init(void);
-
-/**
- * A structure describing a mlx5 pci driver.
- */
-struct mlx5_pci_driver {
-	struct rte_pci_driver pci_driver;	/**< Inherit core pci driver. */
-	uint32_t driver_class;	/**< Class of this driver, enum mlx5_class */
-	TAILQ_ENTRY(mlx5_pci_driver) next;
-};
-
-/**
- * Register a mlx5_pci device driver.
- *
- * @param driver
- *   A pointer to a mlx5_pci_driver structure describing the driver
- *   to be registered.
- */
-__rte_internal
-void
-mlx5_pci_driver_register(struct mlx5_pci_driver *driver);
-
-#ifdef __cplusplus
-}
-#endif /* __cplusplus */
-
-#endif /* _MLX5_COMMON_PCI_H_ */
diff --git a/drivers/common/mlx5/mlx5_common_private.h b/drivers/common/mlx5/mlx5_common_private.h
index 72df9aef35..1beeaae50e 100644
--- a/drivers/common/mlx5/mlx5_common_private.h
+++ b/drivers/common/mlx5/mlx5_common_private.h
@@ -30,6 +30,7 @@ int mlx5_common_dev_dma_unmap(struct rte_device *dev, void *addr, uint64_t iova,
 
 /* Common PCI bus driver: */
 
+void mlx5_common_pci_init(void);
 void mlx5_common_driver_on_register_pci(struct mlx5_class_driver *driver);
 bool mlx5_dev_pci_match(const struct mlx5_class_driver *drv,
 			const struct rte_device *dev);
diff --git a/drivers/common/mlx5/version.map b/drivers/common/mlx5/version.map
index 3c21719975..b10e1c4646 100644
--- a/drivers/common/mlx5/version.map
+++ b/drivers/common/mlx5/version.map
@@ -129,17 +129,13 @@ INTERNAL {
 	mlx5_nl_vlan_vmwa_create; # WINDOWS_NO_EXPORT
 	mlx5_nl_vlan_vmwa_delete; # WINDOWS_NO_EXPORT
 
-	mlx5_pci_driver_register;
-
 	mlx5_os_alloc_pd;
 	mlx5_os_dealloc_pd;
 	mlx5_os_dereg_mr;
-	mlx5_os_get_ibv_device; # WINDOWS_NO_EXPORT
 	mlx5_os_reg_mr;
 	mlx5_os_umem_dereg;
 	mlx5_os_umem_reg;
 
 	mlx5_realloc;
-
 	mlx5_translate_port_name; # WINDOWS_NO_EXPORT
 };
-- 
2.25.1



* [dpdk-dev] [RFC 10/14] bus/auxiliary: introduce auxiliary bus
  2021-05-27 13:37 [dpdk-dev] [RFC 00/14] mlx5: support SubFunction Xueming Li
                   ` (8 preceding siblings ...)
  2021-05-27 13:37 ` [dpdk-dev] [RFC 09/14] common/mlx5: clean up legacy PCI bus driver Xueming Li
@ 2021-05-27 14:01 ` Xueming Li
  2021-05-27 14:01   ` [dpdk-dev] [RFC 11/14] common/mlx5: support " Xueming Li
                     ` (3 more replies)
  2021-06-10 10:33 ` [dpdk-dev] [RFC 00/14] mlx5: " Ferruh Yigit
  10 siblings, 4 replies; 42+ messages in thread
From: Xueming Li @ 2021-05-27 14:01 UTC (permalink / raw)
  To: Viacheslav Ovsiienko
  Cc: dev, xuemingl, Wang Haiyue, Thomas Monjalon, Parav Pandit,
	Ray Kinsella, Neil Horman

The auxiliary bus [1] provides a way to split a function into child
devices representing sub-domains of functionality. Each auxiliary_device
represents a part of its parent's functionality.

An auxiliary device is identified by a unique device name, which appears
in sysfs as:
  /sys/bus/auxiliary/devices/<name>

[1] kernel auxiliary bus document:
https://www.kernel.org/doc/html/latest/driver-api/auxiliary_bus.html
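
As a rough illustration of the name-based discovery described above, the
following standalone sketch walks a sysfs-like directory and keeps the
entries accepted by a driver-style match callback. It does not use the DPDK
API: example_match(), example_scan() and the "mlx5_core.sf." prefix are
hypothetical names chosen for this example only, and a real driver would
define its own matching rule.

```c
#include <assert.h>
#include <dirent.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical name prefix, assumed for illustration only. */
#define EXAMPLE_PREFIX "mlx5_core.sf."

/* Driver-style match callback: decide from the device name alone. */
static bool
example_match(const char *name)
{
	assert(name != NULL);
	return strncmp(name, EXAMPLE_PREFIX, strlen(EXAMPLE_PREFIX)) == 0;
}

/*
 * Walk a sysfs-like directory and count the entries accepted by match().
 * A missing directory is treated as "no devices" and yields 0.
 */
static int
example_scan(const char *path, bool (*match)(const char *name))
{
	struct dirent *e;
	DIR *dir = opendir(path);
	int found = 0;

	if (dir == NULL)
		return 0;
	while ((e = readdir(dir)) != NULL) {
		/* Skip ".", ".." and hidden entries. */
		if (e->d_name[0] == '.')
			continue;
		if (match(e->d_name)) {
			printf("candidate: %s/%s\n", path, e->d_name);
			found++;
		}
	}
	closedir(dir);
	return found;
}
```

Calling example_scan("/sys/bus/auxiliary/devices", example_match) on a
system without the auxiliary module loaded would simply report no
candidates, which is the same "absent bus is not an error" choice the scan
code in this patch makes.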

Signed-off-by: Xueming Li <xuemingl@nvidia.com>
Cc: Wang Haiyue <haiyue.wang@intel.com>
---
 MAINTAINERS                               |   5 +
 drivers/bus/auxiliary/auxiliary_common.c  | 408 ++++++++++++++++++++++
 drivers/bus/auxiliary/auxiliary_params.c  |  58 +++
 drivers/bus/auxiliary/linux/auxiliary.c   | 147 ++++++++
 drivers/bus/auxiliary/meson.build         |  11 +
 drivers/bus/auxiliary/private.h           | 120 +++++++
 drivers/bus/auxiliary/rte_bus_auxiliary.h | 199 +++++++++++
 drivers/bus/auxiliary/version.map         |  10 +
 drivers/bus/meson.build                   |   1 +
 9 files changed, 959 insertions(+)
 create mode 100644 drivers/bus/auxiliary/auxiliary_common.c
 create mode 100644 drivers/bus/auxiliary/auxiliary_params.c
 create mode 100644 drivers/bus/auxiliary/linux/auxiliary.c
 create mode 100644 drivers/bus/auxiliary/meson.build
 create mode 100644 drivers/bus/auxiliary/private.h
 create mode 100644 drivers/bus/auxiliary/rte_bus_auxiliary.h
 create mode 100644 drivers/bus/auxiliary/version.map

diff --git a/MAINTAINERS b/MAINTAINERS
index 5877a16971..eaf691ca6a 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -525,6 +525,11 @@ F: doc/guides/mempool/octeontx2.rst
 Bus Drivers
 -----------
 
+Auxiliary bus driver
+M: Parav Pandit <parav@nvidia.com>
+M: Xueming Li <xuemingl@nvidia.com>
+F: drivers/bus/auxiliary/
+
 Intel FPGA bus
 M: Rosen Xu <rosen.xu@intel.com>
 F: drivers/bus/ifpga/
diff --git a/drivers/bus/auxiliary/auxiliary_common.c b/drivers/bus/auxiliary/auxiliary_common.c
new file mode 100644
index 0000000000..cef85ae991
--- /dev/null
+++ b/drivers/bus/auxiliary/auxiliary_common.c
@@ -0,0 +1,408 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright 2021 Mellanox Technologies, Ltd
+ */
+
+#include <string.h>
+#include <inttypes.h>
+#include <stdint.h>
+#include <stdbool.h>
+#include <stdlib.h>
+#include <stdio.h>
+#include <sys/queue.h>
+#include <rte_errno.h>
+#include <rte_interrupts.h>
+#include <rte_log.h>
+#include <rte_bus.h>
+#include <rte_per_lcore.h>
+#include <rte_memory.h>
+#include <rte_eal.h>
+#include <rte_eal_paging.h>
+#include <rte_string_fns.h>
+#include <rte_common.h>
+#include <rte_devargs.h>
+
+#include "private.h"
+#include "rte_bus_auxiliary.h"
+
+
+int auxiliary_bus_logtype;
+
+static struct rte_devargs *
+auxiliary_devargs_lookup(const char *name)
+{
+	struct rte_devargs *devargs;
+
+	RTE_EAL_DEVARGS_FOREACH("auxiliary", devargs) {
+		if (strcmp(devargs->name, name) == 0)
+			return devargs;
+	}
+	return NULL;
+}
+
+/*
+ * Test whether the auxiliary device exists.
+ */
+__rte_weak bool
+auxiliary_dev_exists(const char *name)
+{
+	RTE_SET_USED(name);
+	return false;
+}
+
+/*
+ * Scan the content of the auxiliary bus and add the discovered devices
+ * to the device list.
+ */
+__rte_weak int
+auxiliary_scan(void)
+{
+	return 0;
+}
+
+void
+auxiliary_on_scan(struct rte_auxiliary_device *aux_dev)
+{
+	aux_dev->device.devargs = auxiliary_devargs_lookup(aux_dev->name);
+}
+
+/*
+ * Match the auxiliary Driver and Device using driver function.
+ */
+bool
+auxiliary_match(const struct rte_auxiliary_driver *aux_drv,
+		const struct rte_auxiliary_device *aux_dev)
+{
+	if (aux_drv->match == NULL)
+		return false;
+	return aux_drv->match(aux_dev->name);
+}
+
+/*
+ * Call the probe() function of the driver.
+ */
+static int
+rte_auxiliary_probe_one_driver(struct rte_auxiliary_driver *dr,
+			       struct rte_auxiliary_device *dev)
+{
+	enum rte_iova_mode iova_mode;
+	int ret;
+
+	if ((dr == NULL) || (dev == NULL))
+		return -EINVAL;
+
+	/* The device is not blocked; Check if driver supports it */
+	if (!auxiliary_match(dr, dev))
+		/* Match of device and driver failed */
+		return 1;
+
+	AUXILIARY_LOG(DEBUG, "Auxiliary device %s on NUMA socket %i\n",
+		      dev->name, dev->device.numa_node);
+
+	/* no initialization when marked as blocked, return without error */
+	if (dev->device.devargs != NULL &&
+	    dev->device.devargs->policy == RTE_DEV_BLOCKED) {
+		AUXILIARY_LOG(INFO, "  Device is blocked, not initializing\n");
+		return 1;
+	}
+
+	if (dev->device.numa_node < 0) {
+		AUXILIARY_LOG(WARNING, "  Invalid NUMA socket, default to 0\n");
+		dev->device.numa_node = 0;
+	}
+
+	AUXILIARY_LOG(DEBUG, "  Probe driver: %s\n", dr->driver.name);
+
+	iova_mode = rte_eal_iova_mode();
+	if ((dr->drv_flags & RTE_AUXILIARY_DRV_NEED_IOVA_AS_VA) &&
+	    iova_mode != RTE_IOVA_VA) {
+		AUXILIARY_LOG(ERR, "  Expecting VA IOVA mode but current mode is PA, not initializing\n");
+		return -EINVAL;
+	}
+
+	dev->driver = dr;
+
+	AUXILIARY_LOG(INFO, "Probe auxiliary driver: %s device: %s (socket %i)\n",
+		      dr->driver.name, dev->name, dev->device.numa_node);
+	ret = dr->probe(dr, dev);
+	if (ret)
+		dev->driver = NULL;
+	else
+		dev->device.driver = &dr->driver;
+
+	return ret;
+}
+
+/*
+ * Call the remove() function of the driver.
+ */
+static int
+rte_auxiliary_driver_remove_dev(struct rte_auxiliary_device *dev)
+{
+	struct rte_auxiliary_driver *dr;
+	int ret = 0;
+
+	if (dev == NULL)
+		return -EINVAL;
+
+	dr = dev->driver;
+
+	AUXILIARY_LOG(DEBUG, "Auxiliary device %s on NUMA socket %i\n",
+		      dev->name, dev->device.numa_node);
+
+	AUXILIARY_LOG(DEBUG, "  remove driver: %s %s\n",
+		      dev->name, dr->driver.name);
+
+	if (dr->remove) {
+		ret = dr->remove(dev);
+		if (ret < 0)
+			return ret;
+	}
+
+	/* clear driver structure */
+	dev->driver = NULL;
+	dev->device.driver = NULL;
+
+	return 0;
+}
+
+/*
+ * Call the probe() function of all registered drivers for the given device.
+ * Return < 0 if initialization failed.
+ * Return 1 if no driver is found for this device.
+ */
+static int
+auxiliary_probe_all_drivers(struct rte_auxiliary_device *dev)
+{
+	struct rte_auxiliary_driver *dr;
+	int rc;
+
+	if (dev == NULL)
+		return -EINVAL;
+
+	FOREACH_DRIVER_ON_AUXILIARYBUS(dr) {
+		if (!auxiliary_match(dr, dev))
+			continue;
+
+		rc = rte_auxiliary_probe_one_driver(dr, dev);
+		if (rc < 0)
+			/* negative value is an error */
+			return rc;
+		if (rc > 0)
+			/* positive value means driver doesn't support it */
+			continue;
+		return 0;
+	}
+	return 1;
+}
+
+/*
+ * Probe every device discovered on the auxiliary bus:
+ * call the probe() function of all registered drivers
+ * whose match() callback accepts the device.
+ * Return -1 only if all probed devices failed, 0 otherwise.
+ */
+static int
+auxiliary_probe(void)
+{
+	struct rte_auxiliary_device *dev = NULL;
+	size_t probed = 0, failed = 0;
+	int ret = 0;
+
+	FOREACH_DEVICE_ON_AUXILIARYBUS(dev) {
+		probed++;
+
+		ret = auxiliary_probe_all_drivers(dev);
+		if (ret < 0) {
+			if (ret != -EEXIST) {
+				AUXILIARY_LOG(ERR, "Requested device %s cannot be used\n",
+					      dev->name);
+				rte_errno = errno;
+				failed++;
+			}
+			ret = 0;
+		}
+	}
+
+	return (probed && probed == failed) ? -1 : 0;
+}
+
+static int
+auxiliary_parse(const char *name, void *addr)
+{
+	struct rte_auxiliary_driver *dr = NULL;
+	const char **out = addr;
+
+	FOREACH_DRIVER_ON_AUXILIARYBUS(dr) {
+		if (dr->match(name))
+			break;
+	}
+	if (dr != NULL && addr != NULL)
+		*out = name;
+	return dr != NULL ? 0 : -1;
+}
+
+/* register a driver */
+void
+rte_auxiliary_register(struct rte_auxiliary_driver *driver)
+{
+	TAILQ_INSERT_TAIL(&auxiliary_bus.driver_list, driver, next);
+	driver->bus = &auxiliary_bus;
+}
+
+/* unregister a driver */
+void
+rte_auxiliary_unregister(struct rte_auxiliary_driver *driver)
+{
+	TAILQ_REMOVE(&auxiliary_bus.driver_list, driver, next);
+	driver->bus = NULL;
+}
+
+/* Add a device to auxiliary bus */
+void
+auxiliary_add_device(struct rte_auxiliary_device *aux_dev)
+{
+	TAILQ_INSERT_TAIL(&auxiliary_bus.device_list, aux_dev, next);
+}
+
+/* Insert a device into a predefined position in auxiliary bus */
+void
+auxiliary_insert_device(struct rte_auxiliary_device *exist_aux_dev,
+			struct rte_auxiliary_device *new_aux_dev)
+{
+	TAILQ_INSERT_BEFORE(exist_aux_dev, new_aux_dev, next);
+}
+
+/* Remove a device from auxiliary bus */
+static void
+rte_auxiliary_remove_device(struct rte_auxiliary_device *auxiliary_dev)
+{
+	TAILQ_REMOVE(&auxiliary_bus.device_list, auxiliary_dev, next);
+}
+
+static struct rte_device *
+auxiliary_find_device(const struct rte_device *start, rte_dev_cmp_t cmp,
+		      const void *data)
+{
+	const struct rte_auxiliary_device *pstart;
+	struct rte_auxiliary_device *adev;
+
+	if (start != NULL) {
+		pstart = RTE_DEV_TO_AUXILIARY_CONST(start);
+		adev = TAILQ_NEXT(pstart, next);
+	} else {
+		adev = TAILQ_FIRST(&auxiliary_bus.device_list);
+	}
+	while (adev != NULL) {
+		if (cmp(&adev->device, data) == 0)
+			return &adev->device;
+		adev = TAILQ_NEXT(adev, next);
+	}
+	return NULL;
+}
+
+static int
+auxiliary_plug(struct rte_device *dev)
+{
+	if (!auxiliary_dev_exists(dev->name))
+		return -ENOENT;
+	return auxiliary_probe_all_drivers(RTE_DEV_TO_AUXILIARY(dev));
+}
+
+static int
+auxiliary_unplug(struct rte_device *dev)
+{
+	struct rte_auxiliary_device *adev;
+	int ret;
+
+	adev = RTE_DEV_TO_AUXILIARY(dev);
+	ret = rte_auxiliary_driver_remove_dev(adev);
+	if (ret == 0) {
+		rte_auxiliary_remove_device(adev);
+		rte_devargs_remove(dev->devargs);
+		free(adev);
+	}
+	return ret;
+}
+
+static int
+auxiliary_dma_map(struct rte_device *dev, void *addr, uint64_t iova, size_t len)
+{
+	struct rte_auxiliary_device *aux_dev = RTE_DEV_TO_AUXILIARY(dev);
+
+	if (dev == NULL || !aux_dev->driver) {
+		rte_errno = EINVAL;
+		return -1;
+	}
+	if (aux_dev->driver->dma_map)
+		return aux_dev->driver->dma_map(aux_dev, addr, iova, len);
+	rte_errno = ENOTSUP;
+	return -1;
+}
+
+static int
+auxiliary_dma_unmap(struct rte_device *dev, void *addr, uint64_t iova,
+		    size_t len)
+{
+	struct rte_auxiliary_device *aux_dev = RTE_DEV_TO_AUXILIARY(dev);
+
+	if (dev == NULL || !aux_dev->driver) {
+		rte_errno = EINVAL;
+		return -1;
+	}
+	if (aux_dev->driver->dma_unmap)
+		return aux_dev->driver->dma_unmap(aux_dev, addr, iova, len);
+	rte_errno = ENOTSUP;
+	return -1;
+}
+
+bool
+auxiliary_ignore_device(const char *name)
+{
+	struct rte_devargs *devargs = auxiliary_devargs_lookup(name);
+
+	switch (auxiliary_bus.bus.conf.scan_mode) {
+	case RTE_BUS_SCAN_ALLOWLIST:
+		if (devargs && devargs->policy == RTE_DEV_ALLOWED)
+			return false;
+		break;
+	case RTE_BUS_SCAN_UNDEFINED:
+	case RTE_BUS_SCAN_BLOCKLIST:
+		if (devargs == NULL || devargs->policy != RTE_DEV_BLOCKED)
+			return false;
+		break;
+	}
+	return true;
+}
+
+static enum rte_iova_mode
+auxiliary_get_iommu_class(void)
+{
+	const struct rte_auxiliary_driver *drv;
+
+	FOREACH_DRIVER_ON_AUXILIARYBUS(drv) {
+		if (drv->drv_flags & RTE_AUXILIARY_DRV_NEED_IOVA_AS_VA)
+			return RTE_IOVA_VA;
+	}
+
+	return RTE_IOVA_DC;
+}
+
+struct rte_auxiliary_bus auxiliary_bus = {
+	.bus = {
+		.scan = auxiliary_scan,
+		.probe = auxiliary_probe,
+		.find_device = auxiliary_find_device,
+		.plug = auxiliary_plug,
+		.unplug = auxiliary_unplug,
+		.parse = auxiliary_parse,
+		.dma_map = auxiliary_dma_map,
+		.dma_unmap = auxiliary_dma_unmap,
+		.get_iommu_class = auxiliary_get_iommu_class,
+		.dev_iterate = auxiliary_dev_iterate,
+	},
+	.device_list = TAILQ_HEAD_INITIALIZER(auxiliary_bus.device_list),
+	.driver_list = TAILQ_HEAD_INITIALIZER(auxiliary_bus.driver_list),
+};
+
+RTE_REGISTER_BUS(auxiliary, auxiliary_bus.bus);
+RTE_LOG_REGISTER(auxiliary_bus_logtype, bus.auxiliary, NOTICE);
diff --git a/drivers/bus/auxiliary/auxiliary_params.c b/drivers/bus/auxiliary/auxiliary_params.c
new file mode 100644
index 0000000000..5a1b029839
--- /dev/null
+++ b/drivers/bus/auxiliary/auxiliary_params.c
@@ -0,0 +1,58 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright 2021 Mellanox Technologies, Ltd
+ */
+
+#include <string.h>
+
+#include <rte_bus.h>
+#include <rte_dev.h>
+#include <rte_errno.h>
+#include <rte_kvargs.h>
+
+#include "private.h"
+#include "rte_bus_auxiliary.h"
+
+enum auxiliary_params {
+	RTE_AUXILIARY_PARAM_NAME,
+};
+
+static const char * const auxiliary_params_keys[] = {
+	[RTE_AUXILIARY_PARAM_NAME] = "name",
+};
+
+static int
+auxiliary_dev_match(const struct rte_device *dev,
+	      const void *_kvlist)
+{
+	const struct rte_kvargs *kvlist = _kvlist;
+	int ret;
+
+	ret = rte_kvargs_process(kvlist,
+			auxiliary_params_keys[RTE_AUXILIARY_PARAM_NAME],
+			rte_kvargs_strcmp, (void *)(uintptr_t)dev->name);
+
+	return ret != 0 ? -1 : 0;
+}
+
+void *
+auxiliary_dev_iterate(const void *start,
+		    const char *str,
+		    const struct rte_dev_iterator *it __rte_unused)
+{
+	rte_bus_find_device_t find_device;
+	struct rte_kvargs *kvargs = NULL;
+	struct rte_device *dev;
+
+	if (str != NULL) {
+		kvargs = rte_kvargs_parse(str, auxiliary_params_keys);
+		if (kvargs == NULL) {
+			RTE_LOG(ERR, EAL, "cannot parse argument list\n");
+			rte_errno = EINVAL;
+			return NULL;
+		}
+	}
+	find_device = auxiliary_bus.bus.find_device;
+	dev = find_device(start, auxiliary_dev_match, kvargs);
+	rte_kvargs_free(kvargs);
+	return dev;
+}
diff --git a/drivers/bus/auxiliary/linux/auxiliary.c b/drivers/bus/auxiliary/linux/auxiliary.c
new file mode 100644
index 0000000000..b75bb4d4a6
--- /dev/null
+++ b/drivers/bus/auxiliary/linux/auxiliary.c
@@ -0,0 +1,147 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright 2021 Mellanox Technologies, Ltd
+ */
+
+#include <string.h>
+#include <dirent.h>
+
+#include <rte_log.h>
+#include <rte_bus.h>
+#include <rte_malloc.h>
+#include <rte_devargs.h>
+#include <rte_memcpy.h>
+#include <eal_filesystem.h>
+
+#include "../rte_bus_auxiliary.h"
+#include "../private.h"
+
+#define AUXILIARY_SYSFS_PATH "/sys/bus/auxiliary/devices"
+
+/**
+ * @file
+ * Linux auxiliary probing.
+ */
+
+/* Scan one auxiliary sysfs entry, and fill the devices list from it. */
+static int
+auxiliary_scan_one(const char *dirname, const char *name)
+{
+	struct rte_auxiliary_device *dev;
+	struct rte_auxiliary_device *dev2;
+	char filename[PATH_MAX];
+	unsigned long tmp;
+	int ret;
+
+	dev = malloc(sizeof(*dev));
+	if (dev == NULL)
+		return -1;
+
+	memset(dev, 0, sizeof(*dev));
+	if (rte_strscpy(dev->name, name, sizeof(dev->name)) < 0) {
+		free(dev);
+		return -1;
+	}
+	dev->device.name = dev->name;
+	dev->device.bus = &auxiliary_bus.bus;
+
+	/* Get numa node, default to 0 if not present */
+	snprintf(filename, sizeof(filename), "%s/%s/numa_node",
+		 dirname, name);
+	if (access(filename, F_OK) != -1) {
+		if (eal_parse_sysfs_value(filename, &tmp) == 0)
+			dev->device.numa_node = tmp;
+		else
+			dev->device.numa_node = -1;
+	} else {
+		dev->device.numa_node = 0;
+	}
+
+	auxiliary_on_scan(dev);
+
+	/* Device is valid, add in list (sorted) */
+	TAILQ_FOREACH(dev2, &auxiliary_bus.device_list, next) {
+		ret = strcmp(dev->name, dev2->name);
+		if (ret > 0)
+			continue;
+		if (ret < 0) {
+			auxiliary_insert_device(dev2, dev);
+		} else { /* already registered */
+			if (rte_dev_is_probed(&dev2->device) &&
+			    dev2->device.devargs != dev->device.devargs) {
+				/* To probe device with new devargs. */
+				rte_devargs_remove(dev2->device.devargs);
+				auxiliary_on_scan(dev2);
+			}
+			free(dev);
+		}
+		return 0;
+	}
+	auxiliary_add_device(dev);
+	return 0;
+}
+
+/*
+ * Test whether the auxiliary device exists.
+ */
+bool
+auxiliary_dev_exists(const char *name)
+{
+	DIR *dir;
+	char dirname[PATH_MAX];
+
+	snprintf(dirname, sizeof(dirname), "%s/%s",
+		 AUXILIARY_SYSFS_PATH, name);
+	dir = opendir(dirname);
+	if (dir == NULL)
+		return false;
+	closedir(dir);
+	return true;
+}
+
+/*
+ * Scan the content of the auxiliary bus and add the discovered devices
+ * to the device list.
+ */
+int
+auxiliary_scan(void)
+{
+	struct dirent *e;
+	DIR *dir;
+	char dirname[PATH_MAX];
+	struct rte_auxiliary_driver *drv;
+
+	dir = opendir(AUXILIARY_SYSFS_PATH);
+	if (dir == NULL) {
+		AUXILIARY_LOG(INFO, "%s not found, is auxiliary module loaded?\n",
+			      AUXILIARY_SYSFS_PATH);
+		return 0;
+	}
+
+	while ((e = readdir(dir)) != NULL) {
+		if (e->d_name[0] == '.')
+			continue;
+
+		if (auxiliary_ignore_device(e->d_name))
+			continue;
+
+		snprintf(dirname, sizeof(dirname), "%s/%s",
+			 AUXILIARY_SYSFS_PATH, e->d_name);
+
+		/* Ignore if no driver can handle. */
+		FOREACH_DRIVER_ON_AUXILIARYBUS(drv) {
+			if (drv->match(e->d_name))
+				break;
+		}
+		if (drv == NULL)
+			continue;
+
+		if (auxiliary_scan_one(dirname, e->d_name) < 0)
+			goto error;
+	}
+	closedir(dir);
+	return 0;
+
+error:
+	closedir(dir);
+	return -1;
+}
diff --git a/drivers/bus/auxiliary/meson.build b/drivers/bus/auxiliary/meson.build
new file mode 100644
index 0000000000..f85608afd0
--- /dev/null
+++ b/drivers/bus/auxiliary/meson.build
@@ -0,0 +1,11 @@
+# SPDX-License-Identifier: BSD-3-Clause
+# Copyright 2021 Mellanox Technologies, Ltd
+
+headers = files('rte_bus_auxiliary.h')
+sources = files('auxiliary_common.c',
+	'auxiliary_params.c')
+if is_linux
+	sources += files('linux/auxiliary.c')
+endif
+deps += ['kvargs']
+
diff --git a/drivers/bus/auxiliary/private.h b/drivers/bus/auxiliary/private.h
new file mode 100644
index 0000000000..3529348900
--- /dev/null
+++ b/drivers/bus/auxiliary/private.h
@@ -0,0 +1,120 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright 2021 Mellanox Technologies, Ltd
+ */
+
+#ifndef _AUXILIARY_PRIVATE_H_
+#define _AUXILIARY_PRIVATE_H_
+
+#include <stdbool.h>
+#include <stdio.h>
+#include "rte_bus_auxiliary.h"
+
+extern struct rte_auxiliary_bus auxiliary_bus;
+extern int auxiliary_bus_logtype;
+
+#define AUXILIARY_LOG(level, fmt, args...) \
+	rte_log(RTE_LOG_ ## level, auxiliary_bus_logtype, "%s(): " fmt, \
+		__func__, ##args)
+
+/* Auxiliary bus iterators */
+#define FOREACH_DEVICE_ON_AUXILIARYBUS(p)	\
+		TAILQ_FOREACH(p, &(auxiliary_bus.device_list), next)
+
+#define FOREACH_DRIVER_ON_AUXILIARYBUS(p)	\
+		TAILQ_FOREACH(p, &(auxiliary_bus.driver_list), next)
+
+/**
+ * Test whether the auxiliary device exists.
+ *
+ * @param name
+ *  Auxiliary device name
+ * @return
+ *  true if the device exists, false otherwise
+ */
+bool auxiliary_dev_exists(const char *name);
+
+/**
+ * Scan the content of the auxiliary bus and add the discovered devices
+ * to the device list.
+ *
+ * @return
+ *  0 on success, negative on error
+ */
+int auxiliary_scan(void);
+
+/**
+ * Setup or update device when being scanned.
+ *
+ * @param aux_dev
+ *	AUXILIARY device.
+ */
+void auxiliary_on_scan(struct rte_auxiliary_device *aux_dev);
+
+/**
+ * Check whether the device with the given auxiliary name should be
+ * ignored or not.
+ *
+ * @param name
+ *	Auxiliary name of device to be validated
+ * @return
+ *	true: if device is to be ignored,
+ *	false: if device is to be scanned.
+ */
+bool auxiliary_ignore_device(const char *name);
+
+/**
+ * Add an auxiliary device to the auxiliary bus (append to the auxiliary
+ * device list). The caller must have set the bus reference of the
+ * embedded generic device object beforehand.
+ *
+ * @param aux_dev
+ *	AUXILIARY device to add
+ * @return void
+ */
+void auxiliary_add_device(struct rte_auxiliary_device *aux_dev);
+
+/**
+ * Insert an auxiliary device in the auxiliary bus at a particular location
+ * in the device list. The bus reference of the new device is expected to
+ * be set by the caller.
+ *
+ * @param exist_aux_dev
+ *	Existing auxiliary device in auxiliary bus
+ * @param new_aux_dev
+ *	AUXILIARY device to be added before exist_aux_dev
+ * @return void
+ */
+void auxiliary_insert_device(struct rte_auxiliary_device *exist_aux_dev,
+			     struct rte_auxiliary_device *new_aux_dev);
+
+/**
+ * Match the auxiliary Driver and Device by driver function
+ *
+ * @param aux_drv
+ *      auxiliary driver
+ * @param aux_dev
+ *      auxiliary device to match against the driver
+ * @return
+ *      true if the driver can handle the device, false otherwise
+ */
+bool auxiliary_match(const struct rte_auxiliary_driver *aux_drv,
+		     const struct rte_auxiliary_device *aux_dev);
+
+/**
+ * Iterate over internal devices, matching any device against the provided
+ * string.
+ *
+ * @param start
+ *   Iteration starting point.
+ * @param str
+ *   Device string to match against.
+ * @param it
+ *   (unused) iterator structure.
+ * @return
+ *   A pointer to the next matching device if any.
+ *   NULL otherwise.
+ */
+void *auxiliary_dev_iterate(const void *start, const char *str,
+			    const struct rte_dev_iterator *it);
+
+#endif /* _AUXILIARY_PRIVATE_H_ */
diff --git a/drivers/bus/auxiliary/rte_bus_auxiliary.h b/drivers/bus/auxiliary/rte_bus_auxiliary.h
new file mode 100644
index 0000000000..d681464602
--- /dev/null
+++ b/drivers/bus/auxiliary/rte_bus_auxiliary.h
@@ -0,0 +1,199 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright 2021 Mellanox Technologies, Ltd
+ */
+
+#ifndef _RTE_BUS_AUXILIARY_H_
+#define _RTE_BUS_AUXILIARY_H_
+
+/**
+ * @file
+ *
+ * RTE Auxiliary Bus Interface.
+ */
+
+#ifdef __cplusplus
+extern "C" {
+#endif
+
+#include <stdio.h>
+#include <stdlib.h>
+#include <limits.h>
+#include <errno.h>
+#include <sys/queue.h>
+#include <stdint.h>
+#include <inttypes.h>
+
+#include <rte_debug.h>
+#include <rte_interrupts.h>
+#include <rte_dev.h>
+#include <rte_bus.h>
+#include <rte_kvargs.h>
+
+/* Forward declarations */
+struct rte_auxiliary_driver;
+struct rte_auxiliary_bus;
+struct rte_auxiliary_device;
+
+/**
+ * Match function for the driver to decide if device can be handled.
+ *
+ * @param name
+ *   Pointer to the auxiliary device name.
+ * @return
+ *   Whether the driver can handle the auxiliary device.
+ */
+typedef bool(*rte_auxiliary_match_t) (const char *name);
+
+/**
+ * Initialization function for the driver called during auxiliary probing.
+ *
+ * @param drv
+ *   Pointer to the auxiliary driver.
+ * @param dev
+ *   Pointer to the auxiliary device.
+ * @return
+ *   - 0 On success.
+ *   - Negative value and rte_errno is set otherwise.
+ */
+typedef int(*rte_auxiliary_probe_t) (struct rte_auxiliary_driver *drv,
+				     struct rte_auxiliary_device *dev);
+
+/**
+ * Uninitialization function for the driver called during hotplugging.
+ *
+ * @param dev
+ *   Pointer to the auxiliary device.
+ * @return
+ *   - 0 On success.
+ *   - Negative value and rte_errno is set otherwise.
+ */
+typedef int (*rte_auxiliary_remove_t)(struct rte_auxiliary_device *dev);
+
+/**
+ * Driver-specific DMA mapping. After a successful call the device
+ * will be able to read/write from/to this segment.
+ *
+ * @param dev
+ *   Pointer to the auxiliary device.
+ * @param addr
+ *   Starting virtual address of memory to be mapped.
+ * @param iova
+ *   Starting IOVA address of memory to be mapped.
+ * @param len
+ *   Length of memory segment being mapped.
+ * @return
+ *   - 0 On success.
+ *   - Negative value and rte_errno is set otherwise.
+ */
+typedef int (*rte_auxiliary_dma_map_t)(struct rte_auxiliary_device *dev,
+				       void *addr, uint64_t iova, size_t len);
+
+/**
+ * Driver-specific DMA un-mapping. After a successful call the device
+ * will not be able to read/write from/to this segment.
+ *
+ * @param dev
+ *   Pointer to the auxiliary device.
+ * @param addr
+ *   Starting virtual address of memory to be unmapped.
+ * @param iova
+ *   Starting IOVA address of memory to be unmapped.
+ * @param len
+ *   Length of memory segment being unmapped.
+ * @return
+ *   - 0 On success.
+ *   - Negative value and rte_errno is set otherwise.
+ */
+typedef int (*rte_auxiliary_dma_unmap_t)(struct rte_auxiliary_device *dev,
+					 void *addr, uint64_t iova, size_t len);
+
+/**
+ * A structure describing an auxiliary device.
+ */
+struct rte_auxiliary_device {
+	TAILQ_ENTRY(rte_auxiliary_device) next;   /**< Next probed device. */
+	char name[RTE_DEV_NAME_MAX_LEN + 1];      /**< ASCII device name */
+	struct rte_device device;                 /**< Inherit core device */
+	struct rte_intr_handle intr_handle;       /**< Interrupt handle */
+	struct rte_auxiliary_driver *driver;      /**< driver used in probing */
+};
+
+/** List of auxiliary devices */
+TAILQ_HEAD(rte_auxiliary_device_list, rte_auxiliary_device);
+/** List of auxiliary drivers */
+TAILQ_HEAD(rte_auxiliary_driver_list, rte_auxiliary_driver);
+
+/**
+ * Structure describing the auxiliary bus
+ */
+struct rte_auxiliary_bus {
+	struct rte_bus bus;                  /**< Inherit the generic class */
+	struct rte_auxiliary_device_list device_list;  /**< List of devices */
+	struct rte_auxiliary_driver_list driver_list;  /**< List of drivers */
+};
+
+/**
+ * A structure describing an auxiliary driver.
+ */
+struct rte_auxiliary_driver {
+	TAILQ_ENTRY(rte_auxiliary_driver) next; /**< Next in list. */
+	struct rte_driver driver;            /**< Inherit core driver. */
+	struct rte_auxiliary_bus *bus;       /**< Auxiliary bus reference. */
+	rte_auxiliary_match_t match;         /**< Device match function. */
+	rte_auxiliary_probe_t probe;         /**< Device Probe function. */
+	rte_auxiliary_remove_t remove;       /**< Device Remove function. */
+	rte_auxiliary_dma_map_t dma_map;     /**< Device dma map function. */
+	rte_auxiliary_dma_unmap_t dma_unmap; /**< Device dma unmap function. */
+	uint32_t drv_flags;                  /**< Flags RTE_AUXILIARY_DRV_*. */
+};
+
+/**
+ * @internal
+ * Helper macro for drivers that need to convert to struct rte_auxiliary_device.
+ */
+#define RTE_DEV_TO_AUXILIARY(ptr) \
+	container_of(ptr, struct rte_auxiliary_device, device)
+
+#define RTE_DEV_TO_AUXILIARY_CONST(ptr) \
+	container_of(ptr, const struct rte_auxiliary_device, device)
+
+#define RTE_ETH_DEV_TO_AUXILIARY(eth_dev) \
+	RTE_DEV_TO_AUXILIARY((eth_dev)->device)
+
+/** Device driver needs IOVA as VA and cannot work with IOVA as PA */
+#define RTE_AUXILIARY_DRV_NEED_IOVA_AS_VA 0x002
+
+/**
+ * Register an auxiliary driver.
+ *
+ * @param driver
+ *   A pointer to a rte_auxiliary_driver structure describing the driver
+ *   to be registered.
+ */
+__rte_experimental
+void rte_auxiliary_register(struct rte_auxiliary_driver *driver);
+
+/** Helper for auxiliary device registration from driver instance */
+#define RTE_PMD_REGISTER_AUXILIARY(nm, auxiliary_drv)		\
+	RTE_INIT(auxiliaryinitfn_##nm)				\
+	{							\
+		(auxiliary_drv).driver.name = RTE_STR(nm);	\
+		rte_auxiliary_register(&auxiliary_drv);		\
+	}							\
+	RTE_PMD_EXPORT_NAME(nm, __COUNTER__)
+
+/**
+ * Unregister an auxiliary driver.
+ *
+ * @param driver
+ *   A pointer to a rte_auxiliary_driver structure describing the driver
+ *   to be unregistered.
+ */
+__rte_experimental
+void rte_auxiliary_unregister(struct rte_auxiliary_driver *driver);
+
+#ifdef __cplusplus
+}
+#endif
+
+#endif /* _RTE_BUS_AUXILIARY_H_ */
diff --git a/drivers/bus/auxiliary/version.map b/drivers/bus/auxiliary/version.map
new file mode 100644
index 0000000000..3d270baea7
--- /dev/null
+++ b/drivers/bus/auxiliary/version.map
@@ -0,0 +1,10 @@
+DPDK_21 {
+	local: *;
+};
+
+EXPERIMENTAL {
+	global:
+
+	rte_auxiliary_register;
+	rte_auxiliary_unregister;
+};
diff --git a/drivers/bus/meson.build b/drivers/bus/meson.build
index 410058de3a..45eab5233d 100644
--- a/drivers/bus/meson.build
+++ b/drivers/bus/meson.build
@@ -2,6 +2,7 @@
 # Copyright(c) 2017 Intel Corporation
 
 drivers = [
+        'auxiliary',
         'dpaa',
         'fslmc',
         'ifpga',
-- 
2.25.1


^ permalink raw reply	[flat|nested] 42+ messages in thread

* [dpdk-dev] [RFC 11/14] common/mlx5: support auxiliary bus
  2021-05-27 14:01 ` [dpdk-dev] [RFC 10/14] bus/auxiliary: introduce auxiliary bus Xueming Li
@ 2021-05-27 14:01   ` Xueming Li
  2021-05-27 14:02   ` [dpdk-dev] [RFC 12/14] common/mlx5: get PCI device address from any bus Xueming Li
                     ` (2 subsequent siblings)
  3 siblings, 0 replies; 42+ messages in thread
From: Xueming Li @ 2021-05-27 14:01 UTC (permalink / raw)
  To: Viacheslav Ovsiienko
  Cc: dev, xuemingl, Matan Azrad, Shahaf Shuler, Ray Kinsella, Neil Horman

This patch adds an auxiliary bus driver that delegates to the
registered internal mlx5 common device drivers, e.g. eth, vdpa.

The current major target is to support SubFunction on the auxiliary bus.

As a limitation of the current driver, the NUMA node of a device is
detected from the PCI device that its sysfs symbolic link resolves to.
This will be removed once a NUMA node file is available in sysfs for
auxiliary devices.

Signed-off-by: Xueming Li <xuemingl@nvidia.com>
---
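A minimal standalone sketch of the NUMA lookup path described above,
assuming the resolved sysfs path of an auxiliary device ends with
"<PCI address>/<auxiliary device name>". The helper name
aux_numa_node_path() is hypothetical and not part of this patch; it only
illustrates the string handling with explicit bounds checks.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper, not part of the patch: build the path of the
 * parent PCI device "numa_node" file from the resolved sysfs path of
 * an auxiliary device. Returns 0 on success, a negative errno otherwise.
 */
int
aux_numa_node_path(const char *real_path, char *out, size_t size)
{
	const char *last_slash;
	int len;

	assert(real_path != NULL && out != NULL);
	last_slash = strrchr(real_path, '/');
	if (last_slash == NULL || last_slash == real_path)
		return -EINVAL;
	/* Keep everything before the last component, then append the file. */
	len = snprintf(out, size, "%.*s/numa_node",
		       (int)(last_slash - real_path), real_path);
	if (len < 0 || (size_t)len >= size)
		return -ENAMETOOLONG;
	return 0;
}
```

In this sketch the input would be the output of realpath() on
/sys/bus/auxiliary/devices/<name>; a truncated result is reported as an
error instead of being passed to the sysfs reader.
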
 drivers/common/mlx5/linux/meson.build         |   3 +
 .../common/mlx5/linux/mlx5_common_auxiliary.c | 173 ++++++++++++++++++
 drivers/common/mlx5/linux/mlx5_common_verbs.c |   5 +-
 drivers/common/mlx5/mlx5_common.c             |   3 +
 drivers/common/mlx5/mlx5_common.h             |   5 +
 drivers/common/mlx5/mlx5_common_private.h     |   5 +
 drivers/common/mlx5/version.map               |   2 +
 7 files changed, 195 insertions(+), 1 deletion(-)
 create mode 100644 drivers/common/mlx5/linux/mlx5_common_auxiliary.c

diff --git a/drivers/common/mlx5/linux/meson.build b/drivers/common/mlx5/linux/meson.build
index 007834a49b..a1070acb77 100644
--- a/drivers/common/mlx5/linux/meson.build
+++ b/drivers/common/mlx5/linux/meson.build
@@ -48,10 +48,13 @@ endif
 sources += files('mlx5_nl.c')
 sources += files('mlx5_common_os.c')
 sources += files('mlx5_common_verbs.c')
+sources += files('mlx5_common_auxiliary.c')
 if not dlopen_ibverbs
     sources += files('mlx5_glue.c')
 endif
 
+deps += ['bus_auxiliary']
+
 # To maintain the compatibility with the make build system
 # mlx5_autoconf.h file is still generated.
 # input array for meson member search:
diff --git a/drivers/common/mlx5/linux/mlx5_common_auxiliary.c b/drivers/common/mlx5/linux/mlx5_common_auxiliary.c
new file mode 100644
index 0000000000..f16fd2ee37
--- /dev/null
+++ b/drivers/common/mlx5/linux/mlx5_common_auxiliary.c
@@ -0,0 +1,173 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright 2020 Mellanox Technologies Ltd
+ */
+
+#include <stdlib.h>
+#include <dirent.h>
+#include <rte_malloc.h>
+#include <rte_errno.h>
+#include <rte_bus_auxiliary.h>
+#include <rte_common.h>
+#include "eal_filesystem.h"
+
+#include "mlx5_common_utils.h"
+#include "mlx5_common_private.h"
+
+#define AUXILIARY_SYSFS_PATH "/sys/bus/auxiliary/devices"
+#define MLX5_AUXILIARY_PREFIX "mlx5_core.sf."
+
+int
+mlx5_auxiliary_get_child_name(const char *dev, const char *node,
+			      char *child, size_t size)
+{
+	DIR *dir;
+	struct dirent *dent;
+	int ret = 0;
+	MKSTR(path, "%s/%s%s", AUXILIARY_SYSFS_PATH, dev, node);
+
+	dir = opendir(path);
+	if (dir == NULL) {
+		rte_errno = errno;
+		return -rte_errno;
+	}
+	/* Get the first file name. */
+	while ((dent = readdir(dir)) != NULL) {
+		if (dent->d_name[0] != '.')
+			break;
+	}
+	if (dent == NULL)
+		rte_errno = ENOENT;
+	/* dent points into dir: copy the name before closedir(). */
+	if (dent == NULL || rte_strscpy(child, dent->d_name, size) < 0)
+		ret = -rte_errno;
+	closedir(dir);
+	return ret;
+}
+
+static int
+mlx5_auxiliary_get_pci_path(const struct rte_auxiliary_device *dev,
+			    char *sysfs_pci, size_t size)
+{
+	char sysfs_real[PATH_MAX];
+	char *last_slash;
+	MKSTR(sysfs_aux, "%s/%s", AUXILIARY_SYSFS_PATH, dev->name);
+
+	if (realpath(sysfs_aux, sysfs_real) == NULL) {
+		rte_errno = errno;
+		return -rte_errno;
+	}
+	last_slash = strrchr(sysfs_real, '/');
+	if (last_slash == NULL) {
+		rte_errno = EINVAL;
+		return -rte_errno;
+	}
+	*last_slash = '\0';
+	if (rte_strscpy(sysfs_pci, sysfs_real, size) < 0)
+		return -rte_errno;
+	return 0;
+}
+
+static int
+mlx5_auxiliary_get_numa(const struct rte_auxiliary_device *dev)
+{
+	unsigned long numa;
+	char numa_path[PATH_MAX];
+
+	if (mlx5_auxiliary_get_pci_path(dev, numa_path, sizeof(numa_path)) != 0)
+		return SOCKET_ID_ANY;
+	if (rte_strlcat(numa_path, "/numa_node", PATH_MAX) >= PATH_MAX) {
+		rte_errno = ENAMETOOLONG;
+		return SOCKET_ID_ANY;
+	}
+	if (eal_parse_sysfs_value(numa_path, &numa) != 0) {
+		rte_errno = EINVAL;
+		return SOCKET_ID_ANY;
+	}
+	return (int)numa;
+}
+
+struct ibv_device *
+mlx5_get_aux_ibv_device(const struct rte_auxiliary_device *dev)
+{
+	int n;
+	char ib_name[64];
+	struct ibv_device **ibv_list = mlx5_glue->get_device_list(&n);
+	struct ibv_device *ibv_match = NULL;
+
+	if (!ibv_list) {
+		rte_errno = ENOSYS;
+		return NULL;
+	}
+	if (mlx5_auxiliary_get_child_name(dev->name, "/infiniband",
+					  ib_name, sizeof(ib_name)) != 0)
+		goto out;
+	while (ibv_match == NULL && n-- > 0) {
+		if (strcmp(ibv_list[n]->name, ib_name) == 0)
+			ibv_match = ibv_list[n];
+	}
+	if (ibv_match == NULL)
+		rte_errno = ENOENT;
+out:
+	/* Free the list on every path, including the lookup failure above. */
+	mlx5_glue->free_device_list(ibv_list);
+	return ibv_match;
+}
+
+static bool
+mlx5_common_auxiliary_match(const char *name)
+{
+	return strncmp(name, MLX5_AUXILIARY_PREFIX,
+		       strlen(MLX5_AUXILIARY_PREFIX)) == 0;
+}
+
+static int
+mlx5_common_auxiliary_probe(struct rte_auxiliary_driver *drv __rte_unused,
+			    struct rte_auxiliary_device *dev)
+{
+	dev->device.numa_node = mlx5_auxiliary_get_numa(dev);
+	return mlx5_common_dev_probe(&dev->device);
+}
+
+static int
+mlx5_common_auxiliary_remove(struct rte_auxiliary_device *auxiliary_dev)
+{
+	return mlx5_common_dev_remove(&auxiliary_dev->device);
+}
+
+static int
+mlx5_common_auxiliary_dma_map(struct rte_auxiliary_device *auxiliary_dev,
+			      void *addr, uint64_t iova, size_t len)
+{
+	return mlx5_common_dev_dma_map(&auxiliary_dev->device, addr, iova, len);
+}
+
+static int
+mlx5_common_auxiliary_dma_unmap(struct rte_auxiliary_device *auxiliary_dev,
+				void *addr, uint64_t iova, size_t len)
+{
+	return mlx5_common_dev_dma_unmap(&auxiliary_dev->device, addr, iova,
+					 len);
+}
+
+static struct rte_auxiliary_driver mlx5_auxiliary_driver = {
+	.driver = {
+		   .name = MLX5_AUXILIARY_DRIVER_NAME,
+	},
+	.match = mlx5_common_auxiliary_match,
+	.probe = mlx5_common_auxiliary_probe,
+	.remove = mlx5_common_auxiliary_remove,
+	.dma_map = mlx5_common_auxiliary_dma_map,
+	.dma_unmap = mlx5_common_auxiliary_dma_unmap,
+};
+
+void mlx5_common_auxiliary_init(void)
+{
+	if (mlx5_auxiliary_driver.bus == NULL)
+		rte_auxiliary_register(&mlx5_auxiliary_driver);
+}
+
+RTE_FINI(mlx5_common_auxiliary_driver_finish)
+{
+	if (mlx5_auxiliary_driver.bus != NULL)
+		rte_auxiliary_unregister(&mlx5_auxiliary_driver);
+}
diff --git a/drivers/common/mlx5/linux/mlx5_common_verbs.c b/drivers/common/mlx5/linux/mlx5_common_verbs.c
index a49440ef72..856e782878 100644
--- a/drivers/common/mlx5/linux/mlx5_common_verbs.c
+++ b/drivers/common/mlx5/linux/mlx5_common_verbs.c
@@ -12,6 +12,7 @@
 
 #include <rte_errno.h>
 #include <rte_bus_pci.h>
+#include <rte_bus_auxiliary.h>
 
 #include "mlx5_common_log.h"
 #include "mlx5_common_utils.h"
@@ -24,10 +25,12 @@
 struct ibv_device *
 mlx5_get_ibv_device(const struct rte_device *dev)
 {
-	struct ibv_device *ibv = NULL;
+	struct ibv_device *ibv;
 
 	if (mlx5_dev_is_pci(dev))
 		ibv = mlx5_get_pci_ibv_device(&RTE_DEV_TO_PCI_CONST(dev)->addr);
+	else
+		ibv = mlx5_get_aux_ibv_device(RTE_DEV_TO_AUXILIARY_CONST(dev));
 	if (ibv == NULL) {
 		rte_errno = ENODEV;
 		DRV_LOG(ERR, "Verbs device not found: %s", dev->name);
diff --git a/drivers/common/mlx5/mlx5_common.c b/drivers/common/mlx5/mlx5_common.c
index 875668d72b..b7be713cbe 100644
--- a/drivers/common/mlx5/mlx5_common.c
+++ b/drivers/common/mlx5/mlx5_common.c
@@ -356,6 +356,9 @@ mlx5_class_driver_register(struct mlx5_class_driver *driver)
 static void mlx5_common_driver_init(void)
 {
 	mlx5_common_pci_init();
+#ifdef RTE_EXEC_ENV_LINUX
+	mlx5_common_auxiliary_init();
+#endif
 }
 
 static bool mlx5_common_initialized;
diff --git a/drivers/common/mlx5/mlx5_common.h b/drivers/common/mlx5/mlx5_common.h
index 62a0dc4bad..c9c77ce540 100644
--- a/drivers/common/mlx5/mlx5_common.h
+++ b/drivers/common/mlx5/mlx5_common.h
@@ -23,6 +23,7 @@
 
 /* Reported driver name. */
 #define MLX5_PCI_DRIVER_NAME "mlx5_pci"
+#define MLX5_AUXILIARY_DRIVER_NAME "mlx5_auxiliary"
 
 /* Bit-field manipulation. */
 #define BITFIELD_DECLARE(bf, type, size) \
@@ -140,6 +141,10 @@ struct ibv_device;
 __rte_internal
 struct ibv_device *mlx5_get_ibv_device(const struct rte_device *dev);
 
+__rte_internal
+int mlx5_auxiliary_get_child_name(const char *dev, const char *node,
+				  char *child, size_t size);
+
 /* Maximum number of simultaneous unicast MAC addresses. */
 #define MLX5_MAX_UC_MAC_ADDRESSES 128
 /* Maximum number of simultaneous Multicast MAC addresses. */
diff --git a/drivers/common/mlx5/mlx5_common_private.h b/drivers/common/mlx5/mlx5_common_private.h
index 1beeaae50e..d1ab15ac43 100644
--- a/drivers/common/mlx5/mlx5_common_private.h
+++ b/drivers/common/mlx5/mlx5_common_private.h
@@ -6,6 +6,7 @@
 #define _MLX5_COMMON_PRIVATE_H_
 
 #include <rte_pci.h>
+#include <rte_bus_auxiliary.h>
 
 #include "mlx5_common.h"
 
@@ -36,6 +37,10 @@ bool mlx5_dev_pci_match(const struct mlx5_class_driver *drv,
 			const struct rte_device *dev);
 struct ibv_device *mlx5_get_pci_ibv_device(const struct rte_pci_addr *);
 
+/* Common auxiliary bus driver: */
+void mlx5_common_auxiliary_init(void);
+struct ibv_device *mlx5_get_aux_ibv_device(const struct rte_auxiliary_device *);
+
 #ifdef __cplusplus
 }
 #endif /* __cplusplus */
diff --git a/drivers/common/mlx5/version.map b/drivers/common/mlx5/version.map
index b10e1c4646..33eb7f09bc 100644
--- a/drivers/common/mlx5/version.map
+++ b/drivers/common/mlx5/version.map
@@ -3,6 +3,8 @@ INTERNAL {
 
 	haswell_broadwell_cpu;
 
+	mlx5_auxiliary_get_child_name;
+
 	mlx5_class_driver_register;
 
 	mlx5_common_init;
-- 
2.25.1


^ permalink raw reply	[flat|nested] 42+ messages in thread

* [dpdk-dev] [RFC 12/14] common/mlx5: get PCI device address from any bus
  2021-05-27 14:01 ` [dpdk-dev] [RFC 10/14] bus/auxiliary: introduce auxiliary bus Xueming Li
  2021-05-27 14:01   ` [dpdk-dev] [RFC 11/14] common/mlx5: support " Xueming Li
@ 2021-05-27 14:02   ` Xueming Li
  2021-05-27 14:02   ` [dpdk-dev] [RFC 13/14] vdpa/mlx5: support SubFunction Xueming Li
  2021-05-27 14:02   ` [dpdk-dev] [RFC 14/14] net/mlx5: " Xueming Li
  3 siblings, 0 replies; 42+ messages in thread
From: Xueming Li @ 2021-05-27 14:02 UTC (permalink / raw)
  To: Viacheslav Ovsiienko
  Cc: Thomas Monjalon, dev, xuemingl, Matan Azrad, Shahaf Shuler,
	Ray Kinsella, Neil Horman

From: Thomas Monjalon <thomas@monjalon.net>

A function is exported to allow retrieving the PCI address
of the parent PCI device of a Sub-Function in auxiliary bus sysfs.
The function mlx5_dev_to_pci_str() accepts both PCI and auxiliary
devices. In case of a PCI device, it simply uses the device name.

The function mlx5_dev_to_pci_addr(), which is based on a sysfs path
and does not use any device object, is renamed to mlx5_get_pci_addr()
for clarity.

Signed-off-by: Thomas Monjalon <thomas@monjalon.net>
---
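A minimal standalone sketch of the parent address lookup described
above, assuming the resolved sysfs path of a Sub-Function has the form
".../<PCI address>/<auxiliary device name>". The helper name
aux_parent_pci_str() is hypothetical and is not the function added by
this patch; it only shows which path component carries the PCI address.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper, not part of the patch: copy the name of the
 * directory that contains the last path component, e.g. the PCI address
 * in ".../0000:08:00.0/mlx5_core.sf.8".
 * Returns 0 on success, a negative errno otherwise.
 */
int
aux_parent_pci_str(const char *real_path, char *addr, size_t size)
{
	const char *end;   /* Slash before the auxiliary device name. */
	const char *start; /* Slash before the parent directory name. */
	size_t len;

	assert(real_path != NULL && addr != NULL);
	end = strrchr(real_path, '/');
	if (end == NULL || end == real_path)
		return -EINVAL;
	start = end - 1;
	while (start > real_path && *start != '/')
		start--;
	if (*start != '/')
		return -EINVAL;
	len = (size_t)(end - start - 1);
	if (len == 0)
		return -EINVAL;
	if (len >= size)
		return -ENAMETOOLONG;
	memcpy(addr, start + 1, len);
	addr[len] = '\0';
	return 0;
}
```

Under the stated path layout the parent component is the PCI address of
the owning function, which is the string the RoCE disable helpers in the
vDPA driver expect.
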
 drivers/common/mlx5/linux/mlx5_common_auxiliary.c | 15 +++++++++++++++
 drivers/common/mlx5/linux/mlx5_common_os.c        |  3 +--
 drivers/common/mlx5/mlx5_common.c                 | 14 ++++++++++++++
 drivers/common/mlx5/mlx5_common.h                 | 12 +++++++++++-
 drivers/common/mlx5/mlx5_common_private.h         |  2 ++
 drivers/common/mlx5/version.map                   |  3 ++-
 drivers/net/mlx5/linux/mlx5_os.c                  |  5 ++---
 7 files changed, 47 insertions(+), 7 deletions(-)

diff --git a/drivers/common/mlx5/linux/mlx5_common_auxiliary.c b/drivers/common/mlx5/linux/mlx5_common_auxiliary.c
index f16fd2ee37..f97c5e7350 100644
--- a/drivers/common/mlx5/linux/mlx5_common_auxiliary.c
+++ b/drivers/common/mlx5/linux/mlx5_common_auxiliary.c
@@ -4,6 +4,8 @@
 
 #include <stdlib.h>
 #include <dirent.h>
+#include <libgen.h>
+
 #include <rte_malloc.h>
 #include <rte_errno.h>
 #include <rte_bus_auxiliary.h>
@@ -67,6 +69,19 @@ mlx5_auxiliary_get_pci_path(const struct rte_auxiliary_device *dev,
 	return 0;
 }
 
+int
+mlx5_auxiliary_get_pci_str(const struct rte_auxiliary_device *dev,
+			   char *addr, size_t size)
+{
+	char sysfs_pci[PATH_MAX];
+
+	if (mlx5_auxiliary_get_pci_path(dev, sysfs_pci, sizeof(sysfs_pci)) != 0)
+		return -ENODEV;
+	if (rte_strscpy(addr, basename(sysfs_pci), size) < 0)
+		return -rte_errno;
+	return 0;
+}
+
 static int
 mlx5_auxiliary_get_numa(const struct rte_auxiliary_device *dev)
 {
diff --git a/drivers/common/mlx5/linux/mlx5_common_os.c b/drivers/common/mlx5/linux/mlx5_common_os.c
index cd1c305cc1..464a897072 100644
--- a/drivers/common/mlx5/linux/mlx5_common_os.c
+++ b/drivers/common/mlx5/linux/mlx5_common_os.c
@@ -24,8 +24,7 @@ const struct mlx5_glue *mlx5_glue;
 #endif
 
 int
-mlx5_dev_to_pci_addr(const char *dev_path,
-		     struct rte_pci_addr *pci_addr)
+mlx5_get_pci_addr(const char *dev_path, struct rte_pci_addr *pci_addr)
 {
 	FILE *file;
 	char line[32];
diff --git a/drivers/common/mlx5/mlx5_common.c b/drivers/common/mlx5/mlx5_common.c
index b7be713cbe..c3fdb7d23f 100644
--- a/drivers/common/mlx5/mlx5_common.c
+++ b/drivers/common/mlx5/mlx5_common.c
@@ -158,6 +158,20 @@ to_mlx5_device(const struct rte_device *rte_dev)
 	return NULL;
 }
 
+int
+mlx5_dev_to_pci_str(const struct rte_device *dev, char *addr, size_t size)
+{
+	if (mlx5_dev_is_pci(dev))
+		return rte_strscpy(addr, dev->name, size);
+#ifdef RTE_EXEC_ENV_LINUX
+	return mlx5_auxiliary_get_pci_str(RTE_DEV_TO_AUXILIARY_CONST(dev),
+			addr, size);
+#else
+	rte_errno = ENODEV;
+	return -rte_errno;
+#endif
+}
+
 static void
 dev_release(struct mlx5_common_device *dev)
 {
diff --git a/drivers/common/mlx5/mlx5_common.h b/drivers/common/mlx5/mlx5_common.h
index c9c77ce540..4d520b16eb 100644
--- a/drivers/common/mlx5/mlx5_common.h
+++ b/drivers/common/mlx5/mlx5_common.h
@@ -213,6 +213,16 @@ check_cqe(volatile struct mlx5_cqe *cqe, const uint16_t cqes_n,
 	return MLX5_CQE_STATUS_SW_OWN;
 }
 
+/*
+ * Get PCI address string from EAL device.
+ *
+ * @return
+ *   - Zero or the copied string length on success.
+ *   - Negative value and rte_errno is set otherwise.
+ */
+__rte_internal
+int mlx5_dev_to_pci_str(const struct rte_device *dev, char *addr, size_t size);
+
 /*
  * Get PCI address from sysfs of a PCI-related device.
  *
@@ -227,7 +237,7 @@ check_cqe(volatile struct mlx5_cqe *cqe, const uint16_t cqes_n,
  *   - Negative value and rte_errno is set otherwise.
  */
 __rte_internal
-int mlx5_dev_to_pci_addr(const char *dev_path, struct rte_pci_addr *pci_addr);
+int mlx5_get_pci_addr(const char *dev_path, struct rte_pci_addr *pci_addr);
 
 /*
  * Get kernel network interface name from sysfs IB device path.
diff --git a/drivers/common/mlx5/mlx5_common_private.h b/drivers/common/mlx5/mlx5_common_private.h
index d1ab15ac43..26dfc55a61 100644
--- a/drivers/common/mlx5/mlx5_common_private.h
+++ b/drivers/common/mlx5/mlx5_common_private.h
@@ -39,6 +39,8 @@ struct ibv_device *mlx5_get_pci_ibv_device(const struct rte_pci_addr *);
 
 /* Common auxiliary bus driver: */
 void mlx5_common_auxiliary_init(void);
+int mlx5_auxiliary_get_pci_str(const struct rte_auxiliary_device *dev,
+			       char *addr, size_t size);
 struct ibv_device *mlx5_get_aux_ibv_device(const struct rte_auxiliary_device *);
 
 #ifdef __cplusplus
diff --git a/drivers/common/mlx5/version.map b/drivers/common/mlx5/version.map
index 33eb7f09bc..afa52ec6bb 100644
--- a/drivers/common/mlx5/version.map
+++ b/drivers/common/mlx5/version.map
@@ -15,7 +15,7 @@ INTERNAL {
 	mlx5_create_mr_ext;
 
 	mlx5_dev_is_pci;
-	mlx5_dev_to_pci_addr; # WINDOWS_NO_EXPORT
+	mlx5_dev_to_pci_str;
 
 	mlx5_devx_alloc_uar; # WINDOWS_NO_EXPORT
 
@@ -77,6 +77,7 @@ INTERNAL {
 	mlx5_get_ibv_device; # WINDOWS_NO_EXPORT
 
 	mlx5_get_ifname_sysfs; # WINDOWS_NO_EXPORT
+	mlx5_get_pci_addr; # WINDOWS_NO_EXPORT
 
 	mlx5_glue;
 
diff --git a/drivers/net/mlx5/linux/mlx5_os.c b/drivers/net/mlx5/linux/mlx5_os.c
index e8e6b0d5c9..4f16230fa5 100644
--- a/drivers/net/mlx5/linux/mlx5_os.c
+++ b/drivers/net/mlx5/linux/mlx5_os.c
@@ -1857,7 +1857,7 @@ mlx5_device_bond_pci_match(const struct ibv_device *ibv_dev,
 		/* Process slave interface names in the loop. */
 		snprintf(tmp_str, sizeof(tmp_str),
 			 "/sys/class/net/%s", ifname);
-		if (mlx5_dev_to_pci_addr(tmp_str, &pci_addr)) {
+		if (mlx5_get_pci_addr(tmp_str, &pci_addr)) {
 			DRV_LOG(WARNING, "can not get PCI address"
 					 " for netdev \"%s\"", ifname);
 			continue;
@@ -2026,8 +2026,7 @@ mlx5_os_pci_probe_pf(struct rte_pci_device *pci_dev,
 			break;
 		} else {
 			/* Bonding device not found. */
-			if (mlx5_dev_to_pci_addr
-				(ibv_list[ret]->ibdev_path, &pci_addr))
+			if (mlx5_get_pci_addr(ibv_list[ret]->ibdev_path, &pci_addr))
 				continue;
 			if (owner_pci.domain != pci_addr.domain ||
 			    owner_pci.bus != pci_addr.bus ||
-- 
2.25.1


^ permalink raw reply	[flat|nested] 42+ messages in thread

* [dpdk-dev] [RFC 13/14] vdpa/mlx5: support SubFunction
  2021-05-27 14:01 ` [dpdk-dev] [RFC 10/14] bus/auxiliary: introduce auxiliary bus Xueming Li
  2021-05-27 14:01   ` [dpdk-dev] [RFC 11/14] common/mlx5: support " Xueming Li
  2021-05-27 14:02   ` [dpdk-dev] [RFC 12/14] common/mlx5: get PCI device address from any bus Xueming Li
@ 2021-05-27 14:02   ` Xueming Li
  2021-05-27 14:02   ` [dpdk-dev] [RFC 14/14] net/mlx5: " Xueming Li
  3 siblings, 0 replies; 42+ messages in thread
From: Xueming Li @ 2021-05-27 14:02 UTC (permalink / raw)
  To: Viacheslav Ovsiienko; +Cc: Thomas Monjalon, dev, xuemingl, Matan Azrad

From: Thomas Monjalon <thomas@monjalon.net>

Signed-off-by: Thomas Monjalon <thomas@monjalon.net>
---
 drivers/vdpa/mlx5/mlx5_vdpa.c | 8 ++++++--
 1 file changed, 6 insertions(+), 2 deletions(-)

diff --git a/drivers/vdpa/mlx5/mlx5_vdpa.c b/drivers/vdpa/mlx5/mlx5_vdpa.c
index 967234193f..2f4420becd 100644
--- a/drivers/vdpa/mlx5/mlx5_vdpa.c
+++ b/drivers/vdpa/mlx5/mlx5_vdpa.c
@@ -553,9 +553,13 @@ mlx5_vdpa_sys_roce_disable(const char *addr)
 static int
 mlx5_vdpa_roce_disable(struct rte_device *dev)
 {
+	char pci_addr[PCI_PRI_STR_SIZE];
+
+	if (mlx5_dev_to_pci_str(dev, pci_addr, sizeof(pci_addr)) < 0)
+		return -rte_errno;
 	/* Firstly try to disable ROCE by Netlink and fallback to sysfs. */
-	if (mlx5_vdpa_nl_roce_disable(dev->name) != 0 &&
-	    mlx5_vdpa_sys_roce_disable(dev->name) != 0)
+	if (mlx5_vdpa_nl_roce_disable(pci_addr) != 0 &&
+	    mlx5_vdpa_sys_roce_disable(pci_addr) != 0)
 		return -rte_errno;
 	return 0;
 }
-- 
2.25.1


^ permalink raw reply	[flat|nested] 42+ messages in thread

* [dpdk-dev] [RFC 14/14] net/mlx5: support SubFunction
  2021-05-27 14:01 ` [dpdk-dev] [RFC 10/14] bus/auxiliary: introduce auxiliary bus Xueming Li
                     ` (2 preceding siblings ...)
  2021-05-27 14:02   ` [dpdk-dev] [RFC 13/14] vdpa/mlx5: support SubFunction Xueming Li
@ 2021-05-27 14:02   ` Xueming Li
  3 siblings, 0 replies; 42+ messages in thread
From: Xueming Li @ 2021-05-27 14:02 UTC (permalink / raw)
  To: Viacheslav Ovsiienko
  Cc: dev, xuemingl, Matan Azrad, Shahaf Shuler, Anatoly Burakov

This patch introduces SF support. Similar to a VF, an SF on the auxiliary
bus is a portion of the hardware PF. Representor and bonding parameters
are not supported for an SF.

Devargs to support SF:
-a auxiliary:mlx5_core.sf.8,dv_flow_en=1

New global syntax to support SF:
-a bus=auxiliary,name=mlx5_core.sf.8/class=eth/driver=mlx5,dv_flow_en=1

Signed-off-by: Xueming Li <xuemingl@nvidia.com>
---
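A minimal standalone sketch of how the legacy devargs form shown above
decomposes into bus, device name and driver arguments. It assumes an
explicit "bus:" prefix and is not the EAL parser (PCI addresses contain
colons themselves, which this sketch does not handle); the names
devarg_parts and split_legacy_devarg() are hypothetical.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical container for the three parts of "bus:name,args". */
struct devarg_parts {
	char bus[32];
	char name[64];
	char args[128];
};

/*
 * Hypothetical helper, not the EAL implementation: split
 * "auxiliary:mlx5_core.sf.8,dv_flow_en=1" at the first ':' and the
 * first ',' after it. Returns 0 on success, a negative errno otherwise.
 */
int
split_legacy_devarg(const char *str, struct devarg_parts *out)
{
	const char *colon;
	const char *comma;
	size_t bus_len;
	size_t name_len;

	assert(str != NULL && out != NULL);
	colon = strchr(str, ':');
	if (colon == NULL)
		return -EINVAL;
	comma = strchr(colon + 1, ',');
	bus_len = (size_t)(colon - str);
	name_len = comma != NULL ? (size_t)(comma - colon - 1) :
				   strlen(colon + 1);
	if (bus_len == 0 || name_len == 0)
		return -EINVAL;
	if (bus_len >= sizeof(out->bus) || name_len >= sizeof(out->name))
		return -ENAMETOOLONG;
	if (comma != NULL && strlen(comma + 1) >= sizeof(out->args))
		return -ENAMETOOLONG;
	memcpy(out->bus, str, bus_len);
	out->bus[bus_len] = '\0';
	memcpy(out->name, colon + 1, name_len);
	out->name[name_len] = '\0';
	/* Driver arguments are optional. */
	out->args[0] = '\0';
	if (comma != NULL)
		strcpy(out->args, comma + 1);
	return 0;
}
```

With the example from the commit message, the three parts would be
"auxiliary", "mlx5_core.sf.8" and "dv_flow_en=1".
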
 doc/guides/nics/mlx5.rst                | 339 +++++++++++++++++++++++-
 drivers/net/mlx5/linux/mlx5_ethdev_os.c |  12 +-
 drivers/net/mlx5/linux/mlx5_os.c        | 142 +++++++---
 drivers/net/mlx5/linux/mlx5_os.h        |   2 +
 drivers/net/mlx5/mlx5.c                 |  10 +-
 drivers/net/mlx5/mlx5.h                 |   1 +
 drivers/net/mlx5/mlx5_rxmode.c          |   8 +-
 drivers/net/mlx5/mlx5_trigger.c         |   2 +-
 8 files changed, 452 insertions(+), 64 deletions(-)

diff --git a/doc/guides/nics/mlx5.rst b/doc/guides/nics/mlx5.rst
index 83299646dd..3f5692038c 100644
--- a/doc/guides/nics/mlx5.rst
+++ b/doc/guides/nics/mlx5.rst
@@ -403,6 +403,300 @@ Limitations
   - Hairpin between two ports could only manual binding and explicit Tx flow mode. For single port hairpin, all the combinations of auto/manual binding and explicit/implicit Tx flow mode could be supported.
   - Hairpin in switchdev SR-IOV mode is not supported till now.
 
+- Meter:
+
+Limitations
+-----------
+
+- Windows support:
+
+  On Windows, the features are limited:
+
+  - Promiscuous mode is not supported
+  - The following rules are supported:
+
+    - IPv4/UDP with CVLAN filtering
+    - Unicast MAC filtering
+
+- For secondary process:
+
+  - Forked secondary process not supported.
+  - External memory unregistered in EAL memseg list cannot be used for DMA
+    unless such memory has been registered by ``mlx5_mr_update_ext_mp()`` in
+    primary process and remapped to the same virtual address in secondary
+    process. If the external memory is registered by primary process but has
+    different virtual address in secondary process, unexpected error may happen.
+
+- When using Verbs flow engine (``dv_flow_en`` = 0), flow pattern without any
+  specific VLAN will match for VLAN packets as well:
+
+  When VLAN spec is not specified in the pattern, the matching rule will be created with VLAN as a wild card.
+  Meaning, the flow rule::
+
+        flow create 0 ingress pattern eth / vlan vid is 3 / ipv4 / end ...
+
+  Will only match VLAN packets with vid=3, and the flow rule::
+
+        flow create 0 ingress pattern eth / ipv4 / end ...
+
+  Will match any ipv4 packet (VLAN included).
+
+- When using Verbs flow engine (``dv_flow_en`` = 0), multi-tagged(QinQ) match is not supported.
+
+- When using DV flow engine (``dv_flow_en`` = 1), flow pattern with any VLAN specification will match only single-tagged packets unless the ETH item ``type`` field is 0x88A8 or the VLAN item ``has_more_vlan`` field is 1.
+  The flow rule::
+
+        flow create 0 ingress pattern eth / ipv4 / end ...
+
+  Will match any ipv4 packet.
+  The flow rules::
+
+        flow create 0 ingress pattern eth / vlan / end ...
+        flow create 0 ingress pattern eth has_vlan is 1 / end ...
+        flow create 0 ingress pattern eth type is 0x8100 / end ...
+
+  Will match single-tagged packets only, with any VLAN ID value.
+  The flow rules::
+
+        flow create 0 ingress pattern eth type is 0x88A8 / end ...
+        flow create 0 ingress pattern eth / vlan has_more_vlan is 1 / end ...
+
+  Will match multi-tagged packets only, with any VLAN ID value.
+
+- A flow pattern with 2 sequential VLAN items is not supported.
+
+- VLAN pop offload command:
+
+  - Flow rules that have a VLAN pop offload command as one of their actions
+    but lack a match on VLAN as one of their items are not supported.
+  - The command is not supported on egress traffic in NIC mode.
+
+- VLAN push offload is not supported on ingress traffic in NIC mode.
+
+- VLAN set PCP offload is not supported on existing headers.
+
+- A multi-segment packet must not have more segments than reported by dev_infos_get()
+  in tx_desc_lim.nb_seg_max field. This value depends on maximal supported Tx descriptor
+  size and ``txq_inline_min`` settings and may be from 2 (worst case forced by maximal
+  inline settings) to 58.
+
+- Flows with a VXLAN Network Identifier equal (or ends to be equal)
+  to 0 are not supported.
+
+- L3 VXLAN and VXLAN-GPE tunnels cannot be supported together with MPLSoGRE and MPLSoUDP.
+
+- Match on Geneve header supports the following fields only:
+
+     - VNI
+     - OAM
+     - protocol type
+     - options length
+
+- Match on Geneve TLV option is supported on the following fields:
+
+     - Class
+     - Type
+     - Length
+     - Data
+
+  Only one Class/Type/Length Geneve TLV option is supported per shared device.
+  Class/Type/Length fields must be specified as well as masks.
+  Class/Type/Length specified masks must be full.
+  Matching Geneve TLV option without specifying data is not supported.
+  Matching Geneve TLV option with ``data & mask == 0`` is not supported.
+
+- VF: flow rules created on VF devices can only match traffic targeted at the
+  configured MAC addresses (see ``rte_eth_dev_mac_addr_add()``).
+
+- Match on GTP tunnel header item supports the following fields only:
+
+     - v_pt_rsv_flags: E flag, S flag, PN flag
+     - msg_type
+     - teid
+
+- Match on GTP extension header only for GTP PDU session container (next
+  extension header type = 0x85).
+- Match on GTP extension header is not supported in group 0.
+
+- No Tx metadata go to the E-Switch steering domain for the Flow group 0.
+  The flows within group 0 and set metadata action are rejected by hardware.
+
+.. note::
+
+   MAC addresses not already present in the bridge table of the associated
+   kernel network device will be added and cleaned up by the PMD when closing
+   the device. In case of ungraceful program termination, some entries may
+   remain present and should be removed manually by other means.
+
+- Buffer split offload is supported with regular Rx burst routine only,
+  no MPRQ feature or vectorized code can be engaged.
+
+- When Multi-Packet Rx queue is configured (``mprq_en``), a Rx packet can be
+  externally attached to a user-provided mbuf with having EXT_ATTACHED_MBUF in
+  ol_flags. As the mempool for the external buffer is managed by PMD, all the
+  Rx mbufs must be freed before the device is closed. Otherwise, the mempool of
+  the external buffers will be freed by PMD and the application which still
+  holds the external buffers may be corrupted.
+
+- If Multi-Packet Rx queue is configured (``mprq_en``) and Rx CQE compression is
+  enabled (``rxq_cqe_comp_en``) at the same time, RSS hash result is not fully
+  supported. Some Rx packets may not have PKT_RX_RSS_HASH.
+
+- IPv6 Multicast messages are not supported on VM, while promiscuous mode
+  and allmulticast mode are both set to off.
+  To receive IPv6 Multicast messages on VM, explicitly set the relevant
+  MAC address using rte_eth_dev_mac_addr_add() API.
+
+- To support a mixed traffic pattern (some buffers from local host memory, some
+  buffers from other devices) with high bandwidth, a mbuf flag is used.
+
+  An application hints the PMD whether or not it should try to inline the
+  given mbuf data buffer. PMD should do the best effort to act upon this request.
+
+  The hint flag ``RTE_PMD_MLX5_FINE_GRANULARITY_INLINE`` is dynamic,
+  registered by application with rte_mbuf_dynflag_register(). This flag is
+  purely driver-specific and declared in PMD specific header ``rte_pmd_mlx5.h``,
+  which is intended to be used by the application.
+
+  To query the supported specific flags in runtime,
+  the function ``rte_pmd_mlx5_get_dyn_flag_names`` returns the array of
+  currently (over present hardware and configuration) supported specific flags.
+  The "not inline hint" feature operating flow is the following one:
+
+    - application starts
+    - probe the devices, ports are created
+    - query the port capabilities
+    - if port supporting the feature is found
+    - register dynamic flag ``RTE_PMD_MLX5_FINE_GRANULARITY_INLINE``
+    - application starts the ports
+    - on ``dev_start()`` PMD checks whether the feature flag is registered and
+      enables the feature support in datapath
+    - application might set the registered flag bit in ``ol_flags`` field
+      of mbuf being sent and PMD will handle them appropriately.
+
+- The number of descriptors in a Tx queue may be limited by data inline settings.
+  Inline data requires more descriptor building blocks, and the overall block
+  count may exceed the hardware supported limits. The application should
+  reduce the requested Tx size or adjust data inline settings with
+  ``txq_inline_max`` and ``txq_inline_mpw`` devargs keys.
+
+- To provide the packet send scheduling on mbuf timestamps the ``tx_pp``
+  parameter should be specified.
+  When PMD sees the RTE_MBUF_DYNFLAG_TX_TIMESTAMP_NAME set on the packet
+  being sent it tries to synchronize the time of packet appearing on
+  the wire with the specified packet timestamp. If the specified one
+  is in the past it should be ignored; if it is in the distant future
+  it should be capped with some reasonable value (in range of seconds).
+  These specific cases ("too late" and "distant future") can be optionally
+  reported via device xstats to assist applications to detect the
+  time-related problems.
+
+  The timestamp upper "too-distant-future" limit
+  at the moment of invoking the Tx burst routine
+  can be estimated as ``tx_pp`` option (in nanoseconds) multiplied by 2^23.
+  Please note, for the testpmd txonly mode,
+  the limit is deduced from the expression::
+
+        (n_tx_descriptors / burst_size + 1) * inter_burst_gap
+
+  No packet reordering according to timestamps is performed,
+  neither within a packet burst nor between packets; it is entirely
+  the application's responsibility to generate packets and their timestamps
+  in the desired order. The timestamps can be put only in the first packet
+  in the burst providing the entire burst scheduling.
+
+- E-Switch decapsulation Flow:
+
+  - can be applied to PF port only.
+  - must specify VF port action (packet redirection from PF to VF).
+  - optionally may specify tunnel inner source and destination MAC addresses.
+
+- E-Switch  encapsulation Flow:
+
+  - can be applied to VF ports only.
+  - must specify PF port action (packet redirection from VF to PF).
+
+- Raw encapsulation:
+
+  - The input buffer, used as outer header, is not validated.
+
+- Raw decapsulation:
+
+  - The decapsulation is always done up to the outermost tunnel detected by the HW.
+  - The input buffer, providing the removal size, is not validated.
+  - The buffer size must match the length of the headers to be removed.
+
+- ICMP(code/type/identifier/sequence number) / ICMP6(code/type) matching, IP-in-IP and MPLS flow matching are all
+  mutually exclusive features which cannot be supported together
+  (see :ref:`mlx5_firmware_config`).
+
+- LRO:
+
+  - Requires DevX and DV flow to be enabled.
+  - KEEP_CRC offload cannot be supported with LRO.
+  - The first mbuf length, without head-room, must be big enough to include the
+    TCP header (122B).
+  - Rx queue with LRO offload enabled, receiving a non-LRO packet, can forward
+    it with size limited to max LRO size, not to max RX packet length.
+  - LRO can be used with outer header of TCP packets of the standard format:
+        eth (with or without vlan) / ipv4 or ipv6 / tcp / payload
+
+    Other TCP packets (e.g. with MPLS label) received on Rx queue with LRO enabled, will be received with bad checksum.
+  - LRO packet aggregation is performed by HW only for packet size larger than
+    ``lro_min_mss_size``. This value is reported on device start, when debug
+    mode is enabled.
+
+- CRC:
+
+  - ``DEV_RX_OFFLOAD_KEEP_CRC`` cannot be supported with decapsulation
+    for some NICs (such as ConnectX-6 Dx, ConnectX-6 Lx, and BlueField-2).
+    The capability bit ``scatter_fcs_w_decap_disable`` shows NIC support.
+
+- TX mbuf fast free:
+
+  - fast free offload assumes that all mbufs being sent originate from the
+    same memory pool and that there are no extra references to the mbufs (the
+    reference counter of each mbuf equals 1 on the tx_burst call). The latter
+    means there should be no externally attached buffers in mbufs. It is
+    the application's responsibility to provide correct mbufs if the fast
+    free offload is engaged. The mlx5 PMD implicitly produces mbufs with
+    externally attached buffers if the MPRQ option is enabled; hence, the fast
+    free offload is neither supported nor advertised when MPRQ is enabled.
+
+- Sample flow:
+
+  - Supports ``RTE_FLOW_ACTION_TYPE_SAMPLE`` action only within NIC Rx and
+    E-Switch steering domain.
+  - For E-Switch Sampling flow with sample ratio > 1, additional actions are not
+    supported in the sample actions list.
+  - For ConnectX-5, the ``RTE_FLOW_ACTION_TYPE_SAMPLE`` is typically used as
+    the first action in the E-Switch egress flow when combined with header
+    modify or encapsulation actions.
+  - For NIC Rx flow, supports ``MARK``, ``COUNT``, ``QUEUE``, ``RSS`` in the
+    sample actions list.
+  - For E-Switch mirroring flow, supports ``RAW ENCAP``, ``Port ID``,
+    ``VXLAN ENCAP``, ``NVGRE ENCAP`` in the sample actions list.
+
+- Modify Field flow:
+
+  - Supports the 'set' operation only for ``RTE_FLOW_ACTION_TYPE_MODIFY_FIELD`` action.
+  - Modification of an arbitrary place in a packet via the special ``RTE_FLOW_FIELD_START`` Field ID is not supported.
+  - Modification of the 802.1Q Tag, VXLAN Network ID or GENEVE Network ID is not supported.
+  - Encapsulation levels are not supported; only outermost header fields can be modified.
+  - Offsets must be 32-bit aligned and cannot skip past the boundary of a field.
+
+- IPv6 header item 'proto' field, indicating the next header protocol, should
+  not be set as extension header.
+  In case the next header is an extension header, it should not be specified in
+  IPv6 header item 'proto' field.
+  The last extension header item 'next header' field can specify the following
+  header protocol type.
+
+- Hairpin:
+
+  - Hairpin between two ports supports only manual binding and explicit Tx flow mode. For single port hairpin, all combinations of auto/manual binding and explicit/implicit Tx flow mode are supported.
+  - Hairpin in switchdev SR-IOV mode is not supported yet.
+
 - Meter:
 
   - All the meter colors with drop action will be counted only by the global drop statistics.
@@ -1438,13 +1732,17 @@ the DPDK application.
 
         echo switchdev > /sys/class/net/<net device>/compat/devlink/mode
 
-Sub-Function representor
-------------------------
+SubFunction support
+-------------------
+A SubFunction (SF) is a portion of the PCI device; a SF netdev has its own
+dedicated queues (txq, rxq). A SF shares PCI level resources with other SFs
+and/or with its parent PCI function.
 
-Sub-Function is a portion of the PCI device, a SF netdev has its own
-dedicated queues(txq, rxq). A SF netdev supports E-Switch representation
-offload similar to existing PF and VF representors. A SF shares PCI
-level resources with other SFs and/or with its parent PCI function.
+0. Requirement::
+
+        kernel version >= 5.12 or OFED version >= 5.6
+
+        iproute2 >= 5.11
 
 1. Configure SF feature::
 
@@ -1457,21 +1755,34 @@ level resources with other SFs and/or with its parent PCI function.
             2: 32 SFs
             3: 64 SFs
 
-2. Reset the FW::
+2. Enable switchdev mode::
 
-        mlxfwreset -d <mst device> reset
+        devlink dev eswitch set pci/<DBDF> mode switchdev
 
-3. Enable switchdev mode::
+3. Add SF port::
 
-        echo switchdev > /sys/class/net/<net device>/compat/devlink/mode
+        devlink port add pci/<DBDF> flavour pcisf pfnum 0 sfnum <sfnum>
+
+        Get SFID from output: pci/<DBDF>/<SFID>
+
+4. Modify MAC address::
+
+        devlink port function set pci/<DBDF>/<SFID> hw_addr <MAC>
+
+5. Activate SF port::
+
+        devlink port function set pci/<DBDF>/<SFID> state active
 
-4. Create SF::
+6. Devargs to probe SF device::
 
-        mlnx-sf -d <PCI_BDF> -a create
+        auxiliary:mlx5_core.sf.9,dv_flow_en=1
 
-5. Probe SF representor::
+SubFunction representor support
+-------------------------------
+A SF netdev supports E-Switch representation offload similar to existing PF
+and VF representors. Use <sfnum> to probe SF representor::
 
-        testpmd> port attach <PCI_BDF>,representor=sf0,dv_flow_en=1
+        testpmd> port attach <PCI_BDF>,representor=sf<sfnum>,dv_flow_en=1
 
 Performance tuning
 ------------------
diff --git a/drivers/net/mlx5/linux/mlx5_ethdev_os.c b/drivers/net/mlx5/linux/mlx5_ethdev_os.c
index 6fdb310129..8678502595 100644
--- a/drivers/net/mlx5/linux/mlx5_ethdev_os.c
+++ b/drivers/net/mlx5/linux/mlx5_ethdev_os.c
@@ -128,6 +128,17 @@ struct ethtool_link_settings {
 #define ETHTOOL_LINK_MODE_200000baseCR4_Full_BIT 2 /* 66 - 64 */
 #endif
 
+/* Get interface index from SubFunction device name. */
+int
+mlx5_auxiliary_get_ifindex(const char *sf_name)
+{
+	char if_name[IF_NAMESIZE];
+
+	if (mlx5_auxiliary_get_child_name(sf_name, "/net",
+					  if_name, sizeof(if_name)) != 0)
+		return -rte_errno;
+	return if_nametoindex(if_name);
+}
 
 /**
  * Get interface name from private structure.
@@ -1619,4 +1630,3 @@ mlx5_get_mac(struct rte_eth_dev *dev, uint8_t (*mac)[RTE_ETHER_ADDR_LEN])
 	memcpy(mac, request.ifr_hwaddr.sa_data, RTE_ETHER_ADDR_LEN);
 	return 0;
 }
-
diff --git a/drivers/net/mlx5/linux/mlx5_os.c b/drivers/net/mlx5/linux/mlx5_os.c
index 4f16230fa5..d74273a7ca 100644
--- a/drivers/net/mlx5/linux/mlx5_os.c
+++ b/drivers/net/mlx5/linux/mlx5_os.c
@@ -20,6 +20,7 @@
 #include <ethdev_pci.h>
 #include <rte_pci.h>
 #include <rte_bus_pci.h>
+#include <rte_bus_auxiliary.h>
 #include <rte_common.h>
 #include <rte_kvargs.h>
 #include <rte_rwlock.h>
@@ -1923,6 +1924,27 @@ mlx5_device_bond_pci_match(const struct ibv_device *ibv_dev,
 	return pf;
 }
 
+static void
+mlx5_os_config_default(struct mlx5_dev_config *config)
+{
+	memset(config, 0, sizeof(*config));
+	config->mps = MLX5_ARG_UNSET;
+	config->dbnc = MLX5_ARG_UNSET;
+	config->rx_vec_en = 1;
+	config->txq_inline_max = MLX5_ARG_UNSET;
+	config->txq_inline_min = MLX5_ARG_UNSET;
+	config->txq_inline_mpw = MLX5_ARG_UNSET;
+	config->txqs_inline = MLX5_ARG_UNSET;
+	config->vf_nl_en = 1;
+	config->mr_ext_memseg_en = 1;
+	config->mprq.max_memcpy_len = MLX5_MPRQ_MEMCPY_DEFAULT_LEN;
+	config->mprq.min_rxqs_num = MLX5_MPRQ_MIN_RXQS;
+	config->dv_esw_en = 1;
+	config->dv_flow_en = 1;
+	config->decap_en = 1;
+	config->log_hp_size = MLX5_ARG_UNSET;
+}
+
 /**
  * Register a PCI device within bonding.
  *
@@ -2334,23 +2356,8 @@ mlx5_os_pci_probe_pf(struct rte_pci_device *pci_dev,
 		uint32_t restore;
 
 		/* Default configuration. */
-		memset(&dev_config, 0, sizeof(struct mlx5_dev_config));
+		mlx5_os_config_default(&dev_config);
 		dev_config.vf = dev_config_vf;
-		dev_config.mps = MLX5_ARG_UNSET;
-		dev_config.dbnc = MLX5_ARG_UNSET;
-		dev_config.rx_vec_en = 1;
-		dev_config.txq_inline_max = MLX5_ARG_UNSET;
-		dev_config.txq_inline_min = MLX5_ARG_UNSET;
-		dev_config.txq_inline_mpw = MLX5_ARG_UNSET;
-		dev_config.txqs_inline = MLX5_ARG_UNSET;
-		dev_config.vf_nl_en = 1;
-		dev_config.mr_ext_memseg_en = 1;
-		dev_config.mprq.max_memcpy_len = MLX5_MPRQ_MEMCPY_DEFAULT_LEN;
-		dev_config.mprq.min_rxqs_num = MLX5_MPRQ_MIN_RXQS;
-		dev_config.dv_esw_en = 1;
-		dev_config.dv_flow_en = 1;
-		dev_config.decap_en = 1;
-		dev_config.log_hp_size = MLX5_ARG_UNSET;
 		list[i].eth_dev = mlx5_dev_spawn(&pci_dev->device,
 						 &list[i],
 						 &dev_config,
@@ -2407,6 +2414,35 @@ mlx5_os_pci_probe_pf(struct rte_pci_device *pci_dev,
 	return ret;
 }
 
+static int
+mlx5_os_parse_eth_devargs(struct rte_device *dev,
+			  struct rte_eth_devargs *eth_da)
+{
+	int ret = 0;
+
+	if (dev->devargs == NULL)
+		return 0;
+	memset(eth_da, 0, sizeof(*eth_da));
+	/* Parse representor information first from class argument. */
+	if (dev->devargs->cls_str)
+		ret = rte_eth_devargs_parse(dev->devargs->cls_str, eth_da);
+	if (ret != 0) {
+		DRV_LOG(ERR, "failed to parse device arguments: %s",
+			dev->devargs->cls_str);
+		return -rte_errno;
+	}
+	if (eth_da->type == RTE_ETH_REPRESENTOR_NONE) {
+		/* Parse legacy device argument */
+		ret = rte_eth_devargs_parse(dev->devargs->args, eth_da);
+		if (ret) {
+			DRV_LOG(ERR, "failed to parse device arguments: %s",
+				dev->devargs->args);
+			return -rte_errno;
+		}
+	}
+	return 0;
+}
+
 /**
  * Callback to register a PCI device.
  *
@@ -2421,31 +2457,13 @@ mlx5_os_pci_probe_pf(struct rte_pci_device *pci_dev,
 static int
 mlx5_os_pci_probe(struct rte_pci_device *pci_dev)
 {
-	struct rte_eth_devargs eth_da = { .type = RTE_ETH_REPRESENTOR_NONE };
+	struct rte_eth_devargs eth_da = { .nb_ports = 0 };
 	int ret = 0;
 	uint16_t p;
 
-	if (pci_dev->device.devargs) {
-		/* Parse representor information from device argument. */
-		if (pci_dev->device.devargs->cls_str)
-			ret = rte_eth_devargs_parse
-				(pci_dev->device.devargs->cls_str, &eth_da);
-		if (ret) {
-			DRV_LOG(ERR, "failed to parse device arguments: %s",
-				pci_dev->device.devargs->cls_str);
-			return -rte_errno;
-		}
-		if (eth_da.type == RTE_ETH_REPRESENTOR_NONE) {
-			/* Support legacy device argument */
-			ret = rte_eth_devargs_parse
-				(pci_dev->device.devargs->args, &eth_da);
-			if (ret) {
-				DRV_LOG(ERR, "failed to parse device arguments: %s",
-					pci_dev->device.devargs->args);
-				return -rte_errno;
-			}
-		}
-	}
+	ret = mlx5_os_parse_eth_devargs(&pci_dev->device, &eth_da);
+	if (ret != 0)
+		return ret;
 
 	if (eth_da.nb_ports > 0) {
 		/* Iterate all port if devargs pf is range: "pf[0-1]vf[...]". */
@@ -2458,10 +2476,53 @@ mlx5_os_pci_probe(struct rte_pci_device *pci_dev)
 	return ret;
 }
 
+/* Probe a single SF device on auxiliary bus, no representor support. */
+static int
+mlx5_os_auxiliary_probe(struct rte_device *dev)
+{
+	struct rte_eth_devargs eth_da = { .nb_ports = 0 };
+	struct mlx5_dev_config config;
+	struct mlx5_dev_spawn_data spawn = { .pf_bond = -1 };
+	struct rte_auxiliary_device *adev = RTE_DEV_TO_AUXILIARY(dev);
+	struct rte_eth_dev *eth_dev;
+	int ret = 0;
+
+	/* Parse ethdev devargs. */
+	ret = mlx5_os_parse_eth_devargs(dev, &eth_da);
+	if (ret != 0)
+		return ret;
+	/* Set default config data. */
+	mlx5_os_config_default(&config);
+	config.sf = 1;
+	/* Init spawn data. */
+	spawn.max_port = 1;
+	spawn.phys_port = 1;
+	spawn.phys_dev = mlx5_get_ibv_device(dev);
+	ret = mlx5_auxiliary_get_ifindex(dev->name);
+	if (ret < 0) {
+		DRV_LOG(ERR, "failed to get ethdev ifindex: %s", dev->name);
+		return ret;
+	}
+	spawn.ifindex = ret;
+	/* Spawn device. */
+	eth_dev = mlx5_dev_spawn(dev, &spawn, &config, &eth_da);
+	if (eth_dev == NULL)
+		return -rte_errno;
+	/* Post create. */
+	eth_dev->intr_handle = &adev->intr_handle;
+	if (rte_eal_process_type() == RTE_PROC_PRIMARY) {
+		eth_dev->data->dev_flags |= RTE_ETH_DEV_INTR_LSC;
+		eth_dev->data->dev_flags |= RTE_ETH_DEV_INTR_RMV;
+		eth_dev->data->numa_node = dev->numa_node;
+	}
+	rte_eth_dev_probing_finish(eth_dev);
+	return 0;
+}
+
 /**
  * Common bus driver callback to probe a device.
  *
- * This function probe PCI bus device(s).
+ * This function probes PCI bus device(s) or a single SF on the auxiliary bus.
  *
  * @param[in] dev
  *   Pointer to the generic device.
@@ -2484,7 +2545,8 @@ mlx5_os_net_probe(struct rte_device *dev)
 	}
 	if (mlx5_dev_is_pci(dev))
 		return mlx5_os_pci_probe(RTE_DEV_TO_PCI(dev));
-	return 0;
+	else
+		return mlx5_os_auxiliary_probe(dev);
 }
 
 static int
diff --git a/drivers/net/mlx5/linux/mlx5_os.h b/drivers/net/mlx5/linux/mlx5_os.h
index af7cbeb418..2991d37df2 100644
--- a/drivers/net/mlx5/linux/mlx5_os.h
+++ b/drivers/net/mlx5/linux/mlx5_os.h
@@ -19,4 +19,6 @@ enum {
 
 #define MLX5_NAMESIZE IF_NAMESIZE
 
+int mlx5_auxiliary_get_ifindex(const char *sf_name);
+
 #endif /* RTE_PMD_MLX5_OS_H_ */
diff --git a/drivers/net/mlx5/mlx5.c b/drivers/net/mlx5/mlx5.c
index 3defdb2db3..69edd55b86 100644
--- a/drivers/net/mlx5/mlx5.c
+++ b/drivers/net/mlx5/mlx5.c
@@ -2319,10 +2319,12 @@ mlx5_eth_find_next(uint16_t port_id, struct rte_eth_dev *odev)
 			if (opriv->sh == priv->sh ||
 			    odev->device == dev->device)
 				break;
-		} else if (dev->device != NULL && dev->device->driver &&
-			dev->device->driver->name &&
-			!strcmp(dev->device->driver->name,
-				MLX5_PCI_DRIVER_NAME)) {
+		} else if (dev->device != NULL && dev->device->driver != NULL &&
+			dev->device->driver->name != NULL &&
+			(strcmp(dev->device->driver->name,
+				MLX5_PCI_DRIVER_NAME) == 0 ||
+			 strcmp(dev->device->driver->name,
+				MLX5_AUXILIARY_DRIVER_NAME) == 0)) {
 			/* odev not specified, found all mlx5 devices. */
 			break;
 		}
diff --git a/drivers/net/mlx5/mlx5.h b/drivers/net/mlx5/mlx5.h
index 27bb34e827..b06f45fc54 100644
--- a/drivers/net/mlx5/mlx5.h
+++ b/drivers/net/mlx5/mlx5.h
@@ -220,6 +220,7 @@ struct mlx5_dev_config {
 	unsigned int hw_fcs_strip:1; /* FCS stripping is supported. */
 	unsigned int hw_padding:1; /* End alignment padding is supported. */
 	unsigned int vf:1; /* This is a VF. */
+	unsigned int sf:1; /* This is a SF. */
 	unsigned int tunnel_en:1;
 	/* Whether tunnel stateless offloads are supported. */
 	unsigned int mpls_en:1; /* MPLS over GRE/UDP is enabled. */
diff --git a/drivers/net/mlx5/mlx5_rxmode.c b/drivers/net/mlx5/mlx5_rxmode.c
index 25fb47c9ed..7f19b235c2 100644
--- a/drivers/net/mlx5/mlx5_rxmode.c
+++ b/drivers/net/mlx5/mlx5_rxmode.c
@@ -36,7 +36,7 @@ mlx5_promiscuous_enable(struct rte_eth_dev *dev)
 			dev->data->port_id);
 		return 0;
 	}
-	if (priv->config.vf) {
+	if (priv->config.vf || priv->config.sf) {
 		ret = mlx5_os_set_promisc(dev, 1);
 		if (ret)
 			return ret;
@@ -69,7 +69,7 @@ mlx5_promiscuous_disable(struct rte_eth_dev *dev)
 	int ret;
 
 	dev->data->promiscuous = 0;
-	if (priv->config.vf) {
+	if (priv->config.vf || priv->config.sf) {
 		ret = mlx5_os_set_promisc(dev, 0);
 		if (ret)
 			return ret;
@@ -109,7 +109,7 @@ mlx5_allmulticast_enable(struct rte_eth_dev *dev)
 			dev->data->port_id);
 		return 0;
 	}
-	if (priv->config.vf) {
+	if (priv->config.vf || priv->config.sf) {
 		ret = mlx5_os_set_allmulti(dev, 1);
 		if (ret)
 			goto error;
@@ -142,7 +142,7 @@ mlx5_allmulticast_disable(struct rte_eth_dev *dev)
 	int ret;
 
 	dev->data->all_multicast = 0;
-	if (priv->config.vf) {
+	if (priv->config.vf || priv->config.sf) {
 		ret = mlx5_os_set_allmulti(dev, 0);
 		if (ret)
 			goto error;
diff --git a/drivers/net/mlx5/mlx5_trigger.c b/drivers/net/mlx5/mlx5_trigger.c
index 6c8a64ce03..e4e057a6f8 100644
--- a/drivers/net/mlx5/mlx5_trigger.c
+++ b/drivers/net/mlx5/mlx5_trigger.c
@@ -1259,7 +1259,7 @@ mlx5_traffic_enable(struct rte_eth_dev *dev)
 		}
 		mlx5_txq_release(dev, i);
 	}
-	if (priv->config.dv_esw_en && !priv->config.vf) {
+	if (priv->config.dv_esw_en && !priv->config.vf && !priv->config.sf) {
 		if (mlx5_flow_create_esw_table_zero_flow(dev))
 			priv->fdb_def_rule = 1;
 		else
-- 
2.25.1
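
A note on the ifindex lookup in the auxiliary probe path above:
mlx5_auxiliary_get_ifindex() returns the result of if_nametoindex()
directly, and the caller only checks for ret < 0. Since if_nametoindex()
reports failure by returning 0, it may be worth mapping that case to a
negative errno value. The following is only a minimal standalone sketch of
that idea; ifindex_from_name() is a hypothetical helper, not a function
from this patch, and it uses plain errno rather than rte_errno.

```c
#include <errno.h>
#include <net/if.h>

/*
 * Hypothetical helper (not part of the patch): resolve a netdev name to
 * its interface index. if_nametoindex() returns 0 on failure, so that
 * case is converted to a negative errno value, which lets a caller keep
 * using a "ret < 0" check.
 */
static int
ifindex_from_name(const char *if_name)
{
	unsigned int idx;

	errno = 0;
	idx = if_nametoindex(if_name);
	if (idx == 0)
		return errno != 0 ? -errno : -ENODEV;
	return (int)idx;
}
```

With such a wrapper, a lookup of a name that does not exist yields a
negative value instead of 0, so an invalid ifindex would not be stored
in the spawn data.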


^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [dpdk-dev] [RFC 01/14] common/mlx5: add common device driver
  2021-05-27 13:37 ` [dpdk-dev] [RFC 01/14] common/mlx5: add common device driver Xueming Li
@ 2021-06-10  9:51   ` Thomas Monjalon
  2021-06-16  4:09   ` [dpdk-dev] [PATCH v1 00/14] net/mlx5: support Sub-Function Xueming Li
                     ` (14 subsequent siblings)
  15 siblings, 0 replies; 42+ messages in thread
From: Thomas Monjalon @ 2021-06-10  9:51 UTC (permalink / raw)
  To: Xueming Li; +Cc: Viacheslav Ovsiienko, dev, Matan Azrad, Ray Kinsella, orika

27/05/2021 15:37, Xueming Li:
> +static const struct {
> +	const char *name;
> +	unsigned int drv_class;
> +} mlx5_classes[] = {
> +	{ .name = "vdpa", .drv_class = MLX5_CLASS_VDPA },
> +	{ .name = "net", .drv_class = MLX5_CLASS_NET },
> +	{ .name = "regex", .drv_class = MLX5_CLASS_REGEX },
> +	{ .name = "compress", .drv_class = MLX5_CLASS_COMPRESS },
> +};

The class name should be "eth", not "net".
This is the standard in rte_ethdev.c:
	rte_class_find_by_name("eth")
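
A minimal standalone sketch of the table with the suggested name, assuming
a simple string lookup. The DEMO_CLASS_* values and demo_class_from_name()
are illustrative only; the real MLX5_CLASS_* definitions and the parsing
code are not shown in this message.

```c
#include <stddef.h>
#include <string.h>

/* Illustrative class bits standing in for the MLX5_CLASS_* values. */
enum demo_class {
	DEMO_CLASS_INVALID = 0,
	DEMO_CLASS_ETH = 1 << 0,
	DEMO_CLASS_VDPA = 1 << 1,
	DEMO_CLASS_REGEX = 1 << 2,
	DEMO_CLASS_COMPRESS = 1 << 3,
};

static const struct {
	const char *name;
	unsigned int drv_class;
} demo_classes[] = {
	{ .name = "vdpa", .drv_class = DEMO_CLASS_VDPA },
	{ .name = "eth", .drv_class = DEMO_CLASS_ETH }, /* "eth", not "net" */
	{ .name = "regex", .drv_class = DEMO_CLASS_REGEX },
	{ .name = "compress", .drv_class = DEMO_CLASS_COMPRESS },
};

/* Return the class bit for a class name, or DEMO_CLASS_INVALID if unknown. */
static unsigned int
demo_class_from_name(const char *name)
{
	size_t i;

	if (name == NULL)
		return DEMO_CLASS_INVALID;
	for (i = 0; i < sizeof(demo_classes) / sizeof(demo_classes[0]); i++)
		if (strcmp(name, demo_classes[i].name) == 0)
			return demo_classes[i].drv_class;
	return DEMO_CLASS_INVALID;
}
```

In this sketch "eth" resolves to the ethdev class while "net" is rejected;
whether "net" should remain accepted as a legacy alias is a separate
question.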



^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [dpdk-dev] [RFC 00/14] mlx5: support SubFunction
  2021-05-27 13:37 [dpdk-dev] [RFC 00/14] mlx5: support SubFunction Xueming Li
                   ` (9 preceding siblings ...)
  2021-05-27 14:01 ` [dpdk-dev] [RFC 10/14] bus/auxiliary: introduce auxiliary bus Xueming Li
@ 2021-06-10 10:33 ` Ferruh Yigit
  2021-06-10 13:23   ` Thomas Monjalon
  10 siblings, 1 reply; 42+ messages in thread
From: Ferruh Yigit @ 2021-06-10 10:33 UTC (permalink / raw)
  To: Xueming Li, Viacheslav Ovsiienko; +Cc: dev, Thomas Monjalon, Chenbo Xia

On 5/27/2021 2:37 PM, Xueming Li wrote:
> SubFunction [1] is a portion of the PCI device, a SF netdev has its own
> dedicated queues(txq, rxq). A SF shares PCI level resources with other
> SFs and/or with its parent PCI function. Auxiliary bus is the
> fundamental of SF.
> 
> This patch set introduces SubFunction support for mlx5 PMD driver
> including class net, regex, vdpa and compress.
> 

There is already an mdev patch, originated from long ago. Aren't subfunctions
presented as mdev device? If so can't we use mdev for it?

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [dpdk-dev] [RFC 00/14] mlx5: support SubFunction
  2021-06-10 10:33 ` [dpdk-dev] [RFC 00/14] mlx5: " Ferruh Yigit
@ 2021-06-10 13:23   ` Thomas Monjalon
  2021-06-11  5:14     ` Xia, Chenbo
  0 siblings, 1 reply; 42+ messages in thread
From: Thomas Monjalon @ 2021-06-10 13:23 UTC (permalink / raw)
  To: Ferruh Yigit; +Cc: Xueming Li, Viacheslav Ovsiienko, dev, Chenbo Xia

10/06/2021 12:33, Ferruh Yigit:
> On 5/27/2021 2:37 PM, Xueming Li wrote:
> > SubFunction [1] is a portion of the PCI device, a SF netdev has its own
> > dedicated queues(txq, rxq). A SF shares PCI level resources with other
> > SFs and/or with its parent PCI function. Auxiliary bus is the
> > fundamental of SF.
> > 
> > This patch set introduces SubFunction support for mlx5 PMD driver
> > including class net, regex, vdpa and compress.
> > 
> 
> There is already an mdev patch, originated from long ago. Aren't subfunctions
> presented as mdev device? If so can't we use mdev for it?

No unfortunately that's different.
mlx5 SF is based on top of auxiliary bus in the kernel/sysfs.



^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [dpdk-dev] [RFC 00/14] mlx5: support SubFunction
  2021-06-10 13:23   ` Thomas Monjalon
@ 2021-06-11  5:14     ` Xia, Chenbo
  2021-06-11  7:54       ` Thomas Monjalon
  0 siblings, 1 reply; 42+ messages in thread
From: Xia, Chenbo @ 2021-06-11  5:14 UTC (permalink / raw)
  To: Thomas Monjalon, Yigit, Ferruh; +Cc: Xueming Li, Viacheslav Ovsiienko, dev

Hi Thomas,

> -----Original Message-----
> From: Thomas Monjalon <thomas@monjalon.net>
> Sent: Thursday, June 10, 2021 9:23 PM
> To: Yigit, Ferruh <ferruh.yigit@intel.com>
> Cc: Xueming Li <xuemingl@nvidia.com>; Viacheslav Ovsiienko
> <viacheslavo@nvidia.com>; dev@dpdk.org; Xia, Chenbo <chenbo.xia@intel.com>
> Subject: Re: [dpdk-dev] [RFC 00/14] mlx5: support SubFunction
> 
> 10/06/2021 12:33, Ferruh Yigit:
> > On 5/27/2021 2:37 PM, Xueming Li wrote:
> > > SubFunction [1] is a portion of the PCI device, a SF netdev has its
> own
> > > dedicated queues(txq, rxq). A SF shares PCI level resources with other
> > > SFs and/or with its parent PCI function. Auxiliary bus is the
> > > fundamental of SF.
> > >
> > > This patch set introduces SubFunction support for mlx5 PMD driver
> > > including class net, regex, vdpa and compress.
> > >
> >
> > There is already an mdev patch, originated from long ago. Aren't
> subfunctions
> > presented as mdev device? If so can't we use mdev for it?
> 
> No unfortunately that's different.
> mlx5 SF is based on top of auxiliary bus in the kernel/sysfs.
> 

Just out of curiosity:

Does SF use mdev before aux bus is introduced in kernel. I see some history
of it but am not sure: [1] seems SF was base on mdev. [2] seems BlueField
software v2.5 is using mdev for SF. I saw it yesterday and try to figure
out the history. Since you are here, guess you know something 😊

[1] https://patchwork.ozlabs.org/project/netdev/cover/20191107160448.20962-1-parav@mellanox.com/
[2] https://docs.mellanox.com/display/BlueFieldSWv25011176/Mediated+Devices

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [dpdk-dev] [RFC 00/14] mlx5: support SubFunction
  2021-06-11  5:14     ` Xia, Chenbo
@ 2021-06-11  7:54       ` Thomas Monjalon
  2021-06-15  2:10         ` Xia, Chenbo
  0 siblings, 1 reply; 42+ messages in thread
From: Thomas Monjalon @ 2021-06-11  7:54 UTC (permalink / raw)
  To: Yigit, Ferruh, Xia, Chenbo
  Cc: Xueming Li, Viacheslav Ovsiienko, dev, parav, jgg

11/06/2021 07:14, Xia, Chenbo:
> From: Thomas Monjalon <thomas@monjalon.net>
> > 10/06/2021 12:33, Ferruh Yigit:
> > > On 5/27/2021 2:37 PM, Xueming Li wrote:
> > > > SubFunction [1] is a portion of the PCI device, a SF netdev has its
> > own
> > > > dedicated queues(txq, rxq). A SF shares PCI level resources with other
> > > > SFs and/or with its parent PCI function. Auxiliary bus is the
> > > > fundamental of SF.
> > > >
> > > > This patch set introduces SubFunction support for mlx5 PMD driver
> > > > including class net, regex, vdpa and compress.
> > > >
> > >
> > > There is already an mdev patch, originated from long ago. Aren't
> > subfunctions
> > > presented as mdev device? If so can't we use mdev for it?
> > 
> > No unfortunately that's different.
> > mlx5 SF is based on top of auxiliary bus in the kernel/sysfs.
> 
> Just out of curiosity:
> 
> Does SF use mdev before aux bus is introduced in kernel. I see some history
> of it but am not sure: [1] seems SF was base on mdev. [2] seems BlueField
> software v2.5 is using mdev for SF. I saw it yesterday and try to figure
> out the history. Since you are here, guess you know something 😊
> 
> [1] https://patchwork.ozlabs.org/project/netdev/cover/20191107160448.20962-1-parav@mellanox.com/
> [2] https://docs.mellanox.com/display/BlueFieldSWv25011176/Mediated+Devices

Kernel maintainers rejected the use of mdev for this purpose
and suggested to use a real bus.
You can follow the discussion here:
https://lore.kernel.org/netdev/20191108205204.GB1277001@kroah.com/

Does Intel plan to use mdev for SF?

For the sake of follow-up discussion, this is the official mdev doc:
https://www.kernel.org/doc/Documentation/vfio-mediated-device.txt




^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [dpdk-dev] [RFC 00/14] mlx5: support SubFunction
  2021-06-11  7:54       ` Thomas Monjalon
@ 2021-06-15  2:10         ` Xia, Chenbo
  2021-06-15  4:04           ` Parav Pandit
  0 siblings, 1 reply; 42+ messages in thread
From: Xia, Chenbo @ 2021-06-15  2:10 UTC (permalink / raw)
  To: Thomas Monjalon, Yigit, Ferruh
  Cc: Xueming Li, Viacheslav Ovsiienko, dev, parav, jgg

Hi Thomas,

> -----Original Message-----
> From: Thomas Monjalon <thomas@monjalon.net>
> Sent: Friday, June 11, 2021 3:54 PM
> To: Yigit, Ferruh <ferruh.yigit@intel.com>; Xia, Chenbo <chenbo.xia@intel.com>
> Cc: Xueming Li <xuemingl@nvidia.com>; Viacheslav Ovsiienko
> <viacheslavo@nvidia.com>; dev@dpdk.org; parav@nvidia.com; jgg@nvidia.com
> Subject: Re: [dpdk-dev] [RFC 00/14] mlx5: support SubFunction
> 
> 11/06/2021 07:14, Xia, Chenbo:
> > From: Thomas Monjalon <thomas@monjalon.net>
> > > 10/06/2021 12:33, Ferruh Yigit:
> > > > On 5/27/2021 2:37 PM, Xueming Li wrote:
> > > > > SubFunction [1] is a portion of the PCI device, a SF netdev has its
> > > own
> > > > > dedicated queues(txq, rxq). A SF shares PCI level resources with other
> > > > > SFs and/or with its parent PCI function. Auxiliary bus is the
> > > > > fundamental of SF.
> > > > >
> > > > > This patch set introduces SubFunction support for mlx5 PMD driver
> > > > > including class net, regex, vdpa and compress.
> > > > >
> > > >
> > > > There is already an mdev patch, originated from long ago. Aren't
> > > subfunctions
> > > > presented as mdev device? If so can't we use mdev for it?
> > >
> > > No unfortunately that's different.
> > > mlx5 SF is based on top of auxiliary bus in the kernel/sysfs.
> >
> > Just out of curiosity:
> >
> > Does SF use mdev before aux bus is introduced in kernel. I see some history
> > of it but am not sure: [1] seems SF was base on mdev. [2] seems BlueField
> > software v2.5 is using mdev for SF. I saw it yesterday and try to figure
> > out the history. Since you are here, guess you know something 😊
> >
> > [1] https://patchwork.ozlabs.org/project/netdev/cover/20191107160448.20962-
> 1-parav@mellanox.com/
> > [2] https://docs.mellanox.com/display/BlueFieldSWv25011176/Mediated+Devices
> 
> Kernel maintainers rejected the use of mdev for this purpose
> and suggested to use a real bus.
> You can follow the discussion here:
> https://lore.kernel.org/netdev/20191108205204.GB1277001@kroah.com/

OK. Thanks for the info.

> 
> Does Intel plan to use mdev for SF?

Yes. In our term it's called Assignable Device Interface (ADI) introduced in Intel
Scalable IOV (https://01.org/blogs/2019/assignable-interfaces-intel-scalable-i/o-virtualization-linux)

And vfio-mdev is chosen to be the software framework for it. I start to realize there
is difference between SF and ADI: SF considers multi-function devices which may include
net/regex/vdpa/... But ADI only focuses on the virtualization of the devices and splitting
devices to logic parts and providing huge number of interfaces to host APP. I think SF
also considers this but is mainly used for multi-function devices (like DPU in your term?
Correct me if I'm wrong).

And I also noticed that the mdev-based interface can only be used in userspace but aux-based
interface can also be used by other kernel sub-system (like for net, wrap it as netdev).

Thanks,
Chenbo

> 
> For the sake of follow-up discussion, this is the official mdev doc:
> https://www.kernel.org/doc/Documentation/vfio-mediated-device.txt
> 
> 


^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [dpdk-dev] [RFC 00/14] mlx5: support SubFunction
  2021-06-15  2:10         ` Xia, Chenbo
@ 2021-06-15  4:04           ` Parav Pandit
  2021-06-15  5:33             ` Xia, Chenbo
  0 siblings, 1 reply; 42+ messages in thread
From: Parav Pandit @ 2021-06-15  4:04 UTC (permalink / raw)
  To: Xia, Chenbo, NBU-Contact-Thomas Monjalon, Yigit, Ferruh
  Cc: Xueming(Steven) Li, Slava Ovsiienko, dev, Jason Gunthorpe

Hi Chenbo,

> From: Xia, Chenbo <chenbo.xia@intel.com>
> Sent: Tuesday, June 15, 2021 7:41 AM
> 
> Hi Thomas,
> 
> > From: Thomas Monjalon <thomas@monjalon.net>
> > Sent: Friday, June 11, 2021 3:54 PM
[..]

> 
> Yes. In our term it's called Assignable Device Interface (ADI) introduced in
> Intel Scalable IOV (https://01.org/blogs/2019/assignable-interfaces-intel-
> scalable-i/o-virtualization-linux)
> 
> And vfio-mdev is chosen to be the software framework for it. I start to realize
> there is difference between SF and ADI: SF considers multi-function devices
> which may include net/regex/vdpa/... 
Yes. net, rdma, vdpa, regex ++.
And eventually vfio_device to map to VM too.

Non mdev framework is chosen so that all the use cases of kernel only, or user only or mix modes can be supported.

> But ADI only focuses on the
> virtualization of the devices and splitting devices to logic parts and providing
> huge number of interfaces to host APP. I think SF also considers this but is
> mainly used for multi-function devices (like DPU in your term?
> Correct me if I'm wrong).
> 
SF also supports DPU mode too but it is in addition to above use cases.
SF will expose mdev (or a vfio_device) to map to a VM.

> And I also noticed that the mdev-based interface can only be used in
> userspace but aux-based interface can also be used by other kernel sub-
> system (like for net, wrap it as netdev).
Correct.

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [dpdk-dev] [RFC 00/14] mlx5: support SubFunction
  2021-06-15  4:04           ` Parav Pandit
@ 2021-06-15  5:33             ` Xia, Chenbo
  2021-06-15  5:43               ` Parav Pandit
  0 siblings, 1 reply; 42+ messages in thread
From: Xia, Chenbo @ 2021-06-15  5:33 UTC (permalink / raw)
  To: Parav Pandit, NBU-Contact-Thomas Monjalon, Yigit, Ferruh
  Cc: Xueming(Steven) Li, Slava Ovsiienko, dev, Jason Gunthorpe

Hi Parav,

> -----Original Message-----
> From: Parav Pandit <parav@nvidia.com>
> Sent: Tuesday, June 15, 2021 12:05 PM
> To: Xia, Chenbo <chenbo.xia@intel.com>; NBU-Contact-Thomas Monjalon
> <thomas@monjalon.net>; Yigit, Ferruh <ferruh.yigit@intel.com>
> Cc: Xueming(Steven) Li <xuemingl@nvidia.com>; Slava Ovsiienko
> <viacheslavo@nvidia.com>; dev@dpdk.org; Jason Gunthorpe <jgg@nvidia.com>
> Subject: RE: [dpdk-dev] [RFC 00/14] mlx5: support SubFunction
> 
> Hi Chenbo,
> 
> > From: Xia, Chenbo <chenbo.xia@intel.com>
> > Sent: Tuesday, June 15, 2021 7:41 AM
> >
> > Hi Thomas,
> >
> > > From: Thomas Monjalon <thomas@monjalon.net>
> > > Sent: Friday, June 11, 2021 3:54 PM
> [..]
> 
> >
> > Yes. In our term it's called Assignable Device Interface (ADI) introduced in
> > Intel Scalable IOV (https://01.org/blogs/2019/assignable-interfaces-intel-
> > scalable-i/o-virtualization-linux)
> >
> > And vfio-mdev is chosen to be the software framework for it. I start to
> realize
> > there is difference between SF and ADI: SF considers multi-function devices
> > which may include net/regex/vdpa/...
> Yes. net, rdma, vdpa, regex ++.
> And eventually vfio_device to map to VM too.
> 
> Non mdev framework is chosen so that all the use cases of kernel only, or user
> only or mix modes can be supported.

OK. Got it.

> 
> > But ADI only focuses on the
> > virtualization of the devices and splitting devices to logic parts and
> providing
> > huge number of interfaces to host APP. I think SF also considers this but is
> > mainly used for multi-function devices (like DPU in your term?
> > Correct me if I'm wrong).
> >
> SF also supports DPU mode too but it is in addition to above use cases.
> SF will expose mdev (or a vfio_device) to map to a VM.

So your SW actually supports vfio-mdev? I suppose the device-specific mdev
Kernel module is out-of-tree?

Just FYI:

We are introducing a new mdev bus for DPDK:
http://patchwork.dpdk.org/project/dpdk/cover/20210601030644.3318-1-chenbo.xia@intel.com/

Thanks,
Chenbo

> 
> > And I also noticed that the mdev-based interface can only be used in
> > userspace but aux-based interface can also be used by other kernel sub-
> > system (like for net, wrap it as netdev).
> Correct.

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [dpdk-dev] [RFC 00/14] mlx5: support SubFunction
  2021-06-15  5:33             ` Xia, Chenbo
@ 2021-06-15  5:43               ` Parav Pandit
  2021-06-15 11:19                 ` Xia, Chenbo
  0 siblings, 1 reply; 42+ messages in thread
From: Parav Pandit @ 2021-06-15  5:43 UTC (permalink / raw)
  To: Xia, Chenbo, NBU-Contact-Thomas Monjalon, Yigit, Ferruh
  Cc: Xueming(Steven) Li, Slava Ovsiienko, dev, Jason Gunthorpe



> From: Xia, Chenbo <chenbo.xia@intel.com>
> Sent: Tuesday, June 15, 2021 11:03 AM
> 
> Hi Parav,
> 
> > -----Original Message-----
> > From: Parav Pandit <parav@nvidia.com>
> > Sent: Tuesday, June 15, 2021 12:05 PM
> > To: Xia, Chenbo <chenbo.xia@intel.com>; NBU-Contact-Thomas Monjalon
> > <thomas@monjalon.net>; Yigit, Ferruh <ferruh.yigit@intel.com>
> > Cc: Xueming(Steven) Li <xuemingl@nvidia.com>; Slava Ovsiienko
> > <viacheslavo@nvidia.com>; dev@dpdk.org; Jason Gunthorpe
> > <jgg@nvidia.com>
> > Subject: RE: [dpdk-dev] [RFC 00/14] mlx5: support SubFunction
> >
> > Hi Chenbo,
> >
> > > From: Xia, Chenbo <chenbo.xia@intel.com>
> > > Sent: Tuesday, June 15, 2021 7:41 AM
> > >
> > > Hi Thomas,
> > >
> > > > From: Thomas Monjalon <thomas@monjalon.net>
> > > > Sent: Friday, June 11, 2021 3:54 PM
> > [..]
> >
> > >
> > > Yes. In our term it's called Assignable Device Interface (ADI)
> > > introduced in Intel Scalable IOV
> > > (https://01.org/blogs/2019/assignable-interfaces-intel-
> > > scalable-i/o-virtualization-linux)
> > >
> > > And vfio-mdev is chosen to be the software framework for it. I start
> > > to
> > realize
> > > there is difference between SF and ADI: SF considers multi-function
> > > devices which may include net/regex/vdpa/...
> > Yes. net, rdma, vdpa, regex ++.
> > And eventually vfio_device to map to VM too.
> >
> > Non mdev framework is chosen so that all the use cases of kernel only,
> > or user only or mix modes can be supported.
> 
> OK. Got it.
> 
> >
> > > But ADI only focuses on the
> > > virtualization of the devices and splitting devices to logic parts
> > > and
> > providing
> > > huge number of interfaces to host APP. I think SF also considers
> > > this but is mainly used for multi-function devices (like DPU in your term?
> > > Correct me if I'm wrong).
> > >
> > SF also supports DPU mode too but it is in addition to above use cases.
> > SF will expose mdev (or a vfio_device) to map to a VM.
> 
> So your SW actually supports vfio-mdev? I suppose the device-specific mdev
> Kernel module is out-of-tree?
> 
The mlx5 driver doesn't support vfio_device for SFs.
Kernel plumbing for PASID assignment to SFs is currently WIP in the kernel
community.
We do not have any out-of-tree kernel module.

> Just FYI:
> 
> We are introducing a new mdev bus for DPDK:
> http://patchwork.dpdk.org/project/dpdk/cover/20210601030644.3318-1-
> chenbo.xia@intel.com/
> 
I have yet to read about it, but I am not sure what value it adds.
A user can open a vfio device using the vfio subsystem and operate on it.
A vfio device can be created as a result of binding a PCI VF/PF to the
vfio-pci driver, or by binding an SF to a vfio_foo driver.
There is kernel work in progress to use the vfio core as a library.
So we do not anticipate adding an mdev layer and UUID to create a vfio
device for an SF.

For Intel, will an ADI never have any netdevs or RDMA devices?


* Re: [dpdk-dev] [RFC 00/14] mlx5: support SubFunction
  2021-06-15  5:43               ` Parav Pandit
@ 2021-06-15 11:19                 ` Xia, Chenbo
  2021-06-15 12:47                   ` Parav Pandit
  0 siblings, 1 reply; 42+ messages in thread
From: Xia, Chenbo @ 2021-06-15 11:19 UTC (permalink / raw)
  To: Parav Pandit, NBU-Contact-Thomas Monjalon, Yigit, Ferruh
  Cc: Xueming(Steven) Li, Slava Ovsiienko, dev, Jason Gunthorpe

Hi Parav,

> -----Original Message-----
> From: Parav Pandit <parav@nvidia.com>
> Sent: Tuesday, June 15, 2021 1:43 PM
> To: Xia, Chenbo <chenbo.xia@intel.com>; NBU-Contact-Thomas Monjalon
> <thomas@monjalon.net>; Yigit, Ferruh <ferruh.yigit@intel.com>
> Cc: Xueming(Steven) Li <xuemingl@nvidia.com>; Slava Ovsiienko
> <viacheslavo@nvidia.com>; dev@dpdk.org; Jason Gunthorpe <jgg@nvidia.com>
> Subject: RE: [dpdk-dev] [RFC 00/14] mlx5: support SubFunction
> 
> 
> 
> > From: Xia, Chenbo <chenbo.xia@intel.com>
> > Sent: Tuesday, June 15, 2021 11:03 AM
> >
> > Hi Parav,
> >
> > > -----Original Message-----
> > > From: Parav Pandit <parav@nvidia.com>
> > > Sent: Tuesday, June 15, 2021 12:05 PM
> > > To: Xia, Chenbo <chenbo.xia@intel.com>; NBU-Contact-Thomas Monjalon
> > > <thomas@monjalon.net>; Yigit, Ferruh <ferruh.yigit@intel.com>
> > > Cc: Xueming(Steven) Li <xuemingl@nvidia.com>; Slava Ovsiienko
> > > <viacheslavo@nvidia.com>; dev@dpdk.org; Jason Gunthorpe
> > > <jgg@nvidia.com>
> > > Subject: RE: [dpdk-dev] [RFC 00/14] mlx5: support SubFunction
> > >
> > > Hi Chenbo,
> > >
> > > > From: Xia, Chenbo <chenbo.xia@intel.com>
> > > > Sent: Tuesday, June 15, 2021 7:41 AM
> > > >
> > > > Hi Thomas,
> > > >
> > > > > From: Thomas Monjalon <thomas@monjalon.net>
> > > > > Sent: Friday, June 11, 2021 3:54 PM
> > > [..]
> > >
> > > >
> > > > Yes. In our term it's called Assignable Device Interface (ADI)
> > > > introduced in Intel Scalable IOV
> > > > (https://01.org/blogs/2019/assignable-interfaces-intel-
> > > > scalable-i/o-virtualization-linux)
> > > >
> > > > And vfio-mdev is chosen to be the software framework for it. I start
> > > > to
> > > realize
> > > > there is difference between SF and ADI: SF considers multi-function
> > > > devices which may include net/regex/vdpa/...
> > > Yes. net, rdma, vdpa, regex ++.
> > > And eventually vfio_device to map to VM too.
> > >
> > > Non mdev framework is chosen so that all the use cases of kernel only,
> > > or user only or mix modes can be supported.
> >
> > OK. Got it.
> >
> > >
> > > > But ADI only focuses on the
> > > > virtualization of the devices and splitting devices to logic parts
> > > > and
> > > providing
> > > > huge number of interfaces to host APP. I think SF also considers
> > > > this but is mainly used for multi-function devices (like DPU in your
> term?
> > > > Correct me if I'm wrong).
> > > >
> > > SF also supports DPU mode too but it is in addition to above use cases.
> > > SF will expose mdev (or a vfio_device) to map to a VM.
> >
> > So your SW actually supports vfio-mdev? I suppose the device-specific mdev
> > Kernel module is out-of-tree?
> >
> mlx5 driver doesn't support vfio_device for SFs.
> Kernel plumbing for PASID assignment to SF is WIP currently kernel community.
> We do not have any out-of-tree kernel module.
> 
> > Just FYI:
> >
> > We are introducing a new mdev bus for DPDK:
> > http://patchwork.dpdk.org/project/dpdk/cover/20210601030644.3318-1-
> > chenbo.xia@intel.com/
> >
> I am yet to read about it. But I am not sure what value does it add.
> A user can open a vfio device using vfio subsystem and operate on it.
> A vfio device can be a create as a result of binding PCI VF/PF to vfio-pci
> driver or a SF by binding SF to vfio_foo driver.

Yes, in general that is the way. For vfio-mdev, it works by binding vfio-mdev
to the parent device and echoing a UUID to create a virtual device. A VFIO app
like DPDK, as you said, should work the same way through the VFIO UAPI for
vfio-pci devices and mdev-based devices. But currently DPDK only handles
vfio-pci devices and does not handle other cases such as mdev-based PCI
devices. For example, it does not scan /sys/bus/mdev, and it always uses the
PCI BDF as the device address, which mdev-based PCI devices do not have.
That is why I sent that patchset.
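
As a rough illustration of the addressing difference described above, the
following C sketch tells a full PCI BDF apart from an mdev UUID and builds the
sysfs path under /sys/bus/mdev/devices for the latter. The helper names
(addr_is_uuid, addr_is_pci_bdf, mdev_sysfs_path) are hypothetical and are not
DPDK APIs; the sketch only assumes the usual 8-4-4-4-12 UUID layout and the
DDDD:BB:DD.F form of a PCI address.

```c
#include <assert.h>
#include <ctype.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: true if s has the canonical 8-4-4-4-12 UUID layout. */
static bool
addr_is_uuid(const char *s)
{
	size_t i;

	assert(s != NULL);
	if (strlen(s) != 36)
		return false;
	for (i = 0; i < 36; i++) {
		if (i == 8 || i == 13 || i == 18 || i == 23) {
			if (s[i] != '-')
				return false;
		} else if (!isxdigit((unsigned char)s[i])) {
			return false;
		}
	}
	return true;
}

/* Hypothetical helper: true if s is a full PCI address (DDDD:BB:DD.F). */
static bool
addr_is_pci_bdf(const char *s)
{
	unsigned int dom, bus, dev, fn;
	int end = 0;

	assert(s != NULL);
	if (sscanf(s, "%4x:%2x:%2x.%1x%n", &dom, &bus, &dev, &fn, &end) != 4)
		return false;
	/* Reject trailing characters after the function number. */
	return s[end] == '\0';
}

/* Hypothetical helper: build the sysfs path of an mdev device from its UUID. */
static int
mdev_sysfs_path(const char *uuid, char *buf, size_t len)
{
	int n;

	if (!addr_is_uuid(uuid))
		return -1;
	n = snprintf(buf, len, "/sys/bus/mdev/devices/%s", uuid);
	return (n < 0 || (size_t)n >= len) ? -1 : 0;
}
```

A bus scan would presumably walk /sys/bus/mdev/devices and use each UUID entry
as the device address, where the PCI bus uses the BDF today.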

> There is kernel work in progress to use vfio core as library.

OK. Could you share a link to it? Much appreciated.

> So we do not anticipate to use add mdev layer and uuid to create a vfio device
> for a SF.

OK. For now, we are following the vfio-mdev standard, using UUID to create vfio
devices.

> 
> For Intel, ADI will never has any netdevs or rdma dev?

I think technically it could. But some devices, like our DMA devices, just
use mdev:

https://www.spinics.net/lists/kvm/msg244417.html

Thanks,
Chenbo



* Re: [dpdk-dev] [RFC 00/14] mlx5: support SubFunction
  2021-06-15 11:19                 ` Xia, Chenbo
@ 2021-06-15 12:47                   ` Parav Pandit
  2021-06-15 15:19                     ` Jason Gunthorpe
  0 siblings, 1 reply; 42+ messages in thread
From: Parav Pandit @ 2021-06-15 12:47 UTC (permalink / raw)
  To: Xia, Chenbo, NBU-Contact-Thomas Monjalon, Yigit, Ferruh
  Cc: Xueming(Steven) Li, Slava Ovsiienko, dev, Jason Gunthorpe



> From: Xia, Chenbo <chenbo.xia@intel.com>
> Sent: Tuesday, June 15, 2021 4:49 PM
> 
> >
> > > Just FYI:
> > >
> > > We are introducing a new mdev bus for DPDK:
> > > http://patchwork.dpdk.org/project/dpdk/cover/20210601030644.3318-1-
> > > chenbo.xia@intel.com/
> > >
> > I am yet to read about it. But I am not sure what value does it add.
> > A user can open a vfio device using vfio subsystem and operate on it.
> > A vfio device can be a create as a result of binding PCI VF/PF to
> > vfio-pci driver or a SF by binding SF to vfio_foo driver.
> 
> Yes, in general it is the way. For vfio-mdev, it works as binding the vfio-mdev
> to parent device and echo uuid to create a virtual device. VFIO APP like
> DPDK, as you said, should work similar with VFIO UAPI for vfio-pci devices or
> mdev-based devices. But currently DPDK only cares about vfio-pci devices
> and does not care things for other cases like mdev-based pci devices. For
> example, it does not scan /sys/bus/mdev and it always uses pci bdf as device
> address, which mdev-based pci devices do not have. Therefore I sent that
> patchset.
An mdev device resides on the mdev bus, so DPDK should identify the mdev
object by specifying bus type = mdev and device id = UUID.
There should not be any attachment to PCI, as Thomas said.

> 
> > There is kernel work in progress to use vfio core as library.
> 
> OK. Could you share me some link to it? Much appreciated.
> 
[1] https://lore.kernel.org/kvm/20210603160809.15845-1-mgurtovoy@nvidia.com/

> > So we do not anticipate to use add mdev layer and uuid to create a
> > vfio device for a SF.
> 
> OK. For now, we are following the vfio-mdev standard, using UUID to create
> vfio devices.
> 
If this layer is going to work on top of VFIO devices, does it really care
whether the device is an mdev?
Can it identify the vfio device through the vfio device node and its UAPI in a
uniform way?
Such as open("/dev/vfio/98" ..);


> >
> > For Intel, ADI will never has any netdevs or rdma dev?
> 
> I think technically it could have. 
Unlikely. As I explained in a previous email, creating netdev and rdma devices
on the mdev bus was already rejected in my previous patches,
so we moved forward with the auxiliary bus.

> But for some devices like our dma devices,
> it's just using mdev:
> 
> https://www.spinics.net/lists/kvm/msg244417.html
Possibly yes. Some devices might live on the mdev bus.
You should wait for the kernel patches to be merged, as Jason said.

I still think that identifying a vfio device by its /dev/vfio/<id> will go a
long way, regardless of its bus type.
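
A minimal C sketch of that idea, assuming the only thing known about the
device is the numeric id of its node under /dev/vfio (98 in the example
above). The helper names vfio_node_path and vfio_node_open are hypothetical;
only snprintf() and open() are real calls, and the VFIO ioctls that would
follow the open are left out.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: format the /dev/vfio/<id> node path into buf. */
static int
vfio_node_path(unsigned int id, char *buf, size_t len)
{
	int n;

	assert(buf != NULL && len > 0);
	n = snprintf(buf, len, "/dev/vfio/%u", id);
	return (n < 0 || (size_t)n >= len) ? -ENAMETOOLONG : 0;
}

/*
 * Hypothetical helper: open the node read-write, whatever bus the
 * underlying device lives on. Returns a file descriptor on success or a
 * negative errno value on failure.
 */
static int
vfio_node_open(unsigned int id)
{
	char path[32];
	int fd;
	int ret;

	ret = vfio_node_path(id, path, sizeof(path));
	if (ret != 0)
		return ret;
	fd = open(path, O_RDWR);
	return fd < 0 ? -errno : fd;
}
```

The caller never needs a PCI BDF or an mdev UUID here; the id alone selects
the node, which is the bus-independent property argued for above.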


* Re: [dpdk-dev] [RFC 00/14] mlx5: support SubFunction
  2021-06-15 12:47                   ` Parav Pandit
@ 2021-06-15 15:19                     ` Jason Gunthorpe
  0 siblings, 0 replies; 42+ messages in thread
From: Jason Gunthorpe @ 2021-06-15 15:19 UTC (permalink / raw)
  To: Parav Pandit
  Cc: Xia, Chenbo, NBU-Contact-Thomas Monjalon, Yigit, Ferruh,
	Xueming(Steven) Li, Slava Ovsiienko, dev

On Tue, Jun 15, 2021 at 12:47:13PM +0000, Parav Pandit wrote:

> > But for some devices like our dma devices,
> > it's just using mdev:
> > 
> > https://www.spinics.net/lists/kvm/msg244417.html
> Possibly yes. Some devices might live on mdev bus.
> You should wait for kernel patches to be merged as Jason said.

Also I would not expect dpdk to consume IDXD via vfio, but instead via
the normal multi-process host IOCTL interface it has.

Jason


* [dpdk-dev] [PATCH v1 00/14] net/mlx5: support Sub-Function
  2021-05-27 13:37 ` [dpdk-dev] [RFC 01/14] common/mlx5: add common device driver Xueming Li
  2021-06-10  9:51   ` Thomas Monjalon
@ 2021-06-16  4:09   ` Xueming Li
  2021-06-16  4:09   ` [dpdk-dev] [PATCH v1 01/14] common/mlx5: add common device driver Xueming Li
                     ` (13 subsequent siblings)
  15 siblings, 0 replies; 42+ messages in thread
From: Xueming Li @ 2021-06-16  4:09 UTC (permalink / raw)
  To: Viacheslav Ovsiienko; +Cc: dev, xuemingl

Sub-Function (SF) [1] is a portion of a PCI device. An SF netdev has its own
dedicated queues (txq, rxq). An SF shares PCI-level resources with other
SFs and/or with its parent PCI function. The auxiliary bus [2] is the
foundation of SF.

This patch set introduces Sub-Function support for the mlx5 PMDs,
covering the net, regex, vdpa and compress classes.

Depends-on: series-16904 ("bus/auxiliary: introduce auxiliary bus")
Depends-on: series-17304 ("eal: save error in string copy")

Version history:
  RFC:
	initial version
  V1:
	rebased on latest upstream code

[1] SubFunction in kernel:
https://lore.kernel.org/netdev/20201112192424.2742-1-parav@nvidia.com/

[2] Auxiliary bus:
http://patchwork.dpdk.org/project/dpdk/patch/20210510134732.2174-1-xuemingl@nvidia.com/


Thomas Monjalon (5):
  common/mlx5: move description of PCI sysfs functions
  common/mlx5: get PCI device address from any bus
  vdpa/mlx5: define driver name as macro
  vdpa/mlx5: remove PCI specifics
  vdpa/mlx5: support SubFunction

Xueming Li (9):
  common/mlx5: add common device driver
  common/mlx5: support auxiliary bus
  net/mlx5: remove PCI dependency
  net/mlx5: migrate to bus-agnostic common driver
  net/mlx5: support SubFunction
  net/mlx5: check max Verbs port number
  regex/mlx5: migrate to common driver
  compress/mlx5: migrate to common driver
  common/mlx5: clean up legacy PCI bus driver

 doc/guides/nics/mlx5.rst                      |  54 +-
 doc/guides/vdpadevs/mlx5.rst                  |  10 +
 drivers/common/mlx5/linux/meson.build         |   3 +
 .../common/mlx5/linux/mlx5_common_auxiliary.c | 191 +++++++
 drivers/common/mlx5/linux/mlx5_common_os.c    |  29 +-
 drivers/common/mlx5/linux/mlx5_common_os.h    |   6 +-
 drivers/common/mlx5/linux/mlx5_common_verbs.c |  24 +-
 drivers/common/mlx5/meson.build               |   2 +-
 drivers/common/mlx5/mlx5_common.c             | 391 ++++++++++++-
 drivers/common/mlx5/mlx5_common.h             | 178 +++++-
 drivers/common/mlx5/mlx5_common_pci.c         | 526 ++++--------------
 drivers/common/mlx5/mlx5_common_pci.h         |  77 ---
 drivers/common/mlx5/mlx5_common_private.h     |  50 ++
 drivers/common/mlx5/mlx5_common_utils.h       |   2 +
 drivers/common/mlx5/version.map               |  12 +-
 drivers/compress/mlx5/mlx5_compress.c         |  71 +--
 drivers/net/mlx5/linux/mlx5_ethdev_os.c       |  14 +-
 drivers/net/mlx5/linux/mlx5_os.c              | 200 +++++--
 drivers/net/mlx5/linux/mlx5_os.h              |   5 +-
 drivers/net/mlx5/mlx5.c                       |  87 +--
 drivers/net/mlx5/mlx5.h                       |  13 +-
 drivers/net/mlx5/mlx5_ethdev.c                |   2 +-
 drivers/net/mlx5/mlx5_mac.c                   |   2 +-
 drivers/net/mlx5/mlx5_mr.c                    |  48 +-
 drivers/net/mlx5/mlx5_rxmode.c                |   8 +-
 drivers/net/mlx5/mlx5_rxtx.h                  |   9 +-
 drivers/net/mlx5/mlx5_trigger.c               |  14 +-
 drivers/net/mlx5/mlx5_txq.c                   |   3 +-
 drivers/net/mlx5/windows/mlx5_os.c            |  14 +-
 drivers/regex/mlx5/mlx5_regex.c               |  49 +-
 drivers/regex/mlx5/mlx5_regex.h               |   1 -
 drivers/vdpa/mlx5/mlx5_vdpa.c                 | 128 ++---
 drivers/vdpa/mlx5/mlx5_vdpa.h                 |   1 -
 33 files changed, 1337 insertions(+), 887 deletions(-)
 create mode 100644 drivers/common/mlx5/linux/mlx5_common_auxiliary.c
 delete mode 100644 drivers/common/mlx5/mlx5_common_pci.h
 create mode 100644 drivers/common/mlx5/mlx5_common_private.h

-- 
2.25.1



* [dpdk-dev] [PATCH v1 01/14] common/mlx5: add common device driver
  2021-05-27 13:37 ` [dpdk-dev] [RFC 01/14] common/mlx5: add common device driver Xueming Li
  2021-06-10  9:51   ` Thomas Monjalon
  2021-06-16  4:09   ` [dpdk-dev] [PATCH v1 00/14] net/mlx5: support Sub-Function Xueming Li
@ 2021-06-16  4:09   ` Xueming Li
  2021-06-16  4:09   ` [dpdk-dev] [PATCH v1 02/14] common/mlx5: move description of PCI sysfs functions Xueming Li
                     ` (12 subsequent siblings)
  15 siblings, 0 replies; 42+ messages in thread
From: Xueming Li @ 2021-06-16  4:09 UTC (permalink / raw)
  To: Viacheslav Ovsiienko
  Cc: dev, xuemingl, Matan Azrad, Shahaf Shuler, Ray Kinsella, Neil Horman

To support the auxiliary bus, introduce a common device driver and its
callbacks, which are intended to replace the current mlx5 common PCI bus
driver.

The mlx5 common PCI bus driver is still used by the mlx5 eth, vDPA and regex
PMDs; it will be removed once all PMDs have moved to the new common driver.
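
The common driver selects class drivers from a colon-separated class list
(for example class=net:regex) and rejects unsupported combinations such as
eth together with vdpa. The standalone C sketch below shows that parsing and
validation step in isolation. All demo_* names are hypothetical stand-ins for
the mlx5 symbols in the diff, and the sketch does not use rte_kvargs or any
other DPDK call.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-ins for the MLX5_CLASS_* bit flags. */
enum demo_class {
	DEMO_CLASS_ETH = 1 << 0,
	DEMO_CLASS_VDPA = 1 << 1,
	DEMO_CLASS_REGEX = 1 << 2,
	DEMO_CLASS_COMPRESS = 1 << 3,
};

static const struct {
	const char *name;
	int value;
} demo_classes[] = {
	{ "vdpa", DEMO_CLASS_VDPA },
	{ "eth", DEMO_CLASS_ETH },
	{ "net", DEMO_CLASS_ETH }, /* Alias kept for compatibility. */
	{ "regex", DEMO_CLASS_REGEX },
	{ "compress", DEMO_CLASS_COMPRESS },
};

#define DEMO_NB_CLASSES (sizeof(demo_classes) / sizeof(demo_classes[0]))

/* Parse "a:b:c" into a bitmask; return a negative errno on failure. */
static int
demo_parse_classes(const char *list)
{
	char *copy, *tok, *save = NULL;
	size_t i;
	int mask = 0;

	assert(list != NULL);
	copy = strdup(list); /* strtok_r() modifies its input. */
	if (copy == NULL)
		return -ENOMEM;
	for (tok = strtok_r(copy, ":", &save); tok != NULL;
	     tok = strtok_r(NULL, ":", &save)) {
		for (i = 0; i < DEMO_NB_CLASSES; i++)
			if (strcmp(tok, demo_classes[i].name) == 0)
				break;
		if (i == DEMO_NB_CLASSES) {
			mask = -EINVAL; /* Unknown class name. */
			break;
		}
		mask |= demo_classes[i].value;
	}
	free(copy);
	return mask;
}

/* A combination is invalid if it contains every bit of a banned set. */
static bool
demo_combination_valid(int mask)
{
	const int banned = DEMO_CLASS_ETH | DEMO_CLASS_VDPA;

	return (mask & banned) != banned;
}
```

An empty list parses to 0, which the probe path in the patch treats as a
request for the default eth class.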

Signed-off-by: Xueming Li <xuemingl@nvidia.com>
---
 drivers/common/mlx5/linux/mlx5_common_os.c    |   2 +-
 drivers/common/mlx5/linux/mlx5_common_os.h    |   7 +-
 drivers/common/mlx5/linux/mlx5_common_verbs.c |  21 +-
 drivers/common/mlx5/mlx5_common.c             | 364 +++++++++++++++++-
 drivers/common/mlx5/mlx5_common.h             | 129 ++++++-
 drivers/common/mlx5/mlx5_common_pci.c         | 133 ++++++-
 drivers/common/mlx5/mlx5_common_private.h     |  41 ++
 drivers/common/mlx5/mlx5_common_utils.h       |   2 +
 drivers/common/mlx5/version.map               |   4 +
 drivers/net/mlx5/mlx5.c                       |   2 +-
 10 files changed, 685 insertions(+), 20 deletions(-)
 create mode 100644 drivers/common/mlx5/mlx5_common_private.h

diff --git a/drivers/common/mlx5/linux/mlx5_common_os.c b/drivers/common/mlx5/linux/mlx5_common_os.c
index ea0b71e425..78a9723075 100644
--- a/drivers/common/mlx5/linux/mlx5_common_os.c
+++ b/drivers/common/mlx5/linux/mlx5_common_os.c
@@ -425,7 +425,7 @@ mlx5_glue_constructor(void)
 }
 
 struct ibv_device *
-mlx5_os_get_ibv_device(struct rte_pci_addr *addr)
+mlx5_os_get_ibv_device(const struct rte_pci_addr *addr)
 {
 	int n;
 	struct ibv_device **ibv_list = mlx5_glue->get_device_list(&n);
diff --git a/drivers/common/mlx5/linux/mlx5_common_os.h b/drivers/common/mlx5/linux/mlx5_common_os.h
index 72d6bf828b..86d0cb09b0 100644
--- a/drivers/common/mlx5/linux/mlx5_common_os.h
+++ b/drivers/common/mlx5/linux/mlx5_common_os.h
@@ -291,6 +291,11 @@ mlx5_os_free(void *addr)
 
 __rte_internal
 struct ibv_device *
-mlx5_os_get_ibv_device(struct rte_pci_addr *addr);
+mlx5_os_get_ibv_device(const struct rte_pci_addr *addr);
+
+__rte_internal
+struct ibv_device *
+mlx5_os_get_ibv_dev(const struct rte_device *dev);
+
 
 #endif /* RTE_PMD_MLX5_COMMON_OS_H_ */
diff --git a/drivers/common/mlx5/linux/mlx5_common_verbs.c b/drivers/common/mlx5/linux/mlx5_common_verbs.c
index aa560f05f2..6a6ab7a7a2 100644
--- a/drivers/common/mlx5/linux/mlx5_common_verbs.c
+++ b/drivers/common/mlx5/linux/mlx5_common_verbs.c
@@ -10,11 +10,31 @@
 #include <sys/mman.h>
 #include <inttypes.h>
 
+#include <rte_errno.h>
+#include <rte_bus_pci.h>
+
+#include "mlx5_common_utils.h"
+#include "mlx5_common_log.h"
+#include "mlx5_common_private.h"
 #include "mlx5_autoconf.h"
 #include <mlx5_glue.h>
 #include <mlx5_common.h>
 #include <mlx5_common_mr.h>
 
+struct ibv_device *
+mlx5_os_get_ibv_dev(const struct rte_device *dev)
+{
+	struct ibv_device *ibv = NULL;
+
+	if (mlx5_dev_is_pci(dev))
+		ibv = mlx5_os_get_ibv_device(&RTE_DEV_TO_PCI_CONST(dev)->addr);
+	if (ibv == NULL) {
+		rte_errno = ENODEV;
+		DRV_LOG(ERR, "Verbs device not found: %s", dev->name);
+	}
+	return ibv;
+}
+
 /**
  * Register mr. Given protection domain pointer, pointer to addr and length
  * register the memory region.
@@ -68,4 +88,3 @@ mlx5_common_verbs_dereg_mr(struct mlx5_pmd_mr *pmd_mr)
 		memset(pmd_mr, 0, sizeof(*pmd_mr));
 	}
 }
-
diff --git a/drivers/common/mlx5/mlx5_common.c b/drivers/common/mlx5/mlx5_common.c
index 25e9f09108..dc32a7bd90 100644
--- a/drivers/common/mlx5/mlx5_common.c
+++ b/drivers/common/mlx5/mlx5_common.c
@@ -8,11 +8,14 @@
 
 #include <rte_errno.h>
 #include <rte_mempool.h>
+#include <rte_class.h>
+#include <rte_malloc.h>
 
 #include "mlx5_common.h"
 #include "mlx5_common_os.h"
 #include "mlx5_common_log.h"
 #include "mlx5_common_pci.h"
+#include "mlx5_common_private.h"
 
 uint8_t haswell_broadwell_cpu;
 
@@ -41,6 +44,363 @@ static inline void mlx5_cpu_id(unsigned int level,
 
 RTE_LOG_REGISTER_DEFAULT(mlx5_common_logtype, NOTICE)
 
+/* Head of list of drivers. */
+static TAILQ_HEAD(mlx5_drivers, mlx5_class_driver) drivers_list =
+				TAILQ_HEAD_INITIALIZER(drivers_list);
+
+/* Head of devices. */
+static TAILQ_HEAD(mlx5_devices, mlx5_common_device) devices_list =
+				TAILQ_HEAD_INITIALIZER(devices_list);
+
+static const struct {
+	const char *name;
+	unsigned int drv_class;
+} mlx5_classes[] = {
+	{ .name = "vdpa", .drv_class = MLX5_CLASS_VDPA },
+	{ .name = "eth", .drv_class = MLX5_CLASS_ETH },
+	/* Keep class "net" for backward compatibility. */
+	{ .name = "net", .drv_class = MLX5_CLASS_ETH },
+	{ .name = "regex", .drv_class = MLX5_CLASS_REGEX },
+	{ .name = "compress", .drv_class = MLX5_CLASS_COMPRESS },
+};
+
+static int
+class_name_to_value(const char *class_name)
+{
+	unsigned int i;
+
+	for (i = 0; i < RTE_DIM(mlx5_classes); i++) {
+		if (strcmp(class_name, mlx5_classes[i].name) == 0)
+			return mlx5_classes[i].drv_class;
+	}
+	return -EINVAL;
+}
+
+static struct mlx5_class_driver *
+driver_get(uint32_t class)
+{
+	struct mlx5_class_driver *driver;
+
+	TAILQ_FOREACH(driver, &drivers_list, next) {
+		if ((uint32_t)driver->drv_class == class)
+			return driver;
+	}
+	return NULL;
+}
+
+static int
+devargs_class_handler(__rte_unused const char *key,
+		      const char *class_names, void *opaque)
+{
+	int *ret = opaque;
+	int class_val;
+	char *scratch;
+	char *scratch_ref;
+	char *found;
+	char *refstr = NULL;
+
+	*ret = 0;
+	scratch = strdup(class_names);
+	if (!scratch) {
+		*ret = -ENOMEM;
+		return *ret;
+	}
+	scratch_ref = scratch;
+	found = strtok_r(scratch, ":", &refstr);
+	if (!found)
+		/* Empty string. */
+		goto err;
+	do {
+		/* Extract each individual class name. Multiple
+		 * classes can be supplied as class=net:regex:foo:bar.
+		 */
+		class_val = class_name_to_value(found);
+		/* Check if it's a valid class. */
+		if (class_val < 0) {
+			*ret = -EINVAL;
+			goto err;
+		}
+		*ret |= class_val;
+		found = strtok_r(NULL, ":", &refstr);
+	} while (found);
+err:
+	free(scratch_ref);
+	if (*ret < 0)
+		DRV_LOG(ERR, "Invalid mlx5 class options: %s.", class_names);
+	return *ret;
+}
+
+static int
+parse_class_options(const struct rte_devargs *devargs)
+{
+	struct rte_kvargs *kvlist;
+	int ret = 0;
+
+	if (devargs == NULL)
+		return 0;
+	if (devargs->cls != NULL && devargs->cls->name != NULL)
+		/* Global syntax, only one class type. */
+		return class_name_to_value(devargs->cls->name);
+	/* Legacy devargs support multiple classes. */
+	kvlist = rte_kvargs_parse(devargs->args, NULL);
+	if (kvlist == NULL)
+		return 0;
+	rte_kvargs_process(kvlist, RTE_DEVARGS_KEY_CLASS,
+			   devargs_class_handler, &ret);
+	rte_kvargs_free(kvlist);
+	return ret;
+}
+
+static const unsigned int mlx5_class_invalid_combinations[] = {
+	MLX5_CLASS_ETH | MLX5_CLASS_VDPA,
+	/* New class combination should be added here. */
+};
+
+static int
+is_valid_class_combination(uint32_t user_classes)
+{
+	unsigned int i;
+
+	/* Verify if user specified unsupported combination. */
+	for (i = 0; i < RTE_DIM(mlx5_class_invalid_combinations); i++) {
+		if ((mlx5_class_invalid_combinations[i] & user_classes) ==
+		    mlx5_class_invalid_combinations[i])
+			return -EINVAL;
+	}
+	/* Not found any invalid class combination. */
+	return 0;
+}
+
+static bool
+device_class_enabled(const struct mlx5_common_device *device, uint32_t class)
+{
+	return (device->classes_loaded & class) > 0;
+}
+
+static bool
+mlx5_bus_match(const struct mlx5_class_driver *drv,
+	       const struct rte_device *dev)
+{
+	if (mlx5_dev_is_pci(dev))
+		return mlx5_dev_pci_match(drv, dev);
+	return true;
+}
+
+static struct mlx5_common_device *
+to_mlx5_device(const struct rte_device *rte_dev)
+{
+	struct mlx5_common_device *dev;
+
+	TAILQ_FOREACH(dev, &devices_list, next) {
+		if (rte_dev == dev->dev)
+			return dev;
+	}
+	return NULL;
+}
+
+static void
+dev_release(struct mlx5_common_device *dev)
+{
+	TAILQ_REMOVE(&devices_list, dev, next);
+	rte_free(dev);
+}
+
+static int
+drivers_remove(struct mlx5_common_device *dev, uint32_t enabled_classes)
+{
+	struct mlx5_class_driver *driver;
+	int local_ret = -ENODEV;
+	unsigned int i = 0;
+	int ret = 0;
+
+	enabled_classes &= dev->classes_loaded;
+	while (enabled_classes) {
+		driver = driver_get(RTE_BIT64(i));
+		if (driver != NULL) {
+			local_ret = driver->remove(dev->dev);
+			if (local_ret == 0)
+				dev->classes_loaded &= ~RTE_BIT64(i);
+			else if (ret == 0)
+				ret = local_ret;
+		}
+		enabled_classes &= ~RTE_BIT64(i);
+		i++;
+	}
+	if (local_ret != 0 && ret == 0)
+		ret = local_ret;
+	return ret;
+}
+
+static int
+drivers_probe(struct mlx5_common_device *dev, uint32_t user_classes)
+{
+	struct mlx5_class_driver *driver;
+	uint32_t enabled_classes = 0;
+	bool already_loaded;
+	int ret;
+
+	TAILQ_FOREACH(driver, &drivers_list, next) {
+		if ((driver->drv_class & user_classes) == 0)
+			continue;
+		if (!mlx5_bus_match(driver, dev->dev))
+			continue;
+		already_loaded = dev->classes_loaded & driver->drv_class;
+		if (already_loaded && driver->probe_again == 0) {
+			DRV_LOG(ERR, "Device %s is already probed",
+				dev->dev->name);
+			ret = -EEXIST;
+			goto probe_err;
+		}
+		ret = driver->probe(dev->dev);
+		if (ret < 0) {
+			DRV_LOG(ERR, "Failed to load driver %s",
+				driver->name);
+			goto probe_err;
+		}
+		enabled_classes |= driver->drv_class;
+	}
+	dev->classes_loaded |= enabled_classes;
+	return 0;
+probe_err:
+	/* Only unload drivers which were enabled
+	 * in this probe instance.
+	 */
+	drivers_remove(dev, enabled_classes);
+	return ret;
+}
+
+int
+mlx5_common_dev_probe(struct rte_device *eal_dev)
+{
+	struct mlx5_common_device *dev;
+	uint32_t classes = 0;
+	bool new_device = false;
+	int ret;
+
+	DRV_LOG(INFO, "probe device \"%s\".", eal_dev->name);
+	ret = parse_class_options(eal_dev->devargs);
+	if (ret < 0) {
+		DRV_LOG(ERR, "Unsupported mlx5 class type: %s",
+			eal_dev->devargs->args);
+		return ret;
+	}
+	classes = ret;
+	if (classes == 0)
+		/* Default to net class. */
+		classes = MLX5_CLASS_ETH;
+	dev = to_mlx5_device(eal_dev);
+	if (!dev) {
+		dev = rte_zmalloc("mlx5_common_device", sizeof(*dev), 0);
+		if (!dev)
+			return -ENOMEM;
+		dev->dev = eal_dev;
+		TAILQ_INSERT_HEAD(&devices_list, dev, next);
+		new_device = true;
+	} else {
+		/* Validate combination here. */
+		ret = is_valid_class_combination(classes |
+						 dev->classes_loaded);
+		if (ret != 0) {
+			DRV_LOG(ERR, "Unsupported mlx5 classes combination.");
+			return ret;
+		}
+	}
+	ret = drivers_probe(dev, classes);
+	if (ret)
+		goto class_err;
+	return 0;
+class_err:
+	if (new_device)
+		dev_release(dev);
+	return ret;
+}
+
+int
+mlx5_common_dev_remove(struct rte_device *eal_dev)
+{
+	struct mlx5_common_device *dev;
+	int ret;
+
+	dev = to_mlx5_device(eal_dev);
+	if (!dev)
+		return -ENODEV;
+	/* Matching device found, cleanup and unload drivers. */
+	ret = drivers_remove(dev, dev->classes_loaded);
+	if (ret == 0)
+		dev_release(dev);
+	return ret;
+}
+
+int
+mlx5_common_dev_dma_map(struct rte_device *dev, void *addr, uint64_t iova,
+			size_t len)
+{
+	struct mlx5_class_driver *driver = NULL;
+	struct mlx5_class_driver *temp;
+	struct mlx5_common_device *mdev;
+	int ret = -EINVAL;
+
+	mdev = to_mlx5_device(dev);
+	if (!mdev)
+		return -ENODEV;
+	TAILQ_FOREACH(driver, &drivers_list, next) {
+		if (!device_class_enabled(mdev, driver->drv_class) ||
+		    driver->dma_map == NULL)
+			continue;
+		ret = driver->dma_map(dev, addr, iova, len);
+		if (ret)
+			goto map_err;
+	}
+	return ret;
+map_err:
+	TAILQ_FOREACH(temp, &drivers_list, next) {
+		if (temp == driver)
+			break;
+		if (device_class_enabled(mdev, temp->drv_class) &&
+		    temp->dma_map && temp->dma_unmap)
+			temp->dma_unmap(dev, addr, iova, len);
+	}
+	return ret;
+}
+
+int
+mlx5_common_dev_dma_unmap(struct rte_device *dev, void *addr, uint64_t iova,
+			  size_t len)
+{
+	struct mlx5_class_driver *driver;
+	struct mlx5_common_device *mdev;
+	int local_ret = -EINVAL;
+	int ret = 0;
+
+	mdev = to_mlx5_device(dev);
+	if (!mdev)
+		return -ENODEV;
+	/* There is no unmap error recovery in current implementation. */
+	TAILQ_FOREACH_REVERSE(driver, &drivers_list, mlx5_drivers, next) {
+		if (!device_class_enabled(mdev, driver->drv_class) ||
+		    driver->dma_unmap == NULL)
+			continue;
+		local_ret = driver->dma_unmap(dev, addr, iova, len);
+		if (local_ret && (ret == 0))
+			ret = local_ret;
+	}
+	if (local_ret)
+		ret = local_ret;
+	return ret;
+}
+
+void
+mlx5_class_driver_register(struct mlx5_class_driver *driver)
+{
+	mlx5_common_driver_on_register_pci(driver);
+	TAILQ_INSERT_TAIL(&drivers_list, driver, next);
+}
+
+static void mlx5_common_driver_init(void)
+{
+	mlx5_common_pci_init();
+}
+
 static bool mlx5_common_initialized;
 
 /**
@@ -55,7 +415,7 @@ mlx5_common_init(void)
 		return;
 
 	mlx5_glue_constructor();
-	mlx5_common_pci_init();
+	mlx5_common_driver_init();
 	mlx5_common_initialized = true;
 }
 
@@ -214,3 +574,5 @@ mlx5_devx_alloc_uar(void *ctx, int mapping)
 exit:
 	return uar;
 }
+
+RTE_PMD_EXPORT_NAME(mlx5_class_driver, __COUNTER__);
diff --git a/drivers/common/mlx5/mlx5_common.h b/drivers/common/mlx5/mlx5_common.h
index 306f2f1ab7..4783fa6303 100644
--- a/drivers/common/mlx5/mlx5_common.h
+++ b/drivers/common/mlx5/mlx5_common.h
@@ -210,7 +210,7 @@ int mlx5_get_ifname_sysfs(const char *ibdev_path, char *ifname);
 
 enum mlx5_class {
 	MLX5_CLASS_INVALID,
-	MLX5_CLASS_NET = RTE_BIT64(0),
+	MLX5_CLASS_ETH = RTE_BIT64(0),
 	MLX5_CLASS_VDPA = RTE_BIT64(1),
 	MLX5_CLASS_REGEX = RTE_BIT64(2),
 	MLX5_CLASS_COMPRESS = RTE_BIT64(3),
@@ -242,4 +242,131 @@ extern uint8_t haswell_broadwell_cpu;
 __rte_internal
 void mlx5_common_init(void);
 
+/**
+ * Common Driver Interface
+ * ConnectX common driver supports multiple classes: net, vdpa, regex, crypto
+ * and compress devices. This layer enables creating multiple classes of
+ * devices on a single device by allowing multiple class-specific device
+ * drivers to attach to the common driver.
+ *
+ *                        ----------------
+ *                        |   mlx5 class |
+ *                        |    drivers   |
+ *                        ----------------
+ *                               ||
+ *                        -----------------
+ *                        |     mlx5      |
+ *                        | common driver |
+ *                        -----------------
+ *                          |          |
+ *                 -----------        -----------------
+ *                 |   mlx5  |        |   mlx5        |
+ *                 | pci dev |        | auxiliary dev |
+ *                 -----------        -----------------
+ *
+ * - mlx5 pci bus driver binds to mlx5 PCI devices defined by the PCI
+ *   ID table of all related mlx5 PCI devices.
+ * - An mlx5 class driver such as the net, vdpa or regex PMD defines its
+ *   specific PCI ID table, and the mlx5 bus driver probes matching
+ *   class drivers.
+ * - mlx5 common driver is the central place that validates supported
+ *   class combinations.
+ * - mlx5 common driver hides bus differences by resolving the device
+ *   address from devargs, locating the target RDMA device and probing with it.
+ */
+
+/**
+ * Initialization function for the driver called during device probing.
+ */
+typedef int (mlx5_class_driver_probe_t)(struct rte_device *dev);
+
+/**
+ * Uninitialization function for the driver called during hot-unplugging.
+ */
+typedef int (mlx5_class_driver_remove_t)(struct rte_device *dev);
+
+/**
+ * Driver-specific DMA mapping. After a successful call the device
+ * will be able to read/write from/to this segment.
+ *
+ * @param dev
+ *   Pointer to the device.
+ * @param addr
+ *   Starting virtual address of memory to be mapped.
+ * @param iova
+ *   Starting IOVA address of memory to be mapped.
+ * @param len
+ *   Length of memory segment being mapped.
+ * @return
+ *   - 0 On success.
+ *   - Negative value and rte_errno is set otherwise.
+ */
+typedef int (mlx5_class_driver_dma_map_t)(struct rte_device *dev, void *addr,
+					  uint64_t iova, size_t len);
+
+/**
+ * Driver-specific DMA un-mapping. After a successful call the device
+ * will not be able to read/write from/to this segment.
+ *
+ * @param dev
+ *   Pointer to the device.
+ * @param addr
+ *   Starting virtual address of memory to be unmapped.
+ * @param iova
+ *   Starting IOVA address of memory to be unmapped.
+ * @param len
+ *   Length of memory segment being unmapped.
+ * @return
+ *   - 0 On success.
+ *   - Negative value and rte_errno is set otherwise.
+ */
+typedef int (mlx5_class_driver_dma_unmap_t)(struct rte_device *dev, void *addr,
+					    uint64_t iova, size_t len);
+
+/** Device already probed can be probed again to check for new ports. */
+#define MLX5_DRV_PROBE_AGAIN 0x0004
+
+/**
+ * A structure describing a mlx5 common class driver.
+ */
+struct mlx5_class_driver {
+	TAILQ_ENTRY(mlx5_class_driver) next;
+	enum mlx5_class drv_class;                /**< Class of this driver. */
+	const char *name;                         /**< Driver name. */
+	mlx5_class_driver_probe_t *probe;         /**< Device Probe function. */
+	mlx5_class_driver_remove_t *remove;       /**< Device Remove function. */
+	mlx5_class_driver_dma_map_t *dma_map;     /**< Device DMA map function. */
+	mlx5_class_driver_dma_unmap_t *dma_unmap; /**< Device DMA unmap function. */
+	const struct rte_pci_id *id_table;        /**< ID table, zero-terminated. */
+	uint32_t probe_again:1;
+	/**< Device already probed can be probed again to check for new devices. */
+	uint32_t intr_lsc:1; /**< Supports link state interrupt. */
+	uint32_t intr_rmv:1; /**< Supports device remove interrupt. */
+};
+
+/**
+ * Register a mlx5 class driver.
+ *
+ * @param driver
+ *   A pointer to a mlx5_class_driver structure describing the driver
+ *   to be registered.
+ */
+__rte_internal
+void
+mlx5_class_driver_register(struct mlx5_class_driver *driver);
+
+/**
+ * Test whether a device is a PCI bus device.
+ *
+ * @param dev
+ *   Pointer to device.
+ *
+ * @return
+ *   - True if the device is on the PCI bus.
+ *   - False otherwise.
+ */
+__rte_internal
+bool
+mlx5_dev_is_pci(const struct rte_device *dev);
+
 #endif /* RTE_PMD_MLX5_COMMON_H_ */
diff --git a/drivers/common/mlx5/mlx5_common_pci.c b/drivers/common/mlx5/mlx5_common_pci.c
index 34747c4e07..6fe28defbf 100644
--- a/drivers/common/mlx5/mlx5_common_pci.c
+++ b/drivers/common/mlx5/mlx5_common_pci.c
@@ -3,11 +3,19 @@
  */
 
 #include <stdlib.h>
+
 #include <rte_malloc.h>
+#include <rte_devargs.h>
+#include <rte_errno.h>
 #include <rte_class.h>
 
 #include "mlx5_common_log.h"
 #include "mlx5_common_pci.h"
+#include "mlx5_common_private.h"
+
+static struct rte_pci_driver mlx5_common_pci_driver;
+
+/********** Legacy PCI bus driver, to be removed ********/
 
 struct mlx5_pci_device {
 	struct rte_pci_device *pci_dev;
@@ -282,8 +290,8 @@ drivers_probe(struct mlx5_pci_device *dev, struct rte_pci_driver *pci_drv,
  *   0 on success, a negative errno value otherwise and rte_errno is set.
  */
 static int
-mlx5_common_pci_probe(struct rte_pci_driver *pci_drv __rte_unused,
-		      struct rte_pci_device *pci_dev)
+mlx5_pci_probe(struct rte_pci_driver *pci_drv __rte_unused,
+	       struct rte_pci_device *pci_dev)
 {
 	struct mlx5_pci_device *dev;
 	uint32_t user_classes = 0;
@@ -336,7 +344,7 @@ mlx5_common_pci_probe(struct rte_pci_driver *pci_drv __rte_unused,
  *   0 on success, the function cannot fail.
  */
 static int
-mlx5_common_pci_remove(struct rte_pci_device *pci_dev)
+mlx5_pci_remove(struct rte_pci_device *pci_dev)
 {
 	struct mlx5_pci_device *dev;
 	int ret;
@@ -352,8 +360,8 @@ mlx5_common_pci_remove(struct rte_pci_device *pci_dev)
 }
 
 static int
-mlx5_common_pci_dma_map(struct rte_pci_device *pci_dev, void *addr,
-			uint64_t iova, size_t len)
+mlx5_pci_dma_map(struct rte_pci_device *pci_dev, void *addr,
+		 uint64_t iova, size_t len)
 {
 	struct mlx5_pci_driver *driver = NULL;
 	struct mlx5_pci_driver *temp;
@@ -385,8 +393,8 @@ mlx5_common_pci_dma_map(struct rte_pci_device *pci_dev, void *addr,
 }
 
 static int
-mlx5_common_pci_dma_unmap(struct rte_pci_device *pci_dev, void *addr,
-			  uint64_t iova, size_t len)
+mlx5_pci_dma_unmap(struct rte_pci_device *pci_dev, void *addr,
+		   uint64_t iova, size_t len)
 {
 	struct mlx5_pci_driver *driver;
 	struct mlx5_pci_device *dev;
@@ -419,10 +427,10 @@ static struct rte_pci_driver mlx5_pci_driver = {
 	.driver = {
 		.name = MLX5_PCI_DRIVER_NAME,
 	},
-	.probe = mlx5_common_pci_probe,
-	.remove = mlx5_common_pci_remove,
-	.dma_map = mlx5_common_pci_dma_map,
-	.dma_unmap = mlx5_common_pci_dma_unmap,
+	.probe = mlx5_pci_probe,
+	.remove = mlx5_pci_remove,
+	.dma_map = mlx5_pci_dma_map,
+	.dma_unmap = mlx5_pci_dma_unmap,
 };
 
 static int
@@ -486,7 +494,7 @@ pci_ids_table_update(const struct rte_pci_id *driver_id_table)
 	updated_table = calloc(num_ids, sizeof(*updated_table));
 	if (!updated_table)
 		return -ENOMEM;
-	if (TAILQ_EMPTY(&drv_list)) {
+	if (old_table == NULL) {
 		/* Copy the first driver's ID table. */
 		for (id_iter = driver_id_table; id_iter->vendor_id != 0;
 		     id_iter++, i++)
@@ -502,6 +510,7 @@ pci_ids_table_update(const struct rte_pci_id *driver_id_table)
 	/* Terminate table with empty entry. */
 	updated_table[i].vendor_id = 0;
 	mlx5_pci_driver.id_table = updated_table;
+	mlx5_common_pci_driver.id_table = updated_table;
 	mlx5_pci_id_table = updated_table;
 	if (old_table)
 		free(old_table);
@@ -520,6 +529,101 @@ mlx5_pci_driver_register(struct mlx5_pci_driver *driver)
 	TAILQ_INSERT_TAIL(&drv_list, driver, next);
 }
 
+/********** New common PCI bus driver ********/
+
+bool
+mlx5_dev_is_pci(const struct rte_device *dev)
+{
+	return strcmp(dev->bus->name, "pci") == 0;
+}
+
+bool
+mlx5_dev_pci_match(const struct mlx5_class_driver *drv,
+		   const struct rte_device *dev)
+{
+	const struct rte_pci_device *pci_dev;
+	const struct rte_pci_id *id_table;
+
+	if (!mlx5_dev_is_pci(dev))
+		return false;
+	pci_dev = RTE_DEV_TO_PCI_CONST(dev);
+	for (id_table = drv->id_table; id_table->vendor_id != 0;
+	     id_table++) {
+		/* Check if device's ids match the class driver's ids. */
+		if (id_table->vendor_id != pci_dev->id.vendor_id &&
+		    id_table->vendor_id != RTE_PCI_ANY_ID)
+			continue;
+		if (id_table->device_id != pci_dev->id.device_id &&
+		    id_table->device_id != RTE_PCI_ANY_ID)
+			continue;
+		if (id_table->subsystem_vendor_id !=
+		    pci_dev->id.subsystem_vendor_id &&
+		    id_table->subsystem_vendor_id != RTE_PCI_ANY_ID)
+			continue;
+		if (id_table->subsystem_device_id !=
+		    pci_dev->id.subsystem_device_id &&
+		    id_table->subsystem_device_id != RTE_PCI_ANY_ID)
+			continue;
+		if (id_table->class_id != pci_dev->id.class_id &&
+		    id_table->class_id != RTE_CLASS_ANY_ID)
+			continue;
+		return true;
+	}
+	return false;
+}
+
+static int
+mlx5_common_pci_probe(struct rte_pci_driver *pci_drv __rte_unused,
+		      struct rte_pci_device *pci_dev)
+{
+	return mlx5_common_dev_probe(&pci_dev->device);
+}
+
+static int
+mlx5_common_pci_remove(struct rte_pci_device *pci_dev)
+{
+	return mlx5_common_dev_remove(&pci_dev->device);
+}
+
+static int
+mlx5_common_pci_dma_map(struct rte_pci_device *pci_dev, void *addr,
+			uint64_t iova, size_t len)
+{
+	return mlx5_common_dev_dma_map(&pci_dev->device, addr, iova, len);
+}
+
+static int
+mlx5_common_pci_dma_unmap(struct rte_pci_device *pci_dev, void *addr,
+			  uint64_t iova, size_t len)
+{
+	return mlx5_common_dev_dma_unmap(&pci_dev->device, addr, iova, len);
+}
+
+void
+mlx5_common_driver_on_register_pci(struct mlx5_class_driver *driver)
+{
+	if (driver->id_table != NULL) {
+		if (pci_ids_table_update(driver->id_table) != 0)
+			return;
+	}
+	if (driver->probe_again)
+		mlx5_common_pci_driver.drv_flags |= RTE_PCI_DRV_PROBE_AGAIN;
+	if (driver->intr_lsc)
+		mlx5_common_pci_driver.drv_flags |= RTE_PCI_DRV_INTR_LSC;
+	if (driver->intr_rmv)
+		mlx5_common_pci_driver.drv_flags |= RTE_PCI_DRV_INTR_RMV;
+}
+
+static struct rte_pci_driver mlx5_common_pci_driver = {
+	.driver = {
+		   .name = MLX5_PCI_DRIVER_NAME,
+	},
+	.probe = mlx5_common_pci_probe,
+	.remove = mlx5_common_pci_remove,
+	.dma_map = mlx5_common_pci_dma_map,
+	.dma_unmap = mlx5_common_pci_dma_unmap,
+};
+
 void mlx5_common_pci_init(void)
 {
 	const struct rte_pci_id empty_table[] = {
@@ -535,7 +639,7 @@ void mlx5_common_pci_init(void)
 	 */
 	if (mlx5_pci_id_table == NULL && pci_ids_table_update(empty_table))
 		return;
-	rte_pci_register(&mlx5_pci_driver);
+	rte_pci_register(&mlx5_common_pci_driver);
 }
 
 RTE_FINI(mlx5_common_pci_finish)
@@ -544,8 +648,9 @@ RTE_FINI(mlx5_common_pci_finish)
 		/* Constructor doesn't register with PCI bus if it failed
 		 * to build the table.
 		 */
-		rte_pci_unregister(&mlx5_pci_driver);
+		rte_pci_unregister(&mlx5_common_pci_driver);
 		free(mlx5_pci_id_table);
 	}
 }
+
 RTE_PMD_EXPORT_NAME(mlx5_common_pci, __COUNTER__);
diff --git a/drivers/common/mlx5/mlx5_common_private.h b/drivers/common/mlx5/mlx5_common_private.h
new file mode 100644
index 0000000000..791eb3cd77
--- /dev/null
+++ b/drivers/common/mlx5/mlx5_common_private.h
@@ -0,0 +1,41 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright 2021 Mellanox Technologies, Ltd
+ */
+
+#ifndef _MLX5_COMMON_PRIVATE_H_
+#define _MLX5_COMMON_PRIVATE_H_
+
+#include <rte_pci.h>
+
+#include "mlx5_common.h"
+
+#ifdef __cplusplus
+extern "C" {
+#endif /* __cplusplus */
+
+/* Common bus driver: */
+
+struct mlx5_common_device {
+	struct rte_device *dev;
+	TAILQ_ENTRY(mlx5_common_device) next;
+	uint32_t classes_loaded;
+};
+
+int mlx5_common_dev_probe(struct rte_device *eal_dev);
+int mlx5_common_dev_remove(struct rte_device *eal_dev);
+int mlx5_common_dev_dma_map(struct rte_device *dev, void *addr, uint64_t iova,
+			    size_t len);
+int mlx5_common_dev_dma_unmap(struct rte_device *dev, void *addr, uint64_t iova,
+			      size_t len);
+
+/* Common PCI bus driver: */
+
+void mlx5_common_driver_on_register_pci(struct mlx5_class_driver *driver);
+bool mlx5_dev_pci_match(const struct mlx5_class_driver *drv,
+			const struct rte_device *dev);
+
+#ifdef __cplusplus
+}
+#endif /* __cplusplus */
+
+#endif /* _MLX5_COMMON_PRIVATE_H_ */
diff --git a/drivers/common/mlx5/mlx5_common_utils.h b/drivers/common/mlx5/mlx5_common_utils.h
index ed378ce9bd..9b067e92a8 100644
--- a/drivers/common/mlx5/mlx5_common_utils.h
+++ b/drivers/common/mlx5/mlx5_common_utils.h
@@ -5,6 +5,8 @@
 #ifndef RTE_PMD_MLX5_COMMON_UTILS_H_
 #define RTE_PMD_MLX5_COMMON_UTILS_H_
 
+#include <rte_rwlock.h>
+
 #include "mlx5_common.h"
 
 #define MLX5_HLIST_DIRECT_KEY 0x0001 /* Use the key directly as hash index. */
diff --git a/drivers/common/mlx5/version.map b/drivers/common/mlx5/version.map
index db4f13f1f7..6be882e98c 100644
--- a/drivers/common/mlx5/version.map
+++ b/drivers/common/mlx5/version.map
@@ -3,6 +3,8 @@ INTERNAL {
 
 	haswell_broadwell_cpu;
 
+	mlx5_class_driver_register;
+
 	mlx5_common_init;
 
 	mlx5_common_verbs_reg_mr; # WINDOWS_NO_EXPORT
@@ -10,6 +12,7 @@ INTERNAL {
 
 	mlx5_create_mr_ext;
 
+	mlx5_dev_is_pci; # WINDOWS_NO_EXPORT
 	mlx5_dev_to_pci_addr; # WINDOWS_NO_EXPORT
 
 	mlx5_devx_alloc_uar; # WINDOWS_NO_EXPORT
@@ -129,6 +132,7 @@ INTERNAL {
 	mlx5_os_alloc_pd;
 	mlx5_os_dealloc_pd;
 	mlx5_os_dereg_mr;
+	mlx5_os_get_ibv_dev; # WINDOWS_NO_EXPORT
 	mlx5_os_get_ibv_device; # WINDOWS_NO_EXPORT
 	mlx5_os_reg_mr;
 	mlx5_os_umem_dereg;
diff --git a/drivers/net/mlx5/mlx5.c b/drivers/net/mlx5/mlx5.c
index d0faa45944..52573e78f9 100644
--- a/drivers/net/mlx5/mlx5.c
+++ b/drivers/net/mlx5/mlx5.c
@@ -2428,7 +2428,7 @@ static const struct rte_pci_id mlx5_pci_id_map[] = {
 };
 
 static struct mlx5_pci_driver mlx5_driver = {
-	.driver_class = MLX5_CLASS_NET,
+	.driver_class = MLX5_CLASS_ETH,
 	.pci_driver = {
 		.driver = {
 			.name = MLX5_PCI_DRIVER_NAME,
-- 
2.25.1
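
For readers following the ID matching in mlx5_dev_pci_match() above: a
table entry matches when every field either equals the device's field or
is a wildcard, and the table ends at the first entry whose vendor_id is 0.
Below is a minimal standalone sketch of that rule. The sketch_* names and
the two SKETCH_*_ANY_ID constants are hypothetical stand-ins for
rte_pci_id, RTE_PCI_ANY_ID and RTE_CLASS_ANY_ID so that it builds without
DPDK headers; it is an illustration, not the driver code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins for the DPDK wildcard IDs (assumption). */
#define SKETCH_PCI_ANY_ID   0xffffu
#define SKETCH_CLASS_ANY_ID 0xffffffu

/* Hypothetical stand-in for struct rte_pci_id (assumption). */
struct sketch_pci_id {
	uint32_t class_id;
	uint16_t vendor_id;
	uint16_t device_id;
	uint16_t subsystem_vendor_id;
	uint16_t subsystem_device_id;
};

/* A table field matches when it equals the device field or is a wildcard. */
static bool
sketch_field_match(uint32_t want, uint32_t have, uint32_t any)
{
	return want == have || want == any;
}

/* Walk a table terminated by an entry whose vendor_id is 0. */
static bool
sketch_pci_match(const struct sketch_pci_id *table,
		 const struct sketch_pci_id *dev)
{
	const struct sketch_pci_id *id;

	assert(table != NULL && dev != NULL);
	for (id = table; id->vendor_id != 0; id++) {
		if (sketch_field_match(id->vendor_id, dev->vendor_id,
				       SKETCH_PCI_ANY_ID) &&
		    sketch_field_match(id->device_id, dev->device_id,
				       SKETCH_PCI_ANY_ID) &&
		    sketch_field_match(id->subsystem_vendor_id,
				       dev->subsystem_vendor_id,
				       SKETCH_PCI_ANY_ID) &&
		    sketch_field_match(id->subsystem_device_id,
				       dev->subsystem_device_id,
				       SKETCH_PCI_ANY_ID) &&
		    sketch_field_match(id->class_id, dev->class_id,
				       SKETCH_CLASS_ANY_ID))
			return true;
	}
	return false;
}
```

A caller would pass a class driver's ID table and the probed device's IDs;
a false result means that class driver is skipped for this device.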



* [dpdk-dev] [PATCH v1 02/14] common/mlx5: move description of PCI sysfs functions
From: Xueming Li @ 2021-06-16  4:09 UTC (permalink / raw)
  To: Viacheslav Ovsiienko
  Cc: Thomas Monjalon, dev, xuemingl, Matan Azrad, Shahaf Shuler

From: Thomas Monjalon <thomas@monjalon.net>

The Linux-specific functions mlx5_get_pci_addr() and
mlx5_get_ifname_sysfs() are better described in the .h file.

The requirement for using mlx5_get_pci_addr() is explicit:
the node /device must exist in the provided sysfs path.

Signed-off-by: Thomas Monjalon <thomas@monjalon.net>
---
 drivers/common/mlx5/linux/mlx5_common_os.c | 22 ------------------
 drivers/common/mlx5/mlx5_common.h          | 26 ++++++++++++++++++++++
 2 files changed, 26 insertions(+), 22 deletions(-)

diff --git a/drivers/common/mlx5/linux/mlx5_common_os.c b/drivers/common/mlx5/linux/mlx5_common_os.c
index 78a9723075..337e9df8cb 100644
--- a/drivers/common/mlx5/linux/mlx5_common_os.c
+++ b/drivers/common/mlx5/linux/mlx5_common_os.c
@@ -23,17 +23,6 @@
 const struct mlx5_glue *mlx5_glue;
 #endif
 
-/**
- * Get PCI information by sysfs device path.
- *
- * @param dev_path
- *   Pointer to device sysfs folder name.
- * @param[out] pci_addr
- *   PCI bus address output buffer.
- *
- * @return
- *   0 on success, a negative errno value otherwise and rte_errno is set.
- */
 int
 mlx5_dev_to_pci_addr(const char *dev_path,
 		     struct rte_pci_addr *pci_addr)
@@ -159,17 +148,6 @@ mlx5_translate_port_name(const char *port_name_in,
 	port_info_out->name_type = MLX5_PHYS_PORT_NAME_TYPE_UNKNOWN;
 }
 
-/**
- * Get kernel interface name from IB device path.
- *
- * @param[in] ibdev_path
- *   Pointer to IB device path.
- * @param[out] ifname
- *   Interface name output buffer.
- *
- * @return
- *   0 on success, a negative errno value otherwise and rte_errno is set.
- */
 int
 mlx5_get_ifname_sysfs(const char *ibdev_path, char *ifname)
 {
diff --git a/drivers/common/mlx5/mlx5_common.h b/drivers/common/mlx5/mlx5_common.h
index 4783fa6303..c0c950f8f9 100644
--- a/drivers/common/mlx5/mlx5_common.h
+++ b/drivers/common/mlx5/mlx5_common.h
@@ -202,8 +202,34 @@ check_cqe(volatile struct mlx5_cqe *cqe, const uint16_t cqes_n,
 	return MLX5_CQE_STATUS_SW_OWN;
 }
 
+/*
+ * Get PCI address from sysfs of a PCI-related device.
+ *
+ * @param[in] dev_path
+ *   The sysfs path should not point to the direct plain PCI device.
+ *   Instead, the node "/device/" is used to access the real device.
+ * @param[out] pci_addr
+ *   Parsed PCI address.
+ *
+ * @return
+ *   - 0 on success.
+ *   - Negative value and rte_errno is set otherwise.
+ */
 __rte_internal
 int mlx5_dev_to_pci_addr(const char *dev_path, struct rte_pci_addr *pci_addr);
+
+/*
+ * Get kernel network interface name from sysfs IB device path.
+ *
+ * @param[in] ibdev_path
+ *   The sysfs path to IB device.
+ * @param[out] ifname
+ *   Interface name output of size IF_NAMESIZE.
+ *
+ * @return
+ *   - 0 on success.
+ *   - Negative value and rte_errno is set otherwise.
+ */
 __rte_internal
 int mlx5_get_ifname_sysfs(const char *ibdev_path, char *ifname);
 
-- 
2.25.1



* [dpdk-dev] [PATCH v1 03/14] common/mlx5: support auxiliary bus
From: Xueming Li @ 2021-06-16  4:09 UTC (permalink / raw)
  To: Viacheslav Ovsiienko
  Cc: dev, xuemingl, Matan Azrad, Shahaf Shuler, Ray Kinsella, Neil Horman

This patch adds an auxiliary bus driver that delegates to the
registered internal mlx5 common device drivers, i.e. eth, vdpa...

The current major target is to support SubFunction on the auxiliary bus.

As a limitation of the current driver, the NUMA node of a device is
detected from the parent PCI device reached through the device's symbolic
link; this will be removed once a NUMA node file is available in sysfs.
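
To make the NUMA workaround concrete: the driver resolves the auxiliary
device's sysfs link and treats the parent directory of the resolved path
as the PCI device, where numa_node can be read. Below is a small
standalone sketch of the "take the parent directory name" step only,
operating on an already resolved path string. The function name
sketch_parent_name() and the sample path layout are assumptions for
illustration; the patch itself uses realpath() and dirname().

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <string.h>

/*
 * Copy the name of the parent directory of `real` into `out`.
 * `real` stands for the resolved sysfs path of an auxiliary device.
 * Returns 0 on success or a negative errno value.
 */
static int
sketch_parent_name(const char *real, char *out, size_t size)
{
	const char *end;
	const char *start;
	size_t len;

	assert(real != NULL && out != NULL);
	/* Separator in front of the auxiliary device name. */
	end = strrchr(real, '/');
	if (end == NULL || end == real)
		return -EINVAL;
	/* Back up to the first character of the parent directory name. */
	start = end;
	while (start > real && start[-1] != '/')
		start--;
	len = (size_t)(end - start);
	if (len == 0)
		return -EINVAL;
	if (len >= size)
		return -ENAMETOOLONG;
	memcpy(out, start, len);
	out[len] = '\0';
	return 0;
}
```

With a resolved path ending in ".../0000:08:00.0/mlx5_core.sf.4" (a
made-up example), the sketch yields "0000:08:00.0", which is the
directory whose numa_node file the driver reads.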

Signed-off-by: Xueming Li <xuemingl@nvidia.com>
---
 drivers/common/mlx5/linux/meson.build         |   3 +
 .../common/mlx5/linux/mlx5_common_auxiliary.c | 172 ++++++++++++++++++
 drivers/common/mlx5/linux/mlx5_common_verbs.c |   5 +-
 drivers/common/mlx5/meson.build               |   2 +-
 drivers/common/mlx5/mlx5_common.c             |   3 +
 drivers/common/mlx5/mlx5_common.h             |   6 +
 drivers/common/mlx5/mlx5_common_private.h     |   6 +
 drivers/common/mlx5/version.map               |   2 +
 8 files changed, 197 insertions(+), 2 deletions(-)
 create mode 100644 drivers/common/mlx5/linux/mlx5_common_auxiliary.c

diff --git a/drivers/common/mlx5/linux/meson.build b/drivers/common/mlx5/linux/meson.build
index 5cea1b44d7..676fd05014 100644
--- a/drivers/common/mlx5/linux/meson.build
+++ b/drivers/common/mlx5/linux/meson.build
@@ -48,10 +48,13 @@ endif
 sources += files('mlx5_nl.c')
 sources += files('mlx5_common_os.c')
 sources += files('mlx5_common_verbs.c')
+sources += files('mlx5_common_auxiliary.c')
 if not dlopen_ibverbs
     sources += files('mlx5_glue.c')
 endif
 
+deps += ['bus_auxiliary']
+
 # To maintain the compatibility with the make build system
 # mlx5_autoconf.h file is still generated.
 # input array for meson member search:
diff --git a/drivers/common/mlx5/linux/mlx5_common_auxiliary.c b/drivers/common/mlx5/linux/mlx5_common_auxiliary.c
new file mode 100644
index 0000000000..79d567087c
--- /dev/null
+++ b/drivers/common/mlx5/linux/mlx5_common_auxiliary.c
@@ -0,0 +1,172 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright 2020 Mellanox Technologies Ltd
+ */
+
+#include <stdlib.h>
+#include <dirent.h>
+#include <rte_malloc.h>
+#include <rte_errno.h>
+#include <rte_bus_auxiliary.h>
+#include <rte_common.h>
+#include "eal_filesystem.h"
+
+#include "mlx5_common_utils.h"
+#include "mlx5_common_private.h"
+
+#define AUXILIARY_SYSFS_PATH "/sys/bus/auxiliary/devices"
+#define MLX5_AUXILIARY_PREFIX "mlx5_core.sf."
+
+int
+mlx5_auxiliary_get_child_name(const char *dev, const char *node,
+			      char *child, size_t size)
+{
+	DIR *dir;
+	struct dirent *dent;
+	MKSTR(path, "%s/%s%s", AUXILIARY_SYSFS_PATH, dev, node);
+
+	dir = opendir(path);
+	if (dir == NULL) {
+		rte_errno = errno;
+		return -rte_errno;
+	}
+	/* Get the first file name. */
+	while ((dent = readdir(dir)) != NULL) {
+		if (dent->d_name[0] != '.')
+			break;
+	}
+	closedir(dir);
+	if (dent == NULL) {
+		rte_errno = ENOENT;
+		return -rte_errno;
+	}
+	if (rte_strscpy(child, dent->d_name, size) < 0)
+		return -rte_errno;
+	return 0;
+}
+
+static int
+mlx5_auxiliary_get_pci_path(const struct rte_auxiliary_device *dev,
+			    char *sysfs_pci, size_t size)
+{
+	char sysfs_real[PATH_MAX] = { 0 };
+	MKSTR(sysfs_aux, "%s/%s", AUXILIARY_SYSFS_PATH, dev->name);
+	char *dir;
+
+	if (realpath(sysfs_aux, sysfs_real) == NULL) {
+		rte_errno = errno;
+		return -rte_errno;
+	}
+	dir = dirname(sysfs_real);
+	if (dir == NULL) {
+		rte_errno = errno;
+		return -rte_errno;
+	}
+	if (rte_strscpy(sysfs_pci, dir, size) < 0)
+		return -rte_errno;
+	return 0;
+}
+
+static int
+mlx5_auxiliary_get_numa(const struct rte_auxiliary_device *dev)
+{
+	unsigned long numa;
+	char numa_path[PATH_MAX];
+
+	if (mlx5_auxiliary_get_pci_path(dev, numa_path, sizeof(numa_path)) != 0)
+		return SOCKET_ID_ANY;
+	if (rte_strlcat(numa_path, "/numa_node", PATH_MAX) >= PATH_MAX) {
+		rte_errno = ENAMETOOLONG;
+		return SOCKET_ID_ANY;
+	}
+	if (eal_parse_sysfs_value(numa_path, &numa) != 0) {
+		rte_errno = EINVAL;
+		return SOCKET_ID_ANY;
+	}
+	return (int)numa;
+}
+
+struct ibv_device *
+mlx5_get_aux_ibv_device(const struct rte_auxiliary_device *dev)
+{
+	int n;
+	char ib_name[64] = { 0 };
+	struct ibv_device **ibv_list = mlx5_glue->get_device_list(&n);
+	struct ibv_device *ibv_match = NULL;
+
+	if (!ibv_list) {
+		rte_errno = ENOSYS;
+		return NULL;
+	}
+	if (mlx5_auxiliary_get_child_name(dev->name, "/infiniband",
+					  ib_name, sizeof(ib_name)) != 0)
+		return NULL;
+	while (n-- > 0) {
+		if (strcmp(ibv_list[n]->name, ib_name) != 0)
+			continue;
+		ibv_match = ibv_list[n];
+		break;
+	}
+	if (ibv_match == NULL)
+		rte_errno = ENOENT;
+	mlx5_glue->free_device_list(ibv_list);
+	return ibv_match;
+}
+
+static bool
+mlx5_common_auxiliary_match(const char *name)
+{
+	return strncmp(name, MLX5_AUXILIARY_PREFIX,
+		       strlen(MLX5_AUXILIARY_PREFIX)) == 0;
+}
+
+static int
+mlx5_common_auxiliary_probe(struct rte_auxiliary_driver *drv __rte_unused,
+			    struct rte_auxiliary_device *dev)
+{
+	dev->device.numa_node = mlx5_auxiliary_get_numa(dev);
+	return mlx5_common_dev_probe(&dev->device);
+}
+
+static int
+mlx5_common_auxiliary_remove(struct rte_auxiliary_device *auxiliary_dev)
+{
+	return mlx5_common_dev_remove(&auxiliary_dev->device);
+}
+
+static int
+mlx5_common_auxiliary_dma_map(struct rte_auxiliary_device *auxiliary_dev,
+			      void *addr, uint64_t iova, size_t len)
+{
+	return mlx5_common_dev_dma_map(&auxiliary_dev->device, addr, iova, len);
+}
+
+static int
+mlx5_common_auxiliary_dma_unmap(struct rte_auxiliary_device *auxiliary_dev,
+				void *addr, uint64_t iova, size_t len)
+{
+	return mlx5_common_dev_dma_unmap(&auxiliary_dev->device, addr, iova,
+					 len);
+}
+
+static struct rte_auxiliary_driver mlx5_auxiliary_driver = {
+	.driver = {
+		   .name = MLX5_AUXILIARY_DRIVER_NAME,
+	},
+	.match = mlx5_common_auxiliary_match,
+	.probe = mlx5_common_auxiliary_probe,
+	.remove = mlx5_common_auxiliary_remove,
+	.dma_map = mlx5_common_auxiliary_dma_map,
+	.dma_unmap = mlx5_common_auxiliary_dma_unmap,
+};
+
+void mlx5_common_auxiliary_init(void)
+{
+	if (mlx5_auxiliary_driver.bus == NULL)
+		rte_auxiliary_register(&mlx5_auxiliary_driver);
+}
+
+RTE_FINI(mlx5_common_auxiliary_driver_finish)
+{
+	if (mlx5_auxiliary_driver.bus != NULL)
+		rte_auxiliary_unregister(&mlx5_auxiliary_driver);
+}
diff --git a/drivers/common/mlx5/linux/mlx5_common_verbs.c b/drivers/common/mlx5/linux/mlx5_common_verbs.c
index 6a6ab7a7a2..9080bd3e87 100644
--- a/drivers/common/mlx5/linux/mlx5_common_verbs.c
+++ b/drivers/common/mlx5/linux/mlx5_common_verbs.c
@@ -12,6 +12,7 @@
 
 #include <rte_errno.h>
 #include <rte_bus_pci.h>
+#include <rte_bus_auxiliary.h>
 
 #include "mlx5_common_utils.h"
 #include "mlx5_common_log.h"
@@ -24,10 +25,12 @@
 struct ibv_device *
 mlx5_os_get_ibv_dev(const struct rte_device *dev)
 {
-	struct ibv_device *ibv = NULL;
+	struct ibv_device *ibv;
 
 	if (mlx5_dev_is_pci(dev))
 		ibv = mlx5_os_get_ibv_device(&RTE_DEV_TO_PCI_CONST(dev)->addr);
+	else
+		ibv = mlx5_get_aux_ibv_device(RTE_DEV_TO_AUXILIARY_CONST(dev));
 	if (ibv == NULL) {
 		rte_errno = ENODEV;
 		DRV_LOG(ERR, "Verbs device not found: %s", dev->name);
diff --git a/drivers/common/mlx5/meson.build b/drivers/common/mlx5/meson.build
index fa2b8b9834..6ddbde7e8f 100644
--- a/drivers/common/mlx5/meson.build
+++ b/drivers/common/mlx5/meson.build
@@ -7,7 +7,7 @@ if not (is_linux or (is_windows and is_ms_linker))
     subdir_done()
 endif
 
-deps += ['hash', 'pci', 'bus_pci', 'net', 'eal', 'kvargs']
+deps += ['hash', 'pci', 'bus_pci', 'bus_auxiliary', 'net', 'eal', 'kvargs']
 sources += files(
         'mlx5_devx_cmds.c',
         'mlx5_common.c',
diff --git a/drivers/common/mlx5/mlx5_common.c b/drivers/common/mlx5/mlx5_common.c
index dc32a7bd90..b4ca0a9fe1 100644
--- a/drivers/common/mlx5/mlx5_common.c
+++ b/drivers/common/mlx5/mlx5_common.c
@@ -399,6 +399,9 @@ mlx5_class_driver_register(struct mlx5_class_driver *driver)
 static void mlx5_common_driver_init(void)
 {
 	mlx5_common_pci_init();
+#ifdef RTE_EXEC_ENV_LINUX
+	mlx5_common_auxiliary_init();
+#endif
 }
 
 static bool mlx5_common_initialized;
diff --git a/drivers/common/mlx5/mlx5_common.h b/drivers/common/mlx5/mlx5_common.h
index c0c950f8f9..439f16cde1 100644
--- a/drivers/common/mlx5/mlx5_common.h
+++ b/drivers/common/mlx5/mlx5_common.h
@@ -22,6 +22,7 @@
 
 /* Reported driver name. */
 #define MLX5_PCI_DRIVER_NAME "mlx5_pci"
+#define MLX5_AUXILIARY_DRIVER_NAME "mlx5_auxiliary"
 
 /* Bit-field manipulation. */
 #define BITFIELD_DECLARE(bf, type, size) \
@@ -107,6 +108,7 @@ pmd_drv_log_basename(const char *s)
 	int mkstr_size_##name = snprintf(NULL, 0, "" __VA_ARGS__); \
 	char name[mkstr_size_##name + 1]; \
 	\
+	memset(name, 0, mkstr_size_##name + 1); \
 	snprintf(name, sizeof(name), "" __VA_ARGS__)
 
 enum {
@@ -134,6 +136,10 @@ enum {
 	PCI_DEVICE_ID_MELLANOX_CONNECTX7BF = 0Xa2dc,
 };
 
+
+__rte_internal
+int mlx5_auxiliary_get_child_name(const char *dev, const char *node,
+				  char *child, size_t size);
 /* Maximum number of simultaneous unicast MAC addresses. */
 #define MLX5_MAX_UC_MAC_ADDRESSES 128
 /* Maximum number of simultaneous Multicast MAC addresses. */
diff --git a/drivers/common/mlx5/mlx5_common_private.h b/drivers/common/mlx5/mlx5_common_private.h
index 791eb3cd77..9f00a6c54d 100644
--- a/drivers/common/mlx5/mlx5_common_private.h
+++ b/drivers/common/mlx5/mlx5_common_private.h
@@ -6,6 +6,7 @@
 #define _MLX5_COMMON_PRIVATE_H_
 
 #include <rte_pci.h>
+#include <rte_bus_auxiliary.h>
 
 #include "mlx5_common.h"
 
@@ -34,6 +35,11 @@ void mlx5_common_driver_on_register_pci(struct mlx5_class_driver *driver);
 bool mlx5_dev_pci_match(const struct mlx5_class_driver *drv,
 			const struct rte_device *dev);
 
+/* Common auxiliary bus driver: */
+void mlx5_common_auxiliary_init(void);
+struct ibv_device *mlx5_get_aux_ibv_device(
+		const struct rte_auxiliary_device *dev);
+
 #ifdef __cplusplus
 }
 #endif /* __cplusplus */
diff --git a/drivers/common/mlx5/version.map b/drivers/common/mlx5/version.map
index 6be882e98c..edfd3792e0 100644
--- a/drivers/common/mlx5/version.map
+++ b/drivers/common/mlx5/version.map
@@ -3,6 +3,8 @@ INTERNAL {
 
 	haswell_broadwell_cpu;
 
+	mlx5_auxiliary_get_child_name; # WINDOWS_NO_EXPORT
+
 	mlx5_class_driver_register;
 
 	mlx5_common_init;
-- 
2.25.1



* [dpdk-dev] [PATCH v1 04/14] common/mlx5: get PCI device address from any bus
From: Xueming Li @ 2021-06-16  4:09 UTC (permalink / raw)
  To: Viacheslav Ovsiienko
  Cc: Thomas Monjalon, dev, xuemingl, Matan Azrad, Shahaf Shuler,
	Ray Kinsella, Neil Horman

From: Thomas Monjalon <thomas@monjalon.net>

A function is exported to allow retrieving the PCI address
of the parent PCI device of a Sub-Function in auxiliary bus sysfs.
The function mlx5_dev_to_pci_str() accepts both PCI and auxiliary
devices. In the case of a PCI device, it simply uses the device name,
formatted as <DBDF>.

The function mlx5_dev_to_pci_addr(), which is based on a sysfs path
and does not use any device object, is renamed to mlx5_get_pci_addr()
for clarity.
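
As an illustration of the PCI branch of mlx5_dev_to_pci_str(): a device
name may be given as <BDF> or <DBDF>, and the output is always <DBDF>.
The patch does this with rte_pci_addr_parse() and rte_pci_device_name();
the sketch below is a standalone approximation using sscanf()/snprintf()
so it builds without DPDK. The name sketch_pci_to_dbdf() is hypothetical,
and the parsing is deliberately simplified (it assumes a missing domain
means domain 0).

```c
#include <assert.h>
#include <stdio.h>

/*
 * Normalize "BB:DD.F" or "DDDD:BB:DD.F" into "DDDD:BB:DD.F".
 * Returns 0 on success, -1 on a malformed input or a short buffer.
 */
static int
sketch_pci_to_dbdf(const char *in, char *out, size_t size)
{
	unsigned int dom = 0, bus = 0, dev = 0, fn = 0;
	char extra;
	int n;

	assert(in != NULL && out != NULL);
	/* Try the full form first; trailing characters make it fail. */
	if (sscanf(in, "%x:%x:%x.%x%c", &dom, &bus, &dev, &fn, &extra) != 4) {
		/* Fall back to the short form, assuming domain 0. */
		dom = 0;
		if (sscanf(in, "%x:%x.%x%c", &bus, &dev, &fn, &extra) != 3)
			return -1;
	}
	if (dom > 0xffff || bus > 0xff || dev > 0x1f || fn > 7)
		return -1;
	n = snprintf(out, size, "%04x:%02x:%02x.%x", dom, bus, dev, fn);
	return (n < 0 || (size_t)n >= size) ? -1 : 0;
}
```

Both spellings of the same address then compare equal as strings, which
is the reason for normalizing before the address is used as a key.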

Signed-off-by: Thomas Monjalon <thomas@monjalon.net>
---
 .../common/mlx5/linux/mlx5_common_auxiliary.c | 19 +++++++++++++++
 drivers/common/mlx5/linux/mlx5_common_os.c    |  5 ++--
 drivers/common/mlx5/mlx5_common.c             | 23 +++++++++++++++++++
 drivers/common/mlx5/mlx5_common.h             | 16 ++++++++++++-
 drivers/common/mlx5/mlx5_common_private.h     |  2 ++
 drivers/common/mlx5/version.map               |  3 ++-
 drivers/net/mlx5/linux/mlx5_os.c              |  6 ++---
 7 files changed, 66 insertions(+), 8 deletions(-)

diff --git a/drivers/common/mlx5/linux/mlx5_common_auxiliary.c b/drivers/common/mlx5/linux/mlx5_common_auxiliary.c
index 79d567087c..aaa2407ded 100644
--- a/drivers/common/mlx5/linux/mlx5_common_auxiliary.c
+++ b/drivers/common/mlx5/linux/mlx5_common_auxiliary.c
@@ -4,6 +4,8 @@
 
 #include <stdlib.h>
 #include <dirent.h>
+#include <libgen.h>
+
 #include <rte_malloc.h>
 #include <rte_errno.h>
 #include <rte_bus_auxiliary.h>
@@ -66,6 +68,23 @@ mlx5_auxiliary_get_pci_path(const struct rte_auxiliary_device *dev,
 	return 0;
 }
 
+int
+mlx5_auxiliary_get_pci_str(const struct rte_auxiliary_device *dev,
+			   char *addr, size_t size)
+{
+	char sysfs_pci[PATH_MAX];
+	char *base;
+
+	if (mlx5_auxiliary_get_pci_path(dev, sysfs_pci, sizeof(sysfs_pci)) != 0)
+		return -ENODEV;
+	base = basename(sysfs_pci);
+	if (base == NULL)
+		return -errno;
+	if (rte_strscpy(addr, base, size) < 0)
+		return -rte_errno;
+	return 0;
+}
+
 static int
 mlx5_auxiliary_get_numa(const struct rte_auxiliary_device *dev)
 {
diff --git a/drivers/common/mlx5/linux/mlx5_common_os.c b/drivers/common/mlx5/linux/mlx5_common_os.c
index 337e9df8cb..9e0c823c97 100644
--- a/drivers/common/mlx5/linux/mlx5_common_os.c
+++ b/drivers/common/mlx5/linux/mlx5_common_os.c
@@ -24,8 +24,7 @@ const struct mlx5_glue *mlx5_glue;
 #endif
 
 int
-mlx5_dev_to_pci_addr(const char *dev_path,
-		     struct rte_pci_addr *pci_addr)
+mlx5_get_pci_addr(const char *dev_path, struct rte_pci_addr *pci_addr)
 {
 	FILE *file;
 	char line[32];
@@ -417,7 +416,7 @@ mlx5_os_get_ibv_device(const struct rte_pci_addr *addr)
 		struct rte_pci_addr paddr;
 
 		DRV_LOG(DEBUG, "Checking device \"%s\"..", ibv_list[n]->name);
-		if (mlx5_dev_to_pci_addr(ibv_list[n]->ibdev_path, &paddr) != 0)
+		if (mlx5_get_pci_addr(ibv_list[n]->ibdev_path, &paddr) != 0)
 			continue;
 		if (rte_pci_addr_cmp(addr, &paddr) != 0)
 			continue;
diff --git a/drivers/common/mlx5/mlx5_common.c b/drivers/common/mlx5/mlx5_common.c
index b4ca0a9fe1..4ff13cb461 100644
--- a/drivers/common/mlx5/mlx5_common.c
+++ b/drivers/common/mlx5/mlx5_common.c
@@ -198,6 +198,29 @@ to_mlx5_device(const struct rte_device *rte_dev)
 	return NULL;
 }
 
+int
+mlx5_dev_to_pci_str(const struct rte_device *dev, char *addr, size_t size)
+{
+	struct rte_pci_addr pci_addr = { 0 };
+	int ret;
+
+	if (mlx5_dev_is_pci(dev)) {
+		/* Input might be <BDF>, format PCI address to <DBDF>. */
+		ret = rte_pci_addr_parse(dev->name, &pci_addr);
+		if (ret != 0)
+			return -ENODEV;
+		rte_pci_device_name(&pci_addr, addr, size);
+		return 0;
+	}
+#ifdef RTE_EXEC_ENV_LINUX
+	return mlx5_auxiliary_get_pci_str(RTE_DEV_TO_AUXILIARY_CONST(dev),
+			addr, size);
+#else
+	rte_errno = ENODEV;
+	return -rte_errno;
+#endif
+}
+
 static void
 dev_release(struct mlx5_common_device *dev)
 {
diff --git a/drivers/common/mlx5/mlx5_common.h b/drivers/common/mlx5/mlx5_common.h
index 439f16cde1..7d7b896517 100644
--- a/drivers/common/mlx5/mlx5_common.h
+++ b/drivers/common/mlx5/mlx5_common.h
@@ -208,6 +208,20 @@ check_cqe(volatile struct mlx5_cqe *cqe, const uint16_t cqes_n,
 	return MLX5_CQE_STATUS_SW_OWN;
 }
 
+/*
+ * Get PCI address <DBDF> string from EAL device.
+ *
+ * @param[out] addr
+ *	The output address buffer string
+ * @param[in] size
+ *	The output buffer size
+ * @return
+ *   - 0 on success.
+ *   - Negative value and rte_errno is set otherwise.
+ */
+__rte_internal
+int mlx5_dev_to_pci_str(const struct rte_device *dev, char *addr, size_t size);
+
 /*
  * Get PCI address from sysfs of a PCI-related device.
  *
@@ -222,7 +236,7 @@ check_cqe(volatile struct mlx5_cqe *cqe, const uint16_t cqes_n,
  *   - Negative value and rte_errno is set otherwise.
  */
 __rte_internal
-int mlx5_dev_to_pci_addr(const char *dev_path, struct rte_pci_addr *pci_addr);
+int mlx5_get_pci_addr(const char *dev_path, struct rte_pci_addr *pci_addr);
 
 /*
  * Get kernel network interface name from sysfs IB device path.
diff --git a/drivers/common/mlx5/mlx5_common_private.h b/drivers/common/mlx5/mlx5_common_private.h
index 9f00a6c54d..1096fa85e7 100644
--- a/drivers/common/mlx5/mlx5_common_private.h
+++ b/drivers/common/mlx5/mlx5_common_private.h
@@ -39,6 +39,8 @@ bool mlx5_dev_pci_match(const struct mlx5_class_driver *drv,
 void mlx5_common_auxiliary_init(void);
 struct ibv_device *mlx5_get_aux_ibv_device(
 		const struct rte_auxiliary_device *dev);
+int mlx5_auxiliary_get_pci_str(const struct rte_auxiliary_device *dev,
+			       char *addr, size_t size);
 
 #ifdef __cplusplus
 }
diff --git a/drivers/common/mlx5/version.map b/drivers/common/mlx5/version.map
index edfd3792e0..ea4c49b7e7 100644
--- a/drivers/common/mlx5/version.map
+++ b/drivers/common/mlx5/version.map
@@ -15,7 +15,7 @@ INTERNAL {
 	mlx5_create_mr_ext;
 
 	mlx5_dev_is_pci; # WINDOWS_NO_EXPORT
-	mlx5_dev_to_pci_addr; # WINDOWS_NO_EXPORT
+	mlx5_dev_to_pci_str; # WINDOWS_NO_EXPORT
 
 	mlx5_devx_alloc_uar; # WINDOWS_NO_EXPORT
 
@@ -75,6 +75,7 @@ INTERNAL {
 	mlx5_free;
 
 	mlx5_get_ifname_sysfs; # WINDOWS_NO_EXPORT
+	mlx5_get_pci_addr; # WINDOWS_NO_EXPORT
 
 	mlx5_glue;
 
diff --git a/drivers/net/mlx5/linux/mlx5_os.c b/drivers/net/mlx5/linux/mlx5_os.c
index 54e4a1fe60..78e101d649 100644
--- a/drivers/net/mlx5/linux/mlx5_os.c
+++ b/drivers/net/mlx5/linux/mlx5_os.c
@@ -1848,7 +1848,7 @@ mlx5_device_bond_pci_match(const struct ibv_device *ibv_dev,
 		/* Process slave interface names in the loop. */
 		snprintf(tmp_str, sizeof(tmp_str),
 			 "/sys/class/net/%s", ifname);
-		if (mlx5_dev_to_pci_addr(tmp_str, &pci_addr)) {
+		if (mlx5_get_pci_addr(tmp_str, &pci_addr)) {
 			DRV_LOG(WARNING, "can not get PCI address"
 					 " for netdev \"%s\"", ifname);
 			continue;
@@ -2025,8 +2025,8 @@ mlx5_os_pci_probe_pf(struct rte_pci_device *pci_dev,
 			break;
 		} else {
 			/* Bonding device not found. */
-			if (mlx5_dev_to_pci_addr
-				(ibv_list[ret]->ibdev_path, &pci_addr))
+			if (mlx5_get_pci_addr(ibv_list[ret]->ibdev_path,
+					      &pci_addr))
 				continue;
 			if (owner_pci.domain != pci_addr.domain ||
 			    owner_pci.bus != pci_addr.bus ||
-- 
2.25.1



* [dpdk-dev] [PATCH v1 05/14] net/mlx5: remove PCI dependency
  2021-05-27 13:37 ` [dpdk-dev] [RFC 01/14] common/mlx5: add common device driver Xueming Li
                     ` (5 preceding siblings ...)
  2021-06-16  4:09   ` [dpdk-dev] [PATCH v1 04/14] common/mlx5: get PCI device address from any bus Xueming Li
@ 2021-06-16  4:09   ` Xueming Li
  2021-06-16  4:09   ` [dpdk-dev] [PATCH v1 06/14] net/mlx5: migrate to bus-agnostic common driver Xueming Li
                     ` (8 subsequent siblings)
  15 siblings, 0 replies; 42+ messages in thread
From: Xueming Li @ 2021-06-16  4:09 UTC (permalink / raw)
  To: Viacheslav Ovsiienko; +Cc: dev, xuemingl, Matan Azrad, Shahaf Shuler

To support more bus types, remove the PCI dependency where possible.
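
A minimal sketch of what the port lookup looks like once the hint is a
generic device pointer rather than a PCI device. The types and names below
(sketch_device, sketch_port, sketch_find_next, SKETCH_MAX_PORTS) are
hypothetical stand-ins and not DPDK structures; only the matching rule
(same hint pointer, or a bound driver with the expected name) is modelled
on mlx5_eth_find_next() as changed in this patch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define SKETCH_MAX_PORTS 8 /* hypothetical stand-in for RTE_MAX_ETHPORTS */

/* Hypothetical, simplified stand-in for a bus-agnostic device. */
struct sketch_device {
	const char *name;
	const char *driver_name; /* NULL until a driver is bound */
};

/* Hypothetical, simplified stand-in for an ethdev port slot. */
struct sketch_port {
	int used;
	struct sketch_device *device;
};

static struct sketch_port sketch_ports[SKETCH_MAX_PORTS];

/*
 * Return the first used port at or after port_id whose device is the
 * hint itself or is already bound to a driver named drv_name.
 * Returns SKETCH_MAX_PORTS when nothing matches.
 */
uint16_t
sketch_find_next(uint16_t port_id, const struct sketch_device *hint,
		 const char *drv_name)
{
	assert(drv_name != NULL);
	while (port_id < SKETCH_MAX_PORTS) {
		const struct sketch_port *p = &sketch_ports[port_id];

		if (p->used && p->device != NULL &&
		    (p->device == hint ||
		     (p->device->driver_name != NULL &&
		      strcmp(p->device->driver_name, drv_name) == 0)))
			break;
		port_id++;
	}
	return port_id;
}
```

Passing a NULL hint, as the sibling lookups in mlx5_dev_spawn() now do,
reduces the rule to "any port already bound to the driver", because a used
port never has a NULL device in this sketch.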

Signed-off-by: Xueming Li <xuemingl@nvidia.com>
---
 drivers/net/mlx5/linux/mlx5_ethdev_os.c |  2 +-
 drivers/net/mlx5/linux/mlx5_os.c        |  4 ++--
 drivers/net/mlx5/mlx5.c                 | 24 +++++++++++++++---------
 drivers/net/mlx5/mlx5.h                 |  8 ++++----
 drivers/net/mlx5/mlx5_ethdev.c          |  2 +-
 drivers/net/mlx5/mlx5_mr.c              | 14 +++++++-------
 drivers/net/mlx5/mlx5_trigger.c         | 12 +++++-------
 drivers/net/mlx5/mlx5_txq.c             |  3 ++-
 drivers/net/mlx5/windows/mlx5_os.c      |  2 +-
 9 files changed, 38 insertions(+), 33 deletions(-)

diff --git a/drivers/net/mlx5/linux/mlx5_ethdev_os.c b/drivers/net/mlx5/linux/mlx5_ethdev_os.c
index ddc1371aa9..b05b9fc950 100644
--- a/drivers/net/mlx5/linux/mlx5_ethdev_os.c
+++ b/drivers/net/mlx5/linux/mlx5_ethdev_os.c
@@ -346,7 +346,7 @@ mlx5_find_master_dev(struct rte_eth_dev *dev)
 	priv = dev->data->dev_private;
 	domain_id = priv->domain_id;
 	MLX5_ASSERT(priv->representor);
-	MLX5_ETH_FOREACH_DEV(port_id, priv->pci_dev) {
+	MLX5_ETH_FOREACH_DEV(port_id, dev->device) {
 		struct mlx5_priv *opriv =
 			rte_eth_devices[port_id].data->dev_private;
 		if (opriv &&
diff --git a/drivers/net/mlx5/linux/mlx5_os.c b/drivers/net/mlx5/linux/mlx5_os.c
index 78e101d649..b695929e0b 100644
--- a/drivers/net/mlx5/linux/mlx5_os.c
+++ b/drivers/net/mlx5/linux/mlx5_os.c
@@ -923,6 +923,7 @@ mlx5_dev_spawn(struct rte_device *dpdk_dev,
 	sh = mlx5_alloc_shared_dev_ctx(spawn, config);
 	if (!sh)
 		return NULL;
+	sh->numa_node = dpdk_dev->numa_node;
 	config->devx = sh->devx;
 #ifdef HAVE_MLX5DV_DR_ACTION_DEST_DEVX_TIR
 	config->dest_tir = 1;
@@ -1125,7 +1126,7 @@ mlx5_dev_spawn(struct rte_device *dpdk_dev,
 	 * Look for sibling devices in order to reuse their switch domain
 	 * if any, otherwise allocate one.
 	 */
-	MLX5_ETH_FOREACH_DEV(port_id, priv->pci_dev) {
+	MLX5_ETH_FOREACH_DEV(port_id, NULL) {
 		const struct mlx5_priv *opriv =
 			rte_eth_devices[port_id].data->dev_private;
 
@@ -2548,7 +2549,6 @@ mlx5_os_open_device(const struct mlx5_dev_spawn_data *spawn,
 	int dbmap_env;
 	int err = 0;
 
-	sh->numa_node = spawn->pci_dev->device.numa_node;
 	pthread_mutex_init(&sh->txpp.mutex, NULL);
 	/*
 	 * Configure environment variable "MLX5_BF_SHUT_UP"
diff --git a/drivers/net/mlx5/mlx5.c b/drivers/net/mlx5/mlx5.c
index 52573e78f9..6077c701e7 100644
--- a/drivers/net/mlx5/mlx5.c
+++ b/drivers/net/mlx5/mlx5.c
@@ -1185,7 +1185,7 @@ mlx5_alloc_shared_dev_ctx(const struct mlx5_dev_spawn_data *spawn,
 	 */
 	err = mlx5_mr_btree_init(&sh->share_cache.cache,
 				 MLX5_MR_BTREE_CACHE_N * 2,
-				 spawn->pci_dev->device.numa_node);
+				 sh->numa_node);
 	if (err) {
 		err = rte_errno;
 		goto error;
@@ -1620,7 +1620,7 @@ mlx5_dev_close(struct rte_eth_dev *dev)
 		unsigned int c = 0;
 		uint16_t port_id;
 
-		MLX5_ETH_FOREACH_DEV(port_id, priv->pci_dev) {
+		MLX5_ETH_FOREACH_DEV(port_id, dev->device) {
 			struct mlx5_priv *opriv =
 				rte_eth_devices[port_id].data->dev_private;
 
@@ -2057,7 +2057,8 @@ void
 mlx5_set_min_inline(struct mlx5_dev_spawn_data *spawn,
 		    struct mlx5_dev_config *config)
 {
-	if (config->txq_inline_min != MLX5_ARG_UNSET) {
+	if (config->txq_inline_min != MLX5_ARG_UNSET &&
+	    spawn->pci_dev != NULL) {
 		/* Application defines size of inlined data explicitly. */
 		switch (spawn->pci_dev->id.device_id) {
 		case PCI_DEVICE_ID_MELLANOX_CONNECTX4:
@@ -2124,6 +2125,11 @@ mlx5_set_min_inline(struct mlx5_dev_spawn_data *spawn,
 			}
 		}
 	}
+	if (spawn->pci_dev == NULL) {
+		if (config->txq_inline_min == MLX5_ARG_UNSET)
+			config->txq_inline_min = MLX5_INLINE_HSIZE_NONE;
+		goto exit;
+	}
 	/*
 	 * We get here if we are unable to deduce
 	 * inline data size with DevX. Try PCI ID
@@ -2258,7 +2264,7 @@ mlx5_dev_check_sibling_config(struct mlx5_priv *priv,
 	if (sh->refcnt == 1)
 		return 0;
 	/* Find the device with shared context. */
-	MLX5_ETH_FOREACH_DEV(port_id, priv->pci_dev) {
+	MLX5_ETH_FOREACH_DEV(port_id, NULL) {
 		struct mlx5_priv *opriv =
 			rte_eth_devices[port_id].data->dev_private;
 
@@ -2289,25 +2295,25 @@ mlx5_dev_check_sibling_config(struct mlx5_priv *priv,
  *
  * @param[in] port_id
  *   port_id to start looking for device.
- * @param[in] pci_dev
- *   Pointer to the hint PCI device. When device is being probed
+ * @param[in] odev
+ *   Pointer to the hint device. When device is being probed
  *   the its siblings (master and preceding representors might
  *   not have assigned driver yet (because the mlx5_os_pci_probe()
- *   is not completed yet, for this case match on hint PCI
+ *   is not completed yet, for this case match on hint
  *   device may be used to detect sibling device.
  *
  * @return
  *   port_id of found device, RTE_MAX_ETHPORT if not found.
  */
 uint16_t
-mlx5_eth_find_next(uint16_t port_id, struct rte_pci_device *pci_dev)
+mlx5_eth_find_next(uint16_t port_id, struct rte_device *odev)
 {
 	while (port_id < RTE_MAX_ETHPORTS) {
 		struct rte_eth_dev *dev = &rte_eth_devices[port_id];
 
 		if (dev->state != RTE_ETH_DEV_UNUSED &&
 		    dev->device &&
-		    (dev->device == &pci_dev->device ||
+		    (dev->device == odev ||
 		     (dev->device->driver &&
 		     dev->device->driver->name &&
 		     !strcmp(dev->device->driver->name, MLX5_PCI_DRIVER_NAME))))
diff --git a/drivers/net/mlx5/mlx5.h b/drivers/net/mlx5/mlx5.h
index 32b2817bf2..42de853167 100644
--- a/drivers/net/mlx5/mlx5.h
+++ b/drivers/net/mlx5/mlx5.h
@@ -1404,16 +1404,16 @@ int mlx5_proc_priv_init(struct rte_eth_dev *dev);
 void mlx5_proc_priv_uninit(struct rte_eth_dev *dev);
 int mlx5_udp_tunnel_port_add(struct rte_eth_dev *dev,
 			      struct rte_eth_udp_tunnel *udp_tunnel);
-uint16_t mlx5_eth_find_next(uint16_t port_id, struct rte_pci_device *pci_dev);
+uint16_t mlx5_eth_find_next(uint16_t port_id, struct rte_device *odev);
 int mlx5_dev_close(struct rte_eth_dev *dev);
 bool mlx5_is_hpf(struct rte_eth_dev *dev);
 void mlx5_age_event_prepare(struct mlx5_dev_ctx_shared *sh);
 
 /* Macro to iterate over all valid ports for mlx5 driver. */
-#define MLX5_ETH_FOREACH_DEV(port_id, pci_dev) \
-	for (port_id = mlx5_eth_find_next(0, pci_dev); \
+#define MLX5_ETH_FOREACH_DEV(port_id, dev) \
+	for (port_id = mlx5_eth_find_next(0, dev); \
 	     port_id < RTE_MAX_ETHPORTS; \
-	     port_id = mlx5_eth_find_next(port_id + 1, pci_dev))
+	     port_id = mlx5_eth_find_next(port_id + 1, dev))
 int mlx5_args(struct mlx5_dev_config *config, struct rte_devargs *devargs);
 struct mlx5_dev_ctx_shared *
 mlx5_alloc_shared_dev_ctx(const struct mlx5_dev_spawn_data *spawn,
diff --git a/drivers/net/mlx5/mlx5_ethdev.c b/drivers/net/mlx5/mlx5_ethdev.c
index 90baee5aa4..3047b921a9 100644
--- a/drivers/net/mlx5/mlx5_ethdev.c
+++ b/drivers/net/mlx5/mlx5_ethdev.c
@@ -335,7 +335,7 @@ mlx5_dev_infos_get(struct rte_eth_dev *dev, struct rte_eth_dev_info *info)
 	if (priv->representor) {
 		uint16_t port_id;
 
-		MLX5_ETH_FOREACH_DEV(port_id, priv->pci_dev) {
+		MLX5_ETH_FOREACH_DEV(port_id, dev->device) {
 			struct mlx5_priv *opriv =
 				rte_eth_devices[port_id].data->dev_private;
 
diff --git a/drivers/net/mlx5/mlx5_mr.c b/drivers/net/mlx5/mlx5_mr.c
index 0c5403e493..e6324c22c5 100644
--- a/drivers/net/mlx5/mlx5_mr.c
+++ b/drivers/net/mlx5/mlx5_mr.c
@@ -285,23 +285,23 @@ mlx5_mr_update_ext_mp_cb(struct rte_mempool *mp, void *opaque,
 }
 
 /**
- * Finds the first ethdev that match the pci device.
+ * Finds the first ethdev that matches the device.
  * The existence of multiple ethdev per pci device is only with representors.
  * On such case, it is enough to get only one of the ports as they all share
  * the same ibv context.
  *
- * @param pdev
- *   Pointer to the PCI device.
+ * @param dev
+ *   Pointer to the device.
  *
  * @return
  *   Pointer to the ethdev if found, NULL otherwise.
  */
 static struct rte_eth_dev *
-pci_dev_to_eth_dev(struct rte_pci_device *pdev)
+dev_to_eth_dev(struct rte_device *dev)
 {
 	uint16_t port_id;
 
-	port_id = rte_eth_find_next_of(0, &pdev->device);
+	port_id = rte_eth_find_next_of(0, dev);
 	if (port_id == RTE_MAX_ETHPORTS)
 		return NULL;
 	return &rte_eth_devices[port_id];
@@ -331,7 +331,7 @@ mlx5_dma_map(struct rte_pci_device *pdev, void *addr,
 	struct mlx5_priv *priv;
 	struct mlx5_dev_ctx_shared *sh;
 
-	dev = pci_dev_to_eth_dev(pdev);
+	dev = dev_to_eth_dev(&pdev->device);
 	if (!dev) {
 		DRV_LOG(WARNING, "unable to find matching ethdev "
 				 "to PCI device %p", (void *)pdev);
@@ -381,7 +381,7 @@ mlx5_dma_unmap(struct rte_pci_device *pdev, void *addr,
 	struct mlx5_mr *mr;
 	struct mr_cache_entry entry;
 
-	dev = pci_dev_to_eth_dev(pdev);
+	dev = dev_to_eth_dev(&pdev->device);
 	if (!dev) {
 		DRV_LOG(WARNING, "unable to find matching ethdev "
 				 "to PCI device %p", (void *)pdev);
diff --git a/drivers/net/mlx5/mlx5_trigger.c b/drivers/net/mlx5/mlx5_trigger.c
index ae7fcca229..4ff12eac19 100644
--- a/drivers/net/mlx5/mlx5_trigger.c
+++ b/drivers/net/mlx5/mlx5_trigger.c
@@ -697,7 +697,7 @@ mlx5_hairpin_bind_single_port(struct rte_eth_dev *dev, uint16_t rx_port)
 	uint32_t explicit;
 	uint16_t rx_queue;
 
-	if (mlx5_eth_find_next(rx_port, priv->pci_dev) != rx_port) {
+	if (mlx5_eth_find_next(rx_port, dev->device) != rx_port) {
 		rte_errno = ENODEV;
 		DRV_LOG(ERR, "Rx port %u does not belong to mlx5", rx_port);
 		return -rte_errno;
@@ -835,7 +835,7 @@ mlx5_hairpin_unbind_single_port(struct rte_eth_dev *dev, uint16_t rx_port)
 	int ret;
 	uint16_t cur_port = priv->dev_data->port_id;
 
-	if (mlx5_eth_find_next(rx_port, priv->pci_dev) != rx_port) {
+	if (mlx5_eth_find_next(rx_port, dev->device) != rx_port) {
 		rte_errno = ENODEV;
 		DRV_LOG(ERR, "Rx port %u does not belong to mlx5", rx_port);
 		return -rte_errno;
@@ -893,7 +893,6 @@ mlx5_hairpin_bind(struct rte_eth_dev *dev, uint16_t rx_port)
 {
 	int ret = 0;
 	uint16_t p, pp;
-	struct mlx5_priv *priv = dev->data->dev_private;
 
 	/*
 	 * If the Rx port has no hairpin configuration with the current port,
@@ -902,7 +901,7 @@ mlx5_hairpin_bind(struct rte_eth_dev *dev, uint16_t rx_port)
 	 * information updating.
 	 */
 	if (rx_port == RTE_MAX_ETHPORTS) {
-		MLX5_ETH_FOREACH_DEV(p, priv->pci_dev) {
+		MLX5_ETH_FOREACH_DEV(p, dev->device) {
 			ret = mlx5_hairpin_bind_single_port(dev, p);
 			if (ret != 0)
 				goto unbind;
@@ -912,7 +911,7 @@ mlx5_hairpin_bind(struct rte_eth_dev *dev, uint16_t rx_port)
 		return mlx5_hairpin_bind_single_port(dev, rx_port);
 	}
 unbind:
-	MLX5_ETH_FOREACH_DEV(pp, priv->pci_dev)
+	MLX5_ETH_FOREACH_DEV(pp, dev->device)
 		if (pp < p)
 			mlx5_hairpin_unbind_single_port(dev, pp);
 	return ret;
@@ -927,10 +926,9 @@ mlx5_hairpin_unbind(struct rte_eth_dev *dev, uint16_t rx_port)
 {
 	int ret = 0;
 	uint16_t p;
-	struct mlx5_priv *priv = dev->data->dev_private;
 
 	if (rx_port == RTE_MAX_ETHPORTS)
-		MLX5_ETH_FOREACH_DEV(p, priv->pci_dev) {
+		MLX5_ETH_FOREACH_DEV(p, dev->device) {
 			ret = mlx5_hairpin_unbind_single_port(dev, p);
 			if (ret != 0)
 				return ret;
diff --git a/drivers/net/mlx5/mlx5_txq.c b/drivers/net/mlx5/mlx5_txq.c
index 3e5e94444b..11770aeeef 100644
--- a/drivers/net/mlx5/mlx5_txq.c
+++ b/drivers/net/mlx5/mlx5_txq.c
@@ -16,6 +16,7 @@
 #include <rte_bus_pci.h>
 #include <rte_common.h>
 #include <rte_eal_paging.h>
+#include <rte_bus_pci.h>
 
 #include <mlx5_common.h>
 #include <mlx5_common_mr.h>
@@ -816,7 +817,7 @@ txq_set_params(struct mlx5_txq_ctrl *txq_ctrl)
 	if (config->txqs_inline == MLX5_ARG_UNSET)
 		txqs_inline =
 #if defined(RTE_ARCH_ARM64)
-		(priv->pci_dev->id.device_id ==
+		(priv->pci_dev && priv->pci_dev->id.device_id ==
 			PCI_DEVICE_ID_MELLANOX_CONNECTX5BF) ?
 			MLX5_INLINE_MAX_TXQS_BLUEFIELD :
 #endif
diff --git a/drivers/net/mlx5/windows/mlx5_os.c b/drivers/net/mlx5/windows/mlx5_os.c
index 3fe3f55f49..d4de2adfc1 100644
--- a/drivers/net/mlx5/windows/mlx5_os.c
+++ b/drivers/net/mlx5/windows/mlx5_os.c
@@ -393,7 +393,7 @@ mlx5_dev_spawn(struct rte_device *dpdk_dev,
 	 * Look for sibling devices in order to reuse their switch domain
 	 * if any, otherwise allocate one.
 	 */
-	MLX5_ETH_FOREACH_DEV(port_id, priv->pci_dev) {
+	MLX5_ETH_FOREACH_DEV(port_id, NULL) {
 		const struct mlx5_priv *opriv =
 			rte_eth_devices[port_id].data->dev_private;
 
-- 
2.25.1



* [dpdk-dev] [PATCH v1 06/14] net/mlx5: migrate to bus-agnostic common driver
  2021-05-27 13:37 ` [dpdk-dev] [RFC 01/14] common/mlx5: add common device driver Xueming Li
                     ` (6 preceding siblings ...)
  2021-06-16  4:09   ` [dpdk-dev] [PATCH v1 05/14] net/mlx5: remove PCI dependency Xueming Li
@ 2021-06-16  4:09   ` Xueming Li
  2021-06-16  4:09   ` [dpdk-dev] [PATCH v1 07/14] net/mlx5: support SubFunction Xueming Li
                     ` (7 subsequent siblings)
  15 siblings, 0 replies; 42+ messages in thread
From: Xueming Li @ 2021-06-16  4:09 UTC (permalink / raw)
  To: Viacheslav Ovsiienko
  Cc: dev, xuemingl, Matan Azrad, Shahaf Shuler, Anatoly Burakov

To support SubFunction based on the auxiliary bus, the common driver now
provides a new bus-agnostic class driver.

This patch migrates the net driver to the new common driver.
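
A minimal sketch of the dispatch pattern this migration relies on: the
class driver exposes one probe callback that takes a generic device, and
the callback branches on the bus type. All names below (sketch_bus,
sketch_device, sketch_class_driver, sketch_net_probe) are hypothetical and
do not reproduce the mlx5_class_driver layout; the auxiliary branch is a
no-op here, mirroring the fact that mlx5_os_net_probe() in this patch only
handles PCI devices.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical bus tag; the real code uses mlx5_dev_is_pci() instead. */
enum sketch_bus { SKETCH_BUS_PCI, SKETCH_BUS_AUXILIARY };

/* Hypothetical stand-in for a generic (bus-agnostic) device. */
struct sketch_device {
	enum sketch_bus bus;
	const char *name;
};

typedef int (*sketch_probe_t)(struct sketch_device *dev);

/* Hypothetical class driver: callbacks take the generic device. */
struct sketch_class_driver {
	const char *name;
	sketch_probe_t probe;
};

static int sketch_pci_probed; /* counts PCI probes, for illustration */

/* Bus-specific probe, reachable only through the generic callback. */
static int
sketch_pci_probe(struct sketch_device *dev)
{
	assert(dev->bus == SKETCH_BUS_PCI);
	sketch_pci_probed++;
	return 0;
}

/* Generic probe: shared checks first, then dispatch on the bus. */
int
sketch_net_probe(struct sketch_device *dev)
{
	if (dev == NULL)
		return -EINVAL;
	if (dev->bus == SKETCH_BUS_PCI)
		return sketch_pci_probe(dev);
	return 0; /* other buses are accepted but not handled yet */
}

struct sketch_class_driver sketch_net_driver = {
	.name = "sketch_eth",
	.probe = sketch_net_probe,
};
```

Keeping the bus-specific function static means a later bus can be added by
extending the generic callback alone, without touching the registration.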

Signed-off-by: Xueming Li <xuemingl@nvidia.com>
---
 drivers/net/mlx5/linux/mlx5_os.c | 46 ++++++++++++++++++++++----------
 drivers/net/mlx5/linux/mlx5_os.h |  3 ---
 drivers/net/mlx5/mlx5.c          | 40 +++++++++++++--------------
 drivers/net/mlx5/mlx5.h          |  3 +--
 drivers/net/mlx5/mlx5_mr.c       | 38 +++++++++++++-------------
 drivers/net/mlx5/mlx5_rxtx.h     |  9 +++----
 6 files changed, 73 insertions(+), 66 deletions(-)

diff --git a/drivers/net/mlx5/linux/mlx5_os.c b/drivers/net/mlx5/linux/mlx5_os.c
index b695929e0b..a941ecf641 100644
--- a/drivers/net/mlx5/linux/mlx5_os.c
+++ b/drivers/net/mlx5/linux/mlx5_os.c
@@ -1969,14 +1969,6 @@ mlx5_os_pci_probe_pf(struct rte_pci_device *pci_dev,
 	struct mlx5_bond_info bond_info;
 	int ret = -1;
 
-	if (rte_eal_process_type() == RTE_PROC_PRIMARY)
-		mlx5_pmd_socket_init();
-	ret = mlx5_init_once();
-	if (ret) {
-		DRV_LOG(ERR, "unable to init PMD global data: %s",
-			strerror(rte_errno));
-		return -rte_errno;
-	}
 	errno = 0;
 	ibv_list = mlx5_glue->get_device_list(&ret);
 	if (!ibv_list) {
@@ -2409,21 +2401,18 @@ mlx5_os_pci_probe_pf(struct rte_pci_device *pci_dev,
 }
 
 /**
- * DPDK callback to register a PCI device.
+ * Callback to register a PCI device.
  *
  * This function spawns Ethernet devices out of a given PCI device.
  *
- * @param[in] pci_drv
- *   PCI driver structure (mlx5_driver).
  * @param[in] pci_dev
  *   PCI device information.
  *
  * @return
  *   0 on success, a negative errno value otherwise and rte_errno is set.
  */
-int
-mlx5_os_pci_probe(struct rte_pci_driver *pci_drv __rte_unused,
-		  struct rte_pci_device *pci_dev)
+static int
+mlx5_os_pci_probe(struct rte_pci_device *pci_dev)
 {
 	struct rte_eth_devargs eth_da = { .type = RTE_ETH_REPRESENTOR_NONE };
 	int ret = 0;
@@ -2462,6 +2451,35 @@ mlx5_os_pci_probe(struct rte_pci_driver *pci_drv __rte_unused,
 	return ret;
 }
 
+/**
+ * Common bus driver callback to probe a device.
+ *
+ * This function probes PCI bus device(s).
+ *
+ * @param[in] dev
+ *   Pointer to the generic device.
+ *
+ * @return
+ *   0 on success, a negative errno value otherwise and rte_errno is set.
+ */
+int
+mlx5_os_net_probe(struct rte_device *dev)
+{
+	int ret;
+
+	if (rte_eal_process_type() == RTE_PROC_PRIMARY)
+		mlx5_pmd_socket_init();
+	ret = mlx5_init_once();
+	if (ret) {
+		DRV_LOG(ERR, "unable to init PMD global data: %s",
+			strerror(rte_errno));
+		return -rte_errno;
+	}
+	if (mlx5_dev_is_pci(dev))
+		return mlx5_os_pci_probe(RTE_DEV_TO_PCI(dev));
+	return 0;
+}
+
 static int
 mlx5_config_doorbell_mapping_env(const struct mlx5_dev_config *config)
 {
diff --git a/drivers/net/mlx5/linux/mlx5_os.h b/drivers/net/mlx5/linux/mlx5_os.h
index 4ae7d0ef47..af7cbeb418 100644
--- a/drivers/net/mlx5/linux/mlx5_os.h
+++ b/drivers/net/mlx5/linux/mlx5_os.h
@@ -19,7 +19,4 @@ enum {
 
 #define MLX5_NAMESIZE IF_NAMESIZE
 
-#define PCI_DRV_FLAGS  (RTE_PCI_DRV_INTR_LSC | \
-			RTE_PCI_DRV_INTR_RMV | \
-			RTE_PCI_DRV_PROBE_AGAIN)
 #endif /* RTE_PMD_MLX5_OS_H_ */
diff --git a/drivers/net/mlx5/mlx5.c b/drivers/net/mlx5/mlx5.c
index 6077c701e7..c9474a6e74 100644
--- a/drivers/net/mlx5/mlx5.c
+++ b/drivers/net/mlx5/mlx5.c
@@ -12,7 +12,6 @@
 
 #include <rte_malloc.h>
 #include <ethdev_driver.h>
-#include <ethdev_pci.h>
 #include <rte_pci.h>
 #include <rte_bus_pci.h>
 #include <rte_common.h>
@@ -28,7 +27,6 @@
 #include <mlx5_common.h>
 #include <mlx5_common_os.h>
 #include <mlx5_common_mp.h>
-#include <mlx5_common_pci.h>
 #include <mlx5_malloc.h>
 
 #include "mlx5_defs.h"
@@ -2326,23 +2324,23 @@ mlx5_eth_find_next(uint16_t port_id, struct rte_device *odev)
 }
 
 /**
- * DPDK callback to remove a PCI device.
+ * Callback to remove a device.
  *
- * This function removes all Ethernet devices belong to a given PCI device.
+ * This function removes all Ethernet devices belonging to a given device.
  *
- * @param[in] pci_dev
- *   Pointer to the PCI device.
+ * @param[in] dev
+ *   Pointer to the generic device.
  *
  * @return
  *   0 on success, the function cannot fail.
  */
 static int
-mlx5_pci_remove(struct rte_pci_device *pci_dev)
+mlx5_net_remove(struct rte_device *dev)
 {
 	uint16_t port_id;
 	int ret = 0;
 
-	RTE_ETH_FOREACH_DEV_OF(port_id, &pci_dev->device) {
+	RTE_ETH_FOREACH_DEV_OF(port_id, dev) {
 		/*
 		 * mlx5_dev_close() is not registered to secondary process,
 		 * call the close function explicitly for secondary process.
@@ -2433,19 +2431,17 @@ static const struct rte_pci_id mlx5_pci_id_map[] = {
 	}
 };
 
-static struct mlx5_pci_driver mlx5_driver = {
-	.driver_class = MLX5_CLASS_ETH,
-	.pci_driver = {
-		.driver = {
-			.name = MLX5_PCI_DRIVER_NAME,
-		},
-		.id_table = mlx5_pci_id_map,
-		.probe = mlx5_os_pci_probe,
-		.remove = mlx5_pci_remove,
-		.dma_map = mlx5_dma_map,
-		.dma_unmap = mlx5_dma_unmap,
-		.drv_flags = PCI_DRV_FLAGS,
-	},
+static struct mlx5_class_driver mlx5_net_driver = {
+	.drv_class = MLX5_CLASS_ETH,
+	.name = "mlx5_eth",
+	.id_table = mlx5_pci_id_map,
+	.probe = mlx5_os_net_probe,
+	.remove = mlx5_net_remove,
+	.dma_map = mlx5_net_dma_map,
+	.dma_unmap = mlx5_net_dma_unmap,
+	.probe_again = 1,
+	.intr_lsc = 1,
+	.intr_rmv = 1,
 };
 
 /* Initialize driver log type. */
@@ -2463,7 +2459,7 @@ RTE_INIT(rte_mlx5_pmd_init)
 	mlx5_set_cksum_table();
 	mlx5_set_swp_types_table();
 	if (mlx5_glue)
-		mlx5_pci_driver_register(&mlx5_driver);
+		mlx5_class_driver_register(&mlx5_net_driver);
 }
 
 RTE_PMD_EXPORT_NAME(net_mlx5, __COUNTER__);
diff --git a/drivers/net/mlx5/mlx5.h b/drivers/net/mlx5/mlx5.h
index 42de853167..5ea465fa0b 100644
--- a/drivers/net/mlx5/mlx5.h
+++ b/drivers/net/mlx5/mlx5.h
@@ -1680,8 +1680,7 @@ int mlx5_os_open_device(const struct mlx5_dev_spawn_data *spawn,
 			 const struct mlx5_dev_config *config,
 			 struct mlx5_dev_ctx_shared *sh);
 int mlx5_os_get_pdn(void *pd, uint32_t *pdn);
-int mlx5_os_pci_probe(struct rte_pci_driver *pci_drv __rte_unused,
-		       struct rte_pci_device *pci_dev);
+int mlx5_os_net_probe(struct rte_device *dev);
 void mlx5_os_dev_shared_handler_install(struct mlx5_dev_ctx_shared *sh);
 void mlx5_os_dev_shared_handler_uninstall(struct mlx5_dev_ctx_shared *sh);
 void mlx5_os_set_reg_mr_cb(mlx5_reg_mr_t *reg_mr_cb,
diff --git a/drivers/net/mlx5/mlx5_mr.c b/drivers/net/mlx5/mlx5_mr.c
index e6324c22c5..fab5470ba2 100644
--- a/drivers/net/mlx5/mlx5_mr.c
+++ b/drivers/net/mlx5/mlx5_mr.c
@@ -7,7 +7,6 @@
 #include <rte_mempool.h>
 #include <rte_malloc.h>
 #include <rte_rwlock.h>
-#include <rte_bus_pci.h>
 
 #include <mlx5_common_mp.h>
 #include <mlx5_common_mr.h>
@@ -308,10 +307,10 @@ dev_to_eth_dev(struct rte_device *dev)
 }
 
 /**
- * DPDK callback to DMA map external memory to a PCI device.
+ * Callback to DMA map external memory to a device.
  *
- * @param pdev
- *   Pointer to the PCI device.
+ * @param rte_dev
+ *   Pointer to the generic device.
  * @param addr
  *   Starting virtual address of memory to be mapped.
  * @param iova
@@ -323,18 +322,18 @@ dev_to_eth_dev(struct rte_device *dev)
  *   0 on success, negative value on error.
  */
 int
-mlx5_dma_map(struct rte_pci_device *pdev, void *addr,
-	     uint64_t iova __rte_unused, size_t len)
+mlx5_net_dma_map(struct rte_device *rte_dev, void *addr,
+		 uint64_t iova __rte_unused, size_t len)
 {
 	struct rte_eth_dev *dev;
 	struct mlx5_mr *mr;
 	struct mlx5_priv *priv;
 	struct mlx5_dev_ctx_shared *sh;
 
-	dev = dev_to_eth_dev(&pdev->device);
+	dev = dev_to_eth_dev(rte_dev);
 	if (!dev) {
 		DRV_LOG(WARNING, "unable to find matching ethdev "
-				 "to PCI device %p", (void *)pdev);
+				 "to device %s", rte_dev->name);
 		rte_errno = ENODEV;
 		return -1;
 	}
@@ -357,10 +356,10 @@ mlx5_dma_map(struct rte_pci_device *pdev, void *addr,
 }
 
 /**
- * DPDK callback to DMA unmap external memory to a PCI device.
+ * Callback to DMA unmap external memory to a device.
  *
- * @param pdev
- *   Pointer to the PCI device.
+ * @param rte_dev
+ *   Pointer to the generic device.
  * @param addr
  *   Starting virtual address of memory to be unmapped.
  * @param iova
@@ -372,8 +371,8 @@ mlx5_dma_map(struct rte_pci_device *pdev, void *addr,
  *   0 on success, negative value on error.
  */
 int
-mlx5_dma_unmap(struct rte_pci_device *pdev, void *addr,
-	       uint64_t iova __rte_unused, size_t len __rte_unused)
+mlx5_net_dma_unmap(struct rte_device *rte_dev, void *addr,
+		   uint64_t iova __rte_unused, size_t len __rte_unused)
 {
 	struct rte_eth_dev *dev;
 	struct mlx5_priv *priv;
@@ -381,10 +380,10 @@ mlx5_dma_unmap(struct rte_pci_device *pdev, void *addr,
 	struct mlx5_mr *mr;
 	struct mr_cache_entry entry;
 
-	dev = dev_to_eth_dev(&pdev->device);
+	dev = dev_to_eth_dev(rte_dev);
 	if (!dev) {
-		DRV_LOG(WARNING, "unable to find matching ethdev "
-				 "to PCI device %p", (void *)pdev);
+		DRV_LOG(WARNING, "unable to find matching ethdev to device %s",
+			rte_dev->name);
 		rte_errno = ENODEV;
 		return -1;
 	}
@@ -394,16 +393,15 @@ mlx5_dma_unmap(struct rte_pci_device *pdev, void *addr,
 	mr = mlx5_mr_lookup_list(&sh->share_cache, &entry, (uintptr_t)addr);
 	if (!mr) {
 		rte_rwlock_read_unlock(&sh->share_cache.rwlock);
-		DRV_LOG(WARNING, "address 0x%" PRIxPTR " wasn't registered "
-				 "to PCI device %p", (uintptr_t)addr,
-				 (void *)pdev);
+		DRV_LOG(WARNING, "address 0x%" PRIxPTR " wasn't registered to device %s",
+			(uintptr_t)addr, rte_dev->name);
 		rte_errno = EINVAL;
 		return -1;
 	}
 	LIST_REMOVE(mr, mr);
-	mlx5_mr_free(mr, sh->share_cache.dereg_mr_cb);
 	DRV_LOG(DEBUG, "port %u remove MR(%p) from list", dev->data->port_id,
 	      (void *)mr);
+	mlx5_mr_free(mr, sh->share_cache.dereg_mr_cb);
 	mlx5_mr_rebuild_cache(&sh->share_cache);
 	/*
 	 * No explicit wmb is needed after updating dev_gen due to
diff --git a/drivers/net/mlx5/mlx5_rxtx.h b/drivers/net/mlx5/mlx5_rxtx.h
index e168dd46f9..ad1144e218 100644
--- a/drivers/net/mlx5/mlx5_rxtx.h
+++ b/drivers/net/mlx5/mlx5_rxtx.h
@@ -16,7 +16,6 @@
 #include <rte_hexdump.h>
 #include <rte_spinlock.h>
 #include <rte_io.h>
-#include <rte_bus_pci.h>
 #include <rte_cycles.h>
 
 #include <mlx5_common.h>
@@ -48,10 +47,10 @@ int mlx5_queue_state_modify(struct rte_eth_dev *dev,
 /* mlx5_mr.c */
 
 void mlx5_mr_flush_local_cache(struct mlx5_mr_ctrl *mr_ctrl);
-int mlx5_dma_map(struct rte_pci_device *pdev, void *addr, uint64_t iova,
-		 size_t len);
-int mlx5_dma_unmap(struct rte_pci_device *pdev, void *addr, uint64_t iova,
-		   size_t len);
+int mlx5_net_dma_map(struct rte_device *rte_dev, void *addr, uint64_t iova,
+		     size_t len);
+int mlx5_net_dma_unmap(struct rte_device *rte_dev, void *addr, uint64_t iova,
+		       size_t len);
 
 /**
  * Get Memory Pool (MP) from mbuf. If mbuf is indirect, the pool from which the
-- 
2.25.1



* [dpdk-dev] [PATCH v1 07/14] net/mlx5: support SubFunction
  2021-05-27 13:37 ` [dpdk-dev] [RFC 01/14] common/mlx5: add common device driver Xueming Li
                     ` (7 preceding siblings ...)
  2021-06-16  4:09   ` [dpdk-dev] [PATCH v1 06/14] net/mlx5: migrate to bus-agnostic common driver Xueming Li
@ 2021-06-16  4:09   ` Xueming Li
  2021-06-16  4:09   ` [dpdk-dev] [PATCH v1 08/14] net/mlx5: check max Verbs port number Xueming Li
                     ` (6 subsequent siblings)
  15 siblings, 0 replies; 42+ messages in thread
From: Xueming Li @ 2021-06-16  4:09 UTC (permalink / raw)
  To: Viacheslav Ovsiienko
  Cc: dev, xuemingl, Matan Azrad, Shahaf Shuler, Anatoly Burakov

This patch introduces SF support. Similar to a VF, an SF on the auxiliary
bus is a portion of the hardware PF. An SF takes no representor or bonding
parameters.

Devargs to support SF:
-a auxiliary:mlx5_core.sf.8,dv_flow_en=1

New global syntax to support SF:
-a bus=auxiliary,name=mlx5_core.sf.8/class=eth/driver=mlx5,dv_flow_en=1
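
A minimal sketch of how the legacy form "<bus>:<name>[,<args>]" shown
above breaks down into its parts. This is not how EAL parses devargs; the
struct and function names (sketch_sf_devargs, sketch_parse_sf_devargs) and
the buffer sizes are assumptions made for illustration only.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical holder for the pieces of an SF devargs string. */
struct sketch_sf_devargs {
	char bus[16];     /* e.g. "auxiliary" */
	char name[32];    /* e.g. "mlx5_core.sf.8" */
	unsigned int num; /* trailing number of the device name */
	char args[64];    /* remaining key=value list, may be empty */
};

/* Split "<bus>:<name>[,<args>]"; return 0 on success, -1 on bad input. */
int
sketch_parse_sf_devargs(const char *str, struct sketch_sf_devargs *out)
{
	int n = 0;

	assert(str != NULL && out != NULL);
	memset(out, 0, sizeof(*out));
	/* Bus runs up to ':', device name runs up to ',' or end. */
	if (sscanf(str, "%15[^:]:%31[^,]%n", out->bus, out->name, &n) != 2)
		return -1;
	if (str[n] == ',')
		snprintf(out->args, sizeof(out->args), "%s", str + n + 1);
	else if (str[n] != '\0')
		return -1; /* device name did not fit the buffer */
	/* The SF device name is expected to be "mlx5_core.sf.<num>". */
	if (sscanf(out->name, "mlx5_core.sf.%u", &out->num) != 1)
		return -1;
	return 0;
}
```

The global syntax in the second example carries the same three pieces
(bus, device name, driver arguments) as explicit key=value pairs separated
by "/" layers, so no positional splitting of this kind is needed there.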

Signed-off-by: Xueming Li <xuemingl@nvidia.com>
---
 doc/guides/nics/mlx5.rst                |  54 +++++----
 drivers/net/mlx5/linux/mlx5_ethdev_os.c |  12 +-
 drivers/net/mlx5/linux/mlx5_os.c        | 142 +++++++++++++++++-------
 drivers/net/mlx5/linux/mlx5_os.h        |   2 +
 drivers/net/mlx5/mlx5.c                 |  23 +++-
 drivers/net/mlx5/mlx5.h                 |   2 +
 drivers/net/mlx5/mlx5_mac.c             |   2 +-
 drivers/net/mlx5/mlx5_rxmode.c          |   8 +-
 drivers/net/mlx5/mlx5_trigger.c         |   2 +-
 drivers/net/mlx5/windows/mlx5_os.c      |  12 +-
 10 files changed, 185 insertions(+), 74 deletions(-)

diff --git a/doc/guides/nics/mlx5.rst b/doc/guides/nics/mlx5.rst
index 83299646dd..3b217d9e8e 100644
--- a/doc/guides/nics/mlx5.rst
+++ b/doc/guides/nics/mlx5.rst
@@ -110,6 +110,11 @@ Features
 - Flow integrity offload API.
 - Connection tracking.
 - Sub-Function representors.
+- Sub-Function.
+
+Limitations
+-----------
+
 
 Limitations
 -----------
@@ -1438,40 +1443,51 @@ the DPDK application.
 
         echo switchdev > /sys/class/net/<net device>/compat/devlink/mode
 
-Sub-Function representor
-------------------------
+Sub-Function support
+--------------------
 
 Sub-Function is a portion of the PCI device, a SF netdev has its own
-dedicated queues(txq, rxq). A SF netdev supports E-Switch representation
-offload similar to existing PF and VF representors. A SF shares PCI
-level resources with other SFs and/or with its parent PCI function.
+dedicated queues(txq, rxq). A SF shares PCI level resources with other SFs
+and/or with its parent PCI function.
+
+0. Requirement::
+
+        OFED version >= 5.4-0.3.3.0
 
 1. Configure SF feature::
 
-        mlxconfig -d <mst device> set PF_BAR2_SIZE=<0/1/2/3> PF_BAR2_ENABLE=1
+        # Run mlxconfig on both PFs on host and ECPFs on BlueField.
+        mlxconfig -d <mst device> set PER_PF_NUM_SF=1 PF_TOTAL_SF=252 PF_SF_BAR_SIZE=12
 
-        Value of PF_BAR2_SIZE:
+2. Enable switchdev mode::
 
-            0: 8 SFs
-            1: 16 SFs
-            2: 32 SFs
-            3: 64 SFs
+        mlxdevm dev eswitch set pci/<DBDF> mode switchdev
 
-2. Reset the FW::
+3. Add SF port::
 
-        mlxfwreset -d <mst device> reset
+        mlxdevm port add pci/<DBDF> flavour pcisf pfnum 0 sfnum <sfnum>
 
-3. Enable switchdev mode::
+        Get SFID from output: pci/<DBDF>/<SFID>
 
-        echo switchdev > /sys/class/net/<net device>/compat/devlink/mode
+4. Modify MAC address::
+
+        mlxdevm port function set pci/<DBDF>/<SFID> hw_addr <MAC>
+
+5. Activate SF port::
+
+        mlxdevm port function set pci/<DBDF>/<SFID> state active
+
+6. Devargs to probe SF device::
 
-4. Create SF::
+        auxiliary:mlx5_core.sf.<num>,dv_flow_en=1
 
-        mlnx-sf -d <PCI_BDF> -a create
+Sub-Function representor support
+--------------------------------
 
-5. Probe SF representor::
+A SF netdev supports E-Switch representation offload similar to existing PF
+and VF representors. Use <sfnum> to probe SF representor::
 
-        testpmd> port attach <PCI_BDF>,representor=sf0,dv_flow_en=1
+        testpmd> port attach <PCI_BDF>,representor=sf<sfnum>,dv_flow_en=1
 
 Performance tuning
 ------------------
diff --git a/drivers/net/mlx5/linux/mlx5_ethdev_os.c b/drivers/net/mlx5/linux/mlx5_ethdev_os.c
index b05b9fc950..f34133e2c6 100644
--- a/drivers/net/mlx5/linux/mlx5_ethdev_os.c
+++ b/drivers/net/mlx5/linux/mlx5_ethdev_os.c
@@ -128,6 +128,17 @@ struct ethtool_link_settings {
 #define ETHTOOL_LINK_MODE_200000baseCR4_Full_BIT 2 /* 66 - 64 */
 #endif
 
+/* Get interface index from SubFunction device name. */
+int
+mlx5_auxiliary_get_ifindex(const char *sf_name)
+{
+	char if_name[IF_NAMESIZE] = { 0 };
+
+	if (mlx5_auxiliary_get_child_name(sf_name, "/net",
+					  if_name, sizeof(if_name)) != 0)
+		return -rte_errno;
+	return if_nametoindex(if_name);
+}
 
 /**
  * Get interface name from private structure.
@@ -1619,4 +1630,3 @@ mlx5_get_mac(struct rte_eth_dev *dev, uint8_t (*mac)[RTE_ETHER_ADDR_LEN])
 	memcpy(mac, request.ifr_hwaddr.sa_data, RTE_ETHER_ADDR_LEN);
 	return 0;
 }
-
diff --git a/drivers/net/mlx5/linux/mlx5_os.c b/drivers/net/mlx5/linux/mlx5_os.c
index a941ecf641..47df3b92f8 100644
--- a/drivers/net/mlx5/linux/mlx5_os.c
+++ b/drivers/net/mlx5/linux/mlx5_os.c
@@ -20,6 +20,7 @@
 #include <ethdev_pci.h>
 #include <rte_pci.h>
 #include <rte_bus_pci.h>
+#include <rte_bus_auxiliary.h>
 #include <rte_common.h>
 #include <rte_kvargs.h>
 #include <rte_rwlock.h>
@@ -1915,6 +1916,27 @@ mlx5_device_bond_pci_match(const struct ibv_device *ibv_dev,
 	return pf;
 }
 
+static void
+mlx5_os_config_default(struct mlx5_dev_config *config)
+{
+	memset(config, 0, sizeof(*config));
+	config->mps = MLX5_ARG_UNSET;
+	config->dbnc = MLX5_ARG_UNSET;
+	config->rx_vec_en = 1;
+	config->txq_inline_max = MLX5_ARG_UNSET;
+	config->txq_inline_min = MLX5_ARG_UNSET;
+	config->txq_inline_mpw = MLX5_ARG_UNSET;
+	config->txqs_inline = MLX5_ARG_UNSET;
+	config->vf_nl_en = 1;
+	config->mr_ext_memseg_en = 1;
+	config->mprq.max_memcpy_len = MLX5_MPRQ_MEMCPY_DEFAULT_LEN;
+	config->mprq.min_rxqs_num = MLX5_MPRQ_MIN_RXQS;
+	config->dv_esw_en = 1;
+	config->dv_flow_en = 1;
+	config->decap_en = 1;
+	config->log_hp_size = MLX5_ARG_UNSET;
+}
+
 /**
  * Register a PCI device within bonding.
  *
@@ -2327,23 +2349,8 @@ mlx5_os_pci_probe_pf(struct rte_pci_device *pci_dev,
 		uint32_t restore;
 
 		/* Default configuration. */
-		memset(&dev_config, 0, sizeof(struct mlx5_dev_config));
+		mlx5_os_config_default(&dev_config);
 		dev_config.vf = dev_config_vf;
-		dev_config.mps = MLX5_ARG_UNSET;
-		dev_config.dbnc = MLX5_ARG_UNSET;
-		dev_config.rx_vec_en = 1;
-		dev_config.txq_inline_max = MLX5_ARG_UNSET;
-		dev_config.txq_inline_min = MLX5_ARG_UNSET;
-		dev_config.txq_inline_mpw = MLX5_ARG_UNSET;
-		dev_config.txqs_inline = MLX5_ARG_UNSET;
-		dev_config.vf_nl_en = 1;
-		dev_config.mr_ext_memseg_en = 1;
-		dev_config.mprq.max_memcpy_len = MLX5_MPRQ_MEMCPY_DEFAULT_LEN;
-		dev_config.mprq.min_rxqs_num = MLX5_MPRQ_MIN_RXQS;
-		dev_config.dv_esw_en = 1;
-		dev_config.dv_flow_en = 1;
-		dev_config.decap_en = 1;
-		dev_config.log_hp_size = MLX5_ARG_UNSET;
 		list[i].eth_dev = mlx5_dev_spawn(&pci_dev->device,
 						 &list[i],
 						 &dev_config,
@@ -2400,6 +2407,35 @@ mlx5_os_pci_probe_pf(struct rte_pci_device *pci_dev,
 	return ret;
 }
 
+static int
+mlx5_os_parse_eth_devargs(struct rte_device *dev,
+			  struct rte_eth_devargs *eth_da)
+{
+	int ret = 0;
+
+	if (dev->devargs == NULL)
+		return 0;
+	memset(eth_da, 0, sizeof(*eth_da));
+	/* Parse representor information first from class argument. */
+	if (dev->devargs->cls_str)
+		ret = rte_eth_devargs_parse(dev->devargs->cls_str, eth_da);
+	if (ret != 0) {
+		DRV_LOG(ERR, "failed to parse device arguments: %s",
+			dev->devargs->cls_str);
+		return -rte_errno;
+	}
+	if (eth_da->type == RTE_ETH_REPRESENTOR_NONE) {
+		/* Parse legacy device argument */
+		ret = rte_eth_devargs_parse(dev->devargs->args, eth_da);
+		if (ret) {
+			DRV_LOG(ERR, "failed to parse device arguments: %s",
+				dev->devargs->args);
+			return -rte_errno;
+		}
+	}
+	return 0;
+}
+
 /**
  * Callback to register a PCI device.
  *
@@ -2414,31 +2450,13 @@ mlx5_os_pci_probe_pf(struct rte_pci_device *pci_dev,
 static int
 mlx5_os_pci_probe(struct rte_pci_device *pci_dev)
 {
-	struct rte_eth_devargs eth_da = { .type = RTE_ETH_REPRESENTOR_NONE };
+	struct rte_eth_devargs eth_da = { .nb_ports = 0 };
 	int ret = 0;
 	uint16_t p;
 
-	if (pci_dev->device.devargs) {
-		/* Parse representor information from device argument. */
-		if (pci_dev->device.devargs->cls_str)
-			ret = rte_eth_devargs_parse
-				(pci_dev->device.devargs->cls_str, &eth_da);
-		if (ret) {
-			DRV_LOG(ERR, "failed to parse device arguments: %s",
-				pci_dev->device.devargs->cls_str);
-			return -rte_errno;
-		}
-		if (eth_da.type == RTE_ETH_REPRESENTOR_NONE) {
-			/* Support legacy device argument */
-			ret = rte_eth_devargs_parse
-				(pci_dev->device.devargs->args, &eth_da);
-			if (ret) {
-				DRV_LOG(ERR, "failed to parse device arguments: %s",
-					pci_dev->device.devargs->args);
-				return -rte_errno;
-			}
-		}
-	}
+	ret = mlx5_os_parse_eth_devargs(&pci_dev->device, &eth_da);
+	if (ret != 0)
+		return ret;
 
 	if (eth_da.nb_ports > 0) {
 		/* Iterate all port if devargs pf is range: "pf[0-1]vf[...]". */
@@ -2451,10 +2469,53 @@ mlx5_os_pci_probe(struct rte_pci_device *pci_dev)
 	return ret;
 }
 
+/* Probe a single SF device on auxiliary bus, no representor support. */
+static int
+mlx5_os_auxiliary_probe(struct rte_device *dev)
+{
+	struct rte_eth_devargs eth_da = { .nb_ports = 0 };
+	struct mlx5_dev_config config;
+	struct mlx5_dev_spawn_data spawn = { .pf_bond = -1 };
+	struct rte_auxiliary_device *adev = RTE_DEV_TO_AUXILIARY(dev);
+	struct rte_eth_dev *eth_dev;
+	int ret = 0;
+
+	/* Parse ethdev devargs. */
+	ret = mlx5_os_parse_eth_devargs(dev, &eth_da);
+	if (ret != 0)
+		return ret;
+	/* Set default config data. */
+	mlx5_os_config_default(&config);
+	config.sf = 1;
+	/* Init spawn data. */
+	spawn.max_port = 1;
+	spawn.phys_port = 1;
+	spawn.phys_dev = mlx5_os_get_ibv_dev(dev);
+	ret = mlx5_auxiliary_get_ifindex(dev->name);
+	if (ret < 0) {
+		DRV_LOG(ERR, "failed to get ethdev ifindex: %s", dev->name);
+		return ret;
+	}
+	spawn.ifindex = ret;
+	/* Spawn device. */
+	eth_dev = mlx5_dev_spawn(dev, &spawn, &config, &eth_da);
+	if (eth_dev == NULL)
+		return -rte_errno;
+	/* Post create. */
+	eth_dev->intr_handle = &adev->intr_handle;
+	if (rte_eal_process_type() == RTE_PROC_PRIMARY) {
+		eth_dev->data->dev_flags |= RTE_ETH_DEV_INTR_LSC;
+		eth_dev->data->dev_flags |= RTE_ETH_DEV_INTR_RMV;
+		eth_dev->data->numa_node = dev->numa_node;
+	}
+	rte_eth_dev_probing_finish(eth_dev);
+	return 0;
+}
+
 /**
  * Common bus driver callback to probe a device.
  *
- * This function probe PCI bus device(s).
+ * This function probes PCI bus device(s) or a single SF on auxiliary bus.
  *
  * @param[in] dev
  *   Pointer to the generic device.
@@ -2477,7 +2538,8 @@ mlx5_os_net_probe(struct rte_device *dev)
 	}
 	if (mlx5_dev_is_pci(dev))
 		return mlx5_os_pci_probe(RTE_DEV_TO_PCI(dev));
-	return 0;
+	else
+		return mlx5_os_auxiliary_probe(dev);
 }
 
 static int
diff --git a/drivers/net/mlx5/linux/mlx5_os.h b/drivers/net/mlx5/linux/mlx5_os.h
index af7cbeb418..2991d37df2 100644
--- a/drivers/net/mlx5/linux/mlx5_os.h
+++ b/drivers/net/mlx5/linux/mlx5_os.h
@@ -19,4 +19,6 @@ enum {
 
 #define MLX5_NAMESIZE IF_NAMESIZE
 
+int mlx5_auxiliary_get_ifindex(const char *sf_name);
+
 #endif /* RTE_PMD_MLX5_OS_H_ */
diff --git a/drivers/net/mlx5/mlx5.c b/drivers/net/mlx5/mlx5.c
index c9474a6e74..c6d70af3a7 100644
--- a/drivers/net/mlx5/mlx5.c
+++ b/drivers/net/mlx5/mlx5.c
@@ -392,6 +392,24 @@ mlx5_is_hpf(struct rte_eth_dev *dev)
 	       MLX5_REPRESENTOR_REPR(-1) == repr;
 }
 
+/**
+ * Decide whether representor ID is a SF port representor.
+ *
+ * @param dev
+ *   Pointer to Ethernet device structure.
+ *
+ * @return
+ *   True if the port is a SF representor, false otherwise.
+ */
+bool
+mlx5_is_sf_repr(struct rte_eth_dev *dev)
+{
+	struct mlx5_priv *priv = dev->data->dev_private;
+	int type = MLX5_REPRESENTOR_TYPE(priv->representor_id);
+
+	return priv->representor != 0 && type == RTE_ETH_REPRESENTOR_SF;
+}
+
 /**
  * Initialize the ASO aging management structure.
  *
@@ -2314,7 +2332,10 @@ mlx5_eth_find_next(uint16_t port_id, struct rte_device *odev)
 		    (dev->device == odev ||
 		     (dev->device->driver &&
 		     dev->device->driver->name &&
-		     !strcmp(dev->device->driver->name, MLX5_PCI_DRIVER_NAME))))
+		     ((strcmp(dev->device->driver->name,
+			      MLX5_PCI_DRIVER_NAME) == 0) ||
+		      (strcmp(dev->device->driver->name,
+			      MLX5_AUXILIARY_DRIVER_NAME) == 0)))))
 			break;
 		port_id++;
 	}
diff --git a/drivers/net/mlx5/mlx5.h b/drivers/net/mlx5/mlx5.h
index 5ea465fa0b..28bbc72e80 100644
--- a/drivers/net/mlx5/mlx5.h
+++ b/drivers/net/mlx5/mlx5.h
@@ -220,6 +220,7 @@ struct mlx5_dev_config {
 	unsigned int hw_fcs_strip:1; /* FCS stripping is supported. */
 	unsigned int hw_padding:1; /* End alignment padding is supported. */
 	unsigned int vf:1; /* This is a VF. */
+	unsigned int sf:1; /* This is a SF. */
 	unsigned int tunnel_en:1;
 	/* Whether tunnel stateless offloads are supported. */
 	unsigned int mpls_en:1; /* MPLS over GRE/UDP is enabled. */
@@ -1407,6 +1408,7 @@ int mlx5_udp_tunnel_port_add(struct rte_eth_dev *dev,
 uint16_t mlx5_eth_find_next(uint16_t port_id, struct rte_device *odev);
 int mlx5_dev_close(struct rte_eth_dev *dev);
 bool mlx5_is_hpf(struct rte_eth_dev *dev);
+bool mlx5_is_sf_repr(struct rte_eth_dev *dev);
 void mlx5_age_event_prepare(struct mlx5_dev_ctx_shared *sh);
 
 /* Macro to iterate over all valid ports for mlx5 driver. */
diff --git a/drivers/net/mlx5/mlx5_mac.c b/drivers/net/mlx5/mlx5_mac.c
index 19981d26d8..a791fedc91 100644
--- a/drivers/net/mlx5/mlx5_mac.c
+++ b/drivers/net/mlx5/mlx5_mac.c
@@ -159,7 +159,7 @@ mlx5_mac_addr_set(struct rte_eth_dev *dev, struct rte_ether_addr *mac_addr)
 	 * Configuring the VF instead of its representor,
 	 * need to skip the special case of HPF on Bluefield.
 	 */
-	if (priv->representor && !mlx5_is_hpf(dev)) {
+	if (priv->representor && !mlx5_is_hpf(dev) && !mlx5_is_sf_repr(dev)) {
 		DRV_LOG(DEBUG, "VF represented by port %u setting primary MAC address",
 			dev->data->port_id);
 		if (priv->pf_bond >= 0) {
diff --git a/drivers/net/mlx5/mlx5_rxmode.c b/drivers/net/mlx5/mlx5_rxmode.c
index 25fb47c9ed..7f19b235c2 100644
--- a/drivers/net/mlx5/mlx5_rxmode.c
+++ b/drivers/net/mlx5/mlx5_rxmode.c
@@ -36,7 +36,7 @@ mlx5_promiscuous_enable(struct rte_eth_dev *dev)
 			dev->data->port_id);
 		return 0;
 	}
-	if (priv->config.vf) {
+	if (priv->config.vf || priv->config.sf) {
 		ret = mlx5_os_set_promisc(dev, 1);
 		if (ret)
 			return ret;
@@ -69,7 +69,7 @@ mlx5_promiscuous_disable(struct rte_eth_dev *dev)
 	int ret;
 
 	dev->data->promiscuous = 0;
-	if (priv->config.vf) {
+	if (priv->config.vf || priv->config.sf) {
 		ret = mlx5_os_set_promisc(dev, 0);
 		if (ret)
 			return ret;
@@ -109,7 +109,7 @@ mlx5_allmulticast_enable(struct rte_eth_dev *dev)
 			dev->data->port_id);
 		return 0;
 	}
-	if (priv->config.vf) {
+	if (priv->config.vf || priv->config.sf) {
 		ret = mlx5_os_set_allmulti(dev, 1);
 		if (ret)
 			goto error;
@@ -142,7 +142,7 @@ mlx5_allmulticast_disable(struct rte_eth_dev *dev)
 	int ret;
 
 	dev->data->all_multicast = 0;
-	if (priv->config.vf) {
+	if (priv->config.vf || priv->config.sf) {
 		ret = mlx5_os_set_allmulti(dev, 0);
 		if (ret)
 			goto error;
diff --git a/drivers/net/mlx5/mlx5_trigger.c b/drivers/net/mlx5/mlx5_trigger.c
index 4ff12eac19..2acfb484a5 100644
--- a/drivers/net/mlx5/mlx5_trigger.c
+++ b/drivers/net/mlx5/mlx5_trigger.c
@@ -1259,7 +1259,7 @@ mlx5_traffic_enable(struct rte_eth_dev *dev)
 		}
 		mlx5_txq_release(dev, i);
 	}
-	if (priv->config.dv_esw_en && !priv->config.vf) {
+	if (priv->config.dv_esw_en && !priv->config.vf && !priv->config.sf) {
 		if (mlx5_flow_create_esw_table_zero_flow(dev))
 			priv->fdb_def_rule = 1;
 		else
diff --git a/drivers/net/mlx5/windows/mlx5_os.c b/drivers/net/mlx5/windows/mlx5_os.c
index d4de2adfc1..7211c06333 100644
--- a/drivers/net/mlx5/windows/mlx5_os.c
+++ b/drivers/net/mlx5/windows/mlx5_os.c
@@ -921,20 +921,18 @@ mlx5_match_devx_devices_to_addr(struct devx_device_bdf *devx_bdf,
 /**
  * DPDK callback to register a PCI device.
  *
- * This function spawns Ethernet devices out of a given PCI device.
+ * This function spawns Ethernet devices out of a given device.
  *
- * @param[in] pci_drv
- *   PCI driver structure (mlx5_driver).
- * @param[in] pci_dev
- *   PCI device information.
+ * @param[in] dev
+ *   Pointer to the generic device.
  *
  * @return
  *   0 on success, a negative errno value otherwise and rte_errno is set.
  */
 int
-mlx5_os_pci_probe(struct rte_pci_driver *pci_drv __rte_unused,
-		  struct rte_pci_device *pci_dev)
+mlx5_os_net_probe(struct rte_device *dev)
 {
+	struct rte_pci_device *pci_dev = RTE_DEV_TO_PCI(dev);
 	struct devx_device_bdf *devx_bdf_devs, *orig_devx_bdf_devs;
 	/*
 	 * Number of found IB Devices matching with requested PCI BDF.
-- 
2.25.1



* [dpdk-dev] [PATCH v1 08/14] net/mlx5: check max Verbs port number
  2021-05-27 13:37 ` [dpdk-dev] [RFC 01/14] common/mlx5: add common device driver Xueming Li
                     ` (8 preceding siblings ...)
  2021-06-16  4:09   ` [dpdk-dev] [PATCH v1 07/14] net/mlx5: support SubFunction Xueming Li
@ 2021-06-16  4:09   ` Xueming Li
  2021-06-16  4:09   ` [dpdk-dev] [PATCH v1 09/14] regex/mlx5: migrate to common driver Xueming Li
                     ` (5 subsequent siblings)
  15 siblings, 0 replies; 42+ messages in thread
From: Xueming Li @ 2021-06-16  4:09 UTC (permalink / raw)
  To: Viacheslav Ovsiienko; +Cc: dev, xuemingl, Matan Azrad, Shahaf Shuler

The Verbs API does not support device port numbers larger than 255 by
design. Add a check and fail probing with a proper error log.
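
For illustration only, the guard can be sketched in isolation as below.
The struct and function names are hypothetical stand-ins, not driver
symbols; only the comparison against UINT8_MAX mirrors the patch.

```c
#include <errno.h>
#include <stdint.h>

/*
 * Hypothetical stand-in for the driver's spawn data; only the field
 * relevant to the check is modelled here.
 */
struct spawn_sketch {
	uint32_t max_port; /* Number of IB ports reported for the device. */
};

/*
 * Return 0 if the port count fits the 255-port limit described above,
 * or -EINVAL otherwise.
 */
static int
check_verbs_port_limit(const struct spawn_sketch *spawn)
{
	if (spawn->max_port > UINT8_MAX)
		return -EINVAL;
	return 0;
}
```

A caller would run such a check before any per-port setup, so that
probing fails early instead of truncating a port number.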

Signed-off-by: Xueming Li <xuemingl@nvidia.com>
---
 drivers/net/mlx5/linux/mlx5_os.c | 6 ++++++
 1 file changed, 6 insertions(+)

diff --git a/drivers/net/mlx5/linux/mlx5_os.c b/drivers/net/mlx5/linux/mlx5_os.c
index 47df3b92f8..9a3616d539 100644
--- a/drivers/net/mlx5/linux/mlx5_os.c
+++ b/drivers/net/mlx5/linux/mlx5_os.c
@@ -1168,6 +1168,12 @@ mlx5_dev_spawn(struct rte_device *dpdk_dev,
 		config->dv_flow_en = 0;
 	}
 #endif
+	if (spawn->max_port > UINT8_MAX) {
+		/* Verbs can't support ports larger than 255 by design. */
+		DRV_LOG(ERR, "can't support IB ports > UINT8_MAX");
+		err = EINVAL;
+		goto error;
+	}
 	config->ind_table_max_size =
 		sh->device_attr.max_rwq_indirection_table_size;
 	/*
-- 
2.25.1



* [dpdk-dev] [PATCH v1 09/14] regex/mlx5: migrate to common driver
  2021-05-27 13:37 ` [dpdk-dev] [RFC 01/14] common/mlx5: add common device driver Xueming Li
                     ` (9 preceding siblings ...)
  2021-06-16  4:09   ` [dpdk-dev] [PATCH v1 08/14] net/mlx5: check max Verbs port number Xueming Li
@ 2021-06-16  4:09   ` Xueming Li
  2021-06-16  4:09   ` [dpdk-dev] [PATCH v1 10/14] vdpa/mlx5: define driver name as macro Xueming Li
                     ` (4 subsequent siblings)
  15 siblings, 0 replies; 42+ messages in thread
From: Xueming Li @ 2021-06-16  4:09 UTC (permalink / raw)
  To: Viacheslav Ovsiienko; +Cc: dev, xuemingl, Ori Kam

To support the auxiliary bus, upgrade the driver to use the mlx5 common
driver structure.
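
As a rough sketch of the bus-agnostic naming this migration relies on,
the RegEx device name can be derived from the generic device name alone.
The helper name below is hypothetical, and it uses snprintf() with an
explicit size as a defensive variation; the patch itself calls sprintf().

```c
#include <stdio.h>

/*
 * Build "mlx5_regex_<device name>" into name[].
 * Returns 0 on success, -1 if the result would not fit in size bytes.
 */
static int
regex_name_from_dev(char *name, size_t size, const char *dev_name)
{
	int n = snprintf(name, size, "mlx5_regex_%s", dev_name);

	return (n < 0 || (size_t)n >= size) ? -1 : 0;
}
```

The same helper would accept a PCI address string or an auxiliary
device name, since it only treats the name as an opaque string.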

Signed-off-by: Xueming Li <xuemingl@nvidia.com>
---
 drivers/regex/mlx5/mlx5_regex.c | 49 ++++++++++++---------------------
 drivers/regex/mlx5/mlx5_regex.h |  1 -
 2 files changed, 18 insertions(+), 32 deletions(-)

diff --git a/drivers/regex/mlx5/mlx5_regex.c b/drivers/regex/mlx5/mlx5_regex.c
index dcb2ced88e..9d93eaa934 100644
--- a/drivers/regex/mlx5/mlx5_regex.c
+++ b/drivers/regex/mlx5/mlx5_regex.c
@@ -9,8 +9,8 @@
 #include <rte_regexdev.h>
 #include <rte_regexdev_core.h>
 #include <rte_regexdev_driver.h>
+#include <rte_bus_pci.h>
 
-#include <mlx5_common_pci.h>
 #include <mlx5_common.h>
 #include <mlx5_glue.h>
 #include <mlx5_devx_cmds.h>
@@ -76,15 +76,13 @@ mlx5_regex_engines_status(struct ibv_context *ctx, int num_engines)
 }
 
 static void
-mlx5_regex_get_name(char *name, struct rte_pci_device *pci_dev __rte_unused)
+mlx5_regex_get_name(char *name, struct rte_device *dev)
 {
-	sprintf(name, "mlx5_regex_%02x:%02x.%02x", pci_dev->addr.bus,
-		pci_dev->addr.devid, pci_dev->addr.function);
+	sprintf(name, "mlx5_regex_%s", dev->name);
 }
 
 static int
-mlx5_regex_pci_probe(struct rte_pci_driver *pci_drv __rte_unused,
-		     struct rte_pci_device *pci_dev)
+mlx5_regex_dev_probe(struct rte_device *rte_dev)
 {
 	struct ibv_device *ibv;
 	struct mlx5_regex_priv *priv = NULL;
@@ -94,16 +92,10 @@ mlx5_regex_pci_probe(struct rte_pci_driver *pci_drv __rte_unused,
 	int ret;
 	uint32_t val;
 
-	ibv = mlx5_os_get_ibv_device(&pci_dev->addr);
-	if (!ibv) {
-		DRV_LOG(ERR, "No matching IB device for PCI slot "
-			PCI_PRI_FMT ".", pci_dev->addr.domain,
-			pci_dev->addr.bus, pci_dev->addr.devid,
-			pci_dev->addr.function);
+	ibv = mlx5_os_get_ibv_dev(rte_dev);
+	if (ibv == NULL)
 		return -rte_errno;
-	}
-	DRV_LOG(INFO, "PCI information matches for device \"%s\".",
-		ibv->name);
+	DRV_LOG(INFO, "Probe device \"%s\".", ibv->name);
 	ctx = mlx5_glue->dv_open_device(ibv);
 	if (!ctx) {
 		DRV_LOG(ERR, "Failed to open IB device \"%s\".", ibv->name);
@@ -146,7 +138,7 @@ mlx5_regex_pci_probe(struct rte_pci_driver *pci_drv __rte_unused,
 		priv->is_bf2 = 1;
 	/* Default RXP programming mode to Shared. */
 	priv->prog_mode = MLX5_RXP_SHARED_PROG_MODE;
-	mlx5_regex_get_name(name, pci_dev);
+	mlx5_regex_get_name(name, rte_dev);
 	priv->regexdev = rte_regexdev_register(name);
 	if (priv->regexdev == NULL) {
 		DRV_LOG(ERR, "Failed to register RegEx device.");
@@ -180,7 +172,7 @@ mlx5_regex_pci_probe(struct rte_pci_driver *pci_drv __rte_unused,
 		priv->regexdev->enqueue = mlx5_regexdev_enqueue_gga;
 #endif
 	priv->regexdev->dequeue = mlx5_regexdev_dequeue;
-	priv->regexdev->device = (struct rte_device *)pci_dev;
+	priv->regexdev->device = rte_dev;
 	priv->regexdev->data->dev_private = priv;
 	priv->regexdev->state = RTE_REGEXDEV_READY;
 	priv->mr_scache.reg_mr_cb = mlx5_common_verbs_reg_mr;
@@ -213,13 +205,13 @@ mlx5_regex_pci_probe(struct rte_pci_driver *pci_drv __rte_unused,
 }
 
 static int
-mlx5_regex_pci_remove(struct rte_pci_device *pci_dev)
+mlx5_regex_dev_remove(struct rte_device *rte_dev)
 {
 	char name[RTE_REGEXDEV_NAME_MAX_LEN];
 	struct rte_regexdev *dev;
 	struct mlx5_regex_priv *priv = NULL;
 
-	mlx5_regex_get_name(name, pci_dev);
+	mlx5_regex_get_name(name, rte_dev);
 	dev = rte_regexdev_get_device_by_name(name);
 	if (!dev)
 		return 0;
@@ -254,24 +246,19 @@ static const struct rte_pci_id mlx5_regex_pci_id_map[] = {
 	}
 };
 
-static struct mlx5_pci_driver mlx5_regex_driver = {
-	.driver_class = MLX5_CLASS_REGEX,
-	.pci_driver = {
-		.driver = {
-			.name = RTE_STR(MLX5_REGEX_DRIVER_NAME),
-		},
-		.id_table = mlx5_regex_pci_id_map,
-		.probe = mlx5_regex_pci_probe,
-		.remove = mlx5_regex_pci_remove,
-		.drv_flags = 0,
-	},
+static struct mlx5_class_driver mlx5_regex_driver = {
+	.drv_class = MLX5_CLASS_REGEX,
+	.name = RTE_STR(MLX5_REGEX_DRIVER_NAME),
+	.id_table = mlx5_regex_pci_id_map,
+	.probe = mlx5_regex_dev_probe,
+	.remove = mlx5_regex_dev_remove,
 };
 
 RTE_INIT(rte_mlx5_regex_init)
 {
 	mlx5_common_init();
 	if (mlx5_glue)
-		mlx5_pci_driver_register(&mlx5_regex_driver);
+		mlx5_class_driver_register(&mlx5_regex_driver);
 }
 
 RTE_LOG_REGISTER_DEFAULT(mlx5_regex_logtype, NOTICE)
diff --git a/drivers/regex/mlx5/mlx5_regex.h b/drivers/regex/mlx5/mlx5_regex.h
index 51a2101e53..45200bf937 100644
--- a/drivers/regex/mlx5/mlx5_regex.h
+++ b/drivers/regex/mlx5/mlx5_regex.h
@@ -59,7 +59,6 @@ struct mlx5_regex_db {
 struct mlx5_regex_priv {
 	TAILQ_ENTRY(mlx5_regex_priv) next;
 	struct ibv_context *ctx; /* Device context. */
-	struct rte_pci_device *pci_dev;
 	struct rte_regexdev *regexdev; /* Pointer to the RegEx dev. */
 	uint16_t nb_queues; /* Number of queues. */
 	struct mlx5_regex_qp *qps; /* Pointer to the qp array. */
-- 
2.25.1



* [dpdk-dev] [PATCH v1 10/14] vdpa/mlx5: define driver name as macro
  2021-05-27 13:37 ` [dpdk-dev] [RFC 01/14] common/mlx5: add common device driver Xueming Li
                     ` (10 preceding siblings ...)
  2021-06-16  4:09   ` [dpdk-dev] [PATCH v1 09/14] regex/mlx5: migrate to common driver Xueming Li
@ 2021-06-16  4:09   ` Xueming Li
  2021-06-16  4:09   ` [dpdk-dev] [PATCH v1 11/14] vdpa/mlx5: remove PCI specifics Xueming Li
                     ` (3 subsequent siblings)
  15 siblings, 0 replies; 42+ messages in thread
From: Xueming Li @ 2021-06-16  4:09 UTC (permalink / raw)
  To: Viacheslav Ovsiienko; +Cc: Thomas Monjalon, dev, xuemingl, Matan Azrad

From: Thomas Monjalon <thomas@monjalon.net>

Use a macro for the PMD driver name.

Signed-off-by: Thomas Monjalon <thomas@monjalon.net>
---
 drivers/vdpa/mlx5/mlx5_vdpa.c | 9 +++++----
 1 file changed, 5 insertions(+), 4 deletions(-)

diff --git a/drivers/vdpa/mlx5/mlx5_vdpa.c b/drivers/vdpa/mlx5/mlx5_vdpa.c
index 8b5bfd8c3d..5ab7c525c2 100644
--- a/drivers/vdpa/mlx5/mlx5_vdpa.c
+++ b/drivers/vdpa/mlx5/mlx5_vdpa.c
@@ -24,6 +24,7 @@
 #include "mlx5_vdpa_utils.h"
 #include "mlx5_vdpa.h"
 
+#define MLX5_VDPA_DRIVER_NAME vdpa_mlx5
 
 #define MLX5_VDPA_DEFAULT_FEATURES ((1ULL << VHOST_USER_F_PROTOCOL_FEATURES) | \
 			    (1ULL << VIRTIO_F_ANY_LAYOUT) | \
@@ -834,7 +835,7 @@ static struct mlx5_pci_driver mlx5_vdpa_driver = {
 	.driver_class = MLX5_CLASS_VDPA,
 	.pci_driver = {
 		.driver = {
-			.name = "mlx5_vdpa",
+			.name = RTE_STR(MLX5_VDPA_DRIVER_NAME),
 		},
 		.id_table = mlx5_vdpa_pci_id_map,
 		.probe = mlx5_vdpa_pci_probe,
@@ -855,6 +856,6 @@ RTE_INIT(rte_mlx5_vdpa_init)
 		mlx5_pci_driver_register(&mlx5_vdpa_driver);
 }
 
-RTE_PMD_EXPORT_NAME(net_mlx5_vdpa, __COUNTER__);
-RTE_PMD_REGISTER_PCI_TABLE(net_mlx5_vdpa, mlx5_vdpa_pci_id_map);
-RTE_PMD_REGISTER_KMOD_DEP(net_mlx5_vdpa, "* ib_uverbs & mlx5_core & mlx5_ib");
+RTE_PMD_EXPORT_NAME(MLX5_VDPA_DRIVER_NAME, __COUNTER__);
+RTE_PMD_REGISTER_PCI_TABLE(MLX5_VDPA_DRIVER_NAME, mlx5_vdpa_pci_id_map);
+RTE_PMD_REGISTER_KMOD_DEP(MLX5_VDPA_DRIVER_NAME, "* ib_uverbs & mlx5_core & mlx5_ib");
-- 
2.25.1



* [dpdk-dev] [PATCH v1 11/14] vdpa/mlx5: remove PCI specifics
  2021-05-27 13:37 ` [dpdk-dev] [RFC 01/14] common/mlx5: add common device driver Xueming Li
                     ` (11 preceding siblings ...)
  2021-06-16  4:09   ` [dpdk-dev] [PATCH v1 10/14] vdpa/mlx5: define driver name as macro Xueming Li
@ 2021-06-16  4:09   ` Xueming Li
  2021-06-16  4:09   ` [dpdk-dev] [PATCH v1 12/14] vdpa/mlx5: support SubFunction Xueming Li
                     ` (2 subsequent siblings)
  15 siblings, 0 replies; 42+ messages in thread
From: Xueming Li @ 2021-06-16  4:09 UTC (permalink / raw)
  To: Viacheslav Ovsiienko; +Cc: Thomas Monjalon, dev, xuemingl, Matan Azrad

From: Thomas Monjalon <thomas@monjalon.net>

Remove the PCI-specific driver and replace it with the common class driver.
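
The probe path in this patch waits for the IB device to reappear after
RoCE is disabled. A minimal sketch of that bounded-retry pattern follows;
the callback type, function names and demo lookup are hypothetical and
only illustrate the loop shape, not the driver's actual lookup.

```c
#include <stddef.h>
#include <unistd.h>

/* Hypothetical lookup callback: returns a handle, or NULL if not ready. */
typedef void *(*lookup_fn)(void *arg);

/*
 * Call lookup() up to max_retries times, sleeping delay_us microseconds
 * after each failed attempt. Returns the handle, or NULL on exhaustion.
 */
static void *
wait_for_device(lookup_fn lookup, void *arg, int max_retries,
		unsigned int delay_us)
{
	int retry;

	for (retry = max_retries; retry > 0; --retry) {
		void *handle = lookup(arg);

		if (handle != NULL)
			return handle;
		usleep(delay_us);
	}
	return NULL;
}

/* Demo lookup: succeeds once the counter passed in arg reaches zero. */
static void *
demo_lookup(void *arg)
{
	int *remaining = arg;

	if (*remaining > 0) {
		--*remaining;
		return NULL;
	}
	return remaining;
}
```

On exhaustion the caller is expected to report the failure itself, as
the patch does by setting rte_errno to EAGAIN.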

Signed-off-by: Thomas Monjalon <thomas@monjalon.net>
---
 drivers/vdpa/mlx5/mlx5_vdpa.c | 119 ++++++++++------------------------
 drivers/vdpa/mlx5/mlx5_vdpa.h |   1 -
 2 files changed, 34 insertions(+), 86 deletions(-)

diff --git a/drivers/vdpa/mlx5/mlx5_vdpa.c b/drivers/vdpa/mlx5/mlx5_vdpa.c
index 5ab7c525c2..9c9a552ba0 100644
--- a/drivers/vdpa/mlx5/mlx5_vdpa.c
+++ b/drivers/vdpa/mlx5/mlx5_vdpa.c
@@ -11,12 +11,11 @@
 #include <rte_malloc.h>
 #include <rte_log.h>
 #include <rte_errno.h>
-#include <rte_pci.h>
 #include <rte_string_fns.h>
+#include <rte_bus_pci.h>
 
 #include <mlx5_glue.h>
 #include <mlx5_common.h>
-#include <mlx5_common_pci.h>
 #include <mlx5_devx_cmds.h>
 #include <mlx5_prm.h>
 #include <mlx5_nl.h>
@@ -552,34 +551,13 @@ mlx5_vdpa_sys_roce_disable(const char *addr)
 }
 
 static int
-mlx5_vdpa_roce_disable(struct rte_pci_addr *addr, struct ibv_device **ibv)
+mlx5_vdpa_roce_disable(struct rte_device *dev)
 {
-	char addr_name[64] = {0};
-
-	rte_pci_device_name(addr, addr_name, sizeof(addr_name));
 	/* Firstly try to disable ROCE by Netlink and fallback to sysfs. */
-	if (mlx5_vdpa_nl_roce_disable(addr_name) == 0 ||
-	    mlx5_vdpa_sys_roce_disable(addr_name) == 0) {
-		/*
-		 * Succeed to disable ROCE, wait for the IB device to appear
-		 * again after reload.
-		 */
-		int r;
-		struct ibv_device *ibv_new;
-
-		for (r = MLX5_VDPA_MAX_RETRIES; r; r--) {
-			ibv_new = mlx5_os_get_ibv_device(addr);
-			if (ibv_new) {
-				*ibv = ibv_new;
-				return 0;
-			}
-			usleep(MLX5_VDPA_USEC);
-		}
-		DRV_LOG(ERR, "Cannot much device %s after ROCE disable, "
-			"retries exceed %d", addr_name, MLX5_VDPA_MAX_RETRIES);
-		rte_errno = EAGAIN;
-	}
-	return -rte_errno;
+	if (mlx5_vdpa_nl_roce_disable(dev->name) != 0 &&
+	    mlx5_vdpa_sys_roce_disable(dev->name) != 0)
+		return -rte_errno;
+	return 0;
 }
 
 static int
@@ -647,44 +625,33 @@ mlx5_vdpa_config_get(struct rte_devargs *devargs, struct mlx5_vdpa_priv *priv)
 	DRV_LOG(DEBUG, "no traffic max is %u.", priv->no_traffic_max);
 }
 
-/**
- * DPDK callback to register a mlx5 PCI device.
- *
- * This function spawns vdpa device out of a given PCI device.
- *
- * @param[in] pci_drv
- *   PCI driver structure (mlx5_vpda_driver).
- * @param[in] pci_dev
- *   PCI device information.
- *
- * @return
- *   0 on success, 1 to skip this driver, a negative errno value otherwise
- *   and rte_errno is set.
- */
 static int
-mlx5_vdpa_pci_probe(struct rte_pci_driver *pci_drv __rte_unused,
-		    struct rte_pci_device *pci_dev __rte_unused)
+mlx5_vdpa_dev_probe(struct rte_device *dev)
 {
 	struct ibv_device *ibv;
 	struct mlx5_vdpa_priv *priv = NULL;
 	struct ibv_context *ctx = NULL;
 	struct mlx5_hca_attr attr;
+	int retry;
 	int ret;
 
-	ibv = mlx5_os_get_ibv_device(&pci_dev->addr);
-	if (!ibv) {
-		DRV_LOG(ERR, "No matching IB device for PCI slot "
-			PCI_PRI_FMT ".", pci_dev->addr.domain,
-			pci_dev->addr.bus, pci_dev->addr.devid,
-			pci_dev->addr.function);
+	if (mlx5_vdpa_roce_disable(dev) != 0) {
+		DRV_LOG(WARNING, "Failed to disable ROCE for \"%s\".",
+			dev->name);
 		return -rte_errno;
-	} else {
-		DRV_LOG(INFO, "PCI information matches for device \"%s\".",
-			ibv->name);
 	}
-	if (mlx5_vdpa_roce_disable(&pci_dev->addr, &ibv) != 0) {
-		DRV_LOG(WARNING, "Failed to disable ROCE for \"%s\".",
-			ibv->name);
+	/* Wait for the IB device to appear again after reload. */
+	for (retry = MLX5_VDPA_MAX_RETRIES; retry > 0; --retry) {
+		ibv = mlx5_os_get_ibv_dev(dev);
+		if (ibv != NULL)
+			break;
+		usleep(MLX5_VDPA_USEC);
+	}
+	if (ibv == NULL) {
+		DRV_LOG(ERR, "Cannot get IB device after disabling RoCE for "
+				"\"%s\", retries exceed %d.",
+				dev->name, MLX5_VDPA_MAX_RETRIES);
+		rte_errno = EAGAIN;
 		return -rte_errno;
 	}
 	ctx = mlx5_glue->dv_open_device(ibv);
@@ -722,20 +689,18 @@ mlx5_vdpa_pci_probe(struct rte_pci_driver *pci_drv __rte_unused,
 	if (attr.num_lag_ports == 0)
 		priv->num_lag_ports = 1;
 	priv->ctx = ctx;
-	priv->pci_dev = pci_dev;
 	priv->var = mlx5_glue->dv_alloc_var(ctx, 0);
 	if (!priv->var) {
 		DRV_LOG(ERR, "Failed to allocate VAR %u.", errno);
 		goto error;
 	}
-	priv->vdev = rte_vdpa_register_device(&pci_dev->device,
-			&mlx5_vdpa_ops);
+	priv->vdev = rte_vdpa_register_device(dev, &mlx5_vdpa_ops);
 	if (priv->vdev == NULL) {
 		DRV_LOG(ERR, "Failed to register vDPA device.");
 		rte_errno = rte_errno ? rte_errno : EINVAL;
 		goto error;
 	}
-	mlx5_vdpa_config_get(pci_dev->device.devargs, priv);
+	mlx5_vdpa_config_get(dev->devargs, priv);
 	SLIST_INIT(&priv->mr_list);
 	pthread_mutex_init(&priv->vq_config_lock, NULL);
 	pthread_mutex_lock(&priv_list_lock);
@@ -754,26 +719,15 @@ mlx5_vdpa_pci_probe(struct rte_pci_driver *pci_drv __rte_unused,
 	return -rte_errno;
 }
 
-/**
- * DPDK callback to remove a PCI device.
- *
- * This function removes all vDPA devices belong to a given PCI device.
- *
- * @param[in] pci_dev
- *   Pointer to the PCI device.
- *
- * @return
- *   0 on success, the function cannot fail.
- */
 static int
-mlx5_vdpa_pci_remove(struct rte_pci_device *pci_dev)
+mlx5_vdpa_dev_remove(struct rte_device *dev)
 {
 	struct mlx5_vdpa_priv *priv = NULL;
 	int found = 0;
 
 	pthread_mutex_lock(&priv_list_lock);
 	TAILQ_FOREACH(priv, &priv_list, next) {
-		if (!rte_pci_addr_cmp(&priv->pci_dev->addr, &pci_dev->addr)) {
+		if (priv->vdev->device == dev) {
 			found = 1;
 			break;
 		}
@@ -831,17 +785,12 @@ static const struct rte_pci_id mlx5_vdpa_pci_id_map[] = {
 	}
 };
 
-static struct mlx5_pci_driver mlx5_vdpa_driver = {
-	.driver_class = MLX5_CLASS_VDPA,
-	.pci_driver = {
-		.driver = {
-			.name = RTE_STR(MLX5_VDPA_DRIVER_NAME),
-		},
-		.id_table = mlx5_vdpa_pci_id_map,
-		.probe = mlx5_vdpa_pci_probe,
-		.remove = mlx5_vdpa_pci_remove,
-		.drv_flags = 0,
-	},
+static struct mlx5_class_driver mlx5_vdpa_driver = {
+	.drv_class = MLX5_CLASS_VDPA,
+	.name = RTE_STR(MLX5_VDPA_DRIVER_NAME),
+	.id_table = mlx5_vdpa_pci_id_map,
+	.probe = mlx5_vdpa_dev_probe,
+	.remove = mlx5_vdpa_dev_remove,
 };
 
 RTE_LOG_REGISTER_DEFAULT(mlx5_vdpa_logtype, NOTICE)
@@ -853,7 +802,7 @@ RTE_INIT(rte_mlx5_vdpa_init)
 {
 	mlx5_common_init();
 	if (mlx5_glue)
-		mlx5_pci_driver_register(&mlx5_vdpa_driver);
+		mlx5_class_driver_register(&mlx5_vdpa_driver);
 }
 
 RTE_PMD_EXPORT_NAME(MLX5_VDPA_DRIVER_NAME, __COUNTER__);
diff --git a/drivers/vdpa/mlx5/mlx5_vdpa.h b/drivers/vdpa/mlx5/mlx5_vdpa.h
index 722c72b65e..2a04e36607 100644
--- a/drivers/vdpa/mlx5/mlx5_vdpa.h
+++ b/drivers/vdpa/mlx5/mlx5_vdpa.h
@@ -133,7 +133,6 @@ struct mlx5_vdpa_priv {
 	struct rte_vdpa_device *vdev; /* vDPA device. */
 	int vid; /* vhost device id. */
 	struct ibv_context *ctx; /* Device context. */
-	struct rte_pci_device *pci_dev;
 	struct mlx5_hca_vdpa_attr caps;
 	uint32_t pdn; /* Protection Domain number. */
 	struct ibv_pd *pd;
-- 
2.25.1



* [dpdk-dev] [PATCH v1 12/14] vdpa/mlx5: support SubFunction
  2021-05-27 13:37 ` [dpdk-dev] [RFC 01/14] common/mlx5: add common device driver Xueming Li
                     ` (12 preceding siblings ...)
  2021-06-16  4:09   ` [dpdk-dev] [PATCH v1 11/14] vdpa/mlx5: remove PCI specifics Xueming Li
@ 2021-06-16  4:09   ` Xueming Li
  2021-06-16  4:09   ` [dpdk-dev] [PATCH v1 13/14] compress/mlx5: migrate to common driver Xueming Li
  2021-06-16  4:09   ` [dpdk-dev] [PATCH v1 14/14] common/mlx5: clean up legacy PCI bus driver Xueming Li
  15 siblings, 0 replies; 42+ messages in thread
From: Xueming Li @ 2021-06-16  4:09 UTC (permalink / raw)
  To: Viacheslav Ovsiienko; +Cc: Thomas Monjalon, dev, xuemingl, Matan Azrad

From: Thomas Monjalon <thomas@monjalon.net>

Support SubFunction on the auxiliary bus. SF probe devargs:
  auxiliary:mlx5_core.sf.<id>,class=vdpa
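
As an illustration of the name format above, the SF number can be pulled
out of an auxiliary device name with a strict match. The helper below is
hypothetical and assumes the "auxiliary:" bus prefix and any trailing
",key=value" arguments have already been stripped by the caller.

```c
#include <stdio.h>

/*
 * Extract <id> from a name of the form "mlx5_core.sf.<id>".
 * Returns 0 and stores the id on success, -1 if the name does not match.
 */
static int
sf_id_from_aux_name(const char *name, unsigned int *id)
{
	unsigned int value;
	char trailing;

	/* %c catches trailing characters: only %u may be converted. */
	if (sscanf(name, "mlx5_core.sf.%u%c", &value, &trailing) != 1)
		return -1;
	*id = value;
	return 0;
}
```

A PCI address such as 0000:03:00.2 fails the match, which is one simple
way to tell the two devargs forms apart in a sketch like this.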

Signed-off-by: Thomas Monjalon <thomas@monjalon.net>
---
 doc/guides/vdpadevs/mlx5.rst  | 10 ++++++++++
 drivers/vdpa/mlx5/mlx5_vdpa.c |  8 ++++++--
 2 files changed, 16 insertions(+), 2 deletions(-)

diff --git a/doc/guides/vdpadevs/mlx5.rst b/doc/guides/vdpadevs/mlx5.rst
index 9b2f9f12c7..e81dbd0004 100644
--- a/doc/guides/vdpadevs/mlx5.rst
+++ b/doc/guides/vdpadevs/mlx5.rst
@@ -162,6 +162,16 @@ Driver options
 
   - 0, HW default.
 
+Devargs example
+^^^^^^^^^^^^^^^
+
+- PCI devargs:
+
+  -a 0000:03:00.2,class=vdpa
+
+- Auxiliary devargs:
+
+  -a auxiliary:mlx5_core.sf.2,class=vdpa
 
 Error handling
 ^^^^^^^^^^^^^^
diff --git a/drivers/vdpa/mlx5/mlx5_vdpa.c b/drivers/vdpa/mlx5/mlx5_vdpa.c
index 9c9a552ba0..6d17d7a6f3 100644
--- a/drivers/vdpa/mlx5/mlx5_vdpa.c
+++ b/drivers/vdpa/mlx5/mlx5_vdpa.c
@@ -553,9 +553,13 @@ mlx5_vdpa_sys_roce_disable(const char *addr)
 static int
 mlx5_vdpa_roce_disable(struct rte_device *dev)
 {
+	char pci_addr[PCI_PRI_STR_SIZE] = { 0 };
+
+	if (mlx5_dev_to_pci_str(dev, pci_addr, sizeof(pci_addr)) < 0)
+		return -rte_errno;
 	/* Firstly try to disable ROCE by Netlink and fallback to sysfs. */
-	if (mlx5_vdpa_nl_roce_disable(dev->name) != 0 &&
-	    mlx5_vdpa_sys_roce_disable(dev->name) != 0)
+	if (mlx5_vdpa_nl_roce_disable(pci_addr) != 0 &&
+	    mlx5_vdpa_sys_roce_disable(pci_addr) != 0)
 		return -rte_errno;
 	return 0;
 }
-- 
2.25.1



* [dpdk-dev] [PATCH v1 13/14] compress/mlx5: migrate to common driver
  2021-05-27 13:37 ` [dpdk-dev] [RFC 01/14] common/mlx5: add common device driver Xueming Li
                     ` (13 preceding siblings ...)
  2021-06-16  4:09   ` [dpdk-dev] [PATCH v1 12/14] vdpa/mlx5: support SubFunction Xueming Li
@ 2021-06-16  4:09   ` Xueming Li
  2021-06-16  4:09   ` [dpdk-dev] [PATCH v1 14/14] common/mlx5: clean up legacy PCI bus driver Xueming Li
  15 siblings, 0 replies; 42+ messages in thread
From: Xueming Li @ 2021-06-16  4:09 UTC (permalink / raw)
  To: Viacheslav Ovsiienko; +Cc: dev, xuemingl, Fiona Trahe, Ashish Gupta

To support the auxiliary bus, migrate the driver to the mlx5 common class
driver structure.

Signed-off-by: Xueming Li <xuemingl@nvidia.com>
---
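Reader's note (not part of the patch): after this change the remove callback
finds its private data by comparing generic device pointers instead of PCI
addresses, so the same lookup works for any bus. The following is a minimal
standalone sketch of that lookup pattern. The fake_device and fake_priv
types and priv_list_detach() are hypothetical stand-ins for struct
rte_device, struct mlx5_compress_priv and the list handling in
mlx5_compress_dev_remove(); locking is omitted.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/queue.h>

/* Hypothetical stand-in for struct rte_device. */
struct fake_device {
	const char *name;
};

/* Hypothetical stand-in for the driver's private data. */
struct fake_priv {
	TAILQ_ENTRY(fake_priv) next;
	struct fake_device *dev; /* Generic device this instance belongs to. */
};

TAILQ_HEAD(fake_priv_list, fake_priv);

/*
 * Find the entry bound to 'dev' by pointer identity and unlink it.
 * Returns the detached entry, or NULL if no entry matches.
 */
static struct fake_priv *
priv_list_detach(struct fake_priv_list *list, const struct fake_device *dev)
{
	struct fake_priv *priv;

	assert(list != NULL);
	TAILQ_FOREACH(priv, list, next)
		if (priv->dev == dev)
			break;
	/* TAILQ_FOREACH leaves priv NULL when the list is exhausted. */
	if (priv != NULL)
		TAILQ_REMOVE(list, priv, next);
	return priv;
}
```

Pointer identity is sufficient here because, in this sketch, each private
entry stores the same device object that is later passed to the remove path.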
 drivers/compress/mlx5/mlx5_compress.c | 71 ++++++---------------------
 1 file changed, 15 insertions(+), 56 deletions(-)

diff --git a/drivers/compress/mlx5/mlx5_compress.c b/drivers/compress/mlx5/mlx5_compress.c
index 80c564f10b..77f426e399 100644
--- a/drivers/compress/mlx5/mlx5_compress.c
+++ b/drivers/compress/mlx5/mlx5_compress.c
@@ -5,7 +5,7 @@
 #include <rte_malloc.h>
 #include <rte_log.h>
 #include <rte_errno.h>
-#include <rte_pci.h>
+#include <rte_bus_pci.h>
 #include <rte_spinlock.h>
 #include <rte_comp.h>
 #include <rte_compressdev.h>
@@ -13,7 +13,6 @@
 
 #include <mlx5_glue.h>
 #include <mlx5_common.h>
-#include <mlx5_common_pci.h>
 #include <mlx5_devx_cmds.h>
 #include <mlx5_common_os.h>
 #include <mlx5_common_devx.h>
@@ -37,7 +36,6 @@ struct mlx5_compress_xform {
 struct mlx5_compress_priv {
 	TAILQ_ENTRY(mlx5_compress_priv) next;
 	struct ibv_context *ctx; /* Device context. */
-	struct rte_pci_device *pci_dev;
 	struct rte_compressdev *cdev;
 	void *uar;
 	uint32_t pdn; /* Protection Domain number. */
@@ -711,23 +709,8 @@ mlx5_compress_hw_global_prepare(struct mlx5_compress_priv *priv)
 	return 0;
 }
 
-/**
- * DPDK callback to register a PCI device.
- *
- * This function spawns compress device out of a given PCI device.
- *
- * @param[in] pci_drv
- *   PCI driver structure (mlx5_compress_driver).
- * @param[in] pci_dev
- *   PCI device information.
- *
- * @return
- *   0 on success, 1 to skip this driver, a negative errno value otherwise
- *   and rte_errno is set.
- */
 static int
-mlx5_compress_pci_probe(struct rte_pci_driver *pci_drv,
-			struct rte_pci_device *pci_dev)
+mlx5_compress_dev_probe(struct rte_device *dev)
 {
 	struct ibv_device *ibv;
 	struct rte_compressdev *cdev;
@@ -736,24 +719,17 @@ mlx5_compress_pci_probe(struct rte_pci_driver *pci_drv,
 	struct mlx5_hca_attr att = { 0 };
 	struct rte_compressdev_pmd_init_params init_params = {
 		.name = "",
-		.socket_id = pci_dev->device.numa_node,
+		.socket_id = dev->numa_node,
 	};
 
-	RTE_SET_USED(pci_drv);
 	if (rte_eal_process_type() != RTE_PROC_PRIMARY) {
 		DRV_LOG(ERR, "Non-primary process type is not supported.");
 		rte_errno = ENOTSUP;
 		return -rte_errno;
 	}
-	ibv = mlx5_os_get_ibv_device(&pci_dev->addr);
-	if (ibv == NULL) {
-		DRV_LOG(ERR, "No matching IB device for PCI slot "
-			PCI_PRI_FMT ".", pci_dev->addr.domain,
-			pci_dev->addr.bus, pci_dev->addr.devid,
-			pci_dev->addr.function);
+	ibv = mlx5_os_get_ibv_dev(dev);
+	if (ibv == NULL)
 		return -rte_errno;
-	}
-	DRV_LOG(INFO, "PCI information matches for device \"%s\".", ibv->name);
 	ctx = mlx5_glue->dv_open_device(ibv);
 	if (ctx == NULL) {
 		DRV_LOG(ERR, "Failed to open IB device \"%s\".", ibv->name);
@@ -769,7 +745,7 @@ mlx5_compress_pci_probe(struct rte_pci_driver *pci_drv,
 		rte_errno = ENOTSUP;
 		return -ENOTSUP;
 	}
-	cdev = rte_compressdev_pmd_create(ibv->name, &pci_dev->device,
+	cdev = rte_compressdev_pmd_create(ibv->name, dev,
 					  sizeof(*priv), &init_params);
 	if (cdev == NULL) {
 		DRV_LOG(ERR, "Failed to create device \"%s\".", ibv->name);
@@ -784,7 +760,6 @@ mlx5_compress_pci_probe(struct rte_pci_driver *pci_drv,
 	cdev->feature_flags = RTE_COMPDEV_FF_HW_ACCELERATED;
 	priv = cdev->data->dev_private;
 	priv->ctx = ctx;
-	priv->pci_dev = pci_dev;
 	priv->cdev = cdev;
 	priv->min_block_size = att.compress_min_block_size;
 	priv->sq_ts_format = att.sq_ts_format;
@@ -810,25 +785,14 @@ mlx5_compress_pci_probe(struct rte_pci_driver *pci_drv,
 	return 0;
 }
 
-/**
- * DPDK callback to remove a PCI device.
- *
- * This function removes all compress devices belong to a given PCI device.
- *
- * @param[in] pci_dev
- *   Pointer to the PCI device.
- *
- * @return
- *   0 on success, the function cannot fail.
- */
 static int
-mlx5_compress_pci_remove(struct rte_pci_device *pdev)
+mlx5_compress_dev_remove(struct rte_device *dev)
 {
 	struct mlx5_compress_priv *priv = NULL;
 
 	pthread_mutex_lock(&priv_list_lock);
 	TAILQ_FOREACH(priv, &mlx5_compress_priv_list, next)
-		if (rte_pci_addr_cmp(&priv->pci_dev->addr, &pdev->addr) != 0)
+		if (priv->cdev->device == dev)
 			break;
 	if (priv)
 		TAILQ_REMOVE(&mlx5_compress_priv_list, priv, next);
@@ -852,24 +816,19 @@ static const struct rte_pci_id mlx5_compress_pci_id_map[] = {
 	}
 };
 
-static struct mlx5_pci_driver mlx5_compress_driver = {
-	.driver_class = MLX5_CLASS_COMPRESS,
-	.pci_driver = {
-		.driver = {
-			.name = RTE_STR(MLX5_COMPRESS_DRIVER_NAME),
-		},
-		.id_table = mlx5_compress_pci_id_map,
-		.probe = mlx5_compress_pci_probe,
-		.remove = mlx5_compress_pci_remove,
-		.drv_flags = 0,
-	},
+static struct mlx5_class_driver mlx5_compress_driver = {
+	.drv_class = MLX5_CLASS_COMPRESS,
+	.name = RTE_STR(MLX5_COMPRESS_DRIVER_NAME),
+	.id_table = mlx5_compress_pci_id_map,
+	.probe = mlx5_compress_dev_probe,
+	.remove = mlx5_compress_dev_remove,
 };
 
 RTE_INIT(rte_mlx5_compress_init)
 {
 	mlx5_common_init();
 	if (mlx5_glue != NULL)
-		mlx5_pci_driver_register(&mlx5_compress_driver);
+		mlx5_class_driver_register(&mlx5_compress_driver);
 }
 
 RTE_LOG_REGISTER_DEFAULT(mlx5_compress_logtype, NOTICE)
-- 
2.25.1



* [dpdk-dev] [PATCH v1 14/14] common/mlx5: clean up legacy PCI bus driver
  2021-05-27 13:37 ` [dpdk-dev] [RFC 01/14] common/mlx5: add common device driver Xueming Li
                     ` (14 preceding siblings ...)
  2021-06-16  4:09   ` [dpdk-dev] [PATCH v1 13/14] compress/mlx5: migrate to common driver Xueming Li
@ 2021-06-16  4:09   ` Xueming Li
  15 siblings, 0 replies; 42+ messages in thread
From: Xueming Li @ 2021-06-16  4:09 UTC (permalink / raw)
  To: Viacheslav Ovsiienko
  Cc: dev, xuemingl, Matan Azrad, Shahaf Shuler, Ray Kinsella, Neil Horman

Remove the legacy PCI bus driver now that all mlx5 PMDs have moved to the
new common PCI bus driver.

Signed-off-by: Xueming Li <xuemingl@nvidia.com>
---
 drivers/common/mlx5/linux/mlx5_common_os.h |   1 -
 drivers/common/mlx5/mlx5_common.c          |   1 -
 drivers/common/mlx5/mlx5_common.h          |   1 +
 drivers/common/mlx5/mlx5_common_pci.c      | 433 +--------------------
 drivers/common/mlx5/mlx5_common_pci.h      |  77 ----
 drivers/common/mlx5/mlx5_common_private.h  |   1 +
 drivers/common/mlx5/version.map            |   3 -
 7 files changed, 4 insertions(+), 513 deletions(-)
 delete mode 100644 drivers/common/mlx5/mlx5_common_pci.h

diff --git a/drivers/common/mlx5/linux/mlx5_common_os.h b/drivers/common/mlx5/linux/mlx5_common_os.h
index 86d0cb09b0..2b03bf811e 100644
--- a/drivers/common/mlx5/linux/mlx5_common_os.h
+++ b/drivers/common/mlx5/linux/mlx5_common_os.h
@@ -289,7 +289,6 @@ mlx5_os_free(void *addr)
 	free(addr);
 }
 
-__rte_internal
 struct ibv_device *
 mlx5_os_get_ibv_device(const struct rte_pci_addr *addr);
 
diff --git a/drivers/common/mlx5/mlx5_common.c b/drivers/common/mlx5/mlx5_common.c
index 4ff13cb461..6c83cf4bcd 100644
--- a/drivers/common/mlx5/mlx5_common.c
+++ b/drivers/common/mlx5/mlx5_common.c
@@ -14,7 +14,6 @@
 #include "mlx5_common.h"
 #include "mlx5_common_os.h"
 #include "mlx5_common_log.h"
-#include "mlx5_common_pci.h"
 #include "mlx5_common_private.h"
 
 uint8_t haswell_broadwell_cpu;
diff --git a/drivers/common/mlx5/mlx5_common.h b/drivers/common/mlx5/mlx5_common.h
index 7d7b896517..85855345a9 100644
--- a/drivers/common/mlx5/mlx5_common.h
+++ b/drivers/common/mlx5/mlx5_common.h
@@ -10,6 +10,7 @@
 #include <rte_pci.h>
 #include <rte_debug.h>
 #include <rte_atomic.h>
+#include <rte_rwlock.h>
 #include <rte_log.h>
 #include <rte_kvargs.h>
 #include <rte_devargs.h>
diff --git a/drivers/common/mlx5/mlx5_common_pci.c b/drivers/common/mlx5/mlx5_common_pci.c
index 6fe28defbf..8b38091d87 100644
--- a/drivers/common/mlx5/mlx5_common_pci.c
+++ b/drivers/common/mlx5/mlx5_common_pci.c
@@ -8,431 +8,17 @@
 #include <rte_devargs.h>
 #include <rte_errno.h>
 #include <rte_class.h>
+#include <rte_pci.h>
+#include <rte_bus_pci.h>
 
 #include "mlx5_common_log.h"
-#include "mlx5_common_pci.h"
 #include "mlx5_common_private.h"
 
 static struct rte_pci_driver mlx5_common_pci_driver;
 
-/********** Legacy PCI bus driver, to be removed ********/
-
-struct mlx5_pci_device {
-	struct rte_pci_device *pci_dev;
-	TAILQ_ENTRY(mlx5_pci_device) next;
-	uint32_t classes_loaded;
-};
-
-/* Head of list of drivers. */
-static TAILQ_HEAD(mlx5_pci_bus_drv_head, mlx5_pci_driver) drv_list =
-				TAILQ_HEAD_INITIALIZER(drv_list);
-
-/* Head of mlx5 pci devices. */
-static TAILQ_HEAD(mlx5_pci_devices_head, mlx5_pci_device) devices_list =
-				TAILQ_HEAD_INITIALIZER(devices_list);
-
-static const struct {
-	const char *name;
-	unsigned int driver_class;
-} mlx5_classes[] = {
-	{ .name = "vdpa", .driver_class = MLX5_CLASS_VDPA },
-	{ .name = "net", .driver_class = MLX5_CLASS_NET },
-	{ .name = "regex", .driver_class = MLX5_CLASS_REGEX },
-	{ .name = "compress", .driver_class = MLX5_CLASS_COMPRESS },
-};
-
-static const unsigned int mlx5_class_combinations[] = {
-	MLX5_CLASS_NET,
-	MLX5_CLASS_VDPA,
-	MLX5_CLASS_REGEX,
-	MLX5_CLASS_COMPRESS,
-	MLX5_CLASS_NET | MLX5_CLASS_REGEX,
-	MLX5_CLASS_VDPA | MLX5_CLASS_REGEX,
-	MLX5_CLASS_NET | MLX5_CLASS_COMPRESS,
-	MLX5_CLASS_VDPA | MLX5_CLASS_COMPRESS,
-	MLX5_CLASS_REGEX | MLX5_CLASS_COMPRESS,
-	MLX5_CLASS_NET | MLX5_CLASS_REGEX | MLX5_CLASS_COMPRESS,
-	MLX5_CLASS_VDPA | MLX5_CLASS_REGEX | MLX5_CLASS_COMPRESS,
-	/* New class combination should be added here. */
-};
-
-static int
-class_name_to_value(const char *class_name)
-{
-	unsigned int i;
-
-	for (i = 0; i < RTE_DIM(mlx5_classes); i++) {
-		if (strcmp(class_name, mlx5_classes[i].name) == 0)
-			return mlx5_classes[i].driver_class;
-	}
-	return -EINVAL;
-}
-
-static struct mlx5_pci_driver *
-driver_get(uint32_t class)
-{
-	struct mlx5_pci_driver *driver;
-
-	TAILQ_FOREACH(driver, &drv_list, next) {
-		if (driver->driver_class == class)
-			return driver;
-	}
-	return NULL;
-}
-
-static int
-bus_cmdline_options_handler(__rte_unused const char *key,
-			    const char *class_names, void *opaque)
-{
-	int *ret = opaque;
-	char *nstr_org;
-	int class_val;
-	char *found;
-	char *nstr;
-	char *refstr = NULL;
-
-	*ret = 0;
-	nstr = strdup(class_names);
-	if (!nstr) {
-		*ret = -ENOMEM;
-		return *ret;
-	}
-	nstr_org = nstr;
-	found = strtok_r(nstr, ":", &refstr);
-	if (!found)
-		goto err;
-	do {
-		/* Extract each individual class name. Multiple
-		 * class key,value is supplied as class=net:vdpa:foo:bar.
-		 */
-		class_val = class_name_to_value(found);
-		/* Check if its a valid class. */
-		if (class_val < 0) {
-			*ret = -EINVAL;
-			goto err;
-		}
-		*ret |= class_val;
-		found = strtok_r(NULL, ":", &refstr);
-	} while (found);
-err:
-	free(nstr_org);
-	if (*ret < 0)
-		DRV_LOG(ERR, "Invalid mlx5 class options %s."
-			" Maybe typo in device class argument setting?",
-			class_names);
-	return *ret;
-}
-
-static int
-parse_class_options(const struct rte_devargs *devargs)
-{
-	const char *key = RTE_DEVARGS_KEY_CLASS;
-	struct rte_kvargs *kvlist;
-	int ret = 0;
-
-	if (devargs == NULL)
-		return 0;
-	kvlist = rte_kvargs_parse(devargs->args, NULL);
-	if (kvlist == NULL)
-		return 0;
-	if (rte_kvargs_count(kvlist, key))
-		rte_kvargs_process(kvlist, key, bus_cmdline_options_handler,
-				   &ret);
-	rte_kvargs_free(kvlist);
-	return ret;
-}
-
-static bool
-mlx5_bus_match(const struct mlx5_pci_driver *drv,
-	       const struct rte_pci_device *pci_dev)
-{
-	const struct rte_pci_id *id_table;
-
-	for (id_table = drv->pci_driver.id_table; id_table->vendor_id != 0;
-	     id_table++) {
-		/* Check if device's ids match the class driver's ids. */
-		if (id_table->vendor_id != pci_dev->id.vendor_id &&
-		    id_table->vendor_id != RTE_PCI_ANY_ID)
-			continue;
-		if (id_table->device_id != pci_dev->id.device_id &&
-		    id_table->device_id != RTE_PCI_ANY_ID)
-			continue;
-		if (id_table->subsystem_vendor_id !=
-		    pci_dev->id.subsystem_vendor_id &&
-		    id_table->subsystem_vendor_id != RTE_PCI_ANY_ID)
-			continue;
-		if (id_table->subsystem_device_id !=
-		    pci_dev->id.subsystem_device_id &&
-		    id_table->subsystem_device_id != RTE_PCI_ANY_ID)
-			continue;
-		if (id_table->class_id != pci_dev->id.class_id &&
-		    id_table->class_id != RTE_CLASS_ANY_ID)
-			continue;
-		return true;
-	}
-	return false;
-}
-
-static int
-is_valid_class_combination(uint32_t user_classes)
-{
-	unsigned int i;
-
-	/* Verify if user specified valid supported combination. */
-	for (i = 0; i < RTE_DIM(mlx5_class_combinations); i++) {
-		if (mlx5_class_combinations[i] == user_classes)
-			return 0;
-	}
-	/* Not found any valid class combination. */
-	return -EINVAL;
-}
-
-static struct mlx5_pci_device *
-pci_to_mlx5_device(const struct rte_pci_device *pci_dev)
-{
-	struct mlx5_pci_device *dev;
-
-	TAILQ_FOREACH(dev, &devices_list, next) {
-		if (dev->pci_dev == pci_dev)
-			return dev;
-	}
-	return NULL;
-}
-
-static bool
-device_class_enabled(const struct mlx5_pci_device *device, uint32_t class)
-{
-	return (device->classes_loaded & class) ? true : false;
-}
-
-static void
-dev_release(struct mlx5_pci_device *dev)
-{
-	TAILQ_REMOVE(&devices_list, dev, next);
-	rte_free(dev);
-}
-
-static int
-drivers_remove(struct mlx5_pci_device *dev, uint32_t enabled_classes)
-{
-	struct mlx5_pci_driver *driver;
-	int local_ret = -ENODEV;
-	unsigned int i = 0;
-	int ret = 0;
-
-	enabled_classes &= dev->classes_loaded;
-	while (enabled_classes) {
-		driver = driver_get(RTE_BIT64(i));
-		if (driver) {
-			local_ret = driver->pci_driver.remove(dev->pci_dev);
-			if (!local_ret)
-				dev->classes_loaded &= ~RTE_BIT64(i);
-			else if (ret == 0)
-				ret = local_ret;
-		}
-		enabled_classes &= ~RTE_BIT64(i);
-		i++;
-	}
-	if (local_ret)
-		ret = local_ret;
-	return ret;
-}
-
-static int
-drivers_probe(struct mlx5_pci_device *dev, struct rte_pci_driver *pci_drv,
-	      struct rte_pci_device *pci_dev, uint32_t user_classes)
-{
-	struct mlx5_pci_driver *driver;
-	uint32_t enabled_classes = 0;
-	bool already_loaded;
-	int ret;
-
-	TAILQ_FOREACH(driver, &drv_list, next) {
-		if ((driver->driver_class & user_classes) == 0)
-			continue;
-		if (!mlx5_bus_match(driver, pci_dev))
-			continue;
-		already_loaded = dev->classes_loaded & driver->driver_class;
-		if (already_loaded &&
-		    !(driver->pci_driver.drv_flags & RTE_PCI_DRV_PROBE_AGAIN)) {
-			DRV_LOG(ERR, "Device %s is already probed",
-				pci_dev->device.name);
-			ret = -EEXIST;
-			goto probe_err;
-		}
-		ret = driver->pci_driver.probe(pci_drv, pci_dev);
-		if (ret < 0) {
-			DRV_LOG(ERR, "Failed to load driver %s",
-				driver->pci_driver.driver.name);
-			goto probe_err;
-		}
-		enabled_classes |= driver->driver_class;
-	}
-	dev->classes_loaded |= enabled_classes;
-	return 0;
-probe_err:
-	/* Only unload drivers which are enabled which were enabled
-	 * in this probe instance.
-	 */
-	drivers_remove(dev, enabled_classes);
-	return ret;
-}
-
-/**
- * DPDK callback to register to probe multiple drivers for a PCI device.
- *
- * @param[in] pci_drv
- *   PCI driver structure.
- * @param[in] dev
- *   PCI device information.
- *
- * @return
- *   0 on success, a negative errno value otherwise and rte_errno is set.
- */
-static int
-mlx5_pci_probe(struct rte_pci_driver *pci_drv __rte_unused,
-	       struct rte_pci_device *pci_dev)
-{
-	struct mlx5_pci_device *dev;
-	uint32_t user_classes = 0;
-	bool new_device = false;
-	int ret;
-
-	ret = parse_class_options(pci_dev->device.devargs);
-	if (ret < 0)
-		return ret;
-	user_classes = ret;
-	if (user_classes) {
-		/* Validate combination here. */
-		ret = is_valid_class_combination(user_classes);
-		if (ret) {
-			DRV_LOG(ERR, "Unsupported mlx5 classes supplied.");
-			return ret;
-		}
-	} else {
-		/* Default to net class. */
-		user_classes = MLX5_CLASS_NET;
-	}
-	dev = pci_to_mlx5_device(pci_dev);
-	if (!dev) {
-		dev = rte_zmalloc("mlx5_pci_device", sizeof(*dev), 0);
-		if (!dev)
-			return -ENOMEM;
-		dev->pci_dev = pci_dev;
-		TAILQ_INSERT_HEAD(&devices_list, dev, next);
-		new_device = true;
-	}
-	ret = drivers_probe(dev, pci_drv, pci_dev, user_classes);
-	if (ret)
-		goto class_err;
-	return 0;
-class_err:
-	if (new_device)
-		dev_release(dev);
-	return ret;
-}
-
-/**
- * DPDK callback to remove one or more drivers for a PCI device.
- *
- * This function removes all drivers probed for a given PCI device.
- *
- * @param[in] pci_dev
- *   Pointer to the PCI device.
- *
- * @return
- *   0 on success, the function cannot fail.
- */
-static int
-mlx5_pci_remove(struct rte_pci_device *pci_dev)
-{
-	struct mlx5_pci_device *dev;
-	int ret;
-
-	dev = pci_to_mlx5_device(pci_dev);
-	if (!dev)
-		return -ENODEV;
-	/* Matching device found, cleanup and unload drivers. */
-	ret = drivers_remove(dev, dev->classes_loaded);
-	if (!ret)
-		dev_release(dev);
-	return ret;
-}
-
-static int
-mlx5_pci_dma_map(struct rte_pci_device *pci_dev, void *addr,
-		 uint64_t iova, size_t len)
-{
-	struct mlx5_pci_driver *driver = NULL;
-	struct mlx5_pci_driver *temp;
-	struct mlx5_pci_device *dev;
-	int ret = -EINVAL;
-
-	dev = pci_to_mlx5_device(pci_dev);
-	if (!dev)
-		return -ENODEV;
-	TAILQ_FOREACH(driver, &drv_list, next) {
-		if (device_class_enabled(dev, driver->driver_class) &&
-		    driver->pci_driver.dma_map) {
-			ret = driver->pci_driver.dma_map(pci_dev, addr,
-							 iova, len);
-			if (ret)
-				goto map_err;
-		}
-	}
-	return ret;
-map_err:
-	TAILQ_FOREACH(temp, &drv_list, next) {
-		if (temp == driver)
-			break;
-		if (device_class_enabled(dev, temp->driver_class) &&
-		    temp->pci_driver.dma_map && temp->pci_driver.dma_unmap)
-			temp->pci_driver.dma_unmap(pci_dev, addr, iova, len);
-	}
-	return ret;
-}
-
-static int
-mlx5_pci_dma_unmap(struct rte_pci_device *pci_dev, void *addr,
-		   uint64_t iova, size_t len)
-{
-	struct mlx5_pci_driver *driver;
-	struct mlx5_pci_device *dev;
-	int local_ret = -EINVAL;
-	int ret;
-
-	dev = pci_to_mlx5_device(pci_dev);
-	if (!dev)
-		return -ENODEV;
-	ret = 0;
-	/* There is no unmap error recovery in current implementation. */
-	TAILQ_FOREACH_REVERSE(driver, &drv_list, mlx5_pci_bus_drv_head, next) {
-		if (device_class_enabled(dev, driver->driver_class) &&
-		    driver->pci_driver.dma_unmap) {
-			local_ret = driver->pci_driver.dma_unmap(pci_dev, addr,
-								 iova, len);
-			if (local_ret && (ret == 0))
-				ret = local_ret;
-		}
-	}
-	if (local_ret)
-		ret = local_ret;
-	return ret;
-}
-
 /* PCI ID table is build dynamically based on registered mlx5 drivers. */
 static struct rte_pci_id *mlx5_pci_id_table;
 
-static struct rte_pci_driver mlx5_pci_driver = {
-	.driver = {
-		.name = MLX5_PCI_DRIVER_NAME,
-	},
-	.probe = mlx5_pci_probe,
-	.remove = mlx5_pci_remove,
-	.dma_map = mlx5_pci_dma_map,
-	.dma_unmap = mlx5_pci_dma_unmap,
-};
-
 static int
 pci_id_table_size_get(const struct rte_pci_id *id_table)
 {
@@ -509,7 +95,6 @@ pci_ids_table_update(const struct rte_pci_id *driver_id_table)
 	}
 	/* Terminate table with empty entry. */
 	updated_table[i].vendor_id = 0;
-	mlx5_pci_driver.id_table = updated_table;
 	mlx5_common_pci_driver.id_table = updated_table;
 	mlx5_pci_id_table = updated_table;
 	if (old_table)
@@ -517,20 +102,6 @@ pci_ids_table_update(const struct rte_pci_id *driver_id_table)
 	return 0;
 }
 
-void
-mlx5_pci_driver_register(struct mlx5_pci_driver *driver)
-{
-	int ret;
-
-	ret = pci_ids_table_update(driver->pci_driver.id_table);
-	if (ret)
-		return;
-	mlx5_pci_driver.drv_flags |= driver->pci_driver.drv_flags;
-	TAILQ_INSERT_TAIL(&drv_list, driver, next);
-}
-
-/********** New common PCI bus driver ********/
-
 bool
 mlx5_dev_is_pci(const struct rte_device *dev)
 {
diff --git a/drivers/common/mlx5/mlx5_common_pci.h b/drivers/common/mlx5/mlx5_common_pci.h
deleted file mode 100644
index de89bb98bc..0000000000
--- a/drivers/common/mlx5/mlx5_common_pci.h
+++ /dev/null
@@ -1,77 +0,0 @@
-/* SPDX-License-Identifier: BSD-3-Clause
- * Copyright 2020 Mellanox Technologies, Ltd
- */
-
-#ifndef _MLX5_COMMON_PCI_H_
-#define _MLX5_COMMON_PCI_H_
-
-/**
- * @file
- *
- * RTE Mellanox PCI Driver Interface
- * Mellanox ConnectX PCI device supports multiple class: net,vdpa,regex and
- * compress devices. This layer enables creating such multiple class of devices
- * on a single PCI device by allowing to bind multiple class specific device
- * driver to attach to mlx5_pci driver.
- *
- * -----------    ------------    -------------    ----------------
- * |   mlx5  |    |   mlx5   |    |   mlx5    |    |     mlx5     |
- * | net pmd |    | vdpa pmd |    | regex pmd |    | compress pmd |
- * -----------    ------------    -------------    ----------------
- *      \              \                    /              /
- *       \              \                  /              /
- *        \              \_--------------_/              /
- *         \_______________|   mlx5     |_______________/
- *                         | pci common |
- *                         --------------
- *                               |
- *                           -----------
- *                           |   mlx5  |
- *                           | pci dev |
- *                           -----------
- *
- * - mlx5 pci driver binds to mlx5 PCI devices defined by PCI
- *   ID table of all related mlx5 PCI devices.
- * - mlx5 class driver such as net, vdpa, regex PMD defines its
- *   specific PCI ID table and mlx5 bus driver probes matching
- *   class drivers.
- * - mlx5 pci bus driver is cental place that validates supported
- *   class combinations.
- */
-
-#ifdef __cplusplus
-extern "C" {
-#endif /* __cplusplus */
-
-#include <rte_pci.h>
-#include <rte_bus_pci.h>
-
-#include <mlx5_common.h>
-
-void mlx5_common_pci_init(void);
-
-/**
- * A structure describing a mlx5 pci driver.
- */
-struct mlx5_pci_driver {
-	struct rte_pci_driver pci_driver;	/**< Inherit core pci driver. */
-	uint32_t driver_class;	/**< Class of this driver, enum mlx5_class */
-	TAILQ_ENTRY(mlx5_pci_driver) next;
-};
-
-/**
- * Register a mlx5_pci device driver.
- *
- * @param driver
- *   A pointer to a mlx5_pci_driver structure describing the driver
- *   to be registered.
- */
-__rte_internal
-void
-mlx5_pci_driver_register(struct mlx5_pci_driver *driver);
-
-#ifdef __cplusplus
-}
-#endif /* __cplusplus */
-
-#endif /* _MLX5_COMMON_PCI_H_ */
diff --git a/drivers/common/mlx5/mlx5_common_private.h b/drivers/common/mlx5/mlx5_common_private.h
index 1096fa85e7..c929840408 100644
--- a/drivers/common/mlx5/mlx5_common_private.h
+++ b/drivers/common/mlx5/mlx5_common_private.h
@@ -31,6 +31,7 @@ int mlx5_common_dev_dma_unmap(struct rte_device *dev, void *addr, uint64_t iova,
 
 /* Common PCI bus driver: */
 
+void mlx5_common_pci_init(void);
 void mlx5_common_driver_on_register_pci(struct mlx5_class_driver *driver);
 bool mlx5_dev_pci_match(const struct mlx5_class_driver *drv,
 			const struct rte_device *dev);
diff --git a/drivers/common/mlx5/version.map b/drivers/common/mlx5/version.map
index ea4c49b7e7..39bc7bad23 100644
--- a/drivers/common/mlx5/version.map
+++ b/drivers/common/mlx5/version.map
@@ -130,13 +130,10 @@ INTERNAL {
 	mlx5_nl_vlan_vmwa_create; # WINDOWS_NO_EXPORT
 	mlx5_nl_vlan_vmwa_delete; # WINDOWS_NO_EXPORT
 
-	mlx5_pci_driver_register;
-
 	mlx5_os_alloc_pd;
 	mlx5_os_dealloc_pd;
 	mlx5_os_dereg_mr;
 	mlx5_os_get_ibv_dev; # WINDOWS_NO_EXPORT
-	mlx5_os_get_ibv_device; # WINDOWS_NO_EXPORT
 	mlx5_os_reg_mr;
 	mlx5_os_umem_dereg;
 	mlx5_os_umem_reg;
-- 
2.25.1



Thread overview: 42+ messages
2021-05-27 13:37 [dpdk-dev] [RFC 00/14] mlx5: support SubFunction Xueming Li
2021-05-27 13:37 ` [dpdk-dev] [RFC 01/14] common/mlx5: add common device driver Xueming Li
2021-06-10  9:51   ` Thomas Monjalon
2021-06-16  4:09   ` [dpdk-dev] [PATCH v1 00/14] net/mlx5: support Sub-Function Xueming Li
2021-06-16  4:09   ` [dpdk-dev] [PATCH v1 01/14] common/mlx5: add common device driver Xueming Li
2021-06-16  4:09   ` [dpdk-dev] [PATCH v1 02/14] common/mlx5: move description of PCI sysfs functions Xueming Li
2021-06-16  4:09   ` [dpdk-dev] [PATCH v1 03/14] common/mlx5: support auxiliary bus Xueming Li
2021-06-16  4:09   ` [dpdk-dev] [PATCH v1 04/14] common/mlx5: get PCI device address from any bus Xueming Li
2021-06-16  4:09   ` [dpdk-dev] [PATCH v1 05/14] net/mlx5: remove PCI dependency Xueming Li
2021-06-16  4:09   ` [dpdk-dev] [PATCH v1 06/14] net/mlx5: migrate to bus-agnostic common driver Xueming Li
2021-06-16  4:09   ` [dpdk-dev] [PATCH v1 07/14] net/mlx5: support SubFunction Xueming Li
2021-06-16  4:09   ` [dpdk-dev] [PATCH v1 08/14] net/mlx5: check max Verbs port number Xueming Li
2021-06-16  4:09   ` [dpdk-dev] [PATCH v1 09/14] regex/mlx5: migrate to common driver Xueming Li
2021-06-16  4:09   ` [dpdk-dev] [PATCH v1 10/14] vdpa/mlx5: define driver name as macro Xueming Li
2021-06-16  4:09   ` [dpdk-dev] [PATCH v1 11/14] vdpa/mlx5: remove PCI specifics Xueming Li
2021-06-16  4:09   ` [dpdk-dev] [PATCH v1 12/14] vdpa/mlx5: support SubFunction Xueming Li
2021-06-16  4:09   ` [dpdk-dev] [PATCH v1 13/14] compress/mlx5: migrate to common driver Xueming Li
2021-06-16  4:09   ` [dpdk-dev] [PATCH v1 14/14] common/mlx5: clean up legacy PCI bus driver Xueming Li
2021-05-27 13:37 ` [dpdk-dev] [RFC 02/14] common/mlx5: move description of PCI sysfs functions Xueming Li
2021-05-27 13:37 ` [dpdk-dev] [RFC 03/14] net/mlx5: remove PCI dependency Xueming Li
2021-05-27 13:37 ` [dpdk-dev] [RFC 04/14] net/mlx5: migrate to bus-agnostic common driver Xueming Li
2021-05-27 13:37 ` [dpdk-dev] [RFC 05/14] regex/mlx5: migrate to " Xueming Li
2021-05-27 13:37 ` [dpdk-dev] [RFC 06/14] compress/mlx5: " Xueming Li
2021-05-27 13:37 ` [dpdk-dev] [RFC 07/14] vdpa/mlx5: fix driver name Xueming Li
2021-05-27 13:37 ` [dpdk-dev] [RFC 08/14] vdpa/mlx5: remove PCI specifics Xueming Li
2021-05-27 13:37 ` [dpdk-dev] [RFC 09/14] common/mlx5: clean up legacy PCI bus driver Xueming Li
2021-05-27 14:01 ` [dpdk-dev] [RFC 10/14] bus/auxiliary: introduce auxiliary bus Xueming Li
2021-05-27 14:01   ` [dpdk-dev] [RFC 11/14] common/mlx5: support " Xueming Li
2021-05-27 14:02   ` [dpdk-dev] [RFC 12/14] common/mlx5: get PCI device address from any bus Xueming Li
2021-05-27 14:02   ` [dpdk-dev] [RFC 13/14] vdpa/mlx5: support SubFunction Xueming Li
2021-05-27 14:02   ` [dpdk-dev] [RFC 14/14] net/mlx5: " Xueming Li
2021-06-10 10:33 ` [dpdk-dev] [RFC 00/14] mlx5: " Ferruh Yigit
2021-06-10 13:23   ` Thomas Monjalon
2021-06-11  5:14     ` Xia, Chenbo
2021-06-11  7:54       ` Thomas Monjalon
2021-06-15  2:10         ` Xia, Chenbo
2021-06-15  4:04           ` Parav Pandit
2021-06-15  5:33             ` Xia, Chenbo
2021-06-15  5:43               ` Parav Pandit
2021-06-15 11:19                 ` Xia, Chenbo
2021-06-15 12:47                   ` Parav Pandit
2021-06-15 15:19                     ` Jason Gunthorpe
