DPDK patches and discussions
* [dpdk-dev] [PATCH 00/12] net/mlx5: add bonding configuration support
@ 2019-09-25  7:53 Viacheslav Ovsiienko
  2019-09-25  7:53 ` [dpdk-dev] [PATCH 01/12] net/mlx5: move backing PCI device to private context Viacheslav Ovsiienko
                   ` (13 more replies)
  0 siblings, 14 replies; 17+ messages in thread
From: Viacheslav Ovsiienko @ 2019-09-25  7:53 UTC (permalink / raw)
  To: dev; +Cc: matan, rasland

Multiport Mellanox NICs may support bonding configurations internally.
Suppose there is a ConnectX-5 NIC with two physical ports; on the host
it presents two PCI physical functions:

- PF0, say with PCI address 0000:82:00.0 and net interface ens1f0
- PF1, say with PCI address 0000:82:00.1 and net interface ens1f1

Also suppose the SR-IOV feature is enabled, switchdev mode is engaged,
and there is a set of virtual PCI functions and their representor
interfaces. The physical interfaces may be combined into a single bond
interface, supported directly by the NIC HW/FW, with a standard script:

  modprobe bonding miimon=100 mode=4  # 100 ms link check interval, mode - LACP
  ip link set ens1f0 master bond0
  ip link set ens1f1 master bond0

The dedicated Infiniband devices for the single ports are destroyed, and
a new multiport Infiniband device is created for the bond interface and
all representors of both PFs. A unified E-Switch is created as well,
and all representor ports belong to the same unified switch domain.

To use the created bond interface with a DPDK application, both slave
PCI devices must be specified (in the whitelist, if any):

  -w 82:00.0,representor=[0-4]
  -w 82:00.1,representor=[0-7]

Representor enumeration follows the VF enumeration in the same way
as for a single device. Both PCI devices will be probed, but eth ports
will be created only for the one master device and for all representors.
These ports may reference different rte_pci_dev structures but share the
same switch domain ID.

The extra devargs specifying configurations must be compatible
(otherwise probing fails with an error). For example, it is not
allowed to specify different values of the dv_flow_en parameter for
different PCI devices.
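
The compatibility rule can be pictured as a check performed when the
second PF of a bond is probed. This is only a minimal sketch: the
structure sibling_cfg and the helper sibling_cfg_check() are hypothetical
names chosen for illustration and are not taken from this series.

```c
#include <errno.h>
#include <stdio.h>

/* Hypothetical subset of per-PCI-device settings parsed from devargs. */
struct sibling_cfg {
	int dv_flow_en; /* Value of the dv_flow_en devarg. */
};

/*
 * Compare the settings of two PCI devices backing the same bond.
 * Returns 0 when they are compatible, -EINVAL otherwise.
 */
static int
sibling_cfg_check(const struct sibling_cfg *a, const struct sibling_cfg *b)
{
	if (a->dv_flow_en != b->dv_flow_en) {
		fprintf(stderr, "dv_flow_en mismatch: %d vs %d\n",
			a->dv_flow_en, b->dv_flow_en);
		return -EINVAL;
	}
	return 0;
}
```

A real check would likely cover more parameters than dv_flow_en; which
ones is not stated here.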

Signed-off-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com>

Viacheslav Ovsiienko (12):
  net/mlx5: move backing PCI device to private context
  net/mlx5: update PCI address retrieving routine
  net/mlx5: allocate device list explicitly
  net/mlx5: add VF LAG mode bonding device recognition
  net/mlx5: generate bonding device name
  net/mlx5: check the kernel support for VF LAG bonding
  net/mlx5: query vport index match mode and parameters
  net/mlx5: elaborate E-Switch port parameters query
  net/mlx5: update source and destination vport translations
  net/mlx5: extend switch domain searching range
  net/mlx5: update switch port ID in bonding configuration
  net/mlx5: check sibling device configurations mismatch

 drivers/net/mlx5/Makefile       |   5 +
 drivers/net/mlx5/meson.build    |   2 +
 drivers/net/mlx5/mlx5.c         | 359 +++++++++++++++++++++++++++++++++++++---
 drivers/net/mlx5/mlx5.h         |  23 ++-
 drivers/net/mlx5/mlx5_defs.h    |   4 +
 drivers/net/mlx5/mlx5_ethdev.c  | 128 +++++++-------
 drivers/net/mlx5/mlx5_flow_dv.c |  98 +++++++----
 drivers/net/mlx5/mlx5_prm.h     |   9 +-
 drivers/net/mlx5/mlx5_txq.c     |   2 +-
 9 files changed, 506 insertions(+), 124 deletions(-)

-- 
1.8.3.1



* [dpdk-dev] [PATCH 01/12] net/mlx5: move backing PCI device to private context
  2019-09-25  7:53 [dpdk-dev] [PATCH 00/12] net/mlx5: add bonding configuration support Viacheslav Ovsiienko
@ 2019-09-25  7:53 ` Viacheslav Ovsiienko
  2019-09-25  7:53 ` [dpdk-dev] [PATCH 02/12] net/mlx5: update PCI address retrieving routine Viacheslav Ovsiienko
                   ` (12 subsequent siblings)
  13 siblings, 0 replies; 17+ messages in thread
From: Viacheslav Ovsiienko @ 2019-09-25  7:53 UTC (permalink / raw)
  To: dev; +Cc: matan, rasland

Currently all devices created over the same multiport IB device
share a context that contains the backing PCI device field.
In VF LAG configurations, representors may be connected to VFs
created over different PFs. In this case the representors have
different backing PCI devices, so the field is moved to the
device private area.
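
The resulting relationship can be sketched with simplified stand-in
structures (the *_sketch names are illustrative only and are not the
driver types): several ports share one IB context, while each port keeps
its own backing PCI device pointer.

```c
#include <stdint.h>

/* Simplified stand-ins for the driver structures (illustrative only). */
struct pci_dev_sketch {
	uint8_t bus;
	uint8_t devid;
	uint8_t function;
};

struct shared_ctx_sketch {
	const char *ibdev_name; /* One IB device shared by all ports. */
};

struct port_priv_sketch {
	struct shared_ctx_sketch *sh;   /* Shared IB device context. */
	struct pci_dev_sketch *pci_dev; /* Backing PCI device, per port. */
};

/* Return nonzero if two ports share the IB context but not the PF. */
static int
ports_on_different_pfs(const struct port_priv_sketch *a,
		       const struct port_priv_sketch *b)
{
	return a->sh == b->sh && a->pci_dev != b->pci_dev;
}
```

With the pointer kept in the shared context, the case detected by
ports_on_different_pfs() could not be represented at all.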

Signed-off-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com>
---
 drivers/net/mlx5/mlx5.c     | 4 ++--
 drivers/net/mlx5/mlx5.h     | 2 +-
 drivers/net/mlx5/mlx5_txq.c | 2 +-
 3 files changed, 4 insertions(+), 4 deletions(-)

diff --git a/drivers/net/mlx5/mlx5.c b/drivers/net/mlx5/mlx5.c
index 5088a97..b4096a4 100644
--- a/drivers/net/mlx5/mlx5.c
+++ b/drivers/net/mlx5/mlx5.c
@@ -373,7 +373,6 @@ struct mlx5_dev_spawn_data {
 		sizeof(sh->ibdev_name));
 	strncpy(sh->ibdev_path, sh->ctx->device->ibdev_path,
 		sizeof(sh->ibdev_path));
-	sh->pci_dev = spawn->pci_dev;
 	pthread_mutex_init(&sh->intr_mutex, NULL);
 	/*
 	 * Setting port_id to max unallowed value means
@@ -406,7 +405,7 @@ struct mlx5_dev_spawn_data {
 	 */
 	err = mlx5_mr_btree_init(&sh->mr.cache,
 				 MLX5_MR_BTREE_CACHE_N * 2,
-				 sh->pci_dev->device.numa_node);
+				 spawn->pci_dev->device.numa_node);
 	if (err) {
 		err = rte_errno;
 		goto error;
@@ -1756,6 +1755,7 @@ struct mlx5_dev_spawn_data {
 	}
 	priv->sh = sh;
 	priv->ibv_port = spawn->ibv_port;
+	priv->pci_dev = spawn->pci_dev;
 	priv->mtu = RTE_ETHER_MTU;
 #ifndef RTE_ARCH_64
 	/* Initialize UAR access locks for 32bit implementations. */
diff --git a/drivers/net/mlx5/mlx5.h b/drivers/net/mlx5/mlx5.h
index 5482eb7..f45d627 100644
--- a/drivers/net/mlx5/mlx5.h
+++ b/drivers/net/mlx5/mlx5.h
@@ -547,7 +547,6 @@ struct mlx5_ibv_shared {
 	char ibdev_name[IBV_SYSFS_NAME_MAX]; /* IB device name. */
 	char ibdev_path[IBV_SYSFS_PATH_MAX]; /* IB device path for secondary */
 	struct ibv_device_attr_ex device_attr; /* Device properties. */
-	struct rte_pci_device *pci_dev; /* Backend PCI device. */
 	LIST_ENTRY(mlx5_ibv_shared) mem_event_cb;
 	/**< Called by memory event callback. */
 	struct {
@@ -606,6 +605,7 @@ struct mlx5_priv {
 	struct rte_eth_dev_data *dev_data;  /* Pointer to device data. */
 	struct mlx5_ibv_shared *sh; /* Shared IB device context. */
 	uint32_t ibv_port; /* IB device port number. */
+	struct rte_pci_device *pci_dev; /* Backend PCI device. */
 	struct rte_ether_addr mac[MLX5_MAX_MAC_ADDRESSES]; /* MAC addresses. */
 	BITFIELD_DECLARE(mac_own, uint64_t, MLX5_MAX_MAC_ADDRESSES);
 	/* Bit-field of MAC addresses owned by the PMD. */
diff --git a/drivers/net/mlx5/mlx5_txq.c b/drivers/net/mlx5/mlx5_txq.c
index 2b7d6c0..d9fd143 100644
--- a/drivers/net/mlx5/mlx5_txq.c
+++ b/drivers/net/mlx5/mlx5_txq.c
@@ -728,7 +728,7 @@ struct mlx5_txq_ibv *
 	if (config->txqs_inline == MLX5_ARG_UNSET)
 		txqs_inline =
 #if defined(RTE_ARCH_ARM64)
-		(priv->sh->pci_dev->id.device_id ==
+		(priv->pci_dev->id.device_id ==
 			PCI_DEVICE_ID_MELLANOX_CONNECTX5BF) ?
 			MLX5_INLINE_MAX_TXQS_BLUEFIELD :
 #endif
-- 
1.8.3.1



* [dpdk-dev] [PATCH 02/12] net/mlx5: update PCI address retrieving routine
  2019-09-25  7:53 [dpdk-dev] [PATCH 00/12] net/mlx5: add bonding configuration support Viacheslav Ovsiienko
  2019-09-25  7:53 ` [dpdk-dev] [PATCH 01/12] net/mlx5: move backing PCI device to private context Viacheslav Ovsiienko
@ 2019-09-25  7:53 ` Viacheslav Ovsiienko
  2019-09-25  7:53 ` [dpdk-dev] [PATCH 03/12] net/mlx5: allocate device list explicitly Viacheslav Ovsiienko
                   ` (11 subsequent siblings)
  13 siblings, 0 replies; 17+ messages in thread
From: Viacheslav Ovsiienko @ 2019-09-25  7:53 UTC (permalink / raw)
  To: dev; +Cc: matan, rasland

The routine mlx5_ibv_device_to_pci_addr() takes an Infiniband
device list object, takes the device sysfs path from it
and retrieves the PCI address. The routine can be implemented
in a more generic way by taking the sysfs path directly as a parameter,
so it can also be used to get the PCI address of netdevs.

The generic routine is renamed to mlx5_dev_to_pci_addr().
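
For illustration, the core of such a lookup is parsing one line of the
"<dev_path>/device/uevent" file. The sketch below assumes the file holds
a PCI_SLOT_NAME= entry, as Linux normally exposes for PCI devices; the
structure pci_addr_sketch and the helper uevent_line_to_pci_addr() are
illustrative names, not the driver code.

```c
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

/* Illustrative stand-in for struct rte_pci_addr. */
struct pci_addr_sketch {
	uint32_t domain;
	uint8_t bus;
	uint8_t devid;
	uint8_t function;
};

/*
 * Parse one line of a sysfs "device/uevent" file.
 * Returns 0 if the line carries a PCI slot name, -1 otherwise.
 */
static int
uevent_line_to_pci_addr(const char *line, struct pci_addr_sketch *addr)
{
	if (sscanf(line,
		   "PCI_SLOT_NAME=%" SCNx32 ":%" SCNx8 ":%" SCNx8 ".%" SCNx8,
		   &addr->domain, &addr->bus,
		   &addr->devid, &addr->function) == 4)
		return 0;
	return -1;
}
```

Since only a path string is needed, the same parsing serves both
"/sys/class/infiniband/<dev>" and "/sys/class/net/<ifname>" entries.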

Signed-off-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com>
---
 drivers/net/mlx5/mlx5.c        |  3 ++-
 drivers/net/mlx5/mlx5.h        |  4 ++--
 drivers/net/mlx5/mlx5_ethdev.c | 12 ++++++------
 3 files changed, 10 insertions(+), 9 deletions(-)

diff --git a/drivers/net/mlx5/mlx5.c b/drivers/net/mlx5/mlx5.c
index b4096a4..dd8159c 100644
--- a/drivers/net/mlx5/mlx5.c
+++ b/drivers/net/mlx5/mlx5.c
@@ -2184,7 +2184,8 @@ struct mlx5_dev_spawn_data {
 		struct rte_pci_addr pci_addr;
 
 		DRV_LOG(DEBUG, "checking device \"%s\"", ibv_list[ret]->name);
-		if (mlx5_ibv_device_to_pci_addr(ibv_list[ret], &pci_addr))
+		if (mlx5_dev_to_pci_addr
+			(ibv_list[ret]->ibdev_path, &pci_addr))
 			continue;
 		if (pci_dev->addr.domain != pci_addr.domain ||
 		    pci_dev->addr.bus != pci_addr.bus ||
diff --git a/drivers/net/mlx5/mlx5.h b/drivers/net/mlx5/mlx5.h
index f45d627..9a3fd36 100644
--- a/drivers/net/mlx5/mlx5.h
+++ b/drivers/net/mlx5/mlx5.h
@@ -701,8 +701,8 @@ int mlx5_dev_get_flow_ctrl(struct rte_eth_dev *dev,
 			   struct rte_eth_fc_conf *fc_conf);
 int mlx5_dev_set_flow_ctrl(struct rte_eth_dev *dev,
 			   struct rte_eth_fc_conf *fc_conf);
-int mlx5_ibv_device_to_pci_addr(const struct ibv_device *device,
-				struct rte_pci_addr *pci_addr);
+int mlx5_dev_to_pci_addr(const char *dev_path,
+			 struct rte_pci_addr *pci_addr);
 void mlx5_dev_link_status_handler(void *arg);
 void mlx5_dev_interrupt_handler(void *arg);
 void mlx5_dev_interrupt_handler_devx(void *arg);
diff --git a/drivers/net/mlx5/mlx5_ethdev.c b/drivers/net/mlx5/mlx5_ethdev.c
index ad53721..71f63ac 100644
--- a/drivers/net/mlx5/mlx5_ethdev.c
+++ b/drivers/net/mlx5/mlx5_ethdev.c
@@ -1131,10 +1131,10 @@ int mlx5_fw_version_get(struct rte_eth_dev *dev, char *fw_ver, size_t fw_size)
 }
 
 /**
- * Get PCI information from struct ibv_device.
+ * Get PCI information by sysfs device path.
  *
- * @param device
- *   Pointer to Ethernet device structure.
+ * @param dev_path
+ *   Pointer to device sysfs folder name.
  * @param[out] pci_addr
  *   PCI bus address output buffer.
  *
@@ -1142,12 +1142,12 @@ int mlx5_fw_version_get(struct rte_eth_dev *dev, char *fw_ver, size_t fw_size)
  *   0 on success, a negative errno value otherwise and rte_errno is set.
  */
 int
-mlx5_ibv_device_to_pci_addr(const struct ibv_device *device,
-			    struct rte_pci_addr *pci_addr)
+mlx5_dev_to_pci_addr(const char *dev_path,
+		     struct rte_pci_addr *pci_addr)
 {
 	FILE *file;
 	char line[32];
-	MKSTR(path, "%s/device/uevent", device->ibdev_path);
+	MKSTR(path, "%s/device/uevent", dev_path);
 
 	file = fopen(path, "rb");
 	if (file == NULL) {
-- 
1.8.3.1



* [dpdk-dev] [PATCH 03/12] net/mlx5: allocate device list explicitly
  2019-09-25  7:53 [dpdk-dev] [PATCH 00/12] net/mlx5: add bonding configuration support Viacheslav Ovsiienko
  2019-09-25  7:53 ` [dpdk-dev] [PATCH 01/12] net/mlx5: move backing PCI device to private context Viacheslav Ovsiienko
  2019-09-25  7:53 ` [dpdk-dev] [PATCH 02/12] net/mlx5: update PCI address retrieving routine Viacheslav Ovsiienko
@ 2019-09-25  7:53 ` Viacheslav Ovsiienko
  2019-09-25  7:53 ` [dpdk-dev] [PATCH 04/12] net/mlx5: add VF LAG mode bonding device recognition Viacheslav Ovsiienko
                   ` (10 subsequent siblings)
  13 siblings, 0 replies; 17+ messages in thread
From: Viacheslav Ovsiienko @ 2019-09-25  7:53 UTC (permalink / raw)
  To: dev; +Cc: matan, rasland

At device probing the list of devices to spawn was allocated
as a local variable of dynamic size. It was not possible to have
one unified exit point from the routine due to compiler warnings.
This patch allocates the spawn device list explicitly with
rte_zmalloc(), which makes it possible to jump to a unified exit
label from anywhere in the routine.
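
The pattern can be sketched in plain C, with calloc()/free() standing in
for rte_zmalloc()/rte_free(); probe_sketch() and spawn_data_sketch are
illustrative names. A goto that jumps forward over a variable-length
array declaration into its scope is rejected by C compilers, which is
presumably the problem mentioned above; a heap pointer initialized to
NULL has no such restriction.

```c
#include <errno.h>
#include <stdlib.h>

/* Illustrative stand-in for struct mlx5_dev_spawn_data. */
struct spawn_data_sketch {
	unsigned int ibv_port;
};

/*
 * Allocate the spawn list on the heap and leave through one exit label.
 * np is the number of IB ports, nd the number of matching IB devices.
 * Returns the number of list entries, or a negative errno value.
 */
static int
probe_sketch(unsigned int np, unsigned int nd)
{
	struct spawn_data_sketch *list = NULL;
	unsigned int n = np ? np : nd;
	int ret;

	if (!n) {
		ret = -ENOENT; /* Nothing matched, but still clean up. */
		goto exit;
	}
	list = calloc(n, sizeof(*list)); /* Zeroed, like rte_zmalloc(). */
	if (!list) {
		ret = -ENOMEM;
		goto exit;
	}
	ret = (int)n;
exit:
	free(list); /* free(NULL) is a no-op. */
	return ret;
}
```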

Signed-off-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com>
---
 drivers/net/mlx5/mlx5.c | 25 +++++++++++++++++--------
 1 file changed, 17 insertions(+), 8 deletions(-)

diff --git a/drivers/net/mlx5/mlx5.c b/drivers/net/mlx5/mlx5.c
index dd8159c..701da7e 100644
--- a/drivers/net/mlx5/mlx5.c
+++ b/drivers/net/mlx5/mlx5.c
@@ -2154,6 +2154,7 @@ struct mlx5_dev_spawn_data {
 	 * Actually this is the number of iterations to spawn.
 	 */
 	unsigned int ns = 0;
+	struct mlx5_dev_spawn_data *list = NULL;
 	struct mlx5_dev_config dev_config;
 	int ret;
 
@@ -2176,8 +2177,8 @@ struct mlx5_dev_spawn_data {
 	 * matching ones, gathering into the list.
 	 */
 	struct ibv_device *ibv_match[ret + 1];
-	int nl_route = -1;
-	int nl_rdma = -1;
+	int nl_route = mlx5_nl_init(NETLINK_ROUTE);
+	int nl_rdma = mlx5_nl_init(NETLINK_RDMA);
 	unsigned int i;
 
 	while (ret-- > 0) {
@@ -2199,7 +2200,6 @@ struct mlx5_dev_spawn_data {
 	ibv_match[nd] = NULL;
 	if (!nd) {
 		/* No device matches, just complain and bail out. */
-		mlx5_glue->free_device_list(ibv_list);
 		DRV_LOG(WARNING,
 			"no Verbs device matches PCI device " PCI_PRI_FMT ","
 			" are kernel drivers loaded?",
@@ -2207,10 +2207,8 @@ struct mlx5_dev_spawn_data {
 			pci_dev->addr.devid, pci_dev->addr.function);
 		rte_errno = ENOENT;
 		ret = -rte_errno;
-		return ret;
+		goto exit;
 	}
-	nl_route = mlx5_nl_init(NETLINK_ROUTE);
-	nl_rdma = mlx5_nl_init(NETLINK_RDMA);
 	if (nd == 1) {
 		/*
 		 * Found single matching device may have multiple ports.
@@ -2227,8 +2225,16 @@ struct mlx5_dev_spawn_data {
 	 * Now we can determine the maximal
 	 * amount of devices to be spawned.
 	 */
-	struct mlx5_dev_spawn_data list[np ? np : nd];
-
+	list = rte_zmalloc("device spawn data",
+			 sizeof(struct mlx5_dev_spawn_data) *
+			 (np ? np : nd),
+			 RTE_CACHE_LINE_SIZE);
+	if (!list) {
+		DRV_LOG(ERR, "spawn data array allocation failure");
+		rte_errno = ENOMEM;
+		ret = -rte_errno;
+		goto exit;
+	}
 	if (np > 1) {
 		/*
 		 * Single IB device with multiple ports found,
@@ -2477,12 +2483,15 @@ struct mlx5_dev_spawn_data {
 	/*
 	 * Do the routine cleanup:
 	 * - close opened Netlink sockets
+	 * - free allocated spawn data array
 	 * - free the Infiniband device list
 	 */
 	if (nl_rdma >= 0)
 		close(nl_rdma);
 	if (nl_route >= 0)
 		close(nl_route);
+	if (list)
+		rte_free(list);
 	assert(ibv_list);
 	mlx5_glue->free_device_list(ibv_list);
 	return ret;
-- 
1.8.3.1



* [dpdk-dev] [PATCH 04/12] net/mlx5: add VF LAG mode bonding device recognition
  2019-09-25  7:53 [dpdk-dev] [PATCH 00/12] net/mlx5: add bonding configuration support Viacheslav Ovsiienko
                   ` (2 preceding siblings ...)
  2019-09-25  7:53 ` [dpdk-dev] [PATCH 03/12] net/mlx5: allocate device list explicitly Viacheslav Ovsiienko
@ 2019-09-25  7:53 ` Viacheslav Ovsiienko
  2019-09-30 10:34   ` Ferruh Yigit
  2019-09-25  7:53 ` [dpdk-dev] [PATCH 05/12] net/mlx5: generate bonding device name Viacheslav Ovsiienko
                   ` (9 subsequent siblings)
  13 siblings, 1 reply; 17+ messages in thread
From: Viacheslav Ovsiienko @ 2019-09-25  7:53 UTC (permalink / raw)
  To: dev; +Cc: matan, rasland

Mellanox NICs starting from ConnectX-5 support LAG over
NIC ports internally, implemented by the NIC firmware and hardware.
A multiport NIC presents multiple physical PCI functions (PFs);
with SR-IOV, multiple virtual PCI functions (VFs) may be presented too.
In switchdev mode the VF representors are engaged, and the PFs and their
VFs are connected by the internal E-Switch feature. Each PF and its
related VFs have a dedicated E-Switch and belong to a dedicated switch
domain.

If the NIC ports are combined into a bond supported by the NIC, the
kernel drivers introduce a single unified multiport Infiniband device,
and only one unified E-Switch with a single switch domain combines the
master PF and all VFs. No extra DPDK bonding device is needed.
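
Because the unified IB device exposes the ports of both PFs, the probe of
one PCI device has to keep only the ports that belong to its own PF. The
selection rule used in the diff below can be sketched as follows; the
*_sketch types and port_belongs_to_pf() are simplified stand-ins, not the
driver definitions.

```c
/* Illustrative port name types, mirroring the ones used in the diff. */
enum port_name_type_sketch {
	PORT_NAME_TYPE_UPLINK, /* Physical port (PF uplink). */
	PORT_NAME_TYPE_PFVF,   /* VF representor owned by some PF. */
	PORT_NAME_TYPE_OTHER,
};

struct switch_info_sketch {
	enum port_name_type_sketch name_type;
	int port_name; /* Uplink index or VF index. */
	int pf_num;    /* Owning PF index for VF representors. */
};

/* Return 1 if the IB port belongs to the PF with index bd, else 0. */
static int
port_belongs_to_pf(const struct switch_info_sketch *info, int bd)
{
	switch (info->name_type) {
	case PORT_NAME_TYPE_UPLINK:
		return info->port_name == bd;
	case PORT_NAME_TYPE_PFVF:
		return info->pf_num == bd;
	default:
		return 0;
	}
}
```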

Signed-off-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com>
---
 drivers/net/mlx5/mlx5.c | 160 +++++++++++++++++++++++++++++++++++++++++++++++-
 1 file changed, 159 insertions(+), 1 deletion(-)

diff --git a/drivers/net/mlx5/mlx5.c b/drivers/net/mlx5/mlx5.c
index 701da7e..12eed13 100644
--- a/drivers/net/mlx5/mlx5.c
+++ b/drivers/net/mlx5/mlx5.c
@@ -169,6 +169,7 @@ struct mlx5_dev_spawn_data {
 	uint32_t ifindex; /**< Network interface index. */
 	uint32_t max_port; /**< IB device maximal port index. */
 	uint32_t ibv_port; /**< IB device physical port index. */
+	int pf_bond; /**< bonding device PF index. < 0 - no bonding */
 	struct mlx5_switch_info info; /**< Switch information. */
 	struct ibv_device *ibv_dev; /**< Associated IB device. */
 	struct rte_eth_dev *eth_dev; /**< Associated Ethernet device. */
@@ -2119,6 +2120,108 @@ struct mlx5_dev_spawn_data {
 }
 
 /**
+ * Match PCI information for possible slaves of bonding device.
+ *
+ * @param[in] ibv_dev
+ *   Pointer to Infiniband device structure.
+ * @param[in] pci_dev
+ *   Pointer to PCI device structure to match PCI address.
+ * @param[in] nl_rdma
+ *   Netlink RDMA group socket handle.
+ *
+ * @return
+ *   negative value if no bonding device found, otherwise
+ *   positive index of slave PF in bonding.
+ */
+static int
+mlx5_device_bond_pci_match(const struct ibv_device *ibv_dev,
+			   const struct rte_pci_device *pci_dev,
+			   int nl_rdma)
+{
+	char ifname[IF_NAMESIZE + 1];
+	unsigned int ifindex;
+	unsigned int np, i;
+	FILE *file = NULL;
+	int pf = -1;
+
+	/*
+	 * Try to get master device name. If something goes
+	 * wrong suppose the lack of kernel support and no
+	 * bonding devices.
+	 */
+	if (nl_rdma < 0)
+		return -1;
+	if (!strstr(ibv_dev->name, "bond"))
+		return -1;
+	np = mlx5_nl_portnum(nl_rdma, ibv_dev->name);
+	if (!np)
+		return -1;
+	/*
+	 * The master device might not be on the predefined
+	 * port (not on port index 1, it is not guaranteed),
+	 * we have to scan all Infiniband device ports and
+	 * find the master.
+	 */
+	for (i = 1; i <= np; ++i) {
+		/* Check whether Infiniband port is populated. */
+		ifindex = mlx5_nl_ifindex(nl_rdma, ibv_dev->name, i);
+		if (!ifindex)
+			continue;
+		if (!if_indextoname(ifindex, ifname))
+			continue;
+		/* Try to read bonding slave names from sysfs. */
+		MKSTR(slaves,
+		      "/sys/class/net/%s/master/bonding/slaves", ifname);
+		file = fopen(slaves, "r");
+		if (file)
+			break;
+	}
+	if (!file)
+		return -1;
+	MKSTR(format, "%c%us", '%', (unsigned int)(sizeof(ifname) - 1));
+
+	/* Use safe format to check maximal buffer length. */
+#pragma GCC diagnostic ignored "-Wformat-nonliteral"
+	while (fscanf(file, format, ifname) == 1) {
+#pragma GCC diagnostic error "-Wformat-nonliteral"
+		char tmp_str[IF_NAMESIZE + 32];
+		struct rte_pci_addr pci_addr;
+		struct mlx5_switch_info	info;
+
+		/* Process slave interface names in the loop. */
+		snprintf(tmp_str, sizeof(tmp_str),
+			 "/sys/class/net/%s", ifname);
+		if (mlx5_dev_to_pci_addr(tmp_str, &pci_addr)) {
+			DRV_LOG(WARNING, "can not get PCI address"
+					 " for netdev \"%s\"", ifname);
+			continue;
+		}
+		if (pci_dev->addr.domain != pci_addr.domain ||
+		    pci_dev->addr.bus != pci_addr.bus ||
+		    pci_dev->addr.devid != pci_addr.devid ||
+		    pci_dev->addr.function != pci_addr.function)
+			continue;
+		/* Slave interface PCI address match found. */
+		fclose(file);
+		snprintf(tmp_str, sizeof(tmp_str),
+			 "/sys/class/net/%s/phys_port_name", ifname);
+		file = fopen(tmp_str, "rb");
+		if (!file)
+			break;
+		info.name_type = MLX5_PHYS_PORT_NAME_TYPE_NOTSET;
+		if (fscanf(file, "%32s", tmp_str) == 1)
+			mlx5_translate_port_name(tmp_str, &info);
+		if (info.name_type == MLX5_PHYS_PORT_NAME_TYPE_LEGACY ||
+		    info.name_type == MLX5_PHYS_PORT_NAME_TYPE_UPLINK)
+			pf = info.port_name;
+		break;
+	}
+	if (file)
+		fclose(file);
+	return pf;
+}
+
+/**
  * DPDK callback to register a PCI device.
  *
  * This function spawns Ethernet devices out of a given PCI device.
@@ -2154,6 +2257,12 @@ struct mlx5_dev_spawn_data {
 	 * Actually this is the number of iterations to spawn.
 	 */
 	unsigned int ns = 0;
+	/*
+	 * Bonding device
+	 *   < 0 - no bonding device (single one)
+	 *  >= 0 - bonding device (value is slave PF index)
+	 */
+	int bd = -1;
 	struct mlx5_dev_spawn_data *list = NULL;
 	struct mlx5_dev_config dev_config;
 	int ret;
@@ -2185,6 +2294,30 @@ struct mlx5_dev_spawn_data {
 		struct rte_pci_addr pci_addr;
 
 		DRV_LOG(DEBUG, "checking device \"%s\"", ibv_list[ret]->name);
+		bd = mlx5_device_bond_pci_match
+				(ibv_list[ret], pci_dev, nl_rdma);
+		if (bd >= 0) {
+			/*
+			 * Bonding device detected. Only one match is allowed,
+			 * the bonding is supported over multi-port IB device,
+			 * there should be no matches on representor PCI
+			 * functions or non VF LAG bonding devices with
+			 * specified address.
+			 */
+			if (nd) {
+				DRV_LOG(ERR,
+					"multiple PCI match on bonding device"
+					" \"%s\" found", ibv_list[ret]->name);
+				rte_errno = ENOENT;
+				ret = -rte_errno;
+				goto exit;
+			}
+			DRV_LOG(INFO, "PCI information matches for"
+				      " slave %d bonding device \"%s\"",
+				      bd, ibv_list[ret]->name);
+			ibv_match[nd++] = ibv_list[ret];
+			break;
+		}
 		if (mlx5_dev_to_pci_addr
 			(ibv_list[ret]->ibdev_path, &pci_addr))
 			continue;
@@ -2220,6 +2353,13 @@ struct mlx5_dev_spawn_data {
 		if (!np)
 			DRV_LOG(WARNING, "can not get IB device \"%s\""
 					 " ports number", ibv_match[0]->name);
+		if (bd >= 0 && !np) {
+			DRV_LOG(ERR, "can not get ports"
+				     " for bonding device");
+			rte_errno = ENOENT;
+			ret = -rte_errno;
+			goto exit;
+		}
 	}
 	/*
 	 * Now we can determine the maximal
@@ -2235,7 +2375,7 @@ struct mlx5_dev_spawn_data {
 		ret = -rte_errno;
 		goto exit;
 	}
-	if (np > 1) {
+	if (bd >= 0 || np > 1) {
 		/*
 		 * Single IB device with multiple ports found,
 		 * it may be E-Switch master device and representors.
@@ -2244,12 +2384,14 @@ struct mlx5_dev_spawn_data {
 		assert(nl_rdma >= 0);
 		assert(ns == 0);
 		assert(nd == 1);
+		assert(np);
 		for (i = 1; i <= np; ++i) {
 			list[ns].max_port = np;
 			list[ns].ibv_port = i;
 			list[ns].ibv_dev = ibv_match[0];
 			list[ns].eth_dev = NULL;
 			list[ns].pci_dev = pci_dev;
+			list[ns].pf_bond = bd;
 			list[ns].ifindex = mlx5_nl_ifindex
 					(nl_rdma, list[ns].ibv_dev->name, i);
 			if (!list[ns].ifindex) {
@@ -2279,6 +2421,21 @@ struct mlx5_dev_spawn_data {
 						(list[ns].ifindex,
 						 &list[ns].info);
 			}
+			if (!ret && bd >= 0) {
+				switch (list[ns].info.name_type) {
+				case MLX5_PHYS_PORT_NAME_TYPE_UPLINK:
+					if (list[ns].info.port_name == bd)
+						ns++;
+					break;
+				case MLX5_PHYS_PORT_NAME_TYPE_PFVF:
+					if (list[ns].info.pf_num == bd)
+						ns++;
+					break;
+				default:
+					break;
+				}
+				continue;
+			}
 			if (!ret && (list[ns].info.representor ^
 				     list[ns].info.master))
 				ns++;
@@ -2317,6 +2474,7 @@ struct mlx5_dev_spawn_data {
 			list[ns].ibv_dev = ibv_match[i];
 			list[ns].eth_dev = NULL;
 			list[ns].pci_dev = pci_dev;
+			list[ns].pf_bond = -1;
 			list[ns].ifindex = 0;
 			if (nl_rdma >= 0)
 				list[ns].ifindex = mlx5_nl_ifindex
-- 
1.8.3.1



* [dpdk-dev] [PATCH 05/12] net/mlx5: generate bonding device name
  2019-09-25  7:53 [dpdk-dev] [PATCH 00/12] net/mlx5: add bonding configuration support Viacheslav Ovsiienko
                   ` (3 preceding siblings ...)
  2019-09-25  7:53 ` [dpdk-dev] [PATCH 04/12] net/mlx5: add VF LAG mode bonding device recognition Viacheslav Ovsiienko
@ 2019-09-25  7:53 ` Viacheslav Ovsiienko
  2019-09-25  7:53 ` [dpdk-dev] [PATCH 06/12] net/mlx5: check the kernel support for VF LAG bonding Viacheslav Ovsiienko
                   ` (8 subsequent siblings)
  13 siblings, 0 replies; 17+ messages in thread
From: Viacheslav Ovsiienko @ 2019-09-25  7:53 UTC (permalink / raw)
  To: dev; +Cc: matan, rasland

If the device is a VF LAG bonding one, the port name includes
the bonding Infiniband device name and looks like:

  82:00.0_mlx5_bond_0 - for master device port PF0
  82:00.1_mlx5_bond_0_representor_5 - for representor
                                           VF5 over PF1

where the bonding Infiniband device mlx5_bond_0 controls
the PCI functions 82:00.0 as PF0 and 82:00.1 as PF1.
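
The naming scheme can be reproduced with a small standalone helper. This
is a sketch only: port_name_sketch() and its parameters are illustrative,
with pf_bond < 0 meaning "no bonding" and representor < 0 meaning
"master port".

```c
#include <stdio.h>

/*
 * Build an ethdev port name from the PCI device name, the IB device
 * name, the bonding PF index and the representor (VF) index.
 */
static void
port_name_sketch(char *name, size_t size, const char *pci_name,
		 const char *ibdev_name, int pf_bond, int representor)
{
	if (pf_bond < 0) {
		/* Single device: the IB device name is not included. */
		if (representor < 0)
			snprintf(name, size, "%s", pci_name);
		else
			snprintf(name, size, "%s_representor_%u",
				 pci_name, (unsigned int)representor);
	} else {
		/* Bonding device: insert the bonding IB device name. */
		if (representor < 0)
			snprintf(name, size, "%s_%s", pci_name, ibdev_name);
		else
			snprintf(name, size, "%s_%s_representor_%u",
				 pci_name, ibdev_name,
				 (unsigned int)representor);
	}
}
```

Called with ("82:00.1", "mlx5_bond_0", 1, 5) it yields the second name
from the example above.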

Signed-off-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com>
---
 drivers/net/mlx5/mlx5.c | 22 +++++++++++++++++-----
 1 file changed, 17 insertions(+), 5 deletions(-)

diff --git a/drivers/net/mlx5/mlx5.c b/drivers/net/mlx5/mlx5.c
index 12eed13..7c9fd54 100644
--- a/drivers/net/mlx5/mlx5.c
+++ b/drivers/net/mlx5/mlx5.c
@@ -1585,11 +1585,23 @@ struct mlx5_dev_spawn_data {
 		}
 	}
 	/* Build device name. */
-	if (!switch_info->representor)
-		strlcpy(name, dpdk_dev->name, sizeof(name));
-	else
-		snprintf(name, sizeof(name), "%s_representor_%u",
-			 dpdk_dev->name, switch_info->port_name);
+	if (spawn->pf_bond <  0) {
+		/* Single device. */
+		if (!switch_info->representor)
+			strlcpy(name, dpdk_dev->name, sizeof(name));
+		else
+			snprintf(name, sizeof(name), "%s_representor_%u",
+				 dpdk_dev->name, switch_info->port_name);
+	} else {
+		/* Bonding device. */
+		if (!switch_info->representor)
+			snprintf(name, sizeof(name), "%s_%s",
+				 dpdk_dev->name, spawn->ibv_dev->name);
+		else
+			snprintf(name, sizeof(name), "%s_%s_representor_%u",
+				 dpdk_dev->name, spawn->ibv_dev->name,
+				 switch_info->port_name);
+	}
 	/* check if the device is already spawned */
 	if (rte_eth_dev_get_port_by_name(name, &port_id) == 0) {
 		rte_errno = EEXIST;
-- 
1.8.3.1



* [dpdk-dev] [PATCH 06/12] net/mlx5: check the kernel support for VF LAG bonding
  2019-09-25  7:53 [dpdk-dev] [PATCH 00/12] net/mlx5: add bonding configuration support Viacheslav Ovsiienko
                   ` (4 preceding siblings ...)
  2019-09-25  7:53 ` [dpdk-dev] [PATCH 05/12] net/mlx5: generate bonding device name Viacheslav Ovsiienko
@ 2019-09-25  7:53 ` Viacheslav Ovsiienko
  2019-09-25  7:53 ` [dpdk-dev] [PATCH 07/12] net/mlx5: query vport index match mode and parameters Viacheslav Ovsiienko
                   ` (7 subsequent siblings)
  13 siblings, 0 replies; 17+ messages in thread
From: Viacheslav Ovsiienko @ 2019-09-25  7:53 UTC (permalink / raw)
  To: dev; +Cc: matan, rasland

If a bonding Infiniband device is found, a unified E-Switch
is assumed and extra rdma-core/kernel support is needed
to retrieve vport indices. The patch introduces the build-time
define for this feature and adds a bonding support check to the
probe routine.
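
The shape of the check can be sketched outside the driver. The helper
bond_support_check() is an illustrative name; HAVE_MLX5DV_DR_DEVX_PORT
is expected to be provided by the build system when rdma-core offers
mlx5dv_query_devx_port(), and is deliberately left undefined in this
sketch to show the fallback path.

```c
#include <errno.h>

/*
 * bd >= 0 means a bonding device was detected (value is the PF index).
 * Returns 0 if probing may continue, -ENOTSUP otherwise.
 */
static int
bond_support_check(int bd)
{
#ifndef HAVE_MLX5DV_DR_DEVX_PORT
	if (bd >= 0)
		return -ENOTSUP; /* Bond found, no way to query vports. */
#else
	(void)bd; /* Support compiled in, nothing to reject here. */
#endif
	return 0;
}
```

Non-bonding configurations are unaffected by the missing define, so they
keep working with an older rdma-core.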

Signed-off-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com>
---
 drivers/net/mlx5/Makefile    |  5 +++++
 drivers/net/mlx5/meson.build |  2 ++
 drivers/net/mlx5/mlx5.c      | 13 +++++++++++++
 3 files changed, 20 insertions(+)

diff --git a/drivers/net/mlx5/Makefile b/drivers/net/mlx5/Makefile
index 6c9d4b5..04de93a 100644
--- a/drivers/net/mlx5/Makefile
+++ b/drivers/net/mlx5/Makefile
@@ -170,6 +170,11 @@ mlx5_autoconf.h.new: $(RTE_SDK)/buildtools/auto-config-h.sh
 		func mlx5dv_dr_action_create_push_vlan \
 		$(AUTOCONF_OUTPUT)
 	$Q sh -- '$<' '$@' \
+		HAVE_MLX5DV_DR_DEVX_PORT \
+		infiniband/mlx5dv.h \
+		func mlx5dv_query_devx_port \
+		$(AUTOCONF_OUTPUT)
+	$Q sh -- '$<' '$@' \
 		HAVE_IBV_DEVX_OBJ \
 		infiniband/mlx5dv.h \
 		func mlx5dv_devx_obj_create \
diff --git a/drivers/net/mlx5/meson.build b/drivers/net/mlx5/meson.build
index fa68b54..f8e3cce 100644
--- a/drivers/net/mlx5/meson.build
+++ b/drivers/net/mlx5/meson.build
@@ -120,6 +120,8 @@ if build
 		'IBV_WQ_FLAGS_PCI_WRITE_END_PADDING' ],
 		[ 'HAVE_IBV_WQ_FLAG_RX_END_PADDING', 'infiniband/verbs.h',
 		'IBV_WQ_FLAG_RX_END_PADDING' ],
+		[ 'HAVE_MLX5DV_DR_DEVX_PORT', 'infiniband/mlx5dv.h',
+		'mlx5dv_query_devx_port' ],
 		[ 'HAVE_IBV_DEVX_OBJ', 'infiniband/mlx5dv.h',
 		'mlx5dv_devx_obj_create' ],
 		[ 'HAVE_IBV_FLOW_DEVX_COUNTERS', 'infiniband/mlx5dv.h',
diff --git a/drivers/net/mlx5/mlx5.c b/drivers/net/mlx5/mlx5.c
index 7c9fd54..25d1530 100644
--- a/drivers/net/mlx5/mlx5.c
+++ b/drivers/net/mlx5/mlx5.c
@@ -2373,6 +2373,19 @@ struct mlx5_dev_spawn_data {
 			goto exit;
 		}
 	}
+#ifndef HAVE_MLX5DV_DR_DEVX_PORT
+	if (bd >= 0) {
+		/*
+		 * This may happen if there is VF LAG kernel support and
+		 * application is compiled with older rdma_core library.
+		 */
+		DRV_LOG(ERR,
+			"No kernel/verbs support for VF LAG bonding found.");
+		rte_errno = ENOTSUP;
+		ret = -rte_errno;
+		goto exit;
+	}
+#endif
 	/*
 	 * Now we can determine the maximal
 	 * amount of devices to be spawned.
-- 
1.8.3.1



* [dpdk-dev] [PATCH 07/12] net/mlx5: query vport index match mode and parameters
  2019-09-25  7:53 [dpdk-dev] [PATCH 00/12] net/mlx5: add bonding configuration support Viacheslav Ovsiienko
                   ` (5 preceding siblings ...)
  2019-09-25  7:53 ` [dpdk-dev] [PATCH 06/12] net/mlx5: check the kernel support for VF LAG bonding Viacheslav Ovsiienko
@ 2019-09-25  7:53 ` Viacheslav Ovsiienko
  2019-09-25  7:53 ` [dpdk-dev] [PATCH 08/12] net/mlx5: elaborate E-Switch port parameters query Viacheslav Ovsiienko
                   ` (6 subsequent siblings)
  13 siblings, 0 replies; 17+ messages in thread
From: Viacheslav Ovsiienko @ 2019-09-25  7:53 UTC (permalink / raw)
  To: dev; +Cc: matan, rasland

The new kernel/rdma_core [1] supports matching on a metadata
register instead of the vport field to provide operations over
VF LAG bonding configurations. The patch retrieves the parameters
and the information about which way is engaged to match the vport
on the E-Switch.

[1] http://patchwork.ozlabs.org/cover/1122170/
    "Mellanox, mlx5 vport metadata matching"
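
The sanity checks applied to the reported tag and mask, and the way a
masked match is then evaluated, can be sketched as below. The helpers
vport_meta_check() and vport_meta_match() are illustrative names, and
any concrete tag or mask values used with them are made up; the real
ones are reported by the kernel.

```c
#include <errno.h>
#include <stdint.h>

/*
 * Validate the metadata tag and mask reported for a port.
 * Returns 0 if usable, -ENOTSUP otherwise.
 */
static int
vport_meta_check(uint32_t tag, uint32_t mask)
{
	if (!mask)
		return -ENOTSUP; /* Empty mask cannot identify a vport. */
	if (tag & ~mask)
		return -ENOTSUP; /* Tag has bits outside of the mask. */
	return 0;
}

/* Return nonzero if a reg_c[0] value matches the tag under the mask. */
static int
vport_meta_match(uint32_t reg_c_0, uint32_t tag, uint32_t mask)
{
	return (reg_c_0 & mask) == tag;
}
```

Bits of reg_c[0] outside the mask are ignored by the match, so they stay
available for other purposes.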

Signed-off-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com>
---
 drivers/net/mlx5/mlx5.c | 56 +++++++++++++++++++++++++++++++++++++++++++++++--
 drivers/net/mlx5/mlx5.h |  2 ++
 2 files changed, 56 insertions(+), 2 deletions(-)

diff --git a/drivers/net/mlx5/mlx5.c b/drivers/net/mlx5/mlx5.c
index 25d1530..1b2f86f 100644
--- a/drivers/net/mlx5/mlx5.c
+++ b/drivers/net/mlx5/mlx5.c
@@ -1563,6 +1563,9 @@ struct mlx5_dev_spawn_data {
 	int own_domain_id = 0;
 	uint16_t port_id;
 	unsigned int i;
+#ifdef HAVE_MLX5DV_DR_DEVX_PORT
+	struct mlx5dv_devx_port devx_port;
+#endif
 
 	/* Determine if this port representor is supposed to be spawned. */
 	if (switch_info->representor && dpdk_dev->devargs) {
@@ -1783,8 +1786,56 @@ struct mlx5_dev_spawn_data {
 	priv->representor = !!switch_info->representor;
 	priv->master = !!switch_info->master;
 	priv->domain_id = RTE_ETH_DEV_SWITCH_DOMAIN_ID_INVALID;
+	priv->vport_meta_tag = 0;
+	priv->vport_meta_mask = 0;
+#ifdef HAVE_MLX5DV_DR_DEVX_PORT
 	/*
-	 * Currently we support single E-Switch per PF configurations
+	 * The DevX port query API is implemented. E-Switch may use
+	 * either vport or reg_c[0] metadata register to match on
+	 * vport index. The engaged part of metadata register is
+	 * defined by mask.
+	 */
+	devx_port.comp_mask = MLX5DV_DEVX_PORT_VPORT |
+			      MLX5DV_DEVX_PORT_MATCH_REG_C_0;
+	err = mlx5dv_query_devx_port(sh->ctx, spawn->ibv_port, &devx_port);
+	if (err) {
+		DRV_LOG(WARNING, "can't query devx port %d on device %s\n",
+			spawn->ibv_port, spawn->ibv_dev->name);
+		devx_port.comp_mask = 0;
+	}
+	if (devx_port.comp_mask & MLX5DV_DEVX_PORT_MATCH_REG_C_0) {
+		priv->vport_meta_tag = devx_port.reg_c_0.value;
+		priv->vport_meta_mask = devx_port.reg_c_0.mask;
+		if (!priv->vport_meta_mask) {
+			DRV_LOG(ERR, "vport zero mask for port %d"
+				     " on bonding device %s\n",
+				     spawn->ibv_port, spawn->ibv_dev->name);
+			err = ENOTSUP;
+			goto error;
+		}
+		if (priv->vport_meta_tag & ~priv->vport_meta_mask) {
+			DRV_LOG(ERR, "invalid vport tag for port %d"
+				     " on bonding device %s\n",
+				     spawn->ibv_port, spawn->ibv_dev->name);
+			err = ENOTSUP;
+			goto error;
+		}
+	} else if (devx_port.comp_mask & MLX5DV_DEVX_PORT_VPORT) {
+		priv->vport_id = devx_port.vport_num;
+	} else if (spawn->pf_bond >= 0) {
+		DRV_LOG(ERR, "can't deduce vport index for port %d"
+			     " on bonding device %s\n",
+			     spawn->ibv_port, spawn->ibv_dev->name);
+		err = ENOTSUP;
+		goto error;
+	} else {
+		/* Assume the vport index in the compatible (legacy) way. */
+		priv->vport_id = switch_info->representor ?
+				 switch_info->port_name + 1 : -1;
+	}
+#else
+	/*
+	 * Kernel/rdma_core support single E-Switch per PF configurations
 	 * only and vport_id field contains the vport index for
 	 * associated VF, which is deduced from representor port name.
 	 * For example, let's have the IB device port 10, it has
@@ -1796,7 +1847,8 @@ struct mlx5_dev_spawn_data {
 	 */
 	priv->vport_id = switch_info->representor ?
 			 switch_info->port_name + 1 : -1;
-	/* representor_id field keeps the unmodified port/VF index. */
+#endif
+	/* representor_id field keeps the unmodified VF index. */
 	priv->representor_id = switch_info->representor ?
 			       switch_info->port_name : -1;
 	/*
diff --git a/drivers/net/mlx5/mlx5.h b/drivers/net/mlx5/mlx5.h
index 9a3fd36..631876d 100644
--- a/drivers/net/mlx5/mlx5.h
+++ b/drivers/net/mlx5/mlx5.h
@@ -620,6 +620,8 @@ struct mlx5_priv {
 	unsigned int counter_fallback:1; /* Use counter fallback management. */
 	uint16_t domain_id; /* Switch domain identifier. */
 	uint16_t vport_id; /* Associated VF vport index (if any). */
+	uint32_t vport_meta_tag; /* Used for vport index match over VF LAG. */
+	uint32_t vport_meta_mask; /* Used for vport index field match mask. */
 	int32_t representor_id; /* Port representor identifier. */
 	unsigned int if_index; /* Associated kernel network device index. */
 	/* RX/TX queues. */
-- 
1.8.3.1


^ permalink raw reply	[flat|nested] 17+ messages in thread

* [dpdk-dev] [PATCH 08/12] net/mlx5: elaborate E-Switch port parameters query
  2019-09-25  7:53 [dpdk-dev] [PATCH 00/12] net/mlx5: add bonding configuration support Viacheslav Ovsiienko
                   ` (6 preceding siblings ...)
  2019-09-25  7:53 ` [dpdk-dev] [PATCH 07/12] net/mlx5: query vport index match mode and parameters Viacheslav Ovsiienko
@ 2019-09-25  7:53 ` Viacheslav Ovsiienko
  2019-09-25  7:53 ` [dpdk-dev] [PATCH 09/12] net/mlx5: update source and destination vport translations Viacheslav Ovsiienko
                   ` (5 subsequent siblings)
  13 siblings, 0 replies; 17+ messages in thread
From: Viacheslav Ovsiienko @ 2019-09-25  7:53 UTC (permalink / raw)
  To: dev; +Cc: matan, rasland

The routine mlx5_port_to_eswitch_info() is split into two
routines, which get the E-Switch port parameters by port ID
and by device pointer respectively. Both are simplified to
return a pointer to the structure containing all parameters
instead of copying them.
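
As an illustration only, the calling convention described above can be
sketched outside of the driver. The names port_priv, port_table,
port_to_eswitch_info() and same_switch_domain() are hypothetical
stand-ins, and plain errno is used in place of rte_errno; this is not
the driver code itself:

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_PORTS 4 /* hypothetical table size */

/* Hypothetical stand-in for the per-port private data. */
struct port_priv {
	int valid;          /* port slot is in use */
	int representor;    /* port is a VF representor */
	int master;         /* port is the E-Switch master */
	uint16_t domain_id; /* switch domain identifier */
	uint16_t vport_id;  /* vport index inside the E-Switch */
};

static struct port_priv port_table[MAX_PORTS] = {
	{ 1, 0, 1, 7, 0xffff }, /* master */
	{ 1, 1, 0, 7, 1 },      /* representor of VF0 */
	{ 1, 0, 0, 9, 0 },      /* plain port, not part of an E-Switch */
};

/* Return the private data of an E-Switch port, or NULL with errno set. */
static const struct port_priv *
port_to_eswitch_info(uint16_t port)
{
	const struct port_priv *priv;

	if (port >= MAX_PORTS) {
		errno = EINVAL;
		return NULL;
	}
	priv = &port_table[port];
	if (!priv->valid) {
		errno = ENODEV;
		return NULL;
	}
	if (!(priv->representor || priv->master)) {
		errno = EINVAL;
		return NULL;
	}
	return priv;
}

/* Return 1 if both ports share a switch domain, 0 if not, -errno on error. */
static int
same_switch_domain(uint16_t a, uint16_t b)
{
	const struct port_priv *pa = port_to_eswitch_info(a);
	const struct port_priv *pb = port_to_eswitch_info(b);

	if (!pa || !pb)
		return -errno;
	return pa->domain_id == pb->domain_id;
}
```

The caller reads whichever fields it needs from the returned structure,
so adding a new parameter does not require changing the prototype.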

Signed-off-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com>
---
 drivers/net/mlx5/mlx5.h         |  4 +--
 drivers/net/mlx5/mlx5_ethdev.c  | 49 ++++++++++++++++++++++--------
 drivers/net/mlx5/mlx5_flow_dv.c | 66 +++++++++++++++++++----------------------
 3 files changed, 69 insertions(+), 50 deletions(-)

diff --git a/drivers/net/mlx5/mlx5.h b/drivers/net/mlx5/mlx5.h
index 631876d..87e0549 100644
--- a/drivers/net/mlx5/mlx5.h
+++ b/drivers/net/mlx5/mlx5.h
@@ -718,8 +718,8 @@ int mlx5_dev_to_pci_addr(const char *dev_path,
 unsigned int mlx5_dev_to_port_id(const struct rte_device *dev,
 				 uint16_t *port_list,
 				 unsigned int port_list_n);
-int mlx5_port_to_eswitch_info(uint16_t port, uint16_t *es_domain_id,
-			      uint16_t *es_port_id);
+struct mlx5_priv *mlx5_port_to_eswitch_info(uint16_t port);
+struct mlx5_priv *mlx5_dev_to_eswitch_info(struct rte_eth_dev *dev);
 int mlx5_sysfs_switch_info(unsigned int ifindex,
 			   struct mlx5_switch_info *info);
 void mlx5_sysfs_check_switch_info(bool device_dir,
diff --git a/drivers/net/mlx5/mlx5_ethdev.c b/drivers/net/mlx5/mlx5_ethdev.c
index 71f63ac..27372f1 100644
--- a/drivers/net/mlx5/mlx5_ethdev.c
+++ b/drivers/net/mlx5/mlx5_ethdev.c
@@ -1660,7 +1660,7 @@ int mlx5_fw_version_get(struct rte_eth_dev *dev, char *fw_ver, size_t fw_size)
 }
 
 /**
- * Get the E-Switch domain id this port belongs to.
+ * Get the E-Switch parameters by port id.
  *
  * @param[in] port
  *   Device port id.
@@ -1670,34 +1670,57 @@ int mlx5_fw_version_get(struct rte_eth_dev *dev, char *fw_ver, size_t fw_size)
  *   The port id of the port in the E-Switch.
  *
  * @return
- *   0 on success, a negative errno value otherwise and rte_errno is set.
+ *   pointer to device private data structure containing data needed
+ *   on success, NULL otherwise and rte_errno is set.
  */
-int
-mlx5_port_to_eswitch_info(uint16_t port,
-			  uint16_t *es_domain_id, uint16_t *es_port_id)
+struct mlx5_priv *
+mlx5_port_to_eswitch_info(uint16_t port)
 {
 	struct rte_eth_dev *dev;
 	struct mlx5_priv *priv;
 
 	if (port >= RTE_MAX_ETHPORTS) {
 		rte_errno = EINVAL;
-		return -rte_errno;
+		return NULL;
 	}
 	if (!rte_eth_dev_is_valid_port(port)) {
 		rte_errno = ENODEV;
-		return -rte_errno;
+		return NULL;
 	}
 	dev = &rte_eth_devices[port];
 	priv = dev->data->dev_private;
 	if (!(priv->representor || priv->master)) {
 		rte_errno = EINVAL;
-		return -rte_errno;
+		return NULL;
 	}
-	if (es_domain_id)
-		*es_domain_id = priv->domain_id;
-	if (es_port_id)
-		*es_port_id = priv->vport_id;
-	return 0;
+	return priv;
+}
+
+/**
+ * Get the E-Switch parameters by device instance.
+ *
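+ * The device must be the E-Switch master or a port representor,
+ * otherwise no E-Switch parameters are available.
+ *
+ * @param[in] dev
+ *   Pointer to Ethernet device structure. Unlike the lookup by
+ *   port id, the device is not validated against the port table.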
+ *
+ * @return
+ *   pointer to device private data structure containing data needed
+ *   on success, NULL otherwise and rte_errno is set.
+ */
+struct mlx5_priv *
+mlx5_dev_to_eswitch_info(struct rte_eth_dev *dev)
+{
+	struct mlx5_priv *priv;
+
+	priv = dev->data->dev_private;
+	if (!(priv->representor || priv->master)) {
+		rte_errno = EINVAL;
+		return NULL;
+	}
+	return priv;
 }
 
 /**
diff --git a/drivers/net/mlx5/mlx5_flow_dv.c b/drivers/net/mlx5/mlx5_flow_dv.c
index c234d13..ad4ff5a 100644
--- a/drivers/net/mlx5/mlx5_flow_dv.c
+++ b/drivers/net/mlx5/mlx5_flow_dv.c
@@ -813,8 +813,8 @@ struct field_modify_info modify_tcp[] = {
 	const struct rte_flow_item_port_id switch_mask = {
 			.id = 0xffffffff,
 	};
-	uint16_t esw_domain_id;
-	uint16_t item_port_esw_domain_id;
+	struct mlx5_priv *esw_priv;
+	struct mlx5_priv *dev_priv;
 	int ret;
 
 	if (!attr->transfer)
@@ -845,21 +845,19 @@ struct field_modify_info modify_tcp[] = {
 		return ret;
 	if (!spec)
 		return 0;
-	ret = mlx5_port_to_eswitch_info(spec->id, &item_port_esw_domain_id,
-					NULL);
-	if (ret)
-		return rte_flow_error_set(error, -ret,
+	esw_priv = mlx5_port_to_eswitch_info(spec->id);
+	if (!esw_priv)
+		return rte_flow_error_set(error, rte_errno,
 					  RTE_FLOW_ERROR_TYPE_ITEM_SPEC, spec,
 					  "failed to obtain E-Switch info for"
 					  " port");
-	ret = mlx5_port_to_eswitch_info(dev->data->port_id,
-					&esw_domain_id, NULL);
-	if (ret < 0)
-		return rte_flow_error_set(error, -ret,
+	dev_priv = mlx5_dev_to_eswitch_info(dev);
+	if (!dev_priv)
+		return rte_flow_error_set(error, rte_errno,
 					  RTE_FLOW_ERROR_TYPE_UNSPECIFIED,
 					  NULL,
 					  "failed to obtain E-Switch info");
-	if (item_port_esw_domain_id != esw_domain_id)
+	if (esw_priv->domain_id != dev_priv->domain_id)
 		return rte_flow_error_set(error, -ret,
 					  RTE_FLOW_ERROR_TYPE_ITEM_SPEC, spec,
 					  "cannot match on a port from a"
@@ -2440,10 +2438,9 @@ struct field_modify_info modify_tcp[] = {
 				struct rte_flow_error *error)
 {
 	const struct rte_flow_action_port_id *port_id;
+	struct mlx5_priv *act_priv;
+	struct mlx5_priv *dev_priv;
 	uint16_t port;
-	uint16_t esw_domain_id;
-	uint16_t act_port_domain_id;
-	int ret;
 
 	if (!attr->transfer)
 		return rte_flow_error_set(error, ENOTSUP,
@@ -2463,24 +2460,23 @@ struct field_modify_info modify_tcp[] = {
 					  RTE_FLOW_ERROR_TYPE_ACTION, NULL,
 					  "can have only one fate actions in"
 					  " a flow");
-	ret = mlx5_port_to_eswitch_info(dev->data->port_id,
-					&esw_domain_id, NULL);
-	if (ret < 0)
-		return rte_flow_error_set(error, -ret,
+	dev_priv = mlx5_dev_to_eswitch_info(dev);
+	if (!dev_priv)
+		return rte_flow_error_set(error, rte_errno,
 					  RTE_FLOW_ERROR_TYPE_UNSPECIFIED,
 					  NULL,
 					  "failed to obtain E-Switch info");
 	port_id = action->conf;
 	port = port_id->original ? dev->data->port_id : port_id->id;
-	ret = mlx5_port_to_eswitch_info(port, &act_port_domain_id, NULL);
-	if (ret)
+	act_priv = mlx5_port_to_eswitch_info(port);
+	if (!act_priv)
 		return rte_flow_error_set
-				(error, -ret,
+				(error, rte_errno,
 				 RTE_FLOW_ERROR_TYPE_ACTION_CONF, port_id,
 				 "failed to obtain E-Switch port id for port");
-	if (act_port_domain_id != esw_domain_id)
+	if (act_priv->domain_id != dev_priv->domain_id)
 		return rte_flow_error_set
-				(error, -ret,
+				(error, EINVAL,
 				 RTE_FLOW_ERROR_TYPE_ACTION, NULL,
 				 "port does not belong to"
 				 " E-Switch being configured");
@@ -4664,15 +4660,16 @@ struct field_modify_info modify_tcp[] = {
 {
 	const struct rte_flow_item_port_id *pid_m = item ? item->mask : NULL;
 	const struct rte_flow_item_port_id *pid_v = item ? item->spec : NULL;
-	uint16_t mask, val, id;
-	int ret;
+	struct mlx5_priv *priv;
+	uint16_t mask, id;
 
 	mask = pid_m ? pid_m->id : 0xffff;
 	id = pid_v ? pid_v->id : dev->data->port_id;
-	ret = mlx5_port_to_eswitch_info(id, NULL, &val);
-	if (ret)
-		return ret;
-	flow_dv_translate_item_source_vport(matcher, key, val, mask);
+	priv = mlx5_port_to_eswitch_info(id);
+	if (!priv)
+		return -rte_errno;
+	flow_dv_translate_item_source_vport(matcher, key,
+					    priv->vport_id, mask);
 	return 0;
 }
 
@@ -5105,19 +5102,18 @@ struct field_modify_info modify_tcp[] = {
 				 struct rte_flow_error *error)
 {
 	uint32_t port;
-	uint16_t port_id;
-	int ret;
+	struct mlx5_priv *priv;
 	const struct rte_flow_action_port_id *conf =
 			(const struct rte_flow_action_port_id *)action->conf;
 
 	port = conf->original ? dev->data->port_id : conf->id;
-	ret = mlx5_port_to_eswitch_info(port, NULL, &port_id);
-	if (ret)
-		return rte_flow_error_set(error, -ret,
+	priv = mlx5_port_to_eswitch_info(port);
+	if (!priv)
+		return rte_flow_error_set(error, rte_errno,
 					  RTE_FLOW_ERROR_TYPE_ACTION,
 					  NULL,
 					  "No eswitch info was found for port");
-	*dst_port_id = port_id;
+	*dst_port_id = priv->vport_id;
 	return 0;
 }
 
-- 
1.8.3.1


^ permalink raw reply	[flat|nested] 17+ messages in thread

* [dpdk-dev] [PATCH 09/12] net/mlx5: update source and destination vport translations
  2019-09-25  7:53 [dpdk-dev] [PATCH 00/12] net/mlx5: add bonding configuration support Viacheslav Ovsiienko
                   ` (7 preceding siblings ...)
  2019-09-25  7:53 ` [dpdk-dev] [PATCH 08/12] net/mlx5: elaborate E-Switch port parameters query Viacheslav Ovsiienko
@ 2019-09-25  7:53 ` Viacheslav Ovsiienko
  2019-09-25  7:53 ` [dpdk-dev] [PATCH 10/12] net/mlx5: extend switch domain searching range Viacheslav Ovsiienko
                   ` (4 subsequent siblings)
  13 siblings, 0 replies; 17+ messages in thread
From: Viacheslav Ovsiienko @ 2019-09-25  7:53 UTC (permalink / raw)
  To: dev; +Cc: matan, rasland

The new kernel/rdma_core [1] supports matching on the metadata
register instead of the vport field to provide operations over
VF LAG bonding configurations. This patch provides correct
translations for flow matchers and destination port actions
if the unified E-Switch (for VF LAG) is configured and/or the new
vport matching mode is engaged.
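
The selection between the two matching modes can be sketched roughly
as follows. The types vport_match and match_key and both functions are
hypothetical simplifications (the real matcher is a PRM structure
filled with MLX5_SET()); a zero metadata mask is assumed to mean that
the legacy vport field is used:

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

/* Hypothetical per-port matching parameters. */
struct vport_match {
	uint32_t meta_tag;  /* reg_c[0] value identifying the vport */
	uint32_t meta_mask; /* engaged bits of reg_c[0], 0 = legacy mode */
	uint16_t vport_id;  /* vport index for legacy matching */
};

/* Hypothetical flattened matcher: which field is matched and how. */
struct match_key {
	int use_metadata; /* 1 = match on reg_c[0], 0 = match on vport */
	uint32_t value;
	uint32_t mask;
};

/* The tag must not have bits set outside of the engaged mask. */
static int
vport_match_validate(const struct vport_match *vm)
{
	if (vm->meta_tag & ~vm->meta_mask)
		return -ENOTSUP;
	return 0;
}

/* Translate a source port to the metadata or to the vport match. */
static void
translate_source_port(const struct vport_match *vm, uint16_t item_mask,
		      struct match_key *key)
{
	if (vm->meta_mask) {
		/* Metadata mode: the port is identified by reg_c[0]. */
		key->use_metadata = 1;
		key->value = vm->meta_tag;
		key->mask = vm->meta_mask;
	} else {
		/* Legacy mode: the port is identified by vport index. */
		key->use_metadata = 0;
		key->value = vm->vport_id;
		key->mask = item_mask;
	}
}
```

Note that in metadata mode the mask comes from the port query, not
from the flow item, because only the engaged part of the register
carries the vport identity.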

[1] http://patchwork.ozlabs.org/cover/1122170/
    "Mellanox, mlx5 vport metadata matching"

Signed-off-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com>
---
 drivers/net/mlx5/mlx5_flow_dv.c | 38 +++++++++++++++++++++++++++++++++++---
 drivers/net/mlx5/mlx5_prm.h     |  9 ++++++++-
 2 files changed, 43 insertions(+), 4 deletions(-)

diff --git a/drivers/net/mlx5/mlx5_flow_dv.c b/drivers/net/mlx5/mlx5_flow_dv.c
index ad4ff5a..2a7e3ed 100644
--- a/drivers/net/mlx5/mlx5_flow_dv.c
+++ b/drivers/net/mlx5/mlx5_flow_dv.c
@@ -4617,6 +4617,29 @@ struct field_modify_info modify_tcp[] = {
 }
 
 /**
+ * Add vport metadata Reg C0 item to matcher
+ *
+ * @param[in, out] matcher
+ *   Flow matcher.
+ * @param[in, out] key
+ *   Flow matcher value.
+ * @param[in] value,mask
+ *   Metadata register C0 value and mask to match on.
+ */
+static void
+flow_dv_translate_item_meta_vport(void *matcher, void *key,
+				  uint32_t value, uint32_t mask)
+{
+	void *misc2_m =
+		MLX5_ADDR_OF(fte_match_param, matcher, misc_parameters_2);
+	void *misc2_v =
+		MLX5_ADDR_OF(fte_match_param, key, misc_parameters_2);
+
+	MLX5_SET(fte_match_set_misc2, misc2_m, metadata_reg_c_0, mask);
+	MLX5_SET(fte_match_set_misc2, misc2_v, metadata_reg_c_0, value);
+}
+
+/**
  * Add source vport match to the specified matcher.
  *
  * @param[in, out] matcher
@@ -4668,8 +4691,14 @@ struct field_modify_info modify_tcp[] = {
 	priv = mlx5_port_to_eswitch_info(id);
 	if (!priv)
 		return -rte_errno;
-	flow_dv_translate_item_source_vport(matcher, key,
-					    priv->vport_id, mask);
+	/* Translate to vport field or to metadata, depending on mode. */
+	if (priv->vport_meta_mask)
+		flow_dv_translate_item_meta_vport(matcher, key,
+						  priv->vport_meta_tag,
+						  priv->vport_meta_mask);
+	else
+		flow_dv_translate_item_source_vport(matcher, key,
+						    priv->vport_id, mask);
 	return 0;
 }
 
@@ -5113,7 +5142,10 @@ struct field_modify_info modify_tcp[] = {
 					  RTE_FLOW_ERROR_TYPE_ACTION,
 					  NULL,
 					  "No eswitch info was found for port");
-	*dst_port_id = priv->vport_id;
+	if (priv->vport_meta_mask)
+		*dst_port_id = priv->ibv_port;
+	else
+		*dst_port_id = priv->vport_id;
 	return 0;
 }
 
diff --git a/drivers/net/mlx5/mlx5_prm.h b/drivers/net/mlx5/mlx5_prm.h
index e5afc1c..3765df0 100644
--- a/drivers/net/mlx5/mlx5_prm.h
+++ b/drivers/net/mlx5/mlx5_prm.h
@@ -614,7 +614,14 @@ struct mlx5_ifc_fte_match_set_misc2_bits {
 	struct mlx5_ifc_fte_match_mpls_bits inner_first_mpls;
 	struct mlx5_ifc_fte_match_mpls_bits outer_first_mpls_over_gre;
 	struct mlx5_ifc_fte_match_mpls_bits outer_first_mpls_over_udp;
-	u8 reserved_at_80[0x100];
+	u8 metadata_reg_c_7[0x20];
+	u8 metadata_reg_c_6[0x20];
+	u8 metadata_reg_c_5[0x20];
+	u8 metadata_reg_c_4[0x20];
+	u8 metadata_reg_c_3[0x20];
+	u8 metadata_reg_c_2[0x20];
+	u8 metadata_reg_c_1[0x20];
+	u8 metadata_reg_c_0[0x20];
 	u8 metadata_reg_a[0x20];
 	u8 reserved_at_1a0[0x60];
 };
-- 
1.8.3.1


^ permalink raw reply	[flat|nested] 17+ messages in thread

* [dpdk-dev] [PATCH 10/12] net/mlx5: extend switch domain searching range
  2019-09-25  7:53 [dpdk-dev] [PATCH 00/12] net/mlx5: add bonding configuration support Viacheslav Ovsiienko
                   ` (8 preceding siblings ...)
  2019-09-25  7:53 ` [dpdk-dev] [PATCH 09/12] net/mlx5: update source and destination vport translations Viacheslav Ovsiienko
@ 2019-09-25  7:53 ` Viacheslav Ovsiienko
  2019-09-25  7:53 ` [dpdk-dev] [PATCH 11/12] net/mlx5: update switch port ID in bonding configuration Viacheslav Ovsiienko
                   ` (3 subsequent siblings)
  13 siblings, 0 replies; 17+ messages in thread
From: Viacheslav Ovsiienko @ 2019-09-25  7:53 UTC (permalink / raw)
  To: dev; +Cc: matan, rasland

With bonding configurations the switch domain may be shared
between multiple PCI devices. We should search for the switch
sibling devices within the entire set of present Ethernet
devices backed by the mlx5 PMD.
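
The iteration pattern used for this search can be sketched in
isolation. The port table, the port_slot type and the names
find_next(), FOREACH_OWN_PORT() and count_own_ports() are hypothetical;
the driver name string is an assumption made for the example:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define MAX_PORTS 8            /* hypothetical table size */
#define DRIVER_NAME "net_mlx5" /* assumed driver name */

/* Hypothetical stand-in for an entry of the global port table. */
struct port_slot {
	int used;           /* slot holds a live port */
	const char *driver; /* name of the driver backing the port */
};

static struct port_slot ports[MAX_PORTS] = {
	{ 1, "net_mlx5" },  /* 0: ours */
	{ 1, "net_ixgbe" }, /* 1: other driver */
	{ 0, NULL },        /* 2: unused */
	{ 1, "net_mlx5" },  /* 3: ours */
	{ 1, NULL },        /* 4: no driver name */
	{ 1, "net_mlx5" },  /* 5: ours */
};

/* Find the first port at or after port_id backed by our driver. */
static uint16_t
find_next(uint16_t port_id)
{
	while (port_id < MAX_PORTS) {
		const struct port_slot *p = &ports[port_id];

		if (p->used && p->driver && !strcmp(p->driver, DRIVER_NAME))
			break;
		port_id++;
	}
	return port_id < MAX_PORTS ? port_id : MAX_PORTS;
}

/* Iterate over all ports of our driver, regardless of PCI device. */
#define FOREACH_OWN_PORT(port_id) \
	for ((port_id) = find_next(0); \
	     (port_id) < MAX_PORTS; \
	     (port_id) = find_next((port_id) + 1))

/* Count the ports backed by our driver. */
static unsigned int
count_own_ports(void)
{
	uint16_t port_id;
	unsigned int n = 0;

	FOREACH_OWN_PORT(port_id)
		n++;
	return n;
}
```

Since such a loop is no longer limited to one rte_device, the loop body
has to filter the siblings itself, which is why the shared context
comparison is added at each call site in this patch.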

Signed-off-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com>
---
 drivers/net/mlx5/mlx5.c        | 25 ++++++++++++++++++--
 drivers/net/mlx5/mlx5.h        | 10 +++++---
 drivers/net/mlx5/mlx5_ethdev.c | 52 +++++++++---------------------------------
 3 files changed, 41 insertions(+), 46 deletions(-)

diff --git a/drivers/net/mlx5/mlx5.c b/drivers/net/mlx5/mlx5.c
index 1b2f86f..e93f069 100644
--- a/drivers/net/mlx5/mlx5.c
+++ b/drivers/net/mlx5/mlx5.c
@@ -927,7 +927,7 @@ struct mlx5_dev_spawn_data {
 		unsigned int c = 0;
 		uint16_t port_id;
 
-		RTE_ETH_FOREACH_DEV_OF(port_id, dev->device) {
+		MLX5_ETH_FOREACH_DEV(port_id) {
 			struct mlx5_priv *opriv =
 				rte_eth_devices[port_id].data->dev_private;
 
@@ -936,6 +936,7 @@ struct mlx5_dev_spawn_data {
 			    &rte_eth_devices[port_id] == dev)
 				continue;
 			++c;
+			break;
 		}
 		if (!c)
 			claim_zero(rte_eth_switch_domain_free(priv->domain_id));
@@ -1855,11 +1856,12 @@ struct mlx5_dev_spawn_data {
 	 * Look for sibling devices in order to reuse their switch domain
 	 * if any, otherwise allocate one.
 	 */
-	RTE_ETH_FOREACH_DEV_OF(port_id, dpdk_dev) {
+	MLX5_ETH_FOREACH_DEV(port_id) {
 		const struct mlx5_priv *opriv =
 			rte_eth_devices[port_id].data->dev_private;
 
 		if (!opriv ||
+		    opriv->sh != priv->sh ||
 			opriv->domain_id ==
 			RTE_ETH_DEV_SWITCH_DOMAIN_ID_INVALID)
 			continue;
@@ -2732,6 +2734,25 @@ struct mlx5_dev_spawn_data {
 	return ret;
 }
 
+uint16_t
+mlx5_eth_find_next(uint16_t port_id)
+{
+	while (port_id < RTE_MAX_ETHPORTS) {
+		struct rte_eth_dev *dev = &rte_eth_devices[port_id];
+
+		if (dev->state != RTE_ETH_DEV_UNUSED &&
+		    dev->device &&
+		    dev->device->driver &&
+		    dev->device->driver->name &&
+		    !strcmp(dev->device->driver->name, MLX5_DRIVER_NAME))
+			break;
+		port_id++;
+	}
+	if (port_id >= RTE_MAX_ETHPORTS)
+		return RTE_MAX_ETHPORTS;
+	return port_id;
+}
+
 /**
  * DPDK callback to remove a PCI device.
  *
diff --git a/drivers/net/mlx5/mlx5.h b/drivers/net/mlx5/mlx5.h
index 87e0549..60bd204 100644
--- a/drivers/net/mlx5/mlx5.h
+++ b/drivers/net/mlx5/mlx5.h
@@ -681,6 +681,13 @@ int32_t mlx5_release_dbr(struct rte_eth_dev *dev, uint32_t umem_id,
 			 uint64_t offset);
 int mlx5_udp_tunnel_port_add(struct rte_eth_dev *dev,
 			      struct rte_eth_udp_tunnel *udp_tunnel);
+uint16_t mlx5_eth_find_next(uint16_t port_id);
+
+/* Macro to iterate over all valid ports for mlx5 driver. */
+#define MLX5_ETH_FOREACH_DEV(port_id) \
+	for (port_id = mlx5_eth_find_next(0); \
+	     port_id < RTE_MAX_ETHPORTS; \
+	     port_id = mlx5_eth_find_next(port_id + 1))
 
 /* mlx5_ethdev.c */
 
@@ -715,9 +722,6 @@ int mlx5_dev_to_pci_addr(const char *dev_path,
 int mlx5_is_removed(struct rte_eth_dev *dev);
 eth_tx_burst_t mlx5_select_tx_function(struct rte_eth_dev *dev);
 eth_rx_burst_t mlx5_select_rx_function(struct rte_eth_dev *dev);
-unsigned int mlx5_dev_to_port_id(const struct rte_device *dev,
-				 uint16_t *port_list,
-				 unsigned int port_list_n);
 struct mlx5_priv *mlx5_port_to_eswitch_info(uint16_t port);
 struct mlx5_priv *mlx5_dev_to_eswitch_info(struct rte_eth_dev *dev);
 int mlx5_sysfs_switch_info(unsigned int ifindex,
diff --git a/drivers/net/mlx5/mlx5_ethdev.c b/drivers/net/mlx5/mlx5_ethdev.c
index 27372f1..751247d 100644
--- a/drivers/net/mlx5/mlx5_ethdev.c
+++ b/drivers/net/mlx5/mlx5_ethdev.c
@@ -580,16 +580,15 @@ struct ethtool_link_settings {
 	info->switch_info.domain_id = priv->domain_id;
 	info->switch_info.port_id = priv->representor_id;
 	if (priv->representor) {
-		unsigned int i = mlx5_dev_to_port_id(dev->device, NULL, 0);
-		uint16_t port_id[i];
+		uint16_t port_id;
 
-		i = RTE_MIN(mlx5_dev_to_port_id(dev->device, port_id, i), i);
-		while (i--) {
+		MLX5_ETH_FOREACH_DEV(port_id) {
 			struct mlx5_priv *opriv =
-				rte_eth_devices[port_id[i]].data->dev_private;
+				rte_eth_devices[port_id].data->dev_private;
 
 			if (!opriv ||
 			    opriv->representor ||
+			    opriv->sh != priv->sh ||
 			    opriv->domain_id != priv->domain_id)
 				continue;
 			/*
@@ -600,7 +599,6 @@ struct ethtool_link_settings {
 			break;
 		}
 	}
-
 	return 0;
 }
 
@@ -717,11 +715,13 @@ int mlx5_fw_version_get(struct rte_eth_dev *dev, char *fw_ver, size_t fw_size)
 	priv = dev->data->dev_private;
 	domain_id = priv->domain_id;
 	assert(priv->representor);
-	RTE_ETH_FOREACH_DEV_OF(port_id, dev->device) {
-		priv = rte_eth_devices[port_id].data->dev_private;
-		if (priv &&
-		    priv->master &&
-		    priv->domain_id == domain_id)
+	MLX5_ETH_FOREACH_DEV(port_id) {
+		struct mlx5_priv *opriv =
+			rte_eth_devices[port_id].data->dev_private;
+		if (opriv &&
+		    opriv->master &&
+		    opriv->domain_id == domain_id &&
+		    opriv->sh == priv->sh)
 			return &rte_eth_devices[port_id];
 	}
 	return NULL;
@@ -1630,36 +1630,6 @@ int mlx5_fw_version_get(struct rte_eth_dev *dev, char *fw_ver, size_t fw_size)
 }
 
 /**
- * Get port ID list of mlx5 instances sharing a common device.
- *
- * @param[in] dev
- *   Device to look for.
- * @param[out] port_list
- *   Result buffer for collected port IDs.
- * @param port_list_n
- *   Maximum number of entries in result buffer. If 0, @p port_list can be
- *   NULL.
- *
- * @return
- *   Number of matching instances regardless of the @p port_list_n
- *   parameter, 0 if none were found.
- */
-unsigned int
-mlx5_dev_to_port_id(const struct rte_device *dev, uint16_t *port_list,
-		    unsigned int port_list_n)
-{
-	uint16_t id;
-	unsigned int n = 0;
-
-	RTE_ETH_FOREACH_DEV_OF(id, dev) {
-		if (n < port_list_n)
-			port_list[n] = id;
-		n++;
-	}
-	return n;
-}
-
-/**
  * Get the E-Switch parameters by port id.
  *
  * @param[in] port
-- 
1.8.3.1


^ permalink raw reply	[flat|nested] 17+ messages in thread

* [dpdk-dev] [PATCH 11/12] net/mlx5: update switch port ID in bonding configuration
  2019-09-25  7:53 [dpdk-dev] [PATCH 00/12] net/mlx5: add bonding configuration support Viacheslav Ovsiienko
                   ` (9 preceding siblings ...)
  2019-09-25  7:53 ` [dpdk-dev] [PATCH 10/12] net/mlx5: extend switch domain searching range Viacheslav Ovsiienko
@ 2019-09-25  7:53 ` Viacheslav Ovsiienko
  2019-09-25  7:53 ` [dpdk-dev] [PATCH 12/12] net/mlx5: check sibling device configurations mismatch Viacheslav Ovsiienko
                   ` (2 subsequent siblings)
  13 siblings, 0 replies; 17+ messages in thread
From: Viacheslav Ovsiienko @ 2019-09-25  7:53 UTC (permalink / raw)
  To: dev; +Cc: matan, rasland

With a bonding configuration multiple PFs may represent a
single switching device with multiple ports as representors.
To distinguish representors belonging to different PFs we
should generate unique port IDs. It is proposed to use
the PF index in the bonding configuration to generate these
unique port IDs.
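
A minimal sketch of such an encoding is given below. It assumes a
16-bit switch port ID with the PF index in the upper four bits and the
representor index in the lower twelve bits; the constants PF_MASK and
PF_SHIFT and the function encode_switch_port_id() are hypothetical and
chosen for the example, they are not the values defined by the patch:

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

#define PF_MASK 0xfu /* assumed: PF index fits into four bits */
#define PF_SHIFT 12  /* assumed: PF index occupies bits 12..15 */

/*
 * Build the switch port ID from the representor index and the PF
 * index in the bonding configuration. A negative pf_bond means
 * "no bonding" and leaves the representor index unmodified.
 */
static int
encode_switch_port_id(uint16_t representor_id, int32_t pf_bond,
		      uint16_t *port_id)
{
	assert(port_id != NULL);
	if (pf_bond < 0) {
		*port_id = representor_id;
		return 0;
	}
	/* Reject values which would overlap or overflow the fields. */
	if ((representor_id >> PF_SHIFT) || (uint32_t)pf_bond > PF_MASK)
		return -ENODEV;
	*port_id = (uint16_t)(representor_id |
			      ((uint32_t)pf_bond << PF_SHIFT));
	return 0;
}
```

With this layout representor 5 on PF 1 gets the port ID 0x1005, while
the same representor without bonding keeps the port ID 5.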

Signed-off-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com>
---
 drivers/net/mlx5/mlx5.c        |  1 +
 drivers/net/mlx5/mlx5.h        |  1 +
 drivers/net/mlx5/mlx5_defs.h   |  4 ++++
 drivers/net/mlx5/mlx5_ethdev.c | 19 +++++++++++++++++++
 4 files changed, 25 insertions(+)

diff --git a/drivers/net/mlx5/mlx5.c b/drivers/net/mlx5/mlx5.c
index e93f069..71b30d9 100644
--- a/drivers/net/mlx5/mlx5.c
+++ b/drivers/net/mlx5/mlx5.c
@@ -1789,6 +1789,7 @@ struct mlx5_dev_spawn_data {
 	priv->domain_id = RTE_ETH_DEV_SWITCH_DOMAIN_ID_INVALID;
 	priv->vport_meta_tag = 0;
 	priv->vport_meta_mask = 0;
+	priv->pf_bond = spawn->pf_bond;
 #ifdef HAVE_MLX5DV_DR_DEVX_PORT
 	/*
 	 * The DevX port query API is implemented. E-Switch may use
diff --git a/drivers/net/mlx5/mlx5.h b/drivers/net/mlx5/mlx5.h
index 60bd204..d4d2ca8 100644
--- a/drivers/net/mlx5/mlx5.h
+++ b/drivers/net/mlx5/mlx5.h
@@ -623,6 +623,7 @@ struct mlx5_priv {
 	uint32_t vport_meta_tag; /* Used for vport index match over VF LAG. */
 	uint32_t vport_meta_mask; /* Used for vport index field match mask. */
 	int32_t representor_id; /* Port representor identifier. */
+	int32_t pf_bond; /* >=0 means PF index in bonding configuration. */
 	unsigned int if_index; /* Associated kernel network device index. */
 	/* RX/TX queues. */
 	unsigned int rxqs_n; /* RX queues array size. */
diff --git a/drivers/net/mlx5/mlx5_defs.h b/drivers/net/mlx5/mlx5_defs.h
index d7440fd..180122d 100644
--- a/drivers/net/mlx5/mlx5_defs.h
+++ b/drivers/net/mlx5/mlx5_defs.h
@@ -58,6 +58,10 @@
 #define MLX5_PMD_SOFT_COUNTERS 1
 #endif
 
+/* Switch port ID parameters for bonding configurations. */
+#define MLX5_PORT_ID_BONDING_PF_MASK 0xf
+#define MLX5_PORT_ID_BONDING_PF_SHIFT 0xf
+
 /* Alarm timeout. */
 #define MLX5_ALARM_TIMEOUT_US 100000
 
diff --git a/drivers/net/mlx5/mlx5_ethdev.c b/drivers/net/mlx5/mlx5_ethdev.c
index 751247d..aa645d0 100644
--- a/drivers/net/mlx5/mlx5_ethdev.c
+++ b/drivers/net/mlx5/mlx5_ethdev.c
@@ -582,6 +582,25 @@ struct ethtool_link_settings {
 	if (priv->representor) {
 		uint16_t port_id;
 
+		if (priv->pf_bond >= 0) {
+			/*
+			 * Switch port ID is opaque value with driver defined
+			 * format. Push the PF index in bonding configurations
+			 * in upper four bits of port ID. If we get too many
+			 * representors (more than 4K) or PFs (more than 15)
+			 * this approach must be reconsidered.
+			 */
+			if ((info->switch_info.port_id >>
+				MLX5_PORT_ID_BONDING_PF_SHIFT) ||
+			    priv->pf_bond > MLX5_PORT_ID_BONDING_PF_MASK) {
+				DRV_LOG(ERR, "can't update switch port ID"
+					     " for bonding device");
+				assert(false);
+				return -ENODEV;
+			}
+			info->switch_info.port_id |=
+				priv->pf_bond << MLX5_PORT_ID_BONDING_PF_SHIFT;
+		}
 		MLX5_ETH_FOREACH_DEV(port_id) {
 			struct mlx5_priv *opriv =
 				rte_eth_devices[port_id].data->dev_private;
-- 
1.8.3.1


^ permalink raw reply	[flat|nested] 17+ messages in thread

* [dpdk-dev] [PATCH 12/12] net/mlx5: check sibling device configurations mismatch
  2019-09-25  7:53 [dpdk-dev] [PATCH 00/12] net/mlx5: add bonding configuration support Viacheslav Ovsiienko
                   ` (10 preceding siblings ...)
  2019-09-25  7:53 ` [dpdk-dev] [PATCH 11/12] net/mlx5: update switch port ID in bonding configuration Viacheslav Ovsiienko
@ 2019-09-25  7:53 ` Viacheslav Ovsiienko
  2019-09-25 10:29 ` [dpdk-dev] [PATCH 00/12] net/mlx5: add bonding configuration support Matan Azrad
  2019-09-29 11:47 ` Raslan Darawsheh
  13 siblings, 0 replies; 17+ messages in thread
From: Viacheslav Ovsiienko @ 2019-09-25  7:53 UTC (permalink / raw)
  To: dev; +Cc: matan, rasland

The devices backed by the mlx5 PMD might share the same multiport
Infiniband device context. This applies to representors and slaves
of a bonding device. These ports are spawned with devargs.
This patch checks whether the configuration deduced from these
devargs is compatible with the configurations of the devices
sharing the same context. It prevents incorrect
whitelists, like:

-w 82:00.0,representor=0,dv_flow_en=1
-w 82:00.0,representor=1,dv_flow_en=0

The representors with indices [0-1] are supposed to be spawned
over the same PCI device, but there is a dv_flow_en parameter
mismatch.
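
The idea of the check can be sketched as follows. The types dev_config,
shared_ctx and dev and the function check_sibling_config() are
hypothetical simplifications, only the dv_flow_en flag is compared, and
the error is returned as a negative errno value for the example:

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical subset of the device configuration. */
struct dev_config {
	unsigned int dv_flow_en:1; /* DV flow engine is requested */
};

/* Hypothetical shared Infiniband device context. */
struct shared_ctx {
	const char *name;
};

/* Hypothetical already spawned device. */
struct dev {
	struct shared_ctx *sh;    /* context the device is attached to */
	struct dev_config config; /* configuration it was spawned with */
};

/*
 * Compare the configuration of a device being created with the one
 * of an already spawned device sharing the same context.
 */
static int
check_sibling_config(const struct dev *devs, size_t n,
		     const struct shared_ctx *sh,
		     const struct dev_config *config)
{
	size_t i;

	for (i = 0; i < n; i++) {
		if (devs[i].sh != sh)
			continue;
		if (devs[i].config.dv_flow_en != config->dv_flow_en)
			return -EINVAL;
		/* Existing siblings were checked when they were spawned. */
		break;
	}
	return 0;
}
```

Comparing against the first sibling found is enough under the
assumption that every previously spawned device passed the same check.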

Signed-off-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com>
---
 drivers/net/mlx5/mlx5.c | 50 +++++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 50 insertions(+)

diff --git a/drivers/net/mlx5/mlx5.c b/drivers/net/mlx5/mlx5.c
index 71b30d9..951b9f5 100644
--- a/drivers/net/mlx5/mlx5.c
+++ b/drivers/net/mlx5/mlx5.c
@@ -1519,6 +1519,53 @@ struct mlx5_dev_spawn_data {
 }
 
 /**
+ * Check sibling device configurations.
+ *
+ * Sibling devices sharing the Infiniband device context
+ * should have compatible configurations. This applies to
+ * representors and bonding slaves.
+ *
+ * @param priv
+ *   Private device descriptor.
+ * @param config
+ *   Configuration of the device that is going to be created.
+ *
+ * @return
+ *   0 on success, EINVAL otherwise and rte_errno is set.
+ */
+static int
+mlx5_dev_check_sibling_config(struct mlx5_priv *priv,
+			      struct mlx5_dev_config *config)
+{
+	struct mlx5_ibv_shared *sh = priv->sh;
+	struct mlx5_dev_config *sh_conf = NULL;
+	uint16_t port_id;
+
+	assert(sh);
+	/* Nothing to compare for the single/first device. */
+	if (sh->refcnt == 1)
+		return 0;
+	/* Find the device with shared context. */
+	MLX5_ETH_FOREACH_DEV(port_id) {
+		struct mlx5_priv *opriv =
+			rte_eth_devices[port_id].data->dev_private;
+
+		if (opriv && opriv != priv && opriv->sh == sh) {
+			sh_conf = &opriv->config;
+			break;
+		}
+	}
+	if (!sh_conf)
+		return 0;
+	if (sh_conf->dv_flow_en ^ config->dv_flow_en) {
+		DRV_LOG(ERR, "\"dv_flow_en\" configuration mismatch"
+			     " for shared %s context", sh->ibdev_name);
+		rte_errno = EINVAL;
+		return rte_errno;
+	}
+	return 0;
+}
+/**
  * Spawn an Ethernet device from Verbs information.
  *
  * @param dpdk_dev
@@ -1886,6 +1933,9 @@ struct mlx5_dev_spawn_data {
 			strerror(rte_errno));
 		goto error;
 	}
+	err = mlx5_dev_check_sibling_config(priv, &config);
+	if (err)
+		goto error;
 	config.hw_csum = !!(sh->device_attr.device_cap_flags_ex &
 			    IBV_DEVICE_RAW_IP_CSUM);
 	DRV_LOG(DEBUG, "checksum offloading is %ssupported",
-- 
1.8.3.1


^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [dpdk-dev] [PATCH 00/12] net/mlx5: add bonding configuration support
  2019-09-25  7:53 [dpdk-dev] [PATCH 00/12] net/mlx5: add bonding configuration support Viacheslav Ovsiienko
                   ` (11 preceding siblings ...)
  2019-09-25  7:53 ` [dpdk-dev] [PATCH 12/12] net/mlx5: check sibling device configurations mismatch Viacheslav Ovsiienko
@ 2019-09-25 10:29 ` Matan Azrad
  2019-09-29 11:47 ` Raslan Darawsheh
  13 siblings, 0 replies; 17+ messages in thread
From: Matan Azrad @ 2019-09-25 10:29 UTC (permalink / raw)
  To: Slava Ovsiienko, dev; +Cc: Raslan Darawsheh



From: Viacheslav Ovsiienko
> Multiport Mellanox NICs may support the bonding configurations internally.
> Let's suppose there is ConnectX-5 NIC with two physical ports, on the host it
> presents two PCI physical functions:
> 
> - PF0, say with PCI address 0000:82:00.0 and net interface ens1f0
> - PF1, say with PCI address 0000:82:00.1 and net interface ens1f1
> 
> Also, let's suppose SR-IOV feature is enabled, switchdev mode is engaged,
> and there is some set of virtual PCI functions and their representor interfaces.
> The physical interfaces may be combined into a single bond interface,
> supported directly by NIC HW/FW means, with the standard script:
> 
>   modprobe bonding miimon=100 mode=4  # 100 ms link check interval, mode - LACP
>   ip link set ens1f0 master bond0
>   ip link set ens1f1 master bond0
> 
> The dedicated Infiniband devices for single ports are destroyed, the new
> multiport Infiniband device is created for bond interface and all representors
> for both PFs. The unified E-Switch is created as well, and all representor ports
> belong to the same unified switch domain.
> 
> To use the created bond interface with DPDK application both slave PCI
> devices must be specified (in whitelist, if any):
> 
>   -w 82:00.0,representor=[0-4]
>   -w 82:00.1,representor=[0-7]
> 
> Representor enumeration follows the VF enumeration in the same way as
> for a single device. The two PCI devices will be probed, but eth ports for only
> one master device and for all representors will be created.
> These ports may refer to different rte_pci_dev but share the same switch
> domain ID.
> 
> The extra devargs specifying configurations must be compatible (otherwise
> an error will be raised on probing). For example, it is not allowed to specify
> different values of dv_flow_en parameter for different PCI devices.
> 
> Signed-off-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com>
> 
> Viacheslav Ovsiienko (12):
>   net/mlx5: move backing PCI device to private context
>   net/mlx5: update PCI address retrieving routine
>   net/mlx5: allocate device list explicitly
>   net/mlx5: add VF LAG mode bonding device recognition
>   net/mlx5: generate bonding device name
>   net/mlx5: check the kernel support for VF LAG bonding
>   net/mlx5: query vport index match mode and parameters
>   net/mlx5: elaborate E-Switch port parameters query
>   net/mlx5: update source and destination vport translations
>   net/mlx5: extend switch domain searching range
>   net/mlx5: update switch port ID in bonding configuration
>   net/mlx5: check sibling device configurations mismatch
> 
>  drivers/net/mlx5/Makefile       |   5 +
>  drivers/net/mlx5/meson.build    |   2 +
>  drivers/net/mlx5/mlx5.c         | 359 +++++++++++++++++++++++++++++++++++++---
>  drivers/net/mlx5/mlx5.h         |  23 ++-
>  drivers/net/mlx5/mlx5_defs.h    |   4 +
>  drivers/net/mlx5/mlx5_ethdev.c  | 128 +++++++-------
>  drivers/net/mlx5/mlx5_flow_dv.c |  98 +++++++----
>  drivers/net/mlx5/mlx5_prm.h     |   9 +-
>  drivers/net/mlx5/mlx5_txq.c     |   2 +-
>  9 files changed, 506 insertions(+), 124 deletions(-)
> 
> --
> 1.8.3.1
For all the series:
Acked-by: Matan Azrad <matan@mellanox.com>


* Re: [dpdk-dev] [PATCH 00/12] net/mlx5: add bonding configuration support
  2019-09-25  7:53 [dpdk-dev] [PATCH 00/12] net/mlx5: add bonding configuration support Viacheslav Ovsiienko
                   ` (12 preceding siblings ...)
  2019-09-25 10:29 ` [dpdk-dev] [PATCH 00/12] net/mlx5: add bonding configuration support Matan Azrad
@ 2019-09-29 11:47 ` Raslan Darawsheh
  13 siblings, 0 replies; 17+ messages in thread
From: Raslan Darawsheh @ 2019-09-29 11:47 UTC (permalink / raw)
  To: Slava Ovsiienko, dev; +Cc: Matan Azrad

Hi,

> -----Original Message-----
> From: Viacheslav Ovsiienko <viacheslavo@mellanox.com>
> Sent: Wednesday, September 25, 2019 10:53 AM
> To: dev@dpdk.org
> Cc: Matan Azrad <matan@mellanox.com>; Raslan Darawsheh
> <rasland@mellanox.com>
> Subject: [PATCH 00/12] net/mlx5: add bonding configuration support
> 
> Multiport Mellanox NICs may support bonding configurations internally.
> Let's suppose there is a ConnectX-5 NIC with two physical ports; on the host it
> presents two PCI physical functions:
> 
> - PF0, say with PCI address 0000:82:00.0 and net interface ens1f0
> - PF1, say with PCI address 0000:82:00.1 and net interface ens1f1
> 
> Also, let's suppose the SR-IOV feature is enabled, switchdev mode is engaged,
> and there is some set of virtual PCI functions and their representor interfaces.
> The physical interfaces may be combined into a single bond interface,
> supported directly by NIC HW/FW means, with a standard script:
> 
>   modprobe bonding miimon=100 mode=4  # 100 ms link check interval, mode - LACP
>   ip link set ens1f0 master bond0
>   ip link set ens1f1 master bond0
> 
> The dedicated Infiniband devices for single ports are destroyed, and a new
> multiport Infiniband device is created for the bond interface and all representors
> for both PFs. The unified E-Switch is created as well, and all representor ports
> belong to the same unified switch domain.
> 
> To use the created bond interface with a DPDK application, both slave PCI
> devices must be specified (in the whitelist, if any):
> 
>   -w 82:00.0,representor=[0-4]
>   -w 82:00.1,representor=[0-7]
> 
> Representor enumeration follows the VF enumeration in the same way as
> for a single device. The two PCI devices will be probed, but eth ports will
> be created only for the one master device and for all representors.
> These ports may reference different rte_pci_dev but share the same switch
> domain ID.
> 
> The extra devargs specifying configurations must be compatible (otherwise
> an error will be raised on probing). For example, it is not allowed to specify
> different values of the dv_flow_en parameter for different PCI devices.
> 
> Signed-off-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com>
> 
> Viacheslav Ovsiienko (12):
>   net/mlx5: move backing PCI device to private context
>   net/mlx5: update PCI address retrieving routine
>   net/mlx5: allocate device list explicitly
>   net/mlx5: add VF LAG mode bonding device recognition
>   net/mlx5: generate bonding device name
>   net/mlx5: check the kernel support for VF LAG bonding
>   net/mlx5: query vport index match mode and parameters
>   net/mlx5: elaborate E-Switch port parameters query
>   net/mlx5: update source and destination vport translations
>   net/mlx5: extend switch domain searching range
>   net/mlx5: update switch port ID in bonding configuration
>   net/mlx5: check sibling device configurations mismatch
> 
>  drivers/net/mlx5/Makefile       |   5 +
>  drivers/net/mlx5/meson.build    |   2 +
>  drivers/net/mlx5/mlx5.c         | 359 +++++++++++++++++++++++++++++++++++++---
>  drivers/net/mlx5/mlx5.h         |  23 ++-
>  drivers/net/mlx5/mlx5_defs.h    |   4 +
>  drivers/net/mlx5/mlx5_ethdev.c  | 128 +++++++-------
>  drivers/net/mlx5/mlx5_flow_dv.c |  98 +++++++----
>  drivers/net/mlx5/mlx5_prm.h     |   9 +-
>  drivers/net/mlx5/mlx5_txq.c     |   2 +-
>  9 files changed, 506 insertions(+), 124 deletions(-)
> 
> --
> 1.8.3.1


Series pushed to next-net-mlx,

Kindest regards,
Raslan Darawsheh


* Re: [dpdk-dev] [PATCH 04/12] net/mlx5: add VF LAG mode bonding device recognition
  2019-09-25  7:53 ` [dpdk-dev] [PATCH 04/12] net/mlx5: add VF LAG mode bonding device recognition Viacheslav Ovsiienko
@ 2019-09-30 10:34   ` Ferruh Yigit
  2019-10-01  9:02     ` Slava Ovsiienko
  0 siblings, 1 reply; 17+ messages in thread
From: Ferruh Yigit @ 2019-09-30 10:34 UTC (permalink / raw)
  To: Viacheslav Ovsiienko, dev; +Cc: matan, rasland

On 9/25/2019 8:53 AM, Viacheslav Ovsiienko wrote:
> The Mellanox NICs starting from ConnectX-5 support LAG over
> NIC ports internally, implemented by the NIC firmware and hardware.
> The multiport NIC presents multiple physical PCI functions (PF),
> with SR-IOV multiple virtual PCI functions (VFs) might be presented.
> With switchdev mode the VF representors are engaged and PFs and their
> VFs are connected by internal E-Switch feature. Each PF and related VFs
> have dedicated E-Switch and belong to dedicated switch domain.
> 
> If NIC ports are combined to support LAG, the kernel drivers introduce
> a single unified multiport Infiniband device, and only one
> unified E-Switch with a single switch domain combines the master PF
> and all VFs. No extra DPDK bonding device is needed.
> 
> Signed-off-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com>

<...>

> +
> +	/* Use safe format to check maximal buffer length. */
> +#pragma GCC diagnostic ignored "-Wformat-nonliteral"
> +	while (fscanf(file, format, ifname) == 1) {
> +#pragma GCC diagnostic error "-Wformat-nonliteral"
> +		char tmp_str[IF_NAMESIZE + 32];
> +		struct rte_pci_addr pci_addr;
> +		struct mlx5_switch_info	info;


Hi Slava,

The ICC is failing because of the pragma [1],
- Can you please explain why it is needed, can we escape from it?
- If we can't escape can you please add compiler checks around it? [2]
- And should we enable it as 'error' by default, what about following [3]?


[1]
.../drivers/net/mlx5/mlx5.c(2301): error #2282: unrecognized GCC pragma
  #pragma GCC diagnostic ignored "-Wformat-nonliteral"
                         ^

[2]
#ifdef __GNUC__
#if __GNUC__ > 4 || (__GNUC__ == 4 && __GNUC_MINOR__ > 1)
#pragma GCC diagnostic ignored "-Wformat-nonliteral"
#endif
#endif


[3]
 #pragma GCC diagnostic push
 #pragma GCC diagnostic ignored "-Wformat-nonliteral"
<...>
 #pragma GCC diagnostic pop
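
A minimal sketch of how [2] and [3] might be combined, for illustration only. The macro names DIAG_PUSH_IGNORE_FORMAT_NONLITERAL and DIAG_POP and the helper read_token() are hypothetical and not part of the patch. The sketch assumes a compiler that accepts "#pragma GCC diagnostic push/pop" (GCC 4.6 or newer, or clang) and that ICC can be told apart via __INTEL_COMPILER.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper macros (not from the patch). They expand to the
 * GCC diagnostic pragmas only for GCC-compatible compilers; ICC is
 * excluded explicitly because it may define __GNUC__ as well while
 * rejecting the pragma.
 */
#if defined(__GNUC__) && !defined(__INTEL_COMPILER)
#define DIAG_PUSH_IGNORE_FORMAT_NONLITERAL \
	_Pragma("GCC diagnostic push") \
	_Pragma("GCC diagnostic ignored \"-Wformat-nonliteral\"")
#define DIAG_POP _Pragma("GCC diagnostic pop")
#else
#define DIAG_PUSH_IGNORE_FORMAT_NONLITERAL
#define DIAG_POP
#endif

/*
 * Read one whitespace-delimited token from 'input' into 'out'.
 * The format is built at run time so that the field width always
 * matches the buffer size; this is what makes it non-literal.
 */
static int
read_token(const char *input, char *out, size_t out_size)
{
	char format[32];
	int ret;

	assert(out_size > 1);
	/* Width is out_size - 1 so the terminating NUL always fits. */
	snprintf(format, sizeof(format), "%%%zus", out_size - 1);
	DIAG_PUSH_IGNORE_FORMAT_NONLITERAL
	ret = sscanf(input, format, out);
	DIAG_POP
	/* After the pop the previous diagnostic state is restored. */
	return ret;
}
```

With push/pop the warning returns to whatever state the build had configured, instead of being forced to 'error' after the call. For example, read_token("ens1f0 ens1f1", name, sizeof(name)) stores "ens1f0" in name and returns 1.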


* Re: [dpdk-dev] [PATCH 04/12] net/mlx5: add VF LAG mode bonding device recognition
  2019-09-30 10:34   ` Ferruh Yigit
@ 2019-10-01  9:02     ` Slava Ovsiienko
  0 siblings, 0 replies; 17+ messages in thread
From: Slava Ovsiienko @ 2019-10-01  9:02 UTC (permalink / raw)
  To: Ferruh Yigit, dev; +Cc: Matan Azrad, Raslan Darawsheh

Hi, Ferruh

> -----Original Message-----
> From: Ferruh Yigit <ferruh.yigit@intel.com>
> Sent: Monday, September 30, 2019 13:35
> To: Slava Ovsiienko <viacheslavo@mellanox.com>; dev@dpdk.org
> Cc: Matan Azrad <matan@mellanox.com>; Raslan Darawsheh
> <rasland@mellanox.com>
> Subject: Re: [dpdk-dev] [PATCH 04/12] net/mlx5: add VF LAG mode bonding
> device recognition
> 
> On 9/25/2019 8:53 AM, Viacheslav Ovsiienko wrote:
> > The Mellanox NICs starting from ConnectX-5 support LAG over NIC ports
> > internally, implemented by the NIC firmware and hardware.
> > The multiport NIC presents multiple physical PCI functions (PF), with
> > SR-IOV multiple virtual PCI functions (VFs) might be presented.
> > With switchdev mode the VF representors are engaged and PFs and their
> > VFs are connected by internal E-Switch feature. Each PF and related
> > VFs have dedicated E-Switch and belong to dedicated switch domain.
> >
> > If NIC ports are combined to support LAG, the kernel drivers introduce
> > a single unified multiport Infiniband device, and only one
> > unified E-Switch with a single switch domain combines the master PF
> > and all VFs. No extra DPDK bonding device is needed.
> >
> > Signed-off-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com>
> 
> <...>
> 
> > +
> > +	/* Use safe format to check maximal buffer length. */
> > +#pragma GCC diagnostic ignored "-Wformat-nonliteral"
> > +	while (fscanf(file, format, ifname) == 1) {
> > +#pragma GCC diagnostic error "-Wformat-nonliteral"
> > +		char tmp_str[IF_NAMESIZE + 32];
> > +		struct rte_pci_addr pci_addr;
> > +		struct mlx5_switch_info	info;
> 
> 
> Hi Slava,
> 
> The ICC is failing because of the pragma [1],
> - Can you please explain why it is needed, can we escape from it?

GCC emits a warning, or an error when built with "-Werror=format-nonliteral",
for the fscanf(file, format, ...) call if the "format" parameter is not a string literal.
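
For illustration, one possible way to keep the width limit without a non-literal format is to stringify a constant width into a literal format string. This is only a sketch of an alternative, not what the patch does. IFNAME_WIDTH, STR() and read_ifname() are hypothetical names, and the constant 15 assumes IF_NAMESIZE is at least 16.

```c
#include <assert.h>
#include <net/if.h>	/* IF_NAMESIZE */
#include <stdio.h>
#include <string.h>

/* Hypothetical constant: maximal token length, excluding the NUL. */
#define IFNAME_WIDTH 15

/* Two-level stringification so IFNAME_WIDTH is expanded first. */
#define STR_(x) #x
#define STR(x) STR(x)
#undef STR
#define STR(x) STR_(x)

/* Fail the build if the literal width would overflow the buffer. */
_Static_assert(IFNAME_WIDTH < IF_NAMESIZE,
	       "IFNAME_WIDTH must leave room for the terminating NUL");

/*
 * Read the next interface name from 'file'. The format is the string
 * literal "%15s", so -Wformat-nonliteral has nothing to complain about.
 * Returns 1 on success, EOF when no more names are available.
 */
static int
read_ifname(FILE *file, char ifname[IF_NAMESIZE])
{
	return fscanf(file, "%" STR(IFNAME_WIDTH) "s", ifname);
}
```

The trade-off is that the width becomes a separate constant that has to be kept in sync with the buffer size, which the static assertion only checks in one direction.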

> - If we can't escape can you please add compiler checks around it? [2]


Sure, I will. And I'm sorry for missing this in the patch.
BTW, there are multiple usages of "#ifdef __INTEL_COMPILER"
around the GCC pragmas. I think RTE_TOOLCHAIN_GCC is more relevant for the
case being discussed.

> - And should we enable it as 'error' by default, what about following [3]?
Yes, it would be better, thanks for the nice clue.

WBR, Slava
> 
> [1]
> .../drivers/net/mlx5/mlx5.c(2301): error #2282: unrecognized GCC pragma
>   #pragma GCC diagnostic ignored "-Wformat-nonliteral"
>                          ^
> 
> [2]
> #ifdef __GNUC__
> #if __GNUC__ > 4 || (__GNUC__ == 4 && __GNUC_MINOR__ > 1)
> #pragma GCC diagnostic ignored "-Wformat-nonliteral"
> #endif
> #endif
> 
> 
> [3]
>  #pragma GCC diagnostic push
>  #pragma GCC diagnostic ignored "-Wformat-nonliteral"
> <...>
>  #pragma GCC diagnostic pop

