DPDK patches and discussions
 help / color / mirror / Atom feed
* [dpdk-dev] [PATCH v2 0/2] support socket direct mode bonding
@ 2021-09-28  8:50 Rongwei Liu
  2021-09-28  8:50 ` [dpdk-dev] [PATCH v2 1/2] common/mlx5: support pcie device guid query Rongwei Liu
  2021-09-28  8:50 ` [dpdk-dev] [PATCH v2 2/2] net/mlx5: support socket direct mode bonding Rongwei Liu
  0 siblings, 2 replies; 11+ messages in thread
From: Rongwei Liu @ 2021-09-28  8:50 UTC (permalink / raw)
  To: matan, viacheslavo, orika, thomas; +Cc: dev, rasland

In socket direct mode, it's possible to bind any two (maybe four
in the future) PCIe devices with IDs like xxxx:xx:xx.x and
yyyy:yy:yy.y. Bonding member interfaces are unnecessary to have
the same PCIe domain/bus/device ID anymore.

Doesn't need to backport to DPDK 20.11

v2: fix ci warnings.

Rongwei Liu (2):
  common/mlx5: support pcie device guid query
  net/mlx5: support socket direct mode bonding

 drivers/common/mlx5/linux/mlx5_common_os.c | 41 +++++++++++++++++++++
 drivers/common/mlx5/linux/mlx5_common_os.h | 19 ++++++++++
 drivers/net/mlx5/linux/mlx5_os.c           | 43 +++++++++++++++++-----
 3 files changed, 94 insertions(+), 9 deletions(-)

-- 
2.27.0


^ permalink raw reply	[flat|nested] 11+ messages in thread

* [dpdk-dev] [PATCH v2 1/2] common/mlx5: support pcie device guid query
  2021-09-28  8:50 [dpdk-dev] [PATCH v2 0/2] support socket direct mode bonding Rongwei Liu
@ 2021-09-28  8:50 ` Rongwei Liu
  2021-09-28  8:50 ` [dpdk-dev] [PATCH v2 2/2] net/mlx5: support socket direct mode bonding Rongwei Liu
  1 sibling, 0 replies; 11+ messages in thread
From: Rongwei Liu @ 2021-09-28  8:50 UTC (permalink / raw)
  To: matan, viacheslavo, orika, thomas; +Cc: dev, rasland

sysfs entry "phys_switch_id" holds each PCIe device'
guid.

The devices which reside in the same physical NIC should
have the same guid.

Signed-off-by: Rongwei Liu <rongweil@nvidia.com>
Acked-by: Viacheslav Ovsiienko <viacheslavo@nvidia.com>
---
 drivers/common/mlx5/linux/mlx5_common_os.c | 41 ++++++++++++++++++++++
 drivers/common/mlx5/linux/mlx5_common_os.h | 19 ++++++++++
 2 files changed, 60 insertions(+)

diff --git a/drivers/common/mlx5/linux/mlx5_common_os.c b/drivers/common/mlx5/linux/mlx5_common_os.c
index 9e0c823c97..8b3ee2baea 100644
--- a/drivers/common/mlx5/linux/mlx5_common_os.c
+++ b/drivers/common/mlx5/linux/mlx5_common_os.c
@@ -2,6 +2,7 @@
  * Copyright 2020 Mellanox Technologies, Ltd
  */
 
+#include <sys/types.h>
 #include <unistd.h>
 #include <string.h>
 #include <stdio.h>
@@ -428,3 +429,43 @@ mlx5_os_get_ibv_device(const struct rte_pci_addr *addr)
 	mlx5_glue->free_device_list(ibv_list);
 	return ibv_match;
 }
+
+int
+mlx5_get_device_guid(const struct rte_pci_addr *dev, uint8_t *guid, size_t len)
+{
+	char tmp[512];
+	char cur_ifname[IF_NAMESIZE + 1];
+	FILE *id_file;
+	DIR *dir;
+	struct dirent *ptr;
+	int ret;
+
+	if (guid == NULL || len < sizeof(u_int64_t) + 1)
+		return -1;
+	memset(guid, 0, len);
+	snprintf(tmp, sizeof(tmp), "/sys/bus/pci/devices/%04x:%02x:%02x.%x/net",
+			dev->domain, dev->bus, dev->devid, dev->function);
+	dir = opendir(tmp);
+	if (dir == NULL)
+		return -1;
+	/* Traverse to identify PF interface */
+	do {
+		ptr = readdir(dir);
+		if (ptr == NULL || ptr->d_type != DT_DIR) {
+			closedir(dir);
+			return -1;
+		}
+	} while (strchr(ptr->d_name, '.') || strchr(ptr->d_name, '_') ||
+		 strchr(ptr->d_name, 'v'));
+	snprintf(cur_ifname, sizeof(cur_ifname), "%s", ptr->d_name);
+	closedir(dir);
+	snprintf(tmp + strlen(tmp), sizeof(tmp) - strlen(tmp),
+			"/%s/phys_switch_id", cur_ifname);
+	/* Older OFED like 5.3 doesn't support read */
+	id_file = fopen(tmp, "r");
+	if (!id_file)
+		return 0;
+	ret = fscanf(id_file, "%16s", guid);
+	fclose(id_file);
+	return ret;
+}
diff --git a/drivers/common/mlx5/linux/mlx5_common_os.h b/drivers/common/mlx5/linux/mlx5_common_os.h
index c3202b6786..3cdea75373 100644
--- a/drivers/common/mlx5/linux/mlx5_common_os.h
+++ b/drivers/common/mlx5/linux/mlx5_common_os.h
@@ -296,4 +296,23 @@ __rte_internal
 struct ibv_device *
 mlx5_os_get_ibv_dev(const struct rte_device *dev);
 
+/**
+ * This is used to query system_image_guid as describing in PRM.
+ *
+ * @param dev[in]
+ *  Pointer to a device instance as PCIe id.
+ * @param guid[out]
+ *  Pointer to the buffer to hold device guid.
+ *  Guid is uint64_t and corresponding to 17 bytes string.
+ * @param len[in]
+ *  Guid buffer length, 17 bytes at least.
+ *
+ * @return
+ *  -1 if internal failure.
+ *  0 if OFED doesn't support.
+ *  >0 if success.
+ */
+int
+mlx5_get_device_guid(const struct rte_pci_addr *dev, uint8_t *guid, size_t len);
+
 #endif /* RTE_PMD_MLX5_COMMON_OS_H_ */
-- 
2.27.0


^ permalink raw reply	[flat|nested] 11+ messages in thread

* [dpdk-dev] [PATCH v2 2/2] net/mlx5: support socket direct mode bonding
  2021-09-28  8:50 [dpdk-dev] [PATCH v2 0/2] support socket direct mode bonding Rongwei Liu
  2021-09-28  8:50 ` [dpdk-dev] [PATCH v2 1/2] common/mlx5: support pcie device guid query Rongwei Liu
@ 2021-09-28  8:50 ` Rongwei Liu
  2021-09-29 21:58   ` Thomas Monjalon
  1 sibling, 1 reply; 11+ messages in thread
From: Rongwei Liu @ 2021-09-28  8:50 UTC (permalink / raw)
  To: matan, viacheslavo, orika, thomas; +Cc: dev, rasland

In socket direct mode, it's possible to bind any two (maybe four
in future) PCIe devices with IDs like xxxx:xx:xx.x and
yyyy:yy:yy.y. Bonding member interfaces are unnecessary to have
the same PCIe domain/bus/device ID anymore,

Kernel driver uses "system_image_guid" to identify if devices can
be bound together or not. Sysfs "phys_switch_id" is used to get
"system_image_guid" of each network interface.

OFED 5.4+ is required to support "phys_switch_id".
Centos 8.1 needs to enable switch_dev mode first.

Signed-off-by: Rongwei Liu <rongweil@nvidia.com>
Acked-by: Viacheslav Ovsiienko <viacheslavo@nvidia.com>
---
 drivers/net/mlx5/linux/mlx5_os.c | 43 +++++++++++++++++++++++++-------
 1 file changed, 34 insertions(+), 9 deletions(-)

diff --git a/drivers/net/mlx5/linux/mlx5_os.c b/drivers/net/mlx5/linux/mlx5_os.c
index 3746057673..1d57b934fc 100644
--- a/drivers/net/mlx5/linux/mlx5_os.c
+++ b/drivers/net/mlx5/linux/mlx5_os.c
@@ -2008,6 +2008,8 @@ mlx5_device_bond_pci_match(const struct ibv_device *ibv_dev,
 	FILE *bond_file = NULL, *file;
 	int pf = -1;
 	int ret;
+	uint8_t cur_guid[32] = {0};
+	uint8_t guid[32] = {0};
 
 	/*
 	 * Try to get master device name. If something goes
@@ -2022,6 +2024,8 @@ mlx5_device_bond_pci_match(const struct ibv_device *ibv_dev,
 	np = mlx5_nl_portnum(nl_rdma, ibv_dev->name);
 	if (!np)
 		return -1;
+	if (mlx5_get_device_guid(pci_dev, cur_guid, sizeof(cur_guid)) < 0)
+		return -1;
 	/*
 	 * The Master device might not be on the predefined
 	 * port (not on port index 1, it is not garanted),
@@ -2050,6 +2054,7 @@ mlx5_device_bond_pci_match(const struct ibv_device *ibv_dev,
 		char tmp_str[IF_NAMESIZE + 32];
 		struct rte_pci_addr pci_addr;
 		struct mlx5_switch_info	info;
+		int ret;
 
 		/* Process slave interface names in the loop. */
 		snprintf(tmp_str, sizeof(tmp_str),
@@ -2080,15 +2085,6 @@ mlx5_device_bond_pci_match(const struct ibv_device *ibv_dev,
 				tmp_str);
 			break;
 		}
-		/* Match PCI address, allows BDF0+pfx or BDFx+pfx. */
-		if (pci_dev->domain == pci_addr.domain &&
-		    pci_dev->bus == pci_addr.bus &&
-		    pci_dev->devid == pci_addr.devid &&
-		    ((pci_dev->function == 0 &&
-		      pci_dev->function + owner == pci_addr.function) ||
-		     (pci_dev->function == owner &&
-		      pci_addr.function == owner)))
-			pf = info.port_name;
 		/* Get ifindex. */
 		snprintf(tmp_str, sizeof(tmp_str),
 			 "/sys/class/net/%s/ifindex", ifname);
@@ -2105,6 +2101,30 @@ mlx5_device_bond_pci_match(const struct ibv_device *ibv_dev,
 		bond_info->ports[info.port_name].pci_addr = pci_addr;
 		bond_info->ports[info.port_name].ifindex = ifindex;
 		bond_info->n_port++;
+		/*
+		 * Under socket direct mode, bonding will use
+		 * system_image_guid as identification.
+		 * After OFED 5.4, guid is readable (ret >= 0) under sysfs.
+		 * All bonding members should have the same guid even if driver
+		 * is using PCIe BDF.
+		 */
+		ret = mlx5_get_device_guid(&pci_addr, guid, sizeof(guid));
+		if (ret < 0)
+			break;
+		else if (ret > 0) {
+			if (!memcmp(guid, cur_guid, sizeof(guid)) &&
+			    owner == info.port_name &&
+			    (owner != 0 || (owner == 0 &&
+			    !rte_pci_addr_cmp(pci_dev, &pci_addr))))
+				pf = info.port_name;
+		} else if (pci_dev->domain == pci_addr.domain &&
+		    pci_dev->bus == pci_addr.bus &&
+		    pci_dev->devid == pci_addr.devid &&
+		    ((pci_dev->function == 0 &&
+		      pci_dev->function + owner == pci_addr.function) ||
+		     (pci_dev->function == owner &&
+		      pci_addr.function == owner)))
+			pf = info.port_name;
 	}
 	if (pf >= 0) {
 		/* Get bond interface info */
@@ -2117,6 +2137,11 @@ mlx5_device_bond_pci_match(const struct ibv_device *ibv_dev,
 			DRV_LOG(INFO, "PF device %u, bond device %u(%s)",
 				ifindex, bond_info->ifindex, bond_info->ifname);
 	}
+	if (owner == 0 && pf != 0) {
+		DRV_LOG(INFO, "PCIe instance %04x:%02x:%02x.%x isn't bonding owner",
+				pci_dev->domain, pci_dev->bus, pci_dev->devid,
+				pci_dev->function);
+	}
 	return pf;
 }
 
-- 
2.27.0


^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [dpdk-dev] [PATCH v2 2/2] net/mlx5: support socket direct mode bonding
  2021-09-28  8:50 ` [dpdk-dev] [PATCH v2 2/2] net/mlx5: support socket direct mode bonding Rongwei Liu
@ 2021-09-29 21:58   ` Thomas Monjalon
  2021-10-04  6:45     ` Slava Ovsiienko
                       ` (2 more replies)
  0 siblings, 3 replies; 11+ messages in thread
From: Thomas Monjalon @ 2021-09-29 21:58 UTC (permalink / raw)
  To: matan, viacheslavo, Rongwei Liu; +Cc: orika, dev, rasland

28/09/2021 10:50, Rongwei Liu:
> In socket direct mode, it's possible to bind any two (maybe four
> in future) PCIe devices with IDs like xxxx:xx:xx.x and
> yyyy:yy:yy.y. Bonding member interfaces are unnecessary to have
> the same PCIe domain/bus/device ID anymore,
> 
> Kernel driver uses "system_image_guid" to identify if devices can
> be bound together or not. Sysfs "phys_switch_id" is used to get
> "system_image_guid" of each network interface.
> 
> OFED 5.4+ is required to support "phys_switch_id".
> Centos 8.1 needs to enable switch_dev mode first.
> 
> Signed-off-by: Rongwei Liu <rongweil@nvidia.com>
> Acked-by: Viacheslav Ovsiienko <viacheslavo@nvidia.com>
> ---
>  drivers/net/mlx5/linux/mlx5_os.c | 43 +++++++++++++++++++++++++-------
>  1 file changed, 34 insertions(+), 9 deletions(-)

Does it deserve a line in the release notes?




^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [dpdk-dev] [PATCH v2 2/2] net/mlx5: support socket direct mode bonding
  2021-09-29 21:58   ` Thomas Monjalon
@ 2021-10-04  6:45     ` Slava Ovsiienko
  2021-10-08 10:05     ` [dpdk-dev] [PATCH v3 0/2] " Rongwei Liu
  2021-10-14  2:57     ` [dpdk-dev] [PATCH v4 0/2] " Rongwei Liu
  2 siblings, 0 replies; 11+ messages in thread
From: Slava Ovsiienko @ 2021-10-04  6:45 UTC (permalink / raw)
  To: NBU-Contact-Thomas Monjalon, Matan Azrad, Rongwei Liu
  Cc: Ori Kam, dev, Raslan Darawsheh

> -----Original Message-----
> From: Thomas Monjalon <thomas@monjalon.net>
> Sent: Thursday, September 30, 2021 0:58
> To: Matan Azrad <matan@nvidia.com>; Slava Ovsiienko
> <viacheslavo@nvidia.com>; Rongwei Liu <rongweil@nvidia.com>
> Cc: Ori Kam <orika@nvidia.com>; dev@dpdk.org; Raslan Darawsheh
> <rasland@nvidia.com>
> Subject: Re: [dpdk-dev] [PATCH v2 2/2] net/mlx5: support socket direct
> mode bonding
> 
> 28/09/2021 10:50, Rongwei Liu:
> > In socket direct mode, it's possible to bind any two (maybe four in
> > future) PCIe devices with IDs like xxxx:xx:xx.x and yyyy:yy:yy.y.
> > Bonding member interfaces are unnecessary to have the same PCIe
> > domain/bus/device ID anymore,
> >
> > Kernel driver uses "system_image_guid" to identify if devices can be
> > bound together or not. Sysfs "phys_switch_id" is used to get
> > "system_image_guid" of each network interface.
> >
> > OFED 5.4+ is required to support "phys_switch_id".
> > Centos 8.1 needs to enable switch_dev mode first.
> >
> > Signed-off-by: Rongwei Liu <rongweil@nvidia.com>
> > Acked-by: Viacheslav Ovsiienko <viacheslavo@nvidia.com>
> > ---
> >  drivers/net/mlx5/linux/mlx5_os.c | 43
> > +++++++++++++++++++++++++-------
> >  1 file changed, 34 insertions(+), 9 deletions(-)
> 
> Does it deserve a line in the release notes?
Not sure, it is minor update.

^ permalink raw reply	[flat|nested] 11+ messages in thread

* [dpdk-dev] [PATCH v3 0/2] support socket direct mode bonding
  2021-09-29 21:58   ` Thomas Monjalon
  2021-10-04  6:45     ` Slava Ovsiienko
@ 2021-10-08 10:05     ` Rongwei Liu
  2021-10-08 10:05       ` [dpdk-dev] [PATCH v3 1/2] common/mlx5: support pcie device guid query Rongwei Liu
  2021-10-08 10:05       ` [dpdk-dev] [PATCH v3 2/2] net/mlx5: support socket direct mode bonding Rongwei Liu
  2021-10-14  2:57     ` [dpdk-dev] [PATCH v4 0/2] " Rongwei Liu
  2 siblings, 2 replies; 11+ messages in thread
From: Rongwei Liu @ 2021-10-08 10:05 UTC (permalink / raw)
  To: matan, viacheslavo, orika, thomas; +Cc: dev, rasland

In socket direct mode, it's possible to bind any two (maybe four
in the future) PCIe devices with IDs like xxxx:xx:xx.x and
yyyy:yy:yy.y. Bonding member interfaces are unnecessary to have
the same PCIe domain/bus/device ID anymore.

Doesn't need to backport to DPDK 20.11

v2: fix ci warnings.
v3: add description in release_21_11.

Rongwei Liu (2):
  common/mlx5: support pcie device guid query
  net/mlx5: support socket direct mode bonding

 doc/guides/rel_notes/release_21_11.rst     |  4 ++
 drivers/common/mlx5/linux/mlx5_common_os.c | 41 +++++++++++++++++++++
 drivers/common/mlx5/linux/mlx5_common_os.h | 19 ++++++++++
 drivers/net/mlx5/linux/mlx5_os.c           | 43 +++++++++++++++++-----
 4 files changed, 98 insertions(+), 9 deletions(-)

-- 
2.27.0


^ permalink raw reply	[flat|nested] 11+ messages in thread

* [dpdk-dev] [PATCH v3 1/2] common/mlx5: support pcie device guid query
  2021-10-08 10:05     ` [dpdk-dev] [PATCH v3 0/2] " Rongwei Liu
@ 2021-10-08 10:05       ` Rongwei Liu
  2021-10-08 10:05       ` [dpdk-dev] [PATCH v3 2/2] net/mlx5: support socket direct mode bonding Rongwei Liu
  1 sibling, 0 replies; 11+ messages in thread
From: Rongwei Liu @ 2021-10-08 10:05 UTC (permalink / raw)
  To: matan, viacheslavo, orika, thomas; +Cc: dev, rasland

sysfs entry "phys_switch_id" holds each PCIe device'
guid.

The devices which reside in the same physical NIC should
have the same guid.

Signed-off-by: Rongwei Liu <rongweil@nvidia.com>
Acked-by: Viacheslav Ovsiienko <viacheslavo@nvidia.com>
---
 drivers/common/mlx5/linux/mlx5_common_os.c | 41 ++++++++++++++++++++++
 drivers/common/mlx5/linux/mlx5_common_os.h | 19 ++++++++++
 2 files changed, 60 insertions(+)

diff --git a/drivers/common/mlx5/linux/mlx5_common_os.c b/drivers/common/mlx5/linux/mlx5_common_os.c
index 9e0c823c97..8b3ee2baea 100644
--- a/drivers/common/mlx5/linux/mlx5_common_os.c
+++ b/drivers/common/mlx5/linux/mlx5_common_os.c
@@ -2,6 +2,7 @@
  * Copyright 2020 Mellanox Technologies, Ltd
  */
 
+#include <sys/types.h>
 #include <unistd.h>
 #include <string.h>
 #include <stdio.h>
@@ -428,3 +429,43 @@ mlx5_os_get_ibv_device(const struct rte_pci_addr *addr)
 	mlx5_glue->free_device_list(ibv_list);
 	return ibv_match;
 }
+
+int
+mlx5_get_device_guid(const struct rte_pci_addr *dev, uint8_t *guid, size_t len)
+{
+	char tmp[512];
+	char cur_ifname[IF_NAMESIZE + 1];
+	FILE *id_file;
+	DIR *dir;
+	struct dirent *ptr;
+	int ret;
+
+	if (guid == NULL || len < sizeof(u_int64_t) + 1)
+		return -1;
+	memset(guid, 0, len);
+	snprintf(tmp, sizeof(tmp), "/sys/bus/pci/devices/%04x:%02x:%02x.%x/net",
+			dev->domain, dev->bus, dev->devid, dev->function);
+	dir = opendir(tmp);
+	if (dir == NULL)
+		return -1;
+	/* Traverse to identify PF interface */
+	do {
+		ptr = readdir(dir);
+		if (ptr == NULL || ptr->d_type != DT_DIR) {
+			closedir(dir);
+			return -1;
+		}
+	} while (strchr(ptr->d_name, '.') || strchr(ptr->d_name, '_') ||
+		 strchr(ptr->d_name, 'v'));
+	snprintf(cur_ifname, sizeof(cur_ifname), "%s", ptr->d_name);
+	closedir(dir);
+	snprintf(tmp + strlen(tmp), sizeof(tmp) - strlen(tmp),
+			"/%s/phys_switch_id", cur_ifname);
+	/* Older OFED like 5.3 doesn't support read */
+	id_file = fopen(tmp, "r");
+	if (!id_file)
+		return 0;
+	ret = fscanf(id_file, "%16s", guid);
+	fclose(id_file);
+	return ret;
+}
diff --git a/drivers/common/mlx5/linux/mlx5_common_os.h b/drivers/common/mlx5/linux/mlx5_common_os.h
index c3202b6786..3cdea75373 100644
--- a/drivers/common/mlx5/linux/mlx5_common_os.h
+++ b/drivers/common/mlx5/linux/mlx5_common_os.h
@@ -296,4 +296,23 @@ __rte_internal
 struct ibv_device *
 mlx5_os_get_ibv_dev(const struct rte_device *dev);
 
+/**
+ * This is used to query system_image_guid as describing in PRM.
+ *
+ * @param dev[in]
+ *  Pointer to a device instance as PCIe id.
+ * @param guid[out]
+ *  Pointer to the buffer to hold device guid.
+ *  Guid is uint64_t and corresponding to 17 bytes string.
+ * @param len[in]
+ *  Guid buffer length, 17 bytes at least.
+ *
+ * @return
+ *  -1 if internal failure.
+ *  0 if OFED doesn't support.
+ *  >0 if success.
+ */
+int
+mlx5_get_device_guid(const struct rte_pci_addr *dev, uint8_t *guid, size_t len);
+
 #endif /* RTE_PMD_MLX5_COMMON_OS_H_ */
-- 
2.27.0


^ permalink raw reply	[flat|nested] 11+ messages in thread

* [dpdk-dev] [PATCH v3 2/2] net/mlx5: support socket direct mode bonding
  2021-10-08 10:05     ` [dpdk-dev] [PATCH v3 0/2] " Rongwei Liu
  2021-10-08 10:05       ` [dpdk-dev] [PATCH v3 1/2] common/mlx5: support pcie device guid query Rongwei Liu
@ 2021-10-08 10:05       ` Rongwei Liu
  1 sibling, 0 replies; 11+ messages in thread
From: Rongwei Liu @ 2021-10-08 10:05 UTC (permalink / raw)
  To: matan, viacheslavo, orika, thomas; +Cc: dev, rasland

In socket direct mode, it's possible to bind any two (maybe four
in future) PCIe devices with IDs like xxxx:xx:xx.x and
yyyy:yy:yy.y. Bonding member interfaces are unnecessary to have
the same PCIe domain/bus/device ID anymore,

Kernel driver uses "system_image_guid" to identify if devices can
be bound together or not. Sysfs "phys_switch_id" is used to get
"system_image_guid" of each network interface.

OFED 5.4+ is required to support "phys_switch_id".
Centos 8.1 needs to enable switch_dev mode first

Signed-off-by: Rongwei Liu <rongweil@nvidia.com>
Acked-by: Viacheslav Ovsiienko <viacheslavo@nvidia.com>
---
 doc/guides/rel_notes/release_21_11.rst |  4 +++
 drivers/net/mlx5/linux/mlx5_os.c       | 43 ++++++++++++++++++++------
 2 files changed, 38 insertions(+), 9 deletions(-)

diff --git a/doc/guides/rel_notes/release_21_11.rst b/doc/guides/rel_notes/release_21_11.rst
index dfc2cbdeed..54a7bd230f 100644
--- a/doc/guides/rel_notes/release_21_11.rst
+++ b/doc/guides/rel_notes/release_21_11.rst
@@ -106,6 +106,10 @@ New Features
   * Added DES-CBC, AES-XCBC-MAC, AES-CMAC and non-HMAC algo support.
   * Added PDCP short MAC-I support.
 
+* **Updated Mellanox mlx5 driver.**
+
+  * Added socket direct mode bonding support which needs OFED 5.4+.
+
 * **Updated NXP dpaa2_sec crypto PMD.**
 
   * Added PDCP short MAC-I support.
diff --git a/drivers/net/mlx5/linux/mlx5_os.c b/drivers/net/mlx5/linux/mlx5_os.c
index 3746057673..1d57b934fc 100644
--- a/drivers/net/mlx5/linux/mlx5_os.c
+++ b/drivers/net/mlx5/linux/mlx5_os.c
@@ -2008,6 +2008,8 @@ mlx5_device_bond_pci_match(const struct ibv_device *ibv_dev,
 	FILE *bond_file = NULL, *file;
 	int pf = -1;
 	int ret;
+	uint8_t cur_guid[32] = {0};
+	uint8_t guid[32] = {0};
 
 	/*
 	 * Try to get master device name. If something goes
@@ -2022,6 +2024,8 @@ mlx5_device_bond_pci_match(const struct ibv_device *ibv_dev,
 	np = mlx5_nl_portnum(nl_rdma, ibv_dev->name);
 	if (!np)
 		return -1;
+	if (mlx5_get_device_guid(pci_dev, cur_guid, sizeof(cur_guid)) < 0)
+		return -1;
 	/*
 	 * The Master device might not be on the predefined
 	 * port (not on port index 1, it is not garanted),
@@ -2050,6 +2054,7 @@ mlx5_device_bond_pci_match(const struct ibv_device *ibv_dev,
 		char tmp_str[IF_NAMESIZE + 32];
 		struct rte_pci_addr pci_addr;
 		struct mlx5_switch_info	info;
+		int ret;
 
 		/* Process slave interface names in the loop. */
 		snprintf(tmp_str, sizeof(tmp_str),
@@ -2080,15 +2085,6 @@ mlx5_device_bond_pci_match(const struct ibv_device *ibv_dev,
 				tmp_str);
 			break;
 		}
-		/* Match PCI address, allows BDF0+pfx or BDFx+pfx. */
-		if (pci_dev->domain == pci_addr.domain &&
-		    pci_dev->bus == pci_addr.bus &&
-		    pci_dev->devid == pci_addr.devid &&
-		    ((pci_dev->function == 0 &&
-		      pci_dev->function + owner == pci_addr.function) ||
-		     (pci_dev->function == owner &&
-		      pci_addr.function == owner)))
-			pf = info.port_name;
 		/* Get ifindex. */
 		snprintf(tmp_str, sizeof(tmp_str),
 			 "/sys/class/net/%s/ifindex", ifname);
@@ -2105,6 +2101,30 @@ mlx5_device_bond_pci_match(const struct ibv_device *ibv_dev,
 		bond_info->ports[info.port_name].pci_addr = pci_addr;
 		bond_info->ports[info.port_name].ifindex = ifindex;
 		bond_info->n_port++;
+		/*
+		 * Under socket direct mode, bonding will use
+		 * system_image_guid as identification.
+		 * After OFED 5.4, guid is readable (ret >= 0) under sysfs.
+		 * All bonding members should have the same guid even if driver
+		 * is using PCIe BDF.
+		 */
+		ret = mlx5_get_device_guid(&pci_addr, guid, sizeof(guid));
+		if (ret < 0)
+			break;
+		else if (ret > 0) {
+			if (!memcmp(guid, cur_guid, sizeof(guid)) &&
+			    owner == info.port_name &&
+			    (owner != 0 || (owner == 0 &&
+			    !rte_pci_addr_cmp(pci_dev, &pci_addr))))
+				pf = info.port_name;
+		} else if (pci_dev->domain == pci_addr.domain &&
+		    pci_dev->bus == pci_addr.bus &&
+		    pci_dev->devid == pci_addr.devid &&
+		    ((pci_dev->function == 0 &&
+		      pci_dev->function + owner == pci_addr.function) ||
+		     (pci_dev->function == owner &&
+		      pci_addr.function == owner)))
+			pf = info.port_name;
 	}
 	if (pf >= 0) {
 		/* Get bond interface info */
@@ -2117,6 +2137,11 @@ mlx5_device_bond_pci_match(const struct ibv_device *ibv_dev,
 			DRV_LOG(INFO, "PF device %u, bond device %u(%s)",
 				ifindex, bond_info->ifindex, bond_info->ifname);
 	}
+	if (owner == 0 && pf != 0) {
+		DRV_LOG(INFO, "PCIe instance %04x:%02x:%02x.%x isn't bonding owner",
+				pci_dev->domain, pci_dev->bus, pci_dev->devid,
+				pci_dev->function);
+	}
 	return pf;
 }
 
-- 
2.27.0


^ permalink raw reply	[flat|nested] 11+ messages in thread

* [dpdk-dev] [PATCH v4 0/2] support socket direct mode bonding
  2021-09-29 21:58   ` Thomas Monjalon
  2021-10-04  6:45     ` Slava Ovsiienko
  2021-10-08 10:05     ` [dpdk-dev] [PATCH v3 0/2] " Rongwei Liu
@ 2021-10-14  2:57     ` Rongwei Liu
  2021-10-14  2:58       ` [dpdk-dev] [PATCH v4 1/2] common/mlx5: support pcie device guid query Rongwei Liu
  2021-10-14  2:58       ` [dpdk-dev] [PATCH v4 2/2] net/mlx5: support socket direct mode bonding Rongwei Liu
  2 siblings, 2 replies; 11+ messages in thread
From: Rongwei Liu @ 2021-10-14  2:57 UTC (permalink / raw)
  To: matan, viacheslavo, orika, thomas; +Cc: dev, rasland

In socket direct mode, it's possible to bind any two (maybe four
in the future) PCIe devices with IDs like xxxx:xx:xx.x and
yyyy:yy:yy.y. Bonding member interfaces are unnecessary to have
the same PCIe domain/bus/device ID anymore.

Doesn't need to backport to DPDK 20.11

v2: fix ci warnings.
v3: add description in release_21_11.rst.
v4: add description in mlx5.rst.

Rongwei Liu (2):
  common/mlx5: support pcie device guid query
  net/mlx5: support socket direct mode bonding

 doc/guides/nics/mlx5.rst                   |  4 ++
 doc/guides/rel_notes/release_21_11.rst     |  4 ++
 drivers/common/mlx5/linux/mlx5_common_os.c | 41 +++++++++++++++++++++
 drivers/common/mlx5/linux/mlx5_common_os.h | 19 ++++++++++
 drivers/net/mlx5/linux/mlx5_os.c           | 43 +++++++++++++++++-----
 5 files changed, 102 insertions(+), 9 deletions(-)

-- 
2.27.0


^ permalink raw reply	[flat|nested] 11+ messages in thread

* [dpdk-dev] [PATCH v4 1/2] common/mlx5: support pcie device guid query
  2021-10-14  2:57     ` [dpdk-dev] [PATCH v4 0/2] " Rongwei Liu
@ 2021-10-14  2:58       ` Rongwei Liu
  2021-10-14  2:58       ` [dpdk-dev] [PATCH v4 2/2] net/mlx5: support socket direct mode bonding Rongwei Liu
  1 sibling, 0 replies; 11+ messages in thread
From: Rongwei Liu @ 2021-10-14  2:58 UTC (permalink / raw)
  To: matan, viacheslavo, orika, thomas; +Cc: dev, rasland

sysfs entry "phys_switch_id" holds each PCIe device'
guid.

The devices which reside in the same physical NIC should
have the same guid.

Signed-off-by: Rongwei Liu <rongweil@nvidia.com>
Acked-by: Viacheslav Ovsiienko <viacheslavo@nvidia.com>
---
 drivers/common/mlx5/linux/mlx5_common_os.c | 41 ++++++++++++++++++++++
 drivers/common/mlx5/linux/mlx5_common_os.h | 19 ++++++++++
 2 files changed, 60 insertions(+)

diff --git a/drivers/common/mlx5/linux/mlx5_common_os.c b/drivers/common/mlx5/linux/mlx5_common_os.c
index 9e0c823c97..8b3ee2baea 100644
--- a/drivers/common/mlx5/linux/mlx5_common_os.c
+++ b/drivers/common/mlx5/linux/mlx5_common_os.c
@@ -2,6 +2,7 @@
  * Copyright 2020 Mellanox Technologies, Ltd
  */
 
+#include <sys/types.h>
 #include <unistd.h>
 #include <string.h>
 #include <stdio.h>
@@ -428,3 +429,43 @@ mlx5_os_get_ibv_device(const struct rte_pci_addr *addr)
 	mlx5_glue->free_device_list(ibv_list);
 	return ibv_match;
 }
+
+int
+mlx5_get_device_guid(const struct rte_pci_addr *dev, uint8_t *guid, size_t len)
+{
+	char tmp[512];
+	char cur_ifname[IF_NAMESIZE + 1];
+	FILE *id_file;
+	DIR *dir;
+	struct dirent *ptr;
+	int ret;
+
+	if (guid == NULL || len < sizeof(u_int64_t) + 1)
+		return -1;
+	memset(guid, 0, len);
+	snprintf(tmp, sizeof(tmp), "/sys/bus/pci/devices/%04x:%02x:%02x.%x/net",
+			dev->domain, dev->bus, dev->devid, dev->function);
+	dir = opendir(tmp);
+	if (dir == NULL)
+		return -1;
+	/* Traverse to identify PF interface */
+	do {
+		ptr = readdir(dir);
+		if (ptr == NULL || ptr->d_type != DT_DIR) {
+			closedir(dir);
+			return -1;
+		}
+	} while (strchr(ptr->d_name, '.') || strchr(ptr->d_name, '_') ||
+		 strchr(ptr->d_name, 'v'));
+	snprintf(cur_ifname, sizeof(cur_ifname), "%s", ptr->d_name);
+	closedir(dir);
+	snprintf(tmp + strlen(tmp), sizeof(tmp) - strlen(tmp),
+			"/%s/phys_switch_id", cur_ifname);
+	/* Older OFED like 5.3 doesn't support read */
+	id_file = fopen(tmp, "r");
+	if (!id_file)
+		return 0;
+	ret = fscanf(id_file, "%16s", guid);
+	fclose(id_file);
+	return ret;
+}
diff --git a/drivers/common/mlx5/linux/mlx5_common_os.h b/drivers/common/mlx5/linux/mlx5_common_os.h
index c3202b6786..3cdea75373 100644
--- a/drivers/common/mlx5/linux/mlx5_common_os.h
+++ b/drivers/common/mlx5/linux/mlx5_common_os.h
@@ -296,4 +296,23 @@ __rte_internal
 struct ibv_device *
 mlx5_os_get_ibv_dev(const struct rte_device *dev);
 
+/**
+ * This is used to query system_image_guid as describing in PRM.
+ *
+ * @param dev[in]
+ *  Pointer to a device instance as PCIe id.
+ * @param guid[out]
+ *  Pointer to the buffer to hold device guid.
+ *  Guid is uint64_t and corresponding to 17 bytes string.
+ * @param len[in]
+ *  Guid buffer length, 17 bytes at least.
+ *
+ * @return
+ *  -1 if internal failure.
+ *  0 if OFED doesn't support.
+ *  >0 if success.
+ */
+int
+mlx5_get_device_guid(const struct rte_pci_addr *dev, uint8_t *guid, size_t len);
+
 #endif /* RTE_PMD_MLX5_COMMON_OS_H_ */
-- 
2.27.0


^ permalink raw reply	[flat|nested] 11+ messages in thread

* [dpdk-dev] [PATCH v4 2/2] net/mlx5: support socket direct mode bonding
  2021-10-14  2:57     ` [dpdk-dev] [PATCH v4 0/2] " Rongwei Liu
  2021-10-14  2:58       ` [dpdk-dev] [PATCH v4 1/2] common/mlx5: support pcie device guid query Rongwei Liu
@ 2021-10-14  2:58       ` Rongwei Liu
  1 sibling, 0 replies; 11+ messages in thread
From: Rongwei Liu @ 2021-10-14  2:58 UTC (permalink / raw)
  To: matan, viacheslavo, orika, thomas; +Cc: dev, rasland

In socket direct mode, it's possible to bind any two (maybe four
in future) PCIe devices with IDs like xxxx:xx:xx.x and
yyyy:yy:yy.y. Bonding member interfaces are unnecessary to have
the same PCIe domain/bus/device ID anymore,

Kernel driver uses "system_image_guid" to identify if devices can
be bound together or not. Sysfs "phys_switch_id" is used to get
"system_image_guid" of each network interface.

OFED 5.4+ is required to support "phys_switch_id".

Signed-off-by: Rongwei Liu <rongweil@nvidia.com>
Acked-by: Viacheslav Ovsiienko <viacheslavo@nvidia.com>
---
 doc/guides/nics/mlx5.rst               |  4 +++
 doc/guides/rel_notes/release_21_11.rst |  4 +++
 drivers/net/mlx5/linux/mlx5_os.c       | 43 ++++++++++++++++++++------
 3 files changed, 42 insertions(+), 9 deletions(-)

diff --git a/doc/guides/nics/mlx5.rst b/doc/guides/nics/mlx5.rst
index bae73f42d8..b58236e00a 100644
--- a/doc/guides/nics/mlx5.rst
+++ b/doc/guides/nics/mlx5.rst
@@ -464,6 +464,10 @@ Limitations
   - In order to achieve best insertion rate, application should manage the flows per lcore.
   - Better to disable memory reclaim by setting ``reclaim_mem_mode`` to 0 to accelerate the flow object allocation and release with cache.
 
+- Bonding under socket direct mode
+
+  - Needs OFED 5.4+.
+
 Statistics
 ----------
 
diff --git a/doc/guides/rel_notes/release_21_11.rst b/doc/guides/rel_notes/release_21_11.rst
index dfc2cbdeed..2a6cc765c2 100644
--- a/doc/guides/rel_notes/release_21_11.rst
+++ b/doc/guides/rel_notes/release_21_11.rst
@@ -106,6 +106,10 @@ New Features
   * Added DES-CBC, AES-XCBC-MAC, AES-CMAC and non-HMAC algo support.
   * Added PDCP short MAC-I support.
 
+* **Updated Mellanox mlx5 driver.**
+
+  * Added socket direct mode bonding support.
+
 * **Updated NXP dpaa2_sec crypto PMD.**
 
   * Added PDCP short MAC-I support.
diff --git a/drivers/net/mlx5/linux/mlx5_os.c b/drivers/net/mlx5/linux/mlx5_os.c
index 3746057673..1d57b934fc 100644
--- a/drivers/net/mlx5/linux/mlx5_os.c
+++ b/drivers/net/mlx5/linux/mlx5_os.c
@@ -2008,6 +2008,8 @@ mlx5_device_bond_pci_match(const struct ibv_device *ibv_dev,
 	FILE *bond_file = NULL, *file;
 	int pf = -1;
 	int ret;
+	uint8_t cur_guid[32] = {0};
+	uint8_t guid[32] = {0};
 
 	/*
 	 * Try to get master device name. If something goes
@@ -2022,6 +2024,8 @@ mlx5_device_bond_pci_match(const struct ibv_device *ibv_dev,
 	np = mlx5_nl_portnum(nl_rdma, ibv_dev->name);
 	if (!np)
 		return -1;
+	if (mlx5_get_device_guid(pci_dev, cur_guid, sizeof(cur_guid)) < 0)
+		return -1;
 	/*
 	 * The Master device might not be on the predefined
 	 * port (not on port index 1, it is not garanted),
@@ -2050,6 +2054,7 @@ mlx5_device_bond_pci_match(const struct ibv_device *ibv_dev,
 		char tmp_str[IF_NAMESIZE + 32];
 		struct rte_pci_addr pci_addr;
 		struct mlx5_switch_info	info;
+		int ret;
 
 		/* Process slave interface names in the loop. */
 		snprintf(tmp_str, sizeof(tmp_str),
@@ -2080,15 +2085,6 @@ mlx5_device_bond_pci_match(const struct ibv_device *ibv_dev,
 				tmp_str);
 			break;
 		}
-		/* Match PCI address, allows BDF0+pfx or BDFx+pfx. */
-		if (pci_dev->domain == pci_addr.domain &&
-		    pci_dev->bus == pci_addr.bus &&
-		    pci_dev->devid == pci_addr.devid &&
-		    ((pci_dev->function == 0 &&
-		      pci_dev->function + owner == pci_addr.function) ||
-		     (pci_dev->function == owner &&
-		      pci_addr.function == owner)))
-			pf = info.port_name;
 		/* Get ifindex. */
 		snprintf(tmp_str, sizeof(tmp_str),
 			 "/sys/class/net/%s/ifindex", ifname);
@@ -2105,6 +2101,30 @@ mlx5_device_bond_pci_match(const struct ibv_device *ibv_dev,
 		bond_info->ports[info.port_name].pci_addr = pci_addr;
 		bond_info->ports[info.port_name].ifindex = ifindex;
 		bond_info->n_port++;
+		/*
+		 * Under socket direct mode, bonding will use
+		 * system_image_guid as identification.
+		 * After OFED 5.4, guid is readable (ret >= 0) under sysfs.
+		 * All bonding members should have the same guid even if driver
+		 * is using PCIe BDF.
+		 */
+		ret = mlx5_get_device_guid(&pci_addr, guid, sizeof(guid));
+		if (ret < 0)
+			break;
+		else if (ret > 0) {
+			if (!memcmp(guid, cur_guid, sizeof(guid)) &&
+			    owner == info.port_name &&
+			    (owner != 0 || (owner == 0 &&
+			    !rte_pci_addr_cmp(pci_dev, &pci_addr))))
+				pf = info.port_name;
+		} else if (pci_dev->domain == pci_addr.domain &&
+		    pci_dev->bus == pci_addr.bus &&
+		    pci_dev->devid == pci_addr.devid &&
+		    ((pci_dev->function == 0 &&
+		      pci_dev->function + owner == pci_addr.function) ||
+		     (pci_dev->function == owner &&
+		      pci_addr.function == owner)))
+			pf = info.port_name;
 	}
 	if (pf >= 0) {
 		/* Get bond interface info */
@@ -2117,6 +2137,11 @@ mlx5_device_bond_pci_match(const struct ibv_device *ibv_dev,
 			DRV_LOG(INFO, "PF device %u, bond device %u(%s)",
 				ifindex, bond_info->ifindex, bond_info->ifname);
 	}
+	if (owner == 0 && pf != 0) {
+		DRV_LOG(INFO, "PCIe instance %04x:%02x:%02x.%x isn't bonding owner",
+				pci_dev->domain, pci_dev->bus, pci_dev->devid,
+				pci_dev->function);
+	}
 	return pf;
 }
 
-- 
2.27.0


^ permalink raw reply	[flat|nested] 11+ messages in thread

end of thread, other threads:[~2021-10-14  2:58 UTC | newest]

Thread overview: 11+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-09-28  8:50 [dpdk-dev] [PATCH v2 0/2] support socket direct mode bonding Rongwei Liu
2021-09-28  8:50 ` [dpdk-dev] [PATCH v2 1/2] common/mlx5: support pcie device guid query Rongwei Liu
2021-09-28  8:50 ` [dpdk-dev] [PATCH v2 2/2] net/mlx5: support socket direct mode bonding Rongwei Liu
2021-09-29 21:58   ` Thomas Monjalon
2021-10-04  6:45     ` Slava Ovsiienko
2021-10-08 10:05     ` [dpdk-dev] [PATCH v3 0/2] " Rongwei Liu
2021-10-08 10:05       ` [dpdk-dev] [PATCH v3 1/2] common/mlx5: support pcie device guid query Rongwei Liu
2021-10-08 10:05       ` [dpdk-dev] [PATCH v3 2/2] net/mlx5: support socket direct mode bonding Rongwei Liu
2021-10-14  2:57     ` [dpdk-dev] [PATCH v4 0/2] " Rongwei Liu
2021-10-14  2:58       ` [dpdk-dev] [PATCH v4 1/2] common/mlx5: support pcie device guid query Rongwei Liu
2021-10-14  2:58       ` [dpdk-dev] [PATCH v4 2/2] net/mlx5: support socket direct mode bonding Rongwei Liu

DPDK patches and discussions

This inbox may be cloned and mirrored by anyone:

	git clone --mirror http://inbox.dpdk.org/dev/0 dev/git/0.git

	# If you have public-inbox 1.1+ installed, you may
	# initialize and index your mirror using the following commands:
	public-inbox-init -V2 dev dev/ http://inbox.dpdk.org/dev \
		dev@dpdk.org
	public-inbox-index dev

Example config snippet for mirrors.
Newsgroup available over NNTP:
	nntp://inbox.dpdk.org/inbox.dpdk.dev


AGPL code for this site: git clone https://public-inbox.org/public-inbox.git