From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
Received: from mails.dpdk.org (xvm-189-124.dc0.ghst.net [217.70.189.124])
    by inbox.dpdk.org (Postfix) with ESMTP id 128F4A09FF;
    Wed, 6 Jan 2021 17:40:46 +0100 (CET)
Received: from [217.70.189.124] (localhost [127.0.0.1])
    by mails.dpdk.org (Postfix) with ESMTP id 9DCC0140E34;
    Wed, 6 Jan 2021 17:40:18 +0100 (CET)
Received: from mellanox.co.il (mail-il-dmz.mellanox.com [193.47.165.129])
    by mails.dpdk.org (Postfix) with ESMTP id B19CA140E0B
    for ; Wed, 6 Jan 2021 17:40:11 +0100 (CET)
Received: from Internal Mail-Server by MTLPINE1 (envelope-from
    xuemingl@nvidia.com) with SMTP; 6 Jan 2021 18:40:10 +0200
Received: from nvidia.com (pegasus05.mtr.labs.mlnx [10.210.16.100])
    by labmailer.mlnx (8.13.8/8.13.8) with ESMTP id 106Ge96f017829;
    Wed, 6 Jan 2021 18:40:10 +0200
From: Xueming Li
To: Viacheslav Ovsiienko , Shahaf Shuler , Matan Azrad
Cc: dev@dpdk.org, xuemingl@nvidia.com, Asaf Penso
Date: Wed, 6 Jan 2021 16:39:59 +0000
Message-Id: <1609951199-24794-5-git-send-email-xuemingl@nvidia.com>
X-Mailer: git-send-email 1.8.3.1
In-Reply-To: <1609951199-24794-1-git-send-email-xuemingl@nvidia.com>
References: <1609951199-24794-1-git-send-email-xuemingl@nvidia.com>
In-Reply-To: <1608303356-13089-2-git-send-email-xuemingl@nvidia.com>
References: <1608303356-13089-2-git-send-email-xuemingl@nvidia.com>
Subject: [dpdk-dev] [PATCH v2 4/4] net/mlx5: improve bonding representor probe
X-BeenThere: dev@dpdk.org
X-Mailman-Version: 2.1.29
Precedence: list
List-Id: DPDK patches and discussions
List-Unsubscribe: ,
List-Archive:
List-Post:
List-Help:
List-Subscribe: ,
Errors-To: dev-bounces@dpdk.org
Sender: "dev"

To probe a representor on the 2nd PF of a bonding device, the PF1 BDF
had to be specified in the devargs:
    ,representor=0
When closing a bonding device, all representors have to be closed
together, which implies that all representors have to use the master PF
of the bonding device.
So after probing a representor port on the 2nd PF, locating the newly
probed device by its device argument failed: the filter used the 2nd PF
as the PCI address and could not find the new device.

The conflict comes from the current representor devargs:
 - The PCI BDF is used to specify the representor owner PF.
 - The PCI BDF is used to locate the probed representor device.
 - The PMD uses the master PCI BDF as the PCI device.

To resolve this conflict, a new representor syntax is introduced here:
    ,representor=pfXvfY
All representors must use the master PF as the owner PCI device; the PMD
internally locates the owner PCI address by checking the "pfX" part of
the representor. To EAL, all representors are registered to the master
PCI device and the 2nd PF is hidden, so every search should be
consistent.

This patch also adds the PF index to the bonding mode representor port
name:
    __representor_pfvf

Signed-off-by: Xueming Li
---
 doc/guides/nics/mlx5.rst         |   8 +-
 drivers/net/mlx5/linux/mlx5_os.c | 160 +++++++++++++++++--------------
 drivers/net/mlx5/mlx5.c          |  22 +++++
 drivers/net/mlx5/mlx5.h          |   3 +-
 drivers/net/mlx5/mlx5_defs.h     |   4 -
 drivers/net/mlx5/mlx5_ethdev.c   |  27 ------
 drivers/net/mlx5/mlx5_mac.c      |   8 +-
 7 files changed, 122 insertions(+), 110 deletions(-)

diff --git a/doc/guides/nics/mlx5.rst b/doc/guides/nics/mlx5.rst
index fc7e93842f..8a142e1d59 100644
--- a/doc/guides/nics/mlx5.rst
+++ b/doc/guides/nics/mlx5.rst
@@ -881,11 +881,15 @@ Driver options
 
   For instance, to probe VF port representors 0 through 2::
 
-    representor=vf[0-2]
+    ,representor=vf[0-2]
 
   To probe SF port representors 0 through 2::
 
-    representor=sf[0-2]
+    ,representor=sf[0-2]
+
+  To probe VF port representors 0 through 2 on both PFs of bonding device::
+
+    ,representor=pf[0,1]vf[0-2]
 
 - ``max_dump_files_num`` parameter [int]
 
diff --git a/drivers/net/mlx5/linux/mlx5_os.c b/drivers/net/mlx5/linux/mlx5_os.c
index 1209bfed5b..6866381298 100644
--- a/drivers/net/mlx5/linux/mlx5_os.c
+++ b/drivers/net/mlx5/linux/mlx5_os.c
@@ -676,6 +676,7 @@ mlx5_dev_spawn(struct rte_device *dpdk_dev,
 	struct mlx5dv_context dv_attr = { .comp_mask = 0 };
 	struct rte_eth_dev *eth_dev = NULL;
 	struct mlx5_priv *priv = NULL;
+	uint16_t repr_id = -1;
 	int err = 0;
 	unsigned int hw_padding = 0;
 	unsigned int mps;
@@ -693,11 +694,19 @@ mlx5_dev_spawn(struct rte_device *dpdk_dev,
 	char name[RTE_ETH_NAME_MAX_LEN];
 	int own_domain_id = 0;
 	uint16_t port_id;
-	unsigned int i;
+	unsigned int c = 0, p = 0, f = 0;
 #ifdef HAVE_MLX5DV_DR_DEVX_PORT
 	struct mlx5dv_devx_port devx_port = { .comp_mask = 0 };
 #endif
 
+	if (switch_info->representor)
+		repr_id = rte_eth_representor_id_encode(
+			switch_info->ctrl_num,
+			spawn->pf_bond >= 0 ? switch_info->pf_num : 0,
+			switch_info->name_type ==
+				MLX5_PHYS_PORT_NAME_TYPE_PFSF ?
+			RTE_ETH_REPRESENTOR_SF : RTE_ETH_REPRESENTOR_VF,
+			switch_info->port_name);
 	/* Determine if this port representor is supposed to be spawned. */
 	if (switch_info->representor && dpdk_dev->devargs) {
 		switch (eth_da->type) {
@@ -729,51 +738,27 @@ mlx5_dev_spawn(struct rte_device *dpdk_dev,
 				dpdk_dev->devargs->args);
 			return NULL;
 		}
-		/* Check controller ID: */
-		for (i = 0; i < eth_da->nb_mh_controllers; ++i)
-			if (eth_da->mh_controllers[i] ==
-			    (uint16_t)switch_info->ctrl_num)
-				break;
-		if (eth_da->nb_mh_controllers &&
-		    i == eth_da->nb_mh_controllers) {
-			rte_errno = EBUSY;
-			return NULL;
-		}
-		/* Check SF/VF ID: */
-		for (i = 0; i < eth_da->nb_representor_ports; ++i)
-			if (eth_da->representor_ports[i] ==
-			    (uint16_t)switch_info->port_name)
-				break;
-		if (eth_da->type != RTE_ETH_REPRESENTOR_PF &&
-		    i == eth_da->nb_representor_ports) {
-			rte_errno = EBUSY;
-			return NULL;
-		}
-		/* Check PF ID. Check after repr port to avoid warning flood. */
-		if (spawn->pf_bond >= 0) {
-			for (i = 0; i < eth_da->nb_ports; ++i)
-				if (eth_da->ports[i] ==
-				    (uint16_t)switch_info->pf_num)
-					break;
-			if (eth_da->nb_ports && i == eth_da->nb_ports) {
-				/* For backward compatibility, bonding
-				 * representor syntax supported with limitation,
-				 * device iterator won't find it:
-				 *    ,representor=#
-				 */
-				if (switch_info->pf_num > 0 &&
-				    eth_da->ports[0] == 0) {
-					DRV_LOG(WARNING, "Representor on Bonding PF should use pf#vf# format: %s",
-						dpdk_dev->devargs->args);
-				} else {
-					rte_errno = EBUSY;
-					return NULL;
+		/* Check representor ID: */
+		for (c = 0; c < eth_da->nb_mh_controllers; ++c) {
+			for (p = 0; p < eth_da->nb_ports; ++p) {
+				for (f = 0; f < eth_da->nb_representor_ports;
+				     ++f) {
+					uint16_t repr;
+
+					repr = rte_eth_representor_id_encode(
+						eth_da->mh_controllers[c],
+						eth_da->ports[p],
+						eth_da->type,
+						eth_da->representor_ports[f]);
+
+					if (repr_id == repr)
+						break;
 				}
 			}
-		} else if (eth_da->nb_ports > 1 || eth_da->ports[0]) {
-			rte_errno = EINVAL;
-			DRV_LOG(ERR, "PF id not supported by non-bond device: %s",
-				dpdk_dev->devargs->args);
+		}
+		if (c == eth_da->nb_mh_controllers && p == eth_da->nb_ports &&
+		    f == eth_da->nb_representor_ports) {
+			rte_errno = EBUSY;
 			return NULL;
 		}
 	}
@@ -790,17 +775,23 @@ mlx5_dev_spawn(struct rte_device *dpdk_dev,
 				 switch_info->port_name);
 	} else {
 		/* Bonding device. */
-		if (!switch_info->representor)
+		if (!switch_info->representor) {
 			snprintf(name, sizeof(name), "%s_%s",
 				 dpdk_dev->name,
 				 mlx5_os_get_dev_device_name(spawn->phys_dev));
-		else
-			snprintf(name, sizeof(name), "%s_%s_representor_%s%u",
-				 dpdk_dev->name,
-				 mlx5_os_get_dev_device_name(spawn->phys_dev),
-				 switch_info->name_type ==
-				 MLX5_PHYS_PORT_NAME_TYPE_PFSF ? "sf" : "vf",
-				 switch_info->port_name);
+		} else {
+			err = snprintf(name, sizeof(name), "%s_%s_representor_c%dpf%d%s%u",
+				dpdk_dev->name,
+				mlx5_os_get_dev_device_name(spawn->phys_dev),
+				switch_info->ctrl_num,
+				switch_info->pf_num,
+				switch_info->name_type ==
+					MLX5_PHYS_PORT_NAME_TYPE_PFSF ? "sf" : "vf",
+				switch_info->port_name);
+			if (err >= (int)sizeof(name))
+				DRV_LOG(WARNING, "representor name overflow %s",
+					name);
+		}
 	}
 	/* check if the device is already spawned */
 	if (rte_eth_dev_get_port_by_name(name, &port_id) == 0) {
@@ -1079,11 +1070,7 @@ mlx5_dev_spawn(struct rte_device *dpdk_dev,
 	priv->vport_id = switch_info->representor ?
 			 switch_info->port_name + 1 : -1;
 #endif
-	/* representor_id field keeps the unmodified VF index. */
-	priv->representor_id = switch_info->representor ?
-		rte_eth_representor_id_encode(0, 0, RTE_ETH_REPRESENTOR_VF,
-					      switch_info->port_name) :
-		-1;
+	priv->representor_id = repr_id;
 	/*
 	 * Look for sibling devices in order to reuse their switch domain
 	 * if any, otherwise allocate one.
@@ -1704,9 +1691,11 @@ mlx5_dev_spawn_data_cmp(const void *a, const void *b)
  * @param[in] ibv_dev
  *   Pointer to Infiniband device structure.
  * @param[in] pci_dev
- *   Pointer to PCI device structure to match PCI address.
+ *   Pointer to master PCI Address structure to match PCI address.
  * @param[in] nl_rdma
  *   Netlink RDMA group socket handle.
+ * @param[in] owner
+ *   Representor owner PF index.
  *
  * @return
  *   negative value if no bonding device found, otherwise
@@ -1714,8 +1703,8 @@ mlx5_dev_spawn_data_cmp(const void *a, const void *b)
  */
 static int
 mlx5_device_bond_pci_match(const struct ibv_device *ibv_dev,
-			   const struct rte_pci_device *pci_dev,
-			   int nl_rdma)
+			   const struct rte_pci_addr *pci_dev,
+			   int nl_rdma, uint16_t owner)
 {
 	char ifname[IF_NAMESIZE + 1];
 	unsigned int ifindex;
@@ -1772,10 +1761,10 @@ mlx5_device_bond_pci_match(const struct ibv_device *ibv_dev,
 				" for netdev \"%s\"", ifname);
 			continue;
 		}
-		if (pci_dev->addr.domain != pci_addr.domain ||
-		    pci_dev->addr.bus != pci_addr.bus ||
-		    pci_dev->addr.devid != pci_addr.devid ||
-		    pci_dev->addr.function != pci_addr.function)
+		if (pci_dev->domain != pci_addr.domain ||
+		    pci_dev->bus != pci_addr.bus ||
+		    pci_dev->devid != pci_addr.devid ||
+		    pci_dev->function + owner != pci_addr.function)
 			continue;
 		/* Slave interface PCI address match found. */
 		fclose(file);
@@ -1843,7 +1832,8 @@ mlx5_os_pci_probe(struct rte_pci_driver *pci_drv __rte_unused,
 	struct mlx5_dev_config dev_config;
 	unsigned int dev_config_vf;
 	struct rte_eth_devargs eth_da = { .type = RTE_ETH_REPRESENTOR_NONE };
-	int ret;
+	struct rte_pci_addr probe_addr = pci_dev->addr;
+	int ret = -1;
 
 	if (rte_eal_process_type() == RTE_PROC_PRIMARY)
 		mlx5_pmd_socket_init();
@@ -1895,7 +1885,8 @@ mlx5_os_pci_probe(struct rte_pci_driver *pci_drv __rte_unused,
 
 		DRV_LOG(DEBUG, "checking device \"%s\"", ibv_list[ret]->name);
 		bd = mlx5_device_bond_pci_match
-				(ibv_list[ret], pci_dev, nl_rdma);
+				(ibv_list[ret], &probe_addr, nl_rdma,
+				 eth_da.ports[0]);
 		if (bd >= 0) {
 			/*
 			 * Bonding device detected. Only one match is allowed,
@@ -1912,6 +1903,9 @@ mlx5_os_pci_probe(struct rte_pci_driver *pci_drv __rte_unused,
 				ret = -rte_errno;
 				goto exit;
 			}
+			/* Amend master pci address if owner PF specified. */
+			if (eth_da.nb_representor_ports)
+				probe_addr.function += eth_da.ports[0];
 			DRV_LOG(INFO, "PCI information matches for"
 				      " slave %d bonding device \"%s\"",
 				      bd, ibv_list[ret]->name);
@@ -1921,10 +1915,10 @@ mlx5_os_pci_probe(struct rte_pci_driver *pci_drv __rte_unused,
 			if (mlx5_dev_to_pci_addr
 				(ibv_list[ret]->ibdev_path, &pci_addr))
 				continue;
-			if (pci_dev->addr.domain != pci_addr.domain ||
-			    pci_dev->addr.bus != pci_addr.bus ||
-			    pci_dev->addr.devid != pci_addr.devid ||
-			    pci_dev->addr.function != pci_addr.function)
+			if (probe_addr.domain != pci_addr.domain ||
+			    probe_addr.bus != pci_addr.bus ||
+			    probe_addr.devid != pci_addr.devid ||
+			    probe_addr.function != pci_addr.function)
 				continue;
 			DRV_LOG(INFO, "PCI information matches for device \"%s\"",
 				ibv_list[ret]->name);
@@ -1936,8 +1930,8 @@ mlx5_os_pci_probe(struct rte_pci_driver *pci_drv __rte_unused,
 		DRV_LOG(WARNING,
 			"no Verbs device matches PCI device " PCI_PRI_FMT ","
 			" are kernel drivers loaded?",
-			pci_dev->addr.domain, pci_dev->addr.bus,
-			pci_dev->addr.devid, pci_dev->addr.function);
+			probe_addr.domain, probe_addr.bus,
+			probe_addr.devid, probe_addr.function);
 		rte_errno = ENOENT;
 		ret = -rte_errno;
 		goto exit;
@@ -2202,6 +2196,24 @@ mlx5_os_pci_probe(struct rte_pci_driver *pci_drv __rte_unused,
 		dev_config_vf = 0;
 		break;
 	}
+	if (pci_dev->device.devargs) {
+		/* Set devargs default values. */
+		if (eth_da.nb_mh_controllers == 0) {
+			eth_da.nb_mh_controllers = 1;
+			eth_da.mh_controllers[0] = 0;
+		}
+		if (eth_da.nb_ports == 0 && ns > 0) {
+			if (list[0].pf_bond >= 0 && list[0].info.representor)
+				DRV_LOG(WARNING, "Representor on Bonding device should use pf#vf# syntax: %s",
+					pci_dev->device.devargs->args);
+			eth_da.nb_ports = 1;
+			eth_da.ports[0] = list[0].info.pf_num;
+		}
+		if (eth_da.nb_representor_ports == 0) {
+			eth_da.nb_representor_ports = 1;
+			eth_da.representor_ports[0] = 0;
+		}
+	}
 	for (i = 0; i != ns; ++i) {
 		uint32_t restore;
 
@@ -2243,8 +2255,8 @@ mlx5_os_pci_probe(struct rte_pci_driver *pci_drv __rte_unused,
 		DRV_LOG(ERR,
 			"probe of PCI device " PCI_PRI_FMT " aborted after"
 			" encountering an error: %s",
-			pci_dev->addr.domain, pci_dev->addr.bus,
-			pci_dev->addr.devid, pci_dev->addr.function,
+			probe_addr.domain, probe_addr.bus,
+			probe_addr.devid, probe_addr.function,
 			strerror(rte_errno));
 		ret = -rte_errno;
 		/* Roll back. */
diff --git a/drivers/net/mlx5/mlx5.c b/drivers/net/mlx5/mlx5.c
index 628587faac..a8de42ff14 100644
--- a/drivers/net/mlx5/mlx5.c
+++ b/drivers/net/mlx5/mlx5.c
@@ -358,6 +358,28 @@ static const struct mlx5_indexed_pool_config mlx5_ipool_cfg[] = {
 
 #define MLX5_FLOW_TABLE_HLIST_ARRAY_SIZE 4096
 
+/**
+ * Decide whether representor ID is a HPF(host PF) port on BF2.
+ *
+ * @param dev
+ *   Pointer to Ethernet device structure.
+ *
+ * @return
+ *   Non-zero if HPF, otherwise 0.
+ */
+int
+mlx5_is_hpf(struct rte_eth_dev *dev)
+{
+	struct mlx5_priv *priv = dev->data->dev_private;
+	enum rte_eth_representor_type type;
+	uint16_t port;
+
+	port = rte_eth_representor_id_parse(priv->representor_id,
+					    NULL, NULL, &type);
+	return priv->representor && type == RTE_ETH_REPRESENTOR_VF &&
+	       port == rte_eth_representor_id_parse(-1, NULL, NULL, NULL);
+}
+
 /**
  * Initialize the ASO aging management structure.
  *
diff --git a/drivers/net/mlx5/mlx5.h b/drivers/net/mlx5/mlx5.h
index a77a1600d5..767dcfdc6d 100644
--- a/drivers/net/mlx5/mlx5.h
+++ b/drivers/net/mlx5/mlx5.h
@@ -945,7 +945,7 @@ struct mlx5_priv {
 	uint16_t vport_id; /* Associated VF vport index (if any). */
 	uint32_t vport_meta_tag; /* Used for vport index match ove VF LAG. */
 	uint32_t vport_meta_mask; /* Used for vport index field match mask. */
-	int32_t representor_id; /* Port representor identifier. */
+	int32_t representor_id; /* RTE_ETH_REPR(), -1 if not a representor. */
 	int32_t pf_bond; /* >=0 means PF index in bonding configuration. */
 	unsigned int if_index; /* Associated kernel network device index. */
 	uint32_t bond_ifindex; /**< Bond interface index. */
@@ -1019,6 +1019,7 @@ int mlx5_udp_tunnel_port_add(struct rte_eth_dev *dev,
 			      struct rte_eth_udp_tunnel *udp_tunnel);
 uint16_t mlx5_eth_find_next(uint16_t port_id, struct rte_pci_device *pci_dev);
 int mlx5_dev_close(struct rte_eth_dev *dev);
+int mlx5_is_hpf(struct rte_eth_dev *dev);
 void mlx5_age_event_prepare(struct mlx5_dev_ctx_shared *sh);
 
 /* Macro to iterate over all valid ports for mlx5 driver. */
diff --git a/drivers/net/mlx5/mlx5_defs.h b/drivers/net/mlx5/mlx5_defs.h
index 85a0979653..4648196550 100644
--- a/drivers/net/mlx5/mlx5_defs.h
+++ b/drivers/net/mlx5/mlx5_defs.h
@@ -48,10 +48,6 @@
 #define MLX5_PMD_SOFT_COUNTERS 1
 #endif
 
-/* Switch port ID parameters for bonding configurations. */
-#define MLX5_PORT_ID_BONDING_PF_MASK 0xf
-#define MLX5_PORT_ID_BONDING_PF_SHIFT 12
-
 /* Alarm timeout. */
 #define MLX5_ALARM_TIMEOUT_US 100000
 
diff --git a/drivers/net/mlx5/mlx5_ethdev.c b/drivers/net/mlx5/mlx5_ethdev.c
index ad6aacc329..5341eb16c9 100644
--- a/drivers/net/mlx5/mlx5_ethdev.c
+++ b/drivers/net/mlx5/mlx5_ethdev.c
@@ -330,33 +330,6 @@ mlx5_dev_infos_get(struct rte_eth_dev *dev, struct rte_eth_dev_info *info)
 	if (priv->representor) {
 		uint16_t port_id;
 
-		if (priv->pf_bond >= 0) {
-			/*
-			 * Switch port ID is opaque value with driver defined
-			 * format. Push the PF index in bonding configurations
-			 * in upper four bits of port ID. If we get too many
-			 * representors (more than 4K) or PFs (more than 15)
-			 * this approach must be reconsidered.
-			 */
-			/* Switch port ID for VF representors: 0 - 0xFFE */
-			if ((info->switch_info.port_id != 0xffff &&
-				info->switch_info.port_id >=
-				((1 << MLX5_PORT_ID_BONDING_PF_SHIFT) - 1)) ||
-			    priv->pf_bond > MLX5_PORT_ID_BONDING_PF_MASK) {
-				DRV_LOG(ERR, "can't update switch port ID"
-					     " for bonding device");
-				MLX5_ASSERT(false);
-				return -ENODEV;
-			}
-			/*
-			 * Switch port ID for Host PF representor
-			 * (representor_id is -1) , set to 0xFFF
-			 */
-			if (info->switch_info.port_id == 0xffff)
-				info->switch_info.port_id = 0xfff;
-			info->switch_info.port_id |=
-				priv->pf_bond << MLX5_PORT_ID_BONDING_PF_SHIFT;
-		}
 		MLX5_ETH_FOREACH_DEV(port_id, priv->pci_dev) {
 			struct mlx5_priv *opriv =
 				rte_eth_devices[port_id].data->dev_private;
diff --git a/drivers/net/mlx5/mlx5_mac.c b/drivers/net/mlx5/mlx5_mac.c
index bd786fd638..b5b810b508 100644
--- a/drivers/net/mlx5/mlx5_mac.c
+++ b/drivers/net/mlx5/mlx5_mac.c
@@ -159,7 +159,7 @@ mlx5_mac_addr_set(struct rte_eth_dev *dev, struct rte_ether_addr *mac_addr)
 	 * Configuring the VF instead of its representor,
 	 * need to skip the special case of HPF on Bluefield.
 	 */
-	if (priv->representor && priv->representor_id >= 0) {
+	if (priv->representor && !mlx5_is_hpf(dev)) {
 		DRV_LOG(DEBUG, "VF represented by port %u setting primary MAC address",
 			dev->data->port_id);
 		RTE_ETH_FOREACH_DEV_SIBLING(port_id, dev->data->port_id) {
@@ -169,7 +169,11 @@ mlx5_mac_addr_set(struct rte_eth_dev *dev, struct rte_ether_addr *mac_addr)
 				return mlx5_os_vf_mac_addr_modify
 				       (priv,
 					mlx5_ifindex(&rte_eth_devices[port_id]),
-					mac_addr, priv->representor_id);
+					mac_addr,
+					rte_eth_representor_id_parse(
+						priv->representor_id,
+						NULL, NULL, NULL)
+					);
 			}
 		}
 		rte_errno = -ENOTSUP;
-- 
2.25.1
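
The representor check in mlx5_dev_spawn() walks every controller, PF and
port listed in the devargs, encodes each triple, and compares the result
with the ID of the port being spawned. A minimal standalone sketch of
that lookup follows, written so that a match leaves all three loops at
once (a bare `break` only leaves the innermost `for`). Assumptions:
`repr_id_encode()` is a hypothetical stand-in for
rte_eth_representor_id_encode() with a bit layout chosen purely for
illustration, and `struct repr_devargs` is a simplified stand-in for
struct rte_eth_devargs; neither is taken from the ethdev library.

```c
#include <stdbool.h>
#include <stdint.h>

/* Simplified representor types (illustrative, not the ethdev enum). */
enum repr_type { REPR_VF = 0, REPR_SF = 1 };

/*
 * Hypothetical encoding, assumed for this sketch only:
 * bits 15-14 controller, 13-12 PF, 11 type, 10-0 port index.
 */
static uint16_t
repr_id_encode(uint16_t ctrl, uint16_t pf, enum repr_type type, uint16_t port)
{
	return (uint16_t)(((ctrl & 0x3u) << 14) | ((pf & 0x3u) << 12) |
			  (((unsigned int)type & 0x1u) << 11) |
			  (port & 0x7ffu));
}

/* Simplified stand-in for the parsed "representor=" devargs. */
struct repr_devargs {
	enum repr_type type;
	uint16_t nb_ctrl;
	uint16_t ctrl[4];
	uint16_t nb_pf;
	uint16_t pf[4];
	uint16_t nb_port;
	uint16_t port[8];
};

/*
 * Return true if repr_id matches any controller/PF/port combination
 * requested in the devargs. Returning on the first match exits all
 * three loops together.
 */
static bool
repr_requested(const struct repr_devargs *da, uint16_t repr_id)
{
	unsigned int c, p, f;

	for (c = 0; c < da->nb_ctrl; ++c)
		for (p = 0; p < da->nb_pf; ++p)
			for (f = 0; f < da->nb_port; ++f)
				if (repr_id_encode(da->ctrl[c], da->pf[p],
						   da->type,
						   da->port[f]) == repr_id)
					return true;
	return false;
}
```

With devargs equivalent to pf[0,1]vf[0-2], repr_requested() accepts the
ID encoded from (controller 0, PF 1, VF 2) and rejects VF 3 or an SF with
the same index.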