From mboxrd@z Thu Jan 1 00:00:00 1970
From: Shachar Beiser <shacharbe@mellanox.com>
To: dev@dpdk.org
Cc: Shachar Beiser, Adrien Mazarguil, Nelio Laranjeiro
Date: Thu, 24 Aug 2017 12:23:10 +0000
Message-Id: <80443745ab52e7f8536918487ddfc97f2efd54b7.1503577332.git.shacharbe@mellanox.com>
X-Mailer: git-send-email 1.8.3.1
Subject: [dpdk-dev] [PATCH v1] net/mlx5: support upstream rdma-core

This removes the dependency on specific Mellanox OFED libraries by using upstream rdma-core and Linux upstream community code. Minimal requirements: rdma-core v16 and Linux kernel 4.14.
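For reviewers, the minimal requirements stated above can be checked mechanically. The sketch below is illustrative and not part of this patch; `ver_ge` is a local helper (it relies on GNU `sort -V`), and the rdma-core version would come from your distribution's package manager.

```shell
# Sketch: check the minimal requirements (Linux kernel >= 4.14,
# rdma-core >= v16). ver_ge succeeds when version $1 >= version $2.
ver_ge() {
    [ "$(printf '%s\n%s\n' "$2" "$1" | sort -V | head -n1)" = "$2" ]
}

kernel="$(uname -r | cut -d- -f1)"      # e.g. "4.14.0"
if ver_ge "$kernel" "4.14"; then
    echo "kernel $kernel: OK"
else
    echo "kernel $kernel: older than 4.14"
fi
```

The same helper can compare an installed rdma-core version string against "16".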
Signed-off-by: Shachar Beiser --- doc/guides/nics/mlx5.rst | 29 +- drivers/net/mlx5/Makefile | 39 +-- drivers/net/mlx5/mlx5.c | 93 +++-- drivers/net/mlx5/mlx5.h | 4 +- drivers/net/mlx5/mlx5.rst | 663 +++++++++++++++++++++++++++++++++++ drivers/net/mlx5/mlx5_ethdev.c | 10 +- drivers/net/mlx5/mlx5_fdir.c | 103 +++--- drivers/net/mlx5/mlx5_flow.c | 226 ++++++------ drivers/net/mlx5/mlx5_mac.c | 16 +- drivers/net/mlx5/mlx5_prm.h | 41 ++- drivers/net/mlx5/mlx5_rxmode.c | 18 +- drivers/net/mlx5/mlx5_rxq.c | 221 ++++++------ drivers/net/mlx5/mlx5_rxtx.c | 17 +- drivers/net/mlx5/mlx5_rxtx.h | 35 +- drivers/net/mlx5/mlx5_rxtx_vec_sse.c | 5 +- drivers/net/mlx5/mlx5_txq.c | 71 ++-- drivers/net/mlx5/mlx5_vlan.c | 12 +- mk/rte.app.mk | 2 +- 18 files changed, 1145 insertions(+), 460 deletions(-) create mode 100644 drivers/net/mlx5/mlx5.rst diff --git a/doc/guides/nics/mlx5.rst b/doc/guides/nics/mlx5.rst index f4cb18b..a1b3321 100644 --- a/doc/guides/nics/mlx5.rst +++ b/doc/guides/nics/mlx5.rst @@ -295,6 +295,7 @@ DPDK and must be installed separately: - **Kernel modules** (mlnx-ofed-kernel) + Since DPDK 17.11, the Linux upstream kernel is supported. They provide the kernel-side Verbs API and low level device drivers that manage actual hardware initialization and resources sharing with user space processes. @@ -376,23 +377,27 @@ Supported NICs Quick Start Guide ----------------- -1. Download latest Mellanox OFED. For more info check the `prerequisites`_. +1. Since DPDK 17.11, Mellanox DPDK runs both on top of the Linux upstream kernel + and on top of Mellanox OFED. + If your Mellanox DPDK version is older than 17.11, or + it is newer but you want to run on top of Mellanox OFED: + a. Download the latest Mellanox OFED. For more info check the `prerequisites`_. + b. Install the required libraries and kernel modules either by installing + only the required set, or by installing the entire Mellanox OFED: + .. code-block:: console -2. 
Install the required libraries and kernel modules either by installing - only the required set, or by installing the entire Mellanox OFED: - - .. code-block:: console - - ./mlnxofedinstall -3. Verify the firmware is the correct one: + ./mlnxofedinstall + If your Mellanox DPDK is 17.11 or newer and runs on top of the Linux upstream kernel: + a. Install Linux upstream kernel v4.14 or above. + b. Install Mellanox rdma-core v16 or above. +2. Verify the firmware is the correct one: .. code-block:: console ibv_devinfo -4. Verify all ports links are set to Ethernet: +3. Verify all ports links are set to Ethernet: .. code-block:: console @@ -422,7 +427,7 @@ Quick Start Guide mlxconfig -d set SRIOV_EN=1 NUM_OF_VFS=16 mlxfwreset -d reset -5. Restart the driver: +4. Restart the driver: .. code-block:: console @@ -449,7 +454,7 @@ Quick Start Guide echo [num_vfs] > /sys/class/infiniband/mlx5_0/device/sriov_numvfs -6. Compile DPDK and you are ready to go. +5. Compile DPDK and you are ready to go. 
See instructions on :ref:`Development Kit Build System ` Performance tuning diff --git a/drivers/net/mlx5/Makefile b/drivers/net/mlx5/Makefile index 14b739a..2de1c78 100644 --- a/drivers/net/mlx5/Makefile +++ b/drivers/net/mlx5/Makefile @@ -104,41 +104,20 @@ mlx5_autoconf.h.new: FORCE mlx5_autoconf.h.new: $(RTE_SDK)/buildtools/auto-config-h.sh $Q $(RM) -f -- '$@' $Q sh -- '$<' '$@' \ - HAVE_VERBS_IBV_EXP_CQ_COMPRESSED_CQE \ - infiniband/verbs_exp.h \ - enum IBV_EXP_CQ_COMPRESSED_CQE \ + HAVE_IBV_DEVICE_VXLAN_SUPPORT \ + infiniband/verbs.h \ + enum IBV_DEVICE_VXLAN_SUPPORT \ $(AUTOCONF_OUTPUT) $Q sh -- '$<' '$@' \ - HAVE_VERBS_MLX5_ETH_VLAN_INLINE_HEADER_SIZE \ - infiniband/mlx5_hw.h \ - enum MLX5_ETH_VLAN_INLINE_HEADER_SIZE \ + HAVE_IBV_WQ_FLAG_RX_END_PADDING \ + infiniband/verbs.h \ + enum IBV_WQ_FLAG_RX_END_PADDING \ $(AUTOCONF_OUTPUT) $Q sh -- '$<' '$@' \ - HAVE_VERBS_MLX5_OPCODE_TSO \ - infiniband/mlx5_hw.h \ - enum MLX5_OPCODE_TSO \ + HAVE_IBV_MLX5_MOD_MPW \ + infiniband/mlx5dv.h \ + enum MLX5DV_CONTEXT_FLAGS_MPW_ALLOWED \ $(AUTOCONF_OUTPUT) - $Q sh -- '$<' '$@' \ - HAVE_ETHTOOL_LINK_MODE_25G \ - /usr/include/linux/ethtool.h \ - enum ETHTOOL_LINK_MODE_25000baseCR_Full_BIT \ - $(AUTOCONF_OUTPUT) - $Q sh -- '$<' '$@' \ - HAVE_ETHTOOL_LINK_MODE_50G \ - /usr/include/linux/ethtool.h \ - enum ETHTOOL_LINK_MODE_50000baseCR2_Full_BIT \ - $(AUTOCONF_OUTPUT) - $Q sh -- '$<' '$@' \ - HAVE_ETHTOOL_LINK_MODE_100G \ - /usr/include/linux/ethtool.h \ - enum ETHTOOL_LINK_MODE_100000baseKR4_Full_BIT \ - $(AUTOCONF_OUTPUT) - $Q sh -- '$<' '$@' \ - HAVE_UPDATE_CQ_CI \ - infiniband/mlx5_hw.h \ - func ibv_mlx5_exp_update_cq_ci \ - $(AUTOCONF_OUTPUT) - # Create mlx5_autoconf.h or update it in case it differs from the new one. 
mlx5_autoconf.h: mlx5_autoconf.h.new diff --git a/drivers/net/mlx5/mlx5.c b/drivers/net/mlx5/mlx5.c index bd66a7c..c2e37a3 100644 --- a/drivers/net/mlx5/mlx5.c +++ b/drivers/net/mlx5/mlx5.c @@ -247,10 +247,8 @@ struct mlx5_args { .filter_ctrl = mlx5_dev_filter_ctrl, .rx_descriptor_status = mlx5_rx_descriptor_status, .tx_descriptor_status = mlx5_tx_descriptor_status, -#ifdef HAVE_UPDATE_CQ_CI .rx_queue_intr_enable = mlx5_rx_intr_enable, .rx_queue_intr_disable = mlx5_rx_intr_disable, -#endif }; static struct { @@ -442,7 +440,7 @@ struct mlx5_args { struct ibv_device *ibv_dev; int err = 0; struct ibv_context *attr_ctx = NULL; - struct ibv_device_attr device_attr; + struct ibv_device_attr_ex device_attr; unsigned int sriov; unsigned int mps; unsigned int tunnel_en; @@ -493,34 +491,24 @@ struct mlx5_args { PCI_DEVICE_ID_MELLANOX_CONNECTX5VF) || (pci_dev->id.device_id == PCI_DEVICE_ID_MELLANOX_CONNECTX5EXVF)); - /* - * Multi-packet send is supported by ConnectX-4 Lx PF as well - * as all ConnectX-5 devices. - */ switch (pci_dev->id.device_id) { case PCI_DEVICE_ID_MELLANOX_CONNECTX4: tunnel_en = 1; - mps = MLX5_MPW_DISABLED; break; case PCI_DEVICE_ID_MELLANOX_CONNECTX4LX: - mps = MLX5_MPW; - break; case PCI_DEVICE_ID_MELLANOX_CONNECTX5: case PCI_DEVICE_ID_MELLANOX_CONNECTX5VF: case PCI_DEVICE_ID_MELLANOX_CONNECTX5EX: case PCI_DEVICE_ID_MELLANOX_CONNECTX5EXVF: tunnel_en = 1; - mps = MLX5_MPW_ENHANCED; break; default: - mps = MLX5_MPW_DISABLED; + break; } INFO("PCI information matches, using device \"%s\"" - " (SR-IOV: %s, %sMPS: %s)", + " (SR-IOV: %s)", list[i]->name, - sriov ? "true" : "false", - mps == MLX5_MPW_ENHANCED ? "Enhanced " : "", - mps != MLX5_MPW_DISABLED ? "true" : "false"); + sriov ? 
"true" : "false"); attr_ctx = ibv_open_device(list[i]); err = errno; break; @@ -539,13 +527,29 @@ struct mlx5_args { return -err; } ibv_dev = list[i]; - DEBUG("device opened"); - if (ibv_query_device(attr_ctx, &device_attr)) +#ifdef HAVE_IBV_MLX5_MOD_MPW + struct mlx5dv_context attrs_out; + mlx5dv_query_device(attr_ctx, &attrs_out); + if (attrs_out.flags & MLX5DV_CONTEXT_FLAGS_ENHANCED_MPW) { + INFO("Enhanced MPW is detected\n"); + mps = MLX5_MPW_ENHANCED; + } else if (attrs_out.flags & MLX5DV_CONTEXT_FLAGS_MPW_ALLOWED) { + INFO("MPW is detected\n"); + mps = MLX5_MPW; + } else { + INFO("MPW is disabled\n"); + mps = MLX5_MPW_DISABLED; + } +#else + mps = MLX5_MPW_DISABLED; +#endif + if (ibv_query_device_ex(attr_ctx, NULL, &device_attr)) goto error; - INFO("%u port(s) detected", device_attr.phys_port_cnt); + INFO("%u port(s) detected", device_attr.orig_attr.phys_port_cnt); - for (i = 0; i < device_attr.phys_port_cnt; i++) { + for (i = 0; i < device_attr.orig_attr.phys_port_cnt; i++) { uint32_t port = i + 1; /* ports are indexed from one */ uint32_t test = (1 << i); struct ibv_context *ctx = NULL; @@ -553,7 +557,7 @@ struct mlx5_args { struct ibv_pd *pd = NULL; struct priv *priv = NULL; struct rte_eth_dev *eth_dev; - struct ibv_exp_device_attr exp_device_attr; + struct ibv_device_attr_ex device_attr_ex; struct ether_addr mac; uint16_t num_vfs = 0; struct mlx5_args args = { @@ -568,14 +572,6 @@ struct mlx5_args { .rx_vec_en = MLX5_ARG_UNSET, }; - exp_device_attr.comp_mask = - IBV_EXP_DEVICE_ATTR_EXP_CAP_FLAGS | - IBV_EXP_DEVICE_ATTR_RX_HASH | - IBV_EXP_DEVICE_ATTR_VLAN_OFFLOADS | - IBV_EXP_DEVICE_ATTR_RX_PAD_END_ALIGN | - IBV_EXP_DEVICE_ATTR_TSO_CAPS | - 0; - DEBUG("using port %u (%08" PRIx32 ")", port, test); ctx = ibv_open_device(ibv_dev); @@ -619,7 +615,6 @@ struct mlx5_args { err = ENOMEM; goto port_error; } - priv->ctx = ctx; priv->device_attr = device_attr; priv->port = port; @@ -638,25 +633,26 @@ struct mlx5_args { goto 
port_error; } mlx5_args_assign(priv, &args); - if (ibv_exp_query_device(ctx, &exp_device_attr)) { - ERROR("ibv_exp_query_device() failed"); + if (ibv_query_device_ex(ctx, NULL, &device_attr_ex)) { + ERROR("ibv_query_device_ex() failed"); goto port_error; } priv->hw_csum = - ((exp_device_attr.exp_device_cap_flags & - IBV_EXP_DEVICE_RX_CSUM_TCP_UDP_PKT) && - (exp_device_attr.exp_device_cap_flags & - IBV_EXP_DEVICE_RX_CSUM_IP_PKT)); + ((device_attr_ex.device_cap_flags_ex & + IBV_DEVICE_RAW_IP_CSUM)); DEBUG("checksum offloading is %ssupported", (priv->hw_csum ? "" : "not ")); +#ifdef HAVE_IBV_DEVICE_VXLAN_SUPPORT - priv->hw_csum_l2tun = !!(exp_device_attr.exp_device_cap_flags & - IBV_EXP_DEVICE_VXLAN_SUPPORT); + priv->hw_csum_l2tun = !!(device_attr_ex.device_cap_flags_ex & + IBV_DEVICE_VXLAN_SUPPORT); +#endif DEBUG("L2 tunnel checksum offloads are %ssupported", (priv->hw_csum_l2tun ? "" : "not ")); - priv->ind_table_max_size = exp_device_attr.rx_hash_caps.max_rwq_indirection_table_size; + priv->ind_table_max_size = + device_attr_ex.rss_caps.max_rwq_indirection_table_size; /* Remove this check once DPDK supports larger/variable * indirection tables. */ if (priv->ind_table_max_size > @@ -664,29 +660,32 @@ struct mlx5_args { priv->ind_table_max_size = ETH_RSS_RETA_SIZE_512; DEBUG("maximum RX indirection table size is %u", priv->ind_table_max_size); - priv->hw_vlan_strip = !!(exp_device_attr.wq_vlan_offloads_cap & - IBV_EXP_RECEIVE_WQ_CVLAN_STRIP); + priv->hw_vlan_strip = !!(device_attr_ex.raw_packet_caps & + IBV_RAW_PACKET_CAP_CVLAN_STRIPPING); DEBUG("VLAN stripping is %ssupported", (priv->hw_vlan_strip ? "" : "not ")); - priv->hw_fcs_strip = !!(exp_device_attr.exp_device_cap_flags & - IBV_EXP_DEVICE_SCATTER_FCS); + priv->hw_fcs_strip = + !!(device_attr_ex.raw_packet_caps & + IBV_RAW_PACKET_CAP_SCATTER_FCS); DEBUG("FCS stripping configuration is %ssupported", (priv->hw_fcs_strip ? 
"" : "not ")); - priv->hw_padding = !!exp_device_attr.rx_pad_end_addr_align; +#ifdef HAVE_IBV_WQ_FLAG_RX_END_PADDING + priv->hw_padding = !!device_attr_ex.rx_pad_end_addr_align; +#endif DEBUG("hardware RX end alignment padding is %ssupported", (priv->hw_padding ? "" : "not ")); priv_get_num_vfs(priv, &num_vfs); priv->sriov = (num_vfs || sriov); priv->tso = ((priv->tso) && - (exp_device_attr.tso_caps.max_tso > 0) && - (exp_device_attr.tso_caps.supported_qpts & - (1 << IBV_QPT_RAW_ETH))); + (device_attr_ex.tso_caps.max_tso > 0) && + (device_attr_ex.tso_caps.supported_qpts & + (1 << IBV_QPT_RAW_PACKET))); if (priv->tso) priv->max_tso_payload_sz = - exp_device_attr.tso_caps.max_tso; + device_attr_ex.tso_caps.max_tso; if (priv->mps && !mps) { ERROR("multi-packet send not supported on this device" " (" MLX5_TXQ_MPW_EN ")"); diff --git a/drivers/net/mlx5/mlx5.h b/drivers/net/mlx5/mlx5.h index e89aba8..ab03fe0 100644 --- a/drivers/net/mlx5/mlx5.h +++ b/drivers/net/mlx5/mlx5.h @@ -89,7 +89,7 @@ struct mlx5_xstats_ctrl { struct priv { struct rte_eth_dev *dev; /* Ethernet device. */ struct ibv_context *ctx; /* Verbs context. */ - struct ibv_device_attr device_attr; /* Device properties. */ + struct ibv_device_attr_ex device_attr; /* Device properties. */ struct ibv_pd *pd; /* Protection Domain. */ /* * MAC addresses array and configuration bit-field. @@ -132,7 +132,7 @@ struct priv { struct rxq *(*rxqs)[]; /* RX queues. */ struct txq *(*txqs)[]; /* TX queues. */ /* Indirection tables referencing all RX WQs. */ - struct ibv_exp_rwq_ind_table *(*ind_tables)[]; + struct ibv_rwq_ind_table *(*ind_tables)[]; unsigned int ind_tables_n; /* Number of indirection tables. */ unsigned int ind_table_max_size; /* Maximum indirection table size. */ /* Hash RX QPs feeding the indirection table. */ diff --git a/drivers/net/mlx5/mlx5.rst b/drivers/net/mlx5/mlx5.rst new file mode 100644 index 0000000..ae06636 --- /dev/null +++ b/drivers/net/mlx5/mlx5.rst @@ -0,0 +1,663 @@ +.. 
BSD LICENSE + Copyright 2015 6WIND S.A. + Copyright 2015 Mellanox + + Redistribution and use in source and binary forms, with or without + modification, are permitted provided that the following conditions + are met: + + * Redistributions of source code must retain the above copyright + notice, this list of conditions and the following disclaimer. + * Redistributions in binary form must reproduce the above copyright + notice, this list of conditions and the following disclaimer in + the documentation and/or other materials provided with the + distribution. + * Neither the name of 6WIND S.A. nor the names of its + contributors may be used to endorse or promote products derived + from this software without specific prior written permission. + + THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS + "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT + LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR + A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT + OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, + SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT + LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, + DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY + THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT + (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE + OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. + +MLX5 poll mode driver +===================== + +The MLX5 poll mode driver library (**librte_pmd_mlx5**) provides support +for **Mellanox ConnectX-4**, **Mellanox ConnectX-4 Lx** and **Mellanox +ConnectX-5** families of 10/25/40/50/100 Gb/s adapters as well as their +virtual functions (VF) in SR-IOV context. + +Information and documentation about these adapters can be found on the +`Mellanox website `__. Help is also provided by the +`Mellanox community `__. 
+ +There is also a `section dedicated to this poll mode driver +`__. + +.. note:: + + Due to external dependencies, this driver is disabled by default. It must + be enabled manually by setting ``CONFIG_RTE_LIBRTE_MLX5_PMD=y`` and + recompiling DPDK. + +Implementation details +---------------------- + +Besides its dependency on libibverbs (that implies libmlx5 and associated +kernel support), librte_pmd_mlx5 relies heavily on system calls for control +operations such as querying/updating the MTU and flow control parameters. + +For security reasons and robustness, this driver only deals with virtual +memory addresses. The way resources allocations are handled by the kernel +combined with hardware specifications that allow it to handle virtual memory +addresses directly ensure that DPDK applications cannot access random +physical memory (or memory that does not belong to the current process). + +This capability allows the PMD to coexist with kernel network interfaces +which remain functional, although they stop receiving unicast packets as +long as they share the same MAC address. +This means legacy Linux control tools (for example: ethtool, ifconfig and +more) can operate on the same network interfaces that are owned by the DPDK +application. + +Enabling librte_pmd_mlx5 causes DPDK applications to be linked against +libibverbs. + +Features +-------- + +- Multi arch support: x86_64, POWER8, ARMv8. +- Multiple TX and RX queues. +- Support for scattered TX and RX frames. +- IPv4, IPv6, TCPv4, TCPv6, UDPv4 and UDPv6 RSS on any number of queues. +- Several RSS hash keys, one for each flow type. +- Configurable RETA table. +- Support for multiple MAC addresses. +- VLAN filtering. +- RX VLAN stripping. +- TX VLAN insertion. +- RX CRC stripping configuration. +- Promiscuous mode. +- Multicast promiscuous mode. +- Hardware checksum offloads. +- Flow director (RTE_FDIR_MODE_PERFECT, RTE_FDIR_MODE_PERFECT_MAC_VLAN and + RTE_ETH_FDIR_REJECT). +- Flow API. 
+- Secondary process TX is supported. +- KVM and VMware ESX SR-IOV modes are supported. +- RSS hash result is supported. +- Hardware TSO. +- Hardware checksum TX offload for VXLAN and GRE. +- RX interrupts. +- Statistics query including Basic, Extended and per queue. + +Limitations +----------- + +- Inner RSS for VXLAN frames is not supported yet. +- Port statistics through software counters only. +- Hardware checksum RX offloads for VXLAN inner header are not supported yet. +- Secondary process RX is not supported. +- A flow pattern without any specific VLAN will match VLAN packets as well: + + When VLAN spec is not specified in the pattern, the matching rule will be created with VLAN as a wildcard. + Meaning, the flow rule:: + + flow create 0 ingress pattern eth / vlan vid is 3 / ipv4 / end ... + + Will only match VLAN packets with vid=3, while the flow rules:: + + flow create 0 ingress pattern eth / ipv4 / end ... + + Or:: + + flow create 0 ingress pattern eth / vlan / ipv4 / end ... + + Will match any IPv4 packet (VLAN included). + +Configuration +------------- + +Compilation options +~~~~~~~~~~~~~~~~~~~ + +These options can be modified in the ``.config`` file. + +- ``CONFIG_RTE_LIBRTE_MLX5_PMD`` (default **n**) + + Toggle compilation of librte_pmd_mlx5 itself. + +- ``CONFIG_RTE_LIBRTE_MLX5_DEBUG`` (default **n**) + + Toggle debugging code and stricter compilation flags. Enabling this option + adds additional run-time checks and debugging messages at the cost of + lower performance. + +- ``CONFIG_RTE_LIBRTE_MLX5_TX_MP_CACHE`` (default **8**) + + Maximum number of cached memory pools (MPs) per TX queue. Each MP from + which buffers are to be transmitted must be associated with memory regions + (MRs). This is a slow operation that must be cached. + + This value is always 1 for RX queues since they use a single MP. + +Environment variables +~~~~~~~~~~~~~~~~~~~~~ + +- ``MLX5_PMD_ENABLE_PADDING`` + + Enables HW packet padding in PCI bus transactions. 
+ + When packet size is cache aligned and CRC stripping is enabled, 4 fewer + bytes are written to the PCI bus. Enabling padding makes such packets + aligned again. + + In cases where PCI bandwidth is the bottleneck, padding can improve + performance by 10%. + + This is disabled by default since this can also decrease performance for + unaligned packet sizes. + +Run-time configuration +~~~~~~~~~~~~~~~~~~~~~~ + +- librte_pmd_mlx5 brings kernel network interfaces up during initialization + because it is affected by their state. Forcing them down prevents packet + reception. + +- **ethtool** operations on related kernel interfaces also affect the PMD. + +- ``rxq_cqe_comp_en`` parameter [int] + + A nonzero value enables the compression of CQE on RX side. This feature + saves PCI bandwidth and improves performance. Enabled by default. + + Supported on: + + - x86_64 with ConnectX-4, ConnectX-4 Lx and ConnectX-5. + - POWER8 and ARMv8 with ConnectX-4 Lx and ConnectX-5. + +- ``txq_inline`` parameter [int] + + Amount of data to be inlined during TX operations. Improves latency. + Can improve PPS performance when PCI back pressure is detected and may be + useful for scenarios involving heavy traffic on many queues. + + Because additional software logic is necessary to handle this mode, this + option should be used with care, as it can lower performance when back + pressure is not expected. + +- ``txqs_min_inline`` parameter [int] + + Enable inline send only when the number of TX queues is greater than or equal + to this value. + + This option should be used in combination with ``txq_inline`` above. + + On ConnectX-4, ConnectX-4 Lx and ConnectX-5 without Enhanced MPW: + + - Disabled by default. + - In case ``txq_inline`` is set, the recommended value is 4. + + On ConnectX-5 with Enhanced MPW: + + - Set to 8 by default. 
+ +- ``txq_mpw_en`` parameter [int] + + A nonzero value enables multi-packet send (MPS) for ConnectX-4 Lx and + enhanced multi-packet send (Enhanced MPS) for ConnectX-5. MPS allows the + TX burst function to pack up multiple packets in a single descriptor + session in order to save PCI bandwidth and improve performance at the + cost of a slightly higher CPU usage. When ``txq_inline`` is set along + with ``txq_mpw_en``, the TX burst function tries to copy the entire packet + data onto the TX descriptor instead of including only a pointer to the + packet, provided there is enough room remaining in the descriptor. + ``txq_inline`` sets + per-descriptor space for either pointers or inlined packets. In addition, + Enhanced MPS supports a hybrid mode, mixing inlined packets and pointers + in the same descriptor. + + This option cannot be used in conjunction with ``tso`` below. When ``tso`` + is set, ``txq_mpw_en`` is disabled. + + It is currently only supported on the ConnectX-4 Lx and ConnectX-5 + families of adapters. Enabled by default. + +- ``txq_mpw_hdr_dseg_en`` parameter [int] + + A nonzero value enables including two pointers in the first block of a TX + descriptor. This can be used to lessen CPU load for memory copy. + + Effective only when Enhanced MPS is supported. Disabled by default. + +- ``txq_max_inline_len`` parameter [int] + + Maximum size of packet to be inlined. If the size of a packet is larger + than the configured value, the + packet isn't inlined even though there's enough space remaining in the + descriptor. Instead, the packet is included by pointer. + + Effective only when Enhanced MPS is supported. The default value is 256. + +- ``tso`` parameter [int] + + A nonzero value enables hardware TSO. + When hardware TSO is enabled, packets marked with TCP segmentation + offload will be divided into segments by the hardware. Disabled by default. 
+ +- ``tx_vec_en`` parameter [int] + + A nonzero value enables the Tx vector path on ConnectX-5 NICs only, if the + number of global Tx queues on the port is less than MLX5_VPMD_MIN_TXQS. + + Enabled by default on ConnectX-5. + +- ``rx_vec_en`` parameter [int] + + A nonzero value enables the Rx vector path if the port is not configured in + multi-segment mode; otherwise this parameter is ignored. + + Enabled by default. + +Prerequisites +------------- + +This driver relies on external libraries and kernel drivers for resources +allocations and initialization. The following dependencies are not part of +DPDK and must be installed separately: + +- **libibverbs** + + User space Verbs framework used by librte_pmd_mlx5. This library provides + a generic interface between the kernel and low-level user space drivers + such as libmlx5. + + It allows slow and privileged operations (context initialization, hardware + resources allocations) to be managed by the kernel and fast operations to + never leave user space. + +- **libmlx5** + + Low-level user space driver library for Mellanox ConnectX-4/ConnectX-5 + devices, it is automatically loaded by libibverbs. + + This library basically implements send/receive calls to the hardware + queues. + +- **Kernel modules** (mlnx-ofed-kernel or Linux upstream) + + They provide the kernel-side Verbs API and low level device drivers that + manage actual hardware initialization and resources sharing with user + space processes. + + Unlike most other PMDs, these modules must remain loaded and bound to + their devices: + + - mlx5_core: hardware driver managing Mellanox ConnectX-4/ConnectX-5 + devices and related Ethernet kernel network devices. + - mlx5_ib: InfiniBand device driver. + - ib_uverbs: user space driver for Verbs (entry point for libibverbs). + +- **Firmware update** + + Mellanox OFED releases include firmware updates for ConnectX-4/ConnectX-5 + adapters. 
+ + Because each release provides new features, these updates must be applied to + match the kernel modules and libraries they come with. + +.. note:: + + Both libraries are BSD and GPL licensed. Linux kernel modules are GPL + licensed. + +Currently supported by DPDK: + +- Mellanox OFED version: **4.1**. +- firmware version: + + - ConnectX-4: **12.20.1010** and above. + - ConnectX-4 Lx: **14.20.1010** and above. + - ConnectX-5: **16.20.1010** and above. + - ConnectX-5 Ex: **16.20.1010** and above. + +Getting Mellanox OFED +~~~~~~~~~~~~~~~~~~~~~ + +While these libraries and kernel modules are available on OpenFabrics +Alliance's `website `__ and provided by package +managers on most distributions, this PMD requires Ethernet extensions that +may not be supported at the moment (this is a work in progress). + +`Mellanox OFED +`__ +includes the necessary support and should be used in the meantime. For DPDK, +only libibverbs, libmlx5, mlnx-ofed-kernel packages and firmware updates are +required from that distribution. + +.. note:: + + Several versions of Mellanox OFED are available. Installing the version + this DPDK release was developed and tested against is strongly + recommended. Please check the `prerequisites`_. 
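The firmware minimums above can be compared against a running system. The snippet below is an illustrative sketch: the ``fw_cur`` value is a hypothetical sample, and on real hardware it would be queried from ``ibv_devinfo``.

```shell
# Sketch: compare a reported firmware version against the ConnectX-4
# minimum listed above. "sort -V" sorts the lowest version first.
# fw_cur is a hypothetical sample; on a real system query it with:
#   ibv_devinfo | awk '/fw_ver/ {print $2; exit}'
fw_min="12.20.1010"
fw_cur="12.20.1030"

oldest="$(printf '%s\n%s\n' "$fw_min" "$fw_cur" | sort -V | head -n1)"
if [ "$oldest" = "$fw_min" ]; then
    echo "firmware $fw_cur meets minimum $fw_min"
else
    echo "firmware $fw_cur is older than required $fw_min"
fi
```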
+ +Supported NICs +-------------- + +* Mellanox(R) ConnectX(R)-4 10G MCX4111A-XCAT (1x10G) +* Mellanox(R) ConnectX(R)-4 10G MCX4121A-XCAT (2x10G) +* Mellanox(R) ConnectX(R)-4 25G MCX4111A-ACAT (1x25G) +* Mellanox(R) ConnectX(R)-4 25G MCX4121A-ACAT (2x25G) +* Mellanox(R) ConnectX(R)-4 40G MCX4131A-BCAT (1x40G) +* Mellanox(R) ConnectX(R)-4 40G MCX413A-BCAT (1x40G) +* Mellanox(R) ConnectX(R)-4 40G MCX415A-BCAT (1x40G) +* Mellanox(R) ConnectX(R)-4 50G MCX4131A-GCAT (1x50G) +* Mellanox(R) ConnectX(R)-4 50G MCX413A-GCAT (1x50G) +* Mellanox(R) ConnectX(R)-4 50G MCX414A-BCAT (2x50G) +* Mellanox(R) ConnectX(R)-4 50G MCX415A-GCAT (2x50G) +* Mellanox(R) ConnectX(R)-4 50G MCX416A-BCAT (2x50G) +* Mellanox(R) ConnectX(R)-4 50G MCX416A-GCAT (2x50G) +* Mellanox(R) ConnectX(R)-4 50G MCX415A-CCAT (1x100G) +* Mellanox(R) ConnectX(R)-4 100G MCX416A-CCAT (2x100G) +* Mellanox(R) ConnectX(R)-4 Lx 10G MCX4121A-XCAT (2x10G) +* Mellanox(R) ConnectX(R)-4 Lx 25G MCX4121A-ACAT (2x25G) +* Mellanox(R) ConnectX(R)-5 100G MCX556A-ECAT (2x100G) +* Mellanox(R) ConnectX(R)-5 Ex EN 100G MCX516A-CDAT (2x100G) + +Quick Start Guide for OFED users +-------------------------------- + +1. Download latest Mellanox OFED. For more info check the `prerequisites`_. + + +2. Install the required libraries and kernel modules either by installing + only the required set, or by installing the entire Mellanox OFED: + + .. code-block:: console + + ./mlnxofedinstall + +3. Verify the firmware is the correct one: + + .. code-block:: console + + ibv_devinfo + +4. Verify all ports links are set to Ethernet: + + .. code-block:: console + + mlxconfig -d query | grep LINK_TYPE + LINK_TYPE_P1 ETH(2) + LINK_TYPE_P2 ETH(2) + + Link types may have to be configured to Ethernet: + + .. code-block:: console + + mlxconfig -d set LINK_TYPE_P1/2=1/2/3 + + * LINK_TYPE_P1=<1|2|3> , 1=Infiniband 2=Ethernet 3=VPI(auto-sense) + + For hypervisors verify SR-IOV is enabled on the NIC: + + .. 
code-block:: console + + mlxconfig -d query | grep SRIOV_EN + SRIOV_EN True(1) + + If needed, set the relevant fields: + + .. code-block:: console + + mlxconfig -d set SRIOV_EN=1 NUM_OF_VFS=16 + mlxfwreset -d reset + +5. Restart the driver: + + .. code-block:: console + + /etc/init.d/openibd restart + + or: + + .. code-block:: console + + service openibd restart + + If link type was changed, firmware must be reset as well: + + .. code-block:: console + + mlxfwreset -d reset + + For hypervisors, after reset write the sysfs number of virtual functions + needed for the PF. + + To dynamically instantiate a given number of virtual functions (VFs): + + .. code-block:: console + + echo [num_vfs] > /sys/class/infiniband/mlx5_0/device/sriov_numvfs + +6. Compile DPDK and you are ready to go. See instructions on + :ref:`Development Kit Build System ` + +Performance tuning +------------------ + +1. Configure aggressive CQE Zipping for maximum performance: + + .. code-block:: console + + mlxconfig -d s CQE_COMPRESSION=1 + + To set it back to the default CQE Zipping mode use: + + .. code-block:: console + + mlxconfig -d s CQE_COMPRESSION=0 + +2. In case of virtualization: + + - Make sure that the hypervisor kernel is 3.16 or newer. + - Configure boot with ``iommu=pt``. + - Use 1G huge pages. + - Make sure to allocate a VM on huge pages. + - Make sure to set CPU pinning. + +3. Use the CPU near local NUMA node to which the PCIe adapter is connected, + for better performance. For VMs, verify that the right CPU + and NUMA node are pinned according to the above. Run: + + .. code-block:: console + + lstopo-no-graphics + + to identify the NUMA node to which the PCIe adapter is connected. + +4. If more than one adapter is used, and root complex capabilities allow + putting both adapters on the same NUMA node without PCI bandwidth degradation, + it is recommended to locate both adapters on the same NUMA node. 
+ This is in order to forward packets from one to the other without + a NUMA performance penalty. + +5. Disable pause frames: + + .. code-block:: console + + ethtool -A rx off tx off + +6. Verify IO non-posted prefetch is disabled by default. This can be checked + via the BIOS configuration. Please contact your server provider for more + information about the settings. + +.. note:: + + On some machines, depending on the machine integrator, it is beneficial + to set the PCI max read request parameter to 1K. This can be + done in the following way: + + To query the read request size use: + + .. code-block:: console + + setpci -s 68.w + + If the output is different from 3XXX, set it by: + + .. code-block:: console + + setpci -s 68.w=3XXX + + The XXX can be different on different systems. Make sure to configure + according to the setpci output. + +Notes for testpmd +----------------- + +Compared to librte_pmd_mlx4, which implements a single RSS configuration per +port, librte_pmd_mlx5 supports per-protocol RSS configuration. + +Since ``testpmd`` defaults to IP RSS mode and there is currently no +command-line parameter to enable additional protocols (UDP and TCP as well +as IP), the following commands must be entered from its CLI to get the same +behavior as librte_pmd_mlx4: + +.. code-block:: console + + > port stop all + > port config all rss all + > port start all + +Usage example +------------- + +This section demonstrates how to launch **testpmd** with Mellanox +ConnectX-4/ConnectX-5 devices managed by librte_pmd_mlx5. + +#. Load the kernel modules: + + .. code-block:: console + + modprobe -a ib_uverbs mlx5_core mlx5_ib + + Alternatively, if MLNX_OFED is fully installed, the following script can + be run: + + .. code-block:: console + + /etc/init.d/openibd restart + + .. note:: + + User space I/O kernel modules (uio and igb_uio) are not used and do + not have to be loaded. + +#. Make sure Ethernet interfaces are in working order and linked to kernel + verbs. 
Related sysfs entries should be present: + + .. code-block:: console + + ls -d /sys/class/net/*/device/infiniband_verbs/uverbs* | cut -d / -f 5 + + Example output: + + .. code-block:: console + + eth30 + eth31 + eth32 + eth33 + +#. Optionally, retrieve their PCI bus addresses for whitelisting: + + .. code-block:: console + + { + for intf in eth2 eth3 eth4 eth5; + do + (cd "/sys/class/net/${intf}/device/" && pwd -P); + done; + } | + sed -n 's,.*/\(.*\),-w \1,p' + + Example output: + + .. code-block:: console + + -w 0000:05:00.1 + -w 0000:06:00.0 + -w 0000:06:00.1 + -w 0000:05:00.0 + +#. Request huge pages: + + .. code-block:: console + + echo 1024 > /sys/kernel/mm/hugepages/hugepages-2048kB/nr_hugepages + +#. Start testpmd with basic parameters: + + .. code-block:: console + + testpmd -l 8-15 -n 4 -w 05:00.0 -w 05:00.1 -w 06:00.0 -w 06:00.1 -- --rxq=2 --txq=2 -i + + Example output: + + .. code-block:: console + + [...] + EAL: PCI device 0000:05:00.0 on NUMA socket 0 + EAL: probe driver: 15b3:1013 librte_pmd_mlx5 + PMD: librte_pmd_mlx5: PCI information matches, using device "mlx5_0" (VF: false) + PMD: librte_pmd_mlx5: 1 port(s) detected + PMD: librte_pmd_mlx5: port 1 MAC address is e4:1d:2d:e7:0c:fe + EAL: PCI device 0000:05:00.1 on NUMA socket 0 + EAL: probe driver: 15b3:1013 librte_pmd_mlx5 + PMD: librte_pmd_mlx5: PCI information matches, using device "mlx5_1" (VF: false) + PMD: librte_pmd_mlx5: 1 port(s) detected + PMD: librte_pmd_mlx5: port 1 MAC address is e4:1d:2d:e7:0c:ff + EAL: PCI device 0000:06:00.0 on NUMA socket 0 + EAL: probe driver: 15b3:1013 librte_pmd_mlx5 + PMD: librte_pmd_mlx5: PCI information matches, using device "mlx5_2" (VF: false) + PMD: librte_pmd_mlx5: 1 port(s) detected + PMD: librte_pmd_mlx5: port 1 MAC address is e4:1d:2d:e7:0c:fa + EAL: PCI device 0000:06:00.1 on NUMA socket 0 + EAL: probe driver: 15b3:1013 librte_pmd_mlx5 + PMD: librte_pmd_mlx5: PCI information matches, using device "mlx5_3" (VF: false) + PMD: 
librte_pmd_mlx5: 1 port(s) detected + PMD: librte_pmd_mlx5: port 1 MAC address is e4:1d:2d:e7:0c:fb + Interactive-mode selected + Configuring Port 0 (socket 0) + PMD: librte_pmd_mlx5: 0x8cba80: TX queues number update: 0 -> 2 + PMD: librte_pmd_mlx5: 0x8cba80: RX queues number update: 0 -> 2 + Port 0: E4:1D:2D:E7:0C:FE + Configuring Port 1 (socket 0) + PMD: librte_pmd_mlx5: 0x8ccac8: TX queues number update: 0 -> 2 + PMD: librte_pmd_mlx5: 0x8ccac8: RX queues number update: 0 -> 2 + Port 1: E4:1D:2D:E7:0C:FF + Configuring Port 2 (socket 0) + PMD: librte_pmd_mlx5: 0x8cdb10: TX queues number update: 0 -> 2 + PMD: librte_pmd_mlx5: 0x8cdb10: RX queues number update: 0 -> 2 + Port 2: E4:1D:2D:E7:0C:FA + Configuring Port 3 (socket 0) + PMD: librte_pmd_mlx5: 0x8ceb58: TX queues number update: 0 -> 2 + PMD: librte_pmd_mlx5: 0x8ceb58: RX queues number update: 0 -> 2 + Port 3: E4:1D:2D:E7:0C:FB + Checking link statuses... + Port 0 Link Up - speed 40000 Mbps - full-duplex + Port 1 Link Up - speed 40000 Mbps - full-duplex + Port 2 Link Up - speed 10000 Mbps - full-duplex + Port 3 Link Up - speed 10000 Mbps - full-duplex + Done + testpmd> diff --git a/drivers/net/mlx5/mlx5_ethdev.c b/drivers/net/mlx5/mlx5_ethdev.c index 57f6237..f2acb61 100644 --- a/drivers/net/mlx5/mlx5_ethdev.c +++ b/drivers/net/mlx5/mlx5_ethdev.c @@ -97,21 +97,15 @@ struct ethtool_link_settings { #define ETHTOOL_LINK_MODE_56000baseSR4_Full_BIT 29 #define ETHTOOL_LINK_MODE_56000baseLR4_Full_BIT 30 #endif -#ifndef HAVE_ETHTOOL_LINK_MODE_25G #define ETHTOOL_LINK_MODE_25000baseCR_Full_BIT 31 #define ETHTOOL_LINK_MODE_25000baseKR_Full_BIT 32 #define ETHTOOL_LINK_MODE_25000baseSR_Full_BIT 33 -#endif -#ifndef HAVE_ETHTOOL_LINK_MODE_50G #define ETHTOOL_LINK_MODE_50000baseCR2_Full_BIT 34 #define ETHTOOL_LINK_MODE_50000baseKR2_Full_BIT 35 -#endif -#ifndef HAVE_ETHTOOL_LINK_MODE_100G #define ETHTOOL_LINK_MODE_100000baseKR4_Full_BIT 36 #define ETHTOOL_LINK_MODE_100000baseSR4_Full_BIT 37 #define 
ETHTOOL_LINK_MODE_100000baseCR4_Full_BIT 38 #define ETHTOOL_LINK_MODE_100000baseLR4_ER4_Full_BIT 39 -#endif #define ETHTOOL_LINK_MODE_MASK_MAX_KERNEL_NU32 (SCHAR_MAX) /** @@ -660,8 +654,8 @@ struct priv * * Since we need one CQ per QP, the limit is the minimum number * between the two values. */ - max = ((priv->device_attr.max_cq > priv->device_attr.max_qp) ? - priv->device_attr.max_qp : priv->device_attr.max_cq); + max = RTE_MIN(priv->device_attr.orig_attr.max_cq, + priv->device_attr.orig_attr.max_qp); /* If max >= 65535 then max = 0, max_rx_queues is uint16_t. */ if (max >= 65535) max = 65535; diff --git a/drivers/net/mlx5/mlx5_fdir.c b/drivers/net/mlx5/mlx5_fdir.c index ad256e4..acae668 100644 --- a/drivers/net/mlx5/mlx5_fdir.c +++ b/drivers/net/mlx5/mlx5_fdir.c @@ -72,7 +72,7 @@ struct mlx5_fdir_filter { uint16_t queue; /* Queue assigned to if FDIR match. */ enum rte_eth_fdir_behavior behavior; struct fdir_flow_desc desc; - struct ibv_exp_flow *flow; + struct ibv_flow *flow; }; LIST_HEAD(fdir_filter_list, mlx5_fdir_filter); @@ -238,19 +238,19 @@ struct mlx5_fdir_filter { struct mlx5_fdir_filter *mlx5_fdir_filter, struct fdir_queue *fdir_queue) { - struct ibv_exp_flow *flow; + struct ibv_flow *flow; struct fdir_flow_desc *desc = &mlx5_fdir_filter->desc; enum rte_fdir_mode fdir_mode = priv->dev->data->dev_conf.fdir_conf.mode; struct rte_eth_fdir_masks *mask = &priv->dev->data->dev_conf.fdir_conf.mask; FLOW_ATTR_SPEC_ETH(data, priv_flow_attr(priv, NULL, 0, desc->type)); - struct ibv_exp_flow_attr *attr = &data->attr; + struct ibv_flow_attr *attr = &data->attr; uintptr_t spec_offset = (uintptr_t)&data->spec; - struct ibv_exp_flow_spec_eth *spec_eth; - struct ibv_exp_flow_spec_ipv4 *spec_ipv4; - struct ibv_exp_flow_spec_ipv6 *spec_ipv6; - struct ibv_exp_flow_spec_tcp_udp *spec_tcp_udp; + struct ibv_flow_spec_eth *spec_eth; + struct ibv_flow_spec_ipv4 *spec_ipv4; + struct ibv_flow_spec_ipv6 *spec_ipv6; + struct ibv_flow_spec_tcp_udp *spec_tcp_udp; struct 
mlx5_fdir_filter *iter_fdir_filter; unsigned int i; @@ -272,10 +272,10 @@ struct mlx5_fdir_filter { priv_flow_attr(priv, attr, sizeof(data), desc->type); /* Set Ethernet spec */ - spec_eth = (struct ibv_exp_flow_spec_eth *)spec_offset; + spec_eth = (struct ibv_flow_spec_eth *)spec_offset; /* The first specification must be Ethernet. */ - assert(spec_eth->type == IBV_EXP_FLOW_SPEC_ETH); + assert(spec_eth->type == IBV_FLOW_SPEC_ETH); assert(spec_eth->size == sizeof(*spec_eth)); /* VLAN ID */ @@ -302,10 +302,10 @@ struct mlx5_fdir_filter { spec_offset += spec_eth->size; /* Set IP spec */ - spec_ipv4 = (struct ibv_exp_flow_spec_ipv4 *)spec_offset; + spec_ipv4 = (struct ibv_flow_spec_ipv4 *)spec_offset; /* The second specification must be IP. */ - assert(spec_ipv4->type == IBV_EXP_FLOW_SPEC_IPV4); + assert(spec_ipv4->type == IBV_FLOW_SPEC_IPV4); assert(spec_ipv4->size == sizeof(*spec_ipv4)); spec_ipv4->val.src_ip = @@ -329,10 +329,10 @@ struct mlx5_fdir_filter { spec_offset += spec_eth->size; /* Set IP spec */ - spec_ipv6 = (struct ibv_exp_flow_spec_ipv6 *)spec_offset; + spec_ipv6 = (struct ibv_flow_spec_ipv6 *)spec_offset; /* The second specification must be IP. */ - assert(spec_ipv6->type == IBV_EXP_FLOW_SPEC_IPV6); + assert(spec_ipv6->type == IBV_FLOW_SPEC_IPV6); assert(spec_ipv6->size == sizeof(*spec_ipv6)); for (i = 0; i != RTE_DIM(desc->src_ip); ++i) { @@ -362,11 +362,11 @@ struct mlx5_fdir_filter { } /* Set TCP/UDP flow specification. */ - spec_tcp_udp = (struct ibv_exp_flow_spec_tcp_udp *)spec_offset; + spec_tcp_udp = (struct ibv_flow_spec_tcp_udp *)spec_offset; /* The third specification must be TCP/UDP. 
*/ - assert(spec_tcp_udp->type == IBV_EXP_FLOW_SPEC_TCP || - spec_tcp_udp->type == IBV_EXP_FLOW_SPEC_UDP); + assert(spec_tcp_udp->type == IBV_FLOW_SPEC_TCP || + spec_tcp_udp->type == IBV_FLOW_SPEC_UDP); assert(spec_tcp_udp->size == sizeof(*spec_tcp_udp)); spec_tcp_udp->val.src_port = desc->src_port & mask->src_port_mask; @@ -380,7 +380,7 @@ struct mlx5_fdir_filter { create_flow: errno = 0; - flow = ibv_exp_create_flow(fdir_queue->qp, attr); + flow = ibv_create_flow(fdir_queue->qp, attr); if (flow == NULL) { /* It's not clear whether errno is always set in this case. */ ERROR("%p: flow director configuration failed, errno=%d: %s", @@ -416,16 +416,16 @@ struct mlx5_fdir_filter { assert(idx < priv->rxqs_n); if (fdir_queue == rxq_ctrl->fdir_queue && fdir_filter->flow != NULL) { - claim_zero(ibv_exp_destroy_flow(fdir_filter->flow)); + claim_zero(ibv_destroy_flow(fdir_filter->flow)); fdir_filter->flow = NULL; } } assert(fdir_queue->qp); claim_zero(ibv_destroy_qp(fdir_queue->qp)); assert(fdir_queue->ind_table); - claim_zero(ibv_exp_destroy_rwq_ind_table(fdir_queue->ind_table)); + claim_zero(ibv_destroy_rwq_ind_table(fdir_queue->ind_table)); if (fdir_queue->wq) - claim_zero(ibv_exp_destroy_wq(fdir_queue->wq)); + claim_zero(ibv_destroy_wq(fdir_queue->wq)); if (fdir_queue->cq) claim_zero(ibv_destroy_cq(fdir_queue->cq)); #ifndef NDEBUG @@ -447,7 +447,7 @@ struct mlx5_fdir_filter { * Related flow director queue on success, NULL otherwise. 
*/ static struct fdir_queue * -priv_fdir_queue_create(struct priv *priv, struct ibv_exp_wq *wq, +priv_fdir_queue_create(struct priv *priv, struct ibv_wq *wq, unsigned int socket) { struct fdir_queue *fdir_queue; @@ -461,21 +461,18 @@ struct mlx5_fdir_filter { assert(priv->pd); assert(priv->ctx); if (!wq) { - fdir_queue->cq = ibv_exp_create_cq( - priv->ctx, 1, NULL, NULL, 0, - &(struct ibv_exp_cq_init_attr){ - .comp_mask = 0, - }); + fdir_queue->cq = ibv_create_cq( + priv->ctx, 1, NULL, NULL, 0); if (!fdir_queue->cq) { ERROR("cannot create flow director CQ"); goto error; } - fdir_queue->wq = ibv_exp_create_wq( + fdir_queue->wq = ibv_create_wq( priv->ctx, - &(struct ibv_exp_wq_init_attr){ - .wq_type = IBV_EXP_WQT_RQ, - .max_recv_wr = 1, - .max_recv_sge = 1, + &(struct ibv_wq_init_attr){ + .wq_type = IBV_WQT_RQ, + .max_wr = 1, + .max_sge = 1, .pd = priv->pd, .cq = fdir_queue->cq, }); @@ -485,10 +482,9 @@ struct mlx5_fdir_filter { } wq = fdir_queue->wq; } - fdir_queue->ind_table = ibv_exp_create_rwq_ind_table( + fdir_queue->ind_table = ibv_create_rwq_ind_table( priv->ctx, - &(struct ibv_exp_rwq_ind_table_init_attr){ - .pd = priv->pd, + &(struct ibv_rwq_ind_table_init_attr){ .log_ind_tbl_size = 0, .ind_tbl = &wq, .comp_mask = 0, @@ -497,24 +493,23 @@ struct mlx5_fdir_filter { ERROR("cannot create flow director indirection table"); goto error; } - fdir_queue->qp = ibv_exp_create_qp( + fdir_queue->qp = ibv_create_qp_ex( priv->ctx, - &(struct ibv_exp_qp_init_attr){ + &(struct ibv_qp_init_attr_ex){ .qp_type = IBV_QPT_RAW_PACKET, .comp_mask = - IBV_EXP_QP_INIT_ATTR_PD | - IBV_EXP_QP_INIT_ATTR_PORT | - IBV_EXP_QP_INIT_ATTR_RX_HASH, - .pd = priv->pd, - .rx_hash_conf = &(struct ibv_exp_rx_hash_conf){ + IBV_QP_INIT_ATTR_PD | + IBV_QP_INIT_ATTR_IND_TABLE | + IBV_QP_INIT_ATTR_RX_HASH, + .rx_hash_conf = (struct ibv_rx_hash_conf){ .rx_hash_function = - IBV_EXP_RX_HASH_FUNC_TOEPLITZ, + IBV_RX_HASH_FUNC_TOEPLITZ, .rx_hash_key_len = rss_hash_default_key_len, .rx_hash_key = 
rss_hash_default_key, .rx_hash_fields_mask = 0, - .rwq_ind_tbl = fdir_queue->ind_table, }, - .port_num = priv->port, + .rwq_ind_tbl = fdir_queue->ind_table, + .pd = priv->pd, }); if (!fdir_queue->qp) { ERROR("cannot create flow director hash RX QP"); @@ -525,10 +520,10 @@ struct mlx5_fdir_filter { assert(fdir_queue); assert(!fdir_queue->qp); if (fdir_queue->ind_table) - claim_zero(ibv_exp_destroy_rwq_ind_table + claim_zero(ibv_destroy_rwq_ind_table (fdir_queue->ind_table)); if (fdir_queue->wq) - claim_zero(ibv_exp_destroy_wq(fdir_queue->wq)); + claim_zero(ibv_destroy_wq(fdir_queue->wq)); if (fdir_queue->cq) claim_zero(ibv_destroy_cq(fdir_queue->cq)); rte_free(fdir_queue); @@ -673,13 +668,13 @@ struct mlx5_fdir_filter { struct mlx5_fdir_filter *mlx5_fdir_filter; while ((mlx5_fdir_filter = LIST_FIRST(priv->fdir_filter_list))) { - struct ibv_exp_flow *flow = mlx5_fdir_filter->flow; + struct ibv_flow *flow = mlx5_fdir_filter->flow; DEBUG("%p: flushing flow director filter %p", (void *)priv, (void *)mlx5_fdir_filter); LIST_REMOVE(mlx5_fdir_filter, next); if (flow != NULL) - claim_zero(ibv_exp_destroy_flow(flow)); + claim_zero(ibv_destroy_flow(flow)); rte_free(mlx5_fdir_filter); } } @@ -712,7 +707,7 @@ struct mlx5_fdir_filter { /* Run on every flow director filter and destroy flow handle. */ LIST_FOREACH(mlx5_fdir_filter, priv->fdir_filter_list, next) { - struct ibv_exp_flow *flow; + struct ibv_flow *flow; /* Only valid elements should be in the list */ assert(mlx5_fdir_filter != NULL); @@ -720,7 +715,7 @@ struct mlx5_fdir_filter { /* Destroy flow handle */ if (flow != NULL) { - claim_zero(ibv_exp_destroy_flow(flow)); + claim_zero(ibv_destroy_flow(flow)); mlx5_fdir_filter->flow = NULL; } } @@ -887,7 +882,7 @@ struct mlx5_fdir_filter { mlx5_fdir_filter = priv_find_filter_in_list(priv, fdir_filter); if (mlx5_fdir_filter != NULL) { - struct ibv_exp_flow *flow = mlx5_fdir_filter->flow; + struct ibv_flow *flow = mlx5_fdir_filter->flow; int err = 0; /* Update queue number. 
*/ @@ -895,7 +890,7 @@ struct mlx5_fdir_filter { /* Destroy flow handle. */ if (flow != NULL) { - claim_zero(ibv_exp_destroy_flow(flow)); + claim_zero(ibv_destroy_flow(flow)); mlx5_fdir_filter->flow = NULL; } DEBUG("%p: flow director filter %p updated", @@ -933,14 +928,14 @@ struct mlx5_fdir_filter { mlx5_fdir_filter = priv_find_filter_in_list(priv, fdir_filter); if (mlx5_fdir_filter != NULL) { - struct ibv_exp_flow *flow = mlx5_fdir_filter->flow; + struct ibv_flow *flow = mlx5_fdir_filter->flow; /* Remove element from list. */ LIST_REMOVE(mlx5_fdir_filter, next); /* Destroy flow handle. */ if (flow != NULL) { - claim_zero(ibv_exp_destroy_flow(flow)); + claim_zero(ibv_destroy_flow(flow)); mlx5_fdir_filter->flow = NULL; } diff --git a/drivers/net/mlx5/mlx5_flow.c b/drivers/net/mlx5/mlx5_flow.c index 7dd3ebb..5b20fdd 100644 --- a/drivers/net/mlx5/mlx5_flow.c +++ b/drivers/net/mlx5/mlx5_flow.c @@ -89,11 +89,11 @@ struct rte_flow { TAILQ_ENTRY(rte_flow) next; /**< Pointer to the next flow structure. */ - struct ibv_exp_flow_attr *ibv_attr; /**< Pointer to Verbs attributes. */ - struct ibv_exp_rwq_ind_table *ind_table; /**< Indirection table. */ + struct ibv_flow_attr *ibv_attr; /**< Pointer to Verbs attributes. */ + struct ibv_rwq_ind_table *ind_table; /**< Indirection table. */ struct ibv_qp *qp; /**< Verbs queue pair. */ - struct ibv_exp_flow *ibv_flow; /**< Verbs flow. */ - struct ibv_exp_wq *wq; /**< Verbs work queue. */ + struct ibv_flow *ibv_flow; /**< Verbs flow. */ + struct ibv_wq *wq; /**< Verbs work queue. */ struct ibv_cq *cq; /**< Verbs completion queue. */ uint16_t rxqs_n; /**< Number of queues in this flow, 0 if drop queue. */ uint32_t mark:1; /**< Set if the flow is marked. 
*/ @@ -172,7 +172,7 @@ struct mlx5_flow_items { .default_mask = &rte_flow_item_eth_mask, .mask_sz = sizeof(struct rte_flow_item_eth), .convert = mlx5_flow_create_eth, - .dst_sz = sizeof(struct ibv_exp_flow_spec_eth), + .dst_sz = sizeof(struct ibv_flow_spec_eth), }, [RTE_FLOW_ITEM_TYPE_VLAN] = { .items = ITEMS(RTE_FLOW_ITEM_TYPE_IPV4, @@ -201,7 +201,7 @@ struct mlx5_flow_items { .default_mask = &rte_flow_item_ipv4_mask, .mask_sz = sizeof(struct rte_flow_item_ipv4), .convert = mlx5_flow_create_ipv4, - .dst_sz = sizeof(struct ibv_exp_flow_spec_ipv4_ext), + .dst_sz = sizeof(struct ibv_flow_spec_ipv4_ext), }, [RTE_FLOW_ITEM_TYPE_IPV6] = { .items = ITEMS(RTE_FLOW_ITEM_TYPE_UDP, @@ -229,7 +229,7 @@ struct mlx5_flow_items { .default_mask = &rte_flow_item_ipv6_mask, .mask_sz = sizeof(struct rte_flow_item_ipv6), .convert = mlx5_flow_create_ipv6, - .dst_sz = sizeof(struct ibv_exp_flow_spec_ipv6_ext), + .dst_sz = sizeof(struct ibv_flow_spec_ipv6), }, [RTE_FLOW_ITEM_TYPE_UDP] = { .items = ITEMS(RTE_FLOW_ITEM_TYPE_VXLAN), @@ -243,7 +243,7 @@ struct mlx5_flow_items { .default_mask = &rte_flow_item_udp_mask, .mask_sz = sizeof(struct rte_flow_item_udp), .convert = mlx5_flow_create_udp, - .dst_sz = sizeof(struct ibv_exp_flow_spec_tcp_udp), + .dst_sz = sizeof(struct ibv_flow_spec_tcp_udp), }, [RTE_FLOW_ITEM_TYPE_TCP] = { .actions = valid_actions, @@ -256,7 +256,7 @@ struct mlx5_flow_items { .default_mask = &rte_flow_item_tcp_mask, .mask_sz = sizeof(struct rte_flow_item_tcp), .convert = mlx5_flow_create_tcp, - .dst_sz = sizeof(struct ibv_exp_flow_spec_tcp_udp), + .dst_sz = sizeof(struct ibv_flow_spec_tcp_udp), }, [RTE_FLOW_ITEM_TYPE_VXLAN] = { .items = ITEMS(RTE_FLOW_ITEM_TYPE_ETH), @@ -267,13 +267,13 @@ struct mlx5_flow_items { .default_mask = &rte_flow_item_vxlan_mask, .mask_sz = sizeof(struct rte_flow_item_vxlan), .convert = mlx5_flow_create_vxlan, - .dst_sz = sizeof(struct ibv_exp_flow_spec_tunnel), + .dst_sz = sizeof(struct ibv_flow_spec_tunnel), }, }; /** Structure to pass to 
the conversion function. */ struct mlx5_flow { - struct ibv_exp_flow_attr *ibv_attr; /**< Verbs attribute. */ + struct ibv_flow_attr *ibv_attr; /**< Verbs attribute. */ unsigned int offset; /**< Offset in bytes in the ibv_attr buffer. */ uint32_t inner; /**< Set once VXLAN is encountered. */ uint64_t hash_fields; /**< Fields that participate in the hash. */ @@ -281,9 +281,9 @@ struct mlx5_flow { /** Structure for Drop queue. */ struct rte_flow_drop { - struct ibv_exp_rwq_ind_table *ind_table; /**< Indirection table. */ + struct ibv_rwq_ind_table *ind_table; /**< Indirection table. */ struct ibv_qp *qp; /**< Verbs queue pair. */ - struct ibv_exp_wq *wq; /**< Verbs work queue. */ + struct ibv_wq *wq; /**< Verbs work queue. */ struct ibv_cq *cq; /**< Verbs completion queue. */ }; @@ -572,9 +572,9 @@ struct mlx5_flow_action { } } if (action->mark && !flow->ibv_attr && !action->drop) - flow->offset += sizeof(struct ibv_exp_flow_spec_action_tag); + flow->offset += sizeof(struct ibv_flow_spec_action_tag); if (!flow->ibv_attr && action->drop) - flow->offset += sizeof(struct ibv_exp_flow_spec_action_drop); + flow->offset += sizeof(struct ibv_flow_spec_action_drop); if (!action->queue && !action->drop) { rte_flow_error_set(error, ENOTSUP, RTE_FLOW_ERROR_TYPE_HANDLE, NULL, "no valid action"); @@ -606,7 +606,7 @@ struct mlx5_flow_action { { struct priv *priv = dev->data->dev_private; int ret; - struct mlx5_flow flow = { .offset = sizeof(struct ibv_exp_flow_attr) }; + struct mlx5_flow flow = { .offset = sizeof(struct ibv_flow_attr) }; struct mlx5_flow_action action = { .queue = 0, .drop = 0, @@ -640,16 +640,16 @@ struct mlx5_flow_action { const struct rte_flow_item_eth *spec = item->spec; const struct rte_flow_item_eth *mask = item->mask; struct mlx5_flow *flow = (struct mlx5_flow *)data; - struct ibv_exp_flow_spec_eth *eth; - const unsigned int eth_size = sizeof(struct ibv_exp_flow_spec_eth); + struct ibv_flow_spec_eth *eth; + const unsigned int eth_size = sizeof(struct 
ibv_flow_spec_eth); unsigned int i; ++flow->ibv_attr->num_of_specs; flow->ibv_attr->priority = 2; flow->hash_fields = 0; eth = (void *)((uintptr_t)flow->ibv_attr + flow->offset); - *eth = (struct ibv_exp_flow_spec_eth) { - .type = flow->inner | IBV_EXP_FLOW_SPEC_ETH, + *eth = (struct ibv_flow_spec_eth) { + .type = flow->inner | IBV_FLOW_SPEC_ETH, .size = eth_size, }; if (!spec) @@ -689,8 +689,8 @@ struct mlx5_flow_action { const struct rte_flow_item_vlan *spec = item->spec; const struct rte_flow_item_vlan *mask = item->mask; struct mlx5_flow *flow = (struct mlx5_flow *)data; - struct ibv_exp_flow_spec_eth *eth; - const unsigned int eth_size = sizeof(struct ibv_exp_flow_spec_eth); + struct ibv_flow_spec_eth *eth; + const unsigned int eth_size = sizeof(struct ibv_flow_spec_eth); eth = (void *)((uintptr_t)flow->ibv_attr + flow->offset - eth_size); if (!spec) @@ -721,29 +721,29 @@ struct mlx5_flow_action { const struct rte_flow_item_ipv4 *spec = item->spec; const struct rte_flow_item_ipv4 *mask = item->mask; struct mlx5_flow *flow = (struct mlx5_flow *)data; - struct ibv_exp_flow_spec_ipv4_ext *ipv4; - unsigned int ipv4_size = sizeof(struct ibv_exp_flow_spec_ipv4_ext); + struct ibv_flow_spec_ipv4_ext *ipv4; + unsigned int ipv4_size = sizeof(struct ibv_flow_spec_ipv4_ext); ++flow->ibv_attr->num_of_specs; flow->ibv_attr->priority = 1; - flow->hash_fields = (IBV_EXP_RX_HASH_SRC_IPV4 | - IBV_EXP_RX_HASH_DST_IPV4); + flow->hash_fields = (IBV_RX_HASH_SRC_IPV4 | + IBV_RX_HASH_DST_IPV4); ipv4 = (void *)((uintptr_t)flow->ibv_attr + flow->offset); - *ipv4 = (struct ibv_exp_flow_spec_ipv4_ext) { - .type = flow->inner | IBV_EXP_FLOW_SPEC_IPV4_EXT, + *ipv4 = (struct ibv_flow_spec_ipv4_ext) { + .type = flow->inner | IBV_FLOW_SPEC_IPV4_EXT, .size = ipv4_size, }; if (!spec) return 0; if (!mask) mask = default_mask; - ipv4->val = (struct ibv_exp_flow_ipv4_ext_filter){ + ipv4->val = (struct ibv_flow_ipv4_ext_filter){ .src_ip = spec->hdr.src_addr, .dst_ip = spec->hdr.dst_addr, .proto = 
spec->hdr.next_proto_id, .tos = spec->hdr.type_of_service, }; - ipv4->mask = (struct ibv_exp_flow_ipv4_ext_filter){ + ipv4->mask = (struct ibv_flow_ipv4_ext_filter){ .src_ip = mask->hdr.src_addr, .dst_ip = mask->hdr.dst_addr, .proto = mask->hdr.next_proto_id, @@ -775,17 +775,17 @@ struct mlx5_flow_action { const struct rte_flow_item_ipv6 *spec = item->spec; const struct rte_flow_item_ipv6 *mask = item->mask; struct mlx5_flow *flow = (struct mlx5_flow *)data; - struct ibv_exp_flow_spec_ipv6_ext *ipv6; - unsigned int ipv6_size = sizeof(struct ibv_exp_flow_spec_ipv6_ext); + struct ibv_flow_spec_ipv6 *ipv6; + unsigned int ipv6_size = sizeof(struct ibv_flow_spec_ipv6); unsigned int i; ++flow->ibv_attr->num_of_specs; flow->ibv_attr->priority = 1; - flow->hash_fields = (IBV_EXP_RX_HASH_SRC_IPV6 | - IBV_EXP_RX_HASH_DST_IPV6); + flow->hash_fields = (IBV_RX_HASH_SRC_IPV6 | + IBV_RX_HASH_DST_IPV6); ipv6 = (void *)((uintptr_t)flow->ibv_attr + flow->offset); - *ipv6 = (struct ibv_exp_flow_spec_ipv6_ext) { - .type = flow->inner | IBV_EXP_FLOW_SPEC_IPV6_EXT, + *ipv6 = (struct ibv_flow_spec_ipv6) { + .type = flow->inner | IBV_FLOW_SPEC_IPV6, .size = ipv6_size, }; if (!spec) @@ -832,16 +832,16 @@ struct mlx5_flow_action { const struct rte_flow_item_udp *spec = item->spec; const struct rte_flow_item_udp *mask = item->mask; struct mlx5_flow *flow = (struct mlx5_flow *)data; - struct ibv_exp_flow_spec_tcp_udp *udp; - unsigned int udp_size = sizeof(struct ibv_exp_flow_spec_tcp_udp); + struct ibv_flow_spec_tcp_udp *udp; + unsigned int udp_size = sizeof(struct ibv_flow_spec_tcp_udp); ++flow->ibv_attr->num_of_specs; flow->ibv_attr->priority = 0; - flow->hash_fields |= (IBV_EXP_RX_HASH_SRC_PORT_UDP | - IBV_EXP_RX_HASH_DST_PORT_UDP); + flow->hash_fields |= (IBV_RX_HASH_SRC_PORT_UDP | + IBV_RX_HASH_DST_PORT_UDP); udp = (void *)((uintptr_t)flow->ibv_attr + flow->offset); - *udp = (struct ibv_exp_flow_spec_tcp_udp) { - .type = flow->inner | IBV_EXP_FLOW_SPEC_UDP, + *udp = (struct 
ibv_flow_spec_tcp_udp) { + .type = flow->inner | IBV_FLOW_SPEC_UDP, .size = udp_size, }; if (!spec) @@ -876,16 +876,16 @@ struct mlx5_flow_action { const struct rte_flow_item_tcp *spec = item->spec; const struct rte_flow_item_tcp *mask = item->mask; struct mlx5_flow *flow = (struct mlx5_flow *)data; - struct ibv_exp_flow_spec_tcp_udp *tcp; - unsigned int tcp_size = sizeof(struct ibv_exp_flow_spec_tcp_udp); + struct ibv_flow_spec_tcp_udp *tcp; + unsigned int tcp_size = sizeof(struct ibv_flow_spec_tcp_udp); ++flow->ibv_attr->num_of_specs; flow->ibv_attr->priority = 0; - flow->hash_fields |= (IBV_EXP_RX_HASH_SRC_PORT_TCP | - IBV_EXP_RX_HASH_DST_PORT_TCP); + flow->hash_fields |= (IBV_RX_HASH_SRC_PORT_TCP | + IBV_RX_HASH_DST_PORT_TCP); tcp = (void *)((uintptr_t)flow->ibv_attr + flow->offset); - *tcp = (struct ibv_exp_flow_spec_tcp_udp) { - .type = flow->inner | IBV_EXP_FLOW_SPEC_TCP, + *tcp = (struct ibv_flow_spec_tcp_udp) { + .type = flow->inner | IBV_FLOW_SPEC_TCP, .size = tcp_size, }; if (!spec) @@ -920,8 +920,8 @@ struct mlx5_flow_action { const struct rte_flow_item_vxlan *spec = item->spec; const struct rte_flow_item_vxlan *mask = item->mask; struct mlx5_flow *flow = (struct mlx5_flow *)data; - struct ibv_exp_flow_spec_tunnel *vxlan; - unsigned int size = sizeof(struct ibv_exp_flow_spec_tunnel); + struct ibv_flow_spec_tunnel *vxlan; + unsigned int size = sizeof(struct ibv_flow_spec_tunnel); union vni { uint32_t vlan_id; uint8_t vni[4]; @@ -931,11 +931,11 @@ struct mlx5_flow_action { flow->ibv_attr->priority = 0; id.vni[0] = 0; vxlan = (void *)((uintptr_t)flow->ibv_attr + flow->offset); - *vxlan = (struct ibv_exp_flow_spec_tunnel) { - .type = flow->inner | IBV_EXP_FLOW_SPEC_VXLAN_TUNNEL, + *vxlan = (struct ibv_flow_spec_tunnel) { + .type = flow->inner | IBV_FLOW_SPEC_VXLAN_TUNNEL, .size = size, }; - flow->inner = IBV_EXP_FLOW_SPEC_INNER; + flow->inner = IBV_FLOW_SPEC_INNER; if (!spec) return 0; if (!mask) @@ -960,12 +960,12 @@ struct mlx5_flow_action { static int 
mlx5_flow_create_flag_mark(struct mlx5_flow *flow, uint32_t mark_id) { - struct ibv_exp_flow_spec_action_tag *tag; - unsigned int size = sizeof(struct ibv_exp_flow_spec_action_tag); + struct ibv_flow_spec_action_tag *tag; + unsigned int size = sizeof(struct ibv_flow_spec_action_tag); tag = (void *)((uintptr_t)flow->ibv_attr + flow->offset); - *tag = (struct ibv_exp_flow_spec_action_tag){ - .type = IBV_EXP_FLOW_SPEC_ACTION_TAG, + *tag = (struct ibv_flow_spec_action_tag){ + .type = IBV_FLOW_SPEC_ACTION_TAG, .size = size, .tag_id = mlx5_flow_mark_set(mark_id), }; @@ -992,8 +992,8 @@ struct mlx5_flow_action { struct rte_flow_error *error) { struct rte_flow *rte_flow; - struct ibv_exp_flow_spec_action_drop *drop; - unsigned int size = sizeof(struct ibv_exp_flow_spec_action_drop); + struct ibv_flow_spec_action_drop *drop; + unsigned int size = sizeof(struct ibv_flow_spec_action_drop); assert(priv->pd); assert(priv->ctx); @@ -1005,17 +1005,17 @@ struct mlx5_flow_action { } rte_flow->drop = 1; drop = (void *)((uintptr_t)flow->ibv_attr + flow->offset); - *drop = (struct ibv_exp_flow_spec_action_drop){ - .type = IBV_EXP_FLOW_SPEC_ACTION_DROP, + *drop = (struct ibv_flow_spec_action_drop){ + .type = IBV_FLOW_SPEC_ACTION_DROP, .size = size, }; ++flow->ibv_attr->num_of_specs; - flow->offset += sizeof(struct ibv_exp_flow_spec_action_drop); + flow->offset += sizeof(struct ibv_flow_spec_action_drop); rte_flow->ibv_attr = flow->ibv_attr; if (!priv->started) return rte_flow; rte_flow->qp = priv->flow_drop_queue->qp; - rte_flow->ibv_flow = ibv_exp_create_flow(rte_flow->qp, + rte_flow->ibv_flow = ibv_create_flow(rte_flow->qp, rte_flow->ibv_attr); if (!rte_flow->ibv_flow) { rte_flow_error_set(error, ENOMEM, RTE_FLOW_ERROR_TYPE_HANDLE, @@ -1054,7 +1054,7 @@ struct mlx5_flow_action { unsigned int i; unsigned int j; const unsigned int wqs_n = 1 << log2above(action->queues_n); - struct ibv_exp_wq *wqs[wqs_n]; + struct ibv_wq *wqs[wqs_n]; assert(priv->pd); assert(priv->ctx); @@ -1085,10 
+1085,9 @@ struct mlx5_flow_action { rte_flow->mark = action->mark; rte_flow->ibv_attr = flow->ibv_attr; rte_flow->hash_fields = flow->hash_fields; - rte_flow->ind_table = ibv_exp_create_rwq_ind_table( + rte_flow->ind_table = ibv_create_rwq_ind_table( priv->ctx, - &(struct ibv_exp_rwq_ind_table_init_attr){ - .pd = priv->pd, + &(struct ibv_rwq_ind_table_init_attr){ .log_ind_tbl_size = log2above(action->queues_n), .ind_tbl = wqs, .comp_mask = 0, @@ -1098,24 +1097,23 @@ struct mlx5_flow_action { NULL, "cannot allocate indirection table"); goto error; } - rte_flow->qp = ibv_exp_create_qp( + rte_flow->qp = ibv_create_qp_ex( priv->ctx, - &(struct ibv_exp_qp_init_attr){ + &(struct ibv_qp_init_attr_ex){ .qp_type = IBV_QPT_RAW_PACKET, .comp_mask = - IBV_EXP_QP_INIT_ATTR_PD | - IBV_EXP_QP_INIT_ATTR_PORT | - IBV_EXP_QP_INIT_ATTR_RX_HASH, - .pd = priv->pd, - .rx_hash_conf = &(struct ibv_exp_rx_hash_conf){ + IBV_QP_INIT_ATTR_PD | + IBV_QP_INIT_ATTR_IND_TABLE | + IBV_QP_INIT_ATTR_RX_HASH, + .rx_hash_conf = (struct ibv_rx_hash_conf){ .rx_hash_function = - IBV_EXP_RX_HASH_FUNC_TOEPLITZ, + IBV_RX_HASH_FUNC_TOEPLITZ, .rx_hash_key_len = rss_hash_default_key_len, .rx_hash_key = rss_hash_default_key, .rx_hash_fields_mask = rte_flow->hash_fields, - .rwq_ind_tbl = rte_flow->ind_table, }, - .port_num = priv->port, + .rwq_ind_tbl = rte_flow->ind_table, + .pd = priv->pd }); if (!rte_flow->qp) { rte_flow_error_set(error, ENOMEM, RTE_FLOW_ERROR_TYPE_HANDLE, @@ -1124,7 +1122,7 @@ struct mlx5_flow_action { } if (!priv->started) return rte_flow; - rte_flow->ibv_flow = ibv_exp_create_flow(rte_flow->qp, + rte_flow->ibv_flow = ibv_create_flow(rte_flow->qp, rte_flow->ibv_attr); if (!rte_flow->ibv_flow) { rte_flow_error_set(error, ENOMEM, RTE_FLOW_ERROR_TYPE_HANDLE, @@ -1137,7 +1135,7 @@ struct mlx5_flow_action { if (rte_flow->qp) ibv_destroy_qp(rte_flow->qp); if (rte_flow->ind_table) - ibv_exp_destroy_rwq_ind_table(rte_flow->ind_table); + ibv_destroy_rwq_ind_table(rte_flow->ind_table); 
rte_free(rte_flow); return NULL; } @@ -1167,7 +1165,7 @@ struct mlx5_flow_action { struct rte_flow_error *error) { struct rte_flow *rte_flow; - struct mlx5_flow flow = { .offset = sizeof(struct ibv_exp_flow_attr), }; + struct mlx5_flow flow = { .offset = sizeof(struct ibv_flow_attr), }; struct mlx5_flow_action action = { .queue = 0, .drop = 0, @@ -1182,20 +1180,19 @@ struct mlx5_flow_action { if (err) goto exit; flow.ibv_attr = rte_malloc(__func__, flow.offset, 0); - flow.offset = sizeof(struct ibv_exp_flow_attr); + flow.offset = sizeof(struct ibv_flow_attr); if (!flow.ibv_attr) { rte_flow_error_set(error, ENOMEM, RTE_FLOW_ERROR_TYPE_HANDLE, NULL, "cannot allocate ibv_attr memory"); goto exit; } - *flow.ibv_attr = (struct ibv_exp_flow_attr){ - .type = IBV_EXP_FLOW_ATTR_NORMAL, - .size = sizeof(struct ibv_exp_flow_attr), + *flow.ibv_attr = (struct ibv_flow_attr){ + .type = IBV_FLOW_ATTR_NORMAL, + .size = sizeof(struct ibv_flow_attr), .priority = attr->priority, .num_of_specs = 0, .port = 0, .flags = 0, - .reserved = 0, }; flow.inner = 0; flow.hash_fields = 0; @@ -1203,7 +1200,7 @@ struct mlx5_flow_action { error, &flow, &action)); if (action.mark && !action.drop) { mlx5_flow_create_flag_mark(&flow, action.mark_id); - flow.offset += sizeof(struct ibv_exp_flow_spec_action_tag); + flow.offset += sizeof(struct ibv_flow_spec_action_tag); } if (action.drop) rte_flow = @@ -1259,13 +1256,13 @@ struct rte_flow * { TAILQ_REMOVE(&priv->flows, flow, next); if (flow->ibv_flow) - claim_zero(ibv_exp_destroy_flow(flow->ibv_flow)); + claim_zero(ibv_destroy_flow(flow->ibv_flow)); if (flow->drop) goto free; if (flow->qp) claim_zero(ibv_destroy_qp(flow->qp)); if (flow->ind_table) - claim_zero(ibv_exp_destroy_rwq_ind_table(flow->ind_table)); + claim_zero(ibv_destroy_rwq_ind_table(flow->ind_table)); if (flow->mark) { struct rte_flow *tmp; struct rxq *rxq; @@ -1381,19 +1378,16 @@ struct rte_flow * WARN("cannot allocate memory for drop queue"); goto error; } - fdq->cq = 
ibv_exp_create_cq(priv->ctx, 1, NULL, NULL, 0, - &(struct ibv_exp_cq_init_attr){ - .comp_mask = 0, - }); + fdq->cq = ibv_create_cq(priv->ctx, 1, NULL, NULL, 0); if (!fdq->cq) { WARN("cannot allocate CQ for drop queue"); goto error; } - fdq->wq = ibv_exp_create_wq(priv->ctx, - &(struct ibv_exp_wq_init_attr){ - .wq_type = IBV_EXP_WQT_RQ, - .max_recv_wr = 1, - .max_recv_sge = 1, + fdq->wq = ibv_create_wq(priv->ctx, + &(struct ibv_wq_init_attr){ + .wq_type = IBV_WQT_RQ, + .max_wr = 1, + .max_sge = 1, .pd = priv->pd, .cq = fdq->cq, }); @@ -1401,9 +1395,8 @@ struct rte_flow * WARN("cannot allocate WQ for drop queue"); goto error; } - fdq->ind_table = ibv_exp_create_rwq_ind_table(priv->ctx, - &(struct ibv_exp_rwq_ind_table_init_attr){ - .pd = priv->pd, + fdq->ind_table = ibv_create_rwq_ind_table(priv->ctx, + &(struct ibv_rwq_ind_table_init_attr){ .log_ind_tbl_size = 0, .ind_tbl = &fdq->wq, .comp_mask = 0, @@ -1412,24 +1405,23 @@ struct rte_flow * WARN("cannot allocate indirection table for drop queue"); goto error; } - fdq->qp = ibv_exp_create_qp(priv->ctx, - &(struct ibv_exp_qp_init_attr){ + fdq->qp = ibv_create_qp_ex(priv->ctx, + &(struct ibv_qp_init_attr_ex){ .qp_type = IBV_QPT_RAW_PACKET, .comp_mask = - IBV_EXP_QP_INIT_ATTR_PD | - IBV_EXP_QP_INIT_ATTR_PORT | - IBV_EXP_QP_INIT_ATTR_RX_HASH, - .pd = priv->pd, - .rx_hash_conf = &(struct ibv_exp_rx_hash_conf){ + IBV_QP_INIT_ATTR_PD | + IBV_QP_INIT_ATTR_IND_TABLE | + IBV_QP_INIT_ATTR_RX_HASH, + .rx_hash_conf = (struct ibv_rx_hash_conf){ .rx_hash_function = - IBV_EXP_RX_HASH_FUNC_TOEPLITZ, + IBV_RX_HASH_FUNC_TOEPLITZ, .rx_hash_key_len = rss_hash_default_key_len, .rx_hash_key = rss_hash_default_key, .rx_hash_fields_mask = 0, - .rwq_ind_tbl = fdq->ind_table, }, - .port_num = priv->port, - }); + .rwq_ind_tbl = fdq->ind_table, + .pd = priv->pd + }); if (!fdq->qp) { WARN("cannot allocate QP for drop queue"); goto error; @@ -1440,9 +1432,9 @@ struct rte_flow * if (fdq->qp) claim_zero(ibv_destroy_qp(fdq->qp)); if (fdq->ind_table) 
- claim_zero(ibv_exp_destroy_rwq_ind_table(fdq->ind_table)); + claim_zero(ibv_destroy_rwq_ind_table(fdq->ind_table)); if (fdq->wq) - claim_zero(ibv_exp_destroy_wq(fdq->wq)); + claim_zero(ibv_destroy_wq(fdq->wq)); if (fdq->cq) claim_zero(ibv_destroy_cq(fdq->cq)); if (fdq) @@ -1467,9 +1459,9 @@ struct rte_flow * if (fdq->qp) claim_zero(ibv_destroy_qp(fdq->qp)); if (fdq->ind_table) - claim_zero(ibv_exp_destroy_rwq_ind_table(fdq->ind_table)); + claim_zero(ibv_destroy_rwq_ind_table(fdq->ind_table)); if (fdq->wq) - claim_zero(ibv_exp_destroy_wq(fdq->wq)); + claim_zero(ibv_destroy_wq(fdq->wq)); if (fdq->cq) claim_zero(ibv_destroy_cq(fdq->cq)); rte_free(fdq); @@ -1490,7 +1482,7 @@ struct rte_flow * struct rte_flow *flow; TAILQ_FOREACH_REVERSE(flow, &priv->flows, mlx5_flows, next) { - claim_zero(ibv_exp_destroy_flow(flow->ibv_flow)); + claim_zero(ibv_destroy_flow(flow->ibv_flow)); flow->ibv_flow = NULL; if (flow->mark) { unsigned int n; @@ -1528,7 +1520,7 @@ struct rte_flow * qp = priv->flow_drop_queue->qp; else qp = flow->qp; - flow->ibv_flow = ibv_exp_create_flow(qp, flow->ibv_attr); + flow->ibv_flow = ibv_create_flow(qp, flow->ibv_attr); if (!flow->ibv_flow) { DEBUG("Flow %p cannot be applied", (void *)flow); rte_errno = EINVAL; diff --git a/drivers/net/mlx5/mlx5_mac.c b/drivers/net/mlx5/mlx5_mac.c index 45d23e4..63b98bc 100644 --- a/drivers/net/mlx5/mlx5_mac.c +++ b/drivers/net/mlx5/mlx5_mac.c @@ -112,7 +112,7 @@ (*mac)[0], (*mac)[1], (*mac)[2], (*mac)[3], (*mac)[4], (*mac)[5], mac_index, vlan_index); - claim_zero(ibv_exp_destroy_flow(hash_rxq->mac_flow + claim_zero(ibv_destroy_flow(hash_rxq->mac_flow [mac_index][vlan_index])); hash_rxq->mac_flow[mac_index][vlan_index] = NULL; } @@ -231,14 +231,14 @@ hash_rxq_add_mac_flow(struct hash_rxq *hash_rxq, unsigned int mac_index, unsigned int vlan_index) { - struct ibv_exp_flow *flow; + struct ibv_flow *flow; struct priv *priv = hash_rxq->priv; const uint8_t (*mac)[ETHER_ADDR_LEN] = (const uint8_t (*)[ETHER_ADDR_LEN]) 
priv->mac[mac_index].addr_bytes; FLOW_ATTR_SPEC_ETH(data, priv_flow_attr(priv, NULL, 0, hash_rxq->type)); - struct ibv_exp_flow_attr *attr = &data->attr; - struct ibv_exp_flow_spec_eth *spec = &data->spec; + struct ibv_flow_attr *attr = &data->attr; + struct ibv_flow_spec_eth *spec = &data->spec; unsigned int vlan_enabled = !!priv->vlan_filter_n; unsigned int vlan_id = priv->vlan_filter[vlan_index]; @@ -253,10 +253,10 @@ assert(((uint8_t *)attr + sizeof(*attr)) == (uint8_t *)spec); priv_flow_attr(priv, attr, sizeof(data), hash_rxq->type); /* The first specification must be Ethernet. */ - assert(spec->type == IBV_EXP_FLOW_SPEC_ETH); + assert(spec->type == IBV_FLOW_SPEC_ETH); assert(spec->size == sizeof(*spec)); - *spec = (struct ibv_exp_flow_spec_eth){ - .type = IBV_EXP_FLOW_SPEC_ETH, + *spec = (struct ibv_flow_spec_eth){ + .type = IBV_FLOW_SPEC_ETH, .size = sizeof(*spec), .val = { .dst_mac = { @@ -280,7 +280,7 @@ vlan_id); /* Create related flow. */ errno = 0; - flow = ibv_exp_create_flow(hash_rxq->qp, attr); + flow = ibv_create_flow(hash_rxq->qp, attr); if (flow == NULL) { /* It's not clear whether errno is always set in this case. */ ERROR("%p: flow configuration failed, errno=%d: %s", diff --git a/drivers/net/mlx5/mlx5_prm.h b/drivers/net/mlx5/mlx5_prm.h index 608072f..c1c4935 100644 --- a/drivers/net/mlx5/mlx5_prm.h +++ b/drivers/net/mlx5/mlx5_prm.h @@ -35,13 +35,14 @@ #define RTE_PMD_MLX5_PRM_H_ #include +#include /* Verbs header. */ /* ISO C doesn't support unnamed structs/unions, disabling -pedantic. */ #ifdef PEDANTIC #pragma GCC diagnostic ignored "-Wpedantic" #endif -#include +#include #ifdef PEDANTIC #pragma GCC diagnostic error "-Wpedantic" #endif @@ -89,10 +90,6 @@ /* Default max packet length to be inlined. */ #define MLX5_EMPW_MAX_INLINE_LEN (4U * MLX5_WQE_SIZE) -#ifndef HAVE_VERBS_MLX5_OPCODE_TSO -#define MLX5_OPCODE_TSO MLX5_OPCODE_LSO_MPW /* Compat with OFED 3.3. 
*/ -#endif - #define MLX5_OPC_MOD_ENHANCED_MPSW 0 #define MLX5_OPCODE_ENHANCED_MPSW 0x29 @@ -244,6 +241,40 @@ struct mlx5_cqe { uint8_t op_own; }; +/* Adding direct verbs to data-path. */ + +/* CQ doorbell index mask. */ +#define MLX5_CI_MASK 0xffffff + +/* CQ doorbell offset. */ +#define MLX5_CQ_ARM_DB 1 + +/* CQ doorbell offset*/ +#define MLX5_CQ_DOORBELL 0x20 + +/* CQE format value. */ +#define MLX5_COMPRESSED 0x3 + +/* CQE format mask. */ +#define MLX5E_CQE_FORMAT_MASK 0xc + +/* MPW opcode. */ +#define MLX5_OPC_MOD_MPW 0x01 + +/* Compressed Rx CQE structure. */ +struct mlx5_mini_cqe8 { + union { + uint32_t rx_hash_result; + uint32_t checksum; + struct { + uint16_t wqe_counter; + uint8_t s_wqe_opcode; + uint8_t reserved; + } s_wqe_info; + }; + uint32_t byte_cnt; +}; + /** * Convert a user mark to flow mark. * diff --git a/drivers/net/mlx5/mlx5_rxmode.c b/drivers/net/mlx5/mlx5_rxmode.c index 4a51e47..56d3e3b 100644 --- a/drivers/net/mlx5/mlx5_rxmode.c +++ b/drivers/net/mlx5/mlx5_rxmode.c @@ -122,10 +122,10 @@ unsigned int vlan_index) { struct priv *priv = hash_rxq->priv; - struct ibv_exp_flow *flow; + struct ibv_flow *flow; FLOW_ATTR_SPEC_ETH(data, priv_flow_attr(priv, NULL, 0, hash_rxq->type)); - struct ibv_exp_flow_attr *attr = &data->attr; - struct ibv_exp_flow_spec_eth *spec = &data->spec; + struct ibv_flow_attr *attr = &data->attr; + struct ibv_flow_spec_eth *spec = &data->spec; const uint8_t *mac; const uint8_t *mask; unsigned int vlan_enabled = (priv->vlan_filter_n && @@ -146,13 +146,13 @@ assert(((uint8_t *)attr + sizeof(*attr)) == (uint8_t *)spec); priv_flow_attr(priv, attr, sizeof(data), hash_rxq->type); /* The first specification must be Ethernet. 
*/ - assert(spec->type == IBV_EXP_FLOW_SPEC_ETH); + assert(spec->type == IBV_FLOW_SPEC_ETH); assert(spec->size == sizeof(*spec)); mac = special_flow_init[flow_type].dst_mac_val; mask = special_flow_init[flow_type].dst_mac_mask; - *spec = (struct ibv_exp_flow_spec_eth){ - .type = IBV_EXP_FLOW_SPEC_ETH, + *spec = (struct ibv_flow_spec_eth){ + .type = IBV_FLOW_SPEC_ETH, .size = sizeof(*spec), .val = { .dst_mac = { @@ -171,7 +171,7 @@ }; errno = 0; - flow = ibv_exp_create_flow(hash_rxq->qp, attr); + flow = ibv_create_flow(hash_rxq->qp, attr); if (flow == NULL) { /* It's not clear whether errno is always set in this case. */ ERROR("%p: flow configuration failed, errno=%d: %s", @@ -203,12 +203,12 @@ enum hash_rxq_flow_type flow_type, unsigned int vlan_index) { - struct ibv_exp_flow *flow = + struct ibv_flow *flow = hash_rxq->special_flow[flow_type][vlan_index]; if (flow == NULL) return; - claim_zero(ibv_exp_destroy_flow(flow)); + claim_zero(ibv_destroy_flow(flow)); hash_rxq->special_flow[flow_type][vlan_index] = NULL; DEBUG("%p: special flow %s (index %d) VLAN %u (index %u) disabled", (void *)hash_rxq, hash_rxq_flow_type_str(flow_type), flow_type, diff --git a/drivers/net/mlx5/mlx5_rxq.c b/drivers/net/mlx5/mlx5_rxq.c index 35c5cb4..dc54714 100644 --- a/drivers/net/mlx5/mlx5_rxq.c +++ b/drivers/net/mlx5/mlx5_rxq.c @@ -37,15 +37,13 @@ #include #include #include - /* Verbs header. */ /* ISO C doesn't support unnamed structs/unions, disabling -pedantic. */ #ifdef PEDANTIC #pragma GCC diagnostic ignored "-Wpedantic" #endif #include -#include -#include +#include #ifdef PEDANTIC #pragma GCC diagnostic error "-Wpedantic" #endif @@ -55,7 +53,9 @@ #include #include #include +#include #include +#include #include "mlx5.h" #include "mlx5_rxtx.h" @@ -66,77 +66,77 @@ /* Initialization data for hash RX queues. 
*/ const struct hash_rxq_init hash_rxq_init[] = { [HASH_RXQ_TCPV4] = { - .hash_fields = (IBV_EXP_RX_HASH_SRC_IPV4 | - IBV_EXP_RX_HASH_DST_IPV4 | - IBV_EXP_RX_HASH_SRC_PORT_TCP | - IBV_EXP_RX_HASH_DST_PORT_TCP), + .hash_fields = (IBV_RX_HASH_SRC_IPV4 | + IBV_RX_HASH_DST_IPV4 | + IBV_RX_HASH_SRC_PORT_TCP | + IBV_RX_HASH_DST_PORT_TCP), .dpdk_rss_hf = ETH_RSS_NONFRAG_IPV4_TCP, .flow_priority = 0, .flow_spec.tcp_udp = { - .type = IBV_EXP_FLOW_SPEC_TCP, + .type = IBV_FLOW_SPEC_TCP, .size = sizeof(hash_rxq_init[0].flow_spec.tcp_udp), }, .underlayer = &hash_rxq_init[HASH_RXQ_IPV4], }, [HASH_RXQ_UDPV4] = { - .hash_fields = (IBV_EXP_RX_HASH_SRC_IPV4 | - IBV_EXP_RX_HASH_DST_IPV4 | - IBV_EXP_RX_HASH_SRC_PORT_UDP | - IBV_EXP_RX_HASH_DST_PORT_UDP), + .hash_fields = (IBV_RX_HASH_SRC_IPV4 | + IBV_RX_HASH_DST_IPV4 | + IBV_RX_HASH_SRC_PORT_UDP | + IBV_RX_HASH_DST_PORT_UDP), .dpdk_rss_hf = ETH_RSS_NONFRAG_IPV4_UDP, .flow_priority = 0, .flow_spec.tcp_udp = { - .type = IBV_EXP_FLOW_SPEC_UDP, + .type = IBV_FLOW_SPEC_UDP, .size = sizeof(hash_rxq_init[0].flow_spec.tcp_udp), }, .underlayer = &hash_rxq_init[HASH_RXQ_IPV4], }, [HASH_RXQ_IPV4] = { - .hash_fields = (IBV_EXP_RX_HASH_SRC_IPV4 | - IBV_EXP_RX_HASH_DST_IPV4), + .hash_fields = (IBV_RX_HASH_SRC_IPV4 | + IBV_RX_HASH_DST_IPV4), .dpdk_rss_hf = (ETH_RSS_IPV4 | ETH_RSS_FRAG_IPV4), .flow_priority = 1, .flow_spec.ipv4 = { - .type = IBV_EXP_FLOW_SPEC_IPV4, + .type = IBV_FLOW_SPEC_IPV4, .size = sizeof(hash_rxq_init[0].flow_spec.ipv4), }, .underlayer = &hash_rxq_init[HASH_RXQ_ETH], }, [HASH_RXQ_TCPV6] = { - .hash_fields = (IBV_EXP_RX_HASH_SRC_IPV6 | - IBV_EXP_RX_HASH_DST_IPV6 | - IBV_EXP_RX_HASH_SRC_PORT_TCP | - IBV_EXP_RX_HASH_DST_PORT_TCP), + .hash_fields = (IBV_RX_HASH_SRC_IPV6 | + IBV_RX_HASH_DST_IPV6 | + IBV_RX_HASH_SRC_PORT_TCP | + IBV_RX_HASH_DST_PORT_TCP), .dpdk_rss_hf = ETH_RSS_NONFRAG_IPV6_TCP, .flow_priority = 0, .flow_spec.tcp_udp = { - .type = IBV_EXP_FLOW_SPEC_TCP, + .type = IBV_FLOW_SPEC_TCP, .size = 
sizeof(hash_rxq_init[0].flow_spec.tcp_udp), }, .underlayer = &hash_rxq_init[HASH_RXQ_IPV6], }, [HASH_RXQ_UDPV6] = { - .hash_fields = (IBV_EXP_RX_HASH_SRC_IPV6 | - IBV_EXP_RX_HASH_DST_IPV6 | - IBV_EXP_RX_HASH_SRC_PORT_UDP | - IBV_EXP_RX_HASH_DST_PORT_UDP), + .hash_fields = (IBV_RX_HASH_SRC_IPV6 | + IBV_RX_HASH_DST_IPV6 | + IBV_RX_HASH_SRC_PORT_UDP | + IBV_RX_HASH_DST_PORT_UDP), .dpdk_rss_hf = ETH_RSS_NONFRAG_IPV6_UDP, .flow_priority = 0, .flow_spec.tcp_udp = { - .type = IBV_EXP_FLOW_SPEC_UDP, + .type = IBV_FLOW_SPEC_UDP, .size = sizeof(hash_rxq_init[0].flow_spec.tcp_udp), }, .underlayer = &hash_rxq_init[HASH_RXQ_IPV6], }, [HASH_RXQ_IPV6] = { - .hash_fields = (IBV_EXP_RX_HASH_SRC_IPV6 | - IBV_EXP_RX_HASH_DST_IPV6), + .hash_fields = (IBV_RX_HASH_SRC_IPV6 | + IBV_RX_HASH_DST_IPV6), .dpdk_rss_hf = (ETH_RSS_IPV6 | ETH_RSS_FRAG_IPV6), .flow_priority = 1, .flow_spec.ipv6 = { - .type = IBV_EXP_FLOW_SPEC_IPV6, + .type = IBV_FLOW_SPEC_IPV6, .size = sizeof(hash_rxq_init[0].flow_spec.ipv6), }, .underlayer = &hash_rxq_init[HASH_RXQ_ETH], @@ -146,7 +146,7 @@ .dpdk_rss_hf = 0, .flow_priority = 2, .flow_spec.eth = { - .type = IBV_EXP_FLOW_SPEC_ETH, + .type = IBV_FLOW_SPEC_ETH, .size = sizeof(hash_rxq_init[0].flow_spec.eth), }, .underlayer = NULL, @@ -215,7 +215,7 @@ * Total size of the flow attribute buffer. No errors are defined. */ size_t -priv_flow_attr(struct priv *priv, struct ibv_exp_flow_attr *flow_attr, +priv_flow_attr(struct priv *priv, struct ibv_flow_attr *flow_attr, size_t flow_attr_size, enum hash_rxq_type type) { size_t offset = sizeof(*flow_attr); @@ -231,8 +231,8 @@ return offset; flow_attr_size = offset; init = &hash_rxq_init[type]; - *flow_attr = (struct ibv_exp_flow_attr){ - .type = IBV_EXP_FLOW_ATTR_NORMAL, + *flow_attr = (struct ibv_flow_attr){ + .type = IBV_FLOW_ATTR_NORMAL, /* Priorities < 3 are reserved for flow director. 
*/ .priority = init->flow_priority + 3, .num_of_specs = 0, @@ -338,13 +338,13 @@ int priv_create_hash_rxqs(struct priv *priv) { - struct ibv_exp_wq *wqs[priv->reta_idx_n]; + struct ibv_wq *wqs[priv->reta_idx_n]; struct ind_table_init ind_table_init[IND_TABLE_INIT_N]; unsigned int ind_tables_n = priv_make_ind_table_init(priv, &ind_table_init); unsigned int hash_rxqs_n = 0; struct hash_rxq (*hash_rxqs)[] = NULL; - struct ibv_exp_rwq_ind_table *(*ind_tables)[] = NULL; + struct ibv_rwq_ind_table *(*ind_tables)[] = NULL; unsigned int i; unsigned int j; unsigned int k; @@ -395,20 +395,19 @@ goto error; } for (i = 0; (i != ind_tables_n); ++i) { - struct ibv_exp_rwq_ind_table_init_attr ind_init_attr = { - .pd = priv->pd, + struct ibv_rwq_ind_table_init_attr ind_init_attr = { .log_ind_tbl_size = 0, /* Set below. */ .ind_tbl = wqs, .comp_mask = 0, }; unsigned int ind_tbl_size = ind_table_init[i].max_size; - struct ibv_exp_rwq_ind_table *ind_table; + struct ibv_rwq_ind_table *ind_table; if (priv->reta_idx_n < ind_tbl_size) ind_tbl_size = priv->reta_idx_n; ind_init_attr.log_ind_tbl_size = log2above(ind_tbl_size); errno = 0; - ind_table = ibv_exp_create_rwq_ind_table(priv->ctx, + ind_table = ibv_create_rwq_ind_table(priv->ctx, &ind_init_attr); if (ind_table != NULL) { (*ind_tables)[i] = ind_table; @@ -437,8 +436,8 @@ hash_rxq_type_from_pos(&ind_table_init[j], k); struct rte_eth_rss_conf *priv_rss_conf = (*priv->rss_conf)[type]; - struct ibv_exp_rx_hash_conf hash_conf = { - .rx_hash_function = IBV_EXP_RX_HASH_FUNC_TOEPLITZ, + struct ibv_rx_hash_conf hash_conf = { + .rx_hash_function = IBV_RX_HASH_FUNC_TOEPLITZ, .rx_hash_key_len = (priv_rss_conf ? priv_rss_conf->rss_key_len : rss_hash_default_key_len), @@ -446,23 +445,22 @@ priv_rss_conf->rss_key : rss_hash_default_key), .rx_hash_fields_mask = hash_rxq_init[type].hash_fields, - .rwq_ind_tbl = (*ind_tables)[j], }; - struct ibv_exp_qp_init_attr qp_init_attr = { - .max_inl_recv = 0, /* Currently not supported. 
*/ + struct ibv_qp_init_attr_ex qp_init_attr = { .qp_type = IBV_QPT_RAW_PACKET, - .comp_mask = (IBV_EXP_QP_INIT_ATTR_PD | - IBV_EXP_QP_INIT_ATTR_RX_HASH), + .comp_mask = (IBV_QP_INIT_ATTR_PD | + IBV_QP_INIT_ATTR_IND_TABLE | + IBV_QP_INIT_ATTR_RX_HASH), + .rx_hash_conf = hash_conf, + .rwq_ind_tbl = (*ind_tables)[j], .pd = priv->pd, - .rx_hash_conf = &hash_conf, - .port_num = priv->port, }; DEBUG("using indirection table %u for hash RX queue %u type %d", j, i, type); *hash_rxq = (struct hash_rxq){ .priv = priv, - .qp = ibv_exp_create_qp(priv->ctx, &qp_init_attr), + .qp = ibv_create_qp_ex(priv->ctx, &qp_init_attr), .type = type, }; if (hash_rxq->qp == NULL) { @@ -497,12 +495,12 @@ } if (ind_tables != NULL) { for (j = 0; (j != ind_tables_n); ++j) { - struct ibv_exp_rwq_ind_table *ind_table = + struct ibv_rwq_ind_table *ind_table = (*ind_tables)[j]; if (ind_table == NULL) continue; - claim_zero(ibv_exp_destroy_rwq_ind_table(ind_table)); + claim_zero(ibv_destroy_rwq_ind_table(ind_table)); } rte_free(ind_tables); } @@ -547,11 +545,11 @@ rte_free(priv->hash_rxqs); priv->hash_rxqs = NULL; for (i = 0; (i != priv->ind_tables_n); ++i) { - struct ibv_exp_rwq_ind_table *ind_table = + struct ibv_rwq_ind_table *ind_table = (*priv->ind_tables)[i]; assert(ind_table != NULL); - claim_zero(ibv_exp_destroy_rwq_ind_table(ind_table)); + claim_zero(ibv_destroy_rwq_ind_table(ind_table)); } priv->ind_tables_n = 0; rte_free(priv->ind_tables); @@ -672,7 +670,8 @@ /* scat->addr must be able to store a pointer. 
*/ assert(sizeof(scat->addr) >= sizeof(uintptr_t)); *scat = (struct mlx5_wqe_data_seg){ - .addr = htonll(rte_pktmbuf_mtod(buf, uintptr_t)), + .addr = rte_cpu_to_be_64( + rte_pktmbuf_mtod(buf, uintptr_t)), .byte_count = htonl(DATA_LEN(buf)), .lkey = htonl(rxq_ctrl->mr->lkey), }; @@ -764,7 +763,7 @@ if (rxq_ctrl->fdir_queue != NULL) priv_fdir_queue_destroy(rxq_ctrl->priv, rxq_ctrl->fdir_queue); if (rxq_ctrl->wq != NULL) - claim_zero(ibv_exp_destroy_wq(rxq_ctrl->wq)); + claim_zero(ibv_destroy_wq(rxq_ctrl->wq)); if (rxq_ctrl->cq != NULL) claim_zero(ibv_destroy_cq(rxq_ctrl->cq)); if (rxq_ctrl->channel != NULL) @@ -787,16 +786,23 @@ rxq_setup(struct rxq_ctrl *tmpl) { struct ibv_cq *ibcq = tmpl->cq; - struct ibv_mlx5_cq_info cq_info; - struct mlx5_rwq *rwq = container_of(tmpl->wq, struct mlx5_rwq, wq); + struct mlx5dv_cq cq_info; + struct mlx5dv_rwq rwq; const uint16_t desc_n = (1 << tmpl->rxq.elts_n) + tmpl->priv->rx_vec_en * MLX5_VPMD_DESCS_PER_LOOP; struct rte_mbuf *(*elts)[desc_n] = rte_calloc_socket("RXQ", 1, sizeof(*elts), 0, tmpl->socket); - if (ibv_mlx5_exp_get_cq_info(ibcq, &cq_info)) { - ERROR("Unable to query CQ info. 
check your OFED."); - return ENOTSUP; + struct mlx5dv_obj obj; + int ret = 0; + + obj.cq.in = ibcq; + obj.cq.out = &cq_info; + obj.rwq.in = tmpl->wq; + obj.rwq.out = &rwq; + ret = mlx5dv_init_obj(&obj, MLX5DV_OBJ_CQ | MLX5DV_OBJ_RWQ); + if (ret != 0) { + return -EINVAL; } if (cq_info.cqe_size != RTE_CACHE_LINE_SIZE) { ERROR("Wrong MLX5_CQE_SIZE environment variable value: " @@ -805,7 +811,7 @@ } if (elts == NULL) return ENOMEM; - tmpl->rxq.rq_db = rwq->rq.db; + tmpl->rxq.rq_db = rwq.dbrec; tmpl->rxq.cqe_n = log2above(cq_info.cqe_cnt); tmpl->rxq.cq_ci = 0; tmpl->rxq.rq_ci = 0; @@ -813,11 +819,14 @@ tmpl->rxq.cq_db = cq_info.dbrec; tmpl->rxq.wqes = (volatile struct mlx5_wqe_data_seg (*)[]) - (uintptr_t)rwq->rq.buff; + (uintptr_t)rwq.buf; tmpl->rxq.cqes = (volatile struct mlx5_cqe (*)[]) (uintptr_t)cq_info.buf; tmpl->rxq.elts = elts; + tmpl->rxq.cq_uar = cq_info.uar; + tmpl->rxq.cqn = cq_info.cqn; + tmpl->rxq.cq_arm_sn = 0; return 0; } @@ -855,11 +864,11 @@ .rss_hash = priv->rxqs_n > 1, }, }; - struct ibv_exp_wq_attr mod; + struct ibv_wq_attr mod; union { - struct ibv_exp_cq_init_attr cq; - struct ibv_exp_wq_init_attr wq; - struct ibv_exp_cq_attr cq_attr; + struct ibv_cq_init_attr_ex cq; + struct ibv_wq_init_attr wq; + struct ibv_cq_ex cq_attr; } attr; unsigned int mb_len = rte_pktmbuf_data_room_size(mp); unsigned int cqe_n = desc - 1; @@ -939,12 +948,12 @@ goto error; } } - attr.cq = (struct ibv_exp_cq_init_attr){ + attr.cq = (struct ibv_cq_init_attr_ex){ .comp_mask = 0, }; if (priv->cqe_comp) { - attr.cq.comp_mask |= IBV_EXP_CQ_INIT_ATTR_FLAGS; - attr.cq.flags |= IBV_EXP_CQ_COMPRESSED_CQE; + attr.cq.comp_mask |= IBV_CQ_INIT_ATTR_MASK_FLAGS; + attr.cq.flags |= MLX5DV_CQ_INIT_ATTR_MASK_COMPRESSED_CQE; /* * For vectorized Rx, it must not be doubled in order to * make cq_ci and rq_ci aligned. @@ -952,8 +961,7 @@ if (rxq_check_vec_support(&tmpl.rxq) < 0) cqe_n = (desc * 2) - 1; /* Double the number of CQEs. 
*/ } - tmpl.cq = ibv_exp_create_cq(priv->ctx, cqe_n, NULL, tmpl.channel, 0, - &attr.cq); + tmpl.cq = ibv_create_cq(priv->ctx, cqe_n, NULL, tmpl.channel, 0); if (tmpl.cq == NULL) { ret = ENOMEM; ERROR("%p: CQ creation failure: %s", @@ -961,35 +969,35 @@ goto error; } DEBUG("priv->device_attr.max_qp_wr is %d", - priv->device_attr.max_qp_wr); + priv->device_attr.orig_attr.max_qp_wr); DEBUG("priv->device_attr.max_sge is %d", - priv->device_attr.max_sge); + priv->device_attr.orig_attr.max_sge); /* Configure VLAN stripping. */ tmpl.rxq.vlan_strip = (priv->hw_vlan_strip && !!dev->data->dev_conf.rxmode.hw_vlan_strip); - attr.wq = (struct ibv_exp_wq_init_attr){ + attr.wq = (struct ibv_wq_init_attr){ .wq_context = NULL, /* Could be useful in the future. */ - .wq_type = IBV_EXP_WQT_RQ, + .wq_type = IBV_WQT_RQ, /* Max number of outstanding WRs. */ - .max_recv_wr = desc >> tmpl.rxq.sges_n, + .max_wr = desc >> tmpl.rxq.sges_n, /* Max number of scatter/gather elements in a WR. */ - .max_recv_sge = 1 << tmpl.rxq.sges_n, + .max_sge = 1 << tmpl.rxq.sges_n, .pd = priv->pd, .cq = tmpl.cq, .comp_mask = - IBV_EXP_CREATE_WQ_VLAN_OFFLOADS | + IBV_WQ_FLAGS_CVLAN_STRIPPING | 0, - .vlan_offloads = (tmpl.rxq.vlan_strip ? - IBV_EXP_RECEIVE_WQ_CVLAN_STRIP : - 0), + .create_flags = (tmpl.rxq.vlan_strip ? + IBV_WQ_FLAGS_CVLAN_STRIPPING : + 0), }; /* By default, FCS (CRC) is stripped by hardware. */ if (dev->data->dev_conf.rxmode.hw_strip_crc) { tmpl.rxq.crc_present = 0; } else if (priv->hw_fcs_strip) { /* Ask HW/Verbs to leave CRC in place when supported. */ - attr.wq.flags |= IBV_EXP_CREATE_WQ_FLAG_SCATTER_FCS; - attr.wq.comp_mask |= IBV_EXP_CREATE_WQ_FLAGS; + attr.wq.create_flags |= IBV_WQ_FLAGS_SCATTER_FCS; + attr.wq.comp_mask |= IBV_WQ_INIT_ATTR_FLAGS; tmpl.rxq.crc_present = 1; } else { WARN("%p: CRC stripping has been disabled but will still" @@ -1003,20 +1011,22 @@ (void *)dev, tmpl.rxq.crc_present ? 
"disabled" : "enabled", tmpl.rxq.crc_present << 2); +#ifdef HAVE_IBV_WQ_FLAG_RX_END_PADDING if (!mlx5_getenv_int("MLX5_PMD_ENABLE_PADDING")) ; /* Nothing else to do. */ else if (priv->hw_padding) { INFO("%p: enabling packet padding on queue %p", (void *)dev, (void *)rxq_ctrl); - attr.wq.flags |= IBV_EXP_CREATE_WQ_FLAG_RX_END_PADDING; - attr.wq.comp_mask |= IBV_EXP_CREATE_WQ_FLAGS; + attr.wq.create_flags |= IBV_WQ_FLAG_RX_END_PADDING; + attr.wq.comp_mask |= IBV_WQ_INIT_ATTR_FLAGS; } else WARN("%p: packet padding has been requested but is not" " supported, make sure MLNX_OFED and firmware are" " up to date", (void *)dev); +#endif - tmpl.wq = ibv_exp_create_wq(priv->ctx, &attr.wq); + tmpl.wq = ibv_create_wq(priv->ctx, &attr.wq); if (tmpl.wq == NULL) { ret = (errno ? errno : EINVAL); ERROR("%p: WQ creation failure: %s", @@ -1027,12 +1037,12 @@ * Make sure number of WRs*SGEs match expectations since a queue * cannot allocate more than "desc" buffers. */ - if (((int)attr.wq.max_recv_wr != (desc >> tmpl.rxq.sges_n)) || - ((int)attr.wq.max_recv_sge != (1 << tmpl.rxq.sges_n))) { + if (((int)attr.wq.max_wr != (desc >> tmpl.rxq.sges_n)) || + ((int)attr.wq.max_sge != (1 << tmpl.rxq.sges_n))) { ERROR("%p: requested %u*%u but got %u*%u WRs*SGEs", (void *)dev, (desc >> tmpl.rxq.sges_n), (1 << tmpl.rxq.sges_n), - attr.wq.max_recv_wr, attr.wq.max_recv_sge); + attr.wq.max_wr, attr.wq.max_sge); ret = EINVAL; goto error; } @@ -1040,13 +1050,13 @@ tmpl.rxq.port_id = dev->data->port_id; DEBUG("%p: RTE port ID: %u", (void *)rxq_ctrl, tmpl.rxq.port_id); /* Change queue state to ready. 
*/ - mod = (struct ibv_exp_wq_attr){ - .attr_mask = IBV_EXP_WQ_ATTR_STATE, - .wq_state = IBV_EXP_WQS_RDY, + mod = (struct ibv_wq_attr){ + .attr_mask = IBV_WQ_ATTR_STATE, + .wq_state = IBV_WQS_RDY, }; - ret = ibv_exp_modify_wq(tmpl.wq, &mod); + ret = ibv_modify_wq(tmpl.wq, &mod); if (ret) { - ERROR("%p: WQ state to IBV_EXP_WQS_RDY failed: %s", + ERROR("%p: WQ state to IBV_WQS_RDY failed: %s", (void *)dev, strerror(ret)); goto error; } @@ -1310,8 +1320,21 @@ intr_handle->intr_vec = NULL; } -#ifdef HAVE_UPDATE_CQ_CI +static inline void mlx5_arm_cq(struct rxq *rxq, int sq_n_rxq) +{ + int sq_n = 0; + uint32_t doorbell_hi; + uint64_t doorbell; + void *cq_db_reg = (char *)rxq->cq_uar + MLX5_CQ_DOORBELL; + sq_n = sq_n_rxq & 0x3; + doorbell_hi = sq_n << 28 | (rxq->cq_ci & MLX5_CI_MASK); + doorbell = (uint64_t)doorbell_hi << 32; + doorbell |= rxq->cqn; + rxq->cq_db[MLX5_CQ_ARM_DB] = rte_cpu_to_be_32(doorbell_hi); + rte_wmb(); + rte_write64(rte_cpu_to_be_64(doorbell), cq_db_reg); +} /** * DPDK callback for Rx queue interrupt enable. 
* @@ -1329,13 +1352,12 @@ struct priv *priv = mlx5_get_priv(dev); struct rxq *rxq = (*priv->rxqs)[rx_queue_id]; struct rxq_ctrl *rxq_ctrl = container_of(rxq, struct rxq_ctrl, rxq); - int ret; + int ret = 0; if (!rxq || !rxq_ctrl->channel) { ret = EINVAL; } else { - ibv_mlx5_exp_update_cq_ci(rxq_ctrl->cq, rxq->cq_ci); - ret = ibv_req_notify_cq(rxq_ctrl->cq, 0); + mlx5_arm_cq(rxq, rxq->cq_arm_sn); } if (ret) WARN("unable to arm interrupt on rx queue %d", rx_queue_id); @@ -1367,6 +1389,7 @@ ret = EINVAL; } else { ret = ibv_get_cq_event(rxq_ctrl->cq->channel, &ev_cq, &ev_ctx); + rxq->cq_arm_sn++; if (ret || ev_cq != rxq_ctrl->cq) ret = EINVAL; } @@ -1377,5 +1400,3 @@ ibv_ack_cq_events(rxq_ctrl->cq, 1); return -ret; } - -#endif /* HAVE_UPDATE_CQ_CI */ diff --git a/drivers/net/mlx5/mlx5_rxtx.c b/drivers/net/mlx5/mlx5_rxtx.c index fe9e7ea..991ea94 100644 --- a/drivers/net/mlx5/mlx5_rxtx.c +++ b/drivers/net/mlx5/mlx5_rxtx.c @@ -42,8 +42,7 @@ #pragma GCC diagnostic ignored "-Wpedantic" #endif #include -#include -#include +#include #ifdef PEDANTIC #pragma GCC diagnostic error "-Wpedantic" #endif @@ -603,7 +602,7 @@ ds = 3; use_dseg: /* Add the remaining packet as a simple ds. */ - naddr = htonll(addr); + naddr = rte_cpu_to_be_64(addr); *dseg = (rte_v128u32_t){ htonl(length), mlx5_tx_mb2mr(txq, buf), @@ -642,7 +641,7 @@ total_length += length; #endif /* Store segment information. 
*/ - naddr = htonll(rte_pktmbuf_mtod(buf, uintptr_t)); + naddr = rte_cpu_to_be_64(rte_pktmbuf_mtod(buf, uintptr_t)); *dseg = (rte_v128u32_t){ htonl(length), mlx5_tx_mb2mr(txq, buf), @@ -888,7 +887,7 @@ *dseg = (struct mlx5_wqe_data_seg){ .byte_count = htonl(DATA_LEN(buf)), .lkey = mlx5_tx_mb2mr(txq, buf), - .addr = htonll(addr), + .addr = rte_cpu_to_be_64(addr), }; #if defined(MLX5_PMD_SOFT_COUNTERS) || !defined(NDEBUG) length += DATA_LEN(buf); @@ -1134,7 +1133,7 @@ *dseg = (struct mlx5_wqe_data_seg){ .byte_count = htonl(DATA_LEN(buf)), .lkey = mlx5_tx_mb2mr(txq, buf), - .addr = htonll(addr), + .addr = rte_cpu_to_be_64(addr), }; #if defined(MLX5_PMD_SOFT_COUNTERS) || !defined(NDEBUG) length += DATA_LEN(buf); @@ -1441,7 +1440,7 @@ *dseg = (struct mlx5_wqe_data_seg){ .byte_count = htonl(DATA_LEN(buf)), .lkey = mlx5_tx_mb2mr(txq, buf), - .addr = htonll(addr), + .addr = rte_cpu_to_be_64(addr), }; #if defined(MLX5_PMD_SOFT_COUNTERS) || !defined(NDEBUG) length += DATA_LEN(buf); @@ -1520,7 +1519,7 @@ for (n = 0; n * RTE_CACHE_LINE_SIZE < length; n++) rte_prefetch2((void *)(addr + n * RTE_CACHE_LINE_SIZE)); - naddr = htonll(addr); + naddr = rte_cpu_to_be_64(addr); *dseg = (rte_v128u32_t) { htonl(length), mlx5_tx_mb2mr(txq, buf), @@ -1872,7 +1871,7 @@ * of the buffers are already known, only the buffer address * changes. */ - wqe->addr = htonll(rte_pktmbuf_mtod(rep, uintptr_t)); + wqe->addr = rte_cpu_to_be_64(rte_pktmbuf_mtod(rep, uintptr_t)); if (len > DATA_LEN(seg)) { len -= DATA_LEN(seg); ++NB_SEGS(pkt); diff --git a/drivers/net/mlx5/mlx5_rxtx.h b/drivers/net/mlx5/mlx5_rxtx.h index b3b161d..72d0330 100644 --- a/drivers/net/mlx5/mlx5_rxtx.h +++ b/drivers/net/mlx5/mlx5_rxtx.h @@ -43,7 +43,7 @@ #pragma GCC diagnostic ignored "-Wpedantic" #endif #include -#include +#include #ifdef PEDANTIC #pragma GCC diagnostic error "-Wpedantic" #endif @@ -80,8 +80,8 @@ struct mlx5_txq_stats { /* Flow director queue structure. */ struct fdir_queue { struct ibv_qp *qp; /* Associated RX QP. 
*/ - struct ibv_exp_rwq_ind_table *ind_table; /* Indirection table. */ - struct ibv_exp_wq *wq; /* Work queue. */ + struct ibv_rwq_ind_table *ind_table; /* Indirection table. */ + struct ibv_wq *wq; /* Work queue. */ struct ibv_cq *cq; /* Completion queue. */ }; @@ -123,13 +123,16 @@ struct rxq { struct mlx5_rxq_stats stats; uint64_t mbuf_initializer; /* Default rearm_data for vectorized Rx. */ struct rte_mbuf fake_mbuf; /* elts padding for vectorized Rx. */ + void *cq_uar; /* CQ user access region. */ + uint32_t cqn; /* CQ number. */ + uint8_t cq_arm_sn; /* CQ arm seq number. */ } __rte_cache_aligned; /* RX queue control descriptor. */ struct rxq_ctrl { struct priv *priv; /* Back pointer to private data. */ struct ibv_cq *cq; /* Completion Queue. */ - struct ibv_exp_wq *wq; /* Work Queue. */ + struct ibv_wq *wq; /* Work Queue. */ struct fdir_queue *fdir_queue; /* Flow director queue. */ struct ibv_mr *mr; /* Memory Region (for mp). */ struct ibv_comp_channel *channel; @@ -151,8 +154,8 @@ enum hash_rxq_type { /* Flow structure with Ethernet specification. It is packed to prevent padding * between attr and spec as this layout is expected by libibverbs. */ struct flow_attr_spec_eth { - struct ibv_exp_flow_attr attr; - struct ibv_exp_flow_spec_eth spec; + struct ibv_flow_attr attr; + struct ibv_flow_spec_eth spec; } __attribute__((packed)); /* Define a struct flow_attr_spec_eth object as an array of at least @@ -170,13 +173,13 @@ struct hash_rxq_init { unsigned int flow_priority; /* Flow priority to use. */ union { struct { - enum ibv_exp_flow_spec_type type; + enum ibv_flow_spec_type type; uint16_t size; } hdr; - struct ibv_exp_flow_spec_tcp_udp tcp_udp; - struct ibv_exp_flow_spec_ipv4 ipv4; - struct ibv_exp_flow_spec_ipv6 ipv6; - struct ibv_exp_flow_spec_eth eth; + struct ibv_flow_spec_tcp_udp tcp_udp; + struct ibv_flow_spec_ipv4 ipv4; + struct ibv_flow_spec_ipv6 ipv6; + struct ibv_flow_spec_eth eth; } flow_spec; /* Flow specification template. 
*/ const struct hash_rxq_init *underlayer; /* Pointer to underlayer. */ }; @@ -230,9 +233,9 @@ struct hash_rxq { struct ibv_qp *qp; /* Hash RX QP. */ enum hash_rxq_type type; /* Hash RX queue type. */ /* MAC flow steering rules, one per VLAN ID. */ - struct ibv_exp_flow *mac_flow + struct ibv_flow *mac_flow [MLX5_MAX_MAC_ADDRESSES][MLX5_MAX_VLAN_IDS]; - struct ibv_exp_flow *special_flow + struct ibv_flow *special_flow [MLX5_MAX_SPECIAL_FLOWS][MLX5_MAX_VLAN_IDS]; }; @@ -292,8 +295,8 @@ struct txq_ctrl { extern uint8_t rss_hash_default_key[]; extern const size_t rss_hash_default_key_len; -size_t priv_flow_attr(struct priv *, struct ibv_exp_flow_attr *, - size_t, enum hash_rxq_type); +size_t priv_flow_attr(struct priv *, struct ibv_flow_attr *, size_t, + enum hash_rxq_type); int priv_create_hash_rxqs(struct priv *); void priv_destroy_hash_rxqs(struct priv *); int priv_allow_flow_type(struct priv *, enum hash_rxq_flow_type); @@ -304,10 +307,8 @@ int mlx5_rx_queue_setup(struct rte_eth_dev *, uint16_t, uint16_t, unsigned int, void mlx5_rx_queue_release(void *); int priv_rx_intr_vec_enable(struct priv *priv); void priv_rx_intr_vec_disable(struct priv *priv); -#ifdef HAVE_UPDATE_CQ_CI int mlx5_rx_intr_enable(struct rte_eth_dev *dev, uint16_t rx_queue_id); int mlx5_rx_intr_disable(struct rte_eth_dev *dev, uint16_t rx_queue_id); -#endif /* HAVE_UPDATE_CQ_CI */ /* mlx5_txq.c */ diff --git a/drivers/net/mlx5/mlx5_rxtx_vec_sse.c b/drivers/net/mlx5/mlx5_rxtx_vec_sse.c index 37854a7..5bef200 100644 --- a/drivers/net/mlx5/mlx5_rxtx_vec_sse.c +++ b/drivers/net/mlx5/mlx5_rxtx_vec_sse.c @@ -43,8 +43,7 @@ #pragma GCC diagnostic ignored "-Wpedantic" #endif #include -#include -#include +#include #ifdef PEDANTIC #pragma GCC diagnostic error "-Wpedantic" #endif @@ -561,7 +560,7 @@ return; } for (i = 0; i < n; ++i) - wq[i].addr = htonll((uintptr_t)elts[i]->buf_addr + + wq[i].addr = rte_cpu_to_be_64((uintptr_t)elts[i]->buf_addr + RTE_PKTMBUF_HEADROOM); rxq->rq_ci += n; rte_wmb(); diff --git 
a/drivers/net/mlx5/mlx5_txq.c b/drivers/net/mlx5/mlx5_txq.c
index 4b0b532..3156ad2 100644
--- a/drivers/net/mlx5/mlx5_txq.c
+++ b/drivers/net/mlx5/mlx5_txq.c
@@ -162,13 +162,19 @@
 static inline int
 txq_setup(struct txq_ctrl *tmpl, struct txq_ctrl *txq_ctrl)
 {
-	struct mlx5_qp *qp = to_mqp(tmpl->qp);
+	struct mlx5dv_qp qp;
 	struct ibv_cq *ibcq = tmpl->cq;
-	struct ibv_mlx5_cq_info cq_info;
+	struct mlx5dv_cq cq_info;
+	struct mlx5dv_obj obj;
+	int ret = 0;
 
-	if (ibv_mlx5_exp_get_cq_info(ibcq, &cq_info)) {
-		ERROR("Unable to query CQ info. check your OFED.");
-		return ENOTSUP;
+	obj.cq.in = ibcq;
+	obj.cq.out = &cq_info;
+	obj.qp.in = tmpl->qp;
+	obj.qp.out = &qp;
+	ret = mlx5dv_init_obj(&obj, MLX5DV_OBJ_CQ | MLX5DV_OBJ_QP);
+	if (ret != 0) {
+		return -EINVAL;
 	}
 	if (cq_info.cqe_size != RTE_CACHE_LINE_SIZE) {
 		ERROR("Wrong MLX5_CQE_SIZE environment variable value: "
@@ -176,11 +182,11 @@
 		return EINVAL;
 	}
 	tmpl->txq.cqe_n = log2above(cq_info.cqe_cnt);
-	tmpl->txq.qp_num_8s = qp->ctrl_seg.qp_num << 8;
-	tmpl->txq.wqes = qp->gen_data.sqstart;
-	tmpl->txq.wqe_n = log2above(qp->sq.wqe_cnt);
-	tmpl->txq.qp_db = &qp->gen_data.db[MLX5_SND_DBR];
-	tmpl->txq.bf_reg = qp->gen_data.bf->reg;
+	tmpl->txq.qp_num_8s = tmpl->qp->qp_num << 8;
+	tmpl->txq.wqes = qp.sq.buf;
+	tmpl->txq.wqe_n = log2above(qp.sq.wqe_cnt);
+	tmpl->txq.qp_db = &qp.dbrec[MLX5_SND_DBR];
+	tmpl->txq.bf_reg = qp.bf.reg;
 	tmpl->txq.cq_db = cq_info.dbrec;
 	tmpl->txq.cqes =
 		(volatile struct mlx5_cqe (*)[])
@@ -219,10 +225,10 @@
 		.socket = socket,
 	};
 	union {
-		struct ibv_exp_qp_init_attr init;
-		struct ibv_exp_cq_init_attr cq;
-		struct ibv_exp_qp_attr mod;
-		struct ibv_exp_cq_attr cq_attr;
+		struct ibv_qp_init_attr_ex init;
+		struct ibv_cq_init_attr_ex cq;
+		struct ibv_qp_attr mod;
+		struct ibv_cq_ex cq_attr;
 	} attr;
 	unsigned int cqe_n;
 	const unsigned int max_tso_inline = ((MLX5_MAX_TSO_HEADER +
@@ -241,16 +247,16 @@
 	if (priv->mps == MLX5_MPW_ENHANCED)
 		tmpl.txq.mpw_hdr_dseg = priv->mpw_hdr_dseg;
 	/* MRs will be registered in mp2mr[] later. */
-	attr.cq = (struct ibv_exp_cq_init_attr){
+	attr.cq = (struct ibv_cq_init_attr_ex){
 		.comp_mask = 0,
 	};
 	cqe_n = ((desc / MLX5_TX_COMP_THRESH) - 1) ?
 		((desc / MLX5_TX_COMP_THRESH) - 1) : 1;
 	if (priv->mps == MLX5_MPW_ENHANCED)
 		cqe_n += MLX5_TX_COMP_THRESH_INLINE_DIV;
-	tmpl.cq = ibv_exp_create_cq(priv->ctx,
+	tmpl.cq = ibv_create_cq(priv->ctx,
 				    cqe_n,
-				    NULL, NULL, 0, &attr.cq);
+				    NULL, NULL, 0);
 	if (tmpl.cq == NULL) {
 		ret = ENOMEM;
 		ERROR("%p: CQ creation failure: %s",
@@ -258,19 +264,20 @@
 		goto error;
 	}
 	DEBUG("priv->device_attr.max_qp_wr is %d",
-	      priv->device_attr.max_qp_wr);
+	      priv->device_attr.orig_attr.max_qp_wr);
 	DEBUG("priv->device_attr.max_sge is %d",
-	      priv->device_attr.max_sge);
-	attr.init = (struct ibv_exp_qp_init_attr){
+	      priv->device_attr.orig_attr.max_sge);
+	attr.init = (struct ibv_qp_init_attr_ex){
 		/* CQ to be associated with the send queue. */
 		.send_cq = tmpl.cq,
 		/* CQ to be associated with the receive queue. */
 		.recv_cq = tmpl.cq,
 		.cap = {
 			/* Max number of outstanding WRs. */
-			.max_send_wr = ((priv->device_attr.max_qp_wr < desc) ?
-					priv->device_attr.max_qp_wr :
-					desc),
+			.max_send_wr =
+			 ((priv->device_attr.orig_attr.max_qp_wr < desc) ?
+			   priv->device_attr.orig_attr.max_qp_wr :
+			   desc),
 			/*
 			 * Max number of scatter/gather elements in a WR,
 			 * must be 1 to prevent libmlx5 from trying to affect
@@ -285,7 +292,7 @@
 		 * TX burst. */
 		.sq_sig_all = 0,
 		.pd = priv->pd,
-		.comp_mask = IBV_EXP_QP_INIT_ATTR_PD,
+		.comp_mask = IBV_QP_INIT_ATTR_PD,
 	};
 	if (priv->txq_inline && (priv->txqs_n >= priv->txqs_inline)) {
 		tmpl.txq.max_inline =
@@ -324,14 +331,14 @@
 	if (priv->tso) {
 		attr.init.max_tso_header =
 			max_tso_inline * RTE_CACHE_LINE_SIZE;
-		attr.init.comp_mask |= IBV_EXP_QP_INIT_ATTR_MAX_TSO_HEADER;
+		attr.init.comp_mask |= IBV_QP_INIT_ATTR_MAX_TSO_HEADER;
 		tmpl.txq.max_inline = RTE_MAX(tmpl.txq.max_inline,
 					      max_tso_inline);
 		tmpl.txq.tso_en = 1;
 	}
 	if (priv->tunnel_en)
 		tmpl.txq.tunnel_en = 1;
-	tmpl.qp = ibv_exp_create_qp(priv->ctx, &attr.init);
+	tmpl.qp = ibv_create_qp_ex(priv->ctx, &attr.init);
 	if (tmpl.qp == NULL) {
 		ret = (errno ? errno : EINVAL);
 		ERROR("%p: QP creation failure: %s",
@@ -343,14 +350,14 @@
 	      attr.init.cap.max_send_wr,
 	      attr.init.cap.max_send_sge,
 	      attr.init.cap.max_inline_data);
-	attr.mod = (struct ibv_exp_qp_attr){
+	attr.mod = (struct ibv_qp_attr){
 		/* Move the QP to this state. */
 		.qp_state = IBV_QPS_INIT,
 		/* Primary port number. */
 		.port_num = priv->port
 	};
-	ret = ibv_exp_modify_qp(tmpl.qp, &attr.mod,
-				(IBV_EXP_QP_STATE | IBV_EXP_QP_PORT));
+	ret = ibv_modify_qp(tmpl.qp, &attr.mod,
+			    (IBV_QP_STATE | IBV_QP_PORT));
 	if (ret) {
 		ERROR("%p: QP state to IBV_QPS_INIT failed: %s",
 		      (void *)dev, strerror(ret));
@@ -363,17 +370,17 @@
 		goto error;
 	}
 	txq_alloc_elts(&tmpl, desc);
-	attr.mod = (struct ibv_exp_qp_attr){
+	attr.mod = (struct ibv_qp_attr){
 		.qp_state = IBV_QPS_RTR
 	};
-	ret = ibv_exp_modify_qp(tmpl.qp, &attr.mod, IBV_EXP_QP_STATE);
+	ret = ibv_modify_qp(tmpl.qp, &attr.mod, IBV_QP_STATE);
 	if (ret) {
 		ERROR("%p: QP state to IBV_QPS_RTR failed: %s",
 		      (void *)dev, strerror(ret));
 		goto error;
 	}
 	attr.mod.qp_state = IBV_QPS_RTS;
-	ret = ibv_exp_modify_qp(tmpl.qp, &attr.mod, IBV_EXP_QP_STATE);
+	ret = ibv_modify_qp(tmpl.qp, &attr.mod, IBV_QP_STATE);
 	if (ret) {
 		ERROR("%p: QP state to IBV_QPS_RTS failed: %s",
 		      (void *)dev, strerror(ret));
diff --git a/drivers/net/mlx5/mlx5_vlan.c b/drivers/net/mlx5/mlx5_vlan.c
index 353ae49..4d531dc 100644
--- a/drivers/net/mlx5/mlx5_vlan.c
+++ b/drivers/net/mlx5/mlx5_vlan.c
@@ -139,20 +139,20 @@
 {
 	struct rxq *rxq = (*priv->rxqs)[idx];
 	struct rxq_ctrl *rxq_ctrl = container_of(rxq, struct rxq_ctrl, rxq);
-	struct ibv_exp_wq_attr mod;
+	struct ibv_wq_attr mod;
 	uint16_t vlan_offloads =
-		(on ? IBV_EXP_RECEIVE_WQ_CVLAN_STRIP : 0) |
+		(on ? IBV_WQ_FLAGS_CVLAN_STRIPPING : 0) |
 		0;
 	int err;
 
 	DEBUG("set VLAN offloads 0x%x for port %d queue %d",
 	      vlan_offloads, rxq->port_id, idx);
-	mod = (struct ibv_exp_wq_attr){
-		.attr_mask = IBV_EXP_WQ_ATTR_VLAN_OFFLOADS,
-		.vlan_offloads = vlan_offloads,
+	mod = (struct ibv_wq_attr){
+		.attr_mask = IBV_WQ_FLAGS_CVLAN_STRIPPING,
+		.flags = vlan_offloads,
 	};
-	err = ibv_exp_modify_wq(rxq_ctrl->wq, &mod);
+	err = ibv_modify_wq(rxq_ctrl->wq, &mod);
 	if (err) {
 		ERROR("%p: failed to modified stripping mode: %s",
 		      (void *)priv, strerror(err));
diff --git a/mk/rte.app.mk b/mk/rte.app.mk
index c25fdd9..9415537 100644
--- a/mk/rte.app.mk
+++ b/mk/rte.app.mk
@@ -129,7 +129,7 @@ _LDLIBS-$(CONFIG_RTE_LIBRTE_PMD_KNI)        += -lrte_pmd_kni
 endif
 _LDLIBS-$(CONFIG_RTE_LIBRTE_LIO_PMD)        += -lrte_pmd_lio
 _LDLIBS-$(CONFIG_RTE_LIBRTE_MLX4_PMD)       += -lrte_pmd_mlx4 -libverbs
-_LDLIBS-$(CONFIG_RTE_LIBRTE_MLX5_PMD)       += -lrte_pmd_mlx5 -libverbs
+_LDLIBS-$(CONFIG_RTE_LIBRTE_MLX5_PMD)       += -lrte_pmd_mlx5 -libverbs -lmlx5
 _LDLIBS-$(CONFIG_RTE_LIBRTE_NFP_PMD)        += -lrte_pmd_nfp
 _LDLIBS-$(CONFIG_RTE_LIBRTE_PMD_NULL)       += -lrte_pmd_null
 _LDLIBS-$(CONFIG_RTE_LIBRTE_PMD_PCAP)       += -lrte_pmd_pcap -lpcap
-- 
1.8.3.1