From mboxrd@z Thu Jan  1 00:00:00 1970
From: Yongseok Koh
To: Slava Ovsiienko
CC: "dev@dpdk.org"
Date: Mon, 22 Jul 2019 05:32:23 +0000
In-Reply-To: <1563719100-368-2-git-send-email-viacheslavo@mellanox.com>
References: <1563346400-1762-1-git-send-email-viacheslavo@mellanox.com>
 <1563719100-368-1-git-send-email-viacheslavo@mellanox.com>
 <1563719100-368-2-git-send-email-viacheslavo@mellanox.com>
Subject: Re: [dpdk-dev] [PATCH v4 1/8] net/mlx5: remove Tx datapath
 implementation
List-Id: DPDK patches and discussions

> On Jul 21, 2019, at 7:24 AM, Viacheslav Ovsiienko wrote:
>
> This patch removes the existing Tx datapath code as a preparation step
> before introducing the new implementation. The following entities are
> removed:
>
> - deprecated devargs support
> - tx_burst() routines
> - related PRM definitions
> - SQ configuration code
> - Tx routine selection code
> - incompatible Tx completion code
>
> The following devargs are deprecated and ignored:
> - "txq_inline", which is going to be converted to "txq_inline_max"
>   for compatibility reasons
> - "tx_vec_en"
> - "txqs_max_vec"
> - "txq_mpw_hdr_dseg_en"
> - "txq_max_inline_len", which is going to be converted to
>   "txq_inline_mpw" for compatibility reasons
>
> The deprecated devarg keys are still recognized by the PMD and are
> ignored or converted to the new ones so as not to block device probing.
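One note for readers skimming the diff: every deprecated key is still accepted at probe time and only triggers a warning, as the strcmp() dispatch in the mlx5.c hunk below shows. A minimal standalone sketch of that warn-and-ignore pattern (hypothetical, not the driver code itself; the real handler uses DRV_LOG and lives in drivers/net/mlx5/mlx5.c):

```c
#include <stdio.h>
#include <string.h>

/* Deprecated devarg keys: still parsed so device probing is not blocked,
 * but their values no longer have any effect. */
static const char *const deprecated[] = {
	"txq_inline", "tx_vec_en", "txqs_max_vec",
	"txq_mpw_hdr_dseg_en", "txq_max_inline_len",
};

/* Warn-and-ignore dispatch; returning 0 keeps kvargs processing going,
 * so a stale command line does not abort the probe. */
static int
args_check(const char *key, const char *val)
{
	size_t i;

	for (i = 0; i < sizeof(deprecated) / sizeof(deprecated[0]); i++) {
		if (strcmp(deprecated[i], key) == 0) {
			fprintf(stderr, "%s: deprecated parameter, ignored\n",
				key);
			return 0;
		}
	}
	printf("applying %s=%s\n", key, val); /* still-supported keys */
	return 0;
}

int
main(void)
{
	args_check("txq_inline", "128");    /* warns, value dropped */
	args_check("txqs_min_inline", "8"); /* still honored */
	return 0;
}
```

Applications that passed the old keys keep probing unchanged; tuning inline behavior now goes through the "txq_inline_max"/"txq_inline_mpw" replacements introduced later in this series.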
>
> Signed-off-by: Viacheslav Ovsiienko

Acked-by: Yongseok Koh

> ---
>  doc/guides/nics/mlx5.rst              |   34 +-
>  drivers/net/mlx5/mlx5.c               |   39 +-
>  drivers/net/mlx5/mlx5.h               |    5 -
>  drivers/net/mlx5/mlx5_defs.h          |   16 -
>  drivers/net/mlx5/mlx5_ethdev.c        |   58 --
>  drivers/net/mlx5/mlx5_prm.h           |   77 --
>  drivers/net/mlx5/mlx5_rxtx.c          | 1434 +--------------------------
>  drivers/net/mlx5/mlx5_rxtx.h          |  273 -------
>  drivers/net/mlx5/mlx5_rxtx_vec.c      |  175 ----
>  drivers/net/mlx5/mlx5_rxtx_vec_neon.h |  289 -------
>  drivers/net/mlx5/mlx5_rxtx_vec_sse.h  |  284 -------
>  drivers/net/mlx5/mlx5_txq.c           |  110 +--
>  12 files changed, 65 insertions(+), 2729 deletions(-)
>
> diff --git a/doc/guides/nics/mlx5.rst b/doc/guides/nics/mlx5.rst
> index 16aa390..5cf1e76 100644
> --- a/doc/guides/nics/mlx5.rst
> +++ b/doc/guides/nics/mlx5.rst
> @@ -350,13 +350,8 @@ Run-time configuration
>
>  - ``txq_inline`` parameter [int]
>
> -  Amount of data to be inlined during TX operations. Improves latency.
> -  Can improve PPS performance when PCI back pressure is detected and may be
> -  useful for scenarios involving heavy traffic on many queues.
> -
> -  Because additional software logic is necessary to handle this mode, this
> -  option should be used with care, as it can lower performance when back
> -  pressure is not expected.
> +  Amount of data to be inlined during TX operations. This parameter is
> +  deprecated and ignored, kept for compatibility reasons.
>
>  - ``txqs_min_inline`` parameter [int]
>
> @@ -378,16 +373,8 @@ Run-time configuration
>  - ``txqs_max_vec`` parameter [int]
>
>    Enable vectorized Tx only when the number of TX queues is less than or
> -  equal to this value. Effective only when ``tx_vec_en`` is enabled.
> -
> -  On ConnectX-5:
> -
> -  - Set to 8 by default on ARMv8.
> -  - Set to 4 by default otherwise.
> -
> -  On BlueField
> -
> -  - Set to 16 by default.
> +  equal to this value. This parameter is deprecated and ignored, kept
> +  for compatibility reasons so as not to prevent the driver from probing.
>
>  - ``txq_mpw_en`` parameter [int]
>
> @@ -418,7 +405,8 @@ Run-time configuration
>  - ``txq_mpw_hdr_dseg_en`` parameter [int]
>
>    A nonzero value enables including two pointers in the first block of TX
> -  descriptor. This can be used to lessen CPU load for memory copy.
> +  descriptor. The parameter is deprecated and ignored, kept for
> +  compatibility reasons.
>
>    Effective only when Enhanced MPS is supported. Disabled by default.
>
> @@ -427,14 +415,14 @@ Run-time configuration
>    Maximum size of packet to be inlined. This limits the size of packet to
>    be inlined. If the size of a packet is larger than configured value, the
>    packet isn't inlined even though there's enough space remained in the
> -  descriptor. Instead, the packet is included with pointer.
> -
> -  Effective only when Enhanced MPS is supported. The default value is 256.
> +  descriptor. Instead, the packet is included with pointer. This parameter
> +  is deprecated.
>
>  - ``tx_vec_en`` parameter [int]
>
> -  A nonzero value enables Tx vector on ConnectX-5, ConnectX-6 and BlueField NICs if the number of
> -  global Tx queues on the port is less than ``txqs_max_vec``.
> +  A nonzero value enables Tx vector on ConnectX-5, ConnectX-6 and BlueField
> +  NICs if the number of global Tx queues on the port is less than
> +  ``txqs_max_vec``. The parameter is deprecated and ignored.
>=20 > This option cannot be used with certain offloads such as ``DEV_TX_OFFLO= AD_TCP_TSO, > DEV_TX_OFFLOAD_VXLAN_TNL_TSO, DEV_TX_OFFLOAD_GRE_TNL_TSO, DEV_TX_OFFLOA= D_VLAN_INSERT``. > diff --git a/drivers/net/mlx5/mlx5.c b/drivers/net/mlx5/mlx5.c > index f4ad5d2..d4f0eb2 100644 > --- a/drivers/net/mlx5/mlx5.c > +++ b/drivers/net/mlx5/mlx5.c > @@ -69,7 +69,7 @@ > /* Device parameter to set the minimum number of Rx queues to enable MPRQ= . */ > #define MLX5_RXQS_MIN_MPRQ "rxqs_min_mprq" >=20 > -/* Device parameter to configure inline send. */ > +/* Device parameter to configure inline send. Deprecated, ignored.*/ > #define MLX5_TXQ_INLINE "txq_inline" >=20 > /* > @@ -80,20 +80,29 @@ >=20 > /* > * Device parameter to configure the number of TX queues threshold for > - * enabling vectorized Tx. > + * enabling vectorized Tx, deprecated, ignored (no vectorized Tx routine= s). > */ > #define MLX5_TXQS_MAX_VEC "txqs_max_vec" >=20 > /* Device parameter to enable multi-packet send WQEs. */ > #define MLX5_TXQ_MPW_EN "txq_mpw_en" >=20 > -/* Device parameter to include 2 dsegs in the title WQEBB. */ > +/* > + * Device parameter to include 2 dsegs in the title WQEBB. > + * Deprecated, ignored. > + */ > #define MLX5_TXQ_MPW_HDR_DSEG_EN "txq_mpw_hdr_dseg_en" >=20 > -/* Device parameter to limit the size of inlining packet. */ > +/* > + * Device parameter to limit the size of inlining packet. > + * Deprecated, ignored. > + */ > #define MLX5_TXQ_MAX_INLINE_LEN "txq_max_inline_len" >=20 > -/* Device parameter to enable hardware Tx vector. */ > +/* > + * Device parameter to enable hardware Tx vector. > + * Deprecated, ignored (no vectorized Tx routines anymore). > + */ > #define MLX5_TX_VEC_EN "tx_vec_en" >=20 > /* Device parameter to enable hardware Rx vector. */ > @@ -997,19 +1006,19 @@ struct mlx5_dev_spawn_data { > } else if (strcmp(MLX5_RXQS_MIN_MPRQ, key) =3D=3D 0) { > config->mprq.min_rxqs_num =3D tmp; > } else if (strcmp(MLX5_TXQ_INLINE, key) =3D=3D 0) { > - config->txq_inline =3D tmp; > + DRV_LOG(WARNING, "%s: deprecated parameter, ignored", key); > } else if (strcmp(MLX5_TXQS_MIN_INLINE, key) =3D=3D 0) { > config->txqs_inline =3D tmp; > } else if (strcmp(MLX5_TXQS_MAX_VEC, key) =3D=3D 0) { > - config->txqs_vec =3D tmp; > + DRV_LOG(WARNING, "%s: deprecated parameter, ignored", key); > } else if (strcmp(MLX5_TXQ_MPW_EN, key) =3D=3D 0) { > config->mps =3D !!tmp; > } else if (strcmp(MLX5_TXQ_MPW_HDR_DSEG_EN, key) =3D=3D 0) { > - config->mpw_hdr_dseg =3D !!tmp; > + DRV_LOG(WARNING, "%s: deprecated parameter, ignored", key); > } else if (strcmp(MLX5_TXQ_MAX_INLINE_LEN, key) =3D=3D 0) { > - config->inline_max_packet_sz =3D tmp; > + DRV_LOG(WARNING, "%s: deprecated parameter, ignored", key); > } else if (strcmp(MLX5_TX_VEC_EN, key) =3D=3D 0) { > - config->tx_vec_en =3D !!tmp; > + DRV_LOG(WARNING, "%s: deprecated parameter, ignored", key); > } else if (strcmp(MLX5_RX_VEC_EN, key) =3D=3D 0) { > config->rx_vec_en =3D !!tmp; > } else if (strcmp(MLX5_L3_VXLAN_EN, key) =3D=3D 0) { > @@ -2016,12 +2025,8 @@ struct mlx5_dev_spawn_data { > dev_config =3D (struct mlx5_dev_config){ > .hw_padding =3D 0, > .mps =3D MLX5_ARG_UNSET, > - .tx_vec_en =3D 1, > .rx_vec_en =3D 1, > - .txq_inline =3D MLX5_ARG_UNSET, > .txqs_inline =3D MLX5_ARG_UNSET, > - .txqs_vec =3D MLX5_ARG_UNSET, > - .inline_max_packet_sz =3D MLX5_ARG_UNSET, > .vf_nl_en =3D 1, > .mr_ext_memseg_en =3D 1, > .mprq =3D { > @@ -2034,9 +2039,6 @@ struct mlx5_dev_spawn_data { > }; > /* Device specific configuration. 
*/ > switch (pci_dev->id.device_id) { > - case PCI_DEVICE_ID_MELLANOX_CONNECTX5BF: > - dev_config.txqs_vec =3D MLX5_VPMD_MAX_TXQS_BLUEFIELD; > - break; > case PCI_DEVICE_ID_MELLANOX_CONNECTX4VF: > case PCI_DEVICE_ID_MELLANOX_CONNECTX4LXVF: > case PCI_DEVICE_ID_MELLANOX_CONNECTX5VF: > @@ -2046,9 +2048,6 @@ struct mlx5_dev_spawn_data { > default: > break; > } > - /* Set architecture-dependent default value if unset. */ > - if (dev_config.txqs_vec =3D=3D MLX5_ARG_UNSET) > - dev_config.txqs_vec =3D MLX5_VPMD_MAX_TXQS; > for (i =3D 0; i !=3D ns; ++i) { > uint32_t restore; >=20 > diff --git a/drivers/net/mlx5/mlx5.h b/drivers/net/mlx5/mlx5.h > index 6230371..354f6bc 100644 > --- a/drivers/net/mlx5/mlx5.h > +++ b/drivers/net/mlx5/mlx5.h > @@ -198,9 +198,7 @@ struct mlx5_dev_config { > unsigned int cqe_comp:1; /* CQE compression is enabled. */ > unsigned int cqe_pad:1; /* CQE padding is enabled. */ > unsigned int tso:1; /* Whether TSO is supported. */ > - unsigned int tx_vec_en:1; /* Tx vector is enabled. */ > unsigned int rx_vec_en:1; /* Rx vector is enabled. */ > - unsigned int mpw_hdr_dseg:1; /* Enable DSEGs in the title WQEBB. */ > unsigned int mr_ext_memseg_en:1; > /* Whether memseg should be extended for MR creation. */ > unsigned int l3_vxlan_en:1; /* Enable L3 VXLAN flow creation. */ > @@ -224,10 +222,7 @@ struct mlx5_dev_config { > unsigned int tso_max_payload_sz; /* Maximum TCP payload for TSO. */ > unsigned int ind_table_max_size; /* Maximum indirection table size. */ > unsigned int max_dump_files_num; /* Maximum dump files per queue. */ > - int txq_inline; /* Maximum packet size for inlining. */ > int txqs_inline; /* Queue number threshold for inlining. */ > - int txqs_vec; /* Queue number threshold for vectorized Tx. */ > - int inline_max_packet_sz; /* Max packet size for inlining. */ > struct mlx5_hca_attr hca_attr; /* HCA attributes. */ > }; >=20 > diff --git a/drivers/net/mlx5/mlx5_defs.h b/drivers/net/mlx5/mlx5_defs.h > index 13801a5..6861304 100644 > --- a/drivers/net/mlx5/mlx5_defs.h > +++ b/drivers/net/mlx5/mlx5_defs.h > @@ -60,15 +60,6 @@ > /* Maximum Packet headers size (L2+L3+L4) for TSO. */ > #define MLX5_MAX_TSO_HEADER 192 >=20 > -/* Default maximum number of Tx queues for vectorized Tx. */ > -#if defined(RTE_ARCH_ARM64) > -#define MLX5_VPMD_MAX_TXQS 8 > -#define MLX5_VPMD_MAX_TXQS_BLUEFIELD 16 > -#else > -#define MLX5_VPMD_MAX_TXQS 4 > -#define MLX5_VPMD_MAX_TXQS_BLUEFIELD MLX5_VPMD_MAX_TXQS > -#endif > - > /* Threshold of buffer replenishment for vectorized Rx. */ > #define MLX5_VPMD_RXQ_RPLNSH_THRESH(n) \ > (RTE_MIN(MLX5_VPMD_RX_MAX_BURST, (unsigned int)(n) >> 2)) > @@ -76,13 +67,6 @@ > /* Maximum size of burst for vectorized Rx. */ > #define MLX5_VPMD_RX_MAX_BURST 64U >=20 > -/* > - * Maximum size of burst for vectorized Tx. This is related to the maxim= um size > - * of Enhanced MPW (eMPW) WQE as vectorized Tx is supported with eMPW. > - * Careful when changing, large value can cause WQE DS to overlap. > - */ > -#define MLX5_VPMD_TX_MAX_BURST 32U > - > /* Number of packets vectorized Rx can simultaneously process in a loop. = */ > #define MLX5_VPMD_DESCS_PER_LOOP 4 >=20 > diff --git a/drivers/net/mlx5/mlx5_ethdev.c b/drivers/net/mlx5/mlx5_ethde= v.c > index f9826c9..738d540 100644 > --- a/drivers/net/mlx5/mlx5_ethdev.c > +++ b/drivers/net/mlx5/mlx5_ethdev.c > @@ -1653,64 +1653,6 @@ int mlx5_fw_version_get(struct rte_eth_dev *dev, c= har *fw_ver, size_t fw_size) > } >=20 > /** > - * Configure the TX function to use. 
> - * > - * @param dev > - * Pointer to private data structure. > - * > - * @return > - * Pointer to selected Tx burst function. > - */ > -eth_tx_burst_t > -mlx5_select_tx_function(struct rte_eth_dev *dev) > -{ > - struct mlx5_priv *priv =3D dev->data->dev_private; > - eth_tx_burst_t tx_pkt_burst =3D mlx5_tx_burst; > - struct mlx5_dev_config *config =3D &priv->config; > - uint64_t tx_offloads =3D dev->data->dev_conf.txmode.offloads; > - int tso =3D !!(tx_offloads & (DEV_TX_OFFLOAD_TCP_TSO | > - DEV_TX_OFFLOAD_VXLAN_TNL_TSO | > - DEV_TX_OFFLOAD_GRE_TNL_TSO | > - DEV_TX_OFFLOAD_IP_TNL_TSO | > - DEV_TX_OFFLOAD_UDP_TNL_TSO)); > - int swp =3D !!(tx_offloads & (DEV_TX_OFFLOAD_IP_TNL_TSO | > - DEV_TX_OFFLOAD_UDP_TNL_TSO | > - DEV_TX_OFFLOAD_OUTER_IPV4_CKSUM)); > - int vlan_insert =3D !!(tx_offloads & DEV_TX_OFFLOAD_VLAN_INSERT); > - > - assert(priv !=3D NULL); > - /* Select appropriate TX function. */ > - if (vlan_insert || tso || swp) > - return tx_pkt_burst; > - if (config->mps =3D=3D MLX5_MPW_ENHANCED) { > - if (mlx5_check_vec_tx_support(dev) > 0) { > - if (mlx5_check_raw_vec_tx_support(dev) > 0) > - tx_pkt_burst =3D mlx5_tx_burst_raw_vec; > - else > - tx_pkt_burst =3D mlx5_tx_burst_vec; > - DRV_LOG(DEBUG, > - "port %u selected enhanced MPW Tx vectorized" > - " function", > - dev->data->port_id); > - } else { > - tx_pkt_burst =3D mlx5_tx_burst_empw; > - DRV_LOG(DEBUG, > - "port %u selected enhanced MPW Tx function", > - dev->data->port_id); > - } > - } else if (config->mps && (config->txq_inline > 0)) { > - tx_pkt_burst =3D mlx5_tx_burst_mpw_inline; > - DRV_LOG(DEBUG, "port %u selected MPW inline Tx function", > - dev->data->port_id); > - } else if (config->mps) { > - tx_pkt_burst =3D mlx5_tx_burst_mpw; > - DRV_LOG(DEBUG, "port %u selected MPW Tx function", > - dev->data->port_id); > - } > - return tx_pkt_burst; > -} > - > -/** > * Configure the RX function to use. > * > * @param dev > diff --git a/drivers/net/mlx5/mlx5_prm.h b/drivers/net/mlx5/mlx5_prm.h > index 95ff29a..dfd9317 100644 > --- a/drivers/net/mlx5/mlx5_prm.h > +++ b/drivers/net/mlx5/mlx5_prm.h > @@ -39,32 +39,12 @@ > /* Invalidate a CQE. */ > #define MLX5_CQE_INVALIDATE (MLX5_CQE_INVALID << 4) >=20 > -/* Maximum number of packets a multi-packet WQE can handle. */ > -#define MLX5_MPW_DSEG_MAX 5 > - > /* WQE DWORD size */ > #define MLX5_WQE_DWORD_SIZE 16 >=20 > /* WQE size */ > #define MLX5_WQE_SIZE (4 * MLX5_WQE_DWORD_SIZE) >=20 > -/* Max size of a WQE session. */ > -#define MLX5_WQE_SIZE_MAX 960U > - > -/* Compute the number of DS. */ > -#define MLX5_WQE_DS(n) \ > - (((n) + MLX5_WQE_DWORD_SIZE - 1) / MLX5_WQE_DWORD_SIZE) > - > -/* Room for inline data in multi-packet WQE. */ > -#define MLX5_MWQE64_INL_DATA 28 > - > -/* Default minimum number of Tx queues for inlining packets. */ > -#define MLX5_EMPW_MIN_TXQS 8 > - > -/* Default max packet length to be inlined. */ > -#define MLX5_EMPW_MAX_INLINE_LEN (4U * MLX5_WQE_SIZE) > - > - > #define MLX5_OPC_MOD_ENHANCED_MPSW 0 > #define MLX5_OPCODE_ENHANCED_MPSW 0x29 >=20 > @@ -164,47 +144,11 @@ enum mlx5_completion_mode { > MLX5_COMP_CQE_AND_EQE =3D 0x3, > }; >=20 > -/* Subset of struct mlx5_wqe_eth_seg. 
*/ > -struct mlx5_wqe_eth_seg_small { > - uint32_t rsvd0; > - uint8_t cs_flags; > - uint8_t rsvd1; > - uint16_t mss; > - uint32_t flow_table_metadata; > - uint16_t inline_hdr_sz; > - uint8_t inline_hdr[2]; > -} __rte_aligned(MLX5_WQE_DWORD_SIZE); > - > -struct mlx5_wqe_inl_small { > - uint32_t byte_cnt; > - uint8_t raw; > -} __rte_aligned(MLX5_WQE_DWORD_SIZE); > - > -struct mlx5_wqe_ctrl { > - uint32_t ctrl0; > - uint32_t ctrl1; > - uint32_t ctrl2; > - uint32_t ctrl3; > -} __rte_aligned(MLX5_WQE_DWORD_SIZE); > - > /* Small common part of the WQE. */ > struct mlx5_wqe { > uint32_t ctrl[4]; > - struct mlx5_wqe_eth_seg_small eseg; > -}; > - > -/* Vectorize WQE header. */ > -struct mlx5_wqe_v { > - rte_v128u32_t ctrl; > - rte_v128u32_t eseg; > }; >=20 > -/* WQE. */ > -struct mlx5_wqe64 { > - struct mlx5_wqe hdr; > - uint8_t raw[32]; > -} __rte_aligned(MLX5_WQE_SIZE); > - > /* MPW mode. */ > enum mlx5_mpw_mode { > MLX5_MPW_DISABLED, > @@ -212,27 +156,6 @@ enum mlx5_mpw_mode { > MLX5_MPW_ENHANCED, /* Enhanced Multi-Packet Send WQE, a.k.a MPWv2. */ > }; >=20 > -/* MPW session status. */ > -enum mlx5_mpw_state { > - MLX5_MPW_STATE_OPENED, > - MLX5_MPW_INL_STATE_OPENED, > - MLX5_MPW_ENHANCED_STATE_OPENED, > - MLX5_MPW_STATE_CLOSED, > -}; > - > -/* MPW session descriptor. */ > -struct mlx5_mpw { > - enum mlx5_mpw_state state; > - unsigned int pkts_n; > - unsigned int len; > - unsigned int total_len; > - volatile struct mlx5_wqe *wqe; > - union { > - volatile struct mlx5_wqe_data_seg *dseg[MLX5_MPW_DSEG_MAX]; > - volatile uint8_t *raw; > - } data; > -}; > - > /* WQE for Multi-Packet RQ. */ > struct mlx5_wqe_mprq { > struct mlx5_wqe_srq_next_seg next_seg; > diff --git a/drivers/net/mlx5/mlx5_rxtx.c b/drivers/net/mlx5/mlx5_rxtx.c > index c1dc8c4..f2d6918 100644 > --- a/drivers/net/mlx5/mlx5_rxtx.c > +++ b/drivers/net/mlx5/mlx5_rxtx.c > @@ -288,140 +288,6 @@ > } >=20 > /** > - * Return the size of tailroom of WQ. > - * > - * @param txq > - * Pointer to TX queue structure. > - * @param addr > - * Pointer to tail of WQ. > - * > - * @return > - * Size of tailroom. > - */ > -static inline size_t > -tx_mlx5_wq_tailroom(struct mlx5_txq_data *txq, void *addr) > -{ > - size_t tailroom; > - tailroom =3D (uintptr_t)(txq->wqes) + > - (1 << txq->wqe_n) * MLX5_WQE_SIZE - > - (uintptr_t)addr; > - return tailroom; > -} > - > -/** > - * Copy data to tailroom of circular queue. > - * > - * @param dst > - * Pointer to destination. > - * @param src > - * Pointer to source. > - * @param n > - * Number of bytes to copy. > - * @param base > - * Pointer to head of queue. > - * @param tailroom > - * Size of tailroom from dst. > - * > - * @return > - * Pointer after copied data. > - */ > -static inline void * > -mlx5_copy_to_wq(void *dst, const void *src, size_t n, > - void *base, size_t tailroom) > -{ > - void *ret; > - > - if (n > tailroom) { > - rte_memcpy(dst, src, tailroom); > - rte_memcpy(base, (void *)((uintptr_t)src + tailroom), > - n - tailroom); > - ret =3D (uint8_t *)base + n - tailroom; > - } else { > - rte_memcpy(dst, src, n); > - ret =3D (n =3D=3D tailroom) ? base : (uint8_t *)dst + n; > - } > - return ret; > -} > - > -/** > - * Inline TSO headers into WQE. > - * > - * @return > - * 0 on success, negative errno value on failure. 
> - */ > -static int > -inline_tso(struct mlx5_txq_data *txq, struct rte_mbuf *buf, > - uint32_t *length, > - uintptr_t *addr, > - uint16_t *pkt_inline_sz, > - uint8_t **raw, > - uint16_t *max_wqe, > - uint16_t *tso_segsz, > - uint16_t *tso_header_sz) > -{ > - uintptr_t end =3D (uintptr_t)(((uintptr_t)txq->wqes) + > - (1 << txq->wqe_n) * MLX5_WQE_SIZE); > - unsigned int copy_b; > - uint8_t vlan_sz =3D (buf->ol_flags & PKT_TX_VLAN_PKT) ? 4 : 0; > - const uint8_t tunneled =3D txq->tunnel_en && (buf->ol_flags & > - PKT_TX_TUNNEL_MASK); > - uint16_t n_wqe; > - > - *tso_segsz =3D buf->tso_segsz; > - *tso_header_sz =3D buf->l2_len + vlan_sz + buf->l3_len + buf->l4_len; > - if (unlikely(*tso_segsz =3D=3D 0 || *tso_header_sz =3D=3D 0)) { > - txq->stats.oerrors++; > - return -EINVAL; > - } > - if (tunneled) > - *tso_header_sz +=3D buf->outer_l2_len + buf->outer_l3_len; > - /* First seg must contain all TSO headers. */ > - if (unlikely(*tso_header_sz > MLX5_MAX_TSO_HEADER) || > - *tso_header_sz > DATA_LEN(buf)) { > - txq->stats.oerrors++; > - return -EINVAL; > - } > - copy_b =3D *tso_header_sz - *pkt_inline_sz; > - if (!copy_b || ((end - (uintptr_t)*raw) < copy_b)) > - return -EAGAIN; > - n_wqe =3D (MLX5_WQE_DS(copy_b) - 1 + 3) / 4; > - if (unlikely(*max_wqe < n_wqe)) > - return -EINVAL; > - *max_wqe -=3D n_wqe; > - rte_memcpy((void *)*raw, (void *)*addr, copy_b); > - *length -=3D copy_b; > - *addr +=3D copy_b; > - copy_b =3D MLX5_WQE_DS(copy_b) * MLX5_WQE_DWORD_SIZE; > - *pkt_inline_sz +=3D copy_b; > - *raw +=3D copy_b; > - return 0; > -} > - > -/** > - * DPDK callback to check the status of a tx descriptor. > - * > - * @param tx_queue > - * The tx queue. > - * @param[in] offset > - * The index of the descriptor in the ring. > - * > - * @return > - * The status of the tx descriptor. > - */ > -int > -mlx5_tx_descriptor_status(void *tx_queue, uint16_t offset) > -{ > - struct mlx5_txq_data *txq =3D tx_queue; > - uint16_t used; > - > - mlx5_tx_complete(txq); > - used =3D txq->elts_head - txq->elts_tail; > - if (offset < used) > - return RTE_ETH_TX_DESC_FULL; > - return RTE_ETH_TX_DESC_DONE; > -} > - > -/** > * Internal function to compute the number of used descriptors in an RX q= ueue > * > * @param rxq > @@ -655,7 +521,7 @@ > (1 << txq->cqe_n)); > mlx5_dump_debug_information(name, "MLX5 Error SQ:", > (const void *)((uintptr_t) > - tx_mlx5_wqe(txq, 0)), > + txq->wqes), > MLX5_WQE_SIZE * > (1 << txq->wqe_n)); > txq_ctrl->dump_file_n++; > @@ -683,1247 +549,6 @@ > } >=20 > /** > - * DPDK callback for TX. > - * > - * @param dpdk_txq > - * Generic pointer to TX queue structure. > - * @param[in] pkts > - * Packets to transmit. > - * @param pkts_n > - * Number of packets in array. > - * > - * @return > - * Number of packets successfully transmitted (<=3D pkts_n). > - */ > -uint16_t > -mlx5_tx_burst(void *dpdk_txq, struct rte_mbuf **pkts, uint16_t pkts_n) > -{ > - struct mlx5_txq_data *txq =3D (struct mlx5_txq_data *)dpdk_txq; > - uint16_t elts_head =3D txq->elts_head; > - const uint16_t elts_n =3D 1 << txq->elts_n; > - const uint16_t elts_m =3D elts_n - 1; > - unsigned int i =3D 0; > - unsigned int j =3D 0; > - unsigned int k =3D 0; > - uint16_t max_elts; > - uint16_t max_wqe; > - unsigned int comp; > - volatile struct mlx5_wqe_ctrl *last_wqe =3D NULL; > - unsigned int segs_n =3D 0; > - const unsigned int max_inline =3D txq->max_inline; > - uint64_t addr_64; > - > - if (unlikely(!pkts_n)) > - return 0; > - /* Prefetch first packet cacheline. */ > - rte_prefetch0(*pkts); > - /* Start processing. 
*/ > - mlx5_tx_complete(txq); > - max_elts =3D (elts_n - (elts_head - txq->elts_tail)); > - max_wqe =3D (1u << txq->wqe_n) - (txq->wqe_ci - txq->wqe_pi); > - if (unlikely(!max_wqe)) > - return 0; > - do { > - struct rte_mbuf *buf =3D *pkts; /* First_seg. */ > - uint8_t *raw; > - volatile struct mlx5_wqe_v *wqe =3D NULL; > - volatile rte_v128u32_t *dseg =3D NULL; > - uint32_t length; > - unsigned int ds =3D 0; > - unsigned int sg =3D 0; /* counter of additional segs attached. */ > - uintptr_t addr; > - uint16_t pkt_inline_sz =3D MLX5_WQE_DWORD_SIZE + 2; > - uint16_t tso_header_sz =3D 0; > - uint16_t ehdr; > - uint8_t cs_flags; > - uint8_t tso =3D txq->tso_en && (buf->ol_flags & PKT_TX_TCP_SEG); > - uint32_t swp_offsets =3D 0; > - uint8_t swp_types =3D 0; > - rte_be32_t metadata; > - uint16_t tso_segsz =3D 0; > -#ifdef MLX5_PMD_SOFT_COUNTERS > - uint32_t total_length =3D 0; > -#endif > - int ret; > - > - segs_n =3D buf->nb_segs; > - /* > - * Make sure there is enough room to store this packet and > - * that one ring entry remains unused. > - */ > - assert(segs_n); > - if (max_elts < segs_n) > - break; > - max_elts -=3D segs_n; > - sg =3D --segs_n; > - if (unlikely(--max_wqe =3D=3D 0)) > - break; > - wqe =3D (volatile struct mlx5_wqe_v *) > - tx_mlx5_wqe(txq, txq->wqe_ci); > - rte_prefetch0(tx_mlx5_wqe(txq, txq->wqe_ci + 1)); > - if (pkts_n - i > 1) > - rte_prefetch0(*(pkts + 1)); > - addr =3D rte_pktmbuf_mtod(buf, uintptr_t); > - length =3D DATA_LEN(buf); > - ehdr =3D (((uint8_t *)addr)[1] << 8) | > - ((uint8_t *)addr)[0]; > -#ifdef MLX5_PMD_SOFT_COUNTERS > - total_length =3D length; > -#endif > - if (length < (MLX5_WQE_DWORD_SIZE + 2)) { > - txq->stats.oerrors++; > - break; > - } > - /* Update element. */ > - (*txq->elts)[elts_head & elts_m] =3D buf; > - /* Prefetch next buffer data. */ > - if (pkts_n - i > 1) > - rte_prefetch0( > - rte_pktmbuf_mtod(*(pkts + 1), volatile void *)); > - cs_flags =3D txq_ol_cksum_to_cs(buf); > - txq_mbuf_to_swp(txq, buf, (uint8_t *)&swp_offsets, &swp_types); > - raw =3D ((uint8_t *)(uintptr_t)wqe) + 2 * MLX5_WQE_DWORD_SIZE; > - /* Copy metadata from mbuf if valid */ > - metadata =3D buf->ol_flags & PKT_TX_METADATA ? buf->tx_metadata : > - 0; > - /* Replace the Ethernet type by the VLAN if necessary. */ > - if (buf->ol_flags & PKT_TX_VLAN_PKT) { > - uint32_t vlan =3D rte_cpu_to_be_32(0x81000000 | > - buf->vlan_tci); > - unsigned int len =3D 2 * RTE_ETHER_ADDR_LEN - 2; > - > - addr +=3D 2; > - length -=3D 2; > - /* Copy Destination and source mac address. */ > - memcpy((uint8_t *)raw, ((uint8_t *)addr), len); > - /* Copy VLAN. */ > - memcpy((uint8_t *)raw + len, &vlan, sizeof(vlan)); > - /* Copy missing two bytes to end the DSeg. */ > - memcpy((uint8_t *)raw + len + sizeof(vlan), > - ((uint8_t *)addr) + len, 2); > - addr +=3D len + 2; > - length -=3D (len + 2); > - } else { > - memcpy((uint8_t *)raw, ((uint8_t *)addr) + 2, > - MLX5_WQE_DWORD_SIZE); > - length -=3D pkt_inline_sz; > - addr +=3D pkt_inline_sz; > - } > - raw +=3D MLX5_WQE_DWORD_SIZE; > - if (tso) { > - ret =3D inline_tso(txq, buf, &length, > - &addr, &pkt_inline_sz, > - &raw, &max_wqe, > - &tso_segsz, &tso_header_sz); > - if (ret =3D=3D -EINVAL) { > - break; > - } else if (ret =3D=3D -EAGAIN) { > - /* NOP WQE. 
*/ > - wqe->ctrl =3D (rte_v128u32_t){ > - rte_cpu_to_be_32(txq->wqe_ci << 8), > - rte_cpu_to_be_32(txq->qp_num_8s | 1), > - rte_cpu_to_be_32 > - (MLX5_COMP_ONLY_FIRST_ERR << > - MLX5_COMP_MODE_OFFSET), > - 0, > - }; > - ds =3D 1; > -#ifdef MLX5_PMD_SOFT_COUNTERS > - total_length =3D 0; > -#endif > - k++; > - goto next_wqe; > - } > - } > - /* Inline if enough room. */ > - if (max_inline || tso) { > - uint32_t inl =3D 0; > - uintptr_t end =3D (uintptr_t) > - (((uintptr_t)txq->wqes) + > - (1 << txq->wqe_n) * MLX5_WQE_SIZE); > - unsigned int inline_room =3D max_inline * > - RTE_CACHE_LINE_SIZE - > - (pkt_inline_sz - 2) - > - !!tso * sizeof(inl); > - uintptr_t addr_end; > - unsigned int copy_b; > - > -pkt_inline: > - addr_end =3D RTE_ALIGN_FLOOR(addr + inline_room, > - RTE_CACHE_LINE_SIZE); > - copy_b =3D (addr_end > addr) ? > - RTE_MIN((addr_end - addr), length) : 0; > - if (copy_b && ((end - (uintptr_t)raw) > > - (copy_b + sizeof(inl)))) { > - /* > - * One Dseg remains in the current WQE. To > - * keep the computation positive, it is > - * removed after the bytes to Dseg conversion. > - */ > - uint16_t n =3D (MLX5_WQE_DS(copy_b) - 1 + 3) / 4; > - > - if (unlikely(max_wqe < n)) > - break; > - max_wqe -=3D n; > - if (tso) { > - assert(inl =3D=3D 0); > - inl =3D rte_cpu_to_be_32(copy_b | > - MLX5_INLINE_SEG); > - rte_memcpy((void *)raw, > - (void *)&inl, sizeof(inl)); > - raw +=3D sizeof(inl); > - pkt_inline_sz +=3D sizeof(inl); > - } > - rte_memcpy((void *)raw, (void *)addr, copy_b); > - addr +=3D copy_b; > - length -=3D copy_b; > - pkt_inline_sz +=3D copy_b; > - } > - /* > - * 2 DWORDs consumed by the WQE header + ETH segment + > - * the size of the inline part of the packet. > - */ > - ds =3D 2 + MLX5_WQE_DS(pkt_inline_sz - 2); > - if (length > 0) { > - if (ds % (MLX5_WQE_SIZE / > - MLX5_WQE_DWORD_SIZE) =3D=3D 0) { > - if (unlikely(--max_wqe =3D=3D 0)) > - break; > - dseg =3D (volatile rte_v128u32_t *) > - tx_mlx5_wqe(txq, txq->wqe_ci + > - ds / 4); > - } else { > - dseg =3D (volatile rte_v128u32_t *) > - ((uintptr_t)wqe + > - (ds * MLX5_WQE_DWORD_SIZE)); > - } > - goto use_dseg; > - } else if (!segs_n) { > - goto next_pkt; > - } else { > - /* > - * Further inline the next segment only for > - * non-TSO packets. > - */ > - if (!tso) { > - raw +=3D copy_b; > - inline_room -=3D copy_b; > - } else { > - inline_room =3D 0; > - } > - /* Move to the next segment. */ > - --segs_n; > - buf =3D buf->next; > - assert(buf); > - addr =3D rte_pktmbuf_mtod(buf, uintptr_t); > - length =3D DATA_LEN(buf); > -#ifdef MLX5_PMD_SOFT_COUNTERS > - total_length +=3D length; > -#endif > - (*txq->elts)[++elts_head & elts_m] =3D buf; > - goto pkt_inline; > - } > - } else { > - /* > - * No inline has been done in the packet, only the > - * Ethernet Header as been stored. > - */ > - dseg =3D (volatile rte_v128u32_t *) > - ((uintptr_t)wqe + (3 * MLX5_WQE_DWORD_SIZE)); > - ds =3D 3; > -use_dseg: > - /* Add the remaining packet as a simple ds. */ > - addr_64 =3D rte_cpu_to_be_64(addr); > - *dseg =3D (rte_v128u32_t){ > - rte_cpu_to_be_32(length), > - mlx5_tx_mb2mr(txq, buf), > - addr_64, > - addr_64 >> 32, > - }; > - ++ds; > - if (!segs_n) > - goto next_pkt; > - } > -next_seg: > - assert(buf); > - assert(ds); > - assert(wqe); > - /* > - * Spill on next WQE when the current one does not have > - * enough room left. Size of WQE must a be a multiple > - * of data segment size. 
> - */ > - assert(!(MLX5_WQE_SIZE % MLX5_WQE_DWORD_SIZE)); > - if (!(ds % (MLX5_WQE_SIZE / MLX5_WQE_DWORD_SIZE))) { > - if (unlikely(--max_wqe =3D=3D 0)) > - break; > - dseg =3D (volatile rte_v128u32_t *) > - tx_mlx5_wqe(txq, txq->wqe_ci + ds / 4); > - rte_prefetch0(tx_mlx5_wqe(txq, > - txq->wqe_ci + ds / 4 + 1)); > - } else { > - ++dseg; > - } > - ++ds; > - buf =3D buf->next; > - assert(buf); > - length =3D DATA_LEN(buf); > -#ifdef MLX5_PMD_SOFT_COUNTERS > - total_length +=3D length; > -#endif > - /* Store segment information. */ > - addr_64 =3D rte_cpu_to_be_64(rte_pktmbuf_mtod(buf, uintptr_t)); > - *dseg =3D (rte_v128u32_t){ > - rte_cpu_to_be_32(length), > - mlx5_tx_mb2mr(txq, buf), > - addr_64, > - addr_64 >> 32, > - }; > - (*txq->elts)[++elts_head & elts_m] =3D buf; > - if (--segs_n) > - goto next_seg; > -next_pkt: > - if (ds > MLX5_DSEG_MAX) { > - txq->stats.oerrors++; > - break; > - } > - ++elts_head; > - ++pkts; > - ++i; > - j +=3D sg; > - /* Initialize known and common part of the WQE structure. */ > - if (tso) { > - wqe->ctrl =3D (rte_v128u32_t){ > - rte_cpu_to_be_32((txq->wqe_ci << 8) | > - MLX5_OPCODE_TSO), > - rte_cpu_to_be_32(txq->qp_num_8s | ds), > - rte_cpu_to_be_32(MLX5_COMP_ONLY_FIRST_ERR << > - MLX5_COMP_MODE_OFFSET), > - 0, > - }; > - wqe->eseg =3D (rte_v128u32_t){ > - swp_offsets, > - cs_flags | (swp_types << 8) | > - (rte_cpu_to_be_16(tso_segsz) << 16), > - metadata, > - (ehdr << 16) | rte_cpu_to_be_16(tso_header_sz), > - }; > - } else { > - wqe->ctrl =3D (rte_v128u32_t){ > - rte_cpu_to_be_32((txq->wqe_ci << 8) | > - MLX5_OPCODE_SEND), > - rte_cpu_to_be_32(txq->qp_num_8s | ds), > - rte_cpu_to_be_32(MLX5_COMP_ONLY_FIRST_ERR << > - MLX5_COMP_MODE_OFFSET), > - 0, > - }; > - wqe->eseg =3D (rte_v128u32_t){ > - swp_offsets, > - cs_flags | (swp_types << 8), > - metadata, > - (ehdr << 16) | rte_cpu_to_be_16(pkt_inline_sz), > - }; > - } > -next_wqe: > - txq->wqe_ci +=3D (ds + 3) / 4; > - /* Save the last successful WQE for completion request */ > - last_wqe =3D (volatile struct mlx5_wqe_ctrl *)wqe; > -#ifdef MLX5_PMD_SOFT_COUNTERS > - /* Increment sent bytes counter. */ > - txq->stats.obytes +=3D total_length; > -#endif > - } while (i < pkts_n); > - /* Take a shortcut if nothing must be sent. */ > - if (unlikely((i + k) =3D=3D 0)) > - return 0; > - txq->elts_head +=3D (i + j); > - /* Check whether completion threshold has been reached. */ > - comp =3D txq->elts_comp + i + j + k; > - if (comp >=3D MLX5_TX_COMP_THRESH) { > - /* A CQE slot must always be available. */ > - assert((1u << txq->cqe_n) - (txq->cq_pi++ - txq->cq_ci)); > - /* Request completion on last WQE. */ > - last_wqe->ctrl2 =3D rte_cpu_to_be_32(MLX5_COMP_ALWAYS << > - MLX5_COMP_MODE_OFFSET); > - /* Save elts_head in unused "immediate" field of WQE. */ > - last_wqe->ctrl3 =3D txq->elts_head; > - txq->elts_comp =3D 0; > - } else { > - txq->elts_comp =3D comp; > - } > -#ifdef MLX5_PMD_SOFT_COUNTERS > - /* Increment sent packets counter. */ > - txq->stats.opackets +=3D i; > -#endif > - /* Ring QP doorbell. */ > - mlx5_tx_dbrec(txq, (volatile struct mlx5_wqe *)last_wqe); > - return i; > -} > - > -/** > - * Open a MPW session. > - * > - * @param txq > - * Pointer to TX queue structure. > - * @param mpw > - * Pointer to MPW session structure. > - * @param length > - * Packet length. 
> - */ > -static inline void > -mlx5_mpw_new(struct mlx5_txq_data *txq, struct mlx5_mpw *mpw, uint32_t l= ength) > -{ > - uint16_t idx =3D txq->wqe_ci & ((1 << txq->wqe_n) - 1); > - volatile struct mlx5_wqe_data_seg (*dseg)[MLX5_MPW_DSEG_MAX] =3D > - (volatile struct mlx5_wqe_data_seg (*)[]) > - tx_mlx5_wqe(txq, idx + 1); > - > - mpw->state =3D MLX5_MPW_STATE_OPENED; > - mpw->pkts_n =3D 0; > - mpw->len =3D length; > - mpw->total_len =3D 0; > - mpw->wqe =3D (volatile struct mlx5_wqe *)tx_mlx5_wqe(txq, idx); > - mpw->wqe->eseg.mss =3D rte_cpu_to_be_16(length); > - mpw->wqe->eseg.inline_hdr_sz =3D 0; > - mpw->wqe->eseg.rsvd0 =3D 0; > - mpw->wqe->eseg.rsvd1 =3D 0; > - mpw->wqe->eseg.flow_table_metadata =3D 0; > - mpw->wqe->ctrl[0] =3D rte_cpu_to_be_32((MLX5_OPC_MOD_MPW << 24) | > - (txq->wqe_ci << 8) | > - MLX5_OPCODE_TSO); > - mpw->wqe->ctrl[2] =3D rte_cpu_to_be_32(MLX5_COMP_ONLY_FIRST_ERR << > - MLX5_COMP_MODE_OFFSET); > - mpw->wqe->ctrl[3] =3D 0; > - mpw->data.dseg[0] =3D (volatile struct mlx5_wqe_data_seg *) > - (((uintptr_t)mpw->wqe) + (2 * MLX5_WQE_DWORD_SIZE)); > - mpw->data.dseg[1] =3D (volatile struct mlx5_wqe_data_seg *) > - (((uintptr_t)mpw->wqe) + (3 * MLX5_WQE_DWORD_SIZE)); > - mpw->data.dseg[2] =3D &(*dseg)[0]; > - mpw->data.dseg[3] =3D &(*dseg)[1]; > - mpw->data.dseg[4] =3D &(*dseg)[2]; > -} > - > -/** > - * Close a MPW session. > - * > - * @param txq > - * Pointer to TX queue structure. > - * @param mpw > - * Pointer to MPW session structure. > - */ > -static inline void > -mlx5_mpw_close(struct mlx5_txq_data *txq, struct mlx5_mpw *mpw) > -{ > - unsigned int num =3D mpw->pkts_n; > - > - /* > - * Store size in multiple of 16 bytes. Control and Ethernet segments > - * count as 2. > - */ > - mpw->wqe->ctrl[1] =3D rte_cpu_to_be_32(txq->qp_num_8s | (2 + num)); > - mpw->state =3D MLX5_MPW_STATE_CLOSED; > - if (num < 3) > - ++txq->wqe_ci; > - else > - txq->wqe_ci +=3D 2; > - rte_prefetch0(tx_mlx5_wqe(txq, txq->wqe_ci)); > - rte_prefetch0(tx_mlx5_wqe(txq, txq->wqe_ci + 1)); > -} > - > -/** > - * DPDK callback for TX with MPW support. > - * > - * @param dpdk_txq > - * Generic pointer to TX queue structure. > - * @param[in] pkts > - * Packets to transmit. > - * @param pkts_n > - * Number of packets in array. > - * > - * @return > - * Number of packets successfully transmitted (<=3D pkts_n). > - */ > -uint16_t > -mlx5_tx_burst_mpw(void *dpdk_txq, struct rte_mbuf **pkts, uint16_t pkts_= n) > -{ > - struct mlx5_txq_data *txq =3D (struct mlx5_txq_data *)dpdk_txq; > - uint16_t elts_head =3D txq->elts_head; > - const uint16_t elts_n =3D 1 << txq->elts_n; > - const uint16_t elts_m =3D elts_n - 1; > - unsigned int i =3D 0; > - unsigned int j =3D 0; > - uint16_t max_elts; > - uint16_t max_wqe; > - unsigned int comp; > - struct mlx5_mpw mpw =3D { > - .state =3D MLX5_MPW_STATE_CLOSED, > - }; > - > - if (unlikely(!pkts_n)) > - return 0; > - /* Prefetch first packet cacheline. */ > - rte_prefetch0(tx_mlx5_wqe(txq, txq->wqe_ci)); > - rte_prefetch0(tx_mlx5_wqe(txq, txq->wqe_ci + 1)); > - /* Start processing. */ > - mlx5_tx_complete(txq); > - max_elts =3D (elts_n - (elts_head - txq->elts_tail)); > - max_wqe =3D (1u << txq->wqe_n) - (txq->wqe_ci - txq->wqe_pi); > - if (unlikely(!max_wqe)) > - return 0; > - do { > - struct rte_mbuf *buf =3D *(pkts++); > - uint32_t length; > - unsigned int segs_n =3D buf->nb_segs; > - uint32_t cs_flags; > - rte_be32_t metadata; > - > - /* > - * Make sure there is enough room to store this packet and > - * that one ring entry remains unused. 
> - */ > - assert(segs_n); > - if (max_elts < segs_n) > - break; > - /* Do not bother with large packets MPW cannot handle. */ > - if (segs_n > MLX5_MPW_DSEG_MAX) { > - txq->stats.oerrors++; > - break; > - } > - max_elts -=3D segs_n; > - --pkts_n; > - cs_flags =3D txq_ol_cksum_to_cs(buf); > - /* Copy metadata from mbuf if valid */ > - metadata =3D buf->ol_flags & PKT_TX_METADATA ? buf->tx_metadata : > - 0; > - /* Retrieve packet information. */ > - length =3D PKT_LEN(buf); > - assert(length); > - /* Start new session if packet differs. */ > - if ((mpw.state =3D=3D MLX5_MPW_STATE_OPENED) && > - ((mpw.len !=3D length) || > - (segs_n !=3D 1) || > - (mpw.wqe->eseg.flow_table_metadata !=3D metadata) || > - (mpw.wqe->eseg.cs_flags !=3D cs_flags))) > - mlx5_mpw_close(txq, &mpw); > - if (mpw.state =3D=3D MLX5_MPW_STATE_CLOSED) { > - /* > - * Multi-Packet WQE consumes at most two WQE. > - * mlx5_mpw_new() expects to be able to use such > - * resources. > - */ > - if (unlikely(max_wqe < 2)) > - break; > - max_wqe -=3D 2; > - mlx5_mpw_new(txq, &mpw, length); > - mpw.wqe->eseg.cs_flags =3D cs_flags; > - mpw.wqe->eseg.flow_table_metadata =3D metadata; > - } > - /* Multi-segment packets must be alone in their MPW. */ > - assert((segs_n =3D=3D 1) || (mpw.pkts_n =3D=3D 0)); > -#if defined(MLX5_PMD_SOFT_COUNTERS) || !defined(NDEBUG) > - length =3D 0; > -#endif > - do { > - volatile struct mlx5_wqe_data_seg *dseg; > - uintptr_t addr; > - > - assert(buf); > - (*txq->elts)[elts_head++ & elts_m] =3D buf; > - dseg =3D mpw.data.dseg[mpw.pkts_n]; > - addr =3D rte_pktmbuf_mtod(buf, uintptr_t); > - *dseg =3D (struct mlx5_wqe_data_seg){ > - .byte_count =3D rte_cpu_to_be_32(DATA_LEN(buf)), > - .lkey =3D mlx5_tx_mb2mr(txq, buf), > - .addr =3D rte_cpu_to_be_64(addr), > - }; > -#if defined(MLX5_PMD_SOFT_COUNTERS) || !defined(NDEBUG) > - length +=3D DATA_LEN(buf); > -#endif > - buf =3D buf->next; > - ++mpw.pkts_n; > - ++j; > - } while (--segs_n); > - assert(length =3D=3D mpw.len); > - if (mpw.pkts_n =3D=3D MLX5_MPW_DSEG_MAX) > - mlx5_mpw_close(txq, &mpw); > -#ifdef MLX5_PMD_SOFT_COUNTERS > - /* Increment sent bytes counter. */ > - txq->stats.obytes +=3D length; > -#endif > - ++i; > - } while (pkts_n); > - /* Take a shortcut if nothing must be sent. */ > - if (unlikely(i =3D=3D 0)) > - return 0; > - /* Check whether completion threshold has been reached. */ > - /* "j" includes both packets and segments. */ > - comp =3D txq->elts_comp + j; > - if (comp >=3D MLX5_TX_COMP_THRESH) { > - volatile struct mlx5_wqe *wqe =3D mpw.wqe; > - > - /* A CQE slot must always be available. */ > - assert((1u << txq->cqe_n) - (txq->cq_pi++ - txq->cq_ci)); > - /* Request completion on last WQE. */ > - wqe->ctrl[2] =3D rte_cpu_to_be_32(MLX5_COMP_ALWAYS << > - MLX5_COMP_MODE_OFFSET); > - /* Save elts_head in unused "immediate" field of WQE. */ > - wqe->ctrl[3] =3D elts_head; > - txq->elts_comp =3D 0; > - } else { > - txq->elts_comp =3D comp; > - } > -#ifdef MLX5_PMD_SOFT_COUNTERS > - /* Increment sent packets counter. */ > - txq->stats.opackets +=3D i; > -#endif > - /* Ring QP doorbell. */ > - if (mpw.state =3D=3D MLX5_MPW_STATE_OPENED) > - mlx5_mpw_close(txq, &mpw); > - mlx5_tx_dbrec(txq, mpw.wqe); > - txq->elts_head =3D elts_head; > - return i; > -} > - > -/** > - * Open a MPW inline session. > - * > - * @param txq > - * Pointer to TX queue structure. > - * @param mpw > - * Pointer to MPW session structure. > - * @param length > - * Packet length. 
> - */ > -static inline void > -mlx5_mpw_inline_new(struct mlx5_txq_data *txq, struct mlx5_mpw *mpw, > - uint32_t length) > -{ > - uint16_t idx =3D txq->wqe_ci & ((1 << txq->wqe_n) - 1); > - struct mlx5_wqe_inl_small *inl; > - > - mpw->state =3D MLX5_MPW_INL_STATE_OPENED; > - mpw->pkts_n =3D 0; > - mpw->len =3D length; > - mpw->total_len =3D 0; > - mpw->wqe =3D (volatile struct mlx5_wqe *)tx_mlx5_wqe(txq, idx); > - mpw->wqe->ctrl[0] =3D rte_cpu_to_be_32((MLX5_OPC_MOD_MPW << 24) | > - (txq->wqe_ci << 8) | > - MLX5_OPCODE_TSO); > - mpw->wqe->ctrl[2] =3D rte_cpu_to_be_32(MLX5_COMP_ONLY_FIRST_ERR << > - MLX5_COMP_MODE_OFFSET); > - mpw->wqe->ctrl[3] =3D 0; > - mpw->wqe->eseg.mss =3D rte_cpu_to_be_16(length); > - mpw->wqe->eseg.inline_hdr_sz =3D 0; > - mpw->wqe->eseg.cs_flags =3D 0; > - mpw->wqe->eseg.rsvd0 =3D 0; > - mpw->wqe->eseg.rsvd1 =3D 0; > - mpw->wqe->eseg.flow_table_metadata =3D 0; > - inl =3D (struct mlx5_wqe_inl_small *) > - (((uintptr_t)mpw->wqe) + 2 * MLX5_WQE_DWORD_SIZE); > - mpw->data.raw =3D (uint8_t *)&inl->raw; > -} > - > -/** > - * Close a MPW inline session. > - * > - * @param txq > - * Pointer to TX queue structure. > - * @param mpw > - * Pointer to MPW session structure. > - */ > -static inline void > -mlx5_mpw_inline_close(struct mlx5_txq_data *txq, struct mlx5_mpw *mpw) > -{ > - unsigned int size; > - struct mlx5_wqe_inl_small *inl =3D (struct mlx5_wqe_inl_small *) > - (((uintptr_t)mpw->wqe) + (2 * MLX5_WQE_DWORD_SIZE)); > - > - size =3D MLX5_WQE_SIZE - MLX5_MWQE64_INL_DATA + mpw->total_len; > - /* > - * Store size in multiple of 16 bytes. Control and Ethernet segments > - * count as 2. > - */ > - mpw->wqe->ctrl[1] =3D rte_cpu_to_be_32(txq->qp_num_8s | > - MLX5_WQE_DS(size)); > - mpw->state =3D MLX5_MPW_STATE_CLOSED; > - inl->byte_cnt =3D rte_cpu_to_be_32(mpw->total_len | MLX5_INLINE_SEG); > - txq->wqe_ci +=3D (size + (MLX5_WQE_SIZE - 1)) / MLX5_WQE_SIZE; > -} > - > -/** > - * DPDK callback for TX with MPW inline support. > - * > - * @param dpdk_txq > - * Generic pointer to TX queue structure. > - * @param[in] pkts > - * Packets to transmit. > - * @param pkts_n > - * Number of packets in array. > - * > - * @return > - * Number of packets successfully transmitted (<=3D pkts_n). > - */ > -uint16_t > -mlx5_tx_burst_mpw_inline(void *dpdk_txq, struct rte_mbuf **pkts, > - uint16_t pkts_n) > -{ > - struct mlx5_txq_data *txq =3D (struct mlx5_txq_data *)dpdk_txq; > - uint16_t elts_head =3D txq->elts_head; > - const uint16_t elts_n =3D 1 << txq->elts_n; > - const uint16_t elts_m =3D elts_n - 1; > - unsigned int i =3D 0; > - unsigned int j =3D 0; > - uint16_t max_elts; > - uint16_t max_wqe; > - unsigned int comp; > - unsigned int inline_room =3D txq->max_inline * RTE_CACHE_LINE_SIZE; > - struct mlx5_mpw mpw =3D { > - .state =3D MLX5_MPW_STATE_CLOSED, > - }; > - /* > - * Compute the maximum number of WQE which can be consumed by inline > - * code. > - * - 2 DSEG for: > - * - 1 control segment, > - * - 1 Ethernet segment, > - * - N Dseg from the inline request. > - */ > - const unsigned int wqe_inl_n =3D > - ((2 * MLX5_WQE_DWORD_SIZE + > - txq->max_inline * RTE_CACHE_LINE_SIZE) + > - RTE_CACHE_LINE_SIZE - 1) / RTE_CACHE_LINE_SIZE; > - > - if (unlikely(!pkts_n)) > - return 0; > - /* Prefetch first packet cacheline. */ > - rte_prefetch0(tx_mlx5_wqe(txq, txq->wqe_ci)); > - rte_prefetch0(tx_mlx5_wqe(txq, txq->wqe_ci + 1)); > - /* Start processing. 
*/ > - mlx5_tx_complete(txq); > - max_elts =3D (elts_n - (elts_head - txq->elts_tail)); > - do { > - struct rte_mbuf *buf =3D *(pkts++); > - uintptr_t addr; > - uint32_t length; > - unsigned int segs_n =3D buf->nb_segs; > - uint8_t cs_flags; > - rte_be32_t metadata; > - > - /* > - * Make sure there is enough room to store this packet and > - * that one ring entry remains unused. > - */ > - assert(segs_n); > - if (max_elts < segs_n) > - break; > - /* Do not bother with large packets MPW cannot handle. */ > - if (segs_n > MLX5_MPW_DSEG_MAX) { > - txq->stats.oerrors++; > - break; > - } > - max_elts -=3D segs_n; > - --pkts_n; > - /* > - * Compute max_wqe in case less WQE were consumed in previous > - * iteration. > - */ > - max_wqe =3D (1u << txq->wqe_n) - (txq->wqe_ci - txq->wqe_pi); > - cs_flags =3D txq_ol_cksum_to_cs(buf); > - /* Copy metadata from mbuf if valid */ > - metadata =3D buf->ol_flags & PKT_TX_METADATA ? buf->tx_metadata : > - 0; > - /* Retrieve packet information. */ > - length =3D PKT_LEN(buf); > - /* Start new session if packet differs. */ > - if (mpw.state =3D=3D MLX5_MPW_STATE_OPENED) { > - if ((mpw.len !=3D length) || > - (segs_n !=3D 1) || > - (mpw.wqe->eseg.flow_table_metadata !=3D metadata) || > - (mpw.wqe->eseg.cs_flags !=3D cs_flags)) > - mlx5_mpw_close(txq, &mpw); > - } else if (mpw.state =3D=3D MLX5_MPW_INL_STATE_OPENED) { > - if ((mpw.len !=3D length) || > - (segs_n !=3D 1) || > - (length > inline_room) || > - (mpw.wqe->eseg.flow_table_metadata !=3D metadata) || > - (mpw.wqe->eseg.cs_flags !=3D cs_flags)) { > - mlx5_mpw_inline_close(txq, &mpw); > - inline_room =3D > - txq->max_inline * RTE_CACHE_LINE_SIZE; > - } > - } > - if (mpw.state =3D=3D MLX5_MPW_STATE_CLOSED) { > - if ((segs_n !=3D 1) || > - (length > inline_room)) { > - /* > - * Multi-Packet WQE consumes at most two WQE. > - * mlx5_mpw_new() expects to be able to use > - * such resources. > - */ > - if (unlikely(max_wqe < 2)) > - break; > - max_wqe -=3D 2; > - mlx5_mpw_new(txq, &mpw, length); > - mpw.wqe->eseg.cs_flags =3D cs_flags; > - mpw.wqe->eseg.flow_table_metadata =3D metadata; > - } else { > - if (unlikely(max_wqe < wqe_inl_n)) > - break; > - max_wqe -=3D wqe_inl_n; > - mlx5_mpw_inline_new(txq, &mpw, length); > - mpw.wqe->eseg.cs_flags =3D cs_flags; > - mpw.wqe->eseg.flow_table_metadata =3D metadata; > - } > - } > - /* Multi-segment packets must be alone in their MPW. 
*/ > - assert((segs_n =3D=3D 1) || (mpw.pkts_n =3D=3D 0)); > - if (mpw.state =3D=3D MLX5_MPW_STATE_OPENED) { > - assert(inline_room =3D=3D > - txq->max_inline * RTE_CACHE_LINE_SIZE); > -#if defined(MLX5_PMD_SOFT_COUNTERS) || !defined(NDEBUG) > - length =3D 0; > -#endif > - do { > - volatile struct mlx5_wqe_data_seg *dseg; > - > - assert(buf); > - (*txq->elts)[elts_head++ & elts_m] =3D buf; > - dseg =3D mpw.data.dseg[mpw.pkts_n]; > - addr =3D rte_pktmbuf_mtod(buf, uintptr_t); > - *dseg =3D (struct mlx5_wqe_data_seg){ > - .byte_count =3D > - rte_cpu_to_be_32(DATA_LEN(buf)), > - .lkey =3D mlx5_tx_mb2mr(txq, buf), > - .addr =3D rte_cpu_to_be_64(addr), > - }; > -#if defined(MLX5_PMD_SOFT_COUNTERS) || !defined(NDEBUG) > - length +=3D DATA_LEN(buf); > -#endif > - buf =3D buf->next; > - ++mpw.pkts_n; > - ++j; > - } while (--segs_n); > - assert(length =3D=3D mpw.len); > - if (mpw.pkts_n =3D=3D MLX5_MPW_DSEG_MAX) > - mlx5_mpw_close(txq, &mpw); > - } else { > - unsigned int max; > - > - assert(mpw.state =3D=3D MLX5_MPW_INL_STATE_OPENED); > - assert(length <=3D inline_room); > - assert(length =3D=3D DATA_LEN(buf)); > - addr =3D rte_pktmbuf_mtod(buf, uintptr_t); > - (*txq->elts)[elts_head++ & elts_m] =3D buf; > - /* Maximum number of bytes before wrapping. */ > - max =3D ((((uintptr_t)(txq->wqes)) + > - (1 << txq->wqe_n) * > - MLX5_WQE_SIZE) - > - (uintptr_t)mpw.data.raw); > - if (length > max) { > - rte_memcpy((void *)(uintptr_t)mpw.data.raw, > - (void *)addr, > - max); > - mpw.data.raw =3D (volatile void *)txq->wqes; > - rte_memcpy((void *)(uintptr_t)mpw.data.raw, > - (void *)(addr + max), > - length - max); > - mpw.data.raw +=3D length - max; > - } else { > - rte_memcpy((void *)(uintptr_t)mpw.data.raw, > - (void *)addr, > - length); > - > - if (length =3D=3D max) > - mpw.data.raw =3D > - (volatile void *)txq->wqes; > - else > - mpw.data.raw +=3D length; > - } > - ++mpw.pkts_n; > - mpw.total_len +=3D length; > - ++j; > - if (mpw.pkts_n =3D=3D MLX5_MPW_DSEG_MAX) { > - mlx5_mpw_inline_close(txq, &mpw); > - inline_room =3D > - txq->max_inline * RTE_CACHE_LINE_SIZE; > - } else { > - inline_room -=3D length; > - } > - } > -#ifdef MLX5_PMD_SOFT_COUNTERS > - /* Increment sent bytes counter. */ > - txq->stats.obytes +=3D length; > -#endif > - ++i; > - } while (pkts_n); > - /* Take a shortcut if nothing must be sent. */ > - if (unlikely(i =3D=3D 0)) > - return 0; > - /* Check whether completion threshold has been reached. */ > - /* "j" includes both packets and segments. */ > - comp =3D txq->elts_comp + j; > - if (comp >=3D MLX5_TX_COMP_THRESH) { > - volatile struct mlx5_wqe *wqe =3D mpw.wqe; > - > - /* A CQE slot must always be available. */ > - assert((1u << txq->cqe_n) - (txq->cq_pi++ - txq->cq_ci)); > - /* Request completion on last WQE. */ > - wqe->ctrl[2] =3D rte_cpu_to_be_32(MLX5_COMP_ALWAYS << > - MLX5_COMP_MODE_OFFSET); > - /* Save elts_head in unused "immediate" field of WQE. */ > - wqe->ctrl[3] =3D elts_head; > - txq->elts_comp =3D 0; > - } else { > - txq->elts_comp =3D comp; > - } > -#ifdef MLX5_PMD_SOFT_COUNTERS > - /* Increment sent packets counter. */ > - txq->stats.opackets +=3D i; > -#endif > - /* Ring QP doorbell. */ > - if (mpw.state =3D=3D MLX5_MPW_INL_STATE_OPENED) > - mlx5_mpw_inline_close(txq, &mpw); > - else if (mpw.state =3D=3D MLX5_MPW_STATE_OPENED) > - mlx5_mpw_close(txq, &mpw); > - mlx5_tx_dbrec(txq, mpw.wqe); > - txq->elts_head =3D elts_head; > - return i; > -} > - > -/** > - * Open an Enhanced MPW session. > - * > - * @param txq > - * Pointer to TX queue structure. 
> - * @param mpw > - * Pointer to MPW session structure. > - * @param length > - * Packet length. > - */ > -static inline void > -mlx5_empw_new(struct mlx5_txq_data *txq, struct mlx5_mpw *mpw, int paddi= ng) > -{ > - uint16_t idx =3D txq->wqe_ci & ((1 << txq->wqe_n) - 1); > - > - mpw->state =3D MLX5_MPW_ENHANCED_STATE_OPENED; > - mpw->pkts_n =3D 0; > - mpw->total_len =3D sizeof(struct mlx5_wqe); > - mpw->wqe =3D (volatile struct mlx5_wqe *)tx_mlx5_wqe(txq, idx); > - mpw->wqe->ctrl[0] =3D > - rte_cpu_to_be_32((MLX5_OPC_MOD_ENHANCED_MPSW << 24) | > - (txq->wqe_ci << 8) | > - MLX5_OPCODE_ENHANCED_MPSW); > - mpw->wqe->ctrl[2] =3D rte_cpu_to_be_32(MLX5_COMP_ONLY_FIRST_ERR << > - MLX5_COMP_MODE_OFFSET); > - mpw->wqe->ctrl[3] =3D 0; > - memset((void *)(uintptr_t)&mpw->wqe->eseg, 0, MLX5_WQE_DWORD_SIZE); > - if (unlikely(padding)) { > - uintptr_t addr =3D (uintptr_t)(mpw->wqe + 1); > - > - /* Pad the first 2 DWORDs with zero-length inline header. */ > - *(volatile uint32_t *)addr =3D rte_cpu_to_be_32(MLX5_INLINE_SEG); > - *(volatile uint32_t *)(addr + MLX5_WQE_DWORD_SIZE) =3D > - rte_cpu_to_be_32(MLX5_INLINE_SEG); > - mpw->total_len +=3D 2 * MLX5_WQE_DWORD_SIZE; > - /* Start from the next WQEBB. */ > - mpw->data.raw =3D (volatile void *)(tx_mlx5_wqe(txq, idx + 1)); > - } else { > - mpw->data.raw =3D (volatile void *)(mpw->wqe + 1); > - } > -} > - > -/** > - * Close an Enhanced MPW session. > - * > - * @param txq > - * Pointer to TX queue structure. > - * @param mpw > - * Pointer to MPW session structure. > - * > - * @return > - * Number of consumed WQEs. > - */ > -static inline uint16_t > -mlx5_empw_close(struct mlx5_txq_data *txq, struct mlx5_mpw *mpw) > -{ > - uint16_t ret; > - > - /* Store size in multiple of 16 bytes. Control and Ethernet segments > - * count as 2. > - */ > - mpw->wqe->ctrl[1] =3D rte_cpu_to_be_32(txq->qp_num_8s | > - MLX5_WQE_DS(mpw->total_len)); > - mpw->state =3D MLX5_MPW_STATE_CLOSED; > - ret =3D (mpw->total_len + (MLX5_WQE_SIZE - 1)) / MLX5_WQE_SIZE; > - txq->wqe_ci +=3D ret; > - return ret; > -} > - > -/** > - * TX with Enhanced MPW support. > - * > - * @param txq > - * Pointer to TX queue structure. > - * @param[in] pkts > - * Packets to transmit. > - * @param pkts_n > - * Number of packets in array. > - * > - * @return > - * Number of packets successfully transmitted (<=3D pkts_n). > - */ > -static inline uint16_t > -txq_burst_empw(struct mlx5_txq_data *txq, struct rte_mbuf **pkts, > - uint16_t pkts_n) > -{ > - uint16_t elts_head =3D txq->elts_head; > - const uint16_t elts_n =3D 1 << txq->elts_n; > - const uint16_t elts_m =3D elts_n - 1; > - unsigned int i =3D 0; > - unsigned int j =3D 0; > - uint16_t max_elts; > - uint16_t max_wqe; > - unsigned int max_inline =3D txq->max_inline * RTE_CACHE_LINE_SIZE; > - unsigned int mpw_room =3D 0; > - unsigned int inl_pad =3D 0; > - uint32_t inl_hdr; > - uint64_t addr_64; > - struct mlx5_mpw mpw =3D { > - .state =3D MLX5_MPW_STATE_CLOSED, > - }; > - > - if (unlikely(!pkts_n)) > - return 0; > - /* Start processing. */ > - mlx5_tx_complete(txq); > - max_elts =3D (elts_n - (elts_head - txq->elts_tail)); > - max_wqe =3D (1u << txq->wqe_n) - (txq->wqe_ci - txq->wqe_pi); > - if (unlikely(!max_wqe)) > - return 0; > - do { > - struct rte_mbuf *buf =3D *(pkts++); > - uintptr_t addr; > - unsigned int do_inline =3D 0; /* Whether inline is possible. */ > - uint32_t length; > - uint8_t cs_flags; > - rte_be32_t metadata; > - > - /* Multi-segmented packet is handled in slow-path outside. 
> -/**
> - * TX with Enhanced MPW support.
> - *
> - * @param txq
> - *   Pointer to TX queue structure.
> - * @param[in] pkts
> - *   Packets to transmit.
> - * @param pkts_n
> - *   Number of packets in array.
> - *
> - * @return
> - *   Number of packets successfully transmitted (<= pkts_n).
> - */
> -static inline uint16_t
> -txq_burst_empw(struct mlx5_txq_data *txq, struct rte_mbuf **pkts,
> -	       uint16_t pkts_n)
> -{
> -	uint16_t elts_head = txq->elts_head;
> -	const uint16_t elts_n = 1 << txq->elts_n;
> -	const uint16_t elts_m = elts_n - 1;
> -	unsigned int i = 0;
> -	unsigned int j = 0;
> -	uint16_t max_elts;
> -	uint16_t max_wqe;
> -	unsigned int max_inline = txq->max_inline * RTE_CACHE_LINE_SIZE;
> -	unsigned int mpw_room = 0;
> -	unsigned int inl_pad = 0;
> -	uint32_t inl_hdr;
> -	uint64_t addr_64;
> -	struct mlx5_mpw mpw = {
> -		.state = MLX5_MPW_STATE_CLOSED,
> -	};
> -
> -	if (unlikely(!pkts_n))
> -		return 0;
> -	/* Start processing. */
> -	mlx5_tx_complete(txq);
> -	max_elts = (elts_n - (elts_head - txq->elts_tail));
> -	max_wqe = (1u << txq->wqe_n) - (txq->wqe_ci - txq->wqe_pi);
> -	if (unlikely(!max_wqe))
> -		return 0;
> -	do {
> -		struct rte_mbuf *buf = *(pkts++);
> -		uintptr_t addr;
> -		unsigned int do_inline = 0; /* Whether inline is possible. */
> -		uint32_t length;
> -		uint8_t cs_flags;
> -		rte_be32_t metadata;
> -
> -		/* Multi-segmented packet is handled in slow-path outside. */
> -		assert(NB_SEGS(buf) == 1);
> -		/* Make sure there is enough room to store this packet. */
> -		if (max_elts - j == 0)
> -			break;
> -		cs_flags = txq_ol_cksum_to_cs(buf);
> -		/* Copy metadata from mbuf if valid */
> -		metadata = buf->ol_flags & PKT_TX_METADATA ? buf->tx_metadata :
> -							     0;
> -		/* Retrieve packet information. */
> -		length = PKT_LEN(buf);
> -		/* Start new session if:
> -		 * - multi-segment packet
> -		 * - no space left even for a dseg
> -		 * - next packet can be inlined with a new WQE
> -		 * - cs_flag differs
> -		 */
> -		if (mpw.state == MLX5_MPW_ENHANCED_STATE_OPENED) {
> -			if ((inl_pad + sizeof(struct mlx5_wqe_data_seg) >
> -			     mpw_room) ||
> -			    (length <= txq->inline_max_packet_sz &&
> -			     inl_pad + sizeof(inl_hdr) + length >
> -			     mpw_room) ||
> -			    (mpw.wqe->eseg.flow_table_metadata != metadata) ||
> -			    (mpw.wqe->eseg.cs_flags != cs_flags))
> -				max_wqe -= mlx5_empw_close(txq, &mpw);
> -		}
> -		if (unlikely(mpw.state == MLX5_MPW_STATE_CLOSED)) {
> -			/* In Enhanced MPW, inline as much as the budget is
> -			 * allowed. The remaining space is to be filled with
> -			 * dsegs. If the title WQEBB isn't padded, it will have
> -			 * 2 dsegs there.
> -			 */
> -			mpw_room = RTE_MIN(MLX5_WQE_SIZE_MAX,
> -					   (max_inline ? max_inline :
> -					    pkts_n * MLX5_WQE_DWORD_SIZE) +
> -					   MLX5_WQE_SIZE);
> -			if (unlikely(max_wqe * MLX5_WQE_SIZE < mpw_room))
> -				break;
> -			/* Don't pad the title WQEBB to not waste WQ. */
> -			mlx5_empw_new(txq, &mpw, 0);
> -			mpw_room -= mpw.total_len;
> -			inl_pad = 0;
> -			do_inline = length <= txq->inline_max_packet_sz &&
> -				    sizeof(inl_hdr) + length <= mpw_room &&
> -				    !txq->mpw_hdr_dseg;
> -			mpw.wqe->eseg.cs_flags = cs_flags;
> -			mpw.wqe->eseg.flow_table_metadata = metadata;
> -		} else {
> -			/* Evaluate whether the next packet can be inlined.
> -			 * Inlining is possible when:
> -			 * - length is less than configured value
> -			 * - length fits for remaining space
> -			 * - not required to fill the title WQEBB with dsegs
> -			 */
> -			do_inline =
> -				length <= txq->inline_max_packet_sz &&
> -				inl_pad + sizeof(inl_hdr) + length <=
> -				mpw_room &&
> -				(!txq->mpw_hdr_dseg ||
> -				 mpw.total_len >= MLX5_WQE_SIZE);
> -		}
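The do_inline computation just above folds three independent conditions,
which is easier to review written out as a predicate. A sketch under the
same field names (the struct is a hypothetical stand-in for the
txq/session state, not a driver type):

#include <stdbool.h>
#include <stddef.h>

struct empw_ctx {              /* hypothetical session/queue state */
	unsigned int inline_max_packet_sz;
	unsigned int mpw_room;   /* bytes left in the current session */
	unsigned int inl_pad;    /* alignment padding before this packet */
	bool mpw_hdr_dseg;       /* title WQEBB must keep room for dsegs */
	unsigned int total_len;  /* bytes already taken in the session */
};

static bool
can_inline(const struct empw_ctx *c, unsigned int pkt_len, size_t inl_hdr_sz)
{
	return pkt_len <= c->inline_max_packet_sz &&
	       c->inl_pad + inl_hdr_sz + pkt_len <= c->mpw_room &&
	       (!c->mpw_hdr_dseg || c->total_len >= 64 /* MLX5_WQE_SIZE */);
}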
> -		if (max_inline && do_inline) {
> -			/* Inline packet into WQE. */
> -			unsigned int max;
> -
> -			assert(mpw.state == MLX5_MPW_ENHANCED_STATE_OPENED);
> -			assert(length == DATA_LEN(buf));
> -			inl_hdr = rte_cpu_to_be_32(length | MLX5_INLINE_SEG);
> -			addr = rte_pktmbuf_mtod(buf, uintptr_t);
> -			mpw.data.raw = (volatile void *)
> -				((uintptr_t)mpw.data.raw + inl_pad);
> -			max = tx_mlx5_wq_tailroom(txq,
> -					(void *)(uintptr_t)mpw.data.raw);
> -			/* Copy inline header. */
> -			mpw.data.raw = (volatile void *)
> -				mlx5_copy_to_wq(
> -					  (void *)(uintptr_t)mpw.data.raw,
> -					  &inl_hdr,
> -					  sizeof(inl_hdr),
> -					  (void *)(uintptr_t)txq->wqes,
> -					  max);
> -			max = tx_mlx5_wq_tailroom(txq,
> -					(void *)(uintptr_t)mpw.data.raw);
> -			/* Copy packet data. */
> -			mpw.data.raw = (volatile void *)
> -				mlx5_copy_to_wq(
> -					  (void *)(uintptr_t)mpw.data.raw,
> -					  (void *)addr,
> -					  length,
> -					  (void *)(uintptr_t)txq->wqes,
> -					  max);
> -			++mpw.pkts_n;
> -			mpw.total_len += (inl_pad + sizeof(inl_hdr) + length);
> -			/* No need to get completion as the entire packet is
> -			 * copied to WQ. Free the buf right away.
> -			 */
> -			rte_pktmbuf_free_seg(buf);
> -			mpw_room -= (inl_pad + sizeof(inl_hdr) + length);
> -			/* Add pad in the next packet if any. */
> -			inl_pad = (((uintptr_t)mpw.data.raw +
> -				    (MLX5_WQE_DWORD_SIZE - 1)) &
> -				   ~(MLX5_WQE_DWORD_SIZE - 1)) -
> -				  (uintptr_t)mpw.data.raw;
> -		} else {
> -			/* No inline. Load a dseg of packet pointer. */
> -			volatile rte_v128u32_t *dseg;
> -
> -			assert(mpw.state == MLX5_MPW_ENHANCED_STATE_OPENED);
> -			assert((inl_pad + sizeof(*dseg)) <= mpw_room);
> -			assert(length == DATA_LEN(buf));
> -			if (!tx_mlx5_wq_tailroom(txq,
> -					(void *)((uintptr_t)mpw.data.raw
> -						+ inl_pad)))
> -				dseg = (volatile void *)txq->wqes;
> -			else
> -				dseg = (volatile void *)
> -					((uintptr_t)mpw.data.raw +
> -					 inl_pad);
> -			(*txq->elts)[elts_head++ & elts_m] = buf;
> -			addr_64 = rte_cpu_to_be_64(rte_pktmbuf_mtod(buf,
> -								    uintptr_t));
> -			*dseg = (rte_v128u32_t) {
> -				rte_cpu_to_be_32(length),
> -				mlx5_tx_mb2mr(txq, buf),
> -				addr_64,
> -				addr_64 >> 32,
> -			};
> -			mpw.data.raw = (volatile void *)(dseg + 1);
> -			mpw.total_len += (inl_pad + sizeof(*dseg));
> -			++j;
> -			++mpw.pkts_n;
> -			mpw_room -= (inl_pad + sizeof(*dseg));
> -			inl_pad = 0;
> -		}
> -#ifdef MLX5_PMD_SOFT_COUNTERS
> -		/* Increment sent bytes counter. */
> -		txq->stats.obytes += length;
> -#endif
> -		++i;
> -	} while (i < pkts_n);
> -	/* Take a shortcut if nothing must be sent. */
> -	if (unlikely(i == 0))
> -		return 0;
> -	/* Check whether completion threshold has been reached. */
> -	if (txq->elts_comp + j >= MLX5_TX_COMP_THRESH ||
> -	    (uint16_t)(txq->wqe_ci - txq->mpw_comp) >=
> -	    (1 << txq->wqe_n) / MLX5_TX_COMP_THRESH_INLINE_DIV) {
> -		volatile struct mlx5_wqe *wqe = mpw.wqe;
> -
> -		/* A CQE slot must always be available. */
> -		assert((1u << txq->cqe_n) - (txq->cq_pi++ - txq->cq_ci));
> -		/* Request completion on last WQE. */
> -		wqe->ctrl[2] = rte_cpu_to_be_32(MLX5_COMP_ALWAYS <<
> -						MLX5_COMP_MODE_OFFSET);
> -		/* Save elts_head in unused "immediate" field of WQE. */
> -		wqe->ctrl[3] = elts_head;
> -		txq->elts_comp = 0;
> -		txq->mpw_comp = txq->wqe_ci;
> -	} else {
> -		txq->elts_comp += j;
> -	}
> -#ifdef MLX5_PMD_SOFT_COUNTERS
> -	/* Increment sent packets counter. */
> -	txq->stats.opackets += i;
> -#endif
> -	if (mpw.state == MLX5_MPW_ENHANCED_STATE_OPENED)
> -		mlx5_empw_close(txq, &mpw);
> -	/* Ring QP doorbell. */
> -	mlx5_tx_dbrec(txq, mpw.wqe);
> -	txq->elts_head = elts_head;
> -	return i;
> -}
> -
> -/**
> - * DPDK callback for TX with Enhanced MPW support.
> - *
> - * @param dpdk_txq
> - *   Generic pointer to TX queue structure.
> - * @param[in] pkts
> - *   Packets to transmit.
> - * @param pkts_n
> - *   Number of packets in array.
> - *
> - * @return
> - *   Number of packets successfully transmitted (<= pkts_n).
> - */
> -uint16_t
> -mlx5_tx_burst_empw(void *dpdk_txq, struct rte_mbuf **pkts, uint16_t pkts_n)
> -{
> -	struct mlx5_txq_data *txq = (struct mlx5_txq_data *)dpdk_txq;
> -	uint16_t nb_tx = 0;
> -
> -	while (pkts_n > nb_tx) {
> -		uint16_t n;
> -		uint16_t ret;
> -
> -		n = txq_count_contig_multi_seg(&pkts[nb_tx], pkts_n - nb_tx);
> -		if (n) {
> -			ret = mlx5_tx_burst(dpdk_txq, &pkts[nb_tx], n);
> -			if (!ret)
> -				break;
> -			nb_tx += ret;
> -		}
> -		n = txq_count_contig_single_seg(&pkts[nb_tx], pkts_n - nb_tx);
> -		if (n) {
> -			ret = txq_burst_empw(txq, &pkts[nb_tx], n);
> -			if (!ret)
> -				break;
> -			nb_tx += ret;
> -		}
> -	}
> -	return nb_tx;
> -}
> -
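The dispatch loop being removed here is a pattern worth remembering:
alternate between a slow routine for multi-seg runs and a fast routine
for single-seg runs, stopping as soon as either makes no progress. A
self-contained toy version (an int array models the packet list, where a
nonzero entry marks a multi-seg packet; real code checks NB_SEGS()):

#include <stdint.h>

static uint16_t
count_run(const int *pkts, uint16_t n, int multi)
{
	uint16_t pos;

	for (pos = 0; pos < n; ++pos)
		if ((pkts[pos] != 0) != (multi != 0))
			break;
	return pos;
}

typedef uint16_t (*burst_fn)(const int *pkts, uint16_t n);

static uint16_t
tx_burst_split(const int *pkts, uint16_t pkts_n, burst_fn slow, burst_fn fast)
{
	uint16_t nb_tx = 0;

	while (pkts_n > nb_tx) {
		uint16_t n, ret;

		n = count_run(&pkts[nb_tx], pkts_n - nb_tx, 1);
		if (n) {
			ret = slow(&pkts[nb_tx], n); /* multi-seg run */
			if (!ret)
				break; /* no progress: queue exhausted */
			nb_tx += ret;
		}
		n = count_run(&pkts[nb_tx], pkts_n - nb_tx, 0);
		if (n) {
			ret = fast(&pkts[nb_tx], n); /* single-seg run */
			if (!ret)
				break;
			nb_tx += ret;
		}
	}
	return nb_tx;
}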
> -/**
>  * Translate RX completion flags to packet type.
>  *
>  * @param[in] rxq
> @@ -2867,22 +1492,6 @@
>  */
>
>  __rte_weak uint16_t
> -mlx5_tx_burst_raw_vec(void *dpdk_txq __rte_unused,
> -		      struct rte_mbuf **pkts __rte_unused,
> -		      uint16_t pkts_n __rte_unused)
> -{
> -	return 0;
> -}
> -
> -__rte_weak uint16_t
> -mlx5_tx_burst_vec(void *dpdk_txq __rte_unused,
> -		  struct rte_mbuf **pkts __rte_unused,
> -		  uint16_t pkts_n __rte_unused)
> -{
> -	return 0;
> -}
> -
> -__rte_weak uint16_t
>  mlx5_rx_burst_vec(void *dpdk_txq __rte_unused,
>  		  struct rte_mbuf **pkts __rte_unused,
>  		  uint16_t pkts_n __rte_unused)
> @@ -2891,25 +1500,50 @@
>  }
>
>  __rte_weak int
> -mlx5_check_raw_vec_tx_support(struct rte_eth_dev *dev __rte_unused)
> +mlx5_rxq_check_vec_support(struct mlx5_rxq_data *rxq __rte_unused)
>  {
>  	return -ENOTSUP;
>  }
>
>  __rte_weak int
> -mlx5_check_vec_tx_support(struct rte_eth_dev *dev __rte_unused)
> +mlx5_check_vec_rx_support(struct rte_eth_dev *dev __rte_unused)
>  {
>  	return -ENOTSUP;
>  }
>
> -__rte_weak int
> -mlx5_rxq_check_vec_support(struct mlx5_rxq_data *rxq __rte_unused)
> +/**
> + * DPDK callback to check the status of a tx descriptor.
> + *
> + * @param tx_queue
> + *   The tx queue.
> + * @param[in] offset
> + *   The index of the descriptor in the ring.
> + *
> + * @return
> + *   The status of the tx descriptor.
> + */
> +int
> +mlx5_tx_descriptor_status(void *tx_queue, uint16_t offset)
>  {
> -	return -ENOTSUP;
> +	(void)tx_queue;
> +	(void)offset;
> +	return RTE_ETH_TX_DESC_FULL;
>  }
>
> -__rte_weak int
> -mlx5_check_vec_rx_support(struct rte_eth_dev *dev __rte_unused)
> +/**
> + * Configure the TX function to use.
> + *
> + * @param dev
> + *   Pointer to private data structure.
> + *
> + * @return
> + *   Pointer to selected Tx burst function.
> + */
> +eth_tx_burst_t
> +mlx5_select_tx_function(struct rte_eth_dev *dev)
>  {
> -	return -ENOTSUP;
> +	(void)dev;
> +	return removed_tx_burst;
>  }
> +
> +
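On the weak stubs kept above: the pattern lets the common object provide
a fallback that the per-arch vector objects override at link time. A
two-file sketch of the mechanism using the underlying GCC attribute
(__rte_weak expands to it; the function name here is invented):

/* vec_stub.c: fallback that links when no vector implementation is
 * compiled in. */
__attribute__((weak)) int
rx_vec_supported(void)
{
	return -1; /* "not supported", in the spirit of -ENOTSUP */
}

/* vec_sse.c: when this object is built, its strong definition replaces
 * the weak stub above at link time. */
int
rx_vec_supported(void)
{
	return 1;
}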
> diff --git a/drivers/net/mlx5/mlx5_rxtx.h b/drivers/net/mlx5/mlx5_rxtx.h
> index 3d79c18..acde09d 100644
> --- a/drivers/net/mlx5/mlx5_rxtx.h
> +++ b/drivers/net/mlx5/mlx5_rxtx.h
> @@ -329,14 +329,6 @@ struct mlx5_txq_ctrl *mlx5_txq_new(struct rte_eth_dev *dev, uint16_t idx,
>  void mlx5_set_ptype_table(void);
>  void mlx5_set_cksum_table(void);
>  void mlx5_set_swp_types_table(void);
> -uint16_t mlx5_tx_burst(void *dpdk_txq, struct rte_mbuf **pkts,
> -		       uint16_t pkts_n);
> -uint16_t mlx5_tx_burst_mpw(void *dpdk_txq, struct rte_mbuf **pkts,
> -			   uint16_t pkts_n);
> -uint16_t mlx5_tx_burst_mpw_inline(void *dpdk_txq, struct rte_mbuf **pkts,
> -				  uint16_t pkts_n);
> -uint16_t mlx5_tx_burst_empw(void *dpdk_txq, struct rte_mbuf **pkts,
> -			    uint16_t pkts_n);
>  __rte_noinline uint16_t mlx5_tx_error_cqe_handle(struct mlx5_txq_data *txq,
>  					volatile struct mlx5_err_cqe *err_cqe);
>  uint16_t mlx5_rx_burst(void *dpdk_rxq, struct rte_mbuf **pkts, uint16_t pkts_n);
> @@ -360,14 +352,8 @@ int mlx5_queue_state_modify_primary(struct rte_eth_dev *dev,
>  			const struct mlx5_mp_arg_queue_state_modify *sm);
>
>  /* Vectorized version of mlx5_rxtx.c */
> -int mlx5_check_raw_vec_tx_support(struct rte_eth_dev *dev);
> -int mlx5_check_vec_tx_support(struct rte_eth_dev *dev);
>  int mlx5_rxq_check_vec_support(struct mlx5_rxq_data *rxq_data);
>  int mlx5_check_vec_rx_support(struct rte_eth_dev *dev);
> -uint16_t mlx5_tx_burst_raw_vec(void *dpdk_txq, struct rte_mbuf **pkts,
> -			       uint16_t pkts_n);
> -uint16_t mlx5_tx_burst_vec(void *dpdk_txq, struct rte_mbuf **pkts,
> -			   uint16_t pkts_n);
>  uint16_t mlx5_rx_burst_vec(void *dpdk_txq, struct rte_mbuf **pkts,
>  			   uint16_t pkts_n);
>
> @@ -478,122 +464,6 @@ enum mlx5_cqe_status {
>  }
>
>  /**
> - * Return the address of the WQE.
> - *
> - * @param txq
> - *   Pointer to TX queue structure.
> - * @param wqe_ci
> - *   WQE consumer index.
> - *
> - * @return
> - *   WQE address.
> - */
> -static inline uintptr_t *
> -tx_mlx5_wqe(struct mlx5_txq_data *txq, uint16_t ci)
> -{
> -	ci &= ((1 << txq->wqe_n) - 1);
> -	return (uintptr_t *)((uintptr_t)txq->wqes + ci * MLX5_WQE_SIZE);
> -}
> -
> -/**
> - * Handle the next CQE.
> - *
> - * @param txq
> - *   Pointer to TX queue structure.
> - *
> - * @return
> - *   The last Tx buffer element to free.
> - */
> -static __rte_always_inline uint16_t
> -mlx5_tx_cqe_handle(struct mlx5_txq_data *txq)
> -{
> -	const unsigned int cqe_n = 1 << txq->cqe_n;
> -	const unsigned int cqe_cnt = cqe_n - 1;
> -	uint16_t last_elts;
> -	union {
> -		volatile struct mlx5_cqe *cqe;
> -		volatile struct mlx5_err_cqe *err_cqe;
> -	} u = {
> -		.cqe = &(*txq->cqes)[txq->cq_ci & cqe_cnt],
> -	};
> -	int ret = check_cqe(u.cqe, cqe_n, txq->cq_ci);
> -
> -	if (unlikely(ret != MLX5_CQE_STATUS_SW_OWN)) {
> -		if (unlikely(ret == MLX5_CQE_STATUS_ERR))
> -			last_elts = mlx5_tx_error_cqe_handle(txq, u.err_cqe);
> -		else
> -			/* Do not release buffers. */
> -			return txq->elts_tail;
> -	} else {
> -		uint16_t new_wqe_pi = rte_be_to_cpu_16(u.cqe->wqe_counter);
> -		volatile struct mlx5_wqe_ctrl *ctrl =
> -			(volatile struct mlx5_wqe_ctrl *)
> -				tx_mlx5_wqe(txq, new_wqe_pi);
> -
> -		/* Release completion burst buffers. */
> -		last_elts = ctrl->ctrl3;
> -		txq->wqe_pi = new_wqe_pi;
> -		txq->cq_ci++;
> -	}
> -	rte_compiler_barrier();
> -	*txq->cq_db = rte_cpu_to_be_32(txq->cq_ci);
> -	return last_elts;
> -}
> -
> -/**
> - * Manage TX completions.
> - *
> - * When sending a burst, mlx5_tx_burst() posts several WRs.
> - *
> - * @param txq
> - *   Pointer to TX queue structure.
> - */
> -static __rte_always_inline void
> -mlx5_tx_complete(struct mlx5_txq_data *txq)
> -{
> -	const uint16_t elts_n = 1 << txq->elts_n;
> -	const uint16_t elts_m = elts_n - 1;
> -	uint16_t elts_free = txq->elts_tail;
> -	uint16_t elts_tail;
> -	struct rte_mbuf *m, *free[elts_n];
> -	struct rte_mempool *pool = NULL;
> -	unsigned int blk_n = 0;
> -
> -	elts_tail = mlx5_tx_cqe_handle(txq);
> -	assert((elts_tail & elts_m) < (1 << txq->wqe_n));
> -	/* Free buffers. */
> -	while (elts_free != elts_tail) {
> -		m = rte_pktmbuf_prefree_seg((*txq->elts)[elts_free++ & elts_m]);
> -		if (likely(m != NULL)) {
> -			if (likely(m->pool == pool)) {
> -				free[blk_n++] = m;
> -			} else {
> -				if (likely(pool != NULL))
> -					rte_mempool_put_bulk(pool,
> -							     (void *)free,
> -							     blk_n);
> -				free[0] = m;
> -				pool = m->pool;
> -				blk_n = 1;
> -			}
> -		}
> -	}
> -	if (blk_n)
> -		rte_mempool_put_bulk(pool, (void *)free, blk_n);
> -#ifndef NDEBUG
> -	elts_free = txq->elts_tail;
> -	/* Poisoning. */
> -	while (elts_free != elts_tail) {
> -		memset(&(*txq->elts)[elts_free & elts_m],
> -		       0x66,
> -		       sizeof((*txq->elts)[elts_free & elts_m]));
> -		++elts_free;
> -	}
> -#endif
> -	txq->elts_tail = elts_tail;
> -}
> -
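The batched free in mlx5_tx_complete() above is another reusable idiom:
return consecutive mbufs to their mempool in bulk, flushing whenever the
owning pool changes. Reduced to a standalone sketch (generic objects and
a printf stub instead of mbufs and rte_mempool_put_bulk()):

#include <stddef.h>
#include <stdio.h>

struct obj { void *pool; };

/* Stand-in for rte_mempool_put_bulk(): returns a run to one pool. */
static void
pool_put_bulk(void *pool, struct obj **objs, unsigned int n)
{
	(void)objs;
	printf("put %u objs back to pool %p\n", n, pool);
}

/* Free objects in bulk, flushing whenever the owning pool changes. */
static void
free_batched(struct obj **objs, unsigned int n)
{
	struct obj *grp[64];
	void *pool = NULL;
	unsigned int blk = 0, i;

	for (i = 0; i < n; i++) {
		if (objs[i]->pool == pool && blk < 64) {
			grp[blk++] = objs[i];
			continue;
		}
		if (blk)
			pool_put_bulk(pool, grp, blk); /* flush previous run */
		grp[0] = objs[i];
		pool = objs[i]->pool;
		blk = 1;
	}
	if (blk)
		pool_put_bulk(pool, grp, blk);
}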
> -/**
>  * Get Memory Pool (MP) from mbuf. If mbuf is indirect, the pool from which the
>  * cloned mbuf is allocated is returned instead.
>  *
> @@ -710,147 +580,4 @@ enum mlx5_cqe_status {
>  	mlx5_tx_dbrec_cond_wmb(txq, wqe, 1);
>  }
>
> -/**
> - * Convert mbuf to Verb SWP.
> - *
> - * @param txq_data
> - *   Pointer to the Tx queue.
> - * @param buf
> - *   Pointer to the mbuf.
> - * @param offsets
> - *   Pointer to the SWP header offsets.
> - * @param swp_types
> - *   Pointer to the SWP header types.
> - */
> -static __rte_always_inline void
> -txq_mbuf_to_swp(struct mlx5_txq_data *txq, struct rte_mbuf *buf,
> -		uint8_t *offsets, uint8_t *swp_types)
> -{
> -	const uint64_t vlan = buf->ol_flags & PKT_TX_VLAN_PKT;
> -	const uint64_t tunnel = buf->ol_flags & PKT_TX_TUNNEL_MASK;
> -	const uint64_t tso = buf->ol_flags & PKT_TX_TCP_SEG;
> -	const uint64_t csum_flags = buf->ol_flags & PKT_TX_L4_MASK;
> -	const uint64_t inner_ip =
> -		buf->ol_flags & (PKT_TX_IPV4 | PKT_TX_IPV6);
> -	const uint64_t ol_flags_mask = PKT_TX_L4_MASK | PKT_TX_IPV6 |
> -				       PKT_TX_OUTER_IPV6;
> -	uint16_t idx;
> -	uint16_t off;
> -
> -	if (likely(!txq->swp_en || (tunnel != PKT_TX_TUNNEL_UDP &&
> -				    tunnel != PKT_TX_TUNNEL_IP)))
> -		return;
> -	/*
> -	 * The index should have:
> -	 * bit[0:1] = PKT_TX_L4_MASK
> -	 * bit[4] = PKT_TX_IPV6
> -	 * bit[8] = PKT_TX_OUTER_IPV6
> -	 * bit[9] = PKT_TX_OUTER_UDP
> -	 */
> -	idx = (buf->ol_flags & ol_flags_mask) >> 52;
> -	if (tunnel == PKT_TX_TUNNEL_UDP)
> -		idx |= 1 << 9;
> -	*swp_types = mlx5_swp_types_table[idx];
> -	/*
> -	 * Set offsets for SW parser. Since ConnectX-5, SW parser just
> -	 * complements HW parser. SW parser starts to engage only if HW parser
> -	 * can't reach a header. For the older devices, HW parser will not kick
> -	 * in if any of SWP offsets is set. Therefore, all of the L3 offsets
> -	 * should be set regardless of HW offload.
> -	 */
> -	off = buf->outer_l2_len + (vlan ? sizeof(struct rte_vlan_hdr) : 0);
> -	offsets[1] = off >> 1; /* Outer L3 offset. */
> -	off += buf->outer_l3_len;
> -	if (tunnel == PKT_TX_TUNNEL_UDP)
> -		offsets[0] = off >> 1; /* Outer L4 offset. */
> -	if (inner_ip) {
> -		off += buf->l2_len;
> -		offsets[3] = off >> 1; /* Inner L3 offset. */
> -		if (csum_flags == PKT_TX_TCP_CKSUM || tso ||
> -		    csum_flags == PKT_TX_UDP_CKSUM) {
> -			off += buf->l3_len;
> -			offsets[2] = off >> 1; /* Inner L4 offset. */
> -		}
> -	}
> -}
> -
> -/**
> - * Convert the Checksum offloads to Verbs.
> - *
> - * @param buf
> - *   Pointer to the mbuf.
> - *
> - * @return
> - *   Converted checksum flags.
> - */
> -static __rte_always_inline uint8_t
> -txq_ol_cksum_to_cs(struct rte_mbuf *buf)
> -{
> -	uint32_t idx;
> -	uint8_t is_tunnel = !!(buf->ol_flags & PKT_TX_TUNNEL_MASK);
> -	const uint64_t ol_flags_mask = PKT_TX_TCP_SEG | PKT_TX_L4_MASK |
> -				       PKT_TX_IP_CKSUM | PKT_TX_OUTER_IP_CKSUM;
> -
> -	/*
> -	 * The index should have:
> -	 * bit[0] = PKT_TX_TCP_SEG
> -	 * bit[2:3] = PKT_TX_UDP_CKSUM, PKT_TX_TCP_CKSUM
> -	 * bit[4] = PKT_TX_IP_CKSUM
> -	 * bit[8] = PKT_TX_OUTER_IP_CKSUM
> -	 * bit[9] = tunnel
> -	 */
> -	idx = ((buf->ol_flags & ol_flags_mask) >> 50) | (!!is_tunnel << 9);
> -	return mlx5_cksum_table[idx];
> -}
> -
> -/**
> - * Count the number of contiguous single segment packets.
> - *
> - * @param pkts
> - *   Pointer to array of packets.
> - * @param pkts_n
> - *   Number of packets.
> - *
> - * @return
> - *   Number of contiguous single segment packets.
> - */
> -static __rte_always_inline unsigned int
> -txq_count_contig_single_seg(struct rte_mbuf **pkts, uint16_t pkts_n)
> -{
> -	unsigned int pos;
> -
> -	if (!pkts_n)
> -		return 0;
> -	/* Count the number of contiguous single segment packets. */
> -	for (pos = 0; pos < pkts_n; ++pos)
> -		if (NB_SEGS(pkts[pos]) > 1)
> -			break;
> -	return pos;
> -}
> -
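The two converters above share one trick: squeeze the relevant ol_flags
bits into a small integer and use it to index a precomputed table, so
the per-packet cost is a mask, a shift, and a load. A generic sketch of
the idiom (flag values and table contents are invented here, not the
driver's):

#include <stdint.h>

/* Hypothetical flag bits, chosen to pack into a small table index. */
#define F_TSO      (1u << 0)
#define F_L4_MASK  (3u << 2) /* two-bit L4 checksum field */
#define F_IP_CKSUM (1u << 4)

static uint8_t cs_table[32]; /* filled once at init with descriptor flags */

static uint8_t
flags_to_cs(uint32_t ol_flags)
{
	uint32_t idx = ol_flags & (F_TSO | F_L4_MASK | F_IP_CKSUM);

	return cs_table[idx]; /* idx <= 0x1d, so the table covers it */
}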
> -/**
> - * Count the number of contiguous multi-segment packets.
> - *
> - * @param pkts
> - *   Pointer to array of packets.
> - * @param pkts_n
> - *   Number of packets.
> - *
> - * @return
> - *   Number of contiguous multi-segment packets.
> - */
> -static __rte_always_inline unsigned int
> -txq_count_contig_multi_seg(struct rte_mbuf **pkts, uint16_t pkts_n)
> -{
> -	unsigned int pos;
> -
> -	if (!pkts_n)
> -		return 0;
> -	/* Count the number of contiguous multi-segment packets. */
> -	for (pos = 0; pos < pkts_n; ++pos)
> -		if (NB_SEGS(pkts[pos]) == 1)
> -			break;
> -	return pos;
> -}
> -
>  #endif /* RTE_PMD_MLX5_RXTX_H_ */
> diff --git a/drivers/net/mlx5/mlx5_rxtx_vec.c b/drivers/net/mlx5/mlx5_rxtx_vec.c
> index 073044f..f6ec828 100644
> --- a/drivers/net/mlx5/mlx5_rxtx_vec.c
> +++ b/drivers/net/mlx5/mlx5_rxtx_vec.c
> @@ -40,138 +40,6 @@
>  #endif
>
>  /**
> - * Count the number of packets having same ol_flags and same metadata (if
> - * PKT_TX_METADATA is set in ol_flags), and calculate cs_flags.
> - *
> - * @param pkts
> - *   Pointer to array of packets.
> - * @param pkts_n
> - *   Number of packets.
> - * @param cs_flags
> - *   Pointer of flags to be returned.
> - * @param metadata
> - *   Pointer of metadata to be returned.
> - * @param txq_offloads
> - *   Offloads enabled on Tx queue
> - *
> - * @return
> - *   Number of packets having same ol_flags and metadata, if relevant.
> - */
> -static inline unsigned int
> -txq_calc_offload(struct rte_mbuf **pkts, uint16_t pkts_n, uint8_t *cs_flags,
> -		 rte_be32_t *metadata, const uint64_t txq_offloads)
> -{
> -	unsigned int pos;
> -	const uint64_t cksum_ol_mask =
> -		PKT_TX_IP_CKSUM | PKT_TX_TCP_CKSUM |
> -		PKT_TX_UDP_CKSUM | PKT_TX_TUNNEL_GRE |
> -		PKT_TX_TUNNEL_VXLAN | PKT_TX_OUTER_IP_CKSUM;
> -	rte_be32_t p0_metadata, pn_metadata;
> -
> -	if (!pkts_n)
> -		return 0;
> -	p0_metadata = pkts[0]->ol_flags & PKT_TX_METADATA ?
> -			pkts[0]->tx_metadata : 0;
> -	/* Count the number of packets having same offload parameters. */
> -	for (pos = 1; pos < pkts_n; ++pos) {
> -		/* Check if packet has same checksum flags. */
> -		if ((txq_offloads & MLX5_VEC_TX_CKSUM_OFFLOAD_CAP) &&
> -		    ((pkts[pos]->ol_flags ^ pkts[0]->ol_flags) & cksum_ol_mask))
> -			break;
> -		/* Check if packet has same metadata. */
> -		if (txq_offloads & DEV_TX_OFFLOAD_MATCH_METADATA) {
> -			pn_metadata = pkts[pos]->ol_flags & PKT_TX_METADATA ?
> -					pkts[pos]->tx_metadata : 0;
> -			if (pn_metadata != p0_metadata)
> -				break;
> -		}
> -	}
> -	*cs_flags = txq_ol_cksum_to_cs(pkts[0]);
> -	*metadata = p0_metadata;
> -	return pos;
> -}
> -
> -/**
> - * DPDK callback for vectorized TX.
> - *
> - * @param dpdk_txq
> - *   Generic pointer to TX queue structure.
> - * @param[in] pkts
> - *   Packets to transmit.
> - * @param pkts_n
> - *   Number of packets in array.
> - *
> - * @return
> - *   Number of packets successfully transmitted (<= pkts_n).
> - */
> -uint16_t
> -mlx5_tx_burst_raw_vec(void *dpdk_txq, struct rte_mbuf **pkts,
> -		      uint16_t pkts_n)
> -{
> -	struct mlx5_txq_data *txq = (struct mlx5_txq_data *)dpdk_txq;
> -	uint16_t nb_tx = 0;
> -
> -	while (pkts_n > nb_tx) {
> -		uint16_t n;
> -		uint16_t ret;
> -
> -		n = RTE_MIN((uint16_t)(pkts_n - nb_tx), MLX5_VPMD_TX_MAX_BURST);
> -		ret = txq_burst_v(txq, &pkts[nb_tx], n, 0, 0);
> -		nb_tx += ret;
> -		if (!ret)
> -			break;
> -	}
> -	return nb_tx;
> -}
> -
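txq_calc_offload() above is essentially "longest prefix of packets with
identical offload state". Reduced to its core (hypothetical packet
struct; the checksum mask is passed in):

#include <stdint.h>

struct pkt { uint64_t ol_flags; uint32_t metadata; };

/* Length of the leading run whose offload state matches pkts[0]. */
static unsigned int
same_offload_run(const struct pkt *pkts, unsigned int n, uint64_t cksum_mask)
{
	unsigned int pos;

	if (!n)
		return 0;
	for (pos = 1; pos < n; ++pos) {
		if ((pkts[pos].ol_flags ^ pkts[0].ol_flags) & cksum_mask)
			break;
		if (pkts[pos].metadata != pkts[0].metadata)
			break;
	}
	return pos;
}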
> -/**
> - * DPDK callback for vectorized TX with multi-seg packets and offload.
> - *
> - * @param dpdk_txq
> - *   Generic pointer to TX queue structure.
> - * @param[in] pkts
> - *   Packets to transmit.
> - * @param pkts_n
> - *   Number of packets in array.
> - *
> - * @return
> - *   Number of packets successfully transmitted (<= pkts_n).
> - */
> -uint16_t
> -mlx5_tx_burst_vec(void *dpdk_txq, struct rte_mbuf **pkts, uint16_t pkts_n)
> -{
> -	struct mlx5_txq_data *txq = (struct mlx5_txq_data *)dpdk_txq;
> -	uint16_t nb_tx = 0;
> -
> -	while (pkts_n > nb_tx) {
> -		uint8_t cs_flags = 0;
> -		uint16_t n;
> -		uint16_t ret;
> -		rte_be32_t metadata = 0;
> -
> -		/* Transmit multi-seg packets in the head of pkts list. */
> -		if ((txq->offloads & DEV_TX_OFFLOAD_MULTI_SEGS) &&
> -		    NB_SEGS(pkts[nb_tx]) > 1)
> -			nb_tx += txq_scatter_v(txq,
> -					       &pkts[nb_tx],
> -					       pkts_n - nb_tx);
> -		n = RTE_MIN((uint16_t)(pkts_n - nb_tx), MLX5_VPMD_TX_MAX_BURST);
> -		if (txq->offloads & DEV_TX_OFFLOAD_MULTI_SEGS)
> -			n = txq_count_contig_single_seg(&pkts[nb_tx], n);
> -		if (txq->offloads & (MLX5_VEC_TX_CKSUM_OFFLOAD_CAP |
> -				     DEV_TX_OFFLOAD_MATCH_METADATA))
> -			n = txq_calc_offload(&pkts[nb_tx], n,
> -					     &cs_flags, &metadata,
> -					     txq->offloads);
> -		ret = txq_burst_v(txq, &pkts[nb_tx], n, cs_flags, metadata);
> -		nb_tx += ret;
> -		if (!ret)
> -			break;
> -	}
> -	return nb_tx;
> -}
> -
> -/**
>  * Skip error packets.
>  *
>  * @param rxq
> @@ -243,49 +111,6 @@
>  }
>
>  /**
> - * Check Tx queue flags are set for raw vectorized Tx.
> - *
> - * @param dev
> - *   Pointer to Ethernet device.
> - *
> - * @return
> - *   1 if supported, negative errno value if not.
> - */
> -int __attribute__((cold))
> -mlx5_check_raw_vec_tx_support(struct rte_eth_dev *dev)
> -{
> -	uint64_t offloads = dev->data->dev_conf.txmode.offloads;
> -
> -	/* Doesn't support any offload. */
> -	if (offloads)
> -		return -ENOTSUP;
> -	return 1;
> -}
> -
> -/**
> - * Check a device can support vectorized TX.
> - *
> - * @param dev
> - *   Pointer to Ethernet device.
> - *
> - * @return
> - *   1 if supported, negative errno value if not.
> - */
> -int __attribute__((cold))
> -mlx5_check_vec_tx_support(struct rte_eth_dev *dev)
> -{
> -	struct mlx5_priv *priv = dev->data->dev_private;
> -	uint64_t offloads = dev->data->dev_conf.txmode.offloads;
> -
> -	if (!priv->config.tx_vec_en ||
> -	    priv->txqs_n > (unsigned int)priv->config.txqs_vec ||
> -	    priv->config.mps != MLX5_MPW_ENHANCED ||
> -	    offloads & ~MLX5_VEC_TX_OFFLOAD_CAP)
> -		return -ENOTSUP;
> -	return 1;
> -}
> -
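Both removed checks boil down to the same gate: refuse the vector path
if any requested offload falls outside its capability mask. Sketch (the
mask value is invented):

#include <errno.h>
#include <stdint.h>

#define VEC_TX_OFFLOAD_CAP 0x7ull /* hypothetical supported-offload mask */

static int
vec_tx_supported(uint64_t requested)
{
	/* Any offload outside the capability set disables the path. */
	if (requested & ~VEC_TX_OFFLOAD_CAP)
		return -ENOTSUP;
	return 1;
}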
> -/**
>  * Check a RX queue can support vectorized RX.
>  *
>  * @param rxq
> diff --git a/drivers/net/mlx5/mlx5_rxtx_vec_neon.h b/drivers/net/mlx5/mlx5_rxtx_vec_neon.h
> index 1c7e3b4..9930286 100644
> --- a/drivers/net/mlx5/mlx5_rxtx_vec_neon.h
> +++ b/drivers/net/mlx5/mlx5_rxtx_vec_neon.h
> @@ -27,295 +27,6 @@
>  #pragma GCC diagnostic ignored "-Wcast-qual"
>
>  /**
> - * Fill in buffer descriptors in a multi-packet send descriptor.
> - *
> - * @param txq
> - *   Pointer to TX queue structure.
> - * @param dseg
> - *   Pointer to buffer descriptor to be written.
> - * @param pkts
> - *   Pointer to array of packets to be sent.
> - * @param n
> - *   Number of packets to be filled.
> - */
> -static inline void
> -txq_wr_dseg_v(struct mlx5_txq_data *txq, uint8_t *dseg,
> -	      struct rte_mbuf **pkts, unsigned int n)
> -{
> -	unsigned int pos;
> -	uintptr_t addr;
> -	const uint8x16_t dseg_shuf_m = {
> -		 3,  2,  1,  0, /* length, bswap32 */
> -		 4,  5,  6,  7, /* lkey */
> -		15, 14, 13, 12, /* addr, bswap64 */
> -		11, 10,  9,  8
> -	};
> -#ifdef MLX5_PMD_SOFT_COUNTERS
> -	uint32_t tx_byte = 0;
> -#endif
> -
> -	for (pos = 0; pos < n; ++pos, dseg += MLX5_WQE_DWORD_SIZE) {
> -		uint8x16_t desc;
> -		struct rte_mbuf *pkt = pkts[pos];
> -
> -		addr = rte_pktmbuf_mtod(pkt, uintptr_t);
> -		desc = vreinterpretq_u8_u32((uint32x4_t) {
> -				DATA_LEN(pkt),
> -				mlx5_tx_mb2mr(txq, pkt),
> -				addr,
> -				addr >> 32 });
> -		desc = vqtbl1q_u8(desc, dseg_shuf_m);
> -		vst1q_u8(dseg, desc);
> -#ifdef MLX5_PMD_SOFT_COUNTERS
> -		tx_byte += DATA_LEN(pkt);
> -#endif
> -	}
> -#ifdef MLX5_PMD_SOFT_COUNTERS
> -	txq->stats.obytes += tx_byte;
> -#endif
> -}
> -
> -/**
> - * Send multi-segmented packets until it encounters a single segment packet in
> - * the pkts list.
> - *
> - * @param txq
> - *   Pointer to TX queue structure.
> - * @param pkts
> - *   Pointer to array of packets to be sent.
> - * @param pkts_n
> - *   Number of packets to be sent.
> - *
> - * @return
> - *   Number of packets successfully transmitted (<= pkts_n).
> - */
> -static uint16_t
> -txq_scatter_v(struct mlx5_txq_data *txq, struct rte_mbuf **pkts,
> -	      uint16_t pkts_n)
> -{
> -	uint16_t elts_head = txq->elts_head;
> -	const uint16_t elts_n = 1 << txq->elts_n;
> -	const uint16_t elts_m = elts_n - 1;
> -	const uint16_t wq_n = 1 << txq->wqe_n;
> -	const uint16_t wq_mask = wq_n - 1;
> -	const unsigned int nb_dword_per_wqebb =
> -		MLX5_WQE_SIZE / MLX5_WQE_DWORD_SIZE;
> -	const unsigned int nb_dword_in_hdr =
> -		sizeof(struct mlx5_wqe) / MLX5_WQE_DWORD_SIZE;
> -	unsigned int n;
> -	volatile struct mlx5_wqe *wqe = NULL;
> -	bool metadata_ol =
> -		txq->offloads & DEV_TX_OFFLOAD_MATCH_METADATA ? true : false;
> -
> -	assert(elts_n > pkts_n);
> -	mlx5_tx_complete(txq);
> -	if (unlikely(!pkts_n))
> -		return 0;
> -	for (n = 0; n < pkts_n; ++n) {
> -		struct rte_mbuf *buf = pkts[n];
> -		unsigned int segs_n = buf->nb_segs;
> -		unsigned int ds = nb_dword_in_hdr;
> -		unsigned int len = PKT_LEN(buf);
> -		uint16_t wqe_ci = txq->wqe_ci;
> -		const uint8x16_t ctrl_shuf_m = {
> -			 3,  2,  1,  0, /* bswap32 */
> -			 7,  6,  5,  4, /* bswap32 */
> -			11, 10,  9,  8, /* bswap32 */
> -			12, 13, 14, 15
> -		};
> -		uint8_t cs_flags;
> -		uint16_t max_elts;
> -		uint16_t max_wqe;
> -		uint8x16_t *t_wqe;
> -		uint8_t *dseg;
> -		uint8x16_t ctrl;
> -		rte_be32_t metadata =
> -			metadata_ol && (buf->ol_flags & PKT_TX_METADATA) ?
> -			buf->tx_metadata : 0;
> -
> -		assert(segs_n);
> -		max_elts = elts_n - (elts_head - txq->elts_tail);
> -		max_wqe = wq_n - (txq->wqe_ci - txq->wqe_pi);
> -		/*
> -		 * A MPW session consumes 2 WQEs at most to
> -		 * include MLX5_MPW_DSEG_MAX pointers.
> -		 */
> -		if (segs_n == 1 ||
> -		    max_elts < segs_n || max_wqe < 2)
> -			break;
> -		wqe = &((volatile struct mlx5_wqe64 *)
> -			txq->wqes)[wqe_ci & wq_mask].hdr;
> -		cs_flags = txq_ol_cksum_to_cs(buf);
> -		/* Title WQEBB pointer. */
> -		t_wqe = (uint8x16_t *)wqe;
> -		dseg = (uint8_t *)(wqe + 1);
> -		do {
> -			if (!(ds++ % nb_dword_per_wqebb)) {
> -				dseg = (uint8_t *)
> -					&((volatile struct mlx5_wqe64 *)
> -					   txq->wqes)[++wqe_ci & wq_mask];
> -			}
> -			txq_wr_dseg_v(txq, dseg, &buf, 1);
> -			dseg += MLX5_WQE_DWORD_SIZE;
> -			(*txq->elts)[elts_head++ & elts_m] = buf;
> -			buf = buf->next;
> -		} while (--segs_n);
> -		++wqe_ci;
> -		/* Fill CTRL in the header. */
> -		ctrl = vreinterpretq_u8_u32((uint32x4_t) {
> -				MLX5_OPC_MOD_MPW << 24 |
> -				txq->wqe_ci << 8 | MLX5_OPCODE_TSO,
> -				txq->qp_num_8s | ds, 4, 0});
> -		ctrl = vqtbl1q_u8(ctrl, ctrl_shuf_m);
> -		vst1q_u8((void *)t_wqe, ctrl);
> -		/* Fill ESEG in the header. */
> -		vst1q_u32((void *)(t_wqe + 1),
> -			  ((uint32x4_t){ 0,
> -					 rte_cpu_to_be_16(len) << 16 | cs_flags,
> -					 metadata, 0 }));
> -		txq->wqe_ci = wqe_ci;
> -	}
> -	if (!n)
> -		return 0;
> -	txq->elts_comp += (uint16_t)(elts_head - txq->elts_head);
> -	txq->elts_head = elts_head;
> -	if (txq->elts_comp >= MLX5_TX_COMP_THRESH) {
> -		/* A CQE slot must always be available. */
> -		assert((1u << txq->cqe_n) - (txq->cq_pi++ - txq->cq_ci));
> -		wqe->ctrl[2] = rte_cpu_to_be_32(MLX5_COMP_ALWAYS <<
> -						MLX5_COMP_MODE_OFFSET);
> -		wqe->ctrl[3] = txq->elts_head;
> -		txq->elts_comp = 0;
> -	}
> -#ifdef MLX5_PMD_SOFT_COUNTERS
> -	txq->stats.opackets += n;
> -#endif
> -	mlx5_tx_dbrec(txq, wqe);
> -	return n;
> -}
> -
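The NEON table shuffle in txq_wr_dseg_v() builds a 16-byte data segment
(big-endian length, lkey, big-endian address) in a single store. For
reference, a plain C rendering of the same layout, using glibc's
endian.h converters instead of a vector shuffle (a sketch, not the
driver code):

#include <stdint.h>
#include <string.h>
#include <endian.h>

/* 16-byte data segment: be32 length, 32-bit lkey, be64 address. */
struct dseg {
	uint32_t byte_count; /* big-endian */
	uint32_t lkey;       /* opaque, stored as-is */
	uint64_t addr;       /* big-endian */
};

static void
write_dseg(void *dst, uint32_t len, uint32_t lkey, uint64_t addr)
{
	struct dseg d = {
		.byte_count = htobe32(len),
		.lkey = lkey,
		.addr = htobe64(addr),
	};

	/* The vector code does this as one 16-byte store. */
	memcpy(dst, &d, sizeof(d));
}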
> -/**
> - * Send burst of packets with Enhanced MPW. If it encounters a multi-seg packet,
> - * it returns to make it processed by txq_scatter_v(). All the packets in
> - * the pkts list should be single segment packets having same offload flags.
> - * This must be checked by txq_count_contig_single_seg() and txq_calc_offload().
> - *
> - * @param txq
> - *   Pointer to TX queue structure.
> - * @param pkts
> - *   Pointer to array of packets to be sent.
> - * @param pkts_n
> - *   Number of packets to be sent (<= MLX5_VPMD_TX_MAX_BURST).
> - * @param cs_flags
> - *   Checksum offload flags to be written in the descriptor.
> - * @param metadata
> - *   Metadata value to be written in the descriptor.
> - *
> - * @return
> - *   Number of packets successfully transmitted (<= pkts_n).
> - */
> -static inline uint16_t
> -txq_burst_v(struct mlx5_txq_data *txq, struct rte_mbuf **pkts, uint16_t pkts_n,
> -	    uint8_t cs_flags, rte_be32_t metadata)
> -{
> -	struct rte_mbuf **elts;
> -	uint16_t elts_head = txq->elts_head;
> -	const uint16_t elts_n = 1 << txq->elts_n;
> -	const uint16_t elts_m = elts_n - 1;
> -	const unsigned int nb_dword_per_wqebb =
> -		MLX5_WQE_SIZE / MLX5_WQE_DWORD_SIZE;
> -	const unsigned int nb_dword_in_hdr =
> -		sizeof(struct mlx5_wqe) / MLX5_WQE_DWORD_SIZE;
> -	unsigned int n = 0;
> -	unsigned int pos;
> -	uint16_t max_elts;
> -	uint16_t max_wqe;
> -	uint32_t comp_req;
> -	const uint16_t wq_n = 1 << txq->wqe_n;
> -	const uint16_t wq_mask = wq_n - 1;
> -	uint16_t wq_idx = txq->wqe_ci & wq_mask;
> -	volatile struct mlx5_wqe64 *wq =
> -		&((volatile struct mlx5_wqe64 *)txq->wqes)[wq_idx];
> -	volatile struct mlx5_wqe *wqe = (volatile struct mlx5_wqe *)wq;
> -	const uint8x16_t ctrl_shuf_m = {
> -		 3,  2,  1,  0, /* bswap32 */
> -		 7,  6,  5,  4, /* bswap32 */
> -		11, 10,  9,  8, /* bswap32 */
> -		12, 13, 14, 15
> -	};
> -	uint8x16_t *t_wqe;
> -	uint8_t *dseg;
> -	uint8x16_t ctrl;
> -
> -	/* Make sure all packets can fit into a single WQE. */
> -	assert(elts_n > pkts_n);
> -	mlx5_tx_complete(txq);
> -	max_elts = (elts_n - (elts_head - txq->elts_tail));
> -	max_wqe = (1u << txq->wqe_n) - (txq->wqe_ci - txq->wqe_pi);
> -	pkts_n = RTE_MIN((unsigned int)RTE_MIN(pkts_n, max_wqe), max_elts);
> -	if (unlikely(!pkts_n))
> -		return 0;
> -	elts = &(*txq->elts)[elts_head & elts_m];
> -	/* Loop for available tailroom first. */
> -	n = RTE_MIN(elts_n - (elts_head & elts_m), pkts_n);
> -	for (pos = 0; pos < (n & -2); pos += 2)
> -		vst1q_u64((void *)&elts[pos], vld1q_u64((void *)&pkts[pos]));
> -	if (n & 1)
> -		elts[pos] = pkts[pos];
> -	/* Check if it crosses the end of the queue. */
> -	if (unlikely(n < pkts_n)) {
> -		elts = &(*txq->elts)[0];
> -		for (pos = 0; pos < pkts_n - n; ++pos)
> -			elts[pos] = pkts[n + pos];
> -	}
> -	txq->elts_head += pkts_n;
> -	/* Save title WQEBB pointer. */
> -	t_wqe = (uint8x16_t *)wqe;
> -	dseg = (uint8_t *)(wqe + 1);
> -	/* Calculate the number of entries to the end. */
> -	n = RTE_MIN(
> -		(wq_n - wq_idx) * nb_dword_per_wqebb - nb_dword_in_hdr,
> -		pkts_n);
> -	/* Fill DSEGs. */
> -	txq_wr_dseg_v(txq, dseg, pkts, n);
> -	/* Check if it crosses the end of the queue. */
> -	if (n < pkts_n) {
> -		dseg = (uint8_t *)txq->wqes;
> -		txq_wr_dseg_v(txq, dseg, &pkts[n], pkts_n - n);
> -	}
> -	if (txq->elts_comp + pkts_n < MLX5_TX_COMP_THRESH) {
> -		txq->elts_comp += pkts_n;
> -		comp_req = MLX5_COMP_ONLY_FIRST_ERR << MLX5_COMP_MODE_OFFSET;
> -	} else {
> -		/* A CQE slot must always be available. */
> -		assert((1u << txq->cqe_n) - (txq->cq_pi++ - txq->cq_ci));
> -		/* Request a completion. */
> -		txq->elts_comp = 0;
> -		comp_req = MLX5_COMP_ALWAYS << MLX5_COMP_MODE_OFFSET;
> -	}
> -	/* Fill CTRL in the header. */
> -	ctrl = vreinterpretq_u8_u32((uint32x4_t) {
> -			MLX5_OPC_MOD_ENHANCED_MPSW << 24 |
> -			txq->wqe_ci << 8 | MLX5_OPCODE_ENHANCED_MPSW,
> -			txq->qp_num_8s | (pkts_n + 2),
> -			comp_req,
> -			txq->elts_head });
> -	ctrl = vqtbl1q_u8(ctrl, ctrl_shuf_m);
> -	vst1q_u8((void *)t_wqe, ctrl);
> -	/* Fill ESEG in the header. */
> -	vst1q_u32((void *)(t_wqe + 1),
> -		  ((uint32x4_t) { 0, cs_flags, metadata, 0 }));
> -#ifdef MLX5_PMD_SOFT_COUNTERS
> -	txq->stats.opackets += pkts_n;
> -#endif
> -	txq->wqe_ci += (nb_dword_in_hdr + pkts_n + (nb_dword_per_wqebb - 1)) /
> -		       nb_dword_per_wqebb;
> -	/* Ring QP doorbell. */
> -	mlx5_tx_dbrec_cond_wmb(txq, wqe, pkts_n < MLX5_VPMD_TX_MAX_BURST);
> -	return pkts_n;
> -}
> -
> -/**
>  * Store free buffers to RX SW ring.
>  *
>  * @param rxq
> diff --git a/drivers/net/mlx5/mlx5_rxtx_vec_sse.h b/drivers/net/mlx5/mlx5_rxtx_vec_sse.h
> index 503ca0f..7bd254f 100644
> --- a/drivers/net/mlx5/mlx5_rxtx_vec_sse.h
> +++ b/drivers/net/mlx5/mlx5_rxtx_vec_sse.h
> @@ -29,290 +29,6 @@
>  #endif
>
>  /**
> - * Fill in buffer descriptors in a multi-packet send descriptor.
> - *
> - * @param txq
> - *   Pointer to TX queue structure.
> - * @param dseg
> - *   Pointer to buffer descriptor to be written.
> - * @param pkts
> - *   Pointer to array of packets to be sent.
> - * @param n
> - *   Number of packets to be filled.
> - */
> -static inline void
> -txq_wr_dseg_v(struct mlx5_txq_data *txq, __m128i *dseg,
> -	      struct rte_mbuf **pkts, unsigned int n)
> -{
> -	unsigned int pos;
> -	uintptr_t addr;
> -	const __m128i shuf_mask_dseg =
> -		_mm_set_epi8(8,  9, 10, 11, /* addr, bswap64 */
> -			     12, 13, 14, 15,
> -			     7,  6,  5,  4, /* lkey */
> -			     0,  1,  2,  3  /* length, bswap32 */);
> -#ifdef MLX5_PMD_SOFT_COUNTERS
> -	uint32_t tx_byte = 0;
> -#endif
> -
> -	for (pos = 0; pos < n; ++pos, ++dseg) {
> -		__m128i desc;
> -		struct rte_mbuf *pkt = pkts[pos];
> -
> -		addr = rte_pktmbuf_mtod(pkt, uintptr_t);
> -		desc = _mm_set_epi32(addr >> 32,
> -				     addr,
> -				     mlx5_tx_mb2mr(txq, pkt),
> -				     DATA_LEN(pkt));
> -		desc = _mm_shuffle_epi8(desc, shuf_mask_dseg);
> -		_mm_store_si128(dseg, desc);
> -#ifdef MLX5_PMD_SOFT_COUNTERS
> -		tx_byte += DATA_LEN(pkt);
> -#endif
> -	}
> -#ifdef MLX5_PMD_SOFT_COUNTERS
> -	txq->stats.obytes += tx_byte;
> -#endif
> -}
> -
> -/**
> - * Send multi-segmented packets until it encounters a single segment packet in
> - * the pkts list.
> - *
> - * @param txq
> - *   Pointer to TX queue structure.
> - * @param pkts
> - *   Pointer to array of packets to be sent.
> - * @param pkts_n
> - *   Number of packets to be sent.
> - *
> - * @return
> - *   Number of packets successfully transmitted (<= pkts_n).
> - */
> -static uint16_t
> -txq_scatter_v(struct mlx5_txq_data *txq, struct rte_mbuf **pkts,
> -	      uint16_t pkts_n)
> -{
> -	uint16_t elts_head = txq->elts_head;
> -	const uint16_t elts_n = 1 << txq->elts_n;
> -	const uint16_t elts_m = elts_n - 1;
> -	const uint16_t wq_n = 1 << txq->wqe_n;
> -	const uint16_t wq_mask = wq_n - 1;
> -	const unsigned int nb_dword_per_wqebb =
> -		MLX5_WQE_SIZE / MLX5_WQE_DWORD_SIZE;
> -	const unsigned int nb_dword_in_hdr =
> -		sizeof(struct mlx5_wqe) / MLX5_WQE_DWORD_SIZE;
> -	unsigned int n;
> -	volatile struct mlx5_wqe *wqe = NULL;
> -	bool metadata_ol =
> -		txq->offloads & DEV_TX_OFFLOAD_MATCH_METADATA ? true : false;
> -
> -	assert(elts_n > pkts_n);
> -	mlx5_tx_complete(txq);
> -	if (unlikely(!pkts_n))
> -		return 0;
> -	for (n = 0; n < pkts_n; ++n) {
> -		struct rte_mbuf *buf = pkts[n];
> -		unsigned int segs_n = buf->nb_segs;
> -		unsigned int ds = nb_dword_in_hdr;
> -		unsigned int len = PKT_LEN(buf);
> -		uint16_t wqe_ci = txq->wqe_ci;
> -		const __m128i shuf_mask_ctrl =
> -			_mm_set_epi8(15, 14, 13, 12,
> -				     8,  9, 10, 11, /* bswap32 */
> -				     4,  5,  6,  7, /* bswap32 */
> -				     0,  1,  2,  3  /* bswap32 */);
> -		uint8_t cs_flags;
> -		uint16_t max_elts;
> -		uint16_t max_wqe;
> -		__m128i *t_wqe, *dseg;
> -		__m128i ctrl;
> -		rte_be32_t metadata =
> -			metadata_ol && (buf->ol_flags & PKT_TX_METADATA) ?
> -			buf->tx_metadata : 0;
> -
> -		assert(segs_n);
> -		max_elts = elts_n - (elts_head - txq->elts_tail);
> -		max_wqe = wq_n - (txq->wqe_ci - txq->wqe_pi);
> -		/*
> -		 * A MPW session consumes 2 WQEs at most to
> -		 * include MLX5_MPW_DSEG_MAX pointers.
> -		 */
> -		if (segs_n == 1 ||
> -		    max_elts < segs_n || max_wqe < 2)
> -			break;
> -		if (segs_n > MLX5_MPW_DSEG_MAX) {
> -			txq->stats.oerrors++;
> -			break;
> -		}
> -		wqe = &((volatile struct mlx5_wqe64 *)
> -			txq->wqes)[wqe_ci & wq_mask].hdr;
> -		cs_flags = txq_ol_cksum_to_cs(buf);
> -		/* Title WQEBB pointer. */
> -		t_wqe = (__m128i *)wqe;
> -		dseg = (__m128i *)(wqe + 1);
> -		do {
> -			if (!(ds++ % nb_dword_per_wqebb)) {
> -				dseg = (__m128i *)
> -					&((volatile struct mlx5_wqe64 *)
> -					   txq->wqes)[++wqe_ci & wq_mask];
> -			}
> -			txq_wr_dseg_v(txq, dseg++, &buf, 1);
> -			(*txq->elts)[elts_head++ & elts_m] = buf;
> -			buf = buf->next;
> -		} while (--segs_n);
> -		++wqe_ci;
> -		/* Fill CTRL in the header. */
> -		ctrl = _mm_set_epi32(0, 4, txq->qp_num_8s | ds,
> -				     MLX5_OPC_MOD_MPW << 24 |
> -				     txq->wqe_ci << 8 | MLX5_OPCODE_TSO);
> -		ctrl = _mm_shuffle_epi8(ctrl, shuf_mask_ctrl);
> -		_mm_store_si128(t_wqe, ctrl);
> -		/* Fill ESEG in the header. */
> -		_mm_store_si128(t_wqe + 1,
> -				_mm_set_epi32(0, metadata,
> -					      (rte_cpu_to_be_16(len) << 16) |
> -					      cs_flags, 0));
> -		txq->wqe_ci = wqe_ci;
> -	}
> -	if (!n)
> -		return 0;
> -	txq->elts_comp += (uint16_t)(elts_head - txq->elts_head);
> -	txq->elts_head = elts_head;
> -	if (txq->elts_comp >= MLX5_TX_COMP_THRESH) {
> -		/* A CQE slot must always be available. */
> -		assert((1u << txq->cqe_n) - (txq->cq_pi++ - txq->cq_ci));
> -		wqe->ctrl[2] = rte_cpu_to_be_32(MLX5_COMP_ALWAYS <<
> -						MLX5_COMP_MODE_OFFSET);
> -		wqe->ctrl[3] = txq->elts_head;
> -		txq->elts_comp = 0;
> -	}
> -#ifdef MLX5_PMD_SOFT_COUNTERS
> -	txq->stats.opackets += n;
> -#endif
> -	mlx5_tx_dbrec(txq, wqe);
> -	return n;
> -}
> -
> -/**
> - * Send burst of packets with Enhanced MPW. If it encounters a multi-seg packet,
> - * it returns to make it processed by txq_scatter_v(). All the packets in
> - * the pkts list should be single segment packets having same offload flags.
> - * This must be checked by txq_count_contig_single_seg() and txq_calc_offload().
> - *
> - * @param txq
> - *   Pointer to TX queue structure.
> - * @param pkts
> - *   Pointer to array of packets to be sent.
> - * @param pkts_n
> - *   Number of packets to be sent (<= MLX5_VPMD_TX_MAX_BURST).
> - * @param cs_flags
> - *   Checksum offload flags to be written in the descriptor.
> - * @param metadata
> - *   Metadata value to be written in the descriptor.
> - *
> - * @return
> - *   Number of packets successfully transmitted (<= pkts_n).
> - */
> -static inline uint16_t
> -txq_burst_v(struct mlx5_txq_data *txq, struct rte_mbuf **pkts, uint16_t pkts_n,
> -	    uint8_t cs_flags, rte_be32_t metadata)
> -{
> -	struct rte_mbuf **elts;
> -	uint16_t elts_head = txq->elts_head;
> -	const uint16_t elts_n = 1 << txq->elts_n;
> -	const uint16_t elts_m = elts_n - 1;
> -	const unsigned int nb_dword_per_wqebb =
> -		MLX5_WQE_SIZE / MLX5_WQE_DWORD_SIZE;
> -	const unsigned int nb_dword_in_hdr =
> -		sizeof(struct mlx5_wqe) / MLX5_WQE_DWORD_SIZE;
> -	unsigned int n = 0;
> -	unsigned int pos;
> -	uint16_t max_elts;
> -	uint16_t max_wqe;
> -	uint32_t comp_req;
> -	const uint16_t wq_n = 1 << txq->wqe_n;
> -	const uint16_t wq_mask = wq_n - 1;
> -	uint16_t wq_idx = txq->wqe_ci & wq_mask;
> -	volatile struct mlx5_wqe64 *wq =
> -		&((volatile struct mlx5_wqe64 *)txq->wqes)[wq_idx];
> -	volatile struct mlx5_wqe *wqe = (volatile struct mlx5_wqe *)wq;
> -	const __m128i shuf_mask_ctrl =
> -		_mm_set_epi8(15, 14, 13, 12,
> -			     8,  9, 10, 11, /* bswap32 */
> -			     4,  5,  6,  7, /* bswap32 */
> -			     0,  1,  2,  3  /* bswap32 */);
> -	__m128i *t_wqe, *dseg;
> -	__m128i ctrl;
> -
> -	/* Make sure all packets can fit into a single WQE. */
> -	assert(elts_n > pkts_n);
> -	mlx5_tx_complete(txq);
> -	max_elts = (elts_n - (elts_head - txq->elts_tail));
> -	max_wqe = (1u << txq->wqe_n) - (txq->wqe_ci - txq->wqe_pi);
> -	pkts_n = RTE_MIN((unsigned int)RTE_MIN(pkts_n, max_wqe), max_elts);
> -	assert(pkts_n <= MLX5_DSEG_MAX - nb_dword_in_hdr);
> -	if (unlikely(!pkts_n))
> -		return 0;
> -	elts = &(*txq->elts)[elts_head & elts_m];
> -	/* Loop for available tailroom first. */
> -	n = RTE_MIN(elts_n - (elts_head & elts_m), pkts_n);
> -	for (pos = 0; pos < (n & -2); pos += 2)
> -		_mm_storeu_si128((__m128i *)&elts[pos],
> -				 _mm_loadu_si128((__m128i *)&pkts[pos]));
> -	if (n & 1)
> -		elts[pos] = pkts[pos];
> -	/* Check if it crosses the end of the queue. */
> -	if (unlikely(n < pkts_n)) {
> -		elts = &(*txq->elts)[0];
> -		for (pos = 0; pos < pkts_n - n; ++pos)
> -			elts[pos] = pkts[n + pos];
> -	}
> -	txq->elts_head += pkts_n;
> -	/* Save title WQEBB pointer. */
> -	t_wqe = (__m128i *)wqe;
> -	dseg = (__m128i *)(wqe + 1);
> -	/* Calculate the number of entries to the end. */
> -	n = RTE_MIN(
> -		(wq_n - wq_idx) * nb_dword_per_wqebb - nb_dword_in_hdr,
> -		pkts_n);
> -	/* Fill DSEGs. */
> -	txq_wr_dseg_v(txq, dseg, pkts, n);
> -	/* Check if it crosses the end of the queue. */
> -	if (n < pkts_n) {
> -		dseg = (__m128i *)txq->wqes;
> -		txq_wr_dseg_v(txq, dseg, &pkts[n], pkts_n - n);
> -	}
> -	if (txq->elts_comp + pkts_n < MLX5_TX_COMP_THRESH) {
> -		txq->elts_comp += pkts_n;
> -		comp_req = MLX5_COMP_ONLY_FIRST_ERR << MLX5_COMP_MODE_OFFSET;
> -	} else {
> -		/* A CQE slot must always be available. */
> -		assert((1u << txq->cqe_n) - (txq->cq_pi++ - txq->cq_ci));
> -		/* Request a completion. */
> -		txq->elts_comp = 0;
> -		comp_req = MLX5_COMP_ALWAYS << MLX5_COMP_MODE_OFFSET;
> -	}
> -	/* Fill CTRL in the header. */
> -	ctrl = _mm_set_epi32(txq->elts_head, comp_req,
> -			     txq->qp_num_8s | (pkts_n + 2),
> -			     MLX5_OPC_MOD_ENHANCED_MPSW << 24 |
> -			     txq->wqe_ci << 8 | MLX5_OPCODE_ENHANCED_MPSW);
> -	ctrl = _mm_shuffle_epi8(ctrl, shuf_mask_ctrl);
> -	_mm_store_si128(t_wqe, ctrl);
> -	/* Fill ESEG in the header. */
> -	_mm_store_si128(t_wqe + 1, _mm_set_epi32(0, metadata, cs_flags, 0));
> -#ifdef MLX5_PMD_SOFT_COUNTERS
> -	txq->stats.opackets += pkts_n;
> -#endif
> -	txq->wqe_ci += (nb_dword_in_hdr + pkts_n + (nb_dword_per_wqebb - 1)) /
> -		       nb_dword_per_wqebb;
> -	/* Ring QP doorbell. */
> -	mlx5_tx_dbrec_cond_wmb(txq, wqe, pkts_n < MLX5_VPMD_TX_MAX_BURST);
> -	return pkts_n;
> -}
> -
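One detail worth keeping in mind when comparing against the new datapath
later: both vectorized txq_burst_v() variants throttle CQEs with the
pattern above, requesting a real completion only when the pending count
crosses MLX5_TX_COMP_THRESH and asking for error-only reporting
otherwise. Isolated sketch (the threshold value is arbitrary here):

#include <stdbool.h>
#include <stdint.h>

#define COMP_THRESH 32u /* stand-in for MLX5_TX_COMP_THRESH */

/* Decide whether this burst must request a completion; resets the
 * pending counter when the threshold is crossed. */
static bool
need_completion(uint16_t *pending, uint16_t burst)
{
	if (*pending + burst < COMP_THRESH) {
		*pending += burst;
		return false; /* report errors only */
	}
	*pending = 0;
	return true; /* request a CQE for this WQE */
}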
> -/**
>  * Store free buffers to RX SW ring.
>  *
>  * @param rxq
> diff --git a/drivers/net/mlx5/mlx5_txq.c b/drivers/net/mlx5/mlx5_txq.c
> index 82493d7..55892e2 100644
> --- a/drivers/net/mlx5/mlx5_txq.c
> +++ b/drivers/net/mlx5/mlx5_txq.c
> @@ -365,25 +365,6 @@
>  }
>
>  /**
> - * Check if the burst function is using eMPW.
> - *
> - * @param tx_pkt_burst
> - *   Tx burst function pointer.
> - *
> - * @return
> - *   1 if the burst function is using eMPW, 0 otherwise.
> - */
> -static int
> -is_empw_burst_func(eth_tx_burst_t tx_pkt_burst)
> -{
> -	if (tx_pkt_burst == mlx5_tx_burst_raw_vec ||
> -	    tx_pkt_burst == mlx5_tx_burst_vec ||
> -	    tx_pkt_burst == mlx5_tx_burst_empw)
> -		return 1;
> -	return 0;
> -}
> -
> -/**
>  * Create the Tx queue Verbs object.
>  *
>  * @param dev
> @@ -414,7 +395,6 @@ struct mlx5_txq_ibv *
>  	struct mlx5dv_cq cq_info;
>  	struct mlx5dv_obj obj;
>  	const int desc = 1 << txq_data->elts_n;
> -	eth_tx_burst_t tx_pkt_burst = mlx5_select_tx_function(dev);
>  	int ret = 0;
>
>  	assert(txq_data);
> @@ -432,8 +412,6 @@ struct mlx5_txq_ibv *
>  		.comp_mask = 0,
>  	};
>  	cqe_n = desc / MLX5_TX_COMP_THRESH + 1;
> -	if (is_empw_burst_func(tx_pkt_burst))
> -		cqe_n += MLX5_TX_COMP_THRESH_INLINE_DIV;
>  	tmpl.cq = mlx5_glue->create_cq(priv->sh->ctx, cqe_n, NULL, NULL, 0);
>  	if (tmpl.cq == NULL) {
>  		DRV_LOG(ERR, "port %u Tx queue %u CQ creation failure",
> @@ -698,93 +676,7 @@ struct mlx5_txq_ibv *
>  static void
>  txq_set_params(struct mlx5_txq_ctrl *txq_ctrl)
>  {
> -	struct mlx5_priv *priv = txq_ctrl->priv;
> -	struct mlx5_dev_config *config = &priv->config;
> -	const unsigned int max_tso_inline =
> -		((MLX5_MAX_TSO_HEADER + (RTE_CACHE_LINE_SIZE - 1)) /
> -		 RTE_CACHE_LINE_SIZE);
> -	unsigned int txq_inline;
> -	unsigned int txqs_inline;
> -	unsigned int inline_max_packet_sz;
> -	eth_tx_burst_t tx_pkt_burst =
> -		mlx5_select_tx_function(ETH_DEV(priv));
> -	int is_empw_func = is_empw_burst_func(tx_pkt_burst);
> -	int tso = !!(txq_ctrl->txq.offloads & (DEV_TX_OFFLOAD_TCP_TSO |
> -					       DEV_TX_OFFLOAD_VXLAN_TNL_TSO |
> -					       DEV_TX_OFFLOAD_GRE_TNL_TSO |
> -					       DEV_TX_OFFLOAD_IP_TNL_TSO |
> -					       DEV_TX_OFFLOAD_UDP_TNL_TSO));
> -
> -	txq_inline = (config->txq_inline == MLX5_ARG_UNSET) ?
> -		0 : config->txq_inline;
> -	txqs_inline = (config->txqs_inline == MLX5_ARG_UNSET) ?
> -		0 : config->txqs_inline;
> -	inline_max_packet_sz =
> -		(config->inline_max_packet_sz == MLX5_ARG_UNSET) ?
> -		0 : config->inline_max_packet_sz;
> -	if (is_empw_func) {
> -		if (config->txq_inline == MLX5_ARG_UNSET)
> -			txq_inline = MLX5_WQE_SIZE_MAX - MLX5_WQE_SIZE;
> -		if (config->txqs_inline == MLX5_ARG_UNSET)
> -			txqs_inline = MLX5_EMPW_MIN_TXQS;
> -		if (config->inline_max_packet_sz == MLX5_ARG_UNSET)
> -			inline_max_packet_sz = MLX5_EMPW_MAX_INLINE_LEN;
> -		txq_ctrl->txq.mpw_hdr_dseg = config->mpw_hdr_dseg;
> -		txq_ctrl->txq.inline_max_packet_sz = inline_max_packet_sz;
> -	}
> -	if (txq_inline && priv->txqs_n >= txqs_inline) {
> -		unsigned int ds_cnt;
> -
> -		txq_ctrl->txq.max_inline =
> -			((txq_inline + (RTE_CACHE_LINE_SIZE - 1)) /
> -			 RTE_CACHE_LINE_SIZE);
> -		if (is_empw_func) {
> -			/* To minimize the size of data set, avoid requesting
> -			 * too large WQ.
> -			 */
> -			txq_ctrl->max_inline_data =
> -				((RTE_MIN(txq_inline,
> -					  inline_max_packet_sz) +
> -				  (RTE_CACHE_LINE_SIZE - 1)) /
> -				 RTE_CACHE_LINE_SIZE) * RTE_CACHE_LINE_SIZE;
> -		} else {
> -			txq_ctrl->max_inline_data =
> -				txq_ctrl->txq.max_inline * RTE_CACHE_LINE_SIZE;
> -		}
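The hunk that follows clamps the configured inline size so that
CTRL + ETH + inline data never exceeds the per-WQE descriptor budget.
The arithmetic, pulled out into a sketch (the DS limit is a hypothetical
value here; the cache-line rounding matches the removed code's intent):

#include <stdint.h>

#define DWORD_SIZE 16u /* one DS */
#define DSEG_MAX   63u /* max DS per WQE; hypothetical stand-in */
#define CACHE_LINE 64u

static unsigned int
clamp_inline(unsigned int inline_bytes)
{
	/* CTRL and ETH segments take 2 DS; inline data takes the rest. */
	unsigned int ds_cnt = 2 + inline_bytes / DWORD_SIZE;

	if (ds_cnt > DSEG_MAX) {
		unsigned int max = (DSEG_MAX - 2) * DWORD_SIZE;

		max -= max % CACHE_LINE; /* keep cache-line granularity */
		return max;
	}
	return inline_bytes;
}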
> -		/*
> -		 * Check if the inline size is too large in a way which
> -		 * can make the WQE DS to overflow.
> -		 * Considering in calculation:
> -		 *      WQE CTRL (1 DS)
> -		 *      WQE ETH (1 DS)
> -		 *      Inline part (N DS)
> -		 */
> -		ds_cnt = 2 + (txq_ctrl->txq.max_inline / MLX5_WQE_DWORD_SIZE);
> -		if (ds_cnt > MLX5_DSEG_MAX) {
> -			unsigned int max_inline = (MLX5_DSEG_MAX - 2) *
> -						  MLX5_WQE_DWORD_SIZE;
> -
> -			max_inline = max_inline - (max_inline %
> -						   RTE_CACHE_LINE_SIZE);
> -			DRV_LOG(WARNING,
> -				"port %u txq inline is too large (%d) setting"
> -				" it to the maximum possible: %d\n",
> -				PORT_ID(priv), txq_inline, max_inline);
> -			txq_ctrl->txq.max_inline = max_inline /
> -						   RTE_CACHE_LINE_SIZE;
> -		}
> -	}
> -	if (tso) {
> -		txq_ctrl->max_tso_header = max_tso_inline * RTE_CACHE_LINE_SIZE;
> -		txq_ctrl->txq.max_inline = RTE_MAX(txq_ctrl->txq.max_inline,
> -						   max_tso_inline);
> -		txq_ctrl->txq.tso_en = 1;
> -	}
> -	txq_ctrl->txq.tunnel_en = config->tunnel_en | config->swp;
> -	txq_ctrl->txq.swp_en = ((DEV_TX_OFFLOAD_IP_TNL_TSO |
> -				 DEV_TX_OFFLOAD_UDP_TNL_TSO |
> -				 DEV_TX_OFFLOAD_OUTER_IPV4_CKSUM) &
> -				txq_ctrl->txq.offloads) && config->swp;
> +	(void)txq_ctrl;
>  }
>
>  /**
> --
> 1.8.3.1
>