From mboxrd@z Thu Jan  1 00:00:00 1970
From: Matan Azrad <matan@mellanox.com>
To: Mordechay Haimovsky, Adrien Mazarguil
CC: "dev@dpdk.org"
Thread-Topic: [PATCH v5] net/mlx4: support hardware TSO
Date: Mon, 9 Jul 2018 13:07:12 +0000
References: <1530715998-15703-1-git-send-email-motih@mellanox.com>
 <1531132986-5054-1-git-send-email-motih@mellanox.com>
In-Reply-To: <1531132986-5054-1-git-send-email-motih@mellanox.com>
Content-Type: text/plain; charset="us-ascii"
MIME-Version: 1.0
X-OriginatorOrg: Mellanox.com
Subject: Re: [dpdk-dev] [PATCH v5] net/mlx4: support hardware TSO
List-Id: DPDK patches and discussions
X-List-Received-Date: Mon, 09 Jul 2018 13:07:15 -0000

Hi Moti,

Please see some comments below.

From: Mordechay Haimovsky
> Implement support for hardware TSO.
>
> Signed-off-by: Moti Haimovsky
> ---
> v5:
> * Modification to the code according to review inputs from Matan
>   Azrad.
> * Code optimization to the TSO header copy routine.
> * Rearranged the TSO data-segments creation routine.
> in reply to
> 1530715998-15703-1-git-send-email-motih@mellanox.com
>
> v4:
> * Bug fixes in filling TSO data segments.
> * Modifications according to review inputs from Adrien Mazarguil
>   and Matan Azrad.
> in reply to
> 1530190137-17848-1-git-send-email-motih@mellanox.com
>
> v3:
> * Fixed compilation errors in compilers without GNU C extensions
>   caused by a declaration of zero-length array in the code.
> in reply to
> 1530187032-6489-1-git-send-email-motih@mellanox.com
>
> v2:
> * Fixed coding style warning.
> in reply to
> 1530184583-30166-1-git-send-email-motih@mellanox.com
>
> v1:
> * Fixed coding style warnings.
> in reply to
> 1530181779-19716-1-git-send-email-motih@mellanox.com
> ---
>  doc/guides/nics/features/mlx4.ini |   1 +
>  doc/guides/nics/mlx4.rst          |   3 +
>  drivers/net/mlx4/Makefile         |   5 +
>  drivers/net/mlx4/mlx4.c           |   9 +
>  drivers/net/mlx4/mlx4.h           |   5 +
>  drivers/net/mlx4/mlx4_prm.h       |  15 ++
>  drivers/net/mlx4/mlx4_rxtx.c      | 372 +++++++++++++++++++++++++++++++++++++-
>  drivers/net/mlx4/mlx4_rxtx.h      |   2 +-
>  drivers/net/mlx4/mlx4_txq.c       |   8 +-
>  9 files changed, 416 insertions(+), 4 deletions(-)
>
> diff --git a/doc/guides/nics/features/mlx4.ini b/doc/guides/nics/features/mlx4.ini
> index f6efd21..98a3f61 100644
> --- a/doc/guides/nics/features/mlx4.ini
> +++ b/doc/guides/nics/features/mlx4.ini
> @@ -13,6 +13,7 @@ Queue start/stop     = Y
>  MTU update           = Y
>  Jumbo frame          = Y
>  Scattered Rx         = Y
> +TSO                  = Y
>  Promiscuous mode     = Y
>  Allmulticast mode    = Y
>  Unicast MAC filter   = Y
> diff --git a/doc/guides/nics/mlx4.rst b/doc/guides/nics/mlx4.rst
> index 491106a..12adaeb 100644
> --- a/doc/guides/nics/mlx4.rst
> +++ b/doc/guides/nics/mlx4.rst
> @@ -142,6 +142,9 @@ Limitations
>    The ability to enable/disable CRC stripping requires OFED version
>    4.3-1.5.0.0 and above or rdma-core version v18 and above.
>
> +- TSO (Transmit Segmentation Offload) is supported in OFED version
> +  4.4 and above or in rdma-core version v18 and above.
> +
>  Prerequisites
>  -------------
>
> diff --git a/drivers/net/mlx4/Makefile b/drivers/net/mlx4/Makefile
> index 73f9d40..63bc003 100644
> --- a/drivers/net/mlx4/Makefile
> +++ b/drivers/net/mlx4/Makefile
> @@ -85,6 +85,11 @@ mlx4_autoconf.h.new: FORCE
>  mlx4_autoconf.h.new: $(RTE_SDK)/buildtools/auto-config-h.sh
>  	$Q $(RM) -f -- '$@'
>  	$Q : > '$@'
> +	$Q sh -- '$<' '$@' \
> +		HAVE_IBV_MLX4_WQE_LSO_SEG \
> +		infiniband/mlx4dv.h \
> +		type 'struct mlx4_wqe_lso_seg' \
> +		$(AUTOCONF_OUTPUT)
>
>  # Create mlx4_autoconf.h or update it in case it differs from the new one.
>
> diff --git a/drivers/net/mlx4/mlx4.c b/drivers/net/mlx4/mlx4.c
> index d151a90..5d8c76d 100644
> --- a/drivers/net/mlx4/mlx4.c
> +++ b/drivers/net/mlx4/mlx4.c
> @@ -677,6 +677,15 @@ struct mlx4_conf {
>  					IBV_RAW_PACKET_CAP_SCATTER_FCS);
>  	DEBUG("FCS stripping toggling is %ssupported",
>  	      priv->hw_fcs_strip ? "" : "not ");
> +	priv->tso =
> +		((device_attr_ex.tso_caps.max_tso > 0) &&
> +		 (device_attr_ex.tso_caps.supported_qpts &
> +		  (1 << IBV_QPT_RAW_PACKET)));
> +	if (priv->tso)
> +		priv->tso_max_payload_sz =
> +				device_attr_ex.tso_caps.max_tso;
> +	DEBUG("TSO is %ssupported",
> +	      priv->tso ? "" : "not ");
>  	/* Configure the first MAC address by default. */
>  	err = mlx4_get_mac(priv, &mac.addr_bytes);
>  	if (err) {
> diff --git a/drivers/net/mlx4/mlx4.h b/drivers/net/mlx4/mlx4.h
> index 300cb4d..89d8c38 100644
> --- a/drivers/net/mlx4/mlx4.h
> +++ b/drivers/net/mlx4/mlx4.h
> @@ -47,6 +47,9 @@
>  /** Interrupt alarm timeout value in microseconds. */
>  #define MLX4_INTR_ALARM_TIMEOUT 100000
>
> +/* Maximum packet headers size (L2+L3+L4) for TSO. */
> +#define MLX4_MAX_TSO_HEADER 192
> +
>  /** Port parameter. */
>  #define MLX4_PMD_PORT_KVARG "port"
>
> @@ -90,6 +93,8 @@ struct priv {
>  	uint32_t hw_csum:1; /**< Checksum offload is supported. */
>  	uint32_t hw_csum_l2tun:1; /**< Checksum support for L2 tunnels. */
>  	uint32_t hw_fcs_strip:1; /**< FCS stripping toggling is supported. */
> +	uint32_t tso:1; /**< Transmit segmentation offload is supported. */
> +	uint32_t tso_max_payload_sz; /**< Max supported TSO payload size. */
>  	uint64_t hw_rss_sup; /**< Supported RSS hash fields (Verbs format). */
>  	struct rte_intr_handle intr_handle; /**< Port interrupt handle. */
>  	struct mlx4_drop *drop; /**< Shared resources for drop flow rules. */
> diff --git a/drivers/net/mlx4/mlx4_prm.h b/drivers/net/mlx4/mlx4_prm.h
> index b771d8c..aef77ba 100644
> --- a/drivers/net/mlx4/mlx4_prm.h
> +++ b/drivers/net/mlx4/mlx4_prm.h
> @@ -19,6 +19,7 @@
>  #ifdef PEDANTIC
>  #pragma GCC diagnostic error "-Wpedantic"
>  #endif
> +#include "mlx4_autoconf.h"
>
>  /* ConnectX-3 Tx queue basic block. */
>  #define MLX4_TXBB_SHIFT 6
> @@ -40,6 +41,7 @@
>  /* Work queue element (WQE) flags. */
>  #define MLX4_WQE_CTRL_IIP_HDR_CSUM (1 << 28)
>  #define MLX4_WQE_CTRL_IL4_HDR_CSUM (1 << 27)
> +#define MLX4_WQE_CTRL_RR (1 << 6)
>
>  /* CQE checksum flags. */
>  enum {
> @@ -98,6 +100,19 @@ struct mlx4_cq {
>  	int arm_sn; /**< Rx event counter. */
>  };
>
> +#ifndef HAVE_IBV_MLX4_WQE_LSO_SEG
> +/*
> + * WQE LSO segment structure.
> + * Defined here as backward compatibility for rdma-core v17 and below.
> + * Similar definition is found in infiniband/mlx4dv.h in rdma-core v18
> + * and above.
> + */
> +struct mlx4_wqe_lso_seg {
> +	rte_be32_t mss_hdr_size;
> +	rte_be32_t header[];
> +};
> +#endif
> +
>  /**
>   * Retrieve a CQE entry from a CQ.
>   *
> diff --git a/drivers/net/mlx4/mlx4_rxtx.c b/drivers/net/mlx4/mlx4_rxtx.c
> index 78b6dd5..b695539 100644
> --- a/drivers/net/mlx4/mlx4_rxtx.c
> +++ b/drivers/net/mlx4/mlx4_rxtx.c
> @@ -38,10 +38,29 @@
>   * DWORD (32 byte) of a TXBB.
>   */
>  struct pv {
> -	volatile struct mlx4_wqe_data_seg *dseg;
> +	union {
> +		volatile struct mlx4_wqe_data_seg *dseg;
> +		volatile uint32_t *dst;
> +	};
>  	uint32_t val;
>  };
>
> +/** A helper structure for TSO packet handling. */
> +struct tso_info {
> +	/** Pointer to the array of saved first DWORD (32 byte) of a TXBB. */
> +	struct pv *pv;
> +	/** Current entry in the pv array. */
> +	int pv_counter;
> +	/** Total size of the WQE including padding. */
> +	uint32_t wqe_size;
> +	/** Size of TSO header to prepend to each packet to send. */
> +	uint16_t tso_header_size;
> +	/** Total size of the TSO segment in the WQE. */
> +	uint16_t wqe_tso_seg_size;
> +	/** Raw WQE size in units of 16 Bytes and without padding. */
> +	uint8_t fence_size;
> +};
> +
>  /** A table to translate Rx completion flags to packet type. */
>  uint32_t
>  mlx4_ptype_table[0x100] __rte_cache_aligned = {
>  	/*
> @@ -368,6 +387,345 @@ struct pv {
>  }
>
>  /**
> + * Obtain and calculate TSO information needed for assembling a TSO WQE.
> + *
> + * @param buf
> + *   Pointer to the first packet mbuf.
> + * @param txq
> + *   Pointer to Tx queue structure.
> + * @param tinfo
> + *   Pointer to a structure to fill the info with.
> + *
> + * @return
> + *   0 on success, negative value upon error.
> + */
> +static inline int
> +mlx4_tx_burst_tso_get_params(struct rte_mbuf *buf,
> +			     struct txq *txq,
> +			     struct tso_info *tinfo)
> +{
> +	struct mlx4_sq *sq = &txq->msq;
> +	const uint8_t tunneled = txq->priv->hw_csum_l2tun &&
> +				 (buf->ol_flags & PKT_TX_TUNNEL_MASK);
> +
> +	tinfo->tso_header_size = buf->l2_len + buf->l3_len + buf->l4_len;
> +	if (tunneled)
> +		tinfo->tso_header_size +=
> +				buf->outer_l2_len + buf->outer_l3_len;
> +	if (unlikely(buf->tso_segsz == 0 ||
> +		     tinfo->tso_header_size == 0 ||
> +		     tinfo->tso_header_size > MLX4_MAX_TSO_HEADER ||
> +		     tinfo->tso_header_size > buf->data_len))
> +		return -EINVAL;
> +	/*
> +	 * Calculate the WQE TSO segment size
> +	 * Note:
> +	 * 1. An LSO segment must be padded such that the subsequent data
> +	 *    segment is 16-byte aligned.
> +	 * 2. The start address of the TSO segment is always 16 Bytes aligned.
> +	 */
> +	tinfo->wqe_tso_seg_size = RTE_ALIGN(sizeof(struct mlx4_wqe_lso_seg) +
> +					    tinfo->tso_header_size,
> +					    sizeof(struct mlx4_wqe_data_seg));
> +	tinfo->fence_size = ((sizeof(struct mlx4_wqe_ctrl_seg) +
> +			      tinfo->wqe_tso_seg_size) >> MLX4_SEG_SHIFT) +
> +			     buf->nb_segs;
> +	tinfo->wqe_size =
> +		RTE_ALIGN((uint32_t)(tinfo->fence_size << MLX4_SEG_SHIFT),
> +			  MLX4_TXBB_SIZE);
> +	/* Validate WQE size and WQE space in the send queue. */
> +	if (sq->remain_size < tinfo->wqe_size ||
> +	    tinfo->wqe_size > MLX4_MAX_WQE_SIZE)
> +		return -ENOMEM;
> +	/* Init pv. */
> +	tinfo->pv = (struct pv *)txq->bounce_buf;
> +	tinfo->pv_counter = 0;
> +	return 0;
> +}
> +
> +/**
> + * Fill the TSO WQE data segments with info on buffers to transmit.
> + *
> + * @param buf
> + *   Pointer to the first packet mbuf.
> + * @param txq
> + *   Pointer to Tx queue structure.
> + * @param tinfo
> + *   Pointer to TSO info to use.
> + * @param dseg
> + *   Pointer to the first data segment in the TSO WQE.
> + * @param ctrl
> + *   Pointer to the control segment in the TSO WQE.
> + *
> + * @return
> + *   0 on success, negative value upon error.
> + */
> +static inline volatile struct mlx4_wqe_ctrl_seg *
> +mlx4_tx_burst_fill_tso_dsegs(struct rte_mbuf *buf,
> +			     struct txq *txq,
> +			     struct tso_info *tinfo,
> +			     volatile struct mlx4_wqe_data_seg *dseg,
> +			     volatile struct mlx4_wqe_ctrl_seg *ctrl)
> +{
> +	uint32_t lkey;
> +	int nb_segs = buf->nb_segs;
> +	int nb_segs_txbb;
> +	struct mlx4_sq *sq = &txq->msq;
> +	struct rte_mbuf *sbuf = buf;
> +	struct pv *pv = tinfo->pv;
> +	int *pv_counter = &tinfo->pv_counter;
> +	volatile struct mlx4_wqe_ctrl_seg *ctrl_next =
> +			(volatile struct mlx4_wqe_ctrl_seg *)
> +				((volatile uint8_t *)ctrl + tinfo->wqe_size);
> +	uint16_t sb_of = tinfo->tso_header_size;
> +	uint16_t data_len;
> +
> +	do {
> +		/* How many dseg entries do we have in the current TXBB? */
> +		nb_segs_txbb = (MLX4_TXBB_SIZE -
> +				((uintptr_t)dseg & (MLX4_TXBB_SIZE - 1))) >>
> +			       MLX4_SEG_SHIFT;
> +		switch (nb_segs_txbb) {
> +		default:
> +			/* Should never happen. */
> +			rte_panic("%p: Invalid number of SGEs(%d) for a TXBB",
> +				  (void *)txq, nb_segs_txbb);
> +			/* rte_panic never returns. */

Since this default case cannot happen because of the calculation above, I think we don't need it. Just use "break" if the compiler complains about the missing default case.

> +		case 4:
> +			/* Memory region key for this memory pool. */
> +			lkey = mlx4_tx_mb2mr(txq, sbuf);
> +			if (unlikely(lkey == (uint32_t)-1))
> +				goto err;
> +			dseg->addr =
> +			    rte_cpu_to_be_64(rte_pktmbuf_mtod_offset(sbuf,
> +								     uintptr_t,
> +								     sb_of));
> +			dseg->lkey = lkey;
> +			/*
> +			 * This data segment starts at the beginning of a new
> +			 * TXBB, so we need to postpone its byte_count writing
> +			 * for later.
> +			 */
> +			pv[*pv_counter].dseg = dseg;
> +			/*
> +			 * Zero length segment is treated as inline segment
> +			 * with zero data.
> +			 */
> +			data_len = sbuf->data_len - sb_of;

Is there a chance that data_len will be negative, i.e. wrap around, since it is unsigned? Maybe it is better to change it to int16_t and replace the next check with:

	data_len > 0 ? data_len : 0x80000000

And I think I found a way to remove the sb_of calculation for each segment: each segment prepares the next segment's parameters, while only the pre-loop calculation for the first segment accounts for the header offset. The parameters are data_len and sb_of.

So before the loop:

	sb_of = tinfo->tso_header_size;
	data_len = sbuf->data_len - sb_of;

And inside the loop (after the check of nb_segs):

	sb_of = 0;
	data_len = sbuf->data_len; /* of the next sbuf */

This way each segment calculates the next segment's parameters and we don't need the "- sb_of" subtraction per segment.

> +			pv[(*pv_counter)++].val =
> +				rte_cpu_to_be_32(data_len ?
> +						 data_len :
> +						 0x80000000);
> +			sb_of = 0;
> +			sbuf = sbuf->next;
> +			dseg++;
> +			if (--nb_segs == 0)
> +				return ctrl_next;
> +			/* fallthrough */
> +		case 3:
> +			lkey = mlx4_tx_mb2mr(txq, sbuf);
> +			if (unlikely(lkey == (uint32_t)-1))
> +				goto err;
> +			data_len = sbuf->data_len - sb_of;
> +			mlx4_fill_tx_data_seg(dseg,
> +					lkey,
> +					rte_pktmbuf_mtod_offset(sbuf,
> +								uintptr_t,
> +								sb_of),
> +					rte_cpu_to_be_32(data_len ?
> +							 data_len :
> +							 0x80000000));
> +			sb_of = 0;
> +			sbuf = sbuf->next;
> +			dseg++;
> +			if (--nb_segs == 0)
> +				return ctrl_next;
> +			/* fallthrough */
> +		case 2:
> +			lkey = mlx4_tx_mb2mr(txq, sbuf);
> +			if (unlikely(lkey == (uint32_t)-1))
> +				goto err;
> +			data_len = sbuf->data_len - sb_of;
> +			mlx4_fill_tx_data_seg(dseg,
> +					lkey,
> +					rte_pktmbuf_mtod_offset(sbuf,
> +								uintptr_t,
> +								sb_of),
> +					rte_cpu_to_be_32(data_len ?
> +							 data_len :
> +							 0x80000000));
> +			sb_of = 0;
> +			sbuf = sbuf->next;
> +			dseg++;
> +			if (--nb_segs == 0)
> +				return ctrl_next;
> +			/* fallthrough */
> +		case 1:
> +			lkey = mlx4_tx_mb2mr(txq, sbuf);
> +			if (unlikely(lkey == (uint32_t)-1))
> +				goto err;
> +			data_len = sbuf->data_len - sb_of;
> +			mlx4_fill_tx_data_seg(dseg,
> +					lkey,
> +					rte_pktmbuf_mtod_offset(sbuf,
> +								uintptr_t,
> +								sb_of),
> +					rte_cpu_to_be_32(data_len ?
> +							 data_len :
> +							 0x80000000));
> +			sb_of = 0;
> +			sbuf = sbuf->next;
> +			dseg++;
> +			if (--nb_segs == 0)
> +				return ctrl_next;
> +		}
> +		/* Wrap dseg if it points at the end of the queue. */
> +		if ((volatile uint8_t *)dseg >= sq->eob)
> +			dseg = (volatile struct mlx4_wqe_data_seg *)
> +				((volatile uint8_t *)dseg - sq->size);
> +	} while (true);
> +err:
> +	return NULL;
> +}
> +
> +/**
> + * Fill the packet's l2, l3 and l4 headers to the WQE.
> + *
> + * This will be used as the header for each TSO segment that is transmitted.
> + *
> + * @param buf
> + *   Pointer to the first packet mbuf.
> + * @param txq
> + *   Pointer to Tx queue structure.
> + * @param tinfo
> + *   Pointer to TSO info to use.
> + * @param ctrl
> + *   Pointer to the control segment in the TSO WQE.
> + *
> + * @return
> + *   0 on success, negative value upon error.
> + */
> +static inline volatile struct mlx4_wqe_data_seg *
> +mlx4_tx_burst_fill_tso_hdr(struct rte_mbuf *buf,
> +			   struct txq *txq,
> +			   struct tso_info *tinfo,
> +			   volatile struct mlx4_wqe_ctrl_seg *ctrl)
> +{
> +	volatile struct mlx4_wqe_lso_seg *tseg =
> +		(volatile struct mlx4_wqe_lso_seg *)(ctrl + 1);
> +	struct mlx4_sq *sq = &txq->msq;
> +	struct pv *pv = tinfo->pv;
> +	int *pv_counter = &tinfo->pv_counter;
> +	int remain_size = tinfo->tso_header_size;
> +	char *from = rte_pktmbuf_mtod(buf, char *);
> +	uint16_t txbb_avail_space;
> +	/* Union to overcome volatile constraints when copying TSO header. */
> +	union {
> +		volatile uint8_t *vto;
> +		uint8_t *to;
> +	} thdr = { .vto = (volatile uint8_t *)tseg->header, };
> +
> +	/*
> +	 * TSO data always starts at offset 20 from the beginning of the TXBB
> +	 * (16 byte ctrl + 4byte TSO desc). Since each TXBB is 64Byte aligned
> +	 * we can write the first 44 TSO header bytes without worry for TxQ
> +	 * wrapping or overwriting the first TXBB 32bit word.
> +	 */
> +	txbb_avail_space = MLX4_TXBB_SIZE -
> +			   (sizeof(struct mlx4_wqe_ctrl_seg) +
> +			    sizeof(struct mlx4_wqe_lso_seg));

I think a better name is txbb_tail_size.

> +	while (remain_size >= (int)(txbb_avail_space + sizeof(uint32_t))) {
> +		/* Copy to end of txbb. */
> +		rte_memcpy(thdr.to, from, txbb_avail_space);
> +		from += txbb_avail_space;
> +		thdr.to += txbb_avail_space;
> +		/* New TXBB, check for TxQ wrap. */
> +		if (thdr.to >= sq->eob)
> +			thdr.vto = sq->buf;
> +		/* New TXBB, stash the first 32bits for later use. */
> +		pv[*pv_counter].dst = (volatile uint32_t *)thdr.to;
> +		pv[(*pv_counter)++].val = *(uint32_t *)from,
> +		from += sizeof(uint32_t);
> +		thdr.to += sizeof(uint32_t);
> +		remain_size -= (txbb_avail_space + sizeof(uint32_t));

You don't need the parentheses here.

> +		/* Avail space in new TXBB is TXBB size - 4 */
> +		txbb_avail_space = MLX4_TXBB_SIZE - sizeof(uint32_t);
> +	}
> +	if (remain_size > txbb_avail_space) {
> +		rte_memcpy(thdr.to, from, txbb_avail_space);
> +		from += txbb_avail_space;
> +		thdr.to += txbb_avail_space;
> +		remain_size -= txbb_avail_space;
> +		/* New TXBB, check for TxQ wrap. */
> +		if (thdr.to >= sq->eob)
> +			thdr.vto = sq->buf;
> +		pv[*pv_counter].dst = (volatile uint32_t *)thdr.to;
> +		rte_memcpy(&pv[*pv_counter].val, from, remain_size);
> +		(*pv_counter)++;
> +	} else {

Here it should be "else if (remain_size > 0)".

> +		rte_memcpy(thdr.to, from, remain_size);
> +	}
> +
> +	tseg->mss_hdr_size = rte_cpu_to_be_32((buf->tso_segsz << 16) |
> +					      tinfo->tso_header_size);
> +	/* Calculate data segment location. */
> +	return (volatile struct mlx4_wqe_data_seg *)
> +				((uintptr_t)tseg + tinfo->wqe_tso_seg_size);
> +}
> +
> +/**
> + * Write data segments and header for TSO uni/multi segment packet.
> + *
> + * @param buf
> + *   Pointer to the first packet mbuf.
> + * @param txq
> + *   Pointer to Tx queue structure.
> + * @param ctrl
> + *   Pointer to the WQE control segment.
> + *
> + * @return
> + *   Pointer to the next WQE control segment on success, NULL otherwise.
> + */
> +static volatile struct mlx4_wqe_ctrl_seg *
> +mlx4_tx_burst_tso(struct rte_mbuf *buf, struct txq *txq,
> +		  volatile struct mlx4_wqe_ctrl_seg *ctrl)
> +{
> +	volatile struct mlx4_wqe_data_seg *dseg;
> +	volatile struct mlx4_wqe_ctrl_seg *ctrl_next;
> +	struct mlx4_sq *sq = &txq->msq;
> +	struct tso_info tinfo;
> +	struct pv *pv;
> +	int pv_counter;
> +	int ret;
> +
> +	ret = mlx4_tx_burst_tso_get_params(buf, txq, &tinfo);
> +	if (unlikely(ret))
> +		goto error;
> +	dseg = mlx4_tx_burst_fill_tso_hdr(buf, txq, &tinfo, ctrl);
> +	if (unlikely(dseg == NULL))
> +		goto error;
> +	if ((uintptr_t)dseg >= (uintptr_t)sq->eob)
> +		dseg = (volatile struct mlx4_wqe_data_seg *)
> +					((uintptr_t)dseg - sq->size);
> +	ctrl_next = mlx4_tx_burst_fill_tso_dsegs(buf, txq, &tinfo, dseg, ctrl);
> +	if (unlikely(ctrl_next == NULL))
> +		goto error;
> +	/* Write the first DWORD of each TXBB saved earlier. */
> +	pv = tinfo.pv;
> +	pv_counter = tinfo.pv_counter;
> +	/* Need a barrier here before writing the first TXBB word. */
> +	rte_io_wmb();
> +	for (--pv_counter; pv_counter >= 0; pv_counter--)

Since we don't need the first check, a do-while statement is better here. To be fully safe you can add a likely() check before the memory barrier.

> +		*pv[pv_counter].dst = pv[pv_counter].val;
> +	ctrl->fence_size = tinfo.fence_size;
> +	sq->remain_size -= tinfo.wqe_size;
> +	return ctrl_next;
> +error:
> +	txq->stats.odropped++;
> +	return NULL;
> +}
> +
> +/**
>   * Write data segments of multi-segment packet.
>   *
>   * @param buf
> @@ -560,6 +918,7 @@ struct pv {
>  		uint16_t flags16[2];
>  	} srcrb;
>  	uint32_t lkey;
> +	bool tso = txq->priv->tso && (buf->ol_flags & PKT_TX_TCP_SEG);
>
>  	/* Clean up old buffer. */
>  	if (likely(elt->buf != NULL)) {
> @@ -578,7 +937,16 @@ struct pv {
>  		} while (tmp != NULL);
>  	}
>  	RTE_MBUF_PREFETCH_TO_FREE(elt_next->buf);
> -	if (buf->nb_segs == 1) {
> +	if (tso) {
> +		/* Change opcode to TSO */
> +		owner_opcode &= ~MLX4_OPCODE_CONFIG_CMD;
> +		owner_opcode |= MLX4_OPCODE_LSO | MLX4_WQE_CTRL_RR;
> +		ctrl_next = mlx4_tx_burst_tso(buf, txq, ctrl);
> +		if (!ctrl_next) {
> +			elt->buf = NULL;
> +			break;
> +		}
> +	} else if (buf->nb_segs == 1) {
>  		/* Validate WQE space in the send queue. */
>  		if (sq->remain_size < MLX4_TXBB_SIZE) {
>  			elt->buf = NULL;
>  			break;
> diff --git a/drivers/net/mlx4/mlx4_rxtx.h b/drivers/net/mlx4/mlx4_rxtx.h
> index 4c025e3..ffa8abf 100644
> --- a/drivers/net/mlx4/mlx4_rxtx.h
> +++ b/drivers/net/mlx4/mlx4_rxtx.h
> @@ -90,7 +90,7 @@ struct mlx4_txq_stats {
>  	unsigned int idx; /**< Mapping index. */
>  	uint64_t opackets; /**< Total of successfully sent packets. */
>  	uint64_t obytes; /**< Total of successfully sent bytes. */
> -	uint64_t odropped; /**< Total of packets not sent when Tx ring full. */
> +	uint64_t odropped; /**< Total number of packets failed to transmit. */
>  };
>
>  /** Tx queue descriptor. */
> diff --git a/drivers/net/mlx4/mlx4_txq.c b/drivers/net/mlx4/mlx4_txq.c
> index 6edaadb..9aa7440 100644
> --- a/drivers/net/mlx4/mlx4_txq.c
> +++ b/drivers/net/mlx4/mlx4_txq.c
> @@ -116,8 +116,14 @@
>  			     DEV_TX_OFFLOAD_UDP_CKSUM |
>  			     DEV_TX_OFFLOAD_TCP_CKSUM);
>  	}
> -	if (priv->hw_csum_l2tun)
> +	if (priv->tso)
> +		offloads |= DEV_TX_OFFLOAD_TCP_TSO;
> +	if (priv->hw_csum_l2tun) {
>  		offloads |= DEV_TX_OFFLOAD_OUTER_IPV4_CKSUM;
> +		if (priv->tso)
> +			offloads |= (DEV_TX_OFFLOAD_VXLAN_TNL_TSO |
> +				     DEV_TX_OFFLOAD_GRE_TNL_TSO);
> +	}
>  	return offloads;
>  }
>
> --
> 1.8.3.1