From mboxrd@z Thu Jan 1 00:00:00 1970
From: Mordechay Haimovsky
To: Matan Azrad, Adrien Mazarguil
CC: "dev@dpdk.org"
Thread-Topic: [PATCH v5] net/mlx4: support hardware TSO
Date: Mon, 9 Jul 2018 16:22:45 +0000
References: <1530715998-15703-1-git-send-email-motih@mellanox.com>
 <1531132986-5054-1-git-send-email-motih@mellanox.com>
Subject: Re: [dpdk-dev] [PATCH v5] net/mlx4: support hardware TSO
List-Id: DPDK patches and discussions
Content-Type: text/plain; charset="us-ascii"
MIME-Version: 1.0

inline

> -----Original Message-----
> From: Matan Azrad
> Sent: Monday, July 9, 2018 4:07 PM
> To: Mordechay Haimovsky; Adrien Mazarguil
> Cc: dev@dpdk.org
> Subject: RE: [PATCH v5] net/mlx4: support hardware TSO
>
> Hi Moti
>
> Please see some comments below.
>
> From: Mordechay Haimovsky
> > Implement support for hardware TSO.
> >
> > Signed-off-by: Moti Haimovsky
> > ---
> > v5:
> > * Modification to the code according to review inputs from Matan
> >   Azrad.
> > * Code optimization to the TSO header copy routine.
> > * Rearranged the TSO data-segments creation routine.
> > in reply to
> > 1530715998-15703-1-git-send-email-motih@mellanox.com
> >
> > v4:
> > * Bug fixes in filling TSO data segments.
> > * Modifications according to review inputs from Adrien Mazarguil
> >   and Matan Azrad.
> > in reply to
> > 1530190137-17848-1-git-send-email-motih@mellanox.com
> >
> > v3:
> > * Fixed compilation errors in compilers without GNU C extensions
> >   caused by a declaration of zero-length array in the code.
> > in reply to
> > 1530187032-6489-1-git-send-email-motih@mellanox.com
> >
> > v2:
> > * Fixed coding style warning.
> > in reply to
> > 1530184583-30166-1-git-send-email-motih@mellanox.com
> >
> > v1:
> > * Fixed coding style warnings.
> > in reply to
> > 1530181779-19716-1-git-send-email-motih@mellanox.com
> > ---
> >  doc/guides/nics/features/mlx4.ini |   1 +
> >  doc/guides/nics/mlx4.rst          |   3 +
> >  drivers/net/mlx4/Makefile         |   5 +
> >  drivers/net/mlx4/mlx4.c           |   9 +
> >  drivers/net/mlx4/mlx4.h           |   5 +
> >  drivers/net/mlx4/mlx4_prm.h       |  15 ++
> >  drivers/net/mlx4/mlx4_rxtx.c      | 372 +++++++++++++++++++++++++++++++++++++-
> >  drivers/net/mlx4/mlx4_rxtx.h      |   2 +-
> >  drivers/net/mlx4/mlx4_txq.c       |   8 +-
> >  9 files changed, 416 insertions(+), 4 deletions(-)
> >
> > diff --git a/doc/guides/nics/features/mlx4.ini
> > b/doc/guides/nics/features/mlx4.ini
> > index f6efd21..98a3f61 100644
> > --- a/doc/guides/nics/features/mlx4.ini
> > +++ b/doc/guides/nics/features/mlx4.ini
> > @@ -13,6 +13,7 @@ Queue start/stop     = Y
> >  MTU update           = Y
> >  Jumbo frame          = Y
> >  Scattered Rx         = Y
> > +TSO                  = Y
> >  Promiscuous mode     = Y
> >  Allmulticast mode    = Y
> >  Unicast MAC filter   = Y
> > diff --git a/doc/guides/nics/mlx4.rst b/doc/guides/nics/mlx4.rst
> > index 491106a..12adaeb 100644
> > --- a/doc/guides/nics/mlx4.rst
> > +++ b/doc/guides/nics/mlx4.rst
> > @@ -142,6 +142,9 @@ Limitations
> >    The ability to enable/disable CRC stripping requires OFED version
> >    4.3-1.5.0.0 and above or rdma-core version v18 and above.
> >
> > +- TSO (Transmit Segmentation Offload) is supported in OFED version
> > +  4.4 and above or in rdma-core version v18 and above.
> > +
> >  Prerequisites
> >  -------------
> >
> > diff --git a/drivers/net/mlx4/Makefile b/drivers/net/mlx4/Makefile
> > index 73f9d40..63bc003 100644
> > --- a/drivers/net/mlx4/Makefile
> > +++ b/drivers/net/mlx4/Makefile
> > @@ -85,6 +85,11 @@ mlx4_autoconf.h.new: FORCE
> >  mlx4_autoconf.h.new: $(RTE_SDK)/buildtools/auto-config-h.sh
> >  	$Q $(RM) -f -- '$@'
> >  	$Q : > '$@'
> > +	$Q sh -- '$<' '$@' \
> > +		HAVE_IBV_MLX4_WQE_LSO_SEG \
> > +		infiniband/mlx4dv.h \
> > +		type 'struct mlx4_wqe_lso_seg' \
> > +		$(AUTOCONF_OUTPUT)
> >
> >  # Create mlx4_autoconf.h or update it in case it differs from the new one.
> >
> > diff --git a/drivers/net/mlx4/mlx4.c b/drivers/net/mlx4/mlx4.c
> > index d151a90..5d8c76d 100644
> > --- a/drivers/net/mlx4/mlx4.c
> > +++ b/drivers/net/mlx4/mlx4.c
> > @@ -677,6 +677,15 @@ struct mlx4_conf {
> >  					IBV_RAW_PACKET_CAP_SCATTER_FCS);
> >  	DEBUG("FCS stripping toggling is %ssupported",
> >  	      priv->hw_fcs_strip ? "" : "not ");
> > +	priv->tso =
> > +		((device_attr_ex.tso_caps.max_tso > 0) &&
> > +		 (device_attr_ex.tso_caps.supported_qpts &
> > +		  (1 << IBV_QPT_RAW_PACKET)));
> > +	if (priv->tso)
> > +		priv->tso_max_payload_sz =
> > +				device_attr_ex.tso_caps.max_tso;
> > +	DEBUG("TSO is %ssupported",
> > +	      priv->tso ? "" : "not ");
> >  	/* Configure the first MAC address by default. */
> >  	err = mlx4_get_mac(priv, &mac.addr_bytes);
> >  	if (err) {
> > diff --git a/drivers/net/mlx4/mlx4.h b/drivers/net/mlx4/mlx4.h
> > index 300cb4d..89d8c38 100644
> > --- a/drivers/net/mlx4/mlx4.h
> > +++ b/drivers/net/mlx4/mlx4.h
> > @@ -47,6 +47,9 @@
> >  /** Interrupt alarm timeout value in microseconds. */
> >  #define MLX4_INTR_ALARM_TIMEOUT 100000
> >
> > +/* Maximum packet headers size (L2+L3+L4) for TSO. */
> > +#define MLX4_MAX_TSO_HEADER 192
> > +
> >  /** Port parameter. */
> >  #define MLX4_PMD_PORT_KVARG "port"
> >
> > @@ -90,6 +93,8 @@ struct priv {
> >  	uint32_t hw_csum:1; /**< Checksum offload is supported. */
> >  	uint32_t hw_csum_l2tun:1; /**< Checksum support for L2 tunnels. */
> >  	uint32_t hw_fcs_strip:1; /**< FCS stripping toggling is supported. */
> > +	uint32_t tso:1; /**< Transmit segmentation offload is supported. */
> > +	uint32_t tso_max_payload_sz; /**< Max supported TSO payload size. */
> >  	uint64_t hw_rss_sup; /**< Supported RSS hash fields (Verbs format). */
> >  	struct rte_intr_handle intr_handle; /**< Port interrupt handle. */
> >  	struct mlx4_drop *drop; /**< Shared resources for drop flow rules. */
> > diff --git a/drivers/net/mlx4/mlx4_prm.h b/drivers/net/mlx4/mlx4_prm.h
> > index b771d8c..aef77ba 100644
> > --- a/drivers/net/mlx4/mlx4_prm.h
> > +++ b/drivers/net/mlx4/mlx4_prm.h
> > @@ -19,6 +19,7 @@
> >  #ifdef PEDANTIC
> >  #pragma GCC diagnostic error "-Wpedantic"
> >  #endif
> > +#include "mlx4_autoconf.h"
> >
> >  /* ConnectX-3 Tx queue basic block. */
> >  #define MLX4_TXBB_SHIFT 6
> > @@ -40,6 +41,7 @@
> >  /* Work queue element (WQE) flags. */
> >  #define MLX4_WQE_CTRL_IIP_HDR_CSUM (1 << 28)
> >  #define MLX4_WQE_CTRL_IL4_HDR_CSUM (1 << 27)
> > +#define MLX4_WQE_CTRL_RR (1 << 6)
> >
> >  /* CQE checksum flags. */
> >  enum {
> > @@ -98,6 +100,19 @@ struct mlx4_cq {
> >  	int arm_sn; /**< Rx event counter. */
> >  };
> >
> > +#ifndef HAVE_IBV_MLX4_WQE_LSO_SEG
> > +/*
> > + * WQE LSO segment structure.
> > + * Defined here as backward compatibility for rdma-core v17 and below.
> > + * Similar definition is found in infiniband/mlx4dv.h in rdma-core v18
> > + * and above.
> > + */
> > +struct mlx4_wqe_lso_seg {
> > +	rte_be32_t mss_hdr_size;
> > +	rte_be32_t header[];
> > +};
> > +#endif
> > +
> >  /**
> >   * Retrieve a CQE entry from a CQ.
> >   *
> > diff --git a/drivers/net/mlx4/mlx4_rxtx.c b/drivers/net/mlx4/mlx4_rxtx.c
> > index 78b6dd5..b695539 100644
> > --- a/drivers/net/mlx4/mlx4_rxtx.c
> > +++ b/drivers/net/mlx4/mlx4_rxtx.c
> > @@ -38,10 +38,29 @@
> >   * DWORD (32-bit) of a TXBB.
> >   */
> >  struct pv {
> > -	volatile struct mlx4_wqe_data_seg *dseg;
> > +	union {
> > +		volatile struct mlx4_wqe_data_seg *dseg;
> > +		volatile uint32_t *dst;
> > +	};
> >  	uint32_t val;
> >  };
> >
> > +/** A helper structure for TSO packet handling. */
> > +struct tso_info {
> > +	/** Pointer to the array of saved first DWORD (32-bit) of a TXBB. */
> > +	struct pv *pv;
> > +	/** Current entry in the pv array. */
> > +	int pv_counter;
> > +	/** Total size of the WQE including padding. */
> > +	uint32_t wqe_size;
> > +	/** Size of TSO header to prepend to each packet to send. */
> > +	uint16_t tso_header_size;
> > +	/** Total size of the TSO segment in the WQE. */
> > +	uint16_t wqe_tso_seg_size;
> > +	/** Raw WQE size in units of 16 Bytes and without padding. */
> > +	uint8_t fence_size;
> > +};
> > +
> >  /** A table to translate Rx completion flags to packet type. */
> >  uint32_t mlx4_ptype_table[0x100] __rte_cache_aligned = {
> >  	/*
> > @@ -368,6 +387,345 @@ struct pv {
> >  }
> >
> >  /**
> > + * Obtain and calculate TSO information needed for assembling a TSO WQE.
> > + *
> > + * @param buf
> > + *   Pointer to the first packet mbuf.
> > + * @param txq
> > + *   Pointer to Tx queue structure.
> > + * @param tinfo
> > + *   Pointer to a structure to fill the info with.
> > + *
> > + * @return
> > + *   0 on success, negative value upon error.
> > + */
> > +static inline int
> > +mlx4_tx_burst_tso_get_params(struct rte_mbuf *buf,
> > +			     struct txq *txq,
> > +			     struct tso_info *tinfo)
> > +{
> > +	struct mlx4_sq *sq = &txq->msq;
> > +	const uint8_t tunneled = txq->priv->hw_csum_l2tun &&
> > +				 (buf->ol_flags & PKT_TX_TUNNEL_MASK);
> > +
> > +	tinfo->tso_header_size = buf->l2_len + buf->l3_len + buf->l4_len;
> > +	if (tunneled)
> > +		tinfo->tso_header_size +=
> > +				buf->outer_l2_len + buf->outer_l3_len;
> > +	if (unlikely(buf->tso_segsz == 0 ||
> > +		     tinfo->tso_header_size == 0 ||
> > +		     tinfo->tso_header_size > MLX4_MAX_TSO_HEADER ||
> > +		     tinfo->tso_header_size > buf->data_len))
> > +		return -EINVAL;
> > +	/*
> > +	 * Calculate the WQE TSO segment size
> > +	 * Note:
> > +	 * 1. An LSO segment must be padded such that the subsequent data
> > +	 *    segment is 16-byte aligned.
> > +	 * 2. The start address of the TSO segment is always 16 Bytes aligned.
> > +	 */
> > +	tinfo->wqe_tso_seg_size = RTE_ALIGN(sizeof(struct mlx4_wqe_lso_seg) +
> > +					    tinfo->tso_header_size,
> > +					    sizeof(struct mlx4_wqe_data_seg));
> > +	tinfo->fence_size = ((sizeof(struct mlx4_wqe_ctrl_seg) +
> > +			      tinfo->wqe_tso_seg_size) >> MLX4_SEG_SHIFT) +
> > +			      buf->nb_segs;
> > +	tinfo->wqe_size =
> > +		RTE_ALIGN((uint32_t)(tinfo->fence_size << MLX4_SEG_SHIFT),
> > +			  MLX4_TXBB_SIZE);
> > +	/* Validate WQE size and WQE space in the send queue. */
> > +	if (sq->remain_size < tinfo->wqe_size ||
> > +	    tinfo->wqe_size > MLX4_MAX_WQE_SIZE)
> > +		return -ENOMEM;
> > +	/* Init pv. */
> > +	tinfo->pv = (struct pv *)txq->bounce_buf;
> > +	tinfo->pv_counter = 0;
> > +	return 0;
> > +}
> > +
> > +/**
> > + * Fill the TSO WQE data segments with info on buffers to transmit.
> > + *
> > + * @param buf
> > + *   Pointer to the first packet mbuf.
> > + * @param txq
> > + *   Pointer to Tx queue structure.
> > + * @param tinfo
> > + *   Pointer to TSO info to use.
> > + * @param dseg
> > + *   Pointer to the first data segment in the TSO WQE.
> > + * @param ctrl
> > + *   Pointer to the control segment in the TSO WQE.
> > + *
> > + * @return
> > + *   Pointer to the next WQE control segment on success, NULL otherwise.
> > + */
> > +static inline volatile struct mlx4_wqe_ctrl_seg *
> > +mlx4_tx_burst_fill_tso_dsegs(struct rte_mbuf *buf,
> > +			     struct txq *txq,
> > +			     struct tso_info *tinfo,
> > +			     volatile struct mlx4_wqe_data_seg *dseg,
> > +			     volatile struct mlx4_wqe_ctrl_seg *ctrl)
> > +{
> > +	uint32_t lkey;
> > +	int nb_segs = buf->nb_segs;
> > +	int nb_segs_txbb;
> > +	struct mlx4_sq *sq = &txq->msq;
> > +	struct rte_mbuf *sbuf = buf;
> > +	struct pv *pv = tinfo->pv;
> > +	int *pv_counter = &tinfo->pv_counter;
> > +	volatile struct mlx4_wqe_ctrl_seg *ctrl_next =
> > +			(volatile struct mlx4_wqe_ctrl_seg *)
> > +			((volatile uint8_t *)ctrl + tinfo->wqe_size);
> > +	uint16_t sb_of = tinfo->tso_header_size;
> > +	uint16_t data_len;
> > +
> > +	do {
> > +		/* How many dseg entries do we have in the current TXBB? */
> > +		nb_segs_txbb = (MLX4_TXBB_SIZE -
> > +				((uintptr_t)dseg & (MLX4_TXBB_SIZE - 1))) >>
> > +			       MLX4_SEG_SHIFT;
> > +		switch (nb_segs_txbb) {
> > +		default:
> > +			/* Should never happen. */
> > +			rte_panic("%p: Invalid number of SGEs(%d) for a TXBB",
> > +				  (void *)txq, nb_segs_txbb);
> > +			/* rte_panic never returns. */
>
> Since this default case should not happen because of the above
> calculation, I think we don't need it.
> Just "break" if the compiler complains about the lack of a default case.
>
Although "default" is not mandatory in a switch statement, it is good
practice to have it, even just for code clarity, so I will keep it there.

> > +		case 4:
> > +			/* Memory region key for this memory pool. */
> > +			lkey = mlx4_tx_mb2mr(txq, sbuf);
> > +			if (unlikely(lkey == (uint32_t)-1))
> > +				goto err;
> > +			dseg->addr =
> > +			    rte_cpu_to_be_64(rte_pktmbuf_mtod_offset(sbuf,
> > +								     uintptr_t,
> > +								     sb_of));
> > +			dseg->lkey = lkey;
> > +			/*
> > +			 * This data segment starts at the beginning of a new
> > +			 * TXBB, so we need to postpone its byte_count writing
> > +			 * for later.
> > +			 */
> > +			pv[*pv_counter].dseg = dseg;
> > +			/*
> > +			 * Zero length segment is treated as inline segment
> > +			 * with zero data.
> > +			 */
> > +			data_len = sbuf->data_len - sb_of;
>
> Is there a chance that the data_len will be negative? Wrapped around in
> this case?

Since we verify ahead of time that all L2, L3 and L4 headers reside in the
same fragment, there is no reason for data_len to become negative. This is
why I use uint16_t, which is the same data type used in struct rte_mbuf for
representing data_len, and the same as we do in mlx4_tx_burst_segs.

> Maybe it is better to change it to int16_t and to replace the next check
> with:
> data_len > 0 ? data_len : 0x80000000

I will keep this the way it is for 2 reasons:
1. It seems to me more cumbersome than what I wrote.
2. Code-consistency wise, this is how we also wrote it in
   mlx4_tx_burst_segs; what's good there is also good here.

> And I think I found a way to remove the sb_of calculations for each
> segment:
>
> Each segment will create the next segment's parameters while only the
> pre-loop calculation for the first segment's parameters will calculate
> the header offset:
>
> The parameters: data_len and sb_of.
>
> So before the loop:
> sb_of = tinfo->tso_header_size;
> data_len = sbuf->data_len - sb_of;
>
> And inside the loop (after the check of nb_segs):
> sb_of = 0;
> data_len = sbuf->data_len (the next sbuf);
>
> So each segment calculates the next segment's parameters and we don't
> need the "- sb_of" calculation per segment.
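For what it's worth, the restructuring suggested above can be sketched
standalone. In the sketch below, `struct seg` and `tso_payload_len()` are
hypothetical stand-ins for the mbuf chain and the dseg-filling loop, not
code from the patch; only the sb_of/data_len hand-off follows the
suggestion:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for an mbuf chain segment. */
struct seg {
	uint16_t data_len;
	struct seg *next;
};

/*
 * Walk a segment chain the way the suggestion describes: sb_of (the TSO
 * header offset) is subtracted once, before the loop, for the first
 * segment only; each iteration then prepares the *next* segment's
 * data_len with sb_of fixed at 0, so no "- sb_of" is needed per segment.
 * Returns the total payload length across the chain.
 */
static uint32_t
tso_payload_len(struct seg *sbuf, uint16_t tso_header_size)
{
	uint32_t total = 0;
	/* Pre-loop: first segment's parameters include the header offset. */
	uint16_t data_len = sbuf->data_len - tso_header_size;

	while (sbuf != NULL) {
		total += data_len;	/* consume current parameters */
		sbuf = sbuf->next;
		if (sbuf != NULL)
			data_len = sbuf->data_len; /* next segment, sb_of = 0 */
	}
	return total;
}
```

A two-segment chain of 100 and 60 bytes with a 40-byte header thus yields
a 120-byte payload without any per-segment offset subtraction.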
>
NICE :)

> > +			pv[(*pv_counter)++].val =
> > +				rte_cpu_to_be_32(data_len ?
> > +						 data_len :
> > +						 0x80000000);
> > +			sb_of = 0;
> > +			sbuf = sbuf->next;
> > +			dseg++;
> > +			if (--nb_segs == 0)
> > +				return ctrl_next;
> > +			/* fallthrough */
> > +		case 3:
> > +			lkey = mlx4_tx_mb2mr(txq, sbuf);
> > +			if (unlikely(lkey == (uint32_t)-1))
> > +				goto err;
> > +			data_len = sbuf->data_len - sb_of;
> > +			mlx4_fill_tx_data_seg(dseg,
> > +					lkey,
> > +					rte_pktmbuf_mtod_offset(sbuf,
> > +								uintptr_t,
> > +								sb_of),
> > +					rte_cpu_to_be_32(data_len ?
> > +							 data_len :
> > +							 0x80000000));
> > +			sb_of = 0;
> > +			sbuf = sbuf->next;
> > +			dseg++;
> > +			if (--nb_segs == 0)
> > +				return ctrl_next;
> > +			/* fallthrough */
> > +		case 2:
> > +			lkey = mlx4_tx_mb2mr(txq, sbuf);
> > +			if (unlikely(lkey == (uint32_t)-1))
> > +				goto err;
> > +			data_len = sbuf->data_len - sb_of;
> > +			mlx4_fill_tx_data_seg(dseg,
> > +					lkey,
> > +					rte_pktmbuf_mtod_offset(sbuf,
> > +								uintptr_t,
> > +								sb_of),
> > +					rte_cpu_to_be_32(data_len ?
> > +							 data_len :
> > +							 0x80000000));
> > +			sb_of = 0;
> > +			sbuf = sbuf->next;
> > +			dseg++;
> > +			if (--nb_segs == 0)
> > +				return ctrl_next;
> > +			/* fallthrough */
> > +		case 1:
> > +			lkey = mlx4_tx_mb2mr(txq, sbuf);
> > +			if (unlikely(lkey == (uint32_t)-1))
> > +				goto err;
> > +			data_len = sbuf->data_len - sb_of;
> > +			mlx4_fill_tx_data_seg(dseg,
> > +					lkey,
> > +					rte_pktmbuf_mtod_offset(sbuf,
> > +								uintptr_t,
> > +								sb_of),
> > +					rte_cpu_to_be_32(data_len ?
> > +							 data_len :
> > +							 0x80000000));
> > +			sb_of = 0;
> > +			sbuf = sbuf->next;
> > +			dseg++;
> > +			if (--nb_segs == 0)
> > +				return ctrl_next;
> > +		}
> > +		/* Wrap dseg if it points at the end of the queue. */
> > +		if ((volatile uint8_t *)dseg >= sq->eob)
> > +			dseg = (volatile struct mlx4_wqe_data_seg *)
> > +			       ((volatile uint8_t *)dseg - sq->size);
> > +	} while (true);
> > +err:
> > +	return NULL;
> > +}
> > +
> > +/**
> > + * Fill the packet's L2, L3 and L4 headers into the WQE.
> > + *
> > + * This will be used as the header for each TSO segment that is
> > + * transmitted.
> > + *
> > + * @param buf
> > + *   Pointer to the first packet mbuf.
> > + * @param txq
> > + *   Pointer to Tx queue structure.
> > + * @param tinfo
> > + *   Pointer to TSO info to use.
> > + * @param ctrl
> > + *   Pointer to the control segment in the TSO WQE.
> > + *
> > + * @return
> > + *   Pointer to the first data segment on success, NULL otherwise.
> > + */
> > +static inline volatile struct mlx4_wqe_data_seg *
> > +mlx4_tx_burst_fill_tso_hdr(struct rte_mbuf *buf,
> > +			   struct txq *txq,
> > +			   struct tso_info *tinfo,
> > +			   volatile struct mlx4_wqe_ctrl_seg *ctrl)
> > +{
> > +	volatile struct mlx4_wqe_lso_seg *tseg =
> > +		(volatile struct mlx4_wqe_lso_seg *)(ctrl + 1);
> > +	struct mlx4_sq *sq = &txq->msq;
> > +	struct pv *pv = tinfo->pv;
> > +	int *pv_counter = &tinfo->pv_counter;
> > +	int remain_size = tinfo->tso_header_size;
> > +	char *from = rte_pktmbuf_mtod(buf, char *);
> > +	uint16_t txbb_avail_space;
> > +	/* Union to overcome volatile constraints when copying TSO header. */
> > +	union {
> > +		volatile uint8_t *vto;
> > +		uint8_t *to;
> > +	} thdr = { .vto = (volatile uint8_t *)tseg->header, };
> > +
> > +	/*
> > +	 * TSO data always starts at offset 20 from the beginning of the TXBB
> > +	 * (16 byte ctrl + 4 byte TSO desc). Since each TXBB is 64 Byte aligned
> > +	 * we can write the first 44 TSO header bytes without worrying about
> > +	 * TxQ wrapping or overwriting the first TXBB 32-bit word.
> > +	 */
> > +	txbb_avail_space = MLX4_TXBB_SIZE -
> > +			   (sizeof(struct mlx4_wqe_ctrl_seg) +
> > +			    sizeof(struct mlx4_wqe_lso_seg));
>
> I think that a better name is txbb_tail_size.

I think that txbb_avail_space is good enough, so no change here.

> > +	while (remain_size >= (int)(txbb_avail_space + sizeof(uint32_t))) {
> > +		/* Copy to end of txbb. */
> > +		rte_memcpy(thdr.to, from, txbb_avail_space);
> > +		from += txbb_avail_space;
> > +		thdr.to += txbb_avail_space;
> > +		/* New TXBB, check for TxQ wrap. */
> > +		if (thdr.to >= sq->eob)
> > +			thdr.vto = sq->buf;
> > +		/* New TXBB, stash the first 32 bits for later use. */
> > +		pv[*pv_counter].dst = (volatile uint32_t *)thdr.to;
> > +		pv[(*pv_counter)++].val = *(uint32_t *)from;
> > +		from += sizeof(uint32_t);
> > +		thdr.to += sizeof(uint32_t);
> > +		remain_size -= (txbb_avail_space + sizeof(uint32_t));
>
> You don't need the () here.

True.

> > +		/* Avail space in new TXBB is TXBB size - 4. */
> > +		txbb_avail_space = MLX4_TXBB_SIZE - sizeof(uint32_t);
> > +	}
> > +	if (remain_size > txbb_avail_space) {
> > +		rte_memcpy(thdr.to, from, txbb_avail_space);
> > +		from += txbb_avail_space;
> > +		thdr.to += txbb_avail_space;
> > +		remain_size -= txbb_avail_space;
> > +		/* New TXBB, check for TxQ wrap. */
> > +		if (thdr.to >= sq->eob)
> > +			thdr.vto = sq->buf;
> > +		pv[*pv_counter].dst = (volatile uint32_t *)thdr.to;
> > +		rte_memcpy(&pv[*pv_counter].val, from, remain_size);
> > +		(*pv_counter)++;
> > +	} else {
>
> Here it should be "else if (remain_size > 0)".

True.

> > +		rte_memcpy(thdr.to, from, remain_size);
> > +	}
> > +
> > +	tseg->mss_hdr_size = rte_cpu_to_be_32((buf->tso_segsz << 16) |
> > +					      tinfo->tso_header_size);
> > +	/* Calculate data segment location. */
> > +	return (volatile struct mlx4_wqe_data_seg *)
> > +				((uintptr_t)tseg + tinfo->wqe_tso_seg_size);
> > +}
> > +
> > +/**
> > + * Write data segments and header for TSO uni/multi segment packet.
> > + *
> > + * @param buf
> > + *   Pointer to the first packet mbuf.
> > + * @param txq
> > + *   Pointer to Tx queue structure.
> > + * @param ctrl
> > + *   Pointer to the WQE control segment.
> > + *
> > + * @return
> > + *   Pointer to the next WQE control segment on success, NULL otherwise.
> > + */
> > +static volatile struct mlx4_wqe_ctrl_seg *
> > +mlx4_tx_burst_tso(struct rte_mbuf *buf, struct txq *txq,
> > +		  volatile struct mlx4_wqe_ctrl_seg *ctrl)
> > +{
> > +	volatile struct mlx4_wqe_data_seg *dseg;
> > +	volatile struct mlx4_wqe_ctrl_seg *ctrl_next;
> > +	struct mlx4_sq *sq = &txq->msq;
> > +	struct tso_info tinfo;
> > +	struct pv *pv;
> > +	int pv_counter;
> > +	int ret;
> > +
> > +	ret = mlx4_tx_burst_tso_get_params(buf, txq, &tinfo);
> > +	if (unlikely(ret))
> > +		goto error;
> > +	dseg = mlx4_tx_burst_fill_tso_hdr(buf, txq, &tinfo, ctrl);
> > +	if (unlikely(dseg == NULL))
> > +		goto error;
> > +	if ((uintptr_t)dseg >= (uintptr_t)sq->eob)
> > +		dseg = (volatile struct mlx4_wqe_data_seg *)
> > +			((uintptr_t)dseg - sq->size);
> > +	ctrl_next = mlx4_tx_burst_fill_tso_dsegs(buf, txq, &tinfo, dseg, ctrl);
> > +	if (unlikely(ctrl_next == NULL))
> > +		goto error;
> > +	/* Write the first DWORD of each TXBB saved earlier. */
> > +	pv = tinfo.pv;
> > +	pv_counter = tinfo.pv_counter;
> > +	/* Need a barrier here before writing the first TXBB word. */
> > +	rte_io_wmb();
>
> > +	for (--pv_counter; pv_counter >= 0; pv_counter--)
>
> Since we don't need the first check, a do-while statement is better.
> To be fully safe you can use a likely check before the memory barrier.

I will return the if statement, but will not change the loop as it is the
same as in mlx4_tx_burst_segs and I do want to have consistent code.

> > +		*pv[pv_counter].dst = pv[pv_counter].val;
> > +	ctrl->fence_size = tinfo.fence_size;
> > +	sq->remain_size -= tinfo.wqe_size;
> > +	return ctrl_next;
> > +error:
> > +	txq->stats.odropped++;
> > +	return NULL;
> > +}
> > +
> >  /**
> >   * Write data segments of multi-segment packet.
> >   *
> >   * @param buf
> > @@ -560,6 +918,7 @@ struct pv {
> >  			uint16_t flags16[2];
> >  		} srcrb;
> >  		uint32_t lkey;
> > +		bool tso = txq->priv->tso && (buf->ol_flags & PKT_TX_TCP_SEG);
> >
> >  		/* Clean up old buffer. */
> >  		if (likely(elt->buf != NULL)) {
> > @@ -578,7 +937,16 @@ struct pv {
> >  			} while (tmp != NULL);
> >  		}
> >  		RTE_MBUF_PREFETCH_TO_FREE(elt_next->buf);
> > -		if (buf->nb_segs == 1) {
> > +		if (tso) {
> > +			/* Change opcode to TSO. */
> > +			owner_opcode &= ~MLX4_OPCODE_CONFIG_CMD;
> > +			owner_opcode |= MLX4_OPCODE_LSO | MLX4_WQE_CTRL_RR;
> > +			ctrl_next = mlx4_tx_burst_tso(buf, txq, ctrl);
> > +			if (!ctrl_next) {
> > +				elt->buf = NULL;
> > +				break;
> > +			}
> > +		} else if (buf->nb_segs == 1) {
> >  			/* Validate WQE space in the send queue. */
> >  			if (sq->remain_size < MLX4_TXBB_SIZE) {
> >  				elt->buf = NULL;
> >  				break;
> >  			}
> > diff --git a/drivers/net/mlx4/mlx4_rxtx.h b/drivers/net/mlx4/mlx4_rxtx.h
> > index 4c025e3..ffa8abf 100644
> > --- a/drivers/net/mlx4/mlx4_rxtx.h
> > +++ b/drivers/net/mlx4/mlx4_rxtx.h
> > @@ -90,7 +90,7 @@ struct mlx4_txq_stats {
> >  	unsigned int idx; /**< Mapping index. */
> >  	uint64_t opackets; /**< Total of successfully sent packets. */
> >  	uint64_t obytes; /**< Total of successfully sent bytes. */
> > -	uint64_t odropped; /**< Total of packets not sent when Tx ring full. */
> > +	uint64_t odropped; /**< Total number of packets failed to transmit. */
> >  };
> >
> >  /** Tx queue descriptor. */
> > diff --git a/drivers/net/mlx4/mlx4_txq.c b/drivers/net/mlx4/mlx4_txq.c
> > index 6edaadb..9aa7440 100644
> > --- a/drivers/net/mlx4/mlx4_txq.c
> > +++ b/drivers/net/mlx4/mlx4_txq.c
> > @@ -116,8 +116,14 @@
> >  			     DEV_TX_OFFLOAD_UDP_CKSUM |
> >  			     DEV_TX_OFFLOAD_TCP_CKSUM);
> >  	}
> > -	if (priv->hw_csum_l2tun)
> > +	if (priv->tso)
> > +		offloads |= DEV_TX_OFFLOAD_TCP_TSO;
> > +	if (priv->hw_csum_l2tun) {
> >  		offloads |= DEV_TX_OFFLOAD_OUTER_IPV4_CKSUM;
> > +		if (priv->tso)
> > +			offloads |= (DEV_TX_OFFLOAD_VXLAN_TNL_TSO |
> > +				     DEV_TX_OFFLOAD_GRE_TNL_TSO);
> > +	}
> >  	return offloads;
> >  }
> >
> > --
> > 1.8.3.1
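
P.S. for readers following the WQE size validation in
mlx4_tx_burst_tso_get_params() above: the arithmetic can be replayed in a
standalone sketch. Everything below is a reconstruction under stated
assumptions, not driver code; the constants (64-byte TXBB, 16-byte
segment unit, 16-byte control segment, 4-byte LSO descriptor) are taken
from the patch context, and `tso_wqe_size()` is a hypothetical helper:

```c
#include <stdint.h>

/* Assumed constants: TXBB = 64 B (MLX4_TXBB_SHIFT 6); WQE segments are
 * counted in 16-byte units (MLX4_SEG_SHIFT assumed to be 4); the control
 * segment is 16 B; the LSO descriptor (mss_hdr_size) is 4 B. */
#define TXBB_SIZE	64u
#define SEG_SIZE	16u
#define CTRL_SIZE	16u
#define LSO_DESC_SIZE	4u
#define ALIGN_UP(v, a)	(((v) + (a) - 1) & ~((a) - 1))

/*
 * Replay of the sizing arithmetic: pad the LSO segment (descriptor plus
 * copied headers) so the first data segment is 16-byte aligned, count
 * 16-byte units for the fence (ctrl + LSO segment + one data segment per
 * mbuf segment), then pad the whole WQE to a TXBB multiple.
 */
static uint32_t
tso_wqe_size(uint16_t tso_header_size, uint8_t nb_segs)
{
	uint32_t lso_seg = ALIGN_UP(LSO_DESC_SIZE + tso_header_size, SEG_SIZE);
	uint32_t fence16 = (CTRL_SIZE + lso_seg) / SEG_SIZE + nb_segs;

	return ALIGN_UP(fence16 * SEG_SIZE, TXBB_SIZE);
}
```

Under these assumptions, a common 54-byte Ethernet/IPv4/TCP header with a
single-segment mbuf pads the LSO segment to 64 B, giving 6 fence units
(96 B) and a 128-byte (two-TXBB) WQE.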