From mboxrd@z Thu Jan  1 00:00:00 1970
From: Matan Azrad
To: Mordechay Haimovsky, Adrien Mazarguil
CC: "dev@dpdk.org"
Thread-Topic: [PATCH v5] net/mlx4: support hardware TSO
Date: Mon, 9 Jul 2018 18:18:17 +0000
References: <1530715998-15703-1-git-send-email-motih@mellanox.com>
 <1531132986-5054-1-git-send-email-motih@mellanox.com>
Subject: Re: [dpdk-dev] [PATCH v5] net/mlx4: support hardware TSO
List-Id: DPDK patches and discussions
X-List-Received-Date: Mon, 09 Jul 2018 18:18:21 -0000

Hi Moti,

I am continuing the discussion here even though the next version is already
out, just so the full discussion stays in one place.

From: Mordechay Haimovsky
> inline
>
> > From: Matan Azrad
> > Hi Moti
> >
> > Please see some comments below.
> >
> > From: Mordechay Haimovsky
> > > Implement support for hardware TSO.
> > >
> > > Signed-off-by: Moti Haimovsky
...
> > > +	do {
> > > +		/* how many dseg entries do we have in the current TXBB ?
> > > 		 */
> > > +		nb_segs_txbb = (MLX4_TXBB_SIZE -
> > > +				((uintptr_t)dseg & (MLX4_TXBB_SIZE - 1))) >>
> > > +				MLX4_SEG_SHIFT;
> > > +		switch (nb_segs_txbb) {
> > > +		default:
> > > +			/* Should never happen. */
> > > +			rte_panic("%p: Invalid number of SGEs(%d) for a TXBB",
> > > +				  (void *)txq, nb_segs_txbb);
> > > +			/* rte_panic never returns. */
> >
> > Since this default case should not happen because of the above
> > calculation, I think we don't need it.
> > Just "break" if the compiler complains about the lack of a default case.
> >
> Although "default" is not mandatory in a switch statement, it is good
> practice to have it, even just for code clarity,
> so I will keep it there.

But the rte_panic code (and the whole default block, in fact) is redundant,
and we don't need redundant code in our data path.
You can keep a comment there for clarity if you want.

> > > +		case 4:
> > > +			/* Memory region key for this memory pool. */
> > > +			lkey = mlx4_tx_mb2mr(txq, sbuf);
> > > +			if (unlikely(lkey == (uint32_t)-1))
> > > +				goto err;
> > > +			dseg->addr =
> > > +				rte_cpu_to_be_64(rte_pktmbuf_mtod_offset(sbuf,
> > > +								uintptr_t,
> > > +								sb_of));
> > > +			dseg->lkey = lkey;
> > > +			/*
> > > +			 * This data segment starts at the beginning of a new
> > > +			 * TXBB, so we need to postpone its byte_count
> > > +			 * writing for later.
> > > +			 */
> > > +			pv[*pv_counter].dseg = dseg;
> > > +			/*
> > > +			 * Zero length segment is treated as inline segment
> > > +			 * with zero data.
> > > +			 */
> > > +			data_len = sbuf->data_len - sb_of;
> >
> > Is there a chance that data_len will be negative, i.e. wraps around in
> > this case?

> Since we verify beforehand that all the L2, L3 and L4 headers reside in
> the same fragment, there is no reason for data_len to become negative.
> This is why I use uint16_t, which is the same data type used in struct
> rte_mbuf for representing data_len, and the same as we do in
> mlx4_tx_burst_segs.
>
> > Maybe it is better to change it to int16_t and to replace the next
> > check with:
> > data_len > 0 ? data_len : 0x80000000
> >
> I will keep this the way it is for 2 reasons:
> 1. It seems to me more cumbersome than what I wrote.

OK, you are right here; if it cannot be negative we shouldn't change it :)

> 2. Code-consistency-wise, this is how we also wrote it in
> mlx4_tx_burst_segs, and what's good there is also good here.

I don't agree; this is really a different case from that one, a lot of the
assumptions are different, and the code may reflect that.

> > And I think I found a way to remove the sb_of calculations for each
> > segment:
> >
> > Each segment will create the next segment's parameters, while only the
> > pre-loop calculation for the first segment's parameters will take the
> > header offset into account:
> >
> > The parameters: data_len and sb_of.
> >
> > So before the loop:
> > sb_of = tinfo->tso_header_size;
> > data_len = sbuf->data_len - sb_of;
> >
> > And inside the loop (after the check of nb_segs):
> > sb_of = 0;
> > data_len = sbuf->data_len (of the next sbuf);
> >
> > So each segment calculates the next segment's parameters, and we don't
> > need the "- sb_of" calculation per segment.
> >
> NICE :)
>

Sorry for seeing it only now, but we don't even need the "sb_of = 0" per
segment:
We can add one more parameter for the next segment,
addr = rte_pktmbuf_mtod_offset(sbuf, uintptr_t, tinfo->tso_header_size)
before the loop and
addr = rte_pktmbuf_mtod(sbuf, uintptr_t)
inside the loop, so finally we save 2 cycles per segment :)

...
> > > +static inline volatile struct mlx4_wqe_data_seg *
> > > +mlx4_tx_burst_fill_tso_hdr(struct rte_mbuf *buf,
> > > +			   struct txq *txq,
> > > +			   struct tso_info *tinfo,
> > > +			   volatile struct mlx4_wqe_ctrl_seg *ctrl)
> > > +{
> > > +	volatile struct mlx4_wqe_lso_seg *tseg =
> > > +		(volatile struct mlx4_wqe_lso_seg *)(ctrl + 1);
> > > +	struct mlx4_sq *sq = &txq->msq;
> > > +	struct pv *pv = tinfo->pv;
> > > +	int *pv_counter = &tinfo->pv_counter;
> > > +	int remain_size = tinfo->tso_header_size;
> > > +	char *from = rte_pktmbuf_mtod(buf, char *);
> > > +	uint16_t txbb_avail_space;
> > > +	/* Union to overcome volatile constraints when copying TSO header. */
> > > +	union {
> > > +		volatile uint8_t *vto;
> > > +		uint8_t *to;
> > > +	} thdr = { .vto = (volatile uint8_t *)tseg->header, };
> > > +
> > > +	/*
> > > +	 * TSO data always starts at offset 20 from the beginning of the TXBB
> > > +	 * (16 byte ctrl + 4 byte TSO desc). Since each TXBB is 64 byte aligned
> > > +	 * we can write the first 44 TSO header bytes without worry for TxQ
> > > +	 * wrapping or overwriting the first TXBB 32bit word.
> > > +	 */
> > > +	txbb_avail_space = MLX4_TXBB_SIZE -
> > > +			   (sizeof(struct mlx4_wqe_ctrl_seg) +
> > > +			    sizeof(struct mlx4_wqe_lso_seg));
> >
> > I think a better name is txbb_tail_size.

> I think that txbb_avail_space is good enough, so no change here.

My suggestion is because this size is only the tail size of the TXBB,
without the first 4 bytes, so it may be more reasonable.
I can understand your suggestion as well, since "avail" points to the
available TXBB size to write (the first 4 bytes are not available now, only
later).
I'm not going to argue about it :)

>
> > > +	while (remain_size >= (int)(txbb_avail_space + sizeof(uint32_t))) {
> > > +		/* Copy to end of txbb. */
> > > +		rte_memcpy(thdr.to, from, txbb_avail_space);
> > > +		from += txbb_avail_space;
> > > +		thdr.to += txbb_avail_space;
> > > +		/* New TXBB, check for TxQ wrap. */
> > > +		if (thdr.to >= sq->eob)
> > > +			thdr.vto = sq->buf;
> > > +		/* New TXBB, stash the first 32 bits for later use. */
> > > +		pv[*pv_counter].dst = (volatile uint32_t *)thdr.to;
> > > +		pv[(*pv_counter)++].val = *(uint32_t *)from,
> > > +		from += sizeof(uint32_t);
> > > +		thdr.to += sizeof(uint32_t);
> > > +		remain_size -= (txbb_avail_space + sizeof(uint32_t));
> >
> > You don't need the () here.
> True
> >
> > > +		/* Avail space in new TXBB is TXBB size - 4 */
> > > +		txbb_avail_space = MLX4_TXBB_SIZE - sizeof(uint32_t);
> > > +	}
> > > +	if (remain_size > txbb_avail_space) {
> > > +		rte_memcpy(thdr.to, from, txbb_avail_space);
> > > +		from += txbb_avail_space;
> > > +		thdr.to += txbb_avail_space;
> > > +		remain_size -= txbb_avail_space;
> > > +		/* New TXBB, check for TxQ wrap. */
> > > +		if (thdr.to >= sq->eob)
> > > +			thdr.vto = sq->buf;
> > > +		pv[*pv_counter].dst = (volatile uint32_t *)thdr.to;
> > > +		rte_memcpy(&pv[*pv_counter].val, from, remain_size);
> > > +		(*pv_counter)++;
> > > +	} else {
> >
> > Here it should be "else if (remain_size > 0)".
> True
> >
> > > +		rte_memcpy(thdr.to, from, remain_size);
> > > +	}
> > > +
> > > +	tseg->mss_hdr_size = rte_cpu_to_be_32((buf->tso_segsz << 16) |
> > > +					      tinfo->tso_header_size);
> > > +	/* Calculate data segment location */
> > > +	return (volatile struct mlx4_wqe_data_seg *)
> > > +		((uintptr_t)tseg + tinfo->wqe_tso_seg_size);
> > > +}
> > > +
> > > +/**
> > > + * Write data segments and header for TSO uni/multi segment packet.
> > > + *
> > > + * @param buf
> > > + *   Pointer to the first packet mbuf.
> > > + * @param txq
> > > + *   Pointer to Tx queue structure.
> > > + * @param ctrl
> > > + *   Pointer to the WQE control segment.
> > > + *
> > > + * @return
> > > + *   Pointer to the next WQE control segment on success, NULL otherwise.
> > > + */
> > > +static volatile struct mlx4_wqe_ctrl_seg *
> > > +mlx4_tx_burst_tso(struct rte_mbuf *buf, struct txq *txq,
> > > +		  volatile struct mlx4_wqe_ctrl_seg *ctrl)
> > > +{
> > > +	volatile struct mlx4_wqe_data_seg *dseg;
> > > +	volatile struct mlx4_wqe_ctrl_seg *ctrl_next;
> > > +	struct mlx4_sq *sq = &txq->msq;
> > > +	struct tso_info tinfo;
> > > +	struct pv *pv;
> > > +	int pv_counter;
> > > +	int ret;
> > > +
> > > +	ret = mlx4_tx_burst_tso_get_params(buf, txq, &tinfo);
> > > +	if (unlikely(ret))
> > > +		goto error;
> > > +	dseg = mlx4_tx_burst_fill_tso_hdr(buf, txq, &tinfo, ctrl);
> > > +	if (unlikely(dseg == NULL))
> > > +		goto error;
> > > +	if ((uintptr_t)dseg >= (uintptr_t)sq->eob)
> > > +		dseg = (volatile struct mlx4_wqe_data_seg *)
> > > +			((uintptr_t)dseg - sq->size);
> > > +	ctrl_next = mlx4_tx_burst_fill_tso_dsegs(buf, txq, &tinfo, dseg, ctrl);
> > > +	if (unlikely(ctrl_next == NULL))
> > > +		goto error;
> > > +	/* Write the first DWORD of each TXBB saved earlier. */
> > > +	pv = tinfo.pv;
> > > +	pv_counter = tinfo.pv_counter;
> > > +	/* Need a barrier here before writing the first TXBB word. */
> > > +	rte_io_wmb();

> > > +	for (--pv_counter; pv_counter >= 0; pv_counter--)
> >
> > Since we don't need the first check, a do/while statement is better.
> > To be fully safe, you can use a likely check before the memory barrier.
> >
> I will bring back the if statement, but will not change the loop, as it is
> the same as in mlx4_tx_burst_segs and I do want to have consistent code.

I don't agree with this statement, as above: different assumptions lead to
different optimized code.
Here and in the mlx4_tx_burst_segs code we don't need the first check of
the for loop, and we don't need redundant checks in our data path, so both
should be updated.
The difference is in the prior check: here it should be likely, and there
it should not.
So actually there the "for" loop can stay as is, but we don't need the
first "if" check of pv_counter, because it is already checked by the for
loop.
I suggest optimizing both (a prior patch for the mlx4_tx_burst_segs
optimization).

> > > +		*pv[pv_counter].dst = pv[pv_counter].val;
> > > +	ctrl->fence_size = tinfo.fence_size;
> > > +	sq->remain_size -= tinfo.wqe_size;
> > > +	return ctrl_next;