From: Konstantin Ananyev
To: Morten Brørup, David Marchand, dev@dpdk.org
Cc: thomas@monjalon.net, ferruh.yigit@amd.com, stable@dpdk.org, Olivier Matz, Jijiang Liu, Andrew Rybchenko, Ferruh Yigit, Kaiwen Deng, qiming.yang@intel.com, yidingx.zhou@intel.com, Aman Singh, Yuying Zhang, Thomas Monjalon, Jerin Jacob
Subject: RE: [PATCH v2 3/8] mbuf: fix Tx checksum offload examples
Date: Tue, 9 Apr 2024 13:38:38 +0000

> > From: David Marchand [mailto:david.marchand@redhat.com]
> > Sent: Friday, 5 April 2024 16.46
> >
> > Mandate use of rte_eth_tx_prepare() in the mbuf Tx checksum offload
> > examples.
>
> I strongly disagree with this change!
>
> It will cause a huge performance degradation for shaping applications:
>
> A packet will be processed and finalized at an output or forwarding pipeline stage, where some other fields might also be written, so
> zeroing e.g. the out_ip checksum at this stage has low cost (no new cache misses).
>
> Then, the packet might be queued for QoS or similar.
>
> If rte_eth_tx_prepare() must be called at the egress pipeline stage, it has to write to the packet and cause a cache miss per packet,
> instead of simply passing on the packet to the NIC hardware.
>
> It must be possible to finalize the packet at the output/forwarding pipeline stage!

If you can finalize your packet at the output/forwarding stage, then why can't you invoke tx_prepare() at the same stage?
There seems to be some misunderstanding about what tx_prepare() does:
in fact it doesn't communicate with the HW queue (it doesn't update the TXD ring, etc.);
all it does is make changes in the mbuf itself.
Yes, it reads some fields in the SW TX queue struct (max number of TXDs per packet, etc.),
but AFAIK it is safe to call tx_prepare() and tx_burst() from different threads,
at least on the implementations I am aware of.
I just checked the docs - it seems this is not stated explicitly anywhere; that might be why it is causing such misunderstanding.
(A rough sketch of splitting tx_prepare() and tx_burst() across the two stages is at the end of this mail.)

>
> Also, how is rte_eth_tx_prepare() supposed to work for cloned packets egressing on different NIC hardware?

If you clone the full packet (including the L2/L3 headers), then obviously such a construction might not work
properly with tx_prepare() over two different NICs.
Though in the majority of cases you clone only the data segments, while at least the L2 headers are put into a separate segment.
One simple approach would be to keep the L3 header in that separate segment as well.
But yes, there is a problem when you need to send exactly the same packet over different NICs.
As I remember, things don't work quite well here for the bonding PMD - you might have a bond over 2 NICs with different
tx_prepare() implementations, and which one to call might not be clear until the actual PMD tx_burst() is invoked.

>
> In theory, it might get even worse if we make this opaque instead of transparent and standardized:
> One PMD might reset out_ip checksum to 0x0000, and another PMD might reset it to 0xFFFF.
>
> I can only see one solution:
> We need to standardize on common minimum requirements for how to prepare packets for each TX offload.

If we can get each and every vendor to agree here, that will definitely help to simplify things quite a bit.
Then we can probably have one common tx_prepare() for all vendors ;)
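
To make the two-stage point concrete, here is a minimal sketch, not code from the patch: the packet is finalized and run through rte_eth_tx_prepare() at the forwarding stage, parked in a software ring, and only handed to rte_eth_tx_burst() at egress with no further writes to packet data. It assumes an IPv4/TCP packet with l2_len/l3_len already set, a port configured with IP/TCP checksum offload, and a hypothetical "qos_ring" standing in for the QoS/shaping queue; error handling is omitted.

#include <rte_ethdev.h>
#include <rte_ip.h>
#include <rte_mbuf.h>
#include <rte_ring.h>

/* Stage 1: output/forwarding - touch the packet once, then prepare it. */
static void
finalize_and_prepare(uint16_t port_id, uint16_t queue_id,
                     struct rte_mbuf **pkts, uint16_t nb_pkts,
                     struct rte_ring *qos_ring)
{
    for (uint16_t i = 0; i < nb_pkts; i++) {
        struct rte_mbuf *m = pkts[i];
        struct rte_ipv4_hdr *ip = rte_pktmbuf_mtod_offset(m,
                struct rte_ipv4_hdr *, m->l2_len);

        /* Request HW checksum offloads while the header is hot in cache. */
        m->ol_flags |= RTE_MBUF_F_TX_IPV4 | RTE_MBUF_F_TX_IP_CKSUM |
                       RTE_MBUF_F_TX_TCP_CKSUM;
        ip->hdr_checksum = 0;
    }

    /*
     * tx_prepare() only edits the mbufs (e.g. fills the pseudo-header
     * checksum if the PMD needs it); it does not touch the TXD ring,
     * so it can run here, on the stage that already dirtied the packet.
     */
    uint16_t nb_prep = rte_eth_tx_prepare(port_id, queue_id, pkts, nb_pkts);

    /* Packets rejected by tx_prepare() (index nb_prep and up) should be dropped. */
    rte_ring_enqueue_burst(qos_ring, (void **)pkts, nb_prep, NULL);
}

/* Stage 2: egress - no per-packet writes, just hand the mbufs to the NIC. */
static void
egress(uint16_t port_id, uint16_t queue_id, struct rte_ring *qos_ring)
{
    struct rte_mbuf *burst[32];
    unsigned int n = rte_ring_dequeue_burst(qos_ring, (void **)burst, 32, NULL);

    /* Unsent mbufs (return value < n) would need re-queueing or freeing. */
    rte_eth_tx_burst(port_id, queue_id, burst, (uint16_t)n);
}

Whether this split is acceptable of course depends on whether tx_prepare() and tx_burst() may run on different stages/threads for a given PMD, which, as noted above, does not seem to be stated explicitly in the docs today.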