From: Morten Brørup
To: Konstantin Ananyev, David Marchand
Cc: Olivier Matz, Jijiang Liu, Andrew Rybchenko, Ferruh Yigit,
 Kaiwen Deng, Aman Singh, Yuying Zhang, Thomas Monjalon, Jerin Jacob
Subject: RE: [PATCH v2 3/8] mbuf: fix Tx checksum offload examples
Date: Tue, 9 Apr 2024 16:44:21 +0200
Message-ID: <98CBD80474FA8B44BF855DF32C47DC35E9F381@smartserver.smartshare.dk>
In-Reply-To: <10b564b42f8d4db387f6302701f24ce3@huawei.com>
References: <20240405125039.897933-1-david.marchand@redhat.com>
 <20240405144604.906695-1-david.marchand@redhat.com>
 <20240405144604.906695-4-david.marchand@redhat.com>
 <98CBD80474FA8B44BF855DF32C47DC35E9F36C@smartserver.smartshare.dk>
 <10b564b42f8d4db387f6302701f24ce3@huawei.com>

> From: Konstantin Ananyev [mailto:konstantin.ananyev@huawei.com]
> Sent: Tuesday, 9 April 2024 15.39
>
> > > From: David Marchand [mailto:david.marchand@redhat.com]
> > > Sent: Friday, 5 April 2024 16.46
> > >
> > > Mandate use of rte_eth_tx_prepare() in the mbuf Tx checksum offload
> > > examples.
> >
> > I strongly disagree with this change!
> >
> > It will cause a huge performance degradation for shaping applications:
> >
> > A packet will be processed and finalized at an output or forwarding
> > pipeline stage, where some other fields might also be written, so
> > zeroing e.g. the out_ip checksum at this stage has low cost (no new
> > cache misses).
> >
> > Then, the packet might be queued for QoS or similar.
> >
> > If rte_eth_tx_prepare() must be called at the egress pipeline stage,
> > it has to write to the packet and cause a cache miss per packet,
> > instead of simply passing the packet on to the NIC hardware.
> >
> > It must be possible to finalize the packet at the output/forwarding
> > pipeline stage!
>
> If you can finalize your packet at the output/forwarding stage, then
> why can't you invoke tx_prepare() at the same stage?
> There seems to be some misunderstanding about what tx_prepare() does -
> in fact, it doesn't communicate with the HW queue (doesn't update the
> TXD ring, etc.); it only makes changes to the mbuf itself.
> Yes, it reads some fields in the SW TX queue struct (max number of TXDs
> per packet, etc.), but AFAIK it is safe to call tx_prepare() and
> tx_burst() from different threads - at least in the implementations I
> am aware of.
> I just checked the docs - it seems this is not stated explicitly
> anywhere; that might be why it causes such misunderstanding.
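
For clarity, the pipeline split I am talking about looks roughly like
this (a sketch only - the function and ring names are invented for
illustration; it assumes the egress port/queue is already known at the
forwarding stage, and - per your point above - that calling tx_prepare()
and tx_burst() from different threads is safe, which the documentation
does not guarantee yet):

#include <rte_ethdev.h>
#include <rte_mbuf.h>
#include <rte_ring.h>

/* Forwarding stage: the packet headers are hot in the cache here, so
 * finalizing the packet for Tx offloads is cheap. tx_prepare() only
 * modifies the mbuf; it does not touch the TXD ring. */
static void
forward_stage(uint16_t port_id, uint16_t queue_id,
		struct rte_ring *shaper_ring, struct rte_mbuf *m)
{
	if (rte_eth_tx_prepare(port_id, queue_id, &m, 1) != 1 ||
			rte_ring_enqueue(shaper_ring, m) != 0)
		rte_pktmbuf_free(m);
}

/* Egress stage, possibly on another lcore: the packet must not be
 * written to here, only passed on to the NIC. */
static void
egress_stage(uint16_t port_id, uint16_t queue_id,
		struct rte_ring *shaper_ring)
{
	struct rte_mbuf *m;

	if (rte_ring_dequeue(shaper_ring, (void **)&m) == 0 &&
			rte_eth_tx_burst(port_id, queue_id, &m, 1) != 1)
		rte_pktmbuf_free(m);
}

This placement is only acceptable if the cross-thread behavior is
guaranteed by the API documentation, which brings me back to
preconditions and postconditions below.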
>
> >
> > Also, how is rte_eth_tx_prepare() supposed to work for cloned packets
> > egressing on different NIC hardware?
>
> If you create a clone of the full packet (including the L2/L3 headers),
> then obviously such a construction might not work properly with
> tx_prepare() over two different NICs.
> Though in the majority of cases you clone the segments holding the
> data, while at least the L2 headers are put into separate segments.
> One simple approach would be to also keep the L3 header in that
> separate segment.
> But yes, there is a problem when you need to send exactly the same
> packet over different NICs.
> As I remember, things don't work quite well here for the bonding PMD -
> you might have a bond over two NICs with different tx_prepare()
> implementations, and which one to call might not be clear until the
> actual PMD tx_burst() is invoked.
>
> >
> > In theory, it might get even worse if we make this opaque instead of
> > transparent and standardized:
> > One PMD might reset the out_ip checksum to 0x0000, and another PMD
> > might reset it to 0xFFFF.
> >
> > I can only see one solution:
> > We need to standardize on common minimum requirements for how to
> > prepare packets for each TX offload.
>
> If we can make each and every vendor agree here - that will definitely
> help simplify things quite a bit.

An API is more than a function name and parameters. It also has
preconditions and postconditions.

All major NIC vendors are contributing to DPDK. It should be possible to
reach consensus on reasonable minimum requirements for offloads.

Hardware- and driver-specific exceptions can be documented with the
offload flag, or with rte_eth_rx/tx_burst(), like this note to
rte_eth_rx_burst():
"Some drivers using vector instructions require that nb_pkts is
divisible by 4 or 8, depending on the driver implementation."

You mention the bonding driver, which is a good example.
The rte_eth_tx_burst() documentation has a note about the API
postcondition exception for the bonding driver:
"This function must not modify mbufs (including packets data) unless the
refcnt is 1. An exception is the bonding PMD, [...], mbufs may be
modified."

> Then we can probably have one common tx_prepare() for all vendors ;)

Yes, that would be the goal.
More realistically, the ethdev layer could perform the common checks,
and only the non-conforming drivers would have to implement their
specific tweaks.

If we don't standardize the meaning of the offload flags, the
application developers cannot trust them!
I'm afraid this is the current situation - application developers either
test with specific NIC hardware, or don't use the offload features.
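
To illustrate with the case this patch is about: a minimal sketch of the
IPv4/TCP checksum preparation, roughly as the mbuf documentation
describes it today (the function name is invented; it assumes m->l2_len
and m->l3_len are already set and that the headers reside in the first
segment):

#include <rte_ip.h>
#include <rte_mbuf.h>
#include <rte_tcp.h>

static void
prepare_ipv4_tcp_cksum(struct rte_mbuf *m)
{
	struct rte_ipv4_hdr *ip = rte_pktmbuf_mtod_offset(m,
			struct rte_ipv4_hdr *, m->l2_len);
	struct rte_tcp_hdr *tcp = rte_pktmbuf_mtod_offset(m,
			struct rte_tcp_hdr *, m->l2_len + m->l3_len);

	m->ol_flags |= RTE_MBUF_F_TX_IPV4 | RTE_MBUF_F_TX_IP_CKSUM |
			RTE_MBUF_F_TX_TCP_CKSUM;

	/* This is the kind of contract that needs standardizing: IP
	 * checksum cleared to 0, L4 checksum set to the pseudo-header
	 * checksum - for all conforming drivers, with any exceptions
	 * documented on the offload flag. */
	ip->hdr_checksum = 0;
	tcp->cksum = rte_ipv4_phdr_cksum(ip, m->ol_flags);
}

If these minimum requirements were guaranteed across vendors, an
application could prepare its packets like this at the output/forwarding
stage and trust the offload flags, regardless of which PMD is behind the
port.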