From: "Morten Brørup" <mb@smartsharesystems.com>
To: "Konstantin Ananyev" <konstantin.ananyev@huawei.com>,
"David Marchand" <david.marchand@redhat.com>, <dev@dpdk.org>
Cc: <thomas@monjalon.net>, <ferruh.yigit@amd.com>, <stable@dpdk.org>,
"Olivier Matz" <olivier.matz@6wind.com>,
"Jijiang Liu" <jijiang.liu@intel.com>,
"Andrew Rybchenko" <andrew.rybchenko@oktetlabs.ru>,
"Ferruh Yigit" <ferruh.yigit@amd.com>,
"Kaiwen Deng" <kaiwenx.deng@intel.com>, <qiming.yang@intel.com>,
<yidingx.zhou@intel.com>,
"Aman Singh" <aman.deep.singh@intel.com>,
"Yuying Zhang" <yuying.zhang@intel.com>,
"Thomas Monjalon" <thomas@monjalon.net>,
"Jerin Jacob" <jerinj@marvell.com>
Subject: RE: [PATCH v2 3/8] mbuf: fix Tx checksum offload examples
Date: Tue, 9 Apr 2024 16:44:21 +0200 [thread overview]
Message-ID: <98CBD80474FA8B44BF855DF32C47DC35E9F381@smartserver.smartshare.dk> (raw)
In-Reply-To: <10b564b42f8d4db387f6302701f24ce3@huawei.com>
> From: Konstantin Ananyev [mailto:konstantin.ananyev@huawei.com]
> Sent: Tuesday, 9 April 2024 15.39
>
> > > From: David Marchand [mailto:david.marchand@redhat.com]
> > > Sent: Friday, 5 April 2024 16.46
> > >
> > > Mandate use of rte_eth_tx_prepare() in the mbuf Tx checksum offload
> > > examples.
> >
> > I strongly disagree with this change!
> >
> > It will cause a huge performance degradation for shaping applications:
> >
> > A packet will be processed and finalized at an output or forwarding
> pipeline stage, where some other fields might also be written, so
> > zeroing e.g. the out_ip checksum at this stage has low cost (no new
> cache misses).
> >
> > Then, the packet might be queued for QoS or similar.
> >
> > If rte_eth_tx_prepare() must be called at the egress pipeline stage,
> it has to write to the packet and cause a cache miss per packet,
> > instead of simply passing on the packet to the NIC hardware.
> >
> > It must be possible to finalize the packet at the output/forwarding
> pipeline stage!
>
> If you can finalize your packet on output/forwarding, then why you
> can't invoke tx_prepare() on the same stage?
> There seems to be some misunderstanding about what tx_prepare() does -
> in fact it doesn't communicate with HW queue (doesn't update TXD ring,
> etc.), what it does - just make changes in mbuf itself.
> Yes, it reads some fields in SW TX queue struct (max number of TXDs per
> packet, etc.), but AFAIK it is safe
> to call tx_prepare() and tx_burst() from different threads.
> At least on implementations I am aware about.
> Just checked the docs - it seems not stated explicitly anywhere, might
> be that's why it causing such misunderstanding.
>
> >
> > Also, how is rte_eth_tx_prepare() supposed to work for cloned packets
> egressing on different NIC hardware?
>
> If you create a clone of full packet (including L2/L3) headers then
> obviously such construction might not
> work properly with tx_prepare() over two different NICs.
> Though In majority of cases you do clone segments with data, while at
> least L2 headers are put into different segments.
> One simple approach would be to keep L3 header in that separate segment.
> But yes, there is a problem when you'll need to send exactly the same
> packet over different NICs.
> As I remember, for bonding PMD things don't work quite well here - you
> might have a bond over 2 NICs with
> different tx_prepare() and which one to call might be not clear till
> actual PMD tx_burst() is invoked.
>
> >
> > In theory, it might get even worse if we make this opaque instead of
> transparent and standardized:
> > One PMD might reset out_ip checksum to 0x0000, and another PMD might
> reset it to 0xFFFF.
>
> >
> > I can only see one solution:
> > We need to standardize on common minimum requirements for how to
> prepare packets for each TX offload.
>
> If we can make each and every vendor to agree here - that definitely
> will help to simplify things quite a bit.
An API is more than a function name and parameters.
It also has preconditions and postconditions.
All major NIC vendors are contributing to DPDK.
It should be possible to reach consensus for reasonable minimum requirements for offloads.
Hardware- and driver-specific exceptions can be documented with the offload flag, or with rte_eth_rx/tx_burst(), like the note to rte_eth_rx_burst():
"Some drivers using vector instructions require that nb_pkts is divisible by 4 or 8, depending on the driver implementation."
You mention the bonding driver, which is a good example.
The rte_eth_tx_burst() documentation has a note about the API postcondition exception for the bonding driver:
"This function must not modify mbufs (including packets data) unless the refcnt is 1. An exception is the bonding PMD, [...], mbufs may be modified."
> Then we can probably have one common tx_prepare() for all vendors ;)
Yes, that would be the goal.
More realistically, the ethdev layer could perform the common checks, and only the non-conforming drivers would have to implement their specific tweaks.
If we don't standardize the meaning of the offload flags, the application developers cannot trust them!
I'm afraid this is the current situation - application developers either test with specific NIC hardware, or don't use the offload features.
next prev parent reply other threads:[~2024-04-09 14:44 UTC|newest]
Thread overview: 30+ messages / expand[flat|nested] mbox.gz Atom feed top
[not found] <20240405125039.897933-1-david.marchand@redhat.com>
2024-04-05 12:49 ` [PATCH " David Marchand
2024-04-05 12:49 ` [PATCH 4/8] app/testpmd: fix outer IP checksum offload David Marchand
2024-04-05 12:49 ` [PATCH 5/8] net: fix outer UDP checksum in Intel prepare helper David Marchand
2024-04-05 12:49 ` [PATCH 6/8] net/i40e: fix outer UDP checksum offload for X710 David Marchand
2024-04-05 12:49 ` [PATCH 7/8] net/iavf: remove outer UDP checksum offload for X710 VF David Marchand
[not found] ` <20240405144604.906695-1-david.marchand@redhat.com>
2024-04-05 14:45 ` [PATCH v2 3/8] mbuf: fix Tx checksum offload examples David Marchand
2024-04-05 16:20 ` Morten Brørup
2024-04-08 10:12 ` David Marchand
2024-04-09 13:38 ` Konstantin Ananyev
2024-04-09 14:44 ` Morten Brørup [this message]
2024-04-10 10:35 ` Konstantin Ananyev
2024-04-10 12:20 ` Morten Brørup
2024-04-12 12:46 ` Konstantin Ananyev
2024-04-12 14:44 ` Morten Brørup
2024-04-12 15:17 ` Konstantin Ananyev
2024-04-12 15:54 ` Morten Brørup
2024-04-16 9:16 ` Konstantin Ananyev
2024-04-16 11:36 ` Konstantin Ananyev
2024-04-15 15:07 ` Ferruh Yigit
2024-04-16 7:14 ` Morten Brørup
2024-04-16 9:26 ` Konstantin Ananyev
2024-04-05 14:45 ` [PATCH v2 4/8] app/testpmd: fix outer IP checksum offload David Marchand
2024-04-05 14:45 ` [PATCH v2 5/8] net: fix outer UDP checksum in Intel prepare helper David Marchand
2024-04-05 14:46 ` [PATCH v2 6/8] net/i40e: fix outer UDP checksum offload for X710 David Marchand
2024-04-05 14:46 ` [PATCH v2 7/8] net/iavf: remove outer UDP checksum offload for X710 VF David Marchand
[not found] ` <20240418082023.1767998-1-david.marchand@redhat.com>
2024-04-18 8:20 ` [PATCH v3 3/7] app/testpmd: fix outer IP checksum offload David Marchand
2024-06-11 18:25 ` Ferruh Yigit
2024-04-18 8:20 ` [PATCH v3 4/7] net: fix outer UDP checksum in Intel prepare helper David Marchand
2024-04-18 8:20 ` [PATCH v3 5/7] net/i40e: fix outer UDP checksum offload for X710 David Marchand
2024-04-18 8:20 ` [PATCH v3 6/7] net/iavf: remove outer UDP checksum offload for X710 VF David Marchand
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=98CBD80474FA8B44BF855DF32C47DC35E9F381@smartserver.smartshare.dk \
--to=mb@smartsharesystems.com \
--cc=aman.deep.singh@intel.com \
--cc=andrew.rybchenko@oktetlabs.ru \
--cc=david.marchand@redhat.com \
--cc=dev@dpdk.org \
--cc=ferruh.yigit@amd.com \
--cc=jerinj@marvell.com \
--cc=jijiang.liu@intel.com \
--cc=kaiwenx.deng@intel.com \
--cc=konstantin.ananyev@huawei.com \
--cc=olivier.matz@6wind.com \
--cc=qiming.yang@intel.com \
--cc=stable@dpdk.org \
--cc=thomas@monjalon.net \
--cc=yidingx.zhou@intel.com \
--cc=yuying.zhang@intel.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).