patches for DPDK stable branches
 help / color / mirror / Atom feed
From: Konstantin Ananyev <konstantin.ananyev@huawei.com>
To: "Morten Brørup" <mb@smartsharesystems.com>,
	"David Marchand" <david.marchand@redhat.com>,
	"dev@dpdk.org" <dev@dpdk.org>
Cc: "thomas@monjalon.net" <thomas@monjalon.net>,
	"ferruh.yigit@amd.com" <ferruh.yigit@amd.com>,
	"stable@dpdk.org" <stable@dpdk.org>,
	Olivier Matz <olivier.matz@6wind.com>,
	Jijiang Liu <jijiang.liu@intel.com>,
	"Andrew Rybchenko" <andrew.rybchenko@oktetlabs.ru>,
	Ferruh Yigit <ferruh.yigit@amd.com>,
	Kaiwen Deng <kaiwenx.deng@intel.com>,
	"qiming.yang@intel.com" <qiming.yang@intel.com>,
	"yidingx.zhou@intel.com" <yidingx.zhou@intel.com>,
	Aman Singh <aman.deep.singh@intel.com>,
	"Yuying Zhang" <yuying.zhang@intel.com>,
	Thomas Monjalon <thomas@monjalon.net>,
	"Jerin Jacob" <jerinj@marvell.com>
Subject: RE: [PATCH v2 3/8] mbuf: fix Tx checksum offload examples
Date: Wed, 10 Apr 2024 10:35:07 +0000	[thread overview]
Message-ID: <409157f5da3e4c628ca678dd9e2c7957@huawei.com> (raw)
In-Reply-To: <98CBD80474FA8B44BF855DF32C47DC35E9F381@smartserver.smartshare.dk>



> > From: Konstantin Ananyev [mailto:konstantin.ananyev@huawei.com]
> > Sent: Tuesday, 9 April 2024 15.39
> >
> > > > From: David Marchand [mailto:david.marchand@redhat.com]
> > > > Sent: Friday, 5 April 2024 16.46
> > > >
> > > > Mandate use of rte_eth_tx_prepare() in the mbuf Tx checksum offload
> > > > examples.
> > >
> > > I strongly disagree with this change!
> > >
> > > It will cause a huge performance degradation for shaping applications:
> > >
> > > A packet will be processed and finalized at an output or forwarding
> > pipeline stage, where some other fields might also be written, so
> > > zeroing e.g. the out_ip checksum at this stage has low cost (no new
> > cache misses).
> > >
> > > Then, the packet might be queued for QoS or similar.
> > >
> > > If rte_eth_tx_prepare() must be called at the egress pipeline stage,
> > it has to write to the packet and cause a cache miss per packet,
> > > instead of simply passing on the packet to the NIC hardware.
> > >
> > > It must be possible to finalize the packet at the output/forwarding
> > pipeline stage!
> >
> > If you can finalize your packet on  output/forwarding, then why you
> > can't invoke tx_prepare() on the same stage?
> > There seems to be some misunderstanding about what tx_prepare() does -
> > in fact it doesn't communicate with HW queue (doesn't update TXD ring,
> > etc.), what it does - just make changes in mbuf itself.
> > Yes, it reads some fields in SW TX queue struct (max number of TXDs per
> > packet, etc.), but AFAIK it is safe
> > to call tx_prepare() and tx_burst() from different threads.
> > At least on implementations I am aware about.
> > Just checked the docs - it seems not stated explicitly anywhere, might
> > be that's why it causing such misunderstanding.
> >
> > >
> > > Also, how is rte_eth_tx_prepare() supposed to work for cloned packets
> > egressing on different NIC hardware?
> >
> > If you create a clone of full packet (including L2/L3) headers then
> > obviously such construction might not
> > work properly with tx_prepare() over two different NICs.
> > Though In majority of cases you do clone segments with data, while at
> > least L2 headers are put into different segments.
> > One simple approach would be to keep L3 header in that separate segment.
> > But yes, there is a problem when you'll need to send exactly the same
> > packet over different NICs.
> > As I remember, for bonding PMD things don't work quite well here - you
> > might have a bond over 2 NICs with
> > different tx_prepare() and which one to call might be not clear till
> > actual PMD tx_burst() is invoked.
> >
> > >
> > > In theory, it might get even worse if we make this opaque instead of
> > transparent and standardized:
> > > One PMD might reset out_ip checksum to 0x0000, and another PMD might
> > reset it to 0xFFFF.
> >
> > >
> > > I can only see one solution:
> > > We need to standardize on common minimum requirements for how to
> > prepare packets for each TX offload.
> >
> > If we can make each and every vendor to agree here - that definitely
> > will help to simplify things quite a bit.
> 
> An API is more than a function name and parameters.
> It also has preconditions and postconditions.
> 
> All major NIC vendors are contributing to DPDK.
> It should be possible to reach consensus for reasonable minimum requirements for offloads.
> Hardware- and driver-specific exceptions can be documented with the offload flag, or with rte_eth_rx/tx_burst(), like the note to
> rte_eth_rx_burst():
> "Some drivers using vector instructions require that nb_pkts is divisible by 4 or 8, depending on the driver implementation."

If we introduce a rule that everyone supposed to follow and then straightway allow people to have a 'documented exceptions',
for me it means like 'no rule' in practice.
A 'documented exceptions' approach might work if you have 5 different PMDs to support, but not when you have 50+.
No-one would write an app with possible 10 different exception cases in his head.
Again, with such approach we can forget about backward compatibility.
I think we already had this discussion before, my opinion remains the same here -
'documented exceptions' approach is a way to trouble.    

> You mention the bonding driver, which is a good example.
> The rte_eth_tx_burst() documentation has a note about the API postcondition exception for the bonding driver:
> "This function must not modify mbufs (including packets data) unless the refcnt is 1. An exception is the bonding PMD, [...], mbufs
> may be modified."

For me, what we've done for bonding tx_prepare/tx_burst() is a really bad example.
Initial agreement and design choice was that tx_burst() should not modify contents of the packets
(that actually was one of the reasons why tx_prepare() was introduced).
The only reason I agreed on that exception - because I couldn't come-up with something less uglier.

Actually, these problems with bonding PMD made me to start thinking that current
tx_prepare/tx_burst approach might need to be reconsidered somehow. 

> > Then we can probably have one common tx_prepare() for all vendors ;)
> 
> Yes, that would be the goal.
> More realistically, the ethdev layer could perform the common checks, and only the non-conforming drivers would have to implement
> their specific tweaks.
 
Hmm,  but that's what we have right now: 
- fields in mbuf and packet data that user has to fill correctly and dev specific tx_prepare(). 
How what you suggest will differ then? 
And how it will help let say with bonding PMD situation, or with TX-ing of the same packet over 2 different NICs?

> If we don't standardize the meaning of the offload flags, the application developers cannot trust them!
> I'm afraid this is the current situation - application developers either test with specific NIC hardware, or don't use the offload features.

Well, I have used TX offloads through several projects, it worked quite well.
Though have to admit, never have to use TX offloads together with our bonding PMD. 



  reply	other threads:[~2024-04-10 10:35 UTC|newest]

Thread overview: 30+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
     [not found] <20240405125039.897933-1-david.marchand@redhat.com>
2024-04-05 12:49 ` [PATCH " David Marchand
2024-04-05 12:49 ` [PATCH 4/8] app/testpmd: fix outer IP checksum offload David Marchand
2024-04-05 12:49 ` [PATCH 5/8] net: fix outer UDP checksum in Intel prepare helper David Marchand
2024-04-05 12:49 ` [PATCH 6/8] net/i40e: fix outer UDP checksum offload for X710 David Marchand
2024-04-05 12:49 ` [PATCH 7/8] net/iavf: remove outer UDP checksum offload for X710 VF David Marchand
     [not found] ` <20240405144604.906695-1-david.marchand@redhat.com>
2024-04-05 14:45   ` [PATCH v2 3/8] mbuf: fix Tx checksum offload examples David Marchand
2024-04-05 16:20     ` Morten Brørup
2024-04-08 10:12       ` David Marchand
2024-04-09 13:38       ` Konstantin Ananyev
2024-04-09 14:44         ` Morten Brørup
2024-04-10 10:35           ` Konstantin Ananyev [this message]
2024-04-10 12:20             ` Morten Brørup
2024-04-12 12:46               ` Konstantin Ananyev
2024-04-12 14:44                 ` Morten Brørup
2024-04-12 15:17                   ` Konstantin Ananyev
2024-04-12 15:54                     ` Morten Brørup
2024-04-16  9:16                       ` Konstantin Ananyev
2024-04-16 11:36                         ` Konstantin Ananyev
2024-04-15 15:07                   ` Ferruh Yigit
2024-04-16  7:14                     ` Morten Brørup
2024-04-16  9:26                       ` Konstantin Ananyev
2024-04-05 14:45   ` [PATCH v2 4/8] app/testpmd: fix outer IP checksum offload David Marchand
2024-04-05 14:45   ` [PATCH v2 5/8] net: fix outer UDP checksum in Intel prepare helper David Marchand
2024-04-05 14:46   ` [PATCH v2 6/8] net/i40e: fix outer UDP checksum offload for X710 David Marchand
2024-04-05 14:46   ` [PATCH v2 7/8] net/iavf: remove outer UDP checksum offload for X710 VF David Marchand
     [not found] ` <20240418082023.1767998-1-david.marchand@redhat.com>
2024-04-18  8:20   ` [PATCH v3 3/7] app/testpmd: fix outer IP checksum offload David Marchand
2024-06-11 18:25     ` Ferruh Yigit
2024-04-18  8:20   ` [PATCH v3 4/7] net: fix outer UDP checksum in Intel prepare helper David Marchand
2024-04-18  8:20   ` [PATCH v3 5/7] net/i40e: fix outer UDP checksum offload for X710 David Marchand
2024-04-18  8:20   ` [PATCH v3 6/7] net/iavf: remove outer UDP checksum offload for X710 VF David Marchand

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=409157f5da3e4c628ca678dd9e2c7957@huawei.com \
    --to=konstantin.ananyev@huawei.com \
    --cc=aman.deep.singh@intel.com \
    --cc=andrew.rybchenko@oktetlabs.ru \
    --cc=david.marchand@redhat.com \
    --cc=dev@dpdk.org \
    --cc=ferruh.yigit@amd.com \
    --cc=jerinj@marvell.com \
    --cc=jijiang.liu@intel.com \
    --cc=kaiwenx.deng@intel.com \
    --cc=mb@smartsharesystems.com \
    --cc=olivier.matz@6wind.com \
    --cc=qiming.yang@intel.com \
    --cc=stable@dpdk.org \
    --cc=thomas@monjalon.net \
    --cc=yidingx.zhou@intel.com \
    --cc=yuying.zhang@intel.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).