DPDK patches and discussions
From: "Morten Brørup" <mb@smartsharesystems.com>
To: "Konstantin Ananyev" <konstantin.ananyev@huawei.com>,
	"Ferruh Yigit" <ferruh.yigit@amd.com>,
	"Kaiwen Deng" <kaiwenx.deng@intel.com>, <dev@dpdk.org>
Cc: <stable@dpdk.org>, <qiming.yang@intel.com>,
	<yidingx.zhou@intel.com>,
	"Aman Singh" <aman.deep.singh@intel.com>,
	"Yuying Zhang" <yuying.zhang@intel.com>,
	"David Marchand" <david.marchand@redhat.com>,
	"Thomas Monjalon" <thomas@monjalon.net>,
	"Andrew Rybchenko" <andrew.rybchenko@oktetlabs.ru>,
	"Jerin Jacob" <jerinj@marvell.com>
Subject: RE: [PATCH v2] app/testpmd: use Tx preparation in txonly engine
Date: Tue, 13 Feb 2024 11:27:21 +0100
Message-ID: <98CBD80474FA8B44BF855DF32C47DC35E9F20C@smartserver.smartshare.dk>
In-Reply-To: <918593c56c5745a285facc47b6cdc76b@huawei.com>

+CC: Ethernet API maintainers
+CC: Jerin (commented on another branch of this thread)

> From: Konstantin Ananyev [mailto:konstantin.ananyev@huawei.com]
> Sent: Sunday, 11 February 2024 16.04
> 
> > > > > TSO breaks when MSS spans more than 8 data fragments. Those
> > > > > packets will be dropped by Tx preparation API, but it will
> > > > > cause MDD event if txonly forwarding engine does not call the
> > > > > Tx preparation API before transmitting packets.
> > > > >
> > > >
> > > > txonly is used commonly, adding Tx prepare for a specific case
> > > > may impact performance for users.
> > > >
> > > > What happens when driver throws MDD (Malicious Driver Detection)
> > > > event, can't it be ignored? As you are already OK to drop the
> > > > packet, can device be configured to drop these packets?
> > > >
> > > >
> > > > Or as Jerin suggested adding a new forwarding engine is a
> > > > solution, but that will create code duplication, I prefer to not
> > > > have it if this can be handled in device level.
> > >
> > > Actually I agree with the author of the patch - when TX offloads
> > > and/or multisegs are enabled,
> > > the user is supposed to invoke rte_eth_tx_prepare().
> > > Not doing that seems like a bug to me.
> >
> > I strongly disagree with that statement, Konstantin!
> > It is not documented anywhere that using TX offloads and/or multisegs
> > requires calling rte_eth_tx_prepare() before rte_eth_tx_burst().
> > And none of the examples do it.
> 
> In fact, we do use it for test-pmd/csumonly.c.
> About other sample apps:
> AFAIK, not many other DPDK apps use L4 offloads.
> Right now special treatment (pseudo-header cksum calculation) is needed
> only for L4 offloads (CKSUM, SEG).
> So, the majority of our apps that rely on other TX offloads (multi-seg,
> ipv4 cksum, vlan insertion) happily run without
> calling tx_prepare(), even though it is not the safest way.
> 
> >
> > In my opinion:
> > If some driver has limitations for a feature, e.g. max 8 fragments,
> > it should be documented for that driver, so the application
> > developer can make the appropriate decisions when designing the
> > application.
> > Furthermore, we have APIs for the drivers to expose to the
> > applications what the driver supports, so the application can configure
> > itself optimally at startup. Perhaps those APIs need to be expanded.
> > And if a feature limitation is common across the majority of drivers,
> > that limitation should be mentioned in the documentation of the
> > feature itself.
> 
> Many such limitations *are* documented, and in fact we do have an API
> to check the max segments that each driver supports,
> see struct rte_eth_desc_lim.

Yes, this is the kind of API we should provide, so the application can configure itself appropriately.
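
As an illustration of such a startup check, here is a minimal sketch in plain C. It does not call DPDK: struct seg_limits is a hypothetical stand-in for the nb_seg_max and nb_mtu_seg_max fields of struct rte_eth_desc_lim, and struct tx_plan and plan_fits_limits() are names invented for this example. Treating a limit of 0 as "not reported" is also an assumption of the sketch.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for two fields of DPDK's struct rte_eth_desc_lim. */
struct seg_limits {
	uint16_t nb_seg_max;     /* max segments in a whole packet */
	uint16_t nb_mtu_seg_max; /* max segments per non-TSO packet, or per MSS in a TSO packet */
};

/* Hypothetical description of the packets the application intends to send. */
struct tx_plan {
	uint16_t segs_per_pkt; /* mbuf segments in each packet */
	uint16_t segs_per_mss; /* worst-case segments spanned by one MSS; 0 if TSO is unused */
};

/* Return true if the planned packet layout stays within the reported limits.
 * Assumption of this sketch: a limit of 0 means "not reported" and is skipped. */
static bool plan_fits_limits(const struct seg_limits *lim,
			     const struct tx_plan *plan)
{
	/* Without TSO the whole packet counts against the per-MTU limit. */
	uint16_t per_mtu = plan->segs_per_mss != 0 ?
			   plan->segs_per_mss : plan->segs_per_pkt;

	if (lim->nb_seg_max != 0 && plan->segs_per_pkt > lim->nb_seg_max)
		return false;
	if (lim->nb_mtu_seg_max != 0 && per_mtu > lim->nb_mtu_seg_max)
		return false;
	return true;
}
```

In a real application the limits would presumably be copied from the tx_desc_lim member of struct rte_eth_dev_info once per port at startup, so nothing is checked in the fast path.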

> The problem is:
> - none of our sample apps properly checks these values, so users
> don't have a good example of how to do it.

Agreed.
Adding an example showing how to do it properly would be the best solution.
Calling tx_prepare() in the examples is certainly not the solution.

> - with current DPDK API not all of HW/PMD requirements could be
> extracted programmatically:
>   let's say the majority of Intel PMDs for TCP offloads expect the
> pseudo-header cksum to be pre-calculated by the SW.

I hope this requirement is documented somewhere.

>   another example, some HW expects pkt_len to be bigger than some
> threshold value, otherwise a HW hang may occur.

I hope this requirement is also documented somewhere.

Generally, if the requirements cannot be extracted programmatically, they must be prominently documented, like this note to rte_eth_rx_burst():

 * @note
 *   Some drivers using vector instructions require that *nb_pkts* is
 *   divisible by 4 or 8, depending on the driver implementation.
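
An application can satisfy a documented requirement like this one without any per-packet check. The following is a minimal sketch; burst_align_down() is a hypothetical helper invented for this example, and it assumes the alignment is a power of two such as 4 or 8.

```c
#include <stdint.h>

/* Round nb_pkts down to a multiple of align.
 * Assumption: align is a non-zero power of two (e.g. 4 or 8). */
static inline uint16_t burst_align_down(uint16_t nb_pkts, uint16_t align)
{
	return (uint16_t)(nb_pkts & ~(uint16_t)(align - 1));
}
```

Computing the burst size once at startup, for instance with an alignment of 8 so that it suits both cases named in the note, keeps the adjustment out of the fast path.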

> - As new HW and PMDs keep appearing it is hard to predict what extra
> limitations/requirements will arise,
>   that's why tx_prepare() was introduced as a driver op.
> 
> >
> > We don't want to check in the fast path what can be checked at
> > startup or build time!
> 
> If your app is supposed to work with just a few NIC models, known in
> advance, then sure, you can do that.
> For apps that are supposed to work 'in general' with any possible PMD
> that DPDK supports - that might be a problem.
> That's why tx_prepare() was introduced, and apps that use TX offloads
> are strongly recommended to use it.
> Probably tx_prepare() is not the best possible approach, but right now
> there are not many alternatives within DPDK.

What exactly is an application supposed to do if tx_prepare() doesn't accept the full burst? It doesn't return information about what is wrong. Dropping the packets might not be an option, e.g. for applications used in life support or tele-medicine.

If limitations are documented, an application can use the lowest common denominator of the NICs it supports. And if the application is supposed to work in general, that becomes the lowest common denominator of all NICs.
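
The lowest common denominator can be computed once from the documented or reported per-NIC limits. A minimal sketch, assuming the limits have already been collected into an array; lowest_common_limit() is a hypothetical helper, and treating 0 as "no limit reported" is an assumption of this example.

```c
#include <stddef.h>
#include <stdint.h>

/* Return the smallest non-zero limit among n values, or 0 if none is reported.
 * Assumption of this sketch: 0 means the NIC reports no limit. */
static uint16_t lowest_common_limit(const uint16_t *limits, size_t n)
{
	uint16_t min = 0;

	for (size_t i = 0; i < n; i++) {
		if (limits[i] == 0)
			continue; /* no limit reported for this NIC */
		if (min == 0 || limits[i] < min)
			min = limits[i];
	}
	return min;
}
```

For example, if the supported NICs allow 40, 8 and 16 segments per packet, the application would build its packets with at most 8 segments.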

It looks like tx_prepare() has become a horrible workaround for undocumented limitations.

Limitations due to hardware and/or software tradeoffs are unavoidable, so we have to live with them; but we should not accept PMDs with undocumented limitations.

> 
> >
> > > If it still works for some cases, that's a lucky coincidence, but
> > > not the expected behavior.
> > > About performance - first we can check whether there really is a drop.
> > > Also, as I remember, most drivers set it to a non-NULL value only when
> > > some TX offloads were enabled by the user on that port, so hopefully
> > > for the simple case (one segment, no tx offloads) it
> > > should be negligible.
> > > Again, we can add a manual check in testpmd tx-only code to decide
> > > whether TX prepare needs to be called or not.
> > > Konstantin
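
The manual check mentioned in the quoted text could look roughly like the sketch below. It is plain C rather than testpmd code: tx_prepare_needed() is a hypothetical helper, tx_offloads stands for whatever bitmask of enabled Tx offloads the application configured on the port, and the rule itself (any offload enabled, or more than one segment per packet) is an assumption taken from the discussion above, not a documented DPDK requirement.

```c
#include <stdbool.h>
#include <stdint.h>

/* Decide once, at configuration time, whether the Tx path should run a
 * preparation step before transmitting.
 * Assumption of this sketch: preparation is only relevant when some Tx
 * offload is enabled or packets are multi-segment. */
static bool tx_prepare_needed(uint64_t tx_offloads, uint16_t segs_per_pkt)
{
	return tx_offloads != 0 || segs_per_pkt > 1;
}
```

The result would be stored per port or per stream when the configuration changes, so the simple case (one segment, no Tx offloads) pays only for testing a cached flag.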

Thread overview: 21+ messages
2024-01-03  1:29 [PATCH v1] " Kaiwen Deng
2024-01-04  1:03 ` Stephen Hemminger
2024-01-04  5:52 ` Jerin Jacob
2024-01-11  5:25 ` [PATCH v2] " Kaiwen Deng
2024-01-11  6:34   ` lihuisong (C)
2024-01-11 16:57   ` Stephen Hemminger
2024-01-12 16:00   ` David Marchand
2024-02-08  0:07   ` Ferruh Yigit
2024-02-08 10:50     ` Konstantin Ananyev
2024-02-08 11:35       ` Ferruh Yigit
2024-02-08 15:14         ` Konstantin Ananyev
2024-02-08 11:52       ` Morten Brørup
2024-02-11 15:04         ` Konstantin Ananyev
2024-02-13 10:27           ` Morten Brørup [this message]
2024-02-22 18:28             ` Konstantin Ananyev
2024-02-23  8:36               ` Andrew Rybchenko
2024-02-26 13:26                 ` Konstantin Ananyev
2024-02-26 13:56                   ` Morten Brørup
2024-02-27 10:41                     ` Konstantin Ananyev
2024-02-08 12:09     ` Jerin Jacob
2024-02-09 19:18       ` Ferruh Yigit
