From: Konstantin Ananyev <konstantin.ananyev@huawei.com>
To: "Andrew Rybchenko" <andrew.rybchenko@oktetlabs.ru>,
"Morten Brørup" <mb@smartsharesystems.com>,
"Ferruh Yigit" <ferruh.yigit@amd.com>,
"Kaiwen Deng" <kaiwenx.deng@intel.com>,
"dev@dpdk.org" <dev@dpdk.org>
Cc: "stable@dpdk.org" <stable@dpdk.org>,
"qiming.yang@intel.com" <qiming.yang@intel.com>,
"yidingx.zhou@intel.com" <yidingx.zhou@intel.com>,
Aman Singh <aman.deep.singh@intel.com>,
Yuying Zhang <yuying.zhang@intel.com>,
David Marchand <david.marchand@redhat.com>,
"Thomas Monjalon" <thomas@monjalon.net>,
Jerin Jacob <jerinj@marvell.com>
Subject: RE: [PATCH v2] app/testpmd: use Tx preparation in txonly engine
Date: Mon, 26 Feb 2024 13:26:12 +0000 [thread overview]
Message-ID: <2b264320ddd5499b8e3899708a44cab6@huawei.com> (raw)
In-Reply-To: <545e4c08-f4cf-4246-9ed3-22882c754640@oktetlabs.ru>
> >>>>>>> TSO breaks when MSS spans more than 8 data fragments. Those
> >>>>>>> packets will be dropped by Tx preparation API, but it will
> >>> cause
> >>>>>>> MDD event if txonly forwarding engine does not call the Tx
> >>>>> preparation
> >>>>>>> API before transmitting packets.
> >>>>>>>
> >>>>>>
> >>>>>> txonly is used commonly, adding Tx prepare for a specific case
> >>> may
> >>>>>> impact performance for users.
> >>>>>>
> >>>>>> What happens when driver throws MDD (Malicious Driver Detection)
> >>>>> event,
> >>>>>> can't it be ignored? As you are already OK to drop the packet,
> >>> can
> >>>>>> device be configured to drop these packages?
> >>>>>>
> >>>>>>
> >>>>>> Or as Jerin suggested adding a new forwarding engine is a
> >>> solution,
> >>>>> but
> >>>>>> that will create code duplication, I prefer to not have it if
> >>> this
> >>>>> can
> >>>>>> be handled in device level.
> >>>>>
> >>>>> Actually I am agree with the author of the patch - when TX offloads
> >>>>> and/or multisegs are enabled,
> >>>>> user supposed to invoke eth_tx_prepare().
> >>>>> Not doing that seems like a bug to me.
> >>>>
> >>>> I strongly disagree with that statement, Konstantin!
> >>>> It is not documented anywhere that using TX offloads and/or multisegs
> >>> requires calling rte_eth_tx_prepare() before
> >>>> rte_eth_tx_burst(). And none of the examples do it.
> >>>
> >>> In fact, we do use it for test-pmd/csumonly.c.
> >>> About other sample apps:
> >>> AFAIK, not many of other DPDK apps do use L4 offloads.
> >>> Right now special treatment (pseudo-header cksum calculation) is needed
> >>> only for L4 offloads (CKSUM, SEG).
> >>> So, majority of our apps who rely on other TX offloads (multi-seg, ipv4
> >>> cksum, vlan insertion) happily run without
> >>> calling tx_prepare(), even though it is not the safest way.
> >>>
> >>>>
> >>>> In my opinion:
> >>>> If some driver has limitations for a feature, e.g. max 8 fragments,
> >>> it should be documented for that driver, so the application
> >>>> developer can make the appropriate decisions when designing the
> >>> application.
> >>>> Furthermore, we have APIs for the drivers to expose to the
> >>> applications what the driver supports, so the application can configure
> >>>> itself optimally at startup. Perhaps those APIs need to be expanded.
> >>>> And if a feature limitation is common across the majority of drivers,
> >>> that limitation should be mentioned in the documentation of the
> >>>> feature itself.
> >>>
> >>> Many of such limitations *are* documented and in fact we do have an API
> >>> to check max segments that each driver support,
> >>> see struct rte_eth_desc_lim.
> >>
> >> Yes, this is the kind of API we should provide, so the application can configure itself appropriately.
> >>
> >>> The problem is:
> >>> - none of our sample app does proper check on these values, so users
> >>> don't have a good example how to do it.
> >>
> >> Agreed.
> >> Adding an example showing how to do it properly would be the best solution.
> >> Calling tx_prepare() in the examples is certainly not the solution.
> >>
> >>> - with current DPDK API not all of HW/PMD requirements could be
> >>> extracted programmatically:
> >>> let say majority of Intel PMDs for TCP offloads expect pseudo-header
> >>> cksum to be pre-calculated by the SW.
> >>
> >> I hope this requirement is documented somewhere.
> >>
> >>> another example, some HW expects pkt_len to be bigger then some
> >>> threshold value, otherwise HW hang may appear.
> >>
> >> I hope this requirement is also documented somewhere.
> >
> > No idea, I found it only in the code.
>
> IMHO Tx burst must check such limitations. If you made your HW simpler
> (or just lost it on initial testing), pay in your drivers (or your
> HW+driver will be unusable because of such problems).
>
> >> Generally, if the requirements cannot be extracted programmatically, they must be prominently documented, like this note to
> >> rte_eth_rx_burst():
> >
> > Obviously, more detailed documentation is always good, but...
> > Right now we have 50+ different PMDs from different vendors.
> > Even if each and every of them will carefully document all possible limitations and necessary preparation steps,
> > how DPDK app developer supposed to deal with all that?
> > Do you expect everyone, to read carefully through all of them, and handle all of them properly oh his own
> > in each and every DPDK app he is going to write?
> > That seems unrealistic.
> > Again what to do with backward compatibility: when new driver (with new limitations) will arise
> > *after* your app is already written and tested?
>
> +1
>
> >>
> >> * @note
> >> * Some drivers using vector instructions require that *nb_pkts* is
> >> * divisible by 4 or 8, depending on the driver implementation.
>
> I'm wondering what application should do if it needs to send just one
> packet and do it now. IMHO, such limitations are not acceptable.
>
> >>
> >>> - As new HW and PMD keep appearing it is hard to predict what extra
> >>> limitations/requirements will arise,
> >>> that's why tx_prepare() was introduced as s driver op.
> >>>
> >>>>
> >>>> We don't want to check in the fast path what can be checked at
> >>> startup or build time!
> >>>
> >>> If your app supposed to work with just a few, known in advance, NIC
> >>> models, then sure, you can do that.
> >>> For apps that supposed to work 'in general' with any possible PMDs
> >>> that DPDK supports - that might be a problem.
> >>> That's why tx_prepare() was introduced and it is strongly recommended
> >>> to use it by the apps that do use TX offloads.
> >>> Probably tx_prepare() is not the best possible approach, but right now
> >>> there are not many alternatives within DPDK.
> >>
> >> What exactly is an application supposed to do if tx_prepare() doesn't accept the full burst? It doesn't return information about
> what is
> >> wrong.
> >
> > It provides some information, but it is *very* limited: just index of 'bad' packet and error code.
> > In theory app can try to handle it in a 'smart' way: let say if ENOTSUP is returned, then try to disable all HW offloads
> > and do all in SW. But again, it is much better to do so *before* submitting packets for TX, so in practice everyone
> > just drop such 'bad' packets.
> >
> > Dropping the packets might not be an option, e.g. for applications used in life support or tele-medicine.
>
> Critical applications should be able to do all Tx offloads in SW and
> retry. Of course, various statistics and logs should help to improve the
> application.
>
> > If the packet is 'bad', then it is much better to drop it, then TX corrupted packet or even hang NIC HW completely.
>
> IMHO Tx burst must drop packet which could hang NIC HW completely. I
> realize that it is an extra checks and performance drop, but vendor
> should pay in performance if HW is not good enough.
>
> > Though off-course it is much better to have an app that would check for limitations that can be checked by API provided
> > and enable only supported offloads.
>
> Yes, that's why API to get limitation is much better than documentation.
>
> >> If limitations are documented, an application can use the lowest common denominator of the NICs it supports. And if the
> application is
> >> supposed to work in general, that becomes the lowest common denominator of all NICs.
> >
> > I agree: for limitations that can be extracted with generic API, like:
> > number of segments per packet, supported TX offloads, mbuf fileds that must be provided for each TX offload, etc. -
> > it is responsibility of well-written application to obey all of them.
> > Yes, many tx_prepare() implementations do such checks anyway, but from my perspective it is sort of last-line of defense.
> > For well written application that should just never happen.
> > But there is one more important responsibility of tx_prepare() - it performs PMD specific packet modifications for requested
> offloads.
> > As I already mentioned for Intel NICs - it does pseudo-header cksum calcucation, for packets with size less then minimal, it can
> > probably do padding (even if doesn't do it right now), for some other PMDs - might be something else, I didn't check.
> > Obviously it saves app developer from a burden to do all these things on his own.
> >
> >> It looks like tx_prepare() has become a horrible workaround for undocumented limitations.
>
> I strongly disagree. Documentation is never a solution for a generic
> application which is intended to work on any HW and IMHO it is the
> goal to have more and more applications which work on any HW.
>
> >> Limitations due to hardware and/or software tradeoffs are unavoidable, so we have to live with them; but we should not accept
> >> PMDs with undocumented limitations.
> >
> > As I already said, more documentation never hurts, but for that case, I think it is not enough.
> > I expect PMD to provide a tx_prepare() implementation that would deal with such specific things.
>
> +1
> > Anyway, back to the original patch - I looked at it once again, and realized that the problem
> > is just in the unsupported number of segments.
> > As we discussed above - such limitations should be handled by well written app,
> > but none of DPDK apps does it right now.
> > So probably it is good opportunity to do things in a proper way and introduce such checks
> > in testpmd ;)
>
> +1 since we already have fields in device information to report such
> limitations, but it does not say that Tx prepare should be dropped.
> Drivers which don't need Tx prepare keep it NULL and it returns
> immediately from ethdev. Since it is done per-burst, it should not
> affect performance a lot.
>
100% agree. I think tx_prepare needs to stay, and in general has to be strongly recommended
for apps that do use TX offloads/multi-segs.
next prev parent reply other threads:[~2024-02-26 13:26 UTC|newest]
Thread overview: 21+ messages / expand[flat|nested] mbox.gz Atom feed top
2024-01-03 1:29 [PATCH v1] " Kaiwen Deng
2024-01-04 1:03 ` Stephen Hemminger
2024-01-04 5:52 ` Jerin Jacob
2024-01-11 5:25 ` [PATCH v2] " Kaiwen Deng
2024-01-11 6:34 ` lihuisong (C)
2024-01-11 16:57 ` Stephen Hemminger
2024-01-12 16:00 ` David Marchand
2024-02-08 0:07 ` Ferruh Yigit
2024-02-08 10:50 ` Konstantin Ananyev
2024-02-08 11:35 ` Ferruh Yigit
2024-02-08 15:14 ` Konstantin Ananyev
2024-02-08 11:52 ` Morten Brørup
2024-02-11 15:04 ` Konstantin Ananyev
2024-02-13 10:27 ` Morten Brørup
2024-02-22 18:28 ` Konstantin Ananyev
2024-02-23 8:36 ` Andrew Rybchenko
2024-02-26 13:26 ` Konstantin Ananyev [this message]
2024-02-26 13:56 ` Morten Brørup
2024-02-27 10:41 ` Konstantin Ananyev
2024-02-08 12:09 ` Jerin Jacob
2024-02-09 19:18 ` Ferruh Yigit
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=2b264320ddd5499b8e3899708a44cab6@huawei.com \
--to=konstantin.ananyev@huawei.com \
--cc=aman.deep.singh@intel.com \
--cc=andrew.rybchenko@oktetlabs.ru \
--cc=david.marchand@redhat.com \
--cc=dev@dpdk.org \
--cc=ferruh.yigit@amd.com \
--cc=jerinj@marvell.com \
--cc=kaiwenx.deng@intel.com \
--cc=mb@smartsharesystems.com \
--cc=qiming.yang@intel.com \
--cc=stable@dpdk.org \
--cc=thomas@monjalon.net \
--cc=yidingx.zhou@intel.com \
--cc=yuying.zhang@intel.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).