From: Konstantin Ananyev <konstantin.ananyev@huawei.com>
To: "Morten Brørup" <mb@smartsharesystems.com>,
"Andrew Rybchenko" <andrew.rybchenko@oktetlabs.ru>,
"Ferruh Yigit" <ferruh.yigit@amd.com>,
"Kaiwen Deng" <kaiwenx.deng@intel.com>,
"dev@dpdk.org" <dev@dpdk.org>
Cc: "stable@dpdk.org" <stable@dpdk.org>,
"qiming.yang@intel.com" <qiming.yang@intel.com>,
"yidingx.zhou@intel.com" <yidingx.zhou@intel.com>,
Aman Singh <aman.deep.singh@intel.com>,
Yuying Zhang <yuying.zhang@intel.com>,
David Marchand <david.marchand@redhat.com>,
"Thomas Monjalon" <thomas@monjalon.net>,
Jerin Jacob <jerinj@marvell.com>
Subject: RE: [PATCH v2] app/testpmd: use Tx preparation in txonly engine
Date: Tue, 27 Feb 2024 10:41:08 +0000 [thread overview]
Message-ID: <b833e5bc79b14129ae69374dfed08ca0@huawei.com> (raw)
In-Reply-To: <98CBD80474FA8B44BF855DF32C47DC35E9F268@smartserver.smartshare.dk>
> Subject: RE: [PATCH v2] app/testpmd: use Tx preparation in txonly engine
>
> > > >>>>>>> TSO breaks when MSS spans more than 8 data fragments. Those
> > > >>>>>>> packets will be dropped by Tx preparation API, but it will
> > > >>> cause
> > > >>>>>>> MDD event if txonly forwarding engine does not call the Tx
> > > >>>>> preparation
> > > >>>>>>> API before transmitting packets.
> > > >>>>>>>
> > > >>>>>>
> > > >>>>>> txonly is used commonly, adding Tx prepare for a specific case
> > > >>> may
> > > >>>>>> impact performance for users.
> > > >>>>>>
> > > >>>>>> What happens when driver throws MDD (Malicious Driver
> > Detection)
> > > >>>>> event,
> > > >>>>>> can't it be ignored? As you are already OK to drop the packet,
> > > >>> can
> > > >>>>>> device be configured to drop these packages?
> > > >>>>>>
> > > >>>>>>
> > > >>>>>> Or as Jerin suggested adding a new forwarding engine is a
> > > >>> solution,
> > > >>>>> but
> > > >>>>>> that will create code duplication, I prefer to not have it if
> > > >>> this
> > > >>>>> can
> > > >>>>>> be handled in device level.
> > > >>>>>
> > > >>>>> Actually I am agree with the author of the patch - when TX
> > offloads
> > > >>>>> and/or multisegs are enabled,
> > > >>>>> user supposed to invoke eth_tx_prepare().
> > > >>>>> Not doing that seems like a bug to me.
> > > >>>>
> > > >>>> I strongly disagree with that statement, Konstantin!
> > > >>>> It is not documented anywhere that using TX offloads and/or
> > multisegs
> > > >>> requires calling rte_eth_tx_prepare() before
> > > >>>> rte_eth_tx_burst(). And none of the examples do it.
> > > >>>
> > > >>> In fact, we do use it for test-pmd/csumonly.c.
> > > >>> About other sample apps:
> > > >>> AFAIK, not many of other DPDK apps do use L4 offloads.
> > > >>> Right now special treatment (pseudo-header cksum calculation) is
> > needed
> > > >>> only for L4 offloads (CKSUM, SEG).
> > > >>> So, majority of our apps who rely on other TX offloads (multi-seg,
> > ipv4
> > > >>> cksum, vlan insertion) happily run without
> > > >>> calling tx_prepare(), even though it is not the safest way.
> > > >>>
> > > >>>>
> > > >>>> In my opinion:
> > > >>>> If some driver has limitations for a feature, e.g. max 8
> > fragments,
> > > >>> it should be documented for that driver, so the application
> > > >>>> developer can make the appropriate decisions when designing the
> > > >>> application.
> > > >>>> Furthermore, we have APIs for the drivers to expose to the
> > > >>> applications what the driver supports, so the application can
> > configure
> > > >>>> itself optimally at startup. Perhaps those APIs need to be
> > expanded.
> > > >>>> And if a feature limitation is common across the majority of
> > drivers,
> > > >>> that limitation should be mentioned in the documentation of the
> > > >>>> feature itself.
> > > >>>
> > > >>> Many of such limitations *are* documented and in fact we do have
> > an API
> > > >>> to check max segments that each driver support,
> > > >>> see struct rte_eth_desc_lim.
> > > >>
> > > >> Yes, this is the kind of API we should provide, so the application
> > can configure itself appropriately.
> > > >>
> > > >>> The problem is:
> > > >>> - none of our sample app does proper check on these values, so
> > users
> > > >>> don't have a good example how to do it.
> > > >>
> > > >> Agreed.
> > > >> Adding an example showing how to do it properly would be the best
> > solution.
> > > >> Calling tx_prepare() in the examples is certainly not the solution.
> > > >>
> > > >>> - with current DPDK API not all of HW/PMD requirements could be
> > > >>> extracted programmatically:
> > > >>> let say majority of Intel PMDs for TCP offloads expect pseudo-
> > header
> > > >>> cksum to be pre-calculated by the SW.
> > > >>
> > > >> I hope this requirement is documented somewhere.
> > > >>
> > > >>> another example, some HW expects pkt_len to be bigger then some
> > > >>> threshold value, otherwise HW hang may appear.
> > > >>
> > > >> I hope this requirement is also documented somewhere.
> > > >
> > > > No idea, I found it only in the code.
> > >
> > > IMHO Tx burst must check such limitations. If you made your HW simpler
> > > (or just lost it on initial testing), pay in your drivers (or your
> > > HW+driver will be unusable because of such problems).
> > >
> > > >> Generally, if the requirements cannot be extracted
> > programmatically, they must be prominently documented, like this note to
> > > >> rte_eth_rx_burst():
> > > >
> > > > Obviously, more detailed documentation is always good, but...
> > > > Right now we have 50+ different PMDs from different vendors.
> > > > Even if each and every of them will carefully document all possible
> > limitations and necessary preparation steps,
> > > > how DPDK app developer supposed to deal with all that?
> > > > Do you expect everyone, to read carefully through all of them, and
> > handle all of them properly oh his own
> > > > in each and every DPDK app he is going to write?
> > > > That seems unrealistic.
> > > > Again what to do with backward compatibility: when new driver (with
> > new limitations) will arise
> > > > *after* your app is already written and tested?
> > >
> > > +1
> > >
> > > >>
> > > >> * @note
> > > >> * Some drivers using vector instructions require that *nb_pkts*
> > is
> > > >> * divisible by 4 or 8, depending on the driver implementation.
> > >
> > > I'm wondering what application should do if it needs to send just one
> > > packet and do it now. IMHO, such limitations are not acceptable.
>
> This common limitation for vector drivers is for RX burst, not TX burst.
> I agree such a limitation would be unacceptable for TX.
>
> > >
> > > >>
> > > >>> - As new HW and PMD keep appearing it is hard to predict what
> > extra
> > > >>> limitations/requirements will arise,
> > > >>> that's why tx_prepare() was introduced as s driver op.
> > > >>>
> > > >>>>
> > > >>>> We don't want to check in the fast path what can be checked at
> > > >>> startup or build time!
> > > >>>
> > > >>> If your app supposed to work with just a few, known in advance,
> > NIC
> > > >>> models, then sure, you can do that.
> > > >>> For apps that supposed to work 'in general' with any possible
> > PMDs
> > > >>> that DPDK supports - that might be a problem.
> > > >>> That's why tx_prepare() was introduced and it is strongly
> > recommended
> > > >>> to use it by the apps that do use TX offloads.
> > > >>> Probably tx_prepare() is not the best possible approach, but right
> > now
> > > >>> there are not many alternatives within DPDK.
> > > >>
> > > >> What exactly is an application supposed to do if tx_prepare()
> > doesn't accept the full burst? It doesn't return information about
> > > what is
> > > >> wrong.
> > > >
> > > > It provides some information, but it is *very* limited: just index
> > of 'bad' packet and error code.
> > > > In theory app can try to handle it in a 'smart' way: let say if
> > ENOTSUP is returned, then try to disable all HW offloads
> > > > and do all in SW. But again, it is much better to do so *before*
> > submitting packets for TX, so in practice everyone
> > > > just drop such 'bad' packets.
> > > >
> > > > Dropping the packets might not be an option, e.g. for applications
> > used in life support or tele-medicine.
> > >
> > > Critical applications should be able to do all Tx offloads in SW and
> > > retry. Of course, various statistics and logs should help to improve
> > the
> > > application.
> > >
> > > > If the packet is 'bad', then it is much better to drop it, then TX
> > corrupted packet or even hang NIC HW completely.
> > >
> > > IMHO Tx burst must drop packet which could hang NIC HW completely. I
> > > realize that it is an extra checks and performance drop, but vendor
> > > should pay in performance if HW is not good enough.
> > >
> > > > Though off-course it is much better to have an app that would check
> > for limitations that can be checked by API provided
> > > > and enable only supported offloads.
> > >
> > > Yes, that's why API to get limitation is much better than
> > documentation.
> > >
> > > >> If limitations are documented, an application can use the lowest
> > common denominator of the NICs it supports. And if the
> > > application is
> > > >> supposed to work in general, that becomes the lowest common
> > denominator of all NICs.
> > > >
> > > > I agree: for limitations that can be extracted with generic API,
> > like:
> > > > number of segments per packet, supported TX offloads, mbuf fileds
> > that must be provided for each TX offload, etc. -
> > > > it is responsibility of well-written application to obey all of
> > them.
> > > > Yes, many tx_prepare() implementations do such checks anyway, but
> > from my perspective it is sort of last-line of defense.
> > > > For well written application that should just never happen.
> > > > But there is one more important responsibility of tx_prepare() - it
> > performs PMD specific packet modifications for requested
> > > offloads.
> > > > As I already mentioned for Intel NICs - it does pseudo-header cksum
> > calcucation, for packets with size less then minimal, it can
> > > > probably do padding (even if doesn't do it right now), for some
> > other PMDs - might be something else, I didn't check.
> > > > Obviously it saves app developer from a burden to do all these
> > things on his own.
> > > >
> > > >> It looks like tx_prepare() has become a horrible workaround for
> > undocumented limitations.
> > >
> > > I strongly disagree. Documentation is never a solution for a generic
> > > application which is intended to work on any HW and IMHO it is the
> > > goal to have more and more applications which work on any HW.
> > >
> > > >> Limitations due to hardware and/or software tradeoffs are
> > unavoidable, so we have to live with them; but we should not accept
> > > >> PMDs with undocumented limitations.
> > > >
> > > > As I already said, more documentation never hurts, but for that
> > case, I think it is not enough.
> > > > I expect PMD to provide a tx_prepare() implementation that would
> > deal with such specific things.
> > >
> > > +1
> > > > Anyway, back to the original patch - I looked at it once again, and
> > realized that the problem
> > > > is just in the unsupported number of segments.
> > > > As we discussed above - such limitations should be handled by well
> > written app,
> > > > but none of DPDK apps does it right now.
> > > > So probably it is good opportunity to do things in a proper way and
> > introduce such checks
> > > > in testpmd ;)
> > >
> > > +1 since we already have fields in device information to report such
> > > limitations, but it does not say that Tx prepare should be dropped.
> > > Drivers which don't need Tx prepare keep it NULL and it returns
> > > immediately from ethdev. Since it is done per-burst, it should not
> > > affect performance a lot.
> > >
> >
> > 100% agree. I think tx_prepare needs to stay, and in general has to be
> > strongly recommended
> > for apps that do use TX offloads/multi-segs.
>
> <irony>
> Then tx_prepare should randomly reject packets and segments, to ensure the application has sufficient SW fallback implemented.
> </irony>
>
> No, seriously, considering the above arguments, I think PMD conformance requirements would be a better solution.
> If a PMD claims support for some feature, it must conform to some minimum requirements for that feature.
>
> E.g. support for multi-seg TX must be able to handle min. 8 segments (or some other reasonable number).
Hmm... why is that?
I thought we had a consensus here that good behaving app should use existing API (rte_eth_desc_lim) to deduce min common denominator,
and that we need some code example, or even better a lib function that would do such thing.
>
> Common limitations and preconditions, such as minimum RX burst size for vector drivers and pseudo-header checksum precalculation
> for TCP offload, should be accepted and prominently documented.
I still don't understand why these preconditions can't be done by tx_prepare().
> Unusual HW limitations, such as inability to pad short Ethernet frames, should be handled by the driver's TX function as workarounds
> for unacceptable limitations (in DPDK API context) in the HW.
With current design, rte_eth_tx_burst() is not allowed to modify mbufs whose refcnt > 1.
That's one of the reason why tx_prepare() was introduced.
next prev parent reply other threads:[~2024-02-27 10:41 UTC|newest]
Thread overview: 21+ messages / expand[flat|nested] mbox.gz Atom feed top
2024-01-03 1:29 [PATCH v1] " Kaiwen Deng
2024-01-04 1:03 ` Stephen Hemminger
2024-01-04 5:52 ` Jerin Jacob
2024-01-11 5:25 ` [PATCH v2] " Kaiwen Deng
2024-01-11 6:34 ` lihuisong (C)
2024-01-11 16:57 ` Stephen Hemminger
2024-01-12 16:00 ` David Marchand
2024-02-08 0:07 ` Ferruh Yigit
2024-02-08 10:50 ` Konstantin Ananyev
2024-02-08 11:35 ` Ferruh Yigit
2024-02-08 15:14 ` Konstantin Ananyev
2024-02-08 11:52 ` Morten Brørup
2024-02-11 15:04 ` Konstantin Ananyev
2024-02-13 10:27 ` Morten Brørup
2024-02-22 18:28 ` Konstantin Ananyev
2024-02-23 8:36 ` Andrew Rybchenko
2024-02-26 13:26 ` Konstantin Ananyev
2024-02-26 13:56 ` Morten Brørup
2024-02-27 10:41 ` Konstantin Ananyev [this message]
2024-02-08 12:09 ` Jerin Jacob
2024-02-09 19:18 ` Ferruh Yigit
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=b833e5bc79b14129ae69374dfed08ca0@huawei.com \
--to=konstantin.ananyev@huawei.com \
--cc=aman.deep.singh@intel.com \
--cc=andrew.rybchenko@oktetlabs.ru \
--cc=david.marchand@redhat.com \
--cc=dev@dpdk.org \
--cc=ferruh.yigit@amd.com \
--cc=jerinj@marvell.com \
--cc=kaiwenx.deng@intel.com \
--cc=mb@smartsharesystems.com \
--cc=qiming.yang@intel.com \
--cc=stable@dpdk.org \
--cc=thomas@monjalon.net \
--cc=yidingx.zhou@intel.com \
--cc=yuying.zhang@intel.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).