DPDK patches and discussions
From: Konstantin Ananyev <konstantin.ananyev@huawei.com>
To: "Morten Brørup" <mb@smartsharesystems.com>,
	"Bruce Richardson" <bruce.richardson@intel.com>
Cc: Ajit Khaparde <ajit.khaparde@broadcom.com>,
	Somnath Kotur <somnath.kotur@broadcom.com>,
	Nithin Dabilpuram <ndabilpuram@marvell.com>,
	Kiran Kumar K <kirankumark@marvell.com>,
	Sunil Kumar Kori <skori@marvell.com>,
	Satha Rao <skoteshwar@marvell.com>,
	Harman Kalra <hkalra@marvell.com>,
	Hemant Agrawal <hemant.agrawal@nxp.com>,
	Sachin Saxena <sachin.saxena@oss.nxp.com>,
	Shai Brandes <shaibran@amazon.com>,
	"Evgeny Schemeilin" <evgenys@amazon.com>,
	Ron Beider <rbeider@amazon.com>,
	"Amit Bernstein" <amitbern@amazon.com>,
	Wajeeh Atrash <atrwajee@amazon.com>,
	"Gaetan Rivet" <grive@u256.net>,
	yangxingui <yangxingui@h-partners.com>,
	Fengchengwen <fengchengwen@huawei.com>,
	Praveen Shetty <praveen.shetty@intel.com>,
	Vladimir Medvedkin <vladimir.medvedkin@intel.com>,
	Anatoly Burakov <anatoly.burakov@intel.com>,
	Jingjing Wu <jingjing.wu@intel.com>,
	Rosen Xu <rosen.xu@altera.com>,
	Andrew Boyer <andrew.boyer@amd.com>,
	Dariusz Sosnowski <dsosnowski@nvidia.com>,
	Viacheslav Ovsiienko <viacheslavo@nvidia.com>,
	"Bing Zhao" <bingz@nvidia.com>, Ori Kam <orika@nvidia.com>,
	Suanming Mou <suanmingm@nvidia.com>,
	Matan Azrad <matan@nvidia.com>, Wenbo Cao <caowenbo@mucse.com>,
	Andrew Rybchenko <andrew.rybchenko@oktetlabs.ru>,
	"Jerin Jacob" <jerinj@marvell.com>,
	Maciej Czekaj <mczekaj@marvell.com>,
	"dev@dpdk.org" <dev@dpdk.org>,
	"techboard@dpdk.org" <techboard@dpdk.org>,
	Ivan Malov <ivan.malov@arknetworks.am>,
	Thomas Monjalon <thomas@monjalon.net>
Subject: RE: Fixing MBUF_FAST_FREE TX offload requirements?
Date: Thu, 18 Sep 2025 14:12:17 +0000	[thread overview]
Message-ID: <f3d8956d80814d82a450abc263349ea7@huawei.com> (raw)
In-Reply-To: <98CBD80474FA8B44BF855DF32C47DC35F65446@smartserver.smartshare.dk>



> Subject: RE: Fixing MBUF_FAST_FREE TX offload requirements?
> 
> > From: Bruce Richardson [mailto:bruce.richardson@intel.com]
> > Sent: Thursday, 18 September 2025 11.09
> >
> > On Thu, Sep 18, 2025 at 10:50:11AM +0200, Morten Brørup wrote:
> > > Dear NIC driver maintainers (CC: DPDK Tech Board),
> > >
> > > The DPDK Tech Board has discussed that patch [1] (included in DPDK
> > > 25.07) extended the documented requirements to the
> > > RTE_ETH_TX_OFFLOAD_MBUF_FAST_FREE offload.
> > > These changes put additional limitations on applications' use of the
> > > MBUF_FAST_FREE TX offload, and made MBUF_FAST_FREE mutually exclusive
> > > with MULTI_SEGS (which is typically used for jumbo frame support).
> > > The Tech Board discussed that these changes do not reflect the
> > > intention of the MBUF_FAST_FREE TX offload, and wants to fix it.
> > > Mainly, MBUF_FAST_FREE and MULTI_SEGS should not be mutually
> > > exclusive.
> > >
> > > The original RTE_ETH_TX_OFFLOAD_MBUF_FAST_FREE requirements were:
> > > When set, application must guarantee that
> > > 1) per-queue all mbufs come from the same mempool, and
> > > 2) mbufs have refcnt = 1.
> > >
> > > The patch added the following requirements to the MBUF_FAST_FREE
> > > offload, reflecting rte_pktmbuf_prefree_seg() postconditions:
> > > 3) mbufs are direct,
> > > 4) mbufs have next = NULL and nb_segs = 1.
> > >
> > > Now, the key question is:
> > > Can we roll back to the original two requirements?
> > > Or do the drivers also depend on the third and/or fourth
> > > requirements?
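For reference, the four requirements above can be sketched as a single per-segment check. This is a toy model with hypothetical field names, not the real struct rte_mbuf:

```c
#include <stddef.h>
#include <stdint.h>

/* Toy model of the mbuf fields that the four MBUF_FAST_FREE
 * requirements touch; field names are hypothetical, the real
 * structure is struct rte_mbuf. */
struct toy_mbuf {
    void *pool;              /* req 1: all mbufs from the queue's mempool */
    uint16_t refcnt;         /* req 2: refcnt == 1 */
    int is_indirect;         /* req 3: mbuf is direct */
    struct toy_mbuf *next;   /* req 4: next == NULL ... */
    uint16_t nb_segs;        /* req 4: ... and nb_segs == 1 */
};

/* Returns 1 when one mbuf satisfies all four documented requirements
 * against the queue's expected mempool, 0 otherwise. */
static int
fast_free_seg_ok(const struct toy_mbuf *m, const void *queue_pool)
{
    if (m->pool != queue_pool)              return 0; /* req 1 */
    if (m->refcnt != 1)                     return 0; /* req 2 */
    if (m->is_indirect)                     return 0; /* req 3 */
    if (m->next != NULL || m->nb_segs != 1) return 0; /* req 4 */
    return 1;
}
```

The patch in question effectively moved checks 3 and 4 from the driver's responsibility into the application's.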
> > >
> > > <advertisement>
> > > Drivers freeing mbufs directly to a mempool should use the new
> > > rte_mbuf_raw_free_bulk() instead of rte_mempool_put_bulk(), so the
> > > preconditions for freeing mbufs directly into a mempool are validated
> > > in mbuf debug mode (with RTE_LIBRTE_MBUF_DEBUG enabled).
> > > Similarly, rte_mbuf_raw_alloc_bulk() should be used instead of
> > > rte_mempool_get_bulk().
> > > </advertisement>
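To illustrate the point of the advertisement, here is a toy stand-in (hypothetical names, simplified pool, not the real DPDK API) for the difference between the two free paths: same fast path, but the raw variant can verify the preconditions in debug builds:

```c
#include <stddef.h>
#include <stdint.h>

#define TOY_MBUF_DEBUG 1  /* stands in for RTE_LIBRTE_MBUF_DEBUG */

struct toy_pool { void *objs[64]; unsigned int n; };

struct toy_seg {
    struct toy_pool *pool;
    uint16_t refcnt;
    struct toy_seg *next;
    uint16_t nb_segs;
};

/* Stand-in for rte_mempool_put_bulk(): returns objects with no checks. */
static void
toy_pool_put_bulk(struct toy_pool *p, struct toy_seg **m, unsigned int n)
{
    unsigned int i;
    for (i = 0; i < n; i++)
        p->objs[p->n++] = m[i];
}

/* Stand-in for rte_mbuf_raw_free_bulk(): same fast path, but in debug
 * builds it verifies the raw-free preconditions first.  Returns -1 on
 * a violated precondition (a real debug build would panic instead). */
static int
toy_raw_free_bulk(struct toy_pool *p, struct toy_seg **m, unsigned int n)
{
#if TOY_MBUF_DEBUG
    unsigned int i;
    for (i = 0; i < n; i++)
        if (m[i]->pool != p || m[i]->refcnt != 1 ||
            m[i]->next != NULL || m[i]->nb_segs != 1)
            return -1;
#endif
    toy_pool_put_bulk(p, m, n);
    return 0;
}
```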
> > >
> > > PS: The feature documentation [2] still reflects the original
> > > requirements.
> > >
> > > [1]:
> > > https://github.com/DPDK/dpdk/commit/55624173bacb2becaa67793b71391884876673c1
> > > [2]:
> > > https://elixir.bootlin.com/dpdk/v25.07/source/doc/guides/nics/features.rst#L125
> > >
> > >
> > > Venlig hilsen / Kind regards,
> > > -Morten Brørup
> > >
> > I'm a little torn on this question, because I can see benefits for both
> > approaches. Firstly, it would be nice if we made FAST_FREE as accessible
> > for driver use as it was originally, with minimal requirements. However,
> > on looking at the code, I believe that many drivers actually took it to
> > mean that scattered packets couldn't occur in that case either, so the
> > use was incorrect.
> 
> I primarily look at Intel drivers, and that's how I read the driver code too.
> 
> > Similarly, and secondly, if we do have the extra requirements for
> > FAST_FREE, it does mean that any use of it can be very, very minimal
> > and efficient, since we don't need to check anything before freeing
> > the buffers.
> >
> > Given where we are now, I think keeping the more restrictive definition
> > of FAST_FREE is the way to go - keeping it exclusive with MULTI_SEGS -
> > because it means that we are less likely to have bugs. If we look to
> > change it back, I think we'd have to check all drivers to ensure they
> > are using the flag safely.
> 
> However, those driver bugs are not new.
> If we haven't received bug reports from users affected by them, maybe we can
> disregard them (in this discussion about pros and cons).
> I prefer we register them as driver bugs, instead of changing the API to
> accommodate bugs in the drivers.
> 
> From an application perspective, here's an idea for consideration:
> Assuming that indirect mbufs are uncommon, we keep requirement #3.
> To allow MULTI_SEGS (jumbo frames) with FAST_FREE, we get rid of requirement
> #4.

Do we really need to enable FAST_FREE for jumbo frames?
Jumbo frames usually mean a much smaller PPS number, so the actual RX/TX
overhead becomes really tiny.

> Since the driver knows that refcnt == 1, the driver can set next = NULL and
> nb_segs = 1 at any time, either when writing the TX descriptor (when it reads the
> mbuf anyway), or when freeing the mbuf.
> Regarding performance, this means that the driver's TX code path has to write to
> the mbufs (i.e. adding the performance cost of memory store operations) when
> segmented - but that is a universal requirement when freeing segmented mbufs
> to the mempool.
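A rough sketch of that idea (hypothetical names and a simplified segment struct; a real driver would walk struct rte_mbuf chains and then bulk-free to the mempool):

```c
#include <stddef.h>
#include <stdint.h>

struct chain_seg {
    struct chain_seg *next;
    uint16_t nb_segs;
    uint16_t refcnt;   /* FAST_FREE guarantees this is 1 */
};

/* Because refcnt == 1, the driver exclusively owns the chain and may
 * reset the segment metadata while walking it, so that each segment
 * meets requirement #4 before being freed.  Returns the number of
 * segments collected into 'out' for a subsequent bulk free. */
static unsigned int
reset_chain_for_free(struct chain_seg *m, struct chain_seg **out,
                     unsigned int cap)
{
    unsigned int n = 0;
    while (m != NULL && n < cap) {
        struct chain_seg *next = m->next;
        m->next = NULL;    /* requirement #4: next == NULL */
        m->nb_segs = 1;    /* requirement #4: nb_segs == 1 */
        out[n++] = m;
        m = next;
    }
    return n;
}
```

The extra stores happen only for segmented mbufs, which, as noted above, already pay a store cost when freed segment by segment.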

It might work, but I think it will become way too complicated.
Again, I don't know who is going to inspect/fix all the drivers.
Simply not allowing FAST_FREE for multi-seg seems like a much simpler and safer approach.
 
> For even more optimized driver performance, as Bruce mentions...
> If a port is configured for FAST_FREE and not MULTI_SEGS, the driver can use a
> super lean transmit function.
> Since the driver's transmit function pointer is per port (not per queue), this would
> require the driver to provide the MULTI_SEGS capability only per port, and not
> per queue. (Or we would have to add a NOT_MULTI_SEGS offload flag, to ensure
> that no queue is configured for MULTI_SEGS.)
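The per-port selection could look roughly like this (toy flag names standing in for RTE_ETH_TX_OFFLOAD_MBUF_FAST_FREE and RTE_ETH_TX_OFFLOAD_MULTI_SEGS):

```c
#include <stdint.h>

#define TOY_TX_FAST_FREE  (1u << 0)  /* stands in for RTE_ETH_TX_OFFLOAD_MBUF_FAST_FREE */
#define TOY_TX_MULTI_SEGS (1u << 1)  /* stands in for RTE_ETH_TX_OFFLOAD_MULTI_SEGS */

enum tx_path { TX_PATH_FULL = 0, TX_PATH_LEAN = 1 };

/* The super lean transmit function is only safe when FAST_FREE is
 * enabled and no queue on the port can carry multi-segment packets,
 * i.e. MULTI_SEGS is off port-wide (the tx_burst pointer is per port,
 * not per queue). */
static enum tx_path
select_tx_path(uint32_t port_offloads)
{
    if ((port_offloads & TOY_TX_FAST_FREE) &&
        !(port_offloads & TOY_TX_MULTI_SEGS))
        return TX_PATH_LEAN;
    return TX_PATH_FULL;
}
```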
> 


Thread overview: 5+ messages
2025-09-18  8:50 Morten Brørup
2025-09-18  9:09 ` Bruce Richardson
2025-09-18 10:00   ` Morten Brørup
2025-09-18 14:12     ` Konstantin Ananyev [this message]
2025-09-18 15:13 ` Stephen Hemminger
