RE: [PATCH v2] net/null: Add fast mbuf release TX offload

From: "Morten Brørup" <mb@smartsharesystems.com>
To: "Konstantin Ananyev" <konstantin.ananyev@huawei.com>,
	"Ivan Malov" <ivan.malov@arknetworks.am>
Cc: <dev@dpdk.org>, "Tetsuya Mukawa" <mtetsuyah@gmail.com>,
	"Stephen Hemminger" <stephen@networkplumber.org>,
	"Vipin Varghese" <Vipin.Varghese@amd.com>,
	"Thiyagarjan P" <Thiyagarajan.P@amd.com>
Subject: RE: [PATCH v2] net/null: Add fast mbuf release TX offload
Date: Mon, 28 Jul 2025 18:42:21 +0200
Message-ID: <98CBD80474FA8B44BF855DF32C47DC35E9FDCE@smartserver.smartshare.dk>
In-Reply-To: <b9f113ebc01f46cc87f7cd0ff7502add@huawei.com>

> From: Konstantin Ananyev [mailto:konstantin.ananyev@huawei.com]
> Sent: Monday, 28 July 2025 17.42
> 
> > > Hi Morten,
> > >
> > > Good patch. Please see below.
> > >
> > > On Sat, 26 Jul 2025, Morten Brørup wrote:
> > >
> > > > Added fast mbuf release, re-using the existing mbuf pool pointer
> > > > in the queue structure.
> > > >
> > > > Signed-off-by: Morten Brørup <mb@smartsharesystems.com>
> > > > ---
> > > > v2:
> > > > * Also announce the offload as a per-queue capability.
> > > > > * Added missing test of per-device offload configuration when configuring
> > > > >  the queue.
> > > > ---
> > > > drivers/net/null/rte_eth_null.c | 33 ++++++++++++++++++++++++++++++---
> > > > 1 file changed, 30 insertions(+), 3 deletions(-)
> > > >
> > > > > diff --git a/drivers/net/null/rte_eth_null.c b/drivers/net/null/rte_eth_null.c
> > > > index 8a9b74a03b..09cfc74494 100644
> > > > --- a/drivers/net/null/rte_eth_null.c
> > > > +++ b/drivers/net/null/rte_eth_null.c
> > > > @@ -34,6 +34,17 @@ struct pmd_internals;
> > > > struct null_queue {
> > > > 	struct pmd_internals *internals;
> > > >
> > > > +	/**
> > > > +	 * For RX queue:
> > > > +	 *  Mempool to allocate mbufs from.
> > > > +	 *
> > > > +	 * For TX queue:
> > >
> > > Perhaps spell it 'Rx', 'Tx', but this is up to you.
> >
> > I just checked, and it seems all three spellings "rx", "Rx" and "RX" are being used in DPDK.
> > I personally prefer RX, so I'll keep that.
> >
> > >
> > > > +	 *  Mempool to free mbufs to, if fast release of mbufs is
> enabled.
> > > > +	 *  UINTPTR_MAX if the mempool for fast release of mbufs has not
> > > yet been detected.
> > > > +	 *  NULL if fast release of mbufs is not enabled.
> > > > +	 *
> > > > +	 *  @see RTE_ETH_TX_OFFLOAD_MBUF_FAST_FREE
> > > > +	 */
> > >
> > > Maybe it would be better to have a separate 'tx_pkt_burst' callback, to avoid
> > > the conditional checks below. Though I understand this would downgrade the
> > > per-queue capability to per-port only, so feel free to disregard this point.
> >
> > I considered this, and I can imagine an application using FAST_FREE for its
> > primary queues, and normal free for some secondary queues.
> > Looking at other drivers - which have implemented a runtime check, not
> > separate callbacks for FAST_FREE - I guess they came to the same conclusion.
> >
> > >
> > > > 	struct rte_mempool *mb_pool;
> > > > 	void *dummy_packet;
> > > >
> > > > @@ -151,7 +162,16 @@ eth_null_tx(void *q, struct rte_mbuf **bufs,
> > > uint16_t nb_bufs)
> > > > 	for (i = 0; i < nb_bufs; i++)
> > > > 		bytes += rte_pktmbuf_pkt_len(bufs[i]);
> > > >
> > > > -	rte_pktmbuf_free_bulk(bufs, nb_bufs);
> > > > +	if (h->mb_pool != NULL) { /* RTE_ETH_TX_OFFLOAD_MBUF_FAST_FREE */
> > > > +		if (unlikely(h->mb_pool == (void *)UINTPTR_MAX)) {
> > > > +			if (unlikely(nb_bufs == 0))
> > > > +				return 0; /* Do not dereference uninitialized
> > > bufs[0]. */
> > > > +			h->mb_pool = bufs[0]->pool;
> > > > +		}
> > > > +		rte_mbuf_raw_free_bulk(h->mb_pool, bufs, nb_bufs);
> > > > +	} else {
> > > > +		rte_pktmbuf_free_bulk(bufs, nb_bufs);
> > > > +	}
> > > > 	rte_atomic_fetch_add_explicit(&h->tx_pkts, nb_bufs,
> > > rte_memory_order_relaxed);
> > > > 	rte_atomic_fetch_add_explicit(&h->tx_bytes, bytes,
> > > rte_memory_order_relaxed);
> > > >
> > > > @@ -259,7 +279,7 @@ static int
> > > > eth_tx_queue_setup(struct rte_eth_dev *dev, uint16_t tx_queue_id,
> > > > 		uint16_t nb_tx_desc __rte_unused,
> > > > 		unsigned int socket_id __rte_unused,
> > > > -		const struct rte_eth_txconf *tx_conf __rte_unused)
> > > > +		const struct rte_eth_txconf *tx_conf)
> > > > {
> > > > 	struct rte_mbuf *dummy_packet;
> > > > 	struct pmd_internals *internals;
> > > > @@ -284,6 +304,10 @@ eth_tx_queue_setup(struct rte_eth_dev *dev,
> > > uint16_t tx_queue_id,
> > > >
> > > > 	internals->tx_null_queues[tx_queue_id].internals = internals;
> > > > 	internals->tx_null_queues[tx_queue_id].dummy_packet =
> > > dummy_packet;
> > > > +	internals->tx_null_queues[tx_queue_id].mb_pool =
> > > > +			(dev->data->dev_conf.txmode.offloads | tx_conf-
> > > >offloads) &
> > > > +			RTE_ETH_TX_OFFLOAD_MBUF_FAST_FREE ?
> > > > +			(void *)UINTPTR_MAX : NULL;
> > >
> > > Given the fact that FAST_FREE and MULTI_SEGS are effectively conflicting,
> > > wouldn't it be better to have a check for the presence of both flags here?
> > > I'm not sure whether this is already checked at the generic layer above the PMD.
> >
> > Interesting thought - got me looking deeper into this.
> >
> > It seems MULTI_SEGS is primarily a capability flag.
> > The description of the MULTI_SEGS flag [1] could be interpreted that way too:
> > /** Device supports multi segment send. */
> >
> > [1]: https://elixir.bootlin.com/dpdk/v25.07/source/lib/ethdev/rte_ethdev.h#L1614
> 
> In fact, I believe it serves both purposes: reporting capabilities and
> requesting offloads to enable.
> A few examples that, I believe, request this offload:
> https://elixir.bootlin.com/dpdk/v25.07/source/examples/ip_fragmentation/main.c#L156
> https://elixir.bootlin.com/dpdk/v25.07/source/examples/ipsec-secgw/ipsec-secgw.c#L1985
> https://elixir.bootlin.com/dpdk/v25.07/source/examples/ip_reassembly/main.c#L177

Another flag with an unclear description. :-(

> 
> >
> > E.g. the i40e driver offers MULTI_SEGS capability per-device, but not per-queue.
> > And it doesn't use the MULTI_SEGS flag for any purpose (beyond capability reporting).
> >
> > Furthermore, enabling MULTI_SEGS on TX (per device or per queue) wouldn't mean
> > that all transmitted packets are segmented; it only means that the driver must
> > be able to handle segmented packets.
> 
> Yep.
> 
> > I.e. MULTI_SEGS could be enabled on a device, and yet it would be acceptable
> > to enable FAST_FREE on a queue on that device.
> 
> In theory yes... you probably can have one TX queue with FAST_FREE (no
> multi-seg packets) and another TX queue serving multi-seg packets.
> Again, some drivers can probably even support both offloads on the same TX queue,
> as long as the conditions for the FAST_FREE offload are still satisfied: single
> mempool, refcnt==1, no indirect mbufs, etc.
> Though in practice, using MULTI_SEG usually implies using all those mbuf
> features that are not compatible with FAST_FREE.
> BTW, I see that many of the DPDK examples use both FAST_FREE and MULTI_SEG.
> TBH, I don't understand how they work together; as far as I remember, in many
> cases they just shouldn't.

Agreed: Using FAST_FREE and MULTI_SEG together shouldn't work.
But if a driver (e.g. i40e) doesn't support configuring MULTI_SEG per queue, only per device, it would be impossible to configure FAST_FREE for one queue and MULTI_SEG for another queue.
Cleaning up this mess would break two kinds of applications: those that assume MULTI_SEG is for capability reporting only, and thus don't set it when configuring the device or queue, and those that configure MULTI_SEG on a device and FAST_FREE on a queue.
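
To illustrate the per-device versus per-queue point, here is a minimal, self-contained C sketch. It assumes, as the quoted eth_tx_queue_setup() change does, that the offloads in effect on a queue are the per-device offloads ORed with the per-queue offloads. The flag macros and both helper functions are hypothetical stand-ins, not DPDK definitions; real code would use the RTE_ETH_TX_OFFLOAD_* macros from rte_ethdev.h.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Stand-in flag values for illustration only (assumption); real code would
 * use RTE_ETH_TX_OFFLOAD_MULTI_SEGS and RTE_ETH_TX_OFFLOAD_MBUF_FAST_FREE.
 */
#define TX_OFFLOAD_MULTI_SEGS     (UINT64_C(1) << 0)
#define TX_OFFLOAD_MBUF_FAST_FREE (UINT64_C(1) << 1)

/* Offloads in effect on a queue: per-device offloads apply to every queue. */
static inline uint64_t
effective_tx_offloads(uint64_t dev_offloads, uint64_t queue_offloads)
{
	return dev_offloads | queue_offloads;
}

/* True if both flags end up enabled on the same queue. */
static inline bool
fast_free_with_multi_segs(uint64_t dev_offloads, uint64_t queue_offloads)
{
	const uint64_t both = TX_OFFLOAD_MULTI_SEGS | TX_OFFLOAD_MBUF_FAST_FREE;

	return (effective_tx_offloads(dev_offloads, queue_offloads) & both) == both;
}
```

With MULTI_SEG set per device, fast_free_with_multi_segs() is true for every queue that requests FAST_FREE, so a driver without per-queue MULTI_SEG cannot express "FAST_FREE on this queue, MULTI_SEG on that one".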

Slightly related: I suspect that FAST_FREE might not be implemented 100 % correctly in all drivers, so I submitted a patch [2] to verify that FAST_FREE'ed mbufs conform to the required state of mbufs held in the mbuf pool. (No specific drivers in mind, just a weak hunch.)
Any drivers implementing FAST_FREE should use rte_mbuf_raw_free_bulk() instead of rte_mempool_put_bulk() to benefit from this patch, especially when conformance testing the driver.

[2]: https://inbox.dpdk.org/dev/20250722093431.555214-1-mb@smartsharesystems.com/
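
As a rough sketch of what such a conformance check could look for, the following self-contained C model tests the FAST_FREE conditions listed earlier in this thread (single mempool, refcnt==1, no indirect mbufs) plus, as an added assumption, that each mbuf is a single segment. struct model_mbuf and fast_free_burst_ok() are hypothetical names for illustration; this is not the code in [2], and it does not use the real struct rte_mbuf.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Simplified stand-in for an mbuf (assumption, not struct rte_mbuf). */
struct model_mbuf {
	const void *pool;        /* mempool the mbuf was allocated from */
	struct model_mbuf *next; /* next segment, NULL if single-segment */
	uint16_t refcnt;         /* reference counter */
	uint16_t nb_segs;        /* number of segments in the chain */
	bool indirect;           /* true if attached to another mbuf's data */
};

/*
 * Return true if every mbuf in the burst could be put straight back into
 * queue_pool without the per-mbuf handling that a normal free performs.
 */
static bool
fast_free_burst_ok(const void *queue_pool,
		struct model_mbuf *const *bufs, uint16_t nb_bufs)
{
	assert(bufs != NULL || nb_bufs == 0);

	for (uint16_t i = 0; i < nb_bufs; i++) {
		const struct model_mbuf *m = bufs[i];

		if (m->pool != queue_pool)   /* single mempool per queue */
			return false;
		if (m->refcnt != 1)          /* not shared with anyone else */
			return false;
		if (m->indirect)             /* no indirect mbufs */
			return false;
		if (m->next != NULL || m->nb_segs != 1) /* single segment */
			return false;
	}
	return true;
}
```

A driver test could run a check of this kind on each TX burst in a debug build and fail loudly when it returns false, rather than silently returning a non-conforming mbuf to the pool.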

> 
> >
> > >
> > > Thank you.
> >
> > Thank you for reviewing.
> >
> > >
> > > >
> > > > 	return 0;
> > > > }
> > > > @@ -309,7 +333,10 @@ eth_dev_info(struct rte_eth_dev *dev,
> > > > 	dev_info->max_rx_queues = RTE_DIM(internals->rx_null_queues);
> > > > 	dev_info->max_tx_queues = RTE_DIM(internals->tx_null_queues);
> > > > 	dev_info->min_rx_bufsize = 0;
> > > > -	dev_info->tx_offload_capa = RTE_ETH_TX_OFFLOAD_MULTI_SEGS |
> > > RTE_ETH_TX_OFFLOAD_MT_LOCKFREE;
> > > > +	dev_info->tx_queue_offload_capa =
> > > RTE_ETH_TX_OFFLOAD_MBUF_FAST_FREE;
> > > > +	dev_info->tx_offload_capa = RTE_ETH_TX_OFFLOAD_MULTI_SEGS |
> > > > +			RTE_ETH_TX_OFFLOAD_MT_LOCKFREE |
> > > > +			dev_info->tx_queue_offload_capa;
> > > >
> > > > 	dev_info->reta_size = internals->reta_size;
> > > > 	dev_info->flow_type_rss_offloads = internals-
> > > >flow_type_rss_offloads;
> > > > --
> > > > 2.43.0
> > > >
> > > >

Thread overview: 12+ messages
2025-06-24 18:14 [PATCH] " Morten Brørup
2025-06-26 14:05 ` Stephen Hemminger
2025-06-26 15:44   ` Morten Brørup
2025-06-27 12:07     ` Varghese, Vipin
2025-07-26  4:34       ` Morten Brørup
2025-07-28  8:22         ` Varghese, Vipin
2025-07-26  4:48 ` [PATCH v2] " Morten Brørup
2025-07-26  6:15   ` Ivan Malov
2025-07-28 13:27     ` Morten Brørup
2025-07-28 13:51       ` Ivan Malov
2025-07-28 15:42       ` Konstantin Ananyev
2025-07-28 16:42         ` Morten Brørup [this message]
