From: "Morten Brørup" <mb@smartsharesystems.com>
To: "Konstantin Ananyev" <konstantin.ananyev@huawei.com>,
"Bruce Richardson" <bruce.richardson@intel.com>, <dev@dpdk.org>
Subject: RE: [PATCH] net/i40e: Fast release optimizations
Date: Thu, 3 Jul 2025 14:02:42 +0200
Message-ID: <98CBD80474FA8B44BF855DF32C47DC35E9FD85@smartserver.smartshare.dk>
In-Reply-To: <03b6263113f642828843b60878c811e6@huawei.com>
> > > I am talking about a different thing:
> > > I think with some extra effort the driver can use (in some cases)
> > > rte_mbuf_raw_free_bulk() even when RTE_ETH_TX_OFFLOAD_MBUF_FAST_FREE
> > > is not specified.
> > > Let's say we make txq->fast_free_mp[] an array with the same size as
> > > txq->txep[].
> > > At tx_burst(), when filling txep[], we can do pre_free() checks for
> > > that mbuf, and in case of success store its mempool pointer in the
> > > corresponding txq->fast_free_mp[], otherwise put NULL there.
> > > Then at tx_free() we can scan fast_free_mp[] and invoke raw_free()
> > > for the non-NULL entries.
> > > Again, for now it is just an idea, probably worth thinking about.
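(To make the idea concrete: the Tx queue would carry a shadow array alongside txep[], roughly like the following; the field name here is illustrative only, not an actual i40e definition.)

	struct rte_mempool **fast_free_mp;	/* same number of entries as txep[]: the mbuf's mempool if it
						 * passed the prefree check at tx_burst(), otherwise NULL */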
> >
> > Yes, that seems like an excellent idea, certainly worth considering!
> >
> > At tx_free(), the mbufs might be cold, so not accessing them at this
> > point improves performance. (Which is also the point of my patch.)
>
> Yes.
>
> >
> > At tx_burst(), the mbufs are read anyway (their information is written
> > into the tx descriptors), so the mbufs are hot in the cache at this
> > point.
>
> Yes.
>
> > Best case with your suggestion, rte_pktmbuf_prefree_seg() doesn't
> > write the mbuf, so the performance cost of doing it at tx_burst() is
> > extremely low.
>
> Yes.
>
> > Worst case with your suggestion, rte_pktmbuf_prefree_seg() does write
> > the mbuf, so the mbuf write operation simply moves from tx_free() to
> > tx_burst().
> > However, in tx_burst(), the mbuf is already hot in the cache, so per
> > transmitted mbuf, we get one load+store at tx_burst() instead of one
> > load at tx_burst() + one load+store at tx_free().
>
> I suppose you plan to invoke the full rte_pktmbuf_prefree_seg() here?
> Unfortunately, I don't think it is possible - for cases when refcnt > 1,
> we need to decrement refcnt only when we are ready to release the mbuf.
> Otherwise we can end up with the NIC HW reading from an already released
> (and probably re-used) mbuf.
Good catch.
> What we probably need is a lightweight version of
> rte_pktmbuf_prefree_seg() that would return a non-NULL value only when
> refcnt == 1 and the segment is a direct mbuf (not indirect, no external
> memory attached).
> Something like:
>
> static __rte_always_inline struct rte_mbuf *
> rte_pktmbuf_prefree_check(struct rte_mbuf *m)
> {
> 	if (rte_mbuf_refcnt_read(m) == 1 && RTE_MBUF_DIRECT(m))
> 		return m;
> 	return NULL;
> }
Yes.
For the mbufs that are safe to prefree at tx_burst(), we must completely prefree them like rte_pktmbuf_prefree_seg() does, i.e. also initialize the m->next and m->nb_segs fields, so we can blindly free the mbufs directly to the relevant mempools at tx_free().
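To illustrate, a minimal sketch of such a helper at the tx_burst() side (the name tx_try_prefree is made up here; it only combines your check above with the field resets that rte_pktmbuf_prefree_seg() performs):

/* Hypothetical helper, called at tx_burst() while the mbuf is hot:
 * if the mbuf can be released blindly later, complete the prefree now
 * and return its mempool; otherwise return NULL (slow path at tx_free()). */
static __rte_always_inline struct rte_mempool *
tx_try_prefree(struct rte_mbuf *m)
{
	if (rte_mbuf_refcnt_read(m) == 1 && RTE_MBUF_DIRECT(m)) {
		m->next = NULL;		/* same resets as rte_pktmbuf_prefree_seg() */
		m->nb_segs = 1;
		return m->pool;
	}
	return NULL;
}

The driver would then store the returned pointer (or NULL) in txq->fast_free_mp[] at the same index as the mbuf in txep[].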
>
> So at worst case (when such check will return NULL) we still need to do
> load+store at tx_free().
Yes, in the worst case, the behavior of tx_free() will be the same as today.
Plus the overhead of reading and checking the fast_free_mp[] array at tx_free(), which will be incurred in both the best and the worst case. On the x86-64 architecture, this array covers 8 mbufs per cache line (8-byte pointers, 64-byte cache lines), so the per-mbuf cost is relatively low.
(In RTE_ETH_TX_OFFLOAD_MBUF_FAST_FREE offload mode, the fast_free_mp[] array should not be used at all, so this small extra overhead is not incurred there.)
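For completeness, the scan at tx_free() could then look roughly like this (just a sketch; ring indexing and batching consecutive entries with the same mempool into one bulk call are omitted):

for (i = 0; i < nb_to_free; i++) {
	struct rte_mbuf *m = txep[i].mbuf;
	struct rte_mempool *mp = txq->fast_free_mp[i];

	if (mp != NULL)
		rte_mempool_put(mp, m);		/* prefreed at tx_burst(): no mbuf access here */
	else
		rte_pktmbuf_free_seg(m);	/* slow path, same as today */
	txep[i].mbuf = NULL;
}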
From a high-level perspective, we should also consider the relevance of this optimization:
Do any (non-exotic) applications exist that transmit many mbufs meeting the requirements of this optimization (and thus benefiting from it), but not meeting the RTE_ETH_TX_OFFLOAD_MBUF_FAST_FREE offload requirements (and thus being unable to use that instead)?