From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
Received: from mails.dpdk.org (mails.dpdk.org [217.70.189.124])
    by inbox.dpdk.org (Postfix) with ESMTP id 10D4848A43;
    Wed, 29 Oct 2025 13:23:34 +0100 (CET)
Received: from mails.dpdk.org (localhost [127.0.0.1])
    by mails.dpdk.org (Postfix) with ESMTP id 925814028E;
    Wed, 29 Oct 2025 13:23:33 +0100 (CET)
Received: from dkmailrelay1.smartsharesystems.com (smartserver.smartsharesystems.com [77.243.40.215])
    by mails.dpdk.org (Postfix) with ESMTP id 64DCA40288;
    Wed, 29 Oct 2025 13:23:31 +0100 (CET)
Received: from smartserver.smartsharesystems.com (smartserver.smartsharesys.local [192.168.4.10])
    by dkmailrelay1.smartsharesystems.com (Postfix) with ESMTP id 2D245206CD;
    Wed, 29 Oct 2025 13:23:31 +0100 (CET)
Content-class: urn:content-classes:message
MIME-Version: 1.0
Content-Type: text/plain; charset="iso-8859-1"
Content-Transfer-Encoding: quoted-printable
Subject: RE: Fixing MBUF_FAST_FREE TX offload requirements?
Date: Wed, 29 Oct 2025 13:23:28 +0100
Message-ID: <98CBD80474FA8B44BF855DF32C47DC35F65513@smartserver.smartshare.dk>
X-MimeOLE: Produced By Microsoft Exchange V6.5
In-Reply-To:
X-MS-Has-Attach:
X-MS-TNEF-Correlator:
Thread-Topic: Fixing MBUF_FAST_FREE TX offload requirements?
Thread-Index: AdxItbf0vaAZYbBtT5KshyiE1pVM9QAEFMUw
References: <98CBD80474FA8B44BF855DF32C47DC35F65442@smartserver.smartshare.dk>
    <98CBD80474FA8B44BF855DF32C47DC35F65446@smartserver.smartshare.dk>
    <157addba-7dc8-4f0c-8b86-4ca8d057cdfc@oktetlabs.ru>
From: Morten Brørup
To: "Bruce Richardson", "Andrew Rybchenko"
Cc: "Konstantin Ananyev", "Ajit Khaparde", "Somnath Kotur",
    "Nithin Dabilpuram", "Kiran Kumar K", "Sunil Kumar Kori", "Satha Rao",
    "Harman Kalra", "Hemant Agrawal", "Sachin Saxena", "Shai Brandes",
    "Evgeny Schemeilin", "Ron Beider", "Amit Bernstein", "Wajeeh Atrash",
    "Gaetan Rivet", "yangxingui", "Fengchengwen", "Praveen Shetty",
    "Vladimir Medvedkin", "Anatoly Burakov", "Jingjing Wu", "Rosen Xu",
    "Andrew Boyer", "Dariusz Sosnowski", "Viacheslav Ovsiienko", "Bing Zhao",
    "Ori Kam", "Suanming Mou", "Matan Azrad", "Wenbo Cao", "Jerin Jacob",
    "Maciej Czekaj", "Ivan Malov", "Thomas Monjalon"
X-BeenThere: dev@dpdk.org
X-Mailman-Version: 2.1.29
Precedence: list
List-Id: DPDK patches and discussions
List-Unsubscribe:
List-Archive:
List-Post:
List-Help:
List-Subscribe:
Errors-To: dev-bounces@dpdk.org

> From: Bruce Richardson [mailto:bruce.richardson@intel.com]
> Sent: Wednesday, 29 October 2025 10.23
>
> On Wed, Oct 29, 2025 at 12:16:37PM +0300, Andrew Rybchenko wrote:
> > On 9/18/25 5:12 PM, Konstantin Ananyev wrote:
> > >
> > > >
> > > > Subject: RE: Fixing MBUF_FAST_FREE TX offload requirements?
> > > >
> > > > > From: Bruce Richardson [mailto:bruce.richardson@intel.com]
> > > > > Sent: Thursday, 18 September 2025 11.09
> > > > >
> > > > > On Thu, Sep 18, 2025 at 10:50:11AM +0200, Morten Brørup wrote:
> > > > > > Dear NIC driver maintainers (CC: DPDK Tech Board),
> > > > > >
> > > > > > The DPDK Tech Board has discussed that patch [1] (included in DPDK
> > > > > > 25.07) extended the documented requirements to the
> > > > > > RTE_ETH_TX_OFFLOAD_MBUF_FAST_FREE offload.
> > > > > > These changes put additional limitations on applications' use of the
> > > > > > MBUF_FAST_FREE TX offload, and made MBUF_FAST_FREE mutually exclusive
> > > > > > with MULTI_SEGS (which is typically used for jumbo frame support).
> > > > > > The Tech Board discussed that these changes do not reflect the
> > > > > > intention of the MBUF_FAST_FREE TX offload, and wants to fix it.
> > > > > > Mainly, MBUF_FAST_FREE and MULTI_SEGS should not be mutually exclusive.
> > > > > >
> > > > > > The original RTE_ETH_TX_OFFLOAD_MBUF_FAST_FREE requirements were:
> > > > > > When set, application must guarantee that
> > > > > > 1) per-queue all mbufs come from the same mempool, and
> > > > > > 2) mbufs have refcnt = 1.
> > > > > >
> > > > > > The patch added the following requirements to the MBUF_FAST_FREE
> > > > > > offload, reflecting rte_pktmbuf_prefree_seg() postconditions:
> > > > > > 3) mbufs are direct,
> > > > > > 4) mbufs have next = NULL and nb_segs = 1.
> > > > > >
> > > > > > Now, the key question is:
> > > > > > Can we roll back to the original two requirements?
> > > > > > Or do the drivers also depend on the third and/or fourth requirements?
> > > > > >
> > > > > >
> > > > > > Drivers freeing mbufs directly to a mempool should use the new
> > > > > > rte_mbuf_raw_free_bulk() instead of rte_mempool_put_bulk(), so the
> > > > > > preconditions for freeing mbufs directly into a mempool are validated
> > > > > > in mbuf debug mode (with RTE_LIBRTE_MBUF_DEBUG enabled).
> > > > > > Similarly, rte_mbuf_raw_alloc_bulk() should be used instead of
> > > > > > rte_mempool_get_bulk().
> > > > > >
> > > > > >
> > > > > > PS: The feature documentation [2] still reflects the original
> > > > > > requirements.
> > > > > >
> > > > > > [1]: https://github.com/DPDK/dpdk/commit/55624173bacb2becaa67793b71391884876673c1
> > > > > > [2]: https://elixir.bootlin.com/dpdk/v25.07/source/doc/guides/nics/features.rst#L125
> > > > > >
> > > > > >
> > > > > > Venlig hilsen / Kind regards,
> > > > > > -Morten Brørup
> > > > > >
> > > > > I'm a little torn on this question, because I can see benefits for both
> > > > > approaches. Firstly, it would be nice if we made FAST_FREE as accessible
> > > > > for driver use as it was originally, with minimal requirements. However, on
> > > > > looking at the code, I believe that many drivers actually took it to mean
> > > > > that scattered packets couldn't occur in that case either, so the use was
> > > > > incorrect.
> > > >
> > > > I primarily look at Intel drivers, and that's how I read the driver code too.
> > > >
> > > > > Similarly, and secondly, if we do have the extra requirements
> > > > > for FAST_FREE, it does mean that any use of it can be very, very minimal
> > > > > and efficient, since we don't need to check anything before freeing the
> > > > > buffers.
> > > > >
> > > > > Given where we are now, I think keeping the more restrictive definition of
> > > > > FAST_FREE is the way to go - keeping it exclusive with MULTI_SEGS - because
> > > > > it means that we are less likely to have bugs. If we look to change it
> > > > > back, I think we'd have to check all drivers to ensure they are using the
> > > > > flag safely.
> > > >
> > > > However, those driver bugs are not new.
> > > > If we haven't received bug reports from users affected by them, maybe we can
> > > > disregard them (in this discussion about pros and cons).
> > > > I prefer we register them as driver bugs, instead of changing the API to
> > > > accommodate bugs in the drivers.
> > > >
> > > > From an application perspective, here's an idea for consideration:
> > > > Assuming that indirect mbufs are uncommon, we keep requirement #3.
> > > > To allow MULTI_SEGS (jumbo frames) with FAST_FREE, we get rid of requirement
> > > > #4.
> > >
> > > Do we really need to enable FAST_FREE for jumbo frames?
> > > Jumbo frames usually mean a much smaller PPS number, and the actual RX/TX
> > > overhead becomes really tiny.
> >
> > +1
> >
> > > > Since the driver knows that refcnt == 1, the driver can set next = NULL and
> > > > nb_segs = 1 at any time, either when writing the TX descriptor (when it reads the
> > > > mbuf anyway), or when freeing the mbuf.
> > > > Regarding performance, this means that the driver's TX code path has to write to
> > > > the mbufs (i.e. adding the performance cost of memory store operations) when
> > > > segmented - but that is a universal requirement when freeing segmented mbufs
> > > > to the mempool.
> > >
> > > It might work, but I think it will become way too complicated.
> > > Again, I don't know who is going to inspect/fix all the drivers.
> > > Just not allowing FAST_FREE for multi-seg seems like a much simpler and safer approach.
> > >
> > > > For even more optimized driver performance, as Bruce mentions...
> > > > If a port is configured for FAST_FREE and not MULTI_SEGS, the driver can use a
> > > > super lean transmit function.
> > > > Since the driver's transmit function pointer is per port (not per queue), this would
> > > > require the driver to provide the MULTI_SEGS capability only per port, and not
> > > > per queue. (Or we would have to add a NOT_MULTI_SEGS offload flag, to ensure
> > > > that no queue is configured for MULTI_SEGS.)
> > > >
> >
> > FAST_FREE is not a real Tx offload, since there is no promise from
> > driver to do something (like other Tx offloads, e.g. checksumming or
> > TSO).
> > Is it a promise to ignore refcount or take a look at memory pool
> > of some packets only? I guess no. If so, basically any driver may
> > advertise it and simply ignore if the offload is requested, but
> > driver can do nothing with these limitations on input data.
> >
> > It is a performance hint in fact and promise from application to
> > follow specified limitations on Tx mbufs.
> >
> > So, if application specifies both FAST_FREE and MULTI_SEG, but driver
> > code can't FAST_FREE with MULTI_SEG, it should just ignore FAST_FREE.
> > That's it. The performance hint is simply useless in this case.
> > There is no point to make FAST_FREE and MULTI_SEG mutually exclusive.
> > If some drivers can really support both - great. If not, just ignore
> > FAST_FREE and support MULTI_SEG.
> >
> > "mbufs are direct" must be FAST_FREE requirement. Since otherwise
> > freeing is not simple. I guess it was simply lost in the original
> > definition of FAST_FREE.

Agree about the "mbufs are direct" statement being lost in the original definition.
It can be extended to include mbufs using "pinned external buffer with refcnt==1", because freeing those is just as simple as freeing "direct" mbufs.

> >
> That's a good point and explanation of things. Perhaps we are better to
> deprecate FAST_FREE and replace it with a couple of explicit hints that
> better explain what they are?
>
> - RTE_ETH_TX_HINT_DIRECT_MBUFS

In the FAST_FREE case, this hint would be TX_HINT_MBUF_DIRECT_OR_SINGLE_OWNER_PINNED_EXTBUF.

> - RTE_ETH_TX_HINT_SINGLE_MEMPOOL

I would prefer renaming TX_HINT_SINGLE_MEMPOOL to TX_HINT_SAME_MEMPOOL, so we can add a globally scoped TX_HINT_SINGLE_MEMPOOL later.

Also, RTE_ETH_TX_HINT_NON_SEGMENTED can be added later.

I strongly agree with the finer granularity for the hints; the optimization of freeing to one mempool instead of a variety of mempools is orthogonal to the optimization of not having to consider indirect mbufs.
And the drivers are free to optimize only if multiple hints are present, so there is no downside to using a finer granularity for hints.

Although we are reusing "offload" fields for hints, there's no need for drivers to announce capability for such hints, including FAST_FREE. Since the drivers can freely ignore any hints, a hint capability doesn't contain any information about the driver's ability to do anything useful with the hints.

Regarding naming, we should use "promise" instead of "hint", to emphasize that the application is not allowed to violate it.
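
To make the promise semantics discussed above concrete, here is a minimal, self-contained C sketch. It is an illustration only and does not use DPDK headers: struct toy_mbuf, the TOY_TX_PROMISE_* bits and the toy_* functions are hypothetical names invented for this example, loosely modelled on the hint names floated in this thread. It shows (a) the four FAST_FREE requirements as a per-mbuf check, (b) a driver that takes its lean free path only when every promise it relies on is present and otherwise ignores them, and (c) the relaxed variant where requirement 4 is dropped and the driver resets next and nb_segs itself before freeing.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical promise bits; none of these names exist in DPDK. */
#define TOY_TX_PROMISE_MBUF_DIRECT   (UINT64_C(1) << 0)
#define TOY_TX_PROMISE_SAME_MEMPOOL  (UINT64_C(1) << 1)
#define TOY_TX_PROMISE_NON_SEGMENTED (UINT64_C(1) << 2)

/* Toy stand-in for struct rte_mbuf, reduced to the fields discussed here. */
struct toy_mbuf {
    struct toy_mbuf *next; /* next segment, NULL for the last one */
    const void *pool;      /* mempool the mbuf was allocated from */
    uint16_t refcnt;       /* reference counter */
    uint16_t nb_segs;      /* number of segments in the chain */
    bool direct;           /* false for indirect (attached) mbufs */
};

/* Requirements 1-4 from the thread, checked for a single mbuf. */
static bool toy_fast_free_ok(const struct toy_mbuf *m, const void *queue_pool)
{
    return m->pool == queue_pool   /* 1) same mempool per queue */
        && m->refcnt == 1          /* 2) refcnt = 1 */
        && m->direct               /* 3) direct mbuf */
        && m->next == NULL         /* 4) not chained ... */
        && m->nb_segs == 1;        /*    ... and a single segment */
}

/* A driver may optimize only when all promises it depends on are given;
 * with any of them missing it simply falls back to its generic path. */
static bool toy_use_lean_free(uint64_t promises)
{
    const uint64_t needed = TOY_TX_PROMISE_MBUF_DIRECT |
                            TOY_TX_PROMISE_SAME_MEMPOOL |
                            TOY_TX_PROMISE_NON_SEGMENTED;

    return (promises & needed) == needed;
}

/* Relaxed variant (requirement 4 dropped): because refcnt is promised to
 * be 1, the driver can unlink each segment itself before a raw free.
 * Returns the number of segments that are now ready to be freed. */
static unsigned int toy_reset_chain(struct toy_mbuf *m)
{
    unsigned int n = 0;

    while (m != NULL) {
        struct toy_mbuf *next = m->next;

        /* Debug-time check of the refcnt promise, in the spirit of the
         * mbuf debug mode mentioned earlier in the thread. */
        assert(m->refcnt == 1);
        m->next = NULL;
        m->nb_segs = 1;
        n++;
        m = next;
    }
    return n;
}
```

In this sketch a two-segment chain fails toy_fast_free_ok() until toy_reset_chain() has walked it, which is the extra store cost per segment that the relaxed variant would put on the driver's TX path.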