DPDK patches and discussions
From: "Morten Brørup" <mb@smartsharesystems.com>
To: "Konstantin Ananyev" <konstantin.v.ananyev@yandex.ru>,
	"Huichao Cai" <chcchc88@163.com>
Cc: <dev@dpdk.org>, "Stephen Hemminger" <stephen@networkplumber.org>,
	"Olivier Matz" <olivier.matz@6wind.com>,
	"Yuying Zhang" <Yuying.Zhang@intel.com>,
	"Beilei Xing" <beilei.xing@intel.com>,
	"Matan Azrad" <matan@nvidia.com>,
	"Viacheslav Ovsiienko" <viacheslavo@nvidia.com>
Subject: RE: [PATCH v3] ip_frag: add IPv4 fragment copy packet API
Date: Sun, 24 Jul 2022 00:27:29 +0200
Message-ID: <98CBD80474FA8B44BF855DF32C47DC35D871E4@smartserver.smartshare.dk>
In-Reply-To: <b85f94a3-63bb-3ff9-c57c-00700761df35@yandex.ru>

> From: Konstantin Ananyev [mailto:konstantin.v.ananyev@yandex.ru]
> Sent: Saturday, 23 July 2022 20.25
> 
> 23/07/2022 09:24, Morten Brørup wrote:
> > +CC: i40e maintainers
> > +CC: mlx5 maintainers
> >
> >> From: Konstantin Ananyev [mailto:konstantin.v.ananyev@yandex.ru]
> >> Sent: Saturday, 23 July 2022 00.35
> >>
> >> 22/07/2022 17:14, Morten Brørup wrote:
> >>> From: Huichao Cai [mailto:chcchc88@163.com]
> >>> Sent: Friday, 22 July 2022 17.59
> >>>
> >>>> At 2022-07-22 23:52:28, "Morten Brørup" <mb@smartsharesystems.com> wrote:
> >>>>>> From: Stephen Hemminger [mailto:stephen@networkplumber.org]
> >>>>>> Sent: Friday, 22 July 2022 16.49
> >>>>>>
> >>>>>> On Fri, 22 Jul 2022 21:01:50 +0800
> >>>>>> Huichao Cai <chcchc88@163.com> wrote:
> >>>>>>
> >>>>>>> Some NIC drivers support MBUF_FAST_FREE(Device supports optimization
> >>>>>>> for fast release of mbufs. When set application must guarantee that
> >>>>>>> per-queue all mbufs comes from the same mempool and has refcnt = 1)
> >>>>>>> offload. In order to adapt to this offload function, add this API.
> >>>>>>> Add some test data for this API.
> >>>>>>>
> >>>>>>> Signed-off-by: Huichao Cai <chcchc88@163.com>
> >>>>>>
> >>>>>> The code should just be checking that refcnt == 1 directly.
> >>>>>>
> >>>>>> There are cases where sender passes a cloned mbuf.  This is independent
> >>>>>> of the fast free optimization.
> >>>>>>
> >>>>>> Similar to what Linux kernel does with skb_cow().
> >>>>>
> >>>>> Olivier just confirmed that MBUF_FAST_FREE requires that the mbufs
> >>>>> are direct and non-segmented, although these requirements are not
> >>>>> yet documented.
> >>>>>
> >>>>> This means that you should not generate segmented mbufs with this
> >>>>> patch. I don't know what to do instead; probably fail with an
> >>>>> appropriate errno.
> >>>>
> >>>> When the bnxt driver sends mbuf, it will take the mbuf segments
> >>>> apart and hang it to the tx_buf_ring, so there is no mbuf segments
> >>>> when it is released. Does this mean that there can be mbuf segments?
> >>>
> >>> Only if the bnxt driver also resets the segmentation fields (nb_segs
> >>> and next) in those mbufs, which I suppose it does, if it supports
> >>> MBUF_FAST_FREE with segmented packets.
> >>>
> >>> However, other Ethernet drivers don't do that, so a generic library
> >>> function cannot rely on it. These missing requirements for
> >>> MBUF_FAST_FREE is a bug, either in the MBUF_FAST_FREE documentation,
> >>> or in the drivers where MBUF_FAST_FREE only works correctly with
> >>> direct and non-segmented mbufs.
> >>>
> >>
> >> I believe multi-segment packets work ok with MBUF_FAST_FREE
> >> (as long as other requirements are met).
> >
> > Looking at the i40e and mlx5 drivers, they both seem to call
> > rte_mempool_put_bulk() without first calling rte_pktmbuf_prefree_seg().
> > So segmented packets freed with MBUF_FAST_FREE, will be stored in the
> > mbuf pool without m->nb_segs and m->next being reset first.
> >
> > I don't have deep knowledge of these drivers, so maybe I have
> > overlooked something.
> >
> > The point of MBUF_FAST_FREE is to bypass a lot of code under certain
> > conditions. So I believe that these two undocumented requirements
> > should remain, so the drivers can bypass this code. Otherwise, don't
> > use MBUF_FAST_FREE.
> >
> 
> Actually, after another look, I think you and Olivier are right -
> multi-seg packets should not be used together with MBUF_FAST_FREE.
> I forgot that mbuf_prefree() is responsible to reset both 'next'
> and 'nb_segs' fields of the mbuf.
> It might keep working for some simple forwarding app (like l3fwd),
> as most PMDs reset these fields at RX path anyway, but that's just a
> coincidence we shouldn't rely on.

I hope the PMDs don't reset these fields in their RX path unless they are creating multi-seg packets and therefore must. Unnecessarily writing m->next, which is in the second cache line of the mbuf, might cause an extra cache miss per packet.

Or perhaps everyone has forgotten about this RX/TX split of the first/second cache line of the mbufs, because all tests are based on run-to-completion, where the second cache line will be written shortly afterwards anyway. :-(
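
To make the stale-field problem concrete, here is a toy model of it. This is not DPDK code: struct toy_mbuf, struct toy_pool and all toy_* functions are hypothetical stand-ins, and the "normal" path only approximates the field reset that the quoted text attributes to the prefree step. It is a sketch of the failure mode, not of any real driver.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Toy stand-in for struct rte_mbuf: only the two fields discussed here. */
struct toy_mbuf {
	struct toy_mbuf *next;
	uint16_t nb_segs;
};

#define TOY_POOL_SIZE 8

/* Toy stand-in for a mempool: a LIFO stack of free mbufs. */
struct toy_pool {
	struct toy_mbuf *objs[TOY_POOL_SIZE];
	unsigned int count;
};

static void toy_pool_put(struct toy_pool *p, struct toy_mbuf *m)
{
	assert(p->count < TOY_POOL_SIZE);
	p->objs[p->count++] = m;
}

static struct toy_mbuf *toy_pool_get(struct toy_pool *p)
{
	assert(p->count > 0);
	return p->objs[--p->count];
}

/* Normal free path: reset the chain fields of each segment before
 * returning it to the pool. */
static void toy_free_normal(struct toy_pool *p, struct toy_mbuf *m)
{
	while (m != NULL) {
		struct toy_mbuf *next = m->next;

		m->next = NULL;
		m->nb_segs = 1;
		toy_pool_put(p, m);
		m = next;
	}
}

/* Fast-free path: return each segment to the pool untouched. */
static void toy_free_fast(struct toy_pool *p, struct toy_mbuf *m)
{
	while (m != NULL) {
		struct toy_mbuf *next = m->next;

		toy_pool_put(p, m);
		m = next;
	}
}

/* Free a two-segment chain, then re-allocate both mbufs and return 1 if
 * either of them still carries stale chain fields, 0 otherwise. */
static int toy_stale_after_free(int use_fast_free)
{
	struct toy_pool pool = { .count = 0 };
	struct toy_mbuf seg1 = { .next = NULL, .nb_segs = 1 };
	struct toy_mbuf seg0 = { .next = &seg1, .nb_segs = 2 };
	struct toy_mbuf *a, *b;

	if (use_fast_free)
		toy_free_fast(&pool, &seg0);
	else
		toy_free_normal(&pool, &seg0);

	a = toy_pool_get(&pool);
	b = toy_pool_get(&pool);
	return a->next != NULL || a->nb_segs != 1 ||
	       b->next != NULL || b->nb_segs != 1;
}
```

In this model, toy_stale_after_free(1) reports stale fields and toy_stale_after_free(0) does not. An application that allocates such an mbuf and assumes it is a clean single-segment packet only gets away with it if something else, such as the RX path, happens to rewrite the fields first.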

> We probably need to update l3fwd (and other examples) to dis-allow
> MBUF_FAST_FREE when TX_OFFLOAD_MULTI_SEGS is selected.

+1
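
As a rough sketch of what such a check could look like in an example application: the TOY_* flags and toy_tx_offloads() below are hypothetical stand-ins with arbitrary bit positions, not the real ethdev definitions, so that the snippet compiles on its own. Real code would use the offload flags from rte_ethdev.h and the device's reported TX offload capabilities.

```c
#include <stdint.h>

/* Hypothetical stand-ins for the TX offload flags; the bit positions
 * are arbitrary and chosen only for this illustration. */
#define TOY_TX_OFFLOAD_MULTI_SEGS     (UINT64_C(1) << 0)
#define TOY_TX_OFFLOAD_MBUF_FAST_FREE (UINT64_C(1) << 1)

/* Decide the TX offloads for a port: enable fast free only when the
 * device supports it and multi-segment TX is not selected; otherwise
 * make sure it is cleared. All other requested bits pass through. */
static uint64_t toy_tx_offloads(uint64_t configured, uint64_t capa)
{
	uint64_t offloads = configured;

	if ((capa & TOY_TX_OFFLOAD_MBUF_FAST_FREE) &&
	    !(offloads & TOY_TX_OFFLOAD_MULTI_SEGS))
		offloads |= TOY_TX_OFFLOAD_MBUF_FAST_FREE;
	else
		offloads &= ~TOY_TX_OFFLOAD_MBUF_FAST_FREE;

	return offloads;
}
```

With this shape, a port configured for multi-segment TX never ends up with fast free enabled, even if the device advertises it or the caller requested both.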

> 
> Konstantin

