From: "Ananyev, Konstantin" <konstantin.ananyev@intel.com>
To: "Andrew Rybchenko" <andrew.rybchenko@oktetlabs.ru>,
	"Morten Brørup" <mb@smartsharesystems.com>,
	"Olivier Matz" <olivier.matz@6wind.com>
Cc: "dev@dpdk.org" <dev@dpdk.org>
Subject: Re: [dpdk-dev] [PATCH] mbuf: fix reset on mbuf free
Date: Sun, 8 Nov 2020 14:19:55 +0000
Message-ID: <DM6PR11MB330811402FF23318DE14D2CE9AEB0@DM6PR11MB3308.namprd11.prod.outlook.com>
In-Reply-To: <c2b4adb2-0a0a-7aa2-d89b-fc51b438b446@oktetlabs.ru>
> >>
> >>>>>>>>>>>>>>>>>> Hi Olivier,
> >>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>> m->nb_segs must be reset on mbuf free whatever the value of m->next,
> >>>>>>>>>>>>>>>>>>> because it can happen that m->nb_segs is != 1. For instance in this
> >>>>>>>>>>>>>>>>>>> case:
> >>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>> m1 = rte_pktmbuf_alloc(mp);
> >>>>>>>>>>>>>>>>>>> rte_pktmbuf_append(m1, 500);
> >>>>>>>>>>>>>>>>>>> m2 = rte_pktmbuf_alloc(mp);
> >>>>>>>>>>>>>>>>>>> rte_pktmbuf_append(m2, 500);
> >>>>>>>>>>>>>>>>>>> rte_pktmbuf_chain(m1, m2);
> >>>>>>>>>>>>>>>>>>> m0 = rte_pktmbuf_alloc(mp);
> >>>>>>>>>>>>>>>>>>> rte_pktmbuf_append(m0, 500);
> >>>>>>>>>>>>>>>>>>> rte_pktmbuf_chain(m0, m1);
> >>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>> As rte_pktmbuf_chain() does not reset nb_segs in the initial m1
> >>>>>>>>>>>>>>>>>>> segment (this is not required), after this code the mbuf chain
> >>>>>>>>>>>>>>>>>>> has 3 segments:
> >>>>>>>>>>>>>>>>>>> - m0: next=m1, nb_segs=3
> >>>>>>>>>>>>>>>>>>> - m1: next=m2, nb_segs=2
> >>>>>>>>>>>>>>>>>>> - m2: next=NULL, nb_segs=1
> >>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>> Freeing this mbuf chain will not restore nb_segs=1 in the second
> >>>>>>>>>>>>>>>>>>> segment.
> >>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>> Hmm, not sure why is that?
> >>>>>>>>>>>>>>>>>> You are talking about freeing m1, right?
> >>>>>>>>>>>>>>>>>> rte_pktmbuf_prefree_seg(struct rte_mbuf *m)
> >>>>>>>>>>>>>>>>>> {
> >>>>>>>>>>>>>>>>>>     ...
> >>>>>>>>>>>>>>>>>>     if (m->next != NULL) {
> >>>>>>>>>>>>>>>>>>         m->next = NULL;
> >>>>>>>>>>>>>>>>>>         m->nb_segs = 1;
> >>>>>>>>>>>>>>>>>>     }
> >>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>> m1->next != NULL, so it will enter the if() block,
> >>>>>>>>>>>>>>>>>> and will reset both next and nb_segs.
> >>>>>>>>>>>>>>>>>> What am I missing here?
> >>>>>>>>>>>>>>>>>> Thinking in a more generic way, that change:
> >>>>>>>>>>>>>>>>>> -    if (m->next != NULL) {
> >>>>>>>>>>>>>>>>>> -        m->next = NULL;
> >>>>>>>>>>>>>>>>>> -        m->nb_segs = 1;
> >>>>>>>>>>>>>>>>>> -    }
> >>>>>>>>>>>>>>>>>> +    m->next = NULL;
> >>>>>>>>>>>>>>>>>> +    m->nb_segs = 1;
> >>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>> Ah, sorry. I oversimplified the example and now it does not
> >>>>>>>>>>>>>>>>> show the issue...
> >>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>> The full example also adds a split() to break the mbuf chain
> >>>>>>>>>>>>>>>>> between m1 and m2. The kind of thing that would be done for
> >>>>>>>>>>>>>>>>> software TCP segmentation.
> >>>>>>>>>>>>>>>>>
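The chain-then-split case described above can be modeled outside DPDK. This is only a minimal sketch: mock_mbuf is a hypothetical stand-in for struct rte_mbuf that keeps just next and nb_segs, mock_chain() is a simplified model of rte_pktmbuf_chain(), and the split is an assumed application-side cut that fixes up the head only. It is meant to illustrate why the existing "if (m->next != NULL)" reset can leave a stale nb_segs behind, not to reproduce the library code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for struct rte_mbuf: only the two fields discussed. */
struct mock_mbuf {
	struct mock_mbuf *next;
	uint16_t nb_segs;
};

/* Simplified model of rte_pktmbuf_chain(): only the head's nb_segs is updated. */
static void mock_chain(struct mock_mbuf *head, struct mock_mbuf *tail)
{
	struct mock_mbuf *last = head;

	while (last->next != NULL)
		last = last->next;
	last->next = tail;
	head->nb_segs = (uint16_t)(head->nb_segs + tail->nb_segs);
}

/* Model of the existing free-path reset: writes happen only when next != NULL. */
static void mock_prefree_current(struct mock_mbuf *m)
{
	if (m->next != NULL) {
		m->next = NULL;
		m->nb_segs = 1;
	}
}

/* Returns the nb_segs value left in m1 after chain, split and free. */
static uint16_t stale_nb_segs_after_split(void)
{
	struct mock_mbuf m0 = { NULL, 1 }, m1 = { NULL, 1 }, m2 = { NULL, 1 };

	mock_chain(&m1, &m2);	/* m1: next=m2, nb_segs=2 */
	mock_chain(&m0, &m1);	/* m0: next=m1, nb_segs=3; m1 keeps nb_segs=2 */

	/* Assumed application split between m1 and m2: only the head is fixed up. */
	m1.next = NULL;
	m0.nb_segs = 2;

	/* Free the m0 -> m1 chain one segment at a time. */
	mock_prefree_current(&m0);	/* next != NULL: both fields are reset */
	mock_prefree_current(&m1);	/* next == NULL: nb_segs is left untouched */

	assert(m0.nb_segs == 1);
	return m1.nb_segs;
}
```

In this model m1 goes back to the pool with nb_segs still set to 2, which matches the "not correctly initialized" mbuf that a later allocation would return.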
> >>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>> If so, maybe the right solution is to care about nb_segs
> >>>>>>>>>>>>>>>> when next is set to NULL on split? Any place where next is set
> >>>>>>>>>>>>>>>> to NULL. Just to keep the optimization in a more generic place.
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>>> The problem with that approach is that there are already several
> >>>>>>>>>>>>>>> existing split() or trim() implementations in different DPDK-based
> >>>>>>>>>>>>>>> applications. For instance, we have some in 6WINDGate. If we force
> >>>>>>>>>>>>>>> applications to set nb_segs to 1 when resetting next, it has to be
> >>>>>>>>>>>>>>> documented because it is not straightforward.
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> I think it is better to go that way.
> >>>>>>>>>>>>>> From my perspective it seems natural to reset nb_segs at the same time
> >>>>>>>>>>>>>> we reset next, otherwise inconsistency will occur.
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> While it is not explicitly stated for nb_segs, to me it was clear that
> >>>>>>>>>>>>> nb_segs is only valid in the first segment, like for many fields (port,
> >>>>>>>>>>>>> ol_flags, vlan, rss, ...).
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> If we say that nb_segs has to be valid in any segments, it means that
> >>>>>>>>>>>>> chain() or split() will have to update it in all segments, which is not
> >>>>>>>>>>>>> efficient.
> >>>>>>>>>>>>
> >>>>>>>>>>>> Why in all?
> >>>>>>>>>>>> We can state that nb_segs on non-first segment should always equal 1.
> >>>>>>>>>>>> As I understand in that case, both split() and chain() have to
> >>>>>>>>>>>> update nb_segs only for head mbufs, rest ones will remain untouched.
> >>>>>>>>>>>
> >>>>>>>>>>> Well, anyway, I think it's strange to have a constraint on m->nb_segs for
> >>>>>>>>>>> non-first segment. We don't have that kind of constraints for other fields.
> >>>>>>>>>>
> >>>>>>>>>> True, we don't. But this is one of the fields we consider critical
> >>>>>>>>>> for proper work of mbuf alloc/free mechanism.
> >>>>>>>>>>
> >>>>>>>>>
> >>>>>>>>> I am not sure that requiring m->nb_segs == 1 on non-first segments
> >>>>>>>>> will provide any benefits.
> >>>>>>>>
> >>>>>>>> It would make this patch unneeded.
> >>>>>>>> So, for direct, non-segmented mbufs pktmbuf_free() will remain write-free.
> >>>>>>>
> >>>>>>> I see. Then I agree with Konstantin that alternative solutions should
> >>>>>>> be considered.
> >>>>>>>
> >>>>>>> The benefit regarding free()'ing non-segmented mbufs - which is a
> >>>>>>> very common operation - certainly outweighs the cost of requiring
> >>>>>>> split()/chain() operations to set the new head mbuf's nb_segs = 1.
> >>>>>>>
> >>>>>>> Nonetheless, the bug needs to be fixed somehow.
> >>>>>>>
> >>>>>>> If we can't come up with a better solution that doesn't break the
> >>>>>>> ABI, we are forced to accept the patch.
> >>>>>>>
> >>>>>>> Unless the techboard accepts to break the ABI in order to avoid the
> >>>>>>> performance cost of this patch.
> >>>>>>
> >>>>>> Did someone notice a performance drop with this patch?
> >>>>>> On my side, I don't see any regression on a L3 use case.
> >>>>>
> >>>>> I am afraid that the DPDK performance regression tests are based on
> >>>>> TX immediately following RX, so cache misses in TX may go by unnoticed
> >>>>> because RX warmed up the cache for TX already. And similarly for RX
> >>>>> reusing mbufs that have been warmed up by the preceding free() at TX.
> >>>>>
> >>>>> Please consider testing the performance difference with the mbuf
> >>>>> being completely cold at TX, and going completely cold again before
> >>>>> being reused for RX.
> >>>>>
> >>>>>>
> >>>>>> Let's summarize: splitting a mbuf chain and freeing it causes subsequent
> >>>>>> mbuf allocation to return a mbuf which is not correctly initialized.
> >>>>>> There are 2 options to fix it:
> >>>>>>
> >>>>>> 1/ change the mbuf free function (this patch)
> >>>>>>
> >>>>>>    - m->nb_segs would behave like many other fields: valid in the first
> >>>>>>      segment, ignored in other segments
> >>>>>>    - may impact performance (suspected)
> >>>>>>
> >>>>>> 2/ change all places where a mbuf chain is split, or trimmed
> >>>>>>
> >>>>>>    - m->nb_segs would have a specific behavior: count the number of
> >>>>>>      segments in the first mbuf, should be 1 in the last segment,
> >>>>>>      ignored in other ones.
> >>>>>>    - no code change in mbuf library, so no performance impact
> >>>>>>    - need to patch all places where we do a mbuf split or trim. From afar,
> >>>>>>      I see at least mbuf_cut_seg_ofs() in DPDK. Some external applications
> >>>>>>      may have to be patched (for instance, I already found 3 places in
> >>>>>>      6WIND code base without a deep search).
> >>>>>>
> >>>>>> In my opinion, 1/ is better, unless we notice a significant performance
> >>>>>> impact, because the (implicit) behavior is unchanged.
> >>>>>>
> >>>>>> Whatever the solution, some documentation has to be added.
> >>>>>>
> >>>>>> Olivier
> >>>>>>
> >>>>>
> >>>>> Unfortunately, I don't think that anything but the first option will
> >>>>> go into 20.11 and stable releases of older versions, so I stand by my
> >>>>> acknowledgment of the patch.
> >>>>
> >>>> If we are afraid about 20.11 performance (it is legitimate, a few days
> >>>> before the release), we can target 21.02. After all, everybody has lived
> >>>> with this bug since 2017, so there is no urgency. If accepted and well
> >>>> tested, it can be backported to stable branches.
> >>>
> >>> +1
> >>>
> >>> Good thinking, Olivier!
> >>
> >> Looking at the changes once again, it probably can be reworked a bit:
> >>
> >> -    if (m->next != NULL) {
> >> -        m->next = NULL;
> >> -        m->nb_segs = 1;
> >> -    }
> >>
> >> +    if (m->next != NULL)
> >> +        m->next = NULL;
> >> +    if (m->nb_segs != 1)
> >> +        m->nb_segs = 1;
> >>
> >> That way we add one more condition check, but I suppose it
> >> shouldn't be that perf critical.
> >> That way, for a direct, non-segmented mbuf it should still be write-free.
> >> Except in cases like the one you described above: chain(), then split().
> >>
> >> Of course we still need to do perf testing for that approach too.
> >> So if your preference is to postpone it till 21.02 - that's ok for me.
> >> Konstantin
> >
> > With this suggestion, I cannot imagine any performance drop for direct,
> > non-segmented mbufs: It now reads m->nb_segs, residing in the mbuf's first
> > cache line, but the function already reads m->refcnt in the first cache
> > line; so no cache misses are introduced.
>
> +1
I don't expect a perf drop with that approach either.
But some perf testing still needs to be done, just in case 😊
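
For reference, the two-check variant quoted above can be modeled outside DPDK. This is a rough sketch under stated assumptions: mock_mbuf is a hypothetical stand-in for struct rte_mbuf with only next and nb_segs, and the returned store count exists purely for illustration (the real rte_pktmbuf_prefree_seg() returns something else and also handles refcnt, indirect and external buffers). It only shows that the direct, non-segmented case stays write-free while a stale nb_segs is still repaired.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for struct rte_mbuf: only the two fields discussed. */
struct mock_mbuf {
	struct mock_mbuf *next;
	uint16_t nb_segs;
};

/*
 * Model of the reworked reset: each field is tested separately and written
 * only when it differs from its pool default. Returns the number of stores
 * performed (illustration only).
 */
static unsigned int mock_prefree_reworked(struct mock_mbuf *m)
{
	unsigned int writes = 0;

	if (m->next != NULL) {
		m->next = NULL;
		writes++;
	}
	if (m->nb_segs != 1) {
		m->nb_segs = 1;
		writes++;
	}
	return writes;
}
```

In this model a direct mbuf with next == NULL and nb_segs == 1 costs two reads and no stores, a segment left with nb_segs == 2 after chain() then split() costs one store, and a chain head costs two.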