DPDK patches and discussions
From: "Morten Brørup" <mb@smartsharesystems.com>
To: "Olivier Matz" <olivier.matz@6wind.com>
Cc: "Ananyev, Konstantin" <konstantin.ananyev@intel.com>,
	"Andrew Rybchenko" <andrew.rybchenko@oktetlabs.ru>,
	<dev@dpdk.org>
Subject: Re: [dpdk-dev] [PATCH] mbuf: fix reset on mbuf free
Date: Fri, 6 Nov 2020 11:07:43 +0100
Message-ID: <98CBD80474FA8B44BF855DF32C47DC35C613FF@smartserver.smartshare.dk>
In-Reply-To: <20201106100437.GA1898@platinum>

> From: Olivier Matz [mailto:olivier.matz@6wind.com]
> Sent: Friday, November 6, 2020 11:05 AM
> 
> On Fri, Nov 06, 2020 at 09:50:45AM +0100, Morten Brørup wrote:
> > > From: Olivier Matz [mailto:olivier.matz@6wind.com]
> > > Sent: Friday, November 6, 2020 9:21 AM
> > >
> > > On Fri, Nov 06, 2020 at 08:52:58AM +0100, Morten Brørup wrote:
> > > > > From: Ananyev, Konstantin [mailto:konstantin.ananyev@intel.com]
> > > > > Sent: Friday, November 6, 2020 12:55 AM
> > > > >
> > > > > > > > > > > > > >> Hi Olivier,
> > > > > > > > > > > > > >>
> > > > > > > > > > > > > >>> m->nb_segs must be reset on mbuf free whatever the value of m->next,
> > > > > > > > > > > > > >>> because it can happen that m->nb_segs is != 1. For instance in this
> > > > > > > > > > > > > >>> case:
> > > > > > > > > > > > > >>>
> > > > > > > > > > > > > >>>   m1 = rte_pktmbuf_alloc(mp);
> > > > > > > > > > > > > >>>   rte_pktmbuf_append(m1, 500);
> > > > > > > > > > > > > >>>   m2 = rte_pktmbuf_alloc(mp);
> > > > > > > > > > > > > >>>   rte_pktmbuf_append(m2, 500);
> > > > > > > > > > > > > >>>   rte_pktmbuf_chain(m1, m2);
> > > > > > > > > > > > > >>>   m0 = rte_pktmbuf_alloc(mp);
> > > > > > > > > > > > > >>>   rte_pktmbuf_append(m0, 500);
> > > > > > > > > > > > > >>>   rte_pktmbuf_chain(m0, m1);
> > > > > > > > > > > > > >>>
> > > > > > > > > > > > > >>> As rte_pktmbuf_chain() does not reset nb_segs in the initial m1
> > > > > > > > > > > > > >>> segment (this is not required), after this code the mbuf chain
> > > > > > > > > > > > > >>> has 3 segments:
> > > > > > > > > > > > > >>>   - m0: next=m1, nb_seg=3
> > > > > > > > > > > > > >>>   - m1: next=m2, nb_seg=2
> > > > > > > > > > > > > >>>   - m2: next=NULL, nb_seg=1
> > > > > > > > > > > > > >>>
> > > > > > > > > > > > > >>> Freeing this mbuf chain will not restore nb_segs=1 in the second
> > > > > > > > > > > > > >>> segment.
> > > > > > > > > > > > > >>
> > > > > > > > > > > > > >> Hmm, not sure why is that?
> > > > > > > > > > > > > >> You are talking about freeing m1, right?
> > > > > > > > > > > > > >> rte_pktmbuf_prefree_seg(struct rte_mbuf *m)
> > > > > > > > > > > > > >> {
> > > > > > > > > > > > > >> 	...
> > > > > > > > > > > > > >> 	if (m->next != NULL) {
> > > > > > > > > > > > > >> 		m->next = NULL;
> > > > > > > > > > > > > >> 		m->nb_segs = 1;
> > > > > > > > > > > > > >> 	}
> > > > > > > > > > > > > >>
> > > > > > > > > > > > > >> m1->next != NULL, so it will enter the if() block,
> > > > > > > > > > > > > >> and will reset both next and nb_segs.
> > > > > > > > > > > > > >> What am I missing here?
> > > > > > > > > > > > > >> Thinking in more generic way, that change:
> > > > > > > > > > > > > >>  -		if (m->next != NULL) {
> > > > > > > > > > > > > >>  -			m->next = NULL;
> > > > > > > > > > > > > >>  -			m->nb_segs = 1;
> > > > > > > > > > > > > >>  -		}
> > > > > > > > > > > > > >>  +		m->next = NULL;
> > > > > > > > > > > > > >>  +		m->nb_segs = 1;
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > Ah, sorry. I oversimplified the example and now it does not
> > > > > > > > > > > > > > show the issue...
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > The full example also adds a split() to break the mbuf chain
> > > > > > > > > > > > > > between m1 and m2. The kind of thing that would be done for
> > > > > > > > > > > > > > software TCP segmentation.
> > > > > > > > > > > > > >
> > > > > > > > > > > > >
> > > > > > > > > > > > > If so, may be the right solution is to care about nb_segs
> > > > > > > > > > > > > when next is set to NULL on split? Any place when next is set
> > > > > > > > > > > > > to NULL. Just to keep the optimization in a more generic place.
> > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > > > > The problem with that approach is that there are already several
> > > > > > > > > > > > existing split() or trim() implementations in different DPDK-based
> > > > > > > > > > > > applications. For instance, we have some in 6WINDGate. If we force
> > > > > > > > > > > > applications to set nb_segs to 1 when resetting next, it has to be
> > > > > > > > > > > > documented because it is not straightforward.
> > > > > > > > > > >
> > > > > > > > > > > I think it is better to go that way.
> > > > > > > > > > > From my perspective it seems natural to reset nb_segs at the same time
> > > > > > > > > > > we reset next, otherwise inconsistency will occur.
> > > > > > > > > >
> > > > > > > > > > While it is not explicitly stated for nb_segs, to me it was clear that
> > > > > > > > > > nb_segs is only valid in the first segment, like for many fields (port,
> > > > > > > > > > ol_flags, vlan, rss, ...).
> > > > > > > > > >
> > > > > > > > > > If we say that nb_segs has to be valid in any segments, it means that
> > > > > > > > > > chain() or split() will have to update it in all segments, which is not
> > > > > > > > > > efficient.
> > > > > > > > >
> > > > > > > > > Why in all?
> > > > > > > > > We can state that nb_segs on non-first segment should always equal 1.
> > > > > > > > > As I understand in that case, both split() and chain() have to update nb_segs
> > > > > > > > > only for head mbufs, rest ones will remain untouched.
> > > > > > > >
> > > > > > > > Well, anyway, I think it's strange to have a constraint on m->nb_segs for
> > > > > > > > non-first segment. We don't have that kind of constraints for other fields.
> > > > > > >
> > > > > > > True, we don't. But this is one of the fields we consider critical
> > > > > > > for proper work of mbuf alloc/free mechanism.
> > > > > > >
> > > > > >
> > > > > > I am not sure that requiring m->nb_segs == 1 on non-first segments
> > > > > > will provide any benefits.
> > > > >
> > > > > It would make this patch unneeded.
> > > > > So, for direct, non-segmented mbufs pktmbuf_free() will remain write-free.
> > > >
> > > > I see. Then I agree with Konstantin that alternative solutions should
> > > > be considered.
> > > >
> > > > The benefit regarding free()'ing non-segmented mbufs - which is a very
> > > > common operation - certainly outweighs the cost of requiring
> > > > split()/chain() operations to set the new head mbuf's nb_segs = 1.
> > > >
> > > > Nonetheless, the bug needs to be fixed somehow.
> > > >
> > > > If we can't come up with a better solution that doesn't break the ABI,
> > > > we are forced to accept the patch.
> > > >
> > > > Unless the techboard accepts to break the ABI in order to avoid the
> > > > performance cost of this patch.
> > >
> > > Did someone notice a performance drop with this patch?
> > > On my side, I don't see any regression on a L3 use case.
> >
> > I am afraid that the DPDK performance regression tests are based on TX
> > immediately following RX, so cache misses in TX may go by unnoticed
> > because RX warmed up the cache for TX already. And similarly for RX
> > reusing mbufs that have been warmed up by the preceding free() at TX.
> >
> > Please consider testing the performance difference with the mbuf being
> > completely cold at TX, and going completely cold again before being
> > reused for RX.
> >
> > >
> > > Let's summarize: splitting a mbuf chain and freeing it causes subsequent
> > > mbuf allocation to return a mbuf which is not correctly initialized.
> > > There are 2 options to fix it:
> > >
> > > 1/ change the mbuf free function (this patch)
> > >
> > >    - m->nb_segs would behave like many other fields: valid in the first
> > >      segment, ignored in other segments
> > >    - may impact performance (suspected)
> > >
> > > 2/ change all places where a mbuf chain is split, or trimmed
> > >
> > >    - m->nb_segs would have a specific behavior: count the number of
> > >      segments in the first mbuf, should be 1 in the last segment,
> > >      ignored in other ones.
> > >    - no code change in mbuf library, so no performance impact
> > >    - need to patch all places where we do a mbuf split or trim. From afar,
> > >      I see at least mbuf_cut_seg_ofs() in DPDK. Some external applications
> > >      may have to be patched (for instance, I already found 3 places in
> > >      6WIND code base without a deep search).
> > >
> > > In my opinion, 1/ is better, unless we notice a significant performance
> > > drop, because the (implicit) behavior is unchanged.
> > >
> > > Whatever the solution, some documentation has to be added.
> > >
> > > Olivier
> > >
> >
> > Unfortunately, I don't think that anything but the first option will go
> > into 20.11 and stable releases of older versions, so I stand by my
> > acknowledgment of the patch.
> 
> If we are afraid about 20.11 performance (it is legitimate, a few days
> before the release), we can target 21.02. After all, everybody lives
> with this bug since 2017, so there is no urgency. If accepted and well
> tested, it can be backported in stable branches.

+1

Good thinking, Olivier!
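
The failure discussed in the quoted thread (chain, then split, then free,
then reuse) can be illustrated with a small toy model. This is only a
sketch under stated assumptions: struct toy_mbuf, the LIFO toy_pool and
every toy_* helper are hypothetical stand-ins and not DPDK APIs. Only the
conditional reset in toy_free_seg() mirrors the rte_pktmbuf_prefree_seg()
excerpt quoted above, and the split is assumed to be an application-level
one that updates the head mbuf only.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for struct rte_mbuf: only the two fields at issue. */
struct toy_mbuf {
	struct toy_mbuf *next;
	uint16_t nb_segs;
};

/* Hypothetical LIFO pool. As assumed in the thread, the get path trusts
 * that freed buffers already have next/nb_segs reset. */
struct toy_pool {
	struct toy_mbuf *slot[8];
	size_t count;
};

static void toy_pool_put(struct toy_pool *p, struct toy_mbuf *m)
{
	p->slot[p->count++] = m;
}

static struct toy_mbuf *toy_pool_get(struct toy_pool *p)
{
	return p->count > 0 ? p->slot[--p->count] : NULL;
}

/* Append tail to head: only the head's nb_segs is updated. */
static void toy_chain(struct toy_mbuf *head, struct toy_mbuf *tail)
{
	struct toy_mbuf *last = head;

	while (last->next != NULL)
		last = last->next;
	last->next = tail;
	head->nb_segs = (uint16_t)(head->nb_segs + tail->nb_segs);
}

/* Conditional reset, as in the pre-patch excerpt quoted in the thread. */
static void toy_free_seg(struct toy_pool *p, struct toy_mbuf *m)
{
	if (m->next != NULL) {
		m->next = NULL;
		m->nb_segs = 1;
	}
	toy_pool_put(p, m);
}

static void toy_free_chain(struct toy_pool *p, struct toy_mbuf *m)
{
	while (m != NULL) {
		struct toy_mbuf *next = m->next; /* read before the reset */

		toy_free_seg(p, m);
		m = next;
	}
}

/* Returns the nb_segs value seen on the first buffer reused after the free. */
static unsigned int toy_demo_stale_nb_segs(void)
{
	struct toy_pool pool = { .count = 0 };
	struct toy_mbuf m0 = { NULL, 1 }, m1 = { NULL, 1 }, m2 = { NULL, 1 };
	struct toy_mbuf *reused;

	toy_chain(&m1, &m2);	/* m1: next=m2, nb_segs=2 */
	toy_chain(&m0, &m1);	/* m0: next=m1, nb_segs=3 */

	/* Application-level split between m1 and m2 that only fixes the head. */
	m1.next = NULL;
	m0.nb_segs = 2;

	toy_free_chain(&pool, &m0);	/* m1 has next == NULL: reset is skipped */

	reused = toy_pool_get(&pool);	/* LIFO: hands back m1 */
	assert(reused == &m1);
	return reused->nb_segs;
}
```

In this model toy_demo_stale_nb_segs() returns 2: m1 goes back to the pool
with next == NULL, so the conditional reset never touches its nb_segs, and
the next user of that buffer sees a single segment claiming two.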

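The two options Olivier lists (1/ reset on free, 2/ fix nb_segs wherever a
chain is split) can be compared on the same toy model. Again a sketch, not
DPDK code: all toy_* names are hypothetical, toy_prefree_unconditional()
follows the "+" lines of the diff quoted above, and toy_split_after() is
just one possible shape for a split helper that honors the option 2/ rule
(nb_segs == 1 in the last segment).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for struct rte_mbuf. */
struct toy_mbuf {
	struct toy_mbuf *next;
	uint16_t nb_segs;
};

/* Pre-patch behavior: fields are written only when next is set. */
static void toy_prefree_conditional(struct toy_mbuf *m)
{
	if (m->next != NULL) {
		m->next = NULL;
		m->nb_segs = 1;
	}
}

/* Option 1/: always reset on free, at the cost of two stores per segment. */
static void toy_prefree_unconditional(struct toy_mbuf *m)
{
	m->next = NULL;
	m->nb_segs = 1;
}

/* Option 2/: hypothetical split helper. Cuts the chain after `last`, keeps
 * the head's count correct and leaves nb_segs == 1 in the new last segment.
 * Returns the detached tail (or NULL if there was none). */
static struct toy_mbuf *toy_split_after(struct toy_mbuf *head,
					struct toy_mbuf *last)
{
	struct toy_mbuf *tail = last->next;
	uint16_t kept = 1;

	for (struct toy_mbuf *s = head; s != last; s = s->next)
		kept++;
	if (tail != NULL)
		tail->nb_segs = (uint16_t)(head->nb_segs - kept);
	last->next = NULL;
	last->nb_segs = 1;	/* the extra store that option 2/ requires */
	head->nb_segs = kept;
	return tail;
}

/* nb_segs left in m1 after a head-only split followed by an option 1/ free. */
static unsigned int toy_demo_option1(void)
{
	struct toy_mbuf m2 = { NULL, 1 }, m1 = { &m2, 2 }, m0 = { &m1, 3 };

	assert(m0.next == &m1);
	m1.next = NULL;		/* naive split: m1 keeps nb_segs == 2 */
	m0.nb_segs = 2;
	toy_prefree_unconditional(&m1);
	return m1.nb_segs;
}

/* nb_segs left in m1 after an option 2/ split and the pre-patch free. */
static unsigned int toy_demo_option2(void)
{
	struct toy_mbuf m2 = { NULL, 1 }, m1 = { &m2, 2 }, m0 = { &m1, 3 };
	struct toy_mbuf *tail = toy_split_after(&m0, &m1);

	assert(tail == &m2 && m0.nb_segs == 2);
	toy_prefree_conditional(&m1);
	return m1.nb_segs;
}
```

Both demo functions leave m1 with nb_segs == 1. The difference is where the
store happens: on every free in option 1/, or only in split/trim code in
option 2/, which is the trade-off debated in the thread.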
