DPDK patches and discussions
 help / color / mirror / Atom feed
From: "Morten Brørup" <mb@smartsharesystems.com>
To: "Ananyev, Konstantin" <konstantin.ananyev@intel.com>,
	"Olivier Matz" <olivier.matz@6wind.com>
Cc: "Andrew Rybchenko" <andrew.rybchenko@oktetlabs.ru>, <dev@dpdk.org>
Subject: Re: [dpdk-dev] [PATCH] mbuf: fix reset on mbuf free
Date: Fri, 6 Nov 2020 08:52:58 +0100	[thread overview]
Message-ID: <98CBD80474FA8B44BF855DF32C47DC35C613F8@smartserver.smartshare.dk> (raw)
In-Reply-To: <DM6PR11MB330887C8A3587FE9D82278779AEE0@DM6PR11MB3308.namprd11.prod.outlook.com>

> From: Ananyev, Konstantin [mailto:konstantin.ananyev@intel.com]
> Sent: Friday, November 6, 2020 12:55 AM
> 
> > > > > > > > > >> Hi Olivier,
> > > > > > > > > >>
> > > > > > > > > >>> m->nb_seg must be reset on mbuf free whatever the
> value
> > > of m->next,
> > > > > > > > > >>> because it can happen that m->nb_seg is != 1. For
> > > instance in this
> > > > > > > > > >>> case:
> > > > > > > > > >>>
> > > > > > > > > >>>   m1 = rte_pktmbuf_alloc(mp);
> > > > > > > > > >>>   rte_pktmbuf_append(m1, 500);
> > > > > > > > > >>>   m2 = rte_pktmbuf_alloc(mp);
> > > > > > > > > >>>   rte_pktmbuf_append(m2, 500);
> > > > > > > > > >>>   rte_pktmbuf_chain(m1, m2);
> > > > > > > > > >>>   m0 = rte_pktmbuf_alloc(mp);
> > > > > > > > > >>>   rte_pktmbuf_append(m0, 500);
> > > > > > > > > >>>   rte_pktmbuf_chain(m0, m1);
> > > > > > > > > >>>
> > > > > > > > > >>> As rte_pktmbuf_chain() does not reset nb_seg in the
> > > initial m1
> > > > > > > > > >>> segment (this is not required), after this code the
> > > mbuf chain
> > > > > > > > > >>> have 3 segments:
> > > > > > > > > >>>   - m0: next=m1, nb_seg=3
> > > > > > > > > >>>   - m1: next=m2, nb_seg=2
> > > > > > > > > >>>   - m2: next=NULL, nb_seg=1
> > > > > > > > > >>>
> > > > > > > > > >>> Freeing this mbuf chain will not restore nb_seg=1
> in
> > > the second
> > > > > > > > > >>> segment.
> > > > > > > > > >>
> > > > > > > > > >> Hmm, not sure why is that?
> > > > > > > > > >> You are talking about freeing m1, right?
> > > > > > > > > >> rte_pktmbuf_prefree_seg(struct rte_mbuf *m)
> > > > > > > > > >> {
> > > > > > > > > >> 	...
> > > > > > > > > >> 	if (m->next != NULL) {
> > > > > > > > > >>                         m->next = NULL;
> > > > > > > > > >>                         m->nb_segs = 1;
> > > > > > > > > >>                 }
> > > > > > > > > >>
> > > > > > > > > >> m1->next != NULL, so it will enter the if() block,
> > > > > > > > > >> and will reset both next and nb_segs.
> > > > > > > > > >> What I am missing here?
> > > > > > > > > >> Thinking in more generic way, that change:
> > > > > > > > > >>  -		if (m->next != NULL) {
> > > > > > > > > >>  -			m->next = NULL;
> > > > > > > > > >>  -			m->nb_segs = 1;
> > > > > > > > > >>  -		}
> > > > > > > > > >>  +		m->next = NULL;
> > > > > > > > > >>  +		m->nb_segs = 1;
> > > > > > > > > >
> > > > > > > > > > Ah, sorry. I oversimplified the example and now it
> does
> > > not
> > > > > > > > > > show the issue...
> > > > > > > > > >
> > > > > > > > > > The full example also adds a split() to break the
> mbuf
> > > chain
> > > > > > > > > > between m1 and m2. The kind of thing that would be
> done
> > > for
> > > > > > > > > > software TCP segmentation.
> > > > > > > > > >
> > > > > > > > >
> > > > > > > > > If so, may be the right solution is to care about
> nb_segs
> > > > > > > > > when next is set to NULL on split? Any place when next
> is
> > > set
> > > > > > > > > to NULL. Just to keep the optimization in a more
> generic
> > > place.
> > > > > > >
> > > > > > >
> > > > > > > > The problem with that approach is that there are already
> > > several
> > > > > > > > existing split() or trim() implementations in different
> DPDK-
> > > based
> > > > > > > > applications. For instance, we have some in 6WINDGate. If
> we
> > > force
> > > > > > > > applications to set nb_seg to 1 when resetting next, it
> has
> > > to be
> > > > > > > > documented because it is not straightforward.
> > > > > > >
> > > > > > > I think it is better to go that way.
> > > > > > > From my perspective it seems natural to reset nb_seg at
> same
> > > time
> > > > > > > we reset next, otherwise inconsistency will occur.
> > > > > >
> > > > > > While it is not explicitly stated for nb_segs, to me it was
> clear
> > > that
> > > > > > nb_segs is only valid in the first segment, like for many
> fields
> > > (port,
> > > > > > ol_flags, vlan, rss, ...).
> > > > > >
> > > > > > If we say that nb_segs has to be valid in any segments, it
> means
> > > that
> > > > > > chain() or split() will have to update it in all segments,
> which
> > > is not
> > > > > > efficient.
> > > > >
> > > > > Why in all?
> > > > > We can state that nb_segs on non-first segment should always
> equal
> > > 1.
> > > > > As I understand in that case, both split() and chain() have to
> > > update nb_segs
> > > > > only for head mbufs, rest ones will remain untouched.
> > > >
> > > > Well, anyway, I think it's strange to have a constraint on m-
> >nb_segs
> > > for
> > > > non-first segment. We don't have that kind of constraints for
> other
> > > fields.
> > >
> > > True, we don't. But this is one of the fields we consider critical
> > > for proper work of mbuf alloc/free mechanism.
> > >
> >
> > I am not sure that requiring m->nb_segs == 1 on non-first segments
> will provide any benefits.
> 
> It would make this patch unneeded.
> So, for direct, non-segmented mbufs  pktmbuf_free() will remain write-
> free.

I see. Then I agree with Konstantin that alternative solutions should be considered.

The benefit regarding free()'ing non-segmented mbufs - which is a very common operation - certainly outweighs the cost of requiring split()/chain() operations to set the new head mbuf's nb_segs = 1.

Nonetheless, the bug needs to be fixed somehow.

If we can't come up with a better solution that doesn't break the ABI, we are forced to accept the patch.

Unless the techboard accepts to break the ABI in order to avoid the performance cost of this patch.

> 
> >
> > E.g. the second segment of a three-segment chain will still have m-
> >next != NULL, so it cannot be used as a gate to prevent accessing m-
> > >next.
> >
> > I might have overlooked something, though.
> >
> > > >
> > > >
> > > > >
> > > > > >
> > > > > > Saying that nb_segs has to be valid for the first and last
> > > segment seems
> > > > > > really odd to me. What is the logic behind that except
> keeping
> > > this test
> > > > > > as is?
> > > > > >
> > > > > > In any case, it has to be better documented.
> > > > > >
> > > > > >
> > > > > > Olivier
> > > > > >
> > > > > >
> > > > > > > > I think the approach from
> > > > > > > > this patch is safer.
> > > > > > >
> > > > > > > It might be easier from perspective that changes in less
> places
> > > are required,
> > > > > > > Though I think that this patch will introduce some
> performance
> > > drop.
> > > > > > > As now each mbuf_prefree_seg() will cause update of 2 cache
> > > lines unconditionally.
> > > > > > >
> > > > > > > > By the way, for 21.11, if we are able to do some
> > > optimizations and have
> > > > > > > > both pool (index?) and next in the first cache line, we
> may
> > > reconsider
> > > > > > > > the fact that next and nb_segs are already set for new
> > > allocated mbufs,
> > > > > > > > because it is not straightforward either.
> > > > > > >
> > > > > > > My suggestion - let's put future optimization discussion
> aside
> > > for now,
> > > > > > > and concentrate on that particular patch.
> > > > > > >
> > > > > > > >
> > > > > > > > > > After this operation, we have 2 mbuf chain:
> > > > > > > > > >  - m0 with 2 segments, the last one has next=NULL but
> > > nb_seg=2
> > > > > > > > > >  - new_m with 1 segment
> > > > > > > > > >
> > > > > > > > > > Freeing m0 will not restore nb_seg=1 in the second
> > > segment.
> > > > > > > > > >
> > > > > > > > > >> Assumes that it is ok to have an mbuf with
> > > > > > > > > >> nb_seg > 1 and next == NULL.
> > > > > > > > > >> Which seems wrong to me.
> > > > > > > > > >
> > > > > > > > > > I don't think it is wrong: nb_seg is just ignored
> when
> > > not in the first
> > > > > > > > > > segment, and there is nothing saying it should be set
> to
> > > 1. Typically,
> > > > > > > > > > rte_pktmbuf_chain() does not change it, and I guess
> it's
> > > the same for
> > > > > > > > > > many similar functions in applications.
> > > > > > > > > >
> > > > > > > > > > Olivier
> > > > > > > > > >
> > > > > > > > > >>
> > > > > > > > > >>
> > > > > > > > > >>> This is expected that mbufs stored in pool have
> their
> > > > > > > > > >>> nb_seg field set to 1.
> > > > > > > > > >>>
> > > > > > > > > >>> Fixes: 8f094a9ac5d7 ("mbuf: set mbuf fields while
> in
> > > pool")
> > > > > > > > > >>> Cc: stable@dpdk.org
> > > > > > > > > >>>
> > > > > > > > > >>> Signed-off-by: Olivier Matz
> <olivier.matz@6wind.com>
> > > > > > > > > >>> ---
> > > > > > > > > >>>  lib/librte_mbuf/rte_mbuf.c |  6 ++----
> > > > > > > > > >>>  lib/librte_mbuf/rte_mbuf.h | 12 ++++--------
> > > > > > > > > >>>  2 files changed, 6 insertions(+), 12 deletions(-)
> > > > > > > > > >>>
> > > > > > > > > >>> diff --git a/lib/librte_mbuf/rte_mbuf.c
> > > b/lib/librte_mbuf/rte_mbuf.c
> > > > > > > > > >>> index 8a456e5e64..e632071c23 100644
> > > > > > > > > >>> --- a/lib/librte_mbuf/rte_mbuf.c
> > > > > > > > > >>> +++ b/lib/librte_mbuf/rte_mbuf.c
> > > > > > > > > >>> @@ -129,10 +129,8 @@
> > > rte_pktmbuf_free_pinned_extmem(void *addr, void *opaque)
> > > > > > > > > >>>
> > > > > > > > > >>>  	rte_mbuf_ext_refcnt_set(m->shinfo, 1);
> > > > > > > > > >>>  	m->ol_flags = EXT_ATTACHED_MBUF;
> > > > > > > > > >>> -	if (m->next != NULL) {
> > > > > > > > > >>> -		m->next = NULL;
> > > > > > > > > >>> -		m->nb_segs = 1;
> > > > > > > > > >>> -	}
> > > > > > > > > >>> +	m->next = NULL;
> > > > > > > > > >>> +	m->nb_segs = 1;
> > > > > > > > > >>>  	rte_mbuf_raw_free(m);
> > > > > > > > > >>>  }
> > > > > > > > > >>>
> > > > > > > > > >>> diff --git a/lib/librte_mbuf/rte_mbuf.h
> > > b/lib/librte_mbuf/rte_mbuf.h
> > > > > > > > > >>> index a1414ed7cd..ef5800c8ef 100644
> > > > > > > > > >>> --- a/lib/librte_mbuf/rte_mbuf.h
> > > > > > > > > >>> +++ b/lib/librte_mbuf/rte_mbuf.h
> > > > > > > > > >>> @@ -1329,10 +1329,8 @@
> rte_pktmbuf_prefree_seg(struct
> > > rte_mbuf *m)
> > > > > > > > > >>>  				return NULL;
> > > > > > > > > >>>  		}
> > > > > > > > > >>>
> > > > > > > > > >>> -		if (m->next != NULL) {
> > > > > > > > > >>> -			m->next = NULL;
> > > > > > > > > >>> -			m->nb_segs = 1;
> > > > > > > > > >>> -		}
> > > > > > > > > >>> +		m->next = NULL;
> > > > > > > > > >>> +		m->nb_segs = 1;
> > > > > > > > > >>>
> > > > > > > > > >>>  		return m;
> > > > > > > > > >>>
> > > > > > > > > >>> @@ -1346,10 +1344,8 @@
> rte_pktmbuf_prefree_seg(struct
> > > rte_mbuf *m)
> > > > > > > > > >>>  				return NULL;
> > > > > > > > > >>>  		}
> > > > > > > > > >>>
> > > > > > > > > >>> -		if (m->next != NULL) {
> > > > > > > > > >>> -			m->next = NULL;
> > > > > > > > > >>> -			m->nb_segs = 1;
> > > > > > > > > >>> -		}
> > > > > > > > > >>> +		m->next = NULL;
> > > > > > > > > >>> +		m->nb_segs = 1;
> > > > > > > > > >>>  		rte_mbuf_refcnt_set(m, 1);
> > > > > > > > > >>>
> > > > > > > > > >>>  		return m;
> > > > > > > > > >>> --
> > > > > > > > > >>> 2.25.1
> > > > > > > > > >>
> > > > > > > > >


  reply	other threads:[~2020-11-06  7:53 UTC|newest]

Thread overview: 74+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2020-11-04 17:00 Olivier Matz
2020-11-05  0:15 ` Ananyev, Konstantin
2020-11-05  7:46   ` Olivier Matz
2020-11-05  8:26     ` Andrew Rybchenko
2020-11-05  9:10       ` Olivier Matz
2020-11-05 11:34         ` Ananyev, Konstantin
2020-11-05 12:31           ` Olivier Matz
2020-11-05 13:14             ` Ananyev, Konstantin
2020-11-05 13:24               ` Olivier Matz
2020-11-05 13:55                 ` Ananyev, Konstantin
2020-11-05 16:30                   ` Morten Brørup
2020-11-05 23:55                     ` Ananyev, Konstantin
2020-11-06  7:52                       ` Morten Brørup [this message]
2020-11-06  8:20                         ` Olivier Matz
2020-11-06  8:50                           ` Morten Brørup
2020-11-06 10:04                             ` Olivier Matz
2020-11-06 10:07                               ` Morten Brørup
2020-11-06 11:53                                 ` Ananyev, Konstantin
2020-11-06 12:23                                   ` Morten Brørup
2020-11-08 14:16                                     ` Andrew Rybchenko
2020-11-08 14:19                                       ` Ananyev, Konstantin
2020-11-10 16:26                                         ` Olivier Matz
2020-11-05  8:33     ` Morten Brørup
2020-11-05  9:03       ` Olivier Matz
2020-11-05  9:09     ` Andrew Rybchenko
2020-11-08  7:25 ` Ali Alnubani
2020-12-18 12:52 ` [dpdk-dev] [PATCH v2] " Olivier Matz
2020-12-18 13:18   ` Morten Brørup
2020-12-18 23:33     ` Ajit Khaparde
2021-01-06 13:33 ` [dpdk-dev] [PATCH v3] " Olivier Matz
2021-01-10  9:28   ` Ali Alnubani
2021-01-11 13:14   ` Ananyev, Konstantin
2021-01-13 13:27 ` [dpdk-dev] [PATCH v4] " Olivier Matz
2021-01-15 13:59   ` [dpdk-dev] [dpdk-stable] " David Marchand
2021-01-15 18:39     ` Ali Alnubani
2021-01-18 17:52       ` Ali Alnubani
2021-01-19  8:32         ` Olivier Matz
2021-01-19  8:53           ` Morten Brørup
2021-01-19 12:00             ` Ferruh Yigit
2021-01-19 12:27               ` Morten Brørup
2021-01-19 14:03                 ` Ferruh Yigit
2021-01-19 14:21                   ` Morten Brørup
2021-01-21  9:15                     ` Ferruh Yigit
2021-01-19 14:04           ` Slava Ovsiienko
2021-07-24  8:47             ` Thomas Monjalon
2021-07-30 12:36               ` Olivier Matz
2021-07-30 14:35                 ` Morten Brørup
2021-07-30 14:54                   ` Thomas Monjalon
2021-07-30 15:14                     ` Olivier Matz
2021-07-30 15:23                       ` Morten Brørup
2021-08-04 13:29                       ` [dpdk-dev] [PATCH] doc: add known issue with mbuf segment Thomas Monjalon
2021-08-04 14:25                         ` Ajit Khaparde
2021-08-05  6:08                         ` Morten Brørup
2021-08-06 14:21                           ` Thomas Monjalon
2021-08-06 14:24                             ` Morten Brørup
2021-09-28  8:28                     ` [dpdk-dev] [dpdk-stable] [PATCH v4] mbuf: fix reset on mbuf free Thomas Monjalon
2021-09-28  9:00                       ` Slava Ovsiienko
2021-09-28  9:25                         ` Ananyev, Konstantin
2021-09-28  9:39                         ` Morten Brørup
2021-09-29  8:03                           ` Ali Alnubani
2021-09-29 21:39                             ` Olivier Matz
2021-09-30 13:29                               ` Ali Alnubani
2021-10-21  8:26                                 ` Thomas Monjalon
2021-01-21  9:19       ` Ferruh Yigit
2021-01-21  9:29         ` Morten Brørup
2021-01-21 16:35           ` [dpdk-dev] [dpdklab] " Lincoln Lavoie
2021-01-23  8:57             ` Morten Brørup
2021-01-25 17:00               ` Brandon Lo
2021-01-25 18:42             ` Ferruh Yigit
2021-06-15 13:56   ` [dpdk-dev] " Morten Brørup
2021-09-29 21:37   ` [dpdk-dev] [PATCH v5] " Olivier Matz
2021-09-30 13:27     ` Ali Alnubani
2021-10-21  9:18     ` David Marchand
2022-07-28 14:06       ` CI performance test results might be misleading Morten Brørup

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=98CBD80474FA8B44BF855DF32C47DC35C613F8@smartserver.smartshare.dk \
    --to=mb@smartsharesystems.com \
    --cc=andrew.rybchenko@oktetlabs.ru \
    --cc=dev@dpdk.org \
    --cc=konstantin.ananyev@intel.com \
    --cc=olivier.matz@6wind.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).