DPDK patches and discussions
 help / color / mirror / Atom feed
From: "Ananyev, Konstantin" <konstantin.ananyev@intel.com>
To: Olivier Matz <olivier.matz@6wind.com>,
	"dev@dpdk.org" <dev@dpdk.org>,
	Adrien Mazarguil <adrien.mazarguil@6wind.com>
Subject: Re: [dpdk-dev] [PATCH] mbuf: remove inconsistent assert statements
Date: Wed, 8 Jun 2016 16:07:25 +0000	[thread overview]
Message-ID: <2601191342CEEE43887BDE71AB97725836B6D17A@irsmsx105.ger.corp.intel.com> (raw)
In-Reply-To: <57582797.8050300@6wind.com>

Hi Olivier,

> 
> Hi Adrien, Konstantin,
> 
> I'm jumping in this (interesting) discussion. I already talked a
> bit with Adrien in point to point, and I think its patch is valid.
> Please see some comments below.
> 
> 
> On 06/08/2016 03:57 PM, Adrien Mazarguil wrote:
> > On Wed, Jun 08, 2016 at 01:09:18PM +0000, Ananyev, Konstantin wrote:
> >>>
> >>> Hi Konstantin,
> >>>
> >>> On Wed, Jun 08, 2016 at 10:34:17AM +0000, Ananyev, Konstantin wrote:
> >>>> Hi Adrien,
> >>>>
> >>>>>
> >>>>> An assertion failure occurs in __rte_mbuf_raw_free() (called by a few PMDs)
> >>>>> when compiling DPDK with CONFIG_RTE_LOG_LEVEL=RTE_LOG_DEBUG and starting
> >>>>> applications with a log level high enough to trigger it.
> >>>>>
> >>>>> While rte_mbuf_raw_alloc() sets refcount to 1, __rte_mbuf_raw_free()
> >>>>> expects it to be 0.
> >>>>> Considering users are not expected to reset the
> >>>>> reference count to satisfy assert() and that raw functions are designed on
> >>>>> purpose without safety belts, remove these checks.
> >>>>
> >>>> Yes, it refcnt supposed to be set to 0 by __rte_pktmbuf_prefree_seg().
> >>>> Wright now, it is a user responsibility to make sure refcnt==0 before pushing
> >>>> mbuf back to the pool.
> >>>> Not sure why do you consider that wrong?
> >>>
> >>> I do not consider this wrong and I'm all for using assert() to catch
> >>> programming errors, however in this specific case, I think they are
> >>> inconsistent and misleading.
> >>
> >> Honestly, I don't understand why.
> >> Right now the rule of thumb is - when mbuf is in the pool, it's refcnt should be equal zero.
> 
> What is the purpose of this? Is there some code relying on this?

The whole current implementation of mbuf_free code path relies on that.
Straight here: 
if (likely(NULL != (m = __rte_pktmbuf_prefree_seg(m)))) {
                m->next = NULL;
                __rte_mbuf_raw_free(m);
        }

If we'll exclude indirect mbuf logic, all it does:
if (rte_mbuf_refcnt_update(m, -1) == 0) {
	m->next = NULL;
	__rte_mbuf_raw_free(m);
}

I.E.:
decrement mbuf->refcnt.
If new value of refcnt is zero, then put it back into the pool.

So having ASERT(mbuf->refcnt==0) inside
__rte_mbuf_raw_free()/__rte_mbuf_raw_alloc()
looks absolutely valid to me.
I *has* to be zero at that point with current implementation,
And if it is not then we probably have (or will have) a silent memory corruption.

> 
> >> Yes, as you pointed below - that rule probably can be changed to:
> >> when mbuf is in the pool, it's refcnt should equal one, and that would probably allow us
> >> to speedup things a bit, but I suppose that's the matter of another aptch/discussion.
> >
> > Agreed.
> 
> I agree this is somehow another discussion, but let's dive into :)

OK :)

> 
> But since [1], it is allowed to call rte_mbuf_raw_alloc() in PMDs (all
> PMDs were calling an internal function before).

Yes, raw_alloc is public, NP with that.

> We could argue that
> rte_mbuf_raw_free() should also be made public for PMDs.

We could, but right now it is not.
Again, as I said, user could use it on his own but he obviously has to
obey the rules and do manually what __rte_pktmbuf_prefree_seg() does.
At least:

rte_mbuf_refcnt_set(m, 0);
__rte_mbuf_raw_free(m);

> 
> As you said below, no-one promised that the free() reverts the malloc(),
> but given the function names, one can legitimately expect that the
> following code is valid:
> 
> m = __rte_mbuf_raw_alloc();
> /* do nothing here */
> __rte_mbuf_raw_free(m);

Surely people could (and would) expect various things...
But the reality right now is: __rte_mbuf_raw_free() is an internal
function and not counterpart of __rte_mbuf_raw_alloc().
If the people don't bother to read API docs or/and the actual code,
I can't see how we can help them :)

> 
> If no code relies on having the refcnt set to 0 when a mbuf is in
> the pool, I suggest to relax this constraint as Adrien proposed.

Why not just rename it to __rte_mbuf_raw_free_dont_use_after_raw_alloc()?
To avoid any further confusion :)
Seriously speaking I would prefer to leave it as it is.
If you feel we have to introduce a counterpart of  rte_mbuf_raw_alloc(),
we can make a new public one:

rte_mbuf_raw_free(stuct rte_mbuf *m)
{
      if (rte_mbuf_refcnt_update(m, -1) == 0)
                __rte_mbuf_raw_free(m);
}

> 
> Then, my opinion is that the refcount should be set to 1 in
> rte_pktmbuf_reset().

I don't think we need to update rte_pktmbuf_reset(),
it doesn't touch refcnt at all and probably better to keep it that way.
To achieve what you suggesting, we probably need to:
1) update  _rte_pktmbuf_prefree_seg and rte_pktmbuf_detach() to
set refcnt back to 1, i.e:

static inline struct rte_mbuf* __attribute__((always_inline))
__rte_pktmbuf_prefree_seg(struct rte_mbuf *m)
{
        __rte_mbuf_sanity_check(m, 0);

        if (likely(rte_mbuf_refcnt_update(m, -1) == 0)) {
                /* if this is an indirect mbuf, it is detached. */
                if (RTE_MBUF_INDIRECT(m))
                        rte_pktmbuf_detach(m);
+	rte_mbuf_refcnt_set(m, 1);
                return m;
        }
        return NULL;
}

2) either:
   a) update mbuf constructor function, so it sets refcnt=1.
        I suppose that is easy for rte_pktmbuf_init(), but it means that all custom
        constructors should do the same.
        Which means possible changes in existing user code and all ABI change related hassle.
  b) keep rte_mbuf_raw_alloc() setting mbuf->refcnt=1.
      But then I don't see how will get any performance gain here.  

So not sure is it really worth it.

> And rte_pktmbuf_free() should not be allowed on
> an uninitialized mbuf (yes, this would require some changes in PMDs).

Not sure I understand you here...
free() wouldn not be allowed on mbuf whose recnf==0?
But it is not allowed right now anyway.

> This would open the door for bulk allocation/free in the mbuf api.

Hmm and what stops us having one right now?
As I know we do have rte_pktmbuf_alloc_bulk(), I don't see
why we can't have rte_pktmbuf_free_bulk() right now.

> 
> This could be done in several steps:
> 
> 1/ remove these assert(), add introduce a public rte_mbuf_raw_free()
> 2/ announce that rte_pktmbuf_free() won't work on uninitialized mbufs
> in a future version, and ensure that all PMD are inline with this
> requirement
> 3/ later, move refcount initialization in rte_pktmbuf_reset()
> 
> What do you think?

I still think that assert() is on a right place :)
*If* we'll change mbuf free code in a way that mbufs inside the pool
should have refcnt==1, then I think we'll change it to:
RTE_ASSERT(rte_mbuf_refcnt_read(m) == 1); 
But as I stated above, that change might cause some backward compatibility hassle: 2.a)
Or might not give us any performance gain: 2.b).

> 
> Another option is to remove the rte_mbuf_raw_alloc()/rte_mbuf_raw_free()
> and use mempool calls.

Don't see why we have to remove them...
Basically we have a bug in PMD, but instead of fixing it,
you guys suggest to change mbuf code.
Sounds a bit strange to me.
Why not just make for these PMDs to call rte_mempool_put() directly, if you are sure it is safe here?

> But having a mbuf wrapper does not seem a bad
> thing to me.

We can add some extra wrapper then, something like:
#define __RTE_MBUF_PUT_FREED(m)	(rte_mempool_put((m)->pool, m))
?

Konstantin


> 
> By the way, __rte_pktmbuf_prefree_seg() is also an internal function.
> I think we should try to update the mbuf API so that the PMDs do not
> need to call these internal functions.
> 
> 
> [1]
> http://dpdk.org/browse/dpdk/commit/?id=fbfd99551ca370266f4bfff58ce441cf5cb1203a
> 
> 
> Regards,
> Olivier
> 
> 
> >
> >>>> If the user calls __rte_mbuf_raw_free() manualy it is his responsibility to make
> >>>> sure mbuf's refcn==0.
> >>>
> >>> Sure, however what harm does it cause (besides assert() to fail), since the
> >>> allocation function sets refcount to 1?
> >>>
> >>> Why having the allocation function set the refcount if we are sure it is
> >>> already 0 (assert() proves it). Removing rte_mbuf_refcnt_set(m, 1) can
> >>> surely improve performance.
> >>
> >> That's' just an assert() enabled when MBUF_DEBUG  is on.
> >> Its sole purpose is to help troubleshoot the bugs and help to catch situations
> >> when someone silently updates mbufs supposed to be free.
> >
> > I perfectly understand and I cannot agree more with this explanation,
> > however the fact these functions are not symmetrical remains an issue that
> > needs to be addressed somehow in my opinion.
> >
> >>>> BTW, why are you doing it?
> >>>> The comment clearly states that the function is for internal use:
> >>>> /**
> >>>>  * @internal Put mbuf back into its original mempool.
> >>>>  * The use of that function is reserved for RTE internal needs.
> >>>>  * Please use rte_pktmbuf_free().
> >>>>  *
> >>>>  * @param m
> >>>>  *   The mbuf to be freed.
> >>>>  */
> >>>> static inline void __attribute__((always_inline))
> >>>> __rte_mbuf_raw_free(struct rte_mbuf *m)
> >>>
> >>> Several PMDs are using it anyway (won't name names, but I know one of them
> >>> quite well).
> >>
> >> Then it probably is a bug in these PMDs that need to be fixed.
> >>
> >>> I chose to modify this code instead of its users for the
> >>> following reasons:
> >>>
> >>> - Considering their names, these functions should be opposites and able to
> >>>   work together like malloc()/free().
> >>
> >> These are internal functions.
> >> Comments in mbuf clearly state that library users shouldn't call them directly.
> >> They are written to fit internal librte_mbuf needs, and no-one ever promised
> >> malloc/free() compatibility here.
> >
> > So it cannot be provided for the sake of not providing it or is there a good
> > reason?
> >
> > What I meant is that since PMDs already made the mistake of using these
> > functions to benefit from the improved performance, DPDK being all about
> > performance and stuff, let them use it as intended. Perhaps we should drop
> > those "__" like for rte_mbuf_raw_alloc().
> >
> >>>
> >>> - PMDs are relying on these functions for performance reasons, we can assume
> >>>   they took the extra care necessary to make sure it would work properly.
> >>
> >> That just doesn't seem correct to me.
> >> The proper way to do free fo mbuf segment is:
> >>
> >> static inline void __attribute__((always_inline))
> >> rte_pktmbuf_free_seg(struct rte_mbuf *m)
> >> {
> >>         if (likely(NULL != (m = __rte_pktmbuf_prefree_seg(m)))) {
> >>                 m->next = NULL;
> >>                 __rte_mbuf_raw_free(m);
> >>         }
> >> }
> >>
> >> If by some reason you choose not to use this function, then it is your
> >> responsibility to perform similar actions on your own before pushing mbuf into the pool.
> >> That's what some TX functions for some Intel NICs do to improve performance:
> >> they call _prefree_seg() manually and try to put mbufs into the pool in groups.
> >
> > Not anymore it seems, but in the current code base both ena and mpipe PMDs
> > (soon mlx5 as well) seem to get this wrong.
> >
> >>> - Preventing it would make these PMDs slower and is not acceptable either.
> >>
> >> I can hardly imagine that __rte_pktmbuf_prefree_seg() impact would be that severe...
> >> But ok, probably  you do have some very specific case, but then why you PMD just doesn't call:
> >> rte_mempool_put(m->pool, m);
> >> directly?
> >
> > To survive the upcoming transition to mbufs backed by libc malloc() without
> > having to fix them? Joke aside, I guess the reason is to use functions with
> > "mbuf" in their names when dealing with mbufs.
> >
> >> Why instead you choose to change common functions and compromise
> >> librte_mbuf debug ability?
> >
> > No, I'm fine with keeping the debug ability, however I did not find a way to
> > both keep it and fix the consistency issue without breaking something
> > (performance or refcount assumptions I'm not familiar with elsewhere).
> >
> >>> What remains is the consistency issue, I think these statements were only
> >>> added to catch multiple frees,
> >>
> >> Yes these asserts() here to help catch bugs,
> >> and I think it is a good thing to have them here.
> >>
> >>> and those should be caught at a higher
> >>> level, where other consistency checks are also performed.
> >>
> >> Like where?
> >
> > Possibly rte_pktmbuf_free_seg().
> >
> >>>>> Signed-off-by: Adrien Mazarguil <adrien.mazarguil@6wind.com>
> >>>>> ---
> >>>>>  lib/librte_mbuf/rte_mbuf.h | 2 --
> >>>>>  1 file changed, 2 deletions(-)
> >>>>>
> >>>>> diff --git a/lib/librte_mbuf/rte_mbuf.h b/lib/librte_mbuf/rte_mbuf.h
> >>>>> index 11fa06d..7070bb8 100644
> >>>>> --- a/lib/librte_mbuf/rte_mbuf.h
> >>>>> +++ b/lib/librte_mbuf/rte_mbuf.h
> >>>>> @@ -1108,7 +1108,6 @@ static inline struct rte_mbuf *rte_mbuf_raw_alloc(struct rte_mempool *mp)
> >>>>>  	if (rte_mempool_get(mp, &mb) < 0)
> >>>>>  		return NULL;
> >>>>>  	m = (struct rte_mbuf *)mb;
> >>>>> -	RTE_ASSERT(rte_mbuf_refcnt_read(m) == 0);
> >>>>>  	rte_mbuf_refcnt_set(m, 1);
> >>>>>  	__rte_mbuf_sanity_check(m, 0);
> >>>>>
> >>>>> @@ -1133,7 +1132,6 @@ __rte_mbuf_raw_alloc(struct rte_mempool *mp)
> >>>>>  static inline void __attribute__((always_inline))
> >>>>>  __rte_mbuf_raw_free(struct rte_mbuf *m)
> >>>>>  {
> >>>>> -	RTE_ASSERT(rte_mbuf_refcnt_read(m) == 0);
> >>>>>  	rte_mempool_put(m->pool, m);
> >>>>>  }
> >>>>>
> >>>>> --
> >>>>> 2.1.4
> >>>>
> >>>
> >>> --
> >>> Adrien Mazarguil
> >>> 6WIND
> >

  reply	other threads:[~2016-06-08 16:08 UTC|newest]

Thread overview: 14+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2016-06-08  8:31 Adrien Mazarguil
2016-06-08 10:34 ` Ananyev, Konstantin
2016-06-08 12:27   ` Adrien Mazarguil
2016-06-08 13:09     ` Ananyev, Konstantin
2016-06-08 13:57       ` Adrien Mazarguil
2016-06-08 14:11         ` Olivier Matz
2016-06-08 16:07           ` Ananyev, Konstantin [this message]
2016-06-09  7:46             ` Olivier Matz
2016-06-09 13:21               ` Ananyev, Konstantin
2016-06-09 14:19                 ` Bruce Richardson
2016-06-09 15:27                 ` Thomas Monjalon
2016-06-09 15:45                   ` Ananyev, Konstantin
2016-06-09 18:42                     ` Don Provan
2016-06-20 13:49   ` Adrien Mazarguil

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=2601191342CEEE43887BDE71AB97725836B6D17A@irsmsx105.ger.corp.intel.com \
    --to=konstantin.ananyev@intel.com \
    --cc=adrien.mazarguil@6wind.com \
    --cc=dev@dpdk.org \
    --cc=olivier.matz@6wind.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).