DPDK patches and discussions
 help / color / mirror / Atom feed
From: Bruce Richardson <bruce.richardson@intel.com>
To: "Ananyev, Konstantin" <konstantin.ananyev@intel.com>
Cc: Olivier Matz <olivier.matz@6wind.com>,
	"dev@dpdk.org" <dev@dpdk.org>,
	Adrien Mazarguil <adrien.mazarguil@6wind.com>,
	"Zhang, Helin" <helin.zhang@intel.com>
Subject: Re: [dpdk-dev] [PATCH] mbuf: remove inconsistent assert statements
Date: Thu, 9 Jun 2016 15:19:48 +0100	[thread overview]
Message-ID: <20160609141948.GB12520@bricha3-MOBL3> (raw)
In-Reply-To: <2601191342CEEE43887BDE71AB97725836B6D90C@irsmsx105.ger.corp.intel.com>

On Thu, Jun 09, 2016 at 01:21:18PM +0000, Ananyev, Konstantin wrote:
> Hi Olivier,
> 
> > -----Original Message-----
> > From: Olivier Matz [mailto:olivier.matz@6wind.com]
> > Sent: Thursday, June 09, 2016 8:47 AM
> > To: Ananyev, Konstantin; dev@dpdk.org; Adrien Mazarguil
> > Subject: Re: [dpdk-dev] [PATCH] mbuf: remove inconsistent assert statements
> > 
> > Hi Konstantin,
> > 
> > >>>>>> Yes, it refcnt supposed to be set to 0 by __rte_pktmbuf_prefree_seg().
> > >>>>>> Wright now, it is a user responsibility to make sure refcnt==0 before pushing
> > >>>>>> mbuf back to the pool.
> > >>>>>> Not sure why do you consider that wrong?
> > >>>>>
> > >>>>> I do not consider this wrong and I'm all for using assert() to catch
> > >>>>> programming errors, however in this specific case, I think they are
> > >>>>> inconsistent and misleading.
> > >>>>
> > >>>> Honestly, I don't understand why.
> > >>>> Right now the rule of thumb is - when mbuf is in the pool, it's refcnt should be equal zero.
> > >>
> > >> What is the purpose of this? Is there some code relying on this?
> > >
> > > The whole current implementation of mbuf_free code path relies on that.
> > > Straight here:
> > > if (likely(NULL != (m = __rte_pktmbuf_prefree_seg(m)))) {
> > >                 m->next = NULL;
> > >                 __rte_mbuf_raw_free(m);
> > >         }
> > >
> > > If we'll exclude indirect mbuf logic, all it does:
> > > if (rte_mbuf_refcnt_update(m, -1) == 0) {
> > > 	m->next = NULL;
> > > 	__rte_mbuf_raw_free(m);
> > > }
> > >
> > > I.E.:
> > > decrement mbuf->refcnt.
> > > If new value of refcnt is zero, then put it back into the pool.
> > >
> > > So having ASERT(mbuf->refcnt==0) inside
> > > __rte_mbuf_raw_free()/__rte_mbuf_raw_alloc()
> > > looks absolutely valid to me.
> > > I *has* to be zero at that point with current implementation,
> > > And if it is not then we probably have (or will have) a silent memory corruption.
> > 
> > This explains how the refcount is used, and why it is set
> > to zero before returning the mbuf to the pool with the mbuf
> > free functions.
> 
> From my point, that shows that rte_pktmbuf_free() relies on the value of the refcnt
> to make a decision is it ok to put mbuf back to the pool or not.
> Right now it puts mbuf to the pool *only* if it's refcnt==0.
> As discussed below, we probably can change it to be refcnt==1
> (if there really would be noticeable performance gain).
> But I think it still should be just one predefined value of refcnt (0 or 1).
> In theory it is possible to allow both (0 and 1),
> but that will make it hard to debug any alloc/free issues,
> plus would neglect any possible performance gain -
> as in that case raw_alloc (or it's analog) should still do
> mbuf->refcnt=1; 
> 

I think enforcing the rule of the refcnt being 1 inside the mempool is reasonable.
It saves setting the flag to zero and back to one again in the normal case. The
one case, that I can think of, where it does cause extra work is when two
threads simultaneously do the atomic decrement of the refcnt and one of them
gets zero and must free the buffer. However, in that case, the additional cost
of setting the count back to 1 on free is far less than the penalty paid on the
atomic decrement - and this should definitely be an edge-case anyway, since it
requires the two threads to free the same mbuf at the same time.

If we make this change, it also may open the possibility to look at moving the
refcount to cacheline 1 if we are reorganising the mbuf. It would no longer be
needed inside RX in our PMDs, and the path to free mbufs already uses the pool
pointer from the second cache-line anyway. That would free up 16-bits of space
to expand out the nb_segs and port values to 16-bit each if we wanted.

Regards,
/Bruce

  reply	other threads:[~2016-06-09 14:19 UTC|newest]

Thread overview: 14+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2016-06-08  8:31 Adrien Mazarguil
2016-06-08 10:34 ` Ananyev, Konstantin
2016-06-08 12:27   ` Adrien Mazarguil
2016-06-08 13:09     ` Ananyev, Konstantin
2016-06-08 13:57       ` Adrien Mazarguil
2016-06-08 14:11         ` Olivier Matz
2016-06-08 16:07           ` Ananyev, Konstantin
2016-06-09  7:46             ` Olivier Matz
2016-06-09 13:21               ` Ananyev, Konstantin
2016-06-09 14:19                 ` Bruce Richardson [this message]
2016-06-09 15:27                 ` Thomas Monjalon
2016-06-09 15:45                   ` Ananyev, Konstantin
2016-06-09 18:42                     ` Don Provan
2016-06-20 13:49   ` Adrien Mazarguil

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20160609141948.GB12520@bricha3-MOBL3 \
    --to=bruce.richardson@intel.com \
    --cc=adrien.mazarguil@6wind.com \
    --cc=dev@dpdk.org \
    --cc=helin.zhang@intel.com \
    --cc=konstantin.ananyev@intel.com \
    --cc=olivier.matz@6wind.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).