DPDK patches and discussions
 help / color / mirror / Atom feed
From: Neil Horman <nhorman@tuxdriver.com>
To: "Wiles, Keith" <keith.wiles@intel.com>
Cc: "dev@dpdk.org" <dev@dpdk.org>
Subject: Re: [dpdk-dev] [PATCH] mbuf: optimize refcnt handling during free
Date: Fri, 27 Mar 2015 06:25:33 -0400	[thread overview]
Message-ID: <20150327102533.GA5375@hmsreliant.think-freely.org> (raw)
In-Reply-To: <D139DD49.19B82%keith.wiles@intel.com>

On Thu, Mar 26, 2015 at 09:00:33PM +0000, Wiles, Keith wrote:
> 
> 
> On 3/26/15, 1:10 PM, "Zoltan Kiss" <zoltan.kiss@linaro.org> wrote:
> 
> >The current way is not the most efficient: if m->refcnt is 1, the second
> >condition never evaluates, and we set it to 0. If refcnt > 1, the 2nd
> >condition fails again, although the code suggest otherwise to branch
> >prediction. Instead we should keep the second condition only, and remove
> >the
> >duplicate set to zero.
> >
> >Signed-off-by: Zoltan Kiss <zoltan.kiss@linaro.org>
> >---
> > lib/librte_mbuf/rte_mbuf.h | 5 +----
> > 1 file changed, 1 insertion(+), 4 deletions(-)
> >
> >diff --git a/lib/librte_mbuf/rte_mbuf.h b/lib/librte_mbuf/rte_mbuf.h
> >index 17ba791..3ec4024 100644
> >--- a/lib/librte_mbuf/rte_mbuf.h
> >+++ b/lib/librte_mbuf/rte_mbuf.h
> >@@ -764,10 +764,7 @@ __rte_pktmbuf_prefree_seg(struct rte_mbuf *m)
> > {
> > 	__rte_mbuf_sanity_check(m, 0);
> > 
> >-	if (likely (rte_mbuf_refcnt_read(m) == 1) ||
> >-			likely (rte_mbuf_refcnt_update(m, -1) == 0)) {
> >-
> >-		rte_mbuf_refcnt_set(m, 0);
> >+	if (likely (rte_mbuf_refcnt_update(m, -1) == 0)) {
> > 
> > 		/* if this is an indirect mbuf, then
> > 		 *  - detach mbuf
> 
> I fell for this one too, but read Bruce¹s email
> http://dpdk.org/ml/archives/dev/2015-March/014481.html

This is still the right thing to do though, Bruce's reasoning is erroneous.
Just because the return from rte_mbuf_refcnt_read returns 1, doesn't mean you
are the last user of the mbuf, you are only guaranteed that if the update
operation returns zero.

In other words:
rte_mbuf_refcnt_update(m, -1)

is an atomic operation

if (likely (rte_mbuf_refcnt_read(m) == 1) ||
                    likely (rte_mbuf_refcnt_update(m, -1) == 0)) {


is not.

To illustrate, on two cpus, this might occur:

CPU0					CPU1
rte_mbuf_refcnt_read			...
   returns 1				rte_mbuf_refcnt_read
...					   returns 1
execute if clause			execute if clause

In the above scenario both cpus fell into the if clause because they both held a
pointer to the same buffer and both got a return value of one, so they skipped
the update portion of the if clause and both executed the internal block of the
conditional expression.  you might be tempted to think thats ok, since that
block just sets the refcnt to zero, and doing so twice isn't harmful, but the
entire purpose of that if conditional above was to ensure that only one
execution context ever executed the conditional for a given buffer.  Look at
what else happens in that conditional:

static inline struct rte_mbuf* __attribute__((always_inline))
__rte_pktmbuf_prefree_seg(struct rte_mbuf *m)
{
        __rte_mbuf_sanity_check(m, 0);

        if (likely (rte_mbuf_refcnt_read(m) == 1) ||
                        likely (rte_mbuf_refcnt_update(m, -1) == 0)) {

                rte_mbuf_refcnt_set(m, 0);

                /* if this is an indirect mbuf, then
                 *  - detach mbuf
                 *  - free attached mbuf segment
                 */
                if (RTE_MBUF_INDIRECT(m)) {
                        struct rte_mbuf *md = RTE_MBUF_FROM_BADDR(m->buf_addr);
                        rte_pktmbuf_detach(m);
                        if (rte_mbuf_refcnt_update(md, -1) == 0)
                                __rte_mbuf_raw_free(md);
                }
                return(m);
        }
        return (NULL);
}

If the buffer is indirect, another refcnt update occurs to the buf_addr mbuf,
and in the scenario I outlined above, that refcnt will underflow, likely causing
a buffer leak.  Additionally, the return code of this function is designed to
indicate to the caller if they were the last user of the buffer.  In the above
scenario, two execution contexts will be told that they were, which is wrong.

Zoltans patch is a good fix

Acked-by: Neil Horman <nhorman@tuxdriver.com>

  parent reply	other threads:[~2015-03-27 10:25 UTC|newest]

Thread overview: 10+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2015-03-26 18:10 Zoltan Kiss
2015-03-26 21:00 ` Wiles, Keith
2015-03-26 21:07   ` Bruce Richardson
2015-03-27 10:25   ` Neil Horman [this message]
2015-03-27 10:48     ` Ananyev, Konstantin
2015-03-27 12:44       ` Neil Horman
2015-03-27 13:10         ` Olivier MATZ
2015-03-27 13:16           ` Bruce Richardson
2015-03-27 13:22             ` Ananyev, Konstantin
2015-03-27 10:50     ` Olivier MATZ

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20150327102533.GA5375@hmsreliant.think-freely.org \
    --to=nhorman@tuxdriver.com \
    --cc=dev@dpdk.org \
    --cc=keith.wiles@intel.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).