From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mga11.intel.com (mga11.intel.com [192.55.52.93]) by dpdk.org (Postfix) with ESMTP id 8BC11B62 for ; Fri, 27 Mar 2015 11:48:24 +0100 (CET) Received: from orsmga001.jf.intel.com ([10.7.209.18]) by fmsmga102.fm.intel.com with ESMTP; 27 Mar 2015 03:48:23 -0700 X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="5.11,478,1422950400"; d="scan'208";a="671428565" Received: from irsmsx107.ger.corp.intel.com ([163.33.3.99]) by orsmga001.jf.intel.com with ESMTP; 27 Mar 2015 03:48:22 -0700 Received: from irsmsx105.ger.corp.intel.com ([169.254.7.47]) by IRSMSX107.ger.corp.intel.com ([163.33.3.99]) with mapi id 14.03.0224.002; Fri, 27 Mar 2015 10:48:22 +0000 From: "Ananyev, Konstantin" To: Neil Horman , "Wiles, Keith" Thread-Topic: [dpdk-dev] [PATCH] mbuf: optimize refcnt handling during free Thread-Index: AQHQZ/sfrK7JIxqKWUiU1ulApFpoUZ0vP9aAgADg6oCAAAF6sA== Date: Fri, 27 Mar 2015 10:48:20 +0000 Message-ID: <2601191342CEEE43887BDE71AB97725821407F18@irsmsx105.ger.corp.intel.com> References: <1427393457-7080-1-git-send-email-zoltan.kiss@linaro.org> <20150327102533.GA5375@hmsreliant.think-freely.org> In-Reply-To: <20150327102533.GA5375@hmsreliant.think-freely.org> Accept-Language: en-IE, en-US Content-Language: en-US X-MS-Has-Attach: X-MS-TNEF-Correlator: x-originating-ip: [163.33.239.182] Content-Type: text/plain; charset="iso-8859-1" Content-Transfer-Encoding: quoted-printable MIME-Version: 1.0 Cc: "dev@dpdk.org" Subject: Re: [dpdk-dev] [PATCH] mbuf: optimize refcnt handling during free X-BeenThere: dev@dpdk.org X-Mailman-Version: 2.1.15 Precedence: list List-Id: patches and discussions about DPDK List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Fri, 27 Mar 2015 10:48:25 -0000 > -----Original Message----- > From: dev [mailto:dev-bounces@dpdk.org] On Behalf Of Neil Horman > Sent: Friday, March 27, 2015 10:26 AM > To: Wiles, Keith > Cc: dev@dpdk.org > Subject: Re: [dpdk-dev] [PATCH] mbuf: optimize refcnt handling during fre= e >=20 > On Thu, Mar 26, 2015 at 09:00:33PM +0000, Wiles, Keith wrote: > > > > > > On 3/26/15, 1:10 PM, "Zoltan Kiss" wrote: > > > > >The current way is not the most efficient: if m->refcnt is 1, the seco= nd > > >condition never evaluates, and we set it to 0. If refcnt > 1, the 2nd > > >condition fails again, although the code suggest otherwise to branch > > >prediction. Instead we should keep the second condition only, and remo= ve > > >the > > >duplicate set to zero. > > > > > >Signed-off-by: Zoltan Kiss > > >--- > > > lib/librte_mbuf/rte_mbuf.h | 5 +---- > > > 1 file changed, 1 insertion(+), 4 deletions(-) > > > > > >diff --git a/lib/librte_mbuf/rte_mbuf.h b/lib/librte_mbuf/rte_mbuf.h > > >index 17ba791..3ec4024 100644 > > >--- a/lib/librte_mbuf/rte_mbuf.h > > >+++ b/lib/librte_mbuf/rte_mbuf.h > > >@@ -764,10 +764,7 @@ __rte_pktmbuf_prefree_seg(struct rte_mbuf *m) > > > { > > > __rte_mbuf_sanity_check(m, 0); > > > > > >- if (likely (rte_mbuf_refcnt_read(m) =3D=3D 1) || > > >- likely (rte_mbuf_refcnt_update(m, -1) =3D=3D 0)) { > > >- > > >- rte_mbuf_refcnt_set(m, 0); > > >+ if (likely (rte_mbuf_refcnt_update(m, -1) =3D=3D 0)) { > > > > > > /* if this is an indirect mbuf, then > > > * - detach mbuf > > > > I fell for this one too, but read Bruce=B9s email > > http://dpdk.org/ml/archives/dev/2015-March/014481.html >=20 > This is still the right thing to do though, Bruce's reasoning is erroneou= s. No, it is not. I believe Bruce comments is absolutely correct here. > Just because the return from rte_mbuf_refcnt_read returns 1, doesn't mean= you It does. > are the last user of the mbuf, > you are only guaranteed that if the update > operation returns zero. >=20 > In other words: > rte_mbuf_refcnt_update(m, -1) >=20 > is an atomic operation >=20 > if (likely (rte_mbuf_refcnt_read(m) =3D=3D 1) || > likely (rte_mbuf_refcnt_update(m, -1) =3D=3D 0)) { >=20 >=20 > is not. >=20 > To illustrate, on two cpus, this might occur: >=20 > CPU0 CPU1 > rte_mbuf_refcnt_read ... > returns 1 rte_mbuf_refcnt_read > ... returns 1 > execute if clause execute if clause If you have an mbuf with refcnt=3D=3DN and try to call free() for it N+1 ti= mes - it is a bug in your code. Such code wouldn't work properly doesn't matter do we use: if (likely (rte_mbuf_refcnt_read(m) =3D=3D 1) || likely (rte_mbuf_refcnt_u= pdate(m, -1) =3D=3D 0)) or just:=20 if (likely (rte_mbuf_refcnt_update(m, -1) =3D=3D 0)) To illustrate it with your example: Suppose m.refcnt=3D=3D1 CPU0 executes:=20 rte_pktmbuf_free(m1) /*rte_mbuf_refcnt_update(m1, -1) returns 0, so we reset I'ts refcnt= and next and put mbuf back to the pool.*/ m2 =3D rte_pktmbuf_alloc(pool); /*as m1 is 'free' alloc could return same mbuf here, i.e: m2 =3D=3D m= 1. */ /* m2 refcnt =3D=3D1 start using m2 */ CPU1 executes: rte_pktmbuf_free(m1) /*rte_mbuf_refcnt_update(m1, -1) returns 0, so we reset I'ts refcnt= and next and put mbuf back to the pool.*/ We just returnend to the pool mbuf that is in use and caused silent memory = corruption of the mbuf's content. >=20 > In the above scenario both cpus fell into the if clause because they both= held a > pointer to the same buffer and both got a return value of one, so they sk= ipped > the update portion of the if clause and both executed the internal block = of the > conditional expression. you might be tempted to think thats ok, since th= at > block just sets the refcnt to zero, and doing so twice isn't harmful, but= the > entire purpose of that if conditional above was to ensure that only one > execution context ever executed the conditional for a given buffer. Look= at > what else happens in that conditional: >=20 > static inline struct rte_mbuf* __attribute__((always_inline)) > __rte_pktmbuf_prefree_seg(struct rte_mbuf *m) > { > __rte_mbuf_sanity_check(m, 0); >=20 > if (likely (rte_mbuf_refcnt_read(m) =3D=3D 1) || > likely (rte_mbuf_refcnt_update(m, -1) =3D=3D 0)) = { >=20 > rte_mbuf_refcnt_set(m, 0); >=20 > /* if this is an indirect mbuf, then > * - detach mbuf > * - free attached mbuf segment > */ > if (RTE_MBUF_INDIRECT(m)) { > struct rte_mbuf *md =3D RTE_MBUF_FROM_BADDR(m->bu= f_addr); > rte_pktmbuf_detach(m); > if (rte_mbuf_refcnt_update(md, -1) =3D=3D 0) > __rte_mbuf_raw_free(md); > } > return(m); > } > return (NULL); > } >=20 > If the buffer is indirect, another refcnt update occurs to the buf_addr m= buf, > and in the scenario I outlined above, that refcnt will underflow, likely = causing > a buffer leak. Additionally, the return code of this function is designe= d to > indicate to the caller if they were the last user of the buffer. In the = above > scenario, two execution contexts will be told that they were, which is wr= ong. >=20 > Zoltans patch is a good fix I don't think so. >=20 > Acked-by: Neil Horman NACKed-by: Konstantin Ananyev