From: Olivier Matz
To: Hiroyuki Mikita
Cc: dev@dpdk.org, "Ananyev, Konstantin"
Date: Mon, 16 May 2016 10:52:44 +0200
Subject: Re: [dpdk-dev] [PATCH] mbuf: decrease refcnt when detaching
Message-ID: <57398A5C.2050802@6wind.com>
In-Reply-To: <1463327436-6863-1-git-send-email-h.mikita89@gmail.com>
References: <1463327436-6863-1-git-send-email-h.mikita89@gmail.com>
List-Id: patches and discussions about DPDK

Hi Hiroyuki,

On 05/15/2016 05:50 PM, Hiroyuki Mikita wrote:
> The rte_pktmbuf_detach() function should decrease refcnt on a direct
> buffer.
>
> Signed-off-by: Hiroyuki Mikita
> ---
>  lib/librte_mbuf/rte_mbuf.h | 4 +++-
>  1 file changed, 3 insertions(+), 1 deletion(-)
>
> diff --git a/lib/librte_mbuf/rte_mbuf.h b/lib/librte_mbuf/rte_mbuf.h
> index 529debb..3b117ca 100644
> --- a/lib/librte_mbuf/rte_mbuf.h
> +++ b/lib/librte_mbuf/rte_mbuf.h
> @@ -1468,9 +1468,11 @@ static inline void rte_pktmbuf_attach(struct rte_mbuf *mi, struct rte_mbuf *m)
>   */
>  static inline void rte_pktmbuf_detach(struct rte_mbuf *m)
>  {
> +	struct rte_mbuf *md = rte_mbuf_from_indirect(m);
>  	struct rte_mempool *mp = m->pool;
>  	uint32_t mbuf_size, buf_len, priv_size;
>
> +	rte_mbuf_refcnt_update(md, -1);
>  	priv_size = rte_pktmbuf_priv_size(mp);
>  	mbuf_size = sizeof(struct rte_mbuf) + priv_size;
>  	buf_len = rte_pktmbuf_data_room_size(mp);
> @@ -1498,7 +1500,7 @@ __rte_pktmbuf_prefree_seg(struct rte_mbuf *m)
>  	if (RTE_MBUF_INDIRECT(m)) {
>  		struct rte_mbuf *md = rte_mbuf_from_indirect(m);
>  		rte_pktmbuf_detach(m);
> -		if (rte_mbuf_refcnt_update(md, -1) == 0)
> +		if (rte_mbuf_refcnt_read(md) == 0)
>  			__rte_mbuf_raw_free(md);
>  	}
>  	return m;
>

Thanks for submitting this patch. I agree that rte_pktmbuf_attach() and
rte_pktmbuf_detach() are not symmetrical, but I think your patch could
trigger some race conditions.

Example:

- init: m, c1 and c2 are direct mbufs
- rte_pktmbuf_attach(c1, m); # c1 becomes a clone of m
- rte_pktmbuf_attach(c2, m); # c2 becomes another clone of m
- rte_pktmbuf_free(m);
- after that, we have:
  - m is a direct mbuf with refcnt = 2
  - c1 is an indirect mbuf pointing to data of m
  - c2 is an indirect mbuf pointing to data of m
- if we call rte_pktmbuf_free(c1) and rte_pktmbuf_free(c2) on 2 different
  cores at the same time, m can be freed twice: both cores can decrement
  the refcnt before either one reads it, so
  (rte_mbuf_refcnt_read(md) == 0) can be true on both cores.

I think the proper way to do this would be to have rte_pktmbuf_detach()
return the value of rte_mbuf_refcnt_update(md, -1), ensuring that
only one core will call __rte_mbuf_raw_free().

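To make the idea concrete, here is a rough sketch of the pattern using a
toy structure and C11 atomics. It is not the real rte_mbuf code: struct
toy_mbuf and every toy_* function are hypothetical names made up for this
illustration, and times_freed only exists to make a double free visible.
The point is that the decrement and the "did it reach zero?" test are a
single atomic step, so only one caller can ever see 0.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

/* Toy stand-in for struct rte_mbuf (hypothetical, illustration only). */
struct toy_mbuf {
	atomic_int refcnt;        /* references held on this buffer */
	struct toy_mbuf *direct;  /* non-NULL while this mbuf is a clone */
	int times_freed;          /* counts raw frees to expose a double free */
};

/* Atomically add delta and return the NEW value, mirroring the
 * behaviour of rte_mbuf_refcnt_update(). */
static int toy_refcnt_update(struct toy_mbuf *m, int delta)
{
	return atomic_fetch_add(&m->refcnt, delta) + delta;
}

/* Make mi a clone of md and take a reference on md. */
static void toy_attach(struct toy_mbuf *mi, struct toy_mbuf *md)
{
	toy_refcnt_update(md, 1);
	mi->direct = md;
}

/* Suggested behaviour: detach drops the reference on the direct
 * mbuf and returns the resulting refcnt to the caller. */
static int toy_detach(struct toy_mbuf *mi)
{
	struct toy_mbuf *md = mi->direct;

	assert(md != NULL);
	mi->direct = NULL;
	return toy_refcnt_update(md, -1);
}

/* Free a clone: only the caller that observes 0 frees the direct mbuf. */
static void toy_free_clone(struct toy_mbuf *mi)
{
	struct toy_mbuf *md = mi->direct;

	if (toy_detach(mi) == 0)
		md->times_freed++;
}

/* Replays the scenario above on one core; returns how often m was freed. */
static int toy_demo(void)
{
	struct toy_mbuf m = {0}, c1 = {0}, c2 = {0};

	atomic_init(&m.refcnt, 1);
	toy_attach(&c1, &m);
	toy_attach(&c2, &m);
	toy_refcnt_update(&m, -1);   /* owner releases m, refcnt is now 2 */
	toy_free_clone(&c1);         /* refcnt 1, nothing freed */
	toy_free_clone(&c2);         /* refcnt 0, m freed exactly once */
	return m.times_freed;
}
```

With a separate decrement followed by a read, as in the patch, both
toy_free_clone() calls could observe 0 when run on two cores. Using the
value returned by the atomic update avoids that window.
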
In the unit tests, in test_attach_from_different_pool(), the mbuf m
is never freed due to this behavior. That shows the current API is a
bit misleading. I think it should also be fixed in the patch.

Another issue is that it will break the API. To avoid issues in
applications relying on the current behavior of rte_pktmbuf_detach(),
I'd say we should keep the function as-is and mark it as deprecated.
Then, introduce a new function rte_pktmbuf_detach2() (or any better
name :) ) that changes the behavior to what you suggest. An entry in
the release notes would also be helpful.

The other option is to leave things as-is and just document the behavior
of rte_pktmbuf_detach(), explicitly saying that it is not symmetrical
with the attach. But I'd prefer the first option.

Thoughts?

Regards,
Olivier