From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mga01.intel.com (mga01.intel.com [192.55.52.88]) by dpdk.org (Postfix) with ESMTP id C9BFB2935 for ; Thu, 9 Jun 2016 16:19:52 +0200 (CEST) Received: from orsmga002.jf.intel.com ([10.7.209.21]) by fmsmga101.fm.intel.com with ESMTP; 09 Jun 2016 07:19:51 -0700 X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="5.26,445,1459839600"; d="scan'208";a="994335691" Received: from bricha3-mobl3.ger.corp.intel.com ([10.237.221.71]) by orsmga002.jf.intel.com with SMTP; 09 Jun 2016 07:19:49 -0700 Received: by (sSMTP sendmail emulation); Thu, 09 Jun 2016 15:19:48 +0025 Date: Thu, 9 Jun 2016 15:19:48 +0100 From: Bruce Richardson To: "Ananyev, Konstantin" Cc: Olivier Matz , "dev@dpdk.org" , Adrien Mazarguil , "Zhang, Helin" Message-ID: <20160609141948.GB12520@bricha3-MOBL3> References: <1465374688-11729-1-git-send-email-adrien.mazarguil@6wind.com> <2601191342CEEE43887BDE71AB97725836B6CE18@irsmsx105.ger.corp.intel.com> <20160608122723.GK7621@6wind.com> <2601191342CEEE43887BDE71AB97725836B6CFDD@irsmsx105.ger.corp.intel.com> <20160608135751.GM7621@6wind.com> <57582797.8050300@6wind.com> <2601191342CEEE43887BDE71AB97725836B6D17A@irsmsx105.ger.corp.intel.com> <57591EEA.9040807@6wind.com> <2601191342CEEE43887BDE71AB97725836B6D90C@irsmsx105.ger.corp.intel.com> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <2601191342CEEE43887BDE71AB97725836B6D90C@irsmsx105.ger.corp.intel.com> Organization: Intel Research and =?iso-8859-1?Q?De=ACvel?= =?iso-8859-1?Q?opment?= Ireland Ltd. User-Agent: Mutt/1.5.23 (2014-03-12) Subject: Re: [dpdk-dev] [PATCH] mbuf: remove inconsistent assert statements X-BeenThere: dev@dpdk.org X-Mailman-Version: 2.1.15 Precedence: list List-Id: patches and discussions about DPDK List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 09 Jun 2016 14:19:53 -0000 On Thu, Jun 09, 2016 at 01:21:18PM +0000, Ananyev, Konstantin wrote: > Hi Olivier, > > > -----Original Message----- > > From: Olivier Matz [mailto:olivier.matz@6wind.com] > > Sent: Thursday, June 09, 2016 8:47 AM > > To: Ananyev, Konstantin; dev@dpdk.org; Adrien Mazarguil > > Subject: Re: [dpdk-dev] [PATCH] mbuf: remove inconsistent assert statements > > > > Hi Konstantin, > > > > >>>>>> Yes, it refcnt supposed to be set to 0 by __rte_pktmbuf_prefree_seg(). > > >>>>>> Wright now, it is a user responsibility to make sure refcnt==0 before pushing > > >>>>>> mbuf back to the pool. > > >>>>>> Not sure why do you consider that wrong? > > >>>>> > > >>>>> I do not consider this wrong and I'm all for using assert() to catch > > >>>>> programming errors, however in this specific case, I think they are > > >>>>> inconsistent and misleading. > > >>>> > > >>>> Honestly, I don't understand why. > > >>>> Right now the rule of thumb is - when mbuf is in the pool, it's refcnt should be equal zero. > > >> > > >> What is the purpose of this? Is there some code relying on this? > > > > > > The whole current implementation of mbuf_free code path relies on that. > > > Straight here: > > > if (likely(NULL != (m = __rte_pktmbuf_prefree_seg(m)))) { > > > m->next = NULL; > > > __rte_mbuf_raw_free(m); > > > } > > > > > > If we'll exclude indirect mbuf logic, all it does: > > > if (rte_mbuf_refcnt_update(m, -1) == 0) { > > > m->next = NULL; > > > __rte_mbuf_raw_free(m); > > > } > > > > > > I.E.: > > > decrement mbuf->refcnt. > > > If new value of refcnt is zero, then put it back into the pool. > > > > > > So having ASERT(mbuf->refcnt==0) inside > > > __rte_mbuf_raw_free()/__rte_mbuf_raw_alloc() > > > looks absolutely valid to me. > > > I *has* to be zero at that point with current implementation, > > > And if it is not then we probably have (or will have) a silent memory corruption. > > > > This explains how the refcount is used, and why it is set > > to zero before returning the mbuf to the pool with the mbuf > > free functions. > > From my point, that shows that rte_pktmbuf_free() relies on the value of the refcnt > to make a decision is it ok to put mbuf back to the pool or not. > Right now it puts mbuf to the pool *only* if it's refcnt==0. > As discussed below, we probably can change it to be refcnt==1 > (if there really would be noticeable performance gain). > But I think it still should be just one predefined value of refcnt (0 or 1). > In theory it is possible to allow both (0 and 1), > but that will make it hard to debug any alloc/free issues, > plus would neglect any possible performance gain - > as in that case raw_alloc (or it's analog) should still do > mbuf->refcnt=1; > I think enforcing the rule of the refcnt being 1 inside the mempool is reasonable. It saves setting the flag to zero and back to one again in the normal case. The one case, that I can think of, where it does cause extra work is when two threads simultaneously do the atomic decrement of the refcnt and one of them gets zero and must free the buffer. However, in that case, the additional cost of setting the count back to 1 on free is far less than the penalty paid on the atomic decrement - and this should definitely be an edge-case anyway, since it requires the two threads to free the same mbuf at the same time. If we make this change, it also may open the possibility to look at moving the refcount to cacheline 1 if we are reorganising the mbuf. It would no longer be needed inside RX in our PMDs, and the path to free mbufs already uses the pool pointer from the second cache-line anyway. That would free up 16-bits of space to expand out the nb_segs and port values to 16-bit each if we wanted. Regards, /Bruce