From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mga01.intel.com (mga01.intel.com [192.55.52.88]) by dpdk.org (Postfix) with ESMTP id 77977B45F for ; Wed, 8 Jun 2016 18:08:02 +0200 (CEST) Received: from fmsmga002.fm.intel.com ([10.253.24.26]) by fmsmga101.fm.intel.com with ESMTP; 08 Jun 2016 09:07:28 -0700 X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="5.26,439,1459839600"; d="scan'208";a="997918861" Received: from irsmsx152.ger.corp.intel.com ([163.33.192.66]) by fmsmga002.fm.intel.com with ESMTP; 08 Jun 2016 09:07:27 -0700 Received: from irsmsx105.ger.corp.intel.com ([169.254.7.51]) by IRSMSX152.ger.corp.intel.com ([169.254.6.247]) with mapi id 14.03.0248.002; Wed, 8 Jun 2016 17:07:26 +0100 From: "Ananyev, Konstantin" To: Olivier Matz , "dev@dpdk.org" , Adrien Mazarguil Thread-Topic: [dpdk-dev] [PATCH] mbuf: remove inconsistent assert statements Thread-Index: AQHRwWBHlYh+pJQko0aMG2e1V0BGOZ/fXbUQgAAQ/oCAABYKAIAAAzyAgAAD14CAAB8nYA== Date: Wed, 8 Jun 2016 16:07:25 +0000 Message-ID: <2601191342CEEE43887BDE71AB97725836B6D17A@irsmsx105.ger.corp.intel.com> References: <1465374688-11729-1-git-send-email-adrien.mazarguil@6wind.com> <2601191342CEEE43887BDE71AB97725836B6CE18@irsmsx105.ger.corp.intel.com> <20160608122723.GK7621@6wind.com> <2601191342CEEE43887BDE71AB97725836B6CFDD@irsmsx105.ger.corp.intel.com> <20160608135751.GM7621@6wind.com> <57582797.8050300@6wind.com> In-Reply-To: <57582797.8050300@6wind.com> Accept-Language: en-IE, en-US Content-Language: en-US X-MS-Has-Attach: X-MS-TNEF-Correlator: x-originating-ip: [163.33.239.181] Content-Type: text/plain; charset="us-ascii" Content-Transfer-Encoding: quoted-printable MIME-Version: 1.0 Subject: Re: [dpdk-dev] [PATCH] mbuf: remove inconsistent assert statements X-BeenThere: dev@dpdk.org X-Mailman-Version: 2.1.15 Precedence: list List-Id: patches and discussions about DPDK List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 08 Jun 2016 16:08:03 -0000 Hi Olivier, >=20 > Hi Adrien, Konstantin, >=20 > I'm jumping in this (interesting) discussion. I already talked a > bit with Adrien in point to point, and I think its patch is valid. > Please see some comments below. >=20 >=20 > On 06/08/2016 03:57 PM, Adrien Mazarguil wrote: > > On Wed, Jun 08, 2016 at 01:09:18PM +0000, Ananyev, Konstantin wrote: > >>> > >>> Hi Konstantin, > >>> > >>> On Wed, Jun 08, 2016 at 10:34:17AM +0000, Ananyev, Konstantin wrote: > >>>> Hi Adrien, > >>>> > >>>>> > >>>>> An assertion failure occurs in __rte_mbuf_raw_free() (called by a f= ew PMDs) > >>>>> when compiling DPDK with CONFIG_RTE_LOG_LEVEL=3DRTE_LOG_DEBUG and s= tarting > >>>>> applications with a log level high enough to trigger it. > >>>>> > >>>>> While rte_mbuf_raw_alloc() sets refcount to 1, __rte_mbuf_raw_free(= ) > >>>>> expects it to be 0. > >>>>> Considering users are not expected to reset the > >>>>> reference count to satisfy assert() and that raw functions are desi= gned on > >>>>> purpose without safety belts, remove these checks. > >>>> > >>>> Yes, it refcnt supposed to be set to 0 by __rte_pktmbuf_prefree_seg(= ). > >>>> Wright now, it is a user responsibility to make sure refcnt=3D=3D0 b= efore pushing > >>>> mbuf back to the pool. > >>>> Not sure why do you consider that wrong? > >>> > >>> I do not consider this wrong and I'm all for using assert() to catch > >>> programming errors, however in this specific case, I think they are > >>> inconsistent and misleading. > >> > >> Honestly, I don't understand why. > >> Right now the rule of thumb is - when mbuf is in the pool, it's refcnt= should be equal zero. >=20 > What is the purpose of this? Is there some code relying on this? The whole current implementation of mbuf_free code path relies on that. Straight here:=20 if (likely(NULL !=3D (m =3D __rte_pktmbuf_prefree_seg(m)))) { m->next =3D NULL; __rte_mbuf_raw_free(m); } If we'll exclude indirect mbuf logic, all it does: if (rte_mbuf_refcnt_update(m, -1) =3D=3D 0) { m->next =3D NULL; __rte_mbuf_raw_free(m); } I.E.: decrement mbuf->refcnt. If new value of refcnt is zero, then put it back into the pool. So having ASERT(mbuf->refcnt=3D=3D0) inside __rte_mbuf_raw_free()/__rte_mbuf_raw_alloc() looks absolutely valid to me. I *has* to be zero at that point with current implementation, And if it is not then we probably have (or will have) a silent memory corru= ption. >=20 > >> Yes, as you pointed below - that rule probably can be changed to: > >> when mbuf is in the pool, it's refcnt should equal one, and that would= probably allow us > >> to speedup things a bit, but I suppose that's the matter of another ap= tch/discussion. > > > > Agreed. >=20 > I agree this is somehow another discussion, but let's dive into :) OK :) >=20 > But since [1], it is allowed to call rte_mbuf_raw_alloc() in PMDs (all > PMDs were calling an internal function before). Yes, raw_alloc is public, NP with that. > We could argue that > rte_mbuf_raw_free() should also be made public for PMDs. We could, but right now it is not. Again, as I said, user could use it on his own but he obviously has to obey the rules and do manually what __rte_pktmbuf_prefree_seg() does. At least: rte_mbuf_refcnt_set(m, 0); __rte_mbuf_raw_free(m); >=20 > As you said below, no-one promised that the free() reverts the malloc(), > but given the function names, one can legitimately expect that the > following code is valid: >=20 > m =3D __rte_mbuf_raw_alloc(); > /* do nothing here */ > __rte_mbuf_raw_free(m); Surely people could (and would) expect various things... But the reality right now is: __rte_mbuf_raw_free() is an internal function and not counterpart of __rte_mbuf_raw_alloc(). If the people don't bother to read API docs or/and the actual code, I can't see how we can help them :) >=20 > If no code relies on having the refcnt set to 0 when a mbuf is in > the pool, I suggest to relax this constraint as Adrien proposed. Why not just rename it to __rte_mbuf_raw_free_dont_use_after_raw_alloc()? To avoid any further confusion :) Seriously speaking I would prefer to leave it as it is. If you feel we have to introduce a counterpart of rte_mbuf_raw_alloc(), we can make a new public one: rte_mbuf_raw_free(stuct rte_mbuf *m) { if (rte_mbuf_refcnt_update(m, -1) =3D=3D 0) __rte_mbuf_raw_free(m); } >=20 > Then, my opinion is that the refcount should be set to 1 in > rte_pktmbuf_reset(). I don't think we need to update rte_pktmbuf_reset(), it doesn't touch refcnt at all and probably better to keep it that way. To achieve what you suggesting, we probably need to: 1) update _rte_pktmbuf_prefree_seg and rte_pktmbuf_detach() to set refcnt back to 1, i.e: static inline struct rte_mbuf* __attribute__((always_inline)) __rte_pktmbuf_prefree_seg(struct rte_mbuf *m) { __rte_mbuf_sanity_check(m, 0); if (likely(rte_mbuf_refcnt_update(m, -1) =3D=3D 0)) { /* if this is an indirect mbuf, it is detached. */ if (RTE_MBUF_INDIRECT(m)) rte_pktmbuf_detach(m); + rte_mbuf_refcnt_set(m, 1); return m; } return NULL; } 2) either: a) update mbuf constructor function, so it sets refcnt=3D1. I suppose that is easy for rte_pktmbuf_init(), but it means that al= l custom constructors should do the same. Which means possible changes in existing user code and all ABI chan= ge related hassle. b) keep rte_mbuf_raw_alloc() setting mbuf->refcnt=3D1. But then I don't see how will get any performance gain here. =20 So not sure is it really worth it. > And rte_pktmbuf_free() should not be allowed on > an uninitialized mbuf (yes, this would require some changes in PMDs). Not sure I understand you here... free() wouldn not be allowed on mbuf whose recnf=3D=3D0? But it is not allowed right now anyway. > This would open the door for bulk allocation/free in the mbuf api. Hmm and what stops us having one right now? As I know we do have rte_pktmbuf_alloc_bulk(), I don't see why we can't have rte_pktmbuf_free_bulk() right now. >=20 > This could be done in several steps: >=20 > 1/ remove these assert(), add introduce a public rte_mbuf_raw_free() > 2/ announce that rte_pktmbuf_free() won't work on uninitialized mbufs > in a future version, and ensure that all PMD are inline with this > requirement > 3/ later, move refcount initialization in rte_pktmbuf_reset() >=20 > What do you think? I still think that assert() is on a right place :) *If* we'll change mbuf free code in a way that mbufs inside the pool should have refcnt=3D=3D1, then I think we'll change it to: RTE_ASSERT(rte_mbuf_refcnt_read(m) =3D=3D 1);=20 But as I stated above, that change might cause some backward compatibility = hassle: 2.a) Or might not give us any performance gain: 2.b). >=20 > Another option is to remove the rte_mbuf_raw_alloc()/rte_mbuf_raw_free() > and use mempool calls. Don't see why we have to remove them... Basically we have a bug in PMD, but instead of fixing it, you guys suggest to change mbuf code. Sounds a bit strange to me. Why not just make for these PMDs to call rte_mempool_put() directly, if you= are sure it is safe here? > But having a mbuf wrapper does not seem a bad > thing to me. We can add some extra wrapper then, something like: #define __RTE_MBUF_PUT_FREED(m) (rte_mempool_put((m)->pool, m)) ? Konstantin >=20 > By the way, __rte_pktmbuf_prefree_seg() is also an internal function. > I think we should try to update the mbuf API so that the PMDs do not > need to call these internal functions. >=20 >=20 > [1] > http://dpdk.org/browse/dpdk/commit/?id=3Dfbfd99551ca370266f4bfff58ce441cf= 5cb1203a >=20 >=20 > Regards, > Olivier >=20 >=20 > > > >>>> If the user calls __rte_mbuf_raw_free() manualy it is his responsibi= lity to make > >>>> sure mbuf's refcn=3D=3D0. > >>> > >>> Sure, however what harm does it cause (besides assert() to fail), sin= ce the > >>> allocation function sets refcount to 1? > >>> > >>> Why having the allocation function set the refcount if we are sure it= is > >>> already 0 (assert() proves it). Removing rte_mbuf_refcnt_set(m, 1) ca= n > >>> surely improve performance. > >> > >> That's' just an assert() enabled when MBUF_DEBUG is on. > >> Its sole purpose is to help troubleshoot the bugs and help to catch si= tuations > >> when someone silently updates mbufs supposed to be free. > > > > I perfectly understand and I cannot agree more with this explanation, > > however the fact these functions are not symmetrical remains an issue t= hat > > needs to be addressed somehow in my opinion. > > > >>>> BTW, why are you doing it? > >>>> The comment clearly states that the function is for internal use: > >>>> /** > >>>> * @internal Put mbuf back into its original mempool. > >>>> * The use of that function is reserved for RTE internal needs. > >>>> * Please use rte_pktmbuf_free(). > >>>> * > >>>> * @param m > >>>> * The mbuf to be freed. > >>>> */ > >>>> static inline void __attribute__((always_inline)) > >>>> __rte_mbuf_raw_free(struct rte_mbuf *m) > >>> > >>> Several PMDs are using it anyway (won't name names, but I know one of= them > >>> quite well). > >> > >> Then it probably is a bug in these PMDs that need to be fixed. > >> > >>> I chose to modify this code instead of its users for the > >>> following reasons: > >>> > >>> - Considering their names, these functions should be opposites and ab= le to > >>> work together like malloc()/free(). > >> > >> These are internal functions. > >> Comments in mbuf clearly state that library users shouldn't call them = directly. > >> They are written to fit internal librte_mbuf needs, and no-one ever pr= omised > >> malloc/free() compatibility here. > > > > So it cannot be provided for the sake of not providing it or is there a= good > > reason? > > > > What I meant is that since PMDs already made the mistake of using these > > functions to benefit from the improved performance, DPDK being all abou= t > > performance and stuff, let them use it as intended. Perhaps we should d= rop > > those "__" like for rte_mbuf_raw_alloc(). > > > >>> > >>> - PMDs are relying on these functions for performance reasons, we can= assume > >>> they took the extra care necessary to make sure it would work prope= rly. > >> > >> That just doesn't seem correct to me. > >> The proper way to do free fo mbuf segment is: > >> > >> static inline void __attribute__((always_inline)) > >> rte_pktmbuf_free_seg(struct rte_mbuf *m) > >> { > >> if (likely(NULL !=3D (m =3D __rte_pktmbuf_prefree_seg(m)))) { > >> m->next =3D NULL; > >> __rte_mbuf_raw_free(m); > >> } > >> } > >> > >> If by some reason you choose not to use this function, then it is your > >> responsibility to perform similar actions on your own before pushing m= buf into the pool. > >> That's what some TX functions for some Intel NICs do to improve perfor= mance: > >> they call _prefree_seg() manually and try to put mbufs into the pool i= n groups. > > > > Not anymore it seems, but in the current code base both ena and mpipe P= MDs > > (soon mlx5 as well) seem to get this wrong. > > > >>> - Preventing it would make these PMDs slower and is not acceptable ei= ther. > >> > >> I can hardly imagine that __rte_pktmbuf_prefree_seg() impact would be = that severe... > >> But ok, probably you do have some very specific case, but then why yo= u PMD just doesn't call: > >> rte_mempool_put(m->pool, m); > >> directly? > > > > To survive the upcoming transition to mbufs backed by libc malloc() wit= hout > > having to fix them? Joke aside, I guess the reason is to use functions = with > > "mbuf" in their names when dealing with mbufs. > > > >> Why instead you choose to change common functions and compromise > >> librte_mbuf debug ability? > > > > No, I'm fine with keeping the debug ability, however I did not find a w= ay to > > both keep it and fix the consistency issue without breaking something > > (performance or refcount assumptions I'm not familiar with elsewhere). > > > >>> What remains is the consistency issue, I think these statements were = only > >>> added to catch multiple frees, > >> > >> Yes these asserts() here to help catch bugs, > >> and I think it is a good thing to have them here. > >> > >>> and those should be caught at a higher > >>> level, where other consistency checks are also performed. > >> > >> Like where? > > > > Possibly rte_pktmbuf_free_seg(). > > > >>>>> Signed-off-by: Adrien Mazarguil > >>>>> --- > >>>>> lib/librte_mbuf/rte_mbuf.h | 2 -- > >>>>> 1 file changed, 2 deletions(-) > >>>>> > >>>>> diff --git a/lib/librte_mbuf/rte_mbuf.h b/lib/librte_mbuf/rte_mbuf.= h > >>>>> index 11fa06d..7070bb8 100644 > >>>>> --- a/lib/librte_mbuf/rte_mbuf.h > >>>>> +++ b/lib/librte_mbuf/rte_mbuf.h > >>>>> @@ -1108,7 +1108,6 @@ static inline struct rte_mbuf *rte_mbuf_raw_a= lloc(struct rte_mempool *mp) > >>>>> if (rte_mempool_get(mp, &mb) < 0) > >>>>> return NULL; > >>>>> m =3D (struct rte_mbuf *)mb; > >>>>> - RTE_ASSERT(rte_mbuf_refcnt_read(m) =3D=3D 0); > >>>>> rte_mbuf_refcnt_set(m, 1); > >>>>> __rte_mbuf_sanity_check(m, 0); > >>>>> > >>>>> @@ -1133,7 +1132,6 @@ __rte_mbuf_raw_alloc(struct rte_mempool *mp) > >>>>> static inline void __attribute__((always_inline)) > >>>>> __rte_mbuf_raw_free(struct rte_mbuf *m) > >>>>> { > >>>>> - RTE_ASSERT(rte_mbuf_refcnt_read(m) =3D=3D 0); > >>>>> rte_mempool_put(m->pool, m); > >>>>> } > >>>>> > >>>>> -- > >>>>> 2.1.4 > >>>> > >>> > >>> -- > >>> Adrien Mazarguil > >>> 6WIND > >