From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from dpdk.org (dpdk.org [92.243.14.124]) by inbox.dpdk.org (Postfix) with ESMTP id 4251FA04DD; Thu, 5 Nov 2020 14:24:43 +0100 (CET) Received: from [92.243.14.124] (localhost [127.0.0.1]) by dpdk.org (Postfix) with ESMTP id C730DBE9B; Thu, 5 Nov 2020 14:24:41 +0100 (CET) Received: from mail-wm1-f68.google.com (mail-wm1-f68.google.com [209.85.128.68]) by dpdk.org (Postfix) with ESMTP id CD501BE89 for ; Thu, 5 Nov 2020 14:24:40 +0100 (CET) Received: by mail-wm1-f68.google.com with SMTP id d142so1582603wmd.4 for ; Thu, 05 Nov 2020 05:24:40 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=6wind.com; s=google; h=date:from:to:cc:subject:message-id:references:mime-version :content-disposition:in-reply-to:user-agent; bh=+dM7ldvRr9kEBfOQUMSUpQ+bden5XYuTLlpoChgmH48=; b=DXgsEyd20hjdw/rjgN/4rG7z/c91q9O/wi07UtGdPttowFLSKrlgV3t1mKkb56Vk6z jyIfnFa+DEd7V2vF8hiGpojP2H0gTacFrNZF0HBLnWCHxe3O9U++7vyno17qWpt90Z5a Js//sv+nmVVBRBbKuDGdUi76fq3/F+qL816zTjpacCSVorGhSPNt1YRGvkDxwqFtNpON 9PFqrylQp7ErS9D0vq2PJMLs95ED5IywXQh2RpC9ros2j+89IrPi0HAtXdjc6cHyPQnZ Mt508ngkWeDCoXqloLTI2dKPK6VsKndCIMuvqTsiGSCNZ0oUOo5xENF/s77rL/K1SURt oVxw== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:date:from:to:cc:subject:message-id:references :mime-version:content-disposition:in-reply-to:user-agent; bh=+dM7ldvRr9kEBfOQUMSUpQ+bden5XYuTLlpoChgmH48=; b=obLLB3XJJ6wCe4RM0iYCK++wMswjmosWseWhlNvXPEidlT1E61bCxDu75VFhBLSP6I 3f8LCCYGvSnsDUGImAXZkyckT3VDGg0EpQm8hFFKASxesr2PpEXlPi+IxQMHC17NaIy6 ZaRgFaHobmtInLBCjn2RUjm4Erq//B5f1LVBbbPdKOmyXU7yTLoK742xvjH2yLIaLHb7 Eest49y2efG0AxSC3U4cyOEAyA1cdk7O1FciownEJJcAedl14QJkGJO5iLaM7PagVBue f7evMk9Jl0rJUl1SYv8lBrMgfYTFdeSV9VbtPGMyemM/8aDX+00N+Z8EA+ozHwyvGPGj /STQ== X-Gm-Message-State: AOAM530Z92kHgv3lvxKeynARtvaDOOJItCdZGchbWu47Ltn5h6+LHM5Z aj4nhRQxSK0tDqNuovhsP6WYZA== X-Google-Smtp-Source: ABdhPJwgHn5oVhoBxfztVhUNqRm8If2y0/ZWtkta2G/nvvh04gCovmAae7+pTAv/zcApSjMdGbLk2w== X-Received: by 2002:a05:600c:2254:: with SMTP id a20mr2760768wmm.117.1604582679451; Thu, 05 Nov 2020 05:24:39 -0800 (PST) Received: from 6wind.com (2a01cb0c0005a600345636f7e65ed1a0.ipv6.abo.wanadoo.fr. [2a01:cb0c:5:a600:3456:36f7:e65e:d1a0]) by smtp.gmail.com with ESMTPSA id x81sm2732015wmg.5.2020.11.05.05.24.38 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Thu, 05 Nov 2020 05:24:38 -0800 (PST) Date: Thu, 5 Nov 2020 14:24:38 +0100 From: Olivier Matz To: "Ananyev, Konstantin" Cc: Andrew Rybchenko , "dev@dpdk.org" Message-ID: <20201105132438.GS1898@platinum> References: <20201104170007.8026-1-olivier.matz@6wind.com> <20201105074626.GL1898@platinum> <10b675b5-c376-746f-9ae2-210685b0b05d@oktetlabs.ru> <20201105091022.GO1898@platinum> <20201105123150.GP1898@platinum> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: User-Agent: Mutt/1.10.1 (2018-07-13) Subject: Re: [dpdk-dev] [PATCH] mbuf: fix reset on mbuf free X-BeenThere: dev@dpdk.org X-Mailman-Version: 2.1.15 Precedence: list List-Id: DPDK patches and discussions List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: dev-bounces@dpdk.org Sender: "dev" On Thu, Nov 05, 2020 at 01:14:07PM +0000, Ananyev, Konstantin wrote: > > > > > > On Thu, Nov 05, 2020 at 11:34:18AM +0000, Ananyev, Konstantin wrote: > > > > > > > > > > On Thu, Nov 05, 2020 at 11:26:51AM +0300, Andrew Rybchenko wrote: > > > > > On 11/5/20 10:46 AM, Olivier Matz wrote: > > > > > > On Thu, Nov 05, 2020 at 12:15:49AM +0000, Ananyev, Konstantin wrote: > > > > > >> > > > > > >> Hi Olivier, > > > > > >> > > > > > >>> m->nb_seg must be reset on mbuf free whatever the value of m->next, > > > > > >>> because it can happen that m->nb_seg is != 1. For instance in this > > > > > >>> case: > > > > > >>> > > > > > >>> m1 = rte_pktmbuf_alloc(mp); > > > > > >>> rte_pktmbuf_append(m1, 500); > > > > > >>> m2 = rte_pktmbuf_alloc(mp); > > > > > >>> rte_pktmbuf_append(m2, 500); > > > > > >>> rte_pktmbuf_chain(m1, m2); > > > > > >>> m0 = rte_pktmbuf_alloc(mp); > > > > > >>> rte_pktmbuf_append(m0, 500); > > > > > >>> rte_pktmbuf_chain(m0, m1); > > > > > >>> > > > > > >>> As rte_pktmbuf_chain() does not reset nb_seg in the initial m1 > > > > > >>> segment (this is not required), after this code the mbuf chain > > > > > >>> have 3 segments: > > > > > >>> - m0: next=m1, nb_seg=3 > > > > > >>> - m1: next=m2, nb_seg=2 > > > > > >>> - m2: next=NULL, nb_seg=1 > > > > > >>> > > > > > >>> Freeing this mbuf chain will not restore nb_seg=1 in the second > > > > > >>> segment. > > > > > >> > > > > > >> Hmm, not sure why is that? > > > > > >> You are talking about freeing m1, right? > > > > > >> rte_pktmbuf_prefree_seg(struct rte_mbuf *m) > > > > > >> { > > > > > >> ... > > > > > >> if (m->next != NULL) { > > > > > >> m->next = NULL; > > > > > >> m->nb_segs = 1; > > > > > >> } > > > > > >> > > > > > >> m1->next != NULL, so it will enter the if() block, > > > > > >> and will reset both next and nb_segs. > > > > > >> What I am missing here? > > > > > >> Thinking in more generic way, that change: > > > > > >> - if (m->next != NULL) { > > > > > >> - m->next = NULL; > > > > > >> - m->nb_segs = 1; > > > > > >> - } > > > > > >> + m->next = NULL; > > > > > >> + m->nb_segs = 1; > > > > > > > > > > > > Ah, sorry. I oversimplified the example and now it does not > > > > > > show the issue... > > > > > > > > > > > > The full example also adds a split() to break the mbuf chain > > > > > > between m1 and m2. The kind of thing that would be done for > > > > > > software TCP segmentation. > > > > > > > > > > > > > > > > If so, may be the right solution is to care about nb_segs > > > > > when next is set to NULL on split? Any place when next is set > > > > > to NULL. Just to keep the optimization in a more generic place. > > > > > > > > > > The problem with that approach is that there are already several > > > > existing split() or trim() implementations in different DPDK-based > > > > applications. For instance, we have some in 6WINDGate. If we force > > > > applications to set nb_seg to 1 when resetting next, it has to be > > > > documented because it is not straightforward. > > > > > > I think it is better to go that way. > > > From my perspective it seems natural to reset nb_seg at same time > > > we reset next, otherwise inconsistency will occur. > > > > While it is not explicitly stated for nb_segs, to me it was clear that > > nb_segs is only valid in the first segment, like for many fields (port, > > ol_flags, vlan, rss, ...). > > > > If we say that nb_segs has to be valid in any segments, it means that > > chain() or split() will have to update it in all segments, which is not > > efficient. > > Why in all? > We can state that nb_segs on non-first segment should always equal 1. > As I understand in that case, both split() and chain() have to update nb_segs > only for head mbufs, rest ones will remain untouched. Well, anyway, I think it's strange to have a constraint on m->nb_segs for non-first segment. We don't have that kind of constraints for other fields. > > > > > Saying that nb_segs has to be valid for the first and last segment seems > > really odd to me. What is the logic behind that except keeping this test > > as is? > > > > In any case, it has to be better documented. > > > > > > Olivier > > > > > > > > I think the approach from > > > > this patch is safer. > > > > > > It might be easier from perspective that changes in less places are required, > > > Though I think that this patch will introduce some performance drop. > > > As now each mbuf_prefree_seg() will cause update of 2 cache lines unconditionally. > > > > > > > By the way, for 21.11, if we are able to do some optimizations and have > > > > both pool (index?) and next in the first cache line, we may reconsider > > > > the fact that next and nb_segs are already set for new allocated mbufs, > > > > because it is not straightforward either. > > > > > > My suggestion - let's put future optimization discussion aside for now, > > > and concentrate on that particular patch. > > > > > > > > > > > > > After this operation, we have 2 mbuf chain: > > > > > > - m0 with 2 segments, the last one has next=NULL but nb_seg=2 > > > > > > - new_m with 1 segment > > > > > > > > > > > > Freeing m0 will not restore nb_seg=1 in the second segment. > > > > > > > > > > > >> Assumes that it is ok to have an mbuf with > > > > > >> nb_seg > 1 and next == NULL. > > > > > >> Which seems wrong to me. > > > > > > > > > > > > I don't think it is wrong: nb_seg is just ignored when not in the first > > > > > > segment, and there is nothing saying it should be set to 1. Typically, > > > > > > rte_pktmbuf_chain() does not change it, and I guess it's the same for > > > > > > many similar functions in applications. > > > > > > > > > > > > Olivier > > > > > > > > > > > >> > > > > > >> > > > > > >>> This is expected that mbufs stored in pool have their > > > > > >>> nb_seg field set to 1. > > > > > >>> > > > > > >>> Fixes: 8f094a9ac5d7 ("mbuf: set mbuf fields while in pool") > > > > > >>> Cc: stable@dpdk.org > > > > > >>> > > > > > >>> Signed-off-by: Olivier Matz > > > > > >>> --- > > > > > >>> lib/librte_mbuf/rte_mbuf.c | 6 ++---- > > > > > >>> lib/librte_mbuf/rte_mbuf.h | 12 ++++-------- > > > > > >>> 2 files changed, 6 insertions(+), 12 deletions(-) > > > > > >>> > > > > > >>> diff --git a/lib/librte_mbuf/rte_mbuf.c b/lib/librte_mbuf/rte_mbuf.c > > > > > >>> index 8a456e5e64..e632071c23 100644 > > > > > >>> --- a/lib/librte_mbuf/rte_mbuf.c > > > > > >>> +++ b/lib/librte_mbuf/rte_mbuf.c > > > > > >>> @@ -129,10 +129,8 @@ rte_pktmbuf_free_pinned_extmem(void *addr, void *opaque) > > > > > >>> > > > > > >>> rte_mbuf_ext_refcnt_set(m->shinfo, 1); > > > > > >>> m->ol_flags = EXT_ATTACHED_MBUF; > > > > > >>> - if (m->next != NULL) { > > > > > >>> - m->next = NULL; > > > > > >>> - m->nb_segs = 1; > > > > > >>> - } > > > > > >>> + m->next = NULL; > > > > > >>> + m->nb_segs = 1; > > > > > >>> rte_mbuf_raw_free(m); > > > > > >>> } > > > > > >>> > > > > > >>> diff --git a/lib/librte_mbuf/rte_mbuf.h b/lib/librte_mbuf/rte_mbuf.h > > > > > >>> index a1414ed7cd..ef5800c8ef 100644 > > > > > >>> --- a/lib/librte_mbuf/rte_mbuf.h > > > > > >>> +++ b/lib/librte_mbuf/rte_mbuf.h > > > > > >>> @@ -1329,10 +1329,8 @@ rte_pktmbuf_prefree_seg(struct rte_mbuf *m) > > > > > >>> return NULL; > > > > > >>> } > > > > > >>> > > > > > >>> - if (m->next != NULL) { > > > > > >>> - m->next = NULL; > > > > > >>> - m->nb_segs = 1; > > > > > >>> - } > > > > > >>> + m->next = NULL; > > > > > >>> + m->nb_segs = 1; > > > > > >>> > > > > > >>> return m; > > > > > >>> > > > > > >>> @@ -1346,10 +1344,8 @@ rte_pktmbuf_prefree_seg(struct rte_mbuf *m) > > > > > >>> return NULL; > > > > > >>> } > > > > > >>> > > > > > >>> - if (m->next != NULL) { > > > > > >>> - m->next = NULL; > > > > > >>> - m->nb_segs = 1; > > > > > >>> - } > > > > > >>> + m->next = NULL; > > > > > >>> + m->nb_segs = 1; > > > > > >>> rte_mbuf_refcnt_set(m, 1); > > > > > >>> > > > > > >>> return m; > > > > > >>> -- > > > > > >>> 2.25.1 > > > > > >> > > > > >