From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from dpdk.org (dpdk.org [92.243.14.124]) by inbox.dpdk.org (Postfix) with ESMTP id 1700AA0524; Fri, 6 Nov 2020 11:04:44 +0100 (CET) Received: from [92.243.14.124] (localhost [127.0.0.1]) by dpdk.org (Postfix) with ESMTP id C11D35A62; Fri, 6 Nov 2020 11:04:42 +0100 (CET) Received: from mail-wr1-f68.google.com (mail-wr1-f68.google.com [209.85.221.68]) by dpdk.org (Postfix) with ESMTP id A18D15946 for ; Fri, 6 Nov 2020 11:04:40 +0100 (CET) Received: by mail-wr1-f68.google.com with SMTP id a3so655441wrx.13 for ; Fri, 06 Nov 2020 02:04:40 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=6wind.com; s=google; h=date:from:to:cc:subject:message-id:references:mime-version :content-disposition:content-transfer-encoding:in-reply-to :user-agent; bh=T02VzGP/1ajA/yHvbZ9caQHX6ZifC8UPWVlku8bMFt4=; b=NpidNLlKrOEl6/2MfV8S3IOTbQR7KgdlGwhSnSy2dyyvDT+RTPJTfr80gryN4/PGPT 4bSfW/gIBVXWYTM80rUj1LCImhufzmx3k9JtbQmScTOUJK6781OTyZgv/AJJe1+9yRt1 5usiAQx1FX0fo08FTaqU5P0+IIGmQC3MD5ZpdZIdmsjY9/qj7w0Tkgi9y618WlMIVtT2 1uwC+SporUNmiqDnendR8oERdA6tUvx28DLOUpd9ioiZruM2UlsjFGXhK1a7h5RXSYcP zCGXysH1o5l1EG+FxcDZTuVA1arxIPyXVQu91Bg0IW2PFWfdurjnWRhRoKFgBwDw0Wa/ V0IQ== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:date:from:to:cc:subject:message-id:references :mime-version:content-disposition:content-transfer-encoding :in-reply-to:user-agent; bh=T02VzGP/1ajA/yHvbZ9caQHX6ZifC8UPWVlku8bMFt4=; b=B6sO7ooaqT14kdNrPMWDJG/Ek80VGV8WAsuS3X2a03Zitb3W08ICyyiXrT/DK4cbIU OC3VhrDYqKQnVQQ3HCA6rpTYZ/9qNBGMVWBUR1nX0bD4QMyUL8It6VsQd6ElXMFUNZHo BCsoD9KTMdjUCzLLWRBzRb/02fE+F47yu30KWawLKnB1hJEwQta1v9IwfIhOmNtd1eEn jB6SD1TT7Zfn5phGiZkUBelQtkFsi6Y8IP3acRUDSoLFjvN4QVb6jSMHnY3nEbr6FwZ+ +LdNAExMx37Uczap06WxVGV7CcEbsqyxVl2msyzlb3Vwmni9zKML3ufo+NMNoCxuiGTw VcQg== X-Gm-Message-State: AOAM532zxU+w0Gc8rYoiexrnbNoKamvSdSfustuX8jzycMZ98xFEl6wS 5no2UMKsB653lerDP0Vgb4PBZg== X-Google-Smtp-Source: ABdhPJyjqkrcInNyx36EHFYCpAx+HSnYrpHGWoQ0wAf8xcLtkbANZODMXI1TI6cO393F9pTwfv0/lg== X-Received: by 2002:a5d:498a:: with SMTP id r10mr1823157wrq.106.1604657079189; Fri, 06 Nov 2020 02:04:39 -0800 (PST) Received: from 6wind.com (2a01cb0c0005a600345636f7e65ed1a0.ipv6.abo.wanadoo.fr. [2a01:cb0c:5:a600:3456:36f7:e65e:d1a0]) by smtp.gmail.com with ESMTPSA id m1sm1473746wme.48.2020.11.06.02.04.38 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Fri, 06 Nov 2020 02:04:38 -0800 (PST) Date: Fri, 6 Nov 2020 11:04:37 +0100 From: Olivier Matz To: Morten =?iso-8859-1?Q?Br=F8rup?= Cc: "Ananyev, Konstantin" , Andrew Rybchenko , dev@dpdk.org Message-ID: <20201106100437.GA1898@platinum> References: <20201105123150.GP1898@platinum> <20201105132438.GS1898@platinum> <98CBD80474FA8B44BF855DF32C47DC35C613F7@smartserver.smartshare.dk> <98CBD80474FA8B44BF855DF32C47DC35C613F8@smartserver.smartshare.dk> <20201106082053.GZ1898@platinum> <98CBD80474FA8B44BF855DF32C47DC35C613FA@smartserver.smartshare.dk> MIME-Version: 1.0 Content-Type: text/plain; charset=iso-8859-1 Content-Disposition: inline Content-Transfer-Encoding: 8bit In-Reply-To: <98CBD80474FA8B44BF855DF32C47DC35C613FA@smartserver.smartshare.dk> User-Agent: Mutt/1.10.1 (2018-07-13) Subject: Re: [dpdk-dev] [PATCH] mbuf: fix reset on mbuf free X-BeenThere: dev@dpdk.org X-Mailman-Version: 2.1.15 Precedence: list List-Id: DPDK patches and discussions List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: dev-bounces@dpdk.org Sender: "dev" On Fri, Nov 06, 2020 at 09:50:45AM +0100, Morten Brørup wrote: > > From: Olivier Matz [mailto:olivier.matz@6wind.com] > > Sent: Friday, November 6, 2020 9:21 AM > > > > On Fri, Nov 06, 2020 at 08:52:58AM +0100, Morten Brørup wrote: > > > > From: Ananyev, Konstantin [mailto:konstantin.ananyev@intel.com] > > > > Sent: Friday, November 6, 2020 12:55 AM > > > > > > > > > > > > > > > > >> Hi Olivier, > > > > > > > > > > > > >> > > > > > > > > > > > > >>> m->nb_seg must be reset on mbuf free whatever > > the > > > > value > > > > > > of m->next, > > > > > > > > > > > > >>> because it can happen that m->nb_seg is != 1. > > For > > > > > > instance in this > > > > > > > > > > > > >>> case: > > > > > > > > > > > > >>> > > > > > > > > > > > > >>> m1 = rte_pktmbuf_alloc(mp); > > > > > > > > > > > > >>> rte_pktmbuf_append(m1, 500); > > > > > > > > > > > > >>> m2 = rte_pktmbuf_alloc(mp); > > > > > > > > > > > > >>> rte_pktmbuf_append(m2, 500); > > > > > > > > > > > > >>> rte_pktmbuf_chain(m1, m2); > > > > > > > > > > > > >>> m0 = rte_pktmbuf_alloc(mp); > > > > > > > > > > > > >>> rte_pktmbuf_append(m0, 500); > > > > > > > > > > > > >>> rte_pktmbuf_chain(m0, m1); > > > > > > > > > > > > >>> > > > > > > > > > > > > >>> As rte_pktmbuf_chain() does not reset nb_seg in > > the > > > > > > initial m1 > > > > > > > > > > > > >>> segment (this is not required), after this code > > the > > > > > > mbuf chain > > > > > > > > > > > > >>> have 3 segments: > > > > > > > > > > > > >>> - m0: next=m1, nb_seg=3 > > > > > > > > > > > > >>> - m1: next=m2, nb_seg=2 > > > > > > > > > > > > >>> - m2: next=NULL, nb_seg=1 > > > > > > > > > > > > >>> > > > > > > > > > > > > >>> Freeing this mbuf chain will not restore > > nb_seg=1 > > > > in > > > > > > the second > > > > > > > > > > > > >>> segment. > > > > > > > > > > > > >> > > > > > > > > > > > > >> Hmm, not sure why is that? > > > > > > > > > > > > >> You are talking about freeing m1, right? > > > > > > > > > > > > >> rte_pktmbuf_prefree_seg(struct rte_mbuf *m) > > > > > > > > > > > > >> { > > > > > > > > > > > > >> ... > > > > > > > > > > > > >> if (m->next != NULL) { > > > > > > > > > > > > >> m->next = NULL; > > > > > > > > > > > > >> m->nb_segs = 1; > > > > > > > > > > > > >> } > > > > > > > > > > > > >> > > > > > > > > > > > > >> m1->next != NULL, so it will enter the if() > > block, > > > > > > > > > > > > >> and will reset both next and nb_segs. > > > > > > > > > > > > >> What I am missing here? > > > > > > > > > > > > >> Thinking in more generic way, that change: > > > > > > > > > > > > >> - if (m->next != NULL) { > > > > > > > > > > > > >> - m->next = NULL; > > > > > > > > > > > > >> - m->nb_segs = 1; > > > > > > > > > > > > >> - } > > > > > > > > > > > > >> + m->next = NULL; > > > > > > > > > > > > >> + m->nb_segs = 1; > > > > > > > > > > > > > > > > > > > > > > > > > > Ah, sorry. I oversimplified the example and now > > it > > > > does > > > > > > not > > > > > > > > > > > > > show the issue... > > > > > > > > > > > > > > > > > > > > > > > > > > The full example also adds a split() to break the > > > > mbuf > > > > > > chain > > > > > > > > > > > > > between m1 and m2. The kind of thing that would > > be > > > > done > > > > > > for > > > > > > > > > > > > > software TCP segmentation. > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > If so, may be the right solution is to care about > > > > nb_segs > > > > > > > > > > > > when next is set to NULL on split? Any place when > > next > > > > is > > > > > > set > > > > > > > > > > > > to NULL. Just to keep the optimization in a more > > > > generic > > > > > > place. > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > The problem with that approach is that there are > > already > > > > > > several > > > > > > > > > > > existing split() or trim() implementations in > > different > > > > DPDK- > > > > > > based > > > > > > > > > > > applications. For instance, we have some in > > 6WINDGate. If > > > > we > > > > > > force > > > > > > > > > > > applications to set nb_seg to 1 when resetting next, > > it > > > > has > > > > > > to be > > > > > > > > > > > documented because it is not straightforward. > > > > > > > > > > > > > > > > > > > > I think it is better to go that way. > > > > > > > > > > From my perspective it seems natural to reset nb_seg at > > > > same > > > > > > time > > > > > > > > > > we reset next, otherwise inconsistency will occur. > > > > > > > > > > > > > > > > > > While it is not explicitly stated for nb_segs, to me it > > was > > > > clear > > > > > > that > > > > > > > > > nb_segs is only valid in the first segment, like for many > > > > fields > > > > > > (port, > > > > > > > > > ol_flags, vlan, rss, ...). > > > > > > > > > > > > > > > > > > If we say that nb_segs has to be valid in any segments, > > it > > > > means > > > > > > that > > > > > > > > > chain() or split() will have to update it in all > > segments, > > > > which > > > > > > is not > > > > > > > > > efficient. > > > > > > > > > > > > > > > > Why in all? > > > > > > > > We can state that nb_segs on non-first segment should > > always > > > > equal > > > > > > 1. > > > > > > > > As I understand in that case, both split() and chain() have > > to > > > > > > update nb_segs > > > > > > > > only for head mbufs, rest ones will remain untouched. > > > > > > > > > > > > > > Well, anyway, I think it's strange to have a constraint on m- > > > > >nb_segs > > > > > > for > > > > > > > non-first segment. We don't have that kind of constraints for > > > > other > > > > > > fields. > > > > > > > > > > > > True, we don't. But this is one of the fields we consider > > critical > > > > > > for proper work of mbuf alloc/free mechanism. > > > > > > > > > > > > > > > > I am not sure that requiring m->nb_segs == 1 on non-first > > segments > > > > will provide any benefits. > > > > > > > > It would make this patch unneeded. > > > > So, for direct, non-segmented mbufs pktmbuf_free() will remain > > write- > > > > free. > > > > > > I see. Then I agree with Konstantin that alternative solutions should > > be considered. > > > > > > The benefit regarding free()'ing non-segmented mbufs - which is a > > very common operation - certainly outweighs the cost of requiring > > split()/chain() operations to set the new head mbuf's nb_segs = 1. > > > > > > Nonetheless, the bug needs to be fixed somehow. > > > > > > If we can't come up with a better solution that doesn't break the > > ABI, we are forced to accept the patch. > > > > > > Unless the techboard accepts to break the ABI in order to avoid the > > performance cost of this patch. > > > > Did someone notice a performance drop with this patch? > > On my side, I don't see any regression on a L3 use case. > > I am afraid that the DPDK performance regression tests are based on TX immediately following RX, so cache misses in TX may go by unnoticed because RX warmed up the cache for TX already. And similarly for RX reusing mbufs that have been warmed up by the preceding free() at TX. > > Please consider testing the performance difference with the mbuf being completely cold at TX, and going completely cold again before being reused for RX. > > > > > Let's sumarize: splitting a mbuf chain and freeing it causes subsequent > > mbuf > > allocation to return a mbuf which is not correctly initialized. There > > are 2 > > options to fix it: > > > > 1/ change the mbuf free function (this patch) > > > > - m->nb_segs would behave like many other field: valid in the first > > segment, ignored in other segments > > - may impact performance (suspected) > > > > 2/ change all places where a mbuf chain is split, or trimmed > > > > - m->nb_segs would have a specific behavior: count the number of > > segments in the first mbuf, should be 1 in the last segment, > > ignored in other ones. > > - no code change in mbuf library, so no performance impact > > - need to patch all places where we do a mbuf split or trim. From > > afar, > > I see at least mbuf_cut_seg_ofs() in DPDK. Some external > > applications > > may have to be patched (for instance, I already found 3 places in > > 6WIND code base without a deep search). > > > > In my opinion, 1/ is better, except we notice a significant > > performance, > > because the (implicit) behavior is unchanged. > > > > Whatever the solution, some documentation has to be added. > > > > Olivier > > > > Unfortunately, I don't think that anything but the first option will go into 20.11 and stable releases of older versions, so I stand by my acknowledgment of the patch. If we are affraid about 20.11 performance (it is legitimate, few days before the release), we can target 21.02. After all, everybody lives with this bug since 2017, so there is no urgency. If accepted and well tested, it can be backported in stable branches. > Which reminds me: Consider reporting the bug in Bugzilla. > > > > > > > > > > > > > > > > > > > > E.g. the second segment of a three-segment chain will still have > > m- > > > > >next != NULL, so it cannot be used as a gate to prevent accessing > > m- > > > > > >next. > > > > > > > > > > I might have overlooked something, though. > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > Saying that nb_segs has to be valid for the first and > > last > > > > > > segment seems > > > > > > > > > really odd to me. What is the logic behind that except > > > > keeping > > > > > > this test > > > > > > > > > as is? > > > > > > > > > > > > > > > > > > In any case, it has to be better documented. > > > > > > > > > > > > > > > > > > > > > > > > > > > Olivier > > > > > > > > > > > > > > > > > > > > > > > > > > > > > I think the approach from > > > > > > > > > > > this patch is safer. > > > > > > > > > > > > > > > > > > > > It might be easier from perspective that changes in > > less > > > > places > > > > > > are required, > > > > > > > > > > Though I think that this patch will introduce some > > > > performance > > > > > > drop. > > > > > > > > > > As now each mbuf_prefree_seg() will cause update of 2 > > cache > > > > > > lines unconditionally. > > > > > > > > > > > > > > > > > > > > > By the way, for 21.11, if we are able to do some > > > > > > optimizations and have > > > > > > > > > > > both pool (index?) and next in the first cache line, > > we > > > > may > > > > > > reconsider > > > > > > > > > > > the fact that next and nb_segs are already set for > > new > > > > > > allocated mbufs, > > > > > > > > > > > because it is not straightforward either. > > > > > > > > > > > > > > > > > > > > My suggestion - let's put future optimization > > discussion > > > > aside > > > > > > for now, > > > > > > > > > > and concentrate on that particular patch. > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > After this operation, we have 2 mbuf chain: > > > > > > > > > > > > > - m0 with 2 segments, the last one has next=NULL > > but > > > > > > nb_seg=2 > > > > > > > > > > > > > - new_m with 1 segment > > > > > > > > > > > > > > > > > > > > > > > > > > Freeing m0 will not restore nb_seg=1 in the > > second > > > > > > segment. > > > > > > > > > > > > > > > > > > > > > > > > > >> Assumes that it is ok to have an mbuf with > > > > > > > > > > > > >> nb_seg > 1 and next == NULL. > > > > > > > > > > > > >> Which seems wrong to me. > > > > > > > > > > > > > > > > > > > > > > > > > > I don't think it is wrong: nb_seg is just ignored > > > > when > > > > > > not in the first > > > > > > > > > > > > > segment, and there is nothing saying it should be > > set > > > > to > > > > > > 1. Typically, > > > > > > > > > > > > > rte_pktmbuf_chain() does not change it, and I > > guess > > > > it's > > > > > > the same for > > > > > > > > > > > > > many similar functions in applications. > > > > > > > > > > > > > > > > > > > > > > > > > > Olivier > > > > > > > > > > > > > > > > > > > > > > > > > >> > > > > > > > > > > > > >> > > > > > > > > > > > > >>> This is expected that mbufs stored in pool have > > > > their > > > > > > > > > > > > >>> nb_seg field set to 1. > > > > > > > > > > > > >>> > > > > > > > > > > > > >>> Fixes: 8f094a9ac5d7 ("mbuf: set mbuf fields > > while > > > > in > > > > > > pool") > > > > > > > > > > > > >>> Cc: stable@dpdk.org > > > > > > > > > > > > >>> > > > > > > > > > > > > >>> Signed-off-by: Olivier Matz > > > > > > > > > > > > > > > > >>> --- > > > > > > > > > > > > >>> lib/librte_mbuf/rte_mbuf.c | 6 ++---- > > > > > > > > > > > > >>> lib/librte_mbuf/rte_mbuf.h | 12 ++++-------- > > > > > > > > > > > > >>> 2 files changed, 6 insertions(+), 12 > > deletions(-) > > > > > > > > > > > > >>> > > > > > > > > > > > > >>> diff --git a/lib/librte_mbuf/rte_mbuf.c > > > > > > b/lib/librte_mbuf/rte_mbuf.c > > > > > > > > > > > > >>> index 8a456e5e64..e632071c23 100644 > > > > > > > > > > > > >>> --- a/lib/librte_mbuf/rte_mbuf.c > > > > > > > > > > > > >>> +++ b/lib/librte_mbuf/rte_mbuf.c > > > > > > > > > > > > >>> @@ -129,10 +129,8 @@ > > > > > > rte_pktmbuf_free_pinned_extmem(void *addr, void *opaque) > > > > > > > > > > > > >>> > > > > > > > > > > > > >>> rte_mbuf_ext_refcnt_set(m->shinfo, 1); > > > > > > > > > > > > >>> m->ol_flags = EXT_ATTACHED_MBUF; > > > > > > > > > > > > >>> - if (m->next != NULL) { > > > > > > > > > > > > >>> - m->next = NULL; > > > > > > > > > > > > >>> - m->nb_segs = 1; > > > > > > > > > > > > >>> - } > > > > > > > > > > > > >>> + m->next = NULL; > > > > > > > > > > > > >>> + m->nb_segs = 1; > > > > > > > > > > > > >>> rte_mbuf_raw_free(m); > > > > > > > > > > > > >>> } > > > > > > > > > > > > >>> > > > > > > > > > > > > >>> diff --git a/lib/librte_mbuf/rte_mbuf.h > > > > > > b/lib/librte_mbuf/rte_mbuf.h > > > > > > > > > > > > >>> index a1414ed7cd..ef5800c8ef 100644 > > > > > > > > > > > > >>> --- a/lib/librte_mbuf/rte_mbuf.h > > > > > > > > > > > > >>> +++ b/lib/librte_mbuf/rte_mbuf.h > > > > > > > > > > > > >>> @@ -1329,10 +1329,8 @@ > > > > rte_pktmbuf_prefree_seg(struct > > > > > > rte_mbuf *m) > > > > > > > > > > > > >>> return NULL; > > > > > > > > > > > > >>> } > > > > > > > > > > > > >>> > > > > > > > > > > > > >>> - if (m->next != NULL) { > > > > > > > > > > > > >>> - m->next = NULL; > > > > > > > > > > > > >>> - m->nb_segs = 1; > > > > > > > > > > > > >>> - } > > > > > > > > > > > > >>> + m->next = NULL; > > > > > > > > > > > > >>> + m->nb_segs = 1; > > > > > > > > > > > > >>> > > > > > > > > > > > > >>> return m; > > > > > > > > > > > > >>> > > > > > > > > > > > > >>> @@ -1346,10 +1344,8 @@ > > > > rte_pktmbuf_prefree_seg(struct > > > > > > rte_mbuf *m) > > > > > > > > > > > > >>> return NULL; > > > > > > > > > > > > >>> } > > > > > > > > > > > > >>> > > > > > > > > > > > > >>> - if (m->next != NULL) { > > > > > > > > > > > > >>> - m->next = NULL; > > > > > > > > > > > > >>> - m->nb_segs = 1; > > > > > > > > > > > > >>> - } > > > > > > > > > > > > >>> + m->next = NULL; > > > > > > > > > > > > >>> + m->nb_segs = 1; > > > > > > > > > > > > >>> rte_mbuf_refcnt_set(m, 1); > > > > > > > > > > > > >>> > > > > > > > > > > > > >>> return m; > > > > > > > > > > > > >>> -- > > > > > > > > > > > > >>> 2.25.1 > > > > > > > > > > > > >> > > > > > > > > > > > > > > > >