From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <olivier.matz@6wind.com>
Received: from mail-wm0-f54.google.com (mail-wm0-f54.google.com [74.125.82.54])
 by dpdk.org (Postfix) with ESMTP id 3D42298
 for <dev@dpdk.org>; Tue, 21 Feb 2017 16:04:49 +0100 (CET)
Received: by mail-wm0-f54.google.com with SMTP id c85so113478866wmi.1
 for <dev@dpdk.org>; Tue, 21 Feb 2017 07:04:49 -0800 (PST)
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed;
 d=6wind-com.20150623.gappssmtp.com; s=20150623;
 h=from:date:to:cc:subject:message-id:in-reply-to:references
 :mime-version:content-transfer-encoding;
 bh=ThV/yubg4VCfgZ4TyR42g1CkgKrBT0T0USGNAL7EwiM=;
 b=sazGp1C54cE5Ng+T1HWIWRdH0weEtP9pvtIk1oZ2iSvl4kQorKHwgUiqr4WexOUmZi
 9BuVy9cx1Pg4GhypdAa/PBn3e0fn1+N30IRE67e2dGilw+bqiE20Y6/QhFbB5ESXyCIr
 Y+yCbjTC+ig20aXs3lvVaN3nmEC/iQ91mFMyHf2aFLJYLI0I9OCBxs95clObcz00gagp
 KGi7zcP9iRq9/9Bdn2qsYoIzDeAJ/f0Hp+NzyA2Cf4+cR7sy1LtBxUs9fgWlq+cDtJlu
 GKfiufC6OyFnzXz2nvhgoYIEG6JWmJl7nioyj94VI9XsKtXSua/VEB5ORuhMuVewxoac
 y2TA==
X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed;
 d=1e100.net; s=20161025;
 h=x-gm-message-state:from:date:to:cc:subject:message-id:in-reply-to
 :references:mime-version:content-transfer-encoding;
 bh=ThV/yubg4VCfgZ4TyR42g1CkgKrBT0T0USGNAL7EwiM=;
 b=a4urvI4AWi2dodUouF+YpKiN09i/YnkrMvctTGeMaxD+LCALrkeTYqCsWojb+Nse3X
 IaapgSYIV1/eC6a0uP0lEDARYpv1p6IEmkZ14BwhNUIimZRnIKHUK2phNNKX9lkEcAbW
 rNDJ+XSa1vthmNs43NAHCBqhh0BahDinArNTFOYQFUVnDMd3SurP+e9yl9wzEuNDrF3s
 Aa54B6NxJYSxKJY7fe5r8iILKOG6T4/I98xhxuP88B/5qnoZLpsvWFMV/cJKw6FTdQPI
 UNdfezGa0vEZioK0ahgL0hCftrLDV6KZeW8qpyY/jLT75zFV+gwXUF9fibPlrEbJVVRP
 UFDg==
X-Gm-Message-State: AMke39mnTNC28VIBEs6195FhmPWkJeZknqnFYTYMxK3ABRCFwtl74X9XxHxEwugqCBttQRct
X-Received: by 10.28.167.68 with SMTP id q65mr15604876wme.126.1487689488957;
 Tue, 21 Feb 2017 07:04:48 -0800 (PST)
Received: from glumotte.dev.6wind.com (host.78.145.23.62.rev.coltfrance.com.
 [62.23.145.78])
 by smtp.gmail.com with ESMTPSA id y1sm17908289wme.15.2017.02.21.07.04.48
 (version=TLS1_2 cipher=ECDHE-RSA-CHACHA20-POLY1305 bits=256/256);
 Tue, 21 Feb 2017 07:04:48 -0800 (PST)
From: Olivier MATZ <olivier.matz@6wind.com>
X-Google-Original-From: Olivier MATZ <olivier.matz@dev.6wind.com>
Date: Tue, 21 Feb 2017 16:04:40 +0100
To: Morten =?UTF-8?B?QnLDuHJ1cA==?= <mb@smartsharesystems.com>
Cc: "Olivier Matz" <olivier.matz@6wind.com>, "Bruce Richardson"
 <bruce.richardson@intel.com>, "Ananyev, Konstantin"
 <konstantin.ananyev@intel.com>, <dev@dpdk.org>
Message-ID: <20170221160440.58695572@glumotte.dev.6wind.com>
In-Reply-To: <98CBD80474FA8B44BF855DF32C47DC359EAD0D@smartserver.smartshare.dk>
References: <1485271173-13408-1-git-send-email-olivier.matz@6wind.com>
 <2601191342CEEE43887BDE71AB9772583F111A29@irsmsx105.ger.corp.intel.com>
 <20170216144807.7add2c71@platinum>
 <20170216154619.GA115208@bricha3-MOBL3.ger.corp.intel.com>
 <20170216171410.57bff4ed@platinum>
 <98CBD80474FA8B44BF855DF32C47DC359EAD0D@smartserver.smartshare.dk>
X-Mailer: Claws Mail 3.14.1 (GTK+ 2.24.30; x86_64-pc-linux-gnu)
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: quoted-printable
Subject: Re: [dpdk-dev] [RFC 0/8] mbuf: structure reorganization
X-BeenThere: dev@dpdk.org
X-Mailman-Version: 2.1.15
Precedence: list
List-Id: DPDK patches and discussions <dev.dpdk.org>
List-Unsubscribe: <http://dpdk.org/ml/options/dev>,
 <mailto:dev-request@dpdk.org?subject=unsubscribe>
List-Archive: <http://dpdk.org/ml/archives/dev/>
List-Post: <mailto:dev@dpdk.org>
List-Help: <mailto:dev-request@dpdk.org?subject=help>
List-Subscribe: <http://dpdk.org/ml/listinfo/dev>,
 <mailto:dev-request@dpdk.org?subject=subscribe>
X-List-Received-Date: Tue, 21 Feb 2017 15:04:49 -0000

Hi Morten,

On Tue, 21 Feb 2017 15:20:23 +0100, Morten Brørup
<mb@smartsharesystems.com> wrote:
> Comments at the end.
>
>
> > -----Original Message-----
> > From: dev [mailto:dev-bounces@dpdk.org] On Behalf Of Olivier Matz
> > Sent: Thursday, February 16, 2017 5:14 PM
> > To: Bruce Richardson
> > Cc: Ananyev, Konstantin; dev@dpdk.org
> > Subject: Re: [dpdk-dev] [RFC 0/8] mbuf: structure reorganization
> >
> > On Thu, 16 Feb 2017 15:46:19 +0000, Bruce Richardson
> > <bruce.richardson@intel.com> wrote:
> > > On Thu, Feb 16, 2017 at 02:48:07PM +0100, Olivier Matz wrote:
> > > > Hi Konstantin,
> > > >
> > > > Thanks for the feedback.
> > > > Comments inline.
> > > >
> > > >
> > > > On Mon, 6 Feb 2017 18:41:27 +0000, "Ananyev, Konstantin"
> > > > <konstantin.ananyev@intel.com> wrote:
> > > > > Hi Olivier,
> > > > > Looks good in general, some comments from me below.
> > > > > Thanks
> > > > > Konstantin
> > > > >
> > > > > >
> > > > > > The main changes are:
> > > > > > - reorder the structure to increase vector performance on
> > > > > >   some non-IA platforms.
> > > > > > - add a 64-bit timestamp field in the 1st cache line
> > > > >
> > > > > Wonder why it deserves to be in the first cache line?
> > > > > How does it differ from seqn below (pure SW stuff right now)?
> > > >
> > > > In case the timestamp is set from a NIC value, it is set in the
> > > > Rx path. That's why I think it deserves to be located in the 1st
> > > > cache line.
> > > >
> > > > As you said, the seqn is pure SW stuff right now: it is set in a
> > > > lib, not in a PMD Rx path.
> > > >
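To make the layout trade-off above concrete, here is a purely illustrative sketch, not the real rte_mbuf definition: the field set, padding, and 64-byte cache lines are simplified assumptions; only the placement being discussed matters.

```c
#include <stddef.h>
#include <stdint.h>

/* Illustrative only: fields a PMD writes per packet in the Rx path sit
 * in the 1st cache line; purely SW fields can live in the 2nd. */
struct mbuf_sketch {
	/* --- 1st cache line (64 bytes): written in the Rx path --- */
	void *buf_addr;           /* segment buffer address */
	uint64_t ol_flags;        /* offload flags */
	uint64_t timestamp;       /* 64-bit, set from a NIC value in Rx */
	uint32_t pkt_len;         /* total packet length */
	uint16_t data_len;        /* data length in this segment */
	uint16_t port;            /* input port, widened to 16 bits */
	uint16_t nb_segs;         /* widened to 16 bits */
	uint16_t refcnt;
	uint8_t pad1[28];         /* fill line 1 to 64 bytes */

	/* --- 2nd cache line: not touched in the fast Rx path --- */
	struct mbuf_sketch *next; /* next segment in the chain */
	uint32_t seqn;            /* pure SW sequence number, set in a lib */
	uint8_t pad2[52];         /* fill line 2 to 64 bytes */
};
```

The split is the whole point: a NIC-provided timestamp must be stored while the Rx path already owns the first line, whereas seqn is written later by a library and tolerates a second-line access.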
> > > > > > - m->next, m->nb_segs, and m->refcnt are always initialized
> > > > > >   for mbufs in the pool, avoiding the need to set m->next
> > > > > >   (located in the 2nd cache line) in the Rx path for
> > > > > >   mono-segment packets.
> > > > > > - change port and nb_segs to 16 bits
> > > > >
> > > > > Not that I am completely against it, but changing nb_segs to
> > > > > 16 bits seems like overkill to me.
> > > > > I think we can keep an extra 8 bits for something more useful
> > > > > in the future.
> > > >
> > > > In my case, I use the m->next field to chain more than 256
> > > > segments for L4 socket buffers. This also updates nb_segs, which
> > > > can overflow. It's not a big issue since, at the end, nb_segs is
> > > > decremented for each segment. On the other hand, if I enable
> > > > sanity checks on mbufs, they complain because the actual number
> > > > of segments does not match nb_segs.
> > > >
> > > > There is also another use case with fragmentation as discussed
> > > > recently: http://dpdk.org/dev/patchwork/patch/19819/
> > > >
> > > > Of course, dealing with a long mbuf list is not that efficient,
> > > > but the application can maintain another structure to
> > > > accelerate the access to the middle/end of the list.
> > > >
> > > > Finally, we have other ideas to get additional 8 bits if
> > > > required in the future, so I don't think it's really a problem.
> > > >
> > > >
> > > > >
> > > > > > - move seqn to the 2nd cache line
> > > > > >
> > > > > > Things discussed but not done in the patchset:
> > > > > > - move refcnt and nb_segs to the 2nd cache line: many
> > > > > >   drivers set them in the Rx path, so it could introduce a
> > > > > >   performance regression, or
> > > > >
> > > > > I wonder, can refcnt alone be moved into the 2nd cache line?
> > > > > As I understand, thanks to the other change (from above),
> > > > > m->refcnt will already be initialized, so the Rx code doesn't
> > > > > need to touch it.
> > > > > Though yes, it would still require changes in all PMDs.
> > > >
> > > > Yes, I agree, some fields could be moved to the 2nd cache line
> > > > once all PMDs stop writing them in the Rx path. I propose to
> > > > issue some guidelines to PMD maintainers when the patchset is
> > > > pushed. Then we can consider moving them in a future version, in
> > > > case we need more room in the 1st mbuf cache line.
> > >
> > > If we are changing things, we should really do it all now, rather
> > > than storing up future breaks to mbuf. Worst case, we should plan
> > > for it immediately after the release where we make these changes:
> > > have two releases that break mbuf immediately after each other -
> > > and flagged as such - but keep it stable thereafter. I don't like
> > > having technical debt on mbuf just after we supposedly "fix" it.
> >
> > I think there is no need to do this change now. And I don't feel
> > comfortable with the idea of a patchset that updates all the PMDs to
> > remove accesses to a field because it moved to the 2nd cache line
> > (especially thinking about vector PMDs).
> >
> > That's why I think the plan could be:
> > - push an updated version of this patchset quickly
> > - advertise to PMD maintainers "you don't need to set m->next,
> >   m->refcnt, and m->nb_segs in the Rx path, please update your
> >   drivers"
> > - later, if we need more room in the 1st cache line of the mbuf, we
> >   can move refcnt and nb_segs, probably without impacting
> >   performance.
> >
> >
> > Olivier
>
> I suppose you mean that PMDs don't need to /initialize/ m->next,
> m->refcnt and m->nb_segs.
>
> Forgive my ignorance, and this is wild speculation, but: Would a PMD
> not need to set m->next and m->nb_segs if it receives a jumbogram
> larger than an mbuf packet buffer? And if this is a realistic use
> case, these fields actually do belong in the 1st cache line. PMD
> developers please chime in.

Nothing to add to Bruce's answer :)

>
> And I tend to agree with Bruce about making all these mbuf changes in
> one go, rather than postponing some of them to later. Especially
> because the postponement also closes and reopens the whole discussion
> and decision process! (Not initializing a few fields in a PMD should
> not require much work from the PMD developers. Moving the fields to
> the 2nd cache line will, in the worst case, degrade the performance
> of the non-updated PMDs.)
>
> A two-step process makes good sense for the developers of DPDK, but
> both steps should be taken within the same release, so they are
> transparent to the users of DPDK.

I don't think this is doable, given that the submission deadline is in
less than 2 weeks. On my side, honestly, I don't want to dive into the
code of all the PMDs. I feel this would be riskier than letting the
PMD maintainers update their own PMD code.

Olivier