From: Thomas Monjalon
To: dev@dpdk.org, techboard@dpdk.org
Cc: Ajit Khaparde, "Ananyev, Konstantin", Andrew Rybchenko, dev@dpdk.org,
 "Yigit, Ferruh", david.marchand@redhat.com, "Richardson, Bruce",
 olivier.matz@6wind.com, jerinj@marvell.com, viacheslavo@nvidia.com,
 honnappa.nagarahalli@arm.com, maxime.coquelin@redhat.com,
 stephen@networkplumber.org, hemant.agrawal@nxp.com, Matan Azrad,
 Shahaf Shuler, Morten Brørup
Date: Mon, 02 Nov 2020 16:58:08 +0100
Message-ID: <13044489.RHGIMAnax8@thomas>
In-Reply-To: <98CBD80474FA8B44BF855DF32C47DC35C613CD@smartserver.smartshare.dk>
References: <20201029092751.3837177-1-thomas@monjalon.net>
 <3086227.yllCKDRCEA@thomas>
 <98CBD80474FA8B44BF855DF32C47DC35C613CD@smartserver.smartshare.dk>
Subject: Re: [dpdk-dev] [PATCH 15/15] mbuf: move pool pointer in hotter first half

+Cc techboard

We need benchmark numbers in order to take a decision.
Please all, prepare some arguments and numbers so we can discuss
the mbuf layout in the next techboard meeting.

01/11/2020 21:59, Morten Brørup:
> > From: Thomas Monjalon [mailto:thomas@monjalon.net]
> > Sent: Sunday, November 1, 2020 5:38 PM
> >
> > 01/11/2020 10:12, Morten Brørup:
> > > One thing has always puzzled me:
> > > Why do we use 64 bits to indicate which memory pool
> > > an mbuf belongs to?
> > > The portid only uses 16 bits and an indirection index.
> > > Why don't we use the same kind of indirection index for mbuf pools?
> >
> > I wonder what would be the cost of indirection. Probably negligible.
>
> Probably. The portid does it, and that indirection is heavily used
> everywhere.
>
> The size of the mbuf memory pool indirection array should be
> compile-time configurable, like the size of the portid indirection
> array.
>
> And for reference, the indirection array will fit into one cache line
> if we default to 8 mbuf pools, thus supporting an 8 CPU socket system
> with one mbuf pool per CPU socket, or a 4 CPU socket system with two
> mbuf pools per CPU socket.
>
> (And as a side note: Our application is optimized for single-socket
> systems, and we only use one mbuf pool. I guess many applications were
> developed without carefully optimizing for multi-socket systems, and
> also just use one mbuf pool. In these cases, the mbuf structure doesn't
> really need a pool field. But it is still there, and the DPDK libraries
> use it, so we didn't bother removing it.)
>
> > I think it is a good proposal...
> > ... for next year, after a deprecation notice.
> >
> > > I can easily imagine using one mbuf pool (or perhaps a few pools)
> > > per CPU socket (or per physical memory bus closest to an attached NIC),
> > > but not more than 256 mbuf memory pools in total.
> > > So, let's introduce an mbufpoolid like the portid,
> > > and cut this mbuf field down from 64 to 8 bits.

We will need to measure the performance of the solution.
There is a chance that the cost is too high.

> > > If we also cut down m->pkt_len from 32 to 24 bits,
> >
> > Who is using packets larger than 64k? Are 16 bits enough?
>
> I personally consider 64k a reasonable packet size limit. Exotic
> applications with even larger packets would have to live with this
> constraint. But let's see if there are any objections. For reference,
> 64k corresponds to ca. 44 Ethernet (1500 byte) packets.
>
> (The limit could be 65535 bytes, to avoid translation of the value 0
> into 65536 bytes.)
>
> This modification would go nicely hand in hand with the mbuf pool
> indirection modification.
>
> ... after yet another round of ABI stability discussions, deprecation
> notices, and so on. :-)

After more thought, I'm afraid 64k is too small in some cases.
And 24-bit manipulation would probably hurt performance.
I'm afraid we are stuck with a 32-bit length.
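
Coming back to the pool indirection, and to make the discussion more
concrete, below is a very rough sketch of the idea. The macro, array
and helper names are invented for illustration only; nothing like this
exists in DPDK today, and a real design would go through the
deprecation process discussed above.

/*
 * Sketch only: an 8-bit pool index resolved through a small global
 * array, analogous to how a 16-bit portid indexes rte_eth_devices[].
 * RTE_MAX_MBUF_POOLS, rte_mbuf_pools[] and mbuf_pool_from_index()
 * are hypothetical names.
 */
#include <stdint.h>
#include <rte_mempool.h>

/* Compile-time configurable, like RTE_MAX_ETHPORTS for port ids.
 * With 8 entries, the pointer array fits in one 64-byte cache line. */
#define RTE_MAX_MBUF_POOLS 8

/* Global indirection array, filled when the mempools are created. */
extern struct rte_mempool *rte_mbuf_pools[RTE_MAX_MBUF_POOLS];

/* An 8-bit index in the mbuf would replace the 64-bit pool pointer
 * and be resolved only when the pool is actually needed (e.g. free). */
static inline struct rte_mempool *
mbuf_pool_from_index(uint8_t pool_idx)
{
	return rte_mbuf_pools[pool_idx];
}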
> > > we can get the 8 bit mbuf pool index into the first cache line
> > > at no additional cost.
> >
> > I like the idea.
> > It means we don't need to move the pool pointer now,
> > i.e. it does not have to replace the timestamp field.
>
> Agreed! Don't move m->pool to the first cache line; it is not used for RX.
>
> >
> > > In other words: This would free up another 64 bit field in the mbuf
> > > structure!
> >
> > That would be great!
> >
> >
> > > And even though the m->next pointer for scattered packets resides
> > > in the second cache line, the libraries and applications know
> > > that m->next is NULL when m->nb_segs is 1.
> > > This shows that my suggestion would make touching
> > > the second cache line unnecessary (in simple cases),
> > > even for re-initializing the mbuf.
> >
> > So you think the "next" pointer should stay in the second half of the mbuf?
> >
> > I feel you would like to move the Tx offloads into the first half
> > to improve performance of very simple apps.
>
> "Very simple apps" sounds like a minority of apps. I would rather say
> "very simple packet handling scenarios", e.g. forwarding of normal size
> non-segmented packets. I would guess that the vast majority of packets
> handled by DPDK applications actually match this scenario. So I'm
> proposing to optimize for what I think is the most common scenario.
>
> If segmented packets are common, then m->next could be moved to the
> first cache line. But it will only improve the pure RX steps of the
> pipeline. When preparing the packet for TX, m->tx_offload will need to
> be set, and the second cache line comes into play. So I'm wondering how
> big the benefit of having m->next in the first cache line really is -
> assuming that m->nb_segs will be checked before accessing m->next.
>
> > I am thinking the opposite: we could have some dynamic field space
> > in the first half to improve performance of complex Rx.
> > Note: we can add a flag hint for field registration in this first half.
> >
>
> I have had the same thoughts. However, I would prefer being able to
> forward ordinary packets without using the second mbuf cache line at
> all (although only in specific scenarios like my example above).
>
> Furthermore, the application can abuse the 64 bit m->tx_offload field
> for private purposes until it is time to prepare the packet for TX and
> pass it on to the driver. This hack somewhat resembles a dynamic field
> in the first cache line, and will not be possible if the m->pool or
> m->next field is moved there.
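
For reference, the access pattern we keep coming back to looks roughly
like the sketch below. It only uses existing mbuf fields; the helper
itself is invented (and redundant with m->pkt_len), it is only meant to
show which cache lines get touched.

#include <stdint.h>
#include <rte_branch_prediction.h>
#include <rte_mbuf.h>

/* Sum data_len over all segments of a packet (illustration only). */
static inline uint32_t
chain_bytes(const struct rte_mbuf *m)
{
	uint32_t bytes;

	/* Fast path: nb_segs and data_len live in the first cache line,
	 * so m->next (second cache line) is never read. */
	if (likely(m->nb_segs == 1))
		return m->data_len;

	/* Slow path: walking the chain reads m->next from the second
	 * cache line for every segment. */
	for (bytes = 0; m != NULL; m = m->next)
		bytes += m->data_len;

	return bytes;
}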