Subject: RE: [PATCH] mempool: fix get objects from mempool with cache
Date: Tue, 18 Jan 2022 09:25:22 +0100
Message-ID: <98CBD80474FA8B44BF855DF32C47DC35D86E0F@smartserver.smartshare.dk>
References: <98CBD80474FA8B44BF855DF32C47DC35D86DB2@smartserver.smartshare.dk>
 <20220114163650.94288-1-mb@smartsharesystems.com>
From: Morten Brørup <mb@smartsharesystems.com>
To: Bruce Richardson <bruce.richardson@intel.com>

> From: Bruce Richardson [mailto:bruce.richardson@intel.com]
> Sent: Monday, 17 January 2022 18.35
>
> On Fri, Jan 14, 2022 at 05:36:50PM +0100, Morten Brørup wrote:
> > A flush threshold for the mempool cache was introduced in DPDK version
> > 1.3, but rte_mempool_do_generic_get() was not completely updated back
> > then, and some inefficiencies were introduced.
> >
> > This patch fixes the following in rte_mempool_do_generic_get():
> >
> > 1. The code that initially screens the cache request was not updated
> > with the change in DPDK version 1.3.
> > The initial screening compared the request length to the cache size,
> > which was correct before, but became irrelevant with the introduction
> > of the flush threshold. E.g. the cache can hold up to flushthresh
> > objects, which is more than its size, so some requests were not served
> > from the cache, even though they could be.
> > The initial screening has now been corrected to match the initial
> > screening in rte_mempool_do_generic_put(), which verifies that a cache
> > is present, and that the length of the request does not overflow the
> > memory allocated for the cache.
> >
> > 2. The function is a helper for rte_mempool_generic_get(), so it must
> > behave according to the description of that function.
> > Specifically, objects must first be returned from the cache,
> > subsequently from the ring.
> > After the change in DPDK version 1.3, this was not the behavior when
> > the request was partially satisfied from the cache; instead, the
> > objects from the ring were returned ahead of the objects from the
> > cache. This is bad for CPUs with a small L1 cache, which benefit from
> > having the hot objects first in the returned array. (This is also the
> > reason why the function returns the objects in reverse order.)
> > Now, all code paths first return objects from the cache, subsequently
> > from the ring.
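
To illustrate points 1 and 2 above, the corrected flow is roughly as
follows. This is a simplified sketch assuming DPDK's <rte_mempool.h>
and <rte_branch_prediction.h>; cache_serve() and ring_dequeue_bulk()
are hypothetical stand-ins for the cache copy-out and the ring/driver
dequeue, so this is not the verbatim patch:

    /*
     * Simplified sketch of the corrected flow; NOT the verbatim patch.
     * cache_serve() and ring_dequeue_bulk() are hypothetical helpers.
     */
    static int
    generic_get_sketch(struct rte_mempool *mp, void **obj_table,
            unsigned int n, struct rte_mempool_cache *cache)
    {
        unsigned int served;

        /* 1. Screening mirrors rte_mempool_do_generic_put(): a cache
         * must be present, and the request must not overflow the
         * memory allocated for the cache. */
        if (unlikely(cache == NULL || n > RTE_MEMPOOL_CACHE_MAX_SIZE))
            return ring_dequeue_bulk(mp, obj_table, n);

        if (likely(n <= cache->len)) {
            /* The likely path: the whole request is served from the
             * cache. */
            cache_serve(cache, obj_table, n);
            return 0;
        }

        /* 2. Partial hit: the hot objects from the cache go first in
         * the returned array, then the remainder comes from the ring.
         * (The real code also tries to backfill the cache from the
         * ring here, cf. point 3 below; omitted for brevity.) */
        served = cache->len;
        cache_serve(cache, obj_table, served);
        return ring_dequeue_bulk(mp, &obj_table[served], n - served);
    }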
> >
> > 3. If the cache could not be backfilled, the function would attempt
> > to get all the requested objects from the ring (instead of only the
> > number of requested objects minus the objects available in the cache),
> > and the function would fail if that failed.
> > Now, the first part of the request is always satisfied from the cache,
> > and if the subsequent backfilling of the cache from the ring fails,
> > only the remaining requested objects are retrieved from the ring.
> >
> > 4. The code flow for satisfying the request from the cache was
> > slightly inefficient:
> > The likely code path, where the objects are simply served from the
> > cache, was treated as unlikely. Now it is treated as likely.
> > And in the code path where the cache was backfilled first, numbers
> > were added to and subtracted from the cache length; now this code path
> > simply sets the cache length to its final value.
> >
> > 5. Some comments were no longer correct.
> > The comments have been updated.
> > Most importantly, the description of the successful return value was
> > inaccurate. Success only returns 0, not >= 0.
> >
> > Signed-off-by: Morten Brørup <mb@smartsharesystems.com>
> > ---
>
> I am a little uncertain about the reversing of the copies taking things
> out of the mempool - for machines where we are not that cache
> constrained, will we lose out on possible optimizations where the
> compiler optimizes the copy loop as a memcpy?

The objects are also returned in reverse order in the code it replaces,
so this behavior is not introduced by this patch; I only describe the
reason for it.

I floated a previous patch, in which the objects were returned in order,
but Jerin argued [1] that we should keep it the way it was, unless I
could show a performance improvement.

So I retracted that patch and split it into two independent patches
instead: this patch for get(), and [3] for put().

While experimenting with rte_memcpy() for these, I couldn't achieve a
performance boost - quite the opposite. So I gave up on it.

Reviewing the x86 variant of rte_memcpy() [2] makes me think that it is
inefficient for copying small bulks of pointers, especially when n is
unknown at compile time, and its code path goes through a great many
branches.

>
> Otherwise the logic all looks correct to me.
>
> /Bruce

[1]: http://inbox.dpdk.org/dev/CALBAE1OjCswxUfaNLWg5y-tnPkFhvvKQ8sJ3JpBoo7ObgeB5OA@mail.gmail.com/
[2]: http://code.dpdk.org/dpdk/latest/source/lib/eal/x86/include/rte_memcpy.h
[3]: http://inbox.dpdk.org/dev/20220117115231.8060-1-mb@smartsharesystems.com/
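
PS: For reference, the copy discussed above is essentially the trivial
reverse-order loop sketched below (a simplification with a hypothetical
function name, not the exact mempool code). Being a reversed copy, it
cannot be turned into a plain memcpy anyway; and with n unknown at
compile time, a loop this simple is easy for the compiler to unroll,
whereas rte_memcpy() first has to branch its way to the right copy size:

    /*
     * Simplified sketch of the reverse-order copy from the cache; not
     * the verbatim rte_mempool_do_generic_get() code. The cache is
     * used as a stack, so the hottest objects sit at the highest
     * indexes of cache_objs.
     */
    static inline void
    cache_copy_reversed(void **obj_table, void * const *cache_objs,
            unsigned int cache_len, unsigned int n)
    {
        unsigned int i;

        for (i = 0; i < n; i++)
            obj_table[i] = cache_objs[cache_len - 1 - i];
    }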