Subject: RE: [PATCH v4] mempool: fix mempool cache flushing algorithm
Date: Thu, 7 Apr 2022 13:36:17 +0200
From: Morten Brørup <mb@smartsharesystems.com>
To: Bruce Richardson <bruce.richardson@intel.com>

> From: Bruce Richardson [mailto:bruce.richardson@intel.com]
> Sent: Thursday, 7 April 2022 12.44
>
> On Thu, Apr 07, 2022 at 11:32:12AM +0100, Bruce Richardson wrote:
> > On Thu, Apr 07, 2022 at 11:26:53AM +0200, Morten Brørup wrote:
> > > > From: Bruce Richardson [mailto:bruce.richardson@intel.com]
> > > > Sent: Thursday, 7 April 2022 11.14
> > > >
> > > > On Thu, Apr 07, 2022 at 11:04:53AM +0200, Morten Brørup wrote:
> > > > > > From: Morten Brørup [mailto:mb@smartsharesystems.com]
> > > > > > Sent: Wednesday, 2 February 2022 11.34
> > > > > >
> > > > > > This patch fixes the rte_mempool_do_generic_put() caching
> > > > > > algorithm, which was fundamentally wrong, causing multiple
> > > > > > performance issues when flushing.
> > > > >
> > > > > [...]
> > > > >
> > > > > Olivier,
> > > > >
> > > > > Will you please consider this patch [1] and the other one [2].
> > > > >
> > > > > The primary bug here is this: When a mempool cache becomes full
> > > > > (i.e. exceeds the "flush threshold") and is flushed to the backing
> > > > > ring, it is still full afterwards; but it should be empty
> > > > > afterwards. It is not flushed entirely; only the elements
> > > > > exceeding "size" are flushed.
> > > >
> > > > I don't believe it should be flushed entirely; there should always
> > > > be some elements left, so that even after a flush we can still
> > > > allocate an additional burst. We want to avoid the situation where
> > > > a flush of all elements is immediately followed by a refill of new
> > > > elements. However, we can flush to maybe size/2 and improve things.
> > > > In short, this not emptying is by design rather than a bug, though
> > > > we can look to tweak the behaviour.
> > >
> > > I initially agreed with you about flushing to size/2.
> > >
> > > However, I did think further about it when I wrote the patch, and
> > > came to this conclusion: If an application thread repeatedly puts
> > > objects into the mempool, and does it so often that the cache
> > > overflows (i.e. reaches the flush threshold) and needs to be flushed,
> > > it is far more likely that the application thread will continue doing
> > > that, rather than start getting objects from the mempool. This speaks
> > > for flushing the cache entirely.
> > >
> > > Both solutions are better than flushing to size, so if there is a
> > > preference for keeping some objects in the cache after flushing, I
> > > can update the patch accordingly.

I forgot to mention some details here...

The cache is a stack, so leaving objects in it after flushing can be done
in one of two ways:

1. Flush the top objects and leave the bottom objects, which are
extremely cold.

2. Flush the bottom objects and move the objects from the top to the
bottom, which is a costly operation.

Theoretically, there is a third option: Make the stack a circular buffer,
so its "bottom pointer" can be moved around instead of copying objects
from the top to the bottom after flushing. However, this would add
complexity when copying arrays to/from the stack in the normal cases,
i.e. to/from the application, and it introduces requirements on the cache
size. So I quickly discarded the idea when it first came to me.

The provided patch flushes the entire cache and then stores the newly
added objects (the ones causing the flush) in the cache, as sketched at
the end of this mail. So the cache is not completely empty after
flushing: it contains some (but not many) objects, and they are hot.

> > Would it be worth looking at adding per-core hinting to the mempool?
> > Indicate for a core that it allocates-only, i.e. an RX thread,
> > frees-only, i.e. a TX thread, or does both alloc and free (the
> > default)? That hint could be used only on flush or refill to specify
> > whether to flush all or partial, and similarly on refill whether to
> > refill to the max possible or just to size.
>
> Actually, taking the idea further, we could always track per core
> whether a core has ever done a flush/refill and use that as the hint
> instead. It could even be done in a branch-free manner if we want. For
> example:
>
> on flush:
> keep_entries = (size >> 1) & (never_refills - 1);
>
> which will set the entries to keep to 0 if we have never had to refill,
> or to half of size if the thread has previously done refills.

Your suggestion is a good idea for a performance improvement.

We would also need "mostly" variants in addition to the "only" variants;
otherwise, the automatic detection will cause problems if it is triggered
by some rare event. And applications using the "service cores" concept
will just fall back to the default balanced alloc/free variant.

Perhaps we should fix the current bugs (my term, not consensus) first,
and then look at further performance improvements. It is already an
uphill battle getting Acks for my fixes as they are.

Another performance improvement could be hard-coding the mempool cache
size to RTE_MEMPOOL_CACHE_MAX_SIZE, so that the copying between the cache
and the backing ring can be done by a fixed-size, optimized memcpy using
vector instructions.

Obviously, whatever we do to optimize this, we should ensure optimal
handling of the common case, where objects are only moved between the
application and the mempool cache and never touch the backing ring.
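
To make the flushing behaviour described above concrete, here is a rough
sketch of the put path. It is not the literal code from the patch (nor
the current rte_mempool_do_generic_put()); it ignores statistics, the
no-cache path and the exact threshold comparison, and the helper name
cache_put_sketch() is made up for the illustration:

#include <rte_mempool.h>
#include <rte_memcpy.h>

/* Illustrative only: flush the *entire* cache to the backing ring when
 * the flush threshold is crossed, then store the newly freed (hot)
 * objects in the now-empty cache. */
static inline void
cache_put_sketch(struct rte_mempool *mp, void * const *obj_table,
		unsigned int n, struct rte_mempool_cache *cache)
{
	if (cache->len + n > cache->flushthresh) {
		/* The old behaviour flushed only cache->len - cache->size
		 * elements here, leaving the cache full up to "size". */
		rte_mempool_ops_enqueue_bulk(mp, cache->objs, cache->len);
		cache->len = 0;
	}

	/* The n objects being put are the hottest; keep them cached. */
	rte_memcpy(&cache->objs[cache->len], obj_table, sizeof(void *) * n);
	cache->len += n;
}

The point being that after a flush the cache holds only the objects from
the current put, rather than "size" cold objects.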
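
And to show the arithmetic behind the branch-free trick in Bruce's
example above (never_refills is the flag name from that example; the
small test program around it is only added here to demonstrate the
unsigned wrap-around):

#include <stdint.h>
#include <stdio.h>

/* never_refills is 1 if this thread has never refilled from the backing
 * ring, 0 otherwise. (never_refills - 1) is then either 0x0 or
 * 0xFFFFFFFF, so the AND yields either 0 or size/2 without a branch. */
static inline uint32_t
keep_entries_on_flush(uint32_t size, uint32_t never_refills)
{
	return (size >> 1) & (never_refills - 1);
}

int
main(void)
{
	printf("%u\n", keep_entries_on_flush(512, 1)); /* 0: free-only thread */
	printf("%u\n", keep_entries_on_flush(512, 0)); /* 256: thread also refills */
	return 0;
}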