From mboxrd@z Thu Jan  1 00:00:00 1970
Subject: RE: [PATCH v4] mempool: fix mempool cache flushing algorithm
Date: Thu, 7 Apr 2022 11:26:53 +0200
Message-ID: <98CBD80474FA8B44BF855DF32C47DC35D86FB9@smartserver.smartshare.dk>
References: <98CBD80474FA8B44BF855DF32C47DC35D86DB2@smartserver.smartshare.dk> <20220202103354.79832-1-mb@smartsharesystems.com> <98CBD80474FA8B44BF855DF32C47DC35D86FB8@smartserver.smartshare.dk>
From: Morten Brørup
To: "Bruce Richardson"
List-Id: DPDK patches and discussions

> From: Bruce Richardson [mailto:bruce.richardson@intel.com]
> Sent: Thursday, 7 April 2022 11.14
>
> On Thu, Apr 07, 2022 at 11:04:53AM +0200, Morten Brørup wrote:
> > > From: Morten Brørup [mailto:mb@smartsharesystems.com]
> > > Sent: Wednesday, 2 February 2022 11.34
> > >
> > > This patch fixes the rte_mempool_do_generic_put() caching algorithm,
> > > which was fundamentally wrong, causing multiple performance issues
> > > when flushing.
> >
> > [...]
> >
> > Olivier,
> >
> > Will you please consider this patch [1] and the other one [2].
> >
> > The primary bug here is this: When a mempool cache becomes full (i.e.
> > exceeds the "flush threshold") and is flushed to the backing ring, it
> > is still full afterwards, but it should be empty. It is not flushed
> > entirely; only the elements exceeding "size" are flushed.
>
> I don't believe it should be flushed entirely; there should always be
> some elements left, so that even after a flush we can still allocate an
> additional burst. We want to avoid the situation where a flush of all
> elements is immediately followed by a refill of new elements. However,
> we can flush to maybe size/2 and improve things. In short, this not
> emptying is by design rather than a bug, though we can look to tweak
> the behaviour.
>

I initially agreed with you about flushing to size/2.

However, I thought further about it when I wrote the patch, and came to
this conclusion: If an application thread repeatedly puts objects into
the mempool, and does so often enough that the cache overflows (i.e.
reaches the flush threshold) and needs to be flushed, it is far more
likely that the thread will continue putting objects than that it will
start getting objects from the mempool. This speaks for flushing the
cache entirely.

Both solutions are better than flushing down to "size", so if there is a
preference for keeping some objects in the cache after flushing, I can
update the patch accordingly.

> > E.g. pipelined applications having ingress threads and egress threads
> > running on different lcores are affected by this bug.
>
> If we are looking at improvements for pipelined applications, I think a
> bigger win would be to change the default mempool from ring-based to
> stack-based.
> For apps using a run-to-completion model, they should run out of cache
> and should therefore be largely unaffected by such a change.
>

> > I don't think the real performance impact is very big, but these
> > algorithm-level bugs really annoy me.
> >
> > I'm still wondering how the patch introducing the mempool cache flush
> > threshold could pass internal code review with so many bugs.
> >
> > [1] https://patchwork.dpdk.org/project/dpdk/patch/20220202103354.79832-1-mb@smartsharesystems.com/
> > [2] https://patchwork.dpdk.org/project/dpdk/patch/20220202081426.77975-1-mb@smartsharesystems.com/
> >
> > -Morten
> >
> > > Signed-off-by: Morten Brørup
> > > ---
> > >  lib/mempool/rte_mempool.h | 34 ++++++++++++++++++++++------------
> > >  1 file changed, 22 insertions(+), 12 deletions(-)
> > >
> > > diff --git a/lib/mempool/rte_mempool.h b/lib/mempool/rte_mempool.h
> > > index 1e7a3c1527..e7e09e48fc 100644
> > > --- a/lib/mempool/rte_mempool.h
> > > +++ b/lib/mempool/rte_mempool.h
> > > @@ -1344,31 +1344,41 @@ rte_mempool_do_generic_put(struct rte_mempool *mp, void * const *obj_table,
> > >  	if (unlikely(cache == NULL || n > RTE_MEMPOOL_CACHE_MAX_SIZE))
> > >  		goto ring_enqueue;
> > >
> > > -	cache_objs = &cache->objs[cache->len];
> > > +	/* If the request itself is too big for the cache */
> > > +	if (unlikely(n > cache->flushthresh))
> > > +		goto ring_enqueue;
> > >
> > >  	/*
> > >  	 * The cache follows the following algorithm
> > > -	 *   1. Add the objects to the cache
> > > -	 *   2. Anything greater than the cache min value (if it crosses the
> > > -	 *      cache flush threshold) is flushed to the ring.
> >
> > In the code, "the cache min value" is actually "the cache size". This
> > indicates an intention to do something more. Perhaps the patch
> > introducing the "flush threshold" was committed while still incomplete,
> > and just never got completed?
> >
> > > +	 *   1. If the objects cannot be added to the cache without crossing
> > > +	 *      the flush threshold, flush the cache to the ring.
> > > +	 *   2. Add the objects to the cache.
> > >  	 */
> > >
> > > -	/* Add elements back into the cache */
> > > -	rte_memcpy(&cache_objs[0], obj_table, sizeof(void *) * n);
> > > +	if (cache->len + n <= cache->flushthresh) {
> > > +		cache_objs = &cache->objs[cache->len];
> > >
> > > -	cache->len += n;
> > > +		cache->len += n;
> > > +	} else {
> > > +		cache_objs = &cache->objs[0];
> > >
> > > -	if (cache->len >= cache->flushthresh) {
> > > -		rte_mempool_ops_enqueue_bulk(mp, &cache->objs[cache->size],
> > > -				cache->len - cache->size);
> > > -		cache->len = cache->size;
> > > +#ifdef RTE_LIBRTE_MEMPOOL_DEBUG
> > > +		if (rte_mempool_ops_enqueue_bulk(mp, cache_objs, cache->len) < 0)
> > > +			rte_panic("cannot put objects in mempool\n");
> > > +#else
> > > +		rte_mempool_ops_enqueue_bulk(mp, cache_objs, cache->len);
> > > +#endif
> > > +		cache->len = n;
> > >  	}
> > >
> > > +	/* Add the objects to the cache. */
> > > +	rte_memcpy(cache_objs, obj_table, sizeof(void *) * n);
> > > +
> > >  	return;
> > >
> > > ring_enqueue:
> > >
> > > -	/* push remaining objects in ring */
> > > +	/* Put the objects into the ring */
> > >  #ifdef RTE_LIBRTE_MEMPOOL_DEBUG
> > >  	if (rte_mempool_ops_enqueue_bulk(mp, obj_table, n) < 0)
> > >  		rte_panic("cannot put objects in mempool\n");
> > > --
> > > 2.17.1