DPDK patches and discussions
From: Konstantin Ananyev <konstantin.ananyev@huawei.com>
To: "Morten Brørup" <mb@smartsharesystems.com>,
	"olivier.matz@6wind.com" <olivier.matz@6wind.com>,
	"andrew.rybchenko@oktetlabs.ru" <andrew.rybchenko@oktetlabs.ru>,
	"thomas@monjalon.net" <thomas@monjalon.net>
Cc: "dev@dpdk.org" <dev@dpdk.org>,
	"honnappa.nagarahalli@arm.com" <honnappa.nagarahalli@arm.com>,
	"bruce.richardson@intel.com" <bruce.richardson@intel.com>,
	nd <nd@arm.com>
Subject: RE: [PATCH v2] mempool: micro-optimize put function
Date: Thu, 22 Dec 2022 13:52:00 +0000	[thread overview]
Message-ID: <dfdd0c19d21344168c159fd51d1db7d9@huawei.com> (raw)
In-Reply-To: <98CBD80474FA8B44BF855DF32C47DC35D875C8@smartserver.smartshare.dk>


Hi Morten,

> PING mempool maintainers.
> 
> If you don't provide ACK or Review to patches, how should Thomas know that they are ready for merging?
> 
> -Morten

The code changes themselves look OK to me.
Though I don't think they will really bring any noticeable speedup.
But yes, it looks a bit nicer this way, especially with the extra comments.
One question though: why do you feel this assert:
RTE_ASSERT(cache->len <= cache->flushthresh);
is necessary?
I can't see any way it could happen, unless something is totally broken
within the app.
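
For context, the comparison that this assert protects relies on unsigned
arithmetic: in rte_mempool.h, both cache->len and cache->flushthresh are
uint32_t, so if len ever exceeded flushthresh, the subtraction in
"n <= cache->flushthresh - cache->len" would wrap around and the fast path
would be taken by mistake. A minimal standalone sketch of the arithmetic
(not DPDK code; the field values are made up):

#include <stdint.h>
#include <stdio.h>

int main(void)
{
	uint32_t flushthresh = 512;
	uint32_t len = 600;	/* hypothetical broken state: len > flushthresh */
	uint32_t n = 32;

	/* Old comparison: still correctly reports "does not fit". */
	printf("old: %d\n", len + n <= flushthresh);	/* prints 0 */

	/* New comparison: flushthresh - len wraps to a huge unsigned
	 * value, so the "fits in cache" branch would be taken. */
	printf("new: %d\n", n <= flushthresh - len);	/* prints 1 */

	return 0;
}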

> > From: Morten Brørup [mailto:mb@smartsharesystems.com]
> > Sent: Wednesday, 16 November 2022 18.39
> >
> > > From: Honnappa Nagarahalli [mailto:Honnappa.Nagarahalli@arm.com]
> > > Sent: Wednesday, 16 November 2022 17.27
> > >
> > >
> > > > > >
> > > > > > Micro-optimization:
> > > > > > Reduced the most likely code path in the generic put function
> > > > > > by moving an unlikely check out of the most likely code path
> > > > > > and further down.
> > > > > >
> > > > > > Also updated the comments in the function.
> > > > > >
> > > > > > v2 (feedback from Andrew Rybchenko):
> > > > > > * Modified comparison to prevent overflow if n is really huge
> > > > > >   and len is non-zero.
> > > > > > * Added assertion about the invariant preventing overflow in
> > > > > >   the comparison.
> > > > > > * Crossing the threshold is not extremely unlikely, so removed
> > > > > >   likely() from that comparison.
> > > > > >   The compiler will generate code with optimal static branch
> > > > > >   prediction here anyway.
> > > > > >
> > > > > > Signed-off-by: Morten Brørup <mb@smartsharesystems.com>
> > > > > > ---
> > > > > >  lib/mempool/rte_mempool.h | 36 ++++++++++++++++++++----------------
> > > > > >  1 file changed, 20 insertions(+), 16 deletions(-)
> > > > > >
> > > > > > diff --git a/lib/mempool/rte_mempool.h b/lib/mempool/rte_mempool.h
> > > > > > index 9f530db24b..dd1a3177d6 100644
> > > > > > --- a/lib/mempool/rte_mempool.h
> > > > > > +++ b/lib/mempool/rte_mempool.h
> > > > > > @@ -1364,32 +1364,36 @@ rte_mempool_do_generic_put(struct rte_mempool *mp, void * const *obj_table,
> > > > > >  {
> > > > > >  	void **cache_objs;
> > > > > >
> > > > > > -	/* No cache provided */
> > > > > > +	/* No cache provided? */
> > > > > >  	if (unlikely(cache == NULL))
> > > > > >  		goto driver_enqueue;
> > > > > >
> > > > > > -	/* increment stat now, adding in mempool always success */
> > > > > > +	/* Increment stats now, adding in mempool always succeeds. */
> > > > > >  	RTE_MEMPOOL_CACHE_STAT_ADD(cache, put_bulk, 1);
> > > > > >  	RTE_MEMPOOL_CACHE_STAT_ADD(cache, put_objs, n);
> > > > > >
> > > > > > -	/* The request itself is too big for the cache */
> > > > > > -	if (unlikely(n > cache->flushthresh))
> > > > > > -		goto driver_enqueue_stats_incremented;
> > > > > > -
> > > > > > -	/*
> > > > > > -	 * The cache follows the following algorithm:
> > > > > > -	 *   1. If the objects cannot be added to the cache without crossing
> > > > > > -	 *      the flush threshold, flush the cache to the backend.
> > > > > > -	 *   2. Add the objects to the cache.
> > > > > > -	 */
> > > > > > +	/* Assert the invariant preventing overflow in the comparison below. */
> > > > > > +	RTE_ASSERT(cache->len <= cache->flushthresh);
> > > > > >
> > > > > > -	if (cache->len + n <= cache->flushthresh) {
> > > > > > +	if (n <= cache->flushthresh - cache->len) {
> > > > > > +		/*
> > > > > > +		 * The objects can be added to the cache without crossing the
> > > > > > +		 * flush threshold.
> > > > > > +		 */
> > > > > >  		cache_objs = &cache->objs[cache->len];
> > > > > >  		cache->len += n;
> > > > > > -	} else {
> > > > > > +	} else if (likely(n <= cache->flushthresh)) {
> > > > > IMO, this is a misconfiguration on the application's part. In
> > > > > the PMDs I have looked at, the max value of 'n' is controlled by
> > > > > compile-time constants. The application could do a compile-time
> > > > > check against the cache threshold, or we could have another
> > > > > RTE_ASSERT on this (see the sketch below).
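
A sketch of such a compile-time check, assuming the application's cache
size and maximum burst are both compile-time constants. APP_CACHE_SIZE and
APP_MAX_BURST are hypothetical application-side names, and the 3/2 factor
mirrors the internal CACHE_FLUSHTHRESH_MULTIPLIER (1.5) in rte_mempool.c,
an implementation detail rather than a public API guarantee:

#include <rte_common.h>	/* RTE_BUILD_BUG_ON */

#define APP_CACHE_SIZE	512	/* cache_size passed at pool creation */
#define APP_MAX_BURST	64	/* largest bulk put the app ever issues */

static inline void
app_check_mempool_config(void)
{
	/* Fails to compile if a burst could exceed the flush threshold
	 * and therefore always bypass the per-lcore cache. */
	RTE_BUILD_BUG_ON(APP_MAX_BURST > (APP_CACHE_SIZE * 3) / 2);
}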
> > > >
> > > > There could be applications using a mempool for something else
> > > > than mbufs.
> > > Agree
> > >
> > > >
> > > > In that case, the application should be allowed to get/put many
> > > > objects in one transaction.
> > > Still, this is a misconfiguration on the application's part. On one
> > > hand the threshold is configured for 'x', but they are sending a
> > > request which is more than 'x'. It should be possible to change the
> > > threshold configuration or reduce the request size.
> > >
> > > If the application does not fix the misconfiguration, it is possible
> > > that it will always hit this case and not get the benefit of using
> > > the per-core cache.
> >
> > Correct. I suppose this is the intended behavior of this API.
> >
> > The zero-copy API proposed in another patch [1] has stricter
> > requirements on the bulk size.
> >
> > [1]: http://inbox.dpdk.org/dev/20221115161822.70886-1-mb@smartsharesystems.com/T/#u
> >
> > >
> > > With this check, we are introducing an additional memcpy as well. I
> > > am not sure if reusing the latest buffers is better than having a
> > > memcpy.
> >
> > There is no additional memcpy. The large bulk transfer is stored
> > directly in the backend pool, bypassing the mempool cache.
> >
> > Please note that this check is not new, it has just been moved. Before
> > this patch, it was checked on every call (if a cache is present); with
> > this patch, it is only checked if the entire request cannot go directly
> > into the cache.
> >
> > >
> > > >
> > > > >
> > > > > > +		/*
> > > > > > +		 * The request itself fits into the cache.
> > > > > > +		 * But first, the cache must be flushed to the backend, so
> > > > > > +		 * adding the objects does not cross the flush threshold.
> > > > > > +		 */
> > > > > >  		cache_objs = &cache->objs[0];
> > > > > >  		rte_mempool_ops_enqueue_bulk(mp, cache_objs, cache->len);
> > > > > >  		cache->len = n;
> > > > > > +	} else {
> > > > > > +		/* The request itself is too big for the cache. */
> > > > > > +		goto driver_enqueue_stats_incremented;
> > > > > >  	}
> > > > > >
> > > > > >  	/* Add the objects to the cache. */
> > > > > > @@ -1399,13 +1403,13 @@ rte_mempool_do_generic_put(struct rte_mempool *mp, void * const *obj_table,
> > > > > >
> > > > > >  driver_enqueue:
> > > > > >
> > > > > > -	/* increment stat now, adding in mempool always success */
> > > > > > +	/* Increment stats now, adding in mempool always succeeds. */
> > > > > >  	RTE_MEMPOOL_STAT_ADD(mp, put_bulk, 1);
> > > > > >  	RTE_MEMPOOL_STAT_ADD(mp, put_objs, n);
> > > > > >
> > > > > >  driver_enqueue_stats_incremented:
> > > > > >
> > > > > > -	/* push objects to the backend */
> > > > > > +	/* Push the objects to the backend. */
> > > > > >  	rte_mempool_ops_enqueue_bulk(mp, obj_table, n);
> > > > > >  }
> > > > > >
> > > > > > --
> > > > > > 2.17.1
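
To make the quoted diff easier to follow, and to show the bypass path
discussed above, here is a paraphrase of the patched put logic with the
diff markers stripped; a readability sketch, not a literal copy of the
function (the final rte_memcpy() comes from the unchanged context around
the hunk):

	if (n <= cache->flushthresh - cache->len) {
		/* Fast path: fits without crossing the flush threshold. */
		cache_objs = &cache->objs[cache->len];
		cache->len += n;
	} else if (likely(n <= cache->flushthresh)) {
		/* Fits, but only after flushing the cache to the backend. */
		cache_objs = &cache->objs[0];
		rte_mempool_ops_enqueue_bulk(mp, cache_objs, cache->len);
		cache->len = n;
	} else {
		/* Too big for the cache: bypass it; no extra copy is made. */
		goto driver_enqueue_stats_incremented;
	}

	/* Add the objects to the cache. */
	rte_memcpy(cache_objs, obj_table, sizeof(void *) * n);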



Thread overview: 16+ messages
2022-11-16 10:18 [PATCH] " Morten Brørup
2022-11-16 11:04 ` Andrew Rybchenko
2022-11-16 11:10   ` Morten Brørup
2022-11-16 11:29     ` Andrew Rybchenko
2022-11-16 12:14 ` [PATCH v2] " Morten Brørup
2022-11-16 15:51   ` Honnappa Nagarahalli
2022-11-16 15:59     ` Morten Brørup
2022-11-16 16:26       ` Honnappa Nagarahalli
2022-11-16 17:39         ` Morten Brørup
2022-12-19  8:50           ` Morten Brørup
2022-12-22 13:52             ` Konstantin Ananyev [this message]
2022-12-22 15:02               ` Morten Brørup
2022-12-23 16:34                 ` Konstantin Ananyev
2022-12-24 10:46 ` [PATCH v3] " Morten Brørup
2022-12-27  8:54   ` Andrew Rybchenko
2022-12-27 15:37     ` Morten Brørup
