DPDK patches and discussions
From: Bruce Richardson <bruce.richardson@intel.com>
To: "Morten Brørup" <mb@smartsharesystems.com>
Cc: Andrew Rybchenko <andrew.rybchenko@oktetlabs.ru>, <dev@dpdk.org>
Subject: Re: [PATCH] mempool: micro optimizations
Date: Thu, 27 Mar 2025 17:15:58 +0000	[thread overview]
Message-ID: <Z-WHzgriLwWC_3Mu@bricha3-mobl1.ger.corp.intel.com> (raw)
In-Reply-To: <20250226155923.128859-1-mb@smartsharesystems.com>

On Wed, Feb 26, 2025 at 03:59:22PM +0000, Morten Brørup wrote:
> The comparisons lcore_id < RTE_MAX_LCORE and lcore_id != LCORE_ID_ANY are
> equivalent, but the latter compiles to fewer bytes of code space.
> Similarly for lcore_id >= RTE_MAX_LCORE and lcore_id == LCORE_ID_ANY.
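
For context (an illustration of mine, not part of the patch): the
equivalence relies on rte_lcore_id() returning either a valid id in
[0, RTE_MAX_LCORE) or LCORE_ID_ANY (UINT32_MAX) for unregistered
threads, so under that invariant the two checks accept exactly the
same set of values:

        unsigned int lcore_id = rte_lcore_id();

        if (lcore_id < RTE_MAX_LCORE) {
                /* range check against the build-time bound */
        }
        if (lcore_id != LCORE_ID_ANY) {
                /* same outcome under the invariant above,
                 * but smaller code per the commit log */
        }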
> 
> The rte_mempool_get_ops() function is also used in the fast path, so
> RTE_VERIFY() was replaced by RTE_ASSERT().
> 
> Compilers implicitly consider comparisons of variable == 0 likely, so
> unlikely() was added to the check for no mempool cache (mp->cache_size ==
> 0) in the rte_mempool_default_cache() function.
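
(As a side note, unlikely() in rte_branch_prediction.h is a thin
wrapper around __builtin_expect, roughly:

        #define unlikely(x) __builtin_expect(!!(x), 0)

        if (unlikely(mp->cache_size == 0))  /* cache-less pools kept off the hot path */
                return NULL;

so the hint only steers block layout and static branch prediction.)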
> 
> The rte_mempool_do_generic_put() function for adding objects to a mempool
> was refactored as follows:
> - The comparison for the request itself being too big, which is considered
>   unlikely, was moved down and out of the code path where the cache has
>   sufficient room for the added objects, which is considered the most
>   likely code path.
> - Added __rte_assume() about the cache length, size and threshold, for
>   compiler optimization when "n" is compile time constant.
> - Added __rte_assume() about "ret" being zero, so other functions using
>   the value returned by this function can be potentially optimized by the
>   compiler; especially when it merges multiple sequential code paths of
>   inlined code depending on the return value being either zero or
>   negative.
> - The refactored source code (with comments) made the separate comment
>   describing the cache flush/add algorithm superfluous, so it was removed.
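
A rough sketch of my own (not the actual patch hunks) of how
__rte_assume() hints like the ones described in the bullets above can
pay off when "n" is a compile-time constant:

        __rte_assume(cache->len <= cache->flushthresh); /* illustrative bound only;
                                                         * the patch states its own */

        if (likely(cache->len + n <= cache->flushthresh)) {
                /* Fast path: with the bound known, the compiler can
                 * prove this branch for small constant n and drop the
                 * flush path entirely in the inlined caller. */
        }

        __rte_assume(ret == 0); /* per the bullet on "ret": lets callers that
                                 * inline this code fold "if (ret != 0)" away */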
> 
> A few more likely()/unlikely() were added.

In general I'm not a big fan of using likely/unlikely, but if they give a
perf benefit, we should probably take them.

A few more comments inline below.

> A few comments were improved for readability.
> 
> Some assertions, RTE_ASSERT(), were added. Most importantly to assert that
> the return values of the mempool drivers' enqueue and dequeue operations
> are API compliant, i.e. 0 (for success) or negative (for failure), and
> never positive.
> 
> Signed-off-by: Morten Brørup <mb@smartsharesystems.com>

Acked-by: Bruce Richardson <bruce.richardson@intel.com>

> ---
>  lib/mempool/rte_mempool.h | 67 ++++++++++++++++++++++-----------------
>  1 file changed, 38 insertions(+), 29 deletions(-)
> 
> diff --git a/lib/mempool/rte_mempool.h b/lib/mempool/rte_mempool.h
> index c495cc012f..aedc100964 100644
> --- a/lib/mempool/rte_mempool.h
> +++ b/lib/mempool/rte_mempool.h
> @@ -334,7 +334,7 @@ struct __rte_cache_aligned rte_mempool {
>  #ifdef RTE_LIBRTE_MEMPOOL_STATS
>  #define RTE_MEMPOOL_STAT_ADD(mp, name, n) do {                                  \
>  		unsigned int __lcore_id = rte_lcore_id();                       \
> -		if (likely(__lcore_id < RTE_MAX_LCORE))                         \
> +		if (likely(__lcore_id != LCORE_ID_ANY))                         \

Is this not opening up the possibility of runtime crashes if __lcore_id is
invalid? I see from the commit log that you say the change in comparison
results in smaller code gen, but it does leave undefined behaviour when
__lcore_id == 500, for example.
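
To make that concrete (my illustration; stats[] is declared with
RTE_MAX_LCORE + 1 entries, the last slot being the shared fallback
used in the else branch below): a hypothetical __lcore_id == 500 is
not LCORE_ID_ANY, so it now passes the check and indexes past the
array instead of falling through to the shared slot:

        if (likely(__lcore_id != LCORE_ID_ANY))
                (mp)->stats[__lcore_id].name += (n);    /* potential OOB write */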

>  			(mp)->stats[__lcore_id].name += (n);                    \
>  		else                                                            \
>  			rte_atomic_fetch_add_explicit(&((mp)->stats[RTE_MAX_LCORE].name),  \
> @@ -751,7 +751,7 @@ extern struct rte_mempool_ops_table rte_mempool_ops_table;
>  static inline struct rte_mempool_ops *
>  rte_mempool_get_ops(int ops_index)
>  {
> -	RTE_VERIFY((ops_index >= 0) && (ops_index < RTE_MEMPOOL_MAX_OPS_IDX));
> +	RTE_ASSERT((ops_index >= 0) && (ops_index < RTE_MEMPOOL_MAX_OPS_IDX));
>  
>  	return &rte_mempool_ops_table.ops[ops_index];
>  }
> @@ -791,7 +791,8 @@ rte_mempool_ops_dequeue_bulk(struct rte_mempool *mp,
>  	rte_mempool_trace_ops_dequeue_bulk(mp, obj_table, n);
>  	ops = rte_mempool_get_ops(mp->ops_index);
>  	ret = ops->dequeue(mp, obj_table, n);
> -	if (ret == 0) {
> +	RTE_ASSERT(ret <= 0);
> +	if (likely(ret == 0)) {
>  		RTE_MEMPOOL_STAT_ADD(mp, get_common_pool_bulk, 1);
>  		RTE_MEMPOOL_STAT_ADD(mp, get_common_pool_objs, n);
>  	}
> @@ -816,11 +817,14 @@ rte_mempool_ops_dequeue_contig_blocks(struct rte_mempool *mp,
>  		void **first_obj_table, unsigned int n)
>  {
>  	struct rte_mempool_ops *ops;
> +	int ret;
>  
>  	ops = rte_mempool_get_ops(mp->ops_index);
>  	RTE_ASSERT(ops->dequeue_contig_blocks != NULL);
>  	rte_mempool_trace_ops_dequeue_contig_blocks(mp, first_obj_table, n);
> -	return ops->dequeue_contig_blocks(mp, first_obj_table, n);
> +	ret = ops->dequeue_contig_blocks(mp, first_obj_table, n);
> +	RTE_ASSERT(ret <= 0);
> +	return ret;
>  }
>  
>  /**
> @@ -848,6 +852,7 @@ rte_mempool_ops_enqueue_bulk(struct rte_mempool *mp, void * const *obj_table,
>  	rte_mempool_trace_ops_enqueue_bulk(mp, obj_table, n);
>  	ops = rte_mempool_get_ops(mp->ops_index);
>  	ret = ops->enqueue(mp, obj_table, n);
> +	RTE_ASSERT(ret <= 0);
>  #ifdef RTE_LIBRTE_MEMPOOL_DEBUG
>  	if (unlikely(ret < 0))
>  		RTE_MEMPOOL_LOG(CRIT, "cannot enqueue %u objects to mempool %s",
> @@ -1333,10 +1338,10 @@ rte_mempool_cache_free(struct rte_mempool_cache *cache);
>  static __rte_always_inline struct rte_mempool_cache *
>  rte_mempool_default_cache(struct rte_mempool *mp, unsigned lcore_id)
>  {
> -	if (mp->cache_size == 0)
> +	if (unlikely(mp->cache_size == 0))
>  		return NULL;
>  
> -	if (lcore_id >= RTE_MAX_LCORE)
> +	if (unlikely(lcore_id == LCORE_ID_ANY))
>  		return NULL;
>  

Again, I'd be concerned about the resiliency of this. But I suppose having
an invalid lcore id is just asking for problems and crashes later.

>  	rte_mempool_trace_default_cache(mp, lcore_id,
> @@ -1383,32 +1388,33 @@ rte_mempool_do_generic_put(struct rte_mempool *mp, void * const *obj_table,
>  {

Thread overview: 9+ messages
2025-02-26 15:59 Morten Brørup
2025-02-26 16:53 ` Bruce Richardson
2025-02-27  9:14   ` Morten Brørup
2025-02-27  9:17     ` Bruce Richardson
2025-02-28 16:59       ` Morten Brørup
2025-03-25  7:13 ` Morten Brørup
2025-03-27 17:15 ` Bruce Richardson [this message]
2025-03-27 19:30   ` Morten Brørup
2025-03-30  8:09   ` Andrew Rybchenko
