From: Bruce Richardson <bruce.richardson@intel.com>
To: "Morten Brørup" <mb@smartsharesystems.com>
Cc: Andrew Rybchenko <andrew.rybchenko@oktetlabs.ru>, <dev@dpdk.org>
Subject: Re: [PATCH] mempool: micro optimizations
Date: Thu, 27 Mar 2025 17:15:58 +0000
Message-ID: <Z-WHzgriLwWC_3Mu@bricha3-mobl1.ger.corp.intel.com>
In-Reply-To: <20250226155923.128859-1-mb@smartsharesystems.com>
On Wed, Feb 26, 2025 at 03:59:22PM +0000, Morten Brørup wrote:
> The comparisons lcore_id < RTE_MAX_LCORE and lcore_id != LCORE_ID_ANY are
> equivalent, but the latter compiles to fewer bytes of code space.
> Similarly for lcore_id >= RTE_MAX_LCORE and lcore_id == LCORE_ID_ANY.
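A minimal sketch of why the two tests are interchangeable, assuming the
usual EAL invariant that rte_lcore_id() returns either a valid id in
[0, RTE_MAX_LCORE) or LCORE_ID_ANY and nothing else (names and values
below are illustrative stand-ins, not from the patch):

#include <stdint.h>

#define MAX_LCORE 128u           /* stand-in for RTE_MAX_LCORE */
#define ID_ANY    UINT32_MAX     /* stand-in for LCORE_ID_ANY */

static inline int valid_old(uint32_t id)
{
	return id < MAX_LCORE;   /* compares against a 32-bit immediate */
}

static inline int valid_new(uint32_t id)
{
	/* compares against -1, which encodes shorter on x86; same
	 * result as valid_old() whenever the invariant above holds */
	return id != ID_ANY;
}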
>
> The rte_mempool_get_ops() function is also used in the fast path, so
> RTE_VERIFY() was replaced by RTE_ASSERT().
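For context, the practical difference between the two macros, simplified
from rte_debug.h (stand-in names, abort() in place of rte_panic()):

#include <stdlib.h>

#define VERIFY_SKETCH(exp) do { if (!(exp)) abort(); } while (0) /* always on */

#ifdef RTE_ENABLE_ASSERT
#define ASSERT_SKETCH(exp) VERIFY_SKETCH(exp)  /* debug builds: same check */
#else
#define ASSERT_SKETCH(exp) do {} while (0)     /* release builds: no cost */
#endif

So the swap removes a branch from the fast path in release builds while
keeping the check available in debug builds.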
>
> Compilers implicitly consider comparisons of variable == 0 likely, so
> unlikely() was added to the check for no mempool cache (mp->cache_size ==
> 0) in the rte_mempool_default_cache() function.
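For reference, likely()/unlikely() are thin wrappers over a GCC builtin
(cf. rte_branch_prediction.h), so the hint only steers code layout and
static branch prediction; behaviour is unchanged:

#define likely(x)   __builtin_expect(!!(x), 1)
#define unlikely(x) __builtin_expect(!!(x), 0)

With the hint, the no-cache early return is laid out off the
fall-through path of rte_mempool_default_cache().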
>
> The rte_mempool_do_generic_put() function for adding objects to a mempool
> was refactored as follows:
> - The comparison for the request itself being too big, which is considered
> unlikely, was moved down and out of the code path where the cache has
> sufficient room for the added objects, which is considered the most
> likely code path.
> - Added __rte_assume() about the cache length, size and threshold, for
> compiler optimization when "n" is compile time constant.
> - Added __rte_assume() about "ret" being zero, so other functions using
> the value returned by this function can be potentially optimized by the
> compiler; especially when it merges multiple sequential code paths of
> inlined code depending on the return value being either zero or
> negative.
> - The refactored source code (with comments) made the separate comment
> describing the cache flush/add algorithm superfluous, so it was removed.
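A hedged sketch of what the two __rte_assume() points above can buy the
optimizer (cut-down struct and names, illustration only, not the
patched code):

#include <stdint.h>

struct cache_sketch {                  /* stand-in for rte_mempool_cache */
	uint32_t size;
	uint32_t flushthresh;
	uint32_t len;
};

/* Same idea as __rte_assume(): promise a condition with no runtime
 * check, so the compiler may fold dependent branches away. */
#define assume_sketch(e) do { if (!(e)) __builtin_unreachable(); } while (0)

static inline int put_sketch(struct cache_sketch *c, unsigned int n)
{
	assume_sketch(c->len <= c->flushthresh);
	if (n <= c->flushthresh - c->len) {
		c->len += n;   /* likely path: the objects fit in the cache */
		return 0;      /* assuming ret == 0 lets callers fold checks */
	}
	/* unlikely path (flush the cache, or request too big) elided */
	return 0;
}

When n is a compile-time constant, such assumptions tighten the value
ranges the optimizer can reason about.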
>
> A few more likely()/unlikely() were added.
In general I'm not a big fan of using likely/unlikely, but if they give a
perf benefit, we should probably take them.
A few more comments inline below.
> A few comments were improved for readability.
>
> Some assertions, RTE_ASSERT(), were added. Most importantly to assert that
> the return values of the mempool drivers' enqueue and dequeue operations
> are API compliant, i.e. 0 (for success) or negative (for failure), and
> never positive.
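A sketch of the caller-side effect (hypothetical helper names, not from
the patch):

int ret = rte_mempool_ops_dequeue_bulk(mp, objs, n);
/* With ret <= 0 established by the new assertions, "ret == 0" and
 * "ret >= 0" become the same test, so inlined success/failure paths
 * keyed on either form can be merged. */
if (likely(ret == 0))
	consume_objs(objs, n);   /* hypothetical success handler */
else
	handle_failure(ret);     /* ret < 0: driver-reported error */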
>
> Signed-off-by: Morten Brørup <mb@smartsharesystems.com>
Acked-by: Bruce Richardson <bruce.richardson@intel.com>
> ---
> lib/mempool/rte_mempool.h | 67 ++++++++++++++++++++++-----------------
> 1 file changed, 38 insertions(+), 29 deletions(-)
>
> diff --git a/lib/mempool/rte_mempool.h b/lib/mempool/rte_mempool.h
> index c495cc012f..aedc100964 100644
> --- a/lib/mempool/rte_mempool.h
> +++ b/lib/mempool/rte_mempool.h
> @@ -334,7 +334,7 @@ struct __rte_cache_aligned rte_mempool {
> #ifdef RTE_LIBRTE_MEMPOOL_STATS
> #define RTE_MEMPOOL_STAT_ADD(mp, name, n) do { \
> unsigned int __lcore_id = rte_lcore_id(); \
> - if (likely(__lcore_id < RTE_MAX_LCORE)) \
> + if (likely(__lcore_id != LCORE_ID_ANY)) \
Is this not opening up the possibility of runtime crashes if __lcore_id is
invalid? I see from the commit log that the change in comparison results
in smaller code gen, but it does leave undefined behaviour when
__lcore_id == 500, for example.
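To spell the scenario out (hypothetical value, not from the patch):

unsigned int __lcore_id = 500;  /* corrupted or out-of-range id */
/* Old test: 500 < RTE_MAX_LCORE is false -> safe atomic fallback.
 * New test: 500 != LCORE_ID_ANY is true -> (mp)->stats[500] is
 * written, out of bounds whenever RTE_MAX_LCORE <= 500. */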
> (mp)->stats[__lcore_id].name += (n); \
> else \
> rte_atomic_fetch_add_explicit(&((mp)->stats[RTE_MAX_LCORE].name), \
> @@ -751,7 +751,7 @@ extern struct rte_mempool_ops_table rte_mempool_ops_table;
> static inline struct rte_mempool_ops *
> rte_mempool_get_ops(int ops_index)
> {
> - RTE_VERIFY((ops_index >= 0) && (ops_index < RTE_MEMPOOL_MAX_OPS_IDX));
> + RTE_ASSERT((ops_index >= 0) && (ops_index < RTE_MEMPOOL_MAX_OPS_IDX));
>
> return &rte_mempool_ops_table.ops[ops_index];
> }
> @@ -791,7 +791,8 @@ rte_mempool_ops_dequeue_bulk(struct rte_mempool *mp,
> rte_mempool_trace_ops_dequeue_bulk(mp, obj_table, n);
> ops = rte_mempool_get_ops(mp->ops_index);
> ret = ops->dequeue(mp, obj_table, n);
> - if (ret == 0) {
> + RTE_ASSERT(ret <= 0);
> + if (likely(ret == 0)) {
> RTE_MEMPOOL_STAT_ADD(mp, get_common_pool_bulk, 1);
> RTE_MEMPOOL_STAT_ADD(mp, get_common_pool_objs, n);
> }
> @@ -816,11 +817,14 @@ rte_mempool_ops_dequeue_contig_blocks(struct rte_mempool *mp,
> void **first_obj_table, unsigned int n)
> {
> struct rte_mempool_ops *ops;
> + int ret;
>
> ops = rte_mempool_get_ops(mp->ops_index);
> RTE_ASSERT(ops->dequeue_contig_blocks != NULL);
> rte_mempool_trace_ops_dequeue_contig_blocks(mp, first_obj_table, n);
> - return ops->dequeue_contig_blocks(mp, first_obj_table, n);
> + ret = ops->dequeue_contig_blocks(mp, first_obj_table, n);
> + RTE_ASSERT(ret <= 0);
> + return ret;
> }
>
> /**
> @@ -848,6 +852,7 @@ rte_mempool_ops_enqueue_bulk(struct rte_mempool *mp, void * const *obj_table,
> rte_mempool_trace_ops_enqueue_bulk(mp, obj_table, n);
> ops = rte_mempool_get_ops(mp->ops_index);
> ret = ops->enqueue(mp, obj_table, n);
> + RTE_ASSERT(ret <= 0);
> #ifdef RTE_LIBRTE_MEMPOOL_DEBUG
> if (unlikely(ret < 0))
> RTE_MEMPOOL_LOG(CRIT, "cannot enqueue %u objects to mempool %s",
> @@ -1333,10 +1338,10 @@ rte_mempool_cache_free(struct rte_mempool_cache *cache);
> static __rte_always_inline struct rte_mempool_cache *
> rte_mempool_default_cache(struct rte_mempool *mp, unsigned lcore_id)
> {
> - if (mp->cache_size == 0)
> + if (unlikely(mp->cache_size == 0))
> return NULL;
>
> - if (lcore_id >= RTE_MAX_LCORE)
> + if (unlikely(lcore_id == LCORE_ID_ANY))
> return NULL;
>
Again, I'd be concerned about the resiliency of this. But I suppose having
an invalid lcore id is just asking for problems and crashes later.
> rte_mempool_trace_default_cache(mp, lcore_id,
> @@ -1383,32 +1388,33 @@ rte_mempool_do_generic_put(struct rte_mempool *mp, void * const *obj_table,
> {