Subject: RE: [PATCH] mempool: micro optimizations
Date: Thu, 27 Mar 2025 20:30:18 +0100
From: Morten Brørup
To: "Bruce Richardson"
Cc: "Andrew Rybchenko", dev@dpdk.org
Message-ID: <98CBD80474FA8B44BF855DF32C47DC35E9FB7D@smartserver.smartshare.dk>
References: <20250226155923.128859-1-mb@smartsharesystems.com>

> From: Bruce Richardson [mailto:bruce.richardson@intel.com]
> Sent: Thursday, 27 March 2025 18.16
>
> On Wed, Feb 26, 2025 at 03:59:22PM +0000, Morten Brørup wrote:
> > The comparisons lcore_id < RTE_MAX_LCORE and lcore_id != LCORE_ID_ANY
> > are equivalent, but the latter compiles to fewer bytes of code space.
> > Similarly for lcore_id >= RTE_MAX_LCORE and lcore_id == LCORE_ID_ANY.
> >
> > The rte_mempool_get_ops() function is also used in the fast path, so
> > RTE_VERIFY() was replaced by RTE_ASSERT().
> >
> > Compilers implicitly consider comparisons of variable == 0 likely, so
> > unlikely() was added to the check for no mempool cache
> > (mp->cache_size == 0) in the rte_mempool_default_cache() function.
> >
> > The rte_mempool_do_generic_put() function for adding objects to a
> > mempool was refactored as follows:
> > - The comparison for the request itself being too big, which is
> >   considered unlikely, was moved down and out of the code path where
> >   the cache has sufficient room for the added objects, which is
> >   considered the most likely code path.
> > - Added __rte_assume() about the cache length, size and threshold,
> >   for compiler optimization when "n" is a compile-time constant.
> > - Added __rte_assume() about "ret" being zero, so other functions
> >   using the value returned by this function can potentially be
> >   optimized by the compiler; especially when it merges multiple
> >   sequential code paths of inlined code depending on the return
> >   value being either zero or negative.
> > - The refactored source code (with comments) made the separate
> >   comment describing the cache flush/add algorithm superfluous, so
> >   it was removed.
> >
> > A few more likely()/unlikely() were added.
>
> In general not a big fan of using likely/unlikely, but if they give a
> perf benefit, we should probably take them.
They can also be a hint to the code reviewer.

The benefit varies depending on the architecture's dynamic branch
predictor. Some architectures don't consume a branch predictor entry if
the code follows the expected path (according to static branch
prediction); but I don't know whether that is the case for the
architectures supported by DPDK, or only for ancient architectures.

I like them enough that I will probably some day provide an EAL patch
offering superlikely() and superunlikely(), based on
__builtin_expect_with_probability(). I have seen compilers emit
different assembly output when using superlikely() vs. likely(). (A
rough sketch of such macros is at the end of this mail.)

Moving the super-unlikely code away from the cache lines holding the
common code path also reduces the L1 instruction cache pressure.

In other words: Not even a perfect dynamic branch predictor can
substitute for all the benefits of using likely()/unlikely().

>
> Few more comments inline below.
>
> > A few comments were improved for readability.
> >
> > Some assertions, RTE_ASSERT(), were added. Most importantly to
> > assert that the return values of the mempool drivers' enqueue and
> > dequeue operations are API compliant, i.e. 0 (for success) or
> > negative (for failure), and never positive.
> >
> > Signed-off-by: Morten Brørup
>
> Acked-by: Bruce Richardson
>
> > ---
> >  lib/mempool/rte_mempool.h | 67 ++++++++++++++++++++++-----------------
> >  1 file changed, 38 insertions(+), 29 deletions(-)
> >
> > diff --git a/lib/mempool/rte_mempool.h b/lib/mempool/rte_mempool.h
> > index c495cc012f..aedc100964 100644
> > --- a/lib/mempool/rte_mempool.h
> > +++ b/lib/mempool/rte_mempool.h
> > @@ -334,7 +334,7 @@ struct __rte_cache_aligned rte_mempool {
> >  #ifdef RTE_LIBRTE_MEMPOOL_STATS
> >  #define RTE_MEMPOOL_STAT_ADD(mp, name, n) do {                  \
> >  		unsigned int __lcore_id = rte_lcore_id();           \
> > -		if (likely(__lcore_id < RTE_MAX_LCORE))             \
> > +		if (likely(__lcore_id != LCORE_ID_ANY))             \
>
> Is this not opening up the possibility of runtime crashes, if
> __lcore_id is invalid? I see from the commit log, you say the change
> in comparison results in smaller code gen, but it does leave undefined
> behaviour when __lcore_id == 500, for example.

In this case, __lcore_id comes from rte_lcore_id(), and if that is
invalid, everything breaks everywhere.
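To illustrate the code size point, here is a standalone sketch (not
part of the patch; the function names and the RTE_MAX_LCORE value are
invented for the example). LCORE_ID_ANY is UINT32_MAX, so the equality
test compares against -1, which e.g. on x86-64 fits in a sign-extended
8-bit immediate, whereas the range check needs the full RTE_MAX_LCORE
constant. The two tests are only equivalent under the API contract
that lcore_id is either a valid lcore id or LCORE_ID_ANY:

    /* Standalone sketch, not part of the patch. The function names and
     * the RTE_MAX_LCORE value are invented for illustration only. */
    #include <stdint.h>

    #define RTE_MAX_LCORE 128        /* example value; build-time configurable */
    #define LCORE_ID_ANY  UINT32_MAX /* as defined in rte_lcore.h */

    /* Range check: compares against a 32-bit immediate. */
    int has_cache_range(unsigned int lcore_id)
    {
    	return lcore_id < RTE_MAX_LCORE;
    }

    /* Equality check: compares against -1. Equivalent to the above
     * only when lcore_id is a valid lcore id or LCORE_ID_ANY. */
    int has_cache_any(unsigned int lcore_id)
    {
    	return lcore_id != LCORE_ID_ANY;
    }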
>
> >  		(mp)->stats[__lcore_id].name += (n);                \
> >  	else                                                        \
> >  		rte_atomic_fetch_add_explicit(&((mp)->stats[RTE_MAX_LCORE].name), \
> > @@ -751,7 +751,7 @@ extern struct rte_mempool_ops_table rte_mempool_ops_table;
> >  static inline struct rte_mempool_ops *
> >  rte_mempool_get_ops(int ops_index)
> >  {
> > -	RTE_VERIFY((ops_index >= 0) && (ops_index < RTE_MEMPOOL_MAX_OPS_IDX));
> > +	RTE_ASSERT((ops_index >= 0) && (ops_index < RTE_MEMPOOL_MAX_OPS_IDX));
> >
> >  	return &rte_mempool_ops_table.ops[ops_index];
> >  }
> > @@ -791,7 +791,8 @@ rte_mempool_ops_dequeue_bulk(struct rte_mempool *mp,
> >  	rte_mempool_trace_ops_dequeue_bulk(mp, obj_table, n);
> >  	ops = rte_mempool_get_ops(mp->ops_index);
> >  	ret = ops->dequeue(mp, obj_table, n);
> > -	if (ret == 0) {
> > +	RTE_ASSERT(ret <= 0);
> > +	if (likely(ret == 0)) {
> >  		RTE_MEMPOOL_STAT_ADD(mp, get_common_pool_bulk, 1);
> >  		RTE_MEMPOOL_STAT_ADD(mp, get_common_pool_objs, n);
> >  	}
> > @@ -816,11 +817,14 @@ rte_mempool_ops_dequeue_contig_blocks(struct rte_mempool *mp,
> >  		void **first_obj_table, unsigned int n)
> >  {
> >  	struct rte_mempool_ops *ops;
> > +	int ret;
> >
> >  	ops = rte_mempool_get_ops(mp->ops_index);
> >  	RTE_ASSERT(ops->dequeue_contig_blocks != NULL);
> >  	rte_mempool_trace_ops_dequeue_contig_blocks(mp, first_obj_table, n);
> > -	return ops->dequeue_contig_blocks(mp, first_obj_table, n);
> > +	ret = ops->dequeue_contig_blocks(mp, first_obj_table, n);
> > +	RTE_ASSERT(ret <= 0);
> > +	return ret;
> >  }
> >
> >  /**
> > @@ -848,6 +852,7 @@ rte_mempool_ops_enqueue_bulk(struct rte_mempool *mp, void * const *obj_table,
> >  	rte_mempool_trace_ops_enqueue_bulk(mp, obj_table, n);
> >  	ops = rte_mempool_get_ops(mp->ops_index);
> >  	ret = ops->enqueue(mp, obj_table, n);
> > +	RTE_ASSERT(ret <= 0);
> >  #ifdef RTE_LIBRTE_MEMPOOL_DEBUG
> >  	if (unlikely(ret < 0))
> >  		RTE_MEMPOOL_LOG(CRIT, "cannot enqueue %u objects to mempool %s",
> > @@ -1333,10 +1338,10 @@ rte_mempool_cache_free(struct rte_mempool_cache *cache);
> >  static __rte_always_inline struct rte_mempool_cache *
> >  rte_mempool_default_cache(struct rte_mempool *mp, unsigned lcore_id)
> >  {
> > -	if (mp->cache_size == 0)
> > +	if (unlikely(mp->cache_size == 0))
> >  		return NULL;
> >
> > -	if (lcore_id >= RTE_MAX_LCORE)
> > +	if (unlikely(lcore_id == LCORE_ID_ANY))
> >  		return NULL;
> >
>
> Again, I'd be concerned about the resiliency of this. But I suppose
> having an invalid lcore id is just asking for problems and crashes
> later.

I went through the same line of thinking. This does introduce a risk.
However, DPDK is built on a design requirement that parameters passed
to fast path APIs must be valid, so I trust the API contract here.
And, as you say, if someone starts passing an invalid lcore_id around,
lots of stuff will break anyway.

>
> >  	rte_mempool_trace_default_cache(mp, lcore_id,
> > @@ -1383,32 +1388,33 @@ rte_mempool_do_generic_put(struct rte_mempool *mp, void * const *obj_table,
> >  {

Thank you for the feedback and ack, Bruce.
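PS: For the record, a rough sketch of the superlikely()/superunlikely()
idea mentioned above. The names, the fallback and the 0.999 probability
are placeholders of mine, nothing more; __builtin_expect_with_probability()
needs GCC 9 or newer (recent Clang also has it):

    /* Rough sketch only; macro names and probability value are placeholders. */
    #if defined(__GNUC__) && (__GNUC__ >= 9)
    #define superlikely(x) \
    	__builtin_expect_with_probability(!!(x), 1, 0.999)
    #define superunlikely(x) \
    	__builtin_expect_with_probability(!!(x), 0, 0.999)
    #else /* fall back to the existing likely()/unlikely() hints */
    #define superlikely(x)   likely(x)
    #define superunlikely(x) unlikely(x)
    #endif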