Subject: Re: [PATCH v2 3/3] mempool: use cache for frequently updated statistics
Date: Wed, 2 Nov 2022 09:01:07 +0100
From: Mattias Rönnblom
To: Morten Brørup, olivier.matz@6wind.com, andrew.rybchenko@oktetlabs.ru, stephen@networkplumber.org, jerinj@marvell.com, bruce.richardson@intel.com
Cc: thomas@monjalon.net, dev@dpdk.org
Message-ID: <5850fa4f-dbc5-b2a6-5927-23adf8ba209f@lysator.liu.se>
In-Reply-To: <20221031112634.18329-3-mb@smartsharesystems.com>
References: <20221030115445.2115-1-mb@smartsharesystems.com> <20221031112634.18329-1-mb@smartsharesystems.com> <20221031112634.18329-3-mb@smartsharesystems.com>

On 2022-10-31 12:26, Morten Brørup wrote:
> When built with statistics enabled (RTE_LIBRTE_MEMPOOL_STATS defined), the
> performance of mempools with caches is improved as follows.
>
> When accessing objects in the mempool, either the put_bulk and put_objs or
> the get_success_bulk and get_success_objs statistics counters are likely
> to be incremented.
>
> By adding an alternative set of these counters to the mempool cache
> structure, accessing the dedicated statistics structure is avoided in the
> likely cases where these counters are incremented.
>
> The trick here is that the cache line holding the mempool cache structure
> is accessed anyway, in order to access the 'len' or 'flushthresh' fields.
> Updating some statistics counters in the same cache line has lower
> performance cost than accessing the statistics counters in the dedicated
> statistics structure, which resides in another cache line.
>
> mempool_perf_autotest with this patch shows the following change in
> rate_persec.
>
> Compared to only splitting statistics from debug:
> +1.5 % and +14.4 %, respectively without and with cache.
>
> Compared to not enabling mempool stats:
> -4.4 % and -9.9 %, respectively without and with cache.
>
> v2:
> * Move the statistics counters into a stats structure.
>
> Signed-off-by: Morten Brørup
> ---
>  lib/mempool/rte_mempool.c |  9 +++++
>  lib/mempool/rte_mempool.h | 73 ++++++++++++++++++++++++++++++++-------
>  2 files changed, 69 insertions(+), 13 deletions(-)
>
> diff --git a/lib/mempool/rte_mempool.c b/lib/mempool/rte_mempool.c
> index e6208125e0..a18e39af04 100644
> --- a/lib/mempool/rte_mempool.c
> +++ b/lib/mempool/rte_mempool.c
> @@ -1286,6 +1286,15 @@ rte_mempool_dump(FILE *f, struct rte_mempool *mp)
>  		sum.get_success_blks += mp->stats[lcore_id].get_success_blks;
>  		sum.get_fail_blks += mp->stats[lcore_id].get_fail_blks;
>  	}
> +	if (mp->cache_size != 0) {
> +		/* Add the statistics stored in the mempool caches. */
> +		for (lcore_id = 0; lcore_id < RTE_MAX_LCORE; lcore_id++) {
> +			sum.put_bulk += mp->local_cache[lcore_id].stats.put_bulk;
> +			sum.put_objs += mp->local_cache[lcore_id].stats.put_objs;
> +			sum.get_success_bulk += mp->local_cache[lcore_id].stats.get_success_bulk;
> +			sum.get_success_objs += mp->local_cache[lcore_id].stats.get_success_objs;
> +		}
> +	}
>  	fprintf(f, "  stats:\n");
>  	fprintf(f, "    put_bulk=%"PRIu64"\n", sum.put_bulk);
>  	fprintf(f, "    put_objs=%"PRIu64"\n", sum.put_objs);
> diff --git a/lib/mempool/rte_mempool.h b/lib/mempool/rte_mempool.h
> index 16e7e62e3c..5806e75609 100644
> --- a/lib/mempool/rte_mempool.h
> +++ b/lib/mempool/rte_mempool.h
> @@ -86,6 +86,21 @@ struct rte_mempool_cache {
>  	uint32_t size; /**< Size of the cache */
>  	uint32_t flushthresh; /**< Threshold before we flush excess elements */
>  	uint32_t len; /**< Current cache count */
> +	uint32_t unused0;
> +#ifdef RTE_LIBRTE_MEMPOOL_STATS
> +	/*
> +	 * Alternative location for the most frequently updated mempool statistics (per-lcore),
> +	 * providing faster update access when using a mempool cache.
> +	 */
> +	struct {
> +		uint64_t put_bulk; /**< Number of puts. */
> +		uint64_t put_objs; /**< Number of objects successfully put. */
> +		uint64_t get_success_bulk; /**< Successful allocation number. */
> +		uint64_t get_success_objs; /**< Objects successfully allocated. */
> +	} stats; /**< Statistics */
> +#else
> +	uint64_t unused1[4];

Is a particular DPDK version supposed to be ABI compatible with itself, 
when built with different configuration options? E.g., with and without 
RTE_LIBRTE_MEMPOOL_STATS. Is that why you have those 4 unused uint64_ts?
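If that kind of size invariance is the intent, it could be made explicit 
with a compile-time check. A minimal sketch, assuming the padding exists 
to keep the struct layout identical in both builds (the stand-alone 
struct name and the assertion are mine, for illustration only, not part 
of the patch):

/* Hypothetical illustration: check that the #else padding keeps the
 * layout independent of the build option. */
#include <stdint.h>

struct cache_layout {
	uint32_t size;
	uint32_t flushthresh;
	uint32_t len;
	uint32_t unused0;
#ifdef RTE_LIBRTE_MEMPOOL_STATS
	struct {
		uint64_t put_bulk;
		uint64_t put_objs;
		uint64_t get_success_bulk;
		uint64_t get_success_objs;
	} stats;
#else
	uint64_t unused1[4]; /* must match the footprint of 'stats' */
#endif
};

/* Holds in both configurations only because both branches occupy
 * 4 * sizeof(uint64_t) bytes. */
_Static_assert(sizeof(struct cache_layout) ==
	       4 * sizeof(uint32_t) + 4 * sizeof(uint64_t),
	       "layout must not depend on RTE_LIBRTE_MEMPOOL_STATS");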
> +#endif
>  	/**
>  	 * Cache objects
>  	 *
> @@ -296,14 +311,14 @@ struct rte_mempool {
>  	| RTE_MEMPOOL_F_NO_IOVA_CONTIG \
>  	)
>  /**
> - * @internal When debug is enabled, store some statistics.
> + * @internal When stats is enabled, store some statistics.
>   *
>   * @param mp
>   *   Pointer to the memory pool.
>   * @param name
>   *   Name of the statistics field to increment in the memory pool.
>   * @param n
> - *   Number to add to the object-oriented statistics.
> + *   Number to add to the statistics.
>   */
>  #ifdef RTE_LIBRTE_MEMPOOL_STATS
>  #define RTE_MEMPOOL_STAT_ADD(mp, name, n) do { \
>  	} while (0)
>  #else
>  #define RTE_MEMPOOL_STAT_ADD(mp, name, n) do {} while (0)
>  #endif
> +/**
> + * @internal When stats is enabled, store some statistics.
> + *
> + * @param cache
> + *   Pointer to the memory pool cache.
> + * @param name
> + *   Name of the statistics field to increment in the memory pool cache.
> + * @param n
> + *   Number to add to the statistics.
> + */
> +#ifdef RTE_LIBRTE_MEMPOOL_STATS
> +#define RTE_MEMPOOL_CACHE_STAT_ADD(cache, name, n) do { \
> +		(cache)->stats.name += n; \
> +	} while (0)
> +#else
> +#define RTE_MEMPOOL_CACHE_STAT_ADD(cache, name, n) do {} while (0)

Somewhat unrelated comment: maybe we should have an RTE_NOP macro.
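Something like the following, sketched under the assumption that no such 
macro exists in DPDK today; the compiled-out stubs of these stats macros 
could then share one definition:

/* Hypothetical RTE_NOP: a statement that compiles to nothing but still
 * behaves like a single statement in unbraced if/else bodies. */
#define RTE_NOP do {} while (0)

/* The stub variant above could then simply be: */
#define RTE_MEMPOOL_CACHE_STAT_ADD(cache, name, n) RTE_NOP

The do {} while (0) wrapper is what makes the no-op safe after if/else, 
which is the same reason the existing stubs spell it out by hand.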
> +#endif
>
>  /**
>   * @internal Calculate the size of the mempool header.
> @@ -1327,13 +1359,17 @@ rte_mempool_do_generic_put(struct rte_mempool *mp, void * const *obj_table,
>  {
>  	void **cache_objs;
>
> +	/* No cache provided */
> +	if (unlikely(cache == NULL))
> +		goto driver_enqueue;
> +
>  	/* increment stat now, adding in mempool always success */
> -	RTE_MEMPOOL_STAT_ADD(mp, put_bulk, 1);
> -	RTE_MEMPOOL_STAT_ADD(mp, put_objs, n);
> +	RTE_MEMPOOL_CACHE_STAT_ADD(cache, put_bulk, 1);
> +	RTE_MEMPOOL_CACHE_STAT_ADD(cache, put_objs, n);
>
> -	/* No cache provided or the request itself is too big for the cache */
> -	if (unlikely(cache == NULL || n > cache->flushthresh))
> -		goto driver_enqueue;
> +	/* The request itself is too big for the cache */
> +	if (unlikely(n > cache->flushthresh))
> +		goto driver_enqueue_stats_incremented;
>
>  	/*
>  	 * The cache follows the following algorithm:
> @@ -1358,6 +1394,12 @@ rte_mempool_do_generic_put(struct rte_mempool *mp, void * const *obj_table,
>
>  driver_enqueue:
>
> +	/* increment stat now, adding in mempool always success */
> +	RTE_MEMPOOL_STAT_ADD(mp, put_bulk, 1);
> +	RTE_MEMPOOL_STAT_ADD(mp, put_objs, n);
> +
> +driver_enqueue_stats_incremented:
> +
>  	/* push objects to the backend */
>  	rte_mempool_ops_enqueue_bulk(mp, obj_table, n);
>  }
> @@ -1464,8 +1506,8 @@ rte_mempool_do_generic_get(struct rte_mempool *mp, void **obj_table,
>  	if (remaining == 0) {
>  		/* The entire request is satisfied from the cache. */
>
> -		RTE_MEMPOOL_STAT_ADD(mp, get_success_bulk, 1);
> -		RTE_MEMPOOL_STAT_ADD(mp, get_success_objs, n);
> +		RTE_MEMPOOL_CACHE_STAT_ADD(cache, get_success_bulk, 1);
> +		RTE_MEMPOOL_CACHE_STAT_ADD(cache, get_success_objs, n);
>
>  		return 0;
>  	}
> @@ -1494,8 +1536,8 @@ rte_mempool_do_generic_get(struct rte_mempool *mp, void **obj_table,
>
>  	cache->len = cache->size;
>
> -	RTE_MEMPOOL_STAT_ADD(mp, get_success_bulk, 1);
> -	RTE_MEMPOOL_STAT_ADD(mp, get_success_objs, n);
> +	RTE_MEMPOOL_CACHE_STAT_ADD(cache, get_success_bulk, 1);
> +	RTE_MEMPOOL_CACHE_STAT_ADD(cache, get_success_objs, n);
>
>  	return 0;
>
> @@ -1517,8 +1559,13 @@ rte_mempool_do_generic_get(struct rte_mempool *mp, void **obj_table,
>  		RTE_MEMPOOL_STAT_ADD(mp, get_fail_bulk, 1);
>  		RTE_MEMPOOL_STAT_ADD(mp, get_fail_objs, n);
>  	} else {
> -		RTE_MEMPOOL_STAT_ADD(mp, get_success_bulk, 1);
> -		RTE_MEMPOOL_STAT_ADD(mp, get_success_objs, n);
> +		if (likely(cache != NULL)) {
> +			RTE_MEMPOOL_CACHE_STAT_ADD(cache, get_success_bulk, 1);
> +			RTE_MEMPOOL_CACHE_STAT_ADD(cache, get_success_objs, n);
> +		} else {
> +			RTE_MEMPOOL_STAT_ADD(mp, get_success_bulk, 1);
> +			RTE_MEMPOOL_STAT_ADD(mp, get_success_objs, n);
> +		}
>  	}
>
>  	return ret;
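To make sure I read the reworked put path right, here it is paraphrased 
(the function name is mine, the cache-fill details are elided, and DPDK 
headers are assumed for the types and macros):

/* Paraphrase of rte_mempool_do_generic_put() after this patch, reduced
 * to its stats-related control flow: cache-local stats are bumped once,
 * up front, and pool-wide stats only on the no-cache path. */
static inline void
put_flow_sketch(struct rte_mempool *mp, void * const *obj_table,
		unsigned int n, struct rte_mempool_cache *cache)
{
	if (unlikely(cache == NULL))
		goto driver_enqueue;

	/* Cache in use: count against the cache-local stats. */
	RTE_MEMPOOL_CACHE_STAT_ADD(cache, put_bulk, 1);
	RTE_MEMPOOL_CACHE_STAT_ADD(cache, put_objs, n);

	if (unlikely(n > cache->flushthresh))
		goto driver_enqueue_stats_incremented;

	/* ... copy the objects into the cache, flush on overflow ... */
	return;

driver_enqueue:
	/* No cache: count against the pool-wide stats instead. */
	RTE_MEMPOOL_STAT_ADD(mp, put_bulk, 1);
	RTE_MEMPOOL_STAT_ADD(mp, put_objs, n);

driver_enqueue_stats_incremented:
	rte_mempool_ops_enqueue_bulk(mp, obj_table, n);
}

If that is correct, each put is counted exactly once, either in the 
cache-local or in the pool-wide counters, never both.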