Date: Mon, 24 Jan 2022 11:26:00 +0100
From: Olivier Matz
To: Morten Brørup
Cc: andrew.rybchenko@oktetlabs.ru, thomas@monjalon.net,
	bruce.richardson@intel.com, jerinjacobk@gmail.com, dev@dpdk.org
Subject: Re: [PATCH] mempool: test performance with constant n
References: <20220119113732.40167-1-mb@smartsharesystems.com>
In-Reply-To: <20220119113732.40167-1-mb@smartsharesystems.com>

Hi Morten,

Thank you for enhancing the mempool test. Please see some comments
below.

On Wed, Jan 19, 2022 at 12:37:32PM +0100, Morten Brørup wrote:
> "What gets measured gets done."
>
> This patch adds mempool performance tests where the number of objects to
> put and get is constant at compile time, which may significantly improve
> the performance of these functions. [*]
>
> Also, it is ensured that the array holding the objects used for testing
> is cache line aligned, for maximum performance.
>
> And finally, the following entries are added to the list of tests:
> - Number of kept objects: 512
> - Number of objects to get and to put: The number of pointers fitting
>   into a cache line, i.e. 8 or 16
>
> [*] Some example performance test (with cache) results:
>
> get_bulk=4 put_bulk=4 keep=128 constant_n=false rate_persec=280480972
> get_bulk=4 put_bulk=4 keep=128 constant_n=true rate_persec=622159462
>
> get_bulk=8 put_bulk=8 keep=128 constant_n=false rate_persec=477967155
> get_bulk=8 put_bulk=8 keep=128 constant_n=true rate_persec=917582643
>
> get_bulk=32 put_bulk=32 keep=32 constant_n=false rate_persec=871248691
> get_bulk=32 put_bulk=32 keep=32 constant_n=true rate_persec=1134021836
>
> Signed-off-by: Morten Brørup
> ---
>  app/test/test_mempool_perf.c | 120 +++++++++++++++++++++--------------
>  1 file changed, 74 insertions(+), 46 deletions(-)
>
> diff --git a/app/test/test_mempool_perf.c b/app/test/test_mempool_perf.c
> index 87ad251367..ffefe934d5 100644
> --- a/app/test/test_mempool_perf.c
> +++ b/app/test/test_mempool_perf.c
> @@ -1,5 +1,6 @@
>  /* SPDX-License-Identifier: BSD-3-Clause
>   * Copyright(c) 2010-2014 Intel Corporation
> + * Copyright(c) 2022 SmartShare Systems
>   */
>
>  #include
> @@ -55,19 +56,24 @@
>   *
>   *    - Bulk get from 1 to 32
>   *    - Bulk put from 1 to 32
> + *    - Bulk get and put from 1 to 32, compile time constant
>   *
>   *  - Number of kept objects (*n_keep*)
>   *
>   *    - 32
>   *    - 128
> + *    - 512
>   */
>
>  #define N 65536
>  #define TIME_S 5
>  #define MEMPOOL_ELT_SIZE 2048
> -#define MAX_KEEP 128
> +#define MAX_KEEP 512
>  #define MEMPOOL_SIZE ((rte_lcore_count()*(MAX_KEEP+RTE_MEMPOOL_CACHE_MAX_SIZE))-1)
>
> +/* Number of pointers fitting into one cache line. */
> +#define CACHE_LINE_BURST (RTE_CACHE_LINE_SIZE/sizeof(uintptr_t))
> +

nit: I think it's better to follow the coding rules and add a space
around the '/', even if I can see the line right above does not follow
this convention

>  #define LOG_ERR() printf("test failed at %s():%d\n", __func__, __LINE__)
>  #define RET_ERR() do { \
>  		LOG_ERR(); \
> @@ -80,16 +86,16 @@
>  	} while (0)
>
>  static int use_external_cache;
> -static unsigned external_cache_size = RTE_MEMPOOL_CACHE_MAX_SIZE;
> +static unsigned int external_cache_size = RTE_MEMPOOL_CACHE_MAX_SIZE;
>
>  static uint32_t synchro;
>
>  /* number of objects in one bulk operation (get or put) */
> -static unsigned n_get_bulk;
> -static unsigned n_put_bulk;
> +static int n_get_bulk;
> +static int n_put_bulk;
>
>  /* number of objects retrieved from mempool before putting them back */
> -static unsigned n_keep;
> +static int n_keep;
>
>  /* number of enqueues / dequeues */
>  struct mempool_test_stats {
> @@ -104,20 +110,43 @@ static struct mempool_test_stats stats[RTE_MAX_LCORE];
>   */
>  static void
>  my_obj_init(struct rte_mempool *mp, __rte_unused void *arg,
> -	    void *obj, unsigned i)
> +	    void *obj, unsigned int i)
>  {
>  	uint32_t *objnum = obj;
>  	memset(obj, 0, mp->elt_size);
>  	*objnum = i;
>  }
>
> +#define test_loop(x_keep, x_get_bulk, x_put_bulk) \
> +	for (i = 0; likely(i < (N/x_keep)); i++) {		\
> +		/* get x_keep objects by bulk of x_get_bulk */	\
> +		for (idx = 0; idx < x_keep; idx += x_get_bulk) {\
> +			ret = rte_mempool_generic_get(mp,	\
> +					&obj_table[idx],	\
> +					x_get_bulk,		\
> +					cache);			\
> +			if (unlikely(ret < 0)) {		\
> +				rte_mempool_dump(stdout, mp);	\
> +				GOTO_ERR(ret, out);		\
> +			}					\
> +		}						\
> +								\
> +		/* put the objects back by bulk of x_put_bulk */\
> +		for (idx = 0; idx < x_keep; idx += x_put_bulk) {\
> +			rte_mempool_generic_put(mp,		\
> +					&obj_table[idx],	\
> +					x_put_bulk,		\
> +					cache);			\
> +		}						\
> +	}
> +

I think a static __rte_always_inline function would do the job and is
clearer.
Something like this:

static __rte_always_inline int
test_loop(struct rte_mempool *mp, struct rte_mempool_cache *cache,
	  unsigned int x_keep, unsigned int x_get_bulk,
	  unsigned int x_put_bulk)
{
	void *obj_table[MAX_KEEP] __rte_cache_aligned;
	unsigned int idx;
	unsigned int i;
	int ret;

	for (i = 0; likely(i < (N / x_keep)); i++) {
		/* get x_keep objects by bulk of x_get_bulk */
		for (idx = 0; idx < x_keep; idx += x_get_bulk) {
			ret = rte_mempool_generic_get(mp,
						      &obj_table[idx],
						      x_get_bulk,
						      cache);
			if (unlikely(ret < 0)) {
				rte_mempool_dump(stdout, mp);
				return ret;
			}
		}

		/* put the objects back by bulk of x_put_bulk */
		for (idx = 0; idx < x_keep; idx += x_put_bulk) {
			rte_mempool_generic_put(mp,
						&obj_table[idx],
						x_put_bulk,
						cache);
		}
	}

	return 0;
}

>  static int
>  per_lcore_mempool_test(void *arg)
>  {
> -	void *obj_table[MAX_KEEP];
> -	unsigned i, idx;
> +	void *obj_table[MAX_KEEP] __rte_cache_aligned;
> +	int i, idx;
>  	struct rte_mempool *mp = arg;
> -	unsigned lcore_id = rte_lcore_id();
> +	unsigned int lcore_id = rte_lcore_id();
>  	int ret = 0;
>  	uint64_t start_cycles, end_cycles;
>  	uint64_t time_diff = 0, hz = rte_get_timer_hz();
> @@ -139,6 +168,9 @@ per_lcore_mempool_test(void *arg)
>  		GOTO_ERR(ret, out);
>  	if (((n_keep / n_put_bulk) * n_put_bulk) != n_keep)
>  		GOTO_ERR(ret, out);
> +	/* for constant n, n_get_bulk and n_put_bulk must be the same */
> +	if (n_get_bulk < 0 && n_put_bulk != n_get_bulk)
> +		GOTO_ERR(ret, out);
>
>  	stats[lcore_id].enq_count = 0;
>
> @@ -149,30 +181,18 @@
>  	start_cycles = rte_get_timer_cycles();
>
>  	while (time_diff/hz < TIME_S) {
> -		for (i = 0; likely(i < (N/n_keep)); i++) {
> -			/* get n_keep objects by bulk of n_bulk */
> -			idx = 0;
> -			while (idx < n_keep) {
> -				ret = rte_mempool_generic_get(mp,
> -							      &obj_table[idx],
> -							      n_get_bulk,
> -							      cache);
> -				if (unlikely(ret < 0)) {
> -					rte_mempool_dump(stdout, mp);
> -					/* in this case, objects are lost... */
> -					GOTO_ERR(ret, out);
> -				}
> -				idx += n_get_bulk;
> -			}
> -
> -			/* put the objects back */
> -			idx = 0;
> -			while (idx < n_keep) {
> -				rte_mempool_generic_put(mp, &obj_table[idx],
> -							n_put_bulk,
> -							cache);
> -				idx += n_put_bulk;
> -			}
> +		if (n_get_bulk > 0) {
> +			test_loop(n_keep, n_get_bulk, n_put_bulk);
> +		} else if (n_get_bulk == -1) {
> +			test_loop(-n_keep, 1, 1);
> +		} else if (n_get_bulk == -4) {
> +			test_loop(-n_keep, 4, 4);
> +		} else if (n_get_bulk == -(int)CACHE_LINE_BURST) {
> +			test_loop(-n_keep, CACHE_LINE_BURST, CACHE_LINE_BURST);
> +		} else if (n_get_bulk == -32) {
> +			test_loop(-n_keep, 32, 32);
> +		} else {
> +			GOTO_ERR(ret, out);
>  		}
>  		end_cycles = rte_get_timer_cycles();
>  		time_diff = end_cycles - start_cycles;

I'm not convinced that having negative values to mean "constant" is
clear. I'd prefer to have another global variable "use_constant_values",
which would give something like this:

	if (!use_constant_values)
		ret = test_loop(mp, cache, n_keep, n_get_bulk, n_put_bulk);
	else if (n_get_bulk == 1)
		ret = test_loop(mp, cache, n_keep, 1, 1);
	else if (n_get_bulk == 4)
		ret = test_loop(mp, cache, n_keep, 4, 4);
	else if (n_get_bulk == CACHE_LINE_BURST)
		ret = test_loop(mp, cache, n_keep, CACHE_LINE_BURST,
				CACHE_LINE_BURST);
	else if (n_get_bulk == 32)
		ret = test_loop(mp, cache, n_keep, 32, 32);
	else
		ret = -1;

	if (ret < 0)
		GOTO_ERR(ret, out);

> @@ -192,10 +212,10 @@ per_lcore_mempool_test(void *arg)
>  static int
>  launch_cores(struct rte_mempool *mp, unsigned int cores)
>  {
> -	unsigned lcore_id;
> +	unsigned int lcore_id;
>  	uint64_t rate;
>  	int ret;
> -	unsigned cores_save = cores;
> +	unsigned int cores_save = cores;
>
>  	__atomic_store_n(&synchro, 0, __ATOMIC_RELAXED);
>
> @@ -203,10 +223,14 @@ launch_cores(struct rte_mempool *mp, unsigned int cores)
>  	memset(stats, 0, sizeof(stats));
>
>  	printf("mempool_autotest cache=%u cores=%u n_get_bulk=%u "
> -	       "n_put_bulk=%u n_keep=%u ",
> +	       "n_put_bulk=%u n_keep=%u constant_n=%s ",
>  	       use_external_cache ?
> -	       external_cache_size : (unsigned) mp->cache_size,
> -	       cores, n_get_bulk, n_put_bulk, n_keep);
> +	       external_cache_size : (unsigned int) mp->cache_size,
> +	       cores,
> +	       n_get_bulk > 0 ? n_get_bulk : -n_get_bulk,
> +	       n_put_bulk > 0 ? n_put_bulk : -n_put_bulk,
> +	       n_keep > 0 ? n_keep : -n_keep,
> +	       n_get_bulk > 0 ? "false" : "true");

This would become much simpler with this new variable.

>  	if (rte_mempool_avail_count(mp) != MEMPOOL_SIZE) {
>  		printf("mempool is not full\n");
> @@ -253,12 +277,13 @@ launch_cores(struct rte_mempool *mp, unsigned int cores)
>  static int
>  do_one_mempool_test(struct rte_mempool *mp, unsigned int cores)
>  {
> -	unsigned bulk_tab_get[] = { 1, 4, 32, 0 };
> -	unsigned bulk_tab_put[] = { 1, 4, 32, 0 };
> -	unsigned keep_tab[] = { 32, 128, 0 };
> -	unsigned *get_bulk_ptr;
> -	unsigned *put_bulk_ptr;
> -	unsigned *keep_ptr;
> +	/* Negative n_get_bulk values represent constants in the test. */
> +	int bulk_tab_get[] = { 1, 4, CACHE_LINE_BURST, 32, -1, -4, -(int)CACHE_LINE_BURST, -32, 0 };
> +	int bulk_tab_put[] = { 1, 4, CACHE_LINE_BURST, 32, 0 };
> +	int keep_tab[] = { 32, 128, 512, 0 };
> +	int *get_bulk_ptr;
> +	int *put_bulk_ptr;
> +	int *keep_ptr;
>  	int ret;

Same here, changes would be minimal.

>  	for (get_bulk_ptr = bulk_tab_get; *get_bulk_ptr; get_bulk_ptr++) {
> @@ -266,13 +291,16 @@ do_one_mempool_test(struct rte_mempool *mp, unsigned int cores)
>  		for (keep_ptr = keep_tab; *keep_ptr; keep_ptr++) {
>
>  			n_get_bulk = *get_bulk_ptr;
> -			n_put_bulk = *put_bulk_ptr;
> -			n_keep = *keep_ptr;
> +			n_put_bulk = n_get_bulk > 0 ? *put_bulk_ptr : n_get_bulk;
> +			n_keep = n_get_bulk > 0 ? *keep_ptr : -*keep_ptr;
>  			ret = launch_cores(mp, cores);
>
>  			if (ret < 0)
>  				return -1;

No change would be required above (except use_constant_values = 0).
And below, we could simply add instead:

	/* replay test with constant values */
	if (n_get_bulk == n_put_bulk) {
		use_constant_values = 1;
		ret = launch_cores(mp, cores);
		if (ret < 0)
			return -1;
	}

If you are ok with the proposed changes, I can directly submit a v2,
since I already made them locally.

Thanks,
Olivier