From: Olivier Matz <olivier.matz@6wind.com>
To: "Morten Brørup" <mb@smartsharesystems.com>
Cc: andrew.rybchenko@oktetlabs.ru, thomas@monjalon.net,
bruce.richardson@intel.com, jerinjacobk@gmail.com, dev@dpdk.org
Subject: Re: [PATCH] mempool: test performance with constant n
Date: Mon, 24 Jan 2022 11:26:00 +0100 [thread overview]
Message-ID: <Ye5+uJDd/75qq/7R@platinum> (raw)
In-Reply-To: <20220119113732.40167-1-mb@smartsharesystems.com>
Hi Morten,
Thank you for enhancing the mempool test. Please see some comments
below.
On Wed, Jan 19, 2022 at 12:37:32PM +0100, Morten Brørup wrote:
> "What gets measured gets done."
>
> This patch adds mempool performance tests where the number of objects to
> put and get is constant at compile time, which may significantly improve
> the performance of these functions. [*]
>
> Also, it is ensured that the array holding the object used for testing
> is cache line aligned, for maximum performance.
>
> And finally, the following entries are added to the list of tests:
> - Number of kept objects: 512
> - Number of objects to get and to put: The number of pointers fitting
> into a cache line, i.e. 8 or 16
>
> [*] Some example performance test (with cache) results:
>
> get_bulk=4 put_bulk=4 keep=128 constant_n=false rate_persec=280480972
> get_bulk=4 put_bulk=4 keep=128 constant_n=true rate_persec=622159462
>
> get_bulk=8 put_bulk=8 keep=128 constant_n=false rate_persec=477967155
> get_bulk=8 put_bulk=8 keep=128 constant_n=true rate_persec=917582643
>
> get_bulk=32 put_bulk=32 keep=32 constant_n=false rate_persec=871248691
> get_bulk=32 put_bulk=32 keep=32 constant_n=true rate_persec=1134021836
>
> Signed-off-by: Morten Brørup <mb@smartsharesystems.com>
> ---
> app/test/test_mempool_perf.c | 120 +++++++++++++++++++++--------------
> 1 file changed, 74 insertions(+), 46 deletions(-)
>
> diff --git a/app/test/test_mempool_perf.c b/app/test/test_mempool_perf.c
> index 87ad251367..ffefe934d5 100644
> --- a/app/test/test_mempool_perf.c
> +++ b/app/test/test_mempool_perf.c
> @@ -1,5 +1,6 @@
> /* SPDX-License-Identifier: BSD-3-Clause
> * Copyright(c) 2010-2014 Intel Corporation
> + * Copyright(c) 2022 SmartShare Systems
> */
>
> #include <string.h>
> @@ -55,19 +56,24 @@
> *
> * - Bulk get from 1 to 32
> * - Bulk put from 1 to 32
> + * - Bulk get and put from 1 to 32, compile time constant
> *
> * - Number of kept objects (*n_keep*)
> *
> * - 32
> * - 128
> + * - 512
> */
>
> #define N 65536
> #define TIME_S 5
> #define MEMPOOL_ELT_SIZE 2048
> -#define MAX_KEEP 128
> +#define MAX_KEEP 512
> #define MEMPOOL_SIZE ((rte_lcore_count()*(MAX_KEEP+RTE_MEMPOOL_CACHE_MAX_SIZE))-1)
>
> +/* Number of pointers fitting into one cache line. */
> +#define CACHE_LINE_BURST (RTE_CACHE_LINE_SIZE/sizeof(uintptr_t))
> +
nit: I think it's better to follow the coding rules and add a space around the
'/', even if I can see the line right above does not follow this convention
> #define LOG_ERR() printf("test failed at %s():%d\n", __func__, __LINE__)
> #define RET_ERR() do { \
> LOG_ERR(); \
> @@ -80,16 +86,16 @@
> } while (0)
>
> static int use_external_cache;
> -static unsigned external_cache_size = RTE_MEMPOOL_CACHE_MAX_SIZE;
> +static unsigned int external_cache_size = RTE_MEMPOOL_CACHE_MAX_SIZE;
>
> static uint32_t synchro;
>
> /* number of objects in one bulk operation (get or put) */
> -static unsigned n_get_bulk;
> -static unsigned n_put_bulk;
> +static int n_get_bulk;
> +static int n_put_bulk;
>
> /* number of objects retrieved from mempool before putting them back */
> -static unsigned n_keep;
> +static int n_keep;
>
> /* number of enqueues / dequeues */
> struct mempool_test_stats {
> @@ -104,20 +110,43 @@ static struct mempool_test_stats stats[RTE_MAX_LCORE];
> */
> static void
> my_obj_init(struct rte_mempool *mp, __rte_unused void *arg,
> - void *obj, unsigned i)
> + void *obj, unsigned int i)
> {
> uint32_t *objnum = obj;
> memset(obj, 0, mp->elt_size);
> *objnum = i;
> }
>
> +#define test_loop(x_keep, x_get_bulk, x_put_bulk) \
> + for (i = 0; likely(i < (N/x_keep)); i++) { \
> + /* get x_keep objects by bulk of x_get_bulk */ \
> + for (idx = 0; idx < x_keep; idx += x_get_bulk) {\
> + ret = rte_mempool_generic_get(mp, \
> + &obj_table[idx], \
> + x_get_bulk, \
> + cache); \
> + if (unlikely(ret < 0)) { \
> + rte_mempool_dump(stdout, mp); \
> + GOTO_ERR(ret, out); \
> + } \
> + } \
> + \
> + /* put the objects back by bulk of x_put_bulk */\
> + for (idx = 0; idx < x_keep; idx += x_put_bulk) {\
> + rte_mempool_generic_put(mp, \
> + &obj_table[idx], \
> + x_put_bulk, \
> + cache); \
> + } \
> + }
> +
I think a static __rte_always_inline function would do the job and is
clearer. Something like this:
static __rte_always_inline int
test_loop(struct rte_mempool *mp, struct rte_mempool_cache *cache,
unsigned int x_keep, unsigned int x_get_bulk, unsigned int x_put_bulk)
{
void *obj_table[MAX_KEEP] __rte_cache_aligned;
unsigned int idx;
unsigned int i;
int ret;
for (i = 0; likely(i < (N / x_keep)); i++) {
/* get x_keep objects by bulk of x_get_bulk */
for (idx = 0; idx < x_keep; idx += x_get_bulk) {
ret = rte_mempool_generic_get(mp,
&obj_table[idx],
x_get_bulk,
cache);
if (unlikely(ret < 0)) {
rte_mempool_dump(stdout, mp);
return ret;
}
}
/* put the objects back by bulk of x_put_bulk */
for (idx = 0; idx < x_keep; idx += x_put_bulk) {
rte_mempool_generic_put(mp,
&obj_table[idx],
x_put_bulk,
cache);
}
}
return 0;
}
> static int
> per_lcore_mempool_test(void *arg)
> {
> - void *obj_table[MAX_KEEP];
> - unsigned i, idx;
> + void *obj_table[MAX_KEEP] __rte_cache_aligned;
> + int i, idx;
> struct rte_mempool *mp = arg;
> - unsigned lcore_id = rte_lcore_id();
> + unsigned int lcore_id = rte_lcore_id();
> int ret = 0;
> uint64_t start_cycles, end_cycles;
> uint64_t time_diff = 0, hz = rte_get_timer_hz();
> @@ -139,6 +168,9 @@ per_lcore_mempool_test(void *arg)
> GOTO_ERR(ret, out);
> if (((n_keep / n_put_bulk) * n_put_bulk) != n_keep)
> GOTO_ERR(ret, out);
> + /* for constant n, n_get_bulk and n_put_bulk must be the same */
> + if (n_get_bulk < 0 && n_put_bulk != n_get_bulk)
> + GOTO_ERR(ret, out);
>
> stats[lcore_id].enq_count = 0;
>
> @@ -149,30 +181,18 @@ per_lcore_mempool_test(void *arg)
> start_cycles = rte_get_timer_cycles();
>
> while (time_diff/hz < TIME_S) {
> - for (i = 0; likely(i < (N/n_keep)); i++) {
> - /* get n_keep objects by bulk of n_bulk */
> - idx = 0;
> - while (idx < n_keep) {
> - ret = rte_mempool_generic_get(mp,
> - &obj_table[idx],
> - n_get_bulk,
> - cache);
> - if (unlikely(ret < 0)) {
> - rte_mempool_dump(stdout, mp);
> - /* in this case, objects are lost... */
> - GOTO_ERR(ret, out);
> - }
> - idx += n_get_bulk;
> - }
> -
> - /* put the objects back */
> - idx = 0;
> - while (idx < n_keep) {
> - rte_mempool_generic_put(mp, &obj_table[idx],
> - n_put_bulk,
> - cache);
> - idx += n_put_bulk;
> - }
> + if (n_get_bulk > 0) {
> + test_loop(n_keep, n_get_bulk, n_put_bulk);
> + } else if (n_get_bulk == -1) {
> + test_loop(-n_keep, 1, 1);
> + } else if (n_get_bulk == -4) {
> + test_loop(-n_keep, 4, 4);
> + } else if (n_get_bulk == -(int)CACHE_LINE_BURST) {
> + test_loop(-n_keep, CACHE_LINE_BURST, CACHE_LINE_BURST);
> + } else if (n_get_bulk == -32) {
> + test_loop(-n_keep, 32, 32);
> + } else {
> + GOTO_ERR(ret, out);
> }
> end_cycles = rte_get_timer_cycles();
> time_diff = end_cycles - start_cycles;
I'm not convinced that having negative values to mean "constant" is
clear. I'd prefer to have another global variable "use_constant_values",
which would give something like this:
if (!use_constant_values)
ret = test_loop(mp, cache, n_keep, n_get_bulk, n_put_bulk);
else if (n_get_bulk == 1)
ret = test_loop(mp, cache, n_keep, 1, 1);
else if (n_get_bulk == 4)
ret = test_loop(mp, cache, n_keep, 4, 4);
else if (n_get_bulk == CACHE_LINE_BURST)
ret = test_loop(mp, cache, n_keep,
CACHE_LINE_BURST, CACHE_LINE_BURST);
else if (n_get_bulk == 32)
ret = test_loop(mp, cache, n_keep, 32, 32);
else
ret = -1;
if (ret < 0)
GOTO_ERR(ret, out);
> @@ -192,10 +212,10 @@ per_lcore_mempool_test(void *arg)
> static int
> launch_cores(struct rte_mempool *mp, unsigned int cores)
> {
> - unsigned lcore_id;
> + unsigned int lcore_id;
> uint64_t rate;
> int ret;
> - unsigned cores_save = cores;
> + unsigned int cores_save = cores;
>
> __atomic_store_n(&synchro, 0, __ATOMIC_RELAXED);
>
> @@ -203,10 +223,14 @@ launch_cores(struct rte_mempool *mp, unsigned int cores)
> memset(stats, 0, sizeof(stats));
>
> printf("mempool_autotest cache=%u cores=%u n_get_bulk=%u "
> - "n_put_bulk=%u n_keep=%u ",
> + "n_put_bulk=%u n_keep=%u constant_n=%s ",
> use_external_cache ?
> - external_cache_size : (unsigned) mp->cache_size,
> - cores, n_get_bulk, n_put_bulk, n_keep);
> + external_cache_size : (unsigned int) mp->cache_size,
> + cores,
> + n_get_bulk > 0 ? n_get_bulk : -n_get_bulk,
> + n_put_bulk > 0 ? n_put_bulk : -n_put_bulk,
> + n_keep > 0 ? n_keep : -n_keep,
> + n_get_bulk > 0 ? "false" : "true");
>
This would become much simpler with this new variable.
> if (rte_mempool_avail_count(mp) != MEMPOOL_SIZE) {
> printf("mempool is not full\n");
> @@ -253,12 +277,13 @@ launch_cores(struct rte_mempool *mp, unsigned int cores)
> static int
> do_one_mempool_test(struct rte_mempool *mp, unsigned int cores)
> {
> - unsigned bulk_tab_get[] = { 1, 4, 32, 0 };
> - unsigned bulk_tab_put[] = { 1, 4, 32, 0 };
> - unsigned keep_tab[] = { 32, 128, 0 };
> - unsigned *get_bulk_ptr;
> - unsigned *put_bulk_ptr;
> - unsigned *keep_ptr;
> + /* Negative n_get_bulk values represent constants in the test. */
> + int bulk_tab_get[] = { 1, 4, CACHE_LINE_BURST, 32, -1, -4, -(int)CACHE_LINE_BURST, -32, 0 };
> + int bulk_tab_put[] = { 1, 4, CACHE_LINE_BURST, 32, 0 };
> + int keep_tab[] = { 32, 128, 512, 0 };
> + int *get_bulk_ptr;
> + int *put_bulk_ptr;
> + int *keep_ptr;
> int ret;
>
Same here, changes would be minimal.
> for (get_bulk_ptr = bulk_tab_get; *get_bulk_ptr; get_bulk_ptr++) {
> @@ -266,13 +291,16 @@ do_one_mempool_test(struct rte_mempool *mp, unsigned int cores)
> for (keep_ptr = keep_tab; *keep_ptr; keep_ptr++) {
>
> n_get_bulk = *get_bulk_ptr;
> - n_put_bulk = *put_bulk_ptr;
> - n_keep = *keep_ptr;
> + n_put_bulk = n_get_bulk > 0 ? *put_bulk_ptr : n_get_bulk;
> + n_keep = n_get_bulk > 0 ? *keep_ptr : -*keep_ptr;
> ret = launch_cores(mp, cores);
>
> if (ret < 0)
> return -1;
No change would be required above (except use_constant_values = 0).
And below, we could simply add instead:
/* replay test with constant values */
if (n_get_bulk == n_put_bulk) {
use_constant_values = 1;
ret = launch_cores(mp, cores);
if (ret < 0)
return -1;
}
If you are ok with the proposed changes, I can directly submit a v2,
since I already made them locally.
Thanks,
Olivier
next prev parent reply other threads:[~2022-01-24 10:26 UTC|newest]
Thread overview: 9+ messages / expand[flat|nested] mbox.gz Atom feed top
2022-01-19 11:37 Morten Brørup
2022-01-24 10:26 ` Olivier Matz [this message]
2022-01-24 10:37 ` Morten Brørup
2022-01-24 14:53 ` Olivier Matz
2022-01-24 14:57 ` Olivier Matz
2022-01-24 14:59 ` [PATCH v2] " Olivier Matz
2022-01-24 17:20 ` Morten Brørup
2022-01-25 12:56 ` Olivier Matz
2022-02-02 22:39 ` Thomas Monjalon
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=Ye5+uJDd/75qq/7R@platinum \
--to=olivier.matz@6wind.com \
--cc=andrew.rybchenko@oktetlabs.ru \
--cc=bruce.richardson@intel.com \
--cc=dev@dpdk.org \
--cc=jerinjacobk@gmail.com \
--cc=mb@smartsharesystems.com \
--cc=thomas@monjalon.net \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).