* Re: [PATCH] mempool: test performance with larger bursts
2024-01-21 4:52 [PATCH] mempool: test performance with larger bursts Morten Brørup
@ 2024-01-22 7:10 ` fengchengwen
2024-01-22 14:34 ` [PATCH v2] " Morten Brørup
` (7 subsequent siblings)
8 siblings, 0 replies; 28+ messages in thread
From: fengchengwen @ 2024-01-22 7:10 UTC
To: Morten Brørup, andrew.rybchenko; +Cc: dev
Hi Morten,
On 2024/1/21 12:52, Morten Brørup wrote:
> Bursts of up to 128 packets are not uncommon, so increase the maximum
> tested get and put burst sizes from 32 to 128.
How about adding 64?
>
> Some applications keep more than 512 objects, so increase the maximum
> number of kept objects from 512 to 4096.
> This exceeds the typical mempool cache size of 512 objects, so the test
> also exercises the mempool driver.
And how about 2048? (I notice that 1024 is already added below.)
PS: With this commit, the number of combinations grows considerably, and every
subtest takes 5 seconds, so the total run time will increase greatly. Could this
perf suite support parameters or derivative commands? For instance:
REGISTER_PERF_TEST(mempool_perf_autotest, test_mempool_perf);
REGISTER_PERF_TEST(mempool_perf_autotest_keeps256, test_mempool_perf_keeps256);
Thanks.
>
> Signed-off-by: Morten Brørup <mb@smartsharesystems.com>
> ---
> app/test/test_mempool_perf.c | 25 ++++++++++++++++---------
> 1 file changed, 16 insertions(+), 9 deletions(-)
>
> diff --git a/app/test/test_mempool_perf.c b/app/test/test_mempool_perf.c
> index 96de347f04..f52106e833 100644
> --- a/app/test/test_mempool_perf.c
> +++ b/app/test/test_mempool_perf.c
> @@ -1,6 +1,6 @@
> /* SPDX-License-Identifier: BSD-3-Clause
> * Copyright(c) 2010-2014 Intel Corporation
> - * Copyright(c) 2022 SmartShare Systems
> + * Copyright(c) 2022-2024 SmartShare Systems
> */
>
> #include <string.h>
> @@ -54,22 +54,24 @@
> *
> * - Bulk size (*n_get_bulk*, *n_put_bulk*)
> *
> - * - Bulk get from 1 to 32
> - * - Bulk put from 1 to 32
> - * - Bulk get and put from 1 to 32, compile time constant
> + * - Bulk get from 1 to 128
> + * - Bulk put from 1 to 128
> + * - Bulk get and put from 1 to 128, compile time constant
> *
> * - Number of kept objects (*n_keep*)
> *
> * - 32
> * - 128
> * - 512
> + * - 1024
> + * - 4096
> */
>
> #define N 65536
> #define TIME_S 5
> #define MEMPOOL_ELT_SIZE 2048
> -#define MAX_KEEP 512
> -#define MEMPOOL_SIZE ((rte_lcore_count()*(MAX_KEEP+RTE_MEMPOOL_CACHE_MAX_SIZE))-1)
> +#define MAX_KEEP 4096
> +#define MEMPOOL_SIZE ((rte_lcore_count()*(MAX_KEEP+RTE_MEMPOOL_CACHE_MAX_SIZE*2))-1)
>
> /* Number of pointers fitting into one cache line. */
> #define CACHE_LINE_BURST (RTE_CACHE_LINE_SIZE / sizeof(uintptr_t))
> @@ -204,6 +206,8 @@ per_lcore_mempool_test(void *arg)
> CACHE_LINE_BURST, CACHE_LINE_BURST);
> else if (n_get_bulk == 32)
> ret = test_loop(mp, cache, n_keep, 32, 32);
> + else if (n_get_bulk == 128)
> + ret = test_loop(mp, cache, n_keep, 128, 128);
> else
> ret = -1;
>
> @@ -289,9 +293,9 @@ launch_cores(struct rte_mempool *mp, unsigned int cores)
> static int
> do_one_mempool_test(struct rte_mempool *mp, unsigned int cores)
> {
> - unsigned int bulk_tab_get[] = { 1, 4, CACHE_LINE_BURST, 32, 0 };
> - unsigned int bulk_tab_put[] = { 1, 4, CACHE_LINE_BURST, 32, 0 };
> - unsigned int keep_tab[] = { 32, 128, 512, 0 };
> + unsigned int bulk_tab_get[] = { 1, 4, CACHE_LINE_BURST, 32, 128, 0 };
> + unsigned int bulk_tab_put[] = { 1, 4, CACHE_LINE_BURST, 32, 128, 0 };
> + unsigned int keep_tab[] = { 32, 128, 512, 1024, 4096, 0 };
> unsigned *get_bulk_ptr;
> unsigned *put_bulk_ptr;
> unsigned *keep_ptr;
> @@ -301,6 +305,9 @@ do_one_mempool_test(struct rte_mempool *mp, unsigned int cores)
> for (put_bulk_ptr = bulk_tab_put; *put_bulk_ptr; put_bulk_ptr++) {
> for (keep_ptr = keep_tab; *keep_ptr; keep_ptr++) {
>
> + if (*keep_ptr < *get_bulk_ptr || *keep_ptr < *put_bulk_ptr)
> + continue;
> +
> use_constant_values = 0;
> n_get_bulk = *get_bulk_ptr;
> n_put_bulk = *put_bulk_ptr;
>
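To put a number on the run-time concern raised above, the loop nest and skip rule from this patch can be replayed in isolation. This is only a sketch: count_combinations() is a hypothetical helper, the tables are copied from the diff, and CACHE_LINE_BURST is assumed to be 8 (64-byte cache line, 8-byte pointers). It does not count the extra constant-value replays the real test may add.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Count the (get, put, keep) combinations that survive the skip rule from
 * the patch: a combination is skipped when n_keep is smaller than either
 * burst size. Tables are zero-terminated, as in do_one_mempool_test().
 * Hypothetical helper, not part of test_mempool_perf.c.
 */
static unsigned int
count_combinations(const unsigned int *bulk_tab, const unsigned int *keep_tab)
{
	const unsigned int *get, *put, *keep;
	unsigned int runs = 0;

	assert(bulk_tab != NULL && keep_tab != NULL);

	for (get = bulk_tab; *get; get++)
		for (put = bulk_tab; *put; put++)
			for (keep = keep_tab; *keep; keep++) {
				if (*keep < *get || *keep < *put)
					continue;
				runs++;
			}
	return runs;
}

/* Assumed value: 64-byte cache line divided by 8-byte pointer. */
#define CACHE_LINE_BURST 8

/* Tables before the patch. */
static const unsigned int bulk_old[] = { 1, 4, CACHE_LINE_BURST, 32, 0 };
static const unsigned int keep_old[] = { 32, 128, 512, 0 };

/* Tables from this (v1) patch. */
static const unsigned int bulk_v1[] = { 1, 4, CACHE_LINE_BURST, 32, 128, 0 };
static const unsigned int keep_v1[] = { 32, 128, 512, 1024, 4096, 0 };
```

Under these assumptions the count goes from 48 to 116 combinations per pool and core count, so at roughly 5 seconds each (TIME_S) one pass grows from about 240 to about 580 seconds.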
* [PATCH v2] mempool: test performance with larger bursts
2024-01-21 4:52 [PATCH] mempool: test performance with larger bursts Morten Brørup
2024-01-22 7:10 ` fengchengwen
@ 2024-01-22 14:34 ` Morten Brørup
2024-01-24 2:41 ` fengchengwen
2024-01-24 8:58 ` [PATCH v3] " Morten Brørup
` (6 subsequent siblings)
8 siblings, 1 reply; 28+ messages in thread
From: Morten Brørup @ 2024-01-22 14:34 UTC
To: andrew.rybchenko, fengchengwen; +Cc: dev, Morten Brørup
Bursts of up to 64 or 128 packets are not uncommon, so increase the
maximum tested get and put burst sizes from 32 to 128.
Some applications keep more than 512 objects, so increase the maximum
number of kept objects from 512 to 8192, still in jumps of factor four.
This exceeds the typical mempool cache size of 512 objects, so the test
also exercises the mempool driver.
Signed-off-by: Morten Brørup <mb@smartsharesystems.com>
---
v2: Addressed feedback by Chengwen Feng
* Added get and put burst sizes of 64 packets, which is probably also not
uncommon.
* Fixed list of number of kept objects so list remains in jumps of factor
four.
* Added three derivative test cases, for faster testing.
---
app/test/test_mempool_perf.c | 107 ++++++++++++++++++++---------------
1 file changed, 62 insertions(+), 45 deletions(-)
diff --git a/app/test/test_mempool_perf.c b/app/test/test_mempool_perf.c
index 96de347f04..a5a7d43608 100644
--- a/app/test/test_mempool_perf.c
+++ b/app/test/test_mempool_perf.c
@@ -1,6 +1,6 @@
/* SPDX-License-Identifier: BSD-3-Clause
* Copyright(c) 2010-2014 Intel Corporation
- * Copyright(c) 2022 SmartShare Systems
+ * Copyright(c) 2022-2024 SmartShare Systems
*/
#include <string.h>
@@ -54,22 +54,24 @@
*
* - Bulk size (*n_get_bulk*, *n_put_bulk*)
*
- * - Bulk get from 1 to 32
- * - Bulk put from 1 to 32
- * - Bulk get and put from 1 to 32, compile time constant
+ * - Bulk get from 1 to 128
+ * - Bulk put from 1 to 128
+ * - Bulk get and put from 1 to 128, compile time constant
*
* - Number of kept objects (*n_keep*)
*
* - 32
* - 128
* - 512
+ * - 2048
+ * - 8192
*/
#define N 65536
#define TIME_S 5
#define MEMPOOL_ELT_SIZE 2048
-#define MAX_KEEP 512
-#define MEMPOOL_SIZE ((rte_lcore_count()*(MAX_KEEP+RTE_MEMPOOL_CACHE_MAX_SIZE))-1)
+#define MAX_KEEP 8192
+#define MEMPOOL_SIZE ((rte_lcore_count()*(MAX_KEEP+RTE_MEMPOOL_CACHE_MAX_SIZE*2))-1)
/* Number of pointers fitting into one cache line. */
#define CACHE_LINE_BURST (RTE_CACHE_LINE_SIZE / sizeof(uintptr_t))
@@ -204,6 +206,10 @@ per_lcore_mempool_test(void *arg)
CACHE_LINE_BURST, CACHE_LINE_BURST);
else if (n_get_bulk == 32)
ret = test_loop(mp, cache, n_keep, 32, 32);
+ else if (n_get_bulk == 64)
+ ret = test_loop(mp, cache, n_keep, 64, 64);
+ else if (n_get_bulk == 128)
+ ret = test_loop(mp, cache, n_keep, 128, 128);
else
ret = -1;
@@ -289,9 +295,9 @@ launch_cores(struct rte_mempool *mp, unsigned int cores)
static int
do_one_mempool_test(struct rte_mempool *mp, unsigned int cores)
{
- unsigned int bulk_tab_get[] = { 1, 4, CACHE_LINE_BURST, 32, 0 };
- unsigned int bulk_tab_put[] = { 1, 4, CACHE_LINE_BURST, 32, 0 };
- unsigned int keep_tab[] = { 32, 128, 512, 0 };
+ unsigned int bulk_tab_get[] = { 1, 4, CACHE_LINE_BURST, 32, 64, 128, 0 };
+ unsigned int bulk_tab_put[] = { 1, 4, CACHE_LINE_BURST, 32, 64, 128, 0 };
+ unsigned int keep_tab[] = { 32, 128, 512, 2048, 8192, 0 };
unsigned *get_bulk_ptr;
unsigned *put_bulk_ptr;
unsigned *keep_ptr;
@@ -301,6 +307,9 @@ do_one_mempool_test(struct rte_mempool *mp, unsigned int cores)
for (put_bulk_ptr = bulk_tab_put; *put_bulk_ptr; put_bulk_ptr++) {
for (keep_ptr = keep_tab; *keep_ptr; keep_ptr++) {
+ if (*keep_ptr < *get_bulk_ptr || *keep_ptr < *put_bulk_ptr)
+ continue;
+
use_constant_values = 0;
n_get_bulk = *get_bulk_ptr;
n_put_bulk = *put_bulk_ptr;
@@ -323,7 +332,7 @@ do_one_mempool_test(struct rte_mempool *mp, unsigned int cores)
}
static int
-test_mempool_perf(void)
+do_all_mempool_perf_tests(unsigned int cores)
{
struct rte_mempool *mp_cache = NULL;
struct rte_mempool *mp_nocache = NULL;
@@ -376,65 +385,73 @@ test_mempool_perf(void)
rte_mempool_obj_iter(default_pool, my_obj_init, NULL);
- /* performance test with 1, 2 and max cores */
printf("start performance test (without cache)\n");
-
- if (do_one_mempool_test(mp_nocache, 1) < 0)
+ if (do_one_mempool_test(mp_nocache, cores) < 0)
goto err;
- if (do_one_mempool_test(mp_nocache, 2) < 0)
- goto err;
-
- if (do_one_mempool_test(mp_nocache, rte_lcore_count()) < 0)
- goto err;
-
- /* performance test with 1, 2 and max cores */
printf("start performance test for %s (without cache)\n",
default_pool_ops);
-
- if (do_one_mempool_test(default_pool, 1) < 0)
+ if (do_one_mempool_test(default_pool, cores) < 0)
goto err;
- if (do_one_mempool_test(default_pool, 2) < 0)
+ printf("start performance test (with cache)\n");
+ if (do_one_mempool_test(mp_cache, cores) < 0)
goto err;
- if (do_one_mempool_test(default_pool, rte_lcore_count()) < 0)
+ printf("start performance test (with user-owned cache)\n");
+ use_external_cache = 1;
+ if (do_one_mempool_test(mp_nocache, cores) < 0)
goto err;
- /* performance test with 1, 2 and max cores */
- printf("start performance test (with cache)\n");
+ rte_mempool_list_dump(stdout);
- if (do_one_mempool_test(mp_cache, 1) < 0)
- goto err;
+ ret = 0;
- if (do_one_mempool_test(mp_cache, 2) < 0)
- goto err;
+err:
+ rte_mempool_free(mp_cache);
+ rte_mempool_free(mp_nocache);
+ rte_mempool_free(default_pool);
+ return ret;
+}
- if (do_one_mempool_test(mp_cache, rte_lcore_count()) < 0)
- goto err;
+static int
+test_mempool_perf_1core(void)
+{
+ return do_all_mempool_perf_tests(1);
+}
- /* performance test with 1, 2 and max cores */
- printf("start performance test (with user-owned cache)\n");
- use_external_cache = 1;
+static int
+test_mempool_perf_2cores(void)
+{
+ return do_all_mempool_perf_tests(2);
+}
- if (do_one_mempool_test(mp_nocache, 1) < 0)
- goto err;
+static int
+test_mempool_perf_allcores(void)
+{
+ return do_all_mempool_perf_tests(rte_lcore_count());
+}
- if (do_one_mempool_test(mp_nocache, 2) < 0)
- goto err;
+static int
+test_mempool_perf(void)
+{
+ int ret = -1;
- if (do_one_mempool_test(mp_nocache, rte_lcore_count()) < 0)
+ /* performance test with 1, 2 and max cores */
+ if (do_all_mempool_perf_tests(1) < 0)
+ goto err;
+ if (do_all_mempool_perf_tests(2) < 0)
+ goto err;
+ if (do_all_mempool_perf_tests(rte_lcore_count()) < 0)
goto err;
-
- rte_mempool_list_dump(stdout);
ret = 0;
err:
- rte_mempool_free(mp_cache);
- rte_mempool_free(mp_nocache);
- rte_mempool_free(default_pool);
return ret;
}
REGISTER_PERF_TEST(mempool_perf_autotest, test_mempool_perf);
+REGISTER_PERF_TEST(mempool_perf_autotest_1core, test_mempool_perf_1core);
+REGISTER_PERF_TEST(mempool_perf_autotest_2cores, test_mempool_perf_2cores);
+REGISTER_PERF_TEST(mempool_perf_autotest_allcores, test_mempool_perf_allcores);
--
2.17.1
* Re: [PATCH v2] mempool: test performance with larger bursts
2024-01-22 14:34 ` [PATCH v2] " Morten Brørup
@ 2024-01-24 2:41 ` fengchengwen
0 siblings, 0 replies; 28+ messages in thread
From: fengchengwen @ 2024-01-24 2:41 UTC
To: Morten Brørup, andrew.rybchenko; +Cc: dev
Hi Morten,
On 2024/1/22 22:34, Morten Brørup wrote:
> Bursts of up to 64 or 128 packets are not uncommon, so increase the
> maximum tested get and put burst sizes from 32 to 128.
>
> Some applications keep more than 512 objects, so increase the maximum
> number of kept objects from 512 to 8192, still in jumps of factor four.
> This exceeds the typical mempool cache size of 512 objects, so the test
> also exercises the mempool driver.
>
> Signed-off-by: Morten Brørup <mb@smartsharesystems.com>
>
> ---
>
> v2: Addressed feedback by Chengwen Feng
> * Added get and put burst sizes of 64 packets, which is probably also not
> uncommon.
> * Fixed list of number of kept objects so list remains in jumps of factor
> four.
> * Added three derivative test cases, for faster testing.
> ---
> app/test/test_mempool_perf.c | 107 ++++++++++++++++++++---------------
> 1 file changed, 62 insertions(+), 45 deletions(-)
>
> diff --git a/app/test/test_mempool_perf.c b/app/test/test_mempool_perf.c
> index 96de347f04..a5a7d43608 100644
> --- a/app/test/test_mempool_perf.c
> +++ b/app/test/test_mempool_perf.c
> @@ -1,6 +1,6 @@
> /* SPDX-License-Identifier: BSD-3-Clause
> * Copyright(c) 2010-2014 Intel Corporation
> - * Copyright(c) 2022 SmartShare Systems
> + * Copyright(c) 2022-2024 SmartShare Systems
> */
>
> #include <string.h>
> @@ -54,22 +54,24 @@
> *
> * - Bulk size (*n_get_bulk*, *n_put_bulk*)
> *
> - * - Bulk get from 1 to 32
> - * - Bulk put from 1 to 32
> - * - Bulk get and put from 1 to 32, compile time constant
> + * - Bulk get from 1 to 128
> + * - Bulk put from 1 to 128
> + * - Bulk get and put from 1 to 128, compile time constant
> *
> * - Number of kept objects (*n_keep*)
> *
> * - 32
> * - 128
> * - 512
> + * - 2048
> + * - 8192
> */
>
> #define N 65536
> #define TIME_S 5
> #define MEMPOOL_ELT_SIZE 2048
> -#define MAX_KEEP 512
> -#define MEMPOOL_SIZE ((rte_lcore_count()*(MAX_KEEP+RTE_MEMPOOL_CACHE_MAX_SIZE))-1)
> +#define MAX_KEEP 8192
> +#define MEMPOOL_SIZE ((rte_lcore_count()*(MAX_KEEP+RTE_MEMPOOL_CACHE_MAX_SIZE*2))-1)
>
> /* Number of pointers fitting into one cache line. */
> #define CACHE_LINE_BURST (RTE_CACHE_LINE_SIZE / sizeof(uintptr_t))
> @@ -204,6 +206,10 @@ per_lcore_mempool_test(void *arg)
> CACHE_LINE_BURST, CACHE_LINE_BURST);
> else if (n_get_bulk == 32)
> ret = test_loop(mp, cache, n_keep, 32, 32);
> + else if (n_get_bulk == 64)
> + ret = test_loop(mp, cache, n_keep, 64, 64);
> + else if (n_get_bulk == 128)
> + ret = test_loop(mp, cache, n_keep, 128, 128);
> else
> ret = -1;
>
> @@ -289,9 +295,9 @@ launch_cores(struct rte_mempool *mp, unsigned int cores)
> static int
> do_one_mempool_test(struct rte_mempool *mp, unsigned int cores)
> {
> - unsigned int bulk_tab_get[] = { 1, 4, CACHE_LINE_BURST, 32, 0 };
> - unsigned int bulk_tab_put[] = { 1, 4, CACHE_LINE_BURST, 32, 0 };
> - unsigned int keep_tab[] = { 32, 128, 512, 0 };
> + unsigned int bulk_tab_get[] = { 1, 4, CACHE_LINE_BURST, 32, 64, 128, 0 };
> + unsigned int bulk_tab_put[] = { 1, 4, CACHE_LINE_BURST, 32, 64, 128, 0 };
> + unsigned int keep_tab[] = { 32, 128, 512, 2048, 8192, 0 };
> unsigned *get_bulk_ptr;
> unsigned *put_bulk_ptr;
> unsigned *keep_ptr;
> @@ -301,6 +307,9 @@ do_one_mempool_test(struct rte_mempool *mp, unsigned int cores)
> for (put_bulk_ptr = bulk_tab_put; *put_bulk_ptr; put_bulk_ptr++) {
> for (keep_ptr = keep_tab; *keep_ptr; keep_ptr++) {
>
> + if (*keep_ptr < *get_bulk_ptr || *keep_ptr < *put_bulk_ptr)
> + continue;
> +
> use_constant_values = 0;
> n_get_bulk = *get_bulk_ptr;
> n_put_bulk = *put_bulk_ptr;
> @@ -323,7 +332,7 @@ do_one_mempool_test(struct rte_mempool *mp, unsigned int cores)
> }
>
> static int
> -test_mempool_perf(void)
> +do_all_mempool_perf_tests(unsigned int cores)
> {
> struct rte_mempool *mp_cache = NULL;
> struct rte_mempool *mp_nocache = NULL;
> @@ -376,65 +385,73 @@ test_mempool_perf(void)
>
> rte_mempool_obj_iter(default_pool, my_obj_init, NULL);
>
> - /* performance test with 1, 2 and max cores */
> printf("start performance test (without cache)\n");
> -
> - if (do_one_mempool_test(mp_nocache, 1) < 0)
> + if (do_one_mempool_test(mp_nocache, cores) < 0)
> goto err;
>
> - if (do_one_mempool_test(mp_nocache, 2) < 0)
> - goto err;
> -
> - if (do_one_mempool_test(mp_nocache, rte_lcore_count()) < 0)
> - goto err;
> -
> - /* performance test with 1, 2 and max cores */
> printf("start performance test for %s (without cache)\n",
> default_pool_ops);
> -
> - if (do_one_mempool_test(default_pool, 1) < 0)
> + if (do_one_mempool_test(default_pool, cores) < 0)
> goto err;
>
> - if (do_one_mempool_test(default_pool, 2) < 0)
> + printf("start performance test (with cache)\n");
> + if (do_one_mempool_test(mp_cache, cores) < 0)
> goto err;
>
> - if (do_one_mempool_test(default_pool, rte_lcore_count()) < 0)
> + printf("start performance test (with user-owned cache)\n");
> + use_external_cache = 1;
This variable should be reset to zero after the next test, because the command may be executed repeatedly.
I think the original code already has this bug; I suggest adding a bugfix first and then applying this commit on top.
> + if (do_one_mempool_test(mp_nocache, cores) < 0)
> goto err;
>
> - /* performance test with 1, 2 and max cores */
> - printf("start performance test (with cache)\n");
> + rte_mempool_list_dump(stdout);
>
> - if (do_one_mempool_test(mp_cache, 1) < 0)
> - goto err;
> + ret = 0;
>
> - if (do_one_mempool_test(mp_cache, 2) < 0)
> - goto err;
> +err:
> + rte_mempool_free(mp_cache);
> + rte_mempool_free(mp_nocache);
> + rte_mempool_free(default_pool);
> + return ret;
> +}
>
> - if (do_one_mempool_test(mp_cache, rte_lcore_count()) < 0)
> - goto err;
> +static int
> +test_mempool_perf_1core(void)
> +{
> + return do_all_mempool_perf_tests(1);
> +}
>
> - /* performance test with 1, 2 and max cores */
> - printf("start performance test (with user-owned cache)\n");
> - use_external_cache = 1;
> +static int
> +test_mempool_perf_2cores(void)
> +{
> + return do_all_mempool_perf_tests(2);
> +}
>
> - if (do_one_mempool_test(mp_nocache, 1) < 0)
> - goto err;
> +static int
> +test_mempool_perf_allcores(void)
> +{
> + return do_all_mempool_perf_tests(rte_lcore_count());
> +}
>
> - if (do_one_mempool_test(mp_nocache, 2) < 0)
> - goto err;
> +static int
> +test_mempool_perf(void)
> +{
> + int ret = -1;
>
> - if (do_one_mempool_test(mp_nocache, rte_lcore_count()) < 0)
> + /* performance test with 1, 2 and max cores */
> + if (do_all_mempool_perf_tests(1) < 0)
> + goto err;
> + if (do_all_mempool_perf_tests(2) < 0)
> + goto err;
> + if (do_all_mempool_perf_tests(rte_lcore_count()) < 0)
> goto err;
> -
> - rte_mempool_list_dump(stdout);
>
> ret = 0;
>
> err:
> - rte_mempool_free(mp_cache);
> - rte_mempool_free(mp_nocache);
> - rte_mempool_free(default_pool);
> return ret;
> }
>
> REGISTER_PERF_TEST(mempool_perf_autotest, test_mempool_perf);
> +REGISTER_PERF_TEST(mempool_perf_autotest_1core, test_mempool_perf_1core);
> +REGISTER_PERF_TEST(mempool_perf_autotest_2cores, test_mempool_perf_2cores);
> +REGISTER_PERF_TEST(mempool_perf_autotest_allcores, test_mempool_perf_allcores);
I'm OK with the derivative tests by core count.
With the above bug fixed,
Acked-by: Chengwen Feng <fengchengwen@huawei.com>
Thanks
>
* [PATCH v3] mempool: test performance with larger bursts
2024-01-21 4:52 [PATCH] mempool: test performance with larger bursts Morten Brørup
2024-01-22 7:10 ` fengchengwen
2024-01-22 14:34 ` [PATCH v2] " Morten Brørup
@ 2024-01-24 8:58 ` Morten Brørup
2024-01-24 9:10 ` [PATCH v4] " Morten Brørup
` (5 subsequent siblings)
8 siblings, 0 replies; 28+ messages in thread
From: Morten Brørup @ 2024-01-24 8:58 UTC
To: andrew.rybchenko, fengchengwen; +Cc: dev, Morten Brørup
Bursts of up to 64 or 128 packets are not uncommon, so increase the
maximum tested get and put burst sizes from 32 to 128.
For convenience, also test get and put burst sizes of
RTE_MEMPOOL_CACHE_MAX_SIZE.
Some applications keep more than 512 objects, so increase the maximum
number of kept objects from 512 to 32768, still in jumps of factor four.
This exceeds the typical mempool cache size of 512 objects, so the test
also exercises the mempool driver.
Signed-off-by: Morten Brørup <mb@smartsharesystems.com>
---
v3:
* Increased max number of kept objects to 32768.
* Added get and put burst sizes of RTE_MEMPOOL_CACHE_MAX_SIZE objects.
* Print error if unable to allocate mempool.
* Initialize use_external_cache with each test.
A previous version of this patch had a bug, where all test runs
following the first would use external cache. (Chengwen Feng)
v2: Addressed feedback by Chengwen Feng
* Added get and put burst sizes of 64 objects, which is probably also a
not uncommon packet burst size.
* Fixed list of number of kept objects so list remains in jumps of factor
four.
* Added three derivative test cases, for faster testing.
---
app/test/test_mempool_perf.c | 42 ++++++++++++++++++++++--------------
1 file changed, 26 insertions(+), 16 deletions(-)
diff --git a/app/test/test_mempool_perf.c b/app/test/test_mempool_perf.c
index a5a7d43608..481263186a 100644
--- a/app/test/test_mempool_perf.c
+++ b/app/test/test_mempool_perf.c
@@ -54,9 +54,9 @@
*
* - Bulk size (*n_get_bulk*, *n_put_bulk*)
*
- * - Bulk get from 1 to 128
- * - Bulk put from 1 to 128
- * - Bulk get and put from 1 to 128, compile time constant
+ * - Bulk get from 1 to 128, and RTE_MEMPOOL_CACHE_MAX_SIZE
+ * - Bulk put from 1 to 128, and RTE_MEMPOOL_CACHE_MAX_SIZE
+ * - Bulk get and put from 1 to 128, and RTE_MEMPOOL_CACHE_MAX_SIZE, compile time constant
*
* - Number of kept objects (*n_keep*)
*
@@ -65,12 +65,13 @@
* - 512
* - 2048
* - 8192
+ * - 32768
*/
-#define N 65536
+#define N 262144
#define TIME_S 5
#define MEMPOOL_ELT_SIZE 2048
-#define MAX_KEEP 8192
+#define MAX_KEEP 32768
#define MEMPOOL_SIZE ((rte_lcore_count()*(MAX_KEEP+RTE_MEMPOOL_CACHE_MAX_SIZE*2))-1)
/* Number of pointers fitting into one cache line. */
@@ -210,6 +211,9 @@ per_lcore_mempool_test(void *arg)
ret = test_loop(mp, cache, n_keep, 64, 64);
else if (n_get_bulk == 128)
ret = test_loop(mp, cache, n_keep, 128, 128);
+ else if (n_get_bulk == RTE_MEMPOOL_CACHE_MAX_SIZE)
+ ret = test_loop(mp, cache, n_keep,
+ RTE_MEMPOOL_CACHE_MAX_SIZE, RTE_MEMPOOL_CACHE_MAX_SIZE);
else
ret = -1;
@@ -293,11 +297,13 @@ launch_cores(struct rte_mempool *mp, unsigned int cores)
/* for a given number of core, launch all test cases */
static int
-do_one_mempool_test(struct rte_mempool *mp, unsigned int cores)
+do_one_mempool_test(struct rte_mempool *mp, unsigned int cores, int external_cache)
{
- unsigned int bulk_tab_get[] = { 1, 4, CACHE_LINE_BURST, 32, 64, 128, 0 };
- unsigned int bulk_tab_put[] = { 1, 4, CACHE_LINE_BURST, 32, 64, 128, 0 };
- unsigned int keep_tab[] = { 32, 128, 512, 2048, 8192, 0 };
+ unsigned int bulk_tab_get[] = { 1, 4, CACHE_LINE_BURST, 32, 64, 128,
+ RTE_MEMPOOL_CACHE_MAX_SIZE, 0 };
+ unsigned int bulk_tab_put[] = { 1, 4, CACHE_LINE_BURST, 32, 64, 128,
+ RTE_MEMPOOL_CACHE_MAX_SIZE, 0 };
+ unsigned int keep_tab[] = { 32, 128, 512, 2048, 8192, 32768, 0 };
unsigned *get_bulk_ptr;
unsigned *put_bulk_ptr;
unsigned *keep_ptr;
@@ -310,6 +316,7 @@ do_one_mempool_test(struct rte_mempool *mp, unsigned int cores)
if (*keep_ptr < *get_bulk_ptr || *keep_ptr < *put_bulk_ptr)
continue;
+ use_external_cache = external_cache;
use_constant_values = 0;
n_get_bulk = *get_bulk_ptr;
n_put_bulk = *put_bulk_ptr;
@@ -346,8 +353,10 @@ do_all_mempool_perf_tests(unsigned int cores)
NULL, NULL,
my_obj_init, NULL,
SOCKET_ID_ANY, 0);
- if (mp_nocache == NULL)
+ if (mp_nocache == NULL) {
+ printf("cannot allocate mempool (without cache)\n");
goto err;
+ }
/* create a mempool (with cache) */
mp_cache = rte_mempool_create("perf_test_cache", MEMPOOL_SIZE,
@@ -356,8 +365,10 @@ do_all_mempool_perf_tests(unsigned int cores)
NULL, NULL,
my_obj_init, NULL,
SOCKET_ID_ANY, 0);
- if (mp_cache == NULL)
+ if (mp_cache == NULL) {
+ printf("cannot allocate mempool (with cache)\n");
goto err;
+ }
default_pool_ops = rte_mbuf_best_mempool_ops();
/* Create a mempool based on Default handler */
@@ -386,21 +397,20 @@ do_all_mempool_perf_tests(unsigned int cores)
rte_mempool_obj_iter(default_pool, my_obj_init, NULL);
printf("start performance test (without cache)\n");
- if (do_one_mempool_test(mp_nocache, cores) < 0)
+ if (do_one_mempool_test(mp_nocache, cores, 0) < 0)
goto err;
printf("start performance test for %s (without cache)\n",
default_pool_ops);
- if (do_one_mempool_test(default_pool, cores) < 0)
+ if (do_one_mempool_test(default_pool, cores, 0) < 0)
goto err;
printf("start performance test (with cache)\n");
- if (do_one_mempool_test(mp_cache, cores) < 0)
+ if (do_one_mempool_test(mp_cache, cores, 0) < 0)
goto err;
printf("start performance test (with user-owned cache)\n");
- use_external_cache = 1;
- if (do_one_mempool_test(mp_nocache, cores) < 0)
+ if (do_one_mempool_test(mp_nocache, cores, 1) < 0)
goto err;
rte_mempool_list_dump(stdout);
--
2.17.1
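The use_external_cache issue noted in the v3 changelog can be reproduced outside DPDK. The following is a minimal sketch, not the test's actual code: run_one(), suite_sticky(), run_one_with() and suite_explicit() are hypothetical stand-ins, and only the name use_external_cache is taken from the patch. It contrasts a file-scope flag that is set once and never cleared with the v3 approach of passing the mode in on every run.

```c
#include <assert.h>

/* Hypothetical stand-in for the file-scope flag in test_mempool_perf.c. */
static int use_external_cache;

/* Hypothetical stand-in for one test run; returns the mode it observed. */
static int
run_one(void)
{
	return use_external_cache;
}

/* v2 style: the flag is set before the last run and never cleared. */
static void
suite_sticky(int modes[4])
{
	modes[0] = run_one();	/* without cache */
	modes[1] = run_one();	/* default ops, without cache */
	modes[2] = run_one();	/* with cache */
	use_external_cache = 1;
	modes[3] = run_one();	/* with user-owned cache */
}

/* v3 style: every run states the mode it wants before it starts. */
static int
run_one_with(int external_cache)
{
	assert(external_cache == 0 || external_cache == 1);
	use_external_cache = external_cache;
	return run_one();
}

static void
suite_explicit(int modes[4])
{
	modes[0] = run_one_with(0);	/* without cache */
	modes[1] = run_one_with(0);	/* default ops, without cache */
	modes[2] = run_one_with(0);	/* with cache */
	modes[3] = run_one_with(1);	/* with user-owned cache */
}
```

Calling suite_sticky() twice shows the problem: on the second call the first three runs already see the flag set. suite_explicit() gives the same modes on every call, which matters once the suite is invoked once per core count or repeated from the test prompt.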
* [PATCH v4] mempool: test performance with larger bursts
2024-01-21 4:52 [PATCH] mempool: test performance with larger bursts Morten Brørup
` (2 preceding siblings ...)
2024-01-24 8:58 ` [PATCH v3] " Morten Brørup
@ 2024-01-24 9:10 ` Morten Brørup
2024-01-24 11:21 ` [PATCH v5] " Morten Brørup
` (4 subsequent siblings)
8 siblings, 0 replies; 28+ messages in thread
From: Morten Brørup @ 2024-01-24 9:10 UTC
To: andrew.rybchenko, fengchengwen; +Cc: dev, Morten Brørup
Bursts of up to 64 or 128 packets are not uncommon, so increase the
maximum tested get and put burst sizes from 32 to 128.
For convenience, also test get and put burst sizes of
RTE_MEMPOOL_CACHE_MAX_SIZE.
Some applications keep more than 512 objects, so increase the maximum
number of kept objects from 512 to 32768, still in jumps of factor four.
This exceeds the typical mempool cache size of 512 objects, so the test
also exercises the mempool driver.
Signed-off-by: Morten Brørup <mb@smartsharesystems.com>
Acked-by: Chengwen Feng <fengchengwen@huawei.com>
---
v4:
* v3 failed to apply; I had messed up something with git.
* Added ACK from Chengwen Feng.
v3:
* Increased max number of kept objects to 32768.
* Added get and put burst sizes of RTE_MEMPOOL_CACHE_MAX_SIZE objects.
* Print error if unable to allocate mempool.
* Initialize use_external_cache with each test.
A previous version of this patch had a bug, where all test runs
following the first would use external cache. (Chengwen Feng)
v2: Addressed feedback by Chengwen Feng
* Added get and put burst sizes of 64 objects, which is probably also a
not uncommon packet burst size.
* Fixed list of number of kept objects so list remains in jumps of factor
four.
* Added three derivative test cases, for faster testing.
---
app/test/test_mempool_perf.c | 125 +++++++++++++++++++++--------------
1 file changed, 76 insertions(+), 49 deletions(-)
diff --git a/app/test/test_mempool_perf.c b/app/test/test_mempool_perf.c
index 96de347f04..481263186a 100644
--- a/app/test/test_mempool_perf.c
+++ b/app/test/test_mempool_perf.c
@@ -1,6 +1,6 @@
/* SPDX-License-Identifier: BSD-3-Clause
* Copyright(c) 2010-2014 Intel Corporation
- * Copyright(c) 2022 SmartShare Systems
+ * Copyright(c) 2022-2024 SmartShare Systems
*/
#include <string.h>
@@ -54,22 +54,25 @@
*
* - Bulk size (*n_get_bulk*, *n_put_bulk*)
*
- * - Bulk get from 1 to 32
- * - Bulk put from 1 to 32
- * - Bulk get and put from 1 to 32, compile time constant
+ * - Bulk get from 1 to 128, and RTE_MEMPOOL_CACHE_MAX_SIZE
+ * - Bulk put from 1 to 128, and RTE_MEMPOOL_CACHE_MAX_SIZE
+ * - Bulk get and put from 1 to 128, and RTE_MEMPOOL_CACHE_MAX_SIZE, compile time constant
*
* - Number of kept objects (*n_keep*)
*
* - 32
* - 128
* - 512
+ * - 2048
+ * - 8192
+ * - 32768
*/
-#define N 65536
+#define N 262144
#define TIME_S 5
#define MEMPOOL_ELT_SIZE 2048
-#define MAX_KEEP 512
-#define MEMPOOL_SIZE ((rte_lcore_count()*(MAX_KEEP+RTE_MEMPOOL_CACHE_MAX_SIZE))-1)
+#define MAX_KEEP 32768
+#define MEMPOOL_SIZE ((rte_lcore_count()*(MAX_KEEP+RTE_MEMPOOL_CACHE_MAX_SIZE*2))-1)
/* Number of pointers fitting into one cache line. */
#define CACHE_LINE_BURST (RTE_CACHE_LINE_SIZE / sizeof(uintptr_t))
@@ -204,6 +207,13 @@ per_lcore_mempool_test(void *arg)
CACHE_LINE_BURST, CACHE_LINE_BURST);
else if (n_get_bulk == 32)
ret = test_loop(mp, cache, n_keep, 32, 32);
+ else if (n_get_bulk == 64)
+ ret = test_loop(mp, cache, n_keep, 64, 64);
+ else if (n_get_bulk == 128)
+ ret = test_loop(mp, cache, n_keep, 128, 128);
+ else if (n_get_bulk == RTE_MEMPOOL_CACHE_MAX_SIZE)
+ ret = test_loop(mp, cache, n_keep,
+ RTE_MEMPOOL_CACHE_MAX_SIZE, RTE_MEMPOOL_CACHE_MAX_SIZE);
else
ret = -1;
@@ -287,11 +297,13 @@ launch_cores(struct rte_mempool *mp, unsigned int cores)
/* for a given number of core, launch all test cases */
static int
-do_one_mempool_test(struct rte_mempool *mp, unsigned int cores)
+do_one_mempool_test(struct rte_mempool *mp, unsigned int cores, int external_cache)
{
- unsigned int bulk_tab_get[] = { 1, 4, CACHE_LINE_BURST, 32, 0 };
- unsigned int bulk_tab_put[] = { 1, 4, CACHE_LINE_BURST, 32, 0 };
- unsigned int keep_tab[] = { 32, 128, 512, 0 };
+ unsigned int bulk_tab_get[] = { 1, 4, CACHE_LINE_BURST, 32, 64, 128,
+ RTE_MEMPOOL_CACHE_MAX_SIZE, 0 };
+ unsigned int bulk_tab_put[] = { 1, 4, CACHE_LINE_BURST, 32, 64, 128,
+ RTE_MEMPOOL_CACHE_MAX_SIZE, 0 };
+ unsigned int keep_tab[] = { 32, 128, 512, 2048, 8192, 32768, 0 };
unsigned *get_bulk_ptr;
unsigned *put_bulk_ptr;
unsigned *keep_ptr;
@@ -301,6 +313,10 @@ do_one_mempool_test(struct rte_mempool *mp, unsigned int cores)
for (put_bulk_ptr = bulk_tab_put; *put_bulk_ptr; put_bulk_ptr++) {
for (keep_ptr = keep_tab; *keep_ptr; keep_ptr++) {
+ if (*keep_ptr < *get_bulk_ptr || *keep_ptr < *put_bulk_ptr)
+ continue;
+
+ use_external_cache = external_cache;
use_constant_values = 0;
n_get_bulk = *get_bulk_ptr;
n_put_bulk = *put_bulk_ptr;
@@ -323,7 +339,7 @@ do_one_mempool_test(struct rte_mempool *mp, unsigned int cores)
}
static int
-test_mempool_perf(void)
+do_all_mempool_perf_tests(unsigned int cores)
{
struct rte_mempool *mp_cache = NULL;
struct rte_mempool *mp_nocache = NULL;
@@ -337,8 +353,10 @@ test_mempool_perf(void)
NULL, NULL,
my_obj_init, NULL,
SOCKET_ID_ANY, 0);
- if (mp_nocache == NULL)
+ if (mp_nocache == NULL) {
+ printf("cannot allocate mempool (without cache)\n");
goto err;
+ }
/* create a mempool (with cache) */
mp_cache = rte_mempool_create("perf_test_cache", MEMPOOL_SIZE,
@@ -347,8 +365,10 @@ test_mempool_perf(void)
NULL, NULL,
my_obj_init, NULL,
SOCKET_ID_ANY, 0);
- if (mp_cache == NULL)
+ if (mp_cache == NULL) {
+ printf("cannot allocate mempool (with cache)\n");
goto err;
+ }
default_pool_ops = rte_mbuf_best_mempool_ops();
/* Create a mempool based on Default handler */
@@ -376,65 +396,72 @@ test_mempool_perf(void)
rte_mempool_obj_iter(default_pool, my_obj_init, NULL);
- /* performance test with 1, 2 and max cores */
printf("start performance test (without cache)\n");
-
- if (do_one_mempool_test(mp_nocache, 1) < 0)
+ if (do_one_mempool_test(mp_nocache, cores, 0) < 0)
goto err;
- if (do_one_mempool_test(mp_nocache, 2) < 0)
- goto err;
-
- if (do_one_mempool_test(mp_nocache, rte_lcore_count()) < 0)
- goto err;
-
- /* performance test with 1, 2 and max cores */
printf("start performance test for %s (without cache)\n",
default_pool_ops);
-
- if (do_one_mempool_test(default_pool, 1) < 0)
+ if (do_one_mempool_test(default_pool, cores, 0) < 0)
goto err;
- if (do_one_mempool_test(default_pool, 2) < 0)
+ printf("start performance test (with cache)\n");
+ if (do_one_mempool_test(mp_cache, cores, 0) < 0)
goto err;
- if (do_one_mempool_test(default_pool, rte_lcore_count()) < 0)
+ printf("start performance test (with user-owned cache)\n");
+ if (do_one_mempool_test(mp_nocache, cores, 1) < 0)
goto err;
- /* performance test with 1, 2 and max cores */
- printf("start performance test (with cache)\n");
+ rte_mempool_list_dump(stdout);
- if (do_one_mempool_test(mp_cache, 1) < 0)
- goto err;
+ ret = 0;
- if (do_one_mempool_test(mp_cache, 2) < 0)
- goto err;
+err:
+ rte_mempool_free(mp_cache);
+ rte_mempool_free(mp_nocache);
+ rte_mempool_free(default_pool);
+ return ret;
+}
- if (do_one_mempool_test(mp_cache, rte_lcore_count()) < 0)
- goto err;
+static int
+test_mempool_perf_1core(void)
+{
+ return do_all_mempool_perf_tests(1);
+}
- /* performance test with 1, 2 and max cores */
- printf("start performance test (with user-owned cache)\n");
- use_external_cache = 1;
+static int
+test_mempool_perf_2cores(void)
+{
+ return do_all_mempool_perf_tests(2);
+}
- if (do_one_mempool_test(mp_nocache, 1) < 0)
- goto err;
+static int
+test_mempool_perf_allcores(void)
+{
+ return do_all_mempool_perf_tests(rte_lcore_count());
+}
- if (do_one_mempool_test(mp_nocache, 2) < 0)
- goto err;
+static int
+test_mempool_perf(void)
+{
+ int ret = -1;
- if (do_one_mempool_test(mp_nocache, rte_lcore_count()) < 0)
+ /* performance test with 1, 2 and max cores */
+ if (do_all_mempool_perf_tests(1) < 0)
+ goto err;
+ if (do_all_mempool_perf_tests(2) < 0)
+ goto err;
+ if (do_all_mempool_perf_tests(rte_lcore_count()) < 0)
goto err;
-
- rte_mempool_list_dump(stdout);
ret = 0;
err:
- rte_mempool_free(mp_cache);
- rte_mempool_free(mp_nocache);
- rte_mempool_free(default_pool);
return ret;
}
REGISTER_PERF_TEST(mempool_perf_autotest, test_mempool_perf);
+REGISTER_PERF_TEST(mempool_perf_autotest_1core, test_mempool_perf_1core);
+REGISTER_PERF_TEST(mempool_perf_autotest_2cores, test_mempool_perf_2cores);
+REGISTER_PERF_TEST(mempool_perf_autotest_allcores, test_mempool_perf_allcores);
--
2.17.1
* [PATCH v5] mempool: test performance with larger bursts
2024-01-21 4:52 [PATCH] mempool: test performance with larger bursts Morten Brørup
` (3 preceding siblings ...)
2024-01-24 9:10 ` [PATCH v4] " Morten Brørup
@ 2024-01-24 11:21 ` Morten Brørup
2024-02-18 18:03 ` Thomas Monjalon
2024-02-20 14:01 ` [PATCH v6] " Morten Brørup
` (3 subsequent siblings)
8 siblings, 1 reply; 28+ messages in thread
From: Morten Brørup @ 2024-01-24 11:21 UTC (permalink / raw)
To: andrew.rybchenko, fengchengwen; +Cc: dev, Morten Brørup
Bursts of up to 64 or 128 packets are not uncommon, so increase the
maximum tested get and put burst sizes from 32 to 128.
For convenience, also test get and put burst sizes of
RTE_MEMPOOL_CACHE_MAX_SIZE.
Some applications keep more than 512 objects, so increase the maximum
number of kept objects from 512 to 32768, still in steps of a factor of four.
This exceeds the typical mempool cache size of 512 objects, so the test
also exercises the mempool driver.
Increased the precision of rate_persec calculation by timing the actual
duration of the test, instead of assuming it took exactly 5 seconds.
Added cache guard to per-lcore stats structure.
Signed-off-by: Morten Brørup <mb@smartsharesystems.com>
Acked-by: Chengwen Feng <fengchengwen@huawei.com>
---
v5:
* Increased N, to reduce measurement overhead with large numbers of kept
objects.
* Increased precision of rate_persec calculation.
* Added missing cache guard to per-lcore stats structure.
v4:
* v3 failed to apply; I had messed up something with git.
* Added ACK from Chengwen Feng.
v3:
* Increased max number of kept objects to 32768.
* Added get and put burst sizes of RTE_MEMPOOL_CACHE_MAX_SIZE objects.
* Print error if unable to allocate mempool.
* Initialize use_external_cache with each test.
A previous version of this patch had a bug, where all test runs
following the first would use external cache. (Chengwen Feng)
v2: Addressed feedback by Chengwen Feng
* Added get and put burst sizes of 64 objects, which is probably also not
an uncommon packet burst size.
* Fixed list of number of kept objects so list remains in jumps of factor
four.
* Added three derivative test cases, for faster testing.
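The rate_persec change listed under v5 can be illustrated with a minimal standalone sketch. It is not the test code itself: struct lcore_stats and both function names are hypothetical stand-ins, and the 1 GHz timer frequency in the usage note is an assumed value. The sketch only contrasts dividing by a fixed TIME_S with scaling each lcore's count by its measured duration in timer cycles.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the per-lcore stats structure in the patch. */
struct lcore_stats {
	uint64_t enq_count;       /* objects enqueued by this lcore */
	uint64_t duration_cycles; /* measured run time in timer cycles, 0 if the lcore did not run */
};

/* Old method: assume every lcore ran for exactly time_s seconds. */
static uint64_t
rate_assumed(const struct lcore_stats *s, size_t n, uint64_t time_s)
{
	uint64_t rate = 0;
	size_t i;

	assert(time_s != 0);
	for (i = 0; i < n; i++)
		rate += s[i].enq_count / time_s;
	return rate;
}

/*
 * New method: scale each lcore's count by its own measured duration.
 * Lcores with a zero duration did not take part and are skipped.
 * The sum is kept in a double and truncated once at the end, which is
 * a simplification of this sketch.
 */
static uint64_t
rate_measured(const struct lcore_stats *s, size_t n, double hz)
{
	double rate = 0.0;
	size_t i;

	for (i = 0; i < n; i++)
		if (s[i].duration_cycles != 0)
			rate += (double)s[i].enq_count * hz /
				(double)s[i].duration_cycles;
	return (uint64_t)rate;
}
```

With two active lcores that each enqueued 20971520 objects, one finishing in 5.0 s and the other in 6.4 s on the assumed 1 GHz timer, rate_assumed() reports 8388608 objects per second while rate_measured() reports 7471104. The fixed-duration method overstates the rate whenever a run overshoots the nominal test time.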
---
app/test/test_mempool_perf.c | 137 ++++++++++++++++++++++-------------
1 file changed, 86 insertions(+), 51 deletions(-)
diff --git a/app/test/test_mempool_perf.c b/app/test/test_mempool_perf.c
index 96de347f04..dcdfb52020 100644
--- a/app/test/test_mempool_perf.c
+++ b/app/test/test_mempool_perf.c
@@ -1,6 +1,6 @@
/* SPDX-License-Identifier: BSD-3-Clause
* Copyright(c) 2010-2014 Intel Corporation
- * Copyright(c) 2022 SmartShare Systems
+ * Copyright(c) 2022-2024 SmartShare Systems
*/
#include <string.h>
@@ -54,22 +54,25 @@
*
* - Bulk size (*n_get_bulk*, *n_put_bulk*)
*
- * - Bulk get from 1 to 32
- * - Bulk put from 1 to 32
- * - Bulk get and put from 1 to 32, compile time constant
+ * - Bulk get from 1 to 128, and RTE_MEMPOOL_CACHE_MAX_SIZE
+ * - Bulk put from 1 to 128, and RTE_MEMPOOL_CACHE_MAX_SIZE
+ * - Bulk get and put from 1 to 128, and RTE_MEMPOOL_CACHE_MAX_SIZE, compile time constant
*
* - Number of kept objects (*n_keep*)
*
* - 32
* - 128
* - 512
+ * - 2048
+ * - 8192
+ * - 32768
*/
-#define N 65536
#define TIME_S 5
#define MEMPOOL_ELT_SIZE 2048
-#define MAX_KEEP 512
-#define MEMPOOL_SIZE ((rte_lcore_count()*(MAX_KEEP+RTE_MEMPOOL_CACHE_MAX_SIZE))-1)
+#define MAX_KEEP 32768
+#define N (128 * MAX_KEEP)
+#define MEMPOOL_SIZE ((rte_lcore_count()*(MAX_KEEP+RTE_MEMPOOL_CACHE_MAX_SIZE*2))-1)
/* Number of pointers fitting into one cache line. */
#define CACHE_LINE_BURST (RTE_CACHE_LINE_SIZE / sizeof(uintptr_t))
@@ -100,9 +103,11 @@ static unsigned n_keep;
/* true if we want to test with constant n_get_bulk and n_put_bulk */
static int use_constant_values;
-/* number of enqueues / dequeues */
+/* number of enqueues / dequeues, and time used */
struct mempool_test_stats {
uint64_t enq_count;
+ uint64_t duration_cycles;
+ RTE_CACHE_GUARD;
} __rte_cache_aligned;
static struct mempool_test_stats stats[RTE_MAX_LCORE];
@@ -185,6 +190,7 @@ per_lcore_mempool_test(void *arg)
GOTO_ERR(ret, out);
stats[lcore_id].enq_count = 0;
+ stats[lcore_id].duration_cycles = 0;
/* wait synchro for workers */
if (lcore_id != rte_get_main_lcore())
@@ -204,6 +210,13 @@ per_lcore_mempool_test(void *arg)
CACHE_LINE_BURST, CACHE_LINE_BURST);
else if (n_get_bulk == 32)
ret = test_loop(mp, cache, n_keep, 32, 32);
+ else if (n_get_bulk == 64)
+ ret = test_loop(mp, cache, n_keep, 64, 64);
+ else if (n_get_bulk == 128)
+ ret = test_loop(mp, cache, n_keep, 128, 128);
+ else if (n_get_bulk == RTE_MEMPOOL_CACHE_MAX_SIZE)
+ ret = test_loop(mp, cache, n_keep,
+ RTE_MEMPOOL_CACHE_MAX_SIZE, RTE_MEMPOOL_CACHE_MAX_SIZE);
else
ret = -1;
@@ -215,6 +228,8 @@ per_lcore_mempool_test(void *arg)
stats[lcore_id].enq_count += N;
}
+ stats[lcore_id].duration_cycles = time_diff;
+
out:
if (use_external_cache) {
rte_mempool_cache_flush(cache, mp);
@@ -232,6 +247,7 @@ launch_cores(struct rte_mempool *mp, unsigned int cores)
uint64_t rate;
int ret;
unsigned cores_save = cores;
+ double hz = rte_get_timer_hz();
__atomic_store_n(&synchro, 0, __ATOMIC_RELAXED);
@@ -278,7 +294,9 @@ launch_cores(struct rte_mempool *mp, unsigned int cores)
rate = 0;
for (lcore_id = 0; lcore_id < RTE_MAX_LCORE; lcore_id++)
- rate += (stats[lcore_id].enq_count / TIME_S);
+ if (stats[lcore_id].duration_cycles != 0)
+ rate += (double)stats[lcore_id].enq_count * hz /
+ (double)stats[lcore_id].duration_cycles;
printf("rate_persec=%" PRIu64 "\n", rate);
@@ -287,11 +305,13 @@ launch_cores(struct rte_mempool *mp, unsigned int cores)
/* for a given number of core, launch all test cases */
static int
-do_one_mempool_test(struct rte_mempool *mp, unsigned int cores)
+do_one_mempool_test(struct rte_mempool *mp, unsigned int cores, int external_cache)
{
- unsigned int bulk_tab_get[] = { 1, 4, CACHE_LINE_BURST, 32, 0 };
- unsigned int bulk_tab_put[] = { 1, 4, CACHE_LINE_BURST, 32, 0 };
- unsigned int keep_tab[] = { 32, 128, 512, 0 };
+ unsigned int bulk_tab_get[] = { 1, 4, CACHE_LINE_BURST, 32, 64, 128,
+ RTE_MEMPOOL_CACHE_MAX_SIZE, 0 };
+ unsigned int bulk_tab_put[] = { 1, 4, CACHE_LINE_BURST, 32, 64, 128,
+ RTE_MEMPOOL_CACHE_MAX_SIZE, 0 };
+ unsigned int keep_tab[] = { 32, 128, 512, 2048, 8192, 32768, 0 };
unsigned *get_bulk_ptr;
unsigned *put_bulk_ptr;
unsigned *keep_ptr;
@@ -301,6 +321,10 @@ do_one_mempool_test(struct rte_mempool *mp, unsigned int cores)
for (put_bulk_ptr = bulk_tab_put; *put_bulk_ptr; put_bulk_ptr++) {
for (keep_ptr = keep_tab; *keep_ptr; keep_ptr++) {
+ if (*keep_ptr < *get_bulk_ptr || *keep_ptr < *put_bulk_ptr)
+ continue;
+
+ use_external_cache = external_cache;
use_constant_values = 0;
n_get_bulk = *get_bulk_ptr;
n_put_bulk = *put_bulk_ptr;
@@ -323,7 +347,7 @@ do_one_mempool_test(struct rte_mempool *mp, unsigned int cores)
}
static int
-test_mempool_perf(void)
+do_all_mempool_perf_tests(unsigned int cores)
{
struct rte_mempool *mp_cache = NULL;
struct rte_mempool *mp_nocache = NULL;
@@ -337,8 +361,10 @@ test_mempool_perf(void)
NULL, NULL,
my_obj_init, NULL,
SOCKET_ID_ANY, 0);
- if (mp_nocache == NULL)
+ if (mp_nocache == NULL) {
+ printf("cannot allocate mempool (without cache)\n");
goto err;
+ }
/* create a mempool (with cache) */
mp_cache = rte_mempool_create("perf_test_cache", MEMPOOL_SIZE,
@@ -347,8 +373,10 @@ test_mempool_perf(void)
NULL, NULL,
my_obj_init, NULL,
SOCKET_ID_ANY, 0);
- if (mp_cache == NULL)
+ if (mp_cache == NULL) {
+ printf("cannot allocate mempool (with cache)\n");
goto err;
+ }
default_pool_ops = rte_mbuf_best_mempool_ops();
/* Create a mempool based on Default handler */
@@ -376,65 +404,72 @@ test_mempool_perf(void)
rte_mempool_obj_iter(default_pool, my_obj_init, NULL);
- /* performance test with 1, 2 and max cores */
printf("start performance test (without cache)\n");
-
- if (do_one_mempool_test(mp_nocache, 1) < 0)
- goto err;
-
- if (do_one_mempool_test(mp_nocache, 2) < 0)
+ if (do_one_mempool_test(mp_nocache, cores, 0) < 0)
goto err;
- if (do_one_mempool_test(mp_nocache, rte_lcore_count()) < 0)
- goto err;
-
- /* performance test with 1, 2 and max cores */
printf("start performance test for %s (without cache)\n",
default_pool_ops);
-
- if (do_one_mempool_test(default_pool, 1) < 0)
+ if (do_one_mempool_test(default_pool, cores, 0) < 0)
goto err;
- if (do_one_mempool_test(default_pool, 2) < 0)
+ printf("start performance test (with cache)\n");
+ if (do_one_mempool_test(mp_cache, cores, 0) < 0)
goto err;
- if (do_one_mempool_test(default_pool, rte_lcore_count()) < 0)
+ printf("start performance test (with user-owned cache)\n");
+ if (do_one_mempool_test(mp_nocache, cores, 1) < 0)
goto err;
- /* performance test with 1, 2 and max cores */
- printf("start performance test (with cache)\n");
+ rte_mempool_list_dump(stdout);
- if (do_one_mempool_test(mp_cache, 1) < 0)
- goto err;
+ ret = 0;
- if (do_one_mempool_test(mp_cache, 2) < 0)
- goto err;
+err:
+ rte_mempool_free(mp_cache);
+ rte_mempool_free(mp_nocache);
+ rte_mempool_free(default_pool);
+ return ret;
+}
- if (do_one_mempool_test(mp_cache, rte_lcore_count()) < 0)
- goto err;
+static int
+test_mempool_perf_1core(void)
+{
+ return do_all_mempool_perf_tests(1);
+}
- /* performance test with 1, 2 and max cores */
- printf("start performance test (with user-owned cache)\n");
- use_external_cache = 1;
+static int
+test_mempool_perf_2cores(void)
+{
+ return do_all_mempool_perf_tests(2);
+}
- if (do_one_mempool_test(mp_nocache, 1) < 0)
- goto err;
+static int
+test_mempool_perf_allcores(void)
+{
+ return do_all_mempool_perf_tests(rte_lcore_count());
+}
- if (do_one_mempool_test(mp_nocache, 2) < 0)
- goto err;
+static int
+test_mempool_perf(void)
+{
+ int ret = -1;
- if (do_one_mempool_test(mp_nocache, rte_lcore_count()) < 0)
+ /* performance test with 1, 2 and max cores */
+ if (do_all_mempool_perf_tests(1) < 0)
+ goto err;
+ if (do_all_mempool_perf_tests(2) < 0)
+ goto err;
+ if (do_all_mempool_perf_tests(rte_lcore_count()) < 0)
goto err;
-
- rte_mempool_list_dump(stdout);
ret = 0;
err:
- rte_mempool_free(mp_cache);
- rte_mempool_free(mp_nocache);
- rte_mempool_free(default_pool);
return ret;
}
REGISTER_PERF_TEST(mempool_perf_autotest, test_mempool_perf);
+REGISTER_PERF_TEST(mempool_perf_autotest_1core, test_mempool_perf_1core);
+REGISTER_PERF_TEST(mempool_perf_autotest_2cores, test_mempool_perf_2cores);
+REGISTER_PERF_TEST(mempool_perf_autotest_allcores, test_mempool_perf_allcores);
--
2.17.1
* Re: [PATCH v5] mempool: test performance with larger bursts
2024-01-24 11:21 ` [PATCH v5] " Morten Brørup
@ 2024-02-18 18:03 ` Thomas Monjalon
2024-02-20 13:49 ` Morten Brørup
0 siblings, 1 reply; 28+ messages in thread
From: Thomas Monjalon @ 2024-02-18 18:03 UTC (permalink / raw)
To: Morten Brørup; +Cc: andrew.rybchenko, fengchengwen, dev
24/01/2024 12:21, Morten Brørup:
> --- a/app/test/test_mempool_perf.c
> +++ b/app/test/test_mempool_perf.c
> @@ -1,6 +1,6 @@
> /* SPDX-License-Identifier: BSD-3-Clause
> * Copyright(c) 2010-2014 Intel Corporation
> - * Copyright(c) 2022 SmartShare Systems
> + * Copyright(c) 2022-2024 SmartShare Systems
You don't need to update copyright year.
The first year is the only important one.
reading: https://matija.suklje.name/how-and-why-to-properly-write-copyright-statements-in-your-code#why-not-bump-the-year-on-change
[...]
> REGISTER_PERF_TEST(mempool_perf_autotest, test_mempool_perf);
> +REGISTER_PERF_TEST(mempool_perf_autotest_1core, test_mempool_perf_1core);
> +REGISTER_PERF_TEST(mempool_perf_autotest_2cores, test_mempool_perf_2cores);
How do we make sure the test is skipped if we have only 1 core?
> +REGISTER_PERF_TEST(mempool_perf_autotest_allcores, test_mempool_perf_allcores);
How does the test duration change with this patch?
* RE: [PATCH v5] mempool: test performance with larger bursts
2024-02-18 18:03 ` Thomas Monjalon
@ 2024-02-20 13:49 ` Morten Brørup
2024-02-21 10:22 ` Thomas Monjalon
0 siblings, 1 reply; 28+ messages in thread
From: Morten Brørup @ 2024-02-20 13:49 UTC (permalink / raw)
To: Thomas Monjalon; +Cc: andrew.rybchenko, fengchengwen, dev
> From: Thomas Monjalon [mailto:thomas@monjalon.net]
> Sent: Sunday, 18 February 2024 19.04
>
> 24/01/2024 12:21, Morten Brørup:
> > --- a/app/test/test_mempool_perf.c
> > +++ b/app/test/test_mempool_perf.c
> > @@ -1,6 +1,6 @@
> > /* SPDX-License-Identifier: BSD-3-Clause
> > * Copyright(c) 2010-2014 Intel Corporation
> > - * Copyright(c) 2022 SmartShare Systems
> > + * Copyright(c) 2022-2024 SmartShare Systems
>
> You don't need to update copyright year.
> The first year is the only important one.
>
> reading: https://matija.suklje.name/how-and-why-to-properly-write-copyright-statements-in-your-code#why-not-bump-the-year-on-change
Thank you, Thomas. Very informative.
Will fix in next version.
>
> [...]
> > REGISTER_PERF_TEST(mempool_perf_autotest, test_mempool_perf);
> > +REGISTER_PERF_TEST(mempool_perf_autotest_1core, test_mempool_perf_1core);
> > +REGISTER_PERF_TEST(mempool_perf_autotest_2cores, test_mempool_perf_2cores);
>
> How do we make sure the test is skipped if we have only 1 core?
Good point. Will fix in next version.
>
> > +REGISTER_PERF_TEST(mempool_perf_autotest_allcores, test_mempool_perf_allcores);
>
> How does the test duration change with this patch?
>
On my test machine, the expanded test parameter set increased the duration of one test run from 20 minutes to 100 minutes.
Before the patch, all three test runs were always executed, i.e. a total duration of 60 minutes.
In other words:
The expanded test parameter set increased the duration of one test run by a factor of five.
Introducing the ability to optionally test with only a specific number of lcores reduces the total test duration to a third of the full three-run test (100 minutes instead of 300).
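The durations above can be cross-checked with a small standalone estimate. This is a sketch under stated assumptions, not part of the patch: CACHE_LINE_BURST is taken as 8 (64-byte cache lines, 8-byte pointers), RTE_MEMPOOL_CACHE_MAX_SIZE as 512, each run is taken to last about TIME_S = 5 seconds, each (get, put, keep) combination is assumed to be replayed once with compile time constant sizes when get equals put, and four pool configurations are tested per core count. The names count_runs() and estimate_minutes() are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Parameter tables before and after the patch (bulk sizes, kept objects). */
static const unsigned int old_bulk[] = { 1, 4, 8, 32 };
static const unsigned int old_keep[] = { 32, 128, 512 };
static const unsigned int new_bulk[] = { 1, 4, 8, 32, 64, 128, 512 };
static const unsigned int new_keep[] = { 32, 128, 512, 2048, 8192, 32768 };

/*
 * Count the runs for one pool: one run per (get, put, keep) combination,
 * plus one constant-size replay when get == put (an assumption of this
 * sketch). Combinations where keep is smaller than either bulk size are
 * skipped, as in the patch.
 */
static unsigned int
count_runs(const unsigned int *bulk, size_t n_bulk,
	   const unsigned int *keep, size_t n_keep)
{
	unsigned int runs = 0;
	size_t g, p, k;

	assert(bulk != NULL && keep != NULL);
	for (g = 0; g < n_bulk; g++)
		for (p = 0; p < n_bulk; p++)
			for (k = 0; k < n_keep; k++) {
				if (keep[k] < bulk[g] || keep[k] < bulk[p])
					continue;
				runs++;
				if (bulk[g] == bulk[p])
					runs++;
			}
	return runs;
}

/* Whole minutes for one core count: runs x seconds per run x pools. */
static unsigned int
estimate_minutes(unsigned int runs, unsigned int secs_per_run,
		 unsigned int pools)
{
	return runs * secs_per_run * pools / 60;
}
```

Under these assumptions the old tables give 60 runs per pool, or 20 minutes for one core count, and the new tables give 286 runs per pool, or about 95 minutes. That is in line with the 20 and roughly 100 minutes measured above.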
* Re: [PATCH v5] mempool: test performance with larger bursts
2024-02-20 13:49 ` Morten Brørup
@ 2024-02-21 10:22 ` Thomas Monjalon
2024-02-21 10:38 ` Morten Brørup
0 siblings, 1 reply; 28+ messages in thread
From: Thomas Monjalon @ 2024-02-21 10:22 UTC (permalink / raw)
To: Morten Brørup; +Cc: andrew.rybchenko, fengchengwen, dev
20/02/2024 14:49, Morten Brørup:
> From: Thomas Monjalon [mailto:thomas@monjalon.net]
> > 24/01/2024 12:21, Morten Brørup:
> > > REGISTER_PERF_TEST(mempool_perf_autotest, test_mempool_perf);
> > > +REGISTER_PERF_TEST(mempool_perf_autotest_1core, test_mempool_perf_1core);
> > > +REGISTER_PERF_TEST(mempool_perf_autotest_2cores, test_mempool_perf_2cores);
> >
> > How do we make sure the test is skipped if we have only 1 core?
>
> Good point. Will fix in next version.
>
> >
> > > +REGISTER_PERF_TEST(mempool_perf_autotest_allcores, test_mempool_perf_allcores);
> >
> > How does the test duration change with this patch?
>
> On my test machine, the expanded test parameter set increased the duration of one test run from 20 minutes to 100 minutes.
> Before the patch, all three test runs were always executed, i.e. a total duration of 60 minutes.
>
> In other words:
> The expanded test parameter set increased the test run duration by a factor of five.
> Introducing the ability to optionally only test with a specific number of lcores reduced the total test duration to a third.
That's a very long test.
It would be interesting to find a way to make it shorter.
* RE: [PATCH v5] mempool: test performance with larger bursts
2024-02-21 10:22 ` Thomas Monjalon
@ 2024-02-21 10:38 ` Morten Brørup
2024-02-21 10:40 ` Bruce Richardson
0 siblings, 1 reply; 28+ messages in thread
From: Morten Brørup @ 2024-02-21 10:38 UTC (permalink / raw)
To: Thomas Monjalon; +Cc: andrew.rybchenko, fengchengwen, dev
> From: Thomas Monjalon [mailto:thomas@monjalon.net]
> Sent: Wednesday, 21 February 2024 11.23
>
> 20/02/2024 14:49, Morten Brørup:
> > From: Thomas Monjalon [mailto:thomas@monjalon.net]
> > >
> > > How does the test duration change with this patch?
> >
> > On my test machine, the expanded test parameter set increased the
> > duration of one test run from 20 minutes to 100 minutes.
> > Before the patch, all three test runs were always executed, i.e. a
> > total duration of 60 minutes.
> >
> > In other words:
> > The expanded test parameter set increased the test run duration by
> > a factor of five.
> > Introducing the ability to optionally only test with a specific
> > number of lcores reduced the total test duration to a third.
>
> That's a very long test.
> It would be interesting to find a way to make it shorter.
I tried looking into this, but I couldn't figure out how to pass parameters to a test, so I added the three variants with shorter tests, as suggested by Chengwen Feng.
* Re: [PATCH v5] mempool: test performance with larger bursts
2024-02-21 10:38 ` Morten Brørup
@ 2024-02-21 10:40 ` Bruce Richardson
0 siblings, 0 replies; 28+ messages in thread
From: Bruce Richardson @ 2024-02-21 10:40 UTC (permalink / raw)
To: Morten Brørup; +Cc: Thomas Monjalon, andrew.rybchenko, fengchengwen, dev
On Wed, Feb 21, 2024 at 11:38:34AM +0100, Morten Brørup wrote:
> > From: Thomas Monjalon [mailto:thomas@monjalon.net]
> > Sent: Wednesday, 21 February 2024 11.23
> >
> > 20/02/2024 14:49, Morten Brørup:
> > > From: Thomas Monjalon [mailto:thomas@monjalon.net]
> > > >
> > > > How does the test duration change with this patch?
> > >
> > > On my test machine, the expanded test parameter set increased the
> > > duration of one test run from 20 minutes to 100 minutes.
> > > Before the patch, all three test runs were always executed, i.e. a
> > > total duration of 60 minutes.
> > >
> > > In other words:
> > > The expanded test parameter set increased the test run duration by
> > > a factor of five.
> > > Introducing the ability to optionally only test with a specific
> > > number of lcores reduced the total test duration to a third.
> >
> > That's a very long test.
> > It would be interesting to find a way to make it shorter.
>
> I tried looking into this, but I couldn't figure out how to pass parameters to a test, so I added the three variants with shorter tests, as suggested by Chengwen Feng.
>
It's not currently possible, but:
https://patches.dpdk.org/project/dpdk/patch/20231215130656.247582-1-bruce.richardson@intel.com/
/Bruce
* [PATCH v6] mempool: test performance with larger bursts
2024-01-21 4:52 [PATCH] mempool: test performance with larger bursts Morten Brørup
` (4 preceding siblings ...)
2024-01-24 11:21 ` [PATCH v5] " Morten Brørup
@ 2024-02-20 14:01 ` Morten Brørup
2024-03-02 20:04 ` [PATCH v7] " Morten Brørup
` (2 subsequent siblings)
8 siblings, 0 replies; 28+ messages in thread
From: Morten Brørup @ 2024-02-20 14:01 UTC (permalink / raw)
To: andrew.rybchenko, fengchengwen, thomas; +Cc: dev, Morten Brørup
Bursts of up to 64 or 128 packets are not uncommon, so increase the
maximum tested get and put burst sizes from 32 to 128.
For convenience, also test get and put burst sizes of
RTE_MEMPOOL_CACHE_MAX_SIZE.
Some applications keep more than 512 objects, so increase the maximum
number of kept objects from 512 to 32768, still in steps of a factor of four.
This exceeds the typical mempool cache size of 512 objects, so the test
also exercises the mempool driver.
Increased the precision of rate_persec calculation by timing the actual
duration of the test, instead of assuming it took exactly 5 seconds.
Added cache guard to per-lcore stats structure.
Signed-off-by: Morten Brørup <mb@smartsharesystems.com>
Acked-by: Chengwen Feng <fengchengwen@huawei.com>
---
v6:
* Do not test with more lcores than available. (Thomas)
v5:
* Increased N, to reduce measurement overhead with large numbers of kept
objects.
* Increased precision of rate_persec calculation.
* Added missing cache guard to per-lcore stats structure.
v4:
* v3 failed to apply; I had messed up something with git.
* Added ACK from Chengwen Feng.
v3:
* Increased max number of kept objects to 32768.
* Added get and put burst sizes of RTE_MEMPOOL_CACHE_MAX_SIZE objects.
* Print error if unable to allocate mempool.
* Initialize use_external_cache with each test.
A previous version of this patch had a bug, where all test runs
following the first would use external cache. (Chengwen Feng)
v2: Addressed feedback by Chengwen Feng
* Added get and put burst sizes of 64 objects, which is probably also not
an uncommon packet burst size.
* Fixed list of number of kept objects so list remains in jumps of factor
four.
* Added three derivative test cases, for faster testing.
---
app/test/test_mempool_perf.c | 142 +++++++++++++++++++++++------------
1 file changed, 94 insertions(+), 48 deletions(-)
diff --git a/app/test/test_mempool_perf.c b/app/test/test_mempool_perf.c
index 96de347f04..23836af215 100644
--- a/app/test/test_mempool_perf.c
+++ b/app/test/test_mempool_perf.c
@@ -54,22 +54,25 @@
*
* - Bulk size (*n_get_bulk*, *n_put_bulk*)
*
- * - Bulk get from 1 to 32
- * - Bulk put from 1 to 32
- * - Bulk get and put from 1 to 32, compile time constant
+ * - Bulk get from 1 to 128, and RTE_MEMPOOL_CACHE_MAX_SIZE
+ * - Bulk put from 1 to 128, and RTE_MEMPOOL_CACHE_MAX_SIZE
+ * - Bulk get and put from 1 to 128, and RTE_MEMPOOL_CACHE_MAX_SIZE, compile time constant
*
* - Number of kept objects (*n_keep*)
*
* - 32
* - 128
* - 512
+ * - 2048
+ * - 8192
+ * - 32768
*/
-#define N 65536
#define TIME_S 5
#define MEMPOOL_ELT_SIZE 2048
-#define MAX_KEEP 512
-#define MEMPOOL_SIZE ((rte_lcore_count()*(MAX_KEEP+RTE_MEMPOOL_CACHE_MAX_SIZE))-1)
+#define MAX_KEEP 32768
+#define N (128 * MAX_KEEP)
+#define MEMPOOL_SIZE ((rte_lcore_count()*(MAX_KEEP+RTE_MEMPOOL_CACHE_MAX_SIZE*2))-1)
/* Number of pointers fitting into one cache line. */
#define CACHE_LINE_BURST (RTE_CACHE_LINE_SIZE / sizeof(uintptr_t))
@@ -100,9 +103,11 @@ static unsigned n_keep;
/* true if we want to test with constant n_get_bulk and n_put_bulk */
static int use_constant_values;
-/* number of enqueues / dequeues */
+/* number of enqueues / dequeues, and time used */
struct mempool_test_stats {
uint64_t enq_count;
+ uint64_t duration_cycles;
+ RTE_CACHE_GUARD;
} __rte_cache_aligned;
static struct mempool_test_stats stats[RTE_MAX_LCORE];
@@ -185,6 +190,7 @@ per_lcore_mempool_test(void *arg)
GOTO_ERR(ret, out);
stats[lcore_id].enq_count = 0;
+ stats[lcore_id].duration_cycles = 0;
/* wait synchro for workers */
if (lcore_id != rte_get_main_lcore())
@@ -204,6 +210,13 @@ per_lcore_mempool_test(void *arg)
CACHE_LINE_BURST, CACHE_LINE_BURST);
else if (n_get_bulk == 32)
ret = test_loop(mp, cache, n_keep, 32, 32);
+ else if (n_get_bulk == 64)
+ ret = test_loop(mp, cache, n_keep, 64, 64);
+ else if (n_get_bulk == 128)
+ ret = test_loop(mp, cache, n_keep, 128, 128);
+ else if (n_get_bulk == RTE_MEMPOOL_CACHE_MAX_SIZE)
+ ret = test_loop(mp, cache, n_keep,
+ RTE_MEMPOOL_CACHE_MAX_SIZE, RTE_MEMPOOL_CACHE_MAX_SIZE);
else
ret = -1;
@@ -215,6 +228,8 @@ per_lcore_mempool_test(void *arg)
stats[lcore_id].enq_count += N;
}
+ stats[lcore_id].duration_cycles = time_diff;
+
out:
if (use_external_cache) {
rte_mempool_cache_flush(cache, mp);
@@ -232,6 +247,7 @@ launch_cores(struct rte_mempool *mp, unsigned int cores)
uint64_t rate;
int ret;
unsigned cores_save = cores;
+ double hz = rte_get_timer_hz();
__atomic_store_n(&synchro, 0, __ATOMIC_RELAXED);
@@ -278,7 +294,9 @@ launch_cores(struct rte_mempool *mp, unsigned int cores)
rate = 0;
for (lcore_id = 0; lcore_id < RTE_MAX_LCORE; lcore_id++)
- rate += (stats[lcore_id].enq_count / TIME_S);
+ if (stats[lcore_id].duration_cycles != 0)
+ rate += (double)stats[lcore_id].enq_count * hz /
+ (double)stats[lcore_id].duration_cycles;
printf("rate_persec=%" PRIu64 "\n", rate);
@@ -287,11 +305,13 @@ launch_cores(struct rte_mempool *mp, unsigned int cores)
/* for a given number of core, launch all test cases */
static int
-do_one_mempool_test(struct rte_mempool *mp, unsigned int cores)
+do_one_mempool_test(struct rte_mempool *mp, unsigned int cores, int external_cache)
{
- unsigned int bulk_tab_get[] = { 1, 4, CACHE_LINE_BURST, 32, 0 };
- unsigned int bulk_tab_put[] = { 1, 4, CACHE_LINE_BURST, 32, 0 };
- unsigned int keep_tab[] = { 32, 128, 512, 0 };
+ unsigned int bulk_tab_get[] = { 1, 4, CACHE_LINE_BURST, 32, 64, 128,
+ RTE_MEMPOOL_CACHE_MAX_SIZE, 0 };
+ unsigned int bulk_tab_put[] = { 1, 4, CACHE_LINE_BURST, 32, 64, 128,
+ RTE_MEMPOOL_CACHE_MAX_SIZE, 0 };
+ unsigned int keep_tab[] = { 32, 128, 512, 2048, 8192, 32768, 0 };
unsigned *get_bulk_ptr;
unsigned *put_bulk_ptr;
unsigned *keep_ptr;
@@ -301,6 +321,10 @@ do_one_mempool_test(struct rte_mempool *mp, unsigned int cores)
for (put_bulk_ptr = bulk_tab_put; *put_bulk_ptr; put_bulk_ptr++) {
for (keep_ptr = keep_tab; *keep_ptr; keep_ptr++) {
+ if (*keep_ptr < *get_bulk_ptr || *keep_ptr < *put_bulk_ptr)
+ continue;
+
+ use_external_cache = external_cache;
use_constant_values = 0;
n_get_bulk = *get_bulk_ptr;
n_put_bulk = *put_bulk_ptr;
@@ -323,7 +347,7 @@ do_one_mempool_test(struct rte_mempool *mp, unsigned int cores)
}
static int
-test_mempool_perf(void)
+do_all_mempool_perf_tests(unsigned int cores)
{
struct rte_mempool *mp_cache = NULL;
struct rte_mempool *mp_nocache = NULL;
@@ -337,8 +361,10 @@ test_mempool_perf(void)
NULL, NULL,
my_obj_init, NULL,
SOCKET_ID_ANY, 0);
- if (mp_nocache == NULL)
+ if (mp_nocache == NULL) {
+ printf("cannot allocate mempool (without cache)\n");
goto err;
+ }
/* create a mempool (with cache) */
mp_cache = rte_mempool_create("perf_test_cache", MEMPOOL_SIZE,
@@ -347,8 +373,10 @@ test_mempool_perf(void)
NULL, NULL,
my_obj_init, NULL,
SOCKET_ID_ANY, 0);
- if (mp_cache == NULL)
+ if (mp_cache == NULL) {
+ printf("cannot allocate mempool (with cache)\n");
goto err;
+ }
default_pool_ops = rte_mbuf_best_mempool_ops();
/* Create a mempool based on Default handler */
@@ -376,65 +404,83 @@ test_mempool_perf(void)
rte_mempool_obj_iter(default_pool, my_obj_init, NULL);
- /* performance test with 1, 2 and max cores */
printf("start performance test (without cache)\n");
-
- if (do_one_mempool_test(mp_nocache, 1) < 0)
- goto err;
-
- if (do_one_mempool_test(mp_nocache, 2) < 0)
+ if (do_one_mempool_test(mp_nocache, cores, 0) < 0)
goto err;
- if (do_one_mempool_test(mp_nocache, rte_lcore_count()) < 0)
- goto err;
-
- /* performance test with 1, 2 and max cores */
printf("start performance test for %s (without cache)\n",
default_pool_ops);
-
- if (do_one_mempool_test(default_pool, 1) < 0)
+ if (do_one_mempool_test(default_pool, cores, 0) < 0)
goto err;
- if (do_one_mempool_test(default_pool, 2) < 0)
+ printf("start performance test (with cache)\n");
+ if (do_one_mempool_test(mp_cache, cores, 0) < 0)
goto err;
- if (do_one_mempool_test(default_pool, rte_lcore_count()) < 0)
+ printf("start performance test (with user-owned cache)\n");
+ if (do_one_mempool_test(mp_nocache, cores, 1) < 0)
goto err;
- /* performance test with 1, 2 and max cores */
- printf("start performance test (with cache)\n");
+ rte_mempool_list_dump(stdout);
- if (do_one_mempool_test(mp_cache, 1) < 0)
- goto err;
+ ret = 0;
- if (do_one_mempool_test(mp_cache, 2) < 0)
- goto err;
+err:
+ rte_mempool_free(mp_cache);
+ rte_mempool_free(mp_nocache);
+ rte_mempool_free(default_pool);
+ return ret;
+}
- if (do_one_mempool_test(mp_cache, rte_lcore_count()) < 0)
- goto err;
+static int
+test_mempool_perf_1core(void)
+{
+ return do_all_mempool_perf_tests(1);
+}
- /* performance test with 1, 2 and max cores */
- printf("start performance test (with user-owned cache)\n");
- use_external_cache = 1;
+static int
+test_mempool_perf_2cores(void)
+{
+ if (rte_lcore_count() < 2) {
+ printf("not enough lcores\n");
+ return -1;
+ }
+ return do_all_mempool_perf_tests(2);
+}
- if (do_one_mempool_test(mp_nocache, 1) < 0)
- goto err;
+static int
+test_mempool_perf_allcores(void)
+{
+ return do_all_mempool_perf_tests(rte_lcore_count());
+}
+
+static int
+test_mempool_perf(void)
+{
+ int ret = -1;
- if (do_one_mempool_test(mp_nocache, 2) < 0)
+ /* performance test with 1, 2 and max cores */
+ if (do_all_mempool_perf_tests(1) < 0)
goto err;
+ if (rte_lcore_count() == 1)
+ goto done;
- if (do_one_mempool_test(mp_nocache, rte_lcore_count()) < 0)
+ if (do_all_mempool_perf_tests(2) < 0)
goto err;
+ if (rte_lcore_count() == 2)
+ goto done;
- rte_mempool_list_dump(stdout);
+ if (do_all_mempool_perf_tests(rte_lcore_count()) < 0)
+ goto err;
+done:
ret = 0;
err:
- rte_mempool_free(mp_cache);
- rte_mempool_free(mp_nocache);
- rte_mempool_free(default_pool);
return ret;
}
REGISTER_PERF_TEST(mempool_perf_autotest, test_mempool_perf);
+REGISTER_PERF_TEST(mempool_perf_autotest_1core, test_mempool_perf_1core);
+REGISTER_PERF_TEST(mempool_perf_autotest_2cores, test_mempool_perf_2cores);
+REGISTER_PERF_TEST(mempool_perf_autotest_allcores, test_mempool_perf_allcores);
--
2.17.1
* [PATCH v7] mempool: test performance with larger bursts
2024-01-21 4:52 [PATCH] mempool: test performance with larger bursts Morten Brørup
` (5 preceding siblings ...)
2024-02-20 14:01 ` [PATCH v6] " Morten Brørup
@ 2024-03-02 20:04 ` Morten Brørup
2024-04-04 9:26 ` Morten Brørup
2024-09-16 15:37 ` [PATCH v8] " Morten Brørup
2024-09-17 8:10 ` [PATCH v9] " Morten Brørup
8 siblings, 1 reply; 28+ messages in thread
From: Morten Brørup @ 2024-03-02 20:04 UTC (permalink / raw)
To: andrew.rybchenko, fengchengwen, thomas, honnappa.nagarahalli
Cc: dev, Morten Brørup
Bursts of up to 64, 128 and 256 packets are not uncommon, so increase the
maximum tested get and put burst sizes from 32 to 256.
For convenience, also test get and put burst sizes of
RTE_MEMPOOL_CACHE_MAX_SIZE.
Some applications keep more than 512 objects, so increase the maximum
number of kept objects from 512 to 32768, still in steps of a factor of four.
This exceeds the typical mempool cache size of 512 objects, so the test
also exercises the mempool driver.
Increased the precision of rate_persec calculation by timing the actual
duration of the test, instead of assuming it took exactly 5 seconds.
Added cache guard to per-lcore stats structure.
Signed-off-by: Morten Brørup <mb@smartsharesystems.com>
Acked-by: Chengwen Feng <fengchengwen@huawei.com>
---
v7:
* Increase max burst size to 256. (Inspired by Honnappa)
v6:
* Do not test with more lcores than available. (Thomas)
v5:
* Increased N, to reduce measurement overhead with large numbers of kept
objects.
* Increased precision of rate_persec calculation.
* Added missing cache guard to per-lcore stats structure.
v4:
* v3 failed to apply; I had messed up something with git.
* Added ACK from Chengwen Feng.
v3:
* Increased max number of kept objects to 32768.
* Added get and put burst sizes of RTE_MEMPOOL_CACHE_MAX_SIZE objects.
* Print error if unable to allocate mempool.
* Initialize use_external_cache with each test.
A previous version of this patch had a bug, where all test runs
following the first would use external cache. (Chengwen Feng)
v2: Addressed feedback by Chengwen Feng
* Added get and put burst sizes of 64 objects, which is probably also not
an uncommon packet burst size.
* Fixed list of number of kept objects so list remains in jumps of factor
four.
* Added three derivative test cases, for faster testing.
---
app/test/test_mempool_perf.c | 144 +++++++++++++++++++++++------------
1 file changed, 96 insertions(+), 48 deletions(-)
diff --git a/app/test/test_mempool_perf.c b/app/test/test_mempool_perf.c
index 96de347f04..bb40d1d911 100644
--- a/app/test/test_mempool_perf.c
+++ b/app/test/test_mempool_perf.c
@@ -54,22 +54,25 @@
*
* - Bulk size (*n_get_bulk*, *n_put_bulk*)
*
- * - Bulk get from 1 to 32
- * - Bulk put from 1 to 32
- * - Bulk get and put from 1 to 32, compile time constant
+ * - Bulk get from 1 to 256, and RTE_MEMPOOL_CACHE_MAX_SIZE
+ * - Bulk put from 1 to 256, and RTE_MEMPOOL_CACHE_MAX_SIZE
+ * - Bulk get and put from 1 to 256, and RTE_MEMPOOL_CACHE_MAX_SIZE, compile time constant
*
* - Number of kept objects (*n_keep*)
*
* - 32
* - 128
* - 512
+ * - 2048
+ * - 8192
+ * - 32768
*/
-#define N 65536
#define TIME_S 5
#define MEMPOOL_ELT_SIZE 2048
-#define MAX_KEEP 512
-#define MEMPOOL_SIZE ((rte_lcore_count()*(MAX_KEEP+RTE_MEMPOOL_CACHE_MAX_SIZE))-1)
+#define MAX_KEEP 32768
+#define N (128 * MAX_KEEP)
+#define MEMPOOL_SIZE ((rte_lcore_count()*(MAX_KEEP+RTE_MEMPOOL_CACHE_MAX_SIZE*2))-1)
/* Number of pointers fitting into one cache line. */
#define CACHE_LINE_BURST (RTE_CACHE_LINE_SIZE / sizeof(uintptr_t))
@@ -100,9 +103,11 @@ static unsigned n_keep;
/* true if we want to test with constant n_get_bulk and n_put_bulk */
static int use_constant_values;
-/* number of enqueues / dequeues */
+/* number of enqueues / dequeues, and time used */
struct mempool_test_stats {
uint64_t enq_count;
+ uint64_t duration_cycles;
+ RTE_CACHE_GUARD;
} __rte_cache_aligned;
static struct mempool_test_stats stats[RTE_MAX_LCORE];
@@ -185,6 +190,7 @@ per_lcore_mempool_test(void *arg)
GOTO_ERR(ret, out);
stats[lcore_id].enq_count = 0;
+ stats[lcore_id].duration_cycles = 0;
/* wait synchro for workers */
if (lcore_id != rte_get_main_lcore())
@@ -204,6 +210,15 @@ per_lcore_mempool_test(void *arg)
CACHE_LINE_BURST, CACHE_LINE_BURST);
else if (n_get_bulk == 32)
ret = test_loop(mp, cache, n_keep, 32, 32);
+ else if (n_get_bulk == 64)
+ ret = test_loop(mp, cache, n_keep, 64, 64);
+ else if (n_get_bulk == 128)
+ ret = test_loop(mp, cache, n_keep, 128, 128);
+ else if (n_get_bulk == 256)
+ ret = test_loop(mp, cache, n_keep, 256, 256);
+ else if (n_get_bulk == RTE_MEMPOOL_CACHE_MAX_SIZE)
+ ret = test_loop(mp, cache, n_keep,
+ RTE_MEMPOOL_CACHE_MAX_SIZE, RTE_MEMPOOL_CACHE_MAX_SIZE);
else
ret = -1;
@@ -215,6 +230,8 @@ per_lcore_mempool_test(void *arg)
stats[lcore_id].enq_count += N;
}
+ stats[lcore_id].duration_cycles = time_diff;
+
out:
if (use_external_cache) {
rte_mempool_cache_flush(cache, mp);
@@ -232,6 +249,7 @@ launch_cores(struct rte_mempool *mp, unsigned int cores)
uint64_t rate;
int ret;
unsigned cores_save = cores;
+ double hz = rte_get_timer_hz();
__atomic_store_n(&synchro, 0, __ATOMIC_RELAXED);
@@ -278,7 +296,9 @@ launch_cores(struct rte_mempool *mp, unsigned int cores)
rate = 0;
for (lcore_id = 0; lcore_id < RTE_MAX_LCORE; lcore_id++)
- rate += (stats[lcore_id].enq_count / TIME_S);
+ if (stats[lcore_id].duration_cycles != 0)
+ rate += (double)stats[lcore_id].enq_count * hz /
+ (double)stats[lcore_id].duration_cycles;
printf("rate_persec=%" PRIu64 "\n", rate);
@@ -287,11 +307,13 @@ launch_cores(struct rte_mempool *mp, unsigned int cores)
/* for a given number of core, launch all test cases */
static int
-do_one_mempool_test(struct rte_mempool *mp, unsigned int cores)
+do_one_mempool_test(struct rte_mempool *mp, unsigned int cores, int external_cache)
{
- unsigned int bulk_tab_get[] = { 1, 4, CACHE_LINE_BURST, 32, 0 };
- unsigned int bulk_tab_put[] = { 1, 4, CACHE_LINE_BURST, 32, 0 };
- unsigned int keep_tab[] = { 32, 128, 512, 0 };
+ unsigned int bulk_tab_get[] = { 1, 4, CACHE_LINE_BURST, 32, 64, 128, 256,
+ RTE_MEMPOOL_CACHE_MAX_SIZE, 0 };
+ unsigned int bulk_tab_put[] = { 1, 4, CACHE_LINE_BURST, 32, 64, 128, 256,
+ RTE_MEMPOOL_CACHE_MAX_SIZE, 0 };
+ unsigned int keep_tab[] = { 32, 128, 512, 2048, 8192, 32768, 0 };
unsigned *get_bulk_ptr;
unsigned *put_bulk_ptr;
unsigned *keep_ptr;
@@ -301,6 +323,10 @@ do_one_mempool_test(struct rte_mempool *mp, unsigned int cores)
for (put_bulk_ptr = bulk_tab_put; *put_bulk_ptr; put_bulk_ptr++) {
for (keep_ptr = keep_tab; *keep_ptr; keep_ptr++) {
+ if (*keep_ptr < *get_bulk_ptr || *keep_ptr < *put_bulk_ptr)
+ continue;
+
+ use_external_cache = external_cache;
use_constant_values = 0;
n_get_bulk = *get_bulk_ptr;
n_put_bulk = *put_bulk_ptr;
@@ -323,7 +349,7 @@ do_one_mempool_test(struct rte_mempool *mp, unsigned int cores)
}
static int
-test_mempool_perf(void)
+do_all_mempool_perf_tests(unsigned int cores)
{
struct rte_mempool *mp_cache = NULL;
struct rte_mempool *mp_nocache = NULL;
@@ -337,8 +363,10 @@ test_mempool_perf(void)
NULL, NULL,
my_obj_init, NULL,
SOCKET_ID_ANY, 0);
- if (mp_nocache == NULL)
+ if (mp_nocache == NULL) {
+ printf("cannot allocate mempool (without cache)\n");
goto err;
+ }
/* create a mempool (with cache) */
mp_cache = rte_mempool_create("perf_test_cache", MEMPOOL_SIZE,
@@ -347,8 +375,10 @@ test_mempool_perf(void)
NULL, NULL,
my_obj_init, NULL,
SOCKET_ID_ANY, 0);
- if (mp_cache == NULL)
+ if (mp_cache == NULL) {
+ printf("cannot allocate mempool (with cache)\n");
goto err;
+ }
default_pool_ops = rte_mbuf_best_mempool_ops();
/* Create a mempool based on Default handler */
@@ -376,65 +406,83 @@ test_mempool_perf(void)
rte_mempool_obj_iter(default_pool, my_obj_init, NULL);
- /* performance test with 1, 2 and max cores */
printf("start performance test (without cache)\n");
-
- if (do_one_mempool_test(mp_nocache, 1) < 0)
- goto err;
-
- if (do_one_mempool_test(mp_nocache, 2) < 0)
+ if (do_one_mempool_test(mp_nocache, cores, 0) < 0)
goto err;
- if (do_one_mempool_test(mp_nocache, rte_lcore_count()) < 0)
- goto err;
-
- /* performance test with 1, 2 and max cores */
printf("start performance test for %s (without cache)\n",
default_pool_ops);
-
- if (do_one_mempool_test(default_pool, 1) < 0)
+ if (do_one_mempool_test(default_pool, cores, 0) < 0)
goto err;
- if (do_one_mempool_test(default_pool, 2) < 0)
+ printf("start performance test (with cache)\n");
+ if (do_one_mempool_test(mp_cache, cores, 0) < 0)
goto err;
- if (do_one_mempool_test(default_pool, rte_lcore_count()) < 0)
+ printf("start performance test (with user-owned cache)\n");
+ if (do_one_mempool_test(mp_nocache, cores, 1) < 0)
goto err;
- /* performance test with 1, 2 and max cores */
- printf("start performance test (with cache)\n");
+ rte_mempool_list_dump(stdout);
- if (do_one_mempool_test(mp_cache, 1) < 0)
- goto err;
+ ret = 0;
- if (do_one_mempool_test(mp_cache, 2) < 0)
- goto err;
+err:
+ rte_mempool_free(mp_cache);
+ rte_mempool_free(mp_nocache);
+ rte_mempool_free(default_pool);
+ return ret;
+}
- if (do_one_mempool_test(mp_cache, rte_lcore_count()) < 0)
- goto err;
+static int
+test_mempool_perf_1core(void)
+{
+ return do_all_mempool_perf_tests(1);
+}
- /* performance test with 1, 2 and max cores */
- printf("start performance test (with user-owned cache)\n");
- use_external_cache = 1;
+static int
+test_mempool_perf_2cores(void)
+{
+ if (rte_lcore_count() < 2) {
+ printf("not enough lcores\n");
+ return -1;
+ }
+ return do_all_mempool_perf_tests(2);
+}
- if (do_one_mempool_test(mp_nocache, 1) < 0)
- goto err;
+static int
+test_mempool_perf_allcores(void)
+{
+ return do_all_mempool_perf_tests(rte_lcore_count());
+}
+
+static int
+test_mempool_perf(void)
+{
+ int ret = -1;
- if (do_one_mempool_test(mp_nocache, 2) < 0)
+ /* performance test with 1, 2 and max cores */
+ if (do_all_mempool_perf_tests(1) < 0)
goto err;
+ if (rte_lcore_count() == 1)
+ goto done;
- if (do_one_mempool_test(mp_nocache, rte_lcore_count()) < 0)
+ if (do_all_mempool_perf_tests(2) < 0)
goto err;
+ if (rte_lcore_count() == 2)
+ goto done;
- rte_mempool_list_dump(stdout);
+ if (do_all_mempool_perf_tests(rte_lcore_count()) < 0)
+ goto err;
+done:
ret = 0;
err:
- rte_mempool_free(mp_cache);
- rte_mempool_free(mp_nocache);
- rte_mempool_free(default_pool);
return ret;
}
REGISTER_PERF_TEST(mempool_perf_autotest, test_mempool_perf);
+REGISTER_PERF_TEST(mempool_perf_autotest_1core, test_mempool_perf_1core);
+REGISTER_PERF_TEST(mempool_perf_autotest_2cores, test_mempool_perf_2cores);
+REGISTER_PERF_TEST(mempool_perf_autotest_allcores, test_mempool_perf_allcores);
--
2.17.1
^ permalink raw reply [flat|nested] 28+ messages in thread
* RE: [PATCH v7] mempool: test performance with larger bursts
2024-03-02 20:04 ` [PATCH v7] " Morten Brørup
@ 2024-04-04 9:26 ` Morten Brørup
2024-06-10 8:56 ` Morten Brørup
0 siblings, 1 reply; 28+ messages in thread
From: Morten Brørup @ 2024-04-04 9:26 UTC (permalink / raw)
To: andrew.rybchenko, honnappa.nagarahalli, thomas; +Cc: dev, fengchengwen
PING for review. This patch is relatively trivial.
^ permalink raw reply [flat|nested] 28+ messages in thread
* RE: [PATCH v7] mempool: test performance with larger bursts
2024-04-04 9:26 ` Morten Brørup
@ 2024-06-10 8:56 ` Morten Brørup
2024-06-18 13:21 ` Bruce Richardson
0 siblings, 1 reply; 28+ messages in thread
From: Morten Brørup @ 2024-06-10 8:56 UTC (permalink / raw)
To: honnappa.nagarahalli, thomas, andrew.rybchenko; +Cc: dev, fengchengwen
PING (again) for review.
Many applications use bursts of more than 32 packets,
and some applications buffer more than 512 packets.
This patch updates the mempool perf test accordingly.
-Morten
* Re: [PATCH v7] mempool: test performance with larger bursts
2024-06-10 8:56 ` Morten Brørup
@ 2024-06-18 13:21 ` Bruce Richardson
2024-06-18 13:48 ` Morten Brørup
0 siblings, 1 reply; 28+ messages in thread
From: Bruce Richardson @ 2024-06-18 13:21 UTC (permalink / raw)
To: Morten Brørup
Cc: honnappa.nagarahalli, thomas, andrew.rybchenko, dev, fengchengwen
On Mon, Jun 10, 2024 at 10:56:00AM +0200, Morten Brørup wrote:
> PING (again) for review.
>
> Many applications use bursts of more than 32 packets,
> and some applications buffer more than 512 packets.
>
> This patch updates the mempool perf test accordingly.
>
> -Morten
>
> > From: Morten Brørup [mailto:mb@smartsharesystems.com]
> > Sent: Thursday, 4 April 2024 11.27
> >
> > PING for review. This patch is relatively trivial.
> >
> > > From: Morten Brørup [mailto:mb@smartsharesystems.com]
> > > Sent: Saturday, 2 March 2024 21.04
> > >
> > > Bursts of up to 64, 128 and 256 packets are not uncommon, so increase the
> > > maximum tested get and put burst sizes from 32 to 256.
> > > For convenience, also test get and put burst sizes of
> > > RTE_MEMPOOL_CACHE_MAX_SIZE.
> > >
> > > Some applications keep more than 512 objects, so increase the maximum
> > > number of kept objects from 512 to 32768, still in jumps of factor four.
> > > This exceeds the typical mempool cache size of 512 objects, so the test
> > > also exercises the mempool driver.
> > >
> > > Increased the precision of rate_persec calculation by timing the actual
> > > duration of the test, instead of assuming it took exactly 5 seconds.
> > >
> > > Added cache guard to per-lcore stats structure.
> > >
> > > Signed-off-by: Morten Brørup <mb@smartsharesystems.com>
> > > Acked-by: Chengwen Feng <fengchengwen@huawei.com>
> > > ---
> > >
> > > v7:
> > > * Increase max burst size to 256. (Inspired by Honnappa)
> > > v6:
> > > * Do not test with more lcores than available. (Thomas)
> > > v5:
> > > * Increased N, to reduce measurement overhead with large numbers of kept
> > > objects.
> > > * Increased precision of rate_persec calculation.
> > > * Added missing cache guard to per-lcore stats structure.
This looks ok to me. However, the test itself takes a very long time to
run, with 5 seconds per iteration. One suggestion I have is to reduce the
5 seconds to 1 second - given we are looking at millions of iterations each
time, the difference in results should not be that great, I'd hope. A very
quick test of the delta on my end indicates a variance of just a couple of
percent in the first couple of results.
With or without this suggestion.
Acked-by: Bruce Richardson <bruce.richardson@intel.com>
* RE: [PATCH v7] mempool: test performance with larger bursts
2024-06-18 13:21 ` Bruce Richardson
@ 2024-06-18 13:48 ` Morten Brørup
2024-09-13 14:58 ` Morten Brørup
0 siblings, 1 reply; 28+ messages in thread
From: Morten Brørup @ 2024-06-18 13:48 UTC (permalink / raw)
To: Bruce Richardson
Cc: honnappa.nagarahalli, thomas, andrew.rybchenko, dev, fengchengwen
> From: Bruce Richardson [mailto:bruce.richardson@intel.com]
>
> On Mon, Jun 10, 2024 at 10:56:00AM +0200, Morten Brørup wrote:
> > PING (again) for review.
> >
> > Many applications use bursts of more than 32 packets,
> > and some applications buffer more than 512 packets.
> >
> > This patch updates the mempool perf test accordingly.
> >
> > -Morten
> >
> > > From: Morten Brørup [mailto:mb@smartsharesystems.com]
> > > Sent: Thursday, 4 April 2024 11.27
> > >
> > > PING for review. This patch is relatively trivial.
> > >
> > > > From: Morten Brørup [mailto:mb@smartsharesystems.com]
> > > > Sent: Saturday, 2 March 2024 21.04
> > > >
> > > > Bursts of up to 64, 128 and 256 packets are not uncommon, so increase the
> > > > maximum tested get and put burst sizes from 32 to 256.
> > > > For convenience, also test get and put burst sizes of
> > > > RTE_MEMPOOL_CACHE_MAX_SIZE.
> > > >
> > > > Some applications keep more than 512 objects, so increase the maximum
> > > > number of kept objects from 512 to 32768, still in jumps of factor four.
> > > > This exceeds the typical mempool cache size of 512 objects, so the test
> > > > also exercises the mempool driver.
> > > >
> > > > Increased the precision of rate_persec calculation by timing the actual
> > > > duration of the test, instead of assuming it took exactly 5 seconds.
> > > >
> > > > Added cache guard to per-lcore stats structure.
> > > >
> > > > Signed-off-by: Morten Brørup <mb@smartsharesystems.com>
> > > > Acked-by: Chengwen Feng <fengchengwen@huawei.com>
> > > > ---
> > > >
> > > > v7:
> > > > * Increase max burst size to 256. (Inspired by Honnappa)
> > > > v6:
> > > > * Do not test with more lcores than available. (Thomas)
> > > > v5:
> > > > * Increased N, to reduce measurement overhead with large numbers of kept
> > > > objects.
> > > > * Increased precision of rate_persec calculation.
> > > > * Added missing cache guard to per-lcore stats structure.
>
> This looks ok to me. However, the test itself takes a very long time to
> run, with 5 seconds per iteration. One suggest I have is to reduce the
> 5-seconds to 1-second - given we are looking at millions of iterations each
> time, the difference in results should not be that great, I'd hope.
The test duration annoys me too.
Reducing the duration of each iteration would make the test more sensitive to short spikes of noise, e.g. from noisy neighbors in virtual environments.
Someone once decided that 5 seconds was a good duration, and I didn't want to challenge that.
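To put a rough number on that concern, here is a minimal sketch (not part of the patch) of how much a single stall would depress the measured rate for a given iteration duration. The helper names and the 50 ms stall used below are hypothetical, chosen only for illustration; the sketch assumes the lcore does no useful work while stalled and that the stall falls entirely inside one measured iteration.

```c
#include <stdio.h>

/*
 * Fraction of one test iteration lost to a single stall, assuming the
 * lcore does no useful work while stalled. Both arguments are in seconds.
 * Hypothetical helper for illustration; not part of test_mempool_perf.c.
 */
double
stall_fraction(double stall_s, double iteration_s)
{
	if (iteration_s <= 0.0 || stall_s <= 0.0)
		return 0.0;
	if (stall_s >= iteration_s)
		return 1.0;
	return stall_s / iteration_s;
}

/* Print the expected rate drop for the two durations discussed here. */
void
print_stall_impact(double stall_s)
{
	printf("5 s iteration: %.1f%% lower rate\n",
	       100.0 * stall_fraction(stall_s, 5.0));
	printf("1 s iteration: %.1f%% lower rate\n",
	       100.0 * stall_fraction(stall_s, 1.0));
}
```

Under these assumptions, a 50 ms stall costs 1% of a 5-second iteration but 5% of a 1-second iteration, which is the sensitivity trade-off in question.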
I also considered reducing the array of tested burst sizes, by jumping in factors of four here too; but I assume that 32, 64, 128 and 256 are all popular max burst sizes in applications, so I decided to keep them all, instead of omitting 32 and 128 and only keeping 64 and 256 to represent full bursts.
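For a sense of what each option costs in run time, the following sketch counts the get/put/keep combinations that the nested loops in do_one_mempool_test() would run, applying the patch's rule that skips combinations where the number of kept objects is smaller than either burst size. It assumes CACHE_LINE_BURST is 8 (64-byte cache line, 8-byte pointers) and RTE_MEMPOOL_CACHE_MAX_SIZE is 512, and it ignores any additional replay runs with constant values. count_combinations() and the table names are hypothetical, not part of the test.

```c
#include <stddef.h>
#include <stdio.h>

/*
 * Count the (get, put, keep) combinations run by the nested loops,
 * skipping those where the number of kept objects is smaller than
 * either burst size. Hypothetical helper, for illustration only.
 */
unsigned int
count_combinations(const unsigned int *bulk, size_t n_bulk,
		const unsigned int *keep, size_t n_keep)
{
	unsigned int count = 0;
	size_t g, p, k;

	for (g = 0; g < n_bulk; g++)
		for (p = 0; p < n_bulk; p++)
			for (k = 0; k < n_keep; k++)
				if (keep[k] >= bulk[g] && keep[k] >= bulk[p])
					count++;
	return count;
}

/* Assumed values: CACHE_LINE_BURST = 8, RTE_MEMPOOL_CACHE_MAX_SIZE = 512. */
const unsigned int bulk_all[] = { 1, 4, 8, 32, 64, 128, 256, 512 };
/* Reduced table: only 64 and 256 kept to represent full bursts. */
const unsigned int bulk_reduced[] = { 1, 4, 8, 64, 256, 512 };
const unsigned int keep_all[] = { 32, 128, 512, 2048, 8192, 32768 };

/* Print the combination count for both burst-size tables. */
void
print_combination_counts(void)
{
	printf("all burst sizes: %u combinations\n",
	       count_combinations(bulk_all, 8, keep_all, 6));
	printf("reduced burst sizes: %u combinations\n",
	       count_combinations(bulk_reduced, 6, keep_all, 6));
}
```

Under these assumptions the full tables give 308 combinations and the reduced tables 169, per mempool variant and per core count. At 5 seconds each that is roughly 1540 versus 845 seconds.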
> A very
> quick test of the delta on my end indicates variance in the first couple of
> results of a couple of %, just.
Thanks for the review and suggestions, though.
>
> With or without this suggestion.
>
> Acked-by: Bruce Richardson <bruce.richardson@intel.com>
* RE: [PATCH v7] mempool: test performance with larger bursts
2024-06-18 13:48 ` Morten Brørup
@ 2024-09-13 14:58 ` Morten Brørup
2024-09-16 12:40 ` Thomas Monjalon
0 siblings, 1 reply; 28+ messages in thread
From: Morten Brørup @ 2024-09-13 14:58 UTC (permalink / raw)
To: thomas
Cc: honnappa.nagarahalli, andrew.rybchenko, dev, fengchengwen,
Bruce Richardson
PING for apply.
Patch has 2 acks.
And since it was signed off by a co-maintainer (myself), I don't think an ack from the other co-maintainer (Andrew) is required. Please correct me if I'm wrong?
-Morten
> -----Original Message-----
> From: Morten Brørup [mailto:mb@smartsharesystems.com]
> Sent: Tuesday, 18 June 2024 15.48
> To: Bruce Richardson
> Cc: honnappa.nagarahalli@arm.com; thomas@monjalon.net;
> andrew.rybchenko@oktetlabs.ru; dev@dpdk.org; fengchengwen@huawei.com
> Subject: RE: [PATCH v7] mempool: test performance with larger bursts
>
> > From: Bruce Richardson [mailto:bruce.richardson@intel.com]
> >
> > On Mon, Jun 10, 2024 at 10:56:00AM +0200, Morten Brørup wrote:
> > > PING (again) for review.
> > >
> > > Many applications use bursts of more than 32 packets,
> > > and some applications buffer more than 512 packets.
> > >
> > > This patch updates the mempool perf test accordingly.
> > >
> > > -Morten
> > >
> > > > From: Morten Brørup [mailto:mb@smartsharesystems.com]
> > > > Sent: Thursday, 4 April 2024 11.27
> > > >
> > > > PING for review. This patch is relatively trivial.
> > > >
> > > > > From: Morten Brørup [mailto:mb@smartsharesystems.com]
> > > > > Sent: Saturday, 2 March 2024 21.04
> > > > >
> > > > > Bursts of up to 64, 128 and 256 packets are not uncommon, so increase the
> > > > > maximum tested get and put burst sizes from 32 to 256.
> > > > > For convenience, also test get and put burst sizes of
> > > > > RTE_MEMPOOL_CACHE_MAX_SIZE.
> > > > >
> > > > > Some applications keep more than 512 objects, so increase the maximum
> > > > > number of kept objects from 512 to 32768, still in jumps of factor four.
> > > > > This exceeds the typical mempool cache size of 512 objects, so the test
> > > > > also exercises the mempool driver.
> > > > >
> > > > > Increased the precision of rate_persec calculation by timing the actual
> > > > > duration of the test, instead of assuming it took exactly 5 seconds.
> > > > >
> > > > > Added cache guard to per-lcore stats structure.
> > > > >
> > > > > Signed-off-by: Morten Brørup <mb@smartsharesystems.com>
> > > > > Acked-by: Chengwen Feng <fengchengwen@huawei.com>
> > > > > ---
> > > > >
> > > > > v7:
> > > > > * Increase max burst size to 256. (Inspired by Honnappa)
> > > > > v6:
> > > > > * Do not test with more lcores than available. (Thomas)
> > > > > v5:
> > > > > * Increased N, to reduce measurement overhead with large numbers of kept
> > > > > objects.
> > > > > * Increased precision of rate_persec calculation.
> > > > > * Added missing cache guard to per-lcore stats structure.
> >
> > This looks ok to me. However, the test itself takes a very long time to
> > run, with 5 seconds per iteration. One suggest I have is to reduce the
> > 5-seconds to 1-second - given we are looking at millions of iterations each
> > time, the difference in results should not be that great, I'd hope.
>
> The test duration annoys me too.
>
> Reducing the duration of each iteration would make the test more sensitive to
> short spikes of noise, e.g. from noisy neighbors in virtual environments.
> Someone once decided that 5 seconds was a good duration, and I didn't want to
> challenge that.
>
> I also considered reducing the array of tested burst sizes, by jumping factor
> four here too; but I assume that both 32, 64, 128 and 256 are popular max
> burst sizes in applications, so I decided to keep them all, instead of
> omitting 32 and 128 and only keeping 64 and 256 to represent full bursts.
>
> > A very
> > quick test of the delta on my end indicates variance in the first couple of
> > results of a couple of %, just.
>
> Thanks for the review and suggestions, though.
>
> >
> > With or without this suggestion.
> >
> > Acked-by: Bruce Richardson <bruce.richardson@intel.com>
* Re: [PATCH v7] mempool: test performance with larger bursts
2024-09-13 14:58 ` Morten Brørup
@ 2024-09-16 12:40 ` Thomas Monjalon
2024-09-16 13:08 ` Morten Brørup
0 siblings, 1 reply; 28+ messages in thread
From: Thomas Monjalon @ 2024-09-16 12:40 UTC (permalink / raw)
To: Morten Brørup
Cc: honnappa.nagarahalli, andrew.rybchenko, dev, fengchengwen,
Bruce Richardson
13/09/2024 16:58, Morten Brørup:
> PING for apply.
>
> Patch has 2 acks.
> And since it was signed off by a co-maintainer (myself),
> I don't think an ack from the other co-maintainer (Andrew) is required.
> Please correct me if I'm wrong?
It's not a matter of acks.
I feel we should reduce from 5 seconds to 1 second as part of this patch.
But seeing there are no more comments, I suppose I should apply this version.
> From: Morten Brørup [mailto:mb@smartsharesystems.com]
> > From: Bruce Richardson [mailto:bruce.richardson@intel.com]
> > > On Mon, Jun 10, 2024 at 10:56:00AM +0200, Morten Brørup wrote:
> > > > PING (again) for review.
> > > >
> > > > Many applications use bursts of more than 32 packets,
> > > > and some applications buffer more than 512 packets.
> > > >
> > > > This patch updates the mempool perf test accordingly.
> > > >
> > > > > From: Morten Brørup [mailto:mb@smartsharesystems.com]
> > > > > Sent: Thursday, 4 April 2024 11.27
> > > > >
> > > > > PING for review. This patch is relatively trivial.
> > > > >
> > > > > > From: Morten Brørup [mailto:mb@smartsharesystems.com]
> > > > > > Sent: Saturday, 2 March 2024 21.04
> > > > > >
> > > > > > Bursts of up to 64, 128 and 256 packets are not uncommon, so increase the
> > > > > > maximum tested get and put burst sizes from 32 to 256.
> > > > > > For convenience, also test get and put burst sizes of
> > > > > > RTE_MEMPOOL_CACHE_MAX_SIZE.
> > > > > >
> > > > > > Some applications keep more than 512 objects, so increase the maximum
> > > > > > number of kept objects from 512 to 32768, still in jumps of factor four.
> > > > > > This exceeds the typical mempool cache size of 512 objects, so the test
> > > > > > also exercises the mempool driver.
> > > > > >
> > > > > > Increased the precision of rate_persec calculation by timing the actual
> > > > > > duration of the test, instead of assuming it took exactly 5 seconds.
> > > > > >
> > > > > > Added cache guard to per-lcore stats structure.
> > > > > >
> > > > > > Signed-off-by: Morten Brørup <mb@smartsharesystems.com>
> > > > > > Acked-by: Chengwen Feng <fengchengwen@huawei.com>
> > >
> > > This looks ok to me. However, the test itself takes a very long time to
> > > run, with 5 seconds per iteration. One suggest I have is to reduce the
> > > 5-seconds to 1-second - given we are looking at millions of iterations each
> > > time, the difference in results should not be that great, I'd hope.
> >
> > The test duration annoys me too.
> >
> > Reducing the duration of each iteration would make the test more sensitive to
> > short spikes of noise, e.g. from noisy neighbors in virtual environments.
> > Someone once decided that 5 seconds was a good duration, and I didn't want to
> > challenge that.
> >
> > I also considered reducing the array of tested burst sizes, by jumping factor
> > four here too; but I assume that both 32, 64, 128 and 256 are popular max
> > burst sizes in applications, so I decided to keep them all, instead of
> > omitting 32 and 128 and only keeping 64 and 256 to represent full bursts.
> >
> > > A very
> > > quick test of the delta on my end indicates variance in the first couple of
> > > results of a couple of %, just.
> >
> > Thanks for the review and suggestions, though.
> >
> > >
> > > With or without this suggestion.
> > >
> > > Acked-by: Bruce Richardson <bruce.richardson@intel.com>
* RE: [PATCH v7] mempool: test performance with larger bursts
2024-09-16 12:40 ` Thomas Monjalon
@ 2024-09-16 13:08 ` Morten Brørup
2024-09-16 14:04 ` Thomas Monjalon
0 siblings, 1 reply; 28+ messages in thread
From: Morten Brørup @ 2024-09-16 13:08 UTC (permalink / raw)
To: Thomas Monjalon
Cc: honnappa.nagarahalli, andrew.rybchenko, dev, fengchengwen,
Bruce Richardson
> From: Thomas Monjalon [mailto:thomas@monjalon.net]
> Sent: Monday, 16 September 2024 14.41
>
> 13/09/2024 16:58, Morten Brørup:
> > PING for apply.
> >
> > Patch has 2 acks.
> > And since it was signed off by a co-maintainer (myself),
> > I don't think an ack from the other co-maintainer (Andrew) is required.
> > Please correct me if I'm wrong?
>
>
> It's not a matter of acks.
>
> I feel we should reduce from 5 seconds to 1 second as part of this patch.
As mentioned below, I was worried about reducing the test duration.
If you think the test will be accurate enough, I can easily reduce it to 1; you're not the only person annoyed by the long test duration.
> But seeing there is no more comments, I suppose I should apply this version.
Should I send v8 with reduced test duration from 5 to 1 second?
>
>
>
> > From: Morten Brørup [mailto:mb@smartsharesystems.com]
> > > From: Bruce Richardson [mailto:bruce.richardson@intel.com]
> > > > On Mon, Jun 10, 2024 at 10:56:00AM +0200, Morten Brørup wrote:
> > > > > PING (again) for review.
> > > > >
> > > > > Many applications use bursts of more than 32 packets,
> > > > > and some applications buffer more than 512 packets.
> > > > >
> > > > > This patch updates the mempool perf test accordingly.
> > > > >
> > > > > > From: Morten Brørup [mailto:mb@smartsharesystems.com]
> > > > > > Sent: Thursday, 4 April 2024 11.27
> > > > > >
> > > > > > PING for review. This patch is relatively trivial.
> > > > > >
> > > > > > > From: Morten Brørup [mailto:mb@smartsharesystems.com]
> > > > > > > Sent: Saturday, 2 March 2024 21.04
> > > > > > >
> > > > > > > Bursts of up to 64, 128 and 256 packets are not uncommon, so increase the
> > > > > > > maximum tested get and put burst sizes from 32 to 256.
> > > > > > > For convenience, also test get and put burst sizes of
> > > > > > > RTE_MEMPOOL_CACHE_MAX_SIZE.
> > > > > > >
> > > > > > > Some applications keep more than 512 objects, so increase the maximum
> > > > > > > number of kept objects from 512 to 32768, still in jumps of factor four.
> > > > > > > This exceeds the typical mempool cache size of 512 objects, so the test
> > > > > > > also exercises the mempool driver.
> > > > > > >
> > > > > > > Increased the precision of rate_persec calculation by timing the actual
> > > > > > > duration of the test, instead of assuming it took exactly 5 seconds.
> > > > > > >
> > > > > > > Added cache guard to per-lcore stats structure.
> > > > > > >
> > > > > > > Signed-off-by: Morten Brørup <mb@smartsharesystems.com>
> > > > > > > Acked-by: Chengwen Feng <fengchengwen@huawei.com>
> > > >
> > > > This looks ok to me. However, the test itself takes a very long time to
> > > > run, with 5 seconds per iteration. One suggest I have is to reduce the
> > > > 5-seconds to 1-second - given we are looking at millions of iterations each
> > > > time, the difference in results should not be that great, I'd hope.
> > >
> > > The test duration annoys me too.
> > >
> > > Reducing the duration of each iteration would make the test more sensitive to
> > > short spikes of noise, e.g. from noisy neighbors in virtual environments.
> > > Someone once decided that 5 seconds was a good duration, and I didn't want to
> > > challenge that.
> > >
> > > I also considered reducing the array of tested burst sizes, by jumping factor
> > > four here too; but I assume that both 32, 64, 128 and 256 are popular max
> > > burst sizes in applications, so I decided to keep them all, instead of
> > > omitting 32 and 128 and only keeping 64 and 256 to represent full bursts.
> > >
> > > > A very
> > > > quick test of the delta on my end indicates variance in the first couple of
> > > > results of a couple of %, just.
> > >
> > > Thanks for the review and suggestions, though.
> > >
> > > >
> > > > With or without this suggestion.
> > > >
> > > > Acked-by: Bruce Richardson <bruce.richardson@intel.com>
>
>
* Re: [PATCH v7] mempool: test performance with larger bursts
2024-09-16 13:08 ` Morten Brørup
@ 2024-09-16 14:04 ` Thomas Monjalon
0 siblings, 0 replies; 28+ messages in thread
From: Thomas Monjalon @ 2024-09-16 14:04 UTC (permalink / raw)
To: Morten Brørup
Cc: honnappa.nagarahalli, andrew.rybchenko, dev, fengchengwen,
Bruce Richardson
16/09/2024 15:08, Morten Brørup:
> > From: Thomas Monjalon [mailto:thomas@monjalon.net]
> > Sent: Monday, 16 September 2024 14.41
> >
> > 13/09/2024 16:58, Morten Brørup:
> > > PING for apply.
> > >
> > > Patch has 2 acks.
> > > And since it was signed off by a co-maintainer (myself),
> > > I don't think an ack from the other co-maintainer (Andrew) is required.
> > > Please correct me if I'm wrong?
> >
> >
> > It's not a matter of acks.
> >
> > I feel we should reduce from 5 seconds to 1 second as part of this patch.
>
> As mentioned below, I was worried about reducing the test duration.
> If you think the test will be accurate enough, I can easily reduce it to 1; you're not the only person annoyed by the long test duration.
I cannot be sure, but it's probably worth trying.
> > But seeing there is no more comments, I suppose I should apply this version.
>
> Should I send v8 with reduced test duration from 5 to 1 second?
If you agree to try this duration, yes please send a v8.
* [PATCH v8] mempool: test performance with larger bursts
2024-01-21 4:52 [PATCH] mempool: test performance with larger bursts Morten Brørup
` (6 preceding siblings ...)
2024-03-02 20:04 ` [PATCH v7] " Morten Brørup
@ 2024-09-16 15:37 ` Morten Brørup
2024-09-17 8:10 ` [PATCH v9] " Morten Brørup
8 siblings, 0 replies; 28+ messages in thread
From: Morten Brørup @ 2024-09-16 15:37 UTC (permalink / raw)
To: andrew.rybchenko, fengchengwen, thomas, honnappa.nagarahalli,
bruce.richardson
Cc: dev, Morten Brørup
Bursts of up to 64, 128 and 256 packets are not uncommon, so increase the
maximum tested get and put burst sizes from 32 to 256.
For convenience, also test get and put burst sizes of
RTE_MEMPOOL_CACHE_MAX_SIZE.
Some applications keep more than 512 objects, so increase the maximum
number of kept objects from 512 to 32768, still in jumps of factor four.
This exceeds the typical mempool cache size of 512 objects, so the test
also exercises the mempool driver.
Reduced the duration of each iteration from 5 seconds to 1 second.
Increased the precision of rate_persec calculation by timing the actual
duration of the test, instead of assuming it took exactly 1 second.
Added cache guard to per-lcore stats structure.
Signed-off-by: Morten Brørup <mb@smartsharesystems.com>
Acked-by: Chengwen Feng <fengchengwen@huawei.com>
Acked-by: Bruce Richardson <bruce.richardson@intel.com>
---
v8:
* Reduced test iteration duration to 1 second. (Bruce, Thomas)
v7:
* Increase max burst size to 256. (Inspired by Honnappa)
v6:
* Do not test with more lcores than available. (Thomas)
v5:
* Increased N, to reduce measurement overhead with large numbers of kept
objects.
* Increased precision of rate_persec calculation.
* Added missing cache guard to per-lcore stats structure.
v4:
* v3 failed to apply; I had messed up something with git.
* Added ACK from Chengwen Feng.
v3:
* Increased max number of kept objects to 32768.
* Added get and put burst sizes of RTE_MEMPOOL_CACHE_MAX_SIZE objects.
* Print error if unable to allocate mempool.
* Initialize use_external_cache with each test.
A previous version of this patch had a bug, where all test runs
following the first would use external cache. (Chengwen Feng)
v2: Addressed feedback by Chengwen Feng
* Added get and put burst sizes of 64 objects, which is probably also a
fairly common packet burst size.
* Fixed list of number of kept objects so list remains in jumps of factor
four.
* Added three derivative test cases, for faster testing.
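
Illustration only, not part of the patch: a minimal standalone sketch of the rate_persec calculation described in the commit message, where each lcore's count is divided by its own measured duration instead of the nominal iteration time. It mirrors the arithmetic in launch_cores(), but struct lcore_sample and total_rate_persec() are hypothetical names, and the timer frequency is passed in as a plain parameter rather than read with rte_get_timer_hz(), so the sketch compiles without DPDK. Unlike the patch, it accumulates in a double and truncates once at the end.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the per-lcore stats used by the test. */
struct lcore_sample {
	uint64_t enq_count;       /* objects enqueued/dequeued */
	uint64_t duration_cycles; /* timer cycles actually spent */
};

/*
 * Sum the per-lcore rates, each computed from the measured duration
 * rather than the nominal iteration time. Lcores that did not run
 * (duration_cycles == 0) are skipped to avoid dividing by zero.
 */
uint64_t
total_rate_persec(const struct lcore_sample *s, size_t n, double hz)
{
	double rate = 0.0;
	size_t i;

	for (i = 0; i < n; i++)
		if (s[i].duration_cycles != 0)
			rate += (double)s[i].enq_count * hz /
				(double)s[i].duration_cycles;
	return (uint64_t)rate;
}
```

For example, an lcore that completes one batch of N = 128 * 32768 = 4194304 operations in 1.05 seconds is credited with about 3.99 million operations per second, rather than the 4.19 million that dividing by a nominal 1 second would give.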
---
app/test/test_mempool_perf.c | 146 +++++++++++++++++++++++------------
1 file changed, 97 insertions(+), 49 deletions(-)
diff --git a/app/test/test_mempool_perf.c b/app/test/test_mempool_perf.c
index 96de347f04..b86c3449f4 100644
--- a/app/test/test_mempool_perf.c
+++ b/app/test/test_mempool_perf.c
@@ -54,22 +54,25 @@
*
* - Bulk size (*n_get_bulk*, *n_put_bulk*)
*
- * - Bulk get from 1 to 32
- * - Bulk put from 1 to 32
- * - Bulk get and put from 1 to 32, compile time constant
+ * - Bulk get from 1 to 256, and RTE_MEMPOOL_CACHE_MAX_SIZE
+ * - Bulk put from 1 to 256, and RTE_MEMPOOL_CACHE_MAX_SIZE
+ * - Bulk get and put from 1 to 256, and RTE_MEMPOOL_CACHE_MAX_SIZE, compile time constant
*
* - Number of kept objects (*n_keep*)
*
* - 32
* - 128
* - 512
+ * - 2048
+ * - 8192
+ * - 32768
*/
-#define N 65536
-#define TIME_S 5
+#define TIME_S 1
#define MEMPOOL_ELT_SIZE 2048
-#define MAX_KEEP 512
-#define MEMPOOL_SIZE ((rte_lcore_count()*(MAX_KEEP+RTE_MEMPOOL_CACHE_MAX_SIZE))-1)
+#define MAX_KEEP 32768
+#define N (128 * MAX_KEEP)
+#define MEMPOOL_SIZE ((rte_lcore_count()*(MAX_KEEP+RTE_MEMPOOL_CACHE_MAX_SIZE*2))-1)
/* Number of pointers fitting into one cache line. */
#define CACHE_LINE_BURST (RTE_CACHE_LINE_SIZE / sizeof(uintptr_t))
@@ -100,9 +103,11 @@ static unsigned n_keep;
/* true if we want to test with constant n_get_bulk and n_put_bulk */
static int use_constant_values;
-/* number of enqueues / dequeues */
+/* number of enqueues / dequeues, and time used */
struct mempool_test_stats {
uint64_t enq_count;
+ uint64_t duration_cycles;
+ RTE_CACHE_GUARD;
} __rte_cache_aligned;
static struct mempool_test_stats stats[RTE_MAX_LCORE];
@@ -185,6 +190,7 @@ per_lcore_mempool_test(void *arg)
GOTO_ERR(ret, out);
stats[lcore_id].enq_count = 0;
+ stats[lcore_id].duration_cycles = 0;
/* wait synchro for workers */
if (lcore_id != rte_get_main_lcore())
@@ -204,6 +210,15 @@ per_lcore_mempool_test(void *arg)
CACHE_LINE_BURST, CACHE_LINE_BURST);
else if (n_get_bulk == 32)
ret = test_loop(mp, cache, n_keep, 32, 32);
+ else if (n_get_bulk == 64)
+ ret = test_loop(mp, cache, n_keep, 64, 64);
+ else if (n_get_bulk == 128)
+ ret = test_loop(mp, cache, n_keep, 128, 128);
+ else if (n_get_bulk == 256)
+ ret = test_loop(mp, cache, n_keep, 256, 256);
+ else if (n_get_bulk == RTE_MEMPOOL_CACHE_MAX_SIZE)
+ ret = test_loop(mp, cache, n_keep,
+ RTE_MEMPOOL_CACHE_MAX_SIZE, RTE_MEMPOOL_CACHE_MAX_SIZE);
else
ret = -1;
@@ -215,6 +230,8 @@ per_lcore_mempool_test(void *arg)
stats[lcore_id].enq_count += N;
}
+ stats[lcore_id].duration_cycles = time_diff;
+
out:
if (use_external_cache) {
rte_mempool_cache_flush(cache, mp);
@@ -232,6 +249,7 @@ launch_cores(struct rte_mempool *mp, unsigned int cores)
uint64_t rate;
int ret;
unsigned cores_save = cores;
+ double hz = rte_get_timer_hz();
__atomic_store_n(&synchro, 0, __ATOMIC_RELAXED);
@@ -278,7 +296,9 @@ launch_cores(struct rte_mempool *mp, unsigned int cores)
rate = 0;
for (lcore_id = 0; lcore_id < RTE_MAX_LCORE; lcore_id++)
- rate += (stats[lcore_id].enq_count / TIME_S);
+ if (stats[lcore_id].duration_cycles != 0)
+ rate += (double)stats[lcore_id].enq_count * hz /
+ (double)stats[lcore_id].duration_cycles;
printf("rate_persec=%" PRIu64 "\n", rate);
@@ -287,11 +307,13 @@ launch_cores(struct rte_mempool *mp, unsigned int cores)
/* for a given number of core, launch all test cases */
static int
-do_one_mempool_test(struct rte_mempool *mp, unsigned int cores)
+do_one_mempool_test(struct rte_mempool *mp, unsigned int cores, int external_cache)
{
- unsigned int bulk_tab_get[] = { 1, 4, CACHE_LINE_BURST, 32, 0 };
- unsigned int bulk_tab_put[] = { 1, 4, CACHE_LINE_BURST, 32, 0 };
- unsigned int keep_tab[] = { 32, 128, 512, 0 };
+ unsigned int bulk_tab_get[] = { 1, 4, CACHE_LINE_BURST, 32, 64, 128, 256,
+ RTE_MEMPOOL_CACHE_MAX_SIZE, 0 };
+ unsigned int bulk_tab_put[] = { 1, 4, CACHE_LINE_BURST, 32, 64, 128, 256,
+ RTE_MEMPOOL_CACHE_MAX_SIZE, 0 };
+ unsigned int keep_tab[] = { 32, 128, 512, 2048, 8192, 32768, 0 };
unsigned *get_bulk_ptr;
unsigned *put_bulk_ptr;
unsigned *keep_ptr;
@@ -301,6 +323,10 @@ do_one_mempool_test(struct rte_mempool *mp, unsigned int cores)
for (put_bulk_ptr = bulk_tab_put; *put_bulk_ptr; put_bulk_ptr++) {
for (keep_ptr = keep_tab; *keep_ptr; keep_ptr++) {
+ if (*keep_ptr < *get_bulk_ptr || *keep_ptr < *put_bulk_ptr)
+ continue;
+
+ use_external_cache = external_cache;
use_constant_values = 0;
n_get_bulk = *get_bulk_ptr;
n_put_bulk = *put_bulk_ptr;
@@ -323,7 +349,7 @@ do_one_mempool_test(struct rte_mempool *mp, unsigned int cores)
}
static int
-test_mempool_perf(void)
+do_all_mempool_perf_tests(unsigned int cores)
{
struct rte_mempool *mp_cache = NULL;
struct rte_mempool *mp_nocache = NULL;
@@ -337,8 +363,10 @@ test_mempool_perf(void)
NULL, NULL,
my_obj_init, NULL,
SOCKET_ID_ANY, 0);
- if (mp_nocache == NULL)
+ if (mp_nocache == NULL) {
+ printf("cannot allocate mempool (without cache)\n");
goto err;
+ }
/* create a mempool (with cache) */
mp_cache = rte_mempool_create("perf_test_cache", MEMPOOL_SIZE,
@@ -347,8 +375,10 @@ test_mempool_perf(void)
NULL, NULL,
my_obj_init, NULL,
SOCKET_ID_ANY, 0);
- if (mp_cache == NULL)
+ if (mp_cache == NULL) {
+ printf("cannot allocate mempool (with cache)\n");
goto err;
+ }
default_pool_ops = rte_mbuf_best_mempool_ops();
/* Create a mempool based on Default handler */
@@ -376,65 +406,83 @@ test_mempool_perf(void)
rte_mempool_obj_iter(default_pool, my_obj_init, NULL);
- /* performance test with 1, 2 and max cores */
printf("start performance test (without cache)\n");
-
- if (do_one_mempool_test(mp_nocache, 1) < 0)
- goto err;
-
- if (do_one_mempool_test(mp_nocache, 2) < 0)
+ if (do_one_mempool_test(mp_nocache, cores, 0) < 0)
goto err;
- if (do_one_mempool_test(mp_nocache, rte_lcore_count()) < 0)
- goto err;
-
- /* performance test with 1, 2 and max cores */
printf("start performance test for %s (without cache)\n",
default_pool_ops);
-
- if (do_one_mempool_test(default_pool, 1) < 0)
+ if (do_one_mempool_test(default_pool, cores, 0) < 0)
goto err;
- if (do_one_mempool_test(default_pool, 2) < 0)
+ printf("start performance test (with cache)\n");
+ if (do_one_mempool_test(mp_cache, cores, 0) < 0)
goto err;
- if (do_one_mempool_test(default_pool, rte_lcore_count()) < 0)
+ printf("start performance test (with user-owned cache)\n");
+ if (do_one_mempool_test(mp_nocache, cores, 1) < 0)
goto err;
- /* performance test with 1, 2 and max cores */
- printf("start performance test (with cache)\n");
+ rte_mempool_list_dump(stdout);
- if (do_one_mempool_test(mp_cache, 1) < 0)
- goto err;
+ ret = 0;
- if (do_one_mempool_test(mp_cache, 2) < 0)
- goto err;
+err:
+ rte_mempool_free(mp_cache);
+ rte_mempool_free(mp_nocache);
+ rte_mempool_free(default_pool);
+ return ret;
+}
- if (do_one_mempool_test(mp_cache, rte_lcore_count()) < 0)
- goto err;
+static int
+test_mempool_perf_1core(void)
+{
+ return do_all_mempool_perf_tests(1);
+}
- /* performance test with 1, 2 and max cores */
- printf("start performance test (with user-owned cache)\n");
- use_external_cache = 1;
+static int
+test_mempool_perf_2cores(void)
+{
+ if (rte_lcore_count() < 2) {
+ printf("not enough lcores\n");
+ return -1;
+ }
+ return do_all_mempool_perf_tests(2);
+}
- if (do_one_mempool_test(mp_nocache, 1) < 0)
- goto err;
+static int
+test_mempool_perf_allcores(void)
+{
+ return do_all_mempool_perf_tests(rte_lcore_count());
+}
+
+static int
+test_mempool_perf(void)
+{
+ int ret = -1;
- if (do_one_mempool_test(mp_nocache, 2) < 0)
+ /* performance test with 1, 2 and max cores */
+ if (do_all_mempool_perf_tests(1) < 0)
goto err;
+ if (rte_lcore_count() == 1)
+ goto done;
- if (do_one_mempool_test(mp_nocache, rte_lcore_count()) < 0)
+ if (do_all_mempool_perf_tests(2) < 0)
goto err;
+ if (rte_lcore_count() == 2)
+ goto done;
- rte_mempool_list_dump(stdout);
+ if (do_all_mempool_perf_tests(rte_lcore_count()) < 0)
+ goto err;
+done:
ret = 0;
err:
- rte_mempool_free(mp_cache);
- rte_mempool_free(mp_nocache);
- rte_mempool_free(default_pool);
return ret;
}
REGISTER_PERF_TEST(mempool_perf_autotest, test_mempool_perf);
+REGISTER_PERF_TEST(mempool_perf_autotest_1core, test_mempool_perf_1core);
+REGISTER_PERF_TEST(mempool_perf_autotest_2cores, test_mempool_perf_2cores);
+REGISTER_PERF_TEST(mempool_perf_autotest_allcores, test_mempool_perf_allcores);
--
2.17.1
* [PATCH v9] mempool: test performance with larger bursts
2024-01-21 4:52 [PATCH] mempool: test performance with larger bursts Morten Brørup
` (7 preceding siblings ...)
2024-09-16 15:37 ` [PATCH v8] " Morten Brørup
@ 2024-09-17 8:10 ` Morten Brørup
2024-10-08 9:14 ` Morten Brørup
2024-10-11 12:50 ` David Marchand
8 siblings, 2 replies; 28+ messages in thread
From: Morten Brørup @ 2024-09-17 8:10 UTC (permalink / raw)
To: andrew.rybchenko, fengchengwen, thomas, honnappa.nagarahalli,
bruce.richardson
Cc: dev, Morten Brørup
Bursts of up to 64, 128 and 256 packets are not uncommon, so increase the
maximum tested get and put burst sizes from 32 to 256.
For convenience, also test get and put burst sizes of
RTE_MEMPOOL_CACHE_MAX_SIZE.
Some applications keep more than 512 objects, so increase the maximum
number of kept objects from 512 to 32768, still in jumps of factor four.
This exceeds the typical mempool cache size of 512 objects, so the test
also exercises the mempool driver.
Reduced the duration of each iteration from 5 seconds to 1 second.
Increased the precision of rate_persec calculation by timing the actual
duration of the test, instead of assuming it took exactly 1 second.
Added cache guard to per-lcore stats structure.
Signed-off-by: Morten Brørup <mb@smartsharesystems.com>
Acked-by: Chengwen Feng <fengchengwen@huawei.com>
Acked-by: Bruce Richardson <bruce.richardson@intel.com>
---
v9:
* Rebased to main.
v8:
* Reduced test iteration duration to 1 second. (Bruce, Thomas)
v7:
* Increase max burst size to 256. (Inspired by Honnappa)
v6:
* Do not test with more lcores than available. (Thomas)
v5:
* Increased N, to reduce measurement overhead with large numbers of kept
objects.
* Increased precision of rate_persec calculation.
* Added missing cache guard to per-lcore stats structure.
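For illustration only, the rate_persec precision change listed under v5 can be sketched in plain C outside DPDK. This is not the patch's code: rate_assumed and rate_measured are hypothetical helper names, and the hz and cycle arguments stand in for rte_get_timer_hz() and the per-lcore duration_cycles value.

```c
#include <stdint.h>

/* Old approach: assume the test ran for exactly time_s seconds. */
uint64_t rate_assumed(uint64_t enq_count, unsigned int time_s)
{
	return enq_count / time_s;
}

/* New approach: scale by the measured duration, given in timer cycles. */
uint64_t rate_measured(uint64_t enq_count, uint64_t duration_cycles, double hz)
{
	if (duration_cycles == 0)
		return 0; /* this lcore did not run the test */
	return (uint64_t)((double)enq_count * hz / (double)duration_cycles);
}
```

With a hypothetical 1 GHz timer, 1000000 enqueues measured over 1250000000 cycles (1.25 s) give 800000 per second, where the fixed 1 second assumption would report 1000000. The gap matters most when a single batch of N operations overshoots the nominal duration.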
v4:
* v3 failed to apply; I had messed up something with git.
* Added ACK from Chengwen Feng.
v3:
* Increased max number of kept objects to 32768.
* Added get and put burst sizes of RTE_MEMPOOL_CACHE_MAX_SIZE objects.
* Print error if unable to allocate mempool.
* Initialize use_external_cache with each test.
A previous version of this patch had a bug, where all test runs
following the first would use external cache. (Chengwen Feng)
v2: Addressed feedback by Chengwen Feng
* Added get and put burst sizes of 64 objects, which is probably also not
an uncommon packet burst size.
* Fixed the list of numbers of kept objects so that it remains in steps of
a factor of four.
* Added three derivative test cases, for faster testing.
---
app/test/test_mempool_perf.c | 146 +++++++++++++++++++++++------------
1 file changed, 97 insertions(+), 49 deletions(-)
diff --git a/app/test/test_mempool_perf.c b/app/test/test_mempool_perf.c
index 55e17cce47..4dd74ef75a 100644
--- a/app/test/test_mempool_perf.c
+++ b/app/test/test_mempool_perf.c
@@ -54,22 +54,25 @@
*
* - Bulk size (*n_get_bulk*, *n_put_bulk*)
*
- * - Bulk get from 1 to 32
- * - Bulk put from 1 to 32
- * - Bulk get and put from 1 to 32, compile time constant
+ * - Bulk get from 1 to 256, and RTE_MEMPOOL_CACHE_MAX_SIZE
+ * - Bulk put from 1 to 256, and RTE_MEMPOOL_CACHE_MAX_SIZE
+ * - Bulk get and put from 1 to 256, and RTE_MEMPOOL_CACHE_MAX_SIZE, compile time constant
*
* - Number of kept objects (*n_keep*)
*
* - 32
* - 128
* - 512
+ * - 2048
+ * - 8192
+ * - 32768
*/
-#define N 65536
-#define TIME_S 5
+#define TIME_S 1
#define MEMPOOL_ELT_SIZE 2048
-#define MAX_KEEP 512
-#define MEMPOOL_SIZE ((rte_lcore_count()*(MAX_KEEP+RTE_MEMPOOL_CACHE_MAX_SIZE))-1)
+#define MAX_KEEP 32768
+#define N (128 * MAX_KEEP)
+#define MEMPOOL_SIZE ((rte_lcore_count()*(MAX_KEEP+RTE_MEMPOOL_CACHE_MAX_SIZE*2))-1)
/* Number of pointers fitting into one cache line. */
#define CACHE_LINE_BURST (RTE_CACHE_LINE_SIZE / sizeof(uintptr_t))
@@ -100,9 +103,11 @@ static unsigned n_keep;
/* true if we want to test with constant n_get_bulk and n_put_bulk */
static int use_constant_values;
-/* number of enqueues / dequeues */
+/* number of enqueues / dequeues, and time used */
struct __rte_cache_aligned mempool_test_stats {
uint64_t enq_count;
+ uint64_t duration_cycles;
+ RTE_CACHE_GUARD;
};
static struct mempool_test_stats stats[RTE_MAX_LCORE];
@@ -185,6 +190,7 @@ per_lcore_mempool_test(void *arg)
GOTO_ERR(ret, out);
stats[lcore_id].enq_count = 0;
+ stats[lcore_id].duration_cycles = 0;
/* wait synchro for workers */
if (lcore_id != rte_get_main_lcore())
@@ -205,6 +211,15 @@ per_lcore_mempool_test(void *arg)
CACHE_LINE_BURST, CACHE_LINE_BURST);
else if (n_get_bulk == 32)
ret = test_loop(mp, cache, n_keep, 32, 32);
+ else if (n_get_bulk == 64)
+ ret = test_loop(mp, cache, n_keep, 64, 64);
+ else if (n_get_bulk == 128)
+ ret = test_loop(mp, cache, n_keep, 128, 128);
+ else if (n_get_bulk == 256)
+ ret = test_loop(mp, cache, n_keep, 256, 256);
+ else if (n_get_bulk == RTE_MEMPOOL_CACHE_MAX_SIZE)
+ ret = test_loop(mp, cache, n_keep,
+ RTE_MEMPOOL_CACHE_MAX_SIZE, RTE_MEMPOOL_CACHE_MAX_SIZE);
else
ret = -1;
@@ -216,6 +231,8 @@ per_lcore_mempool_test(void *arg)
stats[lcore_id].enq_count += N;
}
+ stats[lcore_id].duration_cycles = time_diff;
+
out:
if (use_external_cache) {
rte_mempool_cache_flush(cache, mp);
@@ -233,6 +250,7 @@ launch_cores(struct rte_mempool *mp, unsigned int cores)
uint64_t rate;
int ret;
unsigned cores_save = cores;
+ double hz = rte_get_timer_hz();
rte_atomic_store_explicit(&synchro, 0, rte_memory_order_relaxed);
@@ -279,7 +297,9 @@ launch_cores(struct rte_mempool *mp, unsigned int cores)
rate = 0;
for (lcore_id = 0; lcore_id < RTE_MAX_LCORE; lcore_id++)
- rate += (stats[lcore_id].enq_count / TIME_S);
+ if (stats[lcore_id].duration_cycles != 0)
+ rate += (double)stats[lcore_id].enq_count * hz /
+ (double)stats[lcore_id].duration_cycles;
printf("rate_persec=%" PRIu64 "\n", rate);
@@ -288,11 +308,13 @@ launch_cores(struct rte_mempool *mp, unsigned int cores)
/* for a given number of core, launch all test cases */
static int
-do_one_mempool_test(struct rte_mempool *mp, unsigned int cores)
+do_one_mempool_test(struct rte_mempool *mp, unsigned int cores, int external_cache)
{
- unsigned int bulk_tab_get[] = { 1, 4, CACHE_LINE_BURST, 32, 0 };
- unsigned int bulk_tab_put[] = { 1, 4, CACHE_LINE_BURST, 32, 0 };
- unsigned int keep_tab[] = { 32, 128, 512, 0 };
+ unsigned int bulk_tab_get[] = { 1, 4, CACHE_LINE_BURST, 32, 64, 128, 256,
+ RTE_MEMPOOL_CACHE_MAX_SIZE, 0 };
+ unsigned int bulk_tab_put[] = { 1, 4, CACHE_LINE_BURST, 32, 64, 128, 256,
+ RTE_MEMPOOL_CACHE_MAX_SIZE, 0 };
+ unsigned int keep_tab[] = { 32, 128, 512, 2048, 8192, 32768, 0 };
unsigned *get_bulk_ptr;
unsigned *put_bulk_ptr;
unsigned *keep_ptr;
@@ -302,6 +324,10 @@ do_one_mempool_test(struct rte_mempool *mp, unsigned int cores)
for (put_bulk_ptr = bulk_tab_put; *put_bulk_ptr; put_bulk_ptr++) {
for (keep_ptr = keep_tab; *keep_ptr; keep_ptr++) {
+ if (*keep_ptr < *get_bulk_ptr || *keep_ptr < *put_bulk_ptr)
+ continue;
+
+ use_external_cache = external_cache;
use_constant_values = 0;
n_get_bulk = *get_bulk_ptr;
n_put_bulk = *put_bulk_ptr;
@@ -324,7 +350,7 @@ do_one_mempool_test(struct rte_mempool *mp, unsigned int cores)
}
static int
-test_mempool_perf(void)
+do_all_mempool_perf_tests(unsigned int cores)
{
struct rte_mempool *mp_cache = NULL;
struct rte_mempool *mp_nocache = NULL;
@@ -338,8 +364,10 @@ test_mempool_perf(void)
NULL, NULL,
my_obj_init, NULL,
SOCKET_ID_ANY, 0);
- if (mp_nocache == NULL)
+ if (mp_nocache == NULL) {
+ printf("cannot allocate mempool (without cache)\n");
goto err;
+ }
/* create a mempool (with cache) */
mp_cache = rte_mempool_create("perf_test_cache", MEMPOOL_SIZE,
@@ -348,8 +376,10 @@ test_mempool_perf(void)
NULL, NULL,
my_obj_init, NULL,
SOCKET_ID_ANY, 0);
- if (mp_cache == NULL)
+ if (mp_cache == NULL) {
+ printf("cannot allocate mempool (with cache)\n");
goto err;
+ }
default_pool_ops = rte_mbuf_best_mempool_ops();
/* Create a mempool based on Default handler */
@@ -377,65 +407,83 @@ test_mempool_perf(void)
rte_mempool_obj_iter(default_pool, my_obj_init, NULL);
- /* performance test with 1, 2 and max cores */
printf("start performance test (without cache)\n");
-
- if (do_one_mempool_test(mp_nocache, 1) < 0)
- goto err;
-
- if (do_one_mempool_test(mp_nocache, 2) < 0)
+ if (do_one_mempool_test(mp_nocache, cores, 0) < 0)
goto err;
- if (do_one_mempool_test(mp_nocache, rte_lcore_count()) < 0)
- goto err;
-
- /* performance test with 1, 2 and max cores */
printf("start performance test for %s (without cache)\n",
default_pool_ops);
-
- if (do_one_mempool_test(default_pool, 1) < 0)
+ if (do_one_mempool_test(default_pool, cores, 0) < 0)
goto err;
- if (do_one_mempool_test(default_pool, 2) < 0)
+ printf("start performance test (with cache)\n");
+ if (do_one_mempool_test(mp_cache, cores, 0) < 0)
goto err;
- if (do_one_mempool_test(default_pool, rte_lcore_count()) < 0)
+ printf("start performance test (with user-owned cache)\n");
+ if (do_one_mempool_test(mp_nocache, cores, 1) < 0)
goto err;
- /* performance test with 1, 2 and max cores */
- printf("start performance test (with cache)\n");
+ rte_mempool_list_dump(stdout);
- if (do_one_mempool_test(mp_cache, 1) < 0)
- goto err;
+ ret = 0;
- if (do_one_mempool_test(mp_cache, 2) < 0)
- goto err;
+err:
+ rte_mempool_free(mp_cache);
+ rte_mempool_free(mp_nocache);
+ rte_mempool_free(default_pool);
+ return ret;
+}
- if (do_one_mempool_test(mp_cache, rte_lcore_count()) < 0)
- goto err;
+static int
+test_mempool_perf_1core(void)
+{
+ return do_all_mempool_perf_tests(1);
+}
- /* performance test with 1, 2 and max cores */
- printf("start performance test (with user-owned cache)\n");
- use_external_cache = 1;
+static int
+test_mempool_perf_2cores(void)
+{
+ if (rte_lcore_count() < 2) {
+ printf("not enough lcores\n");
+ return -1;
+ }
+ return do_all_mempool_perf_tests(2);
+}
- if (do_one_mempool_test(mp_nocache, 1) < 0)
- goto err;
+static int
+test_mempool_perf_allcores(void)
+{
+ return do_all_mempool_perf_tests(rte_lcore_count());
+}
+
+static int
+test_mempool_perf(void)
+{
+ int ret = -1;
- if (do_one_mempool_test(mp_nocache, 2) < 0)
+ /* performance test with 1, 2 and max cores */
+ if (do_all_mempool_perf_tests(1) < 0)
goto err;
+ if (rte_lcore_count() == 1)
+ goto done;
- if (do_one_mempool_test(mp_nocache, rte_lcore_count()) < 0)
+ if (do_all_mempool_perf_tests(2) < 0)
goto err;
+ if (rte_lcore_count() == 2)
+ goto done;
- rte_mempool_list_dump(stdout);
+ if (do_all_mempool_perf_tests(rte_lcore_count()) < 0)
+ goto err;
+done:
ret = 0;
err:
- rte_mempool_free(mp_cache);
- rte_mempool_free(mp_nocache);
- rte_mempool_free(default_pool);
return ret;
}
REGISTER_PERF_TEST(mempool_perf_autotest, test_mempool_perf);
+REGISTER_PERF_TEST(mempool_perf_autotest_1core, test_mempool_perf_1core);
+REGISTER_PERF_TEST(mempool_perf_autotest_2cores, test_mempool_perf_2cores);
+REGISTER_PERF_TEST(mempool_perf_autotest_allcores, test_mempool_perf_allcores);
--
2.43.0
* RE: [PATCH v9] mempool: test performance with larger bursts
2024-09-17 8:10 ` [PATCH v9] " Morten Brørup
@ 2024-10-08 9:14 ` Morten Brørup
2024-10-11 12:50 ` David Marchand
1 sibling, 0 replies; 28+ messages in thread
From: Morten Brørup @ 2024-10-08 9:14 UTC (permalink / raw)
To: thomas, bruce.richardson
Cc: dev, honnappa.nagarahalli, andrew.rybchenko, fengchengwen
> From: Morten Brørup [mailto:mb@smartsharesystems.com]
> Sent: Tuesday, 17 September 2024 10.10
>
> Bursts of up to 64, 128 and 256 packets are not uncommon, so increase
> the
> maximum tested get and put burst sizes from 32 to 256.
> For convenience, also test get and put burst sizes of
> RTE_MEMPOOL_CACHE_MAX_SIZE.
>
> Some applications keep more than 512 objects, so increase the maximum
> number of kept objects from 512 to 32768, still in jumps of factor
> four.
> This exceeds the typical mempool cache size of 512 objects, so the test
> also exercises the mempool driver.
>
> Reduced the duration of each iteration from 5 seconds to 1 second.
>
> Increased the precision of rate_persec calculation by timing the actual
> duration of the test, instead of assuming it took exactly 1 second.
>
> Added cache guard to per-lcore stats structure.
>
> Signed-off-by: Morten Brørup <mb@smartsharesystems.com>
> Acked-by: Chengwen Feng <fengchengwen@huawei.com>
> Acked-by: Bruce Richardson <bruce.richardson@intel.com>
> ---
Duration of mempool_perf_autotest on a 4-core x86_64 virtual machine: 1 hour 15 minutes
Tested-by: Morten Brørup <mb@smartsharesystems.com>
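
As a rough cross-check of the duration above, the number of test runs can be counted from the tables in the patch. The sketch below is a standalone estimate, not part of the test suite. It assumes RTE_MEMPOOL_CACHE_MAX_SIZE is 512 and CACHE_LINE_BURST is 8, and that each combination with equal get and put sizes is replayed once with compile-time constant values. The helper names count_runs_per_pool and estimate_seconds are hypothetical.

```c
#include <stddef.h>

/* Assumed default of RTE_MEMPOOL_CACHE_MAX_SIZE; not stated in the thread. */
#define CACHE_MAX_SIZE 512
/* Assumed CACHE_LINE_BURST for 64-byte cache lines and 8-byte pointers. */
#define LINE_BURST 8

/* Count launch_cores() runs for one mempool configuration and core count. */
unsigned int count_runs_per_pool(void)
{
	const unsigned int bulk_tab[] = { 1, 4, LINE_BURST, 32, 64, 128, 256,
			CACHE_MAX_SIZE };
	const unsigned int keep_tab[] = { 32, 128, 512, 2048, 8192, 32768 };
	const size_t n_bulk = sizeof(bulk_tab) / sizeof(bulk_tab[0]);
	const size_t n_keep = sizeof(keep_tab) / sizeof(keep_tab[0]);
	unsigned int runs = 0;

	for (size_t g = 0; g < n_bulk; g++) {
		for (size_t p = 0; p < n_bulk; p++) {
			for (size_t k = 0; k < n_keep; k++) {
				/* Same filter as the patch: skip bursts larger than n_keep. */
				if (keep_tab[k] < bulk_tab[g] || keep_tab[k] < bulk_tab[p])
					continue;
				runs++;
				/* Assumption: equal get/put sizes are replayed once
				 * more with compile-time constant values. */
				if (bulk_tab[g] == bulk_tab[p])
					runs++;
			}
		}
	}
	return runs;
}

/* Lower bound in seconds: runs x pool configurations x core counts x TIME_S. */
unsigned int estimate_seconds(unsigned int pools, unsigned int core_counts,
		unsigned int time_s)
{
	return count_runs_per_pool() * pools * core_counts * time_s;
}
```

Under these assumptions there are 350 runs per configuration. With the four configurations in do_all_mempool_perf_tests and three core counts (1, 2 and 4), estimate_seconds(4, 3, 1) gives 4200 seconds, or 70 minutes. That is a lower bound, since it ignores mempool creation and any batch that runs past TIME_S, and it is in the same range as the measured 1 hour 15 minutes.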
* Re: [PATCH v9] mempool: test performance with larger bursts
2024-09-17 8:10 ` [PATCH v9] " Morten Brørup
2024-10-08 9:14 ` Morten Brørup
@ 2024-10-11 12:50 ` David Marchand
1 sibling, 0 replies; 28+ messages in thread
From: David Marchand @ 2024-10-11 12:50 UTC (permalink / raw)
To: Morten Brørup
Cc: andrew.rybchenko, fengchengwen, thomas, honnappa.nagarahalli,
bruce.richardson, dev
On Tue, Sep 17, 2024 at 10:10 AM Morten Brørup <mb@smartsharesystems.com> wrote:
>
> Bursts of up to 64, 128 and 256 packets are not uncommon, so increase the
> maximum tested get and put burst sizes from 32 to 256.
> For convenience, also test get and put burst sizes of
> RTE_MEMPOOL_CACHE_MAX_SIZE.
>
> Some applications keep more than 512 objects, so increase the maximum
> number of kept objects from 512 to 32768, still in jumps of factor four.
> This exceeds the typical mempool cache size of 512 objects, so the test
> also exercises the mempool driver.
>
> Reduced the duration of each iteration from 5 seconds to 1 second.
>
> Increased the precision of rate_persec calculation by timing the actual
> duration of the test, instead of assuming it took exactly 1 second.
>
> Added cache guard to per-lcore stats structure.
>
> Signed-off-by: Morten Brørup <mb@smartsharesystems.com>
> Acked-by: Chengwen Feng <fengchengwen@huawei.com>
> Acked-by: Bruce Richardson <bruce.richardson@intel.com>
Applied, thanks Morten.
--
David Marchand