[PATCH v9] mempool: test performance with larger bursts

DPDK patches and discussions

From: "Morten Brørup" <mb@smartsharesystems.com>
To: andrew.rybchenko@oktetlabs.ru, fengchengwen@huawei.com,
	thomas@monjalon.net, honnappa.nagarahalli@arm.com,
	bruce.richardson@intel.com
Cc: dev@dpdk.org, "Morten Brørup" <mb@smartsharesystems.com>
Subject: [PATCH v9] mempool: test performance with larger bursts
Date: Tue, 17 Sep 2024 08:10:01 +0000
Message-ID: <20240917081001.774187-1-mb@smartsharesystems.com>
In-Reply-To: <20240121045249.22465-1-mb@smartsharesystems.com>

Bursts of up to 64, 128 and 256 packets are not uncommon, so increase the
maximum tested get and put burst sizes from 32 to 256.
For convenience, also test get and put burst sizes of
RTE_MEMPOOL_CACHE_MAX_SIZE.

Some applications keep more than 512 objects, so increase the maximum
number of kept objects from 512 to 32768, still in steps of a factor of four.
This exceeds the typical mempool cache size of 512 objects, so the test
also exercises the mempool driver.

Reduce the duration of each iteration from 5 seconds to 1 second.
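As an illustration of how the larger tables and the shorter duration interact, here is a minimal standalone sketch in plain C (it is not part of the patch and does not use DPDK). It assumes RTE_MEMPOOL_CACHE_MAX_SIZE is 512 and CACHE_LINE_BURST is 8 (64-byte cache lines, 8-byte pointers); both values are build-dependent. It rebuilds the burst table and the kept-object list (32 to 32768 in steps of a factor of four) and counts the combinations that do_one_mempool_test() runs after skipping those where fewer objects are kept than one get or put burst. The SKETCH_* macros and helper names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Assumed build-dependent values (hypothetical stand-ins for DPDK macros). */
#define SKETCH_CACHE_MAX_SIZE 512   /* RTE_MEMPOOL_CACHE_MAX_SIZE */
#define SKETCH_CACHE_LINE_BURST 8   /* 64-byte line / 8-byte pointer */

/* Get and put burst sizes, 0-terminated as in do_one_mempool_test(). */
static const unsigned int bulk_tab[] = {
	1, 4, SKETCH_CACHE_LINE_BURST, 32, 64, 128, 256, SKETCH_CACHE_MAX_SIZE, 0
};

/* Write 32, 128, 512, ... (factor-four steps, up to max_keep) into keep_tab,
 * followed by a 0 terminator. Returns the number of non-zero entries. */
static size_t
make_keep_tab(unsigned int *keep_tab, size_t cap, unsigned int max_keep)
{
	size_t n = 0;
	unsigned int keep;

	assert(cap > 0); /* room for at least the terminator */
	for (keep = 32; keep <= max_keep && n + 1 < cap; keep *= 4)
		keep_tab[n++] = keep;
	keep_tab[n] = 0;
	return n;
}

/* Count the (get, put, keep) combinations that are actually run: skip those
 * where fewer objects are kept than a single get or put burst. */
static unsigned int
count_combinations(const unsigned int *keep_tab)
{
	const unsigned int *g, *p, *k;
	unsigned int count = 0;

	for (g = bulk_tab; *g; g++)
		for (p = bulk_tab; *p; p++)
			for (k = keep_tab; *k; k++)
				if (*k >= *g && *k >= *p)
					count++;
	return count;
}
```

Under these assumptions, make_keep_tab(keep_tab, 8, 32768) yields the six entries 32 to 32768, and count_combinations() returns 308, up from 4 × 4 × 3 = 48 with the old tables. Each combination runs for at least TIME_S seconds, so a single pool configuration and core count takes over five minutes with TIME_S = 1, and would take over 25 minutes with the old TIME_S = 5. The equal-burst combinations are additionally replayed with compile-time constant values, which this sketch does not count.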

Increase the precision of the rate_persec calculation by timing the actual
duration of the test instead of assuming that it took exactly 1 second.
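A minimal standalone sketch contrasting the two calculations (plain C, not the patch's code; the numbers are made up, not measurements). The per-lcore loop only rechecks the elapsed time after each batch of N operations, so the real duration is at least TIME_S and can overshoot it; dividing by the nominal TIME_S then overstates the rate. The new formula converts the measured cycles to seconds using the timer frequency, which the patch obtains from rte_get_timer_hz(). The function names below are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Old approach: assume the run lasted exactly time_s seconds. */
static uint64_t
rate_nominal(uint64_t enq_count, unsigned int time_s)
{
	assert(time_s != 0);
	return enq_count / time_s;
}

/* New approach: operations per second = count * (cycles per second) / cycles.
 * An lcore that did not run has duration 0 and contributes nothing, as with
 * the patch's "if (stats[lcore_id].duration_cycles != 0)" check. */
static uint64_t
rate_measured(uint64_t enq_count, uint64_t duration_cycles, double hz)
{
	if (duration_cycles == 0)
		return 0;
	return (uint64_t)((double)enq_count * hz / (double)duration_cycles);
}
```

For example, with an assumed 2 GHz timer, 50,000,000 operations that actually took 2,500,000,000 cycles (1.25 s) give rate_measured(50000000, 2500000000, 2e9) = 40,000,000 per second, whereas rate_nominal(50000000, 1) reports 50,000,000.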

Add a cache guard to the per-lcore stats structure.
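The guard pads each per-lcore element of the stats array so that one lcore's counters neither share a cache line with, nor sit on the line directly adjacent to, another lcore's counters; as we understand it, the adjacent-line case matters because some CPUs prefetch neighboring lines in pairs. Below is a minimal standalone sketch of the resulting layout, assuming 64-byte cache lines and one guard line. To our understanding, DPDK's RTE_CACHE_GUARD expands to a cache-aligned char array of RTE_CACHE_LINE_SIZE * RTE_CACHE_GUARD_LINES bytes; the struct and SKETCH_* names here are hypothetical stand-ins.

```c
#include <assert.h>
#include <stdalign.h>
#include <stddef.h>
#include <stdint.h>

#define SKETCH_CACHE_LINE_SIZE 64   /* assumed RTE_CACHE_LINE_SIZE */
#define SKETCH_CACHE_GUARD_LINES 1  /* assumed RTE_CACHE_GUARD_LINES */

/* Hypothetical stand-in for struct mempool_test_stats after the patch. */
struct sketch_test_stats {
	alignas(SKETCH_CACHE_LINE_SIZE) uint64_t enq_count; /* written by the owning lcore */
	uint64_t duration_cycles;                            /* written by the owning lcore */
	/* Guard line(s): never accessed; they only push the next array element
	 * one or more full cache lines away from these counters. */
	alignas(SKETCH_CACHE_LINE_SIZE)
	char cache_guard[SKETCH_CACHE_LINE_SIZE * SKETCH_CACHE_GUARD_LINES];
};

/* The counters fill the first line and the guard starts on the next one. */
static_assert(offsetof(struct sketch_test_stats, cache_guard) == SKETCH_CACHE_LINE_SIZE,
	      "guard starts on its own cache line");
/* Without the guard the element would be one line; with it, 1 + GUARD_LINES. */
static_assert(sizeof(struct sketch_test_stats) ==
	      SKETCH_CACHE_LINE_SIZE * (1 + SKETCH_CACHE_GUARD_LINES),
	      "element size is the counter line plus the guard lines");
```

In an array such as the patch's stats[RTE_MAX_LCORE], consecutive elements' counters are then two cache lines (128 bytes under these assumptions) apart instead of one.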

Signed-off-by: Morten Brørup <mb@smartsharesystems.com>
Acked-by: Chengwen Feng <fengchengwen@huawei.com>
Acked-by: Bruce Richardson <bruce.richardson@intel.com>
---

v9:
* Rebased to main.
v8:
* Reduced test iteration duration to 1 second. (Bruce, Thomas)
v7:
* Increase max burst size to 256. (Inspired by Honnappa)
v6:
* Do not test with more lcores than available. (Thomas)
v5:
* Increased N, to reduce measurement overhead with large numbers of kept
  objects.
* Increased precision of rate_persec calculation.
* Added missing cache guard to per-lcore stats structure.
v4:
* v3 failed to apply; I had messed something up with git.
* Added ACK from Chengwen Feng.
v3:
* Increased max number of kept objects to 32768.
* Added get and put burst sizes of RTE_MEMPOOL_CACHE_MAX_SIZE objects.
* Print error if unable to allocate mempool.
* Initialize use_external_cache with each test.
  A previous version of this patch had a bug where all test runs
  following the first would use the external cache. (Chengwen Feng)
v2: Addressed feedback by Chengwen Feng
* Added get and put burst sizes of 64 objects, which is probably also a
  common packet burst size.
* Fixed the list of numbers of kept objects, so the list remains in steps
  of a factor of four.
* Added three derivative test cases, for faster testing.
---
 app/test/test_mempool_perf.c | 146 +++++++++++++++++++++++------------
 1 file changed, 97 insertions(+), 49 deletions(-)

diff --git a/app/test/test_mempool_perf.c b/app/test/test_mempool_perf.c
index 55e17cce47..4dd74ef75a 100644
--- a/app/test/test_mempool_perf.c
+++ b/app/test/test_mempool_perf.c
@@ -54,22 +54,25 @@
  *
  *    - Bulk size (*n_get_bulk*, *n_put_bulk*)
  *
- *      - Bulk get from 1 to 32
- *      - Bulk put from 1 to 32
- *      - Bulk get and put from 1 to 32, compile time constant
+ *      - Bulk get from 1 to 256, and RTE_MEMPOOL_CACHE_MAX_SIZE
+ *      - Bulk put from 1 to 256, and RTE_MEMPOOL_CACHE_MAX_SIZE
+ *      - Bulk get and put from 1 to 256, and RTE_MEMPOOL_CACHE_MAX_SIZE, compile time constant
  *
  *    - Number of kept objects (*n_keep*)
  *
  *      - 32
  *      - 128
  *      - 512
+ *      - 2048
+ *      - 8192
+ *      - 32768
  */
 
-#define N 65536
-#define TIME_S 5
+#define TIME_S 1
 #define MEMPOOL_ELT_SIZE 2048
-#define MAX_KEEP 512
-#define MEMPOOL_SIZE ((rte_lcore_count()*(MAX_KEEP+RTE_MEMPOOL_CACHE_MAX_SIZE))-1)
+#define MAX_KEEP 32768
+#define N (128 * MAX_KEEP)
+#define MEMPOOL_SIZE ((rte_lcore_count()*(MAX_KEEP+RTE_MEMPOOL_CACHE_MAX_SIZE*2))-1)
 
 /* Number of pointers fitting into one cache line. */
 #define CACHE_LINE_BURST (RTE_CACHE_LINE_SIZE / sizeof(uintptr_t))
@@ -100,9 +103,11 @@ static unsigned n_keep;
 /* true if we want to test with constant n_get_bulk and n_put_bulk */
 static int use_constant_values;
 
-/* number of enqueues / dequeues */
+/* number of enqueues / dequeues, and time used */
 struct __rte_cache_aligned mempool_test_stats {
 	uint64_t enq_count;
+	uint64_t duration_cycles;
+	RTE_CACHE_GUARD;
 };
 
 static struct mempool_test_stats stats[RTE_MAX_LCORE];
@@ -185,6 +190,7 @@ per_lcore_mempool_test(void *arg)
 		GOTO_ERR(ret, out);
 
 	stats[lcore_id].enq_count = 0;
+	stats[lcore_id].duration_cycles = 0;
 
 	/* wait synchro for workers */
 	if (lcore_id != rte_get_main_lcore())
@@ -205,6 +211,15 @@ per_lcore_mempool_test(void *arg)
 					CACHE_LINE_BURST, CACHE_LINE_BURST);
 		else if (n_get_bulk == 32)
 			ret = test_loop(mp, cache, n_keep, 32, 32);
+		else if (n_get_bulk == 64)
+			ret = test_loop(mp, cache, n_keep, 64, 64);
+		else if (n_get_bulk == 128)
+			ret = test_loop(mp, cache, n_keep, 128, 128);
+		else if (n_get_bulk == 256)
+			ret = test_loop(mp, cache, n_keep, 256, 256);
+		else if (n_get_bulk == RTE_MEMPOOL_CACHE_MAX_SIZE)
+			ret = test_loop(mp, cache, n_keep,
+					RTE_MEMPOOL_CACHE_MAX_SIZE, RTE_MEMPOOL_CACHE_MAX_SIZE);
 		else
 			ret = -1;
 
@@ -216,6 +231,8 @@ per_lcore_mempool_test(void *arg)
 		stats[lcore_id].enq_count += N;
 	}
 
+	stats[lcore_id].duration_cycles = time_diff;
+
 out:
 	if (use_external_cache) {
 		rte_mempool_cache_flush(cache, mp);
@@ -233,6 +250,7 @@ launch_cores(struct rte_mempool *mp, unsigned int cores)
 	uint64_t rate;
 	int ret;
 	unsigned cores_save = cores;
+	double hz = rte_get_timer_hz();
 
 	rte_atomic_store_explicit(&synchro, 0, rte_memory_order_relaxed);
 
@@ -279,7 +297,9 @@ launch_cores(struct rte_mempool *mp, unsigned int cores)
 
 	rate = 0;
 	for (lcore_id = 0; lcore_id < RTE_MAX_LCORE; lcore_id++)
-		rate += (stats[lcore_id].enq_count / TIME_S);
+		if (stats[lcore_id].duration_cycles != 0)
+			rate += (double)stats[lcore_id].enq_count * hz /
+					(double)stats[lcore_id].duration_cycles;
 
 	printf("rate_persec=%" PRIu64 "\n", rate);
 
@@ -288,11 +308,13 @@ launch_cores(struct rte_mempool *mp, unsigned int cores)
 
 /* for a given number of core, launch all test cases */
 static int
-do_one_mempool_test(struct rte_mempool *mp, unsigned int cores)
+do_one_mempool_test(struct rte_mempool *mp, unsigned int cores, int external_cache)
 {
-	unsigned int bulk_tab_get[] = { 1, 4, CACHE_LINE_BURST, 32, 0 };
-	unsigned int bulk_tab_put[] = { 1, 4, CACHE_LINE_BURST, 32, 0 };
-	unsigned int keep_tab[] = { 32, 128, 512, 0 };
+	unsigned int bulk_tab_get[] = { 1, 4, CACHE_LINE_BURST, 32, 64, 128, 256,
+			RTE_MEMPOOL_CACHE_MAX_SIZE, 0 };
+	unsigned int bulk_tab_put[] = { 1, 4, CACHE_LINE_BURST, 32, 64, 128, 256,
+			RTE_MEMPOOL_CACHE_MAX_SIZE, 0 };
+	unsigned int keep_tab[] = { 32, 128, 512, 2048, 8192, 32768, 0 };
 	unsigned *get_bulk_ptr;
 	unsigned *put_bulk_ptr;
 	unsigned *keep_ptr;
@@ -302,6 +324,10 @@ do_one_mempool_test(struct rte_mempool *mp, unsigned int cores)
 		for (put_bulk_ptr = bulk_tab_put; *put_bulk_ptr; put_bulk_ptr++) {
 			for (keep_ptr = keep_tab; *keep_ptr; keep_ptr++) {
 
+				if (*keep_ptr < *get_bulk_ptr || *keep_ptr < *put_bulk_ptr)
+					continue;
+
+				use_external_cache = external_cache;
 				use_constant_values = 0;
 				n_get_bulk = *get_bulk_ptr;
 				n_put_bulk = *put_bulk_ptr;
@@ -324,7 +350,7 @@ do_one_mempool_test(struct rte_mempool *mp, unsigned int cores)
 }
 
 static int
-test_mempool_perf(void)
+do_all_mempool_perf_tests(unsigned int cores)
 {
 	struct rte_mempool *mp_cache = NULL;
 	struct rte_mempool *mp_nocache = NULL;
@@ -338,8 +364,10 @@ test_mempool_perf(void)
 					NULL, NULL,
 					my_obj_init, NULL,
 					SOCKET_ID_ANY, 0);
-	if (mp_nocache == NULL)
+	if (mp_nocache == NULL) {
+		printf("cannot allocate mempool (without cache)\n");
 		goto err;
+	}
 
 	/* create a mempool (with cache) */
 	mp_cache = rte_mempool_create("perf_test_cache", MEMPOOL_SIZE,
@@ -348,8 +376,10 @@ test_mempool_perf(void)
 				      NULL, NULL,
 				      my_obj_init, NULL,
 				      SOCKET_ID_ANY, 0);
-	if (mp_cache == NULL)
+	if (mp_cache == NULL) {
+		printf("cannot allocate mempool (with cache)\n");
 		goto err;
+	}
 
 	default_pool_ops = rte_mbuf_best_mempool_ops();
 	/* Create a mempool based on Default handler */
@@ -377,65 +407,83 @@ test_mempool_perf(void)
 
 	rte_mempool_obj_iter(default_pool, my_obj_init, NULL);
 
-	/* performance test with 1, 2 and max cores */
 	printf("start performance test (without cache)\n");
-
-	if (do_one_mempool_test(mp_nocache, 1) < 0)
-		goto err;
-
-	if (do_one_mempool_test(mp_nocache, 2) < 0)
+	if (do_one_mempool_test(mp_nocache, cores, 0) < 0)
 		goto err;
 
-	if (do_one_mempool_test(mp_nocache, rte_lcore_count()) < 0)
-		goto err;
-
-	/* performance test with 1, 2 and max cores */
 	printf("start performance test for %s (without cache)\n",
 	       default_pool_ops);
-
-	if (do_one_mempool_test(default_pool, 1) < 0)
+	if (do_one_mempool_test(default_pool, cores, 0) < 0)
 		goto err;
 
-	if (do_one_mempool_test(default_pool, 2) < 0)
+	printf("start performance test (with cache)\n");
+	if (do_one_mempool_test(mp_cache, cores, 0) < 0)
 		goto err;
 
-	if (do_one_mempool_test(default_pool, rte_lcore_count()) < 0)
+	printf("start performance test (with user-owned cache)\n");
+	if (do_one_mempool_test(mp_nocache, cores, 1) < 0)
 		goto err;
 
-	/* performance test with 1, 2 and max cores */
-	printf("start performance test (with cache)\n");
+	rte_mempool_list_dump(stdout);
 
-	if (do_one_mempool_test(mp_cache, 1) < 0)
-		goto err;
+	ret = 0;
 
-	if (do_one_mempool_test(mp_cache, 2) < 0)
-		goto err;
+err:
+	rte_mempool_free(mp_cache);
+	rte_mempool_free(mp_nocache);
+	rte_mempool_free(default_pool);
+	return ret;
+}
 
-	if (do_one_mempool_test(mp_cache, rte_lcore_count()) < 0)
-		goto err;
+static int
+test_mempool_perf_1core(void)
+{
+	return do_all_mempool_perf_tests(1);
+}
 
-	/* performance test with 1, 2 and max cores */
-	printf("start performance test (with user-owned cache)\n");
-	use_external_cache = 1;
+static int
+test_mempool_perf_2cores(void)
+{
+	if (rte_lcore_count() < 2) {
+		printf("not enough lcores\n");
+		return -1;
+	}
+	return do_all_mempool_perf_tests(2);
+}
 
-	if (do_one_mempool_test(mp_nocache, 1) < 0)
-		goto err;
+static int
+test_mempool_perf_allcores(void)
+{
+	return do_all_mempool_perf_tests(rte_lcore_count());
+}
+
+static int
+test_mempool_perf(void)
+{
+	int ret = -1;
 
-	if (do_one_mempool_test(mp_nocache, 2) < 0)
+	/* performance test with 1, 2 and max cores */
+	if (do_all_mempool_perf_tests(1) < 0)
 		goto err;
+	if (rte_lcore_count() == 1)
+		goto done;
 
-	if (do_one_mempool_test(mp_nocache, rte_lcore_count()) < 0)
+	if (do_all_mempool_perf_tests(2) < 0)
 		goto err;
+	if (rte_lcore_count() == 2)
+		goto done;
 
-	rte_mempool_list_dump(stdout);
+	if (do_all_mempool_perf_tests(rte_lcore_count()) < 0)
+		goto err;
 
+done:
 	ret = 0;
 
 err:
-	rte_mempool_free(mp_cache);
-	rte_mempool_free(mp_nocache);
-	rte_mempool_free(default_pool);
 	return ret;
 }
 
 REGISTER_PERF_TEST(mempool_perf_autotest, test_mempool_perf);
+REGISTER_PERF_TEST(mempool_perf_autotest_1core, test_mempool_perf_1core);
+REGISTER_PERF_TEST(mempool_perf_autotest_2cores, test_mempool_perf_2cores);
+REGISTER_PERF_TEST(mempool_perf_autotest_allcores, test_mempool_perf_allcores);
-- 
2.43.0

Thread overview: 26+ messages
2024-01-21  4:52 [PATCH] " Morten Brørup
2024-01-22  7:10 ` fengchengwen
2024-01-22 14:34 ` [PATCH v2] " Morten Brørup
2024-01-24  2:41   ` fengchengwen
2024-01-24  8:58 ` [PATCH v3] " Morten Brørup
2024-01-24  9:10 ` [PATCH v4] " Morten Brørup
2024-01-24 11:21 ` [PATCH v5] " Morten Brørup
2024-02-18 18:03   ` Thomas Monjalon
2024-02-20 13:49     ` Morten Brørup
2024-02-21 10:22       ` Thomas Monjalon
2024-02-21 10:38         ` Morten Brørup
2024-02-21 10:40           ` Bruce Richardson
2024-02-20 14:01 ` [PATCH v6] " Morten Brørup
2024-03-02 20:04 ` [PATCH v7] " Morten Brørup
2024-04-04  9:26   ` Morten Brørup
2024-06-10  8:56     ` Morten Brørup
2024-06-18 13:21       ` Bruce Richardson
2024-06-18 13:48         ` Morten Brørup
2024-09-13 14:58           ` Morten Brørup
2024-09-16 12:40             ` Thomas Monjalon
2024-09-16 13:08               ` Morten Brørup
2024-09-16 14:04                 ` Thomas Monjalon
2024-09-16 15:37 ` [PATCH v8] " Morten Brørup
2024-09-17  8:10 ` Morten Brørup [this message]
2024-10-08  9:14   ` [PATCH v9] " Morten Brørup
2024-10-11 12:50   ` David Marchand
