DPDK patches and discussions
 help / color / mirror / Atom feed
* [dpdk-dev] [PATCH 0/4] test/stack: improve multithreaded test
@ 2020-08-05 15:57 Steven Lariau
  2020-08-05 15:57 ` [dpdk-dev] [PATCH 1/4] test/stack: avoid trivial memory allocations Steven Lariau
                   ` (4 more replies)
  0 siblings, 5 replies; 20+ messages in thread
From: Steven Lariau @ 2020-08-05 15:57 UTC (permalink / raw)
  Cc: dev, honnappa.nagarahalli, dharmik.thakkar, nd, Steven Lariau

The current multithread DPDK stack test is using atomics operations to
share information between threads.
The lockfree stack implementation also uses atomic operations.
This is an issue for testing. The atomics operations for the test may
add some extra synchronization to the stack implementation,
that doesn't exist.
It makes it harder to find bugs related to memory orderings and data
races. The main goal of the patch is to remove all atomics operations
and any other form of data sharing in this test, to make sure that
most of the execution time is spent on the stack library.
    
Furthermore, this patch uses more appropriate functions to start /
wait cores in order to simplify the code.
The patch also adds code to propagate errors on any slave core to the
master.

Steven Lariau (4):
  test/stack: avoid trivial memory allocations
  test/stack: launch tests with mp remote launch API
  test/stack: propagate errors to main core
  test/stack: remove atomics operations

 app/test/test_stack.c | 71 ++++++++-----------------------------------
 1 file changed, 13 insertions(+), 58 deletions(-)

-- 
2.17.1


^ permalink raw reply	[flat|nested] 20+ messages in thread

* [dpdk-dev] [PATCH 1/4] test/stack: avoid trivial memory allocations
  2020-08-05 15:57 [dpdk-dev] [PATCH 0/4] test/stack: improve multithreaded test Steven Lariau
@ 2020-08-05 15:57 ` Steven Lariau
  2020-08-11 20:13   ` Eads, Gage
  2020-08-05 15:57 ` [dpdk-dev] [PATCH 2/4] test/stack: launch tests with mp remote launch API Steven Lariau
                   ` (3 subsequent siblings)
  4 siblings, 1 reply; 20+ messages in thread
From: Steven Lariau @ 2020-08-05 15:57 UTC (permalink / raw)
  To: Gage Eads, Olivier Matz
  Cc: dev, honnappa.nagarahalli, dharmik.thakkar, nd, Steven Lariau

Replace the arguments array by one argument.
All objects in the args array have the same values, so there is no need
to use an array, only one struct is enough.
The args object is a lot smaller, and the allocation can be replaced
with a stack variable.

The allocation of obj_table isn't needed either, because MAX_BULK is
small. The allocation can instead be replaced with a static array.

Signed-off-by: Steven Lariau <steven.lariau@arm.com>
Reviewed-by: Dharmik Thakkar <dharmik.thakkar@arm.com>
Reviewed-by: Phil Yang <phil.yang@arm.com>
Reviewed-by: Ruifeng Wang <ruifeng.wang@arm.com>
---
 app/test/test_stack.c | 39 ++++++---------------------------------
 1 file changed, 6 insertions(+), 33 deletions(-)

diff --git a/app/test/test_stack.c b/app/test/test_stack.c
index c8dac1f55..5a7273a7d 100644
--- a/app/test/test_stack.c
+++ b/app/test/test_stack.c
@@ -280,16 +280,9 @@ static int
 stack_thread_push_pop(void *args)
 {
 	struct test_args *t = args;
-	void **obj_table;
+	void *obj_table[MAX_BULK];
 	int i;
 
-	obj_table = rte_calloc(NULL, STACK_SIZE, sizeof(void *), 0);
-	if (obj_table == NULL) {
-		printf("[%s():%u] failed to calloc %zu bytes\n",
-		       __func__, __LINE__, STACK_SIZE * sizeof(void *));
-		return -1;
-	}
-
 	for (i = 0; i < NUM_ITERS_PER_THREAD; i++) {
 		unsigned int success, num;
 
@@ -310,28 +303,25 @@ stack_thread_push_pop(void *args)
 		if (rte_stack_push(t->s, obj_table, num) != num) {
 			printf("[%s():%u] Failed to push %u pointers\n",
 			       __func__, __LINE__, num);
-			rte_free(obj_table);
 			return -1;
 		}
 
 		if (rte_stack_pop(t->s, obj_table, num) != num) {
 			printf("[%s():%u] Failed to pop %u pointers\n",
 			       __func__, __LINE__, num);
-			rte_free(obj_table);
 			return -1;
 		}
 
 		rte_atomic64_sub(t->sz, num);
 	}
 
-	rte_free(obj_table);
 	return 0;
 }
 
 static int
 test_stack_multithreaded(uint32_t flags)
 {
-	struct test_args *args;
+	struct test_args args;
 	unsigned int lcore_id;
 	struct rte_stack *s;
 	rte_atomic64_t size;
@@ -344,45 +334,28 @@ test_stack_multithreaded(uint32_t flags)
 	printf("[%s():%u] Running with %u lcores\n",
 	       __func__, __LINE__, rte_lcore_count());
 
-	args = rte_malloc(NULL, sizeof(struct test_args) * RTE_MAX_LCORE, 0);
-	if (args == NULL) {
-		printf("[%s():%u] failed to malloc %zu bytes\n",
-		       __func__, __LINE__,
-		       sizeof(struct test_args) * RTE_MAX_LCORE);
-		return -1;
-	}
-
 	s = rte_stack_create("test", STACK_SIZE, rte_socket_id(), flags);
 	if (s == NULL) {
 		printf("[%s():%u] Failed to create a stack\n",
 		       __func__, __LINE__);
-		rte_free(args);
 		return -1;
 	}
 
 	rte_atomic64_init(&size);
+	args.s = s;
+	args.sz = &size;
 
 	RTE_LCORE_FOREACH_SLAVE(lcore_id) {
-		args[lcore_id].s = s;
-		args[lcore_id].sz = &size;
-
 		if (rte_eal_remote_launch(stack_thread_push_pop,
-					  &args[lcore_id], lcore_id))
+					  &args, lcore_id))
 			rte_panic("Failed to launch lcore %d\n", lcore_id);
 	}
 
-	lcore_id = rte_lcore_id();
-
-	args[lcore_id].s = s;
-	args[lcore_id].sz = &size;
-
-	stack_thread_push_pop(&args[lcore_id]);
+	stack_thread_push_pop(&args);
 
 	rte_eal_mp_wait_lcore();
 
 	rte_stack_free(s);
-	rte_free(args);
-
 	return 0;
 }
 
-- 
2.17.1


^ permalink raw reply	[flat|nested] 20+ messages in thread

* [dpdk-dev] [PATCH 2/4] test/stack: launch tests with mp remote launch API
  2020-08-05 15:57 [dpdk-dev] [PATCH 0/4] test/stack: improve multithreaded test Steven Lariau
  2020-08-05 15:57 ` [dpdk-dev] [PATCH 1/4] test/stack: avoid trivial memory allocations Steven Lariau
@ 2020-08-05 15:57 ` Steven Lariau
  2020-08-11 20:13   ` Eads, Gage
  2020-08-05 15:57 ` [dpdk-dev] [PATCH 3/4] test/stack: propagate errors to main core Steven Lariau
                   ` (2 subsequent siblings)
  4 siblings, 1 reply; 20+ messages in thread
From: Steven Lariau @ 2020-08-05 15:57 UTC (permalink / raw)
  To: Gage Eads, Olivier Matz
  Cc: dev, honnappa.nagarahalli, dharmik.thakkar, nd, Steven Lariau

All the cores use the same argument object, so there is no need to use
a loop to launch the test on every core one by one.
Replace loop with one call to rte_eal_mp_remote_launch


Signed-off-by: Steven Lariau <steven.lariau@arm.com>
Reviewed-by: Dharmik Thakkar <dharmik.thakkar@arm.com>
Reviewed-by: Phil Yang <phil.yang@arm.com>
Reviewed-by: Ruifeng Wang <ruifeng.wang@arm.com>
---
 app/test/test_stack.c | 11 ++---------
 1 file changed, 2 insertions(+), 9 deletions(-)

diff --git a/app/test/test_stack.c b/app/test/test_stack.c
index 5a7273a7d..c100d9faf 100644
--- a/app/test/test_stack.c
+++ b/app/test/test_stack.c
@@ -322,7 +322,6 @@ static int
 test_stack_multithreaded(uint32_t flags)
 {
 	struct test_args args;
-	unsigned int lcore_id;
 	struct rte_stack *s;
 	rte_atomic64_t size;
 
@@ -345,14 +344,8 @@ test_stack_multithreaded(uint32_t flags)
 	args.s = s;
 	args.sz = &size;
 
-	RTE_LCORE_FOREACH_SLAVE(lcore_id) {
-		if (rte_eal_remote_launch(stack_thread_push_pop,
-					  &args, lcore_id))
-			rte_panic("Failed to launch lcore %d\n", lcore_id);
-	}
-
-	stack_thread_push_pop(&args);
-
+	if (rte_eal_mp_remote_launch(stack_thread_push_pop, &args, CALL_MASTER))
+		rte_panic("Failed to launch tests\n");
 	rte_eal_mp_wait_lcore();
 
 	rte_stack_free(s);
-- 
2.17.1


^ permalink raw reply	[flat|nested] 20+ messages in thread

* [dpdk-dev] [PATCH 3/4] test/stack: propagate errors to main core
  2020-08-05 15:57 [dpdk-dev] [PATCH 0/4] test/stack: improve multithreaded test Steven Lariau
  2020-08-05 15:57 ` [dpdk-dev] [PATCH 1/4] test/stack: avoid trivial memory allocations Steven Lariau
  2020-08-05 15:57 ` [dpdk-dev] [PATCH 2/4] test/stack: launch tests with mp remote launch API Steven Lariau
@ 2020-08-05 15:57 ` Steven Lariau
  2020-08-11 20:13   ` Eads, Gage
  2020-08-05 15:57 ` [dpdk-dev] [PATCH 4/4] test/stack: remove atomics operations Steven Lariau
  2020-08-12 19:18 ` [dpdk-dev] [PATCH v2 0/4] test/stack: improve multithreaded test Steven Lariau
  4 siblings, 1 reply; 20+ messages in thread
From: Steven Lariau @ 2020-08-05 15:57 UTC (permalink / raw)
  To: Gage Eads, Olivier Matz
  Cc: dev, honnappa.nagarahalli, dharmik.thakkar, nd, Steven Lariau

Use rte_eal_wait_lcore to wait and get the return value for all cores.
This is used to propagate any error to the main core.

Signed-off-by: Steven Lariau <steven.lariau@arm.com>
Reviewed-by: Dharmik Thakkar <dharmik.thakkar@arm.com>
Reviewed-by: Phil Yang <phil.yang@arm.com>
Reviewed-by: Ruifeng Wang <ruifeng.wang@arm.com>
---
 app/test/test_stack.c | 10 ++++++++--
 1 file changed, 8 insertions(+), 2 deletions(-)

diff --git a/app/test/test_stack.c b/app/test/test_stack.c
index c100d9faf..b9d6befd4 100644
--- a/app/test/test_stack.c
+++ b/app/test/test_stack.c
@@ -322,8 +322,10 @@ static int
 test_stack_multithreaded(uint32_t flags)
 {
 	struct test_args args;
+	unsigned int lcore_id;
 	struct rte_stack *s;
 	rte_atomic64_t size;
+	int result = 0;
 
 	if (rte_lcore_count() < 2) {
 		printf("Not enough cores for test_stack_multithreaded, expecting at least 2\n");
@@ -346,10 +348,14 @@ test_stack_multithreaded(uint32_t flags)
 
 	if (rte_eal_mp_remote_launch(stack_thread_push_pop, &args, CALL_MASTER))
 		rte_panic("Failed to launch tests\n");
-	rte_eal_mp_wait_lcore();
+
+	RTE_LCORE_FOREACH(lcore_id) {
+		if (rte_eal_wait_lcore(lcore_id) < 0)
+			result = -1;
+	}
 
 	rte_stack_free(s);
-	return 0;
+	return result;
 }
 
 static int
-- 
2.17.1


^ permalink raw reply	[flat|nested] 20+ messages in thread

* [dpdk-dev] [PATCH 4/4] test/stack: remove atomics operations
  2020-08-05 15:57 [dpdk-dev] [PATCH 0/4] test/stack: improve multithreaded test Steven Lariau
                   ` (2 preceding siblings ...)
  2020-08-05 15:57 ` [dpdk-dev] [PATCH 3/4] test/stack: propagate errors to main core Steven Lariau
@ 2020-08-05 15:57 ` Steven Lariau
  2020-08-11 20:13   ` Eads, Gage
  2020-08-12 19:18 ` [dpdk-dev] [PATCH v2 0/4] test/stack: improve multithreaded test Steven Lariau
  4 siblings, 1 reply; 20+ messages in thread
From: Steven Lariau @ 2020-08-05 15:57 UTC (permalink / raw)
  To: Gage Eads, Olivier Matz
  Cc: dev, honnappa.nagarahalli, dharmik.thakkar, nd, Steven Lariau

Remove the part that checks if there is enough room in the stack, it's
always true as long as size of stack >= MAX_BULK*rte_lcore_count().
This check used an atomic cmpset, and read / write to a shared size
variable. These operations result in some form of synchronization
that might get in the way of the actual stack testing.

Signed-off-by: Steven Lariau <steven.lariau@arm.com>
Reviewed-by: Dharmik Thakkar <dharmik.thakkar@arm.com>
Reviewed-by: Phil Yang <phil.yang@arm.com>
Reviewed-by: Ruifeng Wang <ruifeng.wang@arm.com>
---
 app/test/test_stack.c | 23 +++--------------------
 1 file changed, 3 insertions(+), 20 deletions(-)

diff --git a/app/test/test_stack.c b/app/test/test_stack.c
index b9d6befd4..02bea7ec5 100644
--- a/app/test/test_stack.c
+++ b/app/test/test_stack.c
@@ -273,7 +273,6 @@ test_free_null(void)
 
 struct test_args {
 	struct rte_stack *s;
-	rte_atomic64_t *sz;
 };
 
 static int
@@ -284,21 +283,9 @@ stack_thread_push_pop(void *args)
 	int i;
 
 	for (i = 0; i < NUM_ITERS_PER_THREAD; i++) {
-		unsigned int success, num;
+		unsigned int num;
 
-		/* Reserve up to min(MAX_BULK, available slots) stack entries,
-		 * then push and pop those stack entries.
-		 */
-		do {
-			uint64_t sz = rte_atomic64_read(t->sz);
-			volatile uint64_t *sz_addr;
-
-			sz_addr = (volatile uint64_t *)t->sz;
-
-			num = RTE_MIN(rte_rand() % MAX_BULK, STACK_SIZE - sz);
-
-			success = rte_atomic64_cmpset(sz_addr, sz, sz + num);
-		} while (success == 0);
+		num = rte_rand() % MAX_BULK;
 
 		if (rte_stack_push(t->s, obj_table, num) != num) {
 			printf("[%s():%u] Failed to push %u pointers\n",
@@ -312,7 +299,6 @@ stack_thread_push_pop(void *args)
 			return -1;
 		}
 
-		rte_atomic64_sub(t->sz, num);
 	}
 
 	return 0;
@@ -324,7 +310,6 @@ test_stack_multithreaded(uint32_t flags)
 	struct test_args args;
 	unsigned int lcore_id;
 	struct rte_stack *s;
-	rte_atomic64_t size;
 	int result = 0;
 
 	if (rte_lcore_count() < 2) {
@@ -335,16 +320,14 @@ test_stack_multithreaded(uint32_t flags)
 	printf("[%s():%u] Running with %u lcores\n",
 	       __func__, __LINE__, rte_lcore_count());
 
-	s = rte_stack_create("test", STACK_SIZE, rte_socket_id(), flags);
+	s = rte_stack_create("test", MAX_BULK * rte_lcore_count(), rte_socket_id(), flags);
 	if (s == NULL) {
 		printf("[%s():%u] Failed to create a stack\n",
 		       __func__, __LINE__);
 		return -1;
 	}
 
-	rte_atomic64_init(&size);
 	args.s = s;
-	args.sz = &size;
 
 	if (rte_eal_mp_remote_launch(stack_thread_push_pop, &args, CALL_MASTER))
 		rte_panic("Failed to launch tests\n");
-- 
2.17.1


^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [dpdk-dev] [PATCH 1/4] test/stack: avoid trivial memory allocations
  2020-08-05 15:57 ` [dpdk-dev] [PATCH 1/4] test/stack: avoid trivial memory allocations Steven Lariau
@ 2020-08-11 20:13   ` Eads, Gage
  2020-08-11 20:38     ` Stephen Hemminger
  0 siblings, 1 reply; 20+ messages in thread
From: Eads, Gage @ 2020-08-11 20:13 UTC (permalink / raw)
  To: Steven Lariau, Olivier Matz
  Cc: dev, honnappa.nagarahalli, dharmik.thakkar, nd

Hi Steven,

> -----Original Message-----
> From: Steven Lariau <steven.lariau@arm.com>
> Sent: Wednesday, August 5, 2020 10:57 AM
> To: Eads, Gage <gage.eads@intel.com>; Olivier Matz
> <olivier.matz@6wind.com>
> Cc: dev@dpdk.org; honnappa.nagarahalli@arm.com;
> dharmik.thakkar@arm.com; nd@arm.com; Steven Lariau
> <steven.lariau@arm.com>
> Subject: [PATCH 1/4] test/stack: avoid trivial memory allocations
> 
> Replace the arguments array by one argument.
> All objects in the args array have the same values, so there is no need
> to use an array, only one struct is enough.
> The args object is a lot smaller, and the allocation can be replaced
> with a stack variable.
> 
> The allocation of obj_table isn't needed either, because MAX_BULK is
> small. The allocation can instead be replaced with a static array.
> 
> Signed-off-by: Steven Lariau <steven.lariau@arm.com>
> Reviewed-by: Dharmik Thakkar <dharmik.thakkar@arm.com>
> Reviewed-by: Phil Yang <phil.yang@arm.com>
> Reviewed-by: Ruifeng Wang <ruifeng.wang@arm.com>
> ---
>  app/test/test_stack.c | 39 ++++++---------------------------------
>  1 file changed, 6 insertions(+), 33 deletions(-)
> 
> diff --git a/app/test/test_stack.c b/app/test/test_stack.c
> index c8dac1f55..5a7273a7d 100644
> --- a/app/test/test_stack.c
> +++ b/app/test/test_stack.c
> @@ -280,16 +280,9 @@ static int
>  stack_thread_push_pop(void *args)
>  {
>  	struct test_args *t = args;
> -	void **obj_table;
> +	void *obj_table[MAX_BULK];
>  	int i;
> 
> -	obj_table = rte_calloc(NULL, STACK_SIZE, sizeof(void *), 0);
> -	if (obj_table == NULL) {
> -		printf("[%s():%u] failed to calloc %zu bytes\n",
> -		       __func__, __LINE__, STACK_SIZE * sizeof(void *));
> -		return -1;
> -	}
> -
>  	for (i = 0; i < NUM_ITERS_PER_THREAD; i++) {
>  		unsigned int success, num;
> 
> @@ -310,28 +303,25 @@ stack_thread_push_pop(void *args)
>  		if (rte_stack_push(t->s, obj_table, num) != num) {
>  			printf("[%s():%u] Failed to push %u pointers\n",
>  			       __func__, __LINE__, num);
> -			rte_free(obj_table);
>  			return -1;
>  		}
> 
>  		if (rte_stack_pop(t->s, obj_table, num) != num) {
>  			printf("[%s():%u] Failed to pop %u pointers\n",
>  			       __func__, __LINE__, num);
> -			rte_free(obj_table);
>  			return -1;
>  		}
> 
>  		rte_atomic64_sub(t->sz, num);
>  	}
> 
> -	rte_free(obj_table);
>  	return 0;
>  }

Agreed, the dynamic allocation is unnecessary.

> 
>  static int
>  test_stack_multithreaded(uint32_t flags)
>  {
> -	struct test_args *args;
> +	struct test_args args;
>  	unsigned int lcore_id;
>  	struct rte_stack *s;
>  	rte_atomic64_t size;
> @@ -344,45 +334,28 @@ test_stack_multithreaded(uint32_t flags)
>  	printf("[%s():%u] Running with %u lcores\n",
>  	       __func__, __LINE__, rte_lcore_count());
> 
> -	args = rte_malloc(NULL, sizeof(struct test_args) * RTE_MAX_LCORE,
> 0);
> -	if (args == NULL) {
> -		printf("[%s():%u] failed to malloc %zu bytes\n",
> -		       __func__, __LINE__,
> -		       sizeof(struct test_args) * RTE_MAX_LCORE);
> -		return -1;
> -	}
> -
>  	s = rte_stack_create("test", STACK_SIZE, rte_socket_id(), flags);
>  	if (s == NULL) {
>  		printf("[%s():%u] Failed to create a stack\n",
>  		       __func__, __LINE__);
> -		rte_free(args);
>  		return -1;
>  	}
> 
>  	rte_atomic64_init(&size);
> +	args.s = s;
> +	args.sz = &size;
> 
>  	RTE_LCORE_FOREACH_SLAVE(lcore_id) {
> -		args[lcore_id].s = s;
> -		args[lcore_id].sz = &size;
> -
>  		if (rte_eal_remote_launch(stack_thread_push_pop,
> -					  &args[lcore_id], lcore_id))
> +					  &args, lcore_id))
>  			rte_panic("Failed to launch lcore %d\n", lcore_id);
>  	}


In general we shouldn't pass a stack variable to other threads. Though your
code here looks fine, I'd rather err on the safe side in case this is ever used
as a template/basis for some other code...particularly since there's no
performance/correctness/etc. penalty to using dynamically allocated memory.

To support patch 2/4, you can instead convert the rte_malloc to allocate a
single shared test_args structure. Or perhaps move patch 4 earlier in the series,
and simply pass the stack pointer instead.

Thanks,
Gage

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [dpdk-dev] [PATCH 2/4] test/stack: launch tests with mp remote launch API
  2020-08-05 15:57 ` [dpdk-dev] [PATCH 2/4] test/stack: launch tests with mp remote launch API Steven Lariau
@ 2020-08-11 20:13   ` Eads, Gage
  0 siblings, 0 replies; 20+ messages in thread
From: Eads, Gage @ 2020-08-11 20:13 UTC (permalink / raw)
  To: Steven Lariau, Olivier Matz
  Cc: dev, honnappa.nagarahalli, dharmik.thakkar, nd

> -----Original Message-----
> From: Steven Lariau <steven.lariau@arm.com>
> Sent: Wednesday, August 5, 2020 10:57 AM
> To: Eads, Gage <gage.eads@intel.com>; Olivier Matz
> <olivier.matz@6wind.com>
> Cc: dev@dpdk.org; honnappa.nagarahalli@arm.com;
> dharmik.thakkar@arm.com; nd@arm.com; Steven Lariau
> <steven.lariau@arm.com>
> Subject: [PATCH 2/4] test/stack: launch tests with mp remote launch API
> 
> All the cores use the same argument object, so there is no need to use
> a loop to launch the test on every core one by one.
> Replace loop with one call to rte_eal_mp_remote_launch
> 
> 
> Signed-off-by: Steven Lariau <steven.lariau@arm.com>
> Reviewed-by: Dharmik Thakkar <dharmik.thakkar@arm.com>
> Reviewed-by: Phil Yang <phil.yang@arm.com>
> Reviewed-by: Ruifeng Wang <ruifeng.wang@arm.com>

Acked-by: Gage Eads <gage.eads@intel.com>

Thanks,
Gage

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [dpdk-dev] [PATCH 3/4] test/stack: propagate errors to main core
  2020-08-05 15:57 ` [dpdk-dev] [PATCH 3/4] test/stack: propagate errors to main core Steven Lariau
@ 2020-08-11 20:13   ` Eads, Gage
  0 siblings, 0 replies; 20+ messages in thread
From: Eads, Gage @ 2020-08-11 20:13 UTC (permalink / raw)
  To: Steven Lariau, Olivier Matz
  Cc: dev, honnappa.nagarahalli, dharmik.thakkar, nd



> -----Original Message-----
> From: Steven Lariau <steven.lariau@arm.com>
> Sent: Wednesday, August 5, 2020 10:57 AM
> To: Eads, Gage <gage.eads@intel.com>; Olivier Matz
> <olivier.matz@6wind.com>
> Cc: dev@dpdk.org; honnappa.nagarahalli@arm.com;
> dharmik.thakkar@arm.com; nd@arm.com; Steven Lariau
> <steven.lariau@arm.com>
> Subject: [PATCH 3/4] test/stack: propagate errors to main core
> 
> Use rte_eal_wait_lcore to wait and get the return value for all cores.
> This is used to propagate any error to the main core.
> 
> Signed-off-by: Steven Lariau <steven.lariau@arm.com>
> Reviewed-by: Dharmik Thakkar <dharmik.thakkar@arm.com>
> Reviewed-by: Phil Yang <phil.yang@arm.com>
> Reviewed-by: Ruifeng Wang <ruifeng.wang@arm.com>

Acked-by: Gage Eads <gage.eads@intel.com>

Thanks,
Gage

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [dpdk-dev] [PATCH 4/4] test/stack: remove atomics operations
  2020-08-05 15:57 ` [dpdk-dev] [PATCH 4/4] test/stack: remove atomics operations Steven Lariau
@ 2020-08-11 20:13   ` Eads, Gage
  0 siblings, 0 replies; 20+ messages in thread
From: Eads, Gage @ 2020-08-11 20:13 UTC (permalink / raw)
  To: Steven Lariau, Olivier Matz
  Cc: dev, honnappa.nagarahalli, dharmik.thakkar, nd



> -----Original Message-----
> From: Steven Lariau <steven.lariau@arm.com>
> Sent: Wednesday, August 5, 2020 10:57 AM
> To: Eads, Gage <gage.eads@intel.com>; Olivier Matz
> <olivier.matz@6wind.com>
> Cc: dev@dpdk.org; honnappa.nagarahalli@arm.com;
> dharmik.thakkar@arm.com; nd@arm.com; Steven Lariau
> <steven.lariau@arm.com>
> Subject: [PATCH 4/4] test/stack: remove atomics operations
> 
> Remove the part that checks if there is enough room in the stack, it's
> always true as long as size of stack >= MAX_BULK*rte_lcore_count().
> This check used an atomic cmpset, and read / write to a shared size
> variable. These operations result in some form of synchronization
> that might get in the way of the actual stack testing.
> 
> Signed-off-by: Steven Lariau <steven.lariau@arm.com>
> Reviewed-by: Dharmik Thakkar <dharmik.thakkar@arm.com>
> Reviewed-by: Phil Yang <phil.yang@arm.com>
> Reviewed-by: Ruifeng Wang <ruifeng.wang@arm.com>

Nice simplification, and thanks for the good contributions to these tests.

Acked-by: Gage Eads <gage.eads@intel.com

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [dpdk-dev] [PATCH 1/4] test/stack: avoid trivial memory allocations
  2020-08-11 20:13   ` Eads, Gage
@ 2020-08-11 20:38     ` Stephen Hemminger
  2020-08-11 20:49       ` Honnappa Nagarahalli
  0 siblings, 1 reply; 20+ messages in thread
From: Stephen Hemminger @ 2020-08-11 20:38 UTC (permalink / raw)
  To: Eads, Gage
  Cc: Steven Lariau, Olivier Matz, dev, honnappa.nagarahalli,
	dharmik.thakkar, nd

On Tue, 11 Aug 2020 20:13:24 +0000
"Eads, Gage" <gage.eads@intel.com> wrote:

> Hi Steven,
> 
> > -----Original Message-----
> > From: Steven Lariau <steven.lariau@arm.com>
> > Sent: Wednesday, August 5, 2020 10:57 AM
> > To: Eads, Gage <gage.eads@intel.com>; Olivier Matz
> > <olivier.matz@6wind.com>
> > Cc: dev@dpdk.org; honnappa.nagarahalli@arm.com;
> > dharmik.thakkar@arm.com; nd@arm.com; Steven Lariau
> > <steven.lariau@arm.com>
> > Subject: [PATCH 1/4] test/stack: avoid trivial memory allocations
> > 
> > Replace the arguments array by one argument.
> > All objects in the args array have the same values, so there is no need
> > to use an array, only one struct is enough.
> > The args object is a lot smaller, and the allocation can be replaced
> > with a stack variable.
> > 
> > The allocation of obj_table isn't needed either, because MAX_BULK is
> > small. The allocation can instead be replaced with a static array.
> > 
> > Signed-off-by: Steven Lariau <steven.lariau@arm.com>
> > Reviewed-by: Dharmik Thakkar <dharmik.thakkar@arm.com>
> > Reviewed-by: Phil Yang <phil.yang@arm.com>
> > Reviewed-by: Ruifeng Wang <ruifeng.wang@arm.com>
> > ---
> >  app/test/test_stack.c | 39 ++++++---------------------------------
> >  1 file changed, 6 insertions(+), 33 deletions(-)
> > 
> > diff --git a/app/test/test_stack.c b/app/test/test_stack.c
> > index c8dac1f55..5a7273a7d 100644
> > --- a/app/test/test_stack.c
> > +++ b/app/test/test_stack.c
> > @@ -280,16 +280,9 @@ static int
> >  stack_thread_push_pop(void *args)
> >  {
> >  	struct test_args *t = args;
> > -	void **obj_table;
> > +	void *obj_table[MAX_BULK];
> >  	int i;
> > 
> > -	obj_table = rte_calloc(NULL, STACK_SIZE, sizeof(void *), 0);
> > -	if (obj_table == NULL) {
> > -		printf("[%s():%u] failed to calloc %zu bytes\n",
> > -		       __func__, __LINE__, STACK_SIZE * sizeof(void *));
> > -		return -1;
> > -	}
> > -
> >  	for (i = 0; i < NUM_ITERS_PER_THREAD; i++) {
> >  		unsigned int success, num;
> > 
> > @@ -310,28 +303,25 @@ stack_thread_push_pop(void *args)
> >  		if (rte_stack_push(t->s, obj_table, num) != num) {
> >  			printf("[%s():%u] Failed to push %u pointers\n",
> >  			       __func__, __LINE__, num);
> > -			rte_free(obj_table);
> >  			return -1;
> >  		}
> > 
> >  		if (rte_stack_pop(t->s, obj_table, num) != num) {
> >  			printf("[%s():%u] Failed to pop %u pointers\n",
> >  			       __func__, __LINE__, num);
> > -			rte_free(obj_table);
> >  			return -1;
> >  		}
> > 
> >  		rte_atomic64_sub(t->sz, num);
> >  	}
> > 
> > -	rte_free(obj_table);
> >  	return 0;
> >  }  
> 
> Agreed, the dynamic allocation is unnecessary.
> 
> > 
> >  static int
> >  test_stack_multithreaded(uint32_t flags)
> >  {
> > -	struct test_args *args;
> > +	struct test_args args;
> >  	unsigned int lcore_id;
> >  	struct rte_stack *s;
> >  	rte_atomic64_t size;
> > @@ -344,45 +334,28 @@ test_stack_multithreaded(uint32_t flags)
> >  	printf("[%s():%u] Running with %u lcores\n",
> >  	       __func__, __LINE__, rte_lcore_count());
> > 
> > -	args = rte_malloc(NULL, sizeof(struct test_args) * RTE_MAX_LCORE,
> > 0);
> > -	if (args == NULL) {
> > -		printf("[%s():%u] failed to malloc %zu bytes\n",
> > -		       __func__, __LINE__,
> > -		       sizeof(struct test_args) * RTE_MAX_LCORE);
> > -		return -1;
> > -	}
> > -
> >  	s = rte_stack_create("test", STACK_SIZE, rte_socket_id(), flags);
> >  	if (s == NULL) {
> >  		printf("[%s():%u] Failed to create a stack\n",
> >  		       __func__, __LINE__);
> > -		rte_free(args);
> >  		return -1;
> >  	}
> > 
> >  	rte_atomic64_init(&size);
> > +	args.s = s;
> > +	args.sz = &size;
> > 
> >  	RTE_LCORE_FOREACH_SLAVE(lcore_id) {
> > -		args[lcore_id].s = s;
> > -		args[lcore_id].sz = &size;
> > -
> >  		if (rte_eal_remote_launch(stack_thread_push_pop,
> > -					  &args[lcore_id], lcore_id))
> > +					  &args, lcore_id))
> >  			rte_panic("Failed to launch lcore %d\n", lcore_id);
> >  	}  
> 
> 
> In general we shouldn't pass a stack variable to other threads. Though your
> code here looks fine, I'd rather err on the safe side in case this is ever used
> as a template/basis for some other code...particularly since there's no
> performance/correctness/etc. penalty to using dynamically allocated memory.
> 
> To support patch 2/4, you can instead convert the rte_malloc to allocate a
> single shared test_args structure. Or perhaps move patch 4 earlier in the series,
> and simply pass the stack pointer instead.
> 
> Thanks,
> Gage

There is no gain to using rte_malloc unless you are doing primary/secondary process
or trying to test rte_malloc. Why not use regular malloc which has good tools and library support.

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [dpdk-dev] [PATCH 1/4] test/stack: avoid trivial memory allocations
  2020-08-11 20:38     ` Stephen Hemminger
@ 2020-08-11 20:49       ` Honnappa Nagarahalli
  2020-08-11 21:14         ` Eads, Gage
  0 siblings, 1 reply; 20+ messages in thread
From: Honnappa Nagarahalli @ 2020-08-11 20:49 UTC (permalink / raw)
  To: Stephen Hemminger, Eads, Gage
  Cc: Steven Lariau, Olivier Matz, dev, Dharmik Thakkar, nd,
	Honnappa Nagarahalli, nd

<snip>

> > >
> > > Replace the arguments array by one argument.
> > > All objects in the args array have the same values, so there is no
> > > need to use an array, only one struct is enough.
> > > The args object is a lot smaller, and the allocation can be replaced
> > > with a stack variable.
> > >
> > > The allocation of obj_table isn't needed either, because MAX_BULK is
> > > small. The allocation can instead be replaced with a static array.
> > >
> > > Signed-off-by: Steven Lariau <steven.lariau@arm.com>
> > > Reviewed-by: Dharmik Thakkar <dharmik.thakkar@arm.com>
> > > Reviewed-by: Phil Yang <phil.yang@arm.com>
> > > Reviewed-by: Ruifeng Wang <ruifeng.wang@arm.com>
> > > ---
> > >  app/test/test_stack.c | 39 ++++++---------------------------------
> > >  1 file changed, 6 insertions(+), 33 deletions(-)
> > >
> > > diff --git a/app/test/test_stack.c b/app/test/test_stack.c index
> > > c8dac1f55..5a7273a7d 100644
> > > --- a/app/test/test_stack.c
> > > +++ b/app/test/test_stack.c
> > > @@ -280,16 +280,9 @@ static int
> > >  stack_thread_push_pop(void *args)
> > >  {
> > >  	struct test_args *t = args;
> > > -	void **obj_table;
> > > +	void *obj_table[MAX_BULK];
> > >  	int i;
> > >
> > > -	obj_table = rte_calloc(NULL, STACK_SIZE, sizeof(void *), 0);
> > > -	if (obj_table == NULL) {
> > > -		printf("[%s():%u] failed to calloc %zu bytes\n",
> > > -		       __func__, __LINE__, STACK_SIZE * sizeof(void *));
> > > -		return -1;
> > > -	}
> > > -
> > >  	for (i = 0; i < NUM_ITERS_PER_THREAD; i++) {
> > >  		unsigned int success, num;
> > >
> > > @@ -310,28 +303,25 @@ stack_thread_push_pop(void *args)
> > >  		if (rte_stack_push(t->s, obj_table, num) != num) {
> > >  			printf("[%s():%u] Failed to push %u pointers\n",
> > >  			       __func__, __LINE__, num);
> > > -			rte_free(obj_table);
> > >  			return -1;
> > >  		}
> > >
> > >  		if (rte_stack_pop(t->s, obj_table, num) != num) {
> > >  			printf("[%s():%u] Failed to pop %u pointers\n",
> > >  			       __func__, __LINE__, num);
> > > -			rte_free(obj_table);
> > >  			return -1;
> > >  		}
> > >
> > >  		rte_atomic64_sub(t->sz, num);
> > >  	}
> > >
> > > -	rte_free(obj_table);
> > >  	return 0;
> > >  }
> >
> > Agreed, the dynamic allocation is unnecessary.
> >
> > >
> > >  static int
> > >  test_stack_multithreaded(uint32_t flags)  {
> > > -	struct test_args *args;
> > > +	struct test_args args;
> > >  	unsigned int lcore_id;
> > >  	struct rte_stack *s;
> > >  	rte_atomic64_t size;
> > > @@ -344,45 +334,28 @@ test_stack_multithreaded(uint32_t flags)
> > >  	printf("[%s():%u] Running with %u lcores\n",
> > >  	       __func__, __LINE__, rte_lcore_count());
> > >
> > > -	args = rte_malloc(NULL, sizeof(struct test_args) * RTE_MAX_LCORE,
> > > 0);
> > > -	if (args == NULL) {
> > > -		printf("[%s():%u] failed to malloc %zu bytes\n",
> > > -		       __func__, __LINE__,
> > > -		       sizeof(struct test_args) * RTE_MAX_LCORE);
> > > -		return -1;
> > > -	}
> > > -
> > >  	s = rte_stack_create("test", STACK_SIZE, rte_socket_id(), flags);
> > >  	if (s == NULL) {
> > >  		printf("[%s():%u] Failed to create a stack\n",
> > >  		       __func__, __LINE__);
> > > -		rte_free(args);
> > >  		return -1;
> > >  	}
> > >
> > >  	rte_atomic64_init(&size);
> > > +	args.s = s;
> > > +	args.sz = &size;
> > >
> > >  	RTE_LCORE_FOREACH_SLAVE(lcore_id) {
> > > -		args[lcore_id].s = s;
> > > -		args[lcore_id].sz = &size;
> > > -
> > >  		if (rte_eal_remote_launch(stack_thread_push_pop,
> > > -					  &args[lcore_id], lcore_id))
> > > +					  &args, lcore_id))
> > >  			rte_panic("Failed to launch lcore %d\n", lcore_id);
> > >  	}
> >
> >
> > In general we shouldn't pass a stack variable to other threads. Though
> > your code here looks fine, I'd rather err on the safe side in case
> > this is ever used as a template/basis for some other
> > code...particularly since there's no performance/correctness/etc. penalty to
> using dynamically allocated memory.
> >
> > To support patch 2/4, you can instead convert the rte_malloc to
> > allocate a single shared test_args structure. Or perhaps move patch 4
> > earlier in the series, and simply pass the stack pointer instead.
> >
> > Thanks,
> > Gage
> 
> There is no gain to using rte_malloc unless you are doing primary/secondary
> process or trying to test rte_malloc. Why not use regular malloc which has
> good tools and library support.

I think making 'args' a global variable is enough in this case.

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [dpdk-dev] [PATCH 1/4] test/stack: avoid trivial memory allocations
  2020-08-11 20:49       ` Honnappa Nagarahalli
@ 2020-08-11 21:14         ` Eads, Gage
  0 siblings, 0 replies; 20+ messages in thread
From: Eads, Gage @ 2020-08-11 21:14 UTC (permalink / raw)
  To: Honnappa Nagarahalli, Stephen Hemminger
  Cc: Steven Lariau, Olivier Matz, dev, Dharmik Thakkar, nd, nd



> -----Original Message-----
> From: Honnappa Nagarahalli <Honnappa.Nagarahalli@arm.com>
> Sent: Tuesday, August 11, 2020 3:50 PM
> To: Stephen Hemminger <stephen@networkplumber.org>; Eads, Gage
> <gage.eads@intel.com>
> Cc: Steven Lariau <Steven.Lariau@arm.com>; Olivier Matz
> <olivier.matz@6wind.com>; dev@dpdk.org; Dharmik Thakkar
> <Dharmik.Thakkar@arm.com>; nd <nd@arm.com>; Honnappa Nagarahalli
> <Honnappa.Nagarahalli@arm.com>; nd <nd@arm.com>
> Subject: RE: [dpdk-dev] [PATCH 1/4] test/stack: avoid trivial memory
> allocations
> 
> <snip>
> 
> > > >
> > > > Replace the arguments array by one argument.
> > > > All objects in the args array have the same values, so there is no
> > > > need to use an array, only one struct is enough.
> > > > The args object is a lot smaller, and the allocation can be replaced
> > > > with a stack variable.
> > > >
> > > > The allocation of obj_table isn't needed either, because MAX_BULK is
> > > > small. The allocation can instead be replaced with a static array.
> > > >
> > > > Signed-off-by: Steven Lariau <steven.lariau@arm.com>
> > > > Reviewed-by: Dharmik Thakkar <dharmik.thakkar@arm.com>
> > > > Reviewed-by: Phil Yang <phil.yang@arm.com>
> > > > Reviewed-by: Ruifeng Wang <ruifeng.wang@arm.com>
> > > > ---
> > > >  app/test/test_stack.c | 39 ++++++---------------------------------
> > > >  1 file changed, 6 insertions(+), 33 deletions(-)
> > > >
> > > > diff --git a/app/test/test_stack.c b/app/test/test_stack.c index
> > > > c8dac1f55..5a7273a7d 100644
> > > > --- a/app/test/test_stack.c
> > > > +++ b/app/test/test_stack.c
> > > > @@ -280,16 +280,9 @@ static int
> > > >  stack_thread_push_pop(void *args)
> > > >  {
> > > >  	struct test_args *t = args;
> > > > -	void **obj_table;
> > > > +	void *obj_table[MAX_BULK];
> > > >  	int i;
> > > >
> > > > -	obj_table = rte_calloc(NULL, STACK_SIZE, sizeof(void *), 0);
> > > > -	if (obj_table == NULL) {
> > > > -		printf("[%s():%u] failed to calloc %zu bytes\n",
> > > > -		       __func__, __LINE__, STACK_SIZE * sizeof(void *));
> > > > -		return -1;
> > > > -	}
> > > > -
> > > >  	for (i = 0; i < NUM_ITERS_PER_THREAD; i++) {
> > > >  		unsigned int success, num;
> > > >
> > > > @@ -310,28 +303,25 @@ stack_thread_push_pop(void *args)
> > > >  		if (rte_stack_push(t->s, obj_table, num) != num) {
> > > >  			printf("[%s():%u] Failed to push %u pointers\n",
> > > >  			       __func__, __LINE__, num);
> > > > -			rte_free(obj_table);
> > > >  			return -1;
> > > >  		}
> > > >
> > > >  		if (rte_stack_pop(t->s, obj_table, num) != num) {
> > > >  			printf("[%s():%u] Failed to pop %u pointers\n",
> > > >  			       __func__, __LINE__, num);
> > > > -			rte_free(obj_table);
> > > >  			return -1;
> > > >  		}
> > > >
> > > >  		rte_atomic64_sub(t->sz, num);
> > > >  	}
> > > >
> > > > -	rte_free(obj_table);
> > > >  	return 0;
> > > >  }
> > >
> > > Agreed, the dynamic allocation is unnecessary.
> > >
> > > >
> > > >  static int
> > > >  test_stack_multithreaded(uint32_t flags)  {
> > > > -	struct test_args *args;
> > > > +	struct test_args args;
> > > >  	unsigned int lcore_id;
> > > >  	struct rte_stack *s;
> > > >  	rte_atomic64_t size;
> > > > @@ -344,45 +334,28 @@ test_stack_multithreaded(uint32_t flags)
> > > >  	printf("[%s():%u] Running with %u lcores\n",
> > > >  	       __func__, __LINE__, rte_lcore_count());
> > > >
> > > > -	args = rte_malloc(NULL, sizeof(struct test_args) * RTE_MAX_LCORE,
> > > > 0);
> > > > -	if (args == NULL) {
> > > > -		printf("[%s():%u] failed to malloc %zu bytes\n",
> > > > -		       __func__, __LINE__,
> > > > -		       sizeof(struct test_args) * RTE_MAX_LCORE);
> > > > -		return -1;
> > > > -	}
> > > > -
> > > >  	s = rte_stack_create("test", STACK_SIZE, rte_socket_id(), flags);
> > > >  	if (s == NULL) {
> > > >  		printf("[%s():%u] Failed to create a stack\n",
> > > >  		       __func__, __LINE__);
> > > > -		rte_free(args);
> > > >  		return -1;
> > > >  	}
> > > >
> > > >  	rte_atomic64_init(&size);
> > > > +	args.s = s;
> > > > +	args.sz = &size;
> > > >
> > > >  	RTE_LCORE_FOREACH_SLAVE(lcore_id) {
> > > > -		args[lcore_id].s = s;
> > > > -		args[lcore_id].sz = &size;
> > > > -
> > > >  		if (rte_eal_remote_launch(stack_thread_push_pop,
> > > > -					  &args[lcore_id], lcore_id))
> > > > +					  &args, lcore_id))
> > > >  			rte_panic("Failed to launch lcore %d\n", lcore_id);
> > > >  	}
> > >
> > >
> > > In general we shouldn't pass a stack variable to other threads. Though
> > > your code here looks fine, I'd rather err on the safe side in case
> > > this is ever used as a template/basis for some other
> > > code...particularly since there's no performance/correctness/etc.
> penalty to
> > using dynamically allocated memory.
> > >
> > > To support patch 2/4, you can instead convert the rte_malloc to
> > > allocate a single shared test_args structure. Or perhaps move patch 4
> > > earlier in the series, and simply pass the stack pointer instead.
> > >
> > > Thanks,
> > > Gage
> >
> > There is no gain to using rte_malloc unless you are doing
> primary/secondary
> > process or trying to test rte_malloc. Why not use regular malloc which has
> > good tools and library support.
> 
> I think making 'args' a global variable is enough in this case.

Agreed.

Thanks,
Gage

^ permalink raw reply	[flat|nested] 20+ messages in thread

* [dpdk-dev] [PATCH v2 0/4] test/stack: improve multithreaded test
  2020-08-05 15:57 [dpdk-dev] [PATCH 0/4] test/stack: improve multithreaded test Steven Lariau
                   ` (3 preceding siblings ...)
  2020-08-05 15:57 ` [dpdk-dev] [PATCH 4/4] test/stack: remove atomics operations Steven Lariau
@ 2020-08-12 19:18 ` Steven Lariau
  2020-08-12 19:18   ` [dpdk-dev] [PATCH v2 1/4] test/stack: avoid trivial memory allocations Steven Lariau
                     ` (4 more replies)
  4 siblings, 5 replies; 20+ messages in thread
From: Steven Lariau @ 2020-08-12 19:18 UTC (permalink / raw)
  Cc: dev, nd, Steven Lariau

The current multithread DPDK stack test is using atomics operations to
share information between threads.
The lockfree stack implementation also uses atomic operations.
This is an issue for testing. The atomics operations for the test may
add some extra synchronization to the stack implementation,
that doesn't exist.
It makes it harder to find bugs related to memory orderings and data
races. The main goal of the patch is to remove all atomics operations
and any other form of data sharing in this test, to make sure that
most of the execution time is spent on the stack library.

Furthermore, this patch uses more appropriate functions to start /
wait cores in order to simplify the code.
The patch also adds code to propagate errors on any slave core to the
master.

v2: replace stack variable for arguments with a global variable.

Steven Lariau (4):
  test/stack: avoid trivial memory allocations
  test/stack: launch tests with mp remote launch API
  test/stack: propagate errors to main core
  test/stack: remove atomics operations

 app/test/test_stack.c | 80 +++++++++----------------------------------
 1 file changed, 17 insertions(+), 63 deletions(-)

-- 
2.17.1


^ permalink raw reply	[flat|nested] 20+ messages in thread

* [dpdk-dev] [PATCH v2 1/4] test/stack: avoid trivial memory allocations
  2020-08-12 19:18 ` [dpdk-dev] [PATCH v2 0/4] test/stack: improve multithreaded test Steven Lariau
@ 2020-08-12 19:18   ` Steven Lariau
  2020-08-13 21:38     ` Eads, Gage
  2020-08-12 19:18   ` [dpdk-dev] [PATCH v2 2/4] test/stack: launch tests with mp remote launch API Steven Lariau
                     ` (3 subsequent siblings)
  4 siblings, 1 reply; 20+ messages in thread
From: Steven Lariau @ 2020-08-12 19:18 UTC (permalink / raw)
  To: Gage Eads, Olivier Matz; +Cc: dev, nd, Steven Lariau

Replace the arguments array by one argument.
All objects in the args array have the same values, so there is no need
to use an array, only one struct is enough.
The args object is a lot smaller, and the allocation can be replaced
with a global variable.

The allocation of obj_table isn't needed either, because MAX_BULK is
small. The allocation can instead be replaced with a static array.

Signed-off-by: Steven Lariau <steven.lariau@arm.com>
Reviewed-by: Dharmik Thakkar <dharmik.thakkar@arm.com>
Reviewed-by: Phil Yang <phil.yang@arm.com>
Reviewed-by: Ruifeng Wang <ruifeng.wang@arm.com>
---
 app/test/test_stack.c | 53 +++++++++++--------------------------------
 1 file changed, 13 insertions(+), 40 deletions(-)

diff --git a/app/test/test_stack.c b/app/test/test_stack.c
index c8dac1f55..f64d70c9d 100644
--- a/app/test/test_stack.c
+++ b/app/test/test_stack.c
@@ -276,20 +276,14 @@ struct test_args {
 	rte_atomic64_t *sz;
 };
 
+static struct test_args thread_test_args;
+
 static int
-stack_thread_push_pop(void *args)
+stack_thread_push_pop(__rte_unused void *args)
 {
-	struct test_args *t = args;
-	void **obj_table;
+	void *obj_table[MAX_BULK];
 	int i;
 
-	obj_table = rte_calloc(NULL, STACK_SIZE, sizeof(void *), 0);
-	if (obj_table == NULL) {
-		printf("[%s():%u] failed to calloc %zu bytes\n",
-		       __func__, __LINE__, STACK_SIZE * sizeof(void *));
-		return -1;
-	}
-
 	for (i = 0; i < NUM_ITERS_PER_THREAD; i++) {
 		unsigned int success, num;
 
@@ -297,41 +291,37 @@ stack_thread_push_pop(void *args)
 		 * then push and pop those stack entries.
 		 */
 		do {
-			uint64_t sz = rte_atomic64_read(t->sz);
+			uint64_t sz = rte_atomic64_read(thread_test_args.sz);
 			volatile uint64_t *sz_addr;
 
-			sz_addr = (volatile uint64_t *)t->sz;
+			sz_addr = (volatile uint64_t *)thread_test_args.sz;
 
 			num = RTE_MIN(rte_rand() % MAX_BULK, STACK_SIZE - sz);
 
 			success = rte_atomic64_cmpset(sz_addr, sz, sz + num);
 		} while (success == 0);
 
-		if (rte_stack_push(t->s, obj_table, num) != num) {
+		if (rte_stack_push(thread_test_args.s, obj_table, num) != num) {
 			printf("[%s():%u] Failed to push %u pointers\n",
 			       __func__, __LINE__, num);
-			rte_free(obj_table);
 			return -1;
 		}
 
-		if (rte_stack_pop(t->s, obj_table, num) != num) {
+		if (rte_stack_pop(thread_test_args.s, obj_table, num) != num) {
 			printf("[%s():%u] Failed to pop %u pointers\n",
 			       __func__, __LINE__, num);
-			rte_free(obj_table);
 			return -1;
 		}
 
-		rte_atomic64_sub(t->sz, num);
+		rte_atomic64_sub(thread_test_args.sz, num);
 	}
 
-	rte_free(obj_table);
 	return 0;
 }
 
 static int
 test_stack_multithreaded(uint32_t flags)
 {
-	struct test_args *args;
 	unsigned int lcore_id;
 	struct rte_stack *s;
 	rte_atomic64_t size;
@@ -344,45 +334,28 @@ test_stack_multithreaded(uint32_t flags)
 	printf("[%s():%u] Running with %u lcores\n",
 	       __func__, __LINE__, rte_lcore_count());
 
-	args = rte_malloc(NULL, sizeof(struct test_args) * RTE_MAX_LCORE, 0);
-	if (args == NULL) {
-		printf("[%s():%u] failed to malloc %zu bytes\n",
-		       __func__, __LINE__,
-		       sizeof(struct test_args) * RTE_MAX_LCORE);
-		return -1;
-	}
-
 	s = rte_stack_create("test", STACK_SIZE, rte_socket_id(), flags);
 	if (s == NULL) {
 		printf("[%s():%u] Failed to create a stack\n",
 		       __func__, __LINE__);
-		rte_free(args);
 		return -1;
 	}
 
 	rte_atomic64_init(&size);
+	thread_test_args.s = s;
+	thread_test_args.sz = &size;
 
 	RTE_LCORE_FOREACH_SLAVE(lcore_id) {
-		args[lcore_id].s = s;
-		args[lcore_id].sz = &size;
-
 		if (rte_eal_remote_launch(stack_thread_push_pop,
-					  &args[lcore_id], lcore_id))
+					  NULL, lcore_id))
 			rte_panic("Failed to launch lcore %d\n", lcore_id);
 	}
 
-	lcore_id = rte_lcore_id();
-
-	args[lcore_id].s = s;
-	args[lcore_id].sz = &size;
-
-	stack_thread_push_pop(&args[lcore_id]);
+	stack_thread_push_pop(NULL);
 
 	rte_eal_mp_wait_lcore();
 
 	rte_stack_free(s);
-	rte_free(args);
-
 	return 0;
 }
 
-- 
2.17.1


^ permalink raw reply	[flat|nested] 20+ messages in thread

* [dpdk-dev] [PATCH v2 2/4] test/stack: launch tests with mp remote launch API
  2020-08-12 19:18 ` [dpdk-dev] [PATCH v2 0/4] test/stack: improve multithreaded test Steven Lariau
  2020-08-12 19:18   ` [dpdk-dev] [PATCH v2 1/4] test/stack: avoid trivial memory allocations Steven Lariau
@ 2020-08-12 19:18   ` Steven Lariau
  2020-08-12 19:18   ` [dpdk-dev] [PATCH v2 3/4] test/stack: propagate errors to main core Steven Lariau
                     ` (2 subsequent siblings)
  4 siblings, 0 replies; 20+ messages in thread
From: Steven Lariau @ 2020-08-12 19:18 UTC (permalink / raw)
  To: Gage Eads, Olivier Matz; +Cc: dev, nd, Steven Lariau

All the cores use the same argument object, so there is no need to use
a loop to launch the test on every core one by one.
Replace loop with one call to rte_eal_mp_remote_launch

Signed-off-by: Steven Lariau <steven.lariau@arm.com>
Reviewed-by: Dharmik Thakkar <dharmik.thakkar@arm.com>
Reviewed-by: Phil Yang <phil.yang@arm.com>
Reviewed-by: Ruifeng Wang <ruifeng.wang@arm.com>
Acked-by: Gage Eads <gage.eads@intel.com>
---
 app/test/test_stack.c | 11 ++---------
 1 file changed, 2 insertions(+), 9 deletions(-)

diff --git a/app/test/test_stack.c b/app/test/test_stack.c
index f64d70c9d..efd473855 100644
--- a/app/test/test_stack.c
+++ b/app/test/test_stack.c
@@ -322,7 +322,6 @@ stack_thread_push_pop(__rte_unused void *args)
 static int
 test_stack_multithreaded(uint32_t flags)
 {
-	unsigned int lcore_id;
 	struct rte_stack *s;
 	rte_atomic64_t size;
 
@@ -345,14 +344,8 @@ test_stack_multithreaded(uint32_t flags)
 	thread_test_args.s = s;
 	thread_test_args.sz = &size;
 
-	RTE_LCORE_FOREACH_SLAVE(lcore_id) {
-		if (rte_eal_remote_launch(stack_thread_push_pop,
-					  NULL, lcore_id))
-			rte_panic("Failed to launch lcore %d\n", lcore_id);
-	}
-
-	stack_thread_push_pop(NULL);
-
+	if (rte_eal_mp_remote_launch(stack_thread_push_pop, NULL, CALL_MASTER))
+		rte_panic("Failed to launch tests\n");
 	rte_eal_mp_wait_lcore();
 
 	rte_stack_free(s);
-- 
2.17.1


^ permalink raw reply	[flat|nested] 20+ messages in thread

* [dpdk-dev] [PATCH v2 3/4] test/stack: propagate errors to main core
  2020-08-12 19:18 ` [dpdk-dev] [PATCH v2 0/4] test/stack: improve multithreaded test Steven Lariau
  2020-08-12 19:18   ` [dpdk-dev] [PATCH v2 1/4] test/stack: avoid trivial memory allocations Steven Lariau
  2020-08-12 19:18   ` [dpdk-dev] [PATCH v2 2/4] test/stack: launch tests with mp remote launch API Steven Lariau
@ 2020-08-12 19:18   ` Steven Lariau
  2020-08-12 19:18   ` [dpdk-dev] [PATCH v2 4/4] test/stack: remove atomics operations Steven Lariau
  2020-09-30 19:12   ` [dpdk-dev] [PATCH v2 0/4] test/stack: improve multithreaded test David Marchand
  4 siblings, 0 replies; 20+ messages in thread
From: Steven Lariau @ 2020-08-12 19:18 UTC (permalink / raw)
  To: Gage Eads, Olivier Matz; +Cc: dev, nd, Steven Lariau

Use rte_eal_wait_lcore to wait and get the return value for all cores.
This is used to propagate any error to the main core.

Signed-off-by: Steven Lariau <steven.lariau@arm.com>
Reviewed-by: Dharmik Thakkar <dharmik.thakkar@arm.com>
Reviewed-by: Phil Yang <phil.yang@arm.com>
Reviewed-by: Ruifeng Wang <ruifeng.wang@arm.com>
Acked-by: Gage Eads <gage.eads@intel.com>
---
 app/test/test_stack.c | 10 ++++++++--
 1 file changed, 8 insertions(+), 2 deletions(-)

diff --git a/app/test/test_stack.c b/app/test/test_stack.c
index efd473855..d959b566a 100644
--- a/app/test/test_stack.c
+++ b/app/test/test_stack.c
@@ -322,8 +322,10 @@ stack_thread_push_pop(__rte_unused void *args)
 static int
 test_stack_multithreaded(uint32_t flags)
 {
+	unsigned int lcore_id;
 	struct rte_stack *s;
 	rte_atomic64_t size;
+	int result = 0;
 
 	if (rte_lcore_count() < 2) {
 		printf("Not enough cores for test_stack_multithreaded, expecting at least 2\n");
@@ -346,10 +348,14 @@ test_stack_multithreaded(uint32_t flags)
 
 	if (rte_eal_mp_remote_launch(stack_thread_push_pop, NULL, CALL_MASTER))
 		rte_panic("Failed to launch tests\n");
-	rte_eal_mp_wait_lcore();
+
+	RTE_LCORE_FOREACH(lcore_id) {
+		if (rte_eal_wait_lcore(lcore_id) < 0)
+			result = -1;
+	}
 
 	rte_stack_free(s);
-	return 0;
+	return result;
 }
 
 static int
-- 
2.17.1


^ permalink raw reply	[flat|nested] 20+ messages in thread

* [dpdk-dev] [PATCH v2 4/4] test/stack: remove atomics operations
  2020-08-12 19:18 ` [dpdk-dev] [PATCH v2 0/4] test/stack: improve multithreaded test Steven Lariau
                     ` (2 preceding siblings ...)
  2020-08-12 19:18   ` [dpdk-dev] [PATCH v2 3/4] test/stack: propagate errors to main core Steven Lariau
@ 2020-08-12 19:18   ` Steven Lariau
  2020-09-30 19:12   ` [dpdk-dev] [PATCH v2 0/4] test/stack: improve multithreaded test David Marchand
  4 siblings, 0 replies; 20+ messages in thread
From: Steven Lariau @ 2020-08-12 19:18 UTC (permalink / raw)
  To: Gage Eads, Olivier Matz; +Cc: dev, nd, Steven Lariau

Remove the part that checks if there is enough room in the stack, it's
always true as long as size of stack >= MAX_BULK*rte_lcore_count().
This check used an atomic cmpset, and read / write to a shared size
variable. These operations result in some form of synchronization
that might get in the way of the actual stack testing.

Signed-off-by: Steven Lariau <steven.lariau@arm.com>
Reviewed-by: Dharmik Thakkar <dharmik.thakkar@arm.com>
Reviewed-by: Phil Yang <phil.yang@arm.com>
Reviewed-by: Ruifeng Wang <ruifeng.wang@arm.com>
Acked-by: Gage Eads <gage.eads@intel.com>
---
 app/test/test_stack.c | 24 +++---------------------
 1 file changed, 3 insertions(+), 21 deletions(-)

diff --git a/app/test/test_stack.c b/app/test/test_stack.c
index d959b566a..463460ccc 100644
--- a/app/test/test_stack.c
+++ b/app/test/test_stack.c
@@ -273,7 +273,6 @@ test_free_null(void)
 
 struct test_args {
 	struct rte_stack *s;
-	rte_atomic64_t *sz;
 };
 
 static struct test_args thread_test_args;
@@ -285,21 +284,9 @@ stack_thread_push_pop(__rte_unused void *args)
 	int i;
 
 	for (i = 0; i < NUM_ITERS_PER_THREAD; i++) {
-		unsigned int success, num;
+		unsigned int num;
 
-		/* Reserve up to min(MAX_BULK, available slots) stack entries,
-		 * then push and pop those stack entries.
-		 */
-		do {
-			uint64_t sz = rte_atomic64_read(thread_test_args.sz);
-			volatile uint64_t *sz_addr;
-
-			sz_addr = (volatile uint64_t *)thread_test_args.sz;
-
-			num = RTE_MIN(rte_rand() % MAX_BULK, STACK_SIZE - sz);
-
-			success = rte_atomic64_cmpset(sz_addr, sz, sz + num);
-		} while (success == 0);
+		num = rte_rand() % MAX_BULK;
 
 		if (rte_stack_push(thread_test_args.s, obj_table, num) != num) {
 			printf("[%s():%u] Failed to push %u pointers\n",
@@ -312,8 +299,6 @@ stack_thread_push_pop(__rte_unused void *args)
 			       __func__, __LINE__, num);
 			return -1;
 		}
-
-		rte_atomic64_sub(thread_test_args.sz, num);
 	}
 
 	return 0;
@@ -324,7 +309,6 @@ test_stack_multithreaded(uint32_t flags)
 {
 	unsigned int lcore_id;
 	struct rte_stack *s;
-	rte_atomic64_t size;
 	int result = 0;
 
 	if (rte_lcore_count() < 2) {
@@ -335,16 +319,14 @@ test_stack_multithreaded(uint32_t flags)
 	printf("[%s():%u] Running with %u lcores\n",
 	       __func__, __LINE__, rte_lcore_count());
 
-	s = rte_stack_create("test", STACK_SIZE, rte_socket_id(), flags);
+	s = rte_stack_create("test", MAX_BULK * rte_lcore_count(), rte_socket_id(), flags);
 	if (s == NULL) {
 		printf("[%s():%u] Failed to create a stack\n",
 		       __func__, __LINE__);
 		return -1;
 	}
 
-	rte_atomic64_init(&size);
 	thread_test_args.s = s;
-	thread_test_args.sz = &size;
 
 	if (rte_eal_mp_remote_launch(stack_thread_push_pop, NULL, CALL_MASTER))
 		rte_panic("Failed to launch tests\n");
-- 
2.17.1


^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [dpdk-dev] [PATCH v2 1/4] test/stack: avoid trivial memory allocations
  2020-08-12 19:18   ` [dpdk-dev] [PATCH v2 1/4] test/stack: avoid trivial memory allocations Steven Lariau
@ 2020-08-13 21:38     ` Eads, Gage
  0 siblings, 0 replies; 20+ messages in thread
From: Eads, Gage @ 2020-08-13 21:38 UTC (permalink / raw)
  To: Steven Lariau, Olivier Matz; +Cc: dev, nd



> -----Original Message-----
> From: Steven Lariau <steven.lariau@arm.com>
> Sent: Wednesday, August 12, 2020 2:19 PM
> To: Eads, Gage <gage.eads@intel.com>; Olivier Matz
> <olivier.matz@6wind.com>
> Cc: dev@dpdk.org; nd@arm.com; Steven Lariau <steven.lariau@arm.com>
> Subject: [PATCH v2 1/4] test/stack: avoid trivial memory allocations
> 
> Replace the arguments array by one argument.
> All objects in the args array have the same values, so there is no need
> to use an array, only one struct is enough.
> The args object is a lot smaller, and the allocation can be replaced
> with a global variable.
> 
> The allocation of obj_table isn't needed either, because MAX_BULK is
> small. The allocation can instead be replaced with a static array.
> 
> Signed-off-by: Steven Lariau <steven.lariau@arm.com>
> Reviewed-by: Dharmik Thakkar <dharmik.thakkar@arm.com>
> Reviewed-by: Phil Yang <phil.yang@arm.com>
> Reviewed-by: Ruifeng Wang <ruifeng.wang@arm.com>

Acked-by: Gage Eads <gage.eads@intel.com>

Thanks,
Gage

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [dpdk-dev] [PATCH v2 0/4] test/stack: improve multithreaded test
  2020-08-12 19:18 ` [dpdk-dev] [PATCH v2 0/4] test/stack: improve multithreaded test Steven Lariau
                     ` (3 preceding siblings ...)
  2020-08-12 19:18   ` [dpdk-dev] [PATCH v2 4/4] test/stack: remove atomics operations Steven Lariau
@ 2020-09-30 19:12   ` David Marchand
  4 siblings, 0 replies; 20+ messages in thread
From: David Marchand @ 2020-09-30 19:12 UTC (permalink / raw)
  To: Steven Lariau
  Cc: dev, nd, Gage Eads, Dharmik Thakkar, Ruifeng Wang (Arm Technology China)

On Wed, Aug 12, 2020 at 9:20 PM Steven Lariau <steven.lariau@arm.com> wrote:
>
> The current multithread DPDK stack test is using atomics operations to
> share information between threads.
> The lockfree stack implementation also uses atomic operations.
> This is an issue for testing. The atomics operations for the test may
> add some extra synchronization to the stack implementation,
> that doesn't exist.
> It makes it harder to find bugs related to memory orderings and data
> races. The main goal of the patch is to remove all atomics operations
> and any other form of data sharing in this test, to make sure that
> most of the execution time is spent on the stack library.
>
> Furthermore, this patch uses more appropriate functions to start /
> wait cores in order to simplify the code.
> The patch also adds code to propagate errors on any slave core to the
> master.
>
> v2: replace stack variable for arguments with a global variable.
>
> Steven Lariau (4):
>   test/stack: avoid trivial memory allocations
>   test/stack: launch tests with mp remote launch API
>   test/stack: propagate errors to main core
>   test/stack: remove atomics operations
>
>  app/test/test_stack.c | 80 +++++++++----------------------------------
>  1 file changed, 17 insertions(+), 63 deletions(-)
>

Series applied, thanks Steven.


-- 
David Marchand


^ permalink raw reply	[flat|nested] 20+ messages in thread

* [dpdk-dev] [PATCH 0/4] test/stack: improve multithreaded test
@ 2020-08-05 15:45 Steven Lariau
  0 siblings, 0 replies; 20+ messages in thread
From: Steven Lariau @ 2020-08-05 15:45 UTC (permalink / raw)
  Cc: dev, honnappa.nagarahalli, nd, Steven Lariau

The current multithread DPDK stack test is using atomics operations to
share information between threads.
The lockfree stack implementation also uses atomic operations.
This is an issue for testing. The atomics operations for the test may
add some extra synchronization to the stack implementation,
that doesn't exist.
It makes it harder to find bugs related to memory orderings and data
races. The main goal of the patch is to remove all atomics operations
and any other form of data sharing in this test, to make sure that
most of the execution time is spent on the stack library.
    
Furthermore, this patch uses more appropriate functions to start /
wait cores in order to simplify the code.
The patch also adds code to propagate errors on any slave core to the
master.

Steven Lariau (4):
  test/stack: avoid trivial memory allocations
  test/stack: launch tests with mp remote launch API
  test/stack: propagate errors to main core
  test/stack: remove atomics operations

 app/test/test_stack.c | 71 ++++++++-----------------------------------
 1 file changed, 13 insertions(+), 58 deletions(-)

-- 
2.17.1


^ permalink raw reply	[flat|nested] 20+ messages in thread

end of thread, other threads:[~2020-09-30 19:13 UTC | newest]

Thread overview: 20+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-08-05 15:57 [dpdk-dev] [PATCH 0/4] test/stack: improve multithreaded test Steven Lariau
2020-08-05 15:57 ` [dpdk-dev] [PATCH 1/4] test/stack: avoid trivial memory allocations Steven Lariau
2020-08-11 20:13   ` Eads, Gage
2020-08-11 20:38     ` Stephen Hemminger
2020-08-11 20:49       ` Honnappa Nagarahalli
2020-08-11 21:14         ` Eads, Gage
2020-08-05 15:57 ` [dpdk-dev] [PATCH 2/4] test/stack: launch tests with mp remote launch API Steven Lariau
2020-08-11 20:13   ` Eads, Gage
2020-08-05 15:57 ` [dpdk-dev] [PATCH 3/4] test/stack: propagate errors to main core Steven Lariau
2020-08-11 20:13   ` Eads, Gage
2020-08-05 15:57 ` [dpdk-dev] [PATCH 4/4] test/stack: remove atomics operations Steven Lariau
2020-08-11 20:13   ` Eads, Gage
2020-08-12 19:18 ` [dpdk-dev] [PATCH v2 0/4] test/stack: improve multithreaded test Steven Lariau
2020-08-12 19:18   ` [dpdk-dev] [PATCH v2 1/4] test/stack: avoid trivial memory allocations Steven Lariau
2020-08-13 21:38     ` Eads, Gage
2020-08-12 19:18   ` [dpdk-dev] [PATCH v2 2/4] test/stack: launch tests with mp remote launch API Steven Lariau
2020-08-12 19:18   ` [dpdk-dev] [PATCH v2 3/4] test/stack: propagate errors to main core Steven Lariau
2020-08-12 19:18   ` [dpdk-dev] [PATCH v2 4/4] test/stack: remove atomics operations Steven Lariau
2020-09-30 19:12   ` [dpdk-dev] [PATCH v2 0/4] test/stack: improve multithreaded test David Marchand
  -- strict thread matches above, loose matches on Subject: below --
2020-08-05 15:45 [dpdk-dev] [PATCH " Steven Lariau

DPDK patches and discussions

This inbox may be cloned and mirrored by anyone:

	git clone --mirror https://inbox.dpdk.org/dev/0 dev/git/0.git

	# If you have public-inbox 1.1+ installed, you may
	# initialize and index your mirror using the following commands:
	public-inbox-init -V2 dev dev/ https://inbox.dpdk.org/dev \
		dev@dpdk.org
	public-inbox-index dev

Example config snippet for mirrors.
Newsgroup available over NNTP:
	nntp://inbox.dpdk.org/inbox.dpdk.dev


AGPL code for this site: git clone https://public-inbox.org/public-inbox.git