The current multithread DPDK stack test is using atomics operations to share information between threads. The lockfree stack implementation also uses atomic operations. This is an issue for testing. The atomics operations for the test may add some extra synchronization to the stack implementation, that doesn't exist. It makes it harder to find bugs related to memory orderings and data races. The main goal of the patch is to remove all atomics operations and any other form of data sharing in this test, to make sure that most of the execution time is spent on the stack library. Furthermore, this patch uses more appropriate functions to start / wait cores in order to simplify the code. The patch also adds code to propagate errors on any slave core to the master. Steven Lariau (4): test/stack: avoid trivial memory allocations test/stack: launch tests with mp remote launch API test/stack: propagate errors to main core test/stack: remove atomics operations app/test/test_stack.c | 71 ++++++++----------------------------------- 1 file changed, 13 insertions(+), 58 deletions(-) -- 2.17.1
The current multithread DPDK stack test is using atomics operations to share information between threads. The lockfree stack implementation also uses atomic operations. This is an issue for testing. The atomics operations for the test may add some extra synchronization to the stack implementation, that doesn't exist. It makes it harder to find bugs related to memory orderings and data races. The main goal of the patch is to remove all atomics operations and any other form of data sharing in this test, to make sure that most of the execution time is spent on the stack library. Furthermore, this patch uses more appropriate functions to start / wait cores in order to simplify the code. The patch also adds code to propagate errors on any slave core to the master. Steven Lariau (4): test/stack: avoid trivial memory allocations test/stack: launch tests with mp remote launch API test/stack: propagate errors to main core test/stack: remove atomics operations app/test/test_stack.c | 71 ++++++++----------------------------------- 1 file changed, 13 insertions(+), 58 deletions(-) -- 2.17.1
Replace the arguments array by one argument. All objects in the args array have the same values, so there is no need to use an array, only one struct is enough. The args object is a lot smaller, and the allocation can be replaced with a stack variable. The allocation of obj_table isn't needed either, because MAX_BULK is small. The allocation can instead be replaced with a static array. Signed-off-by: Steven Lariau <steven.lariau@arm.com> Reviewed-by: Dharmik Thakkar <dharmik.thakkar@arm.com> Reviewed-by: Phil Yang <phil.yang@arm.com> Reviewed-by: Ruifeng Wang <ruifeng.wang@arm.com> --- app/test/test_stack.c | 39 ++++++--------------------------------- 1 file changed, 6 insertions(+), 33 deletions(-) diff --git a/app/test/test_stack.c b/app/test/test_stack.c index c8dac1f55..5a7273a7d 100644 --- a/app/test/test_stack.c +++ b/app/test/test_stack.c @@ -280,16 +280,9 @@ static int stack_thread_push_pop(void *args) { struct test_args *t = args; - void **obj_table; + void *obj_table[MAX_BULK]; int i; - obj_table = rte_calloc(NULL, STACK_SIZE, sizeof(void *), 0); - if (obj_table == NULL) { - printf("[%s():%u] failed to calloc %zu bytes\n", - __func__, __LINE__, STACK_SIZE * sizeof(void *)); - return -1; - } - for (i = 0; i < NUM_ITERS_PER_THREAD; i++) { unsigned int success, num; @@ -310,28 +303,25 @@ stack_thread_push_pop(void *args) if (rte_stack_push(t->s, obj_table, num) != num) { printf("[%s():%u] Failed to push %u pointers\n", __func__, __LINE__, num); - rte_free(obj_table); return -1; } if (rte_stack_pop(t->s, obj_table, num) != num) { printf("[%s():%u] Failed to pop %u pointers\n", __func__, __LINE__, num); - rte_free(obj_table); return -1; } rte_atomic64_sub(t->sz, num); } - rte_free(obj_table); return 0; } static int test_stack_multithreaded(uint32_t flags) { - struct test_args *args; + struct test_args args; unsigned int lcore_id; struct rte_stack *s; rte_atomic64_t size; @@ -344,45 +334,28 @@ test_stack_multithreaded(uint32_t flags) printf("[%s():%u] Running with %u lcores\n", __func__, __LINE__, rte_lcore_count()); - args = rte_malloc(NULL, sizeof(struct test_args) * RTE_MAX_LCORE, 0); - if (args == NULL) { - printf("[%s():%u] failed to malloc %zu bytes\n", - __func__, __LINE__, - sizeof(struct test_args) * RTE_MAX_LCORE); - return -1; - } - s = rte_stack_create("test", STACK_SIZE, rte_socket_id(), flags); if (s == NULL) { printf("[%s():%u] Failed to create a stack\n", __func__, __LINE__); - rte_free(args); return -1; } rte_atomic64_init(&size); + args.s = s; + args.sz = &size; RTE_LCORE_FOREACH_SLAVE(lcore_id) { - args[lcore_id].s = s; - args[lcore_id].sz = &size; - if (rte_eal_remote_launch(stack_thread_push_pop, - &args[lcore_id], lcore_id)) + &args, lcore_id)) rte_panic("Failed to launch lcore %d\n", lcore_id); } - lcore_id = rte_lcore_id(); - - args[lcore_id].s = s; - args[lcore_id].sz = &size; - - stack_thread_push_pop(&args[lcore_id]); + stack_thread_push_pop(&args); rte_eal_mp_wait_lcore(); rte_stack_free(s); - rte_free(args); - return 0; } -- 2.17.1
All the cores use the same argument object, so there is no need to use a loop to launch the test on every core one by one. Replace loop with one call to rte_eal_mp_remote_launch Signed-off-by: Steven Lariau <steven.lariau@arm.com> Reviewed-by: Dharmik Thakkar <dharmik.thakkar@arm.com> Reviewed-by: Phil Yang <phil.yang@arm.com> Reviewed-by: Ruifeng Wang <ruifeng.wang@arm.com> --- app/test/test_stack.c | 11 ++--------- 1 file changed, 2 insertions(+), 9 deletions(-) diff --git a/app/test/test_stack.c b/app/test/test_stack.c index 5a7273a7d..c100d9faf 100644 --- a/app/test/test_stack.c +++ b/app/test/test_stack.c @@ -322,7 +322,6 @@ static int test_stack_multithreaded(uint32_t flags) { struct test_args args; - unsigned int lcore_id; struct rte_stack *s; rte_atomic64_t size; @@ -345,14 +344,8 @@ test_stack_multithreaded(uint32_t flags) args.s = s; args.sz = &size; - RTE_LCORE_FOREACH_SLAVE(lcore_id) { - if (rte_eal_remote_launch(stack_thread_push_pop, - &args, lcore_id)) - rte_panic("Failed to launch lcore %d\n", lcore_id); - } - - stack_thread_push_pop(&args); - + if (rte_eal_mp_remote_launch(stack_thread_push_pop, &args, CALL_MASTER)) + rte_panic("Failed to launch tests\n"); rte_eal_mp_wait_lcore(); rte_stack_free(s); -- 2.17.1
Use rte_eal_wait_lcore to wait and get the return value for all cores. This is used to propagate any error to the main core. Signed-off-by: Steven Lariau <steven.lariau@arm.com> Reviewed-by: Dharmik Thakkar <dharmik.thakkar@arm.com> Reviewed-by: Phil Yang <phil.yang@arm.com> Reviewed-by: Ruifeng Wang <ruifeng.wang@arm.com> --- app/test/test_stack.c | 10 ++++++++-- 1 file changed, 8 insertions(+), 2 deletions(-) diff --git a/app/test/test_stack.c b/app/test/test_stack.c index c100d9faf..b9d6befd4 100644 --- a/app/test/test_stack.c +++ b/app/test/test_stack.c @@ -322,8 +322,10 @@ static int test_stack_multithreaded(uint32_t flags) { struct test_args args; + unsigned int lcore_id; struct rte_stack *s; rte_atomic64_t size; + int result = 0; if (rte_lcore_count() < 2) { printf("Not enough cores for test_stack_multithreaded, expecting at least 2\n"); @@ -346,10 +348,14 @@ test_stack_multithreaded(uint32_t flags) if (rte_eal_mp_remote_launch(stack_thread_push_pop, &args, CALL_MASTER)) rte_panic("Failed to launch tests\n"); - rte_eal_mp_wait_lcore(); + + RTE_LCORE_FOREACH(lcore_id) { + if (rte_eal_wait_lcore(lcore_id) < 0) + result = -1; + } rte_stack_free(s); - return 0; + return result; } static int -- 2.17.1
Remove the part that checks if there is enough room in the stack, it's always true as long as size of stack >= MAX_BULK*rte_lcore_count(). This check used an atomic cmpset, and read / write to a shared size variable. These operations result in some form of synchronization that might get in the way of the actual stack testing. Signed-off-by: Steven Lariau <steven.lariau@arm.com> Reviewed-by: Dharmik Thakkar <dharmik.thakkar@arm.com> Reviewed-by: Phil Yang <phil.yang@arm.com> Reviewed-by: Ruifeng Wang <ruifeng.wang@arm.com> --- app/test/test_stack.c | 23 +++-------------------- 1 file changed, 3 insertions(+), 20 deletions(-) diff --git a/app/test/test_stack.c b/app/test/test_stack.c index b9d6befd4..02bea7ec5 100644 --- a/app/test/test_stack.c +++ b/app/test/test_stack.c @@ -273,7 +273,6 @@ test_free_null(void) struct test_args { struct rte_stack *s; - rte_atomic64_t *sz; }; static int @@ -284,21 +283,9 @@ stack_thread_push_pop(void *args) int i; for (i = 0; i < NUM_ITERS_PER_THREAD; i++) { - unsigned int success, num; + unsigned int num; - /* Reserve up to min(MAX_BULK, available slots) stack entries, - * then push and pop those stack entries. - */ - do { - uint64_t sz = rte_atomic64_read(t->sz); - volatile uint64_t *sz_addr; - - sz_addr = (volatile uint64_t *)t->sz; - - num = RTE_MIN(rte_rand() % MAX_BULK, STACK_SIZE - sz); - - success = rte_atomic64_cmpset(sz_addr, sz, sz + num); - } while (success == 0); + num = rte_rand() % MAX_BULK; if (rte_stack_push(t->s, obj_table, num) != num) { printf("[%s():%u] Failed to push %u pointers\n", @@ -312,7 +299,6 @@ stack_thread_push_pop(void *args) return -1; } - rte_atomic64_sub(t->sz, num); } return 0; @@ -324,7 +310,6 @@ test_stack_multithreaded(uint32_t flags) struct test_args args; unsigned int lcore_id; struct rte_stack *s; - rte_atomic64_t size; int result = 0; if (rte_lcore_count() < 2) { @@ -335,16 +320,14 @@ test_stack_multithreaded(uint32_t flags) printf("[%s():%u] Running with %u lcores\n", __func__, __LINE__, rte_lcore_count()); - s = rte_stack_create("test", STACK_SIZE, rte_socket_id(), flags); + s = rte_stack_create("test", MAX_BULK * rte_lcore_count(), rte_socket_id(), flags); if (s == NULL) { printf("[%s():%u] Failed to create a stack\n", __func__, __LINE__); return -1; } - rte_atomic64_init(&size); args.s = s; - args.sz = &size; if (rte_eal_mp_remote_launch(stack_thread_push_pop, &args, CALL_MASTER)) rte_panic("Failed to launch tests\n"); -- 2.17.1
Hi Steven, > -----Original Message----- > From: Steven Lariau <steven.lariau@arm.com> > Sent: Wednesday, August 5, 2020 10:57 AM > To: Eads, Gage <gage.eads@intel.com>; Olivier Matz > <olivier.matz@6wind.com> > Cc: dev@dpdk.org; honnappa.nagarahalli@arm.com; > dharmik.thakkar@arm.com; nd@arm.com; Steven Lariau > <steven.lariau@arm.com> > Subject: [PATCH 1/4] test/stack: avoid trivial memory allocations > > Replace the arguments array by one argument. > All objects in the args array have the same values, so there is no need > to use an array, only one struct is enough. > The args object is a lot smaller, and the allocation can be replaced > with a stack variable. > > The allocation of obj_table isn't needed either, because MAX_BULK is > small. The allocation can instead be replaced with a static array. > > Signed-off-by: Steven Lariau <steven.lariau@arm.com> > Reviewed-by: Dharmik Thakkar <dharmik.thakkar@arm.com> > Reviewed-by: Phil Yang <phil.yang@arm.com> > Reviewed-by: Ruifeng Wang <ruifeng.wang@arm.com> > --- > app/test/test_stack.c | 39 ++++++--------------------------------- > 1 file changed, 6 insertions(+), 33 deletions(-) > > diff --git a/app/test/test_stack.c b/app/test/test_stack.c > index c8dac1f55..5a7273a7d 100644 > --- a/app/test/test_stack.c > +++ b/app/test/test_stack.c > @@ -280,16 +280,9 @@ static int > stack_thread_push_pop(void *args) > { > struct test_args *t = args; > - void **obj_table; > + void *obj_table[MAX_BULK]; > int i; > > - obj_table = rte_calloc(NULL, STACK_SIZE, sizeof(void *), 0); > - if (obj_table == NULL) { > - printf("[%s():%u] failed to calloc %zu bytes\n", > - __func__, __LINE__, STACK_SIZE * sizeof(void *)); > - return -1; > - } > - > for (i = 0; i < NUM_ITERS_PER_THREAD; i++) { > unsigned int success, num; > > @@ -310,28 +303,25 @@ stack_thread_push_pop(void *args) > if (rte_stack_push(t->s, obj_table, num) != num) { > printf("[%s():%u] Failed to push %u pointers\n", > __func__, __LINE__, num); > - rte_free(obj_table); > return -1; > } > > if (rte_stack_pop(t->s, obj_table, num) != num) { > printf("[%s():%u] Failed to pop %u pointers\n", > __func__, __LINE__, num); > - rte_free(obj_table); > return -1; > } > > rte_atomic64_sub(t->sz, num); > } > > - rte_free(obj_table); > return 0; > } Agreed, the dynamic allocation is unnecessary. > > static int > test_stack_multithreaded(uint32_t flags) > { > - struct test_args *args; > + struct test_args args; > unsigned int lcore_id; > struct rte_stack *s; > rte_atomic64_t size; > @@ -344,45 +334,28 @@ test_stack_multithreaded(uint32_t flags) > printf("[%s():%u] Running with %u lcores\n", > __func__, __LINE__, rte_lcore_count()); > > - args = rte_malloc(NULL, sizeof(struct test_args) * RTE_MAX_LCORE, > 0); > - if (args == NULL) { > - printf("[%s():%u] failed to malloc %zu bytes\n", > - __func__, __LINE__, > - sizeof(struct test_args) * RTE_MAX_LCORE); > - return -1; > - } > - > s = rte_stack_create("test", STACK_SIZE, rte_socket_id(), flags); > if (s == NULL) { > printf("[%s():%u] Failed to create a stack\n", > __func__, __LINE__); > - rte_free(args); > return -1; > } > > rte_atomic64_init(&size); > + args.s = s; > + args.sz = &size; > > RTE_LCORE_FOREACH_SLAVE(lcore_id) { > - args[lcore_id].s = s; > - args[lcore_id].sz = &size; > - > if (rte_eal_remote_launch(stack_thread_push_pop, > - &args[lcore_id], lcore_id)) > + &args, lcore_id)) > rte_panic("Failed to launch lcore %d\n", lcore_id); > } In general we shouldn't pass a stack variable to other threads. Though your code here looks fine, I'd rather err on the safe side in case this is ever used as a template/basis for some other code...particularly since there's no performance/correctness/etc. penalty to using dynamically allocated memory. To support patch 2/4, you can instead convert the rte_malloc to allocate a single shared test_args structure. Or perhaps move patch 4 earlier in the series, and simply pass the stack pointer instead. Thanks, Gage
> -----Original Message-----
> From: Steven Lariau <steven.lariau@arm.com>
> Sent: Wednesday, August 5, 2020 10:57 AM
> To: Eads, Gage <gage.eads@intel.com>; Olivier Matz
> <olivier.matz@6wind.com>
> Cc: dev@dpdk.org; honnappa.nagarahalli@arm.com;
> dharmik.thakkar@arm.com; nd@arm.com; Steven Lariau
> <steven.lariau@arm.com>
> Subject: [PATCH 2/4] test/stack: launch tests with mp remote launch API
>
> All the cores use the same argument object, so there is no need to use
> a loop to launch the test on every core one by one.
> Replace loop with one call to rte_eal_mp_remote_launch
>
>
> Signed-off-by: Steven Lariau <steven.lariau@arm.com>
> Reviewed-by: Dharmik Thakkar <dharmik.thakkar@arm.com>
> Reviewed-by: Phil Yang <phil.yang@arm.com>
> Reviewed-by: Ruifeng Wang <ruifeng.wang@arm.com>
Acked-by: Gage Eads <gage.eads@intel.com>
Thanks,
Gage
> -----Original Message-----
> From: Steven Lariau <steven.lariau@arm.com>
> Sent: Wednesday, August 5, 2020 10:57 AM
> To: Eads, Gage <gage.eads@intel.com>; Olivier Matz
> <olivier.matz@6wind.com>
> Cc: dev@dpdk.org; honnappa.nagarahalli@arm.com;
> dharmik.thakkar@arm.com; nd@arm.com; Steven Lariau
> <steven.lariau@arm.com>
> Subject: [PATCH 3/4] test/stack: propagate errors to main core
>
> Use rte_eal_wait_lcore to wait and get the return value for all cores.
> This is used to propagate any error to the main core.
>
> Signed-off-by: Steven Lariau <steven.lariau@arm.com>
> Reviewed-by: Dharmik Thakkar <dharmik.thakkar@arm.com>
> Reviewed-by: Phil Yang <phil.yang@arm.com>
> Reviewed-by: Ruifeng Wang <ruifeng.wang@arm.com>
Acked-by: Gage Eads <gage.eads@intel.com>
Thanks,
Gage
> -----Original Message-----
> From: Steven Lariau <steven.lariau@arm.com>
> Sent: Wednesday, August 5, 2020 10:57 AM
> To: Eads, Gage <gage.eads@intel.com>; Olivier Matz
> <olivier.matz@6wind.com>
> Cc: dev@dpdk.org; honnappa.nagarahalli@arm.com;
> dharmik.thakkar@arm.com; nd@arm.com; Steven Lariau
> <steven.lariau@arm.com>
> Subject: [PATCH 4/4] test/stack: remove atomics operations
>
> Remove the part that checks if there is enough room in the stack, it's
> always true as long as size of stack >= MAX_BULK*rte_lcore_count().
> This check used an atomic cmpset, and read / write to a shared size
> variable. These operations result in some form of synchronization
> that might get in the way of the actual stack testing.
>
> Signed-off-by: Steven Lariau <steven.lariau@arm.com>
> Reviewed-by: Dharmik Thakkar <dharmik.thakkar@arm.com>
> Reviewed-by: Phil Yang <phil.yang@arm.com>
> Reviewed-by: Ruifeng Wang <ruifeng.wang@arm.com>
Nice simplification, and thanks for the good contributions to these tests.
Acked-by: Gage Eads <gage.eads@intel.com
On Tue, 11 Aug 2020 20:13:24 +0000
"Eads, Gage" <gage.eads@intel.com> wrote:
> Hi Steven,
>
> > -----Original Message-----
> > From: Steven Lariau <steven.lariau@arm.com>
> > Sent: Wednesday, August 5, 2020 10:57 AM
> > To: Eads, Gage <gage.eads@intel.com>; Olivier Matz
> > <olivier.matz@6wind.com>
> > Cc: dev@dpdk.org; honnappa.nagarahalli@arm.com;
> > dharmik.thakkar@arm.com; nd@arm.com; Steven Lariau
> > <steven.lariau@arm.com>
> > Subject: [PATCH 1/4] test/stack: avoid trivial memory allocations
> >
> > Replace the arguments array by one argument.
> > All objects in the args array have the same values, so there is no need
> > to use an array, only one struct is enough.
> > The args object is a lot smaller, and the allocation can be replaced
> > with a stack variable.
> >
> > The allocation of obj_table isn't needed either, because MAX_BULK is
> > small. The allocation can instead be replaced with a static array.
> >
> > Signed-off-by: Steven Lariau <steven.lariau@arm.com>
> > Reviewed-by: Dharmik Thakkar <dharmik.thakkar@arm.com>
> > Reviewed-by: Phil Yang <phil.yang@arm.com>
> > Reviewed-by: Ruifeng Wang <ruifeng.wang@arm.com>
> > ---
> > app/test/test_stack.c | 39 ++++++---------------------------------
> > 1 file changed, 6 insertions(+), 33 deletions(-)
> >
> > diff --git a/app/test/test_stack.c b/app/test/test_stack.c
> > index c8dac1f55..5a7273a7d 100644
> > --- a/app/test/test_stack.c
> > +++ b/app/test/test_stack.c
> > @@ -280,16 +280,9 @@ static int
> > stack_thread_push_pop(void *args)
> > {
> > struct test_args *t = args;
> > - void **obj_table;
> > + void *obj_table[MAX_BULK];
> > int i;
> >
> > - obj_table = rte_calloc(NULL, STACK_SIZE, sizeof(void *), 0);
> > - if (obj_table == NULL) {
> > - printf("[%s():%u] failed to calloc %zu bytes\n",
> > - __func__, __LINE__, STACK_SIZE * sizeof(void *));
> > - return -1;
> > - }
> > -
> > for (i = 0; i < NUM_ITERS_PER_THREAD; i++) {
> > unsigned int success, num;
> >
> > @@ -310,28 +303,25 @@ stack_thread_push_pop(void *args)
> > if (rte_stack_push(t->s, obj_table, num) != num) {
> > printf("[%s():%u] Failed to push %u pointers\n",
> > __func__, __LINE__, num);
> > - rte_free(obj_table);
> > return -1;
> > }
> >
> > if (rte_stack_pop(t->s, obj_table, num) != num) {
> > printf("[%s():%u] Failed to pop %u pointers\n",
> > __func__, __LINE__, num);
> > - rte_free(obj_table);
> > return -1;
> > }
> >
> > rte_atomic64_sub(t->sz, num);
> > }
> >
> > - rte_free(obj_table);
> > return 0;
> > }
>
> Agreed, the dynamic allocation is unnecessary.
>
> >
> > static int
> > test_stack_multithreaded(uint32_t flags)
> > {
> > - struct test_args *args;
> > + struct test_args args;
> > unsigned int lcore_id;
> > struct rte_stack *s;
> > rte_atomic64_t size;
> > @@ -344,45 +334,28 @@ test_stack_multithreaded(uint32_t flags)
> > printf("[%s():%u] Running with %u lcores\n",
> > __func__, __LINE__, rte_lcore_count());
> >
> > - args = rte_malloc(NULL, sizeof(struct test_args) * RTE_MAX_LCORE,
> > 0);
> > - if (args == NULL) {
> > - printf("[%s():%u] failed to malloc %zu bytes\n",
> > - __func__, __LINE__,
> > - sizeof(struct test_args) * RTE_MAX_LCORE);
> > - return -1;
> > - }
> > -
> > s = rte_stack_create("test", STACK_SIZE, rte_socket_id(), flags);
> > if (s == NULL) {
> > printf("[%s():%u] Failed to create a stack\n",
> > __func__, __LINE__);
> > - rte_free(args);
> > return -1;
> > }
> >
> > rte_atomic64_init(&size);
> > + args.s = s;
> > + args.sz = &size;
> >
> > RTE_LCORE_FOREACH_SLAVE(lcore_id) {
> > - args[lcore_id].s = s;
> > - args[lcore_id].sz = &size;
> > -
> > if (rte_eal_remote_launch(stack_thread_push_pop,
> > - &args[lcore_id], lcore_id))
> > + &args, lcore_id))
> > rte_panic("Failed to launch lcore %d\n", lcore_id);
> > }
>
>
> In general we shouldn't pass a stack variable to other threads. Though your
> code here looks fine, I'd rather err on the safe side in case this is ever used
> as a template/basis for some other code...particularly since there's no
> performance/correctness/etc. penalty to using dynamically allocated memory.
>
> To support patch 2/4, you can instead convert the rte_malloc to allocate a
> single shared test_args structure. Or perhaps move patch 4 earlier in the series,
> and simply pass the stack pointer instead.
>
> Thanks,
> Gage
There is no gain to using rte_malloc unless you are doing primary/secondary process
or trying to test rte_malloc. Why not use regular malloc which has good tools and library support.
<snip>
> > >
> > > Replace the arguments array by one argument.
> > > All objects in the args array have the same values, so there is no
> > > need to use an array, only one struct is enough.
> > > The args object is a lot smaller, and the allocation can be replaced
> > > with a stack variable.
> > >
> > > The allocation of obj_table isn't needed either, because MAX_BULK is
> > > small. The allocation can instead be replaced with a static array.
> > >
> > > Signed-off-by: Steven Lariau <steven.lariau@arm.com>
> > > Reviewed-by: Dharmik Thakkar <dharmik.thakkar@arm.com>
> > > Reviewed-by: Phil Yang <phil.yang@arm.com>
> > > Reviewed-by: Ruifeng Wang <ruifeng.wang@arm.com>
> > > ---
> > > app/test/test_stack.c | 39 ++++++---------------------------------
> > > 1 file changed, 6 insertions(+), 33 deletions(-)
> > >
> > > diff --git a/app/test/test_stack.c b/app/test/test_stack.c index
> > > c8dac1f55..5a7273a7d 100644
> > > --- a/app/test/test_stack.c
> > > +++ b/app/test/test_stack.c
> > > @@ -280,16 +280,9 @@ static int
> > > stack_thread_push_pop(void *args)
> > > {
> > > struct test_args *t = args;
> > > - void **obj_table;
> > > + void *obj_table[MAX_BULK];
> > > int i;
> > >
> > > - obj_table = rte_calloc(NULL, STACK_SIZE, sizeof(void *), 0);
> > > - if (obj_table == NULL) {
> > > - printf("[%s():%u] failed to calloc %zu bytes\n",
> > > - __func__, __LINE__, STACK_SIZE * sizeof(void *));
> > > - return -1;
> > > - }
> > > -
> > > for (i = 0; i < NUM_ITERS_PER_THREAD; i++) {
> > > unsigned int success, num;
> > >
> > > @@ -310,28 +303,25 @@ stack_thread_push_pop(void *args)
> > > if (rte_stack_push(t->s, obj_table, num) != num) {
> > > printf("[%s():%u] Failed to push %u pointers\n",
> > > __func__, __LINE__, num);
> > > - rte_free(obj_table);
> > > return -1;
> > > }
> > >
> > > if (rte_stack_pop(t->s, obj_table, num) != num) {
> > > printf("[%s():%u] Failed to pop %u pointers\n",
> > > __func__, __LINE__, num);
> > > - rte_free(obj_table);
> > > return -1;
> > > }
> > >
> > > rte_atomic64_sub(t->sz, num);
> > > }
> > >
> > > - rte_free(obj_table);
> > > return 0;
> > > }
> >
> > Agreed, the dynamic allocation is unnecessary.
> >
> > >
> > > static int
> > > test_stack_multithreaded(uint32_t flags) {
> > > - struct test_args *args;
> > > + struct test_args args;
> > > unsigned int lcore_id;
> > > struct rte_stack *s;
> > > rte_atomic64_t size;
> > > @@ -344,45 +334,28 @@ test_stack_multithreaded(uint32_t flags)
> > > printf("[%s():%u] Running with %u lcores\n",
> > > __func__, __LINE__, rte_lcore_count());
> > >
> > > - args = rte_malloc(NULL, sizeof(struct test_args) * RTE_MAX_LCORE,
> > > 0);
> > > - if (args == NULL) {
> > > - printf("[%s():%u] failed to malloc %zu bytes\n",
> > > - __func__, __LINE__,
> > > - sizeof(struct test_args) * RTE_MAX_LCORE);
> > > - return -1;
> > > - }
> > > -
> > > s = rte_stack_create("test", STACK_SIZE, rte_socket_id(), flags);
> > > if (s == NULL) {
> > > printf("[%s():%u] Failed to create a stack\n",
> > > __func__, __LINE__);
> > > - rte_free(args);
> > > return -1;
> > > }
> > >
> > > rte_atomic64_init(&size);
> > > + args.s = s;
> > > + args.sz = &size;
> > >
> > > RTE_LCORE_FOREACH_SLAVE(lcore_id) {
> > > - args[lcore_id].s = s;
> > > - args[lcore_id].sz = &size;
> > > -
> > > if (rte_eal_remote_launch(stack_thread_push_pop,
> > > - &args[lcore_id], lcore_id))
> > > + &args, lcore_id))
> > > rte_panic("Failed to launch lcore %d\n", lcore_id);
> > > }
> >
> >
> > In general we shouldn't pass a stack variable to other threads. Though
> > your code here looks fine, I'd rather err on the safe side in case
> > this is ever used as a template/basis for some other
> > code...particularly since there's no performance/correctness/etc. penalty to
> using dynamically allocated memory.
> >
> > To support patch 2/4, you can instead convert the rte_malloc to
> > allocate a single shared test_args structure. Or perhaps move patch 4
> > earlier in the series, and simply pass the stack pointer instead.
> >
> > Thanks,
> > Gage
>
> There is no gain to using rte_malloc unless you are doing primary/secondary
> process or trying to test rte_malloc. Why not use regular malloc which has
> good tools and library support.
I think making 'args' a global variable is enough in this case.
> -----Original Message-----
> From: Honnappa Nagarahalli <Honnappa.Nagarahalli@arm.com>
> Sent: Tuesday, August 11, 2020 3:50 PM
> To: Stephen Hemminger <stephen@networkplumber.org>; Eads, Gage
> <gage.eads@intel.com>
> Cc: Steven Lariau <Steven.Lariau@arm.com>; Olivier Matz
> <olivier.matz@6wind.com>; dev@dpdk.org; Dharmik Thakkar
> <Dharmik.Thakkar@arm.com>; nd <nd@arm.com>; Honnappa Nagarahalli
> <Honnappa.Nagarahalli@arm.com>; nd <nd@arm.com>
> Subject: RE: [dpdk-dev] [PATCH 1/4] test/stack: avoid trivial memory
> allocations
>
> <snip>
>
> > > >
> > > > Replace the arguments array by one argument.
> > > > All objects in the args array have the same values, so there is no
> > > > need to use an array, only one struct is enough.
> > > > The args object is a lot smaller, and the allocation can be replaced
> > > > with a stack variable.
> > > >
> > > > The allocation of obj_table isn't needed either, because MAX_BULK is
> > > > small. The allocation can instead be replaced with a static array.
> > > >
> > > > Signed-off-by: Steven Lariau <steven.lariau@arm.com>
> > > > Reviewed-by: Dharmik Thakkar <dharmik.thakkar@arm.com>
> > > > Reviewed-by: Phil Yang <phil.yang@arm.com>
> > > > Reviewed-by: Ruifeng Wang <ruifeng.wang@arm.com>
> > > > ---
> > > > app/test/test_stack.c | 39 ++++++---------------------------------
> > > > 1 file changed, 6 insertions(+), 33 deletions(-)
> > > >
> > > > diff --git a/app/test/test_stack.c b/app/test/test_stack.c index
> > > > c8dac1f55..5a7273a7d 100644
> > > > --- a/app/test/test_stack.c
> > > > +++ b/app/test/test_stack.c
> > > > @@ -280,16 +280,9 @@ static int
> > > > stack_thread_push_pop(void *args)
> > > > {
> > > > struct test_args *t = args;
> > > > - void **obj_table;
> > > > + void *obj_table[MAX_BULK];
> > > > int i;
> > > >
> > > > - obj_table = rte_calloc(NULL, STACK_SIZE, sizeof(void *), 0);
> > > > - if (obj_table == NULL) {
> > > > - printf("[%s():%u] failed to calloc %zu bytes\n",
> > > > - __func__, __LINE__, STACK_SIZE * sizeof(void *));
> > > > - return -1;
> > > > - }
> > > > -
> > > > for (i = 0; i < NUM_ITERS_PER_THREAD; i++) {
> > > > unsigned int success, num;
> > > >
> > > > @@ -310,28 +303,25 @@ stack_thread_push_pop(void *args)
> > > > if (rte_stack_push(t->s, obj_table, num) != num) {
> > > > printf("[%s():%u] Failed to push %u pointers\n",
> > > > __func__, __LINE__, num);
> > > > - rte_free(obj_table);
> > > > return -1;
> > > > }
> > > >
> > > > if (rte_stack_pop(t->s, obj_table, num) != num) {
> > > > printf("[%s():%u] Failed to pop %u pointers\n",
> > > > __func__, __LINE__, num);
> > > > - rte_free(obj_table);
> > > > return -1;
> > > > }
> > > >
> > > > rte_atomic64_sub(t->sz, num);
> > > > }
> > > >
> > > > - rte_free(obj_table);
> > > > return 0;
> > > > }
> > >
> > > Agreed, the dynamic allocation is unnecessary.
> > >
> > > >
> > > > static int
> > > > test_stack_multithreaded(uint32_t flags) {
> > > > - struct test_args *args;
> > > > + struct test_args args;
> > > > unsigned int lcore_id;
> > > > struct rte_stack *s;
> > > > rte_atomic64_t size;
> > > > @@ -344,45 +334,28 @@ test_stack_multithreaded(uint32_t flags)
> > > > printf("[%s():%u] Running with %u lcores\n",
> > > > __func__, __LINE__, rte_lcore_count());
> > > >
> > > > - args = rte_malloc(NULL, sizeof(struct test_args) * RTE_MAX_LCORE,
> > > > 0);
> > > > - if (args == NULL) {
> > > > - printf("[%s():%u] failed to malloc %zu bytes\n",
> > > > - __func__, __LINE__,
> > > > - sizeof(struct test_args) * RTE_MAX_LCORE);
> > > > - return -1;
> > > > - }
> > > > -
> > > > s = rte_stack_create("test", STACK_SIZE, rte_socket_id(), flags);
> > > > if (s == NULL) {
> > > > printf("[%s():%u] Failed to create a stack\n",
> > > > __func__, __LINE__);
> > > > - rte_free(args);
> > > > return -1;
> > > > }
> > > >
> > > > rte_atomic64_init(&size);
> > > > + args.s = s;
> > > > + args.sz = &size;
> > > >
> > > > RTE_LCORE_FOREACH_SLAVE(lcore_id) {
> > > > - args[lcore_id].s = s;
> > > > - args[lcore_id].sz = &size;
> > > > -
> > > > if (rte_eal_remote_launch(stack_thread_push_pop,
> > > > - &args[lcore_id], lcore_id))
> > > > + &args, lcore_id))
> > > > rte_panic("Failed to launch lcore %d\n", lcore_id);
> > > > }
> > >
> > >
> > > In general we shouldn't pass a stack variable to other threads. Though
> > > your code here looks fine, I'd rather err on the safe side in case
> > > this is ever used as a template/basis for some other
> > > code...particularly since there's no performance/correctness/etc.
> penalty to
> > using dynamically allocated memory.
> > >
> > > To support patch 2/4, you can instead convert the rte_malloc to
> > > allocate a single shared test_args structure. Or perhaps move patch 4
> > > earlier in the series, and simply pass the stack pointer instead.
> > >
> > > Thanks,
> > > Gage
> >
> > There is no gain to using rte_malloc unless you are doing
> primary/secondary
> > process or trying to test rte_malloc. Why not use regular malloc which has
> > good tools and library support.
>
> I think making 'args' a global variable is enough in this case.
Agreed.
Thanks,
Gage
The current multithread DPDK stack test is using atomics operations to share information between threads. The lockfree stack implementation also uses atomic operations. This is an issue for testing. The atomics operations for the test may add some extra synchronization to the stack implementation, that doesn't exist. It makes it harder to find bugs related to memory orderings and data races. The main goal of the patch is to remove all atomics operations and any other form of data sharing in this test, to make sure that most of the execution time is spent on the stack library. Furthermore, this patch uses more appropriate functions to start / wait cores in order to simplify the code. The patch also adds code to propagate errors on any slave core to the master. v2: replace stack variable for arguments with a global variable. Steven Lariau (4): test/stack: avoid trivial memory allocations test/stack: launch tests with mp remote launch API test/stack: propagate errors to main core test/stack: remove atomics operations app/test/test_stack.c | 80 +++++++++---------------------------------- 1 file changed, 17 insertions(+), 63 deletions(-) -- 2.17.1
Replace the arguments array by one argument. All objects in the args array have the same values, so there is no need to use an array, only one struct is enough. The args object is a lot smaller, and the allocation can be replaced with a global variable. The allocation of obj_table isn't needed either, because MAX_BULK is small. The allocation can instead be replaced with a static array. Signed-off-by: Steven Lariau <steven.lariau@arm.com> Reviewed-by: Dharmik Thakkar <dharmik.thakkar@arm.com> Reviewed-by: Phil Yang <phil.yang@arm.com> Reviewed-by: Ruifeng Wang <ruifeng.wang@arm.com> --- app/test/test_stack.c | 53 +++++++++++-------------------------------- 1 file changed, 13 insertions(+), 40 deletions(-) diff --git a/app/test/test_stack.c b/app/test/test_stack.c index c8dac1f55..f64d70c9d 100644 --- a/app/test/test_stack.c +++ b/app/test/test_stack.c @@ -276,20 +276,14 @@ struct test_args { rte_atomic64_t *sz; }; +static struct test_args thread_test_args; + static int -stack_thread_push_pop(void *args) +stack_thread_push_pop(__rte_unused void *args) { - struct test_args *t = args; - void **obj_table; + void *obj_table[MAX_BULK]; int i; - obj_table = rte_calloc(NULL, STACK_SIZE, sizeof(void *), 0); - if (obj_table == NULL) { - printf("[%s():%u] failed to calloc %zu bytes\n", - __func__, __LINE__, STACK_SIZE * sizeof(void *)); - return -1; - } - for (i = 0; i < NUM_ITERS_PER_THREAD; i++) { unsigned int success, num; @@ -297,41 +291,37 @@ stack_thread_push_pop(void *args) * then push and pop those stack entries. */ do { - uint64_t sz = rte_atomic64_read(t->sz); + uint64_t sz = rte_atomic64_read(thread_test_args.sz); volatile uint64_t *sz_addr; - sz_addr = (volatile uint64_t *)t->sz; + sz_addr = (volatile uint64_t *)thread_test_args.sz; num = RTE_MIN(rte_rand() % MAX_BULK, STACK_SIZE - sz); success = rte_atomic64_cmpset(sz_addr, sz, sz + num); } while (success == 0); - if (rte_stack_push(t->s, obj_table, num) != num) { + if (rte_stack_push(thread_test_args.s, obj_table, num) != num) { printf("[%s():%u] Failed to push %u pointers\n", __func__, __LINE__, num); - rte_free(obj_table); return -1; } - if (rte_stack_pop(t->s, obj_table, num) != num) { + if (rte_stack_pop(thread_test_args.s, obj_table, num) != num) { printf("[%s():%u] Failed to pop %u pointers\n", __func__, __LINE__, num); - rte_free(obj_table); return -1; } - rte_atomic64_sub(t->sz, num); + rte_atomic64_sub(thread_test_args.sz, num); } - rte_free(obj_table); return 0; } static int test_stack_multithreaded(uint32_t flags) { - struct test_args *args; unsigned int lcore_id; struct rte_stack *s; rte_atomic64_t size; @@ -344,45 +334,28 @@ test_stack_multithreaded(uint32_t flags) printf("[%s():%u] Running with %u lcores\n", __func__, __LINE__, rte_lcore_count()); - args = rte_malloc(NULL, sizeof(struct test_args) * RTE_MAX_LCORE, 0); - if (args == NULL) { - printf("[%s():%u] failed to malloc %zu bytes\n", - __func__, __LINE__, - sizeof(struct test_args) * RTE_MAX_LCORE); - return -1; - } - s = rte_stack_create("test", STACK_SIZE, rte_socket_id(), flags); if (s == NULL) { printf("[%s():%u] Failed to create a stack\n", __func__, __LINE__); - rte_free(args); return -1; } rte_atomic64_init(&size); + thread_test_args.s = s; + thread_test_args.sz = &size; RTE_LCORE_FOREACH_SLAVE(lcore_id) { - args[lcore_id].s = s; - args[lcore_id].sz = &size; - if (rte_eal_remote_launch(stack_thread_push_pop, - &args[lcore_id], lcore_id)) + NULL, lcore_id)) rte_panic("Failed to launch lcore %d\n", lcore_id); } - lcore_id = rte_lcore_id(); - - args[lcore_id].s = s; - args[lcore_id].sz = &size; - - stack_thread_push_pop(&args[lcore_id]); + stack_thread_push_pop(NULL); rte_eal_mp_wait_lcore(); rte_stack_free(s); - rte_free(args); - return 0; } -- 2.17.1
All the cores use the same argument object, so there is no need to use a loop to launch the test on every core one by one. Replace loop with one call to rte_eal_mp_remote_launch Signed-off-by: Steven Lariau <steven.lariau@arm.com> Reviewed-by: Dharmik Thakkar <dharmik.thakkar@arm.com> Reviewed-by: Phil Yang <phil.yang@arm.com> Reviewed-by: Ruifeng Wang <ruifeng.wang@arm.com> Acked-by: Gage Eads <gage.eads@intel.com> --- app/test/test_stack.c | 11 ++--------- 1 file changed, 2 insertions(+), 9 deletions(-) diff --git a/app/test/test_stack.c b/app/test/test_stack.c index f64d70c9d..efd473855 100644 --- a/app/test/test_stack.c +++ b/app/test/test_stack.c @@ -322,7 +322,6 @@ stack_thread_push_pop(__rte_unused void *args) static int test_stack_multithreaded(uint32_t flags) { - unsigned int lcore_id; struct rte_stack *s; rte_atomic64_t size; @@ -345,14 +344,8 @@ test_stack_multithreaded(uint32_t flags) thread_test_args.s = s; thread_test_args.sz = &size; - RTE_LCORE_FOREACH_SLAVE(lcore_id) { - if (rte_eal_remote_launch(stack_thread_push_pop, - NULL, lcore_id)) - rte_panic("Failed to launch lcore %d\n", lcore_id); - } - - stack_thread_push_pop(NULL); - + if (rte_eal_mp_remote_launch(stack_thread_push_pop, NULL, CALL_MASTER)) + rte_panic("Failed to launch tests\n"); rte_eal_mp_wait_lcore(); rte_stack_free(s); -- 2.17.1
Use rte_eal_wait_lcore to wait and get the return value for all cores. This is used to propagate any error to the main core. Signed-off-by: Steven Lariau <steven.lariau@arm.com> Reviewed-by: Dharmik Thakkar <dharmik.thakkar@arm.com> Reviewed-by: Phil Yang <phil.yang@arm.com> Reviewed-by: Ruifeng Wang <ruifeng.wang@arm.com> Acked-by: Gage Eads <gage.eads@intel.com> --- app/test/test_stack.c | 10 ++++++++-- 1 file changed, 8 insertions(+), 2 deletions(-) diff --git a/app/test/test_stack.c b/app/test/test_stack.c index efd473855..d959b566a 100644 --- a/app/test/test_stack.c +++ b/app/test/test_stack.c @@ -322,8 +322,10 @@ stack_thread_push_pop(__rte_unused void *args) static int test_stack_multithreaded(uint32_t flags) { + unsigned int lcore_id; struct rte_stack *s; rte_atomic64_t size; + int result = 0; if (rte_lcore_count() < 2) { printf("Not enough cores for test_stack_multithreaded, expecting at least 2\n"); @@ -346,10 +348,14 @@ test_stack_multithreaded(uint32_t flags) if (rte_eal_mp_remote_launch(stack_thread_push_pop, NULL, CALL_MASTER)) rte_panic("Failed to launch tests\n"); - rte_eal_mp_wait_lcore(); + + RTE_LCORE_FOREACH(lcore_id) { + if (rte_eal_wait_lcore(lcore_id) < 0) + result = -1; + } rte_stack_free(s); - return 0; + return result; } static int -- 2.17.1
Remove the part that checks if there is enough room in the stack, it's always true as long as size of stack >= MAX_BULK*rte_lcore_count(). This check used an atomic cmpset, and read / write to a shared size variable. These operations result in some form of synchronization that might get in the way of the actual stack testing. Signed-off-by: Steven Lariau <steven.lariau@arm.com> Reviewed-by: Dharmik Thakkar <dharmik.thakkar@arm.com> Reviewed-by: Phil Yang <phil.yang@arm.com> Reviewed-by: Ruifeng Wang <ruifeng.wang@arm.com> Acked-by: Gage Eads <gage.eads@intel.com> --- app/test/test_stack.c | 24 +++--------------------- 1 file changed, 3 insertions(+), 21 deletions(-) diff --git a/app/test/test_stack.c b/app/test/test_stack.c index d959b566a..463460ccc 100644 --- a/app/test/test_stack.c +++ b/app/test/test_stack.c @@ -273,7 +273,6 @@ test_free_null(void) struct test_args { struct rte_stack *s; - rte_atomic64_t *sz; }; static struct test_args thread_test_args; @@ -285,21 +284,9 @@ stack_thread_push_pop(__rte_unused void *args) int i; for (i = 0; i < NUM_ITERS_PER_THREAD; i++) { - unsigned int success, num; + unsigned int num; - /* Reserve up to min(MAX_BULK, available slots) stack entries, - * then push and pop those stack entries. - */ - do { - uint64_t sz = rte_atomic64_read(thread_test_args.sz); - volatile uint64_t *sz_addr; - - sz_addr = (volatile uint64_t *)thread_test_args.sz; - - num = RTE_MIN(rte_rand() % MAX_BULK, STACK_SIZE - sz); - - success = rte_atomic64_cmpset(sz_addr, sz, sz + num); - } while (success == 0); + num = rte_rand() % MAX_BULK; if (rte_stack_push(thread_test_args.s, obj_table, num) != num) { printf("[%s():%u] Failed to push %u pointers\n", @@ -312,8 +299,6 @@ stack_thread_push_pop(__rte_unused void *args) __func__, __LINE__, num); return -1; } - - rte_atomic64_sub(thread_test_args.sz, num); } return 0; @@ -324,7 +309,6 @@ test_stack_multithreaded(uint32_t flags) { unsigned int lcore_id; struct rte_stack *s; - rte_atomic64_t size; int result = 0; if (rte_lcore_count() < 2) { @@ -335,16 +319,14 @@ test_stack_multithreaded(uint32_t flags) printf("[%s():%u] Running with %u lcores\n", __func__, __LINE__, rte_lcore_count()); - s = rte_stack_create("test", STACK_SIZE, rte_socket_id(), flags); + s = rte_stack_create("test", MAX_BULK * rte_lcore_count(), rte_socket_id(), flags); if (s == NULL) { printf("[%s():%u] Failed to create a stack\n", __func__, __LINE__); return -1; } - rte_atomic64_init(&size); thread_test_args.s = s; - thread_test_args.sz = &size; if (rte_eal_mp_remote_launch(stack_thread_push_pop, NULL, CALL_MASTER)) rte_panic("Failed to launch tests\n"); -- 2.17.1
> -----Original Message-----
> From: Steven Lariau <steven.lariau@arm.com>
> Sent: Wednesday, August 12, 2020 2:19 PM
> To: Eads, Gage <gage.eads@intel.com>; Olivier Matz
> <olivier.matz@6wind.com>
> Cc: dev@dpdk.org; nd@arm.com; Steven Lariau <steven.lariau@arm.com>
> Subject: [PATCH v2 1/4] test/stack: avoid trivial memory allocations
>
> Replace the arguments array by one argument.
> All objects in the args array have the same values, so there is no need
> to use an array, only one struct is enough.
> The args object is a lot smaller, and the allocation can be replaced
> with a global variable.
>
> The allocation of obj_table isn't needed either, because MAX_BULK is
> small. The allocation can instead be replaced with a static array.
>
> Signed-off-by: Steven Lariau <steven.lariau@arm.com>
> Reviewed-by: Dharmik Thakkar <dharmik.thakkar@arm.com>
> Reviewed-by: Phil Yang <phil.yang@arm.com>
> Reviewed-by: Ruifeng Wang <ruifeng.wang@arm.com>
Acked-by: Gage Eads <gage.eads@intel.com>
Thanks,
Gage
On Wed, Aug 12, 2020 at 9:20 PM Steven Lariau <steven.lariau@arm.com> wrote:
>
> The current multithread DPDK stack test is using atomics operations to
> share information between threads.
> The lockfree stack implementation also uses atomic operations.
> This is an issue for testing. The atomics operations for the test may
> add some extra synchronization to the stack implementation,
> that doesn't exist.
> It makes it harder to find bugs related to memory orderings and data
> races. The main goal of the patch is to remove all atomics operations
> and any other form of data sharing in this test, to make sure that
> most of the execution time is spent on the stack library.
>
> Furthermore, this patch uses more appropriate functions to start /
> wait cores in order to simplify the code.
> The patch also adds code to propagate errors on any slave core to the
> master.
>
> v2: replace stack variable for arguments with a global variable.
>
> Steven Lariau (4):
> test/stack: avoid trivial memory allocations
> test/stack: launch tests with mp remote launch API
> test/stack: propagate errors to main core
> test/stack: remove atomics operations
>
> app/test/test_stack.c | 80 +++++++++----------------------------------
> 1 file changed, 17 insertions(+), 63 deletions(-)
>
Series applied, thanks Steven.
--
David Marchand