* [dpdk-dev] [PATCH 0/4] test/stack: improve multithreaded test @ 2020-08-05 15:45 Steven Lariau 2020-08-05 15:45 ` [dpdk-dev] [PATCH 1/4] test/stack: avoid trivial memory allocations Steven Lariau ` (3 more replies) 0 siblings, 4 replies; 10+ messages in thread From: Steven Lariau @ 2020-08-05 15:45 UTC (permalink / raw) Cc: dev, honnappa.nagarahalli, nd, Steven Lariau The current multithread DPDK stack test is using atomics operations to share information between threads. The lockfree stack implementation also uses atomic operations. This is an issue for testing. The atomics operations for the test may add some extra synchronization to the stack implementation, that doesn't exist. It makes it harder to find bugs related to memory orderings and data races. The main goal of the patch is to remove all atomics operations and any other form of data sharing in this test, to make sure that most of the execution time is spent on the stack library. Furthermore, this patch uses more appropriate functions to start / wait cores in order to simplify the code. The patch also adds code to propagate errors on any slave core to the master. Steven Lariau (4): test/stack: avoid trivial memory allocations test/stack: launch tests with mp remote launch API test/stack: propagate errors to main core test/stack: remove atomics operations app/test/test_stack.c | 71 ++++++++----------------------------------- 1 file changed, 13 insertions(+), 58 deletions(-) -- 2.17.1 ^ permalink raw reply [flat|nested] 10+ messages in thread
* [dpdk-dev] [PATCH 1/4] test/stack: avoid trivial memory allocations 2020-08-05 15:45 [dpdk-dev] [PATCH 0/4] test/stack: improve multithreaded test Steven Lariau @ 2020-08-05 15:45 ` Steven Lariau 2020-08-05 15:45 ` [dpdk-dev] [PATCH 2/4] test/stack: launch tests with mp remote launch API Steven Lariau ` (2 subsequent siblings) 3 siblings, 0 replies; 10+ messages in thread From: Steven Lariau @ 2020-08-05 15:45 UTC (permalink / raw) To: Gage Eads, Olivier Matz; +Cc: dev, honnappa.nagarahalli, nd, Steven Lariau Replace the arguments array by one argument. All objects in the args array have the same values, so there is no need to use an array, only one struct is enough. The args object is a lot smaller, and the allocation can be replaced with a stack variable. The allocation of obj_table isn't needed either, because MAX_BULK is small. The allocation can instead be replaced with a static array. Signed-off-by: Steven Lariau <steven.lariau@arm.com> Reviewed-by: Dharmik Thakkar <dharmik.thakkar@arm.com> Reviewed-by: Phil Yang <phil.yang@arm.com> Reviewed-by: Ruifeng Wang <ruifeng.wang@arm.com> --- app/test/test_stack.c | 39 ++++++--------------------------------- 1 file changed, 6 insertions(+), 33 deletions(-) diff --git a/app/test/test_stack.c b/app/test/test_stack.c index c8dac1f55..5a7273a7d 100644 --- a/app/test/test_stack.c +++ b/app/test/test_stack.c @@ -280,16 +280,9 @@ static int stack_thread_push_pop(void *args) { struct test_args *t = args; - void **obj_table; + void *obj_table[MAX_BULK]; int i; - obj_table = rte_calloc(NULL, STACK_SIZE, sizeof(void *), 0); - if (obj_table == NULL) { - printf("[%s():%u] failed to calloc %zu bytes\n", - __func__, __LINE__, STACK_SIZE * sizeof(void *)); - return -1; - } - for (i = 0; i < NUM_ITERS_PER_THREAD; i++) { unsigned int success, num; @@ -310,28 +303,25 @@ stack_thread_push_pop(void *args) if (rte_stack_push(t->s, obj_table, num) != num) { printf("[%s():%u] Failed to push %u pointers\n", __func__, __LINE__, num); - rte_free(obj_table); return -1; } if (rte_stack_pop(t->s, obj_table, num) != num) { printf("[%s():%u] Failed to pop %u pointers\n", __func__, __LINE__, num); - rte_free(obj_table); return -1; } rte_atomic64_sub(t->sz, num); } - rte_free(obj_table); return 0; } static int test_stack_multithreaded(uint32_t flags) { - struct test_args *args; + struct test_args args; unsigned int lcore_id; struct rte_stack *s; rte_atomic64_t size; @@ -344,45 +334,28 @@ test_stack_multithreaded(uint32_t flags) printf("[%s():%u] Running with %u lcores\n", __func__, __LINE__, rte_lcore_count()); - args = rte_malloc(NULL, sizeof(struct test_args) * RTE_MAX_LCORE, 0); - if (args == NULL) { - printf("[%s():%u] failed to malloc %zu bytes\n", - __func__, __LINE__, - sizeof(struct test_args) * RTE_MAX_LCORE); - return -1; - } - s = rte_stack_create("test", STACK_SIZE, rte_socket_id(), flags); if (s == NULL) { printf("[%s():%u] Failed to create a stack\n", __func__, __LINE__); - rte_free(args); return -1; } rte_atomic64_init(&size); + args.s = s; + args.sz = &size; RTE_LCORE_FOREACH_SLAVE(lcore_id) { - args[lcore_id].s = s; - args[lcore_id].sz = &size; - if (rte_eal_remote_launch(stack_thread_push_pop, - &args[lcore_id], lcore_id)) + &args, lcore_id)) rte_panic("Failed to launch lcore %d\n", lcore_id); } - lcore_id = rte_lcore_id(); - - args[lcore_id].s = s; - args[lcore_id].sz = &size; - - stack_thread_push_pop(&args[lcore_id]); + stack_thread_push_pop(&args); rte_eal_mp_wait_lcore(); rte_stack_free(s); - rte_free(args); - return 0; } -- 2.17.1 ^ permalink raw reply [flat|nested] 10+ messages in thread
* [dpdk-dev] [PATCH 2/4] test/stack: launch tests with mp remote launch API 2020-08-05 15:45 [dpdk-dev] [PATCH 0/4] test/stack: improve multithreaded test Steven Lariau 2020-08-05 15:45 ` [dpdk-dev] [PATCH 1/4] test/stack: avoid trivial memory allocations Steven Lariau @ 2020-08-05 15:45 ` Steven Lariau 2020-08-05 15:46 ` [dpdk-dev] [PATCH 3/4] test/stack: propagate errors to main core Steven Lariau 2020-08-05 15:46 ` [dpdk-dev] [PATCH 4/4] test/stack: remove atomics operations Steven Lariau 3 siblings, 0 replies; 10+ messages in thread From: Steven Lariau @ 2020-08-05 15:45 UTC (permalink / raw) To: Gage Eads, Olivier Matz; +Cc: dev, honnappa.nagarahalli, nd, Steven Lariau All the cores use the same argument object, so there is no need to use a loop to launch the test on every core one by one. Replace loop with one call to rte_eal_mp_remote_launch Signed-off-by: Steven Lariau <steven.lariau@arm.com> Reviewed-by: Dharmik Thakkar <dharmik.thakkar@arm.com> Reviewed-by: Phil Yang <phil.yang@arm.com> Reviewed-by: Ruifeng Wang <ruifeng.wang@arm.com> --- app/test/test_stack.c | 11 ++--------- 1 file changed, 2 insertions(+), 9 deletions(-) diff --git a/app/test/test_stack.c b/app/test/test_stack.c index 5a7273a7d..c100d9faf 100644 --- a/app/test/test_stack.c +++ b/app/test/test_stack.c @@ -322,7 +322,6 @@ static int test_stack_multithreaded(uint32_t flags) { struct test_args args; - unsigned int lcore_id; struct rte_stack *s; rte_atomic64_t size; @@ -345,14 +344,8 @@ test_stack_multithreaded(uint32_t flags) args.s = s; args.sz = &size; - RTE_LCORE_FOREACH_SLAVE(lcore_id) { - if (rte_eal_remote_launch(stack_thread_push_pop, - &args, lcore_id)) - rte_panic("Failed to launch lcore %d\n", lcore_id); - } - - stack_thread_push_pop(&args); - + if (rte_eal_mp_remote_launch(stack_thread_push_pop, &args, CALL_MASTER)) + rte_panic("Failed to launch tests\n"); rte_eal_mp_wait_lcore(); rte_stack_free(s); -- 2.17.1 ^ permalink raw reply [flat|nested] 10+ messages in thread
* [dpdk-dev] [PATCH 3/4] test/stack: propagate errors to main core 2020-08-05 15:45 [dpdk-dev] [PATCH 0/4] test/stack: improve multithreaded test Steven Lariau 2020-08-05 15:45 ` [dpdk-dev] [PATCH 1/4] test/stack: avoid trivial memory allocations Steven Lariau 2020-08-05 15:45 ` [dpdk-dev] [PATCH 2/4] test/stack: launch tests with mp remote launch API Steven Lariau @ 2020-08-05 15:46 ` Steven Lariau 2020-08-05 15:46 ` [dpdk-dev] [PATCH 4/4] test/stack: remove atomics operations Steven Lariau 3 siblings, 0 replies; 10+ messages in thread From: Steven Lariau @ 2020-08-05 15:46 UTC (permalink / raw) To: Gage Eads, Olivier Matz; +Cc: dev, honnappa.nagarahalli, nd, Steven Lariau Use rte_eal_wait_lcore to wait and get the return value for all cores. This is used to propagate any error to the main core. Signed-off-by: Steven Lariau <steven.lariau@arm.com> Reviewed-by: Dharmik Thakkar <dharmik.thakkar@arm.com> Reviewed-by: Phil Yang <phil.yang@arm.com> Reviewed-by: Ruifeng Wang <ruifeng.wang@arm.com> --- app/test/test_stack.c | 10 ++++++++-- 1 file changed, 8 insertions(+), 2 deletions(-) diff --git a/app/test/test_stack.c b/app/test/test_stack.c index c100d9faf..b9d6befd4 100644 --- a/app/test/test_stack.c +++ b/app/test/test_stack.c @@ -322,8 +322,10 @@ static int test_stack_multithreaded(uint32_t flags) { struct test_args args; + unsigned int lcore_id; struct rte_stack *s; rte_atomic64_t size; + int result = 0; if (rte_lcore_count() < 2) { printf("Not enough cores for test_stack_multithreaded, expecting at least 2\n"); @@ -346,10 +348,14 @@ test_stack_multithreaded(uint32_t flags) if (rte_eal_mp_remote_launch(stack_thread_push_pop, &args, CALL_MASTER)) rte_panic("Failed to launch tests\n"); - rte_eal_mp_wait_lcore(); + + RTE_LCORE_FOREACH(lcore_id) { + if (rte_eal_wait_lcore(lcore_id) < 0) + result = -1; + } rte_stack_free(s); - return 0; + return result; } static int -- 2.17.1 ^ permalink raw reply [flat|nested] 10+ messages in thread
* [dpdk-dev] [PATCH 4/4] test/stack: remove atomics operations 2020-08-05 15:45 [dpdk-dev] [PATCH 0/4] test/stack: improve multithreaded test Steven Lariau ` (2 preceding siblings ...) 2020-08-05 15:46 ` [dpdk-dev] [PATCH 3/4] test/stack: propagate errors to main core Steven Lariau @ 2020-08-05 15:46 ` Steven Lariau 3 siblings, 0 replies; 10+ messages in thread From: Steven Lariau @ 2020-08-05 15:46 UTC (permalink / raw) To: Gage Eads, Olivier Matz; +Cc: dev, honnappa.nagarahalli, nd, Steven Lariau Remove the part that checks if there is enough room in the stack, it's always true as long as size of stack >= MAX_BULK*rte_lcore_count(). This check used an atomic cmpset, and read / write to a shared size variable. These operations result in some form of synchronization that might get in the way of the actual stack testing. Signed-off-by: Steven Lariau <steven.lariau@arm.com> Reviewed-by: Dharmik Thakkar <dharmik.thakkar@arm.com> Reviewed-by: Phil Yang <phil.yang@arm.com> Reviewed-by: Ruifeng Wang <ruifeng.wang@arm.com> --- app/test/test_stack.c | 23 +++-------------------- 1 file changed, 3 insertions(+), 20 deletions(-) diff --git a/app/test/test_stack.c b/app/test/test_stack.c index b9d6befd4..02bea7ec5 100644 --- a/app/test/test_stack.c +++ b/app/test/test_stack.c @@ -273,7 +273,6 @@ test_free_null(void) struct test_args { struct rte_stack *s; - rte_atomic64_t *sz; }; static int @@ -284,21 +283,9 @@ stack_thread_push_pop(void *args) int i; for (i = 0; i < NUM_ITERS_PER_THREAD; i++) { - unsigned int success, num; + unsigned int num; - /* Reserve up to min(MAX_BULK, available slots) stack entries, - * then push and pop those stack entries. - */ - do { - uint64_t sz = rte_atomic64_read(t->sz); - volatile uint64_t *sz_addr; - - sz_addr = (volatile uint64_t *)t->sz; - - num = RTE_MIN(rte_rand() % MAX_BULK, STACK_SIZE - sz); - - success = rte_atomic64_cmpset(sz_addr, sz, sz + num); - } while (success == 0); + num = rte_rand() % MAX_BULK; if (rte_stack_push(t->s, obj_table, num) != num) { printf("[%s():%u] Failed to push %u pointers\n", @@ -312,7 +299,6 @@ stack_thread_push_pop(void *args) return -1; } - rte_atomic64_sub(t->sz, num); } return 0; @@ -324,7 +310,6 @@ test_stack_multithreaded(uint32_t flags) struct test_args args; unsigned int lcore_id; struct rte_stack *s; - rte_atomic64_t size; int result = 0; if (rte_lcore_count() < 2) { @@ -335,16 +320,14 @@ test_stack_multithreaded(uint32_t flags) printf("[%s():%u] Running with %u lcores\n", __func__, __LINE__, rte_lcore_count()); - s = rte_stack_create("test", STACK_SIZE, rte_socket_id(), flags); + s = rte_stack_create("test", MAX_BULK * rte_lcore_count(), rte_socket_id(), flags); if (s == NULL) { printf("[%s():%u] Failed to create a stack\n", __func__, __LINE__); return -1; } - rte_atomic64_init(&size); args.s = s; - args.sz = &size; if (rte_eal_mp_remote_launch(stack_thread_push_pop, &args, CALL_MASTER)) rte_panic("Failed to launch tests\n"); -- 2.17.1 ^ permalink raw reply [flat|nested] 10+ messages in thread
* [dpdk-dev] [PATCH 0/4] test/stack: improve multithreaded test @ 2020-08-05 15:57 Steven Lariau 2020-08-05 15:57 ` [dpdk-dev] [PATCH 1/4] test/stack: avoid trivial memory allocations Steven Lariau 0 siblings, 1 reply; 10+ messages in thread From: Steven Lariau @ 2020-08-05 15:57 UTC (permalink / raw) Cc: dev, honnappa.nagarahalli, dharmik.thakkar, nd, Steven Lariau The current multithread DPDK stack test is using atomics operations to share information between threads. The lockfree stack implementation also uses atomic operations. This is an issue for testing. The atomics operations for the test may add some extra synchronization to the stack implementation, that doesn't exist. It makes it harder to find bugs related to memory orderings and data races. The main goal of the patch is to remove all atomics operations and any other form of data sharing in this test, to make sure that most of the execution time is spent on the stack library. Furthermore, this patch uses more appropriate functions to start / wait cores in order to simplify the code. The patch also adds code to propagate errors on any slave core to the master. Steven Lariau (4): test/stack: avoid trivial memory allocations test/stack: launch tests with mp remote launch API test/stack: propagate errors to main core test/stack: remove atomics operations app/test/test_stack.c | 71 ++++++++----------------------------------- 1 file changed, 13 insertions(+), 58 deletions(-) -- 2.17.1 ^ permalink raw reply [flat|nested] 10+ messages in thread
* [dpdk-dev] [PATCH 1/4] test/stack: avoid trivial memory allocations 2020-08-05 15:57 [dpdk-dev] [PATCH 0/4] test/stack: improve multithreaded test Steven Lariau @ 2020-08-05 15:57 ` Steven Lariau 2020-08-11 20:13 ` Eads, Gage 0 siblings, 1 reply; 10+ messages in thread From: Steven Lariau @ 2020-08-05 15:57 UTC (permalink / raw) To: Gage Eads, Olivier Matz Cc: dev, honnappa.nagarahalli, dharmik.thakkar, nd, Steven Lariau Replace the arguments array by one argument. All objects in the args array have the same values, so there is no need to use an array, only one struct is enough. The args object is a lot smaller, and the allocation can be replaced with a stack variable. The allocation of obj_table isn't needed either, because MAX_BULK is small. The allocation can instead be replaced with a static array. Signed-off-by: Steven Lariau <steven.lariau@arm.com> Reviewed-by: Dharmik Thakkar <dharmik.thakkar@arm.com> Reviewed-by: Phil Yang <phil.yang@arm.com> Reviewed-by: Ruifeng Wang <ruifeng.wang@arm.com> --- app/test/test_stack.c | 39 ++++++--------------------------------- 1 file changed, 6 insertions(+), 33 deletions(-) diff --git a/app/test/test_stack.c b/app/test/test_stack.c index c8dac1f55..5a7273a7d 100644 --- a/app/test/test_stack.c +++ b/app/test/test_stack.c @@ -280,16 +280,9 @@ static int stack_thread_push_pop(void *args) { struct test_args *t = args; - void **obj_table; + void *obj_table[MAX_BULK]; int i; - obj_table = rte_calloc(NULL, STACK_SIZE, sizeof(void *), 0); - if (obj_table == NULL) { - printf("[%s():%u] failed to calloc %zu bytes\n", - __func__, __LINE__, STACK_SIZE * sizeof(void *)); - return -1; - } - for (i = 0; i < NUM_ITERS_PER_THREAD; i++) { unsigned int success, num; @@ -310,28 +303,25 @@ stack_thread_push_pop(void *args) if (rte_stack_push(t->s, obj_table, num) != num) { printf("[%s():%u] Failed to push %u pointers\n", __func__, __LINE__, num); - rte_free(obj_table); return -1; } if (rte_stack_pop(t->s, obj_table, num) != num) { printf("[%s():%u] Failed to pop %u pointers\n", __func__, __LINE__, num); - rte_free(obj_table); return -1; } rte_atomic64_sub(t->sz, num); } - rte_free(obj_table); return 0; } static int test_stack_multithreaded(uint32_t flags) { - struct test_args *args; + struct test_args args; unsigned int lcore_id; struct rte_stack *s; rte_atomic64_t size; @@ -344,45 +334,28 @@ test_stack_multithreaded(uint32_t flags) printf("[%s():%u] Running with %u lcores\n", __func__, __LINE__, rte_lcore_count()); - args = rte_malloc(NULL, sizeof(struct test_args) * RTE_MAX_LCORE, 0); - if (args == NULL) { - printf("[%s():%u] failed to malloc %zu bytes\n", - __func__, __LINE__, - sizeof(struct test_args) * RTE_MAX_LCORE); - return -1; - } - s = rte_stack_create("test", STACK_SIZE, rte_socket_id(), flags); if (s == NULL) { printf("[%s():%u] Failed to create a stack\n", __func__, __LINE__); - rte_free(args); return -1; } rte_atomic64_init(&size); + args.s = s; + args.sz = &size; RTE_LCORE_FOREACH_SLAVE(lcore_id) { - args[lcore_id].s = s; - args[lcore_id].sz = &size; - if (rte_eal_remote_launch(stack_thread_push_pop, - &args[lcore_id], lcore_id)) + &args, lcore_id)) rte_panic("Failed to launch lcore %d\n", lcore_id); } - lcore_id = rte_lcore_id(); - - args[lcore_id].s = s; - args[lcore_id].sz = &size; - - stack_thread_push_pop(&args[lcore_id]); + stack_thread_push_pop(&args); rte_eal_mp_wait_lcore(); rte_stack_free(s); - rte_free(args); - return 0; } -- 2.17.1 ^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [dpdk-dev] [PATCH 1/4] test/stack: avoid trivial memory allocations 2020-08-05 15:57 ` [dpdk-dev] [PATCH 1/4] test/stack: avoid trivial memory allocations Steven Lariau @ 2020-08-11 20:13 ` Eads, Gage 2020-08-11 20:38 ` Stephen Hemminger 0 siblings, 1 reply; 10+ messages in thread From: Eads, Gage @ 2020-08-11 20:13 UTC (permalink / raw) To: Steven Lariau, Olivier Matz Cc: dev, honnappa.nagarahalli, dharmik.thakkar, nd Hi Steven, > -----Original Message----- > From: Steven Lariau <steven.lariau@arm.com> > Sent: Wednesday, August 5, 2020 10:57 AM > To: Eads, Gage <gage.eads@intel.com>; Olivier Matz > <olivier.matz@6wind.com> > Cc: dev@dpdk.org; honnappa.nagarahalli@arm.com; > dharmik.thakkar@arm.com; nd@arm.com; Steven Lariau > <steven.lariau@arm.com> > Subject: [PATCH 1/4] test/stack: avoid trivial memory allocations > > Replace the arguments array by one argument. > All objects in the args array have the same values, so there is no need > to use an array, only one struct is enough. > The args object is a lot smaller, and the allocation can be replaced > with a stack variable. > > The allocation of obj_table isn't needed either, because MAX_BULK is > small. The allocation can instead be replaced with a static array. > > Signed-off-by: Steven Lariau <steven.lariau@arm.com> > Reviewed-by: Dharmik Thakkar <dharmik.thakkar@arm.com> > Reviewed-by: Phil Yang <phil.yang@arm.com> > Reviewed-by: Ruifeng Wang <ruifeng.wang@arm.com> > --- > app/test/test_stack.c | 39 ++++++--------------------------------- > 1 file changed, 6 insertions(+), 33 deletions(-) > > diff --git a/app/test/test_stack.c b/app/test/test_stack.c > index c8dac1f55..5a7273a7d 100644 > --- a/app/test/test_stack.c > +++ b/app/test/test_stack.c > @@ -280,16 +280,9 @@ static int > stack_thread_push_pop(void *args) > { > struct test_args *t = args; > - void **obj_table; > + void *obj_table[MAX_BULK]; > int i; > > - obj_table = rte_calloc(NULL, STACK_SIZE, sizeof(void *), 0); > - if (obj_table == NULL) { > - printf("[%s():%u] failed to calloc %zu bytes\n", > - __func__, __LINE__, STACK_SIZE * sizeof(void *)); > - return -1; > - } > - > for (i = 0; i < NUM_ITERS_PER_THREAD; i++) { > unsigned int success, num; > > @@ -310,28 +303,25 @@ stack_thread_push_pop(void *args) > if (rte_stack_push(t->s, obj_table, num) != num) { > printf("[%s():%u] Failed to push %u pointers\n", > __func__, __LINE__, num); > - rte_free(obj_table); > return -1; > } > > if (rte_stack_pop(t->s, obj_table, num) != num) { > printf("[%s():%u] Failed to pop %u pointers\n", > __func__, __LINE__, num); > - rte_free(obj_table); > return -1; > } > > rte_atomic64_sub(t->sz, num); > } > > - rte_free(obj_table); > return 0; > } Agreed, the dynamic allocation is unnecessary. > > static int > test_stack_multithreaded(uint32_t flags) > { > - struct test_args *args; > + struct test_args args; > unsigned int lcore_id; > struct rte_stack *s; > rte_atomic64_t size; > @@ -344,45 +334,28 @@ test_stack_multithreaded(uint32_t flags) > printf("[%s():%u] Running with %u lcores\n", > __func__, __LINE__, rte_lcore_count()); > > - args = rte_malloc(NULL, sizeof(struct test_args) * RTE_MAX_LCORE, > 0); > - if (args == NULL) { > - printf("[%s():%u] failed to malloc %zu bytes\n", > - __func__, __LINE__, > - sizeof(struct test_args) * RTE_MAX_LCORE); > - return -1; > - } > - > s = rte_stack_create("test", STACK_SIZE, rte_socket_id(), flags); > if (s == NULL) { > printf("[%s():%u] Failed to create a stack\n", > __func__, __LINE__); > - rte_free(args); > return -1; > } > > rte_atomic64_init(&size); > + args.s = s; > + args.sz = &size; > > RTE_LCORE_FOREACH_SLAVE(lcore_id) { > - args[lcore_id].s = s; > - args[lcore_id].sz = &size; > - > if (rte_eal_remote_launch(stack_thread_push_pop, > - &args[lcore_id], lcore_id)) > + &args, lcore_id)) > rte_panic("Failed to launch lcore %d\n", lcore_id); > } In general we shouldn't pass a stack variable to other threads. Though your code here looks fine, I'd rather err on the safe side in case this is ever used as a template/basis for some other code...particularly since there's no performance/correctness/etc. penalty to using dynamically allocated memory. To support patch 2/4, you can instead convert the rte_malloc to allocate a single shared test_args structure. Or perhaps move patch 4 earlier in the series, and simply pass the stack pointer instead. Thanks, Gage ^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [dpdk-dev] [PATCH 1/4] test/stack: avoid trivial memory allocations 2020-08-11 20:13 ` Eads, Gage @ 2020-08-11 20:38 ` Stephen Hemminger 2020-08-11 20:49 ` Honnappa Nagarahalli 0 siblings, 1 reply; 10+ messages in thread From: Stephen Hemminger @ 2020-08-11 20:38 UTC (permalink / raw) To: Eads, Gage Cc: Steven Lariau, Olivier Matz, dev, honnappa.nagarahalli, dharmik.thakkar, nd On Tue, 11 Aug 2020 20:13:24 +0000 "Eads, Gage" <gage.eads@intel.com> wrote: > Hi Steven, > > > -----Original Message----- > > From: Steven Lariau <steven.lariau@arm.com> > > Sent: Wednesday, August 5, 2020 10:57 AM > > To: Eads, Gage <gage.eads@intel.com>; Olivier Matz > > <olivier.matz@6wind.com> > > Cc: dev@dpdk.org; honnappa.nagarahalli@arm.com; > > dharmik.thakkar@arm.com; nd@arm.com; Steven Lariau > > <steven.lariau@arm.com> > > Subject: [PATCH 1/4] test/stack: avoid trivial memory allocations > > > > Replace the arguments array by one argument. > > All objects in the args array have the same values, so there is no need > > to use an array, only one struct is enough. > > The args object is a lot smaller, and the allocation can be replaced > > with a stack variable. > > > > The allocation of obj_table isn't needed either, because MAX_BULK is > > small. The allocation can instead be replaced with a static array. > > > > Signed-off-by: Steven Lariau <steven.lariau@arm.com> > > Reviewed-by: Dharmik Thakkar <dharmik.thakkar@arm.com> > > Reviewed-by: Phil Yang <phil.yang@arm.com> > > Reviewed-by: Ruifeng Wang <ruifeng.wang@arm.com> > > --- > > app/test/test_stack.c | 39 ++++++--------------------------------- > > 1 file changed, 6 insertions(+), 33 deletions(-) > > > > diff --git a/app/test/test_stack.c b/app/test/test_stack.c > > index c8dac1f55..5a7273a7d 100644 > > --- a/app/test/test_stack.c > > +++ b/app/test/test_stack.c > > @@ -280,16 +280,9 @@ static int > > stack_thread_push_pop(void *args) > > { > > struct test_args *t = args; > > - void **obj_table; > > + void *obj_table[MAX_BULK]; > > int i; > > > > - obj_table = rte_calloc(NULL, STACK_SIZE, sizeof(void *), 0); > > - if (obj_table == NULL) { > > - printf("[%s():%u] failed to calloc %zu bytes\n", > > - __func__, __LINE__, STACK_SIZE * sizeof(void *)); > > - return -1; > > - } > > - > > for (i = 0; i < NUM_ITERS_PER_THREAD; i++) { > > unsigned int success, num; > > > > @@ -310,28 +303,25 @@ stack_thread_push_pop(void *args) > > if (rte_stack_push(t->s, obj_table, num) != num) { > > printf("[%s():%u] Failed to push %u pointers\n", > > __func__, __LINE__, num); > > - rte_free(obj_table); > > return -1; > > } > > > > if (rte_stack_pop(t->s, obj_table, num) != num) { > > printf("[%s():%u] Failed to pop %u pointers\n", > > __func__, __LINE__, num); > > - rte_free(obj_table); > > return -1; > > } > > > > rte_atomic64_sub(t->sz, num); > > } > > > > - rte_free(obj_table); > > return 0; > > } > > Agreed, the dynamic allocation is unnecessary. > > > > > static int > > test_stack_multithreaded(uint32_t flags) > > { > > - struct test_args *args; > > + struct test_args args; > > unsigned int lcore_id; > > struct rte_stack *s; > > rte_atomic64_t size; > > @@ -344,45 +334,28 @@ test_stack_multithreaded(uint32_t flags) > > printf("[%s():%u] Running with %u lcores\n", > > __func__, __LINE__, rte_lcore_count()); > > > > - args = rte_malloc(NULL, sizeof(struct test_args) * RTE_MAX_LCORE, > > 0); > > - if (args == NULL) { > > - printf("[%s():%u] failed to malloc %zu bytes\n", > > - __func__, __LINE__, > > - sizeof(struct test_args) * RTE_MAX_LCORE); > > - return -1; > > - } > > - > > s = rte_stack_create("test", STACK_SIZE, rte_socket_id(), flags); > > if (s == NULL) { > > printf("[%s():%u] Failed to create a stack\n", > > __func__, __LINE__); > > - rte_free(args); > > return -1; > > } > > > > rte_atomic64_init(&size); > > + args.s = s; > > + args.sz = &size; > > > > RTE_LCORE_FOREACH_SLAVE(lcore_id) { > > - args[lcore_id].s = s; > > - args[lcore_id].sz = &size; > > - > > if (rte_eal_remote_launch(stack_thread_push_pop, > > - &args[lcore_id], lcore_id)) > > + &args, lcore_id)) > > rte_panic("Failed to launch lcore %d\n", lcore_id); > > } > > > In general we shouldn't pass a stack variable to other threads. Though your > code here looks fine, I'd rather err on the safe side in case this is ever used > as a template/basis for some other code...particularly since there's no > performance/correctness/etc. penalty to using dynamically allocated memory. > > To support patch 2/4, you can instead convert the rte_malloc to allocate a > single shared test_args structure. Or perhaps move patch 4 earlier in the series, > and simply pass the stack pointer instead. > > Thanks, > Gage There is no gain to using rte_malloc unless you are doing primary/secondary process or trying to test rte_malloc. Why not use regular malloc which has good tools and library support. ^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [dpdk-dev] [PATCH 1/4] test/stack: avoid trivial memory allocations 2020-08-11 20:38 ` Stephen Hemminger @ 2020-08-11 20:49 ` Honnappa Nagarahalli 2020-08-11 21:14 ` Eads, Gage 0 siblings, 1 reply; 10+ messages in thread From: Honnappa Nagarahalli @ 2020-08-11 20:49 UTC (permalink / raw) To: Stephen Hemminger, Eads, Gage Cc: Steven Lariau, Olivier Matz, dev, Dharmik Thakkar, nd, Honnappa Nagarahalli, nd <snip> > > > > > > Replace the arguments array by one argument. > > > All objects in the args array have the same values, so there is no > > > need to use an array, only one struct is enough. > > > The args object is a lot smaller, and the allocation can be replaced > > > with a stack variable. > > > > > > The allocation of obj_table isn't needed either, because MAX_BULK is > > > small. The allocation can instead be replaced with a static array. > > > > > > Signed-off-by: Steven Lariau <steven.lariau@arm.com> > > > Reviewed-by: Dharmik Thakkar <dharmik.thakkar@arm.com> > > > Reviewed-by: Phil Yang <phil.yang@arm.com> > > > Reviewed-by: Ruifeng Wang <ruifeng.wang@arm.com> > > > --- > > > app/test/test_stack.c | 39 ++++++--------------------------------- > > > 1 file changed, 6 insertions(+), 33 deletions(-) > > > > > > diff --git a/app/test/test_stack.c b/app/test/test_stack.c index > > > c8dac1f55..5a7273a7d 100644 > > > --- a/app/test/test_stack.c > > > +++ b/app/test/test_stack.c > > > @@ -280,16 +280,9 @@ static int > > > stack_thread_push_pop(void *args) > > > { > > > struct test_args *t = args; > > > - void **obj_table; > > > + void *obj_table[MAX_BULK]; > > > int i; > > > > > > - obj_table = rte_calloc(NULL, STACK_SIZE, sizeof(void *), 0); > > > - if (obj_table == NULL) { > > > - printf("[%s():%u] failed to calloc %zu bytes\n", > > > - __func__, __LINE__, STACK_SIZE * sizeof(void *)); > > > - return -1; > > > - } > > > - > > > for (i = 0; i < NUM_ITERS_PER_THREAD; i++) { > > > unsigned int success, num; > > > > > > @@ -310,28 +303,25 @@ stack_thread_push_pop(void *args) > > > if (rte_stack_push(t->s, obj_table, num) != num) { > > > printf("[%s():%u] Failed to push %u pointers\n", > > > __func__, __LINE__, num); > > > - rte_free(obj_table); > > > return -1; > > > } > > > > > > if (rte_stack_pop(t->s, obj_table, num) != num) { > > > printf("[%s():%u] Failed to pop %u pointers\n", > > > __func__, __LINE__, num); > > > - rte_free(obj_table); > > > return -1; > > > } > > > > > > rte_atomic64_sub(t->sz, num); > > > } > > > > > > - rte_free(obj_table); > > > return 0; > > > } > > > > Agreed, the dynamic allocation is unnecessary. > > > > > > > > static int > > > test_stack_multithreaded(uint32_t flags) { > > > - struct test_args *args; > > > + struct test_args args; > > > unsigned int lcore_id; > > > struct rte_stack *s; > > > rte_atomic64_t size; > > > @@ -344,45 +334,28 @@ test_stack_multithreaded(uint32_t flags) > > > printf("[%s():%u] Running with %u lcores\n", > > > __func__, __LINE__, rte_lcore_count()); > > > > > > - args = rte_malloc(NULL, sizeof(struct test_args) * RTE_MAX_LCORE, > > > 0); > > > - if (args == NULL) { > > > - printf("[%s():%u] failed to malloc %zu bytes\n", > > > - __func__, __LINE__, > > > - sizeof(struct test_args) * RTE_MAX_LCORE); > > > - return -1; > > > - } > > > - > > > s = rte_stack_create("test", STACK_SIZE, rte_socket_id(), flags); > > > if (s == NULL) { > > > printf("[%s():%u] Failed to create a stack\n", > > > __func__, __LINE__); > > > - rte_free(args); > > > return -1; > > > } > > > > > > rte_atomic64_init(&size); > > > + args.s = s; > > > + args.sz = &size; > > > > > > RTE_LCORE_FOREACH_SLAVE(lcore_id) { > > > - args[lcore_id].s = s; > > > - args[lcore_id].sz = &size; > > > - > > > if (rte_eal_remote_launch(stack_thread_push_pop, > > > - &args[lcore_id], lcore_id)) > > > + &args, lcore_id)) > > > rte_panic("Failed to launch lcore %d\n", lcore_id); > > > } > > > > > > In general we shouldn't pass a stack variable to other threads. Though > > your code here looks fine, I'd rather err on the safe side in case > > this is ever used as a template/basis for some other > > code...particularly since there's no performance/correctness/etc. penalty to > using dynamically allocated memory. > > > > To support patch 2/4, you can instead convert the rte_malloc to > > allocate a single shared test_args structure. Or perhaps move patch 4 > > earlier in the series, and simply pass the stack pointer instead. > > > > Thanks, > > Gage > > There is no gain to using rte_malloc unless you are doing primary/secondary > process or trying to test rte_malloc. Why not use regular malloc which has > good tools and library support. I think making 'args' a global variable is enough in this case. ^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [dpdk-dev] [PATCH 1/4] test/stack: avoid trivial memory allocations 2020-08-11 20:49 ` Honnappa Nagarahalli @ 2020-08-11 21:14 ` Eads, Gage 0 siblings, 0 replies; 10+ messages in thread From: Eads, Gage @ 2020-08-11 21:14 UTC (permalink / raw) To: Honnappa Nagarahalli, Stephen Hemminger Cc: Steven Lariau, Olivier Matz, dev, Dharmik Thakkar, nd, nd > -----Original Message----- > From: Honnappa Nagarahalli <Honnappa.Nagarahalli@arm.com> > Sent: Tuesday, August 11, 2020 3:50 PM > To: Stephen Hemminger <stephen@networkplumber.org>; Eads, Gage > <gage.eads@intel.com> > Cc: Steven Lariau <Steven.Lariau@arm.com>; Olivier Matz > <olivier.matz@6wind.com>; dev@dpdk.org; Dharmik Thakkar > <Dharmik.Thakkar@arm.com>; nd <nd@arm.com>; Honnappa Nagarahalli > <Honnappa.Nagarahalli@arm.com>; nd <nd@arm.com> > Subject: RE: [dpdk-dev] [PATCH 1/4] test/stack: avoid trivial memory > allocations > > <snip> > > > > > > > > > Replace the arguments array by one argument. > > > > All objects in the args array have the same values, so there is no > > > > need to use an array, only one struct is enough. > > > > The args object is a lot smaller, and the allocation can be replaced > > > > with a stack variable. > > > > > > > > The allocation of obj_table isn't needed either, because MAX_BULK is > > > > small. The allocation can instead be replaced with a static array. > > > > > > > > Signed-off-by: Steven Lariau <steven.lariau@arm.com> > > > > Reviewed-by: Dharmik Thakkar <dharmik.thakkar@arm.com> > > > > Reviewed-by: Phil Yang <phil.yang@arm.com> > > > > Reviewed-by: Ruifeng Wang <ruifeng.wang@arm.com> > > > > --- > > > > app/test/test_stack.c | 39 ++++++--------------------------------- > > > > 1 file changed, 6 insertions(+), 33 deletions(-) > > > > > > > > diff --git a/app/test/test_stack.c b/app/test/test_stack.c index > > > > c8dac1f55..5a7273a7d 100644 > > > > --- a/app/test/test_stack.c > > > > +++ b/app/test/test_stack.c > > > > @@ -280,16 +280,9 @@ static int > > > > stack_thread_push_pop(void *args) > > > > { > > > > struct test_args *t = args; > > > > - void **obj_table; > > > > + void *obj_table[MAX_BULK]; > > > > int i; > > > > > > > > - obj_table = rte_calloc(NULL, STACK_SIZE, sizeof(void *), 0); > > > > - if (obj_table == NULL) { > > > > - printf("[%s():%u] failed to calloc %zu bytes\n", > > > > - __func__, __LINE__, STACK_SIZE * sizeof(void *)); > > > > - return -1; > > > > - } > > > > - > > > > for (i = 0; i < NUM_ITERS_PER_THREAD; i++) { > > > > unsigned int success, num; > > > > > > > > @@ -310,28 +303,25 @@ stack_thread_push_pop(void *args) > > > > if (rte_stack_push(t->s, obj_table, num) != num) { > > > > printf("[%s():%u] Failed to push %u pointers\n", > > > > __func__, __LINE__, num); > > > > - rte_free(obj_table); > > > > return -1; > > > > } > > > > > > > > if (rte_stack_pop(t->s, obj_table, num) != num) { > > > > printf("[%s():%u] Failed to pop %u pointers\n", > > > > __func__, __LINE__, num); > > > > - rte_free(obj_table); > > > > return -1; > > > > } > > > > > > > > rte_atomic64_sub(t->sz, num); > > > > } > > > > > > > > - rte_free(obj_table); > > > > return 0; > > > > } > > > > > > Agreed, the dynamic allocation is unnecessary. > > > > > > > > > > > static int > > > > test_stack_multithreaded(uint32_t flags) { > > > > - struct test_args *args; > > > > + struct test_args args; > > > > unsigned int lcore_id; > > > > struct rte_stack *s; > > > > rte_atomic64_t size; > > > > @@ -344,45 +334,28 @@ test_stack_multithreaded(uint32_t flags) > > > > printf("[%s():%u] Running with %u lcores\n", > > > > __func__, __LINE__, rte_lcore_count()); > > > > > > > > - args = rte_malloc(NULL, sizeof(struct test_args) * RTE_MAX_LCORE, > > > > 0); > > > > - if (args == NULL) { > > > > - printf("[%s():%u] failed to malloc %zu bytes\n", > > > > - __func__, __LINE__, > > > > - sizeof(struct test_args) * RTE_MAX_LCORE); > > > > - return -1; > > > > - } > > > > - > > > > s = rte_stack_create("test", STACK_SIZE, rte_socket_id(), flags); > > > > if (s == NULL) { > > > > printf("[%s():%u] Failed to create a stack\n", > > > > __func__, __LINE__); > > > > - rte_free(args); > > > > return -1; > > > > } > > > > > > > > rte_atomic64_init(&size); > > > > + args.s = s; > > > > + args.sz = &size; > > > > > > > > RTE_LCORE_FOREACH_SLAVE(lcore_id) { > > > > - args[lcore_id].s = s; > > > > - args[lcore_id].sz = &size; > > > > - > > > > if (rte_eal_remote_launch(stack_thread_push_pop, > > > > - &args[lcore_id], lcore_id)) > > > > + &args, lcore_id)) > > > > rte_panic("Failed to launch lcore %d\n", lcore_id); > > > > } > > > > > > > > > In general we shouldn't pass a stack variable to other threads. Though > > > your code here looks fine, I'd rather err on the safe side in case > > > this is ever used as a template/basis for some other > > > code...particularly since there's no performance/correctness/etc. > penalty to > > using dynamically allocated memory. > > > > > > To support patch 2/4, you can instead convert the rte_malloc to > > > allocate a single shared test_args structure. Or perhaps move patch 4 > > > earlier in the series, and simply pass the stack pointer instead. > > > > > > Thanks, > > > Gage > > > > There is no gain to using rte_malloc unless you are doing > primary/secondary > > process or trying to test rte_malloc. Why not use regular malloc which has > > good tools and library support. > > I think making 'args' a global variable is enough in this case. Agreed. Thanks, Gage ^ permalink raw reply [flat|nested] 10+ messages in thread
end of thread, other threads:[~2020-08-11 21:15 UTC | newest] Thread overview: 10+ messages (download: mbox.gz / follow: Atom feed) -- links below jump to the message on this page -- 2020-08-05 15:45 [dpdk-dev] [PATCH 0/4] test/stack: improve multithreaded test Steven Lariau 2020-08-05 15:45 ` [dpdk-dev] [PATCH 1/4] test/stack: avoid trivial memory allocations Steven Lariau 2020-08-05 15:45 ` [dpdk-dev] [PATCH 2/4] test/stack: launch tests with mp remote launch API Steven Lariau 2020-08-05 15:46 ` [dpdk-dev] [PATCH 3/4] test/stack: propagate errors to main core Steven Lariau 2020-08-05 15:46 ` [dpdk-dev] [PATCH 4/4] test/stack: remove atomics operations Steven Lariau 2020-08-05 15:57 [dpdk-dev] [PATCH 0/4] test/stack: improve multithreaded test Steven Lariau 2020-08-05 15:57 ` [dpdk-dev] [PATCH 1/4] test/stack: avoid trivial memory allocations Steven Lariau 2020-08-11 20:13 ` Eads, Gage 2020-08-11 20:38 ` Stephen Hemminger 2020-08-11 20:49 ` Honnappa Nagarahalli 2020-08-11 21:14 ` Eads, Gage
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for NNTP newsgroup(s).