* [dpdk-dev] [PATCH] test/service: fix wait for service core @ 2019-11-27 13:20 Harry van Haaren 2019-11-27 14:10 ` Aaron Conole 2019-11-27 21:38 ` [dpdk-dev] " David Marchand 0 siblings, 2 replies; 7+ messages in thread From: Harry van Haaren @ 2019-11-27 13:20 UTC (permalink / raw) To: dev; +Cc: aconole, Harry van Haaren This commit fixes a sporadic failure of the service_autotest unit test, as seen in the DPDK CI. The failure occurs as the main test thread did not wait on the service-thread to return, and allowing it to read a flag before the service was able to write to it. The fix changes the wait API call to specific the service-core ID, and this waits for cores with both ROLE_RTE and ROLE_SERVICE. The rte_eal_mp_wait_lcore() call does not (and should not) wait for service cores, so must not be used to wait on service-cores. Fixes: f038a81e1c56 ("service: add unit tests") Reported-by: Aaron Conole <aconole@redhat.com> Signed-off-by: Harry van Haaren <harry.van.haaren@intel.com> --- Given this is a fix in the unit test, and not a functional change I'm not sure its worth backporting to LTS / stable releases? I've not added stable on CC yet. --- app/test/test_service_cores.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/app/test/test_service_cores.c b/app/test/test_service_cores.c index 9fe38f5e0..a922c7ddc 100644 --- a/app/test/test_service_cores.c +++ b/app/test/test_service_cores.c @@ -483,7 +483,7 @@ service_lcore_en_dis_able(void) int ret = rte_eal_remote_launch(service_remote_launch_func, NULL, slcore_id); TEST_ASSERT_EQUAL(0, ret, "Ex-service core remote launch failed."); - rte_eal_mp_wait_lcore(); + rte_eal_wait_lcore(slcore_id); TEST_ASSERT_EQUAL(1, service_remote_launch_flag, "Ex-service core function call had no effect."); -- 2.17.1 ^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: [dpdk-dev] [PATCH] test/service: fix wait for service core 2019-11-27 13:20 [dpdk-dev] [PATCH] test/service: fix wait for service core Harry van Haaren @ 2019-11-27 14:10 ` Aaron Conole 2019-11-27 14:16 ` Van Haaren, Harry 2019-11-27 21:38 ` [dpdk-dev] " David Marchand 1 sibling, 1 reply; 7+ messages in thread From: Aaron Conole @ 2019-11-27 14:10 UTC (permalink / raw) To: Harry van Haaren; +Cc: dev Harry van Haaren <harry.van.haaren@intel.com> writes: > This commit fixes a sporadic failure of the service_autotest > unit test, as seen in the DPDK CI. The failure occurs as the main test > thread did not wait on the service-thread to return, and allowing it > to read a flag before the service was able to write to it. > > The fix changes the wait API call to specific the service-core ID, > and this waits for cores with both ROLE_RTE and ROLE_SERVICE. > > The rte_eal_mp_wait_lcore() call does not (and should not) wait > for service cores, so must not be used to wait on service-cores. > > Fixes: f038a81e1c56 ("service: add unit tests") > > Reported-by: Aaron Conole <aconole@redhat.com> > Signed-off-by: Harry van Haaren <harry.van.haaren@intel.com> > > --- It might also be good to document this behavior in the API area. It's unclear that the lcore wait function which takes a core id will work, but the broad wait will not. > Given this is a fix in the unit test, and not a functional change > I'm not sure its worth backporting to LTS / stable releases? > I've not added stable on CC yet. I think it's worth it if the LTS / stable branches use the unit tests (otherwise, they will observe sporadic failures). > --- > app/test/test_service_cores.c | 2 +- > 1 file changed, 1 insertion(+), 1 deletion(-) > > diff --git a/app/test/test_service_cores.c b/app/test/test_service_cores.c > index 9fe38f5e0..a922c7ddc 100644 > --- a/app/test/test_service_cores.c > +++ b/app/test/test_service_cores.c > @@ -483,7 +483,7 @@ service_lcore_en_dis_able(void) > int ret = rte_eal_remote_launch(service_remote_launch_func, NULL, > slcore_id); > TEST_ASSERT_EQUAL(0, ret, "Ex-service core remote launch failed."); > - rte_eal_mp_wait_lcore(); > + rte_eal_wait_lcore(slcore_id); > TEST_ASSERT_EQUAL(1, service_remote_launch_flag, > "Ex-service core function call had no effect."); Should we also have some change like the following (just a guess): diff --git a/app/test/test_service_cores.c b/app/test/test_service_cores.c index 9fe38f5e08..695c35ac6c 100644 --- a/app/test/test_service_cores.c +++ b/app/test/test_service_cores.c @@ -773,7 +773,7 @@ service_app_lcore_poll_impl(const int mt_safe) /* flag done, then wait for the spawned 2nd core to return */ params[0] = 1; - rte_eal_mp_wait_lcore(); + rte_eal_wait_lcore(app_core2); /* core two gets launched first - and should hold the service lock */ TEST_ASSERT_EQUAL(0, app_core2_ret, ^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: [dpdk-dev] [PATCH] test/service: fix wait for service core 2019-11-27 14:10 ` Aaron Conole @ 2019-11-27 14:16 ` Van Haaren, Harry 2019-11-27 18:11 ` [dpdk-dev] [dpdk-stable] " David Marchand 2019-11-27 20:11 ` David Marchand 0 siblings, 2 replies; 7+ messages in thread From: Van Haaren, Harry @ 2019-11-27 14:16 UTC (permalink / raw) To: Aaron Conole; +Cc: dev, stable > -----Original Message----- > From: Aaron Conole <aconole@redhat.com> > Sent: Wednesday, November 27, 2019 2:10 PM > To: Van Haaren, Harry <harry.van.haaren@intel.com> > Cc: dev@dpdk.org > Subject: Re: [PATCH] test/service: fix wait for service core > > Harry van Haaren <harry.van.haaren@intel.com> writes: > > > This commit fixes a sporadic failure of the service_autotest > > unit test, as seen in the DPDK CI. The failure occurs as the main test > > thread did not wait on the service-thread to return, and allowing it > > to read a flag before the service was able to write to it. > > > > The fix changes the wait API call to specific the service-core ID, > > and this waits for cores with both ROLE_RTE and ROLE_SERVICE. > > > > The rte_eal_mp_wait_lcore() call does not (and should not) wait > > for service cores, so must not be used to wait on service-cores. > > > > Fixes: f038a81e1c56 ("service: add unit tests") > > > > Reported-by: Aaron Conole <aconole@redhat.com> > > Signed-off-by: Harry van Haaren <harry.van.haaren@intel.com> > > > > --- > > It might also be good to document this behavior in the API area. It's > unclear that the lcore wait function which takes a core id will work, > but the broad wait will not. Yes agreed that docs can improve here - different patch. > > Given this is a fix in the unit test, and not a functional change > > I'm not sure its worth backporting to LTS / stable releases? > > I've not added stable on CC yet. > > I think it's worth it if the LTS / stable branches use the unit tests > (otherwise, they will observe sporadic failures). Ok, I've added stable@dpdk.org on CC now > > app/test/test_service_cores.c | 2 +- > > 1 file changed, 1 insertion(+), 1 deletion(-) > > > > diff --git a/app/test/test_service_cores.c b/app/test/test_service_cores.c > > index 9fe38f5e0..a922c7ddc 100644 > > --- a/app/test/test_service_cores.c > > +++ b/app/test/test_service_cores.c > > @@ -483,7 +483,7 @@ service_lcore_en_dis_able(void) > > int ret = rte_eal_remote_launch(service_remote_launch_func, NULL, > > slcore_id); > > TEST_ASSERT_EQUAL(0, ret, "Ex-service core remote launch failed."); > > - rte_eal_mp_wait_lcore(); > > + rte_eal_wait_lcore(slcore_id); > > TEST_ASSERT_EQUAL(1, service_remote_launch_flag, > > "Ex-service core function call had no effect."); > > Should we also have some change like the following (just a guess): > > diff --git a/app/test/test_service_cores.c b/app/test/test_service_cores.c > index 9fe38f5e08..695c35ac6c 100644 > --- a/app/test/test_service_cores.c > +++ b/app/test/test_service_cores.c > @@ -773,7 +773,7 @@ service_app_lcore_poll_impl(const int mt_safe) > > /* flag done, then wait for the spawned 2nd core to return */ > params[0] = 1; > - rte_eal_mp_wait_lcore(); > + rte_eal_wait_lcore(app_core2); > > /* core two gets launched first - and should hold the service lock */ > TEST_ASSERT_EQUAL(0, app_core2_ret, I reviewed this usage of the function, and I believe it waits on application cores (aka, ROLE_RTE, not ROLE_SERVICE). Hence this usage is actually correct. Please review and double check my logic though - more eyes is good. ^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: [dpdk-dev] [dpdk-stable] [PATCH] test/service: fix wait for service core 2019-11-27 14:16 ` Van Haaren, Harry @ 2019-11-27 18:11 ` David Marchand 2019-11-27 19:10 ` Aaron Conole 2019-11-27 20:11 ` David Marchand 1 sibling, 1 reply; 7+ messages in thread From: David Marchand @ 2019-11-27 18:11 UTC (permalink / raw) To: Van Haaren, Harry, Aaron Conole; +Cc: dev, stable On Wed, Nov 27, 2019 at 3:16 PM Van Haaren, Harry <harry.van.haaren@intel.com> wrote: > > > -----Original Message----- > > From: Aaron Conole <aconole@redhat.com> > > Sent: Wednesday, November 27, 2019 2:10 PM > > To: Van Haaren, Harry <harry.van.haaren@intel.com> > > Cc: dev@dpdk.org > > Subject: Re: [PATCH] test/service: fix wait for service core > > > > Harry van Haaren <harry.van.haaren@intel.com> writes: > > > > > This commit fixes a sporadic failure of the service_autotest > > > unit test, as seen in the DPDK CI. The failure occurs as the main test > > > thread did not wait on the service-thread to return, and allowing it > > > to read a flag before the service was able to write to it. > > > > > > The fix changes the wait API call to specific the service-core ID, > > > and this waits for cores with both ROLE_RTE and ROLE_SERVICE. > > > > > > The rte_eal_mp_wait_lcore() call does not (and should not) wait > > > for service cores, so must not be used to wait on service-cores. > > > > > > Fixes: f038a81e1c56 ("service: add unit tests") > > > > > > Reported-by: Aaron Conole <aconole@redhat.com> > > > Signed-off-by: Harry van Haaren <harry.van.haaren@intel.com> > > > > > > --- > > > > It might also be good to document this behavior in the API area. It's > > unclear that the lcore wait function which takes a core id will work, > > but the broad wait will not. > > Yes agreed that docs can improve here - different patch. > > > > > Given this is a fix in the unit test, and not a functional change > > > I'm not sure its worth backporting to LTS / stable releases? > > > I've not added stable on CC yet. > > > > I think it's worth it if the LTS / stable branches use the unit tests > > (otherwise, they will observe sporadic failures). > > Ok, I've added stable@dpdk.org on CC now > > > > > app/test/test_service_cores.c | 2 +- > > > 1 file changed, 1 insertion(+), 1 deletion(-) > > > > > > diff --git a/app/test/test_service_cores.c b/app/test/test_service_cores.c > > > index 9fe38f5e0..a922c7ddc 100644 > > > --- a/app/test/test_service_cores.c > > > +++ b/app/test/test_service_cores.c > > > @@ -483,7 +483,7 @@ service_lcore_en_dis_able(void) > > > int ret = rte_eal_remote_launch(service_remote_launch_func, NULL, > > > slcore_id); > > > TEST_ASSERT_EQUAL(0, ret, "Ex-service core remote launch failed."); > > > - rte_eal_mp_wait_lcore(); > > > + rte_eal_wait_lcore(slcore_id); > > > TEST_ASSERT_EQUAL(1, service_remote_launch_flag, > > > "Ex-service core function call had no effect."); > > > > Should we also have some change like the following (just a guess): > > > > diff --git a/app/test/test_service_cores.c b/app/test/test_service_cores.c > > index 9fe38f5e08..695c35ac6c 100644 > > --- a/app/test/test_service_cores.c > > +++ b/app/test/test_service_cores.c > > @@ -773,7 +773,7 @@ service_app_lcore_poll_impl(const int mt_safe) > > > > /* flag done, then wait for the spawned 2nd core to return */ > > params[0] = 1; > > - rte_eal_mp_wait_lcore(); > > + rte_eal_wait_lcore(app_core2); > > > > /* core two gets launched first - and should hold the service lock */ > > TEST_ASSERT_EQUAL(0, app_core2_ret, > > > I reviewed this usage of the function, and I believe it waits on application > cores (aka, ROLE_RTE, not ROLE_SERVICE). Hence this usage is actually correct. > Please review and double check my logic though - more eyes is good. > I will check it later tonight but I am for taking this in 19.11 if we can get more stable tests. Aaron, do you have an objection? -- David Marchand ^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: [dpdk-dev] [dpdk-stable] [PATCH] test/service: fix wait for service core 2019-11-27 18:11 ` [dpdk-dev] [dpdk-stable] " David Marchand @ 2019-11-27 19:10 ` Aaron Conole 0 siblings, 0 replies; 7+ messages in thread From: Aaron Conole @ 2019-11-27 19:10 UTC (permalink / raw) To: David Marchand; +Cc: Van Haaren, Harry, dev, stable David Marchand <david.marchand@redhat.com> writes: > On Wed, Nov 27, 2019 at 3:16 PM Van Haaren, Harry > <harry.van.haaren@intel.com> wrote: >> >> > -----Original Message----- >> > From: Aaron Conole <aconole@redhat.com> >> > Sent: Wednesday, November 27, 2019 2:10 PM >> > To: Van Haaren, Harry <harry.van.haaren@intel.com> >> > Cc: dev@dpdk.org >> > Subject: Re: [PATCH] test/service: fix wait for service core >> > >> > Harry van Haaren <harry.van.haaren@intel.com> writes: >> > >> > > This commit fixes a sporadic failure of the service_autotest >> > > unit test, as seen in the DPDK CI. The failure occurs as the main test >> > > thread did not wait on the service-thread to return, and allowing it >> > > to read a flag before the service was able to write to it. >> > > >> > > The fix changes the wait API call to specific the service-core ID, >> > > and this waits for cores with both ROLE_RTE and ROLE_SERVICE. >> > > >> > > The rte_eal_mp_wait_lcore() call does not (and should not) wait >> > > for service cores, so must not be used to wait on service-cores. >> > > >> > > Fixes: f038a81e1c56 ("service: add unit tests") >> > > >> > > Reported-by: Aaron Conole <aconole@redhat.com> >> > > Signed-off-by: Harry van Haaren <harry.van.haaren@intel.com> >> > > >> > > --- >> > >> > It might also be good to document this behavior in the API area. It's >> > unclear that the lcore wait function which takes a core id will work, >> > but the broad wait will not. >> >> Yes agreed that docs can improve here - different patch. >> >> >> > > Given this is a fix in the unit test, and not a functional change >> > > I'm not sure its worth backporting to LTS / stable releases? >> > > I've not added stable on CC yet. >> > >> > I think it's worth it if the LTS / stable branches use the unit tests >> > (otherwise, they will observe sporadic failures). >> >> Ok, I've added stable@dpdk.org on CC now >> >> >> > > app/test/test_service_cores.c | 2 +- >> > > 1 file changed, 1 insertion(+), 1 deletion(-) >> > > >> > > diff --git a/app/test/test_service_cores.c b/app/test/test_service_cores.c >> > > index 9fe38f5e0..a922c7ddc 100644 >> > > --- a/app/test/test_service_cores.c >> > > +++ b/app/test/test_service_cores.c >> > > @@ -483,7 +483,7 @@ service_lcore_en_dis_able(void) >> > > int ret = rte_eal_remote_launch(service_remote_launch_func, NULL, >> > > slcore_id); >> > > TEST_ASSERT_EQUAL(0, ret, "Ex-service core remote launch failed."); >> > > - rte_eal_mp_wait_lcore(); >> > > + rte_eal_wait_lcore(slcore_id); >> > > TEST_ASSERT_EQUAL(1, service_remote_launch_flag, >> > > "Ex-service core function call had no effect."); >> > >> > Should we also have some change like the following (just a guess): >> > >> > diff --git a/app/test/test_service_cores.c b/app/test/test_service_cores.c >> > index 9fe38f5e08..695c35ac6c 100644 >> > --- a/app/test/test_service_cores.c >> > +++ b/app/test/test_service_cores.c >> > @@ -773,7 +773,7 @@ service_app_lcore_poll_impl(const int mt_safe) >> > >> > /* flag done, then wait for the spawned 2nd core to return */ >> > params[0] = 1; >> > - rte_eal_mp_wait_lcore(); >> > + rte_eal_wait_lcore(app_core2); >> > >> > /* core two gets launched first - and should hold the service lock */ >> > TEST_ASSERT_EQUAL(0, app_core2_ret, >> >> >> I reviewed this usage of the function, and I believe it waits on application >> cores (aka, ROLE_RTE, not ROLE_SERVICE). Hence this usage is actually correct. >> Please review and double check my logic though - more eyes is good. >> > > I will check it later tonight but I am for taking this in 19.11 if we > can get more stable tests. > Aaron, do you have an objection? No objection ^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: [dpdk-dev] [dpdk-stable] [PATCH] test/service: fix wait for service core 2019-11-27 14:16 ` Van Haaren, Harry 2019-11-27 18:11 ` [dpdk-dev] [dpdk-stable] " David Marchand @ 2019-11-27 20:11 ` David Marchand 1 sibling, 0 replies; 7+ messages in thread From: David Marchand @ 2019-11-27 20:11 UTC (permalink / raw) To: Van Haaren, Harry; +Cc: Aaron Conole, dev, stable On Wed, Nov 27, 2019 at 3:16 PM Van Haaren, Harry <harry.van.haaren@intel.com> wrote: > > > -----Original Message----- > > From: Aaron Conole <aconole@redhat.com> > > Sent: Wednesday, November 27, 2019 2:10 PM > > To: Van Haaren, Harry <harry.van.haaren@intel.com> > > Cc: dev@dpdk.org > > Subject: Re: [PATCH] test/service: fix wait for service core > > > > Harry van Haaren <harry.van.haaren@intel.com> writes: > > > > > This commit fixes a sporadic failure of the service_autotest > > > unit test, as seen in the DPDK CI. The failure occurs as the main test > > > thread did not wait on the service-thread to return, and allowing it > > > to read a flag before the service was able to write to it. > > > > > > The fix changes the wait API call to specific the service-core ID, > > > and this waits for cores with both ROLE_RTE and ROLE_SERVICE. > > > > > > The rte_eal_mp_wait_lcore() call does not (and should not) wait > > > for service cores, so must not be used to wait on service-cores. > > > > > > Fixes: f038a81e1c56 ("service: add unit tests") > > > > > > Reported-by: Aaron Conole <aconole@redhat.com> > > > Signed-off-by: Harry van Haaren <harry.van.haaren@intel.com> > > > > > > --- > > > > It might also be good to document this behavior in the API area. It's > > unclear that the lcore wait function which takes a core id will work, > > but the broad wait will not. > > Yes agreed that docs can improve here - different patch. > > > > > Given this is a fix in the unit test, and not a functional change > > > I'm not sure its worth backporting to LTS / stable releases? > > > I've not added stable on CC yet. > > > > I think it's worth it if the LTS / stable branches use the unit tests > > (otherwise, they will observe sporadic failures). > > Ok, I've added stable@dpdk.org on CC now > > > > > app/test/test_service_cores.c | 2 +- > > > 1 file changed, 1 insertion(+), 1 deletion(-) > > > > > > diff --git a/app/test/test_service_cores.c b/app/test/test_service_cores.c > > > index 9fe38f5e0..a922c7ddc 100644 > > > --- a/app/test/test_service_cores.c > > > +++ b/app/test/test_service_cores.c > > > @@ -483,7 +483,7 @@ service_lcore_en_dis_able(void) > > > int ret = rte_eal_remote_launch(service_remote_launch_func, NULL, > > > slcore_id); > > > TEST_ASSERT_EQUAL(0, ret, "Ex-service core remote launch failed."); > > > - rte_eal_mp_wait_lcore(); > > > + rte_eal_wait_lcore(slcore_id); > > > TEST_ASSERT_EQUAL(1, service_remote_launch_flag, > > > "Ex-service core function call had no effect."); > > > > Should we also have some change like the following (just a guess): > > > > diff --git a/app/test/test_service_cores.c b/app/test/test_service_cores.c > > index 9fe38f5e08..695c35ac6c 100644 > > --- a/app/test/test_service_cores.c > > +++ b/app/test/test_service_cores.c > > @@ -773,7 +773,7 @@ service_app_lcore_poll_impl(const int mt_safe) > > > > /* flag done, then wait for the spawned 2nd core to return */ > > params[0] = 1; > > - rte_eal_mp_wait_lcore(); > > + rte_eal_wait_lcore(app_core2); > > > > /* core two gets launched first - and should hold the service lock */ > > TEST_ASSERT_EQUAL(0, app_core2_ret, > > > I reviewed this usage of the function, and I believe it waits on application > cores (aka, ROLE_RTE, not ROLE_SERVICE). Hence this usage is actually correct. > Please review and double check my logic though - more eyes is good. It seems to be the case, yes. My overall feeling is that the services stuff is a giant hack, so better documentation will prove me wrong :-). As I said I am for taking this change in 19.11 now, as it only impacts this test and it seems to solve the random failures. Acked-by: David Marchand <david.marchand@redhat.com> -- David Marchand ^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: [dpdk-dev] [PATCH] test/service: fix wait for service core 2019-11-27 13:20 [dpdk-dev] [PATCH] test/service: fix wait for service core Harry van Haaren 2019-11-27 14:10 ` Aaron Conole @ 2019-11-27 21:38 ` David Marchand 1 sibling, 0 replies; 7+ messages in thread From: David Marchand @ 2019-11-27 21:38 UTC (permalink / raw) To: Harry van Haaren; +Cc: dev, Aaron Conole, dpdk stable On Wed, Nov 27, 2019 at 2:20 PM Harry van Haaren <harry.van.haaren@intel.com> wrote: > > This commit fixes a sporadic failure of the service_autotest > unit test, as seen in the DPDK CI. The failure occurs as the main test > thread did not wait on the service-thread to return, and allowing it > to read a flag before the service was able to write to it. > > The fix changes the wait API call to specific the service-core ID, > and this waits for cores with both ROLE_RTE and ROLE_SERVICE. > > The rte_eal_mp_wait_lcore() call does not (and should not) wait > for service cores, so must not be used to wait on service-cores. > > Fixes: f038a81e1c56 ("service: add unit tests") Cc: stable@dpdk.org > > Reported-by: Aaron Conole <aconole@redhat.com> > Signed-off-by: Harry van Haaren <harry.van.haaren@intel.com> Acked-by: David Marchand <david.marchand@redhat.com> Before this patch, reproduced the pb in less than 2 minutes with: # time (log=/tmp/$$.log; while true; do echo service_autotest |taskset -c 0-1 build-gcc-static/app/test/dpdk-test --log-level *:8 -l 0-1 >$log 2>&1; grep -q 'Test OK' $log || break; done; cat $log; rm -f $log) With the patch, this loop has been running for 40 minutes. Applied, thanks. -- David Marchand ^ permalink raw reply [flat|nested] 7+ messages in thread
end of thread, other threads:[~2019-11-27 21:39 UTC | newest] Thread overview: 7+ messages (download: mbox.gz / follow: Atom feed) -- links below jump to the message on this page -- 2019-11-27 13:20 [dpdk-dev] [PATCH] test/service: fix wait for service core Harry van Haaren 2019-11-27 14:10 ` Aaron Conole 2019-11-27 14:16 ` Van Haaren, Harry 2019-11-27 18:11 ` [dpdk-dev] [dpdk-stable] " David Marchand 2019-11-27 19:10 ` Aaron Conole 2019-11-27 20:11 ` David Marchand 2019-11-27 21:38 ` [dpdk-dev] " David Marchand
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for NNTP newsgroup(s).