DPDK patches and discussions
 help / color / mirror / Atom feed
* [dpdk-dev] [PATCH] test/service: fix race in attr check
@ 2021-10-11 14:54 David Marchand
  2021-10-11 15:06 ` Aaron Conole
                   ` (2 more replies)
  0 siblings, 3 replies; 4+ messages in thread
From: David Marchand @ 2021-10-11 14:54 UTC (permalink / raw)
  To: dev; +Cc: aconole, stable, Harry van Haaren, Kevin Laatz

The CI reported rare (and cryptic) failures like:

RTE>>service_autotest
 + ------------------------------------------------------- +
 + Test Suite : service core test suite
 + ------------------------------------------------------- +
 + TestCase [ 0] : unregister_all succeeded
 + TestCase [ 1] : service_name succeeded
 + TestCase [ 2] : service_get_by_name succeeded
Service dummy_service Summary
  dummy_service: stats 1	calls 0	cycles 0	avg: 0
Service dummy_service Summary
  dummy_service: stats 0	calls 0	cycles 0	avg: 0
 + TestCase [ 3] : service_dump succeeded
 + TestCase [ 4] : service_attr_get failed
 + TestCase [ 5] : service_lcore_attr_get succeeded
 + TestCase [ 6] : service_probe_capability succeeded
 + TestCase [ 7] : service_start_stop succeeded
 + TestCase [ 8] : service_lcore_add_del succeeded
 + TestCase [ 9] : service_lcore_start_stop succeeded
 + TestCase [10] : service_lcore_en_dis_able succeeded
 + TestCase [11] : service_mt_unsafe_poll succeeded
 + TestCase [12] : service_mt_safe_poll succeeded
perf test for MT Safe: 42.7 cycles per call
 + TestCase [13] : service_app_lcore_mt_safe succeeded
perf test for MT Unsafe: 73.3 cycles per call
 + TestCase [14] : service_app_lcore_mt_unsafe succeeded
 + TestCase [15] : service_may_be_active succeeded
 + TestCase [16] : service_active_two_cores succeeded
 + ------------------------------------------------------- +
 + Test Suite Summary : service core test suite
 + ------------------------------------------------------- +
 + Tests Total :       17
 + Tests Skipped :      0
 + Tests Executed :    17
 + Tests Unsupported:   0
 + Tests Passed :      16
 + Tests Failed :       1
 + ------------------------------------------------------- +
Test Failed
RTE>>
stderr:
EAL: Detected CPU lcores: 16
EAL: Detected NUMA nodes: 2
EAL: Detected static linkage of DPDK
EAL: Multi-process socket /var/run/dpdk/service_autotest/mp_socket
EAL: Selected IOVA mode 'PA'
EAL: No available 1048576 kB hugepages reported
EAL: VFIO support initialized
EAL: Device 0000:03:00.0 is not NUMA-aware, defaulting socket to 0
APP: HPET is not enabled, using TSC as default timer
EAL: Test assert service_attr_get line 340 failed: attr_get() call didn't
 get call count (zero)

According to API, trying to stop a service lcore is not possible if this
lcore is the only one associated to a service.
Doing this will result in a -EBUSY return code from
rte_service_lcore_stop() which the service_attr_get subtest was not
checking.
This left the service lcore running, and a race existed with the main
lcore on checking the service attributes which triggered this CI
failure.

To fix this, dissociate the service lcore with current service.

Once fixed this first issue, a race still exists, because the
wait_slcore_inactive helper added in a previous fix was not
paired with a check that the service lcore _did_ stop.

Add missing check on rte_service_lcore_may_be_active.

Fixes: 4d55194d76a4 ("service: add attribute get function")
Fixes: 52bb6be259ff ("test/service: fix race condition on stopping lcore")
Cc: stable@dpdk.org

Signed-off-by: David Marchand <david.marchand@redhat.com>
---
 app/test/test_service_cores.c | 8 +++++++-
 1 file changed, 7 insertions(+), 1 deletion(-)

diff --git a/app/test/test_service_cores.c b/app/test/test_service_cores.c
index ece104054e..6821dca0e6 100644
--- a/app/test/test_service_cores.c
+++ b/app/test/test_service_cores.c
@@ -318,10 +318,16 @@ service_attr_get(void)
 	TEST_ASSERT_EQUAL(1, cycles_gt_zero,
 			"attr_get() failed to get cycles (expected > zero)");
 
-	rte_service_lcore_stop(slcore_id);
+	TEST_ASSERT_EQUAL(0, rte_service_map_lcore_set(id, slcore_id, 0),
+			"Disabling valid service and core failed");
+	TEST_ASSERT_EQUAL(0, rte_service_lcore_stop(slcore_id),
+			"Failed to stop service lcore");
 
 	wait_slcore_inactive(slcore_id);
 
+	TEST_ASSERT_EQUAL(0, rte_service_lcore_may_be_active(slcore_id),
+			  "Service lcore not stopped after waiting.");
+
 	TEST_ASSERT_EQUAL(0, rte_service_attr_get(id, attr_calls, &attr_value),
 			"Valid attr_get() call didn't return success");
 	TEST_ASSERT_EQUAL(1, (attr_value > 0),
-- 
2.23.0


^ permalink raw reply	[flat|nested] 4+ messages in thread

* Re: [dpdk-dev] [PATCH] test/service: fix race in attr check
  2021-10-11 14:54 [dpdk-dev] [PATCH] test/service: fix race in attr check David Marchand
@ 2021-10-11 15:06 ` Aaron Conole
  2021-10-12 11:52 ` Van Haaren, Harry
  2021-10-12 18:49 ` [dpdk-dev] [dpdk-stable] " David Marchand
  2 siblings, 0 replies; 4+ messages in thread
From: Aaron Conole @ 2021-10-11 15:06 UTC (permalink / raw)
  To: David Marchand; +Cc: dev, stable, Harry van Haaren, Kevin Laatz

David Marchand <david.marchand@redhat.com> writes:

> The CI reported rare (and cryptic) failures like:
>
> RTE>>service_autotest
>  + ------------------------------------------------------- +
>  + Test Suite : service core test suite
>  + ------------------------------------------------------- +
>  + TestCase [ 0] : unregister_all succeeded
>  + TestCase [ 1] : service_name succeeded
>  + TestCase [ 2] : service_get_by_name succeeded
> Service dummy_service Summary
>   dummy_service: stats 1	calls 0	cycles 0	avg: 0
> Service dummy_service Summary
>   dummy_service: stats 0	calls 0	cycles 0	avg: 0
>  + TestCase [ 3] : service_dump succeeded
>  + TestCase [ 4] : service_attr_get failed
>  + TestCase [ 5] : service_lcore_attr_get succeeded
>  + TestCase [ 6] : service_probe_capability succeeded
>  + TestCase [ 7] : service_start_stop succeeded
>  + TestCase [ 8] : service_lcore_add_del succeeded
>  + TestCase [ 9] : service_lcore_start_stop succeeded
>  + TestCase [10] : service_lcore_en_dis_able succeeded
>  + TestCase [11] : service_mt_unsafe_poll succeeded
>  + TestCase [12] : service_mt_safe_poll succeeded
> perf test for MT Safe: 42.7 cycles per call
>  + TestCase [13] : service_app_lcore_mt_safe succeeded
> perf test for MT Unsafe: 73.3 cycles per call
>  + TestCase [14] : service_app_lcore_mt_unsafe succeeded
>  + TestCase [15] : service_may_be_active succeeded
>  + TestCase [16] : service_active_two_cores succeeded
>  + ------------------------------------------------------- +
>  + Test Suite Summary : service core test suite
>  + ------------------------------------------------------- +
>  + Tests Total :       17
>  + Tests Skipped :      0
>  + Tests Executed :    17
>  + Tests Unsupported:   0
>  + Tests Passed :      16
>  + Tests Failed :       1
>  + ------------------------------------------------------- +
> Test Failed
> RTE>>
> stderr:
> EAL: Detected CPU lcores: 16
> EAL: Detected NUMA nodes: 2
> EAL: Detected static linkage of DPDK
> EAL: Multi-process socket /var/run/dpdk/service_autotest/mp_socket
> EAL: Selected IOVA mode 'PA'
> EAL: No available 1048576 kB hugepages reported
> EAL: VFIO support initialized
> EAL: Device 0000:03:00.0 is not NUMA-aware, defaulting socket to 0
> APP: HPET is not enabled, using TSC as default timer
> EAL: Test assert service_attr_get line 340 failed: attr_get() call didn't
>  get call count (zero)
>
> According to API, trying to stop a service lcore is not possible if this
> lcore is the only one associated to a service.
> Doing this will result in a -EBUSY return code from
> rte_service_lcore_stop() which the service_attr_get subtest was not
> checking.
> This left the service lcore running, and a race existed with the main
> lcore on checking the service attributes which triggered this CI
> failure.
>
> To fix this, dissociate the service lcore with current service.
>
> Once fixed this first issue, a race still exists, because the
> wait_slcore_inactive helper added in a previous fix was not
> paired with a check that the service lcore _did_ stop.
>
> Add missing check on rte_service_lcore_may_be_active.
>
> Fixes: 4d55194d76a4 ("service: add attribute get function")
> Fixes: 52bb6be259ff ("test/service: fix race condition on stopping lcore")
> Cc: stable@dpdk.org
>
> Signed-off-by: David Marchand <david.marchand@redhat.com>
> ---

Excellent catch.

Acked-by: Aaron Conole <aconole@redhat.com>


^ permalink raw reply	[flat|nested] 4+ messages in thread

* Re: [dpdk-dev] [PATCH] test/service: fix race in attr check
  2021-10-11 14:54 [dpdk-dev] [PATCH] test/service: fix race in attr check David Marchand
  2021-10-11 15:06 ` Aaron Conole
@ 2021-10-12 11:52 ` Van Haaren, Harry
  2021-10-12 18:49 ` [dpdk-dev] [dpdk-stable] " David Marchand
  2 siblings, 0 replies; 4+ messages in thread
From: Van Haaren, Harry @ 2021-10-12 11:52 UTC (permalink / raw)
  To: David Marchand, dev; +Cc: aconole, stable, Laatz, Kevin

> -----Original Message-----
> From: David Marchand <david.marchand@redhat.com>
> Sent: Monday, October 11, 2021 3:55 PM
> To: dev@dpdk.org
> Cc: aconole@redhat.com; stable@dpdk.org; Van Haaren, Harry
> <harry.van.haaren@intel.com>; Laatz, Kevin <kevin.laatz@intel.com>
> Subject: [PATCH] test/service: fix race in attr check
> 
> The CI reported rare (and cryptic) failures like:

<snip>

> Fixes: 4d55194d76a4 ("service: add attribute get function")
> Fixes: 52bb6be259ff ("test/service: fix race condition on stopping lcore")
> Cc: stable@dpdk.org
> 
> Signed-off-by: David Marchand <david.marchand@redhat.com>

Good detail in commit message, & good catch thanks;

Acked-by: Harry van Haaren <harry.van.haaren@intel.com>


^ permalink raw reply	[flat|nested] 4+ messages in thread

* Re: [dpdk-dev] [dpdk-stable] [PATCH] test/service: fix race in attr check
  2021-10-11 14:54 [dpdk-dev] [PATCH] test/service: fix race in attr check David Marchand
  2021-10-11 15:06 ` Aaron Conole
  2021-10-12 11:52 ` Van Haaren, Harry
@ 2021-10-12 18:49 ` David Marchand
  2 siblings, 0 replies; 4+ messages in thread
From: David Marchand @ 2021-10-12 18:49 UTC (permalink / raw)
  To: dev; +Cc: Aaron Conole, dpdk stable, Harry van Haaren, Kevin Laatz

On Mon, Oct 11, 2021 at 4:54 PM David Marchand
<david.marchand@redhat.com> wrote:
>
> The CI reported rare (and cryptic) failures like:
>
> RTE>>service_autotest
>  + ------------------------------------------------------- +
>  + Test Suite : service core test suite
>  + ------------------------------------------------------- +
>  + TestCase [ 0] : unregister_all succeeded
>  + TestCase [ 1] : service_name succeeded
>  + TestCase [ 2] : service_get_by_name succeeded
> Service dummy_service Summary
>   dummy_service: stats 1        calls 0 cycles 0        avg: 0
> Service dummy_service Summary
>   dummy_service: stats 0        calls 0 cycles 0        avg: 0
>  + TestCase [ 3] : service_dump succeeded
>  + TestCase [ 4] : service_attr_get failed
>  + TestCase [ 5] : service_lcore_attr_get succeeded
>  + TestCase [ 6] : service_probe_capability succeeded
>  + TestCase [ 7] : service_start_stop succeeded
>  + TestCase [ 8] : service_lcore_add_del succeeded
>  + TestCase [ 9] : service_lcore_start_stop succeeded
>  + TestCase [10] : service_lcore_en_dis_able succeeded
>  + TestCase [11] : service_mt_unsafe_poll succeeded
>  + TestCase [12] : service_mt_safe_poll succeeded
> perf test for MT Safe: 42.7 cycles per call
>  + TestCase [13] : service_app_lcore_mt_safe succeeded
> perf test for MT Unsafe: 73.3 cycles per call
>  + TestCase [14] : service_app_lcore_mt_unsafe succeeded
>  + TestCase [15] : service_may_be_active succeeded
>  + TestCase [16] : service_active_two_cores succeeded
>  + ------------------------------------------------------- +
>  + Test Suite Summary : service core test suite
>  + ------------------------------------------------------- +
>  + Tests Total :       17
>  + Tests Skipped :      0
>  + Tests Executed :    17
>  + Tests Unsupported:   0
>  + Tests Passed :      16
>  + Tests Failed :       1
>  + ------------------------------------------------------- +
> Test Failed
> RTE>>
> stderr:
> EAL: Detected CPU lcores: 16
> EAL: Detected NUMA nodes: 2
> EAL: Detected static linkage of DPDK
> EAL: Multi-process socket /var/run/dpdk/service_autotest/mp_socket
> EAL: Selected IOVA mode 'PA'
> EAL: No available 1048576 kB hugepages reported
> EAL: VFIO support initialized
> EAL: Device 0000:03:00.0 is not NUMA-aware, defaulting socket to 0
> APP: HPET is not enabled, using TSC as default timer
> EAL: Test assert service_attr_get line 340 failed: attr_get() call didn't
>  get call count (zero)
>
> According to API, trying to stop a service lcore is not possible if this
> lcore is the only one associated to a service.
> Doing this will result in a -EBUSY return code from
> rte_service_lcore_stop() which the service_attr_get subtest was not
> checking.
> This left the service lcore running, and a race existed with the main
> lcore on checking the service attributes which triggered this CI
> failure.
>
> To fix this, dissociate the service lcore with current service.
>
> Once fixed this first issue, a race still exists, because the
> wait_slcore_inactive helper added in a previous fix was not
> paired with a check that the service lcore _did_ stop.
>
> Add missing check on rte_service_lcore_may_be_active.
>
> Fixes: 4d55194d76a4 ("service: add attribute get function")
> Fixes: 52bb6be259ff ("test/service: fix race condition on stopping lcore")
> Cc: stable@dpdk.org
>
> Signed-off-by: David Marchand <david.marchand@redhat.com>
Acked-by: Aaron Conole <aconole@redhat.com>
Acked-by: Harry van Haaren <harry.van.haaren@intel.com>

Applied, thanks.


-- 
David Marchand


^ permalink raw reply	[flat|nested] 4+ messages in thread

end of thread, other threads:[~2021-10-12 18:49 UTC | newest]

Thread overview: 4+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-10-11 14:54 [dpdk-dev] [PATCH] test/service: fix race in attr check David Marchand
2021-10-11 15:06 ` Aaron Conole
2021-10-12 11:52 ` Van Haaren, Harry
2021-10-12 18:49 ` [dpdk-dev] [dpdk-stable] " David Marchand

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).