DPDK patches and discussions
 help / color / mirror / Atom feed
From: Lukasz Wojciechowski <l.wojciechow@partner.samsung.com>
To: Harry van Haaren <harry.van.haaren@intel.com>, dev@dpdk.org
Cc: david.marchand@redhat.com, igor.romanov@oktetlabs.ru,
	honnappa.nagarahalli@arm.com, ferruh.yigit@intel.com, nd@arm.com,
	aconole@redhat.com
Subject: Re: [dpdk-dev] [PATCH] service: fix stop API to wait for service thread
Date: Mon, 20 Jul 2020 14:51:56 +0200	[thread overview]
Message-ID: <fbd9c539-24c1-8cd8-988f-56dd423f892d@partner.samsung.com> (raw)
In-Reply-To: <20200720120938.34660-1-harry.van.haaren@intel.com>


W dniu 20.07.2020 o 14:09, Harry van Haaren pisze:
> This commit improves the service_lcore_stop() implementation,
> waiting for the service core in question to return. The service
> thread itself now has a variable to indicate if its thread is
> active. When zero the service thread has completed its service,
> and has returned from the service_runner_func() function.
>
> This fixes a race condition observed in the DPDK CI, where the
> statistics of the service were not consistent with the expectation
> due to the service thread still running, and incrementing a stat
> after stop was called.
>
> Signed-off-by: Harry van Haaren <harry.van.haaren@intel.com>
>
> ---
>
> This is one possible solution, that avoids a class of race-conditions
> based on stop() api and following behaviours. Without a change in
> implementation of the service core thread, we could not detect when
> the thread was actually finished. This is now possible, and the stop
> api makes use of it to wait for 1000x one millisecond, or log a warning
> that a service core didn't return quickly.
>
> Thanks for the discussion/debug on list - I'm not sure how to add
> reported-by/suggested-by etc tags: but I'll resend a V2 (or can add
> on apply).
>
> ---
>   lib/librte_eal/common/rte_service.c | 24 ++++++++++++++++++++++++
>   1 file changed, 24 insertions(+)
>
> diff --git a/lib/librte_eal/common/rte_service.c b/lib/librte_eal/common/rte_service.c
> index 6a0e0ff65..d2255587d 100644
> --- a/lib/librte_eal/common/rte_service.c
> +++ b/lib/librte_eal/common/rte_service.c
> @@ -65,6 +65,7 @@ struct core_state {
>   	/* map of services IDs are run on this core */
>   	uint64_t service_mask;
>   	uint8_t runstate; /* running or stopped */
> +	uint8_t thread_active; /* indicates when the thread is in service_run() */
>   	uint8_t is_service_core; /* set if core is currently a service core */
>   	uint8_t service_active_on_lcore[RTE_SERVICE_NUM_MAX];
>   	uint64_t loops;
> @@ -457,6 +458,8 @@ service_runner_func(void *arg)
>   	const int lcore = rte_lcore_id();
>   	struct core_state *cs = &lcore_states[lcore];
>   
> +	__atomic_store_n(&cs->thread_active, 1, __ATOMIC_RELAXED);
> +
>   	/* runstate act as the guard variable. Use load-acquire
>   	 * memory order here to synchronize with store-release
>   	 * in runstate update functions.
> @@ -475,6 +478,7 @@ service_runner_func(void *arg)
>   		cs->loops++;
>   	}
>   
> +	__atomic_store_n(&cs->thread_active, 0, __ATOMIC_RELAXED);
>   	return 0;
>   }
>   
> @@ -765,6 +769,26 @@ rte_service_lcore_stop(uint32_t lcore)
>   	__atomic_store_n(&lcore_states[lcore].runstate, RUNSTATE_STOPPED,
>   		__ATOMIC_RELEASE);
>   
> +	/* wait for service lcore to return */
> +	i = 0;
> +	uint8_t active;
> +	uint64_t start = rte_rdtsc();
> +	do {
> +		active = __atomic_load_n(&lcore_states[lcore].thread_active,
> +					 __ATOMIC_RELAXED);
> +		if (active == 0)
> +			break;
> +		rte_delay_ms(1);
> +		i++;
> +	} while (i < 1000);
> +
> +	if (active != 0) {
> +		uint64_t end = rte_rdtsc();
> +		RTE_LOG(WARNING, EAL,
> +			"service lcore stop() failed, waited for %ld cycles\n",
> +			end - start);
> +	}
> +
>   	return 0;
>   }
>   
I don't like the idea of inserting this polling loop inside API call. 
And I don't like setting up a 1000 iterations constraint.
How about keeping the thread_active flag, but moving checking state of 
this flag to separate function. This way the user of the API would be 
able to write own loop.
Maybe he/she would like a custom loop, because:
* waiting for more cores
* would like to wait longer
* would like to check if service is finished less often...

-- 
Lukasz Wojciechowski
Principal Software Engineer

Samsung R&D Institute Poland
Samsung Electronics
Office +48 22 377 88 25
l.wojciechow@partner.samsung.com


  reply	other threads:[~2020-07-20 12:52 UTC|newest]

Thread overview: 31+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
     [not found] <CGME20200720120850eucas1p10debcafa273d244a7e63d225c50cc9df@eucas1p1.samsung.com>
2020-07-20 12:09 ` Harry van Haaren
2020-07-20 12:51   ` Lukasz Wojciechowski [this message]
2020-07-20 14:20     ` Van Haaren, Harry
2020-07-20 14:38   ` [dpdk-dev] [PATCH v2 1/2] service: add API to retrieve service core active Harry van Haaren
2020-07-20 14:38     ` [dpdk-dev] [PATCH v2 2/2] test/service: fix race condition on stopping lcore Harry van Haaren
2020-07-20 17:45       ` Lukasz Wojciechowski
2020-07-21  8:38       ` Phil Yang
2020-07-22 10:26         ` Van Haaren, Harry
2020-07-20 17:45     ` [dpdk-dev] [PATCH v2 1/2] service: add API to retrieve service core active Lukasz Wojciechowski
2020-07-21  7:47     ` Phil Yang
2020-07-21 19:43     ` Honnappa Nagarahalli
2020-07-21 19:50       ` David Marchand
2020-07-21 20:23         ` Honnappa Nagarahalli
2020-07-22 10:14           ` Van Haaren, Harry
2020-07-22 18:50             ` Honnappa Nagarahalli
2020-07-23 16:59               ` Van Haaren, Harry
2020-07-22 10:37     ` [dpdk-dev] [PATCH v3 " Harry van Haaren
2020-07-22 10:37       ` [dpdk-dev] [PATCH v3 2/2] test/service: fix race condition on stopping lcore Harry van Haaren
2020-07-22 21:40         ` Honnappa Nagarahalli
2020-07-22 21:39       ` [dpdk-dev] [PATCH v3 1/2] service: add API to retrieve service core active Honnappa Nagarahalli
2020-07-24 12:45       ` [dpdk-dev] [PATCH v4 " Harry van Haaren
2020-07-24 12:45         ` [dpdk-dev] [PATCH v4 2/2] test/service: fix race condition on stopping lcore Harry van Haaren
2020-07-24 13:45         ` [dpdk-dev] [PATCH v5 1/2] service: add API to retrieve service core active Harry van Haaren
2020-07-24 13:45           ` [dpdk-dev] [PATCH v5 2/2] test/service: fix race condition on stopping lcore Harry van Haaren
2020-09-14  8:36             ` David Marchand
2020-09-14 14:33               ` Van Haaren, Harry
2020-09-14 14:31           ` [dpdk-dev] [PATCH v6 1/2] service: add API to retrieve service core active Harry van Haaren
2020-09-14 14:31             ` [dpdk-dev] [PATCH v6 2/2] test/service: fix race condition on stopping lcore Harry van Haaren
2020-09-21 14:51               ` David Marchand
2020-10-13 19:45                 ` David Marchand
2020-10-15  8:11                   ` Van Haaren, Harry

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=fbd9c539-24c1-8cd8-988f-56dd423f892d@partner.samsung.com \
    --to=l.wojciechow@partner.samsung.com \
    --cc=aconole@redhat.com \
    --cc=david.marchand@redhat.com \
    --cc=dev@dpdk.org \
    --cc=ferruh.yigit@intel.com \
    --cc=harry.van.haaren@intel.com \
    --cc=honnappa.nagarahalli@arm.com \
    --cc=igor.romanov@oktetlabs.ru \
    --cc=nd@arm.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).