DPDK patches and discussions
From: Aaron Conole <aconole@redhat.com>
To: Ruifeng Wang <ruifeng.wang@arm.com>
Cc: david.hunt@intel.com, dev@dpdk.org, hkalra@marvell.com,
	gavin.hu@arm.com, honnappa.nagarahalli@arm.com, nd@arm.com,
	stable@dpdk.org
Subject: Re: [dpdk-dev] [PATCH] lib/distributor: fix deadlock issue for aarch64
Date: Tue, 08 Oct 2019 13:05:42 -0400
Message-ID: <f7tk19f182h.fsf@dhcp-25.97.bos.redhat.com>
In-Reply-To: <20191008095524.1585-1-ruifeng.wang@arm.com> (Ruifeng Wang's message of "Tue, 8 Oct 2019 17:55:24 +0800")

Ruifeng Wang <ruifeng.wang@arm.com> writes:

> Distributor and worker threads rely on data structs in a cache line
> for synchronization. The shared data structs were not protected.
> This caused a deadlock on platforms with weaker memory ordering,
> such as aarch64.
> Fix this issue by adding memory barriers to ensure synchronization
> among cores.
>
> Bugzilla ID: 342
> Fixes: 775003ad2f96 ("distributor: add new burst-capable library")
> Cc: stable@dpdk.org
>
> Signed-off-by: Ruifeng Wang <ruifeng.wang@arm.com>
> Reviewed-by: Gavin Hu <gavin.hu@arm.com>
> ---

I see a failure in the distributor_autotest (on one of the builds):

64/82 DPDK:fast-tests / distributor_autotest  FAIL     0.37 s (exit status 255 or signal 127 SIGinvalid)

--- command ---
DPDK_TEST='distributor_autotest' /home/travis/build/ovsrobot/dpdk/build/app/test/dpdk-test -l 0-1 --file-prefix=distributor_autotest

--- stdout ---
EAL: Probing VFIO support...
APP: HPET is not enabled, using TSC as default timer
RTE>>distributor_autotest
=== Basic distributor sanity tests ===
Worker 0 handled 32 packets
Sanity test with all zero hashes done.
Worker 0 handled 32 packets
Sanity test with non-zero hashes done
=== testing big burst (single) ===
Sanity test of returned packets done
=== Sanity test with mbuf alloc/free (single) ===
Sanity test with mbuf alloc/free passed
Too few cores to run worker shutdown test
=== Basic distributor sanity tests ===
Worker 0 handled 32 packets
Sanity test with all zero hashes done.
Worker 0 handled 32 packets
Sanity test with non-zero hashes done
=== testing big burst (burst) ===
Sanity test of returned packets done
=== Sanity test with mbuf alloc/free (burst) ===
Line 326: Packet count is incorrect, 1048568, expected 1048576
Test Failed
RTE>>

--- stderr ---
EAL: Detected 2 lcore(s)
EAL: Detected 1 NUMA nodes
EAL: Multi-process socket /var/run/dpdk/distributor_autotest/mp_socket
EAL: Selected IOVA mode 'PA'
EAL: No available hugepages reported in hugepages-1048576kB
-------

Not sure how to help debug further.  I'll restart the job to see if
it clears up, but I guess there may be a delicate synchronization
issue somewhere that needs to be accounted for.
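
For anyone trying to reason about the ordering, here is a minimal
standalone sketch of the kind of single-word handshake the patch is
protecting. It is not the library code: GET_BUF, FLAG_BITS, struct demo
and run_handshake() are hypothetical stand-ins chosen for illustration,
and only the __atomic_load_n / __atomic_store_n pairing with
__ATOMIC_ACQUIRE / __ATOMIC_RELEASE mirrors what the patch uses.

```c
#include <pthread.h>
#include <sched.h>
#include <stdint.h>

#define FLAG_BITS 4     /* hypothetical stand-in for RTE_DISTRIB_FLAG_BITS */
#define GET_BUF   1ULL  /* hypothetical stand-in for RTE_DISTRIB_GET_BUF */

struct demo {
	uint64_t word;      /* shared handshake word: payload << FLAG_BITS | flags */
	uint64_t rounds;    /* number of request/response exchanges */
	uint64_t received;  /* sum of payloads, touched by the worker only */
};

/* Distributor side: wait for a request, answer with payload i. */
static void *distributor(void *arg)
{
	struct demo *d = arg;

	for (uint64_t i = 1; i <= d->rounds; i++) {
		/* Acquire-load until the worker has set GET_BUF. */
		while (!(__atomic_load_n(&d->word, __ATOMIC_ACQUIRE) & GET_BUF))
			sched_yield();
		/* Publish the payload; this store also clears GET_BUF. */
		__atomic_store_n(&d->word, i << FLAG_BITS, __ATOMIC_RELEASE);
	}
	return NULL;
}

/* Worker side: request a payload, then poll until it arrives. */
static void *worker(void *arg)
{
	struct demo *d = arg;

	for (uint64_t i = 0; i < d->rounds; i++) {
		uint64_t w;

		/* Request: set GET_BUF with release semantics. */
		__atomic_store_n(&d->word, GET_BUF, __ATOMIC_RELEASE);
		/* Poll: acquire-load until GET_BUF has been cleared. */
		while ((w = __atomic_load_n(&d->word, __ATOMIC_ACQUIRE)) & GET_BUF)
			sched_yield();
		d->received += w >> FLAG_BITS;
	}
	return NULL;
}

/* Runs the exchange and returns the sum of payloads the worker saw. */
uint64_t run_handshake(uint64_t rounds)
{
	struct demo d = { .word = 0, .rounds = rounds, .received = 0 };
	pthread_t dist;

	if (pthread_create(&dist, NULL, distributor, &d) != 0)
		return 0;
	worker(&d);                 /* worker runs on the calling thread */
	pthread_join(dist, NULL);
	return d.received;
}
```

Calling run_handshake(1000) should return 500500 (the sum 1..1000) if
no exchange is lost. Every check of the flag here is an atomic load of
the shared word itself, and every update is a single release store, so
a lost or short count would point at an ordering problem in the
handshake rather than in the payload handling.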

>  lib/librte_distributor/rte_distributor.c     | 28 ++++++++++------
>  lib/librte_distributor/rte_distributor_v20.c | 34 +++++++++++++-------
>  2 files changed, 41 insertions(+), 21 deletions(-)
>
> diff --git a/lib/librte_distributor/rte_distributor.c b/lib/librte_distributor/rte_distributor.c
> index 21eb1fb0a..7bf96e224 100644
> --- a/lib/librte_distributor/rte_distributor.c
> +++ b/lib/librte_distributor/rte_distributor.c
> @@ -50,7 +50,8 @@ rte_distributor_request_pkt_v1705(struct rte_distributor *d,
>  
>  	retptr64 = &(buf->retptr64[0]);
>  	/* Spin while handshake bits are set (scheduler clears it) */
> -	while (unlikely(*retptr64 & RTE_DISTRIB_GET_BUF)) {
> +	while (unlikely(__atomic_load_n(retptr64, __ATOMIC_ACQUIRE)
> +			& RTE_DISTRIB_GET_BUF)) {
>  		rte_pause();
>  		uint64_t t = rte_rdtsc()+100;
>  
> @@ -76,7 +77,8 @@ rte_distributor_request_pkt_v1705(struct rte_distributor *d,
>  	 * Finally, set the GET_BUF  to signal to distributor that cache
>  	 * line is ready for processing
>  	 */
> -	*retptr64 |= RTE_DISTRIB_GET_BUF;
> +	__atomic_store_n(retptr64, *retptr64 | RTE_DISTRIB_GET_BUF,
> +			__ATOMIC_RELEASE);
>  }
>  BIND_DEFAULT_SYMBOL(rte_distributor_request_pkt, _v1705, 17.05);
>  MAP_STATIC_SYMBOL(void rte_distributor_request_pkt(struct rte_distributor *d,
> @@ -99,7 +101,8 @@ rte_distributor_poll_pkt_v1705(struct rte_distributor *d,
>  	}
>  
>  	/* If bit is set, return */
> -	if (buf->bufptr64[0] & RTE_DISTRIB_GET_BUF)
> +	if (__atomic_load_n(&(buf->bufptr64[0]), __ATOMIC_ACQUIRE)
> +		& RTE_DISTRIB_GET_BUF)
>  		return -1;
>  
>  	/* since bufptr64 is signed, this should be an arithmetic shift */
> @@ -116,6 +119,8 @@ rte_distributor_poll_pkt_v1705(struct rte_distributor *d,
>  	 * on the next cacheline while we're working.
>  	 */
>  	buf->bufptr64[0] |= RTE_DISTRIB_GET_BUF;
> +	__atomic_store_n(&(buf->bufptr64[0]),
> +		buf->bufptr64[0] | RTE_DISTRIB_GET_BUF, __ATOMIC_RELEASE);
>  
>  	return count;
>  }
> @@ -183,7 +188,8 @@ rte_distributor_return_pkt_v1705(struct rte_distributor *d,
>  			RTE_DISTRIB_FLAG_BITS) | RTE_DISTRIB_RETURN_BUF;
>  
>  	/* set the GET_BUF but even if we got no returns */
> -	buf->retptr64[0] |= RTE_DISTRIB_GET_BUF;
> +	__atomic_store_n(&(buf->retptr64[0]),
> +		buf->retptr64[0] | RTE_DISTRIB_GET_BUF, __ATOMIC_RELEASE);
>  
>  	return 0;
>  }
> @@ -273,7 +279,8 @@ handle_returns(struct rte_distributor *d, unsigned int wkr)
>  	unsigned int count = 0;
>  	unsigned int i;
>  
> -	if (buf->retptr64[0] & RTE_DISTRIB_GET_BUF) {
> +	if (__atomic_load_n(&(buf->retptr64[0]), __ATOMIC_ACQUIRE)
> +		& RTE_DISTRIB_GET_BUF) {
>  		for (i = 0; i < RTE_DIST_BURST_SIZE; i++) {
>  			if (buf->retptr64[i] & RTE_DISTRIB_RETURN_BUF) {
>  				oldbuf = ((uintptr_t)(buf->retptr64[i] >>
> @@ -287,7 +294,7 @@ handle_returns(struct rte_distributor *d, unsigned int wkr)
>  		d->returns.start = ret_start;
>  		d->returns.count = ret_count;
>  		/* Clear for the worker to populate with more returns */
> -		buf->retptr64[0] = 0;
> +		__atomic_store_n(&(buf->retptr64[0]), 0, __ATOMIC_RELEASE);
>  	}
>  	return count;
>  }
> @@ -307,7 +314,8 @@ release(struct rte_distributor *d, unsigned int wkr)
>  	struct rte_distributor_buffer *buf = &(d->bufs[wkr]);
>  	unsigned int i;
>  
> -	while (!(d->bufs[wkr].bufptr64[0] & RTE_DISTRIB_GET_BUF))
> +	while (!(__atomic_load_n(&(d->bufs[wkr].bufptr64[0]), __ATOMIC_ACQUIRE)
> +		& RTE_DISTRIB_GET_BUF))
>  		rte_pause();
>  
>  	handle_returns(d, wkr);
> @@ -328,7 +336,8 @@ release(struct rte_distributor *d, unsigned int wkr)
>  	d->backlog[wkr].count = 0;
>  
>  	/* Clear the GET bit */
> -	buf->bufptr64[0] &= ~RTE_DISTRIB_GET_BUF;
> +	__atomic_store_n(&(buf->bufptr64[0]),
> +		buf->bufptr64[0] & ~RTE_DISTRIB_GET_BUF, __ATOMIC_RELEASE);
>  	return  buf->count;
>  
>  }
> @@ -574,7 +583,8 @@ rte_distributor_clear_returns_v1705(struct rte_distributor *d)
>  
>  	/* throw away returns, so workers can exit */
>  	for (wkr = 0; wkr < d->num_workers; wkr++)
> -		d->bufs[wkr].retptr64[0] = 0;
> +		__atomic_store_n(&(d->bufs[wkr].retptr64[0]), 0,
> +				__ATOMIC_RELEASE);
>  }
>  BIND_DEFAULT_SYMBOL(rte_distributor_clear_returns, _v1705, 17.05);
>  MAP_STATIC_SYMBOL(void rte_distributor_clear_returns(struct rte_distributor *d),
> diff --git a/lib/librte_distributor/rte_distributor_v20.c b/lib/librte_distributor/rte_distributor_v20.c
> index cdc0969a8..3a5810c6d 100644
> --- a/lib/librte_distributor/rte_distributor_v20.c
> +++ b/lib/librte_distributor/rte_distributor_v20.c
> @@ -34,9 +34,10 @@ rte_distributor_request_pkt_v20(struct rte_distributor_v20 *d,
>  	union rte_distributor_buffer_v20 *buf = &d->bufs[worker_id];
>  	int64_t req = (((int64_t)(uintptr_t)oldpkt) << RTE_DISTRIB_FLAG_BITS)
>  			| RTE_DISTRIB_GET_BUF;
> -	while (unlikely(buf->bufptr64 & RTE_DISTRIB_FLAGS_MASK))
> +	while (unlikely(__atomic_load_n(&(buf->bufptr64), __ATOMIC_ACQUIRE)
> +		& RTE_DISTRIB_FLAGS_MASK))
>  		rte_pause();
> -	buf->bufptr64 = req;
> +	__atomic_store_n(&(buf->bufptr64), req, __ATOMIC_RELEASE);
>  }
>  VERSION_SYMBOL(rte_distributor_request_pkt, _v20, 2.0);
>  
> @@ -45,7 +46,8 @@ rte_distributor_poll_pkt_v20(struct rte_distributor_v20 *d,
>  		unsigned worker_id)
>  {
>  	union rte_distributor_buffer_v20 *buf = &d->bufs[worker_id];
> -	if (buf->bufptr64 & RTE_DISTRIB_GET_BUF)
> +	if (__atomic_load_n(&(buf->bufptr64), __ATOMIC_ACQUIRE)
> +		& RTE_DISTRIB_GET_BUF)
>  		return NULL;
>  
>  	/* since bufptr64 is signed, this should be an arithmetic shift */
> @@ -73,7 +75,7 @@ rte_distributor_return_pkt_v20(struct rte_distributor_v20 *d,
>  	union rte_distributor_buffer_v20 *buf = &d->bufs[worker_id];
>  	uint64_t req = (((int64_t)(uintptr_t)oldpkt) << RTE_DISTRIB_FLAG_BITS)
>  			| RTE_DISTRIB_RETURN_BUF;
> -	buf->bufptr64 = req;
> +	__atomic_store_n(&(buf->bufptr64), req, __ATOMIC_RELEASE);
>  	return 0;
>  }
>  VERSION_SYMBOL(rte_distributor_return_pkt, _v20, 2.0);
> @@ -117,7 +119,7 @@ handle_worker_shutdown(struct rte_distributor_v20 *d, unsigned int wkr)
>  {
>  	d->in_flight_tags[wkr] = 0;
>  	d->in_flight_bitmask &= ~(1UL << wkr);
> -	d->bufs[wkr].bufptr64 = 0;
> +	__atomic_store_n(&(d->bufs[wkr].bufptr64), 0, __ATOMIC_RELEASE);
>  	if (unlikely(d->backlog[wkr].count != 0)) {
>  		/* On return of a packet, we need to move the
>  		 * queued packets for this core elsewhere.
> @@ -165,13 +167,17 @@ process_returns(struct rte_distributor_v20 *d)
>  		const int64_t data = d->bufs[wkr].bufptr64;
>  		uintptr_t oldbuf = 0;
>  
> -		if (data & RTE_DISTRIB_GET_BUF) {
> +		if (__atomic_load_n(&data, __ATOMIC_ACQUIRE)
> +			& RTE_DISTRIB_GET_BUF) {
>  			flushed++;
>  			if (d->backlog[wkr].count)
> -				d->bufs[wkr].bufptr64 =
> -						backlog_pop(&d->backlog[wkr]);
> +				__atomic_store_n(&(d->bufs[wkr].bufptr64),
> +					backlog_pop(&d->backlog[wkr]),
> +					__ATOMIC_RELEASE);
>  			else {
> -				d->bufs[wkr].bufptr64 = RTE_DISTRIB_GET_BUF;
> +				__atomic_store_n(&(d->bufs[wkr].bufptr64),
> +					RTE_DISTRIB_GET_BUF,
> +					__ATOMIC_RELEASE);
>  				d->in_flight_tags[wkr] = 0;
>  				d->in_flight_bitmask &= ~(1UL << wkr);
>  			}
> @@ -251,7 +257,8 @@ rte_distributor_process_v20(struct rte_distributor_v20 *d,
>  			}
>  		}
>  
> -		if ((data & RTE_DISTRIB_GET_BUF) &&
> +		if ((__atomic_load_n(&data, __ATOMIC_ACQUIRE)
> +			& RTE_DISTRIB_GET_BUF) &&
>  				(d->backlog[wkr].count || next_mb)) {
>  
>  			if (d->backlog[wkr].count)
> @@ -280,13 +287,16 @@ rte_distributor_process_v20(struct rte_distributor_v20 *d,
>  	 * if they are ready */
>  	for (wkr = 0; wkr < d->num_workers; wkr++)
>  		if (d->backlog[wkr].count &&
> -				(d->bufs[wkr].bufptr64 & RTE_DISTRIB_GET_BUF)) {
> +				(__atomic_load_n(&(d->bufs[wkr].bufptr64),
> +				__ATOMIC_ACQUIRE) & RTE_DISTRIB_GET_BUF)) {
>  
>  			int64_t oldbuf = d->bufs[wkr].bufptr64 >>
>  					RTE_DISTRIB_FLAG_BITS;
>  			store_return(oldbuf, d, &ret_start, &ret_count);
>  
> -			d->bufs[wkr].bufptr64 = backlog_pop(&d->backlog[wkr]);
> +			__atomic_store_n(&(d->bufs[wkr].bufptr64),
> +				backlog_pop(&d->backlog[wkr]),
> +				__ATOMIC_RELEASE);
>  		}
>  
>  	d->returns.start = ret_start;

Thread overview: 23+ messages
2019-10-08  9:55 Ruifeng Wang
2019-10-08 12:53 ` Hunt, David
2019-10-08 17:05 ` Aaron Conole [this message]
2019-10-08 19:46   ` [dpdk-dev] [dpdk-stable] " David Marchand
2019-10-08 20:08     ` Aaron Conole
2019-10-09  5:52     ` Ruifeng Wang (Arm Technology China)
2019-10-17 11:42       ` [dpdk-dev] [EXT] " Harman Kalra
2019-10-17 13:48         ` Ruifeng Wang (Arm Technology China)
2019-10-12  2:43 ` [dpdk-dev] [PATCH v2 0/2] fix distributor unit test Ruifeng Wang
2019-10-12  2:43   ` [dpdk-dev] [PATCH v2 1/2] lib/distributor: fix deadlock issue for aarch64 Ruifeng Wang
2019-10-13  2:31     ` Honnappa Nagarahalli
2019-10-14 10:00       ` Ruifeng Wang (Arm Technology China)
2019-10-12  2:43   ` [dpdk-dev] [PATCH v2 2/2] test/distributor: fix false unit test failure Ruifeng Wang
2019-10-15  9:28 ` [dpdk-dev] [PATCH v3 0/2] fix distributor unit test Ruifeng Wang
2019-10-15  9:28   ` [dpdk-dev] [PATCH v3 1/2] lib/distributor: fix deadlock issue for aarch64 Ruifeng Wang
2019-10-25  8:13     ` Hunt, David
2019-10-15  9:28   ` [dpdk-dev] [PATCH v3 2/2] test/distributor: fix false unit test failure Ruifeng Wang
2019-10-25  8:13     ` Hunt, David
2019-10-24 19:31   ` [dpdk-dev] [PATCH v3 0/2] fix distributor unit test David Marchand
2019-10-25  8:11     ` Hunt, David
2019-10-25  8:18       ` David Marchand
2019-10-25  8:20         ` Hunt, David
2019-10-25  8:33   ` David Marchand
