DPDK patches and discussions
 help / color / mirror / Atom feed
* [dpdk-dev] [PATCH v1] ring: enforce reading the tails before ring operations
@ 2019-03-06  3:07 gavin hu
       [not found] ` <CGME20190306114906eucas1p19c2572b1fe777e1eb0ca96d2e47295bd@eucas1p1.samsung.com>
                   ` (3 more replies)
  0 siblings, 4 replies; 29+ messages in thread
From: gavin hu @ 2019-03-06  3:07 UTC (permalink / raw)
  To: dev
  Cc: nd, thomas, jerinj, hemant.agrawal, nipun.gupta,
	Honnappa.Nagarahalli, gavin.hu, olivier.matz

In weak memory models, like arm64, reading the {prod,cons}.tail may get
reordered after reading or writing the ring slots, which corrupts the ring
and stale data is observed.
This issue was reported by NXP on 8-A72 DPAA2 board. The problem is most
likely caused by missing the acquire semantics when reading cons.tail (in
SP enqueue) or prod.tail (in SC dequeue) which makes it possible to read a
stale value from the ring slots.  There must be a read fence before writing
or reading the ring slots, rte_atomic32_cmpset() provides the same ordering
for MP (and MC) case. This patch is to enforce this ordering for SP (and
SC) case.

Signed-off-by: gavin hu <gavin.hu@arm.com>
Reviewed-by: Ola Liljedahl <Ola.Liljedahl@arm.com>
Tested-by: Nipun Gupta <nipun.gupta@nxp.com>
---
 lib/librte_ring/rte_ring_generic.h | 16 ++++++++++------
 1 file changed, 10 insertions(+), 6 deletions(-)

diff --git a/lib/librte_ring/rte_ring_generic.h b/lib/librte_ring/rte_ring_generic.h
index ea7dbe5..1bd3dfd 100644
--- a/lib/librte_ring/rte_ring_generic.h
+++ b/lib/librte_ring/rte_ring_generic.h
@@ -90,9 +90,11 @@ __rte_ring_move_prod_head(struct rte_ring *r, unsigned int is_sp,
 			return 0;
 
 		*new_head = *old_head + n;
-		if (is_sp)
-			r->prod.head = *new_head, success = 1;
-		else
+		if (is_sp) {
+			r->prod.head = *new_head;
+			rte_smp_rmb();
+			success = 1;
+		} else
 			success = rte_atomic32_cmpset(&r->prod.head,
 					*old_head, *new_head);
 	} while (unlikely(success == 0));
@@ -158,9 +160,11 @@ __rte_ring_move_cons_head(struct rte_ring *r, unsigned int is_sc,
 			return 0;
 
 		*new_head = *old_head + n;
-		if (is_sc)
-			r->cons.head = *new_head, success = 1;
-		else
+		if (is_sc) {
+			r->cons.head = *new_head;
+			rte_smp_rmb();
+			success = 1;
+		} else
 			success = rte_atomic32_cmpset(&r->cons.head, *old_head,
 					*new_head);
 	} while (unlikely(success == 0));
-- 
2.7.4

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [dpdk-dev] [v1] ring: enforce reading the tails before ring operations
       [not found] ` <CGME20190306114906eucas1p19c2572b1fe777e1eb0ca96d2e47295bd@eucas1p1.samsung.com>
@ 2019-03-06 11:49   ` Ilya Maximets
  2019-03-07  6:50     ` Gavin Hu (Arm Technology China)
  0 siblings, 1 reply; 29+ messages in thread
From: Ilya Maximets @ 2019-03-06 11:49 UTC (permalink / raw)
  To: gavin hu, dev
  Cc: nd, thomas, jerinj, hemant.agrawal, nipun.gupta,
	Honnappa.Nagarahalli, olivier.matz

On 06.03.2019 6:07, gavin hu wrote:
> In weak memory models, like arm64, reading the {prod,cons}.tail may get
> reordered after reading or writing the ring slots, which corrupts the ring
> and stale data is observed.
> This issue was reported by NXP on 8-A72 DPAA2 board. The problem is most
> likely caused by missing the acquire semantics when reading cons.tail (in
> SP enqueue) or prod.tail (in SC dequeue) which makes it possible to read a
> stale value from the ring slots.  There must be a read fence before writing

Sorry, but the phrase "There must be a read fence before writing" makes no sense.
Could you please rephrase or describe in details which reads you're trying to
keep in exact order?

> or reading the ring slots, rte_atomic32_cmpset() provides the same ordering
> for MP (and MC) case. This patch is to enforce this ordering for SP (and
> SC) case.
> 
> Signed-off-by: gavin hu <gavin.hu@arm.com>
> Reviewed-by: Ola Liljedahl <Ola.Liljedahl@arm.com>
> Tested-by: Nipun Gupta <nipun.gupta@nxp.com>
> ---
>  lib/librte_ring/rte_ring_generic.h | 16 ++++++++++------
>  1 file changed, 10 insertions(+), 6 deletions(-)
> 
> diff --git a/lib/librte_ring/rte_ring_generic.h b/lib/librte_ring/rte_ring_generic.h
> index ea7dbe5..1bd3dfd 100644
> --- a/lib/librte_ring/rte_ring_generic.h
> +++ b/lib/librte_ring/rte_ring_generic.h
> @@ -90,9 +90,11 @@ __rte_ring_move_prod_head(struct rte_ring *r, unsigned int is_sp,
>  			return 0;
>  
>  		*new_head = *old_head + n;
> -		if (is_sp)
> -			r->prod.head = *new_head, success = 1;
> -		else
> +		if (is_sp) {
> +			r->prod.head = *new_head;
> +			rte_smp_rmb();
> +			success = 1;
> +		} else
>  			success = rte_atomic32_cmpset(&r->prod.head,
>  					*old_head, *new_head);
>  	} while (unlikely(success == 0));
> @@ -158,9 +160,11 @@ __rte_ring_move_cons_head(struct rte_ring *r, unsigned int is_sc,
>  			return 0;
>  
>  		*new_head = *old_head + n;
> -		if (is_sc)
> -			r->cons.head = *new_head, success = 1;
> -		else
> +		if (is_sc) {
> +			r->cons.head = *new_head;
> +			rte_smp_rmb();
> +			success = 1;
> +		} else
>  			success = rte_atomic32_cmpset(&r->cons.head, *old_head,
>  					*new_head);
>  	} while (unlikely(success == 0));
> 

^ permalink raw reply	[flat|nested] 29+ messages in thread

* [dpdk-dev] [PATCH v2] ring: enforce reading the tails before ring operations
  2019-03-06  3:07 [dpdk-dev] [PATCH v1] ring: enforce reading the tails before ring operations gavin hu
       [not found] ` <CGME20190306114906eucas1p19c2572b1fe777e1eb0ca96d2e47295bd@eucas1p1.samsung.com>
@ 2019-03-07  6:45 ` gavin hu
  2019-03-07  8:52   ` Ilya Maximets
  2019-03-12 16:58 ` [dpdk-dev] [PATCH v3 0/1] ring: enforce reading the tail before reading ring slots Gavin Hu
  2019-03-12 16:58 ` [dpdk-dev] [PATCH v3 1/1] " Gavin Hu
  3 siblings, 1 reply; 29+ messages in thread
From: gavin hu @ 2019-03-07  6:45 UTC (permalink / raw)
  To: gavin.hu, dev
  Cc: nd, thomas, jerinj, hemant.agrawal, nipun.gupta,
	Honnappa.Nagarahalli, olivier.matz, i.maximets

In weak memory models, like arm64, reading the {prod,cons}.tail may get
reordered after reading or writing the ring slots, which corrupts the ring
and stale data is observed.

This issue was reported by NXP on 8-A72 DPAA2 board. The problem is most
likely caused by missing the acquire semantics when reading cons.tail (in
SP enqueue) or prod.tail (in SC dequeue) which makes it possible to read a
stale value from the ring slots.

For MP (and MC) case, rte_atomic32_cmpset() already provides the required
ordering. This patch is to prevent reading and writing the ring slots get
reordered before reading {prod,cons}.tail for SP (and SC) case.

Signed-off-by: gavin hu <gavin.hu@arm.com>
Reviewed-by: Ola Liljedahl <Ola.Liljedahl@arm.com>
Tested-by: Nipun Gupta <nipun.gupta@nxp.com>
---
 lib/librte_ring/rte_ring_generic.h | 16 ++++++++++------
 1 file changed, 10 insertions(+), 6 deletions(-)

diff --git a/lib/librte_ring/rte_ring_generic.h b/lib/librte_ring/rte_ring_generic.h
index ea7dbe5..1bd3dfd 100644
--- a/lib/librte_ring/rte_ring_generic.h
+++ b/lib/librte_ring/rte_ring_generic.h
@@ -90,9 +90,11 @@ __rte_ring_move_prod_head(struct rte_ring *r, unsigned int is_sp,
 			return 0;
 
 		*new_head = *old_head + n;
-		if (is_sp)
-			r->prod.head = *new_head, success = 1;
-		else
+		if (is_sp) {
+			r->prod.head = *new_head;
+			rte_smp_rmb();
+			success = 1;
+		} else
 			success = rte_atomic32_cmpset(&r->prod.head,
 					*old_head, *new_head);
 	} while (unlikely(success == 0));
@@ -158,9 +160,11 @@ __rte_ring_move_cons_head(struct rte_ring *r, unsigned int is_sc,
 			return 0;
 
 		*new_head = *old_head + n;
-		if (is_sc)
-			r->cons.head = *new_head, success = 1;
-		else
+		if (is_sc) {
+			r->cons.head = *new_head;
+			rte_smp_rmb();
+			success = 1;
+		} else
 			success = rte_atomic32_cmpset(&r->cons.head, *old_head,
 					*new_head);
 	} while (unlikely(success == 0));
-- 
2.7.4

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [dpdk-dev] [v1] ring: enforce reading the tails before ring operations
  2019-03-06 11:49   ` [dpdk-dev] [v1] " Ilya Maximets
@ 2019-03-07  6:50     ` Gavin Hu (Arm Technology China)
  0 siblings, 0 replies; 29+ messages in thread
From: Gavin Hu (Arm Technology China) @ 2019-03-07  6:50 UTC (permalink / raw)
  To: Ilya Maximets, dev
  Cc: nd, thomas, jerinj, hemant.agrawal, Nipun.gupta@nxp.com,
	Honnappa Nagarahalli, olivier.matz

Hi ilya,

Thanks for your comments, inline comments and new commit message in V2.

/Gavin

> -----Original Message-----
> From: Ilya Maximets <i.maximets@samsung.com>
> Sent: Wednesday, March 6, 2019 7:49 PM
> To: Gavin Hu (Arm Technology China) <Gavin.Hu@arm.com>;
> dev@dpdk.org
> Cc: nd <nd@arm.com>; thomas@monjalon.net; jerinj@marvell.com;
> hemant.agrawal@nxp.com; Nipun.gupta@nxp.com; Honnappa Nagarahalli
> <Honnappa.Nagarahalli@arm.com>; olivier.matz@6wind.com
> Subject: Re: [v1] ring: enforce reading the tails before ring operations
> 
> On 06.03.2019 6:07, gavin hu wrote:
> > In weak memory models, like arm64, reading the {prod,cons}.tail may get
> > reordered after reading or writing the ring slots, which corrupts the ring
> > and stale data is observed.
> > This issue was reported by NXP on 8-A72 DPAA2 board. The problem is
> most
> > likely caused by missing the acquire semantics when reading cons.tail (in
> > SP enqueue) or prod.tail (in SC dequeue) which makes it possible to read
> a
> > stale value from the ring slots.  There must be a read fence before
> writing
> 
> Sorry, but the phrase "There must be a read fence before writing" makes
> no sense.
> Could you please rephrase or describe in details which reads you're trying
> to
> keep in exact order?

This patch is to keep in exact order for reading the tails(to calculate the available
/free slots of ring when moving the heads) before reading or writing the ring slots.

I rephrased the commit message in V2, could you have a look? 

> > or reading the ring slots, rte_atomic32_cmpset() provides the same
> ordering
> > for MP (and MC) case. This patch is to enforce this ordering for SP (and
> > SC) case.
> >
> > Signed-off-by: gavin hu <gavin.hu@arm.com>
> > Reviewed-by: Ola Liljedahl <Ola.Liljedahl@arm.com>
> > Tested-by: Nipun Gupta <nipun.gupta@nxp.com>
> > ---
> >  lib/librte_ring/rte_ring_generic.h | 16 ++++++++++------
> >  1 file changed, 10 insertions(+), 6 deletions(-)
> >
> > diff --git a/lib/librte_ring/rte_ring_generic.h
> b/lib/librte_ring/rte_ring_generic.h
> > index ea7dbe5..1bd3dfd 100644
> > --- a/lib/librte_ring/rte_ring_generic.h
> > +++ b/lib/librte_ring/rte_ring_generic.h
> > @@ -90,9 +90,11 @@ __rte_ring_move_prod_head(struct rte_ring *r,
> unsigned int is_sp,
> >  			return 0;
> >
> >  		*new_head = *old_head + n;
> > -		if (is_sp)
> > -			r->prod.head = *new_head, success = 1;
> > -		else
> > +		if (is_sp) {
> > +			r->prod.head = *new_head;
> > +			rte_smp_rmb();
> > +			success = 1;
> > +		} else
> >  			success = rte_atomic32_cmpset(&r->prod.head,
> >  					*old_head, *new_head);
> >  	} while (unlikely(success == 0));
> > @@ -158,9 +160,11 @@ __rte_ring_move_cons_head(struct rte_ring *r,
> unsigned int is_sc,
> >  			return 0;
> >
> >  		*new_head = *old_head + n;
> > -		if (is_sc)
> > -			r->cons.head = *new_head, success = 1;
> > -		else
> > +		if (is_sc) {
> > +			r->cons.head = *new_head;
> > +			rte_smp_rmb();
> > +			success = 1;
> > +		} else
> >  			success = rte_atomic32_cmpset(&r->cons.head,
> *old_head,
> >  					*new_head);
> >  	} while (unlikely(success == 0));
> >

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [dpdk-dev] [PATCH v2] ring: enforce reading the tails before ring operations
  2019-03-07  6:45 ` [dpdk-dev] [PATCH v2] " gavin hu
@ 2019-03-07  8:52   ` Ilya Maximets
  2019-03-07  9:27     ` Gavin Hu (Arm Technology China)
  0 siblings, 1 reply; 29+ messages in thread
From: Ilya Maximets @ 2019-03-07  8:52 UTC (permalink / raw)
  To: gavin hu, dev
  Cc: nd, thomas, jerinj, hemant.agrawal, nipun.gupta,
	Honnappa.Nagarahalli, olivier.matz

On 07.03.2019 9:45, gavin hu wrote:
> In weak memory models, like arm64, reading the {prod,cons}.tail may get
> reordered after reading or writing the ring slots, which corrupts the ring
> and stale data is observed.
> 
> This issue was reported by NXP on 8-A72 DPAA2 board. The problem is most
> likely caused by missing the acquire semantics when reading cons.tail (in
> SP enqueue) or prod.tail (in SC dequeue) which makes it possible to read a
> stale value from the ring slots.
> 
> For MP (and MC) case, rte_atomic32_cmpset() already provides the required
> ordering. This patch is to prevent reading and writing the ring slots get
> reordered before reading {prod,cons}.tail for SP (and SC) case.

Read barrier rte_smp_rmb() is OK to prevent reading the ring get reordered
before reading the tail. However, to prevent *writing* the ring get reordered
*before reading* the tail you need a full memory barrier, i.e. rte_smp_mb().

> 
> Signed-off-by: gavin hu <gavin.hu@arm.com>
> Reviewed-by: Ola Liljedahl <Ola.Liljedahl@arm.com>
> Tested-by: Nipun Gupta <nipun.gupta@nxp.com>
> ---
>  lib/librte_ring/rte_ring_generic.h | 16 ++++++++++------
>  1 file changed, 10 insertions(+), 6 deletions(-)
> 
> diff --git a/lib/librte_ring/rte_ring_generic.h b/lib/librte_ring/rte_ring_generic.h
> index ea7dbe5..1bd3dfd 100644
> --- a/lib/librte_ring/rte_ring_generic.h
> +++ b/lib/librte_ring/rte_ring_generic.h
> @@ -90,9 +90,11 @@ __rte_ring_move_prod_head(struct rte_ring *r, unsigned int is_sp,
>  			return 0;
>  
>  		*new_head = *old_head + n;
> -		if (is_sp)
> -			r->prod.head = *new_head, success = 1;
> -		else
> +		if (is_sp) {
> +			r->prod.head = *new_head;
> +			rte_smp_rmb();
> +			success = 1;
> +		} else
>  			success = rte_atomic32_cmpset(&r->prod.head,
>  					*old_head, *new_head);
>  	} while (unlikely(success == 0));
> @@ -158,9 +160,11 @@ __rte_ring_move_cons_head(struct rte_ring *r, unsigned int is_sc,
>  			return 0;
>  
>  		*new_head = *old_head + n;
> -		if (is_sc)
> -			r->cons.head = *new_head, success = 1;
> -		else
> +		if (is_sc) {
> +			r->cons.head = *new_head;
> +			rte_smp_rmb();
> +			success = 1;
> +		} else
>  			success = rte_atomic32_cmpset(&r->cons.head, *old_head,
>  					*new_head);
>  	} while (unlikely(success == 0));
> 

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [dpdk-dev] [PATCH v2] ring: enforce reading the tails before ring operations
  2019-03-07  8:52   ` Ilya Maximets
@ 2019-03-07  9:27     ` Gavin Hu (Arm Technology China)
  2019-03-07  9:48       ` Ilya Maximets
  0 siblings, 1 reply; 29+ messages in thread
From: Gavin Hu (Arm Technology China) @ 2019-03-07  9:27 UTC (permalink / raw)
  To: Ilya Maximets, dev
  Cc: nd, thomas, jerinj, hemant.agrawal, Nipun.gupta@nxp.com,
	Honnappa Nagarahalli, olivier.matz



> -----Original Message-----
> From: Ilya Maximets <i.maximets@samsung.com>
> Sent: Thursday, March 7, 2019 4:52 PM
> To: Gavin Hu (Arm Technology China) <Gavin.Hu@arm.com>;
> dev@dpdk.org
> Cc: nd <nd@arm.com>; thomas@monjalon.net; jerinj@marvell.com;
> hemant.agrawal@nxp.com; Nipun.gupta@nxp.com; Honnappa Nagarahalli
> <Honnappa.Nagarahalli@arm.com>; olivier.matz@6wind.com
> Subject: Re: [PATCH v2] ring: enforce reading the tails before ring
> operations
> 
> On 07.03.2019 9:45, gavin hu wrote:
> > In weak memory models, like arm64, reading the {prod,cons}.tail may get
> > reordered after reading or writing the ring slots, which corrupts the ring
> > and stale data is observed.
> >
> > This issue was reported by NXP on 8-A72 DPAA2 board. The problem is
> most
> > likely caused by missing the acquire semantics when reading cons.tail (in
> > SP enqueue) or prod.tail (in SC dequeue) which makes it possible to read
> a
> > stale value from the ring slots.
> >
> > For MP (and MC) case, rte_atomic32_cmpset() already provides the
> required
> > ordering. This patch is to prevent reading and writing the ring slots get
> > reordered before reading {prod,cons}.tail for SP (and SC) case.
> 
> Read barrier rte_smp_rmb() is OK to prevent reading the ring get
> reordered
> before reading the tail. However, to prevent *writing* the ring get
> reordered
> *before reading* the tail you need a full memory barrier, i.e.
> rte_smp_mb().

ISHLD(rte_smp_rmb is DMB(ishld) orders LD/LD and LD/ST, while WMB(ST Option) orders ST/ST.
For more details, please refer to: Table B2-1 Encoding of the DMB and DSB <option> parameter  in
https://developer.arm.com/docs/ddi0487/latest/arm-architecture-reference-manual-armv8-for-armv8-a-architecture-profile 

> 
> >
> > Signed-off-by: gavin hu <gavin.hu@arm.com>
> > Reviewed-by: Ola Liljedahl <Ola.Liljedahl@arm.com>
> > Tested-by: Nipun Gupta <nipun.gupta@nxp.com>
> > ---
> >  lib/librte_ring/rte_ring_generic.h | 16 ++++++++++------
> >  1 file changed, 10 insertions(+), 6 deletions(-)
> >
> > diff --git a/lib/librte_ring/rte_ring_generic.h
> b/lib/librte_ring/rte_ring_generic.h
> > index ea7dbe5..1bd3dfd 100644
> > --- a/lib/librte_ring/rte_ring_generic.h
> > +++ b/lib/librte_ring/rte_ring_generic.h
> > @@ -90,9 +90,11 @@ __rte_ring_move_prod_head(struct rte_ring *r,
> unsigned int is_sp,
> >  			return 0;
> >
> >  		*new_head = *old_head + n;
> > -		if (is_sp)
> > -			r->prod.head = *new_head, success = 1;
> > -		else
> > +		if (is_sp) {
> > +			r->prod.head = *new_head;
> > +			rte_smp_rmb();
> > +			success = 1;
> > +		} else
> >  			success = rte_atomic32_cmpset(&r->prod.head,
> >  					*old_head, *new_head);
> >  	} while (unlikely(success == 0));
> > @@ -158,9 +160,11 @@ __rte_ring_move_cons_head(struct rte_ring *r,
> unsigned int is_sc,
> >  			return 0;
> >
> >  		*new_head = *old_head + n;
> > -		if (is_sc)
> > -			r->cons.head = *new_head, success = 1;
> > -		else
> > +		if (is_sc) {
> > +			r->cons.head = *new_head;
> > +			rte_smp_rmb();
> > +			success = 1;
> > +		} else
> >  			success = rte_atomic32_cmpset(&r->cons.head,
> *old_head,
> >  					*new_head);
> >  	} while (unlikely(success == 0));
> >

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [dpdk-dev] [PATCH v2] ring: enforce reading the tails before ring operations
  2019-03-07  9:27     ` Gavin Hu (Arm Technology China)
@ 2019-03-07  9:48       ` Ilya Maximets
  2019-03-07 10:44         ` Gavin Hu (Arm Technology China)
  0 siblings, 1 reply; 29+ messages in thread
From: Ilya Maximets @ 2019-03-07  9:48 UTC (permalink / raw)
  To: Gavin Hu (Arm Technology China), dev
  Cc: nd, thomas, jerinj, hemant.agrawal, Nipun.gupta@nxp.com,
	Honnappa Nagarahalli, olivier.matz

On 07.03.2019 12:27, Gavin Hu (Arm Technology China) wrote:
> 
> 
>> -----Original Message-----
>> From: Ilya Maximets <i.maximets@samsung.com>
>> Sent: Thursday, March 7, 2019 4:52 PM
>> To: Gavin Hu (Arm Technology China) <Gavin.Hu@arm.com>;
>> dev@dpdk.org
>> Cc: nd <nd@arm.com>; thomas@monjalon.net; jerinj@marvell.com;
>> hemant.agrawal@nxp.com; Nipun.gupta@nxp.com; Honnappa Nagarahalli
>> <Honnappa.Nagarahalli@arm.com>; olivier.matz@6wind.com
>> Subject: Re: [PATCH v2] ring: enforce reading the tails before ring
>> operations
>>
>> On 07.03.2019 9:45, gavin hu wrote:
>>> In weak memory models, like arm64, reading the {prod,cons}.tail may get
>>> reordered after reading or writing the ring slots, which corrupts the ring
>>> and stale data is observed.
>>>
>>> This issue was reported by NXP on 8-A72 DPAA2 board. The problem is
>> most
>>> likely caused by missing the acquire semantics when reading cons.tail (in
>>> SP enqueue) or prod.tail (in SC dequeue) which makes it possible to read
>> a
>>> stale value from the ring slots.
>>>
>>> For MP (and MC) case, rte_atomic32_cmpset() already provides the
>> required
>>> ordering. This patch is to prevent reading and writing the ring slots get
>>> reordered before reading {prod,cons}.tail for SP (and SC) case.
>>
>> Read barrier rte_smp_rmb() is OK to prevent reading the ring get
>> reordered
>> before reading the tail. However, to prevent *writing* the ring get
>> reordered
>> *before reading* the tail you need a full memory barrier, i.e.
>> rte_smp_mb().
> 
> ISHLD(rte_smp_rmb is DMB(ishld) orders LD/LD and LD/ST, while WMB(ST Option) orders ST/ST.
> For more details, please refer to: Table B2-1 Encoding of the DMB and DSB <option> parameter  in
> https://developer.arm.com/docs/ddi0487/latest/arm-architecture-reference-manual-armv8-for-armv8-a-architecture-profile 

I see. But you have to change the rte_smp_rmb() function definition in
lib/librte_eal/common/include/generic/rte_atomic.h and assure that all
other architectures follows same rules.
Otherwise, this change is logically wrong, because read barrier in current
definition could not be used to order Load with Store.

> 
>>
>>>
>>> Signed-off-by: gavin hu <gavin.hu@arm.com>
>>> Reviewed-by: Ola Liljedahl <Ola.Liljedahl@arm.com>
>>> Tested-by: Nipun Gupta <nipun.gupta@nxp.com>
>>> ---
>>>  lib/librte_ring/rte_ring_generic.h | 16 ++++++++++------
>>>  1 file changed, 10 insertions(+), 6 deletions(-)
>>>
>>> diff --git a/lib/librte_ring/rte_ring_generic.h
>> b/lib/librte_ring/rte_ring_generic.h
>>> index ea7dbe5..1bd3dfd 100644
>>> --- a/lib/librte_ring/rte_ring_generic.h
>>> +++ b/lib/librte_ring/rte_ring_generic.h
>>> @@ -90,9 +90,11 @@ __rte_ring_move_prod_head(struct rte_ring *r,
>> unsigned int is_sp,
>>>  			return 0;
>>>
>>>  		*new_head = *old_head + n;
>>> -		if (is_sp)
>>> -			r->prod.head = *new_head, success = 1;
>>> -		else
>>> +		if (is_sp) {
>>> +			r->prod.head = *new_head;
>>> +			rte_smp_rmb();
>>> +			success = 1;
>>> +		} else
>>>  			success = rte_atomic32_cmpset(&r->prod.head,
>>>  					*old_head, *new_head);
>>>  	} while (unlikely(success == 0));
>>> @@ -158,9 +160,11 @@ __rte_ring_move_cons_head(struct rte_ring *r,
>> unsigned int is_sc,
>>>  			return 0;
>>>
>>>  		*new_head = *old_head + n;
>>> -		if (is_sc)
>>> -			r->cons.head = *new_head, success = 1;
>>> -		else
>>> +		if (is_sc) {
>>> +			r->cons.head = *new_head;
>>> +			rte_smp_rmb();
>>> +			success = 1;
>>> +		} else
>>>  			success = rte_atomic32_cmpset(&r->cons.head,
>> *old_head,
>>>  					*new_head);
>>>  	} while (unlikely(success == 0));
>>>

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [dpdk-dev] [PATCH v2] ring: enforce reading the tails before ring operations
  2019-03-07  9:48       ` Ilya Maximets
@ 2019-03-07 10:44         ` Gavin Hu (Arm Technology China)
  2019-03-07 11:17           ` Ananyev, Konstantin
  2019-03-08  4:23           ` Gavin Hu (Arm Technology China)
  0 siblings, 2 replies; 29+ messages in thread
From: Gavin Hu (Arm Technology China) @ 2019-03-07 10:44 UTC (permalink / raw)
  To: Ilya Maximets, dev
  Cc: nd, thomas, jerinj, hemant.agrawal, Nipun.gupta@nxp.com,
	Honnappa Nagarahalli, olivier.matz, bruce.richardson,
	konstantin.ananyev, chaozhu



> -----Original Message-----
> From: Ilya Maximets <i.maximets@samsung.com>
> Sent: Thursday, March 7, 2019 5:48 PM
> To: Gavin Hu (Arm Technology China) <Gavin.Hu@arm.com>; dev@dpdk.org
> Cc: nd <nd@arm.com>; thomas@monjalon.net; jerinj@marvell.com;
> hemant.agrawal@nxp.com; Nipun.gupta@nxp.com; Honnappa Nagarahalli
> <Honnappa.Nagarahalli@arm.com>; olivier.matz@6wind.com
> Subject: Re: [PATCH v2] ring: enforce reading the tails before ring
> operations
> 
> On 07.03.2019 12:27, Gavin Hu (Arm Technology China) wrote:
> >
> >
> >> -----Original Message-----
> >> From: Ilya Maximets <i.maximets@samsung.com>
> >> Sent: Thursday, March 7, 2019 4:52 PM
> >> To: Gavin Hu (Arm Technology China) <Gavin.Hu@arm.com>;
> >> dev@dpdk.org
> >> Cc: nd <nd@arm.com>; thomas@monjalon.net; jerinj@marvell.com;
> >> hemant.agrawal@nxp.com; Nipun.gupta@nxp.com; Honnappa Nagarahalli
> >> <Honnappa.Nagarahalli@arm.com>; olivier.matz@6wind.com
> >> Subject: Re: [PATCH v2] ring: enforce reading the tails before ring
> >> operations
> >>
> >> On 07.03.2019 9:45, gavin hu wrote:
> >>> In weak memory models, like arm64, reading the {prod,cons}.tail may get
> >>> reordered after reading or writing the ring slots, which corrupts the ring
> >>> and stale data is observed.
> >>>
> >>> This issue was reported by NXP on 8-A72 DPAA2 board. The problem is
> >> most
> >>> likely caused by missing the acquire semantics when reading cons.tail (in
> >>> SP enqueue) or prod.tail (in SC dequeue) which makes it possible to
> read
> >> a
> >>> stale value from the ring slots.
> >>>
> >>> For MP (and MC) case, rte_atomic32_cmpset() already provides the
> >> required
> >>> ordering. This patch is to prevent reading and writing the ring slots get
> >>> reordered before reading {prod,cons}.tail for SP (and SC) case.
> >>
> >> Read barrier rte_smp_rmb() is OK to prevent reading the ring get
> >> reordered
> >> before reading the tail. However, to prevent *writing* the ring get
> >> reordered
> >> *before reading* the tail you need a full memory barrier, i.e.
> >> rte_smp_mb().
> >
> > ISHLD(rte_smp_rmb is DMB(ishld) orders LD/LD and LD/ST, while WMB(ST
> Option) orders ST/ST.
> > For more details, please refer to: Table B2-1 Encoding of the DMB and DSB
> <option> parameter  in
> > https://developer.arm.com/docs/ddi0487/latest/arm-architecture-
> reference-manual-armv8-for-armv8-a-architecture-profile
> 
> I see. But you have to change the rte_smp_rmb() function definition in
> lib/librte_eal/common/include/generic/rte_atomic.h and assure that all
> other architectures follows same rules.
> Otherwise, this change is logically wrong, because read barrier in current
> definition could not be used to order Load with Store.
> 

Good points, let me re-think how to handle for other architectures. 
Full MB is required for other architectures(x86? Ppc?), but for arm, read barrier(load/store and load/load) is enough. 

> >
> >>
> >>>
> >>> Signed-off-by: gavin hu <gavin.hu@arm.com>
> >>> Reviewed-by: Ola Liljedahl <Ola.Liljedahl@arm.com>
> >>> Tested-by: Nipun Gupta <nipun.gupta@nxp.com>
> >>> ---
> >>>  lib/librte_ring/rte_ring_generic.h | 16 ++++++++++------
> >>>  1 file changed, 10 insertions(+), 6 deletions(-)
> >>>
> >>> diff --git a/lib/librte_ring/rte_ring_generic.h
> >> b/lib/librte_ring/rte_ring_generic.h
> >>> index ea7dbe5..1bd3dfd 100644
> >>> --- a/lib/librte_ring/rte_ring_generic.h
> >>> +++ b/lib/librte_ring/rte_ring_generic.h
> >>> @@ -90,9 +90,11 @@ __rte_ring_move_prod_head(struct rte_ring *r,
> >> unsigned int is_sp,
> >>>  			return 0;
> >>>
> >>>  		*new_head = *old_head + n;
> >>> -		if (is_sp)
> >>> -			r->prod.head = *new_head, success = 1;
> >>> -		else
> >>> +		if (is_sp) {
> >>> +			r->prod.head = *new_head;
> >>> +			rte_smp_rmb();
> >>> +			success = 1;
> >>> +		} else
> >>>  			success = rte_atomic32_cmpset(&r->prod.head,
> >>>  					*old_head, *new_head);
> >>>  	} while (unlikely(success == 0));
> >>> @@ -158,9 +160,11 @@ __rte_ring_move_cons_head(struct rte_ring
> *r,
> >> unsigned int is_sc,
> >>>  			return 0;
> >>>
> >>>  		*new_head = *old_head + n;
> >>> -		if (is_sc)
> >>> -			r->cons.head = *new_head, success = 1;
> >>> -		else
> >>> +		if (is_sc) {
> >>> +			r->cons.head = *new_head;
> >>> +			rte_smp_rmb();
> >>> +			success = 1;
> >>> +		} else
> >>>  			success = rte_atomic32_cmpset(&r->cons.head,
> >> *old_head,
> >>>  					*new_head);
> >>>  	} while (unlikely(success == 0));
> >>>

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [dpdk-dev] [PATCH v2] ring: enforce reading the tails before ring operations
  2019-03-07 10:44         ` Gavin Hu (Arm Technology China)
@ 2019-03-07 11:17           ` Ananyev, Konstantin
  2019-03-08  3:21             ` Honnappa Nagarahalli
  2019-03-08  4:23           ` Gavin Hu (Arm Technology China)
  1 sibling, 1 reply; 29+ messages in thread
From: Ananyev, Konstantin @ 2019-03-07 11:17 UTC (permalink / raw)
  To: Gavin Hu (Arm Technology China), Ilya Maximets, dev
  Cc: nd, thomas, jerinj, hemant.agrawal, Nipun.gupta@nxp.com,
	Honnappa Nagarahalli, olivier.matz, Richardson, Bruce, chaozhu

Hi Gavin,

> > >>> In weak memory models, like arm64, reading the {prod,cons}.tail may get
> > >>> reordered after reading or writing the ring slots, which corrupts the ring
> > >>> and stale data is observed.
> > >>>
> > >>> This issue was reported by NXP on 8-A72 DPAA2 board. The problem is
> > >> most
> > >>> likely caused by missing the acquire semantics when reading cons.tail (in
> > >>> SP enqueue) or prod.tail (in SC dequeue) which makes it possible to
> > read
> > >> a
> > >>> stale value from the ring slots.
> > >>>
> > >>> For MP (and MC) case, rte_atomic32_cmpset() already provides the
> > >> required
> > >>> ordering. This patch is to prevent reading and writing the ring slots get
> > >>> reordered before reading {prod,cons}.tail for SP (and SC) case.
> > >>
> > >> Read barrier rte_smp_rmb() is OK to prevent reading the ring get
> > >> reordered
> > >> before reading the tail. However, to prevent *writing* the ring get
> > >> reordered
> > >> *before reading* the tail you need a full memory barrier, i.e.
> > >> rte_smp_mb().
> > >
> > > ISHLD(rte_smp_rmb is DMB(ishld) orders LD/LD and LD/ST, while WMB(ST
> > Option) orders ST/ST.
> > > For more details, please refer to: Table B2-1 Encoding of the DMB and DSB
> > <option> parameter  in
> > > https://developer.arm.com/docs/ddi0487/latest/arm-architecture-
> > reference-manual-armv8-for-armv8-a-architecture-profile
> >
> > I see. But you have to change the rte_smp_rmb() function definition in
> > lib/librte_eal/common/include/generic/rte_atomic.h and assure that all
> > other architectures follows same rules.
> > Otherwise, this change is logically wrong, because read barrier in current
> > definition could not be used to order Load with Store.
> >
> 
> Good points, let me re-think how to handle for other architectures.
> Full MB is required for other architectures(x86? Ppc?), but for arm, read barrier(load/store and load/load) is enough.

For x86, I don't think you need any barrier here, as with IA memory mode:
-  Reads are not reordered with other reads.
- Writes are not reordered with older reads.

BTW, could you explain a bit more why barrier is necessary even on arm here?
As I can see, there is a data dependency between the tail value and
subsequent address calculations for ring writes/reads.
Isn't that sufficient to prevent re-ordering even for weak memory model?
Konstantin
 

> 
> > >
> > >>
> > >>>
> > >>> Signed-off-by: gavin hu <gavin.hu@arm.com>
> > >>> Reviewed-by: Ola Liljedahl <Ola.Liljedahl@arm.com>
> > >>> Tested-by: Nipun Gupta <nipun.gupta@nxp.com>
> > >>> ---
> > >>>  lib/librte_ring/rte_ring_generic.h | 16 ++++++++++------
> > >>>  1 file changed, 10 insertions(+), 6 deletions(-)
> > >>>
> > >>> diff --git a/lib/librte_ring/rte_ring_generic.h
> > >> b/lib/librte_ring/rte_ring_generic.h
> > >>> index ea7dbe5..1bd3dfd 100644
> > >>> --- a/lib/librte_ring/rte_ring_generic.h
> > >>> +++ b/lib/librte_ring/rte_ring_generic.h
> > >>> @@ -90,9 +90,11 @@ __rte_ring_move_prod_head(struct rte_ring *r,
> > >> unsigned int is_sp,
> > >>>  			return 0;
> > >>>
> > >>>  		*new_head = *old_head + n;
> > >>> -		if (is_sp)
> > >>> -			r->prod.head = *new_head, success = 1;
> > >>> -		else
> > >>> +		if (is_sp) {
> > >>> +			r->prod.head = *new_head;
> > >>> +			rte_smp_rmb();
> > >>> +			success = 1;
> > >>> +		} else
> > >>>  			success = rte_atomic32_cmpset(&r->prod.head,
> > >>>  					*old_head, *new_head);
> > >>>  	} while (unlikely(success == 0));
> > >>> @@ -158,9 +160,11 @@ __rte_ring_move_cons_head(struct rte_ring
> > *r,
> > >> unsigned int is_sc,
> > >>>  			return 0;
> > >>>
> > >>>  		*new_head = *old_head + n;
> > >>> -		if (is_sc)
> > >>> -			r->cons.head = *new_head, success = 1;
> > >>> -		else
> > >>> +		if (is_sc) {
> > >>> +			r->cons.head = *new_head;
> > >>> +			rte_smp_rmb();
> > >>> +			success = 1;
> > >>> +		} else
> > >>>  			success = rte_atomic32_cmpset(&r->cons.head,
> > >> *old_head,
> > >>>  					*new_head);
> > >>>  	} while (unlikely(success == 0));
> > >>>

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [dpdk-dev] [PATCH v2] ring: enforce reading the tails before ring operations
  2019-03-07 11:17           ` Ananyev, Konstantin
@ 2019-03-08  3:21             ` Honnappa Nagarahalli
  2019-03-08  5:27               ` Gavin Hu (Arm Technology China)
  0 siblings, 1 reply; 29+ messages in thread
From: Honnappa Nagarahalli @ 2019-03-08  3:21 UTC (permalink / raw)
  To: Ananyev, Konstantin, Gavin Hu (Arm Technology China), Ilya Maximets, dev
  Cc: nd, thomas, jerinj, hemant.agrawal, Nipun.gupta@nxp.com,
	olivier.matz, Richardson, Bruce, chaozhu, nd

> Hi Gavin,
> 
> > > >>> In weak memory models, like arm64, reading the {prod,cons}.tail
> > > >>> may get reordered after reading or writing the ring slots, which
> > > >>> corrupts the ring and stale data is observed.
> > > >>>
> > > >>> This issue was reported by NXP on 8-A72 DPAA2 board. The problem
> > > >>> is
> > > >> most
> > > >>> likely caused by missing the acquire semantics when reading
> > > >>> cons.tail (in SP enqueue) or prod.tail (in SC dequeue) which
> > > >>> makes it possible to
> > > read
> > > >> a
> > > >>> stale value from the ring slots.
> > > >>>
> > > >>> For MP (and MC) case, rte_atomic32_cmpset() already provides the
> > > >> required
> > > >>> ordering. This patch is to prevent reading and writing the ring
> > > >>> slots get reordered before reading {prod,cons}.tail for SP (and SC)
> case.
> > > >>
> > > >> Read barrier rte_smp_rmb() is OK to prevent reading the ring get
> > > >> reordered before reading the tail. However, to prevent *writing*
> > > >> the ring get reordered *before reading* the tail you need a full
> > > >> memory barrier, i.e.
> > > >> rte_smp_mb().
> > > >
> > > > ISHLD(rte_smp_rmb is DMB(ishld) orders LD/LD and LD/ST, while
> > > > WMB(ST
> > > Option) orders ST/ST.
> > > > For more details, please refer to: Table B2-1 Encoding of the DMB
> > > > and DSB
> > > <option> parameter  in
> > > > https://developer.arm.com/docs/ddi0487/latest/arm-architecture-
> > > reference-manual-armv8-for-armv8-a-architecture-profile
> > >
> > > I see. But you have to change the rte_smp_rmb() function definition
> > > in lib/librte_eal/common/include/generic/rte_atomic.h and assure
> > > that all other architectures follows same rules.
> > > Otherwise, this change is logically wrong, because read barrier in
> > > current definition could not be used to order Load with Store.
> > >
> >
> > Good points, let me re-think how to handle for other architectures.
> > Full MB is required for other architectures(x86? Ppc?), but for arm, read
> barrier(load/store and load/load) is enough.
> 
> For x86, I don't think you need any barrier here, as with IA memory mode:
> -  Reads are not reordered with other reads.
> - Writes are not reordered with older reads.
Agree

> 
> BTW, could you explain a bit more why barrier is necessary even on arm here?
> As I can see, there is a data dependency between the tail value and
> subsequent address calculations for ring writes/reads.
> Isn't that sufficient to prevent re-ordering even for weak memory model?
The tail value affects 'n'. But, the value of 'n' can be speculated because of the following 'if' statement:

if (unlikely(n > *free_entries))
                        n = (behavior == RTE_RING_QUEUE_FIXED) ? 0 : *free_entries;

The address calculations for actual ring writes/reads do not depend on the tail value. Since 'n' can be speculated, the writes/reads can be moved up before the load of the tail value.

> Konstantin
> 
> 
<snip>

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [dpdk-dev] [PATCH v2] ring: enforce reading the tails before ring operations
  2019-03-07 10:44         ` Gavin Hu (Arm Technology China)
  2019-03-07 11:17           ` Ananyev, Konstantin
@ 2019-03-08  4:23           ` Gavin Hu (Arm Technology China)
  2019-03-08  5:06             ` Honnappa Nagarahalli
  2019-03-08 12:13             ` Ananyev, Konstantin
  1 sibling, 2 replies; 29+ messages in thread
From: Gavin Hu (Arm Technology China) @ 2019-03-08  4:23 UTC (permalink / raw)
  To: Ilya Maximets, dev
  Cc: nd, thomas, jerinj, hemant.agrawal, Nipun.gupta@nxp.com,
	Honnappa Nagarahalli, olivier.matz, bruce.richardson,
	konstantin.ananyev, chaozhu



> -----Original Message-----
> From: Gavin Hu (Arm Technology China)
> Sent: Thursday, March 7, 2019 6:45 PM
> To: Ilya Maximets <i.maximets@samsung.com>; dev@dpdk.org
> Cc: nd <nd@arm.com>; thomas@monjalon.net; jerinj@marvell.com;
> hemant.agrawal@nxp.com; Nipun.gupta@nxp.com; Honnappa Nagarahalli
> <Honnappa.Nagarahalli@arm.com>; olivier.matz@6wind.com;
> bruce.richardson@intel.com; konstantin.ananyev@intel.com;
> chaozhu@linux.vnet.ibm.com
> Subject: RE: [PATCH v2] ring: enforce reading the tails before ring operations
> 
> 
> 
> > -----Original Message-----
> > From: Ilya Maximets <i.maximets@samsung.com>
> > Sent: Thursday, March 7, 2019 5:48 PM
> > To: Gavin Hu (Arm Technology China) <Gavin.Hu@arm.com>;
> dev@dpdk.org
> > Cc: nd <nd@arm.com>; thomas@monjalon.net; jerinj@marvell.com;
> > hemant.agrawal@nxp.com; Nipun.gupta@nxp.com; Honnappa Nagarahalli
> > <Honnappa.Nagarahalli@arm.com>; olivier.matz@6wind.com
> > Subject: Re: [PATCH v2] ring: enforce reading the tails before ring
> > operations
> >
> > On 07.03.2019 12:27, Gavin Hu (Arm Technology China) wrote:
> > >
> > >
> > >> -----Original Message-----
> > >> From: Ilya Maximets <i.maximets@samsung.com>
> > >> Sent: Thursday, March 7, 2019 4:52 PM
> > >> To: Gavin Hu (Arm Technology China) <Gavin.Hu@arm.com>;
> > >> dev@dpdk.org
> > >> Cc: nd <nd@arm.com>; thomas@monjalon.net; jerinj@marvell.com;
> > >> hemant.agrawal@nxp.com; Nipun.gupta@nxp.com; Honnappa
> Nagarahalli
> > >> <Honnappa.Nagarahalli@arm.com>; olivier.matz@6wind.com
> > >> Subject: Re: [PATCH v2] ring: enforce reading the tails before ring
> > >> operations
> > >>
> > >> On 07.03.2019 9:45, gavin hu wrote:
> > >>> In weak memory models, like arm64, reading the {prod,cons}.tail may
> get
> > >>> reordered after reading or writing the ring slots, which corrupts the
> ring
> > >>> and stale data is observed.
> > >>>
> > >>> This issue was reported by NXP on 8-A72 DPAA2 board. The problem
> is
> > >> most
> > >>> likely caused by missing the acquire semantics when reading cons.tail
> (in
> > >>> SP enqueue) or prod.tail (in SC dequeue) which makes it possible to
> > read
> > >> a
> > >>> stale value from the ring slots.
> > >>>
> > >>> For MP (and MC) case, rte_atomic32_cmpset() already provides the
> > >> required
> > >>> ordering. This patch is to prevent reading and writing the ring slots get
> > >>> reordered before reading {prod,cons}.tail for SP (and SC) case.
> > >>
> > >> Read barrier rte_smp_rmb() is OK to prevent reading the ring get
> > >> reordered
> > >> before reading the tail. However, to prevent *writing* the ring get
> > >> reordered
> > >> *before reading* the tail you need a full memory barrier, i.e.
> > >> rte_smp_mb().
> > >
> > > ISHLD(rte_smp_rmb is DMB(ishld) orders LD/LD and LD/ST, while
> WMB(ST
> > Option) orders ST/ST.
> > > For more details, please refer to: Table B2-1 Encoding of the DMB and
> DSB
> > <option> parameter  in
> > > https://developer.arm.com/docs/ddi0487/latest/arm-architecture-
> > reference-manual-armv8-for-armv8-a-architecture-profile
> >
> > I see. But you have to change the rte_smp_rmb() function definition in
> > lib/librte_eal/common/include/generic/rte_atomic.h and assure that all
> > other architectures follows same rules.
> > Otherwise, this change is logically wrong, because read barrier in current
> > definition could not be used to order Load with Store.
> >
> 
> Good points, let me re-think how to handle for other architectures.
> Full MB is required for other architectures(x86? Ppc?), but for arm, read
> barrier(load/store and load/load) is enough.

Hi Ilya,

I would expand the rmb definition to cover load/store, in addition to load/load.
For X86, as a strong memory order model, rmb is actually equivalent to mb, as implemented as a compiler barrier: rte_compiler_barrier, arm32 is also this case.
For PPC, both 32 and 64-bit, rmb=wmb=mb, lwsync/sync orders load/store, load/load, store/load, store/store, looking at the table on this page:
https://www.ibm.com/developerworks/systems/articles/powerpc.html 

In summary, we are safe to expand this definition for all the architectures DPDK support? 
Any comments are welcome!

BR. Gavin

> > >
> > >>
> > >>>
> > >>> Signed-off-by: gavin hu <gavin.hu@arm.com>
> > >>> Reviewed-by: Ola Liljedahl <Ola.Liljedahl@arm.com>
> > >>> Tested-by: Nipun Gupta <nipun.gupta@nxp.com>
> > >>> ---
> > >>>  lib/librte_ring/rte_ring_generic.h | 16 ++++++++++------
> > >>>  1 file changed, 10 insertions(+), 6 deletions(-)
> > >>>
> > >>> diff --git a/lib/librte_ring/rte_ring_generic.h
> > >> b/lib/librte_ring/rte_ring_generic.h
> > >>> index ea7dbe5..1bd3dfd 100644
> > >>> --- a/lib/librte_ring/rte_ring_generic.h
> > >>> +++ b/lib/librte_ring/rte_ring_generic.h
> > >>> @@ -90,9 +90,11 @@ __rte_ring_move_prod_head(struct rte_ring
> *r,
> > >> unsigned int is_sp,
> > >>>  			return 0;
> > >>>
> > >>>  		*new_head = *old_head + n;
> > >>> -		if (is_sp)
> > >>> -			r->prod.head = *new_head, success = 1;
> > >>> -		else
> > >>> +		if (is_sp) {
> > >>> +			r->prod.head = *new_head;
> > >>> +			rte_smp_rmb();
> > >>> +			success = 1;
> > >>> +		} else
> > >>>  			success = rte_atomic32_cmpset(&r->prod.head,
> > >>>  					*old_head, *new_head);
> > >>>  	} while (unlikely(success == 0));
> > >>> @@ -158,9 +160,11 @@ __rte_ring_move_cons_head(struct
> rte_ring
> > *r,
> > >> unsigned int is_sc,
> > >>>  			return 0;
> > >>>
> > >>>  		*new_head = *old_head + n;
> > >>> -		if (is_sc)
> > >>> -			r->cons.head = *new_head, success = 1;
> > >>> -		else
> > >>> +		if (is_sc) {
> > >>> +			r->cons.head = *new_head;
> > >>> +			rte_smp_rmb();
> > >>> +			success = 1;
> > >>> +		} else
> > >>>  			success = rte_atomic32_cmpset(&r->cons.head,
> > >> *old_head,
> > >>>  					*new_head);
> > >>>  	} while (unlikely(success == 0));
> > >>>

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [dpdk-dev] [PATCH v2] ring: enforce reading the tails before ring operations
  2019-03-08  4:23           ` Gavin Hu (Arm Technology China)
@ 2019-03-08  5:06             ` Honnappa Nagarahalli
  2019-03-08 12:13             ` Ananyev, Konstantin
  1 sibling, 0 replies; 29+ messages in thread
From: Honnappa Nagarahalli @ 2019-03-08  5:06 UTC (permalink / raw)
  To: Gavin Hu (Arm Technology China), Ilya Maximets, dev
  Cc: nd, thomas, jerinj, hemant.agrawal, Nipun.gupta@nxp.com,
	olivier.matz, bruce.richardson, konstantin.ananyev, chaozhu, nd

> > > >> On 07.03.2019 9:45, gavin hu wrote:
> > > >>> In weak memory models, like arm64, reading the {prod,cons}.tail
> > > >>> may
> > get
> > > >>> reordered after reading or writing the ring slots, which
> > > >>> corrupts the
> > ring
> > > >>> and stale data is observed.
> > > >>>
> > > >>> This issue was reported by NXP on 8-A72 DPAA2 board. The problem
> > is
> > > >> most
> > > >>> likely caused by missing the acquire semantics when reading
> > > >>> cons.tail
> > (in
> > > >>> SP enqueue) or prod.tail (in SC dequeue) which makes it possible
> > > >>> to
> > > read
> > > >> a
> > > >>> stale value from the ring slots.
> > > >>>
> > > >>> For MP (and MC) case, rte_atomic32_cmpset() already provides the
> > > >> required
> > > >>> ordering. This patch is to prevent reading and writing the ring
> > > >>> slots get reordered before reading {prod,cons}.tail for SP (and SC)
> case.
> > > >>
> > > >> Read barrier rte_smp_rmb() is OK to prevent reading the ring get
> > > >> reordered before reading the tail. However, to prevent *writing*
> > > >> the ring get reordered *before reading* the tail you need a full
> > > >> memory barrier, i.e.
> > > >> rte_smp_mb().
> > > >
> > > > ISHLD(rte_smp_rmb is DMB(ishld) orders LD/LD and LD/ST, while
> > WMB(ST
> > > Option) orders ST/ST.
> > > > For more details, please refer to: Table B2-1 Encoding of the DMB
> > > > and
> > DSB
> > > <option> parameter  in
> > > > https://developer.arm.com/docs/ddi0487/latest/arm-architecture-
> > > reference-manual-armv8-for-armv8-a-architecture-profile
> > >
> > > I see. But you have to change the rte_smp_rmb() function definition
> > > in lib/librte_eal/common/include/generic/rte_atomic.h and assure
> > > that all other architectures follows same rules.
> > > Otherwise, this change is logically wrong, because read barrier in
> > > current definition could not be used to order Load with Store.
> > >
> >
> > Good points, let me re-think how to handle for other architectures.
> > Full MB is required for other architectures(x86? Ppc?), but for arm,
> > read barrier(load/store and load/load) is enough.
> 
> Hi Ilya,
> 
> I would expand the rmb definition to cover load/store, in addition to
> load/load.
> For X86, as a strong memory order model, rmb is actually equivalent to mb,
> as implemented as a compiler barrier: rte_compiler_barrier, arm32 is also
> this case.
> For PPC, both 32 and 64-bit, rmb=wmb=mb, lwsync/sync orders load/store,
> load/load, store/load, store/store, looking at the table on this page:
> https://www.ibm.com/developerworks/systems/articles/powerpc.html
> 
> In summary, we are safe to expand this definition for all the architectures
> DPDK support?
Essentially, it is a documentation bug. i.e. the current implementation of rte_smp_rmb() already behaves as load/load and load/store barrier.

> Any comments are welcome!
> 
> BR. Gavin
> 

<snip>

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [dpdk-dev] [PATCH v2] ring: enforce reading the tails before ring operations
  2019-03-08  3:21             ` Honnappa Nagarahalli
@ 2019-03-08  5:27               ` Gavin Hu (Arm Technology China)
  2019-03-08 16:33                 ` Ananyev, Konstantin
  0 siblings, 1 reply; 29+ messages in thread
From: Gavin Hu (Arm Technology China) @ 2019-03-08  5:27 UTC (permalink / raw)
  To: Honnappa Nagarahalli, Ananyev, Konstantin, Ilya Maximets, dev
  Cc: nd, thomas, jerinj, hemant.agrawal, Nipun.gupta@nxp.com,
	olivier.matz, Richardson, Bruce, chaozhu, nd

Hi Konstantin,

> -----Original Message-----
> From: Honnappa Nagarahalli <Honnappa.Nagarahalli@arm.com>
> Sent: Friday, March 8, 2019 11:21 AM
> To: Ananyev, Konstantin <konstantin.ananyev@intel.com>; Gavin Hu (Arm
> Technology China) <Gavin.Hu@arm.com>; Ilya Maximets
> <i.maximets@samsung.com>; dev@dpdk.org
> Cc: nd <nd@arm.com>; thomas@monjalon.net; jerinj@marvell.com;
> hemant.agrawal@nxp.com; Nipun.gupta@nxp.com; olivier.matz@6wind.com;
> Richardson, Bruce <bruce.richardson@intel.com>;
> chaozhu@linux.vnet.ibm.com; nd <nd@arm.com>
> Subject: RE: [PATCH v2] ring: enforce reading the tails before ring operations
> 
> > Hi Gavin,
> >
> > > > >>> In weak memory models, like arm64, reading the {prod,cons}.tail
> > > > >>> may get reordered after reading or writing the ring slots, which
> > > > >>> corrupts the ring and stale data is observed.
> > > > >>>
> > > > >>> This issue was reported by NXP on 8-A72 DPAA2 board. The
> problem
> > > > >>> is
> > > > >> most
> > > > >>> likely caused by missing the acquire semantics when reading
> > > > >>> cons.tail (in SP enqueue) or prod.tail (in SC dequeue) which
> > > > >>> makes it possible to
> > > > read
> > > > >> a
> > > > >>> stale value from the ring slots.
> > > > >>>
> > > > >>> For MP (and MC) case, rte_atomic32_cmpset() already provides
> the
> > > > >> required
> > > > >>> ordering. This patch is to prevent reading and writing the ring
> > > > >>> slots get reordered before reading {prod,cons}.tail for SP (and SC)
> > case.
> > > > >>
> > > > >> Read barrier rte_smp_rmb() is OK to prevent reading the ring get
> > > > >> reordered before reading the tail. However, to prevent *writing*
> > > > >> the ring get reordered *before reading* the tail you need a full
> > > > >> memory barrier, i.e.
> > > > >> rte_smp_mb().
> > > > >
> > > > > ISHLD(rte_smp_rmb is DMB(ishld) orders LD/LD and LD/ST, while
> > > > > WMB(ST
> > > > Option) orders ST/ST.
> > > > > For more details, please refer to: Table B2-1 Encoding of the DMB
> > > > > and DSB
> > > > <option> parameter  in
> > > > > https://developer.arm.com/docs/ddi0487/latest/arm-architecture-
> > > > reference-manual-armv8-for-armv8-a-architecture-profile
> > > >
> > > > I see. But you have to change the rte_smp_rmb() function definition
> > > > in lib/librte_eal/common/include/generic/rte_atomic.h and assure
> > > > that all other architectures follows same rules.
> > > > Otherwise, this change is logically wrong, because read barrier in
> > > > current definition could not be used to order Load with Store.
> > > >
> > >
> > > Good points, let me re-think how to handle for other architectures.
> > > Full MB is required for other architectures(x86? Ppc?), but for arm, read
> > barrier(load/store and load/load) is enough.
> >
> > For x86, I don't think you need any barrier here, as with IA memory mode:
> > -  Reads are not reordered with other reads.
> > - Writes are not reordered with older reads.
> Agree

I understand herein no instruction level barriers are required for IA, but how about the
compiler barrier: rte_compiler_barrier? 

> 
> >
> > BTW, could you explain a bit more why barrier is necessary even on arm
> here?
> > As I can see, there is a data dependency between the tail value and
> > subsequent address calculations for ring writes/reads.
> > Isn't that sufficient to prevent re-ordering even for weak memory model?
> The tail value affects 'n'. But, the value of 'n' can be speculated because of
> the following 'if' statement:
> 
> if (unlikely(n > *free_entries))
>                         n = (behavior == RTE_RING_QUEUE_FIXED) ? 0 : *free_entries;
> 
> The address calculations for actual ring writes/reads do not depend on the
> tail value. Since 'n' can be speculated, the writes/reads can be moved up
> before the load of the tail value.

Good explanation. The address calculations does not depend on tail/n, only the
limit/last one depends on it, while it can be speculated. 

> > Konstantin
> >
> >
> <snip>

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [dpdk-dev] [PATCH v2] ring: enforce reading the tails before ring operations
  2019-03-08  4:23           ` Gavin Hu (Arm Technology China)
  2019-03-08  5:06             ` Honnappa Nagarahalli
@ 2019-03-08 12:13             ` Ananyev, Konstantin
  2019-03-08 15:05               ` Gavin Hu (Arm Technology China)
  1 sibling, 1 reply; 29+ messages in thread
From: Ananyev, Konstantin @ 2019-03-08 12:13 UTC (permalink / raw)
  To: Gavin Hu (Arm Technology China), Ilya Maximets, dev
  Cc: nd, thomas, jerinj, hemant.agrawal, Nipun.gupta@nxp.com,
	Honnappa Nagarahalli, olivier.matz, Richardson, Bruce, chaozhu



> -----Original Message-----
> From: Gavin Hu (Arm Technology China) [mailto:Gavin.Hu@arm.com]
> Sent: Friday, March 8, 2019 4:23 AM
> To: Ilya Maximets <i.maximets@samsung.com>; dev@dpdk.org
> Cc: nd <nd@arm.com>; thomas@monjalon.net; jerinj@marvell.com; hemant.agrawal@nxp.com; Nipun.gupta@nxp.com; Honnappa
> Nagarahalli <Honnappa.Nagarahalli@arm.com>; olivier.matz@6wind.com; Richardson, Bruce <bruce.richardson@intel.com>; Ananyev,
> Konstantin <konstantin.ananyev@intel.com>; chaozhu@linux.vnet.ibm.com
> Subject: RE: [PATCH v2] ring: enforce reading the tails before ring operations
> 
> 
> 
> > -----Original Message-----
> > From: Gavin Hu (Arm Technology China)
> > Sent: Thursday, March 7, 2019 6:45 PM
> > To: Ilya Maximets <i.maximets@samsung.com>; dev@dpdk.org
> > Cc: nd <nd@arm.com>; thomas@monjalon.net; jerinj@marvell.com;
> > hemant.agrawal@nxp.com; Nipun.gupta@nxp.com; Honnappa Nagarahalli
> > <Honnappa.Nagarahalli@arm.com>; olivier.matz@6wind.com;
> > bruce.richardson@intel.com; konstantin.ananyev@intel.com;
> > chaozhu@linux.vnet.ibm.com
> > Subject: RE: [PATCH v2] ring: enforce reading the tails before ring operations
> >
> >
> >
> > > -----Original Message-----
> > > From: Ilya Maximets <i.maximets@samsung.com>
> > > Sent: Thursday, March 7, 2019 5:48 PM
> > > To: Gavin Hu (Arm Technology China) <Gavin.Hu@arm.com>;
> > dev@dpdk.org
> > > Cc: nd <nd@arm.com>; thomas@monjalon.net; jerinj@marvell.com;
> > > hemant.agrawal@nxp.com; Nipun.gupta@nxp.com; Honnappa Nagarahalli
> > > <Honnappa.Nagarahalli@arm.com>; olivier.matz@6wind.com
> > > Subject: Re: [PATCH v2] ring: enforce reading the tails before ring
> > > operations
> > >
> > > On 07.03.2019 12:27, Gavin Hu (Arm Technology China) wrote:
> > > >
> > > >
> > > >> -----Original Message-----
> > > >> From: Ilya Maximets <i.maximets@samsung.com>
> > > >> Sent: Thursday, March 7, 2019 4:52 PM
> > > >> To: Gavin Hu (Arm Technology China) <Gavin.Hu@arm.com>;
> > > >> dev@dpdk.org
> > > >> Cc: nd <nd@arm.com>; thomas@monjalon.net; jerinj@marvell.com;
> > > >> hemant.agrawal@nxp.com; Nipun.gupta@nxp.com; Honnappa
> > Nagarahalli
> > > >> <Honnappa.Nagarahalli@arm.com>; olivier.matz@6wind.com
> > > >> Subject: Re: [PATCH v2] ring: enforce reading the tails before ring
> > > >> operations
> > > >>
> > > >> On 07.03.2019 9:45, gavin hu wrote:
> > > >>> In weak memory models, like arm64, reading the {prod,cons}.tail may
> > get
> > > >>> reordered after reading or writing the ring slots, which corrupts the
> > ring
> > > >>> and stale data is observed.
> > > >>>
> > > >>> This issue was reported by NXP on 8-A72 DPAA2 board. The problem
> > is
> > > >> most
> > > >>> likely caused by missing the acquire semantics when reading cons.tail
> > (in
> > > >>> SP enqueue) or prod.tail (in SC dequeue) which makes it possible to
> > > read
> > > >> a
> > > >>> stale value from the ring slots.
> > > >>>
> > > >>> For MP (and MC) case, rte_atomic32_cmpset() already provides the
> > > >> required
> > > >>> ordering. This patch is to prevent reading and writing the ring slots get
> > > >>> reordered before reading {prod,cons}.tail for SP (and SC) case.
> > > >>
> > > >> Read barrier rte_smp_rmb() is OK to prevent reading the ring get
> > > >> reordered
> > > >> before reading the tail. However, to prevent *writing* the ring get
> > > >> reordered
> > > >> *before reading* the tail you need a full memory barrier, i.e.
> > > >> rte_smp_mb().
> > > >
> > > > ISHLD(rte_smp_rmb is DMB(ishld) orders LD/LD and LD/ST, while
> > WMB(ST
> > > Option) orders ST/ST.
> > > > For more details, please refer to: Table B2-1 Encoding of the DMB and
> > DSB
> > > <option> parameter  in
> > > > https://developer.arm.com/docs/ddi0487/latest/arm-architecture-
> > > reference-manual-armv8-for-armv8-a-architecture-profile
> > >
> > > I see. But you have to change the rte_smp_rmb() function definition in
> > > lib/librte_eal/common/include/generic/rte_atomic.h and assure that all
> > > other architectures follows same rules.
> > > Otherwise, this change is logically wrong, because read barrier in current
> > > definition could not be used to order Load with Store.
> > >
> >
> > Good points, let me re-think how to handle for other architectures.
> > Full MB is required for other architectures(x86? Ppc?), but for arm, read
> > barrier(load/store and load/load) is enough.
> 
> Hi Ilya,
> 
> I would expand the rmb definition to cover load/store, in addition to load/load.
> For X86, as a strong memory order model, rmb is actually equivalent to mb,

That's not exactly the case, on x86 we have:
smp_rmb == compiler_barrier
smp_mb is a proper memory barrier.

Konstantin

> as implemented as a compiler barrier: rte_compiler_barrier,
> arm32 is also this case.
> For PPC, both 32 and 64-bit, rmb=wmb=mb, lwsync/sync orders load/store, load/load, store/load, store/store, looking at the table on this
> page:
> https://www.ibm.com/developerworks/systems/articles/powerpc.html
> 
> In summary, we are safe to expand this definition for all the architectures DPDK support?
> Any comments are welcome!
> 
> BR. Gavin
> 

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [dpdk-dev] [PATCH v2] ring: enforce reading the tails before ring operations
  2019-03-08 12:13             ` Ananyev, Konstantin
@ 2019-03-08 15:05               ` Gavin Hu (Arm Technology China)
  2019-03-08 15:50                 ` Ananyev, Konstantin
  0 siblings, 1 reply; 29+ messages in thread
From: Gavin Hu (Arm Technology China) @ 2019-03-08 15:05 UTC (permalink / raw)
  To: Ananyev, Konstantin, Ilya Maximets, dev
  Cc: nd, thomas, jerinj, hemant.agrawal, Nipun.gupta@nxp.com,
	Honnappa Nagarahalli, olivier.matz, Richardson, Bruce, chaozhu

Hi Konstantin,

> -----Original Message-----
> From: Ananyev, Konstantin <konstantin.ananyev@intel.com>
> Sent: Friday, March 8, 2019 8:13 PM
> To: Gavin Hu (Arm Technology China) <Gavin.Hu@arm.com>; Ilya
> Maximets <i.maximets@samsung.com>; dev@dpdk.org
> Cc: nd <nd@arm.com>; thomas@monjalon.net; jerinj@marvell.com;
> hemant.agrawal@nxp.com; Nipun.gupta@nxp.com; Honnappa Nagarahalli
> <Honnappa.Nagarahalli@arm.com>; olivier.matz@6wind.com; Richardson,
> Bruce <bruce.richardson@intel.com>; chaozhu@linux.vnet.ibm.com
> Subject: RE: [PATCH v2] ring: enforce reading the tails before ring
> operations
> 
> 
> 
> > -----Original Message-----
> > From: Gavin Hu (Arm Technology China) [mailto:Gavin.Hu@arm.com]
> > Sent: Friday, March 8, 2019 4:23 AM
> > To: Ilya Maximets <i.maximets@samsung.com>; dev@dpdk.org
> > Cc: nd <nd@arm.com>; thomas@monjalon.net; jerinj@marvell.com;
> hemant.agrawal@nxp.com; Nipun.gupta@nxp.com; Honnappa
> > Nagarahalli <Honnappa.Nagarahalli@arm.com>;
> olivier.matz@6wind.com; Richardson, Bruce
> <bruce.richardson@intel.com>; Ananyev,
> > Konstantin <konstantin.ananyev@intel.com>;
> chaozhu@linux.vnet.ibm.com
> > Subject: RE: [PATCH v2] ring: enforce reading the tails before ring
> operations
> >
> >
> >
> > > -----Original Message-----
> > > From: Gavin Hu (Arm Technology China)
> > > Sent: Thursday, March 7, 2019 6:45 PM
> > > To: Ilya Maximets <i.maximets@samsung.com>; dev@dpdk.org
> > > Cc: nd <nd@arm.com>; thomas@monjalon.net; jerinj@marvell.com;
> > > hemant.agrawal@nxp.com; Nipun.gupta@nxp.com; Honnappa
> Nagarahalli
> > > <Honnappa.Nagarahalli@arm.com>; olivier.matz@6wind.com;
> > > bruce.richardson@intel.com; konstantin.ananyev@intel.com;
> > > chaozhu@linux.vnet.ibm.com
> > > Subject: RE: [PATCH v2] ring: enforce reading the tails before ring
> operations
> > >
> > >
> > >
> > > > -----Original Message-----
> > > > From: Ilya Maximets <i.maximets@samsung.com>
> > > > Sent: Thursday, March 7, 2019 5:48 PM
> > > > To: Gavin Hu (Arm Technology China) <Gavin.Hu@arm.com>;
> > > dev@dpdk.org
> > > > Cc: nd <nd@arm.com>; thomas@monjalon.net; jerinj@marvell.com;
> > > > hemant.agrawal@nxp.com; Nipun.gupta@nxp.com; Honnappa
> Nagarahalli
> > > > <Honnappa.Nagarahalli@arm.com>; olivier.matz@6wind.com
> > > > Subject: Re: [PATCH v2] ring: enforce reading the tails before ring
> > > > operations
> > > >
> > > > On 07.03.2019 12:27, Gavin Hu (Arm Technology China) wrote:
> > > > >
> > > > >
> > > > >> -----Original Message-----
> > > > >> From: Ilya Maximets <i.maximets@samsung.com>
> > > > >> Sent: Thursday, March 7, 2019 4:52 PM
> > > > >> To: Gavin Hu (Arm Technology China) <Gavin.Hu@arm.com>;
> > > > >> dev@dpdk.org
> > > > >> Cc: nd <nd@arm.com>; thomas@monjalon.net;
> jerinj@marvell.com;
> > > > >> hemant.agrawal@nxp.com; Nipun.gupta@nxp.com; Honnappa
> > > Nagarahalli
> > > > >> <Honnappa.Nagarahalli@arm.com>; olivier.matz@6wind.com
> > > > >> Subject: Re: [PATCH v2] ring: enforce reading the tails before ring
> > > > >> operations
> > > > >>
> > > > >> On 07.03.2019 9:45, gavin hu wrote:
> > > > >>> In weak memory models, like arm64, reading the {prod,cons}.tail
> may
> > > get
> > > > >>> reordered after reading or writing the ring slots, which corrupts
> the
> > > ring
> > > > >>> and stale data is observed.
> > > > >>>
> > > > >>> This issue was reported by NXP on 8-A72 DPAA2 board. The
> problem
> > > is
> > > > >> most
> > > > >>> likely caused by missing the acquire semantics when reading
> cons.tail
> > > (in
> > > > >>> SP enqueue) or prod.tail (in SC dequeue) which makes it possible
> to
> > > > read
> > > > >> a
> > > > >>> stale value from the ring slots.
> > > > >>>
> > > > >>> For MP (and MC) case, rte_atomic32_cmpset() already provides
> the
> > > > >> required
> > > > >>> ordering. This patch is to prevent reading and writing the ring
> slots get
> > > > >>> reordered before reading {prod,cons}.tail for SP (and SC) case.
> > > > >>
> > > > >> Read barrier rte_smp_rmb() is OK to prevent reading the ring get
> > > > >> reordered
> > > > >> before reading the tail. However, to prevent *writing* the ring get
> > > > >> reordered
> > > > >> *before reading* the tail you need a full memory barrier, i.e.
> > > > >> rte_smp_mb().
> > > > >
> > > > > ISHLD(rte_smp_rmb is DMB(ishld) orders LD/LD and LD/ST, while
> > > WMB(ST
> > > > Option) orders ST/ST.
> > > > > For more details, please refer to: Table B2-1 Encoding of the DMB
> and
> > > DSB
> > > > <option> parameter  in
> > > > > https://developer.arm.com/docs/ddi0487/latest/arm-architecture-
> > > > reference-manual-armv8-for-armv8-a-architecture-profile
> > > >
> > > > I see. But you have to change the rte_smp_rmb() function definition
> in
> > > > lib/librte_eal/common/include/generic/rte_atomic.h and assure that
> all
> > > > other architectures follows same rules.
> > > > Otherwise, this change is logically wrong, because read barrier in
> current
> > > > definition could not be used to order Load with Store.
> > > >
> > >
> > > Good points, let me re-think how to handle for other architectures.
> > > Full MB is required for other architectures(x86? Ppc?), but for arm,
> read
> > > barrier(load/store and load/load) is enough.
> >
> > Hi Ilya,
> >
> > I would expand the rmb definition to cover load/store, in addition to
> load/load.
> > For X86, as a strong memory order model, rmb is actually equivalent to
> mb,
> 
> That's not exactly the case, on x86 we have:
> smp_rmb == compiler_barrier
> smp_mb is a proper memory barrier.
> 
> Konstantin

Sorry I did not make it clear.
Anyway, on x86, smp_rmb, as a compiler barrier, applies to load/store, not only load/load.
This is the case also for arm, arm64, ppc32, ppc64.
I will submit a patch to expand the definition of this API. 

> 
> > as implemented as a compiler barrier: rte_compiler_barrier,
> > arm32 is also this case.
> > For PPC, both 32 and 64-bit, rmb=wmb=mb, lwsync/sync orders
> load/store, load/load, store/load, store/store, looking at the table on this
> > page:
> > https://www.ibm.com/developerworks/systems/articles/powerpc.html
> >
> > In summary, we are safe to expand this definition for all the
> architectures DPDK support?
> > Any comments are welcome!
> >
> > BR. Gavin
> >

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [dpdk-dev] [PATCH v2] ring: enforce reading the tails before ring operations
  2019-03-08 15:05               ` Gavin Hu (Arm Technology China)
@ 2019-03-08 15:50                 ` Ananyev, Konstantin
  2019-03-08 23:18                   ` Thomas Monjalon
  0 siblings, 1 reply; 29+ messages in thread
From: Ananyev, Konstantin @ 2019-03-08 15:50 UTC (permalink / raw)
  To: Gavin Hu (Arm Technology China), Ilya Maximets, dev
  Cc: nd, thomas, jerinj, hemant.agrawal, Nipun.gupta@nxp.com,
	Honnappa Nagarahalli, olivier.matz, Richardson, Bruce, chaozhu



> > > > > >>> In weak memory models, like arm64, reading the {prod,cons}.tail
> > may
> > > > get
> > > > > >>> reordered after reading or writing the ring slots, which corrupts
> > the
> > > > ring
> > > > > >>> and stale data is observed.
> > > > > >>>
> > > > > >>> This issue was reported by NXP on 8-A72 DPAA2 board. The
> > problem
> > > > is
> > > > > >> most
> > > > > >>> likely caused by missing the acquire semantics when reading
> > cons.tail
> > > > (in
> > > > > >>> SP enqueue) or prod.tail (in SC dequeue) which makes it possible
> > to
> > > > > read
> > > > > >> a
> > > > > >>> stale value from the ring slots.
> > > > > >>>
> > > > > >>> For MP (and MC) case, rte_atomic32_cmpset() already provides
> > the
> > > > > >> required
> > > > > >>> ordering. This patch is to prevent reading and writing the ring
> > slots get
> > > > > >>> reordered before reading {prod,cons}.tail for SP (and SC) case.
> > > > > >>
> > > > > >> Read barrier rte_smp_rmb() is OK to prevent reading the ring get
> > > > > >> reordered
> > > > > >> before reading the tail. However, to prevent *writing* the ring get
> > > > > >> reordered
> > > > > >> *before reading* the tail you need a full memory barrier, i.e.
> > > > > >> rte_smp_mb().
> > > > > >
> > > > > > ISHLD(rte_smp_rmb is DMB(ishld) orders LD/LD and LD/ST, while
> > > > WMB(ST
> > > > > Option) orders ST/ST.
> > > > > > For more details, please refer to: Table B2-1 Encoding of the DMB
> > and
> > > > DSB
> > > > > <option> parameter  in
> > > > > > https://developer.arm.com/docs/ddi0487/latest/arm-architecture-
> > > > > reference-manual-armv8-for-armv8-a-architecture-profile
> > > > >
> > > > > I see. But you have to change the rte_smp_rmb() function definition
> > in
> > > > > lib/librte_eal/common/include/generic/rte_atomic.h and assure that
> > all
> > > > > other architectures follows same rules.
> > > > > Otherwise, this change is logically wrong, because read barrier in
> > current
> > > > > definition could not be used to order Load with Store.
> > > > >
> > > >
> > > > Good points, let me re-think how to handle for other architectures.
> > > > Full MB is required for other architectures(x86? Ppc?), but for arm,
> > read
> > > > barrier(load/store and load/load) is enough.
> > >
> > > Hi Ilya,
> > >
> > > I would expand the rmb definition to cover load/store, in addition to
> > load/load.
> > > For X86, as a strong memory order model, rmb is actually equivalent to
> > mb,
> >
> > That's not exactly the case, on x86 we have:
> > smp_rmb == compiler_barrier
> > smp_mb is a proper memory barrier.
> >
> > Konstantin
> 
> Sorry I did not make it clear.
> Anyway, on x86, smp_rmb, as a compiler barrier, applies to load/store, not only load/load.

Yes, that's true, but I think that's happened by coincidence, 
not intentionally.

> This is the case also for arm, arm64, ppc32, ppc64.
> I will submit a patch to expand the definition of this API.

I understand your intention, but that does mean we would also need
to change not only rte_smp_rmb() but rte_rmb() too (to keep things consistent)?
That sounds worring.  
Might be better to keep smp_rmb() definition as it is, and introduce new function
that fits your purposes (smp_rwmb or smp_load_store_barrier)?

Konstantin

> 
> >
> > > as implemented as a compiler barrier: rte_compiler_barrier,
> > > arm32 is also this case.
> > > For PPC, both 32 and 64-bit, rmb=wmb=mb, lwsync/sync orders
> > load/store, load/load, store/load, store/store, looking at the table on this
> > > page:
> > > https://www.ibm.com/developerworks/systems/articles/powerpc.html
> > >
> > > In summary, we are safe to expand this definition for all the
> > architectures DPDK support?
> > > Any comments are welcome!
> > >
> > > BR. Gavin
> > >

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [dpdk-dev] [PATCH v2] ring: enforce reading the tails before ring operations
  2019-03-08  5:27               ` Gavin Hu (Arm Technology China)
@ 2019-03-08 16:33                 ` Ananyev, Konstantin
  2019-03-10 20:47                   ` Honnappa Nagarahalli
  0 siblings, 1 reply; 29+ messages in thread
From: Ananyev, Konstantin @ 2019-03-08 16:33 UTC (permalink / raw)
  To: Gavin Hu (Arm Technology China),
	Honnappa Nagarahalli, Ilya Maximets, dev
  Cc: nd, thomas, jerinj, hemant.agrawal, Nipun.gupta@nxp.com,
	olivier.matz, Richardson, Bruce, chaozhu, nd


> > > > > >>> In weak memory models, like arm64, reading the {prod,cons}.tail
> > > > > >>> may get reordered after reading or writing the ring slots, which
> > > > > >>> corrupts the ring and stale data is observed.
> > > > > >>>
> > > > > >>> This issue was reported by NXP on 8-A72 DPAA2 board. The
> > problem
> > > > > >>> is
> > > > > >> most
> > > > > >>> likely caused by missing the acquire semantics when reading
> > > > > >>> cons.tail (in SP enqueue) or prod.tail (in SC dequeue) which
> > > > > >>> makes it possible to
> > > > > read
> > > > > >> a
> > > > > >>> stale value from the ring slots.
> > > > > >>>
> > > > > >>> For MP (and MC) case, rte_atomic32_cmpset() already provides
> > the
> > > > > >> required
> > > > > >>> ordering. This patch is to prevent reading and writing the ring
> > > > > >>> slots get reordered before reading {prod,cons}.tail for SP (and SC)
> > > case.
> > > > > >>
> > > > > >> Read barrier rte_smp_rmb() is OK to prevent reading the ring get
> > > > > >> reordered before reading the tail. However, to prevent *writing*
> > > > > >> the ring get reordered *before reading* the tail you need a full
> > > > > >> memory barrier, i.e.
> > > > > >> rte_smp_mb().
> > > > > >
> > > > > > ISHLD(rte_smp_rmb is DMB(ishld) orders LD/LD and LD/ST, while
> > > > > > WMB(ST
> > > > > Option) orders ST/ST.
> > > > > > For more details, please refer to: Table B2-1 Encoding of the DMB
> > > > > > and DSB
> > > > > <option> parameter  in
> > > > > > https://developer.arm.com/docs/ddi0487/latest/arm-architecture-
> > > > > reference-manual-armv8-for-armv8-a-architecture-profile
> > > > >
> > > > > I see. But you have to change the rte_smp_rmb() function definition
> > > > > in lib/librte_eal/common/include/generic/rte_atomic.h and assure
> > > > > that all other architectures follows same rules.
> > > > > Otherwise, this change is logically wrong, because read barrier in
> > > > > current definition could not be used to order Load with Store.
> > > > >
> > > >
> > > > Good points, let me re-think how to handle for other architectures.
> > > > Full MB is required for other architectures(x86? Ppc?), but for arm, read
> > > barrier(load/store and load/load) is enough.
> > >
> > > For x86, I don't think you need any barrier here, as with IA memory mode:
> > > -  Reads are not reordered with other reads.
> > > - Writes are not reordered with older reads.
> > Agree
> 
> I understand herein no instruction level barriers are required for IA, but how about the
> compiler barrier: rte_compiler_barrier?
> 
> >
> > >
> > > BTW, could you explain a bit more why barrier is necessary even on arm
> > here?
> > > As I can see, there is a data dependency between the tail value and
> > > subsequent address calculations for ring writes/reads.
> > > Isn't that sufficient to prevent re-ordering even for weak memory model?
> > The tail value affects 'n'. But, the value of 'n' can be speculated because of
> > the following 'if' statement:
> >
> > if (unlikely(n > *free_entries))
> >                         n = (behavior == RTE_RING_QUEUE_FIXED) ? 0 : *free_entries;
> >
> > The address calculations for actual ring writes/reads do not depend on the
> > tail value. 

Ok, agree I formulated it wrongly, only limit value is dependent on cons.tail. 
Address is not. 

>Since 'n' can be speculated, the writes/reads can be moved up
> > before the load of the tail value.

For my curiosity: ok, I understand that 'n' value can be speculated,
and speculative stores could start before n is calculated properly...
But are you saying that such speculative store results might be visible to the 
other observers (different cpu)? 

> 
> Good explanation. The address calculations does not depend on tail/n, only the
> limit/last one depends on it, while it can be speculated.
> 
> > > Konstantin
> > >
> > >
> > <snip>

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [dpdk-dev] [PATCH v2] ring: enforce reading the tails before ring operations
  2019-03-08 15:50                 ` Ananyev, Konstantin
@ 2019-03-08 23:18                   ` Thomas Monjalon
  2019-03-08 23:48                     ` Honnappa Nagarahalli
  0 siblings, 1 reply; 29+ messages in thread
From: Thomas Monjalon @ 2019-03-08 23:18 UTC (permalink / raw)
  To: Ananyev, Konstantin, Gavin Hu (Arm Technology China)
  Cc: Ilya Maximets, dev, nd, jerinj, hemant.agrawal,
	Nipun.gupta@nxp.com, Honnappa Nagarahalli, olivier.matz,
	Richardson, Bruce, chaozhu

08/03/2019 16:50, Ananyev, Konstantin:
> 08/03/2019 16:05, Gavin Hu (Arm Technology China):
> > Anyway, on x86, smp_rmb, as a compiler barrier, applies to load/store, not only load/load.
> 
> Yes, that's true, but I think that's happened by coincidence, 
> not intentionally.
> 
> > This is the case also for arm, arm64, ppc32, ppc64.
> > I will submit a patch to expand the definition of this API.
> 
> I understand your intention, but that does mean we would also need
> to change not only rte_smp_rmb() but rte_rmb() too (to keep things consistent)?
> That sounds worring.  
> Might be better to keep smp_rmb() definition as it is, and introduce new function
> that fits your purposes (smp_rwmb or smp_load_store_barrier)?

How is it managed in other projects?

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [dpdk-dev] [PATCH v2] ring: enforce reading the tails before ring operations
  2019-03-08 23:18                   ` Thomas Monjalon
@ 2019-03-08 23:48                     ` Honnappa Nagarahalli
  2019-03-09 10:28                       ` Gavin Hu (Arm Technology China)
  0 siblings, 1 reply; 29+ messages in thread
From: Honnappa Nagarahalli @ 2019-03-08 23:48 UTC (permalink / raw)
  To: thomas, Ananyev, Konstantin, Gavin Hu (Arm Technology China)
  Cc: Ilya Maximets, dev, nd, jerinj, hemant.agrawal,
	Nipun.gupta@nxp.com, olivier.matz, Richardson, Bruce, chaozhu,
	nd

> 08/03/2019 16:50, Ananyev, Konstantin:
> > 08/03/2019 16:05, Gavin Hu (Arm Technology China):
> > > Anyway, on x86, smp_rmb, as a compiler barrier, applies to load/store, not
> only load/load.
> >
> > Yes, that's true, but I think that's happened by coincidence, not
> > intentionally.
> >
> > > This is the case also for arm, arm64, ppc32, ppc64.
> > > I will submit a patch to expand the definition of this API.
> >
> > I understand your intention, but that does mean we would also need to
> > change not only rte_smp_rmb() but rte_rmb() too (to keep things consistent)?
> > That sounds worring.
> > Might be better to keep smp_rmb() definition as it is, and introduce
> > new function that fits your purposes (smp_rwmb or smp_load_store_barrier)?
Looking at rte_rmb, rte_io_rmb, rte_cio_rmb implementations for Arm, they all provide load/store barrier as well. If other architectures also provide load/store barrier with rte_xxx_rmb, then we could extend the meaning of the existing APIs.

Even if a new API is provided, we need to do provide the same APIs for IO and CIO variants.

> 
> How is it managed in other projects?
In my experience, I usually have been changing the algorithms to use C11 memory model. So, I have not come across this issue yet. Others can comment.

> 

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [dpdk-dev] [PATCH v2] ring: enforce reading the tails before ring operations
  2019-03-08 23:48                     ` Honnappa Nagarahalli
@ 2019-03-09 10:28                       ` Gavin Hu (Arm Technology China)
  0 siblings, 0 replies; 29+ messages in thread
From: Gavin Hu (Arm Technology China) @ 2019-03-09 10:28 UTC (permalink / raw)
  To: Honnappa Nagarahalli, thomas, Ananyev, Konstantin
  Cc: Ilya Maximets, dev, nd, jerinj, hemant.agrawal,
	Nipun.gupta@nxp.com, olivier.matz, Richardson, Bruce, chaozhu,
	nd


> -----Original Message-----
> From: Honnappa Nagarahalli <Honnappa.Nagarahalli@arm.com>
> Sent: Saturday, March 9, 2019 7:48 AM
> To: thomas@monjalon.net; Ananyev, Konstantin
> <konstantin.ananyev@intel.com>; Gavin Hu (Arm Technology China)
> <Gavin.Hu@arm.com>
> Cc: Ilya Maximets <i.maximets@samsung.com>; dev@dpdk.org; nd
> <nd@arm.com>; jerinj@marvell.com; hemant.agrawal@nxp.com;
> Nipun.gupta@nxp.com; olivier.matz@6wind.com; Richardson, Bruce
> <bruce.richardson@intel.com>; chaozhu@linux.vnet.ibm.com; nd
> <nd@arm.com>
> Subject: RE: [PATCH v2] ring: enforce reading the tails before ring
> operations
> 
> > 08/03/2019 16:50, Ananyev, Konstantin:
> > > 08/03/2019 16:05, Gavin Hu (Arm Technology China):
> > > > Anyway, on x86, smp_rmb, as a compiler barrier, applies to
> load/store, not
> > only load/load.
> > >
> > > Yes, that's true, but I think that's happened by coincidence, not
> > > intentionally.
> > >
> > > > This is the case also for arm, arm64, ppc32, ppc64.
> > > > I will submit a patch to expand the definition of this API.
> > >
> > > I understand your intention, but that does mean we would also need
> to
> > > change not only rte_smp_rmb() but rte_rmb() too (to keep things
> consistent)?
> > > That sounds worring.
> > > Might be better to keep smp_rmb() definition as it is, and introduce
> > > new function that fits your purposes (smp_rwmb or
> smp_load_store_barrier)?
> Looking at rte_rmb, rte_io_rmb, rte_cio_rmb implementations for Arm,
> they all provide load/store barrier as well. If other architectures also
> provide load/store barrier with rte_xxx_rmb, then we could extend the
> meaning of the existing APIs.
Further looking at rte_rmb, rte_io_rmb, rte_cio_rmb implementations for PPC64 and x86,
They also provide load/store barrier. It is safe to extend the meaning of the existing rte_XXX_rmb API.
> 
> Even if a new API is provided, we need to do provide the same APIs for IO
> and CIO variants.
Since rte_XXX_rmbs API for all architectures already provide the desired load/store ordering,
a new API is redundant and not needed. 

> >
> > How is it managed in other projects?
> In my experience, I usually have been changing the algorithms to use C11
> memory model. So, I have not come across this issue yet. Others can
> comment.
> 
> >

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [dpdk-dev] [PATCH v2] ring: enforce reading the tails before ring operations
  2019-03-08 16:33                 ` Ananyev, Konstantin
@ 2019-03-10 20:47                   ` Honnappa Nagarahalli
  2019-03-11 13:58                     ` Ananyev, Konstantin
  0 siblings, 1 reply; 29+ messages in thread
From: Honnappa Nagarahalli @ 2019-03-10 20:47 UTC (permalink / raw)
  To: Ananyev, Konstantin, Gavin Hu (Arm Technology China), Ilya Maximets, dev
  Cc: nd, thomas, jerinj, hemant.agrawal, Nipun.gupta@nxp.com,
	olivier.matz, Richardson, Bruce, chaozhu, nd

> > > > > > >>> In weak memory models, like arm64, reading the
> > > > > > >>> {prod,cons}.tail may get reordered after reading or
> > > > > > >>> writing the ring slots, which corrupts the ring and stale data is
> observed.
> > > > > > >>>
> > > > > > >>> This issue was reported by NXP on 8-A72 DPAA2 board. The
> > > problem
> > > > > > >>> is
> > > > > > >> most
> > > > > > >>> likely caused by missing the acquire semantics when
> > > > > > >>> reading cons.tail (in SP enqueue) or prod.tail (in SC
> > > > > > >>> dequeue) which makes it possible to
> > > > > > read
> > > > > > >> a
> > > > > > >>> stale value from the ring slots.
> > > > > > >>>
> > > > > > >>> For MP (and MC) case, rte_atomic32_cmpset() already
> > > > > > >>> provides
> > > the
> > > > > > >> required
> > > > > > >>> ordering. This patch is to prevent reading and writing the
> > > > > > >>> ring slots get reordered before reading {prod,cons}.tail
> > > > > > >>> for SP (and SC)
> > > > case.
> > > > > > >>
> > > > > > >> Read barrier rte_smp_rmb() is OK to prevent reading the
> > > > > > >> ring get reordered before reading the tail. However, to
> > > > > > >> prevent *writing* the ring get reordered *before reading*
> > > > > > >> the tail you need a full memory barrier, i.e.
> > > > > > >> rte_smp_mb().
> > > > > > >
> > > > > > > ISHLD(rte_smp_rmb is DMB(ishld) orders LD/LD and LD/ST,
> > > > > > > while WMB(ST
> > > > > > Option) orders ST/ST.
> > > > > > > For more details, please refer to: Table B2-1 Encoding of
> > > > > > > the DMB and DSB
> > > > > > <option> parameter  in
> > > > > > > https://developer.arm.com/docs/ddi0487/latest/arm-architectu
> > > > > > > re-
> > > > > > reference-manual-armv8-for-armv8-a-architecture-profile
> > > > > >
> > > > > > I see. But you have to change the rte_smp_rmb() function
> > > > > > definition in
> > > > > > lib/librte_eal/common/include/generic/rte_atomic.h and assure that
> all other architectures follows same rules.
> > > > > > Otherwise, this change is logically wrong, because read
> > > > > > barrier in current definition could not be used to order Load with
> Store.
> > > > > >
> > > > >
> > > > > Good points, let me re-think how to handle for other architectures.
> > > > > Full MB is required for other architectures(x86? Ppc?), but for
> > > > > arm, read
> > > > barrier(load/store and load/load) is enough.
> > > >
> > > > For x86, I don't think you need any barrier here, as with IA memory
> mode:
> > > > -  Reads are not reordered with other reads.
> > > > - Writes are not reordered with older reads.
> > > Agree
> >
> > I understand herein no instruction level barriers are required for IA,
> > but how about the compiler barrier: rte_compiler_barrier?
> >
> > >
> > > >
> > > > BTW, could you explain a bit more why barrier is necessary even on
> > > > arm
> > > here?
> > > > As I can see, there is a data dependency between the tail value
> > > > and subsequent address calculations for ring writes/reads.
> > > > Isn't that sufficient to prevent re-ordering even for weak memory model?
> > > The tail value affects 'n'. But, the value of 'n' can be speculated
> > > because of the following 'if' statement:
> > >
> > > if (unlikely(n > *free_entries))
> > >                         n = (behavior == RTE_RING_QUEUE_FIXED) ? 0 :
> > > *free_entries;
> > >
> > > The address calculations for actual ring writes/reads do not depend
> > > on the tail value.
> 
> Ok, agree I formulated it wrongly, only limit value is dependent on cons.tail.
> Address is not.
> 
> >Since 'n' can be speculated, the writes/reads can be moved up
> > > before the load of the tail value.
> 
> For my curiosity: ok, I understand that 'n' value can be speculated, and
> speculative stores could start before n is calculated properly...
> But are you saying that such speculative store results might be visible to the
> other observers (different cpu)?
> 
You are correct. The speculative stores will NOT be visible to other observers till the value of 'n' is fixed. Speculative stores might have to be discarded depending on the value of 'n' (which will affect cache performance).
There is also a control dependency between the load of cons.tail and the stores to the ring. That should also keep the load and stores from getting reordered (though I am not sure if it still allows for speculative stores).
So, IMO, the barrier in enqueue is not needed. Is this what you wanted to drive at?

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [dpdk-dev] [PATCH v2] ring: enforce reading the tails before ring operations
  2019-03-10 20:47                   ` Honnappa Nagarahalli
@ 2019-03-11 13:58                     ` Ananyev, Konstantin
  0 siblings, 0 replies; 29+ messages in thread
From: Ananyev, Konstantin @ 2019-03-11 13:58 UTC (permalink / raw)
  To: Honnappa Nagarahalli, Gavin Hu (Arm Technology China),
	Ilya Maximets, dev
  Cc: nd, thomas, jerinj, hemant.agrawal, Nipun.gupta@nxp.com,
	olivier.matz, Richardson, Bruce, chaozhu, nd

> > > > > > > >>> In weak memory models, like arm64, reading the
> > > > > > > >>> {prod,cons}.tail may get reordered after reading or
> > > > > > > >>> writing the ring slots, which corrupts the ring and stale data is
> > observed.
> > > > > > > >>>
> > > > > > > >>> This issue was reported by NXP on 8-A72 DPAA2 board. The
> > > > problem
> > > > > > > >>> is
> > > > > > > >> most
> > > > > > > >>> likely caused by missing the acquire semantics when
> > > > > > > >>> reading cons.tail (in SP enqueue) or prod.tail (in SC
> > > > > > > >>> dequeue) which makes it possible to
> > > > > > > read
> > > > > > > >> a
> > > > > > > >>> stale value from the ring slots.
> > > > > > > >>>
> > > > > > > >>> For MP (and MC) case, rte_atomic32_cmpset() already
> > > > > > > >>> provides
> > > > the
> > > > > > > >> required
> > > > > > > >>> ordering. This patch is to prevent reading and writing the
> > > > > > > >>> ring slots get reordered before reading {prod,cons}.tail
> > > > > > > >>> for SP (and SC)
> > > > > case.
> > > > > > > >>
> > > > > > > >> Read barrier rte_smp_rmb() is OK to prevent reading the
> > > > > > > >> ring get reordered before reading the tail. However, to
> > > > > > > >> prevent *writing* the ring get reordered *before reading*
> > > > > > > >> the tail you need a full memory barrier, i.e.
> > > > > > > >> rte_smp_mb().
> > > > > > > >
> > > > > > > > ISHLD(rte_smp_rmb is DMB(ishld) orders LD/LD and LD/ST,
> > > > > > > > while WMB(ST
> > > > > > > Option) orders ST/ST.
> > > > > > > > For more details, please refer to: Table B2-1 Encoding of
> > > > > > > > the DMB and DSB
> > > > > > > <option> parameter  in
> > > > > > > > https://developer.arm.com/docs/ddi0487/latest/arm-architectu
> > > > > > > > re-
> > > > > > > reference-manual-armv8-for-armv8-a-architecture-profile
> > > > > > >
> > > > > > > I see. But you have to change the rte_smp_rmb() function
> > > > > > > definition in
> > > > > > > lib/librte_eal/common/include/generic/rte_atomic.h and assure that
> > all other architectures follows same rules.
> > > > > > > Otherwise, this change is logically wrong, because read
> > > > > > > barrier in current definition could not be used to order Load with
> > Store.
> > > > > > >
> > > > > >
> > > > > > Good points, let me re-think how to handle for other architectures.
> > > > > > Full MB is required for other architectures(x86? Ppc?), but for
> > > > > > arm, read
> > > > > barrier(load/store and load/load) is enough.
> > > > >
> > > > > For x86, I don't think you need any barrier here, as with IA memory
> > mode:
> > > > > -  Reads are not reordered with other reads.
> > > > > - Writes are not reordered with older reads.
> > > > Agree
> > >
> > > I understand herein no instruction level barriers are required for IA,
> > > but how about the compiler barrier: rte_compiler_barrier?
> > >
> > > >
> > > > >
> > > > > BTW, could you explain a bit more why barrier is necessary even on
> > > > > arm
> > > > here?
> > > > > As I can see, there is a data dependency between the tail value
> > > > > and subsequent address calculations for ring writes/reads.
> > > > > Isn't that sufficient to prevent re-ordering even for weak memory model?
> > > > The tail value affects 'n'. But, the value of 'n' can be speculated
> > > > because of the following 'if' statement:
> > > >
> > > > if (unlikely(n > *free_entries))
> > > >                         n = (behavior == RTE_RING_QUEUE_FIXED) ? 0 :
> > > > *free_entries;
> > > >
> > > > The address calculations for actual ring writes/reads do not depend
> > > > on the tail value.
> >
> > Ok, agree I formulated it wrongly, only limit value is dependent on cons.tail.
> > Address is not.
> >
> > >Since 'n' can be speculated, the writes/reads can be moved up
> > > > before the load of the tail value.
> >
> > For my curiosity: ok, I understand that 'n' value can be speculated, and
> > speculative stores could start before n is calculated properly...
> > But are you saying that such speculative store results might be visible to the
> > other observers (different cpu)?
> >
> You are correct. The speculative stores will NOT be visible to other observers till the value of 'n' is fixed. Speculative stores might have to be
> discarded depending on the value of 'n' (which will affect cache performance).
> There is also a control dependency between the load of cons.tail and the stores to the ring. That should also keep the load and stores from
> getting reordered (though I am not sure if it still allows for speculative stores).
> So, IMO, the barrier in enqueue is not needed. Is this what you wanted to drive at?

Yes, that was my thought.
Thanks
Konstantin


^ permalink raw reply	[flat|nested] 29+ messages in thread

* [dpdk-dev] [PATCH v3 0/1] ring: enforce reading the tail before reading ring slots
  2019-03-06  3:07 [dpdk-dev] [PATCH v1] ring: enforce reading the tails before ring operations gavin hu
       [not found] ` <CGME20190306114906eucas1p19c2572b1fe777e1eb0ca96d2e47295bd@eucas1p1.samsung.com>
  2019-03-07  6:45 ` [dpdk-dev] [PATCH v2] " gavin hu
@ 2019-03-12 16:58 ` Gavin Hu
  2019-03-12 16:58 ` [dpdk-dev] [PATCH v3 1/1] " Gavin Hu
  3 siblings, 0 replies; 29+ messages in thread
From: Gavin Hu @ 2019-03-12 16:58 UTC (permalink / raw)
  To: dev
  Cc: nd, thomas, konstantin.ananyev, jerinj, hemant.agrawal,
	nipun.gupta, Honnappa.Nagarahalli, gavin.hu, i.maximets, chaozhu

Change log:
  -V1: First version of the patch.

  -V2: Update the commit message to describe the ordering of detailed memory
accesses.

  -V3: As suggested by Konstantin Ananyev, remove the rte_smp_rmb for enqueue
case, as there is a control dependency in place, making rmb unnecessary. For
dequeue case, it is still required as the control dependency does not
order loads with loads.

gavin hu (1):
  ring: enforce reading the tail before reading ring slots

 lib/librte_ring/rte_ring_generic.h | 9 ++++++---
 1 file changed, 6 insertions(+), 3 deletions(-)

-- 
2.7.4

^ permalink raw reply	[flat|nested] 29+ messages in thread

* [dpdk-dev] [PATCH v3 1/1] ring: enforce reading the tail before reading ring slots
  2019-03-06  3:07 [dpdk-dev] [PATCH v1] ring: enforce reading the tails before ring operations gavin hu
                   ` (2 preceding siblings ...)
  2019-03-12 16:58 ` [dpdk-dev] [PATCH v3 0/1] ring: enforce reading the tail before reading ring slots Gavin Hu
@ 2019-03-12 16:58 ` Gavin Hu
  2019-03-13  8:12   ` Nipun Gupta
  2019-03-15 13:26   ` Ananyev, Konstantin
  3 siblings, 2 replies; 29+ messages in thread
From: Gavin Hu @ 2019-03-12 16:58 UTC (permalink / raw)
  To: dev
  Cc: nd, gavin hu, thomas, konstantin.ananyev, jerinj, hemant.agrawal,
	nipun.gupta, Honnappa.Nagarahalli, i.maximets, chaozhu, stable

From: gavin hu <gavin.hu@arm.com>

In weak memory models, like arm64, reading the prod.tail may get
reordered after reading the ring slots, which corrupts the ring and
stale data is observed.

This issue was reported by NXP on 8-A72 DPAA2 board. The problem is most
likely caused by missing the acquire semantics when reading
prod.tail (in SC dequeue) which makes it possible to read a
stale value from the ring slots.

For MP (and MC) case, rte_atomic32_cmpset() already provides the required
ordering. For SP case, the control depependency between if-statement(which
depends on the read of r->cons.tail) and the later stores to the ring slots
make RMB unnecessary. About the control dependency, read more at:
https://www.cl.cam.ac.uk/~pes20/ppc-supplemental/test7.pdf

This patch is adding the required read barrier to prevent reading the ring
slots get reordered before reading prod.tail for SC case.

Fixes: c9fb3c62896f ("ring: move code in a new header file")
Cc: stable@dpdk.org

Signed-off-by: gavin hu <gavin.hu@arm.com>
Reviewed-by: Ola Liljedahl <Ola.Liljedahl@arm.com>
Tested-by: Nipun Gupta <nipun.gupta@nxp.com>
---
 lib/librte_ring/rte_ring_generic.h | 9 ++++++---
 1 file changed, 6 insertions(+), 3 deletions(-)

diff --git a/lib/librte_ring/rte_ring_generic.h b/lib/librte_ring/rte_ring_generic.h
index ea7dbe5..953cdbb 100644
--- a/lib/librte_ring/rte_ring_generic.h
+++ b/lib/librte_ring/rte_ring_generic.h
@@ -158,11 +158,14 @@ __rte_ring_move_cons_head(struct rte_ring *r, unsigned int is_sc,
 			return 0;
 
 		*new_head = *old_head + n;
-		if (is_sc)
-			r->cons.head = *new_head, success = 1;
-		else
+		if (is_sc) {
+			r->cons.head = *new_head;
+			rte_smp_rmb();
+			success = 1;
+		} else {
 			success = rte_atomic32_cmpset(&r->cons.head, *old_head,
 					*new_head);
+		}
 	} while (unlikely(success == 0));
 	return n;
 }
-- 
2.7.4

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [dpdk-dev] [PATCH v3 1/1] ring: enforce reading the tail before reading ring slots
  2019-03-12 16:58 ` [dpdk-dev] [PATCH v3 1/1] " Gavin Hu
@ 2019-03-13  8:12   ` Nipun Gupta
  2019-03-15 13:26   ` Ananyev, Konstantin
  1 sibling, 0 replies; 29+ messages in thread
From: Nipun Gupta @ 2019-03-13  8:12 UTC (permalink / raw)
  To: Gavin Hu, dev
  Cc: nd, thomas, konstantin.ananyev, jerinj, Hemant Agrawal,
	Honnappa.Nagarahalli, i.maximets, chaozhu, stable



> -----Original Message-----
> From: Gavin Hu [mailto:gavin.hu@arm.com]
> Sent: Tuesday, March 12, 2019 10:29 PM
> To: dev@dpdk.org
> Cc: nd@arm.com; gavin hu <gavin.hu@arm.com>; thomas@monjalon.net;
> konstantin.ananyev@intel.com; jerinj@marvell.com; Hemant Agrawal
> <hemant.agrawal@nxp.com>; Nipun Gupta <nipun.gupta@nxp.com>;
> Honnappa.Nagarahalli@arm.com; i.maximets@samsung.com;
> chaozhu@linux.vnet.ibm.com; stable@dpdk.org
> Subject: [PATCH v3 1/1] ring: enforce reading the tail before reading ring slots
> 
> From: gavin hu <gavin.hu@arm.com>
> 
> In weak memory models, like arm64, reading the prod.tail may get
> reordered after reading the ring slots, which corrupts the ring and
> stale data is observed.
> 
> This issue was reported by NXP on 8-A72 DPAA2 board. The problem is most
> likely caused by missing the acquire semantics when reading
> prod.tail (in SC dequeue) which makes it possible to read a
> stale value from the ring slots.
> 
> For MP (and MC) case, rte_atomic32_cmpset() already provides the required
> ordering. For SP case, the control depependency between if-
> statement(which
> depends on the read of r->cons.tail) and the later stores to the ring slots
> make RMB unnecessary. About the control dependency, read more at:
> https://emea01.safelinks.protection.outlook.com/?url=https:%2F%2Fwww.c
> l.cam.ac.uk%2F~pes20%2Fppc-
> supplemental%2Ftest7.pdf&amp;data=02%7C01%7Cnipun.gupta%40nxp.co
> m%7Cc3df030ec49449bbaf0508d6a70c0b9f%7C686ea1d3bc2b4c6fa92cd99c5c
> 301635%7C0%7C0%7C636880067514602937&amp;sdata=ogPdd%2F4UGYWE8
> nAwY1lkXPCB9MpUIFY0VOQr2N1lwe4%3D&amp;reserved=0
> 
> This patch is adding the required read barrier to prevent reading the ring
> slots get reordered before reading prod.tail for SC case.
> 
> Fixes: c9fb3c62896f ("ring: move code in a new header file")
> Cc: stable@dpdk.org
> 
> Signed-off-by: gavin hu <gavin.hu@arm.com>
> Reviewed-by: Ola Liljedahl <Ola.Liljedahl@arm.com>
> Tested-by: Nipun Gupta <nipun.gupta@nxp.com>

Acked-by: Nipun Gupta <nipun.gupta@nxp.com>

Also, I have revalidated this updated patch.

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [dpdk-dev] [PATCH v3 1/1] ring: enforce reading the tail before reading ring slots
  2019-03-12 16:58 ` [dpdk-dev] [PATCH v3 1/1] " Gavin Hu
  2019-03-13  8:12   ` Nipun Gupta
@ 2019-03-15 13:26   ` Ananyev, Konstantin
  2019-03-15 13:26     ` Ananyev, Konstantin
  2019-03-28  0:21     ` Thomas Monjalon
  1 sibling, 2 replies; 29+ messages in thread
From: Ananyev, Konstantin @ 2019-03-15 13:26 UTC (permalink / raw)
  To: Gavin Hu, dev
  Cc: nd, thomas, jerinj, hemant.agrawal, nipun.gupta,
	Honnappa.Nagarahalli, i.maximets, chaozhu, stable



> -----Original Message-----
> From: Gavin Hu [mailto:gavin.hu@arm.com]
> Sent: Tuesday, March 12, 2019 4:59 PM
> To: dev@dpdk.org
> Cc: nd@arm.com; gavin hu <gavin.hu@arm.com>; thomas@monjalon.net; Ananyev, Konstantin <konstantin.ananyev@intel.com>;
> jerinj@marvell.com; hemant.agrawal@nxp.com; nipun.gupta@nxp.com; Honnappa.Nagarahalli@arm.com; i.maximets@samsung.com;
> chaozhu@linux.vnet.ibm.com; stable@dpdk.org
> Subject: [PATCH v3 1/1] ring: enforce reading the tail before reading ring slots
> 
> From: gavin hu <gavin.hu@arm.com>
> 
> In weak memory models, like arm64, reading the prod.tail may get
> reordered after reading the ring slots, which corrupts the ring and
> stale data is observed.
> 
> This issue was reported by NXP on 8-A72 DPAA2 board. The problem is most
> likely caused by missing the acquire semantics when reading
> prod.tail (in SC dequeue) which makes it possible to read a
> stale value from the ring slots.
> 
> For MP (and MC) case, rte_atomic32_cmpset() already provides the required
> ordering. For SP case, the control depependency between if-statement(which
> depends on the read of r->cons.tail) and the later stores to the ring slots
> make RMB unnecessary. About the control dependency, read more at:
> https://www.cl.cam.ac.uk/~pes20/ppc-supplemental/test7.pdf
> 
> This patch is adding the required read barrier to prevent reading the ring
> slots get reordered before reading prod.tail for SC case.
> 
> Fixes: c9fb3c62896f ("ring: move code in a new header file")
> Cc: stable@dpdk.org
> 
> Signed-off-by: gavin hu <gavin.hu@arm.com>
> Reviewed-by: Ola Liljedahl <Ola.Liljedahl@arm.com>
> Tested-by: Nipun Gupta <nipun.gupta@nxp.com>
> ---
>  lib/librte_ring/rte_ring_generic.h | 9 ++++++---
>  1 file changed, 6 insertions(+), 3 deletions(-)
> 
> diff --git a/lib/librte_ring/rte_ring_generic.h b/lib/librte_ring/rte_ring_generic.h
> index ea7dbe5..953cdbb 100644
> --- a/lib/librte_ring/rte_ring_generic.h
> +++ b/lib/librte_ring/rte_ring_generic.h
> @@ -158,11 +158,14 @@ __rte_ring_move_cons_head(struct rte_ring *r, unsigned int is_sc,
>  			return 0;
> 
>  		*new_head = *old_head + n;
> -		if (is_sc)
> -			r->cons.head = *new_head, success = 1;
> -		else
> +		if (is_sc) {
> +			r->cons.head = *new_head;
> +			rte_smp_rmb();
> +			success = 1;
> +		} else {
>  			success = rte_atomic32_cmpset(&r->cons.head, *old_head,
>  					*new_head);
> +		}
>  	} while (unlikely(success == 0));
>  	return n;
>  }
> --

Acked-by: Konstantin Ananyev <konstantin.ananyev@intel.com>

> 2.7.4

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [dpdk-dev] [PATCH v3 1/1] ring: enforce reading the tail before reading ring slots
  2019-03-15 13:26   ` Ananyev, Konstantin
@ 2019-03-15 13:26     ` Ananyev, Konstantin
  2019-03-28  0:21     ` Thomas Monjalon
  1 sibling, 0 replies; 29+ messages in thread
From: Ananyev, Konstantin @ 2019-03-15 13:26 UTC (permalink / raw)
  To: Gavin Hu, dev
  Cc: nd, thomas, jerinj, hemant.agrawal, nipun.gupta,
	Honnappa.Nagarahalli, i.maximets, chaozhu, stable



> -----Original Message-----
> From: Gavin Hu [mailto:gavin.hu@arm.com]
> Sent: Tuesday, March 12, 2019 4:59 PM
> To: dev@dpdk.org
> Cc: nd@arm.com; gavin hu <gavin.hu@arm.com>; thomas@monjalon.net; Ananyev, Konstantin <konstantin.ananyev@intel.com>;
> jerinj@marvell.com; hemant.agrawal@nxp.com; nipun.gupta@nxp.com; Honnappa.Nagarahalli@arm.com; i.maximets@samsung.com;
> chaozhu@linux.vnet.ibm.com; stable@dpdk.org
> Subject: [PATCH v3 1/1] ring: enforce reading the tail before reading ring slots
> 
> From: gavin hu <gavin.hu@arm.com>
> 
> In weak memory models, like arm64, reading the prod.tail may get
> reordered after reading the ring slots, which corrupts the ring and
> stale data is observed.
> 
> This issue was reported by NXP on 8-A72 DPAA2 board. The problem is most
> likely caused by missing the acquire semantics when reading
> prod.tail (in SC dequeue) which makes it possible to read a
> stale value from the ring slots.
> 
> For MP (and MC) case, rte_atomic32_cmpset() already provides the required
> ordering. For SP case, the control depependency between if-statement(which
> depends on the read of r->cons.tail) and the later stores to the ring slots
> make RMB unnecessary. About the control dependency, read more at:
> https://www.cl.cam.ac.uk/~pes20/ppc-supplemental/test7.pdf
> 
> This patch is adding the required read barrier to prevent reading the ring
> slots get reordered before reading prod.tail for SC case.
> 
> Fixes: c9fb3c62896f ("ring: move code in a new header file")
> Cc: stable@dpdk.org
> 
> Signed-off-by: gavin hu <gavin.hu@arm.com>
> Reviewed-by: Ola Liljedahl <Ola.Liljedahl@arm.com>
> Tested-by: Nipun Gupta <nipun.gupta@nxp.com>
> ---
>  lib/librte_ring/rte_ring_generic.h | 9 ++++++---
>  1 file changed, 6 insertions(+), 3 deletions(-)
> 
> diff --git a/lib/librte_ring/rte_ring_generic.h b/lib/librte_ring/rte_ring_generic.h
> index ea7dbe5..953cdbb 100644
> --- a/lib/librte_ring/rte_ring_generic.h
> +++ b/lib/librte_ring/rte_ring_generic.h
> @@ -158,11 +158,14 @@ __rte_ring_move_cons_head(struct rte_ring *r, unsigned int is_sc,
>  			return 0;
> 
>  		*new_head = *old_head + n;
> -		if (is_sc)
> -			r->cons.head = *new_head, success = 1;
> -		else
> +		if (is_sc) {
> +			r->cons.head = *new_head;
> +			rte_smp_rmb();
> +			success = 1;
> +		} else {
>  			success = rte_atomic32_cmpset(&r->cons.head, *old_head,
>  					*new_head);
> +		}
>  	} while (unlikely(success == 0));
>  	return n;
>  }
> --

Acked-by: Konstantin Ananyev <konstantin.ananyev@intel.com>

> 2.7.4


^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [dpdk-dev] [PATCH v3 1/1] ring: enforce reading the tail before reading ring slots
  2019-03-15 13:26   ` Ananyev, Konstantin
  2019-03-15 13:26     ` Ananyev, Konstantin
@ 2019-03-28  0:21     ` Thomas Monjalon
  2019-03-28  0:21       ` Thomas Monjalon
  1 sibling, 1 reply; 29+ messages in thread
From: Thomas Monjalon @ 2019-03-28  0:21 UTC (permalink / raw)
  To: Gavin Hu
  Cc: dev, Ananyev, Konstantin, nd, jerinj, hemant.agrawal,
	nipun.gupta, Honnappa.Nagarahalli, i.maximets, chaozhu, stable

> > In weak memory models, like arm64, reading the prod.tail may get
> > reordered after reading the ring slots, which corrupts the ring and
> > stale data is observed.
> > 
> > This issue was reported by NXP on 8-A72 DPAA2 board. The problem is most
> > likely caused by missing the acquire semantics when reading
> > prod.tail (in SC dequeue) which makes it possible to read a
> > stale value from the ring slots.
> > 
> > For MP (and MC) case, rte_atomic32_cmpset() already provides the required
> > ordering. For SP case, the control depependency between if-statement(which
> > depends on the read of r->cons.tail) and the later stores to the ring slots
> > make RMB unnecessary. About the control dependency, read more at:
> > https://www.cl.cam.ac.uk/~pes20/ppc-supplemental/test7.pdf
> > 
> > This patch is adding the required read barrier to prevent reading the ring
> > slots get reordered before reading prod.tail for SC case.
> > 
> > Fixes: c9fb3c62896f ("ring: move code in a new header file")
> > Cc: stable@dpdk.org
> > 
> > Signed-off-by: gavin hu <gavin.hu@arm.com>
> > Reviewed-by: Ola Liljedahl <Ola.Liljedahl@arm.com>
> > Tested-by: Nipun Gupta <nipun.gupta@nxp.com>
> 
> Acked-by: Konstantin Ananyev <konstantin.ananyev@intel.com>

Applied, thanks

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [dpdk-dev] [PATCH v3 1/1] ring: enforce reading the tail before reading ring slots
  2019-03-28  0:21     ` Thomas Monjalon
@ 2019-03-28  0:21       ` Thomas Monjalon
  0 siblings, 0 replies; 29+ messages in thread
From: Thomas Monjalon @ 2019-03-28  0:21 UTC (permalink / raw)
  To: Gavin Hu
  Cc: dev, Ananyev, Konstantin, nd, jerinj, hemant.agrawal,
	nipun.gupta, Honnappa.Nagarahalli, i.maximets, chaozhu, stable

> > In weak memory models, like arm64, reading the prod.tail may get
> > reordered after reading the ring slots, which corrupts the ring and
> > stale data is observed.
> > 
> > This issue was reported by NXP on 8-A72 DPAA2 board. The problem is most
> > likely caused by missing the acquire semantics when reading
> > prod.tail (in SC dequeue) which makes it possible to read a
> > stale value from the ring slots.
> > 
> > For MP (and MC) case, rte_atomic32_cmpset() already provides the required
> > ordering. For SP case, the control depependency between if-statement(which
> > depends on the read of r->cons.tail) and the later stores to the ring slots
> > make RMB unnecessary. About the control dependency, read more at:
> > https://www.cl.cam.ac.uk/~pes20/ppc-supplemental/test7.pdf
> > 
> > This patch is adding the required read barrier to prevent reading the ring
> > slots get reordered before reading prod.tail for SC case.
> > 
> > Fixes: c9fb3c62896f ("ring: move code in a new header file")
> > Cc: stable@dpdk.org
> > 
> > Signed-off-by: gavin hu <gavin.hu@arm.com>
> > Reviewed-by: Ola Liljedahl <Ola.Liljedahl@arm.com>
> > Tested-by: Nipun Gupta <nipun.gupta@nxp.com>
> 
> Acked-by: Konstantin Ananyev <konstantin.ananyev@intel.com>

Applied, thanks



^ permalink raw reply	[flat|nested] 29+ messages in thread

end of thread, other threads:[~2019-03-28  0:22 UTC | newest]

Thread overview: 29+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2019-03-06  3:07 [dpdk-dev] [PATCH v1] ring: enforce reading the tails before ring operations gavin hu
     [not found] ` <CGME20190306114906eucas1p19c2572b1fe777e1eb0ca96d2e47295bd@eucas1p1.samsung.com>
2019-03-06 11:49   ` [dpdk-dev] [v1] " Ilya Maximets
2019-03-07  6:50     ` Gavin Hu (Arm Technology China)
2019-03-07  6:45 ` [dpdk-dev] [PATCH v2] " gavin hu
2019-03-07  8:52   ` Ilya Maximets
2019-03-07  9:27     ` Gavin Hu (Arm Technology China)
2019-03-07  9:48       ` Ilya Maximets
2019-03-07 10:44         ` Gavin Hu (Arm Technology China)
2019-03-07 11:17           ` Ananyev, Konstantin
2019-03-08  3:21             ` Honnappa Nagarahalli
2019-03-08  5:27               ` Gavin Hu (Arm Technology China)
2019-03-08 16:33                 ` Ananyev, Konstantin
2019-03-10 20:47                   ` Honnappa Nagarahalli
2019-03-11 13:58                     ` Ananyev, Konstantin
2019-03-08  4:23           ` Gavin Hu (Arm Technology China)
2019-03-08  5:06             ` Honnappa Nagarahalli
2019-03-08 12:13             ` Ananyev, Konstantin
2019-03-08 15:05               ` Gavin Hu (Arm Technology China)
2019-03-08 15:50                 ` Ananyev, Konstantin
2019-03-08 23:18                   ` Thomas Monjalon
2019-03-08 23:48                     ` Honnappa Nagarahalli
2019-03-09 10:28                       ` Gavin Hu (Arm Technology China)
2019-03-12 16:58 ` [dpdk-dev] [PATCH v3 0/1] ring: enforce reading the tail before reading ring slots Gavin Hu
2019-03-12 16:58 ` [dpdk-dev] [PATCH v3 1/1] " Gavin Hu
2019-03-13  8:12   ` Nipun Gupta
2019-03-15 13:26   ` Ananyev, Konstantin
2019-03-15 13:26     ` Ananyev, Konstantin
2019-03-28  0:21     ` Thomas Monjalon
2019-03-28  0:21       ` Thomas Monjalon

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).