* [dpdk-dev] [PATCH] eal/ppc: remove fix of memory barrier for IBM POWER
@ 2019-03-18 12:58 Dekel Peled
  2019-03-18 12:58 ` Dekel Peled
                   ` (2 more replies)
  0 siblings, 3 replies; 42+ messages in thread
From: Dekel Peled @ 2019-03-18 12:58 UTC (permalink / raw)
  To: chaozhu; +Cc: yskoh, shahafs, dev, orika, thomas, dekelp, stable
>From previous patch description: "to improve performance on PPC64,
use light weight sync instruction instead of sync instruction."
Excerpt from IBM doc [1], section "Memory barrier instructions":
"The second form of the sync instruction is light-weight sync,
or lwsync.
This form is used to control ordering for storage accesses to system
memory only. It does not create a memory barrier for accesses to
device memory."
This patch removes the use of lwsync, so calls to rte_wmb() and
rte_rmb() will provide correct memory barrier to ensure order of
accesses to system memory and device memory.
[1] https://www.ibm.com/developerworks/systems/articles/powerpc.html
Fixes: d23a6bd04d72 ("eal/ppc: fix memory barrier for IBM POWER")
Cc: stable@dpdk.org
Signed-off-by: Dekel Peled <dekelp@mellanox.com>
---
 lib/librte_eal/common/include/arch/ppc_64/rte_atomic.h | 8 --------
 1 file changed, 8 deletions(-)
diff --git a/lib/librte_eal/common/include/arch/ppc_64/rte_atomic.h b/lib/librte_eal/common/include/arch/ppc_64/rte_atomic.h
index ce38350..797381c 100644
--- a/lib/librte_eal/common/include/arch/ppc_64/rte_atomic.h
+++ b/lib/librte_eal/common/include/arch/ppc_64/rte_atomic.h
@@ -63,11 +63,7 @@
  * Guarantees that the STORE operations generated before the barrier
  * occur before the STORE operations generated after.
  */
-#ifdef RTE_ARCH_64
-#define	rte_wmb() asm volatile("lwsync" : : : "memory")
-#else
 #define	rte_wmb() asm volatile("sync" : : : "memory")
-#endif
 
 /**
  * Read memory barrier.
@@ -75,11 +71,7 @@
  * Guarantees that the LOAD operations generated before the barrier
  * occur before the LOAD operations generated after.
  */
-#ifdef RTE_ARCH_64
-#define	rte_rmb() asm volatile("lwsync" : : : "memory")
-#else
 #define	rte_rmb() asm volatile("sync" : : : "memory")
-#endif
 
 #define rte_smp_mb() rte_mb()
 
-- 
1.8.3.1
^ permalink raw reply	[flat|nested] 42+ messages in thread
* [dpdk-dev] [PATCH] eal/ppc: remove fix of memory barrier for IBM POWER
  2019-03-18 12:58 [dpdk-dev] [PATCH] eal/ppc: remove fix of memory barrier for IBM POWER Dekel Peled
@ 2019-03-18 12:58 ` Dekel Peled
  2019-03-19  3:24 ` Chao Zhu
  2019-03-28 22:50 ` [dpdk-dev] [dpdk-stable] " Thomas Monjalon
  2 siblings, 0 replies; 42+ messages in thread
From: Dekel Peled @ 2019-03-18 12:58 UTC (permalink / raw)
  To: chaozhu; +Cc: yskoh, shahafs, dev, orika, thomas, dekelp, stable
From previous patch description: "to improve performance on PPC64,
use light weight sync instruction instead of sync instruction."
Excerpt from IBM doc [1], section "Memory barrier instructions":
"The second form of the sync instruction is light-weight sync,
or lwsync.
This form is used to control ordering for storage accesses to system
memory only. It does not create a memory barrier for accesses to
device memory."
This patch removes the use of lwsync, so calls to rte_wmb() and
rte_rmb() will provide correct memory barrier to ensure order of
accesses to system memory and device memory.
[1] https://www.ibm.com/developerworks/systems/articles/powerpc.html
Fixes: d23a6bd04d72 ("eal/ppc: fix memory barrier for IBM POWER")
Cc: stable@dpdk.org
Signed-off-by: Dekel Peled <dekelp@mellanox.com>
---
 lib/librte_eal/common/include/arch/ppc_64/rte_atomic.h | 8 --------
 1 file changed, 8 deletions(-)
diff --git a/lib/librte_eal/common/include/arch/ppc_64/rte_atomic.h b/lib/librte_eal/common/include/arch/ppc_64/rte_atomic.h
index ce38350..797381c 100644
--- a/lib/librte_eal/common/include/arch/ppc_64/rte_atomic.h
+++ b/lib/librte_eal/common/include/arch/ppc_64/rte_atomic.h
@@ -63,11 +63,7 @@
  * Guarantees that the STORE operations generated before the barrier
  * occur before the STORE operations generated after.
  */
-#ifdef RTE_ARCH_64
-#define	rte_wmb() asm volatile("lwsync" : : : "memory")
-#else
 #define	rte_wmb() asm volatile("sync" : : : "memory")
-#endif
 
 /**
  * Read memory barrier.
@@ -75,11 +71,7 @@
  * Guarantees that the LOAD operations generated before the barrier
  * occur before the LOAD operations generated after.
  */
-#ifdef RTE_ARCH_64
-#define	rte_rmb() asm volatile("lwsync" : : : "memory")
-#else
 #define	rte_rmb() asm volatile("sync" : : : "memory")
-#endif
 
 #define rte_smp_mb() rte_mb()
 
-- 
1.8.3.1
^ permalink raw reply	[flat|nested] 42+ messages in thread
* Re: [dpdk-dev] [PATCH] eal/ppc: remove fix of memory barrier for IBM POWER
  2019-03-18 12:58 [dpdk-dev] [PATCH] eal/ppc: remove fix of memory barrier for IBM POWER Dekel Peled
  2019-03-18 12:58 ` Dekel Peled
@ 2019-03-19  3:24 ` Chao Zhu
  2019-03-19  3:24   ` Chao Zhu
  2019-03-19 10:05   ` Dekel Peled
  2019-03-28 22:50 ` [dpdk-dev] [dpdk-stable] " Thomas Monjalon
  2 siblings, 2 replies; 42+ messages in thread
From: Chao Zhu @ 2019-03-19  3:24 UTC (permalink / raw)
  To: 'Dekel Peled'; +Cc: yskoh, shahafs, dev, orika, thomas, stable
[-- Warning: decoded text below may be mangled, UTF-8 assumed --]
[-- Attachment #1: Type: text/plain, Size: 2495 bytes --]
Dekel£¬
To control the memory order for device memory, I think you should use 
rte_io_mb() instead of rte_mb(). This will generate correct result. rte_wmb() 
is used for system memory.
> -----Original Message-----
> From: Dekel Peled <dekelp@mellanox.com>
> Sent: Monday, March 18, 2019 8:58 PM
> To: chaozhu@linux.vnet.ibm.com
> Cc: yskoh@mellanox.com; shahafs@mellanox.com; dev@dpdk.org;
> orika@mellanox.com; thomas@monjalon.net; dekelp@mellanox.com;
> stable@dpdk.org
> Subject: [PATCH] eal/ppc: remove fix of memory barrier for IBM POWER
>
> From previous patch description: "to improve performance on PPC64, use light
> weight sync instruction instead of sync instruction."
>
> Excerpt from IBM doc [1], section "Memory barrier instructions":
> "The second form of the sync instruction is light-weight sync, or lwsync.
> This form is used to control ordering for storage accesses to system memory
> only. It does not create a memory barrier for accesses to device memory."
>
> This patch removes the use of lwsync, so calls to rte_wmb() and
> rte_rmb() will provide correct memory barrier to ensure order of accesses to
> system memory and device memory.
>
> [1] https://www.ibm.com/developerworks/systems/articles/powerpc.html
>
> Fixes: d23a6bd04d72 ("eal/ppc: fix memory barrier for IBM POWER")
> Cc: stable@dpdk.org
>
> Signed-off-by: Dekel Peled <dekelp@mellanox.com>
> ---
>  lib/librte_eal/common/include/arch/ppc_64/rte_atomic.h | 8 --------
>  1 file changed, 8 deletions(-)
>
> diff --git a/lib/librte_eal/common/include/arch/ppc_64/rte_atomic.h
> b/lib/librte_eal/common/include/arch/ppc_64/rte_atomic.h
> index ce38350..797381c 100644
> --- a/lib/librte_eal/common/include/arch/ppc_64/rte_atomic.h
> +++ b/lib/librte_eal/common/include/arch/ppc_64/rte_atomic.h
> @@ -63,11 +63,7 @@
>   * Guarantees that the STORE operations generated before the barrier
>   * occur before the STORE operations generated after.
>   */
> -#ifdef RTE_ARCH_64
> -#define	rte_wmb() asm volatile("lwsync" : : : "memory")
> -#else
>  #define	rte_wmb() asm volatile("sync" : : : "memory")
> -#endif
>
>  /**
>   * Read memory barrier.
> @@ -75,11 +71,7 @@
>   * Guarantees that the LOAD operations generated before the barrier
>   * occur before the LOAD operations generated after.
>   */
> -#ifdef RTE_ARCH_64
> -#define	rte_rmb() asm volatile("lwsync" : : : "memory")
> -#else
>  #define	rte_rmb() asm volatile("sync" : : : "memory")
> -#endif
>
>  #define rte_smp_mb() rte_mb()
>
> --
> 1.8.3.1
^ permalink raw reply	[flat|nested] 42+ messages in thread
* Re: [dpdk-dev] [PATCH] eal/ppc: remove fix of memory barrier for IBM POWER
  2019-03-19  3:24 ` Chao Zhu
@ 2019-03-19  3:24   ` Chao Zhu
  2019-03-19 10:05   ` Dekel Peled
  1 sibling, 0 replies; 42+ messages in thread
From: Chao Zhu @ 2019-03-19  3:24 UTC (permalink / raw)
  To: 'Dekel Peled'; +Cc: yskoh, shahafs, dev, orika, thomas, stable
[-- Warning: decoded text below may be mangled, UTF-8 assumed --]
[-- Attachment #1: Type: text/plain; charset="UTF-8", Size: 2498 bytes --]
Dekel£¬
To control the memory order for device memory, I think you should use 
rte_io_mb() instead of rte_mb(). This will generate correct result. rte_wmb() 
is used for system memory.
> -----Original Message-----
> From: Dekel Peled <dekelp@mellanox.com>
> Sent: Monday, March 18, 2019 8:58 PM
> To: chaozhu@linux.vnet.ibm.com
> Cc: yskoh@mellanox.com; shahafs@mellanox.com; dev@dpdk.org;
> orika@mellanox.com; thomas@monjalon.net; dekelp@mellanox.com;
> stable@dpdk.org
> Subject: [PATCH] eal/ppc: remove fix of memory barrier for IBM POWER
>
> From previous patch description: "to improve performance on PPC64, use light
> weight sync instruction instead of sync instruction."
>
> Excerpt from IBM doc [1], section "Memory barrier instructions":
> "The second form of the sync instruction is light-weight sync, or lwsync.
> This form is used to control ordering for storage accesses to system memory
> only. It does not create a memory barrier for accesses to device memory."
>
> This patch removes the use of lwsync, so calls to rte_wmb() and
> rte_rmb() will provide correct memory barrier to ensure order of accesses to
> system memory and device memory.
>
> [1] https://www.ibm.com/developerworks/systems/articles/powerpc.html
>
> Fixes: d23a6bd04d72 ("eal/ppc: fix memory barrier for IBM POWER")
> Cc: stable@dpdk.org
>
> Signed-off-by: Dekel Peled <dekelp@mellanox.com>
> ---
>  lib/librte_eal/common/include/arch/ppc_64/rte_atomic.h | 8 --------
>  1 file changed, 8 deletions(-)
>
> diff --git a/lib/librte_eal/common/include/arch/ppc_64/rte_atomic.h
> b/lib/librte_eal/common/include/arch/ppc_64/rte_atomic.h
> index ce38350..797381c 100644
> --- a/lib/librte_eal/common/include/arch/ppc_64/rte_atomic.h
> +++ b/lib/librte_eal/common/include/arch/ppc_64/rte_atomic.h
> @@ -63,11 +63,7 @@
>   * Guarantees that the STORE operations generated before the barrier
>   * occur before the STORE operations generated after.
>   */
> -#ifdef RTE_ARCH_64
> -#define	rte_wmb() asm volatile("lwsync" : : : "memory")
> -#else
>  #define	rte_wmb() asm volatile("sync" : : : "memory")
> -#endif
>
>  /**
>   * Read memory barrier.
> @@ -75,11 +71,7 @@
>   * Guarantees that the LOAD operations generated before the barrier
>   * occur before the LOAD operations generated after.
>   */
> -#ifdef RTE_ARCH_64
> -#define	rte_rmb() asm volatile("lwsync" : : : "memory")
> -#else
>  #define	rte_rmb() asm volatile("sync" : : : "memory")
> -#endif
>
>  #define rte_smp_mb() rte_mb()
>
> --
> 1.8.3.1
^ permalink raw reply	[flat|nested] 42+ messages in thread
* Re: [dpdk-dev] [PATCH] eal/ppc: remove fix of memory barrier for IBM POWER
  2019-03-19  3:24 ` Chao Zhu
  2019-03-19  3:24   ` Chao Zhu
@ 2019-03-19 10:05   ` Dekel Peled
  2019-03-19 10:05     ` Dekel Peled
  2019-03-19 11:14     ` Thomas Monjalon
  1 sibling, 2 replies; 42+ messages in thread
From: Dekel Peled @ 2019-03-19 10:05 UTC (permalink / raw)
  To: Chao Zhu
  Cc: Yongseok Koh, Shahaf Shuler, dev, Ori Kam, Thomas Monjalon, stable
Hi,
For ppc, rte_io_mb() is defined as rte_mb(), which is defined as asm sync.
According to comments in arch/ppc_64/rte_atomic.h, rte_wmb() and rte_rmb() are the same as rte_mb(), for store and load respectively.
My patch propose to define rte_wmb() and rte_rmb() as asm sync, like rte_mb(), since using lwsync is incorrect for them.
Regards,
Dekel
> -----Original Message-----
> From: Chao Zhu <chaozhu@linux.vnet.ibm.com>
> Sent: Tuesday, March 19, 2019 5:24 AM
> To: Dekel Peled <dekelp@mellanox.com>
> Cc: Yongseok Koh <yskoh@mellanox.com>; Shahaf Shuler
> <shahafs@mellanox.com>; dev@dpdk.org; Ori Kam <orika@mellanox.com>;
> Thomas Monjalon <thomas@monjalon.net>; stable@dpdk.org
> Subject: RE: [PATCH] eal/ppc: remove fix of memory barrier for IBM POWER
> 
> Dekel£¬
> 
> To control the memory order for device memory, I think you should use
> rte_io_mb() instead of rte_mb(). This will generate correct result. rte_wmb()
> is used for system memory.
> 
> > -----Original Message-----
> > From: Dekel Peled <dekelp@mellanox.com>
> > Sent: Monday, March 18, 2019 8:58 PM
> > To: chaozhu@linux.vnet.ibm.com
> > Cc: yskoh@mellanox.com; shahafs@mellanox.com; dev@dpdk.org;
> > orika@mellanox.com; thomas@monjalon.net; dekelp@mellanox.com;
> > stable@dpdk.org
> > Subject: [PATCH] eal/ppc: remove fix of memory barrier for IBM POWER
> >
> > From previous patch description: "to improve performance on PPC64, use
> > light weight sync instruction instead of sync instruction."
> >
> > Excerpt from IBM doc [1], section "Memory barrier instructions":
> > "The second form of the sync instruction is light-weight sync, or lwsync.
> > This form is used to control ordering for storage accesses to system
> > memory only. It does not create a memory barrier for accesses to device
> memory."
> >
> > This patch removes the use of lwsync, so calls to rte_wmb() and
> > rte_rmb() will provide correct memory barrier to ensure order of
> > accesses to system memory and device memory.
> >
> > [1]
> >
> https://eur03.safelinks.protection.outlook.com/?url=https%3A%2F%2Fwww
> .
> >
> ibm.com%2Fdeveloperworks%2Fsystems%2Farticles%2Fpowerpc.html&
> ;data=
> >
> 02%7C01%7Cdekelp%40mellanox.com%7C381426b6b9d042f776fa08d6ac1a5d
> c5%7Ca
> >
> 652971c7d2e4d9ba6a4d149256f461b%7C0%7C0%7C636885626593364016&am
> p;sdata
> >
> =wFYTcFX2A%2BMdtQMgtojTAtUOzqds7U5pypNS%2F2SoXUM%3D&re
> served=0
> >
> > Fixes: d23a6bd04d72 ("eal/ppc: fix memory barrier for IBM POWER")
> > Cc: stable@dpdk.org
> >
> > Signed-off-by: Dekel Peled <dekelp@mellanox.com>
> > ---
> >  lib/librte_eal/common/include/arch/ppc_64/rte_atomic.h | 8 --------
> >  1 file changed, 8 deletions(-)
> >
> > diff --git a/lib/librte_eal/common/include/arch/ppc_64/rte_atomic.h
> > b/lib/librte_eal/common/include/arch/ppc_64/rte_atomic.h
> > index ce38350..797381c 100644
> > --- a/lib/librte_eal/common/include/arch/ppc_64/rte_atomic.h
> > +++ b/lib/librte_eal/common/include/arch/ppc_64/rte_atomic.h
> > @@ -63,11 +63,7 @@
> >   * Guarantees that the STORE operations generated before the barrier
> >   * occur before the STORE operations generated after.
> >   */
> > -#ifdef RTE_ARCH_64
> > -#define	rte_wmb() asm volatile("lwsync" : : : "memory")
> > -#else
> >  #define	rte_wmb() asm volatile("sync" : : : "memory")
> > -#endif
> >
> >  /**
> >   * Read memory barrier.
> > @@ -75,11 +71,7 @@
> >   * Guarantees that the LOAD operations generated before the barrier
> >   * occur before the LOAD operations generated after.
> >   */
> > -#ifdef RTE_ARCH_64
> > -#define	rte_rmb() asm volatile("lwsync" : : : "memory")
> > -#else
> >  #define	rte_rmb() asm volatile("sync" : : : "memory")
> > -#endif
> >
> >  #define rte_smp_mb() rte_mb()
> >
> > --
> > 1.8.3.1
> 
> 
^ permalink raw reply	[flat|nested] 42+ messages in thread
* Re: [dpdk-dev] [PATCH] eal/ppc: remove fix of memory barrier for IBM POWER
  2019-03-19 10:05   ` Dekel Peled
@ 2019-03-19 10:05     ` Dekel Peled
  2019-03-19 11:14     ` Thomas Monjalon
  1 sibling, 0 replies; 42+ messages in thread
From: Dekel Peled @ 2019-03-19 10:05 UTC (permalink / raw)
  To: Chao Zhu
  Cc: Yongseok Koh, Shahaf Shuler, dev, Ori Kam, Thomas Monjalon, stable
[-- Warning: decoded text below may be mangled, UTF-8 assumed --]
[-- Attachment #1: Type: text/plain; charset="UTF-8", Size: 3841 bytes --]
Hi,
For ppc, rte_io_mb() is defined as rte_mb(), which is defined as asm sync.
According to comments in arch/ppc_64/rte_atomic.h, rte_wmb() and rte_rmb() are the same as rte_mb(), for store and load respectively.
My patch propose to define rte_wmb() and rte_rmb() as asm sync, like rte_mb(), since using lwsync is incorrect for them.
Regards,
Dekel
> -----Original Message-----
> From: Chao Zhu <chaozhu@linux.vnet.ibm.com>
> Sent: Tuesday, March 19, 2019 5:24 AM
> To: Dekel Peled <dekelp@mellanox.com>
> Cc: Yongseok Koh <yskoh@mellanox.com>; Shahaf Shuler
> <shahafs@mellanox.com>; dev@dpdk.org; Ori Kam <orika@mellanox.com>;
> Thomas Monjalon <thomas@monjalon.net>; stable@dpdk.org
> Subject: RE: [PATCH] eal/ppc: remove fix of memory barrier for IBM POWER
> 
> Dekel£¬
> 
> To control the memory order for device memory, I think you should use
> rte_io_mb() instead of rte_mb(). This will generate correct result. rte_wmb()
> is used for system memory.
> 
> > -----Original Message-----
> > From: Dekel Peled <dekelp@mellanox.com>
> > Sent: Monday, March 18, 2019 8:58 PM
> > To: chaozhu@linux.vnet.ibm.com
> > Cc: yskoh@mellanox.com; shahafs@mellanox.com; dev@dpdk.org;
> > orika@mellanox.com; thomas@monjalon.net; dekelp@mellanox.com;
> > stable@dpdk.org
> > Subject: [PATCH] eal/ppc: remove fix of memory barrier for IBM POWER
> >
> > From previous patch description: "to improve performance on PPC64, use
> > light weight sync instruction instead of sync instruction."
> >
> > Excerpt from IBM doc [1], section "Memory barrier instructions":
> > "The second form of the sync instruction is light-weight sync, or lwsync.
> > This form is used to control ordering for storage accesses to system
> > memory only. It does not create a memory barrier for accesses to device
> memory."
> >
> > This patch removes the use of lwsync, so calls to rte_wmb() and
> > rte_rmb() will provide correct memory barrier to ensure order of
> > accesses to system memory and device memory.
> >
> > [1]
> >
> https://eur03.safelinks.protection.outlook.com/?url=https%3A%2F%2Fwww
> .
> >
> ibm.com%2Fdeveloperworks%2Fsystems%2Farticles%2Fpowerpc.html&
> ;data=
> >
> 02%7C01%7Cdekelp%40mellanox.com%7C381426b6b9d042f776fa08d6ac1a5d
> c5%7Ca
> >
> 652971c7d2e4d9ba6a4d149256f461b%7C0%7C0%7C636885626593364016&am
> p;sdata
> >
> =wFYTcFX2A%2BMdtQMgtojTAtUOzqds7U5pypNS%2F2SoXUM%3D&re
> served=0
> >
> > Fixes: d23a6bd04d72 ("eal/ppc: fix memory barrier for IBM POWER")
> > Cc: stable@dpdk.org
> >
> > Signed-off-by: Dekel Peled <dekelp@mellanox.com>
> > ---
> >  lib/librte_eal/common/include/arch/ppc_64/rte_atomic.h | 8 --------
> >  1 file changed, 8 deletions(-)
> >
> > diff --git a/lib/librte_eal/common/include/arch/ppc_64/rte_atomic.h
> > b/lib/librte_eal/common/include/arch/ppc_64/rte_atomic.h
> > index ce38350..797381c 100644
> > --- a/lib/librte_eal/common/include/arch/ppc_64/rte_atomic.h
> > +++ b/lib/librte_eal/common/include/arch/ppc_64/rte_atomic.h
> > @@ -63,11 +63,7 @@
> >   * Guarantees that the STORE operations generated before the barrier
> >   * occur before the STORE operations generated after.
> >   */
> > -#ifdef RTE_ARCH_64
> > -#define	rte_wmb() asm volatile("lwsync" : : : "memory")
> > -#else
> >  #define	rte_wmb() asm volatile("sync" : : : "memory")
> > -#endif
> >
> >  /**
> >   * Read memory barrier.
> > @@ -75,11 +71,7 @@
> >   * Guarantees that the LOAD operations generated before the barrier
> >   * occur before the LOAD operations generated after.
> >   */
> > -#ifdef RTE_ARCH_64
> > -#define	rte_rmb() asm volatile("lwsync" : : : "memory")
> > -#else
> >  #define	rte_rmb() asm volatile("sync" : : : "memory")
> > -#endif
> >
> >  #define rte_smp_mb() rte_mb()
> >
> > --
> > 1.8.3.1
> 
> 
^ permalink raw reply	[flat|nested] 42+ messages in thread
* Re: [dpdk-dev] [PATCH] eal/ppc: remove fix of memory barrier for IBM POWER
  2019-03-19 10:05   ` Dekel Peled
  2019-03-19 10:05     ` Dekel Peled
@ 2019-03-19 11:14     ` Thomas Monjalon
  2019-03-19 11:14       ` Thomas Monjalon
  2019-03-19 19:42       ` Shahaf Shuler
  1 sibling, 2 replies; 42+ messages in thread
From: Thomas Monjalon @ 2019-03-19 11:14 UTC (permalink / raw)
  To: Dekel Peled, Chao Zhu; +Cc: Yongseok Koh, Shahaf Shuler, dev, Ori Kam, stable
Guys, please let's avoid top-post.
You are both not replying to each other:
1/ Dekel mentioned the IBM doc but Chao did not argue about
the lack of IO protection with lwsync.
We assume that rte_mb should protect any access including IO.
2/ Chao asked about the semantic of the barrier used in mlx5 code,
but Dekel did not reply about the semantic: are we protecting
IO or general memory access?
19/03/2019 11:05, Dekel Peled:
> Hi,
> 
> For ppc, rte_io_mb() is defined as rte_mb(), which is defined as asm sync.
> According to comments in arch/ppc_64/rte_atomic.h, rte_wmb() and rte_rmb() are the same as rte_mb(), for store and load respectively.
> My patch propose to define rte_wmb() and rte_rmb() as asm sync, like rte_mb(), since using lwsync is incorrect for them.
> 
> Regards,
> Dekel
> 
> > -----Original Message-----
> > From: Chao Zhu <chaozhu@linux.vnet.ibm.com>
> > Sent: Tuesday, March 19, 2019 5:24 AM
> > To: Dekel Peled <dekelp@mellanox.com>
> > Cc: Yongseok Koh <yskoh@mellanox.com>; Shahaf Shuler
> > <shahafs@mellanox.com>; dev@dpdk.org; Ori Kam <orika@mellanox.com>;
> > Thomas Monjalon <thomas@monjalon.net>; stable@dpdk.org
> > Subject: RE: [PATCH] eal/ppc: remove fix of memory barrier for IBM POWER
> > 
> > Dekel£¬
> > 
> > To control the memory order for device memory, I think you should use
> > rte_io_mb() instead of rte_mb(). This will generate correct result. rte_wmb()
> > is used for system memory.
> > 
> > > -----Original Message-----
> > > From: Dekel Peled <dekelp@mellanox.com>
> > > Sent: Monday, March 18, 2019 8:58 PM
> > > To: chaozhu@linux.vnet.ibm.com
> > > Cc: yskoh@mellanox.com; shahafs@mellanox.com; dev@dpdk.org;
> > > orika@mellanox.com; thomas@monjalon.net; dekelp@mellanox.com;
> > > stable@dpdk.org
> > > Subject: [PATCH] eal/ppc: remove fix of memory barrier for IBM POWER
> > >
> > > From previous patch description: "to improve performance on PPC64, use
> > > light weight sync instruction instead of sync instruction."
> > >
> > > Excerpt from IBM doc [1], section "Memory barrier instructions":
> > > "The second form of the sync instruction is light-weight sync, or lwsync.
> > > This form is used to control ordering for storage accesses to system
> > > memory only. It does not create a memory barrier for accesses to device
> > memory."
> > >
> > > This patch removes the use of lwsync, so calls to rte_wmb() and
> > > rte_rmb() will provide correct memory barrier to ensure order of
> > > accesses to system memory and device memory.
> > >
> > > [1]
> > >
> > https://eur03.safelinks.protection.outlook.com/?url=https%3A%2F%2Fwww
> > .
> > >
> > ibm.com%2Fdeveloperworks%2Fsystems%2Farticles%2Fpowerpc.html&
> > ;data=
> > >
> > 02%7C01%7Cdekelp%40mellanox.com%7C381426b6b9d042f776fa08d6ac1a5d
> > c5%7Ca
> > >
> > 652971c7d2e4d9ba6a4d149256f461b%7C0%7C0%7C636885626593364016&am
> > p;sdata
> > >
> > =wFYTcFX2A%2BMdtQMgtojTAtUOzqds7U5pypNS%2F2SoXUM%3D&re
> > served=0
> > >
> > > Fixes: d23a6bd04d72 ("eal/ppc: fix memory barrier for IBM POWER")
> > > Cc: stable@dpdk.org
> > >
> > > Signed-off-by: Dekel Peled <dekelp@mellanox.com>
> > > ---
> > >  lib/librte_eal/common/include/arch/ppc_64/rte_atomic.h | 8 --------
> > >  1 file changed, 8 deletions(-)
> > >
> > > diff --git a/lib/librte_eal/common/include/arch/ppc_64/rte_atomic.h
> > > b/lib/librte_eal/common/include/arch/ppc_64/rte_atomic.h
> > > index ce38350..797381c 100644
> > > --- a/lib/librte_eal/common/include/arch/ppc_64/rte_atomic.h
> > > +++ b/lib/librte_eal/common/include/arch/ppc_64/rte_atomic.h
> > > @@ -63,11 +63,7 @@
> > >   * Guarantees that the STORE operations generated before the barrier
> > >   * occur before the STORE operations generated after.
> > >   */
> > > -#ifdef RTE_ARCH_64
> > > -#define	rte_wmb() asm volatile("lwsync" : : : "memory")
> > > -#else
> > >  #define	rte_wmb() asm volatile("sync" : : : "memory")
> > > -#endif
> > >
> > >  /**
> > >   * Read memory barrier.
> > > @@ -75,11 +71,7 @@
> > >   * Guarantees that the LOAD operations generated before the barrier
> > >   * occur before the LOAD operations generated after.
> > >   */
> > > -#ifdef RTE_ARCH_64
> > > -#define	rte_rmb() asm volatile("lwsync" : : : "memory")
> > > -#else
> > >  #define	rte_rmb() asm volatile("sync" : : : "memory")
> > > -#endif
> > >
> > >  #define rte_smp_mb() rte_mb()
^ permalink raw reply	[flat|nested] 42+ messages in thread
* Re: [dpdk-dev] [PATCH] eal/ppc: remove fix of memory barrier for IBM POWER
  2019-03-19 11:14     ` Thomas Monjalon
@ 2019-03-19 11:14       ` Thomas Monjalon
  2019-03-19 19:42       ` Shahaf Shuler
  1 sibling, 0 replies; 42+ messages in thread
From: Thomas Monjalon @ 2019-03-19 11:14 UTC (permalink / raw)
  To: Dekel Peled, Chao Zhu; +Cc: Yongseok Koh, Shahaf Shuler, dev, Ori Kam, stable
[-- Warning: decoded text below may be mangled, UTF-8 assumed --]
[-- Attachment #1: Type: text/plain; charset="UTF-8", Size: 4454 bytes --]
Guys, please let's avoid top-post.
You are both not replying to each other:
1/ Dekel mentioned the IBM doc but Chao did not argue about
the lack of IO protection with lwsync.
We assume that rte_mb should protect any access including IO.
2/ Chao asked about the semantic of the barrier used in mlx5 code,
but Dekel did not reply about the semantic: are we protecting
IO or general memory access?
19/03/2019 11:05, Dekel Peled:
> Hi,
> 
> For ppc, rte_io_mb() is defined as rte_mb(), which is defined as asm sync.
> According to comments in arch/ppc_64/rte_atomic.h, rte_wmb() and rte_rmb() are the same as rte_mb(), for store and load respectively.
> My patch propose to define rte_wmb() and rte_rmb() as asm sync, like rte_mb(), since using lwsync is incorrect for them.
> 
> Regards,
> Dekel
> 
> > -----Original Message-----
> > From: Chao Zhu <chaozhu@linux.vnet.ibm.com>
> > Sent: Tuesday, March 19, 2019 5:24 AM
> > To: Dekel Peled <dekelp@mellanox.com>
> > Cc: Yongseok Koh <yskoh@mellanox.com>; Shahaf Shuler
> > <shahafs@mellanox.com>; dev@dpdk.org; Ori Kam <orika@mellanox.com>;
> > Thomas Monjalon <thomas@monjalon.net>; stable@dpdk.org
> > Subject: RE: [PATCH] eal/ppc: remove fix of memory barrier for IBM POWER
> > 
> > Dekel£¬
> > 
> > To control the memory order for device memory, I think you should use
> > rte_io_mb() instead of rte_mb(). This will generate correct result. rte_wmb()
> > is used for system memory.
> > 
> > > -----Original Message-----
> > > From: Dekel Peled <dekelp@mellanox.com>
> > > Sent: Monday, March 18, 2019 8:58 PM
> > > To: chaozhu@linux.vnet.ibm.com
> > > Cc: yskoh@mellanox.com; shahafs@mellanox.com; dev@dpdk.org;
> > > orika@mellanox.com; thomas@monjalon.net; dekelp@mellanox.com;
> > > stable@dpdk.org
> > > Subject: [PATCH] eal/ppc: remove fix of memory barrier for IBM POWER
> > >
> > > From previous patch description: "to improve performance on PPC64, use
> > > light weight sync instruction instead of sync instruction."
> > >
> > > Excerpt from IBM doc [1], section "Memory barrier instructions":
> > > "The second form of the sync instruction is light-weight sync, or lwsync.
> > > This form is used to control ordering for storage accesses to system
> > > memory only. It does not create a memory barrier for accesses to device
> > memory."
> > >
> > > This patch removes the use of lwsync, so calls to rte_wmb() and
> > > rte_rmb() will provide correct memory barrier to ensure order of
> > > accesses to system memory and device memory.
> > >
> > > [1]
> > >
> > https://eur03.safelinks.protection.outlook.com/?url=https%3A%2F%2Fwww
> > .
> > >
> > ibm.com%2Fdeveloperworks%2Fsystems%2Farticles%2Fpowerpc.html&
> > ;data=
> > >
> > 02%7C01%7Cdekelp%40mellanox.com%7C381426b6b9d042f776fa08d6ac1a5d
> > c5%7Ca
> > >
> > 652971c7d2e4d9ba6a4d149256f461b%7C0%7C0%7C636885626593364016&am
> > p;sdata
> > >
> > =wFYTcFX2A%2BMdtQMgtojTAtUOzqds7U5pypNS%2F2SoXUM%3D&re
> > served=0
> > >
> > > Fixes: d23a6bd04d72 ("eal/ppc: fix memory barrier for IBM POWER")
> > > Cc: stable@dpdk.org
> > >
> > > Signed-off-by: Dekel Peled <dekelp@mellanox.com>
> > > ---
> > >  lib/librte_eal/common/include/arch/ppc_64/rte_atomic.h | 8 --------
> > >  1 file changed, 8 deletions(-)
> > >
> > > diff --git a/lib/librte_eal/common/include/arch/ppc_64/rte_atomic.h
> > > b/lib/librte_eal/common/include/arch/ppc_64/rte_atomic.h
> > > index ce38350..797381c 100644
> > > --- a/lib/librte_eal/common/include/arch/ppc_64/rte_atomic.h
> > > +++ b/lib/librte_eal/common/include/arch/ppc_64/rte_atomic.h
> > > @@ -63,11 +63,7 @@
> > >   * Guarantees that the STORE operations generated before the barrier
> > >   * occur before the STORE operations generated after.
> > >   */
> > > -#ifdef RTE_ARCH_64
> > > -#define	rte_wmb() asm volatile("lwsync" : : : "memory")
> > > -#else
> > >  #define	rte_wmb() asm volatile("sync" : : : "memory")
> > > -#endif
> > >
> > >  /**
> > >   * Read memory barrier.
> > > @@ -75,11 +71,7 @@
> > >   * Guarantees that the LOAD operations generated before the barrier
> > >   * occur before the LOAD operations generated after.
> > >   */
> > > -#ifdef RTE_ARCH_64
> > > -#define	rte_rmb() asm volatile("lwsync" : : : "memory")
> > > -#else
> > >  #define	rte_rmb() asm volatile("sync" : : : "memory")
> > > -#endif
> > >
> > >  #define rte_smp_mb() rte_mb()
^ permalink raw reply	[flat|nested] 42+ messages in thread
* Re: [dpdk-dev] [PATCH] eal/ppc: remove fix of memory barrier for IBM POWER
  2019-03-19 11:14     ` Thomas Monjalon
  2019-03-19 11:14       ` Thomas Monjalon
@ 2019-03-19 19:42       ` Shahaf Shuler
  2019-03-19 19:42         ` Shahaf Shuler
  2019-03-19 20:45         ` Thomas Monjalon
  1 sibling, 2 replies; 42+ messages in thread
From: Shahaf Shuler @ 2019-03-19 19:42 UTC (permalink / raw)
  To: Thomas Monjalon, Dekel Peled, Chao Zhu; +Cc: Yongseok Koh, dev, Ori Kam, stable
Tuesday, March 19, 2019 1:15 PM, Thomas Monjalon:
> Subject: Re: [PATCH] eal/ppc: remove fix of memory barrier for IBM POWER
> 
> Guys, please let's avoid top-post.
> 
> You are both not replying to each other:
> 
> 1/ Dekel mentioned the IBM doc but Chao did not argue about the lack of IO
> protection with lwsync.
> We assume that rte_mb should protect any access including IO.
> 
> 2/ Chao asked about the semantic of the barrier used in mlx5 code, but Dekel
> did not reply about the semantic: are we protecting IO or general memory
> access?
In mlx5 code we want to sync between two different writes:
1. write to system memory (RAM)
2. write to MMIO memory (device)
We need #1 to be visible on host memory before #2 is committed to NIC.
We want to have a single type of barrier which will translate to the correct assembly based on the system arch, and in addition we want it light-weight as possible.
So far, when not running on power, we used the rte_wmb for that. On x86 and ARM systems it provided the needed guarantees.  
It is also mentioned in the barrier doxygen on ARM arch:
"
Write memory barrier.                                            
                                                                 
Guarantees that the STORE operations generated before the barrier
occur before the STORE operations generated after.
"
It doesn't restrict to store to system memory only. 
w/ power is on somewhat different and in fact rte_mb is required. It obviously miss the point of those barrier if we will need to use a different barrier based on the system arch. 
We need to align the definition of the different barriers in DPDK:
1. need a clear documentation of each. this should be global and not part of the specific implementation on each arch. 
2. either modify ppc rte_wmb to match ARM and x86 ones or to define a new type of barrier which will sync between both I/O and stores to systems memory. 
> 
> 
> 19/03/2019 11:05, Dekel Peled:
> > Hi,
> >
> > For ppc, rte_io_mb() is defined as rte_mb(), which is defined as asm sync.
> > According to comments in arch/ppc_64/rte_atomic.h, rte_wmb() and
> rte_rmb() are the same as rte_mb(), for store and load respectively.
> > My patch propose to define rte_wmb() and rte_rmb() as asm sync, like
> rte_mb(), since using lwsync is incorrect for them.
> >
> > Regards,
> > Dekel
> >
> > > -----Original Message-----
> > > From: Chao Zhu <chaozhu@linux.vnet.ibm.com>
> > > Sent: Tuesday, March 19, 2019 5:24 AM
> > > To: Dekel Peled <dekelp@mellanox.com>
> > > Cc: Yongseok Koh <yskoh@mellanox.com>; Shahaf Shuler
> > > <shahafs@mellanox.com>; dev@dpdk.org; Ori Kam
> <orika@mellanox.com>;
> > > Thomas Monjalon <thomas@monjalon.net>; stable@dpdk.org
> > > Subject: RE: [PATCH] eal/ppc: remove fix of memory barrier for IBM
> > > POWER
> > >
> > > Dekel£¬
> > >
> > > To control the memory order for device memory, I think you should
> > > use
> > > rte_io_mb() instead of rte_mb(). This will generate correct result.
> > > rte_wmb() is used for system memory.
> > >
> > > > -----Original Message-----
> > > > From: Dekel Peled <dekelp@mellanox.com>
> > > > Sent: Monday, March 18, 2019 8:58 PM
> > > > To: chaozhu@linux.vnet.ibm.com
> > > > Cc: yskoh@mellanox.com; shahafs@mellanox.com; dev@dpdk.org;
> > > > orika@mellanox.com; thomas@monjalon.net; dekelp@mellanox.com;
> > > > stable@dpdk.org
> > > > Subject: [PATCH] eal/ppc: remove fix of memory barrier for IBM
> > > > POWER
> > > >
> > > > From previous patch description: "to improve performance on PPC64,
> > > > use light weight sync instruction instead of sync instruction."
> > > >
> > > > Excerpt from IBM doc [1], section "Memory barrier instructions":
> > > > "The second form of the sync instruction is light-weight sync, or lwsync.
> > > > This form is used to control ordering for storage accesses to
> > > > system memory only. It does not create a memory barrier for
> > > > accesses to device
> > > memory."
> > > >
> > > > This patch removes the use of lwsync, so calls to rte_wmb() and
> > > > rte_rmb() will provide correct memory barrier to ensure order of
> > > > accesses to system memory and device memory.
> > > >
> > > > [1]
> > > >
> > >
> https://eur03.safelinks.protection.outlook.com/?url=https%3A%2F%2Fww
> > > w
> > > .
> > > >
> > >
> ibm.com%2Fdeveloperworks%2Fsystems%2Farticles%2Fpowerpc.html&
> > > ;data=
> > > >
> > >
> 02%7C01%7Cdekelp%40mellanox.com%7C381426b6b9d042f776fa08d6ac1a5d
> > > c5%7Ca
> > > >
> > >
> 652971c7d2e4d9ba6a4d149256f461b%7C0%7C0%7C636885626593364016&am
> > > p;sdata
> > > >
> > >
> =wFYTcFX2A%2BMdtQMgtojTAtUOzqds7U5pypNS%2F2SoXUM%3D&re
> > > served=0
> > > >
> > > > Fixes: d23a6bd04d72 ("eal/ppc: fix memory barrier for IBM POWER")
> > > > Cc: stable@dpdk.org
> > > >
> > > > Signed-off-by: Dekel Peled <dekelp@mellanox.com>
> > > > ---
> > > >  lib/librte_eal/common/include/arch/ppc_64/rte_atomic.h | 8
> > > > --------
> > > >  1 file changed, 8 deletions(-)
> > > >
> > > > diff --git
> > > > a/lib/librte_eal/common/include/arch/ppc_64/rte_atomic.h
> > > > b/lib/librte_eal/common/include/arch/ppc_64/rte_atomic.h
> > > > index ce38350..797381c 100644
> > > > --- a/lib/librte_eal/common/include/arch/ppc_64/rte_atomic.h
> > > > +++ b/lib/librte_eal/common/include/arch/ppc_64/rte_atomic.h
> > > > @@ -63,11 +63,7 @@
> > > >   * Guarantees that the STORE operations generated before the barrier
> > > >   * occur before the STORE operations generated after.
> > > >   */
> > > > -#ifdef RTE_ARCH_64
> > > > -#define	rte_wmb() asm volatile("lwsync" : : : "memory")
> > > > -#else
> > > >  #define	rte_wmb() asm volatile("sync" : : : "memory")
> > > > -#endif
> > > >
> > > >  /**
> > > >   * Read memory barrier.
> > > > @@ -75,11 +71,7 @@
> > > >   * Guarantees that the LOAD operations generated before the barrier
> > > >   * occur before the LOAD operations generated after.
> > > >   */
> > > > -#ifdef RTE_ARCH_64
> > > > -#define	rte_rmb() asm volatile("lwsync" : : : "memory")
> > > > -#else
> > > >  #define	rte_rmb() asm volatile("sync" : : : "memory")
> > > > -#endif
> > > >
> > > >  #define rte_smp_mb() rte_mb()
> 
> 
^ permalink raw reply	[flat|nested] 42+ messages in thread
* Re: [dpdk-dev] [PATCH] eal/ppc: remove fix of memory barrier for IBM POWER
  2019-03-19 19:42       ` Shahaf Shuler
@ 2019-03-19 19:42         ` Shahaf Shuler
  2019-03-19 20:45         ` Thomas Monjalon
  1 sibling, 0 replies; 42+ messages in thread
From: Shahaf Shuler @ 2019-03-19 19:42 UTC (permalink / raw)
  To: Thomas Monjalon, Dekel Peled, Chao Zhu; +Cc: Yongseok Koh, dev, Ori Kam, stable
[-- Warning: decoded text below may be mangled, UTF-8 assumed --]
[-- Attachment #1: Type: text/plain; charset="UTF-8", Size: 6291 bytes --]
Tuesday, March 19, 2019 1:15 PM, Thomas Monjalon:
> Subject: Re: [PATCH] eal/ppc: remove fix of memory barrier for IBM POWER
> 
> Guys, please let's avoid top-post.
> 
> You are both not replying to each other:
> 
> 1/ Dekel mentioned the IBM doc but Chao did not argue about the lack of IO
> protection with lwsync.
> We assume that rte_mb should protect any access including IO.
> 
> 2/ Chao asked about the semantic of the barrier used in mlx5 code, but Dekel
> did not reply about the semantic: are we protecting IO or general memory
> access?
In mlx5 code we want to sync between two different writes:
1. write to system memory (RAM)
2. write to MMIO memory (device)
We need #1 to be visible on host memory before #2 is committed to NIC.
We want to have a single type of barrier which will translate to the correct assembly based on the system arch, and in addition we want it light-weight as possible.
So far, when not running on power, we used the rte_wmb for that. On x86 and ARM systems it provided the needed guarantees.  
It is also mentioned in the barrier doxygen on ARM arch:
"
Write memory barrier.                                            
                                                                 
Guarantees that the STORE operations generated before the barrier
occur before the STORE operations generated after.
"
It doesn't restrict to store to system memory only. 
w/ power is on somewhat different and in fact rte_mb is required. It obviously miss the point of those barrier if we will need to use a different barrier based on the system arch. 
We need to align the definition of the different barriers in DPDK:
1. need a clear documentation of each. this should be global and not part of the specific implementation on each arch. 
2. either modify ppc rte_wmb to match ARM and x86 ones or to define a new type of barrier which will sync between both I/O and stores to systems memory. 
> 
> 
> 19/03/2019 11:05, Dekel Peled:
> > Hi,
> >
> > For ppc, rte_io_mb() is defined as rte_mb(), which is defined as asm sync.
> > According to comments in arch/ppc_64/rte_atomic.h, rte_wmb() and
> rte_rmb() are the same as rte_mb(), for store and load respectively.
> > My patch propose to define rte_wmb() and rte_rmb() as asm sync, like
> rte_mb(), since using lwsync is incorrect for them.
> >
> > Regards,
> > Dekel
> >
> > > -----Original Message-----
> > > From: Chao Zhu <chaozhu@linux.vnet.ibm.com>
> > > Sent: Tuesday, March 19, 2019 5:24 AM
> > > To: Dekel Peled <dekelp@mellanox.com>
> > > Cc: Yongseok Koh <yskoh@mellanox.com>; Shahaf Shuler
> > > <shahafs@mellanox.com>; dev@dpdk.org; Ori Kam
> <orika@mellanox.com>;
> > > Thomas Monjalon <thomas@monjalon.net>; stable@dpdk.org
> > > Subject: RE: [PATCH] eal/ppc: remove fix of memory barrier for IBM
> > > POWER
> > >
> > > Dekel£¬
> > >
> > > To control the memory order for device memory, I think you should
> > > use
> > > rte_io_mb() instead of rte_mb(). This will generate correct result.
> > > rte_wmb() is used for system memory.
> > >
> > > > -----Original Message-----
> > > > From: Dekel Peled <dekelp@mellanox.com>
> > > > Sent: Monday, March 18, 2019 8:58 PM
> > > > To: chaozhu@linux.vnet.ibm.com
> > > > Cc: yskoh@mellanox.com; shahafs@mellanox.com; dev@dpdk.org;
> > > > orika@mellanox.com; thomas@monjalon.net; dekelp@mellanox.com;
> > > > stable@dpdk.org
> > > > Subject: [PATCH] eal/ppc: remove fix of memory barrier for IBM
> > > > POWER
> > > >
> > > > From previous patch description: "to improve performance on PPC64,
> > > > use light weight sync instruction instead of sync instruction."
> > > >
> > > > Excerpt from IBM doc [1], section "Memory barrier instructions":
> > > > "The second form of the sync instruction is light-weight sync, or lwsync.
> > > > This form is used to control ordering for storage accesses to
> > > > system memory only. It does not create a memory barrier for
> > > > accesses to device
> > > memory."
> > > >
> > > > This patch removes the use of lwsync, so calls to rte_wmb() and
> > > > rte_rmb() will provide correct memory barrier to ensure order of
> > > > accesses to system memory and device memory.
> > > >
> > > > [1]
> > > >
> > >
> https://eur03.safelinks.protection.outlook.com/?url=https%3A%2F%2Fww
> > > w
> > > .
> > > >
> > >
> ibm.com%2Fdeveloperworks%2Fsystems%2Farticles%2Fpowerpc.html&
> > > ;data=
> > > >
> > >
> 02%7C01%7Cdekelp%40mellanox.com%7C381426b6b9d042f776fa08d6ac1a5d
> > > c5%7Ca
> > > >
> > >
> 652971c7d2e4d9ba6a4d149256f461b%7C0%7C0%7C636885626593364016&am
> > > p;sdata
> > > >
> > >
> =wFYTcFX2A%2BMdtQMgtojTAtUOzqds7U5pypNS%2F2SoXUM%3D&re
> > > served=0
> > > >
> > > > Fixes: d23a6bd04d72 ("eal/ppc: fix memory barrier for IBM POWER")
> > > > Cc: stable@dpdk.org
> > > >
> > > > Signed-off-by: Dekel Peled <dekelp@mellanox.com>
> > > > ---
> > > >  lib/librte_eal/common/include/arch/ppc_64/rte_atomic.h | 8
> > > > --------
> > > >  1 file changed, 8 deletions(-)
> > > >
> > > > diff --git
> > > > a/lib/librte_eal/common/include/arch/ppc_64/rte_atomic.h
> > > > b/lib/librte_eal/common/include/arch/ppc_64/rte_atomic.h
> > > > index ce38350..797381c 100644
> > > > --- a/lib/librte_eal/common/include/arch/ppc_64/rte_atomic.h
> > > > +++ b/lib/librte_eal/common/include/arch/ppc_64/rte_atomic.h
> > > > @@ -63,11 +63,7 @@
> > > >   * Guarantees that the STORE operations generated before the barrier
> > > >   * occur before the STORE operations generated after.
> > > >   */
> > > > -#ifdef RTE_ARCH_64
> > > > -#define	rte_wmb() asm volatile("lwsync" : : : "memory")
> > > > -#else
> > > >  #define	rte_wmb() asm volatile("sync" : : : "memory")
> > > > -#endif
> > > >
> > > >  /**
> > > >   * Read memory barrier.
> > > > @@ -75,11 +71,7 @@
> > > >   * Guarantees that the LOAD operations generated before the barrier
> > > >   * occur before the LOAD operations generated after.
> > > >   */
> > > > -#ifdef RTE_ARCH_64
> > > > -#define	rte_rmb() asm volatile("lwsync" : : : "memory")
> > > > -#else
> > > >  #define	rte_rmb() asm volatile("sync" : : : "memory")
> > > > -#endif
> > > >
> > > >  #define rte_smp_mb() rte_mb()
> 
> 
^ permalink raw reply	[flat|nested] 42+ messages in thread
* Re: [dpdk-dev] [PATCH] eal/ppc: remove fix of memory barrier for IBM POWER
  2019-03-19 19:42       ` Shahaf Shuler
  2019-03-19 19:42         ` Shahaf Shuler
@ 2019-03-19 20:45         ` Thomas Monjalon
  2019-03-19 20:45           ` Thomas Monjalon
  2019-03-20 22:40           ` Pradeep Satyanarayana
  1 sibling, 2 replies; 42+ messages in thread
From: Thomas Monjalon @ 2019-03-19 20:45 UTC (permalink / raw)
  To: Shahaf Shuler
  Cc: Dekel Peled, Chao Zhu, Yongseok Koh, dev, Ori Kam, stable, pradeep
19/03/2019 20:42, Shahaf Shuler:
> Tuesday, March 19, 2019 1:15 PM, Thomas Monjalon:
> > Subject: Re: [PATCH] eal/ppc: remove fix of memory barrier for IBM POWER
> > 
> > Guys, please let's avoid top-post.
> > 
> > You are both not replying to each other:
> > 
> > 1/ Dekel mentioned the IBM doc but Chao did not argue about the lack of IO
> > protection with lwsync.
> > We assume that rte_mb should protect any access including IO.
> > 
> > 2/ Chao asked about the semantic of the barrier used in mlx5 code, but Dekel
> > did not reply about the semantic: are we protecting IO or general memory
> > access?
> 
> In mlx5 code we want to sync between two different writes:
> 1. write to system memory (RAM)
> 2. write to MMIO memory (device)
> 
> We need #1 to be visible on host memory before #2 is committed to NIC.
> We want to have a single type of barrier which will translate to the correct assembly based on the system arch, and in addition we want it light-weight as possible.
> 
> So far, when not running on power, we used the rte_wmb for that. On x86 and ARM systems it provided the needed guarantees.  
> It is also mentioned in the barrier doxygen on ARM arch:
> "
> Write memory barrier.                                            
>                                                                  
> Guarantees that the STORE operations generated before the barrier
> occur before the STORE operations generated after.
> "
> 
> It doesn't restrict to store to system memory only. 
> w/ power is on somewhat different and in fact rte_mb is required. It obviously miss the point of those barrier if we will need to use a different barrier based on the system arch. 
> 
> We need to align the definition of the different barriers in DPDK:
> 1. need a clear documentation of each. this should be global and not part of the specific implementation on each arch.
The global definition is in lib/librte_eal/common/include/generic/rte_atomic.h
There are some copy/paste in Arm32 and PPC that I will remove.
> 2. either modify ppc rte_wmb to match ARM and x86 ones or to define a new type of barrier which will sync between both I/O and stores to systems memory.
The basic memory barrier of DPDK does not mention
a difference between I/O and system memory.
It is not explicit (yet) but I assume it is protecting both.
So, in my opinion, we need to make it explicit in the doc,
and fix the PPC implementation to comply with this definition.
Anyway, I don't see any significant effort from IBM to move from
the alpha support stage to a real Open Source support.
PS: sending a mail every two months, to promise improvements, is not enough!
-----------------
> > 19/03/2019 11:05, Dekel Peled:
> > > Hi,
> > >
> > > For ppc, rte_io_mb() is defined as rte_mb(), which is defined as asm sync.
> > > According to comments in arch/ppc_64/rte_atomic.h, rte_wmb() and
> > rte_rmb() are the same as rte_mb(), for store and load respectively.
> > > My patch propose to define rte_wmb() and rte_rmb() as asm sync, like
> > rte_mb(), since using lwsync is incorrect for them.
> > >
> > > Regards,
> > > Dekel
> > >
> > > From: Chao Zhu <chaozhu@linux.vnet.ibm.com>
> > > > Dekel£¬
> > > >
> > > > To control the memory order for device memory, I think you should
> > > > use
> > > > rte_io_mb() instead of rte_mb(). This will generate correct result.
> > > > rte_wmb() is used for system memory.
> > > >
> > > > From: Dekel Peled <dekelp@mellanox.com>
> > > > >
> > > > > From previous patch description: "to improve performance on PPC64,
> > > > > use light weight sync instruction instead of sync instruction."
> > > > >
> > > > > Excerpt from IBM doc [1], section "Memory barrier instructions":
> > > > > "The second form of the sync instruction is light-weight sync, or lwsync.
> > > > > This form is used to control ordering for storage accesses to
> > > > > system memory only. It does not create a memory barrier for
> > > > > accesses to device
> > > > memory."
> > > > >
> > > > > This patch removes the use of lwsync, so calls to rte_wmb() and
> > > > > rte_rmb() will provide correct memory barrier to ensure order of
> > > > > accesses to system memory and device memory.
^ permalink raw reply	[flat|nested] 42+ messages in thread
* Re: [dpdk-dev] [PATCH] eal/ppc: remove fix of memory barrier for IBM POWER
  2019-03-19 20:45         ` Thomas Monjalon
@ 2019-03-19 20:45           ` Thomas Monjalon
  2019-03-20 22:40           ` Pradeep Satyanarayana
  1 sibling, 0 replies; 42+ messages in thread
From: Thomas Monjalon @ 2019-03-19 20:45 UTC (permalink / raw)
  To: Shahaf Shuler
  Cc: Dekel Peled, Chao Zhu, Yongseok Koh, dev, Ori Kam, stable, pradeep
[-- Warning: decoded text below may be mangled, UTF-8 assumed --]
[-- Attachment #1: Type: text/plain; charset="UTF-8", Size: 4281 bytes --]
19/03/2019 20:42, Shahaf Shuler:
> Tuesday, March 19, 2019 1:15 PM, Thomas Monjalon:
> > Subject: Re: [PATCH] eal/ppc: remove fix of memory barrier for IBM POWER
> > 
> > Guys, please let's avoid top-post.
> > 
> > You are both not replying to each other:
> > 
> > 1/ Dekel mentioned the IBM doc but Chao did not argue about the lack of IO
> > protection with lwsync.
> > We assume that rte_mb should protect any access including IO.
> > 
> > 2/ Chao asked about the semantic of the barrier used in mlx5 code, but Dekel
> > did not reply about the semantic: are we protecting IO or general memory
> > access?
> 
> In mlx5 code we want to sync between two different writes:
> 1. write to system memory (RAM)
> 2. write to MMIO memory (device)
> 
> We need #1 to be visible on host memory before #2 is committed to NIC.
> We want to have a single type of barrier which will translate to the correct assembly based on the system arch, and in addition we want it light-weight as possible.
> 
> So far, when not running on power, we used the rte_wmb for that. On x86 and ARM systems it provided the needed guarantees.  
> It is also mentioned in the barrier doxygen on ARM arch:
> "
> Write memory barrier.                                            
>                                                                  
> Guarantees that the STORE operations generated before the barrier
> occur before the STORE operations generated after.
> "
> 
> It doesn't restrict to store to system memory only. 
> w/ power is on somewhat different and in fact rte_mb is required. It obviously miss the point of those barrier if we will need to use a different barrier based on the system arch. 
> 
> We need to align the definition of the different barriers in DPDK:
> 1. need a clear documentation of each. this should be global and not part of the specific implementation on each arch.
The global definition is in lib/librte_eal/common/include/generic/rte_atomic.h
There are some copy/paste in Arm32 and PPC that I will remove.
> 2. either modify ppc rte_wmb to match ARM and x86 ones or to define a new type of barrier which will sync between both I/O and stores to systems memory.
The basic memory barrier of DPDK does not mention
a difference between I/O and system memory.
It is not explicit (yet) but I assume it is protecting both.
So, in my opinion, we need to make it explicit in the doc,
and fix the PPC implementation to comply with this definition.
Anyway, I don't see any significant effort from IBM to move from
the alpha support stage to a real Open Source support.
PS: sending a mail every two months, to promise improvements, is not enough!
-----------------
> > 19/03/2019 11:05, Dekel Peled:
> > > Hi,
> > >
> > > For ppc, rte_io_mb() is defined as rte_mb(), which is defined as asm sync.
> > > According to comments in arch/ppc_64/rte_atomic.h, rte_wmb() and
> > rte_rmb() are the same as rte_mb(), for store and load respectively.
> > > My patch propose to define rte_wmb() and rte_rmb() as asm sync, like
> > rte_mb(), since using lwsync is incorrect for them.
> > >
> > > Regards,
> > > Dekel
> > >
> > > From: Chao Zhu <chaozhu@linux.vnet.ibm.com>
> > > > Dekel£¬
> > > >
> > > > To control the memory order for device memory, I think you should
> > > > use
> > > > rte_io_mb() instead of rte_mb(). This will generate correct result.
> > > > rte_wmb() is used for system memory.
> > > >
> > > > From: Dekel Peled <dekelp@mellanox.com>
> > > > >
> > > > > From previous patch description: "to improve performance on PPC64,
> > > > > use light weight sync instruction instead of sync instruction."
> > > > >
> > > > > Excerpt from IBM doc [1], section "Memory barrier instructions":
> > > > > "The second form of the sync instruction is light-weight sync, or lwsync.
> > > > > This form is used to control ordering for storage accesses to
> > > > > system memory only. It does not create a memory barrier for
> > > > > accesses to device
> > > > memory."
> > > > >
> > > > > This patch removes the use of lwsync, so calls to rte_wmb() and
> > > > > rte_rmb() will provide correct memory barrier to ensure order of
> > > > > accesses to system memory and device memory.
^ permalink raw reply	[flat|nested] 42+ messages in thread
* Re: [dpdk-dev] [PATCH] eal/ppc: remove fix of memory barrier for IBM POWER
  2019-03-19 20:45         ` Thomas Monjalon
  2019-03-19 20:45           ` Thomas Monjalon
@ 2019-03-20 22:40           ` Pradeep Satyanarayana
  2019-03-20 22:40             ` Pradeep Satyanarayana
  2019-03-21  8:49             ` Shahaf Shuler
  1 sibling, 2 replies; 42+ messages in thread
From: Pradeep Satyanarayana @ 2019-03-20 22:40 UTC (permalink / raw)
  To: Thomas Monjalon
  Cc: Chao Zhu, Dekel Peled, dev, Ori Kam, Shahaf Shuler, stable,
	Yongseok Koh, David Christensen, David Wilder
Thomas Monjalon <thomas@monjalon.net> wrote on 03/19/2019 01:45:01 PM:
> From: Thomas Monjalon <thomas@monjalon.net>
> To: Shahaf Shuler <shahafs@mellanox.com>
> Cc: Dekel Peled <dekelp@mellanox.com>, Chao Zhu
> <chaozhu@linux.vnet.ibm.com>, Yongseok Koh <yskoh@mellanox.com>,
> "dev@dpdk.org" <dev@dpdk.org>, Ori Kam <orika@mellanox.com>,
> "stable@dpdk.org" <stable@dpdk.org>, pradeep@us.ibm.com
> Date: 03/19/2019 01:45 PM
> Subject: Re: [PATCH] eal/ppc: remove fix of memory barrier for IBM POWER
>
> 19/03/2019 20:42, Shahaf Shuler:
> > Tuesday, March 19, 2019 1:15 PM, Thomas Monjalon:
> > > Subject: Re: [PATCH] eal/ppc: remove fix of memory barrier for IBM
POWER
> > >
> > > Guys, please let's avoid top-post.
> > >
> > > You are both not replying to each other:
> > >
> > > 1/ Dekel mentioned the IBM doc but Chao did not argue about the lack
of IO
> > > protection with lwsync.
> > > We assume that rte_mb should protect any access including IO.
> > >
> > > 2/ Chao asked about the semantic of the barrier used in mlx5
> code, but Dekel
> > > did not reply about the semantic: are we protecting IO or general
memory
> > > access?
> >
> > In mlx5 code we want to sync between two different writes:
> > 1. write to system memory (RAM)
> > 2. write to MMIO memory (device)
> >
> > We need #1 to be visible on host memory before #2 is committed to NIC.
> > We want to have a single type of barrier which will translate to
> the correct assembly based on the system arch, and in addition we
> want it light-weight as possible.
> >
> > So far, when not running on power, we used the rte_wmb for that.
> On x86 and ARM systems it provided the needed guarantees.
> > It is also mentioned in the barrier doxygen on ARM arch:
> > "
> > Write memory barrier.
> >
> > Guarantees that the STORE operations generated before the barrier
> > occur before the STORE operations generated after.
> > "
> >
> > It doesn't restrict to store to system memory only.
> > w/ power is on somewhat different and in fact rte_mb is required.
> It obviously miss the point of those barrier if we will need to use
> a different barrier based on the system arch.
> >
> > We need to align the definition of the different barriers in DPDK:
> > 1. need a clear documentation of each. this should be global and
> not part of the specific implementation on each arch.
A single approach may not work for all architectures. Power is different
from others, so we need to be able to accommodate that. More comments
below.
>
> The global definition is in
lib/librte_eal/common/include/generic/rte_atomic.h
>
> There are some copy/paste in Arm32 and PPC that I will remove.
>
> > 2. either modify ppc rte_wmb to match ARM and x86 ones or to
> define a new type of barrier which will sync between both I/O and
> stores to systems memory.
>
> The basic memory barrier of DPDK does not mention
> a difference between I/O and system memory.
In the case of Power, sync will cater to both I/O and system memory.
However, that
is too big a hammer in all cases.
> It is not explicit (yet) but I assume it is protecting both.
> So, in my opinion, we need to make it explicit in the doc,
> and fix the PPC implementation to comply with this definition.
>
> Anyway, I don't see any significant effort from IBM to move from
> the alpha support stage to a real Open Source support.
> PS: sending a mail every two months, to promise improvements, is not
enough!
We have been working to set up CI across several branches and during the
course of this
testing we discovered a segfault with mlx5 PMD and reported it to Mellanox.
We have worked
with them to validate several iterations of the patches. Unfortunately none
of this has been
visible to the community. We should be submitting some patches for DTS to
address issues seen
on Power after next week (vacation plans).
It is taking time and effort and is not visible externally since we are in
catch-up mode.
We discussed the patches and believe:
http://mails.dpdk.org/archives/dev/2019-March/126712.html
looks right and changes only the Power platform.
Dave Christensen had previously validated the following patch from
Mellanox:
http://mails.dpdk.org/archives/dev/2019-March/126726.html
We should retain lwsync, should not be removed as discussed in here:
http://mails.dpdk.org/archives/dev/2019-March/126746.html
>
> -----------------
>
> > > 19/03/2019 11:05, Dekel Peled:
> > > > Hi,
> > > >
> > > > For ppc, rte_io_mb() is defined as rte_mb(), which is defined
> as asm sync.
> > > > According to comments in arch/ppc_64/rte_atomic.h, rte_wmb() and
> > > rte_rmb() are the same as rte_mb(), for store and load respectively.
> > > > My patch propose to define rte_wmb() and rte_rmb() as asm sync,
like
> > > rte_mb(), since using lwsync is incorrect for them.
> > > >
> > > > Regards,
> > > > Dekel
> > > >
> > > > From: Chao Zhu <chaozhu@linux.vnet.ibm.com>
> > > > > Dekel£¬
> > > > >
> > > > > To control the memory order for device memory, I think you should
> > > > > use
> > > > > rte_io_mb() instead of rte_mb(). This will generate correct
result.
> > > > > rte_wmb() is used for system memory.
> > > > >
> > > > > From: Dekel Peled <dekelp@mellanox.com>
> > > > > >
> > > > > > From previous patch description: "to improve performance on
PPC64,
> > > > > > use light weight sync instruction instead of sync instruction."
> > > > > >
> > > > > > Excerpt from IBM doc [1], section "Memory barrier
instructions":
> > > > > > "The second form of the sync instruction is light-weight
> sync, or lwsync.
> > > > > > This form is used to control ordering for storage accesses to
> > > > > > system memory only. It does not create a memory barrier for
> > > > > > accesses to device
> > > > > memory."
> > > > > >
> > > > > > This patch removes the use of lwsync, so calls to rte_wmb() and
> > > > > > rte_rmb() will provide correct memory barrier to ensure order
of
> > > > > > accesses to system memory and device memory.
Thanks
Pradeep
pradeep@us.ibm.com
^ permalink raw reply	[flat|nested] 42+ messages in thread
* Re: [dpdk-dev] [PATCH] eal/ppc: remove fix of memory barrier for IBM POWER
  2019-03-20 22:40           ` Pradeep Satyanarayana
@ 2019-03-20 22:40             ` Pradeep Satyanarayana
  2019-03-21  8:49             ` Shahaf Shuler
  1 sibling, 0 replies; 42+ messages in thread
From: Pradeep Satyanarayana @ 2019-03-20 22:40 UTC (permalink / raw)
  To: Thomas Monjalon
  Cc: Chao Zhu, Dekel Peled, dev, Ori Kam, Shahaf Shuler, stable,
	Yongseok Koh, David Christensen, David Wilder
[-- Warning: decoded text below may be mangled, UTF-8 assumed --]
[-- Attachment #1: Type: text/plain; charset="UTF-8", Size: 6158 bytes --]
Thomas Monjalon <thomas@monjalon.net> wrote on 03/19/2019 01:45:01 PM:
> From: Thomas Monjalon <thomas@monjalon.net>
> To: Shahaf Shuler <shahafs@mellanox.com>
> Cc: Dekel Peled <dekelp@mellanox.com>, Chao Zhu
> <chaozhu@linux.vnet.ibm.com>, Yongseok Koh <yskoh@mellanox.com>,
> "dev@dpdk.org" <dev@dpdk.org>, Ori Kam <orika@mellanox.com>,
> "stable@dpdk.org" <stable@dpdk.org>, pradeep@us.ibm.com
> Date: 03/19/2019 01:45 PM
> Subject: Re: [PATCH] eal/ppc: remove fix of memory barrier for IBM POWER
>
> 19/03/2019 20:42, Shahaf Shuler:
> > Tuesday, March 19, 2019 1:15 PM, Thomas Monjalon:
> > > Subject: Re: [PATCH] eal/ppc: remove fix of memory barrier for IBM
POWER
> > >
> > > Guys, please let's avoid top-post.
> > >
> > > You are both not replying to each other:
> > >
> > > 1/ Dekel mentioned the IBM doc but Chao did not argue about the lack
of IO
> > > protection with lwsync.
> > > We assume that rte_mb should protect any access including IO.
> > >
> > > 2/ Chao asked about the semantic of the barrier used in mlx5
> code, but Dekel
> > > did not reply about the semantic: are we protecting IO or general
memory
> > > access?
> >
> > In mlx5 code we want to sync between two different writes:
> > 1. write to system memory (RAM)
> > 2. write to MMIO memory (device)
> >
> > We need #1 to be visible on host memory before #2 is committed to NIC.
> > We want to have a single type of barrier which will translate to
> the correct assembly based on the system arch, and in addition we
> want it light-weight as possible.
> >
> > So far, when not running on power, we used the rte_wmb for that.
> On x86 and ARM systems it provided the needed guarantees.
> > It is also mentioned in the barrier doxygen on ARM arch:
> > "
> > Write memory barrier.
> >
> > Guarantees that the STORE operations generated before the barrier
> > occur before the STORE operations generated after.
> > "
> >
> > It doesn't restrict to store to system memory only.
> > w/ power is on somewhat different and in fact rte_mb is required.
> It obviously miss the point of those barrier if we will need to use
> a different barrier based on the system arch.
> >
> > We need to align the definition of the different barriers in DPDK:
> > 1. need a clear documentation of each. this should be global and
> not part of the specific implementation on each arch.
A single approach may not work for all architectures. Power is different
from others, so we need to be able to accommodate that. More comments
below.
>
> The global definition is in
lib/librte_eal/common/include/generic/rte_atomic.h
>
> There are some copy/paste in Arm32 and PPC that I will remove.
>
> > 2. either modify ppc rte_wmb to match ARM and x86 ones or to
> define a new type of barrier which will sync between both I/O and
> stores to systems memory.
>
> The basic memory barrier of DPDK does not mention
> a difference between I/O and system memory.
In the case of Power, sync will cater to both I/O and system memory.
However, that
is too big a hammer in all cases.
> It is not explicit (yet) but I assume it is protecting both.
> So, in my opinion, we need to make it explicit in the doc,
> and fix the PPC implementation to comply with this definition.
>
> Anyway, I don't see any significant effort from IBM to move from
> the alpha support stage to a real Open Source support.
> PS: sending a mail every two months, to promise improvements, is not
enough!
We have been working to set up CI across several branches and during the
course of this
testing we discovered a segfault with mlx5 PMD and reported it to Mellanox.
We have worked
with them to validate several iterations of the patches. Unfortunately none
of this has been
visible to the community. We should be submitting some patches for DTS to
address issues seen
on Power after next week (vacation plans).
It is taking time and effort and is not visible externally since we are in
catch-up mode.
We discussed the patches and believe:
http://mails.dpdk.org/archives/dev/2019-March/126712.html
looks right and changes only the Power platform.
Dave Christensen had previously validated the following patch from
Mellanox:
http://mails.dpdk.org/archives/dev/2019-March/126726.html
We should retain lwsync, should not be removed as discussed in here:
http://mails.dpdk.org/archives/dev/2019-March/126746.html
>
> -----------------
>
> > > 19/03/2019 11:05, Dekel Peled:
> > > > Hi,
> > > >
> > > > For ppc, rte_io_mb() is defined as rte_mb(), which is defined
> as asm sync.
> > > > According to comments in arch/ppc_64/rte_atomic.h, rte_wmb() and
> > > rte_rmb() are the same as rte_mb(), for store and load respectively.
> > > > My patch propose to define rte_wmb() and rte_rmb() as asm sync,
like
> > > rte_mb(), since using lwsync is incorrect for them.
> > > >
> > > > Regards,
> > > > Dekel
> > > >
> > > > From: Chao Zhu <chaozhu@linux.vnet.ibm.com>
> > > > > Dekel£¬
> > > > >
> > > > > To control the memory order for device memory, I think you should
> > > > > use
> > > > > rte_io_mb() instead of rte_mb(). This will generate correct
result.
> > > > > rte_wmb() is used for system memory.
> > > > >
> > > > > From: Dekel Peled <dekelp@mellanox.com>
> > > > > >
> > > > > > From previous patch description: "to improve performance on
PPC64,
> > > > > > use light weight sync instruction instead of sync instruction."
> > > > > >
> > > > > > Excerpt from IBM doc [1], section "Memory barrier
instructions":
> > > > > > "The second form of the sync instruction is light-weight
> sync, or lwsync.
> > > > > > This form is used to control ordering for storage accesses to
> > > > > > system memory only. It does not create a memory barrier for
> > > > > > accesses to device
> > > > > memory."
> > > > > >
> > > > > > This patch removes the use of lwsync, so calls to rte_wmb() and
> > > > > > rte_rmb() will provide correct memory barrier to ensure order
of
> > > > > > accesses to system memory and device memory.
Thanks
Pradeep
pradeep@us.ibm.com
^ permalink raw reply	[flat|nested] 42+ messages in thread
* Re: [dpdk-dev] [PATCH] eal/ppc: remove fix of memory barrier for IBM POWER
  2019-03-20 22:40           ` Pradeep Satyanarayana
  2019-03-20 22:40             ` Pradeep Satyanarayana
@ 2019-03-21  8:49             ` Shahaf Shuler
  2019-03-21  8:49               ` Shahaf Shuler
  2019-03-22  1:40               ` Pradeep Satyanarayana
  1 sibling, 2 replies; 42+ messages in thread
From: Shahaf Shuler @ 2019-03-21  8:49 UTC (permalink / raw)
  To: pradeep, Thomas Monjalon
  Cc: Chao Zhu, Dekel Peled, dev, Ori Kam, stable, Yongseok Koh,
	David Christensen, David Wilder
Pradeep Satyanarayana <pradeep@us.ibm.com> wrote on Thu 3/21/2019 12:41 AM:
>Thomas Monjalon <thomas@monjalon.net> wrote on 03/19/2019 01:45:01 PM:
>
>> From: Thomas Monjalon <thomas@monjalon.net>
>> To: Shahaf Shuler <shahafs@mellanox.com>
>> Cc: Dekel Peled <dekelp@mellanox.com>, Chao Zhu
>> <chaozhu@linux.vnet.ibm.com>, Yongseok Koh <yskoh@mellanox.com>,
>> "dev@dpdk.org" <dev@dpdk.org>, Ori Kam <orika@mellanox.com>,
>> "stable@dpdk.org" <stable@dpdk.org>, pradeep@us.ibm.com
>> Date: 03/19/2019 01:45 PM
>> Subject: Re: [PATCH] eal/ppc: remove fix of memory barrier for IBM POWER
>>
[...]
>> >
>> > So far, when not running on power, we used the rte_wmb for that.
>> On x86 and ARM systems it provided the needed guarantees.
>> > It is also mentioned in the barrier doxygen on ARM arch:
>> > "
>> > Write memory barrier.
>> >
>> > Guarantees that the STORE operations generated before the barrier
>> > occur before the STORE operations generated after.
>> > "
>> >
>> > It doesn't restrict to store to system memory only.
>> > w/ power is on somewhat different and in fact rte_mb is required.
>> It obviously miss the point of those barrier if we will need to use
>> a different barrier based on the system arch.
>> >
>> > We need to align the definition of the different barriers in DPDK:
>> > 1. need a clear documentation of each. this should be global and
>> not part of the specific implementation on each arch.
>
>A single approach may not work for all architectures. Power is different
>from others, so we need to be able to accommodate that. More comments below.
it don't get this claim.
It is ok to have some differences between the different arch, but here you implement a well-defined barrier - rte_wmb.
if you see a need we can discuss to define a **new** barrier which sync STORE only to system memory, and will be able to utilize the lwsync command.
>
>>
>> The global definition is in lib/librte_eal/common/include/generic/rte_atomic.h
>>
>> There are some copy/paste in Arm32 and PPC that I will remove.
>>
>> > 2. either modify ppc rte_wmb to match ARM and x86 ones or to
>> define a new type of barrier which will sync between both I/O and
>> stores to systems memory.
>>
>> The basic memory barrier of DPDK does not mention
>> a difference between I/O and system memory.
>
>In the case of Power, sync will cater to both I/O and system memory. However, that
>is too big a hammer in all cases.
rte_wmb requires such sync. you propose to have the wrong barrier in favor of performance.
to mitigate this you can take my suggestion above and define a new, more lightweight one.
>
>> It is not explicit (yet) but I assume it is protecting both.
>> So, in my opinion, we need to make it explicit in the doc,
>> and fix the PPC implementation to comply with this definition.
>>
>> Anyway, I don't see any significant effort from IBM to move from
>> the alpha support stage to a real Open Source support.
>> PS: sending a mail every two months, to promise improvements, is not enough!
>
[...]
>
>We should retain lwsync, should not be removed as discussed in here:
>
>http://mails.dpdk.org/archives/dev/2019-March/126746.html
i don't agree.
it is very clear the rte_wmb implementation in power is broken and we need to fix this right away before other customers will hit the same issue.
^ permalink raw reply	[flat|nested] 42+ messages in thread
* Re: [dpdk-dev] [PATCH] eal/ppc: remove fix of memory barrier for IBM POWER
  2019-03-21  8:49             ` Shahaf Shuler
@ 2019-03-21  8:49               ` Shahaf Shuler
  2019-03-22  1:40               ` Pradeep Satyanarayana
  1 sibling, 0 replies; 42+ messages in thread
From: Shahaf Shuler @ 2019-03-21  8:49 UTC (permalink / raw)
  To: pradeep, Thomas Monjalon
  Cc: Chao Zhu, Dekel Peled, dev, Ori Kam, stable, Yongseok Koh,
	David Christensen, David Wilder
Pradeep Satyanarayana <pradeep@us.ibm.com> wrote on Thu 3/21/2019 12:41 AM:
>Thomas Monjalon <thomas@monjalon.net> wrote on 03/19/2019 01:45:01 PM:
>
>> From: Thomas Monjalon <thomas@monjalon.net>
>> To: Shahaf Shuler <shahafs@mellanox.com>
>> Cc: Dekel Peled <dekelp@mellanox.com>, Chao Zhu
>> <chaozhu@linux.vnet.ibm.com>, Yongseok Koh <yskoh@mellanox.com>,
>> "dev@dpdk.org" <dev@dpdk.org>, Ori Kam <orika@mellanox.com>,
>> "stable@dpdk.org" <stable@dpdk.org>, pradeep@us.ibm.com
>> Date: 03/19/2019 01:45 PM
>> Subject: Re: [PATCH] eal/ppc: remove fix of memory barrier for IBM POWER
>>
[...]
>> >
>> > So far, when not running on power, we used the rte_wmb for that.
>> On x86 and ARM systems it provided the needed guarantees.
>> > It is also mentioned in the barrier doxygen on ARM arch:
>> > "
>> > Write memory barrier.
>> >
>> > Guarantees that the STORE operations generated before the barrier
>> > occur before the STORE operations generated after.
>> > "
>> >
>> > It doesn't restrict to store to system memory only.
>> > w/ power is on somewhat different and in fact rte_mb is required.
>> It obviously miss the point of those barrier if we will need to use
>> a different barrier based on the system arch.
>> >
>> > We need to align the definition of the different barriers in DPDK:
>> > 1. need a clear documentation of each. this should be global and
>> not part of the specific implementation on each arch.
>
>A single approach may not work for all architectures. Power is different
>from others, so we need to be able to accommodate that. More comments below.
it don't get this claim.
It is ok to have some differences between the different arch, but here you implement a well-defined barrier - rte_wmb.
if you see a need we can discuss to define a **new** barrier which sync STORE only to system memory, and will be able to utilize the lwsync command.
>
>>
>> The global definition is in lib/librte_eal/common/include/generic/rte_atomic.h
>>
>> There are some copy/paste in Arm32 and PPC that I will remove.
>>
>> > 2. either modify ppc rte_wmb to match ARM and x86 ones or to
>> define a new type of barrier which will sync between both I/O and
>> stores to systems memory.
>>
>> The basic memory barrier of DPDK does not mention
>> a difference between I/O and system memory.
>
>In the case of Power, sync will cater to both I/O and system memory. However, that
>is too big a hammer in all cases.
rte_wmb requires such sync. you propose to have the wrong barrier in favor of performance.
to mitigate this you can take my suggestion above and define a new, more lightweight one.
>
>> It is not explicit (yet) but I assume it is protecting both.
>> So, in my opinion, we need to make it explicit in the doc,
>> and fix the PPC implementation to comply with this definition.
>>
>> Anyway, I don't see any significant effort from IBM to move from
>> the alpha support stage to a real Open Source support.
>> PS: sending a mail every two months, to promise improvements, is not enough!
>
[...]
>
>We should retain lwsync, should not be removed as discussed in here:
>
>http://mails.dpdk.org/archives/dev/2019-March/126746.html
i don't agree.
it is very clear the rte_wmb implementation in power is broken and we need to fix this right away before other customers will hit the same issue.
^ permalink raw reply	[flat|nested] 42+ messages in thread
* Re: [dpdk-dev] [PATCH] eal/ppc: remove fix of memory barrier for IBM POWER
  2019-03-21  8:49             ` Shahaf Shuler
  2019-03-21  8:49               ` Shahaf Shuler
@ 2019-03-22  1:40               ` Pradeep Satyanarayana
  2019-03-22  1:40                 ` Pradeep Satyanarayana
  2019-03-22  8:49                 ` Thomas Monjalon
  1 sibling, 2 replies; 42+ messages in thread
From: Pradeep Satyanarayana @ 2019-03-22  1:40 UTC (permalink / raw)
  To: Shahaf Shuler
  Cc: Chao Zhu, Dekel Peled, dev, David Christensen, Ori Kam, stable,
	Thomas Monjalon, David Wilder, Yongseok Koh
Shahaf Shuler <shahafs@mellanox.com> wrote on 03/21/2019 01:49:39 AM:
> From: Shahaf Shuler <shahafs@mellanox.com>
> To: "pradeep@us.ibm.com" <pradeep@us.ibm.com>, Thomas Monjalon
> <thomas@monjalon.net>
> Cc: Chao Zhu <chaozhu@linux.vnet.ibm.com>, Dekel Peled
> <dekelp@mellanox.com>, "dev@dpdk.org" <dev@dpdk.org>, Ori Kam
> <orika@mellanox.com>, "stable@dpdk.org" <stable@dpdk.org>, Yongseok
> Koh <yskoh@mellanox.com>, David Christensen <drc@ibm.com>, David
> Wilder <wilder@us.ibm.com>
> Date: 03/21/2019 01:54 AM
> Subject: RE: [PATCH] eal/ppc: remove fix of memory barrier for IBM POWER
>
> Pradeep Satyanarayana <pradeep@us.ibm.com> wrote on Thu 3/21/2019 12:41
AM:
> >Thomas Monjalon <thomas@monjalon.net> wrote on 03/19/2019 01:45:01 PM:
> >
> >> From: Thomas Monjalon <thomas@monjalon.net>
> >> To: Shahaf Shuler <shahafs@mellanox.com>
> >> Cc: Dekel Peled <dekelp@mellanox.com>, Chao Zhu
> >> <chaozhu@linux.vnet.ibm.com>, Yongseok Koh <yskoh@mellanox.com>,
> >> "dev@dpdk.org" <dev@dpdk.org>, Ori Kam <orika@mellanox.com>,
> >> "stable@dpdk.org" <stable@dpdk.org>, pradeep@us.ibm.com
> >> Date: 03/19/2019 01:45 PM
> >> Subject: Re: [PATCH] eal/ppc: remove fix of memory barrier for IBM
POWER
> >>
>
> [...]
>
> >> >
> >> > So far, when not running on power, we used the rte_wmb for that.
> >> On x86 and ARM systems it provided the needed guarantees.
> >> > It is also mentioned in the barrier doxygen on ARM arch:
> >> > "
> >> > Write memory barrier.
> >> >
> >> > Guarantees that the STORE operations generated before the barrier
> >> > occur before the STORE operations generated after.
> >> > "
> >> >
> >> > It doesn't restrict to store to system memory only.
> >> > w/ power is on somewhat different and in fact rte_mb is required.
> >> It obviously miss the point of those barrier if we will need to use
> >> a different barrier based on the system arch.
> >> >
> >> > We need to align the definition of the different barriers in DPDK:
> >> > 1. need a clear documentation of each. this should be global and
> >> not part of the specific implementation on each arch.
> >
> >A single approach may not work for all architectures. Power is different
> >from others, so we need to be able to accommodate that. More comments
below.
>
> it don't get this claim.
> It is ok to have some differences between the different arch, but
> here you implement a well-defined barrier - rte_wmb.
> if you see a need we can discuss to define a **new** barrier which
> sync STORE only to system memory, and will be able to utilize the
> lwsync command.
>
> >
> >>
> >> The global definition is in lib/librte_eal/common/include/
> generic/rte_atomic.h
> >>
> >> There are some copy/paste in Arm32 and PPC that I will remove.
> >>
> >> > 2. either modify ppc rte_wmb to match ARM and x86 ones or to
> >> define a new type of barrier which will sync between both I/O and
> >> stores to systems memory.
> >>
> >> The basic memory barrier of DPDK does not mention
> >> a difference between I/O and system memory.
> >
> >In the case of Power, sync will cater to both I/O and system
> memory. However, that
> >is too big a hammer in all cases.
>
> rte_wmb requires such sync. you propose to have the wrong barrier in
> favor of performance.
> to mitigate this you can take my suggestion above and define a new,
> more lightweight one.
>
> >
> >> It is not explicit (yet) but I assume it is protecting both.
> >> So, in my opinion, we need to make it explicit in the doc,
> >> and fix the PPC implementation to comply with this definition.
> >>
> >> Anyway, I don't see any significant effort from IBM to move from
> >> the alpha support stage to a real Open Source support.
> >> PS: sending a mail every two months, to promise improvements, is
> not enough!
> >
>
> […]
>
> >
> >We should retain lwsync, should not be removed as discussed in here:
> >
> >http://mails.dpdk.org/archives/dev/2019-March/126746.html
>
> i don't agree.
> it is very clear the rte_wmb implementation in power is broken and
> we need to fix this right away before other customers will hit the
> same issue.
In the DPDK source I see a couple of different classes of memory barriers.
I am
not clear on the usage of these in the drivers, but I would think the
guidelines
to be as shown below (for Power):
- rte_[rw]mb (general memory barrier) --> should be lwsync
- rte_smp_[rw]mb (SMP memory barrier) -->should be lwsync
- rte_io_[rw]mb (I/O memory barrier)  --> should be sync
- rte_cio_[rw]mb (coherent I/O memory barrier) -->should be sync
lwsync is appropriate for cases where CPUs are accessing cacheable memory
(i.e. Memory Coherence Required) while the sync instruction should be used
in all other cases.
With the patch in:
http://mails.dpdk.org/archives/dev/2019-March/126746.html
It converts even the rte_smp_[rw]mb into sync. That is not what the
rte_smp*() should
be implementing as per the guidelines above.
static __rte_always_inline void
mlx5_tx_dbrec_cond_wmb(struct mlx5_txq_data *txq, volatile struct mlx5_wqe
*wqe,
                       int cond)
{
        uint64_t *dst = (uint64_t *)((uintptr_t)txq->bf_reg);
        volatile uint64_t *src = ((volatile uint64_t *)wqe);
        rte_cio_wmb(); --> would rte_cio_rmb() be more appropriate?
        *txq->qp_db = rte_cpu_to_be_32(txq->wqe_ci);
        /* Ensure ordering between DB record and BF copy. */
        rte_wmb(); --> what has been established is that for Power we need
"sync" instead of lwsync
		We are dealing with device memory -should we be using an
rte_io_wmb() here?
        mlx5_uar_write64_relaxed(*src, dst, txq->uar_lock);
        if (cond)
                rte_wmb(); --> what about here? Are there conditions when
we need sync?
}
Thanks
Pradeep
pradeep@us.ibm.com
^ permalink raw reply	[flat|nested] 42+ messages in thread
* Re: [dpdk-dev] [PATCH] eal/ppc: remove fix of memory barrier for IBM POWER
  2019-03-22  1:40               ` Pradeep Satyanarayana
@ 2019-03-22  1:40                 ` Pradeep Satyanarayana
  2019-03-22  8:49                 ` Thomas Monjalon
  1 sibling, 0 replies; 42+ messages in thread
From: Pradeep Satyanarayana @ 2019-03-22  1:40 UTC (permalink / raw)
  To: Shahaf Shuler
  Cc: Chao Zhu, Dekel Peled, dev, David Christensen, Ori Kam, stable,
	Thomas Monjalon, David Wilder, Yongseok Koh
Shahaf Shuler <shahafs@mellanox.com> wrote on 03/21/2019 01:49:39 AM:
> From: Shahaf Shuler <shahafs@mellanox.com>
> To: "pradeep@us.ibm.com" <pradeep@us.ibm.com>, Thomas Monjalon
> <thomas@monjalon.net>
> Cc: Chao Zhu <chaozhu@linux.vnet.ibm.com>, Dekel Peled
> <dekelp@mellanox.com>, "dev@dpdk.org" <dev@dpdk.org>, Ori Kam
> <orika@mellanox.com>, "stable@dpdk.org" <stable@dpdk.org>, Yongseok
> Koh <yskoh@mellanox.com>, David Christensen <drc@ibm.com>, David
> Wilder <wilder@us.ibm.com>
> Date: 03/21/2019 01:54 AM
> Subject: RE: [PATCH] eal/ppc: remove fix of memory barrier for IBM POWER
>
> Pradeep Satyanarayana <pradeep@us.ibm.com> wrote on Thu 3/21/2019 12:41
AM:
> >Thomas Monjalon <thomas@monjalon.net> wrote on 03/19/2019 01:45:01 PM:
> >
> >> From: Thomas Monjalon <thomas@monjalon.net>
> >> To: Shahaf Shuler <shahafs@mellanox.com>
> >> Cc: Dekel Peled <dekelp@mellanox.com>, Chao Zhu
> >> <chaozhu@linux.vnet.ibm.com>, Yongseok Koh <yskoh@mellanox.com>,
> >> "dev@dpdk.org" <dev@dpdk.org>, Ori Kam <orika@mellanox.com>,
> >> "stable@dpdk.org" <stable@dpdk.org>, pradeep@us.ibm.com
> >> Date: 03/19/2019 01:45 PM
> >> Subject: Re: [PATCH] eal/ppc: remove fix of memory barrier for IBM
POWER
> >>
>
> [...]
>
> >> >
> >> > So far, when not running on power, we used the rte_wmb for that.
> >> On x86 and ARM systems it provided the needed guarantees.
> >> > It is also mentioned in the barrier doxygen on ARM arch:
> >> > "
> >> > Write memory barrier.
> >> >
> >> > Guarantees that the STORE operations generated before the barrier
> >> > occur before the STORE operations generated after.
> >> > "
> >> >
> >> > It doesn't restrict to store to system memory only.
> >> > w/ power is on somewhat different and in fact rte_mb is required.
> >> It obviously miss the point of those barrier if we will need to use
> >> a different barrier based on the system arch.
> >> >
> >> > We need to align the definition of the different barriers in DPDK:
> >> > 1. need a clear documentation of each. this should be global and
> >> not part of the specific implementation on each arch.
> >
> >A single approach may not work for all architectures. Power is different
> >from others, so we need to be able to accommodate that. More comments
below.
>
> it don't get this claim.
> It is ok to have some differences between the different arch, but
> here you implement a well-defined barrier - rte_wmb.
> if you see a need we can discuss to define a **new** barrier which
> sync STORE only to system memory, and will be able to utilize the
> lwsync command.
>
> >
> >>
> >> The global definition is in lib/librte_eal/common/include/
> generic/rte_atomic.h
> >>
> >> There are some copy/paste in Arm32 and PPC that I will remove.
> >>
> >> > 2. either modify ppc rte_wmb to match ARM and x86 ones or to
> >> define a new type of barrier which will sync between both I/O and
> >> stores to systems memory.
> >>
> >> The basic memory barrier of DPDK does not mention
> >> a difference between I/O and system memory.
> >
> >In the case of Power, sync will cater to both I/O and system
> memory. However, that
> >is too big a hammer in all cases.
>
> rte_wmb requires such sync. you propose to have the wrong barrier in
> favor of performance.
> to mitigate this you can take my suggestion above and define a new,
> more lightweight one.
>
> >
> >> It is not explicit (yet) but I assume it is protecting both.
> >> So, in my opinion, we need to make it explicit in the doc,
> >> and fix the PPC implementation to comply with this definition.
> >>
> >> Anyway, I don't see any significant effort from IBM to move from
> >> the alpha support stage to a real Open Source support.
> >> PS: sending a mail every two months, to promise improvements, is
> not enough!
> >
>
> […]
>
> >
> >We should retain lwsync, should not be removed as discussed in here:
> >
> >http://mails.dpdk.org/archives/dev/2019-March/126746.html
>
> i don't agree.
> it is very clear the rte_wmb implementation in power is broken and
> we need to fix this right away before other customers will hit the
> same issue.
In the DPDK source I see a couple of different classes of memory barriers.
I am
not clear on the usage of these in the drivers, but I would think the
guidelines
to be as shown below (for Power):
- rte_[rw]mb (general memory barrier) --> should be lwsync
- rte_smp_[rw]mb (SMP memory barrier) -->should be lwsync
- rte_io_[rw]mb (I/O memory barrier)  --> should be sync
- rte_cio_[rw]mb (coherent I/O memory barrier) -->should be sync
lwsync is appropriate for cases where CPUs are accessing cacheable memory
(i.e. Memory Coherence Required) while the sync instruction should be used
in all other cases.
With the patch in:
http://mails.dpdk.org/archives/dev/2019-March/126746.html
It converts even the rte_smp_[rw]mb into sync. That is not what the
rte_smp*() should
be implementing as per the guidelines above.
static __rte_always_inline void
mlx5_tx_dbrec_cond_wmb(struct mlx5_txq_data *txq, volatile struct mlx5_wqe
*wqe,
                       int cond)
{
        uint64_t *dst = (uint64_t *)((uintptr_t)txq->bf_reg);
        volatile uint64_t *src = ((volatile uint64_t *)wqe);
        rte_cio_wmb(); --> would rte_cio_rmb() be more appropriate?
        *txq->qp_db = rte_cpu_to_be_32(txq->wqe_ci);
        /* Ensure ordering between DB record and BF copy. */
        rte_wmb(); --> what has been established is that for Power we need
"sync" instead of lwsync
		We are dealing with device memory -should we be using an
rte_io_wmb() here?
        mlx5_uar_write64_relaxed(*src, dst, txq->uar_lock);
        if (cond)
                rte_wmb(); --> what about here? Are there conditions when
we need sync?
}
Thanks
Pradeep
pradeep@us.ibm.com
^ permalink raw reply	[flat|nested] 42+ messages in thread
* Re: [dpdk-dev] [PATCH] eal/ppc: remove fix of memory barrier for IBM POWER
  2019-03-22  1:40               ` Pradeep Satyanarayana
  2019-03-22  1:40                 ` Pradeep Satyanarayana
@ 2019-03-22  8:49                 ` Thomas Monjalon
  2019-03-22  8:49                   ` Thomas Monjalon
  2019-03-22 15:30                   ` Pradeep Satyanarayana
  1 sibling, 2 replies; 42+ messages in thread
From: Thomas Monjalon @ 2019-03-22  8:49 UTC (permalink / raw)
  To: Pradeep Satyanarayana, David Wilder
  Cc: Shahaf Shuler, Chao Zhu, Dekel Peled, dev, David Christensen,
	Ori Kam, Yongseok Koh, konstantin.ananyev, ola.liljedahl,
	honnappa.nagarahalli, bruce.richardson
We need to agree on the definitions.
Please see below,
22/03/2019 02:40, Pradeep Satyanarayana:
> Shahaf Shuler <shahafs@mellanox.com> wrote on 03/21/2019 01:49:39 AM:
> > Pradeep Satyanarayana <pradeep@us.ibm.com> wrote on Thu 3/21/2019 12:41
> AM:
> > >> > So far, when not running on power, we used the rte_wmb for that.
> > >> On x86 and ARM systems it provided the needed guarantees.
> > >> > It is also mentioned in the barrier doxygen on ARM arch:
> > >> > "
> > >> > Write memory barrier.
> > >> >
> > >> > Guarantees that the STORE operations generated before the barrier
> > >> > occur before the STORE operations generated after.
> > >> > "
> > >> >
> > >> > It doesn't restrict to store to system memory only.
> > >> > w/ power is on somewhat different and in fact rte_mb is required.
> > >> It obviously miss the point of those barrier if we will need to use
> > >> a different barrier based on the system arch.
> > >> >
> > >> > We need to align the definition of the different barriers in DPDK:
> > >> > 1. need a clear documentation of each. this should be global and
> > >> not part of the specific implementation on each arch.
> > >
> > >A single approach may not work for all architectures. Power is different
> > >from others, so we need to be able to accommodate that. More comments
> below.
> >
> > it don't get this claim.
> > It is ok to have some differences between the different arch, but
> > here you implement a well-defined barrier - rte_wmb.
> > if you see a need we can discuss to define a **new** barrier which
> > sync STORE only to system memory, and will be able to utilize the
> > lwsync command.
> >
> > >
> > >> The global definition is in lib/librte_eal/common/include/
> > generic/rte_atomic.h
> > >>
> > >> There are some copy/paste in Arm32 and PPC that I will remove.
> > >>
> > >> > 2. either modify ppc rte_wmb to match ARM and x86 ones or to
> > >> define a new type of barrier which will sync between both I/O and
> > >> stores to systems memory.
> > >>
> > >> The basic memory barrier of DPDK does not mention
> > >> a difference between I/O and system memory.
> > >
> > >In the case of Power, sync will cater to both I/O and system
> > memory. However, that
> > >is too big a hammer in all cases.
> >
> > rte_wmb requires such sync. you propose to have the wrong barrier in
> > favor of performance.
> > to mitigate this you can take my suggestion above and define a new,
> > more lightweight one.
> >
> > >
> > >> It is not explicit (yet) but I assume it is protecting both.
> > >> So, in my opinion, we need to make it explicit in the doc,
> > >> and fix the PPC implementation to comply with this definition.
> > >>
> > >> Anyway, I don't see any significant effort from IBM to move from
> > >> the alpha support stage to a real Open Source support.
> > >> PS: sending a mail every two months, to promise improvements, is
> > not enough!
> > >
> >
> > […]
> >
> > >
> > >We should retain lwsync, should not be removed as discussed in here:
> > >
> > >http://mails.dpdk.org/archives/dev/2019-March/126746.html
> >
> > i don't agree.
> > it is very clear the rte_wmb implementation in power is broken and
> > we need to fix this right away before other customers will hit the
> > same issue.
> 
> 
> In the DPDK source I see a couple of different classes of memory barriers.
> I am
> not clear on the usage of these in the drivers, but I would think the
> guidelines
> to be as shown below (for Power):
> 
> - rte_[rw]mb (general memory barrier) --> should be lwsync
This is what may be discussed.
The assumption is that the general memory barrier should cover
all cases (CPU caches, SMP and I/O).
That's why we think it should "sync" for Power.
> - rte_smp_[rw]mb (SMP memory barrier) -->should be lwsync
> - rte_io_[rw]mb (I/O memory barrier)  --> should be sync
> - rte_cio_[rw]mb (coherent I/O memory barrier) -->should be sync
> 
> lwsync is appropriate for cases where CPUs are accessing cacheable memory
> (i.e. Memory Coherence Required) while the sync instruction should be used
> in all other cases.
> 
> With the patch in:
> http://mails.dpdk.org/archives/dev/2019-March/126746.html
> 
> It converts even the rte_smp_[rw]mb into sync. That is not what the
> rte_smp*() should
> be implementing as per the guidelines above.
> 
> static __rte_always_inline void
> mlx5_tx_dbrec_cond_wmb(struct mlx5_txq_data *txq, volatile struct mlx5_wqe
> *wqe,
>                        int cond)
> {
>         uint64_t *dst = (uint64_t *)((uintptr_t)txq->bf_reg);
>         volatile uint64_t *src = ((volatile uint64_t *)wqe);
> 
>         rte_cio_wmb(); --> would rte_cio_rmb() be more appropriate?
>         *txq->qp_db = rte_cpu_to_be_32(txq->wqe_ci);
>         /* Ensure ordering between DB record and BF copy. */
>         rte_wmb(); --> what has been established is that for Power we need
> "sync" instead of lwsync
> 		We are dealing with device memory -should we be using an
> rte_io_wmb() here?
> 
>         mlx5_uar_write64_relaxed(*src, dst, txq->uar_lock);
>         if (cond)
>                 rte_wmb(); --> what about here? Are there conditions when
> we need sync?
> }
^ permalink raw reply	[flat|nested] 42+ messages in thread
* Re: [dpdk-dev] [PATCH] eal/ppc: remove fix of memory barrier for IBM POWER
  2019-03-22  8:49                 ` Thomas Monjalon
@ 2019-03-22  8:49                   ` Thomas Monjalon
  2019-03-22 15:30                   ` Pradeep Satyanarayana
  1 sibling, 0 replies; 42+ messages in thread
From: Thomas Monjalon @ 2019-03-22  8:49 UTC (permalink / raw)
  To: Pradeep Satyanarayana, David Wilder
  Cc: Shahaf Shuler, Chao Zhu, Dekel Peled, dev, David Christensen,
	Ori Kam, Yongseok Koh, konstantin.ananyev, ola.liljedahl,
	honnappa.nagarahalli, bruce.richardson
We need to agree on the definitions.
Please see below,
22/03/2019 02:40, Pradeep Satyanarayana:
> Shahaf Shuler <shahafs@mellanox.com> wrote on 03/21/2019 01:49:39 AM:
> > Pradeep Satyanarayana <pradeep@us.ibm.com> wrote on Thu 3/21/2019 12:41
> AM:
> > >> > So far, when not running on power, we used the rte_wmb for that.
> > >> On x86 and ARM systems it provided the needed guarantees.
> > >> > It is also mentioned in the barrier doxygen on ARM arch:
> > >> > "
> > >> > Write memory barrier.
> > >> >
> > >> > Guarantees that the STORE operations generated before the barrier
> > >> > occur before the STORE operations generated after.
> > >> > "
> > >> >
> > >> > It doesn't restrict to store to system memory only.
> > >> > w/ power is on somewhat different and in fact rte_mb is required.
> > >> It obviously miss the point of those barrier if we will need to use
> > >> a different barrier based on the system arch.
> > >> >
> > >> > We need to align the definition of the different barriers in DPDK:
> > >> > 1. need a clear documentation of each. this should be global and
> > >> not part of the specific implementation on each arch.
> > >
> > >A single approach may not work for all architectures. Power is different
> > >from others, so we need to be able to accommodate that. More comments
> below.
> >
> > it don't get this claim.
> > It is ok to have some differences between the different arch, but
> > here you implement a well-defined barrier - rte_wmb.
> > if you see a need we can discuss to define a **new** barrier which
> > sync STORE only to system memory, and will be able to utilize the
> > lwsync command.
> >
> > >
> > >> The global definition is in lib/librte_eal/common/include/
> > generic/rte_atomic.h
> > >>
> > >> There are some copy/paste in Arm32 and PPC that I will remove.
> > >>
> > >> > 2. either modify ppc rte_wmb to match ARM and x86 ones or to
> > >> define a new type of barrier which will sync between both I/O and
> > >> stores to systems memory.
> > >>
> > >> The basic memory barrier of DPDK does not mention
> > >> a difference between I/O and system memory.
> > >
> > >In the case of Power, sync will cater to both I/O and system
> > memory. However, that
> > >is too big a hammer in all cases.
> >
> > rte_wmb requires such sync. you propose to have the wrong barrier in
> > favor of performance.
> > to mitigate this you can take my suggestion above and define a new,
> > more lightweight one.
> >
> > >
> > >> It is not explicit (yet) but I assume it is protecting both.
> > >> So, in my opinion, we need to make it explicit in the doc,
> > >> and fix the PPC implementation to comply with this definition.
> > >>
> > >> Anyway, I don't see any significant effort from IBM to move from
> > >> the alpha support stage to a real Open Source support.
> > >> PS: sending a mail every two months, to promise improvements, is
> > not enough!
> > >
> >
> > […]
> >
> > >
> > >We should retain lwsync, should not be removed as discussed in here:
> > >
> > >http://mails.dpdk.org/archives/dev/2019-March/126746.html
> >
> > i don't agree.
> > it is very clear the rte_wmb implementation in power is broken and
> > we need to fix this right away before other customers will hit the
> > same issue.
> 
> 
> In the DPDK source I see a couple of different classes of memory barriers.
> I am
> not clear on the usage of these in the drivers, but I would think the
> guidelines
> to be as shown below (for Power):
> 
> - rte_[rw]mb (general memory barrier) --> should be lwsync
This is what may be discussed.
The assumption is that the general memory barrier should cover
all cases (CPU caches, SMP and I/O).
That's why we think it should "sync" for Power.
> - rte_smp_[rw]mb (SMP memory barrier) -->should be lwsync
> - rte_io_[rw]mb (I/O memory barrier)  --> should be sync
> - rte_cio_[rw]mb (coherent I/O memory barrier) -->should be sync
> 
> lwsync is appropriate for cases where CPUs are accessing cacheable memory
> (i.e. Memory Coherence Required) while the sync instruction should be used
> in all other cases.
> 
> With the patch in:
> http://mails.dpdk.org/archives/dev/2019-March/126746.html
> 
> It converts even the rte_smp_[rw]mb into sync. That is not what the
> rte_smp*() should
> be implementing as per the guidelines above.
> 
> static __rte_always_inline void
> mlx5_tx_dbrec_cond_wmb(struct mlx5_txq_data *txq, volatile struct mlx5_wqe
> *wqe,
>                        int cond)
> {
>         uint64_t *dst = (uint64_t *)((uintptr_t)txq->bf_reg);
>         volatile uint64_t *src = ((volatile uint64_t *)wqe);
> 
>         rte_cio_wmb(); --> would rte_cio_rmb() be more appropriate?
>         *txq->qp_db = rte_cpu_to_be_32(txq->wqe_ci);
>         /* Ensure ordering between DB record and BF copy. */
>         rte_wmb(); --> what has been established is that for Power we need
> "sync" instead of lwsync
> 		We are dealing with device memory -should we be using an
> rte_io_wmb() here?
> 
>         mlx5_uar_write64_relaxed(*src, dst, txq->uar_lock);
>         if (cond)
>                 rte_wmb(); --> what about here? Are there conditions when
> we need sync?
> }
^ permalink raw reply	[flat|nested] 42+ messages in thread
* Re: [dpdk-dev] [PATCH] eal/ppc: remove fix of memory barrier for IBM POWER
  2019-03-22  8:49                 ` Thomas Monjalon
  2019-03-22  8:49                   ` Thomas Monjalon
@ 2019-03-22 15:30                   ` Pradeep Satyanarayana
  2019-03-22 15:30                     ` Pradeep Satyanarayana
  2019-03-22 17:51                     ` Thomas Monjalon
  1 sibling, 2 replies; 42+ messages in thread
From: Pradeep Satyanarayana @ 2019-03-22 15:30 UTC (permalink / raw)
  To: Thomas Monjalon
  Cc: bruce.richardson, Chao Zhu, Dekel Peled, dev, David Christensen,
	honnappa.nagarahalli, konstantin.ananyev, ola.liljedahl, Ori Kam,
	Shahaf Shuler, David Wilder, Yongseok Koh
Thomas Monjalon <thomas@monjalon.net> wrote on 03/22/2019 01:49:03 AM:
> From: Thomas Monjalon <thomas@monjalon.net>
> To: Pradeep Satyanarayana <pradeep@us.ibm.com>, David Wilder
> <wilder@us.ibm.com>
> Cc: Shahaf Shuler <shahafs@mellanox.com>, Chao Zhu
> <chaozhu@linux.vnet.ibm.com>, Dekel Peled <dekelp@mellanox.com>,
> dev@dpdk.org, David Christensen <drc@ibm.com>, Ori Kam
> <orika@mellanox.com>, Yongseok Koh <yskoh@mellanox.com>,
> konstantin.ananyev@intel.com, ola.liljedahl@arm.com,
> honnappa.nagarahalli@arm.com, bruce.richardson@intel.com
> Date: 03/22/2019 01:49 AM
> Subject: Re: [PATCH] eal/ppc: remove fix of memory barrier for IBM POWER
>
> We need to agree on the definitions.
> Please see below,
>
> 22/03/2019 02:40, Pradeep Satyanarayana:
> > Shahaf Shuler <shahafs@mellanox.com> wrote on 03/21/2019 01:49:39 AM:
> > > Pradeep Satyanarayana <pradeep@us.ibm.com> wrote on Thu 3/21/2019
12:41
> > AM:
> > > >> > So far, when not running on power, we used the rte_wmb for that.
> > > >> On x86 and ARM systems it provided the needed guarantees.
> > > >> > It is also mentioned in the barrier doxygen on ARM arch:
> > > >> > "
> > > >> > Write memory barrier.
> > > >> >
> > > >> > Guarantees that the STORE operations generated before the
barrier
> > > >> > occur before the STORE operations generated after.
> > > >> > "
> > > >> >
> > > >> > It doesn't restrict to store to system memory only.
> > > >> > w/ power is on somewhat different and in fact rte_mb is
required.
> > > >> It obviously miss the point of those barrier if we will need to
use
> > > >> a different barrier based on the system arch.
> > > >> >
> > > >> > We need to align the definition of the different barriers in
DPDK:
> > > >> > 1. need a clear documentation of each. this should be global and
> > > >> not part of the specific implementation on each arch.
> > > >
> > > >A single approach may not work for all architectures. Power is
different
> > > >from others, so we need to be able to accommodate that. More
comments
> > below.
> > >
> > > it don't get this claim.
> > > It is ok to have some differences between the different arch, but
> > > here you implement a well-defined barrier - rte_wmb.
> > > if you see a need we can discuss to define a **new** barrier which
> > > sync STORE only to system memory, and will be able to utilize the
> > > lwsync command.
> > >
> > > >
> > > >> The global definition is in lib/librte_eal/common/include/
> > > generic/rte_atomic.h
> > > >>
> > > >> There are some copy/paste in Arm32 and PPC that I will remove.
> > > >>
> > > >> > 2. either modify ppc rte_wmb to match ARM and x86 ones or to
> > > >> define a new type of barrier which will sync between both I/O and
> > > >> stores to systems memory.
> > > >>
> > > >> The basic memory barrier of DPDK does not mention
> > > >> a difference between I/O and system memory.
> > > >
> > > >In the case of Power, sync will cater to both I/O and system
> > > memory. However, that
> > > >is too big a hammer in all cases.
> > >
> > > rte_wmb requires such sync. you propose to have the wrong barrier in
> > > favor of performance.
> > > to mitigate this you can take my suggestion above and define a new,
> > > more lightweight one.
> > >
> > > >
> > > >> It is not explicit (yet) but I assume it is protecting both.
> > > >> So, in my opinion, we need to make it explicit in the doc,
> > > >> and fix the PPC implementation to comply with this definition.
> > > >>
> > > >> Anyway, I don't see any significant effort from IBM to move from
> > > >> the alpha support stage to a real Open Source support.
> > > >> PS: sending a mail every two months, to promise improvements, is
> > > not enough!
> > > >
> > >
> > > […]
> > >
> > > >
> > > >We should retain lwsync, should not be removed as discussed in here:
> > > >
> > > >https://urldefense.proofpoint.com/v2/url?
>
u=http-3A__mails.dpdk.org_archives_dev_2019-2DMarch_126746.html&d=DwIFaQ&c=jf_iaSHvJObTbx-
> siA1ZOg&r=co4lCofxrQP11yIVMply-
> QYvsUyeKJkYY_jL1QVgeGA&m=SNGJjgGF8mHR9t-
>
ixYHznWvUoXvC3zlbm8q1vlS4x_s&s=TXCEGDEjCiUW1ug5kDwlfuUaqRMowGhpihjly5zEZp8&e=
> > >
> > > i don't agree.
> > > it is very clear the rte_wmb implementation in power is broken and
> > > we need to fix this right away before other customers will hit the
> > > same issue.
> >
> >
> > In the DPDK source I see a couple of different classes of memory
barriers.
> > I am
> > not clear on the usage of these in the drivers, but I would think the
> > guidelines
> > to be as shown below (for Power):
> >
> > - rte_[rw]mb (general memory barrier) --> should be lwsync
>
> This is what may be discussed.
> The assumption is that the general memory barrier should cover
> all cases (CPU caches, SMP and I/O).
> That's why we think it should "sync" for Power.
In that case, at a minimum we must de-link rte_smp_[rw]mb from rte_[rw]mb
and retain it as lwsync. Agreed?
>
> > - rte_smp_[rw]mb (SMP memory barrier) -->should be lwsync
> > - rte_io_[rw]mb (I/O memory barrier)  --> should be sync
> > - rte_cio_[rw]mb (coherent I/O memory barrier) -->should be sync
> >
> > lwsync is appropriate for cases where CPUs are accessing cacheable
memory
> > (i.e. Memory Coherence Required) while the sync instruction should be
used
> > in all other cases.
> >
Thanks
Pradeep
pradeep@us.ibm.com
^ permalink raw reply	[flat|nested] 42+ messages in thread
* Re: [dpdk-dev] [PATCH] eal/ppc: remove fix of memory barrier for IBM POWER
  2019-03-22 15:30                   ` Pradeep Satyanarayana
@ 2019-03-22 15:30                     ` Pradeep Satyanarayana
  2019-03-22 17:51                     ` Thomas Monjalon
  1 sibling, 0 replies; 42+ messages in thread
From: Pradeep Satyanarayana @ 2019-03-22 15:30 UTC (permalink / raw)
  To: Thomas Monjalon
  Cc: bruce.richardson, Chao Zhu, Dekel Peled, dev, David Christensen,
	honnappa.nagarahalli, konstantin.ananyev, ola.liljedahl, Ori Kam,
	Shahaf Shuler, David Wilder, Yongseok Koh
Thomas Monjalon <thomas@monjalon.net> wrote on 03/22/2019 01:49:03 AM:
> From: Thomas Monjalon <thomas@monjalon.net>
> To: Pradeep Satyanarayana <pradeep@us.ibm.com>, David Wilder
> <wilder@us.ibm.com>
> Cc: Shahaf Shuler <shahafs@mellanox.com>, Chao Zhu
> <chaozhu@linux.vnet.ibm.com>, Dekel Peled <dekelp@mellanox.com>,
> dev@dpdk.org, David Christensen <drc@ibm.com>, Ori Kam
> <orika@mellanox.com>, Yongseok Koh <yskoh@mellanox.com>,
> konstantin.ananyev@intel.com, ola.liljedahl@arm.com,
> honnappa.nagarahalli@arm.com, bruce.richardson@intel.com
> Date: 03/22/2019 01:49 AM
> Subject: Re: [PATCH] eal/ppc: remove fix of memory barrier for IBM POWER
>
> We need to agree on the definitions.
> Please see below,
>
> 22/03/2019 02:40, Pradeep Satyanarayana:
> > Shahaf Shuler <shahafs@mellanox.com> wrote on 03/21/2019 01:49:39 AM:
> > > Pradeep Satyanarayana <pradeep@us.ibm.com> wrote on Thu 3/21/2019
12:41
> > AM:
> > > >> > So far, when not running on power, we used the rte_wmb for that.
> > > >> On x86 and ARM systems it provided the needed guarantees.
> > > >> > It is also mentioned in the barrier doxygen on ARM arch:
> > > >> > "
> > > >> > Write memory barrier.
> > > >> >
> > > >> > Guarantees that the STORE operations generated before the
barrier
> > > >> > occur before the STORE operations generated after.
> > > >> > "
> > > >> >
> > > >> > It doesn't restrict to store to system memory only.
> > > >> > w/ power is on somewhat different and in fact rte_mb is
required.
> > > >> It obviously miss the point of those barrier if we will need to
use
> > > >> a different barrier based on the system arch.
> > > >> >
> > > >> > We need to align the definition of the different barriers in
DPDK:
> > > >> > 1. need a clear documentation of each. this should be global and
> > > >> not part of the specific implementation on each arch.
> > > >
> > > >A single approach may not work for all architectures. Power is
different
> > > >from others, so we need to be able to accommodate that. More
comments
> > below.
> > >
> > > it don't get this claim.
> > > It is ok to have some differences between the different arch, but
> > > here you implement a well-defined barrier - rte_wmb.
> > > if you see a need we can discuss to define a **new** barrier which
> > > sync STORE only to system memory, and will be able to utilize the
> > > lwsync command.
> > >
> > > >
> > > >> The global definition is in lib/librte_eal/common/include/
> > > generic/rte_atomic.h
> > > >>
> > > >> There are some copy/paste in Arm32 and PPC that I will remove.
> > > >>
> > > >> > 2. either modify ppc rte_wmb to match ARM and x86 ones or to
> > > >> define a new type of barrier which will sync between both I/O and
> > > >> stores to systems memory.
> > > >>
> > > >> The basic memory barrier of DPDK does not mention
> > > >> a difference between I/O and system memory.
> > > >
> > > >In the case of Power, sync will cater to both I/O and system
> > > memory. However, that
> > > >is too big a hammer in all cases.
> > >
> > > rte_wmb requires such sync. you propose to have the wrong barrier in
> > > favor of performance.
> > > to mitigate this you can take my suggestion above and define a new,
> > > more lightweight one.
> > >
> > > >
> > > >> It is not explicit (yet) but I assume it is protecting both.
> > > >> So, in my opinion, we need to make it explicit in the doc,
> > > >> and fix the PPC implementation to comply with this definition.
> > > >>
> > > >> Anyway, I don't see any significant effort from IBM to move from
> > > >> the alpha support stage to a real Open Source support.
> > > >> PS: sending a mail every two months, to promise improvements, is
> > > not enough!
> > > >
> > >
> > > […]
> > >
> > > >
> > > >We should retain lwsync, should not be removed as discussed in here:
> > > >
> > > >https://urldefense.proofpoint.com/v2/url?
>
u=http-3A__mails.dpdk.org_archives_dev_2019-2DMarch_126746.html&d=DwIFaQ&c=jf_iaSHvJObTbx-
> siA1ZOg&r=co4lCofxrQP11yIVMply-
> QYvsUyeKJkYY_jL1QVgeGA&m=SNGJjgGF8mHR9t-
>
ixYHznWvUoXvC3zlbm8q1vlS4x_s&s=TXCEGDEjCiUW1ug5kDwlfuUaqRMowGhpihjly5zEZp8&e=
> > >
> > > i don't agree.
> > > it is very clear the rte_wmb implementation in power is broken and
> > > we need to fix this right away before other customers will hit the
> > > same issue.
> >
> >
> > In the DPDK source I see a couple of different classes of memory
barriers.
> > I am
> > not clear on the usage of these in the drivers, but I would think the
> > guidelines
> > to be as shown below (for Power):
> >
> > - rte_[rw]mb (general memory barrier) --> should be lwsync
>
> This is what may be discussed.
> The assumption is that the general memory barrier should cover
> all cases (CPU caches, SMP and I/O).
> That's why we think it should "sync" for Power.
In that case, at a minimum we must de-link rte_smp_[rw]mb from rte_[rw]mb
and retain it as lwsync. Agreed?
>
> > - rte_smp_[rw]mb (SMP memory barrier) -->should be lwsync
> > - rte_io_[rw]mb (I/O memory barrier)  --> should be sync
> > - rte_cio_[rw]mb (coherent I/O memory barrier) -->should be sync
> >
> > lwsync is appropriate for cases where CPUs are accessing cacheable
memory
> > (i.e. Memory Coherence Required) while the sync instruction should be
used
> > in all other cases.
> >
Thanks
Pradeep
pradeep@us.ibm.com
^ permalink raw reply	[flat|nested] 42+ messages in thread
* Re: [dpdk-dev] [PATCH] eal/ppc: remove fix of memory barrier for IBM POWER
  2019-03-22 15:30                   ` Pradeep Satyanarayana
  2019-03-22 15:30                     ` Pradeep Satyanarayana
@ 2019-03-22 17:51                     ` Thomas Monjalon
  2019-03-22 17:51                       ` Thomas Monjalon
  2019-03-22 22:57                       ` Pradeep Satyanarayana
  1 sibling, 2 replies; 42+ messages in thread
From: Thomas Monjalon @ 2019-03-22 17:51 UTC (permalink / raw)
  To: Pradeep Satyanarayana
  Cc: bruce.richardson, Chao Zhu, Dekel Peled, dev, David Christensen,
	honnappa.nagarahalli, konstantin.ananyev, ola.liljedahl, Ori Kam,
	Shahaf Shuler, David Wilder, Yongseok Koh
22/03/2019 16:30, Pradeep Satyanarayana:
> Thomas Monjalon <thomas@monjalon.net> wrote on 03/22/2019 01:49:03 AM:
> > 22/03/2019 02:40, Pradeep Satyanarayana:
> > > - rte_[rw]mb (general memory barrier) --> should be lwsync
> >
> > This is what may be discussed.
> > The assumption is that the general memory barrier should cover
> > all cases (CPU caches, SMP and I/O).
> > That's why we think it should "sync" for Power.
> 
> In that case, at a minimum we must de-link rte_smp_[rw]mb from rte_[rw]mb
> and retain it as lwsync. Agreed?
I have no clue about what is needed for SMP barrier in Power.
As long as it works as expected, no problem.
> > > - rte_smp_[rw]mb (SMP memory barrier) -->should be lwsync
> > > - rte_io_[rw]mb (I/O memory barrier)  --> should be sync
> > > - rte_cio_[rw]mb (coherent I/O memory barrier) -->should be sync
> > >
> > > lwsync is appropriate for cases where CPUs are accessing cacheable
> > > memory (i.e. Memory Coherence Required) while the sync instruction
> > > should be used in all other cases.
^ permalink raw reply	[flat|nested] 42+ messages in thread
* Re: [dpdk-dev] [PATCH] eal/ppc: remove fix of memory barrier for IBM POWER
  2019-03-22 17:51                     ` Thomas Monjalon
@ 2019-03-22 17:51                       ` Thomas Monjalon
  2019-03-22 22:57                       ` Pradeep Satyanarayana
  1 sibling, 0 replies; 42+ messages in thread
From: Thomas Monjalon @ 2019-03-22 17:51 UTC (permalink / raw)
  To: Pradeep Satyanarayana
  Cc: bruce.richardson, Chao Zhu, Dekel Peled, dev, David Christensen,
	honnappa.nagarahalli, konstantin.ananyev, ola.liljedahl, Ori Kam,
	Shahaf Shuler, David Wilder, Yongseok Koh
22/03/2019 16:30, Pradeep Satyanarayana:
> Thomas Monjalon <thomas@monjalon.net> wrote on 03/22/2019 01:49:03 AM:
> > 22/03/2019 02:40, Pradeep Satyanarayana:
> > > - rte_[rw]mb (general memory barrier) --> should be lwsync
> >
> > This is what may be discussed.
> > The assumption is that the general memory barrier should cover
> > all cases (CPU caches, SMP and I/O).
> > That's why we think it should "sync" for Power.
> 
> In that case, at a minimum we must de-link rte_smp_[rw]mb from rte_[rw]mb
> and retain it as lwsync. Agreed?
I have no clue about what is needed for SMP barrier in Power.
As long as it works as expected, no problem.
> > > - rte_smp_[rw]mb (SMP memory barrier) -->should be lwsync
> > > - rte_io_[rw]mb (I/O memory barrier)  --> should be sync
> > > - rte_cio_[rw]mb (coherent I/O memory barrier) -->should be sync
> > >
> > > lwsync is appropriate for cases where CPUs are accessing cacheable
> > > memory (i.e. Memory Coherence Required) while the sync instruction
> > > should be used in all other cases.
^ permalink raw reply	[flat|nested] 42+ messages in thread
* Re: [dpdk-dev] [PATCH] eal/ppc: remove fix of memory barrier for IBM POWER
  2019-03-22 17:51                     ` Thomas Monjalon
  2019-03-22 17:51                       ` Thomas Monjalon
@ 2019-03-22 22:57                       ` Pradeep Satyanarayana
  2019-03-22 22:57                         ` Pradeep Satyanarayana
  2019-03-24  6:37                         ` Shahaf Shuler
  1 sibling, 2 replies; 42+ messages in thread
From: Pradeep Satyanarayana @ 2019-03-22 22:57 UTC (permalink / raw)
  To: Thomas Monjalon
  Cc: bruce.richardson, Chao Zhu, Dekel Peled, dev, David Christensen,
	honnappa.nagarahalli, konstantin.ananyev, ola.liljedahl, Ori Kam,
	Shahaf Shuler, David Wilder, Yongseok Koh
Thomas Monjalon <thomas@monjalon.net> wrote on 03/22/2019 10:51:17 AM:
> From: Thomas Monjalon <thomas@monjalon.net>
> To: Pradeep Satyanarayana <pradeep@us.ibm.com>
> Cc: bruce.richardson@intel.com, Chao Zhu
> <chaozhu@linux.vnet.ibm.com>, Dekel Peled <dekelp@mellanox.com>,
> dev@dpdk.org, David Christensen <drc@ibm.com>,
> honnappa.nagarahalli@arm.com, konstantin.ananyev@intel.com,
> ola.liljedahl@arm.com, Ori Kam <orika@mellanox.com>, Shahaf Shuler
> <shahafs@mellanox.com>, David Wilder <wilder@us.ibm.com>, Yongseok
> Koh <yskoh@mellanox.com>
> Date: 03/22/2019 10:51 AM
> Subject: Re: [PATCH] eal/ppc: remove fix of memory barrier for IBM POWER
>
> 22/03/2019 16:30, Pradeep Satyanarayana:
> > Thomas Monjalon <thomas@monjalon.net> wrote on 03/22/2019 01:49:03 AM:
> > > 22/03/2019 02:40, Pradeep Satyanarayana:
> > > > - rte_[rw]mb (general memory barrier) --> should be lwsync
> > >
> > > This is what may be discussed.
> > > The assumption is that the general memory barrier should cover
> > > all cases (CPU caches, SMP and I/O).
> > > That's why we think it should "sync" for Power.
> >
> > In that case, at a minimum we must de-link rte_smp_[rw]mb from
rte_[rw]mb
> > and retain it as lwsync. Agreed?
>
> I have no clue about what is needed for SMP barrier in Power.
> As long as it works as expected, no problem.
>
We will try that out and report back here, later next week
Thanks
Pradeep
pradeep@us.ibm.com
^ permalink raw reply	[flat|nested] 42+ messages in thread
* Re: [dpdk-dev] [PATCH] eal/ppc: remove fix of memory barrier for IBM POWER
  2019-03-22 22:57                       ` Pradeep Satyanarayana
@ 2019-03-22 22:57                         ` Pradeep Satyanarayana
  2019-03-24  6:37                         ` Shahaf Shuler
  1 sibling, 0 replies; 42+ messages in thread
From: Pradeep Satyanarayana @ 2019-03-22 22:57 UTC (permalink / raw)
  To: Thomas Monjalon
  Cc: bruce.richardson, Chao Zhu, Dekel Peled, dev, David Christensen,
	honnappa.nagarahalli, konstantin.ananyev, ola.liljedahl, Ori Kam,
	Shahaf Shuler, David Wilder, Yongseok Koh
Thomas Monjalon <thomas@monjalon.net> wrote on 03/22/2019 10:51:17 AM:
> From: Thomas Monjalon <thomas@monjalon.net>
> To: Pradeep Satyanarayana <pradeep@us.ibm.com>
> Cc: bruce.richardson@intel.com, Chao Zhu
> <chaozhu@linux.vnet.ibm.com>, Dekel Peled <dekelp@mellanox.com>,
> dev@dpdk.org, David Christensen <drc@ibm.com>,
> honnappa.nagarahalli@arm.com, konstantin.ananyev@intel.com,
> ola.liljedahl@arm.com, Ori Kam <orika@mellanox.com>, Shahaf Shuler
> <shahafs@mellanox.com>, David Wilder <wilder@us.ibm.com>, Yongseok
> Koh <yskoh@mellanox.com>
> Date: 03/22/2019 10:51 AM
> Subject: Re: [PATCH] eal/ppc: remove fix of memory barrier for IBM POWER
>
> 22/03/2019 16:30, Pradeep Satyanarayana:
> > Thomas Monjalon <thomas@monjalon.net> wrote on 03/22/2019 01:49:03 AM:
> > > 22/03/2019 02:40, Pradeep Satyanarayana:
> > > > - rte_[rw]mb (general memory barrier) --> should be lwsync
> > >
> > > This is what may be discussed.
> > > The assumption is that the general memory barrier should cover
> > > all cases (CPU caches, SMP and I/O).
> > > That's why we think it should "sync" for Power.
> >
> > In that case, at a minimum we must de-link rte_smp_[rw]mb from
rte_[rw]mb
> > and retain it as lwsync. Agreed?
>
> I have no clue about what is needed for SMP barrier in Power.
> As long as it works as expected, no problem.
>
We will try that out and report back here, later next week
Thanks
Pradeep
pradeep@us.ibm.com
^ permalink raw reply	[flat|nested] 42+ messages in thread
* Re: [dpdk-dev] [PATCH] eal/ppc: remove fix of memory barrier for IBM POWER
  2019-03-22 22:57                       ` Pradeep Satyanarayana
  2019-03-22 22:57                         ` Pradeep Satyanarayana
@ 2019-03-24  6:37                         ` Shahaf Shuler
  2019-03-24  6:37                           ` Shahaf Shuler
  2019-03-24 17:37                           ` Pradeep Satyanarayana
  1 sibling, 2 replies; 42+ messages in thread
From: Shahaf Shuler @ 2019-03-24  6:37 UTC (permalink / raw)
  To: pradeep, Thomas Monjalon
  Cc: bruce.richardson, Chao Zhu, Dekel Peled, dev, David Christensen,
	honnappa.nagarahalli, konstantin.ananyev, ola.liljedahl, Ori Kam,
	David Wilder, Yongseok Koh
Pradeep,
Pradeep Satyanarayana wrote on Saturday, March 23, 2019 12:58 AM
>Thomas Monjalon <thomas@monjalon.net> wrote on 03/22/2019 10:51:17 AM:
>> Date: 03/22/2019 10:51 AM
>> Subject: Re: [PATCH] eal/ppc: remove fix of memory barrier for IBM POWER
>>
>> 22/03/2019 16:30, Pradeep Satyanarayana:
>> > Thomas Monjalon <thomas@monjalon.net> wrote on 03/22/2019 01:49:03 AM:
>> > > 22/03/2019 02:40, Pradeep Satyanarayana:
>> > > > - rte_[rw]mb (general memory barrier) --> should be lwsync
>> > >
>> > > This is what may be discussed.
>> > > The assumption is that the general memory barrier should cover
>> > > all cases (CPU caches, SMP and I/O).
>> > > That's why we think it should "sync" for Power.
>> >
>> > In that case, at a minimum we must de-link rte_smp_[rw]mb from rte_[rw]mb
>> > and retain it as lwsync. Agreed?
>>
>> I have no clue about what is needed for SMP barrier in Power.
>> As long as it works as expected, no problem.
>>
>
>We will try that out and report back here, later next week
Till then, i think there are 2 orthogonal issues:
1. ppc rte_wmb is incorrect
2. ppc rte_smp_[rw]mb may be improved.
for #1 the current patch from Dekel seems to be OK. do you agree?
for #2 i guess you will check and come back w/ patch/answer?
^ permalink raw reply	[flat|nested] 42+ messages in thread
* Re: [dpdk-dev] [PATCH] eal/ppc: remove fix of memory barrier for IBM POWER
  2019-03-24  6:37                         ` Shahaf Shuler
@ 2019-03-24  6:37                           ` Shahaf Shuler
  2019-03-24 17:37                           ` Pradeep Satyanarayana
  1 sibling, 0 replies; 42+ messages in thread
From: Shahaf Shuler @ 2019-03-24  6:37 UTC (permalink / raw)
  To: pradeep, Thomas Monjalon
  Cc: bruce.richardson, Chao Zhu, Dekel Peled, dev, David Christensen,
	honnappa.nagarahalli, konstantin.ananyev, ola.liljedahl, Ori Kam,
	David Wilder, Yongseok Koh
Pradeep,
Pradeep Satyanarayana wrote on Saturday, March 23, 2019 12:58 AM
>Thomas Monjalon <thomas@monjalon.net> wrote on 03/22/2019 10:51:17 AM:
>> Date: 03/22/2019 10:51 AM
>> Subject: Re: [PATCH] eal/ppc: remove fix of memory barrier for IBM POWER
>>
>> 22/03/2019 16:30, Pradeep Satyanarayana:
>> > Thomas Monjalon <thomas@monjalon.net> wrote on 03/22/2019 01:49:03 AM:
>> > > 22/03/2019 02:40, Pradeep Satyanarayana:
>> > > > - rte_[rw]mb (general memory barrier) --> should be lwsync
>> > >
>> > > This is what may be discussed.
>> > > The assumption is that the general memory barrier should cover
>> > > all cases (CPU caches, SMP and I/O).
>> > > That's why we think it should "sync" for Power.
>> >
>> > In that case, at a minimum we must de-link rte_smp_[rw]mb from rte_[rw]mb
>> > and retain it as lwsync. Agreed?
>>
>> I have no clue about what is needed for SMP barrier in Power.
>> As long as it works as expected, no problem.
>>
>
>We will try that out and report back here, later next week
Till then, i think there are 2 orthogonal issues:
1. ppc rte_wmb is incorrect
2. ppc rte_smp_[rw]mb may be improved.
for #1 the current patch from Dekel seems to be OK. do you agree?
for #2 i guess you will check and come back w/ patch/answer?
^ permalink raw reply	[flat|nested] 42+ messages in thread
* Re: [dpdk-dev] [PATCH] eal/ppc: remove fix of memory barrier for IBM POWER
  2019-03-24  6:37                         ` Shahaf Shuler
  2019-03-24  6:37                           ` Shahaf Shuler
@ 2019-03-24 17:37                           ` Pradeep Satyanarayana
  2019-03-24 17:37                             ` Pradeep Satyanarayana
  2019-03-26  9:15                             ` Dekel Peled
  1 sibling, 2 replies; 42+ messages in thread
From: Pradeep Satyanarayana @ 2019-03-24 17:37 UTC (permalink / raw)
  To: Shahaf Shuler
  Cc: bruce.richardson, Chao Zhu, Dekel Peled, dev, David Christensen,
	honnappa.nagarahalli, konstantin.ananyev, ola.liljedahl, Ori Kam,
	Thomas Monjalon, David Wilder, Yongseok Koh
Shahaf Shuler <shahafs@mellanox.com> wrote on 03/23/2019 11:37:42 PM:
> From: Shahaf Shuler <shahafs@mellanox.com>
> To: "pradeep@us.ibm.com" <pradeep@us.ibm.com>, Thomas Monjalon
> <thomas@monjalon.net>
> Cc: "bruce.richardson@intel.com" <bruce.richardson@intel.com>, Chao
> Zhu <chaozhu@linux.vnet.ibm.com>, Dekel Peled <dekelp@mellanox.com>,
> "dev@dpdk.org" <dev@dpdk.org>, David Christensen <drc@ibm.com>,
> "honnappa.nagarahalli@arm.com" <honnappa.nagarahalli@arm.com>,
> "konstantin.ananyev@intel.com" <konstantin.ananyev@intel.com>,
> "ola.liljedahl@arm.com" <ola.liljedahl@arm.com>, Ori Kam
> <orika@mellanox.com>, David Wilder <wilder@us.ibm.com>, Yongseok Koh
> <yskoh@mellanox.com>
> Date: 03/23/2019 11:37 PM
> Subject: RE: [PATCH] eal/ppc: remove fix of memory barrier for IBM POWER
>
> Pradeep,
>
> Pradeep Satyanarayana wrote on Saturday, March 23, 2019 12:58 AM
> >Thomas Monjalon <thomas@monjalon.net> wrote on 03/22/2019 10:51:17 AM:
> >> Date: 03/22/2019 10:51 AM
> >> Subject: Re: [PATCH] eal/ppc: remove fix of memory barrier for IBM
POWER
> >>
> >> 22/03/2019 16:30, Pradeep Satyanarayana:
> >> > Thomas Monjalon <thomas@monjalon.net> wrote on 03/22/2019 01:49:03
AM:
> >> > > 22/03/2019 02:40, Pradeep Satyanarayana:
> >> > > > - rte_[rw]mb (general memory barrier) --> should be lwsync
> >> > >
> >> > > This is what may be discussed.
> >> > > The assumption is that the general memory barrier should cover
> >> > > all cases (CPU caches, SMP and I/O).
> >> > > That's why we think it should "sync" for Power.
> >> >
> >> > In that case, at a minimum we must de-link rte_smp_[rw]mb from
rte_[rw]mb
> >> > and retain it as lwsync. Agreed?
> >>
> >> I have no clue about what is needed for SMP barrier in Power.
> >> As long as it works as expected, no problem.
> >>
> >
> >We will try that out and report back here, later next week
>
> Till then, i think there are 2 orthogonal issues:
> 1. ppc rte_wmb is incorrect
> 2. ppc rte_smp_[rw]mb may be improved.
>
> for #1 the current patch from Dekel seems to be OK. do you agree?
> for #2 i guess you will check and come back w/ patch/answer?
That has been the line of thinking. However, we need to do some extensive
testing
to confirm that it all holds up.
Thanks
Pradeep
pradeep@us.ibm.com
^ permalink raw reply	[flat|nested] 42+ messages in thread
* Re: [dpdk-dev] [PATCH] eal/ppc: remove fix of memory barrier for IBM POWER
  2019-03-24 17:37                           ` Pradeep Satyanarayana
@ 2019-03-24 17:37                             ` Pradeep Satyanarayana
  2019-03-26  9:15                             ` Dekel Peled
  1 sibling, 0 replies; 42+ messages in thread
From: Pradeep Satyanarayana @ 2019-03-24 17:37 UTC (permalink / raw)
  To: Shahaf Shuler
  Cc: bruce.richardson, Chao Zhu, Dekel Peled, dev, David Christensen,
	honnappa.nagarahalli, konstantin.ananyev, ola.liljedahl, Ori Kam,
	Thomas Monjalon, David Wilder, Yongseok Koh
Shahaf Shuler <shahafs@mellanox.com> wrote on 03/23/2019 11:37:42 PM:
> From: Shahaf Shuler <shahafs@mellanox.com>
> To: "pradeep@us.ibm.com" <pradeep@us.ibm.com>, Thomas Monjalon
> <thomas@monjalon.net>
> Cc: "bruce.richardson@intel.com" <bruce.richardson@intel.com>, Chao
> Zhu <chaozhu@linux.vnet.ibm.com>, Dekel Peled <dekelp@mellanox.com>,
> "dev@dpdk.org" <dev@dpdk.org>, David Christensen <drc@ibm.com>,
> "honnappa.nagarahalli@arm.com" <honnappa.nagarahalli@arm.com>,
> "konstantin.ananyev@intel.com" <konstantin.ananyev@intel.com>,
> "ola.liljedahl@arm.com" <ola.liljedahl@arm.com>, Ori Kam
> <orika@mellanox.com>, David Wilder <wilder@us.ibm.com>, Yongseok Koh
> <yskoh@mellanox.com>
> Date: 03/23/2019 11:37 PM
> Subject: RE: [PATCH] eal/ppc: remove fix of memory barrier for IBM POWER
>
> Pradeep,
>
> Pradeep Satyanarayana wrote on Saturday, March 23, 2019 12:58 AM
> >Thomas Monjalon <thomas@monjalon.net> wrote on 03/22/2019 10:51:17 AM:
> >> Date: 03/22/2019 10:51 AM
> >> Subject: Re: [PATCH] eal/ppc: remove fix of memory barrier for IBM
POWER
> >>
> >> 22/03/2019 16:30, Pradeep Satyanarayana:
> >> > Thomas Monjalon <thomas@monjalon.net> wrote on 03/22/2019 01:49:03
AM:
> >> > > 22/03/2019 02:40, Pradeep Satyanarayana:
> >> > > > - rte_[rw]mb (general memory barrier) --> should be lwsync
> >> > >
> >> > > This is what may be discussed.
> >> > > The assumption is that the general memory barrier should cover
> >> > > all cases (CPU caches, SMP and I/O).
> >> > > That's why we think it should "sync" for Power.
> >> >
> >> > In that case, at a minimum we must de-link rte_smp_[rw]mb from
rte_[rw]mb
> >> > and retain it as lwsync. Agreed?
> >>
> >> I have no clue about what is needed for SMP barrier in Power.
> >> As long as it works as expected, no problem.
> >>
> >
> >We will try that out and report back here, later next week
>
> Till then, i think there are 2 orthogonal issues:
> 1. ppc rte_wmb is incorrect
> 2. ppc rte_smp_[rw]mb may be improved.
>
> for #1 the current patch from Dekel seems to be OK. do you agree?
> for #2 i guess you will check and come back w/ patch/answer?
That has been the line of thinking. However, we need to do some extensive
testing
to confirm that it all holds up.
Thanks
Pradeep
pradeep@us.ibm.com
^ permalink raw reply	[flat|nested] 42+ messages in thread
* Re: [dpdk-dev] [PATCH] eal/ppc: remove fix of memory barrier for IBM POWER
  2019-03-24 17:37                           ` Pradeep Satyanarayana
  2019-03-24 17:37                             ` Pradeep Satyanarayana
@ 2019-03-26  9:15                             ` Dekel Peled
  2019-03-26  9:15                               ` Dekel Peled
  2019-03-27  9:19                               ` Thomas Monjalon
  1 sibling, 2 replies; 42+ messages in thread
From: Dekel Peled @ 2019-03-26  9:15 UTC (permalink / raw)
  To: pradeep, Shahaf Shuler
  Cc: bruce.richardson, Chao Zhu, dev, David Christensen,
	honnappa.nagarahalli, konstantin.ananyev, ola.liljedahl, Ori Kam,
	Thomas Monjalon, David Wilder, Yongseok Koh, Idan Werpoler,
	Olga Shern
+Idan Werpoler.
From: Pradeep Satyanarayana <pradeep@us.ibm.com>
Sent: Sunday, March 24, 2019 7:38 PM
To: Shahaf Shuler <shahafs@mellanox.com>
Cc: bruce.richardson@intel.com; Chao Zhu <chaozhu@linux.vnet.ibm.com>; Dekel Peled <dekelp@mellanox.com>; dev@dpdk.org; David Christensen <drc@ibm.com>; honnappa.nagarahalli@arm.com; konstantin.ananyev@intel.com; ola.liljedahl@arm.com; Ori Kam <orika@mellanox.com>; Thomas Monjalon <thomas@monjalon.net>; David Wilder <wilder@us.ibm.com>; Yongseok Koh <yskoh@mellanox.com>
Subject: RE: [PATCH] eal/ppc: remove fix of memory barrier for IBM POWER
Shahaf Shuler <shahafs@mellanox.com<mailto:shahafs@mellanox.com>> wrote on 03/23/2019 11:37:42 PM:
> From: Shahaf Shuler <shahafs@mellanox.com<mailto:shahafs@mellanox.com>>
> To: "pradeep@us.ibm.com<mailto:pradeep@us.ibm.com>" <pradeep@us.ibm.com<mailto:pradeep@us.ibm.com>>, Thomas Monjalon
> <thomas@monjalon.net<mailto:thomas@monjalon.net>>
> Cc: "bruce.richardson@intel.com<mailto:bruce.richardson@intel.com>" <bruce.richardson@intel.com<mailto:bruce.richardson@intel.com>>, Chao
> Zhu <chaozhu@linux.vnet.ibm.com<mailto:chaozhu@linux.vnet.ibm.com>>, Dekel Peled <dekelp@mellanox.com<mailto:dekelp@mellanox.com>>,
> "dev@dpdk.org<mailto:dev@dpdk.org>" <dev@dpdk.org<mailto:dev@dpdk.org>>, David Christensen <drc@ibm.com<mailto:drc@ibm.com>>,
> "honnappa.nagarahalli@arm.com<mailto:honnappa.nagarahalli@arm.com>" <honnappa.nagarahalli@arm.com<mailto:honnappa.nagarahalli@arm.com>>,
> "konstantin.ananyev@intel.com<mailto:konstantin.ananyev@intel.com>" <konstantin.ananyev@intel.com<mailto:konstantin.ananyev@intel.com>>,
> "ola.liljedahl@arm.com<mailto:ola.liljedahl@arm.com>" <ola.liljedahl@arm.com<mailto:ola.liljedahl@arm.com>>, Ori Kam
> <orika@mellanox.com<mailto:orika@mellanox.com>>, David Wilder <wilder@us.ibm.com<mailto:wilder@us.ibm.com>>, Yongseok Koh
> <yskoh@mellanox.com<mailto:yskoh@mellanox.com>>
> Date: 03/23/2019 11:37 PM
> Subject: RE: [PATCH] eal/ppc: remove fix of memory barrier for IBM POWER
>
> Pradeep,
>
> Pradeep Satyanarayana wrote on Saturday, March 23, 2019 12:58 AM
> >Thomas Monjalon <thomas@monjalon.net<mailto:thomas@monjalon.net>> wrote on 03/22/2019 10:51:17 AM:
> >> Date: 03/22/2019 10:51 AM
> >> Subject: Re: [PATCH] eal/ppc: remove fix of memory barrier for IBM POWER
> >>
> >> 22/03/2019 16:30, Pradeep Satyanarayana:
> >> > Thomas Monjalon <thomas@monjalon.net<mailto:thomas@monjalon.net>> wrote on 03/22/2019 01:49:03 AM:
> >> > > 22/03/2019 02:40, Pradeep Satyanarayana:
> >> > > > - rte_[rw]mb (general memory barrier) --> should be lwsync
> >> > >
> >> > > This is what may be discussed.
> >> > > The assumption is that the general memory barrier should cover
> >> > > all cases (CPU caches, SMP and I/O).
> >> > > That's why we think it should "sync" for Power.
> >> >
> >> > In that case, at a minimum we must de-link rte_smp_[rw]mb from rte_[rw]mb
> >> > and retain it as lwsync. Agreed?
> >>
> >> I have no clue about what is needed for SMP barrier in Power.
> >> As long as it works as expected, no problem.
> >>
> >
> >We will try that out and report back here, later next week
>
> Till then, i think there are 2 orthogonal issues:
> 1. ppc rte_wmb is incorrect
> 2. ppc rte_smp_[rw]mb may be improved.
>
> for #1 the current patch from Dekel seems to be OK. do you agree?
> for #2 i guess you will check and come back w/ patch/answer?
That has been the line of thinking. However, we need to do some extensive testing
to confirm that it all holds up.
Thanks
Pradeep
pradeep@us.ibm.com<mailto:pradeep@us.ibm.com>
^ permalink raw reply	[flat|nested] 42+ messages in thread
* Re: [dpdk-dev] [PATCH] eal/ppc: remove fix of memory barrier for IBM POWER
  2019-03-26  9:15                             ` Dekel Peled
@ 2019-03-26  9:15                               ` Dekel Peled
  2019-03-27  9:19                               ` Thomas Monjalon
  1 sibling, 0 replies; 42+ messages in thread
From: Dekel Peled @ 2019-03-26  9:15 UTC (permalink / raw)
  To: pradeep, Shahaf Shuler
  Cc: bruce.richardson, Chao Zhu, dev, David Christensen,
	honnappa.nagarahalli, konstantin.ananyev, ola.liljedahl, Ori Kam,
	Thomas Monjalon, David Wilder, Yongseok Koh, Idan Werpoler,
	Olga Shern
+Idan Werpoler.
From: Pradeep Satyanarayana <pradeep@us.ibm.com>
Sent: Sunday, March 24, 2019 7:38 PM
To: Shahaf Shuler <shahafs@mellanox.com>
Cc: bruce.richardson@intel.com; Chao Zhu <chaozhu@linux.vnet.ibm.com>; Dekel Peled <dekelp@mellanox.com>; dev@dpdk.org; David Christensen <drc@ibm.com>; honnappa.nagarahalli@arm.com; konstantin.ananyev@intel.com; ola.liljedahl@arm.com; Ori Kam <orika@mellanox.com>; Thomas Monjalon <thomas@monjalon.net>; David Wilder <wilder@us.ibm.com>; Yongseok Koh <yskoh@mellanox.com>
Subject: RE: [PATCH] eal/ppc: remove fix of memory barrier for IBM POWER
Shahaf Shuler <shahafs@mellanox.com<mailto:shahafs@mellanox.com>> wrote on 03/23/2019 11:37:42 PM:
> From: Shahaf Shuler <shahafs@mellanox.com<mailto:shahafs@mellanox.com>>
> To: "pradeep@us.ibm.com<mailto:pradeep@us.ibm.com>" <pradeep@us.ibm.com<mailto:pradeep@us.ibm.com>>, Thomas Monjalon
> <thomas@monjalon.net<mailto:thomas@monjalon.net>>
> Cc: "bruce.richardson@intel.com<mailto:bruce.richardson@intel.com>" <bruce.richardson@intel.com<mailto:bruce.richardson@intel.com>>, Chao
> Zhu <chaozhu@linux.vnet.ibm.com<mailto:chaozhu@linux.vnet.ibm.com>>, Dekel Peled <dekelp@mellanox.com<mailto:dekelp@mellanox.com>>,
> "dev@dpdk.org<mailto:dev@dpdk.org>" <dev@dpdk.org<mailto:dev@dpdk.org>>, David Christensen <drc@ibm.com<mailto:drc@ibm.com>>,
> "honnappa.nagarahalli@arm.com<mailto:honnappa.nagarahalli@arm.com>" <honnappa.nagarahalli@arm.com<mailto:honnappa.nagarahalli@arm.com>>,
> "konstantin.ananyev@intel.com<mailto:konstantin.ananyev@intel.com>" <konstantin.ananyev@intel.com<mailto:konstantin.ananyev@intel.com>>,
> "ola.liljedahl@arm.com<mailto:ola.liljedahl@arm.com>" <ola.liljedahl@arm.com<mailto:ola.liljedahl@arm.com>>, Ori Kam
> <orika@mellanox.com<mailto:orika@mellanox.com>>, David Wilder <wilder@us.ibm.com<mailto:wilder@us.ibm.com>>, Yongseok Koh
> <yskoh@mellanox.com<mailto:yskoh@mellanox.com>>
> Date: 03/23/2019 11:37 PM
> Subject: RE: [PATCH] eal/ppc: remove fix of memory barrier for IBM POWER
>
> Pradeep,
>
> Pradeep Satyanarayana wrote on Saturday, March 23, 2019 12:58 AM
> >Thomas Monjalon <thomas@monjalon.net<mailto:thomas@monjalon.net>> wrote on 03/22/2019 10:51:17 AM:
> >> Date: 03/22/2019 10:51 AM
> >> Subject: Re: [PATCH] eal/ppc: remove fix of memory barrier for IBM POWER
> >>
> >> 22/03/2019 16:30, Pradeep Satyanarayana:
> >> > Thomas Monjalon <thomas@monjalon.net<mailto:thomas@monjalon.net>> wrote on 03/22/2019 01:49:03 AM:
> >> > > 22/03/2019 02:40, Pradeep Satyanarayana:
> >> > > > - rte_[rw]mb (general memory barrier) --> should be lwsync
> >> > >
> >> > > This is what may be discussed.
> >> > > The assumption is that the general memory barrier should cover
> >> > > all cases (CPU caches, SMP and I/O).
> >> > > That's why we think it should "sync" for Power.
> >> >
> >> > In that case, at a minimum we must de-link rte_smp_[rw]mb from rte_[rw]mb
> >> > and retain it as lwsync. Agreed?
> >>
> >> I have no clue about what is needed for SMP barrier in Power.
> >> As long as it works as expected, no problem.
> >>
> >
> >We will try that out and report back here, later next week
>
> Till then, i think there are 2 orthogonal issues:
> 1. ppc rte_wmb is incorrect
> 2. ppc rte_smp_[rw]mb may be improved.
>
> for #1 the current patch from Dekel seems to be OK. do you agree?
> for #2 i guess you will check and come back w/ patch/answer?
That has been the line of thinking. However, we need to do some extensive testing
to confirm that it all holds up.
Thanks
Pradeep
pradeep@us.ibm.com<mailto:pradeep@us.ibm.com>
^ permalink raw reply	[flat|nested] 42+ messages in thread
* Re: [dpdk-dev] [PATCH] eal/ppc: remove fix of memory barrier for IBM POWER
  2019-03-26  9:15                             ` Dekel Peled
  2019-03-26  9:15                               ` Dekel Peled
@ 2019-03-27  9:19                               ` Thomas Monjalon
  2019-03-27  9:19                                 ` Thomas Monjalon
                                                   ` (2 more replies)
  1 sibling, 3 replies; 42+ messages in thread
From: Thomas Monjalon @ 2019-03-27  9:19 UTC (permalink / raw)
  To: pradeep, Chao Zhu
  Cc: dev, Dekel Peled, Shahaf Shuler, bruce.richardson,
	David Christensen, honnappa.nagarahalli, konstantin.ananyev,
	ola.liljedahl, Ori Kam, David Wilder, Yongseok Koh,
	Idan Werpoler, Olga Shern
Pradeep, Chao,
Do we have more news?
We must merge this patch for DPDK 19.05-rc1.
I understand you want to try improving performance
by using lightweight sync for SMP barrier,
and this change can be done later.
First priority is to fix the bug of the general barrier.
That's why I should push this patch before the end of the week.
> Shahaf Shuler <shahafs@mellanox.com<mailto:shahafs@mellanox.com>> wrote on 03/23/2019 11:37:42 PM:
> > From: Shahaf Shuler <shahafs@mellanox.com<mailto:shahafs@mellanox.com>>
> > Pradeep Satyanarayana wrote on Saturday, March 23, 2019 12:58 AM
> > >Thomas Monjalon <thomas@monjalon.net<mailto:thomas@monjalon.net>> wrote on 03/22/2019 10:51:17 AM:
> > >> 22/03/2019 16:30, Pradeep Satyanarayana:
> > >> > Thomas Monjalon <thomas@monjalon.net<mailto:thomas@monjalon.net>> wrote on 03/22/2019 01:49:03 AM:
> > >> > > 22/03/2019 02:40, Pradeep Satyanarayana:
> > >> > > > - rte_[rw]mb (general memory barrier) --> should be lwsync
> > >> > >
> > >> > > This is what may be discussed.
> > >> > > The assumption is that the general memory barrier should cover
> > >> > > all cases (CPU caches, SMP and I/O).
> > >> > > That's why we think it should "sync" for Power.
> > >> >
> > >> > In that case, at a minimum we must de-link rte_smp_[rw]mb from rte_[rw]mb
> > >> > and retain it as lwsync. Agreed?
> > >>
> > >> I have no clue about what is needed for SMP barrier in Power.
> > >> As long as it works as expected, no problem.
> > >>
> > >
> > >We will try that out and report back here, later next week
> >
> > Till then, i think there are 2 orthogonal issues:
> > 1. ppc rte_wmb is incorrect
> > 2. ppc rte_smp_[rw]mb may be improved.
> >
> > for #1 the current patch from Dekel seems to be OK. do you agree?
> > for #2 i guess you will check and come back w/ patch/answer?
> 
> That has been the line of thinking. However, we need to do some extensive testing
> to confirm that it all holds up.
^ permalink raw reply	[flat|nested] 42+ messages in thread
* Re: [dpdk-dev] [PATCH] eal/ppc: remove fix of memory barrier for IBM POWER
  2019-03-27  9:19                               ` Thomas Monjalon
@ 2019-03-27  9:19                                 ` Thomas Monjalon
  2019-03-27 23:50                                 ` Pradeep Satyanarayana
       [not found]                                 ` <OF456B0ECC.006EF7E7-ON882583CA.00827A75-882583CA.0082F7BE@LocalDomain>
  2 siblings, 0 replies; 42+ messages in thread
From: Thomas Monjalon @ 2019-03-27  9:19 UTC (permalink / raw)
  To: pradeep, Chao Zhu
  Cc: dev, Dekel Peled, Shahaf Shuler, bruce.richardson,
	David Christensen, honnappa.nagarahalli, konstantin.ananyev,
	ola.liljedahl, Ori Kam, David Wilder, Yongseok Koh,
	Idan Werpoler, Olga Shern
Pradeep, Chao,
Do we have more news?
We must merge this patch for DPDK 19.05-rc1.
I understand you want to try improving performance
by using lightweight sync for SMP barrier,
and this change can be done later.
First priority is to fix the bug of the general barrier.
That's why I should push this patch before the end of the week.
> Shahaf Shuler <shahafs@mellanox.com<mailto:shahafs@mellanox.com>> wrote on 03/23/2019 11:37:42 PM:
> > From: Shahaf Shuler <shahafs@mellanox.com<mailto:shahafs@mellanox.com>>
> > Pradeep Satyanarayana wrote on Saturday, March 23, 2019 12:58 AM
> > >Thomas Monjalon <thomas@monjalon.net<mailto:thomas@monjalon.net>> wrote on 03/22/2019 10:51:17 AM:
> > >> 22/03/2019 16:30, Pradeep Satyanarayana:
> > >> > Thomas Monjalon <thomas@monjalon.net<mailto:thomas@monjalon.net>> wrote on 03/22/2019 01:49:03 AM:
> > >> > > 22/03/2019 02:40, Pradeep Satyanarayana:
> > >> > > > - rte_[rw]mb (general memory barrier) --> should be lwsync
> > >> > >
> > >> > > This is what may be discussed.
> > >> > > The assumption is that the general memory barrier should cover
> > >> > > all cases (CPU caches, SMP and I/O).
> > >> > > That's why we think it should "sync" for Power.
> > >> >
> > >> > In that case, at a minimum we must de-link rte_smp_[rw]mb from rte_[rw]mb
> > >> > and retain it as lwsync. Agreed?
> > >>
> > >> I have no clue about what is needed for SMP barrier in Power.
> > >> As long as it works as expected, no problem.
> > >>
> > >
> > >We will try that out and report back here, later next week
> >
> > Till then, i think there are 2 orthogonal issues:
> > 1. ppc rte_wmb is incorrect
> > 2. ppc rte_smp_[rw]mb may be improved.
> >
> > for #1 the current patch from Dekel seems to be OK. do you agree?
> > for #2 i guess you will check and come back w/ patch/answer?
> 
> That has been the line of thinking. However, we need to do some extensive testing
> to confirm that it all holds up.
^ permalink raw reply	[flat|nested] 42+ messages in thread
* Re: [dpdk-dev] [PATCH] eal/ppc: remove fix of memory barrier for IBM POWER
  2019-03-27  9:19                               ` Thomas Monjalon
  2019-03-27  9:19                                 ` Thomas Monjalon
@ 2019-03-27 23:50                                 ` Pradeep Satyanarayana
  2019-03-27 23:50                                   ` Pradeep Satyanarayana
       [not found]                                 ` <OF456B0ECC.006EF7E7-ON882583CA.00827A75-882583CA.0082F7BE@LocalDomain>
  2 siblings, 1 reply; 42+ messages in thread
From: Pradeep Satyanarayana @ 2019-03-27 23:50 UTC (permalink / raw)
  To: Thomas Monjalon
  Cc: bruce.richardson, Chao Zhu, Dekel Peled, dev, David Christensen,
	honnappa.nagarahalli, Idan Werpoler, konstantin.ananyev,
	ola.liljedahl, Olga Shern, Ori Kam, Shahaf Shuler, David Wilder,
	Yongseok Koh
Thomas Monjalon <thomas@monjalon.net> wrote on 03/27/2019 02:19:03 AM:
> From: Thomas Monjalon <thomas@monjalon.net>
> To: "pradeep@us.ibm.com" <pradeep@us.ibm.com>, Chao Zhu
> <chaozhu@linux.vnet.ibm.com>
> Cc: dev@dpdk.org, Dekel Peled <dekelp@mellanox.com>, Shahaf Shuler
> <shahafs@mellanox.com>, "bruce.richardson@intel.com"
> <bruce.richardson@intel.com>, David Christensen <drc@ibm.com>,
> "honnappa.nagarahalli@arm.com" <honnappa.nagarahalli@arm.com>,
> "konstantin.ananyev@intel.com" <konstantin.ananyev@intel.com>,
> "ola.liljedahl@arm.com" <ola.liljedahl@arm.com>, Ori Kam
> <orika@mellanox.com>, David Wilder <wilder@us.ibm.com>, Yongseok Koh
> <yskoh@mellanox.com>, Idan Werpoler <Idanw@mellanox.com>, Olga Shern
> <olgas@mellanox.com>
> Date: 03/27/2019 02:19 AM
> Subject: Re: [dpdk-dev] [PATCH] eal/ppc: remove fix of memory
> barrier for IBM POWER
>
> Pradeep, Chao,
>
> Do we have more news?
> We must merge this patch for DPDK 19.05-rc1.
>
> I understand you want to try improving performance
> by using lightweight sync for SMP barrier,
> and this change can be done later.
> First priority is to fix the bug of the general barrier.
> That's why I should push this patch before the end of the week.
Dekel's patch at:
http://mails.dpdk.org/archives/dev/2019-March/126746.html
seems to be holding up well with the testing done so far. We have
also pulled that patch in to test with DTS. If all hold up well,
we should be able to let you know by tomorrow morning (US west coast time).
Thanks
Pradeep
pradeep@us.ibm.com
^ permalink raw reply	[flat|nested] 42+ messages in thread
* Re: [dpdk-dev] [PATCH] eal/ppc: remove fix of memory barrier for IBM POWER
  2019-03-27 23:50                                 ` Pradeep Satyanarayana
@ 2019-03-27 23:50                                   ` Pradeep Satyanarayana
  0 siblings, 0 replies; 42+ messages in thread
From: Pradeep Satyanarayana @ 2019-03-27 23:50 UTC (permalink / raw)
  To: Thomas Monjalon
  Cc: bruce.richardson, Chao Zhu, Dekel Peled, dev, David Christensen,
	honnappa.nagarahalli, Idan Werpoler, konstantin.ananyev,
	ola.liljedahl, Olga Shern, Ori Kam, Shahaf Shuler, David Wilder,
	Yongseok Koh
Thomas Monjalon <thomas@monjalon.net> wrote on 03/27/2019 02:19:03 AM:
> From: Thomas Monjalon <thomas@monjalon.net>
> To: "pradeep@us.ibm.com" <pradeep@us.ibm.com>, Chao Zhu
> <chaozhu@linux.vnet.ibm.com>
> Cc: dev@dpdk.org, Dekel Peled <dekelp@mellanox.com>, Shahaf Shuler
> <shahafs@mellanox.com>, "bruce.richardson@intel.com"
> <bruce.richardson@intel.com>, David Christensen <drc@ibm.com>,
> "honnappa.nagarahalli@arm.com" <honnappa.nagarahalli@arm.com>,
> "konstantin.ananyev@intel.com" <konstantin.ananyev@intel.com>,
> "ola.liljedahl@arm.com" <ola.liljedahl@arm.com>, Ori Kam
> <orika@mellanox.com>, David Wilder <wilder@us.ibm.com>, Yongseok Koh
> <yskoh@mellanox.com>, Idan Werpoler <Idanw@mellanox.com>, Olga Shern
> <olgas@mellanox.com>
> Date: 03/27/2019 02:19 AM
> Subject: Re: [dpdk-dev] [PATCH] eal/ppc: remove fix of memory
> barrier for IBM POWER
>
> Pradeep, Chao,
>
> Do we have more news?
> We must merge this patch for DPDK 19.05-rc1.
>
> I understand you want to try improving performance
> by using lightweight sync for SMP barrier,
> and this change can be done later.
> First priority is to fix the bug of the general barrier.
> That's why I should push this patch before the end of the week.
Dekel's patch at:
http://mails.dpdk.org/archives/dev/2019-March/126746.html
seems to be holding up well with the testing done so far. We have
also pulled that patch in to test with DTS. If all hold up well,
we should be able to let you know by tomorrow morning (US west coast time).
Thanks
Pradeep
pradeep@us.ibm.com
^ permalink raw reply	[flat|nested] 42+ messages in thread
* Re: [dpdk-dev] [PATCH] eal/ppc: remove fix of memory barrier for IBM POWER
       [not found]                                 ` <OF456B0ECC.006EF7E7-ON882583CA.00827A75-882583CA.0082F7BE@LocalDomain>
@ 2019-03-28 17:51                                   ` Pradeep Satyanarayana
  2019-03-28 17:51                                     ` Pradeep Satyanarayana
  2019-03-28 17:56                                     ` Thomas Monjalon
  0 siblings, 2 replies; 42+ messages in thread
From: Pradeep Satyanarayana @ 2019-03-28 17:51 UTC (permalink / raw)
  To: Thomas Monjalon
  Cc: bruce.richardson, Chao Zhu, Dekel Peled, dev, David Christensen,
	honnappa.nagarahalli, Idan Werpoler, konstantin.ananyev,
	ola.liljedahl, Olga Shern, Ori Kam, Shahaf Shuler, David Wilder,
	Yongseok Koh
Pradeep Satyanarayana/Beaverton/IBM wrote on 03/27/2019 04:50:31 PM:
> From: Pradeep Satyanarayana/Beaverton/IBM
> To: Thomas Monjalon <thomas@monjalon.net>
> Cc: "bruce.richardson@intel.com" <bruce.richardson@intel.com>, Chao
> Zhu <chaozhu@linux.vnet.ibm.com>, Dekel Peled <dekelp@mellanox.com>,
> dev@dpdk.org, David Christensen <drc@ibm.com>,
> "honnappa.nagarahalli@arm.com" <honnappa.nagarahalli@arm.com>, Idan
> Werpoler <Idanw@mellanox.com>, "konstantin.ananyev@intel.com"
> <konstantin.ananyev@intel.com>, "ola.liljedahl@arm.com"
> <ola.liljedahl@arm.com>, Olga Shern <olgas@mellanox.com>, Ori Kam
> <orika@mellanox.com>, Shahaf Shuler <shahafs@mellanox.com>, David
> Wilder <wilder@us.ibm.com>, Yongseok Koh <yskoh@mellanox.com>
> Date: 03/27/2019 04:50 PM
> Subject: Re: [dpdk-dev] [PATCH] eal/ppc: remove fix of memory
> barrier for IBM POWER
>
> Thomas Monjalon <thomas@monjalon.net> wrote on 03/27/2019 02:19:03 AM:
>
> > From: Thomas Monjalon <thomas@monjalon.net>
> > To: "pradeep@us.ibm.com" <pradeep@us.ibm.com>, Chao Zhu
> > <chaozhu@linux.vnet.ibm.com>
> > Cc: dev@dpdk.org, Dekel Peled <dekelp@mellanox.com>, Shahaf Shuler
> > <shahafs@mellanox.com>, "bruce.richardson@intel.com"
> > <bruce.richardson@intel.com>, David Christensen <drc@ibm.com>,
> > "honnappa.nagarahalli@arm.com" <honnappa.nagarahalli@arm.com>,
> > "konstantin.ananyev@intel.com" <konstantin.ananyev@intel.com>,
> > "ola.liljedahl@arm.com" <ola.liljedahl@arm.com>, Ori Kam
> > <orika@mellanox.com>, David Wilder <wilder@us.ibm.com>, Yongseok Koh
> > <yskoh@mellanox.com>, Idan Werpoler <Idanw@mellanox.com>, Olga Shern
> > <olgas@mellanox.com>
> > Date: 03/27/2019 02:19 AM
> > Subject: Re: [dpdk-dev] [PATCH] eal/ppc: remove fix of memory
> > barrier for IBM POWER
> >
> > Pradeep, Chao,
> >
> > Do we have more news?
> > We must merge this patch for DPDK 19.05-rc1.
> >
> > I understand you want to try improving performance
> > by using lightweight sync for SMP barrier,
> > and this change can be done later.
> > First priority is to fix the bug of the general barrier.
> > That's why I should push this patch before the end of the week.
> Dekel's patch at:
>
> http://mails.dpdk.org/archives/dev/2019-March/126746.html
>
> seems to be holding up well with the testing done so far. We have
> also pulled that patch in to test with DTS. If all hold up well,
> we should be able to let you know by tomorrow morning (US west coast
time).
Thomas,
This is to confirm that the internal testing has completed and all the
tests
passed. So, yes, we can push that into 19.05-rc1.
We will have another patch out for the smp barriers as well. That is still
being tested.
We will also need to get these patches into 18.11 and higher releases. We
will work with others to get that done.
Thanks
Pradeep
pradeep@us.ibm.com
^ permalink raw reply	[flat|nested] 42+ messages in thread
* Re: [dpdk-dev] [PATCH] eal/ppc: remove fix of memory barrier for IBM POWER
  2019-03-28 17:51                                   ` Pradeep Satyanarayana
@ 2019-03-28 17:51                                     ` Pradeep Satyanarayana
  2019-03-28 17:56                                     ` Thomas Monjalon
  1 sibling, 0 replies; 42+ messages in thread
From: Pradeep Satyanarayana @ 2019-03-28 17:51 UTC (permalink / raw)
  To: Thomas Monjalon
  Cc: bruce.richardson, Chao Zhu, Dekel Peled, dev, David Christensen,
	honnappa.nagarahalli, Idan Werpoler, konstantin.ananyev,
	ola.liljedahl, Olga Shern, Ori Kam, Shahaf Shuler, David Wilder,
	Yongseok Koh
Pradeep Satyanarayana/Beaverton/IBM wrote on 03/27/2019 04:50:31 PM:
> From: Pradeep Satyanarayana/Beaverton/IBM
> To: Thomas Monjalon <thomas@monjalon.net>
> Cc: "bruce.richardson@intel.com" <bruce.richardson@intel.com>, Chao
> Zhu <chaozhu@linux.vnet.ibm.com>, Dekel Peled <dekelp@mellanox.com>,
> dev@dpdk.org, David Christensen <drc@ibm.com>,
> "honnappa.nagarahalli@arm.com" <honnappa.nagarahalli@arm.com>, Idan
> Werpoler <Idanw@mellanox.com>, "konstantin.ananyev@intel.com"
> <konstantin.ananyev@intel.com>, "ola.liljedahl@arm.com"
> <ola.liljedahl@arm.com>, Olga Shern <olgas@mellanox.com>, Ori Kam
> <orika@mellanox.com>, Shahaf Shuler <shahafs@mellanox.com>, David
> Wilder <wilder@us.ibm.com>, Yongseok Koh <yskoh@mellanox.com>
> Date: 03/27/2019 04:50 PM
> Subject: Re: [dpdk-dev] [PATCH] eal/ppc: remove fix of memory
> barrier for IBM POWER
>
> Thomas Monjalon <thomas@monjalon.net> wrote on 03/27/2019 02:19:03 AM:
>
> > From: Thomas Monjalon <thomas@monjalon.net>
> > To: "pradeep@us.ibm.com" <pradeep@us.ibm.com>, Chao Zhu
> > <chaozhu@linux.vnet.ibm.com>
> > Cc: dev@dpdk.org, Dekel Peled <dekelp@mellanox.com>, Shahaf Shuler
> > <shahafs@mellanox.com>, "bruce.richardson@intel.com"
> > <bruce.richardson@intel.com>, David Christensen <drc@ibm.com>,
> > "honnappa.nagarahalli@arm.com" <honnappa.nagarahalli@arm.com>,
> > "konstantin.ananyev@intel.com" <konstantin.ananyev@intel.com>,
> > "ola.liljedahl@arm.com" <ola.liljedahl@arm.com>, Ori Kam
> > <orika@mellanox.com>, David Wilder <wilder@us.ibm.com>, Yongseok Koh
> > <yskoh@mellanox.com>, Idan Werpoler <Idanw@mellanox.com>, Olga Shern
> > <olgas@mellanox.com>
> > Date: 03/27/2019 02:19 AM
> > Subject: Re: [dpdk-dev] [PATCH] eal/ppc: remove fix of memory
> > barrier for IBM POWER
> >
> > Pradeep, Chao,
> >
> > Do we have more news?
> > We must merge this patch for DPDK 19.05-rc1.
> >
> > I understand you want to try improving performance
> > by using lightweight sync for SMP barrier,
> > and this change can be done later.
> > First priority is to fix the bug of the general barrier.
> > That's why I should push this patch before the end of the week.
> Dekel's patch at:
>
> http://mails.dpdk.org/archives/dev/2019-March/126746.html
>
> seems to be holding up well with the testing done so far. We have
> also pulled that patch in to test with DTS. If all hold up well,
> we should be able to let you know by tomorrow morning (US west coast
time).
Thomas,
This is to confirm that the internal testing has completed and all the
tests
passed. So, yes, we can push that into 19.05-rc1.
We will have another patch out for the smp barriers as well. That is still
being tested.
We will also need to get these patches into 18.11 and higher releases. We
will work with others to get that done.
Thanks
Pradeep
pradeep@us.ibm.com
^ permalink raw reply	[flat|nested] 42+ messages in thread
* Re: [dpdk-dev] [PATCH] eal/ppc: remove fix of memory barrier for IBM POWER
  2019-03-28 17:51                                   ` Pradeep Satyanarayana
  2019-03-28 17:51                                     ` Pradeep Satyanarayana
@ 2019-03-28 17:56                                     ` Thomas Monjalon
  2019-03-28 17:56                                       ` Thomas Monjalon
  1 sibling, 1 reply; 42+ messages in thread
From: Thomas Monjalon @ 2019-03-28 17:56 UTC (permalink / raw)
  To: Pradeep Satyanarayana
  Cc: bruce.richardson, Chao Zhu, Dekel Peled, dev, David Christensen,
	honnappa.nagarahalli, Idan Werpoler, konstantin.ananyev,
	ola.liljedahl, Olga Shern, Ori Kam, Shahaf Shuler, David Wilder,
	Yongseok Koh
28/03/2019 18:51, Pradeep Satyanarayana:
> From: Pradeep Satyanarayana/Beaverton/IBM
> > From: Thomas Monjalon <thomas@monjalon.net>
> > > Pradeep, Chao,
> > >
> > > Do we have more news?
> > > We must merge this patch for DPDK 19.05-rc1.
> > >
> > > I understand you want to try improving performance
> > > by using lightweight sync for SMP barrier,
> > > and this change can be done later.
> > > First priority is to fix the bug of the general barrier.
> > > That's why I should push this patch before the end of the week.
> 
> > Dekel's patch at:
> >
> > http://mails.dpdk.org/archives/dev/2019-March/126746.html
> >
> > seems to be holding up well with the testing done so far. We have
> > also pulled that patch in to test with DTS. If all hold up well,
> > we should be able to let you know by tomorrow morning (US west coast
> time).
> 
> Thomas,
> 
> This is to confirm that the internal testing has completed and all the
> tests
> passed. So, yes, we can push that into 19.05-rc1.
> 
> We will have another patch out for the smp barriers as well. That is still
> being tested.
> 
> We will also need to get these patches into 18.11 and higher releases. We
> will work with others to get that done.
OK, thanks
^ permalink raw reply	[flat|nested] 42+ messages in thread
* Re: [dpdk-dev] [PATCH] eal/ppc: remove fix of memory barrier for IBM POWER
  2019-03-28 17:56                                     ` Thomas Monjalon
@ 2019-03-28 17:56                                       ` Thomas Monjalon
  0 siblings, 0 replies; 42+ messages in thread
From: Thomas Monjalon @ 2019-03-28 17:56 UTC (permalink / raw)
  To: Pradeep Satyanarayana
  Cc: bruce.richardson, Chao Zhu, Dekel Peled, dev, David Christensen,
	honnappa.nagarahalli, Idan Werpoler, konstantin.ananyev,
	ola.liljedahl, Olga Shern, Ori Kam, Shahaf Shuler, David Wilder,
	Yongseok Koh
28/03/2019 18:51, Pradeep Satyanarayana:
> From: Pradeep Satyanarayana/Beaverton/IBM
> > From: Thomas Monjalon <thomas@monjalon.net>
> > > Pradeep, Chao,
> > >
> > > Do we have more news?
> > > We must merge this patch for DPDK 19.05-rc1.
> > >
> > > I understand you want to try improving performance
> > > by using lightweight sync for SMP barrier,
> > > and this change can be done later.
> > > First priority is to fix the bug of the general barrier.
> > > That's why I should push this patch before the end of the week.
> 
> > Dekel's patch at:
> >
> > http://mails.dpdk.org/archives/dev/2019-March/126746.html
> >
> > seems to be holding up well with the testing done so far. We have
> > also pulled that patch in to test with DTS. If all hold up well,
> > we should be able to let you know by tomorrow morning (US west coast
> time).
> 
> Thomas,
> 
> This is to confirm that the internal testing has completed and all the
> tests
> passed. So, yes, we can push that into 19.05-rc1.
> 
> We will have another patch out for the smp barriers as well. That is still
> being tested.
> 
> We will also need to get these patches into 18.11 and higher releases. We
> will work with others to get that done.
OK, thanks
^ permalink raw reply	[flat|nested] 42+ messages in thread
* Re: [dpdk-dev] [dpdk-stable] [PATCH] eal/ppc: remove fix of memory barrier for IBM POWER
  2019-03-18 12:58 [dpdk-dev] [PATCH] eal/ppc: remove fix of memory barrier for IBM POWER Dekel Peled
  2019-03-18 12:58 ` Dekel Peled
  2019-03-19  3:24 ` Chao Zhu
@ 2019-03-28 22:50 ` Thomas Monjalon
  2019-03-28 22:50   ` Thomas Monjalon
  2 siblings, 1 reply; 42+ messages in thread
From: Thomas Monjalon @ 2019-03-28 22:50 UTC (permalink / raw)
  To: Dekel Peled
  Cc: stable, chaozhu, yskoh, shahafs, dev, orika, pradeep, David Christensen
18/03/2019 13:58, Dekel Peled:
> From previous patch description: "to improve performance on PPC64,
> use light weight sync instruction instead of sync instruction."
> 
> Excerpt from IBM doc [1], section "Memory barrier instructions":
> "The second form of the sync instruction is light-weight sync,
> or lwsync.
> This form is used to control ordering for storage accesses to system
> memory only. It does not create a memory barrier for accesses to
> device memory."
> 
> This patch removes the use of lwsync, so calls to rte_wmb() and
> rte_rmb() will provide correct memory barrier to ensure order of
> accesses to system memory and device memory.
> 
> [1] https://www.ibm.com/developerworks/systems/articles/powerpc.html
> 
> Fixes: d23a6bd04d72 ("eal/ppc: fix memory barrier for IBM POWER")
> Cc: stable@dpdk.org
> 
> Signed-off-by: Dekel Peled <dekelp@mellanox.com>
Applied, thanks
^ permalink raw reply	[flat|nested] 42+ messages in thread
* Re: [dpdk-dev] [dpdk-stable] [PATCH] eal/ppc: remove fix of memory barrier for IBM POWER
  2019-03-28 22:50 ` [dpdk-dev] [dpdk-stable] " Thomas Monjalon
@ 2019-03-28 22:50   ` Thomas Monjalon
  0 siblings, 0 replies; 42+ messages in thread
From: Thomas Monjalon @ 2019-03-28 22:50 UTC (permalink / raw)
  To: Dekel Peled
  Cc: stable, chaozhu, yskoh, shahafs, dev, orika, pradeep, David Christensen
18/03/2019 13:58, Dekel Peled:
> From previous patch description: "to improve performance on PPC64,
> use light weight sync instruction instead of sync instruction."
> 
> Excerpt from IBM doc [1], section "Memory barrier instructions":
> "The second form of the sync instruction is light-weight sync,
> or lwsync.
> This form is used to control ordering for storage accesses to system
> memory only. It does not create a memory barrier for accesses to
> device memory."
> 
> This patch removes the use of lwsync, so calls to rte_wmb() and
> rte_rmb() will provide correct memory barrier to ensure order of
> accesses to system memory and device memory.
> 
> [1] https://www.ibm.com/developerworks/systems/articles/powerpc.html
> 
> Fixes: d23a6bd04d72 ("eal/ppc: fix memory barrier for IBM POWER")
> Cc: stable@dpdk.org
> 
> Signed-off-by: Dekel Peled <dekelp@mellanox.com>
Applied, thanks
^ permalink raw reply	[flat|nested] 42+ messages in thread
end of thread, other threads:[~2019-03-28 22:50 UTC | newest]
Thread overview: 42+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2019-03-18 12:58 [dpdk-dev] [PATCH] eal/ppc: remove fix of memory barrier for IBM POWER Dekel Peled
2019-03-18 12:58 ` Dekel Peled
2019-03-19  3:24 ` Chao Zhu
2019-03-19  3:24   ` Chao Zhu
2019-03-19 10:05   ` Dekel Peled
2019-03-19 10:05     ` Dekel Peled
2019-03-19 11:14     ` Thomas Monjalon
2019-03-19 11:14       ` Thomas Monjalon
2019-03-19 19:42       ` Shahaf Shuler
2019-03-19 19:42         ` Shahaf Shuler
2019-03-19 20:45         ` Thomas Monjalon
2019-03-19 20:45           ` Thomas Monjalon
2019-03-20 22:40           ` Pradeep Satyanarayana
2019-03-20 22:40             ` Pradeep Satyanarayana
2019-03-21  8:49             ` Shahaf Shuler
2019-03-21  8:49               ` Shahaf Shuler
2019-03-22  1:40               ` Pradeep Satyanarayana
2019-03-22  1:40                 ` Pradeep Satyanarayana
2019-03-22  8:49                 ` Thomas Monjalon
2019-03-22  8:49                   ` Thomas Monjalon
2019-03-22 15:30                   ` Pradeep Satyanarayana
2019-03-22 15:30                     ` Pradeep Satyanarayana
2019-03-22 17:51                     ` Thomas Monjalon
2019-03-22 17:51                       ` Thomas Monjalon
2019-03-22 22:57                       ` Pradeep Satyanarayana
2019-03-22 22:57                         ` Pradeep Satyanarayana
2019-03-24  6:37                         ` Shahaf Shuler
2019-03-24  6:37                           ` Shahaf Shuler
2019-03-24 17:37                           ` Pradeep Satyanarayana
2019-03-24 17:37                             ` Pradeep Satyanarayana
2019-03-26  9:15                             ` Dekel Peled
2019-03-26  9:15                               ` Dekel Peled
2019-03-27  9:19                               ` Thomas Monjalon
2019-03-27  9:19                                 ` Thomas Monjalon
2019-03-27 23:50                                 ` Pradeep Satyanarayana
2019-03-27 23:50                                   ` Pradeep Satyanarayana
     [not found]                                 ` <OF456B0ECC.006EF7E7-ON882583CA.00827A75-882583CA.0082F7BE@LocalDomain>
2019-03-28 17:51                                   ` Pradeep Satyanarayana
2019-03-28 17:51                                     ` Pradeep Satyanarayana
2019-03-28 17:56                                     ` Thomas Monjalon
2019-03-28 17:56                                       ` Thomas Monjalon
2019-03-28 22:50 ` [dpdk-dev] [dpdk-stable] " Thomas Monjalon
2019-03-28 22:50   ` Thomas Monjalon
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).