19/03/2019 20:42, Shahaf Shuler: > Tuesday, March 19, 2019 1:15 PM, Thomas Monjalon: > > Subject: Re: [PATCH] eal/ppc: remove fix of memory barrier for IBM POWER > > > > Guys, please let's avoid top-post. > > > > You are both not replying to each other: > > > > 1/ Dekel mentioned the IBM doc but Chao did not argue about the lack of IO > > protection with lwsync. > > We assume that rte_mb should protect any access including IO. > > > > 2/ Chao asked about the semantic of the barrier used in mlx5 code, but Dekel > > did not reply about the semantic: are we protecting IO or general memory > > access? > > In mlx5 code we want to sync between two different writes: > 1. write to system memory (RAM) > 2. write to MMIO memory (device) > > We need #1 to be visible on host memory before #2 is committed to NIC. > We want to have a single type of barrier which will translate to the correct assembly based on the system arch, and in addition we want it light-weight as possible. > > So far, when not running on power, we used the rte_wmb for that. On x86 and ARM systems it provided the needed guarantees. > It is also mentioned in the barrier doxygen on ARM arch: > " > Write memory barrier. > > Guarantees that the STORE operations generated before the barrier > occur before the STORE operations generated after. > " > > It doesn't restrict to store to system memory only. > w/ power is on somewhat different and in fact rte_mb is required. It obviously miss the point of those barrier if we will need to use a different barrier based on the system arch. > > We need to align the definition of the different barriers in DPDK: > 1. need a clear documentation of each. this should be global and not part of the specific implementation on each arch. The global definition is in lib/librte_eal/common/include/generic/rte_atomic.h There are some copy/paste in Arm32 and PPC that I will remove. > 2. either modify ppc rte_wmb to match ARM and x86 ones or to define a new type of barrier which will sync between both I/O and stores to systems memory. The basic memory barrier of DPDK does not mention a difference between I/O and system memory. It is not explicit (yet) but I assume it is protecting both. So, in my opinion, we need to make it explicit in the doc, and fix the PPC implementation to comply with this definition. Anyway, I don't see any significant effort from IBM to move from the alpha support stage to a real Open Source support. PS: sending a mail every two months, to promise improvements, is not enough! ----------------- > > 19/03/2019 11:05, Dekel Peled: > > > Hi, > > > > > > For ppc, rte_io_mb() is defined as rte_mb(), which is defined as asm sync. > > > According to comments in arch/ppc_64/rte_atomic.h, rte_wmb() and > > rte_rmb() are the same as rte_mb(), for store and load respectively. > > > My patch propose to define rte_wmb() and rte_rmb() as asm sync, like > > rte_mb(), since using lwsync is incorrect for them. > > > > > > Regards, > > > Dekel > > > > > > From: Chao Zhu > > > > Dekel£¬ > > > > > > > > To control the memory order for device memory, I think you should > > > > use > > > > rte_io_mb() instead of rte_mb(). This will generate correct result. > > > > rte_wmb() is used for system memory. > > > > > > > > From: Dekel Peled > > > > > > > > > > From previous patch description: "to improve performance on PPC64, > > > > > use light weight sync instruction instead of sync instruction." > > > > > > > > > > Excerpt from IBM doc [1], section "Memory barrier instructions": > > > > > "The second form of the sync instruction is light-weight sync, or lwsync. > > > > > This form is used to control ordering for storage accesses to > > > > > system memory only. It does not create a memory barrier for > > > > > accesses to device > > > > memory." > > > > > > > > > > This patch removes the use of lwsync, so calls to rte_wmb() and > > > > > rte_rmb() will provide correct memory barrier to ensure order of > > > > > accesses to system memory and device memory.