* [dpdk-stable] [PATCH] raw/ntb: fix write memory barrier issue @ 2019-12-04 15:19 Xiaoyun Li 2019-12-14 15:29 ` [dpdk-stable] [dpdk-dev] " Gavin Hu (Arm Technology China) ` (2 more replies) 0 siblings, 3 replies; 13+ messages in thread From: Xiaoyun Li @ 2019-12-04 15:19 UTC (permalink / raw) To: jingjing.wu; +Cc: dev, Xiaoyun Li, stable All buffers and ring info should be written before tail register update. This patch relocates the write memory barrier before updating tail register to avoid potential issues. Fixes: 11b5c7daf019 ("raw/ntb: add enqueue and dequeue functions") Cc: stable@dpdk.org Signed-off-by: Xiaoyun Li <xiaoyun.li@intel.com> --- drivers/raw/ntb/ntb.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/drivers/raw/ntb/ntb.c b/drivers/raw/ntb/ntb.c index ad7f6abfd..dd0b72f8c 100644 --- a/drivers/raw/ntb/ntb.c +++ b/drivers/raw/ntb/ntb.c @@ -683,8 +683,8 @@ ntb_enqueue_bufs(struct rte_rawdev *dev, sizeof(struct ntb_used) * nb1); rte_memcpy(txq->tx_used_ring, tx_used + nb1, sizeof(struct ntb_used) * nb2); - *txq->used_cnt = txq->last_used; rte_wmb(); + *txq->used_cnt = txq->last_used; /* update queue stats */ hw->ntb_xstats[NTB_TX_BYTES_ID + off] += bytes; @@ -789,8 +789,8 @@ ntb_dequeue_bufs(struct rte_rawdev *dev, sizeof(struct ntb_desc) * nb1); rte_memcpy(rxq->rx_desc_ring, rx_desc + nb1, sizeof(struct ntb_desc) * nb2); - *rxq->avail_cnt = rxq->last_avail; rte_wmb(); + *rxq->avail_cnt = rxq->last_avail; /* update queue stats */ off = NTB_XSTATS_NUM * ((size_t)context + 1); -- 2.17.1 ^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [dpdk-stable] [dpdk-dev] [PATCH] raw/ntb: fix write memory barrier issue 2019-12-04 15:19 [dpdk-stable] [PATCH] raw/ntb: fix write memory barrier issue Xiaoyun Li @ 2019-12-14 15:29 ` Gavin Hu (Arm Technology China) 2019-12-16 1:57 ` Li, Xiaoyun 2019-12-16 1:58 ` [dpdk-stable] [PATCH v2] " Xiaoyun Li 2019-12-26 1:46 ` [dpdk-stable] [PATCH] " Wu, Jingjing 2 siblings, 1 reply; 13+ messages in thread From: Gavin Hu (Arm Technology China) @ 2019-12-14 15:29 UTC (permalink / raw) To: Xiaoyun Li, jingjing.wu; +Cc: dev, stable, nd Hi Xiaoyun, > -----Original Message----- > From: dev <dev-bounces@dpdk.org> On Behalf Of Xiaoyun Li > Sent: Wednesday, December 4, 2019 11:19 PM > To: jingjing.wu@intel.com > Cc: dev@dpdk.org; Xiaoyun Li <xiaoyun.li@intel.com>; stable@dpdk.org > Subject: [dpdk-dev] [PATCH] raw/ntb: fix write memory barrier issue > > All buffers and ring info should be written before tail register update. > This patch relocates the write memory barrier before updating tail register > to avoid potential issues. > > Fixes: 11b5c7daf019 ("raw/ntb: add enqueue and dequeue functions") > Cc: stable@dpdk.org > > Signed-off-by: Xiaoyun Li <xiaoyun.li@intel.com> > --- > drivers/raw/ntb/ntb.c | 4 ++-- > 1 file changed, 2 insertions(+), 2 deletions(-) > > diff --git a/drivers/raw/ntb/ntb.c b/drivers/raw/ntb/ntb.c > index ad7f6abfd..dd0b72f8c 100644 > --- a/drivers/raw/ntb/ntb.c > +++ b/drivers/raw/ntb/ntb.c > @@ -683,8 +683,8 @@ ntb_enqueue_bufs(struct rte_rawdev *dev, > sizeof(struct ntb_used) * nb1); > rte_memcpy(txq->tx_used_ring, tx_used + nb1, > sizeof(struct ntb_used) * nb2); > - *txq->used_cnt = txq->last_used; > rte_wmb(); > + *txq->used_cnt = txq->last_used; I am ok with the re-location of the barrier, but why not the rte_io_wmb instead of rte_wmb? Rte_io_wmb is sufficient to guarantee the preceding stores are visible to the device, rte_wmb is overkill. https://code.dpdk.org/dpdk/latest/source/lib/librte_eal/common/include/generic/rte_atomic.h#L92 > > /* update queue stats */ > hw->ntb_xstats[NTB_TX_BYTES_ID + off] += bytes; > @@ -789,8 +789,8 @@ ntb_dequeue_bufs(struct rte_rawdev *dev, > sizeof(struct ntb_desc) * nb1); > rte_memcpy(rxq->rx_desc_ring, rx_desc + nb1, > sizeof(struct ntb_desc) * nb2); > - *rxq->avail_cnt = rxq->last_avail; > rte_wmb(); > + *rxq->avail_cnt = rxq->last_avail; > > /* update queue stats */ > off = NTB_XSTATS_NUM * ((size_t)context + 1); > -- > 2.17.1 ^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [dpdk-stable] [dpdk-dev] [PATCH] raw/ntb: fix write memory barrier issue 2019-12-14 15:29 ` [dpdk-stable] [dpdk-dev] " Gavin Hu (Arm Technology China) @ 2019-12-16 1:57 ` Li, Xiaoyun 2019-12-25 3:52 ` Gavin Hu 0 siblings, 1 reply; 13+ messages in thread From: Li, Xiaoyun @ 2019-12-16 1:57 UTC (permalink / raw) To: Gavin Hu (Arm Technology China), Wu, Jingjing, Maslekar, Omkar Cc: dev, stable, nd Didn't notice that. Will fix it in v2. Thanks! > -----Original Message----- > From: Gavin Hu (Arm Technology China) [mailto:Gavin.Hu@arm.com] > Sent: Saturday, December 14, 2019 23:30 > To: Li, Xiaoyun <xiaoyun.li@intel.com>; Wu, Jingjing <jingjing.wu@intel.com> > Cc: dev@dpdk.org; stable@dpdk.org; nd <nd@arm.com> > Subject: RE: [dpdk-dev] [PATCH] raw/ntb: fix write memory barrier issue > > Hi Xiaoyun, > > > -----Original Message----- > > From: dev <dev-bounces@dpdk.org> On Behalf Of Xiaoyun Li > > Sent: Wednesday, December 4, 2019 11:19 PM > > To: jingjing.wu@intel.com > > Cc: dev@dpdk.org; Xiaoyun Li <xiaoyun.li@intel.com>; stable@dpdk.org > > Subject: [dpdk-dev] [PATCH] raw/ntb: fix write memory barrier issue > > > > All buffers and ring info should be written before tail register update. > > This patch relocates the write memory barrier before updating tail register > > to avoid potential issues. > > > > Fixes: 11b5c7daf019 ("raw/ntb: add enqueue and dequeue functions") > > Cc: stable@dpdk.org > > > > Signed-off-by: Xiaoyun Li <xiaoyun.li@intel.com> > > --- > > drivers/raw/ntb/ntb.c | 4 ++-- > > 1 file changed, 2 insertions(+), 2 deletions(-) > > > > diff --git a/drivers/raw/ntb/ntb.c b/drivers/raw/ntb/ntb.c > > index ad7f6abfd..dd0b72f8c 100644 > > --- a/drivers/raw/ntb/ntb.c > > +++ b/drivers/raw/ntb/ntb.c > > @@ -683,8 +683,8 @@ ntb_enqueue_bufs(struct rte_rawdev *dev, > > sizeof(struct ntb_used) * nb1); > > rte_memcpy(txq->tx_used_ring, tx_used + nb1, > > sizeof(struct ntb_used) * nb2); > > - *txq->used_cnt = txq->last_used; > > rte_wmb(); > > + *txq->used_cnt = txq->last_used; > I am ok with the re-location of the barrier, but why not the rte_io_wmb instead > of rte_wmb? > Rte_io_wmb is sufficient to guarantee the preceding stores are visible to the > device, rte_wmb is overkill. > https://code.dpdk.org/dpdk/latest/source/lib/librte_eal/common/include/gener > ic/rte_atomic.h#L92 > > > > /* update queue stats */ > > hw->ntb_xstats[NTB_TX_BYTES_ID + off] += bytes; > > @@ -789,8 +789,8 @@ ntb_dequeue_bufs(struct rte_rawdev *dev, > > sizeof(struct ntb_desc) * nb1); > > rte_memcpy(rxq->rx_desc_ring, rx_desc + nb1, > > sizeof(struct ntb_desc) * nb2); > > - *rxq->avail_cnt = rxq->last_avail; > > rte_wmb(); > > + *rxq->avail_cnt = rxq->last_avail; > > > > /* update queue stats */ > > off = NTB_XSTATS_NUM * ((size_t)context + 1); > > -- > > 2.17.1 ^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [dpdk-stable] [dpdk-dev] [PATCH] raw/ntb: fix write memory barrier issue 2019-12-16 1:57 ` Li, Xiaoyun @ 2019-12-25 3:52 ` Gavin Hu 0 siblings, 0 replies; 13+ messages in thread From: Gavin Hu @ 2019-12-25 3:52 UTC (permalink / raw) To: Li, Xiaoyun, Wu, Jingjing, Maslekar, Omkar; +Cc: dev, stable, nd, nd Reviewed-by: Gavin Hu <gavin.hu@arm.com> ^ permalink raw reply [flat|nested] 13+ messages in thread
* [dpdk-stable] [PATCH v2] raw/ntb: fix write memory barrier issue 2019-12-04 15:19 [dpdk-stable] [PATCH] raw/ntb: fix write memory barrier issue Xiaoyun Li 2019-12-14 15:29 ` [dpdk-stable] [dpdk-dev] " Gavin Hu (Arm Technology China) @ 2019-12-16 1:58 ` Xiaoyun Li 2019-12-16 10:49 ` [dpdk-stable] [dpdk-dev] " Gavin Hu (Arm Technology China) 2019-12-23 2:38 ` [dpdk-stable] " Wu, Jingjing 2019-12-26 1:46 ` [dpdk-stable] [PATCH] " Wu, Jingjing 2 siblings, 2 replies; 13+ messages in thread From: Xiaoyun Li @ 2019-12-16 1:58 UTC (permalink / raw) To: jingjing.wu; +Cc: dev, omkar.maslekar, Xiaoyun Li, stable All buffers and ring info should be written before tail register update. This patch relocates the write memory barrier before updating tail register to avoid potential issues. Fixes: 11b5c7daf019 ("raw/ntb: add enqueue and dequeue functions") Cc: stable@dpdk.org Signed-off-by: Xiaoyun Li <xiaoyun.li@intel.com> --- v2: * Replaced rte_wmb with rte_io_wmb since rte_io_wmb is enough. --- drivers/raw/ntb/ntb.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/drivers/raw/ntb/ntb.c b/drivers/raw/ntb/ntb.c index ad7f6abfd..c7de86f36 100644 --- a/drivers/raw/ntb/ntb.c +++ b/drivers/raw/ntb/ntb.c @@ -683,8 +683,8 @@ ntb_enqueue_bufs(struct rte_rawdev *dev, sizeof(struct ntb_used) * nb1); rte_memcpy(txq->tx_used_ring, tx_used + nb1, sizeof(struct ntb_used) * nb2); + rte_io_wmb(); *txq->used_cnt = txq->last_used; - rte_wmb(); /* update queue stats */ hw->ntb_xstats[NTB_TX_BYTES_ID + off] += bytes; @@ -789,8 +789,8 @@ ntb_dequeue_bufs(struct rte_rawdev *dev, sizeof(struct ntb_desc) * nb1); rte_memcpy(rxq->rx_desc_ring, rx_desc + nb1, sizeof(struct ntb_desc) * nb2); + rte_io_wmb(); *rxq->avail_cnt = rxq->last_avail; - rte_wmb(); /* update queue stats */ off = NTB_XSTATS_NUM * ((size_t)context + 1); -- 2.17.1 ^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [dpdk-stable] [dpdk-dev] [PATCH v2] raw/ntb: fix write memory barrier issue 2019-12-16 1:58 ` [dpdk-stable] [PATCH v2] " Xiaoyun Li @ 2019-12-16 10:49 ` Gavin Hu (Arm Technology China) 2019-12-23 7:52 ` Li, Xiaoyun 2019-12-23 2:38 ` [dpdk-stable] " Wu, Jingjing 1 sibling, 1 reply; 13+ messages in thread From: Gavin Hu (Arm Technology China) @ 2019-12-16 10:49 UTC (permalink / raw) To: Xiaoyun Li, jingjing.wu; +Cc: dev, omkar.maslekar, stable, nd > -----Original Message----- > From: dev <dev-bounces@dpdk.org> On Behalf Of Xiaoyun Li > Sent: Monday, December 16, 2019 9:59 AM > To: jingjing.wu@intel.com > Cc: dev@dpdk.org; omkar.maslekar@intel.com; Xiaoyun Li > <xiaoyun.li@intel.com>; stable@dpdk.org > Subject: [dpdk-dev] [PATCH v2] raw/ntb: fix write memory barrier issue > > All buffers and ring info should be written before tail register update. > This patch relocates the write memory barrier before updating tail register > to avoid potential issues. > > Fixes: 11b5c7daf019 ("raw/ntb: add enqueue and dequeue functions") > Cc: stable@dpdk.org > > Signed-off-by: Xiaoyun Li <xiaoyun.li@intel.com> > --- > v2: > * Replaced rte_wmb with rte_io_wmb since rte_io_wmb is enough. > --- > drivers/raw/ntb/ntb.c | 4 ++-- > 1 file changed, 2 insertions(+), 2 deletions(-) > > diff --git a/drivers/raw/ntb/ntb.c b/drivers/raw/ntb/ntb.c > index ad7f6abfd..c7de86f36 100644 > --- a/drivers/raw/ntb/ntb.c > +++ b/drivers/raw/ntb/ntb.c > @@ -683,8 +683,8 @@ ntb_enqueue_bufs(struct rte_rawdev *dev, > sizeof(struct ntb_used) * nb1); > rte_memcpy(txq->tx_used_ring, tx_used + nb1, > sizeof(struct ntb_used) * nb2); > + rte_io_wmb(); As both txq->tx_used_ring and *txq->used_cnt are physically reside in the PCI device side, rte_io_wmb is correct to ensure the ordering. > *txq->used_cnt = txq->last_used; > - rte_wmb(); > > /* update queue stats */ > hw->ntb_xstats[NTB_TX_BYTES_ID + off] += bytes; > @@ -789,8 +789,8 @@ ntb_dequeue_bufs(struct rte_rawdev *dev, > sizeof(struct ntb_desc) * nb1); > rte_memcpy(rxq->rx_desc_ring, rx_desc + nb1, > sizeof(struct ntb_desc) * nb2); > + rte_io_wmb(); > *rxq->avail_cnt = rxq->last_avail; > - rte_wmb(); > > /* update queue stats */ > off = NTB_XSTATS_NUM * ((size_t)context + 1); > -- > 2.17.1 Reviewed-by: Gavin Hu <gavin.hu@arm.com> ^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [dpdk-stable] [dpdk-dev] [PATCH v2] raw/ntb: fix write memory barrier issue 2019-12-16 10:49 ` [dpdk-stable] [dpdk-dev] " Gavin Hu (Arm Technology China) @ 2019-12-23 7:52 ` Li, Xiaoyun 2019-12-23 8:38 ` Gavin Hu 0 siblings, 1 reply; 13+ messages in thread From: Li, Xiaoyun @ 2019-12-23 7:52 UTC (permalink / raw) To: Gavin Hu (Arm Technology China), Wu, Jingjing Cc: dev, Maslekar, Omkar, stable, nd Hi I reconsidered and retested about this issue. I still need to use rte_wmb instead of using rte_io_wmb. Because to achieve high performance, ntb needs to turn on WC(write combining) feature. The perf difference with and without WC enabled is more than 20X. And when WC enabled, rte_io_wmb cannot make sure the instructions are in order only rte_wmb can make sure that. And in my retest, when sending 64 bytes packets, using rte_io_wmb will cause out-of-order issue and cause memory corruption on rx side. And using rte_wmb is fine. So I can only use v1 patch and suspend v2 patch in patchwork. Best Regards Xiaoyun Li > -----Original Message----- > From: Gavin Hu (Arm Technology China) [mailto:Gavin.Hu@arm.com] > Sent: Monday, December 16, 2019 18:50 > To: Li, Xiaoyun <xiaoyun.li@intel.com>; Wu, Jingjing <jingjing.wu@intel.com> > Cc: dev@dpdk.org; Maslekar, Omkar <omkar.maslekar@intel.com>; > stable@dpdk.org; nd <nd@arm.com> > Subject: RE: [dpdk-dev] [PATCH v2] raw/ntb: fix write memory barrier issue > > > > > -----Original Message----- > > From: dev <dev-bounces@dpdk.org> On Behalf Of Xiaoyun Li > > Sent: Monday, December 16, 2019 9:59 AM > > To: jingjing.wu@intel.com > > Cc: dev@dpdk.org; omkar.maslekar@intel.com; Xiaoyun Li > > <xiaoyun.li@intel.com>; stable@dpdk.org > > Subject: [dpdk-dev] [PATCH v2] raw/ntb: fix write memory barrier issue > > > > All buffers and ring info should be written before tail register update. > > This patch relocates the write memory barrier before updating tail > > register to avoid potential issues. > > > > Fixes: 11b5c7daf019 ("raw/ntb: add enqueue and dequeue functions") > > Cc: stable@dpdk.org > > > > Signed-off-by: Xiaoyun Li <xiaoyun.li@intel.com> > > --- > > v2: > > * Replaced rte_wmb with rte_io_wmb since rte_io_wmb is enough. > > --- > > drivers/raw/ntb/ntb.c | 4 ++-- > > 1 file changed, 2 insertions(+), 2 deletions(-) > > > > diff --git a/drivers/raw/ntb/ntb.c b/drivers/raw/ntb/ntb.c index > > ad7f6abfd..c7de86f36 100644 > > --- a/drivers/raw/ntb/ntb.c > > +++ b/drivers/raw/ntb/ntb.c > > @@ -683,8 +683,8 @@ ntb_enqueue_bufs(struct rte_rawdev *dev, > > sizeof(struct ntb_used) * nb1); > > rte_memcpy(txq->tx_used_ring, tx_used + nb1, > > sizeof(struct ntb_used) * nb2); > > + rte_io_wmb(); > As both txq->tx_used_ring and *txq->used_cnt are physically reside in the PCI > device side, rte_io_wmb is correct to ensure the ordering. > > > *txq->used_cnt = txq->last_used; > > - rte_wmb(); > > > > /* update queue stats */ > > hw->ntb_xstats[NTB_TX_BYTES_ID + off] += bytes; @@ -789,8 > +789,8 @@ > > ntb_dequeue_bufs(struct rte_rawdev *dev, > > sizeof(struct ntb_desc) * nb1); > > rte_memcpy(rxq->rx_desc_ring, rx_desc + nb1, > > sizeof(struct ntb_desc) * nb2); > > + rte_io_wmb(); > > *rxq->avail_cnt = rxq->last_avail; > > - rte_wmb(); > > > > /* update queue stats */ > > off = NTB_XSTATS_NUM * ((size_t)context + 1); > > -- > > 2.17.1 > > Reviewed-by: Gavin Hu <gavin.hu@arm.com> ^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [dpdk-stable] [dpdk-dev] [PATCH v2] raw/ntb: fix write memory barrier issue 2019-12-23 7:52 ` Li, Xiaoyun @ 2019-12-23 8:38 ` Gavin Hu 2019-12-23 9:35 ` Li, Xiaoyun 0 siblings, 1 reply; 13+ messages in thread From: Gavin Hu @ 2019-12-23 8:38 UTC (permalink / raw) To: Li, Xiaoyun, Wu, Jingjing Cc: dev, Maslekar, Omkar, stable, nd, jerinj, Honnappa Nagarahalli, bruce.richardson, nd Hi Xiaoyun, > -----Original Message----- > From: Li, Xiaoyun <xiaoyun.li@intel.com> > Sent: Monday, December 23, 2019 3:52 PM > To: Gavin Hu <Gavin.Hu@arm.com>; Wu, Jingjing <jingjing.wu@intel.com> > Cc: dev@dpdk.org; Maslekar, Omkar <omkar.maslekar@intel.com>; > stable@dpdk.org; nd <nd@arm.com> > Subject: RE: [dpdk-dev] [PATCH v2] raw/ntb: fix write memory barrier issue > > Hi > I reconsidered and retested about this issue. > I still need to use rte_wmb instead of using rte_io_wmb. > > Because to achieve high performance, ntb needs to turn on WC(write > combining) feature. The perf difference with and without WC enabled is > more than 20X. > And when WC enabled, rte_io_wmb cannot make sure the instructions are > in order only rte_wmb can make sure that. > > And in my retest, when sending 64 bytes packets, using rte_io_wmb will > cause out-of-order issue and cause memory corruption on rx side. > And using rte_wmb is fine. That's true, as it is declared as 'write combine' region, even x86 is known as strong ordered, it is the interconnect or PCI RC may do the reordering, 'write combine', 'write coalescing', which caused this problem. IMO, rte_io_*mb barriers on x86 should be promoted to stronger is WC is involved(but that will sap performance for non-WC memories?). https://code.dpdk.org/dpdk/latest/source/lib/librte_eal/common/include/arch/x86/rte_atomic.h#L78 Using rte_wmb will hurt performance for aarch64 also, as pci device memory accesses to a single device are strongly ordered therefore the strongest rte_wmb is not necessary. > So I can only use v1 patch and suspend v2 patch in patchwork. > > Best Regards > Xiaoyun Li > > > -----Original Message----- > > From: Gavin Hu (Arm Technology China) [mailto:Gavin.Hu@arm.com] > > Sent: Monday, December 16, 2019 18:50 > > To: Li, Xiaoyun <xiaoyun.li@intel.com>; Wu, Jingjing > <jingjing.wu@intel.com> > > Cc: dev@dpdk.org; Maslekar, Omkar <omkar.maslekar@intel.com>; > > stable@dpdk.org; nd <nd@arm.com> > > Subject: RE: [dpdk-dev] [PATCH v2] raw/ntb: fix write memory barrier > issue > > > > > > > > > -----Original Message----- > > > From: dev <dev-bounces@dpdk.org> On Behalf Of Xiaoyun Li > > > Sent: Monday, December 16, 2019 9:59 AM > > > To: jingjing.wu@intel.com > > > Cc: dev@dpdk.org; omkar.maslekar@intel.com; Xiaoyun Li > > > <xiaoyun.li@intel.com>; stable@dpdk.org > > > Subject: [dpdk-dev] [PATCH v2] raw/ntb: fix write memory barrier issue > > > > > > All buffers and ring info should be written before tail register update. > > > This patch relocates the write memory barrier before updating tail > > > register to avoid potential issues. > > > > > > Fixes: 11b5c7daf019 ("raw/ntb: add enqueue and dequeue functions") > > > Cc: stable@dpdk.org > > > > > > Signed-off-by: Xiaoyun Li <xiaoyun.li@intel.com> > > > --- > > > v2: > > > * Replaced rte_wmb with rte_io_wmb since rte_io_wmb is enough. > > > --- > > > drivers/raw/ntb/ntb.c | 4 ++-- > > > 1 file changed, 2 insertions(+), 2 deletions(-) > > > > > > diff --git a/drivers/raw/ntb/ntb.c b/drivers/raw/ntb/ntb.c index > > > ad7f6abfd..c7de86f36 100644 > > > --- a/drivers/raw/ntb/ntb.c > > > +++ b/drivers/raw/ntb/ntb.c > > > @@ -683,8 +683,8 @@ ntb_enqueue_bufs(struct rte_rawdev *dev, > > > sizeof(struct ntb_used) * nb1); > > > rte_memcpy(txq->tx_used_ring, tx_used + nb1, > > > sizeof(struct ntb_used) * nb2); > > > + rte_io_wmb(); > > As both txq->tx_used_ring and *txq->used_cnt are physically reside in the > PCI > > device side, rte_io_wmb is correct to ensure the ordering. > > > > > *txq->used_cnt = txq->last_used; > > > - rte_wmb(); > > > > > > /* update queue stats */ > > > hw->ntb_xstats[NTB_TX_BYTES_ID + off] += bytes; @@ - > 789,8 > > +789,8 @@ > > > ntb_dequeue_bufs(struct rte_rawdev *dev, > > > sizeof(struct ntb_desc) * nb1); > > > rte_memcpy(rxq->rx_desc_ring, rx_desc + nb1, > > > sizeof(struct ntb_desc) * nb2); > > > + rte_io_wmb(); > > > *rxq->avail_cnt = rxq->last_avail; > > > - rte_wmb(); > > > > > > /* update queue stats */ > > > off = NTB_XSTATS_NUM * ((size_t)context + 1); > > > -- > > > 2.17.1 > > > > Reviewed-by: Gavin Hu <gavin.hu@arm.com> ^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [dpdk-stable] [dpdk-dev] [PATCH v2] raw/ntb: fix write memory barrier issue 2019-12-23 8:38 ` Gavin Hu @ 2019-12-23 9:35 ` Li, Xiaoyun 2019-12-24 15:46 ` Gavin Hu 0 siblings, 1 reply; 13+ messages in thread From: Li, Xiaoyun @ 2019-12-23 9:35 UTC (permalink / raw) To: Gavin Hu, Wu, Jingjing Cc: dev, Maslekar, Omkar, stable, nd, jerinj, Honnappa Nagarahalli, Richardson, Bruce, nd Hi Still, stability and correctness are much more important than performance. As I said, with WC can benefit more than 20X perf. Comparing to this benefit, the difference between rte_wmb and rte_io_wmb is not that important. And in my test, the performance is not that bad with rte_wmb especially with large packets which are the normal use cases. BTW, I've searched linux kernel codes and don't see any NTB device on arm platform. So I don't think you need to consider the perf hurt to arm. Best Regards Xiaoyun Li > -----Original Message----- > From: Gavin Hu [mailto:Gavin.Hu@arm.com] > Sent: Monday, December 23, 2019 16:38 > To: Li, Xiaoyun <xiaoyun.li@intel.com>; Wu, Jingjing <jingjing.wu@intel.com> > Cc: dev@dpdk.org; Maslekar, Omkar <omkar.maslekar@intel.com>; > stable@dpdk.org; nd <nd@arm.com>; jerinj@marvell.com; Honnappa > Nagarahalli <Honnappa.Nagarahalli@arm.com>; Richardson, Bruce > <bruce.richardson@intel.com>; nd <nd@arm.com> > Subject: RE: [dpdk-dev] [PATCH v2] raw/ntb: fix write memory barrier issue > > Hi Xiaoyun, > > > -----Original Message----- > > From: Li, Xiaoyun <xiaoyun.li@intel.com> > > Sent: Monday, December 23, 2019 3:52 PM > > To: Gavin Hu <Gavin.Hu@arm.com>; Wu, Jingjing <jingjing.wu@intel.com> > > Cc: dev@dpdk.org; Maslekar, Omkar <omkar.maslekar@intel.com>; > > stable@dpdk.org; nd <nd@arm.com> > > Subject: RE: [dpdk-dev] [PATCH v2] raw/ntb: fix write memory barrier > > issue > > > > Hi > > I reconsidered and retested about this issue. > > I still need to use rte_wmb instead of using rte_io_wmb. > > > > Because to achieve high performance, ntb needs to turn on WC(write > > combining) feature. The perf difference with and without WC enabled is > > more than 20X. > > And when WC enabled, rte_io_wmb cannot make sure the instructions are > > in order only rte_wmb can make sure that. > > > > And in my retest, when sending 64 bytes packets, using rte_io_wmb will > > cause out-of-order issue and cause memory corruption on rx side. > > And using rte_wmb is fine. > That's true, as it is declared as 'write combine' region, even x86 is known as > strong ordered, it is the interconnect or PCI RC may do the reordering, 'write > combine', 'write coalescing', which caused this problem. > IMO, rte_io_*mb barriers on x86 should be promoted to stronger is WC is > involved(but that will sap performance for non-WC memories?). > https://code.dpdk.org/dpdk/latest/source/lib/librte_eal/common/include/arch/ > x86/rte_atomic.h#L78 > > Using rte_wmb will hurt performance for aarch64 also, as pci device memory > accesses to a single device are strongly ordered therefore the strongest > rte_wmb is not necessary. > > So I can only use v1 patch and suspend v2 patch in patchwork. > > > > Best Regards > > Xiaoyun Li > > > > > -----Original Message----- > > > From: Gavin Hu (Arm Technology China) [mailto:Gavin.Hu@arm.com] > > > Sent: Monday, December 16, 2019 18:50 > > > To: Li, Xiaoyun <xiaoyun.li@intel.com>; Wu, Jingjing > > <jingjing.wu@intel.com> > > > Cc: dev@dpdk.org; Maslekar, Omkar <omkar.maslekar@intel.com>; > > > stable@dpdk.org; nd <nd@arm.com> > > > Subject: RE: [dpdk-dev] [PATCH v2] raw/ntb: fix write memory barrier > > issue > > > > > > > > > > > > > -----Original Message----- > > > > From: dev <dev-bounces@dpdk.org> On Behalf Of Xiaoyun Li > > > > Sent: Monday, December 16, 2019 9:59 AM > > > > To: jingjing.wu@intel.com > > > > Cc: dev@dpdk.org; omkar.maslekar@intel.com; Xiaoyun Li > > > > <xiaoyun.li@intel.com>; stable@dpdk.org > > > > Subject: [dpdk-dev] [PATCH v2] raw/ntb: fix write memory barrier > > > > issue > > > > > > > > All buffers and ring info should be written before tail register update. > > > > This patch relocates the write memory barrier before updating tail > > > > register to avoid potential issues. > > > > > > > > Fixes: 11b5c7daf019 ("raw/ntb: add enqueue and dequeue functions") > > > > Cc: stable@dpdk.org > > > > > > > > Signed-off-by: Xiaoyun Li <xiaoyun.li@intel.com> > > > > --- > > > > v2: > > > > * Replaced rte_wmb with rte_io_wmb since rte_io_wmb is enough. > > > > --- > > > > drivers/raw/ntb/ntb.c | 4 ++-- > > > > 1 file changed, 2 insertions(+), 2 deletions(-) > > > > > > > > diff --git a/drivers/raw/ntb/ntb.c b/drivers/raw/ntb/ntb.c index > > > > ad7f6abfd..c7de86f36 100644 > > > > --- a/drivers/raw/ntb/ntb.c > > > > +++ b/drivers/raw/ntb/ntb.c > > > > @@ -683,8 +683,8 @@ ntb_enqueue_bufs(struct rte_rawdev *dev, > > > > sizeof(struct ntb_used) * nb1); > > > > rte_memcpy(txq->tx_used_ring, tx_used + nb1, > > > > sizeof(struct ntb_used) * nb2); > > > > + rte_io_wmb(); > > > As both txq->tx_used_ring and *txq->used_cnt are physically reside > > > in the > > PCI > > > device side, rte_io_wmb is correct to ensure the ordering. > > > > > > > *txq->used_cnt = txq->last_used; > > > > - rte_wmb(); > > > > > > > > /* update queue stats */ > > > > hw->ntb_xstats[NTB_TX_BYTES_ID + off] += bytes; @@ - > > 789,8 > > > +789,8 @@ > > > > ntb_dequeue_bufs(struct rte_rawdev *dev, > > > > sizeof(struct ntb_desc) * nb1); > > > > rte_memcpy(rxq->rx_desc_ring, rx_desc + nb1, > > > > sizeof(struct ntb_desc) * nb2); > > > > + rte_io_wmb(); > > > > *rxq->avail_cnt = rxq->last_avail; > > > > - rte_wmb(); > > > > > > > > /* update queue stats */ > > > > off = NTB_XSTATS_NUM * ((size_t)context + 1); > > > > -- > > > > 2.17.1 > > > > > > Reviewed-by: Gavin Hu <gavin.hu@arm.com> ^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [dpdk-stable] [dpdk-dev] [PATCH v2] raw/ntb: fix write memory barrier issue 2019-12-23 9:35 ` Li, Xiaoyun @ 2019-12-24 15:46 ` Gavin Hu 0 siblings, 0 replies; 13+ messages in thread From: Gavin Hu @ 2019-12-24 15:46 UTC (permalink / raw) To: Li, Xiaoyun, Wu, Jingjing Cc: dev, Maslekar, Omkar, stable, nd, jerinj, Honnappa Nagarahalli, Richardson, Bruce, nd, nd Hi Xiaoyun, > -----Original Message----- > From: Li, Xiaoyun <xiaoyun.li@intel.com> > Sent: Monday, December 23, 2019 5:36 PM > To: Gavin Hu <Gavin.Hu@arm.com>; Wu, Jingjing <jingjing.wu@intel.com> > Cc: dev@dpdk.org; Maslekar, Omkar <omkar.maslekar@intel.com>; > stable@dpdk.org; nd <nd@arm.com>; jerinj@marvell.com; Honnappa > Nagarahalli <Honnappa.Nagarahalli@arm.com>; Richardson, Bruce > <bruce.richardson@intel.com>; nd <nd@arm.com> > Subject: RE: [dpdk-dev] [PATCH v2] raw/ntb: fix write memory barrier issue > > Hi > > Still, stability and correctness are much more important than performance. > As I said, with WC can benefit more than 20X perf. Comparing to this benefit, > the difference between rte_wmb and rte_io_wmb is not that important. > And in my test, the performance is not that bad with rte_wmb especially with > large packets which are the normal use cases. I agree 'sfence' is the correct barrier for WC, as it is a weak memory model. Rte_io on x86 is a compiler barrier, that is not strong enough. > > BTW, I've searched linux kernel codes and don't see any NTB device on arm > platform. > So I don't think you need to consider the perf hurt to arm. Limited the discussion to NTB, I am fine, and since WC in not used for any other NICs, that's ok. > > Best Regards > Xiaoyun Li > > > -----Original Message----- > > From: Gavin Hu [mailto:Gavin.Hu@arm.com] > > Sent: Monday, December 23, 2019 16:38 > > To: Li, Xiaoyun <xiaoyun.li@intel.com>; Wu, Jingjing > <jingjing.wu@intel.com> > > Cc: dev@dpdk.org; Maslekar, Omkar <omkar.maslekar@intel.com>; > > stable@dpdk.org; nd <nd@arm.com>; jerinj@marvell.com; Honnappa > > Nagarahalli <Honnappa.Nagarahalli@arm.com>; Richardson, Bruce > > <bruce.richardson@intel.com>; nd <nd@arm.com> > > Subject: RE: [dpdk-dev] [PATCH v2] raw/ntb: fix write memory barrier issue > > > > Hi Xiaoyun, > > > > > -----Original Message----- > > > From: Li, Xiaoyun <xiaoyun.li@intel.com> > > > Sent: Monday, December 23, 2019 3:52 PM > > > To: Gavin Hu <Gavin.Hu@arm.com>; Wu, Jingjing <jingjing.wu@intel.com> > > > Cc: dev@dpdk.org; Maslekar, Omkar <omkar.maslekar@intel.com>; > > > stable@dpdk.org; nd <nd@arm.com> > > > Subject: RE: [dpdk-dev] [PATCH v2] raw/ntb: fix write memory barrier > > > issue > > > > > > Hi > > > I reconsidered and retested about this issue. > > > I still need to use rte_wmb instead of using rte_io_wmb. > > > > > > Because to achieve high performance, ntb needs to turn on WC(write > > > combining) feature. The perf difference with and without WC enabled is > > > more than 20X. > > > And when WC enabled, rte_io_wmb cannot make sure the instructions > are > > > in order only rte_wmb can make sure that. > > > > > > And in my retest, when sending 64 bytes packets, using rte_io_wmb will > > > cause out-of-order issue and cause memory corruption on rx side. > > > And using rte_wmb is fine. > > That's true, as it is declared as 'write combine' region, even x86 is known as > > strong ordered, it is the interconnect or PCI RC may do the reordering, 'write > > combine', 'write coalescing', which caused this problem. > > IMO, rte_io_*mb barriers on x86 should be promoted to stronger is WC is > > involved(but that will sap performance for non-WC memories?). > > > https://code.dpdk.org/dpdk/latest/source/lib/librte_eal/common/include/ar > ch/ > > x86/rte_atomic.h#L78 > > > > Using rte_wmb will hurt performance for aarch64 also, as pci device > memory > > accesses to a single device are strongly ordered therefore the strongest > > rte_wmb is not necessary. > > > So I can only use v1 patch and suspend v2 patch in patchwork. > > > > > > Best Regards > > > Xiaoyun Li > > > > > > > -----Original Message----- > > > > From: Gavin Hu (Arm Technology China) [mailto:Gavin.Hu@arm.com] > > > > Sent: Monday, December 16, 2019 18:50 > > > > To: Li, Xiaoyun <xiaoyun.li@intel.com>; Wu, Jingjing > > > <jingjing.wu@intel.com> > > > > Cc: dev@dpdk.org; Maslekar, Omkar <omkar.maslekar@intel.com>; > > > > stable@dpdk.org; nd <nd@arm.com> > > > > Subject: RE: [dpdk-dev] [PATCH v2] raw/ntb: fix write memory barrier > > > issue > > > > > > > > > > > > > > > > > -----Original Message----- > > > > > From: dev <dev-bounces@dpdk.org> On Behalf Of Xiaoyun Li > > > > > Sent: Monday, December 16, 2019 9:59 AM > > > > > To: jingjing.wu@intel.com > > > > > Cc: dev@dpdk.org; omkar.maslekar@intel.com; Xiaoyun Li > > > > > <xiaoyun.li@intel.com>; stable@dpdk.org > > > > > Subject: [dpdk-dev] [PATCH v2] raw/ntb: fix write memory barrier > > > > > issue > > > > > > > > > > All buffers and ring info should be written before tail register update. > > > > > This patch relocates the write memory barrier before updating tail > > > > > register to avoid potential issues. > > > > > > > > > > Fixes: 11b5c7daf019 ("raw/ntb: add enqueue and dequeue functions") > > > > > Cc: stable@dpdk.org > > > > > > > > > > Signed-off-by: Xiaoyun Li <xiaoyun.li@intel.com> > > > > > --- > > > > > v2: > > > > > * Replaced rte_wmb with rte_io_wmb since rte_io_wmb is enough. > > > > > --- > > > > > drivers/raw/ntb/ntb.c | 4 ++-- > > > > > 1 file changed, 2 insertions(+), 2 deletions(-) > > > > > > > > > > diff --git a/drivers/raw/ntb/ntb.c b/drivers/raw/ntb/ntb.c index > > > > > ad7f6abfd..c7de86f36 100644 > > > > > --- a/drivers/raw/ntb/ntb.c > > > > > +++ b/drivers/raw/ntb/ntb.c > > > > > @@ -683,8 +683,8 @@ ntb_enqueue_bufs(struct rte_rawdev *dev, > > > > > sizeof(struct ntb_used) * nb1); > > > > > rte_memcpy(txq->tx_used_ring, tx_used + nb1, > > > > > sizeof(struct ntb_used) * nb2); > > > > > + rte_io_wmb(); > > > > As both txq->tx_used_ring and *txq->used_cnt are physically reside > > > > in the > > > PCI > > > > device side, rte_io_wmb is correct to ensure the ordering. > > > > > > > > > *txq->used_cnt = txq->last_used; > > > > > - rte_wmb(); > > > > > > > > > > /* update queue stats */ > > > > > hw->ntb_xstats[NTB_TX_BYTES_ID + off] += bytes; @@ - > > > 789,8 > > > > +789,8 @@ > > > > > ntb_dequeue_bufs(struct rte_rawdev *dev, > > > > > sizeof(struct ntb_desc) * nb1); > > > > > rte_memcpy(rxq->rx_desc_ring, rx_desc + nb1, > > > > > sizeof(struct ntb_desc) * nb2); > > > > > + rte_io_wmb(); > > > > > *rxq->avail_cnt = rxq->last_avail; > > > > > - rte_wmb(); > > > > > > > > > > /* update queue stats */ > > > > > off = NTB_XSTATS_NUM * ((size_t)context + 1); > > > > > -- > > > > > 2.17.1 > > > > > > > > Reviewed-by: Gavin Hu <gavin.hu@arm.com> ^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [dpdk-stable] [PATCH v2] raw/ntb: fix write memory barrier issue 2019-12-16 1:58 ` [dpdk-stable] [PATCH v2] " Xiaoyun Li 2019-12-16 10:49 ` [dpdk-stable] [dpdk-dev] " Gavin Hu (Arm Technology China) @ 2019-12-23 2:38 ` Wu, Jingjing 1 sibling, 0 replies; 13+ messages in thread From: Wu, Jingjing @ 2019-12-23 2:38 UTC (permalink / raw) To: Li, Xiaoyun; +Cc: dev, Maslekar, Omkar, stable > -----Original Message----- > From: Li, Xiaoyun <xiaoyun.li@intel.com> > Sent: Monday, December 16, 2019 9:59 AM > To: Wu, Jingjing <jingjing.wu@intel.com> > Cc: dev@dpdk.org; Maslekar, Omkar <omkar.maslekar@intel.com>; Li, Xiaoyun > <xiaoyun.li@intel.com>; stable@dpdk.org > Subject: [PATCH v2] raw/ntb: fix write memory barrier issue > > All buffers and ring info should be written before tail register update. > This patch relocates the write memory barrier before updating tail register > to avoid potential issues. > > Fixes: 11b5c7daf019 ("raw/ntb: add enqueue and dequeue functions") > Cc: stable@dpdk.org > > Signed-off-by: Xiaoyun Li <xiaoyun.li@intel.com> Acked-by: Jingjing Wu <jingjing.wu@intel.com> > --- > v2: > * Replaced rte_wmb with rte_io_wmb since rte_io_wmb is enough. > --- > drivers/raw/ntb/ntb.c | 4 ++-- > 1 file changed, 2 insertions(+), 2 deletions(-) > > diff --git a/drivers/raw/ntb/ntb.c b/drivers/raw/ntb/ntb.c > index ad7f6abfd..c7de86f36 100644 > --- a/drivers/raw/ntb/ntb.c > +++ b/drivers/raw/ntb/ntb.c > @@ -683,8 +683,8 @@ ntb_enqueue_bufs(struct rte_rawdev *dev, > sizeof(struct ntb_used) * nb1); > rte_memcpy(txq->tx_used_ring, tx_used + nb1, > sizeof(struct ntb_used) * nb2); > + rte_io_wmb(); > *txq->used_cnt = txq->last_used; > - rte_wmb(); > > /* update queue stats */ > hw->ntb_xstats[NTB_TX_BYTES_ID + off] += bytes; > @@ -789,8 +789,8 @@ ntb_dequeue_bufs(struct rte_rawdev *dev, > sizeof(struct ntb_desc) * nb1); > rte_memcpy(rxq->rx_desc_ring, rx_desc + nb1, > sizeof(struct ntb_desc) * nb2); > + rte_io_wmb(); > *rxq->avail_cnt = rxq->last_avail; > - rte_wmb(); > > /* update queue stats */ > off = NTB_XSTATS_NUM * ((size_t)context + 1); > -- > 2.17.1 ^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [dpdk-stable] [PATCH] raw/ntb: fix write memory barrier issue 2019-12-04 15:19 [dpdk-stable] [PATCH] raw/ntb: fix write memory barrier issue Xiaoyun Li 2019-12-14 15:29 ` [dpdk-stable] [dpdk-dev] " Gavin Hu (Arm Technology China) 2019-12-16 1:58 ` [dpdk-stable] [PATCH v2] " Xiaoyun Li @ 2019-12-26 1:46 ` Wu, Jingjing 2020-01-20 9:04 ` [dpdk-stable] [dpdk-dev] " Thomas Monjalon 2 siblings, 1 reply; 13+ messages in thread From: Wu, Jingjing @ 2019-12-26 1:46 UTC (permalink / raw) To: Li, Xiaoyun; +Cc: dev, stable > -----Original Message----- > From: Li, Xiaoyun <xiaoyun.li@intel.com> > Sent: Wednesday, December 4, 2019 11:19 PM > To: Wu, Jingjing <jingjing.wu@intel.com> > Cc: dev@dpdk.org; Li, Xiaoyun <xiaoyun.li@intel.com>; stable@dpdk.org > Subject: [PATCH] raw/ntb: fix write memory barrier issue > > All buffers and ring info should be written before tail register update. > This patch relocates the write memory barrier before updating tail register > to avoid potential issues. > > Fixes: 11b5c7daf019 ("raw/ntb: add enqueue and dequeue functions") > Cc: stable@dpdk.org > > Signed-off-by: Xiaoyun Li <xiaoyun.li@intel.com> Acked-by: Jingjing Wu <jingjing.wu@intel.com> ^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [dpdk-stable] [dpdk-dev] [PATCH] raw/ntb: fix write memory barrier issue 2019-12-26 1:46 ` [dpdk-stable] [PATCH] " Wu, Jingjing @ 2020-01-20 9:04 ` Thomas Monjalon 0 siblings, 0 replies; 13+ messages in thread From: Thomas Monjalon @ 2020-01-20 9:04 UTC (permalink / raw) To: Li, Xiaoyun; +Cc: dev, stable, Wu, Jingjing, gavin.hu 26/12/2019 02:46, Wu, Jingjing: > From: Li, Xiaoyun <xiaoyun.li@intel.com> > > All buffers and ring info should be written before tail register update. > > This patch relocates the write memory barrier before updating tail register > > to avoid potential issues. > > > > Fixes: 11b5c7daf019 ("raw/ntb: add enqueue and dequeue functions") > > Cc: stable@dpdk.org > > > > Signed-off-by: Xiaoyun Li <xiaoyun.li@intel.com> > Acked-by: Jingjing Wu <jingjing.wu@intel.com> v1 applied, thanks ^ permalink raw reply [flat|nested] 13+ messages in thread
end of thread, other threads:[~2020-01-20 9:04 UTC | newest] Thread overview: 13+ messages (download: mbox.gz / follow: Atom feed) -- links below jump to the message on this page -- 2019-12-04 15:19 [dpdk-stable] [PATCH] raw/ntb: fix write memory barrier issue Xiaoyun Li 2019-12-14 15:29 ` [dpdk-stable] [dpdk-dev] " Gavin Hu (Arm Technology China) 2019-12-16 1:57 ` Li, Xiaoyun 2019-12-25 3:52 ` Gavin Hu 2019-12-16 1:58 ` [dpdk-stable] [PATCH v2] " Xiaoyun Li 2019-12-16 10:49 ` [dpdk-stable] [dpdk-dev] " Gavin Hu (Arm Technology China) 2019-12-23 7:52 ` Li, Xiaoyun 2019-12-23 8:38 ` Gavin Hu 2019-12-23 9:35 ` Li, Xiaoyun 2019-12-24 15:46 ` Gavin Hu 2019-12-23 2:38 ` [dpdk-stable] " Wu, Jingjing 2019-12-26 1:46 ` [dpdk-stable] [PATCH] " Wu, Jingjing 2020-01-20 9:04 ` [dpdk-stable] [dpdk-dev] " Thomas Monjalon
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for NNTP newsgroup(s).