DPDK patches and discussions
* [dpdk-dev] [PATCH RFC v1 0/6] barrier fix and optimization for mlx5 on aarch64
@ 2020-02-13 12:38 Gavin Hu
  2020-02-13 12:38 ` [dpdk-dev] [PATCH RFC v1 1/6] net/mlx5: relax the barrier for UAR write Gavin Hu
                   ` (13 more replies)
  0 siblings, 14 replies; 46+ messages in thread
From: Gavin Hu @ 2020-02-13 12:38 UTC (permalink / raw)
  To: dev
  Cc: nd, david.marchand, thomas, rasland, drc, bruce.richardson,
	konstantin.ananyev, matan, shahafs, viacheslavo, jerinj,
	Honnappa.Nagarahalli, ruifeng.wang, phil.yang, joyce.kong,
	steve.capper

Using barriers that are just sufficient really matters for performance.
Insufficient barriers cause correctness issues, while barriers stronger
than required, especially in the fast path, are a performance killer.

In the joint preliminary testing between Arm and Ampere, an 8%~13%
performance improvement was measured.

Gavin Hu (5):
  net/mlx5: relax the barrier for UAR write
  net/mlx5: use cio barrier before the BF WQE
  net/mlx5: add missing barrier
  net/mlx5: add descriptive comment for a barrier
  net/mlx5: non-cacheable mapping defaulted for aarch64

Phil Yang (1):
  net/mlx5: relaxed ordering for multi-packet RQ buffer refcnt

 drivers/net/mlx5/mlx5_rxq.c  |  5 +++--
 drivers/net/mlx5/mlx5_rxtx.c | 16 +++++++++-------
 drivers/net/mlx5/mlx5_rxtx.h | 11 ++++++++---
 drivers/net/mlx5/mlx5_txq.c  |  4 ++++
 4 files changed, 24 insertions(+), 12 deletions(-)

-- 
2.17.1


* [dpdk-dev] [PATCH RFC v1 1/6] net/mlx5: relax the barrier for UAR write
  2020-02-13 12:38 [dpdk-dev] [PATCH RFC v1 0/6] barrier fix and optimization for mlx5 on aarch64 Gavin Hu
@ 2020-02-13 12:38 ` Gavin Hu
  2020-02-13 12:38 ` [dpdk-dev] [PATCH RFC v1 2/6] net/mlx5: use cio barrier before the BF WQE Gavin Hu
                   ` (12 subsequent siblings)
  13 siblings, 0 replies; 46+ messages in thread
From: Gavin Hu @ 2020-02-13 12:38 UTC (permalink / raw)
  To: dev
  Cc: nd, david.marchand, thomas, rasland, drc, bruce.richardson,
	konstantin.ananyev, matan, shahafs, viacheslavo, jerinj,
	Honnappa.Nagarahalli, ruifeng.wang, phil.yang, joyce.kong,
	steve.capper, stable

The UAR is part of the PCI address space that is mapped for direct access
to the HCA from the CPU. Read-write accesses to this space are strongly
ordered, so a compiler barrier is sufficient on all architectures.

This patch set is based on the following aarch64 architectural facts:
1. The PCI BAR space is mapped as nGnRE Device memory, neither cacheable
nor write-combining.
2. I/O accesses to a single device are totally ordered.
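A minimal C sketch of the relaxed write path, assuming GCC/Clang inline
asm. compiler_barrier() and uar_write64_ordered() are hypothetical
stand-ins for rte_compiler_barrier() and __mlx5_uar_write64(), and an
ordinary variable plays the mapped UAR so the sketch runs anywhere. The
barrier only stops the compiler from reordering or caching accesses; any
hardware ordering is assumed to come from the nGnRE Device mapping
described above.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for rte_compiler_barrier(): emits no
 * instruction, only forbids the compiler from moving memory accesses
 * across this point. */
#define compiler_barrier() __asm__ volatile("" : : : "memory")

/* Hypothetical doorbell write: 'uar' would normally point into the
 * mmap()ed UAR page; here any uint64_t works. */
static inline void
uar_write64_ordered(volatile uint64_t *uar, uint64_t val)
{
	/* Keep the compiler from sinking earlier stores below the
	 * doorbell write. Ordering among accesses to the UAR itself is
	 * assumed to be provided by its nGnRE Device mapping (the premise
	 * of this patch), not by this barrier. */
	compiler_barrier();
	*uar = val;
}
```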

Fixes: 6bf10ab69be0 ("net/mlx5: support 32-bit systems")
Cc: stable@dpdk.org

Signed-off-by: Gavin Hu <gavin.hu@arm.com>
Reviewed-by: Phil Yang <phil.yang@arm.com>
---
 drivers/net/mlx5/mlx5_rxtx.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/net/mlx5/mlx5_rxtx.h b/drivers/net/mlx5/mlx5_rxtx.h
index 939778aa5..50b3cc3c9 100644
--- a/drivers/net/mlx5/mlx5_rxtx.h
+++ b/drivers/net/mlx5/mlx5_rxtx.h
@@ -546,7 +546,7 @@ __mlx5_uar_write64_relaxed(uint64_t val, void *addr,
 static __rte_always_inline void
 __mlx5_uar_write64(uint64_t val, void *addr, rte_spinlock_t *lock)
 {
-	rte_io_wmb();
+	rte_compiler_barrier();
 	__mlx5_uar_write64_relaxed(val, addr, lock);
 }
 
-- 
2.17.1


* [dpdk-dev] [PATCH RFC v1 2/6] net/mlx5: use cio barrier before the BF WQE
  2020-02-13 12:38 [dpdk-dev] [PATCH RFC v1 0/6] barrier fix and optimization for mlx5 on aarch64 Gavin Hu
  2020-02-13 12:38 ` [dpdk-dev] [PATCH RFC v1 1/6] net/mlx5: relax the barrier for UAR write Gavin Hu
@ 2020-02-13 12:38 ` Gavin Hu
  2020-02-13 12:38 ` [dpdk-dev] [PATCH RFC v1 3/6] net/mlx5: add missing barrier Gavin Hu
                   ` (11 subsequent siblings)
  13 siblings, 0 replies; 46+ messages in thread
From: Gavin Hu @ 2020-02-13 12:38 UTC (permalink / raw)
  To: dev
  Cc: nd, david.marchand, thomas, rasland, drc, bruce.richardson,
	konstantin.ananyev, matan, shahafs, viacheslavo, jerinj,
	Honnappa.Nagarahalli, ruifeng.wang, phil.yang, joyce.kong,
	steve.capper, stable

To ensure the WQE and doorbell record, which reside in host memory,
are visible to HW before the BlueFlame write, a CIO barrier is
sufficient; rte_wmb is overkill.
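The Tx sequence this patch relies on can be sketched as follows,
assuming GCC/Clang. cio_wmb() mirrors the rte_cio_wmb() definitions
quoted later in this thread ('dmb oshst' on aarch64, a compiler barrier
on x86); struct toy_txq and toy_tx_ring() are hypothetical, keep
everything in ordinary memory, and skip the big-endian conversion the
driver applies to the doorbell record.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical cio_wmb(), mirroring rte_cio_wmb() in this thread. */
#if defined(__aarch64__)
#define cio_wmb() __asm__ volatile("dmb oshst" : : : "memory")
#elif defined(__x86_64__) || defined(__i386__)
#define cio_wmb() __asm__ volatile("" : : : "memory")
#else
#define cio_wmb() __sync_synchronize()
#endif

/* Hypothetical, simplified Tx queue; in the driver 'bf_reg' would be
 * the mapped BlueFlame register rather than ordinary memory. */
struct toy_txq {
	volatile uint64_t wqe[8];   /* work queue entry in host memory */
	volatile uint32_t db_rec;   /* doorbell record in host memory */
	volatile uint64_t bf_reg;   /* stand-in for the BlueFlame register */
	uint32_t wqe_ci;            /* producer index */
};

static void
toy_tx_ring(struct toy_txq *q, uint64_t first_qword)
{
	q->wqe[0] = first_qword;    /* 1. build the WQE */
	q->wqe_ci++;
	cio_wmb();                  /* 2. WQE visible before the DB record */
	q->db_rec = q->wqe_ci;      /* 3. hand ownership to HW */
	cio_wmb();                  /* 4. DB record visible before BF write */
	q->bf_reg = q->wqe[0];      /* 5. ring the doorbell (BlueFlame) */
}
```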

Fixes: 6cb559d67b83 ("net/mlx5: add vectorized Rx/Tx burst for x86")
Cc: stable@dpdk.org

Signed-off-by: Gavin Hu <gavin.hu@arm.com>
Reviewed-by: Phil Yang <phil.yang@arm.com>
---
 drivers/net/mlx5/mlx5_rxtx.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/net/mlx5/mlx5_rxtx.h b/drivers/net/mlx5/mlx5_rxtx.h
index 50b3cc3c9..c672af4c4 100644
--- a/drivers/net/mlx5/mlx5_rxtx.h
+++ b/drivers/net/mlx5/mlx5_rxtx.h
@@ -658,7 +658,7 @@ mlx5_tx_dbrec_cond_wmb(struct mlx5_txq_data *txq, volatile struct mlx5_wqe *wqe,
 	rte_cio_wmb();
 	*txq->qp_db = rte_cpu_to_be_32(txq->wqe_ci);
 	/* Ensure ordering between DB record and BF copy. */
-	rte_wmb();
+	rte_cio_wmb();
 	mlx5_uar_write64_relaxed(*src, dst, txq->uar_lock);
 	if (cond)
 		rte_wmb();
-- 
2.17.1


* [dpdk-dev] [PATCH RFC v1 3/6] net/mlx5: add missing barrier
  2020-02-13 12:38 [dpdk-dev] [PATCH RFC v1 0/6] barrier fix and optimization for mlx5 on aarch64 Gavin Hu
  2020-02-13 12:38 ` [dpdk-dev] [PATCH RFC v1 1/6] net/mlx5: relax the barrier for UAR write Gavin Hu
  2020-02-13 12:38 ` [dpdk-dev] [PATCH RFC v1 2/6] net/mlx5: use cio barrier before the BF WQE Gavin Hu
@ 2020-02-13 12:38 ` Gavin Hu
  2020-02-13 12:38 ` [dpdk-dev] [PATCH RFC v1 4/6] net/mlx5: add descriptive comment for a barrier Gavin Hu
                   ` (10 subsequent siblings)
  13 siblings, 0 replies; 46+ messages in thread
From: Gavin Hu @ 2020-02-13 12:38 UTC (permalink / raw)
  To: dev
  Cc: nd, david.marchand, thomas, rasland, drc, bruce.richardson,
	konstantin.ananyev, matan, shahafs, viacheslavo, jerinj,
	Honnappa.Nagarahalli, ruifeng.wang, phil.yang, joyce.kong,
	steve.capper, stable

To keep the update of the Rx queue descriptor field rxq->cq_db ordered
before the write to the CQ doorbell register, an rte_cio_wmb barrier is
required.

This was previously masked by the stronger-than-required barrier in
mlx5_uar_write64(); the explicit barrier becomes a must once that
barrier is relaxed.
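A sketch of the ordering this patch establishes in mlx5_arm_cq(),
assuming GCC/Clang builtins. toy_arm_cq(), cq_arm_doorbell() and the
to_be32()/to_be64() helpers are hypothetical; cio_wmb() mirrors
rte_cio_wmb() on aarch64 and x86, and ordinary variables stand in for
the arm doorbell record and the CQ doorbell register.

```c
#include <assert.h>
#include <stdint.h>

#if defined(__aarch64__)
#define cio_wmb() __asm__ volatile("dmb oshst" : : : "memory")
#elif defined(__x86_64__) || defined(__i386__)
#define cio_wmb() __asm__ volatile("" : : : "memory")
#else
#define cio_wmb() __sync_synchronize()
#endif

/* Host-to-big-endian helpers built on GCC/Clang builtins. */
#if __BYTE_ORDER__ == __ORDER_LITTLE_ENDIAN__
#define to_be32(x) __builtin_bswap32(x)
#define to_be64(x) __builtin_bswap64(x)
#else
#define to_be32(x) (x)
#define to_be64(x) (x)
#endif

/* 64-bit doorbell value: doorbell_hi (as computed by the driver) in the
 * high word, the CQ number in the low word. */
static inline uint64_t
cq_arm_doorbell(uint32_t doorbell_hi, uint32_t cqn)
{
	return ((uint64_t)doorbell_hi << 32) | cqn;
}

static void
toy_arm_cq(volatile uint32_t *cq_arm_db_rec, volatile uint64_t *cq_db_reg,
	   uint32_t doorbell_hi, uint32_t cqn)
{
	*cq_arm_db_rec = to_be32(doorbell_hi); /* record in host memory */
	cio_wmb(); /* record must be visible before the register write */
	*cq_db_reg = to_be64(cq_arm_doorbell(doorbell_hi, cqn));
}
```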

Fixes: 6bf10ab69be0 ("net/mlx5: support 32-bit systems")
Cc: stable@dpdk.org

Suggested-by: Phil Yang <phil.yang@arm.com>
Signed-off-by: Gavin Hu <gavin.hu@arm.com>
Reviewed-by: Phil Yang <phil.yang@arm.com>
---
 drivers/net/mlx5/mlx5_rxq.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/drivers/net/mlx5/mlx5_rxq.c b/drivers/net/mlx5/mlx5_rxq.c
index dc0fd8211..2d1b153a3 100644
--- a/drivers/net/mlx5/mlx5_rxq.c
+++ b/drivers/net/mlx5/mlx5_rxq.c
@@ -856,7 +856,8 @@ mlx5_arm_cq(struct mlx5_rxq_data *rxq, int sq_n_rxq)
 	doorbell = (uint64_t)doorbell_hi << 32;
 	doorbell |=  rxq->cqn;
 	rxq->cq_db[MLX5_CQ_ARM_DB] = rte_cpu_to_be_32(doorbell_hi);
-	mlx5_uar_write64(rte_cpu_to_be_64(doorbell),
+	rte_cio_wmb();
+	mlx5_uar_write64_relaxed(rte_cpu_to_be_64(doorbell),
 			 cq_db_reg, rxq->uar_lock_cq);
 }
 
-- 
2.17.1


* [dpdk-dev] [PATCH RFC v1 4/6] net/mlx5: add descriptive comment for a barrier
  2020-02-13 12:38 [dpdk-dev] [PATCH RFC v1 0/6] barrier fix and optimization for mlx5 on aarch64 Gavin Hu
                   ` (2 preceding siblings ...)
  2020-02-13 12:38 ` [dpdk-dev] [PATCH RFC v1 3/6] net/mlx5: add missing barrier Gavin Hu
@ 2020-02-13 12:38 ` Gavin Hu
  2020-02-13 12:38 ` [dpdk-dev] [PATCH RFC v1 5/6] net/mlx5: non-cacheable mapping defaulted for aarch64 Gavin Hu
                   ` (9 subsequent siblings)
  13 siblings, 0 replies; 46+ messages in thread
From: Gavin Hu @ 2020-02-13 12:38 UTC (permalink / raw)
  To: dev
  Cc: nd, david.marchand, thomas, rasland, drc, bruce.richardson,
	konstantin.ananyev, matan, shahafs, viacheslavo, jerinj,
	Honnappa.Nagarahalli, ruifeng.wang, phil.yang, joyce.kong,
	steve.capper

The barrier would not be required, or could be moved down, if HW waited
for the doorbell ring before executing the WQE.

This is not the case: HW can start executing the WQE as soon as it gets
ownership (passed by SW writing the doorbell record).

Add a descriptive comment for this HW-specific behavior.

Signed-off-by: Gavin Hu <gavin.hu@arm.com>
Reviewed-by: Phil Yang <phil.yang@arm.com>
---
 drivers/net/mlx5/mlx5_rxtx.h | 5 +++++
 1 file changed, 5 insertions(+)

diff --git a/drivers/net/mlx5/mlx5_rxtx.h b/drivers/net/mlx5/mlx5_rxtx.h
index c672af4c4..d32c4e430 100644
--- a/drivers/net/mlx5/mlx5_rxtx.h
+++ b/drivers/net/mlx5/mlx5_rxtx.h
@@ -655,6 +655,11 @@ mlx5_tx_dbrec_cond_wmb(struct mlx5_txq_data *txq, volatile struct mlx5_wqe *wqe,
 	uint64_t *dst = MLX5_TX_BFREG(txq);
 	volatile uint64_t *src = ((volatile uint64_t *)wqe);
 
+	/* The ownership of WQE is passed to HW by updating the doorbell
+	 * record. Once WQE ownership has been passed to the HCA, HW can
+	 * execute the WQE. The barrier ensures the WQE is visible to HW
+	 * before HW executes it.
+	 */
 	rte_cio_wmb();
 	*txq->qp_db = rte_cpu_to_be_32(txq->wqe_ci);
 	/* Ensure ordering between DB record and BF copy. */
-- 
2.17.1


* [dpdk-dev] [PATCH RFC v1 5/6] net/mlx5: non-cacheable mapping defaulted for aarch64
  2020-02-13 12:38 [dpdk-dev] [PATCH RFC v1 0/6] barrier fix and optimization for mlx5 on aarch64 Gavin Hu
                   ` (3 preceding siblings ...)
  2020-02-13 12:38 ` [dpdk-dev] [PATCH RFC v1 4/6] net/mlx5: add descriptive comment for a barrier Gavin Hu
@ 2020-02-13 12:38 ` Gavin Hu
  2020-02-13 12:38 ` [dpdk-dev] [PATCH RFC v1 6/6] net/mlx5: relaxed ordering for multi-packet RQ buffer refcnt Gavin Hu
                   ` (8 subsequent siblings)
  13 siblings, 0 replies; 46+ messages in thread
From: Gavin Hu @ 2020-02-13 12:38 UTC (permalink / raw)
  To: dev
  Cc: nd, david.marchand, thomas, rasland, drc, bruce.richardson,
	konstantin.ananyev, matan, shahafs, viacheslavo, jerinj,
	Honnappa.Nagarahalli, ruifeng.wang, phil.yang, joyce.kong,
	steve.capper, stable

On aarch64, PCI resources are mapped neither as write-combining nor as
cacheable: in the Linux kernel, arch_can_pci_mmap_wc() evaluates to 0
on aarch64 [1].

[1] https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/
tree/drivers/pci/pci-sysfs.c?h=v5.4#n1204

Fixes: f078ceb6ae93 ("net/mlx5: fix Tx doorbell write memory barrier")
Cc: stable@dpdk.org

Signed-off-by: Gavin Hu <gavin.hu@arm.com>
Reviewed-by: Phil Yang <phil.yang@arm.com>
---
 drivers/net/mlx5/mlx5_txq.c | 4 ++++
 1 file changed, 4 insertions(+)

diff --git a/drivers/net/mlx5/mlx5_txq.c b/drivers/net/mlx5/mlx5_txq.c
index bc13abfe6..144bab4a6 100644
--- a/drivers/net/mlx5/mlx5_txq.c
+++ b/drivers/net/mlx5/mlx5_txq.c
@@ -319,7 +319,11 @@ txq_uar_ncattr_init(struct mlx5_txq_ctrl *txq_ctrl, size_t page_size)
 	off_t cmd;
 
 	txq_ctrl->txq.db_heu = priv->config.dbnc == MLX5_TXDB_HEURISTIC;
+#ifdef RTE_ARCH_ARM64
+	txq_ctrl->txq.db_nc = 1;
+#else
 	txq_ctrl->txq.db_nc = 0;
+#endif
 	/* Check the doorbell register mapping type. */
 	cmd = txq_ctrl->uar_mmap_offset / page_size;
 	cmd >>= MLX5_UAR_MMAP_CMD_SHIFT;
-- 
2.17.1


* [dpdk-dev] [PATCH RFC v1 6/6] net/mlx5: relaxed ordering for multi-packet RQ buffer refcnt
  2020-02-13 12:38 [dpdk-dev] [PATCH RFC v1 0/6] barrier fix and optimization for mlx5 on aarch64 Gavin Hu
                   ` (4 preceding siblings ...)
  2020-02-13 12:38 ` [dpdk-dev] [PATCH RFC v1 5/6] net/mlx5: non-cacheable mapping defaulted for aarch64 Gavin Hu
@ 2020-02-13 12:38 ` Gavin Hu
  2020-04-10 16:41 ` [dpdk-dev] [PATCH RFC v2 0/7] introduce new barrier class and use it for mlx5 PMD Gavin Hu
                   ` (7 subsequent siblings)
  13 siblings, 0 replies; 46+ messages in thread
From: Gavin Hu @ 2020-02-13 12:38 UTC (permalink / raw)
  To: dev
  Cc: nd, Phil Yang, david.marchand, thomas, rasland, drc,
	bruce.richardson, konstantin.ananyev, matan, shahafs,
	viacheslavo, jerinj, Honnappa.Nagarahalli, ruifeng.wang,
	joyce.kong, steve.capper, stable

From: Phil Yang <phil.yang@arm.com>

The PMD Rx queue descriptor contains two mlx5_mprq_buf fields, which
are the multi-packet RQ buffer header pointers. The driver uses the
common rte_atomic_XXX functions to make the refcnt accesses atomic.

The common rte_atomic_XXX functions imply full barriers on aarch64.
Replace them with __atomic builtins using relaxed ordering or a one-way
barrier to improve performance.
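A minimal sketch of the refcnt pattern using the GCC __atomic builtins.
struct toy_mprq_buf and the toy_buf_*() helpers are hypothetical and the
mempool is reduced to a flag. Note the deliberate differences from the
patch: the increment here is relaxed (the patch uses __ATOMIC_ACQUIRE),
and the decrement uses __ATOMIC_ACQ_REL (the patch uses
__ATOMIC_RELAXED), the conservative choice when the last holder
recycles the buffer.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical, simplified multi-packet RQ buffer header. */
struct toy_mprq_buf {
	uint16_t refcnt; /* accessed only through __atomic builtins */
	int in_pool;     /* set to 1 once returned to its "pool" */
};

static void
toy_buf_init(struct toy_mprq_buf *buf)
{
	/* Not yet shared with other threads: relaxed is enough. */
	__atomic_store_n(&buf->refcnt, 1, __ATOMIC_RELAXED);
	buf->in_pool = 0;
}

/* Each mbuf attached to a stride takes one reference. */
static void
toy_buf_get(struct toy_mprq_buf *buf)
{
	__atomic_add_fetch(&buf->refcnt, 1, __ATOMIC_RELAXED);
}

/* Modeled on mlx5_mprq_buf_free_cb(): drop one reference and recycle
 * the buffer with refcnt reset to 1 when the last one is gone. */
static void
toy_buf_put(struct toy_mprq_buf *buf)
{
	if (__atomic_load_n(&buf->refcnt, __ATOMIC_RELAXED) == 1) {
		buf->in_pool = 1; /* sole owner: recycle directly */
	} else if (__atomic_sub_fetch(&buf->refcnt, 1,
				      __ATOMIC_ACQ_REL) == 0) {
		/* Lost a race and dropped the last reference. */
		__atomic_store_n(&buf->refcnt, 1, __ATOMIC_RELAXED);
		buf->in_pool = 1;
	}
}
```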

Fixes: 7d6bf6b866b8 ("net/mlx5: add Multi-Packet Rx support")
Cc: stable@dpdk.org

Suggested-by: Gavin Hu <gavin.hu@arm.com>
Signed-off-by: Phil Yang <phil.yang@arm.com>
Reviewed-by: Gavin Hu <gavin.hu@arm.com>
---
 drivers/net/mlx5/mlx5_rxq.c  |  2 +-
 drivers/net/mlx5/mlx5_rxtx.c | 16 +++++++++-------
 drivers/net/mlx5/mlx5_rxtx.h |  2 +-
 3 files changed, 11 insertions(+), 9 deletions(-)

diff --git a/drivers/net/mlx5/mlx5_rxq.c b/drivers/net/mlx5/mlx5_rxq.c
index 2d1b153a3..765bb1af5 100644
--- a/drivers/net/mlx5/mlx5_rxq.c
+++ b/drivers/net/mlx5/mlx5_rxq.c
@@ -1535,7 +1535,7 @@ mlx5_mprq_buf_init(struct rte_mempool *mp, void *opaque_arg,
 
 	memset(_m, 0, sizeof(*buf));
 	buf->mp = mp;
-	rte_atomic16_set(&buf->refcnt, 1);
+	__atomic_store_n(&buf->refcnt, 1, __ATOMIC_RELAXED);
 	for (j = 0; j != strd_n; ++j) {
 		shinfo = &buf->shinfos[j];
 		shinfo->free_cb = mlx5_mprq_buf_free_cb;
diff --git a/drivers/net/mlx5/mlx5_rxtx.c b/drivers/net/mlx5/mlx5_rxtx.c
index 5eea932d4..0e7519c56 100644
--- a/drivers/net/mlx5/mlx5_rxtx.c
+++ b/drivers/net/mlx5/mlx5_rxtx.c
@@ -1592,10 +1592,11 @@ mlx5_mprq_buf_free_cb(void *addr __rte_unused, void *opaque)
 {
 	struct mlx5_mprq_buf *buf = opaque;
 
-	if (rte_atomic16_read(&buf->refcnt) == 1) {
+	if (__atomic_load_n(&buf->refcnt, __ATOMIC_RELAXED) == 1) {
 		rte_mempool_put(buf->mp, buf);
-	} else if (rte_atomic16_add_return(&buf->refcnt, -1) == 0) {
-		rte_atomic16_set(&buf->refcnt, 1);
+	} else if (unlikely(__atomic_sub_fetch(&buf->refcnt, 1,
+						__ATOMIC_RELAXED) == 0)) {
+		__atomic_store_n(&buf->refcnt, 1, __ATOMIC_RELAXED);
 		rte_mempool_put(buf->mp, buf);
 	}
 }
@@ -1676,7 +1677,8 @@ mlx5_rx_burst_mprq(void *dpdk_rxq, struct rte_mbuf **pkts, uint16_t pkts_n)
 
 		if (consumed_strd == strd_n) {
 			/* Replace WQE only if the buffer is still in use. */
-			if (rte_atomic16_read(&buf->refcnt) > 1) {
+			if (__atomic_load_n(&buf->refcnt,
+					    __ATOMIC_RELAXED) > 1) {
 				mprq_buf_replace(rxq, rq_ci & wq_mask, strd_n);
 				/* Release the old buffer. */
 				mlx5_mprq_buf_free(buf);
@@ -1766,9 +1768,9 @@ mlx5_rx_burst_mprq(void *dpdk_rxq, struct rte_mbuf **pkts, uint16_t pkts_n)
 			void *buf_addr;
 
 			/* Increment the refcnt of the whole chunk. */
-			rte_atomic16_add_return(&buf->refcnt, 1);
-			MLX5_ASSERT((uint16_t)rte_atomic16_read(&buf->refcnt) <=
-				    strd_n + 1);
+			__atomic_add_fetch(&buf->refcnt, 1, __ATOMIC_ACQUIRE);
+			MLX5_ASSERT(__atomic_load_n(&buf->refcnt,
+					__ATOMIC_RELAXED) <= strd_n + 1);
 			buf_addr = RTE_PTR_SUB(addr, headroom_sz);
 			/*
 			 * MLX5 device doesn't use iova but it is necessary in a
diff --git a/drivers/net/mlx5/mlx5_rxtx.h b/drivers/net/mlx5/mlx5_rxtx.h
index d32c4e430..1f453fe09 100644
--- a/drivers/net/mlx5/mlx5_rxtx.h
+++ b/drivers/net/mlx5/mlx5_rxtx.h
@@ -78,7 +78,7 @@ struct rxq_zip {
 /* Multi-Packet RQ buffer header. */
 struct mlx5_mprq_buf {
 	struct rte_mempool *mp;
-	rte_atomic16_t refcnt; /* Atomically accessed refcnt. */
+	uint16_t refcnt; /* Atomically accessed refcnt. */
 	uint8_t pad[RTE_PKTMBUF_HEADROOM]; /* Headroom for the first packet. */
 	struct rte_mbuf_ext_shared_info shinfos[];
 	/*
-- 
2.17.1


* [dpdk-dev] [PATCH RFC v2 0/7] introduce new barrier class and use it for mlx5 PMD
  2020-02-13 12:38 [dpdk-dev] [PATCH RFC v1 0/6] barrier fix and optimization for mlx5 on aarch64 Gavin Hu
                   ` (5 preceding siblings ...)
  2020-02-13 12:38 ` [dpdk-dev] [PATCH RFC v1 6/6] net/mlx5: relaxed ordering for multi-packet RQ buffer refcnt Gavin Hu
@ 2020-04-10 16:41 ` Gavin Hu
  2020-04-10 17:20   ` Andrew Rybchenko
                     ` (4 more replies)
  2020-04-10 16:41 ` [dpdk-dev] [PATCH RFC v2 1/7] eal: introduce new class of barriers for DMA use cases Gavin Hu
                   ` (6 subsequent siblings)
  13 siblings, 5 replies; 46+ messages in thread
From: Gavin Hu @ 2020-04-10 16:41 UTC (permalink / raw)
  To: dev
  Cc: nd, david.marchand, thomas, rasland, drc, bruce.richardson,
	konstantin.ananyev, matan, shahafs, viacheslavo, jerinj,
	Honnappa.Nagarahalli, ruifeng.wang, phil.yang, joyce.kong,
	steve.capper

To order writes to different memory types, 'sfence' is required on x86
and 'dmb oshst' on aarch64.

However, DPDK has no abstract barrier that covers this combination:
sfence (x86) / dmb (aarch64).

So introduce a new barrier class, rte_dma_*mb, for this combination.

Doorbell rings are a typical use case for this new barrier class: the
data must be ready in memory before the HW is made aware of it.

As a note, rte_io_wmb and rte_cio_wmb are compiler barriers for x86, while
rte_wmb is 'dsb' for aarch64.

In the joint preliminary testing between Arm and Ampere, 8%~13%
performance boost was measured.

As there are no functional changes on x86, it is not impacted.
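The proposed barrier class and its doorbell use case can be sketched as
follows, assuming GCC/Clang. dma_mb()/dma_wmb()/dma_rmb() and toy_ring()
are hypothetical names; the aarch64 variants follow the 'dmb osh*'
instructions proposed in patch 1/7, the x86-64 variants follow
rte_mb()/rte_wmb()/rte_rmb() (mfence/sfence/lfence), and other
architectures fall back to a full __sync_synchronize().

```c
#include <assert.h>
#include <stdint.h>

#if defined(__aarch64__)
/* Outer-shareable DMB variants, as proposed for rte_dma_*mb. */
#define dma_mb()  __asm__ volatile("dmb osh"   : : : "memory")
#define dma_wmb() __asm__ volatile("dmb oshst" : : : "memory")
#define dma_rmb() __asm__ volatile("dmb oshld" : : : "memory")
#elif defined(__x86_64__)
#include <immintrin.h>
/* Same as rte_mb()/rte_wmb()/rte_rmb() on x86. */
#define dma_mb()  _mm_mfence()
#define dma_wmb() _mm_sfence()
#define dma_rmb() _mm_lfence()
#else
#define dma_mb()  __sync_synchronize()
#define dma_wmb() __sync_synchronize()
#define dma_rmb() __sync_synchronize()
#endif

/* Hypothetical doorbell ring: descriptor in host memory, then MMIO. */
static void
toy_ring(volatile uint64_t *desc, volatile uint32_t *doorbell,
	 uint64_t d, uint32_t idx)
{
	*desc = d;        /* make the descriptor ready in memory, */
	dma_wmb();        /* order it before ... */
	*doorbell = idx;  /* ... telling the device about it */
}
```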

Gavin Hu (6):
  eal: introduce new class of barriers for DMA use cases
  net/mlx5: dmb for immediate doorbell ring on aarch64
  net/mlx5: relax barrier to order UAR writes on aarch64
  net/mlx5: relax barrier for aarch64
  net/mlx5: add descriptive comment for a barrier
  doc: clarify one configuration in mlx5 guide

Phil Yang (1):
  net/mlx5: relax ordering for multi-packet RQ buffer refcnt

 doc/guides/nics/mlx5.rst                    |  6 ++--
 drivers/net/mlx5/mlx5_rxq.c                 |  2 +-
 drivers/net/mlx5/mlx5_rxtx.c                | 16 ++++++-----
 drivers/net/mlx5/mlx5_rxtx.h                | 14 ++++++----
 lib/librte_eal/arm/include/rte_atomic_32.h  |  6 ++++
 lib/librte_eal/arm/include/rte_atomic_64.h  |  6 ++++
 lib/librte_eal/include/generic/rte_atomic.h | 31 +++++++++++++++++++++
 lib/librte_eal/ppc/include/rte_atomic.h     |  6 ++++
 lib/librte_eal/x86/include/rte_atomic.h     |  6 ++++
 9 files changed, 78 insertions(+), 15 deletions(-)

-- 
2.17.1


* [dpdk-dev] [PATCH RFC v2 1/7] eal: introduce new class of barriers for DMA use cases
  2020-02-13 12:38 [dpdk-dev] [PATCH RFC v1 0/6] barrier fix and optimization for mlx5 on aarch64 Gavin Hu
                   ` (6 preceding siblings ...)
  2020-04-10 16:41 ` [dpdk-dev] [PATCH RFC v2 0/7] introduce new barrier class and use it for mlx5 PMD Gavin Hu
@ 2020-04-10 16:41 ` Gavin Hu
  2020-04-10 16:41 ` [dpdk-dev] [PATCH RFC v2 2/7] net/mlx5: dmb for immediate doorbell ring on aarch64 Gavin Hu
                   ` (5 subsequent siblings)
  13 siblings, 0 replies; 46+ messages in thread
From: Gavin Hu @ 2020-04-10 16:41 UTC (permalink / raw)
  To: dev
  Cc: nd, david.marchand, thomas, rasland, drc, bruce.richardson,
	konstantin.ananyev, matan, shahafs, viacheslavo, jerinj,
	Honnappa.Nagarahalli, ruifeng.wang, phil.yang, joyce.kong,
	steve.capper

In DPDK we use rte_*mb barriers to ensure that memory accesses to DMA
regions are observed before MMIO accesses to hardware registers.

On AArch64, the rte_*mb barriers are implemented by "DSB" (Data
Synchronisation Barrier) style instructions which are the strongest
barriers possible.

Recently, however, it has been realised [1] that, for devices whose
MMIO regions are shared between all CPUs, it is possible to relax this
memory barrier.

There are cases where we wish to retain the strength of the rte_*mb
memory barriers; thus rather than relax rte_*mb we opt instead to
introduce a new class of barrier rte_dma_*mb.

For AArch64, rte_dma_*mb will be implemented by a relaxed "DMB OSH"
style of barrier.

For other architectures, we implement rte_dma_*mb as rte_*mb so this
should not result in any functional changes.

[1] https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/
commit/?id=22ec71615d824f4f11d38d0e55a88d8956b7e45f

Signed-off-by: Gavin Hu <gavin.hu@arm.com>
Reviewed-by: Steve Capper <steve.capper@arm.com>
---
 lib/librte_eal/arm/include/rte_atomic_32.h  |  6 ++++
 lib/librte_eal/arm/include/rte_atomic_64.h  |  6 ++++
 lib/librte_eal/include/generic/rte_atomic.h | 31 +++++++++++++++++++++
 lib/librte_eal/ppc/include/rte_atomic.h     |  6 ++++
 lib/librte_eal/x86/include/rte_atomic.h     |  6 ++++
 5 files changed, 55 insertions(+)

diff --git a/lib/librte_eal/arm/include/rte_atomic_32.h b/lib/librte_eal/arm/include/rte_atomic_32.h
index 7dc0d06d1..80208467e 100644
--- a/lib/librte_eal/arm/include/rte_atomic_32.h
+++ b/lib/librte_eal/arm/include/rte_atomic_32.h
@@ -33,6 +33,12 @@ extern "C" {
 
 #define rte_io_rmb() rte_rmb()
 
+#define rte_dma_mb() rte_mb()
+
+#define rte_dma_wmb() rte_wmb()
+
+#define rte_dma_rmb() rte_rmb()
+
 #define rte_cio_wmb() rte_wmb()
 
 #define rte_cio_rmb() rte_rmb()
diff --git a/lib/librte_eal/arm/include/rte_atomic_64.h b/lib/librte_eal/arm/include/rte_atomic_64.h
index 7b7099cdc..608726c29 100644
--- a/lib/librte_eal/arm/include/rte_atomic_64.h
+++ b/lib/librte_eal/arm/include/rte_atomic_64.h
@@ -37,6 +37,12 @@ extern "C" {
 
 #define rte_io_rmb() rte_rmb()
 
+#define rte_dma_mb() asm volatile("dmb osh" : : : "memory")
+
+#define rte_dma_wmb() asm volatile("dmb oshst" : : : "memory")
+
+#define rte_dma_rmb() asm volatile("dmb oshld" : : : "memory")
+
 #define rte_cio_wmb() asm volatile("dmb oshst" : : : "memory")
 
 #define rte_cio_rmb() asm volatile("dmb oshld" : : : "memory")
diff --git a/lib/librte_eal/include/generic/rte_atomic.h b/lib/librte_eal/include/generic/rte_atomic.h
index e6ab15a97..042264c7e 100644
--- a/lib/librte_eal/include/generic/rte_atomic.h
+++ b/lib/librte_eal/include/generic/rte_atomic.h
@@ -107,6 +107,37 @@ static inline void rte_io_wmb(void);
 static inline void rte_io_rmb(void);
 ///@}
 
+/** @name DMA Memory Barrier
+ */
+///@{
+/**
+ * memory barrier for DMA use cases
+ *
+ * Guarantees that the LOAD and STORE operations that precede the rte_dma_mb()
+ * call are visible to CPU and I/O device that is shared between all CPUs
+ * before the LOAD and STORE operations that follow it.
+ */
+static inline void rte_dma_mb(void);
+
+/**
+ * Write memory barrier for DMA use cases
+ *
+ * Guarantees that the STORE operations that precede the rte_dma_wmb() call are
+ * visible to CPU and I/O device that is shared between all CPUs before the
+ * STORE operations that follow it.
+ */
+static inline void rte_dma_wmb(void);
+
+/**
+ * Read memory barrier for DMA use cases
+ *
+ * Guarantees that the LOAD operations that precede the rte_dma_rmb() call are
+ * visible to CPU and IO device that is shared between all CPUs before the LOAD
+ * operations that follow it.
+ */
+static inline void rte_dma_rmb(void);
+///@}
+
 /** @name Coherent I/O Memory Barrier
  *
  * Coherent I/O memory barrier is a lightweight version of I/O memory
diff --git a/lib/librte_eal/ppc/include/rte_atomic.h b/lib/librte_eal/ppc/include/rte_atomic.h
index 7e3e13118..faa36bb76 100644
--- a/lib/librte_eal/ppc/include/rte_atomic.h
+++ b/lib/librte_eal/ppc/include/rte_atomic.h
@@ -36,6 +36,12 @@ extern "C" {
 
 #define rte_io_rmb() rte_rmb()
 
+#define rte_dma_mb() rte_mb()
+
+#define rte_dma_wmb() rte_wmb()
+
+#define rte_dma_rmb() rte_rmb()
+
 #define rte_cio_wmb() rte_wmb()
 
 #define rte_cio_rmb() rte_rmb()
diff --git a/lib/librte_eal/x86/include/rte_atomic.h b/lib/librte_eal/x86/include/rte_atomic.h
index 148398f50..0b1d452f3 100644
--- a/lib/librte_eal/x86/include/rte_atomic.h
+++ b/lib/librte_eal/x86/include/rte_atomic.h
@@ -79,6 +79,12 @@ rte_smp_mb(void)
 
 #define rte_io_rmb() rte_compiler_barrier()
 
+#define rte_dma_mb() rte_mb()
+
+#define rte_dma_wmb() rte_wmb()
+
+#define rte_dma_rmb() rte_rmb()
+
 #define rte_cio_wmb() rte_compiler_barrier()
 
 #define rte_cio_rmb() rte_compiler_barrier()
-- 
2.17.1


* [dpdk-dev] [PATCH RFC v2 2/7] net/mlx5: dmb for immediate doorbell ring on aarch64
  2020-02-13 12:38 [dpdk-dev] [PATCH RFC v1 0/6] barrier fix and optimization for mlx5 on aarch64 Gavin Hu
                   ` (7 preceding siblings ...)
  2020-04-10 16:41 ` [dpdk-dev] [PATCH RFC v2 1/7] eal: introduce new class of barriers for DMA use cases Gavin Hu
@ 2020-04-10 16:41 ` Gavin Hu
  2020-04-10 16:41 ` [dpdk-dev] [PATCH RFC v2 3/7] net/mlx5: relax barrier to order UAR writes " Gavin Hu
                   ` (4 subsequent siblings)
  13 siblings, 0 replies; 46+ messages in thread
From: Gavin Hu @ 2020-04-10 16:41 UTC (permalink / raw)
  To: dev
  Cc: nd, david.marchand, thomas, rasland, drc, bruce.richardson,
	konstantin.ananyev, matan, shahafs, viacheslavo, jerinj,
	Honnappa.Nagarahalli, ruifeng.wang, phil.yang, joyce.kong,
	steve.capper

On aarch64, a 'DMB' is enough to evict the merge buffer when the
doorbell register is mapped as 'Normal-NC', the counterpart of WC on x86.

Otherwise, the register is mapped as Device memory and no barrier is
required at all.

Signed-off-by: Gavin Hu <gavin.hu@arm.com>
---
 drivers/net/mlx5/mlx5_rxtx.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/net/mlx5/mlx5_rxtx.h b/drivers/net/mlx5/mlx5_rxtx.h
index 939778aa5..e509f3b88 100644
--- a/drivers/net/mlx5/mlx5_rxtx.h
+++ b/drivers/net/mlx5/mlx5_rxtx.h
@@ -661,7 +661,7 @@ mlx5_tx_dbrec_cond_wmb(struct mlx5_txq_data *txq, volatile struct mlx5_wqe *wqe,
 	rte_wmb();
 	mlx5_uar_write64_relaxed(*src, dst, txq->uar_lock);
 	if (cond)
-		rte_wmb();
+		rte_dma_wmb();
 }
 
 /**
-- 
2.17.1


* [dpdk-dev] [PATCH RFC v2 3/7] net/mlx5: relax barrier to order UAR writes on aarch64
  2020-02-13 12:38 [dpdk-dev] [PATCH RFC v1 0/6] barrier fix and optimization for mlx5 on aarch64 Gavin Hu
                   ` (8 preceding siblings ...)
  2020-04-10 16:41 ` [dpdk-dev] [PATCH RFC v2 2/7] net/mlx5: dmb for immediate doorbell ring on aarch64 Gavin Hu
@ 2020-04-10 16:41 ` " Gavin Hu
  2020-04-10 16:41 ` [dpdk-dev] [PATCH RFC v2 4/7] net/mlx5: relax barrier for aarch64 Gavin Hu
                   ` (3 subsequent siblings)
  13 siblings, 0 replies; 46+ messages in thread
From: Gavin Hu @ 2020-04-10 16:41 UTC (permalink / raw)
  To: dev
  Cc: nd, david.marchand, thomas, rasland, drc, bruce.richardson,
	konstantin.ananyev, matan, shahafs, viacheslavo, jerinj,
	Honnappa.Nagarahalli, ruifeng.wang, phil.yang, joyce.kong,
	steve.capper, stable

To order the writes to host memory and to MMIO device memory, 'DMB'
is sufficient on aarch64, as it is an 'other-multi-copy atomic'
architecture. 'DSB' is overkill, especially in the fast path.

Using rte_dma_wmb takes advantage of this on aarch64 without
impacting x86 and ppc.

Fixes: 6bf10ab69be0 ("net/mlx5: support 32-bit systems")
Cc: stable@dpdk.org

Signed-off-by: Gavin Hu <gavin.hu@arm.com>
Reviewed-by: Phil Yang <phil.yang@arm.com>
---
 drivers/net/mlx5/mlx5_rxtx.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/net/mlx5/mlx5_rxtx.h b/drivers/net/mlx5/mlx5_rxtx.h
index e509f3b88..da5d81350 100644
--- a/drivers/net/mlx5/mlx5_rxtx.h
+++ b/drivers/net/mlx5/mlx5_rxtx.h
@@ -546,7 +546,7 @@ __mlx5_uar_write64_relaxed(uint64_t val, void *addr,
 static __rte_always_inline void
 __mlx5_uar_write64(uint64_t val, void *addr, rte_spinlock_t *lock)
 {
-	rte_io_wmb();
+	rte_dma_wmb();
 	__mlx5_uar_write64_relaxed(val, addr, lock);
 }
 
-- 
2.17.1


* [dpdk-dev] [PATCH RFC v2 4/7] net/mlx5: relax barrier for aarch64
  2020-02-13 12:38 [dpdk-dev] [PATCH RFC v1 0/6] barrier fix and optimization for mlx5 on aarch64 Gavin Hu
                   ` (9 preceding siblings ...)
  2020-04-10 16:41 ` [dpdk-dev] [PATCH RFC v2 3/7] net/mlx5: relax barrier to order UAR writes " Gavin Hu
@ 2020-04-10 16:41 ` Gavin Hu
  2020-04-10 16:41 ` [dpdk-dev] [PATCH RFC v2 5/7] net/mlx5: add descriptive comment for a barrier Gavin Hu
                   ` (2 subsequent siblings)
  13 siblings, 0 replies; 46+ messages in thread
From: Gavin Hu @ 2020-04-10 16:41 UTC (permalink / raw)
  To: dev
  Cc: nd, david.marchand, thomas, rasland, drc, bruce.richardson,
	konstantin.ananyev, matan, shahafs, viacheslavo, jerinj,
	Honnappa.Nagarahalli, ruifeng.wang, phil.yang, joyce.kong,
	steve.capper, stable

To ensure the WQE and doorbell record, which reside in host memory,
are visible to HW before the BlueFlame write, an ordered
mlx5_uar_write64() call is sufficient; rte_wmb is overkill on aarch64.

Fixes: 6cb559d67b83 ("net/mlx5: add vectorized Rx/Tx burst for x86")
Cc: stable@dpdk.org

Signed-off-by: Gavin Hu <gavin.hu@arm.com>
Reviewed-by: Phil Yang <phil.yang@arm.com>
---
 drivers/net/mlx5/mlx5_rxtx.h | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/drivers/net/mlx5/mlx5_rxtx.h b/drivers/net/mlx5/mlx5_rxtx.h
index da5d81350..228e37de5 100644
--- a/drivers/net/mlx5/mlx5_rxtx.h
+++ b/drivers/net/mlx5/mlx5_rxtx.h
@@ -658,8 +658,7 @@ mlx5_tx_dbrec_cond_wmb(struct mlx5_txq_data *txq, volatile struct mlx5_wqe *wqe,
 	rte_cio_wmb();
 	*txq->qp_db = rte_cpu_to_be_32(txq->wqe_ci);
 	/* Ensure ordering between DB record and BF copy. */
-	rte_wmb();
-	mlx5_uar_write64_relaxed(*src, dst, txq->uar_lock);
+	mlx5_uar_write64(*src, dst, txq->uar_lock);
 	if (cond)
 		rte_dma_wmb();
 }
-- 
2.17.1


* [dpdk-dev] [PATCH RFC v2 5/7] net/mlx5: add descriptive comment for a barrier
  2020-02-13 12:38 [dpdk-dev] [PATCH RFC v1 0/6] barrier fix and optimization for mlx5 on aarch64 Gavin Hu
                   ` (10 preceding siblings ...)
  2020-04-10 16:41 ` [dpdk-dev] [PATCH RFC v2 4/7] net/mlx5: relax barrier for aarch64 Gavin Hu
@ 2020-04-10 16:41 ` Gavin Hu
  2020-04-10 16:41 ` [dpdk-dev] [PATCH RFC v2 6/7] net/mlx5: relax ordering for multi-packet RQ buffer refcnt Gavin Hu
  2020-04-10 16:41 ` [dpdk-dev] [PATCH RFC v2 7/7] doc: clarify one configuration in mlx5 guide Gavin Hu
  13 siblings, 0 replies; 46+ messages in thread
From: Gavin Hu @ 2020-04-10 16:41 UTC (permalink / raw)
  To: dev
  Cc: nd, david.marchand, thomas, rasland, drc, bruce.richardson,
	konstantin.ananyev, matan, shahafs, viacheslavo, jerinj,
	Honnappa.Nagarahalli, ruifeng.wang, phil.yang, joyce.kong,
	steve.capper

The barrier would not be required, or could be moved down, if HW waited
for the doorbell ring before executing the WQE.

This is not the case: HW can start executing the WQE as soon as it gets
ownership (passed by SW writing the doorbell record).

Add a descriptive comment for this HW-specific behavior.

Signed-off-by: Gavin Hu <gavin.hu@arm.com>
Reviewed-by: Phil Yang <phil.yang@arm.com>
---
 drivers/net/mlx5/mlx5_rxtx.h | 5 +++++
 1 file changed, 5 insertions(+)

diff --git a/drivers/net/mlx5/mlx5_rxtx.h b/drivers/net/mlx5/mlx5_rxtx.h
index 228e37de5..737d5716d 100644
--- a/drivers/net/mlx5/mlx5_rxtx.h
+++ b/drivers/net/mlx5/mlx5_rxtx.h
@@ -655,6 +655,11 @@ mlx5_tx_dbrec_cond_wmb(struct mlx5_txq_data *txq, volatile struct mlx5_wqe *wqe,
 	uint64_t *dst = MLX5_TX_BFREG(txq);
 	volatile uint64_t *src = ((volatile uint64_t *)wqe);
 
+	/* The ownership of WQE is passed to HW by updating the doorbell
+	 * record. Once WQE ownership has been passed to the HCA, HW can
+	 * execute the WQE. The barrier ensures the WQE is visible to HW
+	 * before HW executes it.
+	 */
 	rte_cio_wmb();
 	*txq->qp_db = rte_cpu_to_be_32(txq->wqe_ci);
 	/* Ensure ordering between DB record and BF copy. */
-- 
2.17.1


* [dpdk-dev] [PATCH RFC v2 6/7] net/mlx5: relax ordering for multi-packet RQ buffer refcnt
  2020-02-13 12:38 [dpdk-dev] [PATCH RFC v1 0/6] barrier fix and optimization for mlx5 on aarch64 Gavin Hu
                   ` (11 preceding siblings ...)
  2020-04-10 16:41 ` [dpdk-dev] [PATCH RFC v2 5/7] net/mlx5: add descriptive comment for a barrier Gavin Hu
@ 2020-04-10 16:41 ` Gavin Hu
  2020-06-23  8:26   ` [dpdk-dev] [PATCH v3] net/mlx5: relaxed " Phil Yang
  2020-04-10 16:41 ` [dpdk-dev] [PATCH RFC v2 7/7] doc: clarify one configuration in mlx5 guide Gavin Hu
  13 siblings, 1 reply; 46+ messages in thread
From: Gavin Hu @ 2020-04-10 16:41 UTC (permalink / raw)
  To: dev
  Cc: nd, Phil Yang, david.marchand, thomas, rasland, drc,
	bruce.richardson, konstantin.ananyev, matan, shahafs,
	viacheslavo, jerinj, Honnappa.Nagarahalli, ruifeng.wang,
	joyce.kong, steve.capper, stable

From: Phil Yang <phil.yang@arm.com>

The PMD Rx queue descriptor contains two mlx5_mprq_buf fields, which
are the multi-packet RQ buffer header pointers. It uses the common
rte_atomic_XXX functions to make sure that refcnt accesses are atomic.

The common rte_atomic_XXX functions are full barriers on aarch64.
Optimize the refcnt accesses with one-way barriers to improve performance.
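
The refcnt life cycle touched by this patch can be sketched without
DPDK. This is a minimal sketch: struct sketch_mprq_buf and the
sketch_buf_*() helpers are hypothetical, and rte_mempool_put() is
replaced by setting a hypothetical in_pool flag. The memory orders are
copied from this v2 patch as posted; whether they are sufficient on
every path is what review of the series has to settle.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for struct mlx5_mprq_buf: only refcnt matters. */
struct sketch_mprq_buf {
	uint16_t refcnt; /* plain integer, accessed only via __atomic builtins */
	bool in_pool;    /* set where rte_mempool_put() would be called */
};

static void sketch_buf_init(struct sketch_mprq_buf *buf)
{
	__atomic_store_n(&buf->refcnt, 1, __ATOMIC_RELAXED);
	buf->in_pool = false;
}

/* Rx path: one more stride of the chunk is attached to an mbuf. */
static void sketch_buf_get(struct sketch_mprq_buf *buf)
{
	__atomic_add_fetch(&buf->refcnt, 1, __ATOMIC_ACQUIRE);
}

/* Free callback, following the structure of mlx5_mprq_buf_free_cb(). */
static void sketch_buf_free_cb(struct sketch_mprq_buf *buf)
{
	if (__atomic_load_n(&buf->refcnt, __ATOMIC_RELAXED) == 1) {
		buf->in_pool = true; /* only reference left: return the buffer */
	} else if (__atomic_sub_fetch(&buf->refcnt, 1,
				      __ATOMIC_RELAXED) == 0) {
		/* Reset to 1 so the buffer is ready for its next use. */
		__atomic_store_n(&buf->refcnt, 1, __ATOMIC_RELAXED);
		buf->in_pool = true;
	}
}
```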

Fixes: 7d6bf6b866b8 ("net/mlx5: add Multi-Packet Rx support")
Cc: stable@dpdk.org

Suggested-by: Gavin Hu <gavin.hu@arm.com>
Signed-off-by: Phil Yang <phil.yang@arm.com>
Reviewed-by: Gavin Hu <gavin.hu@arm.com>
---
 drivers/net/mlx5/mlx5_rxq.c  |  2 +-
 drivers/net/mlx5/mlx5_rxtx.c | 16 +++++++++-------
 drivers/net/mlx5/mlx5_rxtx.h |  2 +-
 3 files changed, 11 insertions(+), 9 deletions(-)

diff --git a/drivers/net/mlx5/mlx5_rxq.c b/drivers/net/mlx5/mlx5_rxq.c
index 8a6b410ef..834057c3b 100644
--- a/drivers/net/mlx5/mlx5_rxq.c
+++ b/drivers/net/mlx5/mlx5_rxq.c
@@ -1539,7 +1539,7 @@ mlx5_mprq_buf_init(struct rte_mempool *mp, void *opaque_arg,
 
 	memset(_m, 0, sizeof(*buf));
 	buf->mp = mp;
-	rte_atomic16_set(&buf->refcnt, 1);
+	__atomic_store_n(&buf->refcnt, 1, __ATOMIC_RELAXED);
 	for (j = 0; j != strd_n; ++j) {
 		shinfo = &buf->shinfos[j];
 		shinfo->free_cb = mlx5_mprq_buf_free_cb;
diff --git a/drivers/net/mlx5/mlx5_rxtx.c b/drivers/net/mlx5/mlx5_rxtx.c
index f3bf76376..039dd0a05 100644
--- a/drivers/net/mlx5/mlx5_rxtx.c
+++ b/drivers/net/mlx5/mlx5_rxtx.c
@@ -1592,10 +1592,11 @@ mlx5_mprq_buf_free_cb(void *addr __rte_unused, void *opaque)
 {
 	struct mlx5_mprq_buf *buf = opaque;
 
-	if (rte_atomic16_read(&buf->refcnt) == 1) {
+	if (__atomic_load_n(&buf->refcnt, __ATOMIC_RELAXED) == 1) {
 		rte_mempool_put(buf->mp, buf);
-	} else if (rte_atomic16_add_return(&buf->refcnt, -1) == 0) {
-		rte_atomic16_set(&buf->refcnt, 1);
+	} else if (unlikely(__atomic_sub_fetch(&buf->refcnt, 1,
+						__ATOMIC_RELAXED) == 0)) {
+		__atomic_store_n(&buf->refcnt, 1, __ATOMIC_RELAXED);
 		rte_mempool_put(buf->mp, buf);
 	}
 }
@@ -1676,7 +1677,8 @@ mlx5_rx_burst_mprq(void *dpdk_rxq, struct rte_mbuf **pkts, uint16_t pkts_n)
 
 		if (consumed_strd == strd_n) {
 			/* Replace WQE only if the buffer is still in use. */
-			if (rte_atomic16_read(&buf->refcnt) > 1) {
+			if (__atomic_load_n(&buf->refcnt,
+					    __ATOMIC_RELAXED) > 1) {
 				mprq_buf_replace(rxq, rq_ci & wq_mask, strd_n);
 				/* Release the old buffer. */
 				mlx5_mprq_buf_free(buf);
@@ -1766,9 +1768,9 @@ mlx5_rx_burst_mprq(void *dpdk_rxq, struct rte_mbuf **pkts, uint16_t pkts_n)
 			void *buf_addr;
 
 			/* Increment the refcnt of the whole chunk. */
-			rte_atomic16_add_return(&buf->refcnt, 1);
-			MLX5_ASSERT((uint16_t)rte_atomic16_read(&buf->refcnt) <=
-				    strd_n + 1);
+			__atomic_add_fetch(&buf->refcnt, 1, __ATOMIC_ACQUIRE);
+			MLX5_ASSERT(__atomic_load_n(&buf->refcnt,
+					__ATOMIC_RELAXED) <= strd_n + 1);
 			buf_addr = RTE_PTR_SUB(addr, headroom_sz);
 			/*
 			 * MLX5 device doesn't use iova but it is necessary in a
diff --git a/drivers/net/mlx5/mlx5_rxtx.h b/drivers/net/mlx5/mlx5_rxtx.h
index 737d5716d..d0a1bffa5 100644
--- a/drivers/net/mlx5/mlx5_rxtx.h
+++ b/drivers/net/mlx5/mlx5_rxtx.h
@@ -78,7 +78,7 @@ struct rxq_zip {
 /* Multi-Packet RQ buffer header. */
 struct mlx5_mprq_buf {
 	struct rte_mempool *mp;
-	rte_atomic16_t refcnt; /* Atomically accessed refcnt. */
+	uint16_t refcnt; /* Atomically accessed refcnt. */
 	uint8_t pad[RTE_PKTMBUF_HEADROOM]; /* Headroom for the first packet. */
 	struct rte_mbuf_ext_shared_info shinfos[];
 	/*
-- 
2.17.1


^ permalink raw reply	[flat|nested] 46+ messages in thread

* [dpdk-dev] [PATCH RFC v2 7/7] doc: clarify one configuration in mlx5 guide
  2020-02-13 12:38 [dpdk-dev] [PATCH RFC v1 0/6] barrier fix and optimization for mlx5 on aarch64 Gavin Hu
                   ` (12 preceding siblings ...)
  2020-04-10 16:41 ` [dpdk-dev] [PATCH RFC v2 6/7] net/mlx5: relax ordering for multi-packet RQ buffer refcnt Gavin Hu
@ 2020-04-10 16:41 ` Gavin Hu
  13 siblings, 0 replies; 46+ messages in thread
From: Gavin Hu @ 2020-04-10 16:41 UTC (permalink / raw)
  To: dev
  Cc: nd, david.marchand, thomas, rasland, drc, bruce.richardson,
	konstantin.ananyev, matan, shahafs, viacheslavo, jerinj,
	Honnappa.Nagarahalli, ruifeng.wang, phil.yang, joyce.kong,
	steve.capper

The 'tx_db_nc' parameter is used to differentiate two mapping types, WC
and non-WC, but both are actually non-cacheable.

Write-Combining on x86 is non-cacheable. Normal-NC, its counterpart on
aarch64, is non-cacheable too, as its name suggests: accesses to both
of these memory types bypass the cache hierarchy.

Interpreting the parameter as 'non-cacheable' therefore makes no
distinction and is confusing.

Re-interpret it as 'non-combining' to make the distinction.

Also explain why the aarch64 default setting for this parameter is
different.
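
The precedence rule described in the guide text below (devarg, then
MLX5_SHUT_UP_BF, then a per-architecture default) can be sketched as a
pure function. This is a minimal sketch of the documented rule only:
sketch_tx_db_nc() and its parameters are hypothetical and do not reflect
how the PMD actually parses devargs or reads the environment.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper returning the effective tx_db_nc value
 * (0 = write-combining mapping, non-zero = non-combining mapping).
 *
 * devarg_tx_db_nc: value of the tx_db_nc devarg, 0 if omitted.
 * shut_up_bf:      value of MLX5_SHUT_UP_BF, NULL if not set.
 * host_is_arm64:   true on ARM64 hosts.
 */
static int sketch_tx_db_nc(int devarg_tx_db_nc, const char *shut_up_bf,
			   bool host_is_arm64)
{
	if (devarg_tx_db_nc != 0)
		return devarg_tx_db_nc;    /* explicit non-zero devarg wins */
	if (shut_up_bf != NULL)            /* preset variable is used */
		return strcmp(shut_up_bf, "0") != 0;
	return host_is_arm64 ? 0 : 1;      /* per-architecture default */
}
```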

Signed-off-by: Gavin Hu <gavin.hu@arm.com>
---
 doc/guides/nics/mlx5.rst | 6 ++++--
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/doc/guides/nics/mlx5.rst b/doc/guides/nics/mlx5.rst
index afd11cd83..addec9f12 100644
--- a/doc/guides/nics/mlx5.rst
+++ b/doc/guides/nics/mlx5.rst
@@ -610,9 +610,9 @@ Run-time configuration
   The rdma core library can map doorbell register in two ways, depending on the
   environment variable "MLX5_SHUT_UP_BF":
 
-  - As regular cached memory (usually with write combining attribute), if the
+  - As regular memory (usually with write combining attribute), if the
     variable is either missing or set to zero.
-  - As non-cached memory, if the variable is present and set to not "0" value.
+  - As non-combining memory, if the variable is present and set to a value other than "0".
 
   The type of mapping may slightly affect the Tx performance, the optimal choice
   is strongly relied on the host architecture and should be deduced practically.
@@ -638,6 +638,8 @@ Run-time configuration
   If ``tx_db_nc`` is omitted or set to zero, the preset (if any) environment
   variable "MLX5_SHUT_UP_BF" value is used. If there is no "MLX5_SHUT_UP_BF",
   the default ``tx_db_nc`` value is zero for ARM64 hosts and one for others.
+  ARM64 is different because it has a gathering buffer; the enforced barrier
+  can evict the doorbell ring from it immediately.
 
 - ``tx_vec_en`` parameter [int]
 
-- 
2.17.1


^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [dpdk-dev] [PATCH RFC v2 0/7] introduce new barrier class and use it for mlx5 PMD
  2020-04-10 16:41 ` [dpdk-dev] [PATCH RFC v2 0/7] introduce new barrier class and use it for mlx5 PMD Gavin Hu
@ 2020-04-10 17:20   ` Andrew Rybchenko
  2020-04-11  3:46     ` Gavin Hu
  2020-05-11 18:06   ` [dpdk-dev] [RFC] eal: adjust barriers for IO on Armv8-a Honnappa Nagarahalli
                     ` (3 subsequent siblings)
  4 siblings, 1 reply; 46+ messages in thread
From: Andrew Rybchenko @ 2020-04-10 17:20 UTC (permalink / raw)
  To: Gavin Hu, dev
  Cc: nd, david.marchand, thomas, rasland, drc, bruce.richardson,
	konstantin.ananyev, matan, shahafs, viacheslavo, jerinj,
	Honnappa.Nagarahalli, ruifeng.wang, phil.yang, joyce.kong,
	steve.capper

On 4/10/20 7:41 PM, Gavin Hu wrote:
> To order writes to various memory types, 'sfence' is required for x86,
> and 'dmb oshst' is required for aarch64.
> 
> But within DPDK, there is no abstracted barriers covers this
> combination: sfence(x86)/dmb(aarch64).
> 
> So introduce a new barrier class - rte_dma_*mb for this combination,
> 
> Doorbell rings are typical use cases of this new barrier class, which
> requires something ready in the memory before letting HW aware.
> 
> As a note, rte_io_wmb and rte_cio_wmb are compiler barriers for x86, while
> rte_wmb is 'dsb' for aarch64.

As far as I can see, rte_cio_wmb() is exactly the definition of the barrier
to be used for doorbells. Am I missing something?
Maybe it is just a bug in rte_cio_wmb() on x86?

> In the joint preliminary testing between Arm and Ampere, 8%~13%
> performance boost was measured.
> 
> As there is no functionality changes, it will not impact x86.
> 
> Gavin Hu (6):
>    eal: introduce new class of barriers for DMA use cases
>    net/mlx5: dmb for immediate doorbell ring on aarch64
>    net/mlx5: relax barrier to order UAR writes on aarch64
>    net/mlx5: relax barrier for aarch64
>    net/mlx5: add descriptive comment for a barrier
>    doc: clarify one configuration in mlx5 guide
> 
> Phil Yang (1):
>    net/mlx5: relax ordering for multi-packet RQ buffer refcnt
> 
>   doc/guides/nics/mlx5.rst                    |  6 ++--
>   drivers/net/mlx5/mlx5_rxq.c                 |  2 +-
>   drivers/net/mlx5/mlx5_rxtx.c                | 16 ++++++-----
>   drivers/net/mlx5/mlx5_rxtx.h                | 14 ++++++----
>   lib/librte_eal/arm/include/rte_atomic_32.h  |  6 ++++
>   lib/librte_eal/arm/include/rte_atomic_64.h  |  6 ++++
>   lib/librte_eal/include/generic/rte_atomic.h | 31 +++++++++++++++++++++
>   lib/librte_eal/ppc/include/rte_atomic.h     |  6 ++++
>   lib/librte_eal/x86/include/rte_atomic.h     |  6 ++++
>   9 files changed, 78 insertions(+), 15 deletions(-)
> 


^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [dpdk-dev] [PATCH RFC v2 0/7] introduce new barrier class and use it for mlx5 PMD
  2020-04-10 17:20   ` Andrew Rybchenko
@ 2020-04-11  3:46     ` Gavin Hu
  2020-04-13  9:51       ` Andrew Rybchenko
  0 siblings, 1 reply; 46+ messages in thread
From: Gavin Hu @ 2020-04-11  3:46 UTC (permalink / raw)
  To: Andrew Rybchenko, dev
  Cc: nd, david.marchand, thomas, rasland, drc, bruce.richardson,
	konstantin.ananyev, matan, shahafs, viacheslavo, jerinj,
	Honnappa Nagarahalli, Ruifeng Wang, Phil Yang, Joyce Kong,
	Steve Capper, nd

Hi Andrew,

> -----Original Message-----
> From: Andrew Rybchenko <arybchenko@solarflare.com>
> Sent: Saturday, April 11, 2020 1:21 AM
> To: Gavin Hu <Gavin.Hu@arm.com>; dev@dpdk.org
> Cc: nd <nd@arm.com>; david.marchand@redhat.com;
> thomas@monjalon.net; rasland@mellanox.com; drc@linux.vnet.ibm.com;
> bruce.richardson@intel.com; konstantin.ananyev@intel.com;
> matan@mellanox.com; shahafs@mellanox.com; viacheslavo@mellanox.com;
> jerinj@marvell.com; Honnappa Nagarahalli
> <Honnappa.Nagarahalli@arm.com>; Ruifeng Wang
> <Ruifeng.Wang@arm.com>; Phil Yang <Phil.Yang@arm.com>; Joyce Kong
> <Joyce.Kong@arm.com>; Steve Capper <Steve.Capper@arm.com>
> Subject: Re: [dpdk-dev] [PATCH RFC v2 0/7] introduce new barrier class and
> use it for mlx5 PMD
> 
> On 4/10/20 7:41 PM, Gavin Hu wrote:
> > To order writes to various memory types, 'sfence' is required for x86,
> > and 'dmb oshst' is required for aarch64.
> >
> > But within DPDK, there is no abstracted barriers covers this
> > combination: sfence(x86)/dmb(aarch64).
> >
> > So introduce a new barrier class - rte_dma_*mb for this combination,
> >
> > Doorbell rings are typical use cases of this new barrier class, which
> > requires something ready in the memory before letting HW aware.
> >
> > As a note, rte_io_wmb and rte_cio_wmb are compiler barriers for x86, while
> > rte_wmb is 'dsb' for aarch64.
> 
> As far as I can see rte_cio_wmb() is exactly definition of the barrier
> to be used for doorbells. Am I missing something?

My understanding is that rte_cio_wmb is for DMA buffers, for example descriptors and work queues, which are located in host memory but shared between the CPU and the IO device.
rte_io_wmb is for MMIO regions.
We are missing barriers for mixed memory types, e.g. the doorbell cases.

The definition of rte_cio_wmb implies that it cannot be used for non-coherent MMIO regions (WC?):
http://code.dpdk.org/dpdk/v20.02/source/lib/librte_eal/common/include/generic/rte_atomic.h#L124
> May be it is just a bug in rte_cio_wmb() on x86?
rte_cio_wmb is fine for doorbells on aarch64, but looking through the kernel code, 'sfence' is required on x86 for mixed memory types.
The DPDK mlx5 PMD uses rte_cio_wmb widely and appropriately: it orders sequences of writes to host memory that is shared with the IO device.
Strengthening rte_cio_wmb may hurt performance, so a new barrier class is introduced to optimize for aarch64, in the fast path only, without impacting x86.
http://code.dpdk.org/dpdk/v20.02/source/drivers/net/mlx5/mlx5_rxtx.c#L1087
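
A minimal sketch of the per-architecture selection discussed above
('sfence' on x86, 'dmb oshst' on aarch64) applied to a doorbell ring.
sketch_dma_wmb() and sketch_ring() are hypothetical names, not the
rte_dma_*mb API proposed in this series, and the sequentially consistent
C11 fence used on other architectures is an assumption for portability.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical write barrier for mixed host-memory/doorbell writes. */
#if defined(__x86_64__)
#define sketch_dma_wmb() __asm__ volatile("sfence" : : : "memory")
#elif defined(__aarch64__)
#define sketch_dma_wmb() __asm__ volatile("dmb oshst" : : : "memory")
#else
#define sketch_dma_wmb() __atomic_thread_fence(__ATOMIC_SEQ_CST)
#endif

/* Order a descriptor write in host memory before the doorbell write. */
static void sketch_ring(uint64_t *desc, volatile uint32_t *doorbell,
			uint64_t val, uint32_t idx)
{
	*desc = val;      /* descriptor in host memory, fetched by DMA */
	sketch_dma_wmb(); /* descriptor store ordered before doorbell store */
	*doorbell = idx;  /* doorbell write that triggers the device */
}
```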
/Gavin
> 
> > In the joint preliminary testing between Arm and Ampere, 8%~13%
> > performance boost was measured.
> >
> > As there is no functionality changes, it will not impact x86.
> >
> > Gavin Hu (6):
> >    eal: introduce new class of barriers for DMA use cases
> >    net/mlx5: dmb for immediate doorbell ring on aarch64
> >    net/mlx5: relax barrier to order UAR writes on aarch64
> >    net/mlx5: relax barrier for aarch64
> >    net/mlx5: add descriptive comment for a barrier
> >    doc: clarify one configuration in mlx5 guide
> >
> > Phil Yang (1):
> >    net/mlx5: relax ordering for multi-packet RQ buffer refcnt
> >
> >   doc/guides/nics/mlx5.rst                    |  6 ++--
> >   drivers/net/mlx5/mlx5_rxq.c                 |  2 +-
> >   drivers/net/mlx5/mlx5_rxtx.c                | 16 ++++++-----
> >   drivers/net/mlx5/mlx5_rxtx.h                | 14 ++++++----
> >   lib/librte_eal/arm/include/rte_atomic_32.h  |  6 ++++
> >   lib/librte_eal/arm/include/rte_atomic_64.h  |  6 ++++
> >   lib/librte_eal/include/generic/rte_atomic.h | 31 +++++++++++++++++++++
> >   lib/librte_eal/ppc/include/rte_atomic.h     |  6 ++++
> >   lib/librte_eal/x86/include/rte_atomic.h     |  6 ++++
> >   9 files changed, 78 insertions(+), 15 deletions(-)
> >


^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [dpdk-dev] [PATCH RFC v2 0/7] introduce new barrier class and use it for mlx5 PMD
  2020-04-11  3:46     ` Gavin Hu
@ 2020-04-13  9:51       ` Andrew Rybchenko
  2020-04-13 16:46         ` Gavin Hu
  0 siblings, 1 reply; 46+ messages in thread
From: Andrew Rybchenko @ 2020-04-13  9:51 UTC (permalink / raw)
  To: Gavin Hu, dev
  Cc: nd, david.marchand, thomas, rasland, drc, bruce.richardson,
	konstantin.ananyev, matan, shahafs, viacheslavo, jerinj,
	Honnappa Nagarahalli, Ruifeng Wang, Phil Yang, Joyce Kong,
	Steve Capper

On 4/11/20 6:46 AM, Gavin Hu wrote:
> Hi Andrew,
> 
>> -----Original Message-----
>> From: Andrew Rybchenko <arybchenko@solarflare.com>
>> Sent: Saturday, April 11, 2020 1:21 AM
>> To: Gavin Hu <Gavin.Hu@arm.com>; dev@dpdk.org
>> Cc: nd <nd@arm.com>; david.marchand@redhat.com;
>> thomas@monjalon.net; rasland@mellanox.com; drc@linux.vnet.ibm.com;
>> bruce.richardson@intel.com; konstantin.ananyev@intel.com;
>> matan@mellanox.com; shahafs@mellanox.com; viacheslavo@mellanox.com;
>> jerinj@marvell.com; Honnappa Nagarahalli
>> <Honnappa.Nagarahalli@arm.com>; Ruifeng Wang
>> <Ruifeng.Wang@arm.com>; Phil Yang <Phil.Yang@arm.com>; Joyce Kong
>> <Joyce.Kong@arm.com>; Steve Capper <Steve.Capper@arm.com>
>> Subject: Re: [dpdk-dev] [PATCH RFC v2 0/7] introduce new barrier class and
>> use it for mlx5 PMD
>>
>> On 4/10/20 7:41 PM, Gavin Hu wrote:
>>> To order writes to various memory types, 'sfence' is required for x86,
>>> and 'dmb oshst' is required for aarch64.
>>>
>>> But within DPDK, there is no abstracted barriers covers this
>>> combination: sfence(x86)/dmb(aarch64).
>>>
>>> So introduce a new barrier class - rte_dma_*mb for this combination,
>>>
>>> Doorbell rings are typical use cases of this new barrier class, which
>>> requires something ready in the memory before letting HW aware.
>>>
>>> As a note, rte_io_wmb and rte_cio_wmb are compiler barriers for x86, while
>>> rte_wmb is 'dsb' for aarch64.
>>
>> As far as I can see rte_cio_wmb() is exactly definition of the barrier
>> to be used for doorbells. Am I missing something?
> 
> I understand rte_cio_wmb is for DMA buffers, for examples, descriptors, work queues, located in the host memory, but shared between CPU and IO device.
> rte_io_wmb is for MMIO regions. 
> We are missing the barriers for various memory types, eg. Doorbell cases.

When the patch series is applied, we'll have 5 types of memory
barriers: regular, smp, cio, io, dma. Do we really need so
many? Maybe we need a table in the description to help
make the right choice, i.e. the type of access on both
axes and the type of barrier to use at each intersection.

> There is an implication in the definition of rte_cio_wmb, it can not be used for non-coherent MMIO region(WC?)
> http://code.dpdk.org/dpdk/v20.02/source/lib/librte_eal/common/include/generic/rte_atomic.h#L124
>> May be it is just a bug in rte_cio_wmb() on x86?
> rte_cio_wmb is ok for doorbells on aarch64, but looking through the kernel code, 'sfence' is required for various/mixed memory types.
> DPDK mlx5 PMD uses rte_cio_wmb widely and wisely, it orders sequences of writes to host memory that shared by IO device.
> Strengthening rte_cio_wmb may hurt performance, so a new barrier class is introduced to optimize for aarch64, in the fast path only, while not impacting x86.
> http://code.dpdk.org/dpdk/v20.02/source/drivers/net/mlx5/mlx5_rxtx.c#L1087

Maybe it is my problem that I don't fully understand the real-life
use cases where cio should be used according to its
current definition. Does it make sense without a doorbell?
Does the HW poll via DMA?

Thanks for explanations,
Andrew.

>>
>>> In the joint preliminary testing between Arm and Ampere, 8%~13%
>>> performance boost was measured.
>>>
>>> As there is no functionality changes, it will not impact x86.
>>>
>>> Gavin Hu (6):
>>>    eal: introduce new class of barriers for DMA use cases
>>>    net/mlx5: dmb for immediate doorbell ring on aarch64
>>>    net/mlx5: relax barrier to order UAR writes on aarch64
>>>    net/mlx5: relax barrier for aarch64
>>>    net/mlx5: add descriptive comment for a barrier
>>>    doc: clarify one configuration in mlx5 guide
>>>
>>> Phil Yang (1):
>>>    net/mlx5: relax ordering for multi-packet RQ buffer refcnt
>>>
>>>   doc/guides/nics/mlx5.rst                    |  6 ++--
>>>   drivers/net/mlx5/mlx5_rxq.c                 |  2 +-
>>>   drivers/net/mlx5/mlx5_rxtx.c                | 16 ++++++-----
>>>   drivers/net/mlx5/mlx5_rxtx.h                | 14 ++++++----
>>>   lib/librte_eal/arm/include/rte_atomic_32.h  |  6 ++++
>>>   lib/librte_eal/arm/include/rte_atomic_64.h  |  6 ++++
>>>   lib/librte_eal/include/generic/rte_atomic.h | 31 +++++++++++++++++++++
>>>   lib/librte_eal/ppc/include/rte_atomic.h     |  6 ++++
>>>   lib/librte_eal/x86/include/rte_atomic.h     |  6 ++++
>>>   9 files changed, 78 insertions(+), 15 deletions(-)
>>>
> 


^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [dpdk-dev] [PATCH RFC v2 0/7] introduce new barrier class and use it for mlx5 PMD
  2020-04-13  9:51       ` Andrew Rybchenko
@ 2020-04-13 16:46         ` Gavin Hu
  0 siblings, 0 replies; 46+ messages in thread
From: Gavin Hu @ 2020-04-13 16:46 UTC (permalink / raw)
  To: Andrew Rybchenko, dev
  Cc: nd, david.marchand, thomas, rasland, drc, bruce.richardson,
	konstantin.ananyev, matan, shahafs, viacheslavo, jerinj,
	Honnappa Nagarahalli, Ruifeng Wang, Phil Yang, Joyce Kong,
	Steve Capper, nd

Hi Andrew,

> -----Original Message-----
> From: Andrew Rybchenko <arybchenko@solarflare.com>
> Sent: Monday, April 13, 2020 5:52 PM
> To: Gavin Hu <Gavin.Hu@arm.com>; dev@dpdk.org
> Cc: nd <nd@arm.com>; david.marchand@redhat.com;
> thomas@monjalon.net; rasland@mellanox.com; drc@linux.vnet.ibm.com;
> bruce.richardson@intel.com; konstantin.ananyev@intel.com;
> matan@mellanox.com; shahafs@mellanox.com; viacheslavo@mellanox.com;
> jerinj@marvell.com; Honnappa Nagarahalli
> <Honnappa.Nagarahalli@arm.com>; Ruifeng Wang
> <Ruifeng.Wang@arm.com>; Phil Yang <Phil.Yang@arm.com>; Joyce Kong
> <Joyce.Kong@arm.com>; Steve Capper <Steve.Capper@arm.com>
> Subject: Re: [dpdk-dev] [PATCH RFC v2 0/7] introduce new barrier class and
> use it for mlx5 PMD
> 
> On 4/11/20 6:46 AM, Gavin Hu wrote:
> > Hi Andrew,
> >
> >> -----Original Message-----
> >> From: Andrew Rybchenko <arybchenko@solarflare.com>
> >> Sent: Saturday, April 11, 2020 1:21 AM
> >> To: Gavin Hu <Gavin.Hu@arm.com>; dev@dpdk.org
> >> Cc: nd <nd@arm.com>; david.marchand@redhat.com;
> >> thomas@monjalon.net; rasland@mellanox.com; drc@linux.vnet.ibm.com;
> >> bruce.richardson@intel.com; konstantin.ananyev@intel.com;
> >> matan@mellanox.com; shahafs@mellanox.com; viacheslavo@mellanox.com;
> >> jerinj@marvell.com; Honnappa Nagarahalli
> >> <Honnappa.Nagarahalli@arm.com>; Ruifeng Wang
> >> <Ruifeng.Wang@arm.com>; Phil Yang <Phil.Yang@arm.com>; Joyce Kong
> >> <Joyce.Kong@arm.com>; Steve Capper <Steve.Capper@arm.com>
> >> Subject: Re: [dpdk-dev] [PATCH RFC v2 0/7] introduce new barrier class and
> >> use it for mlx5 PMD
> >>
> >> On 4/10/20 7:41 PM, Gavin Hu wrote:
> >>> To order writes to various memory types, 'sfence' is required for x86,
> >>> and 'dmb oshst' is required for aarch64.
> >>>
> >>> But within DPDK, there is no abstracted barriers covers this
> >>> combination: sfence(x86)/dmb(aarch64).
> >>>
> >>> So introduce a new barrier class - rte_dma_*mb for this combination,
> >>>
> >>> Doorbell rings are typical use cases of this new barrier class, which
> >>> requires something ready in the memory before letting HW aware.
> >>>
> >>> As a note, rte_io_wmb and rte_cio_wmb are compiler barriers for x86, while
> >>> rte_wmb is 'dsb' for aarch64.
> >>
> >> As far as I can see rte_cio_wmb() is exactly definition of the barrier
> >> to be used for doorbells. Am I missing something?
> >
> > I understand rte_cio_wmb is for DMA buffers, for examples, descriptors, work queues, located in the host memory, but shared between CPU and IO device.
> > rte_io_wmb is for MMIO regions.
> > We are missing the barriers for various memory types, eg. Doorbell cases.
> 
> When the patch series is applied, we'll have 5 types of memory
> barriers: regular, smp, cio, io, dma. Do we really need so
> many? May be we need a table in description which could
> help to make the right choice. I.e. type of access on both
> axis and type of barrier to use on intersection.
Yes, good suggestion!
Actually, Honnappa and I have already made a table for this.
We will provide it in the next release.
Thanks for your opinions!
> 
> > There is an implication in the definition of rte_cio_wmb, it can not be used for non-coherent MMIO region(WC?)
> > http://code.dpdk.org/dpdk/v20.02/source/lib/librte_eal/common/include/generic/rte_atomic.h#L124
> >> May be it is just a bug in rte_cio_wmb() on x86?
> > rte_cio_wmb is ok for doorbells on aarch64, but looking through the kernel code, 'sfence' is required for various/mixed memory types.
> > DPDK mlx5 PMD uses rte_cio_wmb widely and wisely, it orders sequences of writes to host memory that shared by IO device.
> > Strengthening rte_cio_wmb may hurt performance, so a new barrier class is introduced to optimize for aarch64, in the fast path only, while not impacting x86.
> > http://code.dpdk.org/dpdk/v20.02/source/drivers/net/mlx5/mlx5_rxtx.c#L1087
> 
> May be my problem that I don't fully understand real-life
> usecases when cio should be used in accordance with its
> current definition. Does it make sense without doorbell?
> Does HW polling via DMA?
> 
> Thanks for explanations,
> Andrew.
> 
> >>
> >>> In the joint preliminary testing between Arm and Ampere, 8%~13%
> >>> performance boost was measured.
> >>>
> >>> As there is no functionality changes, it will not impact x86.
> >>>
> >>> Gavin Hu (6):
> >>>    eal: introduce new class of barriers for DMA use cases
> >>>    net/mlx5: dmb for immediate doorbell ring on aarch64
> >>>    net/mlx5: relax barrier to order UAR writes on aarch64
> >>>    net/mlx5: relax barrier for aarch64
> >>>    net/mlx5: add descriptive comment for a barrier
> >>>    doc: clarify one configuration in mlx5 guide
> >>>
> >>> Phil Yang (1):
> >>>    net/mlx5: relax ordering for multi-packet RQ buffer refcnt
> >>>
> >>>   doc/guides/nics/mlx5.rst                    |  6 ++--
> >>>   drivers/net/mlx5/mlx5_rxq.c                 |  2 +-
> >>>   drivers/net/mlx5/mlx5_rxtx.c                | 16 ++++++-----
> >>>   drivers/net/mlx5/mlx5_rxtx.h                | 14 ++++++----
> >>>   lib/librte_eal/arm/include/rte_atomic_32.h  |  6 ++++
> >>>   lib/librte_eal/arm/include/rte_atomic_64.h  |  6 ++++
> >>>   lib/librte_eal/include/generic/rte_atomic.h | 31 +++++++++++++++++++++
> >>>   lib/librte_eal/ppc/include/rte_atomic.h     |  6 ++++
> >>>   lib/librte_eal/x86/include/rte_atomic.h     |  6 ++++
> >>>   9 files changed, 78 insertions(+), 15 deletions(-)
> >>>
> >


^ permalink raw reply	[flat|nested] 46+ messages in thread

* [dpdk-dev] [RFC] eal: adjust barriers for IO on Armv8-a
  2020-04-10 16:41 ` [dpdk-dev] [PATCH RFC v2 0/7] introduce new barrier class and use it for mlx5 PMD Gavin Hu
  2020-04-10 17:20   ` Andrew Rybchenko
@ 2020-05-11 18:06   ` Honnappa Nagarahalli
  2020-05-12  6:18     ` Ruifeng Wang
  2020-06-27 19:12   ` [dpdk-dev] [PATCH v2] " Honnappa Nagarahalli
                     ` (2 subsequent siblings)
  4 siblings, 1 reply; 46+ messages in thread
From: Honnappa Nagarahalli @ 2020-05-11 18:06 UTC (permalink / raw)
  To: dev, jerinj, hemant.agrawal, ajit.khaparde, igorch, thomas,
	viacheslavo, arybchenko, honnappa.nagarahalli
  Cc: ruifeng.wang, nd

Change the barrier APIs for IO to reflect that Armv8-a follows an
other-multi-copy atomic memory model.

The Armv8-a memory model has been strengthened to require
other-multi-copy atomicity. This property requires memory accesses
from an observer to become visible to all other observers
simultaneously [3]. This means that:

a) A write arriving at an endpoint shared between multiple CPUs is
   visible to all CPUs
b) A write that is visible to all CPUs is also visible to all other
   observers in the shareability domain

This allows for using cheaper DMB instructions in the place of DSB
for devices that are visible to all CPUs (i.e. devices that DPDK
caters to).

Please refer to [1], [2] and [3] for more information.

[1] https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=22ec71615d824f4f11d38d0e55a88d8956b7e45f
[2] https://www.youtube.com/watch?v=i6DayghhA8Q
[3] https://www.cl.cam.ac.uk/~pes20/armv8-mca/

Signed-off-by: Honnappa Nagarahalli <honnappa.nagarahalli@arm.com>
---
 lib/librte_eal/arm/include/rte_atomic_64.h | 10 +++++-----
 1 file changed, 5 insertions(+), 5 deletions(-)

diff --git a/lib/librte_eal/arm/include/rte_atomic_64.h b/lib/librte_eal/arm/include/rte_atomic_64.h
index 7b7099cdc..e406411bb 100644
--- a/lib/librte_eal/arm/include/rte_atomic_64.h
+++ b/lib/librte_eal/arm/include/rte_atomic_64.h
@@ -19,11 +19,11 @@ extern "C" {
 #include <rte_compat.h>
 #include <rte_debug.h>
 
-#define rte_mb() asm volatile("dsb sy" : : : "memory")
+#define rte_mb() asm volatile("dmb osh" : : : "memory")
 
-#define rte_wmb() asm volatile("dsb st" : : : "memory")
+#define rte_wmb() asm volatile("dmb oshst" : : : "memory")
 
-#define rte_rmb() asm volatile("dsb ld" : : : "memory")
+#define rte_rmb() asm volatile("dmb oshld" : : : "memory")
 
 #define rte_smp_mb() asm volatile("dmb ish" : : : "memory")
 
@@ -37,9 +37,9 @@ extern "C" {
 
 #define rte_io_rmb() rte_rmb()
 
-#define rte_cio_wmb() asm volatile("dmb oshst" : : : "memory")
+#define rte_cio_wmb() rte_wmb()
 
-#define rte_cio_rmb() asm volatile("dmb oshld" : : : "memory")
+#define rte_cio_rmb() rte_rmb()
 
 /*------------------------ 128 bit atomic operations -------------------------*/
 
-- 
2.17.1


^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [dpdk-dev] [RFC] eal: adjust barriers for IO on Armv8-a
  2020-05-11 18:06   ` [dpdk-dev] [RFC] eal: adjust barriers for IO on Armv8-a Honnappa Nagarahalli
@ 2020-05-12  6:18     ` Ruifeng Wang
  2020-05-12  6:42       ` Jerin Jacob
  0 siblings, 1 reply; 46+ messages in thread
From: Ruifeng Wang @ 2020-05-12  6:18 UTC (permalink / raw)
  To: Honnappa Nagarahalli, dev, jerinj, hemant.agrawal,
	Ajit Khaparde (ajit.khaparde, igorch, thomas, viacheslavo,
	arybchenko, Honnappa Nagarahalli
  Cc: nd, nd


> -----Original Message-----
> From: Honnappa Nagarahalli <honnappa.nagarahalli@arm.com>
> Sent: Tuesday, May 12, 2020 2:07 AM
> To: dev@dpdk.org; jerinj@marvell.com; hemant.agrawal@nxp.com; Ajit
> Khaparde (ajit.khaparde@broadcom.com) <ajit.khaparde@broadcom.com>;
> igorch@amazon.com; thomas@monjalon.net; viacheslavo@mellanox.com;
> arybchenko@solarflare.com; Honnappa Nagarahalli
> <Honnappa.Nagarahalli@arm.com>
> Cc: Ruifeng Wang <Ruifeng.Wang@arm.com>; nd <nd@arm.com>
> Subject: [RFC] eal: adjust barriers for IO on Armv8-a
> 
> Change the barrier APIs for IO to reflect that Armv8-a is other-multi-copy
> atomicity memory model.
> 
> Armv8-a memory model has been strengthened to require other-multi-copy
> atomicity. This property requires memory accesses from an observer to
> become visible to all other observers simultaneously [3]. This means
> 
> a) A write arriving at an endpoint shared between multiple CPUs is
>    visible to all CPUs
> b) A write that is visible to all CPUs is also visible to all other
>    observers in the shareability domain
> 
> This allows for using cheaper DMB instructions in the place of DSB for devices
> that are visible to all CPUs (i.e. devices that DPDK caters to).
> 
> Please refer to [1], [2] and [3] for more information.
> 
> [1] https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=22ec71615d824f4f11d38d0e55a88d8956b7e45f
> [2] https://www.youtube.com/watch?v=i6DayghhA8Q
> [3] https://www.cl.cam.ac.uk/~pes20/armv8-mca/
> 
> Signed-off-by: Honnappa Nagarahalli <honnappa.nagarahalli@arm.com>
> ---
>  lib/librte_eal/arm/include/rte_atomic_64.h | 10 +++++-----
>  1 file changed, 5 insertions(+), 5 deletions(-)
> 
> diff --git a/lib/librte_eal/arm/include/rte_atomic_64.h
> b/lib/librte_eal/arm/include/rte_atomic_64.h
> index 7b7099cdc..e406411bb 100644
> --- a/lib/librte_eal/arm/include/rte_atomic_64.h
> +++ b/lib/librte_eal/arm/include/rte_atomic_64.h
> @@ -19,11 +19,11 @@ extern "C" {
>  #include <rte_compat.h>
>  #include <rte_debug.h>
> 
> -#define rte_mb() asm volatile("dsb sy" : : : "memory")
> +#define rte_mb() asm volatile("dmb osh" : : : "memory")
> 
> -#define rte_wmb() asm volatile("dsb st" : : : "memory")
> +#define rte_wmb() asm volatile("dmb oshst" : : : "memory")
> 
> -#define rte_rmb() asm volatile("dsb ld" : : : "memory")
> +#define rte_rmb() asm volatile("dmb oshld" : : : "memory")
> 
>  #define rte_smp_mb() asm volatile("dmb ish" : : : "memory")
> 
> @@ -37,9 +37,9 @@ extern "C" {
> 
>  #define rte_io_rmb() rte_rmb()
> 
> -#define rte_cio_wmb() asm volatile("dmb oshst" : : : "memory")
> +#define rte_cio_wmb() rte_wmb()
> 
> -#define rte_cio_rmb() asm volatile("dmb oshld" : : : "memory")
> +#define rte_cio_rmb() rte_rmb()
> 
>  /*------------------------ 128 bit atomic operations -------------------------*/
> 
> --
> 2.17.1

This change showed about a 7% performance gain in the testpmd single-core NDR test.
Tested-by: Ruifeng Wang <ruifeng.wang@arm.com>


^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [dpdk-dev] [RFC] eal: adjust barriers for IO on Armv8-a
  2020-05-12  6:18     ` Ruifeng Wang
@ 2020-05-12  6:42       ` Jerin Jacob
  2020-05-12  8:02         ` Ruifeng Wang
  0 siblings, 1 reply; 46+ messages in thread
From: Jerin Jacob @ 2020-05-12  6:42 UTC (permalink / raw)
  To: Ruifeng Wang
  Cc: Honnappa Nagarahalli, dev, jerinj, hemant.agrawal,
	Ajit Khaparde (ajit.khaparde, igorch, thomas, viacheslavo,
	arybchenko, nd

On Tue, May 12, 2020 at 11:48 AM Ruifeng Wang <Ruifeng.Wang@arm.com> wrote:
>
>
> > -----Original Message-----
> > From: Honnappa Nagarahalli <honnappa.nagarahalli@arm.com>
> > Sent: Tuesday, May 12, 2020 2:07 AM
> > To: dev@dpdk.org; jerinj@marvell.com; hemant.agrawal@nxp.com; Ajit
> > Khaparde (ajit.khaparde@broadcom.com) <ajit.khaparde@broadcom.com>;
> > igorch@amazon.com; thomas@monjalon.net; viacheslavo@mellanox.com;
> > arybchenko@solarflare.com; Honnappa Nagarahalli
> > <Honnappa.Nagarahalli@arm.com>
> > Cc: Ruifeng Wang <Ruifeng.Wang@arm.com>; nd <nd@arm.com>
> > Subject: [RFC] eal: adjust barriers for IO on Armv8-a
> >
> > Change the barrier APIs for IO to reflect that Armv8-a is other-multi-copy
> > atomicity memory model.
> >
> > Armv8-a memory model has been strengthened to require other-multi-copy
> > atomicity. This property requires memory accesses from an observer to
> > become visible to all other observers simultaneously [3]. This means
> >
> > a) A write arriving at an endpoint shared between multiple CPUs is
> >    visible to all CPUs
> > b) A write that is visible to all CPUs is also visible to all other
> >    observers in the shareability domain
> >
> > This allows for using cheaper DMB instructions in the place of DSB for devices
> > that are visible to all CPUs (i.e. devices that DPDK caters to).
> >
> > Please refer to [1], [2] and [3] for more information.
> >
> > [1]
> > https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=22ec71615d824f4f11d38d0e55a88d8956b7e45f
> > [2] https://www.youtube.com/watch?v=i6DayghhA8Q
> > [3] https://www.cl.cam.ac.uk/~pes20/armv8-mca/
> >
> > Signed-off-by: Honnappa Nagarahalli <honnappa.nagarahalli@arm.com>
> > ---
> >  lib/librte_eal/arm/include/rte_atomic_64.h | 10 +++++-----
> >  1 file changed, 5 insertions(+), 5 deletions(-)
> >
> > diff --git a/lib/librte_eal/arm/include/rte_atomic_64.h b/lib/librte_eal/arm/include/rte_atomic_64.h
> > index 7b7099cdc..e406411bb 100644
> > --- a/lib/librte_eal/arm/include/rte_atomic_64.h
> > +++ b/lib/librte_eal/arm/include/rte_atomic_64.h
> > @@ -19,11 +19,11 @@ extern "C" {
> >  #include <rte_compat.h>
> >  #include <rte_debug.h>
> >
> > -#define rte_mb() asm volatile("dsb sy" : : : "memory")
> > +#define rte_mb() asm volatile("dmb osh" : : : "memory")
> >
> > -#define rte_wmb() asm volatile("dsb st" : : : "memory")
> > +#define rte_wmb() asm volatile("dmb oshst" : : : "memory")
> >
> > -#define rte_rmb() asm volatile("dsb ld" : : : "memory")
> > +#define rte_rmb() asm volatile("dmb oshld" : : : "memory")
> >
> >  #define rte_smp_mb() asm volatile("dmb ish" : : : "memory")
> >
> > @@ -37,9 +37,9 @@ extern "C" {
> >
> >  #define rte_io_rmb() rte_rmb()
> >
> > -#define rte_cio_wmb() asm volatile("dmb oshst" : : : "memory")
> > +#define rte_cio_wmb() rte_wmb()
> >
> > -#define rte_cio_rmb() asm volatile("dmb oshld" : : : "memory")
> > +#define rte_cio_rmb() rte_rmb()
> >
> >  /*------------------------ 128 bit atomic operations -------------------------*/
> >
> > --
> > 2.17.1
>
> This change showed about 7% performance gain in testpmd single core NDR test.

I am trying to understand this patch w.r.t. the current DPDK usage model.

1) Is the performance improvement due to the fact that the PMD you are
using for testing is supposed to use the existing rte_cio_* but was
using rte_[rw]mb?
2) In my understanding:
a) CPU to CPU barrier requirements are addressed by rte_smp_*
b) CPU to DMA/Device barrier requirements are addressed by rte_cio_*
c) CPU to ANY (CPU or Device) barrier requirements are addressed by rte_[rw]mb

If (c) is true, then we are violating the DPDK spec with this change. Right?
This change would not be required if the fastpath (CPU to Device) used
rte_cio_*. Right?



> Tested-by: Ruifeng Wang <ruifeng.wang@arm.com>
>
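The three-way classification above can be illustrated with a hypothetical Tx path. In this sketch, the demo_* names, the descriptor layout, and the doorbell variable standing in for a device register are all assumptions. A (b)-class barrier orders descriptor writes in DMA memory, and a (c)-class generic barrier sits before the doorbell, which is the pattern a PMD falls into when it uses rte_wmb() there. The aarch64 encodings for the cio and generic write barriers follow the pre-RFC lines of the quoted diff; "dmb ishst" for the SMP write barrier is our assumption about the existing rte_smp_wmb(). The fallback branch exists only so the sketch builds on other hosts.

```c
#include <stdint.h>
#include <string.h>

#if defined(__aarch64__)
#define demo_smp_wmb() asm volatile("dmb ishst" : : : "memory") /* (a) CPU<->CPU */
#define demo_cio_wmb() asm volatile("dmb oshst" : : : "memory") /* (b) CPU->DMA mem */
#define demo_wmb()     asm volatile("dsb st"    : : : "memory") /* (c) CPU->any */
#else
#define demo_smp_wmb() __atomic_thread_fence(__ATOMIC_RELEASE)
#define demo_cio_wmb() __atomic_thread_fence(__ATOMIC_RELEASE)
#define demo_wmb()     __atomic_thread_fence(__ATOMIC_SEQ_CST)
#endif

#define DEMO_RING_SIZE 8u /* must be a power of two */

struct demo_desc {
    uint64_t addr;
    uint32_t len;
    uint32_t flags; /* 1 = owned by the device */
};

/* Hypothetical Tx queue: descriptors in DMA-able memory plus a doorbell. */
struct demo_txq {
    struct demo_desc ring[DEMO_RING_SIZE];
    volatile uint32_t *doorbell;
    uint32_t tail;
};

static void demo_txq_init(struct demo_txq *q, volatile uint32_t *doorbell)
{
    memset(q, 0, sizeof(*q));
    q->doorbell = doorbell;
}

static void demo_tx_one(struct demo_txq *q, uint64_t addr, uint32_t len)
{
    struct demo_desc *d = &q->ring[q->tail & (DEMO_RING_SIZE - 1)];

    d->addr = addr;
    d->len = len;
    demo_cio_wmb(); /* (b): descriptor body before the ownership flag */
    d->flags = 1;
    q->tail++;
    demo_wmb();     /* (c): DMA-memory writes before the doorbell write */
    *q->doorbell = q->tail;
}
```

Under the RFC, the (c) barrier in this pattern becomes "dmb oshst" on aarch64 instead of "dsb st", which is where the reported gain would come from if a PMD uses the generic barrier at that point.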


* Re: [dpdk-dev] [RFC] eal: adjust barriers for IO on Armv8-a
  2020-05-12  6:42       ` Jerin Jacob
@ 2020-05-12  8:02         ` Ruifeng Wang
  2020-05-12  8:28           ` Jerin Jacob
  2020-05-12 21:44           ` Honnappa Nagarahalli
  0 siblings, 2 replies; 46+ messages in thread
From: Ruifeng Wang @ 2020-05-12  8:02 UTC (permalink / raw)
  To: Jerin Jacob
  Cc: Honnappa Nagarahalli, dev, jerinj, hemant.agrawal,
	Ajit Khaparde (ajit.khaparde, igorch, thomas, viacheslavo,
	arybchenko, nd, nd


> -----Original Message-----
> From: Jerin Jacob <jerinjacobk@gmail.com>
> Sent: Tuesday, May 12, 2020 2:42 PM
> To: Ruifeng Wang <Ruifeng.Wang@arm.com>
> Cc: Honnappa Nagarahalli <Honnappa.Nagarahalli@arm.com>;
> dev@dpdk.org; jerinj@marvell.com; hemant.agrawal@nxp.com; Ajit
> Khaparde (ajit.khaparde@broadcom.com) <ajit.khaparde@broadcom.com>;
> igorch@amazon.com; thomas@monjalon.net; viacheslavo@mellanox.com;
> arybchenko@solarflare.com; nd <nd@arm.com>
> Subject: Re: [dpdk-dev] [RFC] eal: adjust barriers for IO on Armv8-a
> 
> On Tue, May 12, 2020 at 11:48 AM Ruifeng Wang <Ruifeng.Wang@arm.com>
> wrote:
> >
> >
> > > -----Original Message-----
> > > From: Honnappa Nagarahalli <honnappa.nagarahalli@arm.com>
> > > Sent: Tuesday, May 12, 2020 2:07 AM
> > > To: dev@dpdk.org; jerinj@marvell.com; hemant.agrawal@nxp.com; Ajit
> > > Khaparde (ajit.khaparde@broadcom.com)
> <ajit.khaparde@broadcom.com>;
> > > igorch@amazon.com; thomas@monjalon.net;
> viacheslavo@mellanox.com;
> > > arybchenko@solarflare.com; Honnappa Nagarahalli
> > > <Honnappa.Nagarahalli@arm.com>
> > > Cc: Ruifeng Wang <Ruifeng.Wang@arm.com>; nd <nd@arm.com>
> > > Subject: [RFC] eal: adjust barriers for IO on Armv8-a
> > >
> > > Change the barrier APIs for IO to reflect that Armv8-a is
> > > other-multi-copy atomicity memory model.
> > >
> > > Armv8-a memory model has been strengthened to require
> > > other-multi-copy atomicity. This property requires memory accesses
> > > from an observer to become visible to all other observers
> > > simultaneously [3]. This means
> > >
> > > a) A write arriving at an endpoint shared between multiple CPUs is
> > >    visible to all CPUs
> > > b) A write that is visible to all CPUs is also visible to all other
> > >    observers in the shareability domain
> > >
> > > This allows for using cheaper DMB instructions in the place of DSB
> > > for devices that are visible to all CPUs (i.e. devices that DPDK caters to).
> > >
> > > Please refer to [1], [2] and [3] for more information.
> > >
> > > [1]
> > > https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=22ec71615d824f4f11d38d0e55a88d8956b7e45f
> > > [2] https://www.youtube.com/watch?v=i6DayghhA8Q
> > > [3] https://www.cl.cam.ac.uk/~pes20/armv8-mca/
> > >
> > > Signed-off-by: Honnappa Nagarahalli <honnappa.nagarahalli@arm.com>
> > > ---
> > >  lib/librte_eal/arm/include/rte_atomic_64.h | 10 +++++-----
> > >  1 file changed, 5 insertions(+), 5 deletions(-)
> > >
> > > diff --git a/lib/librte_eal/arm/include/rte_atomic_64.h b/lib/librte_eal/arm/include/rte_atomic_64.h
> > > index 7b7099cdc..e406411bb 100644
> > > --- a/lib/librte_eal/arm/include/rte_atomic_64.h
> > > +++ b/lib/librte_eal/arm/include/rte_atomic_64.h
> > > @@ -19,11 +19,11 @@ extern "C" {
> > >  #include <rte_compat.h>
> > >  #include <rte_debug.h>
> > >
> > > -#define rte_mb() asm volatile("dsb sy" : : : "memory")
> > > +#define rte_mb() asm volatile("dmb osh" : : : "memory")
> > >
> > > -#define rte_wmb() asm volatile("dsb st" : : : "memory")
> > > +#define rte_wmb() asm volatile("dmb oshst" : : : "memory")
> > >
> > > -#define rte_rmb() asm volatile("dsb ld" : : : "memory")
> > > +#define rte_rmb() asm volatile("dmb oshld" : : : "memory")
> > >
> > >  #define rte_smp_mb() asm volatile("dmb ish" : : : "memory")
> > >
> > > @@ -37,9 +37,9 @@ extern "C" {
> > >
> > >  #define rte_io_rmb() rte_rmb()
> > >
> > > -#define rte_cio_wmb() asm volatile("dmb oshst" : : : "memory")
> > > +#define rte_cio_wmb() rte_wmb()
> > >
> > > -#define rte_cio_rmb() asm volatile("dmb oshld" : : : "memory")
> > > +#define rte_cio_rmb() rte_rmb()
> > >
> > >  /*------------------------ 128 bit atomic operations -------------------------*/
> > >
> > > --
> > > 2.17.1
> >
> > This change showed about 7% performance gain in testpmd single core
> NDR test.
> 
> I am trying to understand this patch wrt DPDK current usage model?
> 
> 1)  Is performance improvement due to the fact that the PMD that you are
> using it for testing suppose to use existing rte_cio_* but it was using
> rte_[rw]mb?

This is part of the reason. There are also cases where rte_io_* was used and could be relaxed,
for example: http://patches.dpdk.org/patch/68162/

> 2) In my understanding :
> a) CPU to CPU barrier requirements are addressed by rte_smp_*
> b) CPU to DMA/Device barrier requirements are addressed by rte_cio_*
> c) CPU to ANY(CPU or Device) are addressed by  rte_[rw]mb
> 
> If (c) is true then we are violating the DPDK spec with change. Right?

Developers are still required to use the correct barrier APIs for each use case.
I think this change mitigates the performance penalty when a non-optimal barrier is used.

> This change will not be required if fastpath (CPU to Device) is using rte_cio_*.
> Right?

See 1). Correct usage of rte_cio_* is not the whole story.
For some other use cases, such as a barrier between accesses to different memory types, we can also use the lighter 'dmb' barrier.

> 
> 
> 
> > Tested-by: Ruifeng Wang <ruifeng.wang@arm.com>
> >


* Re: [dpdk-dev] [RFC] eal: adjust barriers for IO on Armv8-a
  2020-05-12  8:02         ` Ruifeng Wang
@ 2020-05-12  8:28           ` Jerin Jacob
  2020-05-12 21:44           ` Honnappa Nagarahalli
  1 sibling, 0 replies; 46+ messages in thread
From: Jerin Jacob @ 2020-05-12  8:28 UTC (permalink / raw)
  To: Ruifeng Wang
  Cc: Honnappa Nagarahalli, dev, jerinj, hemant.agrawal,
	Ajit Khaparde (ajit.khaparde, igorch, thomas, viacheslavo,
	arybchenko, nd

On Tue, May 12, 2020 at 1:32 PM Ruifeng Wang <Ruifeng.Wang@arm.com> wrote:
>
>
> > -----Original Message-----
> > From: Jerin Jacob <jerinjacobk@gmail.com>
> > Sent: Tuesday, May 12, 2020 2:42 PM
> > To: Ruifeng Wang <Ruifeng.Wang@arm.com>
> > Cc: Honnappa Nagarahalli <Honnappa.Nagarahalli@arm.com>;
> > dev@dpdk.org; jerinj@marvell.com; hemant.agrawal@nxp.com; Ajit
> > Khaparde (ajit.khaparde@broadcom.com) <ajit.khaparde@broadcom.com>;
> > igorch@amazon.com; thomas@monjalon.net; viacheslavo@mellanox.com;
> > arybchenko@solarflare.com; nd <nd@arm.com>
> > Subject: Re: [dpdk-dev] [RFC] eal: adjust barriers for IO on Armv8-a
> >
> > On Tue, May 12, 2020 at 11:48 AM Ruifeng Wang <Ruifeng.Wang@arm.com>
> > wrote:
> > >
> > >
> > > > -----Original Message-----
> > > > From: Honnappa Nagarahalli <honnappa.nagarahalli@arm.com>
> > > > Sent: Tuesday, May 12, 2020 2:07 AM
> > > > To: dev@dpdk.org; jerinj@marvell.com; hemant.agrawal@nxp.com; Ajit
> > > > Khaparde (ajit.khaparde@broadcom.com)
> > <ajit.khaparde@broadcom.com>;
> > > > igorch@amazon.com; thomas@monjalon.net;
> > viacheslavo@mellanox.com;
> > > > arybchenko@solarflare.com; Honnappa Nagarahalli
> > > > <Honnappa.Nagarahalli@arm.com>
> > > > Cc: Ruifeng Wang <Ruifeng.Wang@arm.com>; nd <nd@arm.com>
> > > > Subject: [RFC] eal: adjust barriers for IO on Armv8-a
> > > >
> > > > Change the barrier APIs for IO to reflect that Armv8-a is
> > > > other-multi-copy atomicity memory model.
> > > >
> > > > Armv8-a memory model has been strengthened to require
> > > > other-multi-copy atomicity. This property requires memory accesses
> > > > from an observer to become visible to all other observers
> > > > simultaneously [3]. This means
> > > >
> > > > a) A write arriving at an endpoint shared between multiple CPUs is
> > > >    visible to all CPUs
> > > > b) A write that is visible to all CPUs is also visible to all other
> > > >    observers in the shareability domain
> > > >
> > > > This allows for using cheaper DMB instructions in the place of DSB
> > > > for devices that are visible to all CPUs (i.e. devices that DPDK caters to).
> > > >
> > > > Please refer to [1], [2] and [3] for more information.
> > > >
> > > > [1]
> > > > https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=22ec71615d824f4f11d38d0e55a88d8956b7e45f
> > > > [2] https://www.youtube.com/watch?v=i6DayghhA8Q
> > > > [3] https://www.cl.cam.ac.uk/~pes20/armv8-mca/
> > > >
> > > > Signed-off-by: Honnappa Nagarahalli <honnappa.nagarahalli@arm.com>
> > > > ---
> > > >  lib/librte_eal/arm/include/rte_atomic_64.h | 10 +++++-----
> > > >  1 file changed, 5 insertions(+), 5 deletions(-)
> > > >
> > > > diff --git a/lib/librte_eal/arm/include/rte_atomic_64.h b/lib/librte_eal/arm/include/rte_atomic_64.h
> > > > index 7b7099cdc..e406411bb 100644
> > > > --- a/lib/librte_eal/arm/include/rte_atomic_64.h
> > > > +++ b/lib/librte_eal/arm/include/rte_atomic_64.h
> > > > @@ -19,11 +19,11 @@ extern "C" {
> > > >  #include <rte_compat.h>
> > > >  #include <rte_debug.h>
> > > >
> > > > -#define rte_mb() asm volatile("dsb sy" : : : "memory")
> > > > +#define rte_mb() asm volatile("dmb osh" : : : "memory")
> > > >
> > > > -#define rte_wmb() asm volatile("dsb st" : : : "memory")
> > > > +#define rte_wmb() asm volatile("dmb oshst" : : : "memory")
> > > >
> > > > -#define rte_rmb() asm volatile("dsb ld" : : : "memory")
> > > > +#define rte_rmb() asm volatile("dmb oshld" : : : "memory")
> > > >
> > > >  #define rte_smp_mb() asm volatile("dmb ish" : : : "memory")
> > > >
> > > > @@ -37,9 +37,9 @@ extern "C" {
> > > >
> > > >  #define rte_io_rmb() rte_rmb()
> > > >
> > > > -#define rte_cio_wmb() asm volatile("dmb oshst" : : : "memory")
> > > > +#define rte_cio_wmb() rte_wmb()
> > > >
> > > > -#define rte_cio_rmb() asm volatile("dmb oshld" : : : "memory")
> > > > +#define rte_cio_rmb() rte_rmb()
> > > >
> > > >  /*------------------------ 128 bit atomic operations -------------------------*/
> > > >
> > > > --
> > > > 2.17.1
> > >
> > > This change showed about 7% performance gain in testpmd single core
> > NDR test.
> >
> > I am trying to understand this patch wrt DPDK current usage model?
> >
> > 1)  Is performance improvement due to the fact that the PMD that you are
> > using it for testing suppose to use existing rte_cio_* but it was using
> > rte_[rw]mb?
>
> This is part of the reason. There are also cases where rte_io_* was used and can be relaxed.
> Such as: http://patches.dpdk.org/patch/68162/
>
> > 2) In my understanding :
> > a) CPU to CPU barrier requirements are addressed by rte_smp_*
> > b) CPU to DMA/Device barrier requirements are addressed by rte_cio_*
> > c) CPU to ANY(CPU or Device) are addressed by  rte_[rw]mb
> >
> > If (c) is true then we are violating the DPDK spec with change. Right?
>
> Developers are still required to use correct barrier APIs for different use cases.
> I think this change mitigates performance penalty when non optimal barrier is used.

But does it violate the contract? We are using rte_[rw]mb as the
low-performance/heavyweight barrier for all cases. I think that is the
contract with DPDK consumers. For different requirements, we have
specific APIs. IMO, it makes sense to change the fastpath code to use
more fine-grained barriers based on the need, rather than changing the
generic one to be lightweight.
i.e. rte_[rw]mb is the superset that works in all cases, and the
customized ones are used for specific use cases.

>
> > This change will not be required if fastpath (CPU to Device) is using rte_cio_*.
> > Right?
>
> See 1). Correct usage of rte_cio_* is not the whole.
> For some other use cases, such as barrier between accesses of different memory types, we can also use lighter barrier 'dmb'.
>
> >
> >
> >
> > > Tested-by: Ruifeng Wang <ruifeng.wang@arm.com>
> > >


* Re: [dpdk-dev] [RFC] eal: adjust barriers for IO on Armv8-a
  2020-05-12  8:02         ` Ruifeng Wang
  2020-05-12  8:28           ` Jerin Jacob
@ 2020-05-12 21:44           ` Honnappa Nagarahalli
  2020-05-13 14:49             ` Jerin Jacob
  1 sibling, 1 reply; 46+ messages in thread
From: Honnappa Nagarahalli @ 2020-05-12 21:44 UTC (permalink / raw)
  To: Ruifeng Wang, Jerin Jacob
  Cc: dev, jerinj, hemant.agrawal, Ajit Khaparde (ajit.khaparde,
	igorch, thomas, viacheslavo, arybchenko, nd,
	Honnappa Nagarahalli, nd

<snip>

> > > > Subject: [RFC] eal: adjust barriers for IO on Armv8-a
> > > >
> > > > Change the barrier APIs for IO to reflect that Armv8-a is
> > > > other-multi-copy atomicity memory model.
> > > >
> > > > Armv8-a memory model has been strengthened to require
> > > > other-multi-copy atomicity. This property requires memory accesses
> > > > from an observer to become visible to all other observers
> > > > simultaneously [3]. This means
> > > >
> > > > a) A write arriving at an endpoint shared between multiple CPUs is
> > > >    visible to all CPUs
> > > > b) A write that is visible to all CPUs is also visible to all other
> > > >    observers in the shareability domain
> > > >
> > > > This allows for using cheaper DMB instructions in the place of DSB
> > > > for devices that are visible to all CPUs (i.e. devices that DPDK caters to).
> > > >
> > > > Please refer to [1], [2] and [3] for more information.
> > > >
> > > > [1]
> > > > https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=22ec71615d824f4f11d38d0e55a88d8956b7e45f
> > > > [2] https://www.youtube.com/watch?v=i6DayghhA8Q
> > > > [3] https://www.cl.cam.ac.uk/~pes20/armv8-mca/
> > > >
> > > > Signed-off-by: Honnappa Nagarahalli <honnappa.nagarahalli@arm.com>
> > > > ---
> > > >  lib/librte_eal/arm/include/rte_atomic_64.h | 10 +++++-----
> > > >  1 file changed, 5 insertions(+), 5 deletions(-)
> > > >
> > > > diff --git a/lib/librte_eal/arm/include/rte_atomic_64.h b/lib/librte_eal/arm/include/rte_atomic_64.h
> > > > index 7b7099cdc..e406411bb 100644
> > > > --- a/lib/librte_eal/arm/include/rte_atomic_64.h
> > > > +++ b/lib/librte_eal/arm/include/rte_atomic_64.h
> > > > @@ -19,11 +19,11 @@ extern "C" {
> > > >  #include <rte_compat.h>
> > > >  #include <rte_debug.h>
> > > >
> > > > -#define rte_mb() asm volatile("dsb sy" : : : "memory")
> > > > +#define rte_mb() asm volatile("dmb osh" : : : "memory")
> > > >
> > > > -#define rte_wmb() asm volatile("dsb st" : : : "memory")
> > > > +#define rte_wmb() asm volatile("dmb oshst" : : : "memory")
> > > >
> > > > -#define rte_rmb() asm volatile("dsb ld" : : : "memory")
> > > > +#define rte_rmb() asm volatile("dmb oshld" : : : "memory")
> > > >
> > > >  #define rte_smp_mb() asm volatile("dmb ish" : : : "memory")
> > > >
> > > > @@ -37,9 +37,9 @@ extern "C" {
> > > >
> > > >  #define rte_io_rmb() rte_rmb()
> > > >
> > > > -#define rte_cio_wmb() asm volatile("dmb oshst" : : : "memory")
> > > > +#define rte_cio_wmb() rte_wmb()
> > > >
> > > > -#define rte_cio_rmb() asm volatile("dmb oshld" : : : "memory")
> > > > +#define rte_cio_rmb() rte_rmb()
> > > >
> > > >  /*------------------------ 128 bit atomic operations -------------------------*/
> > > >
> > > > --
> > > > 2.17.1
> > >
> > > This change showed about 7% performance gain in testpmd single core
> > NDR test.
> >
> > I am trying to understand this patch wrt DPDK current usage model?
> >
> > 1)  Is performance improvement due to the fact that the PMD that you
> > are using it for testing suppose to use existing rte_cio_* but it was
> > using rte_[rw]mb?
No, it is supposed to use rte_[rw]mb for x86.

> 
> This is part of the reason. There are also cases where rte_io_* was used and
> can be relaxed.
> Such as: http://patches.dpdk.org/patch/68162/
> 
> > 2) In my understanding :
> > a) CPU to CPU barrier requirements are addressed by rte_smp_*
> > b) CPU to DMA/Device barrier requirements are addressed by rte_cio_*
> > c) CPU to ANY(CPU or Device) are addressed by  rte_[rw]mb
> >
> > If (c) is true then we are violating the DPDK spec with change. Right?
No, we are not. Essentially, due to the other-multi-copy atomicity behavior of the architecture, we are saying 'DMB OSH*' is enough to achieve (c).

> 
> Developers are still required to use correct barrier APIs for different use cases.
> I think this change mitigates performance penalty when non optimal barrier is
> used.
> 
> > This change will not be required if fastpath (CPU to Device) is using
> rte_cio_*.
> > Right?
Yes. It is required when the fastpath uses rte_[rw]mb.

> 
> See 1). Correct usage of rte_cio_* is not the whole.
> For some other use cases, such as barrier between accesses of different
> memory types, we can also use lighter barrier 'dmb'.
> 
> >
> >
> >
> > > Tested-by: Ruifeng Wang <ruifeng.wang@arm.com>
> > >



* Re: [dpdk-dev] [RFC] eal: adjust barriers for IO on Armv8-a
  2020-05-12 21:44           ` Honnappa Nagarahalli
@ 2020-05-13 14:49             ` Jerin Jacob
  2020-05-14  1:02               ` Honnappa Nagarahalli
  0 siblings, 1 reply; 46+ messages in thread
From: Jerin Jacob @ 2020-05-13 14:49 UTC (permalink / raw)
  To: Honnappa Nagarahalli
  Cc: Ruifeng Wang, dev, jerinj, hemant.agrawal,
	Ajit Khaparde (ajit.khaparde, igorch, thomas, viacheslavo,
	arybchenko, nd, Richardson, Bruce

On Wed, May 13, 2020 at 3:14 AM Honnappa Nagarahalli
<Honnappa.Nagarahalli@arm.com> wrote:
>
> <snip>
>
> > > > > Subject: [RFC] eal: adjust barriers for IO on Armv8-a
> > > > >
> > > > > Change the barrier APIs for IO to reflect that Armv8-a is
> > > > > other-multi-copy atomicity memory model.
> > > > >
> > > > > Armv8-a memory model has been strengthened to require
> > > > > other-multi-copy atomicity. This property requires memory accesses
> > > > > from an observer to become visible to all other observers
> > > > > simultaneously [3]. This means
> > > > >
> > > > > a) A write arriving at an endpoint shared between multiple CPUs is
> > > > >    visible to all CPUs
> > > > > b) A write that is visible to all CPUs is also visible to all other
> > > > >    observers in the shareability domain
> > > > >
> > > > > This allows for using cheaper DMB instructions in the place of DSB
> > > > > for devices that are visible to all CPUs (i.e. devices that DPDK caters to).
> > > > >
> > > > > Please refer to [1], [2] and [3] for more information.
> > > > >
> > > > > [1]
> > > > > https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=22ec71615d824f4f11d38d0e55a88d8956b7e45f
> > > > > [2] https://www.youtube.com/watch?v=i6DayghhA8Q
> > > > > [3] https://www.cl.cam.ac.uk/~pes20/armv8-mca/
> > > > >
> > > > > Signed-off-by: Honnappa Nagarahalli <honnappa.nagarahalli@arm.com>
> > > > > ---
> > > > >  lib/librte_eal/arm/include/rte_atomic_64.h | 10 +++++-----
> > > > >  1 file changed, 5 insertions(+), 5 deletions(-)
> > > > >
> > > > > diff --git a/lib/librte_eal/arm/include/rte_atomic_64.h b/lib/librte_eal/arm/include/rte_atomic_64.h
> > > > > index 7b7099cdc..e406411bb 100644
> > > > > --- a/lib/librte_eal/arm/include/rte_atomic_64.h
> > > > > +++ b/lib/librte_eal/arm/include/rte_atomic_64.h
> > > > > @@ -19,11 +19,11 @@ extern "C" {
> > > > >  #include <rte_compat.h>
> > > > >  #include <rte_debug.h>
> > > > >
> > > > > -#define rte_mb() asm volatile("dsb sy" : : : "memory")
> > > > > +#define rte_mb() asm volatile("dmb osh" : : : "memory")
> > > > >
> > > > > -#define rte_wmb() asm volatile("dsb st" : : : "memory")
> > > > > +#define rte_wmb() asm volatile("dmb oshst" : : : "memory")
> > > > >
> > > > > -#define rte_rmb() asm volatile("dsb ld" : : : "memory")
> > > > > +#define rte_rmb() asm volatile("dmb oshld" : : : "memory")
> > > > >
> > > > >  #define rte_smp_mb() asm volatile("dmb ish" : : : "memory")
> > > > >
> > > > > @@ -37,9 +37,9 @@ extern "C" {
> > > > >
> > > > >  #define rte_io_rmb() rte_rmb()
> > > > >
> > > > > -#define rte_cio_wmb() asm volatile("dmb oshst" : : : "memory")
> > > > > +#define rte_cio_wmb() rte_wmb()
> > > > >
> > > > > -#define rte_cio_rmb() asm volatile("dmb oshld" : : : "memory")
> > > > > +#define rte_cio_rmb() rte_rmb()
> > > > >
> > > > >  /*------------------------ 128 bit atomic operations -------------------------*/
> > > > >
> > > > > --
> > > > > 2.17.1
> > > >
> > > > This change showed about 7% performance gain in testpmd single core
> > > NDR test.
> > >
> > > I am trying to understand this patch wrt DPDK current usage model?
> > >
> > > 1)  Is performance improvement due to the fact that the PMD that you
> > > are using it for testing suppose to use existing rte_cio_* but it was
> > > using rte_[rw]mb?
> No, it is supposed to use rte_[rw]mb for x86.

Why are drivers using rte_[rw]mb in the fastpath? IMO, rte_io_[rw]mb and rte_cio_[rw]mb
were created for this purpose.

But I understand that on x86 these are mapped to rte_compiler_barrier(). Is that
correct from the x86 PoV?
@Ananyev, Konstantin @Richardson, Bruce ?

For x86:
#define rte_io_wmb() rte_compiler_barrier()
#define rte_io_rmb() rte_compiler_barrier()
#define rte_cio_wmb() rte_compiler_barrier()
#define rte_cio_rmb() rte_compiler_barrier()
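For reference, a compiler barrier emits no instruction; it only stops the compiler from moving memory accesses across it, relying on x86's TSO ordering for ordinary write-back memory. A minimal sketch with the usual empty-asm idiom follows; the demo_* names and the ring variables are hypothetical, and to our understanding DPDK's rte_compiler_barrier() is built on the same idiom. This is not sufficient for write-combining mappings, whose stores are weakly ordered.

```c
#include <stdint.h>

/* Empty asm with a "memory" clobber: no instruction is emitted, but the
 * compiler may not reorder loads/stores across it or keep memory values
 * cached in registers across it. */
#define demo_compiler_barrier() do {          \
        __asm__ volatile("" : : : "memory");  \
} while (0)

/* Hypothetical x86 mapping, mirroring the defines quoted above. */
#define demo_io_wmb()  demo_compiler_barrier()
#define demo_cio_wmb() demo_compiler_barrier()

static uint32_t demo_ring_slot;
static uint32_t demo_ring_tail;

static void demo_produce(uint32_t v)
{
    demo_ring_slot = v;
    /* x86 does not reorder a store with older stores to write-back
     * memory, so only compiler reordering has to be prevented here. */
    demo_cio_wmb();
    demo_ring_tail++;
}
```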


>
> >
> > This is part of the reason. There are also cases where rte_io_* was used and
> > can be relaxed.
> > Such as: http://patches.dpdk.org/patch/68162/
> >
> > > 2) In my understanding :
> > > a) CPU to CPU barrier requirements are addressed by rte_smp_*
> > > b) CPU to DMA/Device barrier requirements are addressed by rte_cio_*
> > > c) CPU to ANY(CPU or Device) are addressed by  rte_[rw]mb
> > >
> > > If (c) is true then we are violating the DPDK spec with change. Right?
> No, we are not. Essentially, due to the other-multi-copy atomicity behavior of the architecture, we are saying 'DMB OSH*' is enough to achieve (c).

Yeah. Probably, from the userspace POV, it should be OK to use "DMB OSH*" as
the barrier between any of these four?

1) Global memory (BSS and data sections), not mapped as hugepages
2) Hugepage memory
3) IOVA memory
4) PCI register read/write

Do we need to worry about anything else that is specific to DSB,
for example TLB-related flushes?

If we take this path, then rte_cio_[rw]mb() has no meaning in
DPDK as an abstraction, since it was created for arm64 for this specific
purpose.
If we can meet all DPDK use cases with DMB OSH, then we can probably
deprecate rte_cio_wmb to avoid confusion.

>
> >
> > Developers are still required to use correct barrier APIs for different use cases.
> > I think this change mitigates performance penalty when non optimal barrier is
> > used.
> >
> > > This change will not be required if fastpath (CPU to Device) is using
> > rte_cio_*.
> > > Right?
> Yes. It is required when the fastpath uses rte_[rw]mb.
>
> >
> > See 1). Correct usage of rte_cio_* is not the whole.
> > For some other use cases, such as barrier between accesses of different
> > memory types, we can also use lighter barrier 'dmb'.
> >
> > >
> > >
> > >
> > > > Tested-by: Ruifeng Wang <ruifeng.wang@arm.com>
> > > >
>


* Re: [dpdk-dev] [RFC] eal: adjust barriers for IO on Armv8-a
  2020-05-13 14:49             ` Jerin Jacob
@ 2020-05-14  1:02               ` Honnappa Nagarahalli
  0 siblings, 0 replies; 46+ messages in thread
From: Honnappa Nagarahalli @ 2020-05-14  1:02 UTC (permalink / raw)
  To: Jerin Jacob
  Cc: Ruifeng Wang, dev, jerinj, hemant.agrawal,
	Ajit Khaparde (ajit.khaparde, igorch, thomas, viacheslavo,
	arybchenko, nd, Richardson, Bruce, Honnappa Nagarahalli, nd

<snip>
> >
> > > > > > Subject: [RFC] eal: adjust barriers for IO on Armv8-a
> > > > > >
> > > > > > Change the barrier APIs for IO to reflect that Armv8-a is
> > > > > > other-multi-copy atomicity memory model.
> > > > > >
> > > > > > Armv8-a memory model has been strengthened to require
> > > > > > other-multi-copy atomicity. This property requires memory
> > > > > > accesses from an observer to become visible to all other
> > > > > > observers simultaneously [3]. This means
> > > > > >
> > > > > > a) A write arriving at an endpoint shared between multiple CPUs is
> > > > > >    visible to all CPUs
> > > > > > b) A write that is visible to all CPUs is also visible to all other
> > > > > >    observers in the shareability domain
> > > > > >
> > > > > > This allows for using cheaper DMB instructions in the place of
> > > > > > DSB for devices that are visible to all CPUs (i.e. devices that DPDK
> caters to).
> > > > > >
> > > > > > Please refer to [1], [2] and [3] for more information.
> > > > > >
> > > > > > [1]
> > > > > > https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=22ec71615d824f4f11d38d0e55a88d8956b7e45f
> > > > > > [2] https://www.youtube.com/watch?v=i6DayghhA8Q
> > > > > > [3] https://www.cl.cam.ac.uk/~pes20/armv8-mca/
> > > > > >
> > > > > > Signed-off-by: Honnappa Nagarahalli
> > > > > > <honnappa.nagarahalli@arm.com>
> > > > > > ---
> > > > > >  lib/librte_eal/arm/include/rte_atomic_64.h | 10 +++++-----
> > > > > >  1 file changed, 5 insertions(+), 5 deletions(-)
> > > > > >
> > > > > > diff --git a/lib/librte_eal/arm/include/rte_atomic_64.h b/lib/librte_eal/arm/include/rte_atomic_64.h
> > > > > > index 7b7099cdc..e406411bb 100644
> > > > > > --- a/lib/librte_eal/arm/include/rte_atomic_64.h
> > > > > > +++ b/lib/librte_eal/arm/include/rte_atomic_64.h
> > > > > > @@ -19,11 +19,11 @@ extern "C" {
> > > > > >  #include <rte_compat.h>
> > > > > >  #include <rte_debug.h>
> > > > > >
> > > > > > -#define rte_mb() asm volatile("dsb sy" : : : "memory")
> > > > > > +#define rte_mb() asm volatile("dmb osh" : : : "memory")
> > > > > >
> > > > > > -#define rte_wmb() asm volatile("dsb st" : : : "memory")
> > > > > > +#define rte_wmb() asm volatile("dmb oshst" : : : "memory")
> > > > > >
> > > > > > -#define rte_rmb() asm volatile("dsb ld" : : : "memory")
> > > > > > +#define rte_rmb() asm volatile("dmb oshld" : : : "memory")
> > > > > >
> > > > > >  #define rte_smp_mb() asm volatile("dmb ish" : : : "memory")
> > > > > >
> > > > > > @@ -37,9 +37,9 @@ extern "C" {
> > > > > >
> > > > > >  #define rte_io_rmb() rte_rmb()
> > > > > >
> > > > > > -#define rte_cio_wmb() asm volatile("dmb oshst" : : : "memory")
> > > > > > +#define rte_cio_wmb() rte_wmb()
> > > > > >
> > > > > > -#define rte_cio_rmb() asm volatile("dmb oshld" : : : "memory")
> > > > > > +#define rte_cio_rmb() rte_rmb()
> > > > > >
> > > > > >  /*------------------------ 128 bit atomic operations -------------------------*/
> > > > > >
> > > > > > --
> > > > > > 2.17.1
> > > > >
> > > > > This change showed about 7% performance gain in testpmd single
> > > > > core
> > > > NDR test.
> > > >
> > > > I am trying to understand this patch wrt DPDK current usage model?
> > > >
> > > > 1)  Is performance improvement due to the fact that the PMD that
> > > > you are using it for testing suppose to use existing rte_cio_* but
> > > > it was using rte_[rw]mb?
> > No, it is supposed to use rte_[rw]mb for x86.
> 
> Why drivers using rte_[rw]in fastpath, IMO, rte_io_[rw]b and rte_cio_[rw]b
> created for this pupose.
> 
> But I understand, in x86 it is mapped to rte_compiler_barrier(). Is it correct
> from x86 PoV?
> @Ananyev, Konstantin @Richardson, Bruce ?
> 
> For x86:
> #define rte_io_wmb() rte_compiler_barrier()
> #define rte_io_rmb() rte_compiler_barrier()
> #define rte_cio_wmb() rte_compiler_barrier()
> #define rte_cio_rmb() rte_compiler_barrier()
We need a barrier API that maps to 'DMB OSH*' on Arm and to '*fence' on x86. My understanding is that '*fence' is required when WC memory is used on x86.

Also, from the Arm architecture perspective, we are effectively saying that 'DSB' is not required for portable drivers.
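The proposal above could look roughly like the following sketch: a single, hypothetical demo_io_wmb() that expands to "dmb oshst" on aarch64 and to "sfence" on x86-64, where sfence orders write-combining and non-temporal stores. The WQE/doorbell layout is an illustrative assumption, not mlx5's, and the last branch is a conservative placeholder for other architectures.

```c
#include <stdint.h>

#if defined(__aarch64__)
#define demo_io_wmb() __asm__ volatile("dmb oshst" : : : "memory")
#elif defined(__x86_64__)
#define demo_io_wmb() __asm__ volatile("sfence" : : : "memory")
#else
/* Assumption: a full fence as a conservative placeholder. */
#define demo_io_wmb() __atomic_thread_fence(__ATOMIC_SEQ_CST)
#endif

/* Copy a WQE into a (possibly WC-mapped) buffer, then ring the doorbell. */
static void demo_post(volatile uint64_t *wqe, unsigned int nwords,
                      const uint64_t *src, volatile uint32_t *doorbell,
                      uint32_t idx)
{
    for (unsigned int i = 0; i < nwords; i++)
        wqe[i] = src[i];
    demo_io_wmb(); /* all WQE stores are ordered before the doorbell store */
    *doorbell = idx;
}
```

The design choice is to keep one portable name whose per-architecture expansion is just strong enough for device-visible memory, rather than a separate cio variant.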

> 
> 
> >
> > >
> > > This is part of the reason. There are also cases where rte_io_* was
> > > used and can be relaxed.
> > > Such as: http://patches.dpdk.org/patch/68162/
> > >
> > > > 2) In my understanding :
> > > > a) CPU to CPU barrier requirements are addressed by rte_smp_*
> > > > b) CPU to DMA/Device barrier requirements are addressed by
> > > > rte_cio_*
> > > > c) CPU to ANY(CPU or Device) are addressed by  rte_[rw]mb
> > > >
> > > > If (c) is true, then we are violating the DPDK spec with this change. Right?
> > No, we are not. Essentially, due to the other-multi-copy atomicity behavior of the architecture, we are saying 'DMB OSH*' is enough to achieve (c).
> 
> Yeah. From a userspace POV, it should probably be OK to use "DMB OSH*" as
> the barrier between all four of these?
> 
> 1) Global memory (BSS and Data sections), Not mapped as a hugepage.
> 2) Hugepage memory
> 3) IOVA memory
> 4) PCI register read/write
> 
> Do we need to worry about anything else that is specific to DSB?
> For example, TLB-related flushes, etc.
Yes, things like TLB flushes or self-modifying code still need DSB. But my understanding is that we do not have such code in DPDK, and such code would be platform specific.

> 
> If we are taking this path, then rte_cio_[rw]mb() has no meaning in DPDK as
> an abstraction, as it was created for arm64 for this specific purpose.
> If we can meet all DPDK use cases with DMB OSH, then we can probably
> deprecate rte_cio_wmb to avoid confusion.
Agree, rte_cio_*mb is confusing to me. We could deprecate those.

I see Octeon TX/TX2 drivers using rte_*mb. Do you see any issues with this change in those drivers?

This is a very fundamental change, we need more feedback from others working with Arm platforms.

> 
> >
> > >
> > > Developers are still required to use the correct barrier APIs for different use cases.
> > > I think this change mitigates the performance penalty when a non-optimal
> > > barrier is used.
> > >
> > > > This change will not be required if fastpath (CPU to Device) is using rte_cio_*.
> > > > Right?
> > Yes. It is required when the fastpath uses rte_[rw]mb.
> >
> > >
> > > See 1). Correct usage of rte_cio_* is not the whole story.
> > > For some other use cases, such as barrier between accesses of
> > > different memory types, we can also use lighter barrier 'dmb'.
> > >
> > > >
> > > >
> > > >
> > > > > Tested-by: Ruifeng Wang <ruifeng.wang@arm.com>
> > > > >
> >

^ permalink raw reply	[flat|nested] 46+ messages in thread

* [dpdk-dev] [PATCH v3] net/mlx5: relaxed ordering for multi-packet RQ buffer refcnt
  2020-04-10 16:41 ` [dpdk-dev] [PATCH RFC v2 6/7] net/mlx5: relax ordering for multi-packet RQ buffer refcnt Gavin Hu
@ 2020-06-23  8:26   ` " Phil Yang
  0 siblings, 0 replies; 46+ messages in thread
From: Phil Yang @ 2020-06-23  8:26 UTC (permalink / raw)
  To: dev; +Cc: matan, shahafs, viacheslavo, Honnappa.Nagarahalli, drc, nd

Use C11 atomics with explicit memory ordering instead of the rte_atomic ops,
which enforce unnecessary barriers on aarch64.

Signed-off-by: Phil Yang <phil.yang@arm.com>
---
v3:
Split from the patchset:
http://patchwork.dpdk.org/cover/68159/

 drivers/net/mlx5/mlx5_rxq.c  |  2 +-
 drivers/net/mlx5/mlx5_rxtx.c | 16 +++++++++-------
 drivers/net/mlx5/mlx5_rxtx.h |  2 +-
 3 files changed, 11 insertions(+), 9 deletions(-)

diff --git a/drivers/net/mlx5/mlx5_rxq.c b/drivers/net/mlx5/mlx5_rxq.c
index dda0073..7f487f1 100644
--- a/drivers/net/mlx5/mlx5_rxq.c
+++ b/drivers/net/mlx5/mlx5_rxq.c
@@ -1545,7 +1545,7 @@ mlx5_mprq_buf_init(struct rte_mempool *mp, void *opaque_arg,
 
 	memset(_m, 0, sizeof(*buf));
 	buf->mp = mp;
-	rte_atomic16_set(&buf->refcnt, 1);
+	__atomic_store_n(&buf->refcnt, 1, __ATOMIC_RELAXED);
 	for (j = 0; j != strd_n; ++j) {
 		shinfo = &buf->shinfos[j];
 		shinfo->free_cb = mlx5_mprq_buf_free_cb;
diff --git a/drivers/net/mlx5/mlx5_rxtx.c b/drivers/net/mlx5/mlx5_rxtx.c
index e4106bf..f0eda88 100644
--- a/drivers/net/mlx5/mlx5_rxtx.c
+++ b/drivers/net/mlx5/mlx5_rxtx.c
@@ -1595,10 +1595,11 @@ mlx5_mprq_buf_free_cb(void *addr __rte_unused, void *opaque)
 {
 	struct mlx5_mprq_buf *buf = opaque;
 
-	if (rte_atomic16_read(&buf->refcnt) == 1) {
+	if (__atomic_load_n(&buf->refcnt, __ATOMIC_RELAXED) == 1) {
 		rte_mempool_put(buf->mp, buf);
-	} else if (rte_atomic16_add_return(&buf->refcnt, -1) == 0) {
-		rte_atomic16_set(&buf->refcnt, 1);
+	} else if (unlikely(__atomic_sub_fetch(&buf->refcnt, 1,
+					       __ATOMIC_RELAXED) == 0)) {
+		__atomic_store_n(&buf->refcnt, 1, __ATOMIC_RELAXED);
 		rte_mempool_put(buf->mp, buf);
 	}
 }
@@ -1678,7 +1679,8 @@ mlx5_rx_burst_mprq(void *dpdk_rxq, struct rte_mbuf **pkts, uint16_t pkts_n)
 
 		if (consumed_strd == strd_n) {
 			/* Replace WQE only if the buffer is still in use. */
-			if (rte_atomic16_read(&buf->refcnt) > 1) {
+			if (__atomic_load_n(&buf->refcnt,
+					    __ATOMIC_RELAXED) > 1) {
 				mprq_buf_replace(rxq, rq_ci & wq_mask, strd_n);
 				/* Release the old buffer. */
 				mlx5_mprq_buf_free(buf);
@@ -1790,9 +1792,9 @@ mlx5_rx_burst_mprq(void *dpdk_rxq, struct rte_mbuf **pkts, uint16_t pkts_n)
 			void *buf_addr;
 
 			/* Increment the refcnt of the whole chunk. */
-			rte_atomic16_add_return(&buf->refcnt, 1);
-			MLX5_ASSERT((uint16_t)rte_atomic16_read(&buf->refcnt) <=
-				    strd_n + 1);
+			__atomic_add_fetch(&buf->refcnt, 1, __ATOMIC_ACQUIRE);
+			MLX5_ASSERT(__atomic_load_n(&buf->refcnt,
+				    __ATOMIC_RELAXED) <= strd_n + 1);
 			buf_addr = RTE_PTR_SUB(addr, RTE_PKTMBUF_HEADROOM);
 			/*
 			 * MLX5 device doesn't use iova but it is necessary in a
diff --git a/drivers/net/mlx5/mlx5_rxtx.h b/drivers/net/mlx5/mlx5_rxtx.h
index 26621ff..0fc15f3 100644
--- a/drivers/net/mlx5/mlx5_rxtx.h
+++ b/drivers/net/mlx5/mlx5_rxtx.h
@@ -78,7 +78,7 @@ struct rxq_zip {
 /* Multi-Packet RQ buffer header. */
 struct mlx5_mprq_buf {
 	struct rte_mempool *mp;
-	rte_atomic16_t refcnt; /* Atomically accessed refcnt. */
+	uint16_t refcnt; /* Atomically accessed refcnt. */
 	uint8_t pad[RTE_PKTMBUF_HEADROOM]; /* Headroom for the first packet. */
 	struct rte_mbuf_ext_shared_info shinfos[];
 	/*
-- 
2.7.4
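
A self-contained sketch of the refcount pattern this patch converts: a plain `uint16_t` counter accessed only through the GCC/Clang `__atomic_*` builtins instead of `rte_atomic16_t`. The struct and function names are hypothetical simplifications of `struct mlx5_mprq_buf` and `mlx5_mprq_buf_free_cb()`, and the mempool put is replaced by a boolean "recycle" result. The orderings are a conservative choice for a generic refcount and differ from the patch, which uses `__ATOMIC_RELAXED` for the load and decrement and `__ATOMIC_ACQUIRE` for the increment.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for a multi-packet RQ buffer header. */
struct mprq_buf {
	uint16_t refcnt; /* plain integer, accessed only via __atomic builtins */
};

/* Initialise with one reference, owned by the receive queue. */
static void buf_init(struct mprq_buf *buf)
{
	__atomic_store_n(&buf->refcnt, 1, __ATOMIC_RELAXED);
}

/* Take an extra reference, e.g. when a stride is attached to an mbuf. */
static void buf_get(struct mprq_buf *buf)
{
	__atomic_add_fetch(&buf->refcnt, 1, __ATOMIC_RELAXED);
}

/* Drop a reference; returns true when the buffer should be recycled. */
static bool buf_put(struct mprq_buf *buf)
{
	if (__atomic_load_n(&buf->refcnt, __ATOMIC_ACQUIRE) == 1)
		return true; /* sole owner: recycle without decrementing */
	if (__atomic_sub_fetch(&buf->refcnt, 1, __ATOMIC_ACQ_REL) == 0) {
		/* Last reference dropped: reset the count for the next user. */
		__atomic_store_n(&buf->refcnt, 1, __ATOMIC_RELAXED);
		return true;
	}
	return false;
}
```

With two extra references taken, the first two `buf_put()` calls only decrement the count, and the third sees a count of 1 and recycles the buffer, leaving the count at 1 for reuse.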


^ permalink raw reply	[flat|nested] 46+ messages in thread

* [dpdk-dev] [PATCH v2] eal: adjust barriers for IO on Armv8-a
  2020-04-10 16:41 ` [dpdk-dev] [PATCH RFC v2 0/7] introduce new barrier class and use it for mlx5 PMD Gavin Hu
  2020-04-10 17:20   ` Andrew Rybchenko
  2020-05-11 18:06   ` [dpdk-dev] [RFC] eal: adjust barriers for IO on Armv8-a Honnappa Nagarahalli
@ 2020-06-27 19:12   ` " Honnappa Nagarahalli
  2020-06-27 19:25     ` Honnappa Nagarahalli
  2020-07-03 18:57   ` [dpdk-dev] [PATCH v3 1/3] " Honnappa Nagarahalli
  2020-07-06 23:43   ` [dpdk-dev] [PATCH v4 1/3] eal: adjust barriers for IO on Armv8-a Honnappa Nagarahalli
  4 siblings, 1 reply; 46+ messages in thread
From: Honnappa Nagarahalli @ 2020-06-27 19:12 UTC (permalink / raw)
  To: dev, honnappa.nagarahalli, ruifeng.wang, jerinj, hemant.agrawal,
	ajit.khaparde, igorch, thomas, viacheslavo, arybchenko,
	bruce.richardson
  Cc: nd

Change the barrier APIs for IO to reflect that the Armv8-a memory model
is other-multi-copy atomic.

The Armv8-a memory model has been strengthened to require
other-multi-copy atomicity. This property requires memory accesses
from an observer to become visible to all other observers
simultaneously [3]. This means:

a) A write arriving at an endpoint shared between multiple CPUs is
   visible to all CPUs
b) A write that is visible to all CPUs is also visible to all other
   observers in the shareability domain

This allows cheaper DMB instructions to be used in place of DSB
for devices that are visible to all CPUs (i.e. devices that DPDK
caters to).

Please refer to [1], [2] and [3] for more information.

[1] https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=22ec71615d824f4f11d38d0e55a88d8956b7e45f
[2] https://www.youtube.com/watch?v=i6DayghhA8Q
[3] https://www.cl.cam.ac.uk/~pes20/armv8-mca/

Signed-off-by: Honnappa Nagarahalli <honnappa.nagarahalli@arm.com>
Tested-by: Ruifeng Wang <ruifeng.wang@arm.com>
---
 lib/librte_eal/arm/include/rte_atomic_64.h | 12 ++++++------
 1 file changed, 6 insertions(+), 6 deletions(-)

diff --git a/lib/librte_eal/arm/include/rte_atomic_64.h b/lib/librte_eal/arm/include/rte_atomic_64.h
index 7b7099cdc..e42f69edc 100644
--- a/lib/librte_eal/arm/include/rte_atomic_64.h
+++ b/lib/librte_eal/arm/include/rte_atomic_64.h
@@ -1,6 +1,6 @@
 /* SPDX-License-Identifier: BSD-3-Clause
  * Copyright(c) 2015 Cavium, Inc
- * Copyright(c) 2019 Arm Limited
+ * Copyright(c) 2020 Arm Limited
  */
 
 #ifndef _RTE_ATOMIC_ARM64_H_
@@ -19,11 +19,11 @@ extern "C" {
 #include <rte_compat.h>
 #include <rte_debug.h>
 
-#define rte_mb() asm volatile("dsb sy" : : : "memory")
+#define rte_mb() asm volatile("dmb osh" : : : "memory")
 
-#define rte_wmb() asm volatile("dsb st" : : : "memory")
+#define rte_wmb() asm volatile("dmb oshst" : : : "memory")
 
-#define rte_rmb() asm volatile("dsb ld" : : : "memory")
+#define rte_rmb() asm volatile("dmb oshld" : : : "memory")
 
 #define rte_smp_mb() asm volatile("dmb ish" : : : "memory")
 
@@ -37,9 +37,9 @@ extern "C" {
 
 #define rte_io_rmb() rte_rmb()
 
-#define rte_cio_wmb() asm volatile("dmb oshst" : : : "memory")
+#define rte_cio_wmb() rte_wmb()
 
-#define rte_cio_rmb() asm volatile("dmb oshld" : : : "memory")
+#define rte_cio_rmb() rte_rmb()
 
 /*------------------------ 128 bit atomic operations -------------------------*/
 
-- 
2.17.1
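
A minimal sketch of the device-facing pattern this change affects: descriptors are written to DMA-visible memory, and a write barrier orders those stores before the doorbell write that tells the device to read them. The ring, struct and function names are hypothetical; the doorbell is an ordinary variable here so the sketch runs anywhere, whereas a real driver would write an MMIO register. On AArch64 the barrier is the same instruction the patch now uses for rte_wmb().

```c
#include <stdint.h>

#if defined(__aarch64__)
/* Same instruction as the new rte_wmb() in the patch above. */
#define sketch_wmb() __asm__ volatile("dmb oshst" : : : "memory")
#else
/* Conservative fallback for other architectures. */
#define sketch_wmb() __atomic_thread_fence(__ATOMIC_SEQ_CST)
#endif

#define RING_SIZE 4u /* hypothetical ring size; must be a power of two */

struct desc {
	uint64_t addr; /* buffer address the device will DMA from */
	uint32_t len;
};

struct txq {
	struct desc ring[RING_SIZE]; /* read by the device via DMA */
	uint32_t tail;               /* free-running producer index */
	volatile uint32_t *doorbell; /* MMIO register in a real driver */
};

/* Fill the next descriptor, then ring the doorbell with the new tail. */
static void txq_post(struct txq *q, uint64_t addr, uint32_t len)
{
	struct desc *d = &q->ring[q->tail & (RING_SIZE - 1)];

	d->addr = addr;
	d->len = len;
	q->tail++;
	/* Descriptor stores must be observable before the doorbell store. */
	sketch_wmb();
	*q->doorbell = q->tail;
}
```

Under the reasoning in the commit message, once the descriptor stores are visible to all CPUs they are also visible to the device in the same shareability domain, so an ordering barrier (DMB) suffices here and a completion barrier (DSB) is not needed.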


^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [dpdk-dev] [PATCH v2] eal: adjust barriers for IO on Armv8-a
  2020-06-27 19:12   ` [dpdk-dev] [PATCH v2] " Honnappa Nagarahalli
@ 2020-06-27 19:25     ` Honnappa Nagarahalli
  2020-06-30  5:13       ` Jerin Jacob
  0 siblings, 1 reply; 46+ messages in thread
From: Honnappa Nagarahalli @ 2020-06-27 19:25 UTC (permalink / raw)
  To: Honnappa Nagarahalli, dev, Ruifeng Wang, jerinj, hemant.agrawal,
	Ajit Khaparde (ajit.khaparde, igorch, thomas, viacheslavo,
	arybchenko, bruce.richardson
  Cc: nd, Honnappa Nagarahalli, nd

Hi Jerin,
	You had a comment earlier about deprecating rte_cio_[rw]mb. Let me know if you are ok with this patch and I can add those changes (replace references to rte_cio_[rw]mb with rte_io_[rw]mb and a deprecation notice).

Thanks,
Honnappa


^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [dpdk-dev] [PATCH v2] eal: adjust barriers for IO on Armv8-a
  2020-06-27 19:25     ` Honnappa Nagarahalli
@ 2020-06-30  5:13       ` Jerin Jacob
  0 siblings, 0 replies; 46+ messages in thread
From: Jerin Jacob @ 2020-06-30  5:13 UTC (permalink / raw)
  To: Honnappa Nagarahalli
  Cc: dev, Ruifeng Wang, jerinj, hemant.agrawal,
	Ajit Khaparde (ajit.khaparde, igorch, thomas, viacheslavo,
	arybchenko, bruce.richardson, nd

On Sun, Jun 28, 2020 at 12:55 AM Honnappa Nagarahalli
<Honnappa.Nagarahalli@arm.com> wrote:
>
> Hi Jerin,
>         You had a comment earlier about deprecating rte_cio_[rw]mb. Let me know if you are ok with this patch and I can add those changes (replace references to rte_cio_[rw]mb with rte_io_[rw]mb and a deprecation notice).

Acked-by: Jerin Jacob <jerinj@marvell.com> for this patch
Please send the deprecation notice for overlapping rte_cio_* for 20.11


^ permalink raw reply	[flat|nested] 46+ messages in thread

* [dpdk-dev] [PATCH v3 1/3] eal: adjust barriers for IO on Armv8-a
  2020-04-10 16:41 ` [dpdk-dev] [PATCH RFC v2 0/7] introduce new barrier class and use it for mlx5 PMD Gavin Hu
                     ` (2 preceding siblings ...)
  2020-06-27 19:12   ` [dpdk-dev] [PATCH v2] " Honnappa Nagarahalli
@ 2020-07-03 18:57   ` " Honnappa Nagarahalli
  2020-07-03 18:57     ` [dpdk-dev] [PATCH v3 2/3] doc: update armv8-a IO barrier changes Honnappa Nagarahalli
  2020-07-03 18:57     ` [dpdk-dev] [PATCH v3 3/3] doc: update deprecation of CIO barrier APIs Honnappa Nagarahalli
  2020-07-06 23:43   ` [dpdk-dev] [PATCH v4 1/3] eal: adjust barriers for IO on Armv8-a Honnappa Nagarahalli
  4 siblings, 2 replies; 46+ messages in thread
From: Honnappa Nagarahalli @ 2020-07-03 18:57 UTC (permalink / raw)
  To: dev, honnappa.nagarahalli, ruifeng.wang, jerinj, hemant.agrawal,
	ajit.khaparde, igorch, thomas, viacheslavo, arybchenko,
	bruce.richardson
  Cc: nd

Change the barrier APIs for IO to reflect that the Armv8-a memory model
is other-multi-copy atomic.

The Armv8-a memory model has been strengthened to require
other-multi-copy atomicity. This property requires memory accesses
from an observer to become visible to all other observers
simultaneously [3]. This means:

a) A write arriving at an endpoint shared between multiple CPUs is
   visible to all CPUs
b) A write that is visible to all CPUs is also visible to all other
   observers in the shareability domain

This allows cheaper DMB instructions to be used in place of DSB
for devices that are visible to all CPUs (i.e. devices that DPDK
caters to).

Please refer to [1], [2] and [3] for more information.

[1] https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=22ec71615d824f4f11d38d0e55a88d8956b7e45f
[2] https://www.youtube.com/watch?v=i6DayghhA8Q
[3] https://www.cl.cam.ac.uk/~pes20/armv8-mca/

Signed-off-by: Honnappa Nagarahalli <honnappa.nagarahalli@arm.com>
Acked-by: Jerin Jacob <jerinj@marvell.com>
Tested-by: Ruifeng Wang <ruifeng.wang@arm.com>
---
 lib/librte_eal/arm/include/rte_atomic_64.h | 12 ++++++------
 1 file changed, 6 insertions(+), 6 deletions(-)

diff --git a/lib/librte_eal/arm/include/rte_atomic_64.h b/lib/librte_eal/arm/include/rte_atomic_64.h
index 7b7099cdc..e42f69edc 100644
--- a/lib/librte_eal/arm/include/rte_atomic_64.h
+++ b/lib/librte_eal/arm/include/rte_atomic_64.h
@@ -1,6 +1,6 @@
 /* SPDX-License-Identifier: BSD-3-Clause
  * Copyright(c) 2015 Cavium, Inc
- * Copyright(c) 2019 Arm Limited
+ * Copyright(c) 2020 Arm Limited
  */
 
 #ifndef _RTE_ATOMIC_ARM64_H_
@@ -19,11 +19,11 @@ extern "C" {
 #include <rte_compat.h>
 #include <rte_debug.h>
 
-#define rte_mb() asm volatile("dsb sy" : : : "memory")
+#define rte_mb() asm volatile("dmb osh" : : : "memory")
 
-#define rte_wmb() asm volatile("dsb st" : : : "memory")
+#define rte_wmb() asm volatile("dmb oshst" : : : "memory")
 
-#define rte_rmb() asm volatile("dsb ld" : : : "memory")
+#define rte_rmb() asm volatile("dmb oshld" : : : "memory")
 
 #define rte_smp_mb() asm volatile("dmb ish" : : : "memory")
 
@@ -37,9 +37,9 @@ extern "C" {
 
 #define rte_io_rmb() rte_rmb()
 
-#define rte_cio_wmb() asm volatile("dmb oshst" : : : "memory")
+#define rte_cio_wmb() rte_wmb()
 
-#define rte_cio_rmb() asm volatile("dmb oshld" : : : "memory")
+#define rte_cio_rmb() rte_rmb()
 
 /*------------------------ 128 bit atomic operations -------------------------*/
 
-- 
2.17.1


^ permalink raw reply	[flat|nested] 46+ messages in thread

* [dpdk-dev] [PATCH v3 2/3] doc: update armv8-a IO barrier changes
  2020-07-03 18:57   ` [dpdk-dev] [PATCH v3 1/3] " Honnappa Nagarahalli
@ 2020-07-03 18:57     ` Honnappa Nagarahalli
  2020-07-05  0:57       ` Jerin Jacob
  2020-07-03 18:57     ` [dpdk-dev] [PATCH v3 3/3] doc: update deprecation of CIO barrier APIs Honnappa Nagarahalli
  1 sibling, 1 reply; 46+ messages in thread
From: Honnappa Nagarahalli @ 2020-07-03 18:57 UTC (permalink / raw)
  To: dev, honnappa.nagarahalli, ruifeng.wang, jerinj, hemant.agrawal,
	ajit.khaparde, igorch, thomas, viacheslavo, arybchenko,
	bruce.richardson
  Cc: nd

Updated the use of DMB instruction in rte_*mb APIs for Armv8-a.

Signed-off-by: Honnappa Nagarahalli <honnappa.nagarahalli@arm.com>
---
 doc/guides/rel_notes/release_20_08.rst | 7 +++++++
 1 file changed, 7 insertions(+)

diff --git a/doc/guides/rel_notes/release_20_08.rst b/doc/guides/rel_notes/release_20_08.rst
index 5cbc4ce14..15c21996d 100644
--- a/doc/guides/rel_notes/release_20_08.rst
+++ b/doc/guides/rel_notes/release_20_08.rst
@@ -56,6 +56,13 @@ New Features
      Also, make sure to start the actual text at the margin.
      =========================================================
 
+* **rte_*mb APIs are updated to use DMB instruction.**
+
+  The Armv8-a memory model has been strengthened to require other-multi-copy
+  atomicity. This allows the DMB instruction to be used instead of DSB for IO
+  barriers. The rte_*mb APIs for Armv8-a platforms are changed to use the DMB
+  instruction to reflect this.
+
 * **Updated PCAP driver.**
 
   Updated PCAP driver with new features and improvements, including:
-- 
2.17.1


^ permalink raw reply	[flat|nested] 46+ messages in thread

* [dpdk-dev] [PATCH v3 3/3] doc: update deprecation of CIO barrier APIs
  2020-07-03 18:57   ` [dpdk-dev] [PATCH v3 1/3] " Honnappa Nagarahalli
  2020-07-03 18:57     ` [dpdk-dev] [PATCH v3 2/3] doc: update armv8-a IO barrier changes Honnappa Nagarahalli
@ 2020-07-03 18:57     ` Honnappa Nagarahalli
  2020-07-05  0:57       ` Jerin Jacob
                         ` (2 more replies)
  1 sibling, 3 replies; 46+ messages in thread
From: Honnappa Nagarahalli @ 2020-07-03 18:57 UTC (permalink / raw)
  To: dev, honnappa.nagarahalli, ruifeng.wang, jerinj, hemant.agrawal,
	ajit.khaparde, igorch, thomas, viacheslavo, arybchenko,
	bruce.richardson
  Cc: nd

rte_cio_*mb APIs will be deprecated in the 20.11 release.

Signed-off-by: Honnappa Nagarahalli <honnappa.nagarahalli@arm.com>
---
 doc/guides/rel_notes/deprecation.rst | 6 ++++++
 1 file changed, 6 insertions(+)

diff --git a/doc/guides/rel_notes/deprecation.rst b/doc/guides/rel_notes/deprecation.rst
index d1034f60f..59656da3d 100644
--- a/doc/guides/rel_notes/deprecation.rst
+++ b/doc/guides/rel_notes/deprecation.rst
@@ -40,6 +40,12 @@ Deprecation Notices
   These wrappers must be used for patches that need to be merged in 20.08
   onwards. This change will not introduce any performance degradation.
 
+* rte_cio_*mb: Since the IO barriers for Armv8-a platforms are relaxed from DSB
+  to DMB, rte_cio_*mb APIs provide the same functionality as rte_io_*mb
+  APIs (taking all platforms into consideration). rte_io_*mb APIs should be used
+  in place of rte_cio_*mb APIs. The rte_cio_*mb APIs will be deprecated in the
+  20.11 release.
+
 * igb_uio: In the view of reducing the kernel dependency from the main tree,
   as a first step, the Technical Board decided to move ``igb_uio``
   kernel module to the dpdk-kmods repository in the /linux/igb_uio/ directory
-- 
2.17.1


^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [dpdk-dev] [PATCH v3 2/3] doc: update armv8-a IO barrier changes
  2020-07-03 18:57     ` [dpdk-dev] [PATCH v3 2/3] doc: update armv8-a IO barrier changes Honnappa Nagarahalli
@ 2020-07-05  0:57       ` Jerin Jacob
  0 siblings, 0 replies; 46+ messages in thread
From: Jerin Jacob @ 2020-07-05  0:57 UTC (permalink / raw)
  To: Honnappa Nagarahalli
  Cc: dpdk-dev, Ruifeng Wang (Arm Technology China),
	Jerin Jacob, Hemant Agrawal, Ajit Khaparde, Igor Chauskin,
	Thomas Monjalon, Slava Ovsiienko, Andrew Rybchenko, Richardson,
	Bruce, nd

On Sat, Jul 4, 2020 at 12:28 AM Honnappa Nagarahalli
<honnappa.nagarahalli@arm.com> wrote:
>
> Updated the use of DMB instruction in rte_*mb APIs for Armv8-a.
>
> Signed-off-by: Honnappa Nagarahalli <honnappa.nagarahalli@arm.com>
> ---
>  doc/guides/rel_notes/release_20_08.rst | 7 +++++++
>  1 file changed, 7 insertions(+)
>
> diff --git a/doc/guides/rel_notes/release_20_08.rst b/doc/guides/rel_notes/release_20_08.rst
> index 5cbc4ce14..15c21996d 100644
> --- a/doc/guides/rel_notes/release_20_08.rst
> +++ b/doc/guides/rel_notes/release_20_08.rst
> @@ -56,6 +56,13 @@ New Features
>       Also, make sure to start the actual text at the margin.
>       =========================================================
>
> +* **rte_*mb APIs are updated to use DMB instruction.**

IMO, it is better to change it to the following, so that end users who are
not interested in arm64 can skip the description:

rte_*mb APIs are updated to use DMB instruction for Armv8-a

With the above change:
Acked-by: Jerin Jacob <jerinj@marvell.com>

> +
> +  Armv8-a memory model has been strengthened to require other-multi-copy
> +  atomicity. This allows for using DMB instruction instead of DSB for IO
> +  barriers. rte_*mb APIs, for Armv8-a platforms, are changed to use DMB
> +  instruction to reflect this.
> +
>  * **Updated PCAP driver.**
>
>    Updated PCAP driver with new features and improvements, including:
> --
> 2.17.1
>

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [dpdk-dev] [PATCH v3 3/3] doc: update deprecation of CIO barrier APIs
  2020-07-03 18:57     ` [dpdk-dev] [PATCH v3 3/3] doc: update deprecation of CIO barrier APIs Honnappa Nagarahalli
@ 2020-07-05  0:57       ` Jerin Jacob
  2020-07-07 20:19       ` Ajit Khaparde
  2020-07-08 11:05       ` Ananyev, Konstantin
  2 siblings, 0 replies; 46+ messages in thread
From: Jerin Jacob @ 2020-07-05  0:57 UTC (permalink / raw)
  To: Honnappa Nagarahalli
  Cc: dpdk-dev, Ruifeng Wang (Arm Technology China),
	Jerin Jacob, Hemant Agrawal, Ajit Khaparde, Igor Chauskin,
	Thomas Monjalon, Slava Ovsiienko, Andrew Rybchenko, Richardson,
	Bruce, nd

On Sat, Jul 4, 2020 at 12:28 AM Honnappa Nagarahalli
<honnappa.nagarahalli@arm.com> wrote:
>
> rte_cio_*mb APIs will be deprecated in 20.11 release.
>
> Signed-off-by: Honnappa Nagarahalli <honnappa.nagarahalli@arm.com>

Acked-by: Jerin Jacob <jerinj@marvell.com>


^ permalink raw reply	[flat|nested] 46+ messages in thread

* [dpdk-dev] [PATCH v4 1/3] eal: adjust barriers for IO on Armv8-a
  2020-04-10 16:41 ` [dpdk-dev] [PATCH RFC v2 0/7] introduce new barrier class and use it for mlx5 PMD Gavin Hu
                     ` (3 preceding siblings ...)
  2020-07-03 18:57   ` [dpdk-dev] [PATCH v3 1/3] " Honnappa Nagarahalli
@ 2020-07-06 23:43   ` Honnappa Nagarahalli
  2020-07-06 23:43     ` [dpdk-dev] [PATCH v4 2/3] doc: update armv8-a IO barrier changes Honnappa Nagarahalli
  2020-07-06 23:43     ` [dpdk-dev] [PATCH v4 3/3] doc: update deprecation of CIO barrier APIs Honnappa Nagarahalli
  4 siblings, 2 replies; 46+ messages in thread
From: Honnappa Nagarahalli @ 2020-07-06 23:43 UTC (permalink / raw)
  To: dev, honnappa.nagarahalli, ruifeng.wang, jerinj, hemant.agrawal,
	ajit.khaparde, igorch, thomas, viacheslavo, arybchenko,
	bruce.richardson
  Cc: nd

Change the barrier APIs for IO to reflect that the Armv8-a memory model
is other-multi-copy atomic.

The Armv8-a memory model has been strengthened to require
other-multi-copy atomicity. This property requires memory accesses
from an observer to become visible to all other observers
simultaneously [3]. This means:

a) A write arriving at an endpoint shared between multiple CPUs is
   visible to all CPUs
b) A write that is visible to all CPUs is also visible to all other
   observers in the shareability domain

This allows cheaper DMB instructions to be used in place of DSB
for devices that are visible to all CPUs (i.e. devices that DPDK
caters to).

Please refer to [1], [2] and [3] for more information.

[1] https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=22ec71615d824f4f11d38d0e55a88d8956b7e45f
[2] https://www.youtube.com/watch?v=i6DayghhA8Q
[3] https://www.cl.cam.ac.uk/~pes20/armv8-mca/

Signed-off-by: Honnappa Nagarahalli <honnappa.nagarahalli@arm.com>
Acked-by: Jerin Jacob <jerinj@marvell.com>
Tested-by: Ruifeng Wang <ruifeng.wang@arm.com>
---
 lib/librte_eal/arm/include/rte_atomic_64.h | 12 ++++++------
 1 file changed, 6 insertions(+), 6 deletions(-)

diff --git a/lib/librte_eal/arm/include/rte_atomic_64.h b/lib/librte_eal/arm/include/rte_atomic_64.h
index 7b7099cdc..e42f69edc 100644
--- a/lib/librte_eal/arm/include/rte_atomic_64.h
+++ b/lib/librte_eal/arm/include/rte_atomic_64.h
@@ -1,6 +1,6 @@
 /* SPDX-License-Identifier: BSD-3-Clause
  * Copyright(c) 2015 Cavium, Inc
- * Copyright(c) 2019 Arm Limited
+ * Copyright(c) 2020 Arm Limited
  */
 
 #ifndef _RTE_ATOMIC_ARM64_H_
@@ -19,11 +19,11 @@ extern "C" {
 #include <rte_compat.h>
 #include <rte_debug.h>
 
-#define rte_mb() asm volatile("dsb sy" : : : "memory")
+#define rte_mb() asm volatile("dmb osh" : : : "memory")
 
-#define rte_wmb() asm volatile("dsb st" : : : "memory")
+#define rte_wmb() asm volatile("dmb oshst" : : : "memory")
 
-#define rte_rmb() asm volatile("dsb ld" : : : "memory")
+#define rte_rmb() asm volatile("dmb oshld" : : : "memory")
 
 #define rte_smp_mb() asm volatile("dmb ish" : : : "memory")
 
@@ -37,9 +37,9 @@ extern "C" {
 
 #define rte_io_rmb() rte_rmb()
 
-#define rte_cio_wmb() asm volatile("dmb oshst" : : : "memory")
+#define rte_cio_wmb() rte_wmb()
 
-#define rte_cio_rmb() asm volatile("dmb oshld" : : : "memory")
+#define rte_cio_rmb() rte_rmb()
 
 /*------------------------ 128 bit atomic operations -------------------------*/
 
-- 
2.17.1


^ permalink raw reply	[flat|nested] 46+ messages in thread

* [dpdk-dev] [PATCH v4 2/3] doc: update armv8-a IO barrier changes
  2020-07-06 23:43   ` [dpdk-dev] [PATCH v4 1/3] eal: adjust barriers for IO on Armv8-a Honnappa Nagarahalli
@ 2020-07-06 23:43     ` Honnappa Nagarahalli
  2020-07-07  8:36       ` David Marchand
  2020-07-06 23:43     ` [dpdk-dev] [PATCH v4 3/3] doc: update deprecation of CIO barrier APIs Honnappa Nagarahalli
  1 sibling, 1 reply; 46+ messages in thread
From: Honnappa Nagarahalli @ 2020-07-06 23:43 UTC (permalink / raw)
  To: dev, honnappa.nagarahalli, ruifeng.wang, jerinj, hemant.agrawal,
	ajit.khaparde, igorch, thomas, viacheslavo, arybchenko,
	bruce.richardson
  Cc: nd

Updated the use of DMB instruction in rte_*mb APIs for Armv8-a.

Signed-off-by: Honnappa Nagarahalli <honnappa.nagarahalli@arm.com>
Acked-by: Jerin Jacob <jerinj@marvell.com>
---
 doc/guides/rel_notes/release_20_08.rst | 7 +++++++
 1 file changed, 7 insertions(+)

diff --git a/doc/guides/rel_notes/release_20_08.rst b/doc/guides/rel_notes/release_20_08.rst
index 5cbc4ce14..567ae6b2a 100644
--- a/doc/guides/rel_notes/release_20_08.rst
+++ b/doc/guides/rel_notes/release_20_08.rst
@@ -56,6 +56,13 @@ New Features
      Also, make sure to start the actual text at the margin.
      =========================================================
 
+* **rte_*mb APIs are updated to use DMB instruction for Armv8-a.**
+
+  Armv8-a memory model has been strengthened to require other-multi-copy
+  atomicity. This allows for using DMB instruction instead of DSB for IO
+  barriers. rte_*mb APIs, for Armv8-a platforms, are changed to use DMB
+  instruction to reflect this.
+
 * **Updated PCAP driver.**
 
   Updated PCAP driver with new features and improvements, including:
-- 
2.17.1


^ permalink raw reply	[flat|nested] 46+ messages in thread

* [dpdk-dev] [PATCH v4 3/3] doc: update deprecation of CIO barrier APIs
  2020-07-06 23:43   ` [dpdk-dev] [PATCH v4 1/3] eal: adjust barriers for IO on Armv8-a Honnappa Nagarahalli
  2020-07-06 23:43     ` [dpdk-dev] [PATCH v4 2/3] doc: update armv8-a IO barrier changes Honnappa Nagarahalli
@ 2020-07-06 23:43     ` Honnappa Nagarahalli
  2020-07-07  8:39       ` David Marchand
                         ` (2 more replies)
  1 sibling, 3 replies; 46+ messages in thread
From: Honnappa Nagarahalli @ 2020-07-06 23:43 UTC (permalink / raw)
  To: dev, honnappa.nagarahalli, ruifeng.wang, jerinj, hemant.agrawal,
	ajit.khaparde, igorch, thomas, viacheslavo, arybchenko,
	bruce.richardson
  Cc: nd

rte_cio_*mb APIs will be deprecated in 20.11 release.

Signed-off-by: Honnappa Nagarahalli <honnappa.nagarahalli@arm.com>
Acked-by: Jerin Jacob <jerinj@marvell.com>
---
 doc/guides/rel_notes/deprecation.rst | 6 ++++++
 1 file changed, 6 insertions(+)

diff --git a/doc/guides/rel_notes/deprecation.rst b/doc/guides/rel_notes/deprecation.rst
index d1034f60f..59656da3d 100644
--- a/doc/guides/rel_notes/deprecation.rst
+++ b/doc/guides/rel_notes/deprecation.rst
@@ -40,6 +40,12 @@ Deprecation Notices
   These wrappers must be used for patches that need to be merged in 20.08
   onwards. This change will not introduce any performance degradation.
 
+* rte_cio_*mb: Since the IO barriers for ArmV8-a platforms are relaxed from DSB
+  to DMB, rte_cio_*mb APIs provide the same functionality as rte_io_*mb
+  APIs(taking all platforms into consideration). rte_io_*mb APIs should be used
+  in the place of rte_cio_*mb APIs. The rte_cio_*mb APIs will be deprecated in
+  20.11 release.
+
 * igb_uio: In the view of reducing the kernel dependency from the main tree,
   as a first step, the Technical Board decided to move ``igb_uio``
   kernel module to the dpdk-kmods repository in the /linux/igb_uio/ directory
-- 
2.17.1


^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [dpdk-dev] [PATCH v4 2/3] doc: update armv8-a IO barrier changes
  2020-07-06 23:43     ` [dpdk-dev] [PATCH v4 2/3] doc: update armv8-a IO barrier changes Honnappa Nagarahalli
@ 2020-07-07  8:36       ` David Marchand
  2020-07-07 18:37         ` Honnappa Nagarahalli
  0 siblings, 1 reply; 46+ messages in thread
From: David Marchand @ 2020-07-07  8:36 UTC (permalink / raw)
  To: Honnappa Nagarahalli
  Cc: dev, Ruifeng Wang (Arm Technology China),
	Jerin Jacob Kollanukkaran, Hemant Agrawal, Ajit Khaparde,
	Igor Chauskin, Thomas Monjalon, Viacheslav Ovsiienko,
	Andrew Rybchenko, Bruce Richardson, nd

On Tue, Jul 7, 2020 at 1:44 AM Honnappa Nagarahalli
<honnappa.nagarahalli@arm.com> wrote:
>
> Updated the use of DMB instruction in rte_*mb APIs for Armv8-a.
>
> Signed-off-by: Honnappa Nagarahalli <honnappa.nagarahalli@arm.com>
> Acked-by: Jerin Jacob <jerinj@marvell.com>
> ---
>  doc/guides/rel_notes/release_20_08.rst | 7 +++++++
>  1 file changed, 7 insertions(+)
>
> diff --git a/doc/guides/rel_notes/release_20_08.rst b/doc/guides/rel_notes/release_20_08.rst
> index 5cbc4ce14..567ae6b2a 100644
> --- a/doc/guides/rel_notes/release_20_08.rst
> +++ b/doc/guides/rel_notes/release_20_08.rst
> @@ -56,6 +56,13 @@ New Features
>       Also, make sure to start the actual text at the margin.
>       =========================================================
>

This release note update will be squashed with the change itself in patch 1.

> +* **rte_*mb APIs are updated to use DMB instruction for Armv8-a.**

We use "ARMv8" in the release notes, any objection if I update this
when applying?


> +
> +  Armv8-a memory model has been strengthened to require other-multi-copy
> +  atomicity. This allows for using DMB instruction instead of DSB for IO
> +  barriers. rte_*mb APIs, for Armv8-a platforms, are changed to use DMB
> +  instruction to reflect this.
> +
>  * **Updated PCAP driver.**
>
>    Updated PCAP driver with new features and improvements, including:
> --
> 2.17.1
>


-- 
David Marchand


^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [dpdk-dev] [PATCH v4 3/3] doc: update deprecation of CIO barrier APIs
  2020-07-06 23:43     ` [dpdk-dev] [PATCH v4 3/3] doc: update deprecation of CIO barrier APIs Honnappa Nagarahalli
@ 2020-07-07  8:39       ` David Marchand
  2020-07-07 20:14       ` David Christensen
  2020-07-08 11:49       ` David Marchand
  2 siblings, 0 replies; 46+ messages in thread
From: David Marchand @ 2020-07-07  8:39 UTC (permalink / raw)
  To: dev
  Cc: Ruifeng Wang (Arm Technology China),
	Jerin Jacob Kollanukkaran, Hemant Agrawal, Ajit Khaparde,
	Igor Chauskin, Thomas Monjalon, Viacheslav Ovsiienko,
	Andrew Rybchenko, Bruce Richardson, nd, Honnappa Nagarahalli,
	Ananyev, Konstantin, David Christensen

On Tue, Jul 7, 2020 at 1:44 AM Honnappa Nagarahalli
<honnappa.nagarahalli@arm.com> wrote:
>
> rte_cio_*mb APIs will be deprecated in 20.11 release.
>
> Signed-off-by: Honnappa Nagarahalli <honnappa.nagarahalli@arm.com>
> Acked-by: Jerin Jacob <jerinj@marvell.com>
> ---
>  doc/guides/rel_notes/deprecation.rst | 6 ++++++
>  1 file changed, 6 insertions(+)
>
> diff --git a/doc/guides/rel_notes/deprecation.rst b/doc/guides/rel_notes/deprecation.rst
> index d1034f60f..59656da3d 100644
> --- a/doc/guides/rel_notes/deprecation.rst
> +++ b/doc/guides/rel_notes/deprecation.rst
> @@ -40,6 +40,12 @@ Deprecation Notices
>    These wrappers must be used for patches that need to be merged in 20.08
>    onwards. This change will not introduce any performance degradation.
>
> +* rte_cio_*mb: Since the IO barriers for ArmV8-a platforms are relaxed from DSB
> +  to DMB, rte_cio_*mb APIs provide the same functionality as rte_io_*mb
> +  APIs(taking all platforms into consideration). rte_io_*mb APIs should be used

Nit: missing space.


> +  in the place of rte_cio_*mb APIs. The rte_cio_*mb APIs will be deprecated in
> +  20.11 release.
> +
>  * igb_uio: In the view of reducing the kernel dependency from the main tree,
>    as a first step, the Technical Board decided to move ``igb_uio``
>    kernel module to the dpdk-kmods repository in the /linux/igb_uio/ directory
> --
> 2.17.1
>

LGTM.

We need 3 acks (ideally from different vendors/companies) for a
deprecation notice.
Please maintainers?


Thanks.

-- 
David Marchand


^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [dpdk-dev] [PATCH v4 2/3] doc: update armv8-a IO barrier changes
  2020-07-07  8:36       ` David Marchand
@ 2020-07-07 18:37         ` Honnappa Nagarahalli
  0 siblings, 0 replies; 46+ messages in thread
From: Honnappa Nagarahalli @ 2020-07-07 18:37 UTC (permalink / raw)
  To: David Marchand
  Cc: dev, Ruifeng Wang, jerinj, hemant.agrawal,
	Ajit Khaparde (ajit.khaparde, Igor Chauskin, thomas,
	Viacheslav Ovsiienko, Andrew Rybchenko, Bruce Richardson, nd,
	Honnappa Nagarahalli, nd

<snip>

> Subject: Re: [dpdk-dev] [PATCH v4 2/3] doc: update armv8-a IO barrier
> changes
> 
> On Tue, Jul 7, 2020 at 1:44 AM Honnappa Nagarahalli
> <honnappa.nagarahalli@arm.com> wrote:
> >
> > Updated the use of DMB instruction in rte_*mb APIs for Armv8-a.
> >
> > Signed-off-by: Honnappa Nagarahalli <honnappa.nagarahalli@arm.com>
> > Acked-by: Jerin Jacob <jerinj@marvell.com>
> > ---
> >  doc/guides/rel_notes/release_20_08.rst | 7 +++++++
> >  1 file changed, 7 insertions(+)
> >
> > diff --git a/doc/guides/rel_notes/release_20_08.rst b/doc/guides/rel_notes/release_20_08.rst
> > index 5cbc4ce14..567ae6b2a 100644
> > --- a/doc/guides/rel_notes/release_20_08.rst
> > +++ b/doc/guides/rel_notes/release_20_08.rst
> > @@ -56,6 +56,13 @@ New Features
> >       Also, make sure to start the actual text at the margin.
> >       =========================================================
> >
> 
> This release note update will be squashed with the change itself in patch 1.
> 
> > +* **rte_*mb APIs are updated to use DMB instruction for Armv8-a.**
> 
> We use "ARMv8" in the release notes, any objection if I update this when
> applying?
No objections.

> 
> 
> > +
> > +  Armv8-a memory model has been strengthened to require other-multi-copy
> > +  atomicity. This allows for using DMB instruction instead of DSB for IO
> > +  barriers. rte_*mb APIs, for Armv8-a platforms, are changed to use DMB
> > +  instruction to reflect this.
> > +
> >  * **Updated PCAP driver.**
> >
> >    Updated PCAP driver with new features and improvements, including:
> > --
> > 2.17.1
> >
> 
> 
> --
> David Marchand


^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [dpdk-dev] [PATCH v4 3/3] doc: update deprecation of CIO barrier APIs
  2020-07-06 23:43     ` [dpdk-dev] [PATCH v4 3/3] doc: update deprecation of CIO barrier APIs Honnappa Nagarahalli
  2020-07-07  8:39       ` David Marchand
@ 2020-07-07 20:14       ` David Christensen
  2020-07-08 11:49       ` David Marchand
  2 siblings, 0 replies; 46+ messages in thread
From: David Christensen @ 2020-07-07 20:14 UTC (permalink / raw)
  To: Honnappa Nagarahalli, dev, ruifeng.wang, jerinj, hemant.agrawal,
	ajit.khaparde, igorch, thomas, viacheslavo, arybchenko,
	bruce.richardson
  Cc: nd



On 7/6/20 4:43 PM, Honnappa Nagarahalli wrote:
> rte_cio_*mb APIs will be deprecated in 20.11 release.
> 
> Signed-off-by: Honnappa Nagarahalli <honnappa.nagarahalli@arm.com>
> Acked-by: Jerin Jacob <jerinj@marvell.com>
> ---
>   doc/guides/rel_notes/deprecation.rst | 6 ++++++
>   1 file changed, 6 insertions(+)
> 
> diff --git a/doc/guides/rel_notes/deprecation.rst b/doc/guides/rel_notes/deprecation.rst
> index d1034f60f..59656da3d 100644
> --- a/doc/guides/rel_notes/deprecation.rst
> +++ b/doc/guides/rel_notes/deprecation.rst
> @@ -40,6 +40,12 @@ Deprecation Notices
>     These wrappers must be used for patches that need to be merged in 20.08
>     onwards. This change will not introduce any performance degradation.
> 
> +* rte_cio_*mb: Since the IO barriers for ArmV8-a platforms are relaxed from DSB
> +  to DMB, rte_cio_*mb APIs provide the same functionality as rte_io_*mb
> +  APIs(taking all platforms into consideration). rte_io_*mb APIs should be used
> +  in the place of rte_cio_*mb APIs. The rte_cio_*mb APIs will be deprecated in
> +  20.11 release.
> +

No difference between rte_cio_* and rte_io_* macros on PPC.

Reviewed-by: David Christensen <drc@linux.vnet.ibm.com>

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [dpdk-dev] [PATCH v3 3/3] doc: update deprecation of CIO barrier APIs
  2020-07-03 18:57     ` [dpdk-dev] [PATCH v3 3/3] doc: update deprecation of CIO barrier APIs Honnappa Nagarahalli
  2020-07-05  0:57       ` Jerin Jacob
@ 2020-07-07 20:19       ` Ajit Khaparde
  2020-07-08 11:05       ` Ananyev, Konstantin
  2 siblings, 0 replies; 46+ messages in thread
From: Ajit Khaparde @ 2020-07-07 20:19 UTC (permalink / raw)
  To: Honnappa Nagarahalli
  Cc: dpdk-dev, ruifeng.wang, Jerin Jacob Kollanukkaran,
	Hemant Agrawal, igorch, Thomas Monjalon, Viacheslav Ovsiienko,
	Andrew Rybchenko, Bruce Richardson, nd

On Fri, Jul 3, 2020 at 11:58 AM Honnappa Nagarahalli <honnappa.nagarahalli@arm.com> wrote:

> rte_cio_*mb APIs will be deprecated in 20.11 release.
>
> Signed-off-by: Honnappa Nagarahalli <honnappa.nagarahalli@arm.com>
>
Acked-by: Ajit Khaparde <ajit.khaparde@broadcom.com>

> ---
>  doc/guides/rel_notes/deprecation.rst | 6 ++++++
>  1 file changed, 6 insertions(+)
>
> diff --git a/doc/guides/rel_notes/deprecation.rst b/doc/guides/rel_notes/deprecation.rst
> index d1034f60f..59656da3d 100644
> --- a/doc/guides/rel_notes/deprecation.rst
> +++ b/doc/guides/rel_notes/deprecation.rst
> @@ -40,6 +40,12 @@ Deprecation Notices
>    These wrappers must be used for patches that need to be merged in 20.08
>    onwards. This change will not introduce any performance degradation.
>
> +* rte_cio_*mb: Since the IO barriers for ArmV8-a platforms are relaxed from DSB
> +  to DMB, rte_cio_*mb APIs provide the same functionality as rte_io_*mb
> +  APIs(taking all platforms into consideration). rte_io_*mb APIs should be used
> +  in the place of rte_cio_*mb APIs. The rte_cio_*mb APIs will be deprecated in
> +  20.11 release.
> +
>  * igb_uio: In the view of reducing the kernel dependency from the main tree,
>    as a first step, the Technical Board decided to move ``igb_uio``
>    kernel module to the dpdk-kmods repository in the /linux/igb_uio/ directory
> --
> 2.17.1
>
>

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [dpdk-dev] [PATCH v3 3/3] doc: update deprecation of CIO barrier APIs
  2020-07-03 18:57     ` [dpdk-dev] [PATCH v3 3/3] doc: update deprecation of CIO barrier APIs Honnappa Nagarahalli
  2020-07-05  0:57       ` Jerin Jacob
  2020-07-07 20:19       ` Ajit Khaparde
@ 2020-07-08 11:05       ` Ananyev, Konstantin
  2 siblings, 0 replies; 46+ messages in thread
From: Ananyev, Konstantin @ 2020-07-08 11:05 UTC (permalink / raw)
  To: Honnappa Nagarahalli, dev, ruifeng.wang, jerinj, hemant.agrawal,
	ajit.khaparde, igorch, thomas, viacheslavo, arybchenko,
	Richardson, Bruce
  Cc: nd



> -----Original Message-----
> From: dev <dev-bounces@dpdk.org> On Behalf Of Honnappa Nagarahalli
> Sent: Friday, July 3, 2020 7:58 PM
> To: dev@dpdk.org; honnappa.nagarahalli@arm.com; ruifeng.wang@arm.com; jerinj@marvell.com; hemant.agrawal@nxp.com;
> ajit.khaparde@broadcom.com; igorch@amazon.com; thomas@monjalon.net; viacheslavo@mellanox.com; arybchenko@solarflare.com;
> Richardson, Bruce <bruce.richardson@intel.com>
> Cc: nd@arm.com
> Subject: [dpdk-dev] [PATCH v3 3/3] doc: update deprecation of CIO barrier APIs
> 
> rte_cio_*mb APIs will be deprecated in 20.11 release.
> 
> Signed-off-by: Honnappa Nagarahalli <honnappa.nagarahalli@arm.com>
> ---
>  doc/guides/rel_notes/deprecation.rst | 6 ++++++
>  1 file changed, 6 insertions(+)
> 
> diff --git a/doc/guides/rel_notes/deprecation.rst b/doc/guides/rel_notes/deprecation.rst
> index d1034f60f..59656da3d 100644
> --- a/doc/guides/rel_notes/deprecation.rst
> +++ b/doc/guides/rel_notes/deprecation.rst
> @@ -40,6 +40,12 @@ Deprecation Notices
>    These wrappers must be used for patches that need to be merged in 20.08
>    onwards. This change will not introduce any performance degradation.
> 
> +* rte_cio_*mb: Since the IO barriers for ArmV8-a platforms are relaxed from DSB
> +  to DMB, rte_cio_*mb APIs provide the same functionality as rte_io_*mb
> +  APIs(taking all platforms into consideration). rte_io_*mb APIs should be used
> +  in the place of rte_cio_*mb APIs. The rte_cio_*mb APIs will be deprecated in
> +  20.11 release.
> +
>  * igb_uio: In the view of reducing the kernel dependency from the main tree,
>    as a first step, the Technical Board decided to move ``igb_uio``
>    kernel module to the dpdk-kmods repository in the /linux/igb_uio/ directory
> --

Acked-by: Konstantin Ananyev <konstantin.ananyev@intel.com>

> 2.17.1


^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [dpdk-dev] [PATCH v4 3/3] doc: update deprecation of CIO barrier APIs
  2020-07-06 23:43     ` [dpdk-dev] [PATCH v4 3/3] doc: update deprecation of CIO barrier APIs Honnappa Nagarahalli
  2020-07-07  8:39       ` David Marchand
  2020-07-07 20:14       ` David Christensen
@ 2020-07-08 11:49       ` David Marchand
  2 siblings, 0 replies; 46+ messages in thread
From: David Marchand @ 2020-07-08 11:49 UTC (permalink / raw)
  To: Honnappa Nagarahalli
  Cc: dev, Ruifeng Wang (Arm Technology China),
	Jerin Jacob Kollanukkaran, Hemant Agrawal, Ajit Khaparde,
	Igor Chauskin, Thomas Monjalon, Viacheslav Ovsiienko,
	Andrew Rybchenko, Bruce Richardson, nd

On Tue, Jul 7, 2020 at 1:44 AM Honnappa Nagarahalli
<honnappa.nagarahalli@arm.com> wrote:
>
> rte_cio_*mb APIs will be deprecated in 20.11 release.
>
> Signed-off-by: Honnappa Nagarahalli <honnappa.nagarahalli@arm.com>
> Acked-by: Jerin Jacob <jerinj@marvell.com>
Reviewed-by: David Christensen <drc@linux.vnet.ibm.com>
Acked-by: Ajit Khaparde <ajit.khaparde@broadcom.com>
Acked-by: Konstantin Ananyev <konstantin.ananyev@intel.com>

Series applied, thanks.


-- 
David Marchand


^ permalink raw reply	[flat|nested] 46+ messages in thread

Thread overview: 46+ messages
2020-02-13 12:38 [dpdk-dev] [PATCH RFC v1 0/6] barrier fix and optimization for mlx5 on aarch64 Gavin Hu
2020-02-13 12:38 ` [dpdk-dev] [PATCH RFC v1 1/6] net/mlx5: relax the barrier for UAR write Gavin Hu
2020-02-13 12:38 ` [dpdk-dev] [PATCH RFC v1 2/6] net/mlx5: use cio barrier before the BF WQE Gavin Hu
2020-02-13 12:38 ` [dpdk-dev] [PATCH RFC v1 3/6] net/mlx5: add missing barrier Gavin Hu
2020-02-13 12:38 ` [dpdk-dev] [PATCH RFC v1 4/6] net/mlx5: add descriptive comment for a barrier Gavin Hu
2020-02-13 12:38 ` [dpdk-dev] [PATCH RFC v1 5/6] net/mlx5: non-cacheable mapping defaulted for aarch64 Gavin Hu
2020-02-13 12:38 ` [dpdk-dev] [PATCH RFC v1 6/6] net/mlx5: relaxed ordering for multi-packet RQ buffer refcnt Gavin Hu
2020-04-10 16:41 ` [dpdk-dev] [PATCH RFC v2 0/7] introduce new barrier class and use it for mlx5 PMD Gavin Hu
2020-04-10 17:20   ` Andrew Rybchenko
2020-04-11  3:46     ` Gavin Hu
2020-04-13  9:51       ` Andrew Rybchenko
2020-04-13 16:46         ` Gavin Hu
2020-05-11 18:06   ` [dpdk-dev] [RFC] eal: adjust barriers for IO on Armv8-a Honnappa Nagarahalli
2020-05-12  6:18     ` Ruifeng Wang
2020-05-12  6:42       ` Jerin Jacob
2020-05-12  8:02         ` Ruifeng Wang
2020-05-12  8:28           ` Jerin Jacob
2020-05-12 21:44           ` Honnappa Nagarahalli
2020-05-13 14:49             ` Jerin Jacob
2020-05-14  1:02               ` Honnappa Nagarahalli
2020-06-27 19:12   ` [dpdk-dev] [PATCH v2] " Honnappa Nagarahalli
2020-06-27 19:25     ` Honnappa Nagarahalli
2020-06-30  5:13       ` Jerin Jacob
2020-07-03 18:57   ` [dpdk-dev] [PATCH v3 1/3] " Honnappa Nagarahalli
2020-07-03 18:57     ` [dpdk-dev] [PATCH v3 2/3] doc: update armv8-a IO barrier changes Honnappa Nagarahalli
2020-07-05  0:57       ` Jerin Jacob
2020-07-03 18:57     ` [dpdk-dev] [PATCH v3 3/3] doc: update deprecation of CIO barrier APIs Honnappa Nagarahalli
2020-07-05  0:57       ` Jerin Jacob
2020-07-07 20:19       ` Ajit Khaparde
2020-07-08 11:05       ` Ananyev, Konstantin
2020-07-06 23:43   ` [dpdk-dev] [PATCH v4 1/3] eal: adjust barriers for IO on Armv8-a Honnappa Nagarahalli
2020-07-06 23:43     ` [dpdk-dev] [PATCH v4 2/3] doc: update armv8-a IO barrier changes Honnappa Nagarahalli
2020-07-07  8:36       ` David Marchand
2020-07-07 18:37         ` Honnappa Nagarahalli
2020-07-06 23:43     ` [dpdk-dev] [PATCH v4 3/3] doc: update deprecation of CIO barrier APIs Honnappa Nagarahalli
2020-07-07  8:39       ` David Marchand
2020-07-07 20:14       ` David Christensen
2020-07-08 11:49       ` David Marchand
2020-04-10 16:41 ` [dpdk-dev] [PATCH RFC v2 1/7] eal: introduce new class of barriers for DMA use cases Gavin Hu
2020-04-10 16:41 ` [dpdk-dev] [PATCH RFC v2 2/7] net/mlx5: dmb for immediate doorbell ring on aarch64 Gavin Hu
2020-04-10 16:41 ` [dpdk-dev] [PATCH RFC v2 3/7] net/mlx5: relax barrier to order UAR writes " Gavin Hu
2020-04-10 16:41 ` [dpdk-dev] [PATCH RFC v2 4/7] net/mlx5: relax barrier for aarch64 Gavin Hu
2020-04-10 16:41 ` [dpdk-dev] [PATCH RFC v2 5/7] net/mlx5: add descriptive comment for a barrier Gavin Hu
2020-04-10 16:41 ` [dpdk-dev] [PATCH RFC v2 6/7] net/mlx5: relax ordering for multi-packet RQ buffer refcnt Gavin Hu
2020-06-23  8:26   ` [dpdk-dev] [PATCH v3] net/mlx5: relaxed " Phil Yang
2020-04-10 16:41 ` [dpdk-dev] [PATCH RFC v2 7/7] doc: clarify one configuration in mlx5 guide Gavin Hu
