DPDK patches and discussions
 help / color / Atom feed
* [dpdk-dev] [PATCH RFC v1 0/6] barrier fix and optimization for mlx5 on aarch64
@ 2020-02-13 12:38 Gavin Hu
  2020-02-13 12:38 ` [dpdk-dev] [PATCH RFC v1 1/6] net/mlx5: relax the barrier for UAR write Gavin Hu
                   ` (5 more replies)
  0 siblings, 6 replies; 7+ messages in thread
From: Gavin Hu @ 2020-02-13 12:38 UTC (permalink / raw)
  To: dev
  Cc: nd, david.marchand, thomas, rasland, drc, bruce.richardson,
	konstantin.ananyev, matan, shahafs, viacheslavo, jerinj,
	Honnappa.Nagarahalli, ruifeng.wang, phil.yang, joyce.kong,
	steve.capper

Using just sufficient barriers really matters to performance.
Insufficient barriers will cause issues while barriers stronger
than required, especially in the fast path is a performance killer.

In the joint preliminary testing between Arm and Ampere, 8%~13%
performance was measured.

Gavin Hu (5):
  net/mlx5: relax the barrier for UAR write
  net/mlx5: use cio barrier before the BF WQE
  net/mlx5: add missing barrier
  net/mlx5: add descriptive comment for a barrier
  net/mlx5: non-cacheable mapping defaulted for aarch64

Phil Yang (1):
  net/mlx5: relaxed ordering for multi-packet RQ buffer refcnt

 drivers/net/mlx5/mlx5_rxq.c  |  5 +++--
 drivers/net/mlx5/mlx5_rxtx.c | 16 +++++++++-------
 drivers/net/mlx5/mlx5_rxtx.h | 11 ++++++++---
 drivers/net/mlx5/mlx5_txq.c  |  4 ++++
 4 files changed, 24 insertions(+), 12 deletions(-)

-- 
2.17.1


^ permalink raw reply	[flat|nested] 7+ messages in thread

* [dpdk-dev] [PATCH RFC v1 1/6] net/mlx5: relax the barrier for UAR write
  2020-02-13 12:38 [dpdk-dev] [PATCH RFC v1 0/6] barrier fix and optimization for mlx5 on aarch64 Gavin Hu
@ 2020-02-13 12:38 ` Gavin Hu
  2020-02-13 12:38 ` [dpdk-dev] [PATCH RFC v1 2/6] net/mlx5: use cio barrier before the BF WQE Gavin Hu
                   ` (4 subsequent siblings)
  5 siblings, 0 replies; 7+ messages in thread
From: Gavin Hu @ 2020-02-13 12:38 UTC (permalink / raw)
  To: dev
  Cc: nd, david.marchand, thomas, rasland, drc, bruce.richardson,
	konstantin.ananyev, matan, shahafs, viacheslavo, jerinj,
	Honnappa.Nagarahalli, ruifeng.wang, phil.yang, joyce.kong,
	steve.capper, stable

The UAR is part of PCI address space that is mapped for direct access to
the HCA from the CPU. Read-Write accesses to this space are strongly
ordered thus a compiler barrier is sufficient for all arches.

This patch set is based on the following aarch64 architecural facts:
1. The PCI BAR space is mapped as nGnRE device memory, not cachable nor
write-combine.
2. io accesses to a single device is total ordered.

Fixes: 6bf10ab69be0 ("net/mlx5: support 32-bit systems")
Cc: stable@dpdk.org

Signed-off-by: Gavin Hu <gavin.hu@arm.com>
Reviewed-by: Phil Yang <phil.yang@arm.com>
---
 drivers/net/mlx5/mlx5_rxtx.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/net/mlx5/mlx5_rxtx.h b/drivers/net/mlx5/mlx5_rxtx.h
index 939778aa5..50b3cc3c9 100644
--- a/drivers/net/mlx5/mlx5_rxtx.h
+++ b/drivers/net/mlx5/mlx5_rxtx.h
@@ -546,7 +546,7 @@ __mlx5_uar_write64_relaxed(uint64_t val, void *addr,
 static __rte_always_inline void
 __mlx5_uar_write64(uint64_t val, void *addr, rte_spinlock_t *lock)
 {
-	rte_io_wmb();
+	rte_compiler_barrier();
 	__mlx5_uar_write64_relaxed(val, addr, lock);
 }
 
-- 
2.17.1


^ permalink raw reply	[flat|nested] 7+ messages in thread

* [dpdk-dev] [PATCH RFC v1 2/6] net/mlx5: use cio barrier before the BF WQE
  2020-02-13 12:38 [dpdk-dev] [PATCH RFC v1 0/6] barrier fix and optimization for mlx5 on aarch64 Gavin Hu
  2020-02-13 12:38 ` [dpdk-dev] [PATCH RFC v1 1/6] net/mlx5: relax the barrier for UAR write Gavin Hu
@ 2020-02-13 12:38 ` Gavin Hu
  2020-02-13 12:38 ` [dpdk-dev] [PATCH RFC v1 3/6] net/mlx5: add missing barrier Gavin Hu
                   ` (3 subsequent siblings)
  5 siblings, 0 replies; 7+ messages in thread
From: Gavin Hu @ 2020-02-13 12:38 UTC (permalink / raw)
  To: dev
  Cc: nd, david.marchand, thomas, rasland, drc, bruce.richardson,
	konstantin.ananyev, matan, shahafs, viacheslavo, jerinj,
	Honnappa.Nagarahalli, ruifeng.wang, phil.yang, joyce.kong,
	steve.capper, stable

To ensure the WQE and doorbell record, which reside in the host memory,
are visible to HW before the blue frame, a CIO barrier is sufficient, a
rte_wmb is overkill.

Fixes: 6cb559d67b83 ("net/mlx5: add vectorized Rx/Tx burst for x86")
Cc: stable@dpdk.org

Signed-off-by: Gavin Hu <gavin.hu@arm.com>
Reviewed-by: Phil Yang <phil.yang@arm.com>
---
 drivers/net/mlx5/mlx5_rxtx.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/net/mlx5/mlx5_rxtx.h b/drivers/net/mlx5/mlx5_rxtx.h
index 50b3cc3c9..c672af4c4 100644
--- a/drivers/net/mlx5/mlx5_rxtx.h
+++ b/drivers/net/mlx5/mlx5_rxtx.h
@@ -658,7 +658,7 @@ mlx5_tx_dbrec_cond_wmb(struct mlx5_txq_data *txq, volatile struct mlx5_wqe *wqe,
 	rte_cio_wmb();
 	*txq->qp_db = rte_cpu_to_be_32(txq->wqe_ci);
 	/* Ensure ordering between DB record and BF copy. */
-	rte_wmb();
+	rte_cio_wmb();
 	mlx5_uar_write64_relaxed(*src, dst, txq->uar_lock);
 	if (cond)
 		rte_wmb();
-- 
2.17.1


^ permalink raw reply	[flat|nested] 7+ messages in thread

* [dpdk-dev] [PATCH RFC v1 3/6] net/mlx5: add missing barrier
  2020-02-13 12:38 [dpdk-dev] [PATCH RFC v1 0/6] barrier fix and optimization for mlx5 on aarch64 Gavin Hu
  2020-02-13 12:38 ` [dpdk-dev] [PATCH RFC v1 1/6] net/mlx5: relax the barrier for UAR write Gavin Hu
  2020-02-13 12:38 ` [dpdk-dev] [PATCH RFC v1 2/6] net/mlx5: use cio barrier before the BF WQE Gavin Hu
@ 2020-02-13 12:38 ` Gavin Hu
  2020-02-13 12:38 ` [dpdk-dev] [PATCH RFC v1 4/6] net/mlx5: add descriptive comment for a barrier Gavin Hu
                   ` (2 subsequent siblings)
  5 siblings, 0 replies; 7+ messages in thread
From: Gavin Hu @ 2020-02-13 12:38 UTC (permalink / raw)
  To: dev
  Cc: nd, david.marchand, thomas, rasland, drc, bruce.richardson,
	konstantin.ananyev, matan, shahafs, viacheslavo, jerinj,
	Honnappa.Nagarahalli, ruifeng.wang, phil.yang, joyce.kong,
	steve.capper, stable

To keep order of the modification of RX queue descriptor(rxq->cq_db) and
the CQ doorbell register, a rte_cio_wmb barrier is required.

The situation was rescued by the stronger than required barrier in the
mlx5_uar_write64, it becomes a must when the barrier is relaxed.

Fixes: 6bf10ab69be0 ("net/mlx5: support 32-bit systems")
Cc: stable@dpdk.org

Suggested-by: Phil Yang <phil.yang@arm.com>
Signed-off-by: Gavin Hu <gavin.hu@arm.com>
Reviewed-by: Phil Yang <phil.yang@arm.com>
---
 drivers/net/mlx5/mlx5_rxq.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/drivers/net/mlx5/mlx5_rxq.c b/drivers/net/mlx5/mlx5_rxq.c
index dc0fd8211..2d1b153a3 100644
--- a/drivers/net/mlx5/mlx5_rxq.c
+++ b/drivers/net/mlx5/mlx5_rxq.c
@@ -856,7 +856,8 @@ mlx5_arm_cq(struct mlx5_rxq_data *rxq, int sq_n_rxq)
 	doorbell = (uint64_t)doorbell_hi << 32;
 	doorbell |=  rxq->cqn;
 	rxq->cq_db[MLX5_CQ_ARM_DB] = rte_cpu_to_be_32(doorbell_hi);
-	mlx5_uar_write64(rte_cpu_to_be_64(doorbell),
+	rte_cio_wmb();
+	mlx5_uar_write64_relaxed(rte_cpu_to_be_64(doorbell),
 			 cq_db_reg, rxq->uar_lock_cq);
 }
 
-- 
2.17.1


^ permalink raw reply	[flat|nested] 7+ messages in thread

* [dpdk-dev] [PATCH RFC v1 4/6] net/mlx5: add descriptive comment for a barrier
  2020-02-13 12:38 [dpdk-dev] [PATCH RFC v1 0/6] barrier fix and optimization for mlx5 on aarch64 Gavin Hu
                   ` (2 preceding siblings ...)
  2020-02-13 12:38 ` [dpdk-dev] [PATCH RFC v1 3/6] net/mlx5: add missing barrier Gavin Hu
@ 2020-02-13 12:38 ` Gavin Hu
  2020-02-13 12:38 ` [dpdk-dev] [PATCH RFC v1 5/6] net/mlx5: non-cacheable mapping defaulted for aarch64 Gavin Hu
  2020-02-13 12:38 ` [dpdk-dev] [PATCH RFC v1 6/6] net/mlx5: relaxed ordering for multi-packet RQ buffer refcnt Gavin Hu
  5 siblings, 0 replies; 7+ messages in thread
From: Gavin Hu @ 2020-02-13 12:38 UTC (permalink / raw)
  To: dev
  Cc: nd, david.marchand, thomas, rasland, drc, bruce.richardson,
	konstantin.ananyev, matan, shahafs, viacheslavo, jerinj,
	Honnappa.Nagarahalli, ruifeng.wang, phil.yang, joyce.kong,
	steve.capper

The barrier is not required or can be moved down if HW waits for the
doorbell ring to execute the WQE.

This is not the case as HW can start executing the WQE until it gets the
ownership(passed by SW writing the doorbell record).

Add a decriptive comment for this HW specific behavior.

Signed-off-by: Gavin Hu <gavin.hu@arm.com>
Reviewed-by: Phil Yang <phil.yang@arm.com>
---
 drivers/net/mlx5/mlx5_rxtx.h | 5 +++++
 1 file changed, 5 insertions(+)

diff --git a/drivers/net/mlx5/mlx5_rxtx.h b/drivers/net/mlx5/mlx5_rxtx.h
index c672af4c4..d32c4e430 100644
--- a/drivers/net/mlx5/mlx5_rxtx.h
+++ b/drivers/net/mlx5/mlx5_rxtx.h
@@ -655,6 +655,11 @@ mlx5_tx_dbrec_cond_wmb(struct mlx5_txq_data *txq, volatile struct mlx5_wqe *wqe,
 	uint64_t *dst = MLX5_TX_BFREG(txq);
 	volatile uint64_t *src = ((volatile uint64_t *)wqe);
 
+	/* The ownership of WQE is passed to HW by updating the doorbell
+	 * record. Once WQE ownership has been passed to the HCA, HW can
+	 * execute the WQE. The barrier is to ensure the WQE are visible
+	 * to HW before HW execute it.
+	 */
 	rte_cio_wmb();
 	*txq->qp_db = rte_cpu_to_be_32(txq->wqe_ci);
 	/* Ensure ordering between DB record and BF copy. */
-- 
2.17.1


^ permalink raw reply	[flat|nested] 7+ messages in thread

* [dpdk-dev] [PATCH RFC v1 5/6] net/mlx5: non-cacheable mapping defaulted for aarch64
  2020-02-13 12:38 [dpdk-dev] [PATCH RFC v1 0/6] barrier fix and optimization for mlx5 on aarch64 Gavin Hu
                   ` (3 preceding siblings ...)
  2020-02-13 12:38 ` [dpdk-dev] [PATCH RFC v1 4/6] net/mlx5: add descriptive comment for a barrier Gavin Hu
@ 2020-02-13 12:38 ` Gavin Hu
  2020-02-13 12:38 ` [dpdk-dev] [PATCH RFC v1 6/6] net/mlx5: relaxed ordering for multi-packet RQ buffer refcnt Gavin Hu
  5 siblings, 0 replies; 7+ messages in thread
From: Gavin Hu @ 2020-02-13 12:38 UTC (permalink / raw)
  To: dev
  Cc: nd, david.marchand, thomas, rasland, drc, bruce.richardson,
	konstantin.ananyev, matan, shahafs, viacheslavo, jerinj,
	Honnappa.Nagarahalli, ruifeng.wang, phil.yang, joyce.kong,
	steve.capper, stable

aarch64 does not map pci resources to 'write-combine' nor
cacheable. In Linux Kernel arch_can_pci_mmap_wc() equals to 0 on
aarch64[1].

[1] https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/
tree/drivers/pci/pci-sysfs.c?h=v5.4#n1204

Fixes: f078ceb6ae93 ("net/mlx5: fix Tx doorbell write memory barrier")
Cc: stable@dpdk.org

Signed-off-by: Gavin Hu <gavin.hu@arm.com>
Reviewed-by: Phil Yang <phil.yang@arm.com>
---
 drivers/net/mlx5/mlx5_txq.c | 4 ++++
 1 file changed, 4 insertions(+)

diff --git a/drivers/net/mlx5/mlx5_txq.c b/drivers/net/mlx5/mlx5_txq.c
index bc13abfe6..144bab4a6 100644
--- a/drivers/net/mlx5/mlx5_txq.c
+++ b/drivers/net/mlx5/mlx5_txq.c
@@ -319,7 +319,11 @@ txq_uar_ncattr_init(struct mlx5_txq_ctrl *txq_ctrl, size_t page_size)
 	off_t cmd;
 
 	txq_ctrl->txq.db_heu = priv->config.dbnc == MLX5_TXDB_HEURISTIC;
+#ifdef RTE_ARCH_ARM64
+	txq_ctrl->txq.db_nc = 1;
+#else
 	txq_ctrl->txq.db_nc = 0;
+#endif
 	/* Check the doorbell register mapping type. */
 	cmd = txq_ctrl->uar_mmap_offset / page_size;
 	cmd >>= MLX5_UAR_MMAP_CMD_SHIFT;
-- 
2.17.1


^ permalink raw reply	[flat|nested] 7+ messages in thread

* [dpdk-dev] [PATCH RFC v1 6/6] net/mlx5: relaxed ordering for multi-packet RQ buffer refcnt
  2020-02-13 12:38 [dpdk-dev] [PATCH RFC v1 0/6] barrier fix and optimization for mlx5 on aarch64 Gavin Hu
                   ` (4 preceding siblings ...)
  2020-02-13 12:38 ` [dpdk-dev] [PATCH RFC v1 5/6] net/mlx5: non-cacheable mapping defaulted for aarch64 Gavin Hu
@ 2020-02-13 12:38 ` Gavin Hu
  5 siblings, 0 replies; 7+ messages in thread
From: Gavin Hu @ 2020-02-13 12:38 UTC (permalink / raw)
  To: dev
  Cc: nd, Phil Yang, david.marchand, thomas, rasland, drc,
	bruce.richardson, konstantin.ananyev, matan, shahafs,
	viacheslavo, jerinj, Honnappa.Nagarahalli, ruifeng.wang,
	joyce.kong, steve.capper, stable

From: Phil Yang <phil.yang@arm.com>

PMD Rx queue descriptor contains two mlx5_mprq_buf fields, which
are the multi-packet RQ buffer header pointers. It uses the common
rte_atomic_XXX functions to make sure the refcnt access is atomic.

The common rte_atomic_XXX functions are full barriers on aarch64.
Optimized it with one-way barrier to improve performance.

Fixes: 7d6bf6b866b8 ("net/mlx5: add Multi-Packet Rx support")
Cc: stable@dpdk.org

Suggested-by: Gavin Hu <gavin.hu@arm.com>
Signed-off-by: Phil Yang <phil.yang@arm.com>
Reviewed-by: Gavin Hu <gavin.hu@arm.com>
---
 drivers/net/mlx5/mlx5_rxq.c  |  2 +-
 drivers/net/mlx5/mlx5_rxtx.c | 16 +++++++++-------
 drivers/net/mlx5/mlx5_rxtx.h |  2 +-
 3 files changed, 11 insertions(+), 9 deletions(-)

diff --git a/drivers/net/mlx5/mlx5_rxq.c b/drivers/net/mlx5/mlx5_rxq.c
index 2d1b153a3..765bb1af5 100644
--- a/drivers/net/mlx5/mlx5_rxq.c
+++ b/drivers/net/mlx5/mlx5_rxq.c
@@ -1535,7 +1535,7 @@ mlx5_mprq_buf_init(struct rte_mempool *mp, void *opaque_arg,
 
 	memset(_m, 0, sizeof(*buf));
 	buf->mp = mp;
-	rte_atomic16_set(&buf->refcnt, 1);
+	__atomic_store_n(&buf->refcnt, 1, __ATOMIC_RELAXED);
 	for (j = 0; j != strd_n; ++j) {
 		shinfo = &buf->shinfos[j];
 		shinfo->free_cb = mlx5_mprq_buf_free_cb;
diff --git a/drivers/net/mlx5/mlx5_rxtx.c b/drivers/net/mlx5/mlx5_rxtx.c
index 5eea932d4..0e7519c56 100644
--- a/drivers/net/mlx5/mlx5_rxtx.c
+++ b/drivers/net/mlx5/mlx5_rxtx.c
@@ -1592,10 +1592,11 @@ mlx5_mprq_buf_free_cb(void *addr __rte_unused, void *opaque)
 {
 	struct mlx5_mprq_buf *buf = opaque;
 
-	if (rte_atomic16_read(&buf->refcnt) == 1) {
+	if (__atomic_load_n(&buf->refcnt, __ATOMIC_RELAXED) == 1) {
 		rte_mempool_put(buf->mp, buf);
-	} else if (rte_atomic16_add_return(&buf->refcnt, -1) == 0) {
-		rte_atomic16_set(&buf->refcnt, 1);
+	} else if (unlikely(__atomic_sub_fetch(&buf->refcnt, 1,
+						__ATOMIC_RELAXED) == 0)) {
+		__atomic_store_n(&buf->refcnt, 1, __ATOMIC_RELAXED);
 		rte_mempool_put(buf->mp, buf);
 	}
 }
@@ -1676,7 +1677,8 @@ mlx5_rx_burst_mprq(void *dpdk_rxq, struct rte_mbuf **pkts, uint16_t pkts_n)
 
 		if (consumed_strd == strd_n) {
 			/* Replace WQE only if the buffer is still in use. */
-			if (rte_atomic16_read(&buf->refcnt) > 1) {
+			if (__atomic_load_n(&buf->refcnt,
+					    __ATOMIC_RELAXED) > 1) {
 				mprq_buf_replace(rxq, rq_ci & wq_mask, strd_n);
 				/* Release the old buffer. */
 				mlx5_mprq_buf_free(buf);
@@ -1766,9 +1768,9 @@ mlx5_rx_burst_mprq(void *dpdk_rxq, struct rte_mbuf **pkts, uint16_t pkts_n)
 			void *buf_addr;
 
 			/* Increment the refcnt of the whole chunk. */
-			rte_atomic16_add_return(&buf->refcnt, 1);
-			MLX5_ASSERT((uint16_t)rte_atomic16_read(&buf->refcnt) <=
-				    strd_n + 1);
+			__atomic_add_fetch(&buf->refcnt, 1, __ATOMIC_ACQUIRE);
+			MLX5_ASSERT(__atomic_load_n(&buf->refcnt,
+					__ATOMIC_RELAXED) <= strd_n + 1);
 			buf_addr = RTE_PTR_SUB(addr, headroom_sz);
 			/*
 			 * MLX5 device doesn't use iova but it is necessary in a
diff --git a/drivers/net/mlx5/mlx5_rxtx.h b/drivers/net/mlx5/mlx5_rxtx.h
index d32c4e430..1f453fe09 100644
--- a/drivers/net/mlx5/mlx5_rxtx.h
+++ b/drivers/net/mlx5/mlx5_rxtx.h
@@ -78,7 +78,7 @@ struct rxq_zip {
 /* Multi-Packet RQ buffer header. */
 struct mlx5_mprq_buf {
 	struct rte_mempool *mp;
-	rte_atomic16_t refcnt; /* Atomically accessed refcnt. */
+	uint16_t refcnt; /* Atomically accessed refcnt. */
 	uint8_t pad[RTE_PKTMBUF_HEADROOM]; /* Headroom for the first packet. */
 	struct rte_mbuf_ext_shared_info shinfos[];
 	/*
-- 
2.17.1


^ permalink raw reply	[flat|nested] 7+ messages in thread

end of thread, back to index

Thread overview: 7+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-02-13 12:38 [dpdk-dev] [PATCH RFC v1 0/6] barrier fix and optimization for mlx5 on aarch64 Gavin Hu
2020-02-13 12:38 ` [dpdk-dev] [PATCH RFC v1 1/6] net/mlx5: relax the barrier for UAR write Gavin Hu
2020-02-13 12:38 ` [dpdk-dev] [PATCH RFC v1 2/6] net/mlx5: use cio barrier before the BF WQE Gavin Hu
2020-02-13 12:38 ` [dpdk-dev] [PATCH RFC v1 3/6] net/mlx5: add missing barrier Gavin Hu
2020-02-13 12:38 ` [dpdk-dev] [PATCH RFC v1 4/6] net/mlx5: add descriptive comment for a barrier Gavin Hu
2020-02-13 12:38 ` [dpdk-dev] [PATCH RFC v1 5/6] net/mlx5: non-cacheable mapping defaulted for aarch64 Gavin Hu
2020-02-13 12:38 ` [dpdk-dev] [PATCH RFC v1 6/6] net/mlx5: relaxed ordering for multi-packet RQ buffer refcnt Gavin Hu

DPDK patches and discussions

Archives are clonable:
	git clone --mirror http://inbox.dpdk.org/dev/0 dev/git/0.git

	# If you have public-inbox 1.1+ installed, you may
	# initialize and index your mirror using the following commands:
	public-inbox-init -V2 dev dev/ http://inbox.dpdk.org/dev \
		dev@dpdk.org
	public-inbox-index dev


Newsgroup available over NNTP:
	nntp://inbox.dpdk.org/inbox.dpdk.dev


AGPL code for this site: git clone https://public-inbox.org/ public-inbox