DPDK patches and discussions
* [dpdk-dev] [PATCH] net/mlx5: add support for 32bit systems
@ 2018-06-28  7:12 Moti Haimovsky
  2018-07-02  7:05 ` Shahaf Shuler
  2018-07-02 11:11 ` [dpdk-dev] [PATCH v2] " Moti Haimovsky
  0 siblings, 2 replies; 17+ messages in thread
From: Moti Haimovsky @ 2018-06-28  7:12 UTC (permalink / raw)
  To: yskoh, adrien.mazarguil; +Cc: dev, Moti Haimovsky

This patch adds support for building and running mlx5 PMD on
32bit systems such as i686.

The main issue to tackle was handling the 32bit access to the UAR
as quoted from the mlx5 PRM:
QP and CQ DoorBells require 64-bit writes. For best performance, it
is recommended to execute the QP/CQ DoorBell as a single 64-bit write
operation. For platforms that do not support 64 bit writes, it is
possible to issue the 64 bits DoorBells through two consecutive writes,
each write 32 bits, as described below:
* The order of writing each of the Dwords is from lower to upper
  addresses.
* No other DoorBell can be rung (or even start ringing) in the midst of
  an on-going write of a DoorBell over a given UAR page.
The last rule implies that in a multi-threaded environment, the access
to a UAR page (which can be accessible by all threads in the process)
must be synchronized (for example, using a semaphore) unless an atomic
write of 64 bits in a single bus operation is guaranteed. Such
synchronization is not required when ringing DoorBells on different
UAR pages.

Signed-off-by: Moti Haimovsky <motih@mellanox.com>
---
 doc/guides/nics/features/mlx5.ini |  1 +
 doc/guides/nics/mlx5.rst          | 11 +++++++
 drivers/net/mlx5/mlx5.c           |  8 ++++-
 drivers/net/mlx5/mlx5.h           |  5 +++
 drivers/net/mlx5/mlx5_defs.h      | 18 ++++++++--
 drivers/net/mlx5/mlx5_rxq.c       |  6 +++-
 drivers/net/mlx5/mlx5_rxtx.c      | 22 +++++++------
 drivers/net/mlx5/mlx5_rxtx.h      | 69 ++++++++++++++++++++++++++++++++++++++-
 drivers/net/mlx5/mlx5_txq.c       | 13 +++++++-
 9 files changed, 137 insertions(+), 16 deletions(-)
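
Note (illustrative, not part of the patch): the two-write DoorBell sequence
quoted from the PRM above can be sketched in portable C11 roughly as follows.
This is a minimal sketch under assumptions: struct uar_page, uar_lock(),
uar_unlock() and uar_write64_split() are hypothetical names, an atomic_flag
spinlock stands in for rte_spinlock_t, and atomic_thread_fence() stands in
for rte_io_wmb(). Real MMIO would need the platform's I/O barrier.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for one UAR page: a 64-bit doorbell and its lock. */
struct uar_page {
	atomic_flag lock;              /* serializes doorbells on this page */
	volatile uint32_t doorbell[2]; /* [0] = lower address, [1] = upper */
};

/* Spin until the page lock is acquired. */
static void
uar_lock(struct uar_page *page)
{
	while (atomic_flag_test_and_set_explicit(&page->lock,
						 memory_order_acquire))
		; /* busy-wait, as a spinlock would */
}

/* Release the page lock. */
static void
uar_unlock(struct uar_page *page)
{
	atomic_flag_clear_explicit(&page->lock, memory_order_release);
}

/*
 * Ring a 64-bit doorbell with two 32-bit stores. The low 32 bits of val go
 * to the lower address first, then the high 32 bits to the upper address,
 * and no other doorbell on the same page may start in between.
 */
static void
uar_write64_split(struct uar_page *page, uint64_t val)
{
	assert(page != NULL);
	uar_lock(page);
	page->doorbell[0] = (uint32_t)val;
	/* Assumption: stand-in for rte_io_wmb() between the two halves. */
	atomic_thread_fence(memory_order_release);
	page->doorbell[1] = (uint32_t)(val >> 32);
	uar_unlock(page);
}
```

Holding one lock per UAR page, rather than one per port, keeps queues that
ring doorbells on different pages from contending with each other, which is
what the PRM text permits.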

diff --git a/doc/guides/nics/features/mlx5.ini b/doc/guides/nics/features/mlx5.ini
index e75b14b..b28b43e 100644
--- a/doc/guides/nics/features/mlx5.ini
+++ b/doc/guides/nics/features/mlx5.ini
@@ -43,5 +43,6 @@ Multiprocess aware   = Y
 Other kdrv           = Y
 ARMv8                = Y
 Power8               = Y
+x86-32               = Y
 x86-64               = Y
 Usage doc            = Y
diff --git a/doc/guides/nics/mlx5.rst b/doc/guides/nics/mlx5.rst
index 7dd9c1c..cb9d5d8 100644
--- a/doc/guides/nics/mlx5.rst
+++ b/doc/guides/nics/mlx5.rst
@@ -50,6 +50,8 @@ Features
 --------
 
 - Multi arch support: x86_64, POWER8, ARMv8.
+- Support for i686 is available only when working with
+  rdma-core version 18.0 or above, built with 32bit support.
 - Multiple TX and RX queues.
 - Support for scattered TX and RX frames.
 - IPv4, IPv6, TCPv4, TCPv6, UDPv4 and UDPv6 RSS on any number of queues.
@@ -136,6 +138,11 @@ Limitations
   enabled (``rxq_cqe_comp_en``) at the same time, RSS hash result is not fully
   supported. Some Rx packets may not have PKT_RX_RSS_HASH.
 
+- Building for i686 is only supported with:
+
+  - rdma-core version 18.0 or above built with 32bit support.
+  - Kernel version 4.14.41 or above.
+
 Statistics
 ----------
 
@@ -477,6 +484,10 @@ RMDA Core with Linux Kernel
 - Minimal kernel version : v4.14 or the most recent 4.14-rc (see `Linux installation documentation`_)
 - Minimal rdma-core version: v15+ commit 0c5f5765213a ("Merge pull request #227 from yishaih/tm")
   (see `RDMA Core installation documentation`_)
+- When building for i686 use:
+
+  - rdma-core version 18.0 or above built with 32bit support.
+  - Kernel version 4.14.41 or above.
 
 .. _`Linux installation documentation`: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable.git/plain/Documentation/admin-guide/README.rst
 .. _`RDMA Core installation documentation`: https://raw.githubusercontent.com/linux-rdma/rdma-core/master/README.md
diff --git a/drivers/net/mlx5/mlx5.c b/drivers/net/mlx5/mlx5.c
index f0e6ed7..5d0f706 100644
--- a/drivers/net/mlx5/mlx5.c
+++ b/drivers/net/mlx5/mlx5.c
@@ -567,7 +567,7 @@
 	rte_memseg_walk(find_lower_va_bound, &addr);
 
 	/* keep distance to hugepages to minimize potential conflicts. */
-	addr = RTE_PTR_SUB(addr, MLX5_UAR_OFFSET + MLX5_UAR_SIZE);
+	addr = RTE_PTR_SUB(addr, (uintptr_t)(MLX5_UAR_OFFSET + MLX5_UAR_SIZE));
 	/* anonymous mmap, no real memory consumption. */
 	addr = mmap(addr, MLX5_UAR_SIZE,
 		    PROT_NONE, MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
@@ -953,6 +953,12 @@
 		priv->port = port;
 		priv->pd = pd;
 		priv->mtu = ETHER_MTU;
+#ifndef RTE_ARCH_64
+		/* Initialize UAR access locks for 32bit implementations. */
+		rte_spinlock_init(&priv->uar_lock_cq);
+		for (i = 0; i < MLX5_UAR_PAGE_NUM_MAX; i++)
+			rte_spinlock_init(&priv->uar_lock[i]);
+#endif
 		err = mlx5_args(&config, pci_dev->device.devargs);
 		if (err) {
 			err = rte_errno;
diff --git a/drivers/net/mlx5/mlx5.h b/drivers/net/mlx5/mlx5.h
index 997b04a..2da32cd 100644
--- a/drivers/net/mlx5/mlx5.h
+++ b/drivers/net/mlx5/mlx5.h
@@ -198,6 +198,11 @@ struct priv {
 	/* Context for Verbs allocator. */
 	int nl_socket; /* Netlink socket. */
 	uint32_t nl_sn; /* Netlink message sequence number. */
+#ifndef RTE_ARCH_64
+	rte_spinlock_t uar_lock_cq; /* CQs share a common distinct UAR */
+	rte_spinlock_t uar_lock[MLX5_UAR_PAGE_NUM_MAX];
+	/* UAR same-page access control required in 32bit implementations. */
+#endif
 };
 
 #define PORT_ID(priv) ((priv)->dev_data->port_id)
diff --git a/drivers/net/mlx5/mlx5_defs.h b/drivers/net/mlx5/mlx5_defs.h
index 5bbbec2..f6ec415 100644
--- a/drivers/net/mlx5/mlx5_defs.h
+++ b/drivers/net/mlx5/mlx5_defs.h
@@ -87,14 +87,28 @@
 #define MLX5_LINK_STATUS_TIMEOUT 10
 
 /* Reserved address space for UAR mapping. */
-#define MLX5_UAR_SIZE (1ULL << 32)
+#define MLX5_UAR_SIZE (1ULL << (sizeof(uintptr_t) * 4))
 
 /* Offset of reserved UAR address space to hugepage memory. Offset is used here
  * to minimize possibility of address next to hugepage being used by other code
  * in either primary or secondary process, failing to map TX UAR would make TX
  * packets invisible to HW.
  */
-#define MLX5_UAR_OFFSET (1ULL << 32)
+#define MLX5_UAR_OFFSET (1ULL << (sizeof(uintptr_t) * 4))
+
+/* Maximum number of UAR pages used by a port,
+ * These are the size and mask for an array of mutexes used to synchronize
+ * the access to port's UARs on platforms that do not support 64 bit writes.
+ * In such systems it is possible to issue the 64 bits DoorBells through two
+ * consecutive writes, each write 32 bits. The access to a UAR page (which can
+ * be accessible by all threads in the process) must be synchronized
+ * (for example, using a semaphore). Such a synchronization is not required
+ * when ringing DoorBells on different UAR pages.
+ * A port with 512 Tx queues uses 8, 4kBytes, UAR pages which are shared
+ * among the ports.
+ */
+#define MLX5_UAR_PAGE_NUM_MAX 64
+#define MLX5_UAR_PAGE_NUM_MASK ((MLX5_UAR_PAGE_NUM_MAX) - 1)
 
 /* Log 2 of the default number of strides per WQE for Multi-Packet RQ. */
 #define MLX5_MPRQ_STRIDE_NUM_N 6U
diff --git a/drivers/net/mlx5/mlx5_rxq.c b/drivers/net/mlx5/mlx5_rxq.c
index 08dd559..820048f 100644
--- a/drivers/net/mlx5/mlx5_rxq.c
+++ b/drivers/net/mlx5/mlx5_rxq.c
@@ -643,7 +643,8 @@
 	doorbell = (uint64_t)doorbell_hi << 32;
 	doorbell |=  rxq->cqn;
 	rxq->cq_db[MLX5_CQ_ARM_DB] = rte_cpu_to_be_32(doorbell_hi);
-	rte_write64(rte_cpu_to_be_64(doorbell), cq_db_reg);
+	mlx5_uar_write64(rte_cpu_to_be_64(doorbell),
+			 cq_db_reg, rxq->uar_lock_cq);
 }
 
 /**
@@ -1445,6 +1446,9 @@ struct mlx5_rxq_ctrl *
 	tmpl->rxq.elts_n = log2above(desc);
 	tmpl->rxq.elts =
 		(struct rte_mbuf *(*)[1 << tmpl->rxq.elts_n])(tmpl + 1);
+#ifndef RTE_ARCH_64
+	tmpl->rxq.uar_lock_cq = &priv->uar_lock_cq;
+#endif
 	tmpl->idx = idx;
 	rte_atomic32_inc(&tmpl->refcnt);
 	LIST_INSERT_HEAD(&priv->rxqsctrl, tmpl, next);
diff --git a/drivers/net/mlx5/mlx5_rxtx.c b/drivers/net/mlx5/mlx5_rxtx.c
index a7ed8d8..ec35ea0 100644
--- a/drivers/net/mlx5/mlx5_rxtx.c
+++ b/drivers/net/mlx5/mlx5_rxtx.c
@@ -495,6 +495,7 @@
 	volatile struct mlx5_wqe_ctrl *last_wqe = NULL;
 	unsigned int segs_n = 0;
 	const unsigned int max_inline = txq->max_inline;
+	uint64_t addr_64;
 
 	if (unlikely(!pkts_n))
 		return 0;
@@ -711,12 +712,12 @@
 			ds = 3;
 use_dseg:
 			/* Add the remaining packet as a simple ds. */
-			addr = rte_cpu_to_be_64(addr);
+			addr_64 = rte_cpu_to_be_64(addr);
 			*dseg = (rte_v128u32_t){
 				rte_cpu_to_be_32(length),
 				mlx5_tx_mb2mr(txq, buf),
-				addr,
-				addr >> 32,
+				addr_64,
+				addr_64 >> 32,
 			};
 			++ds;
 			if (!segs_n)
@@ -750,12 +751,12 @@
 		total_length += length;
 #endif
 		/* Store segment information. */
-		addr = rte_cpu_to_be_64(rte_pktmbuf_mtod(buf, uintptr_t));
+		addr_64 = rte_cpu_to_be_64(rte_pktmbuf_mtod(buf, uintptr_t));
 		*dseg = (rte_v128u32_t){
 			rte_cpu_to_be_32(length),
 			mlx5_tx_mb2mr(txq, buf),
-			addr,
-			addr >> 32,
+			addr_64,
+			addr_64 >> 32,
 		};
 		(*txq->elts)[++elts_head & elts_m] = buf;
 		if (--segs_n)
@@ -1450,6 +1451,7 @@
 	unsigned int mpw_room = 0;
 	unsigned int inl_pad = 0;
 	uint32_t inl_hdr;
+	uint64_t addr_64;
 	struct mlx5_mpw mpw = {
 		.state = MLX5_MPW_STATE_CLOSED,
 	};
@@ -1586,13 +1588,13 @@
 					((uintptr_t)mpw.data.raw +
 					 inl_pad);
 			(*txq->elts)[elts_head++ & elts_m] = buf;
-			addr = rte_cpu_to_be_64(rte_pktmbuf_mtod(buf,
-								 uintptr_t));
+			addr_64 = rte_cpu_to_be_64(
+					rte_pktmbuf_mtod(buf, uintptr_t));
 			*dseg = (rte_v128u32_t) {
 				rte_cpu_to_be_32(length),
 				mlx5_tx_mb2mr(txq, buf),
-				addr,
-				addr >> 32,
+				addr_64,
+				addr_64 >> 32,
 			};
 			mpw.data.raw = (volatile void *)(dseg + 1);
 			mpw.total_len += (inl_pad + sizeof(*dseg));
diff --git a/drivers/net/mlx5/mlx5_rxtx.h b/drivers/net/mlx5/mlx5_rxtx.h
index 0007be0..2448d73 100644
--- a/drivers/net/mlx5/mlx5_rxtx.h
+++ b/drivers/net/mlx5/mlx5_rxtx.h
@@ -26,6 +26,8 @@
 #include <rte_common.h>
 #include <rte_hexdump.h>
 #include <rte_atomic.h>
+#include <rte_spinlock.h>
+#include <rte_io.h>
 
 #include "mlx5_utils.h"
 #include "mlx5.h"
@@ -115,6 +117,10 @@ struct mlx5_rxq_data {
 	void *cq_uar; /* CQ user access region. */
 	uint32_t cqn; /* CQ number. */
 	uint8_t cq_arm_sn; /* CQ arm seq number. */
+#ifndef RTE_ARCH_64
+	rte_spinlock_t *uar_lock_cq;
+	/* CQ (UAR) access lock required for 32bit implementations */
+#endif
 	uint32_t tunnel; /* Tunnel information. */
 } __rte_cache_aligned;
 
@@ -196,6 +202,10 @@ struct mlx5_txq_data {
 	volatile void *bf_reg; /* Blueflame register remapped. */
 	struct rte_mbuf *(*elts)[]; /* TX elements. */
 	struct mlx5_txq_stats stats; /* TX queue counters. */
+#ifndef RTE_ARCH_64
+	rte_spinlock_t *uar_lock;
+	/* UAR access lock required for 32bit implementations */
+#endif
 } __rte_cache_aligned;
 
 /* Verbs Rx queue elements. */
@@ -348,6 +358,63 @@ uint16_t mlx5_rx_burst_vec(void *dpdk_txq, struct rte_mbuf **pkts,
 uint32_t mlx5_rx_addr2mr_bh(struct mlx5_rxq_data *rxq, uintptr_t addr);
 uint32_t mlx5_tx_addr2mr_bh(struct mlx5_txq_data *txq, uintptr_t addr);
 
+/**
+ * Provide safe 64bit store operation to mlx5 UAR region for both 32bit and
+ * 64bit architectures.
+ *
+ * @param val
+ *   value to write in CPU endian format.
+ * @param addr
+ *   Address to write to.
+ * @param lock
+ *   Address of the lock to use for that UAR access.
+ */
+static __rte_always_inline void
+__mlx5_uar_write64_relaxed(uint64_t val, volatile void *addr,
+			   rte_spinlock_t *lock __rte_unused)
+{
+#ifdef RTE_ARCH_64
+	rte_write64_relaxed(val, addr);
+#else /* !RTE_ARCH_64 */
+	rte_spinlock_lock(lock);
+	rte_write32_relaxed(val, addr);
+	rte_io_wmb();
+	rte_write32_relaxed(val >> 32,
+			    (volatile void *)((volatile char *)addr + 4));
+	rte_spinlock_unlock(lock);
+#endif
+}
+
+/**
+ * Provide safe 64bit store operation to mlx5 UAR region for both 32bit and
+ * 64bit architectures while guaranteeing the order of execution with the
+ * code being executed.
+ *
+ * @param val
+ *   value to write in CPU endian format.
+ * @param addr
+ *   Address to write to.
+ * @param lock
+ *   Address of the lock to use for that UAR access.
+ */
+static __rte_always_inline void
+__mlx5_uar_write64(uint64_t val, volatile void *addr, rte_spinlock_t *lock)
+{
+	rte_io_wmb();
+	__mlx5_uar_write64_relaxed(val, addr, lock);
+}
+
+/* Assist macros, used instead of directly calling the functions they wrap. */
+#ifdef RTE_ARCH_64
+#define mlx5_uar_write64_relaxed(val, dst, lock) \
+		__mlx5_uar_write64_relaxed(val, dst, NULL)
+#define mlx5_uar_write64(val, dst, lock) __mlx5_uar_write64(val, dst, NULL)
+#else
+#define mlx5_uar_write64_relaxed(val, dst, lock) \
+		__mlx5_uar_write64_relaxed(val, dst, lock)
+#define mlx5_uar_write64(val, dst, lock) __mlx5_uar_write64(val, dst, lock)
+#endif
+
 #ifndef NDEBUG
 /**
  * Verify or set magic value in CQE.
@@ -614,7 +681,7 @@ uint16_t mlx5_rx_burst_vec(void *dpdk_txq, struct rte_mbuf **pkts,
 	*txq->qp_db = rte_cpu_to_be_32(txq->wqe_ci);
 	/* Ensure ordering between DB record and BF copy. */
 	rte_wmb();
-	*dst = *src;
+	mlx5_uar_write64_relaxed(*src, dst, txq->uar_lock);
 	if (cond)
 		rte_wmb();
 }
diff --git a/drivers/net/mlx5/mlx5_txq.c b/drivers/net/mlx5/mlx5_txq.c
index 669b913..dc786d4 100644
--- a/drivers/net/mlx5/mlx5_txq.c
+++ b/drivers/net/mlx5/mlx5_txq.c
@@ -255,6 +255,9 @@
 	struct mlx5_txq_ctrl *txq_ctrl;
 	int already_mapped;
 	size_t page_size = sysconf(_SC_PAGESIZE);
+#ifndef RTE_ARCH_64
+	unsigned int lock_idx;
+#endif
 
 	memset(pages, 0, priv->txqs_n * sizeof(uintptr_t));
 	/*
@@ -281,7 +284,7 @@
 		}
 		/* new address in reserved UAR address space. */
 		addr = RTE_PTR_ADD(priv->uar_base,
-				   uar_va & (MLX5_UAR_SIZE - 1));
+				   uar_va & (uintptr_t)(MLX5_UAR_SIZE - 1));
 		if (!already_mapped) {
 			pages[pages_n++] = uar_va;
 			/* fixed mmap to specified address in reserved
@@ -305,6 +308,12 @@
 		else
 			assert(txq_ctrl->txq.bf_reg ==
 			       RTE_PTR_ADD((void *)addr, off));
+#ifndef RTE_ARCH_64
+		/* Assign a UAR lock according to UAR page number */
+		lock_idx = (txq_ctrl->uar_mmap_offset / page_size) &
+			   MLX5_UAR_PAGE_NUM_MASK;
+		txq->uar_lock = &priv->uar_lock[lock_idx];
+#endif
 	}
 	return 0;
 }
@@ -511,6 +520,8 @@ struct mlx5_txq_ibv *
 	rte_atomic32_inc(&txq_ibv->refcnt);
 	if (qp.comp_mask & MLX5DV_QP_MASK_UAR_MMAP_OFFSET) {
 		txq_ctrl->uar_mmap_offset = qp.uar_mmap_offset;
+		DRV_LOG(DEBUG, "port %u: uar_mmap_offset 0x%lx",
+			dev->data->port_id, txq_ctrl->uar_mmap_offset);
 	} else {
 		DRV_LOG(ERR,
 			"port %u failed to retrieve UAR info, invalid"
-- 
1.8.3.1
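
Note (illustrative, not part of the patch): the arithmetic behind the new
MLX5_UAR_SIZE definition and the lock index in the mlx5_txq.c hunk can be
checked in isolation. The sketch below is an assumption-laden restatement:
uar_window_size() and uar_lock_index() are hypothetical helpers, and the
pointer width is passed as a parameter instead of using sizeof(uintptr_t),
so both the 32bit and 64bit cases can be evaluated on one machine.

```c
#include <assert.h>
#include <stdint.h>

/* Mirrors MLX5_UAR_PAGE_NUM_MAX and MLX5_UAR_PAGE_NUM_MASK in the patch. */
#define UAR_PAGE_NUM_MAX 64
#define UAR_PAGE_NUM_MASK ((UAR_PAGE_NUM_MAX) - 1)

/*
 * Size of the reserved UAR window for a pointer of ptr_bytes bytes,
 * following 1ULL << (sizeof(uintptr_t) * 4) from the patch.
 */
static uint64_t
uar_window_size(unsigned int ptr_bytes)
{
	return 1ULL << (ptr_bytes * 4);
}

/*
 * Map a UAR mmap offset to a lock slot: page number modulo the number of
 * locks. Pages that are UAR_PAGE_NUM_MAX apart share a slot.
 */
static unsigned int
uar_lock_index(uint64_t uar_mmap_offset, uint64_t page_size)
{
	assert(page_size != 0);
	return (unsigned int)((uar_mmap_offset / page_size) &
			      UAR_PAGE_NUM_MASK);
}
```

With 8-byte pointers the window stays at 4 GiB (1 << 32), while with 4-byte
pointers it shrinks to 64 KiB (1 << 16). Two pages that map to the same lock
slot only add contention; they do not break the ordering rule.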


* Re: [dpdk-dev] [PATCH] net/mlx5: add support for 32bit systems
  2018-06-28  7:12 [dpdk-dev] [PATCH] net/mlx5: add support for 32bit systems Moti Haimovsky
@ 2018-07-02  7:05 ` Shahaf Shuler
  2018-07-02 10:39   ` Mordechay Haimovsky
  2018-07-02 11:11 ` [dpdk-dev] [PATCH v2] " Moti Haimovsky
  1 sibling, 1 reply; 17+ messages in thread
From: Shahaf Shuler @ 2018-07-02  7:05 UTC (permalink / raw)
  To: Mordechay Haimovsky, Yongseok Koh, Adrien Mazarguil
  Cc: dev, Mordechay Haimovsky

Hi Moty,

A few nits below.

Also, please fix the checkpatch warning:
### net/mlx5: add support for 32bit systems              
                                                         
CHECK:OPEN_ENDED_LINE: Lines should not end with a '('   
#235: FILE: drivers/net/mlx5/mlx5_rxtx.c:1591:           
+                       addr_64 = rte_cpu_to_be_64(      
                                                         
total: 0 errors, 0 warnings, 1 checks, 311 lines checked 
                                                         


Thursday, June 28, 2018 10:13 AM, Moti Haimovsky:
> Subject: [dpdk-dev] [PATCH] net/mlx5: add support for 32bit systems
> 
> This patch adds support for building and running mlx5 PMD on 32bit systems
> such as i686.
> 
> The main issue to tackle was handling the 32bit access to the UAR as quoted
> from the mlx5 PRM:
> QP and CQ DoorBells require 64-bit writes. For best performance, it is
> recommended to execute the QP/CQ DoorBell as a single 64-bit write
> operation. For platforms that do not support 64 bit writes, it is possible to
> issue the 64 bits DoorBells through two consecutive writes, each write 32
> bits, as described below:
> * The order of writing each of the Dwords is from lower to upper
>   addresses.
> * No other DoorBell can be rung (or even start ringing) in the midst of
>   an on-going write of a DoorBell over a given UAR page.
> The last rule implies that in a multi-threaded environment, the access to a
> UAR page (which can be accessible by all threads in the process) must be
> synchronized (for example, using a semaphore) unless an atomic write of 64
> bits in a single bus operation is guaranteed. Such a synchronization is not
> required for when ringing DoorBells on different UAR pages.
> 
> Signed-off-by: Moti Haimovsky <motih@mellanox.com>
> ---
>  doc/guides/nics/features/mlx5.ini |  1 +
>  doc/guides/nics/mlx5.rst          | 11 +++++++
>  drivers/net/mlx5/mlx5.c           |  8 ++++-
>  drivers/net/mlx5/mlx5.h           |  5 +++
>  drivers/net/mlx5/mlx5_defs.h      | 18 ++++++++--
>  drivers/net/mlx5/mlx5_rxq.c       |  6 +++-
>  drivers/net/mlx5/mlx5_rxtx.c      | 22 +++++++------
>  drivers/net/mlx5/mlx5_rxtx.h      | 69 ++++++++++++++++++++++++++++++++++++++-
>  drivers/net/mlx5/mlx5_txq.c       | 13 +++++++-
>  9 files changed, 137 insertions(+), 16 deletions(-)
> 
> diff --git a/doc/guides/nics/features/mlx5.ini b/doc/guides/nics/features/mlx5.ini
> index e75b14b..b28b43e 100644
> --- a/doc/guides/nics/features/mlx5.ini
> +++ b/doc/guides/nics/features/mlx5.ini
> @@ -43,5 +43,6 @@ Multiprocess aware   = Y
>  Other kdrv           = Y
>  ARMv8                = Y
>  Power8               = Y
> +x86-32               = Y
>  x86-64               = Y
>  Usage doc            = Y
> diff --git a/doc/guides/nics/mlx5.rst b/doc/guides/nics/mlx5.rst
> index 7dd9c1c..cb9d5d8 100644
> --- a/doc/guides/nics/mlx5.rst
> +++ b/doc/guides/nics/mlx5.rst
> @@ -50,6 +50,8 @@ Features
>  --------
> 
>  - Multi arch support: x86_64, POWER8, ARMv8.
> +- Support for i686 is available only when working with
> +  rdma-core version 18.0 or above, built with 32bit support.

I think we can just add i686 to the list of supported architectures. The rdma-core version limitation is already documented below.

>  - Multiple TX and RX queues.
>  - Support for scattered TX and RX frames.
>  - IPv4, IPv6, TCPv4, TCPv6, UDPv4 and UDPv6 RSS on any number of queues.
> @@ -136,6 +138,11 @@ Limitations
>    enabled (``rxq_cqe_comp_en``) at the same time, RSS hash result is not fully
>    supported. Some Rx packets may not have PKT_RX_RSS_HASH.
> 
> +- Building for i686 is only supported with:
> +
> +  - rdma-core version 18.0 or above built with 32bit support.
> +  - Kernel version 4.14.41 or above.

Why is the kernel version relevant here? The rdma-core requirement I understand.

> +
>  Statistics
>  ----------
> 
> @@ -477,6 +484,10 @@ RMDA Core with Linux Kernel
>  - Minimal kernel version : v4.14 or the most recent 4.14-rc (see `Linux installation documentation`_)
>  - Minimal rdma-core version: v15+ commit 0c5f5765213a ("Merge pull request #227 from yishaih/tm")
>    (see `RDMA Core installation documentation`_)
> +- When building for i686 use:
> +
> +  - rdma-core version 18.0 or above built with 32bit support.
> +  - Kernel version 4.14.41 or above.
> 
>  .. _`Linux installation documentation`: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable.git/plain/Documentation/admin-guide/README.rst
>  .. _`RDMA Core installation documentation`: https://raw.githubusercontent.com/linux-rdma/rdma-core/master/README.md
> diff --git a/drivers/net/mlx5/mlx5.c b/drivers/net/mlx5/mlx5.c
> index f0e6ed7..5d0f706 100644
> --- a/drivers/net/mlx5/mlx5.c
> +++ b/drivers/net/mlx5/mlx5.c
> @@ -567,7 +567,7 @@
>  	rte_memseg_walk(find_lower_va_bound, &addr);
> 
>  	/* keep distance to hugepages to minimize potential conflicts. */
> -	addr = RTE_PTR_SUB(addr, MLX5_UAR_OFFSET + MLX5_UAR_SIZE);
> +	addr = RTE_PTR_SUB(addr, (uintptr_t)(MLX5_UAR_OFFSET + MLX5_UAR_SIZE));
>  	/* anonymous mmap, no real memory consumption. */
>  	addr = mmap(addr, MLX5_UAR_SIZE,
>  		    PROT_NONE, MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
> @@ -953,6 +953,12 @@
>  		priv->port = port;
>  		priv->pd = pd;
>  		priv->mtu = ETHER_MTU;
> +#ifndef RTE_ARCH_64
> +		/* Initialize UAR access locks for 32bit implementations. */
> +		rte_spinlock_init(&priv->uar_lock_cq);
> +		for (i = 0; i < MLX5_UAR_PAGE_NUM_MAX; i++)
> +			rte_spinlock_init(&priv->uar_lock[i]);
> +#endif
>  		err = mlx5_args(&config, pci_dev->device.devargs);
>  		if (err) {
>  			err = rte_errno;
> diff --git a/drivers/net/mlx5/mlx5.h b/drivers/net/mlx5/mlx5.h
> index 997b04a..2da32cd 100644
> --- a/drivers/net/mlx5/mlx5.h
> +++ b/drivers/net/mlx5/mlx5.h
> @@ -198,6 +198,11 @@ struct priv {
>  	/* Context for Verbs allocator. */
>  	int nl_socket; /* Netlink socket. */
>  	uint32_t nl_sn; /* Netlink message sequence number. */
> +#ifndef RTE_ARCH_64
> +	rte_spinlock_t uar_lock_cq; /* CQs share a common distinct UAR */
> +	rte_spinlock_t uar_lock[MLX5_UAR_PAGE_NUM_MAX];
> +	/* UAR same-page access control required in 32bit implementations. */
> +#endif
>  };
> 
>  #define PORT_ID(priv) ((priv)->dev_data->port_id)
> diff --git a/drivers/net/mlx5/mlx5_defs.h b/drivers/net/mlx5/mlx5_defs.h
> index 5bbbec2..f6ec415 100644
> --- a/drivers/net/mlx5/mlx5_defs.h
> +++ b/drivers/net/mlx5/mlx5_defs.h
> @@ -87,14 +87,28 @@
>  #define MLX5_LINK_STATUS_TIMEOUT 10
> 
>  /* Reserved address space for UAR mapping. */
> -#define MLX5_UAR_SIZE (1ULL << 32)
> +#define MLX5_UAR_SIZE (1ULL << (sizeof(uintptr_t) * 4))
> 
>  /* Offset of reserved UAR address space to hugepage memory. Offset is used here
>   * to minimize possibility of address next to hugepage being used by other code
>   * in either primary or secondary process, failing to map TX UAR would make TX
>   * packets invisible to HW.
>   */
> -#define MLX5_UAR_OFFSET (1ULL << 32)
> +#define MLX5_UAR_OFFSET (1ULL << (sizeof(uintptr_t) * 4))
> +
> +/* Maximum number of UAR pages used by a port,
> + * These are the size and mask for an array of mutexes used to synchronize
> + * the access to port's UARs on platforms that do not support 64 bit writes.
> + * In such systems it is possible to issue the 64 bits DoorBells through two
> + * consecutive writes, each write 32 bits. The access to a UAR page (which can
> + * be accessible by all threads in the process) must be synchronized
> + * (for example, using a semaphore). Such a synchronization is not required
> + * when ringing DoorBells on different UAR pages.
> + * A port with 512 Tx queues uses 8, 4kBytes, UAR pages which are shared
> + * among the ports.
> + */
> +#define MLX5_UAR_PAGE_NUM_MAX 64
> +#define MLX5_UAR_PAGE_NUM_MASK ((MLX5_UAR_PAGE_NUM_MAX) - 1)
> 
>  /* Log 2 of the default number of strides per WQE for Multi-Packet RQ. */
>  #define MLX5_MPRQ_STRIDE_NUM_N 6U
> diff --git a/drivers/net/mlx5/mlx5_rxq.c b/drivers/net/mlx5/mlx5_rxq.c
> index 08dd559..820048f 100644
> --- a/drivers/net/mlx5/mlx5_rxq.c
> +++ b/drivers/net/mlx5/mlx5_rxq.c
> @@ -643,7 +643,8 @@
>  	doorbell = (uint64_t)doorbell_hi << 32;
>  	doorbell |=  rxq->cqn;
>  	rxq->cq_db[MLX5_CQ_ARM_DB] = rte_cpu_to_be_32(doorbell_hi);
> -	rte_write64(rte_cpu_to_be_64(doorbell), cq_db_reg);
> +	mlx5_uar_write64(rte_cpu_to_be_64(doorbell),
> +			 cq_db_reg, rxq->uar_lock_cq);
>  }
> 
>  /**
> @@ -1445,6 +1446,9 @@ struct mlx5_rxq_ctrl *
>  	tmpl->rxq.elts_n = log2above(desc);
>  	tmpl->rxq.elts =
>  		(struct rte_mbuf *(*)[1 << tmpl->rxq.elts_n])(tmpl + 1);
> +#ifndef RTE_ARCH_64
> +	tmpl->rxq.uar_lock_cq = &priv->uar_lock_cq;
> +#endif
>  	tmpl->idx = idx;
>  	rte_atomic32_inc(&tmpl->refcnt);
>  	LIST_INSERT_HEAD(&priv->rxqsctrl, tmpl, next);
> diff --git a/drivers/net/mlx5/mlx5_rxtx.c b/drivers/net/mlx5/mlx5_rxtx.c
> index a7ed8d8..ec35ea0 100644
> --- a/drivers/net/mlx5/mlx5_rxtx.c
> +++ b/drivers/net/mlx5/mlx5_rxtx.c
> @@ -495,6 +495,7 @@
>  	volatile struct mlx5_wqe_ctrl *last_wqe = NULL;
>  	unsigned int segs_n = 0;
>  	const unsigned int max_inline = txq->max_inline;
> +	uint64_t addr_64;
> 
>  	if (unlikely(!pkts_n))
>  		return 0;
> @@ -711,12 +712,12 @@
>  			ds = 3;
>  use_dseg:
>  			/* Add the remaining packet as a simple ds. */
> -			addr = rte_cpu_to_be_64(addr);
> +			addr_64 = rte_cpu_to_be_64(addr);
>  			*dseg = (rte_v128u32_t){
>  				rte_cpu_to_be_32(length),
>  				mlx5_tx_mb2mr(txq, buf),
> -				addr,
> -				addr >> 32,
> +				addr_64,
> +				addr_64 >> 32,
>  			};
>  			++ds;
>  			if (!segs_n)
> @@ -750,12 +751,12 @@
>  		total_length += length;
>  #endif
>  		/* Store segment information. */
> -		addr = rte_cpu_to_be_64(rte_pktmbuf_mtod(buf, uintptr_t));
> +		addr_64 = rte_cpu_to_be_64(rte_pktmbuf_mtod(buf, uintptr_t));
>  		*dseg = (rte_v128u32_t){
>  			rte_cpu_to_be_32(length),
>  			mlx5_tx_mb2mr(txq, buf),
> -			addr,
> -			addr >> 32,
> +			addr_64,
> +			addr_64 >> 32,
>  		};
>  		(*txq->elts)[++elts_head & elts_m] = buf;
>  		if (--segs_n)
> @@ -1450,6 +1451,7 @@
>  	unsigned int mpw_room = 0;
>  	unsigned int inl_pad = 0;
>  	uint32_t inl_hdr;
> +	uint64_t addr_64;
>  	struct mlx5_mpw mpw = {
>  		.state = MLX5_MPW_STATE_CLOSED,
>  	};
> @@ -1586,13 +1588,13 @@
>  					((uintptr_t)mpw.data.raw +
>  					 inl_pad);
>  			(*txq->elts)[elts_head++ & elts_m] = buf;
> -			addr = rte_cpu_to_be_64(rte_pktmbuf_mtod(buf,
> -								 uintptr_t));
> +			addr_64 = rte_cpu_to_be_64(
> +					rte_pktmbuf_mtod(buf, uintptr_t));
>  			*dseg = (rte_v128u32_t) {
>  				rte_cpu_to_be_32(length),
>  				mlx5_tx_mb2mr(txq, buf),
> -				addr,
> -				addr >> 32,
> +				addr_64,
> +				addr_64 >> 32,
>  			};
>  			mpw.data.raw = (volatile void *)(dseg + 1);
>  			mpw.total_len += (inl_pad + sizeof(*dseg));
> diff --git a/drivers/net/mlx5/mlx5_rxtx.h b/drivers/net/mlx5/mlx5_rxtx.h
> index 0007be0..2448d73 100644
> --- a/drivers/net/mlx5/mlx5_rxtx.h
> +++ b/drivers/net/mlx5/mlx5_rxtx.h
> @@ -26,6 +26,8 @@
>  #include <rte_common.h>
>  #include <rte_hexdump.h>
>  #include <rte_atomic.h>
> +#include <rte_spinlock.h>
> +#include <rte_io.h>
> 
>  #include "mlx5_utils.h"
>  #include "mlx5.h"
> @@ -115,6 +117,10 @@ struct mlx5_rxq_data {
>  	void *cq_uar; /* CQ user access region. */
>  	uint32_t cqn; /* CQ number. */
>  	uint8_t cq_arm_sn; /* CQ arm seq number. */
> +#ifndef RTE_ARCH_64
> +	rte_spinlock_t *uar_lock_cq;
> +	/* CQ (UAR) access lock required for 32bit implementations */
> +#endif
>  	uint32_t tunnel; /* Tunnel information. */
>  } __rte_cache_aligned;
> 
> @@ -196,6 +202,10 @@ struct mlx5_txq_data {
>  	volatile void *bf_reg; /* Blueflame register remapped. */
>  	struct rte_mbuf *(*elts)[]; /* TX elements. */
>  	struct mlx5_txq_stats stats; /* TX queue counters. */
> +#ifndef RTE_ARCH_64
> +	rte_spinlock_t *uar_lock;
> +	/* UAR access lock required for 32bit implementations */
> +#endif
>  } __rte_cache_aligned;
> 
>  /* Verbs Rx queue elements. */
> @@ -348,6 +358,63 @@ uint16_t mlx5_rx_burst_vec(void *dpdk_txq, struct rte_mbuf **pkts,
>  uint32_t mlx5_rx_addr2mr_bh(struct mlx5_rxq_data *rxq, uintptr_t addr);
>  uint32_t mlx5_tx_addr2mr_bh(struct mlx5_txq_data *txq, uintptr_t addr);
> 
> +/**
> + * Provide safe 64bit store operation to mlx5 UAR region for both 32bit and
> + * 64bit architectures.
> + *
> + * @param val
> + *   value to write in CPU endian format.
> + * @param addr
> + *   Address to write to.
> + * @param lock
> + *   Address of the lock to use for that UAR access.
> + */
> +static __rte_always_inline void
> +__mlx5_uar_write64_relaxed(uint64_t val, volatile void *addr,
> +			   rte_spinlock_t *lock __rte_unused)
> +{
> +#ifdef RTE_ARCH_64
> +	rte_write64_relaxed(val, addr);
> +#else /* !RTE_ARCH_64 */
> +	rte_spinlock_lock(lock);
> +	rte_write32_relaxed(val, addr);
> +	rte_io_wmb();
> +	rte_write32_relaxed(val >> 32,
> +			    (volatile void *)((volatile char *)addr + 4));
> +	rte_spinlock_unlock(lock);
> +#endif
> +}
> +
> +/**
> + * Provide safe 64bit store operation to mlx5 UAR region for both 32bit and
> + * 64bit architectures while guaranteeing the order of execution with the
> + * code being executed.
> + *
> + * @param val
> + *   value to write in CPU endian format.
> + * @param addr
> + *   Address to write to.
> + * @param lock
> + *   Address of the lock to use for that UAR access.
> + */
> +static __rte_always_inline void
> +__mlx5_uar_write64(uint64_t val, volatile void *addr, rte_spinlock_t *lock)
> +{
> +	rte_io_wmb();
> +	__mlx5_uar_write64_relaxed(val, addr, lock);
> +}
> +
> +/* Assist macros, used instead of directly calling the functions they wrap. */
> +#ifdef RTE_ARCH_64
> +#define mlx5_uar_write64_relaxed(val, dst, lock) \
> +		__mlx5_uar_write64_relaxed(val, dst, NULL)
> +#define mlx5_uar_write64(val, dst, lock) __mlx5_uar_write64(val, dst, NULL)
> +#else
> +#define mlx5_uar_write64_relaxed(val, dst, lock) \
> +		__mlx5_uar_write64_relaxed(val, dst, lock)
> +#define mlx5_uar_write64(val, dst, lock) __mlx5_uar_write64(val, dst, lock)
> +#endif
> +
>  #ifndef NDEBUG
>  /**
>   * Verify or set magic value in CQE.
> @@ -614,7 +681,7 @@ uint16_t mlx5_rx_burst_vec(void *dpdk_txq, struct rte_mbuf **pkts,
>  	*txq->qp_db = rte_cpu_to_be_32(txq->wqe_ci);
>  	/* Ensure ordering between DB record and BF copy. */
>  	rte_wmb();
> -	*dst = *src;
> +	mlx5_uar_write64_relaxed(*src, dst, txq->uar_lock);
>  	if (cond)
>  		rte_wmb();
>  }
> diff --git a/drivers/net/mlx5/mlx5_txq.c b/drivers/net/mlx5/mlx5_txq.c
> index 669b913..dc786d4 100644
> --- a/drivers/net/mlx5/mlx5_txq.c
> +++ b/drivers/net/mlx5/mlx5_txq.c
> @@ -255,6 +255,9 @@
>  	struct mlx5_txq_ctrl *txq_ctrl;
>  	int already_mapped;
>  	size_t page_size = sysconf(_SC_PAGESIZE);
> +#ifndef RTE_ARCH_64
> +	unsigned int lock_idx;
> +#endif
> 
>  	memset(pages, 0, priv->txqs_n * sizeof(uintptr_t));
>  	/*
> @@ -281,7 +284,7 @@
>  		}
>  		/* new address in reserved UAR address space. */
>  		addr = RTE_PTR_ADD(priv->uar_base,
> -				   uar_va & (MLX5_UAR_SIZE - 1));
> +				   uar_va & (uintptr_t)(MLX5_UAR_SIZE - 1));
>  		if (!already_mapped) {
>  			pages[pages_n++] = uar_va;
>  			/* fixed mmap to specified address in reserved
> @@ -305,6 +308,12 @@
>  		else
>  			assert(txq_ctrl->txq.bf_reg ==
>  			       RTE_PTR_ADD((void *)addr, off));
> +#ifndef RTE_ARCH_64
> +		/* Assign a UAR lock according to UAR page number */
> +		lock_idx = (txq_ctrl->uar_mmap_offset / page_size) &
> +			   MLX5_UAR_PAGE_NUM_MASK;
> +		txq->uar_lock = &priv->uar_lock[lock_idx];
> +#endif
>  	}
>  	return 0;
>  }
> @@ -511,6 +520,8 @@ struct mlx5_txq_ibv *
>  	rte_atomic32_inc(&txq_ibv->refcnt);
>  	if (qp.comp_mask & MLX5DV_QP_MASK_UAR_MMAP_OFFSET) {
>  		txq_ctrl->uar_mmap_offset = qp.uar_mmap_offset;
> +		DRV_LOG(DEBUG, "port %u: uar_mmap_offset 0x%lx",
> +			dev->data->port_id, txq_ctrl->uar_mmap_offset);
>  	} else {
>  		DRV_LOG(ERR,
>  			"port %u failed to retrieve UAR info, invalid"
> --
> 1.8.3.1


* Re: [dpdk-dev] [PATCH] net/mlx5: add support for 32bit systems
  2018-07-02  7:05 ` Shahaf Shuler
@ 2018-07-02 10:39   ` Mordechay Haimovsky
  0 siblings, 0 replies; 17+ messages in thread
From: Mordechay Haimovsky @ 2018-07-02 10:39 UTC (permalink / raw)
  To: Shahaf Shuler, Yongseok Koh, Adrien Mazarguil; +Cc: dev

Inline


> -----Original Message-----
> From: Shahaf Shuler
> Sent: Monday, July 2, 2018 10:05 AM
> To: Mordechay Haimovsky <motih@mellanox.com>; Yongseok Koh
> <yskoh@mellanox.com>; Adrien Mazarguil <adrien.mazarguil@6wind.com>
> Cc: dev@dpdk.org; Mordechay Haimovsky <motih@mellanox.com>
> Subject: RE: [dpdk-dev] [PATCH] net/mlx5: add support for 32bit systems
> 
> Hi Moty,
> 
> Few nits,
> 
> Also please fix the check patch warning :
> ### net/mlx5: add support for 32bit systems
> 
> CHECK:OPEN_ENDED_LINE: Lines should not end with a '('
> #235: FILE: drivers/net/mlx5/mlx5_rxtx.c:1591:
> +                       addr_64 = rte_cpu_to_be_64(
> 
> total: 0 errors, 0 warnings, 1 checks, 311 lines checked
> 
> 
Will do.
Is there a way to know which kernel version's checkpatch the DPDK tools use?
For example, checkpatch from the 4.14.0-0.rc4.git4.1.el7.x86_64 kernel does not report this check.
> 
> Thursday, June 28, 2018 10:13 AM, Moti Haimovsky:
> > Subject: [dpdk-dev] [PATCH] net/mlx5: add support for 32bit systems
> >
> > This patch adds support for building and running mlx5 PMD on 32bit
> > systems such as i686.
> >
> > The main issue to tackle was handling the 32bit access to the UAR as
> > quoted from the mlx5 PRM:
> > QP and CQ DoorBells require 64-bit writes. For best performance, it is
> > recommended to execute the QP/CQ DoorBell as a single 64-bit write
> > operation. For platforms that do not support 64 bit writes, it is
> > possible to issue the 64 bits DoorBells through two consecutive
> > writes, each write 32 bits, as described below:
> > * The order of writing each of the Dwords is from lower to upper
> >   addresses.
> > * No other DoorBell can be rung (or even start ringing) in the midst of
> >   an on-going write of a DoorBell over a given UAR page.
> > The last rule implies that in a multi-threaded environment, the access
> > to a UAR page (which can be accessible by all threads in the process)
> > must be synchronized (for example, using a semaphore) unless an atomic
> > write of 64 bits in a single bus operation is guaranteed. Such a
> > synchronization is not required for when ringing DoorBells on different UAR
> pages.
> >
> > Signed-off-by: Moti Haimovsky <motih@mellanox.com>
> > ---
> >  doc/guides/nics/features/mlx5.ini |  1 +
> >  doc/guides/nics/mlx5.rst          | 11 +++++++
> >  drivers/net/mlx5/mlx5.c           |  8 ++++-
> >  drivers/net/mlx5/mlx5.h           |  5 +++
> >  drivers/net/mlx5/mlx5_defs.h      | 18 ++++++++--
> >  drivers/net/mlx5/mlx5_rxq.c       |  6 +++-
> >  drivers/net/mlx5/mlx5_rxtx.c      | 22 +++++++------
> >  drivers/net/mlx5/mlx5_rxtx.h      | 69 ++++++++++++++++++++++++++++++++++++++-
> >  drivers/net/mlx5/mlx5_txq.c       | 13 +++++++-
> >  9 files changed, 137 insertions(+), 16 deletions(-)
> >
> > diff --git a/doc/guides/nics/features/mlx5.ini
> > b/doc/guides/nics/features/mlx5.ini
> > index e75b14b..b28b43e 100644
> > --- a/doc/guides/nics/features/mlx5.ini
> > +++ b/doc/guides/nics/features/mlx5.ini
> > @@ -43,5 +43,6 @@ Multiprocess aware   = Y
> >  Other kdrv           = Y
> >  ARMv8                = Y
> >  Power8               = Y
> > +x86-32               = Y
> >  x86-64               = Y
> >  Usage doc            = Y
> > diff --git a/doc/guides/nics/mlx5.rst b/doc/guides/nics/mlx5.rst
> > index 7dd9c1c..cb9d5d8 100644
> > --- a/doc/guides/nics/mlx5.rst
> > +++ b/doc/guides/nics/mlx5.rst
> > @@ -50,6 +50,8 @@ Features
> >  --------
> >
> >  - Multi arch support: x86_64, POWER8, ARMv8.
> > +- Support for i686 is available only when working with
> > +  rdma-core version 18.0 or above, built with 32bit support.
> 
> I think we can just add i686 to the supported arch. The limitation on the
> rdma-core version is well documented below.
> 
Will change this

> >  - Multiple TX and RX queues.
> >  - Support for scattered TX and RX frames.
> >  - IPv4, IPv6, TCPv4, TCPv6, UDPv4 and UDPv6 RSS on any number of
> queues.
> > @@ -136,6 +138,11 @@ Limitations
> >    enabled (``rxq_cqe_comp_en``) at the same time, RSS hash result is
> > not fully
> >    supported. Some Rx packets may not have PKT_RX_RSS_HASH.
> >
> > +- Building for i686 is only supported with:
> > +
> > +  - rdma-core version 18.0 or above built with 32bit support.
> > +  - Kernel version 4.14.41 or above.
> 
> Why the kernel is related? The rdma-core I understand.
> 
A kernel patch fixed the broken 32bit support:
  f2e9bfac13c9 RDMA/rxe: Fix uABI structure layouts for 32/64 compat
(the SHA may have changed).
This patch was added in kernel 4.17 and backported to 4.14.41.

> > +
> >  Statistics
> >  ----------
> >
> > @@ -477,6 +484,10 @@ RMDA Core with Linux Kernel
> >  - Minimal kernel version : v4.14 or the most recent 4.14-rc (see
> > `Linux installation documentation`_)
> >  - Minimal rdma-core version: v15+ commit 0c5f5765213a ("Merge pull
> > request #227 from yishaih/tm")
> >    (see `RDMA Core installation documentation`_)
> > +- When building for i686 use:
> > +
> > +  - rdma-core version 18.0 or above built with 32bit support.
> > +  - Kernel version 4.14.41 or above.
> >
> >  .. _`Linux installation documentation`: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable.git/plain/Documentation/admin-guide/README.rst
> >  .. _`RDMA Core installation documentation`: https://raw.githubusercontent.com/linux-rdma/rdma-core/master/README.md
> > diff --git a/drivers/net/mlx5/mlx5.c b/drivers/net/mlx5/mlx5.c
> > index f0e6ed7..5d0f706 100644
> > --- a/drivers/net/mlx5/mlx5.c
> > +++ b/drivers/net/mlx5/mlx5.c
> > @@ -567,7 +567,7 @@
> >  	rte_memseg_walk(find_lower_va_bound, &addr);
> >
> >  	/* keep distance to hugepages to minimize potential conflicts. */
> > -	addr = RTE_PTR_SUB(addr, MLX5_UAR_OFFSET + MLX5_UAR_SIZE);
> > +	addr = RTE_PTR_SUB(addr, (uintptr_t)(MLX5_UAR_OFFSET + MLX5_UAR_SIZE));
> >  	/* anonymous mmap, no real memory consumption. */
> >  	addr = mmap(addr, MLX5_UAR_SIZE,
> >  		    PROT_NONE, MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
> > @@ -953,6 +953,12 @@
> >  		priv->port = port;
> >  		priv->pd = pd;
> >  		priv->mtu = ETHER_MTU;
> > +#ifndef RTE_ARCH_64
> > +		/* Initialize UAR access locks for 32bit implementations. */
> > +		rte_spinlock_init(&priv->uar_lock_cq);
> > +		for (i = 0; i < MLX5_UAR_PAGE_NUM_MAX; i++)
> > +			rte_spinlock_init(&priv->uar_lock[i]);
> > +#endif
> >  		err = mlx5_args(&config, pci_dev->device.devargs);
> >  		if (err) {
> >  			err = rte_errno;
> > diff --git a/drivers/net/mlx5/mlx5.h b/drivers/net/mlx5/mlx5.h index
> > 997b04a..2da32cd 100644
> > --- a/drivers/net/mlx5/mlx5.h
> > +++ b/drivers/net/mlx5/mlx5.h
> > @@ -198,6 +198,11 @@ struct priv {
> >  	/* Context for Verbs allocator. */
> >  	int nl_socket; /* Netlink socket. */
> >  	uint32_t nl_sn; /* Netlink message sequence number. */
> > +#ifndef RTE_ARCH_64
> > +	rte_spinlock_t uar_lock_cq; /* CQs share a common distinct UAR */
> > +	rte_spinlock_t uar_lock[MLX5_UAR_PAGE_NUM_MAX];
> > +	/* UAR same-page access control required in 32bit implementations. */
> > +#endif
> >  };
> >
> >  #define PORT_ID(priv) ((priv)->dev_data->port_id)
> > diff --git a/drivers/net/mlx5/mlx5_defs.h b/drivers/net/mlx5/mlx5_defs.h
> > index 5bbbec2..f6ec415 100644
> > --- a/drivers/net/mlx5/mlx5_defs.h
> > +++ b/drivers/net/mlx5/mlx5_defs.h
> > @@ -87,14 +87,28 @@
> >  #define MLX5_LINK_STATUS_TIMEOUT 10
> >
> >  /* Reserved address space for UAR mapping. */
> > -#define MLX5_UAR_SIZE (1ULL << 32)
> > +#define MLX5_UAR_SIZE (1ULL << (sizeof(uintptr_t) * 4))
> >
> >  /* Offset of reserved UAR address space to hugepage memory. Offset is used here
> >   * to minimize possibility of address next to hugepage being used by other code
> >   * in either primary or secondary process, failing to map TX UAR would make TX
> >   * packets invisible to HW.
> >   */
> > -#define MLX5_UAR_OFFSET (1ULL << 32)
> > +#define MLX5_UAR_OFFSET (1ULL << (sizeof(uintptr_t) * 4))
> > +
> > +/* Maximum number of UAR pages used by a port,
> > + * These are the size and mask for an array of mutexes used to synchronize
> > + * the access to port's UARs on platforms that do not support 64 bit writes.
> > + * In such systems it is possible to issue the 64 bits DoorBells through two
> > + * consecutive writes, each write 32 bits. The access to a UAR page (which can
> > + * be accessible by all threads in the process) must be synchronized
> > + * (for example, using a semaphore). Such a synchronization is not required
> > + * when ringing DoorBells on different UAR pages.
> > + * A port with 512 Tx queues uses 8, 4kBytes, UAR pages which are shared
> > + * among the ports.
> > + */
> > +#define MLX5_UAR_PAGE_NUM_MAX 64
> > +#define MLX5_UAR_PAGE_NUM_MASK ((MLX5_UAR_PAGE_NUM_MAX) - 1)
> >
> >  /* Log 2 of the default number of strides per WQE for Multi-Packet RQ. */
> >  #define MLX5_MPRQ_STRIDE_NUM_N 6U
> > diff --git a/drivers/net/mlx5/mlx5_rxq.c b/drivers/net/mlx5/mlx5_rxq.c
> > index 08dd559..820048f 100644
> > --- a/drivers/net/mlx5/mlx5_rxq.c
> > +++ b/drivers/net/mlx5/mlx5_rxq.c
> > @@ -643,7 +643,8 @@
> >  	doorbell = (uint64_t)doorbell_hi << 32;
> >  	doorbell |=  rxq->cqn;
> >  	rxq->cq_db[MLX5_CQ_ARM_DB] = rte_cpu_to_be_32(doorbell_hi);
> > -	rte_write64(rte_cpu_to_be_64(doorbell), cq_db_reg);
> > +	mlx5_uar_write64(rte_cpu_to_be_64(doorbell),
> > +			 cq_db_reg, rxq->uar_lock_cq);
> >  }
> >
> >  /**
> > @@ -1445,6 +1446,9 @@ struct mlx5_rxq_ctrl *
> >  	tmpl->rxq.elts_n = log2above(desc);
> >  	tmpl->rxq.elts =
> >  		(struct rte_mbuf *(*)[1 << tmpl->rxq.elts_n])(tmpl + 1);
> > +#ifndef RTE_ARCH_64
> > +	tmpl->rxq.uar_lock_cq = &priv->uar_lock_cq;
> > +#endif
> >  	tmpl->idx = idx;
> >  	rte_atomic32_inc(&tmpl->refcnt);
> >  	LIST_INSERT_HEAD(&priv->rxqsctrl, tmpl, next);
> > diff --git a/drivers/net/mlx5/mlx5_rxtx.c b/drivers/net/mlx5/mlx5_rxtx.c
> > index a7ed8d8..ec35ea0 100644
> > --- a/drivers/net/mlx5/mlx5_rxtx.c
> > +++ b/drivers/net/mlx5/mlx5_rxtx.c
> > @@ -495,6 +495,7 @@
> >  	volatile struct mlx5_wqe_ctrl *last_wqe = NULL;
> >  	unsigned int segs_n = 0;
> >  	const unsigned int max_inline = txq->max_inline;
> > +	uint64_t addr_64;
> >
> >  	if (unlikely(!pkts_n))
> >  		return 0;
> > @@ -711,12 +712,12 @@
> >  			ds = 3;
> >  use_dseg:
> >  			/* Add the remaining packet as a simple ds. */
> > -			addr = rte_cpu_to_be_64(addr);
> > +			addr_64 = rte_cpu_to_be_64(addr);
> >  			*dseg = (rte_v128u32_t){
> >  				rte_cpu_to_be_32(length),
> >  				mlx5_tx_mb2mr(txq, buf),
> > -				addr,
> > -				addr >> 32,
> > +				addr_64,
> > +				addr_64 >> 32,
> >  			};
> >  			++ds;
> >  			if (!segs_n)
> > @@ -750,12 +751,12 @@
> >  		total_length += length;
> >  #endif
> >  		/* Store segment information. */
> > -		addr = rte_cpu_to_be_64(rte_pktmbuf_mtod(buf, uintptr_t));
> > +		addr_64 = rte_cpu_to_be_64(rte_pktmbuf_mtod(buf, uintptr_t));
> >  		*dseg = (rte_v128u32_t){
> >  			rte_cpu_to_be_32(length),
> >  			mlx5_tx_mb2mr(txq, buf),
> > -			addr,
> > -			addr >> 32,
> > +			addr_64,
> > +			addr_64 >> 32,
> >  		};
> >  		(*txq->elts)[++elts_head & elts_m] = buf;
> >  		if (--segs_n)
> > @@ -1450,6 +1451,7 @@
> >  	unsigned int mpw_room = 0;
> >  	unsigned int inl_pad = 0;
> >  	uint32_t inl_hdr;
> > +	uint64_t addr_64;
> >  	struct mlx5_mpw mpw = {
> >  		.state = MLX5_MPW_STATE_CLOSED,
> >  	};
> > @@ -1586,13 +1588,13 @@
> >  					((uintptr_t)mpw.data.raw +
> >  					 inl_pad);
> >  			(*txq->elts)[elts_head++ & elts_m] = buf;
> > -			addr = rte_cpu_to_be_64(rte_pktmbuf_mtod(buf,
> > -								 uintptr_t));
> > +			addr_64 = rte_cpu_to_be_64(
> > +					rte_pktmbuf_mtod(buf, uintptr_t));
> >  			*dseg = (rte_v128u32_t) {
> >  				rte_cpu_to_be_32(length),
> >  				mlx5_tx_mb2mr(txq, buf),
> > -				addr,
> > -				addr >> 32,
> > +				addr_64,
> > +				addr_64 >> 32,
> >  			};
> >  			mpw.data.raw = (volatile void *)(dseg + 1);
> >  			mpw.total_len += (inl_pad + sizeof(*dseg));
> > diff --git a/drivers/net/mlx5/mlx5_rxtx.h b/drivers/net/mlx5/mlx5_rxtx.h
> > index 0007be0..2448d73 100644
> > --- a/drivers/net/mlx5/mlx5_rxtx.h
> > +++ b/drivers/net/mlx5/mlx5_rxtx.h
> > @@ -26,6 +26,8 @@
> >  #include <rte_common.h>
> >  #include <rte_hexdump.h>
> >  #include <rte_atomic.h>
> > +#include <rte_spinlock.h>
> > +#include <rte_io.h>
> >
> >  #include "mlx5_utils.h"
> >  #include "mlx5.h"
> > @@ -115,6 +117,10 @@ struct mlx5_rxq_data {
> >  	void *cq_uar; /* CQ user access region. */
> >  	uint32_t cqn; /* CQ number. */
> >  	uint8_t cq_arm_sn; /* CQ arm seq number. */
> > +#ifndef RTE_ARCH_64
> > +	rte_spinlock_t *uar_lock_cq;
> > +	/* CQ (UAR) access lock required for 32bit implementations */
> > +#endif
> >  	uint32_t tunnel; /* Tunnel information. */
> >  } __rte_cache_aligned;
> >
> > @@ -196,6 +202,10 @@ struct mlx5_txq_data {
> >  	volatile void *bf_reg; /* Blueflame register remapped. */
> >  	struct rte_mbuf *(*elts)[]; /* TX elements. */
> >  	struct mlx5_txq_stats stats; /* TX queue counters. */
> > +#ifndef RTE_ARCH_64
> > +	rte_spinlock_t *uar_lock;
> > +	/* UAR access lock required for 32bit implementations */
> > +#endif
> >  } __rte_cache_aligned;
> >
> >  /* Verbs Rx queue elements. */
> > @@ -348,6 +358,63 @@ uint16_t mlx5_rx_burst_vec(void *dpdk_txq, struct rte_mbuf **pkts,
> >  uint32_t mlx5_rx_addr2mr_bh(struct mlx5_rxq_data *rxq, uintptr_t addr);
> >  uint32_t mlx5_tx_addr2mr_bh(struct mlx5_txq_data *txq, uintptr_t addr);
> >
> > +/**
> > + * Provide safe 64bit store operation to mlx5 UAR region for both 32bit and
> > + * 64bit architectures.
> > + *
> > + * @param val
> > + *   value to write in CPU endian format.
> > + * @param addr
> > + *   Address to write to.
> > + * @param lock
> > + *   Address of the lock to use for that UAR access.
> > + */
> > +static __rte_always_inline void
> > +__mlx5_uar_write64_relaxed(uint64_t val, volatile void *addr,
> > +			   rte_spinlock_t *lock __rte_unused)
> > +{
> > +#ifdef RTE_ARCH_64
> > +	rte_write64_relaxed(val, addr);
> > +#else /* !RTE_ARCH_64 */
> > +	rte_spinlock_lock(lock);
> > +	rte_write32_relaxed(val, addr);
> > +	rte_io_wmb();
> > +	rte_write32_relaxed(val >> 32,
> > +			    (volatile void *)((volatile char *)addr + 4));
> > +	rte_spinlock_unlock(lock);
> > +#endif
> > +}
> > +
> > +/**
> > + * Provide safe 64bit store operation to mlx5 UAR region for both 32bit and
> > + * 64bit architectures while guaranteeing the order of execution with the
> > + * code being executed.
> > + *
> > + * @param val
> > + *   value to write in CPU endian format.
> > + * @param addr
> > + *   Address to write to.
> > + * @param lock
> > + *   Address of the lock to use for that UAR access.
> > + */
> > +static __rte_always_inline void
> > +__mlx5_uar_write64(uint64_t val, volatile void *addr, rte_spinlock_t *lock)
> > +{
> > +	rte_io_wmb();
> > +	__mlx5_uar_write64_relaxed(val, addr, lock);
> > +}
> > +
> > +/* Assist macros, used instead of directly calling the functions the wrap. */
> > +#ifdef RTE_ARCH_64
> > +#define mlx5_uar_write64_relaxed(val, dst, lock) \
> > +		__mlx5_uar_write64_relaxed(val, dst, NULL)
> > +#define mlx5_uar_write64(val, dst, lock) __mlx5_uar_write64(val, dst, NULL)
> > +#else
> > +#define mlx5_uar_write64_relaxed(val, dst, lock) \
> > +		__mlx5_uar_write64_relaxed(val, dst, lock)
> > +#define mlx5_uar_write64(val, dst, lock) __mlx5_uar_write64(val, dst, lock)
> > +#endif
> > +
> >  #ifndef NDEBUG
> >  /**
> >   * Verify or set magic value in CQE.
> > @@ -614,7 +681,7 @@ uint16_t mlx5_rx_burst_vec(void *dpdk_txq, struct rte_mbuf **pkts,
> >  	*txq->qp_db = rte_cpu_to_be_32(txq->wqe_ci);
> >  	/* Ensure ordering between DB record and BF copy. */
> >  	rte_wmb();
> > -	*dst = *src;
> > +	mlx5_uar_write64_relaxed(*src, dst, txq->uar_lock);
> >  	if (cond)
> >  		rte_wmb();
> >  }
> > diff --git a/drivers/net/mlx5/mlx5_txq.c b/drivers/net/mlx5/mlx5_txq.c
> > index 669b913..dc786d4 100644
> > --- a/drivers/net/mlx5/mlx5_txq.c
> > +++ b/drivers/net/mlx5/mlx5_txq.c
> > @@ -255,6 +255,9 @@
> >  	struct mlx5_txq_ctrl *txq_ctrl;
> >  	int already_mapped;
> >  	size_t page_size = sysconf(_SC_PAGESIZE);
> > +#ifndef RTE_ARCH_64
> > +	unsigned int lock_idx;
> > +#endif
> >
> >  	memset(pages, 0, priv->txqs_n * sizeof(uintptr_t));
> >  	/*
> > @@ -281,7 +284,7 @@
> >  		}
> >  		/* new address in reserved UAR address space. */
> >  		addr = RTE_PTR_ADD(priv->uar_base,
> > -				   uar_va & (MLX5_UAR_SIZE - 1));
> > +				   uar_va & (uintptr_t)(MLX5_UAR_SIZE - 1));
> >  		if (!already_mapped) {
> >  			pages[pages_n++] = uar_va;
> >  			/* fixed mmap to specified address in reserved
> > @@ -305,6 +308,12 @@
> >  		else
> >  			assert(txq_ctrl->txq.bf_reg ==
> >  			       RTE_PTR_ADD((void *)addr, off));
> > +#ifndef RTE_ARCH_64
> > +		/* Assign a UAR lock according to UAR page number */
> > +		lock_idx = (txq_ctrl->uar_mmap_offset / page_size) &
> > +			   MLX5_UAR_PAGE_NUM_MASK;
> > +		txq->uar_lock = &priv->uar_lock[lock_idx];
> > +#endif
> >  	}
> >  	return 0;
> >  }
> > @@ -511,6 +520,8 @@ struct mlx5_txq_ibv *
> >  	rte_atomic32_inc(&txq_ibv->refcnt);
> >  	if (qp.comp_mask & MLX5DV_QP_MASK_UAR_MMAP_OFFSET) {
> >  		txq_ctrl->uar_mmap_offset = qp.uar_mmap_offset;
> > +		DRV_LOG(DEBUG, "port %u: uar_mmap_offset 0x%lx",
> > +			dev->data->port_id, txq_ctrl->uar_mmap_offset);
> >  	} else {
> >  		DRV_LOG(ERR,
> >  			"port %u failed to retrieve UAR info, invalid"
> > --
> > 1.8.3.1

^ permalink raw reply	[flat|nested] 17+ messages in thread

* [dpdk-dev] [PATCH v2] net/mlx5: add support for 32bit systems
  2018-06-28  7:12 [dpdk-dev] [PATCH] net/mlx5: add support for 32bit systems Moti Haimovsky
  2018-07-02  7:05 ` Shahaf Shuler
@ 2018-07-02 11:11 ` Moti Haimovsky
  2018-07-02 20:59   ` Yongseok Koh
                     ` (2 more replies)
  1 sibling, 3 replies; 17+ messages in thread
From: Moti Haimovsky @ 2018-07-02 11:11 UTC (permalink / raw)
  To: shahafs; +Cc: adrien.mazarguil, dev, Moti Haimovsky

This patch adds support for building and running mlx5 PMD on
32bit systems such as i686.

The main issue to tackle was handling the 32bit access to the UAR
as quoted from the mlx5 PRM:
QP and CQ DoorBells require 64-bit writes. For best performance, it
is recommended to execute the QP/CQ DoorBell as a single 64-bit write
operation. For platforms that do not support 64 bit writes, it is
possible to issue the 64 bits DoorBells through two consecutive writes,
each write 32 bits, as described below:
* The order of writing each of the Dwords is from lower to upper
  addresses.
* No other DoorBell can be rung (or even start ringing) in the midst of
  an on-going write of a DoorBell over a given UAR page.
The last rule implies that in a multi-threaded environment, the access
to a UAR page (which can be accessible by all threads in the process)
must be synchronized (for example, using a semaphore) unless an atomic
write of 64 bits in a single bus operation is guaranteed. Such a
synchronization is not required for when ringing DoorBells on different
UAR pages.

Signed-off-by: Moti Haimovsky <motih@mellanox.com>
---
v2:
* Fixed coding style issues.
* Modified documentation according to review inputs.
* Fixed merge conflicts.
---
 doc/guides/nics/features/mlx5.ini |  1 +
 doc/guides/nics/mlx5.rst          |  6 +++-
 drivers/net/mlx5/mlx5.c           |  8 ++++-
 drivers/net/mlx5/mlx5.h           |  5 +++
 drivers/net/mlx5/mlx5_defs.h      | 18 ++++++++--
 drivers/net/mlx5/mlx5_rxq.c       |  6 +++-
 drivers/net/mlx5/mlx5_rxtx.c      | 22 +++++++------
 drivers/net/mlx5/mlx5_rxtx.h      | 69 ++++++++++++++++++++++++++++++++++++++-
 drivers/net/mlx5/mlx5_txq.c       | 13 +++++++-
 9 files changed, 131 insertions(+), 17 deletions(-)
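
A note on the addr -> addr_64 changes in mlx5_rxtx.c: the likely reason (an
assumption, it is not spelled out in the commit message) is that addr is
pointer-sized, so on i686 it cannot hold the 64-bit big-endian value and
"addr >> 32" is not a valid shift. The sketch below uses hypothetical demo_*
helpers, with a hand-written byte swap standing in for rte_cpu_to_be_64() on a
little-endian host, to show where the address bytes end up once the value is
widened first.

```c
#include <stdint.h>

/* Hypothetical helper: 64-bit byte swap, i.e. what a CPU-to-big-endian
 * conversion does on a little-endian host. */
static uint64_t
demo_bswap64(uint64_t v)
{
	v = ((v & 0x00ff00ff00ff00ffULL) << 8) |
	    ((v >> 8) & 0x00ff00ff00ff00ffULL);
	v = ((v & 0x0000ffff0000ffffULL) << 16) |
	    ((v >> 16) & 0x0000ffff0000ffffULL);
	return (v << 32) | (v >> 32);
}

/* The two 32-bit words that would be stored in the data segment. */
struct demo_dseg_addr {
	uint32_t lo;
	uint32_t hi;
};

/* Widen a 32-bit buffer address to 64 bits before converting and splitting. */
static struct demo_dseg_addr
demo_split_be64(uint32_t addr32)
{
	uint64_t addr_64 = demo_bswap64((uint64_t)addr32);
	struct demo_dseg_addr out = {
		(uint32_t)addr_64,
		(uint32_t)(addr_64 >> 32),
	};

	return out;
}
```

For a 32-bit address such as 0x12345678 the swapped value is
0x7856341200000000, so all meaningful bytes sit in the upper word. Keeping the
result in a 32-bit variable would discard exactly that half.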

diff --git a/doc/guides/nics/features/mlx5.ini b/doc/guides/nics/features/mlx5.ini
index e75b14b..b28b43e 100644
--- a/doc/guides/nics/features/mlx5.ini
+++ b/doc/guides/nics/features/mlx5.ini
@@ -43,5 +43,6 @@ Multiprocess aware   = Y
 Other kdrv           = Y
 ARMv8                = Y
 Power8               = Y
+x86-32               = Y
 x86-64               = Y
 Usage doc            = Y
diff --git a/doc/guides/nics/mlx5.rst b/doc/guides/nics/mlx5.rst
index 7dd9c1c..5fbad60 100644
--- a/doc/guides/nics/mlx5.rst
+++ b/doc/guides/nics/mlx5.rst
@@ -49,7 +49,7 @@ libibverbs.
 Features
 --------
 
-- Multi arch support: x86_64, POWER8, ARMv8.
+- Multi arch support: x86_64, POWER8, ARMv8, i686.
 - Multiple TX and RX queues.
 - Support for scattered TX and RX frames.
 - IPv4, IPv6, TCPv4, TCPv6, UDPv4 and UDPv6 RSS on any number of queues.
@@ -477,6 +477,10 @@ RMDA Core with Linux Kernel
 - Minimal kernel version : v4.14 or the most recent 4.14-rc (see `Linux installation documentation`_)
 - Minimal rdma-core version: v15+ commit 0c5f5765213a ("Merge pull request #227 from yishaih/tm")
   (see `RDMA Core installation documentation`_)
+- When building for i686 use:
+
+  - rdma-core version 18.0 or above built with 32bit support.
+  - Kernel version 4.14.41 or above.
 
 .. _`Linux installation documentation`: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable.git/plain/Documentation/admin-guide/README.rst
 .. _`RDMA Core installation documentation`: https://raw.githubusercontent.com/linux-rdma/rdma-core/master/README.md
diff --git a/drivers/net/mlx5/mlx5.c b/drivers/net/mlx5/mlx5.c
index f0e6ed7..5d0f706 100644
--- a/drivers/net/mlx5/mlx5.c
+++ b/drivers/net/mlx5/mlx5.c
@@ -567,7 +567,7 @@
 	rte_memseg_walk(find_lower_va_bound, &addr);
 
 	/* keep distance to hugepages to minimize potential conflicts. */
-	addr = RTE_PTR_SUB(addr, MLX5_UAR_OFFSET + MLX5_UAR_SIZE);
+	addr = RTE_PTR_SUB(addr, (uintptr_t)(MLX5_UAR_OFFSET + MLX5_UAR_SIZE));
 	/* anonymous mmap, no real memory consumption. */
 	addr = mmap(addr, MLX5_UAR_SIZE,
 		    PROT_NONE, MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
@@ -953,6 +953,12 @@
 		priv->port = port;
 		priv->pd = pd;
 		priv->mtu = ETHER_MTU;
+#ifndef RTE_ARCH_64
+		/* Initialize UAR access locks for 32bit implementations. */
+		rte_spinlock_init(&priv->uar_lock_cq);
+		for (i = 0; i < MLX5_UAR_PAGE_NUM_MAX; i++)
+			rte_spinlock_init(&priv->uar_lock[i]);
+#endif
 		err = mlx5_args(&config, pci_dev->device.devargs);
 		if (err) {
 			err = rte_errno;
diff --git a/drivers/net/mlx5/mlx5.h b/drivers/net/mlx5/mlx5.h
index 997b04a..2da32cd 100644
--- a/drivers/net/mlx5/mlx5.h
+++ b/drivers/net/mlx5/mlx5.h
@@ -198,6 +198,11 @@ struct priv {
 	/* Context for Verbs allocator. */
 	int nl_socket; /* Netlink socket. */
 	uint32_t nl_sn; /* Netlink message sequence number. */
+#ifndef RTE_ARCH_64
+	rte_spinlock_t uar_lock_cq; /* CQs share a common distinct UAR */
+	rte_spinlock_t uar_lock[MLX5_UAR_PAGE_NUM_MAX];
+	/* UAR same-page access control required in 32bit implementations. */
+#endif
 };
 
 #define PORT_ID(priv) ((priv)->dev_data->port_id)
diff --git a/drivers/net/mlx5/mlx5_defs.h b/drivers/net/mlx5/mlx5_defs.h
index 5bbbec2..f6ec415 100644
--- a/drivers/net/mlx5/mlx5_defs.h
+++ b/drivers/net/mlx5/mlx5_defs.h
@@ -87,14 +87,28 @@
 #define MLX5_LINK_STATUS_TIMEOUT 10
 
 /* Reserved address space for UAR mapping. */
-#define MLX5_UAR_SIZE (1ULL << 32)
+#define MLX5_UAR_SIZE (1ULL << (sizeof(uintptr_t) * 4))
 
 /* Offset of reserved UAR address space to hugepage memory. Offset is used here
  * to minimize possibility of address next to hugepage being used by other code
  * in either primary or secondary process, failing to map TX UAR would make TX
  * packets invisible to HW.
  */
-#define MLX5_UAR_OFFSET (1ULL << 32)
+#define MLX5_UAR_OFFSET (1ULL << (sizeof(uintptr_t) * 4))
+
+/* Maximum number of UAR pages used by a port.
+ * These are the size and mask of an array of locks used to synchronize
+ * access to the port's UARs on platforms that do not support 64-bit writes.
+ * On such systems the 64-bit DoorBells may be issued as two consecutive
+ * 32-bit writes. Access to a UAR page (which can be accessible by all
+ * threads in the process) must then be synchronized (for example, using a
+ * semaphore). Such synchronization is not required when ringing DoorBells
+ * on different UAR pages.
+ * A port with 512 Tx queues uses 8 UAR pages of 4 kB each, which are shared
+ * among the ports.
+ */
+#define MLX5_UAR_PAGE_NUM_MAX 64
+#define MLX5_UAR_PAGE_NUM_MASK ((MLX5_UAR_PAGE_NUM_MAX) - 1)
 
 /* Log 2 of the default number of strides per WQE for Multi-Packet RQ. */
 #define MLX5_MPRQ_STRIDE_NUM_N 6U
diff --git a/drivers/net/mlx5/mlx5_rxq.c b/drivers/net/mlx5/mlx5_rxq.c
index fd0df17..3c036a8 100644
--- a/drivers/net/mlx5/mlx5_rxq.c
+++ b/drivers/net/mlx5/mlx5_rxq.c
@@ -645,7 +645,8 @@
 	doorbell = (uint64_t)doorbell_hi << 32;
 	doorbell |=  rxq->cqn;
 	rxq->cq_db[MLX5_CQ_ARM_DB] = rte_cpu_to_be_32(doorbell_hi);
-	rte_write64(rte_cpu_to_be_64(doorbell), cq_db_reg);
+	mlx5_uar_write64(rte_cpu_to_be_64(doorbell),
+			 cq_db_reg, rxq->uar_lock_cq);
 }
 
 /**
@@ -1447,6 +1448,9 @@ struct mlx5_rxq_ctrl *
 	tmpl->rxq.elts_n = log2above(desc);
 	tmpl->rxq.elts =
 		(struct rte_mbuf *(*)[1 << tmpl->rxq.elts_n])(tmpl + 1);
+#ifndef RTE_ARCH_64
+	tmpl->rxq.uar_lock_cq = &priv->uar_lock_cq;
+#endif
 	tmpl->idx = idx;
 	rte_atomic32_inc(&tmpl->refcnt);
 	LIST_INSERT_HEAD(&priv->rxqsctrl, tmpl, next);
diff --git a/drivers/net/mlx5/mlx5_rxtx.c b/drivers/net/mlx5/mlx5_rxtx.c
index a7ed8d8..52a1074 100644
--- a/drivers/net/mlx5/mlx5_rxtx.c
+++ b/drivers/net/mlx5/mlx5_rxtx.c
@@ -495,6 +495,7 @@
 	volatile struct mlx5_wqe_ctrl *last_wqe = NULL;
 	unsigned int segs_n = 0;
 	const unsigned int max_inline = txq->max_inline;
+	uint64_t addr_64;
 
 	if (unlikely(!pkts_n))
 		return 0;
@@ -711,12 +712,12 @@
 			ds = 3;
 use_dseg:
 			/* Add the remaining packet as a simple ds. */
-			addr = rte_cpu_to_be_64(addr);
+			addr_64 = rte_cpu_to_be_64(addr);
 			*dseg = (rte_v128u32_t){
 				rte_cpu_to_be_32(length),
 				mlx5_tx_mb2mr(txq, buf),
-				addr,
-				addr >> 32,
+				addr_64,
+				addr_64 >> 32,
 			};
 			++ds;
 			if (!segs_n)
@@ -750,12 +751,12 @@
 		total_length += length;
 #endif
 		/* Store segment information. */
-		addr = rte_cpu_to_be_64(rte_pktmbuf_mtod(buf, uintptr_t));
+		addr_64 = rte_cpu_to_be_64(rte_pktmbuf_mtod(buf, uintptr_t));
 		*dseg = (rte_v128u32_t){
 			rte_cpu_to_be_32(length),
 			mlx5_tx_mb2mr(txq, buf),
-			addr,
-			addr >> 32,
+			addr_64,
+			addr_64 >> 32,
 		};
 		(*txq->elts)[++elts_head & elts_m] = buf;
 		if (--segs_n)
@@ -1450,6 +1451,7 @@
 	unsigned int mpw_room = 0;
 	unsigned int inl_pad = 0;
 	uint32_t inl_hdr;
+	uint64_t addr_64;
 	struct mlx5_mpw mpw = {
 		.state = MLX5_MPW_STATE_CLOSED,
 	};
@@ -1586,13 +1588,13 @@
 					((uintptr_t)mpw.data.raw +
 					 inl_pad);
 			(*txq->elts)[elts_head++ & elts_m] = buf;
-			addr = rte_cpu_to_be_64(rte_pktmbuf_mtod(buf,
-								 uintptr_t));
+			addr_64 = rte_cpu_to_be_64(rte_pktmbuf_mtod(buf,
+								    uintptr_t));
 			*dseg = (rte_v128u32_t) {
 				rte_cpu_to_be_32(length),
 				mlx5_tx_mb2mr(txq, buf),
-				addr,
-				addr >> 32,
+				addr_64,
+				addr_64 >> 32,
 			};
 			mpw.data.raw = (volatile void *)(dseg + 1);
 			mpw.total_len += (inl_pad + sizeof(*dseg));
diff --git a/drivers/net/mlx5/mlx5_rxtx.h b/drivers/net/mlx5/mlx5_rxtx.h
index 0007be0..2448d73 100644
--- a/drivers/net/mlx5/mlx5_rxtx.h
+++ b/drivers/net/mlx5/mlx5_rxtx.h
@@ -26,6 +26,8 @@
 #include <rte_common.h>
 #include <rte_hexdump.h>
 #include <rte_atomic.h>
+#include <rte_spinlock.h>
+#include <rte_io.h>
 
 #include "mlx5_utils.h"
 #include "mlx5.h"
@@ -115,6 +117,10 @@ struct mlx5_rxq_data {
 	void *cq_uar; /* CQ user access region. */
 	uint32_t cqn; /* CQ number. */
 	uint8_t cq_arm_sn; /* CQ arm seq number. */
+#ifndef RTE_ARCH_64
+	rte_spinlock_t *uar_lock_cq;
+	/* CQ (UAR) access lock required for 32bit implementations */
+#endif
 	uint32_t tunnel; /* Tunnel information. */
 } __rte_cache_aligned;
 
@@ -196,6 +202,10 @@ struct mlx5_txq_data {
 	volatile void *bf_reg; /* Blueflame register remapped. */
 	struct rte_mbuf *(*elts)[]; /* TX elements. */
 	struct mlx5_txq_stats stats; /* TX queue counters. */
+#ifndef RTE_ARCH_64
+	rte_spinlock_t *uar_lock;
+	/* UAR access lock required for 32bit implementations */
+#endif
 } __rte_cache_aligned;
 
 /* Verbs Rx queue elements. */
@@ -348,6 +358,63 @@ uint16_t mlx5_rx_burst_vec(void *dpdk_txq, struct rte_mbuf **pkts,
 uint32_t mlx5_rx_addr2mr_bh(struct mlx5_rxq_data *rxq, uintptr_t addr);
 uint32_t mlx5_tx_addr2mr_bh(struct mlx5_txq_data *txq, uintptr_t addr);
 
+/**
+ * Provide safe 64bit store operation to mlx5 UAR region for both 32bit and
+ * 64bit architectures.
+ *
+ * @param val
+ *   value to write in CPU endian format.
+ * @param addr
+ *   Address to write to.
+ * @param lock
+ *   Address of the lock to use for that UAR access.
+ */
+static __rte_always_inline void
+__mlx5_uar_write64_relaxed(uint64_t val, volatile void *addr,
+			   rte_spinlock_t *lock __rte_unused)
+{
+#ifdef RTE_ARCH_64
+	rte_write64_relaxed(val, addr);
+#else /* !RTE_ARCH_64 */
+	rte_spinlock_lock(lock);
+	rte_write32_relaxed(val, addr);
+	rte_io_wmb();
+	rte_write32_relaxed(val >> 32,
+			    (volatile void *)((volatile char *)addr + 4));
+	rte_spinlock_unlock(lock);
+#endif
+}
+
+/**
+ * Provide safe 64bit store operation to mlx5 UAR region for both 32bit and
+ * 64bit architectures while guaranteeing the order of execution with the
+ * code being executed.
+ *
+ * @param val
+ *   value to write in CPU endian format.
+ * @param addr
+ *   Address to write to.
+ * @param lock
+ *   Address of the lock to use for that UAR access.
+ */
+static __rte_always_inline void
+__mlx5_uar_write64(uint64_t val, volatile void *addr, rte_spinlock_t *lock)
+{
+	rte_io_wmb();
+	__mlx5_uar_write64_relaxed(val, addr, lock);
+}
+
+/* Assist macros, used instead of directly calling the functions they wrap. */
+#ifdef RTE_ARCH_64
+#define mlx5_uar_write64_relaxed(val, dst, lock) \
+		__mlx5_uar_write64_relaxed(val, dst, NULL)
+#define mlx5_uar_write64(val, dst, lock) __mlx5_uar_write64(val, dst, NULL)
+#else
+#define mlx5_uar_write64_relaxed(val, dst, lock) \
+		__mlx5_uar_write64_relaxed(val, dst, lock)
+#define mlx5_uar_write64(val, dst, lock) __mlx5_uar_write64(val, dst, lock)
+#endif
+
 #ifndef NDEBUG
 /**
  * Verify or set magic value in CQE.
@@ -614,7 +681,7 @@ uint16_t mlx5_rx_burst_vec(void *dpdk_txq, struct rte_mbuf **pkts,
 	*txq->qp_db = rte_cpu_to_be_32(txq->wqe_ci);
 	/* Ensure ordering between DB record and BF copy. */
 	rte_wmb();
-	*dst = *src;
+	mlx5_uar_write64_relaxed(*src, dst, txq->uar_lock);
 	if (cond)
 		rte_wmb();
 }
diff --git a/drivers/net/mlx5/mlx5_txq.c b/drivers/net/mlx5/mlx5_txq.c
index 669b913..dc786d4 100644
--- a/drivers/net/mlx5/mlx5_txq.c
+++ b/drivers/net/mlx5/mlx5_txq.c
@@ -255,6 +255,9 @@
 	struct mlx5_txq_ctrl *txq_ctrl;
 	int already_mapped;
 	size_t page_size = sysconf(_SC_PAGESIZE);
+#ifndef RTE_ARCH_64
+	unsigned int lock_idx;
+#endif
 
 	memset(pages, 0, priv->txqs_n * sizeof(uintptr_t));
 	/*
@@ -281,7 +284,7 @@
 		}
 		/* new address in reserved UAR address space. */
 		addr = RTE_PTR_ADD(priv->uar_base,
-				   uar_va & (MLX5_UAR_SIZE - 1));
+				   uar_va & (uintptr_t)(MLX5_UAR_SIZE - 1));
 		if (!already_mapped) {
 			pages[pages_n++] = uar_va;
 			/* fixed mmap to specified address in reserved
@@ -305,6 +308,12 @@
 		else
 			assert(txq_ctrl->txq.bf_reg ==
 			       RTE_PTR_ADD((void *)addr, off));
+#ifndef RTE_ARCH_64
+		/* Assign a UAR lock according to UAR page number */
+		lock_idx = (txq_ctrl->uar_mmap_offset / page_size) &
+			   MLX5_UAR_PAGE_NUM_MASK;
+		txq->uar_lock = &priv->uar_lock[lock_idx];
+#endif
 	}
 	return 0;
 }
@@ -511,6 +520,8 @@ struct mlx5_txq_ibv *
 	rte_atomic32_inc(&txq_ibv->refcnt);
 	if (qp.comp_mask & MLX5DV_QP_MASK_UAR_MMAP_OFFSET) {
 		txq_ctrl->uar_mmap_offset = qp.uar_mmap_offset;
+		DRV_LOG(DEBUG, "port %u: uar_mmap_offset 0x%lx",
+			dev->data->port_id, txq_ctrl->uar_mmap_offset);
 	} else {
 		DRV_LOG(ERR,
 			"port %u failed to retrieve UAR info, invalid"
-- 
1.8.3.1
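
The 32bit path of the UAR write helpers and the per-page lock selection
in the patch above can be sketched in isolation as follows. This is a
minimal, self-contained illustration and not the driver code: the
sketch_* names, the C11 atomic flag standing in for rte_spinlock_t, the
fence between the two stores, and the mask value of 63 are all
assumptions made for the example.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

/* Assumed values for illustration; the real constants live in mlx5_defs.h. */
#define SKETCH_UAR_PAGE_NUM_MAX  64u
#define SKETCH_UAR_PAGE_NUM_MASK (SKETCH_UAR_PAGE_NUM_MAX - 1u)

/* Hypothetical stand-in for rte_spinlock_t, built on a C11 atomic flag. */
typedef struct {
	atomic_flag busy;
} sketch_lock_t;

static void
sketch_lock(sketch_lock_t *l)
{
	/* Spin until the flag was previously clear. */
	while (atomic_flag_test_and_set_explicit(&l->busy,
						 memory_order_acquire))
		;
}

static void
sketch_unlock(sketch_lock_t *l)
{
	atomic_flag_clear_explicit(&l->busy, memory_order_release);
}

/* Map a UAR mmap offset to a lock slot, one slot per UAR page number. */
static unsigned int
sketch_uar_lock_index(uint64_t uar_mmap_offset, uint64_t page_size)
{
	assert(page_size != 0);
	return (unsigned int)((uar_mmap_offset / page_size) &
			      SKETCH_UAR_PAGE_NUM_MASK);
}

/*
 * Ring a 64bit doorbell as two 32bit stores, lower address first, while
 * holding the lock of the UAR page so that no other doorbell on that page
 * can start in between. On a little-endian host such as i686 the result
 * has the same byte layout as a single 64bit store of val.
 */
static void
sketch_uar_write64_split(uint64_t val, volatile void *addr,
			 sketch_lock_t *lock)
{
	volatile uint32_t *dw = (volatile uint32_t *)addr;

	sketch_lock(lock);
	dw[0] = (uint32_t)val;                     /* lower address first */
	atomic_thread_fence(memory_order_release); /* assumed write barrier */
	dw[1] = (uint32_t)(val >> 32);             /* then upper address */
	sketch_unlock(lock);
}
```

A caller would initialise one sketch_lock_t per slot with
ATOMIC_FLAG_INIT, pick the slot once at queue setup with
sketch_uar_lock_index(), and pass that lock on every doorbell write.
Queues whose UAR pages map to different slots never contend, which
matches the PRM statement that no synchronization is needed across
different UAR pages.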

* Re: [dpdk-dev] [PATCH v2] net/mlx5: add support for 32bit systems
  2018-07-02 11:11 ` [dpdk-dev] [PATCH v2] " Moti Haimovsky
@ 2018-07-02 20:59   ` Yongseok Koh
  2018-07-03 12:03     ` Shahaf Shuler
  2018-07-04 13:48   ` Ferruh Yigit
  2018-07-12 12:01   ` [dpdk-dev] [PATCH v3] " Moti Haimovsky
  2 siblings, 1 reply; 17+ messages in thread
From: Yongseok Koh @ 2018-07-02 20:59 UTC (permalink / raw)
  To: Mordechay Haimovsky; +Cc: Shahaf Shuler, Adrien Mazarguil, dev


> On Jul 2, 2018, at 4:11 AM, Moti Haimovsky <motih@mellanox.com> wrote:
> 
> This patch adds support for building and running mlx5 PMD on
> 32bit systems such as i686.
> 
> The main issue to tackle was handling the 32bit access to the UAR
> as quoted from the mlx5 PRM:
> QP and CQ DoorBells require 64-bit writes. For best performance, it
> is recommended to execute the QP/CQ DoorBell as a single 64-bit write
> operation. For platforms that do not support 64 bit writes, it is
> possible to issue the 64 bits DoorBells through two consecutive writes,
> each write 32 bits, as described below:
> * The order of writing each of the Dwords is from lower to upper
>  addresses.
> * No other DoorBell can be rung (or even start ringing) in the midst of
>  an on-going write of a DoorBell over a given UAR page.
> The last rule implies that in a multi-threaded environment, the access
> to a UAR page (which can be accessible by all threads in the process)
> must be synchronized (for example, using a semaphore) unless an atomic
> write of 64 bits in a single bus operation is guaranteed. Such a
> synchronization is not required for when ringing DoorBells on different
> UAR pages.
> 
> Signed-off-by: Moti Haimovsky <motih@mellanox.com>
> ---
Acked-by: Yongseok Koh <yskoh@mellanox.com>
 
Thanks

* Re: [dpdk-dev] [PATCH v2] net/mlx5: add support for 32bit systems
  2018-07-02 20:59   ` Yongseok Koh
@ 2018-07-03 12:03     ` Shahaf Shuler
  0 siblings, 0 replies; 17+ messages in thread
From: Shahaf Shuler @ 2018-07-03 12:03 UTC (permalink / raw)
  To: Yongseok Koh, Mordechay Haimovsky
  Cc: Adrien Mazarguil, dev, Saleh Alsouqi, Raslan Darawsheh

Tuesday, July 3, 2018 12:00 AM, Yongseok Koh:
> Subject: Re: [dpdk-dev] [PATCH v2] net/mlx5: add support for 32bit systems
> 
> 
> > On Jul 2, 2018, at 4:11 AM, Moti Haimovsky <motih@mellanox.com> wrote:
> >
> > This patch adds support for building and running mlx5 PMD on 32bit
> > systems such as i686.
> >
> > The main issue to tackle was handling the 32bit access to the UAR as
> > quoted from the mlx5 PRM:
> > QP and CQ DoorBells require 64-bit writes. For best performance, it is
> > recommended to execute the QP/CQ DoorBell as a single 64-bit write
> > operation. For platforms that do not support 64 bit writes, it is
> > possible to issue the 64 bits DoorBells through two consecutive
> > writes, each write 32 bits, as described below:
> > * The order of writing each of the Dwords is from lower to upper
> > addresses.
> > * No other DoorBell can be rung (or even start ringing) in the midst
> > of  an on-going write of a DoorBell over a given UAR page.
> > The last rule implies that in a multi-threaded environment, the access
> > to a UAR page (which can be accessible by all threads in the process)
> > must be synchronized (for example, using a semaphore) unless an atomic
> > write of 64 bits in a single bus operation is guaranteed. Such a
> > synchronization is not required for when ringing DoorBells on
> > different UAR pages.
> >
> > Signed-off-by: Moti Haimovsky <motih@mellanox.com>
> > ---
> Acked-by: Yongseok Koh <yskoh@mellanox.com>

Applied to next-net-mlx, thanks. 

> 
> Thanks

* Re: [dpdk-dev] [PATCH v2] net/mlx5: add support for 32bit systems
  2018-07-02 11:11 ` [dpdk-dev] [PATCH v2] " Moti Haimovsky
  2018-07-02 20:59   ` Yongseok Koh
@ 2018-07-04 13:48   ` Ferruh Yigit
  2018-07-05 10:09     ` Mordechay Haimovsky
  2018-07-08 17:04     ` Mordechay Haimovsky
  2018-07-12 12:01   ` [dpdk-dev] [PATCH v3] " Moti Haimovsky
  2 siblings, 2 replies; 17+ messages in thread
From: Ferruh Yigit @ 2018-07-04 13:48 UTC (permalink / raw)
  To: Moti Haimovsky, shahafs; +Cc: adrien.mazarguil, dev

On 7/2/2018 12:11 PM, Moti Haimovsky wrote:
> This patch adds support for building and running mlx5 PMD on
> 32bit systems such as i686.
> 
> The main issue to tackle was handling the 32bit access to the UAR
> as quoted from the mlx5 PRM:
> QP and CQ DoorBells require 64-bit writes. For best performance, it
> is recommended to execute the QP/CQ DoorBell as a single 64-bit write
> operation. For platforms that do not support 64 bit writes, it is
> possible to issue the 64 bits DoorBells through two consecutive writes,
> each write 32 bits, as described below:
> * The order of writing each of the Dwords is from lower to upper
>   addresses.
> * No other DoorBell can be rung (or even start ringing) in the midst of
>   an on-going write of a DoorBell over a given UAR page.
> The last rule implies that in a multi-threaded environment, the access
> to a UAR page (which can be accessible by all threads in the process)
> must be synchronized (for example, using a semaphore) unless an atomic
> write of 64 bits in a single bus operation is guaranteed. Such a
> synchronization is not required for when ringing DoorBells on different
> UAR pages.
> 
> Signed-off-by: Moti Haimovsky <motih@mellanox.com>
> ---
> v2:
> * Fixed coding style issues.
> * Modified documentation according to review inputs.
> * Fixed merge conflicts.
> ---
>  doc/guides/nics/features/mlx5.ini |  1 +
>  doc/guides/nics/mlx5.rst          |  6 +++-
>  drivers/net/mlx5/mlx5.c           |  8 ++++-
>  drivers/net/mlx5/mlx5.h           |  5 +++
>  drivers/net/mlx5/mlx5_defs.h      | 18 ++++++++--
>  drivers/net/mlx5/mlx5_rxq.c       |  6 +++-
>  drivers/net/mlx5/mlx5_rxtx.c      | 22 +++++++------
>  drivers/net/mlx5/mlx5_rxtx.h      | 69 ++++++++++++++++++++++++++++++++++++++-
>  drivers/net/mlx5/mlx5_txq.c       | 13 +++++++-
>  9 files changed, 131 insertions(+), 17 deletions(-)
> 
> diff --git a/doc/guides/nics/features/mlx5.ini b/doc/guides/nics/features/mlx5.ini
> index e75b14b..b28b43e 100644
> --- a/doc/guides/nics/features/mlx5.ini
> +++ b/doc/guides/nics/features/mlx5.ini
> @@ -43,5 +43,6 @@ Multiprocess aware   = Y
>  Other kdrv           = Y
>  ARMv8                = Y
>  Power8               = Y
> +x86-32               = Y
>  x86-64               = Y
>  Usage doc            = Y
> diff --git a/doc/guides/nics/mlx5.rst b/doc/guides/nics/mlx5.rst
> index 7dd9c1c..5fbad60 100644
> --- a/doc/guides/nics/mlx5.rst
> +++ b/doc/guides/nics/mlx5.rst
> @@ -49,7 +49,7 @@ libibverbs.
>  Features
>  --------
>  
> -- Multi arch support: x86_64, POWER8, ARMv8.
> +- Multi arch support: x86_64, POWER8, ARMv8, i686.
>  - Multiple TX and RX queues.
>  - Support for scattered TX and RX frames.
>  - IPv4, IPv6, TCPv4, TCPv6, UDPv4 and UDPv6 RSS on any number of queues.
> @@ -477,6 +477,10 @@ RMDA Core with Linux Kernel
>  - Minimal kernel version : v4.14 or the most recent 4.14-rc (see `Linux installation documentation`_)
>  - Minimal rdma-core version: v15+ commit 0c5f5765213a ("Merge pull request #227 from yishaih/tm")
>    (see `RDMA Core installation documentation`_)
> +- When building for i686 use:
> +
> +  - rdma-core version 18.0 or above built with 32bit support.

Regarding the "or above" part: v19 gives build errors with mlx5, FYI.

And with v18 I am getting build errors originating from the rdma headers [1];
am I doing something wrong?

[1]
In file included from .../dpdk/drivers/net/mlx5/mlx5_glue.c:20:
.../rdma-core/build32/include/infiniband/mlx5dv.h: In function ‘mlx5dv_x86_set_data_seg’:
.../rdma-core/build32/include/infiniband/mlx5dv.h:787:69: error: right shift count >= width of type [-Werror=shift-count-overflow]
  __m128i val  = _mm_set_epi32((uint32_t)address, (uint32_t)(address >> 32), lkey, length);
                                                                     ^~

* Re: [dpdk-dev] [PATCH v2] net/mlx5: add support for 32bit systems
  2018-07-04 13:48   ` Ferruh Yigit
@ 2018-07-05 10:09     ` Mordechay Haimovsky
  2018-07-05 11:27       ` Ferruh Yigit
  2018-07-05 17:07       ` Mordechay Haimovsky
  2018-07-08 17:04     ` Mordechay Haimovsky
  1 sibling, 2 replies; 17+ messages in thread
From: Mordechay Haimovsky @ 2018-07-05 10:09 UTC (permalink / raw)
  To: Ferruh Yigit, Shahaf Shuler; +Cc: Adrien Mazarguil, dev

Hi,
 I didn’t see it in our setups (not an excuse). Investigating...

Moti

> -----Original Message-----
> From: Ferruh Yigit [mailto:ferruh.yigit@intel.com]
> Sent: Wednesday, July 4, 2018 4:49 PM
> To: Mordechay Haimovsky <motih@mellanox.com>; Shahaf Shuler
> <shahafs@mellanox.com>
> Cc: Adrien Mazarguil <adrien.mazarguil@6wind.com>; dev@dpdk.org
> Subject: Re: [dpdk-dev] [PATCH v2] net/mlx5: add support for 32bit systems
> 
> On 7/2/2018 12:11 PM, Moti Haimovsky wrote:
> > This patch adds support for building and running mlx5 PMD on 32bit
> > systems such as i686.
> >
> > The main issue to tackle was handling the 32bit access to the UAR as
> > quoted from the mlx5 PRM:
> > QP and CQ DoorBells require 64-bit writes. For best performance, it is
> > recommended to execute the QP/CQ DoorBell as a single 64-bit write
> > operation. For platforms that do not support 64 bit writes, it is
> > possible to issue the 64 bits DoorBells through two consecutive
> > writes, each write 32 bits, as described below:
> > * The order of writing each of the Dwords is from lower to upper
> >   addresses.
> > * No other DoorBell can be rung (or even start ringing) in the midst of
> >   an on-going write of a DoorBell over a given UAR page.
> > The last rule implies that in a multi-threaded environment, the access
> > to a UAR page (which can be accessible by all threads in the process)
> > must be synchronized (for example, using a semaphore) unless an atomic
> > write of 64 bits in a single bus operation is guaranteed. Such a
> > synchronization is not required for when ringing DoorBells on
> > different UAR pages.
> >
> > Signed-off-by: Moti Haimovsky <motih@mellanox.com>
> > ---
> > v2:
> > * Fixed coding style issues.
> > * Modified documentation according to review inputs.
> > * Fixed merge conflicts.
> > ---
> >  doc/guides/nics/features/mlx5.ini |  1 +
> >  doc/guides/nics/mlx5.rst          |  6 +++-
> >  drivers/net/mlx5/mlx5.c           |  8 ++++-
> >  drivers/net/mlx5/mlx5.h           |  5 +++
> >  drivers/net/mlx5/mlx5_defs.h      | 18 ++++++++--
> >  drivers/net/mlx5/mlx5_rxq.c       |  6 +++-
> >  drivers/net/mlx5/mlx5_rxtx.c      | 22 +++++++------
> >  drivers/net/mlx5/mlx5_rxtx.h      | 69
> ++++++++++++++++++++++++++++++++++++++-
> >  drivers/net/mlx5/mlx5_txq.c       | 13 +++++++-
> >  9 files changed, 131 insertions(+), 17 deletions(-)
> >
> > diff --git a/doc/guides/nics/features/mlx5.ini
> > b/doc/guides/nics/features/mlx5.ini
> > index e75b14b..b28b43e 100644
> > --- a/doc/guides/nics/features/mlx5.ini
> > +++ b/doc/guides/nics/features/mlx5.ini
> > @@ -43,5 +43,6 @@ Multiprocess aware   = Y
> >  Other kdrv           = Y
> >  ARMv8                = Y
> >  Power8               = Y
> > +x86-32               = Y
> >  x86-64               = Y
> >  Usage doc            = Y
> > diff --git a/doc/guides/nics/mlx5.rst b/doc/guides/nics/mlx5.rst index
> > 7dd9c1c..5fbad60 100644
> > --- a/doc/guides/nics/mlx5.rst
> > +++ b/doc/guides/nics/mlx5.rst
> > @@ -49,7 +49,7 @@ libibverbs.
> >  Features
> >  --------
> >
> > -- Multi arch support: x86_64, POWER8, ARMv8.
> > +- Multi arch support: x86_64, POWER8, ARMv8, i686.
> >  - Multiple TX and RX queues.
> >  - Support for scattered TX and RX frames.
> >  - IPv4, IPv6, TCPv4, TCPv6, UDPv4 and UDPv6 RSS on any number of
> queues.
> > @@ -477,6 +477,10 @@ RMDA Core with Linux Kernel
> >  - Minimal kernel version : v4.14 or the most recent 4.14-rc (see
> > `Linux installation documentation`_)
> >  - Minimal rdma-core version: v15+ commit 0c5f5765213a ("Merge pull
> request #227 from yishaih/tm")
> >    (see `RDMA Core installation documentation`_)
> > +- When building for i686 use:
> > +
> > +  - rdma-core version 18.0 or above built with 32bit support.
> 
> related "or above" part, v19 giving build errors with mlx5, FYI.
> 
> And with v18 getting build errors originated from rdma headers [1], am I
> doing something wrong?
> 
> [1]
> In file included from .../dpdk/drivers/net/mlx5/mlx5_glue.c:20:
> .../rdma-core/build32/include/infiniband/mlx5dv.h: In function
> ‘mlx5dv_x86_set_data_seg’:
> .../rdma-core/build32/include/infiniband/mlx5dv.h:787:69: error: right shift
> count >= width of type [-Werror=shift-count-overflow]
>   __m128i val  = _mm_set_epi32((uint32_t)address, (uint32_t)(address >> 32), lkey, length);
>                                                                      ^~

* Re: [dpdk-dev] [PATCH v2] net/mlx5: add support for 32bit systems
  2018-07-05 10:09     ` Mordechay Haimovsky
@ 2018-07-05 11:27       ` Ferruh Yigit
  2018-07-11 12:22         ` Shahaf Shuler
  2018-07-05 17:07       ` Mordechay Haimovsky
  1 sibling, 1 reply; 17+ messages in thread
From: Ferruh Yigit @ 2018-07-05 11:27 UTC (permalink / raw)
  To: Mordechay Haimovsky, Shahaf Shuler; +Cc: Adrien Mazarguil, dev

On 7/5/2018 11:09 AM, Mordechay Haimovsky wrote:
> Hi,
>  Didn’t see it in our setups (not an excuse),  Investigating ....

Thanks. Perhaps it is related to the compiler version:
gcc (GCC) 8.1.1 20180502 (Red Hat 8.1.1-1)
(The 32bit ICC build also gave the same build error.)

Btw, to clarify, the rdma-core v19 build errors were not limited to the 32bit
build. I lost my log, but I can reproduce it if you require.

> 
> Moti
> 
>> -----Original Message-----
>> From: Ferruh Yigit [mailto:ferruh.yigit@intel.com]
>> Sent: Wednesday, July 4, 2018 4:49 PM
>> To: Mordechay Haimovsky <motih@mellanox.com>; Shahaf Shuler
>> <shahafs@mellanox.com>
>> Cc: Adrien Mazarguil <adrien.mazarguil@6wind.com>; dev@dpdk.org
>> Subject: Re: [dpdk-dev] [PATCH v2] net/mlx5: add support for 32bit systems
>>
>> On 7/2/2018 12:11 PM, Moti Haimovsky wrote:
>>> This patch adds support for building and running mlx5 PMD on 32bit
>>> systems such as i686.
>>>
>>> The main issue to tackle was handling the 32bit access to the UAR as
>>> quoted from the mlx5 PRM:
>>> QP and CQ DoorBells require 64-bit writes. For best performance, it is
>>> recommended to execute the QP/CQ DoorBell as a single 64-bit write
>>> operation. For platforms that do not support 64 bit writes, it is
>>> possible to issue the 64 bits DoorBells through two consecutive
>>> writes, each write 32 bits, as described below:
>>> * The order of writing each of the Dwords is from lower to upper
>>>   addresses.
>>> * No other DoorBell can be rung (or even start ringing) in the midst of
>>>   an on-going write of a DoorBell over a given UAR page.
>>> The last rule implies that in a multi-threaded environment, the access
>>> to a UAR page (which can be accessible by all threads in the process)
>>> must be synchronized (for example, using a semaphore) unless an atomic
>>> write of 64 bits in a single bus operation is guaranteed. Such a
>>> synchronization is not required for when ringing DoorBells on
>>> different UAR pages.
>>>
>>> Signed-off-by: Moti Haimovsky <motih@mellanox.com>
>>> ---
>>> v2:
>>> * Fixed coding style issues.
>>> * Modified documentation according to review inputs.
>>> * Fixed merge conflicts.
>>> ---
>>>  doc/guides/nics/features/mlx5.ini |  1 +
>>>  doc/guides/nics/mlx5.rst          |  6 +++-
>>>  drivers/net/mlx5/mlx5.c           |  8 ++++-
>>>  drivers/net/mlx5/mlx5.h           |  5 +++
>>>  drivers/net/mlx5/mlx5_defs.h      | 18 ++++++++--
>>>  drivers/net/mlx5/mlx5_rxq.c       |  6 +++-
>>>  drivers/net/mlx5/mlx5_rxtx.c      | 22 +++++++------
>>>  drivers/net/mlx5/mlx5_rxtx.h      | 69
>> ++++++++++++++++++++++++++++++++++++++-
>>>  drivers/net/mlx5/mlx5_txq.c       | 13 +++++++-
>>>  9 files changed, 131 insertions(+), 17 deletions(-)
>>>
>>> diff --git a/doc/guides/nics/features/mlx5.ini
>>> b/doc/guides/nics/features/mlx5.ini
>>> index e75b14b..b28b43e 100644
>>> --- a/doc/guides/nics/features/mlx5.ini
>>> +++ b/doc/guides/nics/features/mlx5.ini
>>> @@ -43,5 +43,6 @@ Multiprocess aware   = Y
>>>  Other kdrv           = Y
>>>  ARMv8                = Y
>>>  Power8               = Y
>>> +x86-32               = Y
>>>  x86-64               = Y
>>>  Usage doc            = Y
>>> diff --git a/doc/guides/nics/mlx5.rst b/doc/guides/nics/mlx5.rst index
>>> 7dd9c1c..5fbad60 100644
>>> --- a/doc/guides/nics/mlx5.rst
>>> +++ b/doc/guides/nics/mlx5.rst
>>> @@ -49,7 +49,7 @@ libibverbs.
>>>  Features
>>>  --------
>>>
>>> -- Multi arch support: x86_64, POWER8, ARMv8.
>>> +- Multi arch support: x86_64, POWER8, ARMv8, i686.
>>>  - Multiple TX and RX queues.
>>>  - Support for scattered TX and RX frames.
>>>  - IPv4, IPv6, TCPv4, TCPv6, UDPv4 and UDPv6 RSS on any number of
>> queues.
>>> @@ -477,6 +477,10 @@ RMDA Core with Linux Kernel
>>>  - Minimal kernel version : v4.14 or the most recent 4.14-rc (see
>>> `Linux installation documentation`_)
>>>  - Minimal rdma-core version: v15+ commit 0c5f5765213a ("Merge pull
>> request #227 from yishaih/tm")
>>>    (see `RDMA Core installation documentation`_)
>>> +- When building for i686 use:
>>> +
>>> +  - rdma-core version 18.0 or above built with 32bit support.
>>
>> related "or above" part, v19 giving build errors with mlx5, FYI.
>>
>> And with v18 getting build errors originated from rdma headers [1], am I
>> doing something wrong?
>>
>> [1]
>> In file included from .../dpdk/drivers/net/mlx5/mlx5_glue.c:20:
>> .../rdma-core/build32/include/infiniband/mlx5dv.h: In function
>> ‘mlx5dv_x86_set_data_seg’:
>> .../rdma-core/build32/include/infiniband/mlx5dv.h:787:69: error: right shift
>> count >= width of type [-Werror=shift-count-overflow]
>>   __m128i val  = _mm_set_epi32((uint32_t)address, (uint32_t)(address >> 32), lkey, length);
>>                                                                      ^~

* Re: [dpdk-dev] [PATCH v2] net/mlx5: add support for 32bit systems
  2018-07-05 10:09     ` Mordechay Haimovsky
  2018-07-05 11:27       ` Ferruh Yigit
@ 2018-07-05 17:07       ` Mordechay Haimovsky
  2018-07-05 17:49         ` Ferruh Yigit
  1 sibling, 1 reply; 17+ messages in thread
From: Mordechay Haimovsky @ 2018-07-05 17:07 UTC (permalink / raw)
  To: Ferruh Yigit, Shahaf Shuler; +Cc: Adrien Mazarguil, dev, Olga Shern

Hello Ferruh,
  Here are my findings:

1. The error you've seen is definitely a bug in mlx5dv.h from rdma-core
   (I'm emphasizing rdma-core since I cannot just send a fix for this file).
   The code doesn’t take into account that an address may be a 32bit one
   when performing the 32bit shift:
       __m128i val  = _mm_set_epi32((uint32_t)address, (uint32_t)(address >> 32), lkey, length);
2. The reason we didn’t see it in our setups is the set of GCC predefined
   macros in the toolchains we are using (from RH and Ubuntu).
   When I run the following commands in our setups:
	alias gccmacros='gcc -dM -E -x c /dev/null'
	gccmacros -m32 | grep -E "(MMX|SSE|AVX|XOP)"
   I get the following results:
   On the RH setup using gcc version 4.8.5 20150623 (Red Hat 4.8.5-11) (GCC):
	#define __MMX__ 1
	#define __SSE2__ 1
	#define __SSE__ 1
   On the Ubuntu setup using gcc version 5.4.0 20160609 (Ubuntu 5.4.0-6ubuntu1~16.04.10):
	None of these macros are defined.
   Since the "offending" routine is wrapped in #ifdef __SSE3__ and __SSE3__
   is not defined in either setup, the compiler just ignores it.
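
To illustrate finding 1, here is a minimal sketch of splitting an address
into two 32bit halves in a way that also compiles cleanly when uintptr_t
is 32 bits wide. It is not the rdma-core fix itself; the helper name
sketch_addr_split is hypothetical. The idea is only to widen the operand
to 64 bits before shifting, so the shift count is always smaller than the
operand width.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical helper: split an address into upper and lower 32bit words.
 * Widening to uint64_t first keeps "wide >> 32" well defined on both
 * 32bit and 64bit targets; on a 32bit target the upper word is simply 0.
 */
static inline void
sketch_addr_split(uintptr_t address, uint32_t *hi, uint32_t *lo)
{
	uint64_t wide = (uint64_t)address;

	*hi = (uint32_t)(wide >> 32);
	*lo = (uint32_t)wide;
	/* The two halves must recombine into the widened address. */
	assert((((uint64_t)*hi << 32) | *lo) == wide);
}
```

With such a helper the two words could be passed to _mm_set_epi32() in
place of the direct (address >> 32) expression.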

ARs:
  1. Open a bug for fixing mlx5dv.h in rdma-core. - Moti H.
  2. Provide a workaround for the problem. - Moti H.
  3. Verify that this is actually the issue by running the above commands
     in Ferruh's setup and verifying that the SSE3 flag is set. - Ferruh Yigit

Moti H. 

> -----Original Message-----
> From: dev [mailto:dev-bounces@dpdk.org] On Behalf Of Mordechay
> Haimovsky
> Sent: Thursday, July 5, 2018 1:10 PM
> To: Ferruh Yigit <ferruh.yigit@intel.com>; Shahaf Shuler
> <shahafs@mellanox.com>
> Cc: Adrien Mazarguil <adrien.mazarguil@6wind.com>; dev@dpdk.org
> Subject: Re: [dpdk-dev] [PATCH v2] net/mlx5: add support for 32bit systems
> 
> Hi,
>  Didn’t see it in our setups (not an excuse),  Investigating ....
> 
> Moti
> 
> > -----Original Message-----
> > From: Ferruh Yigit [mailto:ferruh.yigit@intel.com]
> > Sent: Wednesday, July 4, 2018 4:49 PM
> > To: Mordechay Haimovsky <motih@mellanox.com>; Shahaf Shuler
> > <shahafs@mellanox.com>
> > Cc: Adrien Mazarguil <adrien.mazarguil@6wind.com>; dev@dpdk.org
> > Subject: Re: [dpdk-dev] [PATCH v2] net/mlx5: add support for 32bit
> > systems
> >
> > On 7/2/2018 12:11 PM, Moti Haimovsky wrote:
> > > This patch adds support for building and running mlx5 PMD on 32bit
> > > systems such as i686.
> > >
> > > The main issue to tackle was handling the 32bit access to the UAR as
> > > quoted from the mlx5 PRM:
> > > QP and CQ DoorBells require 64-bit writes. For best performance, it
> > > is recommended to execute the QP/CQ DoorBell as a single 64-bit
> > > write operation. For platforms that do not support 64 bit writes, it
> > > is possible to issue the 64 bits DoorBells through two consecutive
> > > writes, each write 32 bits, as described below:
> > > * The order of writing each of the Dwords is from lower to upper
> > >   addresses.
> > > * No other DoorBell can be rung (or even start ringing) in the midst of
> > >   an on-going write of a DoorBell over a given UAR page.
> > > The last rule implies that in a multi-threaded environment, the
> > > access to a UAR page (which can be accessible by all threads in the
> > > process) must be synchronized (for example, using a semaphore)
> > > unless an atomic write of 64 bits in a single bus operation is
> > > guaranteed. Such a synchronization is not required for when ringing
> > > DoorBells on different UAR pages.
> > >
> > > Signed-off-by: Moti Haimovsky <motih@mellanox.com>
> > > ---
> > > v2:
> > > * Fixed coding style issues.
> > > * Modified documentation according to review inputs.
> > > * Fixed merge conflicts.
> > > ---
> > >  doc/guides/nics/features/mlx5.ini |  1 +
> > >  doc/guides/nics/mlx5.rst          |  6 +++-
> > >  drivers/net/mlx5/mlx5.c           |  8 ++++-
> > >  drivers/net/mlx5/mlx5.h           |  5 +++
> > >  drivers/net/mlx5/mlx5_defs.h      | 18 ++++++++--
> > >  drivers/net/mlx5/mlx5_rxq.c       |  6 +++-
> > >  drivers/net/mlx5/mlx5_rxtx.c      | 22 +++++++------
> > >  drivers/net/mlx5/mlx5_rxtx.h      | 69
> > ++++++++++++++++++++++++++++++++++++++-
> > >  drivers/net/mlx5/mlx5_txq.c       | 13 +++++++-
> > >  9 files changed, 131 insertions(+), 17 deletions(-)
> > >
> > > diff --git a/doc/guides/nics/features/mlx5.ini
> > > b/doc/guides/nics/features/mlx5.ini
> > > index e75b14b..b28b43e 100644
> > > --- a/doc/guides/nics/features/mlx5.ini
> > > +++ b/doc/guides/nics/features/mlx5.ini
> > > @@ -43,5 +43,6 @@ Multiprocess aware   = Y
> > >  Other kdrv           = Y
> > >  ARMv8                = Y
> > >  Power8               = Y
> > > +x86-32               = Y
> > >  x86-64               = Y
> > >  Usage doc            = Y
> > > diff --git a/doc/guides/nics/mlx5.rst b/doc/guides/nics/mlx5.rst
> > > index
> > > 7dd9c1c..5fbad60 100644
> > > --- a/doc/guides/nics/mlx5.rst
> > > +++ b/doc/guides/nics/mlx5.rst
> > > @@ -49,7 +49,7 @@ libibverbs.
> > >  Features
> > >  --------
> > >
> > > -- Multi arch support: x86_64, POWER8, ARMv8.
> > > +- Multi arch support: x86_64, POWER8, ARMv8, i686.
> > >  - Multiple TX and RX queues.
> > >  - Support for scattered TX and RX frames.
> > >  - IPv4, IPv6, TCPv4, TCPv6, UDPv4 and UDPv6 RSS on any number of
> > queues.
> > > @@ -477,6 +477,10 @@ RMDA Core with Linux Kernel
> > >  - Minimal kernel version : v4.14 or the most recent 4.14-rc (see
> > > `Linux installation documentation`_)
> > >  - Minimal rdma-core version: v15+ commit 0c5f5765213a ("Merge pull
> > request #227 from yishaih/tm")
> > >    (see `RDMA Core installation documentation`_)
> > > +- When building for i686 use:
> > > +
> > > +  - rdma-core version 18.0 or above built with 32bit support.
> >
> > related "or above" part, v19 giving build errors with mlx5, FYI.
> >
> > And with v18 getting build errors originated from rdma headers [1], am
> > I doing something wrong?
> >
> > [1]
> > In file included from .../dpdk/drivers/net/mlx5/mlx5_glue.c:20:
> > .../rdma-core/build32/include/infiniband/mlx5dv.h: In function
> > ‘mlx5dv_x86_set_data_seg’:
> > .../rdma-core/build32/include/infiniband/mlx5dv.h:787:69: error: right
> > shift count >= width of type [-Werror=shift-count-overflow]
> >   __m128i val  = _mm_set_epi32((uint32_t)address, (uint32_t)(address >> 32), lkey, length);
> >                                                                      ^~

* Re: [dpdk-dev] [PATCH v2] net/mlx5: add support for 32bit systems
  2018-07-05 17:07       ` Mordechay Haimovsky
@ 2018-07-05 17:49         ` Ferruh Yigit
  2018-07-09  7:23           ` Shahaf Shuler
  0 siblings, 1 reply; 17+ messages in thread
From: Ferruh Yigit @ 2018-07-05 17:49 UTC (permalink / raw)
  To: Mordechay Haimovsky, Shahaf Shuler; +Cc: Adrien Mazarguil, dev, Olga Shern

On 7/5/2018 6:07 PM, Mordechay Haimovsky wrote:
> Hello Ferruh,
>   Here are my findings:
> 
> 1.  The error you've seen is definitely a bug in mlx5dv.h from rdma-core
>       (I'm emphasizing rdma-core since I cannot just send a fix for this file)
>       As it didn’t take into account that an address may be a 32bit one when performing the 32bit shift.
>       __m128i val  = _mm_set_epi32((uint32_t)address, (uint32_t)(address >> 32), lkey, length);
> 2. The reason we didn’t see it in our setups is due to the values assigned to the GCC predefined macros
>     We are using (from RH and UBUNTU).
>     When I run the following commands in our setups:
> 	alias gccmacros='gcc -dM -E -x c /dev/null'
> 	gccmacros -m32 | grep -E "(MMX|SSE|AVX|XOP)"
>     I get the following results:
>         On RH setup using gcc version 4.8.5 20150623 (Red Hat 4.8.5-11) (GCC)
> 	#define __MMX__ 1
> 	#define __SSE2__ 1
> 	#define __SSE__ 1
>       On Ubuntu setup using gcc version 5.4.0 20160609 (Ubuntu 5.4.0-6ubuntu1~16.04.10)
> 	No flags are defined.
>    Since the "offending" routine is wrapped with #ifdef __SSE3__ the compiler just ignores it.
> 
> ARs:
>   1. Open a bug for fixing mlx5dv.h in rdma-core. - Moti H.
>   2. Provide a workaround for the problem. - Moti H.
>   3. Verify that this is actually the issue by running the above scripts
>        In Ferruh setup and verifying  the SSE3 flag is set. - Ferruh Yigit

I confirm SSE3 is set in my environment, but I think this will be true for all
x86 builds because the minimum SIMD level DPDK requires is SSE4.2. According to
the wiki, SSE3 was introduced in 2004.

We use -march=native in the DPDK build, so:
$ gcc -march=native -m32 -dM -E - </dev/null | grep SSE3
#define __SSSE3__ 1
#define __SSE3__ 1
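
The effect of that macro can be sketched as below: a routine guarded by
#ifdef __SSE3__ is only parsed, and therefore only diagnosed, when the
compiler flags define __SSE3__. The SKETCH_* and sketch_* names are
hypothetical and the guarded body is a stand-in for the header routine,
not a copy of it.

```c
#include <assert.h>
#include <stdint.h>

/* Record at compile time whether the guarded code is part of this build. */
#ifdef __SSE3__
#define SKETCH_HAVE_SSE3 1
#else
#define SKETCH_HAVE_SSE3 0
#endif

#if SKETCH_HAVE_SSE3
/* The widened operand must be wider than the shift count used below. */
static_assert(sizeof(uint64_t) * 8 > 32, "operand too narrow for shift");

/*
 * Only seen by the compiler when __SSE3__ is defined, e.g. with
 * -march=native on a recent CPU. Without the macro the preprocessor drops
 * this function, so a bad shift inside it would go unnoticed.
 */
static inline uint32_t
sketch_guarded_hi32(uintptr_t address)
{
	return (uint32_t)((uint64_t)address >> 32);
}
#endif

/* Report whether the guarded routine was compiled in. */
static int
sketch_guarded_routine_compiled(void)
{
	return SKETCH_HAVE_SSE3;
}
```

Building the same file once with plain -m32 and once with
-m32 -march=native is one way to see the guarded routine appear and
disappear.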


> 
> Moti H. 
> 
>> -----Original Message-----
>> From: dev [mailto:dev-bounces@dpdk.org] On Behalf Of Mordechay
>> Haimovsky
>> Sent: Thursday, July 5, 2018 1:10 PM
>> To: Ferruh Yigit <ferruh.yigit@intel.com>; Shahaf Shuler
>> <shahafs@mellanox.com>
>> Cc: Adrien Mazarguil <adrien.mazarguil@6wind.com>; dev@dpdk.org
>> Subject: Re: [dpdk-dev] [PATCH v2] net/mlx5: add support for 32bit systems
>>
>> Hi,
>>  Didn’t see it in our setups (not an excuse),  Investigating ....
>>
>> Moti
>>
>>> -----Original Message-----
>>> From: Ferruh Yigit [mailto:ferruh.yigit@intel.com]
>>> Sent: Wednesday, July 4, 2018 4:49 PM
>>> To: Mordechay Haimovsky <motih@mellanox.com>; Shahaf Shuler
>>> <shahafs@mellanox.com>
>>> Cc: Adrien Mazarguil <adrien.mazarguil@6wind.com>; dev@dpdk.org
>>> Subject: Re: [dpdk-dev] [PATCH v2] net/mlx5: add support for 32bit
>>> systems
>>>
>>> On 7/2/2018 12:11 PM, Moti Haimovsky wrote:
>>>> This patch adds support for building and running mlx5 PMD on 32bit
>>>> systems such as i686.
>>>>
>>>> The main issue to tackle was handling the 32bit access to the UAR as
>>>> quoted from the mlx5 PRM:
>>>> QP and CQ DoorBells require 64-bit writes. For best performance, it
>>>> is recommended to execute the QP/CQ DoorBell as a single 64-bit
>>>> write operation. For platforms that do not support 64 bit writes, it
>>>> is possible to issue the 64 bits DoorBells through two consecutive
>>>> writes, each write 32 bits, as described below:
>>>> * The order of writing each of the Dwords is from lower to upper
>>>>   addresses.
>>>> * No other DoorBell can be rung (or even start ringing) in the midst of
>>>>   an on-going write of a DoorBell over a given UAR page.
>>>> The last rule implies that in a multi-threaded environment, the
>>>> access to a UAR page (which can be accessible by all threads in the
>>>> process) must be synchronized (for example, using a semaphore)
>>>> unless an atomic write of 64 bits in a single bus operation is
>>>> guaranteed. Such a synchronization is not required for when ringing
>>>> DoorBells on different UAR pages.
>>>>
>>>> Signed-off-by: Moti Haimovsky <motih@mellanox.com>
>>>> ---
>>>> v2:
>>>> * Fixed coding style issues.
>>>> * Modified documentation according to review inputs.
>>>> * Fixed merge conflicts.
>>>> ---
>>>>  doc/guides/nics/features/mlx5.ini |  1 +
>>>>  doc/guides/nics/mlx5.rst          |  6 +++-
>>>>  drivers/net/mlx5/mlx5.c           |  8 ++++-
>>>>  drivers/net/mlx5/mlx5.h           |  5 +++
>>>>  drivers/net/mlx5/mlx5_defs.h      | 18 ++++++++--
>>>>  drivers/net/mlx5/mlx5_rxq.c       |  6 +++-
>>>>  drivers/net/mlx5/mlx5_rxtx.c      | 22 +++++++------
>>>>  drivers/net/mlx5/mlx5_rxtx.h      | 69
>>> ++++++++++++++++++++++++++++++++++++++-
>>>>  drivers/net/mlx5/mlx5_txq.c       | 13 +++++++-
>>>>  9 files changed, 131 insertions(+), 17 deletions(-)
>>>>
>>>> diff --git a/doc/guides/nics/features/mlx5.ini
>>>> b/doc/guides/nics/features/mlx5.ini
>>>> index e75b14b..b28b43e 100644
>>>> --- a/doc/guides/nics/features/mlx5.ini
>>>> +++ b/doc/guides/nics/features/mlx5.ini
>>>> @@ -43,5 +43,6 @@ Multiprocess aware   = Y
>>>>  Other kdrv           = Y
>>>>  ARMv8                = Y
>>>>  Power8               = Y
>>>> +x86-32               = Y
>>>>  x86-64               = Y
>>>>  Usage doc            = Y
>>>> diff --git a/doc/guides/nics/mlx5.rst b/doc/guides/nics/mlx5.rst
>>>> index
>>>> 7dd9c1c..5fbad60 100644
>>>> --- a/doc/guides/nics/mlx5.rst
>>>> +++ b/doc/guides/nics/mlx5.rst
>>>> @@ -49,7 +49,7 @@ libibverbs.
>>>>  Features
>>>>  --------
>>>>
>>>> -- Multi arch support: x86_64, POWER8, ARMv8.
>>>> +- Multi arch support: x86_64, POWER8, ARMv8, i686.
>>>>  - Multiple TX and RX queues.
>>>>  - Support for scattered TX and RX frames.
>>>>  - IPv4, IPv6, TCPv4, TCPv6, UDPv4 and UDPv6 RSS on any number of
>>> queues.
>>>> @@ -477,6 +477,10 @@ RMDA Core with Linux Kernel
>>>>  - Minimal kernel version : v4.14 or the most recent 4.14-rc (see
>>>> `Linux installation documentation`_)
>>>>  - Minimal rdma-core version: v15+ commit 0c5f5765213a ("Merge pull
>>> request #227 from yishaih/tm")
>>>>    (see `RDMA Core installation documentation`_)
>>>> +- When building for i686 use:
>>>> +
>>>> +  - rdma-core version 18.0 or above built with 32bit support.
>>>
>>> related "or above" part, v19 giving build errors with mlx5, FYI.
>>>
>>> And with v18 getting build errors originated from rdma headers [1], am
>>> I doing something wrong?
>>>
>>> [1]
>>> In file included from .../dpdk/drivers/net/mlx5/mlx5_glue.c:20:
>>> .../rdma-core/build32/include/infiniband/mlx5dv.h: In function
>>> ‘mlx5dv_x86_set_data_seg’:
>>> .../rdma-core/build32/include/infiniband/mlx5dv.h:787:69: error: right
>>> shift count >= width of type [-Werror=shift-count-overflow]
>>>   __m128i val  = _mm_set_epi32((uint32_t)address, (uint32_t)(address >> 32), lkey, length);
>>>                                                                      ^~

* Re: [dpdk-dev] [PATCH v2] net/mlx5: add support for 32bit systems
  2018-07-04 13:48   ` Ferruh Yigit
  2018-07-05 10:09     ` Mordechay Haimovsky
@ 2018-07-08 17:04     ` Mordechay Haimovsky
  1 sibling, 0 replies; 17+ messages in thread
From: Mordechay Haimovsky @ 2018-07-08 17:04 UTC (permalink / raw)
  To: Ferruh Yigit, Shahaf Shuler; +Cc: Adrien Mazarguil, dev

Hi Ferruh,
 Can you please send me the output of "gcc -v" on your system?

Moti

> -----Original Message-----
> From: Ferruh Yigit [mailto:ferruh.yigit@intel.com]
> Sent: Wednesday, July 4, 2018 4:49 PM
> To: Mordechay Haimovsky <motih@mellanox.com>; Shahaf Shuler
> <shahafs@mellanox.com>
> Cc: Adrien Mazarguil <adrien.mazarguil@6wind.com>; dev@dpdk.org
> Subject: Re: [dpdk-dev] [PATCH v2] net/mlx5: add support for 32bit systems
> 
> On 7/2/2018 12:11 PM, Moti Haimovsky wrote:
> > This patch adds support for building and running mlx5 PMD on 32bit
> > systems such as i686.
> >
> > The main issue to tackle was handling the 32bit access to the UAR as
> > quoted from the mlx5 PRM:
> > QP and CQ DoorBells require 64-bit writes. For best performance, it is
> > recommended to execute the QP/CQ DoorBell as a single 64-bit write
> > operation. For platforms that do not support 64 bit writes, it is
> > possible to issue the 64 bits DoorBells through two consecutive
> > writes, each write 32 bits, as described below:
> > * The order of writing each of the Dwords is from lower to upper
> >   addresses.
> > * No other DoorBell can be rung (or even start ringing) in the midst of
> >   an on-going write of a DoorBell over a given UAR page.
> > The last rule implies that in a multi-threaded environment, the access
> > to a UAR page (which can be accessible by all threads in the process)
> > must be synchronized (for example, using a semaphore) unless an atomic
> > write of 64 bits in a single bus operation is guaranteed. Such a
> > synchronization is not required for when ringing DoorBells on
> > different UAR pages.
> >
> > Signed-off-by: Moti Haimovsky <motih@mellanox.com>
> > ---
> > v2:
> > * Fixed coding style issues.
> > * Modified documentation according to review inputs.
> > * Fixed merge conflicts.
> > ---
> >  doc/guides/nics/features/mlx5.ini |  1 +
> >  doc/guides/nics/mlx5.rst          |  6 +++-
> >  drivers/net/mlx5/mlx5.c           |  8 ++++-
> >  drivers/net/mlx5/mlx5.h           |  5 +++
> >  drivers/net/mlx5/mlx5_defs.h      | 18 ++++++++--
> >  drivers/net/mlx5/mlx5_rxq.c       |  6 +++-
> >  drivers/net/mlx5/mlx5_rxtx.c      | 22 +++++++------
> >  drivers/net/mlx5/mlx5_rxtx.h      | 69 ++++++++++++++++++++++++++++++++++++++-
> >  drivers/net/mlx5/mlx5_txq.c       | 13 +++++++-
> >  9 files changed, 131 insertions(+), 17 deletions(-)
> >
> > diff --git a/doc/guides/nics/features/mlx5.ini
> > b/doc/guides/nics/features/mlx5.ini
> > index e75b14b..b28b43e 100644
> > --- a/doc/guides/nics/features/mlx5.ini
> > +++ b/doc/guides/nics/features/mlx5.ini
> > @@ -43,5 +43,6 @@ Multiprocess aware   = Y
> >  Other kdrv           = Y
> >  ARMv8                = Y
> >  Power8               = Y
> > +x86-32               = Y
> >  x86-64               = Y
> >  Usage doc            = Y
> > diff --git a/doc/guides/nics/mlx5.rst b/doc/guides/nics/mlx5.rst index
> > 7dd9c1c..5fbad60 100644
> > --- a/doc/guides/nics/mlx5.rst
> > +++ b/doc/guides/nics/mlx5.rst
> > @@ -49,7 +49,7 @@ libibverbs.
> >  Features
> >  --------
> >
> > -- Multi arch support: x86_64, POWER8, ARMv8.
> > +- Multi arch support: x86_64, POWER8, ARMv8, i686.
> >  - Multiple TX and RX queues.
> >  - Support for scattered TX and RX frames.
> >  - IPv4, IPv6, TCPv4, TCPv6, UDPv4 and UDPv6 RSS on any number of
> queues.
> > @@ -477,6 +477,10 @@ RMDA Core with Linux Kernel
> >  - Minimal kernel version : v4.14 or the most recent 4.14-rc (see
> > `Linux installation documentation`_)
> >  - Minimal rdma-core version: v15+ commit 0c5f5765213a ("Merge pull
> request #227 from yishaih/tm")
> >    (see `RDMA Core installation documentation`_)
> > +- When building for i686 use:
> > +
> > +  - rdma-core version 18.0 or above built with 32bit support.
> 
> related "or above" part, v19 giving build errors with mlx5, FYI.
> 
> And with v18 getting build errors originated from rdma headers [1], am I
> doing something wrong?
> 
> [1]
> In file included from .../dpdk/drivers/net/mlx5/mlx5_glue.c:20:
> .../rdma-core/build32/include/infiniband/mlx5dv.h: In function
> ‘mlx5dv_x86_set_data_seg’:
> .../rdma-core/build32/include/infiniband/mlx5dv.h:787:69: error: right shift
> count >= width of type [-Werror=shift-count-overflow]
>   __m128i val  = _mm_set_epi32((uint32_t)address, (uint32_t)(address >>
> 32), lkey, length);
>                                                                      ^~

* Re: [dpdk-dev] [PATCH v2] net/mlx5: add support for 32bit systems
  2018-07-05 17:49         ` Ferruh Yigit
@ 2018-07-09  7:23           ` Shahaf Shuler
  0 siblings, 0 replies; 17+ messages in thread
From: Shahaf Shuler @ 2018-07-09  7:23 UTC (permalink / raw)
  To: Ferruh Yigit, Mordechay Haimovsky; +Cc: Adrien Mazarguil, dev, Olga Shern

Thursday, July 5, 2018 8:50 PM, Ferruh Yigit:
> Subject: Re: [dpdk-dev] [PATCH v2] net/mlx5: add support for 32bit systems
> 
> On 7/5/2018 6:07 PM, Mordechay Haimovsky wrote:
> > Hello Ferruh,
> >   Here are my findings:
> >
> > 1.  The error you've seen is definitely a bug in mlx5dv.h from rdma-core
> >       (I'm emphasizing rdma-core since I cannot just send a fix for this file),
> >       as it didn’t take into account that an address may be a 32bit one when
> >       performing the 32bit shift:
> >       __m128i val  = _mm_set_epi32((uint32_t)address, (uint32_t)(address >> 32), lkey, length);
> > 2. The reason we didn’t see it in our setups is due to the values assigned
> >     to the GCC predefined macros we are using (from RH and Ubuntu).
> >     When I run the following commands in our setups:
> > 	alias gccmacros='gcc -dM -E -x c /dev/null'
> > 	gccmacros -m32 | grep -E "(MMX|SSE|AVX|XOP)"
> >     I get the following results:
> >       On RH setup using gcc version 4.8.5 20150623 (Red Hat 4.8.5-11) (GCC)
> > 	#define __MMX__ 1
> > 	#define __SSE2__ 1
> > 	#define __SSE__ 1
> >       On Ubuntu setup using gcc version 5.4.0 20160609 (Ubuntu 5.4.0-6ubuntu1~16.04.10)
> > 	No flags are defined.
> >    Since the "offending" routine is wrapped with #ifdef __SSE3__ the
> >    compiler just ignores it.
> >
> > ARs:
> >   1. Open a bug for fixing mlx5dv.h in rdma-core. - Moti H.
> >   2. Provide a workaround for the problem. - Moti H.
> >   3. Verify that this is actually the issue by running the above scripts
> >        in Ferruh's setup and verifying the SSE3 flag is set. - Ferruh Yigit
> 
> I confirm SSE3 is set in my environment, but I think this will be true for all
> x86 because DPDK min required SIMD is SSE4.2. According to the wiki, SSE3 was
> introduced in 2004.
> 
> We use -march=native in dpdk build, so:
> $ gcc -march=native -m32 -dM -E - </dev/null | grep SSE3
> #define __SSSE3__ 1
> #define __SSE3__ 1

Thanks Ferruh,

I will remove the patch from the tree until this issue is resolved. I hope we can fix rdma-core within a few days.
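
For illustration only, the shift problem discussed above can be reproduced and
avoided in a few lines of C. This is a minimal sketch, not the rdma-core fix:
addr_split() is a hypothetical helper name, and the only assumption is that the
address arrives as a uintptr_t, which is 32 bits wide on i686. Widening to
uint64_t before shifting keeps the shift count below the width of the type on
both 32bit and 64bit builds.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper (not part of rdma-core or DPDK): split an address
 * into its upper and lower 32-bit halves.
 */
static inline void
addr_split(uintptr_t address, uint32_t *hi, uint32_t *lo)
{
	assert(hi != NULL && lo != NULL);
	/*
	 * Widen first. Shifting a 32-bit uintptr_t right by 32 is what
	 * triggers -Wshift-count-overflow; a uint64_t can always be
	 * shifted by 32.
	 */
	*hi = (uint32_t)((uint64_t)address >> 32);
	*lo = (uint32_t)address;
}
```

On a 32bit build the upper half is simply always zero, so the same code can be
shared by both targets.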

> 
> 
> >
> > Moti H.
> >
> >> -----Original Message-----
> >> From: dev [mailto:dev-bounces@dpdk.org] On Behalf Of Mordechay
> >> Haimovsky
> >> Sent: Thursday, July 5, 2018 1:10 PM
> >> To: Ferruh Yigit <ferruh.yigit@intel.com>; Shahaf Shuler
> >> <shahafs@mellanox.com>
> >> Cc: Adrien Mazarguil <adrien.mazarguil@6wind.com>; dev@dpdk.org
> >> Subject: Re: [dpdk-dev] [PATCH v2] net/mlx5: add support for 32bit
> >> systems
> >>
> >> Hi,
> >>  Didn’t see it in our setups (not an excuse),  Investigating ....
> >>
> >> Moti
> >>
> >>> -----Original Message-----
> >>> From: Ferruh Yigit [mailto:ferruh.yigit@intel.com]
> >>> Sent: Wednesday, July 4, 2018 4:49 PM
> >>> To: Mordechay Haimovsky <motih@mellanox.com>; Shahaf Shuler
> >>> <shahafs@mellanox.com>
> >>> Cc: Adrien Mazarguil <adrien.mazarguil@6wind.com>; dev@dpdk.org
> >>> Subject: Re: [dpdk-dev] [PATCH v2] net/mlx5: add support for 32bit
> >>> systems
> >>>
> >>> On 7/2/2018 12:11 PM, Moti Haimovsky wrote:
> >>>> This patch adds support for building and running mlx5 PMD on 32bit
> >>>> systems such as i686.
> >>>>
> >>>> The main issue to tackle was handling the 32bit access to the UAR
> >>>> as quoted from the mlx5 PRM:
> >>>> QP and CQ DoorBells require 64-bit writes. For best performance, it
> >>>> is recommended to execute the QP/CQ DoorBell as a single 64-bit
> >>>> write operation. For platforms that do not support 64 bit writes,
> >>>> it is possible to issue the 64 bits DoorBells through two
> >>>> consecutive writes, each write 32 bits, as described below:
> >>>> * The order of writing each of the Dwords is from lower to upper
> >>>>   addresses.
> >>>> * No other DoorBell can be rung (or even start ringing) in the midst of
> >>>>   an on-going write of a DoorBell over a given UAR page.
> >>>> The last rule implies that in a multi-threaded environment, the
> >>>> access to a UAR page (which can be accessible by all threads in the
> >>>> process) must be synchronized (for example, using a semaphore)
> >>>> unless an atomic write of 64 bits in a single bus operation is
> >>>> guaranteed. Such a synchronization is not required for when ringing
> >>>> DoorBells on different UAR pages.
> >>>>
> >>>> Signed-off-by: Moti Haimovsky <motih@mellanox.com>
> >>>> ---
> >>>> v2:
> >>>> * Fixed coding style issues.
> >>>> * Modified documentation according to review inputs.
> >>>> * Fixed merge conflicts.
> >>>> ---
> >>>>  doc/guides/nics/features/mlx5.ini |  1 +
> >>>>  doc/guides/nics/mlx5.rst          |  6 +++-
> >>>>  drivers/net/mlx5/mlx5.c           |  8 ++++-
> >>>>  drivers/net/mlx5/mlx5.h           |  5 +++
> >>>>  drivers/net/mlx5/mlx5_defs.h      | 18 ++++++++--
> >>>>  drivers/net/mlx5/mlx5_rxq.c       |  6 +++-
> >>>>  drivers/net/mlx5/mlx5_rxtx.c      | 22 +++++++------
> >>>>  drivers/net/mlx5/mlx5_rxtx.h      | 69 ++++++++++++++++++++++++++++++++++++++-
> >>>>  drivers/net/mlx5/mlx5_txq.c       | 13 +++++++-
> >>>>  9 files changed, 131 insertions(+), 17 deletions(-)
> >>>>
> >>>> diff --git a/doc/guides/nics/features/mlx5.ini
> >>>> b/doc/guides/nics/features/mlx5.ini
> >>>> index e75b14b..b28b43e 100644
> >>>> --- a/doc/guides/nics/features/mlx5.ini
> >>>> +++ b/doc/guides/nics/features/mlx5.ini
> >>>> @@ -43,5 +43,6 @@ Multiprocess aware   = Y
> >>>>  Other kdrv           = Y
> >>>>  ARMv8                = Y
> >>>>  Power8               = Y
> >>>> +x86-32               = Y
> >>>>  x86-64               = Y
> >>>>  Usage doc            = Y
> >>>> diff --git a/doc/guides/nics/mlx5.rst b/doc/guides/nics/mlx5.rst
> >>>> index
> >>>> 7dd9c1c..5fbad60 100644
> >>>> --- a/doc/guides/nics/mlx5.rst
> >>>> +++ b/doc/guides/nics/mlx5.rst
> >>>> @@ -49,7 +49,7 @@ libibverbs.
> >>>>  Features
> >>>>  --------
> >>>>
> >>>> -- Multi arch support: x86_64, POWER8, ARMv8.
> >>>> +- Multi arch support: x86_64, POWER8, ARMv8, i686.
> >>>>  - Multiple TX and RX queues.
> >>>>  - Support for scattered TX and RX frames.
> >>>>  - IPv4, IPv6, TCPv4, TCPv6, UDPv4 and UDPv6 RSS on any number of
> >>> queues.
> >>>> @@ -477,6 +477,10 @@ RMDA Core with Linux Kernel
> >>>>  - Minimal kernel version : v4.14 or the most recent 4.14-rc (see
> >>>> `Linux installation documentation`_)
> >>>>  - Minimal rdma-core version: v15+ commit 0c5f5765213a ("Merge pull
> >>> request #227 from yishaih/tm")
> >>>>    (see `RDMA Core installation documentation`_)
> >>>> +- When building for i686 use:
> >>>> +
> >>>> +  - rdma-core version 18.0 or above built with 32bit support.
> >>>
> >>> related "or above" part, v19 giving build errors with mlx5, FYI.
> >>>
> >>> And with v18 getting build errors originated from rdma headers [1],
> >>> am I doing something wrong?
> >>>
> >>> [1]
> >>> In file included from .../dpdk/drivers/net/mlx5/mlx5_glue.c:20:
> >>> .../rdma-core/build32/include/infiniband/mlx5dv.h: In function
> >>> ‘mlx5dv_x86_set_data_seg’:
> >>> .../rdma-core/build32/include/infiniband/mlx5dv.h:787:69: error:
> >>> right shift count >= width of type [-Werror=shift-count-overflow]
> >>>   __m128i val  = _mm_set_epi32((uint32_t)address, (uint32_t)(address >> 32), lkey, length);
> >>>                                                                      ^~


* Re: [dpdk-dev] [PATCH v2] net/mlx5: add support for 32bit systems
  2018-07-05 11:27       ` Ferruh Yigit
@ 2018-07-11 12:22         ` Shahaf Shuler
  0 siblings, 0 replies; 17+ messages in thread
From: Shahaf Shuler @ 2018-07-11 12:22 UTC (permalink / raw)
  To: Ferruh Yigit, Mordechay Haimovsky; +Cc: Adrien Mazarguil, dev

Hi Ferruh,

Thursday, July 5, 2018 2:27 PM, Ferruh Yigit:
> Subject: Re: [dpdk-dev] [PATCH v2] net/mlx5: add support for 32bit systems
> 
> On 7/5/2018 11:09 AM, Mordechay Haimovsky wrote:
> > Hi,
> >  Didn’t see it in our setups (not an excuse),  Investigating ....
> 
> Thanks. Perhaps it can be related to compiler version:
> gcc (GCC) 8.1.1 20180502 (Red Hat 8.1.1-1)
> (ICC 32bit also gave the same build error.)
> 
> btw, to clarify, the rdma-core v19 build errors were not just for the 32bit
> build. I lost my log but I can reproduce them if you require.

Thanks for reporting it, we will fix it.

Here is the plan to include the patchset upstream:
1. There is a fix in rdma-core v19+ for the 32b compilation issue seen.
2. As you reported, there is another compilation issue w/ v19 which needs to be solved (I will provide a fix patch).
3. The fix patch from #2 should be applied on top of the series: https://patches.dpdk.org/project/dpdk/list/?series=512

So the plan is to integrate the series, then the compilation fix patch, and then the 32b support one.
Do you agree?

I hope the above series will be merged at the beginning of next week.


> 
> >
> > Moti
> >
> >> -----Original Message-----
> >> From: Ferruh Yigit [mailto:ferruh.yigit@intel.com]
> >> Sent: Wednesday, July 4, 2018 4:49 PM
> >> To: Mordechay Haimovsky <motih@mellanox.com>; Shahaf Shuler
> >> <shahafs@mellanox.com>
> >> Cc: Adrien Mazarguil <adrien.mazarguil@6wind.com>; dev@dpdk.org
> >> Subject: Re: [dpdk-dev] [PATCH v2] net/mlx5: add support for 32bit
> >> systems
> >>
> >> On 7/2/2018 12:11 PM, Moti Haimovsky wrote:
> >>> This patch adds support for building and running mlx5 PMD on 32bit
> >>> systems such as i686.
> >>>
> >>> The main issue to tackle was handling the 32bit access to the UAR as
> >>> quoted from the mlx5 PRM:
> >>> QP and CQ DoorBells require 64-bit writes. For best performance, it
> >>> is recommended to execute the QP/CQ DoorBell as a single 64-bit
> >>> write operation. For platforms that do not support 64 bit writes, it
> >>> is possible to issue the 64 bits DoorBells through two consecutive
> >>> writes, each write 32 bits, as described below:
> >>> * The order of writing each of the Dwords is from lower to upper
> >>>   addresses.
> >>> * No other DoorBell can be rung (or even start ringing) in the midst of
> >>>   an on-going write of a DoorBell over a given UAR page.
> >>> The last rule implies that in a multi-threaded environment, the
> >>> access to a UAR page (which can be accessible by all threads in the
> >>> process) must be synchronized (for example, using a semaphore)
> >>> unless an atomic write of 64 bits in a single bus operation is
> >>> guaranteed. Such a synchronization is not required for when ringing
> >>> DoorBells on different UAR pages.
> >>>
> >>> Signed-off-by: Moti Haimovsky <motih@mellanox.com>
> >>> ---
> >>> v2:
> >>> * Fixed coding style issues.
> >>> * Modified documentation according to review inputs.
> >>> * Fixed merge conflicts.
> >>> ---
> >>>  doc/guides/nics/features/mlx5.ini |  1 +
> >>>  doc/guides/nics/mlx5.rst          |  6 +++-
> >>>  drivers/net/mlx5/mlx5.c           |  8 ++++-
> >>>  drivers/net/mlx5/mlx5.h           |  5 +++
> >>>  drivers/net/mlx5/mlx5_defs.h      | 18 ++++++++--
> >>>  drivers/net/mlx5/mlx5_rxq.c       |  6 +++-
> >>>  drivers/net/mlx5/mlx5_rxtx.c      | 22 +++++++------
> >>>  drivers/net/mlx5/mlx5_rxtx.h      | 69 ++++++++++++++++++++++++++++++++++++++-
> >>>  drivers/net/mlx5/mlx5_txq.c       | 13 +++++++-
> >>>  9 files changed, 131 insertions(+), 17 deletions(-)
> >>>
> >>> diff --git a/doc/guides/nics/features/mlx5.ini
> >>> b/doc/guides/nics/features/mlx5.ini
> >>> index e75b14b..b28b43e 100644
> >>> --- a/doc/guides/nics/features/mlx5.ini
> >>> +++ b/doc/guides/nics/features/mlx5.ini
> >>> @@ -43,5 +43,6 @@ Multiprocess aware   = Y
> >>>  Other kdrv           = Y
> >>>  ARMv8                = Y
> >>>  Power8               = Y
> >>> +x86-32               = Y
> >>>  x86-64               = Y
> >>>  Usage doc            = Y
> >>> diff --git a/doc/guides/nics/mlx5.rst b/doc/guides/nics/mlx5.rst
> >>> index
> >>> 7dd9c1c..5fbad60 100644
> >>> --- a/doc/guides/nics/mlx5.rst
> >>> +++ b/doc/guides/nics/mlx5.rst
> >>> @@ -49,7 +49,7 @@ libibverbs.
> >>>  Features
> >>>  --------
> >>>
> >>> -- Multi arch support: x86_64, POWER8, ARMv8.
> >>> +- Multi arch support: x86_64, POWER8, ARMv8, i686.
> >>>  - Multiple TX and RX queues.
> >>>  - Support for scattered TX and RX frames.
> >>>  - IPv4, IPv6, TCPv4, TCPv6, UDPv4 and UDPv6 RSS on any number of
> >> queues.
> >>> @@ -477,6 +477,10 @@ RMDA Core with Linux Kernel
> >>>  - Minimal kernel version : v4.14 or the most recent 4.14-rc (see
> >>> `Linux installation documentation`_)
> >>>  - Minimal rdma-core version: v15+ commit 0c5f5765213a ("Merge pull
> >> request #227 from yishaih/tm")
> >>>    (see `RDMA Core installation documentation`_)
> >>> +- When building for i686 use:
> >>> +
> >>> +  - rdma-core version 18.0 or above built with 32bit support.
> >>
> >> related "or above" part, v19 giving build errors with mlx5, FYI.
> >>
> >> And with v18 getting build errors originated from rdma headers [1],
> >> am I doing something wrong?
> >>
> >> [1]
> >> In file included from .../dpdk/drivers/net/mlx5/mlx5_glue.c:20:
> >> .../rdma-core/build32/include/infiniband/mlx5dv.h: In function
> >> ‘mlx5dv_x86_set_data_seg’:
> >> .../rdma-core/build32/include/infiniband/mlx5dv.h:787:69: error:
> >> right shift count >= width of type [-Werror=shift-count-overflow]
> >>   __m128i val  = _mm_set_epi32((uint32_t)address, (uint32_t)(address
> >> >> 32), lkey, length);
> >>                                                                      ^~


* [dpdk-dev] [PATCH v3] net/mlx5: add support for 32bit systems
  2018-07-02 11:11 ` [dpdk-dev] [PATCH v2] " Moti Haimovsky
  2018-07-02 20:59   ` Yongseok Koh
  2018-07-04 13:48   ` Ferruh Yigit
@ 2018-07-12 12:01   ` Moti Haimovsky
  2018-07-13  6:16     ` Shahaf Shuler
  2 siblings, 1 reply; 17+ messages in thread
From: Moti Haimovsky @ 2018-07-12 12:01 UTC (permalink / raw)
  To: shahafs; +Cc: yskoh, dev, Moti Haimovsky

This patch adds support for building and running mlx5 PMD on
32bit systems such as i686.

The main issue to tackle was handling the 32bit access to the UAR
as quoted from the mlx5 PRM:
QP and CQ DoorBells require 64-bit writes. For best performance, it
is recommended to execute the QP/CQ DoorBell as a single 64-bit write
operation. For platforms that do not support 64 bit writes, it is
possible to issue the 64 bits DoorBells through two consecutive writes,
each write 32 bits, as described below:
* The order of writing each of the Dwords is from lower to upper
  addresses.
* No other DoorBell can be rung (or even start ringing) in the midst of
  an on-going write of a DoorBell over a given UAR page.
The last rule implies that in a multi-threaded environment, the access
to a UAR page (which can be accessible by all threads in the process)
must be synchronized (for example, using a semaphore) unless an atomic
write of 64 bits in a single bus operation is guaranteed. Such a
synchronization is not required for when ringing DoorBells on different
UAR pages.
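
As a rough sketch of the two rules quoted above (not the code in this patch):
the lower Dword is written first, and a per-page lock keeps any other DoorBell
on the same UAR page from starting in between. All names below are
hypothetical; atomic_flag stands in for rte_spinlock_t, a C11 release fence
stands in for rte_io_wmb(), and a plain array stands in for the mapped UAR
register.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>
#include <stdint.h>

/* Number of per-page locks; a power of two so a mask can pick one. */
#define UAR_PAGE_LOCKS 64u

/* Hypothetical stand-in for rte_spinlock_t. */
typedef struct {
	atomic_flag busy;
} uar_lock_t;

static inline void
uar_lock(uar_lock_t *l)
{
	/* Spin until the flag was previously clear. */
	while (atomic_flag_test_and_set_explicit(&l->busy,
						 memory_order_acquire))
		;
}

static inline void
uar_unlock(uar_lock_t *l)
{
	atomic_flag_clear_explicit(&l->busy, memory_order_release);
}

/* Map a UAR mmap offset to a lock slot by its page number. */
static inline unsigned int
uar_lock_index(uint64_t mmap_offset, uint64_t page_size)
{
	return (unsigned int)((mmap_offset / page_size) &
			      (UAR_PAGE_LOCKS - 1));
}

/*
 * Issue a 64-bit DoorBell value as two 32-bit writes, lower address first.
 * On a little-endian host this leaves the same bytes in memory as a single
 * 64-bit store of val.
 */
static inline void
doorbell_write64_split(uint64_t val, volatile uint32_t *reg,
		       uar_lock_t *lock)
{
	assert(reg != NULL && lock != NULL);
	uar_lock(lock);
	reg[0] = (uint32_t)val;
	/* Stand-in for an I/O write barrier between the two halves. */
	atomic_thread_fence(memory_order_release);
	reg[1] = (uint32_t)(val >> 32);
	uar_unlock(lock);
}
```

Queues whose DoorBells live on different UAR pages map to different lock
slots (up to the mask), so they do not serialize against each other.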

Signed-off-by: Moti Haimovsky <motih@mellanox.com>
---
v3:
* Rebased upon latest changes in mlx5 PMD and rdma-core.

v2:
* Fixed coding style issues.
* Modified documentation according to review inputs.
* Fixed merge conflicts.
---
 doc/guides/nics/features/mlx5.ini |  1 +
 doc/guides/nics/mlx5.rst          |  6 +++-
 drivers/net/mlx5/mlx5.c           |  8 ++++-
 drivers/net/mlx5/mlx5.h           |  5 +++
 drivers/net/mlx5/mlx5_defs.h      | 18 ++++++++--
 drivers/net/mlx5/mlx5_rxq.c       |  6 +++-
 drivers/net/mlx5/mlx5_rxtx.c      | 22 +++++++------
 drivers/net/mlx5/mlx5_rxtx.h      | 69 ++++++++++++++++++++++++++++++++++++++-
 drivers/net/mlx5/mlx5_txq.c       | 13 +++++++-
 9 files changed, 131 insertions(+), 17 deletions(-)

diff --git a/doc/guides/nics/features/mlx5.ini b/doc/guides/nics/features/mlx5.ini
index e75b14b..b28b43e 100644
--- a/doc/guides/nics/features/mlx5.ini
+++ b/doc/guides/nics/features/mlx5.ini
@@ -43,5 +43,6 @@ Multiprocess aware   = Y
 Other kdrv           = Y
 ARMv8                = Y
 Power8               = Y
+x86-32               = Y
 x86-64               = Y
 Usage doc            = Y
diff --git a/doc/guides/nics/mlx5.rst b/doc/guides/nics/mlx5.rst
index 0d0d217..ebf2336 100644
--- a/doc/guides/nics/mlx5.rst
+++ b/doc/guides/nics/mlx5.rst
@@ -49,7 +49,7 @@ libibverbs.
 Features
 --------
 
-- Multi arch support: x86_64, POWER8, ARMv8.
+- Multi arch support: x86_64, POWER8, ARMv8, i686.
 - Multiple TX and RX queues.
 - Support for scattered TX and RX frames.
 - IPv4, IPv6, TCPv4, TCPv6, UDPv4 and UDPv6 RSS on any number of queues.
@@ -489,6 +489,10 @@ RMDA Core with Linux Kernel
 - Minimal kernel version : v4.14 or the most recent 4.14-rc (see `Linux installation documentation`_)
 - Minimal rdma-core version: v15+ commit 0c5f5765213a ("Merge pull request #227 from yishaih/tm")
   (see `RDMA Core installation documentation`_)
+- When building for i686 use:
+
+  - rdma-core version 18.0 or above built with 32bit support.
+  - Kernel version 4.14.41 or above.
 
 .. _`Linux installation documentation`: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable.git/plain/Documentation/admin-guide/README.rst
 .. _`RDMA Core installation documentation`: https://raw.githubusercontent.com/linux-rdma/rdma-core/master/README.md
diff --git a/drivers/net/mlx5/mlx5.c b/drivers/net/mlx5/mlx5.c
index dda50b8..15f1a17 100644
--- a/drivers/net/mlx5/mlx5.c
+++ b/drivers/net/mlx5/mlx5.c
@@ -598,7 +598,7 @@
 	rte_memseg_walk(find_lower_va_bound, &addr);
 
 	/* keep distance to hugepages to minimize potential conflicts. */
-	addr = RTE_PTR_SUB(addr, MLX5_UAR_OFFSET + MLX5_UAR_SIZE);
+	addr = RTE_PTR_SUB(addr, (uintptr_t)(MLX5_UAR_OFFSET + MLX5_UAR_SIZE));
 	/* anonymous mmap, no real memory consumption. */
 	addr = mmap(addr, MLX5_UAR_SIZE,
 		    PROT_NONE, MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
@@ -939,6 +939,12 @@
 	priv->device_attr = attr;
 	priv->pd = pd;
 	priv->mtu = ETHER_MTU;
+#ifndef RTE_ARCH_64
+	/* Initialize UAR access locks for 32bit implementations. */
+	rte_spinlock_init(&priv->uar_lock_cq);
+	for (i = 0; i < MLX5_UAR_PAGE_NUM_MAX; i++)
+		rte_spinlock_init(&priv->uar_lock[i]);
+#endif
 	/* Some internal functions rely on Netlink sockets, open them now. */
 	priv->nl_socket_rdma = mlx5_nl_init(0, NETLINK_RDMA);
 	priv->nl_socket_route =	mlx5_nl_init(RTMGRP_LINK, NETLINK_ROUTE);
diff --git a/drivers/net/mlx5/mlx5.h b/drivers/net/mlx5/mlx5.h
index 131be33..896158a 100644
--- a/drivers/net/mlx5/mlx5.h
+++ b/drivers/net/mlx5/mlx5.h
@@ -215,6 +215,11 @@ struct priv {
 	int nl_socket_rdma; /* Netlink socket (NETLINK_RDMA). */
 	int nl_socket_route; /* Netlink socket (NETLINK_ROUTE). */
 	uint32_t nl_sn; /* Netlink message sequence number. */
+#ifndef RTE_ARCH_64
+	rte_spinlock_t uar_lock_cq; /* CQs share a common distinct UAR */
+	rte_spinlock_t uar_lock[MLX5_UAR_PAGE_NUM_MAX];
+	/* UAR same-page access control required in 32bit implementations. */
+#endif
 };
 
 #define PORT_ID(priv) ((priv)->dev_data->port_id)
diff --git a/drivers/net/mlx5/mlx5_defs.h b/drivers/net/mlx5/mlx5_defs.h
index 5bbbec2..f6ec415 100644
--- a/drivers/net/mlx5/mlx5_defs.h
+++ b/drivers/net/mlx5/mlx5_defs.h
@@ -87,14 +87,28 @@
 #define MLX5_LINK_STATUS_TIMEOUT 10
 
 /* Reserved address space for UAR mapping. */
-#define MLX5_UAR_SIZE (1ULL << 32)
+#define MLX5_UAR_SIZE (1ULL << (sizeof(uintptr_t) * 4))
 
 /* Offset of reserved UAR address space to hugepage memory. Offset is used here
  * to minimize possibility of address next to hugepage being used by other code
  * in either primary or secondary process, failing to map TX UAR would make TX
  * packets invisible to HW.
  */
-#define MLX5_UAR_OFFSET (1ULL << 32)
+#define MLX5_UAR_OFFSET (1ULL << (sizeof(uintptr_t) * 4))
+
+/* Maximum number of UAR pages used by a port,
+ * These are the size and mask for an array of mutexes used to synchronize
+ * the access to port's UARs on platforms that do not support 64 bit writes.
+ * In such systems it is possible to issue the 64 bits DoorBells through two
+ * consecutive writes, each write 32 bits. The access to a UAR page (which can
+ * be accessible by all threads in the process) must be synchronized
+ * (for example, using a semaphore). Such a synchronization is not required
+ * when ringing DoorBells on different UAR pages.
+ * A port with 512 Tx queues uses 8, 4kBytes, UAR pages which are shared
+ * among the ports.
+ */
+#define MLX5_UAR_PAGE_NUM_MAX 64
+#define MLX5_UAR_PAGE_NUM_MASK ((MLX5_UAR_PAGE_NUM_MAX) - 1)
 
 /* Log 2 of the default number of strides per WQE for Multi-Packet RQ. */
 #define MLX5_MPRQ_STRIDE_NUM_N 6U
diff --git a/drivers/net/mlx5/mlx5_rxq.c b/drivers/net/mlx5/mlx5_rxq.c
index 071740b..16e1641 100644
--- a/drivers/net/mlx5/mlx5_rxq.c
+++ b/drivers/net/mlx5/mlx5_rxq.c
@@ -647,7 +647,8 @@
 	doorbell = (uint64_t)doorbell_hi << 32;
 	doorbell |=  rxq->cqn;
 	rxq->cq_db[MLX5_CQ_ARM_DB] = rte_cpu_to_be_32(doorbell_hi);
-	rte_write64(rte_cpu_to_be_64(doorbell), cq_db_reg);
+	mlx5_uar_write64(rte_cpu_to_be_64(doorbell),
+			 cq_db_reg, rxq->uar_lock_cq);
 }
 
 /**
@@ -1449,6 +1450,9 @@ struct mlx5_rxq_ctrl *
 	tmpl->rxq.elts_n = log2above(desc);
 	tmpl->rxq.elts =
 		(struct rte_mbuf *(*)[1 << tmpl->rxq.elts_n])(tmpl + 1);
+#ifndef RTE_ARCH_64
+	tmpl->rxq.uar_lock_cq = &priv->uar_lock_cq;
+#endif
 	tmpl->idx = idx;
 	rte_atomic32_inc(&tmpl->refcnt);
 	LIST_INSERT_HEAD(&priv->rxqsctrl, tmpl, next);
diff --git a/drivers/net/mlx5/mlx5_rxtx.c b/drivers/net/mlx5/mlx5_rxtx.c
index a7ed8d8..52a1074 100644
--- a/drivers/net/mlx5/mlx5_rxtx.c
+++ b/drivers/net/mlx5/mlx5_rxtx.c
@@ -495,6 +495,7 @@
 	volatile struct mlx5_wqe_ctrl *last_wqe = NULL;
 	unsigned int segs_n = 0;
 	const unsigned int max_inline = txq->max_inline;
+	uint64_t addr_64;
 
 	if (unlikely(!pkts_n))
 		return 0;
@@ -711,12 +712,12 @@
 			ds = 3;
 use_dseg:
 			/* Add the remaining packet as a simple ds. */
-			addr = rte_cpu_to_be_64(addr);
+			addr_64 = rte_cpu_to_be_64(addr);
 			*dseg = (rte_v128u32_t){
 				rte_cpu_to_be_32(length),
 				mlx5_tx_mb2mr(txq, buf),
-				addr,
-				addr >> 32,
+				addr_64,
+				addr_64 >> 32,
 			};
 			++ds;
 			if (!segs_n)
@@ -750,12 +751,12 @@
 		total_length += length;
 #endif
 		/* Store segment information. */
-		addr = rte_cpu_to_be_64(rte_pktmbuf_mtod(buf, uintptr_t));
+		addr_64 = rte_cpu_to_be_64(rte_pktmbuf_mtod(buf, uintptr_t));
 		*dseg = (rte_v128u32_t){
 			rte_cpu_to_be_32(length),
 			mlx5_tx_mb2mr(txq, buf),
-			addr,
-			addr >> 32,
+			addr_64,
+			addr_64 >> 32,
 		};
 		(*txq->elts)[++elts_head & elts_m] = buf;
 		if (--segs_n)
@@ -1450,6 +1451,7 @@
 	unsigned int mpw_room = 0;
 	unsigned int inl_pad = 0;
 	uint32_t inl_hdr;
+	uint64_t addr_64;
 	struct mlx5_mpw mpw = {
 		.state = MLX5_MPW_STATE_CLOSED,
 	};
@@ -1586,13 +1588,13 @@
 					((uintptr_t)mpw.data.raw +
 					 inl_pad);
 			(*txq->elts)[elts_head++ & elts_m] = buf;
-			addr = rte_cpu_to_be_64(rte_pktmbuf_mtod(buf,
-								 uintptr_t));
+			addr_64 = rte_cpu_to_be_64(rte_pktmbuf_mtod(buf,
+								    uintptr_t));
 			*dseg = (rte_v128u32_t) {
 				rte_cpu_to_be_32(length),
 				mlx5_tx_mb2mr(txq, buf),
-				addr,
-				addr >> 32,
+				addr_64,
+				addr_64 >> 32,
 			};
 			mpw.data.raw = (volatile void *)(dseg + 1);
 			mpw.total_len += (inl_pad + sizeof(*dseg));
diff --git a/drivers/net/mlx5/mlx5_rxtx.h b/drivers/net/mlx5/mlx5_rxtx.h
index a04a84f..992e977 100644
--- a/drivers/net/mlx5/mlx5_rxtx.h
+++ b/drivers/net/mlx5/mlx5_rxtx.h
@@ -26,6 +26,8 @@
 #include <rte_common.h>
 #include <rte_hexdump.h>
 #include <rte_atomic.h>
+#include <rte_spinlock.h>
+#include <rte_io.h>
 
 #include "mlx5_utils.h"
 #include "mlx5.h"
@@ -118,6 +120,10 @@ struct mlx5_rxq_data {
 	void *cq_uar; /* CQ user access region. */
 	uint32_t cqn; /* CQ number. */
 	uint8_t cq_arm_sn; /* CQ arm seq number. */
+#ifndef RTE_ARCH_64
+	rte_spinlock_t *uar_lock_cq;
+	/* CQ (UAR) access lock required for 32bit implementations */
+#endif
 	uint32_t tunnel; /* Tunnel information. */
 } __rte_cache_aligned;
 
@@ -198,6 +204,10 @@ struct mlx5_txq_data {
 	volatile void *bf_reg; /* Blueflame register remapped. */
 	struct rte_mbuf *(*elts)[]; /* TX elements. */
 	struct mlx5_txq_stats stats; /* TX queue counters. */
+#ifndef RTE_ARCH_64
+	rte_spinlock_t *uar_lock;
+	/* UAR access lock required for 32bit implementations */
+#endif
 } __rte_cache_aligned;
 
 /* Verbs Rx queue elements. */
@@ -353,6 +363,63 @@ uint16_t mlx5_rx_burst_vec(void *dpdk_txq, struct rte_mbuf **pkts,
 uint32_t mlx5_rx_addr2mr_bh(struct mlx5_rxq_data *rxq, uintptr_t addr);
 uint32_t mlx5_tx_addr2mr_bh(struct mlx5_txq_data *txq, uintptr_t addr);
 
+/**
+ * Provide safe 64bit store operation to mlx5 UAR region for both 32bit and
+ * 64bit architectures.
+ *
+ * @param val
+ *   value to write in CPU endian format.
+ * @param addr
+ *   Address to write to.
+ * @param lock
+ *   Address of the lock to use for that UAR access.
+ */
+static __rte_always_inline void
+__mlx5_uar_write64_relaxed(uint64_t val, volatile void *addr,
+			   rte_spinlock_t *lock __rte_unused)
+{
+#ifdef RTE_ARCH_64
+	rte_write64_relaxed(val, addr);
+#else /* !RTE_ARCH_64 */
+	rte_spinlock_lock(lock);
+	rte_write32_relaxed(val, addr);
+	rte_io_wmb();
+	rte_write32_relaxed(val >> 32,
+			    (volatile void *)((volatile char *)addr + 4));
+	rte_spinlock_unlock(lock);
+#endif
+}
+
+/**
+ * Provide safe 64bit store operation to mlx5 UAR region for both 32bit and
+ * 64bit architectures while guaranteeing the order of execution with the
+ * code being executed.
+ *
+ * @param val
+ *   value to write in CPU endian format.
+ * @param addr
+ *   Address to write to.
+ * @param lock
+ *   Address of the lock to use for that UAR access.
+ */
+static __rte_always_inline void
+__mlx5_uar_write64(uint64_t val, volatile void *addr, rte_spinlock_t *lock)
+{
+	rte_io_wmb();
+	__mlx5_uar_write64_relaxed(val, addr, lock);
+}
+
+/* Assist macros, used instead of directly calling the functions they wrap. */
+#ifdef RTE_ARCH_64
+#define mlx5_uar_write64_relaxed(val, dst, lock) \
+		__mlx5_uar_write64_relaxed(val, dst, NULL)
+#define mlx5_uar_write64(val, dst, lock) __mlx5_uar_write64(val, dst, NULL)
+#else
+#define mlx5_uar_write64_relaxed(val, dst, lock) \
+		__mlx5_uar_write64_relaxed(val, dst, lock)
+#define mlx5_uar_write64(val, dst, lock) __mlx5_uar_write64(val, dst, lock)
+#endif
+
 #ifndef NDEBUG
 /**
  * Verify or set magic value in CQE.
@@ -619,7 +686,7 @@ uint16_t mlx5_rx_burst_vec(void *dpdk_txq, struct rte_mbuf **pkts,
 	*txq->qp_db = rte_cpu_to_be_32(txq->wqe_ci);
 	/* Ensure ordering between DB record and BF copy. */
 	rte_wmb();
-	*dst = *src;
+	mlx5_uar_write64_relaxed(*src, dst, txq->uar_lock);
 	if (cond)
 		rte_wmb();
 }
diff --git a/drivers/net/mlx5/mlx5_txq.c b/drivers/net/mlx5/mlx5_txq.c
index 5057561..f9bc473 100644
--- a/drivers/net/mlx5/mlx5_txq.c
+++ b/drivers/net/mlx5/mlx5_txq.c
@@ -255,6 +255,9 @@
 	struct mlx5_txq_ctrl *txq_ctrl;
 	int already_mapped;
 	size_t page_size = sysconf(_SC_PAGESIZE);
+#ifndef RTE_ARCH_64
+	unsigned int lock_idx;
+#endif
 
 	memset(pages, 0, priv->txqs_n * sizeof(uintptr_t));
 	/*
@@ -281,7 +284,7 @@
 		}
 		/* new address in reserved UAR address space. */
 		addr = RTE_PTR_ADD(priv->uar_base,
-				   uar_va & (MLX5_UAR_SIZE - 1));
+				   uar_va & (uintptr_t)(MLX5_UAR_SIZE - 1));
 		if (!already_mapped) {
 			pages[pages_n++] = uar_va;
 			/* fixed mmap to specified address in reserved
@@ -305,6 +308,12 @@
 		else
 			assert(txq_ctrl->txq.bf_reg ==
 			       RTE_PTR_ADD((void *)addr, off));
+#ifndef RTE_ARCH_64
+		/* Assign a UAR lock according to UAR page number */
+		lock_idx = (txq_ctrl->uar_mmap_offset / page_size) &
+			   MLX5_UAR_PAGE_NUM_MASK;
+		txq->uar_lock = &priv->uar_lock[lock_idx];
+#endif
 	}
 	return 0;
 }
@@ -511,6 +520,8 @@ struct mlx5_txq_ibv *
 	rte_atomic32_inc(&txq_ibv->refcnt);
 	if (qp.comp_mask & MLX5DV_QP_MASK_UAR_MMAP_OFFSET) {
 		txq_ctrl->uar_mmap_offset = qp.uar_mmap_offset;
+		DRV_LOG(DEBUG, "port %u: uar_mmap_offset 0x%lx",
+			dev->data->port_id, txq_ctrl->uar_mmap_offset);
 	} else {
 		DRV_LOG(ERR,
 			"port %u failed to retrieve UAR info, invalid"
-- 
1.8.3.1
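
For readers following the lock assignment in mlx5_txq.c above, here is a minimal standalone sketch of how a UAR mmap offset maps to a lock slot. The names uar_lock_index, UAR_PAGE_NUM_MAX and UAR_PAGE_NUM_MASK are hypothetical stand-ins for the patch's MLX5_UAR_PAGE_NUM_* macros; this is an illustration under those assumptions, not driver code.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical mirrors of MLX5_UAR_PAGE_NUM_MAX / MLX5_UAR_PAGE_NUM_MASK. */
#define UAR_PAGE_NUM_MAX 64
#define UAR_PAGE_NUM_MASK (UAR_PAGE_NUM_MAX - 1)

/*
 * Map a UAR mmap offset to a slot in a fixed-size lock array.
 * Offsets inside the same page always yield the same slot, so doorbells
 * rung on one UAR page are serialized by one lock.
 */
static unsigned int
uar_lock_index(uint64_t uar_mmap_offset, uint64_t page_size)
{
	assert(page_size != 0);
	return (unsigned int)((uar_mmap_offset / page_size) &
			      UAR_PAGE_NUM_MASK);
}
```

Because the page number is masked, pages that are UAR_PAGE_NUM_MAX apart share a slot. That is still safe, since it only means two different pages are serialized by one lock, but it can add contention.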


* Re: [dpdk-dev] [PATCH v3] net/mlx5: add support for 32bit systems
  2018-07-12 12:01   ` [dpdk-dev] [PATCH v3] " Moti Haimovsky
@ 2018-07-13  6:16     ` Shahaf Shuler
  2018-07-18  8:08       ` Ferruh Yigit
  0 siblings, 1 reply; 17+ messages in thread
From: Shahaf Shuler @ 2018-07-13  6:16 UTC (permalink / raw)
  To: Mordechay Haimovsky; +Cc: Yongseok Koh, dev, ferruh.yigit


Thursday, July 12, 2018 3:02 PM, Mordechay Haimovsky:
> Subject: [PATCH v3] net/mlx5: add support for 32bit systems
> 
> This patch adds support for building and running mlx5 PMD on 32bit systems
> such as i686.
> 
> The main issue to tackle was handling the 32bit access to the UAR as quoted
> from the mlx5 PRM:
> QP and CQ DoorBells require 64-bit writes. For best performance, it is
> recommended to execute the QP/CQ DoorBell as a single 64-bit write
> operation. For platforms that do not support 64 bit writes, it is possible to
> issue the 64 bits DoorBells through two consecutive writes, each write 32
> bits, as described below:
> * The order of writing each of the Dwords is from lower to upper
>   addresses.
> * No other DoorBell can be rung (or even start ringing) in the midst of an
>   on-going write of a DoorBell over a given UAR page.
> The last rule implies that in a multi-threaded environment, the access to a
> UAR page (which can be accessible by all threads in the process) must be
> synchronized (for example, using a semaphore) unless an atomic write of 64
> bits in a single bus operation is guaranteed. Such a synchronization is not
> required for when ringing DoorBells on different UAR pages.
> 
> Signed-off-by: Moti Haimovsky <motih@mellanox.com>

Applied to next-net-mlx (again), thanks. 
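
The PRM rule quoted above (lower Dword first, no interleaving on the same UAR page) can be sketched outside DPDK roughly as follows. This is a minimal illustration, not the driver code: uar_lock_t and doorbell_write64_split are hypothetical names, the spinlock is a C11 atomic_flag standing in for rte_spinlock_t, and atomic_thread_fence() stands in for rte_io_wmb().

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for rte_spinlock_t, built on C11 atomic_flag. */
typedef struct {
	atomic_flag locked;
} uar_lock_t;

static void
uar_lock_init(uar_lock_t *l)
{
	atomic_flag_clear(&l->locked);
}

static void
uar_lock_acquire(uar_lock_t *l)
{
	while (atomic_flag_test_and_set_explicit(&l->locked,
						 memory_order_acquire))
		; /* spin until the current holder releases the lock */
}

static void
uar_lock_release(uar_lock_t *l)
{
	atomic_flag_clear_explicit(&l->locked, memory_order_release);
}

/*
 * Issue a 64-bit doorbell as two 32-bit stores, lower address first,
 * holding the per-UAR-page lock so no other doorbell on that page can
 * start in between.
 */
static void
doorbell_write64_split(uint64_t val, volatile void *addr, uar_lock_t *lock)
{
	volatile uint32_t *dw = (volatile uint32_t *)addr;

	assert(addr != NULL && lock != NULL);
	uar_lock_acquire(lock);
	dw[0] = (uint32_t)val;		/* low 32 bits, lower address */
	atomic_thread_fence(memory_order_release); /* order the two stores */
	dw[1] = (uint32_t)(val >> 32);	/* high 32 bits, address + 4 */
	uar_lock_release(lock);
}
```

As in the patch, the low 32 bits of val go to the lower address, so a caller that needs a specific byte layout in the register converts the value first (the driver passes rte_cpu_to_be_64() output).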

Guidelines for 32-bit compilation and testing:
1. Fetch the latest rdma-core from GitHub. Make sure it includes commit "708c8242 mlx5: Fix compilation on 32 bit systems when sse3 is on".
2. Compile rdma-core for 32-bit (approach taken from the rdma-core travis build, https://github.com/linux-rdma/rdma-core/blob/master/buildlib/travis-build#L20):
	mkdir build32
	cd build32
	CFLAGS="-Werror -m32" cmake -GNinja .. -DENABLE_RESOLVE_NEIGH=0 -DIOCTL_MODE=both
	ninja (or ninja-build)
3. Compile and run DPDK against the build32 directory.

> ---
> v3:
> * Rebased upon latest changes in mlx5 PMD and rdma-core.
> 
> v2:
> * Fixed coding style issues.
> * Modified documentation according to review inputs.
> * Fixed merge conflicts.
> ---
>  doc/guides/nics/features/mlx5.ini |  1 +
>  doc/guides/nics/mlx5.rst          |  6 +++-
>  drivers/net/mlx5/mlx5.c           |  8 ++++-
>  drivers/net/mlx5/mlx5.h           |  5 +++
>  drivers/net/mlx5/mlx5_defs.h      | 18 ++++++++--
>  drivers/net/mlx5/mlx5_rxq.c       |  6 +++-
>  drivers/net/mlx5/mlx5_rxtx.c      | 22 +++++++------
>  drivers/net/mlx5/mlx5_rxtx.h      | 69 ++++++++++++++++++++++++++++++++++++++-
>  drivers/net/mlx5/mlx5_txq.c       | 13 +++++++-
>  9 files changed, 131 insertions(+), 17 deletions(-)
> 
> diff --git a/doc/guides/nics/features/mlx5.ini b/doc/guides/nics/features/mlx5.ini
> index e75b14b..b28b43e 100644
> --- a/doc/guides/nics/features/mlx5.ini
> +++ b/doc/guides/nics/features/mlx5.ini
> @@ -43,5 +43,6 @@ Multiprocess aware   = Y
>  Other kdrv           = Y
>  ARMv8                = Y
>  Power8               = Y
> +x86-32               = Y
>  x86-64               = Y
>  Usage doc            = Y
> diff --git a/doc/guides/nics/mlx5.rst b/doc/guides/nics/mlx5.rst
> index 0d0d217..ebf2336 100644
> --- a/doc/guides/nics/mlx5.rst
> +++ b/doc/guides/nics/mlx5.rst
> @@ -49,7 +49,7 @@ libibverbs.
>  Features
>  --------
> 
> -- Multi arch support: x86_64, POWER8, ARMv8.
> +- Multi arch support: x86_64, POWER8, ARMv8, i686.
>  - Multiple TX and RX queues.
>  - Support for scattered TX and RX frames.
>  - IPv4, IPv6, TCPv4, TCPv6, UDPv4 and UDPv6 RSS on any number of queues.
> @@ -489,6 +489,10 @@ RMDA Core with Linux Kernel
>  - Minimal kernel version : v4.14 or the most recent 4.14-rc (see `Linux installation documentation`_)
>  - Minimal rdma-core version: v15+ commit 0c5f5765213a ("Merge pull request #227 from yishaih/tm")
>    (see `RDMA Core installation documentation`_)
> +- When building for i686 use:
> +
> +  - rdma-core version 18.0 or above built with 32bit support.
> +  - Kernel version 4.14.41 or above.
> 
>  .. _`Linux installation documentation`: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable.git/plain/Documentation/admin-guide/README.rst
>  .. _`RDMA Core installation documentation`: https://raw.githubusercontent.com/linux-rdma/rdma-core/master/README.md
> diff --git a/drivers/net/mlx5/mlx5.c b/drivers/net/mlx5/mlx5.c
> index dda50b8..15f1a17 100644
> --- a/drivers/net/mlx5/mlx5.c
> +++ b/drivers/net/mlx5/mlx5.c
> @@ -598,7 +598,7 @@
>  	rte_memseg_walk(find_lower_va_bound, &addr);
> 
>  	/* keep distance to hugepages to minimize potential conflicts. */
> -	addr = RTE_PTR_SUB(addr, MLX5_UAR_OFFSET + MLX5_UAR_SIZE);
> +	addr = RTE_PTR_SUB(addr, (uintptr_t)(MLX5_UAR_OFFSET + MLX5_UAR_SIZE));
>  	/* anonymous mmap, no real memory consumption. */
>  	addr = mmap(addr, MLX5_UAR_SIZE,
>  		    PROT_NONE, MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
> @@ -939,6 +939,12 @@
>  	priv->device_attr = attr;
>  	priv->pd = pd;
>  	priv->mtu = ETHER_MTU;
> +#ifndef RTE_ARCH_64
> +	/* Initialize UAR access locks for 32bit implementations. */
> +	rte_spinlock_init(&priv->uar_lock_cq);
> +	for (i = 0; i < MLX5_UAR_PAGE_NUM_MAX; i++)
> +		rte_spinlock_init(&priv->uar_lock[i]);
> +#endif
>  	/* Some internal functions rely on Netlink sockets, open them now. */
>  	priv->nl_socket_rdma = mlx5_nl_init(0, NETLINK_RDMA);
>  	priv->nl_socket_route =	mlx5_nl_init(RTMGRP_LINK, NETLINK_ROUTE);
> diff --git a/drivers/net/mlx5/mlx5.h b/drivers/net/mlx5/mlx5.h
> index 131be33..896158a 100644
> --- a/drivers/net/mlx5/mlx5.h
> +++ b/drivers/net/mlx5/mlx5.h
> @@ -215,6 +215,11 @@ struct priv {
>  	int nl_socket_rdma; /* Netlink socket (NETLINK_RDMA). */
>  	int nl_socket_route; /* Netlink socket (NETLINK_ROUTE). */
>  	uint32_t nl_sn; /* Netlink message sequence number. */
> +#ifndef RTE_ARCH_64
> +	rte_spinlock_t uar_lock_cq; /* CQs share a common distinct UAR */
> +	rte_spinlock_t uar_lock[MLX5_UAR_PAGE_NUM_MAX];
> +	/* UAR same-page access control required in 32bit implementations. */
> +#endif
>  };
> 
>  #define PORT_ID(priv) ((priv)->dev_data->port_id)
> diff --git a/drivers/net/mlx5/mlx5_defs.h b/drivers/net/mlx5/mlx5_defs.h
> index 5bbbec2..f6ec415 100644
> --- a/drivers/net/mlx5/mlx5_defs.h
> +++ b/drivers/net/mlx5/mlx5_defs.h
> @@ -87,14 +87,28 @@
>  #define MLX5_LINK_STATUS_TIMEOUT 10
> 
>  /* Reserved address space for UAR mapping. */
> -#define MLX5_UAR_SIZE (1ULL << 32)
> +#define MLX5_UAR_SIZE (1ULL << (sizeof(uintptr_t) * 4))
> 
>  /* Offset of reserved UAR address space to hugepage memory. Offset is used here
>   * to minimize possibility of address next to hugepage being used by other code
>   * in either primary or secondary process, failing to map TX UAR would make TX
>   * packets invisible to HW.
>   */
> -#define MLX5_UAR_OFFSET (1ULL << 32)
> +#define MLX5_UAR_OFFSET (1ULL << (sizeof(uintptr_t) * 4))
> +
> +/* Maximum number of UAR pages used by a port,
> + * These are the size and mask for an array of mutexes used to synchronize
> + * the access to port's UARs on platforms that do not support 64 bit writes.
> + * In such systems it is possible to issue the 64 bits DoorBells through two
> + * consecutive writes, each write 32 bits. The access to a UAR page (which can
> + * be accessible by all threads in the process) must be synchronized
> + * (for example, using a semaphore). Such a synchronization is not required
> + * when ringing DoorBells on different UAR pages.
> + * A port with 512 Tx queues uses 8, 4kBytes, UAR pages which are shared
> + * among the ports.
> + */
> +#define MLX5_UAR_PAGE_NUM_MAX 64
> +#define MLX5_UAR_PAGE_NUM_MASK ((MLX5_UAR_PAGE_NUM_MAX) - 1)
> 
>  /* Log 2 of the default number of strides per WQE for Multi-Packet RQ. */
>  #define MLX5_MPRQ_STRIDE_NUM_N 6U
> diff --git a/drivers/net/mlx5/mlx5_rxq.c b/drivers/net/mlx5/mlx5_rxq.c
> index 071740b..16e1641 100644
> --- a/drivers/net/mlx5/mlx5_rxq.c
> +++ b/drivers/net/mlx5/mlx5_rxq.c
> @@ -647,7 +647,8 @@
>  	doorbell = (uint64_t)doorbell_hi << 32;
>  	doorbell |=  rxq->cqn;
>  	rxq->cq_db[MLX5_CQ_ARM_DB] = rte_cpu_to_be_32(doorbell_hi);
> -	rte_write64(rte_cpu_to_be_64(doorbell), cq_db_reg);
> +	mlx5_uar_write64(rte_cpu_to_be_64(doorbell),
> +			 cq_db_reg, rxq->uar_lock_cq);
>  }
> 
>  /**
> @@ -1449,6 +1450,9 @@ struct mlx5_rxq_ctrl *
>  	tmpl->rxq.elts_n = log2above(desc);
>  	tmpl->rxq.elts =
>  		(struct rte_mbuf *(*)[1 << tmpl->rxq.elts_n])(tmpl + 1);
> +#ifndef RTE_ARCH_64
> +	tmpl->rxq.uar_lock_cq = &priv->uar_lock_cq;
> +#endif
>  	tmpl->idx = idx;
>  	rte_atomic32_inc(&tmpl->refcnt);
>  	LIST_INSERT_HEAD(&priv->rxqsctrl, tmpl, next);
> diff --git a/drivers/net/mlx5/mlx5_rxtx.c b/drivers/net/mlx5/mlx5_rxtx.c
> index a7ed8d8..52a1074 100644
> --- a/drivers/net/mlx5/mlx5_rxtx.c
> +++ b/drivers/net/mlx5/mlx5_rxtx.c
> @@ -495,6 +495,7 @@
>  	volatile struct mlx5_wqe_ctrl *last_wqe = NULL;
>  	unsigned int segs_n = 0;
>  	const unsigned int max_inline = txq->max_inline;
> +	uint64_t addr_64;
> 
>  	if (unlikely(!pkts_n))
>  		return 0;
> @@ -711,12 +712,12 @@
>  			ds = 3;
>  use_dseg:
>  			/* Add the remaining packet as a simple ds. */
> -			addr = rte_cpu_to_be_64(addr);
> +			addr_64 = rte_cpu_to_be_64(addr);
>  			*dseg = (rte_v128u32_t){
>  				rte_cpu_to_be_32(length),
>  				mlx5_tx_mb2mr(txq, buf),
> -				addr,
> -				addr >> 32,
> +				addr_64,
> +				addr_64 >> 32,
>  			};
>  			++ds;
>  			if (!segs_n)
> @@ -750,12 +751,12 @@
>  		total_length += length;
>  #endif
>  		/* Store segment information. */
> -		addr = rte_cpu_to_be_64(rte_pktmbuf_mtod(buf, uintptr_t));
> +		addr_64 = rte_cpu_to_be_64(rte_pktmbuf_mtod(buf, uintptr_t));
>  		*dseg = (rte_v128u32_t){
>  			rte_cpu_to_be_32(length),
>  			mlx5_tx_mb2mr(txq, buf),
> -			addr,
> -			addr >> 32,
> +			addr_64,
> +			addr_64 >> 32,
>  		};
>  		(*txq->elts)[++elts_head & elts_m] = buf;
>  		if (--segs_n)
> @@ -1450,6 +1451,7 @@
>  	unsigned int mpw_room = 0;
>  	unsigned int inl_pad = 0;
>  	uint32_t inl_hdr;
> +	uint64_t addr_64;
>  	struct mlx5_mpw mpw = {
>  		.state = MLX5_MPW_STATE_CLOSED,
>  	};
> @@ -1586,13 +1588,13 @@
>  					((uintptr_t)mpw.data.raw +
>  					 inl_pad);
>  			(*txq->elts)[elts_head++ & elts_m] = buf;
> -			addr = rte_cpu_to_be_64(rte_pktmbuf_mtod(buf,
> -								 uintptr_t));
> +			addr_64 = rte_cpu_to_be_64(rte_pktmbuf_mtod(buf,
> +								    uintptr_t));
>  			*dseg = (rte_v128u32_t) {
>  				rte_cpu_to_be_32(length),
>  				mlx5_tx_mb2mr(txq, buf),
> -				addr,
> -				addr >> 32,
> +				addr_64,
> +				addr_64 >> 32,
>  			};
>  			mpw.data.raw = (volatile void *)(dseg + 1);
>  			mpw.total_len += (inl_pad + sizeof(*dseg));
> diff --git a/drivers/net/mlx5/mlx5_rxtx.h b/drivers/net/mlx5/mlx5_rxtx.h
> index a04a84f..992e977 100644
> --- a/drivers/net/mlx5/mlx5_rxtx.h
> +++ b/drivers/net/mlx5/mlx5_rxtx.h
> @@ -26,6 +26,8 @@
>  #include <rte_common.h>
>  #include <rte_hexdump.h>
>  #include <rte_atomic.h>
> +#include <rte_spinlock.h>
> +#include <rte_io.h>
> 
>  #include "mlx5_utils.h"
>  #include "mlx5.h"
> @@ -118,6 +120,10 @@ struct mlx5_rxq_data {
>  	void *cq_uar; /* CQ user access region. */
>  	uint32_t cqn; /* CQ number. */
>  	uint8_t cq_arm_sn; /* CQ arm seq number. */
> +#ifndef RTE_ARCH_64
> +	rte_spinlock_t *uar_lock_cq;
> +	/* CQ (UAR) access lock required for 32bit implementations */
> +#endif
>  	uint32_t tunnel; /* Tunnel information. */
>  } __rte_cache_aligned;
> 
> @@ -198,6 +204,10 @@ struct mlx5_txq_data {
>  	volatile void *bf_reg; /* Blueflame register remapped. */
>  	struct rte_mbuf *(*elts)[]; /* TX elements. */
>  	struct mlx5_txq_stats stats; /* TX queue counters. */
> +#ifndef RTE_ARCH_64
> +	rte_spinlock_t *uar_lock;
> +	/* UAR access lock required for 32bit implementations */
> +#endif
>  } __rte_cache_aligned;
> 
>  /* Verbs Rx queue elements. */
> @@ -353,6 +363,63 @@ uint16_t mlx5_rx_burst_vec(void *dpdk_txq, struct rte_mbuf **pkts,
>  uint32_t mlx5_rx_addr2mr_bh(struct mlx5_rxq_data *rxq, uintptr_t addr);
>  uint32_t mlx5_tx_addr2mr_bh(struct mlx5_txq_data *txq, uintptr_t addr);
> 
> +/**
> + * Provide safe 64bit store operation to mlx5 UAR region for both 32bit and
> + * 64bit architectures.
> + *
> + * @param val
> + *   value to write in CPU endian format.
> + * @param addr
> + *   Address to write to.
> + * @param lock
> + *   Address of the lock to use for that UAR access.
> + */
> +static __rte_always_inline void
> +__mlx5_uar_write64_relaxed(uint64_t val, volatile void *addr,
> +			   rte_spinlock_t *lock __rte_unused)
> +{
> +#ifdef RTE_ARCH_64
> +	rte_write64_relaxed(val, addr);
> +#else /* !RTE_ARCH_64 */
> +	rte_spinlock_lock(lock);
> +	rte_write32_relaxed(val, addr);
> +	rte_io_wmb();
> +	rte_write32_relaxed(val >> 32,
> +			    (volatile void *)((volatile char *)addr + 4));
> +	rte_spinlock_unlock(lock);
> +#endif
> +}
> +
> +/**
> + * Provide safe 64bit store operation to mlx5 UAR region for both 32bit and
> + * 64bit architectures while guaranteeing the order of execution with the
> + * code being executed.
> + *
> + * @param val
> + *   value to write in CPU endian format.
> + * @param addr
> + *   Address to write to.
> + * @param lock
> + *   Address of the lock to use for that UAR access.
> + */
> +static __rte_always_inline void
> +__mlx5_uar_write64(uint64_t val, volatile void *addr, rte_spinlock_t *lock)
> +{
> +	rte_io_wmb();
> +	__mlx5_uar_write64_relaxed(val, addr, lock);
> +}
> +
> +/* Assist macros, used instead of directly calling the functions they wrap. */
> +#ifdef RTE_ARCH_64
> +#define mlx5_uar_write64_relaxed(val, dst, lock) \
> +		__mlx5_uar_write64_relaxed(val, dst, NULL)
> +#define mlx5_uar_write64(val, dst, lock) __mlx5_uar_write64(val, dst, NULL)
> +#else
> +#define mlx5_uar_write64_relaxed(val, dst, lock) \
> +		__mlx5_uar_write64_relaxed(val, dst, lock)
> +#define mlx5_uar_write64(val, dst, lock) __mlx5_uar_write64(val, dst, lock)
> +#endif
> +
>  #ifndef NDEBUG
>  /**
>   * Verify or set magic value in CQE.
> @@ -619,7 +686,7 @@ uint16_t mlx5_rx_burst_vec(void *dpdk_txq, struct rte_mbuf **pkts,
>  	*txq->qp_db = rte_cpu_to_be_32(txq->wqe_ci);
>  	/* Ensure ordering between DB record and BF copy. */
>  	rte_wmb();
> -	*dst = *src;
> +	mlx5_uar_write64_relaxed(*src, dst, txq->uar_lock);
>  	if (cond)
>  		rte_wmb();
>  }
> diff --git a/drivers/net/mlx5/mlx5_txq.c b/drivers/net/mlx5/mlx5_txq.c
> index 5057561..f9bc473 100644
> --- a/drivers/net/mlx5/mlx5_txq.c
> +++ b/drivers/net/mlx5/mlx5_txq.c
> @@ -255,6 +255,9 @@
>  	struct mlx5_txq_ctrl *txq_ctrl;
>  	int already_mapped;
>  	size_t page_size = sysconf(_SC_PAGESIZE);
> +#ifndef RTE_ARCH_64
> +	unsigned int lock_idx;
> +#endif
> 
>  	memset(pages, 0, priv->txqs_n * sizeof(uintptr_t));
>  	/*
> @@ -281,7 +284,7 @@
>  		}
>  		/* new address in reserved UAR address space. */
>  		addr = RTE_PTR_ADD(priv->uar_base,
> -				   uar_va & (MLX5_UAR_SIZE - 1));
> +				   uar_va & (uintptr_t)(MLX5_UAR_SIZE - 1));
>  		if (!already_mapped) {
>  			pages[pages_n++] = uar_va;
>  			/* fixed mmap to specified address in reserved
> @@ -305,6 +308,12 @@
>  		else
>  			assert(txq_ctrl->txq.bf_reg ==
>  			       RTE_PTR_ADD((void *)addr, off));
> +#ifndef RTE_ARCH_64
> +		/* Assign a UAR lock according to UAR page number */
> +		lock_idx = (txq_ctrl->uar_mmap_offset / page_size) &
> +			   MLX5_UAR_PAGE_NUM_MASK;
> +		txq->uar_lock = &priv->uar_lock[lock_idx];
> +#endif
>  	}
>  	return 0;
>  }
> @@ -511,6 +520,8 @@ struct mlx5_txq_ibv *
>  	rte_atomic32_inc(&txq_ibv->refcnt);
>  	if (qp.comp_mask & MLX5DV_QP_MASK_UAR_MMAP_OFFSET) {
>  		txq_ctrl->uar_mmap_offset = qp.uar_mmap_offset;
> +		DRV_LOG(DEBUG, "port %u: uar_mmap_offset 0x%lx",
> +			dev->data->port_id, txq_ctrl->uar_mmap_offset);
>  	} else {
>  		DRV_LOG(ERR,
>  			"port %u failed to retrieve UAR info, invalid"
> --
> 1.8.3.1


* Re: [dpdk-dev] [PATCH v3] net/mlx5: add support for 32bit systems
  2018-07-13  6:16     ` Shahaf Shuler
@ 2018-07-18  8:08       ` Ferruh Yigit
  0 siblings, 0 replies; 17+ messages in thread
From: Ferruh Yigit @ 2018-07-18  8:08 UTC (permalink / raw)
  To: Shahaf Shuler, Mordechay Haimovsky; +Cc: Yongseok Koh, dev

On 7/13/2018 7:16 AM, Shahaf Shuler wrote:
> 
> Thursday, July 12, 2018 3:02 PM, Mordechay Haimovsky:
>> Subject: [PATCH v3] net/mlx5: add support for 32bit systems
>>
>> This patch adds support for building and running mlx5 PMD on 32bit systems
>> such as i686.
>>
>> The main issue to tackle was handling the 32bit access to the UAR as quoted
>> from the mlx5 PRM:
>> QP and CQ DoorBells require 64-bit writes. For best performance, it is
>> recommended to execute the QP/CQ DoorBell as a single 64-bit write
>> operation. For platforms that do not support 64 bit writes, it is possible to
>> issue the 64 bits DoorBells through two consecutive writes, each write 32
>> bits, as described below:
>> * The order of writing each of the Dwords is from lower to upper
>>   addresses.
>> * No other DoorBell can be rung (or even start ringing) in the midst of an
>>   on-going write of a DoorBell over a given UAR page.
>> The last rule implies that in a multi-threaded environment, the access to a
>> UAR page (which can be accessible by all threads in the process) must be
>> synchronized (for example, using a semaphore) unless an atomic write of 64
>> bits in a single bus operation is guaranteed. Such a synchronization is not
>> required for when ringing DoorBells on different UAR pages.
>>
>> Signed-off-by: Moti Haimovsky <motih@mellanox.com>
> 
> Applied to next-net-mlx (again), thanks. 
> 
> Guidelines for 32-bit compilation and testing:
> 1. Fetch the latest rdma-core from GitHub. Make sure it includes commit "708c8242 mlx5: Fix compilation on 32 bit systems when sse3 is on".
> 2. Compile rdma-core for 32-bit (approach taken from the rdma-core travis build, https://github.com/linux-rdma/rdma-core/blob/master/buildlib/travis-build#L20):
> 	mkdir build32
> 	cd build32
> 	CFLAGS="-Werror -m32" cmake -GNinja .. -DENABLE_RESOLVE_NEIGH=0 -DIOCTL_MODE=both
> 	ninja (or ninja-build)
> 3. Compile and run DPDK against the build32 directory.

I have confirmed the 32bit build with gcc, thanks for the update.

Only with 32bit ICC am I getting the following error [1], which is related to the #pragma usage.

[1]
.../dpdk/drivers/net/mlx5/mlx5_prm.h(14): error #2282: unrecognized GCC pragma
  #pragma GCC diagnostic ignored "-Wpedantic"
                         ^
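
One possible way to keep ICC from seeing the pragma is to guard it on the compiler, sketched below. This assumes ICC defines __INTEL_COMPILER in addition to __GNUC__; union dword_pair and dword_pair_hi are hypothetical names used only for illustration, and this is not necessarily the fix that will be applied to mlx5_prm.h.

```c
#include <assert.h>
#include <stddef.h>

/*
 * ICC also defines __GNUC__, so it is excluded explicitly: only compilers
 * that understand "#pragma GCC diagnostic" get to see it.
 */
#if defined(__GNUC__) && !defined(__INTEL_COMPILER)
#pragma GCC diagnostic ignored "-Wpedantic"
#endif

/*
 * Unnamed struct inside a union: the kind of construct that -Wpedantic
 * reports when compiling for a pre-C11 standard.
 */
union dword_pair {
	struct {
		unsigned int lo;
		unsigned int hi;
	};
	unsigned int raw[2];
};

/* Read the upper Dword through the unnamed struct member. */
static unsigned int
dword_pair_hi(const union dword_pair *p)
{
	assert(p != NULL);
	return p->hi;
}
```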


end of thread, other threads:[~2018-07-18  8:08 UTC | newest]

Thread overview: 17+ messages
2018-06-28  7:12 [dpdk-dev] [PATCH] net/mlx5: add support for 32bit systems Moti Haimovsky
2018-07-02  7:05 ` Shahaf Shuler
2018-07-02 10:39   ` Mordechay Haimovsky
2018-07-02 11:11 ` [dpdk-dev] [PATCH v2] " Moti Haimovsky
2018-07-02 20:59   ` Yongseok Koh
2018-07-03 12:03     ` Shahaf Shuler
2018-07-04 13:48   ` Ferruh Yigit
2018-07-05 10:09     ` Mordechay Haimovsky
2018-07-05 11:27       ` Ferruh Yigit
2018-07-11 12:22         ` Shahaf Shuler
2018-07-05 17:07       ` Mordechay Haimovsky
2018-07-05 17:49         ` Ferruh Yigit
2018-07-09  7:23           ` Shahaf Shuler
2018-07-08 17:04     ` Mordechay Haimovsky
2018-07-12 12:01   ` [dpdk-dev] [PATCH v3] " Moti Haimovsky
2018-07-13  6:16     ` Shahaf Shuler
2018-07-18  8:08       ` Ferruh Yigit
