DPDK patches and discussions
* [dpdk-dev] [PATCH 0/6] mlx5: some UAR fixes
@ 2021-11-03 18:35 michaelba
  2021-11-03 18:35 ` [dpdk-dev] [PATCH 1/6] crypto/mlx5: fix invalid memory access in probing michaelba
                   ` (6 more replies)
  0 siblings, 7 replies; 8+ messages in thread
From: michaelba @ 2021-11-03 18:35 UTC (permalink / raw)
  To: dev; +Cc: Matan Azrad, Thomas Monjalon, Michael Baum

From: Michael Baum <michaelba@oss.nvidia.com>

Various fixes around the User Access Region (UAR) object.

Michael Baum (6):
  crypto/mlx5: fix invalid memory access in probing
  common/mlx5: fix redundant code in UAR allocation
  common/mlx5: fix UAR allocation diagnostics messages
  common/mlx5: fix doorbell mapping configuration
  net/mlx5: remove duplicated reference of the TxQ doorbell
  common/mlx5: fix post doorbell barrier

 drivers/common/mlx5/mlx5_common.c        | 101 ++++++++----
 drivers/common/mlx5/mlx5_common.h        |  91 ++++++++++-
 drivers/common/mlx5/mlx5_common_defs.h   |   8 +
 drivers/common/mlx5/version.map          |   5 +-
 drivers/compress/mlx5/mlx5_compress.c    |  69 ++------
 drivers/crypto/mlx5/mlx5_crypto.c        |  72 ++-------
 drivers/crypto/mlx5/mlx5_crypto.h        |   6 +-
 drivers/net/mlx5/linux/mlx5_verbs.c      |  48 +++++-
 drivers/net/mlx5/mlx5.c                  | 195 +++++------------------
 drivers/net/mlx5/mlx5.h                  |  12 +-
 drivers/net/mlx5/mlx5_defs.h             |   8 -
 drivers/net/mlx5/mlx5_devx.c             |  24 +--
 drivers/net/mlx5/mlx5_flow_aso.c         |  81 +++++-----
 drivers/net/mlx5/mlx5_rx.h               |   6 +-
 drivers/net/mlx5/mlx5_rxq.c              |  11 +-
 drivers/net/mlx5/mlx5_tx.h               | 103 ++----------
 drivers/net/mlx5/mlx5_txpp.c             |  36 ++---
 drivers/net/mlx5/mlx5_txq.c              |  79 ++-------
 drivers/regex/mlx5/mlx5_regex.c          |  17 +-
 drivers/regex/mlx5/mlx5_regex.h          |   2 +-
 drivers/regex/mlx5/mlx5_regex_control.c  |   4 +-
 drivers/regex/mlx5/mlx5_regex_fastpath.c |  26 ++-
 drivers/vdpa/mlx5/mlx5_vdpa.h            |   2 +-
 drivers/vdpa/mlx5/mlx5_vdpa_event.c      |  33 +---
 24 files changed, 390 insertions(+), 649 deletions(-)

-- 
2.25.1



* [dpdk-dev] [PATCH 1/6] crypto/mlx5: fix invalid memory access in probing
  2021-11-03 18:35 [dpdk-dev] [PATCH 0/6] mlx5: some UAR fixes michaelba
@ 2021-11-03 18:35 ` michaelba
  2021-11-03 18:35 ` [dpdk-dev] [PATCH 2/6] common/mlx5: fix redundant code in UAR allocation michaelba
                   ` (5 subsequent siblings)
  6 siblings, 0 replies; 8+ messages in thread
From: michaelba @ 2021-11-03 18:35 UTC (permalink / raw)
  To: dev
  Cc: Matan Azrad, Thomas Monjalon, Michael Baum, stable, Viacheslav Ovsiienko

From: Michael Baum <michaelba@oss.nvidia.com>

The probe function creates a DevX object named login and saves a
pointer to it in the priv structure.

The remove function releases the priv structure first and only then
releases the login object.
However, the pointer to the login object is a field of the priv
structure, so it is accessed after the memory holding it has been
freed, which is invalid.

Release the login object first and then release the priv structure.

Fixes: debb27ea3442 ("crypto/mlx5: create login object using DevX")
Cc: stable@dpdk.org

Signed-off-by: Michael Baum <michaelba@oss.nvidia.com>
Reviewed-by: Viacheslav Ovsiienko <viacheslavo@nvidia.com>
Acked-by: Matan Azrad <matan@nvidia.com>
---
 drivers/crypto/mlx5/mlx5_crypto.c | 18 ++++++++++--------
 1 file changed, 10 insertions(+), 8 deletions(-)

diff --git a/drivers/crypto/mlx5/mlx5_crypto.c b/drivers/crypto/mlx5/mlx5_crypto.c
index f430d8cde0..f9fd0d498e 100644
--- a/drivers/crypto/mlx5/mlx5_crypto.c
+++ b/drivers/crypto/mlx5/mlx5_crypto.c
@@ -878,12 +878,6 @@ mlx5_crypto_dev_probe(struct mlx5_common_device *cdev)
 		DRV_LOG(ERR, "Failed to parse devargs.");
 		return -rte_errno;
 	}
-	login = mlx5_devx_cmd_create_crypto_login_obj(cdev->ctx,
-						      &devarg_prms.login_attr);
-	if (login == NULL) {
-		DRV_LOG(ERR, "Failed to configure login.");
-		return -rte_errno;
-	}
 	crypto_dev = rte_cryptodev_pmd_create(ibdev_name, cdev->dev,
 					      &init_params);
 	if (crypto_dev == NULL) {
@@ -899,12 +893,20 @@ mlx5_crypto_dev_probe(struct mlx5_common_device *cdev)
 	crypto_dev->driver_id = mlx5_crypto_driver_id;
 	priv = crypto_dev->data->dev_private;
 	priv->cdev = cdev;
-	priv->login_obj = login;
 	priv->crypto_dev = crypto_dev;
 	if (mlx5_crypto_uar_prepare(priv) != 0) {
 		rte_cryptodev_pmd_destroy(priv->crypto_dev);
 		return -1;
 	}
+	login = mlx5_devx_cmd_create_crypto_login_obj(cdev->ctx,
+						      &devarg_prms.login_attr);
+	if (login == NULL) {
+		DRV_LOG(ERR, "Failed to configure login.");
+		mlx5_crypto_uar_release(priv);
+		rte_cryptodev_pmd_destroy(priv->crypto_dev);
+		return -rte_errno;
+	}
+	priv->login_obj = login;
 	priv->keytag = rte_cpu_to_be_64(devarg_prms.keytag);
 	priv->max_segs_num = devarg_prms.max_segs_num;
 	priv->umr_wqe_size = sizeof(struct mlx5_wqe_umr_bsf_seg) +
@@ -940,9 +942,9 @@ mlx5_crypto_dev_remove(struct mlx5_common_device *cdev)
 		TAILQ_REMOVE(&mlx5_crypto_priv_list, priv, next);
 	pthread_mutex_unlock(&priv_list_lock);
 	if (priv) {
+		claim_zero(mlx5_devx_cmd_destroy(priv->login_obj));
 		mlx5_crypto_uar_release(priv);
 		rte_cryptodev_pmd_destroy(priv->crypto_dev);
-		claim_zero(mlx5_devx_cmd_destroy(priv->login_obj));
 	}
 	return 0;
 }
-- 
2.25.1



* [dpdk-dev] [PATCH 2/6] common/mlx5: fix redundant code in UAR allocation
  2021-11-03 18:35 [dpdk-dev] [PATCH 0/6] mlx5: some UAR fixes michaelba
  2021-11-03 18:35 ` [dpdk-dev] [PATCH 1/6] crypto/mlx5: fix invalid memory access in probing michaelba
@ 2021-11-03 18:35 ` michaelba
  2021-11-03 18:35 ` [dpdk-dev] [PATCH 3/6] common/mlx5: fix UAR allocation diagnostics messages michaelba
                   ` (4 subsequent siblings)
  6 siblings, 0 replies; 8+ messages in thread
From: michaelba @ 2021-11-03 18:35 UTC (permalink / raw)
  To: dev
  Cc: Matan Azrad, Thomas Monjalon, Michael Baum, stable, Viacheslav Ovsiienko

From: Michael Baum <michaelba@oss.nvidia.com>

The User Access Region (UAR) provides access to hardware resources,
such as the Doorbell Register, from userspace.
This means the resources must be mapped by the kernel into some virtual
address range. Two types of memory mapping are supported by the mlx5
kernel driver:

 MLX5DV_UAR_ALLOC_TYPE_NC - non-cached, all writes promoted directly to
			    hardware.
 MLX5DV_UAR_ALLOC_TYPE_BF - "BlueFlame", all writes might be cached by
			    CPU, and will be flushed to hardware
			    explicitly with memory barriers.

The supported mapping types depend on the platform (x86/ARM/etc), kernel
version, driver version, virtualization environment (hypervisor), etc.

During UAR allocation, if the system supports non-cached mapping, the
first attempt is performed with MLX5DV_UAR_ALLOC_TYPE_NC. If this
fails, the next attempt is made with MLX5DV_UAR_ALLOC_TYPE_BF.

However, the function keeps a branch for the case where the first
attempt was performed with MLX5DV_UAR_ALLOC_TYPE_BF; this branch is
unreachable since the first attempt is always performed with
MLX5DV_UAR_ALLOC_TYPE_NC.

Remove the unreachable code.
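For illustration only, a minimal sketch of the intended fallback order
(reusing the glue call and mapping constants from the hunk below; not
the exact driver code):

	/* Try the non-cached mapping first; fall back to BlueFlame
	 * (write-combining) only if the non-cached allocation fails.
	 */
	void *uar = mlx5_glue->devx_alloc_uar(ctx, MLX5DV_UAR_ALLOC_TYPE_NC);
	if (uar == NULL) {
		/* Verbs/kernel may not support "Non-Cached", retry "Write-Combining". */
		uar = mlx5_glue->devx_alloc_uar(ctx, MLX5DV_UAR_ALLOC_TYPE_BF);
	}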

Fixes: 9cc0e99c81ab0 ("common/mlx5: share UAR allocation routine")
Cc: stable@dpdk.org

Signed-off-by: Michael Baum <michaelba@oss.nvidia.com>
Reviewed-by: Viacheslav Ovsiienko <viacheslavo@nvidia.com>
Acked-by: Matan Azrad <matan@nvidia.com>
---
 drivers/common/mlx5/mlx5_common.c | 22 ++++------------------
 1 file changed, 4 insertions(+), 18 deletions(-)

diff --git a/drivers/common/mlx5/mlx5_common.c b/drivers/common/mlx5/mlx5_common.c
index e6ff045c95..a0076510ac 100644
--- a/drivers/common/mlx5/mlx5_common.c
+++ b/drivers/common/mlx5/mlx5_common.c
@@ -942,11 +942,11 @@ RTE_INIT_PRIO(mlx5_is_haswell_broadwell_cpu, LOG)
  *				attributes (if supported by the host), the
  *				writes to the UAR registers must be followed
  *				by write memory barrier.
- *   MLX5DV_UAR_ALLOC_TYPE_NC - allocate as non-cached nenory, all writes are
+ *   MLX5DV_UAR_ALLOC_TYPE_NC - allocate as non-cached memory, all writes are
  *				promoted to the registers immediately, no
  *				memory barriers needed.
- *   mapping < 0 - the first attempt is performed with MLX5DV_UAR_ALLOC_TYPE_BF,
- *		   if this fails the next attempt with MLX5DV_UAR_ALLOC_TYPE_NC
+ *   mapping < 0 - the first attempt is performed with MLX5DV_UAR_ALLOC_TYPE_NC,
+ *		   if this fails the next attempt with MLX5DV_UAR_ALLOC_TYPE_BF
  *		   is performed. The drivers specifying negative values should
  *		   always provide the write memory barrier operation after UAR
  *		   register writings.
@@ -978,21 +978,7 @@ mlx5_devx_alloc_uar(void *ctx, int mapping)
 #endif
 		uar = mlx5_glue->devx_alloc_uar(ctx, uar_mapping);
 #ifdef MLX5DV_UAR_ALLOC_TYPE_NC
-		if (!uar &&
-		    mapping < 0 &&
-		    uar_mapping == MLX5DV_UAR_ALLOC_TYPE_BF) {
-			/*
-			 * In some environments like virtual machine the
-			 * Write Combining mapped might be not supported and
-			 * UAR allocation fails. We tried "Non-Cached" mapping
-			 * for the case.
-			 */
-			DRV_LOG(WARNING, "Failed to allocate DevX UAR (BF)");
-			uar_mapping = MLX5DV_UAR_ALLOC_TYPE_NC;
-			uar = mlx5_glue->devx_alloc_uar(ctx, uar_mapping);
-		} else if (!uar &&
-			   mapping < 0 &&
-			   uar_mapping == MLX5DV_UAR_ALLOC_TYPE_NC) {
+		if (!uar && mapping < 0) {
 			/*
 			 * If Verbs/kernel does not support "Non-Cached"
 			 * try the "Write-Combining".
-- 
2.25.1



* [dpdk-dev] [PATCH 3/6] common/mlx5: fix UAR allocation diagnostics messages
  2021-11-03 18:35 [dpdk-dev] [PATCH 0/6] mlx5: some UAR fixes michaelba
  2021-11-03 18:35 ` [dpdk-dev] [PATCH 1/6] crypto/mlx5: fix invalid memory access in probing michaelba
  2021-11-03 18:35 ` [dpdk-dev] [PATCH 2/6] common/mlx5: fix redundant code in UAR allocation michaelba
@ 2021-11-03 18:35 ` michaelba
  2021-11-03 18:35 ` [dpdk-dev] [PATCH 4/6] common/mlx5: fix doorbell mapping configuration michaelba
                   ` (3 subsequent siblings)
  6 siblings, 0 replies; 8+ messages in thread
From: michaelba @ 2021-11-03 18:35 UTC (permalink / raw)
  To: dev
  Cc: Matan Azrad, Thomas Monjalon, Michael Baum, stable, Viacheslav Ovsiienko

From: Michael Baum <michaelba@oss.nvidia.com>

Depending on the kernel capabilities and the rdma-core version, mapping
the UAR (User Access Region) with the desired memory caching type
(non-cached or write-combining) might fail. The PMD implements a
flexible UAR mapping strategy, alternating the caching type until it
succeeds.

During this process, failure diagnostics messages are emitted.
These messages are merely diagnostic and their logging level should
be lowered to DEBUG.

Fixes: 9cc0e99c81ab0 ("common/mlx5: share UAR allocation routine")
Cc: stable@dpdk.org

Signed-off-by: Michael Baum <michaelba@oss.nvidia.com>
Reviewed-by: Viacheslav Ovsiienko <viacheslavo@nvidia.com>
Acked-by: Matan Azrad <matan@nvidia.com>
---
 drivers/common/mlx5/mlx5_common.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/common/mlx5/mlx5_common.c b/drivers/common/mlx5/mlx5_common.c
index a0076510ac..7f92e3b2cc 100644
--- a/drivers/common/mlx5/mlx5_common.c
+++ b/drivers/common/mlx5/mlx5_common.c
@@ -983,7 +983,7 @@ mlx5_devx_alloc_uar(void *ctx, int mapping)
 			 * If Verbs/kernel does not support "Non-Cached"
 			 * try the "Write-Combining".
 			 */
-			DRV_LOG(WARNING, "Failed to allocate DevX UAR (NC)");
+			DRV_LOG(DEBUG, "Failed to allocate DevX UAR (NC)");
 			uar_mapping = MLX5DV_UAR_ALLOC_TYPE_BF;
 			uar = mlx5_glue->devx_alloc_uar(ctx, uar_mapping);
 		}
@@ -1001,7 +1001,7 @@ mlx5_devx_alloc_uar(void *ctx, int mapping)
 		 * IB device context, on context closure all UARs
 		 * will be freed, should be no memory/object leakage.
 		 */
-		DRV_LOG(WARNING, "Retrying to allocate DevX UAR");
+		DRV_LOG(DEBUG, "Retrying to allocate DevX UAR");
 		uar = NULL;
 	}
 	/* Check whether we finally succeeded with valid UAR allocation. */
-- 
2.25.1



* [dpdk-dev] [PATCH 4/6] common/mlx5: fix doorbell mapping configuration
  2021-11-03 18:35 [dpdk-dev] [PATCH 0/6] mlx5: some UAR fixes michaelba
                   ` (2 preceding siblings ...)
  2021-11-03 18:35 ` [dpdk-dev] [PATCH 3/6] common/mlx5: fix UAR allocation diagnostics messages michaelba
@ 2021-11-03 18:35 ` michaelba
  2021-11-03 18:35 ` [dpdk-dev] [PATCH 5/6] net/mlx5: remove duplicated reference of the TxQ doorbell michaelba
                   ` (2 subsequent siblings)
  6 siblings, 0 replies; 8+ messages in thread
From: michaelba @ 2021-11-03 18:35 UTC (permalink / raw)
  To: dev
  Cc: Matan Azrad, Thomas Monjalon, Michael Baum, stable, Viacheslav Ovsiienko

From: Michael Baum <michaelba@oss.nvidia.com>

The UAR mapping type can be affected by the tx_db_nc devarg, which may
cause the environment variable MLX5_SHUT_UP_BF to be set.
So, both the MLX5_SHUT_UP_BF value and the UAR mapping parameter affect
the UAR caching mode.

Wrongly, the devarg was taken into account for MLX5_SHUT_UP_BF but not
for the UAR mapping parameter in all the drivers except the net one.

Take the tx_db_nc devarg into account in all the drivers.
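For reference, a minimal sketch of the mapping-type selection this
patch moves into the common code (names taken from the hunks below):

	/* Derive the requested UAR mapping type from the tx_db_nc devarg
	 * stored in the shared device configuration.
	 */
	uar_mapping = (cdev->config.dbnc == MLX5_TXDB_NCACHED) ?
		      MLX5DV_UAR_ALLOC_TYPE_NC : MLX5DV_UAR_ALLOC_TYPE_BF;
	uar = mlx5_glue->devx_alloc_uar(cdev->ctx, uar_mapping);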

Fixes: ca1418ce3910 ("common/mlx5: share device context object")
Cc: stable@dpdk.org

Signed-off-by: Michael Baum <michaelba@oss.nvidia.com>
Reviewed-by: Viacheslav Ovsiienko <viacheslavo@nvidia.com>
Acked-by: Matan Azrad <matan@nvidia.com>
---
 drivers/common/mlx5/mlx5_common.c     | 52 ++++++++++++++-------------
 drivers/common/mlx5/mlx5_common.h     |  5 +--
 drivers/compress/mlx5/mlx5_compress.c |  2 +-
 drivers/crypto/mlx5/mlx5_crypto.c     |  2 +-
 drivers/regex/mlx5/mlx5_regex.c       |  2 +-
 drivers/vdpa/mlx5/mlx5_vdpa_event.c   |  2 +-
 6 files changed, 35 insertions(+), 30 deletions(-)

diff --git a/drivers/common/mlx5/mlx5_common.c b/drivers/common/mlx5/mlx5_common.c
index 7f92e3b2cc..7bdc550b36 100644
--- a/drivers/common/mlx5/mlx5_common.c
+++ b/drivers/common/mlx5/mlx5_common.c
@@ -934,30 +934,25 @@ RTE_INIT_PRIO(mlx5_is_haswell_broadwell_cpu, LOG)
 
 /**
  * Allocate the User Access Region with DevX on specified device.
+ * This routine handles the following UAR allocation issues:
  *
- * @param [in] ctx
- *   Infiniband device context to perform allocation on.
- * @param [in] mapping
- *   MLX5DV_UAR_ALLOC_TYPE_BF - allocate as cached memory with write-combining
- *				attributes (if supported by the host), the
- *				writes to the UAR registers must be followed
- *				by write memory barrier.
- *   MLX5DV_UAR_ALLOC_TYPE_NC - allocate as non-cached memory, all writes are
- *				promoted to the registers immediately, no
- *				memory barriers needed.
- *   mapping < 0 - the first attempt is performed with MLX5DV_UAR_ALLOC_TYPE_NC,
- *		   if this fails the next attempt with MLX5DV_UAR_ALLOC_TYPE_BF
- *		   is performed. The drivers specifying negative values should
- *		   always provide the write memory barrier operation after UAR
- *		   register writings.
- * If there is no definitions for the MLX5DV_UAR_ALLOC_TYPE_xx (older rdma
- * library headers), the caller can specify 0.
+ *  - tries to allocate the UAR with the most appropriate memory mapping
+ *    type from the ones supported by the host.
+ *
+ *  - tries to allocate the UAR with non-NULL base address OFED 5.0.x and
+ *    Upstream rdma_core before v29 returned the NULL as UAR base address
+ *    if UAR was not the first object in the UAR page.
+ *    It caused the PMD failure and we should try to get another UAR till
+ *    we get the first one with non-NULL base address returned.
+ *
+ * @param [in] cdev
+ *   Pointer to mlx5 device structure to perform allocation on its context.
  *
  * @return
  *   UAR object pointer on success, NULL otherwise and rte_errno is set.
  */
 void *
-mlx5_devx_alloc_uar(void *ctx, int mapping)
+mlx5_devx_alloc_uar(struct mlx5_common_device *cdev)
 {
 	void *uar;
 	uint32_t retry, uar_mapping;
@@ -966,26 +961,35 @@ mlx5_devx_alloc_uar(void *ctx, int mapping)
 	for (retry = 0; retry < MLX5_ALLOC_UAR_RETRY; ++retry) {
 #ifdef MLX5DV_UAR_ALLOC_TYPE_NC
 		/* Control the mapping type according to the settings. */
-		uar_mapping = (mapping < 0) ?
-			      MLX5DV_UAR_ALLOC_TYPE_NC : mapping;
+		uar_mapping = (cdev->config.dbnc == MLX5_TXDB_NCACHED) ?
+			    MLX5DV_UAR_ALLOC_TYPE_NC : MLX5DV_UAR_ALLOC_TYPE_BF;
 #else
 		/*
 		 * It seems we have no way to control the memory mapping type
 		 * for the UAR, the default "Write-Combining" type is supposed.
 		 */
 		uar_mapping = 0;
-		RTE_SET_USED(mapping);
 #endif
-		uar = mlx5_glue->devx_alloc_uar(ctx, uar_mapping);
+		uar = mlx5_glue->devx_alloc_uar(cdev->ctx, uar_mapping);
 #ifdef MLX5DV_UAR_ALLOC_TYPE_NC
-		if (!uar && mapping < 0) {
+		if (!uar && uar_mapping == MLX5DV_UAR_ALLOC_TYPE_BF) {
+			/*
+			 * In some environments like virtual machine the
+			 * Write Combining mapped might be not supported and
+			 * UAR allocation fails. We tried "Non-Cached" mapping
+			 * for the case.
+			 */
+			DRV_LOG(DEBUG, "Failed to allocate DevX UAR (BF)");
+			uar_mapping = MLX5DV_UAR_ALLOC_TYPE_NC;
+			uar = mlx5_glue->devx_alloc_uar(cdev->ctx, uar_mapping);
+		} else if (!uar && uar_mapping == MLX5DV_UAR_ALLOC_TYPE_NC) {
 			/*
 			 * If Verbs/kernel does not support "Non-Cached"
 			 * try the "Write-Combining".
 			 */
 			DRV_LOG(DEBUG, "Failed to allocate DevX UAR (NC)");
 			uar_mapping = MLX5DV_UAR_ALLOC_TYPE_BF;
-			uar = mlx5_glue->devx_alloc_uar(ctx, uar_mapping);
+			uar = mlx5_glue->devx_alloc_uar(cdev->ctx, uar_mapping);
 		}
 #endif
 		if (!uar) {
diff --git a/drivers/common/mlx5/mlx5_common.h b/drivers/common/mlx5/mlx5_common.h
index 744c6a72b3..7febae9cdf 100644
--- a/drivers/common/mlx5/mlx5_common.h
+++ b/drivers/common/mlx5/mlx5_common.h
@@ -284,8 +284,6 @@ __rte_internal
 void mlx5_translate_port_name(const char *port_name_in,
 			      struct mlx5_switch_info *port_info_out);
 void mlx5_glue_constructor(void);
-__rte_internal
-void *mlx5_devx_alloc_uar(void *ctx, int mapping);
 extern uint8_t haswell_broadwell_cpu;
 
 __rte_internal
@@ -417,6 +415,9 @@ void
 mlx5_dev_mempool_unregister(struct mlx5_common_device *cdev,
 			    struct rte_mempool *mp);
 
+__rte_internal
+void *mlx5_devx_alloc_uar(struct mlx5_common_device *cdev);
+
 /* mlx5_common_mr.c */
 
 __rte_internal
diff --git a/drivers/compress/mlx5/mlx5_compress.c b/drivers/compress/mlx5/mlx5_compress.c
index c4081c5f7d..df60b05ab3 100644
--- a/drivers/compress/mlx5/mlx5_compress.c
+++ b/drivers/compress/mlx5/mlx5_compress.c
@@ -690,7 +690,7 @@ mlx5_compress_uar_release(struct mlx5_compress_priv *priv)
 static int
 mlx5_compress_uar_prepare(struct mlx5_compress_priv *priv)
 {
-	priv->uar = mlx5_devx_alloc_uar(priv->cdev->ctx, -1);
+	priv->uar = mlx5_devx_alloc_uar(priv->cdev);
 	if (priv->uar == NULL || mlx5_os_get_devx_uar_reg_addr(priv->uar) ==
 	    NULL) {
 		rte_errno = errno;
diff --git a/drivers/crypto/mlx5/mlx5_crypto.c b/drivers/crypto/mlx5/mlx5_crypto.c
index f9fd0d498e..33d797a6a0 100644
--- a/drivers/crypto/mlx5/mlx5_crypto.c
+++ b/drivers/crypto/mlx5/mlx5_crypto.c
@@ -731,7 +731,7 @@ mlx5_crypto_uar_release(struct mlx5_crypto_priv *priv)
 static int
 mlx5_crypto_uar_prepare(struct mlx5_crypto_priv *priv)
 {
-	priv->uar = mlx5_devx_alloc_uar(priv->cdev->ctx, -1);
+	priv->uar = mlx5_devx_alloc_uar(priv->cdev);
 	if (priv->uar)
 		priv->uar_addr = mlx5_os_get_devx_uar_reg_addr(priv->uar);
 	if (priv->uar == NULL || priv->uar_addr == NULL) {
diff --git a/drivers/regex/mlx5/mlx5_regex.c b/drivers/regex/mlx5/mlx5_regex.c
index b8a513e1fa..d632252794 100644
--- a/drivers/regex/mlx5/mlx5_regex.c
+++ b/drivers/regex/mlx5/mlx5_regex.c
@@ -138,7 +138,7 @@ mlx5_regex_dev_probe(struct mlx5_common_device *cdev)
 	 * registers writings, it is safe to allocate UAR with any
 	 * memory mapping type.
 	 */
-	priv->uar = mlx5_devx_alloc_uar(priv->cdev->ctx, -1);
+	priv->uar = mlx5_devx_alloc_uar(priv->cdev);
 	if (!priv->uar) {
 		DRV_LOG(ERR, "can't allocate uar.");
 		rte_errno = ENOMEM;
diff --git a/drivers/vdpa/mlx5/mlx5_vdpa_event.c b/drivers/vdpa/mlx5/mlx5_vdpa_event.c
index 042d22777f..21738bdfff 100644
--- a/drivers/vdpa/mlx5/mlx5_vdpa_event.c
+++ b/drivers/vdpa/mlx5/mlx5_vdpa_event.c
@@ -61,7 +61,7 @@ mlx5_vdpa_event_qp_global_prepare(struct mlx5_vdpa_priv *priv)
 	 * registers writings, it is safe to allocate UAR with any
 	 * memory mapping type.
 	 */
-	priv->uar = mlx5_devx_alloc_uar(priv->cdev->ctx, -1);
+	priv->uar = mlx5_devx_alloc_uar(priv->cdev);
 	if (!priv->uar) {
 		rte_errno = errno;
 		DRV_LOG(ERR, "Failed to allocate UAR.");
-- 
2.25.1



* [dpdk-dev] [PATCH 5/6] net/mlx5: remove duplicated reference of the TxQ doorbell
  2021-11-03 18:35 [dpdk-dev] [PATCH 0/6] mlx5: some UAR fixes michaelba
                   ` (3 preceding siblings ...)
  2021-11-03 18:35 ` [dpdk-dev] [PATCH 4/6] common/mlx5: fix doorbell mapping configuration michaelba
@ 2021-11-03 18:35 ` michaelba
  2021-11-03 18:35 ` [dpdk-dev] [PATCH 6/6] common/mlx5: fix post doorbell barrier michaelba
  2021-11-07 15:23 ` [dpdk-dev] [PATCH 0/6] mlx5: some UAR fixes Thomas Monjalon
  6 siblings, 0 replies; 8+ messages in thread
From: michaelba @ 2021-11-03 18:35 UTC (permalink / raw)
  To: dev
  Cc: Matan Azrad, Thomas Monjalon, Michael Baum, stable, Viacheslav Ovsiienko

From: Michael Baum <michaelba@oss.nvidia.com>

The Tx doorbell has a different virtual address in each process.
The secondary process takes the UAR physical page ID of the primary and
mmaps it to its own virtual address.
The primary doorbell references were saved in two shared memory
locations: the TxQ structure and a dedicated doorbell array.

Remove the doorbell reference from the TxQ structure and make the
secondary process take the UAR information from the primary doorbell
array.
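For illustration, a rough sketch (reusing names from the hunks below)
of how the secondary process now derives its doorbell address from the
primary's table instead of the removed bf_reg field:

	/* Sketch only: map the UAR page through the shared fd and reuse
	 * the primary's in-page doorbell offset.
	 */
	uintptr_t uar_va = (uintptr_t)primary_ppriv->uar_table[txq->idx];
	uintptr_t offset = uar_va & (page_size - 1); /* Offset within the UAR page. */
	void *addr = rte_mem_map(NULL, page_size, RTE_PROT_WRITE, RTE_MAP_SHARED,
				 fd, txq_ctrl->uar_mmap_offset);
	if (addr != NULL)
		ppriv->uar_table[txq->idx] = RTE_PTR_ADD(addr, offset);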

Cc: stable@dpdk.org

Signed-off-by: Michael Baum <michaelba@oss.nvidia.com>
Reviewed-by: Viacheslav Ovsiienko <viacheslavo@nvidia.com>
Acked-by: Matan Azrad <matan@nvidia.com>
---
 drivers/net/mlx5/linux/mlx5_verbs.c |  6 ++----
 drivers/net/mlx5/mlx5.c             |  2 ++
 drivers/net/mlx5/mlx5.h             |  2 +-
 drivers/net/mlx5/mlx5_devx.c        |  8 ++------
 drivers/net/mlx5/mlx5_tx.h          |  3 +--
 drivers/net/mlx5/mlx5_txq.c         | 15 ++++++++-------
 6 files changed, 16 insertions(+), 20 deletions(-)

diff --git a/drivers/net/mlx5/linux/mlx5_verbs.c b/drivers/net/mlx5/linux/mlx5_verbs.c
index 4779b37aa6..eef8391c12 100644
--- a/drivers/net/mlx5/linux/mlx5_verbs.c
+++ b/drivers/net/mlx5/linux/mlx5_verbs.c
@@ -990,20 +990,18 @@ mlx5_txq_ibv_obj_new(struct rte_eth_dev *dev, uint16_t idx)
 		}
 	}
 #endif
-	txq_ctrl->bf_reg = qp.bf.reg;
 	if (qp.comp_mask & MLX5DV_QP_MASK_UAR_MMAP_OFFSET) {
 		txq_ctrl->uar_mmap_offset = qp.uar_mmap_offset;
 		DRV_LOG(DEBUG, "Port %u: uar_mmap_offset 0x%" PRIx64 ".",
 			dev->data->port_id, txq_ctrl->uar_mmap_offset);
 	} else {
 		DRV_LOG(ERR,
-			"Port %u failed to retrieve UAR info, invalid"
-			" libmlx5.so",
+			"Port %u failed to retrieve UAR info, invalid libmlx5.so",
 			dev->data->port_id);
 		rte_errno = EINVAL;
 		goto error;
 	}
-	txq_uar_init(txq_ctrl);
+	txq_uar_init(txq_ctrl, qp.bf.reg);
 	dev->data->tx_queue_state[idx] = RTE_ETH_QUEUE_STATE_STARTED;
 	return 0;
 error:
diff --git a/drivers/net/mlx5/mlx5.c b/drivers/net/mlx5/mlx5.c
index 4fe7e34578..39158a5dde 100644
--- a/drivers/net/mlx5/mlx5.c
+++ b/drivers/net/mlx5/mlx5.c
@@ -1616,6 +1616,8 @@ mlx5_proc_priv_init(struct rte_eth_dev *dev)
 	}
 	ppriv->uar_table_sz = priv->txqs_n;
 	dev->process_private = ppriv;
+	if (rte_eal_process_type() == RTE_PROC_PRIMARY)
+		priv->sh->pppriv = ppriv;
 	return 0;
 }
 
diff --git a/drivers/net/mlx5/mlx5.h b/drivers/net/mlx5/mlx5.h
index 5768b82935..3b04f9d4e3 100644
--- a/drivers/net/mlx5/mlx5.h
+++ b/drivers/net/mlx5/mlx5.h
@@ -1167,6 +1167,7 @@ struct mlx5_dev_ctx_shared {
 	struct mlx5_devx_obj *td; /* Transport domain. */
 	struct mlx5_lag lag; /* LAG attributes */
 	void *tx_uar; /* Tx/packet pacing shared UAR. */
+	struct mlx5_proc_priv *pppriv; /* Pointer to primary private process. */
 	struct mlx5_flex_parser_profiles fp[MLX5_FLEX_PARSER_MAX];
 	/* Flex parser profiles information. */
 	void *devx_rx_uar; /* DevX UAR for Rx. */
@@ -1479,7 +1480,6 @@ void mlx5_set_metadata_mask(struct rte_eth_dev *dev);
 int mlx5_dev_check_sibling_config(struct mlx5_priv *priv,
 				  struct mlx5_dev_config *config,
 				  struct rte_device *dpdk_dev);
-int mlx5_dev_configure(struct rte_eth_dev *dev);
 int mlx5_dev_infos_get(struct rte_eth_dev *dev, struct rte_eth_dev_info *info);
 int mlx5_fw_version_get(struct rte_eth_dev *dev, char *fw_ver, size_t fw_size);
 int mlx5_dev_set_mtu(struct rte_eth_dev *dev, uint16_t mtu);
diff --git a/drivers/net/mlx5/mlx5_devx.c b/drivers/net/mlx5/mlx5_devx.c
index 7ed774e804..dc391529c2 100644
--- a/drivers/net/mlx5/mlx5_devx.c
+++ b/drivers/net/mlx5/mlx5_devx.c
@@ -1074,7 +1074,6 @@ mlx5_txq_devx_obj_new(struct rte_eth_dev *dev, uint16_t idx)
 	struct mlx5_devx_cq_attr cq_attr = {
 		.uar_page_id = mlx5_os_get_devx_uar_page_id(sh->tx_uar),
 	};
-	void *reg_addr;
 	uint32_t cqe_n, log_desc_n;
 	uint32_t wqe_n, wqe_size;
 	int ret = 0;
@@ -1171,13 +1170,10 @@ mlx5_txq_devx_obj_new(struct rte_eth_dev *dev, uint16_t idx)
 	if (!priv->sh->tdn)
 		priv->sh->tdn = priv->sh->td->id;
 #endif
-	MLX5_ASSERT(sh->tx_uar);
-	reg_addr = mlx5_os_get_devx_uar_reg_addr(sh->tx_uar);
-	MLX5_ASSERT(reg_addr);
-	txq_ctrl->bf_reg = reg_addr;
+	MLX5_ASSERT(sh->tx_uar && mlx5_os_get_devx_uar_reg_addr(sh->tx_uar));
 	txq_ctrl->uar_mmap_offset =
 				mlx5_os_get_devx_uar_mmap_offset(sh->tx_uar);
-	txq_uar_init(txq_ctrl);
+	txq_uar_init(txq_ctrl, mlx5_os_get_devx_uar_reg_addr(sh->tx_uar));
 	dev->data->tx_queue_state[idx] = RTE_ETH_QUEUE_STATE_STARTED;
 	return 0;
 error:
diff --git a/drivers/net/mlx5/mlx5_tx.h b/drivers/net/mlx5/mlx5_tx.h
index ea20213a40..24a312b58b 100644
--- a/drivers/net/mlx5/mlx5_tx.h
+++ b/drivers/net/mlx5/mlx5_tx.h
@@ -184,7 +184,6 @@ struct mlx5_txq_ctrl {
 	struct mlx5_txq_obj *obj; /* Verbs/DevX queue object. */
 	struct mlx5_priv *priv; /* Back pointer to private data. */
 	off_t uar_mmap_offset; /* UAR mmap offset for non-primary process. */
-	void *bf_reg; /* BlueFlame register from Verbs. */
 	uint16_t dump_file_n; /* Number of dump files. */
 	struct rte_eth_hairpin_conf hairpin_conf; /* Hairpin configuration. */
 	uint32_t hairpin_status; /* Hairpin binding status. */
@@ -204,7 +203,7 @@ int mlx5_tx_hairpin_queue_setup
 	(struct rte_eth_dev *dev, uint16_t idx, uint16_t desc,
 	 const struct rte_eth_hairpin_conf *hairpin_conf);
 void mlx5_tx_queue_release(struct rte_eth_dev *dev, uint16_t qid);
-void txq_uar_init(struct mlx5_txq_ctrl *txq_ctrl);
+void txq_uar_init(struct mlx5_txq_ctrl *txq_ctrl, void *bf_reg);
 int mlx5_tx_uar_init_secondary(struct rte_eth_dev *dev, int fd);
 void mlx5_tx_uar_uninit_secondary(struct rte_eth_dev *dev);
 int mlx5_txq_obj_verify(struct rte_eth_dev *dev);
diff --git a/drivers/net/mlx5/mlx5_txq.c b/drivers/net/mlx5/mlx5_txq.c
index e2a38d980a..5fa43d63f1 100644
--- a/drivers/net/mlx5/mlx5_txq.c
+++ b/drivers/net/mlx5/mlx5_txq.c
@@ -523,9 +523,11 @@ txq_uar_ncattr_init(struct mlx5_txq_ctrl *txq_ctrl, size_t page_size)
  *
  * @param txq_ctrl
  *   Pointer to Tx queue control structure.
+ * @param bf_reg
+ *   BlueFlame register from Verbs UAR.
  */
 void
-txq_uar_init(struct mlx5_txq_ctrl *txq_ctrl)
+txq_uar_init(struct mlx5_txq_ctrl *txq_ctrl, void *bf_reg)
 {
 	struct mlx5_priv *priv = txq_ctrl->priv;
 	struct mlx5_proc_priv *ppriv = MLX5_PROC_PRIV(PORT_ID(priv));
@@ -542,7 +544,7 @@ txq_uar_init(struct mlx5_txq_ctrl *txq_ctrl)
 		return;
 	MLX5_ASSERT(rte_eal_process_type() == RTE_PROC_PRIMARY);
 	MLX5_ASSERT(ppriv);
-	ppriv->uar_table[txq_ctrl->txq.idx] = txq_ctrl->bf_reg;
+	ppriv->uar_table[txq_ctrl->txq.idx] = bf_reg;
 	txq_uar_ncattr_init(txq_ctrl, page_size);
 #ifndef RTE_ARCH_64
 	/* Assign an UAR lock according to UAR page number */
@@ -571,6 +573,7 @@ txq_uar_init_secondary(struct mlx5_txq_ctrl *txq_ctrl, int fd)
 {
 	struct mlx5_priv *priv = txq_ctrl->priv;
 	struct mlx5_proc_priv *ppriv = MLX5_PROC_PRIV(PORT_ID(priv));
+	struct mlx5_proc_priv *primary_ppriv = priv->sh->pppriv;
 	struct mlx5_txq_data *txq = &txq_ctrl->txq;
 	void *addr;
 	uintptr_t uar_va;
@@ -589,20 +592,18 @@ txq_uar_init_secondary(struct mlx5_txq_ctrl *txq_ctrl, int fd)
 	 * As rdma-core, UARs are mapped in size of OS page
 	 * size. Ref to libmlx5 function: mlx5_init_context()
 	 */
-	uar_va = (uintptr_t)txq_ctrl->bf_reg;
+	uar_va = (uintptr_t)primary_ppriv->uar_table[txq->idx];
 	offset = uar_va & (page_size - 1); /* Offset in page. */
 	addr = rte_mem_map(NULL, page_size, RTE_PROT_WRITE, RTE_MAP_SHARED,
-			    fd, txq_ctrl->uar_mmap_offset);
+			   fd, txq_ctrl->uar_mmap_offset);
 	if (!addr) {
-		DRV_LOG(ERR,
-			"port %u mmap failed for BF reg of txq %u",
+		DRV_LOG(ERR, "Port %u mmap failed for BF reg of txq %u.",
 			txq->port_id, txq->idx);
 		rte_errno = ENXIO;
 		return -rte_errno;
 	}
 	addr = RTE_PTR_ADD(addr, offset);
 	ppriv->uar_table[txq->idx] = addr;
-	txq_uar_ncattr_init(txq_ctrl, page_size);
 	return 0;
 }
 
-- 
2.25.1



* [dpdk-dev] [PATCH 6/6] common/mlx5: fix post doorbell barrier
  2021-11-03 18:35 [dpdk-dev] [PATCH 0/6] mlx5: some UAR fixes michaelba
                   ` (4 preceding siblings ...)
  2021-11-03 18:35 ` [dpdk-dev] [PATCH 5/6] net/mlx5: remove duplicated reference of the TxQ doorbell michaelba
@ 2021-11-03 18:35 ` michaelba
  2021-11-07 15:23 ` [dpdk-dev] [PATCH 0/6] mlx5: some UAR fixes Thomas Monjalon
  6 siblings, 0 replies; 8+ messages in thread
From: michaelba @ 2021-11-03 18:35 UTC (permalink / raw)
  To: dev
  Cc: Matan Azrad, Thomas Monjalon, Michael Baum, stable, Viacheslav Ovsiienko

From: Michael Baum <michaelba@oss.nvidia.com>

The rdma-core library can map the doorbell register in two ways,
depending on the environment variable "MLX5_SHUT_UP_BF":

  - as regular cached memory, when the variable is either missing or
    set to zero. This type of mapping may cause significant doorbell
    register write latency and requires an explicit write memory
    barrier to mitigate this issue and prevent write combining.

  - as non-cached memory, when the variable is present and set to a
    non-zero value. This type of mapping may impact performance under
    heavy load conditions, but the explicit write memory barrier is
    not required and it may improve core performance.

The UAR creation function maps a doorbell in one of the above ways
according to the system. At run time, it always adds an explicit memory
barrier after writing to the doorbell.
In cases where the doorbell was mapped as non-cached memory, the
explicit memory barrier is unnecessary and may impair performance.

Commit [1] solved this problem for the Tx queue. At run time, it checks
the mapping type and provides the memory barrier after writing to a Tx
doorbell register only if it is needed. The mapping type is extracted
directly from the uar_mmap_offset field in the queue properties.

This patch shares this code between the drivers and extends the above
solution to each of them.
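As a rough usage sketch of the shared helper this patch adds (arguments
modeled on the compress driver hunk below), the trailing barrier is
requested only when the doorbell is not mapped to a non-cached region:

	/* Flush the doorbell write with a final barrier only when the
	 * UAR is write-combining (uar.dbnc == false).
	 */
	mlx5_doorbell_ring(&qp->priv->uar.bf_db, *(volatile uint64_t *)wqe,
			   qp->pi, &qp->qp.db_rec[MLX5_SND_DBR],
			   !qp->priv->uar.dbnc);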

[1] commit 8409a28573d3
    ("net/mlx5: control transmit doorbell register mapping")

Fixes: f8c97babc9f4 ("compress/mlx5: add data-path functions")
Fixes: 8e196c08ab53 ("crypto/mlx5: support enqueue/dequeue operations")
Fixes: 4d4e245ad637 ("regex/mlx5: support enqueue")
Cc: stable@dpdk.org

Signed-off-by: Michael Baum <michaelba@oss.nvidia.com>
Reviewed-by: Viacheslav Ovsiienko <viacheslavo@nvidia.com>
Acked-by: Matan Azrad <matan@nvidia.com>
---
 drivers/common/mlx5/mlx5_common.c        |  49 +++++-
 drivers/common/mlx5/mlx5_common.h        |  88 ++++++++++-
 drivers/common/mlx5/mlx5_common_defs.h   |   8 +
 drivers/common/mlx5/version.map          |   5 +-
 drivers/compress/mlx5/mlx5_compress.c    |  69 ++------
 drivers/crypto/mlx5/mlx5_crypto.c        |  56 +------
 drivers/crypto/mlx5/mlx5_crypto.h        |   6 +-
 drivers/net/mlx5/linux/mlx5_verbs.c      |  44 +++++-
 drivers/net/mlx5/mlx5.c                  | 193 +++++------------------
 drivers/net/mlx5/mlx5.h                  |  10 +-
 drivers/net/mlx5/mlx5_defs.h             |   8 -
 drivers/net/mlx5/mlx5_devx.c             |  20 ++-
 drivers/net/mlx5/mlx5_flow_aso.c         |  81 +++++-----
 drivers/net/mlx5/mlx5_rx.h               |   6 +-
 drivers/net/mlx5/mlx5_rxq.c              |  11 +-
 drivers/net/mlx5/mlx5_tx.h               | 102 ++----------
 drivers/net/mlx5/mlx5_txpp.c             |  36 ++---
 drivers/net/mlx5/mlx5_txq.c              |  74 ++-------
 drivers/regex/mlx5/mlx5_regex.c          |  17 +-
 drivers/regex/mlx5/mlx5_regex.h          |   2 +-
 drivers/regex/mlx5/mlx5_regex_control.c  |   4 +-
 drivers/regex/mlx5/mlx5_regex_fastpath.c |  26 ++-
 drivers/vdpa/mlx5/mlx5_vdpa.h            |   2 +-
 drivers/vdpa/mlx5/mlx5_vdpa_event.c      |  33 +---
 24 files changed, 351 insertions(+), 599 deletions(-)

diff --git a/drivers/common/mlx5/mlx5_common.c b/drivers/common/mlx5/mlx5_common.c
index 7bdc550b36..c81c7115a4 100644
--- a/drivers/common/mlx5/mlx5_common.c
+++ b/drivers/common/mlx5/mlx5_common.c
@@ -10,6 +10,7 @@
 #include <rte_mempool.h>
 #include <rte_class.h>
 #include <rte_malloc.h>
+#include <rte_eal_paging.h>
 
 #include "mlx5_common.h"
 #include "mlx5_common_os.h"
@@ -936,10 +937,10 @@ RTE_INIT_PRIO(mlx5_is_haswell_broadwell_cpu, LOG)
  * Allocate the User Access Region with DevX on specified device.
  * This routine handles the following UAR allocation issues:
  *
- *  - tries to allocate the UAR with the most appropriate memory mapping
+ *  - Try to allocate the UAR with the most appropriate memory mapping
  *    type from the ones supported by the host.
  *
- *  - tries to allocate the UAR with non-NULL base address OFED 5.0.x and
+ *  - Try to allocate the UAR with non-NULL base address OFED 5.0.x and
  *    Upstream rdma_core before v29 returned the NULL as UAR base address
  *    if UAR was not the first object in the UAR page.
  *    It caused the PMD failure and we should try to get another UAR till
@@ -951,7 +952,7 @@ RTE_INIT_PRIO(mlx5_is_haswell_broadwell_cpu, LOG)
  * @return
  *   UAR object pointer on success, NULL otherwise and rte_errno is set.
  */
-void *
+static void *
 mlx5_devx_alloc_uar(struct mlx5_common_device *cdev)
 {
 	void *uar;
@@ -1021,4 +1022,46 @@ mlx5_devx_alloc_uar(struct mlx5_common_device *cdev)
 	return uar;
 }
 
+void
+mlx5_devx_uar_release(struct mlx5_uar *uar)
+{
+	if (uar->obj != NULL)
+		mlx5_glue->devx_free_uar(uar->obj);
+	memset(uar, 0, sizeof(*uar));
+}
+
+int
+mlx5_devx_uar_prepare(struct mlx5_common_device *cdev, struct mlx5_uar *uar)
+{
+	off_t uar_mmap_offset;
+	const size_t page_size = rte_mem_page_size();
+	void *base_addr;
+	void *uar_obj;
+
+	if (page_size == (size_t)-1) {
+		DRV_LOG(ERR, "Failed to get mem page size");
+		rte_errno = ENOMEM;
+		return -1;
+	}
+	uar_obj = mlx5_devx_alloc_uar(cdev);
+	if (uar_obj == NULL || mlx5_os_get_devx_uar_reg_addr(uar_obj) == NULL) {
+		rte_errno = errno;
+		DRV_LOG(ERR, "Failed to allocate UAR.");
+		return -1;
+	}
+	uar->obj = uar_obj;
+	uar_mmap_offset = mlx5_os_get_devx_uar_mmap_offset(uar_obj);
+	base_addr = mlx5_os_get_devx_uar_base_addr(uar_obj);
+	uar->dbnc = mlx5_db_map_type_get(uar_mmap_offset, page_size);
+	uar->bf_db.db = mlx5_os_get_devx_uar_reg_addr(uar_obj);
+	uar->cq_db.db = RTE_PTR_ADD(base_addr, MLX5_CQ_DOORBELL);
+#ifndef RTE_ARCH_64
+	rte_spinlock_init(&uar->bf_sl);
+	rte_spinlock_init(&uar->cq_sl);
+	uar->bf_db.sl_p = &uar->bf_sl;
+	uar->cq_db.sl_p = &uar->cq_sl;
+#endif /* RTE_ARCH_64 */
+	return 0;
+}
+
 RTE_PMD_EXPORT_NAME(mlx5_common_driver, __COUNTER__);
diff --git a/drivers/common/mlx5/mlx5_common.h b/drivers/common/mlx5/mlx5_common.h
index 7febae9cdf..0b95d477cf 100644
--- a/drivers/common/mlx5/mlx5_common.h
+++ b/drivers/common/mlx5/mlx5_common.h
@@ -280,6 +280,87 @@ struct mlx5_klm {
 	uint64_t address;
 };
 
+/* All UAR arguments using doorbell register in datapath. */
+struct mlx5_uar_data {
+	uint64_t *db;
+	/* The doorbell's virtual address mapped to the relevant HW UAR space.*/
+#ifndef RTE_ARCH_64
+	rte_spinlock_t *sl_p;
+	/* Pointer to UAR access lock required for 32bit implementations. */
+#endif /* RTE_ARCH_64 */
+};
+
+/* DevX UAR control structure. */
+struct mlx5_uar {
+	struct mlx5_uar_data bf_db; /* UAR data for Blueflame register. */
+	struct mlx5_uar_data cq_db; /* UAR data for CQ arm db register. */
+	void *obj; /* DevX UAR object. */
+	bool dbnc; /* Doorbell mapped to non-cached region. */
+#ifndef RTE_ARCH_64
+	rte_spinlock_t bf_sl;
+	rte_spinlock_t cq_sl;
+	/* UAR access locks required for 32bit implementations. */
+#endif /* RTE_ARCH_64 */
+};
+
+/**
+ * Ring a doorbell and flush the update if requested.
+ *
+ * @param uar
+ *   Pointer to UAR data structure.
+ * @param val
+ *   value to write in big endian format.
+ * @param index
+ *   Index of doorbell record.
+ * @param db_rec
+ *   Address of doorbell record.
+ * @param flash
+ *   Decide whether to flush the DB writing using a memory barrier.
+ */
+static __rte_always_inline void
+mlx5_doorbell_ring(struct mlx5_uar_data *uar, uint64_t val, uint32_t index,
+		   volatile uint32_t *db_rec, bool flash)
+{
+	rte_io_wmb();
+	*db_rec = rte_cpu_to_be_32(index);
+	/* Ensure ordering between DB record actual update and UAR access. */
+	rte_wmb();
+#ifdef RTE_ARCH_64
+	*uar->db = val;
+#else /* !RTE_ARCH_64 */
+	rte_spinlock_lock(uar->sl_p);
+	*(volatile uint32_t *)uar->db = val;
+	rte_io_wmb();
+	*((volatile uint32_t *)uar->db + 1) = val >> 32;
+	rte_spinlock_unlock(uar->sl_p);
+#endif
+	if (flash)
+		rte_wmb();
+}
+
+/**
+ * Get the doorbell register mapping type.
+ *
+ * @param uar_mmap_offset
+ *   Mmap offset of Verbs/DevX UAR.
+ * @param page_size
+ *   System page size
+ *
+ * @return
+ *   1 for non-cached, 0 otherwise.
+ */
+static inline uint16_t
+mlx5_db_map_type_get(off_t uar_mmap_offset, size_t page_size)
+{
+	off_t cmd = uar_mmap_offset / page_size;
+
+	cmd >>= MLX5_UAR_MMAP_CMD_SHIFT;
+	cmd &= MLX5_UAR_MMAP_CMD_MASK;
+	if (cmd == MLX5_MMAP_GET_NC_PAGES_CMD)
+		return 1;
+	return 0;
+}
+
 __rte_internal
 void mlx5_translate_port_name(const char *port_name_in,
 			      struct mlx5_switch_info *port_info_out);
@@ -416,7 +497,12 @@ mlx5_dev_mempool_unregister(struct mlx5_common_device *cdev,
 			    struct rte_mempool *mp);
 
 __rte_internal
-void *mlx5_devx_alloc_uar(struct mlx5_common_device *cdev);
+int
+mlx5_devx_uar_prepare(struct mlx5_common_device *cdev, struct mlx5_uar *uar);
+
+__rte_internal
+void
+mlx5_devx_uar_release(struct mlx5_uar *uar);
 
 /* mlx5_common_mr.c */
 
diff --git a/drivers/common/mlx5/mlx5_common_defs.h b/drivers/common/mlx5/mlx5_common_defs.h
index 8f43b8e8ad..ca80cd8d29 100644
--- a/drivers/common/mlx5/mlx5_common_defs.h
+++ b/drivers/common/mlx5/mlx5_common_defs.h
@@ -39,6 +39,14 @@
 #define MLX5_TXDB_NCACHED 1
 #define MLX5_TXDB_HEURISTIC 2
 
+/* Fields of memory mapping type in offset parameter of mmap() */
+#define MLX5_UAR_MMAP_CMD_SHIFT 8
+#define MLX5_UAR_MMAP_CMD_MASK 0xff
+
+#ifndef HAVE_MLX5DV_MMAP_GET_NC_PAGES_CMD
+#define MLX5_MMAP_GET_NC_PAGES_CMD 3
+#endif
+
 #define MLX5_VDPA_MAX_RETRIES 20
 #define MLX5_VDPA_USEC 1000
 
diff --git a/drivers/common/mlx5/version.map b/drivers/common/mlx5/version.map
index 0ea8325f9a..0f3192e58d 100644
--- a/drivers/common/mlx5/version.map
+++ b/drivers/common/mlx5/version.map
@@ -16,8 +16,6 @@ INTERNAL {
 	mlx5_dev_mempool_unregister;
 	mlx5_dev_mempool_subscribe;
 
-	mlx5_devx_alloc_uar; # WINDOWS_NO_EXPORT
-
 	mlx5_devx_cmd_alloc_pd;
 	mlx5_devx_cmd_create_conn_track_offload_obj;
 	mlx5_devx_cmd_create_cq;
@@ -75,6 +73,9 @@ INTERNAL {
 	mlx5_devx_sq_create;
 	mlx5_devx_sq_destroy;
 
+	mlx5_devx_uar_prepare;
+	mlx5_devx_uar_release;
+
 	mlx5_free;
 
 	mlx5_get_ifname_sysfs; # WINDOWS_NO_EXPORT
diff --git a/drivers/compress/mlx5/mlx5_compress.c b/drivers/compress/mlx5/mlx5_compress.c
index df60b05ab3..426524db33 100644
--- a/drivers/compress/mlx5/mlx5_compress.c
+++ b/drivers/compress/mlx5/mlx5_compress.c
@@ -37,23 +37,19 @@ struct mlx5_compress_priv {
 	TAILQ_ENTRY(mlx5_compress_priv) next;
 	struct rte_compressdev *compressdev;
 	struct mlx5_common_device *cdev; /* Backend mlx5 device. */
-	void *uar;
+	struct mlx5_uar uar;
 	uint8_t min_block_size;
 	/* Minimum huffman block size supported by the device. */
 	struct rte_compressdev_config dev_config;
 	LIST_HEAD(xform_list, mlx5_compress_xform) xform_list;
 	rte_spinlock_t xform_sl;
-	volatile uint64_t *uar_addr;
-	/* HCA caps*/
+	/* HCA caps */
 	uint32_t mmo_decomp_sq:1;
 	uint32_t mmo_decomp_qp:1;
 	uint32_t mmo_comp_sq:1;
 	uint32_t mmo_comp_qp:1;
 	uint32_t mmo_dma_sq:1;
 	uint32_t mmo_dma_qp:1;
-#ifndef RTE_ARCH_64
-	rte_spinlock_t uar32_sl;
-#endif /* RTE_ARCH_64 */
 };
 
 struct mlx5_compress_qp {
@@ -183,11 +179,11 @@ mlx5_compress_qp_setup(struct rte_compressdev *dev, uint16_t qp_id,
 	struct mlx5_compress_priv *priv = dev->data->dev_private;
 	struct mlx5_compress_qp *qp;
 	struct mlx5_devx_cq_attr cq_attr = {
-		.uar_page_id = mlx5_os_get_devx_uar_page_id(priv->uar),
+		.uar_page_id = mlx5_os_get_devx_uar_page_id(priv->uar.obj),
 	};
 	struct mlx5_devx_qp_attr qp_attr = {
 		.pd = priv->cdev->pdn,
-		.uar_index = mlx5_os_get_devx_uar_page_id(priv->uar),
+		.uar_index = mlx5_os_get_devx_uar_page_id(priv->uar.obj),
 		.user_index = qp_id,
 	};
 	uint32_t log_ops_n = rte_log2_u32(max_inflight_ops);
@@ -476,24 +472,6 @@ mlx5_compress_dseg_set(struct mlx5_compress_qp *qp,
 	return dseg->lkey;
 }
 
-/*
- * Provide safe 64bit store operation to mlx5 UAR region for both 32bit and
- * 64bit architectures.
- */
-static __rte_always_inline void
-mlx5_compress_uar_write(uint64_t val, struct mlx5_compress_priv *priv)
-{
-#ifdef RTE_ARCH_64
-	*priv->uar_addr = val;
-#else /* !RTE_ARCH_64 */
-	rte_spinlock_lock(&priv->uar32_sl);
-	*(volatile uint32_t *)priv->uar_addr = val;
-	rte_io_wmb();
-	*((volatile uint32_t *)priv->uar_addr + 1) = val >> 32;
-	rte_spinlock_unlock(&priv->uar32_sl);
-#endif
-}
-
 static uint16_t
 mlx5_compress_enqueue_burst(void *queue_pair, struct rte_comp_op **ops,
 			    uint16_t nb_ops)
@@ -554,11 +532,9 @@ mlx5_compress_enqueue_burst(void *queue_pair, struct rte_comp_op **ops,
 		qp->pi++;
 	} while (--remain);
 	qp->stats.enqueued_count += nb_ops;
-	rte_io_wmb();
-	qp->qp.db_rec[MLX5_SND_DBR] = rte_cpu_to_be_32(qp->pi);
-	rte_wmb();
-	mlx5_compress_uar_write(*(volatile uint64_t *)wqe, qp->priv);
-	rte_wmb();
+	mlx5_doorbell_ring(&qp->priv->uar.bf_db, *(volatile uint64_t *)wqe,
+			   qp->pi, &qp->qp.db_rec[MLX5_SND_DBR],
+			   !qp->priv->uar.dbnc);
 	return nb_ops;
 }
 
@@ -678,33 +654,6 @@ mlx5_compress_dequeue_burst(void *queue_pair, struct rte_comp_op **ops,
 	return i;
 }
 
-static void
-mlx5_compress_uar_release(struct mlx5_compress_priv *priv)
-{
-	if (priv->uar != NULL) {
-		mlx5_glue->devx_free_uar(priv->uar);
-		priv->uar = NULL;
-	}
-}
-
-static int
-mlx5_compress_uar_prepare(struct mlx5_compress_priv *priv)
-{
-	priv->uar = mlx5_devx_alloc_uar(priv->cdev);
-	if (priv->uar == NULL || mlx5_os_get_devx_uar_reg_addr(priv->uar) ==
-	    NULL) {
-		rte_errno = errno;
-		DRV_LOG(ERR, "Failed to allocate UAR.");
-		return -1;
-	}
-	priv->uar_addr = mlx5_os_get_devx_uar_reg_addr(priv->uar);
-	MLX5_ASSERT(priv->uar_addr);
-#ifndef RTE_ARCH_64
-	rte_spinlock_init(&priv->uar32_sl);
-#endif /* RTE_ARCH_64 */
-	return 0;
-}
-
 static int
 mlx5_compress_dev_probe(struct mlx5_common_device *cdev)
 {
@@ -752,7 +701,7 @@ mlx5_compress_dev_probe(struct mlx5_common_device *cdev)
 	priv->cdev = cdev;
 	priv->compressdev = compressdev;
 	priv->min_block_size = attr->compress_min_block_size;
-	if (mlx5_compress_uar_prepare(priv) != 0) {
+	if (mlx5_devx_uar_prepare(cdev, &priv->uar) != 0) {
 		rte_compressdev_pmd_destroy(priv->compressdev);
 		return -1;
 	}
@@ -775,7 +724,7 @@ mlx5_compress_dev_remove(struct mlx5_common_device *cdev)
 		TAILQ_REMOVE(&mlx5_compress_priv_list, priv, next);
 	pthread_mutex_unlock(&priv_list_lock);
 	if (priv) {
-		mlx5_compress_uar_release(priv);
+		mlx5_devx_uar_release(&priv->uar);
 		rte_compressdev_pmd_destroy(priv->compressdev);
 	}
 	return 0;
diff --git a/drivers/crypto/mlx5/mlx5_crypto.c b/drivers/crypto/mlx5/mlx5_crypto.c
index 33d797a6a0..ffa24b4ffe 100644
--- a/drivers/crypto/mlx5/mlx5_crypto.c
+++ b/drivers/crypto/mlx5/mlx5_crypto.c
@@ -422,20 +422,6 @@ mlx5_crypto_wqe_set(struct mlx5_crypto_priv *priv,
 	return 1;
 }
 
-static __rte_always_inline void
-mlx5_crypto_uar_write(uint64_t val, struct mlx5_crypto_priv *priv)
-{
-#ifdef RTE_ARCH_64
-	*priv->uar_addr = val;
-#else /* !RTE_ARCH_64 */
-	rte_spinlock_lock(&priv->uar32_sl);
-	*(volatile uint32_t *)priv->uar_addr = val;
-	rte_io_wmb();
-	*((volatile uint32_t *)priv->uar_addr + 1) = val >> 32;
-	rte_spinlock_unlock(&priv->uar32_sl);
-#endif
-}
-
 static uint16_t
 mlx5_crypto_enqueue_burst(void *queue_pair, struct rte_crypto_op **ops,
 			  uint16_t nb_ops)
@@ -471,11 +457,9 @@ mlx5_crypto_enqueue_burst(void *queue_pair, struct rte_crypto_op **ops,
 		qp->pi++;
 	} while (--remain);
 	qp->stats.enqueued_count += nb_ops;
-	rte_io_wmb();
-	qp->qp_obj.db_rec[MLX5_SND_DBR] = rte_cpu_to_be_32(qp->db_pi);
-	rte_wmb();
-	mlx5_crypto_uar_write(*(volatile uint64_t *)qp->wqe, qp->priv);
-	rte_wmb();
+	mlx5_doorbell_ring(&priv->uar.bf_db, *(volatile uint64_t *)qp->wqe,
+			   qp->db_pi, &qp->qp_obj.db_rec[MLX5_SND_DBR],
+			   !priv->uar.dbnc);
 	return nb_ops;
 }
 
@@ -609,7 +593,7 @@ mlx5_crypto_queue_pair_setup(struct rte_cryptodev *dev, uint16_t qp_id,
 	uint32_t ret;
 	uint32_t alloc_size = sizeof(*qp);
 	struct mlx5_devx_cq_attr cq_attr = {
-		.uar_page_id = mlx5_os_get_devx_uar_page_id(priv->uar),
+		.uar_page_id = mlx5_os_get_devx_uar_page_id(priv->uar.obj),
 	};
 
 	if (dev->data->queue_pairs[qp_id] != NULL)
@@ -631,7 +615,7 @@ mlx5_crypto_queue_pair_setup(struct rte_cryptodev *dev, uint16_t qp_id,
 		goto error;
 	}
 	attr.pd = priv->cdev->pdn;
-	attr.uar_index = mlx5_os_get_devx_uar_page_id(priv->uar);
+	attr.uar_index = mlx5_os_get_devx_uar_page_id(priv->uar.obj);
 	attr.cqn = qp->cq_obj.cq->id;
 	attr.rq_size = 0;
 	attr.sq_size = RTE_BIT32(log_nb_desc);
@@ -719,30 +703,6 @@ static struct rte_cryptodev_ops mlx5_crypto_ops = {
 	.sym_configure_raw_dp_ctx	= NULL,
 };
 
-static void
-mlx5_crypto_uar_release(struct mlx5_crypto_priv *priv)
-{
-	if (priv->uar != NULL) {
-		mlx5_glue->devx_free_uar(priv->uar);
-		priv->uar = NULL;
-	}
-}
-
-static int
-mlx5_crypto_uar_prepare(struct mlx5_crypto_priv *priv)
-{
-	priv->uar = mlx5_devx_alloc_uar(priv->cdev);
-	if (priv->uar)
-		priv->uar_addr = mlx5_os_get_devx_uar_reg_addr(priv->uar);
-	if (priv->uar == NULL || priv->uar_addr == NULL) {
-		rte_errno = errno;
-		DRV_LOG(ERR, "Failed to allocate UAR.");
-		return -1;
-	}
-	return 0;
-}
-
-
 static int
 mlx5_crypto_args_check_handler(const char *key, const char *val, void *opaque)
 {
@@ -894,7 +854,7 @@ mlx5_crypto_dev_probe(struct mlx5_common_device *cdev)
 	priv = crypto_dev->data->dev_private;
 	priv->cdev = cdev;
 	priv->crypto_dev = crypto_dev;
-	if (mlx5_crypto_uar_prepare(priv) != 0) {
+	if (mlx5_devx_uar_prepare(cdev, &priv->uar) != 0) {
 		rte_cryptodev_pmd_destroy(priv->crypto_dev);
 		return -1;
 	}
@@ -902,7 +862,7 @@ mlx5_crypto_dev_probe(struct mlx5_common_device *cdev)
 						      &devarg_prms.login_attr);
 	if (login == NULL) {
 		DRV_LOG(ERR, "Failed to configure login.");
-		mlx5_crypto_uar_release(priv);
+		mlx5_devx_uar_release(&priv->uar);
 		rte_cryptodev_pmd_destroy(priv->crypto_dev);
 		return -rte_errno;
 	}
@@ -943,7 +903,7 @@ mlx5_crypto_dev_remove(struct mlx5_common_device *cdev)
 	pthread_mutex_unlock(&priv_list_lock);
 	if (priv) {
 		claim_zero(mlx5_devx_cmd_destroy(priv->login_obj));
-		mlx5_crypto_uar_release(priv);
+		mlx5_devx_uar_release(&priv->uar);
 		rte_cryptodev_pmd_destroy(priv->crypto_dev);
 	}
 	return 0;
diff --git a/drivers/crypto/mlx5/mlx5_crypto.h b/drivers/crypto/mlx5/mlx5_crypto.h
index 69cef81d77..135cd78212 100644
--- a/drivers/crypto/mlx5/mlx5_crypto.h
+++ b/drivers/crypto/mlx5/mlx5_crypto.h
@@ -21,8 +21,7 @@ struct mlx5_crypto_priv {
 	TAILQ_ENTRY(mlx5_crypto_priv) next;
 	struct mlx5_common_device *cdev; /* Backend mlx5 device. */
 	struct rte_cryptodev *crypto_dev;
-	void *uar; /* User Access Region. */
-	volatile uint64_t *uar_addr;
+	struct mlx5_uar uar; /* User Access Region. */
 	uint32_t max_segs_num; /* Maximum supported data segs. */
 	struct mlx5_hlist *dek_hlist; /* Dek hash list. */
 	struct rte_cryptodev_config dev_config;
@@ -32,9 +31,6 @@ struct mlx5_crypto_priv {
 	uint16_t umr_wqe_size;
 	uint16_t umr_wqe_stride;
 	uint16_t max_rdmar_ds;
-#ifndef RTE_ARCH_64
-	rte_spinlock_t uar32_sl;
-#endif /* RTE_ARCH_64 */
 };
 
 struct mlx5_crypto_qp {
diff --git a/drivers/net/mlx5/linux/mlx5_verbs.c b/drivers/net/mlx5/linux/mlx5_verbs.c
index eef8391c12..a8d7c1d1b1 100644
--- a/drivers/net/mlx5/linux/mlx5_verbs.c
+++ b/drivers/net/mlx5/linux/mlx5_verbs.c
@@ -16,6 +16,7 @@
 #include <rte_malloc.h>
 #include <ethdev_driver.h>
 #include <rte_common.h>
+#include <rte_eal_paging.h>
 
 #include <mlx5_glue.h>
 #include <mlx5_common.h>
@@ -374,7 +375,10 @@ mlx5_rxq_ibv_obj_new(struct rte_eth_dev *dev, uint16_t idx)
 	rxq_data->cqe_n = log2above(cq_info.cqe_cnt);
 	rxq_data->cq_db = cq_info.dbrec;
 	rxq_data->cqes = (volatile struct mlx5_cqe (*)[])(uintptr_t)cq_info.buf;
-	rxq_data->cq_uar = cq_info.cq_uar;
+	rxq_data->uar_data.db = RTE_PTR_ADD(cq_info.cq_uar, MLX5_CQ_DOORBELL);
+#ifndef RTE_ARCH_64
+	rxq_data->uar_data.sl_p = &priv->sh->uar_lock_cq;
+#endif
 	rxq_data->cqn = cq_info.cqn;
 	/* Create WQ (RQ) using Verbs API. */
 	tmpl->wq = mlx5_rxq_ibv_wq_create(dev, idx);
@@ -870,6 +874,42 @@ mlx5_txq_ibv_qp_create(struct rte_eth_dev *dev, uint16_t idx)
 	return qp_obj;
 }
 
+/**
+ * Initialize Tx UAR registers for primary process.
+ *
+ * @param txq_ctrl
+ *   Pointer to Tx queue control structure.
+ * @param bf_reg
+ *   BlueFlame register from Verbs UAR.
+ */
+static void
+mlx5_txq_ibv_uar_init(struct mlx5_txq_ctrl *txq_ctrl, void *bf_reg)
+{
+	struct mlx5_priv *priv = txq_ctrl->priv;
+	struct mlx5_proc_priv *ppriv = MLX5_PROC_PRIV(PORT_ID(priv));
+	const size_t page_size = rte_mem_page_size();
+	struct mlx5_txq_data *txq = &txq_ctrl->txq;
+	off_t uar_mmap_offset = txq_ctrl->uar_mmap_offset;
+#ifndef RTE_ARCH_64
+	unsigned int lock_idx;
+#endif
+
+	MLX5_ASSERT(rte_eal_process_type() == RTE_PROC_PRIMARY);
+	MLX5_ASSERT(ppriv);
+	if (page_size == (size_t)-1) {
+		DRV_LOG(ERR, "Failed to get mem page size");
+		rte_errno = ENOMEM;
+	}
+	txq->db_heu = priv->sh->cdev->config.dbnc == MLX5_TXDB_HEURISTIC;
+	txq->db_nc = mlx5_db_map_type_get(uar_mmap_offset, page_size);
+	ppriv->uar_table[txq->idx].db = bf_reg;
+#ifndef RTE_ARCH_64
+	/* Assign an UAR lock according to UAR page number. */
+	lock_idx = (uar_mmap_offset / page_size) & MLX5_UAR_PAGE_NUM_MASK;
+	ppriv->uar_table[txq->idx].sl_p = &priv->sh->uar_lock[lock_idx];
+#endif
+}
+
 /**
  * Create the Tx queue Verbs object.
  *
@@ -1001,7 +1041,7 @@ mlx5_txq_ibv_obj_new(struct rte_eth_dev *dev, uint16_t idx)
 		rte_errno = EINVAL;
 		goto error;
 	}
-	txq_uar_init(txq_ctrl, qp.bf.reg);
+	mlx5_txq_ibv_uar_init(txq_ctrl, qp.bf.reg);
 	dev->data->tx_queue_state[idx] = RTE_ETH_QUEUE_STATE_STARTED;
 	return 0;
 error:
diff --git a/drivers/net/mlx5/mlx5.c b/drivers/net/mlx5/mlx5.c
index 39158a5dde..d30ce74f5c 100644
--- a/drivers/net/mlx5/mlx5.c
+++ b/drivers/net/mlx5/mlx5.c
@@ -19,6 +19,7 @@
 #include <rte_rwlock.h>
 #include <rte_spinlock.h>
 #include <rte_string_fns.h>
+#include <rte_eal_paging.h>
 #include <rte_alarm.h>
 #include <rte_cycles.h>
 
@@ -987,143 +988,35 @@ mlx5_get_supported_tunneling_offloads(const struct mlx5_hca_attr *attr)
 	return tn_offloads;
 }
 
-/*
- * Allocate Rx and Tx UARs in robust fashion.
- * This routine handles the following UAR allocation issues:
- *
- *  - tries to allocate the UAR with the most appropriate memory
- *    mapping type from the ones supported by the host
- *
- *  - tries to allocate the UAR with non-NULL base address
- *    OFED 5.0.x and Upstream rdma_core before v29 returned the NULL as
- *    UAR base address if UAR was not the first object in the UAR page.
- *    It caused the PMD failure and we should try to get another UAR
- *    till we get the first one with non-NULL base address returned.
- */
+/* Fill all fields of UAR structure. */
 static int
-mlx5_alloc_rxtx_uars(struct mlx5_dev_ctx_shared *sh,
-		     const struct mlx5_common_dev_config *config)
+mlx5_rxtx_uars_prepare(struct mlx5_dev_ctx_shared *sh)
 {
-	uint32_t uar_mapping, retry;
-	int err = 0;
-	void *base_addr;
-
-	for (retry = 0; retry < MLX5_ALLOC_UAR_RETRY; ++retry) {
-#ifdef MLX5DV_UAR_ALLOC_TYPE_NC
-		/* Control the mapping type according to the settings. */
-		uar_mapping = (config->dbnc == MLX5_TXDB_NCACHED) ?
-			      MLX5DV_UAR_ALLOC_TYPE_NC :
-			      MLX5DV_UAR_ALLOC_TYPE_BF;
-#else
-		RTE_SET_USED(config);
-		/*
-		 * It seems we have no way to control the memory mapping type
-		 * for the UAR, the default "Write-Combining" type is supposed.
-		 * The UAR initialization on queue creation queries the
-		 * actual mapping type done by Verbs/kernel and setups the
-		 * PMD datapath accordingly.
-		 */
-		uar_mapping = 0;
-#endif
-		sh->tx_uar = mlx5_glue->devx_alloc_uar(sh->cdev->ctx,
-						       uar_mapping);
-#ifdef MLX5DV_UAR_ALLOC_TYPE_NC
-		if (!sh->tx_uar &&
-		    uar_mapping == MLX5DV_UAR_ALLOC_TYPE_BF) {
-			if (config->dbnc == MLX5_TXDB_CACHED ||
-			    config->dbnc == MLX5_TXDB_HEURISTIC)
-				DRV_LOG(WARNING, "Devarg tx_db_nc setting "
-						 "is not supported by DevX");
-			/*
-			 * In some environments like virtual machine
-			 * the Write Combining mapped might be not supported
-			 * and UAR allocation fails. We try "Non-Cached"
-			 * mapping for the case. The tx_burst routines take
-			 * the UAR mapping type into account on UAR setup
-			 * on queue creation.
-			 */
-			DRV_LOG(DEBUG, "Failed to allocate Tx DevX UAR (BF)");
-			uar_mapping = MLX5DV_UAR_ALLOC_TYPE_NC;
-			sh->tx_uar = mlx5_glue->devx_alloc_uar(sh->cdev->ctx,
-							       uar_mapping);
-		} else if (!sh->tx_uar &&
-			   uar_mapping == MLX5DV_UAR_ALLOC_TYPE_NC) {
-			if (config->dbnc == MLX5_TXDB_NCACHED)
-				DRV_LOG(WARNING, "Devarg tx_db_nc settings "
-						 "is not supported by DevX");
-			/*
-			 * If Verbs/kernel does not support "Non-Cached"
-			 * try the "Write-Combining".
-			 */
-			DRV_LOG(DEBUG, "Failed to allocate Tx DevX UAR (NC)");
-			uar_mapping = MLX5DV_UAR_ALLOC_TYPE_BF;
-			sh->tx_uar = mlx5_glue->devx_alloc_uar(sh->cdev->ctx,
-							       uar_mapping);
-		}
-#endif
-		if (!sh->tx_uar) {
-			DRV_LOG(ERR, "Failed to allocate Tx DevX UAR (BF/NC)");
-			err = ENOMEM;
-			goto exit;
-		}
-		base_addr = mlx5_os_get_devx_uar_base_addr(sh->tx_uar);
-		if (base_addr)
-			break;
-		/*
-		 * The UARs are allocated by rdma_core within the
-		 * IB device context, on context closure all UARs
-		 * will be freed, should be no memory/object leakage.
-		 */
-		DRV_LOG(DEBUG, "Retrying to allocate Tx DevX UAR");
-		sh->tx_uar = NULL;
-	}
-	/* Check whether we finally succeeded with valid UAR allocation. */
-	if (!sh->tx_uar) {
-		DRV_LOG(ERR, "Failed to allocate Tx DevX UAR (NULL base)");
-		err = ENOMEM;
-		goto exit;
-	}
-	for (retry = 0; retry < MLX5_ALLOC_UAR_RETRY; ++retry) {
-		uar_mapping = 0;
-		sh->devx_rx_uar = mlx5_glue->devx_alloc_uar(sh->cdev->ctx,
-							    uar_mapping);
-#ifdef MLX5DV_UAR_ALLOC_TYPE_NC
-		if (!sh->devx_rx_uar &&
-		    uar_mapping == MLX5DV_UAR_ALLOC_TYPE_BF) {
-			/*
-			 * Rx UAR is used to control interrupts only,
-			 * should be no datapath noticeable impact,
-			 * can try "Non-Cached" mapping safely.
-			 */
-			DRV_LOG(DEBUG, "Failed to allocate Rx DevX UAR (BF)");
-			uar_mapping = MLX5DV_UAR_ALLOC_TYPE_NC;
-			sh->devx_rx_uar = mlx5_glue->devx_alloc_uar
-						   (sh->cdev->ctx, uar_mapping);
-		}
-#endif
-		if (!sh->devx_rx_uar) {
-			DRV_LOG(ERR, "Failed to allocate Rx DevX UAR (BF/NC)");
-			err = ENOMEM;
-			goto exit;
-		}
-		base_addr = mlx5_os_get_devx_uar_base_addr(sh->devx_rx_uar);
-		if (base_addr)
-			break;
-		/*
-		 * The UARs are allocated by rdma_core within the
-		 * IB device context, on context closure all UARs
-		 * will be freed, should be no memory/object leakage.
-		 */
-		DRV_LOG(DEBUG, "Retrying to allocate Rx DevX UAR");
-		sh->devx_rx_uar = NULL;
+	int ret;
+
+	ret = mlx5_devx_uar_prepare(sh->cdev, &sh->tx_uar);
+	if (ret) {
+		DRV_LOG(ERR, "Failed to prepare Tx DevX UAR.");
+		return -rte_errno;
 	}
-	/* Check whether we finally succeeded with valid UAR allocation. */
-	if (!sh->devx_rx_uar) {
-		DRV_LOG(ERR, "Failed to allocate Rx DevX UAR (NULL base)");
-		err = ENOMEM;
+	MLX5_ASSERT(sh->tx_uar.obj);
+	MLX5_ASSERT(mlx5_os_get_devx_uar_base_addr(sh->tx_uar.obj));
+	ret = mlx5_devx_uar_prepare(sh->cdev, &sh->rx_uar);
+	if (ret) {
+		DRV_LOG(ERR, "Failed to prepare Rx DevX UAR.");
+		mlx5_devx_uar_release(&sh->tx_uar);
+		return -rte_errno;
 	}
-exit:
-	return err;
+	MLX5_ASSERT(sh->rx_uar.obj);
+	MLX5_ASSERT(mlx5_os_get_devx_uar_base_addr(sh->rx_uar.obj));
+	return 0;
+}
+
+static void
+mlx5_rxtx_uars_release(struct mlx5_dev_ctx_shared *sh)
+{
+	mlx5_devx_uar_release(&sh->rx_uar);
+	mlx5_devx_uar_release(&sh->tx_uar);
 }
 
 /**
@@ -1332,21 +1225,17 @@ mlx5_alloc_shared_dev_ctx(const struct mlx5_dev_spawn_data *spawn,
 			err = ENOMEM;
 			goto error;
 		}
-		err = mlx5_alloc_rxtx_uars(sh, &sh->cdev->config);
+		err = mlx5_rxtx_uars_prepare(sh);
 		if (err)
 			goto error;
-		MLX5_ASSERT(sh->tx_uar);
-		MLX5_ASSERT(mlx5_os_get_devx_uar_base_addr(sh->tx_uar));
-
-		MLX5_ASSERT(sh->devx_rx_uar);
-		MLX5_ASSERT(mlx5_os_get_devx_uar_base_addr(sh->devx_rx_uar));
-	}
 #ifndef RTE_ARCH_64
-	/* Initialize UAR access locks for 32bit implementations. */
-	rte_spinlock_init(&sh->uar_lock_cq);
-	for (i = 0; i < MLX5_UAR_PAGE_NUM_MAX; i++)
-		rte_spinlock_init(&sh->uar_lock[i]);
+	} else {
+		/* Initialize UAR access locks for 32bit implementations. */
+		rte_spinlock_init(&sh->uar_lock_cq);
+		for (i = 0; i < MLX5_UAR_PAGE_NUM_MAX; i++)
+			rte_spinlock_init(&sh->uar_lock[i]);
 #endif
+	}
 	mlx5_os_dev_shared_handler_install(sh);
 	if (LIST_EMPTY(&mlx5_dev_ctx_list)) {
 		err = mlx5_flow_os_init_workspace_once();
@@ -1373,10 +1262,7 @@ mlx5_alloc_shared_dev_ctx(const struct mlx5_dev_spawn_data *spawn,
 		if (sh->tis[i])
 			claim_zero(mlx5_devx_cmd_destroy(sh->tis[i]));
 	} while (++i < (uint32_t)sh->bond.n_port);
-	if (sh->devx_rx_uar)
-		mlx5_glue->devx_free_uar(sh->devx_rx_uar);
-	if (sh->tx_uar)
-		mlx5_glue->devx_free_uar(sh->tx_uar);
+	mlx5_rxtx_uars_release(sh);
 	mlx5_free(sh);
 	MLX5_ASSERT(err > 0);
 	rte_errno = err;
@@ -1445,18 +1331,13 @@ mlx5_free_shared_dev_ctx(struct mlx5_dev_ctx_shared *sh)
 		mlx5_aso_flow_mtrs_mng_close(sh);
 	mlx5_flow_ipool_destroy(sh);
 	mlx5_os_dev_shared_handler_uninstall(sh);
-	if (sh->tx_uar) {
-		mlx5_glue->devx_free_uar(sh->tx_uar);
-		sh->tx_uar = NULL;
-	}
+	mlx5_rxtx_uars_release(sh);
 	do {
 		if (sh->tis[i])
 			claim_zero(mlx5_devx_cmd_destroy(sh->tis[i]));
 	} while (++i < sh->bond.n_port);
 	if (sh->td)
 		claim_zero(mlx5_devx_cmd_destroy(sh->td));
-	if (sh->devx_rx_uar)
-		mlx5_glue->devx_free_uar(sh->devx_rx_uar);
 	MLX5_ASSERT(sh->geneve_tlv_option_resource == NULL);
 	pthread_mutex_destroy(&sh->txpp.mutex);
 	mlx5_free(sh);
@@ -1606,8 +1487,8 @@ mlx5_proc_priv_init(struct rte_eth_dev *dev)
 	 * UAR register table follows the process private structure. BlueFlame
 	 * registers for Tx queues are stored in the table.
 	 */
-	ppriv_size =
-		sizeof(struct mlx5_proc_priv) + priv->txqs_n * sizeof(void *);
+	ppriv_size = sizeof(struct mlx5_proc_priv) +
+		     priv->txqs_n * sizeof(struct mlx5_uar_data);
 	ppriv = mlx5_malloc(MLX5_MEM_RTE | MLX5_MEM_ZERO, ppriv_size,
 			    RTE_CACHE_LINE_SIZE, dev->device->numa_node);
 	if (!ppriv) {
diff --git a/drivers/net/mlx5/mlx5.h b/drivers/net/mlx5/mlx5.h
index 3b04f9d4e3..7be0e88233 100644
--- a/drivers/net/mlx5/mlx5.h
+++ b/drivers/net/mlx5/mlx5.h
@@ -518,7 +518,6 @@ struct mlx5_aso_sq {
 	rte_spinlock_t sqsl;
 	struct mlx5_aso_cq cq;
 	struct mlx5_devx_sq sq_obj;
-	volatile uint64_t *uar_addr;
 	struct mlx5_pmd_mr mr;
 	uint16_t pi;
 	uint32_t head;
@@ -1138,7 +1137,7 @@ struct mlx5_dev_ctx_shared {
 	void *rx_domain; /* RX Direct Rules name space handle. */
 	void *tx_domain; /* TX Direct Rules name space handle. */
 #ifndef RTE_ARCH_64
-	rte_spinlock_t uar_lock_cq; /* CQs share a common distinct UAR */
+	rte_spinlock_t uar_lock_cq; /* CQs share a common distinct UAR. */
 	rte_spinlock_t uar_lock[MLX5_UAR_PAGE_NUM_MAX];
 	/* UAR same-page access control required in 32bit implementations. */
 #endif
@@ -1166,11 +1165,11 @@ struct mlx5_dev_ctx_shared {
 	struct mlx5_devx_obj *tis[16]; /* TIS object. */
 	struct mlx5_devx_obj *td; /* Transport domain. */
 	struct mlx5_lag lag; /* LAG attributes */
-	void *tx_uar; /* Tx/packet pacing shared UAR. */
+	struct mlx5_uar tx_uar; /* DevX UAR for Tx and Txpp and ASO SQs. */
+	struct mlx5_uar rx_uar; /* DevX UAR for Rx. */
 	struct mlx5_proc_priv *pppriv; /* Pointer to primary private process. */
 	struct mlx5_flex_parser_profiles fp[MLX5_FLEX_PARSER_MAX];
 	/* Flex parser profiles information. */
-	void *devx_rx_uar; /* DevX UAR for Rx. */
 	struct mlx5_aso_age_mng *aso_age_mng;
 	/* Management data for aging mechanism using ASO Flow Hit. */
 	struct mlx5_geneve_tlv_option_resource *geneve_tlv_option_resource;
@@ -1191,7 +1190,7 @@ struct mlx5_dev_ctx_shared {
 struct mlx5_proc_priv {
 	size_t uar_table_sz;
 	/* Size of UAR register table. */
-	void *uar_table[];
+	struct mlx5_uar_data uar_table[];
 	/* Table of UAR registers for each process. */
 };
 
@@ -1741,6 +1740,7 @@ int mlx5_flow_meter_flush(struct rte_eth_dev *dev,
 void mlx5_flow_meter_rxq_flush(struct rte_eth_dev *dev);
 
 /* mlx5_os.c */
+
 struct rte_pci_driver;
 int mlx5_os_get_dev_attr(struct mlx5_common_device *dev,
 			 struct mlx5_dev_attr *dev_attr);
diff --git a/drivers/net/mlx5/mlx5_defs.h b/drivers/net/mlx5/mlx5_defs.h
index d36c13c38e..258475ed2c 100644
--- a/drivers/net/mlx5/mlx5_defs.h
+++ b/drivers/net/mlx5/mlx5_defs.h
@@ -112,14 +112,6 @@
 #define MLX5_UAR_PAGE_NUM_MAX 64
 #define MLX5_UAR_PAGE_NUM_MASK ((MLX5_UAR_PAGE_NUM_MAX) - 1)
 
-/* Fields of memory mapping type in offset parameter of mmap() */
-#define MLX5_UAR_MMAP_CMD_SHIFT 8
-#define MLX5_UAR_MMAP_CMD_MASK 0xff
-
-#ifndef HAVE_MLX5DV_MMAP_GET_NC_PAGES_CMD
-#define MLX5_MMAP_GET_NC_PAGES_CMD 3
-#endif
-
 /* Log 2 of the default number of strides per WQE for Multi-Packet RQ. */
 #define MLX5_MPRQ_STRIDE_NUM_N 6U
 
diff --git a/drivers/net/mlx5/mlx5_devx.c b/drivers/net/mlx5/mlx5_devx.c
index dc391529c2..591a4d55c5 100644
--- a/drivers/net/mlx5/mlx5_devx.c
+++ b/drivers/net/mlx5/mlx5_devx.c
@@ -363,7 +363,7 @@ mlx5_rxq_create_devx_cq_resources(struct rte_eth_dev *dev, uint16_t idx)
 			"Port %u Rx CQE compression is disabled for LRO.",
 			dev->data->port_id);
 	}
-	cq_attr.uar_page_id = mlx5_os_get_devx_uar_page_id(sh->devx_rx_uar);
+	cq_attr.uar_page_id = mlx5_os_get_devx_uar_page_id(sh->rx_uar.obj);
 	log_cqe_n = log2above(cqe_n);
 	/* Create CQ using DevX API. */
 	ret = mlx5_devx_cq_create(sh->cdev->ctx, &rxq_ctrl->obj->cq_obj,
@@ -374,7 +374,7 @@ mlx5_rxq_create_devx_cq_resources(struct rte_eth_dev *dev, uint16_t idx)
 	rxq_data->cqes = (volatile struct mlx5_cqe (*)[])
 							(uintptr_t)cq_obj->cqes;
 	rxq_data->cq_db = cq_obj->db_rec;
-	rxq_data->cq_uar = mlx5_os_get_devx_uar_base_addr(sh->devx_rx_uar);
+	rxq_data->uar_data = sh->rx_uar.cq_db;
 	rxq_data->cqe_n = log_cqe_n;
 	rxq_data->cqn = cq_obj->cq->id;
 	if (rxq_ctrl->obj->devx_channel) {
@@ -1015,6 +1015,7 @@ mlx5_txq_create_devx_sq_resources(struct rte_eth_dev *dev, uint16_t idx,
 {
 	struct mlx5_priv *priv = dev->data->dev_private;
 	struct mlx5_common_device *cdev = priv->sh->cdev;
+	struct mlx5_uar *uar = &priv->sh->tx_uar;
 	struct mlx5_txq_data *txq_data = (*priv->txqs)[idx];
 	struct mlx5_txq_ctrl *txq_ctrl =
 			container_of(txq_data, struct mlx5_txq_ctrl, txq);
@@ -1028,8 +1029,7 @@ mlx5_txq_create_devx_sq_resources(struct rte_eth_dev *dev, uint16_t idx,
 		.tis_lst_sz = 1,
 		.wq_attr = (struct mlx5_devx_wq_attr){
 			.pd = cdev->pdn,
-			.uar_page =
-				 mlx5_os_get_devx_uar_page_id(priv->sh->tx_uar),
+			.uar_page = mlx5_os_get_devx_uar_page_id(uar->obj),
 		},
 		.ts_format =
 			mlx5_ts_format_conv(cdev->config.hca_attr.sq_ts_format),
@@ -1069,10 +1069,11 @@ mlx5_txq_devx_obj_new(struct rte_eth_dev *dev, uint16_t idx)
 	rte_errno = ENOMEM;
 	return -rte_errno;
 #else
+	struct mlx5_proc_priv *ppriv = MLX5_PROC_PRIV(PORT_ID(priv));
 	struct mlx5_dev_ctx_shared *sh = priv->sh;
 	struct mlx5_txq_obj *txq_obj = txq_ctrl->obj;
 	struct mlx5_devx_cq_attr cq_attr = {
-		.uar_page_id = mlx5_os_get_devx_uar_page_id(sh->tx_uar),
+		.uar_page_id = mlx5_os_get_devx_uar_page_id(sh->tx_uar.obj),
 	};
 	uint32_t cqe_n, log_desc_n;
 	uint32_t wqe_n, wqe_size;
@@ -1080,6 +1081,8 @@ mlx5_txq_devx_obj_new(struct rte_eth_dev *dev, uint16_t idx)
 
 	MLX5_ASSERT(txq_data);
 	MLX5_ASSERT(txq_obj);
+	MLX5_ASSERT(rte_eal_process_type() == RTE_PROC_PRIMARY);
+	MLX5_ASSERT(ppriv);
 	txq_obj->txq_ctrl = txq_ctrl;
 	txq_obj->dev = dev;
 	cqe_n = (1UL << txq_data->elts_n) / MLX5_TX_COMP_THRESH +
@@ -1152,6 +1155,8 @@ mlx5_txq_devx_obj_new(struct rte_eth_dev *dev, uint16_t idx)
 	txq_data->qp_db = &txq_obj->sq_obj.db_rec[MLX5_SND_DBR];
 	*txq_data->qp_db = 0;
 	txq_data->qp_num_8s = txq_obj->sq_obj.sq->id << 8;
+	txq_data->db_heu = sh->cdev->config.dbnc == MLX5_TXDB_HEURISTIC;
+	txq_data->db_nc = sh->tx_uar.dbnc;
 	/* Change Send Queue state to Ready-to-Send. */
 	ret = mlx5_txq_devx_modify(txq_obj, MLX5_TXQ_MOD_RST2RDY, 0);
 	if (ret) {
@@ -1170,10 +1175,9 @@ mlx5_txq_devx_obj_new(struct rte_eth_dev *dev, uint16_t idx)
 	if (!priv->sh->tdn)
 		priv->sh->tdn = priv->sh->td->id;
 #endif
-	MLX5_ASSERT(sh->tx_uar && mlx5_os_get_devx_uar_reg_addr(sh->tx_uar));
 	txq_ctrl->uar_mmap_offset =
-				mlx5_os_get_devx_uar_mmap_offset(sh->tx_uar);
-	txq_uar_init(txq_ctrl, mlx5_os_get_devx_uar_reg_addr(sh->tx_uar));
+			mlx5_os_get_devx_uar_mmap_offset(sh->tx_uar.obj);
+	ppriv->uar_table[txq_data->idx] = sh->tx_uar.bf_db;
 	dev->data->tx_queue_state[idx] = RTE_ETH_QUEUE_STATE_STARTED;
 	return 0;
 error:
diff --git a/drivers/net/mlx5/mlx5_flow_aso.c b/drivers/net/mlx5/mlx5_flow_aso.c
index 1fc1000b01..345756b044 100644
--- a/drivers/net/mlx5/mlx5_flow_aso.c
+++ b/drivers/net/mlx5/mlx5_flow_aso.c
@@ -285,7 +285,6 @@ mlx5_aso_sq_create(void *ctx, struct mlx5_aso_sq *sq, int socket, void *uar,
 	sq->head = 0;
 	sq->tail = 0;
 	sq->sqn = sq->sq_obj.sq->id;
-	sq->uar_addr = mlx5_os_get_devx_uar_reg_addr(uar);
 	rte_spinlock_init(&sq->sqsl);
 	return 0;
 error:
@@ -317,7 +316,7 @@ mlx5_aso_queue_init(struct mlx5_dev_ctx_shared *sh,
 				    sq_desc_n, &sh->aso_age_mng->aso_sq.mr, 0))
 			return -1;
 		if (mlx5_aso_sq_create(cdev->ctx, &sh->aso_age_mng->aso_sq, 0,
-				       sh->tx_uar, cdev->pdn,
+				       sh->tx_uar.obj, cdev->pdn,
 				       MLX5_ASO_QUEUE_LOG_DESC,
 				       cdev->config.hca_attr.sq_ts_format)) {
 			mlx5_aso_dereg_mr(cdev, &sh->aso_age_mng->aso_sq.mr);
@@ -327,7 +326,7 @@ mlx5_aso_queue_init(struct mlx5_dev_ctx_shared *sh,
 		break;
 	case ASO_OPC_MOD_POLICER:
 		if (mlx5_aso_sq_create(cdev->ctx, &sh->mtrmng->pools_mng.sq, 0,
-				       sh->tx_uar, cdev->pdn,
+				       sh->tx_uar.obj, cdev->pdn,
 				       MLX5_ASO_QUEUE_LOG_DESC,
 				       cdev->config.hca_attr.sq_ts_format))
 			return -1;
@@ -339,7 +338,7 @@ mlx5_aso_queue_init(struct mlx5_dev_ctx_shared *sh,
 				    &sh->ct_mng->aso_sq.mr, 0))
 			return -1;
 		if (mlx5_aso_sq_create(cdev->ctx, &sh->ct_mng->aso_sq, 0,
-				       sh->tx_uar, cdev->pdn,
+				       sh->tx_uar.obj, cdev->pdn,
 				       MLX5_ASO_QUEUE_LOG_DESC,
 				       cdev->config.hca_attr.sq_ts_format)) {
 			mlx5_aso_dereg_mr(cdev, &sh->ct_mng->aso_sq.mr);
@@ -390,8 +389,8 @@ mlx5_aso_queue_uninit(struct mlx5_dev_ctx_shared *sh,
 /**
  * Write a burst of WQEs to ASO SQ.
  *
- * @param[in] mng
- *   ASO management data, contains the SQ.
+ * @param[in] sh
+ *   Pointer to shared device context.
  * @param[in] n
  *   Index of the last valid pool.
  *
@@ -399,8 +398,9 @@ mlx5_aso_queue_uninit(struct mlx5_dev_ctx_shared *sh,
  *   Number of WQEs in burst.
  */
 static uint16_t
-mlx5_aso_sq_enqueue_burst(struct mlx5_aso_age_mng *mng, uint16_t n)
+mlx5_aso_sq_enqueue_burst(struct mlx5_dev_ctx_shared *sh, uint16_t n)
 {
+	struct mlx5_aso_age_mng *mng = sh->aso_age_mng;
 	volatile struct mlx5_aso_wqe *wqe;
 	struct mlx5_aso_sq *sq = &mng->aso_sq;
 	struct mlx5_aso_age_pool *pool;
@@ -439,11 +439,9 @@ mlx5_aso_sq_enqueue_burst(struct mlx5_aso_age_mng *mng, uint16_t n)
 	} while (max);
 	wqe->general_cseg.flags = RTE_BE32(MLX5_COMP_ALWAYS <<
 							 MLX5_COMP_MODE_OFFSET);
-	rte_io_wmb();
-	sq->sq_obj.db_rec[MLX5_SND_DBR] = rte_cpu_to_be_32(sq->pi);
-	rte_wmb();
-	*sq->uar_addr = *(volatile uint64_t *)wqe; /* Assume 64 bit ARCH.*/
-	rte_wmb();
+	mlx5_doorbell_ring(&sh->tx_uar.bf_db, *(volatile uint64_t *)wqe,
+			   sq->pi, &sq->sq_obj.db_rec[MLX5_SND_DBR],
+			   !sh->tx_uar.dbnc);
 	return sq->elts[start_head & mask].burst_size;
 }
 
@@ -644,7 +642,7 @@ mlx5_flow_aso_alarm(void *arg)
 		us = US_PER_S;
 		sq->next = 0;
 	}
-	mlx5_aso_sq_enqueue_burst(sh->aso_age_mng, n);
+	mlx5_aso_sq_enqueue_burst(sh, n);
 	if (rte_eal_alarm_set(us, mlx5_flow_aso_alarm, sh))
 		DRV_LOG(ERR, "Cannot reinitialize aso alarm.");
 }
@@ -695,8 +693,9 @@ mlx5_aso_flow_hit_queue_poll_stop(struct mlx5_dev_ctx_shared *sh)
 }
 
 static uint16_t
-mlx5_aso_mtr_sq_enqueue_single(struct mlx5_aso_sq *sq,
-		struct mlx5_aso_mtr *aso_mtr)
+mlx5_aso_mtr_sq_enqueue_single(struct mlx5_dev_ctx_shared *sh,
+			       struct mlx5_aso_sq *sq,
+			       struct mlx5_aso_mtr *aso_mtr)
 {
 	volatile struct mlx5_aso_wqe *wqe = NULL;
 	struct mlx5_flow_meter_info *fm = NULL;
@@ -774,11 +773,9 @@ mlx5_aso_mtr_sq_enqueue_single(struct mlx5_aso_sq *sq,
 	 */
 	sq->head++;
 	sq->pi += 2;/* Each WQE contains 2 WQEBB's. */
-	rte_io_wmb();
-	sq->sq_obj.db_rec[MLX5_SND_DBR] = rte_cpu_to_be_32(sq->pi);
-	rte_wmb();
-	*sq->uar_addr = *(volatile uint64_t *)wqe; /* Assume 64 bit ARCH. */
-	rte_wmb();
+	mlx5_doorbell_ring(&sh->tx_uar.bf_db, *(volatile uint64_t *)wqe,
+			   sq->pi, &sq->sq_obj.db_rec[MLX5_SND_DBR],
+			   !sh->tx_uar.dbnc);
 	rte_spinlock_unlock(&sq->sqsl);
 	return 1;
 }
@@ -871,7 +868,7 @@ mlx5_aso_meter_update_by_wqe(struct mlx5_dev_ctx_shared *sh,
 
 	do {
 		mlx5_aso_mtr_completion_handle(sq);
-		if (mlx5_aso_mtr_sq_enqueue_single(sq, mtr))
+		if (mlx5_aso_mtr_sq_enqueue_single(sh, sq, mtr))
 			return 0;
 		/* Waiting for wqe resource. */
 		rte_delay_us_sleep(MLX5_ASO_WQE_CQE_RESPONSE_DELAY);
@@ -920,8 +917,8 @@ mlx5_aso_mtr_wait(struct mlx5_dev_ctx_shared *sh,
 /*
  * Post a WQE to the ASO CT SQ to modify the context.
  *
- * @param[in] mng
- *   Pointer to the CT pools management structure.
+ * @param[in] sh
+ *   Pointer to shared device context.
  * @param[in] ct
  *   Pointer to the generic CT structure related to the context.
  * @param[in] profile
@@ -931,12 +928,12 @@ mlx5_aso_mtr_wait(struct mlx5_dev_ctx_shared *sh,
  *   1 on success (WQE number), 0 on failure.
  */
 static uint16_t
-mlx5_aso_ct_sq_enqueue_single(struct mlx5_aso_ct_pools_mng *mng,
+mlx5_aso_ct_sq_enqueue_single(struct mlx5_dev_ctx_shared *sh,
 			      struct mlx5_aso_ct_action *ct,
 			      const struct rte_flow_action_conntrack *profile)
 {
 	volatile struct mlx5_aso_wqe *wqe = NULL;
-	struct mlx5_aso_sq *sq = &mng->aso_sq;
+	struct mlx5_aso_sq *sq = &sh->ct_mng->aso_sq;
 	uint16_t size = 1 << sq->log_desc_n;
 	uint16_t mask = size - 1;
 	uint16_t res;
@@ -1039,11 +1036,9 @@ mlx5_aso_ct_sq_enqueue_single(struct mlx5_aso_ct_pools_mng *mng,
 		 profile->reply_dir.max_ack);
 	sq->head++;
 	sq->pi += 2; /* Each WQE contains 2 WQEBB's. */
-	rte_io_wmb();
-	sq->sq_obj.db_rec[MLX5_SND_DBR] = rte_cpu_to_be_32(sq->pi);
-	rte_wmb();
-	*sq->uar_addr = *(volatile uint64_t *)wqe; /* Assume 64 bit ARCH. */
-	rte_wmb();
+	mlx5_doorbell_ring(&sh->tx_uar.bf_db, *(volatile uint64_t *)wqe,
+			   sq->pi, &sq->sq_obj.db_rec[MLX5_SND_DBR],
+			   !sh->tx_uar.dbnc);
 	rte_spinlock_unlock(&sq->sqsl);
 	return 1;
 }
@@ -1084,8 +1079,8 @@ mlx5_aso_ct_status_update(struct mlx5_aso_sq *sq, uint16_t num)
 /*
  * Post a WQE to the ASO CT SQ to query the current context.
  *
- * @param[in] mng
- *   Pointer to the CT pools management structure.
+ * @param[in] sh
+ *   Pointer to shared device context.
  * @param[in] ct
  *   Pointer to the generic CT structure related to the context.
  * @param[in] data
@@ -1095,11 +1090,11 @@ mlx5_aso_ct_status_update(struct mlx5_aso_sq *sq, uint16_t num)
  *   1 on success (WQE number), 0 on failure.
  */
 static int
-mlx5_aso_ct_sq_query_single(struct mlx5_aso_ct_pools_mng *mng,
+mlx5_aso_ct_sq_query_single(struct mlx5_dev_ctx_shared *sh,
 			    struct mlx5_aso_ct_action *ct, char *data)
 {
 	volatile struct mlx5_aso_wqe *wqe = NULL;
-	struct mlx5_aso_sq *sq = &mng->aso_sq;
+	struct mlx5_aso_sq *sq = &sh->ct_mng->aso_sq;
 	uint16_t size = 1 << sq->log_desc_n;
 	uint16_t mask = size - 1;
 	uint16_t res;
@@ -1154,11 +1149,9 @@ mlx5_aso_ct_sq_query_single(struct mlx5_aso_ct_pools_mng *mng,
 	 * data segment is not used in this case.
 	 */
 	sq->pi += 2;
-	rte_io_wmb();
-	sq->sq_obj.db_rec[MLX5_SND_DBR] = rte_cpu_to_be_32(sq->pi);
-	rte_wmb();
-	*sq->uar_addr = *(volatile uint64_t *)wqe; /* Assume 64 bit ARCH. */
-	rte_wmb();
+	mlx5_doorbell_ring(&sh->tx_uar.bf_db, *(volatile uint64_t *)wqe,
+			   sq->pi, &sq->sq_obj.db_rec[MLX5_SND_DBR],
+			   !sh->tx_uar.dbnc);
 	rte_spinlock_unlock(&sq->sqsl);
 	return 1;
 }
@@ -1238,14 +1231,13 @@ mlx5_aso_ct_update_by_wqe(struct mlx5_dev_ctx_shared *sh,
 			  struct mlx5_aso_ct_action *ct,
 			  const struct rte_flow_action_conntrack *profile)
 {
-	struct mlx5_aso_ct_pools_mng *mng = sh->ct_mng;
 	uint32_t poll_wqe_times = MLX5_CT_POLL_WQE_CQE_TIMES;
 	struct mlx5_aso_ct_pool *pool;
 
 	MLX5_ASSERT(ct);
 	do {
-		mlx5_aso_ct_completion_handle(mng);
-		if (mlx5_aso_ct_sq_enqueue_single(mng, ct, profile))
+		mlx5_aso_ct_completion_handle(sh->ct_mng);
+		if (mlx5_aso_ct_sq_enqueue_single(sh, ct, profile))
 			return 0;
 		/* Waiting for wqe resource. */
 		rte_delay_us_sleep(10u);
@@ -1385,7 +1377,6 @@ mlx5_aso_ct_query_by_wqe(struct mlx5_dev_ctx_shared *sh,
 			 struct mlx5_aso_ct_action *ct,
 			 struct rte_flow_action_conntrack *profile)
 {
-	struct mlx5_aso_ct_pools_mng *mng = sh->ct_mng;
 	uint32_t poll_wqe_times = MLX5_CT_POLL_WQE_CQE_TIMES;
 	struct mlx5_aso_ct_pool *pool;
 	char out_data[64 * 2];
@@ -1393,8 +1384,8 @@ mlx5_aso_ct_query_by_wqe(struct mlx5_dev_ctx_shared *sh,
 
 	MLX5_ASSERT(ct);
 	do {
-		mlx5_aso_ct_completion_handle(mng);
-		ret = mlx5_aso_ct_sq_query_single(mng, ct, out_data);
+		mlx5_aso_ct_completion_handle(sh->ct_mng);
+		ret = mlx5_aso_ct_sq_query_single(sh, ct, out_data);
 		if (ret < 0)
 			return ret;
 		else if (ret > 0)
diff --git a/drivers/net/mlx5/mlx5_rx.h b/drivers/net/mlx5/mlx5_rx.h
index 4952fe1455..6d2e79ebaf 100644
--- a/drivers/net/mlx5/mlx5_rx.h
+++ b/drivers/net/mlx5/mlx5_rx.h
@@ -121,13 +121,9 @@ struct mlx5_rxq_data {
 	struct mlx5_rxq_stats stats;
 	rte_xmm_t mbuf_initializer; /* Default rearm/flags for vectorized Rx. */
 	struct rte_mbuf fake_mbuf; /* elts padding for vectorized Rx. */
-	void *cq_uar; /* Verbs CQ user access region. */
+	struct mlx5_uar_data uar_data; /* CQ doorbell. */
 	uint32_t cqn; /* CQ number. */
 	uint8_t cq_arm_sn; /* CQ arm seq number. */
-#ifndef RTE_ARCH_64
-	rte_spinlock_t *uar_lock_cq;
-	/* CQ (UAR) access lock required for 32bit implementations */
-#endif
 	uint32_t tunnel; /* Tunnel information. */
 	int timestamp_offset; /* Dynamic mbuf field for timestamp. */
 	uint64_t timestamp_rx_flag; /* Dynamic mbuf flag for timestamp. */
diff --git a/drivers/net/mlx5/mlx5_rxq.c b/drivers/net/mlx5/mlx5_rxq.c
index 4f02fe02b9..fdb8a51d02 100644
--- a/drivers/net/mlx5/mlx5_rxq.c
+++ b/drivers/net/mlx5/mlx5_rxq.c
@@ -21,11 +21,11 @@
 
 #include <mlx5_glue.h>
 #include <mlx5_malloc.h>
+#include <mlx5_common.h>
 #include <mlx5_common_mr.h>
 
 #include "mlx5_defs.h"
 #include "mlx5.h"
-#include "mlx5_tx.h"
 #include "mlx5_rx.h"
 #include "mlx5_utils.h"
 #include "mlx5_autoconf.h"
@@ -952,15 +952,13 @@ mlx5_arm_cq(struct mlx5_rxq_data *rxq, int sq_n_rxq)
 	int sq_n = 0;
 	uint32_t doorbell_hi;
 	uint64_t doorbell;
-	void *cq_db_reg = (char *)rxq->cq_uar + MLX5_CQ_DOORBELL;
 
 	sq_n = sq_n_rxq & MLX5_CQ_SQN_MASK;
 	doorbell_hi = sq_n << MLX5_CQ_SQN_OFFSET | (rxq->cq_ci & MLX5_CI_MASK);
 	doorbell = (uint64_t)doorbell_hi << 32;
 	doorbell |= rxq->cqn;
-	rxq->cq_db[MLX5_CQ_ARM_DB] = rte_cpu_to_be_32(doorbell_hi);
-	mlx5_uar_write64(rte_cpu_to_be_64(doorbell),
-			 cq_db_reg, rxq->uar_lock_cq);
+	mlx5_doorbell_ring(&rxq->uar_data, rte_cpu_to_be_64(doorbell),
+			   doorbell_hi, &rxq->cq_db[MLX5_CQ_ARM_DB], 0);
 }
 
 /**
@@ -1621,9 +1619,6 @@ mlx5_rxq_new(struct rte_eth_dev *dev, uint16_t idx, uint16_t desc,
 		(struct rte_mbuf *(*)[desc_n])(tmpl + 1);
 	tmpl->rxq.mprq_bufs =
 		(struct mlx5_mprq_buf *(*)[desc])(*tmpl->rxq.elts + desc_n);
-#ifndef RTE_ARCH_64
-	tmpl->rxq.uar_lock_cq = &priv->sh->uar_lock_cq;
-#endif
 	tmpl->rxq.idx = idx;
 	__atomic_fetch_add(&tmpl->refcnt, 1, __ATOMIC_RELAXED);
 	LIST_INSERT_HEAD(&priv->rxqsctrl, tmpl, next);
diff --git a/drivers/net/mlx5/mlx5_tx.h b/drivers/net/mlx5/mlx5_tx.h
index 24a312b58b..f6da6901c0 100644
--- a/drivers/net/mlx5/mlx5_tx.h
+++ b/drivers/net/mlx5/mlx5_tx.h
@@ -14,6 +14,7 @@
 #include <rte_common.h>
 #include <rte_spinlock.h>
 
+#include <mlx5_common.h>
 #include <mlx5_common_mr.h>
 
 #include "mlx5.h"
@@ -160,10 +161,7 @@ struct mlx5_txq_data {
 	int32_t ts_offset; /* Timestamp field dynamic offset. */
 	struct mlx5_dev_ctx_shared *sh; /* Shared context. */
 	struct mlx5_txq_stats stats; /* TX queue counters. */
-#ifndef RTE_ARCH_64
-	rte_spinlock_t *uar_lock;
-	/* UAR access lock required for 32bit implementations */
-#endif
+	struct mlx5_uar_data uar_data;
 	struct rte_mbuf *elts[0];
 	/* Storage for queued packets, must be the last field. */
 } __rte_cache_aligned;
@@ -203,7 +201,6 @@ int mlx5_tx_hairpin_queue_setup
 	(struct rte_eth_dev *dev, uint16_t idx, uint16_t desc,
 	 const struct rte_eth_hairpin_conf *hairpin_conf);
 void mlx5_tx_queue_release(struct rte_eth_dev *dev, uint16_t qid);
-void txq_uar_init(struct mlx5_txq_ctrl *txq_ctrl, void *bf_reg);
 int mlx5_tx_uar_init_secondary(struct rte_eth_dev *dev, int fd);
 void mlx5_tx_uar_uninit_secondary(struct rte_eth_dev *dev);
 int mlx5_txq_obj_verify(struct rte_eth_dev *dev);
@@ -288,68 +285,12 @@ MLX5_TXOFF_PRE_DECL(mci_mpw);
 MLX5_TXOFF_PRE_DECL(mc_mpw);
 MLX5_TXOFF_PRE_DECL(i_mpw);
 
-static __rte_always_inline uint64_t *
+static __rte_always_inline struct mlx5_uar_data *
 mlx5_tx_bfreg(struct mlx5_txq_data *txq)
 {
-	return MLX5_PROC_PRIV(txq->port_id)->uar_table[txq->idx];
-}
-
-/**
- * Provide safe 64bit store operation to mlx5 UAR region for both 32bit and
- * 64bit architectures.
- *
- * @param val
- *   value to write in CPU endian format.
- * @param addr
- *   Address to write to.
- * @param lock
- *   Address of the lock to use for that UAR access.
- */
-static __rte_always_inline void
-__mlx5_uar_write64_relaxed(uint64_t val, void *addr,
-			   rte_spinlock_t *lock __rte_unused)
-{
-#ifdef RTE_ARCH_64
-	*(uint64_t *)addr = val;
-#else /* !RTE_ARCH_64 */
-	rte_spinlock_lock(lock);
-	*(uint32_t *)addr = val;
-	rte_io_wmb();
-	*((uint32_t *)addr + 1) = val >> 32;
-	rte_spinlock_unlock(lock);
-#endif
-}
-
-/**
- * Provide safe 64bit store operation to mlx5 UAR region for both 32bit and
- * 64bit architectures while guaranteeing the order of execution with the
- * code being executed.
- *
- * @param val
- *   value to write in CPU endian format.
- * @param addr
- *   Address to write to.
- * @param lock
- *   Address of the lock to use for that UAR access.
- */
-static __rte_always_inline void
-__mlx5_uar_write64(uint64_t val, void *addr, rte_spinlock_t *lock)
-{
-	rte_io_wmb();
-	__mlx5_uar_write64_relaxed(val, addr, lock);
+	return &MLX5_PROC_PRIV(txq->port_id)->uar_table[txq->idx];
 }
 
-/* Assist macros, used instead of directly calling the functions they wrap. */
-#ifdef RTE_ARCH_64
-#define mlx5_uar_write64_relaxed(val, dst, lock) \
-		__mlx5_uar_write64_relaxed(val, dst, NULL)
-#define mlx5_uar_write64(val, dst, lock) __mlx5_uar_write64(val, dst, NULL)
-#else
-#define mlx5_uar_write64_relaxed(val, dst, lock) \
-		__mlx5_uar_write64_relaxed(val, dst, lock)
-#define mlx5_uar_write64(val, dst, lock) __mlx5_uar_write64(val, dst, lock)
-#endif
-
 /**
  * Query LKey from a packet buffer for Tx.
  *
@@ -373,32 +314,6 @@ mlx5_tx_mb2mr(struct mlx5_txq_data *txq, struct rte_mbuf *mb)
 	return mlx5_mr_mb2mr(priv->sh->cdev, &priv->mp_id, mr_ctrl, mb);
 }
 
-/**
- * Ring TX queue doorbell and flush the update if requested.
- *
- * @param txq
- *   Pointer to TX queue structure.
- * @param wqe
- *   Pointer to the last WQE posted in the NIC.
- * @param cond
- *   Request for write memory barrier after BlueFlame update.
- */
-static __rte_always_inline void
-mlx5_tx_dbrec_cond_wmb(struct mlx5_txq_data *txq, volatile struct mlx5_wqe *wqe,
-		       int cond)
-{
-	uint64_t *dst = mlx5_tx_bfreg(txq);
-	volatile uint64_t *src = ((volatile uint64_t *)wqe);
-
-	rte_io_wmb();
-	*txq->qp_db = rte_cpu_to_be_32(txq->wqe_ci);
-	/* Ensure ordering between DB record and BF copy. */
-	rte_wmb();
-	mlx5_uar_write64_relaxed(*src, dst, txq->uar_lock);
-	if (cond)
-		rte_wmb();
-}
-
 /**
  * Ring TX queue doorbell and flush the update by write memory barrier.
  *
@@ -410,7 +325,8 @@ mlx5_tx_dbrec_cond_wmb(struct mlx5_txq_data *txq, volatile struct mlx5_wqe *wqe,
 static __rte_always_inline void
 mlx5_tx_dbrec(struct mlx5_txq_data *txq, volatile struct mlx5_wqe *wqe)
 {
-	mlx5_tx_dbrec_cond_wmb(txq, wqe, 1);
+	mlx5_doorbell_ring(mlx5_tx_bfreg(txq), *(volatile uint64_t *)wqe,
+			   txq->wqe_ci, txq->qp_db, 1);
 }
 
 /**
@@ -3683,8 +3599,10 @@ mlx5_tx_burst_tmpl(struct mlx5_txq_data *__rte_restrict txq,
 	 *   packets are coming and the write barrier will be issued on
 	 *   the next burst (after descriptor writing, at least).
 	 */
-	mlx5_tx_dbrec_cond_wmb(txq, loc.wqe_last, !txq->db_nc &&
-			(!txq->db_heu || pkts_n % MLX5_TX_DEFAULT_BURST));
+	mlx5_doorbell_ring(mlx5_tx_bfreg(txq),
+			   *(volatile uint64_t *)loc.wqe_last, txq->wqe_ci,
+			   txq->qp_db, !txq->db_nc &&
+			   (!txq->db_heu || pkts_n % MLX5_TX_DEFAULT_BURST));
 	/* Not all of the mbufs may be stored into elts yet. */
 	part = MLX5_TXOFF_CONFIG(INLINE) ? 0 : loc.pkts_sent - loc.pkts_copy;
 	if (!MLX5_TXOFF_CONFIG(INLINE) && part) {
diff --git a/drivers/net/mlx5/mlx5_txpp.c b/drivers/net/mlx5/mlx5_txpp.c
index 34f92faa67..927c327284 100644
--- a/drivers/net/mlx5/mlx5_txpp.c
+++ b/drivers/net/mlx5/mlx5_txpp.c
@@ -164,21 +164,14 @@ mlx5_txpp_doorbell_rearm_queue(struct mlx5_dev_ctx_shared *sh, uint16_t ci)
 		uint32_t w32[2];
 		uint64_t w64;
 	} cs;
-	void *reg_addr;
 
 	wq->sq_ci = ci + 1;
 	cs.w32[0] = rte_cpu_to_be_32(rte_be_to_cpu_32
 			(wqe[ci & (wq->sq_size - 1)].ctrl[0]) | (ci - 1) << 8);
 	cs.w32[1] = wqe[ci & (wq->sq_size - 1)].ctrl[1];
 	/* Update SQ doorbell record with new SQ ci. */
-	rte_compiler_barrier();
-	*wq->sq_obj.db_rec = rte_cpu_to_be_32(wq->sq_ci);
-	/* Make sure the doorbell record is updated. */
-	rte_wmb();
-	/* Write to doorbel register to start processing. */
-	reg_addr = mlx5_os_get_devx_uar_reg_addr(sh->tx_uar);
-	__mlx5_uar_write64_relaxed(cs.w64, reg_addr, NULL);
-	rte_wmb();
+	mlx5_doorbell_ring(&sh->tx_uar.bf_db, cs.w64, wq->sq_ci,
+			   wq->sq_obj.db_rec, !sh->tx_uar.dbnc);
 }
 
 static void
@@ -233,14 +226,15 @@ mlx5_txpp_create_rearm_queue(struct mlx5_dev_ctx_shared *sh)
 		.tis_num = sh->tis[0]->id,
 		.wq_attr = (struct mlx5_devx_wq_attr){
 			.pd = sh->cdev->pdn,
-			.uar_page = mlx5_os_get_devx_uar_page_id(sh->tx_uar),
+			.uar_page =
+				mlx5_os_get_devx_uar_page_id(sh->tx_uar.obj),
 		},
 		.ts_format = mlx5_ts_format_conv
 				       (sh->cdev->config.hca_attr.sq_ts_format),
 	};
 	struct mlx5_devx_modify_sq_attr msq_attr = { 0 };
 	struct mlx5_devx_cq_attr cq_attr = {
-		.uar_page_id = mlx5_os_get_devx_uar_page_id(sh->tx_uar),
+		.uar_page_id = mlx5_os_get_devx_uar_page_id(sh->tx_uar.obj),
 	};
 	struct mlx5_txpp_wq *wq = &sh->txpp.rearm_queue;
 	int ret;
@@ -394,7 +388,7 @@ mlx5_txpp_create_clock_queue(struct mlx5_dev_ctx_shared *sh)
 	struct mlx5_devx_cq_attr cq_attr = {
 		.use_first_only = 1,
 		.overrun_ignore = 1,
-		.uar_page_id = mlx5_os_get_devx_uar_page_id(sh->tx_uar),
+		.uar_page_id = mlx5_os_get_devx_uar_page_id(sh->tx_uar.obj),
 	};
 	struct mlx5_txpp_wq *wq = &sh->txpp.clock_queue;
 	int ret;
@@ -444,7 +438,7 @@ mlx5_txpp_create_clock_queue(struct mlx5_dev_ctx_shared *sh)
 	sq_attr.cqn = wq->cq_obj.cq->id;
 	sq_attr.packet_pacing_rate_limit_index = sh->txpp.pp_id;
 	sq_attr.wq_attr.cd_slave = 1;
-	sq_attr.wq_attr.uar_page = mlx5_os_get_devx_uar_page_id(sh->tx_uar);
+	sq_attr.wq_attr.uar_page = mlx5_os_get_devx_uar_page_id(sh->tx_uar.obj);
 	sq_attr.wq_attr.pd = sh->cdev->pdn;
 	sq_attr.ts_format =
 		mlx5_ts_format_conv(sh->cdev->config.hca_attr.sq_ts_format);
@@ -479,26 +473,14 @@ mlx5_txpp_create_clock_queue(struct mlx5_dev_ctx_shared *sh)
 static inline void
 mlx5_txpp_cq_arm(struct mlx5_dev_ctx_shared *sh)
 {
-	void *base_addr;
-
 	struct mlx5_txpp_wq *aq = &sh->txpp.rearm_queue;
 	uint32_t arm_sn = aq->arm_sn << MLX5_CQ_SQN_OFFSET;
 	uint32_t db_hi = arm_sn | MLX5_CQ_DBR_CMD_ALL | aq->cq_ci;
 	uint64_t db_be =
 		rte_cpu_to_be_64(((uint64_t)db_hi << 32) | aq->cq_obj.cq->id);
-	base_addr = mlx5_os_get_devx_uar_base_addr(sh->tx_uar);
-	uint32_t *addr = RTE_PTR_ADD(base_addr, MLX5_CQ_DOORBELL);
 
-	rte_compiler_barrier();
-	aq->cq_obj.db_rec[MLX5_CQ_ARM_DB] = rte_cpu_to_be_32(db_hi);
-	rte_wmb();
-#ifdef RTE_ARCH_64
-	*(uint64_t *)addr = db_be;
-#else
-	*(uint32_t *)addr = db_be;
-	rte_io_wmb();
-	*((uint32_t *)addr + 1) = db_be >> 32;
-#endif
+	mlx5_doorbell_ring(&sh->tx_uar.cq_db, db_be, db_hi,
+			   &aq->cq_obj.db_rec[MLX5_CQ_ARM_DB], 0);
 	aq->arm_sn++;
 }
 
diff --git a/drivers/net/mlx5/mlx5_txq.c b/drivers/net/mlx5/mlx5_txq.c
index 5fa43d63f1..4e0bf7af9c 100644
--- a/drivers/net/mlx5/mlx5_txq.c
+++ b/drivers/net/mlx5/mlx5_txq.c
@@ -494,66 +494,6 @@ mlx5_tx_queue_release(struct rte_eth_dev *dev, uint16_t qid)
 	mlx5_txq_release(dev, qid);
 }
 
-/**
- * Configure the doorbell register non-cached attribute.
- *
- * @param txq_ctrl
- *   Pointer to Tx queue control structure.
- * @param page_size
- *   Systme page size
- */
-static void
-txq_uar_ncattr_init(struct mlx5_txq_ctrl *txq_ctrl, size_t page_size)
-{
-	struct mlx5_common_device *cdev = txq_ctrl->priv->sh->cdev;
-	off_t cmd;
-
-	txq_ctrl->txq.db_heu = cdev->config.dbnc == MLX5_TXDB_HEURISTIC;
-	txq_ctrl->txq.db_nc = 0;
-	/* Check the doorbell register mapping type. */
-	cmd = txq_ctrl->uar_mmap_offset / page_size;
-	cmd >>= MLX5_UAR_MMAP_CMD_SHIFT;
-	cmd &= MLX5_UAR_MMAP_CMD_MASK;
-	if (cmd == MLX5_MMAP_GET_NC_PAGES_CMD)
-		txq_ctrl->txq.db_nc = 1;
-}
-
-/**
- * Initialize Tx UAR registers for primary process.
- *
- * @param txq_ctrl
- *   Pointer to Tx queue control structure.
- * @param bf_reg
- *   BlueFlame register from Verbs UAR.
- */
-void
-txq_uar_init(struct mlx5_txq_ctrl *txq_ctrl, void *bf_reg)
-{
-	struct mlx5_priv *priv = txq_ctrl->priv;
-	struct mlx5_proc_priv *ppriv = MLX5_PROC_PRIV(PORT_ID(priv));
-#ifndef RTE_ARCH_64
-	unsigned int lock_idx;
-#endif
-	const size_t page_size = rte_mem_page_size();
-	if (page_size == (size_t)-1) {
-		DRV_LOG(ERR, "Failed to get mem page size");
-		rte_errno = ENOMEM;
-	}
-
-	if (txq_ctrl->type != MLX5_TXQ_TYPE_STANDARD)
-		return;
-	MLX5_ASSERT(rte_eal_process_type() == RTE_PROC_PRIMARY);
-	MLX5_ASSERT(ppriv);
-	ppriv->uar_table[txq_ctrl->txq.idx] = bf_reg;
-	txq_uar_ncattr_init(txq_ctrl, page_size);
-#ifndef RTE_ARCH_64
-	/* Assign an UAR lock according to UAR page number */
-	lock_idx = (txq_ctrl->uar_mmap_offset / page_size) &
-		   MLX5_UAR_PAGE_NUM_MASK;
-	txq_ctrl->txq.uar_lock = &priv->sh->uar_lock[lock_idx];
-#endif
-}
-
 /**
  * Remap UAR register of a Tx queue for secondary process.
  *
@@ -592,7 +532,7 @@ txq_uar_init_secondary(struct mlx5_txq_ctrl *txq_ctrl, int fd)
 	 * As rdma-core, UARs are mapped in size of OS page
 	 * size. Ref to libmlx5 function: mlx5_init_context()
 	 */
-	uar_va = (uintptr_t)primary_ppriv->uar_table[txq->idx];
+	uar_va = (uintptr_t)primary_ppriv->uar_table[txq->idx].db;
 	offset = uar_va & (page_size - 1); /* Offset in page. */
 	addr = rte_mem_map(NULL, page_size, RTE_PROT_WRITE, RTE_MAP_SHARED,
 			   fd, txq_ctrl->uar_mmap_offset);
@@ -603,7 +543,11 @@ txq_uar_init_secondary(struct mlx5_txq_ctrl *txq_ctrl, int fd)
 		return -rte_errno;
 	}
 	addr = RTE_PTR_ADD(addr, offset);
-	ppriv->uar_table[txq->idx] = addr;
+	ppriv->uar_table[txq->idx].db = addr;
+#ifndef RTE_ARCH_64
+	ppriv->uar_table[txq->idx].sl_p =
+			primary_ppriv->uar_table[txq->idx].sl_p;
+#endif
 	return 0;
 }
 
@@ -626,7 +570,7 @@ txq_uar_uninit_secondary(struct mlx5_txq_ctrl *txq_ctrl)
 
 	if (txq_ctrl->type != MLX5_TXQ_TYPE_STANDARD)
 		return;
-	addr = ppriv->uar_table[txq_ctrl->txq.idx];
+	addr = ppriv->uar_table[txq_ctrl->txq.idx].db;
 	rte_mem_unmap(RTE_PTR_ALIGN_FLOOR(addr, page_size), page_size);
 }
 
@@ -651,9 +595,9 @@ mlx5_tx_uar_uninit_secondary(struct rte_eth_dev *dev)
 	}
 	MLX5_ASSERT(rte_eal_process_type() == RTE_PROC_SECONDARY);
 	for (i = 0; i != ppriv->uar_table_sz; ++i) {
-		if (!ppriv->uar_table[i])
+		if (!ppriv->uar_table[i].db)
 			continue;
-		addr = ppriv->uar_table[i];
+		addr = ppriv->uar_table[i].db;
 		rte_mem_unmap(RTE_PTR_ALIGN_FLOOR(addr, page_size), page_size);
 
 	}
diff --git a/drivers/regex/mlx5/mlx5_regex.c b/drivers/regex/mlx5/mlx5_regex.c
index d632252794..71c42eab25 100644
--- a/drivers/regex/mlx5/mlx5_regex.c
+++ b/drivers/regex/mlx5/mlx5_regex.c
@@ -133,17 +133,9 @@ mlx5_regex_dev_probe(struct mlx5_common_device *cdev)
 		rte_errno = rte_errno ? rte_errno : EINVAL;
 		goto dev_error;
 	}
-	/*
-	 * This PMD always claims the write memory barrier on UAR
-	 * registers writings, it is safe to allocate UAR with any
-	 * memory mapping type.
-	 */
-	priv->uar = mlx5_devx_alloc_uar(priv->cdev);
-	if (!priv->uar) {
-		DRV_LOG(ERR, "can't allocate uar.");
-		rte_errno = ENOMEM;
+	ret = mlx5_devx_uar_prepare(cdev, &priv->uar);
+	if (ret)
 		goto error;
-	}
 	priv->regexdev->dev_ops = &mlx5_regexdev_ops;
 	priv->regexdev->enqueue = mlx5_regexdev_enqueue;
 #ifdef HAVE_MLX5_UMR_IMKEY
@@ -162,8 +154,6 @@ mlx5_regex_dev_probe(struct mlx5_common_device *cdev)
 	return 0;
 
 error:
-	if (priv->uar)
-		mlx5_glue->devx_free_uar(priv->uar);
 	if (priv->regexdev)
 		rte_regexdev_unregister(priv->regexdev);
 dev_error:
@@ -185,8 +175,7 @@ mlx5_regex_dev_remove(struct mlx5_common_device *cdev)
 		return 0;
 	priv = dev->data->dev_private;
 	if (priv) {
-		if (priv->uar)
-			mlx5_glue->devx_free_uar(priv->uar);
+		mlx5_devx_uar_release(&priv->uar);
 		if (priv->regexdev)
 			rte_regexdev_unregister(priv->regexdev);
 		rte_free(priv);
diff --git a/drivers/regex/mlx5/mlx5_regex.h b/drivers/regex/mlx5/mlx5_regex.h
index eb59cc38a6..fa17b46f6f 100644
--- a/drivers/regex/mlx5/mlx5_regex.h
+++ b/drivers/regex/mlx5/mlx5_regex.h
@@ -67,7 +67,7 @@ struct mlx5_regex_priv {
 	struct mlx5_regex_db db[MLX5_RXP_MAX_ENGINES +
 				MLX5_RXP_EM_COUNT];
 	uint32_t nb_engines; /* Number of RegEx engines. */
-	struct mlx5dv_devx_uar *uar; /* UAR object. */
+	struct mlx5_uar uar; /* UAR object. */
 	uint8_t is_bf2; /* The device is BF2 device. */
 	uint8_t has_umr; /* The device supports UMR. */
 	uint32_t mmo_regex_qp_cap:1;
diff --git a/drivers/regex/mlx5/mlx5_regex_control.c b/drivers/regex/mlx5/mlx5_regex_control.c
index 50c966a022..9354673251 100644
--- a/drivers/regex/mlx5/mlx5_regex_control.c
+++ b/drivers/regex/mlx5/mlx5_regex_control.c
@@ -78,7 +78,7 @@ static int
 regex_ctrl_create_cq(struct mlx5_regex_priv *priv, struct mlx5_regex_cq *cq)
 {
 	struct mlx5_devx_cq_attr attr = {
-		.uar_page_id = priv->uar->page_id,
+		.uar_page_id = mlx5_os_get_devx_uar_page_id(priv->uar.obj),
 	};
 	int ret;
 
@@ -137,7 +137,7 @@ regex_ctrl_create_hw_qp(struct mlx5_regex_priv *priv, struct mlx5_regex_qp *qp,
 #ifdef HAVE_IBV_FLOW_DV_SUPPORT
 	struct mlx5_devx_qp_attr attr = {
 		.cqn = qp->cq.cq_obj.cq->id,
-		.uar_index = priv->uar->page_id,
+		.uar_index = mlx5_os_get_devx_uar_page_id(priv->uar.obj),
 		.pd = priv->cdev->pdn,
 		.ts_format = mlx5_ts_format_conv
 				     (priv->cdev->config.hca_attr.qp_ts_format),
diff --git a/drivers/regex/mlx5/mlx5_regex_fastpath.c b/drivers/regex/mlx5/mlx5_regex_fastpath.c
index adb5343a46..18bbda6340 100644
--- a/drivers/regex/mlx5/mlx5_regex_fastpath.c
+++ b/drivers/regex/mlx5/mlx5_regex_fastpath.c
@@ -188,24 +188,20 @@ prep_one(struct mlx5_regex_priv *priv, struct mlx5_regex_qp *qp,
 }
 
 static inline void
-send_doorbell(struct mlx5_regex_priv *priv, struct mlx5_regex_hw_qp *qp_obj)
+send_doorbell(struct mlx5_regex_priv *priv, struct mlx5_regex_hw_qp *qp)
 {
-	struct mlx5dv_devx_uar *uar = priv->uar;
-	size_t wqe_offset = (qp_obj->db_pi & (qp_size_get(qp_obj) - 1)) *
-		(MLX5_SEND_WQE_BB << (priv->has_umr ? 2 : 0)) +
-		(priv->has_umr ? MLX5_REGEX_UMR_WQE_SIZE : 0);
-	uint8_t *wqe = (uint8_t *)(uintptr_t)qp_obj->qp_obj.wqes + wqe_offset;
+	size_t wqe_offset = (qp->db_pi & (qp_size_get(qp) - 1)) *
+			    (MLX5_SEND_WQE_BB << (priv->has_umr ? 2 : 0)) +
+			    (priv->has_umr ? MLX5_REGEX_UMR_WQE_SIZE : 0);
+	uint8_t *wqe = (uint8_t *)(uintptr_t)qp->qp_obj.wqes + wqe_offset;
+	uint32_t actual_pi = (priv->has_umr ? (qp->db_pi * 4 + 3) : qp->db_pi) &
+			     MLX5_REGEX_MAX_WQE_INDEX;
+
 	/* Or the fm_ce_se instead of set, avoid the fence be cleared. */
 	((struct mlx5_wqe_ctrl_seg *)wqe)->fm_ce_se |= MLX5_WQE_CTRL_CQ_UPDATE;
-	uint64_t *doorbell_addr =
-		(uint64_t *)((uint8_t *)uar->base_addr + 0x800);
-	rte_io_wmb();
-	qp_obj->qp_obj.db_rec[MLX5_SND_DBR] = rte_cpu_to_be_32((priv->has_umr ?
-					(qp_obj->db_pi * 4 + 3) : qp_obj->db_pi)
-					& MLX5_REGEX_MAX_WQE_INDEX);
-	rte_wmb();
-	*doorbell_addr = *(volatile uint64_t *)wqe;
-	rte_wmb();
+	mlx5_doorbell_ring(&priv->uar.bf_db, *(volatile uint64_t *)wqe,
+			   actual_pi, &qp->qp_obj.db_rec[MLX5_SND_DBR],
+			   !priv->uar.dbnc);
 }
 
 static inline int
diff --git a/drivers/vdpa/mlx5/mlx5_vdpa.h b/drivers/vdpa/mlx5/mlx5_vdpa.h
index cf4f384fa4..312473e00c 100644
--- a/drivers/vdpa/mlx5/mlx5_vdpa.h
+++ b/drivers/vdpa/mlx5/mlx5_vdpa.h
@@ -136,7 +136,7 @@ struct mlx5_vdpa_priv {
 	struct rte_vhost_memory *vmem;
 	struct mlx5dv_devx_event_channel *eventc;
 	struct mlx5dv_devx_event_channel *err_chnl;
-	struct mlx5dv_devx_uar *uar;
+	struct mlx5_uar uar;
 	struct rte_intr_handle *err_intr_handle;
 	struct mlx5_devx_obj *td;
 	struct mlx5_devx_obj *tiss[16]; /* TIS list for each LAG port. */
diff --git a/drivers/vdpa/mlx5/mlx5_vdpa_event.c b/drivers/vdpa/mlx5/mlx5_vdpa_event.c
index 21738bdfff..9cc71714a2 100644
--- a/drivers/vdpa/mlx5/mlx5_vdpa_event.c
+++ b/drivers/vdpa/mlx5/mlx5_vdpa_event.c
@@ -30,10 +30,7 @@
 void
 mlx5_vdpa_event_qp_global_release(struct mlx5_vdpa_priv *priv)
 {
-	if (priv->uar) {
-		mlx5_glue->devx_free_uar(priv->uar);
-		priv->uar = NULL;
-	}
+	mlx5_devx_uar_release(&priv->uar);
 #ifdef HAVE_IBV_DEVX_EVENT
 	if (priv->eventc) {
 		mlx5_os_devx_destroy_event_channel(priv->eventc);
@@ -56,14 +53,7 @@ mlx5_vdpa_event_qp_global_prepare(struct mlx5_vdpa_priv *priv)
 			rte_errno);
 		goto error;
 	}
-	/*
-	 * This PMD always claims the write memory barrier on UAR
-	 * registers writings, it is safe to allocate UAR with any
-	 * memory mapping type.
-	 */
-	priv->uar = mlx5_devx_alloc_uar(priv->cdev);
-	if (!priv->uar) {
-		rte_errno = errno;
+	if (mlx5_devx_uar_prepare(priv->cdev, &priv->uar) != 0) {
 		DRV_LOG(ERR, "Failed to allocate UAR.");
 		goto error;
 	}
@@ -88,18 +78,9 @@ mlx5_vdpa_cq_arm(struct mlx5_vdpa_priv *priv, struct mlx5_vdpa_cq *cq)
 	uint32_t doorbell_hi = arm_sn | MLX5_CQ_DBR_CMD_ALL | cq_ci;
 	uint64_t doorbell = ((uint64_t)doorbell_hi << 32) | cq->cq_obj.cq->id;
 	uint64_t db_be = rte_cpu_to_be_64(doorbell);
-	uint32_t *addr = RTE_PTR_ADD(priv->uar->base_addr, MLX5_CQ_DOORBELL);
-
-	rte_io_wmb();
-	cq->cq_obj.db_rec[MLX5_CQ_ARM_DB] = rte_cpu_to_be_32(doorbell_hi);
-	rte_wmb();
-#ifdef RTE_ARCH_64
-	*(uint64_t *)addr = db_be;
-#else
-	*(uint32_t *)addr = db_be;
-	rte_io_wmb();
-	*((uint32_t *)addr + 1) = db_be >> 32;
-#endif
+
+	mlx5_doorbell_ring(&priv->uar.cq_db, db_be, doorbell_hi,
+			   &cq->cq_obj.db_rec[MLX5_CQ_ARM_DB], 0);
 	cq->arm_sn++;
 	cq->armed = 1;
 }
@@ -110,7 +91,7 @@ mlx5_vdpa_cq_create(struct mlx5_vdpa_priv *priv, uint16_t log_desc_n,
 {
 	struct mlx5_devx_cq_attr attr = {
 		.use_first_only = 1,
-		.uar_page_id = priv->uar->page_id,
+		.uar_page_id = mlx5_os_get_devx_uar_page_id(priv->uar.obj),
 	};
 	uint16_t event_nums[1] = {0};
 	int ret;
@@ -606,7 +587,7 @@ mlx5_vdpa_event_qp_create(struct mlx5_vdpa_priv *priv, uint16_t desc_n,
 		DRV_LOG(ERR, "Failed to create FW QP(%u).", rte_errno);
 		goto error;
 	}
-	attr.uar_index = priv->uar->page_id;
+	attr.uar_index = mlx5_os_get_devx_uar_page_id(priv->uar.obj);
 	attr.cqn = eqp->cq.cq_obj.cq->id;
 	attr.rq_size = RTE_BIT32(log_desc_n);
 	attr.log_rq_stride = rte_log2_u32(MLX5_WSEG_SIZE);
-- 
2.25.1
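
For readers following the converted call sites above, here is a minimal sketch of the
consolidated doorbell helper and UAR containers this series relies on. Their real
definitions land in drivers/common/mlx5/mlx5_common.h (patches 4/6 and 6/6), which is
not part of this hunk, so the field names, signature and barrier placement below are
reconstructed from the call sites and from the open-coded sequences removed above;
treat it as an illustration, not the committed code.

    /* Hypothetical reconstruction, assuming the layout implied by the diff. */
    struct mlx5_uar_data {
    	uint64_t *db;            /* Mapped doorbell/BlueFlame register address. */
    #ifndef RTE_ARCH_64
    	rte_spinlock_t *sl_p;    /* Lock serializing the split 32-bit write. */
    #endif
    };

    struct mlx5_uar {
    	struct mlx5_uar_data bf_db; /* BlueFlame doorbell data (SQs). */
    	struct mlx5_uar_data cq_db; /* CQ arming doorbell data. */
    	void *obj;                  /* DevX UAR object. */
    	bool dbnc;                  /* Non-cached doorbell mapping in use. */
    };

    static __rte_always_inline void
    mlx5_doorbell_ring(struct mlx5_uar_data *uar, uint64_t val, uint32_t index,
    		   volatile uint32_t *db_rec, bool flush)
    {
    	/* Make the WQE contents visible before updating the doorbell record. */
    	rte_io_wmb();
    	*db_rec = rte_cpu_to_be_32(index);
    	/* Order the doorbell record update before the register write. */
    	rte_wmb();
    #ifdef RTE_ARCH_64
    	*uar->db = val;
    #else /* !RTE_ARCH_64 */
    	rte_spinlock_lock(uar->sl_p);
    	*(volatile uint32_t *)uar->db = val;
    	rte_io_wmb();
    	*((volatile uint32_t *)uar->db + 1) = val >> 32;
    	rte_spinlock_unlock(uar->sl_p);
    #endif
    	/* Flush a write-combining mapping when the caller requests it. */
    	if (flush)
    		rte_wmb();
    }

Under this reading, datapath callers pass "!uar->dbnc" as the flush flag (skip the
trailing barrier on a non-cached mapping), while control-path CQ arming passes 0,
which matches every converted call site in the series.
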


^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [dpdk-dev] [PATCH 0/6] mlx5: some UAR fixes
  2021-11-03 18:35 [dpdk-dev] [PATCH 0/6] mlx5: some UAR fixes michaelba
                   ` (5 preceding siblings ...)
  2021-11-03 18:35 ` [dpdk-dev] [PATCH 6/6] common/mlx5: fix post doorbell barrier michaelba
@ 2021-11-07 15:23 ` Thomas Monjalon
  6 siblings, 0 replies; 8+ messages in thread
From: Thomas Monjalon @ 2021-11-07 15:23 UTC (permalink / raw)
  To: Michael Baum; +Cc: dev, Matan Azrad

> Michael Baum (6):
>   crypto/mlx5: fix invalid memory access in probing
>   common/mlx5: fix redundant code in UAR allocation
>   common/mlx5: fix UAR allocation diagnostics messages
>   common/mlx5: fix doorbell mapping configuration
>   net/mlx5: remove duplicated reference of the TxQ doorbell
>   common/mlx5: fix post doorbell barrier

Applied, thanks.



^ permalink raw reply	[flat|nested] 8+ messages in thread

