DPDK patches and discussions
* [PATCH] common/mlx5: fix mempool registration
@ 2021-11-17 18:49 Dmitry Kozlyuk
  2021-11-18 15:25 ` [PATCH v2] " Dmitry Kozlyuk
  0 siblings, 1 reply; 4+ messages in thread
From: Dmitry Kozlyuk @ 2021-11-17 18:49 UTC (permalink / raw)
  To: dev; +Cc: Raslan Darawsheh, Viacheslav Ovsiienko, Michael Baum, Matan Azrad

Mempool registration was not correctly processing
mempools with the RTE_PKTMBUF_POOL_F_PINNED_EXT_BUF flag set
("pinned mempools" for short), because at registration time
it is not known whether the mempool is a pktmbuf one,
and its elements may not yet be initialized for analysis.
Attempts had been made to recognize such pools,
but there was no robust solution: only the owner of a mempool
(the application or a device) knows its type.
This patch extends the common/mlx5 registration code
to accept a hint that the mempool is a pinned one
and uses this capability from the net/mlx5 driver.

1. Remove all code assuming pktmbuf pool type
   or trying to recognize the type of a pool.
2. Register pinned mempools used for Rx
   and their external memory on port start.
3. Change Tx slow path logic as follows:
   3.1. Search the mempool database for a memory region (MR)
        by the mbuf pool and its buffer address.
   3.2. If no MR for the address is found for the mempool,
        and the mempool contains only pinned external buffers,
        register the mempool and its external pinned memory.
   3.3. Fall back to using page-based MRs in other cases
        (for example, a buffer with externally attached memory,
        but not from a pinned mempool).

Fixes: 690b2a88c2f7 ("common/mlx5: add mempool registration facilities")
Fixes: fec28ca0e3a9 ("net/mlx5: support mempool registration")

Signed-off-by: Dmitry Kozlyuk <dkozlyuk@nvidia.com>
Reviewed-by: Matan Azrad <matan@nvidia.com>
---
Applies to next-net-mlx.
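
As a reading aid (not part of the patch), below is a minimal sketch of the
Tx slow-path lookup order from items 3.1-3.3 above. It condenses the new
mlx5_mr_mb2mr_bh() flow; MPRQ mempool recovery and the mr_mempool_reg_en
check are omitted, and the wrapper name is invented for illustration.

/* Illustration only: Tx slow-path lookup order (items 3.1-3.3). */
static uint32_t
tx_mb2mr_sketch(struct mlx5_mr_ctrl *mr_ctrl, struct rte_mbuf *mb)
{
	struct mlx5_common_device *cdev = mr_ctrl->cdev;
	struct rte_mempool *mp = mlx5_mb2mp(mb);
	uintptr_t addr = (uintptr_t)mb->buf_addr;
	uint32_t lkey;

	/* 3.1: search the mempool database by pool and buffer address. */
	lkey = mlx5_mr_mempool2mr_bh(&cdev->mr_scache, mr_ctrl, mp, addr);
	if (lkey != UINT32_MAX)
		return lkey;
	/* 3.2: no MR yet and the pool is pinned - register it now. */
	if (rte_pktmbuf_priv_flags(mp) & RTE_PKTMBUF_POOL_F_PINNED_EXT_BUF) {
		if (mlx5_mr_mempool_register(cdev, mp, true /* is_extmem */) < 0)
			return UINT32_MAX;
		return mlx5_mr_mempool2mr_bh(&cdev->mr_scache, mr_ctrl,
					     mp, addr);
	}
	/* 3.3: fall back to page-based MRs. */
	return mlx5_mr_addr2mr_bh(mr_ctrl, addr);
}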

 drivers/common/mlx5/mlx5_common.c    |  11 +--
 drivers/common/mlx5/mlx5_common_mp.c |   4 +-
 drivers/common/mlx5/mlx5_common_mp.h |  10 ++-
 drivers/common/mlx5/mlx5_common_mr.c | 109 ++++++++++++++++-----------
 drivers/common/mlx5/mlx5_common_mr.h |  12 +--
 drivers/net/mlx5/linux/mlx5_mp_os.c  |   3 +-
 drivers/net/mlx5/mlx5_rxq.c          |   2 +-
 drivers/net/mlx5/mlx5_trigger.c      |   7 +-
 8 files changed, 88 insertions(+), 70 deletions(-)

diff --git a/drivers/common/mlx5/mlx5_common.c b/drivers/common/mlx5/mlx5_common.c
index 66c2c08b7d..f1650f94c6 100644
--- a/drivers/common/mlx5/mlx5_common.c
+++ b/drivers/common/mlx5/mlx5_common.c
@@ -317,9 +317,9 @@ mlx5_dev_to_pci_str(const struct rte_device *dev, char *addr, size_t size)
  */
 static int
 mlx5_dev_mempool_register(struct mlx5_common_device *cdev,
-			  struct rte_mempool *mp)
+			  struct rte_mempool *mp, bool is_extmem)
 {
-	return mlx5_mr_mempool_register(cdev, mp);
+	return mlx5_mr_mempool_register(cdev, mp, is_extmem);
 }
 
 /**
@@ -353,7 +353,7 @@ mlx5_dev_mempool_register_cb(struct rte_mempool *mp, void *arg)
 	struct mlx5_common_device *cdev = arg;
 	int ret;
 
-	ret = mlx5_dev_mempool_register(cdev, mp);
+	ret = mlx5_dev_mempool_register(cdev, mp, false);
 	if (ret < 0 && rte_errno != EEXIST)
 		DRV_LOG(ERR,
 			"Failed to register existing mempool %s for PD %p: %s",
@@ -390,13 +390,10 @@ mlx5_dev_mempool_event_cb(enum rte_mempool_event event, struct rte_mempool *mp,
 			  void *arg)
 {
 	struct mlx5_common_device *cdev = arg;
-	bool extmem = mlx5_mempool_is_extmem(mp);
 
 	switch (event) {
 	case RTE_MEMPOOL_EVENT_READY:
-		if (extmem)
-			break;
-		if (mlx5_dev_mempool_register(cdev, mp) < 0)
+		if (mlx5_dev_mempool_register(cdev, mp, false) < 0)
 			DRV_LOG(ERR,
 				"Failed to register new mempool %s for PD %p: %s",
 				mp->name, cdev->pd, rte_strerror(rte_errno));
diff --git a/drivers/common/mlx5/mlx5_common_mp.c b/drivers/common/mlx5/mlx5_common_mp.c
index 536d61f66c..a7a671b7c5 100644
--- a/drivers/common/mlx5/mlx5_common_mp.c
+++ b/drivers/common/mlx5/mlx5_common_mp.c
@@ -65,7 +65,8 @@ mlx5_mp_req_mr_create(struct mlx5_common_device *cdev, uintptr_t addr)
  */
 int
 mlx5_mp_req_mempool_reg(struct mlx5_common_device *cdev,
-			struct rte_mempool *mempool, bool reg)
+			struct rte_mempool *mempool, bool reg,
+			bool is_extmem)
 {
 	struct rte_mp_msg mp_req;
 	struct rte_mp_msg *mp_res;
@@ -82,6 +83,7 @@ mlx5_mp_req_mempool_reg(struct mlx5_common_device *cdev,
 		     MLX5_MP_REQ_MEMPOOL_UNREGISTER;
 	mp_init_port_agnostic_msg(&mp_req, type);
 	arg->mempool = mempool;
+	arg->is_extmem = is_extmem;
 	arg->cdev = cdev;
 	ret = rte_mp_request_sync(&mp_req, &mp_rep, &ts);
 	if (ret) {
diff --git a/drivers/common/mlx5/mlx5_common_mp.h b/drivers/common/mlx5/mlx5_common_mp.h
index b1e3a41a20..4599ba8f92 100644
--- a/drivers/common/mlx5/mlx5_common_mp.h
+++ b/drivers/common/mlx5/mlx5_common_mp.h
@@ -37,9 +37,12 @@ struct mlx5_mp_arg_queue_id {
 
 struct mlx5_mp_arg_mr_manage {
 	struct mlx5_common_device *cdev;
+	RTE_STD_C11
 	union {
-		struct rte_mempool *mempool;
-		/* MLX5_MP_REQ_MEMPOOL_(UN)REGISTER */
+		struct {
+			struct rte_mempool *mempool;
+			bool is_extmem;
+		}; /* MLX5_MP_REQ_MEMPOOL_(UN)REGISTER */
 		uintptr_t addr; /* MLX5_MP_REQ_CREATE_MR */
 	};
 };
@@ -134,7 +137,8 @@ __rte_internal
 int mlx5_mp_req_mr_create(struct mlx5_common_device *cdev, uintptr_t addr);
 __rte_internal
 int mlx5_mp_req_mempool_reg(struct mlx5_common_device *cdev,
-			    struct rte_mempool *mempool, bool reg);
+			    struct rte_mempool *mempool, bool reg,
+			    bool is_extmem);
 __rte_internal
 int mlx5_mp_req_queue_state_modify(struct mlx5_mp_id *mp_id,
 				   struct mlx5_mp_arg_queue_state_modify *sm);
diff --git a/drivers/common/mlx5/mlx5_common_mr.c b/drivers/common/mlx5/mlx5_common_mr.c
index a7a499f6f9..498fa7513f 100644
--- a/drivers/common/mlx5/mlx5_common_mr.c
+++ b/drivers/common/mlx5/mlx5_common_mr.c
@@ -47,6 +47,8 @@ struct mlx5_mempool_reg {
 	struct mlx5_mempool_mr *mrs;
 	/** Number of memory regions. */
 	unsigned int mrs_n;
+	/** Whether the MRs were created for external pinned memory. */
+	bool is_extmem;
 };
 
 void
@@ -1400,6 +1402,8 @@ mlx5_mempool_get_extmem(struct rte_mempool *mp, struct mlx5_range **out,
  *
  * @param[in] mp
  *   Analyzed mempool.
+ * @param[in] is_extmem
+ *   Whether the pool contains only external pinned buffers.
  * @param[out] out
  *   Receives the ranges, caller must release it with free().
  * @param[out] ount_n
@@ -1409,17 +1413,16 @@ mlx5_mempool_get_extmem(struct rte_mempool *mp, struct mlx5_range **out,
  *   0 on success, (-1) on failure.
  */
 static int
-mlx5_get_mempool_ranges(struct rte_mempool *mp, struct mlx5_range **out,
-			unsigned int *out_n)
+mlx5_get_mempool_ranges(struct rte_mempool *mp, bool is_extmem,
+			struct mlx5_range **out, unsigned int *out_n)
 {
 	struct mlx5_range *chunks;
 	unsigned int chunks_n, contig_n, i;
 	int ret;
 
 	/* Collect the pool underlying memory. */
-	ret = mlx5_mempool_is_extmem(mp) ?
-	      mlx5_mempool_get_extmem(mp, &chunks, &chunks_n) :
-	      mlx5_mempool_get_chunks(mp, &chunks, &chunks_n);
+	ret = is_extmem ? mlx5_mempool_get_extmem(mp, &chunks, &chunks_n) :
+			  mlx5_mempool_get_chunks(mp, &chunks, &chunks_n);
 	if (ret < 0)
 		return ret;
 	/* Merge adjacent chunks and place them at the beginning. */
@@ -1443,6 +1446,8 @@ mlx5_get_mempool_ranges(struct rte_mempool *mp, struct mlx5_range **out,
  *
  * @param[in] mp
  *   Mempool to analyze.
+ * @param[in] is_extmem
+ *   Whether the pool contains only external pinned buffers.
  * @param[out] out
  *   Receives memory ranges to register, aligned to the system page size.
  *   The caller must release them with free().
@@ -1455,14 +1460,15 @@ mlx5_get_mempool_ranges(struct rte_mempool *mp, struct mlx5_range **out,
  *   0 on success, (-1) on failure.
  */
 static int
-mlx5_mempool_reg_analyze(struct rte_mempool *mp, struct mlx5_range **out,
-			 unsigned int *out_n, bool *share_hugepage)
+mlx5_mempool_reg_analyze(struct rte_mempool *mp, bool is_extmem,
+			 struct mlx5_range **out, unsigned int *out_n,
+			 bool *share_hugepage)
 {
 	struct mlx5_range *ranges = NULL;
 	unsigned int i, ranges_n = 0;
 	struct rte_memseg_list *msl;
 
-	if (mlx5_get_mempool_ranges(mp, &ranges, &ranges_n) < 0) {
+	if (mlx5_get_mempool_ranges(mp, is_extmem, &ranges, &ranges_n) < 0) {
 		DRV_LOG(ERR, "Cannot get address ranges for mempool %s",
 			mp->name);
 		return -1;
@@ -1504,7 +1510,8 @@ mlx5_mempool_reg_analyze(struct rte_mempool *mp, struct mlx5_range **out,
 
 /** Create a registration object for the mempool. */
 static struct mlx5_mempool_reg *
-mlx5_mempool_reg_create(struct rte_mempool *mp, unsigned int mrs_n)
+mlx5_mempool_reg_create(struct rte_mempool *mp, unsigned int mrs_n,
+			bool is_extmem)
 {
 	struct mlx5_mempool_reg *mpr = NULL;
 
@@ -1519,6 +1526,7 @@ mlx5_mempool_reg_create(struct rte_mempool *mp, unsigned int mrs_n)
 	mpr->mp = mp;
 	mpr->mrs = (struct mlx5_mempool_mr *)(mpr + 1);
 	mpr->mrs_n = mrs_n;
+	mpr->is_extmem = is_extmem;
 	return mpr;
 }
 
@@ -1583,31 +1591,32 @@ mlx5_mempool_reg_detach(struct mlx5_mempool_reg *mpr)
 
 static int
 mlx5_mr_mempool_register_primary(struct mlx5_mr_share_cache *share_cache,
-				 void *pd, struct rte_mempool *mp)
+				 void *pd, struct rte_mempool *mp,
+				 bool is_extmem)
 {
 	struct mlx5_range *ranges = NULL;
-	struct mlx5_mempool_reg *mpr, *new_mpr;
+	struct mlx5_mempool_reg *mpr, *old_mpr, *new_mpr;
 	unsigned int i, ranges_n;
-	bool share_hugepage;
+	bool share_hugepage, standalone = false;
 	int ret = -1;
 
 	/* Early check to avoid unnecessary creation of MRs. */
 	rte_rwlock_read_lock(&share_cache->rwlock);
-	mpr = mlx5_mempool_reg_lookup(share_cache, mp);
+	old_mpr = mlx5_mempool_reg_lookup(share_cache, mp);
 	rte_rwlock_read_unlock(&share_cache->rwlock);
-	if (mpr != NULL) {
+	if (old_mpr != NULL && (!is_extmem || old_mpr->is_extmem)) {
 		DRV_LOG(DEBUG, "Mempool %s is already registered for PD %p",
 			mp->name, pd);
 		rte_errno = EEXIST;
 		goto exit;
 	}
-	if (mlx5_mempool_reg_analyze(mp, &ranges, &ranges_n,
+	if (mlx5_mempool_reg_analyze(mp, is_extmem, &ranges, &ranges_n,
 				     &share_hugepage) < 0) {
 		DRV_LOG(ERR, "Cannot get mempool %s memory ranges", mp->name);
 		rte_errno = ENOMEM;
 		goto exit;
 	}
-	new_mpr = mlx5_mempool_reg_create(mp, ranges_n);
+	new_mpr = mlx5_mempool_reg_create(mp, ranges_n, is_extmem);
 	if (new_mpr == NULL) {
 		DRV_LOG(ERR,
 			"Cannot create a registration object for mempool %s in PD %p",
@@ -1667,6 +1676,12 @@ mlx5_mr_mempool_register_primary(struct mlx5_mr_share_cache *share_cache,
 	/* Concurrent registration is not supposed to happen. */
 	rte_rwlock_write_lock(&share_cache->rwlock);
 	mpr = mlx5_mempool_reg_lookup(share_cache, mp);
+	if (mpr == old_mpr && old_mpr != NULL) {
+		LIST_REMOVE(old_mpr, next);
+		standalone = mlx5_mempool_reg_detach(mpr);
+		/* No need to flush the cache: old MRs cannot be in use. */
+		mpr = NULL;
+	}
 	if (mpr == NULL) {
 		mlx5_mempool_reg_attach(new_mpr);
 		LIST_INSERT_HEAD(&share_cache->mempool_reg_list, new_mpr, next);
@@ -1679,6 +1694,10 @@ mlx5_mr_mempool_register_primary(struct mlx5_mr_share_cache *share_cache,
 		mlx5_mempool_reg_destroy(share_cache, new_mpr, true);
 		rte_errno = EEXIST;
 		goto exit;
+	} else if (old_mpr != NULL) {
+		DRV_LOG(DEBUG, "Mempool %s registration for PD %p updated for external memory",
+			mp->name, pd);
+		mlx5_mempool_reg_destroy(share_cache, old_mpr, standalone);
 	}
 exit:
 	free(ranges);
@@ -1687,9 +1706,9 @@ mlx5_mr_mempool_register_primary(struct mlx5_mr_share_cache *share_cache,
 
 static int
 mlx5_mr_mempool_register_secondary(struct mlx5_common_device *cdev,
-				   struct rte_mempool *mp)
+				   struct rte_mempool *mp, bool is_extmem)
 {
-	return mlx5_mp_req_mempool_reg(cdev, mp, true);
+	return mlx5_mp_req_mempool_reg(cdev, mp, true, is_extmem);
 }
 
 /**
@@ -1705,16 +1724,17 @@ mlx5_mr_mempool_register_secondary(struct mlx5_common_device *cdev,
  */
 int
 mlx5_mr_mempool_register(struct mlx5_common_device *cdev,
-			 struct rte_mempool *mp)
+			 struct rte_mempool *mp, bool is_extmem)
 {
 	if (mp->flags & RTE_MEMPOOL_F_NON_IO)
 		return 0;
 	switch (rte_eal_process_type()) {
 	case RTE_PROC_PRIMARY:
 		return mlx5_mr_mempool_register_primary(&cdev->mr_scache,
-							cdev->pd, mp);
+							cdev->pd, mp,
+							is_extmem);
 	case RTE_PROC_SECONDARY:
-		return mlx5_mr_mempool_register_secondary(cdev, mp);
+		return mlx5_mr_mempool_register_secondary(cdev, mp, is_extmem);
 	default:
 		return -1;
 	}
@@ -1753,7 +1773,7 @@ static int
 mlx5_mr_mempool_unregister_secondary(struct mlx5_common_device *cdev,
 				     struct rte_mempool *mp)
 {
-	return mlx5_mp_req_mempool_reg(cdev, mp, false);
+	return mlx5_mp_req_mempool_reg(cdev, mp, false, false /* is_extmem */);
 }
 
 /**
@@ -1910,32 +1930,33 @@ mlx5_mr_mempool2mr_bh(struct mlx5_mr_share_cache *share_cache,
 uint32_t
 mlx5_mr_mb2mr_bh(struct mlx5_mr_ctrl *mr_ctrl, struct rte_mbuf *mb)
 {
+	struct mlx5_mprq_buf *buf;
 	uint32_t lkey;
 	uintptr_t addr = (uintptr_t)mb->buf_addr;
 	struct mlx5_common_device *cdev = mr_ctrl->cdev;
+	struct rte_mempool *mp;
 
-	if (cdev->config.mr_mempool_reg_en) {
-		struct rte_mempool *mp = NULL;
-		struct mlx5_mprq_buf *buf;
-
-		if (!RTE_MBUF_HAS_EXTBUF(mb)) {
-			mp = mlx5_mb2mp(mb);
-		} else if (mb->shinfo->free_cb == mlx5_mprq_buf_free_cb) {
-			/* Recover MPRQ mempool. */
-			buf = mb->shinfo->fcb_opaque;
-			mp = buf->mp;
-		}
-		if (mp != NULL) {
-			lkey = mlx5_mr_mempool2mr_bh(&cdev->mr_scache,
-						     mr_ctrl, mp, addr);
-			/*
-			 * Lookup can only fail on invalid input, e.g. "addr"
-			 * is not from "mp" or "mp" has MEMPOOL_F_NON_IO set.
-			 */
-			if (lkey != UINT32_MAX)
-				return lkey;
-		}
-		/* Fallback for generic mechanism in corner cases. */
+	/* Recover MPRQ mempool. */
+	if (RTE_MBUF_HAS_EXTBUF(mb) &&
+	    mb->shinfo->free_cb == mlx5_mprq_buf_free_cb) {
+		buf = mb->shinfo->fcb_opaque;
+		mp = buf->mp;
+	} else {
+		mp = mlx5_mb2mp(mb);
+	}
+	lkey = mlx5_mr_mempool2mr_bh(&cdev->mr_scache,
+				     mr_ctrl, mp, addr);
+	if (lkey != UINT32_MAX)
+		return lkey;
+	/* Register pinned external memory if the mempool is not used for Rx. */
+	if (cdev->config.mr_mempool_reg_en &&
+	    (rte_pktmbuf_priv_flags(mp) & RTE_PKTMBUF_POOL_F_PINNED_EXT_BUF)) {
+		if (mlx5_mr_mempool_register(mr_ctrl->cdev, mp, true) < 0)
+			return UINT32_MAX;
+		lkey = mlx5_mr_mempool2mr_bh(&cdev->mr_scache,
+					     mr_ctrl, mp, addr);
+		MLX5_ASSERT(lkey != UINT32_MAX);
 	}
+	/* Fallback to generic mechanism in corner cases. */
 	return mlx5_mr_addr2mr_bh(mr_ctrl, addr);
 }
diff --git a/drivers/common/mlx5/mlx5_common_mr.h b/drivers/common/mlx5/mlx5_common_mr.h
index 442b9d4694..08035f48ee 100644
--- a/drivers/common/mlx5/mlx5_common_mr.h
+++ b/drivers/common/mlx5/mlx5_common_mr.h
@@ -257,20 +257,10 @@ mlx5_os_set_reg_mr_cb(mlx5_reg_mr_t *reg_mr_cb, mlx5_dereg_mr_t *dereg_mr_cb);
 __rte_internal
 int
 mlx5_mr_mempool_register(struct mlx5_common_device *cdev,
-			 struct rte_mempool *mp);
+			 struct rte_mempool *mp, bool is_extmem);
 __rte_internal
 int
 mlx5_mr_mempool_unregister(struct mlx5_common_device *cdev,
 			   struct rte_mempool *mp);
 
-/** Check if @p mp has buffers pinned in external memory. */
-static inline bool
-mlx5_mempool_is_extmem(struct rte_mempool *mp)
-{
-	return (mp->private_data_size ==
-		sizeof(struct rte_pktmbuf_pool_private)) &&
-	       (mp->elt_size >= sizeof(struct rte_mbuf)) &&
-	       (rte_pktmbuf_priv_flags(mp) & RTE_PKTMBUF_POOL_F_PINNED_EXT_BUF);
-}
-
 #endif /* RTE_PMD_MLX5_COMMON_MR_H_ */
diff --git a/drivers/net/mlx5/linux/mlx5_mp_os.c b/drivers/net/mlx5/linux/mlx5_mp_os.c
index edc5203dd6..c448a3e9eb 100644
--- a/drivers/net/mlx5/linux/mlx5_mp_os.c
+++ b/drivers/net/mlx5/linux/mlx5_mp_os.c
@@ -48,7 +48,8 @@ mlx5_mp_os_handle_port_agnostic(const struct rte_mp_msg *mp_msg,
 		return rte_mp_reply(&mp_res, peer);
 	case MLX5_MP_REQ_MEMPOOL_REGISTER:
 		mp_init_port_agnostic_msg(&mp_res, param->type);
-		res->result = mlx5_mr_mempool_register(mng->cdev, mng->mempool);
+		res->result = mlx5_mr_mempool_register(mng->cdev, mng->mempool,
+						       mng->is_extmem);
 		return rte_mp_reply(&mp_res, peer);
 	case MLX5_MP_REQ_MEMPOOL_UNREGISTER:
 		mp_init_port_agnostic_msg(&mp_res, param->type);
diff --git a/drivers/net/mlx5/mlx5_rxq.c b/drivers/net/mlx5/mlx5_rxq.c
index d5a7155392..a8ef21c6f1 100644
--- a/drivers/net/mlx5/mlx5_rxq.c
+++ b/drivers/net/mlx5/mlx5_rxq.c
@@ -1458,7 +1458,7 @@ mlx5_mprq_alloc_mp(struct rte_eth_dev *dev)
 		rte_errno = ENOMEM;
 		return -rte_errno;
 	}
-	ret = mlx5_mr_mempool_register(priv->sh->cdev, mp);
+	ret = mlx5_mr_mempool_register(priv->sh->cdev, mp, false);
 	if (ret < 0 && rte_errno != EEXIST) {
 		ret = rte_errno;
 		DRV_LOG(ERR, "port %u failed to register a mempool for Multi-Packet RQ",
diff --git a/drivers/net/mlx5/mlx5_trigger.c b/drivers/net/mlx5/mlx5_trigger.c
index e2bfde19c7..bafb41d9cf 100644
--- a/drivers/net/mlx5/mlx5_trigger.c
+++ b/drivers/net/mlx5/mlx5_trigger.c
@@ -147,14 +147,17 @@ mlx5_rxq_mempool_register(struct mlx5_rxq_ctrl *rxq_ctrl)
 	}
 	for (s = 0; s < rxq_ctrl->rxq.rxseg_n; s++) {
 		uint32_t flags;
+		bool is_extmem;
 
 		mp = rxq_ctrl->rxq.rxseg[s].mp;
 		flags = mp != rxq_ctrl->rxq.mprq_mp ?
 			rte_pktmbuf_priv_flags(mp) : 0;
-		ret = mlx5_mr_mempool_register(rxq_ctrl->sh->cdev, mp);
+		is_extmem = (flags & RTE_PKTMBUF_POOL_F_PINNED_EXT_BUF) != 0;
+		ret = mlx5_mr_mempool_register(rxq_ctrl->sh->cdev, mp,
+					       is_extmem);
 		if (ret < 0 && rte_errno != EEXIST)
 			return ret;
-		if ((flags & RTE_PKTMBUF_POOL_F_PINNED_EXT_BUF) == 0)
+		if (!is_extmem)
 			rte_mempool_mem_iter(mp, mlx5_rxq_mempool_register_cb,
 					     &rxq_ctrl->rxq);
 	}
-- 
2.25.1



* [PATCH v2] common/mlx5: fix mempool registration
  2021-11-17 18:49 [PATCH] common/mlx5: fix mempool registration Dmitry Kozlyuk
@ 2021-11-18 15:25 ` Dmitry Kozlyuk
  2021-11-19 14:31   ` [PATCH v3] " Dmitry Kozlyuk
  0 siblings, 1 reply; 4+ messages in thread
From: Dmitry Kozlyuk @ 2021-11-18 15:25 UTC (permalink / raw)
  To: dev; +Cc: Raslan Darawsheh, Michael Baum, Matan Azrad, Viacheslav Ovsiienko

Mempool registration was not correctly processing
mempools with the RTE_PKTMBUF_POOL_F_PINNED_EXT_BUF flag set
("pinned mempools" for short), because at registration time
it is not known whether the mempool is a pktmbuf one,
and its elements may not yet be initialized for analysis.
Attempts had been made to recognize such pools,
but there was no robust solution: only the owner of a mempool
(the application or a device) knows its type.
This patch extends the common/mlx5 registration code
to accept a hint that the mempool is a pinned one
and uses this capability from the net/mlx5 driver.

1. Remove all code assuming pktmbuf pool type
   or trying to recognize the type of a pool.
2. Register pinned mempools used for Rx
   and their external memory on port start.
   Populate the MR cache with all their MRs.
3. Change Tx slow path logic as follows:
   3.1. Search the mempool database for a memory region (MR)
        by the mbuf pool and its buffer address.
   3.2. If no MR for the address is found for the mempool,
        and the mempool contains only pinned external buffers,
        register the mempool and its external pinned memory.
   3.3. Fall back to using page-based MRs in other cases
        (for example, a buffer with externally attached memory,
        but not from a pinned mempool).

Fixes: 690b2a88c2f7 ("common/mlx5: add mempool registration facilities")
Fixes: fec28ca0e3a9 ("net/mlx5: support mempool registration")

Signed-off-by: Dmitry Kozlyuk <dkozlyuk@nvidia.com>
Reviewed-by: Matan Azrad <matan@nvidia.com>
Reviewed-by: Viacheslav Ovsiienko <viacheslavo@nvidia.com>
---
v2: 1) rebase on ToT
    2) fix MR cache population
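
As a reading aid for item 2 of the change list, here is a hedged sketch of
how an Rx queue can combine registration with the new
mlx5_mr_mempool_populate_cache() helper. It mirrors the
mlx5_rxq_mempool_register() change below, assuming the driver's usual
includes; error handling is trimmed and the function name is invented for
illustration.

/* Illustration only: register an Rx mempool and pre-fill the queue cache. */
static int
rxq_register_and_populate_sketch(struct mlx5_rxq_ctrl *rxq_ctrl,
				 struct rte_mempool *mp)
{
	bool is_extmem = (rte_pktmbuf_priv_flags(mp) &
			  RTE_PKTMBUF_POOL_F_PINNED_EXT_BUF) != 0;
	int ret;

	/* Repeated registration is tolerated (EEXIST). */
	ret = mlx5_mr_mempool_register(rxq_ctrl->sh->cdev, mp, is_extmem);
	if (ret < 0 && rte_errno != EEXIST)
		return ret;
	/* Insert the LKeys of all mempool MRs into the per-queue cache. */
	return mlx5_mr_mempool_populate_cache(&rxq_ctrl->rxq.mr_ctrl, mp);
}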

 drivers/common/mlx5/mlx5_common.c    |  11 +-
 drivers/common/mlx5/mlx5_common_mp.c |   4 +-
 drivers/common/mlx5/mlx5_common_mp.h |  10 +-
 drivers/common/mlx5/mlx5_common_mr.c | 166 ++++++++++++++++++++-------
 drivers/common/mlx5/mlx5_common_mr.h |  15 +--
 drivers/common/mlx5/version.map      |   1 +
 drivers/net/mlx5/linux/mlx5_mp_os.c  |   3 +-
 drivers/net/mlx5/mlx5_rxq.c          |   2 +-
 drivers/net/mlx5/mlx5_trigger.c      |  40 ++-----
 9 files changed, 158 insertions(+), 94 deletions(-)

diff --git a/drivers/common/mlx5/mlx5_common.c b/drivers/common/mlx5/mlx5_common.c
index 66c2c08b7d..f1650f94c6 100644
--- a/drivers/common/mlx5/mlx5_common.c
+++ b/drivers/common/mlx5/mlx5_common.c
@@ -317,9 +317,9 @@ mlx5_dev_to_pci_str(const struct rte_device *dev, char *addr, size_t size)
  */
 static int
 mlx5_dev_mempool_register(struct mlx5_common_device *cdev,
-			  struct rte_mempool *mp)
+			  struct rte_mempool *mp, bool is_extmem)
 {
-	return mlx5_mr_mempool_register(cdev, mp);
+	return mlx5_mr_mempool_register(cdev, mp, is_extmem);
 }
 
 /**
@@ -353,7 +353,7 @@ mlx5_dev_mempool_register_cb(struct rte_mempool *mp, void *arg)
 	struct mlx5_common_device *cdev = arg;
 	int ret;
 
-	ret = mlx5_dev_mempool_register(cdev, mp);
+	ret = mlx5_dev_mempool_register(cdev, mp, false);
 	if (ret < 0 && rte_errno != EEXIST)
 		DRV_LOG(ERR,
 			"Failed to register existing mempool %s for PD %p: %s",
@@ -390,13 +390,10 @@ mlx5_dev_mempool_event_cb(enum rte_mempool_event event, struct rte_mempool *mp,
 			  void *arg)
 {
 	struct mlx5_common_device *cdev = arg;
-	bool extmem = mlx5_mempool_is_extmem(mp);
 
 	switch (event) {
 	case RTE_MEMPOOL_EVENT_READY:
-		if (extmem)
-			break;
-		if (mlx5_dev_mempool_register(cdev, mp) < 0)
+		if (mlx5_dev_mempool_register(cdev, mp, false) < 0)
 			DRV_LOG(ERR,
 				"Failed to register new mempool %s for PD %p: %s",
 				mp->name, cdev->pd, rte_strerror(rte_errno));
diff --git a/drivers/common/mlx5/mlx5_common_mp.c b/drivers/common/mlx5/mlx5_common_mp.c
index 536d61f66c..a7a671b7c5 100644
--- a/drivers/common/mlx5/mlx5_common_mp.c
+++ b/drivers/common/mlx5/mlx5_common_mp.c
@@ -65,7 +65,8 @@ mlx5_mp_req_mr_create(struct mlx5_common_device *cdev, uintptr_t addr)
  */
 int
 mlx5_mp_req_mempool_reg(struct mlx5_common_device *cdev,
-			struct rte_mempool *mempool, bool reg)
+			struct rte_mempool *mempool, bool reg,
+			bool is_extmem)
 {
 	struct rte_mp_msg mp_req;
 	struct rte_mp_msg *mp_res;
@@ -82,6 +83,7 @@ mlx5_mp_req_mempool_reg(struct mlx5_common_device *cdev,
 		     MLX5_MP_REQ_MEMPOOL_UNREGISTER;
 	mp_init_port_agnostic_msg(&mp_req, type);
 	arg->mempool = mempool;
+	arg->is_extmem = is_extmem;
 	arg->cdev = cdev;
 	ret = rte_mp_request_sync(&mp_req, &mp_rep, &ts);
 	if (ret) {
diff --git a/drivers/common/mlx5/mlx5_common_mp.h b/drivers/common/mlx5/mlx5_common_mp.h
index b1e3a41a20..4599ba8f92 100644
--- a/drivers/common/mlx5/mlx5_common_mp.h
+++ b/drivers/common/mlx5/mlx5_common_mp.h
@@ -37,9 +37,12 @@ struct mlx5_mp_arg_queue_id {
 
 struct mlx5_mp_arg_mr_manage {
 	struct mlx5_common_device *cdev;
+	RTE_STD_C11
 	union {
-		struct rte_mempool *mempool;
-		/* MLX5_MP_REQ_MEMPOOL_(UN)REGISTER */
+		struct {
+			struct rte_mempool *mempool;
+			bool is_extmem;
+		}; /* MLX5_MP_REQ_MEMPOOL_(UN)REGISTER */
 		uintptr_t addr; /* MLX5_MP_REQ_CREATE_MR */
 	};
 };
@@ -134,7 +137,8 @@ __rte_internal
 int mlx5_mp_req_mr_create(struct mlx5_common_device *cdev, uintptr_t addr);
 __rte_internal
 int mlx5_mp_req_mempool_reg(struct mlx5_common_device *cdev,
-			    struct rte_mempool *mempool, bool reg);
+			    struct rte_mempool *mempool, bool reg,
+			    bool is_extmem);
 __rte_internal
 int mlx5_mp_req_queue_state_modify(struct mlx5_mp_id *mp_id,
 				   struct mlx5_mp_arg_queue_state_modify *sm);
diff --git a/drivers/common/mlx5/mlx5_common_mr.c b/drivers/common/mlx5/mlx5_common_mr.c
index 70365e1466..91feacf94d 100644
--- a/drivers/common/mlx5/mlx5_common_mr.c
+++ b/drivers/common/mlx5/mlx5_common_mr.c
@@ -47,6 +47,8 @@ struct mlx5_mempool_reg {
 	struct mlx5_mempool_mr *mrs;
 	/** Number of memory regions. */
 	unsigned int mrs_n;
+	/** Whether the MR were created for external pinned memory. */
+	bool is_extmem;
 };
 
 void
@@ -1403,6 +1405,8 @@ mlx5_mempool_get_extmem(struct rte_mempool *mp, struct mlx5_range **out,
  *
  * @param[in] mp
  *   Analyzed mempool.
+ * @param[in] is_extmem
+ *   Whether the pool contains only external pinned buffers.
  * @param[out] out
  *   Receives the ranges, caller must release it with free().
  * @param[out] ount_n
@@ -1412,17 +1416,16 @@ mlx5_mempool_get_extmem(struct rte_mempool *mp, struct mlx5_range **out,
  *   0 on success, (-1) on failure.
  */
 static int
-mlx5_get_mempool_ranges(struct rte_mempool *mp, struct mlx5_range **out,
-			unsigned int *out_n)
+mlx5_get_mempool_ranges(struct rte_mempool *mp, bool is_extmem,
+			struct mlx5_range **out, unsigned int *out_n)
 {
 	struct mlx5_range *chunks;
 	unsigned int chunks_n, contig_n, i;
 	int ret;
 
 	/* Collect the pool underlying memory. */
-	ret = mlx5_mempool_is_extmem(mp) ?
-	      mlx5_mempool_get_extmem(mp, &chunks, &chunks_n) :
-	      mlx5_mempool_get_chunks(mp, &chunks, &chunks_n);
+	ret = is_extmem ? mlx5_mempool_get_extmem(mp, &chunks, &chunks_n) :
+			  mlx5_mempool_get_chunks(mp, &chunks, &chunks_n);
 	if (ret < 0)
 		return ret;
 	/* Merge adjacent chunks and place them at the beginning. */
@@ -1446,6 +1449,8 @@ mlx5_get_mempool_ranges(struct rte_mempool *mp, struct mlx5_range **out,
  *
  * @param[in] mp
  *   Mempool to analyze.
+ * @param[in] is_extmem
+ *   Whether the pool contains only external pinned buffers.
  * @param[out] out
  *   Receives memory ranges to register, aligned to the system page size.
  *   The caller must release them with free().
@@ -1458,14 +1463,15 @@ mlx5_get_mempool_ranges(struct rte_mempool *mp, struct mlx5_range **out,
  *   0 on success, (-1) on failure.
  */
 static int
-mlx5_mempool_reg_analyze(struct rte_mempool *mp, struct mlx5_range **out,
-			 unsigned int *out_n, bool *share_hugepage)
+mlx5_mempool_reg_analyze(struct rte_mempool *mp, bool is_extmem,
+			 struct mlx5_range **out, unsigned int *out_n,
+			 bool *share_hugepage)
 {
 	struct mlx5_range *ranges = NULL;
 	unsigned int i, ranges_n = 0;
 	struct rte_memseg_list *msl;
 
-	if (mlx5_get_mempool_ranges(mp, &ranges, &ranges_n) < 0) {
+	if (mlx5_get_mempool_ranges(mp, is_extmem, &ranges, &ranges_n) < 0) {
 		DRV_LOG(ERR, "Cannot get address ranges for mempool %s",
 			mp->name);
 		return -1;
@@ -1507,7 +1513,8 @@ mlx5_mempool_reg_analyze(struct rte_mempool *mp, struct mlx5_range **out,
 
 /** Create a registration object for the mempool. */
 static struct mlx5_mempool_reg *
-mlx5_mempool_reg_create(struct rte_mempool *mp, unsigned int mrs_n)
+mlx5_mempool_reg_create(struct rte_mempool *mp, unsigned int mrs_n,
+			bool is_extmem)
 {
 	struct mlx5_mempool_reg *mpr = NULL;
 
@@ -1522,6 +1529,7 @@ mlx5_mempool_reg_create(struct rte_mempool *mp, unsigned int mrs_n)
 	mpr->mp = mp;
 	mpr->mrs = (struct mlx5_mempool_mr *)(mpr + 1);
 	mpr->mrs_n = mrs_n;
+	mpr->is_extmem = is_extmem;
 	return mpr;
 }
 
@@ -1586,31 +1594,32 @@ mlx5_mempool_reg_detach(struct mlx5_mempool_reg *mpr)
 
 static int
 mlx5_mr_mempool_register_primary(struct mlx5_mr_share_cache *share_cache,
-				 void *pd, struct rte_mempool *mp)
+				 void *pd, struct rte_mempool *mp,
+				 bool is_extmem)
 {
 	struct mlx5_range *ranges = NULL;
-	struct mlx5_mempool_reg *mpr, *new_mpr;
+	struct mlx5_mempool_reg *mpr, *old_mpr, *new_mpr;
 	unsigned int i, ranges_n;
-	bool share_hugepage;
+	bool share_hugepage, standalone = false;
 	int ret = -1;
 
 	/* Early check to avoid unnecessary creation of MRs. */
 	rte_rwlock_read_lock(&share_cache->rwlock);
-	mpr = mlx5_mempool_reg_lookup(share_cache, mp);
+	old_mpr = mlx5_mempool_reg_lookup(share_cache, mp);
 	rte_rwlock_read_unlock(&share_cache->rwlock);
-	if (mpr != NULL) {
+	if (old_mpr != NULL && (!is_extmem || old_mpr->is_extmem)) {
 		DRV_LOG(DEBUG, "Mempool %s is already registered for PD %p",
 			mp->name, pd);
 		rte_errno = EEXIST;
 		goto exit;
 	}
-	if (mlx5_mempool_reg_analyze(mp, &ranges, &ranges_n,
+	if (mlx5_mempool_reg_analyze(mp, is_extmem, &ranges, &ranges_n,
 				     &share_hugepage) < 0) {
 		DRV_LOG(ERR, "Cannot get mempool %s memory ranges", mp->name);
 		rte_errno = ENOMEM;
 		goto exit;
 	}
-	new_mpr = mlx5_mempool_reg_create(mp, ranges_n);
+	new_mpr = mlx5_mempool_reg_create(mp, ranges_n, is_extmem);
 	if (new_mpr == NULL) {
 		DRV_LOG(ERR,
 			"Cannot create a registration object for mempool %s in PD %p",
@@ -1670,6 +1679,12 @@ mlx5_mr_mempool_register_primary(struct mlx5_mr_share_cache *share_cache,
 	/* Concurrent registration is not supposed to happen. */
 	rte_rwlock_write_lock(&share_cache->rwlock);
 	mpr = mlx5_mempool_reg_lookup(share_cache, mp);
+	if (mpr == old_mpr && old_mpr != NULL) {
+		LIST_REMOVE(old_mpr, next);
+		standalone = mlx5_mempool_reg_detach(mpr);
+		/* No need to flush the cache: old MRs cannot be in use. */
+		mpr = NULL;
+	}
 	if (mpr == NULL) {
 		mlx5_mempool_reg_attach(new_mpr);
 		LIST_INSERT_HEAD(&share_cache->mempool_reg_list, new_mpr, next);
@@ -1682,6 +1697,10 @@ mlx5_mr_mempool_register_primary(struct mlx5_mr_share_cache *share_cache,
 		mlx5_mempool_reg_destroy(share_cache, new_mpr, true);
 		rte_errno = EEXIST;
 		goto exit;
+	} else if (old_mpr != NULL) {
+		DRV_LOG(DEBUG, "Mempool %s registration for PD %p updated for external memory",
+			mp->name, pd);
+		mlx5_mempool_reg_destroy(share_cache, old_mpr, standalone);
 	}
 exit:
 	free(ranges);
@@ -1690,9 +1709,9 @@ mlx5_mr_mempool_register_primary(struct mlx5_mr_share_cache *share_cache,
 
 static int
 mlx5_mr_mempool_register_secondary(struct mlx5_common_device *cdev,
-				   struct rte_mempool *mp)
+				   struct rte_mempool *mp, bool is_extmem)
 {
-	return mlx5_mp_req_mempool_reg(cdev, mp, true);
+	return mlx5_mp_req_mempool_reg(cdev, mp, true, is_extmem);
 }
 
 /**
@@ -1708,16 +1727,17 @@ mlx5_mr_mempool_register_secondary(struct mlx5_common_device *cdev,
  */
 int
 mlx5_mr_mempool_register(struct mlx5_common_device *cdev,
-			 struct rte_mempool *mp)
+			 struct rte_mempool *mp, bool is_extmem)
 {
 	if (mp->flags & RTE_MEMPOOL_F_NON_IO)
 		return 0;
 	switch (rte_eal_process_type()) {
 	case RTE_PROC_PRIMARY:
 		return mlx5_mr_mempool_register_primary(&cdev->mr_scache,
-							cdev->pd, mp);
+							cdev->pd, mp,
+							is_extmem);
 	case RTE_PROC_SECONDARY:
-		return mlx5_mr_mempool_register_secondary(cdev, mp);
+		return mlx5_mr_mempool_register_secondary(cdev, mp, is_extmem);
 	default:
 		return -1;
 	}
@@ -1756,7 +1776,7 @@ static int
 mlx5_mr_mempool_unregister_secondary(struct mlx5_common_device *cdev,
 				     struct rte_mempool *mp)
 {
-	return mlx5_mp_req_mempool_reg(cdev, mp, false);
+	return mlx5_mp_req_mempool_reg(cdev, mp, false, false /* is_extmem */);
 }
 
 /**
@@ -1868,6 +1888,65 @@ mlx5_lookup_mempool_regs(struct mlx5_mr_ctrl *mr_ctrl,
 	return lkey;
 }
 
+/**
+ * Populate cache with LKeys of all MRs used by the mempool.
+ * It is intended to be used to register Rx mempools in advance.
+ *
+ * @param mr_ctrl
+ *  Per-queue MR control handle.
+ * @param mp
+ *  Registered memory pool.
+ *
+ * @return
+ *  0 on success, (-1) on failure and rte_errno is set.
+ */
+int
+mlx5_mr_mempool_populate_cache(struct mlx5_mr_ctrl *mr_ctrl,
+			       struct rte_mempool *mp)
+{
+	struct mlx5_mr_share_cache *share_cache =
+		container_of(mr_ctrl->dev_gen_ptr, struct mlx5_mr_share_cache,
+			     dev_gen);
+	struct mlx5_mr_btree *bt = &mr_ctrl->cache_bh;
+	struct mlx5_mempool_reg *mpr;
+	unsigned int i;
+
+	/*
+	 * Registration is valid after the lock is released,
+	 * because the function is called after the mempool is registered.
+	 */
+	rte_rwlock_read_lock(&share_cache->rwlock);
+	mpr = mlx5_mempool_reg_lookup(share_cache, mp);
+	rte_rwlock_read_unlock(&share_cache->rwlock);
+	if (mpr == NULL) {
+		DRV_LOG(ERR, "Mempool %s is not registered", mp->name);
+		rte_errno = ENOENT;
+		return -1;
+	}
+	for (i = 0; i < mpr->mrs_n; i++) {
+		struct mlx5_mempool_mr *mr = &mpr->mrs[i];
+		struct mr_cache_entry entry;
+		uint32_t lkey;
+		uint16_t idx;
+
+		lkey = mr_btree_lookup(bt, &idx, (uintptr_t)mr->pmd_mr.addr);
+		if (lkey != UINT32_MAX)
+			continue;
+		if (bt->len == bt->size)
+			mr_btree_expand(bt, bt->size << 1);
+		entry.start = (uintptr_t)mr->pmd_mr.addr;
+		entry.end = entry.start + mr->pmd_mr.len;
+		entry.lkey = rte_cpu_to_be_32(mr->pmd_mr.lkey);
+		if (mr_btree_insert(bt, &entry) < 0) {
+			DRV_LOG(ERR, "Cannot insert cache entry for mempool %s MR %08x",
+				mp->name, entry.lkey);
+			rte_errno = EINVAL;
+			return -1;
+		}
+	}
+	return 0;
+}
+
 /**
  * Bottom-half lookup for the address from the mempool.
  *
@@ -1909,6 +1988,8 @@ mlx5_mr_mempool2mr_bh(struct mlx5_mr_ctrl *mr_ctrl,
 uint32_t
 mlx5_mr_mb2mr_bh(struct mlx5_mr_ctrl *mr_ctrl, struct rte_mbuf *mb)
 {
+	struct rte_mempool *mp;
+	struct mlx5_mprq_buf *buf;
 	uint32_t lkey;
 	uintptr_t addr = (uintptr_t)mb->buf_addr;
 	struct mlx5_mr_share_cache *share_cache =
@@ -1917,27 +1998,26 @@ mlx5_mr_mb2mr_bh(struct mlx5_mr_ctrl *mr_ctrl, struct rte_mbuf *mb)
 	struct mlx5_common_device *cdev =
 		container_of(share_cache, struct mlx5_common_device, mr_scache);
 
-	if (cdev->config.mr_mempool_reg_en) {
-		struct rte_mempool *mp = NULL;
-		struct mlx5_mprq_buf *buf;
-
-		if (!RTE_MBUF_HAS_EXTBUF(mb)) {
-			mp = mlx5_mb2mp(mb);
-		} else if (mb->shinfo->free_cb == mlx5_mprq_buf_free_cb) {
-			/* Recover MPRQ mempool. */
-			buf = mb->shinfo->fcb_opaque;
-			mp = buf->mp;
-		}
-		if (mp != NULL) {
-			lkey = mlx5_mr_mempool2mr_bh(mr_ctrl, mp, addr);
-			/*
-			 * Lookup can only fail on invalid input, e.g. "addr"
-			 * is not from "mp" or "mp" has MEMPOOL_F_NON_IO set.
-			 */
-			if (lkey != UINT32_MAX)
-				return lkey;
-		}
-		/* Fallback for generic mechanism in corner cases. */
+	/* Recover MPRQ mempool. */
+	if (RTE_MBUF_HAS_EXTBUF(mb) &&
+	    mb->shinfo->free_cb == mlx5_mprq_buf_free_cb) {
+		buf = mb->shinfo->fcb_opaque;
+		mp = buf->mp;
+	} else {
+		mp = mlx5_mb2mp(mb);
+	}
+	lkey = mlx5_mr_mempool2mr_bh(mr_ctrl, mp, addr);
+	if (lkey != UINT32_MAX)
+		return lkey;
+	/* Register pinned external memory if the mempool is not used for Rx. */
+	if (cdev->config.mr_mempool_reg_en &&
+	    (rte_pktmbuf_priv_flags(mp) & RTE_PKTMBUF_POOL_F_PINNED_EXT_BUF)) {
+		if (mlx5_mr_mempool_register(cdev, mp, true) < 0)
+			return UINT32_MAX;
+		lkey = mlx5_mr_mempool2mr_bh(mr_ctrl, mp, addr);
+		MLX5_ASSERT(lkey != UINT32_MAX);
+		return lkey;
 	}
+	/* Fallback to generic mechanism in corner cases. */
 	return mlx5_mr_addr2mr_bh(mr_ctrl, addr);
 }
diff --git a/drivers/common/mlx5/mlx5_common_mr.h b/drivers/common/mlx5/mlx5_common_mr.h
index de0cab29cc..cf384b6748 100644
--- a/drivers/common/mlx5/mlx5_common_mr.h
+++ b/drivers/common/mlx5/mlx5_common_mr.h
@@ -255,20 +255,15 @@ mlx5_os_set_reg_mr_cb(mlx5_reg_mr_t *reg_mr_cb, mlx5_dereg_mr_t *dereg_mr_cb);
 __rte_internal
 int
 mlx5_mr_mempool_register(struct mlx5_common_device *cdev,
-			 struct rte_mempool *mp);
+			 struct rte_mempool *mp, bool is_extmem);
 __rte_internal
 int
 mlx5_mr_mempool_unregister(struct mlx5_common_device *cdev,
 			   struct rte_mempool *mp);
 
-/** Check if @p mp has buffers pinned in external memory. */
-static inline bool
-mlx5_mempool_is_extmem(struct rte_mempool *mp)
-{
-	return (mp->private_data_size ==
-		sizeof(struct rte_pktmbuf_pool_private)) &&
-	       (mp->elt_size >= sizeof(struct rte_mbuf)) &&
-	       (rte_pktmbuf_priv_flags(mp) & RTE_PKTMBUF_POOL_F_PINNED_EXT_BUF);
-}
+__rte_internal
+int
+mlx5_mr_mempool_populate_cache(struct mlx5_mr_ctrl *mr_ctrl,
+			       struct rte_mempool *mp);
 
 #endif /* RTE_PMD_MLX5_COMMON_MR_H_ */
diff --git a/drivers/common/mlx5/version.map b/drivers/common/mlx5/version.map
index 8a62dc2782..34e86004a0 100644
--- a/drivers/common/mlx5/version.map
+++ b/drivers/common/mlx5/version.map
@@ -145,4 +145,5 @@ INTERNAL {
 	mlx5_mr_mempool_unregister;
 	mlx5_mp_req_mempool_reg;
 	mlx5_mr_mempool2mr_bh;
+	mlx5_mr_mempool_populate_cache;
 };
diff --git a/drivers/net/mlx5/linux/mlx5_mp_os.c b/drivers/net/mlx5/linux/mlx5_mp_os.c
index edc5203dd6..c448a3e9eb 100644
--- a/drivers/net/mlx5/linux/mlx5_mp_os.c
+++ b/drivers/net/mlx5/linux/mlx5_mp_os.c
@@ -48,7 +48,8 @@ mlx5_mp_os_handle_port_agnostic(const struct rte_mp_msg *mp_msg,
 		return rte_mp_reply(&mp_res, peer);
 	case MLX5_MP_REQ_MEMPOOL_REGISTER:
 		mp_init_port_agnostic_msg(&mp_res, param->type);
-		res->result = mlx5_mr_mempool_register(mng->cdev, mng->mempool);
+		res->result = mlx5_mr_mempool_register(mng->cdev, mng->mempool,
+						       mng->is_extmem);
 		return rte_mp_reply(&mp_res, peer);
 	case MLX5_MP_REQ_MEMPOOL_UNREGISTER:
 		mp_init_port_agnostic_msg(&mp_res, param->type);
diff --git a/drivers/net/mlx5/mlx5_rxq.c b/drivers/net/mlx5/mlx5_rxq.c
index 480f4f9f07..e406779faf 100644
--- a/drivers/net/mlx5/mlx5_rxq.c
+++ b/drivers/net/mlx5/mlx5_rxq.c
@@ -1458,7 +1458,7 @@ mlx5_mprq_alloc_mp(struct rte_eth_dev *dev)
 		rte_errno = ENOMEM;
 		return -rte_errno;
 	}
-	ret = mlx5_mr_mempool_register(priv->sh->cdev, mp);
+	ret = mlx5_mr_mempool_register(priv->sh->cdev, mp, false);
 	if (ret < 0 && rte_errno != EEXIST) {
 		ret = rte_errno;
 		DRV_LOG(ERR, "port %u failed to register a mempool for Multi-Packet RQ",
diff --git a/drivers/net/mlx5/mlx5_trigger.c b/drivers/net/mlx5/mlx5_trigger.c
index e2bfde19c7..dcdd0624a1 100644
--- a/drivers/net/mlx5/mlx5_trigger.c
+++ b/drivers/net/mlx5/mlx5_trigger.c
@@ -105,21 +105,6 @@ mlx5_txq_start(struct rte_eth_dev *dev)
 	return -rte_errno;
 }
 
-/**
- * Translate the chunk address to MR key in order to put in into the cache.
- */
-static void
-mlx5_rxq_mempool_register_cb(struct rte_mempool *mp, void *opaque,
-			     struct rte_mempool_memhdr *memhdr,
-			     unsigned int idx)
-{
-	struct mlx5_rxq_data *rxq = opaque;
-
-	RTE_SET_USED(mp);
-	RTE_SET_USED(idx);
-	mlx5_rx_addr2mr(rxq, (uintptr_t)memhdr->addr);
-}
-
 /**
  * Register Rx queue mempools and fill the Rx queue cache.
  * This function tolerates repeated mempool registration.
@@ -139,24 +124,23 @@ mlx5_rxq_mempool_register(struct mlx5_rxq_ctrl *rxq_ctrl)
 
 	mlx5_mr_flush_local_cache(&rxq_ctrl->rxq.mr_ctrl);
 	/* MPRQ mempool is registered on creation, just fill the cache. */
-	if (mlx5_rxq_mprq_enabled(&rxq_ctrl->rxq)) {
-		rte_mempool_mem_iter(rxq_ctrl->rxq.mprq_mp,
-				     mlx5_rxq_mempool_register_cb,
-				     &rxq_ctrl->rxq);
-		return 0;
-	}
+	if (mlx5_rxq_mprq_enabled(&rxq_ctrl->rxq))
+		return mlx5_mr_mempool_populate_cache(&rxq_ctrl->rxq.mr_ctrl,
+					              rxq_ctrl->rxq.mprq_mp);
 	for (s = 0; s < rxq_ctrl->rxq.rxseg_n; s++) {
-		uint32_t flags;
+		bool is_extmem;
 
 		mp = rxq_ctrl->rxq.rxseg[s].mp;
-		flags = mp != rxq_ctrl->rxq.mprq_mp ?
-			rte_pktmbuf_priv_flags(mp) : 0;
-		ret = mlx5_mr_mempool_register(rxq_ctrl->sh->cdev, mp);
+		is_extmem = (rte_pktmbuf_priv_flags(mp) &
+			     RTE_PKTMBUF_POOL_F_PINNED_EXT_BUF) != 0;
+		ret = mlx5_mr_mempool_register(rxq_ctrl->sh->cdev, mp,
+					       is_extmem);
 		if (ret < 0 && rte_errno != EEXIST)
 			return ret;
-		if ((flags & RTE_PKTMBUF_POOL_F_PINNED_EXT_BUF) == 0)
-			rte_mempool_mem_iter(mp, mlx5_rxq_mempool_register_cb,
-					     &rxq_ctrl->rxq);
+		ret = mlx5_mr_mempool_populate_cache(&rxq_ctrl->rxq.mr_ctrl,
+						     mp);
+		if (ret < 0)
+			return ret;
 	}
 	return 0;
 }
-- 
2.25.1



* [PATCH v3] common/mlx5: fix mempool registration
  2021-11-18 15:25 ` [PATCH v2] " Dmitry Kozlyuk
@ 2021-11-19 14:31   ` Dmitry Kozlyuk
  2021-11-21 14:39     ` Raslan Darawsheh
  0 siblings, 1 reply; 4+ messages in thread
From: Dmitry Kozlyuk @ 2021-11-19 14:31 UTC (permalink / raw)
  To: dev; +Cc: Raslan Darawsheh, Michael Baum, Matan Azrad, Viacheslav Ovsiienko

Mempool registration was not correctly processing
mempools with the RTE_PKTMBUF_POOL_F_PINNED_EXT_BUF flag set
("pinned mempools" for short), because at registration time
it is not known whether the mempool is a pktmbuf one,
and its elements may not yet be initialized for analysis.
Attempts had been made to recognize such pools,
but there was no robust solution: only the owner of a mempool
(the application or a device) knows its type.
This patch extends the common/mlx5 registration code
to accept a hint that the mempool is a pinned one
and uses this capability from the net/mlx5 driver.

1. Remove all code assuming pktmbuf pool type
   or trying to recognize the type of a pool.
2. Register pinned mempools used for Rx
   and their external memory on port start.
   Populate the MR cache with all their MRs.
3. Change Tx slow path logic as follows:
   3.1. Search the mempool database for a memory region (MR)
        by the mbuf pool and its buffer address.
   3.2. If no MR for the address is found for the mempool,
        and the mempool contains only pinned external buffers,
        register the mempool and its external pinned memory.
   3.3. Fall back to using page-based MRs in other cases
        (for example, a buffer with externally attached memory,
        but not from a pinned mempool).

Fixes: 690b2a88c2f7 ("common/mlx5: add mempool registration facilities")
Fixes: fec28ca0e3a9 ("net/mlx5: support mempool registration")

Signed-off-by: Dmitry Kozlyuk <dkozlyuk@nvidia.com>
Reviewed-by: Matan Azrad <matan@nvidia.com>
Reviewed-by: Viacheslav Ovsiienko <viacheslavo@nvidia.com>
---
v3: fix build with GCC on RHEL7
v2: 1) rebase on ToT
    2) fix MR cache population
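
For reviewers of the multi-process path, a small sketch (helper name
invented) of how the new is_extmem hint is carried in the extended IPC
argument; the anonymous struct inside the union is a C11 feature, marked
with RTE_STD_C11 in mlx5_common_mp.h. On the primary side the handler
simply forwards the hint, as in the mlx5_mp_os.c hunk below.

/* Illustration only: filling the MP request argument with the new hint. */
static void
mp_req_fill_sketch(struct mlx5_mp_arg_mr_manage *arg,
		   struct mlx5_common_device *cdev,
		   struct rte_mempool *mp, bool is_extmem)
{
	arg->cdev = cdev;
	arg->mempool = mp;          /* member of the C11 anonymous struct */
	arg->is_extmem = is_extmem; /* forwarded to mlx5_mr_mempool_register() */
}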

 drivers/common/mlx5/mlx5_common.c    |  11 +-
 drivers/common/mlx5/mlx5_common_mp.c |   4 +-
 drivers/common/mlx5/mlx5_common_mp.h |  10 +-
 drivers/common/mlx5/mlx5_common_mr.c | 180 +++++++++++++++++++--------
 drivers/common/mlx5/mlx5_common_mr.h |  15 +--
 drivers/common/mlx5/version.map      |   1 +
 drivers/net/mlx5/linux/mlx5_mp_os.c  |   3 +-
 drivers/net/mlx5/mlx5_rxq.c          |   2 +-
 drivers/net/mlx5/mlx5_trigger.c      |  40 ++----
 9 files changed, 163 insertions(+), 103 deletions(-)

diff --git a/drivers/common/mlx5/mlx5_common.c b/drivers/common/mlx5/mlx5_common.c
index 66c2c08b7d..f1650f94c6 100644
--- a/drivers/common/mlx5/mlx5_common.c
+++ b/drivers/common/mlx5/mlx5_common.c
@@ -317,9 +317,9 @@ mlx5_dev_to_pci_str(const struct rte_device *dev, char *addr, size_t size)
  */
 static int
 mlx5_dev_mempool_register(struct mlx5_common_device *cdev,
-			  struct rte_mempool *mp)
+			  struct rte_mempool *mp, bool is_extmem)
 {
-	return mlx5_mr_mempool_register(cdev, mp);
+	return mlx5_mr_mempool_register(cdev, mp, is_extmem);
 }
 
 /**
@@ -353,7 +353,7 @@ mlx5_dev_mempool_register_cb(struct rte_mempool *mp, void *arg)
 	struct mlx5_common_device *cdev = arg;
 	int ret;
 
-	ret = mlx5_dev_mempool_register(cdev, mp);
+	ret = mlx5_dev_mempool_register(cdev, mp, false);
 	if (ret < 0 && rte_errno != EEXIST)
 		DRV_LOG(ERR,
 			"Failed to register existing mempool %s for PD %p: %s",
@@ -390,13 +390,10 @@ mlx5_dev_mempool_event_cb(enum rte_mempool_event event, struct rte_mempool *mp,
 			  void *arg)
 {
 	struct mlx5_common_device *cdev = arg;
-	bool extmem = mlx5_mempool_is_extmem(mp);
 
 	switch (event) {
 	case RTE_MEMPOOL_EVENT_READY:
-		if (extmem)
-			break;
-		if (mlx5_dev_mempool_register(cdev, mp) < 0)
+		if (mlx5_dev_mempool_register(cdev, mp, false) < 0)
 			DRV_LOG(ERR,
 				"Failed to register new mempool %s for PD %p: %s",
 				mp->name, cdev->pd, rte_strerror(rte_errno));
diff --git a/drivers/common/mlx5/mlx5_common_mp.c b/drivers/common/mlx5/mlx5_common_mp.c
index 536d61f66c..a7a671b7c5 100644
--- a/drivers/common/mlx5/mlx5_common_mp.c
+++ b/drivers/common/mlx5/mlx5_common_mp.c
@@ -65,7 +65,8 @@ mlx5_mp_req_mr_create(struct mlx5_common_device *cdev, uintptr_t addr)
  */
 int
 mlx5_mp_req_mempool_reg(struct mlx5_common_device *cdev,
-			struct rte_mempool *mempool, bool reg)
+			struct rte_mempool *mempool, bool reg,
+			bool is_extmem)
 {
 	struct rte_mp_msg mp_req;
 	struct rte_mp_msg *mp_res;
@@ -82,6 +83,7 @@ mlx5_mp_req_mempool_reg(struct mlx5_common_device *cdev,
 		     MLX5_MP_REQ_MEMPOOL_UNREGISTER;
 	mp_init_port_agnostic_msg(&mp_req, type);
 	arg->mempool = mempool;
+	arg->is_extmem = is_extmem;
 	arg->cdev = cdev;
 	ret = rte_mp_request_sync(&mp_req, &mp_rep, &ts);
 	if (ret) {
diff --git a/drivers/common/mlx5/mlx5_common_mp.h b/drivers/common/mlx5/mlx5_common_mp.h
index b1e3a41a20..4599ba8f92 100644
--- a/drivers/common/mlx5/mlx5_common_mp.h
+++ b/drivers/common/mlx5/mlx5_common_mp.h
@@ -37,9 +37,12 @@ struct mlx5_mp_arg_queue_id {
 
 struct mlx5_mp_arg_mr_manage {
 	struct mlx5_common_device *cdev;
+	RTE_STD_C11
 	union {
-		struct rte_mempool *mempool;
-		/* MLX5_MP_REQ_MEMPOOL_(UN)REGISTER */
+		struct {
+			struct rte_mempool *mempool;
+			bool is_extmem;
+		}; /* MLX5_MP_REQ_MEMPOOL_(UN)REGISTER */
 		uintptr_t addr; /* MLX5_MP_REQ_CREATE_MR */
 	};
 };
@@ -134,7 +137,8 @@ __rte_internal
 int mlx5_mp_req_mr_create(struct mlx5_common_device *cdev, uintptr_t addr);
 __rte_internal
 int mlx5_mp_req_mempool_reg(struct mlx5_common_device *cdev,
-			    struct rte_mempool *mempool, bool reg);
+			    struct rte_mempool *mempool, bool reg,
+			    bool is_extmem);
 __rte_internal
 int mlx5_mp_req_queue_state_modify(struct mlx5_mp_id *mp_id,
 				   struct mlx5_mp_arg_queue_state_modify *sm);
diff --git a/drivers/common/mlx5/mlx5_common_mr.c b/drivers/common/mlx5/mlx5_common_mr.c
index 70365e1466..47121d8ff7 100644
--- a/drivers/common/mlx5/mlx5_common_mr.c
+++ b/drivers/common/mlx5/mlx5_common_mr.c
@@ -47,6 +47,8 @@ struct mlx5_mempool_reg {
 	struct mlx5_mempool_mr *mrs;
 	/** Number of memory regions. */
 	unsigned int mrs_n;
+	/** Whether the MR were created for external pinned memory. */
+	bool is_extmem;
 };
 
 void
@@ -1302,16 +1304,14 @@ static int
 mlx5_mempool_get_chunks(struct rte_mempool *mp, struct mlx5_range **out,
 			unsigned int *out_n)
 {
-	struct mlx5_range *chunks;
 	unsigned int n;
 
 	DRV_LOG(DEBUG, "Collecting chunks of regular mempool %s", mp->name);
 	n = mp->nb_mem_chunks;
-	chunks = calloc(sizeof(chunks[0]), n);
-	if (chunks == NULL)
+	*out = calloc(sizeof(**out), n);
+	if (*out == NULL)
 		return -1;
-	rte_mempool_mem_iter(mp, mlx5_range_from_mempool_chunk, chunks);
-	*out = chunks;
+	rte_mempool_mem_iter(mp, mlx5_range_from_mempool_chunk, *out);
 	*out_n = n;
 	return 0;
 }
@@ -1390,11 +1390,9 @@ mlx5_mempool_get_extmem(struct rte_mempool *mp, struct mlx5_range **out,
 		mp->name);
 	memset(&data, 0, sizeof(data));
 	rte_mempool_obj_iter(mp, mlx5_mempool_get_extmem_cb, &data);
-	if (data.ret < 0)
-		return -1;
 	*out = data.heap;
 	*out_n = data.heap_size;
-	return 0;
+	return data.ret;
 }
 
 /**
@@ -1403,26 +1401,27 @@ mlx5_mempool_get_extmem(struct rte_mempool *mp, struct mlx5_range **out,
  *
  * @param[in] mp
  *   Analyzed mempool.
+ * @param[in] is_extmem
+ *   Whether the pool contains only external pinned buffers.
  * @param[out] out
  *   Receives the ranges, caller must release it with free().
- * @param[out] ount_n
+ * @param[out] out_n
  *   Receives the number of @p out elements.
  *
  * @return
  *   0 on success, (-1) on failure.
  */
 static int
-mlx5_get_mempool_ranges(struct rte_mempool *mp, struct mlx5_range **out,
-			unsigned int *out_n)
+mlx5_get_mempool_ranges(struct rte_mempool *mp, bool is_extmem,
+			struct mlx5_range **out, unsigned int *out_n)
 {
 	struct mlx5_range *chunks;
 	unsigned int chunks_n, contig_n, i;
 	int ret;
 
 	/* Collect the pool underlying memory. */
-	ret = mlx5_mempool_is_extmem(mp) ?
-	      mlx5_mempool_get_extmem(mp, &chunks, &chunks_n) :
-	      mlx5_mempool_get_chunks(mp, &chunks, &chunks_n);
+	ret = is_extmem ? mlx5_mempool_get_extmem(mp, &chunks, &chunks_n) :
+			  mlx5_mempool_get_chunks(mp, &chunks, &chunks_n);
 	if (ret < 0)
 		return ret;
 	/* Merge adjacent chunks and place them at the beginning. */
@@ -1446,6 +1445,8 @@ mlx5_get_mempool_ranges(struct rte_mempool *mp, struct mlx5_range **out,
  *
  * @param[in] mp
  *   Mempool to analyze.
+ * @param[in] is_extmem
+ *   Whether the pool contains only external pinned buffers.
  * @param[out] out
  *   Receives memory ranges to register, aligned to the system page size.
  *   The caller must release them with free().
@@ -1458,14 +1459,15 @@ mlx5_get_mempool_ranges(struct rte_mempool *mp, struct mlx5_range **out,
  *   0 on success, (-1) on failure.
  */
 static int
-mlx5_mempool_reg_analyze(struct rte_mempool *mp, struct mlx5_range **out,
-			 unsigned int *out_n, bool *share_hugepage)
+mlx5_mempool_reg_analyze(struct rte_mempool *mp, bool is_extmem,
+			 struct mlx5_range **out, unsigned int *out_n,
+			 bool *share_hugepage)
 {
 	struct mlx5_range *ranges = NULL;
 	unsigned int i, ranges_n = 0;
 	struct rte_memseg_list *msl;
 
-	if (mlx5_get_mempool_ranges(mp, &ranges, &ranges_n) < 0) {
+	if (mlx5_get_mempool_ranges(mp, is_extmem, &ranges, &ranges_n) < 0) {
 		DRV_LOG(ERR, "Cannot get address ranges for mempool %s",
 			mp->name);
 		return -1;
@@ -1507,7 +1509,8 @@ mlx5_mempool_reg_analyze(struct rte_mempool *mp, struct mlx5_range **out,
 
 /** Create a registration object for the mempool. */
 static struct mlx5_mempool_reg *
-mlx5_mempool_reg_create(struct rte_mempool *mp, unsigned int mrs_n)
+mlx5_mempool_reg_create(struct rte_mempool *mp, unsigned int mrs_n,
+			bool is_extmem)
 {
 	struct mlx5_mempool_reg *mpr = NULL;
 
@@ -1522,6 +1525,7 @@ mlx5_mempool_reg_create(struct rte_mempool *mp, unsigned int mrs_n)
 	mpr->mp = mp;
 	mpr->mrs = (struct mlx5_mempool_mr *)(mpr + 1);
 	mpr->mrs_n = mrs_n;
+	mpr->is_extmem = is_extmem;
 	return mpr;
 }
 
@@ -1586,31 +1590,32 @@ mlx5_mempool_reg_detach(struct mlx5_mempool_reg *mpr)
 
 static int
 mlx5_mr_mempool_register_primary(struct mlx5_mr_share_cache *share_cache,
-				 void *pd, struct rte_mempool *mp)
+				 void *pd, struct rte_mempool *mp,
+				 bool is_extmem)
 {
 	struct mlx5_range *ranges = NULL;
-	struct mlx5_mempool_reg *mpr, *new_mpr;
+	struct mlx5_mempool_reg *mpr, *old_mpr, *new_mpr;
 	unsigned int i, ranges_n;
-	bool share_hugepage;
+	bool share_hugepage, standalone = false;
 	int ret = -1;
 
 	/* Early check to avoid unnecessary creation of MRs. */
 	rte_rwlock_read_lock(&share_cache->rwlock);
-	mpr = mlx5_mempool_reg_lookup(share_cache, mp);
+	old_mpr = mlx5_mempool_reg_lookup(share_cache, mp);
 	rte_rwlock_read_unlock(&share_cache->rwlock);
-	if (mpr != NULL) {
+	if (old_mpr != NULL && (!is_extmem || old_mpr->is_extmem)) {
 		DRV_LOG(DEBUG, "Mempool %s is already registered for PD %p",
 			mp->name, pd);
 		rte_errno = EEXIST;
 		goto exit;
 	}
-	if (mlx5_mempool_reg_analyze(mp, &ranges, &ranges_n,
+	if (mlx5_mempool_reg_analyze(mp, is_extmem, &ranges, &ranges_n,
 				     &share_hugepage) < 0) {
 		DRV_LOG(ERR, "Cannot get mempool %s memory ranges", mp->name);
 		rte_errno = ENOMEM;
 		goto exit;
 	}
-	new_mpr = mlx5_mempool_reg_create(mp, ranges_n);
+	new_mpr = mlx5_mempool_reg_create(mp, ranges_n, is_extmem);
 	if (new_mpr == NULL) {
 		DRV_LOG(ERR,
 			"Cannot create a registration object for mempool %s in PD %p",
@@ -1670,6 +1675,12 @@ mlx5_mr_mempool_register_primary(struct mlx5_mr_share_cache *share_cache,
 	/* Concurrent registration is not supposed to happen. */
 	rte_rwlock_write_lock(&share_cache->rwlock);
 	mpr = mlx5_mempool_reg_lookup(share_cache, mp);
+	if (mpr == old_mpr && old_mpr != NULL) {
+		LIST_REMOVE(old_mpr, next);
+		standalone = mlx5_mempool_reg_detach(mpr);
+		/* No need to flush the cache: old MRs cannot be in use. */
+		mpr = NULL;
+	}
 	if (mpr == NULL) {
 		mlx5_mempool_reg_attach(new_mpr);
 		LIST_INSERT_HEAD(&share_cache->mempool_reg_list, new_mpr, next);
@@ -1682,6 +1693,10 @@ mlx5_mr_mempool_register_primary(struct mlx5_mr_share_cache *share_cache,
 		mlx5_mempool_reg_destroy(share_cache, new_mpr, true);
 		rte_errno = EEXIST;
 		goto exit;
+	} else if (old_mpr != NULL) {
+		DRV_LOG(DEBUG, "Mempool %s registration for PD %p updated for external memory",
+			mp->name, pd);
+		mlx5_mempool_reg_destroy(share_cache, old_mpr, standalone);
 	}
 exit:
 	free(ranges);
@@ -1690,9 +1705,9 @@ mlx5_mr_mempool_register_primary(struct mlx5_mr_share_cache *share_cache,
 
 static int
 mlx5_mr_mempool_register_secondary(struct mlx5_common_device *cdev,
-				   struct rte_mempool *mp)
+				   struct rte_mempool *mp, bool is_extmem)
 {
-	return mlx5_mp_req_mempool_reg(cdev, mp, true);
+	return mlx5_mp_req_mempool_reg(cdev, mp, true, is_extmem);
 }
 
 /**
@@ -1708,16 +1723,17 @@ mlx5_mr_mempool_register_secondary(struct mlx5_common_device *cdev,
  */
 int
 mlx5_mr_mempool_register(struct mlx5_common_device *cdev,
-			 struct rte_mempool *mp)
+			 struct rte_mempool *mp, bool is_extmem)
 {
 	if (mp->flags & RTE_MEMPOOL_F_NON_IO)
 		return 0;
 	switch (rte_eal_process_type()) {
 	case RTE_PROC_PRIMARY:
 		return mlx5_mr_mempool_register_primary(&cdev->mr_scache,
-							cdev->pd, mp);
+							cdev->pd, mp,
+							is_extmem);
 	case RTE_PROC_SECONDARY:
-		return mlx5_mr_mempool_register_secondary(cdev, mp);
+		return mlx5_mr_mempool_register_secondary(cdev, mp, is_extmem);
 	default:
 		return -1;
 	}
@@ -1756,7 +1772,7 @@ static int
 mlx5_mr_mempool_unregister_secondary(struct mlx5_common_device *cdev,
 				     struct rte_mempool *mp)
 {
-	return mlx5_mp_req_mempool_reg(cdev, mp, false);
+	return mlx5_mp_req_mempool_reg(cdev, mp, false, false /* is_extmem */);
 }
 
 /**
@@ -1868,6 +1884,65 @@ mlx5_lookup_mempool_regs(struct mlx5_mr_ctrl *mr_ctrl,
 	return lkey;
 }
 
+/**
+ * Populate cache with LKeys of all MRs used by the mempool.
+ * It is intended to be used to register Rx mempools in advance.
+ *
+ * @param mr_ctrl
+ *  Per-queue MR control handle.
+ * @param mp
+ *  Registered memory pool.
+ *
+ * @return
+ *  0 on success, (-1) on failure and rte_errno is set.
+ */
+int
+mlx5_mr_mempool_populate_cache(struct mlx5_mr_ctrl *mr_ctrl,
+			       struct rte_mempool *mp)
+{
+	struct mlx5_mr_share_cache *share_cache =
+		container_of(mr_ctrl->dev_gen_ptr, struct mlx5_mr_share_cache,
+			     dev_gen);
+	struct mlx5_mr_btree *bt = &mr_ctrl->cache_bh;
+	struct mlx5_mempool_reg *mpr;
+	unsigned int i;
+
+	/*
+	 * Registration is valid after the lock is released,
+	 * because the function is called after the mempool is registered.
+	 */
+	rte_rwlock_read_lock(&share_cache->rwlock);
+	mpr = mlx5_mempool_reg_lookup(share_cache, mp);
+	rte_rwlock_read_unlock(&share_cache->rwlock);
+	if (mpr == NULL) {
+		DRV_LOG(ERR, "Mempool %s is not registered", mp->name);
+		rte_errno = ENOENT;
+		return -1;
+	}
+	for (i = 0; i < mpr->mrs_n; i++) {
+		struct mlx5_mempool_mr *mr = &mpr->mrs[i];
+		struct mr_cache_entry entry;
+		uint32_t lkey;
+		uint16_t idx;
+
+		lkey = mr_btree_lookup(bt, &idx, (uintptr_t)mr->pmd_mr.addr);
+		if (lkey != UINT32_MAX)
+			continue;
+		if (bt->len == bt->size)
+			mr_btree_expand(bt, bt->size << 1);
+		entry.start = (uintptr_t)mr->pmd_mr.addr;
+		entry.end = entry.start + mr->pmd_mr.len;
+		entry.lkey = rte_cpu_to_be_32(mr->pmd_mr.lkey);
+		if (mr_btree_insert(bt, &entry) < 0) {
+			DRV_LOG(ERR, "Cannot insert cache entry for mempool %s MR %08x",
+				mp->name, entry.lkey);
+			rte_errno = EINVAL;
+			return -1;
+		}
+	}
+	return 0;
+}
+
 /**
  * Bottom-half lookup for the address from the mempool.
  *
@@ -1909,6 +1984,8 @@ mlx5_mr_mempool2mr_bh(struct mlx5_mr_ctrl *mr_ctrl,
 uint32_t
 mlx5_mr_mb2mr_bh(struct mlx5_mr_ctrl *mr_ctrl, struct rte_mbuf *mb)
 {
+	struct rte_mempool *mp;
+	struct mlx5_mprq_buf *buf;
 	uint32_t lkey;
 	uintptr_t addr = (uintptr_t)mb->buf_addr;
 	struct mlx5_mr_share_cache *share_cache =
@@ -1917,27 +1994,26 @@ mlx5_mr_mb2mr_bh(struct mlx5_mr_ctrl *mr_ctrl, struct rte_mbuf *mb)
 	struct mlx5_common_device *cdev =
 		container_of(share_cache, struct mlx5_common_device, mr_scache);
 
-	if (cdev->config.mr_mempool_reg_en) {
-		struct rte_mempool *mp = NULL;
-		struct mlx5_mprq_buf *buf;
-
-		if (!RTE_MBUF_HAS_EXTBUF(mb)) {
-			mp = mlx5_mb2mp(mb);
-		} else if (mb->shinfo->free_cb == mlx5_mprq_buf_free_cb) {
-			/* Recover MPRQ mempool. */
-			buf = mb->shinfo->fcb_opaque;
-			mp = buf->mp;
-		}
-		if (mp != NULL) {
-			lkey = mlx5_mr_mempool2mr_bh(mr_ctrl, mp, addr);
-			/*
-			 * Lookup can only fail on invalid input, e.g. "addr"
-			 * is not from "mp" or "mp" has MEMPOOL_F_NON_IO set.
-			 */
-			if (lkey != UINT32_MAX)
-				return lkey;
-		}
-		/* Fallback for generic mechanism in corner cases. */
+	/* Recover MPRQ mempool. */
+	if (RTE_MBUF_HAS_EXTBUF(mb) &&
+	    mb->shinfo->free_cb == mlx5_mprq_buf_free_cb) {
+		buf = mb->shinfo->fcb_opaque;
+		mp = buf->mp;
+	} else {
+		mp = mlx5_mb2mp(mb);
+	}
+	lkey = mlx5_mr_mempool2mr_bh(mr_ctrl, mp, addr);
+	if (lkey != UINT32_MAX)
+		return lkey;
+	/* Register pinned external memory if the mempool is not used for Rx. */
+	if (cdev->config.mr_mempool_reg_en &&
+	    (rte_pktmbuf_priv_flags(mp) & RTE_PKTMBUF_POOL_F_PINNED_EXT_BUF)) {
+		if (mlx5_mr_mempool_register(cdev, mp, true) < 0)
+			return UINT32_MAX;
+		lkey = mlx5_mr_mempool2mr_bh(mr_ctrl, mp, addr);
+		MLX5_ASSERT(lkey != UINT32_MAX);
+		return lkey;
 	}
+	/* Fallback to generic mechanism in corner cases. */
 	return mlx5_mr_addr2mr_bh(mr_ctrl, addr);
 }
diff --git a/drivers/common/mlx5/mlx5_common_mr.h b/drivers/common/mlx5/mlx5_common_mr.h
index de0cab29cc..cf384b6748 100644
--- a/drivers/common/mlx5/mlx5_common_mr.h
+++ b/drivers/common/mlx5/mlx5_common_mr.h
@@ -255,20 +255,15 @@ mlx5_os_set_reg_mr_cb(mlx5_reg_mr_t *reg_mr_cb, mlx5_dereg_mr_t *dereg_mr_cb);
 __rte_internal
 int
 mlx5_mr_mempool_register(struct mlx5_common_device *cdev,
-			 struct rte_mempool *mp);
+			 struct rte_mempool *mp, bool is_extmem);
 __rte_internal
 int
 mlx5_mr_mempool_unregister(struct mlx5_common_device *cdev,
 			   struct rte_mempool *mp);
 
-/** Check if @p mp has buffers pinned in external memory. */
-static inline bool
-mlx5_mempool_is_extmem(struct rte_mempool *mp)
-{
-	return (mp->private_data_size ==
-		sizeof(struct rte_pktmbuf_pool_private)) &&
-	       (mp->elt_size >= sizeof(struct rte_mbuf)) &&
-	       (rte_pktmbuf_priv_flags(mp) & RTE_PKTMBUF_POOL_F_PINNED_EXT_BUF);
-}
+__rte_internal
+int
+mlx5_mr_mempool_populate_cache(struct mlx5_mr_ctrl *mr_ctrl,
+			       struct rte_mempool *mp);
 
 #endif /* RTE_PMD_MLX5_COMMON_MR_H_ */
diff --git a/drivers/common/mlx5/version.map b/drivers/common/mlx5/version.map
index 8a62dc2782..34e86004a0 100644
--- a/drivers/common/mlx5/version.map
+++ b/drivers/common/mlx5/version.map
@@ -145,4 +145,5 @@ INTERNAL {
 	mlx5_mr_mempool_unregister;
 	mlx5_mp_req_mempool_reg;
 	mlx5_mr_mempool2mr_bh;
+	mlx5_mr_mempool_populate_cache;
 };
diff --git a/drivers/net/mlx5/linux/mlx5_mp_os.c b/drivers/net/mlx5/linux/mlx5_mp_os.c
index edc5203dd6..c448a3e9eb 100644
--- a/drivers/net/mlx5/linux/mlx5_mp_os.c
+++ b/drivers/net/mlx5/linux/mlx5_mp_os.c
@@ -48,7 +48,8 @@ mlx5_mp_os_handle_port_agnostic(const struct rte_mp_msg *mp_msg,
 		return rte_mp_reply(&mp_res, peer);
 	case MLX5_MP_REQ_MEMPOOL_REGISTER:
 		mp_init_port_agnostic_msg(&mp_res, param->type);
-		res->result = mlx5_mr_mempool_register(mng->cdev, mng->mempool);
+		res->result = mlx5_mr_mempool_register(mng->cdev, mng->mempool,
+						       mng->is_extmem);
 		return rte_mp_reply(&mp_res, peer);
 	case MLX5_MP_REQ_MEMPOOL_UNREGISTER:
 		mp_init_port_agnostic_msg(&mp_res, param->type);
diff --git a/drivers/net/mlx5/mlx5_rxq.c b/drivers/net/mlx5/mlx5_rxq.c
index 480f4f9f07..e406779faf 100644
--- a/drivers/net/mlx5/mlx5_rxq.c
+++ b/drivers/net/mlx5/mlx5_rxq.c
@@ -1458,7 +1458,7 @@ mlx5_mprq_alloc_mp(struct rte_eth_dev *dev)
 		rte_errno = ENOMEM;
 		return -rte_errno;
 	}
-	ret = mlx5_mr_mempool_register(priv->sh->cdev, mp);
+	ret = mlx5_mr_mempool_register(priv->sh->cdev, mp, false);
 	if (ret < 0 && rte_errno != EEXIST) {
 		ret = rte_errno;
 		DRV_LOG(ERR, "port %u failed to register a mempool for Multi-Packet RQ",
diff --git a/drivers/net/mlx5/mlx5_trigger.c b/drivers/net/mlx5/mlx5_trigger.c
index e2bfde19c7..68df3f8b4d 100644
--- a/drivers/net/mlx5/mlx5_trigger.c
+++ b/drivers/net/mlx5/mlx5_trigger.c
@@ -105,21 +105,6 @@ mlx5_txq_start(struct rte_eth_dev *dev)
 	return -rte_errno;
 }
 
-/**
- * Translate the chunk address to MR key in order to put in into the cache.
- */
-static void
-mlx5_rxq_mempool_register_cb(struct rte_mempool *mp, void *opaque,
-			     struct rte_mempool_memhdr *memhdr,
-			     unsigned int idx)
-{
-	struct mlx5_rxq_data *rxq = opaque;
-
-	RTE_SET_USED(mp);
-	RTE_SET_USED(idx);
-	mlx5_rx_addr2mr(rxq, (uintptr_t)memhdr->addr);
-}
-
 /**
  * Register Rx queue mempools and fill the Rx queue cache.
  * This function tolerates repeated mempool registration.
@@ -139,24 +124,23 @@ mlx5_rxq_mempool_register(struct mlx5_rxq_ctrl *rxq_ctrl)
 
 	mlx5_mr_flush_local_cache(&rxq_ctrl->rxq.mr_ctrl);
 	/* MPRQ mempool is registered on creation, just fill the cache. */
-	if (mlx5_rxq_mprq_enabled(&rxq_ctrl->rxq)) {
-		rte_mempool_mem_iter(rxq_ctrl->rxq.mprq_mp,
-				     mlx5_rxq_mempool_register_cb,
-				     &rxq_ctrl->rxq);
-		return 0;
-	}
+	if (mlx5_rxq_mprq_enabled(&rxq_ctrl->rxq))
+		return mlx5_mr_mempool_populate_cache(&rxq_ctrl->rxq.mr_ctrl,
+						      rxq_ctrl->rxq.mprq_mp);
 	for (s = 0; s < rxq_ctrl->rxq.rxseg_n; s++) {
-		uint32_t flags;
+		bool is_extmem;
 
 		mp = rxq_ctrl->rxq.rxseg[s].mp;
-		flags = mp != rxq_ctrl->rxq.mprq_mp ?
-			rte_pktmbuf_priv_flags(mp) : 0;
-		ret = mlx5_mr_mempool_register(rxq_ctrl->sh->cdev, mp);
+		is_extmem = (rte_pktmbuf_priv_flags(mp) &
+			     RTE_PKTMBUF_POOL_F_PINNED_EXT_BUF) != 0;
+		ret = mlx5_mr_mempool_register(rxq_ctrl->sh->cdev, mp,
+					       is_extmem);
 		if (ret < 0 && rte_errno != EEXIST)
 			return ret;
-		if ((flags & RTE_PKTMBUF_POOL_F_PINNED_EXT_BUF) == 0)
-			rte_mempool_mem_iter(mp, mlx5_rxq_mempool_register_cb,
-					     &rxq_ctrl->rxq);
+		ret = mlx5_mr_mempool_populate_cache(&rxq_ctrl->rxq.mr_ctrl,
+						     mp);
+		if (ret < 0)
+			return ret;
 	}
 	return 0;
 }
-- 
2.25.1
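
For readers less familiar with "pinned mempools": below is a minimal, hypothetical sketch
of the application-side setup that produces such a pool -- the case the new is_extmem hint
targets. It is not part of the patch. The function name, pool name, and sizes are
illustrative, and rte_malloc() stands in for genuinely external memory only to keep the
sketch short; a real deployment would typically prepare the area with rte_extmem_register()
and DMA-map it for the device. Only the private-flag check at the end mirrors what the
driver now relies on.

#include <rte_mbuf.h>
#include <rte_malloc.h>

#define EXAMPLE_MBUF_NUM  4096                      /* illustrative */
#define EXAMPLE_DATAROOM  RTE_MBUF_DEFAULT_BUF_SIZE /* illustrative */

static struct rte_mempool *
example_create_pinned_pool(void)
{
	struct rte_pktmbuf_extmem ext;
	struct rte_mempool *mp;

	/*
	 * One contiguous "external" area holding all data buffers.
	 * rte_malloc() is a stand-in; truly external memory must be
	 * registered and DMA-mapped separately.
	 */
	ext.elt_size = EXAMPLE_DATAROOM;
	ext.buf_len = (size_t)EXAMPLE_MBUF_NUM * EXAMPLE_DATAROOM;
	ext.buf_ptr = rte_malloc(NULL, ext.buf_len, 4096);
	if (ext.buf_ptr == NULL)
		return NULL;
	ext.buf_iova = rte_malloc_virt2iova(ext.buf_ptr);
	mp = rte_pktmbuf_pool_create_extbuf("example_pinned",
					    EXAMPLE_MBUF_NUM, 256, 0,
					    EXAMPLE_DATAROOM, SOCKET_ID_ANY,
					    &ext, 1);
	if (mp == NULL) {
		rte_free(ext.buf_ptr);
		return NULL;
	}
	/*
	 * The private flag set by rte_pktmbuf_pool_create_extbuf() is the
	 * marker the driver checks before registering the pool with
	 * is_extmem == true; without it the pool takes the generic path.
	 */
	if ((rte_pktmbuf_priv_flags(mp) &
	     RTE_PKTMBUF_POOL_F_PINNED_EXT_BUF) == 0) {
		rte_mempool_free(mp);
		rte_free(ext.buf_ptr);
		return NULL;
	}
	return mp;
}

This is also why the mlx5_mempool_is_extmem() heuristic removed above could not be made
robust: guessing from private_data_size and elt_size is exactly the kind of inference the
commit message rules out, whereas the private flag is set explicitly by the pool's owner.
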


^ permalink raw reply	[flat|nested] 4+ messages in thread

* RE: [PATCH v3] common/mlx5: fix mempool registration
  2021-11-19 14:31   ` [PATCH v3] " Dmitry Kozlyuk
@ 2021-11-21 14:39     ` Raslan Darawsheh
  0 siblings, 0 replies; 4+ messages in thread
From: Raslan Darawsheh @ 2021-11-21 14:39 UTC (permalink / raw)
  To: Dmitry Kozlyuk, dev; +Cc: Michael Baum, Matan Azrad, Slava Ovsiienko

Hi,

> -----Original Message-----
> From: Dmitry Kozlyuk <dkozlyuk@nvidia.com>
> Sent: Friday, November 19, 2021 4:32 PM
> To: dev@dpdk.org
> Cc: Raslan Darawsheh <rasland@nvidia.com>; Michael Baum
> <michaelba@nvidia.com>; Matan Azrad <matan@nvidia.com>; Slava
> Ovsiienko <viacheslavo@nvidia.com>
> Subject: [PATCH v3] common/mlx5: fix mempool registration
> 
> Mempool registration was not correctly processing
> mempools with the RTE_PKTMBUF_POOL_F_PINNED_EXT_BUF flag set
> ("pinned mempools" for short), because it is not known
> at registration time whether the mempool is a pktmbuf one,
> and its elements may not yet be initialized, so they cannot be analyzed.
> Attempts had been made to recognize such pools,
> but there was no robust solution: only the owner of a mempool
> (the application or a device) knows its type.
> This patch extends common/mlx5 registration code
> to accept a hint that the mempool is a pinned one
> and uses this capability from net/mlx5 driver.
> 
> 1. Remove all code assuming pktmbuf pool type
>    or trying to recognize the type of a pool.
> 2. Register pinned mempools used for Rx
>    and their external memory on port start.
>    Populate the MR cache with all their MRs.
> 3. Change Tx slow path logic as follows:
>    3.1. Search the mempool database for a memory region (MR)
>         by the mbuf pool and its buffer address.
>    3.2. If no MR for the address is found for the mempool,
> 	and the mempool contains only pinned external buffers,
> 	perform registration of the mempool
> 	and its external pinned memory.
>    3.3. Fall back to using page-based MRs in other cases
> 	(for example, a buffer with externally attached memory,
> 	but not from a pinned mempool).
> 
> Fixes: 690b2a88c2f7 ("common/mlx5: add mempool registration facilities")
> Fixes: fec28ca0e3a9 ("net/mlx5: support mempool registration")
> 
> Signed-off-by: Dmitry Kozlyuk <dkozlyuk@nvidia.com>
> Reviewed-by: Matan Azrad <matan@nvidia.com>
> Reviewed-by: Viacheslav Ovsiienko <viacheslavo@nvidia.com>
> ---
> v3: fix build with GCC on RHEL7
> v2: 1) rebase on ToT
>     2) fix MR cache population

Patch applied to next-net-mlx,

Kindest regards,
Raslan Darawsheh

^ permalink raw reply	[flat|nested] 4+ messages in thread

end of thread, other threads:[~2021-11-21 14:39 UTC | newest]

Thread overview: 4+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-11-17 18:49 [PATCH] common/mlx5: fix mempool registration Dmitry Kozlyuk
2021-11-18 15:25 ` [PATCH v2] " Dmitry Kozlyuk
2021-11-19 14:31   ` [PATCH v3] " Dmitry Kozlyuk
2021-11-21 14:39     ` Raslan Darawsheh

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).