DPDK patches and discussions
From: Yongseok Koh <yskoh@mellanox.com>
To: shahafs@mellanox.com
Cc: dev@dpdk.org
Subject: [dpdk-dev] [PATCH v2 4/6] net/mlx5: enable secondary process to register DMA memory
Date: Mon, 25 Mar 2019 12:22:36 -0700
Message-ID: <20190325192238.20940-5-yskoh@mellanox.com>
In-Reply-To: <20190325192238.20940-1-yskoh@mellanox.com>

The Memory Region (MR) for DMA memory can't be created from a secondary
process due to a lib/driver limitation. Whenever an MR is needed, the
secondary process can send a request to the primary process through the
EAL IPC channel (rte_mp_msg), which is established on initialization. Once
an MR is created by the primary process, it is immediately visible to the
secondary process because the MR list is global per device. Thus, the
secondary process can look up the list once the request has returned
successfully.

Signed-off-by: Yongseok Koh <yskoh@mellanox.com>
Acked-by: Shahaf Shuler <shahafs@mellanox.com>
---
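
Note (not part of the commit): the flow described in the commit message can
be sketched in plain C as below. This is only an illustration under stated
assumptions, not the driver code. Every sketch_* name is hypothetical, the
shared MR list is modeled as an ordinary struct, each MR is assumed to cover
one 4 KiB chunk, and a direct function call stands in for the
rte_mp_request_sync()/rte_mp_reply() round trip.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins; none of these names exist in the mlx5 driver. */
#define SKETCH_MR_MAX 8
#define SKETCH_MR_LEN 0x1000u /* assumption: one MR covers one 4 KiB chunk */
#define SKETCH_LKEY_INVALID UINT32_MAX

struct sketch_mr {
	uintptr_t start; /* inclusive */
	uintptr_t end;   /* exclusive */
	uint32_t lkey;
};

/* Stands in for the per-device MR list that lives in shared memory. */
struct sketch_mr_list {
	struct sketch_mr mr[SKETCH_MR_MAX];
	size_t n;
};

/* Request/reply payload, loosely modeled on struct mlx5_mp_param. */
struct sketch_msg {
	uintptr_t addr;
	int result; /* 0 on success, negative errno value otherwise */
};

/* Return the LKey of the MR covering addr, or SKETCH_LKEY_INVALID. */
uint32_t
sketch_lookup(const struct sketch_mr_list *list, uintptr_t addr)
{
	size_t i;

	for (i = 0; i < list->n; i++)
		if (addr >= list->mr[i].start && addr < list->mr[i].end)
			return list->mr[i].lkey;
	return SKETCH_LKEY_INVALID;
}

/* Primary side: register the chunk around addr and fill in the reply. */
void
sketch_primary_handle(struct sketch_mr_list *list, struct sketch_msg *msg)
{
	struct sketch_mr *mr;

	msg->result = 0;
	if (sketch_lookup(list, msg->addr) != SKETCH_LKEY_INVALID)
		return; /* already registered, nothing to do */
	if (list->n == SKETCH_MR_MAX) {
		msg->result = -ENOSPC;
		return;
	}
	mr = &list->mr[list->n];
	mr->start = msg->addr & ~(uintptr_t)(SKETCH_MR_LEN - 1);
	mr->end = mr->start + SKETCH_MR_LEN;
	mr->lkey = (uint32_t)(0x100 + list->n); /* made-up key value */
	list->n++;
}

/* Secondary side: ask the primary, then look up the shared list. */
uint32_t
sketch_secondary_create(struct sketch_mr_list *list, uintptr_t addr)
{
	struct sketch_msg msg = { .addr = addr, .result = 0 };
	uint32_t lkey;

	/* A direct call replaces the real IPC round trip in this sketch. */
	sketch_primary_handle(list, &msg);
	if (msg.result != 0)
		return SKETCH_LKEY_INVALID;
	lkey = sketch_lookup(list, addr);
	/* The list is shared, so the lookup can't fail after success. */
	assert(lkey != SKETCH_LKEY_INVALID);
	return lkey;
}
```

With an empty list, sketch_secondary_create(&list, 0x12345) registers
[0x12000, 0x13000) and returns the new key. A second call for any address
in that range finds the existing entry and does not grow the list.
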
 doc/guides/nics/mlx5.rst   |  1 -
 drivers/net/mlx5/mlx5.h    |  6 +++
 drivers/net/mlx5/mlx5_mp.c | 50 ++++++++++++++++++++++++
 drivers/net/mlx5/mlx5_mr.c | 96 ++++++++++++++++++++++++++++++++++++++++------
 drivers/net/mlx5/mlx5_mr.h |  2 +
 5 files changed, 142 insertions(+), 13 deletions(-)

diff --git a/doc/guides/nics/mlx5.rst b/doc/guides/nics/mlx5.rst
index d9ae91dfc1..d793068b51 100644
--- a/doc/guides/nics/mlx5.rst
+++ b/doc/guides/nics/mlx5.rst
@@ -85,7 +85,6 @@ Limitations
 - For secondary process:
 
   - Forked secondary process not supported.
-  - All mempools must be initialized before rte_eth_dev_start().
   - External memory unregistered in EAL memseg list cannot be used for DMA
     unless such memory has been registered by ``mlx5_mr_update_ext_mp()`` in
     primary process and remapped to the same virtual address in secondary
diff --git a/drivers/net/mlx5/mlx5.h b/drivers/net/mlx5/mlx5.h
index 37c8cd1d34..410f17ab53 100644
--- a/drivers/net/mlx5/mlx5.h
+++ b/drivers/net/mlx5/mlx5.h
@@ -59,6 +59,7 @@ enum {
 /* Request types for IPC. */
 enum mlx5_mp_req_type {
 	MLX5_MP_REQ_VERBS_CMD_FD = 1,
+	MLX5_MP_REQ_CREATE_MR,
 	MLX5_MP_REQ_START_RXTX,
 	MLX5_MP_REQ_STOP_RXTX,
 };
@@ -68,6 +69,10 @@ struct mlx5_mp_param {
 	enum mlx5_mp_req_type type;
 	int port_id;
 	int result;
+	RTE_STD_C11
+	union {
+		uintptr_t addr; /* MLX5_MP_REQ_CREATE_MR */
+	} args;
 };
 
 /** Request timeout for IPC. */
@@ -437,6 +442,7 @@ void mlx5_flow_delete_drop_queue(struct rte_eth_dev *dev);
 /* mlx5_mp.c */
 void mlx5_mp_req_start_rxtx(struct rte_eth_dev *dev);
 void mlx5_mp_req_stop_rxtx(struct rte_eth_dev *dev);
+int mlx5_mp_req_mr_create(struct rte_eth_dev *dev, uintptr_t addr);
 int mlx5_mp_req_verbs_cmd_fd(struct rte_eth_dev *dev);
 void mlx5_mp_init_primary(void);
 void mlx5_mp_uninit_primary(void);
diff --git a/drivers/net/mlx5/mlx5_mp.c b/drivers/net/mlx5/mlx5_mp.c
index 657ab6872e..7274a33b50 100644
--- a/drivers/net/mlx5/mlx5_mp.c
+++ b/drivers/net/mlx5/mlx5_mp.c
@@ -58,6 +58,8 @@ mp_primary_handle(const struct rte_mp_msg *mp_msg, const void *peer)
 		(const struct mlx5_mp_param *)mp_msg->param;
 	struct rte_eth_dev *dev;
 	struct mlx5_priv *priv;
+	struct mlx5_mr_cache entry;
+	uint32_t lkey;
 	int ret;
 
 	assert(rte_eal_process_type() == RTE_PROC_PRIMARY);
@@ -69,6 +71,13 @@ mp_primary_handle(const struct rte_mp_msg *mp_msg, const void *peer)
 	dev = &rte_eth_devices[param->port_id];
 	priv = dev->data->dev_private;
 	switch (param->type) {
+	case MLX5_MP_REQ_CREATE_MR:
+		mp_init_msg(dev, &mp_res, param->type);
+		lkey = mlx5_mr_create_primary(dev, &entry, param->args.addr);
+		if (lkey == UINT32_MAX)
+			res->result = -rte_errno;
+		ret = rte_mp_reply(&mp_res, peer);
+		break;
 	case MLX5_MP_REQ_VERBS_CMD_FD:
 		mp_init_msg(dev, &mp_res, param->type);
 		mp_res.num_fds = 1;
@@ -221,6 +230,47 @@ mlx5_mp_req_stop_rxtx(struct rte_eth_dev *dev)
 }
 
 /**
+ * Request Memory Region creation to the primary process.
+ *
+ * @param[in] dev
+ *   Pointer to Ethernet structure.
+ * @param addr
+ *   Target virtual address to register.
+ *
+ * @return
+ *   0 on success, a negative errno value otherwise and rte_errno is set.
+ */
+int
+mlx5_mp_req_mr_create(struct rte_eth_dev *dev, uintptr_t addr)
+{
+	struct rte_mp_msg mp_req;
+	struct rte_mp_msg *mp_res;
+	struct rte_mp_reply mp_rep;
+	struct mlx5_mp_param *req = (struct mlx5_mp_param *)mp_req.param;
+	struct mlx5_mp_param *res;
+	struct timespec ts = {.tv_sec = MLX5_MP_REQ_TIMEOUT_SEC, .tv_nsec = 0};
+	int ret;
+
+	assert(rte_eal_process_type() == RTE_PROC_SECONDARY);
+	mp_init_msg(dev, &mp_req, MLX5_MP_REQ_CREATE_MR);
+	req->args.addr = addr;
+	ret = rte_mp_request_sync(&mp_req, &mp_rep, &ts);
+	if (ret) {
+		DRV_LOG(ERR, "port %u request to primary process failed",
+			dev->data->port_id);
+		return -rte_errno;
+	}
+	assert(mp_rep.nb_received == 1);
+	mp_res = &mp_rep.msgs[0];
+	res = (struct mlx5_mp_param *)mp_res->param;
+	ret = res->result;
+	if (ret)
+		rte_errno = -ret;
+	free(mp_rep.msgs);
+	return ret;
+}
+
+/**
  * Request Verbs command file descriptor for mmap to the primary process.
  *
  * @param[in] dev
diff --git a/drivers/net/mlx5/mlx5_mr.c b/drivers/net/mlx5/mlx5_mr.c
index e9eda975ff..576a3c298b 100644
--- a/drivers/net/mlx5/mlx5_mr.c
+++ b/drivers/net/mlx5/mlx5_mr.c
@@ -516,7 +516,10 @@ mr_find_contig_memsegs_cb(const struct rte_memseg_list *msl,
 
 /**
  * Create a new global Memroy Region (MR) for a missing virtual address.
- * Register entire virtually contiguous memory chunk around the address.
+ * This API should be called on a secondary process, then a request is sent to
+ * the primary process in order to create a MR for the address. As the global MR
+ * list is on the shared memory, following LKey lookup should succeed unless the
+ * request fails.
  *
  * @param dev
  *   Pointer to Ethernet device.
@@ -530,8 +533,52 @@ mr_find_contig_memsegs_cb(const struct rte_memseg_list *msl,
  *   Searched LKey on success, UINT32_MAX on failure and rte_errno is set.
  */
 static uint32_t
-mlx5_mr_create(struct rte_eth_dev *dev, struct mlx5_mr_cache *entry,
-	       uintptr_t addr)
+mlx5_mr_create_secondary(struct rte_eth_dev *dev, struct mlx5_mr_cache *entry,
+			 uintptr_t addr)
+{
+	struct mlx5_priv *priv = dev->data->dev_private;
+	int ret;
+
+	DEBUG("port %u requesting MR creation for address (%p)",
+	      dev->data->port_id, (void *)addr);
+	ret = mlx5_mp_req_mr_create(dev, addr);
+	if (ret) {
+		DEBUG("port %u fail to request MR creation for address (%p)",
+		      dev->data->port_id, (void *)addr);
+		return UINT32_MAX;
+	}
+	rte_rwlock_read_lock(&priv->mr.rwlock);
+	/* Fill in output data. */
+	mr_lookup_dev(dev, entry, addr);
+	/* Lookup can't fail. */
+	assert(entry->lkey != UINT32_MAX);
+	rte_rwlock_read_unlock(&priv->mr.rwlock);
+	DEBUG("port %u MR CREATED by primary process for %p:\n"
+	      "  [0x%" PRIxPTR ", 0x%" PRIxPTR "), lkey=0x%x",
+	      dev->data->port_id, (void *)addr,
+	      entry->start, entry->end, entry->lkey);
+	return entry->lkey;
+}
+
+/**
+ * Create a new global Memroy Region (MR) for a missing virtual address.
+ * Register entire virtually contiguous memory chunk around the address.
+ * This must be called from the primary process.
+ *
+ * @param dev
+ *   Pointer to Ethernet device.
+ * @param[out] entry
+ *   Pointer to returning MR cache entry, found in the global cache or newly
+ *   created. If failed to create one, this will not be updated.
+ * @param addr
+ *   Target virtual address to register.
+ *
+ * @return
+ *   Searched LKey on success, UINT32_MAX on failure and rte_errno is set.
+ */
+uint32_t
+mlx5_mr_create_primary(struct rte_eth_dev *dev, struct mlx5_mr_cache *entry,
+		       uintptr_t addr)
 {
 	struct mlx5_priv *priv = dev->data->dev_private;
 	struct mlx5_dev_config *config = &priv->config;
@@ -552,15 +599,6 @@ mlx5_mr_create(struct rte_eth_dev *dev, struct mlx5_mr_cache *entry,
 
 	DRV_LOG(DEBUG, "port %u creating a MR using address (%p)",
 		dev->data->port_id, (void *)addr);
-	if (rte_eal_process_type() != RTE_PROC_PRIMARY) {
-		DRV_LOG(WARNING,
-			"port %u using address (%p) of unregistered mempool"
-			" in secondary process, please create mempool"
-			" before rte_eth_dev_start()",
-			dev->data->port_id, (void *)addr);
-		rte_errno = EPERM;
-		goto err_nolock;
-	}
 	/*
 	 * Release detached MRs if any. This can't be called with holding either
 	 * memory_hotplug_lock or priv->mr.rwlock. MRs on the free list have
@@ -772,6 +810,40 @@ mlx5_mr_create(struct rte_eth_dev *dev, struct mlx5_mr_cache *entry,
 }
 
 /**
+ * Create a new global Memroy Region (MR) for a missing virtual address.
+ * This can be called from primary and secondary process.
+ *
+ * @param dev
+ *   Pointer to Ethernet device.
+ * @param[out] entry
+ *   Pointer to returning MR cache entry, found in the global cache or newly
+ *   created. If failed to create one, this will not be updated.
+ * @param addr
+ *   Target virtual address to register.
+ *
+ * @return
+ *   Searched LKey on success, UINT32_MAX on failure and rte_errno is set.
+ */
+static uint32_t
+mlx5_mr_create(struct rte_eth_dev *dev, struct mlx5_mr_cache *entry,
+	       uintptr_t addr)
+{
+	uint32_t ret = 0;
+
+	switch (rte_eal_process_type()) {
+	case RTE_PROC_PRIMARY:
+		ret = mlx5_mr_create_primary(dev, entry, addr);
+		break;
+	case RTE_PROC_SECONDARY:
+		ret = mlx5_mr_create_secondary(dev, entry, addr);
+		break;
+	default:
+		break;
+	}
+	return ret;
+}
+
+/**
  * Rebuild the global B-tree cache of device from the original MR list.
  *
  * @param dev
diff --git a/drivers/net/mlx5/mlx5_mr.h b/drivers/net/mlx5/mlx5_mr.h
index a57003fe92..786f6a3148 100644
--- a/drivers/net/mlx5/mlx5_mr.h
+++ b/drivers/net/mlx5/mlx5_mr.h
@@ -70,6 +70,8 @@ extern rte_rwlock_t mlx5_mem_event_rwlock;
 
 int mlx5_mr_btree_init(struct mlx5_mr_btree *bt, int n, int socket);
 void mlx5_mr_btree_free(struct mlx5_mr_btree *bt);
+uint32_t mlx5_mr_create_primary(struct rte_eth_dev *dev,
+				struct mlx5_mr_cache *entry, uintptr_t addr);
 void mlx5_mr_mem_event_cb(enum rte_mem_event event_type, const void *addr,
 			  size_t len, void *arg);
 int mlx5_mr_update_mp(struct rte_eth_dev *dev, struct mlx5_mr_ctrl *mr_ctrl,
-- 
2.11.0
