DPDK patches and discussions
* [dpdk-dev] [PATCH 0/6] net/mlx: enable secondary process to register DMA memory
@ 2019-03-07  7:41 Yongseok Koh
  2019-03-07  7:41 ` [dpdk-dev] [PATCH 1/6] net/mlx: remove debug messages on datapath Yongseok Koh
                   ` (9 more replies)
  0 siblings, 10 replies; 44+ messages in thread
From: Yongseok Koh @ 2019-03-07  7:41 UTC (permalink / raw)
  To: shahafs; +Cc: dev

RFC:
https://mails.dpdk.org/archives/dev/2019-March/125517.html

Yongseok Koh (6):
  net/mlx: remove debug messages on datapath
  net/mlx5: fix external memory registration
  net/mlx5: add control of excessive memory pinning by kernel
  net/mlx5: enable secondary process to register DMA memory
  net/mlx4: add control of excessive memory pinning by kernel
  net/mlx4: enable secondary process to register DMA memory

 doc/guides/nics/mlx4.rst   |  12 +++-
 doc/guides/nics/mlx5.rst   |  17 +++++-
 drivers/net/mlx4/mlx4.c    |  11 +++-
 drivers/net/mlx4/mlx4.h    |  11 ++++
 drivers/net/mlx4/mlx4_mp.c |  50 +++++++++++++++++
 drivers/net/mlx4/mlx4_mr.c | 119 +++++++++++++++++++++++++++++++++-------
 drivers/net/mlx4/mlx4_mr.h |   2 +
 drivers/net/mlx5/mlx5.c    |   7 +++
 drivers/net/mlx5/mlx5.h    |   8 +++
 drivers/net/mlx5/mlx5_mp.c |  51 +++++++++++++++++
 drivers/net/mlx5/mlx5_mr.c | 133 +++++++++++++++++++++++++++++++++++++--------
 drivers/net/mlx5/mlx5_mr.h |   2 +
 12 files changed, 376 insertions(+), 47 deletions(-)

-- 
2.11.0

* [dpdk-dev] [PATCH 1/6] net/mlx: remove debug messages on datapath
  2019-03-07  7:41 [dpdk-dev] [PATCH 0/6] net/mlx: enable secondary process to register DMA memory Yongseok Koh
@ 2019-03-07  7:41 ` Yongseok Koh
  2019-03-07  7:41 ` [dpdk-dev] [PATCH 2/6] net/mlx5: fix external memory registration Yongseok Koh
                   ` (8 subsequent siblings)
  9 siblings, 0 replies; 44+ messages in thread
From: Yongseok Koh @ 2019-03-07  7:41 UTC (permalink / raw)
  To: shahafs; +Cc: dev, stable

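Remove the debug log calls from the Rx and Tx bottom-half MR lookup
functions of mlx4 and mlx5, as they are on the datapath.

Cc: stable@dpdk.org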

Signed-off-by: Yongseok Koh <yskoh@mellanox.com>
---
 drivers/net/mlx4/mlx4_mr.c | 4 ----
 drivers/net/mlx5/mlx5_mr.c | 6 ------
 2 files changed, 10 deletions(-)

diff --git a/drivers/net/mlx4/mlx4_mr.c b/drivers/net/mlx4/mlx4_mr.c
index 01894faecf..0ba55fda04 100644
--- a/drivers/net/mlx4/mlx4_mr.c
+++ b/drivers/net/mlx4/mlx4_mr.c
@@ -1039,8 +1039,6 @@ mlx4_rx_addr2mr_bh(struct rxq *rxq, uintptr_t addr)
 	struct mlx4_mr_ctrl *mr_ctrl = &rxq->mr_ctrl;
 	struct mlx4_priv *priv = rxq->priv;
 
-	DEBUG("Rx queue %u: miss on top-half, mru=%u, head=%u, addr=%p",
-	      rxq->stats.idx, mr_ctrl->mru, mr_ctrl->head, (void *)addr);
 	return mlx4_mr_addr2mr_bh(ETH_DEV(priv), mr_ctrl, addr);
 }
 
@@ -1061,8 +1059,6 @@ mlx4_tx_addr2mr_bh(struct txq *txq, uintptr_t addr)
 	struct mlx4_mr_ctrl *mr_ctrl = &txq->mr_ctrl;
 	struct mlx4_priv *priv = txq->priv;
 
-	DEBUG("Tx queue %u: miss on top-half, mru=%u, head=%u, addr=%p",
-	      txq->stats.idx, mr_ctrl->mru, mr_ctrl->head, (void *)addr);
 	return mlx4_mr_addr2mr_bh(ETH_DEV(priv), mr_ctrl, addr);
 }
 
diff --git a/drivers/net/mlx5/mlx5_mr.c b/drivers/net/mlx5/mlx5_mr.c
index d336a77e40..8aaa87dd60 100644
--- a/drivers/net/mlx5/mlx5_mr.c
+++ b/drivers/net/mlx5/mlx5_mr.c
@@ -1030,9 +1030,6 @@ mlx5_rx_addr2mr_bh(struct mlx5_rxq_data *rxq, uintptr_t addr)
 	struct mlx5_mr_ctrl *mr_ctrl = &rxq->mr_ctrl;
 	struct mlx5_priv *priv = rxq_ctrl->priv;
 
-	DRV_LOG(DEBUG,
-		"Rx queue %u: miss on top-half, mru=%u, head=%u, addr=%p",
-		rxq_ctrl->idx, mr_ctrl->mru, mr_ctrl->head, (void *)addr);
 	return mlx5_mr_addr2mr_bh(ETH_DEV(priv), mr_ctrl, addr);
 }
 
@@ -1055,9 +1052,6 @@ mlx5_tx_addr2mr_bh(struct mlx5_txq_data *txq, uintptr_t addr)
 	struct mlx5_mr_ctrl *mr_ctrl = &txq->mr_ctrl;
 	struct mlx5_priv *priv = txq_ctrl->priv;
 
-	DRV_LOG(DEBUG,
-		"Tx queue %u: miss on top-half, mru=%u, head=%u, addr=%p",
-		txq_ctrl->idx, mr_ctrl->mru, mr_ctrl->head, (void *)addr);
 	return mlx5_mr_addr2mr_bh(ETH_DEV(priv), mr_ctrl, addr);
 }
 
-- 
2.11.0

* [dpdk-dev] [PATCH 2/6] net/mlx5: fix external memory registration
  2019-03-07  7:41 [dpdk-dev] [PATCH 0/6] net/mlx: enable secondary process to register DMA memory Yongseok Koh
  2019-03-07  7:41 ` [dpdk-dev] [PATCH 1/6] net/mlx: remove debug messages on datapath Yongseok Koh
@ 2019-03-07  7:41 ` Yongseok Koh
  2019-03-07  7:41 ` [dpdk-dev] [PATCH 3/6] net/mlx5: add control of excessive memory pinning by kernel Yongseok Koh
                   ` (7 subsequent siblings)
  9 siblings, 0 replies; 44+ messages in thread
From: Yongseok Koh @ 2019-03-07  7:41 UTC (permalink / raw)
  To: shahafs; +Cc: dev, stable

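A secondary process is not allowed to register an MR due to a restriction
of the library and the kernel driver. Assert that external mempool
registration runs only in the primary process, and make the Tx path of a
secondary process log a warning and return UINT32_MAX instead of
attempting the registration.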

Fixes: 7e43a32ee060 ("net/mlx5: support externally allocated static memory")
Cc: stable@dpdk.org

Signed-off-by: Yongseok Koh <yskoh@mellanox.com>
---
 doc/guides/nics/mlx5.rst   |  5 +++++
 drivers/net/mlx5/mlx5_mr.c | 10 ++++++++++
 2 files changed, 15 insertions(+)

diff --git a/doc/guides/nics/mlx5.rst b/doc/guides/nics/mlx5.rst
index 0b67496542..4557e77712 100644
--- a/doc/guides/nics/mlx5.rst
+++ b/doc/guides/nics/mlx5.rst
@@ -86,6 +86,11 @@ Limitations
 
   - Forked secondary process not supported.
   - All mempools must be initialized before rte_eth_dev_start().
+  - External memory unregistered in EAL memseg list cannot be used for DMA
+    unless such memory has been registered by ``mlx5_mr_update_ext_mp()`` in
+    primary process and remapped to the same virtual address in secondary
+    process. If the external memory is registered by primary process but has
+    different virtual address in secondary process, unexpected error may happen.
 
 - Flow pattern without any specific vlan will match for vlan packets as well:
 
diff --git a/drivers/net/mlx5/mlx5_mr.c b/drivers/net/mlx5/mlx5_mr.c
index 8aaa87dd60..e255650add 100644
--- a/drivers/net/mlx5/mlx5_mr.c
+++ b/drivers/net/mlx5/mlx5_mr.c
@@ -1132,6 +1132,7 @@ mlx5_mr_update_ext_mp_cb(struct rte_mempool *mp, void *opaque,
 	struct mlx5_mr_cache entry;
 	uint32_t lkey;
 
+	assert(rte_eal_process_type() == RTE_PROC_PRIMARY);
 	/* If already registered, it should return. */
 	rte_rwlock_read_lock(&priv->mr.rwlock);
 	lkey = mr_lookup_dev(dev, &entry, addr);
@@ -1233,6 +1234,15 @@ mlx5_tx_update_ext_mp(struct mlx5_txq_data *txq, uintptr_t addr,
 	struct mlx5_mr_ctrl *mr_ctrl = &txq->mr_ctrl;
 	struct mlx5_priv *priv = txq_ctrl->priv;
 
+	if (rte_eal_process_type() != RTE_PROC_PRIMARY) {
+		DRV_LOG(WARNING,
+			"port %u using address (%p) from unregistered mempool"
+			" having externally allocated memory"
+			" in secondary process, please create mempool"
+			" prior to rte_eth_dev_start()",
+			PORT_ID(priv), (void *)addr);
+		return UINT32_MAX;
+	}
 	mlx5_mr_update_ext_mp(ETH_DEV(priv), mr_ctrl, mp);
 	return mlx5_tx_addr2mr_bh(txq, addr);
 }
-- 
2.11.0

* [dpdk-dev] [PATCH 3/6] net/mlx5: add control of excessive memory pinning by kernel
  2019-03-07  7:41 [dpdk-dev] [PATCH 0/6] net/mlx: enable secondary process to register DMA memory Yongseok Koh
  2019-03-07  7:41 ` [dpdk-dev] [PATCH 1/6] net/mlx: remove debug messages on datapath Yongseok Koh
  2019-03-07  7:41 ` [dpdk-dev] [PATCH 2/6] net/mlx5: fix external memory registration Yongseok Koh
@ 2019-03-07  7:41 ` Yongseok Koh
  2019-03-07  7:41 ` [dpdk-dev] [PATCH 4/6] net/mlx5: enable secondary process to register DMA memory Yongseok Koh
                   ` (6 subsequent siblings)
  9 siblings, 0 replies; 44+ messages in thread
From: Yongseok Koh @ 2019-03-07  7:41 UTC (permalink / raw)
  To: shahafs; +Cc: dev

A new PMD parameter (mr_ext_memseg_en) is added to control extension of
memseg when creating an MR. It is enabled by default.

If enabled, mlx5_mr_create() tries to maximize the range of MR
registration so that the LKey lookup tables on the datapath become smaller
and lookups are as fast as possible. However, it may worsen memory
utilization because registered memory is pinned by the kernel driver. Even
if a page in the extended chunk is freed, it does not become reusable
until the entire chunk is freed and the MR is destroyed.

If disabled, only the memseg (page) containing the address is registered.
Freed pages then become available immediately, but performance may drop
because there are more MRs to look up on the datapath.

Signed-off-by: Yongseok Koh <yskoh@mellanox.com>
---
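A minimal standalone sketch of the range that gets registered when
mr_ext_memseg_en is 0, assuming a power-of-two page size. The names
struct mr_range and mr_single_page_range() are hypothetical and exist only
for illustration; the driver itself uses RTE_ALIGN_FLOOR() with the page
size of the memseg list.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper type, not part of the driver. */
struct mr_range {
	uintptr_t start; /* Inclusive. */
	uintptr_t end;   /* Exclusive. */
};

/*
 * Return the single page [start, end) that contains addr.
 * page_sz must be a nonzero power of two, as hugepage sizes are.
 */
static struct mr_range
mr_single_page_range(uintptr_t addr, uintptr_t page_sz)
{
	struct mr_range r;

	assert(page_sz != 0 && (page_sz & (page_sz - 1)) == 0);
	/* Same effect as RTE_ALIGN_FLOOR(addr, page_sz). */
	r.start = addr & ~(page_sz - 1);
	r.end = r.start + page_sz;
	return r;
}
```

With a 2MB page, address 0x200123 yields [0x200000, 0x400000). The
parameter would typically be passed with the other device arguments, e.g.
by appending ",mr_ext_memseg_en=0" to the device's devargs.
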
 doc/guides/nics/mlx5.rst   | 11 +++++++++++
 drivers/net/mlx5/mlx5.c    |  7 +++++++
 drivers/net/mlx5/mlx5.h    |  2 ++
 drivers/net/mlx5/mlx5_mr.c | 21 ++++++++++++++++-----
 4 files changed, 36 insertions(+), 5 deletions(-)

diff --git a/doc/guides/nics/mlx5.rst b/doc/guides/nics/mlx5.rst
index 4557e77712..945dbc25f4 100644
--- a/doc/guides/nics/mlx5.rst
+++ b/doc/guides/nics/mlx5.rst
@@ -485,6 +485,17 @@ Run-time configuration
 
   Disabled by default.
 
+- ``mr_ext_memseg_en`` parameter [int]
+
+  A nonzero value enables extending memseg when registering DMA memory. If
+  enabled, the number of entries in MR (Memory Region) lookup table on datapath
+  is minimized and it benefits performance. On the other hand, it worsens memory
+  utilization because registered memory is pinned by kernel driver. Even if a
+  page in the extended chunk is freed, that doesn't become reusable until the
+  entire memory is freed.
+
+  Enabled by default.
+
 - ``representor`` parameter [list]
 
   This parameter can be used to instantiate DPDK Ethernet devices from
diff --git a/drivers/net/mlx5/mlx5.c b/drivers/net/mlx5/mlx5.c
index dd9106296c..23beb7b822 100644
--- a/drivers/net/mlx5/mlx5.c
+++ b/drivers/net/mlx5/mlx5.c
@@ -108,6 +108,9 @@
 /* Activate Netlink support in VF mode. */
 #define MLX5_VF_NL_EN "vf_nl_en"
 
+/* Enable extending memsegs when creating a MR. */
+#define MLX5_MR_EXT_MEMSEG_EN "mr_ext_memseg_en"
+
 /* Select port representors to instantiate. */
 #define MLX5_REPRESENTOR "representor"
 
@@ -569,6 +572,8 @@ mlx5_args_check(const char *key, const char *val, void *opaque)
 		config->vf_nl_en = !!tmp;
 	} else if (strcmp(MLX5_DV_FLOW_EN, key) == 0) {
 		config->dv_flow_en = !!tmp;
+	} else if (strcmp(MLX5_MR_EXT_MEMSEG_EN, key) == 0) {
+		config->mr_ext_memseg_en = !!tmp;
 	} else {
 		DRV_LOG(WARNING, "%s: unknown parameter", key);
 		rte_errno = EINVAL;
@@ -610,6 +615,7 @@ mlx5_args(struct mlx5_dev_config *config, struct rte_devargs *devargs)
 		MLX5_L3_VXLAN_EN,
 		MLX5_VF_NL_EN,
 		MLX5_DV_FLOW_EN,
+		MLX5_MR_EXT_MEMSEG_EN,
 		MLX5_REPRESENTOR,
 		NULL,
 	};
@@ -1586,6 +1592,7 @@ mlx5_pci_probe(struct rte_pci_driver *pci_drv __rte_unused,
 		.txqs_vec = MLX5_ARG_UNSET,
 		.inline_max_packet_sz = MLX5_ARG_UNSET,
 		.vf_nl_en = 1,
+		.mr_ext_memseg_en = 1,
 		.mprq = {
 			.enabled = 0, /* Disabled by default. */
 			.stride_num_n = MLX5_MPRQ_STRIDE_NUM_N,
diff --git a/drivers/net/mlx5/mlx5.h b/drivers/net/mlx5/mlx5.h
index 9c042bc3da..2763fb8124 100644
--- a/drivers/net/mlx5/mlx5.h
+++ b/drivers/net/mlx5/mlx5.h
@@ -163,6 +163,8 @@ struct mlx5_dev_config {
 	unsigned int tx_vec_en:1; /* Tx vector is enabled. */
 	unsigned int rx_vec_en:1; /* Rx vector is enabled. */
 	unsigned int mpw_hdr_dseg:1; /* Enable DSEGs in the title WQEBB. */
+	unsigned int mr_ext_memseg_en:1;
+	/* Whether memseg should be extended for MR creation. */
 	unsigned int l3_vxlan_en:1; /* Enable L3 VXLAN flow creation. */
 	unsigned int vf_nl_en:1; /* Enable Netlink requests in VF mode. */
 	unsigned int dv_flow_en:1; /* Enable DV flow. */
diff --git a/drivers/net/mlx5/mlx5_mr.c b/drivers/net/mlx5/mlx5_mr.c
index e255650add..e9eda975ff 100644
--- a/drivers/net/mlx5/mlx5_mr.c
+++ b/drivers/net/mlx5/mlx5_mr.c
@@ -534,6 +534,7 @@ mlx5_mr_create(struct rte_eth_dev *dev, struct mlx5_mr_cache *entry,
 	       uintptr_t addr)
 {
 	struct mlx5_priv *priv = dev->data->dev_private;
+	struct mlx5_dev_config *config = &priv->config;
 	struct rte_mem_config *mcfg = rte_eal_get_configuration()->mem_config;
 	const struct rte_memseg_list *msl;
 	const struct rte_memseg *ms;
@@ -569,14 +570,24 @@ mlx5_mr_create(struct rte_eth_dev *dev, struct mlx5_mr_cache *entry,
 	 */
 	mlx5_mr_garbage_collect(dev);
 	/*
-	 * Find out a contiguous virtual address chunk in use, to which the
-	 * given address belongs, in order to register maximum range. In the
-	 * best case where mempools are not dynamically recreated and
+	 * If enabled, find out a contiguous virtual address chunk in use, to
+	 * which the given address belongs, in order to register maximum range.
+	 * In the best case where mempools are not dynamically recreated and
 	 * '--socket-mem' is specified as an EAL option, it is very likely to
 	 * have only one MR(LKey) per a socket and per a hugepage-size even
-	 * though the system memory is highly fragmented.
+	 * though the system memory is highly fragmented. As the whole memory
+	 * chunk will be pinned by kernel, it can't be reused unless entire
+	 * chunk is freed from EAL.
+	 *
+	 * If disabled, just register one memseg (page). Then, memory
+	 * consumption will be minimized but it may drop performance if there
+	 * are many MRs to lookup on the datapath.
 	 */
-	if (!rte_memseg_contig_walk(mr_find_contig_memsegs_cb, &data)) {
+	if (!config->mr_ext_memseg_en) {
+		data.msl = rte_mem_virt2memseg_list((void *)addr);
+		data.start = RTE_ALIGN_FLOOR(addr, data.msl->page_sz);
+		data.end = data.start + data.msl->page_sz;
+	} else if (!rte_memseg_contig_walk(mr_find_contig_memsegs_cb, &data)) {
 		DRV_LOG(WARNING,
 			"port %u unable to find virtually contiguous"
 			" chunk for address (%p)."
-- 
2.11.0

* [dpdk-dev] [PATCH 4/6] net/mlx5: enable secondary process to register DMA memory
  2019-03-07  7:41 [dpdk-dev] [PATCH 0/6] net/mlx: enable secondary process to register DMA memory Yongseok Koh
                   ` (2 preceding siblings ...)
  2019-03-07  7:41 ` [dpdk-dev] [PATCH 3/6] net/mlx5: add control of excessive memory pinning by kernel Yongseok Koh
@ 2019-03-07  7:41 ` Yongseok Koh
  2019-03-07  7:41 ` [dpdk-dev] [PATCH 5/6] net/mlx4: add control of excessive memory pinning by kernel Yongseok Koh
                   ` (5 subsequent siblings)
  9 siblings, 0 replies; 44+ messages in thread
From: Yongseok Koh @ 2019-03-07  7:41 UTC (permalink / raw)
  To: shahafs; +Cc: dev

The Memory Region (MR) for DMA memory can't be created from a secondary
process due to a lib/driver limitation. Whenever one is needed, the
secondary process sends a request to the primary process through the EAL
IPC channel (rte_mp_msg), which is established on initialization. Once an
MR is created by the primary process, it is immediately visible to the
secondary process because the MR list is global per device. Thus, the
secondary process can look up the list after the request returns
successfully.

Signed-off-by: Yongseok Koh <yskoh@mellanox.com>
---
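A toy model of the flow described above, as a sketch only. It has no DPDK
dependency: the IPC round trip is replaced by a direct function call, the
shared MR list by a static array, and every toy_* name, the page size and
the LKey values are made up for illustration.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

#define LKEY_INVALID UINT32_MAX
#define TOY_PAGE_SZ ((uintptr_t)0x1000)
#define TOY_MR_MAX 8

enum toy_proc_type { TOY_PROC_PRIMARY, TOY_PROC_SECONDARY };

struct toy_mr {
	uintptr_t start; /* Inclusive. */
	uintptr_t end;   /* Exclusive. */
	uint32_t lkey;
};

/* Stands in for the per-device MR list kept in shared memory. */
static struct toy_mr toy_mr_list[TOY_MR_MAX];
static unsigned int toy_mr_n;

/* Look up the LKey covering addr, LKEY_INVALID if there is none. */
static uint32_t
toy_mr_lookup(uintptr_t addr)
{
	unsigned int i;

	for (i = 0; i < toy_mr_n; ++i)
		if (addr >= toy_mr_list[i].start && addr < toy_mr_list[i].end)
			return toy_mr_list[i].lkey;
	return LKEY_INVALID;
}

/* Primary only: register the page around addr unless already covered. */
static uint32_t
toy_mr_create_primary(uintptr_t addr)
{
	struct toy_mr *mr;
	uint32_t lkey = toy_mr_lookup(addr);

	if (lkey != LKEY_INVALID)
		return lkey;
	if (toy_mr_n == TOY_MR_MAX)
		return LKEY_INVALID;
	mr = &toy_mr_list[toy_mr_n];
	mr->start = addr & ~(TOY_PAGE_SZ - 1);
	mr->end = mr->start + TOY_PAGE_SZ;
	mr->lkey = 0x100 + toy_mr_n; /* Made-up key value. */
	++toy_mr_n;
	return mr->lkey;
}

/* Stand-in for the IPC request: 0 on success, negative errno otherwise. */
static int
toy_mp_req_mr_create(uintptr_t addr)
{
	return toy_mr_create_primary(addr) == LKEY_INVALID ? -ENOMEM : 0;
}

/* Dispatch on the process type. */
static uint32_t
toy_mr_create(enum toy_proc_type type, uintptr_t addr)
{
	uint32_t lkey = LKEY_INVALID;

	switch (type) {
	case TOY_PROC_PRIMARY:
		lkey = toy_mr_create_primary(addr);
		break;
	case TOY_PROC_SECONDARY:
		if (toy_mp_req_mr_create(addr))
			break;
		lkey = toy_mr_lookup(addr);
		/* The list is shared, so the lookup can't fail here. */
		assert(lkey != LKEY_INVALID);
		break;
	}
	return lkey;
}
```

Calling toy_mr_create(TOY_PROC_SECONDARY, 0x1234) registers the page
[0x1000, 0x2000) through the "primary" path, and a later call from either
process type for an address in that page returns the same LKey without
creating a second entry.
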
 doc/guides/nics/mlx5.rst   |  1 -
 drivers/net/mlx5/mlx5.h    |  6 +++
 drivers/net/mlx5/mlx5_mp.c | 51 ++++++++++++++++++++++++
 drivers/net/mlx5/mlx5_mr.c | 96 ++++++++++++++++++++++++++++++++++++++++------
 drivers/net/mlx5/mlx5_mr.h |  2 +
 5 files changed, 143 insertions(+), 13 deletions(-)

diff --git a/doc/guides/nics/mlx5.rst b/doc/guides/nics/mlx5.rst
index 945dbc25f4..fb62daf6cc 100644
--- a/doc/guides/nics/mlx5.rst
+++ b/doc/guides/nics/mlx5.rst
@@ -85,7 +85,6 @@ Limitations
 - For secondary process:
 
   - Forked secondary process not supported.
-  - All mempools must be initialized before rte_eth_dev_start().
   - External memory unregistered in EAL memseg list cannot be used for DMA
     unless such memory has been registered by ``mlx5_mr_update_ext_mp()`` in
     primary process and remapped to the same virtual address in secondary
diff --git a/drivers/net/mlx5/mlx5.h b/drivers/net/mlx5/mlx5.h
index 2763fb8124..80784c22e0 100644
--- a/drivers/net/mlx5/mlx5.h
+++ b/drivers/net/mlx5/mlx5.h
@@ -59,6 +59,7 @@ enum {
 /* Request types for IPC. */
 enum mlx5_mp_req_type {
 	MLX5_MP_REQ_VERBS_CMD_FD = 1,
+	MLX5_MP_REQ_CREATE_MR,
 	MLX5_MP_REQ_START_RXTX,
 	MLX5_MP_REQ_STOP_RXTX,
 };
@@ -68,6 +69,10 @@ struct mlx5_mp_param {
 	enum mlx5_mp_req_type type;
 	int port_id;
 	int result;
+	RTE_STD_C11
+	union {
+		uintptr_t addr; /* MLX5_MP_REQ_CREATE_MR */
+	} args;
 };
 
 /** Key string for IPC. */
@@ -431,6 +436,7 @@ void mlx5_flow_delete_drop_queue(struct rte_eth_dev *dev);
 /* mlx5_mp.c */
 void mlx5_mp_req_start_rxtx(struct rte_eth_dev *dev);
 void mlx5_mp_req_stop_rxtx(struct rte_eth_dev *dev);
+int mlx5_mp_req_mr_create(struct rte_eth_dev *dev, uintptr_t addr);
 int mlx5_mp_req_verbs_cmd_fd(struct rte_eth_dev *dev);
 void mlx5_mp_init_primary(void);
 void mlx5_mp_uninit_primary(void);
diff --git a/drivers/net/mlx5/mlx5_mp.c b/drivers/net/mlx5/mlx5_mp.c
index 623dfb8097..a04b912de8 100644
--- a/drivers/net/mlx5/mlx5_mp.c
+++ b/drivers/net/mlx5/mlx5_mp.c
@@ -58,10 +58,19 @@ mp_primary_handle(const struct rte_mp_msg *mp_msg, const void *peer)
 		(const struct mlx5_mp_param *)mp_msg->param;
 	struct rte_eth_dev *dev = &rte_eth_devices[param->port_id];
 	struct mlx5_priv *priv = dev->data->dev_private;
+	struct mlx5_mr_cache entry;
+	uint32_t lkey;
 	int ret = 0;
 
 	assert(rte_eal_process_type() == RTE_PROC_PRIMARY);
 	switch (param->type) {
+	case MLX5_MP_REQ_CREATE_MR:
+		mp_init_msg(dev, &mp_res, param->type);
+		lkey = mlx5_mr_create_primary(dev, &entry, param->args.addr);
+		if (lkey == UINT32_MAX)
+			res->result = -rte_errno;
+		ret = rte_mp_reply(&mp_res, peer);
+		break;
 	case MLX5_MP_REQ_VERBS_CMD_FD:
 		mp_init_msg(dev, &mp_res, param->type);
 		mp_res.num_fds = 1;
@@ -202,6 +211,48 @@ mlx5_mp_req_stop_rxtx(struct rte_eth_dev *dev)
 }
 
 /**
+ * Request Memory Region creation to the primary process.
+ *
+ * @param[in] dev
+ *   Pointer to Ethernet structure.
+ * @param addr
+ *   Target virtual address to register.
+ *
+ * @return
+ *   0 on success, a negative errno value otherwise and rte_errno is set.
+ */
+int
+mlx5_mp_req_mr_create(struct rte_eth_dev *dev, uintptr_t addr)
+{
+	struct rte_mp_msg mp_req;
+	struct rte_mp_msg *mp_res;
+	struct rte_mp_reply mp_rep;
+	struct mlx5_mp_param *req = (struct mlx5_mp_param *)mp_req.param;
+	struct mlx5_mp_param *res;
+	struct timespec ts = {.tv_sec = 5, .tv_nsec = 0};
+	int ret;
+
+	assert(rte_eal_process_type() == RTE_PROC_SECONDARY);
+	mp_init_msg(dev, &mp_req, MLX5_MP_REQ_CREATE_MR);
+	req->args.addr = addr;
+	ret = rte_mp_request_sync(&mp_req, &mp_rep, &ts);
+	if (ret) {
+		DRV_LOG(ERR,
+			"port %u request to primary process failed",
+			dev->data->port_id);
+		return -rte_errno;
+	}
+	assert(mp_rep.nb_received == 1);
+	mp_res = &mp_rep.msgs[0];
+	res = (struct mlx5_mp_param *)mp_res->param;
+	ret = res->result;
+	if (ret)
+		rte_errno = -ret;
+	free(mp_rep.msgs);
+	return ret;
+}
+
+/**
  * Request Verbs command file descriptor for mmap to the primary process.
  *
  * @param[in] dev
diff --git a/drivers/net/mlx5/mlx5_mr.c b/drivers/net/mlx5/mlx5_mr.c
index e9eda975ff..576a3c298b 100644
--- a/drivers/net/mlx5/mlx5_mr.c
+++ b/drivers/net/mlx5/mlx5_mr.c
@@ -516,7 +516,10 @@ mr_find_contig_memsegs_cb(const struct rte_memseg_list *msl,
 
 /**
  * Create a new global Memroy Region (MR) for a missing virtual address.
- * Register entire virtually contiguous memory chunk around the address.
+ * This API should be called on a secondary process, then a request is sent to
+ * the primary process in order to create a MR for the address. As the global MR
+ * list is on the shared memory, following LKey lookup should succeed unless the
+ * request fails.
  *
  * @param dev
  *   Pointer to Ethernet device.
@@ -530,8 +533,52 @@ mr_find_contig_memsegs_cb(const struct rte_memseg_list *msl,
  *   Searched LKey on success, UINT32_MAX on failure and rte_errno is set.
  */
 static uint32_t
-mlx5_mr_create(struct rte_eth_dev *dev, struct mlx5_mr_cache *entry,
-	       uintptr_t addr)
+mlx5_mr_create_secondary(struct rte_eth_dev *dev, struct mlx5_mr_cache *entry,
+			 uintptr_t addr)
+{
+	struct mlx5_priv *priv = dev->data->dev_private;
+	int ret;
+
+	DEBUG("port %u requesting MR creation for address (%p)",
+	      dev->data->port_id, (void *)addr);
+	ret = mlx5_mp_req_mr_create(dev, addr);
+	if (ret) {
+		DEBUG("port %u fail to request MR creation for address (%p)",
+		      dev->data->port_id, (void *)addr);
+		return UINT32_MAX;
+	}
+	rte_rwlock_read_lock(&priv->mr.rwlock);
+	/* Fill in output data. */
+	mr_lookup_dev(dev, entry, addr);
+	/* Lookup can't fail. */
+	assert(entry->lkey != UINT32_MAX);
+	rte_rwlock_read_unlock(&priv->mr.rwlock);
+	DEBUG("port %u MR CREATED by primary process for %p:\n"
+	      "  [0x%" PRIxPTR ", 0x%" PRIxPTR "), lkey=0x%x",
+	      dev->data->port_id, (void *)addr,
+	      entry->start, entry->end, entry->lkey);
+	return entry->lkey;
+}
+
+/**
+ * Create a new global Memory Region (MR) for a missing virtual address.
+ * Register entire virtually contiguous memory chunk around the address.
+ * This must be called from the primary process.
+ *
+ * @param dev
+ *   Pointer to Ethernet device.
+ * @param[out] entry
+ *   Pointer to returning MR cache entry, found in the global cache or newly
+ *   created. If failed to create one, this will not be updated.
+ * @param addr
+ *   Target virtual address to register.
+ *
+ * @return
+ *   Searched LKey on success, UINT32_MAX on failure and rte_errno is set.
+ */
+uint32_t
+mlx5_mr_create_primary(struct rte_eth_dev *dev, struct mlx5_mr_cache *entry,
+		       uintptr_t addr)
 {
 	struct mlx5_priv *priv = dev->data->dev_private;
 	struct mlx5_dev_config *config = &priv->config;
@@ -552,15 +599,6 @@ mlx5_mr_create(struct rte_eth_dev *dev, struct mlx5_mr_cache *entry,
 
 	DRV_LOG(DEBUG, "port %u creating a MR using address (%p)",
 		dev->data->port_id, (void *)addr);
-	if (rte_eal_process_type() != RTE_PROC_PRIMARY) {
-		DRV_LOG(WARNING,
-			"port %u using address (%p) of unregistered mempool"
-			" in secondary process, please create mempool"
-			" before rte_eth_dev_start()",
-			dev->data->port_id, (void *)addr);
-		rte_errno = EPERM;
-		goto err_nolock;
-	}
 	/*
 	 * Release detached MRs if any. This can't be called with holding either
 	 * memory_hotplug_lock or priv->mr.rwlock. MRs on the free list have
@@ -772,6 +810,40 @@ mlx5_mr_create(struct rte_eth_dev *dev, struct mlx5_mr_cache *entry,
 }
 
 /**
+ * Create a new global Memory Region (MR) for a missing virtual address.
+ * This can be called from primary and secondary process.
+ *
+ * @param dev
+ *   Pointer to Ethernet device.
+ * @param[out] entry
+ *   Pointer to returning MR cache entry, found in the global cache or newly
+ *   created. If failed to create one, this will not be updated.
+ * @param addr
+ *   Target virtual address to register.
+ *
+ * @return
+ *   Searched LKey on success, UINT32_MAX on failure and rte_errno is set.
+ */
+static uint32_t
+mlx5_mr_create(struct rte_eth_dev *dev, struct mlx5_mr_cache *entry,
+	       uintptr_t addr)
+{
+	uint32_t ret = 0;
+
+	switch (rte_eal_process_type()) {
+	case RTE_PROC_PRIMARY:
+		ret = mlx5_mr_create_primary(dev, entry, addr);
+		break;
+	case RTE_PROC_SECONDARY:
+		ret = mlx5_mr_create_secondary(dev, entry, addr);
+		break;
+	default:
+		break;
+	}
+	return ret;
+}
+
+/**
  * Rebuild the global B-tree cache of device from the original MR list.
  *
  * @param dev
diff --git a/drivers/net/mlx5/mlx5_mr.h b/drivers/net/mlx5/mlx5_mr.h
index a57003fe92..786f6a3148 100644
--- a/drivers/net/mlx5/mlx5_mr.h
+++ b/drivers/net/mlx5/mlx5_mr.h
@@ -70,6 +70,8 @@ extern rte_rwlock_t mlx5_mem_event_rwlock;
 
 int mlx5_mr_btree_init(struct mlx5_mr_btree *bt, int n, int socket);
 void mlx5_mr_btree_free(struct mlx5_mr_btree *bt);
+uint32_t mlx5_mr_create_primary(struct rte_eth_dev *dev,
+				struct mlx5_mr_cache *entry, uintptr_t addr);
 void mlx5_mr_mem_event_cb(enum rte_mem_event event_type, const void *addr,
 			  size_t len, void *arg);
 int mlx5_mr_update_mp(struct rte_eth_dev *dev, struct mlx5_mr_ctrl *mr_ctrl,
-- 
2.11.0

* [dpdk-dev] [PATCH 5/6] net/mlx4: add control of excessive memory pinning by kernel
  2019-03-07  7:41 [dpdk-dev] [PATCH 0/6] net/mlx: enable secondary process to register DMA memory Yongseok Koh
                   ` (3 preceding siblings ...)
  2019-03-07  7:41 ` [dpdk-dev] [PATCH 4/6] net/mlx5: enable secondary process to register DMA memory Yongseok Koh
@ 2019-03-07  7:41 ` Yongseok Koh
  2019-03-07  7:41 ` [dpdk-dev] [PATCH 6/6] net/mlx4: enable secondary process to register DMA memory Yongseok Koh
                   ` (4 subsequent siblings)
  9 siblings, 0 replies; 44+ messages in thread
From: Yongseok Koh @ 2019-03-07  7:41 UTC (permalink / raw)
  To: shahafs; +Cc: dev

A new PMD parameter (mr_ext_memseg_en) is added to control extension of
memseg when creating an MR. It is enabled by default.

If enabled, mlx4_mr_create() tries to maximize the range of MR
registration so that the LKey lookup tables on the datapath become smaller
and lookups are as fast as possible. However, it may worsen memory
utilization because registered memory is pinned by the kernel driver. Even
if a page in the extended chunk is freed, it does not become reusable
until the entire chunk is freed and the MR is destroyed.

If disabled, only the memseg (page) containing the address is registered.
Freed pages then become available immediately, but performance may drop
because there are more MRs to look up on the datapath.

Signed-off-by: Yongseok Koh <yskoh@mellanox.com>
---
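A minimal sketch of how a "key=value" device argument such as
mr_ext_memseg_en can be reduced to an on/off flag, assuming the value is
parsed as an integer and any nonzero value means "enabled". struct
toy_conf and toy_arg_parse() are hypothetical names; the stricter
trailing-character check is an addition of this sketch, not something the
patch does.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical configuration holder, not the driver's struct. */
struct toy_conf {
	int mr_ext_memseg_en;
};

/*
 * Parse one key/value pair.
 * Returns 0 on success, -EINVAL on an unknown key or a malformed value.
 */
static int
toy_arg_parse(const char *key, const char *val, struct toy_conf *conf)
{
	unsigned long tmp;
	char *end;

	assert(key != NULL && val != NULL && conf != NULL);
	errno = 0;
	tmp = strtoul(val, &end, 0);
	/* Reject range errors, empty input and trailing garbage. */
	if (errno || end == val || *end != '\0')
		return -EINVAL;
	if (strcmp(key, "mr_ext_memseg_en") == 0) {
		/* Normalize any nonzero value to 1. */
		conf->mr_ext_memseg_en = !!tmp;
		return 0;
	}
	return -EINVAL;
}
```

Starting from a configuration with the flag preset to 1, a value of "0"
clears it, while "7" and "0x1" both leave it set.
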
 doc/guides/nics/mlx4.rst   | 11 +++++++++++
 drivers/net/mlx4/mlx4.c    | 11 +++++++++--
 drivers/net/mlx4/mlx4.h    |  5 +++++
 drivers/net/mlx4/mlx4_mr.c | 20 +++++++++++++++-----
 4 files changed, 40 insertions(+), 7 deletions(-)

diff --git a/doc/guides/nics/mlx4.rst b/doc/guides/nics/mlx4.rst
index cd34838f41..c8a02be4dd 100644
--- a/doc/guides/nics/mlx4.rst
+++ b/doc/guides/nics/mlx4.rst
@@ -119,6 +119,17 @@ Run-time configuration
   times for additional ports. All ports are probed by default if left
   unspecified.
 
+- ``mr_ext_memseg_en`` parameter [int]
+
+  A nonzero value enables extending memseg when registering DMA memory. If
+  enabled, the number of entries in MR (Memory Region) lookup table on datapath
+  is minimized and it benefits performance. On the other hand, it worsens memory
+  utilization because registered memory is pinned by kernel driver. Even if a
+  page in the extended chunk is freed, that doesn't become reusable until the
+  entire memory is freed.
+
+  Enabled by default.
+
 Kernel module parameters
 ~~~~~~~~~~~~~~~~~~~~~~~~
 
diff --git a/drivers/net/mlx4/mlx4.c b/drivers/net/mlx4/mlx4.c
index a5cfcdbee3..d913c2a47e 100644
--- a/drivers/net/mlx4/mlx4.c
+++ b/drivers/net/mlx4/mlx4.c
@@ -71,11 +71,14 @@ struct mlx4_conf {
 		uint32_t present; /**< Bit-field for existing ports. */
 		uint32_t enabled; /**< Bit-field for user-enabled ports. */
 	} ports;
+	int mr_ext_memseg_en;
+	/** Whether memseg should be extended for MR creation. */
 };
 
 /* Available parameters list. */
 const char *pmd_mlx4_init_params[] = {
 	MLX4_PMD_PORT_KVARG,
+	MLX4_MR_EXT_MEMSEG_EN_KVARG,
 	NULL,
 };
 
@@ -516,6 +519,8 @@ mlx4_arg_parse(const char *key, const char *val, struct mlx4_conf *conf)
 			return -rte_errno;
 		}
 		conf->ports.enabled |= 1 << tmp;
+	} else if (strcmp(MLX4_MR_EXT_MEMSEG_EN_KVARG, key) == 0) {
+		conf->mr_ext_memseg_en = !!tmp;
 	} else {
 		rte_errno = EINVAL;
 		WARN("%s: unknown parameter", key);
@@ -551,10 +556,10 @@ mlx4_args(struct rte_devargs *devargs, struct mlx4_conf *conf)
 	}
 	/* Process parameters. */
 	for (i = 0; pmd_mlx4_init_params[i]; ++i) {
-		arg_count = rte_kvargs_count(kvlist, MLX4_PMD_PORT_KVARG);
+		arg_count = rte_kvargs_count(kvlist, pmd_mlx4_init_params[i]);
 		while (arg_count-- > 0) {
 			ret = rte_kvargs_process(kvlist,
-						 MLX4_PMD_PORT_KVARG,
+						 pmd_mlx4_init_params[i],
 						 (int (*)(const char *,
 							  const char *,
 							  void *))
@@ -883,6 +888,7 @@ mlx4_pci_probe(struct rte_pci_driver *pci_drv, struct rte_pci_device *pci_dev)
 	struct ibv_device_attr_ex device_attr_ex;
 	struct mlx4_conf conf = {
 		.ports.present = 0,
+		.mr_ext_memseg_en = 1,
 	};
 	unsigned int vf;
 	int i;
@@ -1100,6 +1106,7 @@ mlx4_pci_probe(struct rte_pci_driver *pci_drv, struct rte_pci_device *pci_dev)
 					device_attr_ex.tso_caps.max_tso;
 		DEBUG("TSO is %ssupported",
 		      priv->tso ? "" : "not ");
+		priv->mr_ext_memseg_en = conf.mr_ext_memseg_en;
 		/* Configure the first MAC address by default. */
 		err = mlx4_get_mac(priv, &mac.addr_bytes);
 		if (err) {
diff --git a/drivers/net/mlx4/mlx4.h b/drivers/net/mlx4/mlx4.h
index bb75f99e03..d450a1c95a 100644
--- a/drivers/net/mlx4/mlx4.h
+++ b/drivers/net/mlx4/mlx4.h
@@ -53,6 +53,9 @@
 /** Port parameter. */
 #define MLX4_PMD_PORT_KVARG "port"
 
+/** Enable extending memsegs when creating a MR. */
+#define MLX4_MR_EXT_MEMSEG_EN_KVARG "mr_ext_memseg_en"
+
 /* Reserved address space for UAR mapping. */
 #define MLX4_UAR_SIZE (1ULL << (sizeof(uintptr_t) * 4))
 
@@ -161,6 +164,8 @@ struct mlx4_priv {
 	uint32_t hw_csum_l2tun:1; /**< Checksum support for L2 tunnels. */
 	uint32_t hw_fcs_strip:1; /**< FCS stripping toggling is supported. */
 	uint32_t tso:1; /**< Transmit segmentation offload is supported. */
+	uint32_t mr_ext_memseg_en:1;
+	/** Whether memseg should be extended for MR creation. */
 	uint32_t tso_max_payload_sz; /**< Max supported TSO payload size. */
 	uint32_t hw_rss_max_qps; /**< Max Rx Queues supported by RSS. */
 	uint64_t hw_rss_sup; /**< Supported RSS hash fields (Verbs format). */
diff --git a/drivers/net/mlx4/mlx4_mr.c b/drivers/net/mlx4/mlx4_mr.c
index 0ba55fda04..6db917a092 100644
--- a/drivers/net/mlx4/mlx4_mr.c
+++ b/drivers/net/mlx4/mlx4_mr.c
@@ -580,14 +580,24 @@ mlx4_mr_create(struct rte_eth_dev *dev, struct mlx4_mr_cache *entry,
 	 */
 	mlx4_mr_garbage_collect(dev);
 	/*
-	 * Find out a contiguous virtual address chunk in use, to which the
-	 * given address belongs, in order to register maximum range. In the
-	 * best case where mempools are not dynamically recreated and
+	 * If enabled, find out a contiguous virtual address chunk in use, to
+	 * which the given address belongs, in order to register maximum range.
+	 * In the best case where mempools are not dynamically recreated and
 	 * '--socket-mem' is specified as an EAL option, it is very likely to
 	 * have only one MR(LKey) per a socket and per a hugepage-size even
-	 * though the system memory is highly fragmented.
+	 * though the system memory is highly fragmented. As the whole memory
+	 * chunk will be pinned by kernel, it can't be reused unless entire
+	 * chunk is freed from EAL.
+	 *
+	 * If disabled, just register one memseg (page). Then, memory
+	 * consumption will be minimized but it may drop performance if there
+	 * are many MRs to lookup on the datapath.
 	 */
-	if (!rte_memseg_contig_walk(mr_find_contig_memsegs_cb, &data)) {
+	if (!priv->mr_ext_memseg_en) {
+		data.msl = rte_mem_virt2memseg_list((void *)addr);
+		data.start = RTE_ALIGN_FLOOR(addr, data.msl->page_sz);
+		data.end = data.start + data.msl->page_sz;
+	} else if (!rte_memseg_contig_walk(mr_find_contig_memsegs_cb, &data)) {
 		WARN("port %u unable to find virtually contiguous"
 		     " chunk for address (%p)."
 		     " rte_memseg_contig_walk() failed.",
-- 
2.11.0

* [dpdk-dev] [PATCH 6/6] net/mlx4: enable secondary process to register DMA memory
  2019-03-07  7:41 [dpdk-dev] [PATCH 0/6] net/mlx: enable secondary process to register DMA memory Yongseok Koh
                   ` (4 preceding siblings ...)
  2019-03-07  7:41 ` [dpdk-dev] [PATCH 5/6] net/mlx4: add control of excessive memory pinning by kernel Yongseok Koh
@ 2019-03-07  7:41 ` Yongseok Koh
  2019-03-07  7:55 ` [dpdk-dev] [PATCH 0/6] net/mlx: " Yongseok Koh
                   ` (3 subsequent siblings)
  9 siblings, 0 replies; 44+ messages in thread
From: Yongseok Koh @ 2019-03-07  7:41 UTC (permalink / raw)
  To: shahafs; +Cc: dev

A Memory Region (MR) for DMA memory can't be created from a secondary
process due to a library/driver limitation. Whenever one is needed, the
secondary process can send a request to the primary process through the EAL
IPC channel (rte_mp_msg), which is established on initialization. Once an MR
is created by the primary process, it is immediately visible to the
secondary process because the MR list is global per device. Thus, the
secondary process can look up the list after the request returns
successfully.
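
For readers following the flow, here is a minimal, self-contained sketch of
the request-then-lookup pattern described above. It is a model only, not the
driver code: the IPC round trip is replaced by a direct function call, the
shared MR list is a plain static array, and every name in it (mr_list,
ipc_request_mr_create, PAGE_SZ, and so on) is hypothetical.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

#define LKEY_INVALID UINT32_MAX
#define MR_LIST_MAX 8
#define PAGE_SZ ((uintptr_t)0x1000) /* assumed page size for the model */

enum proc_type { PROC_PRIMARY, PROC_SECONDARY };

struct mr_entry {
	uintptr_t start; /* inclusive */
	uintptr_t end;   /* exclusive */
	uint32_t lkey;
};

/* Stand-in for the per-device MR list that lives in shared memory. */
static struct mr_entry mr_list[MR_LIST_MAX];
static size_t mr_count;

/* Look up the LKey covering addr, or LKEY_INVALID if none is registered. */
static uint32_t
mr_lookup(uintptr_t addr)
{
	for (size_t i = 0; i < mr_count; i++)
		if (addr >= mr_list[i].start && addr < mr_list[i].end)
			return mr_list[i].lkey;
	return LKEY_INVALID;
}

/* Primary side: register the page around addr and publish it in the list. */
static uint32_t
mr_create_primary(uintptr_t addr)
{
	uint32_t lkey = mr_lookup(addr);
	struct mr_entry *e;

	if (lkey != LKEY_INVALID)
		return lkey; /* already registered */
	if (mr_count == MR_LIST_MAX)
		return LKEY_INVALID;
	e = &mr_list[mr_count];
	e->start = addr & ~(PAGE_SZ - 1);
	e->end = e->start + PAGE_SZ;
	e->lkey = (uint32_t)(0x100 + mr_count); /* fake key for the model */
	mr_count++;
	return e->lkey;
}

/* Stand-in for the IPC round trip: 0 on success, negative errno otherwise. */
static int
ipc_request_mr_create(uintptr_t addr)
{
	return mr_create_primary(addr) == LKEY_INVALID ? -ENOMEM : 0;
}

/* Secondary side: ask the primary, then read the result from the list. */
static uint32_t
mr_create_secondary(uintptr_t addr)
{
	if (ipc_request_mr_create(addr) != 0)
		return LKEY_INVALID;
	return mr_lookup(addr);
}

/* Dispatch on process type; unknown types fail instead of returning 0. */
static uint32_t
mr_create(enum proc_type type, uintptr_t addr)
{
	switch (type) {
	case PROC_PRIMARY:
		return mr_create_primary(addr);
	case PROC_SECONDARY:
		return mr_create_secondary(addr);
	default:
		return LKEY_INVALID;
	}
}
```

The secondary path never touches the registration logic itself; it only
relies on the list being visible after the request succeeds, which is the
property the commit message depends on.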

Signed-off-by: Yongseok Koh <yskoh@mellanox.com>
---
 doc/guides/nics/mlx4.rst   |  1 -
 drivers/net/mlx4/mlx4.h    |  6 +++
 drivers/net/mlx4/mlx4_mp.c | 50 ++++++++++++++++++++++++
 drivers/net/mlx4/mlx4_mr.c | 95 ++++++++++++++++++++++++++++++++++++++++------
 drivers/net/mlx4/mlx4_mr.h |  2 +
 5 files changed, 142 insertions(+), 12 deletions(-)

diff --git a/doc/guides/nics/mlx4.rst b/doc/guides/nics/mlx4.rst
index c8a02be4dd..aaf1907532 100644
--- a/doc/guides/nics/mlx4.rst
+++ b/doc/guides/nics/mlx4.rst
@@ -159,7 +159,6 @@ Limitations
 - For secondary process:
 
   - Forked secondary process not supported.
-  - All mempools must be initialized before rte_eth_dev_start().
   - External memory unregistered in EAL memseg list cannot be used for DMA
     unless such memory has been registered by ``mlx4_mr_update_ext_mp()`` in
     primary process and remapped to the same virtual address in secondary
diff --git a/drivers/net/mlx4/mlx4.h b/drivers/net/mlx4/mlx4.h
index d450a1c95a..f7808a6ec1 100644
--- a/drivers/net/mlx4/mlx4.h
+++ b/drivers/net/mlx4/mlx4.h
@@ -79,6 +79,7 @@ enum {
 /* Request types for IPC. */
 enum mlx4_mp_req_type {
 	MLX4_MP_REQ_VERBS_CMD_FD = 1,
+	MLX4_MP_REQ_CREATE_MR,
 	MLX4_MP_REQ_START_RXTX,
 	MLX4_MP_REQ_STOP_RXTX,
 };
@@ -88,6 +89,10 @@ struct mlx4_mp_param {
 	enum mlx4_mp_req_type type;
 	int port_id;
 	int result;
+	RTE_STD_C11
+	union {
+		uintptr_t addr; /* MLX4_MP_REQ_CREATE_MR */
+	} args;
 };
 
 /** Key string for IPC. */
@@ -231,6 +236,7 @@ int mlx4_rx_intr_enable(struct rte_eth_dev *dev, uint16_t idx);
 /* mlx4_mp.c */
 void mlx4_mp_req_start_rxtx(struct rte_eth_dev *dev);
 void mlx4_mp_req_stop_rxtx(struct rte_eth_dev *dev);
+int mlx4_mp_req_mr_create(struct rte_eth_dev *dev, uintptr_t addr);
 int mlx4_mp_req_verbs_cmd_fd(struct rte_eth_dev *dev);
 void mlx4_mp_init_primary(void);
 void mlx4_mp_uninit_primary(void);
diff --git a/drivers/net/mlx4/mlx4_mp.c b/drivers/net/mlx4/mlx4_mp.c
index b0a91b44fd..dfe9f0f280 100644
--- a/drivers/net/mlx4/mlx4_mp.c
+++ b/drivers/net/mlx4/mlx4_mp.c
@@ -58,10 +58,19 @@ mp_primary_handle(const struct rte_mp_msg *mp_msg, const void *peer)
 		(const struct mlx4_mp_param *)mp_msg->param;
 	struct rte_eth_dev *dev = &rte_eth_devices[param->port_id];
 	struct mlx4_priv *priv = dev->data->dev_private;
+	struct mlx4_mr_cache entry;
+	uint32_t lkey;
 	int ret = 0;
 
 	assert(rte_eal_process_type() == RTE_PROC_PRIMARY);
 	switch (param->type) {
+	case MLX4_MP_REQ_CREATE_MR:
+		mp_init_msg(dev, &mp_res, param->type);
+		lkey = mlx4_mr_create_primary(dev, &entry, param->args.addr);
+		if (lkey == UINT32_MAX)
+			res->result = -rte_errno;
+		ret = rte_mp_reply(&mp_res, peer);
+		break;
 	case MLX4_MP_REQ_VERBS_CMD_FD:
 		mp_init_msg(dev, &mp_res, param->type);
 		mp_res.num_fds = 1;
@@ -198,6 +207,47 @@ mlx4_mp_req_stop_rxtx(struct rte_eth_dev *dev)
 }
 
 /**
+ * Request Memory Region creation to the primary process.
+ *
+ * @param[in] dev
+ *   Pointer to Ethernet structure.
+ * @param addr
+ *   Target virtual address to register.
+ *
+ * @return
+ *   0 on success, a negative errno value otherwise and rte_errno is set.
+ */
+int
+mlx4_mp_req_mr_create(struct rte_eth_dev *dev, uintptr_t addr)
+{
+	struct rte_mp_msg mp_req;
+	struct rte_mp_msg *mp_res;
+	struct rte_mp_reply mp_rep;
+	struct mlx4_mp_param *req = (struct mlx4_mp_param *)mp_req.param;
+	struct mlx4_mp_param *res;
+	struct timespec ts = {.tv_sec = 5, .tv_nsec = 0};
+	int ret;
+
+	assert(rte_eal_process_type() == RTE_PROC_SECONDARY);
+	mp_init_msg(dev, &mp_req, MLX4_MP_REQ_CREATE_MR);
+	req->args.addr = addr;
+	ret = rte_mp_request_sync(&mp_req, &mp_rep, &ts);
+	if (ret) {
+		ERROR("port %u request to primary process failed",
+		      dev->data->port_id);
+		return -rte_errno;
+	}
+	assert(mp_rep.nb_received == 1);
+	mp_res = &mp_rep.msgs[0];
+	res = (struct mlx4_mp_param *)mp_res->param;
+	ret = res->result;
+	if (ret)
+		rte_errno = -ret;
+	free(mp_rep.msgs);
+	return ret;
+}
+
+/**
  * IPC message handler of primary process.
  *
  * @param[in] dev
diff --git a/drivers/net/mlx4/mlx4_mr.c b/drivers/net/mlx4/mlx4_mr.c
index 6db917a092..ad7d4832f2 100644
--- a/drivers/net/mlx4/mlx4_mr.c
+++ b/drivers/net/mlx4/mlx4_mr.c
@@ -528,7 +528,10 @@ mr_find_contig_memsegs_cb(const struct rte_memseg_list *msl,
 
 /**
  * Create a new global Memroy Region (MR) for a missing virtual address.
- * Register entire virtually contiguous memory chunk around the address.
+ * This API should be called on a secondary process, then a request is sent to
+ * the primary process in order to create a MR for the address. As the global MR
+ * list is on the shared memory, following LKey lookup should succeed unless the
+ * request fails.
  *
  * @param dev
  *   Pointer to Ethernet device.
@@ -542,8 +545,52 @@ mr_find_contig_memsegs_cb(const struct rte_memseg_list *msl,
  *   Searched LKey on success, UINT32_MAX on failure and rte_errno is set.
  */
 static uint32_t
-mlx4_mr_create(struct rte_eth_dev *dev, struct mlx4_mr_cache *entry,
-	       uintptr_t addr)
+mlx4_mr_create_secondary(struct rte_eth_dev *dev, struct mlx4_mr_cache *entry,
+			 uintptr_t addr)
+{
+	struct mlx4_priv *priv = dev->data->dev_private;
+	int ret;
+
+	DEBUG("port %u requesting MR creation for address (%p)",
+	      dev->data->port_id, (void *)addr);
+	ret = mlx4_mp_req_mr_create(dev, addr);
+	if (ret) {
+		DEBUG("port %u failed to request MR creation for address (%p)",
+		      dev->data->port_id, (void *)addr);
+		return UINT32_MAX;
+	}
+	rte_rwlock_read_lock(&priv->mr.rwlock);
+	/* Fill in output data. */
+	mr_lookup_dev(dev, entry, addr);
+	/* Lookup can't fail. */
+	assert(entry->lkey != UINT32_MAX);
+	rte_rwlock_read_unlock(&priv->mr.rwlock);
+	DEBUG("port %u MR CREATED by primary process for %p:\n"
+	      "  [0x%" PRIxPTR ", 0x%" PRIxPTR "), lkey=0x%x",
+	      dev->data->port_id, (void *)addr,
+	      entry->start, entry->end, entry->lkey);
+	return entry->lkey;
+}
+
+/**
+ * Create a new global Memory Region (MR) for a missing virtual address.
+ * Register entire virtually contiguous memory chunk around the address.
+ * This must be called from the primary process.
+ *
+ * @param dev
+ *   Pointer to Ethernet device.
+ * @param[out] entry
+ *   Pointer to returning MR cache entry, found in the global cache or newly
+ *   created. If failed to create one, this will not be updated.
+ * @param addr
+ *   Target virtual address to register.
+ *
+ * @return
+ *   Searched LKey on success, UINT32_MAX on failure and rte_errno is set.
+ */
+uint32_t
+mlx4_mr_create_primary(struct rte_eth_dev *dev, struct mlx4_mr_cache *entry,
+		       uintptr_t addr)
 {
 	struct mlx4_priv *priv = dev->data->dev_private;
 	struct rte_mem_config *mcfg = rte_eal_get_configuration()->mem_config;
@@ -563,14 +610,6 @@ mlx4_mr_create(struct rte_eth_dev *dev, struct mlx4_mr_cache *entry,
 
 	DEBUG("port %u creating a MR using address (%p)",
 	      dev->data->port_id, (void *)addr);
-	if (rte_eal_process_type() != RTE_PROC_PRIMARY) {
-		WARN("port %u using address (%p) of unregistered mempool"
-		     " in secondary process, please create mempool"
-		     " before rte_eth_dev_start()",
-		     dev->data->port_id, (void *)addr);
-		rte_errno = EPERM;
-		goto err_nolock;
-	}
 	/*
 	 * Release detached MRs if any. This can't be called with holding either
 	 * memory_hotplug_lock or priv->mr.rwlock. MRs on the free list have
@@ -781,6 +820,40 @@ mlx4_mr_create(struct rte_eth_dev *dev, struct mlx4_mr_cache *entry,
 }
 
 /**
+ * Create a new global Memory Region (MR) for a missing virtual address.
+ * This can be called from primary and secondary process.
+ *
+ * @param dev
+ *   Pointer to Ethernet device.
+ * @param[out] entry
+ *   Pointer to returning MR cache entry, found in the global cache or newly
+ *   created. If failed to create one, this will not be updated.
+ * @param addr
+ *   Target virtual address to register.
+ *
+ * @return
+ *   Searched LKey on success, UINT32_MAX on failure and rte_errno is set.
+ */
+static uint32_t
+mlx4_mr_create(struct rte_eth_dev *dev, struct mlx4_mr_cache *entry,
+	       uintptr_t addr)
+{
+	uint32_t ret = 0;
+
+	switch (rte_eal_process_type()) {
+	case RTE_PROC_PRIMARY:
+		ret = mlx4_mr_create_primary(dev, entry, addr);
+		break;
+	case RTE_PROC_SECONDARY:
+		ret = mlx4_mr_create_secondary(dev, entry, addr);
+		break;
+	default:
+		break;
+	}
+	return ret;
+}
+
+/**
  * Rebuild the global B-tree cache of device from the original MR list.
  *
  * @param dev
diff --git a/drivers/net/mlx4/mlx4_mr.h b/drivers/net/mlx4/mlx4_mr.h
index 37a365a8b5..9d125e239d 100644
--- a/drivers/net/mlx4/mlx4_mr.h
+++ b/drivers/net/mlx4/mlx4_mr.h
@@ -75,6 +75,8 @@ extern rte_rwlock_t mlx4_mem_event_rwlock;
 int mlx4_mr_btree_init(struct mlx4_mr_btree *bt, int n, int socket);
 void mlx4_mr_btree_free(struct mlx4_mr_btree *bt);
 void mlx4_mr_btree_dump(struct mlx4_mr_btree *bt);
+uint32_t mlx4_mr_create_primary(struct rte_eth_dev *dev,
+				struct mlx4_mr_cache *entry, uintptr_t addr);
 void mlx4_mr_mem_event_cb(enum rte_mem_event event_type, const void *addr,
 			  size_t len, void *arg);
 int mlx4_mr_update_mp(struct rte_eth_dev *dev, struct mlx4_mr_ctrl *mr_ctrl,
-- 
2.11.0

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [dpdk-dev] [PATCH 0/6] net/mlx: enable secondary process to register DMA memory
  2019-03-07  7:41 [dpdk-dev] [PATCH 0/6] net/mlx: enable secondary process to register DMA memory Yongseok Koh
                   ` (5 preceding siblings ...)
  2019-03-07  7:41 ` [dpdk-dev] [PATCH 6/6] net/mlx4: enable secondary process to register DMA memory Yongseok Koh
@ 2019-03-07  7:55 ` Yongseok Koh
  2019-03-14 12:45 ` Shahaf Shuler
                   ` (2 subsequent siblings)
  9 siblings, 0 replies; 44+ messages in thread
From: Yongseok Koh @ 2019-03-07  7:55 UTC (permalink / raw)
  To: Shahaf Shuler; +Cc: dev


> On Mar 6, 2019, at 11:41 PM, Yongseok Koh <yskoh@mellanox.com> wrote:
> 
> RFC:
> https://mails.dpdk.org/archives/dev/2019-March/125517.html
> 
> Yongseok Koh (6):
>  net/mlx: remove debug messages on datapath
>  net/mlx5: fix external memory registration
>  net/mlx5: add control of excessive memory pinning by kernel
>  net/mlx5: enable secondary process to register DMA memory
>  net/mlx4: add control of excessive memory pinning by kernel
>  net/mlx4: enable secondary process to register DMA memory

Hi Shahaf,

I've submitted 3 patchsets; please merge them in the following order:

1) [PATCH 0/4] net/mlx5: rework IPC socket and PMD global data init
2) [PATCH 0/3] net/mlx4: add secondary process support
3) [PATCH 0/6] net/mlx: enable secondary process to register DMA memory

Sorry, I should've mentioned it in the cover letters.

Thanks,
Yongseok

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [dpdk-dev] [PATCH 0/6] net/mlx: enable secondary process to register DMA memory
  2019-03-07  7:41 [dpdk-dev] [PATCH 0/6] net/mlx: enable secondary process to register DMA memory Yongseok Koh
                   ` (6 preceding siblings ...)
  2019-03-07  7:55 ` [dpdk-dev] [PATCH 0/6] net/mlx: " Yongseok Koh
@ 2019-03-14 12:45 ` Shahaf Shuler
  2019-03-14 12:45   ` Shahaf Shuler
  2019-03-25 19:22 ` [dpdk-dev] [PATCH v2 " Yongseok Koh
  2019-04-01 21:17 ` [dpdk-dev] [PATCH v3 0/6] net/mlx: " Yongseok Koh
  9 siblings, 1 reply; 44+ messages in thread
From: Shahaf Shuler @ 2019-03-14 12:45 UTC (permalink / raw)
  To: Yongseok Koh; +Cc: dev

Thursday, March 7, 2019 9:42 AM, Yongseok Koh:
> Subject: [dpdk-dev] [PATCH 0/6] net/mlx: enable secondary process to
> register DMA memory
> 
> RFC:
> https://mails.dpdk.org/archives/dev/2019-March/125517.html
> 
> Yongseok Koh (6):
>   net/mlx: remove debug messages on datapath
>   net/mlx5: fix external memory registration
>   net/mlx5: add control of excessive memory pinning by kernel
>   net/mlx5: enable secondary process to register DMA memory
>   net/mlx4: add control of excessive memory pinning by kernel
>   net/mlx4: enable secondary process to register DMA memory

For the series - 
Acked-by: Shahaf Shuler <shahafs@mellanox.com>

You probably need to rebase it as it failed to apply on the CI. 

> 
>  doc/guides/nics/mlx4.rst   |  12 +++-
>  doc/guides/nics/mlx5.rst   |  17 +++++-
>  drivers/net/mlx4/mlx4.c    |  11 +++-
>  drivers/net/mlx4/mlx4.h    |  11 ++++
>  drivers/net/mlx4/mlx4_mp.c |  50 +++++++++++++++++
>  drivers/net/mlx4/mlx4_mr.c | 119 +++++++++++++++++++++++++++++++++-------
>  drivers/net/mlx4/mlx4_mr.h |   2 +
>  drivers/net/mlx5/mlx5.c    |   7 +++
>  drivers/net/mlx5/mlx5.h    |   8 +++
>  drivers/net/mlx5/mlx5_mp.c |  51 +++++++++++++++++
>  drivers/net/mlx5/mlx5_mr.c | 133 +++++++++++++++++++++++++++++++++++++--------
>  drivers/net/mlx5/mlx5_mr.h |   2 +
>  12 files changed, 376 insertions(+), 47 deletions(-)
> 
> --
> 2.11.0

^ permalink raw reply	[flat|nested] 44+ messages in thread

* [dpdk-dev] [PATCH v2 0/6] net/mlx: enable secondary process to register DMA memory
  2019-03-07  7:41 [dpdk-dev] [PATCH 0/6] net/mlx: enable secondary process to register DMA memory Yongseok Koh
                   ` (7 preceding siblings ...)
  2019-03-14 12:45 ` Shahaf Shuler
@ 2019-03-25 19:22 ` Yongseok Koh
  2019-03-25 19:22   ` Yongseok Koh
                     ` (6 more replies)
  2019-04-01 21:17 ` [dpdk-dev] [PATCH v3 0/6] net/mlx: " Yongseok Koh
  9 siblings, 7 replies; 44+ messages in thread
From: Yongseok Koh @ 2019-03-25 19:22 UTC (permalink / raw)
  To: shahafs; +Cc: dev

RFC:
https://mails.dpdk.org/archives/dev/2019-March/125517.html

v2:
* add more sanity checks for eth_dev and the return value from IPC request.
* complement commit messages
* add MLX5_MP_REQ_TIMEOUT_SEC
* keep acked-by: Shahaf Shuler

Yongseok Koh (6):
  net/mlx: remove debug messages on datapath
  net/mlx5: fix external memory registration
  net/mlx5: add control of excessive memory pinning by kernel
  net/mlx5: enable secondary process to register DMA memory
  net/mlx4: add control of excessive memory pinning by kernel
  net/mlx4: enable secondary process to register DMA memory

 doc/guides/nics/mlx4.rst   |  12 +++-
 doc/guides/nics/mlx5.rst   |  17 +++++-
 drivers/net/mlx4/mlx4.c    |  11 +++-
 drivers/net/mlx4/mlx4.h    |  11 ++++
 drivers/net/mlx4/mlx4_mp.c |  50 +++++++++++++++++
 drivers/net/mlx4/mlx4_mr.c | 119 +++++++++++++++++++++++++++++++++-------
 drivers/net/mlx4/mlx4_mr.h |   2 +
 drivers/net/mlx5/mlx5.c    |   7 +++
 drivers/net/mlx5/mlx5.h    |   8 +++
 drivers/net/mlx5/mlx5_mp.c |  50 +++++++++++++++++
 drivers/net/mlx5/mlx5_mr.c | 133 +++++++++++++++++++++++++++++++++++++--------
 drivers/net/mlx5/mlx5_mr.h |   2 +
 12 files changed, 375 insertions(+), 47 deletions(-)

-- 
2.11.0

^ permalink raw reply	[flat|nested] 44+ messages in thread

* [dpdk-dev] [PATCH v2 1/6] net/mlx: remove debug messages on datapath
  2019-03-25 19:22 ` [dpdk-dev] [PATCH v2 " Yongseok Koh
  2019-03-25 19:22   ` Yongseok Koh
@ 2019-03-25 19:22   ` Yongseok Koh
  2019-03-25 19:22     ` Yongseok Koh
  2019-03-25 19:22   ` [dpdk-dev] [PATCH v2 2/6] net/mlx5: fix external memory registration Yongseok Koh
                     ` (4 subsequent siblings)
  6 siblings, 1 reply; 44+ messages in thread
From: Yongseok Koh @ 2019-03-25 19:22 UTC (permalink / raw)
  To: shahafs; +Cc: dev, stable

Cc: stable@dpdk.org

Signed-off-by: Yongseok Koh <yskoh@mellanox.com>
Acked-by: Shahaf Shuler <shahafs@mellanox.com>
---
 drivers/net/mlx4/mlx4_mr.c | 4 ----
 drivers/net/mlx5/mlx5_mr.c | 6 ------
 2 files changed, 10 deletions(-)

diff --git a/drivers/net/mlx4/mlx4_mr.c b/drivers/net/mlx4/mlx4_mr.c
index 01894faecf..0ba55fda04 100644
--- a/drivers/net/mlx4/mlx4_mr.c
+++ b/drivers/net/mlx4/mlx4_mr.c
@@ -1039,8 +1039,6 @@ mlx4_rx_addr2mr_bh(struct rxq *rxq, uintptr_t addr)
 	struct mlx4_mr_ctrl *mr_ctrl = &rxq->mr_ctrl;
 	struct mlx4_priv *priv = rxq->priv;
 
-	DEBUG("Rx queue %u: miss on top-half, mru=%u, head=%u, addr=%p",
-	      rxq->stats.idx, mr_ctrl->mru, mr_ctrl->head, (void *)addr);
 	return mlx4_mr_addr2mr_bh(ETH_DEV(priv), mr_ctrl, addr);
 }
 
@@ -1061,8 +1059,6 @@ mlx4_tx_addr2mr_bh(struct txq *txq, uintptr_t addr)
 	struct mlx4_mr_ctrl *mr_ctrl = &txq->mr_ctrl;
 	struct mlx4_priv *priv = txq->priv;
 
-	DEBUG("Tx queue %u: miss on top-half, mru=%u, head=%u, addr=%p",
-	      txq->stats.idx, mr_ctrl->mru, mr_ctrl->head, (void *)addr);
 	return mlx4_mr_addr2mr_bh(ETH_DEV(priv), mr_ctrl, addr);
 }
 
diff --git a/drivers/net/mlx5/mlx5_mr.c b/drivers/net/mlx5/mlx5_mr.c
index d336a77e40..8aaa87dd60 100644
--- a/drivers/net/mlx5/mlx5_mr.c
+++ b/drivers/net/mlx5/mlx5_mr.c
@@ -1030,9 +1030,6 @@ mlx5_rx_addr2mr_bh(struct mlx5_rxq_data *rxq, uintptr_t addr)
 	struct mlx5_mr_ctrl *mr_ctrl = &rxq->mr_ctrl;
 	struct mlx5_priv *priv = rxq_ctrl->priv;
 
-	DRV_LOG(DEBUG,
-		"Rx queue %u: miss on top-half, mru=%u, head=%u, addr=%p",
-		rxq_ctrl->idx, mr_ctrl->mru, mr_ctrl->head, (void *)addr);
 	return mlx5_mr_addr2mr_bh(ETH_DEV(priv), mr_ctrl, addr);
 }
 
@@ -1055,9 +1052,6 @@ mlx5_tx_addr2mr_bh(struct mlx5_txq_data *txq, uintptr_t addr)
 	struct mlx5_mr_ctrl *mr_ctrl = &txq->mr_ctrl;
 	struct mlx5_priv *priv = txq_ctrl->priv;
 
-	DRV_LOG(DEBUG,
-		"Tx queue %u: miss on top-half, mru=%u, head=%u, addr=%p",
-		txq_ctrl->idx, mr_ctrl->mru, mr_ctrl->head, (void *)addr);
 	return mlx5_mr_addr2mr_bh(ETH_DEV(priv), mr_ctrl, addr);
 }
 
-- 
2.11.0

^ permalink raw reply	[flat|nested] 44+ messages in thread

* [dpdk-dev] [PATCH v2 2/6] net/mlx5: fix external memory registration
  2019-03-25 19:22 ` [dpdk-dev] [PATCH v2 " Yongseok Koh
  2019-03-25 19:22   ` Yongseok Koh
  2019-03-25 19:22   ` [dpdk-dev] [PATCH v2 1/6] net/mlx: remove debug messages on datapath Yongseok Koh
@ 2019-03-25 19:22   ` Yongseok Koh
  2019-03-25 19:22     ` Yongseok Koh
  2019-03-25 19:22   ` [dpdk-dev] [PATCH v2 3/6] net/mlx5: add control of excessive memory pinning by kernel Yongseok Koh
                     ` (3 subsequent siblings)
  6 siblings, 1 reply; 44+ messages in thread
From: Yongseok Koh @ 2019-03-25 19:22 UTC (permalink / raw)
  To: shahafs; +Cc: dev, stable

A secondary process is not allowed to register an MR due to a restriction
of the library and kernel driver.

Fixes: 7e43a32ee060 ("net/mlx5: support externally allocated static memory")
Cc: stable@dpdk.org

Signed-off-by: Yongseok Koh <yskoh@mellanox.com>
Acked-by: Shahaf Shuler <shahafs@mellanox.com>
---
 doc/guides/nics/mlx5.rst   |  5 +++++
 drivers/net/mlx5/mlx5_mr.c | 10 ++++++++++
 2 files changed, 15 insertions(+)

diff --git a/doc/guides/nics/mlx5.rst b/doc/guides/nics/mlx5.rst
index 0200373008..cbe3fb4c33 100644
--- a/doc/guides/nics/mlx5.rst
+++ b/doc/guides/nics/mlx5.rst
@@ -86,6 +86,11 @@ Limitations
 
   - Forked secondary process not supported.
   - All mempools must be initialized before rte_eth_dev_start().
+  - External memory unregistered in EAL memseg list cannot be used for DMA
+    unless such memory has been registered by ``mlx5_mr_update_ext_mp()`` in
+    primary process and remapped to the same virtual address in secondary
+    process. If the external memory is registered by primary process but has
+    different virtual address in secondary process, unexpected error may happen.
 
 - Flow pattern without any specific vlan will match for vlan packets as well:
 
diff --git a/drivers/net/mlx5/mlx5_mr.c b/drivers/net/mlx5/mlx5_mr.c
index 8aaa87dd60..e255650add 100644
--- a/drivers/net/mlx5/mlx5_mr.c
+++ b/drivers/net/mlx5/mlx5_mr.c
@@ -1132,6 +1132,7 @@ mlx5_mr_update_ext_mp_cb(struct rte_mempool *mp, void *opaque,
 	struct mlx5_mr_cache entry;
 	uint32_t lkey;
 
+	assert(rte_eal_process_type() == RTE_PROC_PRIMARY);
 	/* If already registered, it should return. */
 	rte_rwlock_read_lock(&priv->mr.rwlock);
 	lkey = mr_lookup_dev(dev, &entry, addr);
@@ -1233,6 +1234,15 @@ mlx5_tx_update_ext_mp(struct mlx5_txq_data *txq, uintptr_t addr,
 	struct mlx5_mr_ctrl *mr_ctrl = &txq->mr_ctrl;
 	struct mlx5_priv *priv = txq_ctrl->priv;
 
+	if (rte_eal_process_type() != RTE_PROC_PRIMARY) {
+		DRV_LOG(WARNING,
+			"port %u using address (%p) from unregistered mempool"
+			" having externally allocated memory"
+			" in secondary process, please create mempool"
+			" prior to rte_eth_dev_start()",
+			PORT_ID(priv), (void *)addr);
+		return UINT32_MAX;
+	}
 	mlx5_mr_update_ext_mp(ETH_DEV(priv), mr_ctrl, mp);
 	return mlx5_tx_addr2mr_bh(txq, addr);
 }
-- 
2.11.0

^ permalink raw reply	[flat|nested] 44+ messages in thread

* [dpdk-dev] [PATCH v2 3/6] net/mlx5: add control of excessive memory pinning by kernel
  2019-03-25 19:22 ` [dpdk-dev] [PATCH v2 " Yongseok Koh
                     ` (2 preceding siblings ...)
  2019-03-25 19:22   ` [dpdk-dev] [PATCH v2 2/6] net/mlx5: fix external memory registration Yongseok Koh
@ 2019-03-25 19:22   ` Yongseok Koh
  2019-03-25 19:22     ` Yongseok Koh
  2019-03-25 19:22   ` [dpdk-dev] [PATCH v2 4/6] net/mlx5: enable secondary process to register DMA memory Yongseok Koh
                     ` (2 subsequent siblings)
  6 siblings, 1 reply; 44+ messages in thread
From: Yongseok Koh @ 2019-03-25 19:22 UTC (permalink / raw)
  To: shahafs; +Cc: dev

A new PMD parameter (mr_ext_memseg_en) is added to control extension of
memseg when creating an MR. It is enabled by default.

If enabled, mlx5_mr_create() tries to maximize the range of MR
registration so that the LKey lookup tables on the datapath become smaller
and give the best performance. However, it may worsen memory utilization
because registered memory is pinned by the kernel driver. Even if a page in
the extended chunk is freed, it doesn't become reusable until the entire
chunk is freed and the MR is destroyed.

To make freed pages available immediately, this parameter has to be turned
off, but that may reduce performance because more MRs have to be looked up
on the datapath.
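
As a rough illustration of the disabled case, the sketch below computes the
range that would be registered when only the single page containing the
address is used: the start is the address rounded down to the page size, and
the end is one page later. The helper name single_page_range, the struct
mr_range type and the ALIGN_FLOOR macro are hypothetical stand-ins written
for this example; the page size is assumed to be a power of two.

```c
#include <assert.h>
#include <stdint.h>

/* Round v down to a multiple of align (align must be a power of two). */
#define ALIGN_FLOOR(v, align) ((v) & ~((uintptr_t)(align) - 1))

struct mr_range {
	uintptr_t start; /* inclusive */
	uintptr_t end;   /* exclusive */
};

/* Hypothetical helper: range covered when memseg extension is disabled. */
static struct mr_range
single_page_range(uintptr_t addr, uintptr_t page_sz)
{
	struct mr_range r;

	/* The rounding trick only works for power-of-two page sizes. */
	assert(page_sz != 0 && (page_sz & (page_sz - 1)) == 0);
	r.start = ALIGN_FLOOR(addr, page_sz);
	r.end = r.start + page_sz;
	return r;
}
```

With a 2 MB hugepage, an address such as 0x40212345 maps to the range
[0x40200000, 0x40400000), so freeing any other page leaves nothing pinned on
its behalf.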

Signed-off-by: Yongseok Koh <yskoh@mellanox.com>
Acked-by: Shahaf Shuler <shahafs@mellanox.com>
---
 doc/guides/nics/mlx5.rst   | 11 +++++++++++
 drivers/net/mlx5/mlx5.c    |  7 +++++++
 drivers/net/mlx5/mlx5.h    |  2 ++
 drivers/net/mlx5/mlx5_mr.c | 21 ++++++++++++++++-----
 4 files changed, 36 insertions(+), 5 deletions(-)

diff --git a/doc/guides/nics/mlx5.rst b/doc/guides/nics/mlx5.rst
index cbe3fb4c33..d9ae91dfc1 100644
--- a/doc/guides/nics/mlx5.rst
+++ b/doc/guides/nics/mlx5.rst
@@ -485,6 +485,17 @@ Run-time configuration
 
   Disabled by default.
 
+- ``mr_ext_memseg_en`` parameter [int]
+
+  A nonzero value enables extending memseg when registering DMA memory. If
+  enabled, the number of entries in the MR (Memory Region) lookup table on the
+  datapath is minimized, which benefits performance. On the other hand, it
+  worsens memory utilization because registered memory is pinned by the kernel
+  driver. Even if a page in the extended chunk is freed, it doesn't become
+  reusable until the entire chunk is freed.
+
+  Enabled by default.
+
 - ``representor`` parameter [list]
 
   This parameter can be used to instantiate DPDK Ethernet devices from
diff --git a/drivers/net/mlx5/mlx5.c b/drivers/net/mlx5/mlx5.c
index 840cd3d307..93c0fc8c20 100644
--- a/drivers/net/mlx5/mlx5.c
+++ b/drivers/net/mlx5/mlx5.c
@@ -108,6 +108,9 @@
 /* Activate Netlink support in VF mode. */
 #define MLX5_VF_NL_EN "vf_nl_en"
 
+/* Enable extending memsegs when creating a MR. */
+#define MLX5_MR_EXT_MEMSEG_EN "mr_ext_memseg_en"
+
 /* Select port representors to instantiate. */
 #define MLX5_REPRESENTOR "representor"
 
@@ -569,6 +572,8 @@ mlx5_args_check(const char *key, const char *val, void *opaque)
 		config->vf_nl_en = !!tmp;
 	} else if (strcmp(MLX5_DV_FLOW_EN, key) == 0) {
 		config->dv_flow_en = !!tmp;
+	} else if (strcmp(MLX5_MR_EXT_MEMSEG_EN, key) == 0) {
+		config->mr_ext_memseg_en = !!tmp;
 	} else {
 		DRV_LOG(WARNING, "%s: unknown parameter", key);
 		rte_errno = EINVAL;
@@ -610,6 +615,7 @@ mlx5_args(struct mlx5_dev_config *config, struct rte_devargs *devargs)
 		MLX5_L3_VXLAN_EN,
 		MLX5_VF_NL_EN,
 		MLX5_DV_FLOW_EN,
+		MLX5_MR_EXT_MEMSEG_EN,
 		MLX5_REPRESENTOR,
 		NULL,
 	};
@@ -1588,6 +1594,7 @@ mlx5_pci_probe(struct rte_pci_driver *pci_drv __rte_unused,
 		.txqs_vec = MLX5_ARG_UNSET,
 		.inline_max_packet_sz = MLX5_ARG_UNSET,
 		.vf_nl_en = 1,
+		.mr_ext_memseg_en = 1,
 		.mprq = {
 			.enabled = 0, /* Disabled by default. */
 			.stride_num_n = MLX5_MPRQ_STRIDE_NUM_N,
diff --git a/drivers/net/mlx5/mlx5.h b/drivers/net/mlx5/mlx5.h
index d8a5162bdb..37c8cd1d34 100644
--- a/drivers/net/mlx5/mlx5.h
+++ b/drivers/net/mlx5/mlx5.h
@@ -167,6 +167,8 @@ struct mlx5_dev_config {
 	unsigned int tx_vec_en:1; /* Tx vector is enabled. */
 	unsigned int rx_vec_en:1; /* Rx vector is enabled. */
 	unsigned int mpw_hdr_dseg:1; /* Enable DSEGs in the title WQEBB. */
+	unsigned int mr_ext_memseg_en:1;
+	/* Whether memseg should be extended for MR creation. */
 	unsigned int l3_vxlan_en:1; /* Enable L3 VXLAN flow creation. */
 	unsigned int vf_nl_en:1; /* Enable Netlink requests in VF mode. */
 	unsigned int dv_flow_en:1; /* Enable DV flow. */
diff --git a/drivers/net/mlx5/mlx5_mr.c b/drivers/net/mlx5/mlx5_mr.c
index e255650add..e9eda975ff 100644
--- a/drivers/net/mlx5/mlx5_mr.c
+++ b/drivers/net/mlx5/mlx5_mr.c
@@ -534,6 +534,7 @@ mlx5_mr_create(struct rte_eth_dev *dev, struct mlx5_mr_cache *entry,
 	       uintptr_t addr)
 {
 	struct mlx5_priv *priv = dev->data->dev_private;
+	struct mlx5_dev_config *config = &priv->config;
 	struct rte_mem_config *mcfg = rte_eal_get_configuration()->mem_config;
 	const struct rte_memseg_list *msl;
 	const struct rte_memseg *ms;
@@ -569,14 +570,24 @@ mlx5_mr_create(struct rte_eth_dev *dev, struct mlx5_mr_cache *entry,
 	 */
 	mlx5_mr_garbage_collect(dev);
 	/*
-	 * Find out a contiguous virtual address chunk in use, to which the
-	 * given address belongs, in order to register maximum range. In the
-	 * best case where mempools are not dynamically recreated and
+	 * If enabled, find out a contiguous virtual address chunk in use, to
+	 * which the given address belongs, in order to register maximum range.
+	 * In the best case where mempools are not dynamically recreated and
 	 * '--socket-mem' is specified as an EAL option, it is very likely to
 	 * have only one MR(LKey) per a socket and per a hugepage-size even
-	 * though the system memory is highly fragmented.
+	 * though the system memory is highly fragmented. As the whole memory
+	 * chunk will be pinned by the kernel, it can't be reused unless the
+	 * entire chunk is freed from EAL.
+	 *
+	 * If disabled, just register one memseg (page). Memory consumption
+	 * is then minimized, but performance may drop if there are many MRs
+	 * to look up on the datapath.
 	 */
-	if (!rte_memseg_contig_walk(mr_find_contig_memsegs_cb, &data)) {
+	if (!config->mr_ext_memseg_en) {
+		data.msl = rte_mem_virt2memseg_list((void *)addr);
+		data.start = RTE_ALIGN_FLOOR(addr, data.msl->page_sz);
+		data.end = data.start + data.msl->page_sz;
+	} else if (!rte_memseg_contig_walk(mr_find_contig_memsegs_cb, &data)) {
 		DRV_LOG(WARNING,
 			"port %u unable to find virtually contiguous"
 			" chunk for address (%p)."
-- 
2.11.0

^ permalink raw reply	[flat|nested] 44+ messages in thread

* [dpdk-dev] [PATCH v2 4/6] net/mlx5: enable secondary process to register DMA memory
  2019-03-25 19:22 ` [dpdk-dev] [PATCH v2 " Yongseok Koh
                     ` (3 preceding siblings ...)
  2019-03-25 19:22   ` [dpdk-dev] [PATCH v2 3/6] net/mlx5: add control of excessive memory pinning by kernel Yongseok Koh
@ 2019-03-25 19:22   ` Yongseok Koh
  2019-03-25 19:22     ` Yongseok Koh
  2019-03-25 19:22   ` [dpdk-dev] [PATCH v2 5/6] net/mlx4: add control of excessive memory pinning by kernel Yongseok Koh
  2019-03-25 19:22   ` [dpdk-dev] [PATCH v2 6/6] net/mlx4: enable secondary process to register DMA memory Yongseok Koh
  6 siblings, 1 reply; 44+ messages in thread
From: Yongseok Koh @ 2019-03-25 19:22 UTC (permalink / raw)
  To: shahafs; +Cc: dev

The Memory Region (MR) for DMA memory can't be created from a secondary
process due to a lib/driver limitation. Whenever an MR is needed, the
secondary process can make a request to the primary process through the EAL
IPC channel (rte_mp_msg), which is established on initialization. Once an MR
is created by the primary process, it is immediately visible to the secondary
process because the MR list is global per device. Thus, the secondary process
can look up the list after the request returns successfully.

Signed-off-by: Yongseok Koh <yskoh@mellanox.com>
Acked-by: Shahaf Shuler <shahafs@mellanox.com>
---
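Reader's note (not part of the commit): the error convention on the IPC path
is that the primary stores a negative errno value in the reply's result field
and the secondary turns it back into rte_errno. The sketch below imitates
that round trip inside a single process, using a plain function call in place
of rte_mp_request_sync()/rte_mp_reply(). Every name in it (struct mp_reply,
fake_mr_create_primary(), primary_handle(), secondary_request()) is
hypothetical, plain errno stands in for rte_errno, and the failure condition
(a zero address) is invented purely to exercise the error path.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

#define LKEY_INVALID UINT32_MAX

/* Hypothetical reply payload, loosely modelled on struct mlx5_mp_param. */
struct mp_reply {
	int result; /* 0 on success, negative errno value on failure. */
};

/* Primary side: pretend that registration fails for a zero address. */
static uint32_t
fake_mr_create_primary(uintptr_t addr, int *err)
{
	if (addr == 0) {
		*err = EINVAL;
		return LKEY_INVALID;
	}
	return 0x1234; /* Arbitrary illustrative LKey. */
}

/* Primary side: fill in the reply the way a request handler would. */
static void
primary_handle(uintptr_t addr, struct mp_reply *res)
{
	int err = 0;

	res->result = 0;
	if (fake_mr_create_primary(addr, &err) == LKEY_INVALID)
		res->result = -err;
}

/* Secondary side: 0 on success, negative errno otherwise (errno is set). */
static int
secondary_request(uintptr_t addr)
{
	struct mp_reply res;

	primary_handle(addr, &res); /* Stands in for the IPC round trip. */
	if (res.result)
		errno = -res.result;
	return res.result;
}
```

On success the secondary would go on to look the address up in the shared MR
list; on failure it only sees the negative errno carried back in the reply.
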
 doc/guides/nics/mlx5.rst   |  1 -
 drivers/net/mlx5/mlx5.h    |  6 +++
 drivers/net/mlx5/mlx5_mp.c | 50 ++++++++++++++++++++++++
 drivers/net/mlx5/mlx5_mr.c | 96 ++++++++++++++++++++++++++++++++++++++++------
 drivers/net/mlx5/mlx5_mr.h |  2 +
 5 files changed, 142 insertions(+), 13 deletions(-)

diff --git a/doc/guides/nics/mlx5.rst b/doc/guides/nics/mlx5.rst
index d9ae91dfc1..d793068b51 100644
--- a/doc/guides/nics/mlx5.rst
+++ b/doc/guides/nics/mlx5.rst
@@ -85,7 +85,6 @@ Limitations
 - For secondary process:
 
   - Forked secondary process not supported.
-  - All mempools must be initialized before rte_eth_dev_start().
   - External memory unregistered in EAL memseg list cannot be used for DMA
     unless such memory has been registered by ``mlx5_mr_update_ext_mp()`` in
     primary process and remapped to the same virtual address in secondary
diff --git a/drivers/net/mlx5/mlx5.h b/drivers/net/mlx5/mlx5.h
index 37c8cd1d34..410f17ab53 100644
--- a/drivers/net/mlx5/mlx5.h
+++ b/drivers/net/mlx5/mlx5.h
@@ -59,6 +59,7 @@ enum {
 /* Request types for IPC. */
 enum mlx5_mp_req_type {
 	MLX5_MP_REQ_VERBS_CMD_FD = 1,
+	MLX5_MP_REQ_CREATE_MR,
 	MLX5_MP_REQ_START_RXTX,
 	MLX5_MP_REQ_STOP_RXTX,
 };
@@ -68,6 +69,10 @@ struct mlx5_mp_param {
 	enum mlx5_mp_req_type type;
 	int port_id;
 	int result;
+	RTE_STD_C11
+	union {
+		uintptr_t addr; /* MLX5_MP_REQ_CREATE_MR */
+	} args;
 };
 
 /** Request timeout for IPC. */
@@ -437,6 +442,7 @@ void mlx5_flow_delete_drop_queue(struct rte_eth_dev *dev);
 /* mlx5_mp.c */
 void mlx5_mp_req_start_rxtx(struct rte_eth_dev *dev);
 void mlx5_mp_req_stop_rxtx(struct rte_eth_dev *dev);
+int mlx5_mp_req_mr_create(struct rte_eth_dev *dev, uintptr_t addr);
 int mlx5_mp_req_verbs_cmd_fd(struct rte_eth_dev *dev);
 void mlx5_mp_init_primary(void);
 void mlx5_mp_uninit_primary(void);
diff --git a/drivers/net/mlx5/mlx5_mp.c b/drivers/net/mlx5/mlx5_mp.c
index 657ab6872e..7274a33b50 100644
--- a/drivers/net/mlx5/mlx5_mp.c
+++ b/drivers/net/mlx5/mlx5_mp.c
@@ -58,6 +58,8 @@ mp_primary_handle(const struct rte_mp_msg *mp_msg, const void *peer)
 		(const struct mlx5_mp_param *)mp_msg->param;
 	struct rte_eth_dev *dev;
 	struct mlx5_priv *priv;
+	struct mlx5_mr_cache entry;
+	uint32_t lkey;
 	int ret;
 
 	assert(rte_eal_process_type() == RTE_PROC_PRIMARY);
@@ -69,6 +71,13 @@ mp_primary_handle(const struct rte_mp_msg *mp_msg, const void *peer)
 	dev = &rte_eth_devices[param->port_id];
 	priv = dev->data->dev_private;
 	switch (param->type) {
+	case MLX5_MP_REQ_CREATE_MR:
+		mp_init_msg(dev, &mp_res, param->type);
+		lkey = mlx5_mr_create_primary(dev, &entry, param->args.addr);
+		if (lkey == UINT32_MAX)
+			res->result = -rte_errno;
+		ret = rte_mp_reply(&mp_res, peer);
+		break;
 	case MLX5_MP_REQ_VERBS_CMD_FD:
 		mp_init_msg(dev, &mp_res, param->type);
 		mp_res.num_fds = 1;
@@ -221,6 +230,47 @@ mlx5_mp_req_stop_rxtx(struct rte_eth_dev *dev)
 }
 
 /**
+ * Request Memory Region creation to the primary process.
+ *
+ * @param[in] dev
+ *   Pointer to Ethernet structure.
+ * @param addr
+ *   Target virtual address to register.
+ *
+ * @return
+ *   0 on success, a negative errno value otherwise and rte_errno is set.
+ */
+int
+mlx5_mp_req_mr_create(struct rte_eth_dev *dev, uintptr_t addr)
+{
+	struct rte_mp_msg mp_req;
+	struct rte_mp_msg *mp_res;
+	struct rte_mp_reply mp_rep;
+	struct mlx5_mp_param *req = (struct mlx5_mp_param *)mp_req.param;
+	struct mlx5_mp_param *res;
+	struct timespec ts = {.tv_sec = MLX5_MP_REQ_TIMEOUT_SEC, .tv_nsec = 0};
+	int ret;
+
+	assert(rte_eal_process_type() == RTE_PROC_SECONDARY);
+	mp_init_msg(dev, &mp_req, MLX5_MP_REQ_CREATE_MR);
+	req->args.addr = addr;
+	ret = rte_mp_request_sync(&mp_req, &mp_rep, &ts);
+	if (ret) {
+		DRV_LOG(ERR, "port %u request to primary process failed",
+			dev->data->port_id);
+		return -rte_errno;
+	}
+	assert(mp_rep.nb_received == 1);
+	mp_res = &mp_rep.msgs[0];
+	res = (struct mlx5_mp_param *)mp_res->param;
+	ret = res->result;
+	if (ret)
+		rte_errno = -ret;
+	free(mp_rep.msgs);
+	return ret;
+}
+
+/**
  * Request Verbs command file descriptor for mmap to the primary process.
  *
  * @param[in] dev
diff --git a/drivers/net/mlx5/mlx5_mr.c b/drivers/net/mlx5/mlx5_mr.c
index e9eda975ff..576a3c298b 100644
--- a/drivers/net/mlx5/mlx5_mr.c
+++ b/drivers/net/mlx5/mlx5_mr.c
@@ -516,7 +516,10 @@ mr_find_contig_memsegs_cb(const struct rte_memseg_list *msl,
 
 /**
  * Create a new global Memroy Region (MR) for a missing virtual address.
- * Register entire virtually contiguous memory chunk around the address.
+ * This API should be called on a secondary process; a request is then sent to
+ * the primary process in order to create an MR for the address. As the global
+ * MR list is in shared memory, the following LKey lookup should succeed unless
+ * the request fails.
  *
  * @param dev
  *   Pointer to Ethernet device.
@@ -530,8 +533,52 @@ mr_find_contig_memsegs_cb(const struct rte_memseg_list *msl,
  *   Searched LKey on success, UINT32_MAX on failure and rte_errno is set.
  */
 static uint32_t
-mlx5_mr_create(struct rte_eth_dev *dev, struct mlx5_mr_cache *entry,
-	       uintptr_t addr)
+mlx5_mr_create_secondary(struct rte_eth_dev *dev, struct mlx5_mr_cache *entry,
+			 uintptr_t addr)
+{
+	struct mlx5_priv *priv = dev->data->dev_private;
+	int ret;
+
+	DEBUG("port %u requesting MR creation for address (%p)",
+	      dev->data->port_id, (void *)addr);
+	ret = mlx5_mp_req_mr_create(dev, addr);
+	if (ret) {
+		DEBUG("port %u fail to request MR creation for address (%p)",
+		      dev->data->port_id, (void *)addr);
+		return UINT32_MAX;
+	}
+	rte_rwlock_read_lock(&priv->mr.rwlock);
+	/* Fill in output data. */
+	mr_lookup_dev(dev, entry, addr);
+	/* Lookup can't fail. */
+	assert(entry->lkey != UINT32_MAX);
+	rte_rwlock_read_unlock(&priv->mr.rwlock);
+	DEBUG("port %u MR CREATED by primary process for %p:\n"
+	      "  [0x%" PRIxPTR ", 0x%" PRIxPTR "), lkey=0x%x",
+	      dev->data->port_id, (void *)addr,
+	      entry->start, entry->end, entry->lkey);
+	return entry->lkey;
+}
+
+/**
+ * Create a new global Memory Region (MR) for a missing virtual address.
+ * Register entire virtually contiguous memory chunk around the address.
+ * This must be called from the primary process.
+ *
+ * @param dev
+ *   Pointer to Ethernet device.
+ * @param[out] entry
+ *   Pointer to returning MR cache entry, found in the global cache or newly
+ *   created. If failed to create one, this will not be updated.
+ * @param addr
+ *   Target virtual address to register.
+ *
+ * @return
+ *   Searched LKey on success, UINT32_MAX on failure and rte_errno is set.
+ */
+uint32_t
+mlx5_mr_create_primary(struct rte_eth_dev *dev, struct mlx5_mr_cache *entry,
+		       uintptr_t addr)
 {
 	struct mlx5_priv *priv = dev->data->dev_private;
 	struct mlx5_dev_config *config = &priv->config;
@@ -552,15 +599,6 @@ mlx5_mr_create(struct rte_eth_dev *dev, struct mlx5_mr_cache *entry,
 
 	DRV_LOG(DEBUG, "port %u creating a MR using address (%p)",
 		dev->data->port_id, (void *)addr);
-	if (rte_eal_process_type() != RTE_PROC_PRIMARY) {
-		DRV_LOG(WARNING,
-			"port %u using address (%p) of unregistered mempool"
-			" in secondary process, please create mempool"
-			" before rte_eth_dev_start()",
-			dev->data->port_id, (void *)addr);
-		rte_errno = EPERM;
-		goto err_nolock;
-	}
 	/*
 	 * Release detached MRs if any. This can't be called with holding either
 	 * memory_hotplug_lock or priv->mr.rwlock. MRs on the free list have
@@ -772,6 +810,40 @@ mlx5_mr_create(struct rte_eth_dev *dev, struct mlx5_mr_cache *entry,
 }
 
 /**
+ * Create a new global Memory Region (MR) for a missing virtual address.
+ * This can be called from either the primary or a secondary process.
+ *
+ * @param dev
+ *   Pointer to Ethernet device.
+ * @param[out] entry
+ *   Pointer to returning MR cache entry, found in the global cache or newly
+ *   created. If failed to create one, this will not be updated.
+ * @param addr
+ *   Target virtual address to register.
+ *
+ * @return
+ *   Searched LKey on success, UINT32_MAX on failure and rte_errno is set.
+ */
+static uint32_t
+mlx5_mr_create(struct rte_eth_dev *dev, struct mlx5_mr_cache *entry,
+	       uintptr_t addr)
+{
+	uint32_t ret = 0;
+
+	switch (rte_eal_process_type()) {
+	case RTE_PROC_PRIMARY:
+		ret = mlx5_mr_create_primary(dev, entry, addr);
+		break;
+	case RTE_PROC_SECONDARY:
+		ret = mlx5_mr_create_secondary(dev, entry, addr);
+		break;
+	default:
+		break;
+	}
+	return ret;
+}
+
+/**
  * Rebuild the global B-tree cache of device from the original MR list.
  *
  * @param dev
diff --git a/drivers/net/mlx5/mlx5_mr.h b/drivers/net/mlx5/mlx5_mr.h
index a57003fe92..786f6a3148 100644
--- a/drivers/net/mlx5/mlx5_mr.h
+++ b/drivers/net/mlx5/mlx5_mr.h
@@ -70,6 +70,8 @@ extern rte_rwlock_t mlx5_mem_event_rwlock;
 
 int mlx5_mr_btree_init(struct mlx5_mr_btree *bt, int n, int socket);
 void mlx5_mr_btree_free(struct mlx5_mr_btree *bt);
+uint32_t mlx5_mr_create_primary(struct rte_eth_dev *dev,
+				struct mlx5_mr_cache *entry, uintptr_t addr);
 void mlx5_mr_mem_event_cb(enum rte_mem_event event_type, const void *addr,
 			  size_t len, void *arg);
 int mlx5_mr_update_mp(struct rte_eth_dev *dev, struct mlx5_mr_ctrl *mr_ctrl,
-- 
2.11.0

^ permalink raw reply	[flat|nested] 44+ messages in thread

* [dpdk-dev] [PATCH v2 5/6] net/mlx4: add control of excessive memory pinning by kernel
  2019-03-25 19:22 ` [dpdk-dev] [PATCH v2 " Yongseok Koh
                     ` (4 preceding siblings ...)
  2019-03-25 19:22   ` [dpdk-dev] [PATCH v2 4/6] net/mlx5: enable secondary process to register DMA memory Yongseok Koh
@ 2019-03-25 19:22   ` Yongseok Koh
  2019-03-25 19:22     ` Yongseok Koh
  2019-03-25 19:22   ` [dpdk-dev] [PATCH v2 6/6] net/mlx4: enable secondary process to register DMA memory Yongseok Koh
  6 siblings, 1 reply; 44+ messages in thread
From: Yongseok Koh @ 2019-03-25 19:22 UTC (permalink / raw)
  To: shahafs; +Cc: dev

A new PMD parameter (mr_ext_memseg_en) is added to control extension of
memseg when creating an MR. It is enabled by default.

If enabled, mlx4_mr_create() tries to maximize the range of MR
registration so that the LKey lookup tables on the datapath become smaller
and lookups are as fast as possible. However, it may worsen memory
utilization because registered memory is pinned by the kernel driver. Even
if a page in the extended chunk is freed, it does not become reusable until
the entire chunk is freed and the MR is destroyed.

To make freed pages available immediately, this parameter has to be turned
off, but doing so could reduce performance.

Signed-off-by: Yongseok Koh <yskoh@mellanox.com>
Acked-by: Shahaf Shuler <shahafs@mellanox.com>
---
 doc/guides/nics/mlx4.rst   | 11 +++++++++++
 drivers/net/mlx4/mlx4.c    | 11 +++++++++--
 drivers/net/mlx4/mlx4.h    |  5 +++++
 drivers/net/mlx4/mlx4_mr.c | 20 +++++++++++++++-----
 4 files changed, 40 insertions(+), 7 deletions(-)
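
The trade-off described in the commit message can be sketched in plain C.
This is only an illustration, not driver code: struct mr_range,
align_floor(), single_page_range() and per_page_mr_count() are hypothetical
names, and align_floor() merely stands in for RTE_ALIGN_FLOOR(), assuming
a power-of-two page size.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Half-open address range [start, end) covered by one MR (hypothetical). */
struct mr_range {
	uintptr_t start;
	uintptr_t end;
};

/* Round addr down to a multiple of page_sz (must be a power of two). */
static uintptr_t
align_floor(uintptr_t addr, uintptr_t page_sz)
{
	assert(page_sz != 0 && (page_sz & (page_sz - 1)) == 0);
	return addr & ~(page_sz - 1);
}

/* Extension disabled: register only the page that contains addr. */
static struct mr_range
single_page_range(uintptr_t addr, uintptr_t page_sz)
{
	struct mr_range r;

	r.start = align_floor(addr, page_sz);
	r.end = r.start + page_sz;
	return r;
}

/*
 * Number of per-page MRs needed to cover a page-aligned chunk [start, end).
 * With extension enabled, the same chunk could be covered by a single MR.
 */
static size_t
per_page_mr_count(uintptr_t start, uintptr_t end, uintptr_t page_sz)
{
	assert(start <= end);
	assert(align_floor(start, page_sz) == start);
	assert(align_floor(end, page_sz) == end);
	return (size_t)((end - start) / page_sz);
}
```

For instance, a virtually contiguous 1 GiB chunk backed by 2 MiB hugepages
would need 512 lookup entries if every page were registered separately,
versus one entry when the chunk is registered as a whole.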

diff --git a/doc/guides/nics/mlx4.rst b/doc/guides/nics/mlx4.rst
index cd34838f41..c8a02be4dd 100644
--- a/doc/guides/nics/mlx4.rst
+++ b/doc/guides/nics/mlx4.rst
@@ -119,6 +119,17 @@ Run-time configuration
   times for additional ports. All ports are probed by default if left
   unspecified.
 
+- ``mr_ext_memseg_en`` parameter [int]
+
+  A nonzero value enables extending memseg when registering DMA memory. If
+  enabled, the number of entries in the MR (Memory Region) lookup table on the
+  datapath is minimized, which benefits performance. On the other hand, it
+  worsens memory utilization because registered memory is pinned by the kernel
+  driver. Even if a page in the extended chunk is freed, it does not become
+  reusable until the entire chunk is freed.
+
+  Enabled by default.
+
 Kernel module parameters
 ~~~~~~~~~~~~~~~~~~~~~~~~
 
diff --git a/drivers/net/mlx4/mlx4.c b/drivers/net/mlx4/mlx4.c
index a5cfcdbee3..d913c2a47e 100644
--- a/drivers/net/mlx4/mlx4.c
+++ b/drivers/net/mlx4/mlx4.c
@@ -71,11 +71,14 @@ struct mlx4_conf {
 		uint32_t present; /**< Bit-field for existing ports. */
 		uint32_t enabled; /**< Bit-field for user-enabled ports. */
 	} ports;
+	int mr_ext_memseg_en;
+	/**< Whether memseg should be extended for MR creation. */
 };
 
 /* Available parameters list. */
 const char *pmd_mlx4_init_params[] = {
 	MLX4_PMD_PORT_KVARG,
+	MLX4_MR_EXT_MEMSEG_EN_KVARG,
 	NULL,
 };
 
@@ -516,6 +519,8 @@ mlx4_arg_parse(const char *key, const char *val, struct mlx4_conf *conf)
 			return -rte_errno;
 		}
 		conf->ports.enabled |= 1 << tmp;
+	} else if (strcmp(MLX4_MR_EXT_MEMSEG_EN_KVARG, key) == 0) {
+		conf->mr_ext_memseg_en = !!tmp;
 	} else {
 		rte_errno = EINVAL;
 		WARN("%s: unknown parameter", key);
@@ -551,10 +556,10 @@ mlx4_args(struct rte_devargs *devargs, struct mlx4_conf *conf)
 	}
 	/* Process parameters. */
 	for (i = 0; pmd_mlx4_init_params[i]; ++i) {
-		arg_count = rte_kvargs_count(kvlist, MLX4_PMD_PORT_KVARG);
+		arg_count = rte_kvargs_count(kvlist, pmd_mlx4_init_params[i]);
 		while (arg_count-- > 0) {
 			ret = rte_kvargs_process(kvlist,
-						 MLX4_PMD_PORT_KVARG,
+						 pmd_mlx4_init_params[i],
 						 (int (*)(const char *,
 							  const char *,
 							  void *))
@@ -883,6 +888,7 @@ mlx4_pci_probe(struct rte_pci_driver *pci_drv, struct rte_pci_device *pci_dev)
 	struct ibv_device_attr_ex device_attr_ex;
 	struct mlx4_conf conf = {
 		.ports.present = 0,
+		.mr_ext_memseg_en = 1,
 	};
 	unsigned int vf;
 	int i;
@@ -1100,6 +1106,7 @@ mlx4_pci_probe(struct rte_pci_driver *pci_drv, struct rte_pci_device *pci_dev)
 					device_attr_ex.tso_caps.max_tso;
 		DEBUG("TSO is %ssupported",
 		      priv->tso ? "" : "not ");
+		priv->mr_ext_memseg_en = conf.mr_ext_memseg_en;
 		/* Configure the first MAC address by default. */
 		err = mlx4_get_mac(priv, &mac.addr_bytes);
 		if (err) {
diff --git a/drivers/net/mlx4/mlx4.h b/drivers/net/mlx4/mlx4.h
index 832edc962d..5316c51247 100644
--- a/drivers/net/mlx4/mlx4.h
+++ b/drivers/net/mlx4/mlx4.h
@@ -53,6 +53,9 @@
 /** Port parameter. */
 #define MLX4_PMD_PORT_KVARG "port"
 
+/** Enable extending memsegs when creating an MR. */
+#define MLX4_MR_EXT_MEMSEG_EN_KVARG "mr_ext_memseg_en"
+
 /* Reserved address space for UAR mapping. */
 #define MLX4_UAR_SIZE (1ULL << (sizeof(uintptr_t) * 4))
 
@@ -164,6 +167,8 @@ struct mlx4_priv {
 	uint32_t hw_csum_l2tun:1; /**< Checksum support for L2 tunnels. */
 	uint32_t hw_fcs_strip:1; /**< FCS stripping toggling is supported. */
 	uint32_t tso:1; /**< Transmit segmentation offload is supported. */
+	uint32_t mr_ext_memseg_en:1;
+	/**< Whether memseg should be extended for MR creation. */
 	uint32_t tso_max_payload_sz; /**< Max supported TSO payload size. */
 	uint32_t hw_rss_max_qps; /**< Max Rx Queues supported by RSS. */
 	uint64_t hw_rss_sup; /**< Supported RSS hash fields (Verbs format). */
diff --git a/drivers/net/mlx4/mlx4_mr.c b/drivers/net/mlx4/mlx4_mr.c
index 0ba55fda04..6db917a092 100644
--- a/drivers/net/mlx4/mlx4_mr.c
+++ b/drivers/net/mlx4/mlx4_mr.c
@@ -580,14 +580,24 @@ mlx4_mr_create(struct rte_eth_dev *dev, struct mlx4_mr_cache *entry,
 	 */
 	mlx4_mr_garbage_collect(dev);
 	/*
-	 * Find out a contiguous virtual address chunk in use, to which the
-	 * given address belongs, in order to register maximum range. In the
-	 * best case where mempools are not dynamically recreated and
+	 * If enabled, find out a contiguous virtual address chunk in use, to
+	 * which the given address belongs, in order to register maximum range.
+	 * In the best case where mempools are not dynamically recreated and
 	 * '--socket-mem' is specified as an EAL option, it is very likely to
 	 * have only one MR(LKey) per a socket and per a hugepage-size even
-	 * though the system memory is highly fragmented.
+	 * though the system memory is highly fragmented. As the whole memory
+	 * chunk will be pinned by kernel, it can't be reused unless entire
+	 * chunk is freed from EAL.
+	 *
+	 * If disabled, just register one memseg (page). Memory consumption
+	 * is then minimized, but performance may drop if there are many MRs
+	 * to look up on the datapath.
 	 */
-	if (!rte_memseg_contig_walk(mr_find_contig_memsegs_cb, &data)) {
+	if (!priv->mr_ext_memseg_en) {
+		data.msl = rte_mem_virt2memseg_list((void *)addr);
+		data.start = RTE_ALIGN_FLOOR(addr, data.msl->page_sz);
+		data.end = data.start + data.msl->page_sz;
+	} else if (!rte_memseg_contig_walk(mr_find_contig_memsegs_cb, &data)) {
 		WARN("port %u unable to find virtually contiguous"
 		     " chunk for address (%p)."
 		     " rte_memseg_contig_walk() failed.",
-- 
2.11.0

^ permalink raw reply	[flat|nested] 44+ messages in thread

* [dpdk-dev] [PATCH v2 6/6] net/mlx4: enable secondary process to register DMA memory
  2019-03-25 19:22 ` [dpdk-dev] [PATCH v2 " Yongseok Koh
                     ` (5 preceding siblings ...)
  2019-03-25 19:22   ` [dpdk-dev] [PATCH v2 5/6] net/mlx4: add control of excessive memory pinning by kernel Yongseok Koh
@ 2019-03-25 19:22   ` Yongseok Koh
  2019-03-25 19:22     ` Yongseok Koh
  6 siblings, 1 reply; 44+ messages in thread
From: Yongseok Koh @ 2019-03-25 19:22 UTC (permalink / raw)
  To: shahafs; +Cc: dev

The Memory Region (MR) for DMA memory can't be created from a secondary
process due to a lib/driver limitation. Whenever one is needed, the
secondary process can make a request to the primary process through the EAL
IPC channel (rte_mp_msg), which is established on initialization. Once an MR
is created by the primary process, it is immediately visible to the
secondary process because the MR list is global per device. Thus, the
secondary process can look up the list after the request has returned
successfully.

Signed-off-by: Yongseok Koh <yskoh@mellanox.com>
Acked-by: Shahaf Shuler <shahafs@mellanox.com>
---
 doc/guides/nics/mlx4.rst   |  1 -
 drivers/net/mlx4/mlx4.h    |  6 +++
 drivers/net/mlx4/mlx4_mp.c | 50 ++++++++++++++++++++++++
 drivers/net/mlx4/mlx4_mr.c | 95 ++++++++++++++++++++++++++++++++++++++++------
 drivers/net/mlx4/mlx4_mr.h |  2 +
 5 files changed, 142 insertions(+), 12 deletions(-)
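
The request/reply flow described in the commit message can be sketched as
a single-process mock. Everything prefixed with mock_ or MOCK_ is
hypothetical: a direct function call stands in for the synchronous EAL IPC
round trip, a static array stands in for the MR list in shared memory, and
each mock MR covers one 4 KiB page. It is meant to show the ordering
(request, creation by the primary, lookup by the secondary), not the
driver's actual data structures or locking.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

#define MOCK_MR_MAX 8 /* capacity of the mock shared MR list */
#define MOCK_PAGE_SZ ((uintptr_t)0x1000)
#define MOCK_LKEY_INVALID UINT32_MAX

enum mock_req_type {
	MOCK_REQ_CREATE_MR = 1,
};

/* Request/reply payload, loosely modelled on struct mlx4_mp_param. */
struct mock_param {
	enum mock_req_type type;
	int result;     /* 0 on success, negative errno on failure */
	uintptr_t addr; /* address to register */
};

struct mock_mr {
	uintptr_t start; /* inclusive */
	uintptr_t end;   /* exclusive */
	uint32_t lkey;
};

/* Stand-in for the per-device MR list that lives in shared memory. */
static struct mock_mr mock_mr_list[MOCK_MR_MAX];
static size_t mock_mr_count;

/* Look up the LKey covering addr; MOCK_LKEY_INVALID if none. */
static uint32_t
mock_mr_lookup(uintptr_t addr)
{
	size_t i;

	for (i = 0; i < mock_mr_count; ++i)
		if (addr >= mock_mr_list[i].start && addr < mock_mr_list[i].end)
			return mock_mr_list[i].lkey;
	return MOCK_LKEY_INVALID;
}

/* Primary side: create an MR for the requested address, fill the reply. */
static void
mock_primary_handle(const struct mock_param *req, struct mock_param *res)
{
	struct mock_mr *mr;

	*res = *req;
	res->result = 0;
	if (req->type != MOCK_REQ_CREATE_MR) {
		res->result = -EINVAL;
		return;
	}
	if (mock_mr_lookup(req->addr) != MOCK_LKEY_INVALID)
		return; /* already registered */
	if (mock_mr_count == MOCK_MR_MAX) {
		res->result = -ENOMEM;
		return;
	}
	mr = &mock_mr_list[mock_mr_count];
	mr->start = req->addr & ~(MOCK_PAGE_SZ - 1);
	mr->end = mr->start + MOCK_PAGE_SZ;
	mr->lkey = 0x100 + (uint32_t)mock_mr_count;
	++mock_mr_count;
}

/* Secondary side: request creation, then read the result from the list. */
static uint32_t
mock_secondary_mr_create(uintptr_t addr)
{
	struct mock_param req = { MOCK_REQ_CREATE_MR, 0, addr };
	struct mock_param res;
	uint32_t lkey;

	/* A direct call stands in for the synchronous IPC round trip. */
	mock_primary_handle(&req, &res);
	if (res.result != 0)
		return MOCK_LKEY_INVALID;
	lkey = mock_mr_lookup(addr);
	assert(lkey != MOCK_LKEY_INVALID); /* cannot fail after success */
	return lkey;
}
```

The secondary never creates an MR itself; it only reads what the primary
published, which is why a second request for an address in the same page
returns the existing LKey instead of adding a new entry.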

diff --git a/doc/guides/nics/mlx4.rst b/doc/guides/nics/mlx4.rst
index c8a02be4dd..aaf1907532 100644
--- a/doc/guides/nics/mlx4.rst
+++ b/doc/guides/nics/mlx4.rst
@@ -159,7 +159,6 @@ Limitations
 - For secondary process:
 
   - Forked secondary process not supported.
-  - All mempools must be initialized before rte_eth_dev_start().
   - External memory unregistered in EAL memseg list cannot be used for DMA
     unless such memory has been registered by ``mlx4_mr_update_ext_mp()`` in
     primary process and remapped to the same virtual address in secondary
diff --git a/drivers/net/mlx4/mlx4.h b/drivers/net/mlx4/mlx4.h
index 5316c51247..3881943ef0 100644
--- a/drivers/net/mlx4/mlx4.h
+++ b/drivers/net/mlx4/mlx4.h
@@ -79,6 +79,7 @@ enum {
 /* Request types for IPC. */
 enum mlx4_mp_req_type {
 	MLX4_MP_REQ_VERBS_CMD_FD = 1,
+	MLX4_MP_REQ_CREATE_MR,
 	MLX4_MP_REQ_START_RXTX,
 	MLX4_MP_REQ_STOP_RXTX,
 };
@@ -88,6 +89,10 @@ struct mlx4_mp_param {
 	enum mlx4_mp_req_type type;
 	int port_id;
 	int result;
+	RTE_STD_C11
+	union {
+		uintptr_t addr; /* MLX4_MP_REQ_CREATE_MR */
+	} args;
 };
 
 /** Request timeout for IPC. */
@@ -234,6 +239,7 @@ int mlx4_rx_intr_enable(struct rte_eth_dev *dev, uint16_t idx);
 /* mlx4_mp.c */
 void mlx4_mp_req_start_rxtx(struct rte_eth_dev *dev);
 void mlx4_mp_req_stop_rxtx(struct rte_eth_dev *dev);
+int mlx4_mp_req_mr_create(struct rte_eth_dev *dev, uintptr_t addr);
 int mlx4_mp_req_verbs_cmd_fd(struct rte_eth_dev *dev);
 void mlx4_mp_init_primary(void);
 void mlx4_mp_uninit_primary(void);
diff --git a/drivers/net/mlx4/mlx4_mp.c b/drivers/net/mlx4/mlx4_mp.c
index eaeb257348..183622453c 100644
--- a/drivers/net/mlx4/mlx4_mp.c
+++ b/drivers/net/mlx4/mlx4_mp.c
@@ -58,6 +58,8 @@ mp_primary_handle(const struct rte_mp_msg *mp_msg, const void *peer)
 		(const struct mlx4_mp_param *)mp_msg->param;
 	struct rte_eth_dev *dev;
 	struct mlx4_priv *priv;
+	struct mlx4_mr_cache entry;
+	uint32_t lkey;
 	int ret;
 
 	assert(rte_eal_process_type() == RTE_PROC_PRIMARY);
@@ -69,6 +71,13 @@ mp_primary_handle(const struct rte_mp_msg *mp_msg, const void *peer)
 	dev = &rte_eth_devices[param->port_id];
 	priv = dev->data->dev_private;
 	switch (param->type) {
+	case MLX4_MP_REQ_CREATE_MR:
+		mp_init_msg(dev, &mp_res, param->type);
+		lkey = mlx4_mr_create_primary(dev, &entry, param->args.addr);
+		if (lkey == UINT32_MAX)
+			res->result = -rte_errno;
+		ret = rte_mp_reply(&mp_res, peer);
+		break;
 	case MLX4_MP_REQ_VERBS_CMD_FD:
 		mp_init_msg(dev, &mp_res, param->type);
 		mp_res.num_fds = 1;
@@ -218,6 +227,47 @@ mlx4_mp_req_stop_rxtx(struct rte_eth_dev *dev)
 }
 
 /**
+ * Request Memory Region creation from the primary process.
+ *
+ * @param[in] dev
+ *   Pointer to Ethernet structure.
+ * @param addr
+ *   Target virtual address to register.
+ *
+ * @return
+ *   0 on success, a negative errno value otherwise and rte_errno is set.
+ */
+int
+mlx4_mp_req_mr_create(struct rte_eth_dev *dev, uintptr_t addr)
+{
+	struct rte_mp_msg mp_req;
+	struct rte_mp_msg *mp_res;
+	struct rte_mp_reply mp_rep;
+	struct mlx4_mp_param *req = (struct mlx4_mp_param *)mp_req.param;
+	struct mlx4_mp_param *res;
+	struct timespec ts = {.tv_sec = MLX4_MP_REQ_TIMEOUT_SEC, .tv_nsec = 0};
+	int ret;
+
+	assert(rte_eal_process_type() == RTE_PROC_SECONDARY);
+	mp_init_msg(dev, &mp_req, MLX4_MP_REQ_CREATE_MR);
+	req->args.addr = addr;
+	ret = rte_mp_request_sync(&mp_req, &mp_rep, &ts);
+	if (ret) {
+		ERROR("port %u request to primary process failed",
+		      dev->data->port_id);
+		return -rte_errno;
+	}
+	assert(mp_rep.nb_received == 1);
+	mp_res = &mp_rep.msgs[0];
+	res = (struct mlx4_mp_param *)mp_res->param;
+	ret = res->result;
+	if (ret)
+		rte_errno = -ret;
+	free(mp_rep.msgs);
+	return ret;
+}
+
+/**
  * IPC message handler of primary process.
  *
  * @param[in] dev
diff --git a/drivers/net/mlx4/mlx4_mr.c b/drivers/net/mlx4/mlx4_mr.c
index 6db917a092..ad7d4832f2 100644
--- a/drivers/net/mlx4/mlx4_mr.c
+++ b/drivers/net/mlx4/mlx4_mr.c
@@ -528,7 +528,10 @@ mr_find_contig_memsegs_cb(const struct rte_memseg_list *msl,
 
 /**
  * Create a new global Memroy Region (MR) for a missing virtual address.
- * Register entire virtually contiguous memory chunk around the address.
+ * This API should be called from a secondary process; a request is then sent
+ * to the primary process in order to create an MR for the address. As the
+ * global MR list is in shared memory, the following LKey lookup should succeed
+ * unless the request fails.
  *
  * @param dev
  *   Pointer to Ethernet device.
@@ -542,8 +545,52 @@ mr_find_contig_memsegs_cb(const struct rte_memseg_list *msl,
  *   Searched LKey on success, UINT32_MAX on failure and rte_errno is set.
  */
 static uint32_t
-mlx4_mr_create(struct rte_eth_dev *dev, struct mlx4_mr_cache *entry,
-	       uintptr_t addr)
+mlx4_mr_create_secondary(struct rte_eth_dev *dev, struct mlx4_mr_cache *entry,
+			 uintptr_t addr)
+{
+	struct mlx4_priv *priv = dev->data->dev_private;
+	int ret;
+
+	DEBUG("port %u requesting MR creation for address (%p)",
+	      dev->data->port_id, (void *)addr);
+	ret = mlx4_mp_req_mr_create(dev, addr);
+	if (ret) {
+		DEBUG("port %u failed to request MR creation for address (%p)",
+		      dev->data->port_id, (void *)addr);
+		return UINT32_MAX;
+	}
+	rte_rwlock_read_lock(&priv->mr.rwlock);
+	/* Fill in output data. */
+	mr_lookup_dev(dev, entry, addr);
+	/* Lookup can't fail. */
+	assert(entry->lkey != UINT32_MAX);
+	rte_rwlock_read_unlock(&priv->mr.rwlock);
+	DEBUG("port %u MR CREATED by primary process for %p:\n"
+	      "  [0x%" PRIxPTR ", 0x%" PRIxPTR "), lkey=0x%x",
+	      dev->data->port_id, (void *)addr,
+	      entry->start, entry->end, entry->lkey);
+	return entry->lkey;
+}
+
+/**
+ * Create a new global Memory Region (MR) for a missing virtual address.
+ * Register entire virtually contiguous memory chunk around the address.
+ * This must be called from the primary process.
+ *
+ * @param dev
+ *   Pointer to Ethernet device.
+ * @param[out] entry
+ *   Pointer to returning MR cache entry, found in the global cache or newly
+ *   created. If failed to create one, this will not be updated.
+ * @param addr
+ *   Target virtual address to register.
+ *
+ * @return
+ *   Searched LKey on success, UINT32_MAX on failure and rte_errno is set.
+ */
+uint32_t
+mlx4_mr_create_primary(struct rte_eth_dev *dev, struct mlx4_mr_cache *entry,
+		       uintptr_t addr)
 {
 	struct mlx4_priv *priv = dev->data->dev_private;
 	struct rte_mem_config *mcfg = rte_eal_get_configuration()->mem_config;
@@ -563,14 +610,6 @@ mlx4_mr_create(struct rte_eth_dev *dev, struct mlx4_mr_cache *entry,
 
 	DEBUG("port %u creating a MR using address (%p)",
 	      dev->data->port_id, (void *)addr);
-	if (rte_eal_process_type() != RTE_PROC_PRIMARY) {
-		WARN("port %u using address (%p) of unregistered mempool"
-		     " in secondary process, please create mempool"
-		     " before rte_eth_dev_start()",
-		     dev->data->port_id, (void *)addr);
-		rte_errno = EPERM;
-		goto err_nolock;
-	}
 	/*
 	 * Release detached MRs if any. This can't be called with holding either
 	 * memory_hotplug_lock or priv->mr.rwlock. MRs on the free list have
@@ -781,6 +820,40 @@ mlx4_mr_create(struct rte_eth_dev *dev, struct mlx4_mr_cache *entry,
 }
 
 /**
+ * Create a new global Memory Region (MR) for a missing virtual address.
+ * This can be called from primary and secondary process.
+ *
+ * @param dev
+ *   Pointer to Ethernet device.
+ * @param[out] entry
+ *   Pointer to returning MR cache entry, found in the global cache or newly
+ *   created. If failed to create one, this will not be updated.
+ * @param addr
+ *   Target virtual address to register.
+ *
+ * @return
+ *   Searched LKey on success, UINT32_MAX on failure and rte_errno is set.
+ */
+static uint32_t
+mlx4_mr_create(struct rte_eth_dev *dev, struct mlx4_mr_cache *entry,
+	       uintptr_t addr)
+{
+	uint32_t ret = 0;
+
+	switch (rte_eal_process_type()) {
+	case RTE_PROC_PRIMARY:
+		ret = mlx4_mr_create_primary(dev, entry, addr);
+		break;
+	case RTE_PROC_SECONDARY:
+		ret = mlx4_mr_create_secondary(dev, entry, addr);
+		break;
+	default:
+		break;
+	}
+	return ret;
+}
+
+/**
  * Rebuild the global B-tree cache of device from the original MR list.
  *
  * @param dev
diff --git a/drivers/net/mlx4/mlx4_mr.h b/drivers/net/mlx4/mlx4_mr.h
index 37a365a8b5..9d125e239d 100644
--- a/drivers/net/mlx4/mlx4_mr.h
+++ b/drivers/net/mlx4/mlx4_mr.h
@@ -75,6 +75,8 @@ extern rte_rwlock_t mlx4_mem_event_rwlock;
 int mlx4_mr_btree_init(struct mlx4_mr_btree *bt, int n, int socket);
 void mlx4_mr_btree_free(struct mlx4_mr_btree *bt);
 void mlx4_mr_btree_dump(struct mlx4_mr_btree *bt);
+uint32_t mlx4_mr_create_primary(struct rte_eth_dev *dev,
+				struct mlx4_mr_cache *entry, uintptr_t addr);
 void mlx4_mr_mem_event_cb(enum rte_mem_event event_type, const void *addr,
 			  size_t len, void *arg);
 int mlx4_mr_update_mp(struct rte_eth_dev *dev, struct mlx4_mr_ctrl *mr_ctrl,
-- 
2.11.0

^ permalink raw reply	[flat|nested] 44+ messages in thread

* [dpdk-dev] [PATCH v2 6/6] net/mlx4: enable secondary process to register DMA memory
  2019-03-25 19:22   ` [dpdk-dev] [PATCH v2 6/6] net/mlx4: enable secondary process to register DMA memory Yongseok Koh
@ 2019-03-25 19:22     ` Yongseok Koh
  0 siblings, 0 replies; 44+ messages in thread
From: Yongseok Koh @ 2019-03-25 19:22 UTC (permalink / raw)
  To: shahafs; +Cc: dev

The Memory Region (MR) for DMA memory can't be created from a secondary
process due to a lib/driver limitation. Whenever one is needed, the
secondary process can make a request to the primary process through the EAL
IPC channel (rte_mp_msg), which is established on initialization. Once an MR
is created by the primary process, it is immediately visible to the
secondary process because the MR list is global per device. Thus, the
secondary process can look up the list after the request has returned
successfully.

Signed-off-by: Yongseok Koh <yskoh@mellanox.com>
Acked-by: Shahaf Shuler <shahafs@mellanox.com>
---
 doc/guides/nics/mlx4.rst   |  1 -
 drivers/net/mlx4/mlx4.h    |  6 +++
 drivers/net/mlx4/mlx4_mp.c | 50 ++++++++++++++++++++++++
 drivers/net/mlx4/mlx4_mr.c | 95 ++++++++++++++++++++++++++++++++++++++++------
 drivers/net/mlx4/mlx4_mr.h |  2 +
 5 files changed, 142 insertions(+), 12 deletions(-)

diff --git a/doc/guides/nics/mlx4.rst b/doc/guides/nics/mlx4.rst
index c8a02be4dd..aaf1907532 100644
--- a/doc/guides/nics/mlx4.rst
+++ b/doc/guides/nics/mlx4.rst
@@ -159,7 +159,6 @@ Limitations
 - For secondary process:
 
   - Forked secondary process not supported.
-  - All mempools must be initialized before rte_eth_dev_start().
   - External memory unregistered in EAL memseg list cannot be used for DMA
     unless such memory has been registered by ``mlx4_mr_update_ext_mp()`` in
     primary process and remapped to the same virtual address in secondary
diff --git a/drivers/net/mlx4/mlx4.h b/drivers/net/mlx4/mlx4.h
index 5316c51247..3881943ef0 100644
--- a/drivers/net/mlx4/mlx4.h
+++ b/drivers/net/mlx4/mlx4.h
@@ -79,6 +79,7 @@ enum {
 /* Request types for IPC. */
 enum mlx4_mp_req_type {
 	MLX4_MP_REQ_VERBS_CMD_FD = 1,
+	MLX4_MP_REQ_CREATE_MR,
 	MLX4_MP_REQ_START_RXTX,
 	MLX4_MP_REQ_STOP_RXTX,
 };
@@ -88,6 +89,10 @@ struct mlx4_mp_param {
 	enum mlx4_mp_req_type type;
 	int port_id;
 	int result;
+	RTE_STD_C11
+	union {
+		uintptr_t addr; /* MLX4_MP_REQ_CREATE_MR */
+	} args;
 };
 
 /** Request timeout for IPC. */
@@ -234,6 +239,7 @@ int mlx4_rx_intr_enable(struct rte_eth_dev *dev, uint16_t idx);
 /* mlx4_mp.c */
 void mlx4_mp_req_start_rxtx(struct rte_eth_dev *dev);
 void mlx4_mp_req_stop_rxtx(struct rte_eth_dev *dev);
+int mlx4_mp_req_mr_create(struct rte_eth_dev *dev, uintptr_t addr);
 int mlx4_mp_req_verbs_cmd_fd(struct rte_eth_dev *dev);
 void mlx4_mp_init_primary(void);
 void mlx4_mp_uninit_primary(void);
diff --git a/drivers/net/mlx4/mlx4_mp.c b/drivers/net/mlx4/mlx4_mp.c
index eaeb257348..183622453c 100644
--- a/drivers/net/mlx4/mlx4_mp.c
+++ b/drivers/net/mlx4/mlx4_mp.c
@@ -58,6 +58,8 @@ mp_primary_handle(const struct rte_mp_msg *mp_msg, const void *peer)
 		(const struct mlx4_mp_param *)mp_msg->param;
 	struct rte_eth_dev *dev;
 	struct mlx4_priv *priv;
+	struct mlx4_mr_cache entry;
+	uint32_t lkey;
 	int ret;
 
 	assert(rte_eal_process_type() == RTE_PROC_PRIMARY);
@@ -69,6 +71,13 @@ mp_primary_handle(const struct rte_mp_msg *mp_msg, const void *peer)
 	dev = &rte_eth_devices[param->port_id];
 	priv = dev->data->dev_private;
 	switch (param->type) {
+	case MLX4_MP_REQ_CREATE_MR:
+		mp_init_msg(dev, &mp_res, param->type);
+		lkey = mlx4_mr_create_primary(dev, &entry, param->args.addr);
+		if (lkey == UINT32_MAX)
+			res->result = -rte_errno;
+		ret = rte_mp_reply(&mp_res, peer);
+		break;
 	case MLX4_MP_REQ_VERBS_CMD_FD:
 		mp_init_msg(dev, &mp_res, param->type);
 		mp_res.num_fds = 1;
@@ -218,6 +227,47 @@ mlx4_mp_req_stop_rxtx(struct rte_eth_dev *dev)
 }
 
 /**
+ * Request Memory Region creation from the primary process.
+ *
+ * @param[in] dev
+ *   Pointer to Ethernet structure.
+ * @param addr
+ *   Target virtual address to register.
+ *
+ * @return
+ *   0 on success, a negative errno value otherwise and rte_errno is set.
+ */
+int
+mlx4_mp_req_mr_create(struct rte_eth_dev *dev, uintptr_t addr)
+{
+	struct rte_mp_msg mp_req;
+	struct rte_mp_msg *mp_res;
+	struct rte_mp_reply mp_rep;
+	struct mlx4_mp_param *req = (struct mlx4_mp_param *)mp_req.param;
+	struct mlx4_mp_param *res;
+	struct timespec ts = {.tv_sec = MLX4_MP_REQ_TIMEOUT_SEC, .tv_nsec = 0};
+	int ret;
+
+	assert(rte_eal_process_type() == RTE_PROC_SECONDARY);
+	mp_init_msg(dev, &mp_req, MLX4_MP_REQ_CREATE_MR);
+	req->args.addr = addr;
+	ret = rte_mp_request_sync(&mp_req, &mp_rep, &ts);
+	if (ret) {
+		ERROR("port %u request to primary process failed",
+		      dev->data->port_id);
+		return -rte_errno;
+	}
+	assert(mp_rep.nb_received == 1);
+	mp_res = &mp_rep.msgs[0];
+	res = (struct mlx4_mp_param *)mp_res->param;
+	ret = res->result;
+	if (ret)
+		rte_errno = -ret;
+	free(mp_rep.msgs);
+	return ret;
+}
+
+/**
  * IPC message handler of primary process.
  *
  * @param[in] dev
diff --git a/drivers/net/mlx4/mlx4_mr.c b/drivers/net/mlx4/mlx4_mr.c
index 6db917a092..ad7d4832f2 100644
--- a/drivers/net/mlx4/mlx4_mr.c
+++ b/drivers/net/mlx4/mlx4_mr.c
@@ -528,7 +528,10 @@ mr_find_contig_memsegs_cb(const struct rte_memseg_list *msl,
 
 /**
  * Create a new global Memroy Region (MR) for a missing virtual address.
- * Register entire virtually contiguous memory chunk around the address.
+ * This API should be called from a secondary process; a request is then sent
+ * to the primary process in order to create an MR for the address. As the
+ * global MR list is in shared memory, the following LKey lookup should succeed
+ * unless the request fails.
  *
  * @param dev
  *   Pointer to Ethernet device.
@@ -542,8 +545,52 @@ mr_find_contig_memsegs_cb(const struct rte_memseg_list *msl,
  *   Searched LKey on success, UINT32_MAX on failure and rte_errno is set.
  */
 static uint32_t
-mlx4_mr_create(struct rte_eth_dev *dev, struct mlx4_mr_cache *entry,
-	       uintptr_t addr)
+mlx4_mr_create_secondary(struct rte_eth_dev *dev, struct mlx4_mr_cache *entry,
+			 uintptr_t addr)
+{
+	struct mlx4_priv *priv = dev->data->dev_private;
+	int ret;
+
+	DEBUG("port %u requesting MR creation for address (%p)",
+	      dev->data->port_id, (void *)addr);
+	ret = mlx4_mp_req_mr_create(dev, addr);
+	if (ret) {
+		DEBUG("port %u failed to request MR creation for address (%p)",
+		      dev->data->port_id, (void *)addr);
+		return UINT32_MAX;
+	}
+	rte_rwlock_read_lock(&priv->mr.rwlock);
+	/* Fill in output data. */
+	mr_lookup_dev(dev, entry, addr);
+	/* Lookup can't fail. */
+	assert(entry->lkey != UINT32_MAX);
+	rte_rwlock_read_unlock(&priv->mr.rwlock);
+	DEBUG("port %u MR CREATED by primary process for %p:\n"
+	      "  [0x%" PRIxPTR ", 0x%" PRIxPTR "), lkey=0x%x",
+	      dev->data->port_id, (void *)addr,
+	      entry->start, entry->end, entry->lkey);
+	return entry->lkey;
+}
+
+/**
+ * Create a new global Memory Region (MR) for a missing virtual address.
+ * Register entire virtually contiguous memory chunk around the address.
+ * This must be called from the primary process.
+ *
+ * @param dev
+ *   Pointer to Ethernet device.
+ * @param[out] entry
+ *   Pointer to returning MR cache entry, found in the global cache or newly
+ *   created. If failed to create one, this will not be updated.
+ * @param addr
+ *   Target virtual address to register.
+ *
+ * @return
+ *   Searched LKey on success, UINT32_MAX on failure and rte_errno is set.
+ */
+uint32_t
+mlx4_mr_create_primary(struct rte_eth_dev *dev, struct mlx4_mr_cache *entry,
+		       uintptr_t addr)
 {
 	struct mlx4_priv *priv = dev->data->dev_private;
 	struct rte_mem_config *mcfg = rte_eal_get_configuration()->mem_config;
@@ -563,14 +610,6 @@ mlx4_mr_create(struct rte_eth_dev *dev, struct mlx4_mr_cache *entry,
 
 	DEBUG("port %u creating a MR using address (%p)",
 	      dev->data->port_id, (void *)addr);
-	if (rte_eal_process_type() != RTE_PROC_PRIMARY) {
-		WARN("port %u using address (%p) of unregistered mempool"
-		     " in secondary process, please create mempool"
-		     " before rte_eth_dev_start()",
-		     dev->data->port_id, (void *)addr);
-		rte_errno = EPERM;
-		goto err_nolock;
-	}
 	/*
 	 * Release detached MRs if any. This can't be called with holding either
 	 * memory_hotplug_lock or priv->mr.rwlock. MRs on the free list have
@@ -781,6 +820,40 @@ mlx4_mr_create(struct rte_eth_dev *dev, struct mlx4_mr_cache *entry,
 }
 
 /**
+ * Create a new global Memory Region (MR) for a missing virtual address.
+ * This can be called from both primary and secondary processes.
+ *
+ * @param dev
+ *   Pointer to Ethernet device.
+ * @param[out] entry
+ *   Pointer to returning MR cache entry, found in the global cache or newly
+ *   created. If failed to create one, this will not be updated.
+ * @param addr
+ *   Target virtual address to register.
+ *
+ * @return
+ *   Searched LKey on success, UINT32_MAX on failure and rte_errno is set.
+ */
+static uint32_t
+mlx4_mr_create(struct rte_eth_dev *dev, struct mlx4_mr_cache *entry,
+	       uintptr_t addr)
+{
+	uint32_t ret = 0;
+
+	switch (rte_eal_process_type()) {
+	case RTE_PROC_PRIMARY:
+		ret = mlx4_mr_create_primary(dev, entry, addr);
+		break;
+	case RTE_PROC_SECONDARY:
+		ret = mlx4_mr_create_secondary(dev, entry, addr);
+		break;
+	default:
+		break;
+	}
+	return ret;
+}
+
+/**
  * Rebuild the global B-tree cache of device from the original MR list.
  *
  * @param dev
diff --git a/drivers/net/mlx4/mlx4_mr.h b/drivers/net/mlx4/mlx4_mr.h
index 37a365a8b5..9d125e239d 100644
--- a/drivers/net/mlx4/mlx4_mr.h
+++ b/drivers/net/mlx4/mlx4_mr.h
@@ -75,6 +75,8 @@ extern rte_rwlock_t mlx4_mem_event_rwlock;
 int mlx4_mr_btree_init(struct mlx4_mr_btree *bt, int n, int socket);
 void mlx4_mr_btree_free(struct mlx4_mr_btree *bt);
 void mlx4_mr_btree_dump(struct mlx4_mr_btree *bt);
+uint32_t mlx4_mr_create_primary(struct rte_eth_dev *dev,
+				struct mlx4_mr_cache *entry, uintptr_t addr);
 void mlx4_mr_mem_event_cb(enum rte_mem_event event_type, const void *addr,
 			  size_t len, void *arg);
 int mlx4_mr_update_mp(struct rte_eth_dev *dev, struct mlx4_mr_ctrl *mr_ctrl,
-- 
2.11.0


^ permalink raw reply	[flat|nested] 44+ messages in thread

* [dpdk-dev] [PATCH v3 0/6] net/mlx: enable secondary process to register DMA memory
  2019-03-07  7:41 [dpdk-dev] [PATCH 0/6] net/mlx: enable secondary process to register DMA memory Yongseok Koh
                   ` (8 preceding siblings ...)
  2019-03-25 19:22 ` [dpdk-dev] [PATCH v2 " Yongseok Koh
@ 2019-04-01 21:17 ` Yongseok Koh
  2019-04-01 21:17   ` Yongseok Koh
                     ` (7 more replies)
  9 siblings, 8 replies; 44+ messages in thread
From: Yongseok Koh @ 2019-04-01 21:17 UTC (permalink / raw)
  To: shahafs; +Cc: dev

RFC:
https://mails.dpdk.org/archives/dev/2019-March/125517.html

v3:
* rebase on the latest branch tip

v2:
* add more sanity check for eth_dev and return value from IPC request
* complement commit messages
* add MLX5_MP_REQ_TIMEOUT_SEC
* keep acked-by: Shahaf Shuler

Yongseok Koh (6):
  net/mlx: remove debug messages on datapath
  net/mlx5: fix external memory registration
  net/mlx5: add control of excessive memory pinning by kernel
  net/mlx5: enable secondary process to register DMA memory
  net/mlx4: add control of excessive memory pinning by kernel
  net/mlx4: enable secondary process to register DMA memory

 doc/guides/nics/mlx4.rst   |  12 +++-
 doc/guides/nics/mlx5.rst   |  17 +++++-
 drivers/net/mlx4/mlx4.c    |  11 +++-
 drivers/net/mlx4/mlx4.h    |  11 ++++
 drivers/net/mlx4/mlx4_mp.c |  50 +++++++++++++++++
 drivers/net/mlx4/mlx4_mr.c | 119 +++++++++++++++++++++++++++++++++-------
 drivers/net/mlx4/mlx4_mr.h |   2 +
 drivers/net/mlx5/mlx5.c    |   7 +++
 drivers/net/mlx5/mlx5.h    |   8 +++
 drivers/net/mlx5/mlx5_mp.c |  50 +++++++++++++++++
 drivers/net/mlx5/mlx5_mr.c | 133 +++++++++++++++++++++++++++++++++++++--------
 drivers/net/mlx5/mlx5_mr.h |   2 +
 12 files changed, 375 insertions(+), 47 deletions(-)

-- 
2.11.0

^ permalink raw reply	[flat|nested] 44+ messages in thread

* [dpdk-dev] [PATCH v3 1/6] net/mlx: remove debug messages on datapath
  2019-04-01 21:17 ` [dpdk-dev] [PATCH v3 0/6] net/mlx: " Yongseok Koh
  2019-04-01 21:17   ` Yongseok Koh
@ 2019-04-01 21:17   ` Yongseok Koh
  2019-04-01 21:17     ` Yongseok Koh
  2019-04-01 21:17   ` [dpdk-dev] [PATCH v3 2/6] net/mlx5: fix external memory registration Yongseok Koh
                     ` (5 subsequent siblings)
  7 siblings, 1 reply; 44+ messages in thread
From: Yongseok Koh @ 2019-04-01 21:17 UTC (permalink / raw)
  To: shahafs; +Cc: dev, stable

Cc: stable@dpdk.org

Signed-off-by: Yongseok Koh <yskoh@mellanox.com>
Acked-by: Shahaf Shuler <shahafs@mellanox.com>
---
 drivers/net/mlx4/mlx4_mr.c | 4 ----
 drivers/net/mlx5/mlx5_mr.c | 6 ------
 2 files changed, 10 deletions(-)

diff --git a/drivers/net/mlx4/mlx4_mr.c b/drivers/net/mlx4/mlx4_mr.c
index 01894faecf..0ba55fda04 100644
--- a/drivers/net/mlx4/mlx4_mr.c
+++ b/drivers/net/mlx4/mlx4_mr.c
@@ -1039,8 +1039,6 @@ mlx4_rx_addr2mr_bh(struct rxq *rxq, uintptr_t addr)
 	struct mlx4_mr_ctrl *mr_ctrl = &rxq->mr_ctrl;
 	struct mlx4_priv *priv = rxq->priv;
 
-	DEBUG("Rx queue %u: miss on top-half, mru=%u, head=%u, addr=%p",
-	      rxq->stats.idx, mr_ctrl->mru, mr_ctrl->head, (void *)addr);
 	return mlx4_mr_addr2mr_bh(ETH_DEV(priv), mr_ctrl, addr);
 }
 
@@ -1061,8 +1059,6 @@ mlx4_tx_addr2mr_bh(struct txq *txq, uintptr_t addr)
 	struct mlx4_mr_ctrl *mr_ctrl = &txq->mr_ctrl;
 	struct mlx4_priv *priv = txq->priv;
 
-	DEBUG("Tx queue %u: miss on top-half, mru=%u, head=%u, addr=%p",
-	      txq->stats.idx, mr_ctrl->mru, mr_ctrl->head, (void *)addr);
 	return mlx4_mr_addr2mr_bh(ETH_DEV(priv), mr_ctrl, addr);
 }
 
diff --git a/drivers/net/mlx5/mlx5_mr.c b/drivers/net/mlx5/mlx5_mr.c
index 88484dd50b..3718877299 100644
--- a/drivers/net/mlx5/mlx5_mr.c
+++ b/drivers/net/mlx5/mlx5_mr.c
@@ -1031,9 +1031,6 @@ mlx5_rx_addr2mr_bh(struct mlx5_rxq_data *rxq, uintptr_t addr)
 	struct mlx5_mr_ctrl *mr_ctrl = &rxq->mr_ctrl;
 	struct mlx5_priv *priv = rxq_ctrl->priv;
 
-	DRV_LOG(DEBUG,
-		"Rx queue %u: miss on top-half, mru=%u, head=%u, addr=%p",
-		rxq_ctrl->idx, mr_ctrl->mru, mr_ctrl->head, (void *)addr);
 	return mlx5_mr_addr2mr_bh(ETH_DEV(priv), mr_ctrl, addr);
 }
 
@@ -1056,9 +1053,6 @@ mlx5_tx_addr2mr_bh(struct mlx5_txq_data *txq, uintptr_t addr)
 	struct mlx5_mr_ctrl *mr_ctrl = &txq->mr_ctrl;
 	struct mlx5_priv *priv = txq_ctrl->priv;
 
-	DRV_LOG(DEBUG,
-		"Tx queue %u: miss on top-half, mru=%u, head=%u, addr=%p",
-		txq_ctrl->idx, mr_ctrl->mru, mr_ctrl->head, (void *)addr);
 	return mlx5_mr_addr2mr_bh(ETH_DEV(priv), mr_ctrl, addr);
 }
 
-- 
2.11.0

^ permalink raw reply	[flat|nested] 44+ messages in thread

* [dpdk-dev] [PATCH v3 2/6] net/mlx5: fix external memory registration
  2019-04-01 21:17 ` [dpdk-dev] [PATCH v3 0/6] net/mlx: " Yongseok Koh
  2019-04-01 21:17   ` Yongseok Koh
  2019-04-01 21:17   ` [dpdk-dev] [PATCH v3 1/6] net/mlx: remove debug messages on datapath Yongseok Koh
@ 2019-04-01 21:17   ` Yongseok Koh
  2019-04-01 21:17     ` Yongseok Koh
  2019-04-01 21:17   ` [dpdk-dev] [PATCH v3 3/6] net/mlx5: add control of excessive memory pinning by kernel Yongseok Koh
                     ` (4 subsequent siblings)
  7 siblings, 1 reply; 44+ messages in thread
From: Yongseok Koh @ 2019-04-01 21:17 UTC (permalink / raw)
  To: shahafs; +Cc: dev, stable

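A secondary process is not allowed to register an MR due to a restriction
of the library and kernel driver. When called from a secondary process,
mlx5_tx_update_ext_mp() now logs a warning and returns UINT32_MAX instead
of attempting the registration.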

Fixes: 7e43a32ee060 ("net/mlx5: support externally allocated static memory")
Cc: stable@dpdk.org

Signed-off-by: Yongseok Koh <yskoh@mellanox.com>
Acked-by: Shahaf Shuler <shahafs@mellanox.com>
---
 doc/guides/nics/mlx5.rst   |  5 +++++
 drivers/net/mlx5/mlx5_mr.c | 10 ++++++++++
 2 files changed, 15 insertions(+)

diff --git a/doc/guides/nics/mlx5.rst b/doc/guides/nics/mlx5.rst
index f4db921f7f..fa9bf73da7 100644
--- a/doc/guides/nics/mlx5.rst
+++ b/doc/guides/nics/mlx5.rst
@@ -86,6 +86,11 @@ Limitations
 
   - Forked secondary process not supported.
   - All mempools must be initialized before rte_eth_dev_start().
+  - External memory unregistered in EAL memseg list cannot be used for DMA
+    unless such memory has been registered by ``mlx5_mr_update_ext_mp()`` in
+    primary process and remapped to the same virtual address in secondary
+    process. If the external memory is registered by primary process but has
+    different virtual address in secondary process, unexpected error may happen.
 
 - Flow pattern without any specific vlan will match for vlan packets as well:
 
diff --git a/drivers/net/mlx5/mlx5_mr.c b/drivers/net/mlx5/mlx5_mr.c
index 3718877299..e7f55be6e1 100644
--- a/drivers/net/mlx5/mlx5_mr.c
+++ b/drivers/net/mlx5/mlx5_mr.c
@@ -1185,6 +1185,7 @@ mlx5_mr_update_ext_mp_cb(struct rte_mempool *mp, void *opaque,
 	struct mlx5_mr_cache entry;
 	uint32_t lkey;
 
+	assert(rte_eal_process_type() == RTE_PROC_PRIMARY);
 	/* If already registered, it should return. */
 	rte_rwlock_read_lock(&priv->mr.rwlock);
 	lkey = mr_lookup_dev(dev, &entry, addr);
@@ -1400,6 +1401,15 @@ mlx5_tx_update_ext_mp(struct mlx5_txq_data *txq, uintptr_t addr,
 	struct mlx5_mr_ctrl *mr_ctrl = &txq->mr_ctrl;
 	struct mlx5_priv *priv = txq_ctrl->priv;
 
+	if (rte_eal_process_type() != RTE_PROC_PRIMARY) {
+		DRV_LOG(WARNING,
+			"port %u using address (%p) from unregistered mempool"
+			" having externally allocated memory"
+			" in secondary process, please create mempool"
+			" prior to rte_eth_dev_start()",
+			PORT_ID(priv), (void *)addr);
+		return UINT32_MAX;
+	}
 	mlx5_mr_update_ext_mp(ETH_DEV(priv), mr_ctrl, mp);
 	return mlx5_tx_addr2mr_bh(txq, addr);
 }
-- 
2.11.0

^ permalink raw reply	[flat|nested] 44+ messages in thread

* [dpdk-dev] [PATCH v3 3/6] net/mlx5: add control of excessive memory pinning by kernel
  2019-04-01 21:17 ` [dpdk-dev] [PATCH v3 0/6] net/mlx: " Yongseok Koh
                     ` (2 preceding siblings ...)
  2019-04-01 21:17   ` [dpdk-dev] [PATCH v3 2/6] net/mlx5: fix external memory registration Yongseok Koh
@ 2019-04-01 21:17   ` Yongseok Koh
  2019-04-01 21:17     ` Yongseok Koh
  2019-04-01 21:17   ` [dpdk-dev] [PATCH v3 4/6] net/mlx5: enable secondary process to register DMA memory Yongseok Koh
                     ` (3 subsequent siblings)
  7 siblings, 1 reply; 44+ messages in thread
From: Yongseok Koh @ 2019-04-01 21:17 UTC (permalink / raw)
  To: shahafs; +Cc: dev

A new PMD parameter (mr_ext_memseg_en) is added to control extension of
memseg when creating an MR. It is enabled by default.

If enabled, mlx5_mr_create() tries to maximize the range of MR
registration so that the LKey lookup tables on the datapath become smaller
and lookups are as fast as possible. However, it may worsen memory
utilization because registered memory is pinned by the kernel driver. Even
if a page in the extended chunk is freed, it does not become reusable until
the entire chunk is freed and the MR is destroyed.

To make freed pages available immediately, this parameter has to be turned
off, at a possible cost in datapath performance.

Signed-off-by: Yongseok Koh <yskoh@mellanox.com>
Acked-by: Shahaf Shuler <shahafs@mellanox.com>
---
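Note (illustration only, not part of the patch): a minimal sketch of the
address arithmetic used when mr_ext_memseg_en is 0, where only the single
page containing the address is registered. align_floor() is a hypothetical
stand-in for RTE_ALIGN_FLOOR(), and struct range is invented for this
example; neither is a driver symbol.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for RTE_ALIGN_FLOOR(); align must be a power of 2. */
static uintptr_t
align_floor(uintptr_t addr, uintptr_t align)
{
	assert(align != 0 && (align & (align - 1)) == 0);
	return addr & ~(align - 1);
}

/* Illustrative [start, end) range; not a driver structure. */
struct range {
	uintptr_t start;
	uintptr_t end;
};

/* With extension disabled, only the page containing addr is registered. */
static struct range
single_page_range(uintptr_t addr, uintptr_t page_sz)
{
	struct range r;

	r.start = align_floor(addr, page_sz);
	r.end = r.start + page_sz;
	return r;
}
```

For an assumed 2 MB hugepage, single_page_range(0x40123456, 0x200000)
covers [0x40000000, 0x40200000). With the parameter enabled, the range
would instead grow to the whole virtually contiguous chunk.
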
 doc/guides/nics/mlx5.rst   | 11 +++++++++++
 drivers/net/mlx5/mlx5.c    |  7 +++++++
 drivers/net/mlx5/mlx5.h    |  2 ++
 drivers/net/mlx5/mlx5_mr.c | 21 ++++++++++++++++-----
 4 files changed, 36 insertions(+), 5 deletions(-)

diff --git a/doc/guides/nics/mlx5.rst b/doc/guides/nics/mlx5.rst
index fa9bf73da7..e5e3d9061e 100644
--- a/doc/guides/nics/mlx5.rst
+++ b/doc/guides/nics/mlx5.rst
@@ -485,6 +485,17 @@ Run-time configuration
 
   Disabled by default.
 
+- ``mr_ext_memseg_en`` parameter [int]
+
+  A nonzero value enables extending memseg when registering DMA memory. If
+  enabled, the number of entries in MR (Memory Region) lookup table on datapath
+  is minimized and it benefits performance. On the other hand, it worsens memory
+  utilization because registered memory is pinned by kernel driver. Even if a
+  page in the extended chunk is freed, that doesn't become reusable until the
+  entire memory is freed.
+
+  Enabled by default.
+
 - ``representor`` parameter [list]
 
   This parameter can be used to instantiate DPDK Ethernet devices from
diff --git a/drivers/net/mlx5/mlx5.c b/drivers/net/mlx5/mlx5.c
index 2b7a6d121f..40445056f5 100644
--- a/drivers/net/mlx5/mlx5.c
+++ b/drivers/net/mlx5/mlx5.c
@@ -107,6 +107,9 @@
 /* Activate Netlink support in VF mode. */
 #define MLX5_VF_NL_EN "vf_nl_en"
 
+/* Enable extending memsegs when creating a MR. */
+#define MLX5_MR_EXT_MEMSEG_EN "mr_ext_memseg_en"
+
 /* Select port representors to instantiate. */
 #define MLX5_REPRESENTOR "representor"
 
@@ -732,6 +735,8 @@ mlx5_args_check(const char *key, const char *val, void *opaque)
 		config->vf_nl_en = !!tmp;
 	} else if (strcmp(MLX5_DV_FLOW_EN, key) == 0) {
 		config->dv_flow_en = !!tmp;
+	} else if (strcmp(MLX5_MR_EXT_MEMSEG_EN, key) == 0) {
+		config->mr_ext_memseg_en = !!tmp;
 	} else {
 		DRV_LOG(WARNING, "%s: unknown parameter", key);
 		rte_errno = EINVAL;
@@ -773,6 +778,7 @@ mlx5_args(struct mlx5_dev_config *config, struct rte_devargs *devargs)
 		MLX5_L3_VXLAN_EN,
 		MLX5_VF_NL_EN,
 		MLX5_DV_FLOW_EN,
+		MLX5_MR_EXT_MEMSEG_EN,
 		MLX5_REPRESENTOR,
 		NULL,
 	};
@@ -1853,6 +1859,7 @@ mlx5_pci_probe(struct rte_pci_driver *pci_drv __rte_unused,
 		.txqs_vec = MLX5_ARG_UNSET,
 		.inline_max_packet_sz = MLX5_ARG_UNSET,
 		.vf_nl_en = 1,
+		.mr_ext_memseg_en = 1,
 		.mprq = {
 			.enabled = 0, /* Disabled by default. */
 			.stride_num_n = MLX5_MPRQ_STRIDE_NUM_N,
diff --git a/drivers/net/mlx5/mlx5.h b/drivers/net/mlx5/mlx5.h
index 12692505c3..b3445f198f 100644
--- a/drivers/net/mlx5/mlx5.h
+++ b/drivers/net/mlx5/mlx5.h
@@ -167,6 +167,8 @@ struct mlx5_dev_config {
 	unsigned int tx_vec_en:1; /* Tx vector is enabled. */
 	unsigned int rx_vec_en:1; /* Rx vector is enabled. */
 	unsigned int mpw_hdr_dseg:1; /* Enable DSEGs in the title WQEBB. */
+	unsigned int mr_ext_memseg_en:1;
+	/* Whether memseg should be extended for MR creation. */
 	unsigned int l3_vxlan_en:1; /* Enable L3 VXLAN flow creation. */
 	unsigned int vf_nl_en:1; /* Enable Netlink requests in VF mode. */
 	unsigned int dv_flow_en:1; /* Enable DV flow. */
diff --git a/drivers/net/mlx5/mlx5_mr.c b/drivers/net/mlx5/mlx5_mr.c
index e7f55be6e1..78d829722e 100644
--- a/drivers/net/mlx5/mlx5_mr.c
+++ b/drivers/net/mlx5/mlx5_mr.c
@@ -535,6 +535,7 @@ mlx5_mr_create(struct rte_eth_dev *dev, struct mlx5_mr_cache *entry,
 	       uintptr_t addr)
 {
 	struct mlx5_priv *priv = dev->data->dev_private;
+	struct mlx5_dev_config *config = &priv->config;
 	struct rte_mem_config *mcfg = rte_eal_get_configuration()->mem_config;
 	const struct rte_memseg_list *msl;
 	const struct rte_memseg *ms;
@@ -570,14 +571,24 @@ mlx5_mr_create(struct rte_eth_dev *dev, struct mlx5_mr_cache *entry,
 	 */
 	mlx5_mr_garbage_collect(dev);
 	/*
-	 * Find out a contiguous virtual address chunk in use, to which the
-	 * given address belongs, in order to register maximum range. In the
-	 * best case where mempools are not dynamically recreated and
+	 * If enabled, find out a contiguous virtual address chunk in use, to
+	 * which the given address belongs, in order to register maximum range.
+	 * In the best case where mempools are not dynamically recreated and
 	 * '--socket-mem' is specified as an EAL option, it is very likely to
 	 * have only one MR(LKey) per a socket and per a hugepage-size even
-	 * though the system memory is highly fragmented.
+	 * though the system memory is highly fragmented. As the whole memory
+	 * chunk will be pinned by kernel, it can't be reused unless entire
+	 * chunk is freed from EAL.
+	 *
+	 * If disabled, just register one memseg (page). Then, memory
+	 * consumption will be minimized but it may drop performance if there
+	 * are many MRs to lookup on the datapath.
 	 */
-	if (!rte_memseg_contig_walk(mr_find_contig_memsegs_cb, &data)) {
+	if (!config->mr_ext_memseg_en) {
+		data.msl = rte_mem_virt2memseg_list((void *)addr);
+		data.start = RTE_ALIGN_FLOOR(addr, data.msl->page_sz);
+		data.end = data.start + data.msl->page_sz;
+	} else if (!rte_memseg_contig_walk(mr_find_contig_memsegs_cb, &data)) {
 		DRV_LOG(WARNING,
 			"port %u unable to find virtually contiguous"
 			" chunk for address (%p)."
-- 
2.11.0

^ permalink raw reply	[flat|nested] 44+ messages in thread

* [dpdk-dev] [PATCH v3 4/6] net/mlx5: enable secondary process to register DMA memory
  2019-04-01 21:17 ` [dpdk-dev] [PATCH v3 0/6] net/mlx: " Yongseok Koh
                     ` (3 preceding siblings ...)
  2019-04-01 21:17   ` [dpdk-dev] [PATCH v3 3/6] net/mlx5: add control of excessive memory pinning by kernel Yongseok Koh
@ 2019-04-01 21:17   ` Yongseok Koh
  2019-04-01 21:17     ` Yongseok Koh
  2019-04-01 21:17   ` [dpdk-dev] [PATCH v3 5/6] net/mlx4: add control of excessive memory pinning by kernel Yongseok Koh
                     ` (2 subsequent siblings)
  7 siblings, 1 reply; 44+ messages in thread
From: Yongseok Koh @ 2019-04-01 21:17 UTC (permalink / raw)
  To: shahafs; +Cc: dev

The Memory Region (MR) for DMA memory can't be created from a secondary
process due to a lib/driver limitation. Whenever one is needed, the
secondary process can make a request to the primary process through the
EAL IPC channel (rte_mp_msg), which is established on initialization. Once
an MR is created by the primary process, it is immediately visible to the
secondary process because the MR list is global per device. Thus, the
secondary process can look up the list after the request has returned
successfully.

Signed-off-by: Yongseok Koh <yskoh@mellanox.com>
Acked-by: Shahaf Shuler <shahafs@mellanox.com>
---
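Note (illustration only, not part of the patch): a minimal single-process
sketch of the request-then-lookup flow described above. Every name here
(mr_list, mr_lookup(), mp_req_mr_create(), the 4 KB page size, the lkey
values) is hypothetical; the IPC round trip is replaced by a direct call,
and a static array stands in for the MR list in shared memory.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define LKEY_INVALID UINT32_MAX
#define MR_MAX 8

/* Illustrative MR entry covering [start, end); not the driver's layout. */
struct mr_entry {
	uintptr_t start;
	uintptr_t end;
	uint32_t lkey;
};

/* Stands in for the per-device global MR list kept in shared memory. */
static struct mr_entry mr_list[MR_MAX];
static size_t mr_count;

enum proc_type { PROC_PRIMARY, PROC_SECONDARY };

/* Look up the LKey covering addr in the shared list. */
static uint32_t
mr_lookup(uintptr_t addr)
{
	size_t i;

	for (i = 0; i < mr_count; i++)
		if (addr >= mr_list[i].start && addr < mr_list[i].end)
			return mr_list[i].lkey;
	return LKEY_INVALID;
}

/* Primary side: "register" one 4 KB page around addr (size is assumed). */
static uint32_t
mr_create_primary(uintptr_t addr)
{
	struct mr_entry *e;

	if (mr_count == MR_MAX)
		return LKEY_INVALID;
	e = &mr_list[mr_count];
	e->start = addr & ~(uintptr_t)0xfff;
	e->end = e->start + 0x1000;
	e->lkey = 0x100 + (uint32_t)mr_count; /* made-up key value */
	mr_count++;
	return e->lkey;
}

/* Stub for the IPC round trip; a real request would cross processes. */
static int
mp_req_mr_create(uintptr_t addr)
{
	return mr_create_primary(addr) == LKEY_INVALID ? -1 : 0;
}

/* Secondary side: ask the primary, then read the result from the list. */
static uint32_t
mr_create_secondary(uintptr_t addr)
{
	uint32_t lkey;

	if (mp_req_mr_create(addr))
		return LKEY_INVALID;
	lkey = mr_lookup(addr);
	/* The request succeeded, so the entry must be visible. */
	assert(lkey != LKEY_INVALID);
	return lkey;
}

/* Dispatch on process type. */
static uint32_t
mr_create(enum proc_type type, uintptr_t addr)
{
	switch (type) {
	case PROC_PRIMARY:
		return mr_create_primary(addr);
	case PROC_SECONDARY:
		return mr_create_secondary(addr);
	}
	return LKEY_INVALID;
}
```

The point of the sketch is that the secondary side never creates the entry
itself: it only triggers creation and then reads the shared list, so no MR
data needs to travel back in the IPC reply.
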
 doc/guides/nics/mlx5.rst   |  1 -
 drivers/net/mlx5/mlx5.h    |  6 +++
 drivers/net/mlx5/mlx5_mp.c | 50 ++++++++++++++++++++++++
 drivers/net/mlx5/mlx5_mr.c | 96 ++++++++++++++++++++++++++++++++++++++++------
 drivers/net/mlx5/mlx5_mr.h |  2 +
 5 files changed, 142 insertions(+), 13 deletions(-)

diff --git a/doc/guides/nics/mlx5.rst b/doc/guides/nics/mlx5.rst
index e5e3d9061e..5fa6b62527 100644
--- a/doc/guides/nics/mlx5.rst
+++ b/doc/guides/nics/mlx5.rst
@@ -85,7 +85,6 @@ Limitations
 - For secondary process:
 
   - Forked secondary process not supported.
-  - All mempools must be initialized before rte_eth_dev_start().
   - External memory unregistered in EAL memseg list cannot be used for DMA
     unless such memory has been registered by ``mlx5_mr_update_ext_mp()`` in
     primary process and remapped to the same virtual address in secondary
diff --git a/drivers/net/mlx5/mlx5.h b/drivers/net/mlx5/mlx5.h
index b3445f198f..47a7d75f7a 100644
--- a/drivers/net/mlx5/mlx5.h
+++ b/drivers/net/mlx5/mlx5.h
@@ -59,6 +59,7 @@ enum {
 /* Request types for IPC. */
 enum mlx5_mp_req_type {
 	MLX5_MP_REQ_VERBS_CMD_FD = 1,
+	MLX5_MP_REQ_CREATE_MR,
 	MLX5_MP_REQ_START_RXTX,
 	MLX5_MP_REQ_STOP_RXTX,
 };
@@ -68,6 +69,10 @@ struct mlx5_mp_param {
 	enum mlx5_mp_req_type type;
 	int port_id;
 	int result;
+	RTE_STD_C11
+	union {
+		uintptr_t addr; /* MLX5_MP_REQ_CREATE_MR */
+	} args;
 };
 
 /** Request timeout for IPC. */
@@ -467,6 +472,7 @@ void mlx5_flow_delete_drop_queue(struct rte_eth_dev *dev);
 /* mlx5_mp.c */
 void mlx5_mp_req_start_rxtx(struct rte_eth_dev *dev);
 void mlx5_mp_req_stop_rxtx(struct rte_eth_dev *dev);
+int mlx5_mp_req_mr_create(struct rte_eth_dev *dev, uintptr_t addr);
 int mlx5_mp_req_verbs_cmd_fd(struct rte_eth_dev *dev);
 void mlx5_mp_init_primary(void);
 void mlx5_mp_uninit_primary(void);
diff --git a/drivers/net/mlx5/mlx5_mp.c b/drivers/net/mlx5/mlx5_mp.c
index 45dcc30426..cea74adb63 100644
--- a/drivers/net/mlx5/mlx5_mp.c
+++ b/drivers/net/mlx5/mlx5_mp.c
@@ -58,6 +58,8 @@ mp_primary_handle(const struct rte_mp_msg *mp_msg, const void *peer)
 		(const struct mlx5_mp_param *)mp_msg->param;
 	struct rte_eth_dev *dev;
 	struct mlx5_priv *priv;
+	struct mlx5_mr_cache entry;
+	uint32_t lkey;
 	int ret;
 
 	assert(rte_eal_process_type() == RTE_PROC_PRIMARY);
@@ -69,6 +71,13 @@ mp_primary_handle(const struct rte_mp_msg *mp_msg, const void *peer)
 	dev = &rte_eth_devices[param->port_id];
 	priv = dev->data->dev_private;
 	switch (param->type) {
+	case MLX5_MP_REQ_CREATE_MR:
+		mp_init_msg(dev, &mp_res, param->type);
+		lkey = mlx5_mr_create_primary(dev, &entry, param->args.addr);
+		if (lkey == UINT32_MAX)
+			res->result = -rte_errno;
+		ret = rte_mp_reply(&mp_res, peer);
+		break;
 	case MLX5_MP_REQ_VERBS_CMD_FD:
 		mp_init_msg(dev, &mp_res, param->type);
 		mp_res.num_fds = 1;
@@ -221,6 +230,47 @@ mlx5_mp_req_stop_rxtx(struct rte_eth_dev *dev)
 }
 
 /**
+ * Request Memory Region creation to the primary process.
+ *
+ * @param[in] dev
+ *   Pointer to Ethernet structure.
+ * @param addr
+ *   Target virtual address to register.
+ *
+ * @return
+ *   0 on success, a negative errno value otherwise and rte_errno is set.
+ */
+int
+mlx5_mp_req_mr_create(struct rte_eth_dev *dev, uintptr_t addr)
+{
+	struct rte_mp_msg mp_req;
+	struct rte_mp_msg *mp_res;
+	struct rte_mp_reply mp_rep;
+	struct mlx5_mp_param *req = (struct mlx5_mp_param *)mp_req.param;
+	struct mlx5_mp_param *res;
+	struct timespec ts = {.tv_sec = MLX5_MP_REQ_TIMEOUT_SEC, .tv_nsec = 0};
+	int ret;
+
+	assert(rte_eal_process_type() == RTE_PROC_SECONDARY);
+	mp_init_msg(dev, &mp_req, MLX5_MP_REQ_CREATE_MR);
+	req->args.addr = addr;
+	ret = rte_mp_request_sync(&mp_req, &mp_rep, &ts);
+	if (ret) {
+		DRV_LOG(ERR, "port %u request to primary process failed",
+			dev->data->port_id);
+		return -rte_errno;
+	}
+	assert(mp_rep.nb_received == 1);
+	mp_res = &mp_rep.msgs[0];
+	res = (struct mlx5_mp_param *)mp_res->param;
+	ret = res->result;
+	if (ret)
+		rte_errno = -ret;
+	free(mp_rep.msgs);
+	return ret;
+}
+
+/**
  * Request Verbs command file descriptor for mmap to the primary process.
  *
  * @param[in] dev
diff --git a/drivers/net/mlx5/mlx5_mr.c b/drivers/net/mlx5/mlx5_mr.c
index 78d829722e..44b65916da 100644
--- a/drivers/net/mlx5/mlx5_mr.c
+++ b/drivers/net/mlx5/mlx5_mr.c
@@ -517,7 +517,10 @@ mr_find_contig_memsegs_cb(const struct rte_memseg_list *msl,
 
 /**
  * Create a new global Memroy Region (MR) for a missing virtual address.
- * Register entire virtually contiguous memory chunk around the address.
+ * This API should be called on a secondary process; a request is then sent to
+ * the primary process in order to create a MR for the address. As the global MR
+ * list is in shared memory, the following LKey lookup should succeed unless the
+ * request fails.
  *
  * @param dev
  *   Pointer to Ethernet device.
@@ -531,8 +534,52 @@ mr_find_contig_memsegs_cb(const struct rte_memseg_list *msl,
  *   Searched LKey on success, UINT32_MAX on failure and rte_errno is set.
  */
 static uint32_t
-mlx5_mr_create(struct rte_eth_dev *dev, struct mlx5_mr_cache *entry,
-	       uintptr_t addr)
+mlx5_mr_create_secondary(struct rte_eth_dev *dev, struct mlx5_mr_cache *entry,
+			 uintptr_t addr)
+{
+	struct mlx5_priv *priv = dev->data->dev_private;
+	int ret;
+
+	DEBUG("port %u requesting MR creation for address (%p)",
+	      dev->data->port_id, (void *)addr);
+	ret = mlx5_mp_req_mr_create(dev, addr);
+	if (ret) {
+		DEBUG("port %u fail to request MR creation for address (%p)",
+		      dev->data->port_id, (void *)addr);
+		return UINT32_MAX;
+	}
+	rte_rwlock_read_lock(&priv->mr.rwlock);
+	/* Fill in output data. */
+	mr_lookup_dev(dev, entry, addr);
+	/* Lookup can't fail. */
+	assert(entry->lkey != UINT32_MAX);
+	rte_rwlock_read_unlock(&priv->mr.rwlock);
+	DEBUG("port %u MR CREATED by primary process for %p:\n"
+	      "  [0x%" PRIxPTR ", 0x%" PRIxPTR "), lkey=0x%x",
+	      dev->data->port_id, (void *)addr,
+	      entry->start, entry->end, entry->lkey);
+	return entry->lkey;
+}
+
+/**
+ * Create a new global Memory Region (MR) for a missing virtual address.
+ * Register entire virtually contiguous memory chunk around the address.
+ * This must be called from the primary process.
+ *
+ * @param dev
+ *   Pointer to Ethernet device.
+ * @param[out] entry
+ *   Pointer to returning MR cache entry, found in the global cache or newly
+ *   created. If failed to create one, this will not be updated.
+ * @param addr
+ *   Target virtual address to register.
+ *
+ * @return
+ *   Searched LKey on success, UINT32_MAX on failure and rte_errno is set.
+ */
+uint32_t
+mlx5_mr_create_primary(struct rte_eth_dev *dev, struct mlx5_mr_cache *entry,
+		       uintptr_t addr)
 {
 	struct mlx5_priv *priv = dev->data->dev_private;
 	struct mlx5_dev_config *config = &priv->config;
@@ -553,15 +600,6 @@ mlx5_mr_create(struct rte_eth_dev *dev, struct mlx5_mr_cache *entry,
 
 	DRV_LOG(DEBUG, "port %u creating a MR using address (%p)",
 		dev->data->port_id, (void *)addr);
-	if (rte_eal_process_type() != RTE_PROC_PRIMARY) {
-		DRV_LOG(WARNING,
-			"port %u using address (%p) of unregistered mempool"
-			" in secondary process, please create mempool"
-			" before rte_eth_dev_start()",
-			dev->data->port_id, (void *)addr);
-		rte_errno = EPERM;
-		goto err_nolock;
-	}
 	/*
 	 * Release detached MRs if any. This can't be called with holding either
 	 * memory_hotplug_lock or priv->mr.rwlock. MRs on the free list have
@@ -773,6 +811,40 @@ mlx5_mr_create(struct rte_eth_dev *dev, struct mlx5_mr_cache *entry,
 }
 
 /**
+ * Create a new global Memory Region (MR) for a missing virtual address.
+ * This can be called from primary and secondary process.
+ *
+ * @param dev
+ *   Pointer to Ethernet device.
+ * @param[out] entry
+ *   Pointer to returning MR cache entry, found in the global cache or newly
+ *   created. If failed to create one, this will not be updated.
+ * @param addr
+ *   Target virtual address to register.
+ *
+ * @return
+ *   Searched LKey on success, UINT32_MAX on failure and rte_errno is set.
+ */
+static uint32_t
+mlx5_mr_create(struct rte_eth_dev *dev, struct mlx5_mr_cache *entry,
+	       uintptr_t addr)
+{
+	uint32_t ret = 0;
+
+	switch (rte_eal_process_type()) {
+	case RTE_PROC_PRIMARY:
+		ret = mlx5_mr_create_primary(dev, entry, addr);
+		break;
+	case RTE_PROC_SECONDARY:
+		ret = mlx5_mr_create_secondary(dev, entry, addr);
+		break;
+	default:
+		break;
+	}
+	return ret;
+}
+
+/**
  * Rebuild the global B-tree cache of device from the original MR list.
  *
  * @param dev
diff --git a/drivers/net/mlx5/mlx5_mr.h b/drivers/net/mlx5/mlx5_mr.h
index a57003fe92..786f6a3148 100644
--- a/drivers/net/mlx5/mlx5_mr.h
+++ b/drivers/net/mlx5/mlx5_mr.h
@@ -70,6 +70,8 @@ extern rte_rwlock_t mlx5_mem_event_rwlock;
 
 int mlx5_mr_btree_init(struct mlx5_mr_btree *bt, int n, int socket);
 void mlx5_mr_btree_free(struct mlx5_mr_btree *bt);
+uint32_t mlx5_mr_create_primary(struct rte_eth_dev *dev,
+				struct mlx5_mr_cache *entry, uintptr_t addr);
 void mlx5_mr_mem_event_cb(enum rte_mem_event event_type, const void *addr,
 			  size_t len, void *arg);
 int mlx5_mr_update_mp(struct rte_eth_dev *dev, struct mlx5_mr_ctrl *mr_ctrl,
-- 
2.11.0

^ permalink raw reply	[flat|nested] 44+ messages in thread

* [dpdk-dev] [PATCH v3 5/6] net/mlx4: add control of excessive memory pinning by kernel
  2019-04-01 21:17 ` [dpdk-dev] [PATCH v3 0/6] net/mlx: " Yongseok Koh
                     ` (4 preceding siblings ...)
  2019-04-01 21:17   ` [dpdk-dev] [PATCH v3 4/6] net/mlx5: enable secondary process to register DMA memory Yongseok Koh
@ 2019-04-01 21:17   ` Yongseok Koh
  2019-04-01 21:17     ` Yongseok Koh
  2019-05-14  4:52     ` Stephen Hemminger
  2019-04-01 21:17   ` [dpdk-dev] [PATCH v3 6/6] net/mlx4: enable secondary process to register DMA memory Yongseok Koh
  2019-04-02  7:13   ` [dpdk-dev] [PATCH v3 0/6] net/mlx: " Shahaf Shuler
  7 siblings, 2 replies; 44+ messages in thread
From: Yongseok Koh @ 2019-04-01 21:17 UTC (permalink / raw)
  To: shahafs; +Cc: dev

A new PMD parameter (mr_ext_memseg_en) is added to control extension of
memseg when creating an MR. It is enabled by default.

If enabled, mlx4_mr_create() tries to maximize the range of MR
registration so that the LKey lookup tables on the datapath become smaller
and give the best performance. However, it may worsen memory utilization
because registered memory is pinned by the kernel driver. Even if a page in
the extended chunk is freed, it doesn't become reusable until the entire
memory is freed and the MR is destroyed.

To make freed pages available immediately, this parameter has to be turned
off, but that could reduce performance.
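
The effect of the parameter on the registered range can be sketched as
below. This is a simplified model, not the driver code: mr_range,
align_floor and mr_pick_range are hypothetical names, and the contiguous
chunk that the driver would discover by walking the memsegs is simply
passed in by the caller.

```c
#include <assert.h>
#include <stdint.h>

/* Half-open virtual address range [start, end). */
struct mr_range { uintptr_t start; uintptr_t end; };

/* Round addr down to a multiple of align (align must be a power of two). */
static uintptr_t
align_floor(uintptr_t addr, uintptr_t align)
{
	assert(align != 0 && (align & (align - 1)) == 0);
	return addr & ~(align - 1);
}

/*
 * Pick the range to register for addr.
 * - Extension enabled: use the whole contiguous chunk around addr.
 * - Extension disabled: use only the single page that holds addr.
 */
static struct mr_range
mr_pick_range(int ext_memseg_en, uintptr_t addr, uintptr_t page_sz,
	      struct mr_range contig)
{
	struct mr_range r;

	if (ext_memseg_en)
		return contig;
	r.start = align_floor(addr, page_sz);
	r.end = r.start + page_sz;
	return r;
}
```

With 2 MB pages and a 1 GB chunk, disabling the extension registers 2 MB
per MR instead of 1 GB, so far less memory stays pinned, at the cost of up
to 512 MRs to search instead of one.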

Signed-off-by: Yongseok Koh <yskoh@mellanox.com>
Acked-by: Shahaf Shuler <shahafs@mellanox.com>
---
 doc/guides/nics/mlx4.rst   | 11 +++++++++++
 drivers/net/mlx4/mlx4.c    | 11 +++++++++--
 drivers/net/mlx4/mlx4.h    |  5 +++++
 drivers/net/mlx4/mlx4_mr.c | 20 +++++++++++++++-----
 4 files changed, 40 insertions(+), 7 deletions(-)

diff --git a/doc/guides/nics/mlx4.rst b/doc/guides/nics/mlx4.rst
index cd34838f41..c8a02be4dd 100644
--- a/doc/guides/nics/mlx4.rst
+++ b/doc/guides/nics/mlx4.rst
@@ -119,6 +119,17 @@ Run-time configuration
   times for additional ports. All ports are probed by default if left
   unspecified.
 
+- ``mr_ext_memseg_en`` parameter [int]
+
+  A nonzero value enables extending memseg when registering DMA memory. If
+  enabled, the number of entries in MR (Memory Region) lookup table on datapath
+  is minimized and it benefits performance. On the other hand, it worsens memory
+  utilization because registered memory is pinned by kernel driver. Even if a
+  page in the extended chunk is freed, that doesn't become reusable until the
+  entire memory is freed.
+
+  Enabled by default.
+
 Kernel module parameters
 ~~~~~~~~~~~~~~~~~~~~~~~~
 
diff --git a/drivers/net/mlx4/mlx4.c b/drivers/net/mlx4/mlx4.c
index 315640a6d7..252658fc6a 100644
--- a/drivers/net/mlx4/mlx4.c
+++ b/drivers/net/mlx4/mlx4.c
@@ -66,11 +66,14 @@ struct mlx4_conf {
 		uint32_t present; /**< Bit-field for existing ports. */
 		uint32_t enabled; /**< Bit-field for user-enabled ports. */
 	} ports;
+	int mr_ext_memseg_en;
+	/** Whether memseg should be extended for MR creation. */
 };
 
 /* Available parameters list. */
 const char *pmd_mlx4_init_params[] = {
 	MLX4_PMD_PORT_KVARG,
+	MLX4_MR_EXT_MEMSEG_EN_KVARG,
 	NULL,
 };
 
@@ -509,6 +512,8 @@ mlx4_arg_parse(const char *key, const char *val, struct mlx4_conf *conf)
 			return -rte_errno;
 		}
 		conf->ports.enabled |= 1 << tmp;
+	} else if (strcmp(MLX4_MR_EXT_MEMSEG_EN_KVARG, key) == 0) {
+		conf->mr_ext_memseg_en = !!tmp;
 	} else {
 		rte_errno = EINVAL;
 		WARN("%s: unknown parameter", key);
@@ -544,10 +549,10 @@ mlx4_args(struct rte_devargs *devargs, struct mlx4_conf *conf)
 	}
 	/* Process parameters. */
 	for (i = 0; pmd_mlx4_init_params[i]; ++i) {
-		arg_count = rte_kvargs_count(kvlist, MLX4_PMD_PORT_KVARG);
+		arg_count = rte_kvargs_count(kvlist, pmd_mlx4_init_params[i]);
 		while (arg_count-- > 0) {
 			ret = rte_kvargs_process(kvlist,
-						 MLX4_PMD_PORT_KVARG,
+						 pmd_mlx4_init_params[i],
 						 (int (*)(const char *,
 							  const char *,
 							  void *))
@@ -876,6 +881,7 @@ mlx4_pci_probe(struct rte_pci_driver *pci_drv, struct rte_pci_device *pci_dev)
 	struct ibv_device_attr_ex device_attr_ex;
 	struct mlx4_conf conf = {
 		.ports.present = 0,
+		.mr_ext_memseg_en = 1,
 	};
 	unsigned int vf;
 	int i;
@@ -1100,6 +1106,7 @@ mlx4_pci_probe(struct rte_pci_driver *pci_drv, struct rte_pci_device *pci_dev)
 					device_attr_ex.tso_caps.max_tso;
 		DEBUG("TSO is %ssupported",
 		      priv->tso ? "" : "not ");
+		priv->mr_ext_memseg_en = conf.mr_ext_memseg_en;
 		/* Configure the first MAC address by default. */
 		err = mlx4_get_mac(priv, &mac.addr_bytes);
 		if (err) {
diff --git a/drivers/net/mlx4/mlx4.h b/drivers/net/mlx4/mlx4.h
index 1a7b1fb541..4ff98d772b 100644
--- a/drivers/net/mlx4/mlx4.h
+++ b/drivers/net/mlx4/mlx4.h
@@ -53,6 +53,9 @@
 /** Port parameter. */
 #define MLX4_PMD_PORT_KVARG "port"
 
+/** Enable extending memsegs when creating a MR. */
+#define MLX4_MR_EXT_MEMSEG_EN_KVARG "mr_ext_memseg_en"
+
 /* Reserved address space for UAR mapping. */
 #define MLX4_UAR_SIZE (1ULL << (sizeof(uintptr_t) * 4))
 
@@ -165,6 +168,8 @@ struct mlx4_priv {
 	uint32_t hw_csum_l2tun:1; /**< Checksum support for L2 tunnels. */
 	uint32_t hw_fcs_strip:1; /**< FCS stripping toggling is supported. */
 	uint32_t tso:1; /**< Transmit segmentation offload is supported. */
+	uint32_t mr_ext_memseg_en:1;
+	/** Whether memseg should be extended for MR creation. */
 	uint32_t tso_max_payload_sz; /**< Max supported TSO payload size. */
 	uint32_t hw_rss_max_qps; /**< Max Rx Queues supported by RSS. */
 	uint64_t hw_rss_sup; /**< Supported RSS hash fields (Verbs format). */
diff --git a/drivers/net/mlx4/mlx4_mr.c b/drivers/net/mlx4/mlx4_mr.c
index 0ba55fda04..6db917a092 100644
--- a/drivers/net/mlx4/mlx4_mr.c
+++ b/drivers/net/mlx4/mlx4_mr.c
@@ -580,14 +580,24 @@ mlx4_mr_create(struct rte_eth_dev *dev, struct mlx4_mr_cache *entry,
 	 */
 	mlx4_mr_garbage_collect(dev);
 	/*
-	 * Find out a contiguous virtual address chunk in use, to which the
-	 * given address belongs, in order to register maximum range. In the
-	 * best case where mempools are not dynamically recreated and
+	 * If enabled, find out a contiguous virtual address chunk in use, to
+	 * which the given address belongs, in order to register maximum range.
+	 * In the best case where mempools are not dynamically recreated and
 	 * '--socket-mem' is specified as an EAL option, it is very likely to
 	 * have only one MR(LKey) per a socket and per a hugepage-size even
-	 * though the system memory is highly fragmented.
+	 * though the system memory is highly fragmented. As the whole memory
+	 * chunk will be pinned by kernel, it can't be reused unless entire
+	 * chunk is freed from EAL.
+	 *
+	 * If disabled, just register one memseg (page). Then, memory
+	 * consumption will be minimized but it may drop performance if there
+	 * are many MRs to lookup on the datapath.
 	 */
-	if (!rte_memseg_contig_walk(mr_find_contig_memsegs_cb, &data)) {
+	if (!priv->mr_ext_memseg_en) {
+		data.msl = rte_mem_virt2memseg_list((void *)addr);
+		data.start = RTE_ALIGN_FLOOR(addr, data.msl->page_sz);
+		data.end = data.start + data.msl->page_sz;
+	} else if (!rte_memseg_contig_walk(mr_find_contig_memsegs_cb, &data)) {
 		WARN("port %u unable to find virtually contiguous"
 		     " chunk for address (%p)."
 		     " rte_memseg_contig_walk() failed.",
-- 
2.11.0

^ permalink raw reply	[flat|nested] 44+ messages in thread

* [dpdk-dev] [PATCH v3 5/6] net/mlx4: add control of excessive memory pinning by kernel
  2019-04-01 21:17   ` [dpdk-dev] [PATCH v3 5/6] net/mlx4: add control of excessive memory pinning by kernel Yongseok Koh
@ 2019-04-01 21:17     ` Yongseok Koh
  2019-05-14  4:52     ` Stephen Hemminger
  1 sibling, 0 replies; 44+ messages in thread
From: Yongseok Koh @ 2019-04-01 21:17 UTC (permalink / raw)
  To: shahafs; +Cc: dev

A new PMD parameter (mr_ext_memseg_en) is added to control extension of
memseg when creating a MR. It is enabled by default.

If enabled, mlx4_mr_create() tries to maximize the range of MR
registration so that the LKey lookup tables on datapath become smaller and
get the best performance. However, it may worsen memory utilization
because registered memory is pinned by kernel driver. Even if a page in the
extended chunk is freed, that doesn't become reusable until the entire
memory is freed and the MR is destroyed.

To make freed pages available immediately, this parameter has to be turned
off but it could drop performance.

Signed-off-by: Yongseok Koh <yskoh@mellanox.com>
Acked-by: Shahaf Shuler <shahafs@mellanox.com>
---
 doc/guides/nics/mlx4.rst   | 11 +++++++++++
 drivers/net/mlx4/mlx4.c    | 11 +++++++++--
 drivers/net/mlx4/mlx4.h    |  5 +++++
 drivers/net/mlx4/mlx4_mr.c | 20 +++++++++++++++-----
 4 files changed, 40 insertions(+), 7 deletions(-)

diff --git a/doc/guides/nics/mlx4.rst b/doc/guides/nics/mlx4.rst
index cd34838f41..c8a02be4dd 100644
--- a/doc/guides/nics/mlx4.rst
+++ b/doc/guides/nics/mlx4.rst
@@ -119,6 +119,17 @@ Run-time configuration
   times for additional ports. All ports are probed by default if left
   unspecified.
 
+- ``mr_ext_memseg_en`` parameter [int]
+
+  A nonzero value enables extending memseg when registering DMA memory. If
+  enabled, the number of entries in MR (Memory Region) lookup table on datapath
+  is minimized and it benefits performance. On the other hand, it worsens memory
+  utilization because registered memory is pinned by kernel driver. Even if a
+  page in the extended chunk is freed, that doesn't become reusable until the
+  entire memory is freed.
+
+  Enabled by default.
+
 Kernel module parameters
 ~~~~~~~~~~~~~~~~~~~~~~~~
 
diff --git a/drivers/net/mlx4/mlx4.c b/drivers/net/mlx4/mlx4.c
index 315640a6d7..252658fc6a 100644
--- a/drivers/net/mlx4/mlx4.c
+++ b/drivers/net/mlx4/mlx4.c
@@ -66,11 +66,14 @@ struct mlx4_conf {
 		uint32_t present; /**< Bit-field for existing ports. */
 		uint32_t enabled; /**< Bit-field for user-enabled ports. */
 	} ports;
+	int mr_ext_memseg_en;
+	/** Whether memseg should be extended for MR creation. */
 };
 
 /* Available parameters list. */
 const char *pmd_mlx4_init_params[] = {
 	MLX4_PMD_PORT_KVARG,
+	MLX4_MR_EXT_MEMSEG_EN_KVARG,
 	NULL,
 };
 
@@ -509,6 +512,8 @@ mlx4_arg_parse(const char *key, const char *val, struct mlx4_conf *conf)
 			return -rte_errno;
 		}
 		conf->ports.enabled |= 1 << tmp;
+	} else if (strcmp(MLX4_MR_EXT_MEMSEG_EN_KVARG, key) == 0) {
+		conf->mr_ext_memseg_en = !!tmp;
 	} else {
 		rte_errno = EINVAL;
 		WARN("%s: unknown parameter", key);
@@ -544,10 +549,10 @@ mlx4_args(struct rte_devargs *devargs, struct mlx4_conf *conf)
 	}
 	/* Process parameters. */
 	for (i = 0; pmd_mlx4_init_params[i]; ++i) {
-		arg_count = rte_kvargs_count(kvlist, MLX4_PMD_PORT_KVARG);
+		arg_count = rte_kvargs_count(kvlist, pmd_mlx4_init_params[i]);
 		while (arg_count-- > 0) {
 			ret = rte_kvargs_process(kvlist,
-						 MLX4_PMD_PORT_KVARG,
+						 pmd_mlx4_init_params[i],
 						 (int (*)(const char *,
 							  const char *,
 							  void *))
@@ -876,6 +881,7 @@ mlx4_pci_probe(struct rte_pci_driver *pci_drv, struct rte_pci_device *pci_dev)
 	struct ibv_device_attr_ex device_attr_ex;
 	struct mlx4_conf conf = {
 		.ports.present = 0,
+		.mr_ext_memseg_en = 1,
 	};
 	unsigned int vf;
 	int i;
@@ -1100,6 +1106,7 @@ mlx4_pci_probe(struct rte_pci_driver *pci_drv, struct rte_pci_device *pci_dev)
 					device_attr_ex.tso_caps.max_tso;
 		DEBUG("TSO is %ssupported",
 		      priv->tso ? "" : "not ");
+		priv->mr_ext_memseg_en = conf.mr_ext_memseg_en;
 		/* Configure the first MAC address by default. */
 		err = mlx4_get_mac(priv, &mac.addr_bytes);
 		if (err) {
diff --git a/drivers/net/mlx4/mlx4.h b/drivers/net/mlx4/mlx4.h
index 1a7b1fb541..4ff98d772b 100644
--- a/drivers/net/mlx4/mlx4.h
+++ b/drivers/net/mlx4/mlx4.h
@@ -53,6 +53,9 @@
 /** Port parameter. */
 #define MLX4_PMD_PORT_KVARG "port"
 
+/** Enable extending memsegs when creating a MR. */
+#define MLX4_MR_EXT_MEMSEG_EN_KVARG "mr_ext_memseg_en"
+
 /* Reserved address space for UAR mapping. */
 #define MLX4_UAR_SIZE (1ULL << (sizeof(uintptr_t) * 4))
 
@@ -165,6 +168,8 @@ struct mlx4_priv {
 	uint32_t hw_csum_l2tun:1; /**< Checksum support for L2 tunnels. */
 	uint32_t hw_fcs_strip:1; /**< FCS stripping toggling is supported. */
 	uint32_t tso:1; /**< Transmit segmentation offload is supported. */
+	uint32_t mr_ext_memseg_en:1;
+	/** Whether memseg should be extended for MR creation. */
 	uint32_t tso_max_payload_sz; /**< Max supported TSO payload size. */
 	uint32_t hw_rss_max_qps; /**< Max Rx Queues supported by RSS. */
 	uint64_t hw_rss_sup; /**< Supported RSS hash fields (Verbs format). */
diff --git a/drivers/net/mlx4/mlx4_mr.c b/drivers/net/mlx4/mlx4_mr.c
index 0ba55fda04..6db917a092 100644
--- a/drivers/net/mlx4/mlx4_mr.c
+++ b/drivers/net/mlx4/mlx4_mr.c
@@ -580,14 +580,24 @@ mlx4_mr_create(struct rte_eth_dev *dev, struct mlx4_mr_cache *entry,
 	 */
 	mlx4_mr_garbage_collect(dev);
 	/*
-	 * Find out a contiguous virtual address chunk in use, to which the
-	 * given address belongs, in order to register maximum range. In the
-	 * best case where mempools are not dynamically recreated and
+	 * If enabled, find out a contiguous virtual address chunk in use, to
+	 * which the given address belongs, in order to register maximum range.
+	 * In the best case where mempools are not dynamically recreated and
 	 * '--socket-mem' is specified as an EAL option, it is very likely to
 	 * have only one MR(LKey) per a socket and per a hugepage-size even
-	 * though the system memory is highly fragmented.
+	 * though the system memory is highly fragmented. As the whole memory
+	 * chunk will be pinned by kernel, it can't be reused unless entire
+	 * chunk is freed from EAL.
+	 *
+	 * If disabled, just register one memseg (page). Then, memory
+	 * consumption will be minimized but it may drop performance if there
+	 * are many MRs to lookup on the datapath.
 	 */
-	if (!rte_memseg_contig_walk(mr_find_contig_memsegs_cb, &data)) {
+	if (!priv->mr_ext_memseg_en) {
+		data.msl = rte_mem_virt2memseg_list((void *)addr);
+		data.start = RTE_ALIGN_FLOOR(addr, data.msl->page_sz);
+		data.end = data.start + data.msl->page_sz;
+	} else if (!rte_memseg_contig_walk(mr_find_contig_memsegs_cb, &data)) {
 		WARN("port %u unable to find virtually contiguous"
 		     " chunk for address (%p)."
 		     " rte_memseg_contig_walk() failed.",
-- 
2.11.0


^ permalink raw reply	[flat|nested] 44+ messages in thread
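
When ``mr_ext_memseg_en`` is disabled, the hunk above registers only the single page that contains the address, by flooring the address to the page size and adding one page. The arithmetic can be sketched in plain C as below. This is an illustration only: `align_floor()`, `single_page_range()` and `struct mr_range` are hypothetical names standing in for `RTE_ALIGN_FLOOR()` and the driver's own data, and the sketch assumes the page size is a power of two.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical result type: the half-open range [start, end) to register. */
struct mr_range {
	uintptr_t start;
	uintptr_t end;
};

/*
 * Plain-C stand-in for RTE_ALIGN_FLOOR(): round addr down to a multiple of
 * page_sz. Only valid when page_sz is a power of two.
 */
static uintptr_t
align_floor(uintptr_t addr, uintptr_t page_sz)
{
	assert(page_sz != 0 && (page_sz & (page_sz - 1)) == 0);
	return addr & ~(page_sz - 1);
}

/* Range registered when memseg extension is disabled: exactly one page. */
static struct mr_range
single_page_range(uintptr_t addr, uintptr_t page_sz)
{
	struct mr_range r;

	r.start = align_floor(addr, page_sz);
	r.end = r.start + page_sz;
	return r;
}
```

For example, with a 2 MB page (0x200000), `single_page_range(0x40301234, 0x200000)` yields the range [0x40200000, 0x40400000). Each such page becomes its own MR, which is why the patch comment warns about more MRs to look up on the datapath.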

* [dpdk-dev] [PATCH v3 6/6] net/mlx4: enable secondary process to register DMA memory
  2019-04-01 21:17 ` [dpdk-dev] [PATCH v3 0/6] net/mlx: " Yongseok Koh
                     ` (5 preceding siblings ...)
  2019-04-01 21:17   ` [dpdk-dev] [PATCH v3 5/6] net/mlx4: add control of excessive memory pinning by kernel Yongseok Koh
@ 2019-04-01 21:17   ` Yongseok Koh
  2019-04-01 21:17     ` Yongseok Koh
  2019-04-02  7:13   ` [dpdk-dev] [PATCH v3 0/6] net/mlx: " Shahaf Shuler
  7 siblings, 1 reply; 44+ messages in thread
From: Yongseok Koh @ 2019-04-01 21:17 UTC (permalink / raw)
  To: shahafs; +Cc: dev

A Memory Region (MR) for DMA memory can't be created from a secondary
process due to a library/driver limitation. Whenever an MR is needed, the
secondary process can send a request to the primary process through the
EAL IPC channel (rte_mp_msg), which is established on initialization. Once
an MR is created by the primary process, it is immediately visible to the
secondary process because the MR list is global per device. Thus, the
secondary process can look up the list after the request returns
successfully.

Signed-off-by: Yongseok Koh <yskoh@mellanox.com>
Acked-by: Shahaf Shuler <shahafs@mellanox.com>
---
 doc/guides/nics/mlx4.rst   |  1 -
 drivers/net/mlx4/mlx4.h    |  6 +++
 drivers/net/mlx4/mlx4_mp.c | 50 ++++++++++++++++++++++++
 drivers/net/mlx4/mlx4_mr.c | 95 ++++++++++++++++++++++++++++++++++++++++------
 drivers/net/mlx4/mlx4_mr.h |  2 +
 5 files changed, 142 insertions(+), 12 deletions(-)

diff --git a/doc/guides/nics/mlx4.rst b/doc/guides/nics/mlx4.rst
index c8a02be4dd..aaf1907532 100644
--- a/doc/guides/nics/mlx4.rst
+++ b/doc/guides/nics/mlx4.rst
@@ -159,7 +159,6 @@ Limitations
 - For secondary process:
 
   - Forked secondary process not supported.
-  - All mempools must be initialized before rte_eth_dev_start().
   - External memory unregistered in EAL memseg list cannot be used for DMA
     unless such memory has been registered by ``mlx4_mr_update_ext_mp()`` in
     primary process and remapped to the same virtual address in secondary
diff --git a/drivers/net/mlx4/mlx4.h b/drivers/net/mlx4/mlx4.h
index 4ff98d772b..1db23d6cc9 100644
--- a/drivers/net/mlx4/mlx4.h
+++ b/drivers/net/mlx4/mlx4.h
@@ -79,6 +79,7 @@ enum {
 /* Request types for IPC. */
 enum mlx4_mp_req_type {
 	MLX4_MP_REQ_VERBS_CMD_FD = 1,
+	MLX4_MP_REQ_CREATE_MR,
 	MLX4_MP_REQ_START_RXTX,
 	MLX4_MP_REQ_STOP_RXTX,
 };
@@ -88,6 +89,10 @@ struct mlx4_mp_param {
 	enum mlx4_mp_req_type type;
 	int port_id;
 	int result;
+	RTE_STD_C11
+	union {
+		uintptr_t addr; /* MLX4_MP_REQ_CREATE_MR */
+	} args;
 };
 
 /** Request timeout for IPC. */
@@ -235,6 +240,7 @@ int mlx4_rx_intr_enable(struct rte_eth_dev *dev, uint16_t idx);
 /* mlx4_mp.c */
 void mlx4_mp_req_start_rxtx(struct rte_eth_dev *dev);
 void mlx4_mp_req_stop_rxtx(struct rte_eth_dev *dev);
+int mlx4_mp_req_mr_create(struct rte_eth_dev *dev, uintptr_t addr);
 int mlx4_mp_req_verbs_cmd_fd(struct rte_eth_dev *dev);
 void mlx4_mp_init_primary(void);
 void mlx4_mp_uninit_primary(void);
diff --git a/drivers/net/mlx4/mlx4_mp.c b/drivers/net/mlx4/mlx4_mp.c
index eaeb257348..183622453c 100644
--- a/drivers/net/mlx4/mlx4_mp.c
+++ b/drivers/net/mlx4/mlx4_mp.c
@@ -58,6 +58,8 @@ mp_primary_handle(const struct rte_mp_msg *mp_msg, const void *peer)
 		(const struct mlx4_mp_param *)mp_msg->param;
 	struct rte_eth_dev *dev;
 	struct mlx4_priv *priv;
+	struct mlx4_mr_cache entry;
+	uint32_t lkey;
 	int ret;
 
 	assert(rte_eal_process_type() == RTE_PROC_PRIMARY);
@@ -69,6 +71,13 @@ mp_primary_handle(const struct rte_mp_msg *mp_msg, const void *peer)
 	dev = &rte_eth_devices[param->port_id];
 	priv = dev->data->dev_private;
 	switch (param->type) {
+	case MLX4_MP_REQ_CREATE_MR:
+		mp_init_msg(dev, &mp_res, param->type);
+		lkey = mlx4_mr_create_primary(dev, &entry, param->args.addr);
+		if (lkey == UINT32_MAX)
+			res->result = -rte_errno;
+		ret = rte_mp_reply(&mp_res, peer);
+		break;
 	case MLX4_MP_REQ_VERBS_CMD_FD:
 		mp_init_msg(dev, &mp_res, param->type);
 		mp_res.num_fds = 1;
@@ -218,6 +227,47 @@ mlx4_mp_req_stop_rxtx(struct rte_eth_dev *dev)
 }
 
 /**
+ * Request Memory Region creation to the primary process.
+ *
+ * @param[in] dev
+ *   Pointer to Ethernet structure.
+ * @param addr
+ *   Target virtual address to register.
+ *
+ * @return
+ *   0 on success, a negative errno value otherwise and rte_errno is set.
+ */
+int
+mlx4_mp_req_mr_create(struct rte_eth_dev *dev, uintptr_t addr)
+{
+	struct rte_mp_msg mp_req;
+	struct rte_mp_msg *mp_res;
+	struct rte_mp_reply mp_rep;
+	struct mlx4_mp_param *req = (struct mlx4_mp_param *)mp_req.param;
+	struct mlx4_mp_param *res;
+	struct timespec ts = {.tv_sec = MLX4_MP_REQ_TIMEOUT_SEC, .tv_nsec = 0};
+	int ret;
+
+	assert(rte_eal_process_type() == RTE_PROC_SECONDARY);
+	mp_init_msg(dev, &mp_req, MLX4_MP_REQ_CREATE_MR);
+	req->args.addr = addr;
+	ret = rte_mp_request_sync(&mp_req, &mp_rep, &ts);
+	if (ret) {
+		ERROR("port %u request to primary process failed",
+		      dev->data->port_id);
+		return -rte_errno;
+	}
+	assert(mp_rep.nb_received == 1);
+	mp_res = &mp_rep.msgs[0];
+	res = (struct mlx4_mp_param *)mp_res->param;
+	ret = res->result;
+	if (ret)
+		rte_errno = -ret;
+	free(mp_rep.msgs);
+	return ret;
+}
+
+/**
  * IPC message handler of primary process.
  *
  * @param[in] dev
diff --git a/drivers/net/mlx4/mlx4_mr.c b/drivers/net/mlx4/mlx4_mr.c
index 6db917a092..ad7d4832f2 100644
--- a/drivers/net/mlx4/mlx4_mr.c
+++ b/drivers/net/mlx4/mlx4_mr.c
@@ -528,7 +528,10 @@ mr_find_contig_memsegs_cb(const struct rte_memseg_list *msl,
 
 /**
  * Create a new global Memroy Region (MR) for a missing virtual address.
- * Register entire virtually contiguous memory chunk around the address.
+ * This function should be called from a secondary process; it sends a request
+ * to the primary process to create an MR for the address. As the global MR
+ * list resides in shared memory, the subsequent LKey lookup should succeed
+ * unless the request fails.
  *
  * @param dev
  *   Pointer to Ethernet device.
@@ -542,8 +545,52 @@ mr_find_contig_memsegs_cb(const struct rte_memseg_list *msl,
  *   Searched LKey on success, UINT32_MAX on failure and rte_errno is set.
  */
 static uint32_t
-mlx4_mr_create(struct rte_eth_dev *dev, struct mlx4_mr_cache *entry,
-	       uintptr_t addr)
+mlx4_mr_create_secondary(struct rte_eth_dev *dev, struct mlx4_mr_cache *entry,
+			 uintptr_t addr)
+{
+	struct mlx4_priv *priv = dev->data->dev_private;
+	int ret;
+
+	DEBUG("port %u requesting MR creation for address (%p)",
+	      dev->data->port_id, (void *)addr);
+	ret = mlx4_mp_req_mr_create(dev, addr);
+	if (ret) {
+		DEBUG("port %u fail to request MR creation for address (%p)",
+		      dev->data->port_id, (void *)addr);
+		return UINT32_MAX;
+	}
+	rte_rwlock_read_lock(&priv->mr.rwlock);
+	/* Fill in output data. */
+	mr_lookup_dev(dev, entry, addr);
+	/* Lookup can't fail. */
+	assert(entry->lkey != UINT32_MAX);
+	rte_rwlock_read_unlock(&priv->mr.rwlock);
+	DEBUG("port %u MR CREATED by primary process for %p:\n"
+	      "  [0x%" PRIxPTR ", 0x%" PRIxPTR "), lkey=0x%x",
+	      dev->data->port_id, (void *)addr,
+	      entry->start, entry->end, entry->lkey);
+	return entry->lkey;
+}
+
+/**
+ * Create a new global Memory Region (MR) for a missing virtual address.
+ * Register entire virtually contiguous memory chunk around the address.
+ * This must be called from the primary process.
+ *
+ * @param dev
+ *   Pointer to Ethernet device.
+ * @param[out] entry
+ *   Pointer to returning MR cache entry, found in the global cache or newly
+ *   created. If failed to create one, this will not be updated.
+ * @param addr
+ *   Target virtual address to register.
+ *
+ * @return
+ *   Searched LKey on success, UINT32_MAX on failure and rte_errno is set.
+ */
+uint32_t
+mlx4_mr_create_primary(struct rte_eth_dev *dev, struct mlx4_mr_cache *entry,
+		       uintptr_t addr)
 {
 	struct mlx4_priv *priv = dev->data->dev_private;
 	struct rte_mem_config *mcfg = rte_eal_get_configuration()->mem_config;
@@ -563,14 +610,6 @@ mlx4_mr_create(struct rte_eth_dev *dev, struct mlx4_mr_cache *entry,
 
 	DEBUG("port %u creating a MR using address (%p)",
 	      dev->data->port_id, (void *)addr);
-	if (rte_eal_process_type() != RTE_PROC_PRIMARY) {
-		WARN("port %u using address (%p) of unregistered mempool"
-		     " in secondary process, please create mempool"
-		     " before rte_eth_dev_start()",
-		     dev->data->port_id, (void *)addr);
-		rte_errno = EPERM;
-		goto err_nolock;
-	}
 	/*
 	 * Release detached MRs if any. This can't be called with holding either
 	 * memory_hotplug_lock or priv->mr.rwlock. MRs on the free list have
@@ -781,6 +820,40 @@ mlx4_mr_create(struct rte_eth_dev *dev, struct mlx4_mr_cache *entry,
 }
 
 /**
+ * Create a new global Memory Region (MR) for a missing virtual address.
+ * This can be called from primary and secondary process.
+ *
+ * @param dev
+ *   Pointer to Ethernet device.
+ * @param[out] entry
+ *   Pointer to returning MR cache entry, found in the global cache or newly
+ *   created. If failed to create one, this will not be updated.
+ * @param addr
+ *   Target virtual address to register.
+ *
+ * @return
+ *   Searched LKey on success, UINT32_MAX on failure and rte_errno is set.
+ */
+static uint32_t
+mlx4_mr_create(struct rte_eth_dev *dev, struct mlx4_mr_cache *entry,
+	       uintptr_t addr)
+{
+	uint32_t ret = 0;
+
+	switch (rte_eal_process_type()) {
+	case RTE_PROC_PRIMARY:
+		ret = mlx4_mr_create_primary(dev, entry, addr);
+		break;
+	case RTE_PROC_SECONDARY:
+		ret = mlx4_mr_create_secondary(dev, entry, addr);
+		break;
+	default:
+		break;
+	}
+	return ret;
+}
+
+/**
  * Rebuild the global B-tree cache of device from the original MR list.
  *
  * @param dev
diff --git a/drivers/net/mlx4/mlx4_mr.h b/drivers/net/mlx4/mlx4_mr.h
index 37a365a8b5..9d125e239d 100644
--- a/drivers/net/mlx4/mlx4_mr.h
+++ b/drivers/net/mlx4/mlx4_mr.h
@@ -75,6 +75,8 @@ extern rte_rwlock_t mlx4_mem_event_rwlock;
 int mlx4_mr_btree_init(struct mlx4_mr_btree *bt, int n, int socket);
 void mlx4_mr_btree_free(struct mlx4_mr_btree *bt);
 void mlx4_mr_btree_dump(struct mlx4_mr_btree *bt);
+uint32_t mlx4_mr_create_primary(struct rte_eth_dev *dev,
+				struct mlx4_mr_cache *entry, uintptr_t addr);
 void mlx4_mr_mem_event_cb(enum rte_mem_event event_type, const void *addr,
 			  size_t len, void *arg);
 int mlx4_mr_update_mp(struct rte_eth_dev *dev, struct mlx4_mr_ctrl *mr_ctrl,
-- 
2.11.0

^ permalink raw reply	[flat|nested] 44+ messages in thread
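
The request/lookup flow described in the commit message above can be modeled without DPDK as a rough sketch. Everything here is an assumption made for illustration: `struct mr_list` stands in for the device-global MR list in shared memory, `primary_handle()` for the primary-side IPC handler, and a direct function call replaces the `rte_mp_request_sync()` round trip. Locking and the real Verbs registration are omitted.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

#define MR_LIST_MAX 8
#define LKEY_INVALID UINT32_MAX

/* Hypothetical MR entry covering the half-open range [start, end). */
struct mr_entry { uintptr_t start; uintptr_t end; uint32_t lkey; };

/* Stand-in for the device-global MR list living in shared memory. */
struct mr_list { struct mr_entry e[MR_LIST_MAX]; size_t n; };

/* Request/reply payload, loosely modeled on struct mlx4_mp_param. */
struct mp_param { int result; uintptr_t addr; };

/* Return the LKey of the MR containing addr, or LKEY_INVALID. */
static uint32_t
mr_lookup(const struct mr_list *l, uintptr_t addr)
{
	for (size_t i = 0; i < l->n; i++)
		if (addr >= l->e[i].start && addr < l->e[i].end)
			return l->e[i].lkey;
	return LKEY_INVALID;
}

/* Primary side: register one page around addr, report status in result. */
static void
primary_handle(struct mr_list *l, struct mp_param *p, uintptr_t page_sz)
{
	p->result = 0;
	if (mr_lookup(l, p->addr) != LKEY_INVALID)
		return; /* Already registered, nothing to do. */
	if (l->n == MR_LIST_MAX) {
		p->result = -ENOSPC; /* Negative errno, as in the patch. */
		return;
	}
	struct mr_entry *e = &l->e[l->n];
	e->start = p->addr & ~(page_sz - 1); /* page_sz: power of two. */
	e->end = e->start + page_sz;
	e->lkey = (uint32_t)(0x100 + l->n); /* Made-up key for the demo. */
	l->n++;
}

/* Secondary side: ask the primary, then look up the shared list. */
static uint32_t
secondary_create(struct mr_list *l, uintptr_t addr, uintptr_t page_sz)
{
	struct mp_param p = { .result = 0, .addr = addr };

	primary_handle(l, &p, page_sz); /* Stands in for the IPC round trip. */
	if (p.result) {
		errno = -p.result;
		return LKEY_INVALID;
	}
	uint32_t lkey = mr_lookup(l, addr);
	assert(lkey != LKEY_INVALID); /* Lookup can't fail after success. */
	return lkey;
}
```

The point of the design is that the secondary process never receives the MR in the reply itself. It only learns whether the request succeeded and then reads the entry from the list both processes share.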

* Re: [dpdk-dev] [PATCH v3 0/6] net/mlx: enable secondary process to register DMA memory
  2019-04-01 21:17 ` [dpdk-dev] [PATCH v3 0/6] net/mlx: " Yongseok Koh
                     ` (6 preceding siblings ...)
  2019-04-01 21:17   ` [dpdk-dev] [PATCH v3 6/6] net/mlx4: enable secondary process to register DMA memory Yongseok Koh
@ 2019-04-02  7:13   ` Shahaf Shuler
  2019-04-02  7:13     ` Shahaf Shuler
  7 siblings, 1 reply; 44+ messages in thread
From: Shahaf Shuler @ 2019-04-02  7:13 UTC (permalink / raw)
  To: Yongseok Koh; +Cc: dev

Tuesday, April 2, 2019 12:18 AM, Yongseok Koh:
> Subject: [PATCH v3 0/6] net/mlx: enable secondary process to register DMA
> memory
> 
> RFC:
> https://mails.dpdk.org/archives/dev/2019-March/125517.html
> 
> v3:
> * rebase on the latest branch tip
> 
> v2:
> * add more sanity check for eth_dev and return value from IPC request
> * complement commit messages
> * add MLX5_MP_REQ_TIMEOUT_SEC
> * keep acked-by: Shahaf Shuler
> 
> Yongseok Koh (6):
>   net/mlx: remove debug messages on datapath
>   net/mlx5: fix external memory registration
>   net/mlx5: add control of excessive memory pinning by kernel
>   net/mlx5: enable secondary process to register DMA memory
>   net/mlx4: add control of excessive memory pinning by kernel
>   net/mlx4: enable secondary process to register DMA memory

Applied to next-net-mlx, thanks. 

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [dpdk-dev] [PATCH v3 5/6] net/mlx4: add control of excessive memory pinning by kernel
  2019-04-01 21:17   ` [dpdk-dev] [PATCH v3 5/6] net/mlx4: add control of excessive memory pinning by kernel Yongseok Koh
  2019-04-01 21:17     ` Yongseok Koh
@ 2019-05-14  4:52     ` Stephen Hemminger
  2019-05-14  4:52       ` Stephen Hemminger
  2019-05-14  4:54       ` Stephen Hemminger
  1 sibling, 2 replies; 44+ messages in thread
From: Stephen Hemminger @ 2019-05-14  4:52 UTC (permalink / raw)
  To: Yongseok Koh; +Cc: shahafs, dev

On Mon,  1 Apr 2019 14:17:56 -0700
Yongseok Koh <yskoh@mellanox.com> wrote:

> +- ``mr_ext_memseg_en`` parameter [int]
> +
> +  A nonzero value enables extending memseg when registering DMA memory. If
> +  enabled, the number of entries in MR (Memory Region) lookup table on datapath
> +  is minimized and it benefits performance. On the other hand, it worsens memory
> +  utilization because registered memory is pinned by kernel driver. Even if a
> +  page in the extended chunk is freed, that doesn't become reusable until the
> +  entire memory is freed.
> +
> +  Enabled by default.
> +

This module parameter does not appear in the upstream Linux kernel drivers (even 5.2).
What code are you referring to?

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [dpdk-dev] [PATCH v3 5/6] net/mlx4: add control of excessive memory pinning by kernel
  2019-05-14  4:52     ` Stephen Hemminger
  2019-05-14  4:52       ` Stephen Hemminger
@ 2019-05-14  4:54       ` Stephen Hemminger
  2019-05-14  4:54         ` Stephen Hemminger
  1 sibling, 1 reply; 44+ messages in thread
From: Stephen Hemminger @ 2019-05-14  4:54 UTC (permalink / raw)
  To: Yongseok Koh; +Cc: shahafs, dev

On Mon, 13 May 2019 21:52:22 -0700
Stephen Hemminger <stephen@networkplumber.org> wrote:

> On Mon,  1 Apr 2019 14:17:56 -0700
> Yongseok Koh <yskoh@mellanox.com> wrote:
> 
> > +- ``mr_ext_memseg_en`` parameter [int]
> > +
> > +  A nonzero value enables extending memseg when registering DMA memory. If
> > +  enabled, the number of entries in MR (Memory Region) lookup table on datapath
> > +  is minimized and it benefits performance. On the other hand, it worsens memory
> > +  utilization because registered memory is pinned by kernel driver. Even if a
> > +  page in the extended chunk is freed, that doesn't become reusable until the
> > +  entire memory is freed.
> > +
> > +  Enabled by default.
> > +  
> 
> This module parameter does not appear in the upstream Linux kernel drivers (even 5.2).
> What code are you referring to?

Never mind, it is a DPDK parameter, not a kernel parameter.

^ permalink raw reply	[flat|nested] 44+ messages in thread
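
Since ``mr_ext_memseg_en`` is a DPDK device argument rather than a kernel module parameter, it arrives as a key=value pair in the device argument string. The driver itself uses `rte_kvargs` for this, as the `mlx4_args()` hunk earlier in the thread shows. The following is only a standalone sketch of the same idea in plain C: `parse_mr_ext_memseg_en()` is a hypothetical helper, not a DPDK function, and the nonzero-means-enabled, enabled-by-default behavior is taken from the quoted documentation.

```c
#include <errno.h>
#include <stdlib.h>
#include <string.h>

#define MR_EXT_MEMSEG_EN_KEY "mr_ext_memseg_en"

/*
 * Scan a comma-separated key=value list and return the flag value:
 * 1 if enabled (the default when the key is absent), 0 if disabled,
 * -EINVAL if the value is not a number.
 */
static int
parse_mr_ext_memseg_en(const char *devargs)
{
	const size_t klen = strlen(MR_EXT_MEMSEG_EN_KEY);
	const char *p = devargs;
	int en = 1; /* Enabled by default, as documented. */

	while (p != NULL && *p != '\0') {
		/* p always points at the start of a key here. */
		if (strncmp(p, MR_EXT_MEMSEG_EN_KEY, klen) == 0 &&
		    p[klen] == '=') {
			const char *val = p + klen + 1;
			char *end;
			unsigned long v;

			errno = 0;
			v = strtoul(val, &end, 0);
			if (errno != 0 || end == val ||
			    (*end != ',' && *end != '\0'))
				return -EINVAL;
			en = !!v; /* Any nonzero value enables. */
		}
		/* Advance to the next key, if any. */
		p = strchr(p, ',');
		if (p != NULL)
			p++;
	}
	return en;
}
```

With this sketch, an argument string such as `port=0,mr_ext_memseg_en=0` disables the extension, while a string without the key leaves it enabled.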

end of thread, other threads:[~2019-05-14  4:54 UTC | newest]

Thread overview: 44+ messages
2019-03-07  7:41 [dpdk-dev] [PATCH 0/6] net/mlx: enable secondary process to register DMA memory Yongseok Koh
2019-03-07  7:41 ` [dpdk-dev] [PATCH 1/6] net/mlx: remove debug messages on datapath Yongseok Koh
2019-03-07  7:41 ` [dpdk-dev] [PATCH 2/6] net/mlx5: fix external memory registration Yongseok Koh
2019-03-07  7:41 ` [dpdk-dev] [PATCH 3/6] net/mlx5: add control of excessive memory pinning by kernel Yongseok Koh
2019-03-07  7:41 ` [dpdk-dev] [PATCH 4/6] net/mlx5: enable secondary process to register DMA memory Yongseok Koh
2019-03-07  7:41 ` [dpdk-dev] [PATCH 5/6] net/mlx4: add control of excessive memory pinning by kernel Yongseok Koh
2019-03-07  7:41 ` [dpdk-dev] [PATCH 6/6] net/mlx4: enable secondary process to register DMA memory Yongseok Koh
2019-03-07  7:55 ` [dpdk-dev] [PATCH 0/6] net/mlx: " Yongseok Koh
2019-03-14 12:45 ` Shahaf Shuler
2019-03-14 12:45   ` Shahaf Shuler
2019-03-25 19:22 ` [dpdk-dev] [PATCH v2 " Yongseok Koh
2019-03-25 19:22   ` Yongseok Koh
2019-03-25 19:22   ` [dpdk-dev] [PATCH v2 1/6] net/mlx: remove debug messages on datapath Yongseok Koh
2019-03-25 19:22     ` Yongseok Koh
2019-03-25 19:22   ` [dpdk-dev] [PATCH v2 2/6] net/mlx5: fix external memory registration Yongseok Koh
2019-03-25 19:22     ` Yongseok Koh
2019-03-25 19:22   ` [dpdk-dev] [PATCH v2 3/6] net/mlx5: add control of excessive memory pinning by kernel Yongseok Koh
2019-03-25 19:22     ` Yongseok Koh
2019-03-25 19:22   ` [dpdk-dev] [PATCH v2 4/6] net/mlx5: enable secondary process to register DMA memory Yongseok Koh
2019-03-25 19:22     ` Yongseok Koh
2019-03-25 19:22   ` [dpdk-dev] [PATCH v2 5/6] net/mlx4: add control of excessive memory pinning by kernel Yongseok Koh
2019-03-25 19:22     ` Yongseok Koh
2019-03-25 19:22   ` [dpdk-dev] [PATCH v2 6/6] net/mlx4: enable secondary process to register DMA memory Yongseok Koh
2019-03-25 19:22     ` Yongseok Koh
2019-04-01 21:17 ` [dpdk-dev] [PATCH v3 0/6] net/mlx: " Yongseok Koh
2019-04-01 21:17   ` Yongseok Koh
2019-04-01 21:17   ` [dpdk-dev] [PATCH v3 1/6] net/mlx: remove debug messages on datapath Yongseok Koh
2019-04-01 21:17     ` Yongseok Koh
2019-04-01 21:17   ` [dpdk-dev] [PATCH v3 2/6] net/mlx5: fix external memory registration Yongseok Koh
2019-04-01 21:17     ` Yongseok Koh
2019-04-01 21:17   ` [dpdk-dev] [PATCH v3 3/6] net/mlx5: add control of excessive memory pinning by kernel Yongseok Koh
2019-04-01 21:17     ` Yongseok Koh
2019-04-01 21:17   ` [dpdk-dev] [PATCH v3 4/6] net/mlx5: enable secondary process to register DMA memory Yongseok Koh
2019-04-01 21:17     ` Yongseok Koh
2019-04-01 21:17   ` [dpdk-dev] [PATCH v3 5/6] net/mlx4: add control of excessive memory pinning by kernel Yongseok Koh
2019-04-01 21:17     ` Yongseok Koh
2019-05-14  4:52     ` Stephen Hemminger
2019-05-14  4:52       ` Stephen Hemminger
2019-05-14  4:54       ` Stephen Hemminger
2019-05-14  4:54         ` Stephen Hemminger
2019-04-01 21:17   ` [dpdk-dev] [PATCH v3 6/6] net/mlx4: enable secondary process to register DMA memory Yongseok Koh
2019-04-01 21:17     ` Yongseok Koh
2019-04-02  7:13   ` [dpdk-dev] [PATCH v3 0/6] net/mlx: " Shahaf Shuler
2019-04-02  7:13     ` Shahaf Shuler
