From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
Received: from mellanox.co.il (mail-il-dmz.mellanox.com [193.47.165.129])
	by dpdk.org (Postfix) with ESMTP id 31A434CA6
	for ; Mon, 25 Mar 2019 20:22:52 +0100 (CET)
Received: from Internal Mail-Server by MTLPINE1 (envelope-from yskoh@mellanox.com)
	with ESMTPS (AES256-SHA encrypted); 25 Mar 2019 21:22:48 +0200
Received: from scfae-sc-2.mti.labs.mlnx (scfae-sc-2.mti.labs.mlnx [10.101.0.96])
	by labmailer.mlnx (8.13.8/8.13.8) with ESMTP id x2PJMfrG027389;
	Mon, 25 Mar 2019 21:22:47 +0200
From: Yongseok Koh
To: shahafs@mellanox.com
Cc: dev@dpdk.org
Date: Mon, 25 Mar 2019 12:22:35 -0700
Message-Id: <20190325192238.20940-4-yskoh@mellanox.com>
X-Mailer: git-send-email 2.11.0
In-Reply-To: <20190325192238.20940-1-yskoh@mellanox.com>
References: <20190307074151.18815-1-yskoh@mellanox.com>
	<20190325192238.20940-1-yskoh@mellanox.com>
Subject: [dpdk-dev] [PATCH v2 3/6] net/mlx5: add control of excessive memory pinning by kernel
X-BeenThere: dev@dpdk.org
X-Mailman-Version: 2.1.15
Precedence: list
List-Id: DPDK patches and discussions
List-Unsubscribe: ,
List-Archive:
List-Post:
List-Help:
List-Subscribe: ,
X-List-Received-Date: Mon, 25 Mar 2019 19:22:52 -0000

A new PMD parameter (mr_ext_memseg_en) is added to control extension of
memseg when creating a MR. It is enabled by default.

If enabled, mlx5_mr_create() tries to maximize the range of MR
registration so that the LKey lookup tables on datapath become smaller
and get the best performance. However, it may worsen memory utilization
because registered memory is pinned by kernel driver. Even if a page in
the extended chunk is freed, that doesn't become reusable until the
entire memory is freed and the MR is destroyed.

To make freed pages available immediately, this parameter has to be
turned off but it could drop performance.

Signed-off-by: Yongseok Koh
Acked-by: Shahaf Shuler
---
 doc/guides/nics/mlx5.rst   | 11 +++++++++++
 drivers/net/mlx5/mlx5.c    |  7 +++++++
 drivers/net/mlx5/mlx5.h    |  2 ++
 drivers/net/mlx5/mlx5_mr.c | 21 ++++++++++++++++-----
 4 files changed, 36 insertions(+), 5 deletions(-)

diff --git a/doc/guides/nics/mlx5.rst b/doc/guides/nics/mlx5.rst
index cbe3fb4c33..d9ae91dfc1 100644
--- a/doc/guides/nics/mlx5.rst
+++ b/doc/guides/nics/mlx5.rst
@@ -485,6 +485,17 @@ Run-time configuration
 
   Disabled by default.
 
+- ``mr_ext_memseg_en`` parameter [int]
+
+  A nonzero value enables extending memseg when registering DMA memory. If
+  enabled, the number of entries in MR (Memory Region) lookup table on datapath
+  is minimized and it benefits performance. On the other hand, it worsens memory
+  utilization because registered memory is pinned by kernel driver. Even if a
+  page in the extended chunk is freed, that doesn't become reusable until the
+  entire memory is freed.
+
+  Enabled by default.
+
 - ``representor`` parameter [list]
 
   This parameter can be used to instantiate DPDK Ethernet devices from
diff --git a/drivers/net/mlx5/mlx5.c b/drivers/net/mlx5/mlx5.c
index 840cd3d307..93c0fc8c20 100644
--- a/drivers/net/mlx5/mlx5.c
+++ b/drivers/net/mlx5/mlx5.c
@@ -108,6 +108,9 @@
 /* Activate Netlink support in VF mode. */
 #define MLX5_VF_NL_EN "vf_nl_en"
 
+/* Enable extending memsegs when creating a MR. */
+#define MLX5_MR_EXT_MEMSEG_EN "mr_ext_memseg_en"
+
 /* Select port representors to instantiate. */
 #define MLX5_REPRESENTOR "representor"
 
@@ -569,6 +572,8 @@ mlx5_args_check(const char *key, const char *val, void *opaque)
 		config->vf_nl_en = !!tmp;
 	} else if (strcmp(MLX5_DV_FLOW_EN, key) == 0) {
 		config->dv_flow_en = !!tmp;
+	} else if (strcmp(MLX5_MR_EXT_MEMSEG_EN, key) == 0) {
+		config->mr_ext_memseg_en = !!tmp;
 	} else {
 		DRV_LOG(WARNING, "%s: unknown parameter", key);
 		rte_errno = EINVAL;
@@ -610,6 +615,7 @@ mlx5_args(struct mlx5_dev_config *config, struct rte_devargs *devargs)
 		MLX5_L3_VXLAN_EN,
 		MLX5_VF_NL_EN,
 		MLX5_DV_FLOW_EN,
+		MLX5_MR_EXT_MEMSEG_EN,
 		MLX5_REPRESENTOR,
 		NULL,
 	};
@@ -1588,6 +1594,7 @@ mlx5_pci_probe(struct rte_pci_driver *pci_drv __rte_unused,
 		.txqs_vec = MLX5_ARG_UNSET,
 		.inline_max_packet_sz = MLX5_ARG_UNSET,
 		.vf_nl_en = 1,
+		.mr_ext_memseg_en = 1,
 		.mprq = {
 			.enabled = 0, /* Disabled by default. */
 			.stride_num_n = MLX5_MPRQ_STRIDE_NUM_N,
diff --git a/drivers/net/mlx5/mlx5.h b/drivers/net/mlx5/mlx5.h
index d8a5162bdb..37c8cd1d34 100644
--- a/drivers/net/mlx5/mlx5.h
+++ b/drivers/net/mlx5/mlx5.h
@@ -167,6 +167,8 @@ struct mlx5_dev_config {
 	unsigned int tx_vec_en:1; /* Tx vector is enabled. */
 	unsigned int rx_vec_en:1; /* Rx vector is enabled. */
 	unsigned int mpw_hdr_dseg:1; /* Enable DSEGs in the title WQEBB. */
+	unsigned int mr_ext_memseg_en:1;
+	/* Whether memseg should be extended for MR creation. */
 	unsigned int l3_vxlan_en:1; /* Enable L3 VXLAN flow creation. */
 	unsigned int vf_nl_en:1; /* Enable Netlink requests in VF mode. */
 	unsigned int dv_flow_en:1; /* Enable DV flow. */
diff --git a/drivers/net/mlx5/mlx5_mr.c b/drivers/net/mlx5/mlx5_mr.c
index e255650add..e9eda975ff 100644
--- a/drivers/net/mlx5/mlx5_mr.c
+++ b/drivers/net/mlx5/mlx5_mr.c
@@ -534,6 +534,7 @@ mlx5_mr_create(struct rte_eth_dev *dev, struct mlx5_mr_cache *entry,
 	       uintptr_t addr)
 {
 	struct mlx5_priv *priv = dev->data->dev_private;
+	struct mlx5_dev_config *config = &priv->config;
 	struct rte_mem_config *mcfg = rte_eal_get_configuration()->mem_config;
 	const struct rte_memseg_list *msl;
 	const struct rte_memseg *ms;
@@ -569,14 +570,24 @@ mlx5_mr_create(struct rte_eth_dev *dev, struct mlx5_mr_cache *entry,
 	 */
 	mlx5_mr_garbage_collect(dev);
 	/*
-	 * Find out a contiguous virtual address chunk in use, to which the
-	 * given address belongs, in order to register maximum range. In the
-	 * best case where mempools are not dynamically recreated and
+	 * If enabled, find out a contiguous virtual address chunk in use, to
+	 * which the given address belongs, in order to register maximum range.
+	 * In the best case where mempools are not dynamically recreated and
 	 * '--socket-mem' is specified as an EAL option, it is very likely to
 	 * have only one MR(LKey) per a socket and per a hugepage-size even
-	 * though the system memory is highly fragmented.
+	 * though the system memory is highly fragmented. As the whole memory
+	 * chunk will be pinned by kernel, it can't be reused unless entire
+	 * chunk is freed from EAL.
+	 *
+	 * If disabled, just register one memseg (page). Then, memory
+	 * consumption will be minimized but it may drop performance if there
+	 * are many MRs to lookup on the datapath.
 	 */
-	if (!rte_memseg_contig_walk(mr_find_contig_memsegs_cb, &data)) {
+	if (!config->mr_ext_memseg_en) {
+		data.msl = rte_mem_virt2memseg_list((void *)addr);
+		data.start = RTE_ALIGN_FLOOR(addr, data.msl->page_sz);
+		data.end = data.start + data.msl->page_sz;
+	} else if (!rte_memseg_contig_walk(mr_find_contig_memsegs_cb, &data)) {
 		DRV_LOG(WARNING,
 			"port %u unable to find virtually contiguous"
 			" chunk for address (%p)."
-- 
2.11.0
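
For readers who want to try the two pieces of logic outside the driver, here is a minimal standalone sketch. It is not part of the patch and does not use DPDK headers. The names sketch_config, sketch_range, sketch_args_check() and sketch_single_page_range() are hypothetical, the use of strtoul() to parse the value is an assumption, and the bit mask stands in for RTE_ALIGN_FLOOR() under the assumption that the page size is a power of two.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for the one-bit field added to mlx5_dev_config. */
struct sketch_config {
	unsigned int mr_ext_memseg_en:1;
};

/* Half-open address range [start, end) that would be registered as one MR. */
struct sketch_range {
	uintptr_t start;
	uintptr_t end;
};

/*
 * Parse one key/value device argument. Any nonzero value enables the
 * option: "!!" collapses the parsed number to 0 or 1 before it is stored
 * in the one-bit field. Returns 0 on success, -1 for an unknown key.
 * Parsing with strtoul() is an assumption made for this sketch.
 */
static int
sketch_args_check(const char *key, const char *val,
		  struct sketch_config *config)
{
	unsigned long tmp = strtoul(val, NULL, 0);

	if (strcmp(key, "mr_ext_memseg_en") == 0) {
		config->mr_ext_memseg_en = !!tmp;
		return 0;
	}
	return -1;
}

/*
 * Disabled path: cover only the page that contains addr. The start is
 * addr rounded down to a page boundary and the end is one page later.
 * page_sz must be a power of two for the mask to be a valid floor.
 */
static struct sketch_range
sketch_single_page_range(uintptr_t addr, uint64_t page_sz)
{
	struct sketch_range r;

	assert(page_sz != 0 && (page_sz & (page_sz - 1)) == 0);
	r.start = addr & ~((uintptr_t)page_sz - 1);
	r.end = r.start + (uintptr_t)page_sz;
	return r;
}
```

With a 2 MiB page size (0x200000), an address of 0x345678 maps to the range [0x200000, 0x400000). With the option disabled, every distinct page touched this way would get its own MR, which is the memory-versus-lookup trade-off the commit log describes.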