DPDK patches and discussions
* [dpdk-dev] [RFC 00/10] Support running DPDK without hugetlbfs mountpoint
@ 2018-05-31 14:32 Anatoly Burakov
  2018-05-31 14:32 ` [dpdk-dev] [RFC 01/10] eal: add --no-shared-files option Anatoly Burakov
                   ` (9 more replies)
  0 siblings, 10 replies; 11+ messages in thread
From: Anatoly Burakov @ 2018-05-31 14:32 UTC (permalink / raw)
  To: dev
  Cc: ray.kinsella, kuralamudhan.ramakrishnan, louise.m.daly,
	bruce.richardson, ferruh.yigit, konstantin.ananyev

This patchset takes the old debug options "--huge-unlink"
and "--no-shconf" and replaces them both with a new option,
"--no-shared-files". This is a special mode in which DPDK
does not create any shared files while running: neither
hugepage files nor any runtime data (everything is kept
entirely in memory). As a consequence, secondary processes
are not supported in this mode.

Additionally, on supported kernel/glibc versions (Linux
4.14+, glibc 2.27+), "--no-shared-files" mode will also
reserve hugepages using memfd instead of relying on a
hugetlbfs mountpoint. This makes it possible to use DPDK
without any hugetlbfs mountpoints (e.g. in container use
cases).
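
For illustration only, a minimal sketch of the memfd idea:
memfd_create() returns a file descriptor for an anonymous file
that never appears in any filesystem, and MFD_HUGETLB asks for
it to be backed by hugepages. The helper map_memfd() is
hypothetical and is not the code from the "mem: enable
memfd-based hugepage allocation" patch. It assumes glibc 2.27+
(for the memfd_create() wrapper) and, when MFD_HUGETLB is
passed, Linux 4.14+ with hugepages reserved and a length that
is a multiple of the hugepage size.

```c
#define _GNU_SOURCE /* memfd_create() and MFD_* flags (glibc 2.27+) */
#include <stddef.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper (not DPDK code): back `len` bytes with an anonymous
 * memfd and map it shared. Pass MFD_HUGETLB in `extra_flags` to request
 * hugepages (Linux 4.14+); `len` must then be a multiple of the hugepage
 * size. Returns the mapping and stores the fd in *fd_out, or NULL on error.
 */
static void *map_memfd(const char *name, size_t len, unsigned int extra_flags,
		int *fd_out)
{
	/* the name is only a label shown in /proc/self/fd; no path is created */
	int fd = memfd_create(name, MFD_CLOEXEC | extra_flags);
	if (fd < 0)
		return NULL; /* e.g. EINVAL if the kernel lacks MFD_HUGETLB */

	/* size the anonymous file before mapping it */
	if (ftruncate(fd, (off_t)len) < 0) {
		close(fd);
		return NULL;
	}

	void *addr = mmap(NULL, len, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
	if (addr == MAP_FAILED) {
		close(fd);
		return NULL;
	}
	*fd_out = fd;
	return addr;
}
```

Passing 0 instead of MFD_HUGETLB yields ordinary shmem-backed
memory, which is a convenient way to exercise the same code
path on a machine with no hugepages reserved.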

This changes the behavior of several command-line
switches, so it is an RFC for now. Perhaps we could keep
the old switches as they are and deprecate them in the
next release?

Anatoly Burakov (10):
  eal: add --no-shared-files option
  eal: make --no-shconf an alias for --no-shared-files
  eal: make --huge-unlink an alias for --no-shared-files
  fbarray: support no-shared-files mode
  mem: add support for no-shared-files mode
  ipc: add support for no-shared-files mode
  eal: add support for no-shared-files for hugepage info
  eal: add support for no-shared-files in hugepage data file
  eal: do not create runtime dir in no-shared-files mode
  mem: enable memfd-based hugepage allocation

 lib/librte_eal/bsdapp/eal/eal.c               |   7 +-
 lib/librte_eal/bsdapp/eal/eal_hugepage_info.c |   4 +
 lib/librte_eal/common/eal_common_fbarray.c    |  71 +++++----
 lib/librte_eal/common/eal_common_memory.c     |   3 +-
 lib/librte_eal/common/eal_common_options.c    |  25 ++--
 lib/librte_eal/common/eal_common_proc.c       |  25 ++++
 lib/librte_eal/common/eal_internal_cfg.h      |   3 +-
 lib/librte_eal/common/eal_options.h           |   7 +-
 lib/librte_eal/linuxapp/eal/eal.c             |  18 ++-
 .../linuxapp/eal/eal_hugepage_info.c          | 140 ++++++++++++++----
 lib/librte_eal/linuxapp/eal/eal_memalloc.c    | 126 +++++++++++++++-
 lib/librte_eal/linuxapp/eal/eal_memfd.h       |  28 ++++
 lib/librte_eal/linuxapp/eal/eal_memory.c      |  19 ++-
 test/test/test_eal_flags.c                    |  18 +--
 14 files changed, 384 insertions(+), 110 deletions(-)
 create mode 100644 lib/librte_eal/linuxapp/eal/eal_memfd.h

-- 
2.17.0

* [dpdk-dev] [RFC 01/10] eal: add --no-shared-files option
  2018-05-31 14:32 [dpdk-dev] [RFC 00/10] Support running DPDK without hugetlbfs mountpoint Anatoly Burakov
@ 2018-05-31 14:32 ` Anatoly Burakov
  2018-05-31 14:32 ` [dpdk-dev] [RFC 02/10] eal: make --no-shconf an alias for --no-shared-files Anatoly Burakov
                   ` (8 subsequent siblings)
  9 siblings, 0 replies; 11+ messages in thread
From: Anatoly Burakov @ 2018-05-31 14:32 UTC (permalink / raw)
  To: dev
  Cc: ray.kinsella, kuralamudhan.ramakrishnan, louise.m.daly,
	bruce.richardson, ferruh.yigit, konstantin.ananyev

This command-line option makes DPDK not create any shared files at
runtime, including shared configuration and hugetlbfs files. This is
useful for debugging, as well as for certain use cases such as
containers.

For now, the option is only parsed and has no effect.

Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
---
 lib/librte_eal/common/eal_common_options.c | 7 +++++++
 lib/librte_eal/common/eal_internal_cfg.h   | 1 +
 lib/librte_eal/common/eal_options.h        | 2 ++
 3 files changed, 10 insertions(+)

diff --git a/lib/librte_eal/common/eal_common_options.c b/lib/librte_eal/common/eal_common_options.c
index ecebb2923..38df094de 100644
--- a/lib/librte_eal/common/eal_common_options.c
+++ b/lib/librte_eal/common/eal_common_options.c
@@ -66,6 +66,7 @@ eal_long_options[] = {
 	{OPT_NO_HUGE,           0, NULL, OPT_NO_HUGE_NUM          },
 	{OPT_NO_PCI,            0, NULL, OPT_NO_PCI_NUM           },
 	{OPT_NO_SHCONF,         0, NULL, OPT_NO_SHCONF_NUM        },
+	{OPT_NO_SHARED_FILES,   0, NULL, OPT_NO_SHARED_FILES_NUM  },
 	{OPT_PCI_BLACKLIST,     1, NULL, OPT_PCI_BLACKLIST_NUM    },
 	{OPT_PCI_WHITELIST,     1, NULL, OPT_PCI_WHITELIST_NUM    },
 	{OPT_PROC_TYPE,         1, NULL, OPT_PROC_TYPE_NUM        },
@@ -1165,6 +1166,10 @@ eal_parse_common_option(int opt, const char *optarg,
 		conf->no_shconf = 1;
 		break;
 
+	case OPT_NO_SHARED_FILES_NUM:
+		conf->no_shared_files = 1;
+		break;
+
 	case OPT_PROC_TYPE_NUM:
 		conf->process_type = eal_parse_proc_type(optarg);
 		break;
@@ -1370,6 +1375,8 @@ eal_common_usage(void)
 	       "                      Set specific log level\n"
 	       "  -v                  Display version information on startup\n"
 	       "  -h, --help          This help\n"
+	       "  --"OPT_NO_SHARED_FILES"   Do not create any shared files (config, hugetlbfs, etc.).\n"
+	       "                      This disables secondary process support\n"
 	       "\nEAL options for DEBUG use only:\n"
 	       "  --"OPT_HUGE_UNLINK"       Unlink hugepage files after init\n"
 	       "  --"OPT_NO_HUGE"           Use malloc instead of hugetlbfs\n"
diff --git a/lib/librte_eal/common/eal_internal_cfg.h b/lib/librte_eal/common/eal_internal_cfg.h
index c4cbf3acd..3fc71bb49 100644
--- a/lib/librte_eal/common/eal_internal_cfg.h
+++ b/lib/librte_eal/common/eal_internal_cfg.h
@@ -41,6 +41,7 @@ struct internal_config {
 	volatile unsigned vmware_tsc_map; /**< true to use VMware TSC mapping
 										* instead of native TSC */
 	volatile unsigned no_shconf;      /**< true if there is no shared config */
+	volatile unsigned no_shared_files; /**< true if there are no shared files to be created*/
 	volatile unsigned create_uio_dev; /**< true to create /dev/uioX devices */
 	volatile enum rte_proc_type_t process_type; /**< multi-process proc type */
 	/** true to try allocating memory on specific sockets */
diff --git a/lib/librte_eal/common/eal_options.h b/lib/librte_eal/common/eal_options.h
index 211ae06ae..b0d9d6819 100644
--- a/lib/librte_eal/common/eal_options.h
+++ b/lib/librte_eal/common/eal_options.h
@@ -45,6 +45,8 @@ enum {
 	OPT_NO_PCI_NUM,
 #define OPT_NO_SHCONF         "no-shconf"
 	OPT_NO_SHCONF_NUM,
+#define OPT_NO_SHARED_FILES   "no-shared-files"
+	OPT_NO_SHARED_FILES_NUM,
 #define OPT_SOCKET_MEM        "socket-mem"
 	OPT_SOCKET_MEM_NUM,
 #define OPT_SYSLOG            "syslog"
-- 
2.17.0

* [dpdk-dev] [RFC 02/10] eal: make --no-shconf an alias for --no-shared-files
  2018-05-31 14:32 [dpdk-dev] [RFC 00/10] Support running DPDK without hugetlbfs mountpoint Anatoly Burakov
  2018-05-31 14:32 ` [dpdk-dev] [RFC 01/10] eal: add --no-shared-files option Anatoly Burakov
@ 2018-05-31 14:32 ` Anatoly Burakov
  2018-05-31 14:32 ` [dpdk-dev] [RFC 03/10] eal: make --huge-unlink " Anatoly Burakov
                   ` (7 subsequent siblings)
  9 siblings, 0 replies; 11+ messages in thread
From: Anatoly Burakov @ 2018-05-31 14:32 UTC (permalink / raw)
  To: dev
  Cc: Bruce Richardson, ray.kinsella, kuralamudhan.ramakrishnan,
	louise.m.daly, ferruh.yigit, konstantin.ananyev

Move all functionality associated with --no-shconf to
--no-shared-files, and make the former an alias for the latter.
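
The aliasing relies on getopt_long() returning the val field of the
matching struct option entry, so two long names that share a value
are indistinguishable to the parser. A self-contained sketch of that
mechanism follows (it also includes the --huge-unlink alias added in
the next patch). The option table and parse_no_shared_files() are
hypothetical, and resetting optind to 0 is the glibc/musl convention
for rescanning a fresh argument vector.

```c
#include <getopt.h>
#include <stddef.h>

/* Hypothetical enum in the style of eal_options.h: only the new option
 * gets a value of its own; the deprecated names reuse it. */
enum {
	OPT_NO_SHARED_FILES_NUM = 256, /* above any short-option character */
};

static const struct option demo_long_options[] = {
	{"huge-unlink",     0, NULL, OPT_NO_SHARED_FILES_NUM},
	{"no-shconf",       0, NULL, OPT_NO_SHARED_FILES_NUM},
	{"no-shared-files", 0, NULL, OPT_NO_SHARED_FILES_NUM},
	{NULL,              0, NULL, 0},
};

/* Returns 1 if any alias of --no-shared-files was given, 0 if none was,
 * and -1 on an unrecognized option. */
static int parse_no_shared_files(int argc, char *argv[])
{
	int opt, no_shared_files = 0;

	optind = 0; /* glibc/musl: fully reset getopt state between calls */
	opterr = 0; /* do not print diagnostics for unknown options */
	while ((opt = getopt_long(argc, argv, "", demo_long_options,
			NULL)) != -1) {
		switch (opt) {
		case OPT_NO_SHARED_FILES_NUM:
			no_shared_files = 1;
			break;
		default: /* '?' for unknown options */
			return -1;
		}
	}
	return no_shared_files;
}
```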

Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
---
 lib/librte_eal/bsdapp/eal/eal.c            |  4 ++--
 lib/librte_eal/common/eal_common_memory.c  |  3 ++-
 lib/librte_eal/common/eal_common_options.c |  8 ++------
 lib/librte_eal/common/eal_internal_cfg.h   |  1 -
 lib/librte_eal/common/eal_options.h        |  2 +-
 lib/librte_eal/linuxapp/eal/eal.c          |  6 +++---
 test/test/test_eal_flags.c                 | 18 +++++++++---------
 7 files changed, 19 insertions(+), 23 deletions(-)

diff --git a/lib/librte_eal/bsdapp/eal/eal.c b/lib/librte_eal/bsdapp/eal/eal.c
index dc279542d..4dff1804e 100644
--- a/lib/librte_eal/bsdapp/eal/eal.c
+++ b/lib/librte_eal/bsdapp/eal/eal.c
@@ -222,7 +222,7 @@ rte_eal_config_create(void)
 
 	const char *pathname = eal_runtime_config_path();
 
-	if (internal_config.no_shconf)
+	if (internal_config.no_shared_files)
 		return;
 
 	if (mem_cfg_fd < 0){
@@ -261,7 +261,7 @@ rte_eal_config_attach(void)
 	void *rte_mem_cfg_addr;
 	const char *pathname = eal_runtime_config_path();
 
-	if (internal_config.no_shconf)
+	if (internal_config.no_shared_files)
 		return;
 
 	if (mem_cfg_fd < 0){
diff --git a/lib/librte_eal/common/eal_common_memory.c b/lib/librte_eal/common/eal_common_memory.c
index 4f0688f9d..a9c4b9b68 100644
--- a/lib/librte_eal/common/eal_common_memory.c
+++ b/lib/librte_eal/common/eal_common_memory.c
@@ -938,7 +938,8 @@ rte_eal_memory_init(void)
 	if (retval < 0)
 		goto fail;
 
-	if (internal_config.no_shconf == 0 && rte_eal_memdevice_init() < 0)
+	if (internal_config.no_shared_files == 0 &&
+			rte_eal_memdevice_init() < 0)
 		goto fail;
 
 	return 0;
diff --git a/lib/librte_eal/common/eal_common_options.c b/lib/librte_eal/common/eal_common_options.c
index 38df094de..0f3eb928a 100644
--- a/lib/librte_eal/common/eal_common_options.c
+++ b/lib/librte_eal/common/eal_common_options.c
@@ -65,7 +65,7 @@ eal_long_options[] = {
 	{OPT_NO_HPET,           0, NULL, OPT_NO_HPET_NUM          },
 	{OPT_NO_HUGE,           0, NULL, OPT_NO_HUGE_NUM          },
 	{OPT_NO_PCI,            0, NULL, OPT_NO_PCI_NUM           },
-	{OPT_NO_SHCONF,         0, NULL, OPT_NO_SHCONF_NUM        },
+	{OPT_NO_SHCONF,         0, NULL, OPT_NO_SHARED_FILES_NUM  },
 	{OPT_NO_SHARED_FILES,   0, NULL, OPT_NO_SHARED_FILES_NUM  },
 	{OPT_PCI_BLACKLIST,     1, NULL, OPT_PCI_BLACKLIST_NUM    },
 	{OPT_PCI_WHITELIST,     1, NULL, OPT_PCI_WHITELIST_NUM    },
@@ -1162,10 +1162,6 @@ eal_parse_common_option(int opt, const char *optarg,
 		conf->vmware_tsc_map = 1;
 		break;
 
-	case OPT_NO_SHCONF_NUM:
-		conf->no_shconf = 1;
-		break;
-
 	case OPT_NO_SHARED_FILES_NUM:
 		conf->no_shared_files = 1;
 		break;
@@ -1382,6 +1378,6 @@ eal_common_usage(void)
 	       "  --"OPT_NO_HUGE"           Use malloc instead of hugetlbfs\n"
 	       "  --"OPT_NO_PCI"            Disable PCI\n"
 	       "  --"OPT_NO_HPET"           Disable HPET\n"
-	       "  --"OPT_NO_SHCONF"         No shared config (mmap'd files)\n"
+	       "  --"OPT_NO_SHCONF"         Deprecated. Alias for --no-shared-files\n"
 	       "\n", RTE_MAX_LCORE);
 }
diff --git a/lib/librte_eal/common/eal_internal_cfg.h b/lib/librte_eal/common/eal_internal_cfg.h
index 3fc71bb49..d80bacd4d 100644
--- a/lib/librte_eal/common/eal_internal_cfg.h
+++ b/lib/librte_eal/common/eal_internal_cfg.h
@@ -40,7 +40,6 @@ struct internal_config {
 	volatile unsigned no_hpet;        /**< true to disable HPET */
 	volatile unsigned vmware_tsc_map; /**< true to use VMware TSC mapping
 										* instead of native TSC */
-	volatile unsigned no_shconf;      /**< true if there is no shared config */
 	volatile unsigned no_shared_files; /**< true if there are no shared files to be created*/
 	volatile unsigned create_uio_dev; /**< true to create /dev/uioX devices */
 	volatile enum rte_proc_type_t process_type; /**< multi-process proc type */
diff --git a/lib/librte_eal/common/eal_options.h b/lib/librte_eal/common/eal_options.h
index b0d9d6819..6890d4114 100644
--- a/lib/librte_eal/common/eal_options.h
+++ b/lib/librte_eal/common/eal_options.h
@@ -43,8 +43,8 @@ enum {
 	OPT_NO_HUGE_NUM,
 #define OPT_NO_PCI            "no-pci"
 	OPT_NO_PCI_NUM,
+/* no-shconf is an alias for no-shared-files */
 #define OPT_NO_SHCONF         "no-shconf"
-	OPT_NO_SHCONF_NUM,
 #define OPT_NO_SHARED_FILES   "no-shared-files"
 	OPT_NO_SHARED_FILES_NUM,
 #define OPT_SOCKET_MEM        "socket-mem"
diff --git a/lib/librte_eal/linuxapp/eal/eal.c b/lib/librte_eal/linuxapp/eal/eal.c
index 8655b8691..32ca25dc2 100644
--- a/lib/librte_eal/linuxapp/eal/eal.c
+++ b/lib/librte_eal/linuxapp/eal/eal.c
@@ -230,7 +230,7 @@ rte_eal_config_create(void)
 
 	const char *pathname = eal_runtime_config_path();
 
-	if (internal_config.no_shconf)
+	if (internal_config.no_shared_files)
 		return;
 
 	/* map the config before hugepage address so that we don't waste a page */
@@ -283,7 +283,7 @@ rte_eal_config_attach(void)
 
 	const char *pathname = eal_runtime_config_path();
 
-	if (internal_config.no_shconf)
+	if (internal_config.no_shared_files)
 		return;
 
 	if (mem_cfg_fd < 0){
@@ -309,7 +309,7 @@ rte_eal_config_reattach(void)
 	struct rte_mem_config *mem_config;
 	void *rte_mem_cfg_addr;
 
-	if (internal_config.no_shconf)
+	if (internal_config.no_shared_files)
 		return;
 
 	/* save the address primary process has mapped shared config to */
diff --git a/test/test/test_eal_flags.c b/test/test/test_eal_flags.c
index f840ca50b..8e83ea7bf 100644
--- a/test/test/test_eal_flags.c
+++ b/test/test/test_eal_flags.c
@@ -27,7 +27,7 @@
 #define mp_flag "--proc-type=secondary"
 #define no_hpet "--no-hpet"
 #define no_huge "--no-huge"
-#define no_shconf "--no-shconf"
+#define no_shared_files "--no-shared-files"
 #define pci_whitelist "--pci-whitelist"
 #define vdev "--vdev"
 #define memtest "memtest"
@@ -370,7 +370,7 @@ test_invalid_vdev_flag(void)
 #ifdef RTE_EXEC_ENV_BSDAPP
 	/* BSD target doesn't support prefixes at this point, and we also need to
 	 * run another primary process here */
-	const char * prefix = no_shconf;
+	const char * prefix = no_shared_files;
 #else
 	const char * prefix = "--file-prefix=vdev";
 #endif
@@ -662,15 +662,15 @@ test_invalid_n_flag(void)
 #endif
 
 	/* -n flag but no value */
-	const char *argv1[] = { prgname, prefix, no_huge, no_shconf, "-c", "1", "-n"};
+	const char *argv1[] = { prgname, prefix, no_huge, no_shared_files, "-c", "1", "-n"};
 	/* bad numeric value */
-	const char *argv2[] = { prgname, prefix, no_huge, no_shconf, "-c", "1", "-n", "e" };
+	const char *argv2[] = { prgname, prefix, no_huge, no_shared_files, "-c", "1", "-n", "e" };
 	/* zero is invalid */
-	const char *argv3[] = { prgname, prefix, no_huge, no_shconf, "-c", "1", "-n", "0" };
+	const char *argv3[] = { prgname, prefix, no_huge, no_shared_files, "-c", "1", "-n", "0" };
 	/* sanity test - check with good value */
-	const char *argv4[] = { prgname, prefix, no_huge, no_shconf, "-c", "1", "-n", "2" };
+	const char *argv4[] = { prgname, prefix, no_huge, no_shared_files, "-c", "1", "-n", "2" };
 	/* sanity test - check with no -n flag */
-	const char *argv5[] = { prgname, prefix, no_huge, no_shconf, "-c", "1"};
+	const char *argv5[] = { prgname, prefix, no_huge, no_shared_files, "-c", "1"};
 
 	if (launch_proc(argv1) == 0
 			|| launch_proc(argv2) == 0
@@ -734,7 +734,7 @@ test_no_huge_flag(void)
 #ifdef RTE_EXEC_ENV_BSDAPP
 	/* BSD target doesn't support prefixes at this point, and we also need to
 	 * run another primary process here */
-	const char * prefix = no_shconf;
+	const char * prefix = no_shared_files;
 #else
 	const char * prefix = "--file-prefix=nohuge";
 #endif
@@ -851,7 +851,7 @@ test_misc_flags(void)
 	const char *argv5[] = {prgname, prefix, mp_flag, "-c", "1", "--syslog", "error"};
 	/* With no-sh-conf */
 	const char *argv6[] = {prgname, "-c", "1", "-n", "2", "-m", DEFAULT_MEM_SIZE,
-			no_shconf, nosh_prefix };
+			no_shared_files, nosh_prefix };
 
 #ifdef RTE_EXEC_ENV_BSDAPP
 	return 0;
-- 
2.17.0

* [dpdk-dev] [RFC 03/10] eal: make --huge-unlink an alias for --no-shared-files
  2018-05-31 14:32 [dpdk-dev] [RFC 00/10] Support running DPDK without hugetlbfs mountpoint Anatoly Burakov
  2018-05-31 14:32 ` [dpdk-dev] [RFC 01/10] eal: add --no-shared-files option Anatoly Burakov
  2018-05-31 14:32 ` [dpdk-dev] [RFC 02/10] eal: make --no-shconf an alias for --no-shared-files Anatoly Burakov
@ 2018-05-31 14:32 ` Anatoly Burakov
  2018-05-31 14:32 ` [dpdk-dev] [RFC 04/10] fbarray: support no-shared-files mode Anatoly Burakov
                   ` (6 subsequent siblings)
  9 siblings, 0 replies; 11+ messages in thread
From: Anatoly Burakov @ 2018-05-31 14:32 UTC (permalink / raw)
  To: dev
  Cc: ray.kinsella, kuralamudhan.ramakrishnan, louise.m.daly,
	bruce.richardson, ferruh.yigit, konstantin.ananyev

Move all functionality associated with the --huge-unlink command-line
option to --no-shared-files, and make --huge-unlink an alias. Since the
new option does more than just unlink hugepage files after they have
been created, it is no longer incompatible with the --no-huge option,
so remove that check as well.

Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
---
 lib/librte_eal/common/eal_common_options.c | 14 ++------------
 lib/librte_eal/common/eal_internal_cfg.h   |  1 -
 lib/librte_eal/common/eal_options.h        |  5 ++---
 lib/librte_eal/linuxapp/eal/eal_memory.c   |  2 +-
 4 files changed, 5 insertions(+), 17 deletions(-)

diff --git a/lib/librte_eal/common/eal_common_options.c b/lib/librte_eal/common/eal_common_options.c
index 0f3eb928a..63e562bdb 100644
--- a/lib/librte_eal/common/eal_common_options.c
+++ b/lib/librte_eal/common/eal_common_options.c
@@ -57,7 +57,7 @@ eal_long_options[] = {
 	{OPT_FILE_PREFIX,       1, NULL, OPT_FILE_PREFIX_NUM      },
 	{OPT_HELP,              0, NULL, OPT_HELP_NUM             },
 	{OPT_HUGE_DIR,          1, NULL, OPT_HUGE_DIR_NUM         },
-	{OPT_HUGE_UNLINK,       0, NULL, OPT_HUGE_UNLINK_NUM      },
+	{OPT_HUGE_UNLINK,       0, NULL, OPT_NO_SHARED_FILES_NUM  },
 	{OPT_LCORES,            1, NULL, OPT_LCORES_NUM           },
 	{OPT_LOG_LEVEL,         1, NULL, OPT_LOG_LEVEL_NUM        },
 	{OPT_MASTER_LCORE,      1, NULL, OPT_MASTER_LCORE_NUM     },
@@ -1140,10 +1140,6 @@ eal_parse_common_option(int opt, const char *optarg,
 		break;
 
 	/* long options */
-	case OPT_HUGE_UNLINK_NUM:
-		conf->hugepage_unlink = 1;
-		break;
-
 	case OPT_NO_HUGE_NUM:
 		conf->no_hugetlbfs = 1;
 		/* no-huge is legacy mem */
@@ -1318,12 +1314,6 @@ eal_check_common_options(struct internal_config *internal_cfg)
 		return -1;
 	}
 
-	if (internal_cfg->no_hugetlbfs && internal_cfg->hugepage_unlink) {
-		RTE_LOG(ERR, EAL, "Option --"OPT_HUGE_UNLINK" cannot "
-			"be specified together with --"OPT_NO_HUGE"\n");
-		return -1;
-	}
-
 	return 0;
 }
 
@@ -1374,7 +1364,7 @@ eal_common_usage(void)
 	       "  --"OPT_NO_SHARED_FILES"   Do not create any shared files (config, hugetlbfs, etc.).\n"
 	       "                      This disables secondary process support\n"
 	       "\nEAL options for DEBUG use only:\n"
-	       "  --"OPT_HUGE_UNLINK"       Unlink hugepage files after init\n"
+	       "  --"OPT_HUGE_UNLINK"       Deprecated. Alias for --no-shared-files\n"
 	       "  --"OPT_NO_HUGE"           Use malloc instead of hugetlbfs\n"
 	       "  --"OPT_NO_PCI"            Disable PCI\n"
 	       "  --"OPT_NO_HPET"           Disable HPET\n"
diff --git a/lib/librte_eal/common/eal_internal_cfg.h b/lib/librte_eal/common/eal_internal_cfg.h
index d80bacd4d..887a6a8e2 100644
--- a/lib/librte_eal/common/eal_internal_cfg.h
+++ b/lib/librte_eal/common/eal_internal_cfg.h
@@ -35,7 +35,6 @@ struct internal_config {
 	volatile unsigned force_nchannel; /**< force number of channels */
 	volatile unsigned force_nrank;    /**< force number of ranks */
 	volatile unsigned no_hugetlbfs;   /**< true to disable hugetlbfs */
-	unsigned hugepage_unlink;         /**< true to unlink backing files */
 	volatile unsigned no_pci;         /**< true to disable PCI */
 	volatile unsigned no_hpet;        /**< true to disable HPET */
 	volatile unsigned vmware_tsc_map; /**< true to use VMware TSC mapping
diff --git a/lib/librte_eal/common/eal_options.h b/lib/librte_eal/common/eal_options.h
index 6890d4114..aef696c92 100644
--- a/lib/librte_eal/common/eal_options.h
+++ b/lib/librte_eal/common/eal_options.h
@@ -25,8 +25,6 @@ enum {
 	OPT_FILE_PREFIX_NUM,
 #define OPT_HUGE_DIR          "huge-dir"
 	OPT_HUGE_DIR_NUM,
-#define OPT_HUGE_UNLINK       "huge-unlink"
-	OPT_HUGE_UNLINK_NUM,
 #define OPT_LCORES            "lcores"
 	OPT_LCORES_NUM,
 #define OPT_LOG_LEVEL         "log-level"
@@ -43,7 +41,8 @@ enum {
 	OPT_NO_HUGE_NUM,
 #define OPT_NO_PCI            "no-pci"
 	OPT_NO_PCI_NUM,
-/* no-shconf is an alias for no-shared-files */
+/* huge-unlink and no-shconf are aliases for no-shared-files */
+#define OPT_HUGE_UNLINK       "huge-unlink"
 #define OPT_NO_SHCONF         "no-shconf"
 #define OPT_NO_SHARED_FILES   "no-shared-files"
 	OPT_NO_SHARED_FILES_NUM,
diff --git a/lib/librte_eal/linuxapp/eal/eal_memory.c b/lib/librte_eal/linuxapp/eal/eal_memory.c
index c917de1c2..5e1810712 100644
--- a/lib/librte_eal/linuxapp/eal/eal_memory.c
+++ b/lib/librte_eal/linuxapp/eal/eal_memory.c
@@ -1547,7 +1547,7 @@ eal_legacy_hugepage_init(void)
 	}
 
 	/* free the hugepage backing files */
-	if (internal_config.hugepage_unlink &&
+	if (internal_config.no_shared_files &&
 		unlink_hugepage_files(tmp_hp, internal_config.num_hugepage_sizes) < 0) {
 		RTE_LOG(ERR, EAL, "Unlinking hugepage files failed!\n");
 		goto fail;
-- 
2.17.0

* [dpdk-dev] [RFC 04/10] fbarray: support no-shared-files mode
  2018-05-31 14:32 [dpdk-dev] [RFC 00/10] Support running DPDK without hugetlbfs mountpoint Anatoly Burakov
                   ` (2 preceding siblings ...)
  2018-05-31 14:32 ` [dpdk-dev] [RFC 03/10] eal: make --huge-unlink " Anatoly Burakov
@ 2018-05-31 14:32 ` Anatoly Burakov
  2018-05-31 14:32 ` [dpdk-dev] [RFC 05/10] mem: add support for " Anatoly Burakov
                   ` (5 subsequent siblings)
  9 siblings, 0 replies; 11+ messages in thread
From: Anatoly Burakov @ 2018-05-31 14:32 UTC (permalink / raw)
  To: dev
  Cc: ray.kinsella, kuralamudhan.ramakrishnan, louise.m.daly,
	bruce.richardson, ferruh.yigit, konstantin.ananyev

When using the --no-shared-files option, the expectation is that
multiprocess is not supported, since no shared files are created.
However, fbarray still creates some shared files, which prevent
multiple processes with the same prefix from starting.

Fix this by not creating shared files whenever the --no-shared-files
option is specified. Since the virtual areas we get from
eal_get_virtual_area() are read-only, remap them as writable.
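
As a rough sketch of the remapping step, assuming (as stated above)
that the reservation handed back is read-only anonymous memory:
mapping fresh anonymous memory over it with MAP_FIXED keeps the same
address while making it private and writable, with no file involved.
reserve_readonly() and remap_writable() are hypothetical helpers, not
DPDK functions.

```c
#define _GNU_SOURCE /* MAP_ANONYMOUS under strict -std= modes */
#include <stddef.h>
#include <sys/mman.h>

/* Hypothetical stand-in for a read-only address space reservation. */
static void *reserve_readonly(size_t len)
{
	void *va = mmap(NULL, len, PROT_READ, MAP_PRIVATE | MAP_ANONYMOUS,
			-1, 0);
	return va == MAP_FAILED ? NULL : va;
}

/* Replace the reservation at `va` with private, writable, zero-filled
 * anonymous memory. MAP_FIXED makes the kernel use exactly this address,
 * atomically discarding the old read-only mapping. */
static void *remap_writable(void *va, size_t len)
{
	void *p = mmap(va, len, PROT_READ | PROT_WRITE,
			MAP_FIXED | MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
	return p == MAP_FAILED ? NULL : p;
}
```

mprotect() would be another way to change the protection; the patch
maps over the area instead.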

Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
---
 lib/librte_eal/common/eal_common_fbarray.c | 71 +++++++++++++---------
 1 file changed, 42 insertions(+), 29 deletions(-)

diff --git a/lib/librte_eal/common/eal_common_fbarray.c b/lib/librte_eal/common/eal_common_fbarray.c
index 019f84c18..69576c8a8 100644
--- a/lib/librte_eal/common/eal_common_fbarray.c
+++ b/lib/librte_eal/common/eal_common_fbarray.c
@@ -434,39 +434,52 @@ rte_fbarray_init(struct rte_fbarray *arr, const char *name, unsigned int len,
 	if (data == NULL)
 		goto fail;
 
-	eal_get_fbarray_path(path, sizeof(path), name);
+	if (internal_config.no_shared_files) {
+		/* remap virtual area as writable */
+		void *new_data = mmap(data, mmap_len, PROT_READ | PROT_WRITE,
+				MAP_FIXED | MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
+		if (new_data == MAP_FAILED) {
+			RTE_LOG(DEBUG, EAL, "%s(): couldn't remap anonymous memory: %s\n",
+					__func__, strerror(errno));
+			goto fail;
+		}
+	} else {
+		eal_get_fbarray_path(path, sizeof(path), name);
 
-	/*
-	 * Each fbarray is unique to process namespace, i.e. the filename
-	 * depends on process prefix. Try to take out a lock and see if we
-	 * succeed. If we don't, someone else is using it already.
-	 */
-	fd = open(path, O_CREAT | O_RDWR, 0600);
-	if (fd < 0) {
-		RTE_LOG(DEBUG, EAL, "%s(): couldn't open %s: %s\n", __func__,
-				path, strerror(errno));
-		rte_errno = errno;
-		goto fail;
-	} else if (flock(fd, LOCK_EX | LOCK_NB)) {
-		RTE_LOG(DEBUG, EAL, "%s(): couldn't lock %s: %s\n", __func__,
-				path, strerror(errno));
-		rte_errno = EBUSY;
-		goto fail;
-	}
+		/*
+		 * Each fbarray is unique to process namespace, i.e. the
+		 * filename depends on process prefix. Try to take out a lock
+		 * and see if we succeed. If we don't, someone else is using it
+		 * already.
+		 */
+		fd = open(path, O_CREAT | O_RDWR, 0600);
+		if (fd < 0) {
+			RTE_LOG(DEBUG, EAL, "%s(): couldn't open %s: %s\n",
+					__func__, path, strerror(errno));
+			rte_errno = errno;
+			goto fail;
+		} else if (flock(fd, LOCK_EX | LOCK_NB)) {
+			RTE_LOG(DEBUG, EAL, "%s(): couldn't lock %s: %s\n",
+					__func__, path, strerror(errno));
+			rte_errno = EBUSY;
+			goto fail;
+		}
 
-	/* take out a non-exclusive lock, so that other processes could still
-	 * attach to it, but no other process could reinitialize it.
-	 */
-	if (flock(fd, LOCK_SH | LOCK_NB)) {
-		rte_errno = errno;
-		goto fail;
-	}
+		/* take out a non-exclusive lock, so that other processes could
+		 * still attach to it, but no other process could reinitialize
+		 * it.
+		 */
+		if (flock(fd, LOCK_SH | LOCK_NB)) {
+			rte_errno = errno;
+			goto fail;
+		}
 
-	if (resize_and_map(fd, data, mmap_len))
-		goto fail;
+		if (resize_and_map(fd, data, mmap_len))
+			goto fail;
 
-	/* we've mmap'ed the file, we can now close the fd */
-	close(fd);
+		/* we've mmap'ed the file, we can now close the fd */
+		close(fd);
+	}
 
 	/* initialize the data */
 	memset(data, 0, mmap_len);
-- 
2.17.0

* [dpdk-dev] [RFC 05/10] mem: add support for no-shared-files mode
  2018-05-31 14:32 [dpdk-dev] [RFC 00/10] Support running DPDK without hugetlbfs mountpoint Anatoly Burakov
                   ` (3 preceding siblings ...)
  2018-05-31 14:32 ` [dpdk-dev] [RFC 04/10] fbarray: support no-shared-files mode Anatoly Burakov
@ 2018-05-31 14:32 ` Anatoly Burakov
  2018-05-31 14:32 ` [dpdk-dev] [RFC 06/10] ipc: " Anatoly Burakov
                   ` (4 subsequent siblings)
  9 siblings, 0 replies; 11+ messages in thread
From: Anatoly Burakov @ 2018-05-31 14:32 UTC (permalink / raw)
  To: dev
  Cc: ray.kinsella, kuralamudhan.ramakrishnan, louise.m.daly,
	bruce.richardson, ferruh.yigit, konstantin.ananyev

Unlink hugepage files right after creating them, to honor
no-shared-files mode. We cannot resize files that no longer exist, so
make --single-file-segments explicitly unsupported in this mode.
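
The unlink-after-create pattern can be shown with an ordinary temporary
file standing in for a hugetlbfs file (an assumption made so the sketch
runs without a hugetlbfs mount). As in alloc_seg() above, the file is
sized, unlinked, and only then mapped: the open fd, and later the
mapping itself, keep the pages alive after the name is gone.
map_then_unlink() is a hypothetical helper, not the DPDK code.

```c
#define _GNU_SOURCE
#include <stddef.h>
#include <stdlib.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Create a backing file from `tmpl` (a mkstemp() template, modified in
 * place), size it, drop its name, then map it shared. Returns the
 * mapping, or NULL on failure. No file is left behind in either case.
 */
static void *map_then_unlink(char *tmpl, size_t len)
{
	void *addr = MAP_FAILED;
	int fd = mkstemp(tmpl);

	if (fd < 0)
		return NULL;

	/* size the file while it still has a name */
	int sized = ftruncate(fd, (off_t)len) == 0;

	/* drop the name: nothing remains for other processes to find, but
	 * the open fd keeps the file and its pages alive */
	unlink(tmpl);

	if (sized)
		addr = mmap(NULL, len, PROT_READ | PROT_WRITE, MAP_SHARED,
				fd, 0);

	/* the mapping holds its own reference, so the fd can be closed */
	close(fd);
	return addr == MAP_FAILED ? NULL : addr;
}
```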

Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
---
 lib/librte_eal/linuxapp/eal/eal.c          |  9 +++++++++
 lib/librte_eal/linuxapp/eal/eal_memalloc.c | 23 +++++++++++++++++++---
 2 files changed, 29 insertions(+), 3 deletions(-)

diff --git a/lib/librte_eal/linuxapp/eal/eal.c b/lib/librte_eal/linuxapp/eal/eal.c
index 32ca25dc2..7904f813e 100644
--- a/lib/librte_eal/linuxapp/eal/eal.c
+++ b/lib/librte_eal/linuxapp/eal/eal.c
@@ -690,6 +690,15 @@ eal_parse_args(int argc, char **argv)
 		goto out;
 	}
 
+	if (internal_config.single_file_segments &&
+			internal_config.no_shared_files) {
+		RTE_LOG(ERR, EAL, "Option --"OPT_SINGLE_FILE_SEGMENTS" is "
+			"incompatible with --"OPT_NO_SHARED_FILES"\n");
+		eal_usage(prgname);
+		ret = -1;
+		goto out;
+	}
+
 	if (optind >= 0)
 		argv[optind-1] = prgname;
 	ret = optind-1;
diff --git a/lib/librte_eal/linuxapp/eal/eal_memalloc.c b/lib/librte_eal/linuxapp/eal/eal_memalloc.c
index 8c11f98c9..f57d307dd 100644
--- a/lib/librte_eal/linuxapp/eal/eal_memalloc.c
+++ b/lib/librte_eal/linuxapp/eal/eal_memalloc.c
@@ -512,6 +512,13 @@ alloc_seg(struct rte_memseg *ms, void *addr, int socket_id,
 				__func__, strerror(errno));
 			goto resized;
 		}
+		if (internal_config.no_shared_files) {
+			if (unlink(path)) {
+				RTE_LOG(DEBUG, EAL, "%s(): unlink() failed: %s\n",
+					__func__, strerror(errno));
+				goto resized;
+			}
+		}
 	}
 
 	/*
@@ -562,8 +569,11 @@ alloc_seg(struct rte_memseg *ms, void *addr, int socket_id,
 			(unsigned int)(alloc_sz >> 20));
 		goto mapped;
 	}
-	/* for non-single file segments, we can close fd here */
-	if (!internal_config.single_file_segments)
+	/* for non-single file segments or no shared files mode, we can close fd
+	 * here
+	 */
+	if (!internal_config.single_file_segments ||
+			internal_config.no_shared_files)
 		close(fd);
 
 	/* we need to trigger a write to the page to enforce page fault and
@@ -592,7 +602,8 @@ alloc_seg(struct rte_memseg *ms, void *addr, int socket_id,
 		/* ignore failure, can't make it any worse */
 	} else {
 		/* only remove file if we can take out a write lock */
-		if (lock(fd, LOCK_EX) == 1)
+		if (internal_config.no_shared_files == 0 &&
+				lock(fd, LOCK_EX) == 1)
 			unlink(path);
 		close(fd);
 	}
@@ -617,6 +628,12 @@ free_seg(struct rte_memseg *ms, struct hugepage_info *hi,
 		return -1;
 	}
 
+	/* if we're in no-shared-files mode, nothing needs to be done */
+	if (internal_config.no_shared_files) {
+		memset(ms, 0, sizeof(*ms));
+		return 0;
+	}
+
 	/* if we are not in single file segments mode, we're going to unmap the
 	 * segment and thus drop the lock on original fd, but hugepage dir is
 	 * now locked so we can take out another one without races.
-- 
2.17.0

* [dpdk-dev] [RFC 06/10] ipc: add support for no-shared-files mode
  2018-05-31 14:32 [dpdk-dev] [RFC 00/10] Support running DPDK without hugetlbfs mountpoint Anatoly Burakov
                   ` (4 preceding siblings ...)
  2018-05-31 14:32 ` [dpdk-dev] [RFC 05/10] mem: add support for " Anatoly Burakov
@ 2018-05-31 14:32 ` Anatoly Burakov
  2018-05-31 14:32 ` [dpdk-dev] [RFC 07/10] eal: add support for no-shared-files for hugepage info Anatoly Burakov
                   ` (3 subsequent siblings)
  9 siblings, 0 replies; 11+ messages in thread
From: Anatoly Burakov @ 2018-05-31 14:32 UTC (permalink / raw)
  To: dev
  Cc: ray.kinsella, kuralamudhan.ramakrishnan, louise.m.daly,
	bruce.richardson, ferruh.yigit, konstantin.ananyev

IPC is DPDK's inter-process communication mechanism. Since no
secondary processes can ever run in no-shared-files mode, IPC would be
useless, so do not enable it in the first place. For API convenience,
registering callbacks is still allowed, but the callbacks will never be
triggered.
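
The resulting API behavior can be sketched as follows. All demo_*
names are hypothetical stand-ins, not the rte_mp_* functions: input is
still validated first, registration is always accepted, and sends
succeed without doing anything when IPC is disabled.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins; these are not the real rte_mp_* functions. */
typedef int (*demo_mp_action_t)(const void *msg);

#define DEMO_MAX_ACTIONS 8

static bool demo_no_shared_files;   /* would come from the EAL config */
static demo_mp_action_t demo_actions[DEMO_MAX_ACTIONS];
static size_t demo_nb_actions;
static size_t demo_nb_sent;         /* messages handed to the transport */

/* Registration is accepted in every mode, so callers need no special case. */
static int demo_mp_action_register(demo_mp_action_t action)
{
	if (action == NULL || demo_nb_actions == DEMO_MAX_ACTIONS)
		return -1;
	demo_actions[demo_nb_actions++] = action;
	return 0;
}

/* Sending validates its input, then succeeds as a no-op when IPC is
 * disabled, because no peer process can exist. */
static int demo_mp_send(const void *msg)
{
	if (msg == NULL)
		return -1;
	if (demo_no_shared_files)
		return 0;
	demo_nb_sent++; /* real code would write to a socket here */
	return 0;
}

/* Trivial callback used to exercise registration. */
static int demo_action(const void *msg)
{
	(void)msg;
	return 0;
}
```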

Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
---
 lib/librte_eal/common/eal_common_proc.c | 25 +++++++++++++++++++++++++
 1 file changed, 25 insertions(+)

diff --git a/lib/librte_eal/common/eal_common_proc.c b/lib/librte_eal/common/eal_common_proc.c
index 707d8ab30..6cce4e925 100644
--- a/lib/librte_eal/common/eal_common_proc.c
+++ b/lib/librte_eal/common/eal_common_proc.c
@@ -626,6 +626,14 @@ rte_mp_channel_init(void)
 	int dir_fd;
 	pthread_t mp_handle_tid, async_reply_handle_tid;
 
+	/* in no shared files mode, we do not have secondary processes support,
+	 * so no need to initialize IPC.
+	 */
+	if (internal_config.no_shared_files) {
+		RTE_LOG(DEBUG, EAL, "No shared files mode enabled, IPC will be disabled\n");
+		return 0;
+	}
+
 	/* create filter path */
 	create_socket_path("*", path, sizeof(path));
 	strlcpy(mp_filter, basename(path), sizeof(mp_filter));
@@ -988,6 +996,12 @@ rte_mp_request_sync(struct rte_mp_msg *req, struct rte_mp_reply *reply,
 
 	if (check_input(req) == false)
 		return -1;
+
+	if (internal_config.no_shared_files) {
+		RTE_LOG(DEBUG, EAL, "No shared files mode enabled, IPC is disabled\n");
+		return 0;
+	}
+
 	if (gettimeofday(&now, NULL) < 0) {
 		RTE_LOG(ERR, EAL, "Faile to get current time\n");
 		rte_errno = errno;
@@ -1072,6 +1086,12 @@ rte_mp_request_async(struct rte_mp_msg *req, const struct timespec *ts,
 
 	if (check_input(req) == false)
 		return -1;
+
+	if (internal_config.no_shared_files) {
+		RTE_LOG(DEBUG, EAL, "No shared files mode enabled, IPC is disabled\n");
+		return 0;
+	}
+
 	if (gettimeofday(&now, NULL) < 0) {
 		RTE_LOG(ERR, EAL, "Faile to get current time\n");
 		rte_errno = errno;
@@ -1213,5 +1233,10 @@ rte_mp_reply(struct rte_mp_msg *msg, const char *peer)
 		return -1;
 	}
 
+	if (internal_config.no_shared_files) {
+		RTE_LOG(DEBUG, EAL, "No shared files mode enabled, IPC is disabled\n");
+		return 0;
+	}
+
 	return mp_send(msg, peer, MP_REP);
 }
-- 
2.17.0

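The IPC patch above skips channel initialization and makes every send path validate its input and then return success without contacting any peer. Callbacks can still be registered, but they never run. Below is a minimal sketch of that pattern. All names in it (`cfg_no_shared_files`, `mp_register()`, `mp_channel_init()`, `mp_request_sync()`, `counting_action()`) are hypothetical stand-ins, not DPDK API. In normal mode, the sketch simulates the "peer" by calling the callback directly.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for internal_config.no_shared_files. */
static bool cfg_no_shared_files;

typedef int (*mp_action_t)(const char *msg);

/* Hypothetical single-slot callback registry and channel state. */
static mp_action_t registered_action;
static bool channel_ready;

/* Registration is still accepted when IPC is disabled (API convenience). */
int mp_register(mp_action_t action)
{
	registered_action = action;
	return 0;
}

/* Like rte_mp_channel_init(): skip socket/thread setup entirely. */
int mp_channel_init(void)
{
	if (cfg_no_shared_files)
		return 0;
	channel_ready = true; /* real code would create the socket here */
	return 0;
}

/* Like rte_mp_request_sync(): validate input, then succeed with no replies. */
int mp_request_sync(const char *msg, int *nb_received)
{
	*nb_received = 0;
	if (msg == NULL || msg[0] == '\0')
		return -1;
	if (cfg_no_shared_files)
		return 0;
	/* Simulated peer: deliver the message to the registered callback. */
	if (channel_ready && registered_action != NULL &&
			registered_action(msg) == 0)
		*nb_received = 1;
	return 0;
}

/* Hypothetical example callback that counts delivered messages. */
static int delivered;
static int counting_action(const char *msg)
{
	(void)msg;
	delivered++;
	return 0;
}
```

Returning 0 with zero replies, rather than an error, lets callers that iterate over replies work unchanged. This mirrors the patch, which returns 0 from `rte_mp_request_sync()`, `rte_mp_request_async()` and `rte_mp_reply()` in this mode.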

* [dpdk-dev] [RFC 07/10] eal: add support for no-shared-files for hugepage info
  2018-05-31 14:32 [dpdk-dev] [RFC 00/10] Support running DPDK without hugetlbfs mountpoint Anatoly Burakov
                   ` (5 preceding siblings ...)
  2018-05-31 14:32 ` [dpdk-dev] [RFC 06/10] ipc: " Anatoly Burakov
@ 2018-05-31 14:32 ` Anatoly Burakov
  2018-05-31 14:32 ` [dpdk-dev] [RFC 08/10] eal: add support for no-shared-files in hugepage data file Anatoly Burakov
                   ` (2 subsequent siblings)
  9 siblings, 0 replies; 11+ messages in thread
From: Anatoly Burakov @ 2018-05-31 14:32 UTC
  To: dev
  Cc: Bruce Richardson, ray.kinsella, kuralamudhan.ramakrishnan,
	louise.m.daly, ferruh.yigit, konstantin.ananyev

Do not create any shared hugepage size info files if we were
asked to not create any shared files.

Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
---
 lib/librte_eal/bsdapp/eal/eal_hugepage_info.c   | 4 ++++
 lib/librte_eal/linuxapp/eal/eal_hugepage_info.c | 4 ++++
 2 files changed, 8 insertions(+)

diff --git a/lib/librte_eal/bsdapp/eal/eal_hugepage_info.c b/lib/librte_eal/bsdapp/eal/eal_hugepage_info.c
index 836feb672..4b2f71c7e 100644
--- a/lib/librte_eal/bsdapp/eal/eal_hugepage_info.c
+++ b/lib/librte_eal/bsdapp/eal/eal_hugepage_info.c
@@ -101,6 +101,10 @@ eal_hugepage_info_init(void)
 	hpi->num_pages[0] = num_buffers;
 	hpi->lock_descriptor = fd;
 
+	/* for no shared files mode, do not create shared memory config */
+	if (internal_config.no_shared_files)
+		return 0;
+
 	tmp_hpi = create_shared_memory(eal_hugepage_info_path(),
 			sizeof(internal_config.hugepage_info));
 	if (tmp_hpi == NULL ) {
diff --git a/lib/librte_eal/linuxapp/eal/eal_hugepage_info.c b/lib/librte_eal/linuxapp/eal/eal_hugepage_info.c
index 7eca711ba..02b1c4ff1 100644
--- a/lib/librte_eal/linuxapp/eal/eal_hugepage_info.c
+++ b/lib/librte_eal/linuxapp/eal/eal_hugepage_info.c
@@ -446,6 +446,10 @@ eal_hugepage_info_init(void)
 	if (hugepage_info_init() < 0)
 		return -1;
 
+	/* for no shared files mode, we're done */
+	if (internal_config.no_shared_files)
+		return 0;
+
 	hpi = &internal_config.hugepage_info[0];
 
 	tmp_hpi = create_shared_memory(eal_hugepage_info_path(),
-- 
2.17.0

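The process already holds the hugepage info it needs in `internal_config.hugepage_info`. The only step this patch skips is publishing that info to a file for secondary processes. The sketch below illustrates the idea using a simplified, hypothetical `struct hp_info` and hypothetical helpers `hp_info_publish()` and `hp_info_file_exists()`. None of these are part of DPDK.

```c
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>
#include <unistd.h>

/* Hypothetical, simplified hugepage info record. */
struct hp_info {
	uint64_t hugepage_sz;
	uint32_t num_pages;
};

/*
 * Write hugepage info to path for secondary processes, unless
 * no-shared-files mode is active. The caller's struct remains the
 * authoritative copy either way.
 */
int hp_info_publish(const char *path, const struct hp_info *hpi,
		bool no_shared_files)
{
	FILE *f;

	if (no_shared_files)
		return 0; /* nothing is written to the filesystem */

	f = fopen(path, "wb");
	if (f == NULL)
		return -1;
	if (fwrite(hpi, sizeof(*hpi), 1, f) != 1) {
		fclose(f);
		return -1;
	}
	return fclose(f) == 0 ? 0 : -1;
}

/* True if an info file exists at path (what a secondary would look for). */
bool hp_info_file_exists(const char *path)
{
	return access(path, F_OK) == 0;
}
```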

* [dpdk-dev] [RFC 08/10] eal: add support for no-shared-files in hugepage data file
  2018-05-31 14:32 [dpdk-dev] [RFC 00/10] Support running DPDK without hugetlbfs mountpoint Anatoly Burakov
                   ` (6 preceding siblings ...)
  2018-05-31 14:32 ` [dpdk-dev] [RFC 07/10] eal: add support for no-shared-files for hugepage info Anatoly Burakov
@ 2018-05-31 14:32 ` Anatoly Burakov
  2018-05-31 14:32 ` [dpdk-dev] [RFC 09/10] eal: do not create runtime dir in no-shared-files mode Anatoly Burakov
  2018-05-31 14:32 ` [dpdk-dev] [RFC 10/10] mem: enable memfd-based hugepage allocation Anatoly Burakov
  9 siblings, 0 replies; 11+ messages in thread
From: Anatoly Burakov @ 2018-05-31 14:32 UTC
  To: dev
  Cc: ray.kinsella, kuralamudhan.ramakrishnan, louise.m.daly,
	bruce.richardson, ferruh.yigit, konstantin.ananyev

Do not create a shared hugepage data file if we were asked to
not create any shared files.

Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
---
 lib/librte_eal/linuxapp/eal/eal_memory.c | 13 ++++++++++++-
 1 file changed, 12 insertions(+), 1 deletion(-)

diff --git a/lib/librte_eal/linuxapp/eal/eal_memory.c b/lib/librte_eal/linuxapp/eal/eal_memory.c
index 5e1810712..d7b43b5c1 100644
--- a/lib/librte_eal/linuxapp/eal/eal_memory.c
+++ b/lib/librte_eal/linuxapp/eal/eal_memory.c
@@ -521,7 +521,18 @@ static void *
 create_shared_memory(const char *filename, const size_t mem_size)
 {
 	void *retval;
-	int fd = open(filename, O_CREAT | O_RDWR, 0666);
+	int fd;
+
+	/* if no shared files mode is used, create anonymous memory instead */
+	if (internal_config.no_shared_files) {
+		retval = mmap(NULL, mem_size, PROT_READ | PROT_WRITE,
+				MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
+		if (retval == MAP_FAILED)
+			return NULL;
+		return retval;
+	}
+
+	fd = open(filename, O_CREAT | O_RDWR, 0666);
 	if (fd < 0)
 		return NULL;
 	if (ftruncate(fd, mem_size) < 0) {
-- 
2.17.0

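After this patch, `create_shared_memory()` has two branches, sketched below. `create_mem()` and its `no_shared_files` parameter are hypothetical, and the sketch opens the file with mode 0600 rather than the patch's 0666. A `MAP_PRIVATE | MAP_ANONYMOUS` mapping is zero-filled and not backed by any file, so nothing appears on disk. The sketch's file-backed branch uses `MAP_SHARED`, so other processes mapping the same file would see the data.

```c
#define _DEFAULT_SOURCE /* MAP_ANONYMOUS */
#include <fcntl.h>
#include <stdbool.h>
#include <stddef.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical version of create_shared_memory() with the new branch.
 * Returns a read/write mapping of mem_size bytes, or NULL on failure.
 */
void *create_mem(const char *filename, size_t mem_size, bool no_shared_files)
{
	void *retval;
	int fd;

	if (no_shared_files) {
		/* Private, zero-filled, not backed by any file. */
		retval = mmap(NULL, mem_size, PROT_READ | PROT_WRITE,
				MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
		return retval == MAP_FAILED ? NULL : retval;
	}

	/* File-backed and MAP_SHARED, so other processes can map it too. */
	fd = open(filename, O_CREAT | O_RDWR, 0600);
	if (fd < 0)
		return NULL;
	if (ftruncate(fd, (off_t)mem_size) < 0) {
		close(fd);
		return NULL;
	}
	retval = mmap(NULL, mem_size, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
	close(fd); /* the mapping stays valid after close() */
	return retval == MAP_FAILED ? NULL : retval;
}
```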

* [dpdk-dev] [RFC 09/10] eal: do not create runtime dir in no-shared-files mode
  2018-05-31 14:32 [dpdk-dev] [RFC 00/10] Support running DPDK without hugetlbfs mountpoint Anatoly Burakov
                   ` (7 preceding siblings ...)
  2018-05-31 14:32 ` [dpdk-dev] [RFC 08/10] eal: add support for no-shared-files in hugepage data file Anatoly Burakov
@ 2018-05-31 14:32 ` Anatoly Burakov
  2018-05-31 14:32 ` [dpdk-dev] [RFC 10/10] mem: enable memfd-based hugepage allocation Anatoly Burakov
  9 siblings, 0 replies; 11+ messages in thread
From: Anatoly Burakov @ 2018-05-31 14:32 UTC
  To: dev
  Cc: Bruce Richardson, ray.kinsella, kuralamudhan.ramakrishnan,
	louise.m.daly, ferruh.yigit, konstantin.ananyev

Now that the rest of the EAL has been adjusted to not create any
shared files, prevent the runtime directory from ever being created.

Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
---
 lib/librte_eal/bsdapp/eal/eal.c   | 3 ++-
 lib/librte_eal/linuxapp/eal/eal.c | 3 ++-
 2 files changed, 4 insertions(+), 2 deletions(-)

diff --git a/lib/librte_eal/bsdapp/eal/eal.c b/lib/librte_eal/bsdapp/eal/eal.c
index 4dff1804e..3ba2502cc 100644
--- a/lib/librte_eal/bsdapp/eal/eal.c
+++ b/lib/librte_eal/bsdapp/eal/eal.c
@@ -601,7 +601,8 @@ rte_eal_init(int argc, char **argv)
 	}
 
 	/* create runtime data directory */
-	if (eal_create_runtime_dir() < 0) {
+	if (internal_config.no_shared_files == 0 &&
+			eal_create_runtime_dir() < 0) {
 		rte_eal_init_alert("Cannot create runtime directory\n");
 		rte_errno = EACCES;
 		return -1;
diff --git a/lib/librte_eal/linuxapp/eal/eal.c b/lib/librte_eal/linuxapp/eal/eal.c
index 7904f813e..c0b2b1a5a 100644
--- a/lib/librte_eal/linuxapp/eal/eal.c
+++ b/lib/librte_eal/linuxapp/eal/eal.c
@@ -827,7 +827,8 @@ rte_eal_init(int argc, char **argv)
 	}
 
 	/* create runtime data directory */
-	if (eal_create_runtime_dir() < 0) {
+	if (internal_config.no_shared_files == 0 &&
+			eal_create_runtime_dir() < 0) {
 		rte_eal_init_alert("Cannot create runtime directory\n");
 		rte_errno = EACCES;
 		return -1;
-- 
2.17.0

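The new condition relies on `&&` short-circuit evaluation. When `no_shared_files` is set, the left operand is false, so `eal_create_runtime_dir()` is never called. The sketch below makes this observable using hypothetical stand-ins (`init_runtime_dir()`, `create_runtime_dir()` and a call counter), not DPDK functions.

```c
#include <stdbool.h>

/* Hypothetical counter recording how often the directory was "created". */
static int runtime_dir_creations;

/* Hypothetical stand-in for eal_create_runtime_dir(); always succeeds. */
static int create_runtime_dir(void)
{
	runtime_dir_creations++;
	return 0;
}

/*
 * Same condition shape as the patch: because && short-circuits,
 * create_runtime_dir() is not evaluated when no_shared_files is set.
 */
int init_runtime_dir(bool no_shared_files)
{
	if (no_shared_files == 0 && create_runtime_dir() < 0)
		return -1;
	return 0;
}
```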

* [dpdk-dev] [RFC 10/10] mem: enable memfd-based hugepage allocation
  2018-05-31 14:32 [dpdk-dev] [RFC 00/10] Support running DPDK without hugetlbfs mountpoint Anatoly Burakov
                   ` (8 preceding siblings ...)
  2018-05-31 14:32 ` [dpdk-dev] [RFC 09/10] eal: do not create runtime dir in no-shared-files mode Anatoly Burakov
@ 2018-05-31 14:32 ` Anatoly Burakov
  9 siblings, 0 replies; 11+ messages in thread
From: Anatoly Burakov @ 2018-05-31 14:32 UTC
  To: dev
  Cc: ray.kinsella, kuralamudhan.ramakrishnan, louise.m.daly,
	bruce.richardson, ferruh.yigit, konstantin.ananyev

This extends no-shared-files mode to allocate hugepages via memfd
instead of requiring hugetlbfs mountpoints. Because memfd hugepages
are only supported on Linux 4.14+ with glibc 2.27+, a compile-time
check is performed along with runtime checks.

Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
---
 .../linuxapp/eal/eal_hugepage_info.c          | 136 ++++++++++++++----
 lib/librte_eal/linuxapp/eal/eal_memalloc.c    | 105 +++++++++++++-
 lib/librte_eal/linuxapp/eal/eal_memfd.h       |  28 ++++
 lib/librte_eal/linuxapp/eal/eal_memory.c      |   4 +-
 4 files changed, 234 insertions(+), 39 deletions(-)
 create mode 100644 lib/librte_eal/linuxapp/eal/eal_memfd.h

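The two checks mentioned in the commit message can be sketched as follows, assuming glibc on Linux. This is not the patch's code. The compile-time check here uses `__GLIBC_PREREQ(2, 27)` plus the presence of `MFD_HUGETLB`, rather than comparing `__GLIBC__`/`__GLIBC_MINOR__` and `LINUX_VERSION_CODE`. Note that `LINUX_VERSION_CODE` reflects the installed kernel headers, not the running kernel. The helper names `memfd_size_flag()` and `memfd_hugepages_supported()` are hypothetical. Like `check_memfd_pagesize_supported()`, the runtime probe creates a hugetlb memfd and immediately closes it.

```c
#define _GNU_SOURCE /* memfd_create() and MFD_* flags in glibc */
#include <errno.h>
#include <stdint.h>
#include <sys/mman.h>
#include <unistd.h>

/* Compile-time check: glibc >= 2.27 with headers defining MFD_HUGETLB. */
#if defined(__GLIBC_PREREQ)
#if __GLIBC_PREREQ(2, 27) && defined(MFD_HUGETLB)
#define SKETCH_HAVE_MEMFD_HUGETLB 1
#endif
#endif

/* The kernel encodes the hugepage size as log2(size) << 26, so e.g.
 * MFD_HUGE_2MB == 21 << 26 and MFD_HUGE_1GB == 30 << 26. */
#define SKETCH_HUGE_SHIFT 26

/* Size flag for page_sz, or 0 unless page_sz is a power of two > 1. */
unsigned int memfd_size_flag(uint64_t page_sz)
{
	unsigned int shift = 0;

	if (page_sz < 2 || (page_sz & (page_sz - 1)) != 0)
		return 0;
	while ((page_sz >> shift) != 1)
		shift++;
	return shift << SKETCH_HUGE_SHIFT;
}

/* Runtime check: 1 if the running kernel accepts hugetlb memfds of
 * page_sz, 0 otherwise. */
int memfd_hugepages_supported(uint64_t page_sz)
{
#ifdef SKETCH_HAVE_MEMFD_HUGETLB
	unsigned int sz_flag = memfd_size_flag(page_sz);
	int fd;

	if (sz_flag == 0)
		return 0;
	fd = memfd_create("memfd_probe", MFD_HUGETLB | sz_flag);
	if (fd >= 0) {
		close(fd);
		return 1;
	}
	/* EINVAL: size/flags not accepted; ENOSYS: no memfd_create().
	 * Other errors count as "supported", as in the patch. */
	return errno != EINVAL && errno != ENOSYS;
#else
	(void)page_sz;
	return 0; /* not available at build time */
#endif
}
```

`memfd_size_flag()` returns 0 for sizes it cannot encode, which is the value that callers of `eal_memalloc_get_memfd_pagesize_flag()` in this patch test for. Deriving the flag from log2 of the page size avoids hard-coding only the 2M and 1G cases. The kernel still rejects sizes it does not support.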
diff --git a/lib/librte_eal/linuxapp/eal/eal_hugepage_info.c b/lib/librte_eal/linuxapp/eal/eal_hugepage_info.c
index 02b1c4ff1..1a80ee0ee 100644
--- a/lib/librte_eal/linuxapp/eal/eal_hugepage_info.c
+++ b/lib/librte_eal/linuxapp/eal/eal_hugepage_info.c
@@ -30,6 +30,7 @@
 #include "eal_internal_cfg.h"
 #include "eal_hugepages.h"
 #include "eal_filesystem.h"
+#include "eal_memfd.h"
 
 static const char sys_dir_path[] = "/sys/kernel/mm/hugepages";
 static const char sys_pages_numa_dir_path[] = "/sys/devices/system/node";
@@ -313,11 +314,85 @@ compare_hpi(const void *a, const void *b)
 	return hpi_b->hugepage_sz - hpi_a->hugepage_sz;
 }
 
+static void
+calc_num_pages(struct hugepage_info *hpi, struct dirent *dirent)
+{
+	uint64_t total_pages = 0;
+	unsigned int i;
+
+	/*
+	 * first, try to put all hugepages into relevant sockets, but
+	 * if first attempts fails, fall back to collecting all pages
+	 * in one socket and sorting them later
+	 */
+	total_pages = 0;
+	/* we also don't want to do this for legacy init */
+	if (!internal_config.legacy_mem)
+		for (i = 0; i < rte_socket_count(); i++) {
+			int socket = rte_socket_id_by_idx(i);
+			unsigned int num_pages =
+					get_num_hugepages_on_node(
+						dirent->d_name, socket);
+			hpi->num_pages[socket] = num_pages;
+			total_pages += num_pages;
+		}
+	/*
+	 * we failed to sort memory from the get go, so fall
+	 * back to old way
+	 */
+	if (total_pages == 0) {
+		hpi->num_pages[0] = get_num_hugepages(dirent->d_name);
+
+#ifndef RTE_ARCH_64
+		/* for 32-bit systems, limit number of hugepages to
+		 * 1GB per page size */
+		hpi->num_pages[0] = RTE_MIN(hpi->num_pages[0],
+				RTE_PGSIZE_1G / hpi->hugepage_sz);
+#endif
+	}
+}
+
+static int
+check_memfd_pagesize_supported(uint64_t page_sz)
+{
+#ifdef MEMFD_SUPPORTED
+	int sz_flag, fd;
+
+	/* first, check if this particular pagesize is supported */
+	sz_flag = eal_memalloc_get_memfd_pagesize_flag(page_sz);
+	if (sz_flag == 0) {
+		RTE_LOG(ERR, EAL, "Unexpected memfd hugepage size: %"
+			PRIu64" bytes\n", page_sz);
+		return 0;
+	}
+
+	/* does currently running kernel support it? */
+	fd = memfd_create("memfd_test", sz_flag | MFD_HUGETLB);
+	if (fd >= 0) {
+		/* success */
+		close(fd);
+		return 1;
+	}
+	/* creating memfd failed, but if the error wasn't EINVAL, reserving of
+	 * hugepages via memfd is supported by the kernel
+	 */
+	if (errno != EINVAL) {
+		return 1;
+	}
+	RTE_LOG(DEBUG, EAL, "Kernel does not support memfd hugepages of size %"
+		PRIu64" bytes\n", page_sz);
+#else
+	RTE_LOG(DEBUG, EAL, "Memfd hugepage support not enabled at compile time\n");
+	RTE_SET_USED(page_sz);
+#endif
+	return 0;
+}
+
 static int
 hugepage_info_init(void)
 {	const char dirent_start_text[] = "hugepages-";
 	const size_t dirent_start_len = sizeof(dirent_start_text) - 1;
-	unsigned int i, total_pages, num_sizes = 0;
+	unsigned int i, num_sizes = 0;
 	DIR *dir;
 	struct dirent *dirent;
 
@@ -343,6 +418,10 @@ hugepage_info_init(void)
 		hpi->hugepage_sz =
 			rte_str_to_size(&dirent->d_name[dirent_start_len]);
 
+		/* by default, memfd_hugepage_supported is 1 */
+		memfd_hugepage_supported &=
+			check_memfd_pagesize_supported(hpi->hugepage_sz);
+
 		/* first, check if we have a mountpoint */
 		if (get_hugepage_dir(hpi->hugepage_sz,
 			hpi->hugedir, sizeof(hpi->hugedir)) < 0) {
@@ -355,6 +434,23 @@ hugepage_info_init(void)
 					"%" PRIu64 " reserved, but no mounted "
 					"hugetlbfs found for that size\n",
 					num_pages, hpi->hugepage_sz);
+
+			/* no shared files mode may still be able to allocate
+			 * without a valid mountpoint via memfd, but we cannot
+			 * use memfd in legacy mode, because we cannot sort
+			 * pages, so only allow empty mountpoints in non-legacy
+			 * mode.
+			 */
+			if (internal_config.no_shared_files &&
+					!internal_config.legacy_mem &&
+					memfd_hugepage_supported) {
+				RTE_LOG(NOTICE, EAL, "No shared files mode enabled, "
+					"hugepages of size %" PRIu64 " bytes "
+					"will be allocated anonymously\n",
+					hpi->hugepage_sz);
+				calc_num_pages(hpi, dirent);
+				num_sizes++;
+			}
 			continue;
 		}
 
@@ -371,35 +467,14 @@ hugepage_info_init(void)
 		if (clear_hugedir(hpi->hugedir) == -1)
 			break;
 
-		/*
-		 * first, try to put all hugepages into relevant sockets, but
-		 * if first attempts fails, fall back to collecting all pages
-		 * in one socket and sorting them later
-		 */
-		total_pages = 0;
-		/* we also don't want to do this for legacy init */
-		if (!internal_config.legacy_mem)
-			for (i = 0; i < rte_socket_count(); i++) {
-				int socket = rte_socket_id_by_idx(i);
-				unsigned int num_pages =
-						get_num_hugepages_on_node(
-							dirent->d_name, socket);
-				hpi->num_pages[socket] = num_pages;
-				total_pages += num_pages;
-			}
-		/*
-		 * we failed to sort memory from the get go, so fall
-		 * back to old way
-		 */
-		if (total_pages == 0)
-			hpi->num_pages[0] = get_num_hugepages(dirent->d_name);
+		calc_num_pages(hpi, dirent);
 
-#ifndef RTE_ARCH_64
-		/* for 32-bit systems, limit number of hugepages to
-		 * 1GB per page size */
-		hpi->num_pages[0] = RTE_MIN(hpi->num_pages[0],
-					    RTE_PGSIZE_1G / hpi->hugepage_sz);
-#endif
+		if (internal_config.no_shared_files &&
+				!internal_config.legacy_mem &&
+				memfd_hugepage_supported)
+			RTE_LOG(NOTICE, EAL, "No shared files mode enabled, "
+				"hugepages of size %" PRIu64 " bytes will be "
+				"allocated anonymously\n", hpi->hugepage_sz);
 
 		num_sizes++;
 	}
@@ -423,8 +498,7 @@ hugepage_info_init(void)
 
 		for (j = 0; j < RTE_MAX_NUMA_NODES; j++)
 			num_pages += hpi->num_pages[j];
-		if (strnlen(hpi->hugedir, sizeof(hpi->hugedir)) != 0 &&
-				num_pages > 0)
+		if (num_pages > 0)
 			return 0;
 	}
 
diff --git a/lib/librte_eal/linuxapp/eal/eal_memalloc.c b/lib/librte_eal/linuxapp/eal/eal_memalloc.c
index f57d307dd..c4d57c349 100644
--- a/lib/librte_eal/linuxapp/eal/eal_memalloc.c
+++ b/lib/librte_eal/linuxapp/eal/eal_memalloc.c
@@ -39,6 +39,7 @@
 #include "eal_filesystem.h"
 #include "eal_internal_cfg.h"
 #include "eal_memalloc.h"
+#include "eal_memfd.h"
 
 /*
  * not all kernel version support fallocate on hugetlbfs, so fall back to
@@ -46,6 +47,11 @@
  */
 static int fallocate_supported = -1; /* unknown */
 
+/* not all kernel versions support memfd hugepages. assume supported unless
+ * shown otherwise.
+ */
+int memfd_hugepage_supported = 1;
+
 /* for single-file segments, we need some kind of mechanism to keep track of
  * which hugepages can be freed back to the system, and which cannot. we cannot
  * use flock() because they don't allow locking parts of a file, and we cannot
@@ -293,6 +299,49 @@ static int unlock_segment(int list_idx, int seg_idx)
 	return 0;
 }
 
+int
+eal_memalloc_get_memfd_pagesize_flag(uint64_t page_sz)
+{
+#ifdef MEMFD_SUPPORTED
+	switch (page_sz) {
+	case RTE_PGSIZE_1G:
+		return MFD_HUGE_1GB;
+	case RTE_PGSIZE_2M:
+		return MFD_HUGE_2MB;
+	default:
+		return 0;
+	}
+#endif
+	return 0;
+}
+
+static int
+get_memfd_seg_fd(unsigned int list_idx,
+		unsigned int seg_idx, int sz_flag)
+{
+#ifdef MEMFD_SUPPORTED
+	int flags = MFD_HUGETLB | sz_flag;
+	char name[64];
+	int fd;
+
+	snprintf(name, sizeof(name) - 1, "memseg-%d-%d", list_idx,
+			seg_idx);
+
+	fd = memfd_create(name, flags);
+	if (fd < 0) {
+		RTE_LOG(ERR, EAL, "Couldn't create memfd hugepage: %s\n",
+			strerror(errno));
+		return -1;
+	}
+	return fd;
+#else
+	RTE_SET_USED(list_idx);
+	RTE_SET_USED(seg_idx);
+	RTE_SET_USED(sz_flag);
+	return -1;
+#endif
+}
+
 static int
 get_seg_fd(char *path, int buflen, struct hugepage_info *hi,
 		unsigned int list_idx, unsigned int seg_idx)
@@ -342,6 +391,27 @@ get_seg_fd(char *path, int buflen, struct hugepage_info *hi,
 	return fd;
 }
 
+static int
+get_seg_fd_no_shared(char *path, int buflen, struct hugepage_info *hi,
+		unsigned int list_idx, unsigned int seg_idx)
+{
+	int sz_flag;
+
+	/* if memfd hugepages are not supported, create regular files */
+	if (memfd_hugepage_supported == 0)
+		return get_seg_fd(path, buflen, hi, list_idx, seg_idx);
+
+	/* pick correct page size flags */
+	sz_flag = eal_memalloc_get_memfd_pagesize_flag(hi->hugepage_sz);
+	if (sz_flag == 0) {
+		RTE_LOG(ERR, EAL, "Unexpected page size: %"
+			PRIu64 "\n", hi->hugepage_sz);
+		return -1;
+	}
+
+	return get_memfd_seg_fd(list_idx, seg_idx, sz_flag);
+}
+
 static int
 resize_hugefile(int fd, char *path, int list_idx, int seg_idx,
 		uint64_t fa_offset, uint64_t page_sz, bool grow)
@@ -491,8 +561,16 @@ alloc_seg(struct rte_memseg *ms, void *addr, int socket_id,
 	int fd;
 	size_t alloc_sz;
 
-	/* takes out a read lock on segment or segment list */
-	fd = get_seg_fd(path, sizeof(path), hi, list_idx, seg_idx);
+	if (internal_config.no_shared_files) {
+		/* if allocating memfd hugepages is supported, do that,
+		 * otherwise fallback to regular allocation
+		 */
+		fd = get_seg_fd_no_shared(path, sizeof(path), hi, list_idx,
+				seg_idx);
+	} else {
+		/* takes out a read lock on segment or segment list */
+		fd = get_seg_fd(path, sizeof(path), hi, list_idx, seg_idx);
+	}
 	if (fd < 0) {
 		RTE_LOG(ERR, EAL, "Couldn't get fd on hugepage file\n");
 		return -1;
@@ -512,7 +590,8 @@ alloc_seg(struct rte_memseg *ms, void *addr, int socket_id,
 				__func__, strerror(errno));
 			goto resized;
 		}
-		if (internal_config.no_shared_files) {
+		if (internal_config.no_shared_files &&
+				memfd_hugepage_supported == 0) {
 			if (unlink(path)) {
 				RTE_LOG(DEBUG, EAL, "%s(): unlink() failed: %s\n",
 					__func__, strerror(errno));
@@ -616,7 +695,7 @@ free_seg(struct rte_memseg *ms, struct hugepage_info *hi,
 {
 	uint64_t map_offset;
 	char path[PATH_MAX];
-	int fd, ret;
+	int fd, ret = 0;
 
 	/* erase page data */
 	memset(ms->addr, 0, ms->len);
@@ -685,6 +764,7 @@ alloc_seg_walk(const struct rte_memseg_list *msl, void *arg)
 	size_t page_sz;
 	int cur_idx, start_idx, j, dir_fd = -1;
 	unsigned int msl_idx, need, i;
+	bool mountpoint_is_empty;
 
 	if (msl->page_sz != wa->page_sz)
 		return 0;
@@ -704,6 +784,12 @@ alloc_seg_walk(const struct rte_memseg_list *msl, void *arg)
 		return 0;
 	start_idx = cur_idx;
 
+	/* if we're in no-shared-files mode and memfd is supported, we will
+	 * allow empty mountpoints because memfd doesn't require a mountpoint.
+	 */
+	mountpoint_is_empty =
+			strnlen(wa->hi->hugedir, sizeof(wa->hi->hugedir)) == 0;
+
 	/* do not allow any page allocations during the time we're allocating,
 	 * because file creation and locking operations are not atomic,
 	 * and we might be the first or the last ones to use a particular page,
@@ -712,7 +798,7 @@ alloc_seg_walk(const struct rte_memseg_list *msl, void *arg)
 	 * during init, we already hold a write lock, so don't try to take out
 	 * another one.
 	 */
-	if (wa->hi->lock_descriptor == -1) {
+	if (wa->hi->lock_descriptor == -1 && !mountpoint_is_empty) {
 		dir_fd = open(wa->hi->hugedir, O_RDONLY);
 		if (dir_fd < 0) {
 			RTE_LOG(ERR, EAL, "%s(): Cannot open '%s': %s\n",
@@ -794,6 +880,7 @@ free_seg_walk(const struct rte_memseg_list *msl, void *arg)
 	struct free_walk_param *wa = arg;
 	uintptr_t start_addr, end_addr;
 	int msl_idx, seg_idx, ret, dir_fd = -1;
+	bool mountpoint_is_empty;
 
 	start_addr = (uintptr_t) msl->base_va;
 	end_addr = start_addr + msl->memseg_arr.len * (size_t)msl->page_sz;
@@ -802,6 +889,12 @@ free_seg_walk(const struct rte_memseg_list *msl, void *arg)
 			(uintptr_t)wa->ms->addr >= end_addr)
 		return 0;
 
+	/* if we're in no shared files mode and memfd is supported, we will
+	 * allow empty mountpoints because memfd doesn't require a mountpoint.
+	 */
+	mountpoint_is_empty =
+			strnlen(wa->hi->hugedir, sizeof(wa->hi->hugedir)) == 0;
+
 	msl_idx = msl - mcfg->memsegs;
 	seg_idx = RTE_PTR_DIFF(wa->ms->addr, start_addr) / msl->page_sz;
 
@@ -816,7 +909,7 @@ free_seg_walk(const struct rte_memseg_list *msl, void *arg)
 	 * during init, we already hold a write lock, so don't try to take out
 	 * another one.
 	 */
-	if (wa->hi->lock_descriptor == -1) {
+	if (wa->hi->lock_descriptor == -1 && !mountpoint_is_empty) {
 		dir_fd = open(wa->hi->hugedir, O_RDONLY);
 		if (dir_fd < 0) {
 			RTE_LOG(ERR, EAL, "%s(): Cannot open '%s': %s\n",
diff --git a/lib/librte_eal/linuxapp/eal/eal_memfd.h b/lib/librte_eal/linuxapp/eal/eal_memfd.h
new file mode 100644
index 000000000..55e6dbb2c
--- /dev/null
+++ b/lib/librte_eal/linuxapp/eal/eal_memfd.h
@@ -0,0 +1,28 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright(c) 2018 Intel Corporation
+ */
+
+#ifndef EAL_MEMFD_H
+#define EAL_MEMFD_H
+
+#include <stdint.h>
+
+/*
+ * For memfd hugepages, both kernel and glibc version must support them. So,
+ * check for both.
+ */
+#include <features.h> /* glibc version */
+#if __GLIBC__ >= 2 && __GLIBC_MINOR__ >= 27
+#include <linux/version.h> /* linux kernel version */
+#if LINUX_VERSION_CODE >= KERNEL_VERSION(4, 14, 0)
+#define MEMFD_SUPPORTED
+#include <linux/memfd.h>
+#endif /* linux version check */
+#endif /* glibc version check */
+
+int
+eal_memalloc_get_memfd_pagesize_flag(uint64_t page_sz);
+
+extern int memfd_hugepage_supported;
+
+#endif /* EAL_MEMFD_H */
diff --git a/lib/librte_eal/linuxapp/eal/eal_memory.c b/lib/librte_eal/linuxapp/eal/eal_memory.c
index d7b43b5c1..b26e21be8 100644
--- a/lib/librte_eal/linuxapp/eal/eal_memory.c
+++ b/lib/librte_eal/linuxapp/eal/eal_memory.c
@@ -44,6 +44,7 @@
 #include "eal_internal_cfg.h"
 #include "eal_filesystem.h"
 #include "eal_hugepages.h"
+#include "eal_memfd.h"
 
 #define PFN_MASK_SIZE	8
 
@@ -1060,8 +1061,7 @@ get_socket_mem_size(int socket)
 
 	for (i = 0; i < internal_config.num_hugepage_sizes; i++){
 		struct hugepage_info *hpi = &internal_config.hugepage_info[i];
-		if (strnlen(hpi->hugedir, sizeof(hpi->hugedir)) != 0)
-			size += hpi->hugepage_sz * hpi->num_pages[socket];
+		size += hpi->hugepage_sz * hpi->num_pages[socket];
 	}
 
 	return size;
-- 
2.17.0

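In no-shared-files mode, `alloc_seg()` now gets its fd from `get_seg_fd_no_shared()`. When memfd hugepages are supported, this is a per-segment `memfd_create()` object. Otherwise, it is a regular hugetlbfs file that is unlinked once mapped. The minimal sketch below shows that choice, assuming glibc 2.27+ on Linux. `map_segment()` and its parameters are hypothetical, and the sketch sizes the object with `ftruncate()`. For real hugepages, `memfd_flags` would be `MFD_HUGETLB` combined with a size flag.

```c
#define _GNU_SOURCE /* memfd_create() in glibc 2.27+ */
#include <fcntl.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper: map one shared, read/write segment of page_sz bytes.
 * use_memfd: back it with an anonymous memfd named "memseg-<list>-<seg>".
 * otherwise: create a file at path and unlink it right after mapping.
 * Returns the mapping address, or NULL on failure.
 */
void *map_segment(bool use_memfd, unsigned int memfd_flags, const char *path,
		unsigned int list_idx, unsigned int seg_idx, size_t page_sz)
{
	char name[64];
	void *va = MAP_FAILED;
	int fd;

	if (use_memfd) {
		snprintf(name, sizeof(name), "memseg-%u-%u", list_idx, seg_idx);
		fd = memfd_create(name, memfd_flags);
	} else {
		fd = open(path, O_CREAT | O_RDWR, 0600);
	}
	if (fd < 0)
		return NULL;

	/* Size the backing object; with MFD_HUGETLB this must be a
	 * multiple of the hugepage size. */
	if (ftruncate(fd, (off_t)page_sz) == 0)
		va = mmap(NULL, page_sz, PROT_READ | PROT_WRITE, MAP_SHARED,
				fd, 0);

	/* The mapping (if any) keeps the memory alive after these calls. */
	close(fd);
	if (!use_memfd)
		unlink(path);
	return va == MAP_FAILED ? NULL : va;
}
```

Passing `memfd_flags = 0` creates an ordinary (non-hugetlb) memfd. This exercises the same code path on a machine without reserved hugepages. In both branches, nothing is left in the filesystem once the segment is mapped.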

Thread overview: 11+ messages
2018-05-31 14:32 [dpdk-dev] [RFC 00/10] Support running DPDK without hugetlbfs mountpoint Anatoly Burakov
2018-05-31 14:32 ` [dpdk-dev] [RFC 01/10] eal: add --no-shared-files option Anatoly Burakov
2018-05-31 14:32 ` [dpdk-dev] [RFC 02/10] eal: make --no-shconf an alias for --no-shared-files Anatoly Burakov
2018-05-31 14:32 ` [dpdk-dev] [RFC 03/10] eal: make --huge-unlink " Anatoly Burakov
2018-05-31 14:32 ` [dpdk-dev] [RFC 04/10] fbarray: support no-shared-files mode Anatoly Burakov
2018-05-31 14:32 ` [dpdk-dev] [RFC 05/10] mem: add support for " Anatoly Burakov
2018-05-31 14:32 ` [dpdk-dev] [RFC 06/10] ipc: " Anatoly Burakov
2018-05-31 14:32 ` [dpdk-dev] [RFC 07/10] eal: add support for no-shared-files for hugepage info Anatoly Burakov
2018-05-31 14:32 ` [dpdk-dev] [RFC 08/10] eal: add support for no-shared-files in hugepage data file Anatoly Burakov
2018-05-31 14:32 ` [dpdk-dev] [RFC 09/10] eal: do not create runtime dir in no-shared-files mode Anatoly Burakov
2018-05-31 14:32 ` [dpdk-dev] [RFC 10/10] mem: enable memfd-based hugepage allocation Anatoly Burakov