From: Dmitry Kozlyuk <dkozlyuk@nvidia.com>
To: <dev@dpdk.org>
Cc: Bruce Richardson <bruce.richardson@intel.com>,
Thomas Monjalon <thomas@monjalon.net>,
Anatoly Burakov <anatoly.burakov@intel.com>
Subject: [PATCH v2 6/6] eal: extend --huge-unlink for hugepage file reuse
Date: Wed, 19 Jan 2022 23:11:44 +0200
Message-ID: <20220119211144.766098-2-dkozlyuk@nvidia.com>
In-Reply-To: <20220119211144.766098-1-dkozlyuk@nvidia.com>
Expose the Linux EAL ability to reuse existing hugepage files
via the --huge-unlink=never switch.
The default behavior is unchanged; it can also be requested explicitly
with --huge-unlink=existing for consistency.
The old --huge-unlink switch (no value) is kept
as an alias for --huge-unlink=always.
Add a test case for the --huge-unlink=never mode.
Signed-off-by: Dmitry Kozlyuk <dkozlyuk@nvidia.com>
Acked-by: Thomas Monjalon <thomas@monjalon.net>
---
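Note: the value handling described above can be sketched as a small
standalone C program fragment. This is only an illustration of the
three modes, not the EAL code itself: struct unlink_policy and
unlink_policy_default() are hypothetical stand-ins for
struct hugepage_file_discipline and its defaults, and the default of
unlink_existing = true is an assumption inferred from "existing" being
equivalent to not passing the option.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for EAL's struct hugepage_file_discipline. */
struct unlink_policy {
	bool unlink_before_mapping; /* remove files right before mapping */
	bool unlink_existing;       /* remove pre-existing files at startup */
};

/* Assumed defaults, i.e. the "existing" mode (option not given). */
static struct unlink_policy
unlink_policy_default(void)
{
	struct unlink_policy p = {
		.unlink_before_mapping = false,
		.unlink_existing = true,
	};
	return p;
}

/*
 * arg models what getopt_long() leaves in optarg for an option declared
 * with optional_argument: NULL for "--huge-unlink" and the value for
 * "--huge-unlink=VALUE".
 * Returns 0 on success, -1 on an unrecognized value.
 */
static int
parse_huge_unlink(const char *arg, struct unlink_policy *out)
{
	if (arg == NULL || strcmp(arg, "always") == 0) {
		out->unlink_before_mapping = true;
		return 0;
	}
	if (strcmp(arg, "existing") == 0)
		return 0; /* same as not specifying the option */
	if (strcmp(arg, "never") == 0) {
		out->unlink_existing = false;
		return 0;
	}
	return -1;
}
```

With an optional argument, getopt_long() only picks up a value attached
with "=", so "--huge-unlink never" (space-separated) would be seen as
the bare switch followed by a separate argument.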
app/test/test_eal_flags.c | 25 ++++++++++++
doc/guides/linux_gsg/linux_eal_parameters.rst | 24 ++++++++++--
.../prog_guide/env_abstraction_layer.rst | 12 ++++++
doc/guides/rel_notes/release_22_03.rst | 7 ++++
lib/eal/common/eal_common_options.c | 39 +++++++++++++++++--
5 files changed, 100 insertions(+), 7 deletions(-)
diff --git a/app/test/test_eal_flags.c b/app/test/test_eal_flags.c
index d7f4c2cd47..e2696cda63 100644
--- a/app/test/test_eal_flags.c
+++ b/app/test/test_eal_flags.c
@@ -1122,6 +1122,11 @@ test_file_prefix(void)
DEFAULT_MEM_SIZE, "--single-file-segments",
"--file-prefix=" memtest1 };
+ /* primary process with memtest1 and --huge-unlink=never mode */
+ const char * const argv9[] = {prgname, "-m",
+ DEFAULT_MEM_SIZE, "--huge-unlink=never",
+ "--file-prefix=" memtest1 };
+
/* check if files for current prefix are present */
if (process_hugefiles(prefix, HUGEPAGE_CHECK_EXISTS) != 1) {
printf("Error - hugepage files for %s were not created!\n", prefix);
@@ -1290,6 +1295,26 @@ test_file_prefix(void)
return -1;
}
+ /* this process will run with --huge-unlink=never,
+ * so it should not remove hugepage files when it exits
+ */
+ if (launch_proc(argv9) != 0) {
+ printf("Error - failed to run with --huge-unlink=never\n");
+ return -1;
+ }
+
+ /* check if hugefiles for memtest1 are present */
+ if (process_hugefiles(memtest1, HUGEPAGE_CHECK_EXISTS) == 0) {
+ printf("Error - hugepage files for %s were deleted!\n",
+ memtest1);
+ return -1;
+ } else {
+ if (process_hugefiles(memtest1, HUGEPAGE_DELETE) != 1) {
+ printf("Error - deleting hugepages failed!\n");
+ return -1;
+ }
+ }
+
return 0;
}
diff --git a/doc/guides/linux_gsg/linux_eal_parameters.rst b/doc/guides/linux_gsg/linux_eal_parameters.rst
index 74df2611b5..ea8f381391 100644
--- a/doc/guides/linux_gsg/linux_eal_parameters.rst
+++ b/doc/guides/linux_gsg/linux_eal_parameters.rst
@@ -84,10 +84,26 @@ Memory-related options
Use specified hugetlbfs directory instead of autodetected ones. This can be
a sub-directory within a hugetlbfs mountpoint.
-* ``--huge-unlink``
-
- Unlink hugepage files after creating them (implies no secondary process
- support).
+* ``--huge-unlink[=existing|always|never]``
+
+ No ``--huge-unlink`` option or ``--huge-unlink=existing`` is the default:
+ existing hugepage files are removed and re-created
+ to ensure the kernel clears the memory and prevents any data leaks.
+
+ With ``--huge-unlink`` (no value) or ``--huge-unlink=always``,
+ hugepage files are also removed before mapping them,
+ so that the application leaves no files in hugetlbfs.
+ This mode implies no multi-process support.
+
+ When ``--huge-unlink=never`` is specified, existing hugepage files
+ are never removed, but are remapped instead, allowing hugepage reuse.
+ This makes restart faster by skipping memory clearing at initialization,
+ but it may slow down zeroed allocations later.
+ Reused hugepages can contain data from previous processes that used them,
+ which may be a security concern.
+ Hugepage files created in this mode are also not removed
+ when all the hugepages mapped from them are freed,
+ which allows reusing these files after a restart.
* ``--match-allocations``
diff --git a/doc/guides/prog_guide/env_abstraction_layer.rst b/doc/guides/prog_guide/env_abstraction_layer.rst
index fede7fe69d..b1eae592ab 100644
--- a/doc/guides/prog_guide/env_abstraction_layer.rst
+++ b/doc/guides/prog_guide/env_abstraction_layer.rst
@@ -282,6 +282,18 @@ to prevent data leaks from previous users of the same hugepage.
EAL ensures this behavior by removing existing backing files at startup
and by recreating them before opening for mapping (as a precaution).
+One exception is ``--huge-unlink=never`` mode.
+It is used to speed up EAL initialization, usually on application restart.
+Clearing memory constitutes more than 95% of hugepage mapping time.
+EAL can save this time by remapping existing backing files
+with all the data left in the mapped hugepages ("dirty" memory).
+Such segments are marked with ``RTE_MEMSEG_FLAG_DIRTY``.
+Memory allocator detects dirty segments and handles them accordingly,
+in particular, it clears memory requested with ``rte_zmalloc*()``.
+In this mode EAL also does not remove a backing file
+when all pages mapped from it are freed,
+because they are intended to be reusable at restart.
+
Anonymous mapping does not allow multi-process architecture,
but it is free of filename conflicts and leftover files on hugetlbfs.
It makes running as non-root easier,
diff --git a/doc/guides/rel_notes/release_22_03.rst b/doc/guides/rel_notes/release_22_03.rst
index 6d99d1eaa9..0b882362cf 100644
--- a/doc/guides/rel_notes/release_22_03.rst
+++ b/doc/guides/rel_notes/release_22_03.rst
@@ -55,6 +55,13 @@ New Features
Also, make sure to start the actual text at the margin.
=======================================================
+* **Added ability to reuse hugepages in Linux.**
+
+ It is possible to reuse files in hugetlbfs to speed up hugepage mapping,
+ which may be useful for fast restart and large allocations.
+ The new mode is activated with ``--huge-unlink=never``
+ and has security implications; refer to the user and programmer guides.
+
Removed Items
-------------
diff --git a/lib/eal/common/eal_common_options.c b/lib/eal/common/eal_common_options.c
index cdd2284b0c..45d393b393 100644
--- a/lib/eal/common/eal_common_options.c
+++ b/lib/eal/common/eal_common_options.c
@@ -74,7 +74,7 @@ eal_long_options[] = {
{OPT_FILE_PREFIX, 1, NULL, OPT_FILE_PREFIX_NUM },
{OPT_HELP, 0, NULL, OPT_HELP_NUM },
{OPT_HUGE_DIR, 1, NULL, OPT_HUGE_DIR_NUM },
- {OPT_HUGE_UNLINK, 0, NULL, OPT_HUGE_UNLINK_NUM },
+ {OPT_HUGE_UNLINK, 2, NULL, OPT_HUGE_UNLINK_NUM },
{OPT_IOVA_MODE, 1, NULL, OPT_IOVA_MODE_NUM },
{OPT_LCORES, 1, NULL, OPT_LCORES_NUM },
{OPT_LOG_LEVEL, 1, NULL, OPT_LOG_LEVEL_NUM },
@@ -1598,6 +1598,28 @@ available_cores(void)
return str;
}
+#define HUGE_UNLINK_NEVER "never"
+
+static int
+eal_parse_huge_unlink(const char *arg, struct hugepage_file_discipline *out)
+{
+ if (arg == NULL || strcmp(arg, "always") == 0) {
+ out->unlink_before_mapping = true;
+ return 0;
+ }
+ if (strcmp(arg, "existing") == 0) {
+ /* same as not specifying the option */
+ return 0;
+ }
+ if (strcmp(arg, HUGE_UNLINK_NEVER) == 0) {
+ RTE_LOG(WARNING, EAL, "Using --"OPT_HUGE_UNLINK"="
+ HUGE_UNLINK_NEVER" may create data leaks.\n");
+ out->unlink_existing = false;
+ return 0;
+ }
+ return -1;
+}
+
int
eal_parse_common_option(int opt, const char *optarg,
struct internal_config *conf)
@@ -1739,7 +1761,10 @@ eal_parse_common_option(int opt, const char *optarg,
/* long options */
case OPT_HUGE_UNLINK_NUM:
- conf->hugepage_file.unlink_before_mapping = true;
+ if (eal_parse_huge_unlink(optarg, &conf->hugepage_file) < 0) {
+ RTE_LOG(ERR, EAL, "invalid --"OPT_HUGE_UNLINK" option\n");
+ return -1;
+ }
break;
case OPT_NO_HUGE_NUM:
@@ -2070,6 +2095,12 @@ eal_check_common_options(struct internal_config *internal_cfg)
"not compatible with --"OPT_HUGE_UNLINK"\n");
return -1;
}
+ if (!internal_cfg->hugepage_file.unlink_existing &&
+ internal_cfg->in_memory) {
+ RTE_LOG(ERR, EAL, "Option --"OPT_IN_MEMORY" is not compatible "
+ "with --"OPT_HUGE_UNLINK"="HUGE_UNLINK_NEVER"\n");
+ return -1;
+ }
if (internal_cfg->legacy_mem &&
internal_cfg->in_memory) {
RTE_LOG(ERR, EAL, "Option --"OPT_LEGACY_MEM" is not compatible "
@@ -2202,7 +2233,9 @@ eal_common_usage(void)
" --"OPT_NO_TELEMETRY" Disable telemetry support\n"
" --"OPT_FORCE_MAX_SIMD_BITWIDTH" Force the max SIMD bitwidth\n"
"\nEAL options for DEBUG use only:\n"
- " --"OPT_HUGE_UNLINK" Unlink hugepage files after init\n"
+ " --"OPT_HUGE_UNLINK"[=existing|always|never]\n"
+ " When to unlink files in hugetlbfs\n"
+ " ('existing' by default, no value means 'always')\n"
" --"OPT_NO_HUGE" Use malloc instead of hugetlbfs\n"
" --"OPT_NO_PCI" Disable PCI\n"
" --"OPT_NO_HPET" Disable HPET\n"
--
2.25.1
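
For readers unfamiliar with the "dirty" memory idea mentioned in the
programmer guide hunk, here is a toy model of it in C. It is a sketch
under stated assumptions, not the DPDK allocator: struct toy_segment,
toy_malloc() and toy_zmalloc() are hypothetical names, the allocator is
a simple bump allocator with no free(), and a clean segment is assumed
to be already zero-filled by the kernel. The point it shows is that a
zeroed allocation pays for clearing only when the segment is dirty.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical model of a memory segment; not a DPDK structure. */
struct toy_segment {
	unsigned char *base; /* start of the mapped memory */
	size_t len;          /* total size in bytes */
	size_t used;         /* bump-allocation offset, always <= len */
	bool dirty;          /* true if reused without kernel clearing */
};

/* Plain allocation: contents are unspecified, so nothing is cleared. */
static void *
toy_malloc(struct toy_segment *seg, size_t size)
{
	void *p;

	if (size > seg->len - seg->used)
		return NULL;
	p = seg->base + seg->used;
	seg->used += size;
	return p;
}

/* Zeroed allocation: memset() is needed only on a dirty segment. */
static void *
toy_zmalloc(struct toy_segment *seg, size_t size)
{
	void *p = toy_malloc(seg, size);

	if (p != NULL && seg->dirty)
		memset(p, 0, size);
	return p;
}
```

This is the trade-off the documentation describes: startup avoids
clearing all hugepages, while zeroed allocations from reused memory
become more expensive later.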
Thread overview: 53+ messages
2021-12-30 14:37 [RFC PATCH 0/6] Fast restart with many hugepages Dmitry Kozlyuk
2021-12-30 14:37 ` [RFC PATCH 1/6] doc: add hugepage mapping details Dmitry Kozlyuk
2021-12-30 14:37 ` [RFC PATCH 2/6] mem: add dirty malloc element support Dmitry Kozlyuk
2021-12-30 14:37 ` [RFC PATCH 3/6] eal: refactor --huge-unlink storage Dmitry Kozlyuk
2021-12-30 14:37 ` [RFC PATCH 4/6] eal/linux: allow hugepage file reuse Dmitry Kozlyuk
2021-12-30 14:48 ` [RFC PATCH 5/6] eal: allow hugepage file reuse with --huge-unlink Dmitry Kozlyuk
2021-12-30 14:49 ` [RFC PATCH 6/6] app/test: add allocator performance benchmark Dmitry Kozlyuk
2022-01-17 8:07 ` [PATCH v1 0/6] Fast restart with many hugepages Dmitry Kozlyuk
2022-01-17 8:07 ` [PATCH v1 1/6] doc: add hugepage mapping details Dmitry Kozlyuk
2022-01-17 9:20 ` Thomas Monjalon
2022-01-17 8:07 ` [PATCH v1 2/6] app/test: add allocator performance benchmark Dmitry Kozlyuk
2022-01-17 15:47 ` Bruce Richardson
2022-01-17 15:51 ` Bruce Richardson
2022-01-19 21:12 ` Dmitry Kozlyuk
2022-01-20 9:04 ` Bruce Richardson
2022-01-17 16:06 ` Aaron Conole
2022-01-17 8:07 ` [PATCH v1 3/6] mem: add dirty malloc element support Dmitry Kozlyuk
2022-01-17 14:07 ` Thomas Monjalon
2022-01-17 8:07 ` [PATCH v1 4/6] eal: refactor --huge-unlink storage Dmitry Kozlyuk
2022-01-17 14:10 ` Thomas Monjalon
2022-01-17 8:14 ` [PATCH v1 5/6] eal/linux: allow hugepage file reuse Dmitry Kozlyuk
2022-01-17 14:24 ` Thomas Monjalon
2022-01-17 8:14 ` [PATCH v1 6/6] eal: extend --huge-unlink for " Dmitry Kozlyuk
2022-01-17 14:27 ` Thomas Monjalon
2022-01-17 16:40 ` [PATCH v1 0/6] Fast restart with many hugepages Bruce Richardson
2022-01-19 21:12 ` Dmitry Kozlyuk
2022-01-20 9:05 ` Bruce Richardson
2022-01-19 21:09 ` [PATCH v2 " Dmitry Kozlyuk
2022-01-19 21:09 ` [PATCH v2 1/6] doc: add hugepage mapping details Dmitry Kozlyuk
2022-01-27 13:59 ` Bruce Richardson
2022-01-19 21:09 ` [PATCH v2 2/6] app/test: add allocator performance benchmark Dmitry Kozlyuk
2022-01-19 21:09 ` [PATCH v2 3/6] mem: add dirty malloc element support Dmitry Kozlyuk
2022-01-19 21:09 ` [PATCH v2 4/6] eal: refactor --huge-unlink storage Dmitry Kozlyuk
2022-01-19 21:11 ` [PATCH v2 5/6] eal/linux: allow hugepage file reuse Dmitry Kozlyuk
2022-01-19 21:11 ` Dmitry Kozlyuk [this message]
2022-01-27 12:07 ` [PATCH v2 0/6] Fast restart with many hugepages Bruce Richardson
2022-02-02 14:12 ` Thomas Monjalon
2022-02-02 21:54 ` David Marchand
2022-02-03 10:26 ` David Marchand
2022-02-03 18:13 ` [PATCH v3 " Dmitry Kozlyuk
2022-02-03 18:13 ` [PATCH v3 1/6] doc: add hugepage mapping details Dmitry Kozlyuk
2022-02-08 15:28 ` Burakov, Anatoly
2022-02-03 18:13 ` [PATCH v3 2/6] app/test: add allocator performance benchmark Dmitry Kozlyuk
2022-02-08 16:20 ` Burakov, Anatoly
2022-02-03 18:13 ` [PATCH v3 3/6] mem: add dirty malloc element support Dmitry Kozlyuk
2022-02-08 16:36 ` Burakov, Anatoly
2022-02-03 18:13 ` [PATCH v3 4/6] eal: refactor --huge-unlink storage Dmitry Kozlyuk
2022-02-08 16:39 ` Burakov, Anatoly
2022-02-03 18:13 ` [PATCH v3 5/6] eal/linux: allow hugepage file reuse Dmitry Kozlyuk
2022-02-08 17:05 ` Burakov, Anatoly
2022-02-03 18:13 ` [PATCH v3 6/6] eal: extend --huge-unlink for " Dmitry Kozlyuk
2022-02-08 17:14 ` Burakov, Anatoly
2022-02-08 20:40 ` [PATCH v3 0/6] Fast restart with many hugepages David Marchand