From: Dmitry Kozlyuk
CC: Bruce Richardson, Thomas Monjalon, Anatoly Burakov
Subject: [PATCH v2 6/6] eal: extend --huge-unlink for hugepage file reuse
Date: Wed, 19 Jan 2022 23:11:44 +0200
Message-ID: <20220119211144.766098-2-dkozlyuk@nvidia.com>
In-Reply-To: <20220119211144.766098-1-dkozlyuk@nvidia.com>
References: <20220119210917.765505-1-dkozlyuk@nvidia.com> <20220119211144.766098-1-dkozlyuk@nvidia.com>
X-Mailer: git-send-email 2.25.1
List-Id: DPDK patches and discussions

Expose Linux EAL ability to reuse existing hugepage files
via --huge-unlink=never switch.
Default behavior is unchanged; it can also be requested explicitly
using --huge-unlink=existing for consistency.
The old --huge-unlink switch is kept
as an alias for --huge-unlink=always.
Add a test case for the --huge-unlink=never mode.

Signed-off-by: Dmitry Kozlyuk
Acked-by: Thomas Monjalon
---
 app/test/test_eal_flags.c                     | 25 ++++++++++++
 doc/guides/linux_gsg/linux_eal_parameters.rst | 24 ++++++++++--
 .../prog_guide/env_abstraction_layer.rst      | 12 ++++++
 doc/guides/rel_notes/release_22_03.rst        |  7 ++++
 lib/eal/common/eal_common_options.c           | 39 +++++++++++++++++--
 5 files changed, 100 insertions(+), 7 deletions(-)

diff --git a/app/test/test_eal_flags.c b/app/test/test_eal_flags.c
index d7f4c2cd47..e2696cda63 100644
--- a/app/test/test_eal_flags.c
+++ b/app/test/test_eal_flags.c
@@ -1122,6 +1122,11 @@ test_file_prefix(void)
 		DEFAULT_MEM_SIZE, "--single-file-segments",
 		"--file-prefix=" memtest1 };
 
+	/* primary process with memtest1 and --huge-unlink=never mode */
+	const char * const argv9[] = {prgname, "-m",
+		DEFAULT_MEM_SIZE, "--huge-unlink=never",
+		"--file-prefix=" memtest1 };
+
 	/* check if files for current prefix are present */
 	if (process_hugefiles(prefix, HUGEPAGE_CHECK_EXISTS) != 1) {
 		printf("Error - hugepage files for %s were not created!\n", prefix);
@@ -1290,6 +1295,26 @@ test_file_prefix(void)
 		return -1;
 	}
 
+	/* this process will run with --huge-unlink=never,
+	 * so it should not remove hugepage files when it exits
+	 */
+	if (launch_proc(argv9) != 0) {
+		printf("Error - failed to run with --huge-unlink=never\n");
+		return -1;
+	}
+
+	/* check if hugefiles for memtest1 are present */
+	if (process_hugefiles(memtest1, HUGEPAGE_CHECK_EXISTS) == 0) {
+		printf("Error - hugepage files for %s were deleted!\n",
+				memtest1);
+		return -1;
+	} else {
+		if (process_hugefiles(memtest1, HUGEPAGE_DELETE) != 1) {
+			printf("Error - deleting hugepages failed!\n");
+			return -1;
+		}
+	}
+
 	return 0;
 }
 
diff --git a/doc/guides/linux_gsg/linux_eal_parameters.rst b/doc/guides/linux_gsg/linux_eal_parameters.rst
index 74df2611b5..ea8f381391 100644
--- a/doc/guides/linux_gsg/linux_eal_parameters.rst
+++ b/doc/guides/linux_gsg/linux_eal_parameters.rst
@@ -84,10 +84,26 @@ Memory-related options
     Use specified hugetlbfs directory instead of autodetected ones. This can be
     a sub-directory within a hugetlbfs mountpoint.
 
-*   ``--huge-unlink``
-
-    Unlink hugepage files after creating them (implies no secondary process
-    support).
+*   ``--huge-unlink[=existing|always|never]``
+
+    No ``--huge-unlink`` option or ``--huge-unlink=existing`` is the default:
+    existing hugepage files are removed and re-created
+    to ensure the kernel clears the memory and prevents any data leaks.
+
+    With ``--huge-unlink`` (no value) or ``--huge-unlink=always``,
+    hugepage files are also removed before mapping them,
+    so that the application leaves no files in hugetlbfs.
+    This mode implies no multi-process support.
+
+    When ``--huge-unlink=never`` is specified, existing hugepage files
+    are never removed, but are remapped instead, allowing hugepage reuse.
+    This makes restart faster by saving time to clear memory at initialization,
+    but it may slow down zeroed allocations later.
+    Reused hugepages can contain data from previous processes that used them,
+    which may be a security concern.
+    Hugepage files created in this mode are also not removed
+    when all the hugepages mapped from them are freed,
+    which allows reusing these files after a restart.
 
 *   ``--match-allocations``
 
diff --git a/doc/guides/prog_guide/env_abstraction_layer.rst b/doc/guides/prog_guide/env_abstraction_layer.rst
index fede7fe69d..b1eae592ab 100644
--- a/doc/guides/prog_guide/env_abstraction_layer.rst
+++ b/doc/guides/prog_guide/env_abstraction_layer.rst
@@ -282,6 +282,18 @@ to prevent data leaks from previous users of the same hugepage.
 EAL ensures this behavior by removing existing backing files at startup
 and by recreating them before opening for mapping (as a precaution).
 
+One exception is ``--huge-unlink=never`` mode.
+It is used to speed up EAL initialization, usually on application restart.
+Clearing memory constitutes more than 95% of hugepage mapping time.
+EAL can save it by remapping existing backing files
+with all the data left in the mapped hugepages ("dirty" memory).
+Such segments are marked with ``RTE_MEMSEG_FLAG_DIRTY``.
+Memory allocator detects dirty segments and handles them accordingly,
+in particular, it clears memory requested with ``rte_zmalloc*()``.
+In this mode EAL also does not remove a backing file
+when all pages mapped from it are freed,
+because they are intended to be reusable at restart.
+
 Anonymous mapping does not allow multi-process architecture,
 but it is free of filename conflicts and leftover files on hugetlbfs.
 It makes running as non-root easier,
diff --git a/doc/guides/rel_notes/release_22_03.rst b/doc/guides/rel_notes/release_22_03.rst
index 6d99d1eaa9..0b882362cf 100644
--- a/doc/guides/rel_notes/release_22_03.rst
+++ b/doc/guides/rel_notes/release_22_03.rst
@@ -55,6 +55,13 @@ New Features
      Also, make sure to start the actual text at the margin.
      =======================================================
 
+* **Added ability to reuse hugepages in Linux.**
+
+  It is possible to reuse files in hugetlbfs to speed up hugepage mapping,
+  which may be useful for fast restart and large allocations.
+  The new mode is activated with ``--huge-unlink=never``
+  and has security implications, refer to the user and programmer guides.
+
 
 Removed Items
 -------------
diff --git a/lib/eal/common/eal_common_options.c b/lib/eal/common/eal_common_options.c
index cdd2284b0c..45d393b393 100644
--- a/lib/eal/common/eal_common_options.c
+++ b/lib/eal/common/eal_common_options.c
@@ -74,7 +74,7 @@ eal_long_options[] = {
 	{OPT_FILE_PREFIX, 1, NULL, OPT_FILE_PREFIX_NUM },
 	{OPT_HELP, 0, NULL, OPT_HELP_NUM },
 	{OPT_HUGE_DIR, 1, NULL, OPT_HUGE_DIR_NUM },
-	{OPT_HUGE_UNLINK, 0, NULL, OPT_HUGE_UNLINK_NUM },
+	{OPT_HUGE_UNLINK, 2, NULL, OPT_HUGE_UNLINK_NUM },
 	{OPT_IOVA_MODE, 1, NULL, OPT_IOVA_MODE_NUM },
 	{OPT_LCORES, 1, NULL, OPT_LCORES_NUM },
 	{OPT_LOG_LEVEL, 1, NULL, OPT_LOG_LEVEL_NUM },
@@ -1598,6 +1598,28 @@ available_cores(void)
 	return str;
 }
 
+#define HUGE_UNLINK_NEVER "never"
+
+static int
+eal_parse_huge_unlink(const char *arg, struct hugepage_file_discipline *out)
+{
+	if (arg == NULL || strcmp(arg, "always") == 0) {
+		out->unlink_before_mapping = true;
+		return 0;
+	}
+	if (strcmp(arg, "existing") == 0) {
+		/* same as not specifying the option */
+		return 0;
+	}
+	if (strcmp(arg, HUGE_UNLINK_NEVER) == 0) {
+		RTE_LOG(WARNING, EAL, "Using --"OPT_HUGE_UNLINK"="
+			HUGE_UNLINK_NEVER" may create data leaks.\n");
+		out->unlink_existing = false;
+		return 0;
+	}
+	return -1;
+}
+
 int
 eal_parse_common_option(int opt, const char *optarg,
 			struct internal_config *conf)
@@ -1739,7 +1761,10 @@ eal_parse_common_option(int opt, const char *optarg,
 
 	/* long options */
 	case OPT_HUGE_UNLINK_NUM:
-		conf->hugepage_file.unlink_before_mapping = true;
+		if (eal_parse_huge_unlink(optarg, &conf->hugepage_file) < 0) {
+			RTE_LOG(ERR, EAL, "invalid --"OPT_HUGE_UNLINK" option\n");
+			return -1;
+		}
 		break;
 
 	case OPT_NO_HUGE_NUM:
@@ -2070,6 +2095,12 @@ eal_check_common_options(struct internal_config *internal_cfg)
 			"not compatible with --"OPT_HUGE_UNLINK"\n");
 		return -1;
 	}
+	if (!internal_cfg->hugepage_file.unlink_existing &&
+			internal_cfg->in_memory) {
+		RTE_LOG(ERR, EAL, "Option --"OPT_IN_MEMORY" is not compatible "
+			"with --"OPT_HUGE_UNLINK"="HUGE_UNLINK_NEVER"\n");
+		return -1;
+	}
 	if (internal_cfg->legacy_mem &&
 			internal_cfg->in_memory) {
 		RTE_LOG(ERR, EAL, "Option --"OPT_LEGACY_MEM" is not compatible "
@@ -2202,7 +2233,9 @@ eal_common_usage(void)
 	       " --"OPT_NO_TELEMETRY" Disable telemetry support\n"
 	       " --"OPT_FORCE_MAX_SIMD_BITWIDTH" Force the max SIMD bitwidth\n"
 	       "\nEAL options for DEBUG use only:\n"
-	       " --"OPT_HUGE_UNLINK" Unlink hugepage files after init\n"
+	       " --"OPT_HUGE_UNLINK"[=existing|always|never]\n"
+	       " When to unlink files in hugetlbfs\n"
+	       " ('existing' by default, no value means 'always')\n"
 	       " --"OPT_NO_HUGE" Use malloc instead of hugetlbfs\n"
 	       " --"OPT_NO_PCI" Disable PCI\n"
 	       " --"OPT_NO_HPET" Disable HPET\n"
-- 
2.25.1
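
Note for readers of the option-parsing hunks: the long-option table entry changes has_arg from 0 to 2, which is getopt's optional_argument. With that setting optarg is NULL for a bare --huge-unlink, and a value is only recognized when attached with '='. The three modes can be sketched standalone roughly as below. This is a minimal illustration, not the EAL code: struct unlink_policy, default_policy, parse_huge_unlink() and policy_is_valid() are hypothetical names, and the default of unlink_existing = true is an assumption inferred from the patch, which only ever clears that field.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for EAL's struct hugepage_file_discipline;
 * only the two fields touched by the patch are modeled.
 */
struct unlink_policy {
	bool unlink_before_mapping; /* set by "always" or a bare option */
	bool unlink_existing;       /* cleared only by "never" */
};

/* Assumed defaults, equivalent to --huge-unlink=existing. */
static const struct unlink_policy default_policy = {
	.unlink_before_mapping = false,
	.unlink_existing = true,
};

/* arg plays the role of getopt's optarg: NULL when the option
 * carries no "=value". Returns 0 on success, -1 on an unknown value.
 */
static int
parse_huge_unlink(const char *arg, struct unlink_policy *out)
{
	assert(out != NULL);
	if (arg == NULL || strcmp(arg, "always") == 0) {
		out->unlink_before_mapping = true;
		return 0;
	}
	if (strcmp(arg, "existing") == 0)
		return 0; /* same as not passing the option */
	if (strcmp(arg, "never") == 0) {
		out->unlink_existing = false;
		return 0;
	}
	return -1;
}

/* Mirrors the added compatibility check: "never" mode is rejected
 * together with in-memory mode.
 */
static bool
policy_is_valid(const struct unlink_policy *p, bool in_memory)
{
	return !(in_memory && !p->unlink_existing);
}
```

Starting from default_policy, a NULL argument sets only unlink_before_mapping, "existing" changes nothing, and "never" clears unlink_existing, after which policy_is_valid() fails when in_memory is true.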