From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <dev-bounces@dpdk.org>
Received: from mails.dpdk.org (mails.dpdk.org [217.70.189.124])
	by inbox.dpdk.org (Postfix) with ESMTP id C62D1A034F;
	Mon, 17 Jan 2022 09:15:01 +0100 (CET)
Received: from [217.70.189.124] (localhost [127.0.0.1])
	by mails.dpdk.org (Postfix) with ESMTP id B88B441168;
	Mon, 17 Jan 2022 09:15:01 +0100 (CET)
Received: from NAM12-BN8-obe.outbound.protection.outlook.com
 (mail-bn8nam12on2057.outbound.protection.outlook.com [40.107.237.57])
 by mails.dpdk.org (Postfix) with ESMTP id DD37941168
 for <dev@dpdk.org>; Mon, 17 Jan 2022 09:15:00 +0100 (CET)
Received: from DM6PR10CA0011.namprd10.prod.outlook.com (2603:10b6:5:60::24) by
 DM5PR12MB1372.namprd12.prod.outlook.com (2603:10b6:3:77::7) with
 Microsoft
 SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id
 15.20.4888.11; Mon, 17 Jan 2022 08:14:59 +0000
Received: from DM6NAM11FT006.eop-nam11.prod.protection.outlook.com
 (2603:10b6:5:60:cafe::ad) by DM6PR10CA0011.outlook.office365.com
 (2603:10b6:5:60::24) with Microsoft SMTP Server (version=TLS1_2,
 cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id 15.20.4888.11 via Frontend
 Transport; Mon, 17 Jan 2022 08:14:59 +0000
Received: from mail.nvidia.com (12.22.5.236) by
 DM6NAM11FT006.mail.protection.outlook.com (10.13.173.104) with Microsoft SMTP
 Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_256_CBC_SHA384) id
 15.20.4888.9 via Frontend Transport; Mon, 17 Jan 2022 08:14:59 +0000
Received: from rnnvmail201.nvidia.com (10.129.68.8) by DRHQMAIL109.nvidia.com
 (10.27.9.19) with Microsoft SMTP Server (TLS) id 15.0.1497.18;
 Mon, 17 Jan 2022 08:14:58 +0000
Received: from nvidia.com (10.126.231.35) by rnnvmail201.nvidia.com
 (10.129.68.8) with Microsoft SMTP Server (version=TLS1_2,
 cipher=TLS_ECDHE_RSA_WITH_AES_256_CBC_SHA384) id 15.2.986.9; Mon, 17 Jan 2022
 00:14:57 -0800
From: Dmitry Kozlyuk <dkozlyuk@nvidia.com>
To: <dev@dpdk.org>
CC: Anatoly Burakov <anatoly.burakov@intel.com>
Subject: [PATCH v1 6/6] eal: extend --huge-unlink for hugepage file reuse
Date: Mon, 17 Jan 2022 10:14:40 +0200
Message-ID: <20220117081440.482410-1-dkozlyuk@nvidia.com>
X-Mailer: git-send-email 2.25.1
In-Reply-To: <20220117080801.481568-1-dkozlyuk@nvidia.com>
References: <20220117080801.481568-1-dkozlyuk@nvidia.com>
MIME-Version: 1.0
Content-Transfer-Encoding: 8bit
Content-Type: text/plain
X-BeenThere: dev@dpdk.org
X-Mailman-Version: 2.1.29
Precedence: list
List-Id: DPDK patches and discussions <dev.dpdk.org>
List-Unsubscribe: <https://mails.dpdk.org/options/dev>,
 <mailto:dev-request@dpdk.org?subject=unsubscribe>
List-Archive: <http://mails.dpdk.org/archives/dev/>
List-Post: <mailto:dev@dpdk.org>
List-Help: <mailto:dev-request@dpdk.org?subject=help>
List-Subscribe: <https://mails.dpdk.org/listinfo/dev>,
 <mailto:dev-request@dpdk.org?subject=subscribe>
Errors-To: dev-bounces@dpdk.org

Expose the Linux EAL ability to reuse existing hugepage files
via the --huge-unlink=never switch.
The default behavior is unchanged; it can also be requested explicitly
with --huge-unlink=existing for consistency.
The old --huge-unlink switch is kept
as an alias for --huge-unlink=always.

Signed-off-by: Dmitry Kozlyuk <dkozlyuk@nvidia.com>
---
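Note on the option table change below: switching has_arg from 0 to 2 (optional_argument) means getopt_long() sets optarg to NULL when no value is given, and takes a value only in the attached --huge-unlink=VALUE form. The standalone C sketch below models this. struct discipline and parse_args() are hypothetical stand-ins, not EAL code. parse_huge_unlink() uses the same decision table as eal_parse_huge_unlink() in this patch.

```c
#include <assert.h>
#include <getopt.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical stand-in for struct hugepage_file_discipline:
 * only the two fields this patch touches are modeled. */
struct discipline {
	bool unlink_before_mapping;
	bool keep_existing;
};

/* Same decision table as eal_parse_huge_unlink() in this patch. */
static int
parse_huge_unlink(const char *arg, struct discipline *out)
{
	if (arg == NULL || strcmp(arg, "always") == 0) {
		out->unlink_before_mapping = true; /* bare --huge-unlink */
		return 0;
	}
	if (strcmp(arg, "existing") == 0)
		return 0; /* same as not specifying the option */
	if (strcmp(arg, "never") == 0) {
		out->keep_existing = true;
		return 0;
	}
	return -1; /* unknown value */
}

/* Hypothetical driver: scan argv for --huge-unlink[=VALUE]. */
static int
parse_args(int argc, char **argv, struct discipline *out)
{
	static const struct option opts[] = {
		/* has_arg = optional_argument (2), as in the patched table */
		{"huge-unlink", optional_argument, NULL, 'u'},
		{NULL, 0, NULL, 0},
	};
	int opt;

	optind = 0; /* glibc and musl: 0 forces a full rescan of a new argv */
	while ((opt = getopt_long(argc, argv, "", opts, NULL)) != -1) {
		/* optarg is NULL unless the value is attached with '=' */
		if (opt != 'u' || parse_huge_unlink(optarg, out) < 0)
			return -1;
	}
	return 0;
}
```

In this sketch, "--huge-unlink never" (with a space) is parsed as a bare --huge-unlink, which means 'always'. The word "never" is left behind as a stray non-option argument, so the value must be written with '='.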
 doc/guides/linux_gsg/linux_eal_parameters.rst | 21 ++++++++--
 .../prog_guide/env_abstraction_layer.rst      |  9 +++++
 doc/guides/rel_notes/release_22_03.rst        |  7 ++++
 lib/eal/common/eal_common_options.c           | 39 +++++++++++++++++--
 4 files changed, 69 insertions(+), 7 deletions(-)

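The linux_eal_parameters.rst and env_abstraction_layer.rst hunks below say that --huge-unlink=never trades initialization time for slower zeroed allocations later. The toy C model below shows where that cost moves. struct toy_seg and toy_zalloc() are hypothetical and are not the DPDK malloc implementation. Memory known to be clean is handed out as is, while memory marked dirty must be cleared when a zeroed allocation is made.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Toy model (hypothetical, not DPDK's allocator): one memory region
 * and a flag telling whether it may hold a previous user's data. */
struct toy_seg {
	unsigned char *mem;
	size_t len;
	bool dirty; /* plays the role of RTE_MEMSEG_FLAG_DIRTY */
};

/* Hand out the first len bytes of seg as zeroed memory.
 * Assumption: clean memory is already zero (the kernel zeroes
 * freshly created hugepages), so only dirty memory needs a memset.
 * *bytes_cleared accumulates that deferred clearing cost. */
static void *
toy_zalloc(struct toy_seg *seg, size_t len, size_t *bytes_cleared)
{
	if (len > seg->len)
		return NULL;
	if (seg->dirty) {
		memset(seg->mem, 0, len);
		*bytes_cleared += len;
	}
	return seg->mem;
}
```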
diff --git a/doc/guides/linux_gsg/linux_eal_parameters.rst b/doc/guides/linux_gsg/linux_eal_parameters.rst
index 74df2611b5..7586f15ce3 100644
--- a/doc/guides/linux_gsg/linux_eal_parameters.rst
+++ b/doc/guides/linux_gsg/linux_eal_parameters.rst
@@ -84,10 +84,23 @@ Memory-related options
     Use specified hugetlbfs directory instead of autodetected ones. This can be
     a sub-directory within a hugetlbfs mountpoint.
 
-*   ``--huge-unlink``
-
-    Unlink hugepage files after creating them (implies no secondary process
-    support).
+*   ``--huge-unlink[=existing|always|never]``
+
+    Omitting ``--huge-unlink`` or passing ``--huge-unlink=existing`` selects the default:
+    existing hugepage files are removed and re-created
+    so that the kernel clears the memory and no data leaks from previous users.
+
+    With ``--huge-unlink`` (no value) or ``--huge-unlink=always``,
+    hugepage files are also removed after creating them,
+    so that the application leaves no files in hugetlbfs.
+    This mode implies no multi-process support.
+
+    When ``--huge-unlink=never`` is specified, existing hugepage files
+    are removed neither before nor after mapping them.
+    This makes restart faster by skipping memory clearing at initialization,
+    but it may slow down zeroed allocations later.
+    Reused hugepages can contain data from previous processes that used them,
+    which may be a security concern.
 
 *   ``--match-allocations``
 
diff --git a/doc/guides/prog_guide/env_abstraction_layer.rst b/doc/guides/prog_guide/env_abstraction_layer.rst
index bfe4594bf1..c7dc4a0e6a 100644
--- a/doc/guides/prog_guide/env_abstraction_layer.rst
+++ b/doc/guides/prog_guide/env_abstraction_layer.rst
@@ -277,6 +277,15 @@ to prevent data leaks from previous users of the same hugepage.
 EAL ensures this behavior by removing existing backing files at startup
 and by recreating them before opening for mapping (as a precaution).
 
+One exception is the ``--huge-unlink=never`` mode.
+It is used to speed up EAL initialization, usually on application restart.
+Clearing memory constitutes more than 95% of hugepage mapping time.
+EAL can save this time by remapping existing backing files
+with all the data left in the mapped hugepages ("dirty" memory).
+Such segments are marked with ``RTE_MEMSEG_FLAG_DIRTY``.
+The memory allocator detects dirty segments and handles them accordingly;
+in particular, it clears memory requested with ``rte_zmalloc*()``.
+
 Anonymous mapping does not allow multi-process architecture,
 but it is free of filename conflicts and leftover files on hugetlbfs.
 If memfd_create(2) is supported both at build and run time,
diff --git a/doc/guides/rel_notes/release_22_03.rst b/doc/guides/rel_notes/release_22_03.rst
index 6d99d1eaa9..0b882362cf 100644
--- a/doc/guides/rel_notes/release_22_03.rst
+++ b/doc/guides/rel_notes/release_22_03.rst
@@ -55,6 +55,13 @@ New Features
      Also, make sure to start the actual text at the margin.
      =======================================================
 
+* **Added ability to reuse hugepages in Linux.**
+
+  It is possible to reuse files in hugetlbfs to speed up hugepage mapping,
+  which may be useful for fast restart and large allocations.
+  The new mode is activated with ``--huge-unlink=never``
+  and has security implications; refer to the user and programmer guides.
+
 
 Removed Items
 -------------
diff --git a/lib/eal/common/eal_common_options.c b/lib/eal/common/eal_common_options.c
index 7520ebda8e..905a7769bd 100644
--- a/lib/eal/common/eal_common_options.c
+++ b/lib/eal/common/eal_common_options.c
@@ -74,7 +74,7 @@ eal_long_options[] = {
 	{OPT_FILE_PREFIX,       1, NULL, OPT_FILE_PREFIX_NUM      },
 	{OPT_HELP,              0, NULL, OPT_HELP_NUM             },
 	{OPT_HUGE_DIR,          1, NULL, OPT_HUGE_DIR_NUM         },
-	{OPT_HUGE_UNLINK,       0, NULL, OPT_HUGE_UNLINK_NUM      },
+	{OPT_HUGE_UNLINK,       2, NULL, OPT_HUGE_UNLINK_NUM      },
 	{OPT_IOVA_MODE,	        1, NULL, OPT_IOVA_MODE_NUM        },
 	{OPT_LCORES,            1, NULL, OPT_LCORES_NUM           },
 	{OPT_LOG_LEVEL,         1, NULL, OPT_LOG_LEVEL_NUM        },
@@ -1596,6 +1596,28 @@ available_cores(void)
 	return str;
 }
 
+#define HUGE_UNLINK_NEVER "never"
+
+static int
+eal_parse_huge_unlink(const char *arg, struct hugepage_file_discipline *out)
+{
+	if (arg == NULL || strcmp(arg, "always") == 0) {
+		out->unlink_before_mapping = true;
+		return 0;
+	}
+	if (strcmp(arg, "existing") == 0) {
+		/* same as not specifying the option */
+		return 0;
+	}
+	if (strcmp(arg, HUGE_UNLINK_NEVER) == 0) {
+		RTE_LOG(WARNING, EAL, "Using --"OPT_HUGE_UNLINK"="
+			HUGE_UNLINK_NEVER" may create data leaks.\n");
+		out->keep_existing = true;
+		return 0;
+	}
+	return -1;
+}
+
 int
 eal_parse_common_option(int opt, const char *optarg,
 			struct internal_config *conf)
@@ -1737,7 +1759,10 @@ eal_parse_common_option(int opt, const char *optarg,
 
 	/* long options */
 	case OPT_HUGE_UNLINK_NUM:
-		conf->hugepage_file.unlink_before_mapping = true;
+		if (eal_parse_huge_unlink(optarg, &conf->hugepage_file) < 0) {
+			RTE_LOG(ERR, EAL, "invalid --"OPT_HUGE_UNLINK" option\n");
+			return -1;
+		}
 		break;
 
 	case OPT_NO_HUGE_NUM:
@@ -2068,6 +2093,12 @@ eal_check_common_options(struct internal_config *internal_cfg)
 			"not compatible with --"OPT_HUGE_UNLINK"\n");
 		return -1;
 	}
+	if (internal_cfg->hugepage_file.keep_existing &&
+			internal_cfg->in_memory) {
+		RTE_LOG(ERR, EAL, "Option --"OPT_IN_MEMORY" is not compatible "
+			"with --"OPT_HUGE_UNLINK"="HUGE_UNLINK_NEVER"\n");
+		return -1;
+	}
 	if (internal_cfg->legacy_mem &&
 			internal_cfg->in_memory) {
 		RTE_LOG(ERR, EAL, "Option --"OPT_LEGACY_MEM" is not compatible "
@@ -2200,7 +2231,9 @@ eal_common_usage(void)
 	       "  --"OPT_NO_TELEMETRY"   Disable telemetry support\n"
 	       "  --"OPT_FORCE_MAX_SIMD_BITWIDTH" Force the max SIMD bitwidth\n"
 	       "\nEAL options for DEBUG use only:\n"
-	       "  --"OPT_HUGE_UNLINK"       Unlink hugepage files after init\n"
+	       "  --"OPT_HUGE_UNLINK"[=existing|always|never]\n"
+	       "                      When to unlink files in hugetlbfs\n"
+	       "                      ('existing' by default, no value means 'always')\n"
 	       "  --"OPT_NO_HUGE"           Use malloc instead of hugetlbfs\n"
 	       "  --"OPT_NO_PCI"            Disable PCI\n"
 	       "  --"OPT_NO_HPET"           Disable HPET\n"
-- 
2.25.1