From mboxrd@z Thu Jan 1 00:00:00 1970
From: Dmitry Kozlyuk
To:
CC: Bruce Richardson, Anatoly Burakov
Subject: [PATCH v2 1/6] doc: add hugepage mapping details
Date: Wed, 19 Jan 2022 23:09:12 +0200
Message-ID: <20220119210917.765505-2-dkozlyuk@nvidia.com>
X-Mailer: git-send-email 2.25.1
In-Reply-To: <20220119210917.765505-1-dkozlyuk@nvidia.com>
References: <20220117080801.481568-1-dkozlyuk@nvidia.com> <20220119210917.765505-1-dkozlyuk@nvidia.com>
MIME-Version: 1.0
Content-Transfer-Encoding: 8bit
Content-Type: text/plain
List-Id: DPDK patches and discussions
Errors-To: dev-bounces@dpdk.org

Hugepage mapping is a layer that EAL malloc builds upon.
There were implicit references to its details,
like mentions of segment file descriptors,
but no explicit description of its modes and operation.
Add an overview of mechanics used on each supported OS.

Convert memory management subsections from list items
to level 4 headers: they are big and important enough.

Signed-off-by: Dmitry Kozlyuk
---
 .../prog_guide/env_abstraction_layer.rst | 95 +++++++++++++++++--
 1 file changed, 86 insertions(+), 9 deletions(-)

diff --git a/doc/guides/prog_guide/env_abstraction_layer.rst b/doc/guides/prog_guide/env_abstraction_layer.rst
index c6accce701..fede7fe69d 100644
--- a/doc/guides/prog_guide/env_abstraction_layer.rst
+++ b/doc/guides/prog_guide/env_abstraction_layer.rst
@@ -86,7 +86,7 @@ See chapter
 Memory Mapping Discovery and Memory Reservation
 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
 
-The allocation of large contiguous physical memory is done using the hugetlbfs kernel filesystem.
+The allocation of large contiguous physical memory is done using hugepages.
 The EAL provides an API to reserve named memory zones in this contiguous memory.
 The physical address of the reserved memory for that memory zone is also returned to the user by the memory zone reservation API.
 
@@ -95,11 +95,13 @@ and legacy mode. Both modes are explained below.
 
 .. note::
 
-    Memory reservations done using the APIs provided by rte_malloc are also backed by pages from the hugetlbfs filesystem.
+    Memory reservations done using the APIs provided by rte_malloc
+    are also backed by hugepages unless ``--no-huge`` option is given.
 
-+ Dynamic memory mode
+Dynamic Memory Mode
+^^^^^^^^^^^^^^^^^^^
 
-Currently, this mode is only supported on Linux.
+Currently, this mode is only supported on Linux and Windows.
 In this mode, usage of hugepages by DPDK application will grow and shrink based on application's requests. Any memory allocation through ``rte_malloc()``,
@@ -155,7 +157,8 @@ of memory that can be used by DPDK application.
 :ref:`Multi-process Support ` for more details about DPDK IPC.
 
-+ Legacy memory mode
+Legacy Memory Mode
+^^^^^^^^^^^^^^^^^^
 
 This mode is enabled by specifying ``--legacy-mem`` command-line switch to the EAL.
 This switch will have no effect on FreeBSD as FreeBSD only supports
@@ -168,7 +171,8 @@ not allow acquiring or releasing hugepages from the system at runtime.
 If neither ``-m`` nor ``--socket-mem`` were specified, the entire available hugepage memory will be preallocated.
 
-+ Hugepage allocation matching
+Hugepage Allocation Matching
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^
 
 This behavior is enabled by specifying the ``--match-allocations`` command-line switch to the EAL. This switch is Linux-only and not supported with
@@ -182,7 +186,8 @@ matching can be used by these types of applications to satisfy both of these
 requirements. This can result in some increased memory usage which is very dependent on the memory allocation patterns of the application.
 
-+ 32-bit support
+32-bit Support
+^^^^^^^^^^^^^^
 
 Additional restrictions are present when running in 32-bit mode. In dynamic memory mode, by default maximum of 2 gigabytes of VA space will be preallocated,
@@ -192,7 +197,8 @@ used. In legacy mode, VA space will only be preallocated for segments that were
 requested (plus padding, to keep IOVA-contiguousness).
 
-+ Maximum amount of memory
+Maximum Amount of Memory
+^^^^^^^^^^^^^^^^^^^^^^^^
 
 All possible virtual memory space that can ever be used for hugepage mapping in a DPDK process is preallocated at startup, thereby placing an upper limit on how
@@ -222,7 +228,77 @@ Normally, these options do not need to be changed.
 can later be mapped into that preallocated VA space (if dynamic memory mode is enabled), and can optionally be mapped into it at startup.
 
-+ Segment file descriptors
+Hugepage Mapping
+^^^^^^^^^^^^^^^^
+
+Below is an overview of methods used for each OS to obtain hugepages,
+explaining why certain limitations and options exist in EAL.
+See the user guide for a specific OS for configuration details.
+
+FreeBSD uses ``contigmem`` kernel module
+to reserve a fixed number of hugepages at system start,
+which are mapped by EAL at initialization using a specific ``sysctl()``.
+
+Windows EAL allocates hugepages from the OS as needed using Win32 API,
+so available amount depends on the system load.
+It uses ``virt2phys`` kernel module to obtain physical addresses,
+unless running in IOVA-as-VA mode (e.g. forced with ``--iova-mode=va``).
+
+Linux implements a variety of methods:
+
+* mapping each hugepage from its own file in hugetlbfs;
+* mapping multiple hugepages from a shared file in hugetlbfs;
+* anonymous mapping.
+
+Mapping hugepages from files in hugetlbfs is essential for multi-process,
+because secondary processes need to map the same hugepages.
+EAL creates files like ``rtemap_0``
+in directories specified with ``--huge-dir`` option
+(or in the mount point for a specific hugepage size).
+The ``rte`` prefix can be changed using ``--file-prefix``.
+This may be needed for running multiple primary processes
+that share a hugetlbfs mount point.
+Each backing file by default corresponds to one hugepage,
+it is opened and locked for the entire time the hugepage is used.
+This may exhaust the number of open files limit (``NOFILE``).
+See :ref:`segment-file-descriptors` section
+on how the number of open backing file descriptors can be reduced.
+
+In dynamic memory mode, EAL removes a backing hugepage file
+when all pages mapped from it are freed back to the system.
+However, backing files may persist after the application terminates
+in case of a crash or a leak of DPDK memory (e.g. ``rte_free()`` is missing).
+This reduces the number of hugepages available to other processes
+as reported by ``/sys/kernel/mm/hugepages/hugepages-*/free_hugepages``.
+EAL can remove the backing files after opening them for mapping
+if ``--huge-unlink`` is given to avoid polluting hugetlbfs.
+However, since it disables multi-process anyway,
+using anonymous mapping (``--in-memory``) is recommended instead.
+
+:ref:`EAL memory allocator <malloc>` relies on hugepages being zero-filled.
+Hugepages are cleared by the kernel when a file in hugetlbfs or its part
+is mapped for the first time system-wide
+to prevent data leaks from previous users of the same hugepage.
+EAL ensures this behavior by removing existing backing files at startup
+and by recreating them before opening for mapping (as a precaution).
+
+Anonymous mapping does not allow multi-process architecture,
+but it is free of filename conflicts and leftover files on hugetlbfs.
+It makes running as non-root easier,
+because memory management does not require root permissions in this case
+(the limit of locked memory amount, ``MEMLOCK``, still applies).
+If memfd_create(2) is supported both at build and run time,
+DPDK memory manager can provide file descriptors for memory segments,
+which are required for VirtIO with vhost-user backend.
+This can exhaust the number of open files limit (``NOFILE``)
+despite not creating any files in hugetlbfs.
+See :ref:`segment-file-descriptors` section
+on how the number of open file descriptors used by EAL can be reduced.
+
+.. _segment-file-descriptors:
+
+Segment File Descriptors
+^^^^^^^^^^^^^^^^^^^^^^^^
 
 On Linux, in most cases, EAL will store segment file descriptors in EAL. This can become a problem when using smaller page sizes due to underlying limitations
@@ -731,6 +807,7 @@ We expect only 50% of CPU spend on packet IO.
     echo 100000 > pkt_io/cpu.cfs_period_us
     echo 50000 > pkt_io/cpu.cfs_quota_us
 
+.. _malloc:
 
 Malloc
 ------
-- 
2.25.1
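
Note on the ``NOFILE`` remark in the Hugepage Mapping text of this patch:
with one backing file per hugepage, the number of descriptors held open
grows with the amount of mapped memory. The sketch below is not DPDK code.
It is a hypothetical Python helper (the function names, the ``reserved_fds``
headroom and its default of 64 are assumptions made for illustration only)
that compares the number of per-hugepage backing files with the soft
``RLIMIT_NOFILE`` of the current process.

```python
import resource


def backing_files_needed(mem_bytes: int, hugepage_bytes: int) -> int:
    """Files needed when each hugepage is mapped from its own file."""
    if mem_bytes < 0 or hugepage_bytes <= 0:
        raise ValueError("memory must be >= 0 and hugepage size must be > 0")
    # Ceiling division: a partially used hugepage still needs a whole file.
    return -(-mem_bytes // hugepage_bytes)


def fits_nofile(mem_bytes: int, hugepage_bytes: int,
                reserved_fds: int = 64, soft_limit=None) -> bool:
    """Check the estimate against the soft open-files limit.

    reserved_fds is an assumed headroom for descriptors the process
    needs for other purposes; it is not a value taken from DPDK.
    """
    if soft_limit is None:
        soft_limit, _hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    if soft_limit == resource.RLIM_INFINITY:
        return True
    needed = backing_files_needed(mem_bytes, hugepage_bytes) + reserved_fds
    return needed <= soft_limit


if __name__ == "__main__":
    one_gib = 1 << 30
    two_mib = 2 << 20
    print("backing files for 1 GiB of 2 MiB pages:",
          backing_files_needed(one_gib, two_mib))
    print("fits under the current soft limit:",
          fits_nofile(one_gib, two_mib))
```

For example, 1 GiB of 2 MiB hugepages needs 512 backing files, which fits
under a soft limit of 1024 with the assumed headroom, while 4 GiB needs 2048
and does not. Larger hugepages reduce the count proportionally.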