From mboxrd@z Thu Jan 1 00:00:00 1970
From: Bruce Richardson
To: dev@dpdk.org
Cc: Bruce Richardson, Anatoly Burakov
Subject: [PATCH v3 2/2] eal: allow swapping of malloc heaps
Date: Mon, 18 Sep 2023 17:32:06 +0100
Message-Id: <20230918163206.1010611-3-bruce.richardson@intel.com>
X-Mailer: git-send-email 2.39.2
In-Reply-To: <20230918163206.1010611-1-bruce.richardson@intel.com>
References: <20230915122703.475834-1-bruce.richardson@intel.com> <20230918163206.1010611-1-bruce.richardson@intel.com>
MIME-Version: 1.0
Content-Transfer-Encoding: 8bit

The external memory functions in DPDK allow the addition of externally
allocated memory to malloc heaps, but with one major restriction - the
memory must be added to an application-created heap, not one of the
standard DPDK heaps for a NUMA node. This restriction makes it difficult -
if not impossible - to use externally allocated memory for DPDK by default.

However, even if the restriction were relaxed, so that external memory
could be added to e.g. the socket 0 heap, there would be no way to
guarantee that the external memory would be used in preference to the
standard DPDK hugepage memory for a given allocation. To give appropriately
defined behaviour, a better solution is to allow the application to
explicitly swap a pair of heaps.

With this one new API in place, the user can configure a new malloc heap,
add external memory to it, and then replace a standard socket heap with the
newly created one, thereby guaranteeing that future allocations come from
the external memory.

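To illustrate the mechanism described above, here is a minimal standalone
sketch. It is not DPDK code: struct toy_heap, TOY_MAX_HEAPS and the toy_*
functions are hypothetical stand-ins invented for this illustration, and
locking is left out. It only models the idea that a lookup by socket id
lands on a different heap once the two socket ids have been exchanged.

```c
#include <assert.h>
#include <stddef.h>

#define TOY_MAX_HEAPS 4 /* hypothetical stand-in for RTE_MAX_HEAPS */

/* Hypothetical, simplified heap: only the field the swap touches. */
struct toy_heap {
	int socket_id;       /* socket id this heap currently answers to */
	const char *backing; /* label for the memory behind the heap */
};

/* Return the index of the heap serving socket_id, or -1 if none does. */
static int
toy_socket_to_heap_id(const struct toy_heap *heaps, int socket_id)
{
	assert(heaps != NULL);
	for (int i = 0; i < TOY_MAX_HEAPS; i++)
		if (heaps[i].socket_id == socket_id)
			return i;
	return -1;
}

/* Exchange the socket ids of two heaps; the heaps themselves stay put. */
static int
toy_heap_swap_socket(struct toy_heap *heaps, int socket1, int socket2)
{
	const int h1 = toy_socket_to_heap_id(heaps, socket1);
	const int h2 = toy_socket_to_heap_id(heaps, socket2);

	if (h1 < 0 || h2 < 0)
		return -1;

	const int tmp = heaps[h1].socket_id;
	heaps[h1].socket_id = heaps[h2].socket_id;
	heaps[h2].socket_id = tmp;
	return 0;
}
```

In this model, if heaps[0] serves socket 0 and heaps[2] serves an arbitrary
external id such as 256, then after toy_heap_swap_socket(heaps, 0, 256) a
lookup for socket 0 returns index 2, i.e. the externally backed heap.
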
Signed-off-by: Bruce Richardson
---
 lib/eal/common/malloc_heap.c | 24 ++++++++++++++++++++++++
 lib/eal/include/rte_malloc.h | 34 ++++++++++++++++++++++++++++++++++
 lib/eal/version.map          |  2 ++
 3 files changed, 60 insertions(+)

diff --git a/lib/eal/common/malloc_heap.c b/lib/eal/common/malloc_heap.c
index 4fa38fcd44..eba75111ca 100644
--- a/lib/eal/common/malloc_heap.c
+++ b/lib/eal/common/malloc_heap.c
@@ -1320,6 +1320,30 @@ malloc_heap_add_external_memory(struct malloc_heap *heap,
 	return 0;
 }
 
+int
+rte_malloc_heap_swap_socket(int socket1, int socket2)
+{
+	const int h1 = malloc_socket_to_heap_id(socket1);
+	if (h1 < 0 || h1 >= RTE_MAX_HEAPS)
+		return -1;
+
+	const int h2 = malloc_socket_to_heap_id(socket2);
+	if (h2 < 0 || h2 >= RTE_MAX_HEAPS)
+		return -1;
+
+
+	rte_mcfg_mem_write_lock();
+	do {
+		struct rte_mem_config *mcfg = rte_eal_get_configuration()->mem_config;
+		int tmp = mcfg->malloc_heaps[h1].socket_id;
+		mcfg->malloc_heaps[h1].socket_id = mcfg->malloc_heaps[h2].socket_id;
+		mcfg->malloc_heaps[h2].socket_id = tmp;
+	} while (0);
+	rte_mcfg_mem_write_unlock();
+
+	return 0;
+}
+
 int
 malloc_heap_remove_external_memory(struct malloc_heap *heap, void *va_addr,
 		size_t len)
diff --git a/lib/eal/include/rte_malloc.h b/lib/eal/include/rte_malloc.h
index 54a8ac211e..df356a5efe 100644
--- a/lib/eal/include/rte_malloc.h
+++ b/lib/eal/include/rte_malloc.h
@@ -490,6 +490,40 @@ rte_malloc_heap_get_socket(const char *name);
 int
 rte_malloc_heap_socket_is_external(int socket_id);
 
+/**
+ * Swap the heaps for the given socket ids
+ *
+ * This causes the heaps for the given socket ids to be swapped, allowing
+ * external memory registered as a malloc heap to become the new default memory
+ * for a standard NUMA node. For example, to have allocations on socket 0 come
+ * from external memory, the following sequence of API calls can be used:
+ * @code
+ *   rte_malloc_heap_create()
+ *   rte_malloc_heap_memory_add(,....)
+ *   id = rte_malloc_heap_get_socket()
+ *   rte_malloc_heap_swap_socket(0, id)
+ * @endcode
+ *
+ * Following these calls, allocations from the memory originally assigned to
+ * socket 0 can be made by passing "id" as the socket_id parameter.
+ *
+ * @note: It is recommended that this function be called only after EAL
+ * initialization and before any temporary objects are created from the DPDK heaps.
+ * @note: Since objects allocated using rte_malloc and similar functions track
+ * their heaps via pointers, any already-allocated objects will be returned to
+ * their original heaps, even after a call to this function.
+ *
+ * @param socket1
+ *   The socket id of the first heap to swap
+ * @param socket2
+ *   The socket id of the second heap to swap
+ * @return
+ *   0 on success, -1 on error
+ */
+__rte_experimental
+int
+rte_malloc_heap_swap_socket(int socket1, int socket2);
+
 /**
  * Dump statistics.
  *
diff --git a/lib/eal/version.map b/lib/eal/version.map
index 7940431e5a..b06ee7219e 100644
--- a/lib/eal/version.map
+++ b/lib/eal/version.map
@@ -417,6 +417,8 @@ EXPERIMENTAL {
 	# added in 23.07
 	rte_memzone_max_get;
 	rte_memzone_max_set;
+
+	rte_malloc_heap_swap_socket;
 };
 
 INTERNAL {
-- 
2.39.2
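
The second @note in the header comment (already-allocated objects return to
their original heap) can be pictured with another small standalone sketch.
Again this is not DPDK code: struct mini_heap, struct mini_elem and the
mini_* functions are hypothetical names made up for the illustration. The
assumption being modelled is only what the note states, namely that each
object records a pointer to its owning heap at allocation time.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical heap: a socket id plus a count of live objects. */
struct mini_heap {
	int socket_id;
	int live_objects;
};

/* Hypothetical object header: remembers the heap it was taken from. */
struct mini_elem {
	struct mini_heap *heap;
};

/* "Allocate" an object from the given heap and record the owner. */
static void
mini_alloc(struct mini_heap *heap, struct mini_elem *elem)
{
	assert(heap != NULL && elem != NULL);
	heap->live_objects++;
	elem->heap = heap;
}

/* "Free" an object: it goes back to the heap it points at. */
static void
mini_free(struct mini_elem *elem)
{
	assert(elem != NULL && elem->heap != NULL);
	elem->heap->live_objects--;
	elem->heap = NULL;
}

/* Exchange only the socket ids; object-to-heap pointers are untouched. */
static void
mini_swap_socket(struct mini_heap *a, struct mini_heap *b)
{
	const int tmp = a->socket_id;
	a->socket_id = b->socket_id;
	b->socket_id = tmp;
}
```

An object taken from the hugepage heap before mini_swap_socket() still
decrements that same heap's counter when freed afterwards, because the swap
changes which socket id each heap answers to and nothing else.
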