From mboxrd@z Thu Jan 1 00:00:00 1970
From: Dmitry Kozlyuk
To:
CC: Anatoly Burakov
Subject: [RFC PATCH 2/6] mem: add dirty malloc element support
Date: Thu, 30 Dec 2021 16:37:40 +0200
Message-ID: <20211230143744.3550098-3-dkozlyuk@nvidia.com>
X-Mailer: git-send-email 2.25.1
In-Reply-To: <20211230143744.3550098-1-dkozlyuk@nvidia.com>
References: <20211230143744.3550098-1-dkozlyuk@nvidia.com>
MIME-Version: 1.0
Content-Transfer-Encoding: 8bit
Content-Type: text/plain
X-BeenThere: dev@dpdk.org
List-Id: DPDK patches and discussions

The EAL malloc layer assumed that the content of all free elements
is filled with zeros ("clean"), as opposed to uninitialized ("dirty").
This assumption was ensured in two ways:
1. The EAL memalloc layer always returned clean memory.
2. Freed memory was cleared before being returned to the heap.

Clearing the memory can be as slow as around 14 GiB/s.
To avoid this cost, the memalloc layer is allowed to return dirty memory.
Such segments are marked with RTE_MEMSEG_FLAG_DIRTY.
The allocator tracks elements that contain dirty memory
using a new flag in the element header.
When clean memory is requested via rte_zmalloc*()
and the suitable element is dirty, it is cleared on allocation.
When memory is deallocated, the freed element is joined
with adjacent free elements, and the dirty flag is updated:

    dirty + freed + dirty = dirty  =>  no need to clean
            freed + dirty = dirty      the freed memory

    clean + freed + clean = clean  =>  freed memory
    clean + freed         = clean      must be cleared
            freed + clean = clean
            freed         = clean

As a result, memory is either cleared on free, as before,
or it will be cleared on allocation if need be, but never twice.

Signed-off-by: Dmitry Kozlyuk
---
 lib/eal/common/malloc_elem.c | 22 +++++++++++++++++++---
 lib/eal/common/malloc_elem.h | 11 +++++++++--
 lib/eal/common/malloc_heap.c | 18 ++++++++++++------
 lib/eal/common/rte_malloc.c  | 21 ++++++++++++++-------
 lib/eal/include/rte_memory.h |  8 ++++++--
 5 files changed, 60 insertions(+), 20 deletions(-)

diff --git a/lib/eal/common/malloc_elem.c b/lib/eal/common/malloc_elem.c
index bdd20a162e..e04e0890fb 100644
--- a/lib/eal/common/malloc_elem.c
+++ b/lib/eal/common/malloc_elem.c
@@ -129,7 +129,7 @@ malloc_elem_find_max_iova_contig(struct malloc_elem *elem, size_t align)
 void
 malloc_elem_init(struct malloc_elem *elem, struct malloc_heap *heap,
 		struct rte_memseg_list *msl, size_t size,
-		struct malloc_elem *orig_elem, size_t orig_size)
+		struct malloc_elem *orig_elem, size_t orig_size, bool dirty)
 {
 	elem->heap = heap;
 	elem->msl = msl;
@@ -137,6 +137,7 @@ malloc_elem_init(struct malloc_elem *elem, struct malloc_heap *heap,
 	elem->next = NULL;
 	memset(&elem->free_list, 0, sizeof(elem->free_list));
 	elem->state = ELEM_FREE;
+	elem->dirty = dirty;
 	elem->size = size;
 	elem->pad = 0;
 	elem->orig_elem = orig_elem;
@@ -300,7 +301,7 @@ split_elem(struct malloc_elem *elem, struct malloc_elem *split_pt)
 	const size_t new_elem_size = elem->size - old_elem_size;

 	malloc_elem_init(split_pt, elem->heap, elem->msl, new_elem_size,
-			 elem->orig_elem, elem->orig_size);
+			 elem->orig_elem, elem->orig_size, elem->dirty);
 	split_pt->prev = elem;
 	split_pt->next = next_elem;
 	if (next_elem)
@@ -506,6 +507,7 @@ join_elem(struct malloc_elem *elem1, struct malloc_elem *elem2)
 	else
 		elem1->heap->last = elem1;
 	elem1->next = next;
+	elem1->dirty |= elem2->dirty;
 	if (elem1->pad) {
 		struct malloc_elem *inner = RTE_PTR_ADD(elem1, elem1->pad);
 		inner->size = elem1->size - elem1->pad;
@@ -579,6 +581,14 @@ malloc_elem_free(struct malloc_elem *elem)
 	ptr = RTE_PTR_ADD(elem, MALLOC_ELEM_HEADER_LEN);
 	data_len = elem->size - MALLOC_ELEM_OVERHEAD;

+	/*
+	 * Consider the element clean for the purposes of joining.
+	 * If both neighbors are clean or non-existent,
+	 * the joint element will be clean,
+	 * which means the memory should be cleared.
+	 * There is no need to clear the memory if the joint element is dirty.
+	 */
+	elem->dirty = false;
 	elem = malloc_elem_join_adjacent_free(elem);

 	malloc_elem_free_list_insert(elem);
@@ -588,8 +598,14 @@ malloc_elem_free(struct malloc_elem *elem)
 	/* decrease heap's count of allocated elements */
 	elem->heap->alloc_count--;

-	/* poison memory */
+#ifndef RTE_MALLOC_DEBUG
+	/* Normally clear the memory when needed. */
+	if (!elem->dirty)
+		memset(ptr, 0, data_len);
+#else
+	/* Always poison the memory in debug mode. */
 	memset(ptr, MALLOC_POISON, data_len);
+#endif

 	return elem;
 }
diff --git a/lib/eal/common/malloc_elem.h b/lib/eal/common/malloc_elem.h
index 15d8ba7af2..f2aa98821b 100644
--- a/lib/eal/common/malloc_elem.h
+++ b/lib/eal/common/malloc_elem.h
@@ -27,7 +27,13 @@ struct malloc_elem {
 	LIST_ENTRY(malloc_elem) free_list;
 	/**< list of free elements in heap */
 	struct rte_memseg_list *msl;
-	volatile enum elem_state state;
+	/** Element state, @c dirty and @c pad validity depends on it. */
+	/* An extra bit is needed to represent enum elem_state as signed int. */
+	enum elem_state state : 3;
+	/** If state == ELEM_FREE: the memory is not filled with zeroes. */
+	uint32_t dirty : 1;
+	/** Reserved for future use. */
+	uint32_t reserved : 28;
 	uint32_t pad;
 	size_t size;
 	struct malloc_elem *orig_elem;
@@ -320,7 +326,8 @@ malloc_elem_init(struct malloc_elem *elem,
 		struct rte_memseg_list *msl,
 		size_t size,
 		struct malloc_elem *orig_elem,
-		size_t orig_size);
+		size_t orig_size,
+		bool dirty);

 void
 malloc_elem_insert(struct malloc_elem *elem);
diff --git a/lib/eal/common/malloc_heap.c b/lib/eal/common/malloc_heap.c
index 55aad2711b..24080fc473 100644
--- a/lib/eal/common/malloc_heap.c
+++ b/lib/eal/common/malloc_heap.c
@@ -93,11 +93,11 @@ malloc_socket_to_heap_id(unsigned int socket_id)
  */
 static struct malloc_elem *
 malloc_heap_add_memory(struct malloc_heap *heap, struct rte_memseg_list *msl,
-		void *start, size_t len)
+		void *start, size_t len, bool dirty)
 {
 	struct malloc_elem *elem = start;

-	malloc_elem_init(elem, heap, msl, len, elem, len);
+	malloc_elem_init(elem, heap, msl, len, elem, len, dirty);

 	malloc_elem_insert(elem);

@@ -135,7 +135,8 @@ malloc_add_seg(const struct rte_memseg_list *msl,

 	found_msl = &mcfg->memsegs[msl_idx];

-	malloc_heap_add_memory(heap, found_msl, ms->addr, len);
+	malloc_heap_add_memory(heap, found_msl, ms->addr, len,
+			ms->flags & RTE_MEMSEG_FLAG_DIRTY);

 	heap->total_size += len;

@@ -303,7 +304,8 @@ alloc_pages_on_heap(struct malloc_heap *heap, uint64_t pg_sz, size_t elt_size,
 	struct rte_memseg_list *msl;
 	struct malloc_elem *elem = NULL;
 	size_t alloc_sz;
-	int allocd_pages;
+	int allocd_pages, i;
+	bool dirty = false;
 	void *ret, *map_addr;

 	alloc_sz = (size_t)pg_sz * n_segs;
@@ -372,8 +374,12 @@ alloc_pages_on_heap(struct malloc_heap *heap, uint64_t pg_sz, size_t elt_size,
 		goto fail;
 	}

+	/* Element is dirty if it contains at least one dirty page. */
+	for (i = 0; i < allocd_pages; i++)
+		dirty |= ms[i]->flags & RTE_MEMSEG_FLAG_DIRTY;
+
 	/* add newly minted memsegs to malloc heap */
-	elem = malloc_heap_add_memory(heap, msl, map_addr, alloc_sz);
+	elem = malloc_heap_add_memory(heap, msl, map_addr, alloc_sz, dirty);

 	/* try once more, as now we have allocated new memory */
 	ret = find_suitable_element(heap, elt_size, flags, align, bound,
@@ -1260,7 +1266,7 @@ malloc_heap_add_external_memory(struct malloc_heap *heap,
 	memset(msl->base_va, 0, msl->len);

 	/* now, add newly minted memory to the malloc heap */
-	malloc_heap_add_memory(heap, msl, msl->base_va, msl->len);
+	malloc_heap_add_memory(heap, msl, msl->base_va, msl->len, false);

 	heap->total_size += msl->len;

diff --git a/lib/eal/common/rte_malloc.c b/lib/eal/common/rte_malloc.c
index d0bec26920..71a3f7ecb4 100644
--- a/lib/eal/common/rte_malloc.c
+++ b/lib/eal/common/rte_malloc.c
@@ -115,15 +115,22 @@ rte_zmalloc_socket(const char *type, size_t size, unsigned align, int socket)
 {
 	void *ptr = rte_malloc_socket(type, size, align, socket);

+	if (ptr != NULL) {
+		struct malloc_elem *elem = malloc_elem_from_data(ptr);
+
+		if (elem->dirty) {
+			memset(ptr, 0, size);
+		} else {
 #ifdef RTE_MALLOC_DEBUG
-	/*
-	 * If DEBUG is enabled, then freed memory is marked with poison
-	 * value and set to zero on allocation.
-	 * If DEBUG is not enabled then memory is already zeroed.
-	 */
-	if (ptr != NULL)
-		memset(ptr, 0, size);
+			/*
+			 * If DEBUG is enabled, then freed memory is marked
+			 * with a poison value and set to zero on allocation.
+			 * If DEBUG is disabled then memory is already zeroed.
+			 */
+			memset(ptr, 0, size);
 #endif
+		}
+	}

 	rte_eal_trace_mem_zmalloc(type, size, align, socket, ptr);
 	return ptr;
diff --git a/lib/eal/include/rte_memory.h b/lib/eal/include/rte_memory.h
index 6d018629ae..d76e7ba780 100644
--- a/lib/eal/include/rte_memory.h
+++ b/lib/eal/include/rte_memory.h
@@ -19,6 +19,7 @@
 extern "C" {
 #endif

+#include
 #include
 #include
 #include
@@ -37,11 +38,14 @@ extern "C" {

 #define SOCKET_ID_ANY -1 /**< Any NUMA socket. */

+/** Prevent this segment from being freed back to the OS. */
+#define RTE_MEMSEG_FLAG_DO_NOT_FREE RTE_BIT32(0)
+/** This segment is not filled with zeros. */
+#define RTE_MEMSEG_FLAG_DIRTY RTE_BIT32(1)
+
 /**
  * Physical memory segment descriptor.
  */
-#define RTE_MEMSEG_FLAG_DO_NOT_FREE (1 << 0)
-/**< Prevent this segment from being freed back to the OS. */
 struct rte_memseg {
 	rte_iova_t iova; /**< Start IO address. */
 	RTE_STD_C11
-- 
2.25.1
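
The join rule from the commit message can be modelled outside of EAL.
The following is a minimal standalone sketch, not code from this patch:
struct sketch_elem, sketch_joint_dirty() and sketch_free() are hypothetical
names used only for illustration. The patch itself keeps this state in
struct malloc_elem and updates it in join_elem() and malloc_elem_free().

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical, reduced model of a heap element: only what the rule needs. */
struct sketch_elem {
	bool is_free; /* only free neighbors take part in joining */
	bool dirty;   /* meaningful only when is_free is true */
};

/*
 * Dirty state of the element obtained by joining a freed element
 * with its neighbors. The freed element itself counts as clean;
 * a NULL neighbor models a non-existent one.
 */
static inline bool
sketch_joint_dirty(const struct sketch_elem *prev,
		const struct sketch_elem *next)
{
	bool dirty = false;

	if (prev != NULL && prev->is_free)
		dirty |= prev->dirty;
	if (next != NULL && next->is_free)
		dirty |= next->dirty;
	return dirty;
}

/*
 * Model of the non-debug free path: clear the freed data only
 * if the joint element is clean. Returns true if memory was cleared.
 */
static inline bool
sketch_free(void *data, size_t len, const struct sketch_elem *prev,
		const struct sketch_elem *next)
{
	assert(data != NULL);
	if (sketch_joint_dirty(prev, next))
		return false; /* joint element is dirty, clear on allocation */
	memset(data, 0, len);
	return true;
}
```

In this model, freeing between dirty free neighbors leaves the buffer
untouched, so the cost of clearing is deferred to a later zeroing allocation,
while freeing next to clean, allocated, or absent neighbors clears it
immediately.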