From: Dmitry Kozlyuk
To:
CC: Bruce Richardson, Anatoly Burakov
Subject: [PATCH v2 3/6] mem: add dirty malloc element support
Date: Wed, 19 Jan 2022 23:09:14 +0200
Message-ID: <20220119210917.765505-4-dkozlyuk@nvidia.com>
In-Reply-To: <20220119210917.765505-1-dkozlyuk@nvidia.com>
References: <20220117080801.481568-1-dkozlyuk@nvidia.com> <20220119210917.765505-1-dkozlyuk@nvidia.com>
List-Id: DPDK patches and discussions

The EAL malloc layer assumed that the content of every free element
is filled with zeros ("clean"), as opposed to uninitialized ("dirty").
This assumption was ensured in two ways:

1. The EAL memalloc layer always returned clean memory.
2. Freed memory was cleared before being returned to the heap.

Clearing memory can be as slow as about 14 GiB/s.
To avoid this cost, the memalloc layer is now allowed to return
dirty memory. Such segments are marked with RTE_MEMSEG_FLAG_DIRTY.
The allocator tracks elements that contain dirty memory
using a new flag in the element header.
When clean memory is requested via rte_zmalloc*()
and the suitable element is dirty, it is cleared on allocation.
When memory is deallocated, the freed element is joined
with adjacent free elements, and the dirty flag is updated:

a) If the joint element contains dirty parts, it is dirty:

    dirty + freed + dirty = dirty  =>  no need to clean
            freed + dirty = dirty      the freed memory

   Dirty parts may be large (e.g. the initial allocation),
   so clearing them could cause unpredictable slowdowns.

b) If the only dirty part of the joint element
   is the freed memory, the joint element can be made clean:

    clean + freed + clean = clean  =>  freed memory
    clean + freed         = clean      must be cleared
            freed + clean = clean
            freed         = clean

   This logic naturally reproduces the old behavior
   and always applies in modes where the EAL memalloc layer
   returns only clean segments.

As a result, memory is either cleared on free, as before,
or cleared on allocation if need be, but never twice.

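The join rule in cases a) and b) above can be sketched in isolation. This is only an illustrative model, not code from the patch: `struct toy_elem`, `toy_joint_is_dirty()` and `toy_free()` are hypothetical names, a NULL neighbor stands for "no adjacent free element", and the real allocator performs the join on actual heap elements.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical model of a free element: only the flag relevant to joining. */
struct toy_elem {
	bool dirty; /* true if the memory is not known to be zero-filled */
};

/*
 * Dirty flag of the element obtained by joining a freed element
 * with its free neighbors. NULL means there is no free neighbor.
 */
static bool
toy_joint_is_dirty(const struct toy_elem *prev, const struct toy_elem *next)
{
	/* The freed element itself is considered clean for joining. */
	bool dirty = false;

	if (prev != NULL)
		dirty |= prev->dirty;
	if (next != NULL)
		dirty |= next->dirty;
	return dirty;
}

/*
 * Model of freeing len bytes at data.
 * Returns true if the memory was cleared on free,
 * false if clearing is left to a later zeroing allocation.
 */
static bool
toy_free(void *data, size_t len,
		const struct toy_elem *prev, const struct toy_elem *next)
{
	assert(data != NULL);

	if (toy_joint_is_dirty(prev, next))
		return false; /* case a): the joint element stays dirty */

	memset(data, 0, len); /* case b): make the joint element clean */
	return true;
}
```

In this model a single dirty neighbor is enough to skip the memset, which matches the intent that large dirty regions are never cleared as a side effect of freeing a small element.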
Signed-off-by: Dmitry Kozlyuk
---
 lib/eal/common/malloc_elem.c | 22 +++++++++++++++++++---
 lib/eal/common/malloc_elem.h | 11 +++++++++--
 lib/eal/common/malloc_heap.c | 18 ++++++++++++------
 lib/eal/common/rte_malloc.c  | 21 ++++++++++++++-------
 lib/eal/include/rte_memory.h |  8 ++++++--
 5 files changed, 60 insertions(+), 20 deletions(-)

diff --git a/lib/eal/common/malloc_elem.c b/lib/eal/common/malloc_elem.c
index bdd20a162e..e04e0890fb 100644
--- a/lib/eal/common/malloc_elem.c
+++ b/lib/eal/common/malloc_elem.c
@@ -129,7 +129,7 @@ malloc_elem_find_max_iova_contig(struct malloc_elem *elem, size_t align)
 void
 malloc_elem_init(struct malloc_elem *elem, struct malloc_heap *heap,
 		struct rte_memseg_list *msl, size_t size,
-		struct malloc_elem *orig_elem, size_t orig_size)
+		struct malloc_elem *orig_elem, size_t orig_size, bool dirty)
 {
 	elem->heap = heap;
 	elem->msl = msl;
@@ -137,6 +137,7 @@ malloc_elem_init(struct malloc_elem *elem, struct malloc_heap *heap,
 	elem->next = NULL;
 	memset(&elem->free_list, 0, sizeof(elem->free_list));
 	elem->state = ELEM_FREE;
+	elem->dirty = dirty;
 	elem->size = size;
 	elem->pad = 0;
 	elem->orig_elem = orig_elem;
@@ -300,7 +301,7 @@ split_elem(struct malloc_elem *elem, struct malloc_elem *split_pt)
 	const size_t new_elem_size = elem->size - old_elem_size;
 
 	malloc_elem_init(split_pt, elem->heap, elem->msl, new_elem_size,
-			elem->orig_elem, elem->orig_size);
+			elem->orig_elem, elem->orig_size, elem->dirty);
 	split_pt->prev = elem;
 	split_pt->next = next_elem;
 	if (next_elem)
@@ -506,6 +507,7 @@ join_elem(struct malloc_elem *elem1, struct malloc_elem *elem2)
 	else
 		elem1->heap->last = elem1;
 	elem1->next = next;
+	elem1->dirty |= elem2->dirty;
 	if (elem1->pad) {
 		struct malloc_elem *inner = RTE_PTR_ADD(elem1, elem1->pad);
 		inner->size = elem1->size - elem1->pad;
@@ -579,6 +581,14 @@ malloc_elem_free(struct malloc_elem *elem)
 	ptr = RTE_PTR_ADD(elem, MALLOC_ELEM_HEADER_LEN);
 	data_len = elem->size - MALLOC_ELEM_OVERHEAD;
 
+	/*
+	 * Consider the element clean for the purposes of joining.
+	 * If both neighbors are clean or non-existent,
+	 * the joint element will be clean,
+	 * which means the memory should be cleared.
+	 * There is no need to clear the memory if the joint element is dirty.
+	 */
+	elem->dirty = false;
 	elem = malloc_elem_join_adjacent_free(elem);
 
 	malloc_elem_free_list_insert(elem);
@@ -588,8 +598,14 @@ malloc_elem_free(struct malloc_elem *elem)
 	/* decrease heap's count of allocated elements */
 	elem->heap->alloc_count--;
 
-	/* poison memory */
+#ifndef RTE_MALLOC_DEBUG
+	/* Normally clear the memory when needed. */
+	if (!elem->dirty)
+		memset(ptr, 0, data_len);
+#else
+	/* Always poison the memory in debug mode. */
 	memset(ptr, MALLOC_POISON, data_len);
+#endif
 
 	return elem;
 }
diff --git a/lib/eal/common/malloc_elem.h b/lib/eal/common/malloc_elem.h
index 15d8ba7af2..f2aa98821b 100644
--- a/lib/eal/common/malloc_elem.h
+++ b/lib/eal/common/malloc_elem.h
@@ -27,7 +27,13 @@ struct malloc_elem {
 	LIST_ENTRY(malloc_elem) free_list;
 	/**< list of free elements in heap */
 	struct rte_memseg_list *msl;
-	volatile enum elem_state state;
+	/** Element state, @c dirty and @c pad validity depends on it. */
+	/* An extra bit is needed to represent enum elem_state as signed int. */
+	enum elem_state state : 3;
+	/** If state == ELEM_FREE: the memory is not filled with zeroes. */
+	uint32_t dirty : 1;
+	/** Reserved for future use. */
+	uint32_t reserved : 28;
 	uint32_t pad;
 	size_t size;
 	struct malloc_elem *orig_elem;
@@ -320,7 +326,8 @@ malloc_elem_init(struct malloc_elem *elem,
 		struct rte_memseg_list *msl,
 		size_t size,
 		struct malloc_elem *orig_elem,
-		size_t orig_size);
+		size_t orig_size,
+		bool dirty);
 
 void
 malloc_elem_insert(struct malloc_elem *elem);
diff --git a/lib/eal/common/malloc_heap.c b/lib/eal/common/malloc_heap.c
index 55aad2711b..24080fc473 100644
--- a/lib/eal/common/malloc_heap.c
+++ b/lib/eal/common/malloc_heap.c
@@ -93,11 +93,11 @@ malloc_socket_to_heap_id(unsigned int socket_id)
  */
 static struct malloc_elem *
 malloc_heap_add_memory(struct malloc_heap *heap, struct rte_memseg_list *msl,
-		void *start, size_t len)
+		void *start, size_t len, bool dirty)
 {
 	struct malloc_elem *elem = start;
 
-	malloc_elem_init(elem, heap, msl, len, elem, len);
+	malloc_elem_init(elem, heap, msl, len, elem, len, dirty);
 
 	malloc_elem_insert(elem);
 
@@ -135,7 +135,8 @@ malloc_add_seg(const struct rte_memseg_list *msl,
 
 	found_msl = &mcfg->memsegs[msl_idx];
 
-	malloc_heap_add_memory(heap, found_msl, ms->addr, len);
+	malloc_heap_add_memory(heap, found_msl, ms->addr, len,
+			ms->flags & RTE_MEMSEG_FLAG_DIRTY);
 
 	heap->total_size += len;
 
@@ -303,7 +304,8 @@ alloc_pages_on_heap(struct malloc_heap *heap, uint64_t pg_sz, size_t elt_size,
 	struct rte_memseg_list *msl;
 	struct malloc_elem *elem = NULL;
 	size_t alloc_sz;
-	int allocd_pages;
+	int allocd_pages, i;
+	bool dirty = false;
 	void *ret, *map_addr;
 
 	alloc_sz = (size_t)pg_sz * n_segs;
@@ -372,8 +374,12 @@ alloc_pages_on_heap(struct malloc_heap *heap, uint64_t pg_sz, size_t elt_size,
 		goto fail;
 	}
 
+	/* Element is dirty if it contains at least one dirty page. */
+	for (i = 0; i < allocd_pages; i++)
+		dirty |= ms[i]->flags & RTE_MEMSEG_FLAG_DIRTY;
+
 	/* add newly minted memsegs to malloc heap */
-	elem = malloc_heap_add_memory(heap, msl, map_addr, alloc_sz);
+	elem = malloc_heap_add_memory(heap, msl, map_addr, alloc_sz, dirty);
 
 	/* try once more, as now we have allocated new memory */
 	ret = find_suitable_element(heap, elt_size, flags, align, bound,
@@ -1260,7 +1266,7 @@ malloc_heap_add_external_memory(struct malloc_heap *heap,
 	memset(msl->base_va, 0, msl->len);
 
 	/* now, add newly minted memory to the malloc heap */
-	malloc_heap_add_memory(heap, msl, msl->base_va, msl->len);
+	malloc_heap_add_memory(heap, msl, msl->base_va, msl->len, false);
 
 	heap->total_size += msl->len;
 
diff --git a/lib/eal/common/rte_malloc.c b/lib/eal/common/rte_malloc.c
index d0bec26920..71a3f7ecb4 100644
--- a/lib/eal/common/rte_malloc.c
+++ b/lib/eal/common/rte_malloc.c
@@ -115,15 +115,22 @@ rte_zmalloc_socket(const char *type, size_t size, unsigned align, int socket)
 {
 	void *ptr = rte_malloc_socket(type, size, align, socket);
 
+	if (ptr != NULL) {
+		struct malloc_elem *elem = malloc_elem_from_data(ptr);
+
+		if (elem->dirty) {
+			memset(ptr, 0, size);
+		} else {
 #ifdef RTE_MALLOC_DEBUG
-	/*
-	 * If DEBUG is enabled, then freed memory is marked with poison
-	 * value and set to zero on allocation.
-	 * If DEBUG is not enabled then memory is already zeroed.
-	 */
-	if (ptr != NULL)
-		memset(ptr, 0, size);
+			/*
+			 * If DEBUG is enabled, then freed memory is marked
+			 * with a poison value and set to zero on allocation.
+			 * If DEBUG is disabled then memory is already zeroed.
+			 */
+			memset(ptr, 0, size);
 #endif
+		}
+	}
 
 	rte_eal_trace_mem_zmalloc(type, size, align, socket, ptr);
 	return ptr;
diff --git a/lib/eal/include/rte_memory.h b/lib/eal/include/rte_memory.h
index 6d018629ae..68b069fd04 100644
--- a/lib/eal/include/rte_memory.h
+++ b/lib/eal/include/rte_memory.h
@@ -19,6 +19,7 @@
 extern "C" {
 #endif
 
+#include
 #include
 #include
 #include
@@ -37,11 +38,14 @@ extern "C" {
 
 #define SOCKET_ID_ANY -1 /**< Any NUMA socket. */
 
+/** Prevent this segment from being freed back to the OS. */
+#define RTE_MEMSEG_FLAG_DO_NOT_FREE RTE_BIT32(0)
+/** This segment is not filled with zeros. */
+#define RTE_MEMSEG_FLAG_DIRTY RTE_BIT32(1)
+
 /**
  * Physical memory segment descriptor.
  */
-#define RTE_MEMSEG_FLAG_DO_NOT_FREE (1 << 0)
-/**< Prevent this segment from being freed back to the OS. */
 struct rte_memseg {
 	rte_iova_t iova; /**< Start IO address. */
 	RTE_STD_C11
-- 
2.25.1
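
The header change in malloc_elem.h packs the state, the dirty flag and reserved bits into bit-fields, with a comment that the state needs an extra bit when the enum is represented as a signed int. The sketch below illustrates that reasoning only. It is not the DPDK definition: `enum toy_state`, `struct toy_header` and `toy_set_state()` are hypothetical, and the state field is declared as `signed int` explicitly to model the signed case, since the type a compiler picks for an enum bit-field is implementation-defined. A signed 2-bit field can hold -2..1, so a state value of 2 would not fit, while a signed 3-bit field holds -4..3.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical three-valued element state; names are illustrative only. */
enum toy_state { TOY_FREE = 0, TOY_BUSY, TOY_PAD };

/* Hypothetical header fragment modeled on the bit-field layout in the patch. */
struct toy_header {
	signed int state : 3;   /* 3 bits: values 0..2 plus room for the sign bit */
	uint32_t dirty : 1;     /* meaningful only while the element is free */
	uint32_t reserved : 28; /* unused in this sketch */
};

/* Store a state and return what the 3-bit signed field reads back. */
static int
toy_set_state(struct toy_header *hdr, enum toy_state state)
{
	assert(state >= TOY_FREE && state <= TOY_PAD);
	hdr->state = state;
	return hdr->state;
}

/* Update the dirty flag without disturbing the neighboring state field. */
static void
toy_set_dirty(struct toy_header *hdr, bool dirty)
{
	hdr->dirty = dirty ? 1 : 0;
}
```

With three bits every state value reads back unchanged, and toggling the dirty bit leaves the state intact, which is the property the layout relies on.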