From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mails.dpdk.org (mails.dpdk.org [217.70.189.124]) by inbox.dpdk.org (Postfix) with ESMTP id 2B31E41CA5; Wed, 15 Feb 2023 12:10:29 +0100 (CET) Received: from mails.dpdk.org (localhost [127.0.0.1]) by mails.dpdk.org (Postfix) with ESMTP id 15E52410DD; Wed, 15 Feb 2023 12:10:29 +0100 (CET) Received: from smartserver.smartsharesystems.com (smartserver.smartsharesystems.com [77.243.40.215]) by mails.dpdk.org (Postfix) with ESMTP id B92ED40EE3 for ; Wed, 15 Feb 2023 12:10:27 +0100 (CET) X-MimeOLE: Produced By Microsoft Exchange V6.5 Content-class: urn:content-classes:message MIME-Version: 1.0 Content-Type: text/plain; charset="iso-8859-1" Content-Transfer-Encoding: quoted-printable Subject: RE: [PATCH] malloc: fix malloc performance may becomes worse as the number of malloc increases Date: Wed, 15 Feb 2023 12:10:23 +0100 Message-ID: <98CBD80474FA8B44BF855DF32C47DC35D8773F@smartserver.smartshare.dk> In-Reply-To: <2134819.GUh0CODmnK@thomas> X-MS-Has-Attach: X-MS-TNEF-Correlator: Thread-Topic: [PATCH] malloc: fix malloc performance may becomes worse as the number of malloc increases Thread-Index: AdlBJb8s/Cey+yc6TsG+0HgRBawIUQABEDrA References: <20230210063022.52171-1-changfengnan@bytedance.com> <2134819.GUh0CODmnK@thomas> From: =?iso-8859-1?Q?Morten_Br=F8rup?= To: "Thomas Monjalon" , , "Fengnan Chang" Cc: , , , , , , , "Fidaullah Noonari" X-BeenThere: dev@dpdk.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: DPDK patches and discussions List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: dev-bounces@dpdk.org +CC: Fidaullah Noonari , your name also = shows up in the git log; perhaps you can help review this patch. > From: Thomas Monjalon [mailto:thomas@monjalon.net] > Sent: Wednesday, 15 February 2023 11.10 >=20 > Looking for reviewers please. >=20 > 10/02/2023 07:30, Fengnan Chang: > > Here is a simple test case: > > " > > uint64_t entry_time, time; > > size_t size =3D 4096; > > unsigned align =3D 4096; > > for (int j =3D 0; j < 10; j++) { > > entry_time =3D rte_get_timer_cycles(); > > for (int i =3D 0; i < 2000; i++) { > > rte_malloc(NULL, size, align); > > } > > time =3D (rte_get_timer_cycles()-entry_time) * 1000000 / > > rte_get_timer_hz(); > > printf("total open time %lu avg time %lu\n", time, time/2000); > > } > > " > > > > Single rte_malloc cost time may becomes wrose as the number of = malloc > > increases, In my env, first round avg time is 15us, second is 44us, > > third is 77us, fourth is 168us... > > > > The reason is,in the malloc process, malloc_elem_alloc may split > new_elem > > if there have too much free space after new_elem, and insert the > trailer > > into freelist. When alloc 4k with align 4k, the trailer very likely > insert > > to free_head[2] again, it makes free_head[2] longer. when alloc 4k > again, > > it will search free_head[2] from begin, with the number of malloc > increases, > > search free_head[2] need more time, so the performance will become > worse. > > Same problem will also occurs in alloc 64k with align 64k, but if > alloc > > 4k with align 64, doesn't have this problem. > > > > Fix this by adjust free_head list size range, make free_head[3] hold > > elements which bigger or equal 4k, free_head[4] hold elements which > bigger > > or equal 16k. > > In terms of probabilities, when alloc 4k or 16k, the probability of > finding > > a suitable elem from a larger size list is greater than from a > smaller > > size list. > > > > Signed-off-by: Fengnan Chang > > --- > > lib/eal/common/malloc_elem.c | 14 +++++++------- > > 1 file changed, 7 insertions(+), 7 deletions(-) > > > > diff --git a/lib/eal/common/malloc_elem.c > b/lib/eal/common/malloc_elem.c > > index 83f05497cc..35a2313d04 100644 > > --- a/lib/eal/common/malloc_elem.c > > +++ b/lib/eal/common/malloc_elem.c > > @@ -367,11 +367,11 @@ prev_elem_is_adjacent(struct malloc_elem = *elem) > > * containing larger elements. > > * > > * Example element size ranges for a heap with five free lists: > > - * heap->free_head[0] - (0 , 2^8] > > - * heap->free_head[1] - (2^8 , 2^10] > > - * heap->free_head[2] - (2^10 ,2^12] > > - * heap->free_head[3] - (2^12, 2^14] > > - * heap->free_head[4] - (2^14, MAX_SIZE] > > + * heap->free_head[0] - (0 , 2^8) > > + * heap->free_head[1] - [2^8 , 2^10) > > + * heap->free_head[2] - [2^10 ,2^12) > > + * heap->free_head[3] - [2^12, 2^14) > > + * heap->free_head[4] - [2^14, MAX_SIZE] > > */ > > size_t > > malloc_elem_free_list_index(size_t size) > > @@ -385,8 +385,8 @@ malloc_elem_free_list_index(size_t size) > > if (size <=3D (1UL << MALLOC_MINSIZE_LOG2)) > > return 0; > > > > - /* Find next power of 2 >=3D size. */ > > - log2 =3D sizeof(size) * 8 - __builtin_clzl(size - 1); > > + /* Find next power of 2 > size. */ > > + log2 =3D sizeof(size) * 8 - __builtin_clzl(size); > > > > /* Compute freelist index, based on log2(size). */ > > index =3D (log2 - MALLOC_MINSIZE_LOG2 + MALLOC_LOG2_INCREMENT - 1) > / > > I gave up reviewing in depth, because the existing code is not easy to = quickly understand, and it would take too long for me. E.g. the = malloc_elem->size is field is undocumented, and find_suitable_element() = calls the malloc_elem_free_list_index() function with the raw size (as = passed to the function), but malloc_elem_free_list_insert() calls the = malloc_elem_free_list_index() with malloc_elem->size - = MALLOC_ELEM_HEADER_LEN. Looking isolated at the patch itself... I agree with the way the patch modifies the ranges of the free list, and = the consequential removal of the "- 1" from the calculation of log2. Intuitively, the lists should cover ranges such as [0x100..0x3FF], which = this patch does, not [0x101..0x400], how it was previously... The ranges = with this patch make much more sense. So if the existing code is otherwise correct, i.e. handles the size = with/without MALLOC_ELEM_HEADER_LEN correctly, my gut feeling says this = patch is an improvement. Acked-by: Morten Br=F8rup