Date: Thu, 16 Feb 2023 14:02:07 +0000
From: Liang Ma
To: Morten Brørup
Cc: Thomas Monjalon, anatoly.burakov@intel.com, Fengnan Chang, dev@dpdk.org, rsanford@akamai.com, bruce.richardson@intel.com, maxime.coquelin@redhat.com, david.marchand@redhat.com, jerinj@marvell.com, honnappa.nagarahalli@arm.com, Fidaullah Noonari
Subject: Re: [PATCH] malloc: fix malloc performance may becomes worse as the number of malloc increases
References: <20230210063022.52171-1-changfengnan@bytedance.com> <2134819.GUh0CODmnK@thomas> <98CBD80474FA8B44BF855DF32C47DC35D8773F@smartserver.smartshare.dk>
In-Reply-To: <98CBD80474FA8B44BF855DF32C47DC35D8773F@smartserver.smartshare.dk>

On Wed, Feb 15, 2023 at 12:10:23PM +0100, Morten Brørup wrote:
> +CC: Fidaullah Noonari, your name also shows up in the git log; perhaps you can help review this patch.
>
>
> I gave up reviewing in depth, because the existing code is not easy to quickly understand, and it would take too long for me. E.g. the malloc_elem->size field is undocumented, and find_suitable_element() calls the malloc_elem_free_list_index() function with the raw size (as passed to the function), but malloc_elem_free_list_insert() calls the malloc_elem_free_list_index() with malloc_elem->size - MALLOC_ELEM_HEADER_LEN.
>
> Looking isolated at the patch itself...
>
> I agree with the way the patch modifies the ranges of the free list, and the consequential removal of the "- 1" from the calculation of log2.
>
> Intuitively, the lists should cover ranges such as [0x100..0x3FF], which this patch does, not [0x101..0x400], how it was previously... The ranges with this patch make much more sense.
>
> So if the existing code is otherwise correct, i.e. handles the size with/without MALLOC_ELEM_HEADER_LEN correctly, my gut feeling says this patch is an improvement.
>
> Acked-by: Morten Brørup

I ran the DPDK malloc perf test and reproduced the issue. IMO, the problem shows up when an application uses rte_malloc with a relatively large alignment size,
e.g. 4K. The current implementation then generates lots of "fragments" in the related mem element free list because of the large alignment. In the test code, a 4K malloc size plus 4K alignment means the space actually needed has to be taken from the free list at the next-higher mem element index. As a consequence, lots of fragment elements are generated, and those elements are inserted into the current-level free list. However, when rte_malloc selects which free list to start scanning from, it doesn't take the alignment into account, so it ends up wasting time in the current-level free list. If the scan logic took the alignment into account, it could "smartly" skip the current level and jump to the upper level directly.
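
To make the idea concrete, here is a rough sketch of what "taking the alignment into account" could look like when picking the start list. This is not the DPDK code and not a proposed patch. The constants and the names freelist_index(), start_index_plain() and start_index_aligned() are made up for illustration, and the list ranges are assumed to follow the pre-patch shape mentioned in the quoted review (e.g. [0x101..0x400]). The worst-case padding of align - 1 bytes is also an assumption; the real allocator may need less.

```c
#include <assert.h>
#include <stddef.h>

/* Illustrative constants only; they are assumptions, not DPDK definitions. */
#define MIN_SIZE_LOG2 8   /* list 0 holds sizes up to 0x100 */
#define LOG2_STEP     2   /* each further list spans a factor of 4 */
#define NUM_FREELISTS 13  /* assumed number of free lists */

/* Smallest n such that (1 << n) >= size, for size >= 1. */
static unsigned ceil_log2(size_t size)
{
    unsigned n = 0;

    while (n < sizeof(size_t) * 8 - 1 && ((size_t)1 << n) < size)
        n++;
    return n;
}

/* Index of the list whose assumed range (0x100 * 4^(i-1), 0x100 * 4^i] holds size. */
static size_t freelist_index(size_t size)
{
    unsigned log2;
    size_t index;

    if (size <= ((size_t)1 << MIN_SIZE_LOG2))
        return 0;

    log2 = ceil_log2(size);
    index = (log2 - MIN_SIZE_LOG2 + LOG2_STEP - 1) / LOG2_STEP;
    return index < NUM_FREELISTS ? index : NUM_FREELISTS - 1;
}

/* Start list chosen from the raw size only, ignoring the alignment. */
static size_t start_index_plain(size_t size)
{
    return freelist_index(size);
}

/*
 * Hypothetical alternative: choose the start list from the worst-case
 * space, i.e. the size plus up to (align - 1) bytes of padding.
 */
static size_t start_index_aligned(size_t size, size_t align)
{
    /* align must be a non-zero power of two */
    assert(align != 0 && (align & (align - 1)) == 0);
    return freelist_index(size + (align - 1));
}
```

Under these assumptions, a 4K request with 4K alignment starts at list 2 when only the size is considered, but at list 3 when the worst-case padding is included, so the scan would skip the list that mostly holds fragments too small to satisfy the aligned request. Whether that is a net win in the real allocator would still need to be measured.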