From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mails.dpdk.org (mails.dpdk.org [217.70.189.124]) by inbox.dpdk.org (Postfix) with ESMTP id D11B74297C; Tue, 18 Apr 2023 13:06:54 +0200 (CEST) Received: from mails.dpdk.org (localhost [127.0.0.1]) by mails.dpdk.org (Postfix) with ESMTP id C203E41144; Tue, 18 Apr 2023 13:06:54 +0200 (CEST) Received: from mga03.intel.com (mga03.intel.com [134.134.136.65]) by mails.dpdk.org (Postfix) with ESMTP id 9A1FC40698 for ; Tue, 18 Apr 2023 13:06:53 +0200 (CEST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1681816013; x=1713352013; h=date:from:to:cc:subject:message-id:references: content-transfer-encoding:in-reply-to:mime-version; bh=TExjoqpmqRuqBiBdMpq5K9SXw1kOdaPAN2IxcwwLe2E=; b=JUWWxGxnBKWCZueP9WGtGBgfRL2MVHIFXOHBGp7iQBp7nVN4v5b6H5R9 KFKwUID/q7kN1H2TnyAXb+bu5tXhaetg1KRN2HfI9RJ7rcHLek0sx7hLs wOrmZBe9RuAv4eWjLH4p/GYa6Z5VfKN8IvhPnQwPCTEA6T6HBcFHxQRP1 odcD6KV01Js7+8B7YoZPokf2AH+KAjmr4VGe3q1kJ2j2pTI7m4AJdWchR rTbV/iWys+Yu9sFYU1qY9js847nAecZgPMLSYW04jJomsuV8drNtiBLOb P2WxP66GONEgGncRt0geDOK1UrCKuzdv2Ob1Az1OWqRai1fhfCSBSyumS Q==; X-IronPort-AV: E=McAfee;i="6600,9927,10683"; a="347891369" X-IronPort-AV: E=Sophos;i="5.99,207,1677571200"; d="scan'208";a="347891369" Received: from fmsmga001.fm.intel.com ([10.253.24.23]) by orsmga103.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 18 Apr 2023 04:06:52 -0700 X-ExtLoop1: 1 X-IronPort-AV: E=McAfee;i="6600,9927,10683"; a="834845690" X-IronPort-AV: E=Sophos;i="5.99,207,1677571200"; d="scan'208";a="834845690" Received: from orsmsx602.amr.corp.intel.com ([10.22.229.15]) by fmsmga001.fm.intel.com with ESMTP; 18 Apr 2023 04:06:52 -0700 Received: from orsmsx611.amr.corp.intel.com (10.22.229.24) by ORSMSX602.amr.corp.intel.com (10.22.229.15) with Microsoft SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256) id 15.1.2507.23; Tue, 18 Apr 2023 04:06:52 -0700 Received: from orsmsx610.amr.corp.intel.com (10.22.229.23) by ORSMSX611.amr.corp.intel.com (10.22.229.24) with Microsoft SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256) id 15.1.2507.23; Tue, 18 Apr 2023 04:06:51 -0700 Received: from ORSEDG602.ED.cps.intel.com (10.7.248.7) by orsmsx610.amr.corp.intel.com (10.22.229.23) with Microsoft SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256) id 15.1.2507.23 via Frontend Transport; Tue, 18 Apr 2023 04:06:51 -0700 Received: from NAM10-DM6-obe.outbound.protection.outlook.com (104.47.58.105) by edgegateway.intel.com (134.134.137.103) with Microsoft SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id 15.1.2507.23; Tue, 18 Apr 2023 04:06:51 -0700 ARC-Seal: i=1; a=rsa-sha256; s=arcselector9901; d=microsoft.com; cv=none; b=lEPTjkC/6J95tbd9rPg88HyGpIGu+vazNxLmeJAYfZOYrYz8U6/9Fi9GwnoLBfQlvS8/sUTjYhbXFP9lo0LoplxiNtDmNYSqvtG/Ateqgq+W3+UZKSJM2gafF4Y9JBHNYEbMKDy6dypBob7D3QPHSoA62sPxmvJ2HHdt2qjUNZd0iJgyUOMIX+Vh2/ni4ZikX/C5UeSofGn6TaMZ56/erIIgyqhFBhnXQECWFurO8A29C9RQrQInCAQJ8WFXAHtyqpYwVZQ0tk9IjHcLmbaEY6jHPwhDXEjcsUNA84WkLhINqye6oB+Zm7NqQmAkCwX/0uG8MfaFQaFBc7nncK+8fg== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=microsoft.com; s=arcselector9901; h=From:Date:Subject:Message-ID:Content-Type:MIME-Version:X-MS-Exchange-AntiSpam-MessageData-ChunkCount:X-MS-Exchange-AntiSpam-MessageData-0:X-MS-Exchange-AntiSpam-MessageData-1; bh=Et4ZkX5dfLXZu1QYM+SWm6TrlHbR1ZZzLZiNsQCM1G8=; b=R2uUTXXlQfo90A0OmsVa+2f2QCzeQsq+wB1o2Nf3c+2N+JO2I82/Vkm27Zkbiw2kCViGJ7xxpKweOkcVvKTxEvkVL2s6RkVLPixCRwZj9arbCxanfCtGFR5enrCDRJkF7ply25M9idv273bVFW+ehTOuaIp/sBgn78Vea3vfL3Xc0IoNUNy+pUdAqhWRV/wHkcN5XcEgaO+gHvXYKsfwES0g3ybxM2ZQ1byHpICTH7L/a+x4rB+/kXSJE8HZm/F9iMvNqAtiBaYGbBzH8ta2BaZC3iiTW47l1m8n2KK4iH78IKfK8VORYMfQoNR/4a8UwqUrI9m7OS4/0HM63aSY2Q== ARC-Authentication-Results: i=1; mx.microsoft.com 1; spf=pass smtp.mailfrom=intel.com; dmarc=pass action=none header.from=intel.com; dkim=pass header.d=intel.com; arc=none Authentication-Results: dkim=none (message not signed) header.d=none;dmarc=none action=none header.from=intel.com; Received: from DS0PR11MB7309.namprd11.prod.outlook.com (2603:10b6:8:13e::17) by DS0PR11MB7457.namprd11.prod.outlook.com (2603:10b6:8:140::18) with Microsoft SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id 15.20.6298.45; Tue, 18 Apr 2023 11:06:49 +0000 Received: from DS0PR11MB7309.namprd11.prod.outlook.com ([fe80::695b:260c:f397:2b69]) by DS0PR11MB7309.namprd11.prod.outlook.com ([fe80::695b:260c:f397:2b69%4]) with mapi id 15.20.6298.030; Tue, 18 Apr 2023 11:06:49 +0000 Date: Tue, 18 Apr 2023 12:06:42 +0100 From: Bruce Richardson To: Morten =?iso-8859-1?Q?Br=F8rup?= CC: , , Subject: Re: [PATCH] mempool: optimize get objects with constant n Message-ID: References: <20230411064845.37713-1-mb@smartsharesystems.com> Content-Type: text/plain; charset="iso-8859-1" Content-Disposition: inline Content-Transfer-Encoding: 8bit In-Reply-To: <20230411064845.37713-1-mb@smartsharesystems.com> X-ClientProxiedBy: DUZPR01CA0165.eurprd01.prod.exchangelabs.com (2603:10a6:10:4b3::6) To DS0PR11MB7309.namprd11.prod.outlook.com (2603:10b6:8:13e::17) MIME-Version: 1.0 X-MS-PublicTrafficType: Email X-MS-TrafficTypeDiagnostic: DS0PR11MB7309:EE_|DS0PR11MB7457:EE_ X-MS-Office365-Filtering-Correlation-Id: 399f852b-96bd-4a0e-7c41-08db3ffd0146 X-LD-Processed: 46c98d88-e344-4ed4-8496-4ed7712e255d,ExtAddr X-MS-Exchange-SenderADCheck: 1 X-MS-Exchange-AntiSpam-Relay: 0 X-Microsoft-Antispam: BCL:0; X-Microsoft-Antispam-Message-Info: QoW42DZl1qUq7U3t8/OgIxC0oEL+BjHn1t4a/+NBk54C0inwfC/FC8sHwpqdtMwVocRnWzJnkBYG4+uY6nfYidsIE5wbEIDV17FRZajgTsL52dA2Gf9GLkE63rlysUHEX7rr4alcTJGoMtBNcCvEviWiCsWVOc19lNNNrHK12fkqH3BeH8ruUiko1/gXL0Q3Zwcgh2n1H0PB+jXX3v1YMKAZoEAh3oWNYHWlHMWZYicj2uqZpVAkzmvO+rTssnNN3LjdTF8no1BUQBJ886utMfEsxKvT6+rFdVtFQZXtrrQZv3YJXs0mRPuNyExE18xUcDmh3fbWkHsasbBWiBQwawzo8AOKOz4qanmIA17NcDFy6j5U9KdJQaywLPT7ugA5jRf/wk/lRWr6IzF6j0WROhNLT89koMt/5oiAnuJGhoxPZGXZxYVqoGbDn5uPYN7V44NiLLWmKDTr3Cnxw4mgee4F6BS2bkCRxvmWoJ24JseczUVDom25Zm2NjB6S5uGZhdEGd8GNFVFbhLtSn7+JRR2q9UXuAE489XWlls0ineXpH6Db2F2f+vokhLAvQbvmzqv9mDrj18O3fDXS3fhW7IPPRaGKXr3PpZbb25fO5CBmmDfLsa4NdAabNQJ4OmHH X-Forefront-Antispam-Report: CIP:255.255.255.255; CTRY:; LANG:en; SCL:1; SRV:; IPV:NLI; SFV:NSPM; H:DS0PR11MB7309.namprd11.prod.outlook.com; PTR:; CAT:NONE; SFS:(13230028)(346002)(136003)(366004)(376002)(39860400002)(396003)(451199021)(26005)(6506007)(6512007)(316002)(478600001)(86362001)(66476007)(6916009)(6666004)(66556008)(4326008)(66946007)(2906002)(38100700002)(8936002)(5660300002)(8676002)(44832011)(6486002)(82960400001)(41300700001)(83380400001)(66574015)(186003)(129723003); DIR:OUT; SFP:1102; X-MS-Exchange-AntiSpam-MessageData-ChunkCount: 1 X-MS-Exchange-AntiSpam-MessageData-0: =?iso-8859-1?Q?pNJ34MJzt0XDG6rZDe5R75u5G5Y4Dfe7OBJNkhraFoAx8XxUEFGQiTxGeV?= =?iso-8859-1?Q?5Jp3hEym+XYPZPQ9malFDcToUlC5tyrGVwzxAuzNW15TL8XjVkfSXxPO1Q?= =?iso-8859-1?Q?VqqZJ/5YFddTv2T4DUqLIdM/s0KQUJAP27ihscIFfugc2J8hCPYF7eD8dv?= =?iso-8859-1?Q?v5gN12M6Gsj/T254Y9CwUb7hIe/C3nHL1SHZ1StQQo1qeTN8AqVZILw8Ns?= =?iso-8859-1?Q?GgW31Ucs8ldqyzL5jdIc3svZ2L9Ge5fYTwMbZIMoakksrv6tiaZ+eIevaL?= =?iso-8859-1?Q?b3LOyuua2/2BIuwFEERut46ir2URpDfJHX/e63FFq0dohlrYTWDi4v40gf?= =?iso-8859-1?Q?AEInvCzbedJzzuFSmvH4b5mfFZxp+TV+E+7jMv2bhiXJiukOd6RgBihsNO?= =?iso-8859-1?Q?LAOLfBgzGIFMmtnDhGvRJnXxPubaL0IAZ4VjFYpQhnBp2zXZs7LqJNdXiZ?= =?iso-8859-1?Q?KfeDOKa7X/ACZIF8+Vpiiob3X8UTvrP537zP+nGkVVHUwobp7uMa3tDYBw?= =?iso-8859-1?Q?gSnjoHOrP/CeQf0Zp2/PEaD5oEX4YbIErTE/w9y+XiXIvRYSU5XKHh9FQB?= =?iso-8859-1?Q?Ihtx7Ods4OAb3Zoc27+75hfy6JctyQCT1jGqSdMenVA0eSaKLMhyoK2m5+?= =?iso-8859-1?Q?5UVVisUfqJTA85qmV+30vKJNZOU4E8I6tdU2DN4/+tmQ/rWIEc47shBGRf?= =?iso-8859-1?Q?sSi9pjA58l3BjJvOjgaWa4ENdXDqLLexc6F4EXdgFzEyQaF6Xl9mz0oelt?= =?iso-8859-1?Q?uPIfWlSXqc1DeJndUdR99352Y4GcL9qK/uYhHdfkz36L/kNE8Mk2V7dxZB?= =?iso-8859-1?Q?UQTL0F3i8WHVyks91DY70VJcjJkT17HER9cMFINVY1G6JtqGAyaaOQCuiM?= =?iso-8859-1?Q?Fpahs25nBO365yOJFejVVeoEvpEsjM5DRm5eu+sSj1KaFVkHutJsa41SAy?= =?iso-8859-1?Q?xCegOgxM2FMcytXTRxccHkQHtLPwGJonshRh8NNo8JJCujIwG+vySgeiW9?= =?iso-8859-1?Q?0+cH/STAHuHlG9FNvHMO/qp8WODqzpAIQogaUNN/hWOV8sUV2JJKSpudwu?= =?iso-8859-1?Q?SCaJjNqMdmjspMq8oouyhLAwb/wC86ExF9tXXEYQmBfeiGBs6Vnj6HaEVN?= =?iso-8859-1?Q?/OtNL9r9ktS3/67ws8i7lL8pshDZCPyPl0LLP2BEyuKhlKXtYZq/mgcVIu?= =?iso-8859-1?Q?O889lOzBq5MdfNAIjK11VPPTgyZEa2iZsqmef1BCxg8cSi4Z6dHSSo4QGS?= =?iso-8859-1?Q?LIeHLJ6HlTvSSKmM1pXe4ps/Llyx2Bah9OxnPHnpkjW+G1UkrmGq4bNR8L?= =?iso-8859-1?Q?swJyRjphfLLxR3PqXTsDUZE1R6X4W0/z24nDtV2X4l3dOxi1kym1Xhy0J6?= =?iso-8859-1?Q?u/T0n8CnQRUbgY+6jr59zo+/CEsxlC2jedww8Opm9z4gRK5r7k5wIlObVC?= =?iso-8859-1?Q?Nnbxa/9+B7XypZ6EKi3quRLY2u/bnxwTZ06dx1hcmJ9nEHWj5HUsi6s2JL?= =?iso-8859-1?Q?EAJE3KpfpgzBUUrdXh7MC2TO7ZRn2swXEfbEeyJht4M1GWAV1spr2icJCM?= =?iso-8859-1?Q?k7ItHVGuduVoWkyaNHakiqmjoP8bXHGJxJ1BvbbF7n3J9/TP70K5BVEsss?= =?iso-8859-1?Q?MmP1obiFVDpyV4xt0jEiZDlm6EKpfcNrunm6g7J5MzLxaiNvJFKiKfxw?= =?iso-8859-1?Q?=3D=3D?= X-MS-Exchange-CrossTenant-Network-Message-Id: 399f852b-96bd-4a0e-7c41-08db3ffd0146 X-MS-Exchange-CrossTenant-AuthSource: DS0PR11MB7309.namprd11.prod.outlook.com X-MS-Exchange-CrossTenant-AuthAs: Internal X-MS-Exchange-CrossTenant-OriginalArrivalTime: 18 Apr 2023 11:06:48.7359 (UTC) X-MS-Exchange-CrossTenant-FromEntityHeader: Hosted X-MS-Exchange-CrossTenant-Id: 46c98d88-e344-4ed4-8496-4ed7712e255d X-MS-Exchange-CrossTenant-MailboxType: HOSTED X-MS-Exchange-CrossTenant-UserPrincipalName: LU8jV3fSf9Dt/16RSYCii25s8EFCdstnM3XlNe3Xb8Wmn2CwcLcBF6jlRKJGZZK+QnR0QF87QbyEO/+MGSfstj6o5gu8Pduunb9y26Ho4dw= X-MS-Exchange-Transport-CrossTenantHeadersStamped: DS0PR11MB7457 X-OriginatorOrg: intel.com X-BeenThere: dev@dpdk.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: DPDK patches and discussions List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: dev-bounces@dpdk.org On Tue, Apr 11, 2023 at 08:48:45AM +0200, Morten Brørup wrote: > When getting objects from the mempool, the number of objects to get is > often constant at build time. > > This patch adds another code path for this case, so the compiler can > optimize more, e.g. unroll the copy loop when the entire request is > satisfied from the cache. > > On an Intel(R) Xeon(R) E5-2620 v4 CPU, and compiled with gcc 9.4.0, > mempool_perf_test with constant n shows an increase in rate_persec by an > average of 17 %, minimum 9.5 %, maximum 24 %. > > The code path where the number of objects to get is unknown at build time > remains essentially unchanged. > > Signed-off-by: Morten Brørup Change looks a good idea. Some suggestions inline below, which you may want to take on board for any future version. I'd strongly suggest adding some extra clarifying code comments, as I suggest below. With those exta code comments: Acked-by: Bruce Richardson > --- > lib/mempool/rte_mempool.h | 24 +++++++++++++++++++++--- > 1 file changed, 21 insertions(+), 3 deletions(-) > > diff --git a/lib/mempool/rte_mempool.h b/lib/mempool/rte_mempool.h > index 9f530db24b..ade0100ec7 100644 > --- a/lib/mempool/rte_mempool.h > +++ b/lib/mempool/rte_mempool.h > @@ -1500,15 +1500,33 @@ rte_mempool_do_generic_get(struct rte_mempool *mp, void **obj_table, > if (unlikely(cache == NULL)) > goto driver_dequeue; > > - /* Use the cache as much as we have to return hot objects first */ > - len = RTE_MIN(remaining, cache->len); > cache_objs = &cache->objs[cache->len]; > + > + if (__extension__(__builtin_constant_p(n)) && n <= cache->len) { > + /* > + * The request size is known at build time, and > + * the entire request can be satisfied from the cache, > + * so let the compiler unroll the fixed length copy loop. > + */ > + cache->len -= n; > + for (index = 0; index < n; index++) > + *obj_table++ = *--cache_objs; > + This loop looks a little awkward to me. Would it be clearer (and perhaps easier for compilers to unroll efficiently if it was rewritten as: cache->len -= n; cache_objs = &cache->objs[cache->len]; for (index = 0; index < n; index++) obj_table[index] = cache_objs[index]; alternatively those last two lines can be replaced by a memcpy, which the compiler should nicely optimize itself, for constant size copy: memcpy(obj_table, cache_objs, sizeof(obj_table[0]) * n); > + RTE_MEMPOOL_CACHE_STAT_ADD(cache, get_success_bulk, 1); > + RTE_MEMPOOL_CACHE_STAT_ADD(cache, get_success_objs, n); > + > + return 0; > + } > + > + /* Use the cache as much as we have to return hot objects first */ > + len = __extension__(__builtin_constant_p(n)) ? cache->len : > + RTE_MIN(remaining, cache->len); This line confused me a lot, until I applied the patch and stared at it a lot :-). Why would the length value depend upon whether or not n is a compile-time constant? I think it would really help here to add in a comment stating that when n is a compile-time constant, at this point it much be "> cache->len" since the previous block was untaken, therefore we don't need to call RTE_MIN. > cache->len -= len; > remaining -= len; > for (index = 0; index < len; index++) > *obj_table++ = *--cache_objs; > > - if (remaining == 0) { > + if (!__extension__(__builtin_constant_p(n)) && remaining == 0) { > /* The entire request is satisfied from the cache. */ > > RTE_MEMPOOL_CACHE_STAT_ADD(cache, get_success_bulk, 1); Once I'd worked out the logic for the above conditional check, then this conditional adjustment was clear. I just wonder if a further comment might help here. I am also wondering if having larger blocks for the constant and non-constant cases may help. It would lead to some duplication but may clarify the code. > -- > 2.17.1 >