From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mails.dpdk.org (mails.dpdk.org [217.70.189.124]) by inbox.dpdk.org (Postfix) with ESMTP id 7B2CE4297D; Tue, 18 Apr 2023 14:54:21 +0200 (CEST) Received: from mails.dpdk.org (localhost [127.0.0.1]) by mails.dpdk.org (Postfix) with ESMTP id 5816F40EDF; Tue, 18 Apr 2023 14:54:21 +0200 (CEST) Received: from mga18.intel.com (mga18.intel.com [134.134.136.126]) by mails.dpdk.org (Postfix) with ESMTP id 2774A40698 for ; Tue, 18 Apr 2023 14:54:18 +0200 (CEST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1681822459; x=1713358459; h=date:from:to:cc:subject:message-id:references: content-transfer-encoding:in-reply-to:mime-version; bh=uAwhUAiqCfCloFpilWThqoopDjtF+h5MEU3XiLuJCjE=; b=K/yVEt9tMjgdxMns7h5g2iOfZC4RZxUhX90ka9R+TOfK1c5Tqzc1ezu9 ScwTWSVsKUvwjIoP/01wj95fgQdLD1nlH0TwXsm+G5nq8AbKFAVHHV1Aa 50qIkYt6Jpl//XPBxX2UXu66pWY9kIcjSJO4CaKXEht80TCJrt8uz9jBk 7myQAAYkcinMl600a2gJyUnH5tWaKnB6SE04CiWA/JjNEg7jHrDwbdYhv +13P+fbR9rubn2axCEMqj4uQWmnQ69Gyh9X70hqSNof/omxM6Oz85iEnj ZCR811Qg9WlFM/d09eCwB2InC5ZTKmd3zvFYLdsU7fZ9rZaJdJ91v0Ter w==; X-IronPort-AV: E=McAfee;i="6600,9927,10684"; a="329332639" X-IronPort-AV: E=Sophos;i="5.99,207,1677571200"; d="scan'208";a="329332639" Received: from orsmga003.jf.intel.com ([10.7.209.27]) by orsmga106.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 18 Apr 2023 05:54:18 -0700 X-ExtLoop1: 1 X-IronPort-AV: E=McAfee;i="6600,9927,10684"; a="641370286" X-IronPort-AV: E=Sophos;i="5.99,207,1677571200"; d="scan'208";a="641370286" Received: from fmsmsx602.amr.corp.intel.com ([10.18.126.82]) by orsmga003.jf.intel.com with ESMTP; 18 Apr 2023 05:54:18 -0700 Received: from fmsmsx610.amr.corp.intel.com (10.18.126.90) by fmsmsx602.amr.corp.intel.com (10.18.126.82) with Microsoft SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256) id 15.1.2507.23; Tue, 18 Apr 2023 05:54:17 -0700 Received: from fmsmsx610.amr.corp.intel.com (10.18.126.90) by fmsmsx610.amr.corp.intel.com (10.18.126.90) with Microsoft SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256) id 15.1.2507.23; Tue, 18 Apr 2023 05:54:17 -0700 Received: from FMSEDG603.ED.cps.intel.com (10.1.192.133) by fmsmsx610.amr.corp.intel.com (10.18.126.90) with Microsoft SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256) id 15.1.2507.23 via Frontend Transport; Tue, 18 Apr 2023 05:54:17 -0700 Received: from NAM12-BN8-obe.outbound.protection.outlook.com (104.47.55.177) by edgegateway.intel.com (192.55.55.68) with Microsoft SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id 15.1.2507.23; Tue, 18 Apr 2023 05:54:17 -0700 ARC-Seal: i=1; a=rsa-sha256; s=arcselector9901; d=microsoft.com; cv=none; b=NFsFJKosgB4PNHaTfXUQPerVSp41te5mYulPvhdCbpL3Rg0n+vXPWor6GYdtgafrC8U/UP3jE5B8ESOy9lQKum/budbVQZ/zx610iUdl0q0DbR2xfEGLWN0AI2QX+MghGq8AowMtbETHw4TYxNwG6EzMJu1MwKMMx79yyns/GRWDytkybDIBwq7L8oGvKzkc5Iw+WVwC87c9R93Vpy/+08ZyriJzJZOkgTe5yt7tPy2jJ6VUe4FhxDVcD8pPEva4B9ZkicvPT16cJ+YIPTL8Xcc+symLqaKNNk2Mo0PzjhWOJfVzATR93Cnl7GIdZ7qHH3XddPuTz30kzCWYILDvRw== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=microsoft.com; s=arcselector9901; h=From:Date:Subject:Message-ID:Content-Type:MIME-Version:X-MS-Exchange-AntiSpam-MessageData-ChunkCount:X-MS-Exchange-AntiSpam-MessageData-0:X-MS-Exchange-AntiSpam-MessageData-1; bh=CoSbuTTYnkpqnln1wnc4GmGZLOhMwAM3xo5YL1dC410=; b=Qi32kZlHrhTHtuVPwhI4i1FxSIFan7LWmKBxTVX81SRhfatws/m7LWI7w9ijSuPMItVdkKiSaORD0Y1IELBlcLOcjfEs1xrfbXrlAHxOqP2x4AtEysqrCHlFafH0zYx01DNaHxcwfXUFxQp4L1zgBX322NQGOWHrz8I0WP14oTtaotTDW7UpaM+bVX65iArN9JU7A+o/C6m2VtANO2m1gFwIXsbCjWT8Ox4SL9BizlUmDacnpEeWgj57qL6PIBAk9vVpca6H5E/tnBJBOMz3KsSJD5wrFRjr/KAGW+/txeaCY9YecHefrnc+cvuG/SGNNMEKEYaGx4n1+P78j7Piqg== ARC-Authentication-Results: i=1; mx.microsoft.com 1; spf=pass smtp.mailfrom=intel.com; dmarc=pass action=none header.from=intel.com; dkim=pass header.d=intel.com; arc=none Authentication-Results: dkim=none (message not signed) header.d=none;dmarc=none action=none header.from=intel.com; Received: from DS0PR11MB7309.namprd11.prod.outlook.com (2603:10b6:8:13e::17) by DS0PR11MB7767.namprd11.prod.outlook.com (2603:10b6:8:138::5) with Microsoft SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id 15.20.6319.20; Tue, 18 Apr 2023 12:54:14 +0000 Received: from DS0PR11MB7309.namprd11.prod.outlook.com ([fe80::695b:260c:f397:2b69]) by DS0PR11MB7309.namprd11.prod.outlook.com ([fe80::695b:260c:f397:2b69%4]) with mapi id 15.20.6298.030; Tue, 18 Apr 2023 12:54:14 +0000 Date: Tue, 18 Apr 2023 13:54:08 +0100 From: Bruce Richardson To: Morten =?iso-8859-1?Q?Br=F8rup?= CC: , , Subject: Re: [PATCH] mempool: optimize get objects with constant n Message-ID: References: <20230411064845.37713-1-mb@smartsharesystems.com> <98CBD80474FA8B44BF855DF32C47DC35D8788D@smartserver.smartshare.dk> Content-Type: text/plain; charset="iso-8859-1" Content-Disposition: inline Content-Transfer-Encoding: 8bit In-Reply-To: <98CBD80474FA8B44BF855DF32C47DC35D8788D@smartserver.smartshare.dk> X-ClientProxiedBy: LO4P302CA0033.GBRP302.PROD.OUTLOOK.COM (2603:10a6:600:317::8) To DS0PR11MB7309.namprd11.prod.outlook.com (2603:10b6:8:13e::17) MIME-Version: 1.0 X-MS-PublicTrafficType: Email X-MS-TrafficTypeDiagnostic: DS0PR11MB7309:EE_|DS0PR11MB7767:EE_ X-MS-Office365-Filtering-Correlation-Id: eabe3df6-4be9-40c8-0874-08db400c0327 X-LD-Processed: 46c98d88-e344-4ed4-8496-4ed7712e255d,ExtAddr X-MS-Exchange-SenderADCheck: 1 X-MS-Exchange-AntiSpam-Relay: 0 X-Microsoft-Antispam: BCL:0; X-Microsoft-Antispam-Message-Info: 9u6oR6+6NH+ICRlVOLUo+od6v4wkn+/N3Vw65RcCiOdM2H2b+BQYD6h15Bfo7P+5gv6X1kvYXtYTAegj/rgKF4Mzng2Mq06qHPKwJ4Ae7T2vgV9nIumcBzrEHOowx0BKifs72ln+MpfcURTI+t+CJ5OLo3jIm/A3IYAjw9PMArAOAMTE1EW+dGu0WVGHIqBjGyqfJ4MA2Bd7Gsho1FaSRmj2tBHR9ZFevbCyGAIXNZgugRS2vGQhRpOY8qiqH3yjWn5bitoRS4iDYxZ/9uIAHPCdh0Wbr66Jxhmhlrb4A3Zp0RyVqQFnDIgOh/SCGDBQtPYkn58ol03XdkMbtm1YdMj9wr1MFMog8+C47rnPmeRqyo2jmwLurTWepGk4J4BHpGBOSro6nmu2VJK9OWxCvgrhilNkVO+zne4XbyL3GnTlshsUUsiGfwsAJTOvr7MoA1k4ZguSQLiryewussJxoY+eTqS63pOsQZ9AZ6I/rSWmhAh4L0WLR4zVhRhqmnT2rkx25H8D86tkWkNH80zUi3SqguDuMcEswiXYqtqB6vemgHoV7sW9EBZ6AA10gJmwl4+pYjj4zMhZTSfWbmCbfWCo7w7yPNb492kOofGHsln+/Tr/jae9ob6mAoVJbQmH X-Forefront-Antispam-Report: CIP:255.255.255.255; CTRY:; LANG:en; SCL:1; SRV:; IPV:NLI; SFV:NSPM; H:DS0PR11MB7309.namprd11.prod.outlook.com; PTR:; CAT:NONE; SFS:(13230028)(136003)(366004)(346002)(396003)(39860400002)(376002)(451199021)(6666004)(6486002)(478600001)(86362001)(66574015)(83380400001)(186003)(6506007)(6512007)(26005)(82960400001)(38100700002)(6916009)(316002)(4326008)(66476007)(66556008)(66946007)(2906002)(44832011)(8676002)(8936002)(5660300002)(41300700001)(129723003); DIR:OUT; SFP:1102; X-MS-Exchange-AntiSpam-MessageData-ChunkCount: 1 X-MS-Exchange-AntiSpam-MessageData-0: =?iso-8859-1?Q?etDdzLH/eSooIS0jKtQSPvRYI6TVH+4nENraUdUaZe6qIV01sqo5KWKXh6?= =?iso-8859-1?Q?M9xmGFVi1Pbdgae/1QCgLxOqOMOgwI9bFeBOnXpjRMDr3NmHyo9F8y9ZQv?= =?iso-8859-1?Q?HEHEM6o3UD8CzXs32A0WBUhW6jfP1+H0SFhYhLwdkpe3+hVGxjsng5bBZQ?= =?iso-8859-1?Q?iCrTTEYW3NAgwOSsvgLD258ujGhQGDfunSuDx1iYX0GM7jAJE5ofvldduP?= =?iso-8859-1?Q?MwT9CUycSFKnGDCowCZXeeN7Cl9lEPuNmWuVODjgIKZi23XyKbPtPwHhNs?= =?iso-8859-1?Q?glEB630TfqJlb/LUdEaaWncdeJcsvrvZuSVq8QYt0g1AKgiC7Knb8Ia5rX?= =?iso-8859-1?Q?0X2OOS5KUA6oAP6Ydaf6ipzooQ3Lw7vGrH9mlQe9R/uLlRh/CGurJnwtuh?= =?iso-8859-1?Q?PYDkvbBVuV5w+GWRhvrJ4y6ADZtnYygtGNgt5eTvgcmQJXsuvcgNfwniC9?= =?iso-8859-1?Q?x4r83+6ksMrAyql5GD23x/mj0baiyVbCbuen8iYw6PJ8AWXwC6/lLStVuX?= =?iso-8859-1?Q?CU+qSs+8hTz4D0A09Qh5ljUAjjdDKULLWvtBuC0AvNRTLWKJROAJDlf9l6?= =?iso-8859-1?Q?OhCPyhBWQC7zznRvS+l+m9fJAxVgxh9guVeKW/rvFM/TZsow3INnhPKrWq?= =?iso-8859-1?Q?MBzkRQn9LiW1voCfE2QCC3MQcKp05/d+V/EWkpyI6svQpALlkA2Rm8wzOm?= =?iso-8859-1?Q?86UCR76LfaMYVrIER84rmX0G87OXoMtb5efCvV9XcxLy4lyAvxA2nty01c?= =?iso-8859-1?Q?AYQxIMx++sT7fyqik3y3D3cm2rLrIHsiAk80i2j5+OfLX2f1YFwIDel+ai?= =?iso-8859-1?Q?/AG2PTtD7jpCfIBBWYw40C6k09R33Jsw9dslww36ySNHDV03FtuNuQVDNU?= =?iso-8859-1?Q?n280vSkG+OXf0mn/JnnMFwXOm6d8vCCKfdJfqQZbtz8LkRChcBfYuDH1vf?= =?iso-8859-1?Q?kGLKsNcmbnQcO44nd84EOqiL1/4/+6rXjE8mDRyQMDsbZKoPCYavwTfSin?= =?iso-8859-1?Q?S7jPU1D+pmZmVPKiFEGynIqP6+i/7Z6tPwBn1yNOIX+D0pThm7Yrt/ujr6?= =?iso-8859-1?Q?Eom5xlzRT2YixQotKcKe2fZFvkDCfYbYm4z+4bbiMSL35pQOVS4+TWQAhm?= =?iso-8859-1?Q?DJ2vWX3YzpMNaAMzySq1ov/NO+NW9jESD6ErJw+M/+RYxYtUdOeSoYgpGi?= =?iso-8859-1?Q?Lcpc/7k3QV9OjY53hCGJ0fhmknZlM4XBq3W7F2ryTIUoefauC7dcKDJWSz?= =?iso-8859-1?Q?6xYsqncNnm3axln/y3F/gtuHLImNhydu47Q2Z+bfNxMGwHUkfj0ChiJYTn?= =?iso-8859-1?Q?H9IfxpKWVpd7X1qtBIZVvxk7ya1kTqm5x5uzO+04/1QrR5laLKvkqZL+24?= =?iso-8859-1?Q?Qo03JLNdba7bY2CZ+8Jap6yah5C8zP208h51GurVUlDilyQRGBN5k2/jJ4?= =?iso-8859-1?Q?W4iS3g1hHFB6KWRx47IvQ57Bl5WigBRq+pvu1fKGRszYWKNtrkSrXgwCKE?= =?iso-8859-1?Q?92ykAmuJOmOFHW7A0p/Lygp8LdUuXzhLgpno9NOXQov/A4F+2+cnH32f0r?= =?iso-8859-1?Q?R75EXEDhZ9Lx45fRh/jHjPJI5xKUrFsABXj08cvct2tPFfbqkaGnEDJitF?= =?iso-8859-1?Q?J90CCUEOhMmGZUv+xAK5MkR47pfGFqCqZk0htqZByYBSM93x8wwyAHhA?= =?iso-8859-1?Q?=3D=3D?= X-MS-Exchange-CrossTenant-Network-Message-Id: eabe3df6-4be9-40c8-0874-08db400c0327 X-MS-Exchange-CrossTenant-AuthSource: DS0PR11MB7309.namprd11.prod.outlook.com X-MS-Exchange-CrossTenant-AuthAs: Internal X-MS-Exchange-CrossTenant-OriginalArrivalTime: 18 Apr 2023 12:54:14.2847 (UTC) X-MS-Exchange-CrossTenant-FromEntityHeader: Hosted X-MS-Exchange-CrossTenant-Id: 46c98d88-e344-4ed4-8496-4ed7712e255d X-MS-Exchange-CrossTenant-MailboxType: HOSTED X-MS-Exchange-CrossTenant-UserPrincipalName: NvYLvnc5evJf90yrMmA4KD5qUE2VomZzBm1fokEwSgK3gFDjKPwMguKqw+cnXsY+RptVcG26vFkrkSDk615xqooZMXHS2nboPl/EspVlPkI= X-MS-Exchange-Transport-CrossTenantHeadersStamped: DS0PR11MB7767 X-OriginatorOrg: intel.com X-BeenThere: dev@dpdk.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: DPDK patches and discussions List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: dev-bounces@dpdk.org On Tue, Apr 18, 2023 at 01:29:49PM +0200, Morten Brørup wrote: > > From: Bruce Richardson [mailto:bruce.richardson@intel.com] > > Sent: Tuesday, 18 April 2023 13.07 > > > > On Tue, Apr 11, 2023 at 08:48:45AM +0200, Morten Brørup wrote: > > > When getting objects from the mempool, the number of objects to get is > > > often constant at build time. > > > > > > This patch adds another code path for this case, so the compiler can > > > optimize more, e.g. unroll the copy loop when the entire request is > > > satisfied from the cache. > > > > > > On an Intel(R) Xeon(R) E5-2620 v4 CPU, and compiled with gcc 9.4.0, > > > mempool_perf_test with constant n shows an increase in rate_persec by an > > > average of 17 %, minimum 9.5 %, maximum 24 %. > > > > > > The code path where the number of objects to get is unknown at build time > > > remains essentially unchanged. > > > > > > Signed-off-by: Morten Brørup > > > > Change looks a good idea. Some suggestions inline below, which you may want to > > take on board for any future version. I'd strongly suggest adding some > > extra clarifying code comments, as I suggest below. > > With those exta code comments: > > > > Acked-by: Bruce Richardson > > > > > --- > > > lib/mempool/rte_mempool.h | 24 +++++++++++++++++++++--- > > > 1 file changed, 21 insertions(+), 3 deletions(-) > > > > > > diff --git a/lib/mempool/rte_mempool.h b/lib/mempool/rte_mempool.h > > > index 9f530db24b..ade0100ec7 100644 > > > --- a/lib/mempool/rte_mempool.h > > > +++ b/lib/mempool/rte_mempool.h > > > @@ -1500,15 +1500,33 @@ rte_mempool_do_generic_get(struct rte_mempool *mp, > > void **obj_table, > > > if (unlikely(cache == NULL)) > > > goto driver_dequeue; > > > > > > - /* Use the cache as much as we have to return hot objects first */ > > > - len = RTE_MIN(remaining, cache->len); > > > cache_objs = &cache->objs[cache->len]; > > > + > > > + if (__extension__(__builtin_constant_p(n)) && n <= cache->len) { > > > + /* > > > + * The request size is known at build time, and > > > + * the entire request can be satisfied from the cache, > > > + * so let the compiler unroll the fixed length copy loop. > > > + */ > > > + cache->len -= n; > > > + for (index = 0; index < n; index++) > > > + *obj_table++ = *--cache_objs; > > > + > > > > This loop looks a little awkward to me. Would it be clearer (and perhaps > > easier for compilers to unroll efficiently if it was rewritten as: > > > > cache->len -= n; > > cache_objs = &cache->objs[cache->len]; > > for (index = 0; index < n; index++) > > obj_table[index] = cache_objs[index]; > > The mempool cache is a stack, so the copy loop needs get the objects in decrementing order. I.e. the source index decrements and the destination index increments. > > Regardless, your point here is still valid! I expected that any unrolling capable compiler can unroll *dst++ = *--src; but I can experiment with different compilers on godbolt.org to see if dst[index] = src[-index] is better. > > A future version could be hand coded with vector instructions here, like in the Ethdev drivers. > > > > > alternatively those last two lines can be replaced by a memcpy, which the > > compiler should nicely optimize itself, for constant size copy: > > > > memcpy(obj_table, cache_objs, sizeof(obj_table[0]) * n); > > Just for reference: It was previously discussed to ignore the stack ordering and simply copy the objects, but the idea was rejected due to the potential performance impact of not returning the hottest objects, i.e. the objects at the top of the stack, as the first in the returned array. > Ah, yes, I had forgotten that we want the order reversed, so most recent objects first. It would be interesting to see if ignoring order within a burst can lead to some performance improvements, as I expect that in most cases, the buffers to be used in a burst are to be used pretty much simultaneously. However, based on your feedback, feel free to ignore this comment and keep code as-is. > > > > > + RTE_MEMPOOL_CACHE_STAT_ADD(cache, get_success_bulk, 1); > > > + RTE_MEMPOOL_CACHE_STAT_ADD(cache, get_success_objs, n); > > > + > > > + return 0; > > > + } > > > + > > > + /* Use the cache as much as we have to return hot objects first */ > > > + len = __extension__(__builtin_constant_p(n)) ? cache->len : > > > + RTE_MIN(remaining, cache->len); > > > > This line confused me a lot, until I applied the patch and stared at it a > > lot :-). Why would the length value depend upon whether or not n is a > > compile-time constant? > > I think it would really help here to add in a comment stating that when n > > is a compile-time constant, at this point it much be "> cache->len" since > > the previous block was untaken, therefore we don't need to call RTE_MIN. > > I agree. This is almost like perl... write-only source code. > > I will add a few explanatory comments. > Thanks. > > > > > cache->len -= len; > > > remaining -= len; > > > for (index = 0; index < len; index++) > > > *obj_table++ = *--cache_objs; > > > > > > - if (remaining == 0) { > > > + if (!__extension__(__builtin_constant_p(n)) && remaining == 0) { > > > /* The entire request is satisfied from the cache. */ > > > > > > RTE_MEMPOOL_CACHE_STAT_ADD(cache, get_success_bulk, 1); > > > > Once I'd worked out the logic for the above conditional check, then this > > conditional adjustment was clear. I just wonder if a further comment might > > help here. > > > > I am also wondering if having larger blocks for the constant and > > non-constant cases may help. It would lead to some duplication but may > > clarify the code. > > I get your point, but the code is still short enough to grasp (after comments have been added). So I prefer to avoid code duplication. > Sure. > > > > > -- > > > 2.17.1 > > >