From: "Ananyev, Konstantin"
To: Dharmik Thakkar, Olivier Matz, Andrew Rybchenko
CC: "dev@dpdk.org", "nd@arm.com", "honnappa.nagarahalli@arm.com", "ruifeng.wang@arm.com"
Subject: RE: [PATCH 1/1] mempool: implement index-based per core cache
Date: Tue, 11 Jan 2022 02:26:45 +0000
References: <20210930172735.2675627-1-dharmik.thakkar@arm.com> <20211224225923.806498-1-dharmik.thakkar@arm.com> <20211224225923.806498-2-dharmik.thakkar@arm.com>
In-Reply-To: <20211224225923.806498-2-dharmik.thakkar@arm.com>
List-Id: DPDK patches and discussions

> Current mempool per core cache implementation stores pointers to mbufs
> On 64b architectures, each pointer consumes 8B
> This patch replaces it with index-based implementation,
> where in each buffer is addressed by (pool base address + index)
> It reduces the amount of memory/cache required for per core cache
>
> L3Fwd performance testing reveals minor improvements in the cache
> performance (L1 and L2 misses reduced by 0.60%)
> with no change in throughput

I feel really sceptical about that patch and the whole idea in general:
- From what I read above, there is no real performance improvement observed.
  (In fact, on my IA boxes mempool_perf_autotest reports a ~20% slowdown,
  see below for more details.)
- The space utilization difference looks negligible too.
- The change introduces a new build-time config option with a major limitation:
  all memzones in a pool have to be within the same 4GB boundary.
  To address it properly, extra changes will be required in the init(/populate)
  part of the code.
  All that will complicate the mempool code and make it more error-prone
  and harder to maintain.
But as there is no real gain in return, there is no point in adding such extra
complexity at all.
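To make the 4GB limitation concrete, here is a minimal sketch of the check that the init/populate path would need. It is not taken from the patch: `struct chunk` and `chunks_fit_u32_window()` are hypothetical names, and the only assumption carried over from the patch is that objects are stored as 32-bit offsets from a single base address.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical description of one memory chunk backing a pool. */
struct chunk {
    uint64_t addr; /* start virtual address of the chunk */
    uint64_t len;  /* length of the chunk in bytes */
};

/*
 * Return 1 if every byte of every chunk can be written as
 * base + (uint32_t)offset, and 0 otherwise.
 */
static int
chunks_fit_u32_window(uint64_t base, const struct chunk *c, size_t n)
{
    size_t i;

    for (i = 0; i < n; i++) {
        uint64_t start_off;

        if (c[i].len == 0)
            continue;
        if (c[i].addr < base)
            return 0; /* offset would be negative */
        start_off = c[i].addr - base;
        if (start_off > UINT32_MAX)
            return 0; /* chunk starts beyond the 4GB window */
        if (c[i].len - 1 > UINT32_MAX - start_off)
            return 0; /* chunk ends beyond the 4GB window */
    }
    return 1;
}
```

Since the patch records the first populated object as the base, a chunk that happens to sit at a lower address than that object would fail the same check, not only chunks more than 4GB above it.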
Konstantin

CSX 2.1 GHz
==========

echo 'mempool_perf_autotest' | ./dpdk-test -n 4 --lcores='6-13' --no-pci

params :                                                    rate_persec
                                                            (normal/index-based/diff %)
(with cache)
cache=512 cores=1 n_get_bulk=32 n_put_bulk=32 n_keep=32  : 740989337.00/504116019.00/-31.97
cache=512 cores=1 n_get_bulk=32 n_put_bulk=32 n_keep=128 : 756495155.00/615002931.00/-18.70
cache=512 cores=2 n_get_bulk=32 n_put_bulk=32 n_keep=32  : 1483499110.00/1007248997.00/-32.10
cache=512 cores=2 n_get_bulk=32 n_put_bulk=32 n_keep=128 : 1512439807.00/1229927218.00/-18.68
cache=512 cores=8 n_get_bulk=32 n_put_bulk=32 n_keep=32  : 5933668757.00/4029048421.00/-32.10
cache=512 cores=8 n_get_bulk=32 n_put_bulk=32 n_keep=128 : 6049234942.00/4921111344.00/-18.65

(with user-owned cache)
cache=512 cores=1 n_get_bulk=32 n_put_bulk=32 n_keep=32  : 630600499.00/504312627.00/-20.03
cache=512 cores=1 n_get_bulk=32 n_put_bulk=32 n_keep=128 : 756259225.00/615042252.00/-18.67
cache=512 cores=2 n_get_bulk=32 n_put_bulk=32 n_keep=32  : 1262052966.00/1007039283.00/-20.21
cache=512 cores=2 n_get_bulk=32 n_put_bulk=32 n_keep=128 : 1517853081.00/1230818508.00/-18.91
cache=512 cores=8 n_get_bulk=32 n_put_bulk=32 n_keep=32  : 5054529533.00/4028052273.00/-20.31
cache=512 cores=8 n_get_bulk=32 n_put_bulk=32 n_keep=128 : 6059340592.00/4912893129.00/-18.92

>
> Suggested-by: Honnappa Nagarahalli
> Signed-off-by: Dharmik Thakkar
> Reviewed-by: Ruifeng Wang
> ---
>  lib/mempool/rte_mempool.h             | 114 +++++++++++++++++++++++++-
>  lib/mempool/rte_mempool_ops_default.c |   7 ++
>  2 files changed, 119 insertions(+), 2 deletions(-)
>
> diff --git a/lib/mempool/rte_mempool.h b/lib/mempool/rte_mempool.h
> index 1e7a3c15273c..4fabd3b1920b 100644
> --- a/lib/mempool/rte_mempool.h
> +++ b/lib/mempool/rte_mempool.h
> @@ -50,6 +50,10 @@
>  #include
>  #include
>
> +#ifdef RTE_MEMPOOL_INDEX_BASED_LCORE_CACHE
> +#include
> +#endif
> +
>  #include "rte_mempool_trace_fp.h"
>
>  #ifdef __cplusplus
> @@ -239,6 +243,9 @@ struct rte_mempool {
>      int32_t ops_index;
>
>      struct rte_mempool_cache *local_cache; /**< Per-lcore local cache */
> +#ifdef RTE_MEMPOOL_INDEX_BASED_LCORE_CACHE
> +    void *pool_base_value; /**< Base value to calculate indices */
> +#endif
>
>      uint32_t populated_size; /**< Number of populated objects. */
>      struct rte_mempool_objhdr_list elt_list; /**< List of objects in pool */
> @@ -1314,7 +1321,19 @@ rte_mempool_cache_flush(struct rte_mempool_cache *cache,
>      if (cache == NULL || cache->len == 0)
>          return;
>      rte_mempool_trace_cache_flush(cache, mp);
> +
> +#ifdef RTE_MEMPOOL_INDEX_BASED_LCORE_CACHE
> +    unsigned int i;
> +    unsigned int cache_len = cache->len;
> +    void *obj_table[RTE_MEMPOOL_CACHE_MAX_SIZE * 3];
> +    void *base_value = mp->pool_base_value;
> +    uint32_t *cache_objs = (uint32_t *) cache->objs;
> +    for (i = 0; i < cache_len; i++)
> +        obj_table[i] = (void *) RTE_PTR_ADD(base_value, cache_objs[i]);
> +    rte_mempool_ops_enqueue_bulk(mp, obj_table, cache->len);
> +#else
>      rte_mempool_ops_enqueue_bulk(mp, cache->objs, cache->len);
> +#endif
>      cache->len = 0;
>  }
>
> @@ -1334,8 +1353,13 @@ static __rte_always_inline void
>  rte_mempool_do_generic_put(struct rte_mempool *mp, void * const *obj_table,
>          unsigned int n, struct rte_mempool_cache *cache)
>  {
> +#ifdef RTE_MEMPOOL_INDEX_BASED_LCORE_CACHE
> +    uint32_t *cache_objs;
> +    void *base_value;
> +    uint32_t i;
> +#else
>      void **cache_objs;
> -
> +#endif
>      /* increment stat now, adding in mempool always success */
>      RTE_MEMPOOL_STAT_ADD(mp, put_bulk, 1);
>      RTE_MEMPOOL_STAT_ADD(mp, put_objs, n);
> @@ -1344,7 +1368,13 @@ rte_mempool_do_generic_put(struct rte_mempool *mp, void * const *obj_table,
>      if (unlikely(cache == NULL || n > RTE_MEMPOOL_CACHE_MAX_SIZE))
>          goto ring_enqueue;
>
> +#ifdef RTE_MEMPOOL_INDEX_BASED_LCORE_CACHE
> +    cache_objs = (uint32_t *) cache->objs;
> +    cache_objs = &cache_objs[cache->len];
> +    base_value = mp->pool_base_value;
> +#else
>      cache_objs = &cache->objs[cache->len];
> +#endif
>
>      /*
>       * The cache follows the following algorithm
> @@ -1354,13 +1384,40 @@ rte_mempool_do_generic_put(struct rte_mempool *mp, void * const *obj_table,
>       */
>
>      /* Add elements back into the cache */
> +
> +#ifdef RTE_MEMPOOL_INDEX_BASED_LCORE_CACHE
> +#if defined __ARM_NEON
> +    uint64x2_t v_obj_table;
> +    uint64x2_t v_base_value = vdupq_n_u64((uint64_t)base_value);
> +    uint32x2_t v_cache_objs;
> +
> +    for (i = 0; i < (n & ~0x1); i += 2) {
> +        v_obj_table = vld1q_u64((const uint64_t *)&obj_table[i]);
> +        v_cache_objs = vqmovn_u64(vsubq_u64(v_obj_table, v_base_value));
> +        vst1_u32(cache_objs + i, v_cache_objs);
> +    }
> +    if (n & 0x1) {
> +        cache_objs[i] = (uint32_t) RTE_PTR_DIFF(obj_table[i], base_value);
> +    }
> +#else
> +    for (i = 0; i < n; i++) {
> +        cache_objs[i] = (uint32_t) RTE_PTR_DIFF(obj_table[i], base_value);
> +    }
> +#endif
> +#else
>      rte_memcpy(&cache_objs[0], obj_table, sizeof(void *) * n);
> +#endif
>
>      cache->len += n;
>
>      if (cache->len >= cache->flushthresh) {
> +#ifdef RTE_MEMPOOL_INDEX_BASED_LCORE_CACHE
> +        rte_mempool_ops_enqueue_bulk(mp, obj_table + cache->len - cache->size,
> +                cache->len - cache->size);
> +#else
>          rte_mempool_ops_enqueue_bulk(mp, &cache->objs[cache->size],
>                  cache->len - cache->size);
> +#endif
>          cache->len = cache->size;
>      }
>
> @@ -1461,13 +1518,22 @@ rte_mempool_do_generic_get(struct rte_mempool *mp, void **obj_table,
>  {
>      int ret;
>      uint32_t index, len;
> +#ifdef RTE_MEMPOOL_INDEX_BASED_LCORE_CACHE
> +    uint32_t i;
> +    uint32_t *cache_objs;
> +#else
>      void **cache_objs;
> -
> +#endif
>      /* No cache provided or cannot be satisfied from cache */
>      if (unlikely(cache == NULL || n >= cache->size))
>          goto ring_dequeue;
>
> +#ifdef RTE_MEMPOOL_INDEX_BASED_LCORE_CACHE
> +    void *base_value = mp->pool_base_value;
> +    cache_objs = (uint32_t *) cache->objs;
> +#else
>      cache_objs = cache->objs;
> +#endif
>
>      /* Can this be satisfied from the cache? */
>      if (cache->len < n) {
> @@ -1475,8 +1541,14 @@ rte_mempool_do_generic_get(struct rte_mempool *mp, void **obj_table,
>          uint32_t req = n + (cache->size - cache->len);
>
>          /* How many do we require i.e. number to fill the cache + the request */
> +#ifdef RTE_MEMPOOL_INDEX_BASED_LCORE_CACHE
> +        void *temp_objs[RTE_MEMPOOL_CACHE_MAX_SIZE * 3]; /**< Cache objects */
> +        ret = rte_mempool_ops_dequeue_bulk(mp,
> +            temp_objs, req);
> +#else
>          ret = rte_mempool_ops_dequeue_bulk(mp,
>              &cache->objs[cache->len], req);
> +#endif
>          if (unlikely(ret < 0)) {
>              /*
>               * In the off chance that we are buffer constrained,
> @@ -1487,12 +1559,50 @@ rte_mempool_do_generic_get(struct rte_mempool *mp, void **obj_table,
>              goto ring_dequeue;
>          }
>
> +#ifdef RTE_MEMPOOL_INDEX_BASED_LCORE_CACHE
> +        len = cache->len;
> +        for (i = 0; i < req; ++i, ++len) {
> +            cache_objs[len] = (uint32_t) RTE_PTR_DIFF(temp_objs[i],
> +                    base_value);
> +        }
> +#endif
>          cache->len += req;
>      }
>
>      /* Now fill in the response ... */
> +#ifdef RTE_MEMPOOL_INDEX_BASED_LCORE_CACHE
> +#if defined __ARM_NEON
> +    uint64x2_t v_obj_table;
> +    uint64x2_t v_cache_objs;
> +    uint64x2_t v_base_value = vdupq_n_u64((uint64_t)base_value);
> +
> +    for (index = 0, len = cache->len - 1; index < (n & ~0x3); index += 4,
> +            len -= 4, obj_table += 4) {
> +        v_cache_objs = vmovl_u32(vld1_u32(cache_objs + len - 1));
> +        v_obj_table = vaddq_u64(v_cache_objs, v_base_value);
> +        vst1q_u64((uint64_t *)obj_table, v_obj_table);
> +        v_cache_objs = vmovl_u32(vld1_u32(cache_objs + len - 3));
> +        v_obj_table = vaddq_u64(v_cache_objs, v_base_value);
> +        vst1q_u64((uint64_t *)(obj_table + 2), v_obj_table);
> +    }
> +    switch (n & 0x3) {
> +    case 3:
> +        *(obj_table++) = (void *) RTE_PTR_ADD(base_value, cache_objs[len--]);
> +        /* fallthrough */
> +    case 2:
> +        *(obj_table++) = (void *) RTE_PTR_ADD(base_value, cache_objs[len--]);
> +        /* fallthrough */
> +    case 1:
> +        *(obj_table++) = (void *) RTE_PTR_ADD(base_value, cache_objs[len--]);
> +    }
> +#else
> +    for (index = 0, len = cache->len - 1; index < n; ++index, len--, obj_table++)
> +        *obj_table = (void *) RTE_PTR_ADD(base_value, cache_objs[len]);
> +#endif
> +#else
>      for (index = 0, len = cache->len - 1; index < n; ++index, len--, obj_table++)
>          *obj_table = cache_objs[len];
> +#endif
>
>      cache->len -= n;
>
> diff --git a/lib/mempool/rte_mempool_ops_default.c b/lib/mempool/rte_mempool_ops_default.c
> index 22fccf9d7619..3543cad9d4ce 100644
> --- a/lib/mempool/rte_mempool_ops_default.c
> +++ b/lib/mempool/rte_mempool_ops_default.c
> @@ -127,6 +127,13 @@ rte_mempool_op_populate_helper(struct rte_mempool *mp, unsigned int flags,
>          obj = va + off;
>          obj_cb(mp, obj_cb_arg, obj,
>                  (iova == RTE_BAD_IOVA) ? RTE_BAD_IOVA : (iova + off));
> +#ifdef RTE_MEMPOOL_INDEX_BASED_LCORE_CACHE
> +        /* Store pool base value to calculate indices for index-based
> +         * lcore cache implementation
> +         */
> +        if (i == 0)
> +            mp->pool_base_value = obj;
> +#endif
>          rte_mempool_ops_enqueue_bulk(mp, &obj, 1);
>          off += mp->elt_size + mp->trailer_size;
>      }
> --
> 2.25.1
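
For readers following the quoted diff, the scalar (non-NEON) branches boil down to the round trip sketched below: a put stores each pointer as a 32-bit offset from the pool base, and a get rebuilds pointers from the top of the cache downwards. This is a simplified standalone illustration, not the DPDK code. `struct idx_cache`, `idx_cache_put()`, `idx_cache_get()` and `CACHE_CAP` are hypothetical names, and the flush and refill paths are left out.

```c
#include <stdint.h>

#define CACHE_CAP 8 /* illustrative capacity, not a DPDK constant */

/* Hypothetical per-core cache holding 32-bit offsets instead of pointers. */
struct idx_cache {
    void *base;               /* pool base address */
    uint32_t len;             /* number of offsets currently stored */
    uint32_t objs[CACHE_CAP]; /* offsets relative to base */
};

/* Store n pointers as offsets; returns 0 on success, -1 if they do not fit. */
static int
idx_cache_put(struct idx_cache *c, void * const *tbl, uint32_t n)
{
    uint32_t i;

    if (n > CACHE_CAP - c->len)
        return -1;
    for (i = 0; i < n; i++)
        c->objs[c->len + i] =
            (uint32_t)((uintptr_t)tbl[i] - (uintptr_t)c->base);
    c->len += n;
    return 0;
}

/* Pop n objects in LIFO order, converting offsets back to pointers. */
static int
idx_cache_get(struct idx_cache *c, void **tbl, uint32_t n)
{
    uint32_t i;

    if (n > c->len)
        return -1;
    for (i = 0; i < n; i++)
        tbl[i] = (char *)c->base + c->objs[c->len - 1 - i];
    c->len -= n;
    return 0;
}
```

The sketch only stays correct while every object lies within 4GB above `base`, because the cast to `uint32_t` silently truncates larger differences.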