From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mails.dpdk.org (mails.dpdk.org [217.70.189.124]) by inbox.dpdk.org (Postfix) with ESMTP id 285FDA00C4; Thu, 13 Jan 2022 06:17:43 +0100 (CET) Received: from [217.70.189.124] (localhost [127.0.0.1]) by mails.dpdk.org (Postfix) with ESMTP id 91BE440DF6; Thu, 13 Jan 2022 06:17:42 +0100 (CET) Received: from EUR04-VI1-obe.outbound.protection.outlook.com (mail-eopbgr80053.outbound.protection.outlook.com [40.107.8.53]) by mails.dpdk.org (Postfix) with ESMTP id E4A5B40150 for ; Thu, 13 Jan 2022 06:17:40 +0100 (CET) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=armh.onmicrosoft.com; s=selector2-armh-onmicrosoft-com; h=From:Date:Subject:Message-ID:Content-Type:MIME-Version:X-MS-Exchange-SenderADCheck; bh=jes4XZF20f0TtmVx6jjlx80Qqzo4g+WHiFhj1hrgx2w=; b=JWJPxz2fklunzQLYaIKaiRB4EFKRMQkrkCWR/keiiQ2FFynN2Og77n4B0yvMUaWQnkug6qZNvKrEdGi2tsdzhDXfTqOnS/9O/PKqGxfE0Yi/A65WARTC9fkrLLk6MVIEs2C6FE7Zj4N996dndXPSQtFTkitv5e+ZfvSoHbVc06g= Received: from AS9PR05CA0012.eurprd05.prod.outlook.com (2603:10a6:20b:488::30) by DBBPR08MB6028.eurprd08.prod.outlook.com (2603:10a6:10:208::6) with Microsoft SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id 15.20.4888.11; Thu, 13 Jan 2022 05:17:36 +0000 Received: from AM5EUR03FT016.eop-EUR03.prod.protection.outlook.com (2603:10a6:20b:488:cafe::36) by AS9PR05CA0012.outlook.office365.com (2603:10a6:20b:488::30) with Microsoft SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id 15.20.4867.9 via Frontend Transport; Thu, 13 Jan 2022 05:17:36 +0000 X-MS-Exchange-Authentication-Results: spf=pass (sender IP is 63.35.35.123) smtp.mailfrom=arm.com; dkim=pass (signature was verified) header.d=armh.onmicrosoft.com;dmarc=pass action=none header.from=arm.com; Received-SPF: Pass (protection.outlook.com: domain of arm.com designates 63.35.35.123 as permitted sender) receiver=protection.outlook.com; client-ip=63.35.35.123; helo=64aa7808-outbound-1.mta.getcheckrecipient.com; Received: from 64aa7808-outbound-1.mta.getcheckrecipient.com (63.35.35.123) by AM5EUR03FT016.mail.protection.outlook.com (10.152.16.142) with Microsoft SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id 15.20.4888.9 via Frontend Transport; Thu, 13 Jan 2022 05:17:36 +0000 Received: ("Tessian outbound de6049708a0a:v110"); Thu, 13 Jan 2022 05:17:36 +0000 X-CheckRecipientChecked: true X-CR-MTA-CID: 88b74651cedc5241 X-CR-MTA-TID: 64aa7808 Received: from c3686fb74c08.1 by 64aa7808-outbound-1.mta.getcheckrecipient.com id 736E4C9C-ED94-48ED-94D7-60646C8085DE.1; Thu, 13 Jan 2022 05:17:25 +0000 Received: from EUR02-AM5-obe.outbound.protection.outlook.com by 64aa7808-outbound-1.mta.getcheckrecipient.com with ESMTPS id c3686fb74c08.1 (version=TLSv1.2 cipher=ECDHE-RSA-AES256-GCM-SHA384); Thu, 13 Jan 2022 05:17:25 +0000 ARC-Seal: i=1; a=rsa-sha256; s=arcselector9901; d=microsoft.com; cv=none; b=DjFA4DGBgOkLU9jMJmbeVJG3Y7f/z2myir63bh6XgBOuaYEzOMxNI1dB9kwfCaw+Fe7Xw+2lMkiJkJvbIYUf508nCt+UJhL06lTeSpG6P9g6dEfpclpBp40+anL+Aj0fGOLRKB7jpM/mQbqM3NhDyGze30Zw3JybDTG2VnnLdycbbZwQIlap+vJu/ik57P66LW5LBM2p2EI2Wzh2aRABtP5UUC6EeV28+emICdKbZp57Ok1aVvd1SkXJFFTw0+rfWmg1E6ZPvh0SXzoXJSRBq+W+ZtFyCUQXRj3fWiKhcohlxRQ53VV8JfjhaN6JWxMQ/YgBPB+SpuLEgNsbnFEEYQ== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=microsoft.com; s=arcselector9901; h=From:Date:Subject:Message-ID:Content-Type:MIME-Version:X-MS-Exchange-AntiSpam-MessageData-ChunkCount:X-MS-Exchange-AntiSpam-MessageData-0:X-MS-Exchange-AntiSpam-MessageData-1; bh=jes4XZF20f0TtmVx6jjlx80Qqzo4g+WHiFhj1hrgx2w=; b=HikyHKqjedJswGbwgYkyc0HIHgQ8tWRlc9A2lRPGOFnVkU+BRUV3CXWxhzhaMfAT51gFpteQIyyJrlaFlI5AVPA6x97A/mpCG9e9Xh0wccq1pBecIaIvYN6exI3mEU+obZagX60UBMDkXaIVK+wKB1438+dROUvzl4FF7DQ2sFwW+xdJlZ271npuDNkOrnyGgK0+q3okg3fjMgiPa603Hv87BnRCeBU2U9b9ZcEILX04uTSpmdr7aPqrNqEnMR03PAxSuwHwUMuml6Jsn/bpOCz2upRPqGf8QWXeS3bAj4Rbi05GwhocZY824RvpLxglOUbJ1vGH2kr+q83XWvxq+Q== ARC-Authentication-Results: i=1; mx.microsoft.com 1; spf=pass smtp.mailfrom=arm.com; dmarc=pass action=none header.from=arm.com; dkim=pass header.d=arm.com; arc=none DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=armh.onmicrosoft.com; s=selector2-armh-onmicrosoft-com; h=From:Date:Subject:Message-ID:Content-Type:MIME-Version:X-MS-Exchange-SenderADCheck; bh=jes4XZF20f0TtmVx6jjlx80Qqzo4g+WHiFhj1hrgx2w=; b=JWJPxz2fklunzQLYaIKaiRB4EFKRMQkrkCWR/keiiQ2FFynN2Og77n4B0yvMUaWQnkug6qZNvKrEdGi2tsdzhDXfTqOnS/9O/PKqGxfE0Yi/A65WARTC9fkrLLk6MVIEs2C6FE7Zj4N996dndXPSQtFTkitv5e+ZfvSoHbVc06g= Received: from VI1PR08MB4622.eurprd08.prod.outlook.com (2603:10a6:803:bc::17) by DB9PR08MB7212.eurprd08.prod.outlook.com (2603:10a6:10:2cf::23) with Microsoft SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id 15.20.4888.11; Thu, 13 Jan 2022 05:17:23 +0000 Received: from VI1PR08MB4622.eurprd08.prod.outlook.com ([fe80::b95b:5090:82a5:9d1f]) by VI1PR08MB4622.eurprd08.prod.outlook.com ([fe80::b95b:5090:82a5:9d1f%3]) with mapi id 15.20.4844.019; Thu, 13 Jan 2022 05:17:23 +0000 From: Dharmik Thakkar To: "Ananyev, Konstantin" CC: Olivier Matz , Andrew Rybchenko , "dev@dpdk.org" , nd , Honnappa Nagarahalli , Ruifeng Wang Subject: Re: [PATCH 1/1] mempool: implement index-based per core cache Thread-Topic: [PATCH 1/1] mempool: implement index-based per core cache Thread-Index: AQHX+RoGDk3+4penq0a+Je4y+m9W36xdMyaAgANUVQA= Date: Thu, 13 Jan 2022 05:17:23 +0000 Message-ID: <8F6CF7E6-BD3D-424B-A7E1-DB6E53276DFE@arm.com> References: <20210930172735.2675627-1-dharmik.thakkar@arm.com> <20211224225923.806498-1-dharmik.thakkar@arm.com> <20211224225923.806498-2-dharmik.thakkar@arm.com> In-Reply-To: Accept-Language: en-US Content-Language: en-US X-MS-Has-Attach: X-MS-TNEF-Correlator: Authentication-Results-Original: dkim=none (message not signed) header.d=none;dmarc=none action=none header.from=arm.com; X-MS-Office365-Filtering-Correlation-Id: a54cdd79-22b1-43e3-58d0-08d9d6540303 x-ms-traffictypediagnostic: DB9PR08MB7212:EE_|AM5EUR03FT016:EE_|DBBPR08MB6028:EE_ X-Microsoft-Antispam-PRVS: x-checkrecipientrouted: true nodisclaimer: true x-ms-oob-tlc-oobclassifiers: OLM:9508;OLM:9508; X-MS-Exchange-SenderADCheck: 1 X-MS-Exchange-AntiSpam-Relay: 0 X-Microsoft-Antispam-Untrusted: BCL:0; X-Microsoft-Antispam-Message-Info-Original: mY0n2g0Hc+O+Ssnn47QqeXP2xIpVoYLeoWRa1SKKq10GBskC05Amv53glxPdIcP/GjcQcCxIB1SvJ962tLcAKEJC+LG6yFWF4fWYXxVY3OIhGV2pgqxeWdT2yy/skUIkDNBXiE/d48mucSP0JChOG6T+fyBxCg1yOZSEqkPAq7SR77UsnD2z9nUS/Qbkjxt5tNSMymXe8HCe0yitnmBlRlF41ySGrsN7BltwP1tP+PXK0HfsFcS98RJp131H6wKKZPDOncMAJvYyyJAw/r1GWRUppoZ1AO/hrTwzxlc97xn9tZ4l9urxAIgoaRfeJoY7sd0AG2u4eTJH4R9SuxSHBzGqOGsjBsIQDU9khW+peGV+KgyWpUU06YIepBS5L4yOn34N2y2F4JNeG+la5MKlt13Lbrcqxs5kx1ZZ9j16ZHan8YMPOVYfKvB2CgYOQjvfiS/BQVZvYDQ1nVB88c4xalgPpvS122RbMiQZtuSD5bA1AiSPAOn4KskH1C+2xXhETbmrzZ2ne3uerAJurPsycoN+lpI6W6CGHgWmvIzU8tuhNFcLM2wDTy+beYbxmqIpFBuHS8mIIcMmFDRwkWifhxcSQ04mCcmDdvQ5GTe7Dpi6nuar5TJGOmMHG1FjMr5j6yg9CgS9iTfo11WwAt5/oRa/Pa5IS0LJGVYaDEcdSpbbAQT9Asm/SHGD8uPtcM2K6+8AyJzg090SBhr5iN36zNH/3DOb37xmeUGCbLD6O24= X-Forefront-Antispam-Report-Untrusted: CIP:255.255.255.255; CTRY:; LANG:en; SCL:1; SRV:; IPV:NLI; SFV:NSPM; H:VI1PR08MB4622.eurprd08.prod.outlook.com; PTR:; CAT:NONE; SFS:(4636009)(366004)(6916009)(8676002)(30864003)(8936002)(83380400001)(54906003)(508600001)(2906002)(186003)(26005)(66446008)(64756008)(66556008)(33656002)(316002)(91956017)(76116006)(4326008)(38100700002)(66476007)(5660300002)(86362001)(6512007)(122000001)(6486002)(71200400001)(66946007)(38070700005)(6506007)(53546011)(2616005)(36756003)(45980500001); DIR:OUT; SFP:1101; Content-Type: text/plain; charset="us-ascii" Content-ID: Content-Transfer-Encoding: quoted-printable MIME-Version: 1.0 X-MS-Exchange-Transport-CrossTenantHeadersStamped: DB9PR08MB7212 Original-Authentication-Results: dkim=none (message not signed) header.d=none;dmarc=none action=none header.from=arm.com; X-EOPAttributedMessage: 0 X-MS-Exchange-Transport-CrossTenantHeadersStripped: AM5EUR03FT016.eop-EUR03.prod.protection.outlook.com X-MS-PublicTrafficType: Email X-MS-Office365-Filtering-Correlation-Id-Prvs: 9080901b-1e61-46e7-9341-08d9d653fb1f X-Microsoft-Antispam: BCL:0; X-Microsoft-Antispam-Message-Info: lvoEqpZ7SPiY1oQc1t/cz6ZfCGp3KrZLK7WoJpSeEYjhyM3HFIyXwKwIDlXwLKiD7e5f87DFMavH80W4eCv6aZyvIY3IG8Esg91gsUDvlT+Y2lgJ4begfZRg64e164Y8wjtea9ZkYJd+qamhPTXR5V1CYk9EYwWNOfNS0jQ+fXcLnLweyy5PhG4kWvJyoQPRAbvqp3RQVsI8SIQr/yJU69YE0FZukf/dMc3u+booNcoEn5W1C9QKPtivULYHMya1mrRT4PDhlwZvL4FJCDjPzwk+IOSyP2O4evUgARLESPaTlaP6WlyE1bU5nDHInpGw/U36iFNYI8l5eie/bO6NkHtZUAwXQu1xzxxVTkJJpzRL7LI84fyx7cyWyPd6p2pmxswB5coX7oJyzL1Nrp62HRKdBOT550DH1U+th4xp3sTOtFjnMezJG2WvOpjLHylQU+D0WyJccLRCmCKESx7hd3kVkab5Y78rQ/7Aea/3+VZ+tuqmjXYxGry6iO2gD7YgQ/+w3ajB4hVb2rfvG2fTP7CHfa++lCPfuXrOG1suJTeSLAoX9Ru9FE+oH/C3micbFYIAaVLms3S1ZviMrqxdQ+Hw5r06oLp/rQa4Bce0L2wwZTCo1nS+le5YTz/oomvx0t/T6gbOO9LubXEVjVGkRk7uK8jWWHm+LtG3f58uusKOz30mWBwhbuN2DQurqEob+juU22yXPjlQOTBHilRewkxsb0Gquh0VfayZBZOtGDkumfUPxZVAkNK5jGsBAJbQACeCHG5WXraA9t2hfYN6wA== X-Forefront-Antispam-Report: CIP:63.35.35.123; CTRY:IE; LANG:en; SCL:1; SRV:; IPV:CAL; SFV:NSPM; H:64aa7808-outbound-1.mta.getcheckrecipient.com; PTR:ec2-63-35-35-123.eu-west-1.compute.amazonaws.com; CAT:NONE; SFS:(4636009)(36840700001)(46966006)(40470700002)(316002)(86362001)(36860700001)(6486002)(6512007)(54906003)(83380400001)(70206006)(70586007)(2616005)(40460700001)(81166007)(33656002)(26005)(186003)(4326008)(508600001)(6862004)(2906002)(356005)(8676002)(36756003)(5660300002)(30864003)(47076005)(82310400004)(6506007)(336012)(53546011)(8936002); DIR:OUT; SFP:1101; X-OriginatorOrg: arm.com X-MS-Exchange-CrossTenant-OriginalArrivalTime: 13 Jan 2022 05:17:36.5002 (UTC) X-MS-Exchange-CrossTenant-Network-Message-Id: a54cdd79-22b1-43e3-58d0-08d9d6540303 X-MS-Exchange-CrossTenant-Id: f34e5979-57d9-4aaa-ad4d-b122a662184d X-MS-Exchange-CrossTenant-OriginalAttributedTenantConnectingIp: TenantId=f34e5979-57d9-4aaa-ad4d-b122a662184d; Ip=[63.35.35.123]; Helo=[64aa7808-outbound-1.mta.getcheckrecipient.com] X-MS-Exchange-CrossTenant-AuthSource: AM5EUR03FT016.eop-EUR03.prod.protection.outlook.com X-MS-Exchange-CrossTenant-AuthAs: Anonymous X-MS-Exchange-CrossTenant-FromEntityHeader: HybridOnPrem X-MS-Exchange-Transport-CrossTenantHeadersStamped: DBBPR08MB6028 X-BeenThere: dev@dpdk.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: DPDK patches and discussions List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: dev-bounces@dpdk.org Hi Konstatin, Thank you for your comments and the test report! > On Jan 10, 2022, at 8:26 PM, Ananyev, Konstantin wrote: >=20 >=20 >=20 >=20 >> Current mempool per core cache implementation stores pointers to mbufs >> On 64b architectures, each pointer consumes 8B >> This patch replaces it with index-based implementation, >> where in each buffer is addressed by (pool base address + index) >> It reduces the amount of memory/cache required for per core cache >>=20 >> L3Fwd performance testing reveals minor improvements in the cache >> performance (L1 and L2 misses reduced by 0.60%) >> with no change in throughput >=20 > I feel really sceptical about that patch and the whole idea in general: > - From what I read above there is no real performance improvement observe= d. > (In fact on my IA boxes mempool_perf_autotest reports ~20% slowdown, > see below for more details).=20 Currently, the optimizations (loop unroll and vectorization) are only imple= mented for ARM64. Similar optimizations can be implemented for x86 platforms which should clo= se the performance gap and in my understanding should give better performance for a bulk size of 3= 2. > - Space utilization difference looks neglectable too. Sorry, I did not understand this point. > - The change introduces a new build time config option with a major limit= ation: > All memzones in a pool have to be within the same 4GB boundary.=20 > To address it properly, extra changes will be required in init(/populat= e) part of the code. I agree to the above mentioned challenges and I am currently working on res= olving these issues. > All that will complicate mempool code, will make it more error prone > and harder to maintain. > But, as there is no real gain in return - no point to add such extra comp= lexity at all. >=20 > Konstantin >=20 > CSX 2.1 GHz > =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D >=20 > echo 'mempool_perf_autotest' | ./dpdk-test -n 4 --lcores=3D'6-13' --no-pc= i >=20 > params : = rate_persec =09 > = (normal/index-based/diff %) > (with cache) > cache=3D512 cores=3D1 n_get_bulk=3D32 n_put_bulk=3D32 n_keep=3D32 : 74098= 9337.00/504116019.00/-31.97 > cache=3D512 cores=3D1 n_get_bulk=3D32 n_put_bulk=3D32 n_keep=3D128 : 7564= 95155.00/615002931.00/-18.70 > cache=3D512 cores=3D2 n_get_bulk=3D32 n_put_bulk=3D32 n_keep=3D32 : 14834= 99110.00/1007248997.00/-32.10 > cache=3D512 cores=3D2 n_get_bulk=3D32 n_put_bulk=3D32 n_keep=3D128 : 1512= 439807.00/1229927218.00/-18.68 > cache=3D512 cores=3D8 n_get_bulk=3D32 n_put_bulk=3D32 n_keep=3D32 : 59336= 68757.00/4029048421.00/-32.10 > cache=3D512 cores=3D8 n_get_bulk=3D32 n_put_bulk=3D32 n_keep=3D128 : 6049= 234942.00/4921111344.00/-18.65 >=20 > (with user-owned cache) > cache=3D512 cores=3D1 n_get_bulk=3D32 n_put_bulk=3D32 n_keep=3D32 : 63060= 0499.00/504312627.00/-20.03 > cache=3D512 cores=3D1 n_get_bulk=3D32 n_put_bulk=3D32 n_keep=3D128 : 7562= 59225.00/615042252.00/-18.67 > cache=3D512 cores=3D2 n_get_bulk=3D32 n_put_bulk=3D32 n_keep=3D32 : 12620= 52966.00/1007039283.00/-20.21 > cache=3D512 cores=3D2 n_get_bulk=3D32 n_put_bulk=3D32 n_keep=3D128 : 1517= 853081.00/1230818508.00/-18.91 > cache=3D512 cores=3D8 n_get_bulk=3D32 n_put_bulk=3D32 n_keep=3D32 :505452= 9533.00/4028052273.00/-20.31 > cache=3D512 cores=3D8 n_get_bulk=3D32 n_put_bulk=3D32 n_keep=3D128 : 6059= 340592.00/4912893129.00/-18.92 >=20 >>=20 >> Suggested-by: Honnappa Nagarahalli >> Signed-off-by: Dharmik Thakkar >> Reviewed-by: Ruifeng Wang >> --- >> lib/mempool/rte_mempool.h | 114 +++++++++++++++++++++++++- >> lib/mempool/rte_mempool_ops_default.c | 7 ++ >> 2 files changed, 119 insertions(+), 2 deletions(-) >>=20 >> diff --git a/lib/mempool/rte_mempool.h b/lib/mempool/rte_mempool.h >> index 1e7a3c15273c..4fabd3b1920b 100644 >> --- a/lib/mempool/rte_mempool.h >> +++ b/lib/mempool/rte_mempool.h >> @@ -50,6 +50,10 @@ >> #include >> #include >>=20 >> +#ifdef RTE_MEMPOOL_INDEX_BASED_LCORE_CACHE >> +#include >> +#endif >> + >> #include "rte_mempool_trace_fp.h" >>=20 >> #ifdef __cplusplus >> @@ -239,6 +243,9 @@ struct rte_mempool { >> int32_t ops_index; >>=20 >> struct rte_mempool_cache *local_cache; /**< Per-lcore local cache */ >> +#ifdef RTE_MEMPOOL_INDEX_BASED_LCORE_CACHE >> + void *pool_base_value; /**< Base value to calculate indices */ >> +#endif >>=20 >> uint32_t populated_size; /**< Number of populated objects. */ >> struct rte_mempool_objhdr_list elt_list; /**< List of objects in pool *= / >> @@ -1314,7 +1321,19 @@ rte_mempool_cache_flush(struct rte_mempool_cache = *cache, >> if (cache =3D=3D NULL || cache->len =3D=3D 0) >> return; >> rte_mempool_trace_cache_flush(cache, mp); >> + >> +#ifdef RTE_MEMPOOL_INDEX_BASED_LCORE_CACHE >> + unsigned int i; >> + unsigned int cache_len =3D cache->len; >> + void *obj_table[RTE_MEMPOOL_CACHE_MAX_SIZE * 3]; >> + void *base_value =3D mp->pool_base_value; >> + uint32_t *cache_objs =3D (uint32_t *) cache->objs; >> + for (i =3D 0; i < cache_len; i++) >> + obj_table[i] =3D (void *) RTE_PTR_ADD(base_value, cache_objs[i]); >> + rte_mempool_ops_enqueue_bulk(mp, obj_table, cache->len); >> +#else >> rte_mempool_ops_enqueue_bulk(mp, cache->objs, cache->len); >> +#endif >> cache->len =3D 0; >> } >>=20 >> @@ -1334,8 +1353,13 @@ static __rte_always_inline void >> rte_mempool_do_generic_put(struct rte_mempool *mp, void * const *obj_tab= le, >> unsigned int n, struct rte_mempool_cache *cache) >> { >> +#ifdef RTE_MEMPOOL_INDEX_BASED_LCORE_CACHE >> + uint32_t *cache_objs; >> + void *base_value; >> + uint32_t i; >> +#else >> void **cache_objs; >> - >> +#endif >> /* increment stat now, adding in mempool always success */ >> RTE_MEMPOOL_STAT_ADD(mp, put_bulk, 1); >> RTE_MEMPOOL_STAT_ADD(mp, put_objs, n); >> @@ -1344,7 +1368,13 @@ rte_mempool_do_generic_put(struct rte_mempool *mp= , void * const *obj_table, >> if (unlikely(cache =3D=3D NULL || n > RTE_MEMPOOL_CACHE_MAX_SIZE)) >> goto ring_enqueue; >>=20 >> +#ifdef RTE_MEMPOOL_INDEX_BASED_LCORE_CACHE >> + cache_objs =3D (uint32_t *) cache->objs; >> + cache_objs =3D &cache_objs[cache->len]; >> + base_value =3D mp->pool_base_value; >> +#else >> cache_objs =3D &cache->objs[cache->len]; >> +#endif >>=20 >> /* >> * The cache follows the following algorithm >> @@ -1354,13 +1384,40 @@ rte_mempool_do_generic_put(struct rte_mempool *m= p, void * const *obj_table, >> */ >>=20 >> /* Add elements back into the cache */ >> + >> +#ifdef RTE_MEMPOOL_INDEX_BASED_LCORE_CACHE >> +#if defined __ARM_NEON >> + uint64x2_t v_obj_table; >> + uint64x2_t v_base_value =3D vdupq_n_u64((uint64_t)base_value); >> + uint32x2_t v_cache_objs; >> + >> + for (i =3D 0; i < (n & ~0x1); i +=3D 2) { >> + v_obj_table =3D vld1q_u64((const uint64_t *)&obj_table[i]); >> + v_cache_objs =3D vqmovn_u64(vsubq_u64(v_obj_table, v_base_value)); >> + vst1_u32(cache_objs + i, v_cache_objs); >> + } >> + if (n & 0x1) { >> + cache_objs[i] =3D (uint32_t) RTE_PTR_DIFF(obj_table[i], base_value); >> + } >> +#else >> + for (i =3D 0; i < n; i++) { >> + cache_objs[i] =3D (uint32_t) RTE_PTR_DIFF(obj_table[i], base_value); >> + } >> +#endif >> +#else >> rte_memcpy(&cache_objs[0], obj_table, sizeof(void *) * n); >> +#endif >>=20 >> cache->len +=3D n; >>=20 >> if (cache->len >=3D cache->flushthresh) { >> +#ifdef RTE_MEMPOOL_INDEX_BASED_LCORE_CACHE >> + rte_mempool_ops_enqueue_bulk(mp, obj_table + cache->len - cache->size= , >> + cache->len - cache->size); >> +#else >> rte_mempool_ops_enqueue_bulk(mp, &cache->objs[cache->size], >> cache->len - cache->size); >> +#endif >> cache->len =3D cache->size; >> } >>=20 >> @@ -1461,13 +1518,22 @@ rte_mempool_do_generic_get(struct rte_mempool *m= p, void **obj_table, >> { >> int ret; >> uint32_t index, len; >> +#ifdef RTE_MEMPOOL_INDEX_BASED_LCORE_CACHE >> + uint32_t i; >> + uint32_t *cache_objs; >> +#else >> void **cache_objs; >> - >> +#endif >> /* No cache provided or cannot be satisfied from cache */ >> if (unlikely(cache =3D=3D NULL || n >=3D cache->size)) >> goto ring_dequeue; >>=20 >> +#ifdef RTE_MEMPOOL_INDEX_BASED_LCORE_CACHE >> + void *base_value =3D mp->pool_base_value; >> + cache_objs =3D (uint32_t *) cache->objs; >> +#else >> cache_objs =3D cache->objs; >> +#endif >>=20 >> /* Can this be satisfied from the cache? */ >> if (cache->len < n) { >> @@ -1475,8 +1541,14 @@ rte_mempool_do_generic_get(struct rte_mempool *mp= , void **obj_table, >> uint32_t req =3D n + (cache->size - cache->len); >>=20 >> /* How many do we require i.e. number to fill the cache + the request = */ >> +#ifdef RTE_MEMPOOL_INDEX_BASED_LCORE_CACHE >> + void *temp_objs[RTE_MEMPOOL_CACHE_MAX_SIZE * 3]; /**< Cache objects *= / >> + ret =3D rte_mempool_ops_dequeue_bulk(mp, >> + temp_objs, req); >> +#else >> ret =3D rte_mempool_ops_dequeue_bulk(mp, >> &cache->objs[cache->len], req); >> +#endif >> if (unlikely(ret < 0)) { >> /* >> * In the off chance that we are buffer constrained, >> @@ -1487,12 +1559,50 @@ rte_mempool_do_generic_get(struct rte_mempool *m= p, void **obj_table, >> goto ring_dequeue; >> } >>=20 >> +#ifdef RTE_MEMPOOL_INDEX_BASED_LCORE_CACHE >> + len =3D cache->len; >> + for (i =3D 0; i < req; ++i, ++len) { >> + cache_objs[len] =3D (uint32_t) RTE_PTR_DIFF(temp_objs[i], >> + base_value); >> + } >> +#endif >> cache->len +=3D req; >> } >>=20 >> /* Now fill in the response ... */ >> +#ifdef RTE_MEMPOOL_INDEX_BASED_LCORE_CACHE >> +#if defined __ARM_NEON >> + uint64x2_t v_obj_table; >> + uint64x2_t v_cache_objs; >> + uint64x2_t v_base_value =3D vdupq_n_u64((uint64_t)base_value); >> + >> + for (index =3D 0, len =3D cache->len - 1; index < (n & ~0x3); index += =3D 4, >> + len -=3D 4, obj_table +=3D 4) { >> + v_cache_objs =3D vmovl_u32(vld1_u32(cache_objs + len - 1)); >> + v_obj_table =3D vaddq_u64(v_cache_objs, v_base_value); >> + vst1q_u64((uint64_t *)obj_table, v_obj_table); >> + v_cache_objs =3D vmovl_u32(vld1_u32(cache_objs + len - 3)); >> + v_obj_table =3D vaddq_u64(v_cache_objs, v_base_value); >> + vst1q_u64((uint64_t *)(obj_table + 2), v_obj_table); >> + } >> + switch (n & 0x3) { >> + case 3: >> + *(obj_table++) =3D (void *) RTE_PTR_ADD(base_value, cache_objs[len--]= ); >> + /* fallthrough */ >> + case 2: >> + *(obj_table++) =3D (void *) RTE_PTR_ADD(base_value, cache_objs[len--]= ); >> + /* fallthrough */ >> + case 1: >> + *(obj_table++) =3D (void *) RTE_PTR_ADD(base_value, cache_objs[len--]= ); >> + } >> +#else >> + for (index =3D 0, len =3D cache->len - 1; index < n; ++index, len--, o= bj_table++) >> + *obj_table =3D (void *) RTE_PTR_ADD(base_value, cache_objs[len]); >> +#endif >> +#else >> for (index =3D 0, len =3D cache->len - 1; index < n; ++index, len--, ob= j_table++) >> *obj_table =3D cache_objs[len]; >> +#endif >>=20 >> cache->len -=3D n; >>=20 >> diff --git a/lib/mempool/rte_mempool_ops_default.c b/lib/mempool/rte_mem= pool_ops_default.c >> index 22fccf9d7619..3543cad9d4ce 100644 >> --- a/lib/mempool/rte_mempool_ops_default.c >> +++ b/lib/mempool/rte_mempool_ops_default.c >> @@ -127,6 +127,13 @@ rte_mempool_op_populate_helper(struct rte_mempool *= mp, unsigned int flags, >> obj =3D va + off; >> obj_cb(mp, obj_cb_arg, obj, >> (iova =3D=3D RTE_BAD_IOVA) ? RTE_BAD_IOVA : (iova + off)); >> +#ifdef RTE_MEMPOOL_INDEX_BASED_LCORE_CACHE >> + /* Store pool base value to calculate indices for index-based >> + * lcore cache implementation >> + */ >> + if (i =3D=3D 0) >> + mp->pool_base_value =3D obj; >> +#endif >> rte_mempool_ops_enqueue_bulk(mp, &obj, 1); >> off +=3D mp->elt_size + mp->trailer_size; >> } >> -- >> 2.25.1 >=20