From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <dev-bounces@dpdk.org>
Received: from mails.dpdk.org (mails.dpdk.org [217.70.189.124])
	by inbox.dpdk.org (Postfix) with ESMTP id D113EA0032;
	Fri,  1 Oct 2021 23:30:14 +0200 (CEST)
Received: from [217.70.189.124] (localhost [127.0.0.1])
	by mails.dpdk.org (Postfix) with ESMTP id 5BC6541161;
	Fri,  1 Oct 2021 23:30:14 +0200 (CEST)
Received: from mga01.intel.com (mga01.intel.com [192.55.52.88])
 by mails.dpdk.org (Postfix) with ESMTP id 8D76E4067E
 for <dev@dpdk.org>; Fri,  1 Oct 2021 23:30:12 +0200 (CEST)
X-IronPort-AV: E=McAfee;i="6200,9189,10124"; a="248153994"
X-IronPort-AV: E=Sophos;i="5.85,340,1624345200"; d="scan'208";a="248153994"
Received: from fmsmga004.fm.intel.com ([10.253.24.48])
 by fmsmga101.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384;
 01 Oct 2021 14:30:11 -0700
X-ExtLoop1: 1
X-IronPort-AV: E=Sophos;i="5.85,340,1624345200"; d="scan'208";a="540562264"
Received: from fmsmsx601.amr.corp.intel.com ([10.18.126.81])
 by fmsmga004.fm.intel.com with ESMTP; 01 Oct 2021 14:30:11 -0700
Received: from fmsmsx608.amr.corp.intel.com (10.18.126.88) by
 fmsmsx601.amr.corp.intel.com (10.18.126.81) with Microsoft SMTP Server
 (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256) id
 15.1.2242.12; Fri, 1 Oct 2021 14:30:11 -0700
Received: from FMSEDG603.ED.cps.intel.com (10.1.192.133) by
 fmsmsx608.amr.corp.intel.com (10.18.126.88) with Microsoft SMTP Server
 (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256) id
 15.1.2242.12 via Frontend Transport; Fri, 1 Oct 2021 14:30:11 -0700
Received: from NAM10-DM6-obe.outbound.protection.outlook.com (104.47.58.105)
 by edgegateway.intel.com (192.55.55.68) with Microsoft SMTP Server
 (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id
 15.1.2242.12; Fri, 1 Oct 2021 14:30:10 -0700
ARC-Seal: i=1; a=rsa-sha256; s=arcselector9901; d=microsoft.com; cv=none;
 b=ld7bzx+1MuAtO8ckTDOiR8PHMDAipkhh2/vW7GOJ3dvNGp4xkaXbNV4Ag+SBO3aAsmYoCeHlxVET6rCU8ManglloRVfKKBQFO5zGl9Pb5pUkZLsLaw0eEYsIr+3Vj8+i4S3UWu02AOSIiaW6IyPvVmNWpJ11hUyqXI3m5qrhT6a+96enYd0oqPDaZkjqYMiMDjObOZVM321EkQb9vVHafeosX64SvscTiWcdkJiLFy3xStlGi+oHtaryw13CipSoH9ZhU3ori0636PYgkygQYTuBAI77y+e3PHdwForziJVJklPgvt9pe9lcR+r4YpPmjhHZVgxMPysiCw32HgIaIQ==
ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=microsoft.com; 
 s=arcselector9901;
 h=From:Date:Subject:Message-ID:Content-Type:MIME-Version:X-MS-Exchange-AntiSpam-MessageData-ChunkCount:X-MS-Exchange-AntiSpam-MessageData-0:X-MS-Exchange-AntiSpam-MessageData-1;
 bh=KW9R33e+Z3sai6jC6jKfjZ7CRrFZLctnmngDw15F0aY=;
 b=VRE7KWuAd/g91MI5qttGq2JyjBan5stnbwCZ4ZzcpmeyctXzIMyAR94K9ykPTGE+5aA2bVDWbRnELfPV5M8ayy+LAazxSg4xpht3L5Brv2xR8BjaqwEh2hZ5SZKFMJGVN3eR8GJsfPZq1YfV0XDpL7rEL6QVwCyA41MZLZpzULCgl0JYBl1xR+hQa25Sbm7edwsFK3Al7Qx+ZmtX5aPK2ExYcyQIT2r5S30vIQ69ICBqkJ/U9SjF8Xw9LW6RWvi8J+nlBAWsllXKp1tEbtQLVA8tDu9ceckBlrhNNdKCwYhp/Les/80bmdwPMqdTz22rgguDH2IcCbgGxWaag0oVeQ==
ARC-Authentication-Results: i=1; mx.microsoft.com 1; spf=pass
 smtp.mailfrom=intel.com; dmarc=pass action=none header.from=intel.com;
 dkim=pass header.d=intel.com; arc=none
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=intel.onmicrosoft.com; 
 s=selector2-intel-onmicrosoft-com;
 h=From:Date:Subject:Message-ID:Content-Type:MIME-Version:X-MS-Exchange-SenderADCheck;
 bh=KW9R33e+Z3sai6jC6jKfjZ7CRrFZLctnmngDw15F0aY=;
 b=YLqzQ9Bush0FvufTz1TVF2K6aij5COp5Vp6E085FehNVRNK9CT2Lts+QT3W0vIJuaqcmCXIo2pt/ieJcaDdDwDMJYoQKkFKenC2qyUzzvPKfsMrMgZSK0RmJuXQQt501tHSQTJ8vn2JZItgKqSgnVLFzRktCNT1YwtEHitciTXE=
Received: from DM6PR11MB4491.namprd11.prod.outlook.com (2603:10b6:5:204::19)
 by DM6PR11MB4250.namprd11.prod.outlook.com (2603:10b6:5:1df::18) with
 Microsoft SMTP Server (version=TLS1_2,
 cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id 15.20.4566.17; Fri, 1 Oct
 2021 21:30:09 +0000
Received: from DM6PR11MB4491.namprd11.prod.outlook.com
 ([fe80::740e:126e:c785:c8fd]) by DM6PR11MB4491.namprd11.prod.outlook.com
 ([fe80::740e:126e:c785:c8fd%4]) with mapi id 15.20.4566.014; Fri, 1 Oct 2021
 21:30:09 +0000
From: "Ananyev, Konstantin" <konstantin.ananyev@intel.com>
To: Dharmik Thakkar <dharmik.thakkar@arm.com>, Olivier Matz
 <olivier.matz@6wind.com>, Andrew Rybchenko <andrew.rybchenko@oktetlabs.ru>
CC: "dev@dpdk.org" <dev@dpdk.org>, "nd@arm.com" <nd@arm.com>,
 "honnappa.nagarahalli@arm.com" <honnappa.nagarahalli@arm.com>,
 "ruifeng.wang@arm.com" <ruifeng.wang@arm.com>
Thread-Topic: [dpdk-dev] [RFC] mempool: implement index-based per core cache
Thread-Index: AQHXtiCSje19dgSVGUWYbrC84aN826u+qP0Q
Date: Fri, 1 Oct 2021 21:30:08 +0000
Message-ID: <DM6PR11MB449143289777B6B94042E9969AAB9@DM6PR11MB4491.namprd11.prod.outlook.com>
References: <20210930172735.2675627-1-dharmik.thakkar@arm.com>
In-Reply-To: <20210930172735.2675627-1-dharmik.thakkar@arm.com>
Accept-Language: en-GB, en-US
Content-Language: en-US
X-MS-Has-Attach: 
X-MS-TNEF-Correlator: 
dlp-product: dlpe-windows
dlp-reaction: no-action
dlp-version: 11.6.200.16
authentication-results: arm.com; dkim=none (message not signed)
 header.d=none;arm.com; dmarc=none action=none header.from=intel.com;
x-ms-publictraffictype: Email
x-ms-office365-filtering-correlation-id: 2697c5b9-0ad7-4131-7511-08d98522a4ec
x-ms-traffictypediagnostic: DM6PR11MB4250:
x-microsoft-antispam-prvs: <DM6PR11MB42509D15A655BFE7D76C79A69AAB9@DM6PR11MB4250.namprd11.prod.outlook.com>
x-ms-oob-tlc-oobclassifiers: OLM:8273;
x-ms-exchange-senderadcheck: 1
x-ms-exchange-antispam-relay: 0
x-microsoft-antispam: BCL:0;
x-microsoft-antispam-message-info: 8YsenpCf52/+arsfQ3c6UI3dXD1CuQqh1uWEjR7w9z+C89sTNBE9PKKziA2kWDcY99v9YantbS2IrGquLmMoz20Zc3ITR7j3jioag/8IfYVlhc4eIxKy6JNbE27+oK3lekra6xIKTI+DJzbnM6IcbigZUHjGSyL+H+NJju+G5LuN/olG+pDoxUxiBushiywrB47FUNAq4p3iBDorJmorHZquseDj+T4Jp5zePvw2phlZ7q4OdbEtulVa1bUY3Lo9+8Pbw3+Tmq/pmU/0yepA8UD7IgWnN20E4ctzCVQvtOOzn+FqYi/frnme/f4kwgMSusqSpbcqS7odoH8tNVtdKZ0EvoRcQqdWJ8joh7oT90j1dvuWHdeM78EkWBFCzW796DQBT81PmqvEpf316D/5mfWUMMA1Z5JuO0TTI8wMMXMhSqm4is8/Ly43Lwje7OjDF4220Tp2Gu/FqFbdhVW2oW+Shbs/kDgeWyMsobydBW6jMeFLvJb4P67XPhz7rN3opSNrpMVWEtOfgrGBFKYuHdE1ScrK+/ss4O+L9NuzvMkcYZA0Y0f6DmYtS9fakN0i2xCPpdSsCLB+HMXi/agJjA/qE3uZybr2AYCFCF5V5mIxcvqM+nBWY4ddNQnlhK/qQhiIkntVKNIXVjugy/Nx+fnsUJsuqyZfn/iIHNmqsg5dlltByiUehRNc3FUJgMomRSfZpGrvLVV5mLNm9fxTjA==
x-forefront-antispam-report: CIP:255.255.255.255; CTRY:; LANG:en; SCL:1; SRV:;
 IPV:NLI; SFV:NSPM; H:DM6PR11MB4491.namprd11.prod.outlook.com; PTR:; CAT:NONE;
 SFS:(4636009)(366004)(64756008)(83380400001)(186003)(71200400001)(110136005)(8936002)(86362001)(4326008)(52536014)(122000001)(38100700002)(54906003)(508600001)(66946007)(66476007)(2906002)(66446008)(33656002)(55236004)(38070700005)(26005)(5660300002)(316002)(7696005)(66556008)(9686003)(76116006)(6506007)(55016002)(8676002);
 DIR:OUT; SFP:1102; 
x-ms-exchange-antispam-messagedata-chunkcount: 1
x-ms-exchange-antispam-messagedata-0: =?us-ascii?Q?gIdLA0MKm6d6eDKhxxZXNBMmpLC+a6gKljvmskJ2eUTRzJ9p9xFqfvmu62Nl?=
 =?us-ascii?Q?E1gIDtPYbG8z38jqIjp9qaBLV33CtmN66lu3KiBGMixv4xA9KA0S8CSxa7h3?=
 =?us-ascii?Q?bwkcoJcwv83WIP5YKHDwTYoqIs+IbUoJz7BIGI7YO/CjiWxEHNxXZvlsmC2Y?=
 =?us-ascii?Q?Ves2SoOo6TBQErslORVTSScRZaiqnn+0A1d+L/wul+zCY953CogbcGks80bc?=
 =?us-ascii?Q?WU5XSoNvjOZdtz1xx7KrFpILgC3jV8NS2KTppXUuUQSymhnHKEkhGEVZ0orJ?=
 =?us-ascii?Q?rJVsd6uC6ab5IstMsDYKfc+H+cWlz9ToFKcOgyNsgK0/EWImIYBlaitZT05f?=
 =?us-ascii?Q?jBMBZp66zlun208ganpf76pl4s9B2wTgh0XH4yrW94iSuZSLM2n8xHzdTFll?=
 =?us-ascii?Q?vqEgb5I47OuFbjYxooMdupq/dNSBGqCBVx0N7BIgDUQrL0EgkCM4hQMt+YSU?=
 =?us-ascii?Q?z2af3X+YRBY2nDVIHS2GFbgTh/J0Vv3UZB8KclfGdDhpfO5tSfujIRu60wIN?=
 =?us-ascii?Q?vk2ucw1nBHaltrU95j9y6BGGL//nJHxOsPU7xUfhkoRcP4AYFagPkBtTLkRt?=
 =?us-ascii?Q?SClWFax2qrZmIlq6YjMUvJVhUzBov/HFyLoSrrPxb2WLeF64S0SnF1U4wdr3?=
 =?us-ascii?Q?y0KY1+ivHTVfXAIT8UbpkPBlGQuFNhikxi6vpvRCEkm37jrQfIyf2ef2t01J?=
 =?us-ascii?Q?rHagnW4bf/x8gvH7ui10IvEczOoM2jaiHHVV5Zz28s7CwgZa0Lw7SHHVUYBS?=
 =?us-ascii?Q?vH7cvmJYAbPpucOjhLBf1m8+tm5zvtebHwhdzEZmhEZZ7ESBsMZ6awBhTQhz?=
 =?us-ascii?Q?1nyikwhBubBxJOyPSxkSOkpaeHYz9Du4SdtfWFp4o8wayzImjfV8uyU2zWjO?=
 =?us-ascii?Q?YnPvGemR9KFxcugsV9Decz/vxO1fCc+Bx9YWt6HM1qqiTwpiifvHm/HzXwDy?=
 =?us-ascii?Q?9F6kdIqo1RYQfzrvGIXiatoMadlKPILfmk2MPtN5fjP61W1RmT56xTadhG7K?=
 =?us-ascii?Q?VYTNtDvFmi9r8pXS5Q/XRU4W87OCm3pNnxDSaI8JWIdeHBi1Cdb5kCxfHcT1?=
 =?us-ascii?Q?kgMv9wKkQ684u0qR1FB2aZOaVferJ/Znjasli0qMevNeqfufL3zY96yXaxqX?=
 =?us-ascii?Q?igc6pZsgqH2sCmby/LNRCkSImEOLHTHgVxQMzzKuO2VsIhpCyLmNBHOjiYrS?=
 =?us-ascii?Q?/6+M5nT9SekVCwnHQubLiCBShOz1Ve0Y//MO3qLvCyiKazYr6wYZttGkxGx4?=
 =?us-ascii?Q?wieroIbSraKeg0UOBX717gsfRma0mDdGq8w2n4gaplmSrHf31CGKTQevYYGm?=
 =?us-ascii?Q?dNFoO02CihXDemzNBuqtRkPs?=
x-ms-exchange-transport-forked: True
Content-Type: text/plain; charset="us-ascii"
Content-Transfer-Encoding: quoted-printable
MIME-Version: 1.0
X-MS-Exchange-CrossTenant-AuthAs: Internal
X-MS-Exchange-CrossTenant-AuthSource: DM6PR11MB4491.namprd11.prod.outlook.com
X-MS-Exchange-CrossTenant-Network-Message-Id: 2697c5b9-0ad7-4131-7511-08d98522a4ec
X-MS-Exchange-CrossTenant-originalarrivaltime: 01 Oct 2021 21:30:09.0678 (UTC)
X-MS-Exchange-CrossTenant-fromentityheader: Hosted
X-MS-Exchange-CrossTenant-id: 46c98d88-e344-4ed4-8496-4ed7712e255d
X-MS-Exchange-CrossTenant-mailboxtype: HOSTED
X-MS-Exchange-CrossTenant-userprincipalname: ooamvGPoSpvjHUMz5kWtnZbgkEDGCwEvhJbSc7yTHvnE99kaDjQ5QdztLlakDLDv3ByZR81FJj4y1UdmJu+nEDLD3F7UQxT8VHpA+JRb3VE=
X-MS-Exchange-Transport-CrossTenantHeadersStamped: DM6PR11MB4250
X-OriginatorOrg: intel.com
Subject: Re: [dpdk-dev] [RFC] mempool: implement index-based per core cache
X-BeenThere: dev@dpdk.org
X-Mailman-Version: 2.1.29
Precedence: list
List-Id: DPDK patches and discussions <dev.dpdk.org>
List-Unsubscribe: <https://mails.dpdk.org/options/dev>,
 <mailto:dev-request@dpdk.org?subject=unsubscribe>
List-Archive: <http://mails.dpdk.org/archives/dev/>
List-Post: <mailto:dev@dpdk.org>
List-Help: <mailto:dev-request@dpdk.org?subject=help>
List-Subscribe: <https://mails.dpdk.org/listinfo/dev>,
 <mailto:dev-request@dpdk.org?subject=subscribe>
Errors-To: dev-bounces@dpdk.org
Sender: "dev" <dev-bounces@dpdk.org>

> Current mempool per core cache implementation is based on pointer
> For most architectures, each pointer consumes 64b
> Replace it with index-based implementation, where in each buffer
> is addressed by (pool address + index)

I don't think it is going to work:
On 64-bit systems difference between pool address and it's elem address
could be bigger than 4GB.
=20
> It will reduce memory requirements
>=20
> L3Fwd performance testing reveals minor improvements in the cache
> performance and no change in throughput
>=20
> Micro-benchmarking the patch using mempool_perf_test shows
> significant improvement with majority of the test cases
>=20
> Future plan involves replacing global pool's pointer-based implementation=
 with index-based implementation
>=20
> Signed-off-by: Dharmik Thakkar <dharmik.thakkar@arm.com>
> ---
>  drivers/mempool/ring/rte_mempool_ring.c |  2 +-
>  lib/mempool/rte_mempool.c               |  8 +++
>  lib/mempool/rte_mempool.h               | 74 ++++++++++++++++++++++---
>  3 files changed, 74 insertions(+), 10 deletions(-)
>=20
> diff --git a/drivers/mempool/ring/rte_mempool_ring.c b/drivers/mempool/ri=
ng/rte_mempool_ring.c
> index b1f09ff28f4d..e55913e47f21 100644
> --- a/drivers/mempool/ring/rte_mempool_ring.c
> +++ b/drivers/mempool/ring/rte_mempool_ring.c
> @@ -101,7 +101,7 @@ ring_alloc(struct rte_mempool *mp, uint32_t rg_flags)
>  		return -rte_errno;
>=20
>  	mp->pool_data =3D r;
> -
> +	mp->local_cache_base_addr =3D &r[1];
>  	return 0;
>  }
>=20
> diff --git a/lib/mempool/rte_mempool.c b/lib/mempool/rte_mempool.c
> index 59a588425bd6..424bdb19c323 100644
> --- a/lib/mempool/rte_mempool.c
> +++ b/lib/mempool/rte_mempool.c
> @@ -480,6 +480,7 @@ rte_mempool_populate_default(struct rte_mempool *mp)
>  	int ret;
>  	bool need_iova_contig_obj;
>  	size_t max_alloc_size =3D SIZE_MAX;
> +	unsigned lcore_id;
>=20
>  	ret =3D mempool_ops_alloc_once(mp);
>  	if (ret !=3D 0)
> @@ -600,6 +601,13 @@ rte_mempool_populate_default(struct rte_mempool *mp)
>  		}
>  	}
>=20
> +	/* Init all default caches. */
> +	if (mp->cache_size !=3D 0) {
> +		for (lcore_id =3D 0; lcore_id < RTE_MAX_LCORE; lcore_id++)
> +			mp->local_cache[lcore_id].local_cache_base_value =3D
> +				*(void **)mp->local_cache_base_addr;
> +	}
> +
>  	rte_mempool_trace_populate_default(mp);
>  	return mp->size;
>=20
> diff --git a/lib/mempool/rte_mempool.h b/lib/mempool/rte_mempool.h
> index 4235d6f0bf2b..545405c0d3ce 100644
> --- a/lib/mempool/rte_mempool.h
> +++ b/lib/mempool/rte_mempool.h
> @@ -51,6 +51,8 @@
>  #include <rte_memcpy.h>
>  #include <rte_common.h>
>=20
> +#include <arm_neon.h>
> +
>  #include "rte_mempool_trace_fp.h"
>=20
>  #ifdef __cplusplus
> @@ -91,11 +93,12 @@ struct rte_mempool_cache {
>  	uint32_t size;	      /**< Size of the cache */
>  	uint32_t flushthresh; /**< Threshold before we flush excess elements */
>  	uint32_t len;	      /**< Current cache count */
> +	void *local_cache_base_value; /**< Base value to calculate indices */
>  	/*
>  	 * Cache is allocated to this size to allow it to overflow in certain
>  	 * cases to avoid needless emptying of cache.
>  	 */
> -	void *objs[RTE_MEMPOOL_CACHE_MAX_SIZE * 3]; /**< Cache objects */
> +	uint32_t objs[RTE_MEMPOOL_CACHE_MAX_SIZE * 3]; /**< Cache objects */
>  } __rte_cache_aligned;
>=20
>  /**
> @@ -172,7 +175,6 @@ struct rte_mempool_objtlr {
>   * A list of memory where objects are stored
>   */
>  STAILQ_HEAD(rte_mempool_memhdr_list, rte_mempool_memhdr);
> -
>  /**
>   * Callback used to free a memory chunk
>   */
> @@ -244,6 +246,7 @@ struct rte_mempool {
>  	int32_t ops_index;
>=20
>  	struct rte_mempool_cache *local_cache; /**< Per-lcore local cache */
> +	void *local_cache_base_addr; /**< Reference to the base value */
>=20
>  	uint32_t populated_size;         /**< Number of populated objects. */
>  	struct rte_mempool_objhdr_list elt_list; /**< List of objects in pool *=
/
> @@ -1269,7 +1272,15 @@ rte_mempool_cache_flush(struct rte_mempool_cache *=
cache,
>  	if (cache =3D=3D NULL || cache->len =3D=3D 0)
>  		return;
>  	rte_mempool_trace_cache_flush(cache, mp);
> -	rte_mempool_ops_enqueue_bulk(mp, cache->objs, cache->len);
> +
> +	unsigned int i;
> +	unsigned int cache_len =3D cache->len;
> +	void *obj_table[RTE_MEMPOOL_CACHE_MAX_SIZE * 3];
> +	void *base_value =3D cache->local_cache_base_value;
> +	uint32_t *cache_objs =3D cache->objs;
> +	for (i =3D 0; i < cache_len; i++)
> +		obj_table[i] =3D (void *) RTE_PTR_ADD(base_value, cache_objs[i]);
> +	rte_mempool_ops_enqueue_bulk(mp, obj_table, cache->len);
>  	cache->len =3D 0;
>  }
>=20
> @@ -1289,7 +1300,9 @@ static __rte_always_inline void
>  __mempool_generic_put(struct rte_mempool *mp, void * const *obj_table,
>  		      unsigned int n, struct rte_mempool_cache *cache)
>  {
> -	void **cache_objs;
> +	uint32_t *cache_objs;
> +	void *base_value;
> +	uint32_t i;
>=20
>  	/* increment stat now, adding in mempool always success */
>  	__MEMPOOL_STAT_ADD(mp, put_bulk, 1);
> @@ -1301,6 +1314,12 @@ __mempool_generic_put(struct rte_mempool *mp, void=
 * const *obj_table,
>=20
>  	cache_objs =3D &cache->objs[cache->len];
>=20
> +	base_value =3D cache->local_cache_base_value;
> +
> +	uint64x2_t v_obj_table;
> +	uint64x2_t v_base_value =3D vdupq_n_u64((uint64_t)base_value);
> +	uint32x2_t v_cache_objs;
> +
>  	/*
>  	 * The cache follows the following algorithm
>  	 *   1. Add the objects to the cache
> @@ -1309,12 +1328,26 @@ __mempool_generic_put(struct rte_mempool *mp, voi=
d * const *obj_table,
>  	 */
>=20
>  	/* Add elements back into the cache */
> -	rte_memcpy(&cache_objs[0], obj_table, sizeof(void *) * n);
> +
> +#if defined __ARM_NEON
> +	for (i =3D 0; i < (n & ~0x1); i+=3D2) {
> +		v_obj_table =3D vld1q_u64((const uint64_t *)&obj_table[i]);
> +		v_cache_objs =3D vqmovn_u64(vsubq_u64(v_obj_table, v_base_value));
> +		vst1_u32(cache_objs + i, v_cache_objs);
> +	}
> +	if (n & 0x1) {
> +		cache_objs[i] =3D (uint32_t) RTE_PTR_DIFF(obj_table[i], base_value);
> +	}
> +#else
> +	for (i =3D 0; i < n; i++) {
> +		cache_objs[i] =3D (uint32_t) RTE_PTR_DIFF(obj_table[i], base_value);
> +	}
> +#endif
>=20
>  	cache->len +=3D n;
>=20
>  	if (cache->len >=3D cache->flushthresh) {
> -		rte_mempool_ops_enqueue_bulk(mp, &cache->objs[cache->size],
> +		rte_mempool_ops_enqueue_bulk(mp, obj_table + cache->len - cache->size,
>  				cache->len - cache->size);
>  		cache->len =3D cache->size;
>  	}
> @@ -1415,23 +1448,26 @@ __mempool_generic_get(struct rte_mempool *mp, voi=
d **obj_table,
>  		      unsigned int n, struct rte_mempool_cache *cache)
>  {
>  	int ret;
> +	uint32_t i;
>  	uint32_t index, len;
> -	void **cache_objs;
> +	uint32_t *cache_objs;
>=20
>  	/* No cache provided or cannot be satisfied from cache */
>  	if (unlikely(cache =3D=3D NULL || n >=3D cache->size))
>  		goto ring_dequeue;
>=20
> +	void *base_value =3D cache->local_cache_base_value;
>  	cache_objs =3D cache->objs;
>=20
>  	/* Can this be satisfied from the cache? */
>  	if (cache->len < n) {
>  		/* No. Backfill the cache first, and then fill from it */
>  		uint32_t req =3D n + (cache->size - cache->len);
> +		void *temp_objs[RTE_MEMPOOL_CACHE_MAX_SIZE * 3]; /**< Cache objects */
>=20
>  		/* How many do we require i.e. number to fill the cache + the request =
*/
>  		ret =3D rte_mempool_ops_dequeue_bulk(mp,
> -			&cache->objs[cache->len], req);
> +			temp_objs, req);
>  		if (unlikely(ret < 0)) {
>  			/*
>  			 * In the off chance that we are buffer constrained,
> @@ -1442,12 +1478,32 @@ __mempool_generic_get(struct rte_mempool *mp, voi=
d **obj_table,
>  			goto ring_dequeue;
>  		}
>=20
> +		len =3D cache->len;
> +		for (i =3D 0; i < req; ++i, ++len) {
> +			cache_objs[len] =3D (uint32_t) RTE_PTR_DIFF(temp_objs[i], base_value)=
;
> +		}
> +
>  		cache->len +=3D req;
>  	}
>=20
> +	uint64x2_t v_obj_table;
> +	uint64x2_t v_cache_objs;
> +	uint64x2_t v_base_value =3D vdupq_n_u64((uint64_t)base_value);
> +
>  	/* Now fill in the response ... */
> +#if defined __ARM_NEON
> +	for (index =3D 0, len =3D cache->len - 1; index < (n & ~0x1); index+=3D=
2,
> +						len-=3D2, obj_table+=3D2) {
> +		v_cache_objs =3D vmovl_u32(vld1_u32(cache_objs + len - 1));
> +		v_obj_table =3D vaddq_u64(v_cache_objs, v_base_value);
> +		vst1q_u64((uint64_t *)obj_table, v_obj_table);
> +	}
> +	if (n & 0x1)
> +		*obj_table =3D (void *) RTE_PTR_ADD(base_value, cache_objs[len]);
> +#else
>  	for (index =3D 0, len =3D cache->len - 1; index < n; ++index, len--, ob=
j_table++)
> -		*obj_table =3D cache_objs[len];
> +		*obj_table =3D (void *) RTE_PTR_ADD(base_value, cache_objs[len]);
> +#endif
>=20
>  	cache->len -=3D n;
>=20
> --
> 2.17.1