From mboxrd@z Thu Jan 1 00:00:00 1970
References: <20211224225923.806498-1-dharmik.thakkar@arm.com>
 <20220113053630.886638-1-dharmik.thakkar@arm.com>
 <20220113053630.886638-2-dharmik.thakkar@arm.com>
 <98CBD80474FA8B44BF855DF32C47DC35D86E14@smartserver.smartshare.dk>
 <98CBD80474FA8B44BF855DF32C47DC35D86E18@smartserver.smartshare.dk>
User-agent: mu4e 1.4.15; emacs 27.1
From: Ray Kinsella
To: Morten Brørup
Cc: Honnappa Nagarahalli, Dharmik Thakkar, Olivier Matz,
 Andrew Rybchenko, dev@dpdk.org, Ruifeng Wang, Beilei Xing, nd
Subject: Re: [PATCH v2 1/1] mempool: implement index-based per core cache
In-reply-to: <98CBD80474FA8B44BF855DF32C47DC35D86E18@smartserver.smartshare.dk>
Date: Mon, 24 Jan 2022 08:05:45 -0500
Message-ID: <87v8y9s2qe.fsf@mdr78.vserver.site>
List-Id: DPDK patches and discussions

Morten Brørup writes:

> +Ray Kinsella, ABI Policy maintainer
>
>> From: Honnappa Nagarahalli [mailto:Honnappa.Nagarahalli@arm.com]
>> Sent: Friday, 21 January 2022 07.01
>>
>> >
>> > +CC Beilei as i40e maintainer
>> >
>> > > From: Dharmik Thakkar [mailto:dharmik.thakkar@arm.com]
>> > > Sent: Thursday, 13 January 2022 06.37
>> > >
>> > > The current mempool per-core cache implementation stores pointers
>> > > to mbufs. On 64b architectures, each pointer consumes 8B. This
>> > > patch replaces it with an index-based implementation, wherein each
>> > > buffer is addressed by (pool base address + index). It reduces the
>> > > amount of memory/cache required for the per-core cache.
>> > >
>> > > L3Fwd performance testing reveals minor improvements in cache
>> > > performance (L1 and L2 misses reduced by 0.60%) with no change in
>> > > throughput.
>> > >
>> > > Suggested-by: Honnappa Nagarahalli
>> > > Signed-off-by: Dharmik Thakkar
>> > > Reviewed-by: Ruifeng Wang
>> > > ---
>> > >  lib/mempool/rte_mempool.h             | 150 +++++++++++++++++++++++++-
>> > >  lib/mempool/rte_mempool_ops_default.c |   7 ++
>> > >  2 files changed,
 156 insertions(+), 1 deletion(-)
>> > >
>> > > diff --git a/lib/mempool/rte_mempool.h b/lib/mempool/rte_mempool.h
>> > > index 1e7a3c15273c..f2403fbc97a7 100644
>> > > --- a/lib/mempool/rte_mempool.h
>> > > +++ b/lib/mempool/rte_mempool.h
>> > > @@ -50,6 +50,10 @@
>> > >  #include
>> > >  #include
>> > >
>> > > +#ifdef RTE_MEMPOOL_INDEX_BASED_LCORE_CACHE
>> > > +#include
>> > > +#endif
>> > > +
>> > >  #include "rte_mempool_trace_fp.h"
>> > >
>> > >  #ifdef __cplusplus
>> > > @@ -239,6 +243,9 @@ struct rte_mempool {
>> > >  	int32_t ops_index;
>> > >
>> > >  	struct rte_mempool_cache *local_cache; /**< Per-lcore local cache */
>> > > +#ifdef RTE_MEMPOOL_INDEX_BASED_LCORE_CACHE
>> > > +	void *pool_base_value; /**< Base value to calculate indices */
>> > > +#endif
>> > >
>> > >  	uint32_t populated_size; /**< Number of populated objects. */
>> > >  	struct rte_mempool_objhdr_list elt_list; /**< List of objects in pool */
>> > > @@ -1314,7 +1321,22 @@ rte_mempool_cache_flush(struct rte_mempool_cache *cache,
>> > >  	if (cache == NULL || cache->len == 0)
>> > >  		return;
>> > >  	rte_mempool_trace_cache_flush(cache, mp);
>> > > +
>> > > +#ifdef RTE_MEMPOOL_INDEX_BASED_LCORE_CACHE
>> > > +	unsigned int i;
>> > > +	unsigned int cache_len = cache->len;
>> > > +	void *obj_table[RTE_MEMPOOL_CACHE_MAX_SIZE * 3];
>> > > +	void *base_value = mp->pool_base_value;
>> > > +	uint32_t *cache_objs = (uint32_t *) cache->objs;
>> >
>> > Hi Dharmik and Honnappa,
>> >
>> > The essence of this patch is recasting the type of the objs field in
>> > the rte_mempool_cache structure from an array of pointers to an
>> > array of uint32_t.
>> >
>> > However, this effectively breaks the ABI, because the
>> > rte_mempool_cache structure is public and part of the API.
>> The patch does not change the public structure; the new member is
>> under a compile-time flag, so I am not sure how it breaks the ABI.
>>
>> >
>> > Some drivers [1] even bypass the mempool API and access the
>> > rte_mempool_cache structure directly, assuming that the objs array
>> > in the cache is an array of pointers. So you cannot recast the
>> > fields in the rte_mempool_cache structure the way this patch
>> > requires.
>> IMO, those drivers are at fault. The mempool cache structure is public
>> only because the APIs are inline. We should still maintain modularity
>> and not use the members of structures belonging to another library
>> directly. A similar effort involving rte_ring was not accepted some
>> time back [1]
>>
>> [1]
>> http://inbox.dpdk.org/dev/DBAPR08MB5814907968595EE56F5E20A798390@DBAPR08MB5814.eurprd08.prod.outlook.com/
>>
>> >
>> > Although I do consider bypassing an API's accessor functions
>> > "spaghetti code", this driver's behavior is formally acceptable as
>> > long as the rte_mempool_cache structure is not marked as internal.
>> >
>> > I really liked your idea of using indexes instead of pointers, so
>> > I'm very sorry to shoot it down. :-(
>> >
>> > [1]: E.g. the Intel i40e PMD,
>> > http://code.dpdk.org/dpdk/latest/source/drivers/net/i40e/i40e_rxtx_vec_avx512.c#L25
>> It is possible to throw an error when this feature is enabled in this
>> file. Alternatively, this PMD could implement the code for the
>> index-based mempool.
>>
>
> I agree with both your points, Honnappa.
>
> The ABI remains intact, and only changes when this feature is enabled
> at compile time.
>
> In addition to your suggestions, I propose that the patch modify the
> objs type in the mempool cache structure itself, instead of type
> casting it through an access variable. This should throw an error when
> compiling an application that accesses it as a pointer array instead
> of a uint32_t array - like the affected Intel PMDs.
>
> The updated objs field in the mempool cache structure should have the
> same size when compiled as the original objs field, so this feature
> doesn't change anything else in the ABI, only the type of the mempool
> cache objects.
>
> Also, the description of the feature should stress that applications
> accessing the cache objects directly will fail miserably.

Thanks for CC'ing me Morten.

My 2c is that I would be slow to support this patch, as it introduces
code paths that are harder (impossible?) to test regularly. So yes, it
is optional - but in that case are we just adding automatically dead
code? I would ask: would a runtime option not make more sense for this?

Also, we can't automatically assume that what the PMDs are doing is
breaking an unwritten rule (breaking abstractions) - I would guess they
are doing it for solid performance reasons. If so, that would further
support my point about making the mempool runtime-configurable and
query-able (is this mempool a bucket of indexes or pointers, etc.), and
enabling the PMDs to ask rather than assume.

Like Morten, I like the idea: saving memory and reducing cache misses
with indexes is all good IMHO.

-- 
Regards, Ray K