Date: Fri, 21 Jan 2022 09:12:21 +0000
From: Bruce Richardson
To: Honnappa Nagarahalli
Cc: Morten Brørup, Dharmik Thakkar, Olivier Matz, Andrew Rybchenko,
 "dev@dpdk.org", nd, Ruifeng Wang, Beilei Xing
Subject: Re: [PATCH v2 1/1] mempool: implement index-based per core cache
References: <20211224225923.806498-1-dharmik.thakkar@arm.com>
 <20220113053630.886638-1-dharmik.thakkar@arm.com>
 <20220113053630.886638-2-dharmik.thakkar@arm.com>
 <98CBD80474FA8B44BF855DF32C47DC35D86E14@smartserver.smartshare.dk>

On Fri, Jan 21, 2022 at 06:01:23AM +0000, Honnappa Nagarahalli wrote:
>
> >
> > +CC Beilei as i40e maintainer
> >
> > > From: Dharmik Thakkar [mailto:dharmik.thakkar@arm.com]
> > > Sent: Thursday, 13 January 2022 06.37
> > >
> > > Current mempool per core cache implementation stores pointers to mbufs.
> > > On 64b architectures, each pointer consumes 8B.
> > > This patch replaces it with an index-based implementation, wherein each
> > > buffer is addressed by (pool base address + index). It reduces the
> > > amount of memory/cache required for the per core cache.
> > >
> > > L3Fwd performance testing reveals minor improvements in cache
> > > performance (L1 and L2 misses reduced by 0.60%) with no change in
> > > throughput.
> > >
> > > Suggested-by: Honnappa Nagarahalli
> > > Signed-off-by: Dharmik Thakkar
> > > Reviewed-by: Ruifeng Wang
> > > ---
> > >  lib/mempool/rte_mempool.h             | 150 +++++++++++++++++++++++++-
> > >  lib/mempool/rte_mempool_ops_default.c |   7 ++
> > >  2 files changed, 156 insertions(+), 1 deletion(-)
> > >
> > > diff --git a/lib/mempool/rte_mempool.h b/lib/mempool/rte_mempool.h
> > > index 1e7a3c15273c..f2403fbc97a7 100644
> > > --- a/lib/mempool/rte_mempool.h
> > > +++ b/lib/mempool/rte_mempool.h
> > > @@ -50,6 +50,10 @@
> > >  #include
> > >  #include
> > >
> > > +#ifdef RTE_MEMPOOL_INDEX_BASED_LCORE_CACHE
> > > +#include
> > > +#endif
> > > +
> > >  #include "rte_mempool_trace_fp.h"
> > >
> > >  #ifdef __cplusplus
> > > @@ -239,6 +243,9 @@ struct rte_mempool {
> > >  	int32_t ops_index;
> > >
> > >  	struct rte_mempool_cache *local_cache; /**< Per-lcore local cache */
> > > +#ifdef RTE_MEMPOOL_INDEX_BASED_LCORE_CACHE
> > > +	void *pool_base_value; /**< Base value to calculate indices */
> > > +#endif
> > >
> > >  	uint32_t populated_size;         /**< Number of populated objects. */
> > >  	struct rte_mempool_objhdr_list elt_list; /**< List of objects in pool */
> > > @@ -1314,7 +1321,22 @@ rte_mempool_cache_flush(struct rte_mempool_cache *cache,
> > >  	if (cache == NULL || cache->len == 0)
> > >  		return;
> > >  	rte_mempool_trace_cache_flush(cache, mp);
> > > +
> > > +#ifdef RTE_MEMPOOL_INDEX_BASED_LCORE_CACHE
> > > +	unsigned int i;
> > > +	unsigned int cache_len = cache->len;
> > > +	void *obj_table[RTE_MEMPOOL_CACHE_MAX_SIZE * 3];
> > > +	void *base_value = mp->pool_base_value;
> > > +	uint32_t *cache_objs = (uint32_t *) cache->objs;
> >
> > Hi Dharmik and Honnappa,
> >
> > The essence of this patch is based on recasting the type of the objs
> > field in the rte_mempool_cache structure from an array of pointers to
> > an array of uint32_t.
> >
> > However, this effectively breaks the ABI, because the rte_mempool_cache
> > structure is public and part of the API.
> The patch does not change the public structure, the new member is under
> a compile-time flag, so I am not sure how it breaks the ABI.
> >
> > Some drivers [1] even bypass the mempool API and access the
> > rte_mempool_cache structure directly, assuming that the objs array in
> > the cache is an array of pointers. So you cannot recast the fields in
> > the rte_mempool_cache structure the way this patch requires.
> IMO, those drivers are at fault. The mempool cache structure is public
> only because the APIs are inline. We should still maintain modularity and
> not use the members of structures belonging to another library directly.
> A similar effort involving rte_ring was not accepted some time back [1].
>
> [1]
> http://inbox.dpdk.org/dev/DBAPR08MB5814907968595EE56F5E20A798390@DBAPR08MB5814.eurprd08.prod.outlook.com/
> >
> > Although I do consider bypassing an API's accessor functions "spaghetti
> > code", this driver's behavior is formally acceptable as long as the
> > rte_mempool_cache structure is not marked as internal.
> >
> > I really liked your idea of using indexes instead of pointers, so I'm
> > very sorry to shoot it down. :-(
> >
> > [1]: E.g. the Intel i40e PMD,
> > http://code.dpdk.org/dpdk/latest/source/drivers/net/i40e/i40e_rxtx_vec_avx512.c#L25
> It is possible to throw an error when this feature is enabled in this
> file. Alternatively, this PMD could implement the code for an index-based
> mempool.
>
Yes, it can implement it, and if this model gets put into mempool it
probably will [even if it's just a fallback to the mempool code in that
case]. However, I would object to adding this model to the library right
now if it cannot be proved to show some benefit in a real-world case. As I
understand it, the only benefit seen has been in unit test cases?
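To make the scheme concrete for anyone skimming the thread: the idea under
discussion is that the per-lcore cache holds 32-bit offsets from the
mempool's base address rather than 8-byte object pointers. Below is a
minimal, illustrative sketch of that conversion - the pool_base_value field
is from the patch, but the helper names are hypothetical and not part of
the patch or the mempool API:

    #include <stdint.h>

    /* Illustrative only: convert an object pointer into the 32-bit value
     * kept in the per-lcore cache, given the pool base address. */
    static inline uint32_t
    obj_to_cache_index(void *obj, void *pool_base_value)
    {
        return (uint32_t)((uintptr_t)obj - (uintptr_t)pool_base_value);
    }

    /* Illustrative only: convert a cached 32-bit value back into the
     * object pointer. */
    static inline void *
    cache_index_to_obj(uint32_t idx, void *pool_base_value)
    {
        return (void *)((uintptr_t)pool_base_value + idx);
    }

On 64-bit systems this halves the memory footprint of the cache's objs
array, which is the intended source of the reported L1/L2 miss reduction.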
I want to ensure that any perf improvements we put in have some real-world
applicability - the amount of applicability will depend on the scope and
impact - and, by the same token, that we don't reject simplifications or
improvements on the basis that they *might* cause issues, if all perf data
fails to show any problem.

So for this patch, can we get some perf numbers for an app where it does
show its value? L3fwd is a very trivial app, and as such is usually fairly
reliable in showing perf benefits of optimizations if they exist. Perhaps
for this case we need something with a bigger cache footprint?

Regards,
/Bruce