From: Morten Brørup <mb@smartsharesystems.com>
To: olivier.matz@6wind.com, andrew.rybchenko@oktetlabs.ru
Cc: jerinj@marvell.com, thomas@monjalon.net, bruce.richardson@intel.com,
    dev@dpdk.org, Morten Brørup <mb@smartsharesystems.com>
Subject: [PATCH v4 1/2] mempool: cache align mempool cache objects
Date: Fri, 28 Oct 2022 08:41:51 +0200
Message-Id: <20221028064152.98341-1-mb@smartsharesystems.com>
In-Reply-To: <20221026144436.71068-1-mb@smartsharesystems.com>
References: <20221026144436.71068-1-mb@smartsharesystems.com>

Add __rte_cache_aligned to the objs array.

It makes no difference in the general case, but if get/put operations are
always 32 objects, it will reduce the number of memory (or last level
cache) accesses from five to four 64 B cache lines for every get/put
operation.

For readability reasons, an example using 16 objects follows:

Currently, with 16 objects (128B), we access 3 cache lines:

        ┌────────┐
        │len     │
cache   │********│---
line0   │********│ ^
        │********│ |
        ├────────┤ | 16 objects
        │********│ | 128B
cache   │********│ |
line1   │********│ |
        │********│ |
        ├────────┤ |
        │********│_v_
cache   │        │
line2   │        │
        │        │
        └────────┘

With the alignment, it is also 3 cache lines:

        ┌────────┐
        │len     │
cache   │        │
line0   │        │
        │        │
        ├────────┤---
        │********│ ^
cache   │********│ |
line1   │********│ |
        │********│ |
        ├────────┤ | 16 objects
        │********│ | 128B
cache   │********│ |
line2   │********│ |
        │********│ v
        └────────┘---

However, accessing the objects at the bottom of the mempool cache is a
special case, where cache line0 is also used for objects.

Consider the next burst (and any following bursts):

Current:

        ┌────────┐
        │len     │
cache   │        │
line0   │        │
        │        │
        ├────────┤
        │        │
cache   │        │
line1   │        │
        │        │
        ├────────┤
        │        │
cache   │********│---
line2   │********│ ^
        │********│ |
        ├────────┤ | 16 objects
        │********│ | 128B
cache   │********│ |
line3   │********│ |
        │********│ |
        ├────────┤ |
        │********│_v_
cache   │        │
line4   │        │
        │        │
        └────────┘

4 cache lines touched, incl. line0 for len.

With the proposed alignment:

        ┌────────┐
        │len     │
cache   │        │
line0   │        │
        │        │
        ├────────┤
        │        │
cache   │        │
line1   │        │
        │        │
        ├────────┤
        │        │
cache   │        │
line2   │        │
        │        │
        ├────────┤
        │********│---
cache   │********│ ^
line3   │********│ |
        │********│ | 16 objects
        ├────────┤ | 128B
        │********│ |
cache   │********│ |
line4   │********│ |
        │********│_v_
        └────────┘

Only 3 cache lines touched, incl. line0 for len.

Credits go to Olivier Matz for the nice ASCII graphics.
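
To make the arithmetic above easy to check, a small standalone C sketch
follows (illustration only, not part of the patch). The two structs merely
mirror the layout of struct rte_mempool_cache as shown in the diff below;
the 64 B line size and the 512-entry stand-in for RTE_MEMPOOL_CACHE_MAX_SIZE
are assumptions for the example. It counts only the lines holding the
accessed objects (the line holding len comes on top in both cases) and, on
an LP64 target, prints 5 lines for the current layout and 4 for the aligned
one when a burst of 32 pointers is accessed:

#include <stdalign.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

#define LINE 64u	/* assumed cache line size */
#define CACHE_MAX 512	/* stand-in for RTE_MEMPOOL_CACHE_MAX_SIZE */

/* Mirror of the current layout: objs starts at byte offset 16 on LP64. */
struct cache_unaligned {
	uint32_t size;
	uint32_t flushthresh;
	uint32_t len;
	void *objs[CACHE_MAX * 2];
};

/* Mirror of the patched layout: objs starts on a cache-line boundary. */
struct cache_aligned {
	uint32_t size;
	uint32_t flushthresh;
	uint32_t len;
	alignas(LINE) void *objs[CACHE_MAX * 2];
};

/* Number of 64 B cache lines spanned by n pointers at byte offset off. */
static unsigned
lines_spanned(size_t off, unsigned n)
{
	size_t first = off / LINE;
	size_t last = (off + n * sizeof(void *) - 1) / LINE;

	return (unsigned)(last - first + 1);
}

int
main(void)
{
	/* Second burst of 32 objects, i.e. objs[32] .. objs[63]. */
	size_t burst = 32 * sizeof(void *);

	printf("unaligned: %u lines\n",
	    lines_spanned(offsetof(struct cache_unaligned, objs) + burst, 32));
	printf("aligned:   %u lines\n",
	    lines_spanned(offsetof(struct cache_aligned, objs) + burst, 32));
	return 0;
}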

v4:
* No changes. Added reviewed- and acked-by tags.

v3:
* No changes. Made part of a series.

v2:
* No such version.

Signed-off-by: Morten Brørup <mb@smartsharesystems.com>
Reviewed-by: Andrew Rybchenko <andrew.rybchenko@oktetlabs.ru>
Acked-by: Olivier Matz <olivier.matz@6wind.com>
---
 lib/mempool/rte_mempool.h | 6 ++++--
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/lib/mempool/rte_mempool.h b/lib/mempool/rte_mempool.h
index 1f5707f46a..3725a72951 100644
--- a/lib/mempool/rte_mempool.h
+++ b/lib/mempool/rte_mempool.h
@@ -86,11 +86,13 @@ struct rte_mempool_cache {
 	uint32_t size; /**< Size of the cache */
 	uint32_t flushthresh; /**< Threshold before we flush excess elements */
 	uint32_t len; /**< Current cache count */
-	/*
+	/**
+	 * Cache objects
+	 *
 	 * Cache is allocated to this size to allow it to overflow in certain
 	 * cases to avoid needless emptying of cache.
 	 */
-	void *objs[RTE_MEMPOOL_CACHE_MAX_SIZE * 2]; /**< Cache objects */
+	void *objs[RTE_MEMPOOL_CACHE_MAX_SIZE * 2] __rte_cache_aligned;
 } __rte_cache_aligned;
 
 /**
-- 
2.17.1
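
As a possible follow-up (a sketch only, nothing in this patch adds it), the
new layout could be guarded against regressions by a compile-time check
placed next to the struct definition. It assumes a DPDK build environment
where rte_mempool.h and RTE_CACHE_LINE_SIZE are available on the include
path:

#include <stddef.h>

#include <rte_common.h>
#include <rte_mempool.h>

/* Fail the build if a later field addition pushes objs[] off a
 * cache-line boundary again.
 */
_Static_assert(offsetof(struct rte_mempool_cache, objs) %
		RTE_CACHE_LINE_SIZE == 0,
		"objs[] must start on a cache-line boundary");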