Date: Thu, 27 Oct 2022 10:34:42 +0200
From: Olivier Matz
To: Morten Brørup
Cc: andrew.rybchenko@oktetlabs.ru, jerinj@marvell.com, thomas@monjalon.net,
 bruce.richardson@intel.com, dev@dpdk.org
Subject: Re: [PATCH] mempool: cache align mempool cache objects
References: <98CBD80474FA8B44BF855DF32C47DC35D8744E@smartserver.smartshare.dk>
 <20221026144436.71068-1-mb@smartsharesystems.com>
In-Reply-To: <20221026144436.71068-1-mb@smartsharesystems.com>
List-Id: DPDK patches and discussions

Hi Morten,

On Wed, Oct 26, 2022 at 04:44:36PM +0200, Morten Brørup wrote:
> Add __rte_cache_aligned to the objs array.
>
> It makes no difference in the general case, but if get/put operations are
> always 32 objects, it will reduce the number of memory (or last level
> cache) accesses from five to four 64 B cache lines for every get/put
> operation.
>
> For readability reasons, an example using 16 objects follows:
>
> Currently, with 16 objects (128B), we access 3 cache lines:
>
>        ┌────────┐
>        │len     │
> cache  │********│---
> line0  │********│ ^
>        │********│ |
>        ├────────┤ | 16 objects
>        │********│ | 128B
> cache  │********│ |
> line1  │********│ |
>        │********│ |
>        ├────────┤ |
>        │********│_v_
> cache  │        │
> line2  │        │
>        │        │
>        └────────┘
>
> With the alignment, it is also 3 cache lines:
>
>        ┌────────┐
>        │len     │
> cache  │        │
> line0  │        │
>        │        │
>        ├────────┤---
>        │********│ ^
> cache  │********│ |
> line1  │********│ |
>        │********│ |
>        ├────────┤ | 16 objects
>        │********│ | 128B
> cache  │********│ |
> line2  │********│ |
>        │********│ v
>        └────────┘---
>
> However, accessing the objects at the bottom of the mempool cache is a
> special case, where cache line0 is also used for objects.
>
> Consider the next burst (and any following bursts):
>
> Current:
>        ┌────────┐
>        │len     │
> cache  │        │
> line0  │        │
>        │        │
>        ├────────┤
>        │        │
> cache  │        │
> line1  │        │
>        │        │
>        ├────────┤
>        │        │
> cache  │********│---
> line2  │********│ ^
>        │********│ |
>        ├────────┤ | 16 objects
>        │********│ | 128B
> cache  │********│ |
> line3  │********│ |
>        │********│ |
>        ├────────┤ |
>        │********│_v_
> cache  │        │
> line4  │        │
>        │        │
>        └────────┘
> 4 cache lines touched, incl. line0 for len.
>
> With the proposed alignment:
>        ┌────────┐
>        │len     │
> cache  │        │
> line0  │        │
>        │        │
>        ├────────┤
>        │        │
> cache  │        │
> line1  │        │
>        │        │
>        ├────────┤
>        │        │
> cache  │        │
> line2  │        │
>        │        │
>        ├────────┤
>        │********│---
> cache  │********│ ^
> line3  │********│ |
>        │********│ | 16 objects
>        ├────────┤ | 128B
>        │********│ |
> cache  │********│ |
> line4  │********│ |
>        │********│_v_
>        └────────┘
> Only 3 cache lines touched, incl. line0 for len.

I understand your logic, but are we sure that having an application that
works with bulks of 32 means that the cache will stay aligned to 32
elements for the whole life of the application?
In an application, the alignment of the cache can change if you have any of:
- software queues (reassembly for instance)
- packet duplication (bridge, multicast)
- locally generated packets (keepalive, control protocol)
- pipeline to other cores

Even with testpmd, which works by bulks of 32, I can see that the size of
the cache filling is not aligned to 32. Right after starting the
application, we already have this:

  internal cache infos:
    cache_size=250
    cache_count[0]=231

This is probably related to the hw rx rings size, number of queues, number
of ports.

The "250" default value for cache size in testpmd is questionable, but
with --mbcache=256, the behavior is similar.

Also, when we transmit to a NIC, the mbufs are not returned immediately to
the pool: they may stay in the hw tx ring for some time, which is a driver
decision.

After processing traffic on cores 8 and 24 with this testpmd, I get:

  cache_count[0]=231
  cache_count[8]=123
  cache_count[24]=122

In my opinion, it is not realistic to think that the mempool cache will
remain aligned to cachelines. In these conditions, it looks better to keep
the structure packed to avoid wasting memory.

Olivier

>
> Credits go to Olivier Matz for the nice ASCII graphics.
>
> Signed-off-by: Morten Brørup
> ---
>  lib/mempool/rte_mempool.h | 6 ++++--
>  1 file changed, 4 insertions(+), 2 deletions(-)
>
> diff --git a/lib/mempool/rte_mempool.h b/lib/mempool/rte_mempool.h
> index 1f5707f46a..3725a72951 100644
> --- a/lib/mempool/rte_mempool.h
> +++ b/lib/mempool/rte_mempool.h
> @@ -86,11 +86,13 @@ struct rte_mempool_cache {
>  	uint32_t size; /**< Size of the cache */
>  	uint32_t flushthresh; /**< Threshold before we flush excess elements */
>  	uint32_t len; /**< Current cache count */
> -	/*
> +	/**
> +	 * Cache objects
> +	 *
>  	 * Cache is allocated to this size to allow it to overflow in certain
>  	 * cases to avoid needless emptying of cache.
>  	 */
> -	void *objs[RTE_MEMPOOL_CACHE_MAX_SIZE * 2]; /**< Cache objects */
> +	void *objs[RTE_MEMPOOL_CACHE_MAX_SIZE * 2] __rte_cache_aligned;
>  } __rte_cache_aligned;
>  
>  /**
> --
> 2.17.1
>