Date: Fri, 14 Oct 2022 21:50:37 +0200
From: Olivier Matz
To: Morten Brørup
Cc: Andrew Rybchenko, dev@dpdk.org, Bruce Richardson
Subject: Re: [PATCH v6 3/4] mempool: fix cache flushing algorithm

On Fri, Oct 14, 2022 at 05:57:39PM +0200, Morten Brørup wrote:
> > From: Olivier Matz [mailto:olivier.matz@6wind.com]
> > Sent: Friday, 14 October 2022 16.01
> >
> > Hi Morten, Andrew,
> >
> > On Sun, Oct 09, 2022 at 05:08:39PM +0200, Morten Brørup wrote:
> > > > From: Andrew Rybchenko [mailto:andrew.rybchenko@oktetlabs.ru]
> > > > Sent: Sunday, 9 October 2022 16.52
> > > >
> > > > On 10/9/22 17:31, Morten Brørup wrote:
> > > > >> From: Andrew Rybchenko [mailto:andrew.rybchenko@oktetlabs.ru]
> > > > >> Sent: Sunday, 9 October 2022 15.38
> > > > >>
> > > > >> From: Morten Brørup
> > > > >>
> > >
> >
> > [...]
> >
> > I finally took a couple of hours to carefully review the
> > mempool-related series (including the ones that have already been
> > pushed).
> >
> > The new behavior looks better to me in all situations I can think
> > about.
>
> Extreme care is required when touching a core library like the mempool.
>
> Thank you, Olivier.
>
> >
> > >
> > > > >> --- a/lib/mempool/rte_mempool.h
> > > > >> +++ b/lib/mempool/rte_mempool.h
> > > > >> @@ -90,7 +90,7 @@ struct rte_mempool_cache {
> > > > >>  	 * Cache is allocated to this size to allow it to overflow in certain
> > > > >>  	 * cases to avoid needless emptying of cache.
> > > > >>  	 */
> > > > >> -	void *objs[RTE_MEMPOOL_CACHE_MAX_SIZE * 3]; /**< Cache objects */
> > > > >> +	void *objs[RTE_MEMPOOL_CACHE_MAX_SIZE * 2]; /**< Cache objects */
> > > > >>  } __rte_cache_aligned;
> > > > >
> > > > > How much are we allowed to break the ABI here?
> > > > >
> > > > > This patch reduces the size of the structure by removing a now
> > > > > unused part at the end, which should be harmless.
> >
> > It is an ABI breakage: an existing application will use the new 22.11
> > function to create the mempool (with a smaller cache), but the old
> > inlined get/put code, which can exceed MAX_SIZE x 2, will remain in
> > use.
> >
> > But this is a nice memory consumption improvement; in my opinion, we
> > should accept it for 22.11 with an entry in the release notes.
> >
> > >
> > > >
> > > > If we may also move the position of the objs array, I would add
> > > > __rte_cache_aligned to the objs array. It makes no difference in
> > > > the general case, but if get/put operations are always 32 objects,
> > > > it will reduce the number of memory (or last level cache) accesses
> > > > from five to four 64 B cache lines for every get/put operation.
> >
> > Will it really be the case? Since cache->len has to be accessed too,
> > I don't think it would make a difference.
>
> Yes, the first cache line, containing cache->len, will always be
> accessed. I forgot to count that, so the improvement from aligning
> cache->objs will be five cache line accesses instead of six.
>
> Let me try to explain the scenario in other words:
>
> In an application where a mempool cache is only accessed in bursts of
> 32 objects (256 B), it matters whether those 256 B accesses in the
> mempool cache start at a cache line aligned address or not. If cache
> line aligned, accessing those 256 B in the mempool cache will only
> touch 4 cache lines; if not, 5 cache lines will be touched. (For
> architectures with a 128 B cache line, it will be 2 instead of 3
> touched cache lines per mempool cache get/put operation in
> applications using only bursts of 32 objects.)
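
As a quick illustration of this counting, a minimal sketch in C
(assuming 64 B cache lines, 8 B pointers, and objs[0] at byte offset 16
right behind the small header fields; lines_touched() is a hypothetical
helper, not a DPDK API):

#include <stdio.h>
#include <stddef.h>

#define CACHE_LINE_SIZE 64

/* Hypothetical helper (not a DPDK API): number of CACHE_LINE_SIZE-byte
 * cache lines spanned by an access of 'nbytes' bytes starting at byte
 * offset 'off' from a cache-line-aligned base address. */
static unsigned int
lines_touched(size_t off, size_t nbytes)
{
	return (unsigned int)((off + nbytes - 1) / CACHE_LINE_SIZE
			      - off / CACHE_LINE_SIZE + 1);
}

int main(void)
{
	size_t burst = 32 * sizeof(void *);	/* 32 objects = 256 B on 64-bit */

	/* objs[] packed directly behind the small header fields (unaligned) */
	printf("objs at offset 16: burst touches %u lines\n",
	       lines_touched(16, burst));	/* prints 5 */

	/* objs[] starting on its own cache line (aligned) */
	printf("objs at offset 64: burst touches %u lines\n",
	       lines_touched(64, burst));	/* prints 4 */

	return 0;
}

Whether the cache line holding cache->len must be counted on top of the
burst itself is what the rest of the exchange turns on.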
>
> If we cache line align cache->objs, those bursts of 32 objects (256 B)
> will be cache line aligned: the address of cache->objs[N * 32] is cache
> line aligned if cache->objs[0] is cache line aligned.
>
> Currently, cache->objs directly follows cache->len, which makes
> cache->objs[0] cache line unaligned.
>
> If we decide to break the mempool cache ABI, we might as well include
> my suggested cache line alignment performance improvement. It doesn't
> degrade performance for mempool caches that are not accessed only in
> bursts of 32 objects.

I don't follow you. Currently, with 16 objects (128 B), we access
3 cache lines:

      ┌────────┐
      │len     │
cache │********│---
line0 │********│ ^
      │********│ |
      ├────────┤ | 16 objects
      │********│ | 128B
cache │********│ |
line1 │********│ |
      │********│ |
      ├────────┤ |
      │********│_v_
cache │        │
line2 │        │
      │        │
      └────────┘

With the alignment, it is also 3 cache lines:

      ┌────────┐
      │len     │
cache │        │
line0 │        │
      │        │
      ├────────┤---
      │********│ ^
cache │********│ |
line1 │********│ |
      │********│ |
      ├────────┤ | 16 objects
      │********│ | 128B
cache │********│ |
line2 │********│ |
      │********│ v
      └────────┘---

Am I missing something?

>
> > >
> > > >
> > > > > 	uint32_t len; /**< Current cache count */
> > > > > -	/*
> > > > > -	 * Cache is allocated to this size to allow it to overflow in certain
> > > > > -	 * cases to avoid needless emptying of cache.
> > > > > -	 */
> > > > > -	void *objs[RTE_MEMPOOL_CACHE_MAX_SIZE * 3]; /**< Cache objects */
> > > > > +	/**
> > > > > +	 * Cache objects
> > > > > +	 *
> > > > > +	 * Cache is allocated to this size to allow it to overflow in certain
> > > > > +	 * cases to avoid needless emptying of cache.
> > > > > +	 */
> > > > > +	void *objs[RTE_MEMPOOL_CACHE_MAX_SIZE * 2] __rte_cache_aligned;
> > > > > } __rte_cache_aligned;
> > > >
> > > > I think aligning objs on cacheline should be a separate patch.
> > >
> > > Good point. I'll let you do it. :-)
> > >
> > > PS: Thank you for following up on this patch series, Andrew!
> >
> > Many thanks for this rework.
> >
> > Acked-by: Olivier Matz
>
> Perhaps Reviewed-by would be appropriate?

I was thinking that "Acked-by" was commonly used by maintainers, and
"Reviewed-by" for reviews by community members. After reading the
documentation again, it's not that clear in my mind anymore :)

Thanks,
Olivier
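
For reference, the structural effect of the __rte_cache_aligned
suggestion above can be sketched as follows (a mock layout for
illustration only; the field set, the 512 value standing in for
RTE_MEMPOOL_CACHE_MAX_SIZE, and the GCC-style attribute spelling are
assumptions, not the real struct rte_mempool_cache):

#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define CACHE_LINE_SIZE 64
#define CACHE_MAX_SIZE 512	/* stand-in for RTE_MEMPOOL_CACHE_MAX_SIZE */

/* Mock of the cache layout under discussion; fields and sizes are
 * assumptions for illustration, not the real struct rte_mempool_cache. */
struct mock_mempool_cache {
	uint32_t size;
	uint32_t flushthresh;
	uint32_t len;		/* current cache count */
	/* Proposed change: start the object array on its own cache line. */
	void *objs[CACHE_MAX_SIZE * 2] __attribute__((aligned(CACHE_LINE_SIZE)));
} __attribute__((aligned(CACHE_LINE_SIZE)));

/* With the alignment attribute, objs no longer shares a cache line with
 * len; the cost is padding between len and objs[0]. */
static_assert(offsetof(struct mock_mempool_cache, objs) % CACHE_LINE_SIZE == 0,
	      "objs starts on a cache line boundary");
static_assert(offsetof(struct mock_mempool_cache, objs) == CACHE_LINE_SIZE,
	      "one header line, then the object array");

int main(void)
{
	return 0;
}

This makes the trade-off visible: objs[0] moves onto its own cache line
boundary at the cost of padding after len, which is the change proposed
as a separate patch in the thread above.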