Subject: RE: FW: [PATCH v4 3/3] mempool: use cache for frequently updated stats
Date: Wed, 9 Nov 2022 06:03:03 +0100
Message-ID: <98CBD80474FA8B44BF855DF32C47DC35D874A4@smartserver.smartshare.dk>
In-Reply-To: <74775cadd8174c3797b4076929ebbcb7@huawei.com>
References: <98CBD80474FA8B44BF855DF32C47DC35D874A1@smartserver.smartshare.dk> <6110999.17fYzF0512@thomas> <98CBD80474FA8B44BF855DF32C47DC35D874A3@smartserver.smartshare.dk> <2028060.trqCLbgVIZ@thomas> <74775cadd8174c3797b4076929ebbcb7@huawei.com>
From: Morten Brørup
To: "Konstantin Ananyev", "Bruce Richardson", "Thomas Monjalon"
List-Id: DPDK patches and discussions

> From: Konstantin Ananyev [mailto:konstantin.ananyev@huawei.com]
> Sent: Tuesday, 8 November 2022 18.38
>
> > On Tue, Nov 08, 2022 at 04:51:11PM +0100, Thomas Monjalon wrote:
> > > 08/11/2022 15:30, Morten Brørup:
> > > > > From: Thomas Monjalon [mailto:thomas@monjalon.net]
> > > > > 08/11/2022 12:25, Morten Brørup:
> > > > > > From: Morten Brørup
> > > > > > > From: Konstantin Ananyev [mailto:konstantin.ananyev@huawei.com]
> > > > > > > Sent: Tuesday, 8 November 2022 10.20
> > > > > > > > +#ifdef RTE_LIBRTE_MEMPOOL_STATS
> > > > > > > > +#define RTE_MEMPOOL_CACHE_STAT_ADD(cache, name, n) (cache)->stats.name += n
> > > > > > >
> > > > > > > As Andrew already pointed, it needs to be: ((cache)->stats.name += (n))
> > > > > > > Apart from that, LGTM.
> > > > > > > Series-Acked-by: Konstantin Ananyev
> > > > > >
> > > > > > @Thomas, this series should be ready to apply... it now has been:
> > > > > > Reviewed-by: (mempool maintainer) Andrew Rybchenko
> > > > > > Reviewed-By: Mattias Rönnblom
> > > > > > Acked-by: Konstantin Ananyev
> > > > >
> > > > > Being acked does not mean it is good to apply in -rc3.
> > > >
> > > > I understand that the RFC/v1 of this series was formally too late to make it in 22.11, so I will not complain loudly if you choose to omit it for 22.11.
> > > >
> > > > With two independent reviews, including one from a mempool maintainer, I still have some hope. Also considering the risk assessment below. ;-)
> > > >
> > > > > Please tell what is the benefit for 22.11 (before/after and condition).
> > > >
> > > > Short version: With this series, mempool statistics can be used in production. Without it, the performance cost (mempool_perf_autotest: -74 %) is prohibitive!
> > > >
> > > > Long version:
> > > >
> > > > The patch series provides significantly higher performance for mempool statistics, which are readable through rte_mempool_dump(FILE *f, struct rte_mempool *mp).
> > > >
> > > > Without this series, you have to set RTE_LIBRTE_MEMPOOL_DEBUG at build time to get mempool statistics.
> > > > RTE_LIBRTE_MEMPOOL_DEBUG also enables protective cookies before and after each mempool object, which are all verified on get/put from the mempool. According to mempool_perf_autotest, the performance cost of mempool statistics (by setting RTE_LIBRTE_MEMPOOL_DEBUG) is a 74 % decrease in rate_persec for mempools with cache (i.e. mbuf pools). Prohibitive for use in production!
> > > >
> > > > With this series, the performance cost of mempool statistics (by setting RTE_LIBRTE_MEMPOOL_STATS) in mempool_perf_autotest is only 6.7 %, so mempool statistics can be used in production.
> > > >
> > > > > Note there is a real risk doing such change that late.
> > > >
> > > > Risk assessment:
> > > >
> > > > The patch series has zero effect unless either RTE_LIBRTE_MEMPOOL_DEBUG or RTE_LIBRTE_MEMPOOL_STATS is set when building. They are not set in the default build.
> > >
> > > If these build flags are not set, there is no risk and no benefit.
> > > But if they are set, there is a risk of regression,
> > > for the benefit of increased performance of a debug feature.
> > > I would say it is better to avoid any functional regression in a debug feature at this stage.
> > > Any other opinion?
> >
> > While I agree that we should avoid any functional regression, I wonder how widely used the debug feature is, and how big the risk of a regression is? Even if there is one, having a regression in a debug feature is a lot less serious than having one in something which goes into production.
>
> Unless it introduces an ABI breakage (as I understand it doesn't), I'll wait till 23.03.
> Just in case.

If built (both before and after this series) without RTE_LIBRTE_MEMPOOL_DEBUG (and without RTE_LIBRTE_MEMPOOL_STATS, which is introduced by the series), there is no ABI breakage.
If built (both before and after this series) with RTE_LIBRTE_MEMPOOL_DEBUG (and without RTE_LIBRTE_MEMPOOL_STATS), the ABI differs between before and after this series: The stats array disappears from struct rte_mempool, and the output from rte_mempool_dump() does not include the statistics.

If built (both before and after this series) with RTE_LIBRTE_MEMPOOL_DEBUG (and with RTE_LIBRTE_MEMPOOL_STATS), the ABI also differs between before and after this series: The size of the stats array in struct rte_mempool grows by one element.

> BTW, as a side thought - if the impact is really that small now, would
> it make sense to make it a run-time option, instead of a compile-time one?

The mempool get/put functions are very lean when built without STATS or DEBUG. With a run-time option, the resulting code would be slightly longer, and only one additional conditional would be hit in the common case (i.e. when the objects don't miss the mempool cache). So with stats disabled at run time, it would only add a very small performance cost. However, checking the value of the enabled/disabled variable can cause a CPU cache miss, which has a performance cost. And the enabled/disabled variable should definitely be global: if it were per mempool, it would cause many CPU cache misses, because the common case only touches the mempool cache structure, not the mempool structure itself.

Also, checking the run-time option should use unlikely(), so the performance cost of the stats (when enabled at run time) is also higher than with a build-time option. (Yes, dynamic branch prediction will alleviate most of this, but it will consume entries in the branch predictor table - these are inlined functions. Just like we always try to avoid cache misses in DPDK, we should also try to conserve branch predictor table entries. I hate the argument that branch prediction fixes conditionals, especially when they are weird or could have been avoided.)
In the cost/benefit analysis, we need to consider that these statistics are not fill/emptiness level status or similar, but only debug counters (number of get/put transactions and objects), so we need to ask ourselves this question: How many users are interested in these statistics in production, and are unable to build their application with RTE_LIBRTE_MEMPOOL_STATS?

For example, we (SmartShare Systems) are only interested in them for application profiling purposes... trying to improve performance by striving for a higher number of objects per burst in every pipeline stage.

> Konstantin