References: <20200113064941.2749356-1-jerinj@marvell.com>
 <20200114210603.1836160-1-jerinj@marvell.com>
In-Reply-To: <20200114210603.1836160-1-jerinj@marvell.com>
From: Jerin Jacob
Date: Thu, 16 Jan 2020 18:40:23 +0530
To: Jerin Jacob
Cc: dpdk-dev, Thomas Monjalon, Olivier Matz, Andrew Rybchenko,
 "Richardson, Bruce", "Ananyev, Konstantin", Hemant Agrawal, Shahaf Shuler,
 Honnappa Nagarahalli, Gavin Hu, Jan Viktorin, David Christensen,
 Anatoly Burakov
Subject: Re: [dpdk-dev] [PATCH v4] mempool: remove memory wastage on non x86

On Wed, Jan 15, 2020 at 2:35 AM wrote:
>
> From: Jerin Jacob
>
> The existing optimize_object_size() function addresses the memory object
> alignment constraint on x86 for better performance.
>
> Different (micro) architectures may have different memory alignment
> constraints for better performance, and these are not the same as the
> existing optimize_object_size().
>
> Some use an XOR (kind of CRC) scheme to enable DRAM channel distribution
> based on the address, and some may have a different formula.
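
As an illustration of the XOR-style channel distribution mentioned above, a
purely hypothetical hash could look like the sketch below (in C, not modelled
on any real memory controller); the point is that the channel already depends
on many address bits, so per-object padding gains nothing on such systems:

#include <stdint.h>

/*
 * Purely hypothetical sketch, not any specific controller's hash: the
 * DRAM channel is chosen by XOR-folding the address bits above the
 * cache-line offset (nchan_log2 must be >= 1). With such a scheme,
 * consecutive objects already land on different channels whatever
 * their size, so address-based padding only wastes memory.
 */
static unsigned int
xor_channel_of(uintptr_t addr, unsigned int nchan_log2)
{
        unsigned int chan = 0;

        for (addr >>= 6; addr != 0; addr >>= nchan_log2)
                chan ^= (unsigned int)addr & ((1u << nchan_log2) - 1);
        return chan;
}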
>
> Introduce an arch_mem_object_align() function to abstract the
> differences between (micro) architectures and avoid wasting memory on
> mempool object alignment for the architectures that do not require it.
>
> Details of the memory savings:
>
> Currently, arm64 based architectures use the default (nchan=4,
> nrank=1). The worst case is for an object whose size (including mempool
> header) is 2 cache lines, where it is optimized to 3 cache lines (+50%).
>
> Examples for cache line size = 64:
>   orig      optimized
>    64  ->    64    +0%
>   128  ->   192   +50%
>   192  ->   192    +0%
>   256  ->   320   +25%
>   320  ->   320    +0%
>   384  ->   448   +16%
>   ...
>  2304  ->  2368   +2.7%  (~mbuf size)
>
> Additional details:
> https://www.mail-archive.com/dev@dpdk.org/msg149157.html
>
> Signed-off-by: Jerin Jacob
> Reviewed-by: Gavin Hu

Ping for merge.

> ---
> v4:
>
> - s/X86/x86 in the doc/guides/prog_guide/mempool_lib.rst file (David)
> - Remove Fixes and CC: stable@dpdk.org as well, as this is just an
>   improvement. (David)
> - Update the patch heading to remove a checkpatch warning due to the
>   above change.
>
> v3:
> - Change the comment for the MEMPOOL_F_NO_SPREAD flag to
>   "Spreading among memory channels not required." (Stephen Hemminger)
>
> v2:
> - Changed the return type of arch_mem_object_align() to "unsigned int"
>   from "unsigned" to fix the checkpatch issues (Olivier Matz)
> - Updated the comments for MEMPOOL_F_NO_SPREAD (Olivier Matz)
> - Updated the git comments to share the memory saving details.
>
>  doc/guides/prog_guide/mempool_lib.rst |  6 +++---
>  lib/librte_mempool/rte_mempool.c      | 17 +++++++++++++----
>  lib/librte_mempool/rte_mempool.h      |  3 ++-
>  3 files changed, 18 insertions(+), 8 deletions(-)
>
> diff --git a/doc/guides/prog_guide/mempool_lib.rst b/doc/guides/prog_guide/mempool_lib.rst
> index 3bb84b0a6..f8b430d65 100644
> --- a/doc/guides/prog_guide/mempool_lib.rst
> +++ b/doc/guides/prog_guide/mempool_lib.rst
> @@ -27,10 +27,10 @@ In debug mode (CONFIG_RTE_LIBRTE_MEMPOOL_DEBUG is enabled),
>  statistics about get from/put in the pool are stored in the mempool structure.
>  Statistics are per-lcore to avoid concurrent access to statistics counters.
>
> -Memory Alignment Constraints
> -----------------------------
> +Memory Alignment Constraints on x86 architecture
> +------------------------------------------------
>
> -Depending on hardware memory configuration, performance can be greatly improved by adding a specific padding between objects.
> +Depending on hardware memory configuration on X86 architecture, performance can be greatly improved by adding a specific padding between objects.
>  The objective is to ensure that the beginning of each object starts on a different channel and rank in memory so that all channels are equally loaded.
>
>  This is particularly true for packet buffers when doing L3 forwarding or flow classification.
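
The savings table quoted above can be reproduced with the following standalone
sketch, which applies the same pad-until-coprime rule as the x86 path under
the stated assumptions (64-byte cache lines, nchan=4, nrank=1); the helper
names here are illustrative, not the library's:

#include <stdio.h>

/* Euclid's algorithm, equivalent to the pool allocator's get_gcd(). */
static unsigned int
gcd(unsigned int a, unsigned int b)
{
        while (b != 0) {
                unsigned int t = a % b;

                a = b;
                b = t;
        }
        return a;
}

/* Pad the object size, in cache lines, until it is coprime with nchan * nrank. */
static unsigned int
x86_style_align(unsigned int obj_size)
{
        const unsigned int align = 64, nchan = 4, nrank = 1;
        unsigned int lines = (obj_size + align - 1) / align;

        while (gcd(lines, nchan * nrank) != 1)
                lines++;
        return lines * align;
}

int main(void)
{
        const unsigned int sizes[] = { 64, 128, 192, 256, 320, 384, 2304 };
        unsigned int i;

        for (i = 0; i < sizeof(sizes) / sizeof(sizes[0]); i++)
                printf("%4u -> %4u\n", sizes[i], x86_style_align(sizes[i]));
        return 0;
}

It prints 64 -> 64, 128 -> 192, 256 -> 320, 384 -> 448 and 2304 -> 2368,
matching the percentages in the table above.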
> diff --git a/lib/librte_mempool/rte_mempool.c b/lib/librte_mempool/rte_mempool.c
> index 78d8eb941..1909998e8 100644
> --- a/lib/librte_mempool/rte_mempool.c
> +++ b/lib/librte_mempool/rte_mempool.c
> @@ -45,6 +45,7 @@ EAL_REGISTER_TAILQ(rte_mempool_tailq)
>  #define CALC_CACHE_FLUSHTHRESH(c)       \
>          ((typeof(c))((c) * CACHE_FLUSHTHRESH_MULTIPLIER))
>
> +#if defined(RTE_ARCH_X86)
>  /*
>   * return the greatest common divisor between a and b (fast algorithm)
>   *
> @@ -74,12 +75,13 @@ static unsigned get_gcd(unsigned a, unsigned b)
>  }
>
>  /*
> - * Depending on memory configuration, objects addresses are spread
> + * Depending on memory configuration on x86 arch, objects addresses are spread
>   * between channels and ranks in RAM: the pool allocator will add
>   * padding between objects. This function return the new size of the
>   * object.
>   */
> -static unsigned optimize_object_size(unsigned obj_size)
> +static unsigned int
> +arch_mem_object_align(unsigned int obj_size)
>  {
>          unsigned nrank, nchan;
>          unsigned new_obj_size;
> @@ -99,6 +101,13 @@ static unsigned optimize_object_size(unsigned obj_size)
>                  new_obj_size++;
>          return new_obj_size * RTE_MEMPOOL_ALIGN;
>  }
> +#else
> +static unsigned int
> +arch_mem_object_align(unsigned int obj_size)
> +{
> +        return obj_size;
> +}
> +#endif
>
>  struct pagesz_walk_arg {
>          int socket_id;
> @@ -234,8 +243,8 @@ rte_mempool_calc_obj_size(uint32_t elt_size, uint32_t flags,
>           */
>          if ((flags & MEMPOOL_F_NO_SPREAD) == 0) {
>                  unsigned new_size;
> -                new_size = optimize_object_size(sz->header_size + sz->elt_size +
> -                        sz->trailer_size);
> +                new_size = arch_mem_object_align
> +                        (sz->header_size + sz->elt_size + sz->trailer_size);
>                  sz->trailer_size = new_size - sz->header_size - sz->elt_size;
>          }
>
> diff --git a/lib/librte_mempool/rte_mempool.h b/lib/librte_mempool/rte_mempool.h
> index f81152af9..f48c94def 100644
> --- a/lib/librte_mempool/rte_mempool.h
> +++ b/lib/librte_mempool/rte_mempool.h
> @@ -260,7 +260,8 @@ struct rte_mempool {
>  #endif
>  } __rte_cache_aligned;
>
> -#define MEMPOOL_F_NO_SPREAD 0x0001 /**< Do not spread among memory channels. */
> +#define MEMPOOL_F_NO_SPREAD 0x0001
> +/**< Spreading among memory channels not required. */
>  #define MEMPOOL_F_NO_CACHE_ALIGN 0x0002 /**< Do not align objs on cache lines.*/
>  #define MEMPOOL_F_SP_PUT 0x0004 /**< Default put is "single-producer".*/
>  #define MEMPOOL_F_SC_GET 0x0008 /**< Default get is "single-consumer".*/
> --
> 2.24.1
>
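
For reference, MEMPOOL_F_NO_SPREAD, whose comment is reworded above, is the
flag that turns this padding off altogether on any architecture. A small usage
sketch (the pool name and sizes are made up for illustration, and the call
assumes the standard rte_mempool_create() prototype):

#include <rte_lcore.h>
#include <rte_mempool.h>

/*
 * Usage sketch only: with MEMPOOL_F_NO_SPREAD set,
 * rte_mempool_calc_obj_size() skips the arch_mem_object_align() step
 * entirely, so no channel/rank padding is added between objects.
 */
static struct rte_mempool *
create_no_spread_pool(void)
{
        return rte_mempool_create("no_spread_pool",
                                  8192,        /* number of elements */
                                  2048,        /* element size in bytes */
                                  256,         /* per-lcore cache size */
                                  0,           /* private data size */
                                  NULL, NULL,  /* no pool constructor */
                                  NULL, NULL,  /* no per-object constructor */
                                  rte_socket_id(),
                                  MEMPOOL_F_NO_SPREAD);
}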