From mboxrd@z Thu Jan  1 00:00:00 1970
MIME-Version: 1.0
References: <20210930172735.2675627-1-dharmik.thakkar@arm.com>
 <20211224225923.806498-1-dharmik.thakkar@arm.com>
 <98CBD80474FA8B44BF855DF32C47DC35D86DAD@smartserver.smartshare.dk>
 <Ydgg7YlndpBb0g+w@bricha3-MOBL.ger.corp.intel.com>
 <98CBD80474FA8B44BF855DF32C47DC35D86DEA@smartserver.smartshare.dk>
 <YdhFKhWtpzKS6g7l@bricha3-MOBL.ger.corp.intel.com>
 <98CBD80474FA8B44BF855DF32C47DC35D86DEF@smartserver.smartshare.dk>
In-Reply-To: <98CBD80474FA8B44BF855DF32C47DC35D86DEF@smartserver.smartshare.dk>
From: Jerin Jacob <jerinjacobk@gmail.com>
Date: Mon, 10 Jan 2022 12:08:25 +0530
Message-ID: <CALBAE1NWcTNiAjxD7hSgGBzsYP0bx+0bsaUxiJ0LpGmCByLRtA@mail.gmail.com>
Subject: Re: [PATCH 0/1] mempool: implement index-based per core cache
To: Morten Brørup <mb@smartsharesystems.com>
Cc: Bruce Richardson <bruce.richardson@intel.com>,
 Dharmik Thakkar <dharmik.thakkar@arm.com>, 
 Honnappa Nagarahalli <honnappa.nagarahalli@arm.com>, dpdk-dev <dev@dpdk.org>,
 nd <nd@arm.com>, 
 "Ruifeng Wang (Arm Technology China)" <ruifeng.wang@arm.com>
Content-Type: text/plain; charset="UTF-8"
Content-Transfer-Encoding: quoted-printable

On Sat, Jan 8, 2022 at 3:07 PM Morten Brørup <mb@smartsharesystems.com> wrote:
>
> > From: Bruce Richardson [mailto:bruce.richardson@intel.com]
> > Sent: Friday, 7 January 2022 14.51
> >
> > On Fri, Jan 07, 2022 at 12:29:23PM +0100, Morten Brørup wrote:
> > > > From: Bruce Richardson [mailto:bruce.richardson@intel.com]
> > > > Sent: Friday, 7 January 2022 12.16
> > > >
> > > > On Sat, Dec 25, 2021 at 01:16:03AM +0100, Morten Brørup wrote:
> > > > > > From: Dharmik Thakkar [mailto:dharmik.thakkar@arm.com]
> > > > > > Sent: Friday, 24 December 2021 23.59
> > > > > >
> > > > > > The current mempool per-core cache implementation stores pointers
> > > > > > to mbufs. On 64-bit architectures, each pointer consumes 8 B. This
> > > > > > patch replaces it with an index-based implementation, wherein each
> > > > > > buffer is addressed by (pool base address + index). It reduces the
> > > > > > amount of memory/cache required for the per-core cache.
> > > > > >
> > > > > > L3Fwd performance testing reveals minor improvements in the cache
> > > > > > performance (L1 and L2 misses reduced by 0.60%) with no change in
> > > > > > throughput.
> > > > > >
> > > > > > Micro-benchmarking the patch using mempool_perf_test shows
> > > > > > significant improvement with the majority of the test cases.
> > > > > >
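
Purely as an illustration of the idea quoted above (the struct and
helper names here are invented, not the patch's actual code), an
index-based per-core cache could look roughly like this:

/*
 * Illustration only: a per-core cache holding 4 B offsets instead of
 * 8 B pointers, with objects recovered as pool base address + offset.
 */
#include <stdint.h>

struct idx_cache {
	uint32_t len;        /* number of cached objects */
	uint32_t objs[512];  /* byte offsets from the pool base address */
};

static inline void *
idx_cache_obj(const struct idx_cache *c, uintptr_t pool_base, uint32_t i)
{
	return (void *)(pool_base + c->objs[i]);
}

With plain 32-bit offsets each cache entry shrinks from 8 B to 4 B on
64-bit systems, but the pool is limited to 4 GB, which is what the rest
of the thread is about.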
> > > > >
> > > > > I still think this is very interesting. And your performance
> > > > > numbers are looking good.
> > > > >
> > > > > However, it limits the size of a mempool to 4 GB. As previously
> > > > > discussed, the max mempool size can be increased by multiplying
> > > > > the index by a constant.
> > > > >
> > > > > I would suggest using sizeof(uintptr_t) as the constant multiplier,
> > > > > so the mempool can hold objects of any size divisible by
> > > > > sizeof(uintptr_t). And it would be silly to use a mempool to hold
> > > > > objects smaller than sizeof(uintptr_t).
> > > > >
> > > > > How does the performance look if you multiply the index by
> > > > > sizeof(uintptr_t)?
> > > > >
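
As a rough sketch of the multiplier discussed above (illustrative names
only, not a proposed API), scaling the stored 32-bit index by a
compile-time shift looks like this:

/* Illustration only: 32-bit index scaled by a compile-time shift. */
#include <stdint.h>

#define OBJ_SHIFT 3  /* log2(sizeof(uintptr_t)) on 64-bit; 6 for a 64 B cache line */

static inline uint32_t
obj_to_idx(uintptr_t pool_base, const void *obj)
{
	return (uint32_t)(((uintptr_t)obj - pool_base) >> OBJ_SHIFT);
}

static inline void *
idx_to_obj(uintptr_t pool_base, uint32_t idx)
{
	return (void *)(pool_base + ((uintptr_t)idx << OBJ_SHIFT));
}

With a shift of 3 the limit becomes 2^32 * 8 B = 32 GB; with a
cache-line shift of 6 it becomes 2^32 * 64 B = 256 GB.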
> > > >
> > > > Each mempool entry is cache aligned, so we can use that if we want
> > > > a bigger multiplier.
> > >
> > > Thanks for chiming in, Bruce.
> > >
> > > Please also read this discussion about the multiplier:
> > > http://inbox.dpdk.org/dev/CALBAE1PrQYyOG96f6ECeW1vPF3TOh1h7MZZULiY95z9xjbRuyA@mail.gmail.com/
> > >
> >
> > I actually wondered after I had sent the email whether we had indeed an
> > option to disable the cache alignment or not! Thanks for pointing out
> > that we do. This brings a couple of additional thoughts:
> >
> > * Using indexes for the cache should probably be a runtime flag rather
> >   than a build-time one.
> > * It would seem reasonable to me to disallow use of the indexed-cache
> >   flag and the non-cache-aligned flag simultaneously.
> > * On the off chance that that restriction is unacceptable, we can make
> >   things a little more complicated by doing a runtime computation of
> >   the "index-shiftwidth" to use.
> >
> > Overall, I think defaulting to the cache-line shiftwidth and disallowing
> > index-based addressing when using unaligned buffers is simplest and
> > easiest, unless we can come up with a valid use case for needing more
> > than that.
> >
> > /Bruce
>
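A hedged sketch of the runtime "index-shiftwidth" computation Bruce
mentions above (the helper itself is hypothetical; only the DPDK flag
and macros are existing names), evaluated once at pool-creation time
rather than in the fast path:

/* Illustration only: pick the index shiftwidth when the pool is created. */
#include <rte_common.h>   /* rte_log2_u32() */
#include <rte_mempool.h>

static unsigned int
pick_index_shift(const struct rte_mempool *mp)
{
	/* Unaligned objects: fall back to pointer-size granularity. */
	if (mp->flags & RTE_MEMPOOL_F_NO_CACHE_ALIGN)
		return rte_log2_u32(sizeof(uintptr_t));
	/* Default: cache-line granularity, the largest safe multiplier. */
	return rte_log2_u32(RTE_CACHE_LINE_SIZE);
}
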
> This feature is a performance optimization.
>
> With that in mind, it should not introduce function pointers or similar
> run-time checks in the fast path to determine what kind of cache to use
> per mempool. And if an index multiplier is implemented, it should be a
> compile-time constant, probably something between sizeof(uintptr_t) and
> RTE_MEMPOOL_ALIGN (= RTE_CACHE_LINE_SIZE).
>
> The patch comes with a tradeoff between better performance and limited
> mempool size, and possibly some limitations regarding very small objects
> that are not cache line aligned, to avoid wasting memory
> (RTE_MEMPOOL_POPULATE_F_ALIGN_OBJ).
>
> With no multiplier, the only tradeoff is that the mempool size is
> limited to 4 GB.
>
> If the multiplier is small (e.g. 8 bytes), the only tradeoff is that the
> mempool size is limited to 32 GB. (And some memory is wasted for objects
> smaller than 8 bytes; but I don't think anyone would use a mempool to
> hold objects smaller than 8 bytes.)
>
> If the multiplier is larger (e.g. the 64-byte cache line size), the
> mempool size is instead limited to 256 GB, but
> RTE_MEMPOOL_POPULATE_F_ALIGN_OBJ has no effect.
>
> Note: 32-bit platforms get no benefit from this patch: the pointer
> already uses only 4 bytes, so replacing the pointer with a 4-byte index
> makes no difference.
>
>
> Since this feature is a performance optimization only, and doesn't
> provide any new features, I don't mind it being a compile-time option.
>
> If this feature is a compile-time option, and the mempool library is
> compiled with the large multiplier, then RTE_MEMPOOL_POPULATE_F_ALIGN_OBJ
> could be made undefined in the public header file, so compilation of
> applications using the flag will fail. And rte_mempool_create() could
> RTE_ASSERT() that RTE_MEMPOOL_POPULATE_F_ALIGN_OBJ is not set in its
> flags parameter, or emit a warning about the flag being ignored.
> Obviously, rte_mempool_create() should also RTE_ASSERT() that the
> mempool is not larger than the library supports, possibly emitting a
> message that the mempool library should be built without this feature
> to support the larger mempool.
>
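A sketch of the guard rails described above (the limit macros here are
invented for illustration; RTE_ASSERT() and the populate flag are
existing DPDK names):

/* Illustration only: checks rte_mempool_create() could perform if the
 * library is built with a fixed index multiplier. */
#include <stdint.h>
#include <rte_debug.h>    /* RTE_ASSERT() */
#include <rte_mempool.h>

#define IDX_MULTIPLIER  RTE_CACHE_LINE_SIZE                      /* hypothetical */
#define IDX_MAX_RANGE   ((uint64_t)UINT32_MAX * IDX_MULTIPLIER)  /* hypothetical */

static void
check_index_cache_limits(uint32_t n, uint32_t elt_size, unsigned int flags)
{
	/* Object-alignment flag is meaningless with a cache-line multiplier. */
	RTE_ASSERT(!(flags & RTE_MEMPOOL_POPULATE_F_ALIGN_OBJ));
	/* The whole pool must stay addressable through a 32-bit scaled index. */
	RTE_ASSERT((uint64_t)n * elt_size <= IDX_MAX_RANGE);
}
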
> Here is another thought: If only exotic applications use mempools larger
> than 32 GB, this would be a generally acceptable limit, and DPDK should
> use the index-based cache by default, making the opposite (i.e. the
> pointer-based cache) a compile-time option instead. A similar decision
> was recently made for limiting the RTE_MAX_LCORE default.
>
>
> Although DPDK is moving away from compile-time options in order to
> better support Linux distros, there should be a general exception for
> performance and memory optimizations. Otherwise, network appliance
> vendors will inherit the increasing amount of DPDK bloat, and we
> (network appliance vendors) will eventually be forced to fork DPDK to
> get rid of the bloat and achieve the goals originally intended by DPDK.

Agree with Morten's view on this.

> If anyone disagrees with the principle of a general exception for
> performance and memory optimizations, I would like to pass the decision
> on to the Techboard!
>