DPDK patches and discussions
From: "Morten Brørup" <mb@smartsharesystems.com>
To: "Thomas Monjalon" <thomas@monjalon.net>,
	"Dharmik Thakkar" <Dharmik.Thakkar@arm.com>
Cc: <dev@dpdk.org>, "Jerin Jacob" <jerinjacobk@gmail.com>,
	"Bruce Richardson" <bruce.richardson@intel.com>,
	"Honnappa Nagarahalli" <Honnappa.Nagarahalli@arm.com>,
	"nd" <nd@arm.com>, "Ruifeng Wang" <Ruifeng.Wang@arm.com>,
	"Stephen Hemminger" <stephen@networkplumber.org>,
	<olivier.matz@6wind.com>, <andrew.rybchenko@oktetlabs.ru>
Subject: RE: [PATCH 0/1] mempool: implement index-based per core cache
Date: Mon, 31 Jul 2023 14:33:03 +0200	[thread overview]
Message-ID: <98CBD80474FA8B44BF855DF32C47DC35D87AAF@smartserver.smartshare.dk> (raw)
In-Reply-To: <12619560.VsHLxoZxqI@thomas>

> From: Thomas Monjalon [mailto:thomas@monjalon.net]
> Sent: Monday, 31 July 2023 14.24
> 
> The v2 was not sent, and Stephen dropped the patch from patchwork.
> 
> Do we abandon this feature?

+1, because I think that the zero-copy mempool cache access functions make this patch irrelevant.
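
For reference, a minimal sketch of what that zero-copy access looks like, assuming the experimental rte_mempool_cache_zc_*_bulk() functions introduced in DPDK 23.03 (requires ALLOW_EXPERIMENTAL_API; the helper below is illustrative, not driver code):

#include <errno.h>
#include <string.h>
#include <rte_lcore.h>
#include <rte_mempool.h>

static inline int
zc_put_sketch(struct rte_mempool *mp, void * const objs[], unsigned int n)
{
	struct rte_mempool_cache *cache =
		rte_mempool_default_cache(mp, rte_lcore_id());
	void **slots;

	if (cache == NULL)
		return -ENOTSUP; /* no per-lcore cache: use rte_mempool_put_bulk() */

	/* Reserve n slots directly in the per-lcore cache. */
	slots = rte_mempool_cache_zc_put_bulk(cache, mp, n);
	if (slots == NULL)
		return -ENOSPC; /* burst too large for the zero-copy cache */

	/* Write the object pointers directly into the cache; a driver would
	 * typically copy them straight from its RX descriptors instead. */
	memcpy(slots, objs, n * sizeof(void *));
	return 0;
}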

> Should I remove it from the roadmap?

+1

> 
> 
> 06/07/2023 19:43, Stephen Hemminger:
> > On Thu, 13 Jan 2022 05:31:18 +0000
> > Dharmik Thakkar <Dharmik.Thakkar@arm.com> wrote:
> >
> > > Hi,
> > >
> > > Thank you for your valuable review comments and suggestions!
> > >
> > > I will be sending out a v2 in which I have increased the size of the
> > > mempool to 32 GB by using division by sizeof(uintptr_t).
> > > However, I am seeing ~5% performance degradation with
> > > mempool_perf_autotest (for bulk size of 32) with this change
> > > when compared to the base performance.
> > > Earlier, without this change, I was seeing an improvement of ~13%
> > > compared to base performance. So, this is a significant degradation.
> > > I would appreciate your review comments on v2.
> > >
> > > Thank you!
> > >
> > > > On Jan 10, 2022, at 12:38 AM, Jerin Jacob <jerinjacobk@gmail.com> wrote:
> > > >
> > > > On Sat, Jan 8, 2022 at 3:07 PM Morten Brørup
> > > > <mb@smartsharesystems.com> wrote:
> > > >>
> > > >>> From: Bruce Richardson [mailto:bruce.richardson@intel.com]
> > > >>> Sent: Friday, 7 January 2022 14.51
> > > >>>
> > > >>> On Fri, Jan 07, 2022 at 12:29:23PM +0100, Morten Brørup wrote:
> > > >>>>> From: Bruce Richardson [mailto:bruce.richardson@intel.com]
> > > >>>>> Sent: Friday, 7 January 2022 12.16
> > > >>>>>
> > > >>>>> On Sat, Dec 25, 2021 at 01:16:03AM +0100, Morten Brørup wrote:
> > > >>>>>>> From: Dharmik Thakkar [mailto:dharmik.thakkar@arm.com]
> > > >>>>>>> Sent: Friday, 24 December 2021 23.59
> > > >>>>>>>
> > > >>>>>>> The current mempool per-core cache implementation stores
> > > >>>>>>> pointers to mbufs. On 64b architectures, each pointer consumes
> > > >>>>>>> 8B. This patch replaces it with an index-based implementation,
> > > >>>>>>> wherein each buffer is addressed by (pool base address + index).
> > > >>>>>>> It reduces the amount of memory/cache required for the per-core
> > > >>>>>>> cache.
> > > >>>>>>>
> > > >>>>>>> L3Fwd performance testing reveals minor improvements in cache
> > > >>>>>>> performance (L1 and L2 misses reduced by 0.60%) with no change
> > > >>>>>>> in throughput.
> > > >>>>>>>
> > > >>>>>>> Micro-benchmarking the patch using mempool_perf_test shows
> > > >>>>>>> significant improvement in the majority of the test cases.
> > > >>>>>>>
> > > >>>>>>
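As a purely illustrative sketch of the index-based addressing described in the quoted commit message (the helper names, and using the pool's memzone address as the base, are assumptions rather than the patch's actual code):

#include <stdint.h>
#include <rte_mempool.h>

/* Encode: byte offset of the object from the pool's base address. */
static inline uint32_t
obj_to_index(const struct rte_mempool *mp, const void *obj)
{
	return (uint32_t)((uintptr_t)obj - (uintptr_t)mp->mz->addr);
}

/* Decode: base address plus offset yields the object pointer again. */
static inline void *
index_to_obj(const struct rte_mempool *mp, uint32_t idx)
{
	return (void *)((uintptr_t)mp->mz->addr + idx);
}
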
> > > >>>>>> I still think this is very interesting. And your performance
> > > >>>>>> numbers are looking good.
> > > >>>>>>
> > > >>>>>> However, it limits the size of a mempool to 4 GB. As previously
> > > >>>>>> discussed, the max mempool size can be increased by multiplying
> > > >>>>>> the index with a constant.
> > > >>>>>>
> > > >>>>>> I would suggest using sizeof(uintptr_t) as the constant
> > > >>>>>> multiplier, so the mempool can hold objects of any size divisible
> > > >>>>>> by sizeof(uintptr_t). And it would be silly to use a mempool to
> > > >>>>>> hold objects smaller than sizeof(uintptr_t).
> > > >>>>>>
> > > >>>>>> How does the performance look if you multiply the index by
> > > >>>>>> sizeof(uintptr_t)?
> > > >>>>>>
> > > >>>>>
> > > >>>>> Each mempool entry is cache aligned, so we can use that if we
> > > >>>>> want a bigger multiplier.
> > > >>>>
> > > >>>> Thanks for chiming in, Bruce.
> > > >>>>
> > > >>>> Please also read this discussion about the multiplier:
> > > >>>>
> > > >>>> http://inbox.dpdk.org/dev/CALBAE1PrQYyOG96f6ECeW1vPF3TOh1h7MZZULiY95z9xjbRuyA@mail.gmail.com/
> > > >>>>
> > > >>>
> > > >>> I actually wondered after I had sent the email whether we indeed
> > > >>> had an option to disable the cache alignment or not! Thanks for
> > > >>> pointing out that we do. This brings a couple of additional
> > > >>> thoughts:
> > > >>>
> > > >>> * Using indexes for the cache should probably be a runtime flag
> > > >>>   rather than a build-time one.
> > > >>> * It would seem reasonable to me to disallow use of the indexed-cache
> > > >>>   flag and the non-cache-aligned flag simultaneously (a sketch of
> > > >>>   such a check follows below).
> > > >>> * On the off chance that that restriction is unacceptable, we can
> > > >>>   make things a little more complicated by doing a runtime
> > > >>>   computation of the "index-shiftwidth" to use.
> > > >>>
> > > >>> Overall, I think defaulting to the cache-line shiftwidth and
> > > >>> disallowing index-based addressing when using unaligned buffers is
> > > >>> simplest and easiest, unless we can come up with a valid use case
> > > >>> for needing more than that.
> > > >>>
> > > >>> /Bruce
> > > >>
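A minimal sketch of the flag restriction suggested in the second bullet above. MEMPOOL_F_INDEX_CACHE is a hypothetical flag name invented for illustration; RTE_MEMPOOL_F_NO_CACHE_ALIGN is an existing DPDK flag:

#include <errno.h>
#include <rte_errno.h>
#include <rte_mempool.h>

#define MEMPOOL_F_INDEX_CACHE 0x1000 /* hypothetical; not a real DPDK flag */

static int
check_index_cache_flags(unsigned int flags)
{
	/* An indexed cache assumes a fixed object stride, so it cannot be
	 * combined with non-cache-aligned objects. */
	if ((flags & MEMPOOL_F_INDEX_CACHE) &&
	    (flags & RTE_MEMPOOL_F_NO_CACHE_ALIGN)) {
		rte_errno = EINVAL;
		return -1;
	}
	return 0;
}
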
> > > >> This feature is a performance optimization.
> > > >>
> > > >> With that in mind, it should not introduce function pointers or
> > > >> similar run-time checks in the fast path to determine what kind of
> > > >> cache to use per mempool. And if an index multiplier is implemented,
> > > >> it should be a compile-time constant, probably something between
> > > >> sizeof(uintptr_t) and RTE_MEMPOOL_ALIGN (= RTE_CACHE_LINE_SIZE).
> > > >>
> > > >> The patch comes with a tradeoff between better performance and
> > > >> limited mempool size, and possibly some limitations regarding very
> > > >> small objects that are not cache line aligned in order to avoid
> > > >> wasting memory (RTE_MEMPOOL_POPULATE_F_ALIGN_OBJ).
> > > >>
> > > >> With no multiplier, the only tradeoff is that the mempool size is
> > > >> limited to 4 GB.
> > > >>
> > > >> If the multiplier is small (i.e. 8 bytes), the only tradeoff is that
> > > >> the mempool size is limited to 32 GB. (And memory is wasted for
> > > >> objects smaller than 8 bytes; but I don't think anyone would use a
> > > >> mempool to hold objects smaller than 8 bytes.)
> > > >>
> > > >> If the multiplier is larger (i.e. the 64-byte cache line size), the
> > > >> mempool size is instead limited to 256 GB, but
> > > >> RTE_MEMPOOL_POPULATE_F_ALIGN_OBJ has no effect.
> > > >>
> > > >> Note: 32-bit platforms get no benefit from this patch: the pointer
> > > >> already uses only 4 bytes, so replacing the pointer with a 4-byte
> > > >> index makes no difference.
> > > >>
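To make the multiplier arithmetic concrete, a small sketch of the scaled-index conversion. MEMPOOL_INDEX_SHIFT is a hypothetical compile-time constant, not an existing DPDK symbol; a shift of 3 (sizeof(uintptr_t) == 8 B) lets a 32-bit index span 2^32 * 8 B = 32 GB, and a cache-line shift of 6 spans 2^32 * 64 B = 256 GB:

#include <stdint.h>

#define MEMPOOL_INDEX_SHIFT 3 /* hypothetical: log2(sizeof(uintptr_t)) on 64b */

/* Decode: base + (index << shift). */
static inline void *
index_to_obj_scaled(uintptr_t base, uint32_t idx)
{
	return (void *)(base + ((uintptr_t)idx << MEMPOOL_INDEX_SHIFT));
}

/* Encode: objects must be laid out at multiples of 1 << MEMPOOL_INDEX_SHIFT. */
static inline uint32_t
obj_to_index_scaled(uintptr_t base, const void *obj)
{
	return (uint32_t)(((uintptr_t)obj - base) >> MEMPOOL_INDEX_SHIFT);
}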
> > > >>
> > > >> Since this feature is a performance optimization only, and doesn't
> > > >> provide any new functionality, I don't mind it being a compile-time
> > > >> option.
> > > >>
> > > >> If this feature is a compile-time option, and the mempool library is
> > > >> compiled with the large multiplier, then RTE_MEMPOOL_POPULATE_F_ALIGN_OBJ
> > > >> could be made undefined in the public header file, so compilation of
> > > >> applications using the flag will fail. And rte_mempool_create() could
> > > >> RTE_ASSERT() that RTE_MEMPOOL_POPULATE_F_ALIGN_OBJ is not set in its
> > > >> flags parameter, or emit a warning about the flag being ignored.
> > > >> Obviously, rte_mempool_create() should also RTE_ASSERT() that the
> > > >> mempool is not larger than the library supports, possibly emitting a
> > > >> message that the mempool library should be built without this feature
> > > >> to support the larger mempool. (A sketch of such a size check follows
> > > >> below.)
> > > >>
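A sketch of the size check suggested above, under the same hypothetical MEMPOOL_INDEX_SHIFT assumption; the helper and log message are illustrative only:

#include <errno.h>
#include <stdint.h>
#include <stddef.h>
#include <rte_errno.h>
#include <rte_log.h>

#define MEMPOOL_INDEX_SHIFT 3 /* hypothetical compile-time constant */

static int
check_pool_span(uint32_t n, size_t total_elt_sz)
{
	/* Largest byte span a 32-bit scaled index can address. */
	uint64_t max_span = ((uint64_t)UINT32_MAX + 1) << MEMPOOL_INDEX_SHIFT;

	if ((uint64_t)n * total_elt_sz > max_span) {
		RTE_LOG(ERR, MEMPOOL,
			"mempool too large for the index-based cache; "
			"build the mempool library without it\n");
		rte_errno = EINVAL;
		return -1;
	}
	return 0;
}
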
> > > >> Here is another thought: If only exotic applications use mempools
> > > >> larger than 32 GB, this would be a generally acceptable limit, and
> > > >> DPDK should use the index-based cache by default, making the opposite
> > > >> (i.e. the pointer-based cache) a compile-time option instead. A
> > > >> similar decision was recently made for limiting the RTE_MAX_LCORE
> > > >> default.
> > > >>
> > > >>
> > > >> Although DPDK is moving away from compile-time options in order to
> > > >> better support Linux distros, there should be a general exception for
> > > >> performance and memory optimizations. Otherwise, network appliance
> > > >> vendors will inherit the increasing amount of DPDK bloat, and we
> > > >> (network appliance vendors) will eventually be forced to fork DPDK to
> > > >> get rid of the bloat and achieve the goals originally intended by
> > > >> DPDK.
> > > >
> > > > Agree with Morten's view on this.
> > > >
> > > >> If anyone disagrees with the principle of a general exception for
> > > >> performance and memory optimizations, I would like to pass the
> > > >> decision on to the Techboard!
> > > >>
> >
> > NAK
> > Having compile-time options like this means one side or the other is not
> > tested by the CI infrastructure. There never was sufficient justification,
> > and there were lots of objections.
> > Dropping the patch.
> >
> >
> 


Thread overview: 52+ messages
2021-09-30 17:27 [dpdk-dev] [RFC] " Dharmik Thakkar
2021-10-01 12:36 ` Jerin Jacob
2021-10-01 15:44   ` Honnappa Nagarahalli
2021-10-01 17:32     ` Jerin Jacob
2021-10-01 17:57       ` Honnappa Nagarahalli
2021-10-01 18:21       ` Jerin Jacob
2021-10-01 21:30 ` Ananyev, Konstantin
2021-10-02  0:07   ` Honnappa Nagarahalli
2021-10-02 18:51     ` Ananyev, Konstantin
2021-10-04 16:36       ` Honnappa Nagarahalli
2021-10-30 10:23         ` Morten Brørup
2021-10-31  8:14         ` Morten Brørup
2021-11-03 15:12           ` Dharmik Thakkar
2021-11-03 15:52             ` Morten Brørup
2021-11-04  4:42               ` Dharmik Thakkar
2021-11-04  8:04                 ` Morten Brørup
2021-11-08  4:32                   ` Honnappa Nagarahalli
2021-11-08  7:22                     ` Morten Brørup
2021-11-08 15:29                       ` Honnappa Nagarahalli
2021-11-08 15:39                         ` Morten Brørup
2021-11-08 15:46                           ` Honnappa Nagarahalli
2021-11-08 16:03                             ` Morten Brørup
2021-11-08 16:47                               ` Jerin Jacob
2021-12-24 22:59 ` [PATCH 0/1] " Dharmik Thakkar
2021-12-24 22:59   ` [PATCH 1/1] " Dharmik Thakkar
2022-01-11  2:26     ` Ananyev, Konstantin
2022-01-13  5:17       ` Dharmik Thakkar
2022-01-13 10:37         ` Ananyev, Konstantin
2022-01-19 15:32           ` Dharmik Thakkar
2022-01-21 11:25             ` Ananyev, Konstantin
2022-01-21 11:31               ` Ananyev, Konstantin
2022-03-24 19:51               ` Dharmik Thakkar
2021-12-25  0:16   ` [PATCH 0/1] " Morten Brørup
2022-01-07 11:15     ` Bruce Richardson
2022-01-07 11:29       ` Morten Brørup
2022-01-07 13:50         ` Bruce Richardson
2022-01-08  9:37           ` Morten Brørup
2022-01-10  6:38             ` Jerin Jacob
2022-01-13  5:31               ` Dharmik Thakkar
2023-07-06 17:43                 ` Stephen Hemminger
2023-07-31 12:23                   ` Thomas Monjalon
2023-07-31 12:33                     ` Morten Brørup [this message]
2023-07-31 14:57                       ` Dharmik Jayesh Thakkar
2022-01-13  5:36   ` [PATCH v2 " Dharmik Thakkar
2022-01-13  5:36     ` [PATCH v2 1/1] " Dharmik Thakkar
2022-01-13 10:18       ` Jerin Jacob
2022-01-20  8:21       ` Morten Brørup
2022-01-21  6:01         ` Honnappa Nagarahalli
2022-01-21  7:36           ` Morten Brørup
2022-01-24 13:05             ` Ray Kinsella
2022-01-21  9:12           ` Bruce Richardson
2022-01-23  7:13       ` Wang, Haiyue
