DPDK patches and discussions
From: Jerin Jacob <jerinjacobk@gmail.com>
To: "Morten Brørup" <mb@smartsharesystems.com>
Cc: Honnappa Nagarahalli <Honnappa.Nagarahalli@arm.com>,
	Dharmik Thakkar <Dharmik.Thakkar@arm.com>,
	 "Ananyev, Konstantin" <konstantin.ananyev@intel.com>,
	Olivier Matz <olivier.matz@6wind.com>,
	 Andrew Rybchenko <andrew.rybchenko@oktetlabs.ru>,
	dpdk-dev <dev@dpdk.org>, nd <nd@arm.com>,
	Ruifeng Wang <Ruifeng.Wang@arm.com>
Subject: Re: [dpdk-dev] [RFC] mempool: implement index-based per core cache
Date: Mon, 8 Nov 2021 22:17:59 +0530	[thread overview]
Message-ID: <CALBAE1PrQYyOG96f6ECeW1vPF3TOh1h7MZZULiY95z9xjbRuyA@mail.gmail.com> (raw)
In-Reply-To: <98CBD80474FA8B44BF855DF32C47DC35D86CB4@smartserver.smartshare.dk>

On Mon, Nov 8, 2021 at 9:34 PM Morten Brørup <mb@smartsharesystems.com> wrote:
>
> > From: dev [mailto:dev-bounces@dpdk.org] On Behalf Of Honnappa
> > Nagarahalli
> > Sent: Monday, 8 November 2021 16.46
> >
> > <snip>
> > > >
> > > > > > > > >>>>>>>>> Current mempool per core cache implementation is
> > > > > > > > >>>>>>>>> based on pointers. For most architectures, each
> > > > > > > > >>>>>>>>> pointer consumes 64 bits. Replace it with an
> > > > > > > > >>>>>>>>> index-based implementation, wherein each buffer is
> > > > > > > > >>>>>>>>> addressed by (pool address + index).
> > > > > > > > >>>>
> > > > > > > > >>>> I like Dharmik's suggestion very much. CPU cache is a
> > > > > > > > >>>> critical and limited resource.
> > > > > > > > >>>>
> > > > > > > > >>>> DPDK has a tendency of using pointers where indexes
> > > > > > > > >>>> could be used instead. I suppose pointers provide the
> > > > > > > > >>>> additional flexibility of mixing entries from different
> > > > > > > > >>>> memory pools, e.g. multiple mbuf pools.
> > > > > > > > >>>>
> > > > > > > > >>
> > > > > > > > >> Agreed, thank you!
> > > > > > > > >>
> > > > > > > > >>>>>>>>
> > > > > > > > >>>>>>>> I don't think it is going to work:
> > > > > > > > >>>>>>>> On 64-bit systems, the difference between the pool
> > > > > > > > >>>>>>>> address and its element address could be bigger
> > > > > > > > >>>>>>>> than 4 GB.
> > > > > > > > >>>>>>> Are you talking about a case where the memory pool
> > > > > > > > >>>>>>> size is more than 4 GB?
> > > > > > > > >>>>>>
> > > > > > > > >>>>>> That is one possible scenario.
> > > > > > > > >>>>
> > > > > > > > >>>> That could be solved by making the index an element
> > > > > > > > >>>> index instead of a pointer offset: address = (pool
> > > > > > > > >>>> address + index * element size).
> > > > > > > > >>>
> > > > > > > > >>> Or instead of scaling the index with the element size,
> > > > > > > > >>> which is only known at runtime, the index could be more
> > > > > > > > >>> efficiently scaled by a compile time constant such as
> > > > > > > > >>> RTE_MEMPOOL_ALIGN (= RTE_CACHE_LINE_SIZE). With a cache
> > > > > > > > >>> line size of 64 bytes, that would allow indexing into
> > > > > > > > >>> mempools up to 256 GB in size.
> > > > > > > > >>>
> > > > > > > > >>
> > > > > > > > >> Looking at this snippet [1] from
> > > > > > > > >> rte_mempool_op_populate_helper(), there is an 'offset'
> > > > > > > > >> added to avoid objects crossing page boundaries. If my
> > > > > > > > >> understanding is correct, using the index of the element
> > > > > > > > >> instead of a pointer offset will pose a challenge for
> > > > > > > > >> some of the corner cases.
> > > > > > > > >>
> > > > > > > > >> [1]
> > > > > > > > >>        for (i = 0; i < max_objs; i++) {
> > > > > > > > >>                /* avoid objects to cross page boundaries */
> > > > > > > > >>                if (check_obj_bounds(va + off, pg_sz, total_elt_sz) < 0) {
> > > > > > > > >>                        off += RTE_PTR_ALIGN_CEIL(va + off, pg_sz) - (va + off);
> > > > > > > > >>                        if (flags & RTE_MEMPOOL_POPULATE_F_ALIGN_OBJ)
> > > > > > > > >>                                off += total_elt_sz -
> > > > > > > > >>                                        (((uintptr_t)(va + off - 1) % total_elt_sz) + 1);
> > > > > > > > >>                }
> > > > > > > > >>
> > > > > > > > >
> > > > > > > > > OK. Alternatively to scaling the index with the cache line
> > > > > > > > > size, you can scale it with sizeof(uintptr_t) to be able
> > > > > > > > > to address 32 or 16 GB mempools on 64 bit and 32 bit
> > > > > > > > > architectures, respectively. Both x86 and ARM CPUs have
> > > > > > > > > instructions to access memory with an added offset
> > > > > > > > > multiplied by 4 or 8, so that should be high performance.
> > > > > > > >
> > > > > > > > Yes, agreed this can be done.
> > > > > > > > Cache line size can also be used when
> > > > > > > > 'MEMPOOL_F_NO_CACHE_ALIGN' is not enabled.
> > > > > > > > On a side note, I wanted to better understand the need for
> > > > > > > > having the 'MEMPOOL_F_NO_CACHE_ALIGN' option.
> > > > > > >
> > > > > > > The description of this field is misleading and should be
> > > > > > > corrected. The correct description would be: Don't need to
> > > > > > > align objs on cache lines.
> > > > > > >
> > > > > > > It is useful for mempools containing very small objects, to
> > > > > > > conserve memory.
> > > > > > I think we can assume that mbuf pools are created with the
> > > > > > 'MEMPOOL_F_NO_CACHE_ALIGN' flag set. With this we can use the
> > > > > > offset calculated with the cache line size as the unit.
> > > > >
> > > > > You mean the MEMPOOL_F_NO_CACHE_ALIGN flag not set. ;-)
> > > > Yes 😊
> > > >
> > > > >
> > > > > I agree. And since the flag is a hint only, it can be ignored if
> > > > > the mempool library is scaling the index with the cache line size.
> > > > I do not think we should ignore the flag, for the reason you
> > > > mention below.
> > > >
> > > > >
> > > > > However, a mempool may contain other objects than mbufs, and
> > > > > those objects may be small, so ignoring the
> > > > > MEMPOOL_F_NO_CACHE_ALIGN flag may cost a lot of memory for such
> > > > > mempools.
> > > > We could use different methods. If MEMPOOL_F_NO_CACHE_ALIGN is set,
> > > > use sizeof(uintptr_t) as the unit; if not set, use the cache line
> > > > size as the unit.
> > > >
> > >
> > > That would require that the indexing multiplier is a runtime
> > > parameter instead of a compile time parameter, so it would have a
> > > performance penalty.
> > >
> > > The indexing multiplier could be compile time configurable, so it is
> > > a tradeoff between granularity and maximum mempool size.
> > I meant compile time configurable, i.e.:
> >
> > #ifdef MEMPOOL_F_NO_CACHE_ALIGN
> > <use sizeof(uintptr_t) as the multiplier>
> > #else
> > <use cache line size as the multiplier> /* This should provide enough
> > memory for packet buffers */
> > #endif
>
> Please note that MEMPOOL_F_NO_CACHE_ALIGN is a runtime flag passed when creating a mempool, not a compile time option.
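
A minimal C sketch of the index encode/decode being discussed may help;
all names and constants here are illustrative assumptions, not DPDK
API, and it assumes every object is aligned to the chosen multiplier.
With a 32-bit index and a 64-byte (cache line) multiplier, the
reachable pool size is 2^32 * 64 B = 256 GB; with sizeof(uintptr_t) ==
8 as the multiplier, it drops to 32 GB on 64-bit systems.

    #include <stdint.h>

    /* log2 of the compile-time multiplier, e.g. 6 for 64 B */
    #define OBJ_SHIFT 6

    /* encode: object pointer -> 32-bit index stored in the cache */
    static inline uint32_t
    obj_to_index(uintptr_t pool_base, const void *obj)
    {
            return (uint32_t)(((uintptr_t)obj - pool_base) >> OBJ_SHIFT);
    }

    /* decode: 32-bit index -> object pointer */
    static inline void *
    index_to_obj(uintptr_t pool_base, uint32_t idx)
    {
            return (void *)(pool_base + ((uintptr_t)idx << OBJ_SHIFT));
    }

The shift-based decode is why the scaled addressing modes mentioned
above (x86 scaled-index addressing, ARM loads with an LSL-shifted
register) keep this cheap; it also shows why every object must stay
aligned to the multiplier, which is the page-boundary corner case
raised earlier in the thread.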

Also, please share PMU counter stats on L1 and L2 misses with and
without this scheme after the rework. IMO, we should not have any
regression on
1) per core mpps, or
2) L1 and L2 misses
with l3fwd/testpmd/l2fwd etc.
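
For the mempool fast path itself, a rough cycle-count sketch like the
one below can be run with and without the index-based cache; it assumes
an initialized EAL and an already-populated mempool, and the function
name is illustrative. L1/L2 cache behaviour for l3fwd/testpmd runs can
be sampled with, e.g., perf stat -e
L1-dcache-load-misses,LLC-load-misses.

    #include <stdio.h>
    #include <rte_cycles.h>
    #include <rte_mempool.h>

    /* average cycles for a get+put bulk pair on the per-core cache */
    static void
    bench_mempool_cache(struct rte_mempool *mp)
    {
            void *objs[32];
            const unsigned int iters = 1000000;
            uint64_t start, cycles;
            unsigned int i;

            start = rte_rdtsc();
            for (i = 0; i < iters; i++) {
                    /* bursts of 32 stay within the per-core cache */
                    if (rte_mempool_get_bulk(mp, objs, 32) == 0)
                            rte_mempool_put_bulk(mp, objs, 32);
            }
            cycles = rte_rdtsc() - start;
            printf("avg cycles per get+put(32): %.1f\n",
                            (double)cycles / iters);
    }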


Thread overview: 52+ messages
2021-09-30 17:27 Dharmik Thakkar
2021-10-01 12:36 ` Jerin Jacob
2021-10-01 15:44   ` Honnappa Nagarahalli
2021-10-01 17:32     ` Jerin Jacob
2021-10-01 17:57       ` Honnappa Nagarahalli
2021-10-01 18:21       ` Jerin Jacob
2021-10-01 21:30 ` Ananyev, Konstantin
2021-10-02  0:07   ` Honnappa Nagarahalli
2021-10-02 18:51     ` Ananyev, Konstantin
2021-10-04 16:36       ` Honnappa Nagarahalli
2021-10-30 10:23         ` Morten Brørup
2021-10-31  8:14         ` Morten Brørup
2021-11-03 15:12           ` Dharmik Thakkar
2021-11-03 15:52             ` Morten Brørup
2021-11-04  4:42               ` Dharmik Thakkar
2021-11-04  8:04                 ` Morten Brørup
2021-11-08  4:32                   ` Honnappa Nagarahalli
2021-11-08  7:22                     ` Morten Brørup
2021-11-08 15:29                       ` Honnappa Nagarahalli
2021-11-08 15:39                         ` Morten Brørup
2021-11-08 15:46                           ` Honnappa Nagarahalli
2021-11-08 16:03                             ` Morten Brørup
2021-11-08 16:47                               ` Jerin Jacob [this message]
2021-12-24 22:59 ` [PATCH 0/1] " Dharmik Thakkar
2021-12-24 22:59   ` [PATCH 1/1] " Dharmik Thakkar
2022-01-11  2:26     ` Ananyev, Konstantin
2022-01-13  5:17       ` Dharmik Thakkar
2022-01-13 10:37         ` Ananyev, Konstantin
2022-01-19 15:32           ` Dharmik Thakkar
2022-01-21 11:25             ` Ananyev, Konstantin
2022-01-21 11:31               ` Ananyev, Konstantin
2022-03-24 19:51               ` Dharmik Thakkar
2021-12-25  0:16   ` [PATCH 0/1] " Morten Brørup
2022-01-07 11:15     ` Bruce Richardson
2022-01-07 11:29       ` Morten Brørup
2022-01-07 13:50         ` Bruce Richardson
2022-01-08  9:37           ` Morten Brørup
2022-01-10  6:38             ` Jerin Jacob
2022-01-13  5:31               ` Dharmik Thakkar
2023-07-06 17:43                 ` Stephen Hemminger
2023-07-31 12:23                   ` Thomas Monjalon
2023-07-31 12:33                     ` Morten Brørup
2023-07-31 14:57                       ` Dharmik Jayesh Thakkar
2022-01-13  5:36   ` [PATCH v2 " Dharmik Thakkar
2022-01-13  5:36     ` [PATCH v2 1/1] " Dharmik Thakkar
2022-01-13 10:18       ` Jerin Jacob
2022-01-20  8:21       ` Morten Brørup
2022-01-21  6:01         ` Honnappa Nagarahalli
2022-01-21  7:36           ` Morten Brørup
2022-01-24 13:05             ` Ray Kinsella
2022-01-21  9:12           ` Bruce Richardson
2022-01-23  7:13       ` Wang, Haiyue
