DPDK patches and discussions
From: "Morten Brørup" <mb@smartsharesystems.com>
To: "Honnappa Nagarahalli" <Honnappa.Nagarahalli@arm.com>
Cc: "Konstantin Ananyev" <konstantin.ananyev@huawei.com>,
	"Paul Szczepanek" <Paul.Szczepanek@arm.com>, <dev@dpdk.org>,
	<konstantin.v.ananyev@yandex.ru>, "nd" <nd@arm.com>
Subject: RE: [PATCH v5 0/4] add pointer compression API
Date: Sat, 2 Mar 2024 11:33:52 +0100	[thread overview]
Message-ID: <98CBD80474FA8B44BF855DF32C47DC35E9F291@smartserver.smartshare.dk> (raw)
In-Reply-To: <7D23A333-9846-4A34-A8B5-FDC11F042025@arm.com>

> From: Honnappa Nagarahalli [mailto:Honnappa.Nagarahalli@arm.com]
> Sent: Friday, 1 March 2024 20.57
> 
> > On Mar 1, 2024, at 5:16 AM, Morten Brørup <mb@smartsharesystems.com> wrote:
> >
> >> From: Konstantin Ananyev [mailto:konstantin.ananyev@huawei.com]
> >> Sent: Thursday, 22 February 2024 17.16
> >>
> >>> For some reason your email is not visible to me, even though it's
> >>> in the archive.
> >>
> >> No worries.
> >>
> >>>
> >>> On 02/11/2024 16:32, Konstantin Ananyev konstantin.v.ananyev wrote:
> >>>
> >>>> From one side the code itself is very small and straightforward;
> >>>> from the other side, it is not clear to me what the intended usage
> >>>> for it within DPDK and its appliances is.
> >>>> Konstantin
> >>>
> >>> The intended usage is explained in the cover email (see below) and
> >>> demonstrated in the test supplied in the following patch - when
> >>> sending arrays of pointers between cores as it happens in a
> >>> forwarding example.
> >>
> >> Yes, I saw that. The thing is that the test is a 'synthetic' one.
> >> My question was about how you expect people to use it in more
> >> realistic scenarios.
> >> Let's say a user has a bunch of mbuf pointers, possibly from different
> >> mempools. How can he use this API: how does he deduce the base pointer
> >> for all of them, and what does he do if it can't be done?
> >
> > I share Konstantin's concerns with this feature.
> >
> > If we want to compress mbuf pointers in applications with a few mbuf
> > pools, e.g. an mbuf pool per CPU socket, the compression algorithm
> > would be different.
> This feature is targeted at applications working in pipeline mode. We
> see many customers using pipeline mode. This feature helps reduce the
> cost of transferring packets between cores by reducing the copies
> involved.

OK. I agree this is a very common use case, worth optimizing for.

> For an application with multiple pools, it depends on how the
> application uses those pools. If there is a bunch of packets belonging
> to multiple mempools, compressing those mbufs may not be possible. But
> if those mbufs are grouped per mempool and are transferred on different
> queues, then it is possible. Hence the APIs are implemented very
> generically.

OK.
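
To make that grouping concrete, here is a rough sketch of what an application could do before compressing each group against its own pool's base address. This is just an illustration, not the API from this patchset; the subsequent compress/enqueue step is left out.

#include <stdint.h>
#include <rte_mbuf.h>
#include <rte_mempool.h>

/* Pick out the mbufs that belong to one specific pool, so each group can
 * be compressed against that pool's base address and sent on its own
 * queue. Purely illustrative; the remaining mbufs are handled elsewhere. */
static inline unsigned int
collect_from_pool(struct rte_mbuf **mbufs, unsigned int n,
		struct rte_mempool *pool, struct rte_mbuf **group)
{
	unsigned int i, cnt = 0;

	for (i = 0; i < n; i++)
		if (mbufs[i]->pool == pool)
			group[cnt++] = mbufs[i];

	return cnt;
}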

<feature creep>
And for a possible future extension:
If there are very few mbuf pools, such as 2 or 4, it might be possible to develop similar functions to efficiently compress/decompress pointers in a shared queue. E.g. the highest bits could identify the pool, and the lowest bits could identify the pointer offset (with bit shift) in that pool. Or, if the pools are less than 4 GB each, the lowest bits could identify the pool and be masked away to get the offset (no bit shift), taking advantage of the lowest bits of the pointer address always being zero anyway.
I am mentioning this so it can be taken into consideration when designing the pointer compression library and its API. I don't expect it to be implemented at this time. Also, it might not lead to any changes to the already proposed pointer compression API - just give it some thought.
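
To illustrate the second variant (pool id in the low bits), a rough sketch. Nothing here is part of the proposed API, and it only works under the stated assumptions: at most 4 pools, each below 4 GB, and pointers at least 4-byte aligned.

#include <stdint.h>

#define POOL_ID_BITS 2
#define POOL_ID_MASK ((UINT32_C(1) << POOL_ID_BITS) - 1)

/* base[] holds one base address per pool, set up at init time. */
static inline uint32_t
encode_ptr(const void *ptr, const uintptr_t base[], uint32_t pool_id)
{
	/* The offset's low bits are zero due to alignment, so OR-ing in
	 * the pool id loses no information. */
	return (uint32_t)((uintptr_t)ptr - base[pool_id]) | pool_id;
}

static inline void *
decode_ptr(uint32_t v, const uintptr_t base[])
{
	uint32_t pool_id = v & POOL_ID_MASK;

	return (void *)(base[pool_id] + (v & ~POOL_ID_MASK));
}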
</feature creep>

+1 for the simplicity of the functions and the API in this patch.
E.g. the bit_shift is most likely a constant known at build time, so inlining allows the compiler to optimize for it. In many use cases it might be 1, and thus optimized away.
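
Just to spell out what I mean - a minimal sketch of the underlying idea, not the actual functions from this patchset:

#include <stdint.h>

/* With a compile-time constant shift (e.g. 3 for 8-byte aligned
 * objects), inlining lets the compiler fold the conversion into a
 * single subtract/add plus an immediate shift. */
static inline uint32_t
ptr_to_off32(const void *ptr, uintptr_t base, unsigned int bit_shift)
{
	return (uint32_t)(((uintptr_t)ptr - base) >> bit_shift);
}

static inline void *
off32_to_ptr(uint32_t off, uintptr_t base, unsigned int bit_shift)
{
	return (void *)(base + ((uintptr_t)off << bit_shift));
}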

> 
> >
> > I would like to add:
> > If we want to offer optimizations specifically for applications with a
> > single mbuf pool, I think it should be considered in a system-wide
> > context to determine if performance could be improved in more areas.
> > E.g. removing the pool field from the rte_mbuf structure might free up
> > space to move hot fields from the second cache line to the first, so
> > the second cache line rarely needs to be touched. (As an alternative
> > to removing the pool field, it could be moved to the second cache
> > line, only to be used if the global "single mbuf pool" is NULL.)
> Agree on this. The feedback I have received is along similar lines: many
> are using simple features. I also received feedback that 90% of the
> applications use less than 4 GB of memory for mbufs and burst sizes up
> to 256.

Interesting.
Keeping the most common use cases in mind is important for steering DPDK in the right direction as it evolves.

If a very large percentage of use cases use one mbuf pool of less than 4 GB, we should seriously consider the broader opportunity for optimization by generally referencing mbufs by a uint32_t offset (no bit shifting) instead of by pointers.
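
Hypothetically, something along these lines - not an existing DPDK API, and the global base address is an assumption for illustration:

#include <stdint.h>
#include <rte_mbuf.h>

/* Set once at init to the start of the single mbuf pool's memory. */
static uintptr_t mbuf_pool_base;

static inline uint32_t
mbuf_to_ref(const struct rte_mbuf *m)
{
	return (uint32_t)((uintptr_t)m - mbuf_pool_base);
}

static inline struct rte_mbuf *
ref_to_mbuf(uint32_t ref)
{
	return (struct rte_mbuf *)(mbuf_pool_base + ref);
}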

> 
> >
> > On the other hand, I agree that pointer compression can be useful for
> > some applications, so we should accept it.
> >
> > However, pointer compression has nothing to do with the underlying
> > hardware or operating system, so it does not belong in the EAL (which
> > is already too bloated). It should be a separate library.
> Yes, this is generic (though there is SIMD code). We could move it out
> of EAL.

Thank you.

I think there has been a prevailing misconception that arch-specific optimizations (such as SIMD code) require the code to go into the EAL, and this misconception is a main reason why the EAL has become so bloated.
Moving features like pointer compression out of the EAL, thereby showing alternative design patterns for code containing arch-specific optimizations, will help eliminate that misconception.

> 
> >
> >>
> >>> On 01/11/2023 18:12, Paul Szczepanek wrote:
> >>>
> >>>> This patchset is proposing adding a new EAL header with utility
> >>>> functions that allow compression of arrays of pointers.
> >>>>
> >>>> When passing caches full of pointers between threads, memory
> >>>> containing the pointers is copied multiple times which is especially
> >>>> costly between cores. A compression method will allow us to shrink
> >>>> the memory size copied.
> >>>>
> >>>> The compression takes advantage of the fact that pointers are
> >>>> usually located in a limited memory region (like a mempool). We can
> >>>> compress them by converting them to offsets from a base memory
> >>>> address.
> >>>>
> >>>> Offsets can be stored in fewer bytes (dictated by the memory region
> >>>> size and alignment of the pointer). For example: an 8 byte aligned
> >>>> pointer which is part of a 32GB memory pool can be stored in 4
> >>>> bytes. The API is very generic and does not assume mempool pointers,
> >>>> any pointer can be passed in.
> >>>>
> >>>> Compression is based on few and fast operations and especially with
> >>>> vector instructions leveraged creates minimal overhead.
> >>>>
> >>>> The API accepts and returns arrays because the overhead means it
> >>>> only is worth it when done in bulk.
> >>>>
> >>>> Test is added that shows potential performance gain from
> >>>> compression. In this test an array of pointers is passed through a
> >>>> ring between two cores. It shows the gain which is dependent on the
> >>>> bulk operation size. In this synthetic test run on ampere altra a
> >>>> substantial (up to 25%) performance gain is seen if done in bulk
> >>>> size larger than 32. At 32 it breaks even and lower sizes create a
> >>>> small (less than 5%) slowdown due to overhead.
> >>>>
> >>>> In a more realistic mock application running the l3 forwarding dpdk
> >>>> example that works in pipeline mode on two cores this translated
> >>>> into a ~5% throughput increase on an ampere altra.
> >
> > Which burst size was used to achieve this ~5% throughput increase?
> This is the stock L3fwd application which is split into 2 stages: RX,
> L3fwd, TX. The default burst size 32 is used.

Impressive.
It proves the point that synthetic tests often are too simple to show the benefits of optimizations for reducing cache misses.



Thread overview: 76+ messages
2023-09-27 15:08 [RFC 0/2] " Paul Szczepanek
2023-09-27 15:08 ` [RFC 1/2] eal: add pointer compression functions Paul Szczepanek
2023-10-09 15:54   ` Thomas Monjalon
2023-10-11 13:36     ` Honnappa Nagarahalli
2023-10-11 16:43       ` Paul Szczepanek
2023-10-11 12:43   ` [RFC v2 0/2] add pointer compression API Paul Szczepanek
2023-10-11 12:43     ` [RFC v2 1/2] eal: add pointer compression functions Paul Szczepanek
2023-10-11 12:43     ` [RFC v2 2/2] test: add pointer compress tests to ring perf test Paul Szczepanek
2023-10-31 18:10   ` [PATCH v3 0/3] add pointer compression API Paul Szczepanek
2023-10-31 18:10     ` [PATCH v3 1/3] eal: add pointer compression functions Paul Szczepanek
2023-10-31 18:10     ` [PATCH v3 2/3] test: add pointer compress tests to ring perf test Paul Szczepanek
2023-10-31 18:10     ` [PATCH v3 3/3] docs: add pointer compression to the EAL guide Paul Szczepanek
2023-11-01  7:42     ` [PATCH v3 0/3] add pointer compression API Morten Brørup
2023-11-01 12:52       ` Paul Szczepanek
2023-11-01 12:46   ` [PATCH v4 0/4] " Paul Szczepanek
2023-11-01 12:46     ` [PATCH v4 1/4] eal: add pointer compression functions Paul Szczepanek
2023-11-01 12:46     ` [PATCH v4 2/4] test: add pointer compress tests to ring perf test Paul Szczepanek
2023-11-01 12:46     ` [PATCH v4 3/4] docs: add pointer compression to the EAL guide Paul Szczepanek
2023-11-01 12:46     ` [PATCH v4 4/4] test: add unit test for ptr compression Paul Szczepanek
2023-11-01 18:12   ` [PATCH v5 0/4] add pointer compression API Paul Szczepanek
2023-11-01 18:12     ` [PATCH v5 1/4] eal: add pointer compression functions Paul Szczepanek
2024-02-11 15:32       ` Konstantin Ananyev
2023-11-01 18:12     ` [PATCH v5 2/4] test: add pointer compress tests to ring perf test Paul Szczepanek
2023-11-01 18:13     ` [PATCH v5 3/4] docs: add pointer compression to the EAL guide Paul Szczepanek
2023-11-01 18:13     ` [PATCH v5 4/4] test: add unit test for ptr compression Paul Szczepanek
2024-02-22  8:15     ` [PATCH v5 0/4] add pointer compression API Paul Szczepanek
2024-02-22 16:16       ` Konstantin Ananyev
2024-03-01 11:16         ` Morten Brørup
2024-03-01 16:12           ` Patrick Robb
2024-03-01 19:57           ` Honnappa Nagarahalli
2024-03-02 10:33             ` Morten Brørup [this message]
2024-03-06 22:31               ` Paul Szczepanek
2024-03-07  2:13                 ` Honnappa Nagarahalli
2024-03-04 14:44             ` Konstantin Ananyev
2024-05-15 17:00               ` Paul Szczepanek
2024-05-15 22:34                 ` Morten Brørup
2024-05-16  8:25                   ` Paul Szczepanek
2024-05-16  8:40                   ` Konstantin Ananyev
2024-02-29 16:03   ` [PATCH v6 " Paul Szczepanek
2024-02-29 16:03     ` [PATCH v6 1/4] eal: add pointer compression functions Paul Szczepanek
2024-02-29 16:03     ` [PATCH v6 2/4] test: add pointer compress tests to ring perf test Paul Szczepanek
2024-02-29 16:03     ` [PATCH v6 3/4] docs: add pointer compression to the EAL guide Paul Szczepanek
2024-02-29 16:03     ` [PATCH v6 4/4] test: add unit test for ptr compression Paul Szczepanek
2024-03-01 10:21   ` [PATCH v7 0/4] add pointer compression API Paul Szczepanek
2024-03-01 10:21     ` [PATCH v7 1/4] eal: add pointer compression functions Paul Szczepanek
2024-03-07 11:22       ` David Marchand
2024-03-01 10:21     ` [PATCH v7 2/4] test: add pointer compress tests to ring perf test Paul Szczepanek
2024-03-07 11:27       ` David Marchand
2024-03-01 10:21     ` [PATCH v7 3/4] docs: add pointer compression to the EAL guide Paul Szczepanek
2024-03-01 10:21     ` [PATCH v7 4/4] test: add unit test for ptr compression Paul Szczepanek
2024-03-07 11:30       ` David Marchand
2024-03-07 20:39   ` [PATCH v7 0/4] add pointer compression API Paul Szczepanek
2024-03-07 20:39     ` [PATCH v8 1/4] ptr_compress: add pointer compression library Paul Szczepanek
2024-03-07 20:39     ` [PATCH v8 2/4] test: add pointer compress tests to ring perf test Paul Szczepanek
2024-03-07 20:39     ` [PATCH v8 3/4] docs: add pointer compression guide Paul Szczepanek
2024-03-07 20:39     ` [PATCH v8 4/4] test: add unit test for ptr compression Paul Szczepanek
2024-03-08  8:27     ` [PATCH v7 0/4] add pointer compression API David Marchand
2024-03-10 19:34       ` Honnappa Nagarahalli
2024-03-11  7:44         ` David Marchand
2024-03-11 14:47   ` [PATCH v9 0/5] " Paul Szczepanek
2024-03-11 14:47     ` [PATCH v9 1/5] lib: allow libraries with no sources Paul Szczepanek
2024-03-11 15:23       ` Bruce Richardson
2024-03-15  8:33         ` Paul Szczepanek
2024-03-11 14:47     ` [PATCH v9 2/5] ptr_compress: add pointer compression library Paul Szczepanek
2024-03-11 14:47     ` [PATCH v9 3/5] test: add pointer compress tests to ring perf test Paul Szczepanek
2024-03-11 14:47     ` [PATCH v9 4/5] docs: add pointer compression guide Paul Szczepanek
2024-03-11 14:47     ` [PATCH v9 5/5] test: add unit test for ptr compression Paul Szczepanek
2024-03-11 20:31   ` [PATCH v10 0/5] add pointer compression API Paul Szczepanek
2024-03-11 20:31     ` [PATCH v10 1/5] lib: allow libraries with no sources Paul Szczepanek
2024-03-15  9:14       ` Bruce Richardson
2024-03-11 20:31     ` [PATCH v10 2/5] ptr_compress: add pointer compression library Paul Szczepanek
2024-03-11 20:31     ` [PATCH v10 3/5] test: add pointer compress tests to ring perf test Paul Szczepanek
2024-03-11 20:31     ` [PATCH v10 4/5] docs: add pointer compression guide Paul Szczepanek
2024-03-11 20:31     ` [PATCH v10 5/5] test: add unit test for ptr compression Paul Szczepanek
2023-09-27 15:08 ` [RFC 2/2] test: add pointer compress tests to ring perf test Paul Szczepanek
2023-10-09 15:48   ` Thomas Monjalon
