From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mails.dpdk.org (mails.dpdk.org [217.70.189.124]) by inbox.dpdk.org (Postfix) with ESMTP id 4EB6043C0C; Fri, 1 Mar 2024 17:12:36 +0100 (CET) Received: from mails.dpdk.org (localhost [127.0.0.1]) by mails.dpdk.org (Postfix) with ESMTP id 389D2402EA; Fri, 1 Mar 2024 17:12:36 +0100 (CET) Received: from mail-oo1-f52.google.com (mail-oo1-f52.google.com [209.85.161.52]) by mails.dpdk.org (Postfix) with ESMTP id 9F6F1402E0 for ; Fri, 1 Mar 2024 17:12:35 +0100 (CET) Received: by mail-oo1-f52.google.com with SMTP id 006d021491bc7-5a0e7fef61dso900184eaf.0 for ; Fri, 01 Mar 2024 08:12:35 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=iol.unh.edu; s=unh-iol; t=1709309555; x=1709914355; darn=dpdk.org; h=cc:to:subject:message-id:date:from:in-reply-to:references :mime-version:from:to:cc:subject:date:message-id:reply-to; bh=TwvKgb9IvuT4BES64SN9qux9yTF4IwsUy5RBw5UCGcA=; b=ekZ60dXvF8NEUzGawWE46seKb6o/Fq3HHN8r22pmBfLNZgFqd93PZU9X6MwhvhNL7M ufXp1kacvs3UfCSmvej3XTTuwweFTALC97cUzUfdb9Z15S5XoaVT5Gs0gbrBT0dDeh8x tFfR8yi5wl1D9Rgp1B7ZSL1hJ0d2zbv/a5eJ8= X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1709309555; x=1709914355; h=cc:to:subject:message-id:date:from:in-reply-to:references :mime-version:x-gm-message-state:from:to:cc:subject:date:message-id :reply-to; bh=TwvKgb9IvuT4BES64SN9qux9yTF4IwsUy5RBw5UCGcA=; b=Bguz0StaNlS072V5hJfc/ahdy1hnvT8D3Do2dklgqxm7sEXiZ5sO8Iueb/xGKDre9y fSDQCBu7WxThNDIHg2NK3e98cWXa6lePkqVwAsk0g92obnt7tEWcWmUGjfNVUUmLNsUk OYOLB/0bC3L30zXeJZnO4D5WJX8iq8ailZEb+LrOo3te928FsrySQXGVOXITr4ZW1crL rwEBSWnzlnZmthi1+FEOfAKPQ/Uv9QLNpAOUbJ2aLigMm+AaKzhUDDWv1bdJtUcuG6yv YFIxdwQoTWoYihtJJApG4bcK4WESEK7qquimNNuQnZgCXGY9daywIJFmeHEugSJOzFHb ckvA== X-Forwarded-Encrypted: i=1; AJvYcCXw0v8dSsMqCY5eY3lJko50wuY83FHXo5TBh+2HySqcr/iWcwnim73f2v0Xth/AtP3WnQAVF++vxxlPGXg= X-Gm-Message-State: AOJu0YxIVdWIssMb0myLGjEsfUSBgzt8ccfhesWEXwnH7svXwZCWilv4 IwOTYGWDRqM3f7x0dRj46HIeTo25oS13d6Y/rshH/yiZSJAA43SKuWZWpkwUCBbqQFhBprQYIzE Ns7dQSJhaqot8qwDHnM0qcWGNFBxnXDgautCpcg== X-Google-Smtp-Source: AGHT+IFLLZe6VhjfEw7csLiZWmlHo5ub6i5118+MHyC/bggPe6dIdGO/3ZFlYIlCmjkIbbAWInxth1SiPA6xSKLBDjI= X-Received: by 2002:a4a:6554:0:b0:5a0:e5e8:715d with SMTP id z20-20020a4a6554000000b005a0e5e8715dmr1952936oog.7.1709309554999; Fri, 01 Mar 2024 08:12:34 -0800 (PST) MIME-Version: 1.0 References: <20230927150854.3670391-2-paul.szczepanek@arm.com> <20231101181301.2449804-1-paul.szczepanek@arm.com> <7058331a-d829-4f0e-8634-726ca3be1ef2@arm.com> <98CBD80474FA8B44BF855DF32C47DC35E9F290@smartserver.smartshare.dk> In-Reply-To: <98CBD80474FA8B44BF855DF32C47DC35E9F290@smartserver.smartshare.dk> From: Patrick Robb Date: Fri, 1 Mar 2024 11:12:24 -0500 Message-ID: Subject: Re: [PATCH v5 0/4] add pointer compression API To: Paul Szczepanek Cc: Konstantin Ananyev , dev@dpdk.org, konstantin.v.ananyev@yandex.ru, =?UTF-8?Q?Morten_Br=C3=B8rup?= Content-Type: multipart/alternative; boundary="000000000000d6588806129ba4d1" X-BeenThere: dev@dpdk.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: DPDK patches and discussions List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: dev-bounces@dpdk.org --000000000000d6588806129ba4d1 Content-Type: text/plain; charset="UTF-8" Content-Transfer-Encoding: quoted-printable The Community CI Testing Lab had an infra failure this morning and some patches including yours were affected with false failures. The issue is now resolved and we are rerunning the tests in question for all patches submitted today. On Fri, Mar 1, 2024 at 6:16=E2=80=AFAM Morten Br=C3=B8rup wrote: > > From: Konstantin Ananyev [mailto:konstantin.ananyev@huawei.com] > > Sent: Thursday, 22 February 2024 17.16 > > > > > For some reason your email is not visible to me, even though it's in > the > > > archive. > > > > No worries. > > > > > > > > On 02/11/202416:32,Konstantin Ananyev konstantin.v.ananyev wrote: > > > > > > > From one side the code itself is very small and straightforward, > > from > > other side - it is not clear to me what is intended usage for it > > > > within DPDK and it's applianances? > > > > Konstantin > > > > > > The intended usage is explained in the cover email (see below) and > > demonstrated > > > in the test supplied in the following patch - when sending arrays of > > pointers > > > between cores as it happens in a forwarding example. > > > > Yes, I saw that. The thing is that test is a 'synthetic' one. > > My question was about how do you expect people to use it in more > realistic > > scenarios? > > Let say user has a bunch of mbuf pointers, possibly from different > mempools. > > How he can use this API: how to deduce the base pointer for all of them > and > > what to > > do if it can't be done? > > I share Konstantin's concerns with this feature. > > If we want to compress mbuf pointers in applications with a few mbuf > pools, e.g. an mbuf pool per CPU socket, the compression algorithm would = be > different. > > I would like to add: > If we want to offer optimizations specifically for applications with a > single mbuf pool, I think it should be considered in a system-wide contex= t > to determine if performance could be improved in more areas. > E.g. removing the pool field from the rte_mbuf structure might free up > space to move hot fields from the second cache line to the first, so the > second cache line rarely needs to be touched. (As an alternative to > removing the pool field, it could be moved to the second cache line, only > to be used if the global "single mbuf pool" is NULL.) > > On the other hand, I agree that pointer compression can be useful for som= e > applications, so we should accept it. > > However, pointer compression has nothing to do with the underlying > hardware or operating system, so it does not belong in the EAL (which is > already too bloated). It should be a separate library. > > > > > > On 01/11/2023 18:12, Paul Szczepanek wrote: > > > > > > > This patchset is proposing adding a new EAL header with utility > functions > > > > that allow compression of arrays of pointers. > > > > > > > > When passing caches full of pointers between threads, memory > containing > > > > the pointers is copied multiple times which is especially costly > between > > > > cores. A compression method will allow us to shrink the memory size > > > > copied. > > > > > > > > The compression takes advantage of the fact that pointers are usual= ly > > > > located in a limited memory region (like a mempool). We can compres= s > them > > > > by converting them to offsets from a base memory address. > > > > > > > > Offsets can be stored in fewer bytes (dictated by the memory region > size > > > > and alignment of the pointer). For example: an 8 byte aligned point= er > > > > which is part of a 32GB memory pool can be stored in 4 bytes. The > API is > > > > very generic and does not assume mempool pointers, any pointer can = be > > > > passed in. > > > > > > > > Compression is based on few and fast operations and especially with > vector > > > > instructions leveraged creates minimal overhead. > > > > > > > > The API accepts and returns arrays because the overhead means it > only is > > > > worth it when done in bulk. > > > > > > > > Test is added that shows potential performance gain from > compression. In > > > > this test an array of pointers is passed through a ring between two > cores. > > > > It shows the gain which is dependent on the bulk operation size. In > this > > > > synthetic test run on ampere altra a substantial (up to 25%) > performance > > > > gain is seen if done in bulk size larger than 32. At 32 it breaks > even and > > > > lower sizes create a small (less than 5%) slowdown due to overhead. > > > > > > > > In a more realistic mock application running the l3 forwarding dpdk > > > > example that works in pipeline mode on two cores this translated > into a > > > > ~5% throughput increase on an ampere altra. > > Which burst size was used to achieve this ~5% throughput increase? > > > > > > > > > v2: > > > > * addressed review comments (style, explanations and typos) > > > > * lowered bulk iterations closer to original numbers to keep runtim= e > short > > > > * fixed pointer size warning on 32-bit arch > > > > v3: > > > > * added 16-bit versions of compression functions and tests > > > > * added documentation of these new utility functions in the EAL gui= de > > > > v4: > > > > * added unit test > > > > * fix bug in NEON implementation of 32-bit decompress > > > > v5: > > > > * disable NEON and SVE implementation on AARCH32 due to wrong > pointer size > > > > > > > > Paul Szczepanek (4): > > > > eal: add pointer compression functions > > > > test: add pointer compress tests to ring perf test > > > > docs: add pointer compression to the EAL guide > > > > test: add unit test for ptr compression > > > > > > > > .mailmap | 1 + > > > > app/test/meson.build | 1 + > > > > app/test/test_eal_ptr_compress.c | 108 ++++++ > > > > app/test/test_ring.h | 94 ++++- > > > > app/test/test_ring_perf.c | 354 > ++++++++++++------ > > > > .../prog_guide/env_abstraction_layer.rst | 142 +++++++ > > > > lib/eal/include/meson.build | 1 + > > > > lib/eal/include/rte_ptr_compress.h | 266 +++++++++++++ > > > > 8 files changed, 843 insertions(+), 124 deletions(-) > > > > create mode 100644 app/test/test_eal_ptr_compress.c > > > > create mode 100644 lib/eal/include/rte_ptr_compress.h > > > > > > > > -- > > > > 2.25.1 > > > > > --000000000000d6588806129ba4d1 Content-Type: text/html; charset="UTF-8" Content-Transfer-Encoding: quoted-printable
The Community CI Testing Lab had an infra= failure this morning and some patches including yours were affected with f= alse failures. The issue is now resolved and we are rerunning the tests in = question for all patches submitted today.=C2=A0

On Fri, Mar 1, 2024 at 6= :16=E2=80=AFAM Morten Br=C3=B8rup <mb@smartsharesystems.com> wrote:
> From: Konstantin Ananyev [mailto:konstantin.a= nanyev@huawei.com]
> Sent: Thursday, 22 February 2024 17.16
>
> > For some reason your email is not visible to me, even though it&#= 39;s in the
> > archive.
>
> No worries.
>
> >
> > On 02/11/202416:32,Konstantin Ananyev konstantin.v.ananyev=C2=A0 = wrote:
> >
> > > From one side the code itself is very small and straightforw= ard, > from
> other side - it is not clear to me what is intended usage for it
> > > within DPDK and it's applianances?
> > > Konstantin
> >
> > The intended usage is explained in the cover email (see below) an= d
> demonstrated
> > in the test supplied in the following patch - when sending arrays= of
> pointers
> > between cores as it happens in a forwarding example.
>
> Yes, I saw that. The thing is that test is a 'synthetic' one.<= br> > My question was about how do you expect people to use it in more reali= stic
> scenarios?
> Let say user has a bunch of mbuf pointers, possibly from different mem= pools.
> How he can use this API: how to deduce the base pointer for all of the= m and
> what to
> do if it can't be done?

I share Konstantin's concerns with this feature.

If we want to compress mbuf pointers in applications with a few mbuf pools,= e.g. an mbuf pool per CPU socket, the compression algorithm would be diffe= rent.

I would like to add:
If we want to offer optimizations specifically for applications with a sing= le mbuf pool, I think it should be considered in a system-wide context to d= etermine if performance could be improved in more areas.
E.g. removing the pool field from the rte_mbuf structure might free up spac= e to move hot fields from the second cache line to the first, so the second= cache line rarely needs to be touched. (As an alternative to removing the = pool field, it could be moved to the second cache line, only to be used if = the global "single mbuf pool" is NULL.)

On the other hand, I agree that pointer compression can be useful for some = applications, so we should accept it.

However, pointer compression has nothing to do with the underlying hardware= or operating system, so it does not belong in the EAL (which is already to= o bloated). It should be a separate library.

>
> > On 01/11/2023 18:12, Paul Szczepanek wrote:
> >
> > > This patchset is proposing adding a new EAL header with util= ity functions
> > > that allow compression of arrays of pointers.
> > >
> > > When passing caches full of pointers between threads, memory= containing
> > > the pointers is copied multiple times which is especially co= stly between
> > > cores. A compression method will allow us to shrink the memo= ry size
> > > copied.
> > >
> > > The compression takes advantage of the fact that pointers ar= e usually
> > > located in a limited memory region (like a mempool). We can = compress them
> > > by converting them to offsets from a base memory address. > > >
> > > Offsets can be stored in fewer bytes (dictated by the memory= region size
> > > and alignment of the pointer). For example: an 8 byte aligne= d pointer
> > > which is part of a 32GB memory pool can be stored in 4 bytes= . The API is
> > > very generic and does not assume mempool pointers, any point= er can be
> > > passed in.
> > >
> > > Compression is based on few and fast operations and especial= ly with vector
> > > instructions leveraged creates minimal overhead.
> > >
> > > The API accepts and returns arrays because the overhead mean= s it only is
> > > worth it when done in bulk.
> > >
> > > Test is added that shows potential performance gain from com= pression. In
> > > this test an array of pointers is passed through a ring betw= een two cores.
> > > It shows the gain which is dependent on the bulk operation s= ize. In this
> > > synthetic test run on ampere altra a substantial (up to 25%)= performance
> > > gain is seen if done in bulk size larger than 32. At 32 it b= reaks even and
> > > lower sizes create a small (less than 5%) slowdown due to ov= erhead.
> > >
> > > In a more realistic mock application running the l3 forwardi= ng dpdk
> > > example that works in pipeline mode on two cores this transl= ated into a
> > > ~5% throughput increase on an ampere altra.

Which burst size was used to achieve this ~5% throughput increase?

> > >
> > > v2:
> > > * addressed review comments (style, explanations and typos)<= br> > > > * lowered bulk iterations closer to original numbers to keep= runtime short
> > > * fixed pointer size warning on 32-bit arch
> > > v3:
> > > * added 16-bit versions of compression functions and tests > > > * added documentation of these new utility functions in the = EAL guide
> > > v4:
> > > * added unit test
> > > * fix bug in NEON implementation of 32-bit decompress
> > > v5:
> > > * disable NEON and SVE implementation on AARCH32 due to wron= g pointer size
> > >
> > > Paul Szczepanek (4):
> > >=C2=A0 =C2=A0 eal: add pointer compression functions
> > >=C2=A0 =C2=A0 test: add pointer compress tests to ring perf t= est
> > >=C2=A0 =C2=A0 docs: add pointer compression to the EAL guide<= br> > > >=C2=A0 =C2=A0 test: add unit test for ptr compression
> > >
> > >=C2=A0 =C2=A0.mailmap=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2= =A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 = =C2=A0 =C2=A0 =C2=A0 |=C2=A0 =C2=A01 +
> > >=C2=A0 =C2=A0app/test/meson.build=C2=A0 =C2=A0 =C2=A0 =C2=A0 = =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 |=C2=A0 =C2= =A01 +
> > >=C2=A0 =C2=A0app/test/test_eal_ptr_compress.c=C2=A0 =C2=A0 = =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 | 108 ++++++
> > >=C2=A0 =C2=A0app/test/test_ring.h=C2=A0 =C2=A0 =C2=A0 =C2=A0 = =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 |=C2=A0 94 += +++-
> > >=C2=A0 =C2=A0app/test/test_ring_perf.c=C2=A0 =C2=A0 =C2=A0 = =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0| 354 ++++++++++++--= ----
> > >=C2=A0 =C2=A0.../prog_guide/env_abstraction_layer.rst=C2=A0 = =C2=A0 =C2=A0 | 142 +++++++
> > >=C2=A0 =C2=A0lib/eal/include/meson.build=C2=A0 =C2=A0 =C2=A0 = =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0|=C2=A0 =C2=A01 +
> > >=C2=A0 =C2=A0lib/eal/include/rte_ptr_compress.h=C2=A0 =C2=A0 = =C2=A0 =C2=A0 =C2=A0 =C2=A0 | 266 +++++++++++++
> > >=C2=A0 =C2=A08 files changed, 843 insertions(+), 124 deletion= s(-)
> > >=C2=A0 =C2=A0create mode 100644 app/test/test_eal_ptr_compres= s.c
> > >=C2=A0 =C2=A0create mode 100644 lib/eal/include/rte_ptr_compr= ess.h
> > >
> > > --
> > > 2.25.1
> > >

--000000000000d6588806129ba4d1--