From: "Morten Brørup" <mb@smartsharesystems.com>
To: "Stephen Hemminger" <stephen@networkplumber.org>
Cc: <bruce.richardson@intel.com>, <konstantin.v.ananyev@yandex.ru>,
<mattias.ronnblom@ericsson.com>, <dev@dpdk.org>
Subject: RE: [PATCH] eal/x86: improve rte_memcpy const size 16 performance
Date: Sun, 3 Mar 2024 11:07:19 +0100
Message-ID: <98CBD80474FA8B44BF855DF32C47DC35E9F29C@smartserver.smartshare.dk>
In-Reply-To: <20240302215807.6d7c3cd9@hermes.local>
> From: Stephen Hemminger [mailto:stephen@networkplumber.org]
> Sent: Sunday, 3 March 2024 06.58
>
> On Sat, 2 Mar 2024 21:40:03 -0800
> Stephen Hemminger <stephen@networkplumber.org> wrote:
>
> > On Sun, 3 Mar 2024 00:48:12 +0100
> > Morten Brørup <mb@smartsharesystems.com> wrote:
> >
> > > When the rte_memcpy() size is 16, the same 16 bytes are copied twice.
> > > In the case where the size is known to be 16 at build time, omit the
> > > duplicate copy.
> > >
> > > Reduced the amount of effectively copy-pasted code by using #ifdef
> > > inside functions instead of outside functions.
> > >
> > > Suggested-by: Stephen Hemminger <stephen@networkplumber.org>
> > > Signed-off-by: Morten Brørup <mb@smartsharesystems.com>
> > > ---
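A minimal sketch of the idea in the patch description, assuming the 16-to-32-byte path of rte_memcpy() copies the first 16 and the last 16 bytes, which overlap (and for size 16 are the same bytes). The names `mov16` and `copy_16_to_32` are hypothetical; a constant-size `memcpy()` stands in for the SSE load/store that DPDK's `rte_mov16()` uses, and `__builtin_constant_p()` is the GCC/Clang builtin for detecting a build-time constant. This is an illustration, not the patch itself.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for a single 16-byte move. DPDK's rte_mov16() uses an
 * SSE unaligned load/store; a constant-size memcpy() keeps this portable. */
static inline void mov16(uint8_t *dst, const uint8_t *src)
{
	memcpy(dst, src, 16);
}

/* Hypothetical helper: copy n bytes, where 16 <= n <= 32. */
static inline void copy_16_to_32(void *dst, const void *src, size_t n)
{
	uint8_t *d = dst;
	const uint8_t *s = src;

	assert(n >= 16 && n <= 32);

	if (__builtin_constant_p(n) && n == 16) {
		/* Size known to be 16 at build time: a single move suffices. */
		mov16(d, s);
		return;
	}

	/* General case: copy the first and the last 16 bytes. The two ranges
	 * overlap when n < 32; when n == 16 they are the same 16 bytes,
	 * which is the duplicate copy the patch avoids. */
	mov16(d, s);
	mov16(d - 16 + n, s - 16 + n);
}
```

At -O0, or when the call is not inlined, `__builtin_constant_p(n)` evaluates to 0 and the general two-move path runs, which is still correct, just not optimal.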
> >
> > Looks good, let me see how it looks in Godbolt vs. GCC.
> >
> > One other issue is that for the non-constant case, rte_memcpy has an
> > excessively large inline code footprint. That is one of the reasons GCC
> > doesn't always inline. For > 128 bytes, it really should be a function.
Yes, the code footprint is significant for the non-constant case.
I suppose Intel considered the costs and benefits when they developed this.
Or perhaps they just wanted a showcase for their new and shiny vector instructions. ;-)
Inlining might provide significant branch prediction benefits in cases where the size is not a build-time constant but is constant at run time, since each inlined call site then has its own branches for the predictor to learn.
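A minimal sketch of the split suggested above: keep only the small-size logic inline and move copies larger than 128 bytes into an out-of-line function, so each call site carries less code. The names `copy_large`, `copy_inline_or_call` and `INLINE_COPY_MAX` are hypothetical, the large path simply defers to `memcpy()`, and `__attribute__((noinline))` is the GCC/Clang attribute that keeps the large path out of its callers.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical threshold, taken from the 128-byte figure in the thread. */
#define INLINE_COPY_MAX 128

/* Out-of-line path for large copies; noinline keeps this code out of
 * every call site. Here it simply defers to the C library memcpy(). */
static __attribute__((noinline)) void *
copy_large(void *dst, const void *src, size_t n)
{
	return memcpy(dst, src, n);
}

/* Inline front end: only the small-size path is expanded at each caller. */
static inline void *
copy_inline_or_call(void *dst, const void *src, size_t n)
{
	uint8_t *d = dst;
	const uint8_t *s = src;

	if (n > INLINE_COPY_MAX)
		return copy_large(dst, src, n);

	/* Whole 16-byte blocks first; a constant-size memcpy() is typically
	 * compiled to a single vector load/store pair. */
	while (n >= 16) {
		memcpy(d, s, 16);
		d += 16;
		s += 16;
		n -= 16;
	}
	/* Remaining 0..15 bytes, one at a time. */
	while (n > 0) {
		*d++ = *s++;
		n--;
	}
	return dst;
}
```

The trade-off raised above still applies: with this split, all large copies share the branches inside one out-of-line function, whereas fully inlined code gives each call site its own branches, which may predict better when the size is constant at run time.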
>
> For sizes of 4, 6, 8, 16, 32, 64, up to 128, GCC inline and rte_memcpy match.
>
> For size 128, it looks like GCC's code is simpler.
>
> rte_copy_addr:
> vmovdqu ymm0, YMMWORD PTR [rsi]
> vextracti128 XMMWORD PTR [rdi+16], ymm0, 0x1
> vmovdqu XMMWORD PTR [rdi], xmm0
> vmovdqu ymm0, YMMWORD PTR [rsi+32]
> vextracti128 XMMWORD PTR [rdi+48], ymm0, 0x1
> vmovdqu XMMWORD PTR [rdi+32], xmm0
> vmovdqu ymm0, YMMWORD PTR [rsi+64]
> vextracti128 XMMWORD PTR [rdi+80], ymm0, 0x1
> vmovdqu XMMWORD PTR [rdi+64], xmm0
> vmovdqu ymm0, YMMWORD PTR [rsi+96]
> vextracti128 XMMWORD PTR [rdi+112], ymm0, 0x1
> vmovdqu XMMWORD PTR [rdi+96], xmm0
> vzeroupper
> ret
Interesting. Playing around with Godbolt revealed that GCC versions < 11 generate the above from rte_memcpy, whereas GCC versions >= 11 generate the simpler code below. Clang doesn't have this issue.
I guess that's why the original code treated AVX as SSE.
Fixed in v2.
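For comparison, a minimal sketch of a 128-byte copy written as four 32-byte AVX unaligned load/store pairs, the shape of the `copy_addr` listing. Per the observation above, GCC versions before 11 may instead split each 32-byte store into a `vextracti128` plus a 16-byte store, as in the `rte_copy_addr` listing; GCC documents `-mavx256-split-unaligned-store` for this splitting, and whether it is enabled by default depends on the compiler version and tuning target. The names `mov128` and `mov128_avx` are hypothetical; the sketch uses the GCC/Clang `target("avx")` attribute and `__builtin_cpu_supports()` so it builds without `-mavx` and falls back to `memcpy()` on non-AVX or non-x86 systems.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#if defined(__x86_64__) || defined(__i386__)
#include <immintrin.h>

/* Four 32-byte unaligned load/store pairs, compiled for AVX regardless of
 * the global -m flags. */
static __attribute__((target("avx"))) void
mov128_avx(uint8_t *dst, const uint8_t *src)
{
	__m256i y0 = _mm256_loadu_si256((const __m256i *)(src + 0));
	_mm256_storeu_si256((__m256i *)(dst + 0), y0);
	__m256i y1 = _mm256_loadu_si256((const __m256i *)(src + 32));
	_mm256_storeu_si256((__m256i *)(dst + 32), y1);
	__m256i y2 = _mm256_loadu_si256((const __m256i *)(src + 64));
	_mm256_storeu_si256((__m256i *)(dst + 64), y2);
	__m256i y3 = _mm256_loadu_si256((const __m256i *)(src + 96));
	_mm256_storeu_si256((__m256i *)(dst + 96), y3);
}
#endif

/* Hypothetical dispatcher: use the AVX path only if the CPU supports AVX. */
static void mov128(uint8_t *dst, const uint8_t *src)
{
#if defined(__x86_64__) || defined(__i386__)
	if (__builtin_cpu_supports("avx")) {
		mov128_avx(dst, src);
		return;
	}
#endif
	memcpy(dst, src, 128);
}
```

Compiling `mov128_avx` at -O2 with different GCC versions in Godbolt should show whether a given version emits plain 32-byte stores or the split form.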
> copy_addr:
> vmovdqu ymm0, YMMWORD PTR [rsi]
> vmovdqu YMMWORD PTR [rdi], ymm0
> vmovdqu ymm1, YMMWORD PTR [rsi+32]
> vmovdqu YMMWORD PTR [rdi+32], ymm1
> vmovdqu ymm2, YMMWORD PTR [rsi+64]
> vmovdqu YMMWORD PTR [rdi+64], ymm2
> vmovdqu ymm3, YMMWORD PTR [rsi+96]
> vmovdqu YMMWORD PTR [rdi+96], ymm3
> vzeroupper
> ret