DPDK patches and discussions
From: Bruce Richardson <bruce.richardson@intel.com>
To: Manish Sharma <manish.sharmajee75@gmail.com>
Cc: dev@dpdk.org
Subject: Re: [dpdk-dev] rte_memcpy - fence and stream
Date: Tue, 25 May 2021 10:20:24 +0100
Message-ID: <YKzBWEKuHU2AlUtI@bricha3-MOBL.ger.corp.intel.com>
In-Reply-To: <CAOT2AHX7Zf-N-wFRJePSHfaup2ogxR6udNeT82zxjOSH5n0xpw@mail.gmail.com>

On Mon, May 24, 2021 at 11:43:24PM +0530, Manish Sharma wrote:
> I am looking at the source for rte_memcpy (this is a discussion only for
> x86-64)
> 
> For one of the cases, when aligned correctly, it uses
> 
> /**
>  * Copy 64 bytes from one location to another,
>  * locations should not overlap.
>  */
> static __rte_always_inline void
> rte_mov64(uint8_t *dst, const uint8_t *src)
> {
>         __m512i zmm0;
> 
>         zmm0 = _mm512_loadu_si512((const void *)src);
>         _mm512_storeu_si512((void *)dst, zmm0);
> }
> 
> I had some questions about this:
> 
> 1. What I don't see is any use of x86 fence (rmb, wmb) instructions. Are
> they not required in this case and, if not, why not?
> 
> 2. Are _mm512_loadu_si512 and _mm512_storeu_si512 non-temporal?
> 
They are not non-temporal. They are regular loads and stores, which follow
the normal x86 memory ordering rules, so they don't need fences.

> 3. Why isn't the code using the stream variants, _mm512_stream_load_si512
> and friends? They would not pollute the cache, so should be better - unless
> the required fence instructions cause a drop in performance?
> 
Whether the stream variants perform better really depends on what you are
trying to measure. However, in the vast majority of cases a core is making
a copy of data to work on it, in which case having the data not in cache is
going to cause massive stalls on the next access to that data as it's
fetched from DRAM. Therefore, the best option is to use regular
instructions meaning that the data is in local cache afterwards, giving far
better performance when the data is actually used.
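
To make the trade-off concrete, here is a minimal sketch of the two styles
of 64-byte copy. It is not DPDK code: the names copy64_cached and
copy64_stream are made up for illustration, and it assumes 128-bit SSE2
intrinsics rather than AVX-512 so that it builds on any x86-64 compiler
(with a plain memcpy fallback on other architectures).

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#if defined(__x86_64__) || defined(_M_X64)
#include <emmintrin.h> /* SSE2 loads/stores, _mm_stream_si128, _mm_sfence */
#define HAVE_SSE2 1
#endif

/* Hypothetical helper: regular copy. The destination lines are written
 * through the cache, so a following read of dst is likely to hit. */
static inline void
copy64_cached(uint8_t *dst, const uint8_t *src)
{
#ifdef HAVE_SSE2
	for (int i = 0; i < 64; i += 16) {
		__m128i v = _mm_loadu_si128((const __m128i *)(src + i));
		_mm_storeu_si128((__m128i *)(dst + i), v);
	}
#else
	memcpy(dst, src, 64);
#endif
}

/* Hypothetical helper: non-temporal copy. The stores are hinted to bypass
 * the cache, so dst is typically not cached afterwards.
 * _mm_stream_si128 requires dst to be 16-byte aligned. */
static inline void
copy64_stream(uint8_t *dst, const uint8_t *src)
{
	assert(((uintptr_t)dst & 15) == 0);
#ifdef HAVE_SSE2
	for (int i = 0; i < 64; i += 16) {
		__m128i v = _mm_loadu_si128((const __m128i *)(src + i));
		_mm_stream_si128((__m128i *)(dst + i), v);
	}
	/* Non-temporal stores are weakly ordered; fence before anything
	 * that relies on them being globally visible. */
	_mm_sfence();
#else
	memcpy(dst, src, 64);
#endif
}
```

Both helpers leave the same bytes in dst. The difference is where those
bytes live afterwards: the cached version suits the common "copy, then
work on the copy" pattern, while the streaming version only makes sense
when the core will not touch dst again soon.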

> 4. Does _mm512_stream_load_si512 need fence instructions? Based on my
> reading of the spec, the answer is yes - but wanted to confirm.
>
I believe a fence would be necessary for safety, yes. 
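
The place where the fence most clearly matters is the store side of the
stream family: non-temporal stores are weakly ordered, so a later regular
store (such as a "ready" flag) can become visible to another core before
the streamed data does. Below is a rough sketch of that publish pattern.
The msg_slot struct and the publish_stream/consume functions are
hypothetical names, not DPDK APIs, and SSE2 is assumed in place of AVX-512
so the example builds on any x86-64 compiler.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>
#include <string.h>

#if defined(__x86_64__) || defined(_M_X64)
#include <emmintrin.h> /* _mm_stream_si128, _mm_sfence */
#define HAVE_SSE2 1
#endif

/* Hypothetical one-message mailbox shared between two cores. */
struct msg_slot {
	_Alignas(16) uint8_t payload[64];
	atomic_int ready;
};

/* Producer: write the payload with non-temporal stores, then publish. */
static inline void
publish_stream(struct msg_slot *slot, const uint8_t *src)
{
	assert(((uintptr_t)slot->payload & 15) == 0);
#ifdef HAVE_SSE2
	for (int i = 0; i < 64; i += 16) {
		__m128i v = _mm_loadu_si128((const __m128i *)(src + i));
		_mm_stream_si128((__m128i *)(slot->payload + i), v);
	}
	/* Without this fence the flag store below may be observed before
	 * the streamed payload; the release store alone does not order
	 * non-temporal stores on x86. */
	_mm_sfence();
#else
	memcpy(slot->payload, src, 64);
#endif
	atomic_store_explicit(&slot->ready, 1, memory_order_release);
}

/* Consumer: returns 1 and copies the payload out once it is published,
 * or returns 0 if nothing is ready yet. */
static inline int
consume(struct msg_slot *slot, uint8_t *out)
{
	if (!atomic_load_explicit(&slot->ready, memory_order_acquire))
		return 0;
	memcpy(out, slot->payload, 64);
	return 1;
}
```

With regular stores the release store on the flag would be enough on its
own, which is why rte_mov64 as quoted above has no fence in it.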
