DPDK patches and discussions
 help / color / mirror / Atom feed
From: Konstantin Ananyev <konstantin.ananyev@huawei.com>
To: "Morten Brørup" <mb@smartsharesystems.com>,
	"hofors@lysator.liu.se" <hofors@lysator.liu.se>,
	"konstantin.v.ananyev@yandex.ru" <konstantin.v.ananyev@yandex.ru>,
	"Honnappa.Nagarahalli@arm.com" <Honnappa.Nagarahalli@arm.com>,
	"stephen@networkplumber.org" <stephen@networkplumber.org>
Cc: "mattias.ronnblom@ericsson.com" <mattias.ronnblom@ericsson.com>,
	"bruce.richardson@intel.com" <bruce.richardson@intel.com>,
	"kda@semihalf.com" <kda@semihalf.com>,
	"drc@linux.vnet.ibm.com" <drc@linux.vnet.ibm.com>,
	"dev@dpdk.org" <dev@dpdk.org>
Subject: RE: [PATCH] eal: non-temporal memcpy
Date: Tue, 11 Oct 2022 09:25:23 +0000	[thread overview]
Message-ID: <b1cdcb19cb49402eba0383cc35fbe79f@huawei.com> (raw)
In-Reply-To: <98CBD80474FA8B44BF855DF32C47DC35D873BC@smartserver.smartshare.dk>


Hi Morten,
 
> Mattias, Konstantin, Honnappa, Stephen,
> 
> In my patch for non-temporal memcpy, I have been aiming for using as much non-temporal store as possible. E.g. copying 16 byte to a
> 16 byte aligned address will be done using non-temporal store instructions.
> 
> Now, I am seriously considering this alternative:
> 
> Only using non-temporal stores for complete cache lines, and using normal stores for partial cache lines.
> 
> I think it will make things simpler when an application mixes normal and non-temporal stores. E.g. an application writing metadata (a
> pcap header) followed by packet data.

Sounds like a reasonable idea to me.

> 
> The disadvantage is that copying a burst of 32 packets, will - in the worst case - pollute 64 cache lines (one at the start plus one at the
> end of the copied data), i.e. 4 KiB of data cache. If copying to a consecutive memory area, e.g. a packet capture buffer, it will pollute 33
> cache lines (because the start of packet #2 is in the same cache line as the end of packet #1, etc.).
> 
> What do you think?

My guess that for modern high-end x86 CPUs the difference would be neglectable.
Though again, right now it is just my guess, and I don't have a clue what will be impact (if any) on other platforms. 
If we really want to avoid any doubts, then probably the best thing it  to have some sort of micro-bench in our UT that would simulate
some memory(/cache) bound workload plus normal or NT copies.
As a very rough though:
Allocate some big enough memory buffer (size=X) that for sure wouldn't fit into CPU caches.
Then in a loop for each iteration:
    - do N random normal reads/writes from/to that buffer to simulate some memory bound workload.
     (so each iteration cause  some (more or less) constant % of cache-misses).    
    - invoke our memcpy_ex(size=Y) in question K(=32 as DPDK magic number?) times for different memory locations.
Measure amount of cycles it takes for some big number of iterations.
That would probably show us a difference (if any)
between memcpy vs memcpy_ex() or between different implementations of memcpy_ex()
in terms of cache-line saving, etc.  
Again it will probably show at what size>=Y it is worth to start using NT instead of normal copies for such workloads.
By varying X,N,Y,K parameters we can test different scenarios on different platforms.  

> 
> PS: Non-temporal loads are easy to work with, so don't worry about that.
> 
> 
> Med venlig hilsen / Kind regards,
> -Morten Brørup

  parent reply	other threads:[~2022-10-11  9:25 UTC|newest]

Thread overview: 16+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2022-08-19 13:58 [RFC v3] " Morten Brørup
2022-10-06 20:34 ` [PATCH] eal: " Morten Brørup
2022-10-10  7:35   ` Morten Brørup
2022-10-10  8:58     ` Mattias Rönnblom
2022-10-10  9:36       ` Morten Brørup
2022-10-10 11:58         ` Stanislaw Kardach
2022-10-10  9:57       ` Bruce Richardson
2022-10-11  9:25     ` Konstantin Ananyev [this message]
2022-10-07 10:19 ` [PATCH v2] " Morten Brørup
2022-10-09 15:35 ` [PATCH v3] " Morten Brørup
2022-10-10  6:46 ` [PATCH v4] " Morten Brørup
2022-10-16 14:27   ` Mattias Rönnblom
2022-10-16 19:55   ` Mattias Rönnblom
2023-07-31 12:14   ` Thomas Monjalon
2023-07-31 12:25     ` Morten Brørup
2023-08-04  5:49       ` Mattias Rönnblom

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=b1cdcb19cb49402eba0383cc35fbe79f@huawei.com \
    --to=konstantin.ananyev@huawei.com \
    --cc=Honnappa.Nagarahalli@arm.com \
    --cc=bruce.richardson@intel.com \
    --cc=dev@dpdk.org \
    --cc=drc@linux.vnet.ibm.com \
    --cc=hofors@lysator.liu.se \
    --cc=kda@semihalf.com \
    --cc=konstantin.v.ananyev@yandex.ru \
    --cc=mattias.ronnblom@ericsson.com \
    --cc=mb@smartsharesystems.com \
    --cc=stephen@networkplumber.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).