From: "Mattias Rönnblom" <hofors@lysator.liu.se>
To: "Morten Brørup" <mb@smartsharesystems.com>
Cc: dev@dpdk.org, Stephen Hemminger <stephen@networkplumber.org>,
Konstantin Ananyev <konstantin.ananyev@huawei.com>,
Bruce Richardson <bruce.richardson@intel.com>,
Honnappa Nagarahalli <Honnappa.Nagarahalli@arm.com>
Subject: Re: [RFC v2] non-temporal memcpy
Date: Tue, 9 Aug 2022 13:53:25 +0200 [thread overview]
Message-ID: <d1f5991a-8f96-1e79-a3c4-e527959fb57e@lysator.liu.se> (raw)
In-Reply-To: <98CBD80474FA8B44BF855DF32C47DC35D87245@smartserver.smartshare.dk>

On 2022-08-09 11:24, Morten Brørup wrote:
>> From: Mattias Rönnblom [mailto:hofors@lysator.liu.se]
>> Sent: Sunday, 7 August 2022 22.41
>>
>> On 2022-07-29 18:05, Stephen Hemminger wrote:
>>>
>>> It makes sense in a few select places to use non-temporal copy.
>>> But it would add unnecessary complexity to DPDK if every function in
>>> DPDK that could cause a copy had a non-temporal variant.
>>
>> An NT load and NT store variant, plus an NT load+store variant. :)
>
> I considered this, but it adds complexity, and our use case only needs the NT load+store. So I decided to only provide that variant.
>
> I can prepare the API for all four combinations. The extended function would be renamed from rte_memcpy_nt_ex() to just rte_memcpy_ex(). And rte_memcpy_nt() would be omitted, rather than kept as a wrapper that just performs rte_memcpy_ex(dst, src, len, F_DST_NT|F_SRC_NT).
>
> What does the community prefer in this regard?
>
I would suggest just having a single function, with flags or an enum to
signify whether loads, stores or both should be non-temporal. Whether all
platforms honor all combinations is a different matter.
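
To be concrete, something along the lines of the sketch below (the names
and flag values are placeholders only, not a proposal for the actual API):

#include <stdint.h>
#include <string.h>

/* Hypothetical flag names, for illustration only. */
#define RTE_MEMCPY_F_SRC_NT (UINT64_C(1) << 0) /* prefer non-temporal loads */
#define RTE_MEMCPY_F_DST_NT (UINT64_C(1) << 1) /* prefer non-temporal stores */

/*
 * Generic fallback: on a platform (or for a flag combination) where
 * non-temporal accesses aren't available, the hints are simply ignored
 * and the call degenerates into a regular copy.
 */
static inline void *
rte_memcpy_ex(void *dst, const void *src, size_t len, uint64_t flags)
{
        (void)flags; /* hints only; ignoring them is always correct */
        return memcpy(dst, src, len);
}

Architecture-specific implementations would then honor whatever subset of
the flags they can.
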
Is there something that suggests that this particular use case will be
more common than others? When I've used non-temporal memcpy(), only the
store side was NT, since the application would go on and use the source
data.
>>
>>>
>>> Maybe just having rte_memcpy have a threshold (config value?) so that
>>> if the copy is larger than a certain size, then it would automatically
>>> be non-temporal. Small copies wouldn't matter; the optimization is more
>>> about avoiding cache pollution issues with large streams of data.
>>
>> I don't think there's any way for rte_memcpy() to know if the
>> application plans to use the source, the destination, both, or neither
>> of the buffers in the immediate future.
>
> Agree. Which is why explicit NT function variants should be offered.
>
>> For huge copies (MBs or more) the
>> size heuristic makes sense, but for medium-sized copies (say a packet's
>> worth of data), I'm not so sure.
>
> This is the behavior of glibc memcpy().
>
Yes, but, from what I can tell, glibc issues an sfence at the end of the
copy.

Having a non-temporal memcpy() with a different memory model than the
compiler intrinsic memcpy(), the glibc memcpy() and the DPDK
rte_memcpy() implementations seems like asking for trouble.
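
To illustrate with an NT-store copy on x86: without the trailing sfence,
the copy does not give the caller the ordering a regular memcpy() does
(simplified, untested sketch; assumes a 16-byte-aligned destination and a
length that is a multiple of 16):

#include <emmintrin.h> /* SSE2: _mm_stream_si128(), _mm_sfence() */
#include <stddef.h>

static void
copy_nt_store(void *dst, const void *src, size_t len)
{
        __m128i *d = dst;
        const __m128i *s = src;
        size_t i;

        for (i = 0; i < len / 16; i++)
                _mm_stream_si128(&d[i], _mm_loadu_si128(&s[i]));

        /*
         * Without this, the weakly ordered NT stores may not yet be
         * visible to another core by the time a subsequent ordinary
         * store (e.g., a ring enqueue) is - unlike with a regular
         * memcpy().
         */
        _mm_sfence();
}
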
>>
>> What is unclear to me is if there is a benefit (or drawback) of using
>> the imaginary rte_memcpy_nt(), compared to doing rte_memcpy() +
>> clflushopt or cldemote, in the typical use case (if there is such).
>>
>
> Our use case is packet capture (copying) to memory, where the copies will be read much later, so there is no need to pollute the cache with the copies.
>
If you flush/demote the cache lines you've used more or less immediately,
there won't be much pollution, especially if you include the
clflushopt/cldemote in the copying routine, as opposed to doing one large
flush at the end.

I haven't tried this in practice, but it seems to me it's an option
worth exploring. It could be a way to implement a portable NT memcpy(),
if nothing else.
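
Untested, but roughly what I have in mind, flushing only the destination
and assuming a cache-line-aligned destination (cldemote, where available,
could be used instead of clflushopt):

#include <immintrin.h> /* _mm_clflushopt(); requires CLFLUSHOPT */
#include <stddef.h>
#include <string.h>

#define CACHE_LINE 64

static void
copy_flush_per_line(void *dst, const void *src, size_t len)
{
        char *d = dst;
        const char *s = src;

        while (len > 0) {
                size_t n = len < CACHE_LINE ? len : CACHE_LINE;

                memcpy(d, s, n);
                /*
                 * Evict the just-written destination line immediately,
                 * before it has had a chance to push out anything useful.
                 */
                _mm_clflushopt(d);

                d += n;
                s += n;
                len -= n;
        }
}

The source lines could be treated the same way, if the application isn't
going to touch them again either.
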
> Our application also doesn't look deep inside the original packets after copying them, so there is also no need to pollute the cache with the originals.
>
See above.
> And even though the application looked partially into the packets before copying them (and thus they are partially in cache), using NT load (instead of normal load) has no additional cost.
>