From: "Mattias Rönnblom" <hofors@lysator.liu.se>
To: "Morten Brørup" <mb@smartsharesystems.com>,
"Thomas Monjalon" <thomas@monjalon.net>
Cc: bruce.richardson@intel.com, konstantin.v.ananyev@yandex.ru,
Honnappa.Nagarahalli@arm.com, stephen@networkplumber.org,
dev@dpdk.org, mattias.ronnblom@ericsson.com, kda@semihalf.com,
drc@linux.vnet.ibm.com, andrew.rybchenko@oktetlabs.ru,
olivier.matz@6wind.com, anatoly.burakov@intel.com,
dmitry.kozliuk@gmail.com
Subject: Re: [PATCH v4] eal: non-temporal memcpy
Date: Fri, 4 Aug 2023 07:49:07 +0200
Message-ID: <eaa8815c-e8d7-2027-0182-826b2392bffd@lysator.liu.se>
In-Reply-To: <98CBD80474FA8B44BF855DF32C47DC35D87AAE@smartserver.smartshare.dk>
On 2023-07-31 14:25, Morten Brørup wrote:
>> From: Thomas Monjalon [mailto:thomas@monjalon.net]
>> Sent: Monday, 31 July 2023 14.14
>>
>> Hello,
>>
>> What's the status of this feature?
>
> I haven't given up on upstreaming this feature, but there doesn't seem to be much demand for it, so working on it has low priority.
>
This would definitely be a useful addition to the EAL, IMO.
It's also a case where it's difficult to provide a generic and portable
solution with both good performance and reasonable semantics. The upside
is that you seem to have come pretty far already.
>>
>>
>> 10/10/2022 08:46, Morten Brørup:
>>> This patch provides a function for memory copy using non-temporal store,
>>> load or both, controlled by flags passed to the function.
>>>
>>> Applications sometimes copy data to another memory location, which is only
>>> used much later.
>>> In this case, it is inefficient to pollute the data cache with the copied
>>> data.
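A minimal sketch of a flag-controlled copy of this kind, assuming x86 with SSE2 (GCC or clang). The names hypo_memcpy_nt(), HYPO_NT_SRC and HYPO_NT_DST are hypothetical and are not the API from the patch; the sketch only honors the destination hint, and only when the destination is 16-byte aligned, falling back to memcpy() otherwise.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>
#if defined(__SSE2__)
#include <emmintrin.h> /* _mm_loadu_si128, _mm_stream_si128, _mm_sfence */
#endif

/* Hypothetical flag names, for illustration only. */
enum {
	HYPO_NT_SRC = 1u << 0, /* hint: avoid caching the source (ignored here) */
	HYPO_NT_DST = 1u << 1, /* hint: avoid caching the destination */
};

/* Copy n bytes from src to dst. The flags are hints: streaming stores are
 * used only when HYPO_NT_DST is set and dst is 16-byte aligned. */
static void
hypo_memcpy_nt(void *dst, const void *src, size_t n, unsigned int flags)
{
#if defined(__SSE2__)
	if ((flags & HYPO_NT_DST) && ((uintptr_t)dst & 15) == 0) {
		__m128i *d = (__m128i *)dst;
		const uint8_t *s = (const uint8_t *)src;
		size_t i;

		for (i = 0; i + 16 <= n; i += 16) {
			/* Regular unaligned load, non-temporal aligned store. */
			__m128i v = _mm_loadu_si128((const __m128i *)(s + i));
			_mm_stream_si128(d + i / 16, v);
		}
		/* Streaming stores are weakly ordered; fence before later stores. */
		_mm_sfence();
		/* Copy the remaining 0..15 trailing bytes with regular stores. */
		memcpy((uint8_t *)dst + i, s + i, n - i);
		return;
	}
#endif
	(void)flags;
	memcpy(dst, src, n);
}
```

Treating the flags as hints keeps the semantics identical to memcpy() on every target; only the cache behavior differs.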
>>>
>>> An example use case (originating from a real life application):
>>> Copying filtered packets, or the first part of them, into a capture buffer
>>> for offline analysis.
>>>
>>> The purpose of the function is to achieve a performance gain by not
>>> polluting the cache when copying data.
>>> Although the throughput can be improved by further optimization, I do not
>>> have time to do it now.
>>>
>>> The functional tests and performance tests for memory copy have been
>>> expanded to include non-temporal copying.
>>>
>>> A non-temporal version of the mbuf library's function to create a full
>>> copy of a given packet mbuf is provided.
>>>
>>> The packet capture and packet dump libraries have been updated to use
>>> non-temporal memory copy of the packets.
>>>
>>> Implementation notes:
>>>
>>> Implementations for non-x86 architectures can be provided by anyone at a
>>> later time. I am not going to do it.
>>>
>>> x86 non-temporal load instructions must be 16-byte aligned [1], and
>>> non-temporal store instructions must be 4-, 8- or 16-byte aligned [2].
>>>
>>> ARM non-temporal load and store instructions seem to require 4-byte
>>> alignment [3].
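These alignment rules mean a general-purpose non-temporal copy usually has to split the destination range into an unaligned head, an aligned body that can use non-temporal stores, and a short tail. A minimal, portable sketch of that split; struct hypo_nt_split and hypo_nt_split_copy() are hypothetical names, and align is assumed to be a power of two.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical split of an n-byte copy to dst into three parts. */
struct hypo_nt_split {
	size_t head; /* bytes before the first aligned address (regular copy) */
	size_t body; /* whole aligned blocks, a multiple of align (NT copy) */
	size_t tail; /* bytes left after the last whole block (regular copy) */
};

/* align must be a power of two, e.g. 16 for x86 SSE streaming stores. */
static struct hypo_nt_split
hypo_nt_split_copy(const void *dst, size_t n, size_t align)
{
	struct hypo_nt_split s;
	size_t misalign = (size_t)((uintptr_t)dst & (align - 1));

	/* Bytes needed to reach the next aligned address, capped at n. */
	s.head = misalign ? align - misalign : 0;
	if (s.head > n)
		s.head = n;
	/* Round the remainder down to a whole number of aligned blocks. */
	s.body = (n - s.head) & ~(align - 1);
	s.tail = n - s.head - s.body;
	return s;
}
```

With align = 16 the body matches the x86 streaming-store requirement; with align = 4 the same helper would fit the 4-byte ARM requirement.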
>>>
>>> [1] https://www.intel.com/content/www/us/en/docs/intrinsics-guide/index.html#text=_mm_stream_load
>>> [2] https://www.intel.com/content/www/us/en/docs/intrinsics-guide/index.html#text=_mm_stream_si
>>> [3] https://developer.arm.com/documentation/100076/0100/A64-Instruction-Set-Reference/A64-Floating-point-Instructions/LDNP--SIMD-and-FP-
>>>
>>> This patch is a major rewrite from the RFC v3, so no version log comparing
>>> to the RFC is provided.
>>>
>>> v4
>>> * Also ignore the warning for clang in the workaround for
>>> _mm_stream_load_si128() missing const in the parameter.
>>> * Add missing C linkage specifier in rte_memcpy.h.
>>>
>>> v3
>>> * _mm_stream_si64() is not supported on 32-bit x86 architecture, so only
>>> use it on 64-bit x86 architecture.
>>> * CLANG warns that _mm_stream_load_si128_const() and
>>> rte_memcpy_nt_15_or_less_s16a() are not public,
>>> so remove __rte_internal from them. It also affects the documentation
>>> for the functions, so the fix can't be limited to CLANG.
>>> * Use __rte_experimental instead of __rte_internal.
>>> * Replace <n> with nnn in function documentation, so it doesn't look
>>> like an HTML tag.
>>> * Slightly modify the workaround for _mm_stream_load_si128() missing const
>>> in the parameter; the ancient GCC 4.8.5 in RHEL7 doesn't understand
>>> #pragma GCC diagnostic ignored "-Wdiscarded-qualifiers", so use
>>> #pragma GCC diagnostic ignored "-Wcast-qual" instead. I hope that works.
>>> * Fixed one coding style issue missed in v2.
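Two of the v3 items above are portability guards. A minimal sketch of both, assuming GCC or clang on x86: a const-taking wrapper around _mm_stream_load_si128() (SSE4.1) that silences -Wcast-qual locally, and an 8-byte streaming store that uses _mm_stream_si64() only on 64-bit x86. All hypo_* names are hypothetical; on other targets, or when SSE4.1 is not enabled at compile time, the sketch falls back to plain copies.

```c
#include <string.h>
#if defined(__SSE4_1__)
#include <smmintrin.h> /* _mm_stream_load_si128 (SSE4.1) */
#endif
#if defined(__x86_64__) && defined(__SSE2__)
#include <emmintrin.h> /* _mm_stream_si64 (64-bit x86 only) */
#endif

#if defined(__SSE4_1__)
/* Const-correct wrapper: some compiler headers declare the parameter as a
 * non-const __m128i *, so the cast below would trigger -Wcast-qual.
 * Clang also honors "#pragma GCC diagnostic". */
static inline __m128i
hypo_stream_load_si128_const(const void *p)
{
#pragma GCC diagnostic push
#pragma GCC diagnostic ignored "-Wcast-qual"
	return _mm_stream_load_si128((__m128i *)p);
#pragma GCC diagnostic pop
}
#endif

/* Copy one 16-byte block; src must be 16-byte aligned for the NT load. */
static void
hypo_copy16_nt_load(void *dst, const void *src)
{
#if defined(__SSE4_1__)
	_mm_storeu_si128((__m128i *)dst, hypo_stream_load_si128_const(src));
#else
	memcpy(dst, src, 16);
#endif
}

/* Store one 8-byte value with a streaming store where available.
 * In a real copy loop, a single fence after the loop would suffice. */
static void
hypo_store8_nt(long long *dst, long long v)
{
#if defined(__x86_64__) && defined(__SSE2__)
	_mm_stream_si64(dst, v);
	_mm_sfence();
#else
	*dst = v;
#endif
}
```

Keeping the pragmas inside the wrapper confines the suppressed warning to the single cast that needs it.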
>>>
>>> v2
>>> * The last 16-byte block of data, incl. any trailing bytes, was not
>>> copied from the source memory area in rte_memcpy_nt_buf().
>>> * Fix many coding style issues.
>>> * Add some missing header files.
>>> * Fix build time warning for non-x86 architectures by using a different
>>> method to mark the flags parameter unused.
>>> * CLANG doesn't understand RTE_BUILD_BUG_ON(!__builtin_constant_p(flags)),
>>> so omit it when using CLANG.
>>>
>>> Signed-off-by: Morten Brørup <mb@smartsharesystems.com>
>>> ---
>>> app/test/test_memcpy.c | 65 +-
>>> app/test/test_memcpy_perf.c | 187 ++--
>>> lib/eal/include/generic/rte_memcpy.h | 127 +++
>>> lib/eal/x86/include/rte_memcpy.h | 1238 ++++++++++++++++++++++++++
>>> lib/mbuf/rte_mbuf.c | 77 ++
>>> lib/mbuf/rte_mbuf.h | 32 +
>>> lib/mbuf/version.map | 1 +
>>> lib/pcapng/rte_pcapng.c | 3 +-
>>> lib/pdump/rte_pdump.c | 6 +-
>>> 9 files changed, 1645 insertions(+), 91 deletions(-)
>>
>>
>>
>