DPDK patches and discussions
From: "Morten Brørup" <mb@smartsharesystems.com>
To: "Thomas Monjalon" <thomas@monjalon.net>
Cc: <hofors@lysator.liu.se>, <bruce.richardson@intel.com>,
	<konstantin.v.ananyev@yandex.ru>, <Honnappa.Nagarahalli@arm.com>,
	<stephen@networkplumber.org>, <dev@dpdk.org>,
	<mattias.ronnblom@ericsson.com>, <kda@semihalf.com>,
	<drc@linux.vnet.ibm.com>, <dev@dpdk.org>,
	<andrew.rybchenko@oktetlabs.ru>, <olivier.matz@6wind.com>,
	<anatoly.burakov@intel.com>, <dmitry.kozliuk@gmail.com>
Subject: RE: [PATCH v4] eal: non-temporal memcpy
Date: Mon, 31 Jul 2023 14:25:30 +0200
Message-ID: <98CBD80474FA8B44BF855DF32C47DC35D87AAE@smartserver.smartshare.dk>
In-Reply-To: <5204082.6fTUFtlzNn@thomas>

> From: Thomas Monjalon [mailto:thomas@monjalon.net]
> Sent: Monday, 31 July 2023 14.14
> 
> Hello,
> 
> What's the status of this feature?

I haven't given up on upstreaming this feature, but there doesn't seem to be much demand for it, so working on it has low priority.

> 
> 
> 10/10/2022 08:46, Morten Brørup:
> > This patch provides a function for memory copy using non-temporal store,
> > load or both, controlled by flags passed to the function.
> >
> > Applications sometimes copy data to another memory location, which is only
> > used much later.
> > In this case, it is inefficient to pollute the data cache with the copied
> > data.
> >
> > An example use case (originating from a real life application):
> > Copying filtered packets, or the first part of them, into a capture buffer
> > for offline analysis.
> >
> > The purpose of the function is to achieve a performance gain by not
> > polluting the cache when copying data.
> > Although the throughput can be improved by further optimization, I do not
> > have time to do it now.
> >
> > The functional tests and performance tests for memory copy have been
> > expanded to include non-temporal copying.
> >
> > A non-temporal version of the mbuf library's function to create a full
> > copy of a given packet mbuf is provided.
> >
> > The packet capture and packet dump libraries have been updated to use
> > non-temporal memory copy of the packets.
> >
> > Implementation notes:
> >
> > Implementations for non-x86 architectures can be provided by anyone at a
> > later time. I am not going to do it.
> >
> > x86 non-temporal load instructions must be 16 byte aligned [1], and
> > non-temporal store instructions must be 4, 8 or 16 byte aligned [2].
> >
> > ARM non-temporal load and store instructions seem to require 4 byte
> > alignment [3].
> >
> > [1] https://www.intel.com/content/www/us/en/docs/intrinsics-guide/
> > index.html#text=_mm_stream_load
> > [2] https://www.intel.com/content/www/us/en/docs/intrinsics-guide/
> > index.html#text=_mm_stream_si
> > [3] https://developer.arm.com/documentation/100076/0100/
> > A64-Instruction-Set-Reference/A64-Floating-point-Instructions/
> > LDNP--SIMD-and-FP-
> >
> > This patch is a major rewrite from the RFC v3, so no version log comparing
> > to the RFC is provided.
> >
> > v4
> > * Also ignore the warning for clang in the workaround for
> >   _mm_stream_load_si128() missing const in the parameter.
> > * Add missing C linkage specifier in rte_memcpy.h.
> >
> > v3
> > * _mm_stream_si64() is not supported on 32-bit x86 architecture, so only
> >   use it on 64-bit x86 architecture.
> > * CLANG warns that _mm_stream_load_si128_const() and
> >   rte_memcpy_nt_15_or_less_s16a() are not public,
> >   so remove __rte_internal from them. It also affects the documentation
> >   for the functions, so the fix can't be limited to CLANG.
> > * Use __rte_experimental instead of __rte_internal.
> > * Replace <n> with nnn in function documentation; it doesn't look like
> >   HTML.
> > * Slightly modify the workaround for _mm_stream_load_si128() missing const
> >   in the parameter; the ancient GCC 4.8.5 in RHEL7 doesn't understand
> >   #pragma GCC diagnostic ignored "-Wdiscarded-qualifiers", so use
> >   #pragma GCC diagnostic ignored "-Wcast-qual" instead. I hope that works.
> > * Fixed one coding style issue missed in v2.
> >
> > v2
> > * The last 16 byte block of data, incl. any trailing bytes, was not
> >   copied from the source memory area in rte_memcpy_nt_buf().
> > * Fix many coding style issues.
> > * Add some missing header files.
> > * Fix build time warning for non-x86 architectures by using a different
> >   method to mark the flags parameter unused.
> > * CLANG doesn't understand RTE_BUILD_BUG_ON(!__builtin_constant_p(flags)),
> >   so omit it when using CLANG.
> >
> > Signed-off-by: Morten Brørup <mb@smartsharesystems.com>
> > ---
> >  app/test/test_memcpy.c               |   65 +-
> >  app/test/test_memcpy_perf.c          |  187 ++--
> >  lib/eal/include/generic/rte_memcpy.h |  127 +++
> >  lib/eal/x86/include/rte_memcpy.h     | 1238 ++++++++++++++++++++++++++
> >  lib/mbuf/rte_mbuf.c                  |   77 ++
> >  lib/mbuf/rte_mbuf.h                  |   32 +
> >  lib/mbuf/version.map                 |    1 +
> >  lib/pcapng/rte_pcapng.c              |    3 +-
> >  lib/pdump/rte_pdump.c                |    6 +-
> >  9 files changed, 1645 insertions(+), 91 deletions(-)
> 
> 
> 
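
For readers unfamiliar with the technique, the non-temporal store case
described in the quoted patch can be sketched roughly as below. This is a
minimal illustration, not the implementation from the patch:
copy_nt_store() is a hypothetical name, it covers only the store side, and
it assumes an x86 build with SSE2 (it falls back to plain memcpy()
elsewhere). It copies leading bytes normally until the destination is
16 byte aligned, because _mm_stream_si128() requires an aligned
destination. Non-temporal loads (_mm_stream_load_si128(), SSE4.1) would
additionally need a 16 byte aligned source and are left out here.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>
#if defined(__SSE2__)
#include <emmintrin.h>
#endif

/* Hypothetical helper for illustration; not the API added by the patch. */
static void
copy_nt_store(void *dst, const void *src, size_t len)
{
	uint8_t *d = dst;
	const uint8_t *s = src;

	assert(dst != NULL && src != NULL);

#if defined(__SSE2__)
	/* Copy leading bytes normally until dst is 16 byte aligned. */
	size_t head = (size_t)(-(uintptr_t)d & 15);

	if (head > len)
		head = len;
	memcpy(d, s, head);
	d += head;
	s += head;
	len -= head;

	/* Main loop: regular unaligned load, aligned non-temporal store. */
	for (; len >= 16; d += 16, s += 16, len -= 16) {
		__m128i v = _mm_loadu_si128((const __m128i *)s);

		_mm_stream_si128((__m128i *)d, v);
	}

	/* Order the streaming stores before any later stores. */
	_mm_sfence();
#endif

	/* Trailing bytes (or the whole buffer on non-SSE2 builds). */
	memcpy(d, s, len);
}
```

A caller would use it like memcpy(), for example
copy_nt_store(capture_buf, pkt_data, pkt_len), where the names are again
placeholders. Whether this actually outperforms a regular copy depends on
the copy size and on how soon the destination is read back.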



Thread overview: 16+ messages
2022-08-19 13:58 [RFC v3] " Morten Brørup
2022-10-06 20:34 ` [PATCH] eal: " Morten Brørup
2022-10-10  7:35   ` Morten Brørup
2022-10-10  8:58     ` Mattias Rönnblom
2022-10-10  9:36       ` Morten Brørup
2022-10-10 11:58         ` Stanislaw Kardach
2022-10-10  9:57       ` Bruce Richardson
2022-10-11  9:25     ` Konstantin Ananyev
2022-10-07 10:19 ` [PATCH v2] " Morten Brørup
2022-10-09 15:35 ` [PATCH v3] " Morten Brørup
2022-10-10  6:46 ` [PATCH v4] " Morten Brørup
2022-10-16 14:27   ` Mattias Rönnblom
2022-10-16 19:55   ` Mattias Rönnblom
2023-07-31 12:14   ` Thomas Monjalon
2023-07-31 12:25     ` Morten Brørup [this message]
2023-08-04  5:49       ` Mattias Rönnblom
