From: "Mattias Rönnblom" <hofors@lysator.liu.se>
To: "David Marchand" <david.marchand@redhat.com>,
	"Mattias Rönnblom" <mattias.ronnblom@ericsson.com>
Cc: dev@dpdk.org, "Morten Brørup" <mb@smartsharesystems.com>,
	"Stephen Hemminger" <stephen@networkplumber.org>,
	"Pavan Nikhilesh" <pbhagavatula@marvell.com>,
	"Bruce Richardson" <bruce.richardson@intel.com>
Subject: Re: [PATCH v6 5/7] eal: provide option to use compiler memcpy instead of RTE
Date: Fri, 4 Oct 2024 11:21:20 +0200
Message-ID: <8e63fd80-2ab0-4afe-94e8-ce3277338fdf@lysator.liu.se>
In-Reply-To: <CAJFAV8ywLMPsF+_nju3Qsz0vTMNuPqQ5kJo9LebnyuhMfbYhwg@mail.gmail.com>

On 2024-10-04 09:52, David Marchand wrote:
> On Fri, Sep 20, 2024 at 12:36 PM Mattias Rönnblom
> <mattias.ronnblom@ericsson.com> wrote:
>>
>> Provide build option to have functions in <rte_memcpy.h> delegate to
>> the standard compiler/libc memcpy(), instead of using the various
>> custom DPDK, handcrafted, per-architecture rte_memcpy()
>> implementations.
>>
>> A new meson build option 'use_cc_memcpy' is added. By default, the
>> traditional, custom DPDK rte_memcpy() implementation is used.
>>
>> The performance benefits of the custom DPDK rte_memcpy()
>> implementations have been diminishing with every compiler release, and
>> with current toolchains the use of a custom memcpy() implementation
>> may even be a liability.
>>
>> An additional benefit of this change is that compilers and static
>> analysis tools have an easier time detecting incorrect usage of
>> rte_memcpy() (e.g., buffer overruns, or overlapping source and
>> destination buffers).
>>
>> Signed-off-by: Mattias Rönnblom <mattias.ronnblom@ericsson.com>
>> Acked-by: Morten Brørup <mb@smartsharesystems.com>
> 
> I like this patch and the direction we are taking: stop reinventing
> memcpy and rely on the compiler to optimize it.
> 
> I have some comments on the implementation.
> 
> - When I split headers in the early days of dpdk, the intention
> with arch-specific headers in EAL was to have them include the generic
> one, in all cases.
> It seems that, over time, x86 rte_memcpy.h (at least) deviated from
> this and stopped including generic/rte_memcpy.h...
> 
> So in this current patch, I expect every arch-specific header to first
> include generic/rte_memcpy.h, regardless of any arch-specific define
> coming from the configuration.
> 
> An additional note on this, ARM32 and ARM64 have their own
> implementation in rte_memcpy_32.h resp. rte_memcpy_64.h, and I would
> check RTE_USE_CC_MEMCPY in each of them rather than in the top as
> ARM32 and ARM64 are like two different arches.
> 
> 
> - Now, looking at what was available for arches so far in DPDK:
> * ARM was relying by default on compiler implementation, with specific
> implementations for ARM32 and ARM64 available (see for more details
> below) => possible values (default first) RTE_USE_CC_MEMCPY = true /
> false
> * loongarch was relying on compiler implementation, with no specific
> implementations, => RTE_USE_CC_MEMCPY = true
> * ppc was relying on arch specific implementation, => RTE_USE_CC_MEMCPY = false
> * risc was relying on compiler implementation, with no specific
> implementations, => RTE_USE_CC_MEMCPY = true
> * x86 was relying on arch specific implementation, => RTE_USE_CC_MEMCPY = false
> 
> We can't get a unified default value for a meson option and keep
> compat for all arches (except maybe by introducing an "auto" value).
> 
> Plus, disabling RTE_USE_CC_MEMCPY on loongarch and risc makes no
> sense, as there was never a specific implementation.
> 
> My suggestion is to drop the meson option and instead just set
> RTE_USE_CC_MEMCPY in config/$arch/meson.build.
> Testers / interested users may edit config/$arch/meson.build on their own.
> 

So we've gone from...

"Eliminate DPDK custom per-arch memcpy altogether"
to
"Keep custom memcpy, but make cc memcpy the default"
to
"Keep custom memcpy as the default, but make cc memcpy a build option"
to
"Keep custom memcpy as the default, and have the user modify some 
obscure build file to use cc memcpy"

It seems like the natural next step is just

"Keep the custom memcpy. Period."

If we intend to keep the custom DPDK memcpy implementations 
indefinitely, we should just provide an option to use CC memcpy on x86 
as well, just like on ARM.

That would go against the original intention of this patch set, which 
was to reduce DPDK complexity (and hopefully improve performance as 
well, on average).
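
To make the x86 option concrete, here is a minimal sketch (not code
from the patch) of a copy wrapper that delegates to the compiler/libc
memcpy() when RTE_USE_CC_MEMCPY is defined. The names example_memcpy()
and custom_copy() are hypothetical, and the byte loop only stands in
for a handcrafted per-architecture routine.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for a handcrafted, per-architecture copy. */
static inline void *
custom_copy(void *dst, const void *src, size_t n)
{
	unsigned char *d = dst;
	const unsigned char *s = src;

	while (n-- > 0)
		*d++ = *s++;

	return dst;
}

/*
 * Hypothetical wrapper. Building with -DRTE_USE_CC_MEMCPY selects the
 * standard memcpy(); otherwise the custom routine is used.
 */
static inline void *
example_memcpy(void *dst, const void *src, size_t n)
{
#ifdef RTE_USE_CC_MEMCPY
	return memcpy(dst, src, n);
#else
	return custom_copy(dst, src, n);
#endif
}
```

Callers are unaffected by the choice: example_memcpy(buf, "dpdk", 5)
copies the same five bytes either way, and only the build-time define
decides which implementation runs.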

> 
> - Additionally, ARM people have introduced arch-specific
> implementation config options for memcpy in ARM32 resp. ARM64:
> RTE_ARCH_ARM_NEON_MEMCPY resp. RTE_ARCH_ARM64_MEMCPY.
> RTE_USE_CC_MEMCPY can replace those two options (we may keep some
> compat in case someone relied on those defines for arm).
> That removes the need for a RTE_CC_MEMCPY define.
> 
> More comments below:
> 
> [snip]
> 
>> diff --git a/doc/guides/rel_notes/release_24_11.rst b/doc/guides/rel_notes/release_24_11.rst
>> index 0ff70d9057..8be000294d 100644
>> --- a/doc/guides/rel_notes/release_24_11.rst
>> +++ b/doc/guides/rel_notes/release_24_11.rst
>> @@ -55,6 +55,26 @@ New Features
>>        Also, make sure to start the actual text at the margin.
>>        =======================================================
>>
>> +* **Compiler memcpy replaces custom DPDK implementation.**
>> +
>> +  The memory copy functions of ``<rte_memcpy.h>`` now optionally
>> +  delegates to the standard memcpy() function, implemented by the
>> +  compiler and the C runtime (e.g., libc).
>> +
>> +  In this release of DPDK, the handcrafted, per-architecture memory
>> +  copy implementations are still the default. Compiler memcpy is
>> +  enabled by setting the new ``use_cc_memcpy`` build option to true.
>> +
>> +  The performance benefits of the custom DPDK rte_memcpy()
>> +  implementations have been diminishing with every new compiler
>> +  release, and with current toolchains the use of a custom memcpy()
>> +  implementation may even result in worse performance than the
>> +  standard memcpy().
>> +
>> +  An additional benefit of using compiler memcpy is that compilers and
>> +  static analysis tools have an easier time detecting incorrect usage
>> +  of rte_memcpy() (e.g., buffer overruns, or overlapping source and
>> +  destination buffers).
> 
> As explained in the RN comments, an entry should use the form:
> 
>     * **Add a title in the past tense with a full stop.**
> 
>       Add a short 1-2 sentence description in the past tense.
>       The description should be enough to allow someone scanning
>       the release notes to understand the new feature.
> 
> It seems this note is a copy/paste of the commit log, please adjust
> the title and make the description shorter.
> 
>>
>>   Removed Items
>>   -------------
> 
> [snip]
> 
>> diff --git a/lib/eal/include/generic/rte_memcpy.h b/lib/eal/include/generic/rte_memcpy.h
>> index e7f0f8eaa9..cfb0175bd2 100644
>> --- a/lib/eal/include/generic/rte_memcpy.h
>> +++ b/lib/eal/include/generic/rte_memcpy.h
>> @@ -5,12 +5,19 @@
>>   #ifndef _RTE_MEMCPY_H_
>>   #define _RTE_MEMCPY_H_
>>
>> +#ifdef __cplusplus
>> +extern "C" {
>> +#endif
>> +
>>   /**
>>    * @file
>>    *
>>    * Functions for vectorised implementation of memcpy().
>>    */
>>
>> +#include <stdint.h>
>> +#include <string.h>
> 
> I don't think those includes should go in an extern "C" { block.
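
For reference, a minimal sketch of the layout being asked for, with
system headers included before the extern "C" block is opened. The
guard name EXAMPLE_COPY_H and the helper example_mov16() are
hypothetical and not taken from <rte_memcpy.h>.

```c
#ifndef EXAMPLE_COPY_H
#define EXAMPLE_COPY_H

/* System headers come first, outside any extern "C" block. */
#include <stdint.h>
#include <string.h>

#ifdef __cplusplus
extern "C" {
#endif

/* Hypothetical helper: copy 16 bytes between non-overlapping buffers. */
static inline void
example_mov16(uint8_t *dst, const uint8_t *src)
{
	memcpy(dst, src, 16);
}

#ifdef __cplusplus
}
#endif

#endif /* EXAMPLE_COPY_H */
```

Keeping the includes outside the block means the system headers are
not pulled in under C linkage when the header is used from C++.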
> 
>> +
>>   /**
>>    * Copy 16 bytes from one location to another using optimised
>>    * instructions. The locations should not overlap.
>> @@ -35,8 +42,6 @@ rte_mov16(uint8_t *dst, const uint8_t *src);
>>   static inline void
>>   rte_mov32(uint8_t *dst, const uint8_t *src);
>>
>> -#ifdef __DOXYGEN__
>> -
> 
> This strange check was added as not all architectures provide
> rte_mov48 (/me slaps Adrien and Thomas).
> I think the CI reported no issue because of a problem in the next
> patch where all that is tested is RTE_USE_CC_MEMCPY = true
> combination.
> 
> Still, the overall goal of this work is to drop the whole rte_memcpy
> thing in the future, so I think we can live with this #ifdef
> __DOXYGEN__ nonsense hiding the absence of rte_mov48 in x86...
> 
> 
>>   /**
>>    * Copy 48 bytes from one location to another using optimised
>>    * instructions. The locations should not overlap.
>> @@ -49,8 +54,6 @@ rte_mov32(uint8_t *dst, const uint8_t *src);
>>   static inline void
>>   rte_mov48(uint8_t *dst, const uint8_t *src);
>>
>> -#endif /* __DOXYGEN__ */
>> -
>>   /**
>>    * Copy 64 bytes from one location to another using optimised
>>    * instructions. The locations should not overlap.
>> @@ -87,8 +90,6 @@ rte_mov128(uint8_t *dst, const uint8_t *src);
>>   static inline void
>>   rte_mov256(uint8_t *dst, const uint8_t *src);
>>
>> -#ifdef __DOXYGEN__
>> -
>>   /**
>>    * Copy bytes from one location to another. The locations must not overlap.
>>    *
>> @@ -111,6 +112,52 @@ rte_mov256(uint8_t *dst, const uint8_t *src);
>>   static void *
>>   rte_memcpy(void *dst, const void *src, size_t n);
>>
>> -#endif /* __DOXYGEN__ */
> 
> Removing this DOXYGEN here should be ok.
> CI will tell us.
> 
> 
>> diff --git a/lib/eal/x86/include/meson.build b/lib/eal/x86/include/meson.build
>> index 52d2f8e969..09c2fe2485 100644
>> --- a/lib/eal/x86/include/meson.build
>> +++ b/lib/eal/x86/include/meson.build
>> @@ -16,6 +16,7 @@ arch_headers = files(
>>           'rte_spinlock.h',
>>           'rte_vect.h',
>>   )
>> +
> 
> Unrelated change.
> 
> 
>>   arch_indirect_headers = files(
>>           'rte_atomic_32.h',
>>           'rte_atomic_64.h',
> 
> 

