From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mails.dpdk.org (mails.dpdk.org [217.70.189.124]) by inbox.dpdk.org (Postfix) with ESMTP id 85E8E45AAC; Fri, 4 Oct 2024 11:27:28 +0200 (CEST) Received: from mails.dpdk.org (localhost [127.0.0.1]) by mails.dpdk.org (Postfix) with ESMTP id 1A42F4027F; Fri, 4 Oct 2024 11:27:28 +0200 (CEST) Received: from mail.lysator.liu.se (mail.lysator.liu.se [130.236.254.3]) by mails.dpdk.org (Postfix) with ESMTP id 12EA040268 for ; Fri, 4 Oct 2024 11:27:26 +0200 (CEST) Received: from mail.lysator.liu.se (localhost [127.0.0.1]) by mail.lysator.liu.se (Postfix) with ESMTP id CC9CE4541 for ; Fri, 4 Oct 2024 11:27:25 +0200 (CEST) Received: by mail.lysator.liu.se (Postfix, from userid 1004) id C0D7B44DC; Fri, 4 Oct 2024 11:27:25 +0200 (CEST) X-Spam-Checker-Version: SpamAssassin 4.0.0 (2022-12-13) on hermod.lysator.liu.se X-Spam-Level: X-Spam-Status: No, score=-1.2 required=5.0 tests=ALL_TRUSTED,AWL, T_SCC_BODY_TEXT_LINE autolearn=disabled version=4.0.0 X-Spam-Score: -1.2 Received: from [192.168.1.86] (h-62-63-215-114.A163.priv.bahnhof.se [62.63.215.114]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature RSA-PSS (2048 bits) server-digest SHA256) (No client certificate requested) by mail.lysator.liu.se (Postfix) with ESMTPSA id 0131845A6; Fri, 4 Oct 2024 11:27:23 +0200 (CEST) Message-ID: <7d165396-a5dd-4aab-bc24-a6bfc610c291@lysator.liu.se> Date: Fri, 4 Oct 2024 11:27:23 +0200 MIME-Version: 1.0 User-Agent: Mozilla Thunderbird Subject: Re: [PATCH v6 5/7] eal: provide option to use compiler memcpy instead of RTE To: David Marchand , =?UTF-8?Q?Mattias_R=C3=B6nnblom?= Cc: dev@dpdk.org, =?UTF-8?Q?Morten_Br=C3=B8rup?= , Stephen Hemminger , Pavan Nikhilesh , Bruce Richardson References: <20240724075357.546248-2-mattias.ronnblom@ericsson.com> <20240920102716.738940-1-mattias.ronnblom@ericsson.com> <20240920102716.738940-6-mattias.ronnblom@ericsson.com> Content-Language: en-US From: =?UTF-8?Q?Mattias_R=C3=B6nnblom?= In-Reply-To: Content-Type: text/plain; charset=UTF-8; format=flowed Content-Transfer-Encoding: 8bit X-Virus-Scanned: ClamAV using ClamSMTP X-BeenThere: dev@dpdk.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: DPDK patches and discussions List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: dev-bounces@dpdk.org On 2024-10-04 09:52, David Marchand wrote: > On Fri, Sep 20, 2024 at 12:36 PM Mattias Rönnblom > wrote: >> >> Provide build option to have functions in delegate to >> the standard compiler/libc memcpy(), instead of using the various >> custom DPDK, handcrafted, per-architecture rte_memcpy() >> implementations. >> >> A new meson build option 'use_cc_memcpy' is added. By default, the >> traditional, custom DPDK rte_memcpy() implementation is used. >> >> The performance benefits of the custom DPDK rte_memcpy() >> implementations have been diminishing with every compiler release, and >> with current toolchains the use of a custom memcpy() implementation >> may even be a liability. >> >> An additional benefit of this change is that compilers and static >> analysis tools have an easier time detecting incorrect usage of >> rte_memcpy() (e.g., buffer overruns, or overlapping source and >> destination buffers). >> >> Signed-off-by: Mattias Rönnblom >> Acked-by: Morten Brørup > > I like this patch and the direction we are taking: stop reinvent > memcpy and rely on compiler to optimize it. > > I have some comments on the implementation. > > - When I splitted headers in the early days of dpdk, the intention > with arch-specific headers in EAL was to have them include the generic > one, in all cases. > It seems that, over time, x86 rte_memcpy.h (at least) deviated from > this and stopped including generic/rte_memcpy.h... > > So in this current patch, I expect every arch specific headers first > include generic/rte_memcpy.h, regardless of any arch-specific define > coming from the configuration. > > An additional note on this, ARM32 and ARM64 have their own > implementation in rte_memcpy_32.h resp. rte_memcpy_64.h, and I would > check RTE_USE_CC_MEMCPY in each of them rather than in the top as > ARM32 and ARM64 are like two different arches. > > > - Now, looking at what was available for arches so far in DPDK: > * ARM was relying by default on compiler implementation, with specific > implementations for ARM32 and ARM64 available (see for more details > below) => possible values (default first) RTE_USE_CC_MEMCPY = true / > false > * loongarch was relying on compiler implementation, with no specific > implementations, => RTE_USE_CC_MEMCPY = true > * ppc was relying on arch specific implementation, => RTE_USE_CC_MEMCPY = false > * risc was relying on compiler implementation, with no specific > implementations, => RTE_USE_CC_MEMCPY = true > * x86 was relying on arch specific implementation, => RTE_USE_CC_MEMCPY = false > > We can't get a unified default value for a meson option and keep > compat for all arches (except maybe introduce a "auto" value). > What if you just renamed RTE_USE_CC_MEMCPY to RTE_ALWAYS_USE_CC_MEMCPY RTE_FORCE_CC_MEMCPY Then the naming would better match a scenario where cc memcpy may be the only option. > Plus, disabling RTE_USE_CC_MEMCPY on loongarch and risc makes no > sense, as there was never a specific implementation. > > My suggestion is to drop the meson option and instead just set > RTE_USE_CC_MEMCPY in config/$arch/meson.build. > Testers / interested users may edit config/$arch/meson.build on their own. > > > - Additionnally, ARM people have introduced arch-specific > implementation config options for memcpy in ARM32 resp. ARM64: > RTE_ARCH_ARM_NEON_MEMCPY resp. RTE_ARCH_ARM64_MEMCPY. > RTE_USE_CC_MEMCPY can replace those two options (we may keep some > compat in case someone relied on those defines for arm). > That removes the need for a RTE_CC_MEMCPY define. > > More comments below: > > [snip] > >> diff --git a/doc/guides/rel_notes/release_24_11.rst b/doc/guides/rel_notes/release_24_11.rst >> index 0ff70d9057..8be000294d 100644 >> --- a/doc/guides/rel_notes/release_24_11.rst >> +++ b/doc/guides/rel_notes/release_24_11.rst >> @@ -55,6 +55,26 @@ New Features >> Also, make sure to start the actual text at the margin. >> ======================================================= >> >> +* **Compiler memcpy replaces custom DPDK implementation.** >> + >> + The memory copy functions of ```` now optionally >> + delegates to the standard memcpy() function, implemented by the >> + compiler and the C runtime (e.g., libc). >> + >> + In this release of DPDK, the handcrafted, per-architecture memory >> + copy implementations are still the default. Compiler memcpy is >> + enabled by setting the new ``use_cc_memcpy`` build option to true. >> + >> + The performance benefits of the custom DPDK rte_memcpy() >> + implementations have been diminishing with every new compiler >> + release, and with current toolchains the use of a custom memcpy() >> + implementation may even result in worse performance than the >> + standard memcpy(). >> + >> + An additional benefit of using compiler memcpy is that compilers and >> + static analysis tools have an easier time detecting incorrect usage >> + of rte_memcpy() (e.g., buffer overruns, or overlapping source and >> + destination buffers). > > As explained in the RN comments, an entry should use the form: > > * **Add a title in the past tense with a full stop.** > > Add a short 1-2 sentence description in the past tense. > The description should be enough to allow someone scanning > the release notes to understand the new feature. > > It seems this note is a copy/paste of the commit log, please adjust > the title and make the description shorter. > >> >> Removed Items >> ------------- > > [snip] > >> diff --git a/lib/eal/include/generic/rte_memcpy.h b/lib/eal/include/generic/rte_memcpy.h >> index e7f0f8eaa9..cfb0175bd2 100644 >> --- a/lib/eal/include/generic/rte_memcpy.h >> +++ b/lib/eal/include/generic/rte_memcpy.h >> @@ -5,12 +5,19 @@ >> #ifndef _RTE_MEMCPY_H_ >> #define _RTE_MEMCPY_H_ >> >> +#ifdef __cplusplus >> +extern "C" { >> +#endif >> + >> /** >> * @file >> * >> * Functions for vectorised implementation of memcpy(). >> */ >> >> +#include >> +#include > > I don't think those includes should go in a extern "C" { block. > >> + >> /** >> * Copy 16 bytes from one location to another using optimised >> * instructions. The locations should not overlap. >> @@ -35,8 +42,6 @@ rte_mov16(uint8_t *dst, const uint8_t *src); >> static inline void >> rte_mov32(uint8_t *dst, const uint8_t *src); >> >> -#ifdef __DOXYGEN__ >> - > > This strange check was added as not all architectures provide > rte_mov48 (/me slaps Adrien and Thomas). > I think the CI reported no issue because of a problem in the next > patch where all that is tested is RTE_USE_CC_MEMCPY = true > combination. > > Still, the overall goal of this work is to drop the whole rte_memcpy > thing in the future, so I think we can live with this #ifdef > __DOXYGEN__ non sense hiding the absence of rte_mov48 in x86... > > >> /** >> * Copy 48 bytes from one location to another using optimised >> * instructions. The locations should not overlap. >> @@ -49,8 +54,6 @@ rte_mov32(uint8_t *dst, const uint8_t *src); >> static inline void >> rte_mov48(uint8_t *dst, const uint8_t *src); >> >> -#endif /* __DOXYGEN__ */ >> - >> /** >> * Copy 64 bytes from one location to another using optimised >> * instructions. The locations should not overlap. >> @@ -87,8 +90,6 @@ rte_mov128(uint8_t *dst, const uint8_t *src); >> static inline void >> rte_mov256(uint8_t *dst, const uint8_t *src); >> >> -#ifdef __DOXYGEN__ >> - >> /** >> * Copy bytes from one location to another. The locations must not overlap. >> * >> @@ -111,6 +112,52 @@ rte_mov256(uint8_t *dst, const uint8_t *src); >> static void * >> rte_memcpy(void *dst, const void *src, size_t n); >> >> -#endif /* __DOXYGEN__ */ > > Removing this DOXYGEN here should be ok. > CI will tell us. > > >> diff --git a/lib/eal/x86/include/meson.build b/lib/eal/x86/include/meson.build >> index 52d2f8e969..09c2fe2485 100644 >> --- a/lib/eal/x86/include/meson.build >> +++ b/lib/eal/x86/include/meson.build >> @@ -16,6 +16,7 @@ arch_headers = files( >> 'rte_spinlock.h', >> 'rte_vect.h', >> ) >> + > > Unrelated change. > > >> arch_indirect_headers = files( >> 'rte_atomic_32.h', >> 'rte_atomic_64.h', > >