From: David Marchand
Date: Fri, 4 Oct 2024 09:52:49 +0200
Subject: Re: [PATCH v6 5/7] eal: provide option to use compiler memcpy instead of RTE
To: Mattias Rönnblom
Cc: dev@dpdk.org, Mattias Rönnblom, Morten Brørup, Stephen Hemminger, Pavan Nikhilesh, Bruce Richardson
In-Reply-To: <20240920102716.738940-6-mattias.ronnblom@ericsson.com>
References: <20240724075357.546248-2-mattias.ronnblom@ericsson.com> <20240920102716.738940-1-mattias.ronnblom@ericsson.com> <20240920102716.738940-6-mattias.ronnblom@ericsson.com>
List-Id: DPDK patches and discussions

On Fri, Sep 20, 2024 at 12:36 PM Mattias Rönnblom wrote:
>
> Provide build option to have functions in <rte_memcpy.h> delegate to
> the standard compiler/libc memcpy(), instead of using the various
> custom DPDK, handcrafted, per-architecture rte_memcpy()
> implementations.
>
> A new meson build option 'use_cc_memcpy' is added. By default, the
> traditional, custom DPDK rte_memcpy() implementation is used.
>
> The performance benefits of the custom DPDK rte_memcpy()
> implementations have been diminishing with every compiler release, and
> with current toolchains the use of a custom memcpy() implementation
> may even be a liability.
>
> An additional benefit of this change is that compilers and static
> analysis tools have an easier time detecting incorrect usage of
> rte_memcpy() (e.g., buffer overruns, or overlapping source and
> destination buffers).
>
> Signed-off-by: Mattias Rönnblom
> Acked-by: Morten Brørup

I like this patch and the direction we are taking: stop reinventing
memcpy and rely on the compiler to optimize it.

I have some comments on the implementation.

- When I split the headers in the early days of DPDK, the intention with
  arch-specific headers in EAL was to have them include the generic
  one, in all cases. It seems that, over time, x86 rte_memcpy.h (at least)
  deviated from this and stopped including generic/rte_memcpy.h...

  So in this current patch, I expect every arch-specific header to first
  include generic/rte_memcpy.h, regardless of any arch-specific define
  coming from the configuration.

  An additional note on this: ARM32 and ARM64 have their own
  implementations in rte_memcpy_32.h and rte_memcpy_64.h respectively,
  and I would check RTE_USE_CC_MEMCPY in each of them rather than at the
  top level, as ARM32 and ARM64 are like two different arches.
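To make the layering in the first comment concrete, here is a minimal sketch, assuming a single C file stands in for both generic/rte_memcpy.h and one arch-specific header. The function names `example_memcpy` and `example_memcpy_selftest` are hypothetical and not DPDK API; RTE_USE_CC_MEMCPY is the define discussed in this thread, set by hand here instead of by the build configuration. The byte loop is only a placeholder for a handcrafted implementation.

```c
/* Sketch only: one file standing in for the generic header plus one
 * arch-specific header. All example_* names are hypothetical. */
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Normally provided by the build configuration. Remove this line to
 * select the placeholder "arch-specific" branch below. */
#define RTE_USE_CC_MEMCPY

/* "Generic" part: always seen first, declares the common API. */
static inline void *
example_memcpy(void *dst, const void *src, size_t n);

/* "Arch" part: chooses the definition only after the generic
 * declaration has been seen, whatever the configuration says. */
#ifdef RTE_USE_CC_MEMCPY
static inline void *
example_memcpy(void *dst, const void *src, size_t n)
{
	/* Delegate to the compiler/libc implementation. */
	return memcpy(dst, src, n);
}
#else
static inline void *
example_memcpy(void *dst, const void *src, size_t n)
{
	/* Placeholder for a handcrafted copy: plain byte loop. */
	uint8_t *d = dst;
	const uint8_t *s = src;

	while (n--)
		*d++ = *s++;
	return dst;
}
#endif

/* Returns 1 when a small copy round-trips correctly. */
static inline int
example_memcpy_selftest(void)
{
	const char src[] = "dpdk";
	char dst[sizeof(src)] = {0};
	void *ret = example_memcpy(dst, src, sizeof(src));

	assert(ret == dst);
	return memcmp(dst, src, sizeof(src)) == 0;
}
```

Both branches satisfy the same declaration, so callers see one API and the configuration only decides which definition follows it.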
- Now, looking at what was available for arches so far in DPDK:
  * ARM was relying by default on the compiler implementation, with
    specific implementations for ARM32 and ARM64 available (see below
    for more details)
    => possible values (default first) RTE_USE_CC_MEMCPY = true / false
  * loongarch was relying on the compiler implementation, with no
    specific implementations,
    => RTE_USE_CC_MEMCPY = true
  * ppc was relying on an arch-specific implementation,
    => RTE_USE_CC_MEMCPY = false
  * risc was relying on the compiler implementation, with no specific
    implementations,
    => RTE_USE_CC_MEMCPY = true
  * x86 was relying on an arch-specific implementation,
    => RTE_USE_CC_MEMCPY = false

  We can't get a unified default value for a meson option and keep
  compat for all arches (except maybe by introducing an "auto" value).
  Plus, disabling RTE_USE_CC_MEMCPY on loongarch and risc makes no
  sense, as there was never a specific implementation.
  My suggestion is to drop the meson option and instead just set
  RTE_USE_CC_MEMCPY in config/$arch/meson.build.
  Testers / interested users may edit config/$arch/meson.build on their own.

- Additionally, ARM people have introduced arch-specific implementation
  config options for memcpy in ARM32 and ARM64:
  RTE_ARCH_ARM_NEON_MEMCPY and RTE_ARCH_ARM64_MEMCPY respectively.
  RTE_USE_CC_MEMCPY can replace those two options (we may keep some
  compat in case someone relied on those defines for arm).
  That removes the need for a RTE_CC_MEMCPY define.

More comments below:

[snip]

> diff --git a/doc/guides/rel_notes/release_24_11.rst b/doc/guides/rel_notes/release_24_11.rst
> index 0ff70d9057..8be000294d 100644
> --- a/doc/guides/rel_notes/release_24_11.rst
> +++ b/doc/guides/rel_notes/release_24_11.rst
> @@ -55,6 +55,26 @@ New Features
>       Also, make sure to start the actual text at the margin.
>       =======================================================
>
> +* **Compiler memcpy replaces custom DPDK implementation.**
> +
> +  The memory copy functions of ``<rte_memcpy.h>`` now optionally
> +  delegates to the standard memcpy() function, implemented by the
> +  compiler and the C runtime (e.g., libc).
> +
> +  In this release of DPDK, the handcrafted, per-architecture memory
> +  copy implementations are still the default. Compiler memcpy is
> +  enabled by setting the new ``use_cc_memcpy`` build option to true.
> +
> +  The performance benefits of the custom DPDK rte_memcpy()
> +  implementations have been diminishing with every new compiler
> +  release, and with current toolchains the use of a custom memcpy()
> +  implementation may even result in worse performance than the
> +  standard memcpy().
> +
> +  An additional benefit of using compiler memcpy is that compilers and
> +  static analysis tools have an easier time detecting incorrect usage
> +  of rte_memcpy() (e.g., buffer overruns, or overlapping source and
> +  destination buffers).

As explained in the RN comments, an entry should use the form:

* **Add a title in the past tense with a full stop.**

  Add a short 1-2 sentence description in the past tense.
  The description should be enough to allow someone scanning
  the release notes to understand the new feature.

It seems this note is a copy/paste of the commit log. Please adjust
the title and make the description shorter.
>
>  Removed Items
>  -------------

[snip]

> diff --git a/lib/eal/include/generic/rte_memcpy.h b/lib/eal/include/generic/rte_memcpy.h
> index e7f0f8eaa9..cfb0175bd2 100644
> --- a/lib/eal/include/generic/rte_memcpy.h
> +++ b/lib/eal/include/generic/rte_memcpy.h
> @@ -5,12 +5,19 @@
>  #ifndef _RTE_MEMCPY_H_
>  #define _RTE_MEMCPY_H_
>
> +#ifdef __cplusplus
> +extern "C" {
> +#endif
> +
>  /**
>   * @file
>   *
>   * Functions for vectorised implementation of memcpy().
>   */
>
> +#include
> +#include

I don't think those includes should go in an extern "C" { block.

> +
>  /**
>   * Copy 16 bytes from one location to another using optimised
>   * instructions. The locations should not overlap.
> @@ -35,8 +42,6 @@ rte_mov16(uint8_t *dst, const uint8_t *src);
>  static inline void
>  rte_mov32(uint8_t *dst, const uint8_t *src);
>
> -#ifdef __DOXYGEN__
> -

This strange check was added as not all architectures provide
rte_mov48 (/me slaps Adrien and Thomas).
I think the CI reported no issue because of a problem in the next
patch, where the only combination tested is RTE_USE_CC_MEMCPY = true.

Still, the overall goal of this work is to drop the whole rte_memcpy
thing in the future, so I think we can live with this #ifdef __DOXYGEN__
nonsense hiding the absence of rte_mov48 in x86...

>  /**
>   * Copy 48 bytes from one location to another using optimised
>   * instructions. The locations should not overlap.
> @@ -49,8 +54,6 @@ rte_mov32(uint8_t *dst, const uint8_t *src);
>  static inline void
>  rte_mov48(uint8_t *dst, const uint8_t *src);
>
> -#endif /* __DOXYGEN__ */
> -
>  /**
>   * Copy 64 bytes from one location to another using optimised
>   * instructions. The locations should not overlap.
> @@ -87,8 +90,6 @@ rte_mov128(uint8_t *dst, const uint8_t *src);
>  static inline void
>  rte_mov256(uint8_t *dst, const uint8_t *src);
>
> -#ifdef __DOXYGEN__
> -
>  /**
>   * Copy bytes from one location to another. The locations must not overlap.
>   *
> @@ -111,6 +112,52 @@ rte_mov256(uint8_t *dst, const uint8_t *src);
>  static void *
>  rte_memcpy(void *dst, const void *src, size_t n);
>
> -#endif /* __DOXYGEN__ */

Removing this DOXYGEN check here should be OK.
CI will tell us.

> diff --git a/lib/eal/x86/include/meson.build b/lib/eal/x86/include/meson.build
> index 52d2f8e969..09c2fe2485 100644
> --- a/lib/eal/x86/include/meson.build
> +++ b/lib/eal/x86/include/meson.build
> @@ -16,6 +16,7 @@ arch_headers = files(
>          'rte_spinlock.h',
>          'rte_vect.h',
>  )
> +

Unrelated change.

>  arch_indirect_headers = files(
>          'rte_atomic_32.h',
>          'rte_atomic_64.h',

-- 
David Marchand