Subject: RE: [PATCH v4] eal: non-temporal memcpy
Date: Mon, 31 Jul 2023 14:25:30 +0200
Message-ID: <98CBD80474FA8B44BF855DF32C47DC35D87AAE@smartserver.smartshare.dk>
In-Reply-To: <5204082.6fTUFtlzNn@thomas>
References: <98CBD80474FA8B44BF855DF32C47DC35D8728A@smartserver.smartshare.dk> <20221010064600.16495-1-mb@smartsharesystems.com> <5204082.6fTUFtlzNn@thomas>
From: Morten Brørup
To: "Thomas Monjalon"
List-Id: DPDK patches and discussions

> From: Thomas Monjalon [mailto:thomas@monjalon.net]
> Sent: Monday, 31 July 2023 14.14
>
> Hello,
>
> What's the status of this feature?

I haven't given up on upstreaming this feature, but there doesn't seem to be
much demand for it, so working on it has low priority.

>
> 10/10/2022 08:46, Morten Brørup:
> > This patch provides a function for memory copy using non-temporal store,
> > load or both, controlled by flags passed to the function.
> >
> > Applications sometimes copy data to another memory location, which is only
> > used much later.
> > In this case, it is inefficient to pollute the data cache with the copied
> > data.
> >
> > An example use case (originating from a real life application):
> > Copying filtered packets, or the first part of them, into a capture buffer
> > for offline analysis.
> >
> > The purpose of the function is to achieve a performance gain by not
> > polluting the cache when copying data.
> > Although the throughput can be improved by further optimization, I do not
> > have time to do it now.
> >
> > The functional tests and performance tests for memory copy have been
> > expanded to include non-temporal copying.
> >
> > A non-temporal version of the mbuf library's function to create a full
> > copy of a given packet mbuf is provided.
> >
> > The packet capture and packet dump libraries have been updated to use
> > non-temporal memory copy of the packets.
> >
> > Implementation notes:
> >
> > Implementations for non-x86 architectures can be provided by anyone at a
> > later time. I am not going to do it.
> >
> > x86 non-temporal load instructions must be 16 byte aligned [1], and
> > non-temporal store instructions must be 4, 8 or 16 byte aligned [2].
> >
> > ARM non-temporal load and store instructions seem to require 4 byte
> > alignment [3].
> >
> > [1] https://www.intel.com/content/www/us/en/docs/intrinsics-guide/index.html#text=_mm_stream_load
> > [2] https://www.intel.com/content/www/us/en/docs/intrinsics-guide/index.html#text=_mm_stream_si
> > [3] https://developer.arm.com/documentation/100076/0100/A64-Instruction-Set-Reference/A64-Floating-point-Instructions/LDNP--SIMD-and-FP-
> >
> > This patch is a major rewrite from the RFC v3, so no version log comparing
> > to the RFC is provided.
> >
> > v4
> > * Also ignore the warning for clang in the workaround for
> >   _mm_stream_load_si128() missing const in the parameter.
> > * Add missing C linkage specifier in rte_memcpy.h.
> >
> > v3
> > * _mm_stream_si64() is not supported on 32-bit x86 architecture, so only
> >   use it on 64-bit x86 architecture.
> > * CLANG warns that _mm_stream_load_si128_const() and
> >   rte_memcpy_nt_15_or_less_s16a() are not public,
> >   so remove __rte_internal from them. It also affects the documentation
> >   for the functions, so the fix can't be limited to CLANG.
> > * Use __rte_experimental instead of __rte_internal.
> > * Replace with nnn in function documentation; it doesn't look like
> >   HTML.
> > * Slightly modify the workaround for _mm_stream_load_si128() missing const
> >   in the parameter; the ancient GCC 4.8.5 in RHEL7 doesn't understand
> >   #pragma GCC diagnostic ignored "-Wdiscarded-qualifiers", so use
> >   #pragma GCC diagnostic ignored "-Wcast-qual" instead. I hope that works.
> > * Fixed one coding style issue missed in v2.
> >
> > v2
> > * The last 16 byte block of data, incl. any trailing bytes, were not
> >   copied from the source memory area in rte_memcpy_nt_buf().
> > * Fix many coding style issues.
> > * Add some missing header files.
> > * Fix build time warning for non-x86 architectures by using a different
> >   method to mark the flags parameter unused.
> > * CLANG doesn't understand RTE_BUILD_BUG_ON(!__builtin_constant_p(flags)),
> >   so omit it when using CLANG.
> >
> > Signed-off-by: Morten Brørup
> > ---
> >  app/test/test_memcpy.c               |   65 +-
> >  app/test/test_memcpy_perf.c          |  187 ++--
> >  lib/eal/include/generic/rte_memcpy.h |  127 +++
> >  lib/eal/x86/include/rte_memcpy.h     | 1238 ++++++++++++++++++++++++++
> >  lib/mbuf/rte_mbuf.c                  |   77 ++
> >  lib/mbuf/rte_mbuf.h                  |   32 +
> >  lib/mbuf/version.map                 |    1 +
> >  lib/pcapng/rte_pcapng.c              |    3 +-
> >  lib/pdump/rte_pdump.c                |    6 +-
> >  9 files changed, 1645 insertions(+), 91 deletions(-)
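
For readers who have not used non-temporal stores before, the idea in the quoted description can be sketched roughly as below. This is only an illustration and not the code from the patch: the name copy_nt_store_sketch() is hypothetical, it handles only the case of a 16-byte aligned destination, and it uses ordinary (cached) unaligned loads with 128-bit streaming stores. On builds without SSE2 it simply falls back to memcpy().

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#if defined(__SSE2__)
#include <emmintrin.h> /* _mm_loadu_si128(), _mm_stream_si128(), _mm_sfence() */
#endif

/*
 * Hypothetical helper, NOT the function added by the patch.
 * Copies len bytes from src to dst; dst must be 16-byte aligned because
 * the 128-bit streaming store requires an aligned destination.
 */
static void
copy_nt_store_sketch(void *dst, const void *src, size_t len)
{
	unsigned char *d = (unsigned char *)dst;
	const unsigned char *s = (const unsigned char *)src;

	/* Precondition assumed by this sketch. */
	assert(((uintptr_t)d & 15) == 0);

#if defined(__SSE2__)
	while (len >= 16) {
		/* Normal load; the source may be unaligned. */
		__m128i v = _mm_loadu_si128((const __m128i *)s);
		/* Non-temporal store; bypasses the data cache. */
		_mm_stream_si128((__m128i *)d, v);
		s += 16;
		d += 16;
		len -= 16;
	}
	/* Order the streaming stores before any later stores. */
	_mm_sfence();
#endif
	/* Trailing bytes (or everything, without SSE2) use a plain copy. */
	memcpy(d, s, len);
}
```

A caller would pass a destination obtained from an allocator that guarantees 16-byte alignment, for example a capture buffer that is not read again until much later. Supporting unaligned destinations and non-temporal loads, as the patch description mentions, needs additional head and tail handling that this sketch leaves out.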