From: Thomas Monjalon
To: Morten Brørup
Cc: hofors@lysator.liu.se, bruce.richardson@intel.com, konstantin.v.ananyev@yandex.ru, Honnappa.Nagarahalli@arm.com, stephen@networkplumber.org, dev@dpdk.org, mattias.ronnblom@ericsson.com, kda@semihalf.com, drc@linux.vnet.ibm.com, andrew.rybchenko@oktetlabs.ru, olivier.matz@6wind.com, anatoly.burakov@intel.com, dmitry.kozliuk@gmail.com
Subject: Re: [PATCH v4] eal: non-temporal memcpy
Date: Mon, 31 Jul 2023 14:14:08 +0200
Message-ID: <5204082.6fTUFtlzNn@thomas>
In-Reply-To: <20221010064600.16495-1-mb@smartsharesystems.com>
References: <98CBD80474FA8B44BF855DF32C47DC35D8728A@smartserver.smartshare.dk> <20221010064600.16495-1-mb@smartsharesystems.com>
List-Id: DPDK patches and discussions

Hello,

What's the status of this feature?

10/10/2022 08:46, Morten Brørup:
> This patch provides a function for memory copy using non-temporal store,
> load or both, controlled by flags passed to the function.
>
> Applications sometimes copy data to another memory location, which is only
> used much later.
> In this case, it is inefficient to pollute the data cache with the copied
> data.
>
> An example use case (originating from a real life application):
> Copying filtered packets, or the first part of them, into a capture buffer
> for offline analysis.
>
> The purpose of the function is to achieve a performance gain by not
> polluting the cache when copying data.
> Although the throughput can be improved by further optimization, I do not
> have time to do it now.
>
> The functional tests and performance tests for memory copy have been
> expanded to include non-temporal copying.
>
> A non-temporal version of the mbuf library's function to create a full
> copy of a given packet mbuf is provided.
>
> The packet capture and packet dump libraries have been updated to use
> non-temporal memory copy of the packets.
>
> Implementation notes:
>
> Implementations for non-x86 architectures can be provided by anyone at a
> later time. I am not going to do it.
>
> x86 non-temporal load instructions must be 16 byte aligned [1], and
> non-temporal store instructions must be 4, 8 or 16 byte aligned [2].
>
> ARM non-temporal load and store instructions seem to require 4 byte
> alignment [3].
>
> [1] https://www.intel.com/content/www/us/en/docs/intrinsics-guide/index.html#text=_mm_stream_load
> [2] https://www.intel.com/content/www/us/en/docs/intrinsics-guide/index.html#text=_mm_stream_si
> [3] https://developer.arm.com/documentation/100076/0100/A64-Instruction-Set-Reference/A64-Floating-point-Instructions/LDNP--SIMD-and-FP-
>
> This patch is a major rewrite from the RFC v3, so no version log comparing
> to the RFC is provided.
>
> v4
> * Also ignore the warning for clang in the workaround for
>   _mm_stream_load_si128() missing const in the parameter.
> * Add missing C linkage specifier in rte_memcpy.h.
>
> v3
> * _mm_stream_si64() is not supported on 32-bit x86 architecture, so only
>   use it on 64-bit x86 architecture.
> * CLANG warns that _mm_stream_load_si128_const() and
>   rte_memcpy_nt_15_or_less_s16a() are not public,
>   so remove __rte_internal from them. It also affects the documentation
>   for the functions, so the fix can't be limited to CLANG.
> * Use __rte_experimental instead of __rte_internal.
> * Replace with nnn in function documentation; it doesn't look like
>   HTML.
> * Slightly modify the workaround for _mm_stream_load_si128() missing const
>   in the parameter; the ancient GCC 4.8.5 in RHEL7 doesn't understand
>   #pragma GCC diagnostic ignored "-Wdiscarded-qualifiers", so use
>   #pragma GCC diagnostic ignored "-Wcast-qual" instead. I hope that works.
> * Fixed one coding style issue missed in v2.
>
> v2
> * The last 16 byte block of data, incl. any trailing bytes, was not
>   copied from the source memory area in rte_memcpy_nt_buf().
> * Fix many coding style issues.
> * Add some missing header files.
> * Fix build time warning for non-x86 architectures by using a different
>   method to mark the flags parameter unused.
> * CLANG doesn't understand RTE_BUILD_BUG_ON(!__builtin_constant_p(flags)),
>   so omit it when using CLANG.
>
> Signed-off-by: Morten Brørup
> ---
>  app/test/test_memcpy.c               |   65 +-
>  app/test/test_memcpy_perf.c          |  187 ++--
>  lib/eal/include/generic/rte_memcpy.h |  127 +++
>  lib/eal/x86/include/rte_memcpy.h     | 1238 ++++++++++++++++++++++++++
>  lib/mbuf/rte_mbuf.c                  |   77 ++
>  lib/mbuf/rte_mbuf.h                  |   32 +
>  lib/mbuf/version.map                 |    1 +
>  lib/pcapng/rte_pcapng.c              |    3 +-
>  lib/pdump/rte_pdump.c                |    6 +-
>  9 files changed, 1645 insertions(+), 91 deletions(-)
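
For readers new to the technique discussed in the quoted description, here is a
minimal sketch of a flag-controlled copy that uses non-temporal stores. It is
not the code from the patch: the names ex_copy_nt() and EX_F_DST_NT are
hypothetical and exist only for this illustration. It assumes an x86 build with
SSE2, takes the non-temporal path only when the destination is 16 byte aligned
(as _mm_stream_si128() requires), and falls back to plain memcpy() in every
other case, including on non-x86 builds.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>
#if defined(__SSE2__)
#include <emmintrin.h>
#endif

/* Hypothetical flag for this sketch only; not the flag set defined by the patch. */
#define EX_F_DST_NT 0x1u /* request non-temporal stores to the destination */

/* Hypothetical helper: copy len bytes, bypassing the cache on stores if asked. */
static void *
ex_copy_nt(void *dst, const void *src, size_t len, unsigned int flags)
{
	assert(dst != NULL && src != NULL);
	(void)flags; /* unused when built without SSE2 */
#if defined(__SSE2__)
	/* _mm_stream_si128() needs a 16 byte aligned destination. */
	if ((flags & EX_F_DST_NT) != 0 && ((uintptr_t)dst & 15) == 0) {
		uint8_t *d = dst;
		const uint8_t *s = src;

		while (len >= 16) {
			/* Ordinary unaligned load, non-temporal 16 byte store. */
			__m128i v = _mm_loadu_si128((const __m128i *)s);
			_mm_stream_si128((__m128i *)d, v);
			d += 16;
			s += 16;
			len -= 16;
		}
		/* Order the streaming stores before any later stores. */
		_mm_sfence();
		/* Copy the trailing 0..15 bytes the ordinary way. */
		memcpy(d, s, len);
		return dst;
	}
#endif
	/* Fallback: no flag, unaligned destination, or no SSE2. */
	memcpy(dst, src, len);
	return dst;
}
```

A caller filling a capture buffer would pass EX_F_DST_NT when the copied data
will not be read again soon, and 0 otherwise. The sketch leaves out
non-temporal loads and the smaller 4 and 8 byte store variants mentioned in
the description.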