From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <dev-bounces@dpdk.org>
Received: from mails.dpdk.org (mails.dpdk.org [217.70.189.124])
	by inbox.dpdk.org (Postfix) with ESMTP id 298D142FA3;
	Mon, 31 Jul 2023 14:14:17 +0200 (CEST)
Received: from mails.dpdk.org (localhost [127.0.0.1])
	by mails.dpdk.org (Postfix) with ESMTP id 1AF9B43245;
	Mon, 31 Jul 2023 14:14:17 +0200 (CEST)
Received: from wout4-smtp.messagingengine.com (wout4-smtp.messagingengine.com
 [64.147.123.20]) by mails.dpdk.org (Postfix) with ESMTP id 416514067B
 for <dev@dpdk.org>; Mon, 31 Jul 2023 14:14:15 +0200 (CEST)
Received: from compute2.internal (compute2.nyi.internal [10.202.2.46])
 by mailout.west.internal (Postfix) with ESMTP id BF68132000E5;
 Mon, 31 Jul 2023 08:14:12 -0400 (EDT)
Received: from mailfrontend2 ([10.202.2.163])
 by compute2.internal (MEProxy); Mon, 31 Jul 2023 08:14:14 -0400
Received: by mail.messagingengine.com (Postfix) with ESMTPA; Mon,
 31 Jul 2023 08:14:09 -0400 (EDT)
From: Thomas Monjalon <thomas@monjalon.net>
To: Morten =?ISO-8859-1?Q?Br=F8rup?= <mb@smartsharesystems.com>
Cc: hofors@lysator.liu.se, bruce.richardson@intel.com,
 konstantin.v.ananyev@yandex.ru, Honnappa.Nagarahalli@arm.com,
 stephen@networkplumber.org, dev@dpdk.org, mattias.ronnblom@ericsson.com,
 kda@semihalf.com, drc@linux.vnet.ibm.com, dev@dpdk.org,
 andrew.rybchenko@oktetlabs.ru, olivier.matz@6wind.com,
 anatoly.burakov@intel.com, dmitry.kozliuk@gmail.com
Subject: Re: [PATCH v4] eal: non-temporal memcpy
Date: Mon, 31 Jul 2023 14:14:08 +0200
Message-ID: <5204082.6fTUFtlzNn@thomas>
In-Reply-To: <20221010064600.16495-1-mb@smartsharesystems.com>
References: <98CBD80474FA8B44BF855DF32C47DC35D8728A@smartserver.smartshare.dk>
 <20221010064600.16495-1-mb@smartsharesystems.com>
MIME-Version: 1.0
Content-Transfer-Encoding: quoted-printable
Content-Type: text/plain; charset="iso-8859-1"
X-BeenThere: dev@dpdk.org
X-Mailman-Version: 2.1.29
Precedence: list
List-Id: DPDK patches and discussions <dev.dpdk.org>
List-Unsubscribe: <https://mails.dpdk.org/options/dev>,
 <mailto:dev-request@dpdk.org?subject=unsubscribe>
List-Archive: <http://mails.dpdk.org/archives/dev/>
List-Post: <mailto:dev@dpdk.org>
List-Help: <mailto:dev-request@dpdk.org?subject=help>
List-Subscribe: <https://mails.dpdk.org/listinfo/dev>,
 <mailto:dev-request@dpdk.org?subject=subscribe>
Errors-To: dev-bounces@dpdk.org

Hello,

What's the status of this feature?


10/10/2022 08:46, Morten Brørup:
> This patch provides a function for memory copy using non-temporal store,
> load or both, controlled by flags passed to the function.
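A minimal sketch of what such a flags-controlled copy could look like on x86, assuming SSE2. The function name and the flag names (NT_FLAG_SRC, NT_FLAG_DST) are hypothetical and are not the patch's API. Only non-temporal stores are implemented; non-temporal loads would need the SSE4.1 _mm_stream_load_si128(), so the load flag is accepted but ignored here.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>
#if defined(__SSE2__)
#include <emmintrin.h>
#endif

/* Hypothetical flags for this sketch only. */
#define NT_FLAG_SRC 0x1u /* request non-temporal loads (ignored in this sketch) */
#define NT_FLAG_DST 0x2u /* request non-temporal stores */

static void copy_nt_sketch(void *dst, const void *src, size_t len,
		unsigned int flags)
{
#if defined(__SSE2__)
	uint8_t *d = dst;
	const uint8_t *s = src;

	/* Streaming stores need a 16-byte aligned destination. */
	if ((flags & NT_FLAG_DST) && ((uintptr_t)d & 15) == 0) {
		while (len >= 16) {
			/* Ordinary (cached) unaligned load from the source. */
			__m128i v = _mm_loadu_si128((const __m128i *)s);
			/* Store that bypasses the cache hierarchy. */
			_mm_stream_si128((__m128i *)d, v);
			s += 16;
			d += 16;
			len -= 16;
		}
		memcpy(d, s, len); /* trailing 0..15 bytes */
		_mm_sfence(); /* streaming stores are weakly ordered */
		return;
	}
#endif
	(void)flags;
	memcpy(dst, src, len); /* fallback: regular copy */
}
```

Falling back to memcpy() for a misaligned destination keeps the sketch short; a fuller version would copy the unaligned head bytes normally and stream the rest.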
>
> Applications sometimes copy data to another memory location, which is only
> used much later.
> In this case, it is inefficient to pollute the data cache with the copied
> data.
>
> An example use case (originating from a real-life application):
> Copying filtered packets, or the first part of them, into a capture buffer
> for offline analysis.
>
> The purpose of the function is to achieve a performance gain by not
> polluting the cache when copying data.
> Although the throughput can be improved by further optimization, I do not
> have time to do it now.
>
> The functional tests and performance tests for memory copy have been
> expanded to include non-temporal copying.
>
> A non-temporal version of the mbuf library's function to create a full
> copy of a given packet mbuf is provided.
>
> The packet capture and packet dump libraries have been updated to use
> non-temporal memory copy of the packets.
>
> Implementation notes:
>
> Implementations for non-x86 architectures can be provided by anyone at a
> later time. I am not going to do it.
>
> x86 non-temporal load instructions must be 16-byte aligned [1], and
> non-temporal store instructions must be 4-, 8- or 16-byte aligned [2].
>
> ARM non-temporal load and store instructions seem to require 4 byte
> alignment [3].
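Given alignment rules like these, a copy routine usually splits the range into an unaligned head, an aligned body and a tail. A minimal sketch of that split for a power-of-two alignment (16 for the x86 streaming loads, 4 for the ARM case); the struct and function names are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical result type: how [p, p + len) splits around alignment. */
struct nt_split {
	size_t head; /* bytes before the first 'align'-aligned address */
	size_t body; /* aligned bytes, a whole multiple of 'align' */
	size_t tail; /* leftover bytes after the body */
};

/* 'align' must be a power of two. */
static struct nt_split nt_split_for(const void *p, size_t len, size_t align)
{
	struct nt_split r;
	uintptr_t addr = (uintptr_t)p;
	/* Distance to the next aligned address (0 if already aligned). */
	size_t head = (size_t)((align - (addr & (align - 1))) & (align - 1));

	if (head > len)
		head = len;
	r.head = head;
	r.body = (len - head) & ~(align - 1);
	r.tail = len - head - r.body;
	return r;
}
```

When both loads and stores are non-temporal, source and destination can only both be aligned this way if they share the same offset modulo the alignment; otherwise one side still needs ordinary unaligned accesses.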
>
> [1] https://www.intel.com/content/www/us/en/docs/intrinsics-guide/index.html#text=_mm_stream_load
> [2] https://www.intel.com/content/www/us/en/docs/intrinsics-guide/index.html#text=_mm_stream_si
> [3] https://developer.arm.com/documentation/100076/0100/A64-Instruction-Set-Reference/A64-Floating-point-Instructions/LDNP--SIMD-and-FP-
>
> This patch is a major rewrite from the RFC v3, so no version log comparing
> to the RFC is provided.
>
> v4
> * Also ignore the warning for clang in the workaround for
>   _mm_stream_load_si128() missing const in its parameter.
> * Add missing C linkage specifier in rte_memcpy.h.
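A minimal sketch of the missing-const workaround pattern: a const pointer is cast to non-const inside a small wrapper, and -Wcast-qual is silenced only around that wrapper. A hypothetical legacy_read() stands in for _mm_stream_load_si128() so the sketch builds without SSE4.1. clang accepts the same #pragma GCC diagnostic syntax, although the warning name to silence may differ between compilers.

```c
/* Hypothetical stand-in for an intrinsic whose prototype lacks 'const'
 * even though it only reads through the pointer. */
static int legacy_read(int *p)
{
	return *p;
}

/* Const-correct wrapper; the cast is safe because nothing is written. */
#pragma GCC diagnostic push
#pragma GCC diagnostic ignored "-Wcast-qual"
static inline int legacy_read_const(const int *p)
{
	return legacy_read((int *)p);
}
#pragma GCC diagnostic pop
```

Scoping the pragma with push/pop keeps -Wcast-qual active for the rest of the translation unit.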
>
> v3
> * _mm_stream_si64() is not supported on 32-bit x86 architecture, so only
>   use it on 64-bit x86 architecture.
> * CLANG warns that _mm_stream_load_si128_const() and
>   rte_memcpy_nt_15_or_less_s16a() are not public,
>   so remove __rte_internal from them. It also affects the documentation
>   for the functions, so the fix can't be limited to CLANG.
> * Use __rte_experimental instead of __rte_internal.
> * Replace <n> with nnn in function documentation; it doesn't look like
>   HTML.
> * Slightly modify the workaround for _mm_stream_load_si128() missing const
>   in the parameter; the ancient GCC 4.8.5 in RHEL7 doesn't understand
>   #pragma GCC diagnostic ignored "-Wdiscarded-qualifiers", so use
>   #pragma GCC diagnostic ignored "-Wcast-qual" instead. I hope that works.
> * Fixed one coding style issue missed in v2.
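The first v3 note (_mm_stream_si64() only on 64-bit x86) can be handled with a preprocessor guard. A minimal sketch, assuming a hypothetical helper nt_store8() that writes 8 bytes to an 8-byte aligned address: 64-bit x86 uses one 64-bit streaming store, 32-bit x86 with SSE2 uses two 32-bit ones, and other targets fall back to a plain copy.

```c
#include <stdint.h>
#include <string.h>
#if defined(__SSE2__)
#include <emmintrin.h>
#endif

/* Hypothetical helper: callers would issue _mm_sfence() after a batch of
 * streaming stores. */
static void nt_store8(void *dst, uint64_t v)
{
#if defined(__x86_64__)
	/* 64-bit MOVNTI is only available in 64-bit mode. */
	_mm_stream_si64((long long *)dst, (long long)v);
#elif defined(__SSE2__)
	/* 32-bit x86: two 32-bit streaming stores, low half first
	 * (x86 is little-endian). */
	_mm_stream_si32((int *)dst, (int)(uint32_t)v);
	_mm_stream_si32((int *)dst + 1, (int)(uint32_t)(v >> 32));
#else
	/* No streaming store available: regular copy. */
	memcpy(dst, &v, sizeof(v));
#endif
}
```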
>
> v2
> * The last 16-byte block of data, incl. any trailing bytes, was not
>   copied from the source memory area in rte_memcpy_nt_buf().
> * Fix many coding style issues.
> * Add some missing header files.
> * Fix build time warning for non-x86 architectures by using a different
>   method to mark the flags parameter unused.
> * CLANG doesn't understand RTE_BUILD_BUG_ON(!__builtin_constant_p(flags)),
>   so omit it when using CLANG.
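A length sweep with guard bytes catches bugs like the dropped last block noted in v2. A minimal sketch, using hypothetical names (copy_fn, check_copy_lengths) and memcpy() as a stand-in for the routine under test; it is not the patch's actual test_memcpy.c change.

```c
#include <stddef.h>
#include <string.h>

/* Signature shared by memcpy() and a copy routine under test. */
typedef void *(*copy_fn)(void *dst, const void *src, size_t len);

#define SWEEP_MAX 256
#define CANARY 0xA5

/* Returns 1 if 'copy' handles every length 0..max_len correctly: the first
 * 'len' bytes match the source and no byte after them is modified. */
static int check_copy_lengths(copy_fn copy, size_t max_len)
{
	unsigned char src[SWEEP_MAX];
	unsigned char dst[SWEEP_MAX + 16]; /* slack to detect overruns */
	size_t len, i;

	if (max_len > SWEEP_MAX)
		return 0;
	for (i = 0; i < sizeof(src); i++)
		src[i] = (unsigned char)(i & 0x7F); /* never equal to CANARY */
	for (len = 0; len <= max_len; len++) {
		memset(dst, CANARY, sizeof(dst));
		copy(dst, src, len);
		if (memcmp(dst, src, len) != 0)
			return 0; /* missing block or trailing bytes */
		for (i = len; i < sizeof(dst); i++)
			if (dst[i] != CANARY)
				return 0; /* wrote past the end */
	}
	return 1;
}
```

A routine that skips the final block or the trailing bytes fails the content check at the first length that exercises that path.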
>
> Signed-off-by: Morten Brørup <mb@smartsharesystems.com>
> ---
>  app/test/test_memcpy.c               |   65 +-
>  app/test/test_memcpy_perf.c          |  187 ++--
>  lib/eal/include/generic/rte_memcpy.h |  127 +++
>  lib/eal/x86/include/rte_memcpy.h     | 1238 ++++++++++++++++++++++++++
>  lib/mbuf/rte_mbuf.c                  |   77 ++
>  lib/mbuf/rte_mbuf.h                  |   32 +
>  lib/mbuf/version.map                 |    1 +
>  lib/pcapng/rte_pcapng.c              |    3 +-
>  lib/pdump/rte_pdump.c                |    6 +-
>  9 files changed, 1645 insertions(+), 91 deletions(-)