From: Thomas Monjalon
To: Aman Kumar
Cc: dev@dpdk.org, viacheslavo@nvidia.com, anatoly.burakov@intel.com,
 keesang.song@amd.com, jerinjacobk@gmail.com, konstantin.ananyev@intel.com,
 bruce.richardson@intel.com
Date: Tue, 26 Oct 2021 18:14:08 +0200
Message-ID: <2148097.ar7J4MBmm8@thomas>
In-Reply-To: <20211026155645.246783-3-aman.kumar@vvdntech.in>
References: <20211019104724.19416-1-aman.kumar@vvdntech.in>
 <20211026155645.246783-1-aman.kumar@vvdntech.in>
 <20211026155645.246783-3-aman.kumar@vvdntech.in>
Subject: Re: [dpdk-dev] [PATCH v3 3/3] lib/eal: add temporal store memcpy
 support on AMD platform
List-Id: DPDK patches and discussions

26/10/2021 17:56, Aman Kumar:
> This patch provides a rte_memcpy* call with temporal stores.
> Use -Dcpu_instruction_set=znverX with build to enable this API.
>
> Signed-off-by: Aman Kumar
> ---
>  config/x86/meson.build           |   2 +
>  lib/eal/x86/include/rte_memcpy.h | 114 +++++++++++++++++++++++++++++++

It looks better as C code.
Do you achieve the same performance as the asm version?

> +#if defined RTE_MEMCPY_AMDEPYC
[...]
> +static __rte_always_inline void *
> +rte_memcpy_aligned_tstore16_generic(void *dst, void *src, int len)

So to be clear, an application will benefit from this optimization if
1/ DPDK is specifically compiled for AMD
2/ the application is compiled against the above DPDK build (because of inlining)

I guess there is no good way to benefit from the optimization
without specific compilation, because of the inlining constraint.

Another design, with fewer constraints but lower performance,
would be to have a function pointer assigned at runtime based on the CPU.
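
To make the function pointer idea concrete, here is a minimal sketch of
runtime dispatch. It is not taken from the patch and uses no DPDK API: every
name in it (memcpy_fn, copy_generic, copy_amd, cpu_is_amd, copy_select,
copy_dispatch) is hypothetical. It assumes a GCC or Clang build, where
__builtin_cpu_init() and __builtin_cpu_is("amd") are available on x86, and
copy_amd is only a stand-in that calls memcpy where the patch's store
sequence would go.

```c
#include <stddef.h>
#include <string.h>

/* All names below are hypothetical; none of them is a DPDK API. */
typedef void *(*memcpy_fn)(void *dst, const void *src, size_t len);

/* Default path, valid on any CPU. */
static void *copy_generic(void *dst, const void *src, size_t len)
{
	return memcpy(dst, src, len);
}

/*
 * Stand-in for the AMD-tuned copy. A real version would contain the
 * store sequence from the patch; plain memcpy keeps the sketch portable.
 */
static void *copy_amd(void *dst, const void *src, size_t len)
{
	return memcpy(dst, src, len);
}

/* Vendor check through compiler builtins (assumed: GCC or Clang on x86). */
static int cpu_is_amd(void)
{
#if (defined(__x86_64__) || defined(__i386__)) && defined(__GNUC__)
	__builtin_cpu_init();
	return __builtin_cpu_is("amd");
#else
	return 0; /* other targets always take the generic path */
#endif
}

/* Chosen once at startup; points at the generic copy until then. */
static memcpy_fn copy_impl = copy_generic;

/* Call once during initialization, before the first copy. */
static void copy_select(void)
{
	copy_impl = cpu_is_amd() ? copy_amd : copy_generic;
}

/* Every call pays one indirect call: this is the cost versus inlining. */
static inline void *copy_dispatch(void *dst, const void *src, size_t len)
{
	return copy_impl(dst, src, len);
}
```

With this shape a single DPDK build could serve both AMD and non-AMD
machines, and the application would not need to be rebuilt for a specific
target. The price is the indirect call on each copy, which the compiler
cannot inline, so small copies in particular would likely be slower than
with the always-inline version.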