From: Thomas Monjalon
To: Aman Kumar, "Song, Keesang"
Cc: "Ananyev, Konstantin", "dev@dpdk.org", "rasland@nvidia.com",
 "asafp@nvidia.com", "shys@nvidia.com", "viacheslavo@nvidia.com",
 "akozyrev@nvidia.com", "matan@nvidia.com", "Burakov, Anatoly",
 "aman.kumar@vvdntech.in", "jerinjacobk@gmail.com",
 "Richardson, Bruce", "david.marchand@redhat.com"
Date: Thu, 21 Oct 2021 21:50:04 +0200
Message-ID: <4896828.JCGbO7EgO6@thomas>
References: <20210823084411.29592-1-aman.kumar@vvdntech.in>
 <2486642.Qmzdh8hRR2@thomas>
Subject: Re: [dpdk-dev] [PATCH v2 1/2] lib/eal: add amd epyc2 memcpy routine to eal
List-Id: DPDK patches and discussions

21/10/2021 21:03, Song, Keesang:
> From: Thomas Monjalon
> > 21/10/2021 20:12, Song, Keesang:
> > > From: Ananyev, Konstantin
> > > > 21/10/2021 19:10, Song, Keesang:
> > > > > 19/10/2021 17:35, Stephen Hemminger:
> > > > > > From: Thomas Monjalon
> > > > > > > 19/10/2021 12:47, Aman Kumar:
> > > > > > > > This patch provides rte_memcpy* calls optimized for AMD EPYC
> > > > > > > > platforms. Use config/x86/x86_amd_epyc_linux_gcc as cross-file
> > > > > > > > with meson to build dpdk for AMD EPYC platforms.
> > > > > > >
> > > > > > > Please split in 2 patches: platform & memcpy.
> > > > > > >
> > > > > > > What optimization is specific to EPYC?
> > > > > > >
> > > > > > > I dislike the asm code below.
> > > > > > > What is AMD specific inside?
> > > > > > > Can it use compiler intrinsics as it is done elsewhere?
> > > > > >
> > > > > > And why is this not done by Gcc?
> > > > >
> > > > > I hope this answers your question.
> > > > > We (AMD Linux library support team) have implemented a custom
> > > > > tailored memcpy solution which closely matches DPDK use case
> > > > > requirements such as the following:
> > > > > 1) Min 64B length data packet with cache aligned
> > > > >    Source and Destination.
> > > > > 2) Non-temporal load and temporal store for cache aligned
> > > > >    source for both RX and TX paths.
> > > > >    Could not implement the non-temporal store for TX_PATH,
> > > > >    as non-temporal loads/stores work only with 32B aligned
> > > > >    addresses for AVX2.
> > > > > 3) This solution works for all AVX2 supported AMD machines.
> > > > >
> > > > > Internally we have completed the integrity testing and benchmarking
> > > > > of the solution and found gains of 8.4% to 14.5% specifically on
> > > > > Milan CPU (3rd Gen of EPYC Processor).
> > > >
> > > > It is still not clear to me why it has to be written in assembler.
> > > > Why can't similar stuff be written in C with intrinsics, as the
> > > > rest of rte_memcpy.h does?
> > >
> > > The current memcpy implementation in Glibc is based on assembly
> > > coding.
> > > Although memcpy could have been implemented with intrinsics,
> > > our AMD library developers are working on the Glibc
> > > functions, so they have provided a tailored implementation
> > > based on inline assembly coding.
> >
> > Please convert it to C code, thanks.
>
> I've already asked our AMD tools team, but they're saying
> they are not really familiar with C code implementation.
> We need your approval for now since we really need to get
> this patch submitted to 21.11 LTS.

Not sure it is urgent given that v2 came after the planned -rc1 date,
after 6 weeks of silence.
About the approval, there are already 3 technical board members
(Konstantin, Stephen and me) objecting to this patch.
Not being familiar with C code when working on CPU optimization
in 2021 is a strange argument.
In general, I don't really understand why we should maintain memcpy
functions in DPDK instead of relying on libc optimizations.
Having big asm code to maintain and debug is not helping.

I think this case shows that AMD needs to become more familiar
with the DPDK schedule and expectations.
I would encourage you to contribute more to the project,
so that such misunderstandings won't happen in the future.

Hope that's all understandable.

PS: discussion is more readable with replies below