From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path: <dev-bounces@dpdk.org>
From: Thomas Monjalon <thomas@monjalon.net>
To: Aman Kumar <aman.kumar@vvdntech.in>, "Song, Keesang" <Keesang.Song@amd.com>
Cc: "Ananyev, Konstantin" <konstantin.ananyev@intel.com>,
 "dev@dpdk.org" <dev@dpdk.org>, "rasland@nvidia.com" <rasland@nvidia.com>,
 "asafp@nvidia.com" <asafp@nvidia.com>, "shys@nvidia.com" <shys@nvidia.com>,
 "viacheslavo@nvidia.com" <viacheslavo@nvidia.com>,
 "akozyrev@nvidia.com" <akozyrev@nvidia.com>, "matan@nvidia.com" <matan@nvidia.com>,
 "Burakov, Anatoly" <anatoly.burakov@intel.com>,
 "aman.kumar@vvdntech.in" <aman.kumar@vvdntech.in>,
 "jerinjacobk@gmail.com" <jerinjacobk@gmail.com>, "Richardson, Bruce"
 <bruce.richardson@intel.com>, "david.marchand@redhat.com" <david.marchand@redhat.com>
Date: Thu, 21 Oct 2021 21:50:04 +0200
Message-ID: <4896828.JCGbO7EgO6@thomas>
In-Reply-To: <BY5PR12MB3681033D85E2E21FBE998C0196BF9@BY5PR12MB3681.namprd12.prod.outlook.com>
References: <20210823084411.29592-1-aman.kumar@vvdntech.in>
 <2486642.Qmzdh8hRR2@thomas>
 <BY5PR12MB3681033D85E2E21FBE998C0196BF9@BY5PR12MB3681.namprd12.prod.outlook.com>
MIME-Version: 1.0
Content-Transfer-Encoding: 7Bit
Content-Type: text/plain; charset="us-ascii"
Subject: Re: [dpdk-dev] [PATCH v2 1/2] lib/eal: add amd epyc2 memcpy routine to eal
List-Id: DPDK patches and discussions <dev.dpdk.org>

21/10/2021 21:03, Song, Keesang:
> From: Thomas Monjalon <thomas@monjalon.net>
> > 21/10/2021 20:12, Song, Keesang:
> > > From: Ananyev, Konstantin <konstantin.ananyev@intel.com>
> > > > 21/10/2021 19:10, Song, Keesang:
> > > > > 19/10/2021 17:35, Stephen Hemminger:
> > > > > > From: Thomas Monjalon <thomas@monjalon.net>
> > > > > > > 19/10/2021 12:47, Aman Kumar:
> > > > > > > > This patch provides rte_memcpy* calls optimized for AMD EPYC
> > > > > > > > platforms. Use config/x86/x86_amd_epyc_linux_gcc as cross-file
> > > > > > > > with meson to build dpdk for AMD EPYC platforms.
> > > > > > >
> > > > > > > Please split in 2 patches: platform & memcpy.
> > > > > > >
> > > > > > > What optimization is specific to EPYC?
> > > > > > >
> > > > > > > I dislike the asm code below.
> > > > > > > What is AMD specific inside?
> > > > > > > Can it use compiler intrinsics as it is done elsewhere?
> > > > > >
> > > > > > And why is this not done by Gcc?
> > > > >
> > > > > I hope this can make some explanation to your question.
> > > > > We (AMD Linux library support team) have implemented the custom
> > > > > tailored memcpy solution which is a close match with DPDK use case
> > > > > requirements like the below.
> > > > > 1) Min 64B length data packet with cache aligned
> > > > > Source and Destination.
> > > > > 2) Non-Temporal load and temporal store for cache aligned
> > > > > source for both RX and TX paths.
> > > > > Could not implement the non-temporal store for TX_PATH,
> > > > > as non-Temporal load/stores work only with 32B aligned addresses
> > > > > for AVX2
> > > > > 3) This solution works for all AVX2 supported AMD machines.
> > > > >
> > > > > Internally we have completed the integrity testing and benchmarking
> > > > > of the solution and found gains of 8.4% to 14.5% specifically on
> > > > > Milan CPU (3rd Gen of EPYC Processor)
> > > >
> > > > It is still not clear to me why it has to be written in assembler.
> > > > Why similar stuff can't be written in C with intrinsics, as rest of
> > > > rte_memcpy.h does?
> > >
> > > The current memcpy implementation in Glibc is based out of assembly
> > > coding.
> > > Although memcpy could have been implemented with intrinsic,
> > > but since our AMD library developers are working on the Glibc
> > > functions, they have provided a tailored implementation based
> > > out of inline assembly coding.
> >
> > Please convert it to C code, thanks.
>
> I've already asked our AMD tools team, but they're saying
> they are not really familiar with C code implementation.
> We need your approval for now since we really need to get
> this patch submitted to 21.11 LTS.

Not sure it is urgent given that v2 came after the planned -rc1 date,
after 6 weeks of silence.
About the approval, there are already 3 technical board members
(Konstantin, Stephen and me) objecting to this patch.
Not being familiar with C code when working on CPU optimization
in 2021 is a strange argument.
In general, I don't really understand why we should maintain memcpy
functions in DPDK instead of relying on libc optimizations.
Having big asm code to maintain and debug is not helping.

I think this case shows that AMD needs to become more familiar
with the DPDK schedule and expectations.
I would encourage you to contribute more to the project,
so such misunderstandings won't happen in the future.

Hope that's all understandable.

PS: discussion is more readable with replies below
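
For reference, the quoted requirements (64B blocks, aligned source and
destination, non-temporal load with temporal store) could probably be
expressed with AVX2 intrinsics instead of inline asm. The following is
only a minimal sketch under stated assumptions, not the code from the
patch: copy64_nt_load_avx2() and copy_aligned() are hypothetical names,
both pointers are assumed 32-byte aligned, and the length is assumed to
be a multiple of 64. Anything else falls back to memcpy().

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#if defined(__x86_64__) || defined(__i386__)
#include <immintrin.h>

/* Hypothetical helper, not a DPDK API.
 * Assumes src and dst are 32-byte aligned and n is a multiple of 64. */
__attribute__((target("avx2")))
static void copy64_nt_load_avx2(void *dst, const void *src, size_t n)
{
    const __m256i *s = (const __m256i *)src;
    __m256i *d = (__m256i *)dst;

    assert(n % 64 == 0);
    for (size_t i = 0; i < n / 64; i++) {
        /* Non-temporal loads (VMOVNTDQA) need 32-byte aligned addresses. */
        __m256i lo = _mm256_stream_load_si256(s + 2 * i);
        __m256i hi = _mm256_stream_load_si256(s + 2 * i + 1);
        /* Regular (temporal) aligned stores. */
        _mm256_store_si256(d + 2 * i, lo);
        _mm256_store_si256(d + 2 * i + 1, hi);
    }
}
#endif

/* Hypothetical wrapper: uses the AVX2 path only when its assumptions
 * hold at run time, otherwise relies on the libc memcpy(). */
static void *copy_aligned(void *dst, const void *src, size_t n)
{
#if defined(__x86_64__) || defined(__i386__)
    if (__builtin_cpu_supports("avx2") &&
        n % 64 == 0 &&
        ((uintptr_t)dst | (uintptr_t)src) % 32 == 0) {
        copy64_nt_load_avx2(dst, src, n);
        return dst;
    }
#endif
    return memcpy(dst, src, n);
}
```

The target attribute lets the AVX2 function build without a global
-mavx2 flag (GCC and clang), and the run-time check keeps the binary
usable on CPUs without AVX2. Whether this matches the reported gains
would have to be measured.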