From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <dev-bounces@dpdk.org>
Received: from mails.dpdk.org (mails.dpdk.org [217.70.189.124])
	by inbox.dpdk.org (Postfix) with ESMTP id A19BEA0C43;
	Thu, 21 Oct 2021 21:50:11 +0200 (CEST)
Received: from [217.70.189.124] (localhost [127.0.0.1])
	by mails.dpdk.org (Postfix) with ESMTP id 8AE7A40040;
	Thu, 21 Oct 2021 21:50:11 +0200 (CEST)
Received: from new1-smtp.messagingengine.com (new1-smtp.messagingengine.com
 [66.111.4.221]) by mails.dpdk.org (Postfix) with ESMTP id 436FE4003F
 for <dev@dpdk.org>; Thu, 21 Oct 2021 21:50:10 +0200 (CEST)
Received: from compute4.internal (compute4.nyi.internal [10.202.2.44])
 by mailnew.nyi.internal (Postfix) with ESMTP id 76FA65812E9;
 Thu, 21 Oct 2021 15:50:08 -0400 (EDT)
Received: from mailfrontend1 ([10.202.2.162])
 by compute4.internal (MEProxy); Thu, 21 Oct 2021 15:50:08 -0400
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=monjalon.net; h=
 from:to:cc:subject:date:message-id:in-reply-to:references
 :mime-version:content-transfer-encoding:content-type; s=fm2; bh=
 WriRH5gX1TlMJc8zb3K+NfmwMUz3nuIa7LLNLWSgxzo=; b=rgG9KfCVewOG9olp
 IXSe1W2dQA4Aw5PE3vU1zmE/3+D3V5jn4F7OlaiG2+GF9Hji5wyxlHxGIImOPJU3
 CjNBEDvAt9unJJhQhdgA0y+ZYTOuIx5bWyqULzrpi9RNexjqS/fO38SXkjNOtPbD
 PPu+00Yp5Yy86GgDfcC4oKSZFhnD6NKAK8voTotp4469MSpOHWfeSogKT4Ngf8UK
 bz5NACDfLjxEfED9AhjhqHUPJstVrfY9xkeU7693Z1I90AYkO7GzgpZdLJweT4HK
 lTmjEOg6rltPoWwIAJJIIcpYc4N+QnmiG3J6RcckaZuads2NdGRVaqbUXUkn9j1A
 a0P+/A==
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=
 messagingengine.com; h=cc:content-transfer-encoding:content-type
 :date:from:in-reply-to:message-id:mime-version:references
 :subject:to:x-me-proxy:x-me-proxy:x-me-sender:x-me-sender
 :x-sasl-enc; s=fm1; bh=WriRH5gX1TlMJc8zb3K+NfmwMUz3nuIa7LLNLWSgx
 zo=; b=EixKbn9W3LW5MMg3gPnkzU4wGVM183u8u85hNdKHZjQEcwF59PtmTE9oe
 tlawpjDsTSlugWM8Mt+s19B7yXdqmQc0cFyin74l2rwSEO6vtR0QbDF9AL4SDQ4U
 ByEHIUJMj585otyiiUcYoRUCJmcGHoFe2VnVyb529wnlTNeemJD0TQH4hkrSVRWA
 disO4nck85+tL4jK6lN1B3WMCxRIyu0If6ceEZIX100YsofIIH8l5kiOmwsC+uM2
 XBgc47l7fFYxMxdwJ0okUbyuVwbx3TvvTAl0BmnyleWrRT5A+iUBUH3HcWm5JaIr
 WA8I98bt6DV/8P2UCyuXN6jH3PowQ==
X-ME-Sender: <xms:cMRxYb7y8edIQWhmAfmY6viOwS0QlFFBhM0BOrtd0pFfDGgZlN6BOw>
 <xme:cMRxYQ4rOfXIqRL_Lfm-6j92xXsb2ceAobc2BEu-WqDc-7o-3KlUJ4T9XGSEYvIne
 1flNwj4VI8SfN61EQ>
X-ME-Received: <xmr:cMRxYSdBTpL0E31Nvz3K6Enxt0UsbXzH7cKKfubFu5-uQB9k5o5pdORr5yRs9jgMtDgFJkg4B5098tk7D1tpl_7eWQ>
X-ME-Proxy-Cause: gggruggvucftvghtrhhoucdtuddrgedvtddrvddviedgudegtdcutefuodetggdotefrod
 ftvfcurfhrohhfihhlvgemucfhrghsthforghilhdpqfgfvfdpuffrtefokffrpgfnqfgh
 necuuegrihhlohhuthemuceftddtnecusecvtfgvtghiphhivghnthhsucdlqddutddtmd
 enucfjughrpefhvffufffkjghfggfgtgesthfuredttddtvdenucfhrhhomhepvfhhohhm
 rghsucfoohhnjhgrlhhonhcuoehthhhomhgrshesmhhonhhjrghlohhnrdhnvghtqeenuc
 ggtffrrghtthgvrhhnpedugefgvdefudfftdefgeelgffhueekgfffhfeujedtteeutdej
 ueeiiedvffegheenucevlhhushhtvghrufhiiigvpedtnecurfgrrhgrmhepmhgrihhlfh
 hrohhmpehthhhomhgrshesmhhonhhjrghlohhnrdhnvght
X-ME-Proxy: <xmx:cMRxYcIzssCahDBtfmFyqz6b2zwd8tEqYvceeogi_PJmL4SA5dBk2A>
 <xmx:cMRxYfIZfFWvXjj3d3iFaEwT6omVfhem0WQ5uZ_SIc1C6QNArUY3Ng>
 <xmx:cMRxYVwUJS1g59q7_OnFZWOukirnytni5yQwuNDG1BIRNrU5qvVhfw>
 <xmx:cMRxYebpKlMwOB16Qs0KRiAwNuNg8Fi_b5xT5AYRzusoDiW0JDHVpQ>
Received: by mail.messagingengine.com (Postfix) with ESMTPA; Thu,
 21 Oct 2021 15:50:06 -0400 (EDT)
From: Thomas Monjalon <thomas@monjalon.net>
To: Aman Kumar <aman.kumar@vvdntech.in>, "Song, Keesang" <Keesang.Song@amd.com>
Cc: "Ananyev, Konstantin" <konstantin.ananyev@intel.com>,
 "dev@dpdk.org" <dev@dpdk.org>, "rasland@nvidia.com" <rasland@nvidia.com>,
 "asafp@nvidia.com" <asafp@nvidia.com>, "shys@nvidia.com" <shys@nvidia.com>,
 "viacheslavo@nvidia.com" <viacheslavo@nvidia.com>,
 "akozyrev@nvidia.com" <akozyrev@nvidia.com>,
 "matan@nvidia.com" <matan@nvidia.com>, "Burakov,
 Anatoly" <anatoly.burakov@intel.com>,
 "aman.kumar@vvdntech.in" <aman.kumar@vvdntech.in>,
 "jerinjacobk@gmail.com" <jerinjacobk@gmail.com>, "Richardson,
 Bruce" <bruce.richardson@intel.com>,
 "david.marchand@redhat.com" <david.marchand@redhat.com>
Date: Thu, 21 Oct 2021 21:50:04 +0200
Message-ID: <4896828.JCGbO7EgO6@thomas>
In-Reply-To: <BY5PR12MB3681033D85E2E21FBE998C0196BF9@BY5PR12MB3681.namprd12.prod.outlook.com>
References: <20210823084411.29592-1-aman.kumar@vvdntech.in>
 <2486642.Qmzdh8hRR2@thomas>
 <BY5PR12MB3681033D85E2E21FBE998C0196BF9@BY5PR12MB3681.namprd12.prod.outlook.com>
MIME-Version: 1.0
Content-Transfer-Encoding: 7Bit
Content-Type: text/plain; charset="us-ascii"
Subject: Re: [dpdk-dev] [PATCH v2 1/2] lib/eal: add amd epyc2 memcpy routine
 to eal
X-BeenThere: dev@dpdk.org
X-Mailman-Version: 2.1.29
Precedence: list
List-Id: DPDK patches and discussions <dev.dpdk.org>
List-Unsubscribe: <https://mails.dpdk.org/options/dev>,
 <mailto:dev-request@dpdk.org?subject=unsubscribe>
List-Archive: <http://mails.dpdk.org/archives/dev/>
List-Post: <mailto:dev@dpdk.org>
List-Help: <mailto:dev-request@dpdk.org?subject=help>
List-Subscribe: <https://mails.dpdk.org/listinfo/dev>,
 <mailto:dev-request@dpdk.org?subject=subscribe>
Errors-To: dev-bounces@dpdk.org
Sender: "dev" <dev-bounces@dpdk.org>

21/10/2021 21:03, Song, Keesang:
> From: Thomas Monjalon <thomas@monjalon.net>
> > 21/10/2021 20:12, Song, Keesang:
> > > From: Ananyev, Konstantin <konstantin.ananyev@intel.com>
> > > > 21/10/2021 19:10, Song, Keesang:
> > > > > 19/10/2021 17:35, Stephen Hemminger:
> > > > > > From: Thomas Monjalon <thomas@monjalon.net>
> > > > > > > 19/10/2021 12:47, Aman Kumar:
> > > > > > > > This patch provides rte_memcpy* calls optimized for AMD EPYC
> > > > > > > > platforms. Use config/x86/x86_amd_epyc_linux_gcc as cross-file
> > > > > > > > with meson to build dpdk for AMD EPYC platforms.
> > > > > > > 
> > > > > > > Please split in 2 patches: platform & memcpy.
> > > > > > > 
> > > > > > > What optimization is specific to EPYC?
> > > > > > > 
> > > > > > > I dislike the asm code below.
> > > > > > > What is AMD specific inside?
> > > > > > > Can it use compiler intrinsics as it is done elsewhere?
> > > > > > 
> > > > > > And why is this not done by Gcc?
> > > > >
> > > > > I hope this answers your question.
> > > > > We (the AMD Linux library support team) have implemented a
> > > > > custom-tailored memcpy solution that closely matches DPDK
> > > > > use case requirements, as listed below.
> > > > > 1)      Data packets of at least 64B, with cache-aligned
> > > > > source and destination.
> > > > > 2)      Non-temporal load and temporal store for a cache-aligned
> > > > > source, for both RX and TX paths.
> > > > > We could not implement the non-temporal store for TX_PATH,
> > > > > as non-temporal loads/stores work only with 32B-aligned
> > > > > addresses for AVX2.
> > > > > 3)      This solution works on all AMD machines that support AVX2.
> > > > > 
> > > > > Internally, we have completed integrity testing and benchmarking
> > > > > of the solution and found gains of 8.4% to 14.5%, specifically on
> > > > > the Milan CPU (3rd generation of EPYC processors).
> > > > 
> > > > It is still not clear to me why it has to be written in assembler.
> > > > Why can't similar code be written in C with intrinsics, as the rest
> > > > of rte_memcpy.h does?
> > > 
> > > The current memcpy implementation in Glibc is written in
> > > assembly.
> > > Although memcpy could have been implemented with intrinsics,
> > > our AMD library developers work on the Glibc functions,
> > > so they have provided a tailored implementation written
> > > in inline assembly.
> > 
> > Please convert it to C code, thanks.
> 
> I've already asked our AMD tools team, but they're saying
> they are not really familiar with C code implementation.
> We need your approval for now since we really need to get
> this patch submitted to 21.11 LTS.

I am not sure this is urgent, given that v2 came after the planned
-rc1 date, following 6 weeks of silence.
As for the approval, there are already 3 technical board members
(Konstantin, Stephen and me) objecting to this patch.
Not being familiar with C code when working on CPU optimization
in 2021 is a strange argument.

In general, I don't really understand why we should maintain memcpy
functions in DPDK instead of relying on libc optimizations.
Having a large block of asm code to maintain and debug does not help.
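
For illustration only, the requirements quoted above (AVX2, non-temporal
load from an aligned source, temporal store) could be sketched in C with
intrinsics roughly as below. This is neither the AMD patch nor
rte_memcpy: the names copy_nt_load and copy_nt_load_avx2 are
hypothetical, the 32B alignment check follows the constraint stated in
the quoted mail, and no performance claim is attached.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#if defined(__x86_64__) || defined(__i386__)
#include <immintrin.h>

/*
 * Hypothetical AVX2 path: non-temporal 32-byte loads paired with
 * regular (temporal) unaligned stores. src must be 32B-aligned.
 * The target attribute lets GCC/Clang build this without -mavx2.
 */
__attribute__((target("avx2")))
static void copy_nt_load_avx2(uint8_t *dst, const uint8_t *src, size_t len)
{
    size_t i = 0;

    for (; i + 32 <= len; i += 32) {
        __m256i v = _mm256_stream_load_si256((const __m256i *)(src + i));
        _mm256_storeu_si256((__m256i *)(dst + i), v);
    }
    /* Copy the tail (fewer than 32 bytes) with plain memcpy. */
    memcpy(dst + i, src + i, len - i);
}
#endif

/*
 * Hypothetical entry point: takes the AVX2 path only when the CPU
 * reports AVX2 and src is 32B-aligned, otherwise falls back to libc.
 */
void *copy_nt_load(void *dst, const void *src, size_t len)
{
    assert(dst != NULL && src != NULL);
#if defined(__x86_64__) || defined(__i386__)
    if (((uintptr_t)src & 31) == 0 && __builtin_cpu_supports("avx2")) {
        copy_nt_load_avx2(dst, src, len);
        return dst;
    }
#endif
    return memcpy(dst, src, len);
}
```

The same loop shape in inline asm would need separate maintenance for
each compiler and ABI; with intrinsics the compiler handles register
allocation and the fallback stays ordinary C.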

I think this case shows that AMD needs to become more familiar
with the DPDK schedule and expectations.
I would encourage you to contribute more to the project,
so that such misunderstandings won't happen in the future.

I hope that is all understandable.


PS: the discussion is more readable with replies below the quoted text.