Date: Fri, 9 Jul 2021 10:14:35 +0100
From: Bruce Richardson
To: Jerin Jacob
Cc: fengchengwen, Thomas Monjalon, Ferruh Yigit, Jerin Jacob, dpdk-dev,
    Morten Brørup, Nipun Gupta, Hemant Agrawal, Maxime Coquelin,
    Honnappa Nagarahalli, David Marchand, Satananda Burla, Prasun Kapoor,
    "Ananyev, Konstantin", liangma@liangbit.com, Radha Mohan Chintakuntla
References: <9b063d9f-5b52-8e1b-e12a-f24f4ea3b122@huawei.com>
Subject: Re: [dpdk-dev] [PATCH] dmadev: introduce DMA device library

On Fri, Jul 09, 2021 at 12:05:40AM +0530, Jerin Jacob wrote:
> On Thu, Jul 8, 2021 at 8:41 AM fengchengwen
wrote:
> >
> > >>>
> > >>> It's just more conditionals and branches all through the code. Inside the
> > >>> user application, the user has to check whether to set the flag or not (or
> > >>> special-case the last transaction outside the loop), and within the driver,
> > >>> there has to be a branch whether or not to call the doorbell function. The
> > >>> code on both sides is far simpler and more readable if the doorbell
> > >>> function is exactly that - a separate function.
> > >>
> > >> I disagree. The reason is:
> > >>
> > >> We will have two classes of applications
> > >>
> > >> a) do dma copy request as and when it has data(I think, this is the
> > >> prime use case), for those,
> > >> I think, it is considerable overhead to have two function invocation
> > >> per transfer i.e
> > >> rte_dma_copy() and rte_dma_perform()
> > >>
> > >> b) do dma copy when the data is reached to a logical state, like copy
> > >> IP frame from Ethernet packets or so,
> > >> In that case, the application will have a LOGIC to detect when to
> > >> perform it so on the end of
> > >> that rte_dma_copy() flag can be updated to fire the doorbell.
> > >>
> > >> IMO, We are comparing against a branch(flag is already in register) vs
> > >> a set of instructions for
> > >> 1) function pointer overhead
> > >> 2) Need to use the channel context again back in another function.
> > >>
> > >> IMO, a single branch is most optimal from performance PoV.
> > >>
> > > Ok, let's try it and see how it goes.
> >
> > Test result show:
> > 1) For Kunpeng platform (ARMv8) could benefit very little with doorbell in flags
> > 2) For Xeon E5-2690 v2 (X86) could benefit with separate function
> > 3) Both platform could benefit with doorbell in flags if burst < 5
> >
> > There is a performance gain in small bursts (<5). Given the extensive use of bursts
> > in DPDK applications and users are accustomed to the concept, I do not recommend
> > using the 'doorbell' in flags.
>
> There is NO concept change between one option vs other option. Just
> argument different.
> Also, _perform() scheme not used anywhere in DPDK. I
>
> Regarding performance, I have added dummy instructions to simulate the real work
> load[1], now burst also has some gain in both x86 and arm64[3]
>
> I have modified your application[2] to dpdk test application to use
> cpu isolation etc.
> So this is gain in flag scheme and code is checked in to Github[2]
>

The benchmark numbers all seem very close between the two schemes. On my
team we pretty much have test ioat & idxd drivers ported internally to the
last dmadev draft library, and have sample apps handling traffic using
those. I'll therefore attempt to get these numbers with real traffic on
real drivers, just to double-check that they match these microbenchmarks.

Assuming that perf is the same, how do we resolve this? Some thoughts:

* As I understand it, the main objection to the separate doorbell function
  is the use of 8 bytes in a fastpath slot. Therefore I will also attempt
  to benchmark having the doorbell function not on the same cacheline and
  check the perf impact, if any.
* If there is no perf impact from having the doorbell function inside the
  regular "ops" rather than on the fastpath cacheline, there is no reason
  we can't implement both schemes. Users can then choose for themselves
  whether to doorbell using a flag on the last item, or to doorbell
  explicitly using a function call.

Of the two schemes, and assuming they are equal, I do have a preference for
the separate function one, primarily from a code readability point of view.
Other than that, I have no strong opinions.

/Bruce
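
For reference, the two doorbell schemes being compared in this thread can be sketched roughly as below. This is only a minimal mock under stated assumptions, not the dmadev API: the mock_dma_chan struct, the MOCK_DMA_F_DOORBELL flag and every function name and signature here are hypothetical, and the "hardware" is just a set of counters. It shows where the extra branches land with the flag scheme (one in the caller's loop, one in the enqueue path) versus the extra call with the separate-function scheme.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical flag: ring the doorbell as part of this enqueue. */
#define MOCK_DMA_F_DOORBELL (1u << 0)

/* Mock channel: counters stand in for descriptor ring and hardware. */
struct mock_dma_chan {
    uint32_t pending;   /* jobs enqueued but not yet submitted */
    uint32_t submitted; /* jobs made visible to the "hardware" */
    uint32_t doorbells; /* number of doorbell writes issued */
};

/* Separate-function scheme: an explicit doorbell call. */
static void mock_dma_perform(struct mock_dma_chan *c)
{
    assert(c != NULL);
    c->submitted += c->pending;
    c->pending = 0;
    c->doorbells++;
}

/* Enqueue one copy; with the flag scheme the driver branches on flags. */
static void mock_dma_copy(struct mock_dma_chan *c, uint64_t flags)
{
    assert(c != NULL);
    c->pending++;
    if (flags & MOCK_DMA_F_DOORBELL) /* driver-side branch */
        mock_dma_perform(c);
}

/* Flag scheme: the caller special-cases the last item of the burst. */
static void burst_with_flag(struct mock_dma_chan *c, uint32_t n)
{
    for (uint32_t i = 0; i < n; i++)
        mock_dma_copy(c, (i == n - 1) ? MOCK_DMA_F_DOORBELL : 0);
}

/* Separate-function scheme: plain loop, then one doorbell call. */
static void burst_with_call(struct mock_dma_chan *c, uint32_t n)
{
    for (uint32_t i = 0; i < n; i++)
        mock_dma_copy(c, 0);
    if (n > 0)
        mock_dma_perform(c);
}
```

Both helpers leave the mock channel in the same state after a burst (all jobs submitted, one doorbell), so the sketch says nothing about which scheme is faster; that still needs the measurements on real drivers discussed above.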