From: Jerin Jacob
Date: Sun, 11 Jul 2021 12:44:14 +0530
To: Bruce Richardson
Cc: fengchengwen, Thomas Monjalon, Ferruh Yigit, Jerin Jacob, dpdk-dev,
 Morten Brørup, Nipun Gupta, Hemant Agrawal, Maxime Coquelin,
 Honnappa Nagarahalli, David Marchand, Satananda Burla, Prasun Kapoor,
 "Ananyev, Konstantin", liangma@liangbit.com, Radha Mohan Chintakuntla
Subject: Re: [dpdk-dev] [PATCH] dmadev: introduce DMA device library
References: <9b063d9f-5b52-8e1b-e12a-f24f4ea3b122@huawei.com>

On Fri, Jul 9, 2021 at 2:44 PM Bruce Richardson wrote:
>
> On Fri, Jul 09, 2021 at 12:05:40AM +0530, Jerin Jacob wrote:
> > On Thu, Jul 8, 2021 at 8:41 AM fengchengwen wrote:
> > >
> > > >>>
> > > >>> It's just more conditionals and branches all through the code. Inside the
> > > >>> user application, the user has to check whether to set the flag or not (or
> > > >>> special-case the last transaction outside the loop), and within the driver,
> > > >>> there has to be a branch whether or not to call the doorbell function. The
> > > >>> code on both sides is far simpler and more readable if the doorbell
> > > >>> function is exactly that - a separate function.
> > > >>
> > > >> I disagree.
> > > >> The reason is:
> > > >>
> > > >> We will have two classes of applications
> > > >>
> > > >> a) do dma copy request as and when it has data(I think, this is the
> > > >> prime use case), for those,
> > > >> I think, it is considerable overhead to have two function invocation
> > > >> per transfer i.e
> > > >> rte_dma_copy() and rte_dma_perform()
> > > >>
> > > >> b) do dma copy when the data is reached to a logical state, like copy
> > > >> IP frame from Ethernet packets or so,
> > > >> In that case, the application will have a LOGIC to detect when to
> > > >> perform it so on the end of
> > > >> that rte_dma_copy() flag can be updated to fire the doorbell.
> > > >>
> > > >> IMO, We are comparing against a branch(flag is already in register) vs
> > > >> a set of instructions for
> > > >> 1) function pointer overhead
> > > >> 2) Need to use the channel context again back in another function.
> > > >>
> > > >> IMO, a single branch is most optimal from performance PoV.
> > > >>
> > > > Ok, let's try it and see how it goes.
> > >
> > > Test result show:
> > > 1) For Kunpeng platform (ARMv8) could benefit very little with doorbell in flags
> > > 2) For Xeon E5-2690 v2 (X86) could benefit with separate function
> > > 3) Both platform could benefit with doorbell in flags if burst < 5
> > >
> > > There is a performance gain in small bursts (<5). Given the extensive use of bursts
> > > in DPDK applications and users are accustomed to the concept, I do
> > > not recommend using the 'doorbell' in flags.
> >
> > There is NO concept change between one option vs other option. Just
> > argument different.
> > Also, _perform() scheme not used anywhere in DPDK. I
> >
> > Regarding performance, I have added dummy instructions to simulate the real work
> > load[1], now burst also has some gain in both x86 and arm64[3]
> >
> > I have modified your application[2] to dpdk test application to use
> > cpu isolation etc.
> > So this is gain in flag scheme and code is checked in to Github[2]
> >
> >
> The benchmark numbers all seem very close between the two schemes. On my
> team we pretty much have test ioat & idxd drivers ported internally to the
> last dmadev draft library, and have sample apps handling traffic using
> those. I'll therefore attempt to get these numbers with real traffic on
> real drivers to just double check that it's the same as these
> microbenchmarks.

Thanks.

>
> Assuming that perf is the same, how to resolve this? Some thoughts:
> * As I understand it, the main objection to the separate doorbell function
>   is the use of 8-bytes in fastpath slot. Therefore I will also attempt to
>   benchmark having the doorbell function not on the same cacheline and check
>   perf impact, if any.

Probably we can remove rte_dmadev_fill_sg() variant and keep sg only
for copy to save 8B.

> * If we don't have an impact to perf by having the doorbell function inside
>   the regular "ops" rather than on fastpath cacheline, there is no reason
>   we can't implement both schemes. The user can then choose themselves
>   whether to doorbell using a flag on last item, or to doorbell explicitly
>   using function call.

Yes. I think, we can keep both.

>
> Of the two schemes, and assuming they are equal, I do have a preference for
> the separate function one, primarily from a code readability point of view.
> Other than that, I have no strong opinions.
>
> /Bruce
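
For readers following the thread, the two doorbell schemes being compared can
be sketched roughly as below. This is only an illustration against a mock
channel, not the dmadev API: every name with a sketch_/SKETCH_ prefix is
hypothetical, the "doorbell" is just a counter, and nothing here says anything
about relative performance on real hardware.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical flag: ring the doorbell as part of this enqueue. */
#define SKETCH_DMA_FLAG_DOORBELL (1u << 0)
#define SKETCH_RING_SIZE 64u

struct sketch_desc {
	const void *src;
	void *dst;
	uint32_t len;
};

/* Mock channel: a descriptor ring plus counters standing in for hardware. */
struct sketch_chan {
	struct sketch_desc ring[SKETCH_RING_SIZE];
	uint32_t enqueued;  /* descriptors written by software */
	uint32_t submitted; /* descriptors made visible to the "device" */
	uint32_t doorbells; /* number of doorbell writes issued */
};

/* Separate doorbell function (the rte_dma_perform() style in the thread). */
static inline void sketch_dma_perform(struct sketch_chan *c)
{
	c->submitted = c->enqueued;
	c->doorbells++;
}

/*
 * Enqueue one copy. Returns the ring slot used, or -1 if the mock ring has
 * no room for another pending descriptor. With the doorbell flag set, the
 * only extra cost on this path is one branch on 'flags'.
 */
static inline int sketch_dma_copy(struct sketch_chan *c, const void *src,
				  void *dst, uint32_t len, uint32_t flags)
{
	if (c->enqueued - c->submitted >= SKETCH_RING_SIZE)
		return -1;

	uint32_t slot = c->enqueued % SKETCH_RING_SIZE;
	c->ring[slot] = (struct sketch_desc){ src, dst, len };
	c->enqueued++;

	if (flags & SKETCH_DMA_FLAG_DOORBELL)
		sketch_dma_perform(c);
	return (int)slot;
}

/* Scheme A: doorbell requested via a flag on the last item of the burst. */
static inline void sketch_burst_with_flag(struct sketch_chan *c,
					  const uint8_t *src, uint8_t *dst,
					  uint32_t len, unsigned int n)
{
	for (unsigned int i = 0; i < n; i++) {
		uint32_t flags = (i == n - 1) ? SKETCH_DMA_FLAG_DOORBELL : 0;
		(void)sketch_dma_copy(c, src + (size_t)i * len,
				      dst + (size_t)i * len, len, flags);
	}
}

/* Scheme B: plain enqueues, then one explicit doorbell call after the loop. */
static inline void sketch_burst_with_perform(struct sketch_chan *c,
					     const uint8_t *src, uint8_t *dst,
					     uint32_t len, unsigned int n)
{
	for (unsigned int i = 0; i < n; i++)
		(void)sketch_dma_copy(c, src + (size_t)i * len,
				      dst + (size_t)i * len, len, 0);
	if (n > 0)
		sketch_dma_perform(c);
}
```

In this mock both helpers end in the same state (one doorbell per burst, all
descriptors submitted), which is why keeping both options, as suggested above,
would leave the choice to the application: the flag suits case (a), where each
transfer is submitted as soon as it is enqueued, and the explicit call suits
code that prefers to keep the loop body free of last-item special-casing.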