From: Jerin Jacob
Date: Wed, 13 May 2020 20:19:17 +0530
To: Honnappa Nagarahalli
Cc: Ruifeng Wang, "dev@dpdk.org", "jerinj@marvell.com",
 "hemant.agrawal@nxp.com", "Ajit Khaparde (ajit.khaparde@broadcom.com)",
 "igorch@amazon.com", "thomas@monjalon.net", "viacheslavo@mellanox.com",
 "arybchenko@solarflare.com", nd, "Richardson, Bruce"
Subject: Re: [dpdk-dev] [RFC] eal: adjust barriers for IO on Armv8-a
References: <20200410164127.54229-1-gavin.hu@arm.com>
 <20200511180637.22200-1-honnappa.nagarahalli@arm.com>
List-Id: DPDK patches and discussions

On Wed, May 13, 2020 at 3:14 AM Honnappa Nagarahalli wrote:
>
> > > > > Subject: [RFC] eal: adjust barriers for IO on Armv8-a
> > > > >
> > > > > Change the barrier APIs for IO to reflect that Armv8-a is
> > > > > other-multi-copy atomicity memory model.
> > > > >
> > > > > Armv8-a memory model has been strengthened to require
> > > > > other-multi-copy atomicity. This property requires memory accesses
> > > > > from an observer to become visible to all other observers
> > > > > simultaneously [3]. This means
> > > > >
> > > > > a) A write arriving at an endpoint shared between multiple CPUs is
> > > > >    visible to all CPUs
> > > > > b) A write that is visible to all CPUs is also visible to all other
> > > > >    observers in the shareability domain
> > > > >
> > > > > This allows for using cheaper DMB instructions in the place of DSB
> > > > > for devices that are visible to all CPUs (i.e. devices that DPDK caters to).
> > > > >
> > > > > Please refer to [1], [2] and [3] for more information.
> > > > >
> > > > > [1] https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=22ec71615d824f4f11d38d0e55a88d8956b7e45f
> > > > > [2] https://www.youtube.com/watch?v=i6DayghhA8Q
> > > > > [3] https://www.cl.cam.ac.uk/~pes20/armv8-mca/
> > > > >
> > > > > Signed-off-by: Honnappa Nagarahalli
> > > > > ---
> > > > >  lib/librte_eal/arm/include/rte_atomic_64.h | 10 +++++-----
> > > > >  1 file changed, 5 insertions(+), 5 deletions(-)
> > > > >
> > > > > diff --git a/lib/librte_eal/arm/include/rte_atomic_64.h b/lib/librte_eal/arm/include/rte_atomic_64.h
> > > > > index 7b7099cdc..e406411bb 100644
> > > > > --- a/lib/librte_eal/arm/include/rte_atomic_64.h
> > > > > +++ b/lib/librte_eal/arm/include/rte_atomic_64.h
> > > > > @@ -19,11 +19,11 @@ extern "C" {
> > > > >  #include
> > > > >  #include
> > > > >
> > > > > -#define rte_mb() asm volatile("dsb sy" : : : "memory")
> > > > > +#define rte_mb() asm volatile("dmb osh" : : : "memory")
> > > > >
> > > > > -#define rte_wmb() asm volatile("dsb st" : : : "memory")
> > > > > +#define rte_wmb() asm volatile("dmb oshst" : : : "memory")
> > > > >
> > > > > -#define rte_rmb() asm volatile("dsb ld" : : : "memory")
> > > > > +#define rte_rmb() asm volatile("dmb oshld" : : : "memory")
> > > > >
> > > > >  #define rte_smp_mb() asm volatile("dmb ish" : : : "memory")
> > > > >
> > > > > @@ -37,9 +37,9 @@ extern "C" {
> > > > >
> > > > >  #define rte_io_rmb() rte_rmb()
> > > > >
> > > > > -#define rte_cio_wmb() asm volatile("dmb oshst" : : : "memory")
> > > > > +#define rte_cio_wmb() rte_wmb()
> > > > >
> > > > > -#define rte_cio_rmb() asm volatile("dmb oshld" : : : "memory")
> > > > > +#define rte_cio_rmb() rte_rmb()
> > > > >
> > > > >  /*------------------------ 128 bit atomic operations -------------------------*/
> > > > >
> > > > > --
> > > > > 2.17.1
> > > >
> > > > This change showed about 7% performance gain in testpmd single core
> > > > NDR test.
> > >
> > > I am trying to understand this patch wrt DPDK current usage model?
> > >
> > > 1) Is performance improvement due to the fact that the PMD that you
> > > are using it for testing suppose to use existing rte_cio_* but it was
> > > using rte_[rw]mb?
> No, it is supposed to use rte_[rw]mb for x86.

Why are drivers using rte_[rw]mb in the fastpath? IMO, rte_io_[rw]mb and
rte_cio_[rw]mb were created for this purpose.

But I understand that on x86 they are mapped to rte_compiler_barrier().
Is that correct from the x86 PoV?
@Ananyev, Konstantin @Richardson, Bruce ?

For x86:
#define rte_io_wmb() rte_compiler_barrier()
#define rte_io_rmb() rte_compiler_barrier()

#define rte_cio_wmb() rte_compiler_barrier()
#define rte_cio_rmb() rte_compiler_barrier()

> >
> > This is part of the reason. There are also cases where rte_io_* was used and
> > can be relaxed.
> > Such as: http://patches.dpdk.org/patch/68162/
> >
> > > 2) In my understanding :
> > > a) CPU to CPU barrier requirements are addressed by rte_smp_*
> > > b) CPU to DMA/Device barrier requirements are addressed by rte_cio_*
> > > c) CPU to ANY(CPU or Device) are addressed by rte_[rw]mb
> > >
> > > If (c) is true then we are violating the DPDK spec with change. Right?
> No, we are not. Essentially, due to the other-multi-copy atomicity behavior of
> the architecture, we are saying 'DMB OSH*' is enough to achieve (c).

Yeah. Probably from the userspace POV it should be OK to use "DMB OSH*" as
the barrier between these four:

1) Global memory (BSS and data sections), not mapped as a hugepage
2) Hugepage memory
3) IOVA memory
4) PCI register read/write

Do we need to worry about anything else that is specific to DSB? For
example, TLB-related flushes etc.

If we are taking this path then rte_cio_[rw]mb() has no meaning in DPDK as
an abstraction, as it was created for arm64 for this specific purpose.
If we can meet all DPDK use cases with DMB OSH then we can probably
deprecate rte_cio_wmb to avoid confusion.
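
As a rough illustration of case b) above (CPU to device), here is a minimal
sketch of the usual "fill descriptor, barrier, ring doorbell" sequence. This
is not DPDK code: sketch_io_wmb(), struct sketch_desc and sketch_post() are
made-up names, and the doorbell is an ordinary variable rather than a mapped
PCI register. On arm64 the macro expands to the "dmb oshst" that the RFC
proposes for rte_wmb(); on any other architecture a release fence is used
purely so that the sketch builds, which is an assumption of the sketch and
not the DPDK mapping.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for a write barrier between descriptor memory
 * and a device register. Only the arm64 branch reflects the RFC. */
#if defined(__aarch64__)
#define sketch_io_wmb() __asm__ volatile("dmb oshst" : : : "memory")
#else
/* Assumption: build-only fallback, not what DPDK does on this target. */
#define sketch_io_wmb() __atomic_thread_fence(__ATOMIC_RELEASE)
#endif

/* Hypothetical descriptor that would live in hugepage/IOVA memory. */
struct sketch_desc {
    uint64_t addr;
    uint32_t len;
};

/* Write the descriptor, order it, then write the doorbell. */
static void sketch_post(struct sketch_desc *d, volatile uint32_t *doorbell,
                        uint64_t addr, uint32_t len, uint32_t tail)
{
    assert(d != NULL && doorbell != NULL);
    d->addr = addr;
    d->len = len;
    /* Descriptor stores must be observable before the doorbell store. */
    sketch_io_wmb();
    *doorbell = tail;
}
```

The barrier sits between two different kinds of memory (items 2/3 and item 4
in the list above), which is the situation where the question of "DMB OSH*"
versus DSB matters.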
> >
> > Developers are still required to use correct barrier APIs for different use cases.
> > I think this change mitigates performance penalty when non optimal barrier is
> > used.
> >
> > > This change will not be required if fastpath (CPU to Device) is using
> > > rte_cio_*.
> > > Right?
> Yes. It is required when the fastpath uses rte_[rw]mb.
> >
> > See 1). Correct usage of rte_cio_* is not the whole.
> > For some other use cases, such as barrier between accesses of different
> > memory types, we can also use lighter barrier 'dmb'.
> >
> > > > Tested-by: Ruifeng Wang
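
For the read side of the same fastpath, a minimal sketch of polling a
completion entry might look like the following. Again this is not DPDK code:
sketch_io_rmb(), struct sketch_cqe and sketch_poll() are hypothetical names,
and the "device" is simulated by ordinary stores from the CPU. On arm64 the
macro is the "dmb oshld" that the RFC proposes for rte_rmb(); elsewhere an
acquire fence is an assumed build-only fallback.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for a read barrier after observing a flag that
 * a device writes. Only the arm64 branch reflects the RFC. */
#if defined(__aarch64__)
#define sketch_io_rmb() __asm__ volatile("dmb oshld" : : : "memory")
#else
/* Assumption: build-only fallback, not what DPDK does on this target. */
#define sketch_io_rmb() __atomic_thread_fence(__ATOMIC_ACQUIRE)
#endif

/* Hypothetical completion entry; the device would set 'done' last. */
struct sketch_cqe {
    volatile uint32_t done;
    uint32_t len;
};

/* Returns 1 and stores the length if the entry is complete, else 0. */
static int sketch_poll(const struct sketch_cqe *c, uint32_t *len_out)
{
    assert(c != NULL && len_out != NULL);
    if (!c->done)
        return 0;
    /* Read the payload only after 'done' has been observed. */
    sketch_io_rmb();
    *len_out = c->len;
    return 1;
}
```

If a driver already uses the lighter barrier here, the RFC changes nothing
for it; the gain discussed in this thread comes from paths that call
rte_[rw]mb instead.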