From mboxrd@z Thu Jan  1 00:00:00 1970
MIME-Version: 1.0
References: <1594621423-14796-1-git-send-email-phil.yang@arm.com>
 <1594875225-5850-1-git-send-email-phil.yang@arm.com>
 <1594875225-5850-2-git-send-email-phil.yang@arm.com>
In-Reply-To: <1594875225-5850-2-git-send-email-phil.yang@arm.com>
From: David Marchand <david.marchand@redhat.com>
Date: Thu, 16 Jul 2020 12:35:03 +0200
Message-ID: <CAJFAV8ya-7vPzoNu3nqmnZ8cNP=bS8ZKOe71fBe3LAtCVGiJZA@mail.gmail.com>
To: "Mcnamara, John" <john.mcnamara@intel.com>,
 David Christensen <drc@linux.vnet.ibm.com>, 
 Jerin Jacob Kollanukkaran <jerinj@marvell.com>, "Ananyev,
 Konstantin" <konstantin.ananyev@intel.com>, 
 Bruce Richardson <bruce.richardson@intel.com>,
 Marko Kovacevic <marko.kovacevic@intel.com>, 
 Stephen Hemminger <stephen@networkplumber.org>
Cc: Thomas Monjalon <thomas@monjalon.net>, 
 Honnappa Nagarahalli <Honnappa.Nagarahalli@arm.com>, dev <dev@dpdk.org>, 
 Ola Liljedahl <Ola.Liljedahl@arm.com>, 
 "Ruifeng Wang (Arm Technology China)" <Ruifeng.Wang@arm.com>, nd <nd@arm.com>,
 Phil Yang <phil.yang@arm.com>
Content-Type: text/plain; charset="UTF-8"
Subject: Re: [dpdk-dev] [PATCH v8 1/3] doc: add optimizations using C11
	atomic built-ins

Hello,

On Thu, Jul 16, 2020 at 6:58 AM Phil Yang <phil.yang@arm.com> wrote:
>
> Add information about possible optimizations using C11 atomic built-ins.

We are missing a review on this doc update.

Thanks.


-- 
David Marchand

>
> Signed-off-by: Phil Yang <phil.yang@arm.com>
> Signed-off-by: Honnappa Nagarahalli <honnappa.nagarahalli@arm.com>
> ---
>  doc/guides/prog_guide/writing_efficient_code.rst | 59 +++++++++++++++++++++++-
>  1 file changed, 58 insertions(+), 1 deletion(-)
>
> diff --git a/doc/guides/prog_guide/writing_efficient_code.rst b/doc/guides/prog_guide/writing_efficient_code.rst
> index 849f63e..53a1ca1 100644
> --- a/doc/guides/prog_guide/writing_efficient_code.rst
> +++ b/doc/guides/prog_guide/writing_efficient_code.rst
> @@ -167,7 +167,13 @@ but with the added cost of lower throughput.
>  Locks and Atomic Operations
>  ---------------------------
>
> -Atomic operations imply a lock prefix before the instruction,
> +This section describes some key considerations when using locks and atomic
> +operations in the DPDK environment.
> +
> +Locks
> +~~~~~
> +
> +On x86, atomic operations imply a lock prefix before the instruction,
>  causing the processor's LOCK# signal to be asserted during execution of the following instruction.
>  This has a big impact on performance in a multicore environment.
>
> @@ -176,6 +182,57 @@ It can often be replaced by other solutions like per-lcore variables.
>  Also, some locking techniques are more efficient than others.
>  For instance, the Read-Copy-Update (RCU) algorithm can frequently replace simple rwlocks.
>
> +Atomic Operations: Use C11 Atomic Built-ins
> +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
> +
> +The generic DPDK rte_atomic operations are implemented with __sync built-ins.
> +These __sync built-ins result in full barriers on aarch64, which are
> +unnecessary in many use cases. They can be replaced by __atomic built-ins that
> +conform to the C11 memory model and provide finer memory order control.
> +
> +So replacing the rte_atomic operations with __atomic built-ins might improve
> +performance on aarch64 machines.
> +
> +Some typical optimization cases are listed below:
> +
> +Atomicity
> +^^^^^^^^^
> +
> +Some use cases require atomicity alone; the ordering of the memory operations
> +does not matter. For example, packet statistics counters need to be
> +incremented atomically but do not need any particular memory ordering.
> +So, RELAXED memory ordering is sufficient.
> +
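
As an illustration of the relaxed case described above, here is a minimal
sketch of a statistics counter built on the __atomic built-ins. The struct
and function names (pkt_stats, stats_add_rx, stats_read_rx) are hypothetical
and are not part of the patch or of any DPDK API.

```c
#include <stdint.h>

/* Hypothetical statistics block, for illustration only. */
struct pkt_stats {
	uint64_t rx_packets;
};

/*
 * Add n to the counter atomically. RELAXED guarantees the
 * read-modify-write is indivisible but imposes no ordering on
 * surrounding loads and stores.
 */
static inline void
stats_add_rx(struct pkt_stats *s, uint64_t n)
{
	__atomic_fetch_add(&s->rx_packets, n, __ATOMIC_RELAXED);
}

/* Read the counter; only an untorn value is needed, not ordering. */
static inline uint64_t
stats_read_rx(struct pkt_stats *s)
{
	return __atomic_load_n(&s->rx_packets, __ATOMIC_RELAXED);
}
```

A reader of such a counter may observe a slightly stale value, which is
normally acceptable for statistics.
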
> +One-way Barrier
> +^^^^^^^^^^^^^^^
> +
> +Some use cases allow memory operations to be reordered in one direction while
> +requiring memory ordering in the other direction.
> +
> +For example, the memory operations before the spinlock lock are allowed to
> +move into the critical section, but the memory operations in the critical
> +section are not allowed to move above the lock. In this case, the full memory
> +barrier in the compare-and-swap operation can be replaced with ACQUIRE memory
> +order. On the other hand, the memory operations after the spinlock unlock are
> +allowed to move into the critical section, but the memory operations in the
> +critical section are not allowed to move below the unlock. So the full barrier
> +in the store operation can be replaced with RELEASE memory order.
> +
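
The acquire/release pairing described above could look roughly like the
following. This is a sketch only: the sketch_spinlock_t type and the
sketch_* functions are hypothetical names, and the code is not claimed to
be the rte_spinlock implementation.

```c
/* Hypothetical spinlock type: 0 means free, 1 means held. */
typedef struct {
	int locked;
} sketch_spinlock_t;

/* Try once to take the lock; returns 1 on success, 0 otherwise. */
static inline int
sketch_spinlock_trylock(sketch_spinlock_t *sl)
{
	int exp = 0;

	/* ACQUIRE on success: later accesses cannot move above the lock. */
	return __atomic_compare_exchange_n(&sl->locked, &exp, 1, 0,
			__ATOMIC_ACQUIRE, __ATOMIC_RELAXED);
}

static inline void
sketch_spinlock_lock(sketch_spinlock_t *sl)
{
	while (!sketch_spinlock_trylock(sl)) {
		/* Spin with relaxed loads until the lock looks free. */
		while (__atomic_load_n(&sl->locked, __ATOMIC_RELAXED))
			;
	}
}

static inline void
sketch_spinlock_unlock(sketch_spinlock_t *sl)
{
	/* RELEASE: earlier accesses cannot move below the unlock. */
	__atomic_store_n(&sl->locked, 0, __ATOMIC_RELEASE);
}
```

The failure order of the compare-and-swap is RELAXED here because a failed
attempt does not enter the critical section, so it needs no ordering.
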
> +Reader-Writer Concurrency
> +^^^^^^^^^^^^^^^^^^^^^^^^^
> +
> +Lock-free reader-writer concurrency is one of the common use cases in DPDK.
> +
> +The payload, or the data that the writer wants to communicate to the reader,
> +can be written with RELAXED memory order. However, the guard variable should
> +be written with RELEASE memory order. This ensures that the store to the guard
> +variable is observable only after the store to the payload is observable.
> +
> +Correspondingly, on the reader side, the guard variable should be read
> +with ACQUIRE memory order. The payload, or the data the writer communicated,
> +can be read with RELAXED memory order. This ensures that, if the store to the
> +guard variable is observable, the store to the payload is also observable.
> +
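
The guard/payload pattern described above can be sketched as follows,
assuming a single writer that publishes once. The mailbox struct and the
mailbox_* functions are hypothetical names chosen for this example; they do
not come from the patch or from DPDK.

```c
#include <stdint.h>

/* Hypothetical single-slot mailbox: guard is 0 until payload is valid. */
struct mailbox {
	uint32_t payload;
	uint32_t guard;
};

/* Writer: store the payload, then publish it through the guard. */
static inline void
mailbox_publish(struct mailbox *m, uint32_t value)
{
	__atomic_store_n(&m->payload, value, __ATOMIC_RELAXED);
	/* RELEASE: the payload store is observable before the guard store. */
	__atomic_store_n(&m->guard, 1, __ATOMIC_RELEASE);
}

/* Reader: returns 1 and fills *out if a payload was published, else 0. */
static inline int
mailbox_try_read(struct mailbox *m, uint32_t *out)
{
	/* ACQUIRE: the payload load cannot move above the guard load. */
	if (__atomic_load_n(&m->guard, __ATOMIC_ACQUIRE) == 0)
		return 0;
	*out = __atomic_load_n(&m->payload, __ATOMIC_RELAXED);
	return 1;
}
```

If the reader sees guard set, the acquire load pairs with the writer's
release store, so the payload it then reads is the published value.
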
>  Coding Considerations
>  ---------------------
>
> --
> 2.7.4
>