From: David Marchand
Date: Thu, 16 Jul 2020 12:35:03 +0200
To: "Mcnamara, John", David Christensen, Jerin Jacob Kollanukkaran,
 "Ananyev, Konstantin", Bruce Richardson, Marko Kovacevic,
 Stephen Hemminger
Cc: Thomas Monjalon, Honnappa Nagarahalli, dev, Ola Liljedahl,
 "Ruifeng Wang (Arm Technology China)", nd, Phil Yang
Subject: Re: [dpdk-dev] [PATCH v8 1/3] doc: add optimizations using C11 atomic built-ins
References: <1594621423-14796-1-git-send-email-phil.yang@arm.com>
 <1594875225-5850-1-git-send-email-phil.yang@arm.com>
 <1594875225-5850-2-git-send-email-phil.yang@arm.com>
In-Reply-To: <1594875225-5850-2-git-send-email-phil.yang@arm.com>
List-Id: DPDK patches and discussions

Hello,

On Thu, Jul 16, 2020 at 6:58 AM Phil Yang wrote:
>
> Add information about possible optimizations using C11 atomic built-ins.

We are missing a review on this doc update.
Thanks.

-- 
David Marchand

>
> Signed-off-by: Phil Yang
> Signed-off-by: Honnappa Nagarahalli
> ---
>  doc/guides/prog_guide/writing_efficient_code.rst | 59 +++++++++++++++++++++++-
>  1 file changed, 58 insertions(+), 1 deletion(-)
>
> diff --git a/doc/guides/prog_guide/writing_efficient_code.rst b/doc/guides/prog_guide/writing_efficient_code.rst
> index 849f63e..53a1ca1 100644
> --- a/doc/guides/prog_guide/writing_efficient_code.rst
> +++ b/doc/guides/prog_guide/writing_efficient_code.rst
> @@ -167,7 +167,13 @@ but with the added cost of lower throughput.
>  Locks and Atomic Operations
>  ---------------------------
>
> -Atomic operations imply a lock prefix before the instruction,
> +This section describes some key considerations when using locks and atomic
> +operations in the DPDK environment.
> +
> +Locks
> +~~~~~
> +
> +On x86, atomic operations imply a lock prefix before the instruction,
>  causing the processor's LOCK# signal to be asserted during execution of the following instruction.
>  This has a big impact on performance in a multicore environment.
>
> @@ -176,6 +182,57 @@ It can often be replaced by other solutions like per-lcore variables.
>  Also, some locking techniques are more efficient than others.
>  For instance, the Read-Copy-Update (RCU) algorithm can frequently replace simple rwlocks.
>
> +Atomic Operations: Use C11 Atomic Built-ins
> +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
> +
> +DPDK generic rte_atomic operations are implemented by __sync built-ins. These
> +__sync built-ins result in full barriers on aarch64, which are unnecessary
> +in many use cases. They can be replaced by __atomic built-ins that conform to
> +the C11 memory model and provide finer memory order control.
> +
> +So replacing the rte_atomic operations with __atomic built-ins might improve
> +performance for aarch64 machines.
> +
> +Some typical optimization cases are listed below:
> +
> +Atomicity
> +^^^^^^^^^
> +
> +Some use cases require atomicity alone, the ordering of the memory operations
> +does not matter. For example, the packet statistics counters need to be
> +incremented atomically but do not need any particular memory ordering.
> +So, RELAXED memory ordering is sufficient.
> +
> +One-way Barrier
> +^^^^^^^^^^^^^^^
> +
> +Some use cases allow for memory reordering in one way while requiring memory
> +ordering in the other direction.
> +
> +For example, the memory operations before the spinlock lock are allowed to
> +move to the critical section, but the memory operations in the critical section
> +are not allowed to move above the lock. In this case, the full memory barrier
> +in the compare-and-swap operation can be replaced with ACQUIRE memory order.
> +On the other hand, the memory operations after the spinlock unlock are allowed
> +to move to the critical section, but the memory operations in the critical
> +section are not allowed to move below the unlock. So the full barrier in the
> +store operation can use RELEASE memory order.
> +
> +Reader-Writer Concurrency
> +^^^^^^^^^^^^^^^^^^^^^^^^^
> +
> +Lock-free reader-writer concurrency is one of the common use cases in DPDK.
> +
> +The payload or the data that the writer wants to communicate to the reader,
> +can be written with RELAXED memory order. However, the guard variable should
> +be written with RELEASE memory order. This ensures that the store to guard
> +variable is observable only after the store to payload is observable.
> +
> +Correspondingly, on the reader side, the guard variable should be read
> +with ACQUIRE memory order. The payload or the data the writer communicated,
> +can be read with RELAXED memory order. This ensures that, if the store to
> +guard variable is observable, the store to payload is also observable.
> +
>  Coding Considerations
>  ---------------------
>
> --
> 2.7.4
>
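
For anyone reviewing, the three cases described in the patch (atomicity
only, one-way barrier, reader-writer guard variable) could be sketched
with the GCC/Clang __atomic built-ins roughly as follows. This is only an
illustrative sketch, not code from the patch or from DPDK: every name in
it (rx_pkts, demo_spinlock, demo_msg and the demo_* helpers) is
hypothetical, and the self-check at the end runs in a single thread, so it
shows the call pattern rather than proving the ordering guarantees.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* All names below are hypothetical; none of them are DPDK APIs. */

/* Atomicity only: a statistics counter needs no particular ordering. */
static uint64_t rx_pkts;

static inline void stats_inc(uint64_t n)
{
	__atomic_fetch_add(&rx_pkts, n, __ATOMIC_RELAXED);
}

static inline uint64_t stats_read(void)
{
	return __atomic_load_n(&rx_pkts, __ATOMIC_RELAXED);
}

/* One-way barrier: a minimal spinlock (0 = free, 1 = taken). */
struct demo_spinlock {
	int locked;
};

static inline void demo_lock(struct demo_spinlock *sl)
{
	int expected = 0;

	/* ACQUIRE: critical-section accesses cannot move above the lock. */
	while (!__atomic_compare_exchange_n(&sl->locked, &expected, 1, false,
			__ATOMIC_ACQUIRE, __ATOMIC_RELAXED))
		expected = 0; /* a failed CAS overwrites 'expected' */
}

static inline void demo_unlock(struct demo_spinlock *sl)
{
	/* RELEASE: critical-section accesses cannot move below the unlock. */
	__atomic_store_n(&sl->locked, 0, __ATOMIC_RELEASE);
}

/* Reader-writer: a payload published through a guard variable. */
struct demo_msg {
	uint32_t payload;
	int ready; /* guard variable */
};

static inline void demo_publish(struct demo_msg *m, uint32_t v)
{
	__atomic_store_n(&m->payload, v, __ATOMIC_RELAXED);
	/* RELEASE: the payload store is observable before the guard store. */
	__atomic_store_n(&m->ready, 1, __ATOMIC_RELEASE);
}

static inline bool demo_consume(struct demo_msg *m, uint32_t *out)
{
	/* ACQUIRE: if the guard is seen set, the payload is visible too. */
	if (!__atomic_load_n(&m->ready, __ATOMIC_ACQUIRE))
		return false;
	*out = __atomic_load_n(&m->payload, __ATOMIC_RELAXED);
	return true;
}

/* Single-threaded sanity check of the call pattern; call it once. */
static inline void demo_self_check(void)
{
	struct demo_spinlock sl = { 0 };
	struct demo_msg m = { 0, 0 };
	uint32_t out = 0;

	stats_inc(3);
	stats_inc(4);
	assert(stats_read() == 7);

	demo_lock(&sl);
	demo_unlock(&sl);
	assert(sl.locked == 0);

	assert(!demo_consume(&m, &out));
	demo_publish(&m, 42);
	assert(demo_consume(&m, &out) && out == 42);
}
```

In a real multi-threaded use, demo_publish() would run on the writer lcore
and demo_consume() on the reader lcore; the RELEASE/ACQUIRE pair on the
guard variable is what lets the payload accesses themselves stay RELAXED.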