From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from dpdk.org (dpdk.org [92.243.14.124]) by inbox.dpdk.org (Postfix) with ESMTP id BFC6CA0540; Mon, 13 Jul 2020 08:24:10 +0200 (CEST) Received: from [92.243.14.124] (localhost [127.0.0.1]) by dpdk.org (Postfix) with ESMTP id 2C3201C434; Mon, 13 Jul 2020 08:24:09 +0200 (CEST) Received: from foss.arm.com (foss.arm.com [217.140.110.172]) by dpdk.org (Postfix) with ESMTP id 57C141C238 for ; Mon, 13 Jul 2020 08:24:08 +0200 (CEST) Received: from usa-sjc-imap-foss1.foss.arm.com (unknown [10.121.207.14]) by usa-sjc-mx-foss1.foss.arm.com (Postfix) with ESMTP id D76A931B; Sun, 12 Jul 2020 23:24:07 -0700 (PDT) Received: from phil-VirtualBox.shanghai.arm.com (phil-VirtualBox.shanghai.arm.com [10.169.108.144]) by usa-sjc-imap-foss1.foss.arm.com (Postfix) with ESMTPA id A3F7D3F887; Sun, 12 Jul 2020 23:24:03 -0700 (PDT) From: Phil Yang To: thomas@monjalon.net, john.mcnamara@intel.com, Honnappa.Nagarahalli@arm.com, drc@linux.vnet.ibm.com, dev@dpdk.org Cc: david.marchand@redhat.com, jerinj@marvell.com, konstantin.ananyev@intel.com, Ola.Liljedahl@arm.com, bruce.richardson@intel.com, Ruifeng.Wang@arm.com, nd@arm.com, Honnappa Nagarahalli , Marko Kovacevic Date: Mon, 13 Jul 2020 14:23:40 +0800 Message-Id: <1594621423-14796-2-git-send-email-phil.yang@arm.com> X-Mailer: git-send-email 2.7.4 In-Reply-To: <1594621423-14796-1-git-send-email-phil.yang@arm.com> References: <1594115449-13750-1-git-send-email-phil.yang@arm.com> <1594621423-14796-1-git-send-email-phil.yang@arm.com> Subject: [dpdk-dev] [PATCH v7 1/3] doc: add generic atomic deprecation section X-BeenThere: dev@dpdk.org X-Mailman-Version: 2.1.15 Precedence: list List-Id: DPDK patches and discussions List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: dev-bounces@dpdk.org Sender: "dev" Add deprecating the generic rte_atomic_xx APIs to C11 atomic built-ins guide and examples. Signed-off-by: Phil Yang Signed-off-by: Honnappa Nagarahalli --- doc/guides/prog_guide/writing_efficient_code.rst | 64 +++++++++++++++++++++++- 1 file changed, 63 insertions(+), 1 deletion(-) diff --git a/doc/guides/prog_guide/writing_efficient_code.rst b/doc/guides/prog_guide/writing_efficient_code.rst index 849f63e..16d6188 100644 --- a/doc/guides/prog_guide/writing_efficient_code.rst +++ b/doc/guides/prog_guide/writing_efficient_code.rst @@ -167,7 +167,13 @@ but with the added cost of lower throughput. Locks and Atomic Operations --------------------------- -Atomic operations imply a lock prefix before the instruction, +This section describes some key considerations when using locks and atomic +operations in the DPDK environment. + +Locks +~~~~~ + +On x86, atomic operations imply a lock prefix before the instruction, causing the processor's LOCK# signal to be asserted during execution of the following instruction. This has a big impact on performance in a multicore environment. @@ -176,6 +182,62 @@ It can often be replaced by other solutions like per-lcore variables. Also, some locking techniques are more efficient than others. For instance, the Read-Copy-Update (RCU) algorithm can frequently replace simple rwlocks. +Atomic Operations: Use C11 Atomic Built-ins +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ + +DPDK generic rte_atomic operations are implemented by `__sync built-ins +`_. +These __sync built-ins result in full barriers on aarch64, which are unnecessary +in many use cases. They can be replaced by `__atomic built-ins +`_ +that conform to the C11 memory model and provide finer memory order control. + +So replacing the rte_atomic operations with __atomic built-ins might improve +performance for aarch64 machines. + +Some typical optimization cases are listed below: + +Atomicity +^^^^^^^^^ + +Some use cases require atomicity alone, the ordering of the memory operations +does not matter. For example the packets statistics in ``virtio_xmit()`` +function of ``vhost`` example application. It just updates the number of +transmitted packets, no subsequent logic depends on these counters. So the +RELAXED memory ordering is sufficient. + +One-way Barrier +^^^^^^^^^^^^^^^ + +Some use cases allow for memory reordering in one way while requiring memory +ordering in the other direction. + +For example, the memory operations before the ``rte_spinlock_lock()`` can move +to the critical section, but the memory operations in the critical section +cannot move above the lock. In this case, the full memory barrier in the +compare-and-swap operation can be replaced to ACQUIRE. On the other hand, the +memory operations after the ``rte_spinlock_unlock()`` can move to the critical +section, but the memory operations in the critical section cannot move below +the unlock. So the full barrier in the STORE operation can be replaced with +RELEASE. + +Reader-Writer Concurrency +^^^^^^^^^^^^^^^^^^^^^^^^^ + +Lock-free reader-writer concurrency is one of the common use cases in DPDK. + +The payload or the data that the writer wants to communicate to the reader, +can be written with RELAXED memory order. However, the guard variable should +be written with RELEASE memory order. This ensures that the store to guard +variable is observable only after the store to payload is observable. +Refer to ``rte_hash_cuckoo_insert_mw()`` for an example. + +Correspondingly, on the reader side, the guard variable should be read +with ACQUIRE memory order. The payload or the data the writer communicated, +can be read with RELAXED memory order. This ensures that, if the store to +guard variable is observable, the store to payload is also observable. +Refer to rte_hash ``search_one_bucket_lf()`` for an example. + Coding Considerations --------------------- -- 2.7.4