From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from dpdk.org (dpdk.org [92.243.14.124]) by dpdk.space (Postfix) with ESMTP id E74F8A05D3 for ; Wed, 27 Mar 2019 06:53:13 +0100 (CET) Received: from [92.243.14.124] (localhost [127.0.0.1]) by dpdk.org (Postfix) with ESMTP id 116B45F44; Wed, 27 Mar 2019 06:52:55 +0100 (CET) Received: from foss.arm.com (usa-sjc-mx-foss1.foss.arm.com [217.140.101.70]) by dpdk.org (Postfix) with ESMTP id 4836B1150 for ; Wed, 27 Mar 2019 06:52:49 +0100 (CET) Received: from usa-sjc-imap-foss1.foss.arm.com (unknown [10.72.51.249]) by usa-sjc-mx-foss1.foss.arm.com (Postfix) with ESMTP id B3237165C; Tue, 26 Mar 2019 22:52:48 -0700 (PDT) Received: from qc2400f-1.austin.arm.com (qc2400f-1.austin.arm.com [10.118.13.209]) by usa-sjc-imap-foss1.foss.arm.com (Postfix) with ESMTPSA id 38CD03F614; Tue, 26 Mar 2019 22:52:48 -0700 (PDT) From: Honnappa Nagarahalli To: konstantin.ananyev@intel.com, stephen@networkplumber.org, paulmck@linux.ibm.com, marko.kovacevic@intel.com, dev@dpdk.org Cc: honnappa.nagarahalli@arm.com, gavin.hu@arm.com, dharmik.thakkar@arm.com, malvika.gupta@arm.com Date: Wed, 27 Mar 2019 00:52:27 -0500 Message-Id: <20190327055227.4950-4-honnappa.nagarahalli@arm.com> X-Mailer: git-send-email 2.17.1 In-Reply-To: <20190327055227.4950-1-honnappa.nagarahalli@arm.com> References: <20181122033055.3431-1-honnappa.nagarahalli@arm.com> <20190327055227.4950-1-honnappa.nagarahalli@arm.com> Subject: [dpdk-dev] [PATCH v2 3/3] doc/rcu: add lib_rcu documentation X-BeenThere: dev@dpdk.org X-Mailman-Version: 2.1.15 Precedence: list List-Id: DPDK patches and discussions List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: dev-bounces@dpdk.org Sender: "dev" Content-Type: text/plain; charset="UTF-8" Message-ID: <20190327055227.B8Du9by8oPePORQrIDhcBjeTNQaUu-uvjY-wSTTB7Lg@z> Add lib_rcu QSBR API and programmer guide documentation. Signed-off-by: Honnappa Nagarahalli Reviewed-by: Marko Kovacevic --- doc/api/doxy-api-index.md | 3 +- doc/api/doxy-api.conf.in | 1 + .../prog_guide/img/rcu_general_info.svg | 509 ++++++++++++++++++ doc/guides/prog_guide/index.rst | 1 + doc/guides/prog_guide/rcu_lib.rst | 179 ++++++ 5 files changed, 692 insertions(+), 1 deletion(-) create mode 100644 doc/guides/prog_guide/img/rcu_general_info.svg create mode 100644 doc/guides/prog_guide/rcu_lib.rst diff --git a/doc/api/doxy-api-index.md b/doc/api/doxy-api-index.md index d95ad566c..5c1f6b477 100644 --- a/doc/api/doxy-api-index.md +++ b/doc/api/doxy-api-index.md @@ -54,7 +54,8 @@ The public API headers are grouped by topics: [memzone] (@ref rte_memzone.h), [mempool] (@ref rte_mempool.h), [malloc] (@ref rte_malloc.h), - [memcpy] (@ref rte_memcpy.h) + [memcpy] (@ref rte_memcpy.h), + [rcu] (@ref rte_rcu_qsbr.h) - **timers**: [cycles] (@ref rte_cycles.h), diff --git a/doc/api/doxy-api.conf.in b/doc/api/doxy-api.conf.in index a365e669b..0b4c248a2 100644 --- a/doc/api/doxy-api.conf.in +++ b/doc/api/doxy-api.conf.in @@ -51,6 +51,7 @@ INPUT = @TOPDIR@/doc/api/doxy-api-index.md \ @TOPDIR@/lib/librte_port \ @TOPDIR@/lib/librte_power \ @TOPDIR@/lib/librte_rawdev \ + @TOPDIR@/lib/librte_rcu \ @TOPDIR@/lib/librte_reorder \ @TOPDIR@/lib/librte_ring \ @TOPDIR@/lib/librte_sched \ diff --git a/doc/guides/prog_guide/img/rcu_general_info.svg b/doc/guides/prog_guide/img/rcu_general_info.svg new file mode 100644 index 000000000..e7ca1dacb --- /dev/null +++ b/doc/guides/prog_guide/img/rcu_general_info.svg @@ -0,0 +1,509 @@ + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + Page-1 + + + + Sheet.3 + + + + Sheet.4 + + + + Sheet.5 + D1 + + + + D1 + + Sheet.6 + + + + Sheet.7 + + + + Sheet.8 + D2 + + + + D2 + + Sheet.9 + + + + Sheet.10 + Reader Thread 1 + + + + Reader Thread 1 + + Sheet.11 + + + + Sheet.12 + + + + Sheet.13 + D1 + + + + D1 + + Sheet.14 + + + + Sheet.15 + + + + Sheet.16 + D2 + + + + D2 + + Sheet.17 + + + + Sheet.18 + T 2 + + + + T 2 + + Sheet.19 + + + + Sheet.20 + + + + Sheet.21 + D1 + + + + D1 + + Sheet.22 + + + + Sheet.23 + + + + Sheet.24 + D2 + + + + D2 + + Sheet.25 + + + + Sheet.26 + T 3 + + + + T 3 + + Sheet.28 + Time + + + + Time + + Sheet.29 + + + + Sheet.30 + + + + Sheet.31 + Remove reference to entry1 + + + + Remove reference to entry1 + + Sheet.33 + + + + Sheet.34 + Delete + + + + Delete + + Sheet.35 + + + + Sheet.36 + + + + Sheet.37 + Delete entry1 from D1 + + + + Delete entry1 from D1 + + Sheet.38 + + + + Sheet.39 + + + + Sheet.40 + + + + Sheet.41 + Free memory for entries1 and 2 after every reader has gone th... + + + + Free memory for entries1 and 2 after every reader has gone through at least 1 quiescent state + + Sheet.46 + Free + + + + Free + + Sheet.48 + + + + Sheet.49 + + + + Sheet.50 + Grace Period + + + + Grace Period + + Sheet.51 + + + + Sheet.52 + + + + Sheet.53 + + + + Sheet.54 + Delete entry2 from D1 + + + + Delete entry2 from D1 + + Sheet.56 + + + + Sheet.57 + + + + Sheet.58 + + + + Sheet.59 + Critical sections + + + + Critical sections + + Sheet.60 + + + + Sheet.61 + + + + Sheet.62 + + + + Sheet.63 + Quiescent states + + + + Quiescent states + + Sheet.64 + + + + Sheet.65 + + + + Sheet.66 + + + + Sheet.67 + + + + Sheet.68 + + + + Sheet.69 + + + + Sheet.70 + while(1) loop + + + + while(1) loop + + Sheet.71 + + + + Sheet.72 + Reader thread is not accessing any shared data structure. i.e... + + + + Reader thread is not accessing any shared data structure.i.e. non critical section or quiescent state. + + Sheet.73 + Dx + + + + Dx + + Sheet.74 + Reader thread is accessing the shared data structure Dx. i.e.... + + + + Reader thread is accessing the shared data structure Dx.i.e. critical section. + + Sheet.75 + Point in time when the reference to the entry is removed usin... + + + + Point in time when the reference to the entry is removed using an atomic operation. + + Sheet.76 + Delete + + + + Delete + + Sheet.77 + Point in time when the writer can free the deleted entry. + + + + Point in time when the writer can free the deleted entry. + + Sheet.78 + Free + + + + Free + + Sheet.79 + Time duration between Delete and Free, during which memory ca... + + + + Time duration between Delete and Free, during which memory cannot be freed. + + Sheet.80 + Grace Period + + + + Grace Period + + Sheet.83 + + + + Sheet.84 + + + + Sheet.85 + + + + Sheet.86 + + + + Sheet.87 + + + + Sheet.88 + Remove reference to entry2 + + + + Remove reference to entry2 + + diff --git a/doc/guides/prog_guide/index.rst b/doc/guides/prog_guide/index.rst index 6726b1e8d..6fb3fb921 100644 --- a/doc/guides/prog_guide/index.rst +++ b/doc/guides/prog_guide/index.rst @@ -55,6 +55,7 @@ Programmer's Guide metrics_lib bpf_lib ipsec_lib + rcu_lib source_org dev_kit_build_system dev_kit_root_make_help diff --git a/doc/guides/prog_guide/rcu_lib.rst b/doc/guides/prog_guide/rcu_lib.rst new file mode 100644 index 000000000..dfe45fa62 --- /dev/null +++ b/doc/guides/prog_guide/rcu_lib.rst @@ -0,0 +1,179 @@ +.. SPDX-License-Identifier: BSD-3-Clause + Copyright(c) 2019 Arm Limited. + +.. _RCU_Library: + +RCU Library +============ + +Lock-less data structures provide scalability and determinism. +They enable use cases where locking may not be allowed +(for ex: real-time applications). + +In the following paras, the term 'memory' refers to memory allocated +by typical APIs like malloc or anything that is representative of +memory, for ex: an index of a free element array. + +Since these data structures are lock less, the writers and readers +are accessing the data structures concurrently. Hence, while removing +an element from a data structure, the writers cannot return the memory +to the allocator, without knowing that the readers are not +referencing that element/memory anymore. Hence, it is required to +separate the operation of removing an element into 2 steps: + +Delete: in this step, the writer removes the reference to the element from +the data structure but does not return the associated memory to the +allocator. This will ensure that new readers will not get a reference to +the removed element. Removing the reference is an atomic operation. + +Free(Reclaim): in this step, the writer returns the memory to the +memory allocator, only after knowing that all the readers have stopped +referencing the deleted element. + +This library helps the writer determine when it is safe to free the +memory. + +This library makes use of thread Quiescent State (QS). + +What is Quiescent State +----------------------- +Quiescent State can be defined as 'any point in the thread execution where the +thread does not hold a reference to shared memory'. It is up to the application +to determine its quiescent state. + +Let us consider the following diagram: + +.. figure:: img/rcu_general_info.* + + +As shown, reader thread 1 accesses data structures D1 and D2. When it is +accessing D1, if the writer has to remove an element from D1, the +writer cannot free the memory associated with that element immediately. +The writer can return the memory to the allocator only after the reader +stops referencing D1. In other words, reader thread RT1 has to enter +a quiescent state. + +Similarly, since reader thread 2 is also accessing D1, writer has to +wait till thread 2 enters quiescent state as well. + +However, the writer does not need to wait for reader thread 3 to enter +quiescent state. Reader thread 3 was not accessing D1 when the delete +operation happened. So, reader thread 1 will not have a reference to the +deleted entry. + +It can be noted that, the critical sections for D2 is a quiescent state +for D1. i.e. for a given data structure Dx, any point in the thread execution +that does not reference Dx is a quiescent state. + +Since memory is not freed immediately, there might be a need for +provisioning of additional memory, depending on the application requirements. + +Factors affecting RCU mechanism +--------------------------------- + +It is important to make sure that this library keeps the overhead of +identifying the end of grace period and subsequent freeing of memory, +to a minimum. The following paras explain how grace period and critical +section affect this overhead. + +The writer has to poll the readers to identify the end of grace period. +Polling introduces memory accesses and wastes CPU cycles. The memory +is not available for reuse during grace period. Longer grace periods +exasperate these conditions. + +The length of the critical section and the number of reader threads +is proportional to the duration of the grace period. Keeping the critical +sections smaller will keep the grace period smaller. However, keeping the +critical sections smaller requires additional CPU cycles (due to additional +reporting) in the readers. + +Hence, we need the characteristics of small grace period and large critical +section. This library addresses this by allowing the writer to do +other work without having to block till the readers report their quiescent +state. + +RCU in DPDK +----------- + +For DPDK applications, the start and end of while(1) loop (where no +references to shared data structures are kept) act as perfect quiescent +states. This will combine all the shared data structure accesses into a +single, large critical section which helps keep the overhead on the +reader side to a minimum. + +DPDK supports pipeline model of packet processing and service cores. +In these use cases, a given data structure may not be used by all the +workers in the application. The writer does not have to wait for all +the workers to report their quiescent state. To provide the required +flexibility, this library has a concept of QS variable. The application +can create one QS variable per data structure to help it track the +end of grace period for each data structure. This helps keep the grace +period to a minimum. + +How to use this library +----------------------- + +The application must allocate memory and initialize a QS variable. + +Application can call ``rte_rcu_qsbr_get_memsize`` to calculate the size +of memory to allocate. This API takes maximum number of reader threads, +using this variable, as a parameter. Currently, a maximum of 1024 threads +are supported. + +Further, the application can initialize a QS variable using the API +``rte_rcu_qsbr_init``. + +Each reader thread is assumed to have a unique thread ID. Currently, the +management of the thread ID (for ex: allocation/free) is left to the +application. The thread ID should be in the range of 0 to +maximum number of threads provided while creating the QS variable. +The application could also use lcore_id as the thread ID where applicable. + +``rte_rcu_qsbr_thread_register`` API will register a reader thread +to report its quiescent state. This can be called from a reader thread. +A control plane thread can also call this on behalf of a reader thread. +The reader thread must call ``rte_rcu_qsbr_thread_online`` API to start +reporting its quiescent state. + +Some of the use cases might require the reader threads to make +blocking API calls (for ex: while using eventdev APIs). The writer thread +should not wait for such reader threads to enter quiescent state. +The reader thread must call ``rte_rcu_qsbr_thread_offline`` API, before calling +blocking APIs. It can call ``rte_rcu_qsbr_thread_online`` API once the blocking +API call returns. + +The writer thread can trigger the reader threads to report their quiescent +state by calling the API ``rte_rcu_qsbr_start``. It is possible for multiple +writer threads to query the quiescent state status simultaneously. Hence, +``rte_rcu_qsbr_start`` returns a token to each caller. + +The writer thread must call ``rte_rcu_qsbr_check`` API with the token to +get the current quiescent state status. Option to block till all the reader +threads enter the quiescent state is provided. If this API indicates that +all the reader threads have entered the quiescent state, the application +can free the deleted entry. + +The APIs ``rte_rcu_qsbr_start`` and ``rte_rcu_qsbr_check`` are lock free. +Hence, they can be called concurrently from multiple writers even while +running as worker threads. + +The separation of triggering the reporting from querying the status provides +the writer threads flexibility to do useful work instead of blocking for the +reader threads to enter the quiescent state or go offline. This reduces the +memory accesses due to continuous polling for the status. + +``rte_rcu_qsbr_synchronize`` API combines the functionality of +``rte_rcu_qsbr_start`` and blocking ``rte_rcu_qsbr_check`` into a single API. +This API triggers the reader threads to report their quiescent state and +polls till all the readers enter the quiescent state or go offline. This +API does not allow the writer to do useful work while waiting and +introduces additional memory accesses due to continuous polling. + +The reader thread must call ``rte_rcu_qsbr_thread_offline`` and +``rte_rcu_qsbr_thread_unregister`` APIs to remove itself from reporting its +quiescent state. The ``rte_rcu_qsbr_check`` API will not wait for this reader +thread to report the quiescent state status anymore. + +The reader threads should call ``rte_rcu_qsbr_quiescent`` API to indicate that +they entered a quiescent state. This API checks if a writer has triggered a +quiescent state query and update the state accordingly. -- 2.17.1