From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mail-pf1-f181.google.com (mail-pf1-f181.google.com [209.85.210.181]) by dpdk.org (Postfix) with ESMTP id 183CC5F12 for ; Tue, 27 Nov 2018 23:28:12 +0100 (CET) Received: by mail-pf1-f181.google.com with SMTP id u6so9091564pfh.11 for ; Tue, 27 Nov 2018 14:28:12 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=networkplumber-org.20150623.gappssmtp.com; s=20150623; h=date:from:to:cc:subject:message-id:in-reply-to:references :mime-version:content-transfer-encoding; bh=qTh83r+ZeX+FL6TGT9Pc04rnFq+Wb1/8ZN2e8kKCNFE=; b=S/yguTnBY6nzt+GDaBY71twJFOE1onOBvoBZNkkvZlwG1pkJBh57NXuN0Dk+v5esW+ tyrHS+2+F60jtk0zxvoUtXUTYiglhKFRrNHkuxKyWH0daAdOt45NvekR+ehcWrkdcwsV Z5iph+58JaaZW6vK4PfE5F3CdZ+4JvIKKxfN/DsRpVbfM2YR5cBJ7/uruXEuviNNmcG+ YlgRKDmhM8yRIYxqyfZaI4Bt+JuiC/+w+m+FAfMZfTNjq/BSzq4fZzb6iDTt1iUltRvM ItJtOKVvycOpLS9yB2a8DMiIllxg7BRz5pdana1GTl2Z9P62wAD04PFJ4NtoCBB+U4/d 8Gug== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:date:from:to:cc:subject:message-id:in-reply-to :references:mime-version:content-transfer-encoding; bh=qTh83r+ZeX+FL6TGT9Pc04rnFq+Wb1/8ZN2e8kKCNFE=; b=WBzdubvsYK0ZjvTrYsnNG3JhpOrt6DT7suAmTFAGvvEttaTZLTzlBQ6+LbO2oZWhbf LRDOLo2Wi1vzup+mqlDXk2ggu3MAztyRZSMIDZ1CeN123LWZmzOrASI9H+YLUEZ1xRK3 hy5+ROo4Wy1b2YI4CsLloDo+6uQNBqH6yxr1rtp6eMfRZn6hbwRuCPMoA0QjvcE6LyDJ qE/Cgeh2jZ5R8zYP3yImHZH4m2x8YP89a0/AiFqPbrgU99ey0ZkBMcpjvMqh8wsG53pS DO/A8YbDQpG0OhHtC9hEYFgXeUHlwg0dzj2/v1eUSmf+Suh6tI23baTjExmlRsWAiQ3E 3eew== X-Gm-Message-State: AA+aEWaISfO6QSzh7P/EZFlsV+ZRkbCGDKXrP5LQuUdnzry8yYkCVMjV OUtA5X3kQDtcoZhLUAqOBp4RNKZhqis= X-Google-Smtp-Source: AJdET5eC2hgeFKp4/T0cNSMB+bozqUZqg3sLu6Wd380at/5hcTgVIHko6CSmZrRiKNCBTE4p0AXgMA== X-Received: by 2002:a63:dd55:: with SMTP id g21mr30265415pgj.86.1543357690952; Tue, 27 Nov 2018 14:28:10 -0800 (PST) Received: from xeon-e3 (204-195-22-127.wavecable.com. [204.195.22.127]) by smtp.gmail.com with ESMTPSA id 64sm6099888pff.101.2018.11.27.14.28.10 (version=TLS1_2 cipher=ECDHE-RSA-CHACHA20-POLY1305 bits=256/256); Tue, 27 Nov 2018 14:28:10 -0800 (PST) Date: Tue, 27 Nov 2018 14:28:03 -0800 From: Stephen Hemminger To: Honnappa Nagarahalli Cc: dev@dpdk.org, nd@arm.com, dharmik.thakkar@arm.com, malvika.gupta@arm.com, gavin.hu@arm.com Message-ID: <20181127142803.423c9b00@xeon-e3> In-Reply-To: <20181122033055.3431-1-honnappa.nagarahalli@arm.com> References: <20181122033055.3431-1-honnappa.nagarahalli@arm.com> MIME-Version: 1.0 Content-Type: text/plain; charset=US-ASCII Content-Transfer-Encoding: 7bit Subject: Re: [dpdk-dev] [RFC 0/3] tqs: add thread quiescent state library X-BeenThere: dev@dpdk.org X-Mailman-Version: 2.1.15 Precedence: list List-Id: DPDK patches and discussions List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 27 Nov 2018 22:28:12 -0000 On Wed, 21 Nov 2018 21:30:52 -0600 Honnappa Nagarahalli wrote: > Lock-less data structures provide scalability and determinism. > They enable use cases where locking may not be allowed > (for ex: real-time applications). > > In the following paras, the term 'memory' refers to memory allocated > by typical APIs like malloc or anything that is representative of > memory, for ex: an index of a free element array. > > Since these data structures are lock less, the writers and readers > are accessing the data structures simultaneously. Hence, while removing > an element from a data structure, the writers cannot return the memory > to the allocator, without knowing that the readers are not > referencing that element/memory anymore. Hence, it is required to > separate the operation of removing an element into 2 steps: > > Delete: in this step, the writer removes the element from the > data structure but does not return the associated memory to the allocator. > This will ensure that new readers will not get a reference to the removed > element. Removing the reference is an atomic operation. > > Free: in this step, the writer returns the memory to the > memory allocator, only after knowing that all the readers have stopped > referencing the removed element. > > This library helps the writer determine when it is safe to free the > memory. > > This library makes use of Thread Quiescent State (TQS). TQS can be > defined as 'any point in the thread execution where the thread does > not hold a reference to shared memory'. It is upto the application to > determine its quiescent state. Let us consider the following diagram: > > Time --------------------------------------------------> > > | | > RT1 $++++****D1****+++***D2*|**+++|+++**D3*****++++$ > | | > RT2 $++++****D1****++|+**D2|***++++++**D3*****++++$ > | | > RT3 $++++****D1****+++***|D2***|++++++**D2*****++++$ > | | > |<--->| > Del | Free > | > Cannot free memory > during this period > > RTx - Reader thread > < and > - Start and end of while(1) loop > ***Dx*** - Reader thread is accessing the shared data structure Dx. > i.e. critical section. > +++ - Reader thread is not accessing any shared data structure. > i.e. non critical section or quiescent state. > Del - Point in time when the reference to the entry is removed using > atomic operation. > Free - Point in time when the writer can free the entry. > > As shown thread RT1 acesses data structures D1, D2 and D3. When it is > accessing D2, if the writer has to remove an element from D2, the > writer cannot return the memory associated with that element to the > allocator. The writer can return the memory to the allocator only after > the reader stops referencng D2. In other words, reader thread RT1 > has to enter a quiescent state. > > Similarly, since thread RT3 is also accessing D2, writer has to wait till > RT3 enters quiescent state as well. > > However, the writer does not need to wait for RT2 to enter quiescent state. > Thread RT2 was not accessing D2 when the delete operation happened. > So, RT2 will not get a reference to the deleted entry. > > It can be noted that, the critical sections for D2 and D3 are quiescent states > for D1. i.e. for a given data structure Dx, any point in the thread execution > that does not reference Dx is a quiescent state. > > For DPDK applications, the start and end of while(1) loop (where no shared > data structures are getting accessed) act as perfect quiescent states. This > will combine all the shared data structure accesses into a single critical > section and keeps the over head introduced by this library to the minimum. > > However, the length of the critical section and the number of reader threads > is proportional to the time taken to identify the end of critical section. > So, if the application desires, it should be possible to identify the end > of critical section for each data structure. > > To provide the required flexibility, this library has a concept of TQS > variable. The application can create one or more TQS variables to help it > track the end of one or more critical sections. > > The application can create a TQS variable using the API rte_tqs_alloc. > It takes a mask of lcore IDs that will report their quiescent states > using this variable. This mask can be empty to start with. > > rte_tqs_register_lcore API will register a reader thread to report its > quiescent state. This can be called from any control plane thread or from > the reader thread. The application can create a TQS variable with no reader > threads and add the threads dynamically using this API. > > The application can trigger the reader threads to report their quiescent > state status by calling the API rte_tqs_start. It is possible for multiple > writer threads to query the quiescent state status simultaneously. Hence, > rte_tqs_start returns a token to each caller. > > The application has to call rte_tqs_check API with the token to get the > current status. Option to block till all the threads enter the quiescent > state is provided. If this API indicates that all the threads have entered > the quiescent state, the application can free the deleted entry. > > The separation of triggering the reporting from querying the status provides > the writer threads flexibility to do useful work instead of waiting for the > reader threads to enter the quiescent state. > > rte_tqs_unregister_lcore API will remove a reader thread from reporting its > quiescent state using a TQS variable. The rte_tqs_check API will not wait > for this reader thread to report the quiescent state status anymore. > > Finally, a TQS variable can be deleted by calling rte_tqs_free API. > Application must make sure that the reader threads are not referencing the > TQS variable anymore before deleting it. > > The reader threads should call rte_tqs_update API to indicate that they > entered a quiescent state. This API checks if a writer has triggered a > quiescent state query and update the state accordingly. > > Next Steps: > 1) Add more test cases > 2) Convert to patch > 3) Incorporate feedback from community > 4) Add documentation > > Dharmik Thakkar (1): > test/tqs: Add API and functional tests > > Honnappa Nagarahalli (2): > log: add TQS log type > tqs: add thread quiescent state library > > config/common_base | 6 + > lib/Makefile | 2 + > lib/librte_eal/common/include/rte_log.h | 1 + > lib/librte_tqs/Makefile | 23 + > lib/librte_tqs/meson.build | 5 + > lib/librte_tqs/rte_tqs.c | 249 +++++++++++ > lib/librte_tqs/rte_tqs.h | 352 +++++++++++++++ > lib/librte_tqs/rte_tqs_version.map | 16 + > lib/meson.build | 2 +- > mk/rte.app.mk | 1 + > test/test/Makefile | 2 + > test/test/autotest_data.py | 6 + > test/test/meson.build | 5 +- > test/test/test_tqs.c | 540 ++++++++++++++++++++++++ > 14 files changed, 1208 insertions(+), 2 deletions(-) > create mode 100644 lib/librte_tqs/Makefile > create mode 100644 lib/librte_tqs/meson.build > create mode 100644 lib/librte_tqs/rte_tqs.c > create mode 100644 lib/librte_tqs/rte_tqs.h > create mode 100644 lib/librte_tqs/rte_tqs_version.map > create mode 100644 test/test/test_tqs.c > Mixed feelings about this one. Love to see RCU used for more things since it is much better than reader/writer locks for many applications. But hate to see DPDK reinventing every other library and not reusing code. Userspace RCU https://liburcu.org/ is widely supported by distro's, more throughly tested and documented, and more flexiple. The issue with many of these reinventions is a tradeoff of DPDK growing another dependency on external library versus using common code. For RCU, the big issue for me is the testing and documentation of how to do RCU safely. Many people get it wrong!