Subject: Re: [dpdk-dev] [RFC 0/3] tqs: add thread quiescent state library
From: Luca Boccassi
To: Honnappa Nagarahalli, Stephen Hemminger
Cc: "Van Haaren, Harry", dev@dpdk.org, nd, Dharmik Thakkar, Malvika Gupta, "Gavin Hu (Arm Technology China)"
Date: Fri, 30 Nov 2018 16:26:25 +0000
Message-ID: <1543595185.18215.5.camel@debian.org>
References: <20181122033055.3431-1-honnappa.nagarahalli@arm.com> <20181127142803.423c9b00@xeon-e3> <20181128152351.27fdebe3@xeon-e3>

On Fri, 2018-11-30 at 02:13 +0000, Honnappa Nagarahalli wrote:
> > > > > Mixed feelings about this one.
> > > > >
> > > > > Love to see RCU used for more things since it is much better
> > > > > than reader/writer locks for many applications. But hate to
> > > > > see DPDK reinventing every other library and not reusing code.
> > > > > Userspace RCU https://liburcu.org/ is widely supported by
> > > > > distros, more thoroughly tested and documented, and more
> > > > > flexible.
> > > > >
> > > > > The issue with many of these reinventions is a tradeoff of
> > > > > DPDK growing another dependency on an external library versus
> > > > > using common code.
> > >
> > > Agree with the dependency issues. Sometimes flexibility also
> > > causes confusion, and features that are not necessarily required
> > > for a targeted use case. I have seen that much of the
> > > functionality that can be left to the application is implemented
> > > as part of the library.
> > > I think having it in DPDK will give us control over the amount of
> > > capability this library will have, and freedom over the changes
> > > we would like to make to such a library. I also view DPDK as one
> > > package where all things required for data plane development are
> > > available.
> > >
> > > > > For RCU, the big issue for me is the testing and
> > > > > documentation of how to do RCU safely. Many people get it
> > > > > wrong!
> > > Hopefully, we all will do a better job collectively :)
> >
> > Reinventing RCU is not helping anyone.
>
> IMO, this depends on what rte_tqs has to offer and what the
> requirements are. Before starting this patch, I looked at the liburcu
> APIs. I have to say, fairly quickly (no offense) I concluded that
> they do not address DPDK's needs. I took a deeper look at the
> APIs/code in the past day and came to the same conclusion. My partial
> analysis (more APIs could be analysed, but I do not have the cycles
> at this point) is as follows:
>
> The reader threads' information is maintained in a linked list [1].
> This linked list is protected by a mutex lock [2]. Any
> additions/deletions/traversals of this list are blocking and cannot
> happen in parallel.
>
> The API 'synchronize_rcu' [3] (similar in functionality to the
> rte_tqs_check call) is a blocking call, and no option is provided to
> make it non-blocking. The writer spins cycles while waiting for the
> grace period to end.
>
> 'synchronize_rcu' also takes a grace-period lock [4]. If I have
> multiple writers running on data-plane threads, I cannot call this
> API to reclaim memory in the worker threads, as it would block the
> other worker threads. This means an extra thread is required (on the
> control plane?) to do the garbage collection, plus a method to push
> the pointers from the worker threads to that garbage-collection
> thread. It also means the time from delete to free increases, putting
> pressure on the amount of memory held up.
> Since this API cannot be called concurrently by multiple writers,
> each writer has to wait for the other writers' grace periods to end
> (i.e. multiple writer threads cannot overlap their grace periods).
>
> This API also has to traverse the linked list, which is not well
> suited to being called on the data plane.
>
> I have not gone too deeply into the rcu_thread_offline [5] API.
> The rcu_thread_offline API again needs to be used on the worker
> cores and does not look very optimal.
>
> I have glanced at rcu_quiescent_state [6]; it wakes up the thread
> calling 'synchronize_rcu', which seems like a good amount of code for
> the data plane.
>
> [1] https://github.com/urcu/userspace-rcu/blob/master/include/urcu/static/urcu-qsbr.h#L85
> [2] https://github.com/urcu/userspace-rcu/blob/master/src/urcu-qsbr.c#L68
> [3] https://github.com/urcu/userspace-rcu/blob/master/src/urcu-qsbr.c#L344
> [4] https://github.com/urcu/userspace-rcu/blob/master/src/urcu-qsbr.c#L58
> [5] https://github.com/urcu/userspace-rcu/blob/master/include/urcu/static/urcu-qsbr.h#L211
> [6] https://github.com/urcu/userspace-rcu/blob/master/include/urcu/static/urcu-qsbr.h#L193
>
> Coming to what is provided in rte_tqs:
>
> The synchronize_rcu functionality is split into 2 APIs: rte_tqs_start
> and rte_tqs_check. The reader data is maintained as an array.
>
> Both APIs are lock-free, allowing them to be called concurrently from
> multiple threads. This allows multiple writers to wait for their
> grace periods concurrently, as well as to overlap them. The
> rte_tqs_start API returns a token which separates the quiescent-state
> waiting of different writers. Hence, no writer waits for another
> writer's grace period to end.
> Since these 2 APIs are lock-free, they can also be called from
> writers running on worker cores, without the need for a separate
> garbage-collection thread.
>
> The separation into 2 APIs means writers do not have to spin cycles
> waiting for the grace period to end, which enables different ways of
> doing garbage collection. For example, a data-structure delete API
> could remove the entry from the data structure, call rte_tqs_start
> and return to the caller.
> On the next invocation of a library API, that API can call
> rte_tqs_check (which will most often indicate that the grace period
> is complete) and free the previously deleted entry.
>
> rte_tqs_update (mapping to rcu_quiescent_state) is pretty small and
> simple.
>
> The rte_tqs_register and rte_tqs_unregister APIs are lock-free, so
> additional APIs like rcu_thread_online and rcu_thread_offline are not
> required. The rte_tqs_unregister API (compared to rcu_thread_offline)
> is much simpler and more conducive to use in worker threads.

liburcu has many flavours already, qsbr being one of them. If none of
those is optimal for this use case, why not work with upstream to
either improve the existing flavours or add a new one?

You have the specific knowledge and expertise about the requirements
of the implementation for this use case, and they have the long-time
and extensive experience of maintaining the library on a wide range of
systems and use cases. Why not combine both?

I might be wrong, but to me nothing described above seems particularly
or uniquely tied to implementing a software dataplane, or to DPDK.
IMHO we should pool and share wherever possible, rather than build an
ecosystem closed onto itself.

Just my 2c of course!

> > DPDK needs to fix its dependency model, and just admit that it is
> > OK to build off more than glibc.
> >
> > Having used liburcu, it can be done in a small manner and really
> > isn't that confusing.
> >
> > Is your real issue the LGPL license of liburcu for your skittish
> > customers?
>
> I have not had any discussions on this. Customers are mainly focused
> on having a solution over which they have meaningful control. They
> want to be able to submit a patch and change things if required. For
> example: barriers for Arm [7] are not optimal. How easy is it to
> change this and get it into the distros (there are both internal and
> external factors here)?
It's just as easy (or as hard) as it is with DPDK. So it's either wait
for the distros to update, or rebuild locally.

-- 
Kind regards,
Luca Boccassi