From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from EUR04-HE1-obe.outbound.protection.outlook.com (mail-eopbgr70073.outbound.protection.outlook.com [40.107.7.73]) by dpdk.org (Postfix) with ESMTP id 345D55A for ; Tue, 26 Mar 2019 05:35:14 +0100 (CET) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=armh.onmicrosoft.com; s=selector1-arm-com; h=From:Date:Subject:Message-ID:Content-Type:MIME-Version:X-MS-Exchange-SenderADCheck; bh=sOEh+2Il9a138Uvc8z0/Q5vfpS+HfmHmHL3fHnDFtuQ=; b=fmVGy8YXc7kGJL/eO9WNPw0Ek2tlI0OcQVtcD0USqjsT3vftkhlG0qz7rd46GwL7JUxjjQX38cWLWQ2EET9peatKi9BsLN7zDv73GTEgIKzo/m1IAp2AtzZm/dpCmJBeZj0J0UUGRHBTSUWOrDDeGgs+0v1yjDOnoDxY1LAc5ZE= Received: from AM6PR08MB3672.eurprd08.prod.outlook.com (20.177.115.76) by AM6PR08MB5014.eurprd08.prod.outlook.com (10.255.122.18) with Microsoft SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id 15.20.1730.15; Tue, 26 Mar 2019 04:35:12 +0000 Received: from AM6PR08MB3672.eurprd08.prod.outlook.com ([fe80::4d90:78f1:e670:14d5]) by AM6PR08MB3672.eurprd08.prod.outlook.com ([fe80::4d90:78f1:e670:14d5%3]) with mapi id 15.20.1730.019; Tue, 26 Mar 2019 04:35:12 +0000 From: Honnappa Nagarahalli To: "Ananyev, Konstantin" , "stephen@networkplumber.org" , "paulmck@linux.ibm.com" , "dev@dpdk.org" CC: "Gavin Hu (Arm Technology China)" , Dharmik Thakkar , Malvika Gupta , nd , nd Thread-Topic: [PATCH 1/3] rcu: add RCU library supporting QSBR mechanism Thread-Index: AQHU3hBhNhXVbW/WVkOFAdc0mqwLiaYX1KDAgAUEG6A= Date: Tue, 26 Mar 2019 04:35:12 +0000 Message-ID: References: <20181122033055.3431-1-honnappa.nagarahalli@arm.com> <20190319045228.46879-1-honnappa.nagarahalli@arm.com> <20190319045228.46879-2-honnappa.nagarahalli@arm.com> <2601191342CEEE43887BDE71AB977258013655ED5C@irsmsx105.ger.corp.intel.com> In-Reply-To: <2601191342CEEE43887BDE71AB977258013655ED5C@irsmsx105.ger.corp.intel.com> Accept-Language: en-US Content-Language: en-US X-MS-Has-Attach: X-MS-TNEF-Correlator: authentication-results: spf=none (sender IP is ) smtp.mailfrom=Honnappa.Nagarahalli@arm.com; x-originating-ip: [217.140.111.135] x-ms-publictraffictype: Email x-ms-office365-filtering-correlation-id: cc67e8c7-c5d8-43a7-d429-08d6b1a46f7e x-ms-office365-filtering-ht: Tenant x-microsoft-antispam: BCL:0; PCL:0; RULEID:(2390118)(7020095)(4652040)(8989299)(5600127)(711020)(4605104)(4618075)(4534185)(4627221)(201703031133081)(201702281549075)(8990200)(2017052603328)(7153060)(7193020); SRVR:AM6PR08MB5014; x-ms-traffictypediagnostic: AM6PR08MB5014: x-ms-exchange-purlcount: 1 nodisclaimer: True x-microsoft-antispam-prvs: x-forefront-prvs: 09888BC01D x-forefront-antispam-report: SFV:NSPM; SFS:(10009020)(346002)(39860400002)(396003)(366004)(136003)(376002)(189003)(199004)(52314003)(8676002)(6436002)(8936002)(6506007)(6116002)(11346002)(71190400001)(3846002)(74316002)(476003)(446003)(4326008)(99286004)(76176011)(7736002)(72206003)(68736007)(25786009)(53936002)(6246003)(2501003)(966005)(81156014)(14454004)(102836004)(105586002)(478600001)(33656002)(7696005)(305945005)(55016002)(9686003)(53946003)(6306002)(97736004)(81166006)(86362001)(5660300002)(26005)(2906002)(110136005)(54906003)(30864003)(229853002)(93886005)(71200400001)(106356001)(486006)(52536014)(66066001)(14444005)(316002)(256004)(186003)(2201001); DIR:OUT; SFP:1101; SCL:1; SRVR:AM6PR08MB5014; H:AM6PR08MB3672.eurprd08.prod.outlook.com; FPR:; SPF:None; LANG:en; PTR:InfoNoRecords; MX:1; A:1; received-spf: None (protection.outlook.com: arm.com does not designate permitted sender hosts) x-ms-exchange-senderadcheck: 1 x-microsoft-antispam-message-info: pxiDul+Z3Wqh5cN+PEGU1F0V53/RLPOd8j15/ewMS+nrQqm9f5IRDNVwciVmVTQa6ezxsmeZGN8/PmNZe5swwyLTHVViFVl99ZUIvbj/dyE4RReF9/qqLi6oDhmeGXwk4uMVmlLlTj5p4wXLblv53QIGKOWX4LPE9i0iWlcZdqbnbghnWIQ3L4w3vnOhfpYu2ogmOIk8CIHsQ6BtvKGnsgpTAPZO9ssjK2nM1JWtEDiKlLasxtOO0B8e/ZhAFH2Rtgsk+44Aw4/CuEf9IkkXDZOsOHGAosUtlSXWaCDj0SQ1PmU1r/y8G3RlkHs3K2r3c9pHtcJ7nTUMvLpwmzPmvLPTmchZrkxILYD3SqlNvANsFwuVAvg/+icQ/fpkj4pR9qfraKSoAZ61ulUPeLhUwbmSy7RCM4G6w2nvjeKF8Dc= Content-Type: text/plain; charset="us-ascii" Content-Transfer-Encoding: quoted-printable MIME-Version: 1.0 X-OriginatorOrg: arm.com X-MS-Exchange-CrossTenant-Network-Message-Id: cc67e8c7-c5d8-43a7-d429-08d6b1a46f7e X-MS-Exchange-CrossTenant-originalarrivaltime: 26 Mar 2019 04:35:12.1308 (UTC) X-MS-Exchange-CrossTenant-fromentityheader: Hosted X-MS-Exchange-CrossTenant-id: f34e5979-57d9-4aaa-ad4d-b122a662184d X-MS-Exchange-CrossTenant-mailboxtype: HOSTED X-MS-Exchange-Transport-CrossTenantHeadersStamped: AM6PR08MB5014 Subject: Re: [dpdk-dev] [PATCH 1/3] rcu: add RCU library supporting QSBR mechanism X-BeenThere: dev@dpdk.org X-Mailman-Version: 2.1.15 Precedence: list List-Id: DPDK patches and discussions List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 26 Mar 2019 04:35:14 -0000 > Hi Honnappa, >=20 > > diff --git a/lib/librte_rcu/rte_rcu_qsbr.c > > b/lib/librte_rcu/rte_rcu_qsbr.c new file mode 100644 index > > 000000000..0fc4515ea > > --- /dev/null > > +++ b/lib/librte_rcu/rte_rcu_qsbr.c > > @@ -0,0 +1,99 @@ > > +/* SPDX-License-Identifier: BSD-3-Clause > > + * > > + * Copyright (c) 2018 Arm Limited > > + */ > > + > > +#include > > +#include > > +#include > > +#include > > + > > +#include > > +#include > > +#include > > +#include > > +#include > > +#include > > +#include > > +#include > > +#include > > +#include > > + > > +#include "rte_rcu_qsbr.h" > > + > > +/* Get the memory size of QSBR variable */ size_t __rte_experimental > > +rte_rcu_qsbr_get_memsize(uint32_t max_threads) { > > + size_t sz; > > + > > + RTE_ASSERT(max_threads =3D=3D 0); >=20 > Here and in all similar places: > assert() will abort when its condition will be evaluated to false. > So it should be max_threads !=3D 0. Thanks for this comment. Enabling RTE_ENABLE_ASSERT resulted in more proble= ms. I will fix in the next version. > Also it a public and non-datapath function. > Calling assert() for invalid input parameter - seems way too extreme. > Why not just return error to the caller? Ok, I will change it. >=20 > > + > > + sz =3D sizeof(struct rte_rcu_qsbr); > > + > > + /* Add the size of quiescent state counter array */ > > + sz +=3D sizeof(struct rte_rcu_qsbr_cnt) * max_threads; > > + > > + return RTE_ALIGN(sz, RTE_CACHE_LINE_SIZE); } > > + > > +/* Initialize a quiescent state variable */ void __rte_experimental > > +rte_rcu_qsbr_init(struct rte_rcu_qsbr *v, uint32_t max_threads) { > > + RTE_ASSERT(v =3D=3D NULL); > > + > > + memset(v, 0, rte_rcu_qsbr_get_memsize(max_threads)); > > + v->m_threads =3D max_threads; > > + v->num_elems =3D RTE_ALIGN_MUL_CEIL(max_threads, > > + RTE_QSBR_THRID_ARRAY_ELM_SIZE) / > > + RTE_QSBR_THRID_ARRAY_ELM_SIZE; > > + v->token =3D RTE_QSBR_CNT_INIT; > > +} > > + > > +/* Dump the details of a single quiescent state variable to a file. > > +*/ void __rte_experimental rte_rcu_qsbr_dump(FILE *f, struct > > +rte_rcu_qsbr *v) { > > + uint64_t bmap; > > + uint32_t i, t; > > + > > + RTE_ASSERT(v =3D=3D NULL || f =3D=3D NULL); > > + > > + fprintf(f, "\nQuiescent State Variable @%p\n", v); > > + > > + fprintf(f, " QS variable memory size =3D %lu\n", > > + rte_rcu_qsbr_get_memsize(v->m_threads)); > > + fprintf(f, " Given # max threads =3D %u\n", v->m_threads); > > + > > + fprintf(f, " Registered thread ID mask =3D 0x"); > > + for (i =3D 0; i < v->num_elems; i++) > > + fprintf(f, "%lx", __atomic_load_n(&v->reg_thread_id[i], > > + __ATOMIC_ACQUIRE)); > > + fprintf(f, "\n"); > > + > > + fprintf(f, " Token =3D %lu\n", > > + __atomic_load_n(&v->token, __ATOMIC_ACQUIRE)); > > + > > + fprintf(f, "Quiescent State Counts for readers:\n"); > > + for (i =3D 0; i < v->num_elems; i++) { > > + bmap =3D __atomic_load_n(&v->reg_thread_id[i], > __ATOMIC_ACQUIRE); > > + while (bmap) { > > + t =3D __builtin_ctzl(bmap); > > + fprintf(f, "thread ID =3D %d, count =3D %lu\n", t, > > + __atomic_load_n( > > + &RTE_QSBR_CNT_ARRAY_ELM(v, i)- > >cnt, > > + __ATOMIC_RELAXED)); > > + bmap &=3D ~(1UL << t); > > + } > > + } > > +} > > + > > +int rcu_log_type; > > + > > +RTE_INIT(rte_rcu_register) > > +{ > > + rcu_log_type =3D rte_log_register("lib.rcu"); > > + if (rcu_log_type >=3D 0) > > + rte_log_set_level(rcu_log_type, RTE_LOG_ERR); } > > diff --git a/lib/librte_rcu/rte_rcu_qsbr.h > > b/lib/librte_rcu/rte_rcu_qsbr.h new file mode 100644 index > > 000000000..83943f751 > > --- /dev/null > > +++ b/lib/librte_rcu/rte_rcu_qsbr.h > > @@ -0,0 +1,511 @@ > > +/* SPDX-License-Identifier: BSD-3-Clause > > + * Copyright (c) 2018 Arm Limited > > + */ > > + > > +#ifndef _RTE_RCU_QSBR_H_ > > +#define _RTE_RCU_QSBR_H_ > > + > > +/** > > + * @file > > + * RTE Quiescent State Based Reclamation (QSBR) > > + * > > + * Quiescent State (QS) is any point in the thread execution > > + * where the thread does not hold a reference to a data structure > > + * in shared memory. While using lock-less data structures, the > > +writer > > + * can safely free memory once all the reader threads have entered > > + * quiescent state. > > + * > > + * This library provides the ability for the readers to report > > +quiescent > > + * state and for the writers to identify when all the readers have > > + * entered quiescent state. > > + */ > > + > > +#ifdef __cplusplus > > +extern "C" { > > +#endif > > + > > +#include > > +#include > > +#include > > +#include > > +#include > > +#include > > +#include > > + > > +extern int rcu_log_type; > > + > > +#if RTE_LOG_DP_LEVEL >=3D RTE_LOG_DEBUG #define RCU_DP_LOG(level, > fmt, > > +args...) \ > > + rte_log(RTE_LOG_ ## level, rcu_log_type, \ > > + "%s(): " fmt "\n", __func__, ## args) #else #define > > +RCU_DP_LOG(level, fmt, args...) #endif >=20 > Why do you need that? > Can't you use RTE_LOG_DP() instead? RTE_LOG_DP is for static log types such as RTE_LOGTYPE_EAL, RTE_LOGTYPE_MBU= F etc. Use of static log type in RCU was rejected earlier. Hence, I am usin= g the dynamic log types. >=20 > > + > > +/* Registered thread IDs are stored as a bitmap of 64b element array. > > + * Given thread id needs to be converted to index into the array and > > + * the id within the array element. > > + */ > > +#define RTE_RCU_MAX_THREADS 1024 > > +#define RTE_QSBR_THRID_ARRAY_ELM_SIZE (sizeof(uint64_t) * 8) #define > > +RTE_QSBR_THRID_ARRAY_ELEMS \ > > + (RTE_ALIGN_MUL_CEIL(RTE_RCU_MAX_THREADS, \ > > + RTE_QSBR_THRID_ARRAY_ELM_SIZE) / > RTE_QSBR_THRID_ARRAY_ELM_SIZE) > > +#define RTE_QSBR_THRID_INDEX_SHIFT 6 #define RTE_QSBR_THRID_MASK > 0x3f > > +#define RTE_QSBR_THRID_INVALID 0xffffffff > > + > > +/* Worker thread counter */ > > +struct rte_rcu_qsbr_cnt { > > + uint64_t cnt; > > + /**< Quiescent state counter. Value 0 indicates the thread is > > +offline */ } __rte_cache_aligned; > > + > > +#define RTE_QSBR_CNT_ARRAY_ELM(v, i) (((struct rte_rcu_qsbr_cnt *)(v > > ++ 1)) + i) >=20 > You can probably add > struct rte_rcu_qsbr_cnt cnt[0]; > at the end of struct rte_rcu_qsbr, then wouldn't need macro above. ok >=20 > > +#define RTE_QSBR_CNT_THR_OFFLINE 0 > > +#define RTE_QSBR_CNT_INIT 1 > > + > > +/** > > + * RTE thread Quiescent State structure. > > + * Quiescent state counter array (array of 'struct > > +rte_rcu_qsbr_cnt'), > > + * whose size is dependent on the maximum number of reader threads > > + * (m_threads) using this variable is stored immediately following > > + * this structure. > > + */ > > +struct rte_rcu_qsbr { > > + uint64_t token __rte_cache_aligned; > > + /**< Counter to allow for multiple simultaneous QS queries */ > > + > > + uint32_t num_elems __rte_cache_aligned; > > + /**< Number of elements in the thread ID array */ > > + uint32_t m_threads; > > + /**< Maximum number of threads this RCU variable will use */ > > + > > + uint64_t reg_thread_id[RTE_QSBR_THRID_ARRAY_ELEMS] > __rte_cache_aligned; > > + /**< Registered thread IDs are stored in a bitmap array */ >=20 >=20 > As I understand you ended up with fixed size array to avoid 2 variable si= ze > arrays in this struct? Yes > Is that big penalty for register/unregister() to either store a pointer t= o bitmap, > or calculate it based on num_elems value? In the last RFC I sent out [1], I tested the impact of having non-fixed siz= e array. There 'was' a performance degradation in most of the performance t= ests. The issue was with calculating the address of per thread QSBR counter= s (not with the address calculation of the bitmap). With the current patch,= I do not see the performance difference (the difference between the RFC an= d this patch are the memory orderings, they are masking any perf gain from = having a fixed array). However, I have kept the fixed size array as the gen= erated code does not have additional calculations to get the address of qsb= r counter array elements. [1] http://mails.dpdk.org/archives/dev/2019-February/125029.html > As another thought - do we really need bitmap at all? The bit map is helping avoid accessing all the elements in rte_rcu_qsbr_cnt= array (as you have mentioned below). This provides the ability to scale th= e number of threads dynamically. For ex: an application can create a qsbr v= ariable with 48 max threads, but currently only 2 threads are active (due t= o traffic conditions). > Might it is possible to sotre register value for each thread inside it's > rte_rcu_qsbr_cnt: > struct rte_rcu_qsbr_cnt {uint64_t cnt; uint32_t register;} > __rte_cache_aligned; ? > That would cause check() to walk through all elems in rte_rcu_qsbr_cnt ar= ray, > but from other side would help to avoid cache conflicts for register/unre= gister. With the addition of rte_rcu_qsbr_thread_online/offline APIs, the register/= unregister APIs are not in critical path anymore. Hence, the cache conflict= s are fine. The online/offline APIs work on thread specific cache lines and= these are in the critical path. >=20 > > +} __rte_cache_aligned; > > + > > +/** > > + * @warning > > + * @b EXPERIMENTAL: this API may change without prior notice > > + * > > + * Return the size of the memory occupied by a Quiescent State variabl= e. > > + * > > + * @param max_threads > > + * Maximum number of threads reporting quiescent state on this varia= ble. > > + * @return > > + * Size of memory in bytes required for this QS variable. > > + */ > > +size_t __rte_experimental > > +rte_rcu_qsbr_get_memsize(uint32_t max_threads); > > + > > +/** > > + * @warning > > + * @b EXPERIMENTAL: this API may change without prior notice > > + * > > + * Initialize a Quiescent State (QS) variable. > > + * > > + * @param v > > + * QS variable > > + * @param max_threads > > + * Maximum number of threads reporting QS on this variable. > > + * > > + */ > > +void __rte_experimental > > +rte_rcu_qsbr_init(struct rte_rcu_qsbr *v, uint32_t max_threads); > > + > > +/** > > + * @warning > > + * @b EXPERIMENTAL: this API may change without prior notice > > + * > > + * Register a reader thread to report its quiescent state > > + * on a QS variable. > > + * > > + * This is implemented as a lock-free function. It is multi-thread > > + * safe. > > + * Any reader thread that wants to report its quiescent state must > > + * call this API. This can be called during initialization or as part > > + * of the packet processing loop. > > + * > > + * Note that rte_rcu_qsbr_thread_online must be called before the > > + * thread updates its QS using rte_rcu_qsbr_update. > > + * > > + * @param v > > + * QS variable > > + * @param thread_id > > + * Reader thread with this thread ID will report its quiescent state= on > > + * the QS variable. > > + */ > > +static __rte_always_inline void __rte_experimental > > +rte_rcu_qsbr_thread_register(struct rte_rcu_qsbr *v, unsigned int > > +thread_id) { > > + unsigned int i, id; > > + > > + RTE_ASSERT(v =3D=3D NULL || thread_id >=3D v->max_threads); > > + > > + id =3D thread_id & RTE_QSBR_THRID_MASK; > > + i =3D thread_id >> RTE_QSBR_THRID_INDEX_SHIFT; > > + > > + /* Release the new register thread ID to other threads > > + * calling rte_rcu_qsbr_check. > > + */ > > + __atomic_fetch_or(&v->reg_thread_id[i], 1UL << id, > > +__ATOMIC_RELEASE); } > > + > > +/** > > + * @warning > > + * @b EXPERIMENTAL: this API may change without prior notice > > + * > > + * Remove a reader thread, from the list of threads reporting their > > + * quiescent state on a QS variable. > > + * > > + * This is implemented as a lock-free function. It is multi-thread saf= e. > > + * This API can be called from the reader threads during shutdown. > > + * Ongoing QS queries will stop waiting for the status from this > > + * unregistered reader thread. > > + * > > + * @param v > > + * QS variable > > + * @param thread_id > > + * Reader thread with this thread ID will stop reporting its quiesce= nt > > + * state on the QS variable. > > + */ > > +static __rte_always_inline void __rte_experimental > > +rte_rcu_qsbr_thread_unregister(struct rte_rcu_qsbr *v, unsigned int > > +thread_id) { > > + unsigned int i, id; > > + > > + RTE_ASSERT(v =3D=3D NULL || thread_id >=3D v->max_threads); > > + > > + id =3D thread_id & RTE_QSBR_THRID_MASK; > > + i =3D thread_id >> RTE_QSBR_THRID_INDEX_SHIFT; > > + > > + /* Make sure the removal of the thread from the list of > > + * reporting threads is visible before the thread > > + * does anything else. > > + */ > > + __atomic_fetch_and(&v->reg_thread_id[i], > > + ~(1UL << id), __ATOMIC_RELEASE); > > +} > > + > > +/** > > + * @warning > > + * @b EXPERIMENTAL: this API may change without prior notice > > + * > > + * Add a registered reader thread, to the list of threads reporting > > +their > > + * quiescent state on a QS variable. > > + * > > + * This is implemented as a lock-free function. It is multi-thread > > + * safe. > > + * > > + * Any registered reader thread that wants to report its quiescent > > +state must > > + * call this API before calling rte_rcu_qsbr_update. This can be > > +called > > + * during initialization or as part of the packet processing loop. > > + * > > + * The reader thread must call rte_rcu_thread_offline API, before > > + * calling any functions that block, to ensure that > > +rte_rcu_qsbr_check > > + * API does not wait indefinitely for the reader thread to update its = QS. > > + * > > + * The reader thread must call rte_rcu_thread_online API, after the > > +blocking > > + * function call returns, to ensure that rte_rcu_qsbr_check API > > + * waits for the reader thread to update its QS. > > + * > > + * @param v > > + * QS variable > > + * @param thread_id > > + * Reader thread with this thread ID will report its quiescent state= on > > + * the QS variable. > > + */ > > +static __rte_always_inline void __rte_experimental > > +rte_rcu_qsbr_thread_online(struct rte_rcu_qsbr *v, unsigned int > > +thread_id) { > > + uint64_t t; > > + > > + RTE_ASSERT(v =3D=3D NULL || thread_id >=3D v->max_threads); > > + > > + /* Copy the current value of token. > > + * The fence at the end of the function will ensure that > > + * the following will not move down after the load of any shared > > + * data structure. > > + */ > > + t =3D __atomic_load_n(&v->token, __ATOMIC_RELAXED); > > + > > + /* __atomic_store_n(cnt, __ATOMIC_RELAXED) is used to ensure > > + * 'cnt' (64b) is accessed atomically. > > + */ > > + __atomic_store_n(&RTE_QSBR_CNT_ARRAY_ELM(v, thread_id)->cnt, > > + t, __ATOMIC_RELAXED); > > + > > + /* The subsequent load of the data structure should not > > + * move above the store. Hence a store-load barrier > > + * is required. > > + * If the load of the data structure moves above the store, > > + * writer might not see that the reader is online, even though > > + * the reader is referencing the shared data structure. > > + */ > > + __atomic_thread_fence(__ATOMIC_SEQ_CST); >=20 > If it has to generate a proper memory-barrier here anyway, could it use > rte_smp_mb() here? > At least for IA it would generate more lightweight one. I have used the C++11 memory model functions. I prefer to not mix it with b= arriers. Does ICC generate lightweight code for the above fence? Is it ok to add rte_smp_mb for x86 alone? > Konstantin >=20 > > +} > > + From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from dpdk.org (dpdk.org [92.243.14.124]) by dpdk.space (Postfix) with ESMTP id 5C342A05D3 for ; Tue, 26 Mar 2019 05:35:17 +0100 (CET) Received: from [92.243.14.124] (localhost [127.0.0.1]) by dpdk.org (Postfix) with ESMTP id 4951710A3; Tue, 26 Mar 2019 05:35:16 +0100 (CET) Received: from EUR04-HE1-obe.outbound.protection.outlook.com (mail-eopbgr70073.outbound.protection.outlook.com [40.107.7.73]) by dpdk.org (Postfix) with ESMTP id 345D55A for ; Tue, 26 Mar 2019 05:35:14 +0100 (CET) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=armh.onmicrosoft.com; s=selector1-arm-com; h=From:Date:Subject:Message-ID:Content-Type:MIME-Version:X-MS-Exchange-SenderADCheck; bh=sOEh+2Il9a138Uvc8z0/Q5vfpS+HfmHmHL3fHnDFtuQ=; b=fmVGy8YXc7kGJL/eO9WNPw0Ek2tlI0OcQVtcD0USqjsT3vftkhlG0qz7rd46GwL7JUxjjQX38cWLWQ2EET9peatKi9BsLN7zDv73GTEgIKzo/m1IAp2AtzZm/dpCmJBeZj0J0UUGRHBTSUWOrDDeGgs+0v1yjDOnoDxY1LAc5ZE= Received: from AM6PR08MB3672.eurprd08.prod.outlook.com (20.177.115.76) by AM6PR08MB5014.eurprd08.prod.outlook.com (10.255.122.18) with Microsoft SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id 15.20.1730.15; Tue, 26 Mar 2019 04:35:12 +0000 Received: from AM6PR08MB3672.eurprd08.prod.outlook.com ([fe80::4d90:78f1:e670:14d5]) by AM6PR08MB3672.eurprd08.prod.outlook.com ([fe80::4d90:78f1:e670:14d5%3]) with mapi id 15.20.1730.019; Tue, 26 Mar 2019 04:35:12 +0000 From: Honnappa Nagarahalli To: "Ananyev, Konstantin" , "stephen@networkplumber.org" , "paulmck@linux.ibm.com" , "dev@dpdk.org" CC: "Gavin Hu (Arm Technology China)" , Dharmik Thakkar , Malvika Gupta , nd , nd Thread-Topic: [PATCH 1/3] rcu: add RCU library supporting QSBR mechanism Thread-Index: AQHU3hBhNhXVbW/WVkOFAdc0mqwLiaYX1KDAgAUEG6A= Date: Tue, 26 Mar 2019 04:35:12 +0000 Message-ID: References: <20181122033055.3431-1-honnappa.nagarahalli@arm.com> <20190319045228.46879-1-honnappa.nagarahalli@arm.com> <20190319045228.46879-2-honnappa.nagarahalli@arm.com> <2601191342CEEE43887BDE71AB977258013655ED5C@irsmsx105.ger.corp.intel.com> In-Reply-To: <2601191342CEEE43887BDE71AB977258013655ED5C@irsmsx105.ger.corp.intel.com> Accept-Language: en-US Content-Language: en-US X-MS-Has-Attach: X-MS-TNEF-Correlator: authentication-results: spf=none (sender IP is ) smtp.mailfrom=Honnappa.Nagarahalli@arm.com; x-originating-ip: [217.140.111.135] x-ms-publictraffictype: Email x-ms-office365-filtering-correlation-id: cc67e8c7-c5d8-43a7-d429-08d6b1a46f7e x-ms-office365-filtering-ht: Tenant x-microsoft-antispam: BCL:0; PCL:0; RULEID:(2390118)(7020095)(4652040)(8989299)(5600127)(711020)(4605104)(4618075)(4534185)(4627221)(201703031133081)(201702281549075)(8990200)(2017052603328)(7153060)(7193020); SRVR:AM6PR08MB5014; x-ms-traffictypediagnostic: AM6PR08MB5014: x-ms-exchange-purlcount: 1 nodisclaimer: True x-microsoft-antispam-prvs: x-forefront-prvs: 09888BC01D x-forefront-antispam-report: SFV:NSPM; SFS:(10009020)(346002)(39860400002)(396003)(366004)(136003)(376002)(189003)(199004)(52314003)(8676002)(6436002)(8936002)(6506007)(6116002)(11346002)(71190400001)(3846002)(74316002)(476003)(446003)(4326008)(99286004)(76176011)(7736002)(72206003)(68736007)(25786009)(53936002)(6246003)(2501003)(966005)(81156014)(14454004)(102836004)(105586002)(478600001)(33656002)(7696005)(305945005)(55016002)(9686003)(53946003)(6306002)(97736004)(81166006)(86362001)(5660300002)(26005)(2906002)(110136005)(54906003)(30864003)(229853002)(93886005)(71200400001)(106356001)(486006)(52536014)(66066001)(14444005)(316002)(256004)(186003)(2201001); DIR:OUT; SFP:1101; SCL:1; SRVR:AM6PR08MB5014; H:AM6PR08MB3672.eurprd08.prod.outlook.com; FPR:; SPF:None; LANG:en; PTR:InfoNoRecords; MX:1; A:1; received-spf: None (protection.outlook.com: arm.com does not designate permitted sender hosts) x-ms-exchange-senderadcheck: 1 x-microsoft-antispam-message-info: pxiDul+Z3Wqh5cN+PEGU1F0V53/RLPOd8j15/ewMS+nrQqm9f5IRDNVwciVmVTQa6ezxsmeZGN8/PmNZe5swwyLTHVViFVl99ZUIvbj/dyE4RReF9/qqLi6oDhmeGXwk4uMVmlLlTj5p4wXLblv53QIGKOWX4LPE9i0iWlcZdqbnbghnWIQ3L4w3vnOhfpYu2ogmOIk8CIHsQ6BtvKGnsgpTAPZO9ssjK2nM1JWtEDiKlLasxtOO0B8e/ZhAFH2Rtgsk+44Aw4/CuEf9IkkXDZOsOHGAosUtlSXWaCDj0SQ1PmU1r/y8G3RlkHs3K2r3c9pHtcJ7nTUMvLpwmzPmvLPTmchZrkxILYD3SqlNvANsFwuVAvg/+icQ/fpkj4pR9qfraKSoAZ61ulUPeLhUwbmSy7RCM4G6w2nvjeKF8Dc= Content-Type: text/plain; charset="UTF-8" Content-Transfer-Encoding: quoted-printable MIME-Version: 1.0 X-OriginatorOrg: arm.com X-MS-Exchange-CrossTenant-Network-Message-Id: cc67e8c7-c5d8-43a7-d429-08d6b1a46f7e X-MS-Exchange-CrossTenant-originalarrivaltime: 26 Mar 2019 04:35:12.1308 (UTC) X-MS-Exchange-CrossTenant-fromentityheader: Hosted X-MS-Exchange-CrossTenant-id: f34e5979-57d9-4aaa-ad4d-b122a662184d X-MS-Exchange-CrossTenant-mailboxtype: HOSTED X-MS-Exchange-Transport-CrossTenantHeadersStamped: AM6PR08MB5014 Subject: Re: [dpdk-dev] [PATCH 1/3] rcu: add RCU library supporting QSBR mechanism X-BeenThere: dev@dpdk.org X-Mailman-Version: 2.1.15 Precedence: list List-Id: DPDK patches and discussions List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: dev-bounces@dpdk.org Sender: "dev" Message-ID: <20190326043512.Z7bYX3M19JEWCUTRWE05LQkEJiFYzk3gp2mEYpsZ738@z> > Hi Honnappa, >=20 > > diff --git a/lib/librte_rcu/rte_rcu_qsbr.c > > b/lib/librte_rcu/rte_rcu_qsbr.c new file mode 100644 index > > 000000000..0fc4515ea > > --- /dev/null > > +++ b/lib/librte_rcu/rte_rcu_qsbr.c > > @@ -0,0 +1,99 @@ > > +/* SPDX-License-Identifier: BSD-3-Clause > > + * > > + * Copyright (c) 2018 Arm Limited > > + */ > > + > > +#include > > +#include > > +#include > > +#include > > + > > +#include > > +#include > > +#include > > +#include > > +#include > > +#include > > +#include > > +#include > > +#include > > +#include > > + > > +#include "rte_rcu_qsbr.h" > > + > > +/* Get the memory size of QSBR variable */ size_t __rte_experimental > > +rte_rcu_qsbr_get_memsize(uint32_t max_threads) { > > + size_t sz; > > + > > + RTE_ASSERT(max_threads =3D=3D 0); >=20 > Here and in all similar places: > assert() will abort when its condition will be evaluated to false. > So it should be max_threads !=3D 0. Thanks for this comment. Enabling RTE_ENABLE_ASSERT resulted in more proble= ms. I will fix in the next version. > Also it a public and non-datapath function. > Calling assert() for invalid input parameter - seems way too extreme. > Why not just return error to the caller? Ok, I will change it. >=20 > > + > > + sz =3D sizeof(struct rte_rcu_qsbr); > > + > > + /* Add the size of quiescent state counter array */ > > + sz +=3D sizeof(struct rte_rcu_qsbr_cnt) * max_threads; > > + > > + return RTE_ALIGN(sz, RTE_CACHE_LINE_SIZE); } > > + > > +/* Initialize a quiescent state variable */ void __rte_experimental > > +rte_rcu_qsbr_init(struct rte_rcu_qsbr *v, uint32_t max_threads) { > > + RTE_ASSERT(v =3D=3D NULL); > > + > > + memset(v, 0, rte_rcu_qsbr_get_memsize(max_threads)); > > + v->m_threads =3D max_threads; > > + v->num_elems =3D RTE_ALIGN_MUL_CEIL(max_threads, > > + RTE_QSBR_THRID_ARRAY_ELM_SIZE) / > > + RTE_QSBR_THRID_ARRAY_ELM_SIZE; > > + v->token =3D RTE_QSBR_CNT_INIT; > > +} > > + > > +/* Dump the details of a single quiescent state variable to a file. > > +*/ void __rte_experimental rte_rcu_qsbr_dump(FILE *f, struct > > +rte_rcu_qsbr *v) { > > + uint64_t bmap; > > + uint32_t i, t; > > + > > + RTE_ASSERT(v =3D=3D NULL || f =3D=3D NULL); > > + > > + fprintf(f, "\nQuiescent State Variable @%p\n", v); > > + > > + fprintf(f, " QS variable memory size =3D %lu\n", > > + rte_rcu_qsbr_get_memsize(v->m_threads)); > > + fprintf(f, " Given # max threads =3D %u\n", v->m_threads); > > + > > + fprintf(f, " Registered thread ID mask =3D 0x"); > > + for (i =3D 0; i < v->num_elems; i++) > > + fprintf(f, "%lx", __atomic_load_n(&v->reg_thread_id[i], > > + __ATOMIC_ACQUIRE)); > > + fprintf(f, "\n"); > > + > > + fprintf(f, " Token =3D %lu\n", > > + __atomic_load_n(&v->token, __ATOMIC_ACQUIRE)); > > + > > + fprintf(f, "Quiescent State Counts for readers:\n"); > > + for (i =3D 0; i < v->num_elems; i++) { > > + bmap =3D __atomic_load_n(&v->reg_thread_id[i], > __ATOMIC_ACQUIRE); > > + while (bmap) { > > + t =3D __builtin_ctzl(bmap); > > + fprintf(f, "thread ID =3D %d, count =3D %lu\n", t, > > + __atomic_load_n( > > + &RTE_QSBR_CNT_ARRAY_ELM(v, i)- > >cnt, > > + __ATOMIC_RELAXED)); > > + bmap &=3D ~(1UL << t); > > + } > > + } > > +} > > + > > +int rcu_log_type; > > + > > +RTE_INIT(rte_rcu_register) > > +{ > > + rcu_log_type =3D rte_log_register("lib.rcu"); > > + if (rcu_log_type >=3D 0) > > + rte_log_set_level(rcu_log_type, RTE_LOG_ERR); } > > diff --git a/lib/librte_rcu/rte_rcu_qsbr.h > > b/lib/librte_rcu/rte_rcu_qsbr.h new file mode 100644 index > > 000000000..83943f751 > > --- /dev/null > > +++ b/lib/librte_rcu/rte_rcu_qsbr.h > > @@ -0,0 +1,511 @@ > > +/* SPDX-License-Identifier: BSD-3-Clause > > + * Copyright (c) 2018 Arm Limited > > + */ > > + > > +#ifndef _RTE_RCU_QSBR_H_ > > +#define _RTE_RCU_QSBR_H_ > > + > > +/** > > + * @file > > + * RTE Quiescent State Based Reclamation (QSBR) > > + * > > + * Quiescent State (QS) is any point in the thread execution > > + * where the thread does not hold a reference to a data structure > > + * in shared memory. While using lock-less data structures, the > > +writer > > + * can safely free memory once all the reader threads have entered > > + * quiescent state. > > + * > > + * This library provides the ability for the readers to report > > +quiescent > > + * state and for the writers to identify when all the readers have > > + * entered quiescent state. > > + */ > > + > > +#ifdef __cplusplus > > +extern "C" { > > +#endif > > + > > +#include > > +#include > > +#include > > +#include > > +#include > > +#include > > +#include > > + > > +extern int rcu_log_type; > > + > > +#if RTE_LOG_DP_LEVEL >=3D RTE_LOG_DEBUG #define RCU_DP_LOG(level, > fmt, > > +args...) \ > > + rte_log(RTE_LOG_ ## level, rcu_log_type, \ > > + "%s(): " fmt "\n", __func__, ## args) #else #define > > +RCU_DP_LOG(level, fmt, args...) #endif >=20 > Why do you need that? > Can't you use RTE_LOG_DP() instead? RTE_LOG_DP is for static log types such as RTE_LOGTYPE_EAL, RTE_LOGTYPE_MBU= F etc. Use of static log type in RCU was rejected earlier. Hence, I am usin= g the dynamic log types. >=20 > > + > > +/* Registered thread IDs are stored as a bitmap of 64b element array. > > + * Given thread id needs to be converted to index into the array and > > + * the id within the array element. > > + */ > > +#define RTE_RCU_MAX_THREADS 1024 > > +#define RTE_QSBR_THRID_ARRAY_ELM_SIZE (sizeof(uint64_t) * 8) #define > > +RTE_QSBR_THRID_ARRAY_ELEMS \ > > + (RTE_ALIGN_MUL_CEIL(RTE_RCU_MAX_THREADS, \ > > + RTE_QSBR_THRID_ARRAY_ELM_SIZE) / > RTE_QSBR_THRID_ARRAY_ELM_SIZE) > > +#define RTE_QSBR_THRID_INDEX_SHIFT 6 #define RTE_QSBR_THRID_MASK > 0x3f > > +#define RTE_QSBR_THRID_INVALID 0xffffffff > > + > > +/* Worker thread counter */ > > +struct rte_rcu_qsbr_cnt { > > + uint64_t cnt; > > + /**< Quiescent state counter. Value 0 indicates the thread is > > +offline */ } __rte_cache_aligned; > > + > > +#define RTE_QSBR_CNT_ARRAY_ELM(v, i) (((struct rte_rcu_qsbr_cnt *)(v > > ++ 1)) + i) >=20 > You can probably add > struct rte_rcu_qsbr_cnt cnt[0]; > at the end of struct rte_rcu_qsbr, then wouldn't need macro above. ok >=20 > > +#define RTE_QSBR_CNT_THR_OFFLINE 0 > > +#define RTE_QSBR_CNT_INIT 1 > > + > > +/** > > + * RTE thread Quiescent State structure. > > + * Quiescent state counter array (array of 'struct > > +rte_rcu_qsbr_cnt'), > > + * whose size is dependent on the maximum number of reader threads > > + * (m_threads) using this variable is stored immediately following > > + * this structure. > > + */ > > +struct rte_rcu_qsbr { > > + uint64_t token __rte_cache_aligned; > > + /**< Counter to allow for multiple simultaneous QS queries */ > > + > > + uint32_t num_elems __rte_cache_aligned; > > + /**< Number of elements in the thread ID array */ > > + uint32_t m_threads; > > + /**< Maximum number of threads this RCU variable will use */ > > + > > + uint64_t reg_thread_id[RTE_QSBR_THRID_ARRAY_ELEMS] > __rte_cache_aligned; > > + /**< Registered thread IDs are stored in a bitmap array */ >=20 >=20 > As I understand you ended up with fixed size array to avoid 2 variable si= ze > arrays in this struct? Yes > Is that big penalty for register/unregister() to either store a pointer t= o bitmap, > or calculate it based on num_elems value? In the last RFC I sent out [1], I tested the impact of having non-fixed siz= e array. There 'was' a performance degradation in most of the performance t= ests. The issue was with calculating the address of per thread QSBR counter= s (not with the address calculation of the bitmap). With the current patch,= I do not see the performance difference (the difference between the RFC an= d this patch are the memory orderings, they are masking any perf gain from = having a fixed array). However, I have kept the fixed size array as the gen= erated code does not have additional calculations to get the address of qsb= r counter array elements. [1] http://mails.dpdk.org/archives/dev/2019-February/125029.html > As another thought - do we really need bitmap at all? The bit map is helping avoid accessing all the elements in rte_rcu_qsbr_cnt= array (as you have mentioned below). This provides the ability to scale th= e number of threads dynamically. For ex: an application can create a qsbr v= ariable with 48 max threads, but currently only 2 threads are active (due t= o traffic conditions). > Might it is possible to sotre register value for each thread inside it's > rte_rcu_qsbr_cnt: > struct rte_rcu_qsbr_cnt {uint64_t cnt; uint32_t register;} > __rte_cache_aligned; ? > That would cause check() to walk through all elems in rte_rcu_qsbr_cnt ar= ray, > but from other side would help to avoid cache conflicts for register/unre= gister. With the addition of rte_rcu_qsbr_thread_online/offline APIs, the register/= unregister APIs are not in critical path anymore. Hence, the cache conflict= s are fine. The online/offline APIs work on thread specific cache lines and= these are in the critical path. >=20 > > +} __rte_cache_aligned; > > + > > +/** > > + * @warning > > + * @b EXPERIMENTAL: this API may change without prior notice > > + * > > + * Return the size of the memory occupied by a Quiescent State variabl= e. > > + * > > + * @param max_threads > > + * Maximum number of threads reporting quiescent state on this varia= ble. > > + * @return > > + * Size of memory in bytes required for this QS variable. > > + */ > > +size_t __rte_experimental > > +rte_rcu_qsbr_get_memsize(uint32_t max_threads); > > + > > +/** > > + * @warning > > + * @b EXPERIMENTAL: this API may change without prior notice > > + * > > + * Initialize a Quiescent State (QS) variable. > > + * > > + * @param v > > + * QS variable > > + * @param max_threads > > + * Maximum number of threads reporting QS on this variable. > > + * > > + */ > > +void __rte_experimental > > +rte_rcu_qsbr_init(struct rte_rcu_qsbr *v, uint32_t max_threads); > > + > > +/** > > + * @warning > > + * @b EXPERIMENTAL: this API may change without prior notice > > + * > > + * Register a reader thread to report its quiescent state > > + * on a QS variable. > > + * > > + * This is implemented as a lock-free function. It is multi-thread > > + * safe. > > + * Any reader thread that wants to report its quiescent state must > > + * call this API. This can be called during initialization or as part > > + * of the packet processing loop. > > + * > > + * Note that rte_rcu_qsbr_thread_online must be called before the > > + * thread updates its QS using rte_rcu_qsbr_update. > > + * > > + * @param v > > + * QS variable > > + * @param thread_id > > + * Reader thread with this thread ID will report its quiescent state= on > > + * the QS variable. > > + */ > > +static __rte_always_inline void __rte_experimental > > +rte_rcu_qsbr_thread_register(struct rte_rcu_qsbr *v, unsigned int > > +thread_id) { > > + unsigned int i, id; > > + > > + RTE_ASSERT(v =3D=3D NULL || thread_id >=3D v->max_threads); > > + > > + id =3D thread_id & RTE_QSBR_THRID_MASK; > > + i =3D thread_id >> RTE_QSBR_THRID_INDEX_SHIFT; > > + > > + /* Release the new register thread ID to other threads > > + * calling rte_rcu_qsbr_check. > > + */ > > + __atomic_fetch_or(&v->reg_thread_id[i], 1UL << id, > > +__ATOMIC_RELEASE); } > > + > > +/** > > + * @warning > > + * @b EXPERIMENTAL: this API may change without prior notice > > + * > > + * Remove a reader thread, from the list of threads reporting their > > + * quiescent state on a QS variable. > > + * > > + * This is implemented as a lock-free function. It is multi-thread saf= e. > > + * This API can be called from the reader threads during shutdown. > > + * Ongoing QS queries will stop waiting for the status from this > > + * unregistered reader thread. > > + * > > + * @param v > > + * QS variable > > + * @param thread_id > > + * Reader thread with this thread ID will stop reporting its quiesce= nt > > + * state on the QS variable. > > + */ > > +static __rte_always_inline void __rte_experimental > > +rte_rcu_qsbr_thread_unregister(struct rte_rcu_qsbr *v, unsigned int > > +thread_id) { > > + unsigned int i, id; > > + > > + RTE_ASSERT(v =3D=3D NULL || thread_id >=3D v->max_threads); > > + > > + id =3D thread_id & RTE_QSBR_THRID_MASK; > > + i =3D thread_id >> RTE_QSBR_THRID_INDEX_SHIFT; > > + > > + /* Make sure the removal of the thread from the list of > > + * reporting threads is visible before the thread > > + * does anything else. > > + */ > > + __atomic_fetch_and(&v->reg_thread_id[i], > > + ~(1UL << id), __ATOMIC_RELEASE); > > +} > > + > > +/** > > + * @warning > > + * @b EXPERIMENTAL: this API may change without prior notice > > + * > > + * Add a registered reader thread, to the list of threads reporting > > +their > > + * quiescent state on a QS variable. > > + * > > + * This is implemented as a lock-free function. It is multi-thread > > + * safe. > > + * > > + * Any registered reader thread that wants to report its quiescent > > +state must > > + * call this API before calling rte_rcu_qsbr_update. This can be > > +called > > + * during initialization or as part of the packet processing loop. > > + * > > + * The reader thread must call rte_rcu_thread_offline API, before > > + * calling any functions that block, to ensure that > > +rte_rcu_qsbr_check > > + * API does not wait indefinitely for the reader thread to update its = QS. > > + * > > + * The reader thread must call rte_rcu_thread_online API, after the > > +blocking > > + * function call returns, to ensure that rte_rcu_qsbr_check API > > + * waits for the reader thread to update its QS. > > + * > > + * @param v > > + * QS variable > > + * @param thread_id > > + * Reader thread with this thread ID will report its quiescent state= on > > + * the QS variable. > > + */ > > +static __rte_always_inline void __rte_experimental > > +rte_rcu_qsbr_thread_online(struct rte_rcu_qsbr *v, unsigned int > > +thread_id) { > > + uint64_t t; > > + > > + RTE_ASSERT(v =3D=3D NULL || thread_id >=3D v->max_threads); > > + > > + /* Copy the current value of token. > > + * The fence at the end of the function will ensure that > > + * the following will not move down after the load of any shared > > + * data structure. > > + */ > > + t =3D __atomic_load_n(&v->token, __ATOMIC_RELAXED); > > + > > + /* __atomic_store_n(cnt, __ATOMIC_RELAXED) is used to ensure > > + * 'cnt' (64b) is accessed atomically. > > + */ > > + __atomic_store_n(&RTE_QSBR_CNT_ARRAY_ELM(v, thread_id)->cnt, > > + t, __ATOMIC_RELAXED); > > + > > + /* The subsequent load of the data structure should not > > + * move above the store. Hence a store-load barrier > > + * is required. > > + * If the load of the data structure moves above the store, > > + * writer might not see that the reader is online, even though > > + * the reader is referencing the shared data structure. > > + */ > > + __atomic_thread_fence(__ATOMIC_SEQ_CST); >=20 > If it has to generate a proper memory-barrier here anyway, could it use > rte_smp_mb() here? > At least for IA it would generate more lightweight one. I have used the C++11 memory model functions. I prefer to not mix it with b= arriers. Does ICC generate lightweight code for the above fence? Is it ok to add rte_smp_mb for x86 alone? > Konstantin >=20 > > +} > > +