From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path: 
Received: from mga12.intel.com (mga12.intel.com [192.55.52.136]) by dpdk.org (Postfix) with ESMTP id CAFCE4C99 for ; Thu, 28 Mar 2019 12:15:35 +0100 (CET)
X-Amp-Result: SKIPPED(no attachment in message)
X-Amp-File-Uploaded: False
Received: from orsmga001.jf.intel.com ([10.7.209.18]) by fmsmga106.fm.intel.com with ESMTP/TLS/DHE-RSA-AES256-GCM-SHA384; 28 Mar 2019 04:15:34 -0700
X-ExtLoop1: 1
X-IronPort-AV: E=Sophos;i="5.60,280,1549958400"; d="scan'208";a="218347912"
Received: from irsmsx104.ger.corp.intel.com ([163.33.3.159]) by orsmga001.jf.intel.com with ESMTP; 28 Mar 2019 04:15:32 -0700
Received: from irsmsx105.ger.corp.intel.com ([169.254.7.210]) by IRSMSX104.ger.corp.intel.com ([169.254.5.56]) with mapi id 14.03.0415.000; Thu, 28 Mar 2019 11:15:31 +0000
From: "Ananyev, Konstantin"
To: Honnappa Nagarahalli , "stephen@networkplumber.org" , "paulmck@linux.ibm.com" , "dev@dpdk.org"
CC: "Gavin Hu (Arm Technology China)" , Dharmik Thakkar , Malvika Gupta , nd , nd
Thread-Topic: [PATCH 1/3] rcu: add RCU library supporting QSBR mechanism
Thread-Index: AQHU3hBhNhXVbW/WVkOFAdc0mqwLiaYX1KDAgAUEG6CABBflkA==
Date: Thu, 28 Mar 2019 11:15:31 +0000
Message-ID: <2601191342CEEE43887BDE71AB977258013656120A@irsmsx105.ger.corp.intel.com>
References: <20181122033055.3431-1-honnappa.nagarahalli@arm.com> <20190319045228.46879-1-honnappa.nagarahalli@arm.com> <20190319045228.46879-2-honnappa.nagarahalli@arm.com> <2601191342CEEE43887BDE71AB977258013655ED5C@irsmsx105.ger.corp.intel.com>
In-Reply-To: 
Accept-Language: en-IE, en-US
Content-Language: en-US
X-MS-Has-Attach: 
X-MS-TNEF-Correlator: 
x-ctpclassification: CTP_NT
dlp-product: dlpe-windows
dlp-version: 11.0.400.15
dlp-reaction: no-action
x-originating-ip: [163.33.239.182]
Content-Type: text/plain; charset="us-ascii"
Content-Transfer-Encoding: quoted-printable
MIME-Version: 1.0
Subject: Re: [dpdk-dev] [PATCH 1/3] rcu: add RCU library supporting QSBR mechanism
X-BeenThere: dev@dpdk.org
X-Mailman-Version: 2.1.15
Precedence: list
List-Id: DPDK patches and discussions
List-Unsubscribe: , 
List-Archive: 
List-Post: 
List-Help: 
List-Subscribe: , 
X-List-Received-Date: Thu, 28 Mar 2019 11:15:36 -0000

> >
> > > +#define RTE_QSBR_CNT_THR_OFFLINE 0
> > > +#define RTE_QSBR_CNT_INIT 1
> > > +
> > > +/**
> > > + * RTE thread Quiescent State structure.
> > > + * Quiescent state counter array (array of 'struct
> > > +rte_rcu_qsbr_cnt'),
> > > + * whose size is dependent on the maximum number of reader threads
> > > + * (m_threads) using this variable is stored immediately following
> > > + * this structure.
> > > + */
> > > +struct rte_rcu_qsbr {
> > > + uint64_t token __rte_cache_aligned;
> > > + /**< Counter to allow for multiple simultaneous QS queries */
> > > +
> > > + uint32_t num_elems __rte_cache_aligned;
> > > + /**< Number of elements in the thread ID array */
> > > + uint32_t m_threads;
> > > + /**< Maximum number of threads this RCU variable will use */
> > > +
> > > + uint64_t reg_thread_id[RTE_QSBR_THRID_ARRAY_ELEMS]
> > __rte_cache_aligned;
> > > + /**< Registered thread IDs are stored in a bitmap array */
> >
> >
> > As I understand you ended up with fixed size array to avoid 2 variable size
> > arrays in this struct?
> Yes
>
> > Is it a big penalty for register/unregister() to either store a pointer to bitmap,
> > or calculate it based on num_elems value?
> In the last RFC I sent out [1], I tested the impact of having non-fixed size array. There 'was' a performance degradation in most of the
> performance tests. The issue was with calculating the address of per thread QSBR counters (not with the address calculation of the bitmap).
> With the current patch, I do not see the performance difference (the difference between the RFC and this patch are the memory orderings,
> they are masking any perf gain from having a fixed array). However, I have kept the fixed size array as the generated code does not have
> additional calculations to get the address of qsbr counter array elements.
>
> [1] http://mails.dpdk.org/archives/dev/2019-February/125029.html

OK, I see. But can we then arrange them in a different way:
qsbr_cnt[] would start at the end of struct rte_rcu_qsbr
(same as you have it right now),
while the bitmap would be placed after qsbr_cnt[].
As I understand it, register/unregister is not considered to be on the critical path,
so some performance degradation there doesn't matter.
check() would also need an extra address calculation for the bitmap,
but considering that we have to go through the whole bitmap (and in the worst case qsbr_cnt[])
anyway, that is probably not a big deal?
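The layout proposed above can be sketched as follows. This is a minimal illustration, not the DPDK API: `qsbr_hdr`, `qsbr_cnt`, the helper functions, and the 64-byte cache line are all assumptions. It shows that the counter array stays at a constant offset from the start of the variable, so the reader fast path needs no size-dependent arithmetic, while locating the bitmap costs one extra multiply by `max_threads`, paid only by register/unregister and check().

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SKETCH_CACHE_LINE 64 /* assumption: 64-byte cache lines */

/* Hypothetical, simplified stand-in for struct rte_rcu_qsbr. */
struct qsbr_hdr {
	uint64_t token;
	uint32_t num_elems;   /* number of 64-bit words in the bitmap */
	uint32_t max_threads; /* number of per-thread counters */
};

/* Hypothetical per-thread counter padded to one cache line. */
struct qsbr_cnt {
	uint64_t cnt;
	uint8_t pad[SKETCH_CACHE_LINE - sizeof(uint64_t)];
};

static size_t align_up(size_t x, size_t a)
{
	return (x + a - 1) / a * a;
}

/* Counters start right after the header: a constant offset. */
static size_t cnt_offset(void)
{
	return align_up(sizeof(struct qsbr_hdr), SKETCH_CACHE_LINE);
}

/* The bitmap follows max_threads counters, so its offset is computed at run time. */
static size_t bmap_offset(uint32_t max_threads)
{
	return cnt_offset() + (size_t)max_threads * sizeof(struct qsbr_cnt);
}

/* Total allocation size, rounded up to a whole cache line. */
static size_t qsbr_size(uint32_t max_threads)
{
	size_t words = ((size_t)max_threads + 63) / 64;
	return align_up(bmap_offset(max_threads) + words * sizeof(uint64_t),
			SKETCH_CACHE_LINE);
}

/* Reader fast path: the address does not depend on max_threads
 * (the assert is only a debug bounds check). */
static struct qsbr_cnt *qsbr_cnt_at(struct qsbr_hdr *v, uint32_t id)
{
	assert(id < v->max_threads);
	return (struct qsbr_cnt *)((char *)v + cnt_offset()) + id;
}

/* register/unregister/check: one extra calculation to find the bitmap. */
static uint64_t *qsbr_bmap(struct qsbr_hdr *v)
{
	return (uint64_t *)((char *)v + bmap_offset(v->max_threads));
}
```

In a real implementation the variable itself would need cache-line-aligned storage of `qsbr_size()` bytes; the sketch leaves allocation to the caller.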

>
> > As another thought - do we really need bitmap at all?
> The bit map is helping avoid accessing all the elements in rte_rcu_qsbr_cnt array (as you have mentioned below). This provides the ability to
> scale the number of threads dynamically. For ex: an application can create a qsbr variable with 48 max threads, but currently only 2 threads
> are active (due to traffic conditions).

I understand that the bitmap is supposed to speed up check()
in situations when most threads are unregistered.
My thought was that maybe the check() speedup for such a situation
is not that critical.

>
> > Might it be possible to store the register value for each thread inside its
> > rte_rcu_qsbr_cnt:
> > struct rte_rcu_qsbr_cnt {uint64_t cnt; uint32_t register;}
> > __rte_cache_aligned; ?
> > That would cause check() to walk through all elems in rte_rcu_qsbr_cnt array,
> > but on the other side would help to avoid cache conflicts for register/unregister.
> With the addition of rte_rcu_qsbr_thread_online/offline APIs, the register/unregister APIs are not in critical path anymore. Hence, the
> cache conflicts are fine. The online/offline APIs work on thread specific cache lines and these are in the critical path.
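The two check() strategies discussed in this exchange can be compared with a small single-threaded sketch. Everything here is hypothetical: the struct names, `MAX_THREADS`, and the `visited` output (added only to count how many per-thread slots each strategy reads). Memory ordering, offline threads, and token wrap-around are deliberately ignored, and `__builtin_ctzll` is a GCC/Clang builtin. The per-thread flag is named `registered` because `register` is a reserved keyword in C and cannot be used as a field name.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define MAX_THREADS 128 /* hypothetical capacity of the QSBR variable */
#define BMAP_WORDS ((MAX_THREADS + 63) / 64)

/* Option A (as in the patch): counters plus a separate registration bitmap. */
struct cnt_a {
	uint64_t cnt;
};

/* Option B (as proposed): a registration flag stored next to each counter. */
struct cnt_b {
	uint64_t cnt;
	uint32_t registered;
};

/* Visits only threads whose bit is set: cost grows with the number of
 * registered threads plus one load per 64-bit bitmap word.
 * Returns true when every registered thread has reached 'token'. */
static bool check_bitmap(const uint64_t bmap[BMAP_WORDS],
			 const struct cnt_a cnt[MAX_THREADS], uint64_t token,
			 unsigned *visited)
{
	*visited = 0;
	for (unsigned w = 0; w < BMAP_WORDS; w++) {
		uint64_t bits = bmap[w];
		while (bits != 0) {
			/* Index of the lowest set bit (GCC/Clang builtin). */
			unsigned id = w * 64 + (unsigned)__builtin_ctzll(bits);
			bits &= bits - 1; /* clear the lowest set bit */
			assert(id < MAX_THREADS);
			(*visited)++;
			if (cnt[id].cnt < token)
				return false;
		}
	}
	return true;
}

/* Reads every slot and its flag: cost grows with MAX_THREADS,
 * however few threads are actually registered. */
static bool check_flags(const struct cnt_b cnt[MAX_THREADS], uint64_t token,
			unsigned *visited)
{
	*visited = 0;
	for (unsigned id = 0; id < MAX_THREADS; id++) {
		(*visited)++;
		if (cnt[id].registered && cnt[id].cnt < token)
			return false;
	}
	return true;
}
```

With 2 of 128 slots registered, the bitmap version reads two counters while the flag version reads all 128. The flag layout trades that for keeping registration state on each thread's own cache line.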
>
> >
> > > +} __rte_cache_aligned;
> > > +
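The online/offline split described in the reply above can be modelled with C11 atomics. This is an illustrative sketch only, not the rte_rcu_qsbr implementation: the struct, the function names, and the fixed thread count are hypothetical, and the memory orderings are a plausible choice that has not been reviewed against the patch. Only the two constants come from the quoted patch. Each reader writes just its own cache-line-sized counter, and the writer treats a counter equal to `RTE_QSBR_CNT_THR_OFFLINE` as quiescent.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

/* Values from the quoted patch. */
#define RTE_QSBR_CNT_THR_OFFLINE 0
#define RTE_QSBR_CNT_INIT 1

#define SKETCH_THREADS 4 /* hypothetical fixed thread count */

/* One counter per cache line, so each reader writes only its own line. */
struct cnt_line {
	_Alignas(64) _Atomic uint64_t cnt;
};

/* Hypothetical, simplified QSBR variable. */
struct qsbr_sketch {
	_Atomic uint64_t token;
	struct cnt_line thr[SKETCH_THREADS];
};

static void sketch_init(struct qsbr_sketch *v)
{
	atomic_store(&v->token, RTE_QSBR_CNT_INIT);
	for (unsigned i = 0; i < SKETCH_THREADS; i++)
		atomic_store(&v->thr[i].cnt, RTE_QSBR_CNT_THR_OFFLINE);
}

/* Reader comes online: publish the current token, then a full fence so the
 * store is not reordered after the reader's later loads of shared data. */
static void thread_online(struct qsbr_sketch *v, unsigned id)
{
	assert(id < SKETCH_THREADS);
	uint64_t t = atomic_load_explicit(&v->token, memory_order_relaxed);
	atomic_store_explicit(&v->thr[id].cnt, t, memory_order_relaxed);
	atomic_thread_fence(memory_order_seq_cst);
}

/* Reader goes offline: it holds no references, so the writer may skip it. */
static void thread_offline(struct qsbr_sketch *v, unsigned id)
{
	assert(id < SKETCH_THREADS);
	atomic_store_explicit(&v->thr[id].cnt, RTE_QSBR_CNT_THR_OFFLINE,
			      memory_order_release);
}

/* Reader reports a quiescent state by copying the current token. */
static void thread_quiescent(struct qsbr_sketch *v, unsigned id)
{
	assert(id < SKETCH_THREADS);
	uint64_t t = atomic_load_explicit(&v->token, memory_order_acquire);
	atomic_store_explicit(&v->thr[id].cnt, t, memory_order_release);
}

/* Writer starts a grace period and returns the token readers must reach. */
static uint64_t writer_start(struct qsbr_sketch *v)
{
	return atomic_fetch_add_explicit(&v->token, 1, memory_order_release) + 1;
}

/* Writer: true once every thread is offline or has reached token t. */
static bool writer_check(struct qsbr_sketch *v, uint64_t t)
{
	for (unsigned i = 0; i < SKETCH_THREADS; i++) {
		uint64_t c = atomic_load_explicit(&v->thr[i].cnt,
						  memory_order_acquire);
		if (c != RTE_QSBR_CNT_THR_OFFLINE && c < t)
			return false;
	}
	return true;
}
```

Because register/unregister never appear here, the per-call cost on the reader side is a store to the reader's own line, which is the property the reply relies on when it says the bitmap's cache conflicts are acceptable.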