From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mga11.intel.com (mga11.intel.com [192.55.52.93]) by dpdk.org (Postfix) with ESMTP id E60251B611 for ; Wed, 17 Apr 2019 15:39:13 +0200 (CEST) X-Amp-Result: SKIPPED(no attachment in message) X-Amp-File-Uploaded: False Received: from orsmga006.jf.intel.com ([10.7.209.51]) by fmsmga102.fm.intel.com with ESMTP/TLS/DHE-RSA-AES256-GCM-SHA384; 17 Apr 2019 06:39:12 -0700 X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="5.60,362,1549958400"; d="scan'208";a="136572944" Received: from irsmsx103.ger.corp.intel.com ([163.33.3.157]) by orsmga006.jf.intel.com with ESMTP; 17 Apr 2019 06:39:09 -0700 Received: from irsmsx111.ger.corp.intel.com (10.108.20.4) by IRSMSX103.ger.corp.intel.com (163.33.3.157) with Microsoft SMTP Server (TLS) id 14.3.408.0; Wed, 17 Apr 2019 14:39:08 +0100 Received: from irsmsx105.ger.corp.intel.com ([169.254.7.31]) by irsmsx111.ger.corp.intel.com ([169.254.2.85]) with mapi id 14.03.0415.000; Wed, 17 Apr 2019 14:39:08 +0100 From: "Ananyev, Konstantin" To: Honnappa Nagarahalli , Stephen Hemminger CC: "paulmck@linux.ibm.com" , "Kovacevic, Marko" , "dev@dpdk.org" , "Gavin Hu (Arm Technology China)" , Dharmik Thakkar , Malvika Gupta , nd , nd Thread-Topic: [PATCH v5 1/3] rcu: add RCU library supporting QSBR mechanism Thread-Index: AQHU8W09YQlYNBQMIUeythDpmiY2L6Y5BHAAgAAFAYCAAAupgIAEE9RggAAmBQCAACqAYIAANriAgACG54CAAJ3UgIAAIisAgABKMYCAAEmDgIAAzhfQ Date: Wed, 17 Apr 2019 13:39:08 +0000 Message-ID: <2601191342CEEE43887BDE71AB9772580148A98F26@irsmsx105.ger.corp.intel.com> References: <20181122033055.3431-1-honnappa.nagarahalli@arm.com> <20190412202039.46902-1-honnappa.nagarahalli@arm.com> <20190412202039.46902-2-honnappa.nagarahalli@arm.com> <20190412150650.3709358e@shemminger-XPS-13-9360> <20190412160629.670eacd1@shemminger-XPS-13-9360> <2601191342CEEE43887BDE71AB9772580148A97E53@irsmsx105.ger.corp.intel.com> <20190415083834.31b38ed3@shemminger-XPS-13-9360> <2601191342CEEE43887BDE71AB9772580148A98064@irsmsx105.ger.corp.intel.com> <20190415142631.4c250248@shemminger-XPS-13-9360> <20190416075415.76c9d64a@xps13.lan> <20190416142205.5d683e8e@xps13.lan> In-Reply-To: Accept-Language: en-IE, en-US Content-Language: en-US X-MS-Has-Attach: X-MS-TNEF-Correlator: x-titus-metadata-40: eyJDYXRlZ29yeUxhYmVscyI6IiIsIk1ldGFkYXRhIjp7Im5zIjoiaHR0cDpcL1wvd3d3LnRpdHVzLmNvbVwvbnNcL0ludGVsMyIsImlkIjoiMjkyZDdmMDYtYzM3MC00NWFiLTk2Y2ItNTI5MDM5OTRhMmE4IiwicHJvcHMiOlt7Im4iOiJDVFBDbGFzc2lmaWNhdGlvbiIsInZhbHMiOlt7InZhbHVlIjoiQ1RQX05UIn1dfV19LCJTdWJqZWN0TGFiZWxzIjpbXSwiVE1DVmVyc2lvbiI6IjE3LjEwLjE4MDQuNDkiLCJUcnVzdGVkTGFiZWxIYXNoIjoibnVZTW0xVkhcL2lcLys0a1BNMmhjeEk3d0JxbUdJcnhVK2NpajZVdHZ1bmEzSDNsZ2VtN0lsZDB2ZFB1b0l3RVo0In0= x-ctpclassification: CTP_NT dlp-product: dlpe-windows dlp-version: 11.0.600.7 dlp-reaction: no-action x-originating-ip: [163.33.239.182] Content-Type: text/plain; charset="us-ascii" Content-Transfer-Encoding: quoted-printable MIME-Version: 1.0 Subject: Re: [dpdk-dev] [PATCH v5 1/3] rcu: add RCU library supporting QSBR mechanism X-BeenThere: dev@dpdk.org X-Mailman-Version: 2.1.15 Precedence: list List-Id: DPDK patches and discussions List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 17 Apr 2019 13:39:14 -0000 > > > > > > > > > > > > > > > > > > > > > > > > After evaluating long term API/ABI issues, I think > > > > > > > > > > > > you need to get rid of almost all use of inline and > > > > > > > > > > > > visible structures. Yes it might be marginally > > > > > > > > > > > > slower, but you thank me > > > > > > the first time you have to fix something. > > > > > > > > > > > > > > > > > > > > > > > Agree, I was planning on another version to address > > > > > > > > > > > this (I am yet > > > > > > to take a look at your patch addressing the ABI). > > > > > > > > > > > The structure visibility definitely needs to be addre= ssed. > > > > > > > > > > > For the inline functions, is the plan to convert all > > > > > > > > > > > the inline functions in DPDK? If yes, I think we need > > > > > > > > > > > to consider the performance > > > > > > > > > > difference. May be consider L3-fwd application, change > > > > > > > > > > all the > > > > > > inline functions in its path and run a test? > > > > > > > > > > > > > > > > > > > > Every function that is not in the direct datapath shoul= d > > > > > > > > > > not be > > > > > > inline. > > > > > > > > > > Exceptions or things like rx/tx burst, ring > > > > > > > > > > enqueue/dequeue, and packet alloc/free > > > > > I do not understand how DPDK can claim ABI compatibility if we > > > > > have > > > > inline functions (unless we freeze any development in these inline > > > > functions forever). > > > > > > > > > > > > > > > > > > > > > > > Plus synchronization routines: spin/rwlock/barrier, etc. > > > > > > > > > I think rcu should be one of such exceptions - it is just > > > > > > > > > another synchronization mechanism after all (just a bit > > > > > > > > > more > > > > sophisticated). > > > > > > > > > Konstantin > > > > > > > > > > > > > > > > If you look at the other userspace RCU, you wil see that th= e > > > > > > > > only inlines are the rcu_read_lock,rcu_read_unlock and > > > > > > rcu_reference/rcu_assign_pointer. > > > > > > > > > > > > > > > > The synchronization logic is all real functions. > > > > > > > > > > > > > > In fact, I think urcu provides both flavors: > > > > > > > https://github.com/urcu/userspace- > > > > > > rcu/blob/master/include/urcu/static/ > > > > > > > urcu-qsbr.h I still don't understand why we have to treat it > > > > > > > differently then let say spin-lock/ticket-lock or rwlock. > > > > > > > If we gone all the way to create our own version of rcu, we > > > > > > > probably want it to be as fast as possible (I know that main > > > > > > > speedup should come from the fact that readers don't have to > > > > > > > wait for writer to finish, but still...) > > > > > > > > > > > > > > Konstantin > > > > > > > > > > > > > > > > > > > Having locking functions inline is already a problem in current > > releases. > > > > > > The implementation can not be improved without breaking ABI (or > > > > > > doing special workarounds like lock v2) > > > > > I think ABI and inline function discussion needs to be taken up i= n > > > > > a > > > > different thread. > > > > > > > > > > Currently, I am looking to hide the structure visibility. I looke= d > > > > > at your > > > > patch [1], it is a different case than what I have in this patch. I= t > > > > is a pretty generic use case as well (similar situation exists in > > > > other libraries). I think a generic solution should be agreed upon. > > > > > > > > > > If we have to hide the structure content, the handle to QS > > > > > variable > > > > returned to the application needs to be opaque. I suggest using 'vo= id *' > > > > behind which any structure can be used. > > > > > > > > > > typedef void * rte_rcu_qsbr_t; > > > > > typedef void * rte_hash_t; > > > > > > > > > > But it requires typecasting. > > > > > > > > > > [1] http://patchwork.dpdk.org/cover/52609/ > > > > > > > > C allows structure to be defined without knowing what is in it > > therefore. > > > > > > > > typedef struct rte_rcu_qsbr rte_rcu_qsbr_t; > > > > > > > > is preferred (or do it without typedef) > > > > > > > > struct rte_rcu_qsbr; > > > > > > I see that rte_hash library uses the same approach (struct rte_hash i= n > > rte_hash.h, though it is marking as internal). But the ABI Laboratory t= ool > > [1] seems to be reporting incorrect numbers for this library even thoug= h > > the internal structure is changed. > > > > > > [1] > > > https://abi- > > laboratory.pro/index.php?view=3Dcompat_report&l=3Ddpdk&v1=3D19.0 > > > 2&v2=3Dcurrent&obj=3D66794&kind=3Dabi > > > > The problem is rte_hash structure is exposed as part of ABI in > > rte_cuckoo_hash.h This was a mistake. > Do you mean, due to the use of structure with the same name? I am wonderi= ng if it is just a tools issue. The application is not supposed to > include rte_cuckoo_hash.h. >=20 > For the RCU library, we either need to go all functions or leave it the w= ay it is. I do not see a point in trying to hide the internal structure > while having inline functions. >=20 > I converted the inline functions to function calls. >=20 > Testing on Arm platform (results *are* repeatable) shows very minimal dro= p (0.1% to 0.2%) in performance while using lock-free rte_hash > data structure. But one of the test cases which is just spinning shows go= od amount of drop (41%). >=20 > Testing on x86 (Xeon Gold 6132 CPU @ 2.60GHz, results *are* pretty repeat= able) shows performance improvements (7% to 8%) while using > lock-free rte_hash data structure. The test cases which is just spinning = show significant drop (14%, 155%, 231%). > Konstantin, any thoughts on the results? The fact that function show better result than inline (even for hash) is so= rt of surprise to me. Don't have any good explanation off-hand, but the actual numbers for hash t= est are huge by itself... In general, I still think that sync primitives better to stay inlined - the= re is no much point to create ones and then figure out that no-one using them because they are too slow. Though if there is no real perf difference between inlined and normal - no = point to keep it inlined. About RCU lib, my thought to have inlined version for 19.05 and do further = perf testing with it (as I remember there were suggestions about using it in l3fwd for guarding = routing table or so). If we'll find there is no real difference - move it to not-inlined version = in 19.08. It is experimental for now - so could be changed without formal ABI breaka= ge. Konstantin From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from dpdk.org (dpdk.org [92.243.14.124]) by dpdk.space (Postfix) with ESMTP id 489EDA00E6 for ; Wed, 17 Apr 2019 15:39:16 +0200 (CEST) Received: from [92.243.14.124] (localhost [127.0.0.1]) by dpdk.org (Postfix) with ESMTP id 6947758FE; Wed, 17 Apr 2019 15:39:15 +0200 (CEST) Received: from mga11.intel.com (mga11.intel.com [192.55.52.93]) by dpdk.org (Postfix) with ESMTP id E60251B611 for ; Wed, 17 Apr 2019 15:39:13 +0200 (CEST) X-Amp-Result: SKIPPED(no attachment in message) X-Amp-File-Uploaded: False Received: from orsmga006.jf.intel.com ([10.7.209.51]) by fmsmga102.fm.intel.com with ESMTP/TLS/DHE-RSA-AES256-GCM-SHA384; 17 Apr 2019 06:39:12 -0700 X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="5.60,362,1549958400"; d="scan'208";a="136572944" Received: from irsmsx103.ger.corp.intel.com ([163.33.3.157]) by orsmga006.jf.intel.com with ESMTP; 17 Apr 2019 06:39:09 -0700 Received: from irsmsx111.ger.corp.intel.com (10.108.20.4) by IRSMSX103.ger.corp.intel.com (163.33.3.157) with Microsoft SMTP Server (TLS) id 14.3.408.0; Wed, 17 Apr 2019 14:39:08 +0100 Received: from irsmsx105.ger.corp.intel.com ([169.254.7.31]) by irsmsx111.ger.corp.intel.com ([169.254.2.85]) with mapi id 14.03.0415.000; Wed, 17 Apr 2019 14:39:08 +0100 From: "Ananyev, Konstantin" To: Honnappa Nagarahalli , Stephen Hemminger CC: "paulmck@linux.ibm.com" , "Kovacevic, Marko" , "dev@dpdk.org" , "Gavin Hu (Arm Technology China)" , Dharmik Thakkar , Malvika Gupta , nd , nd Thread-Topic: [PATCH v5 1/3] rcu: add RCU library supporting QSBR mechanism Thread-Index: AQHU8W09YQlYNBQMIUeythDpmiY2L6Y5BHAAgAAFAYCAAAupgIAEE9RggAAmBQCAACqAYIAANriAgACG54CAAJ3UgIAAIisAgABKMYCAAEmDgIAAzhfQ Date: Wed, 17 Apr 2019 13:39:08 +0000 Message-ID: <2601191342CEEE43887BDE71AB9772580148A98F26@irsmsx105.ger.corp.intel.com> References: <20181122033055.3431-1-honnappa.nagarahalli@arm.com> <20190412202039.46902-1-honnappa.nagarahalli@arm.com> <20190412202039.46902-2-honnappa.nagarahalli@arm.com> <20190412150650.3709358e@shemminger-XPS-13-9360> <20190412160629.670eacd1@shemminger-XPS-13-9360> <2601191342CEEE43887BDE71AB9772580148A97E53@irsmsx105.ger.corp.intel.com> <20190415083834.31b38ed3@shemminger-XPS-13-9360> <2601191342CEEE43887BDE71AB9772580148A98064@irsmsx105.ger.corp.intel.com> <20190415142631.4c250248@shemminger-XPS-13-9360> <20190416075415.76c9d64a@xps13.lan> <20190416142205.5d683e8e@xps13.lan> In-Reply-To: Accept-Language: en-IE, en-US Content-Language: en-US X-MS-Has-Attach: X-MS-TNEF-Correlator: x-titus-metadata-40: eyJDYXRlZ29yeUxhYmVscyI6IiIsIk1ldGFkYXRhIjp7Im5zIjoiaHR0cDpcL1wvd3d3LnRpdHVzLmNvbVwvbnNcL0ludGVsMyIsImlkIjoiMjkyZDdmMDYtYzM3MC00NWFiLTk2Y2ItNTI5MDM5OTRhMmE4IiwicHJvcHMiOlt7Im4iOiJDVFBDbGFzc2lmaWNhdGlvbiIsInZhbHMiOlt7InZhbHVlIjoiQ1RQX05UIn1dfV19LCJTdWJqZWN0TGFiZWxzIjpbXSwiVE1DVmVyc2lvbiI6IjE3LjEwLjE4MDQuNDkiLCJUcnVzdGVkTGFiZWxIYXNoIjoibnVZTW0xVkhcL2lcLys0a1BNMmhjeEk3d0JxbUdJcnhVK2NpajZVdHZ1bmEzSDNsZ2VtN0lsZDB2ZFB1b0l3RVo0In0= x-ctpclassification: CTP_NT dlp-product: dlpe-windows dlp-version: 11.0.600.7 dlp-reaction: no-action x-originating-ip: [163.33.239.182] Content-Type: text/plain; charset="UTF-8" Content-Transfer-Encoding: quoted-printable MIME-Version: 1.0 Subject: Re: [dpdk-dev] [PATCH v5 1/3] rcu: add RCU library supporting QSBR mechanism X-BeenThere: dev@dpdk.org X-Mailman-Version: 2.1.15 Precedence: list List-Id: DPDK patches and discussions List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: dev-bounces@dpdk.org Sender: "dev" Message-ID: <20190417133908.BxQ3KUYgoiWN5tmdP4n4wj6nxHCRg3ZobTxZ3WRWFvI@z> > > > > > > > > > > > > > > > > > > > > > > > > After evaluating long term API/ABI issues, I think > > > > > > > > > > > > you need to get rid of almost all use of inline and > > > > > > > > > > > > visible structures. Yes it might be marginally > > > > > > > > > > > > slower, but you thank me > > > > > > the first time you have to fix something. > > > > > > > > > > > > > > > > > > > > > > > Agree, I was planning on another version to address > > > > > > > > > > > this (I am yet > > > > > > to take a look at your patch addressing the ABI). > > > > > > > > > > > The structure visibility definitely needs to be addre= ssed. > > > > > > > > > > > For the inline functions, is the plan to convert all > > > > > > > > > > > the inline functions in DPDK? If yes, I think we need > > > > > > > > > > > to consider the performance > > > > > > > > > > difference. May be consider L3-fwd application, change > > > > > > > > > > all the > > > > > > inline functions in its path and run a test? > > > > > > > > > > > > > > > > > > > > Every function that is not in the direct datapath shoul= d > > > > > > > > > > not be > > > > > > inline. > > > > > > > > > > Exceptions or things like rx/tx burst, ring > > > > > > > > > > enqueue/dequeue, and packet alloc/free > > > > > I do not understand how DPDK can claim ABI compatibility if we > > > > > have > > > > inline functions (unless we freeze any development in these inline > > > > functions forever). > > > > > > > > > > > > > > > > > > > > > > > Plus synchronization routines: spin/rwlock/barrier, etc. > > > > > > > > > I think rcu should be one of such exceptions - it is just > > > > > > > > > another synchronization mechanism after all (just a bit > > > > > > > > > more > > > > sophisticated). > > > > > > > > > Konstantin > > > > > > > > > > > > > > > > If you look at the other userspace RCU, you wil see that th= e > > > > > > > > only inlines are the rcu_read_lock,rcu_read_unlock and > > > > > > rcu_reference/rcu_assign_pointer. > > > > > > > > > > > > > > > > The synchronization logic is all real functions. > > > > > > > > > > > > > > In fact, I think urcu provides both flavors: > > > > > > > https://github.com/urcu/userspace- > > > > > > rcu/blob/master/include/urcu/static/ > > > > > > > urcu-qsbr.h I still don't understand why we have to treat it > > > > > > > differently then let say spin-lock/ticket-lock or rwlock. > > > > > > > If we gone all the way to create our own version of rcu, we > > > > > > > probably want it to be as fast as possible (I know that main > > > > > > > speedup should come from the fact that readers don't have to > > > > > > > wait for writer to finish, but still...) > > > > > > > > > > > > > > Konstantin > > > > > > > > > > > > > > > > > > > Having locking functions inline is already a problem in current > > releases. > > > > > > The implementation can not be improved without breaking ABI (or > > > > > > doing special workarounds like lock v2) > > > > > I think ABI and inline function discussion needs to be taken up i= n > > > > > a > > > > different thread. > > > > > > > > > > Currently, I am looking to hide the structure visibility. I looke= d > > > > > at your > > > > patch [1], it is a different case than what I have in this patch. I= t > > > > is a pretty generic use case as well (similar situation exists in > > > > other libraries). I think a generic solution should be agreed upon. > > > > > > > > > > If we have to hide the structure content, the handle to QS > > > > > variable > > > > returned to the application needs to be opaque. I suggest using 'vo= id *' > > > > behind which any structure can be used. > > > > > > > > > > typedef void * rte_rcu_qsbr_t; > > > > > typedef void * rte_hash_t; > > > > > > > > > > But it requires typecasting. > > > > > > > > > > [1] http://patchwork.dpdk.org/cover/52609/ > > > > > > > > C allows structure to be defined without knowing what is in it > > therefore. > > > > > > > > typedef struct rte_rcu_qsbr rte_rcu_qsbr_t; > > > > > > > > is preferred (or do it without typedef) > > > > > > > > struct rte_rcu_qsbr; > > > > > > I see that rte_hash library uses the same approach (struct rte_hash i= n > > rte_hash.h, though it is marking as internal). But the ABI Laboratory t= ool > > [1] seems to be reporting incorrect numbers for this library even thoug= h > > the internal structure is changed. > > > > > > [1] > > > https://abi- > > laboratory.pro/index.php?view=3Dcompat_report&l=3Ddpdk&v1=3D19.0 > > > 2&v2=3Dcurrent&obj=3D66794&kind=3Dabi > > > > The problem is rte_hash structure is exposed as part of ABI in > > rte_cuckoo_hash.h This was a mistake. > Do you mean, due to the use of structure with the same name? I am wonderi= ng if it is just a tools issue. The application is not supposed to > include rte_cuckoo_hash.h. >=20 > For the RCU library, we either need to go all functions or leave it the w= ay it is. I do not see a point in trying to hide the internal structure > while having inline functions. >=20 > I converted the inline functions to function calls. >=20 > Testing on Arm platform (results *are* repeatable) shows very minimal dro= p (0.1% to 0.2%) in performance while using lock-free rte_hash > data structure. But one of the test cases which is just spinning shows go= od amount of drop (41%). >=20 > Testing on x86 (Xeon Gold 6132 CPU @ 2.60GHz, results *are* pretty repeat= able) shows performance improvements (7% to 8%) while using > lock-free rte_hash data structure. The test cases which is just spinning = show significant drop (14%, 155%, 231%). > Konstantin, any thoughts on the results? The fact that function show better result than inline (even for hash) is so= rt of surprise to me. Don't have any good explanation off-hand, but the actual numbers for hash t= est are huge by itself... In general, I still think that sync primitives better to stay inlined - the= re is no much point to create ones and then figure out that no-one using them because they are too slow. Though if there is no real perf difference between inlined and normal - no = point to keep it inlined. About RCU lib, my thought to have inlined version for 19.05 and do further = perf testing with it (as I remember there were suggestions about using it in l3fwd for guarding = routing table or so). If we'll find there is no real difference - move it to not-inlined version = in 19.08. It is experimental for now - so could be changed without formal ABI breaka= ge. Konstantin