From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from EUR02-AM5-obe.outbound.protection.outlook.com (mail-eopbgr00060.outbound.protection.outlook.com [40.107.0.60]) by dpdk.org (Postfix) with ESMTP id 709502BF4 for ; Fri, 29 Mar 2019 06:54:12 +0100 (CET) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=armh.onmicrosoft.com; s=selector1-arm-com; h=From:Date:Subject:Message-ID:Content-Type:MIME-Version:X-MS-Exchange-SenderADCheck; bh=Ety3xPF13dDPtEe8LNInSv/IZ0PVdS7UEDipR8dDXH0=; b=TCQsYlm3OLjIiVzWV8fEiNQP8JYIAvjzzJr/XXXP88vS4oVXlZQjP4LmwgQ05ZlQZqYeyo0zJ76TM1SWj9ydqDMGLg2AkZZLr2vOt9NvpJZsv2z+YZPMkN7Oua1UPbXoJuWbARQmiTHYyl8VrS8R1t6e2kGy9CFC3ufJyso/ujE= Received: from VE1PR08MB5149.eurprd08.prod.outlook.com (20.179.30.152) by VE1PR08MB4862.eurprd08.prod.outlook.com (10.255.113.140) with Microsoft SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id 15.20.1750.15; Fri, 29 Mar 2019 05:54:09 +0000 Received: from VE1PR08MB5149.eurprd08.prod.outlook.com ([fe80::e0ae:ecad:ec5:8177]) by VE1PR08MB5149.eurprd08.prod.outlook.com ([fe80::e0ae:ecad:ec5:8177%2]) with mapi id 15.20.1750.017; Fri, 29 Mar 2019 05:54:09 +0000 From: Honnappa Nagarahalli To: "Ananyev, Konstantin" , "stephen@networkplumber.org" , "paulmck@linux.ibm.com" , "dev@dpdk.org" CC: "Gavin Hu (Arm Technology China)" , Dharmik Thakkar , Malvika Gupta , nd , nd Thread-Topic: [PATCH 1/3] rcu: add RCU library supporting QSBR mechanism Thread-Index: AQHU3hBhNhXVbW/WVkOFAdc0mqwLiaYX1KDAgAUEG6CABBflkIABLGRw Date: Fri, 29 Mar 2019 05:54:09 +0000 Message-ID: References: <20181122033055.3431-1-honnappa.nagarahalli@arm.com> <20190319045228.46879-1-honnappa.nagarahalli@arm.com> <20190319045228.46879-2-honnappa.nagarahalli@arm.com> <2601191342CEEE43887BDE71AB977258013655ED5C@irsmsx105.ger.corp.intel.com> <2601191342CEEE43887BDE71AB977258013656120A@irsmsx105.ger.corp.intel.com> In-Reply-To: <2601191342CEEE43887BDE71AB977258013656120A@irsmsx105.ger.corp.intel.com> Accept-Language: en-US Content-Language: en-US X-MS-Has-Attach: X-MS-TNEF-Correlator: authentication-results: spf=none (sender IP is ) smtp.mailfrom=Honnappa.Nagarahalli@arm.com; x-originating-ip: [217.140.111.135] x-ms-publictraffictype: Email x-ms-office365-filtering-correlation-id: f7a2224f-8ada-4f77-20f3-08d6b40af685 x-ms-office365-filtering-ht: Tenant x-microsoft-antispam: BCL:0; PCL:0; RULEID:(2390118)(7020095)(4652040)(8989299)(4534185)(4627221)(201703031133081)(201702281549075)(8990200)(5600127)(711020)(4605104)(4618075)(2017052603328)(7153060)(7193020); SRVR:VE1PR08MB4862; x-ms-traffictypediagnostic: VE1PR08MB4862: x-ms-exchange-purlcount: 1 nodisclaimer: True x-microsoft-antispam-prvs: x-forefront-prvs: 0991CAB7B3 x-forefront-antispam-report: SFV:NSPM; SFS:(10009020)(366004)(346002)(396003)(136003)(39860400002)(376002)(199004)(189003)(52314003)(53936002)(11346002)(9686003)(446003)(2501003)(305945005)(71190400001)(7736002)(6436002)(71200400001)(102836004)(93886005)(476003)(486006)(110136005)(4326008)(6506007)(229853002)(99286004)(25786009)(54906003)(316002)(2201001)(5660300002)(478600001)(86362001)(74316002)(26005)(6306002)(6116002)(68736007)(3846002)(52536014)(7696005)(76176011)(106356001)(72206003)(66066001)(55016002)(2906002)(256004)(966005)(105586002)(186003)(6246003)(8676002)(14454004)(33656002)(97736004)(81166006)(81156014)(8936002); DIR:OUT; SFP:1101; SCL:1; SRVR:VE1PR08MB4862; H:VE1PR08MB5149.eurprd08.prod.outlook.com; FPR:; SPF:None; LANG:en; PTR:InfoNoRecords; A:1; MX:1; received-spf: None (protection.outlook.com: arm.com does not designate permitted sender hosts) x-ms-exchange-senderadcheck: 1 x-microsoft-antispam-message-info: HswXkBwymTKZHwlFajaSjb4l0r7BfJnqiDt0hXCwsasSJ+jfNqXrRFvmNx7B8YpbxrNHvbPg8IBuO4jqu/2KpDHThwIqQAL55YfQLBCOSkDjoKUis047KTMv6RB3yEmcYjVwsHw1BLOltSIuu7et5DuZ8tQUOkb+FfizuJZ27nGln4mhVu4heax1IOPRV0Fiv8SpOjXRSHQ6D585sIV4X/AKlHgmR26RSwbZ4br7bo+AQvuxsvvWSJV4hpp1x/+HpAmVkiK28UW/z+BmtPmEgIBevSN9/ZQRtHPs70qyqxkxQti2bpklZPhvFPxia0hWQGzYOZ8XnGj6LZhdMtyNagM781wnz+ziht/wAJ0xEptDDLBMcLQ0riUjs3thFxm5s4xvXM6fhjVx0AbXddsQUl91hN4WbePkCZO91Ya7jNw= Content-Type: text/plain; charset="us-ascii" Content-Transfer-Encoding: quoted-printable MIME-Version: 1.0 X-OriginatorOrg: arm.com X-MS-Exchange-CrossTenant-Network-Message-Id: f7a2224f-8ada-4f77-20f3-08d6b40af685 X-MS-Exchange-CrossTenant-originalarrivaltime: 29 Mar 2019 05:54:09.6605 (UTC) X-MS-Exchange-CrossTenant-fromentityheader: Hosted X-MS-Exchange-CrossTenant-id: f34e5979-57d9-4aaa-ad4d-b122a662184d X-MS-Exchange-CrossTenant-mailboxtype: HOSTED X-MS-Exchange-Transport-CrossTenantHeadersStamped: VE1PR08MB4862 Subject: Re: [dpdk-dev] [PATCH 1/3] rcu: add RCU library supporting QSBR mechanism X-BeenThere: dev@dpdk.org X-Mailman-Version: 2.1.15 Precedence: list List-Id: DPDK patches and discussions List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Fri, 29 Mar 2019 05:54:12 -0000 >=20 > > > > > > > +#define RTE_QSBR_CNT_THR_OFFLINE 0 #define RTE_QSBR_CNT_INIT > 1 > > > > + > > > > +/** > > > > + * RTE thread Quiescent State structure. > > > > + * Quiescent state counter array (array of 'struct > > > > +rte_rcu_qsbr_cnt'), > > > > + * whose size is dependent on the maximum number of reader > > > > +threads > > > > + * (m_threads) using this variable is stored immediately > > > > +following > > > > + * this structure. > > > > + */ > > > > +struct rte_rcu_qsbr { > > > > + uint64_t token __rte_cache_aligned; > > > > + /**< Counter to allow for multiple simultaneous QS queries */ > > > > + > > > > + uint32_t num_elems __rte_cache_aligned; > > > > + /**< Number of elements in the thread ID array */ > > > > + uint32_t m_threads; > > > > + /**< Maximum number of threads this RCU variable will use */ > > > > + > > > > + uint64_t reg_thread_id[RTE_QSBR_THRID_ARRAY_ELEMS] > > > __rte_cache_aligned; > > > > + /**< Registered thread IDs are stored in a bitmap array */ > > > > > > > > > As I understand you ended up with fixed size array to avoid 2 > > > variable size arrays in this struct? > > Yes > > > > > Is that big penalty for register/unregister() to either store a > > > pointer to bitmap, or calculate it based on num_elems value? > > In the last RFC I sent out [1], I tested the impact of having > > non-fixed size array. There 'was' a performance degradation in most of = the > performance tests. The issue was with calculating the address of per thre= ad > QSBR counters (not with the address calculation of the bitmap). > > With the current patch, I do not see the performance difference (the > > difference between the RFC and this patch are the memory orderings, > > they are masking any perf gain from having a fixed array). However, I h= ave > kept the fixed size array as the generated code does not have additional > calculations to get the address of qsbr counter array elements. > > > > [1] http://mails.dpdk.org/archives/dev/2019-February/125029.html >=20 > Ok I see, but can we then arrange them ina different way: > qsbr_cnt[] will start at the end of struct rte_rcu_qsbr (same as you have= it > right now). > While bitmap will be placed after qsbr_cnt[]. Yes, that is an option. Though, it would mean we have to calculate the addr= ess, similar to macro 'RTE_QSBR_CNT_ARRAY_ELM' > As I understand register/unregister is not consider on critical path, so = some > perf-degradation here doesn't matter. Yes > Also check() would need extra address calculation for bitmap, but conside= ring > that we have to go through all bitmap (and in worst case qsbr_cnt[]) > anyway, that probably not a big deal? I think the address calculation can be made simpler than what I had tried b= efore. I can give it a shot. >=20 > > > > > As another thought - do we really need bitmap at all? > > The bit map is helping avoid accessing all the elements in > > rte_rcu_qsbr_cnt array (as you have mentioned below). This provides > > the ability to scale the number of threads dynamically. For ex: an > application can create a qsbr variable with 48 max threads, but currently= only > 2 threads are active (due to traffic conditions). >=20 > I understand that bitmap supposed to speedup check() for situations when > most threads are unregistered. > My thought was that might be check() speedup for such situation is not th= at > critical. IMO, there is a need to address both the cases, considering the future dire= ction of DPDK. It is possible to introduce a counter for the current number= of threads registered. If that is same as maximum number of threads, then = scanning the registered thread ID array can be skipped. >=20 > > > > > Might it is possible to sotre register value for each thread inside > > > it's > > > rte_rcu_qsbr_cnt: > > > struct rte_rcu_qsbr_cnt {uint64_t cnt; uint32_t register;} > > > __rte_cache_aligned; ? > > > That would cause check() to walk through all elems in > > > rte_rcu_qsbr_cnt array, but from other side would help to avoid cache > conflicts for register/unregister. > > With the addition of rte_rcu_qsbr_thread_online/offline APIs, the > > register/unregister APIs are not in critical path anymore. Hence, the c= ache > conflicts are fine. The online/offline APIs work on thread specific cache= lines > and these are in the critical path. > > > > > > > > > +} __rte_cache_aligned; > > > > + From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from dpdk.org (dpdk.org [92.243.14.124]) by dpdk.space (Postfix) with ESMTP id AC0B9A0679 for ; Fri, 29 Mar 2019 06:54:14 +0100 (CET) Received: from [92.243.14.124] (localhost [127.0.0.1]) by dpdk.org (Postfix) with ESMTP id B17CA2C28; Fri, 29 Mar 2019 06:54:13 +0100 (CET) Received: from EUR02-AM5-obe.outbound.protection.outlook.com (mail-eopbgr00060.outbound.protection.outlook.com [40.107.0.60]) by dpdk.org (Postfix) with ESMTP id 709502BF4 for ; Fri, 29 Mar 2019 06:54:12 +0100 (CET) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=armh.onmicrosoft.com; s=selector1-arm-com; h=From:Date:Subject:Message-ID:Content-Type:MIME-Version:X-MS-Exchange-SenderADCheck; bh=Ety3xPF13dDPtEe8LNInSv/IZ0PVdS7UEDipR8dDXH0=; b=TCQsYlm3OLjIiVzWV8fEiNQP8JYIAvjzzJr/XXXP88vS4oVXlZQjP4LmwgQ05ZlQZqYeyo0zJ76TM1SWj9ydqDMGLg2AkZZLr2vOt9NvpJZsv2z+YZPMkN7Oua1UPbXoJuWbARQmiTHYyl8VrS8R1t6e2kGy9CFC3ufJyso/ujE= Received: from VE1PR08MB5149.eurprd08.prod.outlook.com (20.179.30.152) by VE1PR08MB4862.eurprd08.prod.outlook.com (10.255.113.140) with Microsoft SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id 15.20.1750.15; Fri, 29 Mar 2019 05:54:09 +0000 Received: from VE1PR08MB5149.eurprd08.prod.outlook.com ([fe80::e0ae:ecad:ec5:8177]) by VE1PR08MB5149.eurprd08.prod.outlook.com ([fe80::e0ae:ecad:ec5:8177%2]) with mapi id 15.20.1750.017; Fri, 29 Mar 2019 05:54:09 +0000 From: Honnappa Nagarahalli To: "Ananyev, Konstantin" , "stephen@networkplumber.org" , "paulmck@linux.ibm.com" , "dev@dpdk.org" CC: "Gavin Hu (Arm Technology China)" , Dharmik Thakkar , Malvika Gupta , nd , nd Thread-Topic: [PATCH 1/3] rcu: add RCU library supporting QSBR mechanism Thread-Index: AQHU3hBhNhXVbW/WVkOFAdc0mqwLiaYX1KDAgAUEG6CABBflkIABLGRw Date: Fri, 29 Mar 2019 05:54:09 +0000 Message-ID: References: <20181122033055.3431-1-honnappa.nagarahalli@arm.com> <20190319045228.46879-1-honnappa.nagarahalli@arm.com> <20190319045228.46879-2-honnappa.nagarahalli@arm.com> <2601191342CEEE43887BDE71AB977258013655ED5C@irsmsx105.ger.corp.intel.com> <2601191342CEEE43887BDE71AB977258013656120A@irsmsx105.ger.corp.intel.com> In-Reply-To: <2601191342CEEE43887BDE71AB977258013656120A@irsmsx105.ger.corp.intel.com> Accept-Language: en-US Content-Language: en-US X-MS-Has-Attach: X-MS-TNEF-Correlator: authentication-results: spf=none (sender IP is ) smtp.mailfrom=Honnappa.Nagarahalli@arm.com; x-originating-ip: [217.140.111.135] x-ms-publictraffictype: Email x-ms-office365-filtering-correlation-id: f7a2224f-8ada-4f77-20f3-08d6b40af685 x-ms-office365-filtering-ht: Tenant x-microsoft-antispam: BCL:0; PCL:0; RULEID:(2390118)(7020095)(4652040)(8989299)(4534185)(4627221)(201703031133081)(201702281549075)(8990200)(5600127)(711020)(4605104)(4618075)(2017052603328)(7153060)(7193020); SRVR:VE1PR08MB4862; x-ms-traffictypediagnostic: VE1PR08MB4862: x-ms-exchange-purlcount: 1 nodisclaimer: True x-microsoft-antispam-prvs: x-forefront-prvs: 0991CAB7B3 x-forefront-antispam-report: SFV:NSPM; SFS:(10009020)(366004)(346002)(396003)(136003)(39860400002)(376002)(199004)(189003)(52314003)(53936002)(11346002)(9686003)(446003)(2501003)(305945005)(71190400001)(7736002)(6436002)(71200400001)(102836004)(93886005)(476003)(486006)(110136005)(4326008)(6506007)(229853002)(99286004)(25786009)(54906003)(316002)(2201001)(5660300002)(478600001)(86362001)(74316002)(26005)(6306002)(6116002)(68736007)(3846002)(52536014)(7696005)(76176011)(106356001)(72206003)(66066001)(55016002)(2906002)(256004)(966005)(105586002)(186003)(6246003)(8676002)(14454004)(33656002)(97736004)(81166006)(81156014)(8936002); DIR:OUT; SFP:1101; SCL:1; SRVR:VE1PR08MB4862; H:VE1PR08MB5149.eurprd08.prod.outlook.com; FPR:; SPF:None; LANG:en; PTR:InfoNoRecords; A:1; MX:1; received-spf: None (protection.outlook.com: arm.com does not designate permitted sender hosts) x-ms-exchange-senderadcheck: 1 x-microsoft-antispam-message-info: HswXkBwymTKZHwlFajaSjb4l0r7BfJnqiDt0hXCwsasSJ+jfNqXrRFvmNx7B8YpbxrNHvbPg8IBuO4jqu/2KpDHThwIqQAL55YfQLBCOSkDjoKUis047KTMv6RB3yEmcYjVwsHw1BLOltSIuu7et5DuZ8tQUOkb+FfizuJZ27nGln4mhVu4heax1IOPRV0Fiv8SpOjXRSHQ6D585sIV4X/AKlHgmR26RSwbZ4br7bo+AQvuxsvvWSJV4hpp1x/+HpAmVkiK28UW/z+BmtPmEgIBevSN9/ZQRtHPs70qyqxkxQti2bpklZPhvFPxia0hWQGzYOZ8XnGj6LZhdMtyNagM781wnz+ziht/wAJ0xEptDDLBMcLQ0riUjs3thFxm5s4xvXM6fhjVx0AbXddsQUl91hN4WbePkCZO91Ya7jNw= Content-Type: text/plain; charset="UTF-8" Content-Transfer-Encoding: quoted-printable MIME-Version: 1.0 X-OriginatorOrg: arm.com X-MS-Exchange-CrossTenant-Network-Message-Id: f7a2224f-8ada-4f77-20f3-08d6b40af685 X-MS-Exchange-CrossTenant-originalarrivaltime: 29 Mar 2019 05:54:09.6605 (UTC) X-MS-Exchange-CrossTenant-fromentityheader: Hosted X-MS-Exchange-CrossTenant-id: f34e5979-57d9-4aaa-ad4d-b122a662184d X-MS-Exchange-CrossTenant-mailboxtype: HOSTED X-MS-Exchange-Transport-CrossTenantHeadersStamped: VE1PR08MB4862 Subject: Re: [dpdk-dev] [PATCH 1/3] rcu: add RCU library supporting QSBR mechanism X-BeenThere: dev@dpdk.org X-Mailman-Version: 2.1.15 Precedence: list List-Id: DPDK patches and discussions List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: dev-bounces@dpdk.org Sender: "dev" Message-ID: <20190329055409.p6SEKIlP7Ie3N_c-CkIbyvpXnO_1dNQLdRRgnFjTkUQ@z> >=20 > > > > > > > +#define RTE_QSBR_CNT_THR_OFFLINE 0 #define RTE_QSBR_CNT_INIT > 1 > > > > + > > > > +/** > > > > + * RTE thread Quiescent State structure. > > > > + * Quiescent state counter array (array of 'struct > > > > +rte_rcu_qsbr_cnt'), > > > > + * whose size is dependent on the maximum number of reader > > > > +threads > > > > + * (m_threads) using this variable is stored immediately > > > > +following > > > > + * this structure. > > > > + */ > > > > +struct rte_rcu_qsbr { > > > > + uint64_t token __rte_cache_aligned; > > > > + /**< Counter to allow for multiple simultaneous QS queries */ > > > > + > > > > + uint32_t num_elems __rte_cache_aligned; > > > > + /**< Number of elements in the thread ID array */ > > > > + uint32_t m_threads; > > > > + /**< Maximum number of threads this RCU variable will use */ > > > > + > > > > + uint64_t reg_thread_id[RTE_QSBR_THRID_ARRAY_ELEMS] > > > __rte_cache_aligned; > > > > + /**< Registered thread IDs are stored in a bitmap array */ > > > > > > > > > As I understand you ended up with fixed size array to avoid 2 > > > variable size arrays in this struct? > > Yes > > > > > Is that big penalty for register/unregister() to either store a > > > pointer to bitmap, or calculate it based on num_elems value? > > In the last RFC I sent out [1], I tested the impact of having > > non-fixed size array. There 'was' a performance degradation in most of = the > performance tests. The issue was with calculating the address of per thre= ad > QSBR counters (not with the address calculation of the bitmap). > > With the current patch, I do not see the performance difference (the > > difference between the RFC and this patch are the memory orderings, > > they are masking any perf gain from having a fixed array). However, I h= ave > kept the fixed size array as the generated code does not have additional > calculations to get the address of qsbr counter array elements. > > > > [1] http://mails.dpdk.org/archives/dev/2019-February/125029.html >=20 > Ok I see, but can we then arrange them ina different way: > qsbr_cnt[] will start at the end of struct rte_rcu_qsbr (same as you have= it > right now). > While bitmap will be placed after qsbr_cnt[]. Yes, that is an option. Though, it would mean we have to calculate the addr= ess, similar to macro 'RTE_QSBR_CNT_ARRAY_ELM' > As I understand register/unregister is not consider on critical path, so = some > perf-degradation here doesn't matter. Yes > Also check() would need extra address calculation for bitmap, but conside= ring > that we have to go through all bitmap (and in worst case qsbr_cnt[]) > anyway, that probably not a big deal? I think the address calculation can be made simpler than what I had tried b= efore. I can give it a shot. >=20 > > > > > As another thought - do we really need bitmap at all? > > The bit map is helping avoid accessing all the elements in > > rte_rcu_qsbr_cnt array (as you have mentioned below). This provides > > the ability to scale the number of threads dynamically. For ex: an > application can create a qsbr variable with 48 max threads, but currently= only > 2 threads are active (due to traffic conditions). >=20 > I understand that bitmap supposed to speedup check() for situations when > most threads are unregistered. > My thought was that might be check() speedup for such situation is not th= at > critical. IMO, there is a need to address both the cases, considering the future dire= ction of DPDK. It is possible to introduce a counter for the current number= of threads registered. If that is same as maximum number of threads, then = scanning the registered thread ID array can be skipped. >=20 > > > > > Might it is possible to sotre register value for each thread inside > > > it's > > > rte_rcu_qsbr_cnt: > > > struct rte_rcu_qsbr_cnt {uint64_t cnt; uint32_t register;} > > > __rte_cache_aligned; ? > > > That would cause check() to walk through all elems in > > > rte_rcu_qsbr_cnt array, but from other side would help to avoid cache > conflicts for register/unregister. > > With the addition of rte_rcu_qsbr_thread_online/offline APIs, the > > register/unregister APIs are not in critical path anymore. Hence, the c= ache > conflicts are fine. The online/offline APIs work on thread specific cache= lines > and these are in the critical path. > > > > > > > > > +} __rte_cache_aligned; > > > > +