From mboxrd@z Thu Jan 1 00:00:00 1970
From: Honnappa Nagarahalli
To: "Ananyev, Konstantin", "dev@dpdk.org", "stephen@networkplumber.org", "paulmck@linux.ibm.com"
CC: "Gavin Hu (Arm Technology China)", Dharmik Thakkar, nd, Honnappa Nagarahalli, nd
Date: Fri, 22 Feb 2019 07:07:59 +0000
References: <20181122033055.3431-1-honnappa.nagarahalli@arm.com>
 <20181222021420.5114-1-honnappa.nagarahalli@arm.com>
 <20181222021420.5114-2-honnappa.nagarahalli@arm.com>
 <2601191342CEEE43887BDE71AB977258010D904212@irsmsx105.ger.corp.intel.com>
 <2601191342CEEE43887BDE71AB977258010D904AC7@irsmsx105.ger.corp.intel.com>
 <2601191342CEEE43887BDE71AB977258010D9058F6@irsmsx105.ger.corp.intel.com>
 <2601191342CEEE43887BDE71AB977258010D907734@irsmsx105.ger.corp.intel.com>
In-Reply-To: <2601191342CEEE43887BDE71AB977258010D907734@irsmsx105.ger.corp.intel.com>
Subject: Re: [dpdk-dev] [RFC v2 1/2] rcu: add RCU library supporting QSBR mechanism
List-Id: DPDK patches and discussions

> > > > > > > > +/**
> > > > > > > > + * RTE thread Quiescent State structure.
> > > > > > > > + */
> > > > > > > > +struct rte_rcu_qsbr {
> > > > > > > > +	uint64_t reg_thread_id[RTE_QSBR_BIT_MAP_ELEMS] __rte_cache_aligned;
> > > > > > > > +	/**< Registered reader thread IDs - reader threads reporting
> > > > > > > > +	 * on this QS variable represented in a bit map.
> > > > > > > > +	 */
> > > > > > > > +
> > > > > > > > +	uint64_t token __rte_cache_aligned;
> > > > > > > > +	/**< Counter to allow for multiple simultaneous QS queries */
> > > > > > > > +
> > > > > > > > +	struct rte_rcu_qsbr_cnt w[RTE_RCU_MAX_THREADS] __rte_cache_aligned;
> > > > > > > > +	/**< QS counter for each reader thread, counts upto
> > > > > > > > +	 * current value of token.
> > > > > > >
> > > > > > > As I understand you decided to stick with neutral thread_id and
> > > > > > > let user define what exactly thread_id is (lcore, syste, thread
> > > > > > > id, something else)?
> > > > > > Yes, that is correct. I will reply to the other thread to continue
> > > > > > the discussion.
> > > > > >
> > > > > > > If so, can you probably get rid of RTE_RCU_MAX_THREADS limitation?
> > > > > > I am not seeing this as a limitation. The user can change this if
> > > > > > required. May be I should change it as follows:
> > > > > > #ifndef RTE_RCU_MAX_THREADS
> > > > > > #define RTE_RCU_MAX_THREADS 128
> > > > > > #endif
> > > > >
> > > > > Yep, that's better, though it would still require user to rebuild
> > > > > the code if he would like to increase total number of threads
> > > > > supported.
> > > > Agree
> > > >
> > > > > Though it seems relatively simply to extend current code to support
> > > > > dynamic max thread num here (2 variable arrays plus shift value plus
> > > > > mask).
> > > > Agree, supporting dynamic 'max thread num' is simple. But this means
> > > > memory needs to be allocated to the arrays. The API
> > > > 'rte_rcu_qsbr_init' has to take max thread num as the parameter. We
> > > > also have to introduce another API to free this memory. This will
> > > > become very similar to alloc/free APIs I had in the v1.
> > > > I hope I am following you well, please correct me if not.
> > >
> > > I think we can still leave alloc/free tasks to the user.
> > > We probably just need extra function rte_rcu_qsbr_size(uint32_ max_threads)
> > > to help user calculate required size.
> > > rte_rcu_qsbr_init() might take as an additional parameter 'size' to
> > > make checks.
> > The size is returned by an API provided by the library. Why does it
> > need to be validated again? If 'size' is required for
> > rte_rcu_qsbr_init, it could calculate it again.
>
> Just as extra-safety check.
> I don't have strong opinion here - if you think it is overkill, let's
> drop it.
>
> > > Thought about something like that:
> > >
> > > size_t sz = rte_rcu_qsbr_size(max_threads);
> > > struct rte_rcu_qsbr *qsbr = alloc_aligned(CACHE_LINE, sz);
> > > rte_rcu_qsbr_init(qsbr, max_threads, sz);
> > > ...
> > >
> > Do you see any advantage for allowing the user to allocate the memory?
> So user can choose where to allocate the memory (eal malloc, normal
> malloc, stack, something else).
> Again user might decide to make rcu part of some complex data structure -
> in that case he probably would like to allocate one big chunk of memory
> at once and then provide part of it for rcu.
> Or some other usage scenario that I can't predict.
>

I made this change and added performance tests similar to liburcu. With
the dynamic memory allocation change, the performance of
rte_rcu_qsbr_update comes down by 42% - 45% and that of
rte_rcu_qsbr_check also comes down by 133% on the Arm platform. On x86
(E5-2660 v4 @ 2.00GHz), the results are mixed: rte_rcu_qsbr_update comes
down by 15%, but rte_rcu_qsbr_check improves.

On the Arm platform, the issue seems to be due to the address calculation
that needs to happen at run time. If I fix the reg_thread_id array size,
I get back, or improve on, the performance for both Arm and x86. What
this means is that we will still have a max thread limitation, but it
will be high - 512 (1 cache line). We could make this 1024 (2 cache
lines).
However, the per-thread counter data size will depend on the 'max thread'
value provided by the user. I think this solution serves your requirement
(though with an acceptable constraint not affecting the near future),
please let me know what you think.

These changes and the 3 variants of the implementation are present in
RFC v3 [1], in case you want to run these tests.
1/5, 2/5 - same as RFC v2 + 1 bug fixed
3/5 - Addition of rte_rcu_qsbr_get_memsize. Memory size for the register
      thread bitmap array as well as the per-thread counter data is
      calculated based on the max_threads parameter
4/5 - Test cases are modified to use the new API
5/5 - Size of the register thread bitmap array is fixed to hold 512
      thread IDs. However, the per-thread counter data is calculated
      based on the max_threads parameter.

If you do not want to run the tests, you can just look at 3/5 and 5/5.

[1] http://patchwork.dpdk.org/cover/50431/

> > This approach requires the user to call 3 APIs (including memory
> > allocation). These 3 can be abstracted in a rte_rcu_qsbr_alloc API,
> > user has to call just 1 API.
> >
> > > Konstantin
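
For readers following the thread, the layout described for 5/5 (a register bitmap fixed at 512 thread IDs, with per-thread counter storage sized from max_threads and memory supplied by the caller) might look roughly like the sketch below. This is only an illustration under stated assumptions: every sketch_/SKETCH_ name is hypothetical and is not taken from the RFC patches, a 64-byte cache line is assumed, and the bit set in the register helper is deliberately non-atomic, which real reader registration could not be.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Assumptions: 64-byte cache line, bitmap fixed at 512 thread IDs. */
#define SKETCH_CACHE_LINE   64
#define SKETCH_MAX_THREADS  512   /* 512 bits = 64 bytes = 1 cache line */
#define SKETCH_BITMAP_ELEMS (SKETCH_MAX_THREADS / 64)

/* One counter per reader thread, padded to its own cache line. */
struct sketch_qsbr_cnt {
	_Alignas(SKETCH_CACHE_LINE) uint64_t cnt;
};

struct sketch_qsbr {
	/* Fixed-size bitmap: element and bit offsets are compile-time constants. */
	_Alignas(SKETCH_CACHE_LINE) uint64_t reg_thread_id[SKETCH_BITMAP_ELEMS];
	_Alignas(SKETCH_CACHE_LINE) uint64_t token;
	uint32_t max_threads;
	/* Sized at run time from max_threads. */
	struct sketch_qsbr_cnt w[];
};

/* Bytes the caller must provide; 0 means max_threads is out of range. */
size_t sketch_qsbr_memsize(uint32_t max_threads)
{
	if (max_threads == 0 || max_threads > SKETCH_MAX_THREADS)
		return 0;
	return sizeof(struct sketch_qsbr) +
	       (size_t)max_threads * sizeof(struct sketch_qsbr_cnt);
}

/* Initialise caller-provided, cache-line-aligned memory. */
int sketch_qsbr_init(struct sketch_qsbr *v, uint32_t max_threads)
{
	size_t sz = sketch_qsbr_memsize(max_threads);

	if (v == NULL || sz == 0)
		return -1;
	memset(v, 0, sz);
	v->max_threads = max_threads;
	return 0;
}

/* Mark a thread ID as registered (non-atomic, illustration only). */
int sketch_qsbr_register(struct sketch_qsbr *v, uint32_t thread_id)
{
	if (v == NULL || thread_id >= v->max_threads)
		return -1;
	v->reg_thread_id[thread_id >> 6] |= UINT64_C(1) << (thread_id & 63);
	return 0;
}

/* Caller-side flow: compute size, allocate, init, register, free. */
int sketch_qsbr_demo(void)
{
	uint32_t max_threads = 128;
	size_t sz = sketch_qsbr_memsize(max_threads);
	struct sketch_qsbr *v = aligned_alloc(SKETCH_CACHE_LINE, sz);
	int rc = sketch_qsbr_init(v, max_threads);

	if (rc == 0)
		rc = sketch_qsbr_register(v, 70);
	/* Thread 70 lands in bitmap element 1, bit 6. */
	if (rc == 0 && v->reg_thread_id[1] != (UINT64_C(1) << 6))
		rc = -1;
	free(v);
	return rc;
}
```

Because the caller owns the allocation, the same init call would work on memory obtained from any allocator or embedded in a larger structure, provided it is cache-line aligned and at least sketch_qsbr_memsize() bytes long.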