From: "Wiles, Keith"
To: "Carrillo, Erik G"
CC: "rsanford@akamai.com", "dev@dpdk.org"
Date: Wed, 23 Aug 2017 16:50:14 +0000
Message-ID: <28C555FD-9BAB-4A6D-BB9B-37BD42B750AD@intel.com>
Subject: Re: [dpdk-dev] [PATCH 0/3] *** timer library enhancements ***

> On Aug 23, 2017, at 11:19 AM, Carrillo, Erik G wrote:
>
>> -----Original Message-----
>> From: Wiles, Keith
>> Sent: Wednesday, August 23, 2017 10:02 AM
>> To: Carrillo, Erik G
>> Cc: rsanford@akamai.com; dev@dpdk.org
>> Subject: Re: [dpdk-dev] [PATCH 0/3] *** timer library enhancements ***
>>
>>> On Aug 23, 2017, at 9:47 AM, Gabriel Carrillo wrote:
>>>
>>> In the current implementation of the DPDK timer library, timers can be
>>> created and set to be handled by a target lcore by adding them to a
>>> skiplist that corresponds to that lcore. However, if an application
>>> enables multiple lcores, and each of these lcores repeatedly attempts
>>> to install timers on the same target lcore, overall application
>>> throughput will be reduced as all lcores contend to acquire the lock
>>> guarding the single skiplist of pending timers.
>>>
>>> This patchset addresses this scenario by adding an array of skiplists
>>> to each lcore's priv_timer struct, such that when lcore i installs a
>>> timer on lcore k, the timer will be added to the ith skiplist for
>>> lcore k. If lcore j installs a timer on lcore k simultaneously,
>>> lcores i and j can both proceed since they will be acquiring different
>>> locks for different lists.
>>>
>>> When lcore k processes its pending timers, it will traverse each
>>> skiplist in its array and acquire a skiplist's lock while a run list
>>> is broken out; meanwhile, all other lists can continue to be modified.
>>> Then, all run lists for lcore k are collected and traversed together
>>> so timers are executed in their global order.
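
Just to make sure I am reading the proposal correctly, my mental model of
the per-installer layout is something like the sketch below. The type and
field names are mine, not taken from the patch, so treat it as illustrative
only:

/* Illustrative sketch only -- not the actual patch code. Each target
 * lcore keeps one pending skiplist (plus its own lock) per possible
 * installing lcore, so two different installers never contend on the
 * same lock. */
#include <rte_lcore.h>
#include <rte_spinlock.h>
#include <rte_timer.h>

struct priv_timer_sketch {
        struct {
                struct rte_timer list_head; /* dummy head of this skiplist */
                rte_spinlock_t   list_lock; /* guards only this skiplist   */
        } pending[RTE_MAX_LCORE];           /* slot index = installing lcore */
        /* ... existing per-lcore bookkeeping fields ... */
};

/*
 * Conceptually, when lcore i installs a timer on target lcore k:
 *
 *     rte_spinlock_lock(&priv_timer[k].pending[i].list_lock);
 *     ... insert the timer into priv_timer[k].pending[i] ...
 *     rte_spinlock_unlock(&priv_timer[k].pending[i].list_lock);
 *
 * and rte_timer_manage() on lcore k walks pending[0..RTE_MAX_LCORE-1],
 * breaking out a run list from each under its respective lock.
 */

If that is right, then the number of lists lcore k has to walk in
rte_timer_manage() grows with the number of possible installers, which is
what prompted the performance question below.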
>>
>> What is the performance and/or latency added to the timeout now?
>>
>> I worry about the case when just about all of the cores are enabled,
>> which could be as high as 128 or more now.
>
> There is a case in the timer_perf_autotest that runs rte_timer_manage
> with zero timers that can give a sense of the added latency. When run
> with one lcore, it completes in around 25 cycles. When run with 43 lcores
> (the highest I have access to at the moment), rte_timer_manage completes
> in around 155 cycles. So it looks like each added lcore adds around 3
> cycles of overhead for checking empty lists in my testing.

Does this mean we have only 25 cycles on the current design, or is the 25
cycles for the new design?

If it is for the new design, then what is the old design's cost compared to
the new cost?

I also think we need the call to a timer function in the calculation, just
to make sure we have at least one timer in the list and we account for any
shortcuts in the code for no timers active.

>
>>
>> One option is to have the lcore j that wants to install a timer on
>> lcore k pass a message via a ring to lcore k to add that timer. We could
>> even add that logic into setting a timer on a different lcore than the
>> caller in the current API. The ring would be multi-producer and
>> single-consumer; we still have the lock. What am I missing here?
>>
>
> I did try this approach: initially I had a multi-producer single-consumer
> ring that would hold requests to add or delete a timer from lcore k's
> skiplist, but it didn't really give an appreciable increase in my test
> application throughput. In profiling this solution, the hotspot had moved
> from acquiring the skiplist's spinlock to the rte_atomic32_cmpset that
> the multi-producer ring code uses to manipulate the head pointer.
>
> Then, I tried multiple single-producer single-consumer rings per target
> lcore. This removed the ring hotspot, but the performance didn't increase
> as much as with the proposed solution. These solutions also add overhead
> to rte_timer_manage, as it would have to process the rings and then
> process the skiplists.
>
> One other thing to note is that a solution that uses such messages
> changes the use models for the timer. One interesting example is:
> - lcore i enqueues a message to install a timer on lcore k
> - lcore k runs rte_timer_manage, processes its messages and adds the
>   timer to its list
> - lcore i then enqueues a message to stop the same timer, now owned by
>   lcore k
> - lcore k does not run rte_timer_manage again
> - lcore i wants to free the timer but it might not be safe

This case seems like a mistake to me, as lcore k should continue to call
rte_timer_manage() to process any new timers from other lcores, not just in
the case where the list becomes empty and lcore k does not add timers to
its own list.

>
> Even though lcore i has successfully enqueued the request to stop the
> timer (and delete it from lcore k's pending list), it hasn't actually
> been deleted from the list yet, so freeing it could corrupt the list.
> This case exists in the existing timer stress tests.
>
> Another interesting scenario is:
> - lcore i resets a timer to install it on lcore k
> - lcore j resets the same timer to install it on lcore k
> - then, lcore k runs rte_timer_manage

This one also seems like a mistake; more than one lcore setting the same
timer seems like a problem and should not be done. An lcore should own a
timer, and no other lcore should be able to change that timer. If multiple
lcores need timers, they should not share the same timer structure.

>
> Lcore j's message obviates lcore i's message, and it would be wasted work
> for lcore k to process it, so we should mark it to be skipped over.
> Handling all the edge cases was more complex than the solution proposed.

Hmmm, to me it seems simple here as long as the lcores follow the same
rules; sharing a timer structure is very risky and avoidable, IMO.

Once you have lcores adding timers to another lcore, all accesses to that
skiplist must be serialized or you get unpredictable results. This should
also fix most of the edge cases you are talking about.

Also, it seems to me that an lcore adding timers to another lcore's timer
list is a specific use case and could be handled by a different set of APIs
for that use case. Then we do not need to change the current design, and
all of the overhead is placed on the new APIs/design. IMO we are turning
the current timer design into a global timer design when it really is a
per-lcore design today, and I believe that is a mistake.
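
To be concrete, the kind of thing I have in mind is a small add-on API that
leaves today's per-lcore skiplist alone and pushes cross-lcore installs
through per-installer rings. This is only a rough sketch of the shape of
it; none of these names exist in librte_timer today:

/* Hypothetical add-on API -- not part of librte_timer. Cross-lcore
 * installs go through one single-producer/single-consumer ring per
 * (installer, target) pair; the existing per-lcore skiplist and
 * rte_timer_manage() are untouched. */
#include <rte_lcore.h>
#include <rte_ring.h>
#include <rte_timer.h>

/* rings[target][installer], each created at init time (not shown) with
 * RING_F_SP_ENQ | RING_F_SC_DEQ. */
extern struct rte_ring *remote_rings[RTE_MAX_LCORE][RTE_MAX_LCORE];

struct remote_req {
        struct rte_timer *tim;
        uint64_t ticks;
        rte_timer_cb_t fct;
        void *arg;
};

/* Called by the installing lcore: hand the request to the target lcore.
 * The request object must stay valid until the target consumes it
 * (e.g., allocate it from a mempool). */
static inline int
timer_reset_remote(unsigned target_lcore, struct remote_req *req)
{
        struct rte_ring *r = remote_rings[target_lcore][rte_lcore_id()];

        return rte_ring_enqueue(r, req); /* -ENOBUFS if the ring is full */
}

/* Called only by the target lcore, alongside rte_timer_manage(): drain
 * the rings and arm the timers locally, so the skiplist is only ever
 * touched by its owner. */
static inline void
timer_manage_remote(void)
{
        unsigned me = rte_lcore_id();
        unsigned i;
        void *p;

        for (i = 0; i < RTE_MAX_LCORE; i++) {
                struct rte_ring *r = remote_rings[me][i];

                if (r == NULL)
                        continue;
                while (rte_ring_dequeue(r, &p) == 0) {
                        struct remote_req *req = p;

                        rte_timer_reset(req->tim, req->ticks, SINGLE,
                                        me, req->fct, req->arg);
                }
        }
}

With something like that, the skiplist is only ever touched by its owning
lcore, and applications that never install timers across lcores pay nothing
for it.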
>
>>>
>>> Gabriel Carrillo (3):
>>>  timer: add per-installer pending lists for each lcore
>>>  timer: handle timers installed from non-EAL threads
>>>  doc: update timer lib docs
>>>
>>> doc/guides/prog_guide/timer_lib.rst |  19 ++-
>>> lib/librte_timer/rte_timer.c        | 329 +++++++++++++++++++++++-------------
>>> lib/librte_timer/rte_timer.h        |   9 +-
>>> 3 files changed, 231 insertions(+), 126 deletions(-)
>>>
>>> --
>>> 2.6.4
>>>
>>
>> Regards,
>> Keith

Regards,
Keith
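
P.S. On the measurement point above, the number I would like to see comes
from a loop along these lines (untested, and it assumes rte_eal_init() and
rte_timer_subsystem_init() have already run), so that there is always at
least one live timer and the occasional callback in the list instead of
only the empty-list fast path:

#include <stdio.h>
#include <rte_common.h>
#include <rte_cycles.h>
#include <rte_lcore.h>
#include <rte_timer.h>

#define LOOPS 1000000

static void
dummy_cb(struct rte_timer *tim, void *arg)
{
        RTE_SET_USED(tim);
        RTE_SET_USED(arg);
}

static void
measure_manage(void)
{
        struct rte_timer tim;
        uint64_t start, end;
        int i;

        rte_timer_init(&tim);
        /* PERIODICAL so the timer re-arms itself and the pending list is
         * never empty while we measure. */
        rte_timer_reset(&tim, rte_get_timer_hz() / 1000, PERIODICAL,
                        rte_lcore_id(), dummy_cb, NULL);

        start = rte_rdtsc();
        for (i = 0; i < LOOPS; i++)
                rte_timer_manage();
        end = rte_rdtsc();

        rte_timer_stop(&tim);
        printf("avg cycles per rte_timer_manage(): %.1f\n",
               (double)(end - start) / LOOPS);
}

Running the same loop on the old code and on the new code, with the same
lcore count, would give numbers we can compare directly.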