From: "Carrillo, Erik G"
To: "Wiles, Keith"
Cc: rsanford@akamai.com, dev@dpdk.org
Subject: Re: [dpdk-dev] [PATCH 0/3] *** timer library enhancements ***
Date: Wed, 23 Aug 2017 19:28:23 +0000
In-Reply-To: <28C555FD-9BAB-4A6D-BB9B-37BD42B750AD@intel.com>
References: <1503499644-29432-1-git-send-email-erik.g.carrillo@intel.com>
 <3F9B5E47-8083-443E-96EE-CBC41695BE43@intel.com>
 <28C555FD-9BAB-4A6D-BB9B-37BD42B750AD@intel.com>

> -----Original Message-----
> From: Wiles, Keith
> Sent: Wednesday, August 23, 2017 11:50 AM
> To: Carrillo, Erik G
> Cc: rsanford@akamai.com; dev@dpdk.org
> Subject: Re: [dpdk-dev] [PATCH 0/3] *** timer library enhancements ***
>
>
> > On Aug 23, 2017, at 11:19 AM, Carrillo, Erik G wrote:
> >
> >
> >> -----Original Message-----
> >> From: Wiles, Keith
> >> Sent: Wednesday, August 23, 2017 10:02 AM
> >> To: Carrillo, Erik G
> >> Cc: rsanford@akamai.com; dev@dpdk.org
> >> Subject: Re: [dpdk-dev] [PATCH 0/3] *** timer library enhancements ***
> >>
> >>
> >>> On Aug 23, 2017, at 9:47 AM, Gabriel Carrillo wrote:
> >>>
> >>> In the current implementation of the DPDK timer library, timers can
> >>> be created and set to be handled by a target lcore by adding them to
> >>> a skiplist that corresponds to that lcore.
> >>> However, if an application
> >>> enables multiple lcores, and each of these lcores repeatedly
> >>> attempts to install timers on the same target lcore, overall
> >>> application throughput will be reduced as all lcores contend to
> >>> acquire the lock guarding the single skiplist of pending timers.
> >>>
> >>> This patchset addresses this scenario by adding an array of
> >>> skiplists to each lcore's priv_timer struct, such that when lcore i
> >>> installs a timer on lcore k, the timer will be added to the ith
> >>> skiplist for lcore k. If lcore j installs a timer on lcore k
> >>> simultaneously, lcores i and j can both proceed since they will be
> >>> acquiring different locks for different lists.
> >>>
> >>> When lcore k processes its pending timers, it will traverse each
> >>> skiplist in its array and acquire a skiplist's lock while a run list
> >>> is broken out; meanwhile, all other lists can continue to be modified.
> >>> Then, all run lists for lcore k are collected and traversed together
> >>> so timers are executed in their global order.
> >>
> >> What is the performance and/or latency added to the timeout now?
> >>
> >> I worry about the case when just about all of the cores are enabled,
> >> which could be as high as 128 or more now.
> >
> > There is a case in the timer_perf_autotest that runs rte_timer_manage
> > with zero timers that can give a sense of the added latency. When run with
> > one lcore, it completes in around 25 cycles. When run with 43 lcores (the
> > highest I have access to at the moment), rte_timer_manage completes in
> > around 155 cycles. So it looks like each added lcore adds around 3 cycles of
> > overhead for checking empty lists in my testing.
>
> Does this mean we have only 25 cycles on the current design or is the 25
> cycles for the new design?
>

Both - when run with one lcore, the new design becomes equivalent to the
original one.  I tested the current design to confirm.

> If that is for the new design, then what is the old design's cost compared to
> the new cost?
>
> I also think we need the call to a timer function in the calculation, just to
> make sure we have at least one timer in the list and we account for any
> shortcuts in the code for no timers active.
>

Looking at the numbers for non-empty lists in timer_perf_autotest, the
overhead appears to fall away.  Here are some representative runs of
timer_perf_autotest:

43 lcores enabled, installing 1M timers on an lcore and processing them with
the current design:

<...snipped...>
Appending 1000000 timers
Time for 1000000 timers: 424066294 (193ms), Time per timer: 424 (0us)
Time for 1000000 callbacks: 73124504 (33ms), Time per callback: 73 (0us)
Resetting 1000000 timers
Time for 1000000 timers: 1406756396 (641ms), Time per timer: 1406 (1us)
<...snipped...>

43 lcores enabled, installing 1M timers on an lcore and processing them with
the proposed design:

<...snipped...>
Appending 1000000 timers
Time for 1000000 timers: 382912762 (174ms), Time per timer: 382 (0us)
Time for 1000000 callbacks: 79194418 (36ms), Time per callback: 79 (0us)
Resetting 1000000 timers
Time for 1000000 timers: 1427189116 (650ms), Time per timer: 1427 (1us)
<...snipped...>

The above are not averages, so the numbers don't really indicate which is
faster, but they show that the overhead of the proposed design should not be
appreciable.
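To make the layout from the cover letter a bit more concrete, here is a rough
sketch of the idea.  The struct, field, and function names below are
illustrative only and are simplified; they do not necessarily match the patch.
The point is just that an installing lcore locks only its own slot in the
target lcore's array:

    #include <rte_lcore.h>
    #include <rte_memory.h>
    #include <rte_spinlock.h>
    #include <rte_timer.h>

    /* Illustrative stand-in for one pending-timer skiplist. */
    struct skiplist {
            struct rte_timer head;   /* dummy head entry of this list */
            rte_spinlock_t lock;     /* guards only this list */
    };

    /* Illustrative stand-in for the per-lcore timer state: one pending
     * list per possible installing lcore instead of a single list and
     * a single lock. */
    struct priv_timer_sketch {
            struct skiplist pending_lists[RTE_MAX_LCORE];
            /* ... other per-lcore fields ... */
    } __rte_cache_aligned;

    static struct priv_timer_sketch priv_tim[RTE_MAX_LCORE];

    /* lcore 'installer_lcore' adds 'tim' to lcore 'tim_lcore': it takes
     * only the lock of its own slot, so two different installers do not
     * contend with each other. */
    static void
    timer_add_sketch(struct rte_timer *tim, unsigned int tim_lcore,
                     unsigned int installer_lcore)
    {
            struct skiplist *list =
                    &priv_tim[tim_lcore].pending_lists[installer_lcore];

            rte_spinlock_lock(&list->lock);
            /* ... skiplist insertion of tim goes here ... */
            rte_spinlock_unlock(&list->lock);
            (void)tim;
    }

When lcore k later runs rte_timer_manage, it walks pending_lists[] one list at
a time, holding each list's lock only while a run list is broken out, which is
what keeps the other installers unblocked.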
> >
> >> One option is to have the lcore j that wants to install a timer on
> >> lcore k pass a message via a ring to lcore k to add that timer. We
> >> could even add that logic into setting a timer on a different lcore
> >> than the caller in the current API. The ring would be multi-producer and
> >> single-consumer, so we still have the lock.
> >> What am I missing here?
> >>
> >
> > I did try this approach: initially I had a multi-producer single-consumer
> > ring that would hold requests to add or delete a timer from lcore k's
> > skiplist, but it didn't really give an appreciable increase in my test
> > application throughput.  In profiling this solution, the hotspot had moved
> > from acquiring the skiplist's spinlock to the rte_atomic32_cmpset that the
> > multi-producer ring code uses to manipulate the head pointer.
> >
> > Then, I tried multiple single-producer single-consumer rings per target
> > lcore.  This removed the ring hotspot, but the performance didn't increase
> > as much as with the proposed solution.  These solutions also add overhead
> > to rte_timer_manage, as it would have to process the rings and then process
> > the skiplists.
> >
> > One other thing to note is that a solution that uses such messages changes
> > the use models for the timer. One interesting example is:
> > - lcore i enqueues a message to install a timer on lcore k
> > - lcore k runs rte_timer_manage, processes its messages and adds the
> >   timer to its list
> > - lcore i then enqueues a message to stop the same timer, now owned by
> >   lcore k
> > - lcore k does not run rte_timer_manage again
> > - lcore i wants to free the timer, but it might not be safe
>
> This case seems like a mistake to me, as lcore k should continue to call
> rte_timer_manage() to process any new timers from other lcores, not just the
> case where the list becomes empty and lcore k does not add the timer to its
> list.
>
> >
> > Even though lcore i has successfully enqueued the request to stop the
> > timer (and delete it from lcore k's pending list), it hasn't actually been
> > deleted from the list yet, so freeing it could corrupt the list.  This case
> > exists in the existing timer stress tests.
> >
> > Another interesting scenario is:
> > - lcore i resets a timer to install it on lcore k
> > - lcore j resets the same timer to install it on lcore k
> > - then, lcore k runs rte_timer_manage
>
> This one also seems like a mistake; more than one lcore setting the same
> timer seems like a problem and should not be done. An lcore should own a
> timer and no other lcore should be able to change that timer. If multiple
> lcores need a timer, then they should not share the same timer structure.
>

Both of the above cases exist in the timer library stress tests, so a solution
would presumably need to address them or it would be less flexible.  The
original design passed these tests, as does the proposed one.
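For reference, the second scenario boils down to an access pattern roughly
like the snippet below.  This is only a distilled illustration using the
existing public rte_timer API, not the actual stress-test code; the helper
name, the lcore_k argument, and the launch wiring are placeholders:

    #include <rte_cycles.h>
    #include <rte_pause.h>
    #include <rte_timer.h>

    static struct rte_timer shared_tim;  /* same object used by lcores i and j */

    static void
    timer_cb(struct rte_timer *tim, void *arg)
    {
            /* executed by lcore k from within rte_timer_manage() */
            (void)tim;
            (void)arg;
    }

    /* Launched (e.g. via rte_eal_remote_launch) on both lcore i and lcore j.
     * Each call resets the *same* timer onto lcore k.  rte_timer_reset() can
     * return -1 while another lcore has the timer in the CONFIG/RUNNING
     * state, so the installer retries; the stress tests exercise this kind
     * of concurrent access, and any solution has to keep it safe.
     * rte_timer_init(&shared_tim) is assumed to have run once beforehand. */
    static int
    install_on_lcore_k(void *arg)
    {
            unsigned int lcore_k = *(unsigned int *)arg;

            while (rte_timer_reset(&shared_tim, rte_get_timer_hz(), SINGLE,
                                   lcore_k, timer_cb, NULL) != 0)
                    rte_pause();

            return 0;
    }

With a single pending list, both rte_timer_reset() calls contend for lcore k's
one skiplist lock; with per-installer lists, lcore i and lcore j take different
locks (the timer's own status word still serializes access to the shared
rte_timer itself), and lcore k still executes everything in global time order
from rte_timer_manage().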
> >
> > Lcore j's message obviates lcore i's message, and it would be wasted work
> > for lcore k to process it, so we should mark it to be skipped over.
> > Handling all the edge cases was more complex than the solution proposed.
>
> Hmmm, to me it seems simple here as long as the lcores follow the same
> rules, and sharing a timer structure is very risky and avoidable IMO.
>
> Once you have lcores adding timers to another lcore, then all accesses to
> that skiplist must be serialized or you get unpredictable results. This
> should also fix most of the edge cases you are talking about.
>
> Also, it seems to me the case of an lcore adding timers to another lcore's
> timer list is a specific use case and could be handled by a different set of
> APIs for that specific use case. Then we do not need to change the current
> design, and all of the overhead is placed on the new APIs/design. IMO we are
> turning the current timer design into a global timer design when it really is
> a per-lcore design today, and I believe that is a mistake.
>

Well, the original API explicitly supports installing a timer to be executed
on a different lcore, and there are no API changes in the patchset.  Also, the
proposed design keeps the per-lcore design intact; it only takes what used to
be one large skiplist that held timers for all installing lcores and separates
it into N skiplists that correspond 1:1 to an installing lcore.  When an lcore
processes timers on its lists, it will still only be managing timers it owns,
and no others.

> >
> >>>
> >>> Gabriel Carrillo (3):
> >>>   timer: add per-installer pending lists for each lcore
> >>>   timer: handle timers installed from non-EAL threads
> >>>   doc: update timer lib docs
> >>>
> >>>  doc/guides/prog_guide/timer_lib.rst |  19 ++-
> >>>  lib/librte_timer/rte_timer.c        | 329 +++++++++++++++++++-----------
> >>>  lib/librte_timer/rte_timer.h        |   9 +-
> >>>  3 files changed, 231 insertions(+), 126 deletions(-)
> >>>
> >>> --
> >>> 2.6.4
> >>>
> >>
> >> Regards,
> >> Keith
>
> Regards,
> Keith