From: "Carrillo, Erik G" <erik.g.carrillo@intel.com>
To: "Wiles, Keith" <keith.wiles@intel.com>
CC: "rsanford@akamai.com" <rsanford@akamai.com>, "dev@dpdk.org" <dev@dpdk.org>
Date: Wed, 23 Aug 2017 16:19:20 +0000
Message-ID: <BE54F058557D9A4FAC1D84E2FC6D87570D52F2CE@fmsmsx115.amr.corp.intel.com>
References: <1503499644-29432-1-git-send-email-erik.g.carrillo@intel.com>
 <3F9B5E47-8083-443E-96EE-CBC41695BE43@intel.com>
In-Reply-To: <3F9B5E47-8083-443E-96EE-CBC41695BE43@intel.com>
Subject: Re: [dpdk-dev] [PATCH 0/3] *** timer library enhancements ***

> -----Original Message-----
> From: Wiles, Keith
> Sent: Wednesday, August 23, 2017 10:02 AM
> To: Carrillo, Erik G <erik.g.carrillo@intel.com>
> Cc: rsanford@akamai.com; dev@dpdk.org
> Subject: Re: [dpdk-dev] [PATCH 0/3] *** timer library enhancements ***
>
>
> > On Aug 23, 2017, at 9:47 AM, Gabriel Carrillo <erik.g.carrillo@intel.com> wrote:
> >
> > In the current implementation of the DPDK timer library, timers can be
> > created and set to be handled by a target lcore by adding it to a
> > skiplist that corresponds to that lcore.  However, if an application
> > enables multiple lcores, and each of these lcores repeatedly attempts
> > to install timers on the same target lcore, overall application
> > throughput will be reduced as all lcores contend to acquire the lock
> > guarding the single skiplist of pending timers.
> >
> > This patchset addresses this scenario by adding an array of skiplists
> > to each lcore's priv_timer struct, such that when lcore i installs a
> > timer on lcore k, the timer will be added to the ith skiplist for
> > lcore k.  If lcore j installs a timer on lcore k simultaneously,
> > lcores i and j can both proceed since they will be acquiring different
> > locks for different lists.
> >
> > When lcore k processes its pending timers, it will traverse each
> > skiplist in its array and acquire a skiplist's lock while a run list
> > is broken out; meanwhile, all other lists can continue to be modified.
> > Then, all run lists for lcore k are collected and traversed together
> > so timers are executed in their global order.
>
> What is the performance and/or latency added to the timeout now?
>
> I worry about the case when just about all of the cores are enabled, which
> could be as high as 128 or more now.

There is a case in the timer_perf_autotest that runs rte_timer_manage with zero timers that can give a sense of the added latency.  When run with one lcore, it completes in around 25 cycles.  When run with 43 lcores (the highest I have access to at the moment), rte_timer_manage completes in around 155 cycles.  So it looks like each added lcore adds around (155 - 25) / (43 - 1), or roughly 3 cycles of overhead for checking an empty list in my testing.
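
For reference, the measurement is essentially of this shape (a minimal sketch of what timer_perf_autotest does, not the actual test code; the warm-up and averaging loops are omitted):

  #include <rte_cycles.h>
  #include <rte_timer.h>

  /* with no timers pending, this measures only the cost of
   * rte_timer_manage walking the pending lists and finding
   * them all empty */
  uint64_t start = rte_rdtsc();
  rte_timer_manage();
  uint64_t cycles = rte_rdtsc() - start;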

>
> One option is to have the lcore j that wants to install a timer on lcore k to pass
> a message via a ring to lcore k to add that timer. We could even add that logic
> into setting a timer on a different lcore than the caller in the current API. The
> ring would be multi-producer and single-consumer, we still have the lock.
> What am I missing here?
>

I did try this approach: initially I had a multi-producer single-consumer ring that would hold requests to add or delete a timer from lcore k's skiplist, but it didn't really give an appreciable increase in my test application throughput.  In profiling this solution, the hotspot had moved from acquiring the skiplist's spinlock to the rte_atomic32_cmpset that the multi-producer ring code uses to manipulate the head pointer.
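
For context, that variant was shaped roughly like this (timer_msg, req_ring, and apply_request are illustrative names, not actual code from my prototype):

  #include <rte_lcore.h>
  #include <rte_ring.h>
  #include <rte_timer.h>

  struct timer_msg { struct rte_timer *tim; int op; /* add or stop */ };
  static struct rte_ring *req_ring[RTE_MAX_LCORE]; /* one MP/SC ring per target */

  static void apply_request(void *msg); /* hypothetical: updates k's skiplist */

  /* installer side: lcore i posts a request to target lcore k; the MP
   * enqueue advances the ring head with rte_atomic32_cmpset, which is
   * where the hotspot moved in my profiling */
  static void
  send_request(unsigned int k, struct timer_msg *msg)
  {
          while (rte_ring_enqueue(req_ring[k], msg) != 0)
                  ; /* ring full; retry */
  }

  /* consumer side: drained at the top of rte_timer_manage on lcore k */
  static void
  drain_requests(unsigned int k)
  {
          void *msg;

          while (rte_ring_dequeue(req_ring[k], &msg) == 0)
                  apply_request(msg);
  }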

Then, I tried multiple single-producer single-consumer rings per target lcore.  This removed the ring hotspot, but the performance didn't increase as much as with the proposed solution.  These solutions also add overhead to rte_timer_manage, as it would have to process the rings and then process the skiplists.
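
That variant just replaces each target's single MP ring with one SP/SC ring per (installer, target) pair, along these lines (again illustrative; the ring size is arbitrary here):

  #include <stdio.h>
  #include <rte_lcore.h>
  #include <rte_ring.h>

  /* one single-producer/single-consumer ring per (installer i,
   * target k) pair removes the cmpset contention on enqueue */
  static struct rte_ring *req_ring[RTE_MAX_LCORE][RTE_MAX_LCORE];

  static void
  init_rings(void)
  {
          char name[RTE_RING_NAMESIZE];
          unsigned int i, k;

          for (k = 0; k < RTE_MAX_LCORE; k++)
                  for (i = 0; i < RTE_MAX_LCORE; i++) {
                          snprintf(name, sizeof(name), "timer_req_%u_%u", i, k);
                          req_ring[i][k] = rte_ring_create(name, 1024,
                                  rte_socket_id(),
                                  RING_F_SP_ENQ | RING_F_SC_DEQ);
                  }
  }

The cost shows up on the consumer side: rte_timer_manage on lcore k now has to poll up to RTE_MAX_LCORE rings before it even touches its skiplists.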

One other thing to note is that a solution that uses such messages changes the usage model for the timer.  One interesting example is:
- lcore i enqueues a message to install a timer on lcore k
- lcore k runs rte_timer_manage, processes its messages, and adds the timer to its list
- lcore i then enqueues a message to stop the same timer, now owned by lcore k
- lcore k does not run rte_timer_manage again
- lcore i wants to free the timer, but it might not be safe

Even though lcore i has successfully enqueued the request to stop the timer (and delete it from lcore k's pending list), the timer hasn't actually been deleted from the list yet, so freeing it could corrupt the list.  This case exists in the existing timer stress tests.
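
In code form, the hazard is the usual stop-then-free pattern (a sketch assuming a hypothetical message-based implementation, not the current library):

  /* on lcore i, if rte_timer_stop only enqueued a request: */
  rte_timer_stop(tim);   /* stop request posted to lcore k's ring */
  rte_free(tim);         /* unsafe: tim may still be linked in lcore k's
                          * pending list until k runs rte_timer_manage */

With the current synchronous implementation, a successful rte_timer_stop guarantees the timer is no longer in any pending list, so the subsequent free is safe.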

Another interesting scenario is:
- lcore i resets a timer to install it on lcore k
- lcore j resets the same timer to install it on lcore k
- then, lcore k runs rte_timer_manage

Lcore j's message obviates lcore i's message, and it would be wasted work for lcore k to process it, so we should mark it to be skipped over.  Handling all the edge cases was more complex than the solution proposed.
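
As a sketch (again assuming queued reset requests rather than the proposed patch; ticks, cb, and arg are placeholders):

  /* lcore i: posts request A to install tim on lcore k */
  rte_timer_reset(tim, ticks, SINGLE, k, cb, arg);

  /* lcore j: posts request B for the same timer, same target */
  rte_timer_reset(tim, ticks, SINGLE, k, cb, arg);

  /* when lcore k drains its ring, request A is stale; unless it is
   * detected and skipped, tim could end up linked into the pending
   * list twice */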

> >
> > Gabriel Carrillo (3):
> >  timer: add per-installer pending lists for each lcore
> >  timer: handle timers installed from non-EAL threads
> >  doc: update timer lib docs
> >
> > doc/guides/prog_guide/timer_lib.rst |  19 ++-
> > lib/librte_timer/rte_timer.c        | 329 +++++++++++++++++++++++-------------
> > lib/librte_timer/rte_timer.h        |   9 +-
> > 3 files changed, 231 insertions(+), 126 deletions(-)
> >
> > --
> > 2.6.4
> >
>
> Regards,
> Keith