From: "Ananyev, Konstantin"
To: Neil Horman
Cc: "dev@dpdk.org"
Date: Thu, 25 Sep 2014 23:24:30 +0000
Subject: Re: [dpdk-dev] [PATCH v2] Change alarm cancel function to thread-safe:
Message-ID: <2601191342CEEE43887BDE71AB97725821378B50@IRSMSX104.ger.corp.intel.com>
In-Reply-To: <20140925172358.GG32725@hmsreliant.think-freely.org>
References: <1411649768-8084-1-git-send-email-michalx.k.jastrzebski@intel.com> <20140925150807.GD32725@hmsreliant.think-freely.org> <2601191342CEEE43887BDE71AB977258213769DE@IRSMSX105.ger.corp.intel.com> <20140925172358.GG32725@hmsreliant.think-freely.org>
List-Id: patches and discussions about DPDK

> From: Neil Horman [mailto:nhorman@tuxdriver.com]
> Sent: Thursday, September 25, 2014 6:24 PM
> To: Ananyev, Konstantin
> Cc: Jastrzebski, MichalX K; dev@dpdk.org
> Subject: Re: [dpdk-dev] [PATCH v2] Change alarm cancel function to thread-safe:
>
> On Thu, Sep 25, 2014 at 04:03:48PM +0000, Ananyev, Konstantin wrote:
> >
> >
> > > -----Original Message-----
> > > From: dev [mailto:dev-bounces@dpdk.org] On Behalf Of Neil Horman
> > > Sent: Thursday, September 25, 2014 4:08 PM
> > > To: Jastrzebski, MichalX K
> > > Cc: dev@dpdk.org
> > > Subject: Re: [dpdk-dev] [PATCH v2] Change alarm cancel function to thread-safe:
> > >
> > > On Thu, Sep 25, 2014 at 01:56:08PM +0100, Michal Jastrzebski wrote:
> > > > Change alarm cancel function to thread-safe.
> > > > It eliminates a race between threads using rte_alarm_cancel and
> > > > rte_alarm_set.
> > > >
> > > > Signed-off-by: Pawel Wodkowski
> > > > Reviewed-by: Michal Jastrzebski
> > > >
> > > > ---
> > > >  lib/librte_eal/common/include/rte_alarm.h | 3 +-
> > > >  lib/librte_eal/linuxapp/eal/eal_alarm.c | 68 ++++++++++++++++++-----------
> > > >  2 files changed, 45 insertions(+), 26 deletions(-)
> > > >
> > >
> > > > diff --git a/lib/librte_eal/common/include/rte_alarm.h b/lib/librte_eal/common/include/rte_alarm.h
> > > > index d451522..e7cbaef 100644
> > > > --- a/lib/librte_eal/common/include/rte_alarm.h
> > > > +++ b/lib/librte_eal/common/include/rte_alarm.h
> > > > @@ -76,7 +76,8 @@ typedef void (*rte_eal_alarm_callback)(void *arg);
> > > >  int rte_eal_alarm_set(uint64_t us, rte_eal_alarm_callback cb, void *cb_arg);
> > > >
> > > >  /**
> > > > - * Function to cancel an alarm callback which has been registered before.
> > > > + * Function to cancel an alarm callback which has been registered before. If
> > > > + * used outside alarm callback it wait for all callbacks to finish its execution.
> > > >   *
> > > >   * @param cb_fn
> > > >   *  alarm callback
> > > > diff --git a/lib/librte_eal/linuxapp/eal/eal_alarm.c b/lib/librte_eal/linuxapp/eal/eal_alarm.c
> > > > index 480f0cb..ea8dfb4 100644
> > > > --- a/lib/librte_eal/linuxapp/eal/eal_alarm.c
> > > > +++ b/lib/librte_eal/linuxapp/eal/eal_alarm.c
> > > > @@ -69,7 +69,8 @@ struct alarm_entry {
> > > >  	struct timeval time;
> > > >  	rte_eal_alarm_callback cb_fn;
> > > >  	void *cb_arg;
> > > > -	volatile int executing;
> > > > +	volatile uint8_t executing;
> > > > +	volatile pthread_t executing_id;
> > > >  };
> > > >
> > > >  static LIST_HEAD(alarm_list, alarm_entry) alarm_list = LIST_HEAD_INITIALIZER();
> > > > @@ -108,11 +109,13 @@ eal_alarm_callback(struct rte_intr_handle *hdl __rte_unused,
> > > >  			(ap->time.tv_sec < now.tv_sec || (ap->time.tv_sec == now.tv_sec &&
> > > >  			ap->time.tv_usec <= now.tv_usec))){
> > > >  		ap->executing = 1;
> > > > +		ap->executing_id = pthread_self();
> > > How exactly does this work?  From my read all alarm callbacks are handled by the
> > > thread created in rte_eal_intr_init (which runs forever in
> > > eal_intr_thread_main()).
> >
> > In current implementation - yes.
> >
> > So every assignment to the above executing_id value
> > > will be from that thread.  As such, anytime rte_eal_alarm_cancel is called from
> > > within a callback we are guaranteed that:
> > > a) the ap->executing flag is set to 1
> > > b) the ap->executing_id value should equal whatever is returned from
> > > pthread_self()
> >
> > Yes
> >
> > >
> > > That will cause the executing counter local to the cancel function to get
> > > incremented, meaning we will deadlock withing that do { ... } while (executing
> > > != 0) loop, no?
> >
> > No, as for the case when cancel is called from callback:
> > pthread_equal(ap->executing_id, pthread_self())
> > would return non-zero value (which means threads ids are equal), so executing will not be incremented.
> >
> Ah, pthread_equal is one of the backwards functions that returns zero for
> inequality.  Maybe then rewrite that as:
> if (!pthread_equal(...)
>
> So its clear that we're looking for inequality there to increment?
>
> > >
> > > >  		rte_spinlock_unlock(&alarm_list_lk);
> > > >
> > > >  		ap->cb_fn(ap->cb_arg);
> > > >
> > > >  		rte_spinlock_lock(&alarm_list_lk);
> > > > +
> > > >  		LIST_REMOVE(ap, next);
> > > >  		rte_free(ap);
> > > >  	}
> > > > @@ -145,7 +148,7 @@ rte_eal_alarm_set(uint64_t us, rte_eal_alarm_callback cb_fn, void *cb_arg)
> > > >  	if (us < 1 || us > (UINT64_MAX - US_PER_S) || cb_fn == NULL)
> > > >  		return -EINVAL;
> > > >
> > > > -	new_alarm = rte_malloc(NULL, sizeof(*new_alarm), 0);
> > > > +	new_alarm = rte_zmalloc(NULL, sizeof(*new_alarm), 0);
> > > >  	if (new_alarm == NULL)
> > > >  		return -ENOMEM;
> > > >
> > > > @@ -156,7 +159,6 @@ rte_eal_alarm_set(uint64_t us, rte_eal_alarm_callback cb_fn, void *cb_arg)
> > > >  	new_alarm->cb_arg = cb_arg;
> > > >  	new_alarm->time.tv_usec = (now.tv_usec + us) % US_PER_S;
> > > >  	new_alarm->time.tv_sec = now.tv_sec + ((now.tv_usec + us) / US_PER_S);
> > > > -	new_alarm->executing = 0;
> > > >
> > > This removes the only place where ->executing is cleared again.  If there is
> > > only one change to this bits state (which is the case after this patch), it
> > > seems that you can just use the executing bit as the test in the alarm_cancel
> > > function, and remove all the pthread_self mess.
> >
> > I believe we do need executing_id here.
> > It allows us to distinguish are we executing cancel from a callback or not.
> >
> Given what you said above, I agree, at least in the current implementation.  It
> still seems like theres a simpler solution that doesn't require all the
> comparative gymnastics.
>
> What if, instead of testing if you're the callback thread, we turn the executing
> field of alarm_entry into a bitfield, where bit 0 represents the former
> "executing" state, and bit 1 is defined as a "cancelled" bit.  Then
> rte_eal_alarm_cancel becomes a search that, when an alarm is found simply or's
> in the cancelled bit to the executing bit field.  When the callback thread runs,
> it skips executing any alarm that is marked as cancelled, but frees all alarm
> entries that are executed or cancelled.  That gives us a single point at which
> frees of alarm entires happen?  Something like the patch below (completely
> untested)?

So basically cancel() just sets ALARM_CANCELLED and leaves the actual alarm deletion to the callback()?
I think it is doable, but I don't see any real advantage in that approach.
Yes, the code will become a bit simpler, as we'll have a single point where we remove an alarm from the list.
But on the other hand, imagine this simple test case:

for (i = 0; i < 0x100000; i++) {
   rte_eal_alarm_set(ONE_MIN, cb_func, (void *)i);
   rte_eal_alarm_cancel(cb_func, (void *)i);
}

We'll end up with 1M cancelled but still not removed entries in the alarm_list.
With the current implementation that means a few MBs of wasted memory,
plus very slow set() and cancel(), as they'll have to traverse all entries in the list.
And all that for an alarm_list that is empty from the user's perspective.

So I still prefer Michal's way.
After all, it doesn't look that complicated to me.
BTW, any particular reason you are so negative about pthread_self()?

>
> It also seems like the alarm api as a whole could use some improvement.  The
> way its written right now, theres no way to refer to a specific alarm (i.e.
> cancelation relies on the specification of a function and data pointer, which
> may refer to multiple timers).  Shouldn't rte_eal_alarm_set return an opaque
> handle to a unique timer instance that can be store by a caller and used to
> specfically cancel that timer?  Thats how both the bsd and linux timer
> subsystems model timers.

Yeah, the alarm API looks a bit unusual.
Though I suppose that's a subject for another patch/discussion :)

>
>
>
> diff --git a/lib/librte_eal/linuxapp/eal/eal_alarm.c b/lib/librte_eal/linuxapp/eal/eal_alarm.c
> index 480f0cb..73b6dc5 100644
> --- a/lib/librte_eal/linuxapp/eal/eal_alarm.c
> +++ b/lib/librte_eal/linuxapp/eal/eal_alarm.c
> @@ -64,6 +64,9 @@
>  #define MS_PER_S 1000
>  #define US_PER_S (US_PER_MS * MS_PER_S)
>
> +#define ALARM_EXECUTING (1 << 0)
> +#define ALARM_CANCELLED (1 << 1)
> +
>  struct alarm_entry {
>  	LIST_ENTRY(alarm_entry) next;
>  	struct timeval time;
> @@ -107,12 +110,14 @@ eal_alarm_callback(struct rte_intr_handle *hdl __rte_unused,
>  			gettimeofday(&now, NULL) == 0 &&
>  			(ap->time.tv_sec < now.tv_sec || (ap->time.tv_sec == now.tv_sec &&
>  			ap->time.tv_usec <= now.tv_usec))){
> -		ap->executing = 1;
> -		rte_spinlock_unlock(&alarm_list_lk);
> +		ap->executing |= ALARM_EXECUTING;
> +		if (likely(!(ap->executing & ALARM_CANCELLED)) {
> +			rte_spinlock_unlock(&alarm_list_lk);
>
> -		ap->cb_fn(ap->cb_arg);
> +			ap->cb_fn(ap->cb_arg);
>
> -		rte_spinlock_lock(&alarm_list_lk);
> +			rte_spinlock_lock(&alarm_list_lk);
> +		}
>  		LIST_REMOVE(ap, next);
>  		rte_free(ap);
>  	}
> @@ -209,10 +214,9 @@ rte_eal_alarm_cancel(rte_eal_alarm_callback cb_fn, void *cb_arg)
>  	rte_spinlock_lock(&alarm_list_lk);
>  	/* remove any matches at the start of the list */
>  	while ((ap = LIST_FIRST(&alarm_list)) != NULL &&
> -			cb_fn == ap->cb_fn && ap->executing == 0 &&
> +			cb_fn == ap->cb_fn &&
>  			(cb_arg == (void *)-1 || cb_arg == ap->cb_arg)) {
> -		LIST_REMOVE(ap, next);
> -		rte_free(ap);
> +		ap->executing |= ALARM_CANCELLED;
>  		count++;
>  	}
>  	ap_prev = ap;
> @@ -220,10 +224,9 @@ rte_eal_alarm_cancel(rte_eal_alarm_callback cb_fn, void *cb_arg)
>  	/* now go through list, removing entries not at start */
>  	LIST_FOREACH(ap, &alarm_list, next) {
>  		/* this won't be true first time through */
> -		if (cb_fn == ap->cb_fn && ap->executing == 0 &&
> +		if (cb_fn == ap->cb_fn &&
>  				(cb_arg == (void *)-1 || cb_arg == ap->cb_arg)) {
> -			LIST_REMOVE(ap,next);
> -			rte_free(ap);
> +			ap->executing |= ALARM_CANCELLED;
>  			count++;
>  			ap = ap_prev;
>  		}
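
The list-growth concern with a mark-only cancel, as in the quoted patch, can be illustrated with a small standalone model. This is only a sketch and not the EAL code: the sketch_* names, the plain calloc()/free() allocation, and the insertion at the list head are assumptions made for illustration, and locking, timers and callbacks are left out entirely. It counts how many entries are still linked after n set/cancel pairs, once with a cancel that only sets a flag and once with a cancel that unlinks and frees immediately.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <sys/queue.h>

#define SKETCH_CANCELLED (1 << 1)	/* mirrors ALARM_CANCELLED in the quoted patch */

/* Hypothetical stand-in for struct alarm_entry: no timer, no callback. */
struct sketch_alarm {
	LIST_ENTRY(sketch_alarm) next;
	void *cb_arg;
	uint8_t flags;
};

LIST_HEAD(sketch_list, sketch_alarm);

/* Model of set(): allocate an entry and link it into the list. */
static int
sketch_set(struct sketch_list *list, void *cb_arg)
{
	struct sketch_alarm *ap = calloc(1, sizeof(*ap));

	if (ap == NULL)
		return -1;
	ap->cb_arg = cb_arg;
	LIST_INSERT_HEAD(list, ap, next);
	return 0;
}

/* Mark-only cancel: flag matching entries but leave them linked. */
static int
sketch_cancel_lazy(struct sketch_list *list, void *cb_arg)
{
	struct sketch_alarm *ap;
	int count = 0;

	LIST_FOREACH(ap, list, next) {
		if (ap->cb_arg == cb_arg && !(ap->flags & SKETCH_CANCELLED)) {
			ap->flags |= SKETCH_CANCELLED;
			count++;
		}
	}
	return count;
}

/* Immediate cancel: unlink and free matching entries. */
static int
sketch_cancel_eager(struct sketch_list *list, void *cb_arg)
{
	struct sketch_alarm *ap = LIST_FIRST(list), *nxt;
	int count = 0;

	while (ap != NULL) {
		nxt = LIST_NEXT(ap, next);	/* read before a possible free */
		if (ap->cb_arg == cb_arg) {
			LIST_REMOVE(ap, next);
			free(ap);
			count++;
		}
		ap = nxt;
	}
	return count;
}

/* Run n set/cancel pairs and return how many entries remain linked. */
static size_t
sketch_churn(size_t n, int lazy)
{
	struct sketch_list list = LIST_HEAD_INITIALIZER(list);
	struct sketch_alarm *ap;
	size_t i, left = 0;

	for (i = 0; i < n; i++) {
		void *arg = (void *)(uintptr_t)i;

		if (sketch_set(&list, arg) != 0)
			break;
		if (lazy)
			sketch_cancel_lazy(&list, arg);
		else
			sketch_cancel_eager(&list, arg);
	}
	LIST_FOREACH(ap, &list, next)
		left++;
	/* Tear down whatever is still linked. */
	while ((ap = LIST_FIRST(&list)) != NULL) {
		LIST_REMOVE(ap, next);
		free(ap);
	}
	return left;
}
```

In this model sketch_churn(1000, 1) leaves 1000 flagged entries linked, while sketch_churn(1000, 0) leaves none. Every later set or cancel in the mark-only variant has to walk past all of the flagged entries until something reaps them.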