From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mga03.intel.com (mga03.intel.com [134.134.136.65]) by dpdk.org (Postfix) with ESMTP id ACFCE7E0B for ; Fri, 26 Sep 2014 16:07:38 +0200 (CEST) Received: from orsmga001.jf.intel.com ([10.7.209.18]) by orsmga103.jf.intel.com with ESMTP; 26 Sep 2014 07:11:56 -0700 X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="5.04,604,1406617200"; d="scan'208";a="579497510" Received: from irsmsx101.ger.corp.intel.com ([163.33.3.153]) by orsmga001.jf.intel.com with ESMTP; 26 Sep 2014 07:13:57 -0700 Received: from irsmsx104.ger.corp.intel.com ([169.254.5.248]) by IRSMSX101.ger.corp.intel.com ([169.254.1.201]) with mapi id 14.03.0195.001; Fri, 26 Sep 2014 15:13:56 +0100 From: "Ananyev, Konstantin" To: Neil Horman , "Wodkowski, PawelX" Thread-Topic: [dpdk-dev] [PATCH v2] Change alarm cancel function to thread-safe: Thread-Index: AQHP2MAqmY9s5RkexE+yRGxj2kXnOpwR4sKAgAAemtCAAAdaAIAAZ8zwgADMQACAAA5cAIAAEWsAgAAXO7A= Date: Fri, 26 Sep 2014 14:13:55 +0000 Message-ID: <2601191342CEEE43887BDE71AB9772582137D7F1@IRSMSX104.ger.corp.intel.com> References: <1411649768-8084-1-git-send-email-michalx.k.jastrzebski@intel.com> <20140925150807.GD32725@hmsreliant.think-freely.org> <2601191342CEEE43887BDE71AB977258213769DE@IRSMSX105.ger.corp.intel.com> <20140925172358.GG32725@hmsreliant.think-freely.org> <2601191342CEEE43887BDE71AB97725821378B50@IRSMSX104.ger.corp.intel.com> <20140926114630.GA3930@hmsreliant.think-freely.org> <20140926134014.GB3930@hmsreliant.think-freely.org> In-Reply-To: <20140926134014.GB3930@hmsreliant.think-freely.org> Accept-Language: en-IE, en-US Content-Language: en-US X-MS-Has-Attach: X-MS-TNEF-Correlator: x-originating-ip: [163.33.239.180] Content-Type: text/plain; charset="us-ascii" Content-Transfer-Encoding: quoted-printable MIME-Version: 1.0 Cc: "dev@dpdk.org" Subject: Re: [dpdk-dev] [PATCH v2] Change alarm cancel function to thread-safe: X-BeenThere: dev@dpdk.org X-Mailman-Version: 2.1.15 Precedence: list List-Id: patches and discussions about DPDK List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Fri, 26 Sep 2014 14:07:39 -0000 > From: dev [mailto:dev-bounces@dpdk.org] On Behalf Of Neil Horman > Sent: Friday, September 26, 2014 2:40 PM > To: Wodkowski, PawelX > Cc: dev@dpdk.org > Subject: Re: [dpdk-dev] [PATCH v2] Change alarm cancel function to thread= -safe: >=20 > On Fri, Sep 26, 2014 at 12:37:54PM +0000, Wodkowski, PawelX wrote: > > > So basically cancel() just set ALARM_CANCELLED and leaves actual alar= m > > > deletion to the callback()? > > > That was the thought, yes. > > > > > > > I think it is doable - but I don't see any real advantage with that= approach. > > > > Yes, code will become a bit simpler, as we'll have one point when = we remove > > > alarm from the list. > > > Yes, that would be the advantage, that the code would be much simpler= . > > > > > > > But from other side, imagine such simple test-case: > > > > > > > > for (i =3D 0; i < 0x100000; i++) { > > > > rte_eal_alarm_set(ONE_MIN, cb_func, (void *)i); > > > > rte_eal_alarm_cancel(cb_func, (void *)i); > > > > } > > > > > > > > We'll endup with 1M of cancelled, but still not removed entries in = the > > > alarm_list. > > > > With current implementation that means - few MBs of wasted memory, > > > Thats correct, and the tradeoff to choose between. Do you want simpl= er code > > > that is easier to maintain, or do you want a high speed cancel and se= t > > > operation. I'm not aware of all the use cases, but I have a hard tim= e seeing > > > a use case in which the in-flight alarm list grows unboundedly large,= which in > > > my mind mitigates the risk of deferred removal, but I'm perfectly wil= ling to > > > believe that there are use cases which I'm not aware of. After executing example above - from user perspective there is no active al= arms in the system at all. Though in fact alarm_list contains 1M entries.=20 > > > > > > > plus very slow set() and cancel(), as they'll have to traverse all= entries in the > > > list. > > > > And all that - for empty from user perspective alarm_list > > > > So I still prefer Michal's way. > > > > After all, it doesn't look that complicated to me. > > > Except that the need for Michals fix arose from the fact that we have= two free > > > locations that might both get called depending on the situation. Tha= ts what I'm > > > trying to address here, the complexity itself, rather than the fix (w= hich I > > > agree is perfectly valid). Well, I believe his fix addresses all the open issues, no? Another thing, as Pawel pointed in another mail, your fix doesn't address p= roperly the situation when we have a racing alarm_cancel(cb_func) and alarm cb_func rearming itself.=20 While the original patch does. > > > > > > > BTW, any particular reason you are so negative about pthread_self()= ? > > > > > > > Nothing specifically against it (save for its inverted return code se= nse, which > > > made it difficult for me to parse when reviewing). Its more the comp= lexity > > > itself in the alarm cancel and callback routine that I was looking at= . Given > > > that the origional bug happened because an cancel in a callback might= produce a > > > double free, I wanted to fix it by simpifying the code, not adding co= nditions > > > which make it more complex. > > > > > > You know, looking at it, something else just occured to me. I think = this could > > > all be fixed without the cancel flag or the pthread_self assignments.= What if > > > we simply removed the alarm from the list before we called the callba= ck in > > > rte_eal_alarm_callback()? That way any cancel operation called from = within the > > > callback would fail, as it wouldn't appear on the list, and the callb= ack > > > operation would be the only freeing entity? That would let you still= have a > > > fast set and cancel, and avoid the race. Thoughts? Untested sample = patch > > > below Hmm, and how it would address any of the problems I mentioned above: 1. The alarm_list still would grow by the repetition of: {alarm_set(x); ala= rm_cansel(x);}=20 2. Race between alarm_cancel(cb_func) and alarm cb_func rearming itself. ? > > > > > > > > > > > > > > > > It also seems like the alarm api as a whole could use some improv= ement. > > > The > > > > > way its written right now, theres no way to refer to a specific a= larm (i.e. > > > > > cancelation relies on the specification of a function and data po= inter, which > > > > > may refer to multiple timers). Shouldn't rte_eal_alarm_set retur= n an opaque > > > > > handle to a unique timer instance that can be store by a caller a= nd used to > > > > > specfically cancel that timer? Thats how both the bsd and linux = timer > > > > > subsystems model timers. > > > > > > > > Yeh, alarm API looks a bit unusual. > > > > Though, I suppose that's subject for another patch/discussion :) > > > > > > > Yes, agreed :) > > > > > > > Please read quoted message bellow: > > > > > > > > > > > > > > His solution *does* eliminate race condition too. > > > > > > > I applied his patch. And here is the problem > > > 1 rte_spinlock_lock(&alarm_list_lk); > > > 2 while ((ap =3D LIST_FIRST(&alarm_list)) !=3DNULL && > > > 3 gettimeofday(&now, NULL) =3D=3D 0 && > > > 4 (ap->time.tv_sec < now.tv_sec || (ap->time.tv_sec =3D=3D > > > now.tv_sec && > > > 5 ap->time.tv_usec <=3D > > > now.tv_usec))){ > > > 6 ap->executing |=3D ALARM_EXECUTING; > > > 7 if (likely(!(ap->executing & ALARM_CANCELLED))) { > > > 8 rte_spinlock_unlock(&alarm_list_lk); > > > 9 //another thread: rte_alarm_cancel called= , mark this timer > > > canceled and exit ( THE RACE) > > > 10 ap->cb_fn(ap->cb_arg); // rte_alarm_set called > > > (THE RACE) > > > 11 > > > 12 rte_spinlock_lock(&alarm_list_lk); > > > 13 } > > > 14 > > > 15 rte_spinlock_lock(&alarm_list_lk); > > > 16 LIST_REMOVE(ap, next); > > > 17 rte_free(ap); > > > 18 } > > > > > > Imagine > > > > > > Thread 1: Thread2 > > > Execute eal_alarm_callback > > > Lock list at 1 rte_alarm_cancel -> = block on spinlock > > > .... > > > Realease lock at line 8 rte_alarm_cancel -> resumes = execution -> it > > > mark this timer canceled > > > ap->cb_fn is called at line 10 rte_alarm_cancel exits reporting= all alarms are > > > canceled and not executing (which is not true!) > > > rte_alarm_set is called > > > to rearm itself (THE RACE) > > > > > > He only mark timer to * do not execute* but does not assure that it i= s not > > > executing while canceling. > > > Race condition is made by unlocking at line 8 and our solution workar= ounds this > > > by looping until all alarms finish execution then cancel what left af= ter callback > > > finish (*including rearmed alarms*). > > > Real elimination of the race would be by using recursive locking when= *other > > > thread can't* access the list *while callback* is running but callbac= k *itself can > > > by using recursive locking*. > > > > > > Maybe I don't see something obvious? :) >=20 > I think you're missing the fact that your patch doesn't do what you asser= t above > either :) >=20 > First, lets address rte_alarm_set. There is no notion of "re-arming" in = this > alarm implementation, because theres no ability to refer to a specific al= arm > from the callers perspective. When you call rte_eal_alarm_set you get a = new > alarm every time. So I don't really see a race there. It might not be e= xactly > the behavior you want, but its not a race, becuase you're not modifying a= n alarm > in the middle of execution, you're just creating a new alarm, which is sa= fe. >=20 > There is a race in what you describe above, insofar as its possible that = you > might call rte_eal_alarm_cancel and return without having canceled all th= e > matching alarms. I don't see any clear documentation on what the behavio= r is > supposed to be, but if you want to ensure that all matching alarms are ca= ncelled > or complete on return from rte_eal_alarm_cancel, thats perfectly fine (in= linux > API parlance, thats usually denoted as a cancel_sync operation). >=20 > For that race condition, you're correct, my patch doesn't address it, I s= ee that > now. Though your patch doesn't either. If you call rte_eal_alarm_cancel= from > within a callback function, then, by definition, you can't wait on the > completion of the active alarm, because thats a deadlock. Its a necessec= ary > evil, I grant you, but it means that you can't be guaranteed the cancelle= d and > complete (cancel_sync) behavior that you want, at least not with the curr= ent > api. If you want that behavior, you need to do one of two things: >=20 > 1) Modify the api to allow callers to individually reference timer instan= ces, so > that when cancelling, we can return an appropriate return code to indicat= e to > the caller that this alarm is in-progress. That way you can guarantee th= e > caller that the specific alarm that you cancelled is either complete and = cancelled > or currently executing. Add an api to expicitly wait on a referenced ala= rm as > well. This allows developers to know that, when executing an alarm callb= ack, an > -ECURRENTLYEXECUTING return code is ok, because they are in the currently > executing context. >=20 >=20 > 2) Understand that the guarantee can't be met, and change nothing, becaus= e you > consider it ok for an in-progress alarm to be ignored when getting cancel= led. >=20 >=20 > Neil