From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from smtp.tuxdriver.com (charlotte.tuxdriver.com [70.61.120.58]) by dpdk.org (Postfix) with ESMTP id D9C2011F5 for ; Sun, 28 Sep 2014 22:41:29 +0200 (CEST) Received: from [2001:470:8:a08:215:ff:fecc:4872] (helo=localhost) by smtp.tuxdriver.com with esmtpsa (TLSv1:AES128-SHA:128) (Exim 4.63) (envelope-from ) id 1XYLNr-0006y6-8O; Sun, 28 Sep 2014 16:48:01 -0400 Date: Sun, 28 Sep 2014 16:47:54 -0400 From: Neil Horman To: "Ananyev, Konstantin" Message-ID: <20140928204754.GC4012@localhost.localdomain> References: <20140926114630.GA3930@hmsreliant.think-freely.org> <20140926134014.GB3930@hmsreliant.think-freely.org> <20140926150156.GB5619@hmsreliant.think-freely.org> <2601191342CEEE43887BDE71AB9772582137D88E@IRSMSX104.ger.corp.intel.com> <20140926162134.GE5619@hmsreliant.think-freely.org> <2601191342CEEE43887BDE71AB9772582137D95F@IRSMSX104.ger.corp.intel.com> <20140926193905.GH5619@hmsreliant.think-freely.org> <2601191342CEEE43887BDE71AB9772582138410B@IRSMSX104.ger.corp.intel.com> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <2601191342CEEE43887BDE71AB9772582138410B@IRSMSX104.ger.corp.intel.com> User-Agent: Mutt/1.5.23 (2014-03-12) X-Spam-Score: -2.9 (--) X-Spam-Status: No Cc: "dev@dpdk.org" Subject: Re: [dpdk-dev] [PATCH v2] Change alarm cancel function to thread-safe: X-BeenThere: dev@dpdk.org X-Mailman-Version: 2.1.15 Precedence: list List-Id: patches and discussions about DPDK List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Sun, 28 Sep 2014 20:41:30 -0000 On Sun, Sep 28, 2014 at 04:12:04PM +0000, Ananyev, Konstantin wrote: > > > > -----Original Message----- > > From: Neil Horman [mailto:nhorman@tuxdriver.com] > > Sent: Friday, September 26, 2014 8:39 PM > > To: Ananyev, Konstantin > > Cc: Wodkowski, PawelX; dev@dpdk.org > > Subject: Re: [dpdk-dev] [PATCH v2] Change alarm cancel function to thread-safe: > > > > On Fri, Sep 26, 2014 at 06:07:14PM +0000, Ananyev, Konstantin wrote: > > > > > > > > > > > As I remember the purpose of the patch was to fix the race condition inside rte_alarm library. > > > > > I believe that the patch provided by Michal & Pawel fixes the issues you discovered. > > > > > If you think, that is not the case, could you please provide a list of remaining issues? > > > > > Excluding ones that you just don't like it, and you are not happy with rte_alarm API in total? > > > > > > > > > > Gladly. As Pawel explained the race, its possible that, after calling > > > > rte_eal_alarm_cancel, an in-flight execution of an alarm callback may still be > > > > running. The problem with that ostensibly is that data which is being accessed > > > > by the callback might be then accessed in parallel with another process leading > > > > to data corruption or some other problem. The issue I have with his patch is > > > > that it doesn't completely close the race. While it does close the race for the > > > > condition in whcih thread B is running the alarm callback while thread A is > > > > executing the cancel operation, it does not close the case for when a single > > > > thread B is running the cancel operation, as the in-flight execution itself is > > > > still active. > > > > > > A bit puzzled here: > > > Are you saying that calling alarm_cancel() for itself inside eal_alarm_callback() might cause a problem? > > > I still don't see how. > > > > > Potentially yes, by the same race condition that exists when using a secondary > > thread to do the cancel call. As I understand it the race that Pawel described > > is as follows: > > > > Thread A Thread B > > alarm_cancel() eal_alarm_callback > > block on alarm spinlock drop spinlock > > run cancel operation execute callback function > > return from cancel > > rte_eal_alarm_set > > > > As Pawel described the problem, there is a desire to not set the new alarm while > > the old alarm is still executing. And his patch accomplishes that for the two > > thread case above just fine > > > > The problem with Pawels patch is that its non functional in the case where the > > cancel happens within Thread B. Lets change the scenario just a little bit: > > > > Thread B Thread C > > eal_alarm_callback > > callback_function > > some_other_common_func > > rte_eal_alarm_cancel(this) > > pthread_signal(Thread C) wake up > > operate on alarm data rte_eal_alarm_set > > > > As I can see, there is an incorrect behaviour in your callback_function example. > It should first finish with " eal_alarm_callback" and only then send a signal to other thread. > Otherwise we can't help it in any way. > But I think, I understand your concern: > after rte_eal_aralm_cancel() finishes, the caller can't clearly distinguish what exactly happen: > 1) alarm was cancelled succesfully. > 2) alarm was not found (already cancelled or executed). > 3) alarm is executing by the same thread and can't be cancelled. > > Basically right now the caller can distinguish that either #1 or #2,3 happened, but can't distinguish between 2 & 3. > Correct? > Yes, this is my concern exactly. > If that's so, then I suppose we can do: make alarm_cancel() to return a negative value for the case #3 (-EINPROGRESS or something). > Something like: > ... > if (ap->executing == 0) { > LIST_REMOVE(ap,next); > rte_free(ap); > count++; > ap = ap_prev; > } else if (pthread_equal(ap->executing_id, pthread_self()) == 0) { > executing++; > } else { > ret = -EINPROGRESS; > } > ... > return ((ret != 0) ? ret : count); > > So the return value will be > 0 for #1, 0 for #2, <0 for #3. > As I remember, you already suggested something similar in one of the previous mails. Yes, I rolled the API changes I suggested in with this model, because I wanted to be able to do precise specification of a timer instance to cancel, but if we're not ready to make that change, I think what you propose above would be suffficient. Theres some question as to weather we would cancel timers that are still pending on a return of -EINPROGRESS, but I think if we document it accordingly, then it can be worked out just fine. Best Neil > Konstantin > > > > > > > > In this scenario the problem is not fixed because when called from within the > > alarm thread, the executing alarm is skipped (as it must be), but that fact is > > invisible to the caller, and because of that its still possible for the same > > origional problem to occur. > > > > Neil > > >