From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from smtp.tuxdriver.com (charlotte.tuxdriver.com [70.61.120.58]) by dpdk.org (Postfix) with ESMTP id 9EC657DEB for ; Fri, 26 Sep 2014 21:32:52 +0200 (CEST) Received: from hmsreliant.think-freely.org ([2001:470:8:a08:7aac:c0ff:fec2:933b] helo=localhost) by smtp.tuxdriver.com with esmtpsa (TLSv1:AES128-SHA:128) (Exim 4.63) (envelope-from ) id 1XXbMC-00015K-C6; Fri, 26 Sep 2014 15:39:12 -0400 Date: Fri, 26 Sep 2014 15:39:05 -0400 From: Neil Horman To: "Ananyev, Konstantin" Message-ID: <20140926193905.GH5619@hmsreliant.think-freely.org> References: <20140925172358.GG32725@hmsreliant.think-freely.org> <2601191342CEEE43887BDE71AB97725821378B50@IRSMSX104.ger.corp.intel.com> <20140926114630.GA3930@hmsreliant.think-freely.org> <20140926134014.GB3930@hmsreliant.think-freely.org> <20140926150156.GB5619@hmsreliant.think-freely.org> <2601191342CEEE43887BDE71AB9772582137D88E@IRSMSX104.ger.corp.intel.com> <20140926162134.GE5619@hmsreliant.think-freely.org> <2601191342CEEE43887BDE71AB9772582137D95F@IRSMSX104.ger.corp.intel.com> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <2601191342CEEE43887BDE71AB9772582137D95F@IRSMSX104.ger.corp.intel.com> User-Agent: Mutt/1.5.23 (2014-03-12) X-Spam-Score: -2.9 (--) X-Spam-Status: No Cc: "dev@dpdk.org" Subject: Re: [dpdk-dev] [PATCH v2] Change alarm cancel function to thread-safe: X-BeenThere: dev@dpdk.org X-Mailman-Version: 2.1.15 Precedence: list List-Id: patches and discussions about DPDK List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Fri, 26 Sep 2014 19:32:52 -0000 On Fri, Sep 26, 2014 at 06:07:14PM +0000, Ananyev, Konstantin wrote: > > > > > As I remember the purpose of the patch was to fix the race condition inside rte_alarm library. > > > I believe that the patch provided by Michal & Pawel fixes the issues you discovered. > > > If you think, that is not the case, could you please provide a list of remaining issues? > > > Excluding ones that you just don't like it, and you are not happy with rte_alarm API in total? > > > > Gladly. As Pawel explained the race, its possible that, after calling > > rte_eal_alarm_cancel, an in-flight execution of an alarm callback may still be > > running. The problem with that ostensibly is that data which is being accessed > > by the callback might be then accessed in parallel with another process leading > > to data corruption or some other problem. The issue I have with his patch is > > that it doesn't completely close the race. While it does close the race for the > > condition in whcih thread B is running the alarm callback while thread A is > > executing the cancel operation, it does not close the case for when a single > > thread B is running the cancel operation, as the in-flight execution itself is > > still active. > > A bit puzzled here: > Are you saying that calling alarm_cancel() for itself inside eal_alarm_callback() might cause a problem? > I still don't see how. > Potentially yes, by the same race condition that exists when using a secondary thread to do the cancel call. As I understand it the race that Pawel described is as follows: Thread A Thread B alarm_cancel() eal_alarm_callback block on alarm spinlock drop spinlock run cancel operation execute callback function return from cancel rte_eal_alarm_set As Pawel described the problem, there is a desire to not set the new alarm while the old alarm is still executing. And his patch accomplishes that for the two thread case above just fine The problem with Pawels patch is that its non functional in the case where the cancel happens within Thread B. Lets change the scenario just a little bit: Thread B Thread C eal_alarm_callback callback_function some_other_common_func rte_eal_alarm_cancel(this) pthread_signal(Thread C) wake up operate on alarm data rte_eal_alarm_set In this scenario the problem is not fixed because when called from within the alarm thread, the executing alarm is skipped (as it must be), but that fact is invisible to the caller, and because of that its still possible for the same origional problem to occur. Neil