From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mga02.intel.com (mga02.intel.com [134.134.136.20]) by dpdk.org (Postfix) with ESMTP id 1CCEB7DEC for ; Fri, 26 Sep 2014 15:54:53 +0200 (CEST) Received: from azsmga001.ch.intel.com ([10.2.17.19]) by orsmga101.jf.intel.com with ESMTP; 26 Sep 2014 07:01:11 -0700 X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="5.04,604,1406617200"; d="scan'208";a="480203480" Received: from irsmsx101.ger.corp.intel.com ([163.33.3.153]) by azsmga001.ch.intel.com with ESMTP; 26 Sep 2014 07:01:10 -0700 Received: from irsmsx153.ger.corp.intel.com (163.33.192.75) by IRSMSX101.ger.corp.intel.com (163.33.3.153) with Microsoft SMTP Server (TLS) id 14.3.195.1; Fri, 26 Sep 2014 15:01:06 +0100 Received: from irsmsx102.ger.corp.intel.com ([169.254.2.200]) by IRSMSX153.ger.corp.intel.com ([169.254.9.42]) with mapi id 14.03.0195.001; Fri, 26 Sep 2014 15:01:06 +0100 From: "Wodkowski, PawelX" To: Neil Horman Thread-Topic: [dpdk-dev] [PATCH v2] Change alarm cancel function to thread-safe: Thread-Index: AQHP2MAapayPUkT5UUGj9XqXyFJ3T5wR4sKAgAAPjwCAABZmAIAAZLsAgADPUACAAB2vsIAAAhgAgAAUZmA= Date: Fri, 26 Sep 2014 14:01:05 +0000 Message-ID: References: <1411649768-8084-1-git-send-email-michalx.k.jastrzebski@intel.com> <20140925150807.GD32725@hmsreliant.think-freely.org> <2601191342CEEE43887BDE71AB977258213769DE@IRSMSX105.ger.corp.intel.com> <20140925172358.GG32725@hmsreliant.think-freely.org> <2601191342CEEE43887BDE71AB97725821378B50@IRSMSX104.ger.corp.intel.com> <20140926114630.GA3930@hmsreliant.think-freely.org> <20140926134014.GB3930@hmsreliant.think-freely.org> In-Reply-To: <20140926134014.GB3930@hmsreliant.think-freely.org> Accept-Language: pl-PL, en-US Content-Language: en-US X-MS-Has-Attach: X-MS-TNEF-Correlator: x-originating-ip: [163.33.239.182] Content-Type: text/plain; charset="us-ascii" Content-Transfer-Encoding: quoted-printable MIME-Version: 1.0 Cc: "dev@dpdk.org" Subject: Re: [dpdk-dev] [PATCH v2] Change alarm cancel function to thread-safe: X-BeenThere: dev@dpdk.org X-Mailman-Version: 2.1.15 Precedence: list List-Id: patches and discussions about DPDK List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Fri, 26 Sep 2014 13:54:54 -0000 > > > Maybe I don't see something obvious? :) >=20 > I think you're missing the fact that your patch doesn't do what you asser= t above > either :) Issue is not in setting alarms but canceling it. If you look closer to my p= atch you see that it address this issue (look at added *do { lock(); ....; unlock();= } while( )*=20 part). >=20 > First, lets address rte_alarm_set. There is no notion of "re-arming" in = this > alarm implementation, because theres no ability to refer to a specific al= arm > from the callers perspective. When you call rte_eal_alarm_set you get a = new > alarm every time. So I don't really see a race there. It might not be e= xactly > the behavior you want, but its not a race, becuase you're not modifying a= n > alarm > in the middle of execution, you're just creating a new alarm, which is sa= fe. OK, it is safe, but this is not the case. >=20 > There is a race in what you describe above, insofar as its possible that = you > might call rte_eal_alarm_cancel and return without having canceled all th= e > matching alarms. I don't see any clear documentation on what the behavio= r is > supposed to be, but if you want to ensure that all matching alarms are ca= ncelled > or complete on return from rte_eal_alarm_cancel, thats perfectly fine (in= linux > API parlance, thats usually denoted as a cancel_sync operation). Again, look at the patch. I changed documentation to inform about this beha= vior. >=20 > For that race condition, you're correct, my patch doesn't address it, I s= ee that > now. Though your patch doesn't either. If you call rte_eal_alarm_cancel= from > within a callback function, then, by definition, you can't wait on the > completion of the active alarm, because thats a deadlock. Its a necessec= ary > evil, I grant you, but it means that you can't be guaranteed the cancelle= d and > complete (cancel_sync) behavior that you want, at least not with the curr= ent > api. If you want that behavior, you need to do one of two things: This patch does not break any API. It only removes undefined behavior. >=20 > 1) Modify the api to allow callers to individually reference timer instan= ces, so > that when cancelling, we can return an appropriate return code to indicat= e to > the caller that this alarm is in-progress. That way you can guarantee th= e > caller that the specific alarm that you cancelled is either complete and = cancelled > or currently executing. Add an api to expicitly wait on a referenced ala= rm as > well. This allows developers to know that, when executing an alarm callb= ack, an > -ECURRENTLYEXECUTING return code is ok, because they are in the currently > executing context. This would brake API for sure.