From: Neil Horman
To: "Jastrzebski, MichalX K"
Cc: "dev@dpdk.org", "Landowski, MarekX M"
Date: Mon, 22 Sep 2014 09:15:24 -0400
Subject: Re: [dpdk-dev] [PATCH 2/2] bond: add mode 4 support
Message-ID: <20140922131524.GF25406@hmsreliant.think-freely.org>
In-Reply-To: <60ABE07DBB3A454EB7FAD707B4BB158213896977@IRSMSX109.ger.corp.intel.com>
References: <1410963713-13837-1-git-send-email-pawelx.wodkowski@intel.com>
 <1410963713-13837-3-git-send-email-pawelx.wodkowski@intel.com>
 <20140917151304.GD4213@localhost.localdomain>
 <20140918160234.GJ20389@hmsreliant.think-freely.org>
 <20140919172907.GE12897@hmsreliant.think-freely.org>
 <20140922102455.GA25406@hmsreliant.think-freely.org>
 <60ABE07DBB3A454EB7FAD707B4BB158213896977@IRSMSX109.ger.corp.intel.com>

On Mon, Sep 22, 2014 at 11:59:33AM +0000, Jastrzebski, MichalX K wrote:
> Hi Neil,
> I agree with you that the most important thing is to release code that
> works without errors, and this is what we are all working on. Pawel's
> answer "no time" doesn't sound good here (he meant something else) - I
> assure you that Pawel cares a lot about releasing very good code. He
> proposes a solution that fixes this race for the 1.8 release.
> Implementing a new rte api call will take more time, because it is new
> functionality for the eal_timer library, and there does not seem to be
> much time left for that. Having said that, should we abandon the hotfix
> and focus on the long-term solution?
>
Yes, I think the proper solution should be the one we shoot for here,
though in re-reading my response, perhaps I wasn't as clear as I could
have been. All I'm really advocating here is that the
while(...) { rte_pause() } loop that Pawel's fix puts in place is better
wrapped in a function implemented in the rte_alarm library, rather than
kept private to the bonding library, along with the replacement of all
the pointer assignments with an internal state variable. I'm not
asserting that we need to audit the code to fix up all other call sites
using the rte_alarm api right now (though a quick cscope search reveals
the only locations are in the test apps). I'm just saying let's fix it in
such a way that other users can take advantage of it now, and write the
unit tests for it after it ships. In fact, looking at the alarm test
infrastructure, alarm re-arming stress isn't currently tested at all, so
that could be a large undertaking after shipment.

Neil

> Best regards
> Michal
>
>
> -----Original Message-----
> From: Neil Horman [mailto:nhorman@tuxdriver.com]
> Sent: Monday, September 22, 2014 12:25 PM
> To: Wodkowski, PawelX
> Cc: dev@dpdk.org; Jastrzebski, MichalX K; Doherty, Declan
> Subject: Re: [dpdk-dev] [PATCH 2/2] bond: add mode 4 support
>
> On Mon, Sep 22, 2014 at 06:26:21AM +0000, Wodkowski, PawelX wrote:
> > > I think that will work, but I believe you're making it more
> > > complicated (and less reusable) than it needs to be. What I think
> > > you really need to do is create a new rte api call,
> > > rte_eal_alarm_cancel_sync (something like the equivalent of
> > > del_timer_sync in Linux), that wraps up the
> > > while(rte_eal_alarm_cancel(...) == 0) {rte_pause} in its own
> > > function (so other call sites can use it, as I don't think this is
> > > an uncommon problem). Then just create a bonding-internal state flag
> > > to signal the periodic callback that it shouldn't re-arm the timer.
> > > That way all you have to do is set the flag, and call
> > > rte_eal_alarm_cancel_sync, and you're done. And other applications
> > > will be able to handle this common type of operation as well.
> > >
> > > Neil
> >
> > I agree with you that alarms should be upgraded, but this needs to be
> > discussed and tested. There is no time for that now.
> >
> Nak, that's a completely broken argument. I've just demonstrated to you
> a race condition in the driver you are submitting. Granted, it stems
> from a design limitation in the alarm subsystem, but it's what we all
> have to work with, and we can work around it and make the driver safe.
> To say there is "no time" to fix it implies to me that you care more
> about just slamming your code in than making anything work properly.
> What exactly is your rush that makes you think it's more important for
> the code to be merged than fixing it to work correctly?
> Neil
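
For readers following the thread, the "set a flag, then cancel
synchronously" idea can be sketched in plain C. This is only a
single-threaded model under stated assumptions, not DPDK code: every name
below (mock_alarm, mock_alarm_cancel_sync, bond_state, bond_stop, ...) is
hypothetical, mock_pause() stands in for rte_pause() by letting the
in-flight callback finish, and the extra "executing" test in the loop is
an addition to the loop quoted in the thread so that it terminates when
nothing is left to cancel.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical single-slot alarm model. NOT the real rte_alarm API. */
struct mock_alarm {
    void (*cb)(void *arg); /* callback to run when the alarm fires */
    void *arg;
    bool armed;            /* queued and waiting to fire */
    bool executing;        /* callback is mid-flight on the "alarm thread" */
};

/* Bonding-internal state (hypothetical names). */
struct bond_state {
    struct mock_alarm *alarm;
    bool stopping;         /* tells the periodic callback not to re-arm */
    int rearm_count;
};

static void mock_alarm_set(struct mock_alarm *a, void (*cb)(void *), void *arg)
{
    a->cb = cb;
    a->arg = arg;
    a->armed = true;
}

/* The alarm fires: it leaves the queue and its callback starts running. */
static void mock_alarm_fire(struct mock_alarm *a)
{
    if (a->armed) {
        a->armed = false;
        a->executing = true;
    }
}

/* Returns how many pending alarms were removed. A callback that is already
 * executing cannot be removed, so the result is 0 in that case. */
static int mock_alarm_cancel(struct mock_alarm *a)
{
    if (a->armed && !a->executing) {
        a->armed = false;
        return 1;
    }
    return 0;
}

/* Stand-in for rte_pause(): in this single-threaded model, waiting lets
 * the in-flight callback run to completion. */
static void mock_pause(struct mock_alarm *a)
{
    if (a->executing) {
        a->cb(a->arg);
        a->executing = false;
    }
}

/* Sketch of a cancel_sync helper: on return, nothing is pending and no
 * callback is in flight. */
static int mock_alarm_cancel_sync(struct mock_alarm *a)
{
    int cancelled;

    while ((cancelled = mock_alarm_cancel(a)) == 0 && a->executing)
        mock_pause(a);
    return cancelled;
}

/* Periodic callback: re-arms itself unless the stop flag is set. */
static void bond_periodic_cb(void *arg)
{
    struct bond_state *s = arg;

    if (s->stopping)
        return;
    s->rearm_count++;
    mock_alarm_set(s->alarm, bond_periodic_cb, s);
}

static int bond_stop(struct bond_state *s)
{
    s->stopping = true;
    return mock_alarm_cancel_sync(s->alarm);
}

/* Plain cancel racing with a running callback: an alarm is left armed. */
bool demo_plain_cancel_leaves_alarm(void)
{
    struct mock_alarm a = {0};
    struct bond_state s = { .alarm = &a };
    int n;

    mock_alarm_set(&a, bond_periodic_cb, &s);
    mock_alarm_fire(&a);
    n = mock_alarm_cancel(&a);  /* too late: callback already running */
    assert(n == 0);
    (void)n;
    mock_pause(&a);             /* callback finishes and re-arms itself */
    return a.armed;
}

/* Flag plus synchronous cancel: nothing is left armed or running. */
bool demo_flag_plus_sync_is_quiet(void)
{
    struct mock_alarm a = {0};
    struct bond_state s = { .alarm = &a };

    mock_alarm_set(&a, bond_periodic_cb, &s);
    mock_alarm_fire(&a);
    bond_stop(&s);
    return !a.armed && !a.executing && s.rearm_count == 0;
}
```

Calling the two demo functions from a main() contrasts the cases: the
plain cancel leaves a re-armed alarm behind, while setting the flag first
and then cancelling synchronously leaves the model idle. A real
implementation would need a proper thread-safe flag and the actual alarm
library's return-value semantics, which this model does not try to
reproduce.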