DPDK patches and discussions
 help / color / mirror / Atom feed
From: "Ananyev, Konstantin" <konstantin.ananyev@intel.com>
To: Neil Horman <nhorman@tuxdriver.com>,
	"Wodkowski, PawelX" <pawelx.wodkowski@intel.com>
Cc: "dev@dpdk.org" <dev@dpdk.org>
Subject: Re: [dpdk-dev] [PATCH v2] Change alarm cancel function to thread-safe:
Date: Fri, 26 Sep 2014 15:41:58 +0000	[thread overview]
Message-ID: <2601191342CEEE43887BDE71AB9772582137D88E@IRSMSX104.ger.corp.intel.com> (raw)
In-Reply-To: <20140926150156.GB5619@hmsreliant.think-freely.org>



> -----Original Message-----
> From: dev [mailto:dev-bounces@dpdk.org] On Behalf Of Neil Horman
> Sent: Friday, September 26, 2014 4:02 PM
> To: Wodkowski, PawelX
> Cc: dev@dpdk.org
> Subject: Re: [dpdk-dev] [PATCH v2] Change alarm cancel function to thread-safe:
> 
> On Fri, Sep 26, 2014 at 02:01:05PM +0000, Wodkowski, PawelX wrote:
> > > > > Maybe I don't see something obvious? :)
> > >
> > > I think you're missing the fact that your patch doesn't do what you assert above
> > > either :)
> >
> > Issue is not in setting alarms but canceling it. If you look closer to my patch you
> > see that it address this issue (look at added *do { lock(); ....; unlock(); } while( )*
> > part).
> >
> I get where the issue is, and I'm looking at your patch.  I see that you did
> some locking there.  The issue I'm pointing out is that, if you call
> rte_eal_alarm_cancel on an alarm callback, you will exit the alarm_cancel
> function with, by definition, one alarm executing (the one you are currently
> running).  You're patch works perfectly for the case where another thread calls
> cancel, in that it waits until the executing alarm is complete, but it doesn't
> work in the case where you are calling it from within the alarm callback.

Hm, and why do we need it from alarm callback?
After cb_func() is finished given alarm entry will be removed anyway.

> If  you're goal is to guarantee that all the matching alarms are cancelled and
> complete, you haven't done that, because the recursive state is still unhandled.
> 
> > >
> > > First, lets address rte_alarm_set.  There is no notion of "re-arming" in this
> > > alarm implementation, because theres no ability to refer to a specific alarm
> > > from the callers perspective.  When you call rte_eal_alarm_set you get a new
> > > alarm every time.  So I don't really see a race there.  It might not be exactly
> > > the behavior you want, but its not a race, becuase you're not modifying an
> > > alarm
> > > in the middle of execution, you're just creating a new alarm, which is safe.
> >
> > OK, it is safe, but this is not the case.
> >
> I don't know what you mean by this.  We agree its safe, great.  But it is the
> case as I've described it, you can see it from the implementation, every call to
> rte_eal_alarm_set starts with a malloc of a new alarm structure.
> 
> > >
> > > There is a race in what you describe above, insofar as its possible that you
> > > might call rte_eal_alarm_cancel and return without having canceled all the
> > > matching alarms.  I don't see any clear documentation on what the behavior is
> > > supposed to be, but if you want to ensure that all matching alarms are cancelled
> > > or complete on return from rte_eal_alarm_cancel, thats perfectly fine (in linux
> > > API parlance, thats usually denoted as a cancel_sync operation).
> >
> > Again, look at the patch. I changed documentation to inform about this behavior.
> >
> 
> This is the documentation included in the patch:
> Change alarm cancel function to thread-safe.
>         It eliminates a race between threads using rte_alarm_cancel and
>         rte_alarm_set.
> 
> neither have you compeltely described the race condition (though you now have
> previously in this thread), nor have you completely addressed it (calling
> rte_eal_alarm_cancel and rte_eal_alarm_set still behaves exactly as it did
> previously with a 2nd thread).
> 
> > >
> > > For that race condition, you're correct, my patch doesn't address it, I see that
> > > now.  Though your patch doesn't either.  If you call rte_eal_alarm_cancel from
> > > within a callback function, then, by definition, you can't wait on the
> > > completion of the active alarm, because thats a deadlock.  Its a necessecary
> > > evil, I grant you, but it means that you can't be guaranteed the cancelled and
> > > complete (cancel_sync) behavior that you want, at least not with the current
> > > api.  If you want that behavior, you need to do one of two things:
> >
> > This patch does not break any API. It only removes undefined behavior.
> >
> I never said it did break ABI.  I said that to completely fix it you would have
> to break ABI.  And it doesn't really remove undefined behavior, because you
> still have the old behavior in the recursive case (which you may be ok with, I
> don't know, but if you really want to address the behavior, you should address
> this aspect of it).
> 
> > >
> > > 1) Modify the api to allow callers to individually reference timer instances, so
> > > that when cancelling, we can return an appropriate return code to indicate to
> > > the caller that this alarm is in-progress.  That way you can guarantee the
> > > caller that the specific alarm that you cancelled is either complete and cancelled
> > > or currently executing.  Add an api to expicitly wait on a referenced alarm as
> > > well.  This allows developers to know that, when executing an alarm callback, an
> > > -ECURRENTLYEXECUTING return code is ok, because they are in the currently
> > > executing context.
> >
> > This would brake API for sure.
> Yes, it would.  Bruce Richardson just made a major ABI break with his mbuf
> cleanup set.  If there was a time to change ABI here, now would be the time I
> think.

Ok, too many words for me, to be honest :)
Can I summarise:
As I remember the purpose of the patch was to fix the race condition inside rte_alarm library.
I believe that the patch provided by Michal & Pawel fixes the issues you discovered.
If you think, that is not the case, could you please provide a list of remaining issues?
Excluding ones that you just don't like it, and you are not happy with rte_alarm API in total?      

If you have any concerns about mbuf reorg/expansion - probably better to contact Bruce and express them.
Not to use it as an argument for breaking any existing API without really good reason behind.  

Konstantin

  reply	other threads:[~2014-09-26 15:36 UTC|newest]

Thread overview: 31+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2014-09-25 12:56 Michal Jastrzebski
2014-09-25 13:11 ` Ananyev, Konstantin
2014-09-25 15:08 ` Neil Horman
2014-09-25 16:03   ` Ananyev, Konstantin
2014-09-25 17:23     ` Neil Horman
2014-09-25 23:24       ` Ananyev, Konstantin
2014-09-26 11:46         ` Neil Horman
2014-09-26 12:37           ` Wodkowski, PawelX
2014-09-26 13:40             ` Neil Horman
2014-09-26 14:01               ` Wodkowski, PawelX
2014-09-26 15:01                 ` Neil Horman
2014-09-26 15:41                   ` Ananyev, Konstantin [this message]
2014-09-26 16:21                     ` Neil Horman
2014-09-26 18:07                       ` Ananyev, Konstantin
2014-09-26 19:39                         ` Neil Horman
2014-09-28 16:12                           ` Ananyev, Konstantin
2014-09-28 20:47                             ` Neil Horman
2014-09-29  6:40                               ` Wodkowski, PawelX
2014-09-29  9:50                                 ` Ananyev, Konstantin
2014-09-29 10:11                                   ` Wodkowski, PawelX
2014-09-29 10:33                                     ` Bruce Richardson
2014-09-30 11:13                                       ` Wodkowski, PawelX
2014-09-30 12:05                                         ` Wodkowski, PawelX
2014-09-30 12:30                                           ` Ananyev, Konstantin
2014-09-30 12:54                                             ` Neil Horman
2014-09-29 11:35                                     ` Neil Horman
2014-09-26 14:13               ` Ananyev, Konstantin
2014-09-29 10:37                 ` Bruce Richardson
2014-09-26  6:33       ` Wodkowski, PawelX
2014-09-26  9:49         ` Wodkowski, PawelX
2014-09-26 13:43         ` Neil Horman

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=2601191342CEEE43887BDE71AB9772582137D88E@IRSMSX104.ger.corp.intel.com \
    --to=konstantin.ananyev@intel.com \
    --cc=dev@dpdk.org \
    --cc=nhorman@tuxdriver.com \
    --cc=pawelx.wodkowski@intel.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).