DPDK patches and discussions
 help / color / mirror / Atom feed
From: Neil Horman <nhorman@tuxdriver.com>
To: "Ananyev, Konstantin" <konstantin.ananyev@intel.com>
Cc: "dev@dpdk.org" <dev@dpdk.org>
Subject: Re: [dpdk-dev] [PATCH v2] Change alarm cancel function to thread-safe:
Date: Sun, 28 Sep 2014 16:47:54 -0400	[thread overview]
Message-ID: <20140928204754.GC4012@localhost.localdomain> (raw)
In-Reply-To: <2601191342CEEE43887BDE71AB9772582138410B@IRSMSX104.ger.corp.intel.com>

On Sun, Sep 28, 2014 at 04:12:04PM +0000, Ananyev, Konstantin wrote:
> 
> 
> > -----Original Message-----
> > From: Neil Horman [mailto:nhorman@tuxdriver.com]
> > Sent: Friday, September 26, 2014 8:39 PM
> > To: Ananyev, Konstantin
> > Cc: Wodkowski, PawelX; dev@dpdk.org
> > Subject: Re: [dpdk-dev] [PATCH v2] Change alarm cancel function to thread-safe:
> > 
> > On Fri, Sep 26, 2014 at 06:07:14PM +0000, Ananyev, Konstantin wrote:
> > >
> > >
> > > > > As I remember the purpose of the patch was to fix the race condition inside rte_alarm library.
> > > > > I believe that the patch provided by Michal & Pawel fixes the issues you discovered.
> > > > > If you think, that is not the case, could you please provide a list of remaining issues?
> > > > > Excluding ones that you just don't like it, and you are not happy with rte_alarm API in total?
> > >
> > >
> > > > Gladly.  As Pawel explained the race, its possible that, after calling
> > > > rte_eal_alarm_cancel, an in-flight execution of an alarm callback may still be
> > > > running.  The problem with that ostensibly is that data which is being accessed
> > > > by the callback might be then accessed in parallel with another process leading
> > > > to data corruption or some other problem. The issue I have with his patch is
> > > > that it doesn't completely close the race.  While it does close the race for the
> > > > condition in whcih thread B is running the alarm callback while thread A is
> > > > executing the cancel operation, it does not close the case for when a single
> > > > thread B is running the cancel operation, as the in-flight execution itself is
> > > > still active.
> > >
> > > A bit puzzled here:
> > > Are you saying that calling alarm_cancel() for itself inside eal_alarm_callback() might cause a problem?
> > > I still don't see how.
> > >
> > Potentially yes, by the same race condition that exists when using a secondary
> > thread to do the cancel call.  As I understand it the race that Pawel described
> > is as follows:
> > 
> > Thread A					Thread B
> > alarm_cancel()					eal_alarm_callback
> >     block on alarm spinlock			drop spinlock
> >         run cancel operation			execute callback function
> >     return from cancel
> >     rte_eal_alarm_set
> > 
> > As Pawel described the problem, there is a desire to not set the new alarm while
> > the old alarm is still executing.  And his patch accomplishes that for the two
> > thread case above just fine
> > 
> > The problem with Pawels patch is that its non functional in the case where the
> > cancel happens within Thread B.  Lets change the scenario just a little bit:
> > 
> > Thread B					Thread C
> > eal_alarm_callback
> >      callback_function
> >           some_other_common_func
> >                rte_eal_alarm_cancel(this)
> >           pthread_signal(Thread C)		wake up
> >           operate on alarm data			rte_eal_alarm_set
> >
> 
> As I can see, there is an incorrect behaviour in your callback_function example.
> It should first finish with " eal_alarm_callback" and only then send a signal to other thread.
> Otherwise we can't help it in any way.
> But I think, I understand your concern:
> after rte_eal_aralm_cancel() finishes, the caller can't clearly distinguish what exactly happen:
> 1) alarm was cancelled succesfully.
> 2) alarm was not found (already cancelled or executed).
> 3) alarm is executing by the same thread and can't be cancelled.
> 
> Basically right now the caller can distinguish that either #1 or #2,3 happened, but can't distinguish between 2 & 3.
> Correct?
> 
Yes, this is my concern exactly.

>  If that's so, then I suppose we can do: make alarm_cancel() to return a negative value for the case #3 (-EINPROGRESS or something).
>  Something like: 
> ...
> if (ap->executing == 0) {
>    LIST_REMOVE(ap,next);
>     rte_free(ap);
>     count++;
>     ap = ap_prev; 
> } else if (pthread_equal(ap->executing_id, pthread_self()) == 0) {
>     executing++;
> } else { 
>    ret = -EINPROGRESS;
> }
> ...
> return ((ret != 0) ? ret : count);
> 
> So the return value  will be > 0 for #1, 0 for #2, <0 for #3.
> As I remember, you already suggested something similar in one of the previous mails.
Yes, I rolled the API changes I suggested in with this model, because I wanted
to be able to do precise specification of a timer instance to cancel, but if
we're not ready to make that change, I think what you propose above would be
suffficient.  Theres some question as to weather we would cancel timers that are
still pending on a return of -EINPROGRESS, but I think if we document it
accordingly, then it can be worked out just fine.

Best
Neil

> Konstantin
> 
> 
> 
>  	
> > 
> > In this scenario the problem is not fixed because when called from within the
> > alarm thread, the executing alarm is skipped (as it must be), but that fact is
> > invisible to the caller, and because of that its still possible for the same
> > origional problem to occur.
> > 
> > Neil
> > 
> 

  reply	other threads:[~2014-09-28 20:41 UTC|newest]

Thread overview: 31+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2014-09-25 12:56 Michal Jastrzebski
2014-09-25 13:11 ` Ananyev, Konstantin
2014-09-25 15:08 ` Neil Horman
2014-09-25 16:03   ` Ananyev, Konstantin
2014-09-25 17:23     ` Neil Horman
2014-09-25 23:24       ` Ananyev, Konstantin
2014-09-26 11:46         ` Neil Horman
2014-09-26 12:37           ` Wodkowski, PawelX
2014-09-26 13:40             ` Neil Horman
2014-09-26 14:01               ` Wodkowski, PawelX
2014-09-26 15:01                 ` Neil Horman
2014-09-26 15:41                   ` Ananyev, Konstantin
2014-09-26 16:21                     ` Neil Horman
2014-09-26 18:07                       ` Ananyev, Konstantin
2014-09-26 19:39                         ` Neil Horman
2014-09-28 16:12                           ` Ananyev, Konstantin
2014-09-28 20:47                             ` Neil Horman [this message]
2014-09-29  6:40                               ` Wodkowski, PawelX
2014-09-29  9:50                                 ` Ananyev, Konstantin
2014-09-29 10:11                                   ` Wodkowski, PawelX
2014-09-29 10:33                                     ` Bruce Richardson
2014-09-30 11:13                                       ` Wodkowski, PawelX
2014-09-30 12:05                                         ` Wodkowski, PawelX
2014-09-30 12:30                                           ` Ananyev, Konstantin
2014-09-30 12:54                                             ` Neil Horman
2014-09-29 11:35                                     ` Neil Horman
2014-09-26 14:13               ` Ananyev, Konstantin
2014-09-29 10:37                 ` Bruce Richardson
2014-09-26  6:33       ` Wodkowski, PawelX
2014-09-26  9:49         ` Wodkowski, PawelX
2014-09-26 13:43         ` Neil Horman

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20140928204754.GC4012@localhost.localdomain \
    --to=nhorman@tuxdriver.com \
    --cc=dev@dpdk.org \
    --cc=konstantin.ananyev@intel.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).