DPDK patches and discussions
 help / color / mirror / Atom feed
From: "Ananyev, Konstantin" <konstantin.ananyev@intel.com>
To: Neil Horman <nhorman@tuxdriver.com>
Cc: "dev@dpdk.org" <dev@dpdk.org>
Subject: Re: [dpdk-dev] [PATCH v2] Change alarm cancel function to thread-safe:
Date: Sun, 28 Sep 2014 16:12:04 +0000	[thread overview]
Message-ID: <2601191342CEEE43887BDE71AB9772582138410B@IRSMSX104.ger.corp.intel.com> (raw)
In-Reply-To: <20140926193905.GH5619@hmsreliant.think-freely.org>



> -----Original Message-----
> From: Neil Horman [mailto:nhorman@tuxdriver.com]
> Sent: Friday, September 26, 2014 8:39 PM
> To: Ananyev, Konstantin
> Cc: Wodkowski, PawelX; dev@dpdk.org
> Subject: Re: [dpdk-dev] [PATCH v2] Change alarm cancel function to thread-safe:
> 
> On Fri, Sep 26, 2014 at 06:07:14PM +0000, Ananyev, Konstantin wrote:
> >
> >
> > > > As I remember the purpose of the patch was to fix the race condition inside rte_alarm library.
> > > > I believe that the patch provided by Michal & Pawel fixes the issues you discovered.
> > > > If you think, that is not the case, could you please provide a list of remaining issues?
> > > > Excluding ones that you just don't like it, and you are not happy with rte_alarm API in total?
> >
> >
> > > Gladly.  As Pawel explained the race, its possible that, after calling
> > > rte_eal_alarm_cancel, an in-flight execution of an alarm callback may still be
> > > running.  The problem with that ostensibly is that data which is being accessed
> > > by the callback might be then accessed in parallel with another process leading
> > > to data corruption or some other problem. The issue I have with his patch is
> > > that it doesn't completely close the race.  While it does close the race for the
> > > condition in whcih thread B is running the alarm callback while thread A is
> > > executing the cancel operation, it does not close the case for when a single
> > > thread B is running the cancel operation, as the in-flight execution itself is
> > > still active.
> >
> > A bit puzzled here:
> > Are you saying that calling alarm_cancel() for itself inside eal_alarm_callback() might cause a problem?
> > I still don't see how.
> >
> Potentially yes, by the same race condition that exists when using a secondary
> thread to do the cancel call.  As I understand it the race that Pawel described
> is as follows:
> 
> Thread A					Thread B
> alarm_cancel()					eal_alarm_callback
>     block on alarm spinlock			drop spinlock
>         run cancel operation			execute callback function
>     return from cancel
>     rte_eal_alarm_set
> 
> As Pawel described the problem, there is a desire to not set the new alarm while
> the old alarm is still executing.  And his patch accomplishes that for the two
> thread case above just fine
> 
> The problem with Pawels patch is that its non functional in the case where the
> cancel happens within Thread B.  Lets change the scenario just a little bit:
> 
> Thread B					Thread C
> eal_alarm_callback
>      callback_function
>           some_other_common_func
>                rte_eal_alarm_cancel(this)
>           pthread_signal(Thread C)		wake up
>           operate on alarm data			rte_eal_alarm_set
>

As I can see, there is an incorrect behaviour in your callback_function example.
It should first finish with " eal_alarm_callback" and only then send a signal to other thread.
Otherwise we can't help it in any way.
But I think, I understand your concern:
after rte_eal_aralm_cancel() finishes, the caller can't clearly distinguish what exactly happen:
1) alarm was cancelled succesfully.
2) alarm was not found (already cancelled or executed).
3) alarm is executing by the same thread and can't be cancelled.

Basically right now the caller can distinguish that either #1 or #2,3 happened, but can't distinguish between 2 & 3.
Correct?

 If that's so, then I suppose we can do: make alarm_cancel() to return a negative value for the case #3 (-EINPROGRESS or something).
 Something like: 
...
if (ap->executing == 0) {
   LIST_REMOVE(ap,next);
    rte_free(ap);
    count++;
    ap = ap_prev; 
} else if (pthread_equal(ap->executing_id, pthread_self()) == 0) {
    executing++;
} else { 
   ret = -EINPROGRESS;
}
...
return ((ret != 0) ? ret : count);

So the return value  will be > 0 for #1, 0 for #2, <0 for #3.
As I remember, you already suggested something similar in one of the previous mails.
Konstantin



 	
> 
> In this scenario the problem is not fixed because when called from within the
> alarm thread, the executing alarm is skipped (as it must be), but that fact is
> invisible to the caller, and because of that its still possible for the same
> origional problem to occur.
> 
> Neil
> 

  reply	other threads:[~2014-09-28 16:05 UTC|newest]

Thread overview: 31+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2014-09-25 12:56 Michal Jastrzebski
2014-09-25 13:11 ` Ananyev, Konstantin
2014-09-25 15:08 ` Neil Horman
2014-09-25 16:03   ` Ananyev, Konstantin
2014-09-25 17:23     ` Neil Horman
2014-09-25 23:24       ` Ananyev, Konstantin
2014-09-26 11:46         ` Neil Horman
2014-09-26 12:37           ` Wodkowski, PawelX
2014-09-26 13:40             ` Neil Horman
2014-09-26 14:01               ` Wodkowski, PawelX
2014-09-26 15:01                 ` Neil Horman
2014-09-26 15:41                   ` Ananyev, Konstantin
2014-09-26 16:21                     ` Neil Horman
2014-09-26 18:07                       ` Ananyev, Konstantin
2014-09-26 19:39                         ` Neil Horman
2014-09-28 16:12                           ` Ananyev, Konstantin [this message]
2014-09-28 20:47                             ` Neil Horman
2014-09-29  6:40                               ` Wodkowski, PawelX
2014-09-29  9:50                                 ` Ananyev, Konstantin
2014-09-29 10:11                                   ` Wodkowski, PawelX
2014-09-29 10:33                                     ` Bruce Richardson
2014-09-30 11:13                                       ` Wodkowski, PawelX
2014-09-30 12:05                                         ` Wodkowski, PawelX
2014-09-30 12:30                                           ` Ananyev, Konstantin
2014-09-30 12:54                                             ` Neil Horman
2014-09-29 11:35                                     ` Neil Horman
2014-09-26 14:13               ` Ananyev, Konstantin
2014-09-29 10:37                 ` Bruce Richardson
2014-09-26  6:33       ` Wodkowski, PawelX
2014-09-26  9:49         ` Wodkowski, PawelX
2014-09-26 13:43         ` Neil Horman

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=2601191342CEEE43887BDE71AB9772582138410B@IRSMSX104.ger.corp.intel.com \
    --to=konstantin.ananyev@intel.com \
    --cc=dev@dpdk.org \
    --cc=nhorman@tuxdriver.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).