DPDK patches and discussions
 help / color / mirror / Atom feed
From: "Ananyev, Konstantin" <konstantin.ananyev@intel.com>
To: Neil Horman <nhorman@tuxdriver.com>,
	"Wodkowski, PawelX" <pawelx.wodkowski@intel.com>
Cc: "dev@dpdk.org" <dev@dpdk.org>
Subject: Re: [dpdk-dev] [PATCH v2] Change alarm cancel function to thread-safe:
Date: Fri, 26 Sep 2014 14:13:55 +0000	[thread overview]
Message-ID: <2601191342CEEE43887BDE71AB9772582137D7F1@IRSMSX104.ger.corp.intel.com> (raw)
In-Reply-To: <20140926134014.GB3930@hmsreliant.think-freely.org>

> From: dev [mailto:dev-bounces@dpdk.org] On Behalf Of Neil Horman
> Sent: Friday, September 26, 2014 2:40 PM
> To: Wodkowski, PawelX
> Cc: dev@dpdk.org
> Subject: Re: [dpdk-dev] [PATCH v2] Change alarm cancel function to thread-safe:
> 
> On Fri, Sep 26, 2014 at 12:37:54PM +0000, Wodkowski, PawelX wrote:
> > > So basically cancel() just set ALARM_CANCELLED and leaves actual alarm
> > > deletion to the callback()?
> > > That was the thought, yes.
> > >
> > > > I think it is doable - but I don't see any real advantage with that approach.
> > > > Yes, code will become a bit simpler, as  we'll have one point when we remove
> > > alarm from the list.
> > > Yes, that would be the advantage, that the code would be much simpler.
> > >
> > > > But from other side, imagine such simple test-case:
> > > >
> > > > for (i = 0; i < 0x100000; i++) {
> > > >    rte_eal_alarm_set(ONE_MIN, cb_func, (void *)i);
> > > >    rte_eal_alarm_cancel(cb_func, (void *)i);
> > > > }
> > > >
> > > > We'll endup with 1M of cancelled, but still not removed entries in the
> > > alarm_list.
> > > > With current implementation that means - few MBs of wasted memory,
> > > Thats correct, and the tradeoff to choose between.  Do you want simpler code
> > > that is easier to maintain, or do you want a high speed cancel and set
> > > operation.  I'm not aware of all the use cases, but I have a hard time seeing
> > > a use case in which the in-flight alarm list grows unboundedly large, which in
> > > my mind mitigates the risk of deferred removal, but I'm perfectly willing to
> > > believe that there are use cases which I'm not aware of.

After executing example above - from user perspective there is no active alarms in the system at all.
Though in fact alarm_list contains 1M entries. 

> > >
> > > > plus very slow set() and cancel(), as they'll  have to traverse all entries in the
> > > list.
> > > > And all that - for empty from user perspective alarm_list
> > > > So I still prefer Michal's way.
> > > > After all, it doesn't look that complicated to me.
> > > Except that the need for Michals fix arose from the fact that we have two free
> > > locations that might both get called depending on the situation.  Thats what I'm
> > > trying to address here, the complexity itself, rather than the fix (which I
> > > agree is perfectly valid).

Well, I believe his fix addresses all the open issues, no?

Another thing, as Pawel pointed in another mail, your fix doesn't address properly the situation when
we have a racing alarm_cancel(cb_func) and alarm cb_func rearming itself. 
While the original patch does.

> > >
> > > > BTW, any particular reason you are so negative about pthread_self()?
> > > >
> > > Nothing specifically against it (save for its inverted return code sense, which
> > > made it difficult for me to parse when reviewing).  Its more the complexity
> > > itself in the alarm cancel and callback routine that I was looking at.  Given
> > > that the origional bug happened because an cancel in a callback might produce a
> > > double free, I wanted to fix it by simpifying the code, not adding conditions
> > > which make it more complex.
> > >
> > > You know, looking at it, something else just occured to me.  I think this could
> > > all be fixed without the cancel flag or the pthread_self assignments.  What if
> > > we simply removed the alarm from the list before we called the callback in
> > > rte_eal_alarm_callback()?  That way any cancel operation called from within the
> > > callback would fail, as it wouldn't appear on the list, and the callback
> > > operation would be the only freeing entity?  That would let you still have a
> > > fast set and cancel, and avoid the race.  Thoughts?  Untested sample patch
> > > below


Hmm, and how it would address any of the problems I mentioned above:

1. The alarm_list still would grow by the repetition of: {alarm_set(x); alarm_cansel(x);} 
2. Race between alarm_cancel(cb_func) and alarm cb_func rearming itself.

?

> > >
> > >
> > > > >
> > > > > It also seems like the alarm api as a whole could use some improvement.
> > > The
> > > > > way its written right now, theres no way to refer to a specific alarm (i.e.
> > > > > cancelation relies on the specification of a function and data pointer, which
> > > > > may refer to multiple timers).  Shouldn't rte_eal_alarm_set return an opaque
> > > > > handle to a unique timer instance that can be store by a caller and used to
> > > > > specfically cancel that timer?  Thats how both the bsd and linux timer
> > > > > subsystems model timers.
> > > >
> > > > Yeh,  alarm API looks a bit unusual.
> > > > Though, I suppose that's subject for another patch/discussion :)
> > > >
> > > Yes, agreed :)
> > >
> >
> > Please read quoted message bellow:
> >
> > > >
> > > >
> > > > His solution *does* eliminate race condition too.
> > > >
> > > I applied his patch. And here is the problem
> > > 1	rte_spinlock_lock(&alarm_list_lk);
> > > 2	while ((ap = LIST_FIRST(&alarm_list)) !=NULL &&
> > > 3			gettimeofday(&now, NULL) == 0 &&
> > > 4			(ap->time.tv_sec < now.tv_sec || (ap->time.tv_sec ==
> > > now.tv_sec &&
> > > 5						ap->time.tv_usec <=
> > > now.tv_usec))){
> > > 6		ap->executing |= ALARM_EXECUTING;
> > > 7		if (likely(!(ap->executing & ALARM_CANCELLED))) {
> > > 8				rte_spinlock_unlock(&alarm_list_lk);
> > > 9                           //another thread: rte_alarm_cancel called, mark this timer
> > > canceled and exit ( THE RACE)
> > > 10				ap->cb_fn(ap->cb_arg); // rte_alarm_set called
> > > (THE RACE)
> > > 11
> > > 12				rte_spinlock_lock(&alarm_list_lk);
> > > 13		}
> > > 14
> > > 15		rte_spinlock_lock(&alarm_list_lk);
> > > 16		LIST_REMOVE(ap, next);
> > > 17		rte_free(ap);
> > > 18	}
> > >
> > > Imagine
> > >
> > > Thread 1:                                         Thread2
> > > Execute eal_alarm_callback
> > > Lock list at 1                                   rte_alarm_cancel -> block on spinlock
> > > ....
> > > Realease lock at line 8                  rte_alarm_cancel -> resumes execution -> it
> > > mark this timer canceled
> > > ap->cb_fn is called at line 10       rte_alarm_cancel exits reporting all alarms are
> > > canceled and not executing (which is not true!)
> > >     rte_alarm_set is called
> > >     to rearm itself (THE RACE)
> > >
> > > He only mark timer to * do not execute* but does not assure that it is not
> > > executing while canceling.
> > > Race condition is made by unlocking at line 8 and our solution workarounds this
> > > by looping until all alarms finish execution then cancel what left after callback
> > > finish (*including rearmed alarms*).
> > > Real elimination of the race would be by using recursive locking when *other
> > > thread can't* access the list *while callback* is running but callback *itself can
> > > by using recursive locking*.
> > >
> > > Maybe I don't see something obvious? :)
> 
> I think you're missing the fact that your patch doesn't do what you assert above
> either :)
> 
> First, lets address rte_alarm_set.  There is no notion of "re-arming" in this
> alarm implementation, because theres no ability to refer to a specific alarm
> from the callers perspective.  When you call rte_eal_alarm_set you get a new
> alarm every time.  So I don't really see a race there.  It might not be exactly
> the behavior you want, but its not a race, becuase you're not modifying an alarm
> in the middle of execution, you're just creating a new alarm, which is safe.
> 
> There is a race in what you describe above, insofar as its possible that you
> might call rte_eal_alarm_cancel and return without having canceled all the
> matching alarms.  I don't see any clear documentation on what the behavior is
> supposed to be, but if you want to ensure that all matching alarms are cancelled
> or complete on return from rte_eal_alarm_cancel, thats perfectly fine (in linux
> API parlance, thats usually denoted as a cancel_sync operation).
> 
> For that race condition, you're correct, my patch doesn't address it, I see that
> now.  Though your patch doesn't either.  If you call rte_eal_alarm_cancel from
> within a callback function, then, by definition, you can't wait on the
> completion of the active alarm, because thats a deadlock.  Its a necessecary
> evil, I grant you, but it means that you can't be guaranteed the cancelled and
> complete (cancel_sync) behavior that you want, at least not with the current
> api.  If you want that behavior, you need to do one of two things:
> 
> 1) Modify the api to allow callers to individually reference timer instances, so
> that when cancelling, we can return an appropriate return code to indicate to
> the caller that this alarm is in-progress.  That way you can guarantee the
> caller that the specific alarm that you cancelled is either complete and cancelled
> or currently executing.  Add an api to expicitly wait on a referenced alarm as
> well.  This allows developers to know that, when executing an alarm callback, an
> -ECURRENTLYEXECUTING return code is ok, because they are in the currently
> executing context.
> 
> 
> 2) Understand that the guarantee can't be met, and change nothing, because you
> consider it ok for an in-progress alarm to be ignored when getting cancelled.
> 
> 
> Neil

  parent reply	other threads:[~2014-09-26 14:07 UTC|newest]

Thread overview: 31+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2014-09-25 12:56 Michal Jastrzebski
2014-09-25 13:11 ` Ananyev, Konstantin
2014-09-25 15:08 ` Neil Horman
2014-09-25 16:03   ` Ananyev, Konstantin
2014-09-25 17:23     ` Neil Horman
2014-09-25 23:24       ` Ananyev, Konstantin
2014-09-26 11:46         ` Neil Horman
2014-09-26 12:37           ` Wodkowski, PawelX
2014-09-26 13:40             ` Neil Horman
2014-09-26 14:01               ` Wodkowski, PawelX
2014-09-26 15:01                 ` Neil Horman
2014-09-26 15:41                   ` Ananyev, Konstantin
2014-09-26 16:21                     ` Neil Horman
2014-09-26 18:07                       ` Ananyev, Konstantin
2014-09-26 19:39                         ` Neil Horman
2014-09-28 16:12                           ` Ananyev, Konstantin
2014-09-28 20:47                             ` Neil Horman
2014-09-29  6:40                               ` Wodkowski, PawelX
2014-09-29  9:50                                 ` Ananyev, Konstantin
2014-09-29 10:11                                   ` Wodkowski, PawelX
2014-09-29 10:33                                     ` Bruce Richardson
2014-09-30 11:13                                       ` Wodkowski, PawelX
2014-09-30 12:05                                         ` Wodkowski, PawelX
2014-09-30 12:30                                           ` Ananyev, Konstantin
2014-09-30 12:54                                             ` Neil Horman
2014-09-29 11:35                                     ` Neil Horman
2014-09-26 14:13               ` Ananyev, Konstantin [this message]
2014-09-29 10:37                 ` Bruce Richardson
2014-09-26  6:33       ` Wodkowski, PawelX
2014-09-26  9:49         ` Wodkowski, PawelX
2014-09-26 13:43         ` Neil Horman

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=2601191342CEEE43887BDE71AB9772582137D7F1@IRSMSX104.ger.corp.intel.com \
    --to=konstantin.ananyev@intel.com \
    --cc=dev@dpdk.org \
    --cc=nhorman@tuxdriver.com \
    --cc=pawelx.wodkowski@intel.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).