Bugzilla ID 491 identifies a couple of scenarios in which calls to the
rte_timer_reset_sync() and rte_timer_stop_sync() APIs could hang.
Changing the function prototypes such that error codes are returned in
these scenarios was considered, but rather than introducing another ABI
change, it was proposed to document a usage limitation[1]. This patch
adds the notes.

[1] https://patches.dpdk.org/patch/75142/

Erik Gabriel Carrillo (1):
  timer: add limitation note for sync stop and reset

 lib/librte_timer/rte_timer.h | 12 ++++++++++++
 1 file changed, 12 insertions(+)

--
2.6.4

If a timer's callback function calls rte_timer_reset_sync() or
rte_timer_stop_sync() on another timer that is in the RUNNING state and
owned by the current lcore, the *_sync() calls will loop indefinitely.

Relatedly, if a timer's callback function calls *_sync() on another
timer that is in the RUNNING state and is owned by a different lcore,
but a timer callback function runs on that different lcore and calls
*_sync() on a timer that is in the RUNNING state and owned by the
current lcore, the two lcores will loop indefinitely.

Add a note in the rte_timer_stop_sync and rte_timer_reset_sync
documentation that indicates that these APIs should not be used inside
timer callback functions in order to avoid the hangs described above,
and suggests an alternative.

Bugzilla ID: 491
Cc: stable@dpdk.org

Signed-off-by: Erik Gabriel Carrillo <erik.g.carrillo@intel.com>
---
 lib/librte_timer/rte_timer.h | 12 ++++++++++++
 1 file changed, 12 insertions(+)

diff --git a/lib/librte_timer/rte_timer.h b/lib/librte_timer/rte_timer.h
index c6b3d45..d7c3e03 100644
--- a/lib/librte_timer/rte_timer.h
+++ b/lib/librte_timer/rte_timer.h
@@ -274,6 +274,12 @@ int rte_timer_reset(struct rte_timer *tim, uint64_t ticks,
  * The callback function of the timer.
  * @param arg
  * The user argument of the callback function.
+ *
+ * @note
+ * This API should not be called inside a timer's callback function to
+ * reset another timer; doing so could hang in certain scenarios. Instead,
+ * the rte_timer_reset() API can be called directly and its return code
+ * can be checked for success or failure.
  */
 void
 rte_timer_reset_sync(struct rte_timer *tim, uint64_t ticks,
@@ -313,6 +319,12 @@ int rte_timer_stop(struct rte_timer *tim);
  *
  * @param tim
  * The timer handle.
+ *
+ * @note
+ * This API should not be called inside a timer's callback function to
+ * stop another timer; doing so could hang in certain scenarios. Instead, the
+ * rte_timer_stop() API can be called directly and its return code can
+ * be checked for success or failure.
  */
 void rte_timer_stop_sync(struct rte_timer *tim);
--
2.6.4

<snip>

>
> If a timer's callback function calls rte_timer_reset_sync() or
> rte_timer_stop_sync() on another timer that is in the RUNNING state and
> owned by the current lcore, the *_sync() calls will loop indefinitely.
>
> Relatedly, if a timer's callback function calls *_sync() on another timer that is
> in the RUNNING state and is owned by a different lcore, but a timer callback
> function runs on that different lcore and calls
> *_sync() on a timer that is in the RUNNING state and owned by the current
> lcore, the two lcores will loop indefinitely.
>
> Add a note in the rte_timer_stop_sync and rte_timer_reset_sync
> documentation that indicates that these APIs should not be used inside
> timer callback functions in order to avoid the hangs described above, and
> suggests an alternative.
>
> Bugzilla ID: 491
> Cc: stable@dpdk.org
>
> Signed-off-by: Erik Gabriel Carrillo <erik.g.carrillo@intel.com>

Looks good.
Reviewed-by: Honnappa Nagarahalli <honnappa.nagarahalli@arm.com>

> ---
>  lib/librte_timer/rte_timer.h | 12 ++++++++++++
>  1 file changed, 12 insertions(+)
>
> diff --git a/lib/librte_timer/rte_timer.h b/lib/librte_timer/rte_timer.h
> index c6b3d45..d7c3e03 100644
> --- a/lib/librte_timer/rte_timer.h
> +++ b/lib/librte_timer/rte_timer.h
> @@ -274,6 +274,12 @@ int rte_timer_reset(struct rte_timer *tim, uint64_t ticks,
>   * The callback function of the timer.
>   * @param arg
>   * The user argument of the callback function.
> + *
> + * @note
> + * This API should not be called inside a timer's callback function to
> + * reset another timer; doing so could hang in certain scenarios. Instead,
> + * the rte_timer_reset() API can be called directly and its return code
> + * can be checked for success or failure.
>   */
>  void
>  rte_timer_reset_sync(struct rte_timer *tim, uint64_t ticks,
> @@ -313,6 +319,12 @@ int rte_timer_stop(struct rte_timer *tim);
>   *
>   * @param tim
>   * The timer handle.
> + *
> + * @note
> + * This API should not be called inside a timer's callback function to
> + * stop another timer; doing so could hang in certain scenarios. Instead, the
> + * rte_timer_stop() API can be called directly and its return code can
> + * be checked for success or failure.
>   */
>  void rte_timer_stop_sync(struct rte_timer *tim);
>
> --
> 2.6.4

On Thu, Sep 10, 2020 at 3:23 AM Honnappa Nagarahalli
<Honnappa.Nagarahalli@arm.com> wrote:
> > If a timer's callback function calls rte_timer_reset_sync() or
> > rte_timer_stop_sync() on another timer that is in the RUNNING state and
> > owned by the current lcore, the *_sync() calls will loop indefinitely.
> >
> > Relatedly, if a timer's callback function calls *_sync() on another timer that is
> > in the RUNNING state and is owned by a different lcore, but a timer callback
> > function runs on that different lcore and calls
> > *_sync() on a timer that is in the RUNNING state and owned by the current
> > lcore, the two lcores will loop indefinitely.
> >
> > Add a note in the rte_timer_stop_sync and rte_timer_reset_sync
> > documentation that indicates that these APIs should not be used inside
> > timer callback functions in order to avoid the hangs described above, and
> > suggests an alternative.
> >
> > Bugzilla ID: 491
> > Cc: stable@dpdk.org
> >
> > Signed-off-by: Erik Gabriel Carrillo <erik.g.carrillo@intel.com>
> Reviewed-by: Honnappa Nagarahalli <honnappa.nagarahalli@arm.com>

Applied, thanks.

Since we go with documenting a limitation, should we mark the original
patches [1] and [2] as rejected instead of deferred?

1: https://patches.dpdk.org/patch/75156/
2: https://patches.dpdk.org/patch/73683/

--
David Marchand

> -----Original Message-----
> From: David Marchand <david.marchand@redhat.com>
> Sent: Thursday, October 8, 2020 5:28 AM
> To: Carrillo, Erik G <erik.g.carrillo@intel.com>
> Cc: dev@dpdk.org; stable@dpdk.org; nd <nd@arm.com>; Honnappa
> Nagarahalli <Honnappa.Nagarahalli@arm.com>; Sarosh Arif
> <sarosh.arif@emumba.com>
> Subject: Re: [dpdk-dev] [PATCH 1/1] timer: add limitation note for sync stop
> and reset
>
> On Thu, Sep 10, 2020 at 3:23 AM Honnappa Nagarahalli
> <Honnappa.Nagarahalli@arm.com> wrote:
> > > If a timer's callback function calls rte_timer_reset_sync() or
> > > rte_timer_stop_sync() on another timer that is in the RUNNING state
> > > and owned by the current lcore, the *_sync() calls will loop indefinitely.
> > >
> > > Relatedly, if a timer's callback function calls *_sync() on another
> > > timer that is in the RUNNING state and is owned by a different
> > > lcore, but a timer callback function runs on that different lcore
> > > and calls
> > > *_sync() on a timer that is in the RUNNING state and owned by the
> > > current lcore, the two lcores will loop indefinitely.
> > >
> > > Add a note in the rte_timer_stop_sync and rte_timer_reset_sync
> > > documentation that indicates that these APIs should not be used
> > > inside timer callback functions in order to avoid the hangs
> > > described above, and suggests an alternative.
> > >
> > > Bugzilla ID: 491
> > > Cc: stable@dpdk.org
> > >
> > > Signed-off-by: Erik Gabriel Carrillo <erik.g.carrillo@intel.com>
> > Reviewed-by: Honnappa Nagarahalli <honnappa.nagarahalli@arm.com>
>
> Applied, thanks.
>
> Since we go with documenting a limitation, should we mark the original
> patches [1] and [2] as rejected instead of deferred?
>
> 1: https://patches.dpdk.org/patch/75156/
> 2: https://patches.dpdk.org/patch/73683/
>
>

Thanks, David.

Yes, those patches should be moved to "rejected" - I tried to do it myself, but got permission errors. Sarosh, can you make these updates?

Thanks,
Erik

> --
> David Marchand

On Thu, Oct 8, 2020 at 3:58 PM Carrillo, Erik G
<erik.g.carrillo@intel.com> wrote:
> > Since we go with documenting a limitation, should we mark the original
> > patches [1] and [2] as rejected instead of deferred?
> >
> > 1: https://patches.dpdk.org/patch/75156/
> > 2: https://patches.dpdk.org/patch/73683/
> >
> >
> Thanks, David.
>
> Yes, those patches should be moved to "rejected" - I tried to do it myself, but got permission errors. Sarosh, can you make these updates?
I updated them.
Thanks Erik.
--
David Marchand