Re: [dpdk-stable] [dpdk-dev] [PATCH v3] net/mlx4: fix dev rmv not detected after port stop

patches for DPDK stable branches
 help / color / mirror / Atom feed

From: Matan Azrad <matan@mellanox.com>
To: Adrien Mazarguil <adrien.mazarguil@6wind.com>
Cc: Shahaf Shuler <shahafs@mellanox.com>,
	Mordechay Haimovsky <motih@mellanox.com>,
	"dev@dpdk.org" <dev@dpdk.org>,
	"stable@dpdk.org" <stable@dpdk.org>
Subject: Re: [dpdk-stable] [dpdk-dev] [PATCH v3] net/mlx4: fix dev rmv not detected after port stop
Date: Wed, 31 Jan 2018 17:07:56 +0000	[thread overview]
Message-ID: <AM4PR0501MB2657E999EF07D6A9C8E4E730D2FB0@AM4PR0501MB2657.eurprd05.prod.outlook.com> (raw)
In-Reply-To: <20180131143157.GV4256@6wind.com>

Hi Adrien

From: Adrien Mazarguil , Sent: Wednesday, January 31, 2018 4:32 PM
> Hi Matan,
> 
> On Wed, Jan 31, 2018 at 01:44:41PM +0000, Matan Azrad wrote:
> > Hi Adrien
> >
> >  From: Adrien Mazarguil, Sent: Wednesday, January 31, 2018 12:44 PM
> > > On Wed, Jan 31, 2018 at 10:08:06AM +0000, Matan Azrad wrote:
> > > > Hi all
> > > >
> > > > From: Adrien Mazarguil
> > > > > On Tue, Jan 30, 2018 at 08:37:06PM +0000, Shahaf Shuler wrote:
> > > > > > Tuesday, January 30, 2018 11:40 AM, Adrien Mazarguil:
> > > > > > > Unfortunately I didn't get a chance to review this patch
> > > > > > > before it was
> > > > > applied.
> > > > > > > I'm not sure a stopped port is supposed to report events
> > > > > > > (interrupts). Will applications expect them to occur at this point?
> > > > > >
> > > > > > Why not?
> > > > > >
> > > > > > Stopped port is still counted as attached. The fact the
> > > > > > application stopped
> > > > > the packet receive on it doesn't mean it should not receive a
> > > > > sync events (such as the remove event).
> > > > > > async events, by definition, are not related to traffic being
> > > > > > flows through
> > > > > the port.
> > > > >
> > > > > My comment is based on my understanding of rte_eth_dev_stop(),
> > > which
> > > > > is a device (or port) is completely stopped, in a suspended
> > > > > state and no interrupts shall occur, as a means for applications
> > > > > to temporarily not be bothered by them until restarted.
> > > >
> > > > Stopping traffic is not saying that the application is not
> > > > interesting in the device, I think that you mean to dev_close().
> > >
> > > No, dev_close() releases resources and destroys configuration. Good
> > > luck using dev_start() or any other devop after dev_close().
> >
> > I'm just saying here that when the user call dev_close() it means he
> > is not interesting in the device any more, This is not the case in dev_stop().
> >
> > > > Any event may still be usable for application between dev_stop()
> > > > to dev_start(), especially RMV or LCS can still be interested.
> > >
> > > Possibly, but then, how come no PMD implements it that way?
> >
> > Again, maybe others PMDs make mistakes.
> 
> It's basically the same as stating that for all these years, neither PMD nor
> application maintainers understood the true meaning of this API.
> 
> Maybe you're right, maybe not. What's for sure is this patch unilaterally
> decided to modify an accepted behavior. Perhaps it's not a big deal but we
> must be cautious.

Agree.

I'm just saying my opinion here.

> > > Neither did mlx4 before this patch got applied. At the very least it
> > > cannot be considered a "fix".
> >
> > It fixes something.
> 
> Technically every time a feature is added, its absence gets "fixed" :)

:)
 
> > > > > Think about it that way: applications do not want to get
> > > > > interrupts immediately after the device is initialized, because
> > > > > they might not be ready to process them at this point. An
> > > > > explicit call to
> > > > > rte_eth_dev_start() tells the PMD when it's OK to do so. The
> > > > > converse is
> > > rte_eth_dev_stop().
> > > >
> > > > So, they can delay the event registration to the time they
> > > > interesting in the
> > > events.
> > > > And use event unregister when they are not interesting in it anymore.
> > >
> > > Of course you can ask application maintainers to adapt to the new
> > > behavior, or you know, leave things as they used to be.
> > >
> >
> > I don't know what any application does but for me it is a mistake to
> > stop all event processes in dev_stop(), Maybe for other application
> maintainers too.
> 
> Just like you, I don't know either what all the applications ever written for
> DPDK expect out of dev_stop(). What's for sure is that currently, LSC/RMV
> don't occcur afterward, the same way these events do not occur before
> dev_start().

Why not? RMV event can occur any time after probe.

> Any application possibly relying on this fact will break. In such a
> situation, a conservative approach is better.

If an application should fail to get event in stopped state it may fail in the previous code too:
The interrupt run from host thread so the next race may occur:
dev_start() : master thread.
Context switch.
RMV interrupt started to run callbacks: host thread.
Context switch.
dev_stop(): master thread.
Start reconfiguration: master thread. 
Context switch.
Callback running.

So, the only thing which can disable callback running after dev_stop() is callback unregistration before it.

> > > Setting up RMV/LSC callbacks is not the only configuration an
> > > application usually performs before calling dev_start(). Think about
> > > setting up flow rules, MAC addresses, VLANs, and so on, this on
> > > multiple ports before starting them up all at once. Previously it
> > > could be done in an unspecified order, now they have to take special care
> for RMV/LSC.
> >
> > Or maybe there callbacks code are already safe for it.
> > Or they manages the unregister\register calls in the right places.
> 
> That's my point, these "maybes" don't argue in favor of changing things.

What I'm saying is that callbacks should be safe or not registered in the right time.

> > > Many devops are only safe when called while a device is stopped.
> > > It's even documented in rte_ethdev.h.
> > >
> >
> > And?
> 
> ...And applications therefore often do all their configuration in an unspecified
> order while a port is stopped as a measure of safety. No extra care is taken
> for RMV/LSC. This uncertainty can be addressed by not modifying the current
> behavior.

Or they expect to get interrupt and the corner case will come later if we will not change it.

> > > > > Stopping traffic can already be achieved by not polling from the
> > > > > application side, calling rte_eth_dev_[rt]x_queue_stop() and/or
> > > > > toggling RX/TX interrupts through rte_eth_dev_[rt]x_intr_enable().
> > > > > rte_eth_dev_stop() provides lower-level device control.
> > > >
> > > > I think it makes sense only for Rx interrupt which is traffic
> > > > oriented(like stop
> > > and start).
> > >
> > > OK, looks like we disagree :)
> > >
> > > > > Perhaps documentation is not clear, however that's how LSC seems
> > > > > implemented in all PMDs; it gets disabled after
> > > > > rte_eth_dev_stop() and one should explicitly use
> > > > > rte_eth_link_get() to retrieve link status afterward. I think
> > > > > RMV should behave similarly with
> > > rte_eth_dev_is_removed().
> > > > > Adapting fail-safe should be easier than modifying all the
> > > > > remaining
> > > PMDs.
> > > >
> > > > Or maybe PMDs which do it make mistakes.
> > >
> > > I'm not convinced mlx4 is the only PMD doing the right thing, we
> > > should ask other maintainers as well as application writers' opinion
> > > on the matter. If it's not a problem for RMV/LSC to occur while a
> > > device is stopped, then I'll agree with the approach. It still won't make it a
> fix regardless.
> >
> > Let's think about RMV event, How can the application\other dpdk entities
> to know if the device was removed when it was in stopped state?
> > Checking it synchronically (by rte_eth_dev_is_removed()) can miss the
> removal just a moment after the device came back again, errors will occur
> and no one will know why.
> >
> > So, at least for RMV event, we need the notification also in stopped state.
> 
> You sent the rte_eth_dev_is_removed() series. You're aware that PMDs
> implementing this call benefit from an automatic is_removed() check on all
> remaining devops whenever some error occur.
> In short, an application will get notified simply by getting dev_start() (or any
> other callback) return -EIO and not being able to use the device.
 
Yes, but between dev_stop to dev_start may not be any ethdev API calling.

> PMDs that do not implement is_removed() (or in addition to it) could also
> artificially trigger a RMV event after dev_start() is called. As long as the PMD
> remains quiet while the device is stopped, it's fine.

How can the PMD do it after dev_start()? Initiate alarm in dev start function to do it later And entering into race again?

I think it is not worth the effort and this patch is doing the right thing(also some other PMDs) .

Thanks.
> > > > > > > In my opinion it's not a fix, as in, it doesn't address an
> > > > > > > issue introduced by the mentioned patch whose behavior was
> correct.
> > > > > > >
> > > > > > > It's probably too late to change it now and it does address
> > > > > > > an issue seen with a use case involving this PMD, however I
> > > > > > > think the fail-safe PMD could as well poll using the
> > > > > > > recently-added
> > > > > > > rte_eth_dev_is_removed() when it's aware the underlying port
> > > > > > > is
> > > > > stopped instead of expecting interrupts.
> 
> --
> Adrien Mazarguil
> 6WIND

next prev parent reply	other threads:[~2018-01-31 17:07 UTC|newest]

Thread overview: 16+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
     [not found] <1516356548-15057-1-git-send-email-motih@mellanox.com>
2018-01-19 10:16 ` [dpdk-stable] [PATCH v2] net/mlx4: fix port start fail after device removal Moti Haimovsky
2018-01-28 13:54   ` Shahaf Shuler
2018-01-29  8:34   ` [dpdk-stable] [PATCH v3] net/mlx4: fix dev rmv not detected after port stop Moti Haimovsky
2018-01-29 10:54     ` Shahaf Shuler
2018-01-30  9:21       ` Thomas Monjalon
2018-01-30  9:39       ` Adrien Mazarguil
2018-01-30 20:37         ` Shahaf Shuler
2018-01-31  9:15           ` Adrien Mazarguil
2018-01-31  9:54             ` [dpdk-stable] [dpdk-dev] " Gaëtan Rivet
2018-01-31 10:08             ` Matan Azrad
2018-01-31 10:43               ` Adrien Mazarguil
2018-01-31 13:44                 ` Matan Azrad
2018-01-31 14:31                   ` Adrien Mazarguil
2018-01-31 17:07                     ` Matan Azrad [this message]
2018-02-02 19:53                       ` Adrien Mazarguil
2018-02-03 19:42                         ` Matan Azrad

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=AM4PR0501MB2657E999EF07D6A9C8E4E730D2FB0@AM4PR0501MB2657.eurprd05.prod.outlook.com \
    --to=matan@mellanox.com \
    --cc=adrien.mazarguil@6wind.com \
    --cc=dev@dpdk.org \
    --cc=motih@mellanox.com \
    --cc=shahafs@mellanox.com \
    --cc=stable@dpdk.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).