DPDK patches and discussions
From: "Ananyev, Konstantin" <konstantin.ananyev@intel.com>
To: "Richardson, Bruce" <bruce.richardson@intel.com>,
	"Guo, Jia" <jia.guo@intel.com>,
	"techboard@dpdk.org" <techboard@dpdk.org>
Cc: "dev@dpdk.org" <dev@dpdk.org>,
	"stephen@networkplumber.org" <stephen@networkplumber.org>,
	"Yigit, Ferruh" <ferruh.yigit@intel.com>,
	"gaetan.rivet@6wind.com" <gaetan.rivet@6wind.com>,
	"Wu, Jingjing" <jingjing.wu@intel.com>,
	"thomas@monjalon.net" <thomas@monjalon.net>,
	"motih@mellanox.com" <motih@mellanox.com>,
	"matan@mellanox.com" <matan@mellanox.com>,
	"Van Haaren, Harry" <harry.van.haaren@intel.com>,
	"Zhang, Qi Z" <qi.z.zhang@intel.com>,
	"Zhang, Helin" <helin.zhang@intel.com>,
	 "jblunck@infradead.org" <jblunck@infradead.org>,
	"shreyansh.jain@nxp.com" <shreyansh.jain@nxp.com>
Subject: Re: [dpdk-dev] [RFC] hot plug failure handle mechanism
Date: Wed, 6 Jun 2018 13:11:59 +0000	[thread overview]
Message-ID: <2601191342CEEE43887BDE71AB977258C0C38AF0@irsmsx105.ger.corp.intel.com> (raw)
In-Reply-To: <20180606125451.GA2960@bricha3-MOBL.ger.corp.intel.com>



> -----Original Message-----
> From: Richardson, Bruce
> Sent: Wednesday, June 6, 2018 1:55 PM
> To: Guo, Jia <jia.guo@intel.com>; techboard@dpdk.org
> Cc: dev@dpdk.org; Ananyev, Konstantin <konstantin.ananyev@intel.com>; stephen@networkplumber.org; Yigit, Ferruh
> <ferruh.yigit@intel.com>; gaetan.rivet@6wind.com; Wu, Jingjing <jingjing.wu@intel.com>; thomas@monjalon.net;
> motih@mellanox.com; matan@mellanox.com; Van Haaren, Harry <harry.van.haaren@intel.com>; Zhang, Qi Z
> <qi.z.zhang@intel.com>; Zhang, Helin <helin.zhang@intel.com>; jblunck@infradead.org; shreyansh.jain@nxp.com
> Subject: Re: [dpdk-dev] [RFC] hot plug failure handle mechanism
> 
> +Tech-board as I think that this should have more input at the design stage
> ahead of any code patches being pushed.
> 
> On Mon, Jun 04, 2018 at 09:56:10AM +0800, Guo, Jia wrote:
> > Hi Bruce,
> >
> >
> > On 5/29/2018 7:20 PM, Bruce Richardson wrote:
> > > On Thu, May 24, 2018 at 07:55:43AM +0100, Guo, Jia wrote:
> > > <snip>
> > > >     The hot plug failure handling mechanism would work as follows:
> > > >
> > > >     1. Add a new bus ops "handle_hot_unplug" to the bus to handle bus
> > > >        read/write errors; it is bus-specific, and each kind of bus can
> > > >        implement its own logic.
> > > >
> > > >     2. Implement the PCI-bus-specific ops "pci_handle_hot_unplug": in
> > > >        that function, based on the failure address, remap the memory
> > > >        belonging to the unplugged device.
> > > >
> > > >     3. Implement a new sigbus handler and register it when device
> > > >        event monitoring is started. Once an MMIO sigbus error occurs,
> > > >        it triggers the hot plug failure handling above, so an app busy
> > > >        with packet processing is not broken and does not crash, and
> > > >        can go on with cleanup, fail-safe, or other work.
> > > >
> > > >     4. Introduce a testpmd example to show the whole procedure:
> > > >        device unplug -> failure handling -> stop forwarding -> stop
> > > >        port -> close port -> detach port.
> > > >
> > > Hi Jeff,
> > >
> > > so if I understand this correctly, the proposal is that we need two
> > > parallel solutions to handle safe removal of a device:
> > >
> > > 1. We need a solution to support unplugging of the device at the bus
> > >     level, i.e. remove the device from the list of devices and make
> > >     access to that device invalid.
> > > 2. Since the removal of the device from the software lists is not going
> > >     to be instantaneous, we need a mechanism to handle any accesses to
> > >     the device from the data path until such time as the removal is
> > >     complete. To support that, you propose to add a sigbus handler which
> > >     will automatically replace any MMIO BAR mappings with some other
> > >     memory that is ok to access - presumably zero memory or similar.
> > >
> > > Is this understanding correct?
> >
> > I think you are correct about that.
> >
> > > Point #2 seems reasonably clear to me, but for #1, presumably the trigger
> > > to the bus needs to come from the kernel. What is planned to be used there?
> >
> > About point #1, I should clarify that we will use the device event
> > monitoring mechanism to detect the hot unplug event.
> > The monitor is enabled by the app (or fail-safe driver), which also
> > registers the event callback. Once hot unplug is detected, the user's
> > callback is triggered so the app (fail-safe driver) learns of the
> > event and can manage the process: it will call APIs to stop the
> > device and detach it from the bus.
> 
> Ok. If there is no failsafe driver, and the app does not set up a handler,
> does nothing happen when we get a removal event? Will the app just crash?
> 
> >
> > > You also talk about using testpmd as a reference for this, but you don't
> > > explain how an application can be notified of a device removal, or even why
> > > that is necessary. Since all applications should now be using the proper
> > > macros when iterating device lists, and not just assuming devices 0-N are
> > > valid, what changes would you see a normal app having to make to be
> > > hotplug-safe?
> >
> > Either an app or the fail-safe driver could use this mechanism, but
> > at this stage I will first use testpmd as a reference.
> > As in the reply above, testpmd should enable the device event
> > mechanism to monitor device removal and register a callback.
> > The device BDF list is managed by the bus, and the hotplug failure
> > handler is processed by the EAL layer; the app would then be
> > hotplug-safe.
> >
> > Is there anything I missed clarifying? Please shout, and I will
> > provide more detail later.
> 
> This is becoming clearer now, thanks. Just the one question above I have at
> this point.
> Given how long-running this issue of hotplug is, I'm hoping others on the
> technical board can also review this proposal.

I looked at the actual code a bit for 18.05.
It seems OK to me in general, though I provided a few comments regarding
particular implementation details.
Konstantin



Thread overview: 10+ messages
2018-05-24  6:55 Guo, Jia
2018-05-24 14:57 ` Matan Azrad
2018-05-25  7:49   ` Guo, Jia
2018-05-29 11:20 ` Bruce Richardson
2018-06-04  1:56   ` Guo, Jia
2018-06-06 12:54     ` Bruce Richardson
2018-06-06 13:11       ` Ananyev, Konstantin [this message]
2018-06-07  2:14       ` Guo, Jia
2018-06-14 21:37         ` Thomas Monjalon
2018-06-15  8:31           ` Guo, Jia
