DPDK patches and discussions
* [dpdk-dev] [PATCH V4 0/9] hot plug failure handle mechanism
@ 2018-06-29 10:24 Jeff Guo
  2018-06-29 10:24 ` [dpdk-dev] [PATCH V4 1/9] bus: introduce hotplug failure handler Jeff Guo
                   ` (8 more replies)
  0 siblings, 9 replies; 16+ messages in thread
From: Jeff Guo @ 2018-06-29 10:24 UTC
  To: stephen, bruce.richardson, ferruh.yigit, konstantin.ananyev,
	gaetan.rivet, jingjing.wu, thomas, motih, matan,
	harry.van.haaren, qi.z.zhang, shaopeng.he, bernard.iremonger
  Cc: jblunck, shreyansh.jain, dev, jia.guo, helin.zhang

As we know, hot plug is an important feature. It is used either for fail-safe
handling of devices in the datacenter or for SR-IOV live migration in SDN/NFV,
and it brings higher flexibility and continuity to networking services in many
industry use cases. So let us look at what DPDK, as an important networking
framework, can do to help users implement a hot plug solution.

The framework already has a generic device event detection mechanism, the
failsafe driver, the bonding driver and hot plug/unplug APIs. Apps can use these
to develop their own hot plug solutions.

Consider the case of hot unplug, which can happen when a hardware device is
physically removed or when software disables it. The app needs to call the
ethdev API to detach the device, unplug it at the bus level and make further
access to it invalid. The problem is that removing the device from the software
lists is not instantaneous: if the data (fast) path still reads or writes the
device during that window, it will cause an MMIO error and crash the app.

We already have a fail-safe driver (or app) + RTE_ETH_EVENT_INTR_RMV + kernel
core driver solution to handle this, but there is still no fail-safe driver
(or app) + RTE_DEV_EVENT_REMOVE + PCIe PMD failure handling solution. So the
DPDK hot plug solution currently has a gap.

Also, the kernel only guarantees hot plug safety on the kernel side, not on the
user-mode side. First, it is hardly possible to have a gatekeeper for every MMIO
access across multiple PMDs. Second, no third-party tools such as udev/driverctl
specifically cover this hot plug failure processing. Third, having each app
implement this handling for multiple user-mode PMDs is not practical. This patch
set therefore proposes a generic hot plug failure handling mechanism in the DPDK
framework. It aims to guarantee that when a hot unplug occurs, the system will
not crash and the app will not break, and that user space can stop normally,
release any relevant resources, and then unplug the device cleanly at the bus
level.

The mechanism works as follows:

First, the app enables the device event monitor and registers a callback for hot
plug events before running the data path. Once a hot unplug occurs, the
mechanism detects the removal event and performs the corresponding failure
handling. To do that, the following functionality is introduced:
 - Add a new bus op "handle_hot_unplug" to handle bus read/write errors. It is
   bus-specific, so each kind of bus can implement its own logic.
 - Implement the PCI bus-specific op "pci_handle_hot_unplug". Based on the
   failure address, it remaps the memory of the corresponding unplugged device.
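A minimal sketch of how a generic helper could dispatch a removal to the bus-specific op described above. The struct sketch_bus, its owns_device/handle_hot_unplug members and bus_handle_hot_unplug() are hypothetical simplifications, not the real struct rte_bus or the code in this series; the PCI handler only prints and counts where a real one would remap the device's BARs.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical, simplified bus descriptor; NOT the real struct rte_bus. */
struct sketch_bus {
	const char *name;
	/* Returns 1 if this bus owns the named device. */
	int (*owns_device)(const char *dev_name);
	/* Bus-specific hot unplug failure handler (NULL if unsupported). */
	int (*handle_hot_unplug)(const char *dev_name);
};

static int pci_unplug_calls;

static int pci_owns_device(const char *dev_name)
{
	/* Assume PCI names look like "0000:af:00.0" (domain:bus:dev.fn). */
	return strlen(dev_name) == 12 && dev_name[4] == ':' &&
	       dev_name[7] == ':' && dev_name[10] == '.';
}

static int pci_handle_hot_unplug(const char *dev_name)
{
	/* A real PCI handler would remap the device's BARs here. */
	printf("pci: handling hot unplug of %s\n", dev_name);
	pci_unplug_calls++;
	return 0;
}

static int vdev_owns_device(const char *dev_name)
{
	return strncmp(dev_name, "net_", 4) == 0;
}

static struct sketch_bus buses[] = {
	{ "pci", pci_owns_device, pci_handle_hot_unplug },
	{ "vdev", vdev_owns_device, NULL }, /* no hot unplug support */
};

/* Generic entry point: find the owning bus and call its handler. */
int bus_handle_hot_unplug(const char *dev_name)
{
	size_t i;

	assert(dev_name != NULL);
	for (i = 0; i < sizeof(buses) / sizeof(buses[0]); i++) {
		if (!buses[i].owns_device(dev_name))
			continue;
		if (buses[i].handle_hot_unplug == NULL)
			return -1; /* owning bus found, but it cannot handle unplug */
		return buses[i].handle_hot_unplug(dev_name);
	}
	return -1; /* no bus owns this device */
}
```

Keeping the op per bus means the generic code never needs to know how a PCI BAR (or any other bus resource) is mapped; a bus without the op simply reports failure.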

For the data path, or other unexpected accesses from the control path, when a
hot unplug occurs:
 - Implement a new SIGBUS handler, registered when device event monitoring is
   started. The handler is per process. Since a SIGBUS caused by a memory
   access is delivered to the thread that made the access, either the control
   path thread or the data path thread may receive it, but both end up in the
   common SIGBUS handler. Once an MMIO SIGBUS error is raised, the handler
   triggers the hot unplug operation above: it checks whether the SIGBUS was
   caused by the hot unplug. If not, it reports the exception as the original
   SIGBUS handler would; if so, it remaps the memory.
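The following sketch, assuming Linux, shows the SIGBUS technique on its own: a SA_SIGINFO handler checks whether si_addr falls inside a tracked device mapping and, if so, maps anonymous zero pages over it with MAP_FIXED so the faulting access can be retried. A truncated MAP_SHARED file stands in for an unplugged device's BAR; bar_addr, sigbus_handler(), install_sigbus_handler() and simulate_unplug_read() are hypothetical names, not code from this series. Note that mmap() is not on the POSIX list of async-signal-safe functions, so calling it from a handler relies on Linux behaviour.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <signal.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

/* Hypothetical stand-in for a mapped device BAR. */
static void *volatile bar_addr;
static volatile size_t bar_len;
static volatile sig_atomic_t bar_remapped;

static void sigbus_handler(int sig, siginfo_t *info, void *ctx)
{
	char *fault = info->si_addr;
	char *start = bar_addr;

	(void)ctx;
	if (start != NULL && fault >= start && fault < start + bar_len) {
		/* Hot unplug case: put zero-filled anonymous pages at the same
		 * address so the faulting access can be re-executed safely. */
		void *p = mmap(start, bar_len, PROT_READ | PROT_WRITE,
			       MAP_PRIVATE | MAP_ANONYMOUS | MAP_FIXED, -1, 0);
		if (p != MAP_FAILED) {
			bar_remapped = 1;
			return; /* the faulting load is retried on return */
		}
	}
	/* Not caused by the unplugged device: take the default action. */
	signal(sig, SIG_DFL);
	raise(sig);
}

int install_sigbus_handler(void)
{
	struct sigaction sa;

	memset(&sa, 0, sizeof(sa));
	sa.sa_sigaction = sigbus_handler;
	sa.sa_flags = SA_SIGINFO;
	sigemptyset(&sa.sa_mask);
	return sigaction(SIGBUS, &sa, NULL);
}

/* Map a temp file as the "BAR", then truncate it so the next access raises
 * SIGBUS as an access to a removed device would. Returns the value read
 * after the "unplug", or UINT32_MAX on setup failure. */
uint32_t simulate_unplug_read(void)
{
	char path[] = "/tmp/fake_barXXXXXX";
	volatile uint32_t *reg;
	uint32_t val;
	long pg = sysconf(_SC_PAGESIZE);
	int fd = mkstemp(path);
	void *p;

	assert(pg > 0);
	if (fd < 0)
		return UINT32_MAX;
	unlink(path); /* the file lives until fd is closed */
	if (ftruncate(fd, (off_t)pg) != 0)
		goto fail;
	p = mmap(NULL, (size_t)pg, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
	if (p == MAP_FAILED)
		goto fail;
	bar_len = (size_t)pg;
	bar_addr = p;
	reg = p;
	reg[0] = 0xdeadbeef;       /* device "register" is alive */
	if (ftruncate(fd, 0) != 0) /* device "removed" */
		goto fail;
	val = reg[0];              /* SIGBUS -> handler remaps -> reads 0 */
	close(fd);
	return val;
fail:
	close(fd);
	return UINT32_MAX;
}
```

After install_sigbus_handler(), simulate_unplug_read() returns 0 instead of crashing, because the retried read hits the zero-filled replacement page. A SIGBUS whose address is outside the tracked range still takes the default action, matching the "if not, report it as the original handler would" branch.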

For the control path and the igb_uio release:
 - When a device is hot unplugged, the kernel releases the device resources on
   the kernel side: for example, the sysfs files disappear and the IRQ is
   released. If igb_uio then still tries to release these resources, it will
   cause a kernel crash.
   On the other hand, some things, such as disabling the interrupt, are not
   handled automatically on the kernel side. If they are not handled, the stale
   state will prevent the interrupt resource from being used by other devices.
   So the igb_uio driver has to check the hot plug status and take the
   corresponding action.
   This patch proposes adding a rte_udev_state to the rte_uio_pci_dev structure
   of the igb_uio kernel driver, which records the state of the uio device, such
   as probed/opened/released/removed/unplug. When an unexpected removal caused
   by hot unplug is detected, the driver disables the interrupt resources
   accordingly, and skips the parts of the release that the kernel has already
   handled, to avoid double free or null pointer kernel crashes.

The mechanism can be used by the fail-safe driver and by apps that want to use a
hot plug solution. At this stage, only testpmd is used as a reference to show
how to use the mechanism:
 - Enable device event monitor->device unplug->failure handle->stop forwarding->
   stop port->close port->detach port.

This process will not break the running app/fail-safe driver, and will not
affect other unrelated devices. The app can then plug the device back in and
restart the data path as follows:
 - Device plug in->bind igb_uio driver->attach device->start port->
   start forwarding.
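The testpmd teardown and re-plug sequences above can be modelled as a small port state machine in which each step only succeeds from the state the previous step leaves behind. This is an illustrative sketch: enum port_state, step(), handle_unplug() and handle_replug() are hypothetical and do not correspond to testpmd functions; a real app would call the ethdev and hotplug APIs at each step.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical port lifecycle mirroring the testpmd steps. */
enum port_state { P_FORWARDING, P_STARTED, P_STOPPED, P_CLOSED, P_DETACHED };

struct port {
	enum port_state state;
};

/* Move from `from` to `to`, refusing if the port is in any other state. */
static int step(struct port *p, enum port_state from, enum port_state to,
		const char *what)
{
	if (p->state != from) {
		printf("cannot %s: wrong state %d\n", what, (int)p->state);
		return -1;
	}
	printf("%s\n", what);
	p->state = to;
	return 0;
}

/* Teardown after the removal event has been handled (memory remapped). */
int handle_unplug(struct port *p)
{
	assert(p != NULL);
	if (step(p, P_FORWARDING, P_STARTED, "stop forwarding") ||
	    step(p, P_STARTED, P_STOPPED, "stop port") ||
	    step(p, P_STOPPED, P_CLOSED, "close port") ||
	    step(p, P_CLOSED, P_DETACHED, "detach port"))
		return -1;
	return 0;
}

/* Re-plug once the device is bound to igb_uio again; an attached port
 * starts out stopped in this model. */
int handle_replug(struct port *p)
{
	assert(p != NULL);
	if (step(p, P_DETACHED, P_STOPPED, "attach device") ||
	    step(p, P_STOPPED, P_STARTED, "start port") ||
	    step(p, P_STARTED, P_FORWARDING, "start forwarding"))
		return -1;
	return 0;
}
```

Encoding the order as preconditions makes an out-of-order call (for example, running the teardown twice) fail loudly instead of touching a port that is already gone.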

patchset history:
v4->v3:
split patches to be smaller and clearer
change testpmd to use the new parameter "--hotplug-mode"
to distinguish EAL hotplug from ethdev hotplug

v3->v2:
change bus ops name to bus_hotplug_handler.
add new API and bus op bus_signal_handler
distinguish between handling a generic SIGBUS and a hotplug SIGBUS

v2->v1(v21):
refine some docs and commit logs
fix igb_uio kernel issue for control path failure
rebase testpmd code

Since the hot plug solution has been discussed over several rounds in public,
its scope has changed and the patch set has been split many times. Following the
recent RFC and feature design, this patch set focuses only on the hot unplug
failure handler. So, to keep this topic clear and focused, the previous patch
sets are summarized in the history as "v1(v21)", and the v2 here goes ahead for
further tracking.

"v1(v21)" == v21, as below:
v21->v20:
split function in hot unplug ops
sync failure handle to fix multiple process issue
fix attach port issue for multiple devices case
combine rmv callback functions into only one

v20->v19:
clean up the code
refine the remap logic for multiple devices
remove the auto binding

v19->v18:
note the limitation of multiple hotplug, fix some typos, squeeze patches.

v18->v15:
add document, add signal bus handler, refine the code to make it clearer.

For the prior patch history, please check the patch set "add device event monitor framework".

Jeff Guo (9):
  bus: introduce hotplug failure handler
  bus/pci: implement hotplug handler operation
  bus: introduce sigbus handler
  bus/pci: implement sigbus handler operation
  bus: add helper to handle sigbus
  eal: add failure handle mechanism for hot plug
  igb_uio: fix uio release issue when hot unplug
  app/testpmd: show example to handle hot unplug
  app/testpmd: enable device hotplug monitoring

 app/test-pmd/parameters.c               | 20 ++++++--
 app/test-pmd/testpmd.c                  | 31 +++++++-----
 app/test-pmd/testpmd.h                  |  8 ++-
 doc/guides/testpmd_app_ug/run_app.rst   | 10 +++-
 drivers/bus/pci/pci_common.c            | 78 +++++++++++++++++++++++++++++
 drivers/bus/pci/pci_common_uio.c        | 33 +++++++++++++
 drivers/bus/pci/private.h               | 12 +++++
 kernel/linux/igb_uio/igb_uio.c          | 50 +++++++++++++++++--
 lib/librte_eal/common/eal_common_bus.c  | 34 ++++++++++++-
 lib/librte_eal/common/eal_private.h     | 11 +++++
 lib/librte_eal/common/include/rte_bus.h | 31 ++++++++++++
 lib/librte_eal/linuxapp/eal/eal_dev.c   | 88 ++++++++++++++++++++++++++++++++-
 12 files changed, 381 insertions(+), 25 deletions(-)

-- 
2.7.4

* [dpdk-dev] [PATCH v3 0/2] add uevent api for hot plug
@ 2017-06-29  4:37 Jeff Guo
  2018-06-29 10:30 ` [dpdk-dev] [PATCH V4 0/9] hot plug failure handle mechanism Jeff Guo
  0 siblings, 1 reply; 16+ messages in thread
From: Jeff Guo @ 2017-06-29  4:37 UTC
  To: helin.zhang, jingjing.wu; +Cc: dev, jia.guo

From: "Guo, Jia" <jia.guo@intel.com>

This patch set aims to add a variable "uevent_fd" to the structure
"rte_intr_handle" to enable kernel object uevent monitoring, and to add some
uevent APIs to the rte eal interrupt code, namely "rte_uevent_connect" and
"rte_uevent_get". The patches use i40e as an example: the driver can use these
APIs to monitor and read out uevents, and then handle them accordingly, such as
by detaching or attaching the device.
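A minimal sketch of kernel object uevent monitoring, assuming Linux: open a NETLINK_KOBJECT_UEVENT socket bound to kernel multicast group 1, then parse each message, which consists of an "action@devpath" header followed by NUL-separated KEY=VALUE pairs. uevent_open(), uevent_parse() and struct uevent are hypothetical illustrations, not the rte_uevent_connect/rte_uevent_get API added by this patch set.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <stdio.h>
#include <string.h>
#include <sys/socket.h>
#include <linux/netlink.h>
#include <unistd.h>

/* Open a netlink socket subscribed to kernel uevents (group 1). */
int uevent_open(void)
{
	struct sockaddr_nl addr;
	int fd = socket(AF_NETLINK, SOCK_RAW | SOCK_CLOEXEC,
			NETLINK_KOBJECT_UEVENT);

	if (fd < 0)
		return -1;
	memset(&addr, 0, sizeof(addr));
	addr.nl_family = AF_NETLINK;
	addr.nl_groups = 1; /* kernel uevent multicast group */
	if (bind(fd, (struct sockaddr *)&addr, sizeof(addr)) < 0) {
		close(fd);
		return -1;
	}
	return fd;
}

struct uevent {
	char action[16];
	char subsystem[32];
	char pci_slot[32];
};

/* If token s (length n) starts with key, copy the value into out. */
static void take(const char *s, size_t n, const char *key,
		 char *out, size_t outsz)
{
	size_t k = strlen(key);

	if (n >= k && memcmp(s, key, k) == 0)
		snprintf(out, outsz, "%.*s", (int)(n - k), s + k);
}

/* Parse "action@devpath\0KEY=VALUE\0..."; returns 0 if ACTION was found. */
int uevent_parse(const char *buf, size_t len, struct uevent *ev)
{
	size_t i = 0;

	assert(buf != NULL && ev != NULL);
	memset(ev, 0, sizeof(*ev));
	while (i < len) {
		const char *s = buf + i;
		size_t n = strnlen(s, len - i); /* never read past len */

		take(s, n, "ACTION=", ev->action, sizeof(ev->action));
		take(s, n, "SUBSYSTEM=", ev->subsystem, sizeof(ev->subsystem));
		take(s, n, "PCI_SLOT_NAME=", ev->pci_slot, sizeof(ev->pci_slot));
		i += n + 1; /* skip the token and its NUL separator */
	}
	return ev->action[0] != '\0' ? 0 : -1;
}
```

A caller would recv() into a buffer from the fd returned by uevent_open() and pass the received length to uevent_parse(); an action of "remove" with a PCI_SLOT_NAME that belongs to a probed port is the cue to detach that port.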

Guo, Jia (2):
  eal: add uevent api for hot plug
  net/i40e: add hot plug monitor in i40e

 drivers/net/i40e/i40e_ethdev.c                     |  19 +++
 lib/librte_eal/common/eal_common_pci_uio.c         |   6 +-
 lib/librte_eal/linuxapp/eal/eal_interrupts.c       | 136 ++++++++++++++++++++-
 lib/librte_eal/linuxapp/eal/eal_pci_uio.c          |   6 +
 .../linuxapp/eal/include/exec-env/rte_interrupts.h |  37 ++++++
 5 files changed, 201 insertions(+), 3 deletions(-)

-- 
1.8.3.1



Thread overview: 16+ messages
2018-06-29 10:24 [dpdk-dev] [PATCH V4 0/9] hot plug failure handle mechanism Jeff Guo
2018-06-29 10:24 ` [dpdk-dev] [PATCH V4 1/9] bus: introduce hotplug failure handler Jeff Guo
2018-06-29 10:24 ` [dpdk-dev] [PATCH V4 2/9] bus/pci: implement hotplug handler operation Jeff Guo
2018-06-29 10:24 ` [dpdk-dev] [PATCH V4 3/9] bus: introduce sigbus handler Jeff Guo
2018-06-29 10:24 ` [dpdk-dev] [PATCH V4 4/9] bus/pci: implement sigbus handler operation Jeff Guo
2018-06-29 10:24 ` [dpdk-dev] [PATCH V4 5/9] bus: add helper to handle sigbus Jeff Guo
2018-06-29 10:24 ` [dpdk-dev] [PATCH V4 6/9] eal: add failure handle mechanism for hot plug Jeff Guo
2018-06-29 10:24 ` [dpdk-dev] [PATCH V4 7/9] igb_uio: fix uio release issue when hot unplug Jeff Guo
2018-06-29 10:24 ` [dpdk-dev] [PATCH V4 8/9] app/testpmd: show example to handle " Jeff Guo
2018-06-29 10:24 ` [dpdk-dev] [PATCH V4 9/9] app/testpmd: enable device hotplug monitoring Jeff Guo
  -- strict thread matches above, loose matches on Subject: below --
2017-06-29  4:37 [dpdk-dev] [PATCH v3 0/2] add uevent api for hot plug Jeff Guo
2018-06-29 10:30 ` [dpdk-dev] [PATCH V4 0/9] hot plug failure handle mechanism Jeff Guo
2018-06-29 10:30   ` [dpdk-dev] [PATCH V4 5/9] bus: add helper to handle sigbus Jeff Guo
2018-06-29 10:51     ` Ananyev, Konstantin
2018-06-29 11:23       ` Guo, Jia
2018-06-29 12:21         ` Ananyev, Konstantin
2018-06-29 12:52           ` Gaëtan Rivet
2018-07-03 11:24             ` Guo, Jia
