From: Jeff Guo <jia.guo@intel.com>
To: stephen@networkplumber.org, bruce.richardson@intel.com,
ferruh.yigit@intel.com, konstantin.ananyev@intel.com,
gaetan.rivet@6wind.com, jingjing.wu@intel.com,
thomas@monjalon.net, motih@mellanox.com, matan@mellanox.com,
harry.van.haaren@intel.com, qi.z.zhang@intel.com,
shaopeng.he@intel.com, bernard.iremonger@intel.com
Cc: jblunck@infradead.org, shreyansh.jain@nxp.com, dev@dpdk.org,
jia.guo@intel.com, helin.zhang@intel.com
Subject: [dpdk-dev] [PATCH V4 0/9] hot plug failure handle mechanism
Date: Fri, 29 Jun 2018 18:24:22 +0800
Message-ID: <1530267871-7161-1-git-send-email-jia.guo@intel.com>
Hot plug is an important feature, whether it is used for fail-safe handling of
datacenter devices or for SR-IOV live migration in SDN/NFV. It brings greater
flexibility and continuity to networking services across many industry use
cases. So let us look at what DPDK, as an important networking framework, can
do to help users implement a hot plug solution.
DPDK already has a general device event detection mechanism, the failsafe
driver, the bonding driver and hot plug/unplug APIs in the framework, and apps
can use these to develop their own hot plug solution.
Consider the hot unplug case. It can happen when a hardware device is removed
physically, or when software disables it. The app needs to call the ether dev
API to detach the device, which unplugs the device at the bus level and makes
access to the device invalid. The problem is that removing the device from the
software lists is not instantaneous. If the data (fast) path still reads or
writes the device during that window, it causes an MMIO error and the app
crashes.
We already have the fail-safe driver (or app) + RTE_ETH_EVENT_INTR_RMV +
kernel core driver solution to handle this, but there is still no failsafe
driver (or app) + RTE_DEV_EVENT_REMOVE + PCIe PMD driver failure handling
solution. So there is a gap in the DPDK hot plug solution right now.
Also, the kernel only guarantees hot plug on the kernel side, not on the user
mode side. First, we can hardly have a gatekeeper for every MMIO access across
multiple PMD drivers. Second, no third-party tools such as udev/driverctl
specifically cover this hot plug failure processing. Third, it is not clear
that each app can feasibly implement this itself for multiple user mode PMD
drivers. This patch set therefore proposes a general hot plug failure handling
mechanism in the DPDK framework. It aims to guarantee that, when a hot unplug
occurs, the system does not crash and the app is not interrupted, so that user
space can stop normally, release any relevant resources, and then cleanly
unplug the device at the bus level.
The mechanism works as follows:
First, the app enables the device event monitor and registers the hot plug
event callback before running the data path. Once a hot unplug occurs, the
mechanism detects the removal event and handles the failure accordingly. To do
that, the following functionality is introduced.
- Add a new bus op "handle_hot_unplug" to handle bus read/write errors. It is
bus-specific, and each kind of bus can implement its own logic.
- Implement the PCI bus specific op "pci_handle_hot_unplug". Based on the
failure address, it remaps memory for the corresponding unplugged device.
For the data path, or other unexpected access from the control path, when hot
unplug occurs:
- Implement a new sigbus handler, which is registered when device event
monitoring starts. The handler is per process. Because of how signals are
delivered, either a control path thread or a data path thread may receive the
sigbus error, but both end up in the common sigbus handler. Once an MMIO
sigbus error is raised, it triggers the hot unplug operation above. The
handler checks whether the sigbus was caused by a hot unplug. If not, it is
reported as an exception, as the original sigbus handler would do. If so, the
memory is remapped.
For the control path and the igb_uio release:
- When a device is hot unplugged, the kernel releases the device resources on
the kernel side: for example, the fd sys file disappears and the irq is
released. If the igb_uio driver then still tries to release these resources,
it causes a kernel crash.
On the other hand, some things, such as disabling interrupts, are not done
automatically on the kernel side. If this is not handled, the stale state
affects the interrupt resources used by other devices.
So the igb_uio driver has to check the hot plug status and process each case
accordingly.
This patch proposes adding rte_udev_state to rte_uio_pci_dev in the igb_uio
kernel driver. It records the state of the uio device, such as
probed/opened/released/removed/unplug. When an unexpected removal caused by
hot unplug is detected, the driver disables the interrupt resource
accordingly, and skips the parts of the release that the kernel has already
handled, to avoid a double free or a NULL pointer kernel crash.
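
The state tracking described above can be modelled in user space as a small
state machine. This is a sketch under assumptions, not the igb_uio code: the
enum values, struct uio_dev_model and the uio_model_*() functions are
hypothetical, and the "kernel already released the irq" effect is simulated by
a flag.

```c
#include <stdbool.h>

/* Hypothetical states mirroring probed/opened/released/removed/unplug. */
enum udev_state_model {
    UDEV_PROBED,
    UDEV_OPENED,
    UDEV_RELEASED,
    UDEV_REMOVED,
    UDEV_UNPLUG
};

struct uio_dev_model {
    enum udev_state_model state;
    bool irq_enabled;   /* interrupt still enabled from the driver's view */
    bool irq_allocated; /* irq resource still owned by the driver */
};

/* User space opens the uio device: allocate and enable the interrupt. */
void uio_model_open(struct uio_dev_model *d)
{
    if (d->state != UDEV_PROBED && d->state != UDEV_RELEASED)
        return;
    d->irq_allocated = true;
    d->irq_enabled = true;
    d->state = UDEV_OPENED;
}

/* The device goes away. If it was still opened this is an unexpected
 * removal, and the kernel is assumed to have released the irq itself. */
void uio_model_remove(struct uio_dev_model *d)
{
    if (d->state == UDEV_OPENED) {
        d->irq_allocated = false;
        d->state = UDEV_UNPLUG;
    } else {
        d->state = UDEV_REMOVED;
    }
}

/* User space closes the device. Returns how many resources the driver
 * freed itself, so a second free of the same irq can never be counted. */
int uio_model_release(struct uio_dev_model *d)
{
    if (d->state == UDEV_UNPLUG) {
        d->irq_enabled = false; /* not done by the kernel, so do it here */
        d->state = UDEV_REMOVED; /* skip the free: already handled */
        return 0;
    }
    if (d->state != UDEV_OPENED)
        return 0;
    d->irq_enabled = false;
    d->irq_allocated = false;
    d->state = UDEV_RELEASED;
    return 1;
}
```

The point of recording the state is that the release path can tell a normal
close (free everything) from a close after hot unplug (only disable the
interrupt, free nothing).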
The mechanism can be used by the fail-safe driver and by apps that want a hot
plug solution. At this stage, only testpmd is used as a reference to show how
to use the mechanism.
- Enable device event monitor->device unplug->failure handle->stop forwarding->
stop port->close port->detach port.
This process does not break the running app/fail-safe driver, and does not
break other, unrelated devices. The app can then plug the device in and
restart the data path as follows.
- Device plug in->bind igb_uio driver->attach device->start port->
start forwarding.
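
The two sequences above imply a strict ordering of port operations. The sketch
below encodes that ordering as a transition table so that an out-of-order step
is rejected. It is only an illustration: the enum names and port_apply*()
functions are hypothetical and are not testpmd or ethdev APIs.

```c
#include <stddef.h>

enum port_state {
    PORT_DETACHED, PORT_STOPPED, PORT_STARTED, PORT_FORWARDING, PORT_CLOSED
};

enum port_op {
    OP_ATTACH, OP_START_PORT, OP_START_FWD,
    OP_STOP_FWD, OP_STOP_PORT, OP_CLOSE_PORT, OP_DETACH
};

struct transition {
    enum port_op op;
    enum port_state from, to;
};

/* Each operation is only legal from exactly one state. */
static const struct transition transitions[] = {
    { OP_ATTACH,     PORT_DETACHED,   PORT_STOPPED },
    { OP_START_PORT, PORT_STOPPED,    PORT_STARTED },
    { OP_START_FWD,  PORT_STARTED,    PORT_FORWARDING },
    { OP_STOP_FWD,   PORT_FORWARDING, PORT_STARTED },
    { OP_STOP_PORT,  PORT_STARTED,    PORT_STOPPED },
    { OP_CLOSE_PORT, PORT_STOPPED,    PORT_CLOSED },
    { OP_DETACH,     PORT_CLOSED,     PORT_DETACHED },
};

/* Apply one operation; returns 0 on success, -1 if it is out of order. */
int port_apply(enum port_state *s, enum port_op op)
{
    size_t n = sizeof(transitions) / sizeof(transitions[0]);

    for (size_t i = 0; i < n; i++) {
        if (transitions[i].op == op && transitions[i].from == *s) {
            *s = transitions[i].to;
            return 0;
        }
    }
    return -1;
}

/* Apply a sequence; returns how many operations succeeded before the
 * first out-of-order one (equal to n when the whole sequence is valid). */
size_t port_apply_all(enum port_state *s, const enum port_op *ops, size_t n)
{
    size_t done = 0;

    while (done < n && port_apply(s, ops[done]) == 0)
        done++;
    return done;
}
```

With this model, the unplug path is stop forwarding, stop port, close port,
detach port, and the re-plug path is attach, start port, start forwarding.
Trying to detach a port that is still forwarding is refused.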
patchset history:
v4->v3:
split patches to be small and clear
change to use new parameter "--hotplug-mode" in testpmd
to identify the eal hotplug and ethdev hotplug
v3->v2:
change bus ops name to bus_hotplug_handler.
add new API and bus ops of bus_signal_handler
distinguish handling of generic sigbus and hotplug sigbus
v2->v1(v21):
refine some doc and commit log
fix igb uio kernel issue for control path failure
rebase testpmd code
Since the hot plug solution has been discussed several times in public, the
scope has changed and the patch set has been split many times. Following the
recent RFC and feature design, this patch set focuses only on the hot unplug
failure handler. To keep this topic clear and focused, the previous patch sets
are summarized in the history as "v1(v21)", and the v2 here continues from
there for further tracking.
"v1(v21)" == v21 as below:
v21->v20:
split function in hot unplug ops
sync failure handle to fix multiple process issue
fix attach port issue for multiple devices case.
combine rmv callback function to be only one.
v20->v19:
clean the code
refine the remap logic for multiple device.
remove the auto binding
v19->v18:
note for limitation of multiple hotplug, fix some typos, squeeze patch.
v18->v15:
add document, add signal bus handler, refine the code to be more clear.
for the prior patch history, please check the patch set "add device event monitor framework"
Jeff Guo (9):
bus: introduce hotplug failure handler
bus/pci: implement hotplug handler operation
bus: introduce sigbus handler
bus/pci: implement sigbus handler operation
bus: add helper to handle sigbus
eal: add failure handle mechanism for hot plug
igb_uio: fix uio release issue when hot unplug
app/testpmd: show example to handle hot unplug
app/testpmd: enable device hotplug monitoring
app/test-pmd/parameters.c | 20 ++++++--
app/test-pmd/testpmd.c | 31 +++++++-----
app/test-pmd/testpmd.h | 8 ++-
doc/guides/testpmd_app_ug/run_app.rst | 10 +++-
drivers/bus/pci/pci_common.c | 78 +++++++++++++++++++++++++++++
drivers/bus/pci/pci_common_uio.c | 33 +++++++++++++
drivers/bus/pci/private.h | 12 +++++
kernel/linux/igb_uio/igb_uio.c | 50 +++++++++++++++++--
lib/librte_eal/common/eal_common_bus.c | 34 ++++++++++++-
lib/librte_eal/common/eal_private.h | 11 +++++
lib/librte_eal/common/include/rte_bus.h | 31 ++++++++++++
lib/librte_eal/linuxapp/eal/eal_dev.c | 88 ++++++++++++++++++++++++++++++++-
12 files changed, 381 insertions(+), 25 deletions(-)
--
2.7.4