From: Matan Azrad <matan@mellanox.com>
To: Gaetan Rivet <gaetan.rivet@6wind.com>
Cc: dev@dpdk.org, stable@dpdk.org
Subject: [dpdk-dev] [PATCH v5 3/3] net/failsafe: fix calling device during RMV events
Date: Thu, 8 Feb 2018 16:34:13 +0000 [thread overview]
Message-ID: <1518107653-15466-4-git-send-email-matan@mellanox.com> (raw)
In-Reply-To: <1518107653-15466-1-git-send-email-matan@mellanox.com>
Following are the failure steps:
1. The physical device is removed due to change in one of PF parameters
(e.g. MTU) 2. The interrupt thread flags the device 3. Within 2 seconds
Interrupt thread initializes the actual device removal, then every 2
seconds it tries to re-sync (plug in) the device. The trials fail as
long as VF parameter mismatches the PF parameter.
4. A control thread initiates a control operation on failsafe which
initiates this operation on the device.
5. A race condition occurs between the control thread and interrupt
thread when accessing the device data structures.
This patch mitigates the race condition in step 5.
Fixes: a46f8d584eb8 ("net/failsafe: add fail-safe PMD")
Cc: stable@dpdk.org
Signed-off-by: Matan Azrad <matan@mellanox.com>
---
drivers/net/failsafe/failsafe.c | 2 +-
drivers/net/failsafe/failsafe_eal.c | 2 +-
drivers/net/failsafe/failsafe_ether.c | 2 +-
drivers/net/failsafe/failsafe_ops.c | 26 +++++++++++++++++--------
drivers/net/failsafe/failsafe_private.h | 34 ++++++++++++++++++++++++++-------
5 files changed, 48 insertions(+), 18 deletions(-)
diff --git a/drivers/net/failsafe/failsafe.c b/drivers/net/failsafe/failsafe.c
index 7b2cdbb..6cdefd0 100644
--- a/drivers/net/failsafe/failsafe.c
+++ b/drivers/net/failsafe/failsafe.c
@@ -187,7 +187,7 @@
* If MAC address was provided as a parameter,
* apply to all probed slaves.
*/
- FOREACH_SUBDEV_STATE(sdev, i, dev, DEV_PROBED) {
+ FOREACH_SUBDEV_STATE_UNSAFE(sdev, i, dev, DEV_PROBED) {
ret = rte_eth_dev_default_mac_addr_set(PORT_ID(sdev),
mac);
if (ret) {
diff --git a/drivers/net/failsafe/failsafe_eal.c b/drivers/net/failsafe/failsafe_eal.c
index c3d6731..b3b9c32 100644
--- a/drivers/net/failsafe/failsafe_eal.c
+++ b/drivers/net/failsafe/failsafe_eal.c
@@ -126,7 +126,7 @@
int sdev_ret;
int ret = 0;
- FOREACH_SUBDEV_STATE(sdev, i, dev, DEV_PROBED) {
+ FOREACH_SUBDEV_STATE_UNSAFE(sdev, i, dev, DEV_PROBED) {
sdev_ret = rte_eal_hotplug_remove(sdev->bus->name,
sdev->dev->name);
if (sdev_ret) {
diff --git a/drivers/net/failsafe/failsafe_ether.c b/drivers/net/failsafe/failsafe_ether.c
index ca42376..f2a52c9 100644
--- a/drivers/net/failsafe/failsafe_ether.c
+++ b/drivers/net/failsafe/failsafe_ether.c
@@ -325,7 +325,7 @@
struct sub_device *sdev;
uint8_t i;
- FOREACH_SUBDEV_STATE(sdev, i, dev, DEV_ACTIVE)
+ FOREACH_SUBDEV_STATE_UNSAFE(sdev, i, dev, DEV_ACTIVE)
if (sdev->remove && fs_rxtx_clean(sdev)) {
fs_dev_stats_save(sdev);
fs_dev_remove(sdev);
diff --git a/drivers/net/failsafe/failsafe_ops.c b/drivers/net/failsafe/failsafe_ops.c
index a7c2dba..3d2cb32 100644
--- a/drivers/net/failsafe/failsafe_ops.c
+++ b/drivers/net/failsafe/failsafe_ops.c
@@ -181,6 +181,9 @@
FOREACH_SUBDEV(sdev, i, dev) {
if (sdev->state != DEV_ACTIVE)
continue;
+ if (sdev->remove == 1 && PRIV(dev)->state < DEV_STARTED)
+ /* Application shouldn't start removed sub-devices. */
+ continue;
DEBUG("Starting sub_device %d", i);
ret = rte_eth_dev_start(PORT_ID(sdev));
if (ret) {
@@ -265,10 +268,17 @@
uint8_t i;
failsafe_hotplug_alarm_cancel(dev);
- if (PRIV(dev)->state == DEV_STARTED)
+ if (PRIV(dev)->state == DEV_STARTED) {
+ /*
+ * Clean remove flags to allow stop for all sub-devices because
+ * there is not hot-plug alarm to stop the removed sub-devices.
+ */
+ FOREACH_SUBDEV_STATE_UNSAFE(sdev, i, dev, DEV_STARTED)
+ sdev->remove = 0;
dev->dev_ops->dev_stop(dev);
+ }
PRIV(dev)->state = DEV_ACTIVE - 1;
- FOREACH_SUBDEV_STATE(sdev, i, dev, DEV_ACTIVE) {
+ FOREACH_SUBDEV_STATE_UNSAFE(sdev, i, dev, DEV_ACTIVE) {
DEBUG("Closing sub_device %d", i);
rte_eth_dev_close(PORT_ID(sdev));
sdev->state = DEV_ACTIVE - 1;
@@ -309,7 +319,7 @@
if (rxq->event_fd > 0)
close(rxq->event_fd);
dev = rxq->priv->dev;
- FOREACH_SUBDEV_STATE(sdev, i, dev, DEV_ACTIVE)
+ FOREACH_SUBDEV_STATE_UNSAFE(sdev, i, dev, DEV_ACTIVE)
SUBOPS(sdev, rx_queue_release)
(ETH(sdev)->data->rx_queues[rxq->qid]);
dev->data->rx_queues[rxq->qid] = NULL;
@@ -376,7 +386,7 @@
return ret;
rxq->event_fd = intr_handle.efds[0];
dev->data->rx_queues[rx_queue_id] = rxq;
- FOREACH_SUBDEV_STATE(sdev, i, dev, DEV_ACTIVE) {
+ FOREACH_SUBDEV_STATE_UNSAFE(sdev, i, dev, DEV_ACTIVE) {
ret = rte_eth_rx_queue_setup(PORT_ID(sdev),
rx_queue_id,
nb_rx_desc, socket_id,
@@ -493,7 +503,7 @@
return;
txq = queue;
dev = txq->priv->dev;
- FOREACH_SUBDEV_STATE(sdev, i, dev, DEV_ACTIVE)
+ FOREACH_SUBDEV_STATE_UNSAFE(sdev, i, dev, DEV_ACTIVE)
SUBOPS(sdev, tx_queue_release)
(ETH(sdev)->data->tx_queues[txq->qid]);
dev->data->tx_queues[txq->qid] = NULL;
@@ -548,7 +558,7 @@
txq->info.nb_desc = nb_tx_desc;
txq->priv = PRIV(dev);
dev->data->tx_queues[tx_queue_id] = txq;
- FOREACH_SUBDEV_STATE(sdev, i, dev, DEV_ACTIVE) {
+ FOREACH_SUBDEV_STATE_UNSAFE(sdev, i, dev, DEV_ACTIVE) {
ret = rte_eth_tx_queue_setup(PORT_ID(sdev),
tx_queue_id,
nb_tx_desc, socket_id,
@@ -663,7 +673,7 @@
int ret;
rte_memcpy(stats, &PRIV(dev)->stats_accumulator, sizeof(*stats));
- FOREACH_SUBDEV_STATE(sdev, i, dev, DEV_ACTIVE) {
+ FOREACH_SUBDEV_STATE_UNSAFE(sdev, i, dev, DEV_ACTIVE) {
struct rte_eth_stats *snapshot = &sdev->stats_snapshot.stats;
uint64_t *timestamp = &sdev->stats_snapshot.timestamp;
@@ -746,7 +756,7 @@
rx_offload_capa = default_infos.rx_offload_capa;
rxq_offload_capa = default_infos.rx_queue_offload_capa;
- FOREACH_SUBDEV_STATE(sdev, i, dev, DEV_PROBED) {
+ FOREACH_SUBDEV_STATE_UNSAFE(sdev, i, dev, DEV_PROBED) {
rte_eth_dev_info_get(PORT_ID(sdev),
&PRIV(dev)->infos);
rx_offload_capa &= PRIV(dev)->infos.rx_offload_capa;
diff --git a/drivers/net/failsafe/failsafe_private.h b/drivers/net/failsafe/failsafe_private.h
index f3be152..7ddd63a 100644
--- a/drivers/net/failsafe/failsafe_private.h
+++ b/drivers/net/failsafe/failsafe_private.h
@@ -244,16 +244,31 @@ int failsafe_eth_lsc_event_callback(uint16_t port_id,
((sdev)->sid)
/**
- * Stateful iterator construct over fail-safe sub-devices:
+ * Stateful iterator construct over fail-safe sub-devices,
+ * including the removed sub-devices:
+ * s: (struct sub_device *), iterator
+ * i: (uint8_t), increment
+ * dev: (struct rte_eth_dev *), fail-safe ethdev
+ * state: (enum dev_state), minimum acceptable device state
+ */
+
+#define FOREACH_SUBDEV_STATE_UNSAFE(s, i, dev, state) \
+ for (s = fs_find_next((dev), 0, state, 0, &i); \
+ s != NULL; \
+ s = fs_find_next((dev), i + 1, state, 0, &i))
+
+/**
+ * Stateful iterator construct over fail-safe sub-devices,
+ * except the removed sub-devices:
* s: (struct sub_device *), iterator
* i: (uint8_t), increment
* dev: (struct rte_eth_dev *), fail-safe ethdev
* state: (enum dev_state), minimum acceptable device state
*/
#define FOREACH_SUBDEV_STATE(s, i, dev, state) \
- for (s = fs_find_next((dev), 0, state, &i); \
+ for (s = fs_find_next((dev), 0, state, 1, &i); \
s != NULL; \
- s = fs_find_next((dev), i + 1, state, &i))
+ s = fs_find_next((dev), i + 1, state, 1, &i))
/**
* Iterator construct over fail-safe sub-devices:
@@ -262,7 +277,7 @@ int failsafe_eth_lsc_event_callback(uint16_t port_id,
* dev: (struct rte_eth_dev *), fail-safe ethdev
*/
#define FOREACH_SUBDEV(s, i, dev) \
- FOREACH_SUBDEV_STATE(s, i, dev, DEV_UNDEFINED)
+ FOREACH_SUBDEV_STATE_UNSAFE(s, i, dev, DEV_UNDEFINED)
/* dev: (struct rte_eth_dev *) fail-safe device */
#define PREFERRED_SUBDEV(dev) \
@@ -328,6 +343,7 @@ int failsafe_eth_lsc_event_callback(uint16_t port_id,
fs_find_next(struct rte_eth_dev *dev,
uint8_t sid,
enum dev_state min_state,
+ uint8_t check_remove,
uint8_t *sid_out)
{
struct sub_device *subs;
@@ -336,8 +352,12 @@ int failsafe_eth_lsc_event_callback(uint16_t port_id,
subs = PRIV(dev)->subs;
tail = PRIV(dev)->subs_tail;
while (sid < tail) {
- if (subs[sid].state >= min_state)
- break;
+ if (subs[sid].state >= min_state) {
+ if (check_remove == 0)
+ break;
+ if (PRIV(dev)->subs[sid].remove == 0)
+ break;
+ }
sid++;
}
*sid_out = sid;
@@ -376,7 +396,7 @@ int failsafe_eth_lsc_event_callback(uint16_t port_id,
uint8_t i;
/* Using acceptable device */
- FOREACH_SUBDEV_STATE(sdev, i, dev, req_state) {
+ FOREACH_SUBDEV_STATE_UNSAFE(sdev, i, dev, req_state) {
if (sdev == banned)
continue;
DEBUG("Switching tx_dev to sub_device %d",
--
1.8.3.1
next prev parent reply other threads:[~2018-02-08 16:34 UTC|newest]
Thread overview: 36+ messages / expand[flat|nested] mbox.gz Atom feed top
2017-09-09 19:27 [dpdk-dev] [PATCH] " Ophir Munk
2017-09-11 8:31 ` Gaëtan Rivet
2017-09-23 21:57 ` Ophir Munk
2017-10-05 22:42 ` [dpdk-dev] [PATCH v3] " Ophir Munk
2017-10-20 10:35 ` Gaëtan Rivet
2017-10-23 7:17 ` Ophir Munk
2017-10-23 8:36 ` Gaëtan Rivet
2017-11-29 19:17 ` [dpdk-dev] [dpdk-stable] " Ferruh Yigit
2018-01-18 22:22 ` Thomas Monjalon
2018-01-18 23:35 ` Gaëtan Rivet
2018-02-08 12:20 ` [dpdk-dev] [PATCH v4 0/2] failsafe: " Matan Azrad
2018-02-08 12:20 ` [dpdk-dev] [PATCH v4 1/2] net/failsafe: fix hotplug alarm cancel Matan Azrad
2018-02-08 12:20 ` [dpdk-dev] [PATCH v4 2/2] net/failsafe: fix calling device during RMV events Matan Azrad
2018-02-08 16:34 ` [dpdk-dev] [PATCH v5 0/3] failsafe: " Matan Azrad
2018-02-08 16:34 ` [dpdk-dev] [PATCH v5 1/3] net/failsafe: fix hotplug alarm cancel Matan Azrad
2018-02-08 16:34 ` [dpdk-dev] [PATCH v5 2/3] net/failsafe: fix removal scope Matan Azrad
2018-02-08 17:19 ` Gaëtan Rivet
2018-02-08 19:03 ` Matan Azrad
2018-02-08 16:34 ` Matan Azrad [this message]
2018-02-08 18:11 ` [dpdk-dev] [PATCH v5 3/3] net/failsafe: fix calling device during RMV events Gaëtan Rivet
2018-02-08 19:24 ` Matan Azrad
2018-02-11 17:24 ` [dpdk-dev] [PATCH v6 0/3] failsafe: fix hotplug races Matan Azrad
2018-02-11 17:24 ` [dpdk-dev] [PATCH v6 1/3] net/failsafe: fix hotplug alarm cancel Matan Azrad
2018-02-11 17:24 ` [dpdk-dev] [PATCH v6 2/3] net/failsafe: fix removal scope Matan Azrad
2018-02-11 17:24 ` [dpdk-dev] [PATCH v6 3/3] net/failsafe: fix hotplug races Matan Azrad
2018-02-12 18:33 ` Gaëtan Rivet
2018-02-12 20:35 ` Matan Azrad
2018-02-12 20:51 ` [dpdk-dev] [PATCH v7 0/3] failsafe: " Matan Azrad
2018-02-12 20:51 ` [dpdk-dev] [PATCH v7 1/3] net/failsafe: fix hotplug alarm cancel Matan Azrad
2018-02-12 20:51 ` [dpdk-dev] [PATCH v7 2/3] net/failsafe: fix removal scope Matan Azrad
2018-02-12 20:51 ` [dpdk-dev] [PATCH v7 3/3] net/failsafe: fix hotplug races Matan Azrad
2018-02-13 13:31 ` [dpdk-dev] [PATCH v7 0/3] failsafe: " Gaëtan Rivet
2018-02-13 16:12 ` Thomas Monjalon
2018-02-13 20:58 ` De Lara Guarch, Pablo
2018-02-13 21:13 ` Matan Azrad
2018-02-13 21:21 ` Thomas Monjalon
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=1518107653-15466-4-git-send-email-matan@mellanox.com \
--to=matan@mellanox.com \
--cc=dev@dpdk.org \
--cc=gaetan.rivet@6wind.com \
--cc=stable@dpdk.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).