From: Viacheslav Ovsiienko <viacheslavo@mellanox.com>
To: dev@dpdk.org
Cc: shahafs@mellanox.com, yskoh@mellanox.com
Subject: [dpdk-dev] [PATCH] net/mlx5: fix event handler uninstall
Date: Sat, 25 May 2019 09:26:05 +0000
Message-ID: <1558776365-28511-1-git-send-email-viacheslavo@mellanox.com>
When a device is being closed and tries to unregister its interrupt
callback, there is a chance the handler is still active (being called in
the context of the eal_intr_thread_main thread). In that case
rte_intr_callback_unregister() returns -EAGAIN and keeps the handler
registered, causing a crash once the underlying resource is gone.
This race condition may happen if event handling in the application
takes a long time. We should check the return code of the unregistering
routine and retry until the handler is unregistered. While retrying,
a diagnostic message is shown once a second.
Fixes: 028b2a28c3cb ("net/mlx5: update event handler for multiport IB devices")
Signed-off-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com>
Acked-by: Yongseok Koh <yskoh@mellanox.com>
---
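For readers who want to try the retry-on-EAGAIN idea outside of DPDK,
here is a minimal standalone sketch. It is not the driver code: the
names fake_handler, fake_unregister() and unregister_retry() are
hypothetical stand-ins, and the "once a second" message is approximated
by counting one-millisecond sleeps instead of reading the timer cycles.

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <time.h>

/*
 * Hypothetical stand-in for the interrupt handler state: the next
 * busy_calls unregister attempts will find the handler still active.
 */
struct fake_handler {
	int busy_calls;
};

/* Hypothetical stand-in for rte_intr_callback_unregister(). */
static int
fake_unregister(struct fake_handler *h)
{
	if (h->busy_calls > 0) {
		h->busy_calls--;
		return -EAGAIN; /* handler is still running, not removed */
	}
	return 1; /* one callback removed */
}

/*
 * Retry until the handler is removed. Returns the number of retries
 * that were needed, or a negative errno value for any error other
 * than -EAGAIN.
 */
static int
unregister_retry(struct fake_handler *h)
{
	const struct timespec one_ms = { .tv_sec = 0, .tv_nsec = 1000000 };
	int retries = 0;

	assert(h != NULL);
	for (;;) {
		int ret = fake_unregister(h);

		if (ret >= 0)
			return retries;
		if (ret != -EAGAIN)
			return ret;
		/* Handler still active: wait one millisecond, try again. */
		nanosleep(&one_ms, NULL);
		retries++;
		/* Roughly once a second, report that we are still waiting. */
		if (retries % 1000 == 0)
			fprintf(stderr, "retrying to unregister handler\n");
	}
}
```

With busy_calls set to 3, unregister_retry() sleeps three times and
returns 3; with busy_calls set to 0 it returns 0 without sleeping.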
drivers/net/mlx5/mlx5.c | 2 +-
drivers/net/mlx5/mlx5.h | 2 ++
drivers/net/mlx5/mlx5_ethdev.c | 79 +++++++++++++++++++++++++++++++++++++++---
3 files changed, 78 insertions(+), 5 deletions(-)
diff --git a/drivers/net/mlx5/mlx5.c b/drivers/net/mlx5/mlx5.c
index 9f5ec97..2344cb4 100644
--- a/drivers/net/mlx5/mlx5.c
+++ b/drivers/net/mlx5/mlx5.c
@@ -313,7 +313,7 @@ struct mlx5_dev_spawn_data {
**/
assert(!sh->intr_cnt);
if (sh->intr_cnt)
- rte_intr_callback_unregister
+ mlx5_intr_callback_unregister
(&sh->intr_handle, mlx5_dev_interrupt_handler, sh);
pthread_mutex_destroy(&sh->intr_mutex);
if (sh->pd)
diff --git a/drivers/net/mlx5/mlx5.h b/drivers/net/mlx5/mlx5.h
index 3eaaafd..5b5b93d 100644
--- a/drivers/net/mlx5/mlx5.h
+++ b/drivers/net/mlx5/mlx5.h
@@ -437,6 +437,8 @@ void mlx5_nl_check_switch_info(bool nun_vf_set,
struct mlx5_switch_info *switch_info);
void mlx5_translate_port_name(const char *port_name_in,
struct mlx5_switch_info *port_info_out);
+void mlx5_intr_callback_unregister(const struct rte_intr_handle *handle,
+ rte_intr_callback_fn cb_fn, void *cb_arg);
/* mlx5_mac.c */
diff --git a/drivers/net/mlx5/mlx5_ethdev.c b/drivers/net/mlx5/mlx5_ethdev.c
index a8a7ece..f47297c 100644
--- a/drivers/net/mlx5/mlx5_ethdev.c
+++ b/drivers/net/mlx5/mlx5_ethdev.c
@@ -1226,9 +1226,80 @@ int mlx5_fw_version_get(struct rte_eth_dev *dev, char *fw_ver, size_t fw_size)
}
}
+/*
+ * Unregister callback handler safely. The handler may be active
+ * while we are trying to unregister it, in this case code -EAGAIN
+ * is returned by rte_intr_callback_unregister(). This routine checks
+ * the return code and tries to unregister handler again.
+ *
+ * @param handle
+ * interrupt handle
+ * @param cb_fn
+ * pointer to callback routine
+ * @param cb_arg
+ * opaque callback parameter
+ */
+void
+mlx5_intr_callback_unregister(const struct rte_intr_handle *handle,
+ rte_intr_callback_fn cb_fn, void *cb_arg)
+{
+ /*
+ * Try to reduce timeout management overhead by not calling
+ * the timer related routines on the first iteration. If the
+ * unregistering succeeds on first call there will be no
+ * timer calls at all.
+ */
+ uint64_t twait = 0;
+ uint64_t start = 0;
+
+ do {
+ int ret;
+
+ ret = rte_intr_callback_unregister(handle, cb_fn, cb_arg);
+ if (ret >= 0)
+ return;
+ if (ret != -EAGAIN) {
+ DRV_LOG(INFO, "failed to unregister interrupt"
+ " handler (error: %d)", ret);
+ assert(false);
+ return;
+ }
+ if (twait) {
+ struct timespec onems;
+
+ /* Wait one millisecond and try again. */
+ onems.tv_sec = 0;
+ onems.tv_nsec = NS_PER_S / MS_PER_S;
+ nanosleep(&onems, 0);
+ /* Check whether one second elapsed. */
+ if ((rte_get_timer_cycles() - start) <= twait)
+ continue;
+ } else {
+ /*
+ * We get the amount of timer ticks for one second.
+ * If this amount elapsed it means we spent one
+ * second in waiting. This branch is executed once
+ * on first iteration.
+ */
+ twait = rte_get_timer_hz();
+ assert(twait);
+ }
+ /*
+ * Timeout elapsed, show message (once a second) and retry.
+ * We have no other acceptable option here, if we ignore
+ * the unregistering return code the handler will not
+ * be unregistered, fd will be closed and we may get the
+ * crash. Hanging and messaging in the loop seems not to be
+ * the worst choice.
+ */
+ DRV_LOG(INFO, "Retrying to unregister interrupt handler");
+ start = rte_get_timer_cycles();
+ } while (true);
+}
+
/**
* Uninstall shared asynchronous device events handler.
- * This function is implemeted to support event sharing
+ * This function is implemented to support event sharing
* between multiple ports of single IB device.
*
* @param dev
@@ -1254,7 +1325,7 @@ int mlx5_fw_version_get(struct rte_eth_dev *dev, char *fw_ver, size_t fw_size)
sh->port[priv->ibv_port - 1].ih_port_id = RTE_MAX_ETHPORTS;
if (!sh->intr_cnt || --sh->intr_cnt)
goto exit;
- rte_intr_callback_unregister(&sh->intr_handle,
+ mlx5_intr_callback_unregister(&sh->intr_handle,
mlx5_dev_interrupt_handler, sh);
sh->intr_handle.fd = 0;
sh->intr_handle.type = RTE_INTR_HANDLE_UNKNOWN;
@@ -1263,8 +1334,8 @@ int mlx5_fw_version_get(struct rte_eth_dev *dev, char *fw_ver, size_t fw_size)
}
/**
- * Install shared asyncronous device events handler.
- * This function is implemeted to support event sharing
+ * Install shared asynchronous device events handler.
+ * This function is implemented to support event sharing
* between multiple ports of single IB device.
*
* @param dev
--
1.8.3.1