* [PATCH] common/mlx5: fix shared mempool subscription
From: Gregory Etelson @ 2022-11-03 10:44 UTC
To: dev
Cc: getelson, matan, rasland, stable, Viacheslav Ovsiienko,
Anatoly Burakov, Dmitry Kozlyuk
The MLX5 PMD counted each mempool subscribe invocation and expected
the mempool subscription to be deleted once that counter dropped
to 0. However, the current PMD design unsubscribes mempool
callbacks only once.
As a result, the PMD destroyed the mlx5_common_device but kept the
shared Rx subscription callback. EAL then tried to invoke that callback
and crashed.

The patch removes the mempool subscription counter.
The PMD registers the mempool subscription only once. An attempt
to register an existing subscription returns EEXIST.
Also, the PMD now removes the subscription whenever mempool
unsubscribe is called.
Fixes: 8ad97e4b3215 ("common/mlx5: fix multi-process mempool registration")
Cc: stable@dpdk.org
Signed-off-by: Gregory Etelson <getelson@nvidia.com>
Acked-by: Matan Azrad <matan@nvidia.com>
---
drivers/common/mlx5/mlx5_common.c | 22 +++++++++++-----------
drivers/common/mlx5/mlx5_common_mr.c | 1 -
drivers/common/mlx5/mlx5_common_mr.h | 1 -
3 files changed, 11 insertions(+), 13 deletions(-)
diff --git a/drivers/common/mlx5/mlx5_common.c b/drivers/common/mlx5/mlx5_common.c
index bf22c0694d..0ad14a48c7 100644
--- a/drivers/common/mlx5/mlx5_common.c
+++ b/drivers/common/mlx5/mlx5_common.c
@@ -577,6 +577,11 @@ mlx5_dev_mempool_event_cb(enum rte_mempool_event event, struct rte_mempool *mp,
}
}
+/**
+ * Primary and secondary processes share the `cdev` pointer.
+ * Callback addresses are local in each process.
+ * Therefore, each process can register private callbacks.
+ */
int
mlx5_dev_mempool_subscribe(struct mlx5_common_device *cdev)
{
@@ -588,14 +593,13 @@ mlx5_dev_mempool_subscribe(struct mlx5_common_device *cdev)
/* Callback for this device may be already registered. */
ret = rte_mempool_event_callback_register(mlx5_dev_mempool_event_cb,
cdev);
- if (ret != 0 && rte_errno != EEXIST)
- goto exit;
- __atomic_add_fetch(&cdev->mr_scache.mempool_cb_reg_n, 1,
- __ATOMIC_ACQUIRE);
/* Register mempools only once for this device. */
- if (rte_eal_process_type() == RTE_PROC_PRIMARY)
+ if (ret == 0 && rte_eal_process_type() == RTE_PROC_PRIMARY) {
rte_mempool_walk(mlx5_dev_mempool_register_cb, cdev);
- ret = 0;
+ goto exit;
+ }
+ if (ret != 0 && rte_errno == EEXIST)
+ ret = 0;
exit:
rte_rwlock_write_unlock(&cdev->mr_scache.mprwlock);
return ret;
@@ -604,15 +608,11 @@ mlx5_dev_mempool_subscribe(struct mlx5_common_device *cdev)
static void
mlx5_dev_mempool_unsubscribe(struct mlx5_common_device *cdev)
{
- uint32_t mempool_cb_reg_n;
int ret;
+ MLX5_ASSERT(cdev->dev != NULL);
if (!cdev->config.mr_mempool_reg_en)
return;
- mempool_cb_reg_n = __atomic_sub_fetch(&cdev->mr_scache.mempool_cb_reg_n,
- 1, __ATOMIC_RELEASE);
- if (mempool_cb_reg_n > 0)
- return;
/* Stop watching for mempool events and unregister all mempools. */
ret = rte_mempool_event_callback_unregister(mlx5_dev_mempool_event_cb,
cdev);
diff --git a/drivers/common/mlx5/mlx5_common_mr.c b/drivers/common/mlx5/mlx5_common_mr.c
index 1d54102b54..0e1d2434ab 100644
--- a/drivers/common/mlx5/mlx5_common_mr.c
+++ b/drivers/common/mlx5/mlx5_common_mr.c
@@ -1138,7 +1138,6 @@ mlx5_mr_create_cache(struct mlx5_mr_share_cache *share_cache, int socket)
&share_cache->dereg_mr_cb);
rte_rwlock_init(&share_cache->rwlock);
rte_rwlock_init(&share_cache->mprwlock);
- share_cache->mempool_cb_reg_n = 0;
/* Initialize B-tree and allocate memory for global MR cache table. */
return mlx5_mr_btree_init(&share_cache->cache,
MLX5_MR_BTREE_CACHE_N * 2, socket);
diff --git a/drivers/common/mlx5/mlx5_common_mr.h b/drivers/common/mlx5/mlx5_common_mr.h
index f774ccbf33..13eb350980 100644
--- a/drivers/common/mlx5/mlx5_common_mr.h
+++ b/drivers/common/mlx5/mlx5_common_mr.h
@@ -81,7 +81,6 @@ struct mlx5_mr_share_cache {
uint32_t dev_gen; /* Generation number to flush local caches. */
rte_rwlock_t rwlock; /* MR cache Lock. */
rte_rwlock_t mprwlock; /* Mempool Registration Lock. */
- uint32_t mempool_cb_reg_n; /* Mempool event callback registrants. */
struct mlx5_mr_btree cache; /* Global MR cache table. */
struct mlx5_mr_list mr_list; /* Registered MR list. */
struct mlx5_mr_list mr_free_list; /* Freed MR list. */
--
2.34.1
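The failure mode described in the commit message can be reproduced in a small stand-alone model. The sketch below is not DPDK code: the `model_*` names, the fixed-size callback table, and the use of plain `errno` in place of `rte_errno` are all assumptions made for illustration. It only mirrors the two behaviours the patch contrasts: counting every subscribe call versus treating EEXIST as success and always unregistering.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical stand-in for the EAL mempool event callback list:
 * at most one entry per (function, argument) pair. */
#define MODEL_MAX_CB 8
typedef void (*model_cb_t)(void *arg);
struct model_cb_entry { model_cb_t fn; void *arg; int used; };
static struct model_cb_entry model_cb_list[MODEL_MAX_CB];

/* Returns 0 on success, -1 with errno set to EEXIST or ENOSPC. */
static int
model_cb_register(model_cb_t fn, void *arg)
{
	int i, free_slot = -1;

	for (i = 0; i < MODEL_MAX_CB; i++) {
		struct model_cb_entry *e = &model_cb_list[i];

		if (e->used && e->fn == fn && e->arg == arg) {
			errno = EEXIST;
			return -1;
		}
		if (!e->used && free_slot < 0)
			free_slot = i;
	}
	if (free_slot < 0) {
		errno = ENOSPC;
		return -1;
	}
	model_cb_list[free_slot] = (struct model_cb_entry){ fn, arg, 1 };
	return 0;
}

static void
model_cb_unregister(model_cb_t fn, void *arg)
{
	int i;

	for (i = 0; i < MODEL_MAX_CB; i++)
		if (model_cb_list[i].used && model_cb_list[i].fn == fn &&
		    model_cb_list[i].arg == arg)
			model_cb_list[i].used = 0;
}

/* Hypothetical device: only the counter that the patch removes. */
struct model_dev { unsigned int cb_reg_n; };

static void model_event_cb(void *arg) { (void)arg; }

/* Pre-patch behaviour: every subscribe call bumps the counter. */
static int
counted_subscribe(struct model_dev *d)
{
	int ret = model_cb_register(model_event_cb, d);

	if (ret != 0 && errno != EEXIST)
		return ret;
	d->cb_reg_n++;
	return 0;
}

/* Pre-patch behaviour: unregister only when the counter reaches 0. */
static void
counted_unsubscribe(struct model_dev *d)
{
	assert(d->cb_reg_n > 0);
	if (--d->cb_reg_n > 0)
		return;
	model_cb_unregister(model_event_cb, d);
}

/* Post-patch behaviour: an existing registration counts as success. */
static int
idempotent_subscribe(struct model_dev *d)
{
	int ret = model_cb_register(model_event_cb, d);

	return (ret != 0 && errno == EEXIST) ? 0 : ret;
}

/* Subscribe twice for one device, tear it down once, and return the
 * number of callbacks that are still registered afterwards. */
static int
model_leftover_after_teardown(int use_counter)
{
	struct model_dev dev = { 0 };
	int i, left = 0;

	for (i = 0; i < MODEL_MAX_CB; i++)
		model_cb_list[i].used = 0;
	for (i = 0; i < 2; i++)
		(void)(use_counter ? counted_subscribe(&dev) :
				     idempotent_subscribe(&dev));
	if (use_counter)
		counted_unsubscribe(&dev);
	else
		model_cb_unregister(model_event_cb, &dev);
	for (i = 0; i < MODEL_MAX_CB; i++)
		left += model_cb_list[i].used;
	return left;
}
```

In this model, `model_leftover_after_teardown(1)` leaves one callback registered that still points at the torn-down device, which corresponds to the stale callback EAL later invoked. `model_leftover_after_teardown(0)` leaves none.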
* RE: [PATCH] common/mlx5: fix shared mempool subscription
From: Raslan Darawsheh @ 2022-11-06 11:08 UTC
To: Gregory Etelson, dev
Cc: Matan Azrad, stable, Slava Ovsiienko, Anatoly Burakov, Dmitry Kozlyuk
Hi,
> -----Original Message-----
> From: Gregory Etelson <getelson@nvidia.com>
> Sent: Thursday, November 3, 2022 12:44 PM
> To: dev@dpdk.org
> Cc: Gregory Etelson <getelson@nvidia.com>; Matan Azrad
> <matan@nvidia.com>; Raslan Darawsheh <rasland@nvidia.com>;
> stable@dpdk.org; Slava Ovsiienko <viacheslavo@nvidia.com>; Anatoly
> Burakov <anatoly.burakov@intel.com>; Dmitry Kozlyuk
> <dkozlyuk@nvidia.com>
> Subject: [PATCH] common/mlx5: fix shared mempool subscription
>
> MLX5 PMD counted each mempool subscribe invocation. The PMD expected
> that the mempool subscription will be deleted after the mempool counter
> dropped to 0. However, current PMD design unsubscribes mempool
> callbacks only once.
> As the result, the PMD destroyed mlx5_common_device but kept shared RX
> subscription callback. EAL tried to activate that callback and crashed.
>
> The patch removes mempool subscriptions counter.
> The PMD registers mempool subscription once only. An attempt to register
> existing subscription returns EEXIST.
> Also, the PMD expects to remove subscription when mempool unsubscribe
> was activated.
>
> Fixes: 8ad97e4b3215 ("common/mlx5: fix multi-process mempool
> registration")
>
> Cc: stable@dpdk.org
>
> Signed-off-by: Gregory Etelson <getelson@nvidia.com>
> Acked-by: Matan Azrad <matan@nvidia.com>
Patch applied to next-net-mlx,
Kindest regards,
Raslan Darawsheh