* [PATCH] net/mlx5: workaround list management of Rx queue control
@ 2024-07-23 11:05 Bing Zhao
2024-07-23 11:14 ` [PATCH v2] " Bing Zhao
From: Bing Zhao @ 2024-07-23 11:05 UTC
To: dsosnowski, viacheslavo, dev, rasland; +Cc: orika, suanmingm, matan
The LIST_REMOVE macro only unlinks the entry from the list and
updates the list itself. The pointers inside the removed entry are
not reset to NULL, so nothing prevents them from being accessed a
second time.
In the previous fix for the memory access issue, the "rxq_ctrl" was
removed from the list in the device private data when the "refcnt"
dropped to 0. With only shared or only non-shared queues, this was
safe since all the "rxq_ctrl" entries were either freed or kept.
There is one case where shared and non-shared Rx queues are
configured simultaneously; for example, a hairpin Rx queue cannot be
shared. When closing the port that allocated the shared Rx queues'
"rxq_ctrl", if the next entry is a hairpin "rxq_ctrl", the hairpin
"rxq_ctrl" will be freed directly with other resources. When closing
another port sharing the "rxq_ctrl", LIST_REMOVE will be called
again and cause a use-after-free (UAF) issue. If the memory is no
longer mapped, there will be a SIGSEGV.
Add a flag to the Rx queue private structure so that the "rxq_ctrl"
is removed from the list only by the port/queue that allocated it.
Fixes: bcc220cb57d7 ("net/mlx5: fix shared Rx queue list management")
Signed-off-by: Bing Zhao <bingz@nvidia.com>
Acked-by: Viacheslav Ovsiienko <viacheslavo@nvidia.com>
---
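Note for readers: the LIST_REMOVE behaviour described in the commit
message can be checked in isolation. The sketch below is not driver
code. It assumes the glibc <sys/queue.h> implementation (other
implementations may behave differently), and "struct node" and
removed_entry_keeps_stale_links() are hypothetical names used only for
illustration.

```c
#include <stdbool.h>
#include <stddef.h>
#include <sys/queue.h>

/* Hypothetical stand-in for an element kept on a LIST, not an mlx5 type. */
struct node {
	int id;
	LIST_ENTRY(node) link;
};

LIST_HEAD(node_list, node);

/*
 * Build the list a -> b, remove "a", and report whether the link fields
 * of "a" still hold their old values afterwards.
 */
bool
removed_entry_keeps_stale_links(void)
{
	struct node_list head = LIST_HEAD_INITIALIZER(head);
	struct node a = { .id = 1 };
	struct node b = { .id = 2 };

	LIST_INSERT_HEAD(&head, &b, link);
	LIST_INSERT_HEAD(&head, &a, link);
	LIST_REMOVE(&a, link);
	/* The list itself is consistent: "b" is now the only element. */
	if (LIST_FIRST(&head) != &b || LIST_NEXT(&b, link) != NULL)
		return false;
	/* The removed entry still points at its former neighbours. */
	return a.link.le_next == &b && a.link.le_prev == &head.lh_first;
}
```

Under the stated assumption the function is expected to return true: a
second LIST_REMOVE on "a" would write through those stale pointers, which
is the access the patch avoids once the neighbour has been freed.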
drivers/net/mlx5/mlx5_rx.h | 1 +
drivers/net/mlx5/mlx5_rxq.c | 5 ++++-
2 files changed, 5 insertions(+), 1 deletion(-)
diff --git a/drivers/net/mlx5/mlx5_rx.h b/drivers/net/mlx5/mlx5_rx.h
index 7d144921ab..9bcb43b007 100644
--- a/drivers/net/mlx5/mlx5_rx.h
+++ b/drivers/net/mlx5/mlx5_rx.h
@@ -173,6 +173,7 @@ struct mlx5_rxq_ctrl {
/* RX queue private data. */
struct mlx5_rxq_priv {
uint16_t idx; /* Queue index. */
+ bool possessor; /* Shared rxq_ctrl allocated for the 1st time. */
RTE_ATOMIC(uint32_t) refcnt; /* Reference counter. */
struct mlx5_rxq_ctrl *ctrl; /* Shared Rx Queue. */
LIST_ENTRY(mlx5_rxq_priv) owner_entry; /* Entry in shared rxq_ctrl. */
diff --git a/drivers/net/mlx5/mlx5_rxq.c b/drivers/net/mlx5/mlx5_rxq.c
index f13fc3b353..b9240deb97 100644
--- a/drivers/net/mlx5/mlx5_rxq.c
+++ b/drivers/net/mlx5/mlx5_rxq.c
@@ -938,6 +938,7 @@ mlx5_rx_queue_setup(struct rte_eth_dev *dev, uint16_t idx, uint16_t desc,
rte_errno = ENOMEM;
return -rte_errno;
}
+ rxq->possessor = true;
}
rxq->priv = priv;
rxq->idx = idx;
@@ -2015,6 +2016,7 @@ mlx5_rxq_hairpin_new(struct rte_eth_dev *dev, struct mlx5_rxq_priv *rxq,
tmpl->rxq.mr_ctrl.cache_bh = (struct mlx5_mr_btree) { 0 };
tmpl->rxq.idx = idx;
rxq->hairpin_conf = *hairpin_conf;
+ rxq->possessor = true;
mlx5_rxq_ref(dev, idx);
LIST_INSERT_HEAD(&priv->rxqsctrl, tmpl, next);
return tmpl;
@@ -2282,7 +2284,8 @@ mlx5_rxq_release(struct rte_eth_dev *dev, uint16_t idx)
RTE_ETH_QUEUE_STATE_STOPPED;
}
} else { /* Refcnt zero, closing device. */
- LIST_REMOVE(rxq_ctrl, next);
+ if (rxq->possessor == true)
+ LIST_REMOVE(rxq_ctrl, next);
LIST_REMOVE(rxq, owner_entry);
if (LIST_EMPTY(&rxq_ctrl->owners)) {
if (!rxq_ctrl->is_hairpin)
--
2.34.1
* [PATCH v2] net/mlx5: workaround list management of Rx queue control
2024-07-23 11:05 [PATCH] net/mlx5: workaround list management of Rx queue control Bing Zhao
@ 2024-07-23 11:14 ` Bing Zhao
2024-08-29 9:00 ` Raslan Darawsheh
From: Bing Zhao @ 2024-07-23 11:14 UTC
To: dsosnowski, viacheslavo, dev, rasland; +Cc: orika, suanmingm, matan
The LIST_REMOVE macro only unlinks the entry from the list and
updates the list itself. The pointers inside the removed entry are
not reset to NULL, so nothing prevents them from being accessed a
second time.
In the previous fix for the memory access issue, the "rxq_ctrl" was
removed from the list in the device private data when the "refcnt"
dropped to 0. With only shared or only non-shared queues, this was
safe since all the "rxq_ctrl" entries were either freed or kept.
There is one case where shared and non-shared Rx queues are
configured simultaneously; for example, a hairpin Rx queue cannot be
shared. When closing the port that allocated the shared Rx queues'
"rxq_ctrl", if the next entry is a hairpin "rxq_ctrl", the hairpin
"rxq_ctrl" will be freed directly with other resources. When closing
another port sharing the "rxq_ctrl", LIST_REMOVE will be called
again and cause a use-after-free (UAF) issue. If the memory is no
longer mapped, there will be a SIGSEGV.
Add a flag to the Rx queue private structure so that the "rxq_ctrl"
is removed from the list only by the port/queue that allocated it.
Fixes: bcc220cb57d7 ("net/mlx5: fix shared Rx queue list management")
Signed-off-by: Bing Zhao <bingz@nvidia.com>
Acked-by: Viacheslav Ovsiienko <viacheslavo@nvidia.com>
---
v2: fix CI code style warning
---
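Note for readers: the ownership rule this patch introduces can be
sketched outside the driver. The code below is a simplified model, not
the mlx5 implementation: "struct queue", "struct ctrl", ctrl_new(),
queue_attach(), queue_release() and mixed_close_sequence() are
hypothetical names, there is no reference counting, and a single list
stands in for the per-device "rxqsctrl" list. Only the "possessor" flag
and the order of the two LIST_REMOVE calls follow the diff.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>
#include <sys/queue.h>

struct ctrl;

/* Hypothetical, trimmed-down stand-in for struct mlx5_rxq_priv. */
struct queue {
	bool possessor;                /* This queue put "ctrl" on the list. */
	struct ctrl *ctrl;
	LIST_ENTRY(queue) owner_entry;
};

/* Hypothetical, trimmed-down stand-in for struct mlx5_rxq_ctrl. */
struct ctrl {
	LIST_ENTRY(ctrl) next;         /* Entry in the device-level list. */
	LIST_HEAD(, queue) owners;     /* Queues sharing this control block. */
};

LIST_HEAD(ctrl_list, ctrl);

static void
queue_attach(struct queue *q, struct ctrl *c, bool possessor)
{
	q->possessor = possessor;
	q->ctrl = c;
	LIST_INSERT_HEAD(&c->owners, q, owner_entry);
}

/* Allocate a control block, link it, and mark the first owner. */
static struct ctrl *
ctrl_new(struct ctrl_list *list, struct queue *first_owner)
{
	struct ctrl *c = calloc(1, sizeof(*c));

	if (c == NULL)
		return NULL;
	LIST_INIT(&c->owners);
	LIST_INSERT_HEAD(list, c, next);
	queue_attach(first_owner, c, true);
	return c;
}

/* Returns true when the last owner left and the control block was freed. */
static bool
queue_release(struct queue *q)
{
	struct ctrl *c = q->ctrl;

	if (q->possessor)              /* Only the allocating queue unlinks. */
		LIST_REMOVE(c, next);
	LIST_REMOVE(q, owner_entry);
	q->ctrl = NULL;
	if (!LIST_EMPTY(&c->owners))
		return false;
	free(c);
	return true;
}

/* Close sequence from the commit message; returns blocks freed or -1. */
int
mixed_close_sequence(void)
{
	struct ctrl_list list = LIST_HEAD_INITIALIZER(list);
	struct queue hairpin_q = { 0 }, port_a_q = { 0 }, port_b_q = { 0 };
	struct ctrl *hairpin, *shared;
	int freed = 0;

	hairpin = ctrl_new(&list, &hairpin_q);
	shared = ctrl_new(&list, &port_a_q);    /* List: shared -> hairpin. */
	if (hairpin == NULL || shared == NULL) {
		free(hairpin);
		free(shared);
		return -1;
	}
	queue_attach(&port_b_q, shared, false); /* Port B shares, not owner. */
	freed += queue_release(&port_a_q);  /* Unlinks shared; B still owns it. */
	freed += queue_release(&hairpin_q); /* Unlinks and frees hairpin. */
	freed += queue_release(&port_b_q);  /* Frees shared, list untouched. */
	return LIST_EMPTY(&list) ? freed : -1;
}
```

In this model the last release never dereferences the stale "next" link
of the shared block, which at that point still refers to the freed
hairpin block. Dropping the "possessor" check would make that release
write through the stale link.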
drivers/net/mlx5/mlx5_rx.h | 1 +
drivers/net/mlx5/mlx5_rxq.c | 5 ++++-
2 files changed, 5 insertions(+), 1 deletion(-)
diff --git a/drivers/net/mlx5/mlx5_rx.h b/drivers/net/mlx5/mlx5_rx.h
index 7d144921ab..9bcb43b007 100644
--- a/drivers/net/mlx5/mlx5_rx.h
+++ b/drivers/net/mlx5/mlx5_rx.h
@@ -173,6 +173,7 @@ struct mlx5_rxq_ctrl {
/* RX queue private data. */
struct mlx5_rxq_priv {
uint16_t idx; /* Queue index. */
+ bool possessor; /* Shared rxq_ctrl allocated for the 1st time. */
RTE_ATOMIC(uint32_t) refcnt; /* Reference counter. */
struct mlx5_rxq_ctrl *ctrl; /* Shared Rx Queue. */
LIST_ENTRY(mlx5_rxq_priv) owner_entry; /* Entry in shared rxq_ctrl. */
diff --git a/drivers/net/mlx5/mlx5_rxq.c b/drivers/net/mlx5/mlx5_rxq.c
index f13fc3b353..c6655b7db4 100644
--- a/drivers/net/mlx5/mlx5_rxq.c
+++ b/drivers/net/mlx5/mlx5_rxq.c
@@ -938,6 +938,7 @@ mlx5_rx_queue_setup(struct rte_eth_dev *dev, uint16_t idx, uint16_t desc,
rte_errno = ENOMEM;
return -rte_errno;
}
+ rxq->possessor = true;
}
rxq->priv = priv;
rxq->idx = idx;
@@ -2015,6 +2016,7 @@ mlx5_rxq_hairpin_new(struct rte_eth_dev *dev, struct mlx5_rxq_priv *rxq,
tmpl->rxq.mr_ctrl.cache_bh = (struct mlx5_mr_btree) { 0 };
tmpl->rxq.idx = idx;
rxq->hairpin_conf = *hairpin_conf;
+ rxq->possessor = true;
mlx5_rxq_ref(dev, idx);
LIST_INSERT_HEAD(&priv->rxqsctrl, tmpl, next);
return tmpl;
@@ -2282,7 +2284,8 @@ mlx5_rxq_release(struct rte_eth_dev *dev, uint16_t idx)
RTE_ETH_QUEUE_STATE_STOPPED;
}
} else { /* Refcnt zero, closing device. */
- LIST_REMOVE(rxq_ctrl, next);
+ if (rxq->possessor)
+ LIST_REMOVE(rxq_ctrl, next);
LIST_REMOVE(rxq, owner_entry);
if (LIST_EMPTY(&rxq_ctrl->owners)) {
if (!rxq_ctrl->is_hairpin)
--
2.34.1
* Re: [PATCH v2] net/mlx5: workaround list management of Rx queue control
2024-07-23 11:14 ` [PATCH v2] " Bing Zhao
@ 2024-08-29 9:00 ` Raslan Darawsheh
From: Raslan Darawsheh @ 2024-08-29 9:00 UTC
To: Bing Zhao, Dariusz Sosnowski, Slava Ovsiienko, dev
Cc: Ori Kam, Suanming Mou, Matan Azrad
Hi,
Patch applied to next-net-mlx.
Kindest regards,
Raslan Darawsheh