DPDK patches and discussions
 help / color / mirror / Atom feed
* [dpdk-dev] [PATCH] net/mlx5: fix memory regions release deadlock
@ 2020-02-04 10:08 Michael Baum
  2020-02-04 13:36 ` [dpdk-dev] [PATCH v2] " Michael Baum
  0 siblings, 1 reply; 3+ messages in thread
From: Michael Baum @ 2020-02-04 10:08 UTC (permalink / raw)
  To: dev; +Cc: Matan Azrad, Viacheslav Ovsiienko, stable

When we create memory callback list, we add cb function managing memory
regions. This function uses lock. This callback iterates over shared
device list and takes a lock of each shared device to avoid parallel
accessing to the MR list of the shared device.
When PND releases memory regions while the list is exist, the callback
function maps the MRs using the lock.
In shared device closing, when all its MRs are freed, the same lock is
taken.

The MRs freeing calls rte_free what may trigger the memory callback.
The MR freeing, wrongly, took the lock before the shared device removal
from the callback list what causes the deadlock.

In order to solve it, first we remove the share device from the list and
then release memory regions.

Fixes: 0e3d0525b2f2 ("net/mlx5: fix memory event callback list")
Cc: viacheslavo@mellanox.com
Cc: stable@dpdk.org

Signed-off-by: Michael Baum <michaelba@mellanox.com>
---
 drivers/net/mlx5/mlx5.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/net/mlx5/mlx5.c b/drivers/net/mlx5/mlx5.c
index f80e403..759491f 100644
--- a/drivers/net/mlx5/mlx5.c
+++ b/drivers/net/mlx5/mlx5.c
@@ -679,12 +679,12 @@ struct mlx5_flow_id_pool *
 	MLX5_ASSERT(rte_eal_process_type() == RTE_PROC_PRIMARY);
 	if (--sh->refcnt)
 		goto exit;
-	/* Release created Memory Regions. */
-	mlx5_mr_release(sh);
 	/* Remove from memory callback device list. */
 	rte_rwlock_write_lock(&mlx5_shared_data->mem_event_rwlock);
 	LIST_REMOVE(sh, mem_event_cb);
 	rte_rwlock_write_unlock(&mlx5_shared_data->mem_event_rwlock);
+	/* Release created Memory Regions. */
+	mlx5_mr_release(sh);
 	/* Remove context from the global device list. */
 	LIST_REMOVE(sh, next);
 	/*
-- 
1.8.3.1


^ permalink raw reply	[flat|nested] 3+ messages in thread

* [dpdk-dev] [PATCH v2] net/mlx5: fix memory regions release deadlock
  2020-02-04 10:08 [dpdk-dev] [PATCH] net/mlx5: fix memory regions release deadlock Michael Baum
@ 2020-02-04 13:36 ` Michael Baum
  2020-02-05  9:44   ` Raslan Darawsheh
  0 siblings, 1 reply; 3+ messages in thread
From: Michael Baum @ 2020-02-04 13:36 UTC (permalink / raw)
  To: dev; +Cc: Matan Azrad, Viacheslav Ovsiienko, stable

The mpx5 PMD maintains the list of devices for those the memory
operation callback routines must be invoked to keep the device MRs (MR
is the entity backing the hardware DMA transactions) consistent with the
mapped memory.
Each device context in the list is protected with dedicated lock on per
device basis, which might be taken inside the callback routine.

When device is closing the PMD frees all MRs by calling
mlx5_mr_release(), that might call rte_free() under the taken device
lock.  If this rte_free call triggers the entire memory segment freeing
it, in its turn, invokes the callback routine and attempt to take the
lock inside this one causes the deadlock.

The patch proposes the remove the device from the callback list first
and then call mlx5_mr_release() and free the remaining device MRs
explicitely.

Fixes: 0e3d0525b2f2 ("net/mlx5: fix memory event callback list")
Cc: stable@dpdk.org

Signed-off-by: Michael Baum <michaelba@mellanox.com>
Acked-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com>
Acked-by: Matan Azrad <matan@mellanox.com>
---

v2:
rephrase commit masage.


 drivers/net/mlx5/mlx5.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/net/mlx5/mlx5.c b/drivers/net/mlx5/mlx5.c
index f80e403..759491f 100644
--- a/drivers/net/mlx5/mlx5.c
+++ b/drivers/net/mlx5/mlx5.c
@@ -679,12 +679,12 @@ struct mlx5_flow_id_pool *
 	MLX5_ASSERT(rte_eal_process_type() == RTE_PROC_PRIMARY);
 	if (--sh->refcnt)
 		goto exit;
-	/* Release created Memory Regions. */
-	mlx5_mr_release(sh);
 	/* Remove from memory callback device list. */
 	rte_rwlock_write_lock(&mlx5_shared_data->mem_event_rwlock);
 	LIST_REMOVE(sh, mem_event_cb);
 	rte_rwlock_write_unlock(&mlx5_shared_data->mem_event_rwlock);
+	/* Release created Memory Regions. */
+	mlx5_mr_release(sh);
 	/* Remove context from the global device list. */
 	LIST_REMOVE(sh, next);
 	/*
-- 
1.8.3.1


^ permalink raw reply	[flat|nested] 3+ messages in thread

* Re: [dpdk-dev] [PATCH v2] net/mlx5: fix memory regions release deadlock
  2020-02-04 13:36 ` [dpdk-dev] [PATCH v2] " Michael Baum
@ 2020-02-05  9:44   ` Raslan Darawsheh
  0 siblings, 0 replies; 3+ messages in thread
From: Raslan Darawsheh @ 2020-02-05  9:44 UTC (permalink / raw)
  To: Michael Baum, dev; +Cc: Matan Azrad, Slava Ovsiienko, stable

Hi,

> -----Original Message-----
> From: dev <dev-bounces@dpdk.org> On Behalf Of Michael Baum
> Sent: Tuesday, February 4, 2020 3:36 PM
> To: dev@dpdk.org
> Cc: Matan Azrad <matan@mellanox.com>; Slava Ovsiienko
> <viacheslavo@mellanox.com>; stable@dpdk.org
> Subject: [dpdk-dev] [PATCH v2] net/mlx5: fix memory regions release
> deadlock
> 
> The mpx5 PMD maintains the list of devices for those the memory
> operation callback routines must be invoked to keep the device MRs (MR
> is the entity backing the hardware DMA transactions) consistent with the
> mapped memory.
> Each device context in the list is protected with dedicated lock on per
> device basis, which might be taken inside the callback routine.
> 
> When device is closing the PMD frees all MRs by calling
> mlx5_mr_release(), that might call rte_free() under the taken device
> lock.  If this rte_free call triggers the entire memory segment freeing
> it, in its turn, invokes the callback routine and attempt to take the
> lock inside this one causes the deadlock.
> 
> The patch proposes the remove the device from the callback list first
> and then call mlx5_mr_release() and free the remaining device MRs
> explicitely.
> 
> Fixes: 0e3d0525b2f2 ("net/mlx5: fix memory event callback list")
> Cc: stable@dpdk.org
> 
> Signed-off-by: Michael Baum <michaelba@mellanox.com>
> Acked-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com>
> Acked-by: Matan Azrad <matan@mellanox.com>
> ---
> 
> v2:
> rephrase commit masage.
> 
> 
>  drivers/net/mlx5/mlx5.c | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
> 
> diff --git a/drivers/net/mlx5/mlx5.c b/drivers/net/mlx5/mlx5.c
> index f80e403..759491f 100644
> --- a/drivers/net/mlx5/mlx5.c
> +++ b/drivers/net/mlx5/mlx5.c
> @@ -679,12 +679,12 @@ struct mlx5_flow_id_pool *
>  	MLX5_ASSERT(rte_eal_process_type() == RTE_PROC_PRIMARY);
>  	if (--sh->refcnt)
>  		goto exit;
> -	/* Release created Memory Regions. */
> -	mlx5_mr_release(sh);
>  	/* Remove from memory callback device list. */
>  	rte_rwlock_write_lock(&mlx5_shared_data->mem_event_rwlock);
>  	LIST_REMOVE(sh, mem_event_cb);
>  	rte_rwlock_write_unlock(&mlx5_shared_data-
> >mem_event_rwlock);
> +	/* Release created Memory Regions. */
> +	mlx5_mr_release(sh);
>  	/* Remove context from the global device list. */
>  	LIST_REMOVE(sh, next);
>  	/*
> --
> 1.8.3.1

Fixed typo in commit msg,

Patch applied to next-net-mlx,

Kindest regards,
Raslan Darawsheh

^ permalink raw reply	[flat|nested] 3+ messages in thread

end of thread, other threads:[~2020-02-05  9:44 UTC | newest]

Thread overview: 3+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-02-04 10:08 [dpdk-dev] [PATCH] net/mlx5: fix memory regions release deadlock Michael Baum
2020-02-04 13:36 ` [dpdk-dev] [PATCH v2] " Michael Baum
2020-02-05  9:44   ` Raslan Darawsheh

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).