* [dpdk-stable] [PATCH 1/6] net/mlx5: fix shared device context creation error flow
       [not found] <20210831203732.3411134-1-michaelba@nvidia.com>
@ 2021-08-31 20:37 ` Michael Baum
  2021-09-05  8:46   ` Raslan Darawsheh
  2021-08-31 20:37 ` [dpdk-stable] [PATCH 2/6] net/mlx5: fix PCI probing " Michael Baum
                   ` (5 subsequent siblings)
  6 siblings, 1 reply; 14+ messages in thread
From: Michael Baum @ 2021-08-31 20:37 UTC (permalink / raw)
  To: dev; +Cc: Matan Azrad, Raslan Darawsheh, Viacheslav Ovsiienko, stable
In shared device context creation, there are two validations after MR
btree memory allocation.
When one of them fails, the MR btree memory was not freed what caused a
memory leak.
Free it.
Fixes: 632f0f19056f ("net/mlx5: manage shared counters in three-level table")
Cc: stable@dpdk.org
Signed-off-by: Michael Baum <michaelba@nvidia.com>
---
 drivers/net/mlx5/mlx5.c | 2 ++
 1 file changed, 2 insertions(+)
diff --git a/drivers/net/mlx5/mlx5.c b/drivers/net/mlx5/mlx5.c
index f84e061fe7..f0ec2d1279 100644
--- a/drivers/net/mlx5/mlx5.c
+++ b/drivers/net/mlx5/mlx5.c
@@ -1254,6 +1254,8 @@ mlx5_alloc_shared_dev_ctx(const struct mlx5_dev_spawn_data *spawn,
 	MLX5_ASSERT(sh);
 	if (sh->cnt_id_tbl)
 		mlx5_l3t_destroy(sh->cnt_id_tbl);
+	if (sh->share_cache.cache.table)
+		mlx5_mr_btree_free(&sh->share_cache.cache);
 	if (sh->tis)
 		claim_zero(mlx5_devx_cmd_destroy(sh->tis));
 	if (sh->td)
-- 
2.25.1
^ permalink raw reply	[flat|nested] 14+ messages in thread
* [dpdk-stable] [PATCH 2/6] net/mlx5: fix PCI probing error flow
       [not found] <20210831203732.3411134-1-michaelba@nvidia.com>
  2021-08-31 20:37 ` [dpdk-stable] [PATCH 1/6] net/mlx5: fix shared device context creation error flow Michael Baum
@ 2021-08-31 20:37 ` Michael Baum
  2021-09-05  8:53   ` Raslan Darawsheh
  2021-08-31 20:37 ` [dpdk-stable] [PATCH 3/6] net/mlx5: fix allow duplicate pattern devarg default Michael Baum
                   ` (4 subsequent siblings)
  6 siblings, 1 reply; 14+ messages in thread
From: Michael Baum @ 2021-08-31 20:37 UTC (permalink / raw)
  To: dev; +Cc: Matan Azrad, Raslan Darawsheh, Viacheslav Ovsiienko, stable
In PCI probing, there is internal probe function that is called several
times for several PFs.
When one of them fails, the previous PFs are not destroyed what may
cause a memory leak.
Destroy them.
Fixes: 08c2772fc747 ("net/mlx5: support list of representor PF")
Cc: stable@dpdk.org
Signed-off-by: Michael Baum <michaelba@nvidia.com>
---
 drivers/net/mlx5/linux/mlx5_os.c | 13 ++++++++++++-
 drivers/net/mlx5/mlx5.c          |  2 +-
 drivers/net/mlx5/mlx5.h          |  1 +
 3 files changed, 14 insertions(+), 2 deletions(-)
diff --git a/drivers/net/mlx5/linux/mlx5_os.c b/drivers/net/mlx5/linux/mlx5_os.c
index 5f8766aa48..3d204f99f7 100644
--- a/drivers/net/mlx5/linux/mlx5_os.c
+++ b/drivers/net/mlx5/linux/mlx5_os.c
@@ -2700,9 +2700,20 @@ mlx5_os_pci_probe(struct rte_pci_device *pci_dev)
 
 	if (eth_da.nb_ports > 0) {
 		/* Iterate all port if devargs pf is range: "pf[0-1]vf[...]". */
-		for (p = 0; p < eth_da.nb_ports; p++)
+		for (p = 0; p < eth_da.nb_ports; p++) {
 			ret = mlx5_os_pci_probe_pf(pci_dev, ð_da,
 						   eth_da.ports[p]);
+			if (ret)
+				break;
+		}
+		if (ret) {
+			DRV_LOG(ERR, "Probe of PCI device " PCI_PRI_FMT " "
+				"aborted due to proding failure of PF %u",
+				pci_dev->addr.domain, pci_dev->addr.bus,
+				pci_dev->addr.devid, pci_dev->addr.function,
+				eth_da.ports[p]);
+			mlx5_net_remove(&pci_dev->device);
+		}
 	} else {
 		ret = mlx5_os_pci_probe_pf(pci_dev, ð_da, 0);
 	}
diff --git a/drivers/net/mlx5/mlx5.c b/drivers/net/mlx5/mlx5.c
index f0ec2d1279..02ea2e781e 100644
--- a/drivers/net/mlx5/mlx5.c
+++ b/drivers/net/mlx5/mlx5.c
@@ -2386,7 +2386,7 @@ mlx5_eth_find_next(uint16_t port_id, struct rte_device *odev)
  * @return
  *   0 on success, the function cannot fail.
  */
-static int
+int
 mlx5_net_remove(struct rte_device *dev)
 {
 	uint16_t port_id;
diff --git a/drivers/net/mlx5/mlx5.h b/drivers/net/mlx5/mlx5.h
index e02714e231..3581414b78 100644
--- a/drivers/net/mlx5/mlx5.h
+++ b/drivers/net/mlx5/mlx5.h
@@ -1483,6 +1483,7 @@ int mlx5_udp_tunnel_port_add(struct rte_eth_dev *dev,
 			      struct rte_eth_udp_tunnel *udp_tunnel);
 uint16_t mlx5_eth_find_next(uint16_t port_id, struct rte_device *odev);
 int mlx5_dev_close(struct rte_eth_dev *dev);
+int mlx5_net_remove(struct rte_device *dev);
 bool mlx5_is_hpf(struct rte_eth_dev *dev);
 bool mlx5_is_sf_repr(struct rte_eth_dev *dev);
 void mlx5_age_event_prepare(struct mlx5_dev_ctx_shared *sh);
-- 
2.25.1
^ permalink raw reply	[flat|nested] 14+ messages in thread
* [dpdk-stable] [PATCH 3/6] net/mlx5: fix allow duplicate pattern devarg default
       [not found] <20210831203732.3411134-1-michaelba@nvidia.com>
  2021-08-31 20:37 ` [dpdk-stable] [PATCH 1/6] net/mlx5: fix shared device context creation error flow Michael Baum
  2021-08-31 20:37 ` [dpdk-stable] [PATCH 2/6] net/mlx5: fix PCI probing " Michael Baum
@ 2021-08-31 20:37 ` Michael Baum
  2021-08-31 20:37 ` [dpdk-stable] [PATCH 4/6] common/mlx5: fix class combination validation Michael Baum
                   ` (3 subsequent siblings)
  6 siblings, 0 replies; 14+ messages in thread
From: Michael Baum @ 2021-08-31 20:37 UTC (permalink / raw)
  To: dev; +Cc: Matan Azrad, Raslan Darawsheh, Viacheslav Ovsiienko, stable
In order to allow\disallow configuring rules with identical patterns,
the new device argument 'allow_duplicate_pattern' was introduced.
The default is to allow, and it is initialized to 1 in PCI probe
function.
However, on auxiliary bus probing (for Sub-Function) it is not
initialized at all, so it's actually initialized to 0
Move the initialization to default config function which is called from
both.
Fixes: 919488fbfa71 ("net/mlx5: support Sub-Function")
Cc: stable@dpdk.org
Signed-off-by: Michael Baum <michaelba@nvidia.com>
---
 drivers/net/mlx5/linux/mlx5_os.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/net/mlx5/linux/mlx5_os.c b/drivers/net/mlx5/linux/mlx5_os.c
index 3d204f99f7..cf4de7e6f9 100644
--- a/drivers/net/mlx5/linux/mlx5_os.c
+++ b/drivers/net/mlx5/linux/mlx5_os.c
@@ -2142,6 +2142,7 @@ mlx5_os_config_default(struct mlx5_dev_config *config)
 	config->dv_flow_en = 1;
 	config->decap_en = 1;
 	config->log_hp_size = MLX5_ARG_UNSET;
+	config->allow_duplicate_pattern = 1;
 }
 
 /**
@@ -2564,7 +2565,6 @@ mlx5_os_pci_probe_pf(struct rte_pci_device *pci_dev,
 		/* Default configuration. */
 		mlx5_os_config_default(&dev_config);
 		dev_config.vf = dev_config_vf;
-		dev_config.allow_duplicate_pattern = 1;
 		list[i].numa_node = pci_dev->device.numa_node;
 		list[i].eth_dev = mlx5_dev_spawn(&pci_dev->device,
 						 &list[i],
-- 
2.25.1
^ permalink raw reply	[flat|nested] 14+ messages in thread
* [dpdk-stable] [PATCH 4/6] common/mlx5: fix class combination validation
       [not found] <20210831203732.3411134-1-michaelba@nvidia.com>
                   ` (2 preceding siblings ...)
  2021-08-31 20:37 ` [dpdk-stable] [PATCH 3/6] net/mlx5: fix allow duplicate pattern devarg default Michael Baum
@ 2021-08-31 20:37 ` Michael Baum
  2021-08-31 20:37 ` [dpdk-stable] [PATCH 5/6] common/mlx5: fix device list operation concurrency Michael Baum
                   ` (2 subsequent siblings)
  6 siblings, 0 replies; 14+ messages in thread
From: Michael Baum @ 2021-08-31 20:37 UTC (permalink / raw)
  To: dev; +Cc: Matan Azrad, Raslan Darawsheh, Viacheslav Ovsiienko, stable
The common probe function gets as a user argument the classes it should
create, and checks whether the combination is valid.
In case the device already exists, it checks the integration of the
above with the classes that the device has.
However, the function does not check the combination when the device
does not exist and it has to create it.
Check if the combination is valid for all cases.
Fixes: ad435d320473 ("common/mlx5: add bus-agnostic layer")
Cc: stable@dpdk.org
Signed-off-by: Michael Baum <michaelba@nvidia.com>
---
 drivers/common/mlx5/mlx5_common.c | 18 ++++++++++--------
 1 file changed, 10 insertions(+), 8 deletions(-)
diff --git a/drivers/common/mlx5/mlx5_common.c b/drivers/common/mlx5/mlx5_common.c
index 459cf4bcc4..f6e440dca1 100644
--- a/drivers/common/mlx5/mlx5_common.c
+++ b/drivers/common/mlx5/mlx5_common.c
@@ -317,14 +317,16 @@ mlx5_common_dev_probe(struct rte_device *eal_dev)
 		dev->dev = eal_dev;
 		TAILQ_INSERT_HEAD(&devices_list, dev, next);
 		new_device = true;
-	} else {
-		/* Validate combination here. */
-		ret = is_valid_class_combination(classes |
-						 dev->classes_loaded);
-		if (ret != 0) {
-			DRV_LOG(ERR, "Unsupported mlx5 classes combination.");
-			return ret;
-		}
+	}
+	/*
+	 * Validate combination here.
+	 * For new device, the classes_loaded field is 0 and it check only
+	 * the classes given as user device arguments.
+	 */
+	ret = is_valid_class_combination(classes | dev->classes_loaded);
+	if (ret != 0) {
+		DRV_LOG(ERR, "Unsupported mlx5 classes combination.");
+		goto class_err;
 	}
 	ret = drivers_probe(dev, classes);
 	if (ret)
-- 
2.25.1
^ permalink raw reply	[flat|nested] 14+ messages in thread
* [dpdk-stable] [PATCH 5/6] common/mlx5: fix device list operation concurrency
       [not found] <20210831203732.3411134-1-michaelba@nvidia.com>
                   ` (3 preceding siblings ...)
  2021-08-31 20:37 ` [dpdk-stable] [PATCH 4/6] common/mlx5: fix class combination validation Michael Baum
@ 2021-08-31 20:37 ` Michael Baum
  2021-08-31 20:37 ` [dpdk-stable] [PATCH 6/6] common/mlx5: fix resource cleanliness in a device remove Michael Baum
       [not found] ` <20210912103628.257499-1-michaelba@nvidia.com>
  6 siblings, 0 replies; 14+ messages in thread
From: Michael Baum @ 2021-08-31 20:37 UTC (permalink / raw)
  To: dev; +Cc: Matan Azrad, Raslan Darawsheh, Viacheslav Ovsiienko, stable
The MLX5 common driver has a global list of mlx5 devices which are
probed.
In probe function it create one and insert it to the list. Similarly it
removes the device in remove function.
These operations are not safe as there can be such operations in
parallel, by different threads.
Add global lock for the list and use it to insert or remove.
Fixes: 8a41f4deccc3 ("common/mlx5: introduce layer for multiple class drivers")
Cc: stable@dpdk.org
Signed-off-by: Michael Baum <michaelba@nvidia.com>
---
 drivers/common/mlx5/mlx5_common.c | 6 ++++++
 1 file changed, 6 insertions(+)
diff --git a/drivers/common/mlx5/mlx5_common.c b/drivers/common/mlx5/mlx5_common.c
index f6e440dca1..4321cb3a9c 100644
--- a/drivers/common/mlx5/mlx5_common.c
+++ b/drivers/common/mlx5/mlx5_common.c
@@ -50,6 +50,7 @@ static TAILQ_HEAD(mlx5_drivers, mlx5_class_driver) drivers_list =
 /* Head of devices. */
 static TAILQ_HEAD(mlx5_devices, mlx5_common_device) devices_list =
 				TAILQ_HEAD_INITIALIZER(devices_list);
+static pthread_mutex_t devices_list_lock;
 
 static const struct {
 	const char *name;
@@ -222,7 +223,9 @@ mlx5_dev_to_pci_str(const struct rte_device *dev, char *addr, size_t size)
 static void
 dev_release(struct mlx5_common_device *dev)
 {
+	pthread_mutex_lock(&devices_list_lock);
 	TAILQ_REMOVE(&devices_list, dev, next);
+	pthread_mutex_unlock(&devices_list_lock);
 	rte_free(dev);
 }
 
@@ -315,7 +318,9 @@ mlx5_common_dev_probe(struct rte_device *eal_dev)
 		if (!dev)
 			return -ENOMEM;
 		dev->dev = eal_dev;
+		pthread_mutex_lock(&devices_list_lock);
 		TAILQ_INSERT_HEAD(&devices_list, dev, next);
+		pthread_mutex_unlock(&devices_list_lock);
 		new_device = true;
 	}
 	/*
@@ -440,6 +445,7 @@ mlx5_common_init(void)
 	if (mlx5_common_initialized)
 		return;
 
+	pthread_mutex_init(&devices_list_lock, NULL);
 	mlx5_glue_constructor();
 	mlx5_common_driver_init();
 	mlx5_common_initialized = true;
-- 
2.25.1
^ permalink raw reply	[flat|nested] 14+ messages in thread
* [dpdk-stable] [PATCH 6/6] common/mlx5: fix resource cleanliness in a device remove
       [not found] <20210831203732.3411134-1-michaelba@nvidia.com>
                   ` (4 preceding siblings ...)
  2021-08-31 20:37 ` [dpdk-stable] [PATCH 5/6] common/mlx5: fix device list operation concurrency Michael Baum
@ 2021-08-31 20:37 ` Michael Baum
       [not found] ` <20210912103628.257499-1-michaelba@nvidia.com>
  6 siblings, 0 replies; 14+ messages in thread
From: Michael Baum @ 2021-08-31 20:37 UTC (permalink / raw)
  To: dev; +Cc: Matan Azrad, Raslan Darawsheh, Viacheslav Ovsiienko, stable
The common remove function call in a loop to remove function for each
driver which have been registered.
If all removes are succeeded, it return 0 without to free the device
which allocated in probe function. Otherwise, it free the device.
In fact we expect exactly the opposite behavior. If all removes are
failed, it return error without to free the device which allocated in
probe function. Otherwise, it free the device and return 0.
Replace it with the correct behavior.
Fixes: 8a41f4deccc3 ("common/mlx5: introduce layer for multiple class drivers")
Cc: stable@dpdk.org
Signed-off-by: Michael Baum <michaelba@nvidia.com>
---
 drivers/common/mlx5/mlx5_common.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/common/mlx5/mlx5_common.c b/drivers/common/mlx5/mlx5_common.c
index 4321cb3a9c..6885bbb1d2 100644
--- a/drivers/common/mlx5/mlx5_common.c
+++ b/drivers/common/mlx5/mlx5_common.c
@@ -354,7 +354,7 @@ mlx5_common_dev_remove(struct rte_device *eal_dev)
 		return -ENODEV;
 	/* Matching device found, cleanup and unload drivers. */
 	ret = drivers_remove(dev, dev->classes_loaded);
-	if (ret != 0)
+	if (ret == 0)
 		dev_release(dev);
 	return ret;
 }
-- 
2.25.1
^ permalink raw reply	[flat|nested] 14+ messages in thread
* Re: [dpdk-stable] [PATCH 1/6] net/mlx5: fix shared device context creation error flow
  2021-08-31 20:37 ` [dpdk-stable] [PATCH 1/6] net/mlx5: fix shared device context creation error flow Michael Baum
@ 2021-09-05  8:46   ` Raslan Darawsheh
  0 siblings, 0 replies; 14+ messages in thread
From: Raslan Darawsheh @ 2021-09-05  8:46 UTC (permalink / raw)
  To: Michael Baum, dev; +Cc: Matan Azrad, Slava Ovsiienko, stable
Hi,
> -----Original Message-----
> From: Michael Baum <michaelba@nvidia.com>
> Sent: Tuesday, August 31, 2021 11:37 PM
> To: dev@dpdk.org
> Cc: Matan Azrad <matan@nvidia.com>; Raslan Darawsheh
> <rasland@nvidia.com>; Slava Ovsiienko <viacheslavo@nvidia.com>;
> stable@dpdk.org
> Subject: [PATCH 1/6] net/mlx5: fix shared device context creation error flow
How about something like this:
fix memory leak in the shared device context creation
> 
> In shared device context creation, there are two validations after MR
> btree memory allocation.
> 
> When one of them fails, the MR btree memory was not freed what caused a
> memory leak.
> 
> Free it.
How about changing it to something like this:
In shared device context creation, there is a missing validation when one of
the btree memory allocation fails that will cause a memory leak.
This adds a proper check to clean resources in case of failure.
> 
> Fixes: 632f0f19056f ("net/mlx5: manage shared counters in three-level
> table")
> Cc: stable@dpdk.org
> 
> Signed-off-by: Michael Baum <michaelba@nvidia.com>
> ---
>  drivers/net/mlx5/mlx5.c | 2 ++
>  1 file changed, 2 insertions(+)
> 
> diff --git a/drivers/net/mlx5/mlx5.c b/drivers/net/mlx5/mlx5.c
> index f84e061fe7..f0ec2d1279 100644
> --- a/drivers/net/mlx5/mlx5.c
> +++ b/drivers/net/mlx5/mlx5.c
> @@ -1254,6 +1254,8 @@ mlx5_alloc_shared_dev_ctx(const struct
> mlx5_dev_spawn_data *spawn,
>  	MLX5_ASSERT(sh);
>  	if (sh->cnt_id_tbl)
>  		mlx5_l3t_destroy(sh->cnt_id_tbl);
> +	if (sh->share_cache.cache.table)
> +		mlx5_mr_btree_free(&sh->share_cache.cache);
>  	if (sh->tis)
>  		claim_zero(mlx5_devx_cmd_destroy(sh->tis));
>  	if (sh->td)
> --
> 2.25.1
^ permalink raw reply	[flat|nested] 14+ messages in thread
* Re: [dpdk-stable] [PATCH 2/6] net/mlx5: fix PCI probing error flow
  2021-08-31 20:37 ` [dpdk-stable] [PATCH 2/6] net/mlx5: fix PCI probing " Michael Baum
@ 2021-09-05  8:53   ` Raslan Darawsheh
  0 siblings, 0 replies; 14+ messages in thread
From: Raslan Darawsheh @ 2021-09-05  8:53 UTC (permalink / raw)
  To: Michael Baum, dev; +Cc: Matan Azrad, Slava Ovsiienko, stable
Hi,
> -----Original Message-----
> From: Michael Baum <michaelba@nvidia.com>
> Sent: Tuesday, August 31, 2021 11:37 PM
> To: dev@dpdk.org
> Cc: Matan Azrad <matan@nvidia.com>; Raslan Darawsheh
> <rasland@nvidia.com>; Slava Ovsiienko <viacheslavo@nvidia.com>;
> stable@dpdk.org
> Subject: [PATCH 2/6] net/mlx5: fix PCI probing error flow
> 
How about changing to something like this:
Fix memory leak in PCI prob
> In PCI probing, there is internal probe function that is called several times for
> several PFs.
During PCI probe, the internal probe function is called per PF.
> 
> When one of them fails, the previous PFs are not destroyed what may cause
> a memory leak.
If one of them fails, it was missing a prober destroy for the previously probed PFs
> 
> Destroy them.
This fixes the behavior by destroying all previously probed PF's
> 
> Fixes: 08c2772fc747 ("net/mlx5: support list of representor PF")
> Cc: stable@dpdk.org
> 
> Signed-off-by: Michael Baum <michaelba@nvidia.com>
> ---
>  drivers/net/mlx5/linux/mlx5_os.c | 13 ++++++++++++-
>  drivers/net/mlx5/mlx5.c          |  2 +-
>  drivers/net/mlx5/mlx5.h          |  1 +
>  3 files changed, 14 insertions(+), 2 deletions(-)
> 
> diff --git a/drivers/net/mlx5/linux/mlx5_os.c
> b/drivers/net/mlx5/linux/mlx5_os.c
> index 5f8766aa48..3d204f99f7 100644
> --- a/drivers/net/mlx5/linux/mlx5_os.c
> +++ b/drivers/net/mlx5/linux/mlx5_os.c
> @@ -2700,9 +2700,20 @@ mlx5_os_pci_probe(struct rte_pci_device
> *pci_dev)
> 
>  	if (eth_da.nb_ports > 0) {
>  		/* Iterate all port if devargs pf is range: "pf[0-1]vf[...]". */
> -		for (p = 0; p < eth_da.nb_ports; p++)
> +		for (p = 0; p < eth_da.nb_ports; p++) {
>  			ret = mlx5_os_pci_probe_pf(pci_dev, ð_da,
>  						   eth_da.ports[p]);
> +			if (ret)
> +				break;
> +		}
> +		if (ret) {
> +			DRV_LOG(ERR, "Probe of PCI device " PCI_PRI_FMT "
> "
> +				"aborted due to proding failure of PF %u",
> +				pci_dev->addr.domain, pci_dev->addr.bus,
> +				pci_dev->addr.devid, pci_dev-
> >addr.function,
> +				eth_da.ports[p]);
> +			mlx5_net_remove(&pci_dev->device);
> +		}
>  	} else {
>  		ret = mlx5_os_pci_probe_pf(pci_dev, ð_da, 0);
>  	}
> diff --git a/drivers/net/mlx5/mlx5.c b/drivers/net/mlx5/mlx5.c index
> f0ec2d1279..02ea2e781e 100644
> --- a/drivers/net/mlx5/mlx5.c
> +++ b/drivers/net/mlx5/mlx5.c
> @@ -2386,7 +2386,7 @@ mlx5_eth_find_next(uint16_t port_id, struct
> rte_device *odev)
>   * @return
>   *   0 on success, the function cannot fail.
>   */
> -static int
> +int
>  mlx5_net_remove(struct rte_device *dev)  {
>  	uint16_t port_id;
> diff --git a/drivers/net/mlx5/mlx5.h b/drivers/net/mlx5/mlx5.h index
> e02714e231..3581414b78 100644
> --- a/drivers/net/mlx5/mlx5.h
> +++ b/drivers/net/mlx5/mlx5.h
> @@ -1483,6 +1483,7 @@ int mlx5_udp_tunnel_port_add(struct rte_eth_dev
> *dev,
>  			      struct rte_eth_udp_tunnel *udp_tunnel);
> uint16_t mlx5_eth_find_next(uint16_t port_id, struct rte_device *odev);  int
> mlx5_dev_close(struct rte_eth_dev *dev);
> +int mlx5_net_remove(struct rte_device *dev);
>  bool mlx5_is_hpf(struct rte_eth_dev *dev);  bool mlx5_is_sf_repr(struct
> rte_eth_dev *dev);  void mlx5_age_event_prepare(struct
> mlx5_dev_ctx_shared *sh);
> --
> 2.25.1
^ permalink raw reply	[flat|nested] 14+ messages in thread
* [dpdk-stable] [PATCH v2 1/6] net/mlx5: fix memory leak in the SH creation
       [not found] ` <20210912103628.257499-1-michaelba@nvidia.com>
@ 2021-09-12 10:36   ` Michael Baum
  2021-09-12 10:36   ` [dpdk-stable] [PATCH v2 2/6] net/mlx5: fix memory leak in PCI probe Michael Baum
                     ` (4 subsequent siblings)
  5 siblings, 0 replies; 14+ messages in thread
From: Michael Baum @ 2021-09-12 10:36 UTC (permalink / raw)
  To: dev; +Cc: Matan Azrad, Raslan Darawsheh, Viacheslav Ovsiienko, stable
In shared device context creation, there is a missing validation when
one of the btree memory allocation fails that will cause a memory leak.
This adds a proper check to clean resources in case of failure.
Fixes: 632f0f19056f ("net/mlx5: manage shared counters in three-level table")
Cc: stable@dpdk.org
Signed-off-by: Michael Baum <michaelba@nvidia.com>
Acked-by: Matan Azrad <matan@nvidia.com>
---
 drivers/net/mlx5/mlx5.c | 2 ++
 1 file changed, 2 insertions(+)
diff --git a/drivers/net/mlx5/mlx5.c b/drivers/net/mlx5/mlx5.c
index f84e061fe7..f0ec2d1279 100644
--- a/drivers/net/mlx5/mlx5.c
+++ b/drivers/net/mlx5/mlx5.c
@@ -1254,6 +1254,8 @@ mlx5_alloc_shared_dev_ctx(const struct mlx5_dev_spawn_data *spawn,
 	MLX5_ASSERT(sh);
 	if (sh->cnt_id_tbl)
 		mlx5_l3t_destroy(sh->cnt_id_tbl);
+	if (sh->share_cache.cache.table)
+		mlx5_mr_btree_free(&sh->share_cache.cache);
 	if (sh->tis)
 		claim_zero(mlx5_devx_cmd_destroy(sh->tis));
 	if (sh->td)
-- 
2.25.1
^ permalink raw reply	[flat|nested] 14+ messages in thread
* [dpdk-stable] [PATCH v2 2/6] net/mlx5: fix memory leak in PCI probe
       [not found] ` <20210912103628.257499-1-michaelba@nvidia.com>
  2021-09-12 10:36   ` [dpdk-stable] [PATCH v2 1/6] net/mlx5: fix memory leak in the SH creation Michael Baum
@ 2021-09-12 10:36   ` Michael Baum
  2021-09-12 10:36   ` [dpdk-stable] [PATCH v2 3/6] net/mlx5: fix allow duplicate pattern devarg default Michael Baum
                     ` (3 subsequent siblings)
  5 siblings, 0 replies; 14+ messages in thread
From: Michael Baum @ 2021-09-12 10:36 UTC (permalink / raw)
  To: dev; +Cc: Matan Azrad, Raslan Darawsheh, Viacheslav Ovsiienko, stable
During PCI probe, the internal probe function is called per PF.
If one of them fails, it was missing a prober destroy for the previously
probed PFs.
This fixes the behavior by destroying all previously probed PFs.
Fixes: 08c2772fc747 ("net/mlx5: support list of representor PF")
Cc: stable@dpdk.org
Signed-off-by: Michael Baum <michaelba@nvidia.com>
Acked-by: Matan Azrad <matan@nvidia.com>
---
 drivers/net/mlx5/linux/mlx5_os.c | 13 ++++++++++++-
 drivers/net/mlx5/mlx5.c          |  2 +-
 drivers/net/mlx5/mlx5.h          |  1 +
 3 files changed, 14 insertions(+), 2 deletions(-)
diff --git a/drivers/net/mlx5/linux/mlx5_os.c b/drivers/net/mlx5/linux/mlx5_os.c
index 5f8766aa48..3d204f99f7 100644
--- a/drivers/net/mlx5/linux/mlx5_os.c
+++ b/drivers/net/mlx5/linux/mlx5_os.c
@@ -2700,9 +2700,20 @@ mlx5_os_pci_probe(struct rte_pci_device *pci_dev)
 
 	if (eth_da.nb_ports > 0) {
 		/* Iterate all port if devargs pf is range: "pf[0-1]vf[...]". */
-		for (p = 0; p < eth_da.nb_ports; p++)
+		for (p = 0; p < eth_da.nb_ports; p++) {
 			ret = mlx5_os_pci_probe_pf(pci_dev, ð_da,
 						   eth_da.ports[p]);
+			if (ret)
+				break;
+		}
+		if (ret) {
+			DRV_LOG(ERR, "Probe of PCI device " PCI_PRI_FMT " "
+				"aborted due to proding failure of PF %u",
+				pci_dev->addr.domain, pci_dev->addr.bus,
+				pci_dev->addr.devid, pci_dev->addr.function,
+				eth_da.ports[p]);
+			mlx5_net_remove(&pci_dev->device);
+		}
 	} else {
 		ret = mlx5_os_pci_probe_pf(pci_dev, ð_da, 0);
 	}
diff --git a/drivers/net/mlx5/mlx5.c b/drivers/net/mlx5/mlx5.c
index f0ec2d1279..02ea2e781e 100644
--- a/drivers/net/mlx5/mlx5.c
+++ b/drivers/net/mlx5/mlx5.c
@@ -2386,7 +2386,7 @@ mlx5_eth_find_next(uint16_t port_id, struct rte_device *odev)
  * @return
  *   0 on success, the function cannot fail.
  */
-static int
+int
 mlx5_net_remove(struct rte_device *dev)
 {
 	uint16_t port_id;
diff --git a/drivers/net/mlx5/mlx5.h b/drivers/net/mlx5/mlx5.h
index e02714e231..3581414b78 100644
--- a/drivers/net/mlx5/mlx5.h
+++ b/drivers/net/mlx5/mlx5.h
@@ -1483,6 +1483,7 @@ int mlx5_udp_tunnel_port_add(struct rte_eth_dev *dev,
 			      struct rte_eth_udp_tunnel *udp_tunnel);
 uint16_t mlx5_eth_find_next(uint16_t port_id, struct rte_device *odev);
 int mlx5_dev_close(struct rte_eth_dev *dev);
+int mlx5_net_remove(struct rte_device *dev);
 bool mlx5_is_hpf(struct rte_eth_dev *dev);
 bool mlx5_is_sf_repr(struct rte_eth_dev *dev);
 void mlx5_age_event_prepare(struct mlx5_dev_ctx_shared *sh);
-- 
2.25.1
^ permalink raw reply	[flat|nested] 14+ messages in thread
* [dpdk-stable] [PATCH v2 3/6] net/mlx5: fix allow duplicate pattern devarg default
       [not found] ` <20210912103628.257499-1-michaelba@nvidia.com>
  2021-09-12 10:36   ` [dpdk-stable] [PATCH v2 1/6] net/mlx5: fix memory leak in the SH creation Michael Baum
  2021-09-12 10:36   ` [dpdk-stable] [PATCH v2 2/6] net/mlx5: fix memory leak in PCI probe Michael Baum
@ 2021-09-12 10:36   ` Michael Baum
  2021-09-12 10:36   ` [dpdk-stable] [PATCH v2 4/6] common/mlx5: fix class combination validation Michael Baum
                     ` (2 subsequent siblings)
  5 siblings, 0 replies; 14+ messages in thread
From: Michael Baum @ 2021-09-12 10:36 UTC (permalink / raw)
  To: dev; +Cc: Matan Azrad, Raslan Darawsheh, Viacheslav Ovsiienko, stable
In order to allow\disallow configuring rules with identical patterns,
the new device argument 'allow_duplicate_pattern' was introduced.
The default is to allow, and it is initialized to 1 in PCI probe
function.
However, on auxiliary bus probing (for Sub-Function) it is not
initialized at all, so it's actually initialized to 0
Move the initialization to default config function which is called from
both.
Fixes: 919488fbfa71 ("net/mlx5: support Sub-Function")
Cc: stable@dpdk.org
Signed-off-by: Michael Baum <michaelba@nvidia.com>
Acked-by: Matan Azrad <matan@nvidia.com>
---
 drivers/net/mlx5/linux/mlx5_os.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/net/mlx5/linux/mlx5_os.c b/drivers/net/mlx5/linux/mlx5_os.c
index 3d204f99f7..cf4de7e6f9 100644
--- a/drivers/net/mlx5/linux/mlx5_os.c
+++ b/drivers/net/mlx5/linux/mlx5_os.c
@@ -2142,6 +2142,7 @@ mlx5_os_config_default(struct mlx5_dev_config *config)
 	config->dv_flow_en = 1;
 	config->decap_en = 1;
 	config->log_hp_size = MLX5_ARG_UNSET;
+	config->allow_duplicate_pattern = 1;
 }
 
 /**
@@ -2564,7 +2565,6 @@ mlx5_os_pci_probe_pf(struct rte_pci_device *pci_dev,
 		/* Default configuration. */
 		mlx5_os_config_default(&dev_config);
 		dev_config.vf = dev_config_vf;
-		dev_config.allow_duplicate_pattern = 1;
 		list[i].numa_node = pci_dev->device.numa_node;
 		list[i].eth_dev = mlx5_dev_spawn(&pci_dev->device,
 						 &list[i],
-- 
2.25.1
^ permalink raw reply	[flat|nested] 14+ messages in thread
* [dpdk-stable] [PATCH v2 4/6] common/mlx5: fix class combination validation
       [not found] ` <20210912103628.257499-1-michaelba@nvidia.com>
                     ` (2 preceding siblings ...)
  2021-09-12 10:36   ` [dpdk-stable] [PATCH v2 3/6] net/mlx5: fix allow duplicate pattern devarg default Michael Baum
@ 2021-09-12 10:36   ` Michael Baum
  2021-09-12 10:36   ` [dpdk-stable] [PATCH v2 5/6] common/mlx5: fix device list operation concurrency Michael Baum
  2021-09-12 10:36   ` [dpdk-stable] [PATCH v2 6/6] common/mlx5: fix resource cleanliness in a device remove Michael Baum
  5 siblings, 0 replies; 14+ messages in thread
From: Michael Baum @ 2021-09-12 10:36 UTC (permalink / raw)
  To: dev; +Cc: Matan Azrad, Raslan Darawsheh, Viacheslav Ovsiienko, stable
The common probe function gets as a user argument the classes it should
create, and checks whether the combination is valid.
In case the device already exists, it checks the integration of the
above with the classes that the device has.
However, the function does not check the combination when the device
does not exist and it has to create it.
Check if the combination is valid for all cases.
Fixes: ad435d320473 ("common/mlx5: add bus-agnostic layer")
Cc: stable@dpdk.org
Signed-off-by: Michael Baum <michaelba@nvidia.com>
Acked-by: Matan Azrad <matan@nvidia.com>
---
 drivers/common/mlx5/mlx5_common.c | 18 ++++++++++--------
 1 file changed, 10 insertions(+), 8 deletions(-)
diff --git a/drivers/common/mlx5/mlx5_common.c b/drivers/common/mlx5/mlx5_common.c
index 459cf4bcc4..f6e440dca1 100644
--- a/drivers/common/mlx5/mlx5_common.c
+++ b/drivers/common/mlx5/mlx5_common.c
@@ -317,14 +317,16 @@ mlx5_common_dev_probe(struct rte_device *eal_dev)
 		dev->dev = eal_dev;
 		TAILQ_INSERT_HEAD(&devices_list, dev, next);
 		new_device = true;
-	} else {
-		/* Validate combination here. */
-		ret = is_valid_class_combination(classes |
-						 dev->classes_loaded);
-		if (ret != 0) {
-			DRV_LOG(ERR, "Unsupported mlx5 classes combination.");
-			return ret;
-		}
+	}
+	/*
+	 * Validate combination here.
+	 * For new device, the classes_loaded field is 0 and it check only
+	 * the classes given as user device arguments.
+	 */
+	ret = is_valid_class_combination(classes | dev->classes_loaded);
+	if (ret != 0) {
+		DRV_LOG(ERR, "Unsupported mlx5 classes combination.");
+		goto class_err;
 	}
 	ret = drivers_probe(dev, classes);
 	if (ret)
-- 
2.25.1
^ permalink raw reply	[flat|nested] 14+ messages in thread
* [dpdk-stable] [PATCH v2 5/6] common/mlx5: fix device list operation concurrency
       [not found] ` <20210912103628.257499-1-michaelba@nvidia.com>
                     ` (3 preceding siblings ...)
  2021-09-12 10:36   ` [dpdk-stable] [PATCH v2 4/6] common/mlx5: fix class combination validation Michael Baum
@ 2021-09-12 10:36   ` Michael Baum
  2021-09-12 10:36   ` [dpdk-stable] [PATCH v2 6/6] common/mlx5: fix resource cleanliness in a device remove Michael Baum
  5 siblings, 0 replies; 14+ messages in thread
From: Michael Baum @ 2021-09-12 10:36 UTC (permalink / raw)
  To: dev; +Cc: Matan Azrad, Raslan Darawsheh, Viacheslav Ovsiienko, stable
The MLX5 common driver has a global list of mlx5 devices which are
probed.
In probe function it create one and insert it to the list. Similarly it
removes the device in remove function.
These operations are not safe as there can be such operations in
parallel, by different threads.
Add global lock for the list and use it to insert or remove.
Fixes: 8a41f4deccc3 ("common/mlx5: introduce layer for multiple class drivers")
Cc: stable@dpdk.org
Signed-off-by: Michael Baum <michaelba@nvidia.com>
Acked-by: Matan Azrad <matan@nvidia.com>
---
 drivers/common/mlx5/mlx5_common.c | 6 ++++++
 1 file changed, 6 insertions(+)
diff --git a/drivers/common/mlx5/mlx5_common.c b/drivers/common/mlx5/mlx5_common.c
index f6e440dca1..4321cb3a9c 100644
--- a/drivers/common/mlx5/mlx5_common.c
+++ b/drivers/common/mlx5/mlx5_common.c
@@ -50,6 +50,7 @@ static TAILQ_HEAD(mlx5_drivers, mlx5_class_driver) drivers_list =
 /* Head of devices. */
 static TAILQ_HEAD(mlx5_devices, mlx5_common_device) devices_list =
 				TAILQ_HEAD_INITIALIZER(devices_list);
+static pthread_mutex_t devices_list_lock;
 
 static const struct {
 	const char *name;
@@ -222,7 +223,9 @@ mlx5_dev_to_pci_str(const struct rte_device *dev, char *addr, size_t size)
 static void
 dev_release(struct mlx5_common_device *dev)
 {
+	pthread_mutex_lock(&devices_list_lock);
 	TAILQ_REMOVE(&devices_list, dev, next);
+	pthread_mutex_unlock(&devices_list_lock);
 	rte_free(dev);
 }
 
@@ -315,7 +318,9 @@ mlx5_common_dev_probe(struct rte_device *eal_dev)
 		if (!dev)
 			return -ENOMEM;
 		dev->dev = eal_dev;
+		pthread_mutex_lock(&devices_list_lock);
 		TAILQ_INSERT_HEAD(&devices_list, dev, next);
+		pthread_mutex_unlock(&devices_list_lock);
 		new_device = true;
 	}
 	/*
@@ -440,6 +445,7 @@ mlx5_common_init(void)
 	if (mlx5_common_initialized)
 		return;
 
+	pthread_mutex_init(&devices_list_lock, NULL);
 	mlx5_glue_constructor();
 	mlx5_common_driver_init();
 	mlx5_common_initialized = true;
-- 
2.25.1
^ permalink raw reply	[flat|nested] 14+ messages in thread
* [dpdk-stable] [PATCH v2 6/6] common/mlx5: fix resource cleanliness in a device remove
       [not found] ` <20210912103628.257499-1-michaelba@nvidia.com>
                     ` (4 preceding siblings ...)
  2021-09-12 10:36   ` [dpdk-stable] [PATCH v2 5/6] common/mlx5: fix device list operation concurrency Michael Baum
@ 2021-09-12 10:36   ` Michael Baum
  5 siblings, 0 replies; 14+ messages in thread
From: Michael Baum @ 2021-09-12 10:36 UTC (permalink / raw)
  To: dev; +Cc: Matan Azrad, Raslan Darawsheh, Viacheslav Ovsiienko, stable
The common remove function call in a loop to remove function for each
driver which have been registered.
If all removes are succeeded, it return 0 without to free the device
which allocated in probe function. Otherwise, it free the device.
In fact we expect exactly the opposite behavior. If all removes are
failed, it return error without to free the device which allocated in
probe function. Otherwise, it free the device and return 0.
Replace it with the correct behavior.
Fixes: 8a41f4deccc3 ("common/mlx5: introduce layer for multiple class drivers")
Cc: stable@dpdk.org
Signed-off-by: Michael Baum <michaelba@nvidia.com>
Acked-by: Matan Azrad <matan@nvidia.com>
---
 drivers/common/mlx5/mlx5_common.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/common/mlx5/mlx5_common.c b/drivers/common/mlx5/mlx5_common.c
index 4321cb3a9c..6885bbb1d2 100644
--- a/drivers/common/mlx5/mlx5_common.c
+++ b/drivers/common/mlx5/mlx5_common.c
@@ -354,7 +354,7 @@ mlx5_common_dev_remove(struct rte_device *eal_dev)
 		return -ENODEV;
 	/* Matching device found, cleanup and unload drivers. */
 	ret = drivers_remove(dev, dev->classes_loaded);
-	if (ret != 0)
+	if (ret == 0)
 		dev_release(dev);
 	return ret;
 }
-- 
2.25.1
^ permalink raw reply	[flat|nested] 14+ messages in thread
end of thread, other threads:[~2021-09-12 10:37 UTC | newest]
Thread overview: 14+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
     [not found] <20210831203732.3411134-1-michaelba@nvidia.com>
2021-08-31 20:37 ` [dpdk-stable] [PATCH 1/6] net/mlx5: fix shared device context creation error flow Michael Baum
2021-09-05  8:46   ` Raslan Darawsheh
2021-08-31 20:37 ` [dpdk-stable] [PATCH 2/6] net/mlx5: fix PCI probing " Michael Baum
2021-09-05  8:53   ` Raslan Darawsheh
2021-08-31 20:37 ` [dpdk-stable] [PATCH 3/6] net/mlx5: fix allow duplicate pattern devarg default Michael Baum
2021-08-31 20:37 ` [dpdk-stable] [PATCH 4/6] common/mlx5: fix class combination validation Michael Baum
2021-08-31 20:37 ` [dpdk-stable] [PATCH 5/6] common/mlx5: fix device list operation concurrency Michael Baum
2021-08-31 20:37 ` [dpdk-stable] [PATCH 6/6] common/mlx5: fix resource cleanliness in a device remove Michael Baum
     [not found] ` <20210912103628.257499-1-michaelba@nvidia.com>
2021-09-12 10:36   ` [dpdk-stable] [PATCH v2 1/6] net/mlx5: fix memory leak in the SH creation Michael Baum
2021-09-12 10:36   ` [dpdk-stable] [PATCH v2 2/6] net/mlx5: fix memory leak in PCI probe Michael Baum
2021-09-12 10:36   ` [dpdk-stable] [PATCH v2 3/6] net/mlx5: fix allow duplicate pattern devarg default Michael Baum
2021-09-12 10:36   ` [dpdk-stable] [PATCH v2 4/6] common/mlx5: fix class combination validation Michael Baum
2021-09-12 10:36   ` [dpdk-stable] [PATCH v2 5/6] common/mlx5: fix device list operation concurrency Michael Baum
2021-09-12 10:36   ` [dpdk-stable] [PATCH v2 6/6] common/mlx5: fix resource cleanliness in a device remove Michael Baum
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).