In shared device context creation, there are two validations after MR btree memory allocation. When one of them fails, the MR btree memory was not freed what caused a memory leak. Free it. Fixes: 632f0f19056f ("net/mlx5: manage shared counters in three-level table") Cc: stable@dpdk.org Signed-off-by: Michael Baum <michaelba@nvidia.com> --- drivers/net/mlx5/mlx5.c | 2 ++ 1 file changed, 2 insertions(+) diff --git a/drivers/net/mlx5/mlx5.c b/drivers/net/mlx5/mlx5.c index f84e061fe7..f0ec2d1279 100644 --- a/drivers/net/mlx5/mlx5.c +++ b/drivers/net/mlx5/mlx5.c @@ -1254,6 +1254,8 @@ mlx5_alloc_shared_dev_ctx(const struct mlx5_dev_spawn_data *spawn, MLX5_ASSERT(sh); if (sh->cnt_id_tbl) mlx5_l3t_destroy(sh->cnt_id_tbl); + if (sh->share_cache.cache.table) + mlx5_mr_btree_free(&sh->share_cache.cache); if (sh->tis) claim_zero(mlx5_devx_cmd_destroy(sh->tis)); if (sh->td) -- 2.25.1
In PCI probing, there is internal probe function that is called several times for several PFs. When one of them fails, the previous PFs are not destroyed what may cause a memory leak. Destroy them. Fixes: 08c2772fc747 ("net/mlx5: support list of representor PF") Cc: stable@dpdk.org Signed-off-by: Michael Baum <michaelba@nvidia.com> --- drivers/net/mlx5/linux/mlx5_os.c | 13 ++++++++++++- drivers/net/mlx5/mlx5.c | 2 +- drivers/net/mlx5/mlx5.h | 1 + 3 files changed, 14 insertions(+), 2 deletions(-) diff --git a/drivers/net/mlx5/linux/mlx5_os.c b/drivers/net/mlx5/linux/mlx5_os.c index 5f8766aa48..3d204f99f7 100644 --- a/drivers/net/mlx5/linux/mlx5_os.c +++ b/drivers/net/mlx5/linux/mlx5_os.c @@ -2700,9 +2700,20 @@ mlx5_os_pci_probe(struct rte_pci_device *pci_dev) if (eth_da.nb_ports > 0) { /* Iterate all port if devargs pf is range: "pf[0-1]vf[...]". */ - for (p = 0; p < eth_da.nb_ports; p++) + for (p = 0; p < eth_da.nb_ports; p++) { ret = mlx5_os_pci_probe_pf(pci_dev, ð_da, eth_da.ports[p]); + if (ret) + break; + } + if (ret) { + DRV_LOG(ERR, "Probe of PCI device " PCI_PRI_FMT " " + "aborted due to proding failure of PF %u", + pci_dev->addr.domain, pci_dev->addr.bus, + pci_dev->addr.devid, pci_dev->addr.function, + eth_da.ports[p]); + mlx5_net_remove(&pci_dev->device); + } } else { ret = mlx5_os_pci_probe_pf(pci_dev, ð_da, 0); } diff --git a/drivers/net/mlx5/mlx5.c b/drivers/net/mlx5/mlx5.c index f0ec2d1279..02ea2e781e 100644 --- a/drivers/net/mlx5/mlx5.c +++ b/drivers/net/mlx5/mlx5.c @@ -2386,7 +2386,7 @@ mlx5_eth_find_next(uint16_t port_id, struct rte_device *odev) * @return * 0 on success, the function cannot fail. */ -static int +int mlx5_net_remove(struct rte_device *dev) { uint16_t port_id; diff --git a/drivers/net/mlx5/mlx5.h b/drivers/net/mlx5/mlx5.h index e02714e231..3581414b78 100644 --- a/drivers/net/mlx5/mlx5.h +++ b/drivers/net/mlx5/mlx5.h @@ -1483,6 +1483,7 @@ int mlx5_udp_tunnel_port_add(struct rte_eth_dev *dev, struct rte_eth_udp_tunnel *udp_tunnel); uint16_t mlx5_eth_find_next(uint16_t port_id, struct rte_device *odev); int mlx5_dev_close(struct rte_eth_dev *dev); +int mlx5_net_remove(struct rte_device *dev); bool mlx5_is_hpf(struct rte_eth_dev *dev); bool mlx5_is_sf_repr(struct rte_eth_dev *dev); void mlx5_age_event_prepare(struct mlx5_dev_ctx_shared *sh); -- 2.25.1
In order to allow\disallow configuring rules with identical patterns, the new device argument 'allow_duplicate_pattern' was introduced. The default is to allow, and it is initialized to 1 in PCI probe function. However, on auxiliary bus probing (for Sub-Function) it is not initialized at all, so it's actually initialized to 0 Move the initialization to default config function which is called from both. Fixes: 919488fbfa71 ("net/mlx5: support Sub-Function") Cc: stable@dpdk.org Signed-off-by: Michael Baum <michaelba@nvidia.com> --- drivers/net/mlx5/linux/mlx5_os.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/drivers/net/mlx5/linux/mlx5_os.c b/drivers/net/mlx5/linux/mlx5_os.c index 3d204f99f7..cf4de7e6f9 100644 --- a/drivers/net/mlx5/linux/mlx5_os.c +++ b/drivers/net/mlx5/linux/mlx5_os.c @@ -2142,6 +2142,7 @@ mlx5_os_config_default(struct mlx5_dev_config *config) config->dv_flow_en = 1; config->decap_en = 1; config->log_hp_size = MLX5_ARG_UNSET; + config->allow_duplicate_pattern = 1; } /** @@ -2564,7 +2565,6 @@ mlx5_os_pci_probe_pf(struct rte_pci_device *pci_dev, /* Default configuration. */ mlx5_os_config_default(&dev_config); dev_config.vf = dev_config_vf; - dev_config.allow_duplicate_pattern = 1; list[i].numa_node = pci_dev->device.numa_node; list[i].eth_dev = mlx5_dev_spawn(&pci_dev->device, &list[i], -- 2.25.1
The common probe function gets as a user argument the classes it should create, and checks whether the combination is valid. In case the device already exists, it checks the integration of the above with the classes that the device has. However, the function does not check the combination when the device does not exist and it has to create it. Check if the combination is valid for all cases. Fixes: ad435d320473 ("common/mlx5: add bus-agnostic layer") Cc: stable@dpdk.org Signed-off-by: Michael Baum <michaelba@nvidia.com> --- drivers/common/mlx5/mlx5_common.c | 18 ++++++++++-------- 1 file changed, 10 insertions(+), 8 deletions(-) diff --git a/drivers/common/mlx5/mlx5_common.c b/drivers/common/mlx5/mlx5_common.c index 459cf4bcc4..f6e440dca1 100644 --- a/drivers/common/mlx5/mlx5_common.c +++ b/drivers/common/mlx5/mlx5_common.c @@ -317,14 +317,16 @@ mlx5_common_dev_probe(struct rte_device *eal_dev) dev->dev = eal_dev; TAILQ_INSERT_HEAD(&devices_list, dev, next); new_device = true; - } else { - /* Validate combination here. */ - ret = is_valid_class_combination(classes | - dev->classes_loaded); - if (ret != 0) { - DRV_LOG(ERR, "Unsupported mlx5 classes combination."); - return ret; - } + } + /* + * Validate combination here. + * For new device, the classes_loaded field is 0 and it check only + * the classes given as user device arguments. + */ + ret = is_valid_class_combination(classes | dev->classes_loaded); + if (ret != 0) { + DRV_LOG(ERR, "Unsupported mlx5 classes combination."); + goto class_err; } ret = drivers_probe(dev, classes); if (ret) -- 2.25.1
The MLX5 common driver has a global list of mlx5 devices which are probed. In probe function it create one and insert it to the list. Similarly it removes the device in remove function. These operations are not safe as there can be such operations in parallel, by different threads. Add global lock for the list and use it to insert or remove. Fixes: 8a41f4deccc3 ("common/mlx5: introduce layer for multiple class drivers") Cc: stable@dpdk.org Signed-off-by: Michael Baum <michaelba@nvidia.com> --- drivers/common/mlx5/mlx5_common.c | 6 ++++++ 1 file changed, 6 insertions(+) diff --git a/drivers/common/mlx5/mlx5_common.c b/drivers/common/mlx5/mlx5_common.c index f6e440dca1..4321cb3a9c 100644 --- a/drivers/common/mlx5/mlx5_common.c +++ b/drivers/common/mlx5/mlx5_common.c @@ -50,6 +50,7 @@ static TAILQ_HEAD(mlx5_drivers, mlx5_class_driver) drivers_list = /* Head of devices. */ static TAILQ_HEAD(mlx5_devices, mlx5_common_device) devices_list = TAILQ_HEAD_INITIALIZER(devices_list); +static pthread_mutex_t devices_list_lock; static const struct { const char *name; @@ -222,7 +223,9 @@ mlx5_dev_to_pci_str(const struct rte_device *dev, char *addr, size_t size) static void dev_release(struct mlx5_common_device *dev) { + pthread_mutex_lock(&devices_list_lock); TAILQ_REMOVE(&devices_list, dev, next); + pthread_mutex_unlock(&devices_list_lock); rte_free(dev); } @@ -315,7 +318,9 @@ mlx5_common_dev_probe(struct rte_device *eal_dev) if (!dev) return -ENOMEM; dev->dev = eal_dev; + pthread_mutex_lock(&devices_list_lock); TAILQ_INSERT_HEAD(&devices_list, dev, next); + pthread_mutex_unlock(&devices_list_lock); new_device = true; } /* @@ -440,6 +445,7 @@ mlx5_common_init(void) if (mlx5_common_initialized) return; + pthread_mutex_init(&devices_list_lock, NULL); mlx5_glue_constructor(); mlx5_common_driver_init(); mlx5_common_initialized = true; -- 2.25.1
The common remove function call in a loop to remove function for each driver which have been registered. If all removes are succeeded, it return 0 without to free the device which allocated in probe function. Otherwise, it free the device. In fact we expect exactly the opposite behavior. If all removes are failed, it return error without to free the device which allocated in probe function. Otherwise, it free the device and return 0. Replace it with the correct behavior. Fixes: 8a41f4deccc3 ("common/mlx5: introduce layer for multiple class drivers") Cc: stable@dpdk.org Signed-off-by: Michael Baum <michaelba@nvidia.com> --- drivers/common/mlx5/mlx5_common.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/drivers/common/mlx5/mlx5_common.c b/drivers/common/mlx5/mlx5_common.c index 4321cb3a9c..6885bbb1d2 100644 --- a/drivers/common/mlx5/mlx5_common.c +++ b/drivers/common/mlx5/mlx5_common.c @@ -354,7 +354,7 @@ mlx5_common_dev_remove(struct rte_device *eal_dev) return -ENODEV; /* Matching device found, cleanup and unload drivers. */ ret = drivers_remove(dev, dev->classes_loaded); - if (ret != 0) + if (ret == 0) dev_release(dev); return ret; } -- 2.25.1
Hi, > -----Original Message----- > From: Michael Baum <michaelba@nvidia.com> > Sent: Tuesday, August 31, 2021 11:37 PM > To: dev@dpdk.org > Cc: Matan Azrad <matan@nvidia.com>; Raslan Darawsheh > <rasland@nvidia.com>; Slava Ovsiienko <viacheslavo@nvidia.com>; > stable@dpdk.org > Subject: [PATCH 1/6] net/mlx5: fix shared device context creation error flow How about something like this: fix memory leak in the shared device context creation > > In shared device context creation, there are two validations after MR > btree memory allocation. > > When one of them fails, the MR btree memory was not freed what caused a > memory leak. > > Free it. How about changing it to something like this: In shared device context creation, there is a missing validation when one of the btree memory allocation fails that will cause a memory leak. This adds a proper check to clean resources in case of failure. > > Fixes: 632f0f19056f ("net/mlx5: manage shared counters in three-level > table") > Cc: stable@dpdk.org > > Signed-off-by: Michael Baum <michaelba@nvidia.com> > --- > drivers/net/mlx5/mlx5.c | 2 ++ > 1 file changed, 2 insertions(+) > > diff --git a/drivers/net/mlx5/mlx5.c b/drivers/net/mlx5/mlx5.c > index f84e061fe7..f0ec2d1279 100644 > --- a/drivers/net/mlx5/mlx5.c > +++ b/drivers/net/mlx5/mlx5.c > @@ -1254,6 +1254,8 @@ mlx5_alloc_shared_dev_ctx(const struct > mlx5_dev_spawn_data *spawn, > MLX5_ASSERT(sh); > if (sh->cnt_id_tbl) > mlx5_l3t_destroy(sh->cnt_id_tbl); > + if (sh->share_cache.cache.table) > + mlx5_mr_btree_free(&sh->share_cache.cache); > if (sh->tis) > claim_zero(mlx5_devx_cmd_destroy(sh->tis)); > if (sh->td) > -- > 2.25.1
Hi, > -----Original Message----- > From: Michael Baum <michaelba@nvidia.com> > Sent: Tuesday, August 31, 2021 11:37 PM > To: dev@dpdk.org > Cc: Matan Azrad <matan@nvidia.com>; Raslan Darawsheh > <rasland@nvidia.com>; Slava Ovsiienko <viacheslavo@nvidia.com>; > stable@dpdk.org > Subject: [PATCH 2/6] net/mlx5: fix PCI probing error flow > How about changing to something like this: Fix memory leak in PCI prob > In PCI probing, there is internal probe function that is called several times for > several PFs. During PCI probe, the internal probe function is called per PF. > > When one of them fails, the previous PFs are not destroyed what may cause > a memory leak. If one of them fails, it was missing a prober destroy for the previously probed PFs > > Destroy them. This fixes the behavior by destroying all previously probed PF's > > Fixes: 08c2772fc747 ("net/mlx5: support list of representor PF") > Cc: stable@dpdk.org > > Signed-off-by: Michael Baum <michaelba@nvidia.com> > --- > drivers/net/mlx5/linux/mlx5_os.c | 13 ++++++++++++- > drivers/net/mlx5/mlx5.c | 2 +- > drivers/net/mlx5/mlx5.h | 1 + > 3 files changed, 14 insertions(+), 2 deletions(-) > > diff --git a/drivers/net/mlx5/linux/mlx5_os.c > b/drivers/net/mlx5/linux/mlx5_os.c > index 5f8766aa48..3d204f99f7 100644 > --- a/drivers/net/mlx5/linux/mlx5_os.c > +++ b/drivers/net/mlx5/linux/mlx5_os.c > @@ -2700,9 +2700,20 @@ mlx5_os_pci_probe(struct rte_pci_device > *pci_dev) > > if (eth_da.nb_ports > 0) { > /* Iterate all port if devargs pf is range: "pf[0-1]vf[...]". */ > - for (p = 0; p < eth_da.nb_ports; p++) > + for (p = 0; p < eth_da.nb_ports; p++) { > ret = mlx5_os_pci_probe_pf(pci_dev, ð_da, > eth_da.ports[p]); > + if (ret) > + break; > + } > + if (ret) { > + DRV_LOG(ERR, "Probe of PCI device " PCI_PRI_FMT " > " > + "aborted due to proding failure of PF %u", > + pci_dev->addr.domain, pci_dev->addr.bus, > + pci_dev->addr.devid, pci_dev- > >addr.function, > + eth_da.ports[p]); > + mlx5_net_remove(&pci_dev->device); > + } > } else { > ret = mlx5_os_pci_probe_pf(pci_dev, ð_da, 0); > } > diff --git a/drivers/net/mlx5/mlx5.c b/drivers/net/mlx5/mlx5.c index > f0ec2d1279..02ea2e781e 100644 > --- a/drivers/net/mlx5/mlx5.c > +++ b/drivers/net/mlx5/mlx5.c > @@ -2386,7 +2386,7 @@ mlx5_eth_find_next(uint16_t port_id, struct > rte_device *odev) > * @return > * 0 on success, the function cannot fail. > */ > -static int > +int > mlx5_net_remove(struct rte_device *dev) { > uint16_t port_id; > diff --git a/drivers/net/mlx5/mlx5.h b/drivers/net/mlx5/mlx5.h index > e02714e231..3581414b78 100644 > --- a/drivers/net/mlx5/mlx5.h > +++ b/drivers/net/mlx5/mlx5.h > @@ -1483,6 +1483,7 @@ int mlx5_udp_tunnel_port_add(struct rte_eth_dev > *dev, > struct rte_eth_udp_tunnel *udp_tunnel); > uint16_t mlx5_eth_find_next(uint16_t port_id, struct rte_device *odev); int > mlx5_dev_close(struct rte_eth_dev *dev); > +int mlx5_net_remove(struct rte_device *dev); > bool mlx5_is_hpf(struct rte_eth_dev *dev); bool mlx5_is_sf_repr(struct > rte_eth_dev *dev); void mlx5_age_event_prepare(struct > mlx5_dev_ctx_shared *sh); > -- > 2.25.1
In shared device context creation, there is a missing validation when one of the btree memory allocation fails that will cause a memory leak. This adds a proper check to clean resources in case of failure. Fixes: 632f0f19056f ("net/mlx5: manage shared counters in three-level table") Cc: stable@dpdk.org Signed-off-by: Michael Baum <michaelba@nvidia.com> Acked-by: Matan Azrad <matan@nvidia.com> --- drivers/net/mlx5/mlx5.c | 2 ++ 1 file changed, 2 insertions(+) diff --git a/drivers/net/mlx5/mlx5.c b/drivers/net/mlx5/mlx5.c index f84e061fe7..f0ec2d1279 100644 --- a/drivers/net/mlx5/mlx5.c +++ b/drivers/net/mlx5/mlx5.c @@ -1254,6 +1254,8 @@ mlx5_alloc_shared_dev_ctx(const struct mlx5_dev_spawn_data *spawn, MLX5_ASSERT(sh); if (sh->cnt_id_tbl) mlx5_l3t_destroy(sh->cnt_id_tbl); + if (sh->share_cache.cache.table) + mlx5_mr_btree_free(&sh->share_cache.cache); if (sh->tis) claim_zero(mlx5_devx_cmd_destroy(sh->tis)); if (sh->td) -- 2.25.1
During PCI probe, the internal probe function is called per PF. If one of them fails, it was missing a prober destroy for the previously probed PFs. This fixes the behavior by destroying all previously probed PFs. Fixes: 08c2772fc747 ("net/mlx5: support list of representor PF") Cc: stable@dpdk.org Signed-off-by: Michael Baum <michaelba@nvidia.com> Acked-by: Matan Azrad <matan@nvidia.com> --- drivers/net/mlx5/linux/mlx5_os.c | 13 ++++++++++++- drivers/net/mlx5/mlx5.c | 2 +- drivers/net/mlx5/mlx5.h | 1 + 3 files changed, 14 insertions(+), 2 deletions(-) diff --git a/drivers/net/mlx5/linux/mlx5_os.c b/drivers/net/mlx5/linux/mlx5_os.c index 5f8766aa48..3d204f99f7 100644 --- a/drivers/net/mlx5/linux/mlx5_os.c +++ b/drivers/net/mlx5/linux/mlx5_os.c @@ -2700,9 +2700,20 @@ mlx5_os_pci_probe(struct rte_pci_device *pci_dev) if (eth_da.nb_ports > 0) { /* Iterate all port if devargs pf is range: "pf[0-1]vf[...]". */ - for (p = 0; p < eth_da.nb_ports; p++) + for (p = 0; p < eth_da.nb_ports; p++) { ret = mlx5_os_pci_probe_pf(pci_dev, ð_da, eth_da.ports[p]); + if (ret) + break; + } + if (ret) { + DRV_LOG(ERR, "Probe of PCI device " PCI_PRI_FMT " " + "aborted due to proding failure of PF %u", + pci_dev->addr.domain, pci_dev->addr.bus, + pci_dev->addr.devid, pci_dev->addr.function, + eth_da.ports[p]); + mlx5_net_remove(&pci_dev->device); + } } else { ret = mlx5_os_pci_probe_pf(pci_dev, ð_da, 0); } diff --git a/drivers/net/mlx5/mlx5.c b/drivers/net/mlx5/mlx5.c index f0ec2d1279..02ea2e781e 100644 --- a/drivers/net/mlx5/mlx5.c +++ b/drivers/net/mlx5/mlx5.c @@ -2386,7 +2386,7 @@ mlx5_eth_find_next(uint16_t port_id, struct rte_device *odev) * @return * 0 on success, the function cannot fail. */ -static int +int mlx5_net_remove(struct rte_device *dev) { uint16_t port_id; diff --git a/drivers/net/mlx5/mlx5.h b/drivers/net/mlx5/mlx5.h index e02714e231..3581414b78 100644 --- a/drivers/net/mlx5/mlx5.h +++ b/drivers/net/mlx5/mlx5.h @@ -1483,6 +1483,7 @@ int mlx5_udp_tunnel_port_add(struct rte_eth_dev *dev, struct rte_eth_udp_tunnel *udp_tunnel); uint16_t mlx5_eth_find_next(uint16_t port_id, struct rte_device *odev); int mlx5_dev_close(struct rte_eth_dev *dev); +int mlx5_net_remove(struct rte_device *dev); bool mlx5_is_hpf(struct rte_eth_dev *dev); bool mlx5_is_sf_repr(struct rte_eth_dev *dev); void mlx5_age_event_prepare(struct mlx5_dev_ctx_shared *sh); -- 2.25.1
In order to allow\disallow configuring rules with identical patterns, the new device argument 'allow_duplicate_pattern' was introduced. The default is to allow, and it is initialized to 1 in PCI probe function. However, on auxiliary bus probing (for Sub-Function) it is not initialized at all, so it's actually initialized to 0 Move the initialization to default config function which is called from both. Fixes: 919488fbfa71 ("net/mlx5: support Sub-Function") Cc: stable@dpdk.org Signed-off-by: Michael Baum <michaelba@nvidia.com> Acked-by: Matan Azrad <matan@nvidia.com> --- drivers/net/mlx5/linux/mlx5_os.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/drivers/net/mlx5/linux/mlx5_os.c b/drivers/net/mlx5/linux/mlx5_os.c index 3d204f99f7..cf4de7e6f9 100644 --- a/drivers/net/mlx5/linux/mlx5_os.c +++ b/drivers/net/mlx5/linux/mlx5_os.c @@ -2142,6 +2142,7 @@ mlx5_os_config_default(struct mlx5_dev_config *config) config->dv_flow_en = 1; config->decap_en = 1; config->log_hp_size = MLX5_ARG_UNSET; + config->allow_duplicate_pattern = 1; } /** @@ -2564,7 +2565,6 @@ mlx5_os_pci_probe_pf(struct rte_pci_device *pci_dev, /* Default configuration. */ mlx5_os_config_default(&dev_config); dev_config.vf = dev_config_vf; - dev_config.allow_duplicate_pattern = 1; list[i].numa_node = pci_dev->device.numa_node; list[i].eth_dev = mlx5_dev_spawn(&pci_dev->device, &list[i], -- 2.25.1
The common probe function gets as a user argument the classes it should create, and checks whether the combination is valid. In case the device already exists, it checks the integration of the above with the classes that the device has. However, the function does not check the combination when the device does not exist and it has to create it. Check if the combination is valid for all cases. Fixes: ad435d320473 ("common/mlx5: add bus-agnostic layer") Cc: stable@dpdk.org Signed-off-by: Michael Baum <michaelba@nvidia.com> Acked-by: Matan Azrad <matan@nvidia.com> --- drivers/common/mlx5/mlx5_common.c | 18 ++++++++++-------- 1 file changed, 10 insertions(+), 8 deletions(-) diff --git a/drivers/common/mlx5/mlx5_common.c b/drivers/common/mlx5/mlx5_common.c index 459cf4bcc4..f6e440dca1 100644 --- a/drivers/common/mlx5/mlx5_common.c +++ b/drivers/common/mlx5/mlx5_common.c @@ -317,14 +317,16 @@ mlx5_common_dev_probe(struct rte_device *eal_dev) dev->dev = eal_dev; TAILQ_INSERT_HEAD(&devices_list, dev, next); new_device = true; - } else { - /* Validate combination here. */ - ret = is_valid_class_combination(classes | - dev->classes_loaded); - if (ret != 0) { - DRV_LOG(ERR, "Unsupported mlx5 classes combination."); - return ret; - } + } + /* + * Validate combination here. + * For new device, the classes_loaded field is 0 and it check only + * the classes given as user device arguments. + */ + ret = is_valid_class_combination(classes | dev->classes_loaded); + if (ret != 0) { + DRV_LOG(ERR, "Unsupported mlx5 classes combination."); + goto class_err; } ret = drivers_probe(dev, classes); if (ret) -- 2.25.1
The MLX5 common driver has a global list of mlx5 devices which are probed. In probe function it create one and insert it to the list. Similarly it removes the device in remove function. These operations are not safe as there can be such operations in parallel, by different threads. Add global lock for the list and use it to insert or remove. Fixes: 8a41f4deccc3 ("common/mlx5: introduce layer for multiple class drivers") Cc: stable@dpdk.org Signed-off-by: Michael Baum <michaelba@nvidia.com> Acked-by: Matan Azrad <matan@nvidia.com> --- drivers/common/mlx5/mlx5_common.c | 6 ++++++ 1 file changed, 6 insertions(+) diff --git a/drivers/common/mlx5/mlx5_common.c b/drivers/common/mlx5/mlx5_common.c index f6e440dca1..4321cb3a9c 100644 --- a/drivers/common/mlx5/mlx5_common.c +++ b/drivers/common/mlx5/mlx5_common.c @@ -50,6 +50,7 @@ static TAILQ_HEAD(mlx5_drivers, mlx5_class_driver) drivers_list = /* Head of devices. */ static TAILQ_HEAD(mlx5_devices, mlx5_common_device) devices_list = TAILQ_HEAD_INITIALIZER(devices_list); +static pthread_mutex_t devices_list_lock; static const struct { const char *name; @@ -222,7 +223,9 @@ mlx5_dev_to_pci_str(const struct rte_device *dev, char *addr, size_t size) static void dev_release(struct mlx5_common_device *dev) { + pthread_mutex_lock(&devices_list_lock); TAILQ_REMOVE(&devices_list, dev, next); + pthread_mutex_unlock(&devices_list_lock); rte_free(dev); } @@ -315,7 +318,9 @@ mlx5_common_dev_probe(struct rte_device *eal_dev) if (!dev) return -ENOMEM; dev->dev = eal_dev; + pthread_mutex_lock(&devices_list_lock); TAILQ_INSERT_HEAD(&devices_list, dev, next); + pthread_mutex_unlock(&devices_list_lock); new_device = true; } /* @@ -440,6 +445,7 @@ mlx5_common_init(void) if (mlx5_common_initialized) return; + pthread_mutex_init(&devices_list_lock, NULL); mlx5_glue_constructor(); mlx5_common_driver_init(); mlx5_common_initialized = true; -- 2.25.1
The common remove function call in a loop to remove function for each driver which have been registered. If all removes are succeeded, it return 0 without to free the device which allocated in probe function. Otherwise, it free the device. In fact we expect exactly the opposite behavior. If all removes are failed, it return error without to free the device which allocated in probe function. Otherwise, it free the device and return 0. Replace it with the correct behavior. Fixes: 8a41f4deccc3 ("common/mlx5: introduce layer for multiple class drivers") Cc: stable@dpdk.org Signed-off-by: Michael Baum <michaelba@nvidia.com> Acked-by: Matan Azrad <matan@nvidia.com> --- drivers/common/mlx5/mlx5_common.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/drivers/common/mlx5/mlx5_common.c b/drivers/common/mlx5/mlx5_common.c index 4321cb3a9c..6885bbb1d2 100644 --- a/drivers/common/mlx5/mlx5_common.c +++ b/drivers/common/mlx5/mlx5_common.c @@ -354,7 +354,7 @@ mlx5_common_dev_remove(struct rte_device *eal_dev) return -ENODEV; /* Matching device found, cleanup and unload drivers. */ ret = drivers_remove(dev, dev->classes_loaded); - if (ret != 0) + if (ret == 0) dev_release(dev); return ret; } -- 2.25.1