This fixes two places where pthread_mutex was being unsafely used between primary secondary process. These patches are necessary but not sufficient to address Bug 662 Stephen Hemminger (2): ethdev: make flow API primary/secondary process safe net/failsafe: fix primary/secondary mutex drivers/net/failsafe/failsafe.c | 5 +++++ lib/librte_ethdev/rte_ethdev.c | 6 +++++- 2 files changed, 10 insertions(+), 1 deletion(-) -- 2.30.2
Posix mutex are not by default safe for protecting for usage from multiple processes. The flow ops mutex could be used by both primary and secondary processes. Bugzilla ID: 662 Signed-off-by: Stephen Hemminger <stephen@networkplumber.org> Fixes: 80d1a9aff7f6 ("ethdev: make flow API thread safe") Cc: suanmingm@nvidia.com --- lib/librte_ethdev/rte_ethdev.c | 6 +++++- 1 file changed, 5 insertions(+), 1 deletion(-) diff --git a/lib/librte_ethdev/rte_ethdev.c b/lib/librte_ethdev/rte_ethdev.c index 6f514c388b4e..d1024df408a5 100644 --- a/lib/librte_ethdev/rte_ethdev.c +++ b/lib/librte_ethdev/rte_ethdev.c @@ -470,6 +470,7 @@ rte_eth_dev_allocate(const char *name) { uint16_t port_id; struct rte_eth_dev *eth_dev = NULL; + pthread_mutexattr_t attr; size_t name_len; name_len = strnlen(name, RTE_ETH_NAME_MAX_LEN); @@ -506,7 +507,10 @@ rte_eth_dev_allocate(const char *name) strlcpy(eth_dev->data->name, name, sizeof(eth_dev->data->name)); eth_dev->data->port_id = port_id; eth_dev->data->mtu = RTE_ETHER_MTU; - pthread_mutex_init(ð_dev->data->flow_ops_mutex, NULL); + + pthread_mutexattr_init(&attr); + pthread_mutexattr_setpshared(&attr, PTHREAD_PROCESS_SHARED); + pthread_mutex_init(ð_dev->data->flow_ops_mutex, &attr); unlock: rte_spinlock_unlock(ð_dev_shared_data->ownership_lock); -- 2.30.2
Set mutex used in failsafe driver to protect when used by both primary and secondary process. Without this fix, the failsafe lock is not really locking when there are multiple secondary processes. Bugzilla ID: 662 Signed-off-by: Stephen Hemminger <stephen@networkplumber.org> Fixes: 655fcd68c7d2 ("net/failsafe: fix hotplug races") Cc: matan@mellanox.com --- drivers/net/failsafe/failsafe.c | 5 +++++ 1 file changed, 5 insertions(+) diff --git a/drivers/net/failsafe/failsafe.c b/drivers/net/failsafe/failsafe.c index e3bda0df2bf9..5b7e560dbc08 100644 --- a/drivers/net/failsafe/failsafe.c +++ b/drivers/net/failsafe/failsafe.c @@ -140,6 +140,11 @@ fs_mutex_init(struct fs_priv *priv) ERROR("Cannot initiate mutex attributes - %s", strerror(ret)); return ret; } + /* Allow mutex to protect primary/secondary */ + ret = pthread_mutexattr_setpshared(&attr, PTHREAD_PROCESS_SHARED); + if (ret) + ERROR("Cannot set mutex shared - %s", strerror(ret)); + /* Allow mutex relocks for the thread holding the mutex. */ ret = pthread_mutexattr_settype(&attr, PTHREAD_MUTEX_RECURSIVE); if (ret) { -- 2.30.2
On Mon, 15 Mar 2021 12:27:20 -0700
Stephen Hemminger <stephen@networkplumber.org> wrote:
> This fixes two places where pthread_mutex was being unsafely
> used between primary secondary process.
>
> These patches are necessary but not sufficient to address Bug 662
>
> Stephen Hemminger (2):
> ethdev: make flow API primary/secondary process safe
> net/failsafe: fix primary/secondary mutex
>
> drivers/net/failsafe/failsafe.c | 5 +++++
> lib/librte_ethdev/rte_ethdev.c | 6 +++++-
> 2 files changed, 10 insertions(+), 1 deletion(-)
Side note:
While looking at the code, many places are checking for error
conditions that can never occur..
RETURN VALUE
pthread_mutex_init always returns 0. The other mutex functions return 0
on success and a non-zero error code on error.
On Mon, 15 Mar 2021 12:27:20 -0700
Stephen Hemminger <stephen@networkplumber.org> wrote:
> This fixes two places where pthread_mutex was being unsafely
> used between primary secondary process.
>
> These patches are necessary but not sufficient to address Bug 662
>
> Stephen Hemminger (2):
> ethdev: make flow API primary/secondary process safe
> net/failsafe: fix primary/secondary mutex
>
> drivers/net/failsafe/failsafe.c | 5 +++++
> lib/librte_ethdev/rte_ethdev.c | 6 +++++-
> 2 files changed, 10 insertions(+), 1 deletion(-)
>
The following drivers have the same kind of issue.
Drivers with unsafe pthread_mutex:
net/
af_xdp
atlantic
axgbe
bnxt
ena
failsafe (see patch #2)
hinic
mlx5
qede
vhost
virtio
raw/
ifpga
vdpa/
ifc
mlx5
Another alternative would be to create a DPDK wrapper (rte_mutex?)
which had the proper semantics. That might make the Windows port easier.
But it would make backport to stable harder.
If this does not get fixed in April, we should change documentation
to warn users.
Hi Stephen, > -----Original Message----- > From: Stephen Hemminger <stephen@networkplumber.org> > Sent: Tuesday, March 16, 2021 3:27 AM > To: dev@dpdk.org > Cc: Stephen Hemminger <stephen@networkplumber.org>; Suanming Mou > <suanmingm@nvidia.com> > Subject: [PATCH 1/2] ethdev: make flow API primary/secondary process safe > > Posix mutex are not by default safe for protecting for usage from multiple > processes. The flow ops mutex could be used by both primary and secondary > processes. Process safe is something more widely scope. I assume it should be another feature but not a bugfix for thread-safe? And the fag RTE_ETH_DEV_FLOW_OPS_THREAD_SAFE we have added is just thread safe. > > Bugzilla ID: 662 > Signed-off-by: Stephen Hemminger <stephen@networkplumber.org> > Fixes: 80d1a9aff7f6 ("ethdev: make flow API thread safe") > Cc: suanmingm@nvidia.com > --- > lib/librte_ethdev/rte_ethdev.c | 6 +++++- > 1 file changed, 5 insertions(+), 1 deletion(-) > > diff --git a/lib/librte_ethdev/rte_ethdev.c b/lib/librte_ethdev/rte_ethdev.c index > 6f514c388b4e..d1024df408a5 100644 > --- a/lib/librte_ethdev/rte_ethdev.c > +++ b/lib/librte_ethdev/rte_ethdev.c > @@ -470,6 +470,7 @@ rte_eth_dev_allocate(const char *name) { > uint16_t port_id; > struct rte_eth_dev *eth_dev = NULL; > + pthread_mutexattr_t attr; > size_t name_len; > > name_len = strnlen(name, RTE_ETH_NAME_MAX_LEN); @@ -506,7 > +507,10 @@ rte_eth_dev_allocate(const char *name) > strlcpy(eth_dev->data->name, name, sizeof(eth_dev->data->name)); > eth_dev->data->port_id = port_id; > eth_dev->data->mtu = RTE_ETHER_MTU; > - pthread_mutex_init(ð_dev->data->flow_ops_mutex, NULL); > + > + pthread_mutexattr_init(&attr); > + pthread_mutexattr_setpshared(&attr, PTHREAD_PROCESS_SHARED); > > + pthread_mutex_init(ð_dev->data->flow_ops_mutex, &attr); > > unlock: > rte_spinlock_unlock(ð_dev_shared_data->ownership_lock); > -- > 2.30.2
On Tue, 16 Mar 2021 23:48:57 +0000
Suanming Mou <suanmingm@nvidia.com> wrote:
> Hi Stephen,
>
> > -----Original Message-----
> > From: Stephen Hemminger <stephen@networkplumber.org>
> > Sent: Tuesday, March 16, 2021 3:27 AM
> > To: dev@dpdk.org
> > Cc: Stephen Hemminger <stephen@networkplumber.org>; Suanming Mou
> > <suanmingm@nvidia.com>
> > Subject: [PATCH 1/2] ethdev: make flow API primary/secondary process safe
> >
> > Posix mutex are not by default safe for protecting for usage from multiple
> > processes. The flow ops mutex could be used by both primary and secondary
> > processes.
>
> Process safe is something more widely scope. I assume it should be another feature but not a bugfix for thread-safe?
> And the fag RTE_ETH_DEV_FLOW_OPS_THREAD_SAFE we have added is just thread safe.
>
OK, that makes sense. Is this in the documentation?
> -----Original Message-----
> From: Stephen Hemminger <stephen@networkplumber.org>
> Sent: Wednesday, March 17, 2021 8:14 AM
> To: Suanming Mou <suanmingm@nvidia.com>
> Cc: dev@dpdk.org
> Subject: Re: [PATCH 1/2] ethdev: make flow API primary/secondary process safe
>
> On Tue, 16 Mar 2021 23:48:57 +0000
> Suanming Mou <suanmingm@nvidia.com> wrote:
>
> > Hi Stephen,
> >
> > > -----Original Message-----
> > > From: Stephen Hemminger <stephen@networkplumber.org>
> > > Sent: Tuesday, March 16, 2021 3:27 AM
> > > To: dev@dpdk.org
> > > Cc: Stephen Hemminger <stephen@networkplumber.org>; Suanming Mou
> > > <suanmingm@nvidia.com>
> > > Subject: [PATCH 1/2] ethdev: make flow API primary/secondary process
> > > safe
> > >
> > > Posix mutex are not by default safe for protecting for usage from
> > > multiple processes. The flow ops mutex could be used by both primary
> > > and secondary processes.
> >
> > Process safe is something more widely scope. I assume it should be another
> feature but not a bugfix for thread-safe?
> > And the fag RTE_ETH_DEV_FLOW_OPS_THREAD_SAFE we have added is just
> thread safe.
> >
>
> OK, that makes sense. Is this in the documentation?
We have the description as below in doc/guides/prog_guide/rte_flow.rst:
If PMD interfaces don't support re-entrancy/multi-thread safety,
the rte_flow API functions will protect threads by mutex per port.
The application can check whether ``RTE_ETH_DEV_FLOW_OPS_THREAD_SAFE``
is set in ``dev_flags``, meaning the PMD is thread-safe regarding rte_flow,
so the API level protection is disabled.
On 3/16/2021 11:48 PM, Suanming Mou wrote: > Hi Stephen, > >> -----Original Message----- >> From: Stephen Hemminger <stephen@networkplumber.org> >> Sent: Tuesday, March 16, 2021 3:27 AM >> To: dev@dpdk.org >> Cc: Stephen Hemminger <stephen@networkplumber.org>; Suanming Mou >> <suanmingm@nvidia.com> >> Subject: [PATCH 1/2] ethdev: make flow API primary/secondary process safe >> >> Posix mutex are not by default safe for protecting for usage from multiple >> processes. The flow ops mutex could be used by both primary and secondary >> processes. > > Process safe is something more widely scope. I assume it should be another feature but not a bugfix for thread-safe? > And the fag RTE_ETH_DEV_FLOW_OPS_THREAD_SAFE we have added is just thread safe. > Hi Suanming, I think 'RTE_ETH_DEV_FLOW_OPS_THREAD_SAFE' flag and what this patch address are different issues. 'RTE_ETH_DEV_FLOW_OPS_THREAD_SAFE' is to add/remove synchronization support for flow APIs, that is for thread safety as flag name suggests. This patch is to solve the problem for multi process, where commit log describes as posix mutex is not safe for multiple process. Stephen, Are you aware of any downside setting 'PTHREAD_PROCESS_SHARED' attribute to the mutex? Any possible performance implications? Ori, Since mlx is heavily using the flow API, is it possible to test this patch? If there is no negative impact, I think we can get this patch, what do you think? >> >> Bugzilla ID: 662 >> Signed-off-by: Stephen Hemminger <stephen@networkplumber.org> >> Fixes: 80d1a9aff7f6 ("ethdev: make flow API thread safe") >> Cc: suanmingm@nvidia.com >> --- >> lib/librte_ethdev/rte_ethdev.c | 6 +++++- >> 1 file changed, 5 insertions(+), 1 deletion(-) >> >> diff --git a/lib/librte_ethdev/rte_ethdev.c b/lib/librte_ethdev/rte_ethdev.c index >> 6f514c388b4e..d1024df408a5 100644 >> --- a/lib/librte_ethdev/rte_ethdev.c >> +++ b/lib/librte_ethdev/rte_ethdev.c >> @@ -470,6 +470,7 @@ rte_eth_dev_allocate(const char *name) { >> uint16_t port_id; >> struct rte_eth_dev *eth_dev = NULL; >> + pthread_mutexattr_t attr; >> size_t name_len; >> >> name_len = strnlen(name, RTE_ETH_NAME_MAX_LEN); @@ -506,7 >> +507,10 @@ rte_eth_dev_allocate(const char *name) >> strlcpy(eth_dev->data->name, name, sizeof(eth_dev->data->name)); >> eth_dev->data->port_id = port_id; >> eth_dev->data->mtu = RTE_ETHER_MTU; >> - pthread_mutex_init(ð_dev->data->flow_ops_mutex, NULL); >> + >> + pthread_mutexattr_init(&attr); >> + pthread_mutexattr_setpshared(&attr, PTHREAD_PROCESS_SHARED); >> >> + pthread_mutex_init(ð_dev->data->flow_ops_mutex, &attr); >> >> unlock: >> rte_spinlock_unlock(ð_dev_shared_data->ownership_lock); >> -- >> 2.30.2 >
On 3/15/2021 7:27 PM, Stephen Hemminger wrote:
> Set mutex used in failsafe driver to protect when used by
> both primary and secondary process. Without this fix, the failsafe
> lock is not really locking when there are multiple secondary processes.
>
> Bugzilla ID: 662
> Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
> Fixes: 655fcd68c7d2 ("net/failsafe: fix hotplug races")
> Cc: matan@mellanox.com
> ---
> drivers/net/failsafe/failsafe.c | 5 +++++
> 1 file changed, 5 insertions(+)
>
> diff --git a/drivers/net/failsafe/failsafe.c b/drivers/net/failsafe/failsafe.c
> index e3bda0df2bf9..5b7e560dbc08 100644
> --- a/drivers/net/failsafe/failsafe.c
> +++ b/drivers/net/failsafe/failsafe.c
> @@ -140,6 +140,11 @@ fs_mutex_init(struct fs_priv *priv)
> ERROR("Cannot initiate mutex attributes - %s", strerror(ret));
> return ret;
> }
> + /* Allow mutex to protect primary/secondary */
> + ret = pthread_mutexattr_setpshared(&attr, PTHREAD_PROCESS_SHARED);
> + if (ret)
> + ERROR("Cannot set mutex shared - %s", strerror(ret));
> +
> /* Allow mutex relocks for the thread holding the mutex. */
> ret = pthread_mutexattr_settype(&attr, PTHREAD_MUTEX_RECURSIVE);
> if (ret) {
>
Overall looks good to me.
Gaetan, Matan,
Can you please test the patch? To be sure it is not causing any unexpected
functional/performance issues.
Thanks,
ferruh
Hi, > -----Original Message----- > From: Ferruh Yigit <ferruh.yigit@intel.com> > Sent: Wednesday, April 14, 2021 9:07 PM > To: Suanming Mou <suanmingm@nvidia.com>; Ori Kam <orika@nvidia.com>; > Andrew Rybchenko <andrew.rybchenko@oktetlabs.ru>; NBU-Contact-Thomas > Monjalon <thomas@monjalon.net> > Cc: dev@dpdk.org; Stephen Hemminger <stephen@networkplumber.org> > Subject: Re: [dpdk-dev] [PATCH 1/2] ethdev: make flow API primary/secondary > process safe > > On 3/16/2021 11:48 PM, Suanming Mou wrote: > > Hi Stephen, > > > >> -----Original Message----- > >> From: Stephen Hemminger <stephen@networkplumber.org> > >> Sent: Tuesday, March 16, 2021 3:27 AM > >> To: dev@dpdk.org > >> Cc: Stephen Hemminger <stephen@networkplumber.org>; Suanming Mou > >> <suanmingm@nvidia.com> > >> Subject: [PATCH 1/2] ethdev: make flow API primary/secondary process > >> safe > >> > >> Posix mutex are not by default safe for protecting for usage from > >> multiple processes. The flow ops mutex could be used by both primary > >> and secondary processes. > > > > Process safe is something more widely scope. I assume it should be another > feature but not a bugfix for thread-safe? > > And the fag RTE_ETH_DEV_FLOW_OPS_THREAD_SAFE we have added is just > thread safe. > > > > Hi Suanming, > > I think 'RTE_ETH_DEV_FLOW_OPS_THREAD_SAFE' flag and what this patch > address are different issues. > > 'RTE_ETH_DEV_FLOW_OPS_THREAD_SAFE' is to add/remove synchronization > support for flow APIs, that is for thread safety as flag name suggests. > > This patch is to solve the problem for multi process, where commit log describes > as posix mutex is not safe for multiple process. So for PMDs which not set the RTE_ETH_DEV_FLOW_OPS_THREAD_SAFE capability bit, they will have the process level protection in multi-process. For PMDs which set the RTE_ETH_DEV_FLOW_OPS_THREAD_SAFE capability bit, this change does not help with these PMDs. If the PMD with RTE_ETH_DEV_FLOW_OPS_THREAD_SAFE capability bit internally does not support multi-process, they may still suffer crash etc. (If I understand correctly, mlx PMD level now should support multi-process, but better to have the confirmation from maintainers with much deeper level). I assume this patch solves the posix mutex for multi-process only, hard to say the flow API primary/secondary process safe after that patch. > > > Stephen, > Are you aware of any downside setting 'PTHREAD_PROCESS_SHARED' attribute > to the > mutex? Any possible performance implications? > > Ori, > Since mlx is heavily using the flow API, is it possible to test this patch? If > there is no negative impact, I think we can get this patch, what do you think? > > >> > >> Bugzilla ID: 662 > >> Signed-off-by: Stephen Hemminger <stephen@networkplumber.org> > >> Fixes: 80d1a9aff7f6 ("ethdev: make flow API thread safe") > >> Cc: suanmingm@nvidia.com > >> --- > >> lib/librte_ethdev/rte_ethdev.c | 6 +++++- > >> 1 file changed, 5 insertions(+), 1 deletion(-) > >> > >> diff --git a/lib/librte_ethdev/rte_ethdev.c b/lib/librte_ethdev/rte_ethdev.c > index > >> 6f514c388b4e..d1024df408a5 100644 > >> --- a/lib/librte_ethdev/rte_ethdev.c > >> +++ b/lib/librte_ethdev/rte_ethdev.c > >> @@ -470,6 +470,7 @@ rte_eth_dev_allocate(const char *name) { > >> uint16_t port_id; > >> struct rte_eth_dev *eth_dev = NULL; > >> + pthread_mutexattr_t attr; > >> size_t name_len; > >> > >> name_len = strnlen(name, RTE_ETH_NAME_MAX_LEN); @@ -506,7 > >> +507,10 @@ rte_eth_dev_allocate(const char *name) > >> strlcpy(eth_dev->data->name, name, sizeof(eth_dev->data->name)); > >> eth_dev->data->port_id = port_id; > >> eth_dev->data->mtu = RTE_ETHER_MTU; > >> - pthread_mutex_init(ð_dev->data->flow_ops_mutex, NULL); > >> + > >> + pthread_mutexattr_init(&attr); > >> + pthread_mutexattr_setpshared(&attr, PTHREAD_PROCESS_SHARED); > >> > >> + pthread_mutex_init(ð_dev->data->flow_ops_mutex, &attr); > >> > >> unlock: > >> rte_spinlock_unlock(ð_dev_shared_data->ownership_lock); > >> -- > >> 2.30.2 > >
On Thu, 15 Apr 2021 02:55:17 +0000 Suanming Mou <suanmingm@nvidia.com> wrote: > Hi, > > > -----Original Message----- > > From: Ferruh Yigit <ferruh.yigit@intel.com> > > Sent: Wednesday, April 14, 2021 9:07 PM > > To: Suanming Mou <suanmingm@nvidia.com>; Ori Kam <orika@nvidia.com>; > > Andrew Rybchenko <andrew.rybchenko@oktetlabs.ru>; NBU-Contact-Thomas > > Monjalon <thomas@monjalon.net> > > Cc: dev@dpdk.org; Stephen Hemminger <stephen@networkplumber.org> > > Subject: Re: [dpdk-dev] [PATCH 1/2] ethdev: make flow API primary/secondary > > process safe > > > > On 3/16/2021 11:48 PM, Suanming Mou wrote: > > > Hi Stephen, > > > > > >> -----Original Message----- > > >> From: Stephen Hemminger <stephen@networkplumber.org> > > >> Sent: Tuesday, March 16, 2021 3:27 AM > > >> To: dev@dpdk.org > > >> Cc: Stephen Hemminger <stephen@networkplumber.org>; Suanming Mou > > >> <suanmingm@nvidia.com> > > >> Subject: [PATCH 1/2] ethdev: make flow API primary/secondary process > > >> safe > > >> > > >> Posix mutex are not by default safe for protecting for usage from > > >> multiple processes. The flow ops mutex could be used by both primary > > >> and secondary processes. > > > > > > Process safe is something more widely scope. I assume it should be another > > feature but not a bugfix for thread-safe? > > > And the fag RTE_ETH_DEV_FLOW_OPS_THREAD_SAFE we have added is just > > thread safe. > > > > > > > Hi Suanming, > > > > I think 'RTE_ETH_DEV_FLOW_OPS_THREAD_SAFE' flag and what this patch > > address are different issues. > > > > 'RTE_ETH_DEV_FLOW_OPS_THREAD_SAFE' is to add/remove synchronization > > support for flow APIs, that is for thread safety as flag name suggests. > > > > This patch is to solve the problem for multi process, where commit log describes > > as posix mutex is not safe for multiple process. > > So for PMDs which not set the RTE_ETH_DEV_FLOW_OPS_THREAD_SAFE capability bit, they will have the process level protection in multi-process. > For PMDs which set the RTE_ETH_DEV_FLOW_OPS_THREAD_SAFE capability bit, this change does not help with these PMDs. If the PMD with RTE_ETH_DEV_FLOW_OPS_THREAD_SAFE capability bit internally does not support multi-process, they may still suffer crash etc. (If I understand correctly, mlx PMD level now should support multi-process, but better to have the confirmation from maintainers with much deeper level). > I assume this patch solves the posix mutex for multi-process only, hard to say the flow API primary/secondary process safe after that patch. Maybe we need a FLOW_OPS_PROCESS_SAFE flag? and declare all existing drivers as not process safe?
On 4/15/2021 3:55 AM, Suanming Mou wrote: > Hi, > >> -----Original Message----- >> From: Ferruh Yigit <ferruh.yigit@intel.com> >> Sent: Wednesday, April 14, 2021 9:07 PM >> To: Suanming Mou <suanmingm@nvidia.com>; Ori Kam <orika@nvidia.com>; >> Andrew Rybchenko <andrew.rybchenko@oktetlabs.ru>; NBU-Contact-Thomas >> Monjalon <thomas@monjalon.net> >> Cc: dev@dpdk.org; Stephen Hemminger <stephen@networkplumber.org> >> Subject: Re: [dpdk-dev] [PATCH 1/2] ethdev: make flow API primary/secondary >> process safe >> >> On 3/16/2021 11:48 PM, Suanming Mou wrote: >>> Hi Stephen, >>> >>>> -----Original Message----- >>>> From: Stephen Hemminger <stephen@networkplumber.org> >>>> Sent: Tuesday, March 16, 2021 3:27 AM >>>> To: dev@dpdk.org >>>> Cc: Stephen Hemminger <stephen@networkplumber.org>; Suanming Mou >>>> <suanmingm@nvidia.com> >>>> Subject: [PATCH 1/2] ethdev: make flow API primary/secondary process >>>> safe >>>> >>>> Posix mutex are not by default safe for protecting for usage from >>>> multiple processes. The flow ops mutex could be used by both primary >>>> and secondary processes. >>> >>> Process safe is something more widely scope. I assume it should be another >> feature but not a bugfix for thread-safe? >>> And the fag RTE_ETH_DEV_FLOW_OPS_THREAD_SAFE we have added is just >> thread safe. >>> >> >> Hi Suanming, >> >> I think 'RTE_ETH_DEV_FLOW_OPS_THREAD_SAFE' flag and what this patch >> address are different issues. >> >> 'RTE_ETH_DEV_FLOW_OPS_THREAD_SAFE' is to add/remove synchronization >> support for flow APIs, that is for thread safety as flag name suggests. >> >> This patch is to solve the problem for multi process, where commit log describes >> as posix mutex is not safe for multiple process. > > So for PMDs which not set the RTE_ETH_DEV_FLOW_OPS_THREAD_SAFE capability bit, they will have the process level protection in multi-process. > For PMDs which set the RTE_ETH_DEV_FLOW_OPS_THREAD_SAFE capability bit, this change does not help with these PMDs. If the PMD with RTE_ETH_DEV_FLOW_OPS_THREAD_SAFE capability bit internally does not support multi-process, they may still suffer crash etc. Correct > (If I understand correctly, mlx PMD level now should support multi-process, but better to have the confirmation from maintainers with much deeper level). > I assume this patch solves the posix mutex for multi-process only, hard to say the flow API primary/secondary process safe after that patch. > I am also not quite sure how PMDs that doesn't require mutex at all, (mlx5, bnxt, sfc) behave on multi process. Is calling flow APIs from both primary/secondary safe? >> >> >> Stephen, >> Are you aware of any downside setting 'PTHREAD_PROCESS_SHARED' attribute >> to the >> mutex? Any possible performance implications? >> >> Ori, >> Since mlx is heavily using the flow API, is it possible to test this patch? If >> there is no negative impact, I think we can get this patch, what do you think? >> >>>> >>>> Bugzilla ID: 662 >>>> Signed-off-by: Stephen Hemminger <stephen@networkplumber.org> >>>> Fixes: 80d1a9aff7f6 ("ethdev: make flow API thread safe") >>>> Cc: suanmingm@nvidia.com >>>> --- >>>> lib/librte_ethdev/rte_ethdev.c | 6 +++++- >>>> 1 file changed, 5 insertions(+), 1 deletion(-) >>>> >>>> diff --git a/lib/librte_ethdev/rte_ethdev.c b/lib/librte_ethdev/rte_ethdev.c >> index >>>> 6f514c388b4e..d1024df408a5 100644 >>>> --- a/lib/librte_ethdev/rte_ethdev.c >>>> +++ b/lib/librte_ethdev/rte_ethdev.c >>>> @@ -470,6 +470,7 @@ rte_eth_dev_allocate(const char *name) { >>>> uint16_t port_id; >>>> struct rte_eth_dev *eth_dev = NULL; >>>> + pthread_mutexattr_t attr; >>>> size_t name_len; >>>> >>>> name_len = strnlen(name, RTE_ETH_NAME_MAX_LEN); @@ -506,7 >>>> +507,10 @@ rte_eth_dev_allocate(const char *name) >>>> strlcpy(eth_dev->data->name, name, sizeof(eth_dev->data->name)); >>>> eth_dev->data->port_id = port_id; >>>> eth_dev->data->mtu = RTE_ETHER_MTU; >>>> - pthread_mutex_init(ð_dev->data->flow_ops_mutex, NULL); >>>> + >>>> + pthread_mutexattr_init(&attr); >>>> + pthread_mutexattr_setpshared(&attr, PTHREAD_PROCESS_SHARED); >>>> >>>> + pthread_mutex_init(ð_dev->data->flow_ops_mutex, &attr); >>>> >>>> unlock: >>>> rte_spinlock_unlock(ð_dev_shared_data->ownership_lock); >>>> -- >>>> 2.30.2 >>> >
On Thu, 15 Apr 2021 08:42:39 +0100
Ferruh Yigit <ferruh.yigit@intel.com> wrote:
> > (If I understand correctly, mlx PMD level now should support multi-process, but better to have the confirmation from maintainers with much deeper level).
> > I assume this patch solves the posix mutex for multi-process only, hard to say the flow API primary/secondary process safe after that patch.
> >
>
> I am also not quite sure how PMDs that doesn't require mutex at all, (mlx5,
> bnxt, sfc) behave on multi process. Is calling flow APIs from both
> primary/secondary safe?
>
> >>
> >>
> >> Stephen,
> >> Are you aware of any downside setting 'PTHREAD_PROCESS_SHARED' attribute
> >> to the
> >> mutex? Any possible performance implications?
> >>
Only impacts the contended case.
Setting the PROCESS_SHARED turns the mutex causes the mutex to behave
differently when cmxchg fails. Internally pthread_mutex_lock will
call futex if it has to wait and the SHARED flag impacts whether
FUTEX_PRIVATE_FLAG is set.
FUTEX_PRIVATE_FLAG (since Linux 2.6.22)
This option bit can be employed with all futex operations. It
tells the kernel that the futex is process-private and not
shared with another process (i.e., it is being used for synchro‐
nization only between threads of the same process). This allows
the kernel to make some additional performance optimizations.
> -----Original Message----- > From: Ferruh Yigit <ferruh.yigit@intel.com> > Sent: Thursday, April 15, 2021 3:43 PM > To: Suanming Mou <suanmingm@nvidia.com>; Ori Kam <orika@nvidia.com>; > Andrew Rybchenko <andrew.rybchenko@oktetlabs.ru>; NBU-Contact-Thomas > Monjalon <thomas@monjalon.net> > Cc: dev@dpdk.org; Stephen Hemminger <stephen@networkplumber.org>; > Matan Azrad <matan@nvidia.com>; Shahaf Shuler <shahafs@nvidia.com>; Ajit > Khaparde <ajit.khaparde@broadcom.com>; Somnath Kotur > <somnath.kotur@broadcom.com> > Subject: Re: [dpdk-dev] [PATCH 1/2] ethdev: make flow API primary/secondary > process safe > > On 4/15/2021 3:55 AM, Suanming Mou wrote: > > Hi, > > > >> -----Original Message----- > >> From: Ferruh Yigit <ferruh.yigit@intel.com> > >> Sent: Wednesday, April 14, 2021 9:07 PM > >> To: Suanming Mou <suanmingm@nvidia.com>; Ori Kam <orika@nvidia.com>; > >> Andrew Rybchenko <andrew.rybchenko@oktetlabs.ru>; NBU-Contact- > Thomas > >> Monjalon <thomas@monjalon.net> > >> Cc: dev@dpdk.org; Stephen Hemminger <stephen@networkplumber.org> > >> Subject: Re: [dpdk-dev] [PATCH 1/2] ethdev: make flow API > >> primary/secondary process safe > >> > >> On 3/16/2021 11:48 PM, Suanming Mou wrote: > >>> Hi Stephen, > >>> > >>>> -----Original Message----- > >>>> From: Stephen Hemminger <stephen@networkplumber.org> > >>>> Sent: Tuesday, March 16, 2021 3:27 AM > >>>> To: dev@dpdk.org > >>>> Cc: Stephen Hemminger <stephen@networkplumber.org>; Suanming Mou > >>>> <suanmingm@nvidia.com> > >>>> Subject: [PATCH 1/2] ethdev: make flow API primary/secondary > >>>> process safe > >>>> > >>>> Posix mutex are not by default safe for protecting for usage from > >>>> multiple processes. The flow ops mutex could be used by both > >>>> primary and secondary processes. > >>> > >>> Process safe is something more widely scope. I assume it should be > >>> another > >> feature but not a bugfix for thread-safe? > >>> And the fag RTE_ETH_DEV_FLOW_OPS_THREAD_SAFE we have added is just > >> thread safe. > >>> > >> > >> Hi Suanming, > >> > >> I think 'RTE_ETH_DEV_FLOW_OPS_THREAD_SAFE' flag and what this patch > >> address are different issues. > >> > >> 'RTE_ETH_DEV_FLOW_OPS_THREAD_SAFE' is to add/remove synchronization > >> support for flow APIs, that is for thread safety as flag name suggests. > >> > >> This patch is to solve the problem for multi process, where commit > >> log describes as posix mutex is not safe for multiple process. > > > > So for PMDs which not set the RTE_ETH_DEV_FLOW_OPS_THREAD_SAFE > capability bit, they will have the process level protection in multi-process. > > For PMDs which set the RTE_ETH_DEV_FLOW_OPS_THREAD_SAFE capability > bit, this change does not help with these PMDs. If the PMD with > RTE_ETH_DEV_FLOW_OPS_THREAD_SAFE capability bit internally does not > support multi-process, they may still suffer crash etc. > > Correct > > > (If I understand correctly, mlx PMD level now should support multi-process, > but better to have the confirmation from maintainers with much deeper level). > > I assume this patch solves the posix mutex for multi-process only, hard to say > the flow API primary/secondary process safe after that patch. > > > > I am also not quite sure how PMDs that doesn't require mutex at all, (mlx5, bnxt, > sfc) behave on multi process. Is calling flow APIs from both primary/secondary > safe? This depends on the vendor driver. For mlx, it should be safe, not sure others. > > >> > >> > >> Stephen, > >> Are you aware of any downside setting 'PTHREAD_PROCESS_SHARED' > >> attribute to the mutex? Any possible performance implications? > >> > >> Ori, > >> Since mlx is heavily using the flow API, is it possible to test this > >> patch? If there is no negative impact, I think we can get this patch, what do > you think? > >> > >>>>
We make a test on this patch, test result show that it works fine. Below is the detail:
HW: Kunpeng920 ARM Platform which is ARMv8
NIC: Kunpeng920 SOC NIC
OS: Linux centos-C3 5.12.0-rc4+
DPDK: 21.02
DRV: hns3
Start three process:
./testpmd -w 0000:bd:00.0 -l 67-68 --proc-type=auto -- -i --num-procs=3 --proc-id=0
./testpmd -w 0000:bd:00.0 -l 67-68 --proc-type=auto -- -i --num-procs=3 --proc-id=1
./testpmd -w 0000:bd:00.0 -l 69-70 --proc-type=auto -- -i --num-procs=3 --proc-id=2
Every process execute following steps:
1. create one fdir rule, eg: flow create 0 ingress pattern eth / ipv4 src is 192.168.110.26 / end actions queue index 0 / end
note: each process has different rules, so they will create success
2. create one rss rule
note: all process create the same rules, so they may create fail
3. flush all rules
4. goto step 1, loop again
note: there are +10ms delay after step 1/2/3 because we run VBS script to inject command on windows.
The test lasted 12 hours, and there are no process crashes.
On 2021/4/14 21:06, Ferruh Yigit wrote:
> On 3/16/2021 11:48 PM, Suanming Mou wrote:
>> Hi Stephen,
>>
>>> -----Original Message-----
>>> From: Stephen Hemminger <stephen@networkplumber.org>
>>> Sent: Tuesday, March 16, 2021 3:27 AM
>>> To: dev@dpdk.org
>>> Cc: Stephen Hemminger <stephen@networkplumber.org>; Suanming Mou
>>> <suanmingm@nvidia.com>
>>> Subject: [PATCH 1/2] ethdev: make flow API primary/secondary process safe
>>>
>>> Posix mutex are not by default safe for protecting for usage from multiple
>>> processes. The flow ops mutex could be used by both primary and secondary
>>> processes.
>>
>> Process safe is something more widely scope. I assume it should be another feature but not a bugfix for thread-safe?
>> And the fag RTE_ETH_DEV_FLOW_OPS_THREAD_SAFE we have added is just thread safe.
>>
>
> Hi Suanming,
>
> I think 'RTE_ETH_DEV_FLOW_OPS_THREAD_SAFE' flag and what this patch address are different issues.
>
> 'RTE_ETH_DEV_FLOW_OPS_THREAD_SAFE' is to add/remove synchronization support for flow APIs, that is for thread safety as flag name suggests.
>
> This patch is to solve the problem for multi process, where commit log describes as posix mutex is not safe for multiple process.
>
>
> Stephen,
> Are you aware of any downside setting 'PTHREAD_PROCESS_SHARED' attribute to the mutex? Any possible performance implications?
>
> Ori,
> Since mlx is heavily using the flow API, is it possible to test this patch? If there is no negative impact, I think we can get this patch, what do you think?
>
>>>
>>> Bugzilla ID: 662
>>> Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
>>> Fixes: 80d1a9aff7f6 ("ethdev: make flow API thread safe")
>>> Cc: suanmingm@nvidia.com
>>> ---
>>> lib/librte_ethdev/rte_ethdev.c | 6 +++++-
>>> 1 file changed, 5 insertions(+), 1 deletion(-)
>>>
>>> diff --git a/lib/librte_ethdev/rte_ethdev.c b/lib/librte_ethdev/rte_ethdev.c index
>>> 6f514c388b4e..d1024df408a5 100644
>>> --- a/lib/librte_ethdev/rte_ethdev.c
>>> +++ b/lib/librte_ethdev/rte_ethdev.c
>>> @@ -470,6 +470,7 @@ rte_eth_dev_allocate(const char *name) {
>>> uint16_t port_id;
>>> struct rte_eth_dev *eth_dev = NULL;
>>> + pthread_mutexattr_t attr;
>>> size_t name_len;
>>>
>>> name_len = strnlen(name, RTE_ETH_NAME_MAX_LEN); @@ -506,7
>>> +507,10 @@ rte_eth_dev_allocate(const char *name)
>>> strlcpy(eth_dev->data->name, name, sizeof(eth_dev->data->name));
>>> eth_dev->data->port_id = port_id;
>>> eth_dev->data->mtu = RTE_ETHER_MTU;
>>> - pthread_mutex_init(ð_dev->data->flow_ops_mutex, NULL);
>>> +
>>> + pthread_mutexattr_init(&attr);
>>> + pthread_mutexattr_setpshared(&attr, PTHREAD_PROCESS_SHARED);
>>>
>>> + pthread_mutex_init(ð_dev->data->flow_ops_mutex, &attr);
>>>
>>> unlock:
>>> rte_spinlock_unlock(ð_dev_shared_data->ownership_lock);
>>> --
>>> 2.30.2
>>
>
>
> .
[-- Attachment #1: Type: text/plain, Size: 3008 bytes --] On Thu, Apr 15, 2021 at 12:42 AM Ferruh Yigit <ferruh.yigit@intel.com> wrote: > > On 4/15/2021 3:55 AM, Suanming Mou wrote: > > Hi, > > > >> -----Original Message----- > >> From: Ferruh Yigit <ferruh.yigit@intel.com> > >> Sent: Wednesday, April 14, 2021 9:07 PM > >> To: Suanming Mou <suanmingm@nvidia.com>; Ori Kam <orika@nvidia.com>; > >> Andrew Rybchenko <andrew.rybchenko@oktetlabs.ru>; NBU-Contact-Thomas > >> Monjalon <thomas@monjalon.net> > >> Cc: dev@dpdk.org; Stephen Hemminger <stephen@networkplumber.org> > >> Subject: Re: [dpdk-dev] [PATCH 1/2] ethdev: make flow API primary/secondary > >> process safe > >> > >> On 3/16/2021 11:48 PM, Suanming Mou wrote: > >>> Hi Stephen, > >>> > >>>> -----Original Message----- > >>>> From: Stephen Hemminger <stephen@networkplumber.org> > >>>> Sent: Tuesday, March 16, 2021 3:27 AM > >>>> To: dev@dpdk.org > >>>> Cc: Stephen Hemminger <stephen@networkplumber.org>; Suanming Mou > >>>> <suanmingm@nvidia.com> > >>>> Subject: [PATCH 1/2] ethdev: make flow API primary/secondary process > >>>> safe > >>>> > >>>> Posix mutex are not by default safe for protecting for usage from > >>>> multiple processes. The flow ops mutex could be used by both primary > >>>> and secondary processes. > >>> > >>> Process safe is something more widely scope. I assume it should be another > >> feature but not a bugfix for thread-safe? > >>> And the fag RTE_ETH_DEV_FLOW_OPS_THREAD_SAFE we have added is just > >> thread safe. > >>> > >> > >> Hi Suanming, > >> > >> I think 'RTE_ETH_DEV_FLOW_OPS_THREAD_SAFE' flag and what this patch > >> address are different issues. > >> > >> 'RTE_ETH_DEV_FLOW_OPS_THREAD_SAFE' is to add/remove synchronization > >> support for flow APIs, that is for thread safety as flag name suggests. > >> > >> This patch is to solve the problem for multi process, where commit log describes > >> as posix mutex is not safe for multiple process. > > > > So for PMDs which not set the RTE_ETH_DEV_FLOW_OPS_THREAD_SAFE capability bit, they will have the process level protection in multi-process. > > For PMDs which set the RTE_ETH_DEV_FLOW_OPS_THREAD_SAFE capability bit, this change does not help with these PMDs. If the PMD with RTE_ETH_DEV_FLOW_OPS_THREAD_SAFE capability bit internally does not support multi-process, they may still suffer crash etc. > > Correct > > > (If I understand correctly, mlx PMD level now should support multi-process, but better to have the confirmation from maintainers with much deeper level). > > I assume this patch solves the posix mutex for multi-process only, hard to say the flow API primary/secondary process safe after that patch. > > > > I am also not quite sure how PMDs that doesn't require mutex at all, (mlx5, > bnxt, sfc) behave on multi process. Is calling flow APIs from both > primary/secondary safe? We have some level of tests on this and so far current code looks safe. But we will try to increase the coverage and see how it goes. >
On 4/16/2021 2:41 AM, fengchengwen wrote: > We make a test on this patch, test result show that it works fine. Below is the detail: > > HW: Kunpeng920 ARM Platform which is ARMv8 > NIC: Kunpeng920 SOC NIC > OS: Linux centos-C3 5.12.0-rc4+ > DPDK: 21.02 > DRV: hns3 > > Start three process: > ./testpmd -w 0000:bd:00.0 -l 67-68 --proc-type=auto -- -i --num-procs=3 --proc-id=0 > ./testpmd -w 0000:bd:00.0 -l 67-68 --proc-type=auto -- -i --num-procs=3 --proc-id=1 > ./testpmd -w 0000:bd:00.0 -l 69-70 --proc-type=auto -- -i --num-procs=3 --proc-id=2 > > Every process execute following steps: > 1. create one fdir rule, eg: flow create 0 ingress pattern eth / ipv4 src is 192.168.110.26 / end actions queue index 0 / end > note: each process has different rules, so they will create success > 2. create one rss rule > note: all process create the same rules, so they may create fail > 3. flush all rules > 4. goto step 1, loop again > note: there are +10ms delay after step 1/2/3 because we run VBS script to inject command on windows. > > The test lasted 12 hours, and there are no process crashes. > Thanks Chengwen for testing, appreciated. > > On 2021/4/14 21:06, Ferruh Yigit wrote: >> On 3/16/2021 11:48 PM, Suanming Mou wrote: >>> Hi Stephen, >>> >>>> -----Original Message----- >>>> From: Stephen Hemminger <stephen@networkplumber.org> >>>> Sent: Tuesday, March 16, 2021 3:27 AM >>>> To: dev@dpdk.org >>>> Cc: Stephen Hemminger <stephen@networkplumber.org>; Suanming Mou >>>> <suanmingm@nvidia.com> >>>> Subject: [PATCH 1/2] ethdev: make flow API primary/secondary process safe >>>> >>>> Posix mutex are not by default safe for protecting for usage from multiple >>>> processes. The flow ops mutex could be used by both primary and secondary >>>> processes. >>> >>> Process safe is something more widely scope. I assume it should be another feature but not a bugfix for thread-safe? >>> And the fag RTE_ETH_DEV_FLOW_OPS_THREAD_SAFE we have added is just thread safe. >>> >> >> Hi Suanming, >> >> I think 'RTE_ETH_DEV_FLOW_OPS_THREAD_SAFE' flag and what this patch address are different issues. >> >> 'RTE_ETH_DEV_FLOW_OPS_THREAD_SAFE' is to add/remove synchronization support for flow APIs, that is for thread safety as flag name suggests. >> >> This patch is to solve the problem for multi process, where commit log describes as posix mutex is not safe for multiple process. >> >> >> Stephen, >> Are you aware of any downside setting 'PTHREAD_PROCESS_SHARED' attribute to the mutex? Any possible performance implications? >> >> Ori, >> Since mlx is heavily using the flow API, is it possible to test this patch? If there is no negative impact, I think we can get this patch, what do you think? >> >>>> >>>> Bugzilla ID: 662 >>>> Signed-off-by: Stephen Hemminger <stephen@networkplumber.org> >>>> Fixes: 80d1a9aff7f6 ("ethdev: make flow API thread safe") >>>> Cc: suanmingm@nvidia.com >>>> --- >>>> lib/librte_ethdev/rte_ethdev.c | 6 +++++- >>>> 1 file changed, 5 insertions(+), 1 deletion(-) >>>> >>>> diff --git a/lib/librte_ethdev/rte_ethdev.c b/lib/librte_ethdev/rte_ethdev.c index >>>> 6f514c388b4e..d1024df408a5 100644 >>>> --- a/lib/librte_ethdev/rte_ethdev.c >>>> +++ b/lib/librte_ethdev/rte_ethdev.c >>>> @@ -470,6 +470,7 @@ rte_eth_dev_allocate(const char *name) { >>>> uint16_t port_id; >>>> struct rte_eth_dev *eth_dev = NULL; >>>> + pthread_mutexattr_t attr; >>>> size_t name_len; >>>> >>>> name_len = strnlen(name, RTE_ETH_NAME_MAX_LEN); @@ -506,7 >>>> +507,10 @@ rte_eth_dev_allocate(const char *name) >>>> strlcpy(eth_dev->data->name, name, sizeof(eth_dev->data->name)); >>>> eth_dev->data->port_id = port_id; >>>> eth_dev->data->mtu = RTE_ETHER_MTU; >>>> - pthread_mutex_init(ð_dev->data->flow_ops_mutex, NULL); >>>> + >>>> + pthread_mutexattr_init(&attr); >>>> + pthread_mutexattr_setpshared(&attr, PTHREAD_PROCESS_SHARED); >>>> >>>> + pthread_mutex_init(ð_dev->data->flow_ops_mutex, &attr); >>>> >>>> unlock: >>>> rte_spinlock_unlock(ð_dev_shared_data->ownership_lock); >>>> -- >>>> 2.30.2 >>> >> >> >> . >
On 3/15/2021 7:27 PM, Stephen Hemminger wrote:
> Posix mutex are not by default safe for protecting for usage
> from multiple processes. The flow ops mutex could be used by
> both primary and secondary processes.
>
> Bugzilla ID: 662
> Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
> Fixes: 80d1a9aff7f6 ("ethdev: make flow API thread safe")
> Cc: suanmingm@nvidia.com
> ---
> lib/librte_ethdev/rte_ethdev.c | 6 +++++-
> 1 file changed, 5 insertions(+), 1 deletion(-)
>
> diff --git a/lib/librte_ethdev/rte_ethdev.c b/lib/librte_ethdev/rte_ethdev.c
> index 6f514c388b4e..d1024df408a5 100644
> --- a/lib/librte_ethdev/rte_ethdev.c
> +++ b/lib/librte_ethdev/rte_ethdev.c
> @@ -470,6 +470,7 @@ rte_eth_dev_allocate(const char *name)
> {
> uint16_t port_id;
> struct rte_eth_dev *eth_dev = NULL;
> + pthread_mutexattr_t attr;
> size_t name_len;
>
> name_len = strnlen(name, RTE_ETH_NAME_MAX_LEN);
> @@ -506,7 +507,10 @@ rte_eth_dev_allocate(const char *name)
> strlcpy(eth_dev->data->name, name, sizeof(eth_dev->data->name));
> eth_dev->data->port_id = port_id;
> eth_dev->data->mtu = RTE_ETHER_MTU;
> - pthread_mutex_init(ð_dev->data->flow_ops_mutex, NULL);
> +
> + pthread_mutexattr_init(&attr);
> + pthread_mutexattr_setpshared(&attr, PTHREAD_PROCESS_SHARED);
> + pthread_mutex_init(ð_dev->data->flow_ops_mutex, &attr);
>
> unlock:
> rte_spinlock_unlock(ð_dev_shared_data->ownership_lock);
>
The problem looks legit, and there seems no obvious downside of the patch. Also
Huawei test result was good, hence:
Acked-by: Ferruh Yigit <ferruh.yigit@intel.com>
And the PMDs that set the 'RTE_ETH_DEV_FLOW_OPS_THREAD_SAFE' flag already not
affected from this change at all, but how safe mutli process applications will
be for them is still not clear.
On 4/14/2021 2:10 PM, Ferruh Yigit wrote:
> On 3/15/2021 7:27 PM, Stephen Hemminger wrote:
>> Set mutex used in failsafe driver to protect when used by
>> both primary and secondary process. Without this fix, the failsafe
>> lock is not really locking when there are multiple secondary processes.
>>
>> Bugzilla ID: 662
>> Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
>> Fixes: 655fcd68c7d2 ("net/failsafe: fix hotplug races")
>> Cc: matan@mellanox.com
>> ---
>> drivers/net/failsafe/failsafe.c | 5 +++++
>> 1 file changed, 5 insertions(+)
>>
>> diff --git a/drivers/net/failsafe/failsafe.c b/drivers/net/failsafe/failsafe.c
>> index e3bda0df2bf9..5b7e560dbc08 100644
>> --- a/drivers/net/failsafe/failsafe.c
>> +++ b/drivers/net/failsafe/failsafe.c
>> @@ -140,6 +140,11 @@ fs_mutex_init(struct fs_priv *priv)
>> ERROR("Cannot initiate mutex attributes - %s", strerror(ret));
>> return ret;
>> }
>> + /* Allow mutex to protect primary/secondary */
>> + ret = pthread_mutexattr_setpshared(&attr, PTHREAD_PROCESS_SHARED);
>> + if (ret)
>> + ERROR("Cannot set mutex shared - %s", strerror(ret));
>> +
>> /* Allow mutex relocks for the thread holding the mutex. */
>> ret = pthread_mutexattr_settype(&attr, PTHREAD_MUTEX_RECURSIVE);
>> if (ret) {
>>
>
> Overall looks good to me.
>
> Gaetan, Matan,
>
> Can you please test the patch? To be sure it is not causing any unexpected
> functional/performance issues.
>
Ping.
On 3/16/2021 4:28 PM, Stephen Hemminger wrote:
> On Mon, 15 Mar 2021 12:27:20 -0700
> Stephen Hemminger <stephen@networkplumber.org> wrote:
>
>> This fixes two places where pthread_mutex was being unsafely
>> used between primary secondary process.
>>
>> These patches are necessary but not sufficient to address Bug 662
>>
>> Stephen Hemminger (2):
>> ethdev: make flow API primary/secondary process safe
>> net/failsafe: fix primary/secondary mutex
>>
>> drivers/net/failsafe/failsafe.c | 5 +++++
>> lib/librte_ethdev/rte_ethdev.c | 6 +++++-
>> 2 files changed, 10 insertions(+), 1 deletion(-)
>>
>
> The following drivers have the same kind of issue.
>
> Drivers with unsafe pthread_mutex:
>
> net/
> af_xdp
> atlantic
> axgbe
> bnxt
> ena
> failsafe (see patch #2)
> hinic
> mlx5
> qede
> vhost
> virtio
>
> raw/
> ifpga
> vdpa/
> ifc
> mlx5
>
>
> Another alternative would be to create a DPDK wrapper (rte_mutex?)
> which had the proper semantics. That might make the Windows port easier.
> But it would make backport to stable harder.
>
>
> If this does not get fixed in April, we should change documentation
> to warn users.
>
>
+Anatoly as our multi process maintainer.
And most of the above nics already doesn't claim the multi process support (in
.ini files), the ones that claims the support:
mlx[45]
bnxt
hinic
qede
virtio
About the title, better to speak about multi-process, it is less confusing than primary/secondary. 15/03/2021 20:27, Stephen Hemminger: > Set mutex used in failsafe driver to protect when used by > both primary and secondary process. Without this fix, the failsafe > lock is not really locking when there are multiple secondary processes. > > Bugzilla ID: 662 > Signed-off-by: Stephen Hemminger <stephen@networkplumber.org> > Fixes: 655fcd68c7d2 ("net/failsafe: fix hotplug races") > Cc: matan@mellanox.com The correct order for above lines is: Bugzilla ID: 662 Fixes: 655fcd68c7d2 ("net/failsafe: fix hotplug races") Signed-off-by: Stephen Hemminger <stephen@networkplumber.org> > --- > --- a/drivers/net/failsafe/failsafe.c > +++ b/drivers/net/failsafe/failsafe.c > @@ -140,6 +140,11 @@ fs_mutex_init(struct fs_priv *priv) > ERROR("Cannot initiate mutex attributes - %s", strerror(ret)); > return ret; > } > + /* Allow mutex to protect primary/secondary */ > + ret = pthread_mutexattr_setpshared(&attr, PTHREAD_PROCESS_SHARED); > + if (ret) > + ERROR("Cannot set mutex shared - %s", strerror(ret)); Why not returning an error here?
About the title, better to say "multi-process" and do not
commit on being completely safe for the whole API.
It is just fixing the mutex for multi-process,
and this mutex is for driver not being natively multi-thread.
The compilation fails on MinGW:
error: implicit declaration of function ‘pthread_mutexattr_init’
15/03/2021 20:27, Stephen Hemminger:
> Posix mutex are not by default safe for protecting for usage
> from multiple processes. The flow ops mutex could be used by
> both primary and secondary processes.
>
> Bugzilla ID: 662
> Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
> Fixes: 80d1a9aff7f6 ("ethdev: make flow API thread safe")
> Cc: suanmingm@nvidia.com
These lines must be sorted.
Please use --cc-cmd devtools/get-maintainer.sh so that maintainers
are Cc'ed. Adding Ferruh and Andrew.
On Mon, 19 Apr 2021 19:14:08 +0200
Thomas Monjalon <thomas@monjalon.net> wrote:
> About the title, better to say "multi-process" and do not
> commit on being completely safe for the whole API.
> It is just fixing the mutex for multi-process,
> and this mutex is for driver not being natively multi-thread.
>
> The compilation fails on MinGW:
> error: implicit declaration of function ‘pthread_mutexattr_init’
Sorry, don't have all the variants. How do you handle the lack of
compatibility on Windows.
19/04/2021 19:45, Stephen Hemminger:
> On Mon, 19 Apr 2021 19:14:08 +0200
> Thomas Monjalon <thomas@monjalon.net> wrote:
> > The compilation fails on MinGW:
> > error: implicit declaration of function ‘pthread_mutexattr_init’
>
> Sorry, don't have all the variants. How do you handle the lack of
> compatibility on Windows.
No idea, I am not the one working in Microsoft :)
Adding more people in Cc to help.
On 4/19/21 8:08 PM, Thomas Monjalon wrote:
> About the title, better to speak about multi-process,
> it is less confusing than primary/secondary.
>
> 15/03/2021 20:27, Stephen Hemminger:
>> Set mutex used in failsafe driver to protect when used by
>> both primary and secondary process. Without this fix, the failsafe
>> lock is not really locking when there are multiple secondary processes.
>>
>> Bugzilla ID: 662
>> Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
>> Fixes: 655fcd68c7d2 ("net/failsafe: fix hotplug races")
>> Cc: matan@mellanox.com
>
> The correct order for above lines is:
>
> Bugzilla ID: 662
> Fixes: 655fcd68c7d2 ("net/failsafe: fix hotplug races")
>
> Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
>
>> ---
>> --- a/drivers/net/failsafe/failsafe.c
>> +++ b/drivers/net/failsafe/failsafe.c
>> @@ -140,6 +140,11 @@ fs_mutex_init(struct fs_priv *priv)
>> ERROR("Cannot initiate mutex attributes - %s", strerror(ret));
>> return ret;
>> }
>> + /* Allow mutex to protect primary/secondary */
>> + ret = pthread_mutexattr_setpshared(&attr, PTHREAD_PROCESS_SHARED);
>> + if (ret)
>> + ERROR("Cannot set mutex shared - %s", strerror(ret));
>
> Why not returning an error here?
+1
I think it would be safer to return an error here.
On 3/15/21 10:27 PM, Stephen Hemminger wrote: > Posix mutex are not by default safe for protecting for usage > from multiple processes. The flow ops mutex could be used by > both primary and secondary processes. > > Bugzilla ID: 662 > Signed-off-by: Stephen Hemminger <stephen@networkplumber.org> > Fixes: 80d1a9aff7f6 ("ethdev: make flow API thread safe") > Cc: suanmingm@nvidia.com > --- > lib/librte_ethdev/rte_ethdev.c | 6 +++++- > 1 file changed, 5 insertions(+), 1 deletion(-) > > diff --git a/lib/librte_ethdev/rte_ethdev.c b/lib/librte_ethdev/rte_ethdev.c > index 6f514c388b4e..d1024df408a5 100644 > --- a/lib/librte_ethdev/rte_ethdev.c > +++ b/lib/librte_ethdev/rte_ethdev.c > @@ -470,6 +470,7 @@ rte_eth_dev_allocate(const char *name) > { > uint16_t port_id; > struct rte_eth_dev *eth_dev = NULL; > + pthread_mutexattr_t attr; > size_t name_len; > > name_len = strnlen(name, RTE_ETH_NAME_MAX_LEN); > @@ -506,7 +507,10 @@ rte_eth_dev_allocate(const char *name) > strlcpy(eth_dev->data->name, name, sizeof(eth_dev->data->name)); > eth_dev->data->port_id = port_id; > eth_dev->data->mtu = RTE_ETHER_MTU; > - pthread_mutex_init(ð_dev->data->flow_ops_mutex, NULL); > + > + pthread_mutexattr_init(&attr); > + pthread_mutexattr_setpshared(&attr, PTHREAD_PROCESS_SHARED); Return value must be checked here. It may return ENOTSUP and EINVAL. If it fails, IMHO we should do cleanup and return NULL. > + pthread_mutex_init(ð_dev->data->flow_ops_mutex, &attr); > > unlock: > rte_spinlock_unlock(ð_dev_shared_data->ownership_lock); > Please, fix notes from Thomas and above. Overall these patches LGTM. I think we should not introduce any flags that flow ops are multi-process safe. Nobody prevents to call the API in multi-process case and it must behave consistently. The patch makes ethdev API safe and it is responsibility of the driver care about final multi-process safety.
On Tue, 8 Jun 2021 11:00:37 +0300
Andrew Rybchenko <andrew.rybchenko@oktetlabs.ru> wrote:
> On 4/19/21 8:08 PM, Thomas Monjalon wrote:
> > About the title, better to speak about multi-process,
> > it is less confusing than primary/secondary.
> >
> > 15/03/2021 20:27, Stephen Hemminger:
> >> Set mutex used in failsafe driver to protect when used by
> >> both primary and secondary process. Without this fix, the failsafe
> >> lock is not really locking when there are multiple secondary processes.
> >>
> >> Bugzilla ID: 662
> >> Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
> >> Fixes: 655fcd68c7d2 ("net/failsafe: fix hotplug races")
> >> Cc: matan@mellanox.com
> >
> > The correct order for above lines is:
> >
> > Bugzilla ID: 662
> > Fixes: 655fcd68c7d2 ("net/failsafe: fix hotplug races")
> >
> > Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
> >
> >> ---
> >> --- a/drivers/net/failsafe/failsafe.c
> >> +++ b/drivers/net/failsafe/failsafe.c
> >> @@ -140,6 +140,11 @@ fs_mutex_init(struct fs_priv *priv)
> >> ERROR("Cannot initiate mutex attributes - %s", strerror(ret));
> >> return ret;
> >> }
> >> + /* Allow mutex to protect primary/secondary */
> >> + ret = pthread_mutexattr_setpshared(&attr, PTHREAD_PROCESS_SHARED);
> >> + if (ret)
> >> + ERROR("Cannot set mutex shared - %s", strerror(ret));
> >
> > Why not returning an error here?
>
> +1
>
> I think it would be safer to return an error here.
Ok but it never happens.
On 6/8/21 6:42 PM, Stephen Hemminger wrote:
> On Tue, 8 Jun 2021 11:00:37 +0300
> Andrew Rybchenko <andrew.rybchenko@oktetlabs.ru> wrote:
>
>> On 4/19/21 8:08 PM, Thomas Monjalon wrote:
>>> About the title, better to speak about multi-process,
>>> it is less confusing than primary/secondary.
>>>
>>> 15/03/2021 20:27, Stephen Hemminger:
>>>> Set mutex used in failsafe driver to protect when used by
>>>> both primary and secondary process. Without this fix, the failsafe
>>>> lock is not really locking when there are multiple secondary processes.
>>>>
>>>> Bugzilla ID: 662
>>>> Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
>>>> Fixes: 655fcd68c7d2 ("net/failsafe: fix hotplug races")
>>>> Cc: matan@mellanox.com
>>>
>>> The correct order for above lines is:
>>>
>>> Bugzilla ID: 662
>>> Fixes: 655fcd68c7d2 ("net/failsafe: fix hotplug races")
>>>
>>> Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
>>>
>>>> ---
>>>> --- a/drivers/net/failsafe/failsafe.c
>>>> +++ b/drivers/net/failsafe/failsafe.c
>>>> @@ -140,6 +140,11 @@ fs_mutex_init(struct fs_priv *priv)
>>>> ERROR("Cannot initiate mutex attributes - %s", strerror(ret));
>>>> return ret;
>>>> }
>>>> + /* Allow mutex to protect primary/secondary */
>>>> + ret = pthread_mutexattr_setpshared(&attr, PTHREAD_PROCESS_SHARED);
>>>> + if (ret)
>>>> + ERROR("Cannot set mutex shared - %s", strerror(ret));
>>>
>>> Why not returning an error here?
>>
>> +1
>>
>> I think it would be safer to return an error here.
>
> Ok but it never happens.
>
May I ask why? 'man pthread_mutexattr_setpshared' says that it
is possible.
On Tue, 8 Jun 2021 18:55:17 +0300
Andrew Rybchenko <andrew.rybchenko@oktetlabs.ru> wrote:
> On 6/8/21 6:42 PM, Stephen Hemminger wrote:
> > On Tue, 8 Jun 2021 11:00:37 +0300
> > Andrew Rybchenko <andrew.rybchenko@oktetlabs.ru> wrote:
> >
> >> On 4/19/21 8:08 PM, Thomas Monjalon wrote:
> >>> About the title, better to speak about multi-process,
> >>> it is less confusing than primary/secondary.
> >>>
> >>> 15/03/2021 20:27, Stephen Hemminger:
> >>>> Set mutex used in failsafe driver to protect when used by
> >>>> both primary and secondary process. Without this fix, the failsafe
> >>>> lock is not really locking when there are multiple secondary processes.
> >>>>
> >>>> Bugzilla ID: 662
> >>>> Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
> >>>> Fixes: 655fcd68c7d2 ("net/failsafe: fix hotplug races")
> >>>> Cc: matan@mellanox.com
> >>>
> >>> The correct order for above lines is:
> >>>
> >>> Bugzilla ID: 662
> >>> Fixes: 655fcd68c7d2 ("net/failsafe: fix hotplug races")
> >>>
> >>> Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
> >>>
> >>>> ---
> >>>> --- a/drivers/net/failsafe/failsafe.c
> >>>> +++ b/drivers/net/failsafe/failsafe.c
> >>>> @@ -140,6 +140,11 @@ fs_mutex_init(struct fs_priv *priv)
> >>>> ERROR("Cannot initiate mutex attributes - %s", strerror(ret));
> >>>> return ret;
> >>>> }
> >>>> + /* Allow mutex to protect primary/secondary */
> >>>> + ret = pthread_mutexattr_setpshared(&attr, PTHREAD_PROCESS_SHARED);
> >>>> + if (ret)
> >>>> + ERROR("Cannot set mutex shared - %s", strerror(ret));
> >>>
> >>> Why not returning an error here?
> >>
> >> +1
> >>
> >> I think it would be safer to return an error here.
> >
> > Ok but it never happens.
> >
>
> May I ask why? 'man pthread_mutexattr_setpshared' says that it
> is possible.
>
The glibc implementation of pthread_mutexattr_setpshared is:
int
pthread_mutexattr_setpshared (pthread_mutexattr_t *attr, int pshared)
{
struct pthread_mutexattr *iattr;
int err = futex_supports_pshared (pshared);
if (err != 0)
return err;
iattr = (struct pthread_mutexattr *) attr;
if (pshared == PTHREAD_PROCESS_PRIVATE)
iattr->mutexkind &= ~PTHREAD_MUTEXATTR_FLAG_PSHARED;
else
iattr->mutexkind |= PTHREAD_MUTEXATTR_FLAG_PSHARED;
return 0;
}
And
/* FUTEX_SHARED is always supported by the Linux kernel. */
static __always_inline int
futex_supports_pshared (int pshared)
{
if (__glibc_likely (pshared == PTHREAD_PROCESS_PRIVATE))
return 0;
else if (pshared == PTHREAD_PROCESS_SHARED)
return 0;
else
return EINVAL;
}
There for the code as written can not return an error.
The check was only because someone could report a bogus
issue from a broken c library.
On 6/8/21 11:48 PM, Stephen Hemminger wrote:
> On Tue, 8 Jun 2021 18:55:17 +0300
> Andrew Rybchenko <andrew.rybchenko@oktetlabs.ru> wrote:
>
>> On 6/8/21 6:42 PM, Stephen Hemminger wrote:
>>> On Tue, 8 Jun 2021 11:00:37 +0300
>>> Andrew Rybchenko <andrew.rybchenko@oktetlabs.ru> wrote:
>>>
>>>> On 4/19/21 8:08 PM, Thomas Monjalon wrote:
>>>>> About the title, better to speak about multi-process,
>>>>> it is less confusing than primary/secondary.
>>>>>
>>>>> 15/03/2021 20:27, Stephen Hemminger:
>>>>>> Set mutex used in failsafe driver to protect when used by
>>>>>> both primary and secondary process. Without this fix, the failsafe
>>>>>> lock is not really locking when there are multiple secondary processes.
>>>>>>
>>>>>> Bugzilla ID: 662
>>>>>> Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
>>>>>> Fixes: 655fcd68c7d2 ("net/failsafe: fix hotplug races")
>>>>>> Cc: matan@mellanox.com
>>>>>
>>>>> The correct order for above lines is:
>>>>>
>>>>> Bugzilla ID: 662
>>>>> Fixes: 655fcd68c7d2 ("net/failsafe: fix hotplug races")
>>>>>
>>>>> Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
>>>>>
>>>>>> ---
>>>>>> --- a/drivers/net/failsafe/failsafe.c
>>>>>> +++ b/drivers/net/failsafe/failsafe.c
>>>>>> @@ -140,6 +140,11 @@ fs_mutex_init(struct fs_priv *priv)
>>>>>> ERROR("Cannot initiate mutex attributes - %s", strerror(ret));
>>>>>> return ret;
>>>>>> }
>>>>>> + /* Allow mutex to protect primary/secondary */
>>>>>> + ret = pthread_mutexattr_setpshared(&attr, PTHREAD_PROCESS_SHARED);
>>>>>> + if (ret)
>>>>>> + ERROR("Cannot set mutex shared - %s", strerror(ret));
>>>>>
>>>>> Why not returning an error here?
>>>>
>>>> +1
>>>>
>>>> I think it would be safer to return an error here.
>>>
>>> Ok but it never happens.
>>>
>>
>> May I ask why? 'man pthread_mutexattr_setpshared' says that it
>> is possible.
>>
>
> The glibc implementation of pthread_mutexattr_setpshared is:
>
>
> int
> pthread_mutexattr_setpshared (pthread_mutexattr_t *attr, int pshared)
> {
> struct pthread_mutexattr *iattr;
>
> int err = futex_supports_pshared (pshared);
> if (err != 0)
> return err;
>
> iattr = (struct pthread_mutexattr *) attr;
>
> if (pshared == PTHREAD_PROCESS_PRIVATE)
> iattr->mutexkind &= ~PTHREAD_MUTEXATTR_FLAG_PSHARED;
> else
> iattr->mutexkind |= PTHREAD_MUTEXATTR_FLAG_PSHARED;
>
> return 0;
> }
>
> And
>
> /* FUTEX_SHARED is always supported by the Linux kernel. */
> static __always_inline int
> futex_supports_pshared (int pshared)
> {
> if (__glibc_likely (pshared == PTHREAD_PROCESS_PRIVATE))
> return 0;
> else if (pshared == PTHREAD_PROCESS_SHARED)
> return 0;
> else
> return EINVAL;
> }
>
>
> There for the code as written can not return an error.
> The check was only because someone could report a bogus
> issue from a broken c library.
>
Many thanks for detailed description.
I thought that it is better to follow API
definition and it is not that hard to check
return code and handle it. Yes, glibc is not
the only C library.
On Wed, Jun 9, 2021, at 12:04, Andrew Rybchenko wrote:
> On 6/8/21 11:48 PM, Stephen Hemminger wrote:
> > On Tue, 8 Jun 2021 18:55:17 +0300
> > Andrew Rybchenko <andrew.rybchenko@oktetlabs.ru> wrote:
> >
> >> On 6/8/21 6:42 PM, Stephen Hemminger wrote:
> >>> On Tue, 8 Jun 2021 11:00:37 +0300
> >>> Andrew Rybchenko <andrew.rybchenko@oktetlabs.ru> wrote:
> >>>
> >>>> On 4/19/21 8:08 PM, Thomas Monjalon wrote:
> >>>>> About the title, better to speak about multi-process,
> >>>>> it is less confusing than primary/secondary.
> >>>>>
> >>>>> 15/03/2021 20:27, Stephen Hemminger:
> >>>>>> Set mutex used in failsafe driver to protect when used by
> >>>>>> both primary and secondary process. Without this fix, the failsafe
> >>>>>> lock is not really locking when there are multiple secondary processes.
> >>>>>>
> >>>>>> Bugzilla ID: 662
> >>>>>> Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
> >>>>>> Fixes: 655fcd68c7d2 ("net/failsafe: fix hotplug races")
> >>>>>> Cc: matan@mellanox.com
> >>>>>
> >>>>> The correct order for above lines is:
> >>>>>
> >>>>> Bugzilla ID: 662
> >>>>> Fixes: 655fcd68c7d2 ("net/failsafe: fix hotplug races")
> >>>>>
> >>>>> Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
> >>>>>
> >>>>>> ---
> >>>>>> --- a/drivers/net/failsafe/failsafe.c
> >>>>>> +++ b/drivers/net/failsafe/failsafe.c
> >>>>>> @@ -140,6 +140,11 @@ fs_mutex_init(struct fs_priv *priv)
> >>>>>> ERROR("Cannot initiate mutex attributes - %s", strerror(ret));
> >>>>>> return ret;
> >>>>>> }
> >>>>>> + /* Allow mutex to protect primary/secondary */
> >>>>>> + ret = pthread_mutexattr_setpshared(&attr, PTHREAD_PROCESS_SHARED);
> >>>>>> + if (ret)
> >>>>>> + ERROR("Cannot set mutex shared - %s", strerror(ret));
> >>>>>
> >>>>> Why not returning an error here?
> >>>>
> >>>> +1
> >>>>
> >>>> I think it would be safer to return an error here.
> >>>
> >>> Ok but it never happens.
> >>>
> >>
> >> May I ask why? 'man pthread_mutexattr_setpshared' says that it
> >> is possible.
> >>
> >
> > The glibc implementation of pthread_mutexattr_setpshared is:
> >
> >
> > int
> > pthread_mutexattr_setpshared (pthread_mutexattr_t *attr, int pshared)
> > {
> > struct pthread_mutexattr *iattr;
> >
> > int err = futex_supports_pshared (pshared);
> > if (err != 0)
> > return err;
> >
> > iattr = (struct pthread_mutexattr *) attr;
> >
> > if (pshared == PTHREAD_PROCESS_PRIVATE)
> > iattr->mutexkind &= ~PTHREAD_MUTEXATTR_FLAG_PSHARED;
> > else
> > iattr->mutexkind |= PTHREAD_MUTEXATTR_FLAG_PSHARED;
> >
> > return 0;
> > }
> >
> > And
> >
> > /* FUTEX_SHARED is always supported by the Linux kernel. */
> > static __always_inline int
> > futex_supports_pshared (int pshared)
> > {
> > if (__glibc_likely (pshared == PTHREAD_PROCESS_PRIVATE))
> > return 0;
> > else if (pshared == PTHREAD_PROCESS_SHARED)
> > return 0;
> > else
> > return EINVAL;
> > }
> >
> >
> > There for the code as written can not return an error.
> > The check was only because someone could report a bogus
> > issue from a broken c library.
> >
>
> Many thanks for detailed description.
> I thought that it is better to follow API
> definition and it is not that hard to check
> return code and handle it. Yes, glibc is not
> the only C library.
>
On principle the API spec should be respected without assuming a specific
implementation.
Another way to think about it is that a future dev having zero knowledge of
this thread, reading this code and checking the POSIX manual, will also need to
check that usual c lib implementations are unlikely to generate an error before
concluding that this code is alright. It should not be necessary.
--
Gaetan Rivet
> On Wed, Jun 9, 2021, at 12:04, Andrew Rybchenko wrote:
>> On 6/8/21 11:48 PM, Stephen Hemminger wrote:
>> > On Tue, 8 Jun 2021 18:55:17 +0300
>> > Andrew Rybchenko <andrew.rybchenko@oktetlabs.ru> wrote:
>> >
>> >> On 6/8/21 6:42 PM, Stephen Hemminger wrote:
> >>> On Tue, 8 Jun 2021 11:00:37 +0300
> >>> Andrew Rybchenko <andrew.rybchenko@oktetlabs.ru> wrote:
> >>>
> >>>> On 4/19/21 8:08 PM, Thomas Monjalon wrote:
> >>>>> About the title, better to speak about multi-process, it is less
> >>>>> confusing than primary/secondary.
> >>>>>
> >>>>> 15/03/2021 20:27, Stephen Hemminger:
> >>>>>> Set mutex used in failsafe driver to protect when used by both
> >>>>>> primary and secondary process. Without this fix, the failsafe
> >>>>>> lock is not really locking when there are multiple secondary processes.
> >>>>>>
> >>>>>> Bugzilla ID: 662
> >>>>>> Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
> >>>>>> Fixes: 655fcd68c7d2 ("net/failsafe: fix hotplug races")
> >>>>>> Cc: matan@mellanox.com
> >>>>>
> >>>>> The correct order for above lines is:
> >>>>>
> >>>>> Bugzilla ID: 662
> >>>>> Fixes: 655fcd68c7d2 ("net/failsafe: fix hotplug races")
> >>>>>
> >>>>> Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
> >>>>>
> >>>>>> ---
> > >>>>>> --- a/drivers/net/failsafe/failsafe.c
> > >>>>>> +++ b/drivers/net/failsafe/failsafe.c
> >>>>>> @@ -140,6 +140,11 @@ fs_mutex_init(struct fs_priv *priv)
> > >>>>>> ERROR("Cannot initiate mutex attributes - %s", strerror(ret));
> > >>>>>> return ret;
> >>>>>> }
> > >>>>>> + /* Allow mutex to protect primary/secondary */
> > >>>>>> + ret = pthread_mutexattr_setpshared(&attr, PTHREAD_PROCESS_SHARED);
> > >>>>>> + if (ret)
> > >>>>>> + ERROR("Cannot set mutex shared - %s", strerror(ret));
> >>>>>
> > >>>>> Why not returning an error here?
> >>>>
> > >>>> +1
> >>>>
> > >>>> I think it would be safer to return an error here.
> >>>
> > >>> Ok but it never happens.
> >>>
> >>
> > >> May I ask why? 'man pthread_mutexattr_setpshared' says that it is
> > >> possible.
> >>
> >
> > > The glibc implementation of pthread_mutexattr_setpshared is:
> >
> >
> > > int
> > > pthread_mutexattr_setpshared (pthread_mutexattr_t *attr, int
> > > pshared) {
> > > struct pthread_mutexattr *iattr;
> >
> > > int err = futex_supports_pshared (pshared);
> > > if (err != 0)
> > > return err;
> > >
> > > iattr = (struct pthread_mutexattr *) attr;
> > >
> > > if (pshared == PTHREAD_PROCESS_PRIVATE)
> > > iattr->mutexkind &= ~PTHREAD_MUTEXATTR_FLAG_PSHARED;
> > > else
> > > iattr->mutexkind |= PTHREAD_MUTEXATTR_FLAG_PSHARED;
> > >
> > > return 0;
> > > }
> > >
> > > And
> > >
> > > /* FUTEX_SHARED is always supported by the Linux kernel. */ static
> > > __always_inline int futex_supports_pshared (int pshared) {
> > > if (__glibc_likely (pshared == PTHREAD_PROCESS_PRIVATE))
> > > return 0;
> > > else if (pshared == PTHREAD_PROCESS_SHARED)
> > > return 0;
> > > else
> > > return EINVAL;
> > }
> > >
> >
> > > There for the code as written can not return an error.
> > > The check was only because someone could report a bogus issue from a
> > > broken c library.
> > >
> >
> > Many thanks for detailed description.
> > I thought that it is better to follow API definition and it is not
> > that hard to check return code and handle it. Yes, glibc is not the
> > only C library.
> >
>
> On principle the API spec should be respected without assuming a specific implementation.
>
> Another way to think about it is that a future dev having zero knowledge of this thread, reading this code and checking the POSIX manual, will also need to check that usual c lib implementations are unlikely > to generate an error before concluding that this code is alright. It should not be necessary.
>
We are also facing similar issue, while probe of fail-safe PMD b/w multi-process.
rte_eth_dev_attach_secondary(), API return's error, while probing from secondary process in rte_pmd_tap_probe().
So, can you please let us know, if any fix available on such issue ?
Thanks,
Madhuker.