patches for DPDK stable branches
 help / color / mirror / Atom feed
* [dpdk-stable] [PATCH] net/bonding: fix slave activation simultaneously
@ 2018-04-24 11:29 Matan Azrad
  2018-05-14  2:07 ` [dpdk-stable] [dpdk-dev] " Thomas Monjalon
  2018-05-14 11:45 ` [dpdk-stable] " Doherty, Declan
  0 siblings, 2 replies; 6+ messages in thread
From: Matan Azrad @ 2018-04-24 11:29 UTC (permalink / raw)
  To: Declan Doherty; +Cc: dev, stable

The bonding PMD decides to activate\deactivate its slaves according to
the slaves link statuses.
Thus, it registers to the LSC events of the slaves ports and
activates\deactivates them from its LSC callbacks called asynchronously
by the host thread when the slave link status is changed.

In addition, the bonding PMD uses the callback for slave activation
when it tries to start it, this operation is probably called by the
master thread.

Consequently, a slave may be activated in the same time by two
different threads and may cause a lot of optional errors, for example,
slave mempool recreation with the same name causes an error.

Synchronize the critical section in the LSC callback using a special
new spinlock.

Fixes: 414b202343ce ("bonding: fix initial link status of slave")
Fixes: a45b288ef21a ("bond: support link status polling")
Cc: stable@dpdk.org

Signed-off-by: Matan Azrad <matan@mellanox.com>
---
 drivers/net/bonding/rte_eth_bond_pmd.c     | 17 +++++++++++++++--
 drivers/net/bonding/rte_eth_bond_private.h |  1 +
 2 files changed, 16 insertions(+), 2 deletions(-)

diff --git a/drivers/net/bonding/rte_eth_bond_pmd.c b/drivers/net/bonding/rte_eth_bond_pmd.c
index 2805c71..dffd12f 100644
--- a/drivers/net/bonding/rte_eth_bond_pmd.c
+++ b/drivers/net/bonding/rte_eth_bond_pmd.c
@@ -2654,14 +2654,21 @@ struct bwg_slave {
 	if (!valid_slave)
 		return rc;
 
+	/* Synchronize lsc callback parallel calls either by real link event
+	 * from the slaves PMDs or by the bonding PMD itself.
+	 */
+	rte_spinlock_lock(&internals->lsc_lock);
+
 	/* Search for port in active port list */
 	active_pos = find_slave_by_id(internals->active_slaves,
 			internals->active_slave_count, port_id);
 
 	rte_eth_link_get_nowait(port_id, &link);
 	if (link.link_status) {
-		if (active_pos < internals->active_slave_count)
+		if (active_pos < internals->active_slave_count) {
+			rte_spinlock_unlock(&internals->lsc_lock);
 			return rc;
+		}
 
 		/* if no active slave ports then set this port to be primary port */
 		if (internals->active_slave_count < 1) {
@@ -2680,8 +2687,10 @@ struct bwg_slave {
 				internals->primary_port == port_id)
 			bond_ethdev_primary_set(internals, port_id);
 	} else {
-		if (active_pos == internals->active_slave_count)
+		if (active_pos == internals->active_slave_count) {
+			rte_spinlock_unlock(&internals->lsc_lock);
 			return rc;
+		}
 
 		/* Remove from active slave list */
 		deactivate_slave(bonded_eth_dev, port_id);
@@ -2734,6 +2743,9 @@ struct bwg_slave {
 						NULL);
 		}
 	}
+
+	rte_spinlock_unlock(&internals->lsc_lock);
+
 	return 0;
 }
 
@@ -2942,6 +2954,7 @@ struct bwg_slave {
 	eth_dev->data->dev_flags = RTE_ETH_DEV_INTR_LSC;
 
 	rte_spinlock_init(&internals->lock);
+	rte_spinlock_init(&internals->lsc_lock);
 
 	internals->port_id = eth_dev->data->port_id;
 	internals->mode = BONDING_MODE_INVALID;
diff --git a/drivers/net/bonding/rte_eth_bond_private.h b/drivers/net/bonding/rte_eth_bond_private.h
index 94eca88..9220b61 100644
--- a/drivers/net/bonding/rte_eth_bond_private.h
+++ b/drivers/net/bonding/rte_eth_bond_private.h
@@ -89,6 +89,7 @@ struct bond_dev_private {
 	uint8_t mode;						/**< Link Bonding Mode */
 
 	rte_spinlock_t lock;
+	rte_spinlock_t lsc_lock;
 
 	uint16_t primary_port;			/**< Primary Slave Port */
 	uint16_t current_primary_port;		/**< Primary Slave Port */
-- 
1.9.5

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [dpdk-stable] [dpdk-dev] [PATCH] net/bonding: fix slave activation simultaneously
  2018-04-24 11:29 [dpdk-stable] [PATCH] net/bonding: fix slave activation simultaneously Matan Azrad
@ 2018-05-14  2:07 ` Thomas Monjalon
  2018-05-14 11:45 ` [dpdk-stable] " Doherty, Declan
  1 sibling, 0 replies; 6+ messages in thread
From: Thomas Monjalon @ 2018-05-14  2:07 UTC (permalink / raw)
  To: Declan Doherty
  Cc: dev, Matan Azrad, stable, chas3, radu.nicolau, sharmila.podury

Someone to review please?

24/04/2018 13:29, Matan Azrad:
> The bonding PMD decides to activate\deactivate its slaves according to
> the slaves link statuses.
> Thus, it registers to the LSC events of the slaves ports and
> activates\deactivates them from its LSC callbacks called asynchronously
> by the host thread when the slave link status is changed.
> 
> In addition, the bonding PMD uses the callback for slave activation
> when it tries to start it, this operation is probably called by the
> master thread.
> 
> Consequently, a slave may be activated in the same time by two
> different threads and may cause a lot of optional errors, for example,
> slave mempool recreation with the same name causes an error.
> 
> Synchronize the critical section in the LSC callback using a special
> new spinlock.
> 
> Fixes: 414b202343ce ("bonding: fix initial link status of slave")
> Fixes: a45b288ef21a ("bond: support link status polling")
> Cc: stable@dpdk.org
> 
> Signed-off-by: Matan Azrad <matan@mellanox.com>

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [dpdk-stable] [PATCH] net/bonding: fix slave activation simultaneously
  2018-04-24 11:29 [dpdk-stable] [PATCH] net/bonding: fix slave activation simultaneously Matan Azrad
  2018-05-14  2:07 ` [dpdk-stable] [dpdk-dev] " Thomas Monjalon
@ 2018-05-14 11:45 ` Doherty, Declan
  2018-05-14 12:41   ` Ferruh Yigit
  1 sibling, 1 reply; 6+ messages in thread
From: Doherty, Declan @ 2018-05-14 11:45 UTC (permalink / raw)
  To: Matan Azrad; +Cc: dev, stable, Thomas Monjalon

On 24/04/2018 12:29 PM, Matan Azrad wrote:
> The bonding PMD decides to activate\deactivate its slaves according to
> the slaves link statuses.
> Thus, it registers to the LSC events of the slaves ports and
> activates\deactivates them from its LSC callbacks called asynchronously
> by the host thread when the slave link status is changed.
> 
> In addition, the bonding PMD uses the callback for slave activation
> when it tries to start it, this operation is probably called by the
> master thread.
> 
> Consequently, a slave may be activated in the same time by two
> different threads and may cause a lot of optional errors, for example,
> slave mempool recreation with the same name causes an error.
> 
> Synchronize the critical section in the LSC callback using a special
> new spinlock.
> 
> Fixes: 414b202343ce ("bonding: fix initial link status of slave")
> Fixes: a45b288ef21a ("bond: support link status polling")
> Cc: stable@dpdk.org
> 
> Signed-off-by: Matan Azrad <matan@mellanox.com>
> ---
...
> 

Acked-by: Declan Doherty <declan.doherty@intel.com>

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [dpdk-stable] [PATCH] net/bonding: fix slave activation simultaneously
  2018-05-14 11:45 ` [dpdk-stable] " Doherty, Declan
@ 2018-05-14 12:41   ` Ferruh Yigit
  2018-05-14 15:01     ` [dpdk-stable] [dpdk-dev] " Chas Williams
  0 siblings, 1 reply; 6+ messages in thread
From: Ferruh Yigit @ 2018-05-14 12:41 UTC (permalink / raw)
  To: Doherty, Declan, Matan Azrad; +Cc: dev, stable, Thomas Monjalon

On 5/14/2018 12:45 PM, Doherty, Declan wrote:
> On 24/04/2018 12:29 PM, Matan Azrad wrote:
>> The bonding PMD decides to activate\deactivate its slaves according to
>> the slaves link statuses.
>> Thus, it registers to the LSC events of the slaves ports and
>> activates\deactivates them from its LSC callbacks called asynchronously
>> by the host thread when the slave link status is changed.
>>
>> In addition, the bonding PMD uses the callback for slave activation
>> when it tries to start it, this operation is probably called by the
>> master thread.
>>
>> Consequently, a slave may be activated in the same time by two
>> different threads and may cause a lot of optional errors, for example,
>> slave mempool recreation with the same name causes an error.
>>
>> Synchronize the critical section in the LSC callback using a special
>> new spinlock.
>>
>> Fixes: 414b202343ce ("bonding: fix initial link status of slave")
>> Fixes: a45b288ef21a ("bond: support link status polling")
>> Cc: stable@dpdk.org
>>
>> Signed-off-by: Matan Azrad <matan@mellanox.com>
>> ---
> ...
>>
> 
> Acked-by: Declan Doherty <declan.doherty@intel.com>
> 

Applied to dpdk-next-net/master, thanks.

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [dpdk-stable] [dpdk-dev] [PATCH] net/bonding: fix slave activation simultaneously
  2018-05-14 12:41   ` Ferruh Yigit
@ 2018-05-14 15:01     ` Chas Williams
  2018-05-15  8:24       ` Matan Azrad
  0 siblings, 1 reply; 6+ messages in thread
From: Chas Williams @ 2018-05-14 15:01 UTC (permalink / raw)
  To: Ferruh Yigit; +Cc: Doherty, Declan, Matan Azrad, dev, stable, Thomas Monjalon

There's possibly an issue here:

        /* If the device isn't started don't handle interrupts */
        if (!bonded_eth_dev->data->dev_started)
                return rc;

        /* verify that port_id is a valid slave of bonded port */
        for (i = 0; i < internals->slave_count; i++) {
                if (internals->slaves[i].port_id == port_id) {
                        valid_slave = 1;
                        break;
                }
        }

        if (!valid_slave)
                return rc;

        /* Synchronize lsc callback parallel calls either by real link event
         * from the slaves PMDs or by the bonding PMD itself.
         */
        rte_spinlock_lock(&internals->lsc_lock);

Nothing keeps the master thread from modifying internals while this is
running and
your slave port may suddenly no longer be a slave port by the time you get
to lsc_lock.
I think there is probably an issue dev_started as well.  Again, nothing
keeps the LSC
thread from attempting to activate a slave that may have just been
stopped/removed.


On Mon, May 14, 2018 at 8:41 AM, Ferruh Yigit <ferruh.yigit@intel.com>
wrote:

> On 5/14/2018 12:45 PM, Doherty, Declan wrote:
> > On 24/04/2018 12:29 PM, Matan Azrad wrote:
> >> The bonding PMD decides to activate\deactivate its slaves according to
> >> the slaves link statuses.
> >> Thus, it registers to the LSC events of the slaves ports and
> >> activates\deactivates them from its LSC callbacks called asynchronously
> >> by the host thread when the slave link status is changed.
> >>
> >> In addition, the bonding PMD uses the callback for slave activation
> >> when it tries to start it, this operation is probably called by the
> >> master thread.
> >>
> >> Consequently, a slave may be activated in the same time by two
> >> different threads and may cause a lot of optional errors, for example,
> >> slave mempool recreation with the same name causes an error.
> >>
> >> Synchronize the critical section in the LSC callback using a special
> >> new spinlock.
> >>
> >> Fixes: 414b202343ce ("bonding: fix initial link status of slave")
> >> Fixes: a45b288ef21a ("bond: support link status polling")
> >> Cc: stable@dpdk.org
> >>
> >> Signed-off-by: Matan Azrad <matan@mellanox.com>
> >> ---
> > ...
> >>
> >
> > Acked-by: Declan Doherty <declan.doherty@intel.com>
> >
>
> Applied to dpdk-next-net/master, thanks.
>

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [dpdk-stable] [dpdk-dev] [PATCH] net/bonding: fix slave activation simultaneously
  2018-05-14 15:01     ` [dpdk-stable] [dpdk-dev] " Chas Williams
@ 2018-05-15  8:24       ` Matan Azrad
  0 siblings, 0 replies; 6+ messages in thread
From: Matan Azrad @ 2018-05-15  8:24 UTC (permalink / raw)
  To: Chas Williams, Ferruh Yigit; +Cc: Doherty, Declan, dev, stable, Thomas Monjalon

Hi

From: Chas Williams
> There's possibly an issue here:
> 
>         /* If the device isn't started don't handle interrupts */
>         if (!bonded_eth_dev->data->dev_started)
>                 return rc;
> 
>         /* verify that port_id is a valid slave of bonded port */
>         for (i = 0; i < internals->slave_count; i++) {
>                 if (internals->slaves[i].port_id == port_id) {
>                         valid_slave = 1;
>                         break;
>                 }
>         }
> 
>         if (!valid_slave)
>                 return rc;
> 
>         /* Synchronize lsc callback parallel calls either by real link event
>          * from the slaves PMDs or by the bonding PMD itself.
>          */
>         rte_spinlock_lock(&internals->lsc_lock);
> 
> Nothing keeps the master thread from modifying internals while this is running
> and your slave port may suddenly no longer be a slave port by the time you get
> to lsc_lock.
> I think there is probably an issue dev_started as well.  Again, nothing keeps the
> LSC thread from attempting to activate a slave that may have just been
> stopped/removed.

Yes, you right, the same issue for data-path functions, I think there is a big synchronization work that should be done for the bonding PMD.
This patch just synchronizes the callback parallel calls and solved an issue saw in mlx5 device, and do not pretend to solve all the sync issues.

> 
> On Mon, May 14, 2018 at 8:41 AM, Ferruh Yigit <ferruh.yigit@intel.com>
> wrote:
> On 5/14/2018 12:45 PM, Doherty, Declan wrote:
> > On 24/04/2018 12:29 PM, Matan Azrad wrote:
> >> The bonding PMD decides to activate\deactivate its slaves according
> >> to the slaves link statuses.
> >> Thus, it registers to the LSC events of the slaves ports and
> >> activates\deactivates them from its LSC callbacks called
> >> asynchronously by the host thread when the slave link status is changed.
> >>
> >> In addition, the bonding PMD uses the callback for slave activation
> >> when it tries to start it, this operation is probably called by the
> >> master thread.
> >>
> >> Consequently, a slave may be activated in the same time by two
> >> different threads and may cause a lot of optional errors, for
> >> example, slave mempool recreation with the same name causes an error.
> >>
> >> Synchronize the critical section in the LSC callback using a special
> >> new spinlock.
> >>
> >> Fixes: 414b202343ce ("bonding: fix initial link status of slave")
> >> Fixes: a45b288ef21a ("bond: support link status polling")
> >> Cc: stable@dpdk.org
> >>
> >> Signed-off-by: Matan Azrad <matan@mellanox.com>
> >> ---
> > ...
> >>
> >
> > Acked-by: Declan Doherty <declan.doherty@intel.com>
> >
> Applied to dpdk-next-net/master, thanks.
> 



^ permalink raw reply	[flat|nested] 6+ messages in thread

end of thread, other threads:[~2018-05-15  8:24 UTC | newest]

Thread overview: 6+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2018-04-24 11:29 [dpdk-stable] [PATCH] net/bonding: fix slave activation simultaneously Matan Azrad
2018-05-14  2:07 ` [dpdk-stable] [dpdk-dev] " Thomas Monjalon
2018-05-14 11:45 ` [dpdk-stable] " Doherty, Declan
2018-05-14 12:41   ` Ferruh Yigit
2018-05-14 15:01     ` [dpdk-stable] [dpdk-dev] " Chas Williams
2018-05-15  8:24       ` Matan Azrad

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).