From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <3chas3@gmail.com>
Received: from mail-it0-f68.google.com (mail-it0-f68.google.com
 [209.85.214.68]) by dpdk.org (Postfix) with ESMTP id 176781C8EF;
 Mon, 14 May 2018 17:01:25 +0200 (CEST)
Received: by mail-it0-f68.google.com with SMTP id q4-v6so11550142ite.3;
 Mon, 14 May 2018 08:01:24 -0700 (PDT)
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20161025;
 h=mime-version:in-reply-to:references:from:date:message-id:subject:to
 :cc; bh=/TM+OfopIX9QlosKPQwsgzIVk1p1fSrOibxfWX7OOGQ=;
 b=dqFr+69b5S02kDtKrdKM10SWRfQ2dwv2E+WyWP3VDuy3/0i1DFPZPehBt2vxiTvU89
 fGBBRuMSstolaIj/4M/SE1vKDKnVEjtU3lWfHkCdvStng7FFpOK8NLrPTC5Sf+4xtRuM
 l9/cRF49LWQdrJbbXvQbIorMjRUYVGcz+nzsdBcyUVwl0miayRiqPa6cqoOMyYgbEXZQ
 vlTSXSMr4FofiAsDhxV0WrbywHotBaW6XaavO+mRiZvNjC6InRgQtRydsYexi9v8EkqI
 Im7g1neVMvbsZlxIUw+z3jD4MHBbtHohxO+OFGdpO6wzmI2BrYSvOM8RmarFMwQ2hYc5
 j9+w==
X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed;
 d=1e100.net; s=20161025;
 h=x-gm-message-state:mime-version:in-reply-to:references:from:date
 :message-id:subject:to:cc;
 bh=/TM+OfopIX9QlosKPQwsgzIVk1p1fSrOibxfWX7OOGQ=;
 b=n6OJmHZtUtworZYtjSV2kbuKzR6Ftmo3LptoXlkOr4iU5X+Pnx7PIxIfO2fCnoGObV
 BQVPQH5V/p96X4PIW4gbcDTOoNC+3EUOIwuEp4PnzttJF3EqlryLxyJbU92vSz2QiqtY
 Yr9z0roIOwbIgiDU38jSh4JK9ffb1200eQu+YtZXo+ymiMkVGThAWERL2x36SizVC64C
 fOEiiTPs35VJjjcPBZ+SzZg4vEMYQYn2S8kdJcUH+3kxbVdGWZSKjfOgMKXYbH9qxqfh
 klYCjt2LjbM2UdcK2lJZSx1ZMZigBJDfbLmAUpyY/KmxfB43Cfqm2NwZNCKR0Wd1zL64
 tyTw==
X-Gm-Message-State: ALKqPwem1zcYff+dwFJ9XLivEMd0l9IRexKRm5gfOPgDnsLcR0lzDcQy
 DJxRTZnQWDdWmPOlmPV3V9NtZM3ViyFU24fMB3nuvg==
X-Google-Smtp-Source: AB8JxZoPBIh1mNggg4wvZVtlXwaFJgQ4YgBTkzY3exlwqgrTIonf+Lhpftl6lU+VHDHEx/HHtqhEtW+vo0y/LPq5bGk=
X-Received: by 2002:a6b:b386:: with SMTP id
 c128-v6mr10449561iof.50.1526310084405; 
 Mon, 14 May 2018 08:01:24 -0700 (PDT)
MIME-Version: 1.0
Received: by 10.107.143.22 with HTTP; Mon, 14 May 2018 08:01:23 -0700 (PDT)
In-Reply-To: <1dfd3e27-9870-8851-cf92-7b4e36c49154@intel.com>
References: <1524569370-6799-1-git-send-email-matan@mellanox.com>
 <0aad672c-210a-db8c-53f3-98660fe17f1d@intel.com>
 <1dfd3e27-9870-8851-cf92-7b4e36c49154@intel.com>
From: Chas Williams <3chas3@gmail.com>
Date: Mon, 14 May 2018 11:01:23 -0400
Message-ID: <CAG2-GkkOkB5EVAn1_PdwQc1et=qL2vQrj6BPeRXrdWwydkjTDQ@mail.gmail.com>
To: Ferruh Yigit <ferruh.yigit@intel.com>
Cc: "Doherty, Declan" <declan.doherty@intel.com>,
 Matan Azrad <matan@mellanox.com>, dev@dpdk.org, 
 stable@dpdk.org, Thomas Monjalon <thomas@monjalon.net>
Content-Type: text/plain; charset="UTF-8"
X-Content-Filtered-By: Mailman/MimeDel 2.1.15
Subject: Re: [dpdk-stable] [dpdk-dev] [PATCH] net/bonding: fix slave
 activation simultaneously
X-BeenThere: stable@dpdk.org
X-Mailman-Version: 2.1.15
Precedence: list
List-Id: patches for DPDK stable branches <stable.dpdk.org>
List-Unsubscribe: <https://dpdk.org/ml/options/stable>,
 <mailto:stable-request@dpdk.org?subject=unsubscribe>
List-Archive: <http://dpdk.org/ml/archives/stable/>
List-Post: <mailto:stable@dpdk.org>
List-Help: <mailto:stable-request@dpdk.org?subject=help>
List-Subscribe: <https://dpdk.org/ml/listinfo/stable>,
 <mailto:stable-request@dpdk.org?subject=subscribe>
X-List-Received-Date: Mon, 14 May 2018 15:01:25 -0000

There's possibly an issue here:

        /* If the device isn't started don't handle interrupts */
        if (!bonded_eth_dev->data->dev_started)
                return rc;

        /* verify that port_id is a valid slave of bonded port */
        for (i = 0; i < internals->slave_count; i++) {
                if (internals->slaves[i].port_id == port_id) {
                        valid_slave = 1;
                        break;
                }
        }

        if (!valid_slave)
                return rc;

        /* Synchronize lsc callback parallel calls either by real link event
         * from the slaves PMDs or by the bonding PMD itself.
         */
        rte_spinlock_lock(&internals->lsc_lock);

Nothing keeps the master thread from modifying internals while this is
running and
your slave port may suddenly no longer be a slave port by the time you get
to lsc_lock.
I think there is probably an issue dev_started as well.  Again, nothing
keeps the LSC
thread from attempting to activate a slave that may have just been
stopped/removed.


On Mon, May 14, 2018 at 8:41 AM, Ferruh Yigit <ferruh.yigit@intel.com>
wrote:

> On 5/14/2018 12:45 PM, Doherty, Declan wrote:
> > On 24/04/2018 12:29 PM, Matan Azrad wrote:
> >> The bonding PMD decides to activate\deactivate its slaves according to
> >> the slaves link statuses.
> >> Thus, it registers to the LSC events of the slaves ports and
> >> activates\deactivates them from its LSC callbacks called asynchronously
> >> by the host thread when the slave link status is changed.
> >>
> >> In addition, the bonding PMD uses the callback for slave activation
> >> when it tries to start it, this operation is probably called by the
> >> master thread.
> >>
> >> Consequently, a slave may be activated in the same time by two
> >> different threads and may cause a lot of optional errors, for example,
> >> slave mempool recreation with the same name causes an error.
> >>
> >> Synchronize the critical section in the LSC callback using a special
> >> new spinlock.
> >>
> >> Fixes: 414b202343ce ("bonding: fix initial link status of slave")
> >> Fixes: a45b288ef21a ("bond: support link status polling")
> >> Cc: stable@dpdk.org
> >>
> >> Signed-off-by: Matan Azrad <matan@mellanox.com>
> >> ---
> > ...
> >>
> >
> > Acked-by: Declan Doherty <declan.doherty@intel.com>
> >
>
> Applied to dpdk-next-net/master, thanks.
>