From: Radu Nicolau
To: zhangsha.zhang@huawei.com, dev@dpdk.org, declan.doherty@intel.com
Cc: jerry.lilijun@huawei.com, zhoujingbin@huawei.com, caihe@huawei.com
Date: Mon, 4 Sep 2017 12:54:48 +0100
Subject: Re: [dpdk-dev] [PATCH v3] bonding: fix the segfault caused by the race condition between master thread and eal-intr-thread
References: <1501063992-10704-1-git-send-email-zhangsha.zhang@huawei.com>
In-Reply-To: <1501063992-10704-1-git-send-email-zhangsha.zhang@huawei.com>

Hi,

Wouldn't it be possible to treat the section of code that segfaults as a critical section, i.e. protect it with lock/unlock instead of triggering alarms?

On 7/26/2017 11:13 AM, zhangsha.zhang@huawei.com wrote:
> From: Sha Zhang
>
> Function slave_configure calls functions bond_ethdev_lsc_event_callback and
> slave_eth_dev->dev_ops->link_update to fix updating slave link status.
> But there is a low probability that the process may crash if the master
> thread, which creates the bonding device, raises the bond's
> active_slave_count to nonzero while its rx_ring or tx_ring has not yet
> been created.
>
> This patch moves the functions bond_ethdev_lsc_event_callback and
> slave_eth_dev->dev_ops->link_update to eal-intr-thread to avoid the
> race.
>
> Fixes: 210903803f6e ("net/bonding: fix updating slave link status")
>
> Signed-off-by: Sha Zhang
> ---
>  drivers/net/bonding/rte_eth_bond_pmd.c | 58 +++++++++++++++++++++++++++++-----
>  1 file changed, 50 insertions(+), 8 deletions(-)
>
> diff --git a/drivers/net/bonding/rte_eth_bond_pmd.c b/drivers/net/bonding/rte_eth_bond_pmd.c
> index 383e27c..bc0ee7f 100644
> --- a/drivers/net/bonding/rte_eth_bond_pmd.c
> +++ b/drivers/net/bonding/rte_eth_bond_pmd.c
> @@ -53,6 +53,7 @@
>
>  #define REORDER_PERIOD_MS 10
>  #define DEFAULT_POLLING_INTERVAL_10_MS (10)
> +#define BOND_LSC_DELAY_TIME_US (10 * 1000)
>
>  #define HASH_L4_PORTS(h) ((h)->src_port ^ (h)->dst_port)
>
> @@ -1800,14 +1801,6 @@ struct bwg_slave {
>  		}
>  	}
>
> -	/* If lsc interrupt is set, check initial slave's link status */
> -	if (slave_eth_dev->data->dev_flags & RTE_ETH_DEV_INTR_LSC) {
> -		slave_eth_dev->dev_ops->link_update(slave_eth_dev, 0);
> -		bond_ethdev_lsc_event_callback(slave_eth_dev->data->port_id,
> -			RTE_ETH_EVENT_INTR_LSC, &bonded_eth_dev->data->port_id,
> -			NULL);
> -	}
> -
>  	return 0;
>  }
>
> @@ -1878,6 +1871,51 @@ struct bwg_slave {
>  static void
>  bond_ethdev_promiscuous_enable(struct rte_eth_dev *eth_dev);
>
> +static void
> +bond_ethdev_slave_lsc_delay(void *cb_arg)
> +{
> +	struct rte_eth_dev *bonded_ethdev, *slave_dev;
> +	struct bond_dev_private *internals;
> +
> +	/* Default value for polling slave found is true as we don't
> +	 * want todisable the polling thread if we cannot get the lock.
> +	 */
> +	int i = 0;
> +
> +	if (!cb_arg)
> +		return;
> +
> +	bonded_ethdev = (struct rte_eth_dev *)cb_arg;
> +	if (!bonded_ethdev->data->dev_started)
> +		return;
> +
> +	internals = (struct bond_dev_private *)bonded_ethdev->data->dev_private;
> +	if (!rte_spinlock_trylock(&internals->lock)) {
> +		rte_eal_alarm_set(BOND_LSC_DELAY_TIME_US * 10,
> +				bond_ethdev_slave_lsc_delay,
> +				(void *)&rte_eth_devices[internals->port_id]);
> +		return;
> +	}
> +
> +	for (i = 0; i < internals->slave_count; i++) {
> +		slave_dev = &(rte_eth_devices[internals->slaves[i].port_id]);
> +		if (slave_dev->data->dev_conf.intr_conf.lsc != 0) {
> +			if (slave_dev->dev_ops &&
> +				slave_dev->dev_ops->link_update)
> +				slave_dev->dev_ops->link_update(slave_dev, 0);
> +			bond_ethdev_lsc_event_callback(
> +				internals->slaves[i].port_id,
> +				RTE_ETH_EVENT_INTR_LSC,
> +				&bonded_ethdev->data->port_id, NULL);
> +		}
> +	}
> +	rte_spinlock_unlock(&internals->lock);
> +	RTE_LOG(INFO, EAL,
> +		"bond %s(%u): slave num %d, current active slave num %d\n",
> +		bonded_ethdev->data->name, bonded_ethdev->data->port_id,
> +		internals->slave_count, internals->active_slave_count);
> +}
> +
>  static int
>  bond_ethdev_start(struct rte_eth_dev *eth_dev)
>  {
> @@ -1953,6 +1991,10 @@ struct bwg_slave {
>  		if (internals->slaves[i].link_status_poll_enabled)
>  			internals->link_status_polling_enabled = 1;
>  	}
> +
> +	rte_eal_alarm_set(BOND_LSC_DELAY_TIME_US, bond_ethdev_slave_lsc_delay,
> +			(void *)&rte_eth_devices[internals->port_id]);
> +
>  	/* start polling if needed */
>  	if (internals->link_status_polling_enabled) {
>  		rte_eal_alarm_set(