From: Ferruh Yigit
To: Radu Nicolau, zhangsha.zhang@huawei.com, dev@dpdk.org, declan.doherty@intel.com
Cc: jerry.lilijun@huawei.com, zhoujingbin@huawei.com, caihe@huawei.com
Date: Wed, 29 Nov 2017 09:45:41 -0800
Message-ID: <9a4101ce-e15d-4529-27b3-72f053b2ac6a@intel.com>
References: <1501063992-10704-1-git-send-email-zhangsha.zhang@huawei.com>
List-Id: DPDK patches and discussions
Subject: Re: [dpdk-dev] [PATCH v3] bonding: fix the segfault caused by the race condition between master thread and eal-intr-thread

On 9/4/2017 4:54 AM, Radu Nicolau wrote:
> Hi,
>
> Wouldn't it be possible to treat the section of code that segfaults as a
> critical one, i.e. use the lock/unlock instead of triggering alarms?

Hi Sha,

Any update? Is this patch still valid?

Thanks,
ferruh

>
>
> On 7/26/2017 11:13 AM, zhangsha.zhang@huawei.com wrote:
>> From: Sha Zhang
>>
>> Function slave_configure calls functions bond_ethdev_lsc_event_callback and
>> slave_eth_dev->dev_ops->link_update to fix updating slave link status.
>> But there is a low probability that the process may crash if the master
>> thread, which creates the bonding device, sets the active_slave_count of
>> the bond to nonzero while its rx_ring or tx_ring has not been created yet.
>>
>> This patch moves the functions bond_ethdev_lsc_event_callback and
>> slave_eth_dev->dev_ops->link_update to eal-intr-thread to avoid the
>> race.
>>
>> Fixes: 210903803f6e ("net/bonding: fix updating slave link status")
>>
>> Signed-off-by: Sha Zhang
>> ---
>>  drivers/net/bonding/rte_eth_bond_pmd.c | 58 +++++++++++++++++++++++++++++-----
>>  1 file changed, 50 insertions(+), 8 deletions(-)
>>
>> diff --git a/drivers/net/bonding/rte_eth_bond_pmd.c b/drivers/net/bonding/rte_eth_bond_pmd.c
>> index 383e27c..bc0ee7f 100644
>> --- a/drivers/net/bonding/rte_eth_bond_pmd.c
>> +++ b/drivers/net/bonding/rte_eth_bond_pmd.c
>> @@ -53,6 +53,7 @@
>>
>>  #define REORDER_PERIOD_MS 10
>>  #define DEFAULT_POLLING_INTERVAL_10_MS (10)
>> +#define BOND_LSC_DELAY_TIME_US (10 * 1000)
>>
>>  #define HASH_L4_PORTS(h) ((h)->src_port ^ (h)->dst_port)
>>
>> @@ -1800,14 +1801,6 @@ struct bwg_slave {
>>  		}
>>  	}
>>
>> -	/* If lsc interrupt is set, check initial slave's link status */
>> -	if (slave_eth_dev->data->dev_flags & RTE_ETH_DEV_INTR_LSC) {
>> -		slave_eth_dev->dev_ops->link_update(slave_eth_dev, 0);
>> -		bond_ethdev_lsc_event_callback(slave_eth_dev->data->port_id,
>> -			RTE_ETH_EVENT_INTR_LSC, &bonded_eth_dev->data->port_id,
>> -			NULL);
>> -	}
>> -
>>  	return 0;
>>  }
>>
>> @@ -1878,6 +1871,51 @@ struct bwg_slave {
>>  static void
>>  bond_ethdev_promiscuous_enable(struct rte_eth_dev *eth_dev);
>>
>> +static void
>> +bond_ethdev_slave_lsc_delay(void *cb_arg)
>> +{
>> +	struct rte_eth_dev *bonded_ethdev, *slave_dev;
>> +	struct bond_dev_private *internals;
>> +
>> +	/* Default value for polling slave found is true as we don't
>> +	 * want to disable the polling thread if we cannot get the lock.
>> +	 */
>> +	int i = 0;
>> +
>> +	if (!cb_arg)
>> +		return;
>> +
>> +	bonded_ethdev = (struct rte_eth_dev *)cb_arg;
>> +	if (!bonded_ethdev->data->dev_started)
>> +		return;
>> +
>> +	internals = (struct bond_dev_private *)bonded_ethdev->data->dev_private;
>> +	if (!rte_spinlock_trylock(&internals->lock)) {
>> +		rte_eal_alarm_set(BOND_LSC_DELAY_TIME_US * 10,
>> +			bond_ethdev_slave_lsc_delay,
>> +			(void *)&rte_eth_devices[internals->port_id]);
>> +		return;
>> +	}
>> +
>> +	for (i = 0; i < internals->slave_count; i++) {
>> +		slave_dev = &(rte_eth_devices[internals->slaves[i].port_id]);
>> +		if (slave_dev->data->dev_conf.intr_conf.lsc != 0) {
>> +			if (slave_dev->dev_ops &&
>> +					slave_dev->dev_ops->link_update)
>> +				slave_dev->dev_ops->link_update(slave_dev, 0);
>> +			bond_ethdev_lsc_event_callback(
>> +				internals->slaves[i].port_id,
>> +				RTE_ETH_EVENT_INTR_LSC,
>> +				&bonded_ethdev->data->port_id, NULL);
>> +		}
>> +	}
>> +	rte_spinlock_unlock(&internals->lock);
>> +	RTE_LOG(INFO, EAL,
>> +		"bond %s(%u): slave num %d, current active slave num %d\n",
>> +		bonded_ethdev->data->name, bonded_ethdev->data->port_id,
>> +		internals->slave_count, internals->active_slave_count);
>> +}
>> +
>>  static int
>>  bond_ethdev_start(struct rte_eth_dev *eth_dev)
>>  {
>> @@ -1953,6 +1991,10 @@ struct bwg_slave {
>>  		if (internals->slaves[i].link_status_poll_enabled)
>>  			internals->link_status_polling_enabled = 1;
>>  	}
>> +
>> +	rte_eal_alarm_set(BOND_LSC_DELAY_TIME_US, bond_ethdev_slave_lsc_delay,
>> +		(void *)&rte_eth_devices[internals->port_id]);
>> +
>>  	/* start polling if needed */
>>  	if (internals->link_status_polling_enabled) {
>>  		rte_eal_alarm_set(
>
>
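
For reference, the lock/unlock alternative Radu asks about could look roughly like the sketch below. It is a minimal standalone model, not driver code: it uses a C11 atomic_flag in place of rte_spinlock, and struct bond_model, rings_ready and the model_* functions are hypothetical names made up for illustration. The idea is that the master thread publishes the rings and the active slave count inside one critical section, and the link-status path takes the same lock, so it should never see a nonzero count without the rings. Whether every reader in the real driver takes internals->lock is an assumption that would need checking.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical, simplified stand-in for struct bond_dev_private. */
struct bond_model {
	atomic_flag lock;       /* plays the role of internals->lock */
	bool rings_ready;       /* stands in for rx_ring/tx_ring being created */
	int active_slave_count;
};

/* Spin until the flag is acquired (assumed analogue of a spinlock lock). */
void model_lock(struct bond_model *b)
{
	while (atomic_flag_test_and_set_explicit(&b->lock, memory_order_acquire))
		; /* busy-wait */
}

void model_unlock(struct bond_model *b)
{
	atomic_flag_clear_explicit(&b->lock, memory_order_release);
}

void model_init(struct bond_model *b)
{
	atomic_flag_clear(&b->lock);
	b->rings_ready = false;
	b->active_slave_count = 0;
}

/* Master-thread side: create the rings and activate the slaves in one
 * critical section, so the two updates become visible together.
 */
void model_start(struct bond_model *b, int slaves)
{
	model_lock(b);
	b->rings_ready = true;
	b->active_slave_count = slaves;
	model_unlock(b);
}

/* Interrupt-thread side: returns the active slave count it observed, or
 * -1 if it saw active slaves without rings (the state behind the crash).
 */
int model_lsc_event(struct bond_model *b)
{
	int seen;

	model_lock(b);
	if (b->active_slave_count > 0 && !b->rings_ready)
		seen = -1;
	else
		seen = b->active_slave_count;
	model_unlock(b);
	return seen;
}
```

Compared with the alarm-based patch, this blocks the caller instead of retrying later, which may or may not be acceptable on the interrupt thread.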