From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mails.dpdk.org (mails.dpdk.org [217.70.189.124]) by inbox.dpdk.org (Postfix) with ESMTP id 41043A0093; Tue, 3 May 2022 08:55:07 +0200 (CEST) Received: from [217.70.189.124] (localhost [127.0.0.1]) by mails.dpdk.org (Postfix) with ESMTP id CFA1540C35; Tue, 3 May 2022 08:55:06 +0200 (CEST) Received: from szxga02-in.huawei.com (szxga02-in.huawei.com [45.249.212.188]) by mails.dpdk.org (Postfix) with ESMTP id 0549740691; Tue, 3 May 2022 08:55:03 +0200 (CEST) Received: from kwepemi500012.china.huawei.com (unknown [172.30.72.54]) by szxga02-in.huawei.com (SkyGuard) with ESMTP id 4KsrL14qWfzhXWW; Tue, 3 May 2022 14:54:33 +0800 (CST) Received: from [10.67.103.128] (10.67.103.128) by kwepemi500012.china.huawei.com (7.221.188.12) with Microsoft SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256) id 15.1.2375.24; Tue, 3 May 2022 14:55:00 +0800 Subject: Re: [PATCH V2 1/4] net/bonding: fix non-active slaves aren't stopped To: Ferruh Yigit , CC: Huisong Li , , Chas Williams , Radu Nicolau References: <20211025063922.34066-1-humin29@huawei.com> <20220324030036.4761-1-humin29@huawei.com> <20220324030036.4761-2-humin29@huawei.com> <212e85ef-228a-d31e-e5a1-8b2331c7eef9@xilinx.com> From: "Min Hu (Connor)" Message-ID: <447752cb-d01b-df08-b5b9-0d331a20bf26@huawei.com> Date: Tue, 3 May 2022 14:54:59 +0800 User-Agent: Mozilla/5.0 (Windows NT 10.0; WOW64; rv:68.0) Gecko/20100101 Thunderbird/68.3.1 MIME-Version: 1.0 In-Reply-To: Content-Type: text/plain; charset="utf-8"; format=flowed Content-Transfer-Encoding: 8bit X-Originating-IP: [10.67.103.128] X-ClientProxiedBy: dggems706-chm.china.huawei.com (10.3.19.183) To kwepemi500012.china.huawei.com (7.221.188.12) X-CFilter-Loop: Reflected X-BeenThere: dev@dpdk.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: DPDK patches and discussions List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: dev-bounces@dpdk.org Hi, Ferruh, 在 2022/4/29 21:31, Ferruh Yigit 写道: > On 4/29/2022 7:45 AM, Min Hu (Connor) wrote: >> Hi, Ferruh, >> >> 在 2022/4/27 2:19, Ferruh Yigit 写道: >>> On 3/24/2022 3:00 AM, Min Hu (Connor) wrote: >>>> From: Huisong Li >>>> >>>> When stopping a bonded port, all slaves should be deactivated. But only >>> >>> s/deactivated/stopped/ ? >> not agreed. deactivated and stopped are different state for slave. >> > > Just to clarify the sentences, otherwise I see the 'stopped' and > 'deactivated' states are different. > Next sentences complains that not all ports are stopped: "But only > active slaves are stopped.", so I thought intention in this sentences to > claim that all slaves should be stopped (but it mentions all slaves > should be 'deactivated'). > As long as you address the disconnection between two sentences, I don't > mind the wording. Actually, there is something wrong with the wording. Yes, I should take your advice. > >>> >>>> active slaves are stopped. So fix it and do "deactivae_slave()" for >>>> active >>> >>> s/deactivae_slave()/deactivate_slave()/ >>> >> agreed. >> >>>> slaves. >>> >>> Hi Connor, >>> >>> When a bonding port is closed, is it clear if all slave ports or >>> active slave ports should be stopped? >> Yes, I think all the slave ports should be stopped(or try to be stopped). >>> >>>> >>>> Fixes: 0911d4ec0183 ("net/bonding: fix crash when stopping mode 4 >>>> port") >>>> Cc: stable@dpdk.org >>>> >>>> Signed-off-by: Huisong Li >>>> Signed-off-by: Min Hu (Connor) >>>> --- >>>>   drivers/net/bonding/rte_eth_bond_pmd.c | 20 +++++++++++--------- >>>>   1 file changed, 11 insertions(+), 9 deletions(-) >>>> >>>> diff --git a/drivers/net/bonding/rte_eth_bond_pmd.c >>>> b/drivers/net/bonding/rte_eth_bond_pmd.c >>>> index b305b6a35b..469dc71170 100644 >>>> --- a/drivers/net/bonding/rte_eth_bond_pmd.c >>>> +++ b/drivers/net/bonding/rte_eth_bond_pmd.c >>>> @@ -2118,18 +2118,20 @@ bond_ethdev_stop(struct rte_eth_dev *eth_dev) >>>>       internals->link_status_polling_enabled = 0; >>>>       for (i = 0; i < internals->slave_count; i++) { >>>>           uint16_t slave_id = internals->slaves[i].port_id; >>>> + >>>> +        internals->slaves[i].last_link_status = 0; >>>> +        ret = rte_eth_dev_stop(slave_id); >>>> +        if (ret != 0) { >>>> +            RTE_BOND_LOG(ERR, "Failed to stop device on port %u", >>>> +                     slave_id); >>>> +            return ret; >>> >>> Should it return here or try to stop all ports? >>> What about to record the return status, but keep continue to stop all >>> ports. And return error if any of the stop failed? Well, I am glad you have found something unreasaonable about 'stop'. Let us see API 'rte_eth_dev_stop' rte_eth_dev_stop(dev) { .... dev->data->dev_started = 0; ret = (*dev->dev_ops->dev_stop)(dev) retur ret; } This is unreasaonable. No matter 'dev_ops->dev_stop' succeed or fail, the state 'dev_started ' will always set to be '0'. But this does not only influence bonding device but other devices like eth dev or vdev. This is the bug in rte ethdev level. I will send another patch to fix it. >> I think no need to do this. APP only see the bonded device. If bonded >> device stop failed, it means it works failed. And the number of >> "stopped" successfully slave does not make any sense. >> > > OK if trying to stop as much as possible 'slave' devices doesn't make > sense, we can keep as it is. > > Btw, when functions fails at this point, bonding device itself already > marked as stopped, right? And some of the slave devices may be stopped > already before failure. > I don't know how confusing this is for the user, that stop() function is > failed but bonding device state is 'stopped'. I don't know if function > should recover at least bonding device status (back to started) on > failure, what do you think? > >>> >>>> +        } >>>> + >>>> +        /* active slaves need to deactivate. */ >>> >>> " active slaves need to be deactivated. " ? >> agreed. >>> >>>>           if (find_slave_by_id(internals->active_slaves, >>>>                   internals->active_slave_count, slave_id) != >>>> -                        internals->active_slave_count) { >>>> -            internals->slaves[i].last_link_status = 0; >>>> -            ret = rte_eth_dev_stop(slave_id); >>>> -            if (ret != 0) { >>>> -                RTE_BOND_LOG(ERR, "Failed to stop device on port %u", >>>> -                         slave_id); >>>> -                return ret; >>>> -            } >>>> +                internals->active_slave_count) >>> >>> I think original indentation for this line is better. >>> >> agreed. >>>>               deactivate_slave(eth_dev, slave_id); >>>> -        } >>>>       } >>>>       return 0; >>> >>> . > > .