From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mails.dpdk.org (mails.dpdk.org [217.70.189.124]) by inbox.dpdk.org (Postfix) with ESMTP id 4786FA04A7; Tue, 8 Feb 2022 02:19:51 +0100 (CET) Received: from [217.70.189.124] (localhost [127.0.0.1]) by mails.dpdk.org (Postfix) with ESMTP id CD16D4014E; Tue, 8 Feb 2022 02:19:50 +0100 (CET) Received: from szxga03-in.huawei.com (szxga03-in.huawei.com [45.249.212.189]) by mails.dpdk.org (Postfix) with ESMTP id 8D8ED40141 for ; Tue, 8 Feb 2022 02:19:48 +0100 (CET) Received: from kwepemi100010.china.huawei.com (unknown [172.30.72.57]) by szxga03-in.huawei.com (SkyGuard) with ESMTP id 4Jt4pq4WYBzdZTQ; Tue, 8 Feb 2022 09:16:35 +0800 (CST) Received: from kwepemm600004.china.huawei.com (7.193.23.242) by kwepemi100010.china.huawei.com (7.221.188.54) with Microsoft SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256) id 15.1.2308.21; Tue, 8 Feb 2022 09:19:46 +0800 Received: from [10.67.103.231] (10.67.103.231) by kwepemm600004.china.huawei.com (7.193.23.242) with Microsoft SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256) id 15.1.2308.21; Tue, 8 Feb 2022 09:19:45 +0800 Message-ID: Date: Tue, 8 Feb 2022 09:19:45 +0800 MIME-Version: 1.0 User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:91.0) Gecko/20100101 Thunderbird/91.2.0 Subject: Re: [dpdk-dev] [PATCH 1/3] app/testpmd: fix port status of active slave device To: Ferruh Yigit , "Min Hu (Connor)" , CC: , , Radu Nicolau , "Singh, Aman Deep" References: <20211025063922.34066-1-humin29@huawei.com> <20211025063922.34066-2-humin29@huawei.com> From: "lihuisong (C)" In-Reply-To: Content-Type: text/plain; charset="UTF-8"; format=flowed Content-Transfer-Encoding: 8bit X-Originating-IP: [10.67.103.231] X-ClientProxiedBy: dggems701-chm.china.huawei.com (10.3.19.178) To kwepemm600004.china.huawei.com (7.193.23.242) X-CFilter-Loop: Reflected X-BeenThere: dev@dpdk.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: DPDK patches and discussions List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: dev-bounces@dpdk.org 在 2022/2/4 20:07, Ferruh Yigit 写道: > On 10/25/2021 7:39 AM, Min Hu (Connor) wrote: >> From: Huisong Li >> >> Stopping a bond device also stops all active slaves under the bond >> device. >> If this port is bond device, we need to modify the port status of all >> slaves from RTE_PORT_STARTED to RTE_PORT_STOPPED. >> >> Fixes: 0e545d3047fe ("app/testpmd: check stopping port is not in >> bonding") >> Cc: stable@dpdk.org >> >> Signed-off-by: Huisong Li >> Signed-off-by: Min Hu (Connor) >> --- >>   app/test-pmd/cmdline.c |  1 + >>   app/test-pmd/testpmd.c | 49 +++++++++++++++++++++++++++++++++++++++--- >>   app/test-pmd/testpmd.h |  3 ++- >>   3 files changed, 49 insertions(+), 4 deletions(-) >> >> diff --git a/app/test-pmd/cmdline.c b/app/test-pmd/cmdline.c >> index 722f4fb9d9..5bfb4b509b 100644 >> --- a/app/test-pmd/cmdline.c >> +++ b/app/test-pmd/cmdline.c >> @@ -6639,6 +6639,7 @@ static void >> cmd_create_bonded_device_parsed(void *parsed_result, >>                   "Failed to enable promiscuous mode for port %u: %s >> - ignore\n", >>                   port_id, rte_strerror(-ret)); >>   +        ports[port_id].bond_flag = 1; >>           ports[port_id].need_setup = 0; >>           ports[port_id].port_status = RTE_PORT_STOPPED; >>       } >> diff --git a/app/test-pmd/testpmd.c b/app/test-pmd/testpmd.c >> index af0e79fe6d..d6b9ebc4dd 100644 >> --- a/app/test-pmd/testpmd.c >> +++ b/app/test-pmd/testpmd.c >> @@ -65,6 +65,9 @@ >>   #ifdef RTE_EXEC_ENV_WINDOWS >>   #include >>   #endif >> +#ifdef RTE_NET_BOND >> +#include >> +#endif >>     #include "testpmd.h" >>   @@ -2986,6 +2989,35 @@ start_port(portid_t pid) >>       return 0; >>   } >>   +#ifdef RTE_NET_BOND >> +static void >> +change_bonding_active_slave_port_status(portid_t bond_pid) > > The function sets the status explicitly to PORT_STOPPED, but function > name is more generic, should we update the function name to reflect the > functionality? ok > >> +{ >> +    portid_t slave_pids[RTE_MAX_ETHPORTS]; >> +    struct rte_port *port; >> +    int num_active_slaves; >> +    portid_t slave_pid; >> +    int i; >> + >> +    num_active_slaves = rte_eth_bond_active_slaves_get(bond_pid, >> slave_pids, >> +                               RTE_MAX_ETHPORTS); >> +    if (num_active_slaves < 0) { >> +        fprintf(stderr, "Failed to get slave list for port = %u\n", >> +            bond_pid); >> +        return; >> +    } >> + >> +    for (i = 0; i < num_active_slaves; i++) { >> +        slave_pid = slave_pids[i]; >> +        port = &ports[slave_pid]; >> +        if (rte_atomic16_cmpset(&(port->port_status), >> +            RTE_PORT_STARTED, RTE_PORT_STOPPED) == 0) >> +            fprintf(stderr, "Port %u can not be set into stopped\n", >> +                slave_pid); >> +    } >> +} >> +#endif >> + >>   void >>   stop_port(portid_t pid) >>   { >> @@ -3042,9 +3074,20 @@ stop_port(portid_t pid) >>           if (port->flow_list) >>               port_flow_flush(pi); >>   -        if (eth_dev_stop_mp(pi) != 0) >> -            RTE_LOG(ERR, EAL, "rte_eth_dev_stop failed for port %u\n", >> -                pi); > > Can you please remove the 'eth_dev_stop_mp()' function in this patch, > which is removed in patch 2/3. ok > >> +        if (is_proc_primary()) { >> +#ifdef RTE_NET_BOND >> +            /* >> +             * Stopping a bond device also stops all active slaves >> +             * under the bond device. If this port is bond device, >> +             * we need to modify the port status of all slaves. >> +             */ >> +            if (port->bond_flag == 1) >> +                change_bonding_active_slave_port_status(pi); >> +#endif >> +            if (rte_eth_dev_stop(pi) != 0) >> +                RTE_LOG(ERR, EAL, "rte_eth_dev_stop failed for port >> %u\n", >> +                    pi); > > Should we roll back the slave port status if 'rte_eth_dev_stop(pi)' > fails? Yes, it is necessary here for slaves to fail to execute dev_stop() in bonding driver. Btw, in thinking about this, I find a behavior that is not very reasonable. Namely, only active slaves are stopped when a bonding device is stopped. It can cause confusion in port status. For example, applications have to only modify active slaves status to RTE_PORT_STOPPED and non-active slaves status is still RTE_PORT_STARTED. I think the bonding PMD should stop all slaves when a bonding device is stopped. I checked the modification history about this in the bonding PMD. This behavior is introduced by the following patch. /* commit 0911d4ec01839c9149a0df5758d00d9d57a47cea Author: Radu Nicolau Date:   Thu Nov 8 15:26:42 2018 +0000     net/bonding: fix crash when stopping mode 4 port     When stopping a bonded port all slaves are deactivated. Attempting     to deactivate a slave that was never activated will result in a segfault     when mode 4 is used.     Fixes: 7486331308f6 ("net/bonding: stop and deactivate slaves on stop")     Cc: stable@dpdk.org     Signed-off-by: Radu Nicolau     Acked-by: Chas Williams */ The root cause of the problem the above patch mentioned is that in mode 4, the bonding PMD does not allocate rx/tx rings to non-active slave devices. The call stack is as follows: #0  0x0000000000b1250c in rte_ring_dequeue_bulk_elem (available=0x0, n=1, esize=8, obj_table=0xffffffff7c80, r=0x0) at ../dpdk-next-net/lib/ring/rte_ring_elem.h:380 #1  rte_ring_dequeue_elem (esize=8, obj_p=0xffffffff7c80, r=0x0) at ../dpdk-next-net/lib/ring/rte_ring_elem.h:476 #2  rte_ring_dequeue (obj_p=0xffffffff7c80, r=0x0) at ../dpdk-next-net/lib/ring/rte_ring.h:463 #3  bond_mode_8023ad_deactivate_slave (bond_dev=0x4753200 , slave_id=0) at ../dpdk-next-net/drivers/net/bonding/rte_eth_bond_8023ad.c:1163 #4  0x0000000000b29e10 in deactivate_slave (eth_dev=0x4753200 , port_id=0) at ../dpdk-next-net/drivers/net/bonding/rte_eth_bond_api.c:117 #5  0x0000000000b44208 in bond_ethdev_stop (eth_dev=0x4753200 ) at ../dpdk-next-net/drivers/net/bonding/rte_eth_bond_pmd.c:2103 #6  0x00000000007966fc in rte_eth_dev_stop (port_id=2) at ../dpdk-next-net/lib/ethdev/rte_ethdev.c:1894 #7  0x000000000055ea60 in eth_dev_stop_mp (port_id=2) at ../dpdk-next-net/app/test-pmd/testpmd.c:613 #8  0x0000000000565230 in stop_port (pid=2) at ../dpdk-next-net/app/test-pmd/testpmd.c:3059 #9  0x00000000004f7614 in cmd_operate_specific_port_parsed (parsed_result=0xffffffff91b0, cl=0x4829250, data=0x0) at ../dpdk-next-net/app/test-pmd/cmdline.c:1261 #10 0x000000000078be24 in cmdline_parse (cl=0x4829250, buf=0x4829298 "port stop 2\n") at ../dpdk-next-net/lib/cmdline/cmdline_parse.c:290 #11 0x0000000000789c34 in cmdline_valid_buffer (rdl=0x4829260, buf=0x4829298 "port stop 2\n", size=13) at ../dpdk-next-net/lib/cmdline/cmdline.c:26 #12 0x000000000078f160 in rdline_char_in (rdl=0x4829260, c=10 '\n') at ../dpdk-next-net/lib/cmdline/cmdline_rdline.c:446 #13 0x000000000078a0c8 in cmdline_in (cl=0x4829250, buf=0xfffffffff2e7 "\n", size=1) at ../dpdk-next-net/lib/cmdline/cmdline.c:148 #14 0x000000000078a3b4 in cmdline_interact (cl=0x4829250) at ../dpdk-next-net/lib/cmdline/cmdline.c:222 #15 0x000000000050bf98 in prompt () at ../dpdk-next-net/app/test-pmd/cmdline.c:18001 #16 0x00000000005687c4 in main (argc=4, argv=0xfffffffff510) at ../dpdk-next-net/app/test-pmd/testpmd.c:4268 For the problem Radu encountered, we only need to ensure that non-active slaves doesn't deactivate. I plan to add a patch in this patchset to fix this problem. What do you think, Ferruh? > >> +        } >>             if (rte_atomic16_cmpset(&(port->port_status), >>               RTE_PORT_HANDLING, RTE_PORT_STOPPED) == 0) >> diff --git a/app/test-pmd/testpmd.h b/app/test-pmd/testpmd.h >> index e3995d24ab..ad3b4f875c 100644 >> --- a/app/test-pmd/testpmd.h >> +++ b/app/test-pmd/testpmd.h >> @@ -237,7 +237,8 @@ struct rte_port { >>       struct rte_eth_txconf tx_conf[RTE_MAX_QUEUES_PER_PORT+1]; /**< >> per queue tx configuration */ >>       struct rte_ether_addr   *mc_addr_pool; /**< pool of multicast >> addrs */ >>       uint32_t                mc_addr_nb; /**< nb. of addr. in >> mc_addr_pool */ >> -    uint8_t                 slave_flag; /**< bonding slave port */ >> +    uint8_t                 slave_flag : 1, /**< bonding slave port */ >> +                bond_flag : 1; /**< port is bond device */ > > Can't we detect if the port is a bonding port without introducing a new > variable/state? The bonding device is also an ethdev. I do not find the external API that can be used to detect whether a port is a bonding port. > >>       struct port_flow        *flow_list; /**< Associated flows. */ >>       struct port_indirect_action *actions_list; >>       /**< Associated indirect actions. */ > > .