Subject: Re: [PATCH v2 1/3] net/bonding: support Tx prepare
To: Ferruh Yigit
CC: <3chas3@gmail.com>
References: <1619171202-28486-2-git-send-email-tangchengchang@huawei.com>
 <20220725040842.35027-1-fengchengwen@huawei.com>
 <20220725040842.35027-2-fengchengwen@huawei.com>
 <495fb2f0-60c2-f1c9-2985-0d08bb463ad0@xilinx.com>
From: fengchengwen
Message-ID: <4b4af3e8-710a-ae75-8171-331ebfe4e4f7@huawei.com>
Date: Wed, 14 Sep 2022 08:46:01 +0800
In-Reply-To: <495fb2f0-60c2-f1c9-2985-0d08bb463ad0@xilinx.com>
List-Id: DPDK patches and discussions

Hi, Ferruh

On 2022/9/13 18:22, Ferruh Yigit wrote:
> On 7/25/2022 5:08 AM, Chengwen Feng wrote:
>
>>
>> Normally, to use the HW offloads capability (e.g. checksum and TSO) in
>> the Tx direction, the application needs to call rte_eth_dev_prepare to
>> do some adjustment with the packets before sending them (e.g. processing
>> pseudo headers when Tx checksum offload enabled). But, the tx_prepare
>> callback of the bonding driver is not implemented. Therefore, the
>> sent packets may have errors (e.g. checksum errors).
>>
>> However, it is difficult to design the tx_prepare callback for bonding
>> driver. Because when a bonded device sends packets, the bonded device
>> allocates the packets to different slave devices based on the real-time
>> link status and bonding mode. That is, it is very difficult for the
>> bonded device to determine which slave device's prepare function should
>> be invoked.
>>
>> So, in this patch, the tx_prepare callback of bonding driver is not
>> implemented. Instead, the rte_eth_dev_tx_prepare() will be called for
>> all the fast path packets in mode 0, 1, 2, 4, 5, 6 (mode 3 is not
>> included, see[1]). In this way, all tx_offloads can be processed
>> correctly for all NIC devices in these modes.
>>
>> As previously discussed (see V1), if the tx_prepare fails, the bonding
>> driver will free the cossesponding packets internally, and only the
>> packets of the tx_prepare OK are xmit.
>>
>
> Please provide link to discussion you refer to.

https://inbox.dpdk.org/dev/1618571071-5927-2-git-send-email-tangchengchang@huawei.com/

Should I push a new version for it?

>
>> To minimize performance impact, this patch adds one new
>> 'tx_prepare_enabled' field, and corresponding control and get API:
>> rte_eth_bond_tx_prepare_set() and rte_eth_bond_tx_prepare_get().
>>
>> [1]: In bond mode 3 (broadcast), a packet needs to be sent by all slave
>> ports.
>> Different slave PMDs process the packets differently in
>> tx_prepare. If call tx_prepare before each slave port sending, the sent
>> packet may be incorrect.
>>
>> Signed-off-by: Chengchang Tang
>> Signed-off-by: Chengwen Feng
>
> <...>
>
>> +static inline uint16_t
>> +bond_ethdev_tx_wrap(struct bond_tx_queue *bd_tx_q, uint16_t slave_port_id,
>> +                   struct rte_mbuf **tx_pkts, uint16_t nb_pkts)
>> +{
>> +       struct bond_dev_private *internals = bd_tx_q->dev_private;
>> +       uint16_t queue_id = bd_tx_q->queue_id;
>> +       struct rte_mbuf *fail_pkts[nb_pkts];
>> +       uint8_t fail_mark[nb_pkts];
>> +       uint16_t nb_pre, index;
>> +       uint16_t fail_cnt = 0;
>> +       int i;
>> +
>> +       if (!internals->tx_prepare_enabled)
>> +               goto tx_burst;
>> +
>> +       nb_pre = rte_eth_tx_prepare(slave_port_id, queue_id, tx_pkts, nb_pkts);
>> +       if (nb_pre == nb_pkts)
>> +               goto tx_burst;
>> +
>> +       fail_pkts[fail_cnt++] = tx_pkts[nb_pre];
>> +       memset(fail_mark, 0, sizeof(fail_mark));
>> +       fail_mark[nb_pre] = 1;
>> +       for (i = nb_pre + 1; i < nb_pkts; /* update in inner loop */) {
>> +               nb_pre = rte_eth_tx_prepare(slave_port_id, queue_id,
>> +                                           tx_pkts + i, nb_pkts - i);
>
>
> I assume intention is to make this as transparent as possible to the user, that is why you are using a wrapper that combines `rte_eth_tx_prepare()` & `rte_eth_tx_burst()` APIs. But for other PMDs `rte_eth_tx_burst()` is called explicitly by the application.
>
> Path is also adding two new bonding specific APIs to enable/disable Tx prepare.
> Instead if you leave calling `rte_eth_tx_prepare()` decision to user, there will be no need for the enable/disable Tx prepare APIs and the wrapper.
>
> The `tx_pkt_prepare()` implementation in bonding can do the mode check, call Tx prepare for all slaves and apply failure recovery, as done in this wrapper function, what do you think, will it work?

I see Chas Williams has also replied in this thread, thanks.

The main problem is that it is hard to design a tx_prepare for the bonding device:
1. As Chas Williams said, the hash may have to be calculated twice to get the
   target slave devices.
2. More importantly, if the slave devices change (e.g. a slave device's link
   goes down or the device is removed) between bond-tx-prepare and
   bond-tx-burst, the output slave will change, and this may lead to checksum
   failures.
   (Note: the slave devices of a bond device may come from different vendors
   and may have different requirements, e.g. slave-A calculates the IPv4
   pseudo-header automatically (no driver pre-calc needed), but slave-B needs
   the driver to pre-calc it.)

The current design covers the above two scenarios by using in-place tx-prepare.
In addition, bond devices are not transparent to applications, so I think this
is a practical way to provide tx-prepare support.

>
> .
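To illustrate what "in-place tx-prepare" means here, below is a minimal, self-contained sketch. It is not the patch code and does not use DPDK: `struct pkt`, `mock_tx_prepare()`, `mock_tx_burst()` and `tx_wrap()` are hypothetical stand-ins, and the mock prepare is only assumed to behave like rte_eth_tx_prepare() in returning the count of leading packets it accepted. The point is that prepare runs only after the output port is already chosen, so prepare and burst always target the same slave, and packets that fail prepare are dropped while the rest are sent.

```c
#include <assert.h>
#include <stdint.h>

#define MAX_BURST 32

/* Hypothetical stand-in for struct rte_mbuf: only what the sketch needs. */
struct pkt {
	int id;
	int bad;     /* nonzero: the mock prepare step rejects this packet */
	int freed;   /* set when the wrapper drops the packet */
	int sent_on; /* port the packet was sent on, -1 if not sent */
};

/* Mock per-port prepare (assumption: returns the number of leading packets
 * accepted and stops at the first packet it cannot prepare). */
static uint16_t mock_tx_prepare(int port, struct pkt **pkts, uint16_t n)
{
	uint16_t i;

	(void)port;
	for (i = 0; i < n; i++)
		if (pkts[i]->bad)
			break;
	return i;
}

/* Mock burst: transmits every packet it is given on the port. */
static uint16_t mock_tx_burst(int port, struct pkt **pkts, uint16_t n)
{
	for (uint16_t i = 0; i < n; i++)
		pkts[i]->sent_on = port;
	return n;
}

/* In-place prepare: the caller has already selected 'port', so the prepare
 * and the burst are guaranteed to hit the same device. */
static uint16_t tx_wrap(int port, struct pkt **pkts, uint16_t n)
{
	struct pkt *ok[MAX_BURST];
	uint16_t nb_ok = 0, done = 0;

	assert(n <= MAX_BURST);
	while (done < n) {
		uint16_t pre = mock_tx_prepare(port, pkts + done, n - done);

		/* Keep the packets that passed prepare. */
		for (uint16_t i = 0; i < pre; i++)
			ok[nb_ok++] = pkts[done + i];
		done += pre;
		if (done < n) {
			/* pkts[done] failed prepare: drop it, continue after it. */
			pkts[done]->freed = 1;
			done++;
		}
	}
	return mock_tx_burst(port, ok, nb_ok);
}
```

For a burst of four packets where the second one fails prepare, tx_wrap() marks that packet as freed and sends the other three on the selected port. How the real wrapper reports dropped packets back to the application is not covered by this sketch.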