From: Chengchang Tang
To: "Ananyev, Konstantin", Andrew Rybchenko, dev@dpdk.org
CC: linuxarm@huawei.com, chas3@att.com, humin29@huawei.com, "Yigit, Ferruh"
Date: Thu, 10 Jun 2021 14:46:56 +0800
Subject: Re: [dpdk-dev] [PATCH 1/2] net/bonding: support Tx prepare for bonding

On 2021/6/9 18:25, Ananyev, Konstantin wrote:
>>> On 4/23/21 12:46 PM, Chengchang Tang wrote:
>>>> To use the HW offload capabilities (e.g. checksum and TSO) in the Tx
>>>> direction, upper-layer users need to call rte_eth_tx_prepare() to make
>>>> some adjustments to the packets before sending them (e.g. processing
>>>> pseudo headers when Tx checksum offload is enabled). But the tx_prepare
>>>> callback of the bonding driver is not implemented, so the related
>>>> offloads cannot be used unless the upper-layer users process the
>>>> packets properly in their own application, which is bad for
>>>> portability.
>>>>
>>>> However, it is difficult to design a tx_prepare callback for the
>>>> bonding driver, because when a bonded device sends packets, it
>>>> distributes them to its slave devices based on the real-time link
>>>> status and the bonding mode. That is, it is very difficult for the
>>>> bonding device to determine which slave device's prepare function
>>>> should be invoked. In addition, if the link status changes after the
>>>> packets are prepared, the packets may fail to be sent because the
>>>> distribution may change.
>>>>
>>>> So, in this patch, the tx_prepare callback of the bonding driver is not
>>>> implemented. Instead, rte_eth_tx_prepare() is called for all fast-path
>>>> packets in modes 0, 1, 2, 4, 5 and 6. In this way, all Tx offloads can
>>>> be processed correctly for all NIC devices in these modes. If
>>>> tx_prepare is not required in some cases, the slave PMD's tx_prepare
>>>> pointer should be NULL, so that rte_eth_tx_prepare() is just a NOOP and
>>>> the impact on performance is very limited. It is the responsibility of
>>>> the slave PMDs to decide when the real tx_prepare needs to be used; the
>>>> information from dev_configure/queue_setup is sufficient for them to
>>>> make that decision.
>>>>
>>>> Note:
>>>> rte_eth_tx_prepare() is not added to bond mode 3 (Broadcast), because
>>>> in broadcast mode a packet needs to be sent by all slave ports.
>>>> Different PMDs process the packets differently in tx_prepare, so the
>>>> sent packets may be incorrect.
>>>>
>>>> Signed-off-by: Chengchang Tang
>>>> ---
>>>>  drivers/net/bonding/rte_eth_bond.h     |  1 -
>>>>  drivers/net/bonding/rte_eth_bond_pmd.c | 28 ++++++++++++++++++++++++----
>>>>  2 files changed, 24 insertions(+), 5 deletions(-)
>>>>
>>>> diff --git a/drivers/net/bonding/rte_eth_bond.h b/drivers/net/bonding/rte_eth_bond.h
>>>> index 874aa91..1e6cc6d 100644
>>>> --- a/drivers/net/bonding/rte_eth_bond.h
>>>> +++ b/drivers/net/bonding/rte_eth_bond.h
>>>> @@ -343,7 +343,6 @@ rte_eth_bond_link_up_prop_delay_set(uint16_t bonded_port_id,
>>>>  int
>>>>  rte_eth_bond_link_up_prop_delay_get(uint16_t bonded_port_id);
>>>>
>>>> -
>>>>  #ifdef __cplusplus
>>>>  }
>>>>  #endif
>>>> diff --git a/drivers/net/bonding/rte_eth_bond_pmd.c b/drivers/net/bonding/rte_eth_bond_pmd.c
>>>> index 2e9cea5..84af348 100644
>>>> --- a/drivers/net/bonding/rte_eth_bond_pmd.c
>>>> +++ b/drivers/net/bonding/rte_eth_bond_pmd.c
>>>> @@ -606,8 +606,14 @@ bond_ethdev_tx_burst_round_robin(void *queue, struct rte_mbuf **bufs,
>>>>  	/* Send packet burst on each slave device */
>>>>  	for (i = 0; i < num_of_slaves; i++) {
>>>>  		if (slave_nb_pkts[i] > 0) {
>>>> +			int nb_prep_pkts;
>>>> +
>>>> +			nb_prep_pkts = rte_eth_tx_prepare(slaves[i],
>>>> +					bd_tx_q->queue_id, slave_bufs[i],
>>>> +					slave_nb_pkts[i]);
>>>> +
>>>
>>> Shouldn't it be called iff queue Tx offloads are not zero?
>>> It would reduce the performance degradation when no Tx offloads are
>>> enabled. Same in all cases below.
>>
>> Regarding this point, it has been discussed in the previous RFC:
>> https://inbox.dpdk.org/dev/47f907cf-3933-1de9-9c45-6734b912eccd@huawei.com/
>>
>> Based on the TX_OFFLOAD status of the current device, a PMD can determine
>> whether tx_prepare is currently needed. If it is not needed, the PMD sets
>> tx_pkt_prepare to NULL, so that the actual tx_prepare processing is
>> skipped directly in rte_eth_tx_prepare().
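
To make that concrete: the decision can be taken entirely inside the slave
PMD at configure/queue-setup time. Below is a simplified, untested sketch of
the idea; "xxx" stands for any slave PMD and xxx_prep_pkts is a hypothetical
prepare routine, not code from an existing driver:

#include <rte_ethdev.h>

/* Hypothetical prepare routine with the eth_tx_prep_t signature. */
static uint16_t xxx_prep_pkts(void *tx_queue, struct rte_mbuf **tx_pkts,
		uint16_t nb_pkts);

/* Called from the PMD's dev_configure/tx_queue_setup: install the real
 * prepare function only when an offload that needs it is enabled. With
 * tx_pkt_prepare == NULL, rte_eth_tx_prepare() simply returns nb_pkts,
 * so the calls added by this patch cost almost nothing.
 */
static void
xxx_select_tx_prepare(struct rte_eth_dev *dev)
{
	uint64_t offloads = dev->data->dev_conf.txmode.offloads;

	if (offloads & (DEV_TX_OFFLOAD_IPV4_CKSUM | DEV_TX_OFFLOAD_TCP_CKSUM |
			DEV_TX_OFFLOAD_UDP_CKSUM | DEV_TX_OFFLOAD_TCP_TSO))
		dev->tx_pkt_prepare = xxx_prep_pkts;
	else
		dev->tx_pkt_prepare = NULL;
}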
>>>
>>>>  			num_tx_slave = rte_eth_tx_burst(slaves[i], bd_tx_q->queue_id,
>>>> -					slave_bufs[i], slave_nb_pkts[i]);
>>>> +					slave_bufs[i], nb_prep_pkts);
>>>
>>> In fact it is a problem here, and a really big one. Tx prepare may fail
>>> and return fewer packets, and Tx prepare of some packet may always fail.
>>> If the application tries to send packets in a loop until success, it
>>> will loop forever here. Since the application calls Tx burst, it is 100%
>>> legal behaviour of the function to return 0 if the Tx ring is full; that
>>> is not an error indication. However, in the case of Tx prepare it is an
>>> error indication.
>
> Yes, that sounds like a problem and existing apps might be affected.
>
>>> Should we change the Tx burst description and enforce callers to check
>>> rte_errno? It sounds like a major change...
>
> Agree, rte_errno for tx_burst() is probably the simplest and sanest way,
> but yes, it is a change in behaviour and apps will need to be updated.
> Another option for the bond PMD - just silently free the mbufs for which
> prepare() fails (and probably update some stats counter).
> Again it is a change in behaviour, but now just for one PMD, and only with
> Tx offloads enabled.
> Also, as I can see, some tx_burst() functions of that PMD already free
> packets silently: bond_ethdev_tx_burst_alb(), bond_ethdev_tx_burst_broadcast().
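
If we go that way, the bonding Tx paths could filter each slave's burst
through a helper like the untested sketch below before calling
rte_eth_tx_burst(). The helper name is made up, and a real version would
also bump a per-queue drop counter:

#include <string.h>
#include <rte_ethdev.h>
#include <rte_mbuf.h>

/* Validate a burst for one slave: drop any packet that prepare rejects
 * (it can never be sent, so letting the application retry it is pointless)
 * and compact the array. Returns the number of packets that are safe to
 * pass to rte_eth_tx_burst().
 */
static uint16_t
bond_prepare_or_drop(uint16_t slave_port, uint16_t queue_id,
		struct rte_mbuf **bufs, uint16_t nb)
{
	uint16_t nb_ok = rte_eth_tx_prepare(slave_port, queue_id, bufs, nb);

	while (nb_ok < nb) {
		/* bufs[nb_ok] is the first packet prepare rejected. */
		rte_pktmbuf_free(bufs[nb_ok]);
		memmove(&bufs[nb_ok], &bufs[nb_ok + 1],
			(size_t)(nb - nb_ok - 1) * sizeof(bufs[0]));
		nb--;
		/* Re-validate only the not-yet-checked tail of the burst. */
		nb_ok += rte_eth_tx_prepare(slave_port, queue_id,
				&bufs[nb_ok], nb - nb_ok);
	}
	return nb;
}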
>
> Actually, another question - why does the patch add tx_prepare() only to
> some Tx modes but not all?
> Is that intended?

Yes. Currently, I have no idea how to perform tx_prepare() in broadcast mode
with limited impact on performance. In broadcast mode, the same packets are
sent on several devices: we only update the ref_cnt of the mbufs, but make
no copies of the packets. As we know, tx_prepare() may change the packet
data, so it may cause problems if we perform tx_prepare() several times on
the same packet.

>>
>> I agree that if the failure is caused by the Tx ring being full, it is
>> legal behaviour. But what about failures caused by other reasons? At
>> present, it is possible for some PMDs to fail during tx_burst for other
>> reasons, and in such cases repeated attempts to send will also fail.
>>
>> I'm not sure whether all PMDs need to support the behaviour of sending
>> packets in a loop until it succeeds. If not, I think the current problem
>> can be pointed out to users by adding a note to the bonding
>> documentation. If it is necessary, the description of tx_burst should
>> also be extended with related guidance, so that PMD developers can better
>> understand how tx_burst should be designed - for example, putting all
>> hardware-related constraint checks into tx_prepare. Another prerequisite
>> for the above behaviour is that the packets must have been prepared
>> (i.e. checked by rte_eth_tx_prepare()); otherwise sending may also fail.
>> This means that we would have to use rte_eth_tx_prepare() in more
>> scenarios.
>>
>> What's Ferruh's opinion on this?
>>
>>> [snip]
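
For clarity, the calling convention we would effectively be asking
applications to follow is sketched below (untested; the function name and
the "port"/"queue" parameters are placeholders). A short return from
tx_prepare means the packet at that index can never be sent and must be
dropped, while a short return from tx_burst only means the ring is full and
may legally be retried:

#include <rte_ethdev.h>
#include <rte_mbuf.h>

static void
send_burst(uint16_t port, uint16_t queue,
		struct rte_mbuf **pkts, uint16_t nb_pkts)
{
	while (nb_pkts > 0) {
		uint16_t nb_ok = rte_eth_tx_prepare(port, queue, pkts, nb_pkts);
		uint16_t nb_sent = 0;

		/* Retrying on ring-full is legal: these packets are valid. */
		while (nb_sent < nb_ok)
			nb_sent += rte_eth_tx_burst(port, queue,
					pkts + nb_sent, nb_ok - nb_sent);

		if (nb_ok == nb_pkts)
			return;

		/* pkts[nb_ok] was rejected by prepare; rte_errno says why.
		 * Retrying it forever is the infinite loop described above,
		 * so drop it and continue with the remaining packets.
		 */
		rte_pktmbuf_free(pkts[nb_ok]);
		pkts += nb_ok + 1;
		nb_pkts -= nb_ok + 1;
	}
}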