Subject: Re: [dpdk-dev] [PATCH 1/2] net/bonding: support Tx prepare for bonding
From: "Min Hu (Connor)"
To: Andrew Rybchenko, Chengchang Tang
Date: Tue, 24 May 2022 20:11:06 +0800
Message-ID: <8fa66325-0f70-af0a-5126-7cae0fc8d0a4@huawei.com>
In-Reply-To: <4ce1752a-b605-8f35-ab06-13a671ef0e6c@oktetlabs.ru>

Hi, Andrew,

On 2021/6/8 17:49, Andrew Rybchenko wrote:
> "for bonding" is redundant in the summary since it is already
> "net/bonding".
>
> On 4/23/21 12:46 PM, Chengchang Tang wrote:
>> To use the HW offload capabilities (e.g. checksum and TSO) in the Tx
>> direction, upper-layer users need to call rte_eth_tx_prepare() to make
>> some adjustments to the packets before sending them (e.g. processing
>> pseudo headers when Tx checksum offload is enabled). But the tx_prepare
>> callback of the bonding driver is not implemented, so the related
>> offloads cannot be used unless the upper-layer users process the
>> packets properly in their own applications, which is bad for
>> portability.
>>
>> However, it is difficult to design the tx_prepare callback for the
>> bonding driver, because when a bonded device sends packets, it
>> distributes them to its slave devices based on the real-time link
>> status and the bonding mode. That is, it is very hard for the bonding
>> device to determine which slave device's prepare function should be
>> invoked. In addition, if the link status changes after the packets are
>> prepared, the packets may fail to be sent because the packet
>> distribution may change.
>>
>> So, in this patch, the tx_prepare callback of the bonding driver is not
>> implemented. Instead, rte_eth_tx_prepare() will be called for all the
>> fast-path packets in modes 0, 1, 2, 4, 5 and 6.
>> In this way, all Tx offloads can be processed correctly for all NIC
>> devices in these modes. If tx_prepare is not required in some cases,
>> the slave PMD's tx_prepare pointer should be NULL, and
>> rte_eth_tx_prepare() will then be just a NOOP. In those cases, the
>> impact on performance will be very limited. It is the responsibility
>> of the slave PMDs to decide when the real tx_prepare needs to be
>> used. The information from dev_configure/queue_setup is sufficient
>> for them to make these decisions.
>>
>> Note:
>> rte_eth_tx_prepare() is not added to bond mode 3 (Broadcast), because
>> in broadcast mode a packet needs to be sent by all slave ports, and
>> different PMDs process the packets differently in tx_prepare. As a
>> result, the sent packet may be incorrect.
>>
>> Signed-off-by: Chengchang Tang
>> ---
>>  drivers/net/bonding/rte_eth_bond.h     |  1 -
>>  drivers/net/bonding/rte_eth_bond_pmd.c | 28 ++++++++++++++++++++++++----
>>  2 files changed, 24 insertions(+), 5 deletions(-)
>>
>> diff --git a/drivers/net/bonding/rte_eth_bond.h b/drivers/net/bonding/rte_eth_bond.h
>> index 874aa91..1e6cc6d 100644
>> --- a/drivers/net/bonding/rte_eth_bond.h
>> +++ b/drivers/net/bonding/rte_eth_bond.h
>> @@ -343,7 +343,6 @@ rte_eth_bond_link_up_prop_delay_set(uint16_t bonded_port_id,
>>  int
>>  rte_eth_bond_link_up_prop_delay_get(uint16_t bonded_port_id);
>>
>> -
>>  #ifdef __cplusplus
>>  }
>>  #endif
>> diff --git a/drivers/net/bonding/rte_eth_bond_pmd.c b/drivers/net/bonding/rte_eth_bond_pmd.c
>> index 2e9cea5..84af348 100644
>> --- a/drivers/net/bonding/rte_eth_bond_pmd.c
>> +++ b/drivers/net/bonding/rte_eth_bond_pmd.c
>> @@ -606,8 +606,14 @@ bond_ethdev_tx_burst_round_robin(void *queue, struct rte_mbuf **bufs,
>>  	/* Send packet burst on each slave device */
>>  	for (i = 0; i < num_of_slaves; i++) {
>>  		if (slave_nb_pkts[i] > 0) {
>> +			int nb_prep_pkts;
>> +
>> +			nb_prep_pkts = rte_eth_tx_prepare(slaves[i],
>> +					bd_tx_q->queue_id, slave_bufs[i],
>> +					slave_nb_pkts[i]);
>> +
>
> Shouldn't it be called iff queue Tx offloads are not zero?
> That would reduce the performance degradation when no Tx
> offloads are enabled. Same in all cases below.
>
>>  			num_tx_slave = rte_eth_tx_burst(slaves[i], bd_tx_q->queue_id,
>> -					slave_bufs[i], slave_nb_pkts[i]);
>> +					slave_bufs[i], nb_prep_pkts);
>
> In fact there is a problem here, and a really big one. Tx prepare may
> fail and return fewer packets, and Tx prepare of some packet may
> always fail. If the application tries to send packets in a loop until
> success, it will loop forever here. Since the application calls Tx
> burst, it is 100% legal behaviour of the function to return 0 if the
> Tx ring is full; that is not an error indication. In the case of Tx
> prepare, however, it is an error indication.

Regardless of this patch, I think the problem already exists in
rte_eth_tx_burst(). For example, suppose there is one 'bad' rte_mbuf
in tx_pkts[]. Taking the net driver 'fm10k' as an example: in
fm10k_xmit_pkts(), when one rte_mbuf is 'bad' (mb->nb_segs == 0), the
loop breaks and the function returns the number of packets (< nb_pkts)
that were successfully transmitted. The code looks like this:

uint16_t
fm10k_xmit_pkts(void *tx_queue, struct rte_mbuf **tx_pkts,
		uint16_t nb_pkts)
{
	....
	for (count = 0; count < nb_pkts; ++count) {
		/* sanity check to make sure the mbuf is valid */
		if ((mb->nb_segs == 0) ||
		    ((mb->nb_segs > 1) && (mb->next == NULL)))
			break;
	}

	return count;
}

So, if an application sends packets in a loop until success, it will
also loop forever here, as the sketch below shows.
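To make the hazard concrete, here is a minimal sketch of such a loop
(my own illustration, not code from any driver; port_id, queue_id,
pkts and nb_pkts are hypothetical names):

#include <rte_ethdev.h>
#include <rte_mbuf.h>

static void
tx_until_done(uint16_t port_id, uint16_t queue_id,
	      struct rte_mbuf **pkts, uint16_t nb_pkts)
{
	uint16_t sent = 0;

	while (sent < nb_pkts) {
		/*
		 * Once the PMD's sanity check hits a persistently bad
		 * mbuf at pkts[sent], every call returns 0, "sent"
		 * stops growing, and this loop never terminates.
		 */
		sent += rte_eth_tx_burst(port_id, queue_id,
					 &pkts[sent], nb_pkts - sent);
	}
}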
One way to deal with it is in the application itself, as testpmd does:
it bounds the retries and inserts a delay, like this:

	nb_tx = rte_eth_tx_burst(fs->tx_port, fs->tx_queue,
			pkts_burst, nb_rx);
	/*
	 * Retry if necessary
	 */
	if (unlikely(nb_tx < nb_rx) && fs->retry_enabled) {
		retry = 0;
		while (nb_tx < nb_rx && retry++ < burst_tx_retry_num) {
			rte_delay_us(burst_tx_delay_time);
			nb_tx += rte_eth_tx_burst(fs->tx_port, fs->tx_queue,
					&pkts_burst[nb_tx], nb_rx - nb_tx);
		}
	}

What I mean is that this patch does not introduce new 'bugs' into
rte_eth_tx_burst(), and the known bug in the retry situation can be
fixed in the application.

> Should we change the Tx burst description and enforce callers
> to check rte_errno? It sounds like a major change...
>
> [snip]
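On the rte_errno question: the asymmetry Andrew points out can at least
be made visible at the call site today. Below is a hedged sketch (my
own illustration, not part of the patch; port_id, queue_id, pkts and
nb_pkts are hypothetical) of a caller that treats a short return from
rte_eth_tx_prepare() as a per-packet error reported via rte_errno,
while a short return from rte_eth_tx_burst() just means the ring is
full:

#include <rte_errno.h>
#include <rte_ethdev.h>
#include <rte_mbuf.h>

static uint16_t
tx_with_prepare(uint16_t port_id, uint16_t queue_id,
		struct rte_mbuf **pkts, uint16_t nb_pkts)
{
	uint16_t nb_prep;

	nb_prep = rte_eth_tx_prepare(port_id, queue_id, pkts, nb_pkts);
	if (nb_prep < nb_pkts) {
		/*
		 * pkts[nb_prep] was rejected and retrying it is
		 * pointless; rte_errno says why (e.g. EINVAL, ENOTSUP).
		 * Dropping it is one policy an application could
		 * choose; any packets after it remain owned by the
		 * caller.
		 */
		rte_pktmbuf_free(pkts[nb_prep]);
	}

	/* A short return here only means the Tx ring was full. */
	return rte_eth_tx_burst(port_id, queue_id, pkts, nb_prep);
}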