From: Andrew Rybchenko
Date: Tue, 8 Jun 2021 12:49:38 +0300
To: Chengchang Tang, dev@dpdk.org
Cc: linuxarm@huawei.com, chas3@att.com, humin29@huawei.com,
 ferruh.yigit@intel.com, konstantin.ananyev@intel.com
Subject: Re: [dpdk-dev] [PATCH 1/2] net/bonding: support Tx prepare for bonding

"for bonding" is redundant in the summary since it is already
"net/bonding".

On 4/23/21 12:46 PM, Chengchang Tang wrote:
> To use the HW offload capabilities (e.g. checksum and TSO) in the Tx
> direction, the upper-layer users need to call rte_eth_tx_prepare to do
> some adjustment to the packets before sending them (e.g. processing
> pseudo headers when the Tx checksum offload is enabled). However, the
> tx_prepare callback of the bond driver is not implemented, so the
> related offloads cannot be used unless the upper-layer users process
> the packets properly in their own applications. That hurts portability.
>
> However, it is difficult to design the tx_prepare callback for the
> bonding driver, because when a bonded device sends packets, it
> distributes them to its slave devices based on the real-time link
> status and the bonding mode. That is, it is very difficult for the
> bonding device to determine which slave device's prepare function
> should be invoked. In addition, if the link status changes after the
> packets are prepared, the packets may fail to be sent because the
> packet distribution may change.
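
(As an aside, for readers not familiar with the API: the application-side
contract described above is roughly the sketch below. The helper name and
the error handling are purely illustrative and are not part of this patch.)

#include <stdio.h>

#include <rte_errno.h>
#include <rte_ethdev.h>
#include <rte_mbuf.h>

/* Illustrative helper: fix up offload metadata, then transmit. */
static uint16_t
xmit_burst(uint16_t port_id, uint16_t queue_id,
	   struct rte_mbuf **pkts, uint16_t nb_pkts)
{
	uint16_t nb_prep, nb_sent;

	/* Let the PMD patch pseudo-header checksums etc. in place. */
	nb_prep = rte_eth_tx_prepare(port_id, queue_id, pkts, nb_pkts);
	if (nb_prep < nb_pkts)
		/* pkts[nb_prep] cannot be prepared; rte_errno says why. */
		printf("tx_prepare stopped at %u: %s\n",
		       (unsigned int)nb_prep, rte_strerror(rte_errno));

	/* Hand only the successfully prepared prefix to Tx burst. */
	nb_sent = rte_eth_tx_burst(port_id, queue_id, pkts, nb_prep);
	return nb_sent;
}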
>
> So, in this patch, the tx_prepare callback of the bonding driver is
> not implemented. Instead, rte_eth_tx_prepare() is called for all the
> fast path packets in modes 0, 1, 2, 4, 5 and 6. In this way, all Tx
> offloads can be processed correctly for all NIC devices in these modes.
> If tx_prepare is not required in some cases, the slave PMD's tx_prepare
> pointer should be NULL and rte_eth_tx_prepare() will just be a NOOP.
> In these cases, the impact on performance will be very limited. It is
> the responsibility of the slave PMDs to decide when the real tx_prepare
> needs to be used. The information from dev_config/queue_setup is
> sufficient for them to make these decisions.
>
> Note:
> rte_eth_tx_prepare is not added to bond mode 3 (Broadcast). This is
> because in broadcast mode a packet needs to be sent by all slave ports.
> Different PMDs process the packets differently in tx_prepare. As a
> result, the sent packet may be incorrect.
>
> Signed-off-by: Chengchang Tang
> ---
>  drivers/net/bonding/rte_eth_bond.h     |  1 -
>  drivers/net/bonding/rte_eth_bond_pmd.c | 28 ++++++++++++++++++++++++----
>  2 files changed, 24 insertions(+), 5 deletions(-)
>
> diff --git a/drivers/net/bonding/rte_eth_bond.h b/drivers/net/bonding/rte_eth_bond.h
> index 874aa91..1e6cc6d 100644
> --- a/drivers/net/bonding/rte_eth_bond.h
> +++ b/drivers/net/bonding/rte_eth_bond.h
> @@ -343,7 +343,6 @@ rte_eth_bond_link_up_prop_delay_set(uint16_t bonded_port_id,
>  int
>  rte_eth_bond_link_up_prop_delay_get(uint16_t bonded_port_id);
>
> -
>  #ifdef __cplusplus
>  }
>  #endif
> diff --git a/drivers/net/bonding/rte_eth_bond_pmd.c b/drivers/net/bonding/rte_eth_bond_pmd.c
> index 2e9cea5..84af348 100644
> --- a/drivers/net/bonding/rte_eth_bond_pmd.c
> +++ b/drivers/net/bonding/rte_eth_bond_pmd.c
> @@ -606,8 +606,14 @@ bond_ethdev_tx_burst_round_robin(void *queue, struct rte_mbuf **bufs,
>  	/* Send packet burst on each slave device */
>  	for (i = 0; i < num_of_slaves; i++) {
>  		if (slave_nb_pkts[i] > 0) {
> +			int nb_prep_pkts;
> +
> +			nb_prep_pkts = rte_eth_tx_prepare(slaves[i],
> +					bd_tx_q->queue_id, slave_bufs[i],
> +					slave_nb_pkts[i]);
> +

Shouldn't it be called iff queue Tx offloads are not zero? That would
reduce the performance degradation when no Tx offloads are enabled.
Same in all the cases below.

>  			num_tx_slave = rte_eth_tx_burst(slaves[i], bd_tx_q->queue_id,
> -					slave_bufs[i], slave_nb_pkts[i]);
> +					slave_bufs[i], nb_prep_pkts);

In fact there is a problem here, and a really big one. Tx prepare may
fail and return fewer packets, and Tx prepare of some packet may always
fail. If the application tries to send packets in a loop until success,
it will loop forever here.

Since the application calls Tx burst, it is 100% legal behaviour of the
function to return 0 if the Tx ring is full; that is not an error
indication. However, in the case of Tx prepare it is an error
indication. Should we change the Tx burst description and enforce
callers to check rte_errno? It sounds like a major change...

[snip]
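
To make the first comment above a bit more concrete, the guard I have in
mind is something like the sketch below. It is purely illustrative:
"txq_offloads" is a hypothetical per-queue field which would have to be
cached from rte_eth_txconf.offloads (or the device Tx offloads) at Tx
queue setup time; it does not exist in the driver today.

	if (slave_nb_pkts[i] > 0) {
		uint16_t nb_prep_pkts = slave_nb_pkts[i];

		/* Skip the prepare step when the queue was configured
		 * without any Tx offloads.
		 */
		if (bd_tx_q->txq_offloads != 0)
			nb_prep_pkts = rte_eth_tx_prepare(slaves[i],
					bd_tx_q->queue_id, slave_bufs[i],
					slave_nb_pkts[i]);

		num_tx_slave = rte_eth_tx_burst(slaves[i], bd_tx_q->queue_id,
				slave_bufs[i], nb_prep_pkts);
		...
	}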
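
And to illustrate the forever-loop concern: a common application retry
pattern looks roughly like the schematic fragment below (variable names
are mine). It is perfectly fine when Tx burst returns less than requested
because the ring is full, but it never terminates if one packet always
fails tx_prepare inside the bond Tx burst.

	uint16_t sent = 0;

	/* Retry until the whole burst has been accepted by the PMD. */
	while (sent < nb_pkts)
		sent += rte_eth_tx_burst(port_id, queue_id,
					 pkts + sent, nb_pkts - sent);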