From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <dev-bounces@dpdk.org>
Received: from mails.dpdk.org (mails.dpdk.org [217.70.189.124])
	by inbox.dpdk.org (Postfix) with ESMTP id A9707A0C46;
	Wed,  9 Jun 2021 08:42:56 +0200 (CEST)
Received: from [217.70.189.124] (localhost [127.0.0.1])
	by mails.dpdk.org (Postfix) with ESMTP id 3DAC940E3C;
	Wed,  9 Jun 2021 08:42:56 +0200 (CEST)
Received: from szxga01-in.huawei.com (szxga01-in.huawei.com [45.249.212.187])
 by mails.dpdk.org (Postfix) with ESMTP id 720B64069B
 for <dev@dpdk.org>; Wed,  9 Jun 2021 08:42:53 +0200 (CEST)
Received: from dggeme765-chm.china.huawei.com (unknown [172.30.72.54])
 by szxga01-in.huawei.com (SkyGuard) with ESMTP id 4G0HXb3fR9zYsWk;
 Wed,  9 Jun 2021 14:39:59 +0800 (CST)
Received: from [127.0.0.1] (10.69.27.114) by dggeme765-chm.china.huawei.com
 (10.3.19.111) with Microsoft SMTP Server (version=TLS1_2,
 cipher=TLS_ECDHE_RSA_WITH_AES_128_CBC_SHA256_P256) id 15.1.2176.2; Wed, 9 Jun
 2021 14:42:48 +0800
To: Andrew Rybchenko <andrew.rybchenko@oktetlabs.ru>, <dev@dpdk.org>
References: <1618571071-5927-1-git-send-email-tangchengchang@huawei.com>
 <1619171202-28486-1-git-send-email-tangchengchang@huawei.com>
 <1619171202-28486-2-git-send-email-tangchengchang@huawei.com>
 <4ce1752a-b605-8f35-ab06-13a671ef0e6c@oktetlabs.ru>
CC: <linuxarm@huawei.com>, <chas3@att.com>, <humin29@huawei.com>,
 <ferruh.yigit@intel.com>, <konstantin.ananyev@intel.com>
From: Chengchang Tang <tangchengchang@huawei.com>
Message-ID: <070e2dc9-7ab8-44f3-7d78-af1b35f18908@huawei.com>
Date: Wed, 9 Jun 2021 14:42:48 +0800
User-Agent: Mozilla/5.0 (Windows NT 10.0; WOW64; rv:45.0) Gecko/20100101
 Thunderbird/45.7.1
MIME-Version: 1.0
In-Reply-To: <4ce1752a-b605-8f35-ab06-13a671ef0e6c@oktetlabs.ru>
Content-Type: text/plain; charset="utf-8"
Content-Transfer-Encoding: 7bit
X-Originating-IP: [10.69.27.114]
X-ClientProxiedBy: dggems703-chm.china.huawei.com (10.3.19.180) To
 dggeme765-chm.china.huawei.com (10.3.19.111)
X-CFilter-Loop: Reflected
Subject: Re: [dpdk-dev] [PATCH 1/2] net/bonding: support Tx prepare for
 bonding
X-BeenThere: dev@dpdk.org
X-Mailman-Version: 2.1.29
Precedence: list
List-Id: DPDK patches and discussions <dev.dpdk.org>
List-Unsubscribe: <https://mails.dpdk.org/options/dev>,
 <mailto:dev-request@dpdk.org?subject=unsubscribe>
List-Archive: <http://mails.dpdk.org/archives/dev/>
List-Post: <mailto:dev@dpdk.org>
List-Help: <mailto:dev-request@dpdk.org?subject=help>
List-Subscribe: <https://mails.dpdk.org/listinfo/dev>,
 <mailto:dev-request@dpdk.org?subject=subscribe>
Errors-To: dev-bounces@dpdk.org
Sender: "dev" <dev-bounces@dpdk.org>

Hi, Andrew and Ferruh

On 2021/6/8 17:49, Andrew Rybchenko wrote:
> "for bonding" is redundant in the summary since it is already "net/bonding".
> 
> On 4/23/21 12:46 PM, Chengchang Tang wrote:
>> To use the HW offloads capability (e.g. checksum and TSO) in the Tx
>> direction, the upper-layer users need to call rte_eth_dev_prepare to do
>> some adjustment to the packets before sending them (e.g. processing
>> pseudo headers when Tx checksum offoad enabled). But, the tx_prepare
>> callback of the bond driver is not implemented. Therefore, related
>> offloads can not be used unless the upper layer users process the packet
>> properly in their own application. But it is bad for the
>> transplantability.
>>
>> However, it is difficult to design the tx_prepare callback for bonding
>> driver. Because when a bonded device sends packets, the bonded device
>> allocates the packets to different slave devices based on the real-time
>> link status and bonding mode. That is, it is very difficult for the
>> bonding device to determine which slave device's prepare function should
>> be invoked. In addition, if the link status changes after the packets are
>> prepared, the packets may fail to be sent because packets allocation may
>> change.
>>
>> So, in this patch, the tx_prepare callback of bonding driver is not
>> implemented. Instead, the rte_eth_dev_tx_prepare() will be called for
>> all the fast path packet in mode 0, 1, 2, 4, 5, 6. In this way, all
>> tx_offloads can be processed correctly for all NIC devices in these modes.
>> If tx_prepare is not required in some cases, then slave PMDs tx_prepare
>> pointer should be NULL and rte_eth_tx_prepare() will be just a NOOP.
>> In these cases, the impact on performance will be very limited. It is
>> the responsibility of the slave PMDs to decide when the real tx_prepare
>> needs to be used. The information from dev_config/queue_setup is
>> sufficient for them to make these decisions.
>>
>> Note:
>> The rte_eth_tx_prepare is not added to bond mode 3(Broadcast). This is
>> because in broadcast mode, a packet needs to be sent by all slave ports.
>> Different PMDs process the packets differently in tx_prepare. As a result,
>> the sent packet may be incorrect.
>>
>> Signed-off-by: Chengchang Tang <tangchengchang@huawei.com>
>> ---
>>  drivers/net/bonding/rte_eth_bond.h     |  1 -
>>  drivers/net/bonding/rte_eth_bond_pmd.c | 28 ++++++++++++++++++++++++----
>>  2 files changed, 24 insertions(+), 5 deletions(-)
>>
>> diff --git a/drivers/net/bonding/rte_eth_bond.h b/drivers/net/bonding/rte_eth_bond.h
>> index 874aa91..1e6cc6d 100644
>> --- a/drivers/net/bonding/rte_eth_bond.h
>> +++ b/drivers/net/bonding/rte_eth_bond.h
>> @@ -343,7 +343,6 @@ rte_eth_bond_link_up_prop_delay_set(uint16_t bonded_port_id,
>>  int
>>  rte_eth_bond_link_up_prop_delay_get(uint16_t bonded_port_id);
>>
>> -
>>  #ifdef __cplusplus
>>  }
>>  #endif
>> diff --git a/drivers/net/bonding/rte_eth_bond_pmd.c b/drivers/net/bonding/rte_eth_bond_pmd.c
>> index 2e9cea5..84af348 100644
>> --- a/drivers/net/bonding/rte_eth_bond_pmd.c
>> +++ b/drivers/net/bonding/rte_eth_bond_pmd.c
>> @@ -606,8 +606,14 @@ bond_ethdev_tx_burst_round_robin(void *queue, struct rte_mbuf **bufs,
>>  	/* Send packet burst on each slave device */
>>  	for (i = 0; i < num_of_slaves; i++) {
>>  		if (slave_nb_pkts[i] > 0) {
>> +			int nb_prep_pkts;
>> +
>> +			nb_prep_pkts = rte_eth_tx_prepare(slaves[i],
>> +					bd_tx_q->queue_id, slave_bufs[i],
>> +					slave_nb_pkts[i]);
>> +
> 
> Shouldn't it be called iff queue Tx offloads are not zero?
> It will allow to decrease performance degradation if no
> Tx offloads are enabled. Same in all cases below.

Regarding this point, it has been discussed in the previous RFC:
https://inbox.dpdk.org/dev/47f907cf-3933-1de9-9c45-6734b912eccd@huawei.com/

According to the TX_OFFLOAD status of the current device, PMDs can determine
whether tx_prepare is currently needed. If it is not needed, set pkt_tx_prepare
to NULL, so that the actual tx_prepare processing will be skipped directly in
rte_eth_tx_prepare().

> 
>>  			num_tx_slave = rte_eth_tx_burst(slaves[i], bd_tx_q->queue_id,
>> -					slave_bufs[i], slave_nb_pkts[i]);
>> +					slave_bufs[i], nb_prep_pkts);
> 
> In fact it is a problem here and really big problems.
> Tx prepare may fail and return less packets. Tx prepare
> of some packet may always fail. If application tries to
> send packets in a loop until success, it will be a
> forever loop here. Since application calls Tx burst,
> it is 100% legal behaviour of the function to return 0
> if Tx ring is full. It is not an error indication.
> However, in the case of Tx prepare it is an error
> indication.
> 
> Should we change Tx burst description and enforce callers
> to check for rte_errno? It sounds like a major change...
> 

I agree that if the failure is caused by Tx ring full, it is a legal behaviour.
But what about the failure caused by other reasons? At present, it is possible
for some PMDs to fail during tx_burst due to other reasons. In this case,
repeated tries to send will also fail.

I'm not sure if all PMDs need to support the behavior of sending packets in a
loop until it succeeds. If not, I think the current problem can be reminded to
the user by adding a description to the bonding. If it is necessary, I think the
description of tx_burst should also add related instructions, so that the developers
of PMDs can better understand how tx_burst should be designed, such as putting all
hardware-related constraint checks into tx_prepare. And another prerequisite for
the above behavior is that the packets must be prepared (i.e. checked by
rte_eth_tx_prepare()). Otherwise, it may also fail to send. This means that we have
to use rte_eth_tx_prepare() in more scenarios.

What's Ferruh's opinion on this?

> [snip]
> 
> .
>