Subject: Re: [dpdk-dev] [PATCH 1/2] net/bonding: support Tx prepare for bonding
From: Andrew Rybchenko
To: Chengchang Tang, dev@dpdk.org
Cc: linuxarm@huawei.com, chas3@att.com, humin29@huawei.com, ferruh.yigit@intel.com, konstantin.ananyev@intel.com
Date: Mon, 14 Jun 2021 17:16:53 +0300
List-Id: DPDK patches and discussions

On 6/10/21 10:32 AM, Chengchang Tang wrote:
> On 2021/6/9 17:35, Andrew Rybchenko wrote:
>> On 6/9/21 9:42 AM, Chengchang Tang wrote:
>>> Hi, Andrew and Ferruh
>>>
>>> On 2021/6/8 17:49, Andrew Rybchenko wrote:
>>>> "for bonding" is redundant in the summary since it is already "net/bonding".
>>>>
>>>> On 4/23/21 12:46 PM, Chengchang Tang wrote:
>>>>> To use the HW offload capabilities (e.g. checksum and TSO) in the Tx
>>>>> direction, upper-layer users need to call rte_eth_tx_prepare() to make
>>>>> some adjustments to the packets before sending them (e.g. processing
>>>>> pseudo headers when Tx checksum offload is enabled). But the tx_prepare
>>>>> callback of the bonding driver is not implemented. Therefore, the related
>>>>> offloads cannot be used unless upper-layer users process the packets
>>>>> properly in their own applications, which hurts portability.
>>>>>
>>>>> However, it is difficult to design the tx_prepare callback for the bonding
>>>>> driver, because when a bonded device sends packets, it distributes the
>>>>> packets to different slave devices based on the real-time link status and
>>>>> bonding mode. That is, it is very hard for the bonding device to determine
>>>>> which slave device's prepare function should be invoked. In addition, if
>>>>> the link status changes after the packets are prepared, the packets may
>>>>> fail to be sent because the packet distribution may change.
>>>>>
>>>>> So, in this patch, the tx_prepare callback of the bonding driver is not
>>>>> implemented.
Instead, rte_eth_tx_prepare() will be called for
>>>>> all fast-path packets in modes 0, 1, 2, 4, 5 and 6. In this way, all
>>>>> Tx offloads can be processed correctly for all NIC devices in these modes.
>>>>> If tx_prepare is not required in some cases, the slave PMD's tx_prepare
>>>>> pointer should be NULL and rte_eth_tx_prepare() will be just a NOOP.
>>>>> In these cases, the impact on performance will be very limited. It is
>>>>> the responsibility of the slave PMDs to decide when the real tx_prepare
>>>>> needs to be used. The information from dev_configure/queue_setup is
>>>>> sufficient for them to make these decisions.
>>>>>
>>>>> Note:
>>>>> rte_eth_tx_prepare() is not added to bond mode 3 (Broadcast). This is
>>>>> because in broadcast mode a packet needs to be sent by all slave ports.
>>>>> Different PMDs process the packets differently in tx_prepare. As a
>>>>> result, the sent packet may be incorrect.
>>>>>
>>>>> Signed-off-by: Chengchang Tang
>>>>> ---
>>>>>  drivers/net/bonding/rte_eth_bond.h     |  1 -
>>>>>  drivers/net/bonding/rte_eth_bond_pmd.c | 28 ++++++++++++++++++++++++----
>>>>>  2 files changed, 24 insertions(+), 5 deletions(-)
>>>>>
>>>>> diff --git a/drivers/net/bonding/rte_eth_bond.h b/drivers/net/bonding/rte_eth_bond.h
>>>>> index 874aa91..1e6cc6d 100644
>>>>> --- a/drivers/net/bonding/rte_eth_bond.h
>>>>> +++ b/drivers/net/bonding/rte_eth_bond.h
>>>>> @@ -343,7 +343,6 @@ rte_eth_bond_link_up_prop_delay_set(uint16_t bonded_port_id,
>>>>>  int
>>>>>  rte_eth_bond_link_up_prop_delay_get(uint16_t bonded_port_id);
>>>>>
>>>>> -
>>>>>  #ifdef __cplusplus
>>>>>  }
>>>>>  #endif
>>>>> diff --git a/drivers/net/bonding/rte_eth_bond_pmd.c b/drivers/net/bonding/rte_eth_bond_pmd.c
>>>>> index 2e9cea5..84af348 100644
>>>>> --- a/drivers/net/bonding/rte_eth_bond_pmd.c
>>>>> +++ b/drivers/net/bonding/rte_eth_bond_pmd.c
>>>>> @@ -606,8 +606,14 @@ bond_ethdev_tx_burst_round_robin(void *queue, struct rte_mbuf **bufs,
>>>>>  	/* Send packet burst on each slave device */
>>>>>  	for (i = 0; i < num_of_slaves; i++) {
>>>>>  		if (slave_nb_pkts[i] > 0) {
>>>>> +			int nb_prep_pkts;
>>>>> +
>>>>> +			nb_prep_pkts = rte_eth_tx_prepare(slaves[i],
>>>>> +					bd_tx_q->queue_id, slave_bufs[i],
>>>>> +					slave_nb_pkts[i]);
>>>>> +
>>>>
>>>> Shouldn't it be called iff queue Tx offloads are not zero?
>>>> It would allow to decrease performance degradation if no
>>>> Tx offloads are enabled. Same in all cases below.
>>>
>>> Regarding this point, it has been discussed in the previous RFC:
>>> https://inbox.dpdk.org/dev/47f907cf-3933-1de9-9c45-6734b912eccd@huawei.com/
>>>
>>> Based on the TX_OFFLOAD status of the current device, PMDs can determine
>>> whether tx_prepare is currently needed. If it is not needed, they set
>>> pkt_tx_prepare to NULL, so that the actual tx_prepare processing is
>>> skipped directly in rte_eth_tx_prepare().
>>
>> I still think that the following is right:
>> no Tx offloads at all => Tx prepare is not necessary.
>>
>> Am I wrong?
>>
>
> Letting PMDs determine whether tx_prepare() needs to be done could reduce the
> performance loss in more scenarios. For example, some offloads do not need a
> Tx prepare, and PMDs could set tx_prepare to NULL in that scenario. Even if
> rte_eth_tx_prepare() is called, it will not invoke the tx_prepare callback
> and will just return. In this case, there is only one branch. If we also
> check whether tx_offloads is non-zero, one more branch is added.

I'll wait for the net/bonding maintainers' decision here. IMHO all the
above assumptions should be proven by performance measurements.

> Of course, some PMDs currently do not optimize tx_prepare, which may have a
> performance impact. We hope to force them to optimize tx_prepare in this
> way, just like they optimize tx_burst. This makes it easier for users to use
> tx_prepare(), and they no longer need to worry that using tx_prepare() will
> introduce unnecessary performance degradation.
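To make the trade-off above concrete, here is a minimal, self-contained sketch of the two guarding strategies being debated. The struct and function names are hypothetical stand-ins, not the real DPDK structures: an `int` plays the role of an mbuf, and `stub_dev` collapses an ethdev down to the two fields that matter for this discussion.

```c
#include <stdint.h>
#include <stddef.h>

/* Hypothetical miniature of an ethdev (not the real DPDK layout). */
struct stub_dev {
	uint64_t tx_offloads;                          /* from dev_configure() */
	uint16_t (*tx_prepare)(int *pkts, uint16_t n); /* may be NULL */
};

/* Example prepare callback that accepts every packet. */
static uint16_t
stub_prepare_all(int *pkts, uint16_t n)
{
	(void)pkts;
	return n;
}

/* Caller-side guard (the review's suggestion): the bonding datapath
 * skips the prepare call entirely when no Tx offloads are enabled,
 * at the cost of an extra branch when they are. */
static uint16_t
tx_prepare_guarded(struct stub_dev *dev, int *pkts, uint16_t n)
{
	if (dev->tx_offloads == 0)
		return n;	/* nothing to fix up, no call at all */
	if (dev->tx_prepare == NULL)
		return n;	/* PMD needs no prepare step */
	return dev->tx_prepare(pkts, n);
}

/* Callee-side guard (what the patch relies on): always call, and let a
 * NULL callback degenerate into a cheap no-op, mirroring how
 * rte_eth_tx_prepare() behaves when a PMD sets no prepare callback. */
static uint16_t
tx_prepare_nullable(struct stub_dev *dev, int *pkts, uint16_t n)
{
	if (dev->tx_prepare == NULL)
		return n;	/* single branch on the fast path */
	return dev->tx_prepare(pkts, n);
}
```

Both variants return the full burst untouched when no preparation is needed; the question in the thread is only whether the offload check belongs in the bonding driver or inside each slave PMD's decision to publish a callback.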
>
> IMHO tx_prepare() should be extended to all scenarios, and the impact on
> performance should be optimized by the PMDs. If the application has to
> decide when tx_prepare() should and should not be used, in many cases it
> will not be used, and that introduces problems.
>
>>>>
>>>>>  			num_tx_slave = rte_eth_tx_burst(slaves[i], bd_tx_q->queue_id,
>>>>> -					slave_bufs[i], slave_nb_pkts[i]);
>>>>> +					slave_bufs[i], nb_prep_pkts);
>>>>
>>>> In fact there is a problem here, and a really big one.
>>>> Tx prepare may fail and return fewer packets. Tx prepare
>>>> of some packet may always fail. If the application tries to
>>>> send packets in a loop until success, it will loop forever
>>>> here. Since the application calls Tx burst, it is 100% legal
>>>> behaviour of the function to return 0 if the Tx ring is full.
>>>> That is not an error indication. However, in the case of Tx
>>>> prepare it is an error indication.
>>>>
>>>> Should we change the Tx burst description and force callers
>>>> to check rte_errno? It sounds like a major change...
>>>>
>>>
>>> I agree that if the failure is caused by the Tx ring being full, it is
>>> legal behaviour. But what about failures caused by other reasons? At
>>> present, it is possible for some PMDs to fail during tx_burst for other
>>> reasons. In that case, repeated attempts to send will also fail.
>>
>> If so, the packet should simply be dropped by Tx burst and Tx burst
>> should move on. If a packet cannot be transmitted, it must be
>> dropped (counted) and Tx burst should move to the next packet.
>>
>>> I'm not sure if all PMDs need to support the behavior of sending packets
>>> in a loop until it succeeds. If not, I think the current problem can be
>>> pointed out to the user by adding a note to the bonding documentation.
>>> If it is necessary, I think the description of tx_burst should also add
>>> related instructions, so that PMD developers can better understand how
>>> tx_burst should be designed, such as putting all hardware-related
>>> constraint checks into tx_prepare. Another prerequisite for the above
>>> behavior is that the packets must be prepared (i.e. checked by
>>> rte_eth_tx_prepare()). Otherwise, sending may also fail. This means that
>>> we have to use rte_eth_tx_prepare() in more scenarios.
>>
>> IMHO any PMD-specific behaviour is a nightmare for application
>> developers and must be avoided. Ideally an application should not
>> care whether it is running on top of tap, virtio, failsafe or
>> bonding. It should talk to ethdev in terms of the ethdev API, that's
>> it. I know that net/bonding is designed so that the application should
>> know about it, but IMHO the places where that knowledge is required
>> must be minimized to make applications more portable
>> across various PMDs/HW.
>>
>> I think that the only sensible solution for the above problem is
>> to skip a packet which prepare dislikes, count it as dropped,
>> and try to prepare/transmit subsequent packets.
>
> Agree, I will fix this in the next version.
>
>>
>> It is an interesting effect of doing Tx prepare just before
>> Tx burst inside the bonding PMD: if Tx burst fails to send
>> something because the ring is full, a number of packets will
>> be processed by Tx prepare again and again. I guess it is
>> unavoidable.
>>
>>> What's Ferruh's opinion on this?
>>>
>>>> [snip]
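The skip-and-count approach suggested above can be sketched as follows. This is a simplified, self-contained model, not the real rte_eth_* API: an `int` stands in for an mbuf (negative values are "unpreparable" packets), and the stub prepare/burst functions are hypothetical. The point is that a packet rejected by prepare is dropped and counted instead of stalling the burst forever.

```c
#include <stdint.h>

typedef int pkt_t;	/* stand-in for struct rte_mbuf * */

/* Stub "tx_prepare": returns the number of leading packets that are
 * valid, mimicking rte_eth_tx_prepare() stopping at the first packet
 * it cannot prepare. Here, negative packets are always rejected. */
static uint16_t
stub_tx_prepare(pkt_t *pkts, uint16_t n)
{
	uint16_t i;

	for (i = 0; i < n; i++)
		if (pkts[i] < 0)
			break;
	return i;
}

/* Stub "tx_burst": pretend the Tx ring always has room. */
static uint16_t
stub_tx_burst(pkt_t *pkts, uint16_t n)
{
	(void)pkts;
	return n;
}

/* Skip-and-count strategy: when prepare rejects a packet, drop just
 * that packet (a real driver would also free the mbuf and bump a
 * drop counter), then keep preparing/transmitting the rest, so a
 * permanently bad packet cannot cause an endless retry loop. */
static uint16_t
tx_with_prepare(pkt_t *pkts, uint16_t n, uint16_t *dropped)
{
	uint16_t sent = 0, pos = 0;

	*dropped = 0;
	while (pos < n) {
		uint16_t ok = stub_tx_prepare(pkts + pos, n - pos);

		sent += stub_tx_burst(pkts + pos, ok);
		pos += ok;
		if (pos < n) {	/* pkts[pos] was rejected: drop it, move on */
			(*dropped)++;
			pos++;
		}
	}
	return sent;
}
```

With a burst of {1, 2, -1, 3, -1, 4}, four packets are transmitted and two are counted as dropped; the caller is never asked to retry a packet that prepare will always refuse.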