DPDK patches and discussions
 help / color / mirror / Atom feed
From: Chengchang Tang <tangchengchang@huawei.com>
To: "Ananyev, Konstantin" <konstantin.ananyev@intel.com>,
	Andrew Rybchenko <andrew.rybchenko@oktetlabs.ru>,
	"dev@dpdk.org" <dev@dpdk.org>
Cc: "linuxarm@huawei.com" <linuxarm@huawei.com>,
	"chas3@att.com" <chas3@att.com>,
	"humin29@huawei.com" <humin29@huawei.com>,
	"Yigit, Ferruh" <ferruh.yigit@intel.com>
Subject: Re: [dpdk-dev] [PATCH 1/2] net/bonding: support Tx prepare for bonding
Date: Thu, 10 Jun 2021 14:46:56 +0800	[thread overview]
Message-ID: <f7098e21-d94a-9d0e-8e66-0bd6dd1e3152@huawei.com> (raw)
In-Reply-To: <DM6PR11MB44912E86187272F043FA025D9A369@DM6PR11MB4491.namprd11.prod.outlook.com>



On 2021/6/9 18:25, Ananyev, Konstantin wrote:
>>> On 4/23/21 12:46 PM, Chengchang Tang wrote:
>>>> To use the HW offloads capability (e.g. checksum and TSO) in the Tx
>>>> direction, the upper-layer users need to call rte_eth_dev_prepare to do
>>>> some adjustment to the packets before sending them (e.g. processing
>>>> pseudo headers when Tx checksum offoad enabled). But, the tx_prepare
>>>> callback of the bond driver is not implemented. Therefore, related
>>>> offloads can not be used unless the upper layer users process the packet
>>>> properly in their own application. But it is bad for the
>>>> transplantability.
>>>>
>>>> However, it is difficult to design the tx_prepare callback for bonding
>>>> driver. Because when a bonded device sends packets, the bonded device
>>>> allocates the packets to different slave devices based on the real-time
>>>> link status and bonding mode. That is, it is very difficult for the
>>>> bonding device to determine which slave device's prepare function should
>>>> be invoked. In addition, if the link status changes after the packets are
>>>> prepared, the packets may fail to be sent because packets allocation may
>>>> change.
>>>>
>>>> So, in this patch, the tx_prepare callback of bonding driver is not
>>>> implemented. Instead, the rte_eth_dev_tx_prepare() will be called for
>>>> all the fast path packet in mode 0, 1, 2, 4, 5, 6. In this way, all
>>>> tx_offloads can be processed correctly for all NIC devices in these modes.
>>>> If tx_prepare is not required in some cases, then slave PMDs tx_prepare
>>>> pointer should be NULL and rte_eth_tx_prepare() will be just a NOOP.
>>>> In these cases, the impact on performance will be very limited. It is
>>>> the responsibility of the slave PMDs to decide when the real tx_prepare
>>>> needs to be used. The information from dev_config/queue_setup is
>>>> sufficient for them to make these decisions.
>>>>
>>>> Note:
>>>> The rte_eth_tx_prepare is not added to bond mode 3(Broadcast). This is
>>>> because in broadcast mode, a packet needs to be sent by all slave ports.
>>>> Different PMDs process the packets differently in tx_prepare. As a result,
>>>> the sent packet may be incorrect.
>>>>
>>>> Signed-off-by: Chengchang Tang <tangchengchang@huawei.com>
>>>> ---
>>>>  drivers/net/bonding/rte_eth_bond.h     |  1 -
>>>>  drivers/net/bonding/rte_eth_bond_pmd.c | 28 ++++++++++++++++++++++++----
>>>>  2 files changed, 24 insertions(+), 5 deletions(-)
>>>>
>>>> diff --git a/drivers/net/bonding/rte_eth_bond.h b/drivers/net/bonding/rte_eth_bond.h
>>>> index 874aa91..1e6cc6d 100644
>>>> --- a/drivers/net/bonding/rte_eth_bond.h
>>>> +++ b/drivers/net/bonding/rte_eth_bond.h
>>>> @@ -343,7 +343,6 @@ rte_eth_bond_link_up_prop_delay_set(uint16_t bonded_port_id,
>>>>  int
>>>>  rte_eth_bond_link_up_prop_delay_get(uint16_t bonded_port_id);
>>>>
>>>> -
>>>>  #ifdef __cplusplus
>>>>  }
>>>>  #endif
>>>> diff --git a/drivers/net/bonding/rte_eth_bond_pmd.c b/drivers/net/bonding/rte_eth_bond_pmd.c
>>>> index 2e9cea5..84af348 100644
>>>> --- a/drivers/net/bonding/rte_eth_bond_pmd.c
>>>> +++ b/drivers/net/bonding/rte_eth_bond_pmd.c
>>>> @@ -606,8 +606,14 @@ bond_ethdev_tx_burst_round_robin(void *queue, struct rte_mbuf **bufs,
>>>>  	/* Send packet burst on each slave device */
>>>>  	for (i = 0; i < num_of_slaves; i++) {
>>>>  		if (slave_nb_pkts[i] > 0) {
>>>> +			int nb_prep_pkts;
>>>> +
>>>> +			nb_prep_pkts = rte_eth_tx_prepare(slaves[i],
>>>> +					bd_tx_q->queue_id, slave_bufs[i],
>>>> +					slave_nb_pkts[i]);
>>>> +
>>>
>>> Shouldn't it be called iff queue Tx offloads are not zero?
>>> It will allow to decrease performance degradation if no
>>> Tx offloads are enabled. Same in all cases below.
>>
>> Regarding this point, it has been discussed in the previous RFC:
>> https://inbox.dpdk.org/dev/47f907cf-3933-1de9-9c45-6734b912eccd@huawei.com/
>>
>> According to the TX_OFFLOAD status of the current device, PMDs can determine
>> whether tx_prepare is currently needed. If it is not needed, set pkt_tx_prepare
>> to NULL, so that the actual tx_prepare processing will be skipped directly in
>> rte_eth_tx_prepare().
>>
>>>
>>>>  			num_tx_slave = rte_eth_tx_burst(slaves[i], bd_tx_q->queue_id,
>>>> -					slave_bufs[i], slave_nb_pkts[i]);
>>>> +					slave_bufs[i], nb_prep_pkts);
>>>
>>> In fact it is a problem here and really big problems.
>>> Tx prepare may fail and return less packets. Tx prepare
>>> of some packet may always fail. If application tries to
>>> send packets in a loop until success, it will be a
>>> forever loop here. Since application calls Tx burst,
>>> it is 100% legal behaviour of the function to return 0
>>> if Tx ring is full. It is not an error indication.
>>> However, in the case of Tx prepare it is an error
>>> indication.
> 
> Yes, that sounds like a problem and existing apps might be affected.
> 
>>>
>>> Should we change Tx burst description and enforce callers
>>> to check for rte_errno? It sounds like a major change...
>>>
> 
> Agree, rte_errno for tx_burst() is probably a simplest and sanest way,
> but yes, it is a change in behaviour and apps will need to be updated.  
> Another option for bond PMD - just silently free mbufs for which prepare()
> fails (and probably update some stats counter).
> Again it is a change in behaviour, but now just for one PMD, with tx offloads enabled.
> Also as, I can see some tx_burst() function for that PMD already free packets silently:
> bond_ethdev_tx_burst_alb(), bond_ethdev_tx_burst_broadcast().
> 
> Actually another question - why the patch adds tx_prepare() only to some
> TX modes but not all?
> Is that itended? 
> 

Yes. Currently, I have no ideal to perform tx_prepare() in broadcast mode with limited
impact on performance. In broadcast mode, same packets will be send in several devices.
In this process, we only update the ref_cnt of mbufs, but no copy of packets. As we know,
tx_prepare() may change the data, so it may cause some problem if we perform tx_prepare()
several times on the same packet.

>>
>> I agree that if the failure is caused by Tx ring full, it is a legal behaviour.
>> But what about the failure caused by other reasons? At present, it is possible
>> for some PMDs to fail during tx_burst due to other reasons. In this case,
>> repeated tries to send will also fail.
>>
>> I'm not sure if all PMDs need to support the behavior of sending packets in a
>> loop until it succeeds. If not, I think the current problem can be reminded to
>> the user by adding a description to the bonding. If it is necessary, I think the
>> description of tx_burst should also add related instructions, so that the developers
>> of PMDs can better understand how tx_burst should be designed, such as putting all
>> hardware-related constraint checks into tx_prepare. And another prerequisite for
>> the above behavior is that the packets must be prepared (i.e. checked by
>> rte_eth_tx_prepare()). Otherwise, it may also fail to send. This means that we have
>> to use rte_eth_tx_prepare() in more scenarios.
>>
>> What's Ferruh's opinion on this?
>>
>>> [snip]
>>>
>>> .
>>>
> 


  reply	other threads:[~2021-06-10  6:47 UTC|newest]

Thread overview: 61+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2021-04-16 11:04 [dpdk-dev] [RFC 0/2] add Tx prepare support for bonding device Chengchang Tang
2021-04-16 11:04 ` [dpdk-dev] [RFC 1/2] net/bonding: add Tx prepare for bonding Chengchang Tang
2021-04-16 11:04 ` [dpdk-dev] [RFC 2/2] app/testpmd: add cmd for bonding Tx prepare Chengchang Tang
2021-04-16 11:12 ` [dpdk-dev] [RFC 0/2] add Tx prepare support for bonding device Min Hu (Connor)
2021-04-20  1:26 ` Ferruh Yigit
2021-04-20  2:44   ` Chengchang Tang
2021-04-20  8:33     ` Ananyev, Konstantin
2021-04-20 12:44       ` Chengchang Tang
2021-04-20 13:18         ` Ananyev, Konstantin
2021-04-20 14:06           ` Chengchang Tang
2021-04-23  9:46 ` [dpdk-dev] [PATCH " Chengchang Tang
2021-04-23  9:46   ` [dpdk-dev] [PATCH 1/2] net/bonding: support Tx prepare for bonding Chengchang Tang
2021-06-08  9:49     ` Andrew Rybchenko
2021-06-09  6:42       ` Chengchang Tang
2021-06-09  9:35         ` Andrew Rybchenko
2021-06-10  7:32           ` Chengchang Tang
2021-06-14 14:16             ` Andrew Rybchenko
2021-06-09 10:25         ` Ananyev, Konstantin
2021-06-10  6:46           ` Chengchang Tang [this message]
2021-06-14 11:36             ` Ananyev, Konstantin
2022-05-24 12:11       ` Min Hu (Connor)
2022-07-25  4:08     ` [PATCH v2 0/3] add Tx prepare support for bonding driver Chengwen Feng
2022-07-25  4:08       ` [PATCH v2 1/3] net/bonding: support Tx prepare Chengwen Feng
2022-09-13 10:22         ` Ferruh Yigit
2022-09-13 15:08           ` Chas Williams
2022-09-14  0:46           ` fengchengwen
2022-09-14 16:59             ` Chas Williams
2022-09-17  2:35               ` fengchengwen
2022-09-17 13:38                 ` Chas Williams
2022-09-19 14:07                   ` Konstantin Ananyev
2022-09-19 23:02                     ` Chas Williams
2022-09-22  2:12                       ` fengchengwen
2022-09-25 10:32                         ` Chas Williams
2022-09-26 10:18                       ` Konstantin Ananyev
2022-09-26 16:36                         ` Chas Williams
2022-07-25  4:08       ` [PATCH v2 2/3] net/bonding: support Tx prepare fail stats Chengwen Feng
2022-07-25  4:08       ` [PATCH v2 3/3] net/bonding: add testpmd cmd for Tx prepare Chengwen Feng
2022-07-25  7:04       ` [PATCH v2 0/3] add Tx prepare support for bonding driver humin (Q)
2022-09-13  1:41       ` fengchengwen
2022-09-17  4:15     ` [PATCH v3 " Chengwen Feng
2022-09-17  4:15       ` [PATCH v3 1/3] net/bonding: support Tx prepare Chengwen Feng
2022-09-17  4:15       ` [PATCH v3 2/3] net/bonding: support Tx prepare fail stats Chengwen Feng
2022-09-17  4:15       ` [PATCH v3 3/3] net/bonding: add testpmd cmd for Tx prepare Chengwen Feng
2022-10-09  3:36     ` [PATCH v4] net/bonding: call Tx prepare before Tx burst Chengwen Feng
2022-10-10 19:42       ` Chas Williams
2022-10-11 13:28         ` fengchengwen
2022-10-11 13:20     ` [PATCH v5] " Chengwen Feng
2022-10-15 15:26       ` Chas Williams
2022-10-18 14:25         ` fengchengwen
2022-10-20  7:07         ` Andrew Rybchenko
2021-04-23  9:46   ` [dpdk-dev] [PATCH 2/2] net/bonding: support configuring Tx offloading for bonding Chengchang Tang
2021-06-08  9:49     ` Andrew Rybchenko
2021-06-09  6:57       ` Chengchang Tang
2021-06-09  9:11         ` Ananyev, Konstantin
2021-06-09  9:37           ` Andrew Rybchenko
2021-06-10  6:29             ` Chengchang Tang
2021-06-14 11:05               ` Ananyev, Konstantin
2021-06-14 14:13                 ` Andrew Rybchenko
2021-04-30  6:26   ` [dpdk-dev] [PATCH 0/2] add Tx prepare support for bonding device Chengchang Tang
2021-04-30  6:47     ` Min Hu (Connor)
2021-06-03  1:44   ` Chengchang Tang

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=f7098e21-d94a-9d0e-8e66-0bd6dd1e3152@huawei.com \
    --to=tangchengchang@huawei.com \
    --cc=andrew.rybchenko@oktetlabs.ru \
    --cc=chas3@att.com \
    --cc=dev@dpdk.org \
    --cc=ferruh.yigit@intel.com \
    --cc=humin29@huawei.com \
    --cc=konstantin.ananyev@intel.com \
    --cc=linuxarm@huawei.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).