DPDK patches and discussions
 help / color / mirror / Atom feed
From: Andrew Rybchenko <andrew.rybchenko@oktetlabs.ru>
To: Chengchang Tang <tangchengchang@huawei.com>, dev@dpdk.org
Cc: linuxarm@huawei.com, chas3@att.com, humin29@huawei.com,
	ferruh.yigit@intel.com, konstantin.ananyev@intel.com
Subject: Re: [dpdk-dev] [PATCH 1/2] net/bonding: support Tx prepare for bonding
Date: Wed, 9 Jun 2021 12:35:18 +0300
Message-ID: <04598999-c517-43b8-3661-8c0bb25e65e7@oktetlabs.ru> (raw)
In-Reply-To: <070e2dc9-7ab8-44f3-7d78-af1b35f18908@huawei.com>

On 6/9/21 9:42 AM, Chengchang Tang wrote:
> Hi, Andrew and Ferruh
> On 2021/6/8 17:49, Andrew Rybchenko wrote:
>> "for bonding" is redundant in the summary since it is already "net/bonding".
>> On 4/23/21 12:46 PM, Chengchang Tang wrote:
>>> To use the HW offloads capability (e.g. checksum and TSO) in the Tx
>>> direction, the upper-layer users need to call rte_eth_dev_prepare to do
>>> some adjustment to the packets before sending them (e.g. processing
>>> pseudo headers when Tx checksum offoad enabled). But, the tx_prepare
>>> callback of the bond driver is not implemented. Therefore, related
>>> offloads can not be used unless the upper layer users process the packet
>>> properly in their own application. But it is bad for the
>>> transplantability.
>>> However, it is difficult to design the tx_prepare callback for bonding
>>> driver. Because when a bonded device sends packets, the bonded device
>>> allocates the packets to different slave devices based on the real-time
>>> link status and bonding mode. That is, it is very difficult for the
>>> bonding device to determine which slave device's prepare function should
>>> be invoked. In addition, if the link status changes after the packets are
>>> prepared, the packets may fail to be sent because packets allocation may
>>> change.
>>> So, in this patch, the tx_prepare callback of bonding driver is not
>>> implemented. Instead, the rte_eth_dev_tx_prepare() will be called for
>>> all the fast path packet in mode 0, 1, 2, 4, 5, 6. In this way, all
>>> tx_offloads can be processed correctly for all NIC devices in these modes.
>>> If tx_prepare is not required in some cases, then slave PMDs tx_prepare
>>> pointer should be NULL and rte_eth_tx_prepare() will be just a NOOP.
>>> In these cases, the impact on performance will be very limited. It is
>>> the responsibility of the slave PMDs to decide when the real tx_prepare
>>> needs to be used. The information from dev_config/queue_setup is
>>> sufficient for them to make these decisions.
>>> Note:
>>> The rte_eth_tx_prepare is not added to bond mode 3(Broadcast). This is
>>> because in broadcast mode, a packet needs to be sent by all slave ports.
>>> Different PMDs process the packets differently in tx_prepare. As a result,
>>> the sent packet may be incorrect.
>>> Signed-off-by: Chengchang Tang <tangchengchang@huawei.com>
>>> ---
>>>  drivers/net/bonding/rte_eth_bond.h     |  1 -
>>>  drivers/net/bonding/rte_eth_bond_pmd.c | 28 ++++++++++++++++++++++++----
>>>  2 files changed, 24 insertions(+), 5 deletions(-)
>>> diff --git a/drivers/net/bonding/rte_eth_bond.h b/drivers/net/bonding/rte_eth_bond.h
>>> index 874aa91..1e6cc6d 100644
>>> --- a/drivers/net/bonding/rte_eth_bond.h
>>> +++ b/drivers/net/bonding/rte_eth_bond.h
>>> @@ -343,7 +343,6 @@ rte_eth_bond_link_up_prop_delay_set(uint16_t bonded_port_id,
>>>  int
>>>  rte_eth_bond_link_up_prop_delay_get(uint16_t bonded_port_id);
>>> -
>>>  #ifdef __cplusplus
>>>  }
>>>  #endif
>>> diff --git a/drivers/net/bonding/rte_eth_bond_pmd.c b/drivers/net/bonding/rte_eth_bond_pmd.c
>>> index 2e9cea5..84af348 100644
>>> --- a/drivers/net/bonding/rte_eth_bond_pmd.c
>>> +++ b/drivers/net/bonding/rte_eth_bond_pmd.c
>>> @@ -606,8 +606,14 @@ bond_ethdev_tx_burst_round_robin(void *queue, struct rte_mbuf **bufs,
>>>  	/* Send packet burst on each slave device */
>>>  	for (i = 0; i < num_of_slaves; i++) {
>>>  		if (slave_nb_pkts[i] > 0) {
>>> +			int nb_prep_pkts;
>>> +
>>> +			nb_prep_pkts = rte_eth_tx_prepare(slaves[i],
>>> +					bd_tx_q->queue_id, slave_bufs[i],
>>> +					slave_nb_pkts[i]);
>>> +
>> Shouldn't it be called iff queue Tx offloads are not zero?
>> It will allow to decrease performance degradation if no
>> Tx offloads are enabled. Same in all cases below.
> Regarding this point, it has been discussed in the previous RFC:
> https://inbox.dpdk.org/dev/47f907cf-3933-1de9-9c45-6734b912eccd@huawei.com/
> According to the TX_OFFLOAD status of the current device, PMDs can determine
> whether tx_prepare is currently needed. If it is not needed, set pkt_tx_prepare
> to NULL, so that the actual tx_prepare processing will be skipped directly in
> rte_eth_tx_prepare().

I still think that the following is right:
No Tx offloads at all => Tx prepare is not necessary

Am I wrong?

>>>  			num_tx_slave = rte_eth_tx_burst(slaves[i], bd_tx_q->queue_id,
>>> -					slave_bufs[i], slave_nb_pkts[i]);
>>> +					slave_bufs[i], nb_prep_pkts);
>> In fact it is a problem here and really big problems.
>> Tx prepare may fail and return less packets. Tx prepare
>> of some packet may always fail. If application tries to
>> send packets in a loop until success, it will be a
>> forever loop here. Since application calls Tx burst,
>> it is 100% legal behaviour of the function to return 0
>> if Tx ring is full. It is not an error indication.
>> However, in the case of Tx prepare it is an error
>> indication.
>> Should we change Tx burst description and enforce callers
>> to check for rte_errno? It sounds like a major change...
> I agree that if the failure is caused by Tx ring full, it is a legal behaviour.
> But what about the failure caused by other reasons? At present, it is possible
> for some PMDs to fail during tx_burst due to other reasons. In this case,
> repeated tries to send will also fail.

If so, packet should be simply dropped by Tx burst and Tx burst
should move on. If a packet cannot be transmitted, it must be
dropped (counted) and Tx burst should move to the next packet.

> I'm not sure if all PMDs need to support the behavior of sending packets in a
> loop until it succeeds. If not, I think the current problem can be reminded to
> the user by adding a description to the bonding. If it is necessary, I think the
> description of tx_burst should also add related instructions, so that the developers
> of PMDs can better understand how tx_burst should be designed, such as putting all
> hardware-related constraint checks into tx_prepare. And another prerequisite for
> the above behavior is that the packets must be prepared (i.e. checked by
> rte_eth_tx_prepare()). Otherwise, it may also fail to send. This means that we have
> to use rte_eth_tx_prepare() in more scenarios.

IMHO any PMD specific behaviour is a nightmare to application
developer and must be avoided. Ideally application should not
care if it is running on top of tap, virtio, failsafe or
bonding. It should talk to ethdev in terms of ethdev API that's
it. I know that net/bonding is designed that application should
know about it, but IMHO the places where it requires the
knowledge must be minimized to make applications more portable
across various PMDs/HW.

I think that the only sensible solution for above problem is
to skip a packet which prepare dislikes. count it as dropped
and try to prepare/transmit subsequent packets.

It is an interesting effect of the Tx prepare just before
Tx burst inside bonding PMD. If Tx burst fails to send
something because ring is full, a number of packets will
be processed by Tx prepare again and again. I guess it is

> What's Ferruh's opinion on this?
>> [snip]

  reply	other threads:[~2021-06-09  9:35 UTC|newest]

Thread overview: 31+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2021-04-16 11:04 [dpdk-dev] [RFC 0/2] add Tx prepare support for bonding device Chengchang Tang
2021-04-16 11:04 ` [dpdk-dev] [RFC 1/2] net/bonding: add Tx prepare for bonding Chengchang Tang
2021-04-16 11:04 ` [dpdk-dev] [RFC 2/2] app/testpmd: add cmd for bonding Tx prepare Chengchang Tang
2021-04-16 11:12 ` [dpdk-dev] [RFC 0/2] add Tx prepare support for bonding device Min Hu (Connor)
2021-04-20  1:26 ` Ferruh Yigit
2021-04-20  2:44   ` Chengchang Tang
2021-04-20  8:33     ` Ananyev, Konstantin
2021-04-20 12:44       ` Chengchang Tang
2021-04-20 13:18         ` Ananyev, Konstantin
2021-04-20 14:06           ` Chengchang Tang
2021-04-23  9:46 ` [dpdk-dev] [PATCH " Chengchang Tang
2021-04-23  9:46   ` [dpdk-dev] [PATCH 1/2] net/bonding: support Tx prepare for bonding Chengchang Tang
2021-06-08  9:49     ` Andrew Rybchenko
2021-06-09  6:42       ` Chengchang Tang
2021-06-09  9:35         ` Andrew Rybchenko [this message]
2021-06-10  7:32           ` Chengchang Tang
2021-06-14 14:16             ` Andrew Rybchenko
2021-06-09 10:25         ` Ananyev, Konstantin
2021-06-10  6:46           ` Chengchang Tang
2021-06-14 11:36             ` Ananyev, Konstantin
2021-04-23  9:46   ` [dpdk-dev] [PATCH 2/2] net/bonding: support configuring Tx offloading " Chengchang Tang
2021-06-08  9:49     ` Andrew Rybchenko
2021-06-09  6:57       ` Chengchang Tang
2021-06-09  9:11         ` Ananyev, Konstantin
2021-06-09  9:37           ` Andrew Rybchenko
2021-06-10  6:29             ` Chengchang Tang
2021-06-14 11:05               ` Ananyev, Konstantin
2021-06-14 14:13                 ` Andrew Rybchenko
2021-04-30  6:26   ` [dpdk-dev] [PATCH 0/2] add Tx prepare support for bonding device Chengchang Tang
2021-04-30  6:47     ` Min Hu (Connor)
2021-06-03  1:44   ` Chengchang Tang

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=04598999-c517-43b8-3661-8c0bb25e65e7@oktetlabs.ru \
    --to=andrew.rybchenko@oktetlabs.ru \
    --cc=chas3@att.com \
    --cc=dev@dpdk.org \
    --cc=ferruh.yigit@intel.com \
    --cc=humin29@huawei.com \
    --cc=konstantin.ananyev@intel.com \
    --cc=linuxarm@huawei.com \
    --cc=tangchengchang@huawei.com \


* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

DPDK patches and discussions

This inbox may be cloned and mirrored by anyone:

	git clone --mirror https://inbox.dpdk.org/dev/0 dev/git/0.git

	# If you have public-inbox 1.1+ installed, you may
	# initialize and index your mirror using the following commands:
	public-inbox-init -V2 dev dev/ https://inbox.dpdk.org/dev \
	public-inbox-index dev

Example config snippet for mirrors.
Newsgroup available over NNTP:

AGPL code for this site: git clone https://public-inbox.org/public-inbox.git