From: fengchengwen <fengchengwen@huawei.com>
To: Chas Williams <3chas3@gmail.com>,
Konstantin Ananyev <konstantin.ananyev@huawei.com>,
Ferruh Yigit <ferruh.yigit@xilinx.com>,
"thomas@monjalon.net" <thomas@monjalon.net>,
"andrew.rybchenko@oktetlabs.ru" <andrew.rybchenko@oktetlabs.ru>,
Ferruh Yigit <ferruh.yigit@amd.com>
Cc: "dev@dpdk.org" <dev@dpdk.org>, "chas3@att.com" <chas3@att.com>,
"humin (Q)" <humin29@huawei.com>
Subject: Re: [PATCH v2 1/3] net/bonding: support Tx prepare
Date: Thu, 22 Sep 2022 10:12:20 +0800 [thread overview]
Message-ID: <66c366e5-3634-3ecb-c605-a4c23278ed88@huawei.com> (raw)
In-Reply-To: <4a978f38-4f38-4630-ca91-fb96a5789d6f@gmail.com>
On 2022/9/20 7:02, Chas Williams wrote:
>
>
> On 9/19/22 10:07, Konstantin Ananyev wrote:
>>
>>>
>>> On 9/16/22 22:35, fengchengwen wrote:
>>>> Hi Chas,
>>>>
>>>> On 2022/9/15 0:59, Chas Williams wrote:
>>>>> On 9/13/22 20:46, fengchengwen wrote:
>>>>>>
>>>>>> The main problem is hard to design a tx_prepare for bonding device:
>>>>>> 1. as Chas Williams said, there maybe twice hash calc to get target slave
>>>>>> devices.
>>>>>> 2. also more important, if the slave devices have changes(e.g. slave device
>>>>>> link down or remove), and if the changes happens between bond-tx-prepare and
>>>>>> bond-tx-burst, the output slave will changes, and this may lead to checksum
>>>>>> failed. (Note: a bond device with slave devices may from different vendors,
>>>>>> and slave devices may have different requirements, e.g. slave-A support calc
>>>>>> IPv4 pseudo-head automatic (no need driver pre-calc), but slave-B need driver
>>>>>> pre-calc).
>>>>>>
>>>>>> Current design cover the above two scenarios by using in-place tx-prepare. and
>>>>>> in addition, bond devices are not transparent to applications, I think it's a
>>>>>> practical method to provide tx-prepare support in this way.
>>>>>>
>>>>>
>>>>>
>>>>> I don't think you need to export an enable/disable routine for the use of
>>>>> rte_eth_tx_prepare. It's safe to just call that routine, even if it isn't
>>>>> implemented. You are just trading one branch in DPDK librte_eth_dev for a
>>>>> branch in drivers/net/bonding.
>>>>
>>>> Our first patch was just like yours (just add tx-prepare default), but community
>>>> is concerned about impacting performance.
>>>>
>>>> As a trade-off, I think we can add the enable/disable API.
>>>
>>> IMHO, that's a bad idea. If the rte_eth_dev_tx_prepare API affects
>>> performance adversly, that is not a bonding problem. All applications
>>> should be calling rte_eth_dev_tx_prepare. There's no defined API
>>> to determine if rte_eth_dev_tx_prepare should be called. Therefore,
>>> applications should always call rte_eth_dev_tx_prepare. Regardless,
>>> as I previously mentioned, you are just trading the location of
>>> the branch, especially in the bonding case.
>>>
>>> If rte_eth_dev_tx_prepare is causing a performance drop, then that API
>>> should be improved or rewritten. There are PMD that require you to use
>>> that API. Locally, we had maintained a patch to eliminate the use of
>>> rte_eth_dev_tx_prepare. However, that has been getting harder and harder
>>> to maintain. The performance lost by just calling rte_eth_dev_tx_prepare
>>> was marginal.
>>>
>>>>
>>>>>
>>>>> I think you missed fixing tx_machine in 802.3ad support. We have been using
>>>>> the following patch locally which I never got around to submitting.
>>>>
>>>> You are right, I will send V3 fix it.
>>>>
>>>>>
>>>>>
>>>>> From a458654d68ff5144266807ef136ac3dd2adfcd98 Mon Sep 17 00:00:00 2001
>>>>> From: "Charles (Chas) Williams" <chwillia@ciena.com>
>>>>> Date: Tue, 3 May 2022 16:52:37 -0400
>>>>> Subject: [PATCH] net/bonding: call rte_eth_tx_prepare before rte_eth_tx_burst
>>>>>
>>>>> Some PMDs might require a call to rte_eth_tx_prepare before sending the
>>>>> packets for transmission. Typically, the prepare step handles the VLAN
>>>>> headers, but it may need to do other things.
>>>>>
>>>>> Signed-off-by: Chas Williams <chwillia@ciena.com>
>>>>
>>>> ...
>>>>
>>>>> * ring if transmission fails so the packet isn't lost.
>>>>> @@ -1322,8 +1350,12 @@ bond_ethdev_tx_burst_broadcast(void *queue, struct rte_mbuf **bufs,
>>>>>
>>>>> /* Transmit burst on each active slave */
>>>>> for (i = 0; i < num_of_slaves; i++) {
>>>>> - slave_tx_total[i] = rte_eth_tx_burst(slaves[i], bd_tx_q->queue_id,
>>>>> + uint16_t nb_prep;
>>>>> +
>>>>> + nb_prep = rte_eth_tx_prepare(slaves[i], bd_tx_q->queue_id,
>>>>> bufs, nb_pkts);
>>>>> + slave_tx_total[i] = rte_eth_tx_burst(slaves[i], bd_tx_q->queue_id,
>>>>> + bufs, nb_prep);
>>>>
>>>> The tx-prepare may edit packet data, and the broadcast mode will send a packet to all slaves,
>>>> the packet data is sent and edited at the same time. Is this likely to cause problems ?
>>>
>>> This routine is already broken. You can't just increment the refcount
>>> and send the packet into a PMD's transmit routine. Nothing guarantees
>>> that a transmit routine will not modify the packet. Many PMDs perform an
>>> rte_vlan_insert.
>>
>> Hmm interesting....
>> My uderstanding was quite opposite - tx_burst() can't modify packet data and metadata
>> (except when refcnt==1 and tx_burst() going to free the mbuf and put it back to the mempool).
>> While tx_prepare() can - actually as I remember that was one of the reasons why a separate routine
>> was introduced.
>
> Is that documented anywhere? It's been my experience that the device PMD
> can do practically anything and you need to protect yourself. Currently,
> the af_packet, dpaa2, and vhost driver call rte_vlan_insert. Before 2019,
> the virtio driver also used to call rte_vlan_insert during its transmit
> path. Of course, rte_vlan_insert modifies the packet data and the mbuf
> header. Regardless, it looks like rte_eth_dev_tx_prepare should always be
> called. Handling that correctly in broadcast mode probably means always
> make a deep copy of the packet, or check to see if all the members are
> the same PMD type. If so, you can just call prepare once. You could track
> the mismatched nature during additional/removal of the members. Or just
> assume people aren't going to mismatch bonding members.
the rte_eth_tx_prepare has notes:
* Since this function can modify packet data, provided mbufs must be safely
* writable (e.g. modified data cannot be in shared segment).
but rte_eth_tx_burst have not such requirement.
except above examples of rte_vlan_insert, there are also some PMDs modify mbuf's header
and data, e.g. hns3/ark/bnxt will invoke rte_pktmbuf_append in case of the pkt-len too small.
I prefer the rte_eth_tx_burst add such restricts: the PMD should not modify the mbuf except refcnt==1.
so that application could rely on there explicit definition to do business.
As for this bonding scenario, we have three alternatives:
1) as Chas provided patch, always do tx-prepare before tx-burst. it was simple, but have: it
may modify the mbuf but application could not detect (unless especial documents)
2) my patch, application could invoke the prepare_enable/disable to control whether to do prepare.
3) implement bonding PMD's tx-prepare, it do tx-preare for each slave, but existing some problem:
if the slave device changes (e.g. add new device), some packet errors may occur because we have not
do prepare for the new add device.
note1: the above 1/2 both violate rte_eth_tx_burst's requirement, so we should especial document.
note2: we can do some optimization for 3, e.g. if the same driver name is detected on multiple slave
devices, here only need to perform tx-prepare once. but the problem above descripe still exist
because of dynamic slave devices at runtime.
hope for more discuess. @Ferruh @Chas @Humin @Konstantin
>
>
>>> You should at least perform a clone of the packet so
>>> that the mbuf headers aren't mangled by each PMD. Just to be safe you
>>> should perform a partial deep copy of the packet headers in case some
>>> PMD does an rte_vlan_insert and the other PMDs in the bonding group do
>>> not need an rte_vlan_insert.
>>>
>>> So doing a blind rte_eth_dev_tx_preprare isn't making anything much
>>> worse.
>>>
>>>>
>>>>>
>>>>> if (unlikely(slave_tx_total[i] < nb_pkts))
>>>>> tx_failed_flag = 1;
> .
next prev parent reply other threads:[~2022-09-22 2:12 UTC|newest]
Thread overview: 61+ messages / expand[flat|nested] mbox.gz Atom feed top
2021-04-16 11:04 [dpdk-dev] [RFC 0/2] add Tx prepare support for bonding device Chengchang Tang
2021-04-16 11:04 ` [dpdk-dev] [RFC 1/2] net/bonding: add Tx prepare for bonding Chengchang Tang
2021-04-16 11:04 ` [dpdk-dev] [RFC 2/2] app/testpmd: add cmd for bonding Tx prepare Chengchang Tang
2021-04-16 11:12 ` [dpdk-dev] [RFC 0/2] add Tx prepare support for bonding device Min Hu (Connor)
2021-04-20 1:26 ` Ferruh Yigit
2021-04-20 2:44 ` Chengchang Tang
2021-04-20 8:33 ` Ananyev, Konstantin
2021-04-20 12:44 ` Chengchang Tang
2021-04-20 13:18 ` Ananyev, Konstantin
2021-04-20 14:06 ` Chengchang Tang
2021-04-23 9:46 ` [dpdk-dev] [PATCH " Chengchang Tang
2021-04-23 9:46 ` [dpdk-dev] [PATCH 1/2] net/bonding: support Tx prepare for bonding Chengchang Tang
2021-06-08 9:49 ` Andrew Rybchenko
2021-06-09 6:42 ` Chengchang Tang
2021-06-09 9:35 ` Andrew Rybchenko
2021-06-10 7:32 ` Chengchang Tang
2021-06-14 14:16 ` Andrew Rybchenko
2021-06-09 10:25 ` Ananyev, Konstantin
2021-06-10 6:46 ` Chengchang Tang
2021-06-14 11:36 ` Ananyev, Konstantin
2022-05-24 12:11 ` Min Hu (Connor)
2022-07-25 4:08 ` [PATCH v2 0/3] add Tx prepare support for bonding driver Chengwen Feng
2022-07-25 4:08 ` [PATCH v2 1/3] net/bonding: support Tx prepare Chengwen Feng
2022-09-13 10:22 ` Ferruh Yigit
2022-09-13 15:08 ` Chas Williams
2022-09-14 0:46 ` fengchengwen
2022-09-14 16:59 ` Chas Williams
2022-09-17 2:35 ` fengchengwen
2022-09-17 13:38 ` Chas Williams
2022-09-19 14:07 ` Konstantin Ananyev
2022-09-19 23:02 ` Chas Williams
2022-09-22 2:12 ` fengchengwen [this message]
2022-09-25 10:32 ` Chas Williams
2022-09-26 10:18 ` Konstantin Ananyev
2022-09-26 16:36 ` Chas Williams
2022-07-25 4:08 ` [PATCH v2 2/3] net/bonding: support Tx prepare fail stats Chengwen Feng
2022-07-25 4:08 ` [PATCH v2 3/3] net/bonding: add testpmd cmd for Tx prepare Chengwen Feng
2022-07-25 7:04 ` [PATCH v2 0/3] add Tx prepare support for bonding driver humin (Q)
2022-09-13 1:41 ` fengchengwen
2022-09-17 4:15 ` [PATCH v3 " Chengwen Feng
2022-09-17 4:15 ` [PATCH v3 1/3] net/bonding: support Tx prepare Chengwen Feng
2022-09-17 4:15 ` [PATCH v3 2/3] net/bonding: support Tx prepare fail stats Chengwen Feng
2022-09-17 4:15 ` [PATCH v3 3/3] net/bonding: add testpmd cmd for Tx prepare Chengwen Feng
2022-10-09 3:36 ` [PATCH v4] net/bonding: call Tx prepare before Tx burst Chengwen Feng
2022-10-10 19:42 ` Chas Williams
2022-10-11 13:28 ` fengchengwen
2022-10-11 13:20 ` [PATCH v5] " Chengwen Feng
2022-10-15 15:26 ` Chas Williams
2022-10-18 14:25 ` fengchengwen
2022-10-20 7:07 ` Andrew Rybchenko
2021-04-23 9:46 ` [dpdk-dev] [PATCH 2/2] net/bonding: support configuring Tx offloading for bonding Chengchang Tang
2021-06-08 9:49 ` Andrew Rybchenko
2021-06-09 6:57 ` Chengchang Tang
2021-06-09 9:11 ` Ananyev, Konstantin
2021-06-09 9:37 ` Andrew Rybchenko
2021-06-10 6:29 ` Chengchang Tang
2021-06-14 11:05 ` Ananyev, Konstantin
2021-06-14 14:13 ` Andrew Rybchenko
2021-04-30 6:26 ` [dpdk-dev] [PATCH 0/2] add Tx prepare support for bonding device Chengchang Tang
2021-04-30 6:47 ` Min Hu (Connor)
2021-06-03 1:44 ` Chengchang Tang
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=66c366e5-3634-3ecb-c605-a4c23278ed88@huawei.com \
--to=fengchengwen@huawei.com \
--cc=3chas3@gmail.com \
--cc=andrew.rybchenko@oktetlabs.ru \
--cc=chas3@att.com \
--cc=dev@dpdk.org \
--cc=ferruh.yigit@amd.com \
--cc=ferruh.yigit@xilinx.com \
--cc=humin29@huawei.com \
--cc=konstantin.ananyev@huawei.com \
--cc=thomas@monjalon.net \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).