DPDK patches and discussions
 help / color / mirror / Atom feed
From: Konstantin Ananyev <konstantin.ananyev@huawei.com>
To: Chas Williams <3chas3@gmail.com>,
	Fengchengwen <fengchengwen@huawei.com>,
	 Ferruh Yigit <ferruh.yigit@xilinx.com>,
	"thomas@monjalon.net" <thomas@monjalon.net>,
	"andrew.rybchenko@oktetlabs.ru" <andrew.rybchenko@oktetlabs.ru>,
	"konstantin.ananyev@intel.com" <konstantin.ananyev@intel.com>
Cc: "dev@dpdk.org" <dev@dpdk.org>, "chas3@att.com" <chas3@att.com>,
	"humin (Q)" <humin29@huawei.com>
Subject: RE: [PATCH v2 1/3] net/bonding: support Tx prepare
Date: Mon, 26 Sep 2022 10:18:41 +0000	[thread overview]
Message-ID: <e899b2c8046f4672a003d5d1184329ce@huawei.com> (raw)
In-Reply-To: <4a978f38-4f38-4630-ca91-fb96a5789d6f@gmail.com>


Hi everyone,

Sorry for late reply.

> >>>>> The main problem is hard to design a tx_prepare for bonding device:
> >>>>> 1. as Chas Williams said, there maybe twice hash calc to get target slave
> >>>>>       devices.
> >>>>> 2. also more important, if the slave devices have changes(e.g. slave device
> >>>>>       link down or remove), and if the changes happens between bond-tx-prepare and
> >>>>>       bond-tx-burst, the output slave will changes, and this may lead to checksum
> >>>>>       failed. (Note: a bond device with slave devices may from different vendors,
> >>>>>       and slave devices may have different requirements, e.g. slave-A support calc
> >>>>>       IPv4 pseudo-head automatic (no need driver pre-calc), but slave-B need driver
> >>>>>       pre-calc).
> >>>>>
> >>>>> Current design cover the above two scenarios by using in-place tx-prepare. and
> >>>>> in addition, bond devices are not transparent to applications, I think it's a
> >>>>> practical method to provide tx-prepare support in this way.
> >>>>>
> >>>>
> >>>>
> >>>> I don't think you need to export an enable/disable routine for the use of
> >>>> rte_eth_tx_prepare. It's safe to just call that routine, even if it isn't
> >>>> implemented. You are just trading one branch in DPDK librte_eth_dev for a
> >>>> branch in drivers/net/bonding.
> >>>
> >>> Our first patch was just like yours (just add tx-prepare default), but community
> >>> is concerned about impacting performance.
> >>>
> >>> As a trade-off, I think we can add the enable/disable API.
> >>
> >> IMHO, that's a bad idea. If the rte_eth_dev_tx_prepare API affects
> >> performance adversly, that is not a bonding problem. All applications
> >> should be calling rte_eth_dev_tx_prepare. There's no defined API
> >> to determine if rte_eth_dev_tx_prepare should be called. Therefore,
> >> applications should always call rte_eth_dev_tx_prepare. Regardless,
> >> as I previously mentioned, you are just trading the location of
> >> the branch, especially in the bonding case.
> >>
> >> If rte_eth_dev_tx_prepare is causing a performance drop, then that API
> >> should be improved or rewritten. There are PMD that require you to use
> >> that API. Locally, we had maintained a patch to eliminate the use of
> >> rte_eth_dev_tx_prepare. However, that has been getting harder and harder
> >> to maintain. The performance lost by just calling rte_eth_dev_tx_prepare
> >> was marginal.
> >>
> >>>
> >>>>
> >>>> I think you missed fixing tx_machine in 802.3ad support. We have been using
> >>>> the following patch locally which I never got around to submitting.
> >>>
> >>> You are right, I will send V3 fix it.
> >>>
> >>>>
> >>>>
> >>>>   From a458654d68ff5144266807ef136ac3dd2adfcd98 Mon Sep 17 00:00:00 2001
> >>>> From: "Charles (Chas) Williams" <chwillia@ciena.com>
> >>>> Date: Tue, 3 May 2022 16:52:37 -0400
> >>>> Subject: [PATCH] net/bonding: call rte_eth_tx_prepare before rte_eth_tx_burst
> >>>>
> >>>> Some PMDs might require a call to rte_eth_tx_prepare before sending the
> >>>> packets for transmission. Typically, the prepare step handles the VLAN
> >>>> headers, but it may need to do other things.
> >>>>
> >>>> Signed-off-by: Chas Williams <chwillia@ciena.com>
> >>>
> >>> ...
> >>>
> >>>>                 * ring if transmission fails so the packet isn't lost.
> >>>> @@ -1322,8 +1350,12 @@ bond_ethdev_tx_burst_broadcast(void *queue, struct rte_mbuf **bufs,
> >>>>
> >>>>        /* Transmit burst on each active slave */
> >>>>        for (i = 0; i < num_of_slaves; i++) {
> >>>> -        slave_tx_total[i] = rte_eth_tx_burst(slaves[i], bd_tx_q->queue_id,
> >>>> +        uint16_t nb_prep;
> >>>> +
> >>>> +        nb_prep = rte_eth_tx_prepare(slaves[i], bd_tx_q->queue_id,
> >>>>                        bufs, nb_pkts);
> >>>> +        slave_tx_total[i] = rte_eth_tx_burst(slaves[i], bd_tx_q->queue_id,
> >>>> +                    bufs, nb_prep);
> >>>
> >>> The tx-prepare may edit packet data, and the broadcast mode will send a packet to all slaves,
> >>> the packet data is sent and edited at the same time. Is this likely to cause problems ?
> >>
> >> This routine is already broken. You can't just increment the refcount
> >> and send the packet into a PMD's transmit routine. Nothing guarantees
> >> that a transmit routine will not modify the packet. Many PMDs perform an
> >> rte_vlan_insert.
> >
> > Hmm interesting....
> > My uderstanding was quite opposite - tx_burst() can't modify packet data and metadata
> > (except when refcnt==1 and tx_burst() going to free the mbuf and put it back to the mempool).
> > While tx_prepare() can - actually as I remember that was one of the reasons why a separate routine
> > was introduced.
> 
> Is that documented anywhere?

I looked through, but couldn't find too much except what was already mentioned by Fengcheng:
rte_eth_tx_prepare() notes:
    * Since this function can modify packet data, provided mbufs must be safely
    * writable (e.g. modified data cannot be in shared segment).
Probably that's not explicit enough, as it doesn't forbid modifying packets in tx_burst clearly.

> It's been my experience that the device PMD
> can do practically anything and you need to protect yourself.  Currently,
> the af_packet, dpaa2, and vhost driver call rte_vlan_insert. Before 2019,
> the virtio driver also used to call rte_vlan_insert during its transmit
> path. Of course, rte_vlan_insert modifies the packet data and the mbuf
> header.
Interesting, usually apps that trying to use zero-copy multi-cast TX have packet-header portion
in a separate segment, so it might even keep working.. But definetly doesn't look right to me:
if mbuf->refnct > 1,  I think it should be treated as read-only. 

 Regardless, it looks like rte_eth_dev_tx_prepare should always be
> called.

Again, as I remember, initial agreement was: if any TX offload is enabled,
tx_prepare() needs to be called (or user has implement similar stuff on his own).
If no TX offload flags were specified for the packet, tx_prepare() is not necessary.

 > Handling that correctly in broadcast mode probably means always
> make a deep copy of the packet, or check to see if all the members are
> the same PMD type. If so, you can just call prepare once. You could track
> the mismatched nature during additional/removal of the members. Or just
> assume people aren't going to mismatch bonding members.
> 
> 
> >> You should at least perform a clone of the packet so
> >> that the mbuf headers aren't mangled by each PMD. 

Usually you don't need to clone the whole packet. In many cases it is enough to just attach
as first segment l2/l3/l4 header portion of the packet.
At least that's how ip_multicast sample works.  

Just to be safe you
> >> should perform a partial deep copy of the packet headers in case some
> >> PMD does an rte_vlan_insert and the other PMDs in the bonding group do
> >> not need an rte_vlan_insert.
> >>
> >> So doing a blind rte_eth_dev_tx_preprare isn't making anything much
> >> worse.
> >>
> >>>
> >>>>
> >>>>            if (unlikely(slave_tx_total[i] < nb_pkts))
> >>>>                tx_failed_flag = 1;

  parent reply	other threads:[~2022-09-26 10:18 UTC|newest]

Thread overview: 61+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2021-04-16 11:04 [dpdk-dev] [RFC 0/2] add Tx prepare support for bonding device Chengchang Tang
2021-04-16 11:04 ` [dpdk-dev] [RFC 1/2] net/bonding: add Tx prepare for bonding Chengchang Tang
2021-04-16 11:04 ` [dpdk-dev] [RFC 2/2] app/testpmd: add cmd for bonding Tx prepare Chengchang Tang
2021-04-16 11:12 ` [dpdk-dev] [RFC 0/2] add Tx prepare support for bonding device Min Hu (Connor)
2021-04-20  1:26 ` Ferruh Yigit
2021-04-20  2:44   ` Chengchang Tang
2021-04-20  8:33     ` Ananyev, Konstantin
2021-04-20 12:44       ` Chengchang Tang
2021-04-20 13:18         ` Ananyev, Konstantin
2021-04-20 14:06           ` Chengchang Tang
2021-04-23  9:46 ` [dpdk-dev] [PATCH " Chengchang Tang
2021-04-23  9:46   ` [dpdk-dev] [PATCH 1/2] net/bonding: support Tx prepare for bonding Chengchang Tang
2021-06-08  9:49     ` Andrew Rybchenko
2021-06-09  6:42       ` Chengchang Tang
2021-06-09  9:35         ` Andrew Rybchenko
2021-06-10  7:32           ` Chengchang Tang
2021-06-14 14:16             ` Andrew Rybchenko
2021-06-09 10:25         ` Ananyev, Konstantin
2021-06-10  6:46           ` Chengchang Tang
2021-06-14 11:36             ` Ananyev, Konstantin
2022-05-24 12:11       ` Min Hu (Connor)
2022-07-25  4:08     ` [PATCH v2 0/3] add Tx prepare support for bonding driver Chengwen Feng
2022-07-25  4:08       ` [PATCH v2 1/3] net/bonding: support Tx prepare Chengwen Feng
2022-09-13 10:22         ` Ferruh Yigit
2022-09-13 15:08           ` Chas Williams
2022-09-14  0:46           ` fengchengwen
2022-09-14 16:59             ` Chas Williams
2022-09-17  2:35               ` fengchengwen
2022-09-17 13:38                 ` Chas Williams
2022-09-19 14:07                   ` Konstantin Ananyev
2022-09-19 23:02                     ` Chas Williams
2022-09-22  2:12                       ` fengchengwen
2022-09-25 10:32                         ` Chas Williams
2022-09-26 10:18                       ` Konstantin Ananyev [this message]
2022-09-26 16:36                         ` Chas Williams
2022-07-25  4:08       ` [PATCH v2 2/3] net/bonding: support Tx prepare fail stats Chengwen Feng
2022-07-25  4:08       ` [PATCH v2 3/3] net/bonding: add testpmd cmd for Tx prepare Chengwen Feng
2022-07-25  7:04       ` [PATCH v2 0/3] add Tx prepare support for bonding driver humin (Q)
2022-09-13  1:41       ` fengchengwen
2022-09-17  4:15     ` [PATCH v3 " Chengwen Feng
2022-09-17  4:15       ` [PATCH v3 1/3] net/bonding: support Tx prepare Chengwen Feng
2022-09-17  4:15       ` [PATCH v3 2/3] net/bonding: support Tx prepare fail stats Chengwen Feng
2022-09-17  4:15       ` [PATCH v3 3/3] net/bonding: add testpmd cmd for Tx prepare Chengwen Feng
2022-10-09  3:36     ` [PATCH v4] net/bonding: call Tx prepare before Tx burst Chengwen Feng
2022-10-10 19:42       ` Chas Williams
2022-10-11 13:28         ` fengchengwen
2022-10-11 13:20     ` [PATCH v5] " Chengwen Feng
2022-10-15 15:26       ` Chas Williams
2022-10-18 14:25         ` fengchengwen
2022-10-20  7:07         ` Andrew Rybchenko
2021-04-23  9:46   ` [dpdk-dev] [PATCH 2/2] net/bonding: support configuring Tx offloading for bonding Chengchang Tang
2021-06-08  9:49     ` Andrew Rybchenko
2021-06-09  6:57       ` Chengchang Tang
2021-06-09  9:11         ` Ananyev, Konstantin
2021-06-09  9:37           ` Andrew Rybchenko
2021-06-10  6:29             ` Chengchang Tang
2021-06-14 11:05               ` Ananyev, Konstantin
2021-06-14 14:13                 ` Andrew Rybchenko
2021-04-30  6:26   ` [dpdk-dev] [PATCH 0/2] add Tx prepare support for bonding device Chengchang Tang
2021-04-30  6:47     ` Min Hu (Connor)
2021-06-03  1:44   ` Chengchang Tang

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=e899b2c8046f4672a003d5d1184329ce@huawei.com \
    --to=konstantin.ananyev@huawei.com \
    --cc=3chas3@gmail.com \
    --cc=andrew.rybchenko@oktetlabs.ru \
    --cc=chas3@att.com \
    --cc=dev@dpdk.org \
    --cc=fengchengwen@huawei.com \
    --cc=ferruh.yigit@xilinx.com \
    --cc=humin29@huawei.com \
    --cc=konstantin.ananyev@intel.com \
    --cc=thomas@monjalon.net \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).