DPDK patches and discussions
 help / color / mirror / Atom feed
From: Jan Viktorin <viktorin@cesnet.cz>
To: Ori Kam <orika@nvidia.com>
Cc: "Andrew Rybchenko" <andrew.rybchenko@oktetlabs.ru>,
	"Havlík Martin" <xhavli56@stud.fit.vutbr.cz>,
	"Min Hu (Connor)" <humin29@huawei.com>,
	dev@dpdk.org, "Matan Azrad" <matan@nvidia.com>,
	Chas3 <chas3@att.com>, "Shahaf Shuler" <shahafs@nvidia.com>,
	"Slava Ovsiienko" <viacheslavo@nvidia.com>,
	"Ferruh Yigit" <ferruh.yigit@intel.com>,
	"Thomas Monjalon" <thomas@monjalon.net>
Subject: Re: [dpdk-dev] [PATCH 3/3] net/bonding: start ethdev prior to setting 8023ad flow
Date: Tue, 13 Jul 2021 13:06:09 +0200	[thread overview]
Message-ID: <20210713130609.00475428@coaster.localdomain> (raw)
In-Reply-To: <f1ef05bd-0d07-5598-6d8e-55b30b811b6c@oktetlabs.ru>

On Tue, 13 Jul 2021 12:26:35 +0300
Andrew Rybchenko <andrew.rybchenko@oktetlabs.ru> wrote:

> On 7/13/21 11:18 AM, Havlík Martin wrote:
> > Dne 2021-07-12 15:07, Ori Kam napsal:  
> >> Hi Jan,
> >>  
> >>> -----Original Message-----
> >>> From: Jan Viktorin <viktorin@cesnet.cz>
> >>> Sent: Monday, July 12, 2021 12:46 AM
> >>>
> >>> On Sun, 11 Jul 2021 08:49:18 +0000
> >>> Ori Kam <orika@nvidia.com> wrote:
> >>>  
> >>> > Hi Jan,  
> >>>
> >>> Hi Ori,
> >>>  
> >>> >
> >>> >  
> >>> > > -----Original Message-----
> >>> > > From: dev <dev-bounces@dpdk.org> On Behalf Of Jan Viktorin
> >>> > > Sent: Wednesday, July 7, 2021 6:54 PM
> >>> > >
> >>> > > On Sun, 4 Jul 2021 15:18:01 +0000
> >>> > > Matan Azrad <matan@nvidia.com> wrote:
> >>> > >  
> >>> > > > From: Havlík Martin  
> >>> > > > > Dne 2021-06-23 09:04, Min Hu (Connor) napsal:  
> >>> > > > > > 在 2021/6/22 17:25, Martin Havlik 写道:  
> >>> > > > > >> When dedicated queues are enabled, mlx5 PMD fails to
> >>> > > > > >> install RTE Flows if the underlying ethdev is not
> >>> > > > > >> started: bond_ethdev_8023ad_flow_set(267) -  
> >>> > > bond_ethdev_8023ad_flow_set:  
> >>> > > > > port  
> >>> > > > > >> not started (slave_port=0 queue_id=1)
> >>> > > > > >>  
> >>> > > > > > Why mlx5 PMD doing flow create relys on port started ?
> >>> > > > > > I noticed other PMDs did not has that reliance.
> >>> > > > > >  
> >>> > > > > After looking into it, I really can't answer this mlx5
> >>> > > > > centered question. Closest related info we found so far
> >>> > > > > is the 5th point in
> >>> > > > > https://doc.dpdk.org/guides/prog_guide/rte_flow.html#caveats
> >>> > > > > but it only specifies it's the application's
> >>> > > > > responsibility and that flow rules are assumed invalid
> >>> > > > > after port stop/close/restart but doesn't say anything
> >>> > > > > about <stop - flow rule create - start> vs <stop - start
> >>> > > > > - flow rule create> where the former is the point of
> >>> > > > > failure on mlx5. I'm addressing the maintainers for mlx5,
> >>> > > > > who might know a bit more on the topic.  
> >>> > > >  
> >>> > >
> >>> > > Hello Matan,
> >>> > >  
> >>> > > > From rte_ethdev.h
> >>> > > >
> >>> > > > * Please note that some configuration is not stored between
> >>> > > > calls to
> >>> > > >  * rte_eth_dev_stop()/rte_eth_dev_start(). The following
> >>> > > > configuration will
> >>> > > >  * be retained:
> >>> > > >  *
> >>> > > >  *     - MTU
> >>> > > >  *     - flow control settings
> >>> > > >  *     - receive mode configuration (promiscuous mode,  
> >>> all-multicast  
> >>> > > > mode,
> >>> > > >  *       hardware checksum mode, RSS/VMDQ settings etc.)
> >>> > > >  *     - VLAN filtering configuration
> >>> > > >  *     - default MAC address
> >>> > > >  *     - MAC addresses supplied to MAC address array
> >>> > > >  *     - flow director filtering mode (but not filtering
> >>> > > >rules)
> >>> > > >  *     - NIC queue statistics mappings  
> >>> > >
> >>> > > just after this section, you can find the following statement:
> >>> > >
> >>> > >  * Any other configuration will not be stored and will need
> >>> > >to be
> >>> > > re-entered
> >>> > >  * before a call to rte_eth_dev_start().
> >>> > >
> >>> > > It is not very clear how is this exactly related to flows
> >>> > > (and this applies for all the quoted section, I think) but at
> >>> > > least it can  
> >>> be used as a
> >>> counter argument.  
> >>> > >  
> >>> > I agree the doc is not clear, as I see it flows are not part of
> >>> > configuration, at least not when we are talking about rte_flow.
> >>> >  
> >>>
> >>> Agree.
> >>>  
> >>> >  
> >>> > > >
> >>> > > >
> >>> > > > Mlx5 assumes flows are allowed to be configured only after
> >>> > > > rte_eth_dev_start(). Before start \ after stop - no flow is
> >>> > > > valid anymore.  
> >>> > >
> >>> > > I believe that this discussion is not about validity of
> >>> > > flows. Let the flows be invalid after calling to
> >>> > > rte_eth_dev_stop(). This is OK, flows must be recreated and
> >>> > > the bonding driver works this way. But why not *before
> >>> > > start*?  
> >>> > Think about it this way by changing the configuration you may
> >>> > create invalid flows, for example, you can only change the
> >>> > number of queues after port stop, so if you create a flow with
> >>> > jump to queue 3 and then you remove queue 3 then, the flow that
> >>> > is cached is not valid anymore. This goes for other
> >>> > configuration that may affect the validity of a  
> >>> flow.
> >>>
> >>> This is a valid argument, of course. The thing is whether it is a
> >>> responsibility
> >>> of the PMD to take care of those corner cases or if this is up to
> >>> the application developer. If we respect the fact that calling to
> >>> stop invalidates all
> >>> flows then when you do:
> >>>  
> >>>  > port stop 0
> >>>  > flow create 0 ingress pattern ... / end actions queue 6 / end
> >>> > port config
> >>> rxq 3  > port start 0
> >>>
> >>> it's clear that something is really wrong with the caller/user. I
> >>> would say that
> >>> this is an application bug. I would expect that you first
> >>> reconfigure port and
> >>> after that you modify flows. It seems quite logical and intuitive,
> >>> doesn't it?
> >>>  
> >>
> >> I agree the use case I described is bad programming, but the same
> >> logic can be
> >> applied on stopping the port why flows should be deleted?

I just took into account what Matan has written. Certainly, it is not
necessary to delete flows on stop.

> >> And even more important on some HW and some flows can only be
> >> inserted after
> >> start since there is no real traffic gate, basically the start
> >> means adding the flows
> >> if a flow is in the HW it is working, adding a gate will result in
> >> PPS degradation.

What do you mean by term "gate", here? Some FPGA logic? Or something
else? Why "adding a gate" would result in PPS degradation? Can you
make those ideas a bit clearer?

> >>  
> >>> Anyway, the PMD can possibly catch this rxq inconsistency but I
> >>> can imagine
> >>> that there are more complicated sitations than just changing
> >>> count of queues. Any idea for a more complex real case?
> >>>  
> >>
> >> Some of the resource may be created only after start,
> >> But I don't have a very good example now.

Please, try to find some reasonable example otherwise your argumentation
does not make much sense...

> >>  
> >>> Martin Havlik, can you please check how ixgbe and i40e behave in
> >>> the case
> >>> above with "rxq change after flow created"?
> >>>  
> > Both PMDs do not raise any errors in the mentioned case. The packets
> > that should go to the non-existent queue are dropped at some point
> > in the processing (from what I could see and understand).

Thanks for clarification. It would be nice if any other PMD can be
tested for this behaviour. Can sombody help us with other NICs?

> > 
> > I'm glad for the discussion that is taking place but I feel we have
> > strayed from the original reason of it. That being the inability to
> > enable dedicated queues for bonding mode 8023AD on mlx5. We have
> > concluded that flow creation has to be done after port/device
> > start. So moving the function call to create the flow after device
> > start should solve the issue
> > (https://mails.dpdk.org/archives/dev/2021-June/212210.html). From my
> > testing, flow create after start is not a problem on Intel PMDs.  
> 
> It would be good to hear what net/bonding maintainers think
> about it. I see no conclusion.
> 
> If net/mlx5 behaviour is fixed, will it fix the initial
> problem?

I believe that if mlx5 allows to create flows before calling to
rte_eth_dev_start() then the bonding would work as expected.

> 
> > 
> > I'm turning to you, Connor, as you have made the original question
> > on this. Is the patch I presented applicable?
> > 
> > Martin  
> >>> >  
> >>> > > Does somebody know how other drivers behaves in this
> >>> > > situation? (We know and can check for Intel, there it does
> >>> > > not seem to be an issue.)
> >>> > >
> >>> > > By the way, the mlx5 behaviour opens a (probably short) time
> >>> > > window between starting of a port and configuation of
> >>> > > filtering flows. You may want to start the port with
> >>> > > thousands of flows that apply just when the port starts (not
> >>> > > after, that's late). This may introduce glitches in filtering
> >>> > > and measuring of traffic (well, it is a  
> >>> question how
> >>> serious issue could it be...).  
> >>> > >  
> >>> > Agree but this is always true nothing is done is zero time and
> >>> > even if it was the insertion is not in zero time, (assuming
> >>> > that even if the flows are stored by the PMD until start they
> >>> > still will not all be inserted in the same time)  
> >>>
> >>> Sorry, I probably do not understand well. Yes, creation of a flow
> >>> would never
> >>> be zero time, agree. But if I create flows before the port starts,
> >>> than I do not
> >>> have to care too much about how long the creation takes. Because
> >>> the card is
> >>> expected to be configured already before pushing the "start
> >>> button".
> >>>
> >>> Maybe, you assume that the created flows would first be cached
> >>> inside the
> >>> PMD and when the port is finally started, it than transparently
> >>> write the
> >>> queues to the HW. But this is not what I was talking about, that's
> >>> maybe even
> >>> worse because you are hiding such behaviour from users...
> >>>
> >>> (I do not know the exact mlx5 implementation details, so my
> >>> answer can miss something, sorry for that.)
> >>>  
> >>
> >> You are correct this is HW/PMD implementation issue, in case of
> >> Nvidia we must

By "must" you probably mean "would have to"? Is that correct?

> >> cache the flows and only insert it after start.
> >> Since RTE flow is generic and HW is not generic sometimes the PMD
> >> needs to
> >> translate this difference and in this gray area there is a game
> >> between best API best performance understanding what the user

Yes, as a former FPGA dev, I can understand your concerns. RTE Flow is
a difficult beast.

Jan

> >> really wants. This is why the doc should be very clear on what is
> >> the expected. but this is also the reason why such document is
> >> very hard to agree on.  
> 
> Do I understand correctly that net/mlx5 maintainers will follow
> up?
> 
> >>
> >> Ori  
> >>> >  
> >>> > > This matters for the bonding case as well, doesn't it?. It is
> >>> > > not desirable to accidently omit a packet that was received
> >>> > > by primary ingress logic instead of being redirected into the
> >>> > > dedicated queue.
> >>> > >
> >>> > > Are there any chances that for mlx5 it would be possible to
> >>> > > insert flow rules before calling rte_eth_dev_start? Anyway,
> >>> > > the behaviour should be specified and documented in DPDK more
> >>> > > precisely to avoid such uncertainty in the future.
> >>> > >  
> >>> > I agree the documentation should be fixed.  
> >>>
> >>> +1  
> 
> Cc Thomas and Ferruh since ethdev documentation should be
> clarified.
> 
> It looks like there is no consensus if the patch is a right
> direction or wrong. For me it looks wrong taking all above
> arguments in to account (mainly necessity to be able to
> insert flows before pushing start button which enables the
> traffic if HW supports it).
> 
> So, I'm applying first two patches and hold on this one.


  reply	other threads:[~2021-07-13 11:06 UTC|newest]

Thread overview: 27+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2021-06-22  9:25 [dpdk-dev] [PATCH 0/3] net/bonding: make dedicated queues work with mlx5 Martin Havlik
2021-06-22  9:25 ` [dpdk-dev] [PATCH 1/3] net/bonding: fix proper return value check and correct log message Martin Havlik
2021-07-08 12:33   ` Min Hu (Connor)
2021-07-13  9:26     ` Andrew Rybchenko
2021-06-22  9:25 ` [dpdk-dev] [PATCH 2/3] net/bonding: fix not checked return value Martin Havlik
2021-07-08 12:43   ` Min Hu (Connor)
2021-07-08 13:20     ` Havlík Martin
2021-07-09  0:09       ` Min Hu (Connor)
2021-07-13  9:26         ` Andrew Rybchenko
2021-06-22  9:25 ` [dpdk-dev] [PATCH 3/3] net/bonding: start ethdev prior to setting 8023ad flow Martin Havlik
2021-06-23  7:04   ` Min Hu (Connor)
2021-06-24 11:57     ` Havlík Martin
2021-07-04 15:18       ` Matan Azrad
2021-07-07 15:54         ` Jan Viktorin
2021-07-11  8:49           ` Ori Kam
2021-07-11 21:45             ` Jan Viktorin
2021-07-12 13:07               ` Ori Kam
2021-07-13  8:18                 ` Havlík Martin
2021-07-13  9:26                   ` Andrew Rybchenko
2021-07-13 11:06                     ` Jan Viktorin [this message]
2021-07-13 17:17                       ` Ori Kam
2021-07-14 15:00                         ` Jan Viktorin
2021-07-15 13:58                           ` Thomas Monjalon
2021-08-24 13:18                             ` Ferruh Yigit
2021-08-26 10:15                               ` Jan Viktorin
2021-07-04 15:03 ` [dpdk-dev] [PATCH 0/3] net/bonding: make dedicated queues work with mlx5 Andrew Rybchenko
2021-09-25  6:15 [dpdk-dev] [PATCH 3/3] net/bonding: start ethdev prior to setting 8023ad flow Havlík Martin

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20210713130609.00475428@coaster.localdomain \
    --to=viktorin@cesnet.cz \
    --cc=andrew.rybchenko@oktetlabs.ru \
    --cc=chas3@att.com \
    --cc=dev@dpdk.org \
    --cc=ferruh.yigit@intel.com \
    --cc=humin29@huawei.com \
    --cc=matan@nvidia.com \
    --cc=orika@nvidia.com \
    --cc=shahafs@nvidia.com \
    --cc=thomas@monjalon.net \
    --cc=viacheslavo@nvidia.com \
    --cc=xhavli56@stud.fit.vutbr.cz \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).