From mboxrd@z Thu Jan  1 00:00:00 1970
Date: Tue, 13 Jul 2021 13:06:09 +0200
From: Jan Viktorin
To: Ori Kam
Cc: Andrew Rybchenko, Havlík Martin, "Min Hu (Connor)", dev@dpdk.org,
 Matan Azrad, Chas3, Shahaf Shuler, Slava Ovsiienko, Ferruh Yigit,
 Thomas Monjalon
Message-ID: <20210713130609.00475428@coaster.localdomain>
References: <20210622092531.73112-1-xhavli56@stud.fit.vutbr.cz>
 <20210622092531.73112-4-xhavli56@stud.fit.vutbr.cz>
 <5f2cd5e6-1c46-227c-7ab1-5a2a34eb337d@huawei.com>
 <01bd8d8fcd364584fecdea3bc70abd1c@stud.fit.vutbr.cz>
 <20210707175420.4d81fdaf@coaster.localdomain>
 <20210711234537.6d2b14c4@tanguero.localdomain>
 <75196e3aee88a5347e597c0ee6fb8bf6@stud.fit.vutbr.cz>
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Subject: Re: [dpdk-dev] [PATCH 3/3] net/bonding: start ethdev prior to
 setting 8023ad flow
List-Id: DPDK patches and discussions

On Tue, 13 Jul 2021 12:26:35 +0300
Andrew Rybchenko wrote:

> On 7/13/21 11:18 AM, Havlík Martin wrote:
> > On 2021-07-12 15:07, Ori Kam wrote:
> >> Hi Jan,
> >>
> >>> -----Original Message-----
> >>> From: Jan Viktorin
> >>> Sent: Monday, July 12, 2021 12:46 AM
> >>>
> >>> On Sun, 11 Jul 2021 08:49:18 +0000
> >>> Ori Kam wrote:
> >>>
> >>> > Hi Jan,
> >>>
> >>> Hi Ori,
> >>>
> >>> > > -----Original Message-----
> >>> > > From: dev On Behalf Of Jan Viktorin
> >>> > > Sent: Wednesday, July 7, 2021 6:54 PM
> >>> > >
> >>> > > On Sun, 4 Jul 2021 15:18:01 +0000
> >>> > > Matan Azrad wrote:
> >>> > >
> >>> > > > From: Havlík Martin
> >>> > > > > On 2021-06-23 09:04, Min Hu (Connor) wrote:
> >>> > > > > > On 2021/6/22 17:25, Martin Havlik wrote:
> >>> > > > > >> When dedicated queues are enabled, the mlx5 PMD fails
> >>> > > > > >> to install RTE flows if the underlying ethdev is not
> >>> > > > > >> started:
> >>> > > > > >> bond_ethdev_8023ad_flow_set(267) -
> >>> > > > > >> bond_ethdev_8023ad_flow_set: port not started
> >>> > > > > >> (slave_port=0 queue_id=1)
> >>> > > > > >>
> >>> > > > > > Why does flow creation in the mlx5 PMD rely on the port
> >>> > > > > > being started? I noticed other PMDs do not have that
> >>> > > > > > reliance.
> >>> > > > > >
> >>> > > > > After looking into it, I really can't answer this
> >>> > > > > mlx5-centered question. The closest related info we have
> >>> > > > > found so far is the 5th point in
> >>> > > > > https://doc.dpdk.org/guides/prog_guide/rte_flow.html#caveats
> >>> > > > > but it only specifies that it's the application's
> >>> > > > > responsibility and that flow rules are assumed invalid
> >>> > > > > after port stop/close/restart; it doesn't say anything
> >>> > > > > about <flow rule create - port start> vs <port start
> >>> > > > > - flow rule create>, where the former is the point of
> >>> > > > > failure on mlx5. I'm addressing the maintainers for mlx5,
> >>> > > > > who might know a bit more on the topic.
> >>> > > >
> >>> > >
> >>> > > Hello Matan,
> >>> > >
> >>> > > > From rte_ethdev.h
> >>> > > >
> >>> > > >  * Please note that some configuration is not stored between
> >>> > > >  * calls to rte_eth_dev_stop()/rte_eth_dev_start(). The
> >>> > > >  * following configuration will be retained:
> >>> > > >  *
> >>> > > >  *     - MTU
> >>> > > >  *     - flow control settings
> >>> > > >  *     - receive mode configuration (promiscuous mode,
> >>> > > >  *       all-multicast mode, hardware checksum mode,
> >>> > > >  *       RSS/VMDQ settings etc.)
> >>> > > >  *     - VLAN filtering configuration
> >>> > > >  *     - default MAC address
> >>> > > >  *     - MAC addresses supplied to MAC address array
> >>> > > >  *     - flow director filtering mode (but not filtering
> >>> > > >  *       rules)
> >>> > > >  *     - NIC queue statistics mappings
> >>> > >
> >>> > > Just after this section, you can find the following statement:
> >>> > >
> >>> > >  * Any other configuration will not be stored and will need to
> >>> > >  * be re-entered before a call to rte_eth_dev_start().
> >>> > >
> >>> > > It is not very clear how exactly this relates to flows (and
> >>> > > this applies to the whole quoted section, I think), but at
> >>> > > least it can be used as a counter-argument.
> >>> >
> >>> > I agree the doc is not clear. As I see it, flows are not part of
> >>> > the configuration, at least not when we are talking about
> >>> > rte_flow.
> >>>
> >>> Agree.
> >>>
> >>> > > > Mlx5 assumes flows are allowed to be configured only after
> >>> > > > rte_eth_dev_start(). Before start / after stop - no flow is
> >>> > > > valid anymore.
> >>> > >
> >>> > > I believe that this discussion is not about the validity of
> >>> > > flows. Let the flows be invalid after calling
> >>> > > rte_eth_dev_stop(). This is OK: flows must be recreated, and
> >>> > > the bonding driver works this way. But why not *before start*?
> >>> >
> >>> > Think about it this way: by changing the configuration you may
> >>> > create invalid flows. For example, you can only change the
> >>> > number of queues after port stop, so if you create a flow with
> >>> > jump to queue 3 and then you remove queue 3, the flow that is
> >>> > cached is not valid anymore.
> >>> > This goes for other configuration that may affect the validity
> >>> > of a flow.
> >>>
> >>> This is a valid argument, of course. The thing is whether it is
> >>> the responsibility of the PMD to take care of those corner cases,
> >>> or whether this is up to the application developer. If we respect
> >>> the fact that calling stop invalidates all flows, then when you
> >>> do:
> >>>
> >>>  > port stop 0
> >>>  > flow create 0 ingress pattern ... / end actions queue 6 / end
> >>>  > port config rxq 3
> >>>  > port start 0
> >>>
> >>> it's clear that something is really wrong with the caller/user. I
> >>> would say that this is an application bug. I would expect that you
> >>> first reconfigure the port and only after that modify flows. It
> >>> seems quite logical and intuitive, doesn't it?
> >>>
> >>
> >> I agree the use case I described is bad programming, but the same
> >> logic can be applied to stopping the port: why should flows be
> >> deleted?

I just took into account what Matan has written. Certainly, it is not
necessary to delete flows on stop.

> >> And even more importantly, on some HW, some flows can only be
> >> inserted after start, since there is no real traffic gate;
> >> basically, the start means adding the flows. If a flow is in the
> >> HW, it is working; adding a gate would result in PPS degradation.

What do you mean by the term "gate" here? Some FPGA logic? Or
something else? Why would "adding a gate" result in PPS degradation?
Can you make those ideas a bit clearer?

> >>> Anyway, the PMD can possibly catch this rxq inconsistency, but I
> >>> can imagine that there are more complicated situations than just
> >>> changing the number of queues. Any idea for a more complex real
> >>> case?
> >>>
> >>
> >> Some of the resources may be created only after start, but I don't
> >> have a very good example now.
Please try to find some reasonable example, otherwise your
argumentation does not make much sense...

> >>
> >>> Martin Havlik, can you please check how ixgbe and i40e behave in
> >>> the case above with "rxq change after flow created"?
> >>>
> > Both PMDs do not raise any errors in the mentioned case. The
> > packets that should go to the non-existent queue are dropped at
> > some point in the processing (from what I could see and
> > understand).

Thanks for the clarification. It would be nice if other PMDs could be
tested for this behaviour. Can somebody help us with other NICs?

> >
> > I'm glad for the discussion that is taking place, but I feel we
> > have strayed from its original reason: the inability to enable
> > dedicated queues for bonding mode 8023AD on mlx5. We have concluded
> > that flow creation has to be done after port/device start, so
> > moving the function call that creates the flow to after device
> > start should solve the issue
> > (https://mails.dpdk.org/archives/dev/2021-June/212210.html). From
> > my testing, flow create after start is not a problem on Intel PMDs.
>
> It would be good to hear what the net/bonding maintainers think
> about it. I see no conclusion.
>
> If net/mlx5 behaviour is fixed, will it fix the initial problem?

I believe that if mlx5 allowed creating flows before the call to
rte_eth_dev_start(), then the bonding would work as expected.

>
> > I'm turning to you, Connor, as you have made the original question
> > on this. Is the patch I presented applicable?
> >
> > Martin
> >>> >
> >>> > > Does somebody know how other drivers behave in this
> >>> > > situation? (We know and can check for Intel; there it does
> >>> > > not seem to be an issue.)
> >>> > >
> >>> > > By the way, the mlx5 behaviour opens a (probably short) time
You may want to start the port with > >>> > > thousands of flows that apply just when the port starts (not > >>> > > after, that's late). This may introduce glitches in filtering > >>> > > and measuring of traffic (well, it is a =20 > >>> question how > >>> serious issue could it be...). =20 > >>> > > =20 > >>> > Agree but this is always true nothing is done is zero time and > >>> > even if it was the insertion is not in zero time, (assuming > >>> > that even if the flows are stored by the PMD until start they > >>> > still will not all be inserted in the same time) =20 > >>> > >>> Sorry, I probably do not understand well. Yes, creation of a flow > >>> would never > >>> be zero time, agree. But if I create flows before the port starts, > >>> than I do not > >>> have to care too much about how long the creation takes. Because > >>> the card is > >>> expected to be configured already before pushing the "start > >>> button". > >>> > >>> Maybe, you assume that the created flows would first be cached > >>> inside the > >>> PMD and when the port is finally started, it than transparently > >>> write the > >>> queues to the HW. But this is not what I was talking about, that's > >>> maybe even > >>> worse because you are hiding such behaviour from users... > >>> > >>> (I do not know the exact mlx5 implementation details, so my > >>> answer can miss something, sorry for that.) > >>> =20 > >> > >> You are correct this is HW/PMD implementation issue, in case of > >> Nvidia we must By "must" you probably mean "would have to"? Is that correct? > >> cache the flows and only insert it after start. > >> Since RTE flow is generic and HW is not generic sometimes the PMD > >> needs to > >> translate this difference and in this gray area there is a game > >> between best API best performance understanding what the user Yes, as a former FPGA dev, I can understand your concerns. RTE Flow is a difficult beast. Jan > >> really wants. 
This is why the doc should be very clear on what is > >> the expected. but this is also the reason why such document is > >> very hard to agree on. =20 >=20 > Do I understand correctly that net/mlx5 maintainers will follow > up? >=20 > >> > >> Ori =20 > >>> > =20 > >>> > > This matters for the bonding case as well, doesn't it?. It is > >>> > > not desirable to accidently omit a packet that was received > >>> > > by primary ingress logic instead of being redirected into the > >>> > > dedicated queue. > >>> > > > >>> > > Are there any chances that for mlx5 it would be possible to > >>> > > insert flow rules before calling rte_eth_dev_start? Anyway, > >>> > > the behaviour should be specified and documented in DPDK more > >>> > > precisely to avoid such uncertainty in the future. > >>> > > =20 > >>> > I agree the documentation should be fixed. =20 > >>> > >>> +1 =20 >=20 > Cc Thomas and Ferruh since ethdev documentation should be > clarified. >=20 > It looks like there is no consensus if the patch is a right > direction or wrong. For me it looks wrong taking all above > arguments in to account (mainly necessity to be able to > insert flows before pushing start button which enables the > traffic if HW supports it). >=20 > So, I'm applying first two patches and hold on this one.