From: Andrew Rybchenko <andrew.rybchenko@oktetlabs.ru>
To: Ori Kam <orika@nvidia.com>, Ivan Malov <Ivan.Malov@oktetlabs.ru>,
	Eli Britstein <elibr@nvidia.com>,
	Ilya Maximets <i.maximets@ovn.org>, "dev@dpdk.org" <dev@dpdk.org>
Cc: Smadar Fuks <smadarf@marvell.com>,
	Hyong Youb Kim <hyonkim@cisco.com>,
	Kishore Padmanabha <kishore.padmanabha@broadcom.com>,
	Ajit Khaparde <ajit.khaparde@broadcom.com>,
	Jerin Jacob <jerinj@marvell.com>, John Daley <johndale@cisco.com>,
	NBU-Contact-Thomas Monjalon <thomas@monjalon.net>,
	Ferruh Yigit <ferruh.yigit@intel.com>
Subject: Re: [dpdk-dev] [RFC PATCH] ethdev: clarify flow action PORT ID semantics
Date: Thu, 3 Jun 2021 12:55:03 +0300	[thread overview]
Message-ID: <365e2ab9-6ca2-961b-5e27-f2f46a2e32c6@oktetlabs.ru> (raw)
In-Reply-To: <BL1PR12MB5335400589726184FCC892B0D63C9@BL1PR12MB5335.namprd12.prod.outlook.com>
On 6/3/21 12:18 PM, Ori Kam wrote:
> Hi All,
> 
>> -----Original Message-----
>> From: Ivan Malov <Ivan.Malov@oktetlabs.ru>
>>
>> On 02/06/2021 14:21, Eli Britstein wrote:
>>>
>>> On 6/2/2021 1:50 PM, Andrew Rybchenko wrote:
>>>> External email: Use caution opening links or attachments
>>>>
>>>>
>>>> On 6/2/21 12:57 PM, Eli Britstein wrote:
>>>>> On 6/1/2021 5:53 PM, Andrew Rybchenko wrote:
>>>>>> External email: Use caution opening links or attachments
>>>>>>
>>>>>>
>>>>>> On 6/1/21 5:44 PM, Eli Britstein wrote:
>>>>>>> On 6/1/2021 5:35 PM, Andrew Rybchenko wrote:
>>>>>>>> External email: Use caution opening links or attachments
>>>>>>>>
>>>>>>>>
>>>>>>>> On 6/1/21 4:24 PM, Eli Britstein wrote:
>>>>>>>>> On 6/1/2021 3:10 PM, Ilya Maximets wrote:
>>>>>>>>>> External email: Use caution opening links or attachments
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> On 6/1/21 1:14 PM, Ivan Malov wrote:
>>>>>>>>>>> By its very name, action PORT_ID means that packets hit an
>>>>>>>>>>> ethdev with the given DPDK port ID. At least the current
>>>>>>>>>>> comments don't state the opposite.
>>>>>>>>>>> That said, since port representors had been adopted,
>>>>>>>>>>> applications like OvS have been misusing the action. They
>>>>>>>>>>> misread its purpose as sending packets to the opposite end of
>>>>>>>>>>> the "wire" plugged to the given ethdev, for example,
>>>>>>>>>>> redirecting packets to the VF itself rather than to its
>>>>>>>>>>> representor ethdev.
> 
> Sorry but OVS got it right, this is the idea to send packet to the VF not to the representor, 
> I think that our first discussion should be what is a representor,
> I know that there are a lot threads about it but it is steel unclear.  
Yes, really unclear. I'd like to highlight again that
the problem is not with representors only (as described
and discussed in the thread).
> From my understanding representor is a shadow of a VF
> This shadow has two functionalities:
> 1. data
> It should receive any packet that was sent from the VF and was not
> routed to any other destination. And vise versa any traffic sent on the representor.
> should arrive to the corresponding VF.
> What use case do you see for sending a packet to the representor?
> 
> 2. control
> allow to modify the VF from DPDK application.
> 
> Regarding the 1 point of the data, I don't see any sense if routing traffic to representor.
> While on point 2 control their maybe some cases that we want to configure the representor itself
> and not the VF for example changing mtu.
IMO if so there is a big inconsistency here with statistics
(just an example, which is simply to discuss).
On one hand packet/byte stats should say how much data is
received/sent by the DPDK application via the port (yes,
shadow, but still an ethdev port).
On the other hand you say that it is a shadow and it should
return VF stats.
>>>>>>>>>>> Another example: OvS relies on this action with the admin PF's
>>>>>>>>>>> ethdev port ID specified in it in order to send offloaded
>>>>>>>>>>> packets to the physical port.
>>>>>>>>>>>
>>>>>>>>>>> Since there might be applications which use this action in its
>>>>>>>>>>> valid sense, one can't just change the documentation to
>>>>>>>>>>> greenlight the opposite meaning.
>>>>>>>>>>> This patch adds an explicit bit to the action configuration
>>>>>>>>>>> which will let applications, depending on their needs,
>>>>>>>>>>> leverage the two meanings properly.
>>>>>>>>>>> Applications like OvS, as well as PMDs, will have to be
>>>>>>>>>>> corrected when the patch has been applied. But the improved
>>>>>>>>>>> clarity of the action is worth it.
>>>>>>>>>>>
>>>>>>>>>>> The proposed change is not the only option. One could avoid
>>>>>>>>>>> changes in OvS and PMDs if the new configuration field had the
>>>>>>>>>>> opposite meaning, with the action itself meaning delivery to
>>>>>>>>>>> the represented port and not to DPDK one.
>>>>>>>>>>> Alternatively, one could define a brand new action with the
>>>>>>>>>>> said behaviour.
>>>>>>>>> It doesn't make any sense to attach the VF itself to OVS, but
>>>>>>>>> only its representor.
>>>>>>>> OvS is not the only DPDK application.
>>>>>>> True. It is just the focus of this commit message is OVS.
>>>>>>>>> For the PF, when in switchdev mode, it is the "uplink
>>>>>>>>> representor", so it is also a representor.
>>>>>>>> Strictly speaking it is not a representor from DPDK point of
>>>>>>>> view. E.g. representors have corresponding flag set which is
>>>>>>>> definitely clear in the case of PF.
>>>>>>> This is the per-PMD responsibility. The API should not care.
>>>>>>>>> That said, OVS does not care of the type of the port. It doesn't
>>>>>>>>> matter if it's an "upstream" or not, or if it's a representor or
>>>>>>>>> not.
>>>>>>>> Yes, it is clear, but let's put OvS aside. Let's consider a DPDK
>>>>>>>> application which has a number of ethdev port. Some may belong to
>>>>>>>> single switch domain, some may be from different switch domains
>>>>>>>> (i.e. different NICs). Can I use PORT_ID action to redirect
>>>>>>>> ingress traffic to a specified ethdev port using PORT_ID action?
>>>>>>>> It looks like no, but IMHO it is the definition of the PORT_ID
>>>>>>>> action.
>>>>>>> Let's separate API from implementation. By API point of view, yes,
>>>>>>> the user may request it. Nothing wrong with it.
>>>>>>>
>>>>>>>   From implementation point of view - yes, it might fail, but not
>>>>>>> for sure, even if on different NICs. Maybe the HW of a certain
>>>>>>> vendor has the capability to do it?
>>>>>>>
>>>>>>> We can't know, so I think the API should allow it.
>>>>>> Hold on. What should it allow? It is two opposite meanings:
>>>>>>    1. Direct traffic to DPDK ethdev port specified using ID to be
>>>>>>       received and processed by the DPDK application.
>>>>>>    2. Direct traffic to an upstream port represented by the
>>>>>>       DPDK port.
>>>>>>
>>>>>> The patch tries to address the ambiguity, misuse it in OvS (from my
>>>>>> point of view in accordance with the action documentation),
>>>>>> mis-implementation in a number of PMDs (to work in OvS) and tries
>>>>>> to sort it out with an explanation why proposed direction is
>>>>>> chosen. I realize that it could be painful, but IMHO it is the best
>>>>>> option here. Yes, it is a point to discuss.
>>>>>>
>>>>>> To start with we should agree that that problem exists.
>>>>>> Second, we should agree on direction how to solve it.
>>>>> I agree. Suppose port 0 is the PF, and port 1 is a VF representor.
>>>>>
>>>>> IIUC, there are two options:
>>>>>
>>>>> 1. flow create 1 ingress transfer pattern eth / end action port_id
>>>>> id 0 upstream 1 / end
>>>>>
> 
> What is the meaning of upstream if I want to send traffic between VFs?
Which way? Which action? If you specify VF representor ID in
PORT_ID and say upstream (or egress as suggested by Ilya)
traffic will go to VF.
In fact, I like ingress/egress terminology for the action
suggested by Ilya in other mail in the thread.
>>>>> 2. flow create 1 ingress transfer pattern eth / end action port_id
>>>>> id 0 upstream 0 / end
>>>>>
>>>>> [1] is the same behavior as today.
>>>>>
>>>>> [2] is a new behavior, the packet received by port 0 as if it
>>>>> arrived from the wire.
>>>>>
>>>>> Then, let's have more:
>>>>>
>>>>> 3. flow create 0 ingress transfer pattern eth / end action port_id
>>>>> id 1 upstream 1 / end
>>>>>
>>>>> 4. flow create 0 ingress transfer pattern eth / end action port_id
>>>>> id 1 upstream 0 / end
>>>>>
>>>>> if we have [2] and [4], the packet going from the VF will hit [2],
>>>>> then hit [4] and then [2] again in an endless loop?
>>>> As I understand PORT_ID is a fate action. So, no more lookups are
>>>> done. If the packet is loop back from applications, loop is possible.
>>>
>>> I referred a HW loop, not SW. For example with JUMP action (also fate):
>>>
>>> flow create 0 group 0 ingress transfer pattern eth / end action jump
>>> group 1 / end
>>>
>>> flow create 0 group 1 ingress transfer pattern eth / end action jump
>>> group 0 / end
>>>
>>>>
>>>> In fact, it is a good question if "flow creare 0 ingress transfer" or
>>>> "flow create 1 ingress transfer" assume any implicit filtering. I
>>>> always thought that no.
> 
> This is a matter for discussion but currently we do have implicit filtering 
> on the source VF
Very good. Let's discuss and document the decision.
We've provided arguments why it should not be
implicit filtering in the case of transfer.
>>>> i.e. if we have two network ports rule like
>>>>    flow create 0 ingress transfer pattern eth / end \
>>>>         action port_id id 1 upstream 1 / end will match packets
>>>> incoming from any port into the switch (network port 0, network port
>>>> 1, VF or PF itself (???)).
>>>> The topic also requires explicit clarification.
>>> rte_flow is port based. It implicitly filters only packets for the
>>> provided port (0).
>>>
>>> Maybe need to clarify documentation and have a "no filtering" API if
>>> needed.
> 
> Maybe
I'd say definitely requires since different vendors understand
it in a different ways.
>>
>> We've come across the following bits in the current documentation with
>> respect to attribute "transfer", quote:
>>
>> "Instead of simply matching the properties of traffic as it would appear on a
>> given DPDK port ID, enabling this attribute transfers a flow rule to the lowest
>> possible level of any device endpoints found in the pattern.
>>
>> When supported, this effectively enables an application to reroute traffic not
>> necessarily intended for it (e.g. coming from or addressed to different
>> physical ports, VFs or applications) at the device level".
>>
>> (https://doc.dpdk.org/guides/prog_guide/rte_flow.html#attributes)
>>
>> Since action PORT_ID hardly makes sense without attribute "transfer"
>> (unless it doesn't point to the same ethdev as the one used to submit the
>> flow), this paragraph effectively states that in this particular case API
>> "flow_create" is not (necessarily) port-based.
>>
> I'm not sure I understand what you mean port based.
> 
>>>
>>>> PF itself is really a hard question because of "ingress"
>>>> since traffic from PF is a traffic from DPDK application and it is
>>>> egress, not ingress.
>>>
>>> Ingress means the direction. Hit on packets otherwise provided to the
>>> SW by rte_eth_rx_burst().
>>>
>>> Same goes for the PF. Packets by rte_eth_rx_burst are the ones
>>> arriving from the wire, so ingress is that direction and egress is from the
>> app.
>>>
>>>>
>>>> I think that port ID used to created flow rule should not apply any
>>>> filtering in the case of transfer since we have corresponding items
>>>> to do it explicitly. If we do it implicitly as well, we need some
>>>> priorities and a way to avoid implicit rules which makes things much
>>>> harder to understand and implement.
>>>
> 
> I agree with your point, I think the application should state exactly what it wants.
> 
> Best,
> Ori
>>> If "upstream 0" means what I thought it means (comments?) maybe a
>>> better way to do it is expose another port for that, so there will be 2 "PF"
>>> ports - one as the wire representor and the other one as the "PF" (or
>>> clearer naming...).
>>>
>>> This would be a vendor decision, and there would be no need to change
>>> PORT_ID API.
>>>
>>>>
>>>>> If this is your meaning, maybe what you are looking for is an action
>>>>> to change the in_port and continue processing?
>>>>>
>>>>> Please comment on the examples I gave or clarify the use case you
>>>>> are trying to do.
>>>>>
>>>>>
>>>>> Thanks,
>>>>>
>>>>> Eli
>>>>>
>>>>>>>>>> We had already very similar discussions regarding the
>>>>>>>>>> understanding of what the representor really is from the DPDK
>>>>>>>>>> API's point of view, and the last time, IIUC, it was concluded
>>>>>>>>>> by a tech. board that representor should be a "ghost of a VF",
>>>>>>>>>> i.e. DPDK APIs should apply configuration by default to VF and
>>>>>>>>>> not to the representor device:
>>>>>>>>>>
>>>>>>>>>>
>> https://patches.dpdk.org/project/dpdk/cover/20191029185051.3220
>>>>>>>>>> 3-1-thomas@monjalon.net/#104376
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> This wasn't enforced though, IIUC, for existing code and
>>>>>>>>>> semantics is still mixed.
>>>>>>>>> I am not sure how this is related.
>>>>>>>>>> I still think that configuration should be applied to VF, and
>>>>>>>>>> the same applies to rte_flow API.  IMHO, average application
>>>>>>>>>> should not care if device is a VF itself or its representor.
>>>>>>>>>> Everything should work exactly the same.
>>>>>>>>>> I think this matches with the original idea/design of the
>>>>>>>>>> switchdev functionality in the linux kernel and also matches
>>>>>>>>>> with how the average user thinks about representor devices.
>>>>>>>>> Right. This is the way representors work. It is fully aligned
>>>>>>>>> with configuration of OVS-kernel.
>>>>>>>>>> If some specific use-case requires to distinguish VF from the
>>>>>>>>>> representor, there should probably be a separate special
>>>>>>>>>> API/flag for that.
>>>>>>>>>>
>>>>>>>>>> Best regards, Ilya Maximets.
>>
>> --
>> Ivan M
next prev parent reply	other threads:[~2021-06-03  9:55 UTC|newest]
Thread overview: 40+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2021-06-01 11:14 Ivan Malov
2021-06-01 12:10 ` Ilya Maximets
2021-06-01 13:24   ` Eli Britstein
2021-06-01 14:35     ` Andrew Rybchenko
2021-06-01 14:44       ` Eli Britstein
2021-06-01 14:50         ` Ivan Malov
2021-06-01 14:53         ` Andrew Rybchenko
2021-06-02  9:57           ` Eli Britstein
2021-06-02 10:50             ` Andrew Rybchenko
2021-06-02 11:21               ` Eli Britstein
2021-06-02 11:57                 ` Andrew Rybchenko
2021-06-02 12:36                 ` Ivan Malov
2021-06-03  9:18                   ` Ori Kam
2021-06-03  9:55                     ` Andrew Rybchenko [this message]
2021-06-07  8:28                       ` Thomas Monjalon
2021-06-07  9:42                         ` Andrew Rybchenko
2021-06-07 12:08                           ` Ori Kam
2021-06-07 13:21                             ` Ilya Maximets
2021-06-07 16:07                               ` Thomas Monjalon
2021-06-08 16:13                                 ` Thomas Monjalon
2021-06-08 16:32                                   ` Andrew Rybchenko
2021-06-08 18:49                                     ` Thomas Monjalon
2021-06-09 14:31                                       ` Andrew Rybchenko
2021-06-01 14:49     ` Ivan Malov
2021-06-01 14:28   ` Ivan Malov
2021-06-02 12:46     ` Ilya Maximets
2021-06-02 16:26       ` Andrew Rybchenko
2021-06-02 17:35         ` Ilya Maximets
2021-06-02 19:35           ` Ivan Malov
2021-06-03  9:29             ` Ilya Maximets
2021-06-03 10:33               ` Andrew Rybchenko
2021-06-03 11:05                 ` Ilya Maximets
2021-06-03 11:29               ` Ivan Malov
2021-06-07 19:27                 ` Ilya Maximets
2021-06-07 20:39                   ` Ivan Malov
2021-06-25 13:04       ` Ferruh Yigit
2021-06-02 12:16   ` Thomas Monjalon
2021-06-02 12:53     ` Ilya Maximets
2021-06-02 13:10     ` Andrew Rybchenko
2021-09-03  7:46 ` [dpdk-dev] [PATCH v1] " Andrew Rybchenko
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox
  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):
  git send-email \
    --in-reply-to=365e2ab9-6ca2-961b-5e27-f2f46a2e32c6@oktetlabs.ru \
    --to=andrew.rybchenko@oktetlabs.ru \
    --cc=Ivan.Malov@oktetlabs.ru \
    --cc=ajit.khaparde@broadcom.com \
    --cc=dev@dpdk.org \
    --cc=elibr@nvidia.com \
    --cc=ferruh.yigit@intel.com \
    --cc=hyonkim@cisco.com \
    --cc=i.maximets@ovn.org \
    --cc=jerinj@marvell.com \
    --cc=johndale@cisco.com \
    --cc=kishore.padmanabha@broadcom.com \
    --cc=orika@nvidia.com \
    --cc=smadarf@marvell.com \
    --cc=thomas@monjalon.net \
    /path/to/YOUR_REPLY
  https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
  Be sure your reply has a Subject: header at the top and a blank line
  before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).