* [dpdk-dev] [RFC PATCH] ethdev: clarify flow action PORT ID semantics
@ 2021-06-01 11:14 Ivan Malov
2021-06-01 12:10 ` Ilya Maximets
2021-09-03 7:46 ` [dpdk-dev] [PATCH v1] " Andrew Rybchenko
0 siblings, 2 replies; 40+ messages in thread
From: Ivan Malov @ 2021-06-01 11:14 UTC (permalink / raw)
To: dev
Cc: Eli Britstein, Ilya Maximets, Smadar Fuks, Hyong Youb Kim,
Kishore Padmanabha, Ori Kam, Ajit Khaparde, Jerin Jacob,
John Daley, Thomas Monjalon, Ferruh Yigit, Andrew Rybchenko
By its very name, action PORT_ID means that packets hit an ethdev with the
given DPDK port ID. At least the current comments don't state the opposite.
That said, since port representors had been adopted, applications like OvS
have been misusing the action. They misread its purpose as sending packets
to the opposite end of the "wire" plugged to the given ethdev, for example,
redirecting packets to the VF itself rather than to its representor ethdev.
Another example: OvS relies on this action with the admin PF's ethdev port
ID specified in it in order to send offloaded packets to the physical port.
Since there might be applications which use this action in its valid sense,
one can't just change the documentation to greenlight the opposite meaning.
This patch adds an explicit bit to the action configuration which will let
applications, depending on their needs, leverage the two meanings properly.
Applications like OvS, as well as PMDs, will have to be corrected when the
patch has been applied. But the improved clarity of the action is worth it.
The proposed change is not the only option. One could avoid changes in OvS
and PMDs if the new configuration field had the opposite meaning, with the
action itself meaning delivery to the represented port and not to DPDK one.
Alternatively, one could define a brand new action with the said behaviour.
One may also consider clarifying item PORT_ID meaning in a separate change.
Signed-off-by: Ivan Malov <ivan.malov@oktetlabs.ru>
---
lib/ethdev/rte_flow.h | 13 +++++++++++--
1 file changed, 11 insertions(+), 2 deletions(-)
diff --git a/lib/ethdev/rte_flow.h b/lib/ethdev/rte_flow.h
index 961a5884f..f45937bd7 100644
--- a/lib/ethdev/rte_flow.h
+++ b/lib/ethdev/rte_flow.h
@@ -2635,13 +2635,22 @@ struct rte_flow_action_phy_port {
/**
* RTE_FLOW_ACTION_TYPE_PORT_ID
*
- * Directs matching traffic to a given DPDK port ID.
+ * Directs matching traffic to an ethdev with the given DPDK port ID or
+ * to the upstream port (the peer side of the wire) corresponding to it.
+ *
+ * It's assumed that it's the PMD (typically, its instance at the admin
+ * PF) which controls the binding between a (representor) ethdev and an
+ * upstream port. Typical bindings: VF rep. <=> VF, PF <=> network port.
+ * If the PMD instance is unaware of the binding between the ethdev and
+ * its upstream port (or can't control it), it should reject the action
+ * with the upstream bit specified and log an appropriate error message.
*
* @see RTE_FLOW_ITEM_TYPE_PORT_ID
*/
struct rte_flow_action_port_id {
uint32_t original:1; /**< Use original DPDK port ID if possible. */
- uint32_t reserved:31; /**< Reserved, must be zero. */
+ uint32_t upstream:1; /**< Use the upstream port for this one. */
+ uint32_t reserved:30; /**< Reserved, must be zero. */
uint32_t id; /**< DPDK port ID. */
};
--
2.20.1
^ permalink raw reply [flat|nested] 40+ messages in thread
* Re: [dpdk-dev] [RFC PATCH] ethdev: clarify flow action PORT ID semantics
2021-06-01 11:14 [dpdk-dev] [RFC PATCH] ethdev: clarify flow action PORT ID semantics Ivan Malov
@ 2021-06-01 12:10 ` Ilya Maximets
2021-06-01 13:24 ` Eli Britstein
` (2 more replies)
2021-09-03 7:46 ` [dpdk-dev] [PATCH v1] " Andrew Rybchenko
1 sibling, 3 replies; 40+ messages in thread
From: Ilya Maximets @ 2021-06-01 12:10 UTC (permalink / raw)
To: Ivan Malov, dev
Cc: Eli Britstein, Ilya Maximets, Smadar Fuks, Hyong Youb Kim,
Kishore Padmanabha, Ori Kam, Ajit Khaparde, Jerin Jacob,
John Daley, Thomas Monjalon, Ferruh Yigit, Andrew Rybchenko
On 6/1/21 1:14 PM, Ivan Malov wrote:
> By its very name, action PORT_ID means that packets hit an ethdev with the
> given DPDK port ID. At least the current comments don't state the opposite.
> That said, since port representors had been adopted, applications like OvS
> have been misusing the action. They misread its purpose as sending packets
> to the opposite end of the "wire" plugged to the given ethdev, for example,
> redirecting packets to the VF itself rather than to its representor ethdev.
> Another example: OvS relies on this action with the admin PF's ethdev port
> ID specified in it in order to send offloaded packets to the physical port.
>
> Since there might be applications which use this action in its valid sense,
> one can't just change the documentation to greenlight the opposite meaning.
> This patch adds an explicit bit to the action configuration which will let
> applications, depending on their needs, leverage the two meanings properly.
> Applications like OvS, as well as PMDs, will have to be corrected when the
> patch has been applied. But the improved clarity of the action is worth it.
>
> The proposed change is not the only option. One could avoid changes in OvS
> and PMDs if the new configuration field had the opposite meaning, with the
> action itself meaning delivery to the represented port and not to DPDK one.
> Alternatively, one could define a brand new action with the said behaviour.
We had already very similar discussions regarding the understanding of what
the representor really is from the DPDK API's point of view, and the last
time, IIUC, it was concluded by a tech. board that representor should be
a "ghost of a VF", i.e. DPDK APIs should apply configuration by default to
VF and not to the representor device:
https://patches.dpdk.org/project/dpdk/cover/20191029185051.32203-1-thomas@monjalon.net/#104376
This wasn't enforced though, IIUC, for existing code and semantics is still mixed.
I still think that configuration should be applied to VF, and the same applies
to rte_flow API. IMHO, average application should not care if device is
a VF itself or its representor. Everything should work exactly the same.
I think this matches with the original idea/design of the switchdev functionality
in the linux kernel and also matches with how the average user thinks about
representor devices.
If some specific use-case requires to distinguish VF from the representor,
there should probably be a separate special API/flag for that.
Best regards, Ilya Maximets.
^ permalink raw reply [flat|nested] 40+ messages in thread
* Re: [dpdk-dev] [RFC PATCH] ethdev: clarify flow action PORT ID semantics
2021-06-01 12:10 ` Ilya Maximets
@ 2021-06-01 13:24 ` Eli Britstein
2021-06-01 14:35 ` Andrew Rybchenko
2021-06-01 14:49 ` Ivan Malov
2021-06-01 14:28 ` Ivan Malov
2021-06-02 12:16 ` Thomas Monjalon
2 siblings, 2 replies; 40+ messages in thread
From: Eli Britstein @ 2021-06-01 13:24 UTC (permalink / raw)
To: Ilya Maximets, Ivan Malov, dev
Cc: Smadar Fuks, Hyong Youb Kim, Kishore Padmanabha, Ori Kam,
Ajit Khaparde, Jerin Jacob, John Daley, Thomas Monjalon,
Ferruh Yigit, Andrew Rybchenko
On 6/1/2021 3:10 PM, Ilya Maximets wrote:
> External email: Use caution opening links or attachments
>
>
> On 6/1/21 1:14 PM, Ivan Malov wrote:
>> By its very name, action PORT_ID means that packets hit an ethdev with the
>> given DPDK port ID. At least the current comments don't state the opposite.
>> That said, since port representors had been adopted, applications like OvS
>> have been misusing the action. They misread its purpose as sending packets
>> to the opposite end of the "wire" plugged to the given ethdev, for example,
>> redirecting packets to the VF itself rather than to its representor ethdev.
>> Another example: OvS relies on this action with the admin PF's ethdev port
>> ID specified in it in order to send offloaded packets to the physical port.
>>
>> Since there might be applications which use this action in its valid sense,
>> one can't just change the documentation to greenlight the opposite meaning.
>> This patch adds an explicit bit to the action configuration which will let
>> applications, depending on their needs, leverage the two meanings properly.
>> Applications like OvS, as well as PMDs, will have to be corrected when the
>> patch has been applied. But the improved clarity of the action is worth it.
>>
>> The proposed change is not the only option. One could avoid changes in OvS
>> and PMDs if the new configuration field had the opposite meaning, with the
>> action itself meaning delivery to the represented port and not to DPDK one.
>> Alternatively, one could define a brand new action with the said behaviour.
It doesn't make any sense to attach the VF itself to OVS, but only its
representor.
For the PF, when in switchdev mode, it is the "uplink representor", so
it is also a representor.
That said, OVS does not care of the type of the port. It doesn't matter
if it's an "upstream" or not, or if it's a representor or not.
> We had already very similar discussions regarding the understanding of what
> the representor really is from the DPDK API's point of view, and the last
> time, IIUC, it was concluded by a tech. board that representor should be
> a "ghost of a VF", i.e. DPDK APIs should apply configuration by default to
> VF and not to the representor device:
> https://patches.dpdk.org/project/dpdk/cover/20191029185051.32203-1-thomas@monjalon.net/#104376
> This wasn't enforced though, IIUC, for existing code and semantics is still mixed.
I am not sure how this is related.
>
> I still think that configuration should be applied to VF, and the same applies
> to rte_flow API. IMHO, average application should not care if device is
> a VF itself or its representor. Everything should work exactly the same.
> I think this matches with the original idea/design of the switchdev functionality
> in the linux kernel and also matches with how the average user thinks about
> representor devices.
Right. This is the way representors work. It is fully aligned with
configuration of OVS-kernel.
>
> If some specific use-case requires to distinguish VF from the representor,
> there should probably be a separate special API/flag for that.
>
> Best regards, Ilya Maximets.
^ permalink raw reply [flat|nested] 40+ messages in thread
* Re: [dpdk-dev] [RFC PATCH] ethdev: clarify flow action PORT ID semantics
2021-06-01 12:10 ` Ilya Maximets
2021-06-01 13:24 ` Eli Britstein
@ 2021-06-01 14:28 ` Ivan Malov
2021-06-02 12:46 ` Ilya Maximets
2021-06-02 12:16 ` Thomas Monjalon
2 siblings, 1 reply; 40+ messages in thread
From: Ivan Malov @ 2021-06-01 14:28 UTC (permalink / raw)
To: Ilya Maximets, dev
Cc: Eli Britstein, Smadar Fuks, Hyong Youb Kim, Kishore Padmanabha,
Ori Kam, Ajit Khaparde, Jerin Jacob, John Daley, Thomas Monjalon,
Ferruh Yigit, Andrew Rybchenko
Hi Ilya,
Thank you for reviewing the proposal at such short notice. I'm afraid
that prior discussions overlook the simple fact that the whole problem
is not limited to just VF representors. Action PORT_ID is also used with
respect to the admin PF's ethdev, which "represents itself" (and by no
means it represents the underlying physical/network port). In this case,
one cannot state that the application treats it as a physical port, just
like one states that the application perceives representors as VFs
themselves.
Given these facts, it would not be quite right to just align the
documentation with the de-facto action meaning assumed by OvS.
On 01/06/2021 15:10, Ilya Maximets wrote:
> On 6/1/21 1:14 PM, Ivan Malov wrote:
>> By its very name, action PORT_ID means that packets hit an ethdev with the
>> given DPDK port ID. At least the current comments don't state the opposite.
>> That said, since port representors had been adopted, applications like OvS
>> have been misusing the action. They misread its purpose as sending packets
>> to the opposite end of the "wire" plugged to the given ethdev, for example,
>> redirecting packets to the VF itself rather than to its representor ethdev.
>> Another example: OvS relies on this action with the admin PF's ethdev port
>> ID specified in it in order to send offloaded packets to the physical port.
>>
>> Since there might be applications which use this action in its valid sense,
>> one can't just change the documentation to greenlight the opposite meaning.
>> This patch adds an explicit bit to the action configuration which will let
>> applications, depending on their needs, leverage the two meanings properly.
>> Applications like OvS, as well as PMDs, will have to be corrected when the
>> patch has been applied. But the improved clarity of the action is worth it.
>>
>> The proposed change is not the only option. One could avoid changes in OvS
>> and PMDs if the new configuration field had the opposite meaning, with the
>> action itself meaning delivery to the represented port and not to DPDK one.
>> Alternatively, one could define a brand new action with the said behaviour.
>
> We had already very similar discussions regarding the understanding of what
> the representor really is from the DPDK API's point of view, and the last
> time, IIUC, it was concluded by a tech. board that representor should be
> a "ghost of a VF", i.e. DPDK APIs should apply configuration by default to
> VF and not to the representor device:
> https://patches.dpdk.org/project/dpdk/cover/20191029185051.32203-1-thomas@monjalon.net/#104376
> This wasn't enforced though, IIUC, for existing code and semantics is still mixed.
>
> I still think that configuration should be applied to VF, and the same applies
> to rte_flow API. IMHO, average application should not care if device is
> a VF itself or its representor. Everything should work exactly the same.
> I think this matches with the original idea/design of the switchdev functionality
> in the linux kernel and also matches with how the average user thinks about
> representor devices.
>
> If some specific use-case requires to distinguish VF from the representor,
> there should probably be a separate special API/flag for that.
>
> Best regards, Ilya Maximets.
>
--
Ivan M
^ permalink raw reply [flat|nested] 40+ messages in thread
* Re: [dpdk-dev] [RFC PATCH] ethdev: clarify flow action PORT ID semantics
2021-06-01 13:24 ` Eli Britstein
@ 2021-06-01 14:35 ` Andrew Rybchenko
2021-06-01 14:44 ` Eli Britstein
2021-06-01 14:49 ` Ivan Malov
1 sibling, 1 reply; 40+ messages in thread
From: Andrew Rybchenko @ 2021-06-01 14:35 UTC (permalink / raw)
To: Eli Britstein, Ilya Maximets, Ivan Malov, dev
Cc: Smadar Fuks, Hyong Youb Kim, Kishore Padmanabha, Ori Kam,
Ajit Khaparde, Jerin Jacob, John Daley, Thomas Monjalon,
Ferruh Yigit
On 6/1/21 4:24 PM, Eli Britstein wrote:
>
> On 6/1/2021 3:10 PM, Ilya Maximets wrote:
>> External email: Use caution opening links or attachments
>>
>>
>> On 6/1/21 1:14 PM, Ivan Malov wrote:
>>> By its very name, action PORT_ID means that packets hit an ethdev
>>> with the
>>> given DPDK port ID. At least the current comments don't state the
>>> opposite.
>>> That said, since port representors had been adopted, applications
>>> like OvS
>>> have been misusing the action. They misread its purpose as sending
>>> packets
>>> to the opposite end of the "wire" plugged to the given ethdev, for
>>> example,
>>> redirecting packets to the VF itself rather than to its representor
>>> ethdev.
>>> Another example: OvS relies on this action with the admin PF's ethdev
>>> port
>>> ID specified in it in order to send offloaded packets to the physical
>>> port.
>>>
>>> Since there might be applications which use this action in its valid
>>> sense,
>>> one can't just change the documentation to greenlight the opposite
>>> meaning.
>>> This patch adds an explicit bit to the action configuration which
>>> will let
>>> applications, depending on their needs, leverage the two meanings
>>> properly.
>>> Applications like OvS, as well as PMDs, will have to be corrected
>>> when the
>>> patch has been applied. But the improved clarity of the action is
>>> worth it.
>>>
>>> The proposed change is not the only option. One could avoid changes
>>> in OvS
>>> and PMDs if the new configuration field had the opposite meaning,
>>> with the
>>> action itself meaning delivery to the represented port and not to
>>> DPDK one.
>>> Alternatively, one could define a brand new action with the said
>>> behaviour.
>
> It doesn't make any sense to attach the VF itself to OVS, but only its
> representor.
OvS is not the only DPDK application.
> For the PF, when in switchdev mode, it is the "uplink representor", so
> it is also a representor.
Strictly speaking it is not a representor from DPDK point of
view. E.g. representors have corresponding flag set which is
definitely clear in the case of PF.
> That said, OVS does not care of the type of the port. It doesn't matter
> if it's an "upstream" or not, or if it's a representor or not.
Yes, it is clear, but let's put OvS aside. Let's consider a
DPDK application which has a number of ethdev port. Some may
belong to single switch domain, some may be from different
switch domains (i.e. different NICs). Can I use PORT_ID action
to redirect ingress traffic to a specified ethdev port using
PORT_ID action? It looks like no, but IMHO it is the definition
of the PORT_ID action.
>> We had already very similar discussions regarding the understanding of
>> what
>> the representor really is from the DPDK API's point of view, and the last
>> time, IIUC, it was concluded by a tech. board that representor should be
>> a "ghost of a VF", i.e. DPDK APIs should apply configuration by
>> default to
>> VF and not to the representor device:
>>
>> https://patches.dpdk.org/project/dpdk/cover/20191029185051.32203-1-thomas@monjalon.net/#104376
>>
>> This wasn't enforced though, IIUC, for existing code and semantics is
>> still mixed.
> I am not sure how this is related.
>>
>> I still think that configuration should be applied to VF, and the same
>> applies
>> to rte_flow API. IMHO, average application should not care if device is
>> a VF itself or its representor. Everything should work exactly the same.
>> I think this matches with the original idea/design of the switchdev
>> functionality
>> in the linux kernel and also matches with how the average user thinks
>> about
>> representor devices.
> Right. This is the way representors work. It is fully aligned with
> configuration of OVS-kernel.
>>
>> If some specific use-case requires to distinguish VF from the
>> representor,
>> there should probably be a separate special API/flag for that.
>>
>> Best regards, Ilya Maximets.
^ permalink raw reply [flat|nested] 40+ messages in thread
* Re: [dpdk-dev] [RFC PATCH] ethdev: clarify flow action PORT ID semantics
2021-06-01 14:35 ` Andrew Rybchenko
@ 2021-06-01 14:44 ` Eli Britstein
2021-06-01 14:50 ` Ivan Malov
2021-06-01 14:53 ` Andrew Rybchenko
0 siblings, 2 replies; 40+ messages in thread
From: Eli Britstein @ 2021-06-01 14:44 UTC (permalink / raw)
To: Andrew Rybchenko, Ilya Maximets, Ivan Malov, dev
Cc: Smadar Fuks, Hyong Youb Kim, Kishore Padmanabha, Ori Kam,
Ajit Khaparde, Jerin Jacob, John Daley, Thomas Monjalon,
Ferruh Yigit
On 6/1/2021 5:35 PM, Andrew Rybchenko wrote:
> External email: Use caution opening links or attachments
>
>
> On 6/1/21 4:24 PM, Eli Britstein wrote:
>> On 6/1/2021 3:10 PM, Ilya Maximets wrote:
>>> External email: Use caution opening links or attachments
>>>
>>>
>>> On 6/1/21 1:14 PM, Ivan Malov wrote:
>>>> By its very name, action PORT_ID means that packets hit an ethdev
>>>> with the
>>>> given DPDK port ID. At least the current comments don't state the
>>>> opposite.
>>>> That said, since port representors had been adopted, applications
>>>> like OvS
>>>> have been misusing the action. They misread its purpose as sending
>>>> packets
>>>> to the opposite end of the "wire" plugged to the given ethdev, for
>>>> example,
>>>> redirecting packets to the VF itself rather than to its representor
>>>> ethdev.
>>>> Another example: OvS relies on this action with the admin PF's ethdev
>>>> port
>>>> ID specified in it in order to send offloaded packets to the physical
>>>> port.
>>>>
>>>> Since there might be applications which use this action in its valid
>>>> sense,
>>>> one can't just change the documentation to greenlight the opposite
>>>> meaning.
>>>> This patch adds an explicit bit to the action configuration which
>>>> will let
>>>> applications, depending on their needs, leverage the two meanings
>>>> properly.
>>>> Applications like OvS, as well as PMDs, will have to be corrected
>>>> when the
>>>> patch has been applied. But the improved clarity of the action is
>>>> worth it.
>>>>
>>>> The proposed change is not the only option. One could avoid changes
>>>> in OvS
>>>> and PMDs if the new configuration field had the opposite meaning,
>>>> with the
>>>> action itself meaning delivery to the represented port and not to
>>>> DPDK one.
>>>> Alternatively, one could define a brand new action with the said
>>>> behaviour.
>> It doesn't make any sense to attach the VF itself to OVS, but only its
>> representor.
> OvS is not the only DPDK application.
True. It is just the focus of this commit message is OVS.
>
>> For the PF, when in switchdev mode, it is the "uplink representor", so
>> it is also a representor.
> Strictly speaking it is not a representor from DPDK point of
> view. E.g. representors have corresponding flag set which is
> definitely clear in the case of PF.
This is the per-PMD responsibility. The API should not care.
>
>> That said, OVS does not care of the type of the port. It doesn't matter
>> if it's an "upstream" or not, or if it's a representor or not.
> Yes, it is clear, but let's put OvS aside. Let's consider a
> DPDK application which has a number of ethdev port. Some may
> belong to single switch domain, some may be from different
> switch domains (i.e. different NICs). Can I use PORT_ID action
> to redirect ingress traffic to a specified ethdev port using
> PORT_ID action? It looks like no, but IMHO it is the definition
> of the PORT_ID action.
Let's separate API from implementation. By API point of view, yes, the
user may request it. Nothing wrong with it.
From implementation point of view - yes, it might fail, but not for
sure, even if on different NICs. Maybe the HW of a certain vendor has
the capability to do it?
We can't know, so I think the API should allow it.
>
>>> We had already very similar discussions regarding the understanding of
>>> what
>>> the representor really is from the DPDK API's point of view, and the last
>>> time, IIUC, it was concluded by a tech. board that representor should be
>>> a "ghost of a VF", i.e. DPDK APIs should apply configuration by
>>> default to
>>> VF and not to the representor device:
>>>
>>> https://patches.dpdk.org/project/dpdk/cover/20191029185051.32203-1-thomas@monjalon.net/#104376
>>>
>>> This wasn't enforced though, IIUC, for existing code and semantics is
>>> still mixed.
>> I am not sure how this is related.
>>> I still think that configuration should be applied to VF, and the same
>>> applies
>>> to rte_flow API. IMHO, average application should not care if device is
>>> a VF itself or its representor. Everything should work exactly the same.
>>> I think this matches with the original idea/design of the switchdev
>>> functionality
>>> in the linux kernel and also matches with how the average user thinks
>>> about
>>> representor devices.
>> Right. This is the way representors work. It is fully aligned with
>> configuration of OVS-kernel.
>>> If some specific use-case requires to distinguish VF from the
>>> representor,
>>> there should probably be a separate special API/flag for that.
>>>
>>> Best regards, Ilya Maximets.
^ permalink raw reply [flat|nested] 40+ messages in thread
* Re: [dpdk-dev] [RFC PATCH] ethdev: clarify flow action PORT ID semantics
2021-06-01 13:24 ` Eli Britstein
2021-06-01 14:35 ` Andrew Rybchenko
@ 2021-06-01 14:49 ` Ivan Malov
1 sibling, 0 replies; 40+ messages in thread
From: Ivan Malov @ 2021-06-01 14:49 UTC (permalink / raw)
To: Eli Britstein, Ilya Maximets, dev
Cc: Smadar Fuks, Hyong Youb Kim, Kishore Padmanabha, Ori Kam,
Ajit Khaparde, Jerin Jacob, John Daley, Thomas Monjalon,
Ferruh Yigit, Andrew Rybchenko
Hi Eli,
On 01/06/2021 16:24, Eli Britstein wrote:
>
> On 6/1/2021 3:10 PM, Ilya Maximets wrote:
>> External email: Use caution opening links or attachments
>>
>>
>> On 6/1/21 1:14 PM, Ivan Malov wrote:
>>> By its very name, action PORT_ID means that packets hit an ethdev
>>> with the
>>> given DPDK port ID. At least the current comments don't state the
>>> opposite.
>>> That said, since port representors had been adopted, applications
>>> like OvS
>>> have been misusing the action. They misread its purpose as sending
>>> packets
>>> to the opposite end of the "wire" plugged to the given ethdev, for
>>> example,
>>> redirecting packets to the VF itself rather than to its representor
>>> ethdev.
>>> Another example: OvS relies on this action with the admin PF's ethdev
>>> port
>>> ID specified in it in order to send offloaded packets to the physical
>>> port.
>>>
>>> Since there might be applications which use this action in its valid
>>> sense,
>>> one can't just change the documentation to greenlight the opposite
>>> meaning.
>>> This patch adds an explicit bit to the action configuration which
>>> will let
>>> applications, depending on their needs, leverage the two meanings
>>> properly.
>>> Applications like OvS, as well as PMDs, will have to be corrected
>>> when the
>>> patch has been applied. But the improved clarity of the action is
>>> worth it.
>>>
>>> The proposed change is not the only option. One could avoid changes
>>> in OvS
>>> and PMDs if the new configuration field had the opposite meaning,
>>> with the
>>> action itself meaning delivery to the represented port and not to
>>> DPDK one.
>>> Alternatively, one could define a brand new action with the said
>>> behaviour.
>
> It doesn't make any sense to attach the VF itself to OVS, but only its
> representor.
Sure. But that doesn't invalidate the idea of the patch.
>
> For the PF, when in switchdev mode, it is the "uplink representor", so
> it is also a representor.
>
No. According to the existing "port representors" documentation, the
admin PF port "represents itself", that is the PF, and by no means it
represents the underlying upstream port. And this makes really big
difference. One can indeed state that plugging VFs and not their
reprsentors to DPDK/OvS is useless, but the same statement is not
applicable to the admin's PF.
> That said, OVS does not care of the type of the port. It doesn't matter
> if it's an "upstream" or not, or if it's a representor or not.
>
From the high-level standpoint, indeed, the port type is a don't care
to OvS, but the truth is that DPDK offload path in OvS, being a
lower-level component, must respect all underlying DPDK primitives'
original meaning. Agreeing the top-level expectations (OvS) with the
lower-level means (DPDK flow library) *is* effectively the proper job of
app integration. And if for some reason the existing DPDK component
misreads the lower-level action real semantics, it cannot be justified
by high-level principles of OvS.
>
>> We had already very similar discussions regarding the understanding of
>> what
>> the representor really is from the DPDK API's point of view, and the last
>> time, IIUC, it was concluded by a tech. board that representor should be
>> a "ghost of a VF", i.e. DPDK APIs should apply configuration by
>> default to
>> VF and not to the representor device:
>>
>> https://patches.dpdk.org/project/dpdk/cover/20191029185051.32203-1-thomas@monjalon.net/#104376
>>
>> This wasn't enforced though, IIUC, for existing code and semantics is
>> still mixed.
> I am not sure how this is related.
>>
>> I still think that configuration should be applied to VF, and the same
>> applies
>> to rte_flow API. IMHO, average application should not care if device is
>> a VF itself or its representor. Everything should work exactly the same.
>> I think this matches with the original idea/design of the switchdev
>> functionality
>> in the linux kernel and also matches with how the average user thinks
>> about
>> representor devices.
> Right. This is the way representors work. It is fully aligned with
> configuration of OVS-kernel.
>>
>> If some specific use-case requires to distinguish VF from the
>> representor,
>> there should probably be a separate special API/flag for that.
>>
>> Best regards, Ilya Maximets.
--
Ivan M
^ permalink raw reply [flat|nested] 40+ messages in thread
* Re: [dpdk-dev] [RFC PATCH] ethdev: clarify flow action PORT ID semantics
2021-06-01 14:44 ` Eli Britstein
@ 2021-06-01 14:50 ` Ivan Malov
2021-06-01 14:53 ` Andrew Rybchenko
1 sibling, 0 replies; 40+ messages in thread
From: Ivan Malov @ 2021-06-01 14:50 UTC (permalink / raw)
To: Eli Britstein, Andrew Rybchenko, Ilya Maximets, dev
Cc: Smadar Fuks, Hyong Youb Kim, Kishore Padmanabha, Ori Kam,
Ajit Khaparde, Jerin Jacob, John Daley, Thomas Monjalon,
Ferruh Yigit
On 01/06/2021 17:44, Eli Britstein wrote:
>
> On 6/1/2021 5:35 PM, Andrew Rybchenko wrote:
>> External email: Use caution opening links or attachments
>>
>>
>> On 6/1/21 4:24 PM, Eli Britstein wrote:
>>> On 6/1/2021 3:10 PM, Ilya Maximets wrote:
>>>> External email: Use caution opening links or attachments
>>>>
>>>>
>>>> On 6/1/21 1:14 PM, Ivan Malov wrote:
>>>>> By its very name, action PORT_ID means that packets hit an ethdev
>>>>> with the
>>>>> given DPDK port ID. At least the current comments don't state the
>>>>> opposite.
>>>>> That said, since port representors had been adopted, applications
>>>>> like OvS
>>>>> have been misusing the action. They misread its purpose as sending
>>>>> packets
>>>>> to the opposite end of the "wire" plugged to the given ethdev, for
>>>>> example,
>>>>> redirecting packets to the VF itself rather than to its representor
>>>>> ethdev.
>>>>> Another example: OvS relies on this action with the admin PF's ethdev
>>>>> port
>>>>> ID specified in it in order to send offloaded packets to the physical
>>>>> port.
>>>>>
>>>>> Since there might be applications which use this action in its valid
>>>>> sense,
>>>>> one can't just change the documentation to greenlight the opposite
>>>>> meaning.
>>>>> This patch adds an explicit bit to the action configuration which
>>>>> will let
>>>>> applications, depending on their needs, leverage the two meanings
>>>>> properly.
>>>>> Applications like OvS, as well as PMDs, will have to be corrected
>>>>> when the
>>>>> patch has been applied. But the improved clarity of the action is
>>>>> worth it.
>>>>>
>>>>> The proposed change is not the only option. One could avoid changes
>>>>> in OvS
>>>>> and PMDs if the new configuration field had the opposite meaning,
>>>>> with the
>>>>> action itself meaning delivery to the represented port and not to
>>>>> DPDK one.
>>>>> Alternatively, one could define a brand new action with the said
>>>>> behaviour.
>>> It doesn't make any sense to attach the VF itself to OVS, but only its
>>> representor.
>> OvS is not the only DPDK application.
> True. It is just the focus of this commit message is OVS.
Not the focus, but rather the most pictorial example.
>>
>>> For the PF, when in switchdev mode, it is the "uplink representor", so
>>> it is also a representor.
>> Strictly speaking it is not a representor from DPDK point of
>> view. E.g. representors have corresponding flag set which is
>> definitely clear in the case of PF.
> This is the per-PMD responsibility. The API should not care.
>>
>>> That said, OVS does not care of the type of the port. It doesn't matter
>>> if it's an "upstream" or not, or if it's a representor or not.
>> Yes, it is clear, but let's put OvS aside. Let's consider a
>> DPDK application which has a number of ethdev port. Some may
>> belong to single switch domain, some may be from different
>> switch domains (i.e. different NICs). Can I use PORT_ID action
>> to redirect ingress traffic to a specified ethdev port using
>> PORT_ID action? It looks like no, but IMHO it is the definition
>> of the PORT_ID action.
>
> Let's separate API from implementation. By API point of view, yes, the
> user may request it. Nothing wrong with it.
>
> From implementation point of view - yes, it might fail, but not for
> sure, even if on different NICs. Maybe the HW of a certain vendor has
> the capability to do it?
>
> We can't know, so I think the API should allow it.
>
>>
>>>> We had already very similar discussions regarding the understanding of
>>>> what
>>>> the representor really is from the DPDK API's point of view, and the
>>>> last
>>>> time, IIUC, it was concluded by a tech. board that representor
>>>> should be
>>>> a "ghost of a VF", i.e. DPDK APIs should apply configuration by
>>>> default to
>>>> VF and not to the representor device:
>>>>
>>>> https://patches.dpdk.org/project/dpdk/cover/20191029185051.32203-1-thomas@monjalon.net/#104376
>>>>
>>>>
>>>> This wasn't enforced though, IIUC, for existing code and semantics is
>>>> still mixed.
>>> I am not sure how this is related.
>>>> I still think that configuration should be applied to VF, and the same
>>>> applies
>>>> to rte_flow API. IMHO, average application should not care if
>>>> device is
>>>> a VF itself or its representor. Everything should work exactly the
>>>> same.
>>>> I think this matches with the original idea/design of the switchdev
>>>> functionality
>>>> in the linux kernel and also matches with how the average user thinks
>>>> about
>>>> representor devices.
>>> Right. This is the way representors work. It is fully aligned with
>>> configuration of OVS-kernel.
>>>> If some specific use-case requires to distinguish VF from the
>>>> representor,
>>>> there should probably be a separate special API/flag for that.
>>>>
>>>> Best regards, Ilya Maximets.
--
Ivan M
^ permalink raw reply [flat|nested] 40+ messages in thread
* Re: [dpdk-dev] [RFC PATCH] ethdev: clarify flow action PORT ID semantics
2021-06-01 14:44 ` Eli Britstein
2021-06-01 14:50 ` Ivan Malov
@ 2021-06-01 14:53 ` Andrew Rybchenko
2021-06-02 9:57 ` Eli Britstein
1 sibling, 1 reply; 40+ messages in thread
From: Andrew Rybchenko @ 2021-06-01 14:53 UTC (permalink / raw)
To: Eli Britstein, Ilya Maximets, Ivan Malov, dev
Cc: Smadar Fuks, Hyong Youb Kim, Kishore Padmanabha, Ori Kam,
Ajit Khaparde, Jerin Jacob, John Daley, Thomas Monjalon,
Ferruh Yigit
On 6/1/21 5:44 PM, Eli Britstein wrote:
>
> On 6/1/2021 5:35 PM, Andrew Rybchenko wrote:
>> External email: Use caution opening links or attachments
>>
>>
>> On 6/1/21 4:24 PM, Eli Britstein wrote:
>>> On 6/1/2021 3:10 PM, Ilya Maximets wrote:
>>>> External email: Use caution opening links or attachments
>>>>
>>>>
>>>> On 6/1/21 1:14 PM, Ivan Malov wrote:
>>>>> By its very name, action PORT_ID means that packets hit an ethdev
>>>>> with the
>>>>> given DPDK port ID. At least the current comments don't state the
>>>>> opposite.
>>>>> That said, since port representors had been adopted, applications
>>>>> like OvS
>>>>> have been misusing the action. They misread its purpose as sending
>>>>> packets
>>>>> to the opposite end of the "wire" plugged to the given ethdev, for
>>>>> example,
>>>>> redirecting packets to the VF itself rather than to its representor
>>>>> ethdev.
>>>>> Another example: OvS relies on this action with the admin PF's ethdev
>>>>> port
>>>>> ID specified in it in order to send offloaded packets to the physical
>>>>> port.
>>>>>
>>>>> Since there might be applications which use this action in its valid
>>>>> sense,
>>>>> one can't just change the documentation to greenlight the opposite
>>>>> meaning.
>>>>> This patch adds an explicit bit to the action configuration which
>>>>> will let
>>>>> applications, depending on their needs, leverage the two meanings
>>>>> properly.
>>>>> Applications like OvS, as well as PMDs, will have to be corrected
>>>>> when the
>>>>> patch has been applied. But the improved clarity of the action is
>>>>> worth it.
>>>>>
>>>>> The proposed change is not the only option. One could avoid changes
>>>>> in OvS
>>>>> and PMDs if the new configuration field had the opposite meaning,
>>>>> with the
>>>>> action itself meaning delivery to the represented port and not to
>>>>> DPDK one.
>>>>> Alternatively, one could define a brand new action with the said
>>>>> behaviour.
>>> It doesn't make any sense to attach the VF itself to OVS, but only its
>>> representor.
>> OvS is not the only DPDK application.
> True. It is just the focus of this commit message is OVS.
>>
>>> For the PF, when in switchdev mode, it is the "uplink representor", so
>>> it is also a representor.
>> Strictly speaking it is not a representor from DPDK point of
>> view. E.g. representors have corresponding flag set which is
>> definitely clear in the case of PF.
> This is the per-PMD responsibility. The API should not care.
>>
>>> That said, OVS does not care of the type of the port. It doesn't matter
>>> if it's an "upstream" or not, or if it's a representor or not.
>> Yes, it is clear, but let's put OvS aside. Let's consider a
>> DPDK application which has a number of ethdev port. Some may
>> belong to single switch domain, some may be from different
>> switch domains (i.e. different NICs). Can I use PORT_ID action
>> to redirect ingress traffic to a specified ethdev port using
>> PORT_ID action? It looks like no, but IMHO it is the definition
>> of the PORT_ID action.
>
> Let's separate API from implementation. By API point of view, yes, the
> user may request it. Nothing wrong with it.
>
> From implementation point of view - yes, it might fail, but not for
> sure, even if on different NICs. Maybe the HW of a certain vendor has
> the capability to do it?
>
> We can't know, so I think the API should allow it.
Hold on. What should it allow? It is two opposite meanings:
1. Direct traffic to DPDK ethdev port specified using ID to be
received and processed by the DPDK application.
2. Direct traffic to an upstream port represented by the
DPDK port.
The patch tries to address the ambiguity, misuse it in OvS
(from my point of view in accordance with the action
documentation), mis-implementation in a number of PMDs
(to work in OvS) and tries to sort it out with an explanation
why proposed direction is chosen. I realize that it could be
painful, but IMHO it is the best option here. Yes, it is a
point to discuss.
To start with we should agree that that problem exists.
Second, we should agree on direction how to solve it.
>>
>>>> We had already very similar discussions regarding the understanding of
>>>> what
>>>> the representor really is from the DPDK API's point of view, and the
>>>> last
>>>> time, IIUC, it was concluded by a tech. board that representor
>>>> should be
>>>> a "ghost of a VF", i.e. DPDK APIs should apply configuration by
>>>> default to
>>>> VF and not to the representor device:
>>>>
>>>> https://patches.dpdk.org/project/dpdk/cover/20191029185051.32203-1-thomas@monjalon.net/#104376
>>>>
>>>>
>>>> This wasn't enforced though, IIUC, for existing code and semantics is
>>>> still mixed.
>>> I am not sure how this is related.
>>>> I still think that configuration should be applied to VF, and the same
>>>> applies
>>>> to rte_flow API. IMHO, average application should not care if
>>>> device is
>>>> a VF itself or its representor. Everything should work exactly the
>>>> same.
>>>> I think this matches with the original idea/design of the switchdev
>>>> functionality
>>>> in the linux kernel and also matches with how the average user thinks
>>>> about
>>>> representor devices.
>>> Right. This is the way representors work. It is fully aligned with
>>> configuration of OVS-kernel.
>>>> If some specific use-case requires to distinguish VF from the
>>>> representor,
>>>> there should probably be a separate special API/flag for that.
>>>>
>>>> Best regards, Ilya Maximets.
^ permalink raw reply [flat|nested] 40+ messages in thread
* Re: [dpdk-dev] [RFC PATCH] ethdev: clarify flow action PORT ID semantics
2021-06-01 14:53 ` Andrew Rybchenko
@ 2021-06-02 9:57 ` Eli Britstein
2021-06-02 10:50 ` Andrew Rybchenko
0 siblings, 1 reply; 40+ messages in thread
From: Eli Britstein @ 2021-06-02 9:57 UTC (permalink / raw)
To: Andrew Rybchenko, Ilya Maximets, Ivan Malov, dev
Cc: Smadar Fuks, Hyong Youb Kim, Kishore Padmanabha, Ori Kam,
Ajit Khaparde, Jerin Jacob, John Daley, Thomas Monjalon,
Ferruh Yigit
On 6/1/2021 5:53 PM, Andrew Rybchenko wrote:
> External email: Use caution opening links or attachments
>
>
> On 6/1/21 5:44 PM, Eli Britstein wrote:
>> On 6/1/2021 5:35 PM, Andrew Rybchenko wrote:
>>> External email: Use caution opening links or attachments
>>>
>>>
>>> On 6/1/21 4:24 PM, Eli Britstein wrote:
>>>> On 6/1/2021 3:10 PM, Ilya Maximets wrote:
>>>>> External email: Use caution opening links or attachments
>>>>>
>>>>>
>>>>> On 6/1/21 1:14 PM, Ivan Malov wrote:
>>>>>> By its very name, action PORT_ID means that packets hit an ethdev
>>>>>> with the
>>>>>> given DPDK port ID. At least the current comments don't state the
>>>>>> opposite.
>>>>>> That said, since port representors had been adopted, applications
>>>>>> like OvS
>>>>>> have been misusing the action. They misread its purpose as sending
>>>>>> packets
>>>>>> to the opposite end of the "wire" plugged to the given ethdev, for
>>>>>> example,
>>>>>> redirecting packets to the VF itself rather than to its representor
>>>>>> ethdev.
>>>>>> Another example: OvS relies on this action with the admin PF's ethdev
>>>>>> port
>>>>>> ID specified in it in order to send offloaded packets to the physical
>>>>>> port.
>>>>>>
>>>>>> Since there might be applications which use this action in its valid
>>>>>> sense,
>>>>>> one can't just change the documentation to greenlight the opposite
>>>>>> meaning.
>>>>>> This patch adds an explicit bit to the action configuration which
>>>>>> will let
>>>>>> applications, depending on their needs, leverage the two meanings
>>>>>> properly.
>>>>>> Applications like OvS, as well as PMDs, will have to be corrected
>>>>>> when the
>>>>>> patch has been applied. But the improved clarity of the action is
>>>>>> worth it.
>>>>>>
>>>>>> The proposed change is not the only option. One could avoid changes
>>>>>> in OvS
>>>>>> and PMDs if the new configuration field had the opposite meaning,
>>>>>> with the
>>>>>> action itself meaning delivery to the represented port and not to
>>>>>> DPDK one.
>>>>>> Alternatively, one could define a brand new action with the said
>>>>>> behaviour.
>>>> It doesn't make any sense to attach the VF itself to OVS, but only its
>>>> representor.
>>> OvS is not the only DPDK application.
>> True. It is just the focus of this commit message is OVS.
>>>> For the PF, when in switchdev mode, it is the "uplink representor", so
>>>> it is also a representor.
>>> Strictly speaking it is not a representor from DPDK point of
>>> view. E.g. representors have corresponding flag set which is
>>> definitely clear in the case of PF.
>> This is the per-PMD responsibility. The API should not care.
>>>> That said, OVS does not care of the type of the port. It doesn't matter
>>>> if it's an "upstream" or not, or if it's a representor or not.
>>> Yes, it is clear, but let's put OvS aside. Let's consider a
>>> DPDK application which has a number of ethdev port. Some may
>>> belong to single switch domain, some may be from different
>>> switch domains (i.e. different NICs). Can I use PORT_ID action
>>> to redirect ingress traffic to a specified ethdev port using
>>> PORT_ID action? It looks like no, but IMHO it is the definition
>>> of the PORT_ID action.
>> Let's separate API from implementation. By API point of view, yes, the
>> user may request it. Nothing wrong with it.
>>
>> From implementation point of view - yes, it might fail, but not for
>> sure, even if on different NICs. Maybe the HW of a certain vendor has
>> the capability to do it?
>>
>> We can't know, so I think the API should allow it.
> Hold on. What should it allow? It is two opposite meanings:
> 1. Direct traffic to DPDK ethdev port specified using ID to be
> received and processed by the DPDK application.
> 2. Direct traffic to an upstream port represented by the
> DPDK port.
>
> The patch tries to address the ambiguity, misuse it in OvS
> (from my point of view in accordance with the action
> documentation), mis-implementation in a number of PMDs
> (to work in OvS) and tries to sort it out with an explanation
> why proposed direction is chosen. I realize that it could be
> painful, but IMHO it is the best option here. Yes, it is a
> point to discuss.
>
> To start with we should agree that that problem exists.
> Second, we should agree on direction how to solve it.
I agree. Suppose port 0 is the PF, and port 1 is a VF representor.
IIUC, there are two options:
1. flow create 1 ingress transfer pattern eth / end action port_id id 0
upstream 1 / end
2. flow create 1 ingress transfer pattern eth / end action port_id id 0
upstream 0 / end
[1] is the same behavior as today.
[2] is a new behavior, the packet received by port 0 as if it arrived
from the wire.
Then, let's have more:
3. flow create 0 ingress transfer pattern eth / end action port_id id 1
upstream 1 / end
4. flow create 0 ingress transfer pattern eth / end action port_id id 1
upstream 0 / end
if we have [2] and [4], the packet going from the VF will hit [2], then
hit [4] and then [2] again in an endless loop?
If this is your meaning, maybe what you are looking for is an action to
change the in_port and continue processing?
Please comment on the examples I gave or clarify the use case you are
trying to do.
Thanks,
Eli
>
>>>>> We had already very similar discussions regarding the understanding of
>>>>> what
>>>>> the representor really is from the DPDK API's point of view, and the
>>>>> last
>>>>> time, IIUC, it was concluded by a tech. board that representor
>>>>> should be
>>>>> a "ghost of a VF", i.e. DPDK APIs should apply configuration by
>>>>> default to
>>>>> VF and not to the representor device:
>>>>>
>>>>> https://patches.dpdk.org/project/dpdk/cover/20191029185051.32203-1-thomas@monjalon.net/#104376
>>>>>
>>>>>
>>>>> This wasn't enforced though, IIUC, for existing code and semantics is
>>>>> still mixed.
>>>> I am not sure how this is related.
>>>>> I still think that configuration should be applied to VF, and the same
>>>>> applies
>>>>> to rte_flow API. IMHO, average application should not care if
>>>>> device is
>>>>> a VF itself or its representor. Everything should work exactly the
>>>>> same.
>>>>> I think this matches with the original idea/design of the switchdev
>>>>> functionality
>>>>> in the linux kernel and also matches with how the average user thinks
>>>>> about
>>>>> representor devices.
>>>> Right. This is the way representors work. It is fully aligned with
>>>> configuration of OVS-kernel.
>>>>> If some specific use-case requires to distinguish VF from the
>>>>> representor,
>>>>> there should probably be a separate special API/flag for that.
>>>>>
>>>>> Best regards, Ilya Maximets.
^ permalink raw reply [flat|nested] 40+ messages in thread
* Re: [dpdk-dev] [RFC PATCH] ethdev: clarify flow action PORT ID semantics
2021-06-02 9:57 ` Eli Britstein
@ 2021-06-02 10:50 ` Andrew Rybchenko
2021-06-02 11:21 ` Eli Britstein
0 siblings, 1 reply; 40+ messages in thread
From: Andrew Rybchenko @ 2021-06-02 10:50 UTC (permalink / raw)
To: Eli Britstein, Ilya Maximets, Ivan Malov, dev
Cc: Smadar Fuks, Hyong Youb Kim, Kishore Padmanabha, Ori Kam,
Ajit Khaparde, Jerin Jacob, John Daley, Thomas Monjalon,
Ferruh Yigit
On 6/2/21 12:57 PM, Eli Britstein wrote:
>
> On 6/1/2021 5:53 PM, Andrew Rybchenko wrote:
>> External email: Use caution opening links or attachments
>>
>>
>> On 6/1/21 5:44 PM, Eli Britstein wrote:
>>> On 6/1/2021 5:35 PM, Andrew Rybchenko wrote:
>>>> External email: Use caution opening links or attachments
>>>>
>>>>
>>>> On 6/1/21 4:24 PM, Eli Britstein wrote:
>>>>> On 6/1/2021 3:10 PM, Ilya Maximets wrote:
>>>>>> External email: Use caution opening links or attachments
>>>>>>
>>>>>>
>>>>>> On 6/1/21 1:14 PM, Ivan Malov wrote:
>>>>>>> By its very name, action PORT_ID means that packets hit an ethdev
>>>>>>> with the
>>>>>>> given DPDK port ID. At least the current comments don't state the
>>>>>>> opposite.
>>>>>>> That said, since port representors had been adopted, applications
>>>>>>> like OvS
>>>>>>> have been misusing the action. They misread its purpose as sending
>>>>>>> packets
>>>>>>> to the opposite end of the "wire" plugged to the given ethdev, for
>>>>>>> example,
>>>>>>> redirecting packets to the VF itself rather than to its representor
>>>>>>> ethdev.
>>>>>>> Another example: OvS relies on this action with the admin PF's
>>>>>>> ethdev
>>>>>>> port
>>>>>>> ID specified in it in order to send offloaded packets to the
>>>>>>> physical
>>>>>>> port.
>>>>>>>
>>>>>>> Since there might be applications which use this action in its valid
>>>>>>> sense,
>>>>>>> one can't just change the documentation to greenlight the opposite
>>>>>>> meaning.
>>>>>>> This patch adds an explicit bit to the action configuration which
>>>>>>> will let
>>>>>>> applications, depending on their needs, leverage the two meanings
>>>>>>> properly.
>>>>>>> Applications like OvS, as well as PMDs, will have to be corrected
>>>>>>> when the
>>>>>>> patch has been applied. But the improved clarity of the action is
>>>>>>> worth it.
>>>>>>>
>>>>>>> The proposed change is not the only option. One could avoid changes
>>>>>>> in OvS
>>>>>>> and PMDs if the new configuration field had the opposite meaning,
>>>>>>> with the
>>>>>>> action itself meaning delivery to the represented port and not to
>>>>>>> DPDK one.
>>>>>>> Alternatively, one could define a brand new action with the said
>>>>>>> behaviour.
>>>>> It doesn't make any sense to attach the VF itself to OVS, but only its
>>>>> representor.
>>>> OvS is not the only DPDK application.
>>> True. It is just the focus of this commit message is OVS.
>>>>> For the PF, when in switchdev mode, it is the "uplink representor", so
>>>>> it is also a representor.
>>>> Strictly speaking it is not a representor from DPDK point of
>>>> view. E.g. representors have corresponding flag set which is
>>>> definitely clear in the case of PF.
>>> This is the per-PMD responsibility. The API should not care.
>>>>> That said, OVS does not care of the type of the port. It doesn't
>>>>> matter
>>>>> if it's an "upstream" or not, or if it's a representor or not.
>>>> Yes, it is clear, but let's put OvS aside. Let's consider a
>>>> DPDK application which has a number of ethdev port. Some may
>>>> belong to single switch domain, some may be from different
>>>> switch domains (i.e. different NICs). Can I use PORT_ID action
>>>> to redirect ingress traffic to a specified ethdev port using
>>>> PORT_ID action? It looks like no, but IMHO it is the definition
>>>> of the PORT_ID action.
>>> Let's separate API from implementation. By API point of view, yes, the
>>> user may request it. Nothing wrong with it.
>>>
>>> From implementation point of view - yes, it might fail, but not for
>>> sure, even if on different NICs. Maybe the HW of a certain vendor has
>>> the capability to do it?
>>>
>>> We can't know, so I think the API should allow it.
>> Hold on. What should it allow? It is two opposite meanings:
>> 1. Direct traffic to DPDK ethdev port specified using ID to be
>> received and processed by the DPDK application.
>> 2. Direct traffic to an upstream port represented by the
>> DPDK port.
>>
>> The patch tries to address the ambiguity, misuse it in OvS
>> (from my point of view in accordance with the action
>> documentation), mis-implementation in a number of PMDs
>> (to work in OvS) and tries to sort it out with an explanation
>> why proposed direction is chosen. I realize that it could be
>> painful, but IMHO it is the best option here. Yes, it is a
>> point to discuss.
>>
>> To start with we should agree that that problem exists.
>> Second, we should agree on direction how to solve it.
>
> I agree. Suppose port 0 is the PF, and port 1 is a VF representor.
>
> IIUC, there are two options:
>
> 1. flow create 1 ingress transfer pattern eth / end action port_id id 0
> upstream 1 / end
>
> 2. flow create 1 ingress transfer pattern eth / end action port_id id 0
> upstream 0 / end
>
> [1] is the same behavior as today.
>
> [2] is a new behavior, the packet received by port 0 as if it arrived
> from the wire.
>
> Then, let's have more:
>
> 3. flow create 0 ingress transfer pattern eth / end action port_id id 1
> upstream 1 / end
>
> 4. flow create 0 ingress transfer pattern eth / end action port_id id 1
> upstream 0 / end
>
> if we have [2] and [4], the packet going from the VF will hit [2], then
> hit [4] and then [2] again in an endless loop?
As I understand PORT_ID is a fate action. So, no more lookups
are done. If the packet is loop back from applications, loop is
possible.
In fact, it is a good question if "flow creare 0 ingress
transfer" or "flow create 1 ingress transfer" assume any
implicit filtering. I always thought that no.
i.e. if we have two network ports rule like
flow create 0 ingress transfer pattern eth / end \
action port_id id 1 upstream 1 / end
will match packets incoming from any port into the switch
(network port 0, network port 1, VF or PF itself (???)).
The topic also requires explicit clarification.
PF itself is really a hard question because of "ingress"
since traffic from PF is a traffic from DPDK application and
it is egress, not ingress.
I think that port ID used to created flow rule should not
apply any filtering in the case of transfer since we have
corresponding items to do it explicitly. If we do it implicitly
as well, we need some priorities and a way to avoid implicit
rules which makes things much harder to understand and
implement.
> If this is your meaning, maybe what you are looking for is an action to
> change the in_port and continue processing?
>
> Please comment on the examples I gave or clarify the use case you are
> trying to do.
>
>
> Thanks,
>
> Eli
>
>>
>>>>>> We had already very similar discussions regarding the
>>>>>> understanding of
>>>>>> what
>>>>>> the representor really is from the DPDK API's point of view, and the
>>>>>> last
>>>>>> time, IIUC, it was concluded by a tech. board that representor
>>>>>> should be
>>>>>> a "ghost of a VF", i.e. DPDK APIs should apply configuration by
>>>>>> default to
>>>>>> VF and not to the representor device:
>>>>>>
>>>>>> https://patches.dpdk.org/project/dpdk/cover/20191029185051.32203-1-thomas@monjalon.net/#104376
>>>>>>
>>>>>>
>>>>>>
>>>>>> This wasn't enforced though, IIUC, for existing code and semantics is
>>>>>> still mixed.
>>>>> I am not sure how this is related.
>>>>>> I still think that configuration should be applied to VF, and the
>>>>>> same
>>>>>> applies
>>>>>> to rte_flow API. IMHO, average application should not care if
>>>>>> device is
>>>>>> a VF itself or its representor. Everything should work exactly the
>>>>>> same.
>>>>>> I think this matches with the original idea/design of the switchdev
>>>>>> functionality
>>>>>> in the linux kernel and also matches with how the average user thinks
>>>>>> about
>>>>>> representor devices.
>>>>> Right. This is the way representors work. It is fully aligned with
>>>>> configuration of OVS-kernel.
>>>>>> If some specific use-case requires to distinguish VF from the
>>>>>> representor,
>>>>>> there should probably be a separate special API/flag for that.
>>>>>>
>>>>>> Best regards, Ilya Maximets.
^ permalink raw reply [flat|nested] 40+ messages in thread
* Re: [dpdk-dev] [RFC PATCH] ethdev: clarify flow action PORT ID semantics
2021-06-02 10:50 ` Andrew Rybchenko
@ 2021-06-02 11:21 ` Eli Britstein
2021-06-02 11:57 ` Andrew Rybchenko
2021-06-02 12:36 ` Ivan Malov
0 siblings, 2 replies; 40+ messages in thread
From: Eli Britstein @ 2021-06-02 11:21 UTC (permalink / raw)
To: Andrew Rybchenko, Ilya Maximets, Ivan Malov, dev
Cc: Smadar Fuks, Hyong Youb Kim, Kishore Padmanabha, Ori Kam,
Ajit Khaparde, Jerin Jacob, John Daley, Thomas Monjalon,
Ferruh Yigit
On 6/2/2021 1:50 PM, Andrew Rybchenko wrote:
> External email: Use caution opening links or attachments
>
>
> On 6/2/21 12:57 PM, Eli Britstein wrote:
>> On 6/1/2021 5:53 PM, Andrew Rybchenko wrote:
>>> External email: Use caution opening links or attachments
>>>
>>>
>>> On 6/1/21 5:44 PM, Eli Britstein wrote:
>>>> On 6/1/2021 5:35 PM, Andrew Rybchenko wrote:
>>>>> External email: Use caution opening links or attachments
>>>>>
>>>>>
>>>>> On 6/1/21 4:24 PM, Eli Britstein wrote:
>>>>>> On 6/1/2021 3:10 PM, Ilya Maximets wrote:
>>>>>>> External email: Use caution opening links or attachments
>>>>>>>
>>>>>>>
>>>>>>> On 6/1/21 1:14 PM, Ivan Malov wrote:
>>>>>>>> By its very name, action PORT_ID means that packets hit an ethdev
>>>>>>>> with the
>>>>>>>> given DPDK port ID. At least the current comments don't state the
>>>>>>>> opposite.
>>>>>>>> That said, since port representors had been adopted, applications
>>>>>>>> like OvS
>>>>>>>> have been misusing the action. They misread its purpose as sending
>>>>>>>> packets
>>>>>>>> to the opposite end of the "wire" plugged to the given ethdev, for
>>>>>>>> example,
>>>>>>>> redirecting packets to the VF itself rather than to its representor
>>>>>>>> ethdev.
>>>>>>>> Another example: OvS relies on this action with the admin PF's
>>>>>>>> ethdev
>>>>>>>> port
>>>>>>>> ID specified in it in order to send offloaded packets to the
>>>>>>>> physical
>>>>>>>> port.
>>>>>>>>
>>>>>>>> Since there might be applications which use this action in its valid
>>>>>>>> sense,
>>>>>>>> one can't just change the documentation to greenlight the opposite
>>>>>>>> meaning.
>>>>>>>> This patch adds an explicit bit to the action configuration which
>>>>>>>> will let
>>>>>>>> applications, depending on their needs, leverage the two meanings
>>>>>>>> properly.
>>>>>>>> Applications like OvS, as well as PMDs, will have to be corrected
>>>>>>>> when the
>>>>>>>> patch has been applied. But the improved clarity of the action is
>>>>>>>> worth it.
>>>>>>>>
>>>>>>>> The proposed change is not the only option. One could avoid changes
>>>>>>>> in OvS
>>>>>>>> and PMDs if the new configuration field had the opposite meaning,
>>>>>>>> with the
>>>>>>>> action itself meaning delivery to the represented port and not to
>>>>>>>> DPDK one.
>>>>>>>> Alternatively, one could define a brand new action with the said
>>>>>>>> behaviour.
>>>>>> It doesn't make any sense to attach the VF itself to OVS, but only its
>>>>>> representor.
>>>>> OvS is not the only DPDK application.
>>>> True. It is just the focus of this commit message is OVS.
>>>>>> For the PF, when in switchdev mode, it is the "uplink representor", so
>>>>>> it is also a representor.
>>>>> Strictly speaking it is not a representor from DPDK point of
>>>>> view. E.g. representors have corresponding flag set which is
>>>>> definitely clear in the case of PF.
>>>> This is the per-PMD responsibility. The API should not care.
>>>>>> That said, OVS does not care of the type of the port. It doesn't
>>>>>> matter
>>>>>> if it's an "upstream" or not, or if it's a representor or not.
>>>>> Yes, it is clear, but let's put OvS aside. Let's consider a
>>>>> DPDK application which has a number of ethdev port. Some may
>>>>> belong to single switch domain, some may be from different
>>>>> switch domains (i.e. different NICs). Can I use PORT_ID action
>>>>> to redirect ingress traffic to a specified ethdev port using
>>>>> PORT_ID action? It looks like no, but IMHO it is the definition
>>>>> of the PORT_ID action.
>>>> Let's separate API from implementation. By API point of view, yes, the
>>>> user may request it. Nothing wrong with it.
>>>>
>>>> From implementation point of view - yes, it might fail, but not for
>>>> sure, even if on different NICs. Maybe the HW of a certain vendor has
>>>> the capability to do it?
>>>>
>>>> We can't know, so I think the API should allow it.
>>> Hold on. What should it allow? It is two opposite meanings:
>>> 1. Direct traffic to DPDK ethdev port specified using ID to be
>>> received and processed by the DPDK application.
>>> 2. Direct traffic to an upstream port represented by the
>>> DPDK port.
>>>
>>> The patch tries to address the ambiguity, misuse it in OvS
>>> (from my point of view in accordance with the action
>>> documentation), mis-implementation in a number of PMDs
>>> (to work in OvS) and tries to sort it out with an explanation
>>> why proposed direction is chosen. I realize that it could be
>>> painful, but IMHO it is the best option here. Yes, it is a
>>> point to discuss.
>>>
>>> To start with we should agree that that problem exists.
>>> Second, we should agree on direction how to solve it.
>> I agree. Suppose port 0 is the PF, and port 1 is a VF representor.
>>
>> IIUC, there are two options:
>>
>> 1. flow create 1 ingress transfer pattern eth / end action port_id id 0
>> upstream 1 / end
>>
>> 2. flow create 1 ingress transfer pattern eth / end action port_id id 0
>> upstream 0 / end
>>
>> [1] is the same behavior as today.
>>
>> [2] is a new behavior, the packet received by port 0 as if it arrived
>> from the wire.
>>
>> Then, let's have more:
>>
>> 3. flow create 0 ingress transfer pattern eth / end action port_id id 1
>> upstream 1 / end
>>
>> 4. flow create 0 ingress transfer pattern eth / end action port_id id 1
>> upstream 0 / end
>>
>> if we have [2] and [4], the packet going from the VF will hit [2], then
>> hit [4] and then [2] again in an endless loop?
> As I understand PORT_ID is a fate action. So, no more lookups
> are done. If the packet is loop back from applications, loop is
> possible.
I referred a HW loop, not SW. For example with JUMP action (also fate):
flow create 0 group 0 ingress transfer pattern eth / end action jump
group 1 / end
flow create 0 group 1 ingress transfer pattern eth / end action jump
group 0 / end
>
> In fact, it is a good question if "flow creare 0 ingress
> transfer" or "flow create 1 ingress transfer" assume any
> implicit filtering. I always thought that no.
> i.e. if we have two network ports rule like
> flow create 0 ingress transfer pattern eth / end \
> action port_id id 1 upstream 1 / end
> will match packets incoming from any port into the switch
> (network port 0, network port 1, VF or PF itself (???)).
> The topic also requires explicit clarification.
rte_flow is port based. It implicitly filters only packets for the
provided port (0).
Maybe need to clarify documentation and have a "no filtering" API if needed.
> PF itself is really a hard question because of "ingress"
> since traffic from PF is a traffic from DPDK application and
> it is egress, not ingress.
Ingress means the direction. Hit on packets otherwise provided to the SW
by rte_eth_rx_burst().
Same goes for the PF. Packets by rte_eth_rx_burst are the ones arriving
from the wire, so ingress is that direction and egress is from the app.
>
> I think that port ID used to created flow rule should not
> apply any filtering in the case of transfer since we have
> corresponding items to do it explicitly. If we do it implicitly
> as well, we need some priorities and a way to avoid implicit
> rules which makes things much harder to understand and
> implement.
If "upstream 0" means what I thought it means (comments?) maybe a better
way to do it is expose another port for that, so there will be 2 "PF"
ports - one as the wire representor and the other one as the "PF" (or
clearer naming...).
This would be a vendor decision, and there would be no need to change
PORT_ID API.
>
>> If this is your meaning, maybe what you are looking for is an action to
>> change the in_port and continue processing?
>>
>> Please comment on the examples I gave or clarify the use case you are
>> trying to do.
>>
>>
>> Thanks,
>>
>> Eli
>>
>>>>>>> We had already very similar discussions regarding the
>>>>>>> understanding of
>>>>>>> what
>>>>>>> the representor really is from the DPDK API's point of view, and the
>>>>>>> last
>>>>>>> time, IIUC, it was concluded by a tech. board that representor
>>>>>>> should be
>>>>>>> a "ghost of a VF", i.e. DPDK APIs should apply configuration by
>>>>>>> default to
>>>>>>> VF and not to the representor device:
>>>>>>>
>>>>>>> https://patches.dpdk.org/project/dpdk/cover/20191029185051.32203-1-thomas@monjalon.net/#104376
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> This wasn't enforced though, IIUC, for existing code and semantics is
>>>>>>> still mixed.
>>>>>> I am not sure how this is related.
>>>>>>> I still think that configuration should be applied to VF, and the
>>>>>>> same
>>>>>>> applies
>>>>>>> to rte_flow API. IMHO, average application should not care if
>>>>>>> device is
>>>>>>> a VF itself or its representor. Everything should work exactly the
>>>>>>> same.
>>>>>>> I think this matches with the original idea/design of the switchdev
>>>>>>> functionality
>>>>>>> in the linux kernel and also matches with how the average user thinks
>>>>>>> about
>>>>>>> representor devices.
>>>>>> Right. This is the way representors work. It is fully aligned with
>>>>>> configuration of OVS-kernel.
>>>>>>> If some specific use-case requires to distinguish VF from the
>>>>>>> representor,
>>>>>>> there should probably be a separate special API/flag for that.
>>>>>>>
>>>>>>> Best regards, Ilya Maximets.
^ permalink raw reply [flat|nested] 40+ messages in thread
* Re: [dpdk-dev] [RFC PATCH] ethdev: clarify flow action PORT ID semantics
2021-06-02 11:21 ` Eli Britstein
@ 2021-06-02 11:57 ` Andrew Rybchenko
2021-06-02 12:36 ` Ivan Malov
1 sibling, 0 replies; 40+ messages in thread
From: Andrew Rybchenko @ 2021-06-02 11:57 UTC (permalink / raw)
To: Eli Britstein, Ilya Maximets, Ivan Malov, dev
Cc: Smadar Fuks, Hyong Youb Kim, Kishore Padmanabha, Ori Kam,
Ajit Khaparde, Jerin Jacob, John Daley, Thomas Monjalon,
Ferruh Yigit
On 6/2/21 2:21 PM, Eli Britstein wrote:
>
> On 6/2/2021 1:50 PM, Andrew Rybchenko wrote:
>> External email: Use caution opening links or attachments
>>
>>
>> On 6/2/21 12:57 PM, Eli Britstein wrote:
>>> On 6/1/2021 5:53 PM, Andrew Rybchenko wrote:
>>>> External email: Use caution opening links or attachments
>>>>
>>>>
>>>> On 6/1/21 5:44 PM, Eli Britstein wrote:
>>>>> On 6/1/2021 5:35 PM, Andrew Rybchenko wrote:
>>>>>> External email: Use caution opening links or attachments
>>>>>>
>>>>>>
>>>>>> On 6/1/21 4:24 PM, Eli Britstein wrote:
>>>>>>> On 6/1/2021 3:10 PM, Ilya Maximets wrote:
>>>>>>>> External email: Use caution opening links or attachments
>>>>>>>>
>>>>>>>>
>>>>>>>> On 6/1/21 1:14 PM, Ivan Malov wrote:
>>>>>>>>> By its very name, action PORT_ID means that packets hit an ethdev
>>>>>>>>> with the
>>>>>>>>> given DPDK port ID. At least the current comments don't state the
>>>>>>>>> opposite.
>>>>>>>>> That said, since port representors had been adopted, applications
>>>>>>>>> like OvS
>>>>>>>>> have been misusing the action. They misread its purpose as sending
>>>>>>>>> packets
>>>>>>>>> to the opposite end of the "wire" plugged to the given ethdev, for
>>>>>>>>> example,
>>>>>>>>> redirecting packets to the VF itself rather than to its
>>>>>>>>> representor
>>>>>>>>> ethdev.
>>>>>>>>> Another example: OvS relies on this action with the admin PF's
>>>>>>>>> ethdev
>>>>>>>>> port
>>>>>>>>> ID specified in it in order to send offloaded packets to the
>>>>>>>>> physical
>>>>>>>>> port.
>>>>>>>>>
>>>>>>>>> Since there might be applications which use this action in its
>>>>>>>>> valid
>>>>>>>>> sense,
>>>>>>>>> one can't just change the documentation to greenlight the opposite
>>>>>>>>> meaning.
>>>>>>>>> This patch adds an explicit bit to the action configuration which
>>>>>>>>> will let
>>>>>>>>> applications, depending on their needs, leverage the two meanings
>>>>>>>>> properly.
>>>>>>>>> Applications like OvS, as well as PMDs, will have to be corrected
>>>>>>>>> when the
>>>>>>>>> patch has been applied. But the improved clarity of the action is
>>>>>>>>> worth it.
>>>>>>>>>
>>>>>>>>> The proposed change is not the only option. One could avoid
>>>>>>>>> changes
>>>>>>>>> in OvS
>>>>>>>>> and PMDs if the new configuration field had the opposite meaning,
>>>>>>>>> with the
>>>>>>>>> action itself meaning delivery to the represented port and not to
>>>>>>>>> DPDK one.
>>>>>>>>> Alternatively, one could define a brand new action with the said
>>>>>>>>> behaviour.
>>>>>>> It doesn't make any sense to attach the VF itself to OVS, but
>>>>>>> only its
>>>>>>> representor.
>>>>>> OvS is not the only DPDK application.
>>>>> True. It is just the focus of this commit message is OVS.
>>>>>>> For the PF, when in switchdev mode, it is the "uplink
>>>>>>> representor", so
>>>>>>> it is also a representor.
>>>>>> Strictly speaking it is not a representor from DPDK point of
>>>>>> view. E.g. representors have corresponding flag set which is
>>>>>> definitely clear in the case of PF.
>>>>> This is the per-PMD responsibility. The API should not care.
>>>>>>> That said, OVS does not care of the type of the port. It doesn't
>>>>>>> matter
>>>>>>> if it's an "upstream" or not, or if it's a representor or not.
>>>>>> Yes, it is clear, but let's put OvS aside. Let's consider a
>>>>>> DPDK application which has a number of ethdev port. Some may
>>>>>> belong to single switch domain, some may be from different
>>>>>> switch domains (i.e. different NICs). Can I use PORT_ID action
>>>>>> to redirect ingress traffic to a specified ethdev port using
>>>>>> PORT_ID action? It looks like no, but IMHO it is the definition
>>>>>> of the PORT_ID action.
>>>>> Let's separate API from implementation. By API point of view, yes, the
>>>>> user may request it. Nothing wrong with it.
>>>>>
>>>>> From implementation point of view - yes, it might fail, but not for
>>>>> sure, even if on different NICs. Maybe the HW of a certain vendor has
>>>>> the capability to do it?
>>>>>
>>>>> We can't know, so I think the API should allow it.
>>>> Hold on. What should it allow? It is two opposite meanings:
>>>> 1. Direct traffic to DPDK ethdev port specified using ID to be
>>>> received and processed by the DPDK application.
>>>> 2. Direct traffic to an upstream port represented by the
>>>> DPDK port.
>>>>
>>>> The patch tries to address the ambiguity, misuse it in OvS
>>>> (from my point of view in accordance with the action
>>>> documentation), mis-implementation in a number of PMDs
>>>> (to work in OvS) and tries to sort it out with an explanation
>>>> why proposed direction is chosen. I realize that it could be
>>>> painful, but IMHO it is the best option here. Yes, it is a
>>>> point to discuss.
>>>>
>>>> To start with we should agree that that problem exists.
>>>> Second, we should agree on direction how to solve it.
>>> I agree. Suppose port 0 is the PF, and port 1 is a VF representor.
>>>
>>> IIUC, there are two options:
>>>
>>> 1. flow create 1 ingress transfer pattern eth / end action port_id id 0
>>> upstream 1 / end
>>>
>>> 2. flow create 1 ingress transfer pattern eth / end action port_id id 0
>>> upstream 0 / end
>>>
>>> [1] is the same behavior as today.
>>>
>>> [2] is a new behavior, the packet received by port 0 as if it arrived
>>> from the wire.
>>>
>>> Then, let's have more:
>>>
>>> 3. flow create 0 ingress transfer pattern eth / end action port_id id 1
>>> upstream 1 / end
>>>
>>> 4. flow create 0 ingress transfer pattern eth / end action port_id id 1
>>> upstream 0 / end
>>>
>>> if we have [2] and [4], the packet going from the VF will hit [2], then
>>> hit [4] and then [2] again in an endless loop?
>> As I understand PORT_ID is a fate action. So, no more lookups
>> are done. If the packet is loop back from applications, loop is
>> possible.
>
> I referred a HW loop, not SW. For example with JUMP action (also fate):
>
> flow create 0 group 0 ingress transfer pattern eth / end action jump
> group 1 / end
>
> flow create 0 group 1 ingress transfer pattern eth / end action jump
> group 0 / end
IMHO jump is an internal fate action. Fate action for a
corresponding group. But PORT_ID is a real fate action.
So, no more lookups are done at all. So, no HW loop.
>>
>> In fact, it is a good question if "flow creare 0 ingress
>> transfer" or "flow create 1 ingress transfer" assume any
>> implicit filtering. I always thought that no.
>> i.e. if we have two network ports rule like
>> flow create 0 ingress transfer pattern eth / end \
>> action port_id id 1 upstream 1 / end
>> will match packets incoming from any port into the switch
>> (network port 0, network port 1, VF or PF itself (???)).
>> The topic also requires explicit clarification.
> rte_flow is port based. It implicitly filters only packets for the
> provided port (0).
As I've written below I disagree. If I'm missing some
documentation about it, please, help to find it.
Otherwise, it is open question and a point to discuss.
> Maybe need to clarify documentation and have a "no filtering" API if
> needed.
>
>> PF itself is really a hard question because of "ingress"
>> since traffic from PF is a traffic from DPDK application and
>> it is egress, not ingress.
>
> Ingress means the direction. Hit on packets otherwise provided to the SW
> by rte_eth_rx_burst().
>
> Same goes for the PF. Packets by rte_eth_rx_burst are the ones arriving
> from the wire, so ingress is that direction and egress is from the app.
Yes, of course. But I'm talking about no implicit filtering
case. If so, ideally it should not match packets arriving
into a transfer switch from PF (sent using rte_eth_tx_burst()).
I.e. here I'm talking about implicit filtering specified using
ingress or egress direction.
>>
>> I think that port ID used to created flow rule should not
>> apply any filtering in the case of transfer since we have
>> corresponding items to do it explicitly. If we do it implicitly
>> as well, we need some priorities and a way to avoid implicit
>> rules which makes things much harder to understand and
>> implement.
>
> If "upstream 0" means what I thought it means (comments?) maybe a better
> way to do it is expose another port for that, so there will be 2 "PF"
> ports - one as the wire representor and the other one as the "PF" (or
> clearer naming...).
Yes, "upstream 0" means that the packet should be delivered to
DPDK port specified by ID to be received using
rte_eth_rx_burst().
Solution with network port representors is possible, but IMHO
it just adds complexity. If an application wants to route some
traffic to itself for further processing, it will require extra
efforts and signaling in order to provide ingress port
information. If such traffic is delivered via corresponding
representor, the information is provided in a native way.
> This would be a vendor decision, and there would be no need to change
> PORT_ID API.
Sorry, I still think that PORT_ID action behaviour assumed and
used by OvS in the case of PF is wrong and mismatch action
definition.
Representors case is questionable and really related to
representor definition discussion pointed out by Ilya.
>>
>>> If this is your meaning, maybe what you are looking for is an action to
>>> change the in_port and continue processing?
>>>
>>> Please comment on the examples I gave or clarify the use case you are
>>> trying to do.
>>>
>>>
>>> Thanks,
>>>
>>> Eli
>>>
>>>>>>>> We had already very similar discussions regarding the
>>>>>>>> understanding of
>>>>>>>> what
>>>>>>>> the representor really is from the DPDK API's point of view, and
>>>>>>>> the
>>>>>>>> last
>>>>>>>> time, IIUC, it was concluded by a tech. board that representor
>>>>>>>> should be
>>>>>>>> a "ghost of a VF", i.e. DPDK APIs should apply configuration by
>>>>>>>> default to
>>>>>>>> VF and not to the representor device:
>>>>>>>>
>>>>>>>> https://patches.dpdk.org/project/dpdk/cover/20191029185051.32203-1-thomas@monjalon.net/#104376
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> This wasn't enforced though, IIUC, for existing code and
>>>>>>>> semantics is
>>>>>>>> still mixed.
>>>>>>> I am not sure how this is related.
>>>>>>>> I still think that configuration should be applied to VF, and the
>>>>>>>> same
>>>>>>>> applies
>>>>>>>> to rte_flow API. IMHO, average application should not care if
>>>>>>>> device is
>>>>>>>> a VF itself or its representor. Everything should work exactly the
>>>>>>>> same.
>>>>>>>> I think this matches with the original idea/design of the switchdev
>>>>>>>> functionality
>>>>>>>> in the linux kernel and also matches with how the average user
>>>>>>>> thinks
>>>>>>>> about
>>>>>>>> representor devices.
>>>>>>> Right. This is the way representors work. It is fully aligned with
>>>>>>> configuration of OVS-kernel.
>>>>>>>> If some specific use-case requires to distinguish VF from the
>>>>>>>> representor,
>>>>>>>> there should probably be a separate special API/flag for that.
>>>>>>>>
>>>>>>>> Best regards, Ilya Maximets.
^ permalink raw reply [flat|nested] 40+ messages in thread
* Re: [dpdk-dev] [RFC PATCH] ethdev: clarify flow action PORT ID semantics
2021-06-01 12:10 ` Ilya Maximets
2021-06-01 13:24 ` Eli Britstein
2021-06-01 14:28 ` Ivan Malov
@ 2021-06-02 12:16 ` Thomas Monjalon
2021-06-02 12:53 ` Ilya Maximets
2021-06-02 13:10 ` Andrew Rybchenko
2 siblings, 2 replies; 40+ messages in thread
From: Thomas Monjalon @ 2021-06-02 12:16 UTC (permalink / raw)
To: Ivan Malov, Ilya Maximets, Eli Britstein, Andrew Rybchenko
Cc: dev, Smadar Fuks, Hyong Youb Kim, Kishore Padmanabha, Ori Kam,
Ajit Khaparde, Jerin Jacob, John Daley, Ferruh Yigit
01/06/2021 14:10, Ilya Maximets:
> On 6/1/21 1:14 PM, Ivan Malov wrote:
> > By its very name, action PORT_ID means that packets hit an ethdev with the
> > given DPDK port ID. At least the current comments don't state the opposite.
> > That said, since port representors had been adopted, applications like OvS
> > have been misusing the action. They misread its purpose as sending packets
> > to the opposite end of the "wire" plugged to the given ethdev, for example,
> > redirecting packets to the VF itself rather than to its representor ethdev.
> > Another example: OvS relies on this action with the admin PF's ethdev port
> > ID specified in it in order to send offloaded packets to the physical port.
> >
> > Since there might be applications which use this action in its valid sense,
> > one can't just change the documentation to greenlight the opposite meaning.
> > This patch adds an explicit bit to the action configuration which will let
> > applications, depending on their needs, leverage the two meanings properly.
> > Applications like OvS, as well as PMDs, will have to be corrected when the
> > patch has been applied. But the improved clarity of the action is worth it.
> >
> > The proposed change is not the only option. One could avoid changes in OvS
> > and PMDs if the new configuration field had the opposite meaning, with the
> > action itself meaning delivery to the represented port and not to DPDK one.
> > Alternatively, one could define a brand new action with the said behaviour.
>
> We had already very similar discussions regarding the understanding of what
> the representor really is from the DPDK API's point of view, and the last
> time, IIUC, it was concluded by a tech. board that representor should be
> a "ghost of a VF", i.e. DPDK APIs should apply configuration by default to
> VF and not to the representor device:
> https://patches.dpdk.org/project/dpdk/cover/20191029185051.32203-1-thomas@monjalon.net/#104376
> This wasn't enforced though, IIUC, for existing code and semantics is still mixed.
Quoting myself from above link:
"the representor port must be a real DPDK port, not a ghost."
and
"During the Technical Board yesterday, it was decided to go with Intel
understanding of what is a representor, i.e. a ghost of the VF."
and
"we will continue to mix VF and representor operations
with the same port ID. For the record, I believe it is very bad."
> I still think that configuration should be applied to VF, and the same applies
> to rte_flow API. IMHO, average application should not care if device is
> a VF itself or its representor. Everything should work exactly the same.
What means "work exactly the same"?
Is it considering what is behind the representor silently,
or considering the representor as a real port?
There is a need to really consider representor port as any other port,
and stop this ugly mix. I want to propose such change again for DPDK 21.11.
To me the real solution is to use a bit in the port id of a representor
for explicitly identifying the port behind the representor.
This bit could be translated as a flag or a sign in testpmd text grammar.
> I think this matches with the original idea/design of the switchdev functionality
> in the linux kernel and also matches with how the average user thinks about
> representor devices.
There is no "average" user or application, just right and wrong.
In the switchdev model, a representor is a port of a switch like
any other port, not a ghost of its peer.
> If some specific use-case requires to distinguish VF from the representor,
> there should probably be a separate special API/flag for that.
Yes, port ID of a representor must be the representor itself,
and a bit can help reaching the port behind the representor.
^ permalink raw reply [flat|nested] 40+ messages in thread
* Re: [dpdk-dev] [RFC PATCH] ethdev: clarify flow action PORT ID semantics
2021-06-02 11:21 ` Eli Britstein
2021-06-02 11:57 ` Andrew Rybchenko
@ 2021-06-02 12:36 ` Ivan Malov
2021-06-03 9:18 ` Ori Kam
1 sibling, 1 reply; 40+ messages in thread
From: Ivan Malov @ 2021-06-02 12:36 UTC (permalink / raw)
To: Eli Britstein, Andrew Rybchenko, Ilya Maximets, dev
Cc: Smadar Fuks, Hyong Youb Kim, Kishore Padmanabha, Ori Kam,
Ajit Khaparde, Jerin Jacob, John Daley, Thomas Monjalon,
Ferruh Yigit
On 02/06/2021 14:21, Eli Britstein wrote:
>
> On 6/2/2021 1:50 PM, Andrew Rybchenko wrote:
>> External email: Use caution opening links or attachments
>>
>>
>> On 6/2/21 12:57 PM, Eli Britstein wrote:
>>> On 6/1/2021 5:53 PM, Andrew Rybchenko wrote:
>>>> External email: Use caution opening links or attachments
>>>>
>>>>
>>>> On 6/1/21 5:44 PM, Eli Britstein wrote:
>>>>> On 6/1/2021 5:35 PM, Andrew Rybchenko wrote:
>>>>>> External email: Use caution opening links or attachments
>>>>>>
>>>>>>
>>>>>> On 6/1/21 4:24 PM, Eli Britstein wrote:
>>>>>>> On 6/1/2021 3:10 PM, Ilya Maximets wrote:
>>>>>>>> External email: Use caution opening links or attachments
>>>>>>>>
>>>>>>>>
>>>>>>>> On 6/1/21 1:14 PM, Ivan Malov wrote:
>>>>>>>>> By its very name, action PORT_ID means that packets hit an ethdev
>>>>>>>>> with the
>>>>>>>>> given DPDK port ID. At least the current comments don't state the
>>>>>>>>> opposite.
>>>>>>>>> That said, since port representors had been adopted, applications
>>>>>>>>> like OvS
>>>>>>>>> have been misusing the action. They misread its purpose as sending
>>>>>>>>> packets
>>>>>>>>> to the opposite end of the "wire" plugged to the given ethdev, for
>>>>>>>>> example,
>>>>>>>>> redirecting packets to the VF itself rather than to its
>>>>>>>>> representor
>>>>>>>>> ethdev.
>>>>>>>>> Another example: OvS relies on this action with the admin PF's
>>>>>>>>> ethdev
>>>>>>>>> port
>>>>>>>>> ID specified in it in order to send offloaded packets to the
>>>>>>>>> physical
>>>>>>>>> port.
>>>>>>>>>
>>>>>>>>> Since there might be applications which use this action in its
>>>>>>>>> valid
>>>>>>>>> sense,
>>>>>>>>> one can't just change the documentation to greenlight the opposite
>>>>>>>>> meaning.
>>>>>>>>> This patch adds an explicit bit to the action configuration which
>>>>>>>>> will let
>>>>>>>>> applications, depending on their needs, leverage the two meanings
>>>>>>>>> properly.
>>>>>>>>> Applications like OvS, as well as PMDs, will have to be corrected
>>>>>>>>> when the
>>>>>>>>> patch has been applied. But the improved clarity of the action is
>>>>>>>>> worth it.
>>>>>>>>>
>>>>>>>>> The proposed change is not the only option. One could avoid
>>>>>>>>> changes
>>>>>>>>> in OvS
>>>>>>>>> and PMDs if the new configuration field had the opposite meaning,
>>>>>>>>> with the
>>>>>>>>> action itself meaning delivery to the represented port and not to
>>>>>>>>> DPDK one.
>>>>>>>>> Alternatively, one could define a brand new action with the said
>>>>>>>>> behaviour.
>>>>>>> It doesn't make any sense to attach the VF itself to OVS, but
>>>>>>> only its
>>>>>>> representor.
>>>>>> OvS is not the only DPDK application.
>>>>> True. It is just the focus of this commit message is OVS.
>>>>>>> For the PF, when in switchdev mode, it is the "uplink
>>>>>>> representor", so
>>>>>>> it is also a representor.
>>>>>> Strictly speaking it is not a representor from DPDK point of
>>>>>> view. E.g. representors have corresponding flag set which is
>>>>>> definitely clear in the case of PF.
>>>>> This is the per-PMD responsibility. The API should not care.
>>>>>>> That said, OVS does not care of the type of the port. It doesn't
>>>>>>> matter
>>>>>>> if it's an "upstream" or not, or if it's a representor or not.
>>>>>> Yes, it is clear, but let's put OvS aside. Let's consider a
>>>>>> DPDK application which has a number of ethdev port. Some may
>>>>>> belong to single switch domain, some may be from different
>>>>>> switch domains (i.e. different NICs). Can I use PORT_ID action
>>>>>> to redirect ingress traffic to a specified ethdev port using
>>>>>> PORT_ID action? It looks like no, but IMHO it is the definition
>>>>>> of the PORT_ID action.
>>>>> Let's separate API from implementation. By API point of view, yes, the
>>>>> user may request it. Nothing wrong with it.
>>>>>
>>>>> From implementation point of view - yes, it might fail, but not for
>>>>> sure, even if on different NICs. Maybe the HW of a certain vendor has
>>>>> the capability to do it?
>>>>>
>>>>> We can't know, so I think the API should allow it.
>>>> Hold on. What should it allow? It is two opposite meanings:
>>>> 1. Direct traffic to DPDK ethdev port specified using ID to be
>>>> received and processed by the DPDK application.
>>>> 2. Direct traffic to an upstream port represented by the
>>>> DPDK port.
>>>>
>>>> The patch tries to address the ambiguity, misuse it in OvS
>>>> (from my point of view in accordance with the action
>>>> documentation), mis-implementation in a number of PMDs
>>>> (to work in OvS) and tries to sort it out with an explanation
>>>> why proposed direction is chosen. I realize that it could be
>>>> painful, but IMHO it is the best option here. Yes, it is a
>>>> point to discuss.
>>>>
>>>> To start with we should agree that that problem exists.
>>>> Second, we should agree on direction how to solve it.
>>> I agree. Suppose port 0 is the PF, and port 1 is a VF representor.
>>>
>>> IIUC, there are two options:
>>>
>>> 1. flow create 1 ingress transfer pattern eth / end action port_id id 0
>>> upstream 1 / end
>>>
>>> 2. flow create 1 ingress transfer pattern eth / end action port_id id 0
>>> upstream 0 / end
>>>
>>> [1] is the same behavior as today.
>>>
>>> [2] is a new behavior, the packet received by port 0 as if it arrived
>>> from the wire.
>>>
>>> Then, let's have more:
>>>
>>> 3. flow create 0 ingress transfer pattern eth / end action port_id id 1
>>> upstream 1 / end
>>>
>>> 4. flow create 0 ingress transfer pattern eth / end action port_id id 1
>>> upstream 0 / end
>>>
>>> if we have [2] and [4], the packet going from the VF will hit [2], then
>>> hit [4] and then [2] again in an endless loop?
>> As I understand PORT_ID is a fate action. So, no more lookups
>> are done. If the packet is loop back from applications, loop is
>> possible.
>
> I referred a HW loop, not SW. For example with JUMP action (also fate):
>
> flow create 0 group 0 ingress transfer pattern eth / end action jump
> group 1 / end
>
> flow create 0 group 1 ingress transfer pattern eth / end action jump
> group 0 / end
>
>>
>> In fact, it is a good question if "flow creare 0 ingress
>> transfer" or "flow create 1 ingress transfer" assume any
>> implicit filtering. I always thought that no.
>> i.e. if we have two network ports rule like
>> flow create 0 ingress transfer pattern eth / end \
>> action port_id id 1 upstream 1 / end
>> will match packets incoming from any port into the switch
>> (network port 0, network port 1, VF or PF itself (???)).
>> The topic also requires explicit clarification.
> rte_flow is port based. It implicitly filters only packets for the
> provided port (0).
>
> Maybe need to clarify documentation and have a "no filtering" API if
> needed.
We've come across the following bits in the current documentation with
respect to attribute "transfer", quote:
"Instead of simply matching the properties of traffic as it would appear
on a given DPDK port ID, enabling this attribute transfers a flow rule
to the lowest possible level of any device endpoints found in the pattern.
When supported, this effectively enables an application to reroute
traffic not necessarily intended for it (e.g. coming from or addressed
to different physical ports, VFs or applications) at the device level".
(https://doc.dpdk.org/guides/prog_guide/rte_flow.html#attributes)
Since action PORT_ID hardly makes sense without attribute "transfer"
(unless it doesn't point to the same ethdev as the one used to submit
the flow), this paragraph effectively states that in this particular
case API "flow_create" is not (necessarily) port-based.
>
>> PF itself is really a hard question because of "ingress"
>> since traffic from PF is a traffic from DPDK application and
>> it is egress, not ingress.
>
> Ingress means the direction. Hit on packets otherwise provided to the SW
> by rte_eth_rx_burst().
>
> Same goes for the PF. Packets by rte_eth_rx_burst are the ones arriving
> from the wire, so ingress is that direction and egress is from the app.
>
>>
>> I think that port ID used to created flow rule should not
>> apply any filtering in the case of transfer since we have
>> corresponding items to do it explicitly. If we do it implicitly
>> as well, we need some priorities and a way to avoid implicit
>> rules which makes things much harder to understand and
>> implement.
>
> If "upstream 0" means what I thought it means (comments?) maybe a better
> way to do it is expose another port for that, so there will be 2 "PF"
> ports - one as the wire representor and the other one as the "PF" (or
> clearer naming...).
>
> This would be a vendor decision, and there would be no need to change
> PORT_ID API.
>
>>
>>> If this is your meaning, maybe what you are looking for is an action to
>>> change the in_port and continue processing?
>>>
>>> Please comment on the examples I gave or clarify the use case you are
>>> trying to do.
>>>
>>>
>>> Thanks,
>>>
>>> Eli
>>>
>>>>>>>> We had already very similar discussions regarding the
>>>>>>>> understanding of
>>>>>>>> what
>>>>>>>> the representor really is from the DPDK API's point of view, and
>>>>>>>> the
>>>>>>>> last
>>>>>>>> time, IIUC, it was concluded by a tech. board that representor
>>>>>>>> should be
>>>>>>>> a "ghost of a VF", i.e. DPDK APIs should apply configuration by
>>>>>>>> default to
>>>>>>>> VF and not to the representor device:
>>>>>>>>
>>>>>>>> https://patches.dpdk.org/project/dpdk/cover/20191029185051.32203-1-thomas@monjalon.net/#104376
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> This wasn't enforced though, IIUC, for existing code and
>>>>>>>> semantics is
>>>>>>>> still mixed.
>>>>>>> I am not sure how this is related.
>>>>>>>> I still think that configuration should be applied to VF, and the
>>>>>>>> same
>>>>>>>> applies
>>>>>>>> to rte_flow API. IMHO, average application should not care if
>>>>>>>> device is
>>>>>>>> a VF itself or its representor. Everything should work exactly the
>>>>>>>> same.
>>>>>>>> I think this matches with the original idea/design of the switchdev
>>>>>>>> functionality
>>>>>>>> in the linux kernel and also matches with how the average user
>>>>>>>> thinks
>>>>>>>> about
>>>>>>>> representor devices.
>>>>>>> Right. This is the way representors work. It is fully aligned with
>>>>>>> configuration of OVS-kernel.
>>>>>>>> If some specific use-case requires to distinguish VF from the
>>>>>>>> representor,
>>>>>>>> there should probably be a separate special API/flag for that.
>>>>>>>>
>>>>>>>> Best regards, Ilya Maximets.
--
Ivan M
^ permalink raw reply [flat|nested] 40+ messages in thread
* Re: [dpdk-dev] [RFC PATCH] ethdev: clarify flow action PORT ID semantics
2021-06-01 14:28 ` Ivan Malov
@ 2021-06-02 12:46 ` Ilya Maximets
2021-06-02 16:26 ` Andrew Rybchenko
2021-06-25 13:04 ` Ferruh Yigit
0 siblings, 2 replies; 40+ messages in thread
From: Ilya Maximets @ 2021-06-02 12:46 UTC (permalink / raw)
To: Ivan Malov, Ilya Maximets, dev
Cc: Eli Britstein, Smadar Fuks, Hyong Youb Kim, Kishore Padmanabha,
Ori Kam, Ajit Khaparde, Jerin Jacob, John Daley, Thomas Monjalon,
Ferruh Yigit, Andrew Rybchenko
On 6/1/21 4:28 PM, Ivan Malov wrote:
> Hi Ilya,
>
> Thank you for reviewing the proposal at such short notice. I'm afraid that prior discussions overlook the simple fact that the whole problem is not limited to just VF representors. Action PORT_ID is also used with respect to the admin PF's ethdev, which "represents itself" (and by no means it represents the underlying physical/network port). In this case, one cannot state that the application treats it as a physical port, just like one states that the application perceives representors as VFs themselves.
I don't think that it was overlooked. If device is in a switchdev mode than
there is a PF representor and VF representors. Application typically works
only with representors in this case is it doesn't make much sense to have
representor and the upstream port attached to the same application at the
same time. Configuration that is applied by application to the representor
(PF or VF, it doesn't matter) applies to the corresponding upstream port
(actual PF or VF) by default.
Exactly same thing here with PORT_ID action. You have a packet and action
to send it to the port, but it's not specified if HW needs to send it to
the representor or the upstream port (again, VF or PF, it doesn't matter).
Since there is no extra information, HW should send it to the upstream
port by default. The same as configuration applies by default to the
upstream port.
Let's look at some workflow examples:
DPDK Application
| |
| |
+--PF-rep------VF-rep---+
| |
| NIC (switchdev) |
| |
+---PF---------VF-------+
| |
| |
External VM or whatever
Network
a. Workflow for "DPDK Application" to set MAC to VF:
1. "DPDK Application" calls rte_set_etheraddr("VF-rep", new_mac);
2. DPDK sets MAC for "VF".
b. Workflow for "DPDK Application" to set MAC to PF:
1. "DPDK Application" calls rte_set_etheraddr("PF-rep", new_mac);
2. DPDK sets MAC for "PF".
c. Workflow for "DPDK Application" to send packet to the external network:
1. "DPDK Application" calls rte_eth_tx_burst("PF-rep", packet);
2. NIC receives the packet from "PF-rep" and sends it to "PF".
3. packet egresses to the external network from "PF".
d. Workflow for "DPDK Application" to send packet to the "VM or whatever":
1. "DPDK Application" calls rte_eth_tx_burst("VF-rep", packet);
2. NIC receives the packet from "VF-rep" and sends it to "VF".
3. "VM or whatever" receives the packet from "VF".
In two workflows above there is no rte_flow processing on step 2, i.e.,
NIC does not perform any lookups/matches/actions, because it's not possible
to configure actions for packets received from "PF-rep" or
"VF-rep" as these ports doesn't own a port id and all the configuration
and rte_flow actions translated and applied for the devices that these
ports represents ("PF" and "VF") and not representors themselves ("PF-rep"
or "VF-rep").
e. Workflow for the packet received on PF and PORT_ID action:
1. "DPDK Application" configures rte_flow for all packets from "PF-rep"
to execute PORT_ID "VF-rep".
2. NIC receives packet on "PF".
3. NIC executes 'PORT_ID "VF-rep"' action by sending packet to "VF".
4. "VM or whatever" receives the packet from "VF".
f. Workflow for the packet received on VF and PORT_ID action:
1. "DPDK Application" configures rte_flow for all packets from "VF-rep"
to execute 'PORT_ID "PF-rep"'.
2. NIC receives packet on "VF".
3. NIC executes 'PORT_ID "PF-rep"' action by sending packet to "PF".
4. Packet egresses from the "PF" to the external network.
Above is what, IMHO, the logic should look like and this matches with
the overall switchdev design in kernel.
I understand that this logic could seem flipped-over from the HW point
of view, but it's perfectly logical from the user's perspective, because
user should not care if the application works with representors or
some real devices. If application configures that all packets from port
A should be sent to port B, user will expect that these packets will
egress from port B once received from port A. That will be highly
inconvenient if the packet will ingress from port B back to the
application instead.
DPDK Application
| |
| |
port A port B
| |
*****MAGIC*****
| |
External Another Network
Network or VM or whatever
It should not matter if there is an extra layer between ports A and B
and the external network and VM. Everything should work in exactly the
same way, transparently for the application.
The point of hardware offloading, and therefore rte_flow API, is to take
what user does in software and make this "magically" work in hardware in
the exactly same way. And this will be broken if user will have to
use different logic based on the mode the hardware works in, i.e. based on
the fact if the application works with ports or their representors.
If some specific use case requires application to know if it's an
upstream port or the representor and demystify the internals of the switchdev
NIC, there should be a different port id for the representor itself that
could be used in all DPDK APIs including rte_flow API or a special bit for
that matter. IIRC, there was an idea to add a bit directly to the port_id
for that purpose that will flip over behavior in all the workflow scenarios
that I described above.
>
> Given these facts, it would not be quite right to just align the documentation with the de-facto action meaning assumed by OvS.
It's not a "meaning assumed by OvS", it's the original design and the
main idea of a switchdev based on a common sense.
>
> On 01/06/2021 15:10, Ilya Maximets wrote:
>> On 6/1/21 1:14 PM, Ivan Malov wrote:
>>> By its very name, action PORT_ID means that packets hit an ethdev with the
>>> given DPDK port ID. At least the current comments don't state the opposite.
>>> That said, since port representors had been adopted, applications like OvS
>>> have been misusing the action. They misread its purpose as sending packets
>>> to the opposite end of the "wire" plugged to the given ethdev, for example,
>>> redirecting packets to the VF itself rather than to its representor ethdev.
>>> Another example: OvS relies on this action with the admin PF's ethdev port
>>> ID specified in it in order to send offloaded packets to the physical port.
>>>
>>> Since there might be applications which use this action in its valid sense,
>>> one can't just change the documentation to greenlight the opposite meaning.
>>> This patch adds an explicit bit to the action configuration which will let
>>> applications, depending on their needs, leverage the two meanings properly.
>>> Applications like OvS, as well as PMDs, will have to be corrected when the
>>> patch has been applied. But the improved clarity of the action is worth it.
>>>
>>> The proposed change is not the only option. One could avoid changes in OvS
>>> and PMDs if the new configuration field had the opposite meaning, with the
>>> action itself meaning delivery to the represented port and not to DPDK one.
>>> Alternatively, one could define a brand new action with the said behaviour.
>>
>> We had already very similar discussions regarding the understanding of what
>> the representor really is from the DPDK API's point of view, and the last
>> time, IIUC, it was concluded by a tech. board that representor should be
>> a "ghost of a VF", i.e. DPDK APIs should apply configuration by default to
>> VF and not to the representor device:
>> https://patches.dpdk.org/project/dpdk/cover/20191029185051.32203-1-thomas@monjalon.net/#104376
>> This wasn't enforced though, IIUC, for existing code and semantics is still mixed.
>>
>> I still think that configuration should be applied to VF, and the same applies
>> to rte_flow API. IMHO, average application should not care if device is
>> a VF itself or its representor. Everything should work exactly the same.
>> I think this matches with the original idea/design of the switchdev functionality
>> in the linux kernel and also matches with how the average user thinks about
>> representor devices.
>>
>> If some specific use-case requires to distinguish VF from the representor,
>> there should probably be a separate special API/flag for that.
>>
>> Best regards, Ilya Maximets.
>>
>
^ permalink raw reply [flat|nested] 40+ messages in thread
* Re: [dpdk-dev] [RFC PATCH] ethdev: clarify flow action PORT ID semantics
2021-06-02 12:16 ` Thomas Monjalon
@ 2021-06-02 12:53 ` Ilya Maximets
2021-06-02 13:10 ` Andrew Rybchenko
1 sibling, 0 replies; 40+ messages in thread
From: Ilya Maximets @ 2021-06-02 12:53 UTC (permalink / raw)
To: Thomas Monjalon, Ivan Malov, Ilya Maximets, Eli Britstein,
Andrew Rybchenko
Cc: dev, Smadar Fuks, Hyong Youb Kim, Kishore Padmanabha, Ori Kam,
Ajit Khaparde, Jerin Jacob, John Daley, Ferruh Yigit
On 6/2/21 2:16 PM, Thomas Monjalon wrote:
> 01/06/2021 14:10, Ilya Maximets:
>> On 6/1/21 1:14 PM, Ivan Malov wrote:
>>> By its very name, action PORT_ID means that packets hit an ethdev with the
>>> given DPDK port ID. At least the current comments don't state the opposite.
>>> That said, since port representors had been adopted, applications like OvS
>>> have been misusing the action. They misread its purpose as sending packets
>>> to the opposite end of the "wire" plugged to the given ethdev, for example,
>>> redirecting packets to the VF itself rather than to its representor ethdev.
>>> Another example: OvS relies on this action with the admin PF's ethdev port
>>> ID specified in it in order to send offloaded packets to the physical port.
>>>
>>> Since there might be applications which use this action in its valid sense,
>>> one can't just change the documentation to greenlight the opposite meaning.
>>> This patch adds an explicit bit to the action configuration which will let
>>> applications, depending on their needs, leverage the two meanings properly.
>>> Applications like OvS, as well as PMDs, will have to be corrected when the
>>> patch has been applied. But the improved clarity of the action is worth it.
>>>
>>> The proposed change is not the only option. One could avoid changes in OvS
>>> and PMDs if the new configuration field had the opposite meaning, with the
>>> action itself meaning delivery to the represented port and not to DPDK one.
>>> Alternatively, one could define a brand new action with the said behaviour.
>>
>> We had already very similar discussions regarding the understanding of what
>> the representor really is from the DPDK API's point of view, and the last
>> time, IIUC, it was concluded by a tech. board that representor should be
>> a "ghost of a VF", i.e. DPDK APIs should apply configuration by default to
>> VF and not to the representor device:
>> https://patches.dpdk.org/project/dpdk/cover/20191029185051.32203-1-thomas@monjalon.net/#104376
>> This wasn't enforced though, IIUC, for existing code and semantics is still mixed.
>
> Quoting myself from above link:
> "the representor port must be a real DPDK port, not a ghost."
> and
> "During the Technical Board yesterday, it was decided to go with Intel
> understanding of what is a representor, i.e. a ghost of the VF."
> and
> "we will continue to mix VF and representor operations
> with the same port ID. For the record, I believe it is very bad."
>
>> I still think that configuration should be applied to VF, and the same applies
>> to rte_flow API. IMHO, average application should not care if device is
>> a VF itself or its representor. Everything should work exactly the same.
>
> What means "work exactly the same"?
> Is it considering what is behind the representor silently,
> or considering the representor as a real port?
Check ut my other email where I described some workflows and how, I think,
they should work. Hopefully that will answer this question. (This email
arrived while I was writing another one, so I had no chance to read this
question).
>
> There is a need to really consider representor port as any other port,
> and stop this ugly mix. I want to propose such change again for DPDK 21.11.
> To me the real solution is to use a bit in the port id of a representor
> for explicitly identifying the port behind the representor.
> This bit could be translated as a flag or a sign in testpmd text grammar.
This makes sense.
>
>> I think this matches with the original idea/design of the switchdev functionality
>> in the linux kernel and also matches with how the average user thinks about
>> representor devices.
>
> There is no "average" user or application, just right and wrong.
> In the switchdev model, a representor is a port of a switch like
> any other port, not a ghost of its peer.
>
>> If some specific use-case requires to distinguish VF from the representor,
>> there should probably be a separate special API/flag for that.
>
> Yes, port ID of a representor must be the representor itself,
> and a bit can help reaching the port behind the representor.
I still think that the logic should be opposite, i.e. special bit should
be set to identify that we want to reach the representor and not the
port behind it. But we discussed this already several times and I also
wrote some of the thoughts in the other email that I just sent.
Best regards, Ilya Maximets.
^ permalink raw reply [flat|nested] 40+ messages in thread
* Re: [dpdk-dev] [RFC PATCH] ethdev: clarify flow action PORT ID semantics
2021-06-02 12:16 ` Thomas Monjalon
2021-06-02 12:53 ` Ilya Maximets
@ 2021-06-02 13:10 ` Andrew Rybchenko
1 sibling, 0 replies; 40+ messages in thread
From: Andrew Rybchenko @ 2021-06-02 13:10 UTC (permalink / raw)
To: Thomas Monjalon, Ivan Malov, Ilya Maximets, Eli Britstein
Cc: dev, Smadar Fuks, Hyong Youb Kim, Kishore Padmanabha, Ori Kam,
Ajit Khaparde, Jerin Jacob, John Daley, Ferruh Yigit
On 6/2/21 3:16 PM, Thomas Monjalon wrote:
> 01/06/2021 14:10, Ilya Maximets:
>> On 6/1/21 1:14 PM, Ivan Malov wrote:
>>> By its very name, action PORT_ID means that packets hit an ethdev with the
>>> given DPDK port ID. At least the current comments don't state the opposite.
>>> That said, since port representors had been adopted, applications like OvS
>>> have been misusing the action. They misread its purpose as sending packets
>>> to the opposite end of the "wire" plugged to the given ethdev, for example,
>>> redirecting packets to the VF itself rather than to its representor ethdev.
>>> Another example: OvS relies on this action with the admin PF's ethdev port
>>> ID specified in it in order to send offloaded packets to the physical port.
>>>
>>> Since there might be applications which use this action in its valid sense,
>>> one can't just change the documentation to greenlight the opposite meaning.
>>> This patch adds an explicit bit to the action configuration which will let
>>> applications, depending on their needs, leverage the two meanings properly.
>>> Applications like OvS, as well as PMDs, will have to be corrected when the
>>> patch has been applied. But the improved clarity of the action is worth it.
>>>
>>> The proposed change is not the only option. One could avoid changes in OvS
>>> and PMDs if the new configuration field had the opposite meaning, with the
>>> action itself meaning delivery to the represented port and not to DPDK one.
>>> Alternatively, one could define a brand new action with the said behaviour.
>>
>> We had already very similar discussions regarding the understanding of what
>> the representor really is from the DPDK API's point of view, and the last
>> time, IIUC, it was concluded by a tech. board that representor should be
>> a "ghost of a VF", i.e. DPDK APIs should apply configuration by default to
>> VF and not to the representor device:
>> https://patches.dpdk.org/project/dpdk/cover/20191029185051.32203-1-thomas@monjalon.net/#104376
>> This wasn't enforced though, IIUC, for existing code and semantics is still mixed.
>
> Quoting myself from above link:
> "the representor port must be a real DPDK port, not a ghost."
That days I had no opinion on the topic. Now I tend to agree
with it, but I'm not sure that I understand all implications.
I'm afraid it is more complex:
- it is a real DPDK port since application can send
traffic to it, and receive traffic from it
- however, in our case, it is basically just a direct pipe:
- packets sent to representor bypass transfer and go
directly to represented function (VF or PF)
- packets sent by represented function go to representor
by default, but transfer rules (HW offload) may change it
Of course, it is just a vendor implementation detail.
- I doubt that all ethdev operation makes sense for
representor itself, but some, for example stats, definitely
makes sense (representor and represented entity stats could
differ a lot because of HW offload). So, if operation does
not make sense or simply not supported, it should return an
error and that's it.
In fact, I see nothing bad in attaching both representor and
represented entity (VF, PF or sub-function) to the same DPDK
application, for example, for testing purposes. So, it should
behave consistently.
> and
> "During the Technical Board yesterday, it was decided to go with Intel
> understanding of what is a representor, i.e. a ghost of the VF."
> and
> "we will continue to mix VF and representor operations
> with the same port ID. For the record, I believe it is very bad."
>
>> I still think that configuration should be applied to VF, and the same applies
>> to rte_flow API. IMHO, average application should not care if device is
>> a VF itself or its representor. Everything should work exactly the same.
>
> What means "work exactly the same"?
> Is it considering what is behind the representor silently,
> or considering the representor as a real port?
>
> There is a need to really consider representor port as any other port,
> and stop this ugly mix. I want to propose such change again for DPDK 21.11.
> To me the real solution is to use a bit in the port id of a representor
> for explicitly identifying the port behind the representor.
> This bit could be translated as a flag or a sign in testpmd text grammar.
+1, if so, it will allow to use it in PORT_ID action and item
without any changes in flow API. Just clarification in
documentation what it means for various cases.
However, I'd like to draw attention to network port <-> PF
ethdev port case. Strictly speaking it is not a representor
(as I tried to proove in other mail) and requires to be a
part of solution.
>> I think this matches with the original idea/design of the switchdev functionality
>> in the linux kernel and also matches with how the average user thinks about
>> representor devices.
>
> There is no "average" user or application, just right and wrong.
> In the switchdev model, a representor is a port of a switch like
> any other port, not a ghost of its peer.
>
>> If some specific use-case requires to distinguish VF from the representor,
>> there should probably be a separate special API/flag for that.
>
> Yes, port ID of a representor must be the representor itself,
> and a bit can help reaching the port behind the representor.
+1 with clarification of network port <-> PF ethdev case
semantics which is not an easy task: for example,
when we configure ethdev port and specify speed capabilities,
it is really configuration of the associated upstream network
port, but we still apply it to ethdev port.
^ permalink raw reply [flat|nested] 40+ messages in thread
* Re: [dpdk-dev] [RFC PATCH] ethdev: clarify flow action PORT ID semantics
2021-06-02 12:46 ` Ilya Maximets
@ 2021-06-02 16:26 ` Andrew Rybchenko
2021-06-02 17:35 ` Ilya Maximets
2021-06-25 13:04 ` Ferruh Yigit
1 sibling, 1 reply; 40+ messages in thread
From: Andrew Rybchenko @ 2021-06-02 16:26 UTC (permalink / raw)
To: Ilya Maximets, Ivan Malov, dev
Cc: Eli Britstein, Smadar Fuks, Hyong Youb Kim, Kishore Padmanabha,
Ori Kam, Ajit Khaparde, Jerin Jacob, John Daley, Thomas Monjalon,
Ferruh Yigit
On 6/2/21 3:46 PM, Ilya Maximets wrote:
> On 6/1/21 4:28 PM, Ivan Malov wrote:
>> Hi Ilya,
>>
>> Thank you for reviewing the proposal at such short notice. I'm afraid that prior discussions overlook the simple fact that the whole problem is not limited to just VF representors. Action PORT_ID is also used with respect to the admin PF's ethdev, which "represents itself" (and by no means it represents the underlying physical/network port). In this case, one cannot state that the application treats it as a physical port, just like one states that the application perceives representors as VFs themselves.
>
>
> I don't think that it was overlooked. If device is in a switchdev mode than
> there is a PF representor and VF representors. Application typically works
> only with representors in this case is it doesn't make much sense to have
> representor and the upstream port attached to the same application at the
> same time. Configuration that is applied by application to the representor
> (PF or VF, it doesn't matter) applies to the corresponding upstream port
> (actual PF or VF) by default.
PF is not necessarily associated with a network port. It
could be many PFs and just one network port on NIC.
Extra PFs are like VFs in this case. These PFs may be
passed to a VM in a similar way. So, we can have PF
representors similar to VF representors. I.e. it is
incorrect to say that PF in the case of switchdev is
a representor of a network port.
If we prefer to talk in representors terminology, we
need 4 types of prepresentors:
- PF representor for PCIe physical function
- VF representor for PCIe virtual function
- SF representor for PCIe sub-function (PASID)
- network port representor
In fact above is PCIe oriented, but there are
other buses and ways to deliver traffic to applications.
Basically representor for any virtual port in virtual
switch which DPDK app can control using transfer rules.
> Exactly same thing here with PORT_ID action. You have a packet and action
> to send it to the port, but it's not specified if HW needs to send it to
> the representor or the upstream port (again, VF or PF, it doesn't matter).
> Since there is no extra information, HW should send it to the upstream
> port by default. The same as configuration applies by default to the
> upstream port.
>
> Let's look at some workflow examples:
>
> DPDK Application
> | |
> | |
> +--PF-rep------VF-rep---+
> | |
> | NIC (switchdev) |
> | |
> +---PF---------VF-------+
> | |
> | |
> External VM or whatever
> Network
See above. PF <-> External Network is incorrect above
since it not always the case. It should be
"NP <-> External network" and "NP-rep" above (NP -
network port). Sometimes PF is an NP-rep, but sometimes
it is not. It is just a question of default rules in
switchdev on what to do with traffic incoming from
network port.
A bit more complicated picture is:
+----------------------------------------+
| DPDK Application |
+----+---------+---------+---------+-----+
|PF0 |PF1 | |
| | | |
+--NP1-rep---NP2-rep---PF2-rep---VF-rep--+
| |
| NIC (switchdev) |
| |
+---NP1-------NP2-------PF2--------VF----+
| | | |
| | | |
External External VM or VM or
Network 1 Network 2 whatever whatever
So, sometimes PF plays network port representor role (PF0,
PF1), sometimes it requires representor itself (PF2).
What to do if PF2 itself is attached to application?
Can we route traffic to it using PORT_ID action?
It has DPDK ethdev port. It is one of arguments why
plain PORT_ID should route DPDK application.
Of course, some applications would like to see it as
(simpler is better):
+----------------------------------------+
| DPDK Application |
| |
+---PF0-------PF1------PF2-rep---VF-rep--+
| | | |
| | | |
External External VM or VM or
Network 1 Network 2 whatever whatever
but some, I believe, require full picture. For examples,
I'd really like to know how much traffic goes via all 8
switchdev ports and running rte_eth_stats_get(0, ...)
(i.e. DPDK port 0 attached to PF0) I'd like to get
NP1-rep stats (not NP1 stats). It will match exactly
what I see in DPDK application. It is an argument why
plain PORT_ID should be treated as a DPDK ethdev port,
not a represented (upstream) entity.
> a. Workflow for "DPDK Application" to set MAC to VF:
>
> 1. "DPDK Application" calls rte_set_etheraddr("VF-rep", new_mac);
> 2. DPDK sets MAC for "VF".
>
> b. Workflow for "DPDK Application" to set MAC to PF:
>
> 1. "DPDK Application" calls rte_set_etheraddr("PF-rep", new_mac);
> 2. DPDK sets MAC for "PF".
>
> c. Workflow for "DPDK Application" to send packet to the external network:
>
> 1. "DPDK Application" calls rte_eth_tx_burst("PF-rep", packet);
> 2. NIC receives the packet from "PF-rep" and sends it to "PF".
> 3. packet egresses to the external network from "PF".
>
> d. Workflow for "DPDK Application" to send packet to the "VM or whatever":
>
> 1. "DPDK Application" calls rte_eth_tx_burst("VF-rep", packet);
> 2. NIC receives the packet from "VF-rep" and sends it to "VF".
> 3. "VM or whatever" receives the packet from "VF".
>
> In two workflows above there is no rte_flow processing on step 2, i.e.,
> NIC does not perform any lookups/matches/actions, because it's not possible
> to configure actions for packets received from "PF-rep" or
> "VF-rep" as these ports doesn't own a port id and all the configuration
> and rte_flow actions translated and applied for the devices that these
> ports represents ("PF" and "VF") and not representors themselves ("PF-rep"
> or "VF-rep").
>
> e. Workflow for the packet received on PF and PORT_ID action:
>
> 1. "DPDK Application" configures rte_flow for all packets from "PF-rep"
> to execute PORT_ID "VF-rep".
> 2. NIC receives packet on "PF".
> 3. NIC executes 'PORT_ID "VF-rep"' action by sending packet to "VF".
> 4. "VM or whatever" receives the packet from "VF".
>
> f. Workflow for the packet received on VF and PORT_ID action:
>
> 1. "DPDK Application" configures rte_flow for all packets from "VF-rep"
> to execute 'PORT_ID "PF-rep"'.
> 2. NIC receives packet on "VF".
> 3. NIC executes 'PORT_ID "PF-rep"' action by sending packet to "PF".
> 4. Packet egresses from the "PF" to the external network.
>
> Above is what, IMHO, the logic should look like and this matches with
> the overall switchdev design in kernel.
>
> I understand that this logic could seem flipped-over from the HW point
> of view, but it's perfectly logical from the user's perspective, because
> user should not care if the application works with representors or
> some real devices. If application configures that all packets from port
> A should be sent to port B, user will expect that these packets will
> egress from port B once received from port A. That will be highly
> inconvenient if the packet will ingress from port B back to the
> application instead.
>
> DPDK Application
> | |
> | |
> port A port B
> | |
> *****MAGIC*****
> | |
> External Another Network
> Network or VM or whatever
>
> It should not matter if there is an extra layer between ports A and B
> and the external network and VM. Everything should work in exactly the
> same way, transparently for the application.
>
> The point of hardware offloading, and therefore rte_flow API, is to take
> what user does in software and make this "magically" work in hardware in
> the exactly same way. And this will be broken if user will have to
> use different logic based on the mode the hardware works in, i.e. based on
> the fact if the application works with ports or their representors.
>
> If some specific use case requires application to know if it's an
> upstream port or the representor and demystify the internals of the switchdev
> NIC, there should be a different port id for the representor itself that
> could be used in all DPDK APIs including rte_flow API or a special bit for
> that matter. IIRC, there was an idea to add a bit directly to the port_id
> for that purpose that will flip over behavior in all the workflow scenarios
> that I described above.
As I understand we're basically on the same page, but just
fighting for defaults in DPDK.
>>
>> Given these facts, it would not be quite right to just align the documentation with the de-facto action meaning assumed by OvS.
>
> It's not a "meaning assumed by OvS", it's the original design and the
> main idea of a switchdev based on a common sense.
If so, common sense is not that common :)
My "common sense" says me that PORT_ID action
should route traffic to DPDK ethdev port to be
received by the DPDK application.
>>
>> On 01/06/2021 15:10, Ilya Maximets wrote:
>>> On 6/1/21 1:14 PM, Ivan Malov wrote:
>>>> By its very name, action PORT_ID means that packets hit an ethdev with the
>>>> given DPDK port ID. At least the current comments don't state the opposite.
>>>> That said, since port representors had been adopted, applications like OvS
>>>> have been misusing the action. They misread its purpose as sending packets
>>>> to the opposite end of the "wire" plugged to the given ethdev, for example,
>>>> redirecting packets to the VF itself rather than to its representor ethdev.
>>>> Another example: OvS relies on this action with the admin PF's ethdev port
>>>> ID specified in it in order to send offloaded packets to the physical port.
>>>>
>>>> Since there might be applications which use this action in its valid sense,
>>>> one can't just change the documentation to greenlight the opposite meaning.
>>>> This patch adds an explicit bit to the action configuration which will let
>>>> applications, depending on their needs, leverage the two meanings properly.
>>>> Applications like OvS, as well as PMDs, will have to be corrected when the
>>>> patch has been applied. But the improved clarity of the action is worth it.
>>>>
>>>> The proposed change is not the only option. One could avoid changes in OvS
>>>> and PMDs if the new configuration field had the opposite meaning, with the
>>>> action itself meaning delivery to the represented port and not to DPDK one.
>>>> Alternatively, one could define a brand new action with the said behaviour.
>>>
>>> We had already very similar discussions regarding the understanding of what
>>> the representor really is from the DPDK API's point of view, and the last
>>> time, IIUC, it was concluded by a tech. board that representor should be
>>> a "ghost of a VF", i.e. DPDK APIs should apply configuration by default to
>>> VF and not to the representor device:
>>> https://patches.dpdk.org/project/dpdk/cover/20191029185051.32203-1-thomas@monjalon.net/#104376
>>> This wasn't enforced though, IIUC, for existing code and semantics is still mixed.
>>>
>>> I still think that configuration should be applied to VF, and the same applies
>>> to rte_flow API. IMHO, average application should not care if device is
>>> a VF itself or its representor. Everything should work exactly the same.
>>> I think this matches with the original idea/design of the switchdev functionality
>>> in the linux kernel and also matches with how the average user thinks about
>>> representor devices.
>>>
>>> If some specific use-case requires to distinguish VF from the representor,
>>> there should probably be a separate special API/flag for that.
>>>
>>> Best regards, Ilya Maximets.
>>>
>>
^ permalink raw reply [flat|nested] 40+ messages in thread
* Re: [dpdk-dev] [RFC PATCH] ethdev: clarify flow action PORT ID semantics
2021-06-02 16:26 ` Andrew Rybchenko
@ 2021-06-02 17:35 ` Ilya Maximets
2021-06-02 19:35 ` Ivan Malov
0 siblings, 1 reply; 40+ messages in thread
From: Ilya Maximets @ 2021-06-02 17:35 UTC (permalink / raw)
To: Andrew Rybchenko, Ilya Maximets, Ivan Malov, dev
Cc: Eli Britstein, Smadar Fuks, Hyong Youb Kim, Ori Kam, Jerin Jacob,
John Daley, Thomas Monjalon, Ferruh Yigit
(Dropped Broadcom folks from CC. Mail server refuses to accept their
emails for some reason: "Recipient address rejected: Domain not found."
Please, try to ad them back on reply.)
On 6/2/21 6:26 PM, Andrew Rybchenko wrote:
> On 6/2/21 3:46 PM, Ilya Maximets wrote:
>> On 6/1/21 4:28 PM, Ivan Malov wrote:
>>> Hi Ilya,
>>>
>>> Thank you for reviewing the proposal at such short notice. I'm afraid that prior discussions overlook the simple fact that the whole problem is not limited to just VF representors. Action PORT_ID is also used with respect to the admin PF's ethdev, which "represents itself" (and by no means it represents the underlying physical/network port). In this case, one cannot state that the application treats it as a physical port, just like one states that the application perceives representors as VFs themselves.
>>
>>
>> I don't think that it was overlooked. If device is in a switchdev mode than
>> there is a PF representor and VF representors. Application typically works
>> only with representors in this case is it doesn't make much sense to have
>> representor and the upstream port attached to the same application at the
>> same time. Configuration that is applied by application to the representor
>> (PF or VF, it doesn't matter) applies to the corresponding upstream port
>> (actual PF or VF) by default.
>
> PF is not necessarily associated with a network port. It
> could be many PFs and just one network port on NIC.
> Extra PFs are like VFs in this case. These PFs may be
> passed to a VM in a similar way. So, we can have PF
> representors similar to VF representors. I.e. it is
> incorrect to say that PF in the case of switchdev is
> a representor of a network port.
>
> If we prefer to talk in representors terminology, we
> need 4 types of prepresentors:
> - PF representor for PCIe physical function
> - VF representor for PCIe virtual function
> - SF representor for PCIe sub-function (PASID)
> - network port representor
> In fact above is PCIe oriented, but there are
> other buses and ways to deliver traffic to applications.
> Basically representor for any virtual port in virtual
> switch which DPDK app can control using transfer rules.
>
>> Exactly same thing here with PORT_ID action. You have a packet and action
>> to send it to the port, but it's not specified if HW needs to send it to
>> the representor or the upstream port (again, VF or PF, it doesn't matter).
>> Since there is no extra information, HW should send it to the upstream
>> port by default. The same as configuration applies by default to the
>> upstream port.
>>
>> Let's look at some workflow examples:
>>
>> DPDK Application
>> | |
>> | |
>> +--PF-rep------VF-rep---+
>> | |
>> | NIC (switchdev) |
>> | |
>> +---PF---------VF-------+
>> | |
>> | |
>> External VM or whatever
>> Network
>
> See above. PF <-> External Network is incorrect above
> since it not always the case. It should be
> "NP <-> External network" and "NP-rep" above (NP -
> network port). Sometimes PF is an NP-rep, but sometimes
> it is not. It is just a question of default rules in
> switchdev on what to do with traffic incoming from
> network port.
>
> A bit more complicated picture is:
>
> +----------------------------------------+
> | DPDK Application |
> +----+---------+---------+---------+-----+
> |PF0 |PF1 | |
> | | | |
> +--NP1-rep---NP2-rep---PF2-rep---VF-rep--+
> | |
> | NIC (switchdev) |
> | |
> +---NP1-------NP2-------PF2--------VF----+
> | | | |
> | | | |
> External External VM or VM or
> Network 1 Network 2 whatever whatever
>
> So, sometimes PF plays network port representor role (PF0,
> PF1), sometimes it requires representor itself (PF2).
> What to do if PF2 itself is attached to application?
> Can we route traffic to it using PORT_ID action?
> It has DPDK ethdev port. It is one of arguments why
> plain PORT_ID should route DPDK application.
OK. This is not very different from my understanding. The key
is that there is a pair of interfaces, one is more visible than
the other one.
>
> Of course, some applications would like to see it as
> (simpler is better):
>
> +----------------------------------------+
> | DPDK Application |
> | |
> +---PF0-------PF1------PF2-rep---VF-rep--+
> | | | |
> | | | |
> External External VM or VM or
> Network 1 Network 2 whatever whatever
>
> but some, I believe, require full picture. For examples,
> I'd really like to know how much traffic goes via all 8
> switchdev ports and running rte_eth_stats_get(0, ...)
> (i.e. DPDK port 0 attached to PF0) I'd like to get
> NP1-rep stats (not NP1 stats). It will match exactly
> what I see in DPDK application. It is an argument why
> plain PORT_ID should be treated as a DPDK ethdev port,
> not a represented (upstream) entity.
The point is that if application doesn't require full picture,
it should not care. If application requires the full picture,
it could take extra steps by setting extra bits. I don't
understand why we need to force all applications to care about
the full picture if we can avoid that?
>
>> a. Workflow for "DPDK Application" to set MAC to VF:
>>
>> 1. "DPDK Application" calls rte_set_etheraddr("VF-rep", new_mac);
>> 2. DPDK sets MAC for "VF".
>>
>> b. Workflow for "DPDK Application" to set MAC to PF:
>>
>> 1. "DPDK Application" calls rte_set_etheraddr("PF-rep", new_mac);
>> 2. DPDK sets MAC for "PF".
>>
>> c. Workflow for "DPDK Application" to send packet to the external network:
>>
>> 1. "DPDK Application" calls rte_eth_tx_burst("PF-rep", packet);
>> 2. NIC receives the packet from "PF-rep" and sends it to "PF".
>> 3. packet egresses to the external network from "PF".
>>
>> d. Workflow for "DPDK Application" to send packet to the "VM or whatever":
>>
>> 1. "DPDK Application" calls rte_eth_tx_burst("VF-rep", packet);
>> 2. NIC receives the packet from "VF-rep" and sends it to "VF".
>> 3. "VM or whatever" receives the packet from "VF".
>>
>> In two workflows above there is no rte_flow processing on step 2, i.e.,
>> NIC does not perform any lookups/matches/actions, because it's not possible
>> to configure actions for packets received from "PF-rep" or
>> "VF-rep" as these ports doesn't own a port id and all the configuration
>> and rte_flow actions translated and applied for the devices that these
>> ports represents ("PF" and "VF") and not representors themselves ("PF-rep"
>> or "VF-rep").
>>
>> e. Workflow for the packet received on PF and PORT_ID action:
>>
>> 1. "DPDK Application" configures rte_flow for all packets from "PF-rep"
>> to execute PORT_ID "VF-rep".
>> 2. NIC receives packet on "PF".
>> 3. NIC executes 'PORT_ID "VF-rep"' action by sending packet to "VF".
>> 4. "VM or whatever" receives the packet from "VF".
>>
>> f. Workflow for the packet received on VF and PORT_ID action:
>>
>> 1. "DPDK Application" configures rte_flow for all packets from "VF-rep"
>> to execute 'PORT_ID "PF-rep"'.
>> 2. NIC receives packet on "VF".
>> 3. NIC executes 'PORT_ID "PF-rep"' action by sending packet to "PF".
>> 4. Packet egresses from the "PF" to the external network.
>>
>> Above is what, IMHO, the logic should look like and this matches with
>> the overall switchdev design in kernel.
>>
>> I understand that this logic could seem flipped-over from the HW point
>> of view, but it's perfectly logical from the user's perspective, because
>> user should not care if the application works with representors or
>> some real devices. If application configures that all packets from port
>> A should be sent to port B, user will expect that these packets will
>> egress from port B once received from port A. That will be highly
>> inconvenient if the packet will ingress from port B back to the
>> application instead.
>>
>> DPDK Application
>> | |
>> | |
>> port A port B
>> | |
>> *****MAGIC*****
>> | |
>> External Another Network
>> Network or VM or whatever
>>
>> It should not matter if there is an extra layer between ports A and B
>> and the external network and VM. Everything should work in exactly the
>> same way, transparently for the application.
>>
>> The point of hardware offloading, and therefore rte_flow API, is to take
>> what user does in software and make this "magically" work in hardware in
>> the exactly same way. And this will be broken if user will have to
>> use different logic based on the mode the hardware works in, i.e. based on
>> the fact if the application works with ports or their representors.
>>
>> If some specific use case requires application to know if it's an
>> upstream port or the representor and demystify the internals of the switchdev
>> NIC, there should be a different port id for the representor itself that
>> could be used in all DPDK APIs including rte_flow API or a special bit for
>> that matter. IIRC, there was an idea to add a bit directly to the port_id
>> for that purpose that will flip over behavior in all the workflow scenarios
>> that I described above.
>
> As I understand we're basically on the same page, but just
> fighting for defaults in DPDK.
Yep.
>
>>>
>>> Given these facts, it would not be quite right to just align the documentation with the de-facto action meaning assumed by OvS.
>>
>> It's not a "meaning assumed by OvS", it's the original design and the
>> main idea of a switchdev based on a common sense.
>
> If so, common sense is not that common :)
> My "common sense" says me that PORT_ID action
> should route traffic to DPDK ethdev port to be
> received by the DPDK application.
By this logic rte_eth_tx_burst("VF-rep", packet) should send a packet
to "VF-rep", i.e. this packet will be received back by the application
on this same interface. But that is counter-intuitive and this is not
how it works in linux kernel if you're opening socket and sending a
packet to the "VF-rep" network interface.
And if rte_eth_tx_burst("VF-rep", packet) sends packet to "VF" and not
to "VF-rep", than I don't understand why PORT_ID action should work in
the opposite way.
Application receives a packet from port A and puts it to the port B.
TC rule to forward packets from port A to port B will provide same result.
So, why the similar rte_flow should do the opposite and send the packet
back to the application?
>
>>>
>>> On 01/06/2021 15:10, Ilya Maximets wrote:
>>>> On 6/1/21 1:14 PM, Ivan Malov wrote:
>>>>> By its very name, action PORT_ID means that packets hit an ethdev with the
>>>>> given DPDK port ID. At least the current comments don't state the opposite.
>>>>> That said, since port representors had been adopted, applications like OvS
>>>>> have been misusing the action. They misread its purpose as sending packets
>>>>> to the opposite end of the "wire" plugged to the given ethdev, for example,
>>>>> redirecting packets to the VF itself rather than to its representor ethdev.
>>>>> Another example: OvS relies on this action with the admin PF's ethdev port
>>>>> ID specified in it in order to send offloaded packets to the physical port.
>>>>>
>>>>> Since there might be applications which use this action in its valid sense,
>>>>> one can't just change the documentation to greenlight the opposite meaning.
>>>>> This patch adds an explicit bit to the action configuration which will let
>>>>> applications, depending on their needs, leverage the two meanings properly.
>>>>> Applications like OvS, as well as PMDs, will have to be corrected when the
>>>>> patch has been applied. But the improved clarity of the action is worth it.
>>>>>
>>>>> The proposed change is not the only option. One could avoid changes in OvS
>>>>> and PMDs if the new configuration field had the opposite meaning, with the
>>>>> action itself meaning delivery to the represented port and not to DPDK one.
>>>>> Alternatively, one could define a brand new action with the said behaviour.
>>>>
>>>> We had already very similar discussions regarding the understanding of what
>>>> the representor really is from the DPDK API's point of view, and the last
>>>> time, IIUC, it was concluded by a tech. board that representor should be
>>>> a "ghost of a VF", i.e. DPDK APIs should apply configuration by default to
>>>> VF and not to the representor device:
>>>> https://patches.dpdk.org/project/dpdk/cover/20191029185051.32203-1-thomas@monjalon.net/#104376
>>>> This wasn't enforced though, IIUC, for existing code and semantics is still mixed.
>>>>
>>>> I still think that configuration should be applied to VF, and the same applies
>>>> to rte_flow API. IMHO, average application should not care if device is
>>>> a VF itself or its representor. Everything should work exactly the same.
>>>> I think this matches with the original idea/design of the switchdev functionality
>>>> in the linux kernel and also matches with how the average user thinks about
>>>> representor devices.
>>>>
>>>> If some specific use-case requires to distinguish VF from the representor,
>>>> there should probably be a separate special API/flag for that.
>>>>
>>>> Best regards, Ilya Maximets.
>>>>
>>>
>
^ permalink raw reply [flat|nested] 40+ messages in thread
* Re: [dpdk-dev] [RFC PATCH] ethdev: clarify flow action PORT ID semantics
2021-06-02 17:35 ` Ilya Maximets
@ 2021-06-02 19:35 ` Ivan Malov
2021-06-03 9:29 ` Ilya Maximets
0 siblings, 1 reply; 40+ messages in thread
From: Ivan Malov @ 2021-06-02 19:35 UTC (permalink / raw)
To: Ilya Maximets, Andrew Rybchenko, dev
Cc: Eli Britstein, Smadar Fuks, Hyong Youb Kim, Ori Kam, Jerin Jacob,
John Daley, Thomas Monjalon, Ferruh Yigit
On 02/06/2021 20:35, Ilya Maximets wrote:
> (Dropped Broadcom folks from CC. Mail server refuses to accept their
> emails for some reason: "Recipient address rejected: Domain not found."
> Please, try to ad them back on reply.)
>
> On 6/2/21 6:26 PM, Andrew Rybchenko wrote:
>> On 6/2/21 3:46 PM, Ilya Maximets wrote:
>>> On 6/1/21 4:28 PM, Ivan Malov wrote:
>>>> Hi Ilya,
>>>>
>>>> Thank you for reviewing the proposal at such short notice. I'm afraid that prior discussions overlook the simple fact that the whole problem is not limited to just VF representors. Action PORT_ID is also used with respect to the admin PF's ethdev, which "represents itself" (and by no means it represents the underlying physical/network port). In this case, one cannot state that the application treats it as a physical port, just like one states that the application perceives representors as VFs themselves.
>>>
>>>
>>> I don't think that it was overlooked. If device is in a switchdev mode than
>>> there is a PF representor and VF representors. Application typically works
>>> only with representors in this case is it doesn't make much sense to have
>>> representor and the upstream port attached to the same application at the
>>> same time. Configuration that is applied by application to the representor
>>> (PF or VF, it doesn't matter) applies to the corresponding upstream port
>>> (actual PF or VF) by default.
>>
>> PF is not necessarily associated with a network port. It
>> could be many PFs and just one network port on NIC.
>> Extra PFs are like VFs in this case. These PFs may be
>> passed to a VM in a similar way. So, we can have PF
>> representors similar to VF representors. I.e. it is
>> incorrect to say that PF in the case of switchdev is
>> a representor of a network port.
>>
>> If we prefer to talk in representors terminology, we
>> need 4 types of prepresentors:
>> - PF representor for PCIe physical function
>> - VF representor for PCIe virtual function
>> - SF representor for PCIe sub-function (PASID)
>> - network port representor
>> In fact above is PCIe oriented, but there are
>> other buses and ways to deliver traffic to applications.
>> Basically representor for any virtual port in virtual
>> switch which DPDK app can control using transfer rules.
>>
>>> Exactly same thing here with PORT_ID action. You have a packet and action
>>> to send it to the port, but it's not specified if HW needs to send it to
>>> the representor or the upstream port (again, VF or PF, it doesn't matter).
>>> Since there is no extra information, HW should send it to the upstream
>>> port by default. The same as configuration applies by default to the
>>> upstream port.
>>>
>>> Let's look at some workflow examples:
>>>
>>> DPDK Application
>>> | |
>>> | |
>>> +--PF-rep------VF-rep---+
>>> | |
>>> | NIC (switchdev) |
>>> | |
>>> +---PF---------VF-------+
>>> | |
>>> | |
>>> External VM or whatever
>>> Network
>>
>> See above. PF <-> External Network is incorrect above
>> since it not always the case. It should be
>> "NP <-> External network" and "NP-rep" above (NP -
>> network port). Sometimes PF is an NP-rep, but sometimes
>> it is not. It is just a question of default rules in
>> switchdev on what to do with traffic incoming from
>> network port.
>>
>> A bit more complicated picture is:
>>
>> +----------------------------------------+
>> | DPDK Application |
>> +----+---------+---------+---------+-----+
>> |PF0 |PF1 | |
>> | | | |
>> +--NP1-rep---NP2-rep---PF2-rep---VF-rep--+
>> | |
>> | NIC (switchdev) |
>> | |
>> +---NP1-------NP2-------PF2--------VF----+
>> | | | |
>> | | | |
>> External External VM or VM or
>> Network 1 Network 2 whatever whatever
>>
>> So, sometimes PF plays network port representor role (PF0,
>> PF1), sometimes it requires representor itself (PF2).
>> What to do if PF2 itself is attached to application?
>> Can we route traffic to it using PORT_ID action?
>> It has DPDK ethdev port. It is one of arguments why
>> plain PORT_ID should route DPDK application.
>
> OK. This is not very different from my understanding. The key
> is that there is a pair of interfaces, one is more visible than
> the other one.
>
>>
>> Of course, some applications would like to see it as
>> (simpler is better):
>>
>> +----------------------------------------+
>> | DPDK Application |
>> | |
>> +---PF0-------PF1------PF2-rep---VF-rep--+
>> | | | |
>> | | | |
>> External External VM or VM or
>> Network 1 Network 2 whatever whatever
>>
>> but some, I believe, require full picture. For examples,
>> I'd really like to know how much traffic goes via all 8
>> switchdev ports and running rte_eth_stats_get(0, ...)
>> (i.e. DPDK port 0 attached to PF0) I'd like to get
>> NP1-rep stats (not NP1 stats). It will match exactly
>> what I see in DPDK application. It is an argument why
>> plain PORT_ID should be treated as a DPDK ethdev port,
>> not a represented (upstream) entity.
>
> The point is that if application doesn't require full picture,
> it should not care. If application requires the full picture,
> it could take extra steps by setting extra bits. I don't
> understand why we need to force all applications to care about
> the full picture if we can avoid that?
>
>>
>>> a. Workflow for "DPDK Application" to set MAC to VF:
>>>
>>> 1. "DPDK Application" calls rte_set_etheraddr("VF-rep", new_mac);
>>> 2. DPDK sets MAC for "VF".
>>>
>>> b. Workflow for "DPDK Application" to set MAC to PF:
>>>
>>> 1. "DPDK Application" calls rte_set_etheraddr("PF-rep", new_mac);
>>> 2. DPDK sets MAC for "PF".
>>>
>>> c. Workflow for "DPDK Application" to send packet to the external network:
>>>
>>> 1. "DPDK Application" calls rte_eth_tx_burst("PF-rep", packet);
>>> 2. NIC receives the packet from "PF-rep" and sends it to "PF".
>>> 3. packet egresses to the external network from "PF".
>>>
>>> d. Workflow for "DPDK Application" to send packet to the "VM or whatever":
>>>
>>> 1. "DPDK Application" calls rte_eth_tx_burst("VF-rep", packet);
>>> 2. NIC receives the packet from "VF-rep" and sends it to "VF".
>>> 3. "VM or whatever" receives the packet from "VF".
>>>
>>> In two workflows above there is no rte_flow processing on step 2, i.e.,
>>> NIC does not perform any lookups/matches/actions, because it's not possible
>>> to configure actions for packets received from "PF-rep" or
>>> "VF-rep" as these ports doesn't own a port id and all the configuration
>>> and rte_flow actions translated and applied for the devices that these
>>> ports represents ("PF" and "VF") and not representors themselves ("PF-rep"
>>> or "VF-rep").
>>>
>>> e. Workflow for the packet received on PF and PORT_ID action:
>>>
>>> 1. "DPDK Application" configures rte_flow for all packets from "PF-rep"
>>> to execute PORT_ID "VF-rep".
>>> 2. NIC receives packet on "PF".
>>> 3. NIC executes 'PORT_ID "VF-rep"' action by sending packet to "VF".
>>> 4. "VM or whatever" receives the packet from "VF".
>>>
>>> f. Workflow for the packet received on VF and PORT_ID action:
>>>
>>> 1. "DPDK Application" configures rte_flow for all packets from "VF-rep"
>>> to execute 'PORT_ID "PF-rep"'.
>>> 2. NIC receives packet on "VF".
>>> 3. NIC executes 'PORT_ID "PF-rep"' action by sending packet to "PF".
>>> 4. Packet egresses from the "PF" to the external network.
>>>
>>> Above is what, IMHO, the logic should look like and this matches with
>>> the overall switchdev design in kernel.
>>>
>>> I understand that this logic could seem flipped-over from the HW point
>>> of view, but it's perfectly logical from the user's perspective, because
>>> user should not care if the application works with representors or
>>> some real devices. If application configures that all packets from port
>>> A should be sent to port B, user will expect that these packets will
>>> egress from port B once received from port A. That will be highly
>>> inconvenient if the packet will ingress from port B back to the
>>> application instead.
>>>
>>> DPDK Application
>>> | |
>>> | |
>>> port A port B
>>> | |
>>> *****MAGIC*****
>>> | |
>>> External Another Network
>>> Network or VM or whatever
>>>
>>> It should not matter if there is an extra layer between ports A and B
>>> and the external network and VM. Everything should work in exactly the
>>> same way, transparently for the application.
>>>
>>> The point of hardware offloading, and therefore rte_flow API, is to take
>>> what user does in software and make this "magically" work in hardware in
>>> the exactly same way. And this will be broken if user will have to
>>> use different logic based on the mode the hardware works in, i.e. based on
>>> the fact if the application works with ports or their representors.
>>>
>>> If some specific use case requires application to know if it's an
>>> upstream port or the representor and demystify the internals of the switchdev
>>> NIC, there should be a different port id for the representor itself that
>>> could be used in all DPDK APIs including rte_flow API or a special bit for
>>> that matter. IIRC, there was an idea to add a bit directly to the port_id
>>> for that purpose that will flip over behavior in all the workflow scenarios
>>> that I described above.
>>
>> As I understand we're basically on the same page, but just
>> fighting for defaults in DPDK.
>
> Yep.
>
>>
>>>>
>>>> Given these facts, it would not be quite right to just align the documentation with the de-facto action meaning assumed by OvS.
>>>
>>> It's not a "meaning assumed by OvS", it's the original design and the
>>> main idea of a switchdev based on a common sense.
>>
>> If so, common sense is not that common :)
>> My "common sense" says me that PORT_ID action
>> should route traffic to DPDK ethdev port to be
>> received by the DPDK application.
>
> By this logic rte_eth_tx_burst("VF-rep", packet) should send a packet
> to "VF-rep", i.e. this packet will be received back by the application
> on this same interface. But that is counter-intuitive and this is not
> how it works in linux kernel if you're opening socket and sending a
> packet to the "VF-rep" network interface.
>
> And if rte_eth_tx_burst("VF-rep", packet) sends packet to "VF" and not
> to "VF-rep", than I don't understand why PORT_ID action should work in
> the opposite way.
There's no contradiction here.
In rte_eth_tx_burst(X, packet) example, "X" is the port which the
application sits on and from where it sends the packet. In other words,
it's the point where the packet originates from, and not where it goes to.
At the same time, flow *action* PORT_ID (ID = "X") is clearly the
opposite: it specifies where the packet will go. Port ID is the
characteristic of a DPDK ethdev. So the packet goes *to* an ethdev with
the given ID ("X").
Perhaps consider action PHY_PORT: the index is the characteristic of the
network port. The packet goes *to* network through this NP. And not the
opposite way. Hopefully, nobody is going to claim that action PHY_PORT
should mean re-injecting the packet back to the HW flow engine "as if it
just came from the network port". Then why does one try to skew the
PORT_ID meaning this way? PORT_ID points to an ethdev - the packet goes
*to* the ethdev. Isn't that simple?
>
> Application receives a packet from port A and puts it to the port B.
> TC rule to forward packets from port A to port B will provide same result.
> So, why the similar rte_flow should do the opposite and send the packet
> back to the application?
Please see above. Action VF sends the packet *to* VF and *not* to the
upstream entity which this VF is connected to. Action PHY_PORT sends the
packet *to* network and does *not* make it appear as if it entered the
NIC from the network side. Action QUEUE sends the packet *to* the Rx
queue and does *not* make it appear as if it just egressed from the Tx
queue with the same index. Action PORT_ID sends the packet *to* an
ethdev with the given ID and *not* to the upstream entity which this
ethdev is connected to. It's just that transparent. It's just "do what
the name suggests".
Yes, an application (say, OvS) might have a high level design which
perceives the "high-level" ports plugged to it as a "patch-panel" of
sorts. Yes, when a high-level mechanism/logic of such application
invokes a *datapath-unaware* wrapper to offload a rule and request that
the packet be delivered to the given "high-level" port, it therefore
requests that the packet be delivered to the opposite end of the wire.
But then the lower-level datapath-specific (DPDK) handler kicks in.
Since it's DPDK-specific, it knows *everything* about the underlying
flow library it works with. In particular it knows that action PORT_ID
delivers the packet to an *ethdev*, at the same time, it knows that the
upper caller (high-level logic) for sure wants the opposite, so it (the
lower-level DPDK component) sets the "upstream" bit when translating the
higher-level port action to an RTE action "PORT_ID". Then the resulting
action is correct, and the packet indeed doesn't end up in the ethdev
but goes to the opposite end of the wire. That's it.
I have an impression that for some reason people are tempted to ignore
the two nominal "layers" in such applications (generic, or high-level
one and DPDK-specific one) thus trying to align DPDK logic with
high-level logic of the applications. That's simply not right. What I'm
trying to point out is that it *is* the true job of DPDK-specific data
path handler in such application - to properly translate generic flow
actions to DPDK-specific ones. It's the duty of DPDK component in such
applications to be aware of the genuine meaning of action PORT_ID.
This way, mixing up the two meanings is ruled out.
>
>>
>>>>
>>>> On 01/06/2021 15:10, Ilya Maximets wrote:
>>>>> On 6/1/21 1:14 PM, Ivan Malov wrote:
>>>>>> By its very name, action PORT_ID means that packets hit an ethdev with the
>>>>>> given DPDK port ID. At least the current comments don't state the opposite.
>>>>>> That said, since port representors had been adopted, applications like OvS
>>>>>> have been misusing the action. They misread its purpose as sending packets
>>>>>> to the opposite end of the "wire" plugged to the given ethdev, for example,
>>>>>> redirecting packets to the VF itself rather than to its representor ethdev.
>>>>>> Another example: OvS relies on this action with the admin PF's ethdev port
>>>>>> ID specified in it in order to send offloaded packets to the physical port.
>>>>>>
>>>>>> Since there might be applications which use this action in its valid sense,
>>>>>> one can't just change the documentation to greenlight the opposite meaning.
>>>>>> This patch adds an explicit bit to the action configuration which will let
>>>>>> applications, depending on their needs, leverage the two meanings properly.
>>>>>> Applications like OvS, as well as PMDs, will have to be corrected when the
>>>>>> patch has been applied. But the improved clarity of the action is worth it.
>>>>>>
>>>>>> The proposed change is not the only option. One could avoid changes in OvS
>>>>>> and PMDs if the new configuration field had the opposite meaning, with the
>>>>>> action itself meaning delivery to the represented port and not to DPDK one.
>>>>>> Alternatively, one could define a brand new action with the said behaviour.
>>>>>
>>>>> We had already very similar discussions regarding the understanding of what
>>>>> the representor really is from the DPDK API's point of view, and the last
>>>>> time, IIUC, it was concluded by a tech. board that representor should be
>>>>> a "ghost of a VF", i.e. DPDK APIs should apply configuration by default to
>>>>> VF and not to the representor device:
>>>>> https://patches.dpdk.org/project/dpdk/cover/20191029185051.32203-1-thomas@monjalon.net/#104376
>>>>> This wasn't enforced though, IIUC, for existing code and semantics is still mixed.
>>>>>
>>>>> I still think that configuration should be applied to VF, and the same applies
>>>>> to rte_flow API. IMHO, average application should not care if device is
>>>>> a VF itself or its representor. Everything should work exactly the same.
>>>>> I think this matches with the original idea/design of the switchdev functionality
>>>>> in the linux kernel and also matches with how the average user thinks about
>>>>> representor devices.
>>>>>
>>>>> If some specific use-case requires to distinguish VF from the representor,
>>>>> there should probably be a separate special API/flag for that.
>>>>>
>>>>> Best regards, Ilya Maximets.
>>>>>
>>>>
>>
--
Ivan M
^ permalink raw reply [flat|nested] 40+ messages in thread
* Re: [dpdk-dev] [RFC PATCH] ethdev: clarify flow action PORT ID semantics
2021-06-02 12:36 ` Ivan Malov
@ 2021-06-03 9:18 ` Ori Kam
2021-06-03 9:55 ` Andrew Rybchenko
0 siblings, 1 reply; 40+ messages in thread
From: Ori Kam @ 2021-06-03 9:18 UTC (permalink / raw)
To: Ivan Malov, Eli Britstein, Andrew Rybchenko, Ilya Maximets, dev
Cc: Smadar Fuks, Hyong Youb Kim, Kishore Padmanabha, Ajit Khaparde,
Jerin Jacob, John Daley, NBU-Contact-Thomas Monjalon,
Ferruh Yigit
Hi All,
> -----Original Message-----
> From: Ivan Malov <Ivan.Malov@oktetlabs.ru>
>
> On 02/06/2021 14:21, Eli Britstein wrote:
> >
> > On 6/2/2021 1:50 PM, Andrew Rybchenko wrote:
> >> External email: Use caution opening links or attachments
> >>
> >>
> >> On 6/2/21 12:57 PM, Eli Britstein wrote:
> >>> On 6/1/2021 5:53 PM, Andrew Rybchenko wrote:
> >>>> External email: Use caution opening links or attachments
> >>>>
> >>>>
> >>>> On 6/1/21 5:44 PM, Eli Britstein wrote:
> >>>>> On 6/1/2021 5:35 PM, Andrew Rybchenko wrote:
> >>>>>> External email: Use caution opening links or attachments
> >>>>>>
> >>>>>>
> >>>>>> On 6/1/21 4:24 PM, Eli Britstein wrote:
> >>>>>>> On 6/1/2021 3:10 PM, Ilya Maximets wrote:
> >>>>>>>> External email: Use caution opening links or attachments
> >>>>>>>>
> >>>>>>>>
> >>>>>>>> On 6/1/21 1:14 PM, Ivan Malov wrote:
> >>>>>>>>> By its very name, action PORT_ID means that packets hit an
> >>>>>>>>> ethdev with the given DPDK port ID. At least the current
> >>>>>>>>> comments don't state the opposite.
> >>>>>>>>> That said, since port representors had been adopted,
> >>>>>>>>> applications like OvS have been misusing the action. They
> >>>>>>>>> misread its purpose as sending packets to the opposite end of
> >>>>>>>>> the "wire" plugged to the given ethdev, for example,
> >>>>>>>>> redirecting packets to the VF itself rather than to its
> >>>>>>>>> representor ethdev.
Sorry but OVS got it right, this is the idea to send packet to the VF not to the representor,
I think that our first discussion should be what is a representor,
I know that there are a lot threads about it but it is steel unclear.
From my understanding representor is a shadow of a VF
This shadow has two functionalities:
1. data
It should receive any packet that was sent from the VF and was not
routed to any other destination. And vise versa any traffic sent on the representor.
should arrive to the corresponding VF.
What use case do you see for sending a packet to the representor?
2. control
allow to modify the VF from DPDK application.
Regarding the 1 point of the data, I don't see any sense if routing traffic to representor.
While on point 2 control their maybe some cases that we want to configure the representor itself
and not the VF for example changing mtu.
> >>>>>>>>> Another example: OvS relies on this action with the admin PF's
> >>>>>>>>> ethdev port ID specified in it in order to send offloaded
> >>>>>>>>> packets to the physical port.
> >>>>>>>>>
> >>>>>>>>> Since there might be applications which use this action in its
> >>>>>>>>> valid sense, one can't just change the documentation to
> >>>>>>>>> greenlight the opposite meaning.
> >>>>>>>>> This patch adds an explicit bit to the action configuration
> >>>>>>>>> which will let applications, depending on their needs,
> >>>>>>>>> leverage the two meanings properly.
> >>>>>>>>> Applications like OvS, as well as PMDs, will have to be
> >>>>>>>>> corrected when the patch has been applied. But the improved
> >>>>>>>>> clarity of the action is worth it.
> >>>>>>>>>
> >>>>>>>>> The proposed change is not the only option. One could avoid
> >>>>>>>>> changes in OvS and PMDs if the new configuration field had the
> >>>>>>>>> opposite meaning, with the action itself meaning delivery to
> >>>>>>>>> the represented port and not to DPDK one.
> >>>>>>>>> Alternatively, one could define a brand new action with the
> >>>>>>>>> said behaviour.
> >>>>>>> It doesn't make any sense to attach the VF itself to OVS, but
> >>>>>>> only its representor.
> >>>>>> OvS is not the only DPDK application.
> >>>>> True. It is just the focus of this commit message is OVS.
> >>>>>>> For the PF, when in switchdev mode, it is the "uplink
> >>>>>>> representor", so it is also a representor.
> >>>>>> Strictly speaking it is not a representor from DPDK point of
> >>>>>> view. E.g. representors have corresponding flag set which is
> >>>>>> definitely clear in the case of PF.
> >>>>> This is the per-PMD responsibility. The API should not care.
> >>>>>>> That said, OVS does not care of the type of the port. It doesn't
> >>>>>>> matter if it's an "upstream" or not, or if it's a representor or
> >>>>>>> not.
> >>>>>> Yes, it is clear, but let's put OvS aside. Let's consider a DPDK
> >>>>>> application which has a number of ethdev port. Some may belong to
> >>>>>> single switch domain, some may be from different switch domains
> >>>>>> (i.e. different NICs). Can I use PORT_ID action to redirect
> >>>>>> ingress traffic to a specified ethdev port using PORT_ID action?
> >>>>>> It looks like no, but IMHO it is the definition of the PORT_ID
> >>>>>> action.
> >>>>> Let's separate API from implementation. By API point of view, yes,
> >>>>> the user may request it. Nothing wrong with it.
> >>>>>
> >>>>> From implementation point of view - yes, it might fail, but not
> >>>>> for sure, even if on different NICs. Maybe the HW of a certain
> >>>>> vendor has the capability to do it?
> >>>>>
> >>>>> We can't know, so I think the API should allow it.
> >>>> Hold on. What should it allow? It is two opposite meanings:
> >>>> 1. Direct traffic to DPDK ethdev port specified using ID to be
> >>>> received and processed by the DPDK application.
> >>>> 2. Direct traffic to an upstream port represented by the
> >>>> DPDK port.
> >>>>
> >>>> The patch tries to address the ambiguity, misuse it in OvS (from my
> >>>> point of view in accordance with the action documentation),
> >>>> mis-implementation in a number of PMDs (to work in OvS) and tries
> >>>> to sort it out with an explanation why proposed direction is
> >>>> chosen. I realize that it could be painful, but IMHO it is the best
> >>>> option here. Yes, it is a point to discuss.
> >>>>
> >>>> To start with we should agree that that problem exists.
> >>>> Second, we should agree on direction how to solve it.
> >>> I agree. Suppose port 0 is the PF, and port 1 is a VF representor.
> >>>
> >>> IIUC, there are two options:
> >>>
> >>> 1. flow create 1 ingress transfer pattern eth / end action port_id
> >>> id 0 upstream 1 / end
> >>>
What is the meaning of upstream if I want to send traffic between VFs?
> >>> 2. flow create 1 ingress transfer pattern eth / end action port_id
> >>> id 0 upstream 0 / end
> >>>
> >>> [1] is the same behavior as today.
> >>>
> >>> [2] is a new behavior, the packet received by port 0 as if it
> >>> arrived from the wire.
> >>>
> >>> Then, let's have more:
> >>>
> >>> 3. flow create 0 ingress transfer pattern eth / end action port_id
> >>> id 1 upstream 1 / end
> >>>
> >>> 4. flow create 0 ingress transfer pattern eth / end action port_id
> >>> id 1 upstream 0 / end
> >>>
> >>> if we have [2] and [4], the packet going from the VF will hit [2],
> >>> then hit [4] and then [2] again in an endless loop?
> >> As I understand PORT_ID is a fate action. So, no more lookups are
> >> done. If the packet is loop back from applications, loop is possible.
> >
> > I referred a HW loop, not SW. For example with JUMP action (also fate):
> >
> > flow create 0 group 0 ingress transfer pattern eth / end action jump
> > group 1 / end
> >
> > flow create 0 group 1 ingress transfer pattern eth / end action jump
> > group 0 / end
> >
> >>
> >> In fact, it is a good question if "flow creare 0 ingress transfer" or
> >> "flow create 1 ingress transfer" assume any implicit filtering. I
> >> always thought that no.
This is a matter for discussion but currently we do have implicit filtering
on the source VF
> >> i.e. if we have two network ports rule like
> >> flow create 0 ingress transfer pattern eth / end \
> >> action port_id id 1 upstream 1 / end will match packets
> >> incoming from any port into the switch (network port 0, network port
> >> 1, VF or PF itself (???)).
> >> The topic also requires explicit clarification.
> > rte_flow is port based. It implicitly filters only packets for the
> > provided port (0).
> >
> > Maybe need to clarify documentation and have a "no filtering" API if
> > needed.
Maybe
>
> We've come across the following bits in the current documentation with
> respect to attribute "transfer", quote:
>
> "Instead of simply matching the properties of traffic as it would appear on a
> given DPDK port ID, enabling this attribute transfers a flow rule to the lowest
> possible level of any device endpoints found in the pattern.
>
> When supported, this effectively enables an application to reroute traffic not
> necessarily intended for it (e.g. coming from or addressed to different
> physical ports, VFs or applications) at the device level".
>
> (https://doc.dpdk.org/guides/prog_guide/rte_flow.html#attributes)
>
> Since action PORT_ID hardly makes sense without attribute "transfer"
> (unless it doesn't point to the same ethdev as the one used to submit the
> flow), this paragraph effectively states that in this particular case API
> "flow_create" is not (necessarily) port-based.
>
I'm not sure I understand what you mean port based.
> >
> >> PF itself is really a hard question because of "ingress"
> >> since traffic from PF is a traffic from DPDK application and it is
> >> egress, not ingress.
> >
> > Ingress means the direction. Hit on packets otherwise provided to the
> > SW by rte_eth_rx_burst().
> >
> > Same goes for the PF. Packets by rte_eth_rx_burst are the ones
> > arriving from the wire, so ingress is that direction and egress is from the
> app.
> >
> >>
> >> I think that port ID used to created flow rule should not apply any
> >> filtering in the case of transfer since we have corresponding items
> >> to do it explicitly. If we do it implicitly as well, we need some
> >> priorities and a way to avoid implicit rules which makes things much
> >> harder to understand and implement.
> >
I agree with your point, I think the application should state exactly what it wants.
Best,
Ori
> > If "upstream 0" means what I thought it means (comments?) maybe a
> > better way to do it is expose another port for that, so there will be 2 "PF"
> > ports - one as the wire representor and the other one as the "PF" (or
> > clearer naming...).
> >
> > This would be a vendor decision, and there would be no need to change
> > PORT_ID API.
> >
> >>
> >>> If this is your meaning, maybe what you are looking for is an action
> >>> to change the in_port and continue processing?
> >>>
> >>> Please comment on the examples I gave or clarify the use case you
> >>> are trying to do.
> >>>
> >>>
> >>> Thanks,
> >>>
> >>> Eli
> >>>
> >>>>>>>> We had already very similar discussions regarding the
> >>>>>>>> understanding of what the representor really is from the DPDK
> >>>>>>>> API's point of view, and the last time, IIUC, it was concluded
> >>>>>>>> by a tech. board that representor should be a "ghost of a VF",
> >>>>>>>> i.e. DPDK APIs should apply configuration by default to VF and
> >>>>>>>> not to the representor device:
> >>>>>>>>
> >>>>>>>>
> https://patches.dpdk.org/project/dpdk/cover/20191029185051.3220
> >>>>>>>> 3-1-thomas@monjalon.net/#104376
> >>>>>>>>
> >>>>>>>>
> >>>>>>>>
> >>>>>>>>
> >>>>>>>> This wasn't enforced though, IIUC, for existing code and
> >>>>>>>> semantics is still mixed.
> >>>>>>> I am not sure how this is related.
> >>>>>>>> I still think that configuration should be applied to VF, and
> >>>>>>>> the same applies to rte_flow API. IMHO, average application
> >>>>>>>> should not care if device is a VF itself or its representor.
> >>>>>>>> Everything should work exactly the same.
> >>>>>>>> I think this matches with the original idea/design of the
> >>>>>>>> switchdev functionality in the linux kernel and also matches
> >>>>>>>> with how the average user thinks about representor devices.
> >>>>>>> Right. This is the way representors work. It is fully aligned
> >>>>>>> with configuration of OVS-kernel.
> >>>>>>>> If some specific use-case requires to distinguish VF from the
> >>>>>>>> representor, there should probably be a separate special
> >>>>>>>> API/flag for that.
> >>>>>>>>
> >>>>>>>> Best regards, Ilya Maximets.
>
> --
> Ivan M
^ permalink raw reply [flat|nested] 40+ messages in thread
* Re: [dpdk-dev] [RFC PATCH] ethdev: clarify flow action PORT ID semantics
2021-06-02 19:35 ` Ivan Malov
@ 2021-06-03 9:29 ` Ilya Maximets
2021-06-03 10:33 ` Andrew Rybchenko
2021-06-03 11:29 ` Ivan Malov
0 siblings, 2 replies; 40+ messages in thread
From: Ilya Maximets @ 2021-06-03 9:29 UTC (permalink / raw)
To: Ivan Malov, Ilya Maximets, Andrew Rybchenko, dev
Cc: Eli Britstein, Smadar Fuks, Hyong Youb Kim, Ori Kam, Jerin Jacob,
John Daley, Thomas Monjalon, Ferruh Yigit
On 6/2/21 9:35 PM, Ivan Malov wrote:
> On 02/06/2021 20:35, Ilya Maximets wrote:
>> (Dropped Broadcom folks from CC. Mail server refuses to accept their
>> emails for some reason: "Recipient address rejected: Domain not found."
>> Please, try to ad them back on reply.)
>>
>> On 6/2/21 6:26 PM, Andrew Rybchenko wrote:
>>> On 6/2/21 3:46 PM, Ilya Maximets wrote:
>>>> On 6/1/21 4:28 PM, Ivan Malov wrote:
>>>>> Hi Ilya,
>>>>>
>>>>> Thank you for reviewing the proposal at such short notice. I'm afraid that prior discussions overlook the simple fact that the whole problem is not limited to just VF representors. Action PORT_ID is also used with respect to the admin PF's ethdev, which "represents itself" (and by no means it represents the underlying physical/network port). In this case, one cannot state that the application treats it as a physical port, just like one states that the application perceives representors as VFs themselves.
>>>>
>>>>
>>>> I don't think that it was overlooked. If device is in a switchdev mode than
>>>> there is a PF representor and VF representors. Application typically works
>>>> only with representors in this case is it doesn't make much sense to have
>>>> representor and the upstream port attached to the same application at the
>>>> same time. Configuration that is applied by application to the representor
>>>> (PF or VF, it doesn't matter) applies to the corresponding upstream port
>>>> (actual PF or VF) by default.
>>>
>>> PF is not necessarily associated with a network port. It
>>> could be many PFs and just one network port on NIC.
>>> Extra PFs are like VFs in this case. These PFs may be
>>> passed to a VM in a similar way. So, we can have PF
>>> representors similar to VF representors. I.e. it is
>>> incorrect to say that PF in the case of switchdev is
>>> a representor of a network port.
>>>
>>> If we prefer to talk in representors terminology, we
>>> need 4 types of prepresentors:
>>> - PF representor for PCIe physical function
>>> - VF representor for PCIe virtual function
>>> - SF representor for PCIe sub-function (PASID)
>>> - network port representor
>>> In fact above is PCIe oriented, but there are
>>> other buses and ways to deliver traffic to applications.
>>> Basically representor for any virtual port in virtual
>>> switch which DPDK app can control using transfer rules.
>>>
>>>> Exactly same thing here with PORT_ID action. You have a packet and action
>>>> to send it to the port, but it's not specified if HW needs to send it to
>>>> the representor or the upstream port (again, VF or PF, it doesn't matter).
>>>> Since there is no extra information, HW should send it to the upstream
>>>> port by default. The same as configuration applies by default to the
>>>> upstream port.
>>>>
>>>> Let's look at some workflow examples:
>>>>
>>>> DPDK Application
>>>> | |
>>>> | |
>>>> +--PF-rep------VF-rep---+
>>>> | |
>>>> | NIC (switchdev) |
>>>> | |
>>>> +---PF---------VF-------+
>>>> | |
>>>> | |
>>>> External VM or whatever
>>>> Network
>>>
>>> See above. PF <-> External Network is incorrect above
>>> since it not always the case. It should be
>>> "NP <-> External network" and "NP-rep" above (NP -
>>> network port). Sometimes PF is an NP-rep, but sometimes
>>> it is not. It is just a question of default rules in
>>> switchdev on what to do with traffic incoming from
>>> network port.
>>>
>>> A bit more complicated picture is:
>>>
>>> +----------------------------------------+
>>> | DPDK Application |
>>> +----+---------+---------+---------+-----+
>>> |PF0 |PF1 | |
>>> | | | |
>>> +--NP1-rep---NP2-rep---PF2-rep---VF-rep--+
>>> | |
>>> | NIC (switchdev) |
>>> | |
>>> +---NP1-------NP2-------PF2--------VF----+
>>> | | | |
>>> | | | |
>>> External External VM or VM or
>>> Network 1 Network 2 whatever whatever
>>>
>>> So, sometimes PF plays network port representor role (PF0,
>>> PF1), sometimes it requires representor itself (PF2).
>>> What to do if PF2 itself is attached to application?
>>> Can we route traffic to it using PORT_ID action?
>>> It has DPDK ethdev port. It is one of arguments why
>>> plain PORT_ID should route DPDK application.
>>
>> OK. This is not very different from my understanding. The key
>> is that there is a pair of interfaces, one is more visible than
>> the other one.
>>
>>>
>>> Of course, some applications would like to see it as
>>> (simpler is better):
>>>
>>> +----------------------------------------+
>>> | DPDK Application |
>>> | |
>>> +---PF0-------PF1------PF2-rep---VF-rep--+
>>> | | | |
>>> | | | |
>>> External External VM or VM or
>>> Network 1 Network 2 whatever whatever
>>>
>>> but some, I believe, require full picture. For examples,
>>> I'd really like to know how much traffic goes via all 8
>>> switchdev ports and running rte_eth_stats_get(0, ...)
>>> (i.e. DPDK port 0 attached to PF0) I'd like to get
>>> NP1-rep stats (not NP1 stats). It will match exactly
>>> what I see in DPDK application. It is an argument why
>>> plain PORT_ID should be treated as a DPDK ethdev port,
>>> not a represented (upstream) entity.
>>
>> The point is that if application doesn't require full picture,
>> it should not care. If application requires the full picture,
>> it could take extra steps by setting extra bits. I don't
>> understand why we need to force all applications to care about
>> the full picture if we can avoid that?
>>
>>>
>>>> a. Workflow for "DPDK Application" to set MAC to VF:
>>>>
>>>> 1. "DPDK Application" calls rte_set_etheraddr("VF-rep", new_mac);
>>>> 2. DPDK sets MAC for "VF".
>>>>
>>>> b. Workflow for "DPDK Application" to set MAC to PF:
>>>>
>>>> 1. "DPDK Application" calls rte_set_etheraddr("PF-rep", new_mac);
>>>> 2. DPDK sets MAC for "PF".
>>>>
>>>> c. Workflow for "DPDK Application" to send packet to the external network:
>>>>
>>>> 1. "DPDK Application" calls rte_eth_tx_burst("PF-rep", packet);
>>>> 2. NIC receives the packet from "PF-rep" and sends it to "PF".
>>>> 3. packet egresses to the external network from "PF".
>>>>
>>>> d. Workflow for "DPDK Application" to send packet to the "VM or whatever":
>>>>
>>>> 1. "DPDK Application" calls rte_eth_tx_burst("VF-rep", packet);
>>>> 2. NIC receives the packet from "VF-rep" and sends it to "VF".
>>>> 3. "VM or whatever" receives the packet from "VF".
>>>>
>>>> In two workflows above there is no rte_flow processing on step 2, i.e.,
>>>> NIC does not perform any lookups/matches/actions, because it's not possible
>>>> to configure actions for packets received from "PF-rep" or
>>>> "VF-rep" as these ports doesn't own a port id and all the configuration
>>>> and rte_flow actions translated and applied for the devices that these
>>>> ports represents ("PF" and "VF") and not representors themselves ("PF-rep"
>>>> or "VF-rep").
>>>>
>>>> e. Workflow for the packet received on PF and PORT_ID action:
>>>>
>>>> 1. "DPDK Application" configures rte_flow for all packets from "PF-rep"
>>>> to execute PORT_ID "VF-rep".
>>>> 2. NIC receives packet on "PF".
>>>> 3. NIC executes 'PORT_ID "VF-rep"' action by sending packet to "VF".
>>>> 4. "VM or whatever" receives the packet from "VF".
>>>>
>>>> f. Workflow for the packet received on VF and PORT_ID action:
>>>>
>>>> 1. "DPDK Application" configures rte_flow for all packets from "VF-rep"
>>>> to execute 'PORT_ID "PF-rep"'.
>>>> 2. NIC receives packet on "VF".
>>>> 3. NIC executes 'PORT_ID "PF-rep"' action by sending packet to "PF".
>>>> 4. Packet egresses from the "PF" to the external network.
>>>>
>>>> Above is what, IMHO, the logic should look like and this matches with
>>>> the overall switchdev design in kernel.
>>>>
>>>> I understand that this logic could seem flipped-over from the HW point
>>>> of view, but it's perfectly logical from the user's perspective, because
>>>> user should not care if the application works with representors or
>>>> some real devices. If application configures that all packets from port
>>>> A should be sent to port B, user will expect that these packets will
>>>> egress from port B once received from port A. That will be highly
>>>> inconvenient if the packet will ingress from port B back to the
>>>> application instead.
>>>>
>>>> DPDK Application
>>>> | |
>>>> | |
>>>> port A port B
>>>> | |
>>>> *****MAGIC*****
>>>> | |
>>>> External Another Network
>>>> Network or VM or whatever
>>>>
>>>> It should not matter if there is an extra layer between ports A and B
>>>> and the external network and VM. Everything should work in exactly the
>>>> same way, transparently for the application.
>>>>
>>>> The point of hardware offloading, and therefore rte_flow API, is to take
>>>> what user does in software and make this "magically" work in hardware in
>>>> the exactly same way. And this will be broken if user will have to
>>>> use different logic based on the mode the hardware works in, i.e. based on
>>>> the fact if the application works with ports or their representors.
>>>>
>>>> If some specific use case requires application to know if it's an
>>>> upstream port or the representor and demystify the internals of the switchdev
>>>> NIC, there should be a different port id for the representor itself that
>>>> could be used in all DPDK APIs including rte_flow API or a special bit for
>>>> that matter. IIRC, there was an idea to add a bit directly to the port_id
>>>> for that purpose that will flip over behavior in all the workflow scenarios
>>>> that I described above.
>>>
>>> As I understand we're basically on the same page, but just
>>> fighting for defaults in DPDK.
>>
>> Yep.
>>
>>>
>>>>>
>>>>> Given these facts, it would not be quite right to just align the documentation with the de-facto action meaning assumed by OvS.
>>>>
>>>> It's not a "meaning assumed by OvS", it's the original design and the
>>>> main idea of a switchdev based on a common sense.
>>>
>>> If so, common sense is not that common :)
>>> My "common sense" says me that PORT_ID action
>>> should route traffic to DPDK ethdev port to be
>>> received by the DPDK application.
>>
>> By this logic rte_eth_tx_burst("VF-rep", packet) should send a packet
>> to "VF-rep", i.e. this packet will be received back by the application
>> on this same interface. But that is counter-intuitive and this is not
>> how it works in linux kernel if you're opening socket and sending a
>> packet to the "VF-rep" network interface.
>>
>> And if rte_eth_tx_burst("VF-rep", packet) sends packet to "VF" and not
>> to "VF-rep", than I don't understand why PORT_ID action should work in
>> the opposite way.
>
> There's no contradiction here.
>
> In rte_eth_tx_burst(X, packet) example, "X" is the port which the application sits on and from where it sends the packet. In other words, it's the point where the packet originates from, and not where it goes to.
>
> At the same time, flow *action* PORT_ID (ID = "X") is clearly the opposite: it specifies where the packet will go. Port ID is the characteristic of a DPDK ethdev. So the packet goes *to* an ethdev with the given ID ("X").
>
> Perhaps consider action PHY_PORT: the index is the characteristic of the network port. The packet goes *to* network through this NP. And not the opposite way. Hopefully, nobody is going to claim that action PHY_PORT should mean re-injecting the packet back to the HW flow engine "as if it just came from the network port". Then why does one try to skew the PORT_ID meaning this way? PORT_ID points to an ethdev - the packet goes *to* the ethdev. Isn't that simple?
It's not simple. And PHY_PORT action would be hard to use from the
application that doesn't really need to know how underlying hardware
structured.
>
>>
>> Application receives a packet from port A and puts it to the port B.
>> TC rule to forward packets from port A to port B will provide same result.
>> So, why the similar rte_flow should do the opposite and send the packet
>> back to the application?
>
> Please see above. Action VF sends the packet *to* VF and *not* to the upstream entity which this VF is connected to. Action PHY_PORT sends the packet *to* network and does *not* make it appear as if it entered the NIC from the network side. Action QUEUE sends the packet *to* the Rx queue and does *not* make it appear as if it just egressed from the Tx queue with the same index. Action PORT_ID sends the packet *to* an ethdev with the given ID and *not* to the upstream entity which this ethdev is connected to. It's just that transparent. It's just "do what the name suggests".
>
> Yes, an application (say, OvS) might have a high level design which perceives the "high-level" ports plugged to it as a "patch-panel" of sorts. Yes, when a high-level mechanism/logic of such application invokes a *datapath-unaware* wrapper to offload a rule and request that the packet be delivered to the given "high-level" port, it therefore requests that the packet be delivered to the opposite end of the wire. But then the lower-level datapath-specific (DPDK) handler kicks in. Since it's DPDK-specific, it knows *everything* about the underlying flow library it works with. In particular it knows that action PORT_ID delivers the packet to an *ethdev*, at the same time, it knows that the upper caller (high-level logic) for sure wants the opposite, so it (the lower-level DPDK component) sets the "upstream" bit when translating the higher-level port action to an RTE action "PORT_ID".
I don't understand that. DPDK user is the application and DPDK
doesn't translate anything, application creates PORT_ID action
directly and passes it to DPDK. So, you're forcing the *end user*
(a.k.a. application) to know *everything* about the hardware the
application runs on. Of course, it gets this information about
the hardware from the DPDK library (otherwise this would be
completely ridiculous), but this doesn't change the fact that it's
the application that needs to think about the structure of the
underlying hardware while it's absolutely not necessary in vast
majority of cases.
> Then the resulting action is correct, and the packet indeed doesn't end up in the ethdev but goes
> to the opposite end of the wire. That's it.
>
> I have an impression that for some reason people are tempted to ignore the two nominal "layers" in such applications (generic, or high-level one and DPDK-specific one) thus trying to align DPDK logic with high-level logic of the applications. That's simply not right. What I'm trying to point out is that it *is* the true job of DPDK-specific data path handler in such application - to properly translate generic flow actions to DPDK-specific ones. It's the duty of DPDK component in such applications to be aware of the genuine meaning of action PORT_ID.
The reason is very simple: if application don't need to know the
full picture (how the hardware structured inside) it shouldn't
care and it's a duty of DPDK to abstract the hardware and provide
programming interfaces that could be easily used by application
developers who are not experts in the architecture of a hardware
that they want to use (basically, application developer should not
care at all in most cases on which hardware application will work).
It's basically in almost every single DPDK API, EAL means environment
*abstraction* layer, not an environment *proxy/passthrough* layer.
We can't assume that DPDK-specific layers in applications are always
written by hardware experts and, IMHO, DPDK should not force users
to learn underlying structures of switchdev devices. They might not
even have such devices for testing, so the application that works
on simple NICs should be able to run correctly on switchdev-capable
NICs too.
I think that "***MAGIC***" abstraction (see one of my previous ascii
graphics) is very important here.
>
> This way, mixing up the two meanings is ruled out.
Looking closer to how tc flower rules configured I noticed that
'mirred' action requires user to specify the direction in which
the packet will appear on the destination port. And I suppose
this will solve your issues with PORT_ID action without exposing
the "full picture" of the architecture of an underlying hardware.
It looks something like this:
tc filter add dev A ... action mirred egress redirect dev B
^^^^^^
Direction could be 'ingress' or 'egress', so the packet will
ingress from the port B back to application/kernel or it will
egress from this port to the external network. Same thing
could be implemented in rte_flow like this:
flow create A ingress transfer pattern eth / end
action port_id id B egress / end
So, application that needs to receive the packet from the port B
will specify 'ingress', others that just want to send packet from
the port B will specify 'egress'. Will that work for you?
(BTW, 'ingress' seems to be not implemented in TC and that kind
of suggests that it's not very useful at least for kernel use cases)
One might say that it's actually the same what is proposed in
this RFC, but I will argue that 'ingress/egress' schema doesn't
break the "***MAGIC***" abstraction because user is not obligated
to know the structure of the underlying hardware, while 'upstream'
flag is something very unclear from that perspective and makes
no sense for plane ports (non-representors).
>
>>
>>>
>>>>>
>>>>> On 01/06/2021 15:10, Ilya Maximets wrote:
>>>>>> On 6/1/21 1:14 PM, Ivan Malov wrote:
>>>>>>> By its very name, action PORT_ID means that packets hit an ethdev with the
>>>>>>> given DPDK port ID. At least the current comments don't state the opposite.
>>>>>>> That said, since port representors had been adopted, applications like OvS
>>>>>>> have been misusing the action. They misread its purpose as sending packets
>>>>>>> to the opposite end of the "wire" plugged to the given ethdev, for example,
>>>>>>> redirecting packets to the VF itself rather than to its representor ethdev.
>>>>>>> Another example: OvS relies on this action with the admin PF's ethdev port
>>>>>>> ID specified in it in order to send offloaded packets to the physical port.
>>>>>>>
>>>>>>> Since there might be applications which use this action in its valid sense,
>>>>>>> one can't just change the documentation to greenlight the opposite meaning.
>>>>>>> This patch adds an explicit bit to the action configuration which will let
>>>>>>> applications, depending on their needs, leverage the two meanings properly.
>>>>>>> Applications like OvS, as well as PMDs, will have to be corrected when the
>>>>>>> patch has been applied. But the improved clarity of the action is worth it.
>>>>>>>
>>>>>>> The proposed change is not the only option. One could avoid changes in OvS
>>>>>>> and PMDs if the new configuration field had the opposite meaning, with the
>>>>>>> action itself meaning delivery to the represented port and not to DPDK one.
>>>>>>> Alternatively, one could define a brand new action with the said behaviour.
>>>>>>
>>>>>> We had already very similar discussions regarding the understanding of what
>>>>>> the representor really is from the DPDK API's point of view, and the last
>>>>>> time, IIUC, it was concluded by a tech. board that representor should be
>>>>>> a "ghost of a VF", i.e. DPDK APIs should apply configuration by default to
>>>>>> VF and not to the representor device:
>>>>>> https://patches.dpdk.org/project/dpdk/cover/20191029185051.32203-1-thomas@monjalon.net/#104376
>>>>>> This wasn't enforced though, IIUC, for existing code and semantics is still mixed.
>>>>>>
>>>>>> I still think that configuration should be applied to VF, and the same applies
>>>>>> to rte_flow API. IMHO, average application should not care if device is
>>>>>> a VF itself or its representor. Everything should work exactly the same.
>>>>>> I think this matches with the original idea/design of the switchdev functionality
>>>>>> in the linux kernel and also matches with how the average user thinks about
>>>>>> representor devices.
>>>>>>
>>>>>> If some specific use-case requires to distinguish VF from the representor,
>>>>>> there should probably be a separate special API/flag for that.
>>>>>>
>>>>>> Best regards, Ilya Maximets.
>>>>>>
>>>>>
>>>
>
^ permalink raw reply [flat|nested] 40+ messages in thread
* Re: [dpdk-dev] [RFC PATCH] ethdev: clarify flow action PORT ID semantics
2021-06-03 9:18 ` Ori Kam
@ 2021-06-03 9:55 ` Andrew Rybchenko
2021-06-07 8:28 ` Thomas Monjalon
0 siblings, 1 reply; 40+ messages in thread
From: Andrew Rybchenko @ 2021-06-03 9:55 UTC (permalink / raw)
To: Ori Kam, Ivan Malov, Eli Britstein, Ilya Maximets, dev
Cc: Smadar Fuks, Hyong Youb Kim, Kishore Padmanabha, Ajit Khaparde,
Jerin Jacob, John Daley, NBU-Contact-Thomas Monjalon,
Ferruh Yigit
On 6/3/21 12:18 PM, Ori Kam wrote:
> Hi All,
>
>> -----Original Message-----
>> From: Ivan Malov <Ivan.Malov@oktetlabs.ru>
>>
>> On 02/06/2021 14:21, Eli Britstein wrote:
>>>
>>> On 6/2/2021 1:50 PM, Andrew Rybchenko wrote:
>>>> External email: Use caution opening links or attachments
>>>>
>>>>
>>>> On 6/2/21 12:57 PM, Eli Britstein wrote:
>>>>> On 6/1/2021 5:53 PM, Andrew Rybchenko wrote:
>>>>>> External email: Use caution opening links or attachments
>>>>>>
>>>>>>
>>>>>> On 6/1/21 5:44 PM, Eli Britstein wrote:
>>>>>>> On 6/1/2021 5:35 PM, Andrew Rybchenko wrote:
>>>>>>>> External email: Use caution opening links or attachments
>>>>>>>>
>>>>>>>>
>>>>>>>> On 6/1/21 4:24 PM, Eli Britstein wrote:
>>>>>>>>> On 6/1/2021 3:10 PM, Ilya Maximets wrote:
>>>>>>>>>> External email: Use caution opening links or attachments
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> On 6/1/21 1:14 PM, Ivan Malov wrote:
>>>>>>>>>>> By its very name, action PORT_ID means that packets hit an
>>>>>>>>>>> ethdev with the given DPDK port ID. At least the current
>>>>>>>>>>> comments don't state the opposite.
>>>>>>>>>>> That said, since port representors had been adopted,
>>>>>>>>>>> applications like OvS have been misusing the action. They
>>>>>>>>>>> misread its purpose as sending packets to the opposite end of
>>>>>>>>>>> the "wire" plugged to the given ethdev, for example,
>>>>>>>>>>> redirecting packets to the VF itself rather than to its
>>>>>>>>>>> representor ethdev.
>
> Sorry but OVS got it right, this is the idea to send packet to the VF not to the representor,
> I think that our first discussion should be what is a representor,
> I know that there are a lot threads about it but it is steel unclear.
Yes, really unclear. I'd like to highlight again that
the problem is not with representors only (as described
and discussed in the thread).
> From my understanding representor is a shadow of a VF
> This shadow has two functionalities:
> 1. data
> It should receive any packet that was sent from the VF and was not
> routed to any other destination. And vise versa any traffic sent on the representor.
> should arrive to the corresponding VF.
> What use case do you see for sending a packet to the representor?
>
> 2. control
> allow to modify the VF from DPDK application.
>
> Regarding the 1 point of the data, I don't see any sense if routing traffic to representor.
> While on point 2 control their maybe some cases that we want to configure the representor itself
> and not the VF for example changing mtu.
IMO if so there is a big inconsistency here with statistics
(just an example, which is simply to discuss).
On one hand packet/byte stats should say how much data is
received/sent by the DPDK application via the port (yes,
shadow, but still an ethdev port).
On the other hand you say that it is a shadow and it should
return VF stats.
>>>>>>>>>>> Another example: OvS relies on this action with the admin PF's
>>>>>>>>>>> ethdev port ID specified in it in order to send offloaded
>>>>>>>>>>> packets to the physical port.
>>>>>>>>>>>
>>>>>>>>>>> Since there might be applications which use this action in its
>>>>>>>>>>> valid sense, one can't just change the documentation to
>>>>>>>>>>> greenlight the opposite meaning.
>>>>>>>>>>> This patch adds an explicit bit to the action configuration
>>>>>>>>>>> which will let applications, depending on their needs,
>>>>>>>>>>> leverage the two meanings properly.
>>>>>>>>>>> Applications like OvS, as well as PMDs, will have to be
>>>>>>>>>>> corrected when the patch has been applied. But the improved
>>>>>>>>>>> clarity of the action is worth it.
>>>>>>>>>>>
>>>>>>>>>>> The proposed change is not the only option. One could avoid
>>>>>>>>>>> changes in OvS and PMDs if the new configuration field had the
>>>>>>>>>>> opposite meaning, with the action itself meaning delivery to
>>>>>>>>>>> the represented port and not to DPDK one.
>>>>>>>>>>> Alternatively, one could define a brand new action with the
>>>>>>>>>>> said behaviour.
>>>>>>>>> It doesn't make any sense to attach the VF itself to OVS, but
>>>>>>>>> only its representor.
>>>>>>>> OvS is not the only DPDK application.
>>>>>>> True. It is just the focus of this commit message is OVS.
>>>>>>>>> For the PF, when in switchdev mode, it is the "uplink
>>>>>>>>> representor", so it is also a representor.
>>>>>>>> Strictly speaking it is not a representor from DPDK point of
>>>>>>>> view. E.g. representors have corresponding flag set which is
>>>>>>>> definitely clear in the case of PF.
>>>>>>> This is the per-PMD responsibility. The API should not care.
>>>>>>>>> That said, OVS does not care of the type of the port. It doesn't
>>>>>>>>> matter if it's an "upstream" or not, or if it's a representor or
>>>>>>>>> not.
>>>>>>>> Yes, it is clear, but let's put OvS aside. Let's consider a DPDK
>>>>>>>> application which has a number of ethdev port. Some may belong to
>>>>>>>> single switch domain, some may be from different switch domains
>>>>>>>> (i.e. different NICs). Can I use PORT_ID action to redirect
>>>>>>>> ingress traffic to a specified ethdev port using PORT_ID action?
>>>>>>>> It looks like no, but IMHO it is the definition of the PORT_ID
>>>>>>>> action.
>>>>>>> Let's separate API from implementation. By API point of view, yes,
>>>>>>> the user may request it. Nothing wrong with it.
>>>>>>>
>>>>>>> From implementation point of view - yes, it might fail, but not
>>>>>>> for sure, even if on different NICs. Maybe the HW of a certain
>>>>>>> vendor has the capability to do it?
>>>>>>>
>>>>>>> We can't know, so I think the API should allow it.
>>>>>> Hold on. What should it allow? It is two opposite meanings:
>>>>>> 1. Direct traffic to DPDK ethdev port specified using ID to be
>>>>>> received and processed by the DPDK application.
>>>>>> 2. Direct traffic to an upstream port represented by the
>>>>>> DPDK port.
>>>>>>
>>>>>> The patch tries to address the ambiguity, misuse it in OvS (from my
>>>>>> point of view in accordance with the action documentation),
>>>>>> mis-implementation in a number of PMDs (to work in OvS) and tries
>>>>>> to sort it out with an explanation why proposed direction is
>>>>>> chosen. I realize that it could be painful, but IMHO it is the best
>>>>>> option here. Yes, it is a point to discuss.
>>>>>>
>>>>>> To start with we should agree that that problem exists.
>>>>>> Second, we should agree on direction how to solve it.
>>>>> I agree. Suppose port 0 is the PF, and port 1 is a VF representor.
>>>>>
>>>>> IIUC, there are two options:
>>>>>
>>>>> 1. flow create 1 ingress transfer pattern eth / end action port_id
>>>>> id 0 upstream 1 / end
>>>>>
>
> What is the meaning of upstream if I want to send traffic between VFs?
Which way? Which action? If you specify VF representor ID in
PORT_ID and say upstream (or egress as suggested by Ilya)
traffic will go to VF.
In fact, I like ingress/egress terminology for the action
suggested by Ilya in other mail in the thread.
>>>>> 2. flow create 1 ingress transfer pattern eth / end action port_id
>>>>> id 0 upstream 0 / end
>>>>>
>>>>> [1] is the same behavior as today.
>>>>>
>>>>> [2] is a new behavior, the packet received by port 0 as if it
>>>>> arrived from the wire.
>>>>>
>>>>> Then, let's have more:
>>>>>
>>>>> 3. flow create 0 ingress transfer pattern eth / end action port_id
>>>>> id 1 upstream 1 / end
>>>>>
>>>>> 4. flow create 0 ingress transfer pattern eth / end action port_id
>>>>> id 1 upstream 0 / end
>>>>>
>>>>> if we have [2] and [4], the packet going from the VF will hit [2],
>>>>> then hit [4] and then [2] again in an endless loop?
>>>> As I understand PORT_ID is a fate action. So, no more lookups are
>>>> done. If the packet is loop back from applications, loop is possible.
>>>
>>> I referred a HW loop, not SW. For example with JUMP action (also fate):
>>>
>>> flow create 0 group 0 ingress transfer pattern eth / end action jump
>>> group 1 / end
>>>
>>> flow create 0 group 1 ingress transfer pattern eth / end action jump
>>> group 0 / end
>>>
>>>>
>>>> In fact, it is a good question if "flow creare 0 ingress transfer" or
>>>> "flow create 1 ingress transfer" assume any implicit filtering. I
>>>> always thought that no.
>
> This is a matter for discussion but currently we do have implicit filtering
> on the source VF
Very good. Let's discuss and document the decision.
We've provided arguments why it should not be
implicit filtering in the case of transfer.
>>>> i.e. if we have two network ports rule like
>>>> flow create 0 ingress transfer pattern eth / end \
>>>> action port_id id 1 upstream 1 / end will match packets
>>>> incoming from any port into the switch (network port 0, network port
>>>> 1, VF or PF itself (???)).
>>>> The topic also requires explicit clarification.
>>> rte_flow is port based. It implicitly filters only packets for the
>>> provided port (0).
>>>
>>> Maybe need to clarify documentation and have a "no filtering" API if
>>> needed.
>
> Maybe
I'd say definitely requires since different vendors understand
it in a different ways.
>>
>> We've come across the following bits in the current documentation with
>> respect to attribute "transfer", quote:
>>
>> "Instead of simply matching the properties of traffic as it would appear on a
>> given DPDK port ID, enabling this attribute transfers a flow rule to the lowest
>> possible level of any device endpoints found in the pattern.
>>
>> When supported, this effectively enables an application to reroute traffic not
>> necessarily intended for it (e.g. coming from or addressed to different
>> physical ports, VFs or applications) at the device level".
>>
>> (https://doc.dpdk.org/guides/prog_guide/rte_flow.html#attributes)
>>
>> Since action PORT_ID hardly makes sense without attribute "transfer"
>> (unless it doesn't point to the same ethdev as the one used to submit the
>> flow), this paragraph effectively states that in this particular case API
>> "flow_create" is not (necessarily) port-based.
>>
> I'm not sure I understand what you mean port based.
>
>>>
>>>> PF itself is really a hard question because of "ingress"
>>>> since traffic from PF is a traffic from DPDK application and it is
>>>> egress, not ingress.
>>>
>>> Ingress means the direction. Hit on packets otherwise provided to the
>>> SW by rte_eth_rx_burst().
>>>
>>> Same goes for the PF. Packets by rte_eth_rx_burst are the ones
>>> arriving from the wire, so ingress is that direction and egress is from the
>> app.
>>>
>>>>
>>>> I think that port ID used to created flow rule should not apply any
>>>> filtering in the case of transfer since we have corresponding items
>>>> to do it explicitly. If we do it implicitly as well, we need some
>>>> priorities and a way to avoid implicit rules which makes things much
>>>> harder to understand and implement.
>>>
>
> I agree with your point, I think the application should state exactly what it wants.
>
> Best,
> Ori
>>> If "upstream 0" means what I thought it means (comments?) maybe a
>>> better way to do it is expose another port for that, so there will be 2 "PF"
>>> ports - one as the wire representor and the other one as the "PF" (or
>>> clearer naming...).
>>>
>>> This would be a vendor decision, and there would be no need to change
>>> PORT_ID API.
>>>
>>>>
>>>>> If this is your meaning, maybe what you are looking for is an action
>>>>> to change the in_port and continue processing?
>>>>>
>>>>> Please comment on the examples I gave or clarify the use case you
>>>>> are trying to do.
>>>>>
>>>>>
>>>>> Thanks,
>>>>>
>>>>> Eli
>>>>>
>>>>>>>>>> We had already very similar discussions regarding the
>>>>>>>>>> understanding of what the representor really is from the DPDK
>>>>>>>>>> API's point of view, and the last time, IIUC, it was concluded
>>>>>>>>>> by a tech. board that representor should be a "ghost of a VF",
>>>>>>>>>> i.e. DPDK APIs should apply configuration by default to VF and
>>>>>>>>>> not to the representor device:
>>>>>>>>>>
>>>>>>>>>>
>> https://patches.dpdk.org/project/dpdk/cover/20191029185051.3220
>>>>>>>>>> 3-1-thomas@monjalon.net/#104376
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> This wasn't enforced though, IIUC, for existing code and
>>>>>>>>>> semantics is still mixed.
>>>>>>>>> I am not sure how this is related.
>>>>>>>>>> I still think that configuration should be applied to VF, and
>>>>>>>>>> the same applies to rte_flow API. IMHO, average application
>>>>>>>>>> should not care if device is a VF itself or its representor.
>>>>>>>>>> Everything should work exactly the same.
>>>>>>>>>> I think this matches with the original idea/design of the
>>>>>>>>>> switchdev functionality in the linux kernel and also matches
>>>>>>>>>> with how the average user thinks about representor devices.
>>>>>>>>> Right. This is the way representors work. It is fully aligned
>>>>>>>>> with configuration of OVS-kernel.
>>>>>>>>>> If some specific use-case requires to distinguish VF from the
>>>>>>>>>> representor, there should probably be a separate special
>>>>>>>>>> API/flag for that.
>>>>>>>>>>
>>>>>>>>>> Best regards, Ilya Maximets.
>>
>> --
>> Ivan M
^ permalink raw reply [flat|nested] 40+ messages in thread
* Re: [dpdk-dev] [RFC PATCH] ethdev: clarify flow action PORT ID semantics
2021-06-03 9:29 ` Ilya Maximets
@ 2021-06-03 10:33 ` Andrew Rybchenko
2021-06-03 11:05 ` Ilya Maximets
2021-06-03 11:29 ` Ivan Malov
1 sibling, 1 reply; 40+ messages in thread
From: Andrew Rybchenko @ 2021-06-03 10:33 UTC (permalink / raw)
To: Ilya Maximets, Ivan Malov, dev
Cc: Eli Britstein, Smadar Fuks, Hyong Youb Kim, Ori Kam, Jerin Jacob,
John Daley, Thomas Monjalon, Ferruh Yigit
On 6/3/21 12:29 PM, Ilya Maximets wrote:
> On 6/2/21 9:35 PM, Ivan Malov wrote:
>> On 02/06/2021 20:35, Ilya Maximets wrote:
>>> (Dropped Broadcom folks from CC. Mail server refuses to accept their
>>> emails for some reason: "Recipient address rejected: Domain not found."
>>> Please, try to ad them back on reply.)
>>>
>>> On 6/2/21 6:26 PM, Andrew Rybchenko wrote:
>>>> On 6/2/21 3:46 PM, Ilya Maximets wrote:
>>>>> On 6/1/21 4:28 PM, Ivan Malov wrote:
>>>>>> Hi Ilya,
>>>>>>
>>>>>> Thank you for reviewing the proposal at such short notice. I'm afraid that prior discussions overlook the simple fact that the whole problem is not limited to just VF representors. Action PORT_ID is also used with respect to the admin PF's ethdev, which "represents itself" (and by no means it represents the underlying physical/network port). In this case, one cannot state that the application treats it as a physical port, just like one states that the application perceives representors as VFs themselves.
>>>>>
>>>>>
>>>>> I don't think that it was overlooked. If device is in a switchdev mode than
>>>>> there is a PF representor and VF representors. Application typically works
>>>>> only with representors in this case is it doesn't make much sense to have
>>>>> representor and the upstream port attached to the same application at the
>>>>> same time. Configuration that is applied by application to the representor
>>>>> (PF or VF, it doesn't matter) applies to the corresponding upstream port
>>>>> (actual PF or VF) by default.
>>>>
>>>> PF is not necessarily associated with a network port. It
>>>> could be many PFs and just one network port on NIC.
>>>> Extra PFs are like VFs in this case. These PFs may be
>>>> passed to a VM in a similar way. So, we can have PF
>>>> representors similar to VF representors. I.e. it is
>>>> incorrect to say that PF in the case of switchdev is
>>>> a representor of a network port.
>>>>
>>>> If we prefer to talk in representors terminology, we
>>>> need 4 types of prepresentors:
>>>> - PF representor for PCIe physical function
>>>> - VF representor for PCIe virtual function
>>>> - SF representor for PCIe sub-function (PASID)
>>>> - network port representor
>>>> In fact above is PCIe oriented, but there are
>>>> other buses and ways to deliver traffic to applications.
>>>> Basically representor for any virtual port in virtual
>>>> switch which DPDK app can control using transfer rules.
>>>>
>>>>> Exactly same thing here with PORT_ID action. You have a packet and action
>>>>> to send it to the port, but it's not specified if HW needs to send it to
>>>>> the representor or the upstream port (again, VF or PF, it doesn't matter).
>>>>> Since there is no extra information, HW should send it to the upstream
>>>>> port by default. The same as configuration applies by default to the
>>>>> upstream port.
>>>>>
>>>>> Let's look at some workflow examples:
>>>>>
>>>>> DPDK Application
>>>>> | |
>>>>> | |
>>>>> +--PF-rep------VF-rep---+
>>>>> | |
>>>>> | NIC (switchdev) |
>>>>> | |
>>>>> +---PF---------VF-------+
>>>>> | |
>>>>> | |
>>>>> External VM or whatever
>>>>> Network
>>>>
>>>> See above. PF <-> External Network is incorrect above
>>>> since it not always the case. It should be
>>>> "NP <-> External network" and "NP-rep" above (NP -
>>>> network port). Sometimes PF is an NP-rep, but sometimes
>>>> it is not. It is just a question of default rules in
>>>> switchdev on what to do with traffic incoming from
>>>> network port.
>>>>
>>>> A bit more complicated picture is:
>>>>
>>>> +----------------------------------------+
>>>> | DPDK Application |
>>>> +----+---------+---------+---------+-----+
>>>> |PF0 |PF1 | |
>>>> | | | |
>>>> +--NP1-rep---NP2-rep---PF2-rep---VF-rep--+
>>>> | |
>>>> | NIC (switchdev) |
>>>> | |
>>>> +---NP1-------NP2-------PF2--------VF----+
>>>> | | | |
>>>> | | | |
>>>> External External VM or VM or
>>>> Network 1 Network 2 whatever whatever
>>>>
>>>> So, sometimes PF plays network port representor role (PF0,
>>>> PF1), sometimes it requires representor itself (PF2).
>>>> What to do if PF2 itself is attached to application?
>>>> Can we route traffic to it using PORT_ID action?
>>>> It has DPDK ethdev port. It is one of arguments why
>>>> plain PORT_ID should route DPDK application.
>>>
>>> OK. This is not very different from my understanding. The key
>>> is that there is a pair of interfaces, one is more visible than
>>> the other one.
>>>
>>>>
>>>> Of course, some applications would like to see it as
>>>> (simpler is better):
>>>>
>>>> +----------------------------------------+
>>>> | DPDK Application |
>>>> | |
>>>> +---PF0-------PF1------PF2-rep---VF-rep--+
>>>> | | | |
>>>> | | | |
>>>> External External VM or VM or
>>>> Network 1 Network 2 whatever whatever
>>>>
>>>> but some, I believe, require full picture. For examples,
>>>> I'd really like to know how much traffic goes via all 8
>>>> switchdev ports and running rte_eth_stats_get(0, ...)
>>>> (i.e. DPDK port 0 attached to PF0) I'd like to get
>>>> NP1-rep stats (not NP1 stats). It will match exactly
>>>> what I see in DPDK application. It is an argument why
>>>> plain PORT_ID should be treated as a DPDK ethdev port,
>>>> not a represented (upstream) entity.
>>>
>>> The point is that if application doesn't require full picture,
>>> it should not care. If application requires the full picture,
>>> it could take extra steps by setting extra bits. I don't
>>> understand why we need to force all applications to care about
>>> the full picture if we can avoid that?
>>>
>>>>
>>>>> a. Workflow for "DPDK Application" to set MAC to VF:
>>>>>
>>>>> 1. "DPDK Application" calls rte_set_etheraddr("VF-rep", new_mac);
>>>>> 2. DPDK sets MAC for "VF".
>>>>>
>>>>> b. Workflow for "DPDK Application" to set MAC to PF:
>>>>>
>>>>> 1. "DPDK Application" calls rte_set_etheraddr("PF-rep", new_mac);
>>>>> 2. DPDK sets MAC for "PF".
>>>>>
>>>>> c. Workflow for "DPDK Application" to send packet to the external network:
>>>>>
>>>>> 1. "DPDK Application" calls rte_eth_tx_burst("PF-rep", packet);
>>>>> 2. NIC receives the packet from "PF-rep" and sends it to "PF".
>>>>> 3. packet egresses to the external network from "PF".
>>>>>
>>>>> d. Workflow for "DPDK Application" to send packet to the "VM or whatever":
>>>>>
>>>>> 1. "DPDK Application" calls rte_eth_tx_burst("VF-rep", packet);
>>>>> 2. NIC receives the packet from "VF-rep" and sends it to "VF".
>>>>> 3. "VM or whatever" receives the packet from "VF".
>>>>>
>>>>> In two workflows above there is no rte_flow processing on step 2, i.e.,
>>>>> NIC does not perform any lookups/matches/actions, because it's not possible
>>>>> to configure actions for packets received from "PF-rep" or
>>>>> "VF-rep" as these ports doesn't own a port id and all the configuration
>>>>> and rte_flow actions translated and applied for the devices that these
>>>>> ports represents ("PF" and "VF") and not representors themselves ("PF-rep"
>>>>> or "VF-rep").
>>>>>
>>>>> e. Workflow for the packet received on PF and PORT_ID action:
>>>>>
>>>>> 1. "DPDK Application" configures rte_flow for all packets from "PF-rep"
>>>>> to execute PORT_ID "VF-rep".
>>>>> 2. NIC receives packet on "PF".
>>>>> 3. NIC executes 'PORT_ID "VF-rep"' action by sending packet to "VF".
>>>>> 4. "VM or whatever" receives the packet from "VF".
>>>>>
>>>>> f. Workflow for the packet received on VF and PORT_ID action:
>>>>>
>>>>> 1. "DPDK Application" configures rte_flow for all packets from "VF-rep"
>>>>> to execute 'PORT_ID "PF-rep"'.
>>>>> 2. NIC receives packet on "VF".
>>>>> 3. NIC executes 'PORT_ID "PF-rep"' action by sending packet to "PF".
>>>>> 4. Packet egresses from the "PF" to the external network.
>>>>>
>>>>> Above is what, IMHO, the logic should look like and this matches with
>>>>> the overall switchdev design in kernel.
>>>>>
>>>>> I understand that this logic could seem flipped-over from the HW point
>>>>> of view, but it's perfectly logical from the user's perspective, because
>>>>> user should not care if the application works with representors or
>>>>> some real devices. If application configures that all packets from port
>>>>> A should be sent to port B, user will expect that these packets will
>>>>> egress from port B once received from port A. That will be highly
>>>>> inconvenient if the packet will ingress from port B back to the
>>>>> application instead.
>>>>>
>>>>> DPDK Application
>>>>> | |
>>>>> | |
>>>>> port A port B
>>>>> | |
>>>>> *****MAGIC*****
>>>>> | |
>>>>> External Another Network
>>>>> Network or VM or whatever
>>>>>
>>>>> It should not matter if there is an extra layer between ports A and B
>>>>> and the external network and VM. Everything should work in exactly the
>>>>> same way, transparently for the application.
>>>>>
>>>>> The point of hardware offloading, and therefore rte_flow API, is to take
>>>>> what user does in software and make this "magically" work in hardware in
>>>>> the exactly same way. And this will be broken if user will have to
>>>>> use different logic based on the mode the hardware works in, i.e. based on
>>>>> the fact if the application works with ports or their representors.
>>>>>
>>>>> If some specific use case requires application to know if it's an
>>>>> upstream port or the representor and demystify the internals of the switchdev
>>>>> NIC, there should be a different port id for the representor itself that
>>>>> could be used in all DPDK APIs including rte_flow API or a special bit for
>>>>> that matter. IIRC, there was an idea to add a bit directly to the port_id
>>>>> for that purpose that will flip over behavior in all the workflow scenarios
>>>>> that I described above.
>>>>
>>>> As I understand we're basically on the same page, but just
>>>> fighting for defaults in DPDK.
>>>
>>> Yep.
>>>
>>>>
>>>>>>
>>>>>> Given these facts, it would not be quite right to just align the documentation with the de-facto action meaning assumed by OvS.
>>>>>
>>>>> It's not a "meaning assumed by OvS", it's the original design and the
>>>>> main idea of a switchdev based on a common sense.
>>>>
>>>> If so, common sense is not that common :)
>>>> My "common sense" says me that PORT_ID action
>>>> should route traffic to DPDK ethdev port to be
>>>> received by the DPDK application.
>>>
>>> By this logic rte_eth_tx_burst("VF-rep", packet) should send a packet
>>> to "VF-rep", i.e. this packet will be received back by the application
>>> on this same interface. But that is counter-intuitive and this is not
>>> how it works in linux kernel if you're opening socket and sending a
>>> packet to the "VF-rep" network interface.
>>>
>>> And if rte_eth_tx_burst("VF-rep", packet) sends packet to "VF" and not
>>> to "VF-rep", than I don't understand why PORT_ID action should work in
>>> the opposite way.
>>
>> There's no contradiction here.
>>
>> In rte_eth_tx_burst(X, packet) example, "X" is the port which the application sits on and from where it sends the packet. In other words, it's the point where the packet originates from, and not where it goes to.
>>
>> At the same time, flow *action* PORT_ID (ID = "X") is clearly the opposite: it specifies where the packet will go. Port ID is the characteristic of a DPDK ethdev. So the packet goes *to* an ethdev with the given ID ("X").
>>
>> Perhaps consider action PHY_PORT: the index is the characteristic of the network port. The packet goes *to* network through this NP. And not the opposite way. Hopefully, nobody is going to claim that action PHY_PORT should mean re-injecting the packet back to the HW flow engine "as if it just came from the network port". Then why does one try to skew the PORT_ID meaning this way? PORT_ID points to an ethdev - the packet goes *to* the ethdev. Isn't that simple?
>
> It's not simple. And PHY_PORT action would be hard to use from the
> application that doesn't really need to know how underlying hardware
> structured.
Yes, I agree. Basically above paragraph just try to highlight
existing consistent semantics in various actions which set
traffic direction and highlight inconsistency if we interpret
PORT_ID default as egress in accordance with terminology
suggested below. PORT_ID is a DPDK port and default direction
should be to DPDK port. I'll continue on the topic below.
>>
>>>
>>> Application receives a packet from port A and puts it to the port B.
>>> TC rule to forward packets from port A to port B will provide same result.
>>> So, why the similar rte_flow should do the opposite and send the packet
>>> back to the application?
>>
>> Please see above. Action VF sends the packet *to* VF and *not* to the upstream entity which this VF is connected to. Action PHY_PORT sends the packet *to* network and does *not* make it appear as if it entered the NIC from the network side. Action QUEUE sends the packet *to* the Rx queue and does *not* make it appear as if it just egressed from the Tx queue with the same index. Action PORT_ID sends the packet *to* an ethdev with the given ID and *not* to the upstream entity which this ethdev is connected to. It's just that transparent. It's just "do what the name suggests".
>>
>> Yes, an application (say, OvS) might have a high level design which perceives the "high-level" ports plugged to it as a "patch-panel" of sorts. Yes, when a high-level mechanism/logic of such application invokes a *datapath-unaware* wrapper to offload a rule and request that the packet be delivered to the given "high-level" port, it therefore requests that the packet be delivered to the opposite end of the wire. But then the lower-level datapath-specific (DPDK) handler kicks in. Since it's DPDK-specific, it knows *everything* about the underlying flow library it works with. In particular it knows that action PORT_ID delivers the packet to an *ethdev*, at the same time, it knows that the upper caller (high-level logic) for sure wants the opposite, so it (the lower-level DPDK component) sets the "upstream" bit when translating the higher-level port action to an RTE action "PORT_ID".
>
> I don't understand that. DPDK user is the application and DPDK
> doesn't translate anything, application creates PORT_ID action
> directly and passes it to DPDK. So, you're forcing the *end user*
> (a.k.a. application) to know *everything* about the hardware the
> application runs on. Of course, it gets this information about
> the hardware from the DPDK library (otherwise this would be
> completely ridiculous), but this doesn't change the fact that it's
> the application that needs to think about the structure of the
> underlying hardware while it's absolutely not necessary in vast
> majority of cases.
Yes, that's all true, but I think that specification of the
direction is *not* diving to deep in hardware details.
For DPDK I think it is important to have consistent semantics
and interpretation of input parameters. That will make the
library easier to use and make it less error-prone.
>> Then the resulting action is correct, and the packet indeed doesn't end up in the ethdev but goes
>> to the opposite end of the wire. That's it.
>>
>> I have an impression that for some reason people are tempted to ignore the two nominal "layers" in such applications (generic, or high-level one and DPDK-specific one) thus trying to align DPDK logic with high-level logic of the applications. That's simply not right. What I'm trying to point out is that it *is* the true job of DPDK-specific data path handler in such application - to properly translate generic flow actions to DPDK-specific ones. It's the duty of DPDK component in such applications to be aware of the genuine meaning of action PORT_ID.
>
> The reason is very simple: if application don't need to know the
> full picture (how the hardware structured inside) it shouldn't
> care and it's a duty of DPDK to abstract the hardware and provide
> programming interfaces that could be easily used by application
> developers who are not experts in the architecture of a hardware
> that they want to use (basically, application developer should not
> care at all in most cases on which hardware application will work).
> It's basically in almost every single DPDK API, EAL means environment
> *abstraction* layer, not an environment *proxy/passthrough* layer.
> We can't assume that DPDK-specific layers in applications are always
> written by hardware experts and, IMHO, DPDK should not force users
> to learn underlying structures of switchdev devices. They might not
> even have such devices for testing, so the application that works
> on simple NICs should be able to run correctly on switchdev-capable
> NICs too.
>
> I think that "***MAGIC***" abstraction (see one of my previous ascii
> graphics) is very important here.
I've answered it above. Specification of the direction is *not*
diving to deep in HW details.
>>
>> This way, mixing up the two meanings is ruled out.
>
> Looking closer to how tc flower rules configured I noticed that
> 'mirred' action requires user to specify the direction in which
> the packet will appear on the destination port. And I suppose
> this will solve your issues with PORT_ID action without exposing
> the "full picture" of the architecture of an underlying hardware.
>
> It looks something like this:
>
> tc filter add dev A ... action mirred egress redirect dev B
> ^^^^^^
>
> Direction could be 'ingress' or 'egress', so the packet will
> ingress from the port B back to application/kernel or it will
> egress from this port to the external network. Same thing
> could be implemented in rte_flow like this:
>
> flow create A ingress transfer pattern eth / end
> action port_id id B egress / end
>
> So, application that needs to receive the packet from the port B
> will specify 'ingress', others that just want to send packet from
> the port B will specify 'egress'. Will that work for you?
>
> (BTW, 'ingress' seems to be not implemented in TC and that kind
> of suggests that it's not very useful at least for kernel use cases)
>
> One might say that it's actually the same what is proposed in
> this RFC, but I will argue that 'ingress/egress' schema doesn't
> break the "***MAGIC***" abstraction because user is not obligated
> to know the structure of the underlying hardware, while 'upstream'
> flag is something very unclear from that perspective and makes
> no sense for plane ports (non-representors).
I think it is really an excellent idea and suggested
terminology looks very good to me. However, we should
agree on technical details on API level (not testpmd
commands). I think we have 4 options:
A. Add "ingress" bit with "egress" as unset meaning.
Yes, that's what is current behaviour assumed and
used by OvS and implemented in some PMDs.
My problem with it that it is, IMHO, inconsistent
default value (as explained above).
B. Add "egress" bit with "ingress" as unset meaning.
Basically it is what is suggested in the RFC, but
the problem of the suggestion is the silent breakage
of existing users (let's put it a side if it is
correct usage or misuse). It is still the fact.
C. Encode above in ethdev port ID MSB.
The problem of the solution is that encoding
makes sense for representors, but the problem
exists for non-representor ports as well.
I have no good ideas on terminology in the case
if we try to solve it for non-representors.
D. Break API and ABI and add enum with unset(default)/
ingress/egress members to enforce application to
specify direction.
It is unclear what we'll do in the case of A, B and D
if we encode representor in port ID MSB in any case.
>>>>>> On 01/06/2021 15:10, Ilya Maximets wrote:
>>>>>>> On 6/1/21 1:14 PM, Ivan Malov wrote:
>>>>>>>> By its very name, action PORT_ID means that packets hit an ethdev with the
>>>>>>>> given DPDK port ID. At least the current comments don't state the opposite.
>>>>>>>> That said, since port representors had been adopted, applications like OvS
>>>>>>>> have been misusing the action. They misread its purpose as sending packets
>>>>>>>> to the opposite end of the "wire" plugged to the given ethdev, for example,
>>>>>>>> redirecting packets to the VF itself rather than to its representor ethdev.
>>>>>>>> Another example: OvS relies on this action with the admin PF's ethdev port
>>>>>>>> ID specified in it in order to send offloaded packets to the physical port.
>>>>>>>>
>>>>>>>> Since there might be applications which use this action in its valid sense,
>>>>>>>> one can't just change the documentation to greenlight the opposite meaning.
>>>>>>>> This patch adds an explicit bit to the action configuration which will let
>>>>>>>> applications, depending on their needs, leverage the two meanings properly.
>>>>>>>> Applications like OvS, as well as PMDs, will have to be corrected when the
>>>>>>>> patch has been applied. But the improved clarity of the action is worth it.
>>>>>>>>
>>>>>>>> The proposed change is not the only option. One could avoid changes in OvS
>>>>>>>> and PMDs if the new configuration field had the opposite meaning, with the
>>>>>>>> action itself meaning delivery to the represented port and not to DPDK one.
>>>>>>>> Alternatively, one could define a brand new action with the said behaviour.
>>>>>>>
>>>>>>> We had already very similar discussions regarding the understanding of what
>>>>>>> the representor really is from the DPDK API's point of view, and the last
>>>>>>> time, IIUC, it was concluded by a tech. board that representor should be
>>>>>>> a "ghost of a VF", i.e. DPDK APIs should apply configuration by default to
>>>>>>> VF and not to the representor device:
>>>>>>> https://patches.dpdk.org/project/dpdk/cover/20191029185051.32203-1-thomas@monjalon.net/#104376
>>>>>>> This wasn't enforced though, IIUC, for existing code and semantics is still mixed.
>>>>>>>
>>>>>>> I still think that configuration should be applied to VF, and the same applies
>>>>>>> to rte_flow API. IMHO, average application should not care if device is
>>>>>>> a VF itself or its representor. Everything should work exactly the same.
>>>>>>> I think this matches with the original idea/design of the switchdev functionality
>>>>>>> in the linux kernel and also matches with how the average user thinks about
>>>>>>> representor devices.
>>>>>>>
>>>>>>> If some specific use-case requires to distinguish VF from the representor,
>>>>>>> there should probably be a separate special API/flag for that.
>>>>>>>
>>>>>>> Best regards, Ilya Maximets.
>>>>>>>
>>>>>>
>>>>
>>
^ permalink raw reply [flat|nested] 40+ messages in thread
* Re: [dpdk-dev] [RFC PATCH] ethdev: clarify flow action PORT ID semantics
2021-06-03 10:33 ` Andrew Rybchenko
@ 2021-06-03 11:05 ` Ilya Maximets
0 siblings, 0 replies; 40+ messages in thread
From: Ilya Maximets @ 2021-06-03 11:05 UTC (permalink / raw)
To: Andrew Rybchenko, Ilya Maximets, Ivan Malov, dev
Cc: Eli Britstein, Smadar Fuks, Hyong Youb Kim, Ori Kam, Jerin Jacob,
John Daley, Thomas Monjalon, Ferruh Yigit
On 6/3/21 12:33 PM, Andrew Rybchenko wrote:
> On 6/3/21 12:29 PM, Ilya Maximets wrote:
>> On 6/2/21 9:35 PM, Ivan Malov wrote:
>>> On 02/06/2021 20:35, Ilya Maximets wrote:
>>>> (Dropped Broadcom folks from CC. Mail server refuses to accept their
>>>> emails for some reason: "Recipient address rejected: Domain not found."
>>>> Please, try to ad them back on reply.)
>>>>
>>>> On 6/2/21 6:26 PM, Andrew Rybchenko wrote:
>>>>> On 6/2/21 3:46 PM, Ilya Maximets wrote:
>>>>>> On 6/1/21 4:28 PM, Ivan Malov wrote:
>>>>>>> Hi Ilya,
>>>>>>>
>>>>>>> Thank you for reviewing the proposal at such short notice. I'm afraid that prior discussions overlook the simple fact that the whole problem is not limited to just VF representors. Action PORT_ID is also used with respect to the admin PF's ethdev, which "represents itself" (and by no means it represents the underlying physical/network port). In this case, one cannot state that the application treats it as a physical port, just like one states that the application perceives representors as VFs themselves.
>>>>>>
>>>>>>
>>>>>> I don't think that it was overlooked. If device is in a switchdev mode than
>>>>>> there is a PF representor and VF representors. Application typically works
>>>>>> only with representors in this case is it doesn't make much sense to have
>>>>>> representor and the upstream port attached to the same application at the
>>>>>> same time. Configuration that is applied by application to the representor
>>>>>> (PF or VF, it doesn't matter) applies to the corresponding upstream port
>>>>>> (actual PF or VF) by default.
>>>>>
>>>>> PF is not necessarily associated with a network port. It
>>>>> could be many PFs and just one network port on NIC.
>>>>> Extra PFs are like VFs in this case. These PFs may be
>>>>> passed to a VM in a similar way. So, we can have PF
>>>>> representors similar to VF representors. I.e. it is
>>>>> incorrect to say that PF in the case of switchdev is
>>>>> a representor of a network port.
>>>>>
>>>>> If we prefer to talk in representors terminology, we
>>>>> need 4 types of prepresentors:
>>>>> - PF representor for PCIe physical function
>>>>> - VF representor for PCIe virtual function
>>>>> - SF representor for PCIe sub-function (PASID)
>>>>> - network port representor
>>>>> In fact above is PCIe oriented, but there are
>>>>> other buses and ways to deliver traffic to applications.
>>>>> Basically representor for any virtual port in virtual
>>>>> switch which DPDK app can control using transfer rules.
>>>>>
>>>>>> Exactly same thing here with PORT_ID action. You have a packet and action
>>>>>> to send it to the port, but it's not specified if HW needs to send it to
>>>>>> the representor or the upstream port (again, VF or PF, it doesn't matter).
>>>>>> Since there is no extra information, HW should send it to the upstream
>>>>>> port by default. The same as configuration applies by default to the
>>>>>> upstream port.
>>>>>>
>>>>>> Let's look at some workflow examples:
>>>>>>
>>>>>> DPDK Application
>>>>>> | |
>>>>>> | |
>>>>>> +--PF-rep------VF-rep---+
>>>>>> | |
>>>>>> | NIC (switchdev) |
>>>>>> | |
>>>>>> +---PF---------VF-------+
>>>>>> | |
>>>>>> | |
>>>>>> External VM or whatever
>>>>>> Network
>>>>>
>>>>> See above. PF <-> External Network is incorrect above
>>>>> since it not always the case. It should be
>>>>> "NP <-> External network" and "NP-rep" above (NP -
>>>>> network port). Sometimes PF is an NP-rep, but sometimes
>>>>> it is not. It is just a question of default rules in
>>>>> switchdev on what to do with traffic incoming from
>>>>> network port.
>>>>>
>>>>> A bit more complicated picture is:
>>>>>
>>>>> +----------------------------------------+
>>>>> | DPDK Application |
>>>>> +----+---------+---------+---------+-----+
>>>>> |PF0 |PF1 | |
>>>>> | | | |
>>>>> +--NP1-rep---NP2-rep---PF2-rep---VF-rep--+
>>>>> | |
>>>>> | NIC (switchdev) |
>>>>> | |
>>>>> +---NP1-------NP2-------PF2--------VF----+
>>>>> | | | |
>>>>> | | | |
>>>>> External External VM or VM or
>>>>> Network 1 Network 2 whatever whatever
>>>>>
>>>>> So, sometimes PF plays network port representor role (PF0,
>>>>> PF1), sometimes it requires representor itself (PF2).
>>>>> What to do if PF2 itself is attached to application?
>>>>> Can we route traffic to it using PORT_ID action?
>>>>> It has DPDK ethdev port. It is one of arguments why
>>>>> plain PORT_ID should route DPDK application.
>>>>
>>>> OK. This is not very different from my understanding. The key
>>>> is that there is a pair of interfaces, one is more visible than
>>>> the other one.
>>>>
>>>>>
>>>>> Of course, some applications would like to see it as
>>>>> (simpler is better):
>>>>>
>>>>> +----------------------------------------+
>>>>> | DPDK Application |
>>>>> | |
>>>>> +---PF0-------PF1------PF2-rep---VF-rep--+
>>>>> | | | |
>>>>> | | | |
>>>>> External External VM or VM or
>>>>> Network 1 Network 2 whatever whatever
>>>>>
>>>>> but some, I believe, require full picture. For examples,
>>>>> I'd really like to know how much traffic goes via all 8
>>>>> switchdev ports and running rte_eth_stats_get(0, ...)
>>>>> (i.e. DPDK port 0 attached to PF0) I'd like to get
>>>>> NP1-rep stats (not NP1 stats). It will match exactly
>>>>> what I see in DPDK application. It is an argument why
>>>>> plain PORT_ID should be treated as a DPDK ethdev port,
>>>>> not a represented (upstream) entity.
>>>>
>>>> The point is that if application doesn't require full picture,
>>>> it should not care. If application requires the full picture,
>>>> it could take extra steps by setting extra bits. I don't
>>>> understand why we need to force all applications to care about
>>>> the full picture if we can avoid that?
>>>>
>>>>>
>>>>>> a. Workflow for "DPDK Application" to set MAC to VF:
>>>>>>
>>>>>> 1. "DPDK Application" calls rte_set_etheraddr("VF-rep", new_mac);
>>>>>> 2. DPDK sets MAC for "VF".
>>>>>>
>>>>>> b. Workflow for "DPDK Application" to set MAC to PF:
>>>>>>
>>>>>> 1. "DPDK Application" calls rte_set_etheraddr("PF-rep", new_mac);
>>>>>> 2. DPDK sets MAC for "PF".
>>>>>>
>>>>>> c. Workflow for "DPDK Application" to send packet to the external network:
>>>>>>
>>>>>> 1. "DPDK Application" calls rte_eth_tx_burst("PF-rep", packet);
>>>>>> 2. NIC receives the packet from "PF-rep" and sends it to "PF".
>>>>>> 3. packet egresses to the external network from "PF".
>>>>>>
>>>>>> d. Workflow for "DPDK Application" to send packet to the "VM or whatever":
>>>>>>
>>>>>> 1. "DPDK Application" calls rte_eth_tx_burst("VF-rep", packet);
>>>>>> 2. NIC receives the packet from "VF-rep" and sends it to "VF".
>>>>>> 3. "VM or whatever" receives the packet from "VF".
>>>>>>
>>>>>> In two workflows above there is no rte_flow processing on step 2, i.e.,
>>>>>> NIC does not perform any lookups/matches/actions, because it's not possible
>>>>>> to configure actions for packets received from "PF-rep" or
>>>>>> "VF-rep" as these ports doesn't own a port id and all the configuration
>>>>>> and rte_flow actions translated and applied for the devices that these
>>>>>> ports represents ("PF" and "VF") and not representors themselves ("PF-rep"
>>>>>> or "VF-rep").
>>>>>>
>>>>>> e. Workflow for the packet received on PF and PORT_ID action:
>>>>>>
>>>>>> 1. "DPDK Application" configures rte_flow for all packets from "PF-rep"
>>>>>> to execute PORT_ID "VF-rep".
>>>>>> 2. NIC receives packet on "PF".
>>>>>> 3. NIC executes 'PORT_ID "VF-rep"' action by sending packet to "VF".
>>>>>> 4. "VM or whatever" receives the packet from "VF".
>>>>>>
>>>>>> f. Workflow for the packet received on VF and PORT_ID action:
>>>>>>
>>>>>> 1. "DPDK Application" configures rte_flow for all packets from "VF-rep"
>>>>>> to execute 'PORT_ID "PF-rep"'.
>>>>>> 2. NIC receives packet on "VF".
>>>>>> 3. NIC executes 'PORT_ID "PF-rep"' action by sending packet to "PF".
>>>>>> 4. Packet egresses from the "PF" to the external network.
>>>>>>
>>>>>> Above is what, IMHO, the logic should look like and this matches with
>>>>>> the overall switchdev design in kernel.
>>>>>>
>>>>>> I understand that this logic could seem flipped-over from the HW point
>>>>>> of view, but it's perfectly logical from the user's perspective, because
>>>>>> user should not care if the application works with representors or
>>>>>> some real devices. If application configures that all packets from port
>>>>>> A should be sent to port B, user will expect that these packets will
>>>>>> egress from port B once received from port A. That will be highly
>>>>>> inconvenient if the packet will ingress from port B back to the
>>>>>> application instead.
>>>>>>
>>>>>> DPDK Application
>>>>>> | |
>>>>>> | |
>>>>>> port A port B
>>>>>> | |
>>>>>> *****MAGIC*****
>>>>>> | |
>>>>>> External Another Network
>>>>>> Network or VM or whatever
>>>>>>
>>>>>> It should not matter if there is an extra layer between ports A and B
>>>>>> and the external network and VM. Everything should work in exactly the
>>>>>> same way, transparently for the application.
>>>>>>
>>>>>> The point of hardware offloading, and therefore rte_flow API, is to take
>>>>>> what user does in software and make this "magically" work in hardware in
>>>>>> the exactly same way. And this will be broken if user will have to
>>>>>> use different logic based on the mode the hardware works in, i.e. based on
>>>>>> the fact if the application works with ports or their representors.
>>>>>>
>>>>>> If some specific use case requires application to know if it's an
>>>>>> upstream port or the representor and demystify the internals of the switchdev
>>>>>> NIC, there should be a different port id for the representor itself that
>>>>>> could be used in all DPDK APIs including rte_flow API or a special bit for
>>>>>> that matter. IIRC, there was an idea to add a bit directly to the port_id
>>>>>> for that purpose that will flip over behavior in all the workflow scenarios
>>>>>> that I described above.
>>>>>
>>>>> As I understand we're basically on the same page, but just
>>>>> fighting for defaults in DPDK.
>>>>
>>>> Yep.
>>>>
>>>>>
>>>>>>>
>>>>>>> Given these facts, it would not be quite right to just align the documentation with the de-facto action meaning assumed by OvS.
>>>>>>
>>>>>> It's not a "meaning assumed by OvS", it's the original design and the
>>>>>> main idea of a switchdev based on a common sense.
>>>>>
>>>>> If so, common sense is not that common :)
>>>>> My "common sense" says me that PORT_ID action
>>>>> should route traffic to DPDK ethdev port to be
>>>>> received by the DPDK application.
>>>>
>>>> By this logic rte_eth_tx_burst("VF-rep", packet) should send a packet
>>>> to "VF-rep", i.e. this packet will be received back by the application
>>>> on this same interface. But that is counter-intuitive and this is not
>>>> how it works in linux kernel if you're opening socket and sending a
>>>> packet to the "VF-rep" network interface.
>>>>
>>>> And if rte_eth_tx_burst("VF-rep", packet) sends packet to "VF" and not
>>>> to "VF-rep", than I don't understand why PORT_ID action should work in
>>>> the opposite way.
>>>
>>> There's no contradiction here.
>>>
>>> In rte_eth_tx_burst(X, packet) example, "X" is the port which the application sits on and from where it sends the packet. In other words, it's the point where the packet originates from, and not where it goes to.
>>>
>>> At the same time, flow *action* PORT_ID (ID = "X") is clearly the opposite: it specifies where the packet will go. Port ID is the characteristic of a DPDK ethdev. So the packet goes *to* an ethdev with the given ID ("X").
>>>
>>> Perhaps consider action PHY_PORT: the index is the characteristic of the network port. The packet goes *to* network through this NP. And not the opposite way. Hopefully, nobody is going to claim that action PHY_PORT should mean re-injecting the packet back to the HW flow engine "as if it just came from the network port". Then why does one try to skew the PORT_ID meaning this way? PORT_ID points to an ethdev - the packet goes *to* the ethdev. Isn't that simple?
>>
>> It's not simple. And PHY_PORT action would be hard to use from the
>> application that doesn't really need to know how underlying hardware
>> structured.
>
> Yes, I agree. Basically above paragraph just try to highlight
> existing consistent semantics in various actions which set
> traffic direction and highlight inconsistency if we interpret
> PORT_ID default as egress in accordance with terminology
> suggested below. PORT_ID is a DPDK port and default direction
> should be to DPDK port. I'll continue on the topic below.
>
>>>
>>>>
>>>> Application receives a packet from port A and puts it to the port B.
>>>> TC rule to forward packets from port A to port B will provide same result.
>>>> So, why the similar rte_flow should do the opposite and send the packet
>>>> back to the application?
>>>
>>> Please see above. Action VF sends the packet *to* VF and *not* to the upstream entity which this VF is connected to. Action PHY_PORT sends the packet *to* network and does *not* make it appear as if it entered the NIC from the network side. Action QUEUE sends the packet *to* the Rx queue and does *not* make it appear as if it just egressed from the Tx queue with the same index. Action PORT_ID sends the packet *to* an ethdev with the given ID and *not* to the upstream entity which this ethdev is connected to. It's just that transparent. It's just "do what the name suggests".
>>>
>>> Yes, an application (say, OvS) might have a high level design which perceives the "high-level" ports plugged to it as a "patch-panel" of sorts. Yes, when a high-level mechanism/logic of such application invokes a *datapath-unaware* wrapper to offload a rule and request that the packet be delivered to the given "high-level" port, it therefore requests that the packet be delivered to the opposite end of the wire. But then the lower-level datapath-specific (DPDK) handler kicks in. Since it's DPDK-specific, it knows *everything* about the underlying flow library it works with. In particular it knows that action PORT_ID delivers the packet to an *ethdev*, at the same time, it knows that the upper caller (high-level logic) for sure wants the opposite, so it (the lower-level DPDK component) sets the "upstream" bit when translating the higher-level port action to an RTE action "PORT_ID".
>>
>> I don't understand that. DPDK user is the application and DPDK
>> doesn't translate anything, application creates PORT_ID action
>> directly and passes it to DPDK. So, you're forcing the *end user*
>> (a.k.a. application) to know *everything* about the hardware the
>> application runs on. Of course, it gets this information about
>> the hardware from the DPDK library (otherwise this would be
>> completely ridiculous), but this doesn't change the fact that it's
>> the application that needs to think about the structure of the
>> underlying hardware while it's absolutely not necessary in vast
>> majority of cases.
>
> Yes, that's all true, but I think that specification of the
> direction is *not* diving to deep in hardware details.
>
> For DPDK I think it is important to have consistent semantics
> and interpretation of input parameters. That will make the
> library easier to use and make it less error-prone.
>
>>> Then the resulting action is correct, and the packet indeed doesn't end up in the ethdev but goes
>>> to the opposite end of the wire. That's it.
>>>
>>> I have an impression that for some reason people are tempted to ignore the two nominal "layers" in such applications (generic, or high-level one and DPDK-specific one) thus trying to align DPDK logic with high-level logic of the applications. That's simply not right. What I'm trying to point out is that it *is* the true job of DPDK-specific data path handler in such application - to properly translate generic flow actions to DPDK-specific ones. It's the duty of DPDK component in such applications to be aware of the genuine meaning of action PORT_ID.
>>
>> The reason is very simple: if application don't need to know the
>> full picture (how the hardware structured inside) it shouldn't
>> care and it's a duty of DPDK to abstract the hardware and provide
>> programming interfaces that could be easily used by application
>> developers who are not experts in the architecture of a hardware
>> that they want to use (basically, application developer should not
>> care at all in most cases on which hardware application will work).
>> It's basically in almost every single DPDK API, EAL means environment
>> *abstraction* layer, not an environment *proxy/passthrough* layer.
>> We can't assume that DPDK-specific layers in applications are always
>> written by hardware experts and, IMHO, DPDK should not force users
>> to learn underlying structures of switchdev devices. They might not
>> even have such devices for testing, so the application that works
>> on simple NICs should be able to run correctly on switchdev-capable
>> NICs too.
>>
>> I think that "***MAGIC***" abstraction (see one of my previous ascii
>> graphics) is very important here.
>
> I've answered it above. Specification of the direction is *not*
> diving to deep in HW details.
Yes, I agree that specification of direction doesn't require any
knowledge of any HW details and this option is perfectly fine for
me as described in 'ingress/egress' suggestion below. My argument
is about 'upstream' flag specifically. Wording is important, because
I think that 'upstream' implies some knowledge that there are two
different ports.
>
>>>
>>> This way, mixing up the two meanings is ruled out.
>>
>> Looking closer to how tc flower rules configured I noticed that
>> 'mirred' action requires user to specify the direction in which
>> the packet will appear on the destination port. And I suppose
>> this will solve your issues with PORT_ID action without exposing
>> the "full picture" of the architecture of an underlying hardware.
>>
>> It looks something like this:
>>
>> tc filter add dev A ... action mirred egress redirect dev B
>> ^^^^^^
>>
>> Direction could be 'ingress' or 'egress', so the packet will
>> ingress from the port B back to application/kernel or it will
>> egress from this port to the external network. Same thing
>> could be implemented in rte_flow like this:
>>
>> flow create A ingress transfer pattern eth / end
>> action port_id id B egress / end
>>
>> So, application that needs to receive the packet from the port B
>> will specify 'ingress', others that just want to send packet from
>> the port B will specify 'egress'. Will that work for you?
>>
>> (BTW, 'ingress' seems to be not implemented in TC and that kind
>> of suggests that it's not very useful at least for kernel use cases)
>>
>> One might say that it's actually the same what is proposed in
>> this RFC, but I will argue that 'ingress/egress' schema doesn't
>> break the "***MAGIC***" abstraction because user is not obligated
>> to know the structure of the underlying hardware, while 'upstream'
>> flag is something very unclear from that perspective and makes
>> no sense for plane ports (non-representors).
>
> I think it is really an excellent idea and suggested
> terminology looks very good to me. However, we should
> agree on technical details on API level (not testpmd
> commands). I think we have 4 options:
>
> A. Add "ingress" bit with "egress" as unset meaning.
> Yes, that's what is current behaviour assumed and
> used by OvS and implemented in some PMDs.
> My problem with it that it is, IMHO, inconsistent
> default value (as explained above).
>
> B. Add "egress" bit with "ingress" as unset meaning.
> Basically it is what is suggested in the RFC, but
> the problem of the suggestion is the silent breakage
> of existing users (let's put it a side if it is
> correct usage or misuse). It is still the fact.
>
> C. Encode above in ethdev port ID MSB.
> The problem of the solution is that encoding
> makes sense for representors, but the problem
> exists for non-representor ports as well.
> I have no good ideas on terminology in the case
> if we try to solve it for non-representors.
>
> D. Break API and ABI and add enum with unset(default)/
> ingress/egress members to enforce application to
> specify direction.
>
> It is unclear what we'll do in the case of A, B and D
> if we encode representor in port ID MSB in any case.
My opinion:
- Option D is the best choice for rte_flow. No defaults, users forced
to explicitly choose the direction in HW-independent way.
- I agree that option C somewhat conflicts with the 'ingress/egress'
flag idea and it is more hardware-specific. Therefore if option C
is going to be implemented it should be implemented in concept of
option A, i.e. 'egress' is default option if port ID MSB is not set.
>
>>>>>>> On 01/06/2021 15:10, Ilya Maximets wrote:
>>>>>>>> On 6/1/21 1:14 PM, Ivan Malov wrote:
>>>>>>>>> By its very name, action PORT_ID means that packets hit an ethdev with the
>>>>>>>>> given DPDK port ID. At least the current comments don't state the opposite.
>>>>>>>>> That said, since port representors had been adopted, applications like OvS
>>>>>>>>> have been misusing the action. They misread its purpose as sending packets
>>>>>>>>> to the opposite end of the "wire" plugged to the given ethdev, for example,
>>>>>>>>> redirecting packets to the VF itself rather than to its representor ethdev.
>>>>>>>>> Another example: OvS relies on this action with the admin PF's ethdev port
>>>>>>>>> ID specified in it in order to send offloaded packets to the physical port.
>>>>>>>>>
>>>>>>>>> Since there might be applications which use this action in its valid sense,
>>>>>>>>> one can't just change the documentation to greenlight the opposite meaning.
>>>>>>>>> This patch adds an explicit bit to the action configuration which will let
>>>>>>>>> applications, depending on their needs, leverage the two meanings properly.
>>>>>>>>> Applications like OvS, as well as PMDs, will have to be corrected when the
>>>>>>>>> patch has been applied. But the improved clarity of the action is worth it.
>>>>>>>>>
>>>>>>>>> The proposed change is not the only option. One could avoid changes in OvS
>>>>>>>>> and PMDs if the new configuration field had the opposite meaning, with the
>>>>>>>>> action itself meaning delivery to the represented port and not to DPDK one.
>>>>>>>>> Alternatively, one could define a brand new action with the said behaviour.
>>>>>>>>
>>>>>>>> We had already very similar discussions regarding the understanding of what
>>>>>>>> the representor really is from the DPDK API's point of view, and the last
>>>>>>>> time, IIUC, it was concluded by a tech. board that representor should be
>>>>>>>> a "ghost of a VF", i.e. DPDK APIs should apply configuration by default to
>>>>>>>> VF and not to the representor device:
>>>>>>>> https://patches.dpdk.org/project/dpdk/cover/20191029185051.32203-1-thomas@monjalon.net/#104376
>>>>>>>> This wasn't enforced though, IIUC, for existing code and semantics is still mixed.
>>>>>>>>
>>>>>>>> I still think that configuration should be applied to VF, and the same applies
>>>>>>>> to rte_flow API. IMHO, average application should not care if device is
>>>>>>>> a VF itself or its representor. Everything should work exactly the same.
>>>>>>>> I think this matches with the original idea/design of the switchdev functionality
>>>>>>>> in the linux kernel and also matches with how the average user thinks about
>>>>>>>> representor devices.
>>>>>>>>
>>>>>>>> If some specific use-case requires to distinguish VF from the representor,
>>>>>>>> there should probably be a separate special API/flag for that.
>>>>>>>>
>>>>>>>> Best regards, Ilya Maximets.
>>>>>>>>
>>>>>>>
>>>>>
>>>
>
^ permalink raw reply [flat|nested] 40+ messages in thread
* Re: [dpdk-dev] [RFC PATCH] ethdev: clarify flow action PORT ID semantics
2021-06-03 9:29 ` Ilya Maximets
2021-06-03 10:33 ` Andrew Rybchenko
@ 2021-06-03 11:29 ` Ivan Malov
2021-06-07 19:27 ` Ilya Maximets
1 sibling, 1 reply; 40+ messages in thread
From: Ivan Malov @ 2021-06-03 11:29 UTC (permalink / raw)
To: Ilya Maximets, Andrew Rybchenko, dev
Cc: Eli Britstein, Smadar Fuks, Hyong Youb Kim, Ori Kam, Jerin Jacob,
John Daley, Thomas Monjalon, Ferruh Yigit
On 03/06/2021 12:29, Ilya Maximets wrote:
> On 6/2/21 9:35 PM, Ivan Malov wrote:
>> On 02/06/2021 20:35, Ilya Maximets wrote:
>>> (Dropped Broadcom folks from CC. Mail server refuses to accept their
>>> emails for some reason: "Recipient address rejected: Domain not found."
>>> Please, try to ad them back on reply.)
>>>
>>> On 6/2/21 6:26 PM, Andrew Rybchenko wrote:
>>>> On 6/2/21 3:46 PM, Ilya Maximets wrote:
>>>>> On 6/1/21 4:28 PM, Ivan Malov wrote:
>>>>>> Hi Ilya,
>>>>>>
>>>>>> Thank you for reviewing the proposal at such short notice. I'm afraid that prior discussions overlook the simple fact that the whole problem is not limited to just VF representors. Action PORT_ID is also used with respect to the admin PF's ethdev, which "represents itself" (and by no means it represents the underlying physical/network port). In this case, one cannot state that the application treats it as a physical port, just like one states that the application perceives representors as VFs themselves.
>>>>>
>>>>>
>>>>> I don't think that it was overlooked. If device is in a switchdev mode than
>>>>> there is a PF representor and VF representors. Application typically works
>>>>> only with representors in this case is it doesn't make much sense to have
>>>>> representor and the upstream port attached to the same application at the
>>>>> same time. Configuration that is applied by application to the representor
>>>>> (PF or VF, it doesn't matter) applies to the corresponding upstream port
>>>>> (actual PF or VF) by default.
>>>>
>>>> PF is not necessarily associated with a network port. It
>>>> could be many PFs and just one network port on NIC.
>>>> Extra PFs are like VFs in this case. These PFs may be
>>>> passed to a VM in a similar way. So, we can have PF
>>>> representors similar to VF representors. I.e. it is
>>>> incorrect to say that PF in the case of switchdev is
>>>> a representor of a network port.
>>>>
>>>> If we prefer to talk in representors terminology, we
>>>> need 4 types of prepresentors:
>>>> - PF representor for PCIe physical function
>>>> - VF representor for PCIe virtual function
>>>> - SF representor for PCIe sub-function (PASID)
>>>> - network port representor
>>>> In fact above is PCIe oriented, but there are
>>>> other buses and ways to deliver traffic to applications.
>>>> Basically representor for any virtual port in virtual
>>>> switch which DPDK app can control using transfer rules.
>>>>
>>>>> Exactly same thing here with PORT_ID action. You have a packet and action
>>>>> to send it to the port, but it's not specified if HW needs to send it to
>>>>> the representor or the upstream port (again, VF or PF, it doesn't matter).
>>>>> Since there is no extra information, HW should send it to the upstream
>>>>> port by default. The same as configuration applies by default to the
>>>>> upstream port.
>>>>>
>>>>> Let's look at some workflow examples:
>>>>>
>>>>> DPDK Application
>>>>> | |
>>>>> | |
>>>>> +--PF-rep------VF-rep---+
>>>>> | |
>>>>> | NIC (switchdev) |
>>>>> | |
>>>>> +---PF---------VF-------+
>>>>> | |
>>>>> | |
>>>>> External VM or whatever
>>>>> Network
>>>>
>>>> See above. PF <-> External Network is incorrect above
>>>> since it not always the case. It should be
>>>> "NP <-> External network" and "NP-rep" above (NP -
>>>> network port). Sometimes PF is an NP-rep, but sometimes
>>>> it is not. It is just a question of default rules in
>>>> switchdev on what to do with traffic incoming from
>>>> network port.
>>>>
>>>> A bit more complicated picture is:
>>>>
>>>> +----------------------------------------+
>>>> | DPDK Application |
>>>> +----+---------+---------+---------+-----+
>>>> |PF0 |PF1 | |
>>>> | | | |
>>>> +--NP1-rep---NP2-rep---PF2-rep---VF-rep--+
>>>> | |
>>>> | NIC (switchdev) |
>>>> | |
>>>> +---NP1-------NP2-------PF2--------VF----+
>>>> | | | |
>>>> | | | |
>>>> External External VM or VM or
>>>> Network 1 Network 2 whatever whatever
>>>>
>>>> So, sometimes PF plays network port representor role (PF0,
>>>> PF1), sometimes it requires representor itself (PF2).
>>>> What to do if PF2 itself is attached to application?
>>>> Can we route traffic to it using PORT_ID action?
>>>> It has DPDK ethdev port. It is one of arguments why
>>>> plain PORT_ID should route DPDK application.
>>>
>>> OK. This is not very different from my understanding. The key
>>> is that there is a pair of interfaces, one is more visible than
>>> the other one.
>>>
>>>>
>>>> Of course, some applications would like to see it as
>>>> (simpler is better):
>>>>
>>>> +----------------------------------------+
>>>> | DPDK Application |
>>>> | |
>>>> +---PF0-------PF1------PF2-rep---VF-rep--+
>>>> | | | |
>>>> | | | |
>>>> External External VM or VM or
>>>> Network 1 Network 2 whatever whatever
>>>>
>>>> but some, I believe, require full picture. For examples,
>>>> I'd really like to know how much traffic goes via all 8
>>>> switchdev ports and running rte_eth_stats_get(0, ...)
>>>> (i.e. DPDK port 0 attached to PF0) I'd like to get
>>>> NP1-rep stats (not NP1 stats). It will match exactly
>>>> what I see in DPDK application. It is an argument why
>>>> plain PORT_ID should be treated as a DPDK ethdev port,
>>>> not a represented (upstream) entity.
>>>
>>> The point is that if application doesn't require full picture,
>>> it should not care. If application requires the full picture,
>>> it could take extra steps by setting extra bits. I don't
>>> understand why we need to force all applications to care about
>>> the full picture if we can avoid that?
>>>
>>>>
>>>>> a. Workflow for "DPDK Application" to set MAC to VF:
>>>>>
>>>>> 1. "DPDK Application" calls rte_set_etheraddr("VF-rep", new_mac);
>>>>> 2. DPDK sets MAC for "VF".
>>>>>
>>>>> b. Workflow for "DPDK Application" to set MAC to PF:
>>>>>
>>>>> 1. "DPDK Application" calls rte_set_etheraddr("PF-rep", new_mac);
>>>>> 2. DPDK sets MAC for "PF".
>>>>>
>>>>> c. Workflow for "DPDK Application" to send packet to the external network:
>>>>>
>>>>> 1. "DPDK Application" calls rte_eth_tx_burst("PF-rep", packet);
>>>>> 2. NIC receives the packet from "PF-rep" and sends it to "PF".
>>>>> 3. packet egresses to the external network from "PF".
>>>>>
>>>>> d. Workflow for "DPDK Application" to send packet to the "VM or whatever":
>>>>>
>>>>> 1. "DPDK Application" calls rte_eth_tx_burst("VF-rep", packet);
>>>>> 2. NIC receives the packet from "VF-rep" and sends it to "VF".
>>>>> 3. "VM or whatever" receives the packet from "VF".
>>>>>
>>>>> In two workflows above there is no rte_flow processing on step 2, i.e.,
>>>>> NIC does not perform any lookups/matches/actions, because it's not possible
>>>>> to configure actions for packets received from "PF-rep" or
>>>>> "VF-rep" as these ports doesn't own a port id and all the configuration
>>>>> and rte_flow actions translated and applied for the devices that these
>>>>> ports represents ("PF" and "VF") and not representors themselves ("PF-rep"
>>>>> or "VF-rep").
>>>>>
>>>>> e. Workflow for the packet received on PF and PORT_ID action:
>>>>>
>>>>> 1. "DPDK Application" configures rte_flow for all packets from "PF-rep"
>>>>> to execute PORT_ID "VF-rep".
>>>>> 2. NIC receives packet on "PF".
>>>>> 3. NIC executes 'PORT_ID "VF-rep"' action by sending packet to "VF".
>>>>> 4. "VM or whatever" receives the packet from "VF".
>>>>>
>>>>> f. Workflow for the packet received on VF and PORT_ID action:
>>>>>
>>>>> 1. "DPDK Application" configures rte_flow for all packets from "VF-rep"
>>>>> to execute 'PORT_ID "PF-rep"'.
>>>>> 2. NIC receives packet on "VF".
>>>>> 3. NIC executes 'PORT_ID "PF-rep"' action by sending packet to "PF".
>>>>> 4. Packet egresses from the "PF" to the external network.
>>>>>
>>>>> Above is what, IMHO, the logic should look like and this matches with
>>>>> the overall switchdev design in kernel.
>>>>>
>>>>> I understand that this logic could seem flipped-over from the HW point
>>>>> of view, but it's perfectly logical from the user's perspective, because
>>>>> user should not care if the application works with representors or
>>>>> some real devices. If application configures that all packets from port
>>>>> A should be sent to port B, user will expect that these packets will
>>>>> egress from port B once received from port A. That will be highly
>>>>> inconvenient if the packet will ingress from port B back to the
>>>>> application instead.
>>>>>
>>>>> DPDK Application
>>>>> | |
>>>>> | |
>>>>> port A port B
>>>>> | |
>>>>> *****MAGIC*****
>>>>> | |
>>>>> External Another Network
>>>>> Network or VM or whatever
>>>>>
>>>>> It should not matter if there is an extra layer between ports A and B
>>>>> and the external network and VM. Everything should work in exactly the
>>>>> same way, transparently for the application.
>>>>>
>>>>> The point of hardware offloading, and therefore rte_flow API, is to take
>>>>> what user does in software and make this "magically" work in hardware in
>>>>> the exactly same way. And this will be broken if user will have to
>>>>> use different logic based on the mode the hardware works in, i.e. based on
>>>>> the fact if the application works with ports or their representors.
>>>>>
>>>>> If some specific use case requires application to know if it's an
>>>>> upstream port or the representor and demystify the internals of the switchdev
>>>>> NIC, there should be a different port id for the representor itself that
>>>>> could be used in all DPDK APIs including rte_flow API or a special bit for
>>>>> that matter. IIRC, there was an idea to add a bit directly to the port_id
>>>>> for that purpose that will flip over behavior in all the workflow scenarios
>>>>> that I described above.
>>>>
>>>> As I understand we're basically on the same page, but just
>>>> fighting for defaults in DPDK.
>>>
>>> Yep.
>>>
>>>>
>>>>>>
>>>>>> Given these facts, it would not be quite right to just align the documentation with the de-facto action meaning assumed by OvS.
>>>>>
>>>>> It's not a "meaning assumed by OvS", it's the original design and the
>>>>> main idea of a switchdev based on a common sense.
>>>>
>>>> If so, common sense is not that common :)
>>>> My "common sense" says me that PORT_ID action
>>>> should route traffic to DPDK ethdev port to be
>>>> received by the DPDK application.
>>>
>>> By this logic rte_eth_tx_burst("VF-rep", packet) should send a packet
>>> to "VF-rep", i.e. this packet will be received back by the application
>>> on this same interface. But that is counter-intuitive and this is not
>>> how it works in linux kernel if you're opening socket and sending a
>>> packet to the "VF-rep" network interface.
>>>
>>> And if rte_eth_tx_burst("VF-rep", packet) sends packet to "VF" and not
>>> to "VF-rep", than I don't understand why PORT_ID action should work in
>>> the opposite way.
>>
>> There's no contradiction here.
>>
>> In rte_eth_tx_burst(X, packet) example, "X" is the port which the application sits on and from where it sends the packet. In other words, it's the point where the packet originates from, and not where it goes to.
>>
>> At the same time, flow *action* PORT_ID (ID = "X") is clearly the opposite: it specifies where the packet will go. Port ID is the characteristic of a DPDK ethdev. So the packet goes *to* an ethdev with the given ID ("X").
>>
>> Perhaps consider action PHY_PORT: the index is the characteristic of the network port. The packet goes *to* network through this NP. And not the opposite way. Hopefully, nobody is going to claim that action PHY_PORT should mean re-injecting the packet back to the HW flow engine "as if it just came from the network port". Then why does one try to skew the PORT_ID meaning this way? PORT_ID points to an ethdev - the packet goes *to* the ethdev. Isn't that simple?
>
> It's not simple. And PHY_PORT action would be hard to use from the
> application that doesn't really need to know how underlying hardware
> structured.
I'm not trying to suggest using PHY_PORT. I provide it as an example to
point out the inconsistency between these actions and action PORT_ID in
its de-facto sense assumed by apps like OvS. Sorry.
>
>>
>>>
>>> Application receives a packet from port A and puts it to the port B.
>>> TC rule to forward packets from port A to port B will provide same result.
>>> So, why the similar rte_flow should do the opposite and send the packet
>>> back to the application?
>>
>> Please see above. Action VF sends the packet *to* VF and *not* to the upstream entity which this VF is connected to. Action PHY_PORT sends the packet *to* network and does *not* make it appear as if it entered the NIC from the network side. Action QUEUE sends the packet *to* the Rx queue and does *not* make it appear as if it just egressed from the Tx queue with the same index. Action PORT_ID sends the packet *to* an ethdev with the given ID and *not* to the upstream entity which this ethdev is connected to. It's just that transparent. It's just "do what the name suggests".
>>
>> Yes, an application (say, OvS) might have a high level design which perceives the "high-level" ports plugged to it as a "patch-panel" of sorts. Yes, when a high-level mechanism/logic of such application invokes a *datapath-unaware* wrapper to offload a rule and request that the packet be delivered to the given "high-level" port, it therefore requests that the packet be delivered to the opposite end of the wire. But then the lower-level datapath-specific (DPDK) handler kicks in. Since it's DPDK-specific, it knows *everything* about the underlying flow library it works with. In particular it knows that action PORT_ID delivers the packet to an *ethdev*, at the same time, it knows that the upper caller (high-level logic) for sure wants the opposite, so it (the lower-level DPDK component) sets the "upstream" bit when translating the higher-level port action to an RTE action "PORT_ID".
>
> I don't understand that. DPDK user is the application and DPDK
> doesn't translate anything, application creates PORT_ID action
> directly and passes it to DPDK. So, you're forcing the *end user*
> (a.k.a. application) to know *everything* about the hardware the
> application runs on. Of course, it gets this information about
> the hardware from the DPDK library (otherwise this would be
> completely ridiculous), but this doesn't change the fact that it's
> the application that needs to think about the structure of the
> underlying hardware while it's absolutely not necessary in vast
> majority of cases.
Me forcing the application to know everything about the hardware? I do
no such thing. What I mean is very simple: the application developer is
expected to read DPDK documentation carefully before using a flow
primitive like PORT_ID. If they understood it, they would realise the
unfitness of the existing action behaviour for their needs. The
developer would then raise the issue and extend the action in one way
("upstream" bit) or another ("egress" bit). And only *then* they would
have everything in place to finally put the action into usage by the
application. But what we have in reality is the opposite. Let's just
face it.
>
>> Then the resulting action is correct, and the packet indeed doesn't end up in the ethdev but goes
>> to the opposite end of the wire. That's it.
>>
>> I have an impression that for some reason people are tempted to ignore the two nominal "layers" in such applications (generic, or high-level one and DPDK-specific one) thus trying to align DPDK logic with high-level logic of the applications. That's simply not right. What I'm trying to point out is that it *is* the true job of DPDK-specific data path handler in such application - to properly translate generic flow actions to DPDK-specific ones. It's the duty of DPDK component in such applications to be aware of the genuine meaning of action PORT_ID.
>
> The reason is very simple: if application don't need to know the
> full picture (how the hardware structured inside) it shouldn't
> care and it's a duty of DPDK to abstract the hardware and provide
> programming interfaces that could be easily used by application
> developers who are not experts in the architecture of a hardware
> that they want to use (basically, application developer should not
> care at all in most cases on which hardware application will work).
> It's basically in almost every single DPDK API, EAL means environment
> *abstraction* layer, not an environment *proxy/passthrough* layer.
> We can't assume that DPDK-specific layers in applications are always
> written by hardware experts and, IMHO, DPDK should not force users
> to learn underlying structures of switchdev devices. They might not
> even have such devices for testing, so the application that works
> on simple NICs should be able to run correctly on switchdev-capable
> NICs too.
>
> I think that "***MAGIC***" abstraction (see one of my previous ascii
> graphics) is very important here.
Look. Imagine that the hypothetical "observer" sits inside the
application. The application receives packets on one port X and sends
them using Tx burst API on port Y. Now it wants to offload this work and
use ID=Y to reference location where the packet should go "as if it was
sent using Tx burst on port Y". OK. But the application uses attribute
"transfer". And that really makes difference because it "teleports" (or
"transfers") the "observer" from the application down to the HW eSwitch
level. Yes, the application doesn't know anything about HW eSwitch
structure but it at leas knows about its very existence; otherwise, it
wouldn't use attribute "transfer" at all. And now, when the "observer"
sits inside the HW eSwitch, from that standpoint, PORT_ID ID=Y looks
like delivery to the ethdev port. The "observer"'s location changes the
perceived direction. In this case, correct default behaviour is delivery
to the ethdev. So. if the application developer was capable to correctly
put attribute "transfer" in use, why wouldn't they be capable to also
put flag "upstream" of the action PORT_ID in use, too?
>
>>
>> This way, mixing up the two meanings is ruled out.
>
> Looking closer to how tc flower rules configured I noticed that
> 'mirred' action requires user to specify the direction in which
> the packet will appear on the destination port. And I suppose
> this will solve your issues with PORT_ID action without exposing
> the "full picture" of the architecture of an underlying hardware.
>
> It looks something like this:
>
> tc filter add dev A ... action mirred egress redirect dev B
> ^^^^^^
>
> Direction could be 'ingress' or 'egress', so the packet will
> ingress from the port B back to application/kernel or it will
> egress from this port to the external network. Same thing
> could be implemented in rte_flow like this:
>
> flow create A ingress transfer pattern eth / end
> action port_id id B egress / end
>
> So, application that needs to receive the packet from the port B
> will specify 'ingress', others that just want to send packet from
> the port B will specify 'egress'. Will that work for you?
>
> (BTW, 'ingress' seems to be not implemented in TC and that kind
> of suggests that it's not very useful at least for kernel use cases)
>
> One might say that it's actually the same what is proposed in
> this RFC, but I will argue that 'ingress/egress' schema doesn't
> break the "***MAGIC***" abstraction because user is not obligated
> to know the structure of the underlying hardware, while 'upstream'
> flag is something very unclear from that perspective and makes
> no sense for plane ports (non-representors).
Perhaps indeed "upstream" is barely the best choice, and I'd be tempted
to accept the "egress" proposal, but I strongly disagree with the
suggestion that "upstream" purportedly implies some HW knowledge. It
simply means that the packet goes to the location where the wire leads
to and not back to the application's own end of the wire. Must the port
be connected to some location. And for non-representors, like admin's PF
(isn't that what you call a plane port?) it perfectly makes sense
because saying "PORT_ID id 0 upstream" means (not to the application,
but to us) that the packet will be sent to network through the network
port connected to this PF. That's it.
>
>>
>>>
>>>>
>>>>>>
>>>>>> On 01/06/2021 15:10, Ilya Maximets wrote:
>>>>>>> On 6/1/21 1:14 PM, Ivan Malov wrote:
>>>>>>>> By its very name, action PORT_ID means that packets hit an ethdev with the
>>>>>>>> given DPDK port ID. At least the current comments don't state the opposite.
>>>>>>>> That said, since port representors had been adopted, applications like OvS
>>>>>>>> have been misusing the action. They misread its purpose as sending packets
>>>>>>>> to the opposite end of the "wire" plugged to the given ethdev, for example,
>>>>>>>> redirecting packets to the VF itself rather than to its representor ethdev.
>>>>>>>> Another example: OvS relies on this action with the admin PF's ethdev port
>>>>>>>> ID specified in it in order to send offloaded packets to the physical port.
>>>>>>>>
>>>>>>>> Since there might be applications which use this action in its valid sense,
>>>>>>>> one can't just change the documentation to greenlight the opposite meaning.
>>>>>>>> This patch adds an explicit bit to the action configuration which will let
>>>>>>>> applications, depending on their needs, leverage the two meanings properly.
>>>>>>>> Applications like OvS, as well as PMDs, will have to be corrected when the
>>>>>>>> patch has been applied. But the improved clarity of the action is worth it.
>>>>>>>>
>>>>>>>> The proposed change is not the only option. One could avoid changes in OvS
>>>>>>>> and PMDs if the new configuration field had the opposite meaning, with the
>>>>>>>> action itself meaning delivery to the represented port and not to DPDK one.
>>>>>>>> Alternatively, one could define a brand new action with the said behaviour.
>>>>>>>
>>>>>>> We had already very similar discussions regarding the understanding of what
>>>>>>> the representor really is from the DPDK API's point of view, and the last
>>>>>>> time, IIUC, it was concluded by a tech. board that representor should be
>>>>>>> a "ghost of a VF", i.e. DPDK APIs should apply configuration by default to
>>>>>>> VF and not to the representor device:
>>>>>>> https://patches.dpdk.org/project/dpdk/cover/20191029185051.32203-1-thomas@monjalon.net/#104376
>>>>>>> This wasn't enforced though, IIUC, for existing code and semantics is still mixed.
>>>>>>>
>>>>>>> I still think that configuration should be applied to VF, and the same applies
>>>>>>> to rte_flow API. IMHO, average application should not care if device is
>>>>>>> a VF itself or its representor. Everything should work exactly the same.
>>>>>>> I think this matches with the original idea/design of the switchdev functionality
>>>>>>> in the linux kernel and also matches with how the average user thinks about
>>>>>>> representor devices.
>>>>>>>
>>>>>>> If some specific use-case requires to distinguish VF from the representor,
>>>>>>> there should probably be a separate special API/flag for that.
>>>>>>>
>>>>>>> Best regards, Ilya Maximets.
>>>>>>>
>>>>>>
>>>>
>>
--
Ivan M
^ permalink raw reply [flat|nested] 40+ messages in thread
* Re: [dpdk-dev] [RFC PATCH] ethdev: clarify flow action PORT ID semantics
2021-06-03 9:55 ` Andrew Rybchenko
@ 2021-06-07 8:28 ` Thomas Monjalon
2021-06-07 9:42 ` Andrew Rybchenko
0 siblings, 1 reply; 40+ messages in thread
From: Thomas Monjalon @ 2021-06-07 8:28 UTC (permalink / raw)
To: Ori Kam, Ivan Malov, Eli Britstein, Ilya Maximets
Cc: dev, Smadar Fuks, Hyong Youb Kim, Kishore Padmanabha,
Ajit Khaparde, Jerin Jacob, John Daley, Ferruh Yigit,
Andrew Rybchenko
03/06/2021 11:55, Andrew Rybchenko:
> On 6/3/21 12:18 PM, Ori Kam wrote:
> > Sorry but OVS got it right, this is the idea to send packet to the VF not to the representor,
> > I think that our first discussion should be what is a representor,
> > I know that there are a lot threads about it but it is steel unclear.
>
> Yes, really unclear. I'd like to highlight again that
> the problem is not with representors only (as described
> and discussed in the thread).
>
> > From my understanding representor is a shadow of a VF
> > This shadow has two functionalities:
> > 1. data
> > It should receive any packet that was sent from the VF and was not
> > routed to any other destination. And vise versa any traffic sent on the representor.
> > should arrive to the corresponding VF.
> > What use case do you see for sending a packet to the representor?
> >
> > 2. control
> > allow to modify the VF from DPDK application.
> >
> > Regarding the 1 point of the data, I don't see any sense if routing traffic to representor.
> > While on point 2 control their maybe some cases that we want to configure the representor itself
> > and not the VF for example changing mtu.
>
> IMO if so there is a big inconsistency here with statistics
> (just an example, which is simply to discuss).
> On one hand packet/byte stats should say how much data is
> received/sent by the DPDK application via the port (yes,
> shadow, but still an ethdev port).
> On the other hand you say that it is a shadow and it should
> return VF stats.
I see emails don't work well to conclude on how to manage representors.
I propose working in live meetings so we can try to align our views
on a virtual whiteboard and interactively ask questions.
Participants in those meetings could work on documenting what is the
view of a representor as a first step.
Second step, it should be easier to discuss the API.
If you agree, I will plan a first meeting where we can discuss what
is a representor in our opinions.
The meeting time would be 4pm UTC.
For the day, I would propose this Thursday 10
if it works for everybody involved.
^ permalink raw reply [flat|nested] 40+ messages in thread
* Re: [dpdk-dev] [RFC PATCH] ethdev: clarify flow action PORT ID semantics
2021-06-07 8:28 ` Thomas Monjalon
@ 2021-06-07 9:42 ` Andrew Rybchenko
2021-06-07 12:08 ` Ori Kam
0 siblings, 1 reply; 40+ messages in thread
From: Andrew Rybchenko @ 2021-06-07 9:42 UTC (permalink / raw)
To: Thomas Monjalon, Ori Kam, Ivan Malov, Eli Britstein, Ilya Maximets
Cc: dev, Smadar Fuks, Hyong Youb Kim, Kishore Padmanabha,
Ajit Khaparde, Jerin Jacob, John Daley, Ferruh Yigit
On 6/7/21 11:28 AM, Thomas Monjalon wrote:
> 03/06/2021 11:55, Andrew Rybchenko:
>> On 6/3/21 12:18 PM, Ori Kam wrote:
>>> Sorry but OVS got it right, this is the idea to send packet to the VF not to the representor,
>>> I think that our first discussion should be what is a representor,
>>> I know that there are a lot threads about it but it is steel unclear.
>>
>> Yes, really unclear. I'd like to highlight again that
>> the problem is not with representors only (as described
>> and discussed in the thread).
>>
>>> From my understanding representor is a shadow of a VF
>>> This shadow has two functionalities:
>>> 1. data
>>> It should receive any packet that was sent from the VF and was not
>>> routed to any other destination. And vise versa any traffic sent on the representor.
>>> should arrive to the corresponding VF.
>>> What use case do you see for sending a packet to the representor?
>>>
>>> 2. control
>>> allow to modify the VF from DPDK application.
>>>
>>> Regarding the 1 point of the data, I don't see any sense if routing traffic to representor.
>>> While on point 2 control their maybe some cases that we want to configure the representor itself
>>> and not the VF for example changing mtu.
>>
>> IMO if so there is a big inconsistency here with statistics
>> (just an example, which is simply to discuss).
>> On one hand packet/byte stats should say how much data is
>> received/sent by the DPDK application via the port (yes,
>> shadow, but still an ethdev port).
>> On the other hand you say that it is a shadow and it should
>> return VF stats.
>
> I see emails don't work well to conclude on how to manage representors.
> I propose working in live meetings so we can try to align our views
> on a virtual whiteboard and interactively ask questions.
> Participants in those meetings could work on documenting what is the
> view of a representor as a first step.
> Second step, it should be easier to discuss the API.
>
> If you agree, I will plan a first meeting where we can discuss what
> is a representor in our opinions.
> The meeting time would be 4pm UTC.
> For the day, I would propose this Thursday 10
> if it works for everybody involved.
>
OK for me.
^ permalink raw reply [flat|nested] 40+ messages in thread
* Re: [dpdk-dev] [RFC PATCH] ethdev: clarify flow action PORT ID semantics
2021-06-07 9:42 ` Andrew Rybchenko
@ 2021-06-07 12:08 ` Ori Kam
2021-06-07 13:21 ` Ilya Maximets
0 siblings, 1 reply; 40+ messages in thread
From: Ori Kam @ 2021-06-07 12:08 UTC (permalink / raw)
To: Andrew Rybchenko, NBU-Contact-Thomas Monjalon, Ivan Malov,
Eli Britstein, Ilya Maximets
Cc: dev, Smadar Fuks, Hyong Youb Kim, Kishore Padmanabha,
Ajit Khaparde, Jerin Jacob, John Daley, Ferruh Yigit
> -----Original Message-----
> From: Andrew Rybchenko <andrew.rybchenko@oktetlabs.ru>
> semantics
>
> On 6/7/21 11:28 AM, Thomas Monjalon wrote:
> > 03/06/2021 11:55, Andrew Rybchenko:
> >> On 6/3/21 12:18 PM, Ori Kam wrote:
> >>> Sorry but OVS got it right, this is the idea to send packet to the
> >>> VF not to the representor, I think that our first discussion should
> >>> be what is a representor, I know that there are a lot threads about it but
> it is steel unclear.
> >>
> >> Yes, really unclear. I'd like to highlight again that the problem is
> >> not with representors only (as described and discussed in the
> >> thread).
> >>
> >>> From my understanding representor is a shadow of a VF This shadow
> >>> has two functionalities:
> >>> 1. data
> >>> It should receive any packet that was sent from the VF and was not
> >>> routed to any other destination. And vise versa any traffic sent on the
> representor.
> >>> should arrive to the corresponding VF.
> >>> What use case do you see for sending a packet to the representor?
> >>>
> >>> 2. control
> >>> allow to modify the VF from DPDK application.
> >>>
> >>> Regarding the 1 point of the data, I don't see any sense if routing traffic
> to representor.
> >>> While on point 2 control their maybe some cases that we want to
> >>> configure the representor itself and not the VF for example changing
> mtu.
> >>
> >> IMO if so there is a big inconsistency here with statistics (just an
> >> example, which is simply to discuss).
> >> On one hand packet/byte stats should say how much data is
> >> received/sent by the DPDK application via the port (yes, shadow, but
> >> still an ethdev port).
> >> On the other hand you say that it is a shadow and it should return VF
> >> stats.
> >
> > I see emails don't work well to conclude on how to manage representors.
> > I propose working in live meetings so we can try to align our views on
> > a virtual whiteboard and interactively ask questions.
> > Participants in those meetings could work on documenting what is the
> > view of a representor as a first step.
> > Second step, it should be easier to discuss the API.
> >
> > If you agree, I will plan a first meeting where we can discuss what is
> > a representor in our opinions.
> > The meeting time would be 4pm UTC.
> > For the day, I would propose this Thursday 10 if it works for
> > everybody involved.
> >
>
> OK for me.
O.K. for me too.
Ori
^ permalink raw reply [flat|nested] 40+ messages in thread
* Re: [dpdk-dev] [RFC PATCH] ethdev: clarify flow action PORT ID semantics
2021-06-07 12:08 ` Ori Kam
@ 2021-06-07 13:21 ` Ilya Maximets
2021-06-07 16:07 ` Thomas Monjalon
0 siblings, 1 reply; 40+ messages in thread
From: Ilya Maximets @ 2021-06-07 13:21 UTC (permalink / raw)
To: Ori Kam, Andrew Rybchenko, NBU-Contact-Thomas Monjalon,
Ivan Malov, Eli Britstein, Ilya Maximets
Cc: dev, Smadar Fuks, Hyong Youb Kim, Kishore Padmanabha,
Ajit Khaparde, Jerin Jacob, John Daley, Ferruh Yigit
On 6/7/21 2:08 PM, Ori Kam wrote:
>
>
>> -----Original Message-----
>> From: Andrew Rybchenko <andrew.rybchenko@oktetlabs.ru>
>> semantics
>>
>> On 6/7/21 11:28 AM, Thomas Monjalon wrote:
>>> 03/06/2021 11:55, Andrew Rybchenko:
>>>> On 6/3/21 12:18 PM, Ori Kam wrote:
>>>>> Sorry but OVS got it right, this is the idea to send packet to the
>>>>> VF not to the representor, I think that our first discussion should
>>>>> be what is a representor, I know that there are a lot threads about it but
>> it is steel unclear.
>>>>
>>>> Yes, really unclear. I'd like to highlight again that the problem is
>>>> not with representors only (as described and discussed in the
>>>> thread).
>>>>
>>>>> From my understanding representor is a shadow of a VF This shadow
>>>>> has two functionalities:
>>>>> 1. data
>>>>> It should receive any packet that was sent from the VF and was not
>>>>> routed to any other destination. And vise versa any traffic sent on the
>> representor.
>>>>> should arrive to the corresponding VF.
>>>>> What use case do you see for sending a packet to the representor?
>>>>>
>>>>> 2. control
>>>>> allow to modify the VF from DPDK application.
>>>>>
>>>>> Regarding the 1 point of the data, I don't see any sense if routing traffic
>> to representor.
>>>>> While on point 2 control their maybe some cases that we want to
>>>>> configure the representor itself and not the VF for example changing
>> mtu.
>>>>
>>>> IMO if so there is a big inconsistency here with statistics (just an
>>>> example, which is simply to discuss).
>>>> On one hand packet/byte stats should say how much data is
>>>> received/sent by the DPDK application via the port (yes, shadow, but
>>>> still an ethdev port).
>>>> On the other hand you say that it is a shadow and it should return VF
>>>> stats.
>>>
>>> I see emails don't work well to conclude on how to manage representors.
>>> I propose working in live meetings so we can try to align our views on
>>> a virtual whiteboard and interactively ask questions.
>>> Participants in those meetings could work on documenting what is the
>>> view of a representor as a first step.
>>> Second step, it should be easier to discuss the API.
>>>
>>> If you agree, I will plan a first meeting where we can discuss what is
>>> a representor in our opinions.
>>> The meeting time would be 4pm UTC.
>>> For the day, I would propose this Thursday 10 if it works for
>>> everybody involved.
Second half of the day (CET) is pretty much booked in my calendar. Tuesday
or Wednesday might work on this week. Wednesday or Thursday might work on
next week.
>>>
>>
>> OK for me.
> O.K. for me too.
> Ori
>
^ permalink raw reply [flat|nested] 40+ messages in thread
* Re: [dpdk-dev] [RFC PATCH] ethdev: clarify flow action PORT ID semantics
2021-06-07 13:21 ` Ilya Maximets
@ 2021-06-07 16:07 ` Thomas Monjalon
2021-06-08 16:13 ` Thomas Monjalon
0 siblings, 1 reply; 40+ messages in thread
From: Thomas Monjalon @ 2021-06-07 16:07 UTC (permalink / raw)
To: Ilya Maximets
Cc: Ori Kam, Andrew Rybchenko, Ivan Malov, Eli Britstein, dev,
Smadar Fuks, Hyong Youb Kim, Kishore Padmanabha, Ajit Khaparde,
Jerin Jacob, John Daley, Ferruh Yigit
07/06/2021 15:21, Ilya Maximets:
> On 6/7/21 2:08 PM, Ori Kam wrote:
> > From: Andrew Rybchenko <andrew.rybchenko@oktetlabs.ru>
> >> On 6/7/21 11:28 AM, Thomas Monjalon wrote:
> >>> I see emails don't work well to conclude on how to manage representors.
> >>> I propose working in live meetings so we can try to align our views on
> >>> a virtual whiteboard and interactively ask questions.
> >>> Participants in those meetings could work on documenting what is the
> >>> view of a representor as a first step.
> >>> Second step, it should be easier to discuss the API.
> >>>
> >>> If you agree, I will plan a first meeting where we can discuss what is
> >>> a representor in our opinions.
> >>> The meeting time would be 4pm UTC.
> >>> For the day, I would propose this Thursday 10 if it works for
> >>> everybody involved.
>
> Second half of the day (CET) is pretty much booked in my calendar. Tuesday
> or Wednesday might work on this week. Wednesday or Thursday might work on
> next week.
>
> >> OK for me.
> > O.K. for me too.
OK let's decide with a poll please:
https://framadate.org/PAP3t8ycqx3lEUfe
^ permalink raw reply [flat|nested] 40+ messages in thread
* Re: [dpdk-dev] [RFC PATCH] ethdev: clarify flow action PORT ID semantics
2021-06-03 11:29 ` Ivan Malov
@ 2021-06-07 19:27 ` Ilya Maximets
2021-06-07 20:39 ` Ivan Malov
0 siblings, 1 reply; 40+ messages in thread
From: Ilya Maximets @ 2021-06-07 19:27 UTC (permalink / raw)
To: Ivan Malov, Ilya Maximets, Andrew Rybchenko, dev
Cc: Eli Britstein, Smadar Fuks, Hyong Youb Kim, Ori Kam, Jerin Jacob,
John Daley, Thomas Monjalon, Ferruh Yigit
On 6/3/21 1:29 PM, Ivan Malov wrote:
> On 03/06/2021 12:29, Ilya Maximets wrote:
>> On 6/2/21 9:35 PM, Ivan Malov wrote:
>>> On 02/06/2021 20:35, Ilya Maximets wrote:
>>>> (Dropped Broadcom folks from CC. Mail server refuses to accept their
>>>> emails for some reason: "Recipient address rejected: Domain not found."
>>>> Please, try to ad them back on reply.)
>>>>
>>>> On 6/2/21 6:26 PM, Andrew Rybchenko wrote:
>>>>> On 6/2/21 3:46 PM, Ilya Maximets wrote:
>>>>>> On 6/1/21 4:28 PM, Ivan Malov wrote:
>>>>>>> Hi Ilya,
>>>>>>>
>>>>>>> Thank you for reviewing the proposal at such short notice. I'm afraid that prior discussions overlook the simple fact that the whole problem is not limited to just VF representors. Action PORT_ID is also used with respect to the admin PF's ethdev, which "represents itself" (and by no means it represents the underlying physical/network port). In this case, one cannot state that the application treats it as a physical port, just like one states that the application perceives representors as VFs themselves.
>>>>>>
>>>>>>
>>>>>> I don't think that it was overlooked. If device is in a switchdev mode than
>>>>>> there is a PF representor and VF representors. Application typically works
>>>>>> only with representors in this case is it doesn't make much sense to have
>>>>>> representor and the upstream port attached to the same application at the
>>>>>> same time. Configuration that is applied by application to the representor
>>>>>> (PF or VF, it doesn't matter) applies to the corresponding upstream port
>>>>>> (actual PF or VF) by default.
>>>>>
>>>>> PF is not necessarily associated with a network port. It
>>>>> could be many PFs and just one network port on NIC.
>>>>> Extra PFs are like VFs in this case. These PFs may be
>>>>> passed to a VM in a similar way. So, we can have PF
>>>>> representors similar to VF representors. I.e. it is
>>>>> incorrect to say that PF in the case of switchdev is
>>>>> a representor of a network port.
>>>>>
>>>>> If we prefer to talk in representors terminology, we
>>>>> need 4 types of prepresentors:
>>>>> - PF representor for PCIe physical function
>>>>> - VF representor for PCIe virtual function
>>>>> - SF representor for PCIe sub-function (PASID)
>>>>> - network port representor
>>>>> In fact above is PCIe oriented, but there are
>>>>> other buses and ways to deliver traffic to applications.
>>>>> Basically representor for any virtual port in virtual
>>>>> switch which DPDK app can control using transfer rules.
>>>>>
>>>>>> Exactly same thing here with PORT_ID action. You have a packet and action
>>>>>> to send it to the port, but it's not specified if HW needs to send it to
>>>>>> the representor or the upstream port (again, VF or PF, it doesn't matter).
>>>>>> Since there is no extra information, HW should send it to the upstream
>>>>>> port by default. The same as configuration applies by default to the
>>>>>> upstream port.
>>>>>>
>>>>>> Let's look at some workflow examples:
>>>>>>
>>>>>> DPDK Application
>>>>>> | |
>>>>>> | |
>>>>>> +--PF-rep------VF-rep---+
>>>>>> | |
>>>>>> | NIC (switchdev) |
>>>>>> | |
>>>>>> +---PF---------VF-------+
>>>>>> | |
>>>>>> | |
>>>>>> External VM or whatever
>>>>>> Network
>>>>>
>>>>> See above. PF <-> External Network is incorrect above
>>>>> since it not always the case. It should be
>>>>> "NP <-> External network" and "NP-rep" above (NP -
>>>>> network port). Sometimes PF is an NP-rep, but sometimes
>>>>> it is not. It is just a question of default rules in
>>>>> switchdev on what to do with traffic incoming from
>>>>> network port.
>>>>>
>>>>> A bit more complicated picture is:
>>>>>
>>>>> +----------------------------------------+
>>>>> | DPDK Application |
>>>>> +----+---------+---------+---------+-----+
>>>>> |PF0 |PF1 | |
>>>>> | | | |
>>>>> +--NP1-rep---NP2-rep---PF2-rep---VF-rep--+
>>>>> | |
>>>>> | NIC (switchdev) |
>>>>> | |
>>>>> +---NP1-------NP2-------PF2--------VF----+
>>>>> | | | |
>>>>> | | | |
>>>>> External External VM or VM or
>>>>> Network 1 Network 2 whatever whatever
>>>>>
>>>>> So, sometimes PF plays network port representor role (PF0,
>>>>> PF1), sometimes it requires representor itself (PF2).
>>>>> What to do if PF2 itself is attached to application?
>>>>> Can we route traffic to it using PORT_ID action?
>>>>> It has DPDK ethdev port. It is one of arguments why
>>>>> plain PORT_ID should route DPDK application.
>>>>
>>>> OK. This is not very different from my understanding. The key
>>>> is that there is a pair of interfaces, one is more visible than
>>>> the other one.
>>>>
>>>>>
>>>>> Of course, some applications would like to see it as
>>>>> (simpler is better):
>>>>>
>>>>> +----------------------------------------+
>>>>> | DPDK Application |
>>>>> | |
>>>>> +---PF0-------PF1------PF2-rep---VF-rep--+
>>>>> | | | |
>>>>> | | | |
>>>>> External External VM or VM or
>>>>> Network 1 Network 2 whatever whatever
>>>>>
>>>>> but some, I believe, require full picture. For examples,
>>>>> I'd really like to know how much traffic goes via all 8
>>>>> switchdev ports and running rte_eth_stats_get(0, ...)
>>>>> (i.e. DPDK port 0 attached to PF0) I'd like to get
>>>>> NP1-rep stats (not NP1 stats). It will match exactly
>>>>> what I see in DPDK application. It is an argument why
>>>>> plain PORT_ID should be treated as a DPDK ethdev port,
>>>>> not a represented (upstream) entity.
>>>>
>>>> The point is that if application doesn't require full picture,
>>>> it should not care. If application requires the full picture,
>>>> it could take extra steps by setting extra bits. I don't
>>>> understand why we need to force all applications to care about
>>>> the full picture if we can avoid that?
>>>>
>>>>>
>>>>>> a. Workflow for "DPDK Application" to set MAC to VF:
>>>>>>
>>>>>> 1. "DPDK Application" calls rte_set_etheraddr("VF-rep", new_mac);
>>>>>> 2. DPDK sets MAC for "VF".
>>>>>>
>>>>>> b. Workflow for "DPDK Application" to set MAC to PF:
>>>>>>
>>>>>> 1. "DPDK Application" calls rte_set_etheraddr("PF-rep", new_mac);
>>>>>> 2. DPDK sets MAC for "PF".
>>>>>>
>>>>>> c. Workflow for "DPDK Application" to send packet to the external network:
>>>>>>
>>>>>> 1. "DPDK Application" calls rte_eth_tx_burst("PF-rep", packet);
>>>>>> 2. NIC receives the packet from "PF-rep" and sends it to "PF".
>>>>>> 3. packet egresses to the external network from "PF".
>>>>>>
>>>>>> d. Workflow for "DPDK Application" to send packet to the "VM or whatever":
>>>>>>
>>>>>> 1. "DPDK Application" calls rte_eth_tx_burst("VF-rep", packet);
>>>>>> 2. NIC receives the packet from "VF-rep" and sends it to "VF".
>>>>>> 3. "VM or whatever" receives the packet from "VF".
>>>>>>
>>>>>> In two workflows above there is no rte_flow processing on step 2, i.e.,
>>>>>> NIC does not perform any lookups/matches/actions, because it's not possible
>>>>>> to configure actions for packets received from "PF-rep" or
>>>>>> "VF-rep" as these ports doesn't own a port id and all the configuration
>>>>>> and rte_flow actions translated and applied for the devices that these
>>>>>> ports represents ("PF" and "VF") and not representors themselves ("PF-rep"
>>>>>> or "VF-rep").
>>>>>>
>>>>>> e. Workflow for the packet received on PF and PORT_ID action:
>>>>>>
>>>>>> 1. "DPDK Application" configures rte_flow for all packets from "PF-rep"
>>>>>> to execute PORT_ID "VF-rep".
>>>>>> 2. NIC receives packet on "PF".
>>>>>> 3. NIC executes 'PORT_ID "VF-rep"' action by sending packet to "VF".
>>>>>> 4. "VM or whatever" receives the packet from "VF".
>>>>>>
>>>>>> f. Workflow for the packet received on VF and PORT_ID action:
>>>>>>
>>>>>> 1. "DPDK Application" configures rte_flow for all packets from "VF-rep"
>>>>>> to execute 'PORT_ID "PF-rep"'.
>>>>>> 2. NIC receives packet on "VF".
>>>>>> 3. NIC executes 'PORT_ID "PF-rep"' action by sending packet to "PF".
>>>>>> 4. Packet egresses from the "PF" to the external network.
>>>>>>
>>>>>> Above is what, IMHO, the logic should look like and this matches with
>>>>>> the overall switchdev design in kernel.
>>>>>>
>>>>>> I understand that this logic could seem flipped-over from the HW point
>>>>>> of view, but it's perfectly logical from the user's perspective, because
>>>>>> user should not care if the application works with representors or
>>>>>> some real devices. If application configures that all packets from port
>>>>>> A should be sent to port B, user will expect that these packets will
>>>>>> egress from port B once received from port A. That will be highly
>>>>>> inconvenient if the packet will ingress from port B back to the
>>>>>> application instead.
>>>>>>
>>>>>> DPDK Application
>>>>>> | |
>>>>>> | |
>>>>>> port A port B
>>>>>> | |
>>>>>> *****MAGIC*****
>>>>>> | |
>>>>>> External Another Network
>>>>>> Network or VM or whatever
>>>>>>
>>>>>> It should not matter if there is an extra layer between ports A and B
>>>>>> and the external network and VM. Everything should work in exactly the
>>>>>> same way, transparently for the application.
>>>>>>
>>>>>> The point of hardware offloading, and therefore rte_flow API, is to take
>>>>>> what user does in software and make this "magically" work in hardware in
>>>>>> the exactly same way. And this will be broken if user will have to
>>>>>> use different logic based on the mode the hardware works in, i.e. based on
>>>>>> the fact if the application works with ports or their representors.
>>>>>>
>>>>>> If some specific use case requires application to know if it's an
>>>>>> upstream port or the representor and demystify the internals of the switchdev
>>>>>> NIC, there should be a different port id for the representor itself that
>>>>>> could be used in all DPDK APIs including rte_flow API or a special bit for
>>>>>> that matter. IIRC, there was an idea to add a bit directly to the port_id
>>>>>> for that purpose that will flip over behavior in all the workflow scenarios
>>>>>> that I described above.
>>>>>
>>>>> As I understand we're basically on the same page, but just
>>>>> fighting for defaults in DPDK.
>>>>
>>>> Yep.
>>>>
>>>>>
>>>>>>>
>>>>>>> Given these facts, it would not be quite right to just align the documentation with the de-facto action meaning assumed by OvS.
>>>>>>
>>>>>> It's not a "meaning assumed by OvS", it's the original design and the
>>>>>> main idea of a switchdev based on a common sense.
>>>>>
>>>>> If so, common sense is not that common :)
>>>>> My "common sense" says me that PORT_ID action
>>>>> should route traffic to DPDK ethdev port to be
>>>>> received by the DPDK application.
>>>>
>>>> By this logic rte_eth_tx_burst("VF-rep", packet) should send a packet
>>>> to "VF-rep", i.e. this packet will be received back by the application
>>>> on this same interface. But that is counter-intuitive and this is not
>>>> how it works in linux kernel if you're opening socket and sending a
>>>> packet to the "VF-rep" network interface.
>>>>
>>>> And if rte_eth_tx_burst("VF-rep", packet) sends packet to "VF" and not
>>>> to "VF-rep", than I don't understand why PORT_ID action should work in
>>>> the opposite way.
>>>
>>> There's no contradiction here.
>>>
>>> In rte_eth_tx_burst(X, packet) example, "X" is the port which the application sits on and from where it sends the packet. In other words, it's the point where the packet originates from, and not where it goes to.
>>>
>>> At the same time, flow *action* PORT_ID (ID = "X") is clearly the opposite: it specifies where the packet will go. Port ID is the characteristic of a DPDK ethdev. So the packet goes *to* an ethdev with the given ID ("X").
>>>
>>> Perhaps consider action PHY_PORT: the index is the characteristic of the network port. The packet goes *to* network through this NP. And not the opposite way. Hopefully, nobody is going to claim that action PHY_PORT should mean re-injecting the packet back to the HW flow engine "as if it just came from the network port". Then why does one try to skew the PORT_ID meaning this way? PORT_ID points to an ethdev - the packet goes *to* the ethdev. Isn't that simple?
>>
>> It's not simple. And PHY_PORT action would be hard to use from the
>> application that doesn't really need to know how underlying hardware
>> structured.
>
> I'm not trying to suggest using PHY_PORT. I provide it as an example to point out the inconsistency between these actions and action PORT_ID in its de-facto sense assumed by apps like OvS. Sorry.
>
>>
>>>
>>>>
>>>> Application receives a packet from port A and puts it to the port B.
>>>> TC rule to forward packets from port A to port B will provide same result.
>>>> So, why the similar rte_flow should do the opposite and send the packet
>>>> back to the application?
>>>
>>> Please see above. Action VF sends the packet *to* VF and *not* to the upstream entity which this VF is connected to. Action PHY_PORT sends the packet *to* network and does *not* make it appear as if it entered the NIC from the network side. Action QUEUE sends the packet *to* the Rx queue and does *not* make it appear as if it just egressed from the Tx queue with the same index. Action PORT_ID sends the packet *to* an ethdev with the given ID and *not* to the upstream entity which this ethdev is connected to. It's just that transparent. It's just "do what the name suggests".
>>>
>>> Yes, an application (say, OvS) might have a high level design which perceives the "high-level" ports plugged to it as a "patch-panel" of sorts. Yes, when a high-level mechanism/logic of such application invokes a *datapath-unaware* wrapper to offload a rule and request that the packet be delivered to the given "high-level" port, it therefore requests that the packet be delivered to the opposite end of the wire. But then the lower-level datapath-specific (DPDK) handler kicks in. Since it's DPDK-specific, it knows *everything* about the underlying flow library it works with. In particular it knows that action PORT_ID delivers the packet to an *ethdev*, at the same time, it knows that the upper caller (high-level logic) for sure wants the opposite, so it (the lower-level DPDK component) sets the "upstream" bit when translating the higher-level port action to an RTE action "PORT_ID".
>>
>> I don't understand that. DPDK user is the application and DPDK
>> doesn't translate anything, application creates PORT_ID action
>> directly and passes it to DPDK. So, you're forcing the *end user*
>> (a.k.a. application) to know *everything* about the hardware the
>> application runs on. Of course, it gets this information about
>> the hardware from the DPDK library (otherwise this would be
>> completely ridiculous), but this doesn't change the fact that it's
>> the application that needs to think about the structure of the
>> underlying hardware while it's absolutely not necessary in vast
>> majority of cases.
>
> Me forcing the application to know everything about the hardware? I do no such thing. What I mean is very simple: the application developer is expected to read DPDK documentation carefully before using a flow primitive like PORT_ID. If they understood it, they would realise the unfitness of the existing action behaviour for their needs. The developer would then raise the issue and extend the action in one way ("upstream" bit) or another ("egress" bit). And only *then* they would have everything in place to finally put the action into usage by the application. But what we have in reality is the opposite. Let's just face it.
>
>>
>>> Then the resulting action is correct, and the packet indeed doesn't end up in the ethdev but goes
>>> to the opposite end of the wire. That's it.
>>>
>>> I have an impression that for some reason people are tempted to ignore the two nominal "layers" in such applications (generic, or high-level one and DPDK-specific one) thus trying to align DPDK logic with high-level logic of the applications. That's simply not right. What I'm trying to point out is that it *is* the true job of DPDK-specific data path handler in such application - to properly translate generic flow actions to DPDK-specific ones. It's the duty of DPDK component in such applications to be aware of the genuine meaning of action PORT_ID.
>>
>> The reason is very simple: if application don't need to know the
>> full picture (how the hardware structured inside) it shouldn't
>> care and it's a duty of DPDK to abstract the hardware and provide
>> programming interfaces that could be easily used by application
>> developers who are not experts in the architecture of a hardware
>> that they want to use (basically, application developer should not
>> care at all in most cases on which hardware application will work).
>> It's basically in almost every single DPDK API, EAL means environment
>> *abstraction* layer, not an environment *proxy/passthrough* layer.
>> We can't assume that DPDK-specific layers in applications are always
>> written by hardware experts and, IMHO, DPDK should not force users
>> to learn underlying structures of switchdev devices. They might not
>> even have such devices for testing, so the application that works
>> on simple NICs should be able to run correctly on switchdev-capable
>> NICs too.
>>
>> I think that "***MAGIC***" abstraction (see one of my previous ascii
>> graphics) is very important here.
>
> Look. Imagine that the hypothetical "observer" sits inside the application. The application receives packets on one port X and sends them using Tx burst API on port Y. Now it wants to offload this work and use ID=Y to reference location where the packet should go "as if it was sent using Tx burst on port Y". OK. But the application uses attribute "transfer". And that really makes difference because it "teleports" (or "transfers") the "observer" from the application down to the HW eSwitch level. Yes, the application doesn't know anything about HW eSwitch structure but it at leas knows about its very existence; otherwise, it wouldn't use attribute "transfer" at all. And now, when the "observer" sits inside the HW eSwitch, from that standpoint, PORT_ID ID=Y looks like delivery to the ethdev port. The "observer"'s location changes the perceived direction. In this case, correct default behaviour is delivery to the ethdev. So. if the application developer was capable to correctly
> put attribute "transfer" in use, why wouldn't they be capable to also put flag "upstream" of the action PORT_ID in use, too?
I don't agree with the statement that "transfer" changes the point
of view from the application to the point inside the eSwitch.
For me it simply means "move", therefore means moving of the packet
from one port to another and I would assume that this happens in the
same way as it happens if application receives the packet and then
sends it to a different port. So this argument is not valid.
For me having an "observer" inside the eSwitch makes no sense if it's
not a real switch. And it's a not a real switch just because it has
too many shortcuts and implicit rules that can not be controlled by
the user, e.g. the thing that packet sent to VF-rep bypasses the
rte_flow processing and goes directly out from the VF port.
Unless it's a fully programmable switch (which it is not), I prefer
to have it fully hidden from end users as a magic box. Making it
fully programmable will make applications very complex, so I'd prefer
to not deal with this too.
>
>>
>>>
>>> This way, mixing up the two meanings is ruled out.
>>
>> Looking closer to how tc flower rules configured I noticed that
>> 'mirred' action requires user to specify the direction in which
>> the packet will appear on the destination port. And I suppose
>> this will solve your issues with PORT_ID action without exposing
>> the "full picture" of the architecture of an underlying hardware.
>>
>> It looks something like this:
>>
>> tc filter add dev A ... action mirred egress redirect dev B
>> ^^^^^^
>>
>> Direction could be 'ingress' or 'egress', so the packet will
>> ingress from the port B back to application/kernel or it will
>> egress from this port to the external network. Same thing
>> could be implemented in rte_flow like this:
>>
>> flow create A ingress transfer pattern eth / end
>> action port_id id B egress / end
>>
>> So, application that needs to receive the packet from the port B
>> will specify 'ingress', others that just want to send packet from
>> the port B will specify 'egress'. Will that work for you?
>>
>> (BTW, 'ingress' seems to be not implemented in TC and that kind
>> of suggests that it's not very useful at least for kernel use cases)
>>
>> One might say that it's actually the same what is proposed in
>> this RFC, but I will argue that 'ingress/egress' schema doesn't
>> break the "***MAGIC***" abstraction because user is not obligated
>> to know the structure of the underlying hardware, while 'upstream'
>> flag is something very unclear from that perspective and makes
>> no sense for plane ports (non-representors).
>
> Perhaps indeed "upstream" is barely the best choice, and I'd be tempted to accept the "egress" proposal, but I strongly disagree with the suggestion that "upstream" purportedly implies some HW knowledge. It simply means that the packet goes to the location where the wire leads to and not back to the application's own end of the wire. Must the port be connected to some location. And for non-representors, like admin's PF (isn't that what you call a plane port?) it perfectly makes sense because saying "PORT_ID id 0 upstream" means (not to the application, but to us) that the packet will be sent to network through the network port connected to this PF. That's it.
>
>>
>>>
>>>>
>>>>>
>>>>>>>
>>>>>>> On 01/06/2021 15:10, Ilya Maximets wrote:
>>>>>>>> On 6/1/21 1:14 PM, Ivan Malov wrote:
>>>>>>>>> By its very name, action PORT_ID means that packets hit an ethdev with the
>>>>>>>>> given DPDK port ID. At least the current comments don't state the opposite.
>>>>>>>>> That said, since port representors had been adopted, applications like OvS
>>>>>>>>> have been misusing the action. They misread its purpose as sending packets
>>>>>>>>> to the opposite end of the "wire" plugged to the given ethdev, for example,
>>>>>>>>> redirecting packets to the VF itself rather than to its representor ethdev.
>>>>>>>>> Another example: OvS relies on this action with the admin PF's ethdev port
>>>>>>>>> ID specified in it in order to send offloaded packets to the physical port.
>>>>>>>>>
>>>>>>>>> Since there might be applications which use this action in its valid sense,
>>>>>>>>> one can't just change the documentation to greenlight the opposite meaning.
>>>>>>>>> This patch adds an explicit bit to the action configuration which will let
>>>>>>>>> applications, depending on their needs, leverage the two meanings properly.
>>>>>>>>> Applications like OvS, as well as PMDs, will have to be corrected when the
>>>>>>>>> patch has been applied. But the improved clarity of the action is worth it.
>>>>>>>>>
>>>>>>>>> The proposed change is not the only option. One could avoid changes in OvS
>>>>>>>>> and PMDs if the new configuration field had the opposite meaning, with the
>>>>>>>>> action itself meaning delivery to the represented port and not to DPDK one.
>>>>>>>>> Alternatively, one could define a brand new action with the said behaviour.
>>>>>>>>
>>>>>>>> We had already very similar discussions regarding the understanding of what
>>>>>>>> the representor really is from the DPDK API's point of view, and the last
>>>>>>>> time, IIUC, it was concluded by a tech. board that representor should be
>>>>>>>> a "ghost of a VF", i.e. DPDK APIs should apply configuration by default to
>>>>>>>> VF and not to the representor device:
>>>>>>>> https://patches.dpdk.org/project/dpdk/cover/20191029185051.32203-1-thomas@monjalon.net/#104376
>>>>>>>> This wasn't enforced though, IIUC, for existing code and semantics is still mixed.
>>>>>>>>
>>>>>>>> I still think that configuration should be applied to VF, and the same applies
>>>>>>>> to rte_flow API. IMHO, average application should not care if device is
>>>>>>>> a VF itself or its representor. Everything should work exactly the same.
>>>>>>>> I think this matches with the original idea/design of the switchdev functionality
>>>>>>>> in the linux kernel and also matches with how the average user thinks about
>>>>>>>> representor devices.
>>>>>>>>
>>>>>>>> If some specific use-case requires to distinguish VF from the representor,
>>>>>>>> there should probably be a separate special API/flag for that.
>>>>>>>>
>>>>>>>> Best regards, Ilya Maximets.
>>>>>>>>
>>>>>>>
>>>>>
>>>
>
^ permalink raw reply [flat|nested] 40+ messages in thread
* Re: [dpdk-dev] [RFC PATCH] ethdev: clarify flow action PORT ID semantics
2021-06-07 19:27 ` Ilya Maximets
@ 2021-06-07 20:39 ` Ivan Malov
0 siblings, 0 replies; 40+ messages in thread
From: Ivan Malov @ 2021-06-07 20:39 UTC (permalink / raw)
To: Ilya Maximets, Andrew Rybchenko, dev
Cc: Eli Britstein, Smadar Fuks, Hyong Youb Kim, Ori Kam, Jerin Jacob,
John Daley, Thomas Monjalon, Ferruh Yigit
On 07/06/2021 22:27, Ilya Maximets wrote:
> On 6/3/21 1:29 PM, Ivan Malov wrote:
>> On 03/06/2021 12:29, Ilya Maximets wrote:
>>> On 6/2/21 9:35 PM, Ivan Malov wrote:
>>>> On 02/06/2021 20:35, Ilya Maximets wrote:
>>>>> (Dropped Broadcom folks from CC. Mail server refuses to accept their
>>>>> emails for some reason: "Recipient address rejected: Domain not found."
>>>>> Please, try to ad them back on reply.)
>>>>>
>>>>> On 6/2/21 6:26 PM, Andrew Rybchenko wrote:
>>>>>> On 6/2/21 3:46 PM, Ilya Maximets wrote:
>>>>>>> On 6/1/21 4:28 PM, Ivan Malov wrote:
>>>>>>>> Hi Ilya,
>>>>>>>>
>>>>>>>> Thank you for reviewing the proposal at such short notice. I'm afraid that prior discussions overlook the simple fact that the whole problem is not limited to just VF representors. Action PORT_ID is also used with respect to the admin PF's ethdev, which "represents itself" (and by no means it represents the underlying physical/network port). In this case, one cannot state that the application treats it as a physical port, just like one states that the application perceives representors as VFs themselves.
>>>>>>>
>>>>>>>
>>>>>>> I don't think that it was overlooked. If device is in a switchdev mode than
>>>>>>> there is a PF representor and VF representors. Application typically works
>>>>>>> only with representors in this case is it doesn't make much sense to have
>>>>>>> representor and the upstream port attached to the same application at the
>>>>>>> same time. Configuration that is applied by application to the representor
>>>>>>> (PF or VF, it doesn't matter) applies to the corresponding upstream port
>>>>>>> (actual PF or VF) by default.
>>>>>>
>>>>>> PF is not necessarily associated with a network port. It
>>>>>> could be many PFs and just one network port on NIC.
>>>>>> Extra PFs are like VFs in this case. These PFs may be
>>>>>> passed to a VM in a similar way. So, we can have PF
>>>>>> representors similar to VF representors. I.e. it is
>>>>>> incorrect to say that PF in the case of switchdev is
>>>>>> a representor of a network port.
>>>>>>
>>>>>> If we prefer to talk in representors terminology, we
>>>>>> need 4 types of prepresentors:
>>>>>> - PF representor for PCIe physical function
>>>>>> - VF representor for PCIe virtual function
>>>>>> - SF representor for PCIe sub-function (PASID)
>>>>>> - network port representor
>>>>>> In fact above is PCIe oriented, but there are
>>>>>> other buses and ways to deliver traffic to applications.
>>>>>> Basically representor for any virtual port in virtual
>>>>>> switch which DPDK app can control using transfer rules.
>>>>>>
>>>>>>> Exactly same thing here with PORT_ID action. You have a packet and action
>>>>>>> to send it to the port, but it's not specified if HW needs to send it to
>>>>>>> the representor or the upstream port (again, VF or PF, it doesn't matter).
>>>>>>> Since there is no extra information, HW should send it to the upstream
>>>>>>> port by default. The same as configuration applies by default to the
>>>>>>> upstream port.
>>>>>>>
>>>>>>> Let's look at some workflow examples:
>>>>>>>
>>>>>>> DPDK Application
>>>>>>> | |
>>>>>>> | |
>>>>>>> +--PF-rep------VF-rep---+
>>>>>>> | |
>>>>>>> | NIC (switchdev) |
>>>>>>> | |
>>>>>>> +---PF---------VF-------+
>>>>>>> | |
>>>>>>> | |
>>>>>>> External VM or whatever
>>>>>>> Network
>>>>>>
>>>>>> See above. PF <-> External Network is incorrect above
>>>>>> since it not always the case. It should be
>>>>>> "NP <-> External network" and "NP-rep" above (NP -
>>>>>> network port). Sometimes PF is an NP-rep, but sometimes
>>>>>> it is not. It is just a question of default rules in
>>>>>> switchdev on what to do with traffic incoming from
>>>>>> network port.
>>>>>>
>>>>>> A bit more complicated picture is:
>>>>>>
>>>>>> +----------------------------------------+
>>>>>> | DPDK Application |
>>>>>> +----+---------+---------+---------+-----+
>>>>>> |PF0 |PF1 | |
>>>>>> | | | |
>>>>>> +--NP1-rep---NP2-rep---PF2-rep---VF-rep--+
>>>>>> | |
>>>>>> | NIC (switchdev) |
>>>>>> | |
>>>>>> +---NP1-------NP2-------PF2--------VF----+
>>>>>> | | | |
>>>>>> | | | |
>>>>>> External External VM or VM or
>>>>>> Network 1 Network 2 whatever whatever
>>>>>>
>>>>>> So, sometimes PF plays network port representor role (PF0,
>>>>>> PF1), sometimes it requires representor itself (PF2).
>>>>>> What to do if PF2 itself is attached to application?
>>>>>> Can we route traffic to it using PORT_ID action?
>>>>>> It has DPDK ethdev port. It is one of arguments why
>>>>>> plain PORT_ID should route DPDK application.
>>>>>
>>>>> OK. This is not very different from my understanding. The key
>>>>> is that there is a pair of interfaces, one is more visible than
>>>>> the other one.
>>>>>
>>>>>>
>>>>>> Of course, some applications would like to see it as
>>>>>> (simpler is better):
>>>>>>
>>>>>> +----------------------------------------+
>>>>>> | DPDK Application |
>>>>>> | |
>>>>>> +---PF0-------PF1------PF2-rep---VF-rep--+
>>>>>> | | | |
>>>>>> | | | |
>>>>>> External External VM or VM or
>>>>>> Network 1 Network 2 whatever whatever
>>>>>>
>>>>>> but some, I believe, require full picture. For examples,
>>>>>> I'd really like to know how much traffic goes via all 8
>>>>>> switchdev ports and running rte_eth_stats_get(0, ...)
>>>>>> (i.e. DPDK port 0 attached to PF0) I'd like to get
>>>>>> NP1-rep stats (not NP1 stats). It will match exactly
>>>>>> what I see in DPDK application. It is an argument why
>>>>>> plain PORT_ID should be treated as a DPDK ethdev port,
>>>>>> not a represented (upstream) entity.
>>>>>
>>>>> The point is that if application doesn't require full picture,
>>>>> it should not care. If application requires the full picture,
>>>>> it could take extra steps by setting extra bits. I don't
>>>>> understand why we need to force all applications to care about
>>>>> the full picture if we can avoid that?
>>>>>
>>>>>>
>>>>>>> a. Workflow for "DPDK Application" to set MAC to VF:
>>>>>>>
>>>>>>> 1. "DPDK Application" calls rte_set_etheraddr("VF-rep", new_mac);
>>>>>>> 2. DPDK sets MAC for "VF".
>>>>>>>
>>>>>>> b. Workflow for "DPDK Application" to set MAC to PF:
>>>>>>>
>>>>>>> 1. "DPDK Application" calls rte_set_etheraddr("PF-rep", new_mac);
>>>>>>> 2. DPDK sets MAC for "PF".
>>>>>>>
>>>>>>> c. Workflow for "DPDK Application" to send packet to the external network:
>>>>>>>
>>>>>>> 1. "DPDK Application" calls rte_eth_tx_burst("PF-rep", packet);
>>>>>>> 2. NIC receives the packet from "PF-rep" and sends it to "PF".
>>>>>>> 3. packet egresses to the external network from "PF".
>>>>>>>
>>>>>>> d. Workflow for "DPDK Application" to send packet to the "VM or whatever":
>>>>>>>
>>>>>>> 1. "DPDK Application" calls rte_eth_tx_burst("VF-rep", packet);
>>>>>>> 2. NIC receives the packet from "VF-rep" and sends it to "VF".
>>>>>>> 3. "VM or whatever" receives the packet from "VF".
>>>>>>>
>>>>>>> In two workflows above there is no rte_flow processing on step 2, i.e.,
>>>>>>> NIC does not perform any lookups/matches/actions, because it's not possible
>>>>>>> to configure actions for packets received from "PF-rep" or
>>>>>>> "VF-rep" as these ports doesn't own a port id and all the configuration
>>>>>>> and rte_flow actions translated and applied for the devices that these
>>>>>>> ports represents ("PF" and "VF") and not representors themselves ("PF-rep"
>>>>>>> or "VF-rep").
>>>>>>>
>>>>>>> e. Workflow for the packet received on PF and PORT_ID action:
>>>>>>>
>>>>>>> 1. "DPDK Application" configures rte_flow for all packets from "PF-rep"
>>>>>>> to execute PORT_ID "VF-rep".
>>>>>>> 2. NIC receives packet on "PF".
>>>>>>> 3. NIC executes 'PORT_ID "VF-rep"' action by sending packet to "VF".
>>>>>>> 4. "VM or whatever" receives the packet from "VF".
>>>>>>>
>>>>>>> f. Workflow for the packet received on VF and PORT_ID action:
>>>>>>>
>>>>>>> 1. "DPDK Application" configures rte_flow for all packets from "VF-rep"
>>>>>>> to execute 'PORT_ID "PF-rep"'.
>>>>>>> 2. NIC receives packet on "VF".
>>>>>>> 3. NIC executes 'PORT_ID "PF-rep"' action by sending packet to "PF".
>>>>>>> 4. Packet egresses from the "PF" to the external network.
>>>>>>>
>>>>>>> Above is what, IMHO, the logic should look like and this matches with
>>>>>>> the overall switchdev design in kernel.
>>>>>>>
>>>>>>> I understand that this logic could seem flipped-over from the HW point
>>>>>>> of view, but it's perfectly logical from the user's perspective, because
>>>>>>> user should not care if the application works with representors or
>>>>>>> some real devices. If application configures that all packets from port
>>>>>>> A should be sent to port B, user will expect that these packets will
>>>>>>> egress from port B once received from port A. That will be highly
>>>>>>> inconvenient if the packet will ingress from port B back to the
>>>>>>> application instead.
>>>>>>>
>>>>>>> DPDK Application
>>>>>>> | |
>>>>>>> | |
>>>>>>> port A port B
>>>>>>> | |
>>>>>>> *****MAGIC*****
>>>>>>> | |
>>>>>>> External Another Network
>>>>>>> Network or VM or whatever
>>>>>>>
>>>>>>> It should not matter if there is an extra layer between ports A and B
>>>>>>> and the external network and VM. Everything should work in exactly the
>>>>>>> same way, transparently for the application.
>>>>>>>
>>>>>>> The point of hardware offloading, and therefore rte_flow API, is to take
>>>>>>> what user does in software and make this "magically" work in hardware in
>>>>>>> the exactly same way. And this will be broken if user will have to
>>>>>>> use different logic based on the mode the hardware works in, i.e. based on
>>>>>>> the fact if the application works with ports or their representors.
>>>>>>>
>>>>>>> If some specific use case requires application to know if it's an
>>>>>>> upstream port or the representor and demystify the internals of the switchdev
>>>>>>> NIC, there should be a different port id for the representor itself that
>>>>>>> could be used in all DPDK APIs including rte_flow API or a special bit for
>>>>>>> that matter. IIRC, there was an idea to add a bit directly to the port_id
>>>>>>> for that purpose that will flip over behavior in all the workflow scenarios
>>>>>>> that I described above.
>>>>>>
>>>>>> As I understand we're basically on the same page, but just
>>>>>> fighting for defaults in DPDK.
>>>>>
>>>>> Yep.
>>>>>
>>>>>>
>>>>>>>>
>>>>>>>> Given these facts, it would not be quite right to just align the documentation with the de-facto action meaning assumed by OvS.
>>>>>>>
>>>>>>> It's not a "meaning assumed by OvS", it's the original design and the
>>>>>>> main idea of a switchdev based on a common sense.
>>>>>>
>>>>>> If so, common sense is not that common :)
>>>>>> My "common sense" says me that PORT_ID action
>>>>>> should route traffic to DPDK ethdev port to be
>>>>>> received by the DPDK application.
>>>>>
>>>>> By this logic rte_eth_tx_burst("VF-rep", packet) should send a packet
>>>>> to "VF-rep", i.e. this packet will be received back by the application
>>>>> on this same interface. But that is counter-intuitive and this is not
>>>>> how it works in linux kernel if you're opening socket and sending a
>>>>> packet to the "VF-rep" network interface.
>>>>>
>>>>> And if rte_eth_tx_burst("VF-rep", packet) sends packet to "VF" and not
>>>>> to "VF-rep", than I don't understand why PORT_ID action should work in
>>>>> the opposite way.
>>>>
>>>> There's no contradiction here.
>>>>
>>>> In rte_eth_tx_burst(X, packet) example, "X" is the port which the application sits on and from where it sends the packet. In other words, it's the point where the packet originates from, and not where it goes to.
>>>>
>>>> At the same time, flow *action* PORT_ID (ID = "X") is clearly the opposite: it specifies where the packet will go. Port ID is the characteristic of a DPDK ethdev. So the packet goes *to* an ethdev with the given ID ("X").
>>>>
>>>> Perhaps consider action PHY_PORT: the index is the characteristic of the network port. The packet goes *to* network through this NP. And not the opposite way. Hopefully, nobody is going to claim that action PHY_PORT should mean re-injecting the packet back to the HW flow engine "as if it just came from the network port". Then why does one try to skew the PORT_ID meaning this way? PORT_ID points to an ethdev - the packet goes *to* the ethdev. Isn't that simple?
>>>
>>> It's not simple. And PHY_PORT action would be hard to use from the
>>> application that doesn't really need to know how underlying hardware
>>> structured.
>>
>> I'm not trying to suggest using PHY_PORT. I provide it as an example to point out the inconsistency between these actions and action PORT_ID in its de-facto sense assumed by apps like OvS. Sorry.
>>
>>>
>>>>
>>>>>
>>>>> Application receives a packet from port A and puts it to the port B.
>>>>> TC rule to forward packets from port A to port B will provide same result.
>>>>> So, why the similar rte_flow should do the opposite and send the packet
>>>>> back to the application?
>>>>
>>>> Please see above. Action VF sends the packet *to* VF and *not* to the upstream entity which this VF is connected to. Action PHY_PORT sends the packet *to* network and does *not* make it appear as if it entered the NIC from the network side. Action QUEUE sends the packet *to* the Rx queue and does *not* make it appear as if it just egressed from the Tx queue with the same index. Action PORT_ID sends the packet *to* an ethdev with the given ID and *not* to the upstream entity which this ethdev is connected to. It's just that transparent. It's just "do what the name suggests".
>>>>
>>>> Yes, an application (say, OvS) might have a high level design which perceives the "high-level" ports plugged to it as a "patch-panel" of sorts. Yes, when a high-level mechanism/logic of such application invokes a *datapath-unaware* wrapper to offload a rule and request that the packet be delivered to the given "high-level" port, it therefore requests that the packet be delivered to the opposite end of the wire. But then the lower-level datapath-specific (DPDK) handler kicks in. Since it's DPDK-specific, it knows *everything* about the underlying flow library it works with. In particular it knows that action PORT_ID delivers the packet to an *ethdev*, at the same time, it knows that the upper caller (high-level logic) for sure wants the opposite, so it (the lower-level DPDK component) sets the "upstream" bit when translating the higher-level port action to an RTE action "PORT_ID".
>>>
>>> I don't understand that. DPDK user is the application and DPDK
>>> doesn't translate anything, application creates PORT_ID action
>>> directly and passes it to DPDK. So, you're forcing the *end user*
>>> (a.k.a. application) to know *everything* about the hardware the
>>> application runs on. Of course, it gets this information about
>>> the hardware from the DPDK library (otherwise this would be
>>> completely ridiculous), but this doesn't change the fact that it's
>>> the application that needs to think about the structure of the
>>> underlying hardware while it's absolutely not necessary in vast
>>> majority of cases.
>>
>> Me forcing the application to know everything about the hardware? I do no such thing. What I mean is very simple: the application developer is expected to read DPDK documentation carefully before using a flow primitive like PORT_ID. If they understood it, they would realise the unfitness of the existing action behaviour for their needs. The developer would then raise the issue and extend the action in one way ("upstream" bit) or another ("egress" bit). And only *then* they would have everything in place to finally put the action into usage by the application. But what we have in reality is the opposite. Let's just face it.
>>
>>>
>>>> Then the resulting action is correct, and the packet indeed doesn't end up in the ethdev but goes
>>>> to the opposite end of the wire. That's it.
>>>>
>>>> I have an impression that for some reason people are tempted to ignore the two nominal "layers" in such applications (generic, or high-level one and DPDK-specific one) thus trying to align DPDK logic with high-level logic of the applications. That's simply not right. What I'm trying to point out is that it *is* the true job of DPDK-specific data path handler in such application - to properly translate generic flow actions to DPDK-specific ones. It's the duty of DPDK component in such applications to be aware of the genuine meaning of action PORT_ID.
>>>
>>> The reason is very simple: if application don't need to know the
>>> full picture (how the hardware structured inside) it shouldn't
>>> care and it's a duty of DPDK to abstract the hardware and provide
>>> programming interfaces that could be easily used by application
>>> developers who are not experts in the architecture of a hardware
>>> that they want to use (basically, application developer should not
>>> care at all in most cases on which hardware application will work).
>>> It's basically in almost every single DPDK API, EAL means environment
>>> *abstraction* layer, not an environment *proxy/passthrough* layer.
>>> We can't assume that DPDK-specific layers in applications are always
>>> written by hardware experts and, IMHO, DPDK should not force users
>>> to learn underlying structures of switchdev devices. They might not
>>> even have such devices for testing, so the application that works
>>> on simple NICs should be able to run correctly on switchdev-capable
>>> NICs too.
>>>
>>> I think that "***MAGIC***" abstraction (see one of my previous ascii
>>> graphics) is very important here.
>>
>> Look. Imagine that the hypothetical "observer" sits inside the application. The application receives packets on one port X and sends them using Tx burst API on port Y. Now it wants to offload this work and use ID=Y to reference location where the packet should go "as if it was sent using Tx burst on port Y". OK. But the application uses attribute "transfer". And that really makes difference because it "teleports" (or "transfers") the "observer" from the application down to the HW eSwitch level. Yes, the application doesn't know anything about HW eSwitch structure but it at leas knows about its very existence; otherwise, it wouldn't use attribute "transfer" at all. And now, when the "observer" sits inside the HW eSwitch, from that standpoint, PORT_ID ID=Y looks like delivery to the ethdev port. The "observer"'s location changes the perceived direction. In this case, correct default behaviour is delivery to the ethdev. So. if the application developer was capable to correctly
>> put attribute "transfer" in use, why wouldn't they be capable to also put flag "upstream" of the action PORT_ID in use, too?
>
> I don't agree with the statement that "transfer" changes the point
> of view from the application to the point inside the eSwitch.
> For me it simply means "move", therefore means moving of the packet
> from one port to another and I would assume that this happens in the
> same way as it happens if application receives the packet and then
> sends it to a different port. So this argument is not valid.
I hate to say it, but simply disagreeing with this argument is not
helpful. Doing so doesn't make the argument invalid unless the
documentation is fixed to say super-clearly what in fact "transfer"
attribute is and what point of view (application or eSwitch) prevails.
Here's what existing documentation says:
"Instead of simply matching the properties of traffic as it would appear
on a given DPDK port ID, enabling this attribute transfers a flow rule
to the lowest possible level of any device endpoints found in the pattern".
You talk about "moving a packet" while this paragraph talks about
"transferring a rule" - it's all vague, confusing, and, after all, it is
the reason why we have the problem with action PORT_ID semantics and the
likes. The documentation should not use weasel words but rather write
out in full what point of view is, and where things start to become
HW-specific. If action PORT_ID is not for delivery of packets to an
ethdev, one should have explained this in the comment.
If I understand correctly, not only does DPDK intend to be helpful to
end users, it also tries to be clear and concise to NIC drivers. So, for
sure, one should admit the problem existence. Secondly, all parties
should finally agree on the prevailing point of view (end user or
PMD/eSwitch) with respect to attribute "transfer" and action PORT_ID.
And all of that should result in documentation fix/update or semantics
change (whatever is more correct).
Simply ignoring the problem or assuming undocumented meanings (like this
"end user" point of view) while leaving the documentation poor doesn't
bear fruit. The raised issues are not just some petty concerns but
rather real significant issues, and pretending that they don’t exist is
not right.
Moving away from opinion-driven approach and fixing the documentation is
what we all might benefit from.
>
> For me having an "observer" inside the eSwitch makes no sense if it's
> not a real switch. And it's a not a real switch just because it has
> too many shortcuts and implicit rules that can not be controlled by
> the user, e.g. the thing that packet sent to VF-rep bypasses the
> rte_flow processing and goes directly out from the VF port.
>
> Unless it's a fully programmable switch (which it is not), I prefer
> to have it fully hidden from end users as a magic box. Making it
> fully programmable will make applications very complex, so I'd prefer
> to not deal with this too.
>
>>
>>>
>>>>
>>>> This way, mixing up the two meanings is ruled out.
>>>
>>> Looking closer to how tc flower rules configured I noticed that
>>> 'mirred' action requires user to specify the direction in which
>>> the packet will appear on the destination port. And I suppose
>>> this will solve your issues with PORT_ID action without exposing
>>> the "full picture" of the architecture of an underlying hardware.
>>>
>>> It looks something like this:
>>>
>>> tc filter add dev A ... action mirred egress redirect dev B
>>> ^^^^^^
>>>
>>> Direction could be 'ingress' or 'egress', so the packet will
>>> ingress from the port B back to application/kernel or it will
>>> egress from this port to the external network. Same thing
>>> could be implemented in rte_flow like this:
>>>
>>> flow create A ingress transfer pattern eth / end
>>> action port_id id B egress / end
>>>
>>> So, application that needs to receive the packet from the port B
>>> will specify 'ingress', others that just want to send packet from
>>> the port B will specify 'egress'. Will that work for you?
>>>
>>> (BTW, 'ingress' seems to be not implemented in TC and that kind
>>> of suggests that it's not very useful at least for kernel use cases)
>>>
>>> One might say that it's actually the same what is proposed in
>>> this RFC, but I will argue that 'ingress/egress' schema doesn't
>>> break the "***MAGIC***" abstraction because user is not obligated
>>> to know the structure of the underlying hardware, while 'upstream'
>>> flag is something very unclear from that perspective and makes
>>> no sense for plane ports (non-representors).
>>
>> Perhaps indeed "upstream" is barely the best choice, and I'd be tempted to accept the "egress" proposal, but I strongly disagree with the suggestion that "upstream" purportedly implies some HW knowledge. It simply means that the packet goes to the location where the wire leads to and not back to the application's own end of the wire. Must the port be connected to some location. And for non-representors, like admin's PF (isn't that what you call a plane port?) it perfectly makes sense because saying "PORT_ID id 0 upstream" means (not to the application, but to us) that the packet will be sent to network through the network port connected to this PF. That's it.
>>
>>>
>>>>
>>>>>
>>>>>>
>>>>>>>>
>>>>>>>> On 01/06/2021 15:10, Ilya Maximets wrote:
>>>>>>>>> On 6/1/21 1:14 PM, Ivan Malov wrote:
>>>>>>>>>> By its very name, action PORT_ID means that packets hit an ethdev with the
>>>>>>>>>> given DPDK port ID. At least the current comments don't state the opposite.
>>>>>>>>>> That said, since port representors had been adopted, applications like OvS
>>>>>>>>>> have been misusing the action. They misread its purpose as sending packets
>>>>>>>>>> to the opposite end of the "wire" plugged to the given ethdev, for example,
>>>>>>>>>> redirecting packets to the VF itself rather than to its representor ethdev.
>>>>>>>>>> Another example: OvS relies on this action with the admin PF's ethdev port
>>>>>>>>>> ID specified in it in order to send offloaded packets to the physical port.
>>>>>>>>>>
>>>>>>>>>> Since there might be applications which use this action in its valid sense,
>>>>>>>>>> one can't just change the documentation to greenlight the opposite meaning.
>>>>>>>>>> This patch adds an explicit bit to the action configuration which will let
>>>>>>>>>> applications, depending on their needs, leverage the two meanings properly.
>>>>>>>>>> Applications like OvS, as well as PMDs, will have to be corrected when the
>>>>>>>>>> patch has been applied. But the improved clarity of the action is worth it.
>>>>>>>>>>
>>>>>>>>>> The proposed change is not the only option. One could avoid changes in OvS
>>>>>>>>>> and PMDs if the new configuration field had the opposite meaning, with the
>>>>>>>>>> action itself meaning delivery to the represented port and not to DPDK one.
>>>>>>>>>> Alternatively, one could define a brand new action with the said behaviour.
>>>>>>>>>
>>>>>>>>> We had already very similar discussions regarding the understanding of what
>>>>>>>>> the representor really is from the DPDK API's point of view, and the last
>>>>>>>>> time, IIUC, it was concluded by a tech. board that representor should be
>>>>>>>>> a "ghost of a VF", i.e. DPDK APIs should apply configuration by default to
>>>>>>>>> VF and not to the representor device:
>>>>>>>>> https://patches.dpdk.org/project/dpdk/cover/20191029185051.32203-1-thomas@monjalon.net/#104376
>>>>>>>>> This wasn't enforced though, IIUC, for existing code and semantics is still mixed.
>>>>>>>>>
>>>>>>>>> I still think that configuration should be applied to VF, and the same applies
>>>>>>>>> to rte_flow API. IMHO, average application should not care if device is
>>>>>>>>> a VF itself or its representor. Everything should work exactly the same.
>>>>>>>>> I think this matches with the original idea/design of the switchdev functionality
>>>>>>>>> in the linux kernel and also matches with how the average user thinks about
>>>>>>>>> representor devices.
>>>>>>>>>
>>>>>>>>> If some specific use-case requires to distinguish VF from the representor,
>>>>>>>>> there should probably be a separate special API/flag for that.
>>>>>>>>>
>>>>>>>>> Best regards, Ilya Maximets.
>>>>>>>>>
>>>>>>>>
>>>>>>
>>>>
>>
--
Ivan M
^ permalink raw reply [flat|nested] 40+ messages in thread
* Re: [dpdk-dev] [RFC PATCH] ethdev: clarify flow action PORT ID semantics
2021-06-07 16:07 ` Thomas Monjalon
@ 2021-06-08 16:13 ` Thomas Monjalon
2021-06-08 16:32 ` Andrew Rybchenko
0 siblings, 1 reply; 40+ messages in thread
From: Thomas Monjalon @ 2021-06-08 16:13 UTC (permalink / raw)
To: dev
Cc: Ilya Maximets, Ori Kam, Andrew Rybchenko, Ivan Malov,
Eli Britstein, dev, Smadar Fuks, Hyong Youb Kim,
Kishore Padmanabha, Ajit Khaparde, Jerin Jacob, John Daley,
Ferruh Yigit, olivier.matz, shahafs, david.marchand, Rony Efraim
07/06/2021 18:07, Thomas Monjalon:
> 07/06/2021 15:21, Ilya Maximets:
> > On 6/7/21 2:08 PM, Ori Kam wrote:
> > > From: Andrew Rybchenko <andrew.rybchenko@oktetlabs.ru>
> > >> On 6/7/21 11:28 AM, Thomas Monjalon wrote:
> > >>> I see emails don't work well to conclude on how to manage representors.
> > >>> I propose working in live meetings so we can try to align our views on
> > >>> a virtual whiteboard and interactively ask questions.
> > >>> Participants in those meetings could work on documenting what is the
> > >>> view of a representor as a first step.
> > >>> Second step, it should be easier to discuss the API.
> > >>>
> > >>> If you agree, I will plan a first meeting where we can discuss what is
> > >>> a representor in our opinions.
> > >>> The meeting time would be 4pm UTC.
> > >>> For the day, I would propose this Thursday 10 if it works for
> > >>> everybody involved.
> >
> > Second half of the day (CET) is pretty much booked in my calendar. Tuesday
> > or Wednesday might work on this week. Wednesday or Thursday might work on
> > next week.
> >
> > >> OK for me.
> > > O.K. for me too.
>
> OK let's decide with a poll please:
> https://framadate.org/PAP3t8ycqx3lEUfe
Only 5 people voting.
We will be at least 4 tomorrow Wednesday 3pm UTC.
Everybody is welcome to join the community meeting:
https://zoom.us/j/93391811485
We could try another session on Thursday 4pm UTC if needed.
https://zoom.us/j/92425132629
Note: we will use Zoom which offers a whiteboard.
^ permalink raw reply [flat|nested] 40+ messages in thread
* Re: [dpdk-dev] [RFC PATCH] ethdev: clarify flow action PORT ID semantics
2021-06-08 16:13 ` Thomas Monjalon
@ 2021-06-08 16:32 ` Andrew Rybchenko
2021-06-08 18:49 ` Thomas Monjalon
0 siblings, 1 reply; 40+ messages in thread
From: Andrew Rybchenko @ 2021-06-08 16:32 UTC (permalink / raw)
To: Thomas Monjalon, dev
Cc: Ilya Maximets, Ori Kam, Ivan Malov, Eli Britstein, Smadar Fuks,
Hyong Youb Kim, Kishore Padmanabha, Ajit Khaparde, Jerin Jacob,
John Daley, Ferruh Yigit, olivier.matz, shahafs, david.marchand,
Rony Efraim
On 6/8/21 7:13 PM, Thomas Monjalon wrote:
> 07/06/2021 18:07, Thomas Monjalon:
>> 07/06/2021 15:21, Ilya Maximets:
>>> On 6/7/21 2:08 PM, Ori Kam wrote:
>>>> From: Andrew Rybchenko <andrew.rybchenko@oktetlabs.ru>
>>>>> On 6/7/21 11:28 AM, Thomas Monjalon wrote:
>>>>>> I see emails don't work well to conclude on how to manage representors.
>>>>>> I propose working in live meetings so we can try to align our views on
>>>>>> a virtual whiteboard and interactively ask questions.
>>>>>> Participants in those meetings could work on documenting what is the
>>>>>> view of a representor as a first step.
>>>>>> Second step, it should be easier to discuss the API.
>>>>>>
>>>>>> If you agree, I will plan a first meeting where we can discuss what is
>>>>>> a representor in our opinions.
>>>>>> The meeting time would be 4pm UTC.
>>>>>> For the day, I would propose this Thursday 10 if it works for
>>>>>> everybody involved.
>>>
>>> Second half of the day (CET) is pretty much booked in my calendar. Tuesday
>>> or Wednesday might work on this week. Wednesday or Thursday might work on
>>> next week.
>>>
>>>>> OK for me.
>>>> O.K. for me too.
>>
>> OK let's decide with a poll please:
>> https://framadate.org/PAP3t8ycqx3lEUfe
>
> Only 5 people voting.
> We will be at least 4 tomorrow Wednesday 3pm UTC.
> Everybody is welcome to join the community meeting:
> https://zoom.us/j/93391811485
Invite has July, 7. Typo?
> We could try another session on Thursday 4pm UTC if needed.
> https://zoom.us/j/92425132629
>
> Note: we will use Zoom which offers a whiteboard.
>
^ permalink raw reply [flat|nested] 40+ messages in thread
* Re: [dpdk-dev] [RFC PATCH] ethdev: clarify flow action PORT ID semantics
2021-06-08 16:32 ` Andrew Rybchenko
@ 2021-06-08 18:49 ` Thomas Monjalon
2021-06-09 14:31 ` Andrew Rybchenko
0 siblings, 1 reply; 40+ messages in thread
From: Thomas Monjalon @ 2021-06-08 18:49 UTC (permalink / raw)
To: Andrew Rybchenko
Cc: dev, Ilya Maximets, Ori Kam, Ivan Malov, Eli Britstein,
Smadar Fuks, Hyong Youb Kim, Kishore Padmanabha, Ajit Khaparde,
Jerin Jacob, John Daley, Ferruh Yigit, olivier.matz, shahafs,
david.marchand, Rony Efraim
08/06/2021 18:32, Andrew Rybchenko:
> On 6/8/21 7:13 PM, Thomas Monjalon wrote:
> > We will be at least 4 tomorrow Wednesday 3pm UTC.
> > Everybody is welcome to join the community meeting:
> > https://zoom.us/j/93391811485
>
> Invite has July, 7. Typo?
Yes was a typo, sorry.
The invite was updated to tomorrow Wednesday June 9th.
^ permalink raw reply [flat|nested] 40+ messages in thread
* Re: [dpdk-dev] [RFC PATCH] ethdev: clarify flow action PORT ID semantics
2021-06-08 18:49 ` Thomas Monjalon
@ 2021-06-09 14:31 ` Andrew Rybchenko
0 siblings, 0 replies; 40+ messages in thread
From: Andrew Rybchenko @ 2021-06-09 14:31 UTC (permalink / raw)
To: Thomas Monjalon
Cc: dev, Ilya Maximets, Ori Kam, Ivan Malov, Eli Britstein,
Smadar Fuks, Hyong Youb Kim, Kishore Padmanabha, Ajit Khaparde,
Jerin Jacob, John Daley, Ferruh Yigit, olivier.matz, shahafs,
david.marchand, Rony Efraim
On 6/8/21 9:49 PM, Thomas Monjalon wrote:
> 08/06/2021 18:32, Andrew Rybchenko:
>> On 6/8/21 7:13 PM, Thomas Monjalon wrote:
>>> We will be at least 4 tomorrow Wednesday 3pm UTC.
>>> Everybody is welcome to join the community meeting:
>>> https://zoom.us/j/93391811485
>> Invite has July, 7. Typo?
> Yes was a typo, sorry.
> The invite was updated to tomorrow Wednesday June 9th.
Some input for the meeting. Done very quickly and a bit
late, but I hope still useful for the discussion.
Below is a very quick review of the ethdev API in attempt to understand
meaning of various functions if representor is a real ethdev port or
a shadow (configuration interface) of represented entity (VF, PF, SF
and may be even network port).
1. API with port_id+queue_id does not make sense if representor is
a shadow of VF. Queues are fully controlled by guest OS.
- rte_eth_tx_done_cleanup()
- rte_eth_dev_rx_intr_enable()
- rte_eth_dev_rx_intr_disable()
- rte_eth_dev_rx_intr_ctl_q()
- rte_eth_dev_rx_intr_ctl_q_get_fd()
- rte_eth_set_queue_rate_limit()
- rte_eth_add_rx_callback()
- rte_eth_add_first_rx_callback()
- rte_eth_remove_rx_callback()
- rte_eth_add_tx_callback()
- rte_eth_remove_tx_callback()
- rte_eth_rx_queue_setup()
- rte_eth_rx_hairpin_queue_setup()
- rte_eth_tx_queue_setup()
- rte_eth_tx_hairpin_queue_setup()
- rte_eth_dev_rx_queue_start()
- rte_eth_dev_rx_queue_stop()
- rte_eth_dev_tx_queue_start()
- rte_eth_dev_tx_queue_stop()
- rte_eth_rx_queue_info_get()
- rte_eth_tx_queue_info_get()
- rte_eth_dev_set_tx_queue_stats_mapping()
- rte_eth_dev_set_rx_queue_stats_mapping()
- rte_eth_dev_set_vlan_strip_on_queue()
- rte_eth_dev_rss_reta_update()
- rte_eth_dev_rss_reta_query()
- rte_eth_rx_burst_mode_get()
- rte_eth_tx_burst_mode_get()
- rte_eth_get_monitor_addr()
- rte_eth_rx_burst()
- rte_eth_rx_queue_count()
- rte_eth_rx_descriptor_done()
- rte_eth_rx_descriptor_status()
- rte_eth_tx_descriptor_status()
- rte_eth_tx_burst()
- rte_eth_tx_prepare()
- rte_eth_tx_buffer_flush()
- rte_eth_tx_buffer()
- rte_eth_dev_pool_ops_supported()
Of course, we can say that representor port_id is a real
ethdev port in this case, but it looks inconsistent.
2. Some API functions are essential in ethdev port life-cycle.
2.1) rte_eth_dev_configure() is required to be real if
representors are used for HW offload to receive
slow path traffic. It configures real ethdev port
queues, offloads, steering, interrupts etc.
It never configures represented function (VF).
2.2) rte_eth_dev_close() closes ethdev port itself (not VF)
2.3) rte_eth_dev_start()/rte_eth_dev_stop() are required
to be real similar to rte_eth_dev_configure()
2.4) rte_eth_dev_info_get() could be a mixture in fact
3. A number of functions makes sense in both cases and
looks a bit more natural if representor is a shadow of
represented function (e.g. VF)
3.1) rte_eth_dev_set_link_up()/rte_eth_dev_set_link_down()
to administratively control if a VF can send/receive
traffic.
3.2) rte_eth_dev_reset() to enforce represented function
reset to apply new settings.
3.3) rte_eth_promiscuous_enable()/rte_eth_promiscuous_disable()/
rte_eth_promiscuous_get()/rte_eth_allmulticast_enable()/
rte_eth_allmulticast_disable()/rte_eth_allmulticast_get()
to control if represented entity can receive corresponding
traffic
3.4) rte_eth_link_get()/rte_eth_link_get_nowait() ???
3.5) rte_eth_macaddr_get()/rte_eth_dev_default_mac_addr_set()
to provide and control administratively configured
(and possibly enforced) MAC address of the represented entity
3.6) rte_eth_dev_vlan_filter() can control (enforced?) VLAN
filtering on represented entity
3.7) rte_eth_dev_set_vlan_ether_type()/rte_eth_dev_set_vlan_pvid()
may be used to configure transparent VLAN insertion
(doable using RTE flow API VLAN_PUSH on transfer level)
3.8) rte_eth_dev_set_vlan_offload()/rte_eth_dev_get_vlan_offload()
may be used to control transparent VLAN stripping
(doable using RTE flow API VLAN_POP on transfer level)
See filter above.
3.9) rte_eth_led_on()/rte_eth_led_off()
makes sense for network port representors only
3.10) rte_eth_fec_get_capability()/rte_eth_fec_get()/rte_eth_fec_set()
makes sense for network port representors only
3.11) rte_eth_dev_flow_ctrl_get()/rte_eth_dev_flow_ctrl_set()
rte_eth_dev_priority_flow_ctrl_set()
makes sense for network port representors only
3.12) rte_eth_dev_mac_addr_add()/rte_eth_dev_mac_addr_remove()
to control allowed MAC addresses for represented entity
3.13) rte_eth_dev_udp_tunnel_port_add()/
rte_eth_dev_udp_tunnel_port_delete()
to administratively control UDP tunnels configuration and
inform represented entity (set allowed entries)
3.14) rte_eth_dev_set_mc_addr_list() to control allowed groups
membership
4. Another group of functions which make sense in both cases, but
more natural if representor is a real port.
4.1) rte_eth_stats_get()/rte_eth_stats_reset()
First of all, these stats report how many packets/bytes are
received/sent by the DPDK application including per-queue
figures.
Of course, it makes sense for represented entity as well,
but does not look as a primary meaning.
4.2) rte_eth_xstats_get_names()/rte_eth_xstats_get()/
rte_eth_xstats_get_names_by_id()/rte_eth_xstats_get_by_id()/
rte_eth_xstats_get_id_by_name()/rte_eth_xstats_reset()
similar to standard stats since include standard stats.
Could provide information for both cases simultaneously in fact.
4.3) rte_eth_dev_get_supported_ptypes()/rte_eth_dev_set_ptypes()
it says what classification of packets DPDK app will see on Rx
4.4) rte_eth_dev_get_mtu()/rte_eth_dev_set_mtu() since it is
tightly related to size of Rx buffers and Rx offloads
configuration, but definitely makes sense for represented
entity.
4.5) rte_eth_dev_callback_register()/rte_eth_dev_callback_unregister()
seems to be more natural if representor is a real ethdev port
4.6) rte_eth_dev_rss_hash_update()/rte_eth_dev_rss_hash_conf_get()
but it could make sense for represented entity if we'd like
to enforce something (unclear why we need it)
5. It does not matter for a number of functions since talking to HW anyway
- rte_eth_dev_fw_version_get()
- rte_eth_dev_get_reg_info()
- rte_eth_dev_get_eeprom_length()
- rte_eth_dev_get_eeprom()
- rte_eth_dev_set_eeprom()
- rte_eth_dev_get_module_info()
- rte_eth_dev_get_module_eeprom()
- rte_eth_timesync_enable()
- rte_eth_timesync_disable()
- rte_eth_timesync_read_rx_timestamp()
- rte_eth_timesync_read_tx_timestamp()
- rte_eth_timesync_adjust_time()
- rte_eth_timesync_read_time()
- rte_eth_timesync_write_time()
- rte_eth_read_clock()
- rte_eth_dev_hairpin_capability_get()
- rte_eth_representor_info_get()
6. Not covered
- rte_eth_dev_uc_hash_table_set()
- rte_eth_dev_uc_all_hash_table_set()
- rte_eth_mirror_rule_set()
- rte_eth_mirror_rule_reset()
- rte_eth_dev_get_dcb_info()
- rte_eth_dev_get_sec_ctx()
Andrew.
^ permalink raw reply [flat|nested] 40+ messages in thread
* Re: [dpdk-dev] [RFC PATCH] ethdev: clarify flow action PORT ID semantics
2021-06-02 12:46 ` Ilya Maximets
2021-06-02 16:26 ` Andrew Rybchenko
@ 2021-06-25 13:04 ` Ferruh Yigit
1 sibling, 0 replies; 40+ messages in thread
From: Ferruh Yigit @ 2021-06-25 13:04 UTC (permalink / raw)
To: Ilya Maximets, Ivan Malov, dev
Cc: Eli Britstein, Smadar Fuks, Hyong Youb Kim, Kishore Padmanabha,
Ori Kam, Ajit Khaparde, Jerin Jacob, John Daley, Thomas Monjalon,
Andrew Rybchenko
On 6/2/2021 1:46 PM, Ilya Maximets wrote:
> On 6/1/21 4:28 PM, Ivan Malov wrote:
>> Hi Ilya,
>>
>> Thank you for reviewing the proposal at such short notice. I'm afraid that prior discussions overlook the simple fact that the whole problem is not limited to just VF representors. Action PORT_ID is also used with respect to the admin PF's ethdev, which "represents itself" (and by no means it represents the underlying physical/network port). In this case, one cannot state that the application treats it as a physical port, just like one states that the application perceives representors as VFs themselves.
>
>
> I don't think that it was overlooked. If device is in a switchdev mode than
> there is a PF representor and VF representors. Application typically works
> only with representors in this case is it doesn't make much sense to have
> representor and the upstream port attached to the same application at the
> same time. Configuration that is applied by application to the representor
> (PF or VF, it doesn't matter) applies to the corresponding upstream port
> (actual PF or VF) by default.
>
> Exactly same thing here with PORT_ID action. You have a packet and action
> to send it to the port, but it's not specified if HW needs to send it to
> the representor or the upstream port (again, VF or PF, it doesn't matter).
> Since there is no extra information, HW should send it to the upstream
> port by default. The same as configuration applies by default to the
> upstream port.
>
> Let's look at some workflow examples:
>
> DPDK Application
> | |
> | |
> +--PF-rep------VF-rep---+
> | |
> | NIC (switchdev) |
> | |
> +---PF---------VF-------+
> | |
> | |
> External VM or whatever
> Network
>
> a. Workflow for "DPDK Application" to set MAC to VF:
>
> 1. "DPDK Application" calls rte_set_etheraddr("VF-rep", new_mac);
> 2. DPDK sets MAC for "VF".
>
> b. Workflow for "DPDK Application" to set MAC to PF:
>
> 1. "DPDK Application" calls rte_set_etheraddr("PF-rep", new_mac);
> 2. DPDK sets MAC for "PF".
>
> c. Workflow for "DPDK Application" to send packet to the external network:
>
> 1. "DPDK Application" calls rte_eth_tx_burst("PF-rep", packet);
> 2. NIC receives the packet from "PF-rep" and sends it to "PF".
> 3. packet egresses to the external network from "PF".
>
> d. Workflow for "DPDK Application" to send packet to the "VM or whatever":
>
> 1. "DPDK Application" calls rte_eth_tx_burst("VF-rep", packet);
> 2. NIC receives the packet from "VF-rep" and sends it to "VF".
> 3. "VM or whatever" receives the packet from "VF".
>
> In two workflows above there is no rte_flow processing on step 2, i.e.,
> NIC does not perform any lookups/matches/actions, because it's not possible
> to configure actions for packets received from "PF-rep" or
> "VF-rep" as these ports doesn't own a port id and all the configuration
> and rte_flow actions translated and applied for the devices that these
> ports represents ("PF" and "VF") and not representors themselves ("PF-rep"
> or "VF-rep").
>
> e. Workflow for the packet received on PF and PORT_ID action:
>
> 1. "DPDK Application" configures rte_flow for all packets from "PF-rep"
> to execute PORT_ID "VF-rep".
> 2. NIC receives packet on "PF".
> 3. NIC executes 'PORT_ID "VF-rep"' action by sending packet to "VF".
> 4. "VM or whatever" receives the packet from "VF".
>
> f. Workflow for the packet received on VF and PORT_ID action:
>
> 1. "DPDK Application" configures rte_flow for all packets from "VF-rep"
> to execute 'PORT_ID "PF-rep"'.
> 2. NIC receives packet on "VF".
> 3. NIC executes 'PORT_ID "PF-rep"' action by sending packet to "PF".
> 4. Packet egresses from the "PF" to the external network.
>
> Above is what, IMHO, the logic should look like and this matches with
> the overall switchdev design in kernel.
>
Hi Ilya,
Thanks for clearly explaining the usecase, this was useful (at least for me).
But I am still not clear what is the other usecase, when port_id action is for
'VF-rep' packets sent to 'VF-ref' (instead of VF).
I remember Ilya mentioned both 'VF-rep' & 'VF' can be attached to an application
for debug purposes, but not any real life usage mentioned, unless I missed.
And if represontor datapath works independently, instead of being a pipe/wire to
represented port, won't it be a virtual partition of the port, instead of
representor of the port?
> I understand that this logic could seem flipped-over from the HW point
> of view, but it's perfectly logical from the user's perspective, because
> user should not care if the application works with representors or
> some real devices. If application configures that all packets from port
> A should be sent to port B, user will expect that these packets will
> egress from port B once received from port A. That will be highly
> inconvenient if the packet will ingress from port B back to the
> application instead.
>
> DPDK Application
> | |
> | |
> port A port B
> | |
> *****MAGIC*****
> | |
> External Another Network
> Network or VM or whatever
>
> It should not matter if there is an extra layer between ports A and B
> and the external network and VM. Everything should work in exactly the
> same way, transparently for the application.
>
> The point of hardware offloading, and therefore rte_flow API, is to take
> what user does in software and make this "magically" work in hardware in
> the exactly same way. And this will be broken if user will have to
> use different logic based on the mode the hardware works in, i.e. based on
> the fact if the application works with ports or their representors.
>
> If some specific use case requires application to know if it's an
> upstream port or the representor and demystify the internals of the switchdev
> NIC, there should be a different port id for the representor itself that
> could be used in all DPDK APIs including rte_flow API or a special bit for
> that matter. IIRC, there was an idea to add a bit directly to the port_id
> for that purpose that will flip over behavior in all the workflow scenarios
> that I described above.
>
>>
>> Given these facts, it would not be quite right to just align the documentation with the de-facto action meaning assumed by OvS.
>
> It's not a "meaning assumed by OvS", it's the original design and the
> main idea of a switchdev based on a common sense.
>
>>
>> On 01/06/2021 15:10, Ilya Maximets wrote:
>>> On 6/1/21 1:14 PM, Ivan Malov wrote:
>>>> By its very name, action PORT_ID means that packets hit an ethdev with the
>>>> given DPDK port ID. At least the current comments don't state the opposite.
>>>> That said, since port representors had been adopted, applications like OvS
>>>> have been misusing the action. They misread its purpose as sending packets
>>>> to the opposite end of the "wire" plugged to the given ethdev, for example,
>>>> redirecting packets to the VF itself rather than to its representor ethdev.
>>>> Another example: OvS relies on this action with the admin PF's ethdev port
>>>> ID specified in it in order to send offloaded packets to the physical port.
>>>>
>>>> Since there might be applications which use this action in its valid sense,
>>>> one can't just change the documentation to greenlight the opposite meaning.
>>>> This patch adds an explicit bit to the action configuration which will let
>>>> applications, depending on their needs, leverage the two meanings properly.
>>>> Applications like OvS, as well as PMDs, will have to be corrected when the
>>>> patch has been applied. But the improved clarity of the action is worth it.
>>>>
>>>> The proposed change is not the only option. One could avoid changes in OvS
>>>> and PMDs if the new configuration field had the opposite meaning, with the
>>>> action itself meaning delivery to the represented port and not to DPDK one.
>>>> Alternatively, one could define a brand new action with the said behaviour.
>>>
>>> We had already very similar discussions regarding the understanding of what
>>> the representor really is from the DPDK API's point of view, and the last
>>> time, IIUC, it was concluded by a tech. board that representor should be
>>> a "ghost of a VF", i.e. DPDK APIs should apply configuration by default to
>>> VF and not to the representor device:
>>> https://patches.dpdk.org/project/dpdk/cover/20191029185051.32203-1-thomas@monjalon.net/#104376
>>> This wasn't enforced though, IIUC, for existing code and semantics is still mixed.
>>>
>>> I still think that configuration should be applied to VF, and the same applies
>>> to rte_flow API. IMHO, average application should not care if device is
>>> a VF itself or its representor. Everything should work exactly the same.
>>> I think this matches with the original idea/design of the switchdev functionality
>>> in the linux kernel and also matches with how the average user thinks about
>>> representor devices.
>>>
>>> If some specific use-case requires to distinguish VF from the representor,
>>> there should probably be a separate special API/flag for that.
>>>
>>> Best regards, Ilya Maximets.
>>>
>>
>
^ permalink raw reply [flat|nested] 40+ messages in thread
* [dpdk-dev] [PATCH v1] ethdev: clarify flow action PORT ID semantics
2021-06-01 11:14 [dpdk-dev] [RFC PATCH] ethdev: clarify flow action PORT ID semantics Ivan Malov
2021-06-01 12:10 ` Ilya Maximets
@ 2021-09-03 7:46 ` Andrew Rybchenko
1 sibling, 0 replies; 40+ messages in thread
From: Andrew Rybchenko @ 2021-09-03 7:46 UTC (permalink / raw)
To: Thomas Monjalon, Ferruh Yigit, Ori Kam
Cc: dev, Eli Britstein, Ilya Maximets, Ajit Khaparde, Ivan Malov,
John Daley, Hyong Youb Kim, Jerin Jacob, Nithin Dabilpuram,
Kiran Kumar K, Somnath Kotur
From: Ivan Malov <ivan.malov@oktetlabs.ru>
Definition of action PORT_ID is ambiguous.
Documentation says "Directs matching traffic to a given DPDK port ID."
It suggests that an application will receive corresponding traffic on
the port using ethdev Rx burst API. Some network PMDs implement PORT_ID
action this way.
However, OvS+DPDK uses PORT_ID action a way which is natural it to
bypass OvS and redirect corresponding traffic to wire if the DPDK port
is a physical function or to a virtual function if the DPDK port is
a VF representor. Anyway corresponding packets will not be received
using ethdev Rx burst API by the OvS+DPDK. Other network PMDs implement
the PORT_ID action following the semantics.
The latter semantics may be explained to match the above definition
taking port representor definition into account which says that port
representor is a ghost of the real port and redirecting a traffic to
it means redirecting the traffic to the corresponding real port.
However, representors are not the only use case, and solution should
be extended anyway to support both options.
Since these interpretations of the PORT_ID action semantics are basically
opposite and both have reasons behind, it is bad to silently break some
applications changing default meaning. So make it backward incompatible
to ensure that all applications are updated to use it in a right way.
Stick to ingress/egress terminology to specify direction since it is
well defined from DPDK application point of view:
- ingress to be received via the specified port;
- egress as if it would-be transmitted via the specified port.
Signed-off-by: Ivan Malov <ivan.malov@oktetlabs.ru>
Signed-off-by: Andrew Rybchenko <andrew.rybchenko@oktetlabs.ru>
---
The patch is a follow up of the long discussion of the RFC [1].
It is just an attempt to summarize some results of the discussion.
Small quote from the discussion:
On June 3, 2021, 11:05 a.m. UTC, Ilya Maximets wrote:
> On June 3, 2021, 10:33 a.m. UTC, Andrew Rybchenko wrote:
> >
> > A. Add "ingress" bit with "egress" as unset meaning.
> > Yes, that's what is current behaviour assumed and
> > used by OvS and implemented in some PMDs.
> > My problem with it that it is, IMHO, inconsistent
> > default value (as explained above).
> >
> > B. Add "egress" bit with "ingress" as unset meaning.
> > Basically it is what is suggested in the RFC, but
> > the problem of the suggestion is the silent breakage
> > of existing users (let's put it a side if it is
> > correct usage or misuse). It is still the fact.
> >
> > C. Encode above in ethdev port ID MSB.
> > The problem of the solution is that encoding
> > makes sense for representors, but the problem
> > exists for non-representor ports as well.
> > I have no good ideas on terminology in the case
> > if we try to solve it for non-representors.
> >
> > D. Break API and ABI and add enum with unset(default)/
> > ingress/egress members to enforce application to
> > specify direction.
> >
> > It is unclear what we'll do in the case of A, B and D
> > if we encode representor in port ID MSB in any case.
>
> My opinion:
>
> - Option D is the best choice for rte_flow. No defaults, users forced
> to explicitly choose the direction in HW-independent way.
>
> - I agree that option C somewhat conflicts with the 'ingress/egress'
> flag idea and it is more hardware-specific. Therefore if option C
> is going to be implemented it should be implemented in concept of
> option A, i.e. 'egress' is default option if port ID MSB is not set.
If the solution is accepted, testpmd must be updated to require action
direction specification and drivers which implement the action must be
updated to handle direction properly and reject unspecified value with
appropriate error message.
If the patch does not address all concerns and there is still
significant disagreement on the topic, it would be good to schedule
call to discuss it.
[1] https://patches.dpdk.org/project/dpdk/patch/20210601111420.5549-1-ivan.malov@oktetlabs.ru/
doc/guides/prog_guide/rte_flow.rst | 7 ++++++-
doc/guides/rel_notes/deprecation.rst | 6 ------
doc/guides/rel_notes/release_21_11.rst | 4 ++++
lib/ethdev/rte_flow.h | 25 ++++++++++++++++++++++++-
4 files changed, 34 insertions(+), 8 deletions(-)
diff --git a/doc/guides/prog_guide/rte_flow.rst b/doc/guides/prog_guide/rte_flow.rst
index 2b42d5ec8c..89404b38af 100644
--- a/doc/guides/prog_guide/rte_flow.rst
+++ b/doc/guides/prog_guide/rte_flow.rst
@@ -1919,7 +1919,10 @@ See `Item: PHY_PORT`_.
Action: ``PORT_ID``
^^^^^^^^^^^^^^^^^^^
-Directs matching traffic to a given DPDK port ID.
+
+Directs matching traffic to the specified DPDK port (ingress) or to
+the would-be destination as if the application itself sent this traffic
+from the said DPDK port (egress).
See `Item: PORT_ID`_.
@@ -1934,6 +1937,8 @@ See `Item: PORT_ID`_.
+--------------+---------------------------------------+
| ``id`` | DPDK port ID |
+--------------+---------------------------------------+
+ | ``dir`` | egress or ingress |
+ +--------------+---------------------------------------+
Action: ``METER``
^^^^^^^^^^^^^^^^^
diff --git a/doc/guides/rel_notes/deprecation.rst b/doc/guides/rel_notes/deprecation.rst
index 76a4abfd6b..4ec6927c86 100644
--- a/doc/guides/rel_notes/deprecation.rst
+++ b/doc/guides/rel_notes/deprecation.rst
@@ -128,12 +128,6 @@ Deprecation Notices
is deprecated and will be removed in DPDK 21.11. Shared counters should
be managed using shared actions API (``rte_flow_shared_action_create`` etc).
-* ethdev: Definition of the flow API action ``RTE_FLOW_ACTION_TYPE_PORT_ID``
- is ambiguous and needs clarification.
- Structure ``rte_flow_action_port_id`` will be extended to specify
- traffic direction to the represented entity or ethdev port itself
- in DPDK 21.11.
-
* ethdev: Flow API documentation is unclear if ethdev port used to create
a flow rule adds any implicit match criteria in the case of transfer rules.
The semantics will be clarified in DPDK 21.11 and it will require fixes in
diff --git a/doc/guides/rel_notes/release_21_11.rst b/doc/guides/rel_notes/release_21_11.rst
index d707a554ef..b7aa175a32 100644
--- a/doc/guides/rel_notes/release_21_11.rst
+++ b/doc/guides/rel_notes/release_21_11.rst
@@ -84,6 +84,10 @@ API Changes
Also, make sure to start the actual text at the margin.
=======================================================
+* ethdev: ``struct rte_flow_action_port_id`` is extended with the direction
+ field with unspecified default, so all ``PORT_ID`` flow API action users
+ must be updated to make correct choice.
+
ABI Changes
-----------
diff --git a/lib/ethdev/rte_flow.h b/lib/ethdev/rte_flow.h
index 70f455d47d..3b83ed7d3a 100644
--- a/lib/ethdev/rte_flow.h
+++ b/lib/ethdev/rte_flow.h
@@ -2632,10 +2632,32 @@ struct rte_flow_action_phy_port {
uint32_t index; /**< Physical port index. */
};
+/** Traffic direction from application point of view. */
+enum rte_flow_direction {
+ /**
+ * Invalid value which should not be used and must be
+ * rejected by drivers.
+ */
+ RTE_FLOW_DIRECTION_UNSPECIFIED = 0,
+ /** As if the traffic was sent by the application. */
+ RTE_FLOW_EGRESS,
+ /** To be received by the application. */
+ RTE_FLOW_INGRESS,
+};
+
/**
* RTE_FLOW_ACTION_TYPE_PORT_ID
*
- * Directs matching traffic to a given DPDK port ID.
+ * Directs matching traffic to an ethdev with the given DPDK port ID or
+ * to the upstream port (the peer side of the wire) corresponding to it.
+ *
+ * It's assumed that it's the PMD (typically, its instance at the admin
+ * PF) which controls the binding between a (representor) ethdev and an
+ * upstream port. Typical bindings: VF rep. <=> VF, PF <=> network port.
+ * If the PMD instance is unaware of the binding between the ethdev and
+ * its upstream port (or can't control it), it should reject the action
+ * with the egress direction specified and log an appropriate error
+ * message.
*
* @see RTE_FLOW_ITEM_TYPE_PORT_ID
*/
@@ -2643,6 +2665,7 @@ struct rte_flow_action_port_id {
uint32_t original:1; /**< Use original DPDK port ID if possible. */
uint32_t reserved:31; /**< Reserved, must be zero. */
uint32_t id; /**< DPDK port ID. */
+ enum rte_flow_direction dir; /**< Direction to route traffic to. */
};
/**
--
2.30.2
^ permalink raw reply [flat|nested] 40+ messages in thread
end of thread, other threads:[~2021-09-03 7:46 UTC | newest]
Thread overview: 40+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-06-01 11:14 [dpdk-dev] [RFC PATCH] ethdev: clarify flow action PORT ID semantics Ivan Malov
2021-06-01 12:10 ` Ilya Maximets
2021-06-01 13:24 ` Eli Britstein
2021-06-01 14:35 ` Andrew Rybchenko
2021-06-01 14:44 ` Eli Britstein
2021-06-01 14:50 ` Ivan Malov
2021-06-01 14:53 ` Andrew Rybchenko
2021-06-02 9:57 ` Eli Britstein
2021-06-02 10:50 ` Andrew Rybchenko
2021-06-02 11:21 ` Eli Britstein
2021-06-02 11:57 ` Andrew Rybchenko
2021-06-02 12:36 ` Ivan Malov
2021-06-03 9:18 ` Ori Kam
2021-06-03 9:55 ` Andrew Rybchenko
2021-06-07 8:28 ` Thomas Monjalon
2021-06-07 9:42 ` Andrew Rybchenko
2021-06-07 12:08 ` Ori Kam
2021-06-07 13:21 ` Ilya Maximets
2021-06-07 16:07 ` Thomas Monjalon
2021-06-08 16:13 ` Thomas Monjalon
2021-06-08 16:32 ` Andrew Rybchenko
2021-06-08 18:49 ` Thomas Monjalon
2021-06-09 14:31 ` Andrew Rybchenko
2021-06-01 14:49 ` Ivan Malov
2021-06-01 14:28 ` Ivan Malov
2021-06-02 12:46 ` Ilya Maximets
2021-06-02 16:26 ` Andrew Rybchenko
2021-06-02 17:35 ` Ilya Maximets
2021-06-02 19:35 ` Ivan Malov
2021-06-03 9:29 ` Ilya Maximets
2021-06-03 10:33 ` Andrew Rybchenko
2021-06-03 11:05 ` Ilya Maximets
2021-06-03 11:29 ` Ivan Malov
2021-06-07 19:27 ` Ilya Maximets
2021-06-07 20:39 ` Ivan Malov
2021-06-25 13:04 ` Ferruh Yigit
2021-06-02 12:16 ` Thomas Monjalon
2021-06-02 12:53 ` Ilya Maximets
2021-06-02 13:10 ` Andrew Rybchenko
2021-09-03 7:46 ` [dpdk-dev] [PATCH v1] " Andrew Rybchenko
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).