From mboxrd@z Thu Jan  1 00:00:00 1970
Date: Wed, 14 Sep 2022 18:18:00 +0300 (MSK)
From: Ivan Malov
To: Rongwei Liu
Cc: Matan Azrad; Slava Ovsiienko; Ori Kam; "NBU-Contact-Thomas Monjalon (EXTERNAL)"; Aman Singh; Yuying Zhang; Andrew Rybchenko; dev@dpdk.org; Raslan Darawsheh
Subject: RE: [PATCH v1] ethdev: add direction info when creating the transfer table
Message-ID: <6164993a-ba4e-c1ea-aaf5-5cc7c35d3724@oktetlabs.ru>
References: <20220907024020.2474860-1-rongweil@nvidia.com> <1be72d6-be5b-88b2-f15-16fd2c6d0c0@oktetlabs.ru> <5d8d42b2-7011-cb46-7f2c-1b1019c4151e@oktetlabs.ru> <46841a9-37f1-29a8-ba86-ac5410723e2f@oktetlabs.ru>
MIME-Version: 1.0
Hi Rongwei,

On Wed, 14 Sep 2022, Rongwei Liu wrote:

> Hi
>
> BR
> Rongwei
>
>> -----Original Message-----
>> From: Ivan Malov
>> Sent: Wednesday, September 14, 2022 15:32
>> To: Rongwei Liu
>> Cc: Matan Azrad; Slava Ovsiienko; Ori Kam; NBU-Contact-Thomas Monjalon
>> (EXTERNAL); Aman Singh; Yuying Zhang; Andrew Rybchenko; dev@dpdk.org;
>> Raslan Darawsheh
>> Subject: RE: [PATCH v1] ethdev: add direction info when creating the
>> transfer table
>>
>> Hi,
>>
>> On Wed, 14 Sep 2022, Rongwei Liu wrote:
>>
>>> Hi
>>>
>>> BR
>>> Rongwei
>>>
>>>> -----Original Message-----
>>>> From: Ivan Malov
>>>> Sent: Tuesday, September 13, 2022 22:33
>>>> To: Rongwei Liu
>>>> Cc: Matan Azrad; Slava Ovsiienko; Ori Kam; NBU-Contact-Thomas Monjalon
>>>> (EXTERNAL); Aman Singh; Yuying Zhang; Andrew Rybchenko; dev@dpdk.org;
>>>> Raslan Darawsheh
>>>> Subject: RE: [PATCH v1] ethdev: add direction info when creating the
>>>> transfer table
>>>>
>>>> Hi Rongwei,
>>>>
>>>> PSB
>>>>
>>>> On Tue, 13 Sep 2022, Rongwei Liu wrote:
>>>>
>>>>> Hi
>>>>>
>>>>> BR
>>>>> Rongwei
>>>>>
>>>>>> -----Original Message-----
>>>>>> From: Ivan Malov
>>>>>> Sent: Tuesday, September 13, 2022 00:57
>>>>>> To: Rongwei Liu
>>>>>> Cc: Matan Azrad; Slava Ovsiienko; Ori Kam; NBU-Contact-Thomas Monjalon
>>>>>> (EXTERNAL); Aman Singh; Yuying Zhang; Andrew Rybchenko; dev@dpdk.org;
>>>>>> Raslan Darawsheh
>>>>>> Subject: Re: [PATCH v1] ethdev: add direction info when creating
>>>>>> the transfer table
>>>>>>
>>>>>> Hi,
>>>>>>
>>>>>> On Wed, 7 Sep 2022, Rongwei Liu wrote:
>>>>>>
>>>>>>> The transfer domain rule is able to match traffic wire/vf origin
>>>>>>> and it means two directions' underlayer resource.
>>>>>>
>>>>>> The fact of the matter is that matching traffic coming from some
>>>>>> entity like wire / VF has long been generalised in the form of
>>>>>> representors. So, a flow rule with attribute "transfer" is able to
>>>>>> match traffic coming from either a REPRESENTED_PORT or from a
>>>>>> PORT_REPRESENTOR (please find these items).
>>>>>>
>>>>>>> In customer deployments, they usually match only one direction
>>>>>>> traffic in single flow table: either from wire or from vf.
>>>>>>
>>>>>> Which customer deployments? Could you please provide detailed
>>>>>> examples?
>>>>>>
>>>>> We have seen many customer deployments like:
>>>>> 1. Match overlay traffic from wire and do decap, then send to a
>>>>>    specific vport.
>>>>> 2. Match specific 5-tuples and do encap, then send to wire.
>>>>> The matching criteria have an obvious direction preference.
>>>>
>>>> Thank you. My questions are as follows:
>>>>
>>>> In (1), when you say "from wire", do you mean the need to match
>>>> packets arriving via whatever physical ports rather than matching
>>>> packets arriving from some specific phys. port?
>>
>> ^^
>>
>> Could you please find my question above? Based on your understanding
>> of templates in the async flow approach, an answer to this question
>> may help us find the common ground.

> It means traffic arriving from physical ports (the transfer_proxy
> role), or southbound in your terminology.

Transfer proxy has nothing to do with physical ports. And I should
stress that "southbound" and the like are NOT my concepts. Instead, I
think that direction designations like "south" or "north" aren't
applicable when talking about the embedded switch and its flow
(transfer) rules.
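For illustration, the item-based origin matching mentioned above could look as follows in testpmd's rule syntax. The port IDs, the VXLAN pattern and the actions here are purely hypothetical, just to sketch a "decap traffic coming from the physical port and steer it to a VF" rule, and the exact token order may differ between testpmd versions:

```
flow create 0 transfer group 0 priority 0
     pattern represented_port ethdev_port_id is 0 / eth / ipv4 / udp / vxlan / end
     actions vxlan_decap / represented_port ethdev_port_id 1 / end
```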
> Traffic from a vport (not the transfer_proxy), or northbound in your
> terminology, won't hit even for the same packets.

Please see above. Transfer proxy is a completely different concept.
And I never used a "northbound" concept.

>>
>> --
>>
>>>> If, however, matching traffic "from wire" in fact means matching
>>>> packets arriving from a *specific* physical port, then for sure item
>>>> REPRESENTED_PORT should perfectly do the job, and the proposed
>>>> attribute is unneeded.
>>>>
>>>> (BTW, in DPDK, it is customary to use the term "physical port", not
>>>> "wire")
>>>>
>>>> In (1), what are "vport"s? Please explain. Once again, I should
>>>> remind that, in DPDK, folks prefer the terms "represented entity" /
>>>> "representor" over vendor-specific terms like "vport", etc.
>>>>
>>> Vport is short for virtual port, such as a VF.
>>
>> Thanks. As I say, the term "vport" might be confusing to some readers,
>> so it'd be better to provide this explanation (about VF) in the commit
>> description next time.
> Ack. Will add VF as an example.
>>
>>>> As for (2), imagine matching 5-tuple traffic emitted by a VF / guest.
>>>> Could you please explain why not just add a match item
>>>> REPRESENTED_PORT pointing to that VF via its representor? Doing so
>>>> should perfectly define the exact direction / traffic source. Isn't
>>>> that sufficient?
>>>>
>>> In my view, there is a difference between matching fields and
>>> matching values. Take IPv4 src_addr 1.1.1.1, 1.1.1.2, 1.1.1.3: would
>>> you treat these as the same or as different matching criteria? I
>>> would call them the same, since they can be summarized as 1.1.1.0/30.
>>> REPRESENTED_PORT is just another matching item, with no essential
>>> difference, and it can't stand for direction info.
>>
>> It looks like we're starting to run into disagreement here.
>> There's no "direction" at all. There's an embedded switch inside the
>> NIC, and there are (logical) switch ports that packets enter the
>> switch from.
>>
>> When the user submits a "transfer" rule and provides neither
>> REPRESENTED_PORT nor PORT_REPRESENTOR in the pattern, the embedded
>> switch is supposed to match packets coming from ANY ports, be they
>> VFs or physical (wire) ports.
>>
>> But when the user provides, for example, item REPRESENTED_PORT to
>> point to the physical (wire) port, the embedded switch knows exactly
>> which port the packets should enter it from.
>> In this case, it is supposed to match only packets coming from that
>> physical port. And this should be sufficient.
>> This in fact replaces the need to know a "direction".
>> It's just an exact specification of the packet's origin.
>>
> There is traffic arriving at or leaving the switch, so there is always
> a direction, implicit or explicit.

This does not contradict my thoughts above. "Direction" is *defined* by
two points (like in geometry): an initial point (the switch port through
which a packet enters the switch) and the terminal point (the match
engine inside the switch). If one knows these two points, no extra
hints are required to specify some "direction", because the direction
is already represented by this "vector" of sorts. That's why the
presence of the port match item in the pattern is absolutely
sufficient.

However, based on your later explanations, the use of a precise port
item is simply inconvenient in your use case, because you are trying to
match traffic from *multiple* ports that have something in common (i.e.
all VFs or all wire ports). And, instead of adding a new item type
which would serve exactly your needs, you for some reason try to add an
attribute, which has multiple drawbacks that I described in my previous
letter.

> For transfer rules, there is the concept of a transfer_proxy.
> It takes the switch ownership; all switch rules should be configured
> via the transfer_proxy.

Yes, such a concept exists, but it's a don't-care with regard to the
problem that we're discussing, sorry.
Furthermore, unlike the "switch domain ID" (which is the same for all
ethdevs belonging to a given physical NIC board), nobody guarantees
that there is only one transfer proxy port. Some NIC vendors allow
transfer rules to be added via any ethdev port.

>
> Imagine a logical switch with one PF and two VFs.
> The PF is the transfer proxy, and the VFs belong to the PF logically.
> When traffic is received from the PF, we can say it comes into the
> logical switch.

That's correct.

> When a packet is sent from a VF (the VF belongs to the PF), we can say
> the traffic leaves the switch.

That's not correct. Traffic sent from a VF (for example, a guest VM is
sending packets) also *enters* the switch. PFs and VFs are in fact
*separate* logical ports of the embedded switch.

>
> Item REPRESENTED_PORT indicates to the switch which port the matched
> traffic is sent from, coming into or leaving the switch.

That is not correct either. Item REPRESENTED_PORT tells the switch to
match packets which come into the switch FROM the logical port which is
represented by the given DPDK ethdev. For example, if ethdev "E" is the
*main* PF which is bound to physical port "P", then item
REPRESENTED_PORT with the ethdev ID set to "E" tells the switch that
only packets coming to the NIC from the *wire* via physical port "P"
should match.

> We can regard it as one kind of packet metadata.

Kind of yes, but it might be vendor-specific. No need to delve into
this.

> Like you said, DPDK always treats "transfer" as matching any ports'
> traffic.

Slight correction: it treats it this way until it sees an exact port
item. If the user provides REPRESENTED_PORT (or PORT_REPRESENTOR), it's
no longer *any* ports' traffic; it's exact-port traffic. That's it.

> When REPRESENTED_PORT is specified, the rules are limited to some
> dedicated PORTs.

These rules match only packets arriving TO the embedded switch FROM the
said dedicated ports.

> Other PORTs are ignored because of metadata mismatch.

Kind of yes, correct.

> Rules still have the capability to match ANY PORTS if the metadata
> matches.
This statement is only correct for the cases when the user uses neither
item REPRESENTED_PORT nor item PORT_REPRESENTOR.

>
> This update will allow the user to cut the other PORTs' matching
> capabilities.

As I explained, this is exactly what items PORT_REPRESENTOR and
REPRESENTED_PORT do. No need to have an extra attribute. If the user
adds item REPRESENTED_PORT with ethdev_id="E", like in the above
example, to match packets entering the NIC via the physical port "P",
then this rule will NOT match packets entering the NIC from other
points. For example, packets transmitted by a virtual machine via a VF
will not match in this case.

>>> Port id depends on the attach sequence.
>>
>> Unfortunately, this is hardly a good argument, because flow rules are
>> supposed to be inserted based on run-time packet learning. The attach
>> sequence is a don't-care here.
>>
>>>> Also please mind that, although I appreciate your explanations here,
>>>> on the mailing list, they should finally be added to the commit
>>>> message, so that readers do not have to look for them elsewhere.
>>>>
>>> We have explained the high possibility of single-direction matching,
>>> right?
>>
>> Not quite. As I said, it is not correct to assume any "direction",
>> like in the geographical sense ("north", "south", etc.). The
>> application has ethdevs, and they are representors of some "virtual
>> ports" (in your terminology) belonging to the switch, for example,
>> VFs, SFs or physical ports.
>>
>> The user adds an appropriate item to the pattern (REPRESENTED_PORT),
>> and doing so specifies the path by which packets enter the switch.
>>
>>> It's hard to list all the possibilities of traffic matching
>>> preferences.
>>
>> And let's say more: one never needs to do this. That's exactly the
>> reason why DPDK has abandoned the concept of "direction" in
>> *transfer* rules and switched to the use of precise criteria
>> (REPRESENTED_PORT, etc.).
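The semantics described above (the origin port is just one more match criterion, and omitting it means "any port") can be sketched with a toy model; the names below are made up for illustration and are not DPDK API:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Toy model of an embedded switch rule: each packet records the
 * logical port it entered the switch from, and a rule may or may
 * not constrain that origin, like item REPRESENTED_PORT does. */

enum toy_port { TOY_PHY_PORT, TOY_VF1, TOY_VF2 };

struct toy_packet {
	enum toy_port origin; /* logical port the packet entered from */
	uint32_t dst_ip;
};

struct toy_rule {
	bool match_origin;    /* false: match traffic from ANY port */
	enum toy_port origin; /* considered only when match_origin is set */
	uint32_t dst_ip;
};

/* A rule without an origin criterion matches packets from any port;
 * with one, only packets entering from that exact port can match. */
static bool toy_rule_matches(const struct toy_rule *r,
			     const struct toy_packet *p)
{
	if (r->match_origin && r->origin != p->origin)
		return false;
	return r->dst_ip == p->dst_ip;
}
```

In this model, "direction" never appears: constraining the origin port is what narrows a rule from "any port" down to, say, wire-only traffic.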
> As far as I know, DPDK changed "transfer ingress" to "transfer", so
> it's clearer that transfer can match both directions (both ingress and
> egress).

Not quite. DPDK has abandoned the use of "ingress / egress" in
"transfer" rules because "ingress" and "egress" are only applicable on
the VNIC level. For example, there is a PF attached to the DPDK
application: packets that the application receives through this ethdev
are ingress, and packets that it transmits (tx_burst) are egress.

I can explain in other words. Imagine yourself standing *inside* a room
which only has one door. When someone enters the room, it's "ingress";
when someone leaves, it's "egress". It's relative to your viewpoint. In
this example, such a room represents a VNIC / ethdev.

And now imagine yourself standing *outside* of another room /
auditorium which has multiple doors / exits. You're standing near some
particular exit "A" (VNIC / ethdev), but people may enter this room via
another door "B" and then leave it via yet another door "C". In this
case, from your viewpoint, this traffic can be considered neither
ingress nor egress, because these people do not approach you.

Like in this example, the embedded switch is like a large auditorium
with many doors / exits. And there can be many directions: a packet can
enter the switch via phys. port "P1" and then leave it via another
phys. port "P2". Or it can enter the switch via a phys. port and then
leave it via a VF's logical port (to be delivered to a guest machine),
or a packet can travel from one VF to another one. There's no
PRE-DEFINED direction like "north to south" or "east to west". And this
explains why it's very undesirable to use the term "direction".

> REPRESENTED_PORT is the evolution of "port_id", I think; it's only one
> kind of matching item.

Yes. But nobody prevents you from defining yet another match item which
will be able to refer to a *group* of ports which have something in
common (i.e. "all guest ports of this switch", pointing to all logical
ports currently attached to virtual machines / guests, or "all wire
ports of this switch").

> For large-scale deployments like 10M rules, if we can save resources
> significantly by introducing direction, why not?

I do not deny the fact that you have a use case where resources can be
saved significantly if you give the PMD some extra knowledge when
creating a flow table / pattern template. That's totally OK. What I
object to is the very implementation and the use of the term
"direction". If you add new item types (like above), then, when you
create an async table 1 pattern template, you will have item
ANY_WIRE_PORTS, and, for the table 2 pattern template, you'll have item
ANY_GUEST_PORTS. As you see, the two pattern templates now differ
because the match criteria use different items.

>
> Again, the async API:
> 1. pattern template A
> 2. action template B
> 3. table C with pattern template A + action template B.
> 4. rules D, E, F...
> The specific REPRESENTED_PORT is provided in the rules (D, E, F...),
> not in pattern template A or action template B or table C.
> Resources may be allocated early, at step 3, because of the table's
> rule_nums property.

No, item REPRESENTED_PORT *can* be provided inside pattern template A,
but, as you pointed out earlier, the problem is that you can't
distinguish different pattern templates which have this item, because
pattern templates know nothing about *exact* port IDs and only know
item MASKS. Yes, I agree that in your case such a problem exists, but,
as I say above, it can be solved by adding new item types: one for
referring to all phys. ports of a given NIC and another one for
pointing to a group of current guest users (VFs).

>>> The underlay is the one we have met for now.
>>>>>
>>>>>>> Introduce one new member transfer_mode into rte_flow_attr to
>>>>>>> indicate the flow table direction property: from wire, from vf or
>>>>>>> bi-direction(default).
>>>>>> AFAIK, 'rte_flow_attr' serves both the traditional flow rule
>>>>>> insertion and the asynchronous (table) approach. The patch adds
>>>>>> the attributes to the generic 'rte_flow_attr' but, for some
>>>>>> reason, ignores non-table rules.
>>>>>>
>>>>> The sync API uses one rule to contain everything. It's hard for the
>>>>> PMD to determine whether this rule has a direction preference or
>>>>> not.
>>>>> Imagine a situation, just for an example:
>>>>> 1. Vport 1, VXLAN, do decap, send to vport 2; 1 million scale.
>>>>> 2. Vport 0 (wire), VXLAN, do decap, send to vport 3; 1 hundred
>>>>>    scale.
>>>>> 1 and 2 share the same matching conditions (eth / ipv4 / udp /
>>>>> vxlan / ...), so the sync API considers them to share the matching
>>>>> determination logic. It means "2" has 1M-scale capability too.
>>>>> Obviously, it wastes a lot of resources.
>>>>
>>>> Strictly speaking, they do not share the same match pattern.
>>>> Your example clearly shows that, in (1), the pattern should request
>>>> packets coming from "vport 1" and, in (2), packets coming from
>>>> "vport 0".
>>>>
>>>> My point is simple: the "vport" from which packets enter the
>>>> embedded switch is ALSO a match criterion. If you accept this,
>>>> you'll see: the matching conditions differ.
>>>>
>>> See above.
>>> In this case, I think the matching fields are both "port_id +
>>> ipv4_vxlan". They are the same, differing only in values, like VNI
>>> 100 or 200 and vice versa.
>>
>> Not quite. Look closer: you use *different* port IDs for (1) and (2).
>> The value of the "ethdev_id" field in item REPRESENTED_PORT differs.
>>
>>>>> In the async API, a pattern_template is introduced. We can mark "1"
>>>>> to use pattern_template id 1 and "2" to use pattern_template 2.
>>>>> They will be separated from each other and won't share anymore.
>>>>
>>>> Consider an example. "Wire" is a physical port represented by PF0
>>>> which, in turn, is attached to DPDK via ethdev 0. "VF" (vport?)
>>>> is attached to a guest and is represented by a representor ethdev 1
>>>> in DPDK.
>>>>
>>>> So, some rules (template 1) are needed to deliver packets from
>>>> "wire" to "VF" and also decapsulate them. And some rules (template
>>>> 2) are needed to deliver packets in the opposite direction, from
>>>> "VF" to "wire", and also encapsulate them.
>>>>
>>>> My question is, what prevents you from adding match item
>>>> REPRESENTED_PORT[ethdev_id=0] to pattern template 1 and
>>>> REPRESENTED_PORT[ethdev_id=1] to pattern template 2?
>>>>
>>>> As I said previously, if you insert such an item before eth / ipv4 /
>>>> etc. in your match pattern, doing so defines an *exact* direction /
>>>> source.
>>>>
>>> Could you check the async API guidance? I think the pattern template
>>> focuses on the matching field (mask).
>>> "REPRESENTED_PORT[ethdev_id=0]" and "REPRESENTED_PORT[ethdev_id=1]"
>>> are the same.
>>> 1. pattern template: REPRESENTED_PORT mask 0xffff ...
>>> 2. action template: action1 / action2 ...
>>> 3. table create with the pattern template plus the action template.
>>> REPRESENTED_PORT[ethdev_id=0] will be rule 1: rule create
>>> REPRESENTED_PORT port_id is 0 / actions ...
>>> REPRESENTED_PORT[ethdev_id=1] will be rule 2: rule create
>>> REPRESENTED_PORT port_id is 1 / actions ...
>>
>> OK, so, based on this explanation, it appears that you might be
>> looking to refer to:
>> a) a *set* of any physical (wire) ports
>> b) a *set* of any guest ports (VFs)
>>
> Great, it looks like we are getting closer to agreement.

Looks so.

>> You chose to achieve this using an attribute, but:
>>
>> 1) as I explained above, the use of the term "direction" is wrong;
>>    please hear me out: I'm not saying that your use case and
>>    your optimisation are wrong; I'm saying that the naming for it
>>    is wrong: it has nothing to do with "direction";
>>
> Do you have any better naming proposal?
As I said, what you are trying to achieve using a new attribute would
be far better achieved using new pattern items, which can easily be
told one from another in the PMD when pre-allocating resources for
different async flow tables. So, I don't have any proposal for
*attribute* naming. What I propose is to consider new items instead.

>> 2) while naming a *set* of wire ports as "wire_orig" might be OK,
>>    sticking with the term "vf_orig" for a *set* of guest ports is
>>    clearly not, simply because the user may pass another PF
>>    to a guest instead of passing a VF; in other words,
>>    a better term is needed here;
>>
> Like you said, a vport may be a VF, SF, etc. "vport_origin" is from
> the logical switch perspective. Any proposal is welcome.

The problem is, "vport" can be easily confused with the slightly more
generic "lport" (the embedded switch's "logical port"), and logical
ports, in turn, are not confined to just VFs or PFs. For example,
physical (wire) ports are ALSO logical ports of the switch.

>> 3) since it is possible to plug multiple NICs to a DPDK application,
>>    even from different vendors, the user may end up having multiple
>>    physical ports belonging to different physical NICs attached to
>>    the application; if this is the case, then referring to a *set*
>>    of wire ports using the new attribute is ambiguous in the
>>    sense that it's unclear whether this applies only to
>>    wire ports of some specific physical NIC or to the
>>    physical ports of *all* NICs managed by the app;
>>
> No matter how many NICs have been probed by DPDK, there are always the
> switch / PF / VF / SF concepts.

Correct.

> Each switch must have an owner identified by transfer_proxy(). A vport
> (VF / SF) can't cross switches in the normal case.

No. That is not correct. This is tricky, but please hear me out: an
individual NIC board (that is, a given *switch*) is identified only by
its switch domain ID.
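The board-identity check discussed here can be sketched the same way; the struct below is a mock that only mirrors the domain_id field of DPDK's struct rte_eth_switch_info, and the helper name is made up for illustration:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Mock of the relevant part of struct rte_eth_switch_info: every
 * ethdev reports the domain ID of the embedded switch it belongs to. */
struct mock_switch_info {
	uint16_t domain_id;
};

/* Two ethdevs are served by the same embedded switch (the same
 * physical NIC board) if and only if their switch domain IDs are
 * equal; whether either of them is a "transfer proxy" is irrelevant. */
static bool same_embedded_switch(const struct mock_switch_info *a,
				 const struct mock_switch_info *b)
{
	return a->domain_id == b->domain_id;
}
```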
As I explained above, "transfer proxy" is just a technical hint for the
application to indicate an ethdev through which "transfer" rules must
be managed. Not all vendors support this concept (and they are not
obliged to support it).

> Traffic that comes from one NIC can't be offloaded by other NICs
> unless it is forwarded by the application.

Right, but forwarding in software (inside the DPDK application) is out
of scope with regard to the problem that we're discussing.

> If the user uses the new attribute to cut one side's resources, I
> think the user is smart enough to manage the rules across different
> NICs.

As I explained above, I do not deny the existence of the problem that
your patch is trying to solve. Now it looks like we're on the same page
with regard to understanding the fact that what you're trying to do is
to introduce a match criterion that would refer to a GROUP of similar
ports. In my opinion, this is not an *attribute*; it's a *match
criterion*, and it should be implemented as two new items. Having two
different item types would perfectly fit the need to know the
difference between such "directions" (as per your terminology) early
enough, when parsing templates.

> No default behavior is changed by this update.
>
>> 4) adding an attribute instead of yet another pattern item type
>>    is not quite good because PMDs need to be updated separately
>>    to detect this attribute and throw an error if it's not
>>    supported, whilst with a new item type, the PMDs do not
>>    need to be updated: if a PMD sees an unsupported item
>>    while traversing the items with switch () { case }, it
>>    will throw an error anyway;
>>
> The PMD also needs to check whether it supports the new matching item
> or not, right? We can't make assumptions about a NIC vendor's PMD
> implementation, right?

No-no-no. Imagine a PMD which does not support "transfer" rules.
In such a PMD, in the flow parsing function, one would have:

	if (!!attr->transfer) {
		print_error("Transfer is not supported");
		return EINVAL;
	}

If you add a new attribute, then PMDs which are NOT going to support it
need to be updated to add a similar check. Otherwise, they will simply
ignore the presence / absence of the attribute in the rule, and the
validation result will be unreliable. Yes, if this attribute is 0x0,
then indeed the behaviour does not change. But what if it's 0x1 or 0x2?
PMDs that do not support these values must somehow reject such rules on
parsing.

However, this problem does not manifest itself when parsing items.
Typically, in a PMD, one would have:

	switch (item->type) {
	case RTE_FLOW_ITEM_TYPE_VOID:
		break;
	case RTE_FLOW_ITEM_TYPE_ETH:
		/* blah-blah-blah */
		break;
	default:
		return ENOTSUP;
	}

So, if you introduce two new item types to solve your problem, then you
won't have to update existing PMDs. If a vendor wants to support the
new items (say, MLX or SFC), they'll update their code to accept the
items. But other vendors will not do anything. If the user tries to
pass such an item to a vendor which doesn't support the feature, the
"default" case will just throw an error.

This is what I mean when pointing out the difference between adding an
attribute vs. adding new item types.

>> 5) as in (4), a new attribute is not good from the documentation
>>    standpoint; please search for "represented_port = Y" in the
>>    documentation: this way, all supported items are easily defined
>>    for various NIC vendors, but the same isn't true for attributes:
>>    there is no way to indicate supported attributes in the docs.
>>
>> If points (1 - 5) make sense to you, then, if I may be so bold, I'd
>> like to suggest that the idea of adding a new attribute be abandoned.
>> Instead, I'd like to suggest adding new items:
>>
>> (the names are just a sketch; for sure, they should be discussed)
>>
>> ANY_PHY_PORTS { switch_domain_id }
>> = match packets entering the embedded switch from *whatever*
>>   physical ports belonging to the given switch domain
>>
> How many PHY_PORTS can one switch have, in your view? Can I treat the
> PHY_PORTS as the { switch_domain_id } owner, like transfer_proxy()?

A single physical NIC board is supposed to have a single embedded
switch engine. Hence, if the NIC board has, for example, two or four
physical ports, these will be the physical ports of the switch. That's
it. As for the transfer proxy, please see my explanations above. It's
not *always* reliable to tell whether two given ethdevs belong to the
same physical NIC board or not. Switch domain ID is the right criterion
(for applications).

>> ANY_GUEST_PORTS { switch_domain_id }
>> = match packets entering the embedded switch from *whatever*
>>   guest ports (VFs, PFs, etc.) belonging to the given
>>   switch domain
>>
>> The field "switch_domain_id" is required to tell one physical board /
>> vendor from another (as I explained in point (3)).
>> The application can query this parameter from the ethdev's switch
>> info: please see "struct rte_eth_switch_info".
>>
>> What's your opinion?
>>
> How can we handle the relationship of ANY_PHY_PORTS / ANY_GUEST_PORTS
> with REPRESENTED_PORT if they conflict? That needs future tuning.

And if you carry on with the "vf_orig" / "wire_orig" approach, you will
inevitably have the very same problem: a possible conflict with items
like REPRESENTED_PORT. So does it matter? Yes, checks need to be done
by PMDs when parsing patterns.

> Like I said before, offloaded rules can't cross different NIC vendors'
> "switch_domain_id". If the user probes multiple NICs in one
> application, the application should take care of packet forwarding.
> Also, the application should be aware of which ports belong to which
> NICs.

Yes, perhaps the domain ID is not needed in the new items.
But the application still must keep track of switch domain IDs itself,
so that it knows which rules to manage via which ethdevs.

Any other opinions?

>>>
>>>>>
>>>>>> For example, the diff below adds the attributes to "table"
>>>>>> commands in testpmd but does not add them to regular (non-table)
>>>>>> commands like "flow create". Why?
>>>>>>
>>>>> The "table" command limits the pattern_template to a single
>>>>> direction or to both directions, per the user-specified attribute.
>>>>
>>>> As I say above, the same effect can be achieved by adding item
>>>> REPRESENTED_PORT to the corresponding pattern template.
>>> See above.
>>>>
>>>>> The "rule" command must be tied to one "table_id", so the rule will
>>>>> inherit the "table" direction property; no need to specify it
>>>>> again.
>>>>
>>>> You might've misunderstood. I am not talking about the "rule"
>>>> command coupled with some "table". What I am talking about are the
>>>> regular, NON-async flow insertion commands.
>>>>
>>>> Please take a look at section "/* Validate/create attributes. */" in
>>>> file "app/test-pmd/cmdline_flow.c". When one adds a new flow
>>>> attribute, they should reflect it the same way as VC_INGRESS,
>>>> VC_TRANSFER, etc.
>>>>
>>>> That's it.
>>> We don't intend to pass this to the sync API. The above code example
>>> is for the sync API.
>>
>> So I understand. But there's one slight problem: in your patch, you
>> add the new attributes to the structure which is *shared* between the
>> sync and async use case scenarios. If one adds an attribute to this
>> structure, they have to provide accessors for it in all sync-related
>> commands in testpmd, but your patch does not do that.
>>
> As the title says, "creating the transfer table" is an ASYNC operation.
> We have limited the scope of this patch. The sync API will be another
> story. Maybe we can add one more sentence to emphasize the async API
> again.

No-no-no. There might be a slight misunderstanding. I understand that
you are limiting the scope of your patch by saying this and this.
That's OK.
What I'm trying to point out is the fact that your patch nevertheless
touches the COMMON part of the flow API which is shared between the two
approaches (sync and async).

Imagine a reader who does not know anything about the async approach.
He just opens the file in vim and goes directly to struct
rte_flow_attr. And, over there, he sees the new attribute "wire_orig".
He then immediately assumes that these attributes can be used in
testpmd. Now the reader opens testpmd and tries to insert a flow rule
using the sync approach:

	flow create priority 0 transfer vf_orig pattern / ... / end actions drop

Doing so will be a failure, because your patch does not add the new
attribute keyword to the sync flow rule syntax parser. That's it.

Once again, I should emphasize: the reader MAY know nothing about the
async approach. But if the attribute is present in "struct
rte_flow_attr", it immediately means that it is available everywhere,
both sync and async. So, with this in mind, your attempt to limit the
scope of the patch to async-only rules looks a little bit artificial.
It's not correct from the *formal* standpoint.

>
>> In other words, it is wrong to assume that "struct rte_flow_attr"
>> only applies to the async approach. It had been introduced long
>> before the async flow design was added to DPDK. That's it.
>>
>>>> But, as I say, I still believe that the new attributes aren't
>>>> needed.
>>> I think we are not on the same page for now. Can we reach agreement
>>> on the same matching criteria first?
>>>>>
>>>>>>> It helps to save underlayer memory also on insertion rate.
>>>>>>
>>>>>> Which memory? Host memory? NIC memory? The term "underlayer" is
>>>>>> vague. I suggest that the commit message be revised to first
>>>>>> explain how such memory is spent currently, then explain why this
>>>>>> is not optimal and, finally, which way the patch is supposed to
>>>>>> improve that. I.e. be more specific.
>>>>>>
>>>>>>>
>>>>> For large, scalable rule sets, HW (depending on the implementation) always
>>>> needs memory to hold the rules' patterns and actions, either from the NIC or from the
>> host.
>>>>> The memory footprint highly depends on the "user rules' complexity" and
>>>>> also differs
>>>> between NICs.
>>>>> ~50% memory saving is expected if one direction is cut.
>>>>
>>>> Regardless of this talk, this explanation should probably be present
>>>> in the commit description.
>>>>
>>> This number may differ between NICs or implementations. We can't say
>> it for sure.
>>
>> Not an exact number, of course, but a brief explanation of:
>> a) what is wrong / not optimal in the current design;
> Please check the commit log: transfer has the capability to match bidirectional traffic regardless of the ports involved.
>> b) how it is observed in customer deployments;
> Customers have the requirement to save resources, and their offloaded rules are direction-aware.
>> c) why the proposed patch is a good solution.
> The new attributes provide a way to remove one direction and save underlying resources.
> All of the above can be found in the commit log.

I understand all of that, but my point is that the existing commit message is way too brief. Yes, it mentions that SOME customers have SOME deployments, but it does not shed light on the specifics of these deployments. For example, back in the day, when items PORT_REPRESENTOR and REPRESENTED_PORT were added, the cover letter for that patch series provided details of the deployment specifics (application: OvS, scenario: full-offload rules). So it's always better to expand on such specifics so that the reader has the full picture in their head and doesn't need to look elsewhere. Not all readers of the commit message will be happy to delve into our discussions on the mailing list to get the gist.
>
>>
>
>>>>>
>>>>>>> By default, the transfer domain is bidirectional, and there are no behavior changes.
>>>>>>>
>>>>>>> 1.
Match wire origin only
>>>>>>> flow template_table 0 create group 0 priority 0 transfer wire_orig...
>>>>>>> 2. Match vf origin only
>>>>>>> flow template_table 0 create group 0 priority 0 transfer vf_orig...
>>>>>>>
>>>>>>> Signed-off-by: Rongwei Liu
>>>>>>> ---
>>>>>>> app/test-pmd/cmdline_flow.c | 26 +++++++++++++++++++++
>>>>>>> doc/guides/testpmd_app_ug/testpmd_funcs.rst | 3 ++-
>>>>>>> lib/ethdev/rte_flow.h | 9 ++++++-
>>>>>>> 3 files changed, 36 insertions(+), 2 deletions(-)
>>>>>>>
>>>>>>> diff --git a/app/test-pmd/cmdline_flow.c
>>>>>>> b/app/test-pmd/cmdline_flow.c index 7f50028eb7..b25b595e82 100644
>>>>>>> --- a/app/test-pmd/cmdline_flow.c
>>>>>>> +++ b/app/test-pmd/cmdline_flow.c
>>>>>>> @@ -177,6 +177,8 @@ enum index {
>>>>>>> TABLE_INGRESS,
>>>>>>> TABLE_EGRESS,
>>>>>>> TABLE_TRANSFER,
>>>>>>> + TABLE_TRANSFER_WIRE_ORIG,
>>>>>>> + TABLE_TRANSFER_VF_ORIG,
>>>>>>> TABLE_RULES_NUMBER,
>>>>>>> TABLE_PATTERN_TEMPLATE,
>>>>>>> TABLE_ACTIONS_TEMPLATE,
>>>>>>> @@ -1141,6 +1143,8 @@ static const enum index next_table_attr[] = {
>>>>>>> TABLE_INGRESS,
>>>>>>> TABLE_EGRESS,
>>>>>>> TABLE_TRANSFER,
>>>>>>> + TABLE_TRANSFER_WIRE_ORIG,
>>>>>>> + TABLE_TRANSFER_VF_ORIG,
>>>>>>> TABLE_RULES_NUMBER,
>>>>>>> TABLE_PATTERN_TEMPLATE,
>>>>>>> TABLE_ACTIONS_TEMPLATE,
>>>>>>> @@ -2881,6 +2885,18 @@ static const struct token token_list[] = {
>>>>>>> .next = NEXT(next_table_attr),
>>>>>>> .call = parse_table,
>>>>>>> },
>>>>>>> + [TABLE_TRANSFER_WIRE_ORIG] = {
>>>>>>> + .name = "wire_orig",
>>>>>>> + .help = "affect rule direction to transfer",
>>>>>>
>>>>>> This does not explain the "wire" aspect. It's too broad.
>>>>>>
>>>>>>> + .next = NEXT(next_table_attr),
>>>>>>> + .call = parse_table,
>>>>>>> + },
>>>>>>> + [TABLE_TRANSFER_VF_ORIG] = {
>>>>>>> + .name = "vf_orig",
>>>>>>> + .help = "affect rule direction to transfer",
>>>>>>
>>>>>> This explanation simply duplicates that of "wire_orig".
>>>>>> It does not explain the "vf" part. It should be more specific.
>>>>>>
>>>>>>> + .next = NEXT(next_table_attr),
>>>>>>> + .call = parse_table,
>>>>>>> + },
>>>>>>> [TABLE_RULES_NUMBER] = {
>>>>>>> .name = "rules_number",
>>>>>>> .help = "number of rules in table", @@ -8894,6
>>>>>>> +8910,16 @@ parse_table(struct context *ctx, const struct token
>>>>>>> +*token,
>>>>>>> case TABLE_TRANSFER:
>>>>>>> out->args.table.attr.flow_attr.transfer = 1;
>>>>>>> return len;
>>>>>>> + case TABLE_TRANSFER_WIRE_ORIG:
>>>>>>> + if (!out->args.table.attr.flow_attr.transfer)
>>>>>>> + return -1;
>>>>>>> + out->args.table.attr.flow_attr.transfer_mode = 1;
>>>>>>> + return len;
>>>>>>> + case TABLE_TRANSFER_VF_ORIG:
>>>>>>> + if (!out->args.table.attr.flow_attr.transfer)
>>>>>>> + return -1;
>>>>>>> + out->args.table.attr.flow_attr.transfer_mode = 2;
>>>>>>> + return len;
>>>>>>> default:
>>>>>>> return -1;
>>>>>>> }
>>>>>>> diff --git a/doc/guides/testpmd_app_ug/testpmd_funcs.rst
>>>>>>> b/doc/guides/testpmd_app_ug/testpmd_funcs.rst
>>>>>>> index 330e34427d..603b7988dd 100644
>>>>>>> --- a/doc/guides/testpmd_app_ug/testpmd_funcs.rst
>>>>>>> +++ b/doc/guides/testpmd_app_ug/testpmd_funcs.rst
>>>>>>> @@ -3332,7 +3332,8 @@ It is bound to
>>>>>> ``rte_flow_template_table_create()``::
>>>>>>>
>>>>>>> flow template_table {port_id} create
>>>>>>> [table_id {id}] [group {group_id}]
>>>>>>> - [priority {level}] [ingress] [egress] [transfer]
>>>>>>> + [priority {level}] [ingress] [egress]
>>>>>>> + [transfer [vf_orig] [wire_orig]]
>>>>>>
>>>>>> Is it correct? Shouldn't it rather be [transfer] [vf_orig]
>>>>>> [wire_orig]?
>>>>>>
>>>>>>> rules_number {number}
>>>>>>> pattern_template {pattern_template_id}
>>>>>>> actions_template {actions_template_id} diff --git
>>>>>>> a/lib/ethdev/rte_flow.h b/lib/ethdev/rte_flow.h index
>>>>>>> a79f1e7ef0..512b08d817 100644
>>>>>>> --- a/lib/ethdev/rte_flow.h
>>>>>>> +++ b/lib/ethdev/rte_flow.h
>>>>>>> @@ -130,7 +130,14 @@ struct rte_flow_attr {
>>>>>>> * through a suitable port. @see rte_flow_pick_transfer_proxy().
>>>>>>> */
>>>>>>> uint32_t transfer:1;
>>>>>>> - uint32_t reserved:29; /**< Reserved, must be zero. */
>>>>>>> + /**
>>>>>>> + * 0 means bidirection,
>>>>>>> + * 0x1 origin uplink,
>>>>>>
>>>>>> What does "uplink" mean? It's too vague. Hardly a good term.
>>
>> I believe this comment should be reworked, in case the idea of having an extra
>> attribute persists.
>>
>>>>>>
>>>>>>> + * 0x2 origin vport,
>>>>>>
>>>>>> What does "origin vport" mean? Hardly a good term as well.
>>
>> I still believe this explanation is way too brief and needs to be reworked to
>> provide more details, to define the use case for the attribute more specifically.
>>
>>>>>>
>>>>>>> + * N/A both set.
>>>>>>
>>>>>> What's this?
>>
>> The question stands.
>>
>>>>>>
>>>>>>> + */
>>>>>>> + uint32_t transfer_mode:2;
>>>>>>> + uint32_t reserved:27; /**< Reserved, must be zero. */
>>>>>>> };
>>>>>>>
>>>>>>> /**
>>>>>>> --
>>>>>>> 2.27.0
>>>>>>>
>>>>>>
>>>>>> Since the attributes are added to the generic 'struct rte_flow_attr',
>>>>>> non-table
>>>>>> (synchronous) flow rules are supposed to support them, too. If that
>>>>>> is indeed the case, then I'm afraid such a proposal does not agree
>>>>>> with the existing items PORT_REPRESENTOR and REPRESENTED_PORT.
>> They
>>>>>> do exactly the same thing, but they are designed to be way more
>>>>>> generic. Why
>>>> not use them?
>>>>
>>>> The question stands.
>>>>
>>>>>>
>>>>>> Ivan
>>>>>
>>>>
>>>> Ivan
>>>
>
Thank you.