DPDK patches and discussions
From: Ori Kam <orika@nvidia.com>
To: "Zhang, Qi Z" <qi.z.zhang@intel.com>, "dev@dpdk.org" <dev@dpdk.org>
Cc: "techboard@dpdk.org" <techboard@dpdk.org>,
	"Richardson, Bruce" <bruce.richardson@intel.com>,
	"Burakov, Anatoly" <anatoly.burakov@intel.com>,
	 "Wiles, Keith" <keith.wiles@intel.com>,
	"Liang, Cunming" <cunming.liang@intel.com>,
	"Wu, Jingjing" <jingjing.wu@intel.com>,
	"Zhang, Helin" <helin.zhang@intel.com>,
	"Mcnamara, John" <john.mcnamara@intel.com>,
	"Xu, Rosen" <rosen.xu@intel.com>
Subject: RE: seeking community input on adapting DPDK to P4Runtime backend
Date: Wed, 17 May 2023 15:18:38 +0000
Message-ID: <MW2PR12MB46661C7EDC20D4612B05E1ABD67E9@MW2PR12MB4666.namprd12.prod.outlook.com>
In-Reply-To: <DM4PR11MB599421E2DB486B0E972137F1D7719@DM4PR11MB5994.namprd11.prod.outlook.com>

Hi Zhang,

rte_flow is an excellent candidate for implementing P4.
We did some internal tests that show great promise in this regard.

I would be very happy to supply any needed information and to discuss
how to continue with this project.

Please see inline detailed answers.

Best,
Ori Kam




> -----Original Message-----
> From: Zhang, Qi Z <qi.z.zhang@intel.com>
> Sent: Monday, May 8, 2023 9:40 AM
> Subject: seeking community input on adapting DPDK to P4Runtime backend
> 
> Hi:
> 
> Our team is currently working on developing a DPDK PMD for a P4-programmed
> network controller, based on customer feedback to integrate DPDK into the
> P4Runtime backend [https://p4.org/p4-spec/p4runtime/main/P4Runtime-Spec.html].
> 
> (*) However, we are facing challenges in adapting DPDK's rte_flow API to the
> P4Runtime API, primarily due to the transition from a table-based API with
> fields of arbitrary bit width at arbitrary offsets to a protocol-based API (more
> details are given in the post-script).
> 
> We are seeking suggestions and best practices from the open-source
> community to help us with this integration. Specifically, we are interested in
> learning:
> 
> (*) If anyone has previously attempted to map rte_flow to P4-based devices.

We did try, successfully.

> (*) Thoughts on how to map from table-based matching to protocol-based
> matching like in rte_flow.

rte_flow is table-based (groups), and with the introduction of the template API
it is even more table-based (we added the concept of template tables), which is
exactly what P4 requires.

> (*) Any ideas on how to extend or expand the rte_flow APIs to better
> accommodate P4-based or other table-matching based devices.
> 

Let's discuss any issues you have.

> Your insights and feedback would be greatly appreciated!
> 
> ======================= Post-Script ============================
> 
> More details on the problem below, for anyone interested
> 
> In P4, flow offloading can be implemented using the P4Runtime API, which
> provides a standard interface for controlling and configuring the data plane
> behavior of network devices. P4Runtime allows network operators to
> dynamically add, modify, and remove flow rules in the hardware forwarding
> tables of P4-enabled devices.
> 
> The P4Runtime API is a table-based API: it assumes the packet-processing pipeline
> consists of one or more key/action units (tables). In P4Runtime, each
> table defines the fields to be matched and the actions to be taken on
> incoming packets. During compilation, the P4 compiler assigns a unique
> uint32 ID to each table, action, and field, which is associated with its
> corresponding string name. These IDs have no inherent relationship to any
> network protocol but instead serve as a means to identify different
> components of a P4 program within the P4Runtime API.
> 
This is the concept of tables and groups in rte_flow.
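A minimal sketch of that idea, assuming each P4 table is placed in its own
rte_flow group: the compiler-assigned P4 table ID is bound to a group number
once, when the program is loaded. The struct, the lookup function and the
group number 1 are hypothetical, not DPDK definitions; 8454144 is the
decap_vxlan_tcp_table ID from the compiler hint further down in this thread.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical binding of a compiler-assigned P4 table ID to an
 * rte_flow group; this is not a DPDK structure. */
struct p4_table_binding {
    uint32_t p4_table_id; /* ID assigned by the P4 compiler */
    const char *name;     /* P4 table name, for diagnostics */
    uint32_t flow_group;  /* rte_flow group holding the table's rules */
};

/* Filled once at program load; group 1 is an illustrative choice. */
static const struct p4_table_binding p4_tables[] = {
    { 8454144u, "decap_vxlan_tcp_table", 1 },
};

/* Return the binding for a P4 table ID, or NULL if the ID is unknown. */
const struct p4_table_binding *p4_table_lookup(uint32_t table_id)
{
    size_t n = sizeof(p4_tables) / sizeof(p4_tables[0]);
    for (size_t i = 0; i < n; i++)
        if (p4_tables[i].p4_table_id == table_id)
            return &p4_tables[i];
    return NULL;
}
```

With such a binding, a rule in one P4 table could reach the next table with
an rte_flow jump action into that table's group.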

> If we choose to use rte_flow as the low-level API for P4Runtime, a translation
> layer is needed in the application to map the P4 tables and actions to the
> corresponding rte_flow rules. However, this translation layer can be
> problematic as it is not easily scalable. When the P4 pipeline is refined or
> updated, the translation rules may also need to be updated, which can result
> in errors and reduced efficiency.
> 
I don't understand why.

> On the other hand, a hardware vendor that provides a P4-enabled device is
> required to implement an rte_flow interface in their DPDK PMD. Typically, the
> P4 compiler generates hints for the driver on how to map P4 tables to
> hardware resources, and how to convert table entry add/modify/delete
> actions into low-level hardware configurations. However, because rte_flow is
> protocol-based, it poses an additional challenge for driver developers, who
> must create another translation layer to convert rte_flow tokens into P4
> object identifiers. This translation layer must be carefully designed and
> implemented to ensure optimal performance and scalability, and to ensure
> that the driver can efficiently handle the dynamic nature of P4 programs.
> 
Right, but some of the translation can be done in code shared by all PMDs,
and the translation is static for a given compilation, so rule insertion can be
super fast with no extra work.

> To better understand the problem, let's consider the following example that
> demonstrates how to use the P4Runtime API to program a rule for processing
> a VXLAN packet. The rule matches a VXLAN packet, decapsulates the tunnel
> header, and forwards it to a specific port.
> 
> The P4 source code below describes the VXLAN decap table
> decap_vxlan_tcp_table, which matches the outer IP address, VNI, inner IP
> address, and inner TCP port. For each rule, four action specifications can be
> selected. We will focus on one action specification decap_vxlan_fwd that
> performs decapsulation and forwards the packet to a specific port.
> 
> table decap_vxlan_tcp_table {
>     key = {
>         hdrs.ipv4[meta.depth-1].src_ip: exact @name("tun_ip_src");
>         hdrs.ipv4[meta.depth-1].dst_ip: exact @name("tun_ip_dst");
>         hdrs.vxlan[meta.depth-1].vni  : exact @name("vni");
>         hdrs.ipv4[meta.depth].src_ip  : exact @name("ipv4_src");
>         hdrs.ipv4[meta.depth].dst_ip  : exact @name("ipv4_dst");
>         hdrs.tcp.sport                : exact @name("src_port");
>         hdrs.tcp.dport                : exact @name("dst_port");
>     }
>     actions = {
>         @tableonly decap_vxlan_fwd;
>         @tableonly decap_vxlan_dnat_fwd;
>         @tableonly decap_vxlan_snat_fwd;
>         @defaultonly set_exception;
>     }
> }
Translated to rte_flow:
pattern template (relaxed_matching = 1):
	pattern = outer ipv4 (src, dst) / vxlan (vni) / inner ipv4 (src, dst) / tcp (sport, dport)
map structure = {
	tun_ip_src = &pattern[outer ipv4].src
	....
}
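A map structure of this kind could look like the following C sketch. It
assumes simplified header structs standing in for the rte_flow item specs
(DPDK's real item structs differ) and resolves each P4 key name to a pattern
item plus a byte offset. tun_ip_src and ipv4_src share an offset but land in
different items, which is what separates the outer and inner headers. All
names here are illustrative.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical simplified specs, not DPDK's rte_flow_item_* structs. */
struct ipv4_spec  { uint8_t src[4]; uint8_t dst[4]; };
struct vxlan_spec { uint8_t vni[3]; };
struct tcp_spec   { uint8_t sport[2]; uint8_t dport[2]; };

/* Item order in the pattern template. */
enum { ITEM_OUTER_IPV4, ITEM_VXLAN, ITEM_INNER_IPV4, ITEM_TCP };

struct field_map {
    const char *p4_name; /* @name() from the P4 table key */
    int item;            /* index of the pattern item */
    size_t offset;       /* byte offset inside that item's spec */
    size_t width;        /* width in bytes */
};

static const struct field_map decap_vxlan_tcp_map[] = {
    { "tun_ip_src", ITEM_OUTER_IPV4, offsetof(struct ipv4_spec, src), 4 },
    { "tun_ip_dst", ITEM_OUTER_IPV4, offsetof(struct ipv4_spec, dst), 4 },
    { "vni",        ITEM_VXLAN,      offsetof(struct vxlan_spec, vni), 3 },
    { "ipv4_src",   ITEM_INNER_IPV4, offsetof(struct ipv4_spec, src), 4 },
    { "ipv4_dst",   ITEM_INNER_IPV4, offsetof(struct ipv4_spec, dst), 4 },
    { "src_port",   ITEM_TCP,        offsetof(struct tcp_spec, sport), 2 },
    { "dst_port",   ITEM_TCP,        offsetof(struct tcp_spec, dport), 2 },
};

/* Resolve a P4 key field name to its pattern location, or NULL. */
const struct field_map *map_lookup(const char *name)
{
    size_t n = sizeof(decap_vxlan_tcp_map) / sizeof(decap_vxlan_tcp_map[0]);
    for (size_t i = 0; i < n; i++)
        if (strcmp(decap_vxlan_tcp_map[i].p4_name, name) == 0)
            return &decap_vxlan_tcp_map[i];
    return NULL;
}
```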
> ...
> 
> action decap_vxlan_fwd(PortId_t port_id) {
>     meta.mod_action = (bit<11>)VXLAN_DECAP_OUTER_IPV4;
>     send_to_port(port_id);
> }
> 
Same as above, just with an actions template.
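A sketch of the actions side, assuming decap_vxlan_fwd is expressed as a
VXLAN decap followed by sending the packet to a port. DPDK has
RTE_FLOW_ACTION_TYPE_VXLAN_DECAP and port actions such as
RTE_FLOW_ACTION_TYPE_REPRESENTED_PORT; the enum and struct below are
hypothetical local stand-ins, not the DPDK definitions. The action list is
fixed by the template and only port_id changes per rule.

```c
#include <stdint.h>

/* Hypothetical stand-ins for rte_flow action types. */
enum act_type { ACT_END, ACT_VXLAN_DECAP, ACT_PORT };

struct act {
    enum act_type type;
    uint32_t port_id; /* only meaningful for ACT_PORT */
};

/* Fill `out` (3 slots) for one decap_vxlan_fwd rule: decap, forward to
 * port_id, then an END terminator. Returns the number of slots written. */
int decap_vxlan_fwd_actions(uint32_t port_id, struct act out[3])
{
    out[0] = (struct act){ ACT_VXLAN_DECAP, 0 };
    out[1] = (struct act){ ACT_PORT, port_id };
    out[2] = (struct act){ ACT_END, 0 };
    return 3;
}
```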

> Below is an example of the hint that the compiler will generate for the
> decap_vxlan_tcp_table:
> 
> Table ID:      8454144
> Name:          decap_vxlan_tcp_table
> Field ID       Name                          Match Type     Bit Width      Byte Width     Byte Order
> 1              tun_ip_src                    exact          32             4              network
> 2              tun_ip_dst                    exact          32             4              network
> 3              vni                           exact          24             3              network
> 4              ipv4_src                      exact          32             4              network
> 5              ipv4_dst                      exact          32             4              network
> 6              src_port                      exact          16             2              network
> 7              dst_port                      exact          16             2              network
> Spec ID        Name
> 8519716        decap_vxlan_fwd
> 8519718        decap_vxlan_dnat_fwd
> 8519720        decap_vxlan_snat_fwd
> 8519695        set_exception
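Because this hint is fixed for a given compilation, a driver could keep it
as a static array and validate incoming P4Runtime match values against it.
This is a sketch under that assumption; the struct and function names are
hypothetical, and a real implementation should follow the bytestring rules of
the P4Runtime specification rather than the strict width equality used here.

```c
#include <stddef.h>
#include <stdint.h>

enum byte_order { ORDER_NETWORK, ORDER_HOST };

/* One row of the compiler hint for a match field (hypothetical type). */
struct hint_field {
    uint32_t id;
    const char *name;
    uint16_t bit_width;
    uint16_t byte_width;
    enum byte_order order;
};

/* The decap_vxlan_tcp_table hint, transcribed as a static array. */
static const struct hint_field decap_vxlan_tcp_hint[] = {
    { 1, "tun_ip_src", 32, 4, ORDER_NETWORK },
    { 2, "tun_ip_dst", 32, 4, ORDER_NETWORK },
    { 3, "vni",        24, 3, ORDER_NETWORK },
    { 4, "ipv4_src",   32, 4, ORDER_NETWORK },
    { 5, "ipv4_dst",   32, 4, ORDER_NETWORK },
    { 6, "src_port",   16, 2, ORDER_NETWORK },
    { 7, "dst_port",   16, 2, ORDER_NETWORK },
};

/* Return 0 if an exact-match value has the byte width the hint expects,
 * -1 for a wrong width or an unknown field ID. */
int hint_check_exact(uint32_t field_id, size_t value_len)
{
    size_t n = sizeof(decap_vxlan_tcp_hint) / sizeof(decap_vxlan_tcp_hint[0]);
    for (size_t i = 0; i < n; i++)
        if (decap_vxlan_tcp_hint[i].id == field_id)
            return value_len == decap_vxlan_tcp_hint[i].byte_width ? 0 : -1;
    return -1;
}
```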
> 
> And the hint of action spec "decap_vxlan_fwd" as below:
> 
> Spec ID:       8519716
> Name:          decap_vxlan_fwd
> Field ID       Name                          Bit Width      Byte Width     Byte Order
> 1              port_id                       32             4              host
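port_id is marked as host byte order, so a driver can copy its 4 bytes
directly into a native integer. A minimal sketch with a hypothetical helper;
assuming a little-endian host, the bytes [5, 0, 0, 0] used in the example rule
decode to port 5.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Decode the 4-byte port_id parameter of decap_vxlan_fwd. The hint says
 * "host" byte order, so the bytes are copied as-is into a uint32_t.
 * Returns 0 on success, -1 if the length is not 4. */
int decode_port_id(const uint8_t *value, size_t len, uint32_t *port_id)
{
    if (len != sizeof(*port_id))
        return -1;
    memcpy(port_id, value, sizeof(*port_id));
    return 0;
}
```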
> 
> Please note that different compilers may assign different IDs.
> 
> Below is an example of how to program a rule using the P4Runtime API in
> JSON format. This rule matches fields and directs packets to port 5.
> 
> {
>     "type": 1,  //INSERT
>     "entity": {
>         "table_entry": {
>             "table_id": 8454144,
>             "match": [
>                 { "field_id": 1, "exact": { "value": [10, 0, 0, 1] } },   // outer src IP = 10.0.0.1
>                 { "field_id": 2, "exact": { "value": [10, 0, 0, 2] } },   // outer dst IP = 10.0.0.2
>                 { "field_id": 3, "exact": { "value": [0, 0, 10] } },      // vni = 10
>                 { "field_id": 4, "exact": { "value": [192, 0, 0, 1] } },  // inner src IP = 192.0.0.1
>                 { "field_id": 5, "exact": { "value": [192, 0, 0, 2] } },  // inner dst IP = 192.0.0.2
>                 { "field_id": 6, "exact": { "value": [0, 200] } },        // tcp src port = 200
>                 { "field_id": 7, "exact": { "value": [0, 201] } }         // tcp dst port = 201
>             ],
>             "action": {
>                 "action": {
>                     "action_id": 8519716,
>                     "params": [
>                         { "param_id": 1, "value": [5, 0, 0, 0] }
>                     ]
>                 }
>             },
>             ...
>         }
>     }    ...
> }
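The match values in this rule are network-order (big-endian) byte strings
whose widths come from the hint, including the 24-bit VNI. A sketch of a
hypothetical decoder for widths of 1 to 8 bytes: [0, 0, 10] decodes to VNI 10,
[0, 200] to port 200 and [10, 0, 0, 1] to 0x0a000001.

```c
#include <stddef.h>
#include <stdint.h>

/* Decode a big-endian byte string of 1..8 bytes into a uint64_t.
 * Returns 0 on success, -1 if the width is out of range. */
int decode_be(const uint8_t *value, size_t len, uint64_t *out)
{
    if (len == 0 || len > 8)
        return -1;
    uint64_t v = 0;
    for (size_t i = 0; i < len; i++)
        v = (v << 8) | value[i]; /* most significant byte first */
    *out = v;
    return 0;
}
```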
> 
> Please note that this is only a part of the full command. For more
> information, please refer to p4runtime.proto [2].
> 
> 1. https://p4.org/p4-spec/p4runtime/main/P4Runtime-Spec.html
> 2. https://github.com/p4lang/p4runtime/blob/main/proto/p4/v1/p4runtime.proto
> 
> Thank you for your attention to this matter.
> 

I think we should schedule a meeting to see how large the gaps
between rte_flow and P4 really are, and how we can improve rte_flow
to provide the best experience.

> Regards
> Qi

Thread overview: 9+ messages
2023-05-08  6:39 Zhang, Qi Z
2023-05-17 15:18 ` Ori Kam [this message]
2023-05-18 10:33   ` Zhang, Qi Z
2023-05-18 14:33     ` Ori Kam
2023-05-22  5:12       ` Zhang, Qi Z
2023-05-24 15:00         ` Jerin Jacob
2023-05-24 15:43           ` Thomas Monjalon
2023-05-18 14:45     ` Honnappa Nagarahalli
2023-05-22  4:58       ` Zhang, Qi Z
