* seeking community input on adapting DPDK to P4Runtime backend
@ 2023-05-08 6:39 Zhang, Qi Z
2023-05-17 15:18 ` Ori Kam
0 siblings, 1 reply; 9+ messages in thread
From: Zhang, Qi Z @ 2023-05-08 6:39 UTC (permalink / raw)
To: dev
Cc: techboard, Richardson, Bruce, Burakov, Anatoly, Wiles, Keith,
Liang, Cunming, Wu, Jingjing, Zhang, Helin, Mcnamara, John, Xu,
Rosen
Hi:
Our team is currently working on developing a DPDK PMD for a P4-programmed network controller, based on customer feedback to integrate DPDK into the P4Runtime backend [https://p4.org/p4-spec/p4runtime/main/P4Runtime-Spec.html].
(*) However, we are facing challenges in adapting DPDK's rte_flow API to the P4Runtime API, primarily due to the transition from a table-based API with fields of arbitrary bit width at arbitrary offsets to a protocol-based API (more details are given in the post-script).
We are seeking suggestions and best practices from the open-source community to help us with this integration. Specifically, we are interested in learning:
(*) If anyone has previously attempted to map rte_flow to P4-based devices.
(*) Thoughts on how to map from table-based matching to protocol-based matching like in rte_flow.
(*) Any ideas on how to extend or expand the rte_flow APIs to better accommodate P4-based or other table-matching based devices.
Your insights and feedback would be greatly appreciated!
======================= Post-Script ============================
More details on the problem below, for anyone interested
In P4, flow offloading can be implemented using the P4Runtime API, which provides a standard interface for controlling and configuring the data plane behavior of network devices. P4Runtime allows network operators to dynamically add, modify, and remove flow rules in the hardware forwarding tables of P4-enabled devices.
The P4Runtime API is a table-based API; it assumes the packet-processing pipeline consists of one or more key/action units (tables). In P4Runtime, each table defines the fields to be matched and the actions to be taken on incoming packets. During compilation, the P4 compiler assigns a unique uint32 ID to each table, action, and field, which is associated with its corresponding string name. These IDs have no inherent relationship to any network protocol but instead serve as a means to identify different components of a P4 program within the P4Runtime API.
If we choose to use rte_flow as the low-level API for P4Runtime, a translation layer is needed in the application to map the P4 tables and actions to the corresponding rte_flow rules. However, this translation layer can be problematic as it is not easily scalable. When the P4 pipeline is refined or updated, the translation rules may also need to be updated, which can result in errors and reduced efficiency.
On the other hand, a hardware vendor that provides a P4-enabled device is required to implement an rte_flow interface in their DPDK PMD. Typically, the P4 compiler generates hints for the driver on how to map P4 tables to hardware resources, and how to convert table entry add/modify/delete actions into low-level hardware configurations. However, because rte_flow is protocol-based, it poses an additional challenge for driver developers, who must create another translation layer to convert rte_flow tokens into P4 object identifiers. This translation layer must be carefully designed and implemented to ensure optimal performance and scalability, and to ensure that the driver can efficiently handle the dynamic nature of P4 programs.
To better understand the problem, let's consider the following example that demonstrates how to use the P4Runtime API to program a rule for processing a VXLAN packet. The rule matches a VXLAN packet, decapsulates the tunnel header, and forwards it to a specific port.
The P4 source code below describes the VXLAN decap table decap_vxlan_tcp_table, which matches the outer IP address, VNI, inner IP address, and inner TCP port. For each rule, four action specifications can be selected. We will focus on one action specification decap_vxlan_fwd that performs decapsulation and forwards the packet to a specific port.
table decap_vxlan_tcp_table {
    key = {
        hdrs.ipv4[meta.depth-1].src_ip: exact @name("tun_ip_src");
        hdrs.ipv4[meta.depth-1].dst_ip: exact @name("tun_ip_dst");
        hdrs.vxlan[meta.depth-1].vni  : exact @name("vni");
        hdrs.ipv4[meta.depth].src_ip  : exact @name("ipv4_src");
        hdrs.ipv4[meta.depth].dst_ip  : exact @name("ipv4_dst");
        hdrs.tcp.sport                : exact @name("src_port");
        hdrs.tcp.dport                : exact @name("dst_port");
    }
    actions = {
        @tableonly decap_vxlan_fwd;
        @tableonly decap_vxlan_dnat_fwd;
        @tableonly decap_vxlan_snat_fwd;
        @defaultonly set_exception;
    }
}
...
action decap_vxlan_fwd(PortId_t port_id) {
    meta.mod_action = (bit<11>)VXLAN_DECAP_OUTER_IPV4;
    send_to_port(port_id);
}
Below is an example of the hint that the compiler will generate for the decap_vxlan_tcp_table:
Table ID: 8454144
Name: decap_vxlan_tcp_table
Field ID  Name        Match Type  Bit Width  Byte Width  Byte Order
1         tun_ip_src  exact       32         4           network
2         tun_ip_dst  exact       32         4           network
3         vni         exact       24         3           network
4         ipv4_src    exact       32         4           network
5         ipv4_dst    exact       32         4           network
6         src_port    exact       16         2           network
7         dst_port    exact       16         2           network
Spec ID   Name
8519716   decap_vxlan_fwd
8519718   decap_vxlan_dnat_fwd
8519720   decap_vxlan_snat_fwd
8519695   set_exception
The hint for the action spec "decap_vxlan_fwd" is as follows:
Spec ID: 8519716
Name: decap_vxlan_fwd
Field ID  Name     Bit Width  Byte Width  Byte Order
1         port_id  32         4           host
Please note that different compilers may assign different IDs.
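Because the IDs are compiler-assigned, an application would typically resolve them by name once, rather than hard-code them. A minimal sketch in C, assuming the hint above has already been loaded into a static array. The struct `p4_name_id` and the function `p4_id_lookup` are hypothetical names used for illustration only; they are not part of DPDK or P4Runtime.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical in-memory form of one name/ID row of the compiler hint. */
struct p4_name_id {
    const char *name;
    uint32_t id;
};

/* Action specs of decap_vxlan_tcp_table, copied from the hint above. */
static const struct p4_name_id decap_vxlan_tcp_actions[] = {
    { "decap_vxlan_fwd",      8519716 },
    { "decap_vxlan_dnat_fwd", 8519718 },
    { "decap_vxlan_snat_fwd", 8519720 },
    { "set_exception",        8519695 },
};

/* Return 0 and store the ID in *id if name is found, -1 otherwise. */
static int
p4_id_lookup(const struct p4_name_id *tbl, size_t n, const char *name,
             uint32_t *id)
{
    for (size_t i = 0; i < n; i++) {
        if (strcmp(tbl[i].name, name) == 0) {
            *id = tbl[i].id;
            return 0;
        }
    }
    return -1;
}
```

With a different compiler only the array contents would change; callers that look up "decap_vxlan_fwd" by name stay the same.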
Below is an example of how to program a rule using the P4Runtime API in JSON format. This rule matches the fields listed and directs packets to port 5.
{
  "type": 1, //INSERT
  "entity": {
    "table_entry": {
      "table_id": 8454144,
      "match": [
        { "field_id": 1, "exact": { "value": [10, 0, 0, 1] } },  // outer src IP = 10.0.0.1
        { "field_id": 2, "exact": { "value": [10, 0, 0, 2] } },  // outer dst IP = 10.0.0.2
        { "field_id": 3, "exact": { "value": [0, 0, 10] } },     // vni = 10
        { "field_id": 4, "exact": { "value": [192, 0, 0, 1] } }, // inner src IP = 192.0.0.1
        { "field_id": 5, "exact": { "value": [192, 0, 0, 2] } }, // inner dst IP = 192.0.0.2
        { "field_id": 6, "exact": { "value": [0, 200] } },       // tcp src port = 200
        { "field_id": 7, "exact": { "value": [0, 201] } },       // tcp dst port = 201
      ],
      "action": {
        "action": {
          "action_id": 8519716,
          "params": [
            { "param_id": 1, "value": [5, 0, 0, 0] }
          ]
        }
      },
      ...
    }
  } ...
}
Please note that this is only a part of the full command. For more information, please refer to p4runtime.proto [2].
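The match values in the example are byte strings whose width and byte order come from the hint: vni = 10 becomes [0, 0, 10] because the field is 3 bytes wide in network order, and TCP port 200 becomes [0, 200]. A minimal sketch of that encoding in C follows. The helper name `p4_encode_be` is hypothetical and is not a DPDK or P4Runtime call. The action parameter port_id is marked "host" in the hint, which would explain why it appears as [5, 0, 0, 0], assuming a little-endian host.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Write the low `width` bytes of `v` into `out`, most significant byte
 * first (network byte order). `width` must be at most 8.
 * Hypothetical helper for illustration only.
 */
static void
p4_encode_be(uint64_t v, uint8_t *out, size_t width)
{
    for (size_t i = 0; i < width; i++)
        out[width - 1 - i] = (uint8_t)(v >> (8 * i));
}
```

For example, calling `p4_encode_be(10, buf, 3)` fills `buf` with the three bytes used for field 3 (vni) in the rule above.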
1. https://p4.org/p4-spec/p4runtime/main/P4Runtime-Spec.html
2. https://github.com/p4lang/p4runtime/blob/main/proto/p4/v1/p4runtime.proto
Thank you for your attention to this matter.
Regards
Qi
* RE: seeking community input on adapting DPDK to P4Runtime backend
2023-05-08 6:39 seeking community input on adapting DPDK to P4Runtime backend Zhang, Qi Z
@ 2023-05-17 15:18 ` Ori Kam
2023-05-18 10:33 ` Zhang, Qi Z
0 siblings, 1 reply; 9+ messages in thread
From: Ori Kam @ 2023-05-17 15:18 UTC (permalink / raw)
To: Zhang, Qi Z, dev
Cc: techboard, Richardson, Bruce, Burakov, Anatoly, Wiles, Keith,
Liang, Cunming, Wu, Jingjing, Zhang, Helin, Mcnamara, John, Xu,
Rosen
Hi Zhang,
rte_flow is an excellent candidate for implementing P4.
We ran some internal tests that showed great promise in this regard.
I would be very happy to supply any needed information and have a
discussion on how to continue with this project.
Please see inline detailed answers.
Best,
Ori Kam
> -----Original Message-----
> From: Zhang, Qi Z <qi.z.zhang@intel.com>
> Sent: Monday, May 8, 2023 9:40 AM
> Subject: seeking community input on adapting DPDK to P4Runtime backend
>
> Hi:
>
> Our team is currently working on developing a DPDK PMD for a
> P4-programmed network controller, based on customer feedback to integrate
> DPDK into the P4Runtime backend
> [https://p4.org/p4-spec/p4runtime/main/P4Runtime-Spec.html].
>
> (*) However, we are facing challenges in adapting DPDK's rte_flow API to the
> P4Runtime API, primarily due to the transition from a table-based API with
> fields of arbitrary bits width at arbitrary offset to a protocol-based API (more
> detail be described in post-script).
>
> We are seeking suggestions and best practices from the open-source
> community to help us with this integration. Specifically, we are interested in
> learning:
>
> (*) If anyone has previously attempted to map rte_flow to P4-based devices.
We did try successfully.
> (*) Thoughts on how to map from table-based matching to protocol-based
> matching like in rte_flow.
rte_flow is table-based (groups). With the introduction of the template API,
rte_flow is even more table-based (we added the concept of tables), which is just what
P4 requires.
> (*) Any ideas on how to extend or expand the rte_flow APIs to better
> accommodate P4-based or other table-matching based devices.
>
Let's discuss any issues you have.
> Your insights and feedback would be greatly appreciated!
>
> ======================= Post-Script ============================
>
> More details on the problem below, for anyone interested
>
> In P4, flow offloading can be implemented using the P4Runtime API, which
> provides a standard interface for controlling and configuring the data plane
> behavior of network devices. P4Runtime allows network operators to
> dynamically add, modify, and remove flow rules in the hardware forwarding
> tables of P4-enabled devices.
>
> The P4Runtime API is a table-based API, it assume the packet process pipeline
> was consists of one or more key/action units (tables). In P4Runtime, each
> table defines the fields to be matched and the actions to be taken on
> incoming packets. During compilation, the P4 compiler assigns a unique
> uint32 ID to each table, action, and field, which is associated with its
> corresponding string name. These IDs have no inherent relationship to any
> network protocol but instead serve as a means to identify different
> components of a P4 program within the P4Runtime API.
>
This is the concept of tables and groups in rte_flow.
> If we choose to use rte_flow as the low-level API for P4Runtime, a translation
> layer is needed in the application to map the P4 tables and actions to the
> corresponding rte_flow rules. However, this translation layer can be
> problematic as it is not easily scalable. When the P4 pipeline is refined or
> updated, the translation rules may also need to be updated, which can result
> in errors and reduced efficiency.
>
I don't understand why.
> On the other hand, a hardware vendor that provides a P4-enabled device is
> required to implement an rte_flow interface in their DPDK PMD. Typically, the
> P4 compiler generates hints for the driver on how to map P4 tables to
> hardware resources, and how to convert table entry add/modify/delete
> actions into low-level hardware configurations. However, because rte_flow is
> protocol-based, it poses an additional challenge for driver developers, who
> must create another translation layer to convert rte_flow tokens into P4
> object identifiers. This translation layer must be carefully designed and
> implemented to ensure optimal performance and scalability, and to ensure
> that the driver can efficiently handle the dynamic nature of P4 programs.
>
Right, but some of the translation can be done in code shared by all PMDs,
and the translation is static for a given compilation, so inserting rules can be super fast
with no need for extra work.
> To better understand the problem, let's consider the following example that
> demonstrates how to use the P4Runtime API to program a rule for processing
> a VXLAN packet. The rule matches a VXLAN packet, decapsulates the tunnel
> header, and forwards it to a specific port.
>
> The P4 source code below describes the VXLAN decap table
> decap_vxlan_tcp_table, which matches the outer IP address, VNI, inner IP
> address, and inner TCP port. For each rule, four action specifications can be
> selected. We will focus on one action specification decap_vxlan_fwd that
> performs decapsulation and forwards the packet to a specific port.
>
> table decap_vxlan_tcp_table {
> key = {
> hdrs.ipv4[meta.depth-1].src_ip: exact @name("tun_ip_src");
> hdrs.ipv4[meta.depth-1].dst_ip: exact @name("tun_ip_dst");
> hdrs.vxlan[meta.depth-1].vni : exact @name("vni");
> hdrs.ipv4[meta.depth].src_ip : exact @name("ipv4_src");
> hdrs.ipv4[meta.depth].dst_ip : exact @name("ipv4_dst");
> hdrs.tcp.sport : exact @name("src_port");
> hdrs.tcp.dport : exact @name("dst_port");
> }
> actions = {
> @tableonly decap_vxlan_fwd;
> @tableonly decap_vxlan_dnat_fwd;
> @tableonly decap_vxlan_snat_fwd;
> @defaultonly set_exception;
> }
> }
Translate to rte_flow:
template pattern: relaxed_mode = 1, pattern = ipv4_src / ipv4_dst / vni / ipv4_src / ipv4_dst / tcp_sport / tcp_dport
map structure = {
    tun_ip_src = &pattern[ipv4_src]
    ....
}
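One possible reading of such a map structure, sketched in C: a table built once per compiled P4 program records where each P4 key field lives, so that setting a field at rule-insertion time is a lookup plus a copy. This is only an illustration under simplifying assumptions. It uses a flat key buffer instead of real rte_flow item specs, and the names `p4_field_map` and `p4_key_set` are hypothetical. Field IDs and byte widths are taken from the compiler hint quoted below.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical map entry: where one P4 key field lives in a flat key. */
struct p4_field_map {
    uint32_t field_id; /* P4 field ID from the compiler hint */
    size_t offset;     /* byte offset inside the key buffer */
    size_t width;      /* byte width from the compiler hint */
};

/* Built once per compiled P4 program; order and widths follow the hint. */
static const struct p4_field_map decap_vxlan_tcp_map[] = {
    { 1,  0, 4 }, /* tun_ip_src */
    { 2,  4, 4 }, /* tun_ip_dst */
    { 3,  8, 3 }, /* vni */
    { 4, 11, 4 }, /* ipv4_src */
    { 5, 15, 4 }, /* ipv4_dst */
    { 6, 19, 2 }, /* src_port */
    { 7, 21, 2 }, /* dst_port */
};

#define DECAP_VXLAN_TCP_KEY_LEN 23 /* sum of the byte widths above */

/* Copy one field into the key; return 0 on success, -1 on bad ID or size. */
static int
p4_key_set(uint8_t key[DECAP_VXLAN_TCP_KEY_LEN], uint32_t field_id,
           const uint8_t *val, size_t len)
{
    size_t n = sizeof(decap_vxlan_tcp_map) / sizeof(decap_vxlan_tcp_map[0]);

    for (size_t i = 0; i < n; i++) {
        const struct p4_field_map *m = &decap_vxlan_tcp_map[i];

        if (m->field_id != field_id)
            continue;
        if (len != m->width)
            return -1;
        memcpy(key + m->offset, val, len);
        return 0;
    }
    return -1;
}
```

Because the map does not change after compilation, the per-rule cost is limited to the lookup and the copy.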
> ...
>
> action decap_vxlan_fwd(PortId_t port_id) {
> meta.mod_action = (bit<11>)VXLAN_DECAP_OUTER_IPV4;
> send_to_port(port_id);
> }
>
Same as above, just with an action template.
> Below is an example of the hint that the compiler will generate for the
> decap_vxlan_tcp_table:
>
> Table ID: 8454144
> Name: decap_vxlan_tcp_table
> Field ID  Name        Match Type  Bit Width  Byte Width  Byte Order
> 1         tun_ip_src  exact       32         4           network
> 2         tun_ip_dst  exact       32         4           network
> 3         vni         exact       24         3           network
> 4         ipv4_src    exact       32         4           network
> 5         ipv4_dst    exact       32         4           network
> 6         src_port    exact       16         2           network
> 7         dst_port    exact       16         2           network
> Spec ID   Name
> 8519716   decap_vxlan_fwd
> 8519718   decap_vxlan_dnat_fwd
> 8519720   decap_vxlan_snat_fwd
> 8519695   set_exception
>
> And the hint of action spec "decap_vxlan_fwd" as below:
>
> Spec ID: 8519716
> Name: decap_vxlan_fwd
> Field ID Name Bit Width Byte Width Byte Order
> 1 port_id 32 4 host
>
> Please note that different compilers may assign different IDs.
>
> Below is an example of how to program a rule using the P4 runtime API in
> JSON format. This rule matches fields and directs packets to port 5.
>
> {
>   "type": 1, //INSERT
>   "entity": {
>     "table_entry": {
>       "table_id": 8454144,
>       "match": [
>         { "field_id": 1, "exact": { "value": [10, 0, 0, 1] } },  // outer src IP = 10.0.0.1
>         { "field_id": 2, "exact": { "value": [10, 0, 0, 2] } },  // outer dst IP = 10.0.0.2
>         { "field_id": 3, "exact": { "value": [0, 0, 10] } },     // vni = 10
>         { "field_id": 4, "exact": { "value": [192, 0, 0, 1] } }, // inner src IP = 192.0.0.1
>         { "field_id": 5, "exact": { "value": [192, 0, 0, 2] } }, // inner dst IP = 192.0.0.2
>         { "field_id": 6, "exact": { "value": [0, 200] } },       // tcp src port = 200
>         { "field_id": 7, "exact": { "value": [0, 201] } },       // tcp dst port = 201
>       ],
>       "action": {
>         "action": {
>           "action_id": 8519716,
>           "params": [
>             { "param_id": 1, "value": [5, 0, 0, 0] }
>           ]
>         }
>       },
>       ...
>     }
>   } ...
> }
>
> Please note that this is only a part of the full command. For more
> information, please refer to the p4runtime.proto[2]
>
> 1. https://p4.org/p4-spec/p4runtime/main/P4Runtime-Spec.html
> 2. https://github.com/p4lang/p4runtime/blob/main/proto/p4/v1/p4runtime.proto
>
> Thank you for your attention to this matter.
>
I think we should schedule a meeting to see
how many gaps we really have between rte_flow and
P4, and how we can improve rte_flow to allow the best
experience.
> Regards
> Qi
* RE: seeking community input on adapting DPDK to P4Runtime backend
2023-05-17 15:18 ` Ori Kam
@ 2023-05-18 10:33 ` Zhang, Qi Z
2023-05-18 14:33 ` Ori Kam
2023-05-18 14:45 ` Honnappa Nagarahalli
0 siblings, 2 replies; 9+ messages in thread
From: Zhang, Qi Z @ 2023-05-18 10:33 UTC (permalink / raw)
To: Ori Kam, dev
Cc: techboard, Richardson, Bruce, Burakov, Anatoly, Wiles, Keith,
Liang, Cunming, Wu, Jingjing, Zhang, Helin, Mcnamara, John, Xu,
Rosen
> -----Original Message-----
> From: Ori Kam <orika@nvidia.com>
> Sent: Wednesday, May 17, 2023 11:19 PM
> To: Zhang, Qi Z <qi.z.zhang@intel.com>; dev@dpdk.org
> Cc: techboard@dpdk.org; Richardson, Bruce <bruce.richardson@intel.com>;
> Burakov, Anatoly <anatoly.burakov@intel.com>; Wiles, Keith
> <keith.wiles@intel.com>; Liang, Cunming <cunming.liang@intel.com>; Wu,
> Jingjing <jingjing.wu@intel.com>; Zhang, Helin <helin.zhang@intel.com>;
> Mcnamara, John <john.mcnamara@intel.com>; Xu, Rosen
> <rosen.xu@intel.com>
> Subject: RE: seeking community input on adapting DPDK to P4Runtime
> backend
>
> Hi Zhang,
>
> rte_flow is an excellent candidate for implementing P4.
> We and some internal tests that shows great promise in this regard.
>
> I would be very happy to supply any needed information and have
> discussion on how to continue with this project.
Thank you Ori! Please check my following comments
Regards
Qi
>
> Please see inline detailed answers.
>
> Best,
> Ori Kam
>
>
>
>
> > -----Original Message-----
> > From: Zhang, Qi Z <qi.z.zhang@intel.com>
> > Sent: Monday, May 8, 2023 9:40 AM
> > Subject: seeking community input on adapting DPDK to P4Runtime
> backend
> >
> > Hi:
> >
> > Our team is currently working on developing a DPDK PMD for a
> > P4-programmed network controller, based on customer feedback to integrate
> > DPDK into the P4Runtime backend
> > [https://p4.org/p4-spec/p4runtime/main/P4Runtime-Spec.html].
> >
> > (*) However, we are facing challenges in adapting DPDK's rte_flow API
> > to the P4Runtime API, primarily due to the transition from a
> > table-based API with fields of arbitrary bits width at arbitrary
> > offset to a protocol-based API (more detail be described in post-script).
> >
> > We are seeking suggestions and best practices from the open-source
> > community to help us with this integration. Specifically, we are
> > interested in
> > learning:
> >
> > (*) If anyone has previously attempted to map rte_flow to P4-based
> devices.
>
> We did try successfully.
>
> > (*) Thoughts on how to map from table-based matching to protocol-based
> > matching like in rte_flow.
>
> Rte_flow is table based (groups), now with the introduction of template API
> rte_flow is even more table based (we added the concept of tables) which
> are just what
> p4 requires.
Yes, the rte_flow template can be used to map a sequence of patterns to a P4 table and a sequence of actions to a P4 action. However, using a fixed rte_flow template can be problematic when handling different P4 programs in the same driver. To provide more flexibility, the mapping of patterns and actions can be externalized into a configuration file, or into a part of the firmware that the driver can learn it from, allowing for customization based on the specific requirements of each P4 pipeline. In fact, we have already enabled this approach in order to accommodate different P4 programs.
However, an alternative approach to consider is whether it would be feasible to directly expose the P4 table and action names or IDs to the application, rather than relying on rte_flow templates. This approach offers several potential benefits:
Integration with P4Runtime Backend: By exposing the P4 table and action names or IDs directly, DPDK could be easily integrated as a P4Runtime backend. This eliminates the need for translation from the P4Runtime API to rte_flow templates in the application, simplifying the integration process.
Elimination of Manual Mapping: Exposing the P4 table and action names or IDs to the application would remove the requirement for the engineering team to manually map each pipeline to specific rte_flow templates. This is particularly beneficial in cases where hardware vendors provide customers with a toolchain to create their own P4 pipelines but do not necessarily own the P4 programs. By eliminating the dependency on rte_flow templates, this approach reduces complexity in using DPDK as the driver.
To be more specific, the proposed API for exposing P4 table and action names or IDs directly to the application could be as follows:
/* Get the table info */
struct rte_p4_table_info tbl_info;
rte_p4_table_info_get_by_name(port_id, "decap_vxlan_tcp_table", &tbl_info);

/* Create the key (tbl_info is a struct, not a pointer, so use '.') */
struct rte_p4_table_key *key;
rte_p4_table_key_create(port_id, tbl_info.id, &key);

/* Set the key fields; the last argument is the field size in bytes */
rte_p4_table_key_field_set_by_name(port_id, key, "wire_port", &wire_port, 2);
rte_p4_table_key_field_set_by_name(port_id, key, "tun_ip_src", &tun_ip_src, 4);
rte_p4_table_key_field_set_by_name(port_id, key, "tun_ip_dst", &tun_ip_dst, 4);
rte_p4_table_key_field_set_by_name(port_id, key, "vni", &vni, 3);
rte_p4_table_key_field_set_by_name(port_id, key, "ipv4_src", &ipv4_src, 4);
rte_p4_table_key_field_set_by_name(port_id, key, "ipv4_dst", &ipv4_dst, 4);
rte_p4_table_key_field_set_by_name(port_id, key, "src_port", &src_port, 2);
rte_p4_table_key_field_set_by_name(port_id, key, "dst_port", &dst_port, 2);

/* Get the action spec info */
struct rte_p4_action_spec_info as_info;
rte_p4_action_spec_info_get_by_name(port_id, "decap_vxlan_fwd", &as_info);

/* Create the action */
struct rte_p4_action *action;
rte_p4_action_create(port_id, as_info.id, &action);

/* Set the action fields */
rte_p4_table_action_field_set_by_name(port_id, action, "mod_id", &mod_id, 3);
rte_p4_table_action_field_set_by_name(port_id, action, "port_id", &target_port_id, 2);

/* Add the entry */
rte_p4_table_entry_add(port_id, tbl_info.id, key, action);
...
>
> > (*) Any ideas on how to extend or expand the rte_flow APIs to better
> > accommodate P4-based or other table-matching based devices.
> >
>
> Lets discuss any issue you have.
>
> > Your insights and feedback would be greatly appreciated!
> >
> > ======================= Post-Script ============================
> >
> > More details on the problem below, for anyone interested
> >
> > In P4, flow offloading can be implemented using the P4Runtime API,
> > which provides a standard interface for controlling and configuring
> > the data plane behavior of network devices. P4Runtime allows network
> > operators to dynamically add, modify, and remove flow rules in the
> > hardware forwarding tables of P4-enabled devices.
> >
> > The P4Runtime API is a table-based API, it assume the packet process
> > pipeline was consists of one or more key/action units (tables). In
> > P4Runtime, each table defines the fields to be matched and the actions
> > to be taken on incoming packets. During compilation, the P4 compiler
> > assigns a unique
> > uint32 ID to each table, action, and field, which is associated with
> > its corresponding string name. These IDs have no inherent relationship
> > to any network protocol but instead serve as a means to identify
> > different components of a P4 program within the P4Runtime API.
> >
> This is the concept of tables and groups in rte_flow.
>
> > If we choose to use rte_flow as the low-level API for P4Runtime, a
> > translation layer is needed in the application to map the P4 tables
> > and actions to the corresponding rte_flow rules. However, this
> > translation layer can be problematic as it is not easily scalable.
> > When the P4 pipeline is refined or updated, the translation rules may
> > also need to be updated, which can result in errors and reduced efficiency.
> >
> I don't understand why.
>
> > On the other hand, a hardware vendor that provides a P4-enabled device
> > is required to implement an rte_flow interface in their DPDK PMD.
> > Typically, the
> > P4 compiler generates hints for the driver on how to map P4 tables to
> > hardware resources, and how to convert table entry add/modify/delete
> > actions into low-level hardware configurations. However, because
> > rte_flow is protocol-based, it poses an additional challenge for
> > driver developers, who must create another translation layer to
> > convert rte_flow tokens into P4 object identifiers. This translation
> > layer must be carefully designed and implemented to ensure optimal
> > performance and scalability, and to ensure that the driver can efficiently
> handle the dynamic nature of P4 programs.
> >
> Right, but some of the translation can be done in shared code by all PMDs
> and the translation is static for the compilation so inserting rules can be
> supper fast with no need for extra work.
>
> > To better understand the problem, let's consider the following example
> > that demonstrates how to use the P4Runtime API to program a rule for
> > processing a VXLAN packet. The rule matches a VXLAN packet,
> > decapsulates the tunnel header, and forwards it to a specific port.
> >
> > The P4 source code below describes the VXLAN decap table
> > decap_vxlan_tcp_table, which matches the outer IP address, VNI, inner
> > IP address, and inner TCP port. For each rule, four action
> > specifications can be selected. We will focus on one action
> > specification decap_vxlan_fwd that performs decapsulation and forwards
> the packet to a specific port.
> >
> > table decap_vxlan_tcp_table {
> > key = {
> > hdrs.ipv4[meta.depth-1].src_ip: exact @name("tun_ip_src");
> > hdrs.ipv4[meta.depth-1].dst_ip: exact @name("tun_ip_dst");
> > hdrs.vxlan[meta.depth-1].vni : exact @name("vni");
> > hdrs.ipv4[meta.depth].src_ip : exact @name("ipv4_src");
> > hdrs.ipv4[meta.depth].dst_ip : exact @name("ipv4_dst");
> > hdrs.tcp.sport : exact @name("src_port");
> > hdrs.tcp.dport : exact @name("dst_port");
> > }
> > actions = {
> > @tableonly decap_vxlan_fwd;
> > @tableonly decap_vxlan_dnat_fwd;
> > @tableonly decap_vxlan_snat_fwd;
> > @defaultonly set_exception;
> > }
> > }
> Translate to rte_flow:
> template pattern relaxed_mode = 1 pattern = Ipv4_src / ipv4_dst / vni /
> ipv4_src / ipv4_dst / tcp_sport / tcp_dport
> map structure = {
> tun_ip_src = &pattern[ipv4_src]
> ....
> }
> > ...
> >
> > action decap_vxlan_fwd(PortId_t port_id) {
> > meta.mod_action = (bit<11>)VXLAN_DECAP_OUTER_IPV4;
> > send_to_port(port_id);
> > }
> >
> Same as above just with action template
>
> > Below is an example of the hint that the compiler will generate for
> > the
> > decap_vxlan_tcp_table:
> >
> > Table ID: 8454144
> > Name: decap_vxlan_tcp_table
> > Field ID  Name        Match Type  Bit Width  Byte Width  Byte Order
> > 1         tun_ip_src  exact       32         4           network
> > 2         tun_ip_dst  exact       32         4           network
> > 3         vni         exact       24         3           network
> > 4         ipv4_src    exact       32         4           network
> > 5         ipv4_dst    exact       32         4           network
> > 6         src_port    exact       16         2           network
> > 7         dst_port    exact       16         2           network
> > Spec ID   Name
> > 8519716   decap_vxlan_fwd
> > 8519718   decap_vxlan_dnat_fwd
> > 8519720   decap_vxlan_snat_fwd
> > 8519695   set_exception
> >
> > And the hint of action spec "decap_vxlan_fwd" as below:
> >
> > Spec ID: 8519716
> > Name: decap_vxlan_fwd
> > Field ID  Name     Bit Width  Byte Width  Byte Order
> > 1         port_id  32         4           host
> >
> > Please note that different compilers may assign different IDs.
> >
> > Below is an example of how to program a rule using the P4 runtime API
> > in JSON format. This rule matches fields and directs packets to port 5.
> >
> > {
> >   "type": 1, //INSERT
> >   "entity": {
> >     "table_entry": {
> >       "table_id": 8454144,
> >       "match": [
> >         { "field_id": 1, "exact": { "value": [10, 0, 0, 1] } },  // outer src IP = 10.0.0.1
> >         { "field_id": 2, "exact": { "value": [10, 0, 0, 2] } },  // outer dst IP = 10.0.0.2
> >         { "field_id": 3, "exact": { "value": [0, 0, 10] } },     // vni = 10
> >         { "field_id": 4, "exact": { "value": [192, 0, 0, 1] } }, // inner src IP = 192.0.0.1
> >         { "field_id": 5, "exact": { "value": [192, 0, 0, 2] } }, // inner dst IP = 192.0.0.2
> >         { "field_id": 6, "exact": { "value": [0, 200] } },       // tcp src port = 200
> >         { "field_id": 7, "exact": { "value": [0, 201] } },       // tcp dst port = 201
> >       ],
> >       "action": {
> >         "action": {
> >           "action_id": 8519716,
> >           "params": [
> >             { "param_id": 1, "value": [5, 0, 0, 0] }
> >           ]
> >         }
> >       },
> >       ...
> >     }
> >   } ...
> > }
> >
> > Please note that this is only a part of the full command. For more
> > information, please refer to the p4runtime.proto[2]
> >
> > 1. https://p4.org/p4-spec/p4runtime/main/P4Runtime-Spec.html
> > 2. https://github.com/p4lang/p4runtime/blob/main/proto/p4/v1/p4runtime.proto
> >
> > Thank you for your attention to this matter.
> >
>
> I think that we should schedule some meeting to see how much gaps we
> really have between the rte_flow and
> P4 and how we can improve the rte_flow to allow the best experience.
Sounds like a good idea!
>
> > Regards
> > Qi
* RE: seeking community input on adapting DPDK to P4Runtime backend
2023-05-18 10:33 ` Zhang, Qi Z
@ 2023-05-18 14:33 ` Ori Kam
2023-05-22 5:12 ` Zhang, Qi Z
2023-05-18 14:45 ` Honnappa Nagarahalli
1 sibling, 1 reply; 9+ messages in thread
From: Ori Kam @ 2023-05-18 14:33 UTC (permalink / raw)
To: Zhang, Qi Z, dev
Cc: techboard, Richardson, Bruce, Burakov, Anatoly, Wiles, Keith,
Liang, Cunming, Wu, Jingjing, Zhang, Helin, Mcnamara, John, Xu,
Rosen
Hi Zhang,
I think we both want the same thing and share the same basic concepts.
Please see some answers below.
Best,
Ori
> -----Original Message-----
> From: Zhang, Qi Z <qi.z.zhang@intel.com>
> Sent: Thursday, May 18, 2023 1:33 PM
>
>
>
> > -----Original Message-----
> > From: Ori Kam <orika@nvidia.com>
> > Sent: Wednesday, May 17, 2023 11:19 PM
> > To: Zhang, Qi Z <qi.z.zhang@intel.com>; dev@dpdk.org
> > Cc: techboard@dpdk.org; Richardson, Bruce
> <bruce.richardson@intel.com>;
> > Burakov, Anatoly <anatoly.burakov@intel.com>; Wiles, Keith
> > <keith.wiles@intel.com>; Liang, Cunming <cunming.liang@intel.com>; Wu,
> > Jingjing <jingjing.wu@intel.com>; Zhang, Helin <helin.zhang@intel.com>;
> > Mcnamara, John <john.mcnamara@intel.com>; Xu, Rosen
> > <rosen.xu@intel.com>
> > Subject: RE: seeking community input on adapting DPDK to P4Runtime
> > backend
> >
> > Hi Zhang,
> >
> > rte_flow is an excellent candidate for implementing P4.
> > We and some internal tests that shows great promise in this regard.
> >
> > I would be very happy to supply any needed information and have
> > discussion on how to continue with this project.
>
> Thank you Ori! Please check my following comments
>
> Regards
> Qi
>
> >
> > Please see inline detailed answers.
> >
> > Best,
> > Ori Kam
> >
> >
> >
> >
> > > -----Original Message-----
> > > From: Zhang, Qi Z <qi.z.zhang@intel.com>
> > > Sent: Monday, May 8, 2023 9:40 AM
> > > Subject: seeking community input on adapting DPDK to P4Runtime
> > backend
> > >
> > > Hi:
> > >
> > > Our team is currently working on developing a DPDK PMD for a
> > > P4-programmed network controller, based on customer feedback to
> > > integrate DPDK into the P4Runtime backend
> > > [https://p4.org/p4-spec/p4runtime/main/P4Runtime-Spec.html].
> > >
> > > (*) However, we are facing challenges in adapting DPDK's rte_flow API
> > > to the P4Runtime API, primarily due to the transition from a
> > > table-based API with fields of arbitrary bits width at arbitrary
> > > offset to a protocol-based API (more detail be described in post-script).
> > >
> > > We are seeking suggestions and best practices from the open-source
> > > community to help us with this integration. Specifically, we are
> > > interested in
> > > learning:
> > >
> > > (*) If anyone has previously attempted to map rte_flow to P4-based
> > devices.
> >
> > We did try successfully.
> >
> > > (*) Thoughts on how to map from table-based matching to protocol-
> based
> > > matching like in rte_flow.
> >
> > Rte_flow is table based (groups), now with the introduction of template
> API
> > rte_flow is even more table based (we added the concept of tables) which
> > are just what
> > p4 requires.
>
> Yes, the rte_flow template can be used to map a sequence of patterns to a
> P4 table and a sequence of actions to a P4 action. However, using a fixed
> rte_flow template can be problematic when handling different P4 programs
> in the same driver. To provide more flexibility, the mapping of patterns and
> actions can be externalized into a configuration file, or into part of the
> firmware that the driver can read, allowing for customization based on the
> specific requirements of each P4 pipeline. We have actually enabled this
> approach in order to accommodate different P4 programs.
>
> However, an alternative approach to consider is whether it would be feasible
> to directly expose the P4 table and action names or IDs to the application,
> rather than relying on rte_flow templates. This approach offers several
> potential benefits:
>
> Integration with P4runtime Backend: By exposing the P4 table and action
> names or IDs directly, DPDK could be easily integrated as a P4runtime
> backend. This eliminates the need for translation from the P4runtime API to
> rte_flow templates in the application, simplifying the integration process.
>
> Elimination of Manual Mapping: Exposing the P4 table and action names or
> IDs to the application would remove the requirement for the engineering
> team to manually map each pipeline to specific rte_flow templates. This is
> particularly beneficial in cases where hardware vendors provide customers
> with a toolchain to create their own P4 pipelines but do not necessarily own
> the P4 programs. By eliminating the dependency on rte_flow templates, this
> approach reduces complexity in using DPDK as the driver.
>
> To be more specific, the proposed API for exposing P4 table and action
> names or IDs directly to the application could be as follows:
>
> /* Get the table info */
> struct rte_p4_table_info tbl_info;
> rte_p4_table_info_get_by_name(port_id, "decap_vxlan_tcp_table",
> &tbl_info);
>
> /* Create the key */
> struct rte_p4_table_key *key;
> rte_p4_table_key_create(port_id, tbl_info.id, &key);
>
> /* Set the key fields */
> rte_p4_table_key_field_set_by_name(port_id, key, "wire_port",
> &wire_port, 2);
> rte_p4_table_key_field_set_by_name(port_id, key, "tun_ip_src",
> &tun_ip_src, 4);
> rte_p4_table_key_field_set_by_name(port_id, key, "tun_ip_dst",
> &tun_ip_dst, 4);
> rte_p4_table_key_field_set_by_name(port_id, key, "vni", &vni, 3);
> rte_p4_table_key_field_set_by_name(port_id, key, "ipv4_src", &ipv4_src, 4);
> rte_p4_table_key_field_set_by_name(port_id, key, "ipv4_dst", &ipv4_dst,
> 4);
> rte_p4_table_key_field_set_by_name(port_id, key, "src_port", &src_port,
> 2);
> rte_p4_table_key_field_set_by_name(port_id, key, "dst_port", &dst_port,
> 2);
>
> /* Get the action spec info */
> struct rte_p4_action_spec_info as_info;
> rte_p4_action_spec_info_get_by_name(port_id, "decap_vxlan_fwd",
> &as_info);
>
>
> /* Create the action */
> struct rte_p4_action *action;
> rte_p4_action_create(port_id, as_info.id, &action);
>
>
> /* Set the action fields */
> rte_p4_table_action_field_set_by_name(port_id, action, "mod_id",
> &mod_id, 3);
> rte_p4_table_action_field_set_by_name(port_id, action, "port_id",
> &target_port_id, 2);
>
> /* Add the entry */
> rte_p4_table_entry_add(port_id, tbl_info.id, key, action);
>
> ...
>
I think that introducing an API that knows about P4 is the way to go,
but it should be a very simple API that calls rte_flow.
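
As a rough illustration of what such a thin layer could look like (a sketch only, not an existing DPDK API): a static map, generated once per compiled P4 program, translates each P4 key name into a location inside an rte_flow-style pattern. The names item_spec, p4_field_map and p4_key_to_pattern are hypothetical, the pattern order (outer ipv4 / vxlan / inner ipv4 / tcp) is an assumption, and plain byte buffers stand in for the real rte_flow item specs so that the example builds without DPDK. The byte offsets are the usual positions of these fields in the IPv4, VXLAN and TCP headers.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define HDR_MAX 20 /* large enough for an option-less IPv4 or TCP header */

/* Stand-in for the spec buffer of one pattern item (raw header bytes). */
struct item_spec {
	uint8_t bytes[HDR_MAX];
};

/* One map entry: P4 key name -> location inside the pattern template. */
struct p4_field_map {
	const char *p4_name;
	uint8_t item;   /* index of the pattern item */
	uint8_t offset; /* byte offset inside that item's header */
	uint8_t width;  /* field width in bytes */
};

/* Assumed pattern order: 0 = outer ipv4, 1 = vxlan, 2 = inner ipv4, 3 = tcp. */
const struct p4_field_map decap_vxlan_tcp_map[] = {
	{ "tun_ip_src", 0, 12, 4 },
	{ "tun_ip_dst", 0, 16, 4 },
	{ "vni",        1,  4, 3 },
	{ "ipv4_src",   2, 12, 4 },
	{ "ipv4_dst",   2, 16, 4 },
	{ "src_port",   3,  0, 2 },
	{ "dst_port",   3,  2, 2 },
};

#define MAP_LEN (sizeof(decap_vxlan_tcp_map) / sizeof(decap_vxlan_tcp_map[0]))

/*
 * Write one P4 key field into the matching pattern item spec.
 * Returns 0 on success, -1 on an unknown name, wrong length or bad item index.
 */
int
p4_key_to_pattern(struct item_spec *specs, size_t nb_specs,
		  const char *p4_name, const void *val, size_t len)
{
	for (size_t i = 0; i < MAP_LEN; i++) {
		const struct p4_field_map *m = &decap_vxlan_tcp_map[i];

		if (strcmp(m->p4_name, p4_name) != 0)
			continue;
		if (len != m->width || m->item >= nb_specs)
			return -1;
		/* The map is static, so this is an internal invariant. */
		assert((size_t)m->offset + m->width <= HDR_MAX);
		memcpy(specs[m->item].bytes + m->offset, val, len);
		return 0;
	}
	return -1;
}
```

Because the map is fixed once the P4 program is compiled, the lookup by name could be resolved to an index at setup time, leaving only the copy on the rule-insertion path.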
>
>
>
>
> >
> > > (*) Any ideas on how to extend or expand the rte_flow APIs to better
> > > accommodate P4-based or other table-matching based devices.
> > >
> >
> > Lets discuss any issue you have.
> >
> > > Your insights and feedback would be greatly appreciated!
> > >
> > > ======================= Post-Script
> ============================
> > >
> > > More details on the problem below, for anyone interested
> > >
> > > In P4, flow offloading can be implemented using the P4Runtime API,
> > > which provides a standard interface for controlling and configuring
> > > the data plane behavior of network devices. P4Runtime allows network
> > > operators to dynamically add, modify, and remove flow rules in the
> > > hardware forwarding tables of P4-enabled devices.
> > >
> > > The P4Runtime API is a table-based API; it assumes the packet processing
> > > pipeline consists of one or more key/action units (tables). In
> > > P4Runtime, each table defines the fields to be matched and the actions
> > > to be taken on incoming packets. During compilation, the P4 compiler
> > > assigns a unique
> > > uint32 ID to each table, action, and field, which is associated with
> > > its corresponding string name. These IDs have no inherent relationship
> > > to any network protocol but instead serve as a means to identify
> > > different components of a P4 program within the P4Runtime API.
> > >
> > This is the concept of tables and groups in rte_flow.
> >
> > > If we choose to use rte_flow as the low-level API for P4Runtime, a
> > > translation layer is needed in the application to map the P4 tables
> > > and actions to the corresponding rte_flow rules. However, this
> > > translation layer can be problematic as it is not easily scalable.
> > > When the P4 pipeline is refined or updated, the translation rules may
> > > also need to be updated, which can result in errors and reduced
> efficiency.
> > >
> > I don't understand why.
> >
> > > On the other hand, a hardware vendor that provides a P4-enabled device
> > > is required to implement an rte_flow interface in their DPDK PMD.
> > > Typically, the
> > > P4 compiler generates hints for the driver on how to map P4 tables to
> > > hardware resources, and how to convert table entry add/modify/delete
> > > actions into low-level hardware configurations. However, because
> > > rte_flow is protocol-based, it poses an additional challenge for
> > > driver developers, who must create another translation layer to
> > > convert rte_flow tokens into P4 object identifiers. This translation
> > > layer must be carefully designed and implemented to ensure optimal
> > > performance and scalability, and to ensure that the driver can efficiently
> > handle the dynamic nature of P4 programs.
> > >
> > Right, but some of the translation can be done in code shared by all PMDs,
> > and the translation is static for a given compilation, so inserting rules
> > can be super fast with no need for extra work.
> >
> > > To better understand the problem, let's consider the following example
> > > that demonstrates how to use the P4Runtime API to program a rule for
> > > processing a VXLAN packet. The rule matches a VXLAN packet,
> > > decapsulates the tunnel header, and forwards it to a specific port.
> > >
> > > The P4 source code below describes the VXLAN decap table
> > > decap_vxlan_tcp_table, which matches the outer IP address, VNI, inner
> > > IP address, and inner TCP port. For each rule, four action
> > > specifications can be selected. We will focus on one action
> > > specification decap_vxlan_fwd that performs decapsulation and forwards
> > the packet to a specific port.
> > >
> > > table decap_vxlan_tcp_table {
> > > key = {
> > > hdrs.ipv4[meta.depth-1].src_ip: exact @name("tun_ip_src");
> > > hdrs.ipv4[meta.depth-1].dst_ip: exact @name("tun_ip_dst");
> > > hdrs.vxlan[meta.depth-1].vni : exact @name("vni");
> > > hdrs.ipv4[meta.depth].src_ip : exact @name("ipv4_src");
> > > hdrs.ipv4[meta.depth].dst_ip : exact @name("ipv4_dst");
> > > hdrs.tcp.sport : exact @name("src_port");
> > > hdrs.tcp.dport : exact @name("dst_port");
> > > }
> > > actions = {
> > > @tableonly decap_vxlan_fwd;
> > > @tableonly decap_vxlan_dnat_fwd;
> > > @tableonly decap_vxlan_snat_fwd;
> > > @defaultonly set_exception;
> > > }
> > > }
> > Translate to rte_flow:
> > template pattern relaxed_mode = 1
> > pattern = ipv4_src / ipv4_dst / vni / ipv4_src / ipv4_dst / tcp_sport / tcp_dport
> > map structure = {
> >     tun_ip_src = &pattern[ipv4_src]
> >     ....
> > }
> > > ...
> > >
> > > action decap_vxlan_fwd(PortId_t port_id) {
> > > meta.mod_action = (bit<11>)VXLAN_DECAP_OUTER_IPV4;
> > > send_to_port(port_id);
> > > }
> > >
> > Same as above just with action template
> >
> > > Below is an example of the hint that the compiler will generate for
> > > the
> > > decap_vxlan_tcp_table:
> > >
> > > Table ID: 8454144
> > > Name: decap_vxlan_tcp_table
> > >
> > > Field ID  Name        Match Type  Bit Width  Byte Width  Byte Order
> > > 1         tun_ip_src  exact       32         4           network
> > > 2         tun_ip_dst  exact       32         4           network
> > > 3         vni         exact       24         3           network
> > > 4         ipv4_src    exact       32         4           network
> > > 5         ipv4_dst    exact       32         4           network
> > > 6         src_port    exact       16         2           network
> > > 7         dst_port    exact       16         2           network
> > >
> > > Spec ID   Name
> > > 8519716   decap_vxlan_fwd
> > > 8519718   decap_vxlan_dnat_fwd
> > > 8519720   decap_vxlan_snat_fwd
> > > 8519695   set_exception
> > >
> > > And the hint of action spec "decap_vxlan_fwd" as below:
> > >
> > > Spec ID: 8519716
> > > Name: decap_vxlan_fwd
> > >
> > > Field ID  Name     Bit Width  Byte Width  Byte Order
> > > 1         port_id  32         4           host
> > >
> > > Please note that different compilers may assign different IDs.
> > >
> > > Below is an example of how to program a rule using the P4 runtime API
> > > in JSON format. This rule matches fields and directs packets to port 5.
> > >
> > > {
> > >   "type": 1, // INSERT
> > >   "entity": {
> > >     "table_entry": {
> > >       "table_id": 8454144,
> > >       "match": [
> > >         { "field_id": 1, "exact": { "value": [10, 0, 0, 1] } },  // outer src IP = 10.0.0.1
> > >         { "field_id": 2, "exact": { "value": [10, 0, 0, 2] } },  // outer dst IP = 10.0.0.2
> > >         { "field_id": 3, "exact": { "value": [0, 0, 10] } },     // vni = 10
> > >         { "field_id": 4, "exact": { "value": [192, 0, 0, 1] } }, // inner src IP = 192.0.0.1
> > >         { "field_id": 5, "exact": { "value": [192, 0, 0, 2] } }, // inner dst IP = 192.0.0.2
> > >         { "field_id": 6, "exact": { "value": [0, 200] } },       // tcp src port = 200
> > >         { "field_id": 7, "exact": { "value": [0, 201] } },       // tcp dst port = 201
> > >       ],
> > >       "action": {
> > >         "action": {
> > >           "action_id": 8519716,
> > >           "params": [
> > >             { "param_id": 1, "value": [5, 0, 0, 0] }
> > >           ]
> > >         }
> > >       },
> > >       ...
> > >     }
> > >   } ...
> > > }
> > >
> > > Please note that this is only a part of the full command. For more
> > > information, please refer to the p4runtime.proto[2]
> > >
> > > 1. https://p4.org/p4-spec/p4runtime/main/P4Runtime-Spec.html
> > > 2. https://github.com/p4lang/p4runtime/blob/main/proto/p4/v1/p4runtime.proto
> > >
> > > Thank you for your attention to this matter.
> > >
> >
> > I think that we should schedule a meeting to see how many gaps we
> > really have between rte_flow and P4, and how we can improve rte_flow
> > to allow the best experience.
>
> Sounds like a good idea!
> >
> > > Regards
> > > Qi
^ permalink raw reply [flat|nested] 9+ messages in thread
* RE: seeking community input on adapting DPDK to P4Runtime backend
2023-05-18 10:33 ` Zhang, Qi Z
2023-05-18 14:33 ` Ori Kam
@ 2023-05-18 14:45 ` Honnappa Nagarahalli
2023-05-22 4:58 ` Zhang, Qi Z
1 sibling, 1 reply; 9+ messages in thread
From: Honnappa Nagarahalli @ 2023-05-18 14:45 UTC (permalink / raw)
To: Zhang, Qi Z, Ori Kam, dev
Cc: techboard, Richardson, Bruce, Burakov, Anatoly, Wiles, Keith,
Liang, Cunming, Wu, Jingjing, Zhang, Helin, Mcnamara, John, Xu,
Rosen, nd, nd
<snip>
> >
> > Hi Zhang,
> >
> > rte_flow is an excellent candidate for implementing P4.
> > We ran some internal tests that show great promise in this regard.
> >
> > I would be very happy to supply any needed information and have
> > discussion on how to continue with this project.
>
> Thank you Ori! Please check my following comments
>
> Regards
> Qi
>
> >
> > Please see inline detailed answers.
> >
> > Best,
> > Ori Kam
> >
> >
> >
> >
> > > -----Original Message-----
> > > From: Zhang, Qi Z <qi.z.zhang@intel.com>
> > > Sent: Monday, May 8, 2023 9:40 AM
> > > Subject: seeking community input on adapting DPDK to P4Runtime
> > backend
> > >
> > > Hi:
> > >
> > > Our team is currently working on developing a DPDK PMD for a
> > > P4-programmed network controller, based on customer feedback to
> > > integrate DPDK into the P4Runtime backend
> > > [https://p4.org/p4-spec/p4runtime/main/P4Runtime-Spec.html].
> > >
> > > (*) However, we are facing challenges in adapting DPDK's rte_flow
> > > API to the P4Runtime API, primarily due to the transition from a
> > > table-based API with fields of arbitrary bit width at arbitrary
> > > offsets to a protocol-based API (more details are given in the post-script).
> > >
> > > We are seeking suggestions and best practices from the open-source
> > > community to help us with this integration. Specifically, we are
> > > interested in
> > > learning:
> > >
> > > (*) If anyone has previously attempted to map rte_flow to P4-based
> > devices.
> >
> > We did try successfully.
> >
> > > (*) Thoughts on how to map from table-based matching to
> > > protocol-based matching like in rte_flow.
> >
> > rte_flow is table based (groups), and with the introduction of the
> > template API it is even more table based (we added the concept of
> > tables), which is just what P4 requires.
>
> Yes, the rte_flow template can be used to map a sequence of patterns to a P4
> table and a sequence of actions to a P4 action. However, using a fixed rte_flow
> template can be problematic when handling different P4 programs in the same
> driver. To provide more flexibility, the mapping of patterns and actions can be
> externalized into a configuration file, or into part of the firmware that the
> driver can read, allowing for customization based on the specific requirements
> of each P4 pipeline. We have actually enabled this approach in order to
> accommodate different P4 programs.
>
> However, an alternative approach to consider is whether it would be feasible to
> directly expose the P4 table and action names or IDs to the application, rather
> than relying on rte_flow templates. This approach offers several potential
> benefits:
>
> Integration with P4runtime Backend: By exposing the P4 table and action names
> or IDs directly, DPDK could be easily integrated as a P4runtime backend. This
> eliminates the need for translation from the P4runtime API to rte_flow
> templates in the application, simplifying the integration process.
>
> Elimination of Manual Mapping: Exposing the P4 table and action names or IDs
> to the application would remove the requirement for the engineering team to
> manually map each pipeline to specific rte_flow templates. This is particularly
> beneficial in cases where hardware vendors provide customers with a toolchain
> to create their own P4 pipelines but do not necessarily own the P4 programs. By
> eliminating the dependency on rte_flow templates, this approach reduces
> complexity in using DPDK as the driver.
>
> To be more specific, the proposed API for exposing P4 table and action names or
> IDs directly to the application could be as follows:
>
> /* Get the table info */
> struct rte_p4_table_info tbl_info;
> rte_p4_table_info_get_by_name(port_id, "decap_vxlan_tcp_table", &tbl_info);
>
> /* Create the key */
> struct rte_p4_table_key *key;
> rte_p4_table_key_create(port_id, tbl_info.id, &key);
>
> /* Set the key fields */
> rte_p4_table_key_field_set_by_name(port_id, key, "wire_port", &wire_port, 2);
> rte_p4_table_key_field_set_by_name(port_id, key, "tun_ip_src", &tun_ip_src, 4);
> rte_p4_table_key_field_set_by_name(port_id, key, "tun_ip_dst", &tun_ip_dst, 4);
> rte_p4_table_key_field_set_by_name(port_id, key, "vni", &vni, 3);
> rte_p4_table_key_field_set_by_name(port_id, key, "ipv4_src", &ipv4_src, 4);
> rte_p4_table_key_field_set_by_name(port_id, key, "ipv4_dst", &ipv4_dst, 4);
> rte_p4_table_key_field_set_by_name(port_id, key, "src_port", &src_port, 2);
> rte_p4_table_key_field_set_by_name(port_id, key, "dst_port", &dst_port, 2);
>
> /* Get the action spec info */
> struct rte_p4_action_spec_info as_info;
> rte_p4_action_spec_info_get_by_name(port_id, "decap_vxlan_fwd", &as_info);
>
> /* Create the action */
> struct rte_p4_action *action;
> rte_p4_action_create(port_id, as_info.id, &action);
>
> /* Set the action fields */
> rte_p4_table_action_field_set_by_name(port_id, action, "mod_id", &mod_id, 3);
> rte_p4_table_action_field_set_by_name(port_id, action, "port_id", &target_port_id, 2);
>
> /* Add the entry */
> rte_p4_table_entry_add(port_id, tbl_info.id, key, action);
These do not look P4-specific; they could just be generic APIs. Could we have these as rte_flow APIs?
>
> ...
>
>
>
>
>
> >
> > > (*) Any ideas on how to extend or expand the rte_flow APIs to better
> > > accommodate P4-based or other table-matching based devices.
> > >
> >
> > Lets discuss any issue you have.
> >
> > > Your insights and feedback would be greatly appreciated!
> > >
> > > ======================= Post-Script
> ============================
> > >
> > > More details on the problem below, for anyone interested
> > >
> > > In P4, flow offloading can be implemented using the P4Runtime API,
> > > which provides a standard interface for controlling and configuring
> > > the data plane behavior of network devices. P4Runtime allows network
> > > operators to dynamically add, modify, and remove flow rules in the
> > > hardware forwarding tables of P4-enabled devices.
> > >
> > > The P4Runtime API is a table-based API; it assumes the packet processing
> > > pipeline consists of one or more key/action units (tables). In
> > > P4Runtime, each table defines the fields to be matched and the
> > > actions to be taken on incoming packets. During compilation, the P4
> > > compiler assigns a unique
> > > uint32 ID to each table, action, and field, which is associated with
> > > its corresponding string name. These IDs have no inherent
> > > relationship to any network protocol but instead serve as a means to
> > > identify different components of a P4 program within the P4Runtime API.
> > >
> > This is the concept of tables and groups in rte_flow.
> >
> > > If we choose to use rte_flow as the low-level API for P4Runtime, a
> > > translation layer is needed in the application to map the P4 tables
> > > and actions to the corresponding rte_flow rules. However, this
> > > translation layer can be problematic as it is not easily scalable.
> > > When the P4 pipeline is refined or updated, the translation rules
> > > may also need to be updated, which can result in errors and reduced
> efficiency.
> > >
> > I don't understand why.
> >
> > > On the other hand, a hardware vendor that provides a P4-enabled
> > > device is required to implement an rte_flow interface in their DPDK PMD.
> > > Typically, the
> > > P4 compiler generates hints for the driver on how to map P4 tables
> > > to hardware resources, and how to convert table entry
> > > add/modify/delete actions into low-level hardware configurations.
> > > However, because rte_flow is protocol-based, it poses an additional
> > > challenge for driver developers, who must create another translation
> > > layer to convert rte_flow tokens into P4 object identifiers. This
> > > translation layer must be carefully designed and implemented to
> > > ensure optimal performance and scalability, and to ensure that the
> > > driver can efficiently
> > handle the dynamic nature of P4 programs.
> > >
> > Right, but some of the translation can be done in code shared by all
> > PMDs, and the translation is static for a given compilation, so inserting
> > rules can be super fast with no need for extra work.
> >
> > > To better understand the problem, let's consider the following
> > > example that demonstrates how to use the P4Runtime API to program a
> > > rule for processing a VXLAN packet. The rule matches a VXLAN packet,
> > > decapsulates the tunnel header, and forwards it to a specific port.
> > >
> > > The P4 source code below describes the VXLAN decap table
> > > decap_vxlan_tcp_table, which matches the outer IP address, VNI,
> > > inner IP address, and inner TCP port. For each rule, four action
> > > specifications can be selected. We will focus on one action
> > > specification decap_vxlan_fwd that performs decapsulation and
> > > forwards
> > the packet to a specific port.
> > >
> > > table decap_vxlan_tcp_table {
> > > key = {
> > > hdrs.ipv4[meta.depth-1].src_ip: exact @name("tun_ip_src");
> > > hdrs.ipv4[meta.depth-1].dst_ip: exact @name("tun_ip_dst");
> > > hdrs.vxlan[meta.depth-1].vni : exact @name("vni");
> > > hdrs.ipv4[meta.depth].src_ip : exact @name("ipv4_src");
> > > hdrs.ipv4[meta.depth].dst_ip : exact @name("ipv4_dst");
> > > hdrs.tcp.sport : exact @name("src_port");
> > > hdrs.tcp.dport : exact @name("dst_port");
> > > }
> > > actions = {
> > > @tableonly decap_vxlan_fwd;
> > > @tableonly decap_vxlan_dnat_fwd;
> > > @tableonly decap_vxlan_snat_fwd;
> > > @defaultonly set_exception;
> > > }
> > > }
> > Translate to rte_flow:
> > template pattern relaxed_mode = 1
> > pattern = ipv4_src / ipv4_dst / vni / ipv4_src / ipv4_dst / tcp_sport / tcp_dport
> > map structure = {
> >     tun_ip_src = &pattern[ipv4_src]
> >     ....
> > }
> > > ...
> > >
> > > action decap_vxlan_fwd(PortId_t port_id) {
> > > meta.mod_action = (bit<11>)VXLAN_DECAP_OUTER_IPV4;
> > > send_to_port(port_id);
> > > }
> > >
> > Same as above just with action template
> >
> > > Below is an example of the hint that the compiler will generate for
> > > the
> > > decap_vxlan_tcp_table:
> > >
> > > Table ID: 8454144
> > > Name: decap_vxlan_tcp_table
> > >
> > > Field ID  Name        Match Type  Bit Width  Byte Width  Byte Order
> > > 1         tun_ip_src  exact       32         4           network
> > > 2         tun_ip_dst  exact       32         4           network
> > > 3         vni         exact       24         3           network
> > > 4         ipv4_src    exact       32         4           network
> > > 5         ipv4_dst    exact       32         4           network
> > > 6         src_port    exact       16         2           network
> > > 7         dst_port    exact       16         2           network
> > >
> > > Spec ID   Name
> > > 8519716   decap_vxlan_fwd
> > > 8519718   decap_vxlan_dnat_fwd
> > > 8519720   decap_vxlan_snat_fwd
> > > 8519695   set_exception
> > >
> > > And the hint of action spec "decap_vxlan_fwd" as below:
> > >
> > > Spec ID: 8519716
> > > Name: decap_vxlan_fwd
> > >
> > > Field ID  Name     Bit Width  Byte Width  Byte Order
> > > 1         port_id  32         4           host
> > >
> > > Please note that different compilers may assign different IDs.
> > >
> > > Below is an example of how to program a rule using the P4 runtime
> > > API in JSON format. This rule matches fields and directs packets to port 5.
> > >
> > > {
> > >   "type": 1, // INSERT
> > >   "entity": {
> > >     "table_entry": {
> > >       "table_id": 8454144,
> > >       "match": [
> > >         { "field_id": 1, "exact": { "value": [10, 0, 0, 1] } },  // outer src IP = 10.0.0.1
> > >         { "field_id": 2, "exact": { "value": [10, 0, 0, 2] } },  // outer dst IP = 10.0.0.2
> > >         { "field_id": 3, "exact": { "value": [0, 0, 10] } },     // vni = 10
> > >         { "field_id": 4, "exact": { "value": [192, 0, 0, 1] } }, // inner src IP = 192.0.0.1
> > >         { "field_id": 5, "exact": { "value": [192, 0, 0, 2] } }, // inner dst IP = 192.0.0.2
> > >         { "field_id": 6, "exact": { "value": [0, 200] } },       // tcp src port = 200
> > >         { "field_id": 7, "exact": { "value": [0, 201] } },       // tcp dst port = 201
> > >       ],
> > >       "action": {
> > >         "action": {
> > >           "action_id": 8519716,
> > >           "params": [
> > >             { "param_id": 1, "value": [5, 0, 0, 0] }
> > >           ]
> > >         }
> > >       },
> > >       ...
> > >     }
> > >   } ...
> > > }
> > >
> > > Please note that this is only a part of the full command. For more
> > > information, please refer to the p4runtime.proto[2]
> > >
> > > 1. https://p4.org/p4-spec/p4runtime/main/P4Runtime-Spec.html
> > > 2. https://github.com/p4lang/p4runtime/blob/main/proto/p4/v1/p4runtime.proto
> > >
> > > Thank you for your attention to this matter.
> > >
> >
> > I think that we should schedule a meeting to see how many gaps we
> > really have between rte_flow and P4, and how we can improve rte_flow
> > to allow the best experience.
>
> Sounds like a good idea!
> >
> > > Regards
> > > Qi
^ permalink raw reply [flat|nested] 9+ messages in thread
* RE: seeking community input on adapting DPDK to P4Runtime backend
2023-05-18 14:45 ` Honnappa Nagarahalli
@ 2023-05-22 4:58 ` Zhang, Qi Z
0 siblings, 0 replies; 9+ messages in thread
From: Zhang, Qi Z @ 2023-05-22 4:58 UTC (permalink / raw)
To: Honnappa Nagarahalli, Ori Kam, dev
Cc: techboard, Richardson, Bruce, Burakov, Anatoly, Wiles, Keith,
Liang, Cunming, Wu, Jingjing, Zhang, Helin, Mcnamara, John, Xu,
Rosen, nd, nd
> -----Original Message-----
> From: Honnappa Nagarahalli <Honnappa.Nagarahalli@arm.com>
> Sent: Thursday, May 18, 2023 10:46 PM
> To: Zhang, Qi Z <qi.z.zhang@intel.com>; Ori Kam <orika@nvidia.com>;
> dev@dpdk.org
> Cc: techboard@dpdk.org; Richardson, Bruce <bruce.richardson@intel.com>;
> Burakov, Anatoly <anatoly.burakov@intel.com>; Wiles, Keith
> <keith.wiles@intel.com>; Liang, Cunming <cunming.liang@intel.com>; Wu,
> Jingjing <jingjing.wu@intel.com>; Zhang, Helin <helin.zhang@intel.com>;
> Mcnamara, John <john.mcnamara@intel.com>; Xu, Rosen
> <rosen.xu@intel.com>; nd <nd@arm.com>; nd <nd@arm.com>
> Subject: RE: seeking community input on adapting DPDK to P4Runtime
> backend
>
> <snip>
>
> > >
> > > Hi Zhang,
> > >
> > > rte_flow is an excellent candidate for implementing P4.
> > > We ran some internal tests that show great promise in this regard.
> > >
> > > I would be very happy to supply any needed information and have
> > > discussion on how to continue with this project.
> >
> > Thank you Ori! Please check my following comments
> >
> > Regards
> > Qi
> >
> > >
> > > Please see inline detailed answers.
> > >
> > > Best,
> > > Ori Kam
> > >
> > >
> > >
> > >
> > > > -----Original Message-----
> > > > From: Zhang, Qi Z <qi.z.zhang@intel.com>
> > > > Sent: Monday, May 8, 2023 9:40 AM
> > > > Subject: seeking community input on adapting DPDK to P4Runtime
> > > backend
> > > >
> > > > Hi:
> > > >
> > > > Our team is currently working on developing a DPDK PMD for a
> > > > P4-programmed network controller, based on customer feedback to
> > > > integrate DPDK into the P4Runtime backend
> > > > [https://p4.org/p4-spec/p4runtime/main/P4Runtime-Spec.html].
> > > >
> > > > (*) However, we are facing challenges in adapting DPDK's rte_flow
> > > > API to the P4Runtime API, primarily due to the transition from a
> > > > table-based API with fields of arbitrary bit width at arbitrary
> > > > offsets to a protocol-based API (more details are given in the post-script).
> > > >
> > > > We are seeking suggestions and best practices from the open-source
> > > > community to help us with this integration. Specifically, we are
> > > > interested in
> > > > learning:
> > > >
> > > > (*) If anyone has previously attempted to map rte_flow to P4-based
> > > devices.
> > >
> > > We did try successfully.
> > >
> > > > (*) Thoughts on how to map from table-based matching to
> > > > protocol-based matching like in rte_flow.
> > >
> > > rte_flow is table based (groups), and with the introduction of the
> > > template API it is even more table based (we added the concept of
> > > tables), which is just what P4 requires.
> >
> > Yes, the rte_flow template can be used to map a sequence of patterns
> > to a P4 table and a sequence of actions to a P4 action. However, using
> > a fixed rte_flow template can be problematic when handling different
> > P4 programs in the same driver. To provide more flexibility, the
> > mapping of patterns and actions can be externalized into a
> > configuration file, or into part of the firmware that the driver can
> > read, allowing for customization based on the specific requirements of
> > each P4 pipeline. We have actually enabled this approach in order to
> > accommodate different P4 programs.
> >
> > However, an alternative approach to consider is whether it would be
> > feasible to directly expose the P4 table and action names or IDs to
> > the application, rather than relying on rte_flow templates. This
> > approach offers several potential
> > benefits:
> >
> > Integration with P4runtime Backend: By exposing the P4 table and
> > action names or IDs directly, DPDK could be easily integrated as a
> > P4runtime backend. This eliminates the need for translation from the
> > P4runtime API to rte_flow templates in the application, simplifying the
> integration process.
> >
> > Elimination of Manual Mapping: Exposing the P4 table and action names
> > or IDs to the application would remove the requirement for the
> > engineering team to manually map each pipeline to specific rte_flow
> > templates. This is particularly beneficial in cases where hardware
> > vendors provide customers with a toolchain to create their own P4
> > pipelines but do not necessarily own the P4 programs. By eliminating
> > the dependency on rte_flow templates, this approach reduces complexity
> in using DPDK as the driver.
> >
> > To be more specific, the proposed API for exposing P4 table and action
> > names or IDs directly to the application could be as follows:
> >
> > /* Get the table info */
> > struct rte_p4_table_info tbl_info;
> > rte_p4_table_info_get_by_name(port_id, "decap_vxlan_tcp_table", &tbl_info);
> >
> > /* Create the key */
> > struct rte_p4_table_key *key;
> > rte_p4_table_key_create(port_id, tbl_info.id, &key);
> >
> > /* Set the key fields */
> > rte_p4_table_key_field_set_by_name(port_id, key, "wire_port", &wire_port, 2);
> > rte_p4_table_key_field_set_by_name(port_id, key, "tun_ip_src", &tun_ip_src, 4);
> > rte_p4_table_key_field_set_by_name(port_id, key, "tun_ip_dst", &tun_ip_dst, 4);
> > rte_p4_table_key_field_set_by_name(port_id, key, "vni", &vni, 3);
> > rte_p4_table_key_field_set_by_name(port_id, key, "ipv4_src", &ipv4_src, 4);
> > rte_p4_table_key_field_set_by_name(port_id, key, "ipv4_dst", &ipv4_dst, 4);
> > rte_p4_table_key_field_set_by_name(port_id, key, "src_port", &src_port, 2);
> > rte_p4_table_key_field_set_by_name(port_id, key, "dst_port", &dst_port, 2);
> >
> > /* Get the action spec info */
> > struct rte_p4_action_spec_info as_info;
> > rte_p4_action_spec_info_get_by_name(port_id, "decap_vxlan_fwd", &as_info);
> >
> > /* Create the action */
> > struct rte_p4_action *action;
> > rte_p4_action_create(port_id, as_info.id, &action);
> >
> > /* Set the action fields */
> > rte_p4_table_action_field_set_by_name(port_id, action, "mod_id", &mod_id, 3);
> > rte_p4_table_action_field_set_by_name(port_id, action, "port_id", &target_port_id, 2);
> >
> > /* Add the entry */
> > rte_p4_table_entry_add(port_id, tbl_info.id, key, action);
> These do not look P4-specific; they could just be generic APIs. Could we
> have these as rte_flow APIs?
Agreed, the goal is not necessarily to have P4-specific APIs, but rather to expose a set of table-driven APIs that align with the programmable hardware pipeline. This approach would allow for more flexibility and customization than relying on the existing protocol-based APIs.
Both options, extending the existing rte_flow API to expose the required table-driven features or introducing a set of dedicated table-driven APIs, appear to be viable solutions to me.
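
To make the table-driven idea a bit more concrete, here is a minimal sketch that depends on neither P4 nor rte_flow: the table is described purely by metadata, and a key is built by copying field values into a packed buffer by name. The names tbl_field, tbl_info and tbl_key_field_set_by_name are hypothetical, and the packed layout (fields concatenated in field-ID order) is an assumption for illustration only; the table ID, field IDs and byte widths are taken from the compiler hint in the original post.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical per-field metadata, mirroring the compiler hint columns. */
struct tbl_field {
	const char *name;    /* P4 @name of the key field */
	uint32_t id;         /* field ID assigned by the compiler */
	uint16_t byte_width; /* width of the field in bytes */
	uint16_t offset;     /* assumed byte offset inside the packed key */
};

/* Hypothetical table descriptor; not an existing DPDK structure. */
struct tbl_info {
	uint32_t id;
	const char *name;
	const struct tbl_field *fields;
	size_t nb_fields;
	size_t key_size;     /* total packed key size in bytes */
};

const struct tbl_field decap_fields[] = {
	{ "tun_ip_src", 1, 4,  0 },
	{ "tun_ip_dst", 2, 4,  4 },
	{ "vni",        3, 3,  8 },
	{ "ipv4_src",   4, 4, 11 },
	{ "ipv4_dst",   5, 4, 15 },
	{ "src_port",   6, 2, 19 },
	{ "dst_port",   7, 2, 21 },
};

const struct tbl_info decap_vxlan_tcp_table = {
	.id = 8454144,
	.name = "decap_vxlan_tcp_table",
	.fields = decap_fields,
	.nb_fields = sizeof(decap_fields) / sizeof(decap_fields[0]),
	.key_size = 23,
};

/*
 * Copy one field value into the packed key.
 * Returns 0 on success, -1 on an unknown field name or a length mismatch.
 */
int
tbl_key_field_set_by_name(const struct tbl_info *tbl, uint8_t *key,
			  const char *name, const void *val, size_t len)
{
	for (size_t i = 0; i < tbl->nb_fields; i++) {
		const struct tbl_field *f = &tbl->fields[i];

		if (strcmp(f->name, name) != 0)
			continue;
		if (len != f->byte_width)
			return -1;
		/* The metadata is generated, so this is an internal invariant. */
		assert((size_t)f->offset + len <= tbl->key_size);
		memcpy(key + f->offset, val, len);
		return 0;
	}
	return -1;
}
```

Nothing in this sketch refers to a protocol: a different pipeline would only need a different metadata table, which is the property we are after regardless of whether the API lives inside rte_flow or beside it.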
Thanks
Qi
> >
> >
> >
> >
> >
> > >
> > > > (*) Any ideas on how to extend or expand the rte_flow APIs to
> > > > better accommodate P4-based or other table-matching based devices.
> > > >
> > >
> > > Lets discuss any issue you have.
> > >
> > > > Your insights and feedback would be greatly appreciated!
> > > >
> > > > ======================= Post-Script
> > ============================
> > > >
> > > > More details on the problem below, for anyone interested
> > > >
> > > > In P4, flow offloading can be implemented using the P4Runtime API,
> > > > which provides a standard interface for controlling and
> > > > configuring the data plane behavior of network devices. P4Runtime
> > > > allows network operators to dynamically add, modify, and remove
> > > > flow rules in the hardware forwarding tables of P4-enabled devices.
> > > >
> > > > The P4Runtime API is a table-based API; it assumes the packet
> > > > processing pipeline consists of one or more key/action units
> > > > (tables). In P4Runtime, each table defines the fields to be
> > > > matched and the actions to be taken on incoming packets. During
> > > > compilation, the P4 compiler assigns a unique
> > > > uint32 ID to each table, action, and field, which is associated
> > > > with its corresponding string name. These IDs have no inherent
> > > > relationship to any network protocol but instead serve as a means
> > > > to identify different components of a P4 program within the P4Runtime
> > > > API.
> > > >
> > > This is the concept of tables and groups in rte_flow.
> > >
> > > > If we choose to use rte_flow as the low-level API for P4Runtime, a
> > > > translation layer is needed in the application to map the P4
> > > > tables and actions to the corresponding rte_flow rules. However,
> > > > this translation layer can be problematic as it is not easily scalable.
> > > > When the P4 pipeline is refined or updated, the translation rules
> > > > may also need to be updated, which can result in errors and
> > > > reduced efficiency.
> > > >
> > > I don't understand why.
> > >
> > > > On the other hand, a hardware vendor that provides a P4-enabled
> > > > device is required to implement an rte_flow interface in their DPDK PMD.
> > > > Typically, the
> > > > P4 compiler generates hints for the driver on how to map P4 tables
> > > > to hardware resources, and how to convert table entry
> > > > add/modify/delete actions into low-level hardware configurations.
> > > > However, because rte_flow is protocol-based, it poses an
> > > > additional challenge for driver developers, who must create
> > > > another translation layer to convert rte_flow tokens into P4
> > > > object identifiers. This translation layer must be carefully
> > > > designed and implemented to ensure optimal performance and
> > > > scalability, and to ensure that the driver can efficiently
> > > > handle the dynamic nature of P4 programs.
> > > >
> > > Right, but some of the translation can be done in shared code by all
> > > PMDs and the translation is static for the compilation so inserting
> > > rules can be supper fast with no need for extra work.
> > >
> > > > To better understand the problem, let's consider the following
> > > > example that demonstrates how to use the P4Runtime API to program
> > > > a rule for processing a VXLAN packet. The rule matches a VXLAN
> > > > packet, decapsulates the tunnel header, and forwards it to a specific
> > > > port.
> > > >
> > > > The P4 source code below describes the VXLAN decap table
> > > > decap_vxlan_tcp_table, which matches the outer IP address, VNI,
> > > > inner IP address, and inner TCP port. For each rule, four action
> > > > specifications can be selected. We will focus on one action
> > > > specification decap_vxlan_fwd that performs decapsulation and
> > > > forwards the packet to a specific port.
> > > >
> > > > table decap_vxlan_tcp_table {
> > > > key = {
> > > > hdrs.ipv4[meta.depth-1].src_ip: exact @name("tun_ip_src");
> > > > hdrs.ipv4[meta.depth-1].dst_ip: exact @name("tun_ip_dst");
> > > > hdrs.vxlan[meta.depth-1].vni : exact @name("vni");
> > > > hdrs.ipv4[meta.depth].src_ip : exact @name("ipv4_src");
> > > > hdrs.ipv4[meta.depth].dst_ip : exact @name("ipv4_dst");
> > > > hdrs.tcp.sport : exact @name("src_port");
> > > > hdrs.tcp.dport : exact @name("dst_port");
> > > > }
> > > > actions = {
> > > > @tableonly decap_vxlan_fwd;
> > > > @tableonly decap_vxlan_dnat_fwd;
> > > > @tableonly decap_vxlan_snat_fwd;
> > > > @defaultonly set_exception;
> > > > }
> > > > }
> > > Translate to rte_flow:
> > > template pattern relaxed_mode = 1 pattern = Ipv4_src / ipv4_dst /
> > > vni / ipv4_src / ipv4_dst / tcp_sport / tcp_dport map structure = {
> > > tun_ip_src = &pattern[ipv4_src]
> > > ....
> > > }
> > > > ...
> > > >
> > > > action decap_vxlan_fwd(PortId_t port_id) {
> > > > meta.mod_action = (bit<11>)VXLAN_DECAP_OUTER_IPV4;
> > > > send_to_port(port_id);
> > > > }
> > > >
> > > Same as above just with action template
> > >
> > > > Below is an example of the hint that the compiler will generate
> > > > for the decap_vxlan_tcp_table:
> > > >
> > > > Table ID: 8454144
> > > > Name: decap_vxlan_tcp_table
> > > >
> > > > Field ID  Name        Match Type  Bit Width  Byte Width  Byte Order
> > > > 1         tun_ip_src  exact       32         4           network
> > > > 2         tun_ip_dst  exact       32         4           network
> > > > 3         vni         exact       24         3           network
> > > > 4         ipv4_src    exact       32         4           network
> > > > 5         ipv4_dst    exact       32         4           network
> > > > 6         src_port    exact       16         2           network
> > > > 7         dst_port    exact       16         2           network
> > > >
> > > > Spec ID   Name
> > > > 8519716   decap_vxlan_fwd
> > > > 8519718   decap_vxlan_dnat_fwd
> > > > 8519720   decap_vxlan_snat_fwd
> > > > 8519695   set_exception
> > > >
> > > > And the hint of action spec "decap_vxlan_fwd" as below:
> > > >
> > > > Spec ID: 8519716
> > > > Name: decap_vxlan_fwd
> > > >
> > > > Field ID  Name     Bit Width  Byte Width  Byte Order
> > > > 1         port_id  32         4           host
> > > >
> > > > Please note that different compilers may assign different IDs.
> > > >
> > > > Below is an example of how to program a rule using the P4 runtime
> > > > API in JSON format. This rule matches fields and directs packets to port 5.
> > > >
> > > > {
> > > >   "type": 1, // INSERT
> > > >   "entity": {
> > > >     "table_entry": {
> > > >       "table_id": 8454144,
> > > >       "match": [
> > > >         { "field_id": 1, "exact": { "value": [10, 0, 0, 1] } },  // outer src IP = 10.0.0.1
> > > >         { "field_id": 2, "exact": { "value": [10, 0, 0, 2] } },  // outer dst IP = 10.0.0.2
> > > >         { "field_id": 3, "exact": { "value": [0, 0, 10] } },     // vni = 10
> > > >         { "field_id": 4, "exact": { "value": [192, 0, 0, 1] } }, // inner src IP = 192.0.0.1
> > > >         { "field_id": 5, "exact": { "value": [192, 0, 0, 2] } }, // inner dst IP = 192.0.0.2
> > > >         { "field_id": 6, "exact": { "value": [0, 200] } },       // tcp src port = 200
> > > >         { "field_id": 7, "exact": { "value": [0, 201] } }        // tcp dst port = 201
> > > >       ],
> > > >       "action": {
> > > >         "action": {
> > > >           "action_id": 8519716,
> > > >           "params": [
> > > >             { "param_id": 1, "value": [5, 0, 0, 0] }
> > > >           ]
> > > >         }
> > > >       },
> > > >       ...
> > > >     }
> > > >   } ...
> > > > }
> > > >
> > > > Please note that this is only a part of the full command. For more
> > > > information, please refer to the p4runtime.proto[2]
> > > >
> > > > 1. https://p4.org/p4-spec/p4runtime/main/P4Runtime-Spec.html
> > > > 2. https://github.com/p4lang/p4runtime/blob/main/proto/p4/v1/p4runtime.proto
> > > >
> > > > Thank you for your attention to this matter.
> > > >
> > >
> > > I think that we should schedule some meeting to see how much gaps we
> > > really have between the rte_flow and
> > > P4 and how we can improve the rte_flow to allow the best experience.
> >
> > Sound a good idea!
> > >
> > > > Regards
> > > > Qi
* RE: seeking community input on adapting DPDK to P4Runtime backend
2023-05-18 14:33 ` Ori Kam
@ 2023-05-22 5:12 ` Zhang, Qi Z
2023-05-24 15:00 ` Jerin Jacob
0 siblings, 1 reply; 9+ messages in thread
From: Zhang, Qi Z @ 2023-05-22 5:12 UTC (permalink / raw)
To: Ori Kam, dev
Cc: techboard, Richardson, Bruce, Burakov, Anatoly, Wiles, Keith,
Liang, Cunming, Wu, Jingjing, Zhang, Helin, Mcnamara, John, Xu,
Rosen
> -----Original Message-----
> From: Ori Kam <orika@nvidia.com>
> Sent: Thursday, May 18, 2023 10:34 PM
> To: Zhang, Qi Z <qi.z.zhang@intel.com>; dev@dpdk.org
> Cc: techboard@dpdk.org; Richardson, Bruce <bruce.richardson@intel.com>;
> Burakov, Anatoly <anatoly.burakov@intel.com>; Wiles, Keith
> <keith.wiles@intel.com>; Liang, Cunming <cunming.liang@intel.com>; Wu,
> Jingjing <jingjing.wu@intel.com>; Zhang, Helin <helin.zhang@intel.com>;
> Mcnamara, John <john.mcnamara@intel.com>; Xu, Rosen
> <rosen.xu@intel.com>
> Subject: RE: seeking community input on adapting DPDK to P4Runtime
> backend
>
> Hi Zhang,
>
> I think we both want the same thing and share the same basic concepts.
>
> PSB, some answers,
>
> Best,
> Ori
>
>
> > -----Original Message-----
> > From: Zhang, Qi Z <qi.z.zhang@intel.com>
> > Sent: Thursday, May 18, 2023 1:33 PM
> >
> >
> >
> > > -----Original Message-----
> > > From: Ori Kam <orika@nvidia.com>
> > > Sent: Wednesday, May 17, 2023 11:19 PM
> > > To: Zhang, Qi Z <qi.z.zhang@intel.com>; dev@dpdk.org
> > > Cc: techboard@dpdk.org; Richardson, Bruce
> > > <bruce.richardson@intel.com>;
> > > Burakov, Anatoly <anatoly.burakov@intel.com>; Wiles, Keith
> > > <keith.wiles@intel.com>; Liang, Cunming <cunming.liang@intel.com>;
> > > Wu, Jingjing <jingjing.wu@intel.com>; Zhang, Helin
> > > <helin.zhang@intel.com>; Mcnamara, John <john.mcnamara@intel.com>;
> > > Xu, Rosen <rosen.xu@intel.com>
> > > Subject: RE: seeking community input on adapting DPDK to P4Runtime
> > > backend
> > >
> > > Hi Zhang,
> > >
> > > rte_flow is an excellent candidate for implementing P4.
> > > We and some internal tests that shows great promise in this regard.
> > >
> > > I would be very happy to supply any needed information and have
> > > discussion on how to continue with this project.
> >
> > Thank you Ori! Please check my following comments
> >
> > Regards
> > Qi
> >
> > >
> > > Please see inline detailed answers.
> > >
> > > Best,
> > > Ori Kam
> > >
> > >
> > >
> > >
> > > > -----Original Message-----
> > > > From: Zhang, Qi Z <qi.z.zhang@intel.com>
> > > > Sent: Monday, May 8, 2023 9:40 AM
> > > > Subject: seeking community input on adapting DPDK to P4Runtime
> > > > backend
> > > >
> > > > Hi:
> > > >
> > > > Our team is currently working on developing a DPDK PMD for a
> > > > P4-programmed network controller, based on customer feedback to
> > > > integrate DPDK into the P4Runtime backend.
> > > > [https://p4.org/p4-spec/p4runtime/main/P4Runtime-Spec.html]
> > > >
> > > > (*) However, we are facing challenges in adapting DPDK's rte_flow
> > > > API to the P4Runtime API, primarily due to the transition from a
> > > > table-based API with fields of arbitrary bits width at arbitrary
> > > > offset to a protocol-based API (more detail be described in post-script).
> > > >
> > > > We are seeking suggestions and best practices from the open-source
> > > > community to help us with this integration. Specifically, we are
> > > > interested in
> > > > learning:
> > > >
> > > > (*) If anyone has previously attempted to map rte_flow to P4-based
> > > > devices.
> > >
> > > We did try successfully.
> > >
> > > > (*) Thoughts on how to map from table-based matching to
> > > > protocol-based matching like in rte_flow.
> > >
> > > Rte_flow is table based (groups), now with the introduction of
> > > template API rte_flow is even more table based (we added the
> > > concept of tables) which are just what p4 requires.
> >
> > Yes, the rte_flow template can be used to map a sequence of patterns
> > to a
> > P4 table and a sequence of actions to a P4 action. However, Using a
> > fixed rte_flow template can be problematic when handling different P4
> > programs in the same driver. To provide more flexibility, the mapping
> > of patterns and actions can be externalized into a configuration file
> > or part of the firmware can be learned from driver, allowing for
> > customization based on the specific requirements of each P4 pipeline.
> > actually we have enabled this approach in order to accommodate different
> > P4 programs.
> >
> > However, an alternative approach to consider is whether it would be
> > feasible to directly expose the P4 table and action names or IDs to
> > the application, rather than relying on rte_flow templates. This
> > approach offers several potential benefits:
> >
> > Integration with P4runtime Backend: By exposing the P4 table and
> > action names or IDs directly, DPDK could be easily integrated as a
> > P4runtime backend. This eliminates the need for translation from the
> > P4runtime API to rte_flow templates in the application, simplifying the
> > integration process.
> >
> > Elimination of Manual Mapping: Exposing the P4 table and action names
> > or IDs to the application would remove the requirement for the
> > engineering team to manually map each pipeline to specific rte_flow
> > templates. This is particularly beneficial in cases where hardware
> > vendors provide customers with a toolchain to create their own P4
> > pipelines but do not necessarily own the P4 programs. By eliminating
> > the dependency on rte_flow templates, this approach reduces complexity
> > in using DPDK as the driver.
> >
> > To be more specific, the proposed API for exposing P4 table and action
> > names or IDs directly to the application could be as follows:
> >
> > /* Get the table info */
> > struct rte_p4_table_info tbl_info;
> > rte_p4_table_info_get_by_name(port_id, "decap_vxlan_tcp_table",
> >                               &tbl_info);
> >
> > /* Create the key (tbl_info is a struct, so use '.' rather than '->') */
> > struct rte_p4_table_key *key;
> > rte_p4_table_key_create(port_id, tbl_info.id, &key);
> >
> > /* Set the key fields: name, pointer to value, length in bytes */
> > rte_p4_table_key_field_set_by_name(port_id, key, "wire_port", &wire_port, 2);
> > rte_p4_table_key_field_set_by_name(port_id, key, "tun_ip_src", &tun_ip_src, 4);
> > rte_p4_table_key_field_set_by_name(port_id, key, "tun_ip_dst", &tun_ip_dst, 4);
> > rte_p4_table_key_field_set_by_name(port_id, key, "vni", &vni, 3);
> > rte_p4_table_key_field_set_by_name(port_id, key, "ipv4_src", &ipv4_src, 4);
> > rte_p4_table_key_field_set_by_name(port_id, key, "ipv4_dst", &ipv4_dst, 4);
> > rte_p4_table_key_field_set_by_name(port_id, key, "src_port", &src_port, 2);
> > rte_p4_table_key_field_set_by_name(port_id, key, "dst_port", &dst_port, 2);
> >
> > /* Get the action spec info */
> > struct rte_p4_action_spec_info as_info;
> > rte_p4_action_spec_info_get_by_name(port_id, "decap_vxlan_fwd",
> >                                     &as_info);
> >
> > /* Create the action */
> > struct rte_p4_action *action;
> > rte_p4_action_create(port_id, as_info.id, &action);
> >
> > /* Set the action fields */
> > rte_p4_table_action_field_set_by_name(port_id, action, "mod_id",
> >                                       &mod_id, 3);
> > rte_p4_table_action_field_set_by_name(port_id, action, "port_id",
> >                                       &target_port_id, 2);
> >
> > /* Add the entry */
> > rte_p4_table_entry_add(port_id, tbl_info.id, key, action);
> >
> > ...
> >
>
> I think that introduce some API that knows P4 is the way to go,
Good to know!
> but I think that
> this should be a very simple API which calls rte_flow.
I guess the complexity of the API implementation may depend on the underlying hardware. In our case, we can directly translate the P4 table key and action into low-level hardware configuration using hints generated by the P4 compiler, without the need for additional translation through rte_flow protocol-based templates.
Thanks
Qi
* Re: seeking community input on adapting DPDK to P4Runtime backend
2023-05-22 5:12 ` Zhang, Qi Z
@ 2023-05-24 15:00 ` Jerin Jacob
2023-05-24 15:43 ` Thomas Monjalon
0 siblings, 1 reply; 9+ messages in thread
From: Jerin Jacob @ 2023-05-24 15:00 UTC (permalink / raw)
To: Zhang, Qi Z
Cc: Ori Kam, dev, techboard, Richardson, Bruce, Burakov, Anatoly,
Wiles, Keith, Liang, Cunming, Wu, Jingjing, Zhang, Helin,
Mcnamara, John, Xu, Rosen, Kiran Kumar K, Satheesh Paul
On Mon, May 22, 2023 at 10:42 AM Zhang, Qi Z <qi.z.zhang@intel.com> wrote:
>
>
>
> > -----Original Message-----
> > From: Ori Kam <orika@nvidia.com>
> > Sent: Thursday, May 18, 2023 10:34 PM
> > To: Zhang, Qi Z <qi.z.zhang@intel.com>; dev@dpdk.org
> > Cc: techboard@dpdk.org; Richardson, Bruce <bruce.richardson@intel.com>;
> > Burakov, Anatoly <anatoly.burakov@intel.com>; Wiles, Keith
> > <keith.wiles@intel.com>; Liang, Cunming <cunming.liang@intel.com>; Wu,
> > Jingjing <jingjing.wu@intel.com>; Zhang, Helin <helin.zhang@intel.com>;
> > Mcnamara, John <john.mcnamara@intel.com>; Xu, Rosen
> > <rosen.xu@intel.com>
> > Subject: RE: seeking community input on adapting DPDK to P4Runtime
> > backend
We did some study on using rte_flow with table-driven _HW_ (HW with
capabilities similar to a P4 table).
Following are our observations on what needs improvement in rte_flow.
1) HW engines require more resources for ACL (considering the
algorithmic HW implementation and table sizes in the handful of
millions), whereas EM and LPM need fewer HW resources. In P4, we have a
means to express this; in rte_flow, the general assumption is ACL.
We may need to express the mode in rte_flow_template_table_create() or
similar. Otherwise, when a table has more than one
rte_flow_pattern_template*, the pattern_template_index of
rte_flow_async_create() can select templates with
conflicting modes. In P4, the mode is associated with a table, and the
table has a fixed KEY and VALUE. This area of rte_flow requires
improvement if we need to use it with P4-type HW.
2) rte_flow works purely in "inline" mode. If a CPU core needs to
do a lookup on the created table, we require some APIs
to support a look-aside mode.
3) Handling of raw action data
a) In P4, the action value is opaque, so maybe we need a RAW action
where the value can be a running
number from 0 to VALUE - 1.
b) Expressing a compute operation after lookup.
rte_flow actions are fixed in nature, which
suffices for a lot of use cases. Expressing the following case may
be difficult with rte_flow now.
For example:
value_from_lookup = lookup(packet, key);
if ((packet.field[x] & value_from_lookup) == value_x) {
        packet.field[x] += value_y;
        packet.field[x] ^= value_z;
}
I think such a general-programming kind of action may need
an eBPF-style program to express it.
We could add a new RTE_FLOW_ACTION_LOAD_EPF_PROGRAM to run a
simple program after the table lookup.
Either we update rte_flow to address the cases reported in this thread,
or we enhance the current rte_table library (which already has a
function-pointer-based backend),
create an object using the rte_table API, and connect the table object
with the rte_flow API.
I think we should try to enhance rte_flow for more native table
support if possible.
* Re: seeking community input on adapting DPDK to P4Runtime backend
2023-05-24 15:00 ` Jerin Jacob
@ 2023-05-24 15:43 ` Thomas Monjalon
0 siblings, 0 replies; 9+ messages in thread
From: Thomas Monjalon @ 2023-05-24 15:43 UTC (permalink / raw)
To: Jerin Jacob
Cc: Zhang, Qi Z, techboard, Ori Kam, dev, techboard, Richardson,
Bruce, Burakov, Anatoly, Wiles, Keith, Liang, Cunming, Wu,
Jingjing, Zhang, Helin, Mcnamara, John, Xu, Rosen, Kiran Kumar K,
Satheesh Paul
24/05/2023 17:00, Jerin Jacob:
> We did some study to use rte_flow on table driven _HW_ (HW has similar
> capability to p4 table)
> Following are the observations that need improvement in rte_flow.
>
>
> 1) HW engines require more resources for ACL (considering the
> algorithmic HW implementation and table size is in handful of
> millions),
> whereas EM, LPM needs less HW resources, In p4, we have means to
> express this, in rte_flow, in general assumption it is ACL.
> We may need to express the mode in rte_flow_template_table_create() or
> so. Otherwise,
> more than one rte_flow_pattern_template* templates
> pattern_template_index of rte_flow_async_create() creates
> conflicting modes. In p4, mode is associated with a table, and it has
> fixed KEY and VALUE. This area in the rte_flow requires
> improvement if we need to use with p4 type HW.
>
> 2) rte_flow is purely in working "inline" mode, If CPU core needs to
> do lookup on the table created. We require some APIs
> to look-aside mode support.
>
> 3) Handling of raw action data
>
> a) In p4, Action value is opaque, so maybe we need to have action RAW
> where value can be running
> number from 0 to VALUE - 1.
>
> b) Expressing the handling compute operation after lookup.
> rte_flow_actions are fixed in nature, which
> would suffice for a lot of use case. Expressing the following case may
> be difficult with rte_flow now.
>
> For example:
> value_from_lookup = lookup(packet, key);
> if ((packet.field[x] & value_from_lookup) == value_x) {
>         packet.field[x] += value_y;
>         packet.field[x] ^= value_z;
> }
>
> I think, such general programming paradigm kind of action may need
> eBPF kind program to express.
> Where we can add new RTE_FLOW_ACTION_LOAD_EPF_PROGRAM to run through a
> simple program after table lookup.
>
> Either, we can update the rte_flow to address the cases reported in the thread
> or enhance the current rte_table library(which already has a function
> pointer based backend) and
> create an object using the rte_table API and connect the table object
> with rte_flow API.
>
> I think, we should try to enhance rte_flow for more native table
> support if possible.
I agree with enhancing rte_flow in general.
I suspect that most of the features above are already possible
using some little-known properties of rte_flow.
For instance, modifying a packet is possible with RTE_FLOW_ACTION_TYPE_MODIFY_FIELD.
Thread overview: 9+ messages
2023-05-08 6:39 seeking community input on adapting DPDK to P4Runtime backend Zhang, Qi Z
2023-05-17 15:18 ` Ori Kam
2023-05-18 10:33 ` Zhang, Qi Z
2023-05-18 14:33 ` Ori Kam
2023-05-22 5:12 ` Zhang, Qi Z
2023-05-24 15:00 ` Jerin Jacob
2023-05-24 15:43 ` Thomas Monjalon
2023-05-18 14:45 ` Honnappa Nagarahalli
2023-05-22 4:58 ` Zhang, Qi Z