From: "Zhang, Qi Z" <qi.z.zhang@intel.com>
To: Ori Kam <orika@nvidia.com>, "dev@dpdk.org" <dev@dpdk.org>
Cc: "techboard@dpdk.org" <techboard@dpdk.org>,
"Richardson, Bruce" <bruce.richardson@intel.com>,
"Burakov, Anatoly" <anatoly.burakov@intel.com>,
"Wiles, Keith" <keith.wiles@intel.com>,
"Liang, Cunming" <cunming.liang@intel.com>,
"Wu, Jingjing" <jingjing.wu@intel.com>,
"Zhang, Helin" <helin.zhang@intel.com>,
"Mcnamara, John" <john.mcnamara@intel.com>,
"Xu, Rosen" <rosen.xu@intel.com>
Subject: RE: seeking community input on adapting DPDK to P4Runtime backend
Date: Thu, 18 May 2023 10:33:07 +0000
Message-ID: <DM4PR11MB599450B2422CDA8BB351560DD77F9@DM4PR11MB5994.namprd11.prod.outlook.com>
In-Reply-To: <MW2PR12MB46661C7EDC20D4612B05E1ABD67E9@MW2PR12MB4666.namprd12.prod.outlook.com>
> -----Original Message-----
> From: Ori Kam <orika@nvidia.com>
> Sent: Wednesday, May 17, 2023 11:19 PM
> To: Zhang, Qi Z <qi.z.zhang@intel.com>; dev@dpdk.org
> Cc: techboard@dpdk.org; Richardson, Bruce <bruce.richardson@intel.com>;
> Burakov, Anatoly <anatoly.burakov@intel.com>; Wiles, Keith
> <keith.wiles@intel.com>; Liang, Cunming <cunming.liang@intel.com>; Wu,
> Jingjing <jingjing.wu@intel.com>; Zhang, Helin <helin.zhang@intel.com>;
> Mcnamara, John <john.mcnamara@intel.com>; Xu, Rosen
> <rosen.xu@intel.com>
> Subject: RE: seeking community input on adapting DPDK to P4Runtime
> backend
>
> Hi Zhang,
>
> rte_flow is an excellent candidate for implementing P4.
> We have some internal tests that show great promise in this regard.
>
> I would be very happy to supply any needed information and to have a
> discussion on how to continue with this project.
Thank you, Ori! Please see my comments below.

Regards
Qi
>
> Please see inline detailed answers.
>
> Best,
> Ori Kam
>
>
>
>
> > -----Original Message-----
> > From: Zhang, Qi Z <qi.z.zhang@intel.com>
> > Sent: Monday, May 8, 2023 9:40 AM
> > Subject: seeking community input on adapting DPDK to P4Runtime
> > backend
> >
> > Hi:
> >
> > Our team is currently working on developing a DPDK PMD for a
> > P4-programmed network controller, based on customer feedback to
> > integrate DPDK into the P4Runtime backend
> > [https://p4.org/p4-spec/p4runtime/main/P4Runtime-Spec.html].
> >
> > (*) However, we are facing challenges in adapting DPDK's rte_flow API
> > to the P4Runtime API, primarily due to the transition from a
> > table-based API with fields of arbitrary bits width at arbitrary
> > offset to a protocol-based API (more details are in the post-script).
> >
> > We are seeking suggestions and best practices from the open-source
> > community to help us with this integration. Specifically, we are
> > interested in
> > learning:
> >
> > (*) If anyone has previously attempted to map rte_flow to P4-based
> > devices.
>
> We did try successfully.
>
> > (*) Thoughts on how to map from table-based matching to protocol-based
> > matching like in rte_flow.
>
> rte_flow is table based (groups). With the introduction of the template
> API, rte_flow is even more table based (we added the concept of tables),
> which is just what P4 requires.
Yes, an rte_flow template can map a sequence of patterns to a P4 table and
a sequence of actions to a P4 action. However, a fixed rte_flow template
becomes a problem when the same driver has to handle different P4 programs.
For more flexibility, the mapping of patterns and actions can be
externalized into a configuration file, or made part of the firmware so
that the driver can learn it, which allows customization for the specific
requirements of each P4 pipeline. We have in fact enabled this approach in
order to accommodate different P4 programs.

However, an alternative worth considering is to expose the P4 table and
action names or IDs directly to the application, rather than relying on
rte_flow templates. This approach offers several potential benefits:

(*) Integration with the P4Runtime backend: with the P4 table and action
names or IDs exposed directly, DPDK could easily be integrated as a
P4Runtime backend. The application no longer needs to translate from the
P4Runtime API to rte_flow templates, which simplifies the integration.

(*) Elimination of manual mapping: the engineering team would no longer
have to manually map each pipeline to specific rte_flow templates. This is
particularly beneficial when hardware vendors provide customers with a
toolchain to create their own P4 pipelines but do not necessarily own the
P4 programs. Removing the dependency on rte_flow templates reduces the
complexity of using DPDK as the driver.

To be more specific, the proposed API for exposing P4 table and action
names or IDs directly to the application could look as follows:
/* Get the table info */
struct rte_p4_table_info tbl_info;
rte_p4_table_info_get_by_name(port_id, "decap_vxlan_tcp_table", &tbl_info);

/* Create the key (tbl_info is a struct, so its ID is read with '.') */
struct rte_p4_table_key *key;
rte_p4_table_key_create(port_id, tbl_info.id, &key);

/* Set the key fields; the last argument is the field size in bytes */
rte_p4_table_key_field_set_by_name(port_id, key, "wire_port", &wire_port, 2);
rte_p4_table_key_field_set_by_name(port_id, key, "tun_ip_src", &tun_ip_src, 4);
rte_p4_table_key_field_set_by_name(port_id, key, "tun_ip_dst", &tun_ip_dst, 4);
rte_p4_table_key_field_set_by_name(port_id, key, "vni", &vni, 3);
rte_p4_table_key_field_set_by_name(port_id, key, "ipv4_src", &ipv4_src, 4);
rte_p4_table_key_field_set_by_name(port_id, key, "ipv4_dst", &ipv4_dst, 4);
rte_p4_table_key_field_set_by_name(port_id, key, "src_port", &src_port, 2);
rte_p4_table_key_field_set_by_name(port_id, key, "dst_port", &dst_port, 2);

/* Get the action spec info */
struct rte_p4_action_spec_info as_info;
rte_p4_action_spec_info_get_by_name(port_id, "decap_vxlan_fwd", &as_info);

/* Create the action */
struct rte_p4_action *action;
rte_p4_action_create(port_id, as_info.id, &action);

/* Set the action fields */
rte_p4_table_action_field_set_by_name(port_id, action, "mod_id", &mod_id, 3);
rte_p4_table_action_field_set_by_name(port_id, action, "port_id", &target_port_id, 2);

/* Add the entry */
rte_p4_table_entry_add(port_id, tbl_info.id, key, action);
...
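
To make the by-name lookups above a little more concrete, here is a minimal
self-contained sketch of how a name could be resolved to a compiler-assigned
ID from a hint table. Everything in it is an assumption for illustration
only: the struct and function names (p4_field_hint, p4_table_hint,
p4_field_lookup, p4_field_byte_width) are hypothetical and not part of DPDK
or of the proposed API. The table ID and field data are copied from the
compiler hint example quoted in the post-script.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical per-field hint, mirroring the compiler hint in the post-script. */
struct p4_field_hint {
	uint32_t id;        /* field ID assigned by the P4 compiler */
	const char *name;   /* name given via @name() in the P4 source */
	uint16_t bit_width; /* width of the match field in bits */
};

/* Hypothetical per-table hint: table ID, name and its key fields. */
struct p4_table_hint {
	uint32_t id;
	const char *name;
	const struct p4_field_hint *fields;
	size_t nb_fields;
};

/* Key fields of decap_vxlan_tcp_table, taken from the hint example. */
static const struct p4_field_hint decap_vxlan_tcp_fields[] = {
	{ 1, "tun_ip_src", 32 },
	{ 2, "tun_ip_dst", 32 },
	{ 3, "vni",        24 },
	{ 4, "ipv4_src",   32 },
	{ 5, "ipv4_dst",   32 },
	{ 6, "src_port",   16 },
	{ 7, "dst_port",   16 },
};

static const struct p4_table_hint decap_vxlan_tcp_table = {
	.id = 8454144,
	.name = "decap_vxlan_tcp_table",
	.fields = decap_vxlan_tcp_fields,
	.nb_fields = sizeof(decap_vxlan_tcp_fields) /
		     sizeof(decap_vxlan_tcp_fields[0]),
};

/* Return the hint for the named key field, or NULL if the table has none. */
static const struct p4_field_hint *
p4_field_lookup(const struct p4_table_hint *tbl, const char *name)
{
	assert(tbl != NULL && name != NULL);
	for (size_t i = 0; i < tbl->nb_fields; i++) {
		if (strcmp(tbl->fields[i].name, name) == 0)
			return &tbl->fields[i];
	}
	return NULL;
}

/* Number of bytes a caller has to supply for this field (bits rounded up). */
static size_t
p4_field_byte_width(const struct p4_field_hint *field)
{
	assert(field != NULL);
	return (field->bit_width + 7u) / 8u;
}
```

With such a table, a by-name setter could check the size passed by the
application against p4_field_byte_width() and reject any name for which
p4_field_lookup() returns NULL. A linear search is enough for a sketch; a
real driver would more likely resolve names to IDs once and cache them.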
>
> > (*) Any ideas on how to extend or expand the rte_flow APIs to better
> > accommodate P4-based or other table-matching based devices.
> >
>
> Let's discuss any issues you have.
>
> > Your insights and feedback would be greatly appreciated!
> >
> > ======================= Post-Script ============================
> >
> > More details on the problem below, for anyone interested
> >
> > In P4, flow offloading can be implemented using the P4Runtime API,
> > which provides a standard interface for controlling and configuring
> > the data plane behavior of network devices. P4Runtime allows network
> > operators to dynamically add, modify, and remove flow rules in the
> > hardware forwarding tables of P4-enabled devices.
> >
> > The P4Runtime API is a table-based API; it assumes the packet processing
> > pipeline consists of one or more key/action units (tables). In
> > P4Runtime, each table defines the fields to be matched and the actions
> > to be taken on incoming packets. During compilation, the P4 compiler
> > assigns a unique
> > uint32 ID to each table, action, and field, which is associated with
> > its corresponding string name. These IDs have no inherent relationship
> > to any network protocol but instead serve as a means to identify
> > different components of a P4 program within the P4Runtime API.
> >
> This is the concept of tables and groups in rte_flow.
>
> > If we choose to use rte_flow as the low-level API for P4Runtime, a
> > translation layer is needed in the application to map the P4 tables
> > and actions to the corresponding rte_flow rules. However, this
> > translation layer can be problematic as it is not easily scalable.
> > When the P4 pipeline is refined or updated, the translation rules may
> > also need to be updated, which can result in errors and reduced efficiency.
> >
> I don't understand why.
>
> > On the other hand, a hardware vendor that provides a P4-enabled device
> > is required to implement an rte_flow interface in their DPDK PMD.
> > Typically, the
> > P4 compiler generates hints for the driver on how to map P4 tables to
> > hardware resources, and how to convert table entry add/modify/delete
> > actions into low-level hardware configurations. However, because
> > rte_flow is protocol-based, it poses an additional challenge for
> > driver developers, who must create another translation layer to
> > convert rte_flow tokens into P4 object identifiers. This translation
> > layer must be carefully designed and implemented to ensure optimal
> > performance and scalability, and to ensure that the driver can efficiently
> > handle the dynamic nature of P4 programs.
> >
> Right, but some of the translation can be done in shared code by all PMDs
> and the translation is static for the compilation so inserting rules can be
> super fast with no need for extra work.
>
> > To better understand the problem, let's consider the following example
> > that demonstrates how to use the P4Runtime API to program a rule for
> > processing a VXLAN packet. The rule matches a VXLAN packet,
> > decapsulates the tunnel header, and forwards it to a specific port.
> >
> > The P4 source code below describes the VXLAN decap table
> > decap_vxlan_tcp_table, which matches the outer IP address, VNI, inner
> > IP address, and inner TCP port. For each rule, four action
> > specifications can be selected. We will focus on one action
> > specification decap_vxlan_fwd that performs decapsulation and forwards
> > the packet to a specific port.
> >
> > table decap_vxlan_tcp_table {
> >     key = {
> >         hdrs.ipv4[meta.depth-1].src_ip: exact @name("tun_ip_src");
> >         hdrs.ipv4[meta.depth-1].dst_ip: exact @name("tun_ip_dst");
> >         hdrs.vxlan[meta.depth-1].vni  : exact @name("vni");
> >         hdrs.ipv4[meta.depth].src_ip  : exact @name("ipv4_src");
> >         hdrs.ipv4[meta.depth].dst_ip  : exact @name("ipv4_dst");
> >         hdrs.tcp.sport                : exact @name("src_port");
> >         hdrs.tcp.dport                : exact @name("dst_port");
> >     }
> >     actions = {
> >         @tableonly decap_vxlan_fwd;
> >         @tableonly decap_vxlan_dnat_fwd;
> >         @tableonly decap_vxlan_snat_fwd;
> >         @defaultonly set_exception;
> >     }
> > }
> Translate to rte_flow:
> template pattern relaxed_mode = 1
> pattern = ipv4_src / ipv4_dst / vni / ipv4_src / ipv4_dst / tcp_sport / tcp_dport
> map structure = {
>     tun_ip_src = &pattern[ipv4_src]
>     ....
> }
> > ...
> >
> > action decap_vxlan_fwd(PortId_t port_id) {
> >     meta.mod_action = (bit<11>)VXLAN_DECAP_OUTER_IPV4;
> >     send_to_port(port_id);
> > }
> >
> Same as above just with action template
>
> > Below is an example of the hint that the compiler will generate for
> > the decap_vxlan_tcp_table:
> >
> > Table ID: 8454144
> > Name: decap_vxlan_tcp_table
> >
> > Field ID  Name        Match Type  Bit Width  Byte Width  Byte Order
> > 1         tun_ip_src  exact       32         4           network
> > 2         tun_ip_dst  exact       32         4           network
> > 3         vni         exact       24         3           network
> > 4         ipv4_src    exact       32         4           network
> > 5         ipv4_dst    exact       32         4           network
> > 6         src_port    exact       16         2           network
> > 7         dst_port    exact       16         2           network
> >
> > Spec ID   Name
> > 8519716   decap_vxlan_fwd
> > 8519718   decap_vxlan_dnat_fwd
> > 8519720   decap_vxlan_snat_fwd
> > 8519695   set_exception
> >
> > And the hint for the action spec "decap_vxlan_fwd" is as below:
> >
> > Spec ID: 8519716
> > Name: decap_vxlan_fwd
> >
> > Field ID  Name     Bit Width  Byte Width  Byte Order
> > 1         port_id  32         4           host
> >
> > Please note that different compilers may assign different IDs.
> >
> > Below is an example of how to program a rule using the P4 runtime API
> > in JSON format. This rule matches fields and directs packets to port 5.
> >
> > {
> >   "type": 1, // INSERT
> >   "entity": {
> >     "table_entry": {
> >       "table_id": 8454144,
> >       "match": [
> >         { "field_id": 1, "exact": { "value": [10, 0, 0, 1] } },  // outer src IP = 10.0.0.1
> >         { "field_id": 2, "exact": { "value": [10, 0, 0, 2] } },  // outer dst IP = 10.0.0.2
> >         { "field_id": 3, "exact": { "value": [0, 0, 10] } },     // vni = 10
> >         { "field_id": 4, "exact": { "value": [192, 0, 0, 1] } }, // inner src IP = 192.0.0.1
> >         { "field_id": 5, "exact": { "value": [192, 0, 0, 2] } }, // inner dst IP = 192.0.0.2
> >         { "field_id": 6, "exact": { "value": [0, 200] } },       // tcp src port = 200
> >         { "field_id": 7, "exact": { "value": [0, 201] } },       // tcp dst port = 201
> >       ],
> >       "action": {
> >         "action": {
> >           "action_id": 8519716,
> >           "params": [
> >             { "param_id": 1, "value": [5, 0, 0, 0] }
> >           ]
> >         }
> >       },
> >       ...
> >     }
> >   } ...
> > }
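
The match values in the JSON above are byte strings in network byte order,
as the "Byte Order" column of the table hint indicates. As a small sketch of
that encoding, the helper below writes an integer into a buffer of the
field's byte width, most significant byte first. The name p4_encode_be is
hypothetical and exists only for this illustration; it is not a DPDK or
P4Runtime function.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper: write the low `width` bytes of `value` into `out`
 * in network byte order (most significant byte first).
 */
static void
p4_encode_be(uint64_t value, size_t width, uint8_t *out)
{
	assert(out != NULL && width >= 1 && width <= 8);
	for (size_t i = 0; i < width; i++) {
		/* Byte i holds bits [8*(width-1-i), 8*(width-i)) of value. */
		out[i] = (uint8_t)(value >> (8 * (width - 1 - i)));
	}
}
```

Encoding VNI 10 with a width of 3 yields the bytes 0, 0, 10, and TCP port
200 with a width of 2 yields 0, 200, which matches the "value" arrays for
field IDs 3 and 6. The action parameter [5, 0, 0, 0] is marked as host byte
order in the action spec hint, so it would not go through this helper; that
byte layout for the value 5 corresponds to a little-endian host.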
> >
> > Please note that this is only a part of the full command. For more
> > information, please refer to the p4runtime.proto [2].
> >
> > 1. https://p4.org/p4-spec/p4runtime/main/P4Runtime-Spec.html
> > 2. https://github.com/p4lang/p4runtime/blob/main/proto/p4/v1/p4runtime.proto
> >
> > Thank you for your attention to this matter.
> >
>
> I think we should schedule a meeting to see how large the gaps between
> rte_flow and P4 really are, and how we can improve rte_flow to allow the
> best experience.
Sounds like a good idea!
>
> > Regards
> > Qi