From: Honnappa Nagarahalli <Honnappa.Nagarahalli@arm.com>
To: "Zhang, Qi Z" <qi.z.zhang@intel.com>, Ori Kam <orika@nvidia.com>,
"dev@dpdk.org" <dev@dpdk.org>
Cc: "techboard@dpdk.org" <techboard@dpdk.org>,
"Richardson, Bruce" <bruce.richardson@intel.com>,
"Burakov, Anatoly" <anatoly.burakov@intel.com>,
"Wiles, Keith" <keith.wiles@intel.com>,
"Liang, Cunming" <cunming.liang@intel.com>,
"Wu, Jingjing" <jingjing.wu@intel.com>,
"Zhang, Helin" <helin.zhang@intel.com>,
"Mcnamara, John" <john.mcnamara@intel.com>,
"Xu, Rosen" <rosen.xu@intel.com>, nd <nd@arm.com>,
nd <nd@arm.com>
Subject: RE: seeking community input on adapting DPDK to P4Runtime backend
Date: Thu, 18 May 2023 14:45:38 +0000
Message-ID: <DBAPR08MB5814B1E669A654DF452BBA96987F9@DBAPR08MB5814.eurprd08.prod.outlook.com>
In-Reply-To: <DM4PR11MB599450B2422CDA8BB351560DD77F9@DM4PR11MB5994.namprd11.prod.outlook.com>
<snip>
> >
> > Hi Zhang,
> >
> > rte_flow is an excellent candidate for implementing P4.
> > We did some internal tests that show great promise in this regard.
> >
> > I would be very happy to supply any needed information and have
> > discussion on how to continue with this project.
>
> Thank you, Ori! Please see my comments below.
>
> Regards
> Qi
>
> >
> > Please see inline detailed answers.
> >
> > Best,
> > Ori Kam
> >
> >
> >
> >
> > > -----Original Message-----
> > > From: Zhang, Qi Z <qi.z.zhang@intel.com>
> > > Sent: Monday, May 8, 2023 9:40 AM
> > > Subject: seeking community input on adapting DPDK to P4Runtime backend
> > >
> > > Hi:
> > >
> > > Our team is currently developing a DPDK PMD for a P4-programmed
> > > network controller, based on customer requests to integrate DPDK
> > > into the P4Runtime backend
> > > [https://p4.org/p4-spec/p4runtime/main/P4Runtime-Spec.html].
> > >
> > > (*) However, we are facing challenges in adapting DPDK's rte_flow
> > > API to the P4Runtime API, primarily due to the transition from a
> > > table-based API with fields of arbitrary bit width at arbitrary
> > > offsets to a protocol-based API (described in more detail in the
> > > post-script).
> > >
> > > We are seeking suggestions and best practices from the open-source
> > > community to help us with this integration. Specifically, we are
> > > interested in learning:
> > >
> > > (*) Whether anyone has previously attempted to map rte_flow to
> > > P4-based devices.
> >
> > We did try successfully.
> >
> > > (*) Thoughts on how to map from table-based matching to
> > > protocol-based matching like in rte_flow.
> >
> > rte_flow is table based (groups). With the introduction of the
> > template API, rte_flow is even more table based: we added the concept
> > of tables, which is exactly what P4 requires.
>
> Yes, the rte_flow template API can be used to map a sequence of patterns to a P4
> table and a sequence of actions to a P4 action. However, using a fixed rte_flow
> template can be problematic when handling different P4 programs in the same
> driver. To provide more flexibility, the mapping of patterns and actions can be
> externalized into a configuration file, or embedded in the firmware and learned
> by the driver, allowing for customization based on the specific requirements of
> each P4 pipeline. In fact, we have enabled this approach in order to
> accommodate different P4 programs.
>
> However, an alternative approach to consider is whether it would be feasible to
> directly expose the P4 table and action names or IDs to the application, rather
> than relying on rte_flow templates. This approach offers several potential
> benefits:
>
> Integration with P4Runtime Backend: By exposing the P4 table and action names
> or IDs directly, DPDK could be easily integrated as a P4Runtime backend. This
> eliminates the need for translation from the P4Runtime API to rte_flow
> templates in the application, simplifying the integration process.
>
> Elimination of Manual Mapping: Exposing the P4 table and action names or IDs
> to the application would remove the requirement for the engineering team to
> manually map each pipeline to specific rte_flow templates. This is particularly
> beneficial in cases where hardware vendors provide customers with a toolchain
> to create their own P4 pipelines but do not necessarily own the P4 programs. By
> eliminating the dependency on rte_flow templates, this approach reduces
> complexity in using DPDK as the driver.
>
> To be more specific, the proposed API for exposing P4 table and action names or
> IDs directly to the application could be as follows:
>
> /* Get the table info */
> struct rte_p4_table_info tbl_info;
> rte_p4_table_info_get_by_name(port_id, "decap_vxlan_tcp_table", &tbl_info);
>
> /* Create the key */
> struct rte_p4_table_key *key;
> rte_p4_table_key_create(port_id, tbl_info.id, &key);
>
> /* Set the key fields */
> rte_p4_table_key_field_set_by_name(port_id, key, "wire_port", &wire_port, 2);
> rte_p4_table_key_field_set_by_name(port_id, key, "tun_ip_src", &tun_ip_src, 4);
> rte_p4_table_key_field_set_by_name(port_id, key, "tun_ip_dst", &tun_ip_dst, 4);
> rte_p4_table_key_field_set_by_name(port_id, key, "vni", &vni, 3);
> rte_p4_table_key_field_set_by_name(port_id, key, "ipv4_src", &ipv4_src, 4);
> rte_p4_table_key_field_set_by_name(port_id, key, "ipv4_dst", &ipv4_dst, 4);
> rte_p4_table_key_field_set_by_name(port_id, key, "src_port", &src_port, 2);
> rte_p4_table_key_field_set_by_name(port_id, key, "dst_port", &dst_port, 2);
>
> /* Get the action spec info */
> struct rte_p4_action_spec_info as_info;
> rte_p4_action_spec_info_get_by_name(port_id, "decap_vxlan_fwd", &as_info);
>
> /* Create the action */
> struct rte_p4_action *action;
> rte_p4_action_create(port_id, as_info.id, &action);
>
> /* Set the action fields */
> rte_p4_table_action_field_set_by_name(port_id, action, "mod_id", &mod_id, 3);
> rte_p4_table_action_field_set_by_name(port_id, action, "port_id", &target_port_id, 2);
>
> /* Add the entry */
> rte_p4_table_entry_add(port_id, tbl_info.id, key, action);
These do not look P4-specific; they could just be generic APIs. Could we have these as rte_flow APIs?
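To illustrate why these calls need not be P4-specific, here is a minimal sketch under stated assumptions: a by-name key setter only needs each field's name and byte width, which the decap_vxlan_tcp_table compiler hint in Qi's post-script already provides. The names struct tbl_field, struct tbl_info, struct tbl_key and tbl_key_field_set_by_name are hypothetical illustrations, not existing DPDK or rte_flow APIs, and the back-to-back field packing is an assumption.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical generic table metadata: field names and byte widths only,
 * nothing P4-specific. Widths mirror the decap_vxlan_tcp_table hint. */
struct tbl_field {
    const char *name;
    size_t width; /* bytes */
};

struct tbl_info {
    const struct tbl_field *fields;
    size_t nb_fields;
};

/* Assumed layout: fields packed back to back in declaration order. */
struct tbl_key {
    uint8_t bytes[32];
};

static const struct tbl_field decap_vxlan_tcp_fields[] = {
    { "tun_ip_src", 4 }, { "tun_ip_dst", 4 }, { "vni", 3 },
    { "ipv4_src", 4 },   { "ipv4_dst", 4 },   { "src_port", 2 },
    { "dst_port", 2 },
};

static const struct tbl_info decap_vxlan_tcp_info = {
    decap_vxlan_tcp_fields,
    sizeof(decap_vxlan_tcp_fields) / sizeof(decap_vxlan_tcp_fields[0]),
};

/* Copy a field value into the key by name. Returns 0 on success, or -1 if
 * the name is unknown or len does not match the declared field width. */
static int
tbl_key_field_set_by_name(const struct tbl_info *info, struct tbl_key *key,
                          const char *name, const void *value, size_t len)
{
    size_t off = 0;

    assert(info != NULL && key != NULL && name != NULL);
    for (size_t i = 0; i < info->nb_fields; i++) {
        if (strcmp(info->fields[i].name, name) == 0) {
            if (len != info->fields[i].width ||
                off + len > sizeof(key->bytes))
                return -1;
            memcpy(&key->bytes[off], value, len);
            return 0;
        }
        off += info->fields[i].width; /* skip over the preceding field */
    }
    return -1;
}
```

Because this metadata lists only the seven keys in the P4 source, a lookup for a name it does not contain (for example "wire_port") fails, as does a length that disagrees with the hint.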
>
> ...
>
>
>
>
>
> >
> > > (*) Any ideas on how to extend or expand the rte_flow APIs to better
> > > accommodate P4-based or other table-matching based devices.
> > >
> >
> > Let's discuss any issue you have.
> >
> > > Your insights and feedback would be greatly appreciated!
> > >
> > > ======================= Post-Script ============================
> > >
> > > More details on the problem below, for anyone interested
> > >
> > > In P4, flow offloading can be implemented using the P4Runtime API,
> > > which provides a standard interface for controlling and configuring
> > > the data plane behavior of network devices. P4Runtime allows network
> > > operators to dynamically add, modify, and remove flow rules in the
> > > hardware forwarding tables of P4-enabled devices.
> > >
> > > The P4Runtime API is a table-based API: it assumes the packet
> > > processing pipeline consists of one or more key/action units (tables).
> > > In P4Runtime, each table defines the fields to be matched and the
> > > actions to be taken on incoming packets. During compilation, the P4
> > > compiler assigns a unique uint32 ID to each table, action, and field,
> > > which is associated with its corresponding string name. These IDs have
> > > no inherent relationship to any network protocol; instead, they serve
> > > to identify the different components of a P4 program within the
> > > P4Runtime API.
> > >
> > This is the concept of tables and groups in rte_flow.
> >
> > > If we choose to use rte_flow as the low-level API for P4Runtime, a
> > > translation layer is needed in the application to map the P4 tables
> > > and actions to the corresponding rte_flow rules. However, this
> > > translation layer can be problematic as it is not easily scalable.
> > > When the P4 pipeline is refined or updated, the translation rules
> > > may also need to be updated, which can result in errors and reduced
> > > efficiency.
> > >
> > I don't understand why.
> >
> > > On the other hand, a hardware vendor that provides a P4-enabled
> > > device is required to implement an rte_flow interface in their DPDK PMD.
> > > Typically, the
> > > P4 compiler generates hints for the driver on how to map P4 tables
> > > to hardware resources, and how to convert table entry
> > > add/modify/delete actions into low-level hardware configurations.
> > > However, because rte_flow is protocol-based, it poses an additional
> > > challenge for driver developers, who must create another translation
> > > layer to convert rte_flow tokens into P4 object identifiers. This
> > > translation layer must be carefully designed and implemented to
> > > ensure optimal performance and scalability, and to ensure that the
> > > driver can efficiently handle the dynamic nature of P4 programs.
> > >
> > Right, but some of the translation can be done in code shared by all
> > PMDs, and the translation is static for a given compilation, so
> > inserting rules can be super fast with no extra work.
> >
> > > To better understand the problem, let's consider the following
> > > example that demonstrates how to use the P4Runtime API to program a
> > > rule for processing a VXLAN packet. The rule matches a VXLAN packet,
> > > decapsulates the tunnel header, and forwards it to a specific port.
> > >
> > > The P4 source code below describes the VXLAN decap table
> > > decap_vxlan_tcp_table, which matches the outer IP address, VNI,
> > > inner IP address, and inner TCP port. For each rule, one of four
> > > action specifications can be selected. We will focus on one action
> > > specification, decap_vxlan_fwd, which performs decapsulation and
> > > forwards the packet to a specific port.
> > >
> > > table decap_vxlan_tcp_table {
> > >     key = {
> > >         hdrs.ipv4[meta.depth-1].src_ip : exact @name("tun_ip_src");
> > >         hdrs.ipv4[meta.depth-1].dst_ip : exact @name("tun_ip_dst");
> > >         hdrs.vxlan[meta.depth-1].vni   : exact @name("vni");
> > >         hdrs.ipv4[meta.depth].src_ip   : exact @name("ipv4_src");
> > >         hdrs.ipv4[meta.depth].dst_ip   : exact @name("ipv4_dst");
> > >         hdrs.tcp.sport                 : exact @name("src_port");
> > >         hdrs.tcp.dport                 : exact @name("dst_port");
> > >     }
> > >     actions = {
> > >         @tableonly decap_vxlan_fwd;
> > >         @tableonly decap_vxlan_dnat_fwd;
> > >         @tableonly decap_vxlan_snat_fwd;
> > >         @defaultonly set_exception;
> > >     }
> > > }
> > Translate to rte_flow:
> > template pattern relaxed_mode = 1
> >     pattern = ipv4_src / ipv4_dst / vni / ipv4_src / ipv4_dst / tcp_sport / tcp_dport
> >     map structure = {
> >         tun_ip_src = &pattern[ipv4_src]
> >         ....
> >     }
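Ori's point that the translation is static for a given compilation can be sketched as a lookup table built once per P4 program (or loaded from a configuration file, as Qi describes). Indexed by P4 field ID, each entry records which protocol item the value belongs to and at what offset, so inserting a rule is only a bounded copy per field. The slot enum, struct item_spec, struct field_map and translate_field are hypothetical simplifications, not rte_flow types. The offsets are the standard header ones: IPv4 source and destination at bytes 12 and 16, the VXLAN VNI at bytes 4-6, and the TCP ports at bytes 0 and 2.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Simplified, hypothetical stand-ins for protocol-based pattern items. */
enum item_slot { OUTER_IPV4, VXLAN, INNER_IPV4, TCP, NB_SLOTS };

struct item_spec {
    uint8_t hdr[20]; /* raw header bytes in network order */
};

/* Where one P4 key field lands in the protocol-based pattern. */
struct field_map {
    enum item_slot slot;
    size_t offset; /* byte offset within the header */
    size_t width;  /* bytes, from the compiler hint */
};

/* Built once per compiled P4 program; indexed by P4 field ID 1..7. */
static const struct field_map decap_vxlan_tcp_map[8] = {
    [1] = { OUTER_IPV4, 12, 4 }, /* tun_ip_src -> outer IPv4 source */
    [2] = { OUTER_IPV4, 16, 4 }, /* tun_ip_dst -> outer IPv4 destination */
    [3] = { VXLAN,       4, 3 }, /* vni        -> VXLAN VNI */
    [4] = { INNER_IPV4, 12, 4 }, /* ipv4_src   -> inner IPv4 source */
    [5] = { INNER_IPV4, 16, 4 }, /* ipv4_dst   -> inner IPv4 destination */
    [6] = { TCP,         0, 2 }, /* src_port   -> TCP source port */
    [7] = { TCP,         2, 2 }, /* dst_port   -> TCP destination port */
};

/* Copy one P4 match value into its pattern item: a table lookup, no parsing.
 * Returns 0 on success, -1 for an unknown field ID or a wrong length. */
static int
translate_field(struct item_spec specs[NB_SLOTS], uint32_t field_id,
                const uint8_t *value, size_t len)
{
    const struct field_map *m;

    if (field_id == 0 || field_id >= 8)
        return -1;
    m = &decap_vxlan_tcp_map[field_id];
    if (len != m->width)
        return -1;
    assert(m->offset + m->width <= sizeof(specs[m->slot].hdr));
    memcpy(&specs[m->slot].hdr[m->offset], value, len);
    return 0;
}
```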
> > > ...
> > >
> > > action decap_vxlan_fwd(PortId_t port_id) {
> > >     meta.mod_action = (bit<11>)VXLAN_DECAP_OUTER_IPV4;
> > >     send_to_port(port_id);
> > > }
> > >
> > Same as above, just with an action template.
> >
> > > Below is an example of the hint that the compiler will generate for
> > > decap_vxlan_tcp_table:
> > >
> > > Table ID: 8454144
> > > Name: decap_vxlan_tcp_table
> > >
> > > Field ID  Name        Match Type  Bit Width  Byte Width  Byte Order
> > > 1         tun_ip_src  exact       32         4           network
> > > 2         tun_ip_dst  exact       32         4           network
> > > 3         vni         exact       24         3           network
> > > 4         ipv4_src    exact       32         4           network
> > > 5         ipv4_dst    exact       32         4           network
> > > 6         src_port    exact       16         2           network
> > > 7         dst_port    exact       16         2           network
> > >
> > > Spec ID   Name
> > > 8519716   decap_vxlan_fwd
> > > 8519718   decap_vxlan_dnat_fwd
> > > 8519720   decap_vxlan_snat_fwd
> > > 8519695   set_exception
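The Byte Width column follows from the Bit Width column by rounding up to whole bytes. Assuming the key fields are packed back to back with no padding (the hint itself does not state a layout), the packed key for this table would be 23 bytes:

```latex
\begin{aligned}
\text{byte width} &= \left\lceil \tfrac{\text{bit width}}{8} \right\rceil,
  \qquad \text{vni: } \left\lceil \tfrac{24}{8} \right\rceil = 3 \\
\text{key length} &= \underbrace{4 + 4}_{\text{tunnel IPs}}
  + \underbrace{3}_{\text{vni}}
  + \underbrace{4 + 4}_{\text{inner IPs}}
  + \underbrace{2 + 2}_{\text{TCP ports}} = 23 \text{ bytes}
\end{aligned}
```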
> > >
> > > And the hint for action spec "decap_vxlan_fwd" is as follows:
> > >
> > > Spec ID: 8519716
> > > Name: decap_vxlan_fwd
> > >
> > > Field ID  Name     Bit Width  Byte Width  Byte Order
> > > 1         port_id  32         4           host
> > >
> > > Please note that different compilers may assign different IDs.
> > >
> > > Below is an example of how to program a rule using the P4Runtime
> > > API in JSON format. This rule matches the fields above and directs
> > > packets to port 5.
> > >
> > > {
> > >   "type": 1, // INSERT
> > >   "entity": {
> > >     "table_entry": {
> > >       "table_id": 8454144,
> > >       "match": [
> > >         { "field_id": 1, "exact": { "value": [10, 0, 0, 1] } },   // outer src IP = 10.0.0.1
> > >         { "field_id": 2, "exact": { "value": [10, 0, 0, 2] } },   // outer dst IP = 10.0.0.2
> > >         { "field_id": 3, "exact": { "value": [0, 0, 10] } },      // vni = 10
> > >         { "field_id": 4, "exact": { "value": [192, 0, 0, 1] } },  // inner src IP = 192.0.0.1
> > >         { "field_id": 5, "exact": { "value": [192, 0, 0, 2] } },  // inner dst IP = 192.0.0.2
> > >         { "field_id": 6, "exact": { "value": [0, 200] } },        // tcp src port = 200
> > >         { "field_id": 7, "exact": { "value": [0, 201] } }         // tcp dst port = 201
> > >       ],
> > >       "action": {
> > >         "action": {
> > >           "action_id": 8519716,
> > >           "params": [
> > >             { "param_id": 1, "value": [5, 0, 0, 0] }
> > >           ]
> > >         }
> > >       },
> > >       ...
> > >     }
> > >   } ...
> > > }
> > >
> > > Please note that this is only part of the full command. For more
> > > information, please refer to p4runtime.proto [2].
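The byte arrays in the JSON example follow from the Byte Width and Byte Order columns of the two hints. Match fields are in network order (most significant byte first), so vni = 10 in 3 bytes becomes [0, 0, 10]. port_id is declared with host order, and [5, 0, 0, 0] is what a little-endian host produces for 5 in 4 bytes. A minimal sketch of both encodings follows; encode_be and encode_le are hypothetical helper names, and treating "host" as little-endian is an assumption about the target machine.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Write the low `width` bytes of v, most significant byte first
 * ("network" byte order in the compiler hint). */
static void
encode_be(uint64_t v, uint8_t *out, size_t width)
{
    assert(width >= 1 && width <= 8);
    for (size_t i = 0; i < width; i++)
        out[width - 1 - i] = (uint8_t)(v >> (8 * i));
}

/* Write the low `width` bytes of v, least significant byte first. This
 * matches "host" byte order only on a little-endian host (assumption). */
static void
encode_le(uint64_t v, uint8_t *out, size_t width)
{
    assert(width >= 1 && width <= 8);
    for (size_t i = 0; i < width; i++)
        out[i] = (uint8_t)(v >> (8 * i));
}
```

For example, encode_be(0x0A000001, buf, 4) yields the [10, 0, 0, 1] used for the outer source IP, and encode_le(5, buf, 4) yields the [5, 0, 0, 0] used for port_id.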
> > >
> > > 1. https://p4.org/p4-spec/p4runtime/main/P4Runtime-Spec.html
> > > 2. https://github.com/p4lang/p4runtime/blob/main/proto/p4/v1/p4runtime.proto
> > >
> > > Thank you for your attention to this matter.
> > >
> >
> > I think we should schedule a meeting to see how large the gaps
> > between rte_flow and P4 really are, and how we can improve rte_flow
> > to provide the best experience.
>
> Sounds like a good idea!
> >
> > > Regards
> > > Qi