DPDK patches and discussions
From: Ori Kam <orika@nvidia.com>
To: "Zhang, Qi Z" <qi.z.zhang@intel.com>, "dev@dpdk.org" <dev@dpdk.org>
Cc: "techboard@dpdk.org" <techboard@dpdk.org>,
	"Richardson, Bruce" <bruce.richardson@intel.com>,
	"Burakov, Anatoly" <anatoly.burakov@intel.com>,
	 "Wiles, Keith" <keith.wiles@intel.com>,
	"Liang, Cunming" <cunming.liang@intel.com>,
	"Wu, Jingjing" <jingjing.wu@intel.com>,
	"Zhang, Helin" <helin.zhang@intel.com>,
	"Mcnamara, John" <john.mcnamara@intel.com>,
	"Xu, Rosen" <rosen.xu@intel.com>
Subject: RE: seeking community input on adapting DPDK to P4Runtime backend
Date: Thu, 18 May 2023 14:33:30 +0000
Message-ID: <MW2PR12MB46662896E0007C5373729ECBD67F9@MW2PR12MB4666.namprd12.prod.outlook.com>
In-Reply-To: <DM4PR11MB599450B2422CDA8BB351560DD77F9@DM4PR11MB5994.namprd11.prod.outlook.com>

Hi Zhang,

I think we both want the same thing and share the same basic concepts.

Please see below for some answers.

Best,
Ori


> -----Original Message-----
> From: Zhang, Qi Z <qi.z.zhang@intel.com>
> Sent: Thursday, May 18, 2023 1:33 PM
> 
> 
> 
> > -----Original Message-----
> > From: Ori Kam <orika@nvidia.com>
> > Sent: Wednesday, May 17, 2023 11:19 PM
> > To: Zhang, Qi Z <qi.z.zhang@intel.com>; dev@dpdk.org
> > Cc: techboard@dpdk.org; Richardson, Bruce
> > <bruce.richardson@intel.com>;
> > Burakov, Anatoly <anatoly.burakov@intel.com>; Wiles, Keith
> > <keith.wiles@intel.com>; Liang, Cunming <cunming.liang@intel.com>; Wu,
> > Jingjing <jingjing.wu@intel.com>; Zhang, Helin <helin.zhang@intel.com>;
> > Mcnamara, John <john.mcnamara@intel.com>; Xu, Rosen
> > <rosen.xu@intel.com>
> > Subject: RE: seeking community input on adapting DPDK to P4Runtime
> > backend
> >
> > Hi Zhang,
> >
> > rte_flow is an excellent candidate for implementing P4.
> > We have run some internal tests that show great promise in this regard.
> >
> > I would be very happy to supply any needed information and have a
> > discussion on how to continue with this project.
> 
> Thank you Ori! Please see my comments below.
> 
> Regards
> Qi
> 
> >
> > Please see inline detailed answers.
> >
> > Best,
> > Ori Kam
> >
> >
> >
> >
> > > -----Original Message-----
> > > From: Zhang, Qi Z <qi.z.zhang@intel.com>
> > > Sent: Monday, May 8, 2023 9:40 AM
> > > Subject: seeking community input on adapting DPDK to P4Runtime
> > > backend
> > >
> > > Hi:
> > >
> > > Our team is currently working on developing a DPDK PMD for a
> > > P4-programmed network controller, based on customer feedback to
> > > integrate DPDK into the P4Runtime backend
> > > [https://p4.org/p4-spec/p4runtime/main/P4Runtime-Spec.html].
> > >
> > > (*) However, we are facing challenges in adapting DPDK's rte_flow API
> > > to the P4Runtime API, primarily due to the transition from a
> > > table-based API with fields of arbitrary bit width at arbitrary
> > > offsets to a protocol-based API (more details are given in the
> > > post-script).
> > >
> > > We are seeking suggestions and best practices from the open-source
> > > community to help us with this integration. Specifically, we are
> > > interested in learning:
> > >
> > > (*) If anyone has previously attempted to map rte_flow to P4-based
> > > devices.
> >
> > We tried it, successfully.
> >
> > > (*) Thoughts on how to map from table-based matching to
> > > protocol-based matching like in rte_flow.
> >
> > rte_flow is table based (groups), and with the introduction of the
> > template API it is even more table based (we added the concept of
> > tables), which is just what P4 requires.
> 
> Yes, the rte_flow template can be used to map a sequence of patterns to a
> P4 table and a sequence of actions to a P4 action. However, using a fixed
> rte_flow template can be problematic when handling different P4 programs
> in the same driver. To provide more flexibility, the mapping of patterns and
> actions can be externalized into a configuration file, or into part of the
> firmware that the driver can learn it from, allowing for customization based
> on the specific requirements of each P4 pipeline. Actually, we have enabled
> this approach in order to accommodate different P4 programs.
> 
> However, an alternative approach to consider is whether it would be feasible
> to directly expose the P4 table and action names or IDs to the application,
> rather than relying on rte_flow templates. This approach offers several
> potential benefits:
> 
> Integration with P4Runtime Backend: By exposing the P4 table and action
> names or IDs directly, DPDK could be easily integrated as a P4Runtime
> backend. This eliminates the need for translation from the P4Runtime API to
> rte_flow templates in the application, simplifying the integration process.
> 
> Elimination of Manual Mapping: Exposing the P4 table and action names or
> IDs to the application would remove the requirement for the engineering
> team to manually map each pipeline to specific rte_flow templates. This is
> particularly beneficial in cases where hardware vendors provide customers
> with a toolchain to create their own P4 pipelines but do not necessarily own
> the P4 programs. By eliminating the dependency on rte_flow templates, this
> approach reduces complexity in using DPDK as the driver.
> 
> To be more specific, the proposed API for exposing P4 table and action
> names or IDs directly to the application could be as follows:
> 
> /* Get the table info */
> struct rte_p4_table_info tbl_info;
> rte_p4_table_info_get_by_name(port_id, "decap_vxlan_tcp_table",
>                               &tbl_info);
> 
> /* Create the key (tbl_info is a struct, so its ID is tbl_info.id) */
> struct rte_p4_table_key *key;
> rte_p4_table_key_create(port_id, tbl_info.id, &key);
> 
> /* Set the key fields; the last argument is the field width in bytes */
> rte_p4_table_key_field_set_by_name(port_id, key, "wire_port",
>                                    &wire_port, 2);
> rte_p4_table_key_field_set_by_name(port_id, key, "tun_ip_src",
>                                    &tun_ip_src, 4);
> rte_p4_table_key_field_set_by_name(port_id, key, "tun_ip_dst",
>                                    &tun_ip_dst, 4);
> rte_p4_table_key_field_set_by_name(port_id, key, "vni", &vni, 3);
> rte_p4_table_key_field_set_by_name(port_id, key, "ipv4_src",
>                                    &ipv4_src, 4);
> rte_p4_table_key_field_set_by_name(port_id, key, "ipv4_dst",
>                                    &ipv4_dst, 4);
> rte_p4_table_key_field_set_by_name(port_id, key, "src_port",
>                                    &src_port, 2);
> rte_p4_table_key_field_set_by_name(port_id, key, "dst_port",
>                                    &dst_port, 2);
> 
> /* Get the action spec info */
> struct rte_p4_action_spec_info as_info;
> rte_p4_action_spec_info_get_by_name(port_id, "decap_vxlan_fwd",
>                                     &as_info);
> 
> /* Create the action */
> struct rte_p4_action *action;
> rte_p4_action_create(port_id, as_info.id, &action);
> 
> /* Set the action fields */
> rte_p4_table_action_field_set_by_name(port_id, action, "mod_id",
>                                       &mod_id, 3);
> rte_p4_table_action_field_set_by_name(port_id, action, "port_id",
>                                       &target_port_id, 2);
> 
> /* Add the entry */
> rte_p4_table_entry_add(port_id, tbl_info.id, key, action);
> 
> ...
> 

I think that introducing an API that knows P4 is the way to go,
but it should be a very simple API that calls rte_flow.
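
To make the idea concrete, here is a minimal sketch of such a thin layer,
assuming a per-table map from P4 key field names to slots of a pattern
template. Every name in it (p4_field_map, p4_rule_values, p4_key_set) is
hypothetical and is not an existing DPDK API. The sketch stops before the
point where a real layer would hand the collected values to rte_flow, so it
calls nothing from DPDK. The widths come from the compiler hint quoted later
in this thread.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical: binds one P4 key field name to a slot of a pattern template. */
struct p4_field_map {
    const char *p4_name;  /* @name from the P4 table key */
    uint8_t slot;         /* position of the item in the pattern template */
    uint8_t byte_width;   /* byte width from the compiler hint */
};

/* Assumed slot order: outer IPs, VNI, inner IPs, TCP ports. */
static const struct p4_field_map decap_vxlan_tcp_map[] = {
    { "tun_ip_src", 0, 4 },
    { "tun_ip_dst", 1, 4 },
    { "vni",        2, 3 },
    { "ipv4_src",   3, 4 },
    { "ipv4_dst",   4, 4 },
    { "src_port",   5, 2 },
    { "dst_port",   6, 2 },
};

#define P4_MAX_SLOTS 7
#define P4_MAX_WIDTH 4

/* Hypothetical stand-in for the per-rule values handed to a template. */
struct p4_rule_values {
    uint8_t bytes[P4_MAX_SLOTS][P4_MAX_WIDTH];
    uint8_t set_mask;  /* bit N is set once slot N has a value */
};

/* Store one key field by P4 name; returns 0, or -1 on unknown name/width. */
static int p4_key_set(struct p4_rule_values *rule, const char *name,
                      const void *val, size_t len)
{
    size_t n = sizeof(decap_vxlan_tcp_map) / sizeof(decap_vxlan_tcp_map[0]);

    assert(rule != NULL && name != NULL && val != NULL);
    for (size_t i = 0; i < n; i++) {
        const struct p4_field_map *m = &decap_vxlan_tcp_map[i];

        if (strcmp(m->p4_name, name) != 0)
            continue;
        if (len != m->byte_width)
            return -1;  /* width does not match the compiler hint */
        memcpy(rule->bytes[m->slot], val, len);
        rule->set_mask |= (uint8_t)(1u << m->slot);
        return 0;
    }
    return -1;  /* name is not a key field of this table */
}
```

The application keeps using P4 names, while the layer underneath only deals
with template slots, which is the part rte_flow already understands.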


> 
> 
> 
> 
> >
> > > (*) Any ideas on how to extend or expand the rte_flow APIs to better
> > > accommodate P4-based or other table-matching based devices.
> > >
> >
> > Let's discuss any issues you have.
> >
> > > Your insights and feedback would be greatly appreciated!
> > >
> > > ======================= Post-Script ============================
> > >
> > > More details on the problem below, for anyone interested
> > >
> > > In P4, flow offloading can be implemented using the P4Runtime API,
> > > which provides a standard interface for controlling and configuring
> > > the data plane behavior of network devices. P4Runtime allows network
> > > operators to dynamically add, modify, and remove flow rules in the
> > > hardware forwarding tables of P4-enabled devices.
> > >
> > > The P4Runtime API is a table-based API; it assumes that the packet
> > > processing pipeline consists of one or more key/action units (tables).
> > > In P4Runtime, each table defines the fields to be matched and the
> > > actions to be taken on incoming packets. During compilation, the P4
> > > compiler assigns a unique uint32 ID to each table, action, and field,
> > > which is associated with its corresponding string name. These IDs have
> > > no inherent relationship to any network protocol but instead serve as a
> > > means to identify different components of a P4 program within the
> > > P4Runtime API.
> > >
> > This is the concept of tables and groups in rte_flow.
> >
> > > If we choose to use rte_flow as the low-level API for P4Runtime, a
> > > translation layer is needed in the application to map the P4 tables
> > > and actions to the corresponding rte_flow rules. However, this
> > > translation layer can be problematic as it is not easily scalable.
> > > When the P4 pipeline is refined or updated, the translation rules may
> > > also need to be updated, which can result in errors and reduced
> > > efficiency.
> > >
> > I don't understand why.
> >
> > > On the other hand, a hardware vendor that provides a P4-enabled device
> > > is required to implement an rte_flow interface in their DPDK PMD.
> > > Typically, the P4 compiler generates hints for the driver on how to
> > > map P4 tables to hardware resources, and how to convert table entry
> > > add/modify/delete actions into low-level hardware configurations.
> > > However, because rte_flow is protocol-based, it poses an additional
> > > challenge for driver developers, who must create another translation
> > > layer to convert rte_flow tokens into P4 object identifiers. This
> > > translation layer must be carefully designed and implemented to ensure
> > > optimal performance and scalability, and to ensure that the driver can
> > > efficiently handle the dynamic nature of P4 programs.
> > >
> > Right, but some of the translation can be done in code shared by all
> > PMDs, and the translation is static for a given compilation, so inserting
> > rules can be super fast with no need for extra work.
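
As a rough illustration of why a static translation keeps rule insertion
cheap, the sketch below resolves each compiler-assigned field ID to a
template slot once, when the P4 program is loaded, so the per-rule path is a
single array read. The template layout and all names (p4_hint,
build_slot_cache, slot_for_field) are assumptions made for this example and
are not part of DPDK. The field IDs are the ones from the compiler hint in
the original post.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Field ID -> name pairs, as listed in the compiler hint. */
struct p4_hint {
    uint32_t field_id;
    const char *name;
};

static const struct p4_hint key_hints[] = {
    { 1, "tun_ip_src" }, { 2, "tun_ip_dst" }, { 3, "vni" },
    { 4, "ipv4_src" },   { 5, "ipv4_dst" },
    { 6, "src_port" },   { 7, "dst_port" },
};

/* Hypothetical template layout: item names in pattern order. */
static const char *const template_items[] = {
    "tun_ip_src", "tun_ip_dst", "vni",
    "ipv4_src", "ipv4_dst", "src_port", "dst_port",
};

#define MAX_FIELD_ID 7
static int8_t slot_by_field_id[MAX_FIELD_ID + 1];

/* Run once per P4 program load: resolve every field ID to a slot. */
static void build_slot_cache(void)
{
    size_t nh = sizeof(key_hints) / sizeof(key_hints[0]);
    size_t nt = sizeof(template_items) / sizeof(template_items[0]);

    /* Every byte becomes 0xFF, which reads back as -1 in an int8_t. */
    memset(slot_by_field_id, -1, sizeof(slot_by_field_id));
    for (size_t i = 0; i < nh; i++) {
        assert(key_hints[i].field_id <= MAX_FIELD_ID);
        for (size_t j = 0; j < nt; j++) {
            if (strcmp(key_hints[i].name, template_items[j]) == 0)
                slot_by_field_id[key_hints[i].field_id] = (int8_t)j;
        }
    }
}

/* Per-rule fast path: a bounds check and one array read, no string work. */
static int slot_for_field(uint32_t field_id)
{
    if (field_id > MAX_FIELD_ID)
        return -1;
    return slot_by_field_id[field_id];
}
```

If a different compiler assigns different IDs, only the hint table changes
and the cache is rebuilt; the insertion path stays the same.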
> >
> > > To better understand the problem, let's consider the following example
> > > that demonstrates how to use the P4Runtime API to program a rule for
> > > processing a VXLAN packet. The rule matches a VXLAN packet,
> > > decapsulates the tunnel header, and forwards it to a specific port.
> > >
> > > The P4 source code below describes the VXLAN decap table
> > > decap_vxlan_tcp_table, which matches the outer IP address, VNI, inner
> > > IP address, and inner TCP port. For each rule, four action
> > > specifications can be selected. We will focus on one action
> > > specification decap_vxlan_fwd that performs decapsulation and forwards
> > > the packet to a specific port.
> > >
> > > table decap_vxlan_tcp_table {
> > >     key = {
> > >         hdrs.ipv4[meta.depth-1].src_ip: exact @name("tun_ip_src");
> > >         hdrs.ipv4[meta.depth-1].dst_ip: exact @name("tun_ip_dst");
> > >         hdrs.vxlan[meta.depth-1].vni  : exact @name("vni");
> > >         hdrs.ipv4[meta.depth].src_ip  : exact @name("ipv4_src");
> > >         hdrs.ipv4[meta.depth].dst_ip  : exact @name("ipv4_dst");
> > >         hdrs.tcp.sport                : exact @name("src_port");
> > >         hdrs.tcp.dport                : exact @name("dst_port");
> > >     }
> > >     actions = {
> > >         @tableonly decap_vxlan_fwd;
> > >         @tableonly decap_vxlan_dnat_fwd;
> > >         @tableonly decap_vxlan_snat_fwd;
> > >         @defaultonly set_exception;
> > >     }
> > > }
> > Translate to rte_flow:
> > template pattern relaxed_mode = 1
> > pattern = ipv4_src / ipv4_dst / vni / ipv4_src / ipv4_dst / tcp_sport /
> >           tcp_dport
> > map structure = {
> > 	tun_ip_src = &pattern[ipv4_src]
> > 	....
> > }
> > > ...
> > >
> > > action decap_vxlan_fwd(PortId_t port_id) {
> > >     meta.mod_action = (bit<11>)VXLAN_DECAP_OUTER_IPV4;
> > >     send_to_port(port_id);
> > > }
> > >
> > Same as above, just with an action template.
> >
> > > Below is an example of the hint that the compiler will generate for
> > > the decap_vxlan_tcp_table:
> > >
> > > Table ID:      8454144
> > > Name:          decap_vxlan_tcp_table
> > > Field ID   Name         Match Type   Bit Width   Byte Width   Byte Order
> > > 1          tun_ip_src   exact        32          4            network
> > > 2          tun_ip_dst   exact        32          4            network
> > > 3          vni          exact        24          3            network
> > > 4          ipv4_src     exact        32          4            network
> > > 5          ipv4_dst     exact        32          4            network
> > > 6          src_port     exact        16          2            network
> > > 7          dst_port     exact        16          2            network
> > > Spec ID        Name
> > > 8519716        decap_vxlan_fwd
> > > 8519718        decap_vxlan_dnat_fwd
> > > 8519720        decap_vxlan_snat_fwd
> > > 8519695        set_exception
> > >
> > > And the hint for the action spec "decap_vxlan_fwd" is as below:
> > >
> > > Spec ID:       8519716
> > > Name:          decap_vxlan_fwd
> > > Field ID   Name         Bit Width   Byte Width   Byte Order
> > > 1          port_id      32          4            host
> > >
> > > Please note that different compilers may assign different IDs.
> > >
> > > Below is an example of how to program a rule using the P4 runtime API
> > > in JSON format. This rule matches fields and directs packets to port 5.
> > >
> > > {
> > >     "type": 1,  // INSERT
> > >     "entity": {
> > >         "table_entry": {
> > >             "table_id": 8454144,
> > >             "match": [
> > >                 { "field_id": 1, "exact": { "value": [10, 0, 0, 1] } },   // outer src IP = 10.0.0.1
> > >                 { "field_id": 2, "exact": { "value": [10, 0, 0, 2] } },   // outer dst IP = 10.0.0.2
> > >                 { "field_id": 3, "exact": { "value": [0, 0, 10] } },      // vni = 10
> > >                 { "field_id": 4, "exact": { "value": [192, 0, 0, 1] } },  // inner src IP = 192.0.0.1
> > >                 { "field_id": 5, "exact": { "value": [192, 0, 0, 2] } },  // inner dst IP = 192.0.0.2
> > >                 { "field_id": 6, "exact": { "value": [0, 200] } },        // tcp src port = 200
> > >                 { "field_id": 7, "exact": { "value": [0, 201] } },        // tcp dst port = 201
> > >             ],
> > >             "action": {
> > >                 "action": {
> > >                     "action_id": 8519716,
> > >                     "params": [
> > >                         { "param_id": 1, "value": [5, 0, 0, 0] }
> > >                     ]
> > >                 }
> > >             },
> > >             ...
> > >         }
> > >     }
> > >     ...
> > > }
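
The byte arrays in the JSON above follow from the byte order and width
columns of the hints. A small sketch of that encoding is below. It assumes
that "host" order in the action hint means little-endian, which is what
[5, 0, 0, 0] for port 5 suggests; on a big-endian host the bytes would
differ. The helper names are made up for this example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Encode v into `width` bytes, most significant byte first (network order). */
static void encode_network(uint32_t v, uint8_t *out, size_t width)
{
    assert(out != NULL && width >= 1 && width <= 4);
    for (size_t i = 0; i < width; i++)
        out[i] = (uint8_t)(v >> (8 * (width - 1 - i)));
}

/* Encode v into `width` bytes, least significant byte first.
 * Assumption: this is what "host" order means for the action parameter. */
static void encode_little_endian(uint32_t v, uint8_t *out, size_t width)
{
    assert(out != NULL && width >= 1 && width <= 4);
    for (size_t i = 0; i < width; i++)
        out[i] = (uint8_t)(v >> (8 * i));
}
```

With these helpers, encode_network(10, buf, 3) yields the VNI bytes
[0, 0, 10], encode_network(200, buf, 2) yields the source port bytes
[0, 200], and encode_little_endian(5, buf, 4) yields the port_id parameter
[5, 0, 0, 0].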
> > >
> > > Please note that this is only a part of the full command. For more
> > > information, please refer to the p4runtime.proto [2].
> > >
> > > 1. https://p4.org/p4-spec/p4runtime/main/P4Runtime-Spec.html
> > > 2. https://github.com/p4lang/p4runtime/blob/main/proto/p4/v1/p4runtime.proto
> > >
> > > Thank you for your attention to this matter.
> > >
> >
> > I think that we should schedule a meeting to see how many gaps we
> > really have between rte_flow and P4, and how we can improve rte_flow
> > to allow the best experience.
> 
> Sounds like a good idea!
> >
> > > Regards
> > > Qi

Thread overview: 9+ messages
2023-05-08  6:39 Zhang, Qi Z
2023-05-17 15:18 ` Ori Kam
2023-05-18 10:33   ` Zhang, Qi Z
2023-05-18 14:33     ` Ori Kam [this message]
2023-05-22  5:12       ` Zhang, Qi Z
2023-05-24 15:00         ` Jerin Jacob
2023-05-24 15:43           ` Thomas Monjalon
2023-05-18 14:45     ` Honnappa Nagarahalli
2023-05-22  4:58       ` Zhang, Qi Z
