DPDK patches and discussions
From: "Zhang, Qi Z" <qi.z.zhang@intel.com>
To: Ori Kam <orika@nvidia.com>, "dev@dpdk.org" <dev@dpdk.org>
Cc: "techboard@dpdk.org" <techboard@dpdk.org>,
	"Richardson, Bruce" <bruce.richardson@intel.com>,
	"Burakov, Anatoly" <anatoly.burakov@intel.com>,
	 "Wiles, Keith" <keith.wiles@intel.com>,
	"Liang, Cunming" <cunming.liang@intel.com>,
	"Wu, Jingjing" <jingjing.wu@intel.com>,
	"Zhang, Helin" <helin.zhang@intel.com>,
	"Mcnamara, John" <john.mcnamara@intel.com>,
	"Xu, Rosen" <rosen.xu@intel.com>
Subject: RE: seeking community input on adapting DPDK to P4Runtime backend
Date: Mon, 22 May 2023 05:12:29 +0000
Message-ID: <DM4PR11MB5994A61AA0CBBDCE3616D101D7439@DM4PR11MB5994.namprd11.prod.outlook.com>
In-Reply-To: <MW2PR12MB46662896E0007C5373729ECBD67F9@MW2PR12MB4666.namprd12.prod.outlook.com>



> -----Original Message-----
> From: Ori Kam <orika@nvidia.com>
> Sent: Thursday, May 18, 2023 10:34 PM
> To: Zhang, Qi Z <qi.z.zhang@intel.com>; dev@dpdk.org
> Cc: techboard@dpdk.org; Richardson, Bruce <bruce.richardson@intel.com>;
> Burakov, Anatoly <anatoly.burakov@intel.com>; Wiles, Keith
> <keith.wiles@intel.com>; Liang, Cunming <cunming.liang@intel.com>; Wu,
> Jingjing <jingjing.wu@intel.com>; Zhang, Helin <helin.zhang@intel.com>;
> Mcnamara, John <john.mcnamara@intel.com>; Xu, Rosen
> <rosen.xu@intel.com>
> Subject: RE: seeking community input on adapting DPDK to P4Runtime backend
> 
> Hi Zhang,
> 
> I think we both want the same thing and share the same basic concepts.
> 
> Please see some answers below.
> 
> Best,
> Ori
> 
> 
> > -----Original Message-----
> > From: Zhang, Qi Z <qi.z.zhang@intel.com>
> > Sent: Thursday, May 18, 2023 1:33 PM
> >
> >
> >
> > > -----Original Message-----
> > > From: Ori Kam <orika@nvidia.com>
> > > Sent: Wednesday, May 17, 2023 11:19 PM
> > > To: Zhang, Qi Z <qi.z.zhang@intel.com>; dev@dpdk.org
> > > Cc: techboard@dpdk.org; Richardson, Bruce <bruce.richardson@intel.com>;
> > > Burakov, Anatoly <anatoly.burakov@intel.com>; Wiles, Keith
> > > <keith.wiles@intel.com>; Liang, Cunming <cunming.liang@intel.com>;
> > > Wu, Jingjing <jingjing.wu@intel.com>; Zhang, Helin
> > > <helin.zhang@intel.com>; Mcnamara, John <john.mcnamara@intel.com>;
> > > Xu, Rosen <rosen.xu@intel.com>
> > > Subject: RE: seeking community input on adapting DPDK to P4Runtime backend
> > >
> > > Hi Zhang,
> > >
> > > rte_flow is an excellent candidate for implementing P4.
> > > We have run some internal tests that show great promise in this regard.
> > >
> > > I would be very happy to supply any needed information and to
> > > discuss how to continue with this project.
> >
> > Thank you, Ori! Please see my comments below.
> >
> > Regards
> > Qi
> >
> > >
> > > Please see inline detailed answers.
> > >
> > > Best,
> > > Ori Kam
> > >
> > >
> > >
> > >
> > > > -----Original Message-----
> > > > From: Zhang, Qi Z <qi.z.zhang@intel.com>
> > > > Sent: Monday, May 8, 2023 9:40 AM
> > > > Subject: seeking community input on adapting DPDK to P4Runtime backend
> > > >
> > > > Hi:
> > > >
> > > > Our team is currently developing a DPDK PMD for a P4-programmed
> > > > network controller, based on customer feedback asking us to integrate
> > > > DPDK into the P4Runtime backend
> > > > [https://p4.org/p4-spec/p4runtime/main/P4Runtime-Spec.html].
> > > >
> > > > (*) However, we are facing challenges in adapting DPDK's rte_flow
> > > > API to the P4Runtime API, primarily due to the transition from a
> > > > table-based API with fields of arbitrary bit width at arbitrary
> > > > offsets to a protocol-based API (described in more detail in the postscript).
> > > >
> > > > We are seeking suggestions and best practices from the open-source
> > > > community to help us with this integration. Specifically, we are
> > > > interested in learning:
> > > >
> > > > (*) If anyone has previously attempted to map rte_flow to P4-based devices.
> > >
> > > We have tried it, successfully.
> > >
> > > > (*) Thoughts on how to map from table-based matching to protocol-based
> > > > matching like in rte_flow.
> > >
> > > rte_flow is table-based (groups). Now, with the introduction of the
> > > template API, rte_flow is even more table-based (we added the concept
> > > of tables), which is exactly what P4 requires.
> >
> > Yes, the rte_flow template API can be used to map a sequence of patterns
> > to a P4 table and a sequence of actions to a P4 action. However, using a
> > fixed rte_flow template can be problematic when handling different P4
> > programs in the same driver. To provide more flexibility, the mapping
> > of patterns and actions can be externalized into a configuration file,
> > or made part of the firmware so that the driver can learn it, allowing
> > customization based on the specific requirements of each P4 pipeline.
> > In fact, we have already enabled this approach to accommodate different
> > P4 programs.
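A minimal sketch of what such an externalized mapping could contain for decap_vxlan_tcp_table, assuming a hypothetical per-pipeline table that records, for each P4 key field name, the protocol header and field it corresponds to (roughly what an rte_flow pattern item describes). The types and map_lookup() are illustrative names, not DPDK API, and the table is compiled in only to keep the example self-contained.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define ARRAY_SIZE(a) (sizeof(a) / sizeof((a)[0]))

/* Hypothetical protocol-level view of a match field: which header it sits
 * in, whether that header is the outer (tunnel) or inner one, and which
 * field of that header it is. */
enum hdr_type { HDR_IPV4, HDR_VXLAN, HDR_TCP };
enum hdr_field { FLD_SRC_ADDR, FLD_DST_ADDR, FLD_VNI, FLD_SRC_PORT, FLD_DST_PORT };

struct field_map_entry {
    const char *p4_name;   /* @name() of the P4 key field */
    enum hdr_type hdr;     /* protocol header carrying the field */
    int outer;             /* 1 = tunnel (outer) headers, 0 = inner */
    enum hdr_field field;  /* field within that header */
};

/* Mapping for decap_vxlan_tcp_table. In the approach described above this
 * would be loaded per pipeline from a configuration file or firmware. */
static const struct field_map_entry decap_vxlan_map[] = {
    { "tun_ip_src", HDR_IPV4,  1, FLD_SRC_ADDR },
    { "tun_ip_dst", HDR_IPV4,  1, FLD_DST_ADDR },
    { "vni",        HDR_VXLAN, 1, FLD_VNI },
    { "ipv4_src",   HDR_IPV4,  0, FLD_SRC_ADDR },
    { "ipv4_dst",   HDR_IPV4,  0, FLD_DST_ADDR },
    { "src_port",   HDR_TCP,   0, FLD_SRC_PORT },
    { "dst_port",   HDR_TCP,   0, FLD_DST_PORT },
};

/* Return the mapping for a P4 key field name, or NULL if the pipeline
 * does not define it. */
static const struct field_map_entry *
map_lookup(const struct field_map_entry *map, size_t n, const char *name)
{
    assert(map != NULL && name != NULL);
    for (size_t i = 0; i < n; i++)
        if (strcmp(map[i].p4_name, name) == 0)
            return &map[i];
    return NULL;
}
```

Supporting a different P4 program then means loading a different table rather than changing driver code; a name the pipeline does not define (such as wire_port here) simply fails the lookup.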
> >
> > However, an alternative worth considering is to expose the P4 table and
> > action names or IDs directly to the application, rather than relying on
> > rte_flow templates. This approach offers several potential benefits:
> >
> > Integration with the P4Runtime backend: By exposing the P4 table and
> > action names or IDs directly, DPDK could easily be integrated as a
> > P4Runtime backend. This eliminates the need to translate from the
> > P4Runtime API to rte_flow templates in the application, simplifying the
> > integration process.
> >
> > Elimination of manual mapping: Exposing the P4 table and action names
> > or IDs to the application would remove the need for the engineering
> > team to manually map each pipeline to specific rte_flow templates.
> > This is particularly beneficial when hardware vendors provide
> > customers with a toolchain to create their own P4 pipelines but do not
> > themselves own the P4 programs. By eliminating the dependency on
> > rte_flow templates, this approach reduces the complexity of using DPDK
> > as the driver.
> >
> > To be more specific, the proposed API for exposing P4 table and action
> > names or IDs directly to the application could be as follows:
> >
> > /* Get the table info */
> > struct rte_p4_table_info tbl_info;
> > rte_p4_table_info_get_by_name(port_id, "decap_vxlan_tcp_table", &tbl_info);
> >
> > /* Create the key */
> > struct rte_p4_table_key *key;
> > rte_p4_table_key_create(port_id, tbl_info.id, &key);
> >
> > /* Set the key fields */
> > rte_p4_table_key_field_set_by_name(port_id, key, "wire_port", &wire_port, 2);
> > rte_p4_table_key_field_set_by_name(port_id, key, "tun_ip_src", &tun_ip_src, 4);
> > rte_p4_table_key_field_set_by_name(port_id, key, "tun_ip_dst", &tun_ip_dst, 4);
> > rte_p4_table_key_field_set_by_name(port_id, key, "vni", &vni, 3);
> > rte_p4_table_key_field_set_by_name(port_id, key, "ipv4_src", &ipv4_src, 4);
> > rte_p4_table_key_field_set_by_name(port_id, key, "ipv4_dst", &ipv4_dst, 4);
> > rte_p4_table_key_field_set_by_name(port_id, key, "src_port", &src_port, 2);
> > rte_p4_table_key_field_set_by_name(port_id, key, "dst_port", &dst_port, 2);
> >
> > /* Get the action spec info */
> > struct rte_p4_action_spec_info as_info;
> > rte_p4_action_spec_info_get_by_name(port_id, "decap_vxlan_fwd", &as_info);
> >
> > /* Create the action */
> > struct rte_p4_action *action;
> > rte_p4_action_create(port_id, as_info.id, &action);
> >
> > /* Set the action fields */
> > rte_p4_table_action_field_set_by_name(port_id, action, "mod_id", &mod_id, 3);
> > rte_p4_table_action_field_set_by_name(port_id, action, "port_id", &target_port_id, 2);
> >
> > /* Add the entry */
> > rte_p4_table_entry_add(port_id, tbl_info.id, key, action);
> >
> > ...
> >
> 
> I think that introducing a P4-aware API is the way to go,

Good to know!

> but I think that
> it should be a very simple API that calls rte_flow.

I guess the complexity of the API implementation may depend on the underlying hardware. In our case, we can directly translate the P4 table key and action into low-level hardware configuration using hints generated by the P4 compiler, without any additional translation through protocol-based rte_flow templates.
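To illustrate what translating directly from compiler hints could look like, here is a minimal sketch of a name-based key setter. The structures and p4_key_field_set_by_name() are hypothetical stand-ins for the proposed rte_p4_* calls above; only the field names and byte widths from the hint are used, and the values are kept in a per-field buffer that a driver would later turn into hardware configuration.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define MAX_FIELDS 8
#define MAX_FIELD_BYTES 16

/* One key field as described by the compiler-generated hint. */
struct p4_field_info {
    uint32_t id;          /* field ID assigned by the P4 compiler */
    const char *name;     /* @name() from the P4 source */
    uint32_t byte_width;  /* value size the caller must supply */
};

/* Table description, filled from the compiler hint when the pipeline loads. */
struct p4_table_info {
    uint32_t id;
    const char *name;
    uint32_t nb_fields;
    struct p4_field_info fields[MAX_FIELDS];
};

/* A key under construction: one value slot per field, same order as info->fields. */
struct p4_table_key {
    const struct p4_table_info *info;
    uint8_t value[MAX_FIELDS][MAX_FIELD_BYTES];
    uint8_t is_set[MAX_FIELDS];
};

/* Set a key field by its P4 name. Returns 0 on success, or -1 if the name is
 * not a key field of this table or the size differs from the hinted width. */
static int p4_key_field_set_by_name(struct p4_table_key *key, const char *name,
                                    const void *value, uint32_t size)
{
    const struct p4_table_info *info;
    uint32_t i;

    assert(key != NULL && key->info != NULL);
    info = key->info;
    for (i = 0; i < info->nb_fields && i < MAX_FIELDS; i++) {
        if (strcmp(info->fields[i].name, name) != 0)
            continue;
        if (size != info->fields[i].byte_width || size > MAX_FIELD_BYTES)
            return -1;
        memcpy(key->value[i], value, size);
        key->is_set[i] = 1;
        return 0;
    }
    return -1;
}

/* An exact-match entry needs every key field; check before adding it. */
static int p4_key_is_complete(const struct p4_table_key *key)
{
    uint32_t i;

    for (i = 0; i < key->info->nb_fields && i < MAX_FIELDS; i++)
        if (!key->is_set[i])
            return 0;
    return 1;
}
```

Rejecting a size that differs from the hinted byte width catches mismatches between the application and the loaded P4 program when the rule is built, before anything reaches the hardware.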

Thanks
Qi

> 
> 
> >
> >
> >
> >
> > >
> > > > (*) Any ideas on how to extend or expand the rte_flow APIs to
> > > > better accommodate P4-based or other table-matching based devices.
> > > >
> > >
> > > Let's discuss any issues you have.
> > >
> > > > Your insights and feedback would be greatly appreciated!
> > > >
> > > > ======================= Post-Script ============================
> > > >
> > > > More details on the problem below, for anyone interested
> > > >
> > > > In P4, flow offloading can be implemented using the P4Runtime API,
> > > > which provides a standard interface for controlling and
> > > > configuring the data plane behavior of network devices. P4Runtime
> > > > allows network operators to dynamically add, modify, and remove
> > > > flow rules in the hardware forwarding tables of P4-enabled devices.
> > > >
> > > > The P4Runtime API is table-based: it assumes the packet processing
> > > > pipeline consists of one or more key/action units (tables). In
> > > > P4Runtime, each table defines the fields to be matched and the
> > > > actions to be taken on incoming packets. During compilation, the P4
> > > > compiler assigns a unique uint32 ID to each table, action, and field,
> > > > and associates it with the corresponding string name. These IDs have
> > > > no inherent relationship to any network protocol; they only serve to
> > > > identify the different components of a P4 program within the P4Runtime API.
> > > >
> > > This is the concept of tables and groups in rte_flow.
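Since the compiler-assigned IDs are only handles tied to names, resolving them needs nothing protocol-specific. A minimal sketch, using the IDs from the decap_vxlan_tcp_table hint shown in this message; the registry layout and the p4_id_by_name()/p4_name_by_id() helpers are hypothetical, and, as noted in the message, a different compiler may assign different IDs.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical in-memory form of the compiler's name/ID assignments. */
enum p4_kind { P4_TABLE, P4_ACTION };

struct p4_object {
    enum p4_kind kind;
    uint32_t id;        /* compiler-assigned, compiler-specific */
    const char *name;   /* name from the P4 source */
};

/* IDs copied from the example hint for decap_vxlan_tcp_table. */
static const struct p4_object p4info[] = {
    { P4_TABLE,  8454144, "decap_vxlan_tcp_table" },
    { P4_ACTION, 8519716, "decap_vxlan_fwd" },
    { P4_ACTION, 8519718, "decap_vxlan_dnat_fwd" },
    { P4_ACTION, 8519720, "decap_vxlan_snat_fwd" },
    { P4_ACTION, 8519695, "set_exception" },
};
#define P4INFO_LEN (sizeof(p4info) / sizeof(p4info[0]))

/* Name -> ID (e.g. when the application refers to objects by name).
 * Returns 0 on success, -1 if no object of that kind has that name. */
static int p4_id_by_name(enum p4_kind kind, const char *name, uint32_t *id)
{
    assert(name != NULL && id != NULL);
    for (size_t i = 0; i < P4INFO_LEN; i++) {
        if (p4info[i].kind == kind && strcmp(p4info[i].name, name) == 0) {
            *id = p4info[i].id;
            return 0;
        }
    }
    return -1;
}

/* ID -> name (e.g. when decoding a P4Runtime request); NULL if unknown. */
static const char *p4_name_by_id(enum p4_kind kind, uint32_t id)
{
    for (size_t i = 0; i < P4INFO_LEN; i++)
        if (p4info[i].kind == kind && p4info[i].id == id)
            return p4info[i].name;
    return NULL;
}
```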
> > >
> > > > If we choose to use rte_flow as the low-level API for P4Runtime, a
> > > > translation layer is needed in the application to map the P4
> > > > tables and actions to the corresponding rte_flow rules. However,
> > > > this translation layer can be problematic as it is not easily scalable.
> > > > When the P4 pipeline is refined or updated, the translation rules
> > > > may also need to be updated, which can result in errors and
> > > > reduced efficiency.
> > > >
> > > I don't understand why.
> > >
> > > > On the other hand, a hardware vendor that provides a P4-enabled
> > > > device is required to implement an rte_flow interface in their DPDK PMD.
> > > > Typically, the
> > > > P4 compiler generates hints for the driver on how to map P4 tables
> > > > to hardware resources, and how to convert table entry
> > > > add/modify/delete actions into low-level hardware configurations.
> > > > However, because rte_flow is protocol-based, it poses an
> > > > additional challenge for driver developers, who must create
> > > > another translation layer to convert rte_flow tokens into P4
> > > > object identifiers. This translation layer must be carefully
> > > > designed and implemented to ensure optimal performance and
> > > > scalability, and to ensure that the driver can efficiently
> > > > handle the dynamic nature of P4 programs.
> > > >
> > > Right, but some of the translation can be done in code shared by all
> > > PMDs, and the translation is static once the P4 program is compiled, so
> > > inserting rules can be super fast with no extra work.
> > >
> > > > To better understand the problem, let's consider the following
> > > > example that demonstrates how to use the P4Runtime API to program
> > > > a rule for processing a VXLAN packet. The rule matches a VXLAN
> > > > packet, decapsulates the tunnel header, and forwards it to a specific port.
> > > >
> > > > The P4 source code below describes the VXLAN decap table
> > > > decap_vxlan_tcp_table, which matches the outer IP address, VNI,
> > > > inner IP address, and inner TCP port. For each rule, one of four action
> > > > specifications can be selected. We will focus on the action
> > > > specification decap_vxlan_fwd, which performs decapsulation and
> > > > forwards the packet to a specific port.
> > > >
> > > > table decap_vxlan_tcp_table {
> > > >     key = {
> > > >         hdrs.ipv4[meta.depth-1].src_ip: exact @name("tun_ip_src");
> > > >         hdrs.ipv4[meta.depth-1].dst_ip: exact @name("tun_ip_dst");
> > > >         hdrs.vxlan[meta.depth-1].vni  : exact @name("vni");
> > > >         hdrs.ipv4[meta.depth].src_ip  : exact @name("ipv4_src");
> > > >         hdrs.ipv4[meta.depth].dst_ip  : exact @name("ipv4_dst");
> > > >         hdrs.tcp.sport                : exact @name("src_port");
> > > >         hdrs.tcp.dport                : exact @name("dst_port");
> > > >     }
> > > >     actions = {
> > > >         @tableonly decap_vxlan_fwd;
> > > >         @tableonly decap_vxlan_dnat_fwd;
> > > >         @tableonly decap_vxlan_snat_fwd;
> > > >         @defaultonly set_exception;
> > > >     }
> > > > }
> > > Translated to rte_flow:
> > > template pattern relaxed_mode = 1
> > >     pattern = ipv4_src / ipv4_dst / vni / ipv4_src / ipv4_dst / tcp_sport / tcp_dport
> > > map structure = {
> > > 	tun_ip_src = &pattern[ipv4_src]
> > > 	....
> > > }
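One way to read this map structure is as a translation table computed once per P4 program, so that adding a rule is only a series of copies into the pattern item specs. A minimal sketch of that idea; the byte-array item structs are simplified, hypothetical stand-ins for the real rte_flow item specs, the table is indexed by the field IDs 1 to 7 from the compiler hint, and fill_specs() is an illustrative name.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Simplified, hypothetical item specs (byte arrays, so no padding). */
struct ipv4_spec  { uint8_t src[4]; uint8_t dst[4]; };
struct vxlan_spec { uint8_t vni[3]; };
struct tcp_spec   { uint8_t sport[2]; uint8_t dport[2]; };

/* Pattern item slots in template order: outer IPv4 / VXLAN / inner IPv4 / TCP. */
enum { ITEM_OUTER_IPV4, ITEM_VXLAN, ITEM_INNER_IPV4, ITEM_TCP, ITEM_MAX };

/* Where one P4 key field lands: item slot, byte offset in its spec, width. */
struct field_xlat { uint8_t item; uint8_t offset; uint8_t width; };

/* Computed once per P4 program; entry i is P4 field ID i + 1. */
static const struct field_xlat decap_vxlan_xlat[] = {
    { ITEM_OUTER_IPV4, offsetof(struct ipv4_spec, src),  4 }, /* tun_ip_src */
    { ITEM_OUTER_IPV4, offsetof(struct ipv4_spec, dst),  4 }, /* tun_ip_dst */
    { ITEM_VXLAN,      offsetof(struct vxlan_spec, vni), 3 }, /* vni */
    { ITEM_INNER_IPV4, offsetof(struct ipv4_spec, src),  4 }, /* ipv4_src */
    { ITEM_INNER_IPV4, offsetof(struct ipv4_spec, dst),  4 }, /* ipv4_dst */
    { ITEM_TCP,        offsetof(struct tcp_spec, sport), 2 }, /* src_port */
    { ITEM_TCP,        offsetof(struct tcp_spec, dport), 2 }, /* dst_port */
};

/* Per-rule work: copy each field value into its item spec; no parsing,
 * no name lookups. values[i] holds the bytes for P4 field ID i + 1. */
static void fill_specs(uint8_t *specs[ITEM_MAX],
                       const uint8_t *const values[], size_t nb_values)
{
    assert(nb_values <= sizeof(decap_vxlan_xlat) / sizeof(decap_vxlan_xlat[0]));
    for (size_t i = 0; i < nb_values; i++) {
        const struct field_xlat *x = &decap_vxlan_xlat[i];
        memcpy(specs[x->item] + x->offset, values[i], x->width);
    }
}
```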
> > > > ...
> > > >
> > > > action decap_vxlan_fwd(PortId_t port_id) {
> > > >     meta.mod_action = (bit<11>)VXLAN_DECAP_OUTER_IPV4;
> > > >     send_to_port(port_id);
> > > > }
> > > >
> > > Same as above, just with an actions template.
> > >
> > > > Below is an example of the hint that the compiler will generate for
> > > > decap_vxlan_tcp_table:
> > > >
> > > > Table ID:      8454144
> > > > Name:          decap_vxlan_tcp_table
> > > >
> > > > Field ID   Name         Match Type   Bit Width   Byte Width   Byte Order
> > > > 1          tun_ip_src   exact        32          4            network
> > > > 2          tun_ip_dst   exact        32          4            network
> > > > 3          vni          exact        24          3            network
> > > > 4          ipv4_src     exact        32          4            network
> > > > 5          ipv4_dst     exact        32          4            network
> > > > 6          src_port     exact        16          2            network
> > > > 7          dst_port     exact        16          2            network
> > > >
> > > > Spec ID    Name
> > > > 8519716    decap_vxlan_fwd
> > > > 8519718    decap_vxlan_dnat_fwd
> > > > 8519720    decap_vxlan_snat_fwd
> > > > 8519695    set_exception
> > > >
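The widths in this hint are enough to lay out a packed key. A minimal sketch, assuming (hypothetically) that fields are packed in field-ID order with no padding; real hardware may impose its own layout. It also checks that each byte width is the bit width rounded up to whole bytes. The struct and function names are illustrative only.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Key fields of decap_vxlan_tcp_table as listed in the compiler hint. */
struct hint_field {
    uint32_t id;
    const char *name;
    uint32_t bit_width;
    uint32_t byte_width;
};

static const struct hint_field decap_vxlan_fields[] = {
    { 1, "tun_ip_src", 32, 4 }, { 2, "tun_ip_dst", 32, 4 },
    { 3, "vni",        24, 3 }, { 4, "ipv4_src",   32, 4 },
    { 5, "ipv4_dst",   32, 4 }, { 6, "src_port",   16, 2 },
    { 7, "dst_port",   16, 2 },
};
#define NB_FIELDS (sizeof(decap_vxlan_fields) / sizeof(decap_vxlan_fields[0]))

/* Byte width is the bit width rounded up to whole bytes. */
static uint32_t bytes_for_bits(uint32_t bits)
{
    return (bits + 7) / 8;
}

/* Assign each field a byte offset in a packed key (field-ID order, no
 * padding: an assumption). Returns the total key size in bytes. */
static uint32_t layout_key(uint32_t offsets[NB_FIELDS])
{
    uint32_t off = 0;

    for (size_t i = 0; i < NB_FIELDS; i++) {
        assert(bytes_for_bits(decap_vxlan_fields[i].bit_width) ==
               decap_vxlan_fields[i].byte_width);
        offsets[i] = off;
        off += decap_vxlan_fields[i].byte_width;
    }
    return off;
}
```

For this table the packed key is 4 + 4 + 3 + 4 + 4 + 2 + 2 = 23 bytes, with vni at offset 8 and the two TCP ports at offsets 19 and 21.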
> > > > And the hint for action spec "decap_vxlan_fwd" is as follows:
> > > >
> > > > Spec ID:       8519716
> > > > Name:          decap_vxlan_fwd
> > > >
> > > > Field ID   Name      Bit Width   Byte Width   Byte Order
> > > > 1          port_id   32          4            host
> > > >
> > > > Please note that different compilers may assign different IDs.
> > > >
> > > > Below is an example of how to program a rule using the P4Runtime
> > > > API in JSON format. This rule matches the key fields above and
> > > > directs matching packets to port 5.
> > > >
> > > > {
> > > >     "type": 1,  // INSERT
> > > >     "entity": {
> > > >         "table_entry": {
> > > >             "table_id": 8454144,
> > > >             "match": [
> > > >                 { "field_id": 1, "exact": { "value": [10, 0, 0, 1] } },   // outer src IP = 10.0.0.1
> > > >                 { "field_id": 2, "exact": { "value": [10, 0, 0, 2] } },   // outer dst IP = 10.0.0.2
> > > >                 { "field_id": 3, "exact": { "value": [0, 0, 10] } },      // VNI = 10
> > > >                 { "field_id": 4, "exact": { "value": [192, 0, 0, 1] } },  // inner src IP = 192.0.0.1
> > > >                 { "field_id": 5, "exact": { "value": [192, 0, 0, 2] } },  // inner dst IP = 192.0.0.2
> > > >                 { "field_id": 6, "exact": { "value": [0, 200] } },        // TCP src port = 200
> > > >                 { "field_id": 7, "exact": { "value": [0, 201] } }         // TCP dst port = 201
> > > >             ],
> > > >             "action": {
> > > >                 "action": {
> > > >                     "action_id": 8519716,
> > > >                     "params": [
> > > >                         { "param_id": 1, "value": [5, 0, 0, 0] }
> > > >                     ]
> > > >                 }
> > > >             },
> > > >             ...
> > > >         }
> > > >     },
> > > >     ...
> > > > }
> > > >
> > > > Please note that this is only part of the full command. For more
> > > > information, please refer to p4runtime.proto [2].
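For illustration, producing one of the match lines above from a field ID and its encoded bytes can be sketched as follows. format_exact_match() is a hypothetical helper that only reproduces the textual style of this example; it is not the protobuf JSON encoding that a real P4Runtime client would emit.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Format one exact-match entry in the style of the example above.
 * Returns the length of the text written, or -1 if buf is too small. */
static int format_exact_match(char *buf, size_t len, uint32_t field_id,
                              const uint8_t *value, size_t nb_bytes)
{
    size_t pos;
    int n;

    assert(buf != NULL && len > 0);
    n = snprintf(buf, len, "{ \"field_id\": %u, \"exact\": { \"value\": [",
                 (unsigned)field_id);
    if (n < 0 || (size_t)n >= len)
        return -1;                            /* output would be truncated */
    pos = (size_t)n;

    for (size_t i = 0; i < nb_bytes; i++) {   /* comma-separated byte list */
        n = snprintf(buf + pos, len - pos, i == 0 ? "%u" : ", %u",
                     (unsigned)value[i]);
        if (n < 0 || (size_t)n >= len - pos)
            return -1;
        pos += (size_t)n;
    }

    n = snprintf(buf + pos, len - pos, "] } }");
    if (n < 0 || (size_t)n >= len - pos)
        return -1;
    return (int)strlen(buf);
}
```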
> > > >
> > > > 1. https://p4.org/p4-spec/p4runtime/main/P4Runtime-Spec.html
> > > > 2. https://github.com/p4lang/p4runtime/blob/main/proto/p4/v1/p4runtime.proto
> > > > Thank you for your attention to this matter.
> > > >
> > >
> > > I think we should schedule a meeting to see how big the gaps between
> > > rte_flow and P4 really are, and how we can improve rte_flow to provide
> > > the best experience.
> >
> > Sounds like a good idea!
> > >
> > > > Regards
> > > > Qi


Thread overview: 9+ messages
2023-05-08  6:39 Zhang, Qi Z
2023-05-17 15:18 ` Ori Kam
2023-05-18 10:33   ` Zhang, Qi Z
2023-05-18 14:33     ` Ori Kam
2023-05-22  5:12       ` Zhang, Qi Z [this message]
2023-05-24 15:00         ` Jerin Jacob
2023-05-24 15:43           ` Thomas Monjalon
2023-05-18 14:45     ` Honnappa Nagarahalli
2023-05-22  4:58       ` Zhang, Qi Z
