Hello Gregory,

Thanks for suggesting the workaround. Looking forward to the investigation result and the fix.

Regards,
Tao

From: Gregory Etelson
Date: Wednesday, 20. March 2024 at 17:36
To: Tao Li, Suanming Mou, guvenc.gulce@gmail.com, users@dpdk.org
Subject: Re: mlx5: rte_flow template/async API raw_encap validation bug ?

Hello Tao,

I reproduced the PMD crash you've described. We'll investigate it and will issue a fix shortly.

In the meantime, I can suggest a workaround. Please consider creating the actions template with a fully masked RAW_ENCAP action. A fully masked RAW_ENCAP provides the data and size parameters in the action description and sets non-zero values in the action mask configuration.

Testpmd commands are:

    dpdk-testpmd -a $PCI,dv_flow_en=2,representor=vf\[0-1\] -- -i

    port stop all
    flow configure 0 queues_number 4 queues_size 64
    flow configure 1 queues_number 4 queues_size 64
    flow configure 2 queues_number 4 queues_size 64
    port start all

    set verbose 1

    set raw_decap 0 eth / ipv6 / end_set
    set raw_encap 0 eth src is 11:22:33:44:55:66 dst is aa:bb:cc:dd:ee:aa type is 0x0800 has_vlan is 0 / end_set

    flow actions_template 0 create transfer actions_template_id 1 template raw_decap / raw_encap index 0 / represented_port / end mask raw_decap / raw_encap index 0 / represented_port / end

    flow pattern_template 0 create transfer pattern_template_id 1 template eth / ipv6 / end

    flow template_table 0 create transfer table_id 1 group 0 priority 0 rules_number 1 pattern_template 1 actions_template 1

Regards,
Gregory
________________________________
From: Tao Li
Sent: Wednesday, March 20, 2024 17:19
To: Gregory Etelson; Suanming Mou; guvenc.gulce@gmail.com; users@dpdk.org
Subject: Re: mlx5: rte_flow template/async API raw_encap validation bug ?

Hello Gregory,

I am a colleague of Guevenc. Thanks a lot for providing the detailed explanation; we appreciate your support very much.
We adopted the guidance on the usage of the RAW_ENCAP mask and tried it out, and it currently leads to a segmentation fault in our setup. The example code used to create the actions template and the template table is as follows:

    // first action template
    struct rte_flow_action_raw_decap decap_action = {.size = (sizeof(struct rte_ether_hdr) + sizeof(struct rte_ipv6_hdr))}; // remove IPinIP packet’s header
    struct rte_flow_action_raw_encap encap_action = {.data = NULL, .size = sizeof(struct rte_ether_hdr)}; // add ether header for VMs

    struct rte_flow_action act[] = {
        [0] = {.type = RTE_FLOW_ACTION_TYPE_RAW_DECAP, .conf = &decap_action}, //?
        [1] = {.type = RTE_FLOW_ACTION_TYPE_RAW_ENCAP, .conf = &encap_action}, //?
        [2] = {.type = RTE_FLOW_ACTION_TYPE_REPRESENTED_PORT,},
        [3] = {.type = RTE_FLOW_ACTION_TYPE_END,},
    };

    struct rte_flow_action msk[] = {
        [0] = {.type = RTE_FLOW_ACTION_TYPE_RAW_DECAP, .conf = &decap_action},
        [1] = {.type = RTE_FLOW_ACTION_TYPE_RAW_ENCAP, .conf = &encap_action},
        [2] = {.type = RTE_FLOW_ACTION_TYPE_REPRESENTED_PORT,},
        [3] = {.type = RTE_FLOW_ACTION_TYPE_END,},
    };

    port_template_info_pf.actions_templates[0] = create_actions_template(main_eswitch_port, act, msk);

    // create template table
    port_template_info_pf.template_table =
        create_table_template(main_eswitch_port, &table_attr_pf,
                              (struct rte_flow_pattern_template **)&port_template_info_pf.pattern_templates,
                              MAX_NR_OF_PATTERN_TEMPLATE,
                              (struct rte_flow_actions_template **)&port_template_info_pf.actions_templates,
                              MAX_NR_OF_ACTION_TEMPLATE);

Using gdb, the following segfault trace is captured:

    #0 0x00005555579bfb62 in mlx5dr_action_prepare_decap_l3_data (src=0x0, dst=0x7fffffffb3cc "", num_of_actions=6) at ../drivers/net/mlx5/hws/mlx5dr_action.c:2774
    #1 0x00005555579c2136 in mlx5dr_action_handle_tunnel_l3_to_l2 (action=0x55555f089dc0, num_of_hdrs=1 '\001', hdrs=0x7fffffffb720, log_bulk_sz=1) at ../drivers/net/mlx5/hws/mlx5dr_action.c:1468
    #2 0x00005555579bd56f in mlx5dr_action_create_reformat_hws (action=0x55555f089dc0, num_of_hdrs=1 '\001', hdrs=0x7fffffffb720, bulk_size=1) at ../drivers/net/mlx5/hws/mlx5dr_action.c:1537
    #3 0x00005555579bd0eb in mlx5dr_action_create_reformat (ctx=0x55555e756b40, reformat_type=MLX5DR_ACTION_TYP_REFORMAT_TNL_L3_TO_L2, num_of_hdrs=1 '\001', hdrs=0x7fffffffb720, log_bulk_size=1, flags=32) at ../drivers/net/mlx5/hws/mlx5dr_action.c:1594
    #4 0x0000555557927826 in mlx5_tbl_multi_pattern_process (dev=0x555559167300, tbl=0x17fdd1280, mpat=0x7fffffffb9c8, error=0x7fffffffd050) at ../drivers/net/mlx5/mlx5_flow_hw.c:4146
    #5 0x000055555795f133 in mlx5_hw_build_template_table (dev=0x555559167300, nb_action_templates=1 '\001', action_templates=0x555558dc1830, at=0x7fffffffcf00, tbl=0x17fdd1280, error=0x7fffffffd050) at ../drivers/net/mlx5/mlx5_flow_hw.c:4235
    #6 0x00005555579022d3 in flow_hw_table_create (dev=0x555559167300, table_cfg=0x7fffffffdec8, item_templates=0x555558dc1828, nb_item_templates=1 '\001', action_templates=0x555558dc1830, nb_action_templates=1 '\001', error=0x7fffffffe0f8) at ../drivers/net/mlx5/mlx5_flow_hw.c:4401
    #7 0x00005555577f89ec in flow_hw_template_table_create (dev=0x555559167300, attr=0x5555589cc1e4, item_templates=0x555558dc1828, nb_item_templates=1 '\001', action_templates=0x555558dc1830, nb_action_templates=1 '\001', error=0x7fffffffe0f8) at ../drivers/net/mlx5/mlx5_flow_hw.c:4589
    #8 0x0000555556ca21e8 in mlx5_flow_table_create (dev=0x555559167300, attr=0x5555589cc1e4, item_templates=0x555558dc1828, nb_item_templates=1 '\001', action_templates=0x555558dc1830, nb_action_templates=1 '\001', error=0x7fffffffe0f8) at ../drivers/net/mlx5/mlx5_flow.c:9357
    #9 0x0000555555c07c9a in rte_flow_template_table_create (port_id=0, table_attr=0x5555589cc1e4, pattern_templates=0x555558dc1828, nb_pattern_templates=1 '\001', actions_templates=0x555558dc1830, nb_actions_templates=1 '\001', error=0x7fffffffe0f8) at ../lib/ethdev/rte_flow.c:1928

Any comments or suggestions on this issue would be appreciated.
Thanks in advance.

Best regards,
Tao Li
________________________________
From: Gregory Etelson
Sent: 19 March 2024 14:25
To: Suanming Mou; Guvenc Gulce; users@dpdk.org
Cc: Ori Kam; Maayan Kashani
Subject: Re: mlx5: rte_flow template/async API raw_encap validation bug ?

Hello Guvenc,

Flow actions in an MLX5 PMD actions template are translated according to these general rules:

1. If the flow action configuration in the template mask is not NULL, the PMD constructs the action according to the action configuration parameters. The PMD will use that pre-built action during async flow creation. The action parameters cannot be changed during async flow creation.

2. If the flow action configuration in the template mask is NULL, the PMD ignores the action configuration in the template. The action will be constructed according to the configuration data provided during async flow creation.

Before patch 2e543b6f18a2 ("net/mlx5: reuse reformat and modify actions in a table") the PMD ignored the RAW_ENCAP NULL mask configuration and used the action configuration for construction. 2e543b6f18a2 does not allow access to the RAW_ENCAP configuration if the action did not provide a correct mask.

If a flow action configuration has several parameters, the action template can be partially translated: some action parameters are provided with the template and others with the async flow. In that case, if an action mask parameter has any non-zero value, its configuration parameter will be used in the template. If the action mask parameter is 0, that parameter value will be provided during async flow creation. Partial action translation is used for pre-defined flow actions.

The MLX5 PMD requires the `size` parameter of the RAW_ENCAP action during the template action translation. The action data can be provided either with the template action configuration or with the async flow. Therefore, the RAW_ENCAP template configuration can be fully masked with the action size and data, or partially masked with the size only.
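
To make the two RAW_ENCAP cases above concrete, here is a minimal, self-contained sketch of the masking rule. It is a toy model and not PMD code: `struct raw_encap_conf`, `enum raw_encap_mode`, `raw_encap_template_mode()` and `raw_encap_example()` are hypothetical names introduced only for illustration, and the struct is a stand-in for the `data` and `size` fields of `struct rte_flow_action_raw_encap`. The two size checks follow the validation in `flow_hw_validate_action_raw_encap()` quoted elsewhere in this thread; treating a non-NULL mask `data` pointer as "data is masked" is an assumption based on the "any non-zero value" rule.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the data/size fields of
 * struct rte_flow_action_raw_encap; not the DPDK definition. */
struct raw_encap_conf {
	const uint8_t *data;
	size_t size;
};

enum raw_encap_mode {
	RAW_ENCAP_REJECTED,     /* size is not masked or not provided */
	RAW_ENCAP_FULLY_MASKED, /* size and data both come from the template */
	RAW_ENCAP_SIZE_ONLY,    /* size from the template, data from the async flow */
};

/* Classify a RAW_ENCAP template entry from its configuration and mask. */
static enum raw_encap_mode
raw_encap_template_mode(const struct raw_encap_conf *conf,
			const struct raw_encap_conf *mask)
{
	/* The size parameter must be masked ... */
	if (mask == NULL || mask->size == 0)
		return RAW_ENCAP_REJECTED;
	/* ... and the configuration must carry a non-zero size. */
	if (conf == NULL || conf->size == 0)
		return RAW_ENCAP_REJECTED;
	/* Assumption: a non-NULL mask data pointer means data is masked too. */
	if (mask->data != NULL)
		return RAW_ENCAP_FULLY_MASKED;
	return RAW_ENCAP_SIZE_ONLY;
}

/* Usage example covering the cases discussed in the thread. */
void raw_encap_example(void)
{
	static const uint8_t eth_hdr[14] = {0};
	struct raw_encap_conf full = {.data = eth_hdr, .size = sizeof(eth_hdr)};
	struct raw_encap_conf size_only = {.data = NULL, .size = sizeof(eth_hdr)};

	assert(raw_encap_template_mode(&full, &full) == RAW_ENCAP_FULLY_MASKED);
	assert(raw_encap_template_mode(&size_only, &size_only) == RAW_ENCAP_SIZE_ONLY);
	assert(raw_encap_template_mode(&full, NULL) == RAW_ENCAP_REJECTED);
}
```

In these terms, a template that passes the same configuration with `.data = NULL` and a non-zero `.size` as both action and mask falls into the size-only case, while a configuration that carries both data and size, reused as the mask, is fully masked.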

Regards,
Gregory
________________________________
From: Suanming Mou
Sent: Tuesday, March 19, 2024 02:24
To: Guvenc Gulce; users@dpdk.org; Gregory Etelson
Cc: Ori Kam; Maayan Kashani
Subject: RE: mlx5: rte_flow template/async API raw_encap validation bug ?

Hi Guvenc,

From: Guvenc Gulce
Sent: Monday, March 18, 2024 6:26 PM
To: users@dpdk.org
Cc: Suanming Mou; Ori Kam
Subject: mlx5: rte_flow template/async API raw_encap validation bug ?

Hi all,

It is great that the rte_flow async/template API is integrated into the mlx5 driver code and is being established as the new standard rte_flow API.

I have the following raw_encap problem when using the rte_flow async/template API with the mlx5 driver:

- The raw_encap rte_flow action template fails during validation when the action mask conf is NULL, but this contradicts the explanation in Suanming Mou's commit 7f6daa490d9, which states that the raw encap action mask is allowed to be NULL:

    2. RAW encap (encap_data: raw)
       action conf (raw_data)
       a. action mask conf (not NULL)
          - encap_data constant.
       b. action mask conf (NULL)
          - encap_data will change.

Commenting out the raw_encap validation makes it possible to create an rte_flow template with a NULL mask conf, which can be concretized later on. Things seem to work after relaxing the rte_flow raw_encap validation. The change would look like:

[Suanming] I guess maybe it is due to the raw_encap and raw_decap combination. I added Gregory, who added that code and maybe can explain it better.
@Gregory Etelson

diff --git a/drivers/net/mlx5/mlx5_flow_hw.c b/drivers/net/mlx5/mlx5_flow_hw.c
index 35f1ed7a03..3f57fd9286 100644
--- a/drivers/net/mlx5/mlx5_flow_hw.c
+++ b/drivers/net/mlx5/mlx5_flow_hw.c
@@ -6020,10 +6020,10 @@ flow_hw_validate_action_raw_encap(const struct rte_flow_action *action,
        const struct rte_flow_action_raw_encap *mask_conf = mask->conf;
        const struct rte_flow_action_raw_encap *action_conf = action->conf;
 
-       if (!mask_conf || !mask_conf->size)
+/*     if (!mask_conf || !mask_conf->size)
                return rte_flow_error_set(error, EINVAL,
                                          RTE_FLOW_ERROR_TYPE_ACTION, mask,
-                                         "raw_encap: size must be masked");
+                                         "raw_encap: size must be masked"); */
        if (!action_conf || !action_conf->size)
                return rte_flow_error_set(error, EINVAL,
                                          RTE_FLOW_ERROR_TYPE_ACTION, action,

But this cannot be the proper solution. Please advise how to make raw_encap work with the rte_flow template/async API. If relaxing the validation is OK, I can also prepare and send a patch.

Thanks in advance,
Guvenc Gulce