Hello Gregory,
Thanks for suggesting the workaround. Looking forward to the investigation result and the fix.
Regards,
Tao
From: Gregory Etelson <getelson@nvidia.com>
Date: Wednesday, March 20, 2024 at 17:36
To: Tao Li <byteocean@hotmail.com>, Suanming Mou <suanmingm@nvidia.com>, guvenc.gulce@gmail.com <guvenc.gulce@gmail.com>, users@dpdk.org <users@dpdk.org>
Subject: Re: mlx5: rte_flow template/async API raw_encap validation bug ?
Hello Tao,
I reproduced the PMD crash you've described.
We'll investigate it and will issue a fix shortly.
In the meantime, I can suggest a workaround.
Please consider creating the actions template with a fully masked RAW_ENCAP action.
A fully masked RAW_ENCAP provides the data and size parameters in the action configuration and sets non-zero values in the action mask.
Testpmd commands are:
dpdk-testpmd -a $PCI,dv_flow_en=2,representor=vf\[0-1\] -- -i
port stop all
flow configure 0 queues_number 4 queues_size 64
flow configure 1 queues_number 4 queues_size 64
flow configure 2 queues_number 4 queues_size 64
port start all
set verbose 1
set raw_decap 0 eth / ipv6 / end_set
set raw_encap 0 eth src is 11:22:33:44:55:66 dst is aa:bb:cc:dd:ee:aa type is 0x0800 has_vlan is 0 / end_set
flow actions_template 0 create transfer actions_template_id 1 template raw_decap / raw_encap index 0 / represented_port / end mask raw_decap / raw_encap index 0 / represented_port / end
flow pattern_template 0 create transfer pattern_template_id 1 template eth / ipv6 / end
flow template_table 0 create transfer table_id 1 group 0 priority 0 rules_number 1 pattern_template 1 actions_template 1
Regards,
Gregory
From: Tao Li <byteocean@hotmail.com>
Sent: Wednesday, March 20, 2024 17:19
To: Gregory Etelson <getelson@nvidia.com>; Suanming Mou <suanmingm@nvidia.com>; guvenc.gulce@gmail.com <guvenc.gulce@gmail.com>; users@dpdk.org <users@dpdk.org>
Subject: Re: mlx5: rte_flow template/async API raw_encap validation bug ?
Hello Gregory,
I am a colleague of Guvenc. Thanks a lot for the detailed explanation; we appreciate your support very much.
We adopted your guidance on the usage of the RAW_ENCAP mask and experimented with it, but it currently leads to a segmentation fault in our setup. The example code used to create the actions template and template table is as follows:
<Code Excerpt>
// first actions template
struct rte_flow_action_raw_decap decap_action = {
    .size = sizeof(struct rte_ether_hdr) + sizeof(struct rte_ipv6_hdr) // remove the IPinIP packet's header
};
struct rte_flow_action_raw_encap encap_action = {
    .data = NULL, .size = sizeof(struct rte_ether_hdr) // add an Ethernet header for VMs
};
struct rte_flow_action act[] = {
    [0] = { .type = RTE_FLOW_ACTION_TYPE_RAW_DECAP, .conf = &decap_action }, //?
    [1] = { .type = RTE_FLOW_ACTION_TYPE_RAW_ENCAP, .conf = &encap_action }, //?
    [2] = { .type = RTE_FLOW_ACTION_TYPE_REPRESENTED_PORT },
    [3] = { .type = RTE_FLOW_ACTION_TYPE_END },
};
struct rte_flow_action msk[] = {
    [0] = { .type = RTE_FLOW_ACTION_TYPE_RAW_DECAP, .conf = &decap_action },
    [1] = { .type = RTE_FLOW_ACTION_TYPE_RAW_ENCAP, .conf = &encap_action },
    [2] = { .type = RTE_FLOW_ACTION_TYPE_REPRESENTED_PORT },
    [3] = { .type = RTE_FLOW_ACTION_TYPE_END },
};
port_template_info_pf.actions_templates[0] = create_actions_template(main_eswitch_port, act, msk);
// create the template table
port_template_info_pf.template_table = create_table_template(main_eswitch_port, &table_attr_pf,
    (struct rte_flow_pattern_template **)&port_template_info_pf.pattern_templates, MAX_NR_OF_PATTERN_TEMPLATE,
    (struct rte_flow_actions_template **)&port_template_info_pf.actions_templates, MAX_NR_OF_ACTION_TEMPLATE);
</Code Excerpt>
Using gdb, the following segfault backtrace was captured:
<Trace Excerpt>
#0 0x00005555579bfb62 in mlx5dr_action_prepare_decap_l3_data (src=0x0, dst=0x7fffffffb3cc "", num_of_actions=6) at ../drivers/net/mlx5/hws/mlx5dr_action.c:2774
#1 0x00005555579c2136 in mlx5dr_action_handle_tunnel_l3_to_l2 (action=0x55555f089dc0, num_of_hdrs=1 '\001', hdrs=0x7fffffffb720, log_bulk_sz=1) at ../drivers/net/mlx5/hws/mlx5dr_action.c:1468
#2 0x00005555579bd56f in mlx5dr_action_create_reformat_hws (action=0x55555f089dc0, num_of_hdrs=1 '\001', hdrs=0x7fffffffb720, bulk_size=1) at ../drivers/net/mlx5/hws/mlx5dr_action.c:1537
#3 0x00005555579bd0eb in mlx5dr_action_create_reformat (ctx=0x55555e756b40, reformat_type=MLX5DR_ACTION_TYP_REFORMAT_TNL_L3_TO_L2, num_of_hdrs=1 '\001', hdrs=0x7fffffffb720, log_bulk_size=1, flags=32) at ../drivers/net/mlx5/hws/mlx5dr_action.c:1594
#4 0x0000555557927826 in mlx5_tbl_multi_pattern_process (dev=0x555559167300 <rte_eth_devices>, tbl=0x17fdd1280, mpat=0x7fffffffb9c8, error=0x7fffffffd050) at ../drivers/net/mlx5/mlx5_flow_hw.c:4146
#5 0x000055555795f133 in mlx5_hw_build_template_table (dev=0x555559167300 <rte_eth_devices>, nb_action_templates=1 '\001', action_templates=0x555558dc1830 <port_template_info_pf+16>, at=0x7fffffffcf00, tbl=0x17fdd1280,
error=0x7fffffffd050)
at ../drivers/net/mlx5/mlx5_flow_hw.c:4235
#6 0x00005555579022d3 in flow_hw_table_create (dev=0x555559167300 <rte_eth_devices>, table_cfg=0x7fffffffdec8, item_templates=0x555558dc1828 <port_template_info_pf+8>, nb_item_templates=1 '\001',
action_templates=0x555558dc1830 <port_template_info_pf+16>, nb_action_templates=1 '\001', error=0x7fffffffe0f8) at ../drivers/net/mlx5/mlx5_flow_hw.c:4401
#7 0x00005555577f89ec in flow_hw_template_table_create (dev=0x555559167300 <rte_eth_devices>, attr=0x5555589cc1e4 <table_attr_pf>, item_templates=0x555558dc1828 <port_template_info_pf+8>, nb_item_templates=1
'\001',
action_templates=0x555558dc1830 <port_template_info_pf+16>, nb_action_templates=1 '\001', error=0x7fffffffe0f8) at ../drivers/net/mlx5/mlx5_flow_hw.c:4589
#8 0x0000555556ca21e8 in mlx5_flow_table_create (dev=0x555559167300 <rte_eth_devices>, attr=0x5555589cc1e4 <table_attr_pf>, item_templates=0x555558dc1828 <port_template_info_pf+8>, nb_item_templates=1 '\001',
action_templates=0x555558dc1830 <port_template_info_pf+16>, nb_action_templates=1 '\001', error=0x7fffffffe0f8) at ../drivers/net/mlx5/mlx5_flow.c:9357
#9 0x0000555555c07c9a in rte_flow_template_table_create (port_id=0, table_attr=0x5555589cc1e4 <table_attr_pf>, pattern_templates=0x555558dc1828 <port_template_info_pf+8>, nb_pattern_templates=1 '\001',
actions_templates=0x555558dc1830 <port_template_info_pf+16>, nb_actions_templates=1 '\001', error=0x7fffffffe0f8) at ../lib/ethdev/rte_flow.c:1928
</Trace Excerpt>
Any comments or suggestions on this issue would be appreciated. Thanks in advance.
Best regards,
Tao Li
From: Gregory Etelson <getelson@nvidia.com>
Sent: 19 March 2024 14:25
To: Suanming Mou <suanmingm@nvidia.com>; Guvenc Gulce <guvenc.gulce@gmail.com>; users@dpdk.org <users@dpdk.org>
Cc: Ori Kam <orika@nvidia.com>; Maayan Kashani <mkashani@nvidia.com>
Subject: Re: mlx5: rte_flow template/async API raw_encap validation bug ?
Hello Guvenc,
Flow actions in the MLX5 PMD actions template are translated according to these general rules:
Before patch 2e543b6f18a2 ("net/mlx5: reuse reformat and modify actions in a table"),
the PMD ignored a NULL RAW_ENCAP mask configuration and used the action configuration for construction.
Since 2e543b6f18a2, the PMD does not access the RAW_ENCAP configuration if the action did not provide a correct mask.
If a flow action configuration has several parameters, the action template can be partially translated:
some action parameters are provided with the template and others with the async flow.
In that case, if an action mask parameter has any non-zero value, its configuration parameter is used in the template.
If the action mask parameter is 0, that parameter value is provided during async flow creation.
Partial action translation is used for pre-defined flow actions.
The MLX5 PMD requires the `size` parameter of the RAW_ENCAP action during template action translation.
The action data can be provided either with the template action configuration or with the async flow.
Therefore, the RAW_ENCAP template configuration can be fully masked, with the action size and data, or partially masked, with size only.
Regards,
Gregory
From: Suanming Mou <suanmingm@nvidia.com>
Sent: Tuesday, March 19, 2024 02:24
To: Guvenc Gulce <guvenc.gulce@gmail.com>; users@dpdk.org <users@dpdk.org>; Gregory Etelson <getelson@nvidia.com>
Cc: Ori Kam <orika@nvidia.com>; Maayan Kashani <mkashani@nvidia.com>
Subject: RE: mlx5: rte_flow template/async API raw_encap validation bug ?
Hi Guvenc,
From: Guvenc Gulce <guvenc.gulce@gmail.com>
Sent: Monday, March 18, 2024 6:26 PM
To: users@dpdk.org
Cc: Suanming Mou <suanmingm@nvidia.com>; Ori Kam <orika@nvidia.com>
Subject: mlx5: rte_flow template/async API raw_encap validation bug ?
Hi all,
It is great that the rte_flow async/template API is integrated into the mlx5
driver code and is being established as the new standard rte_flow API.
I have the following raw_encap problem when using the rte_flow async/template API
with the mlx5 driver:
- A raw_encap rte_flow actions template fails validation when the action mask
conf is NULL, but this contradicts Suanming Mou's commit 7f6daa490d9,
which clearly states that the raw_encap action mask is allowed to be NULL.
<Excerpt from commit 7f6daa490d9>
2. RAW encap (encap_data: raw)
action conf (raw_data)
a. action mask conf (not NULL)
- encap_data constant.
b. action mask conf (NULL)
- encap_data will change.
</Excerpt from commit 7f6daa490d9>
Commenting out the raw_encap validation makes it possible to create an
rte_flow template with a NULL mask conf, which can be concretized later on.
Things seem to work after relaxing the raw_encap validation.
The change would look like:
[Suanming] I guess maybe it is due to the raw_encap and raw_decap combination. I added Gregory, who added that code and can maybe explain it better.
@Gregory Etelson
<Excerpt>
diff --git a/drivers/net/mlx5/mlx5_flow_hw.c b/drivers/net/mlx5/mlx5_flow_hw.c
index 35f1ed7a03..3f57fd9286 100644
--- a/drivers/net/mlx5/mlx5_flow_hw.c
+++ b/drivers/net/mlx5/mlx5_flow_hw.c
@@ -6020,10 +6020,10 @@ flow_hw_validate_action_raw_encap(const struct rte_flow_action *action,
const struct rte_flow_action_raw_encap *mask_conf = mask->conf;
const struct rte_flow_action_raw_encap *action_conf = action->conf;
- if (!mask_conf || !mask_conf->size)
+/* if (!mask_conf || !mask_conf->size)
return rte_flow_error_set(error, EINVAL,
RTE_FLOW_ERROR_TYPE_ACTION, mask,
- "raw_encap: size must be masked");
+ "raw_encap: size must be masked"); */
if (!action_conf || !action_conf->size)
return rte_flow_error_set(error, EINVAL,
RTE_FLOW_ERROR_TYPE_ACTION, action,
</Excerpt>
But this cannot be the proper solution. Please advise how to make
raw_encap work with the rte_flow template/async API. If relaxing the validation is OK, I can
also prepare and send a patch.
Thanks in advance,
Guvenc Gulce