From: Yongseok Koh <yskoh@mellanox.com>
To: Jack Min <jackmin@mellanox.com>
Cc: Shahaf Shuler <shahafs@mellanox.com>, "dev@dpdk.org" <dev@dpdk.org>
Subject: Re: [dpdk-dev] [PATCH] net/mlx5: eswitch-IP address UDP/TCP port rewrite
Date: Tue, 9 Oct 2018 08:55:29 +0000 [thread overview]
Message-ID: <7BBA09A8-00D5-43D4-9215-72DE4580493B@mellanox.com> (raw)
In-Reply-To: <20181009070323.kdbje4eyg7dzmca6@MTBC-JACKMIN.mtl.com>
> On Oct 9, 2018, at 12:03 AM, Jack Min <jackmin@mellanox.com> wrote:
>
> On 18-10-09 06:08:32, Yongseok Koh wrote:
>> On Mon, Oct 08, 2018 at 07:22:03PM +0800, Xiaoyu Min wrote:
>>> On 18-10-02 04:19:00, Yongseok Koh wrote:
>>>> On Sun, Sep 30, 2018 at 03:21:04PM +0800, Xiaoyu Min wrote:
>>>>> On 18-09-29 07:03:33, Yongseok Koh wrote:
>>>>>> On Tue, Sep 25, 2018 at 07:51:06PM +0800, Xiaoyu Min wrote:
[...]
>>>>>
>>>>>>> + p_parser->keys[idx].mask = 0xFFFF0000;
>>>>>>> + p_parser->keys[idx].val = ((const struct rte_flow_action_set_tp *)
>>>>>>> + actions->conf)->port;
>>>>>>
>>>>>> Assigning 2B to 4B big-endian stroage? Doesn't look consistent with the mask
>>>>>> above - 0xffff0000.
>>>>>>
>>>>> So it should be as following ?
>>>>> p_parser->keys[idx].val = (__u32)((const struct rte_flow_action_set_tp *))
>>>>> actions->conf)->port;
>>>>
>>>> You can figure it out by actual tests but I think the following would be right.
>>>> p_parser->keys[idx].val =
>>>> rte_cpu_to_be_32(((const struct rte_flow_action_set_tp *)
>>>> actions->conf)->port);
>>>>
>>>> Please verify it by testing anyway.
>>>>
>>> No, it doesn't work correctly if it's converted to BE.
>>> As my understanding the Netlink message should be expressed in host-byte order (?)
>>
>> As far as I know rte_flow takes network-byte order for args and tcf na msg takes
>> host-byte order. Please refer to the "override_na_vlan_id:" in mlx5_flow_tcf.c,
>> rte_be_to_cpu_16() is used there. I'm still confusing why the mask is
>> 0xffff0000 above. Please make sure your code works correctly by multiple test
>> cases and hopefully I can hear clear explanation what is right and why.
>>
> Hey Koh,
> 1. yes, rte_flow takes network-byte order. Here the 'port' is already converted to
> BE by testpmd.
> If the rte_be_to_cpu_16 is used just like "override_na_vlan_id" does the result
> is not correct. Things like TC-pedit takes network-byte order.
> 2. For the mask, honestly, I'm not very clear. The only reference is from [1] and
> my verification by using TC show command.
> [1] https://www.netdevconf.org/2.1/slides/apr7/tc-workshop/vadai-tc-workshop-update.pdf
Interesting.
Please go ahead with your code because you have already verified it.
>>>>>>> + p_parser->sel.nkeys = (++idx);
>>>>>>> +}
>>>>>>> +
>>>>>>> +static void
>>>>>>> +flow_tcf_pedit_key_set_ipv6_addr(const struct rte_flow_action *actions,
>>>>>>> + struct pedit_parser *p_parser)
>>>>>>> +{
>>>>>>> + int idx = p_parser->sel.nkeys;
>>>>>>> + int keys = flow_tcf_calc_pedit_keys(IPV6_ADDR_LEN);
>>>>>>> + int off_base =
>>>>>>> + actions->type == RTE_FLOW_ACTION_TYPE_SET_IPV6_SRC ? 8 : 24;
>>>>>>
>>>>>> offsetof(struct ipv6_hdr, src_addr) :
>>>>>> offsetof(struct ipv6_hdr, dst_addr);
>>>>>>
>>>>> Got it!
>>>>>>> + const struct rte_flow_action_set_ipv6 *conf =
>>>>>>> + (const struct rte_flow_action_set_ipv6 *)actions->conf;
>>>>>>> +
>>>>>>> + for (int i = 0; i < keys; i++, idx++) {
>>>>>>> + p_parser->keys_ex[idx].htype = TCA_PEDIT_KEY_EX_HDR_TYPE_IP6;
>>>>>>> + p_parser->keys_ex[idx].cmd = TCA_PEDIT_KEY_EX_CMD_SET;
>>>>>>> + p_parser->keys[idx].off = off_base + i * SZ_PEDIT_KEY_VAL;
>>>>>>> + p_parser->keys[idx].mask = ~UINT32_MAX;
>>>>>>> + memcpy(&p_parser->keys[idx].val,
>>>>>>> + conf->ipv6_addr + i * SZ_PEDIT_KEY_VAL,
>>>>>>> + SZ_PEDIT_KEY_VAL);
>>>>>>> + }
>>>>>>> + p_parser->sel.nkeys += keys;
>>>>>>> +}
>>>>>>> +
>>>>>>> +static void
>>>>>>> +flow_tcf_pedit_key_set_ipv4_addr(const struct rte_flow_action *actions,
>>>>>>
>>>>>> How about getting rte_flow_action_set_ipv4 instead of rte_flow_action?
>>>>>> Same comment for ipv6 and tp_port.
>>>>>>
>>>>> What's the benefit by using rte_flow_action_set_ipv4 and how I know it's
>>>>> for src or dst address ?
>>>>
>>>> Just to make the function neat but I overlooked that you still need
>>>> actions->type. Please disregard my previous comment.
>>>>
>>> OK~
>>>>>>> + struct pedit_parser *p_parser)
>>>>>>> +{
>>>>>>> + int idx = p_parser->sel.nkeys;
>>>>>>> +
>>>>>>> + p_parser->keys_ex[idx].htype = TCA_PEDIT_KEY_EX_HDR_TYPE_IP4;
>>>>>>> + p_parser->keys_ex[idx].cmd = TCA_PEDIT_KEY_EX_CMD_SET;
>>>>>>> + p_parser->keys[idx].off =
>>>>>>> + (actions->type == RTE_FLOW_ACTION_TYPE_SET_IPV4_SRC ? 12 : 16);
>>>>>>
>>>>>> offsetof(struct ipv4_hdr, src_addr) :
>>>>>> offsetof(struct ipv4_hdr, dst_addr);
>>>>>>
>>>>> Got it!
>>>>>>> + p_parser->keys[idx].mask = ~UINT32_MAX;
>>>>>>> + p_parser->keys[idx].val =
>>>>>>> + ((const struct rte_flow_action_set_ipv4 *)
>>>>>>> + actions->conf)->ipv4_addr;
>>>>>>> + p_parser->sel.nkeys = (++idx);
>>>>>>> +}
>>>>>>> +
>>>>>>> +static int
>>>>>>> +flow_tcf_create_pedit_mnl_msg(struct nlmsghdr *nl,
>>>>>>> + const struct rte_flow_action **actions,
>>>>>>> + uint64_t item_flags)
>>>>>>> +{
>>>>>>> + struct pedit_parser p_parser;
>>>>>>> +
>>>>>>> + memset(&p_parser, 0, sizeof(p_parser));
>>>>>>> + mnl_attr_put_strz(nl, TCA_ACT_KIND, "pedit");
>>>>>>> + struct nlattr *na_act_options = mnl_attr_nest_start(nl,
>>>>>>> + TCA_ACT_OPTIONS);
>>>>>>> + /* all modify header actions should be in one tc-pedit action */
>>>>>>> + for (; (*actions)->type != RTE_FLOW_ACTION_TYPE_END; (*actions)++) {
>>>>>>
>>>>>> It seems that you want to aggregate all the pedit actions and form a single
>>>>>> na attr. But what if rte_flow_action_set_* are not contiguous? E.g.
>>>>>>
>>>>>> flow create ... actions set1 / set2 / port_id / set3 / end
>>>>>>
>>>>>> Then, it would have two pedit na attrs. Is that okay?
>>>>>> Or, need to think about another way?
>>>>>>
>>>>>> Same will happen in flow_tcf_get_pedit_actions_size().
>>>>>>
>>>>> It's OK if we have more than one pedit na attrs.
>>>>> _BUT_ only last pedit take effect based on my experiment
>>>>
>>>> Then, shouldn't we give some warning to user in validation? So that user can
>>>> have right expectation and reorder the actions as their intention like:
>>>> flow create ... actions set1 / set2 / set3 / port_id / end
>>>>
>>>> Otherwise set1 and set2 will be lost according to your comment.
>>>>
>>> I prefer to give error to user in validation because this is simple.
>>
>> Good.
>>
>>>> Or, how about making PMD do the right thing. I mean, even if the set actions are
>>>> scattered, PMD can collect it and apply in a single na attr?
>>>>
>>> My feeling is the above approach will be (become) complex. It looks like we introduce
>>> new functionality which re-order all actions, something like rss_expand.
>>
>> +1
>>
>>>>>>> + switch ((*actions)->type) {
>>>>>>> + case RTE_FLOW_ACTION_TYPE_SET_IPV4_SRC:
>>>>>>> + case RTE_FLOW_ACTION_TYPE_SET_IPV4_DST:
>>>>>>> + flow_tcf_pedit_key_set_ipv4_addr(*actions, &p_parser);
>>>>>>> + break;
>>>>>>> + case RTE_FLOW_ACTION_TYPE_SET_IPV6_SRC:
>>>>>>> + case RTE_FLOW_ACTION_TYPE_SET_IPV6_DST:
>>>>>>> + flow_tcf_pedit_key_set_ipv6_addr(*actions, &p_parser);
>>>>>>> + break;
>>>>>>> + case RTE_FLOW_ACTION_TYPE_SET_TP_SRC:
>>>>>>> + case RTE_FLOW_ACTION_TYPE_SET_TP_DST:
>>>>>>> + flow_tcf_pedit_key_set_tp_port(*actions,
>>>>>>> + &p_parser, item_flags);
>>>>>>> + break;
>>>>>>> + default:
>>>>>>> + goto pedit_mnl_msg_done;
>>>>>>> + }
>>>>>>> + }
>>>>>>> +pedit_mnl_msg_done:
>>>>>>> + p_parser.sel.action = TC_ACT_PIPE;
>>>>>>> + mnl_attr_put(nl, TCA_PEDIT_PARMS_EX,
>>>>>>> + sizeof(p_parser.sel) +
>>>>>>> + p_parser.sel.nkeys * sizeof(struct tc_pedit_key),
>>>>>>> + &p_parser);
>>>>>>> + struct nlattr *na_pedit_keys = mnl_attr_nest_start(nl,
>>>>>>> + TCA_PEDIT_KEYS_EX | NLA_F_NESTED);
>>>>>>> + for (int i = 0; i < p_parser.sel.nkeys; i++) {
>>>>>>> + struct nlattr *na_pedit_key = mnl_attr_nest_start(nl,
>>>>>>> + TCA_PEDIT_KEY_EX | NLA_F_NESTED);
>>>>>>> + mnl_attr_put_u16(nl, TCA_PEDIT_KEY_EX_HTYPE,
>>>>>>> + p_parser.keys_ex[i].htype);
>>>>>>> + mnl_attr_put_u16(nl, TCA_PEDIT_KEY_EX_CMD,
>>>>>>> + p_parser.keys_ex[i].cmd);
>>>>>>> + mnl_attr_nest_end(nl, na_pedit_key);
>>>>>>> + }
>>>>>>> + mnl_attr_nest_end(nl, na_pedit_keys);
>>>>>>> + mnl_attr_nest_end(nl, na_act_options);
>>>>>>> + (*actions)--;
>>>>>>> + return 0;
>>>>>>> +}
>>>>>>> +
>>>>>>> +/**
>>>>>>> + * Calculate max memory size of one TC-pedit actions.
>>>>>>> + * One TC-pedit action can contain set of keys each defining
>>>>>>> + * a rewrite element (rte_flow action)
>>>>>>> + *
>>>>>>> + * @param[in] actions
>>>>>>> + * actions specification.
>>>>>>> + * @param[inout] action_flags
>>>>>>> + * actions flags
>>>>>>> + * @param[inout] size
>>>>>>> + * accumulated size
>>>>>>> + * @return
>>>>>>> + * Max memory size of one TC-pedit action
>>>>>>> + */
>>>>>>> +static int
>>>>>>> +flow_tcf_get_pedit_actions_size(const struct rte_flow_action **actions,
>>>>>>> + uint64_t *action_flags)
>>>>>>> +{
>>>>>>> + int pedit_size = 0;
>>>>>>> + int keys = 0;
>>>>>>> + uint64_t flags = 0;
>>>>>>> +
>>>>>>> + pedit_size += SZ_NLATTR_NEST + /* na_act_index. */
>>>>>>> + SZ_NLATTR_STRZ_OF("pedit") +
>>>>>>> + SZ_NLATTR_NEST; /* TCA_ACT_OPTIONS. */
>>>>>>> + for (; (*actions)->type != RTE_FLOW_ACTION_TYPE_END; (*actions)++) {
>>>>>>> + switch ((*actions)->type) {
>>>>>>> + case RTE_FLOW_ACTION_TYPE_SET_IPV4_SRC:
>>>>>>> + keys += flow_tcf_calc_pedit_keys(IPV4_ADDR_LEN);
>>>>>>> + flags |= MLX5_ACTION_SET_IPV4_SRC;
>>>>>>> + break;
>>>>>>> + case RTE_FLOW_ACTION_TYPE_SET_IPV4_DST:
>>>>>>> + keys += flow_tcf_calc_pedit_keys(IPV4_ADDR_LEN);
>>>>>>> + flags |= MLX5_ACTION_SET_IPV4_DST;
>>>>>>> + break;
>>>>>>> + case RTE_FLOW_ACTION_TYPE_SET_IPV6_SRC:
>>>>>>> + keys += flow_tcf_calc_pedit_keys(IPV6_ADDR_LEN);
>>>>>>> + flags |= MLX5_ACTION_SET_IPV6_SRC;
>>>>>>> + break;
>>>>>>> + case RTE_FLOW_ACTION_TYPE_SET_IPV6_DST:
>>>>>>> + keys += flow_tcf_calc_pedit_keys(IPV6_ADDR_LEN);
>>>>>>> + flags |= MLX5_ACTION_SET_IPV6_DST;
>>>>>>> + break;
>>>>>>> + case RTE_FLOW_ACTION_TYPE_SET_TP_SRC:
>>>>>>> + /* TCP is as same as UDP */
>>>>>>> + keys += flow_tcf_calc_pedit_keys(TP_PORT_LEN);
>>>>>>> + flags |= MLX5_ACTION_SET_TP_SRC;
>>>>>>> + break;
>>>>>>> + case RTE_FLOW_ACTION_TYPE_SET_TP_DST:
>>>>>>> + /* TCP is as same as UDP */
>>>>>>> + keys += flow_tcf_calc_pedit_keys(TP_PORT_LEN);
>>>>>>> + flags |= MLX5_ACTION_SET_TP_DST;
>>>>>>> + break;
>>>>>>> + default:
>>>>>>> + goto get_pedit_action_size_done;
>>>>>>> + }
>>>>>>> + }
>>>>>>> +get_pedit_action_size_done:
>>>>>>> + /* TCA_PEDIT_PARAMS_EX */
>>>>>>> + pedit_size += SZ_NLATTR_DATA_OF(sizeof(struct tc_pedit_sel) +
>>>>>>> + keys * sizeof(struct tc_pedit_key));
>>>>>>
>>>>>>> + pedit_size += SZ_NLATTR_NEST; /* TCA_PEDIT_KEYS */
>>>>>>> + pedit_size += keys *
>>>>>>> + /* TCA_PEDIT_KEY_EX + HTYPE + CMD */
>>>>>>> + (SZ_NLATTR_NEST + SZ_NLATTR_DATA_OF(2) + SZ_NLATTR_DATA_OF(2));
>>>>>>> + (*action_flags) |= flags;
>>>>>>> + (*actions)--;
>>>>>>> + return pedit_size;
>>>>>>> +}
>>>>>>> +
>>>>>>> /**
>>>>>>> * Retrieve mask for pattern item.
>>>>>>> *
>>>>>>> @@ -430,6 +708,8 @@ flow_tcf_validate(struct rte_eth_dev *dev,
>>>>>>> of_set_vlan_vid;
>>>>>>> const struct rte_flow_action_of_set_vlan_pcp *
>>>>>>> of_set_vlan_pcp;
>>>>>>> + const struct rte_flow_action_set_ipv4 *set_ipv4;
>>>>>>> + const struct rte_flow_action_set_ipv6 *set_ipv6;
>>>>>>> } conf;
>>>>>>> uint32_t item_flags = 0;
>>>>>>> uint32_t action_flags = 0;
>>>>>>> @@ -690,12 +970,64 @@ flow_tcf_validate(struct rte_eth_dev *dev,
>>>>>>> case RTE_FLOW_ACTION_TYPE_OF_SET_VLAN_PCP:
>>>>>>> action_flags |= MLX5_ACTION_OF_SET_VLAN_PCP;
>>>>>>> break;
>>>>>>> + case RTE_FLOW_ACTION_TYPE_SET_IPV4_SRC:
>>>>>>> + action_flags |= MLX5_ACTION_SET_IPV4_SRC;
>>>>>>> + break;
>>>>>>> + case RTE_FLOW_ACTION_TYPE_SET_IPV4_DST:
>>>>>>> + action_flags |= MLX5_ACTION_SET_IPV4_DST;
>>>>>>> + break;
>>>>>>> + case RTE_FLOW_ACTION_TYPE_SET_IPV6_SRC:
>>>>>>> + action_flags |= MLX5_ACTION_SET_IPV6_SRC;
>>>>>>> + break;
>>>>>>> + case RTE_FLOW_ACTION_TYPE_SET_IPV6_DST:
>>>>>>> + action_flags |= MLX5_ACTION_SET_IPV6_DST;
>>>>>>> + break;
>>>>>>> + case RTE_FLOW_ACTION_TYPE_SET_TP_SRC:
>>>>>>> + action_flags |= MLX5_ACTION_SET_TP_SRC;
>>>>>>> + break;
>>>>>>> + case RTE_FLOW_ACTION_TYPE_SET_TP_DST:
>>>>>>> + action_flags |= MLX5_ACTION_SET_TP_DST;
>>>>>>> + break;
>>>>>>> default:
>>>>>>> return rte_flow_error_set(error, ENOTSUP,
>>>>>>> RTE_FLOW_ERROR_TYPE_ACTION,
>>>>>>> actions,
>>>>>>> "action not supported");
>>>>>>> }
>>>>>>> + if (IS_MODIFY_ACTION(actions->type)) {
>>>>
>>>> This would be a redundant 'if' as classification is already done above. So, how
>>>> about adding a goto label at the end of this code - 'err_no_action_conf:', and
>>>> use goto above. E.g.,
>>>>
>>>> case RTE_FLOW_ACTION_TYPE_SET_TP_DST:
>>>> action_flags |= MLX5_ACTION_SET_TP_DST;
>>>> if (!actions->conf)
>>>> goto err_no_action_conf;
>>>> break;
>>>>
>>> In some level I do agree with you it's redundant. But things like this kind of
>>> redundancy is not avoidable. I mean if we use "goto err_no_action_conf", the
>>> "if (!actions->conf) goto err_no_action_conf" has to be repeated in each "case"
>>> which needs to check conf or you think it's acceptable ?
>>
>> But in your approach, the redundant check (IS_MODIFY_ACTION) is inside a loop
>> and it has to be checked even if there's no 'set' action in the action list.
>>
> Yes, it will check very time. But I doubt it will impact performace significantly.
Not a performance issue but I just wanted to make the code modular as much as possible.
> Another thing bother me is, if taking 'goto' approach, beside the 'if (!actions->conf)...',
> at this moment, we also need to add 'if (set action is discontinued)' for
> each modification case. I'm not sure if this will keep code clear
> considering many repeated code there...
And like I asked, this sanity check is also needed for other actions (PORT_ID and VLAN).
That's why I just wanted to have more generic way.
I'm not sure which way is clearer considering all that.
I don't have a strong objection about your current code but please make sure you add the
same sanity check for other existing acitons in a unified way. Also I hope the macro
(IS_MODIFY_ACTION) has better name.
Thanks,
Yongseok
>>>>
>>>> And if I may, can I ask you to add the same to RTE_FLOW_ACTION_TYPE_PORT_ID?
>>>>
>>> Yes, I will add.
>>>>>>> + if (!actions->conf)
>>>>>>> + return rte_flow_error_set(error, ENOTSUP,
>>>>>>> + RTE_FLOW_ERROR_TYPE_ACTION_CONF,
>>>>>>> + actions,
>>>>>>> + "action configuration not set");
>>>>>>> + }
>>>>>>> + }
next prev parent reply other threads:[~2018-10-09 8:55 UTC|newest]
Thread overview: 13+ messages / expand[flat|nested] mbox.gz Atom feed top
2018-09-25 11:51 Xiaoyu Min
2018-09-28 23:03 ` Yongseok Koh
2018-09-30 7:21 ` Xiaoyu Min
2018-10-01 20:19 ` Yongseok Koh
2018-10-08 11:22 ` Xiaoyu Min
2018-10-08 22:08 ` Yongseok Koh
2018-10-09 7:03 ` Jack Min
2018-10-09 8:55 ` Yongseok Koh [this message]
2018-10-10 12:56 ` [dpdk-dev] [PATCH v2] net/mlx5: rewrite IP address UDP/TCP port by E-Switch Jack Min
2018-10-11 4:45 ` Yongseok Koh
2018-10-11 9:26 ` [dpdk-dev] [PATCH v3] " Jack Min
2018-10-11 13:22 ` [dpdk-dev] [PATCH v4] " Jack Min
2018-10-11 13:39 ` Shahaf Shuler
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=7BBA09A8-00D5-43D4-9215-72DE4580493B@mellanox.com \
--to=yskoh@mellanox.com \
--cc=dev@dpdk.org \
--cc=jackmin@mellanox.com \
--cc=shahafs@mellanox.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).