From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mail-wr1-f67.google.com (mail-wr1-f67.google.com [209.85.221.67]) by dpdk.org (Postfix) with ESMTP id 2E7501B556 for ; Thu, 12 Jul 2018 12:47:03 +0200 (CEST) Received: by mail-wr1-f67.google.com with SMTP id s11-v6so21199343wra.13 for ; Thu, 12 Jul 2018 03:47:03 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=6wind-com.20150623.gappssmtp.com; s=20150623; h=date:from:to:cc:subject:message-id:references:mime-version :content-disposition:in-reply-to; bh=cHejXW5mcx/mVcFA38oaUHb8CCSyxUB00tDqWzxFeFU=; b=AnSzJfG6ALOc2obMszogXIGHcANQ2gWfkI09fyFTgvvufYt70+/BuLZKwCQI8wJAXJ 84npgb8BGw9wGlPp1V9aNYkKYdn7Pe/Kzi31Zw4+sLtb799MJmQyuGbH2B4IsIPApzWi P1efCd5FEXU8sv5TNgIkX3NWjtY5ebwHIgC0bbjrmjj1BuN56190E56W42jdJ0Lja6iw B7Sv2ylE/I5FeuYfeezNDyJ8yIc8UTTA4dan998X656mtm24UHP4P2VEnnunQ6lOlAA8 DEB5Gkp1j5j1/CrnAqMEE76edH5EM0aZVF9YM5765brHYgr9VzzUitFb18PFbjYV098i bSZQ== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:date:from:to:cc:subject:message-id:references :mime-version:content-disposition:in-reply-to; bh=cHejXW5mcx/mVcFA38oaUHb8CCSyxUB00tDqWzxFeFU=; b=akDz7Iivd1/2R3YQaTKv7+tX5cRzwXJQClAIS0Fx6k2ioUo0hx+paJmnqm/GthUj9U Bl5sZeR6Aib34cReAJqPCgvfqnwEso5dcEaEgOQJwT/oJ+hG1ZkSqcpuB1CnxbvDU2mW gOL6FRR+W1gRsxnmnC6+Cy/F4FcxX0fDTM3SpixNsWwMz5tAPGjCKVoaRCtjEZ3ap317 0fCpZ+j0ZWM0Db4RgtT1hpQ5z8CDgtv8luj/ogXz4n/q00t4ZMv6L/g1zRACaoBiVctD A9fezHq4CPiBpyBWp59IoxdvtLnYk1FPXoO61OTx3I6PRNH8NXpIBBzgF57OB1/mYVL5 lOeQ== X-Gm-Message-State: AOUpUlFqbAvh/UQa+dJ2DMw4O3XeeWVmqCb7n/T/6KiDlYkONhvIhjus u7TnlMq5OZZ1AJIxYGGUmBJomw== X-Google-Smtp-Source: AAOMgpeNOnn6BnAK12HugTuLWMy5BW2ZKYizzhp2rM1g9xezPqPmrZcwhN8+0GxL03QzeS/3Y/gVYQ== X-Received: by 2002:adf:ce88:: with SMTP id r8-v6mr1309444wrn.112.1531392423434; Thu, 12 Jul 2018 03:47:03 -0700 (PDT) Received: from 6wind.com (host.78.145.23.62.rev.coltfrance.com. [62.23.145.78]) by smtp.gmail.com with ESMTPSA id r125-v6sm2804557wmb.27.2018.07.12.03.47.02 (version=TLS1_2 cipher=ECDHE-RSA-AES128-GCM-SHA256 bits=128/128); Thu, 12 Jul 2018 03:47:02 -0700 (PDT) Date: Thu, 12 Jul 2018 12:46:46 +0200 From: Adrien Mazarguil To: Yongseok Koh Cc: Shahaf Shuler , Nelio Laranjeiro , dev@dpdk.org Message-ID: <20180712104646.GT5211@6wind.com> References: <20180627173355.4718-1-adrien.mazarguil@6wind.com> <20180627173355.4718-3-adrien.mazarguil@6wind.com> <20180712005917.GD69686@yongseok-MBP.local> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <20180712005917.GD69686@yongseok-MBP.local> Subject: Re: [dpdk-dev] [PATCH 2/6] net/mlx5: add framework for switch flow rules X-BeenThere: dev@dpdk.org X-Mailman-Version: 2.1.15 Precedence: list List-Id: DPDK patches and discussions List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 12 Jul 2018 10:47:04 -0000 On Wed, Jul 11, 2018 at 05:59:18PM -0700, Yongseok Koh wrote: > On Wed, Jun 27, 2018 at 08:08:12PM +0200, Adrien Mazarguil wrote: > > Because mlx5 switch flow rules are configured through Netlink (TC > > interface) and have little in common with Verbs, this patch adds a separate > > parser function to handle them. > > > > - mlx5_nl_flow_transpose() converts a rte_flow rule to its TC equivalent > > and stores the result in a buffer. > > > > - mlx5_nl_flow_brand() gives a unique handle to a flow rule buffer. > > > > - mlx5_nl_flow_create() instantiates a flow rule on the device based on > > such a buffer. > > > > - mlx5_nl_flow_destroy() performs the reverse operation. > > > > These functions are called by the existing implementation when encountering > > flow rules which must be offloaded to the switch (currently relying on the > > transfer attribute). > > > > Signed-off-by: Adrien Mazarguil > > Signed-off-by: Nelio Laranjeiro > > diff --git a/drivers/net/mlx5/mlx5_flow.c b/drivers/net/mlx5/mlx5_flow.c > > index 9241855be..93b245991 100644 > > --- a/drivers/net/mlx5/mlx5_flow.c > > +++ b/drivers/net/mlx5/mlx5_flow.c > > @@ -4,6 +4,7 @@ > > */ > > > > #include > > +#include > > #include > > #include > > > > @@ -271,6 +272,7 @@ struct rte_flow { > > /**< Store tunnel packet type data to store in Rx queue. */ > > uint8_t key[40]; /**< RSS hash key. */ > > uint16_t (*queue)[]; /**< Destination queues to redirect traffic to. */ > > + void *nl_flow; /**< Netlink flow buffer if relevant. */ > > }; > > > > static const struct rte_flow_ops mlx5_flow_ops = { > > @@ -2403,6 +2405,106 @@ mlx5_flow_actions(struct rte_eth_dev *dev, > > } > > > > /** > > + * Validate flow rule and fill flow structure accordingly. > > + * > > + * @param dev > > + * Pointer to Ethernet device. > > + * @param[out] flow > > + * Pointer to flow structure. > > + * @param flow_size > > + * Size of allocated space for @p flow. > > + * @param[in] attr > > + * Flow rule attributes. > > + * @param[in] pattern > > + * Pattern specification (list terminated by the END pattern item). > > + * @param[in] actions > > + * Associated actions (list terminated by the END action). > > + * @param[out] error > > + * Perform verbose error reporting if not NULL. > > + * > > + * @return > > + * A positive value representing the size of the flow object in bytes > > + * regardless of @p flow_size on success, a negative errno value otherwise > > + * and rte_errno is set. > > + */ > > +static int > > +mlx5_flow_merge_switch(struct rte_eth_dev *dev, > > + struct rte_flow *flow, > > + size_t flow_size, > > + const struct rte_flow_attr *attr, > > + const struct rte_flow_item pattern[], > > + const struct rte_flow_action actions[], > > + struct rte_flow_error *error) > > +{ > > + struct priv *priv = dev->data->dev_private; > > + unsigned int n = mlx5_domain_to_port_id(priv->domain_id, NULL, 0); > > + uint16_t port_list[!n + n]; > > + struct mlx5_nl_flow_ptoi ptoi[!n + n + 1]; > > + size_t off = RTE_ALIGN_CEIL(sizeof(*flow), alignof(max_align_t)); > > + unsigned int i; > > + unsigned int own = 0; > > + int ret; > > + > > + /* At least one port is needed when no switch domain is present. */ > > + if (!n) { > > + n = 1; > > + port_list[0] = dev->data->port_id; > > + } else { > > + n = mlx5_domain_to_port_id(priv->domain_id, port_list, n); > > + if (n > RTE_DIM(port_list)) > > + n = RTE_DIM(port_list); > > + } > > + for (i = 0; i != n; ++i) { > > + struct rte_eth_dev_info dev_info; > > + > > + rte_eth_dev_info_get(port_list[i], &dev_info); > > + if (port_list[i] == dev->data->port_id) > > + own = i; > > + ptoi[i].port_id = port_list[i]; > > + ptoi[i].ifindex = dev_info.if_index; > > + } > > + /* Ensure first entry of ptoi[] is the current device. */ > > + if (own) { > > + ptoi[n] = ptoi[0]; > > + ptoi[0] = ptoi[own]; > > + ptoi[own] = ptoi[n]; > > + } > > + /* An entry with zero ifindex terminates ptoi[]. */ > > + ptoi[n].port_id = 0; > > + ptoi[n].ifindex = 0; > > + if (flow_size < off) > > + flow_size = 0; > > + ret = mlx5_nl_flow_transpose((uint8_t *)flow + off, > > + flow_size ? flow_size - off : 0, > > + ptoi, attr, pattern, actions, error); > > + if (ret < 0) > > + return ret; > > So, there's an assumption that the buffer allocated outside of this API is > enough to include all the messages in mlx5_nl_flow_transpose(), right? If > flow_size isn't enough, buf_tmp will be used and _transpose() doesn't return > error but required size. Sounds confusing, may need to make a change or to have > clearer documentation. Well, isn't it already documented? Besides these are the usual snprintf() semantics used everywhere in these files, I think this was a major topic of discussion with Nelio on the flow rework series :) buf_tmp[] is internal to mlx5_nl_flow_transpose() and used as a fallback to complete a pass when input buffer is not large enough (including zero-sized). Having a valid buffer is a constraint imposed by libmnl, because we badly want to know how much space will be needed assuming the flow rule was successfully processed. Without libmnl, the helpers it provides would have been written in a way that doesn't require buf_tmp[]. However libmnl is just too convenient to pass up, hence this compromise. (just to remind onlookers, we want to allocate the minimum amount of memory we possibly can for resources needed by each flow rule, and do so through a single allocation, end goal being to support millions of flow rules while wasting as little memory as possible.) > > diff --git a/drivers/net/mlx5/mlx5_nl_flow.c b/drivers/net/mlx5/mlx5_nl_flow.c > > index 7a8683b03..1fc62fb0a 100644 > > --- a/drivers/net/mlx5/mlx5_nl_flow.c > > +++ b/drivers/net/mlx5/mlx5_nl_flow.c > > @@ -5,7 +5,9 @@ > > > > #include > > #include > > +#include > > #include > > +#include > > #include > > #include > > #include > > @@ -14,11 +16,248 @@ > > #include > > #include > > > > +#include > > #include > > #include > > > > #include "mlx5.h" > > > > +/** Parser state definitions for mlx5_nl_flow_trans[]. */ > > +enum mlx5_nl_flow_trans { > > + INVALID, > > + BACK, > > + ATTR, > > + PATTERN, > > + ITEM_VOID, > > + ACTIONS, > > + ACTION_VOID, > > + END, > > +}; > > + > > +#define TRANS(...) (const enum mlx5_nl_flow_trans []){ __VA_ARGS__, INVALID, } > > + > > +#define PATTERN_COMMON \ > > + ITEM_VOID, ACTIONS > > +#define ACTIONS_COMMON \ > > + ACTION_VOID, END > > + > > +/** Parser state transitions used by mlx5_nl_flow_transpose(). */ > > +static const enum mlx5_nl_flow_trans *const mlx5_nl_flow_trans[] = { > > + [INVALID] = NULL, > > + [BACK] = NULL, > > + [ATTR] = TRANS(PATTERN), > > + [PATTERN] = TRANS(PATTERN_COMMON), > > + [ITEM_VOID] = TRANS(BACK), > > + [ACTIONS] = TRANS(ACTIONS_COMMON), > > + [ACTION_VOID] = TRANS(BACK), > > + [END] = NULL, > > +}; > > + > > +/** > > + * Transpose flow rule description to rtnetlink message. > > + * > > + * This function transposes a flow rule description to a traffic control > > + * (TC) filter creation message ready to be sent over Netlink. > > + * > > + * Target interface is specified as the first entry of the @p ptoi table. > > + * Subsequent entries enable this function to resolve other DPDK port IDs > > + * found in the flow rule. > > + * > > + * @param[out] buf > > + * Output message buffer. May be NULL when @p size is 0. > > + * @param size > > + * Size of @p buf. Message may be truncated if not large enough. > > + * @param[in] ptoi > > + * DPDK port ID to network interface index translation table. This table > > + * is terminated by an entry with a zero ifindex value. > > + * @param[in] attr > > + * Flow rule attributes. > > + * @param[in] pattern > > + * Pattern specification. > > + * @param[in] actions > > + * Associated actions. > > + * @param[out] error > > + * Perform verbose error reporting if not NULL. > > + * > > + * @return > > + * A positive value representing the exact size of the message in bytes > > + * regardless of the @p size parameter on success, a negative errno value > > + * otherwise and rte_errno is set. > > + */ > > +int > > +mlx5_nl_flow_transpose(void *buf, > > + size_t size, > > + const struct mlx5_nl_flow_ptoi *ptoi, > > + const struct rte_flow_attr *attr, > > + const struct rte_flow_item *pattern, > > + const struct rte_flow_action *actions, > > + struct rte_flow_error *error) > > +{ > > + alignas(struct nlmsghdr) > > + uint8_t buf_tmp[MNL_SOCKET_BUFFER_SIZE]; > > + const struct rte_flow_item *item; > > + const struct rte_flow_action *action; > > + unsigned int n; > > + struct nlattr *na_flower; > > + struct nlattr *na_flower_act; > > + const enum mlx5_nl_flow_trans *trans; > > + const enum mlx5_nl_flow_trans *back; > > + > > + if (!size) > > + goto error_nobufs; > > +init: > > + item = pattern; > > + action = actions; > > + n = 0; > > + na_flower = NULL; > > + na_flower_act = NULL; > > + trans = TRANS(ATTR); > > + back = trans; > > +trans: > > + switch (trans[n++]) { > > + struct nlmsghdr *nlh; > > + struct tcmsg *tcm; > > + > > + case INVALID: > > + if (item->type) > > + return rte_flow_error_set > > + (error, ENOTSUP, RTE_FLOW_ERROR_TYPE_ITEM, > > + item, "unsupported pattern item combination"); > > + else if (action->type) > > + return rte_flow_error_set > > + (error, ENOTSUP, RTE_FLOW_ERROR_TYPE_ACTION, > > + action, "unsupported action combination"); > > + return rte_flow_error_set > > + (error, ENOTSUP, RTE_FLOW_ERROR_TYPE_UNSPECIFIED, NULL, > > + "flow rule lacks some kind of fate action"); > > + case BACK: > > + trans = back; > > + n = 0; > > + goto trans; > > + case ATTR: > > + /* > > + * Supported attributes: no groups, some priorities and > > + * ingress only. Don't care about transfer as it is the > > + * caller's problem. > > + */ > > + if (attr->group) > > + return rte_flow_error_set > > + (error, ENOTSUP, > > + RTE_FLOW_ERROR_TYPE_ATTR_GROUP, > > + attr, "groups are not supported"); > > + if (attr->priority > 0xfffe) > > + return rte_flow_error_set > > + (error, ENOTSUP, > > + RTE_FLOW_ERROR_TYPE_ATTR_PRIORITY, > > + attr, "lowest priority level is 0xfffe"); > > + if (!attr->ingress) > > + return rte_flow_error_set > > + (error, ENOTSUP, > > + RTE_FLOW_ERROR_TYPE_ATTR_INGRESS, > > + attr, "only ingress is supported"); > > + if (attr->egress) > > + return rte_flow_error_set > > + (error, ENOTSUP, > > + RTE_FLOW_ERROR_TYPE_ATTR_INGRESS, > > + attr, "egress is not supported"); > > + if (size < mnl_nlmsg_size(sizeof(*tcm))) > > + goto error_nobufs; > > + nlh = mnl_nlmsg_put_header(buf); > > + nlh->nlmsg_type = 0; > > + nlh->nlmsg_flags = 0; > > + nlh->nlmsg_seq = 0; > > + tcm = mnl_nlmsg_put_extra_header(nlh, sizeof(*tcm)); > > + tcm->tcm_family = AF_UNSPEC; > > + tcm->tcm_ifindex = ptoi[0].ifindex; > > + /* > > + * Let kernel pick a handle by default. A predictable handle > > + * can be set by the caller on the resulting buffer through > > + * mlx5_nl_flow_brand(). > > + */ > > + tcm->tcm_handle = 0; > > + tcm->tcm_parent = TC_H_MAKE(TC_H_INGRESS, TC_H_MIN_INGRESS); > > + /* > > + * Priority cannot be zero to prevent the kernel from > > + * picking one automatically. > > + */ > > + tcm->tcm_info = TC_H_MAKE((attr->priority + 1) << 16, > > + RTE_BE16(ETH_P_ALL)); > > + break; > > + case PATTERN: > > + if (!mnl_attr_put_strz_check(buf, size, TCA_KIND, "flower")) > > + goto error_nobufs; > > + na_flower = mnl_attr_nest_start_check(buf, size, TCA_OPTIONS); > > + if (!na_flower) > > + goto error_nobufs; > > + if (!mnl_attr_put_u32_check(buf, size, TCA_FLOWER_FLAGS, > > + TCA_CLS_FLAGS_SKIP_SW)) > > + goto error_nobufs; > > + break; > > + case ITEM_VOID: > > + if (item->type != RTE_FLOW_ITEM_TYPE_VOID) > > + goto trans; > > + ++item; > > + break; > > + case ACTIONS: > > + if (item->type != RTE_FLOW_ITEM_TYPE_END) > > + goto trans; > > + assert(na_flower); > > + assert(!na_flower_act); > > + na_flower_act = > > + mnl_attr_nest_start_check(buf, size, TCA_FLOWER_ACT); > > + if (!na_flower_act) > > + goto error_nobufs; > > + break; > > + case ACTION_VOID: > > + if (action->type != RTE_FLOW_ACTION_TYPE_VOID) > > + goto trans; > > + ++action; > > + break; > > + case END: > > + if (item->type != RTE_FLOW_ITEM_TYPE_END || > > + action->type != RTE_FLOW_ACTION_TYPE_END) > > + goto trans; > > + if (na_flower_act) > > + mnl_attr_nest_end(buf, na_flower_act); > > + if (na_flower) > > + mnl_attr_nest_end(buf, na_flower); > > + nlh = buf; > > + return nlh->nlmsg_len; > > + } > > + back = trans; > > + trans = mlx5_nl_flow_trans[trans[n - 1]]; > > + n = 0; > > + goto trans; > > +error_nobufs: > > + if (buf != buf_tmp) { > > + buf = buf_tmp; > > + size = sizeof(buf_tmp); > > + goto init; > > + } > > Continuing my comment above. > This part is unclear. It looks to me that this func does: > > 1) if size is zero, consider it as a testing call to know the amount of memory > required. Yeah, in fact this one is a shortcut to speed up this specific scenario as it happens all the time in the two-pass use case. You can lump it with 2). > 2) if size isn't zero but not enough, it stops writing to buf and start over to > return the amount of memory required instead of returning error. > 3) if size isn't zero and enough, it fills in buf. > > Do I correctly understand? Yes. Another minor note for 2), the returned buffer is also filled up to the point of failure (mimics snprintf()). Perhaps the following snippet can better summarize the envisioned approach: int ret = snprintf(NULL, 0, "something", ...); if (ret < 0) { goto court; } else { char buf[ret]; snprintf(buf, sizeof(buf), "something", ...); /* Guaranteed. */ [...] } -- Adrien Mazarguil 6WIND