From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mail-wm0-f67.google.com (mail-wm0-f67.google.com [74.125.82.67]) by dpdk.org (Postfix) with ESMTP id CA2B41C389 for ; Wed, 27 Jun 2018 20:08:29 +0200 (CEST) Received: by mail-wm0-f67.google.com with SMTP id 69-v6so6411282wmf.3 for ; Wed, 27 Jun 2018 11:08:29 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=6wind-com.20150623.gappssmtp.com; s=20150623; h=date:from:to:cc:subject:message-id:references:mime-version :content-disposition:in-reply-to; bh=U1E2+DC2LJ/BRtPJepqcUF6KWkj9ynxLOWeVoySyUcI=; b=O2bYwX49+zlyk4Uvd/ZmxkfiSTu7/40DAQO6Vp2eGASyjSvVi8APfWbmj5Sqx2Befc ZryEXttIN12kFCc1Vs9emCAMhV3KNXZaFC3oSYzP+vbPcjVQMjLtOtJ3QUEgGAZU4uxX iNJbSoOxFYPtEExwNjLMfaHACadgCZNPkZCyfMLSC+vsDbhTDJWpvj+2NWEGhZJaHoal hU6EI0eCGIwXjntrtJwVhKQuBq6DcXuzE/AlaeRA/IG0qQLxMhchAgAH433DBFxD+91W rreoVjMK5tLFjrE0ED/NgHFsYaff3xoW0LkESeFbxI5i7b6y5Ox4BataPlvYYSd8Th0a jQyg== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:date:from:to:cc:subject:message-id:references :mime-version:content-disposition:in-reply-to; bh=U1E2+DC2LJ/BRtPJepqcUF6KWkj9ynxLOWeVoySyUcI=; b=bPEE7b8iYhuy73D083UCSlr3/quSK3ldACkQLbD7EcLw+W57aFh2sapR8KajQDNrQA zskzBgnLGXUBX6ahycEB/O5O0mBhpO+boCsnQZawbTX2skLsE23OHrDHnviTpBsPxDqH S8Fs8uKGJ139HCJfKbYOkcszOGsiZubEky5npTxspvFo/nWjkbi+vcXTp0NTUd3wjq3M n+olQsnfoB1Op1iNt4I4UxI+o15j8OhzRj4luBOVMkacMKADzS7Oe7Lj/tHxHeXuRtLw 95tr/g/D3muCo8UHGkbu/NbuZ2u/rCZr+MsOxX22Ogk5A8vB3n5cXAW6qC3i1kudR3wc XeYQ== X-Gm-Message-State: APt69E0dkuU8LbxM52rcktRldR2tR/y2F15VdpYSTXV3J84nOyh2Q6ZH 8X/4yEMfzcskDcVomlPa6F/VSw== X-Google-Smtp-Source: AAOMgpdhT+0jIuQpVRvAxEAqK6e4X3o2jQYKMuoE+5/iDGAQGooXmt9LPkYYXtvo02MsUv7blXJIIg== X-Received: by 2002:a1c:928c:: with SMTP id u134-v6mr5665915wmd.106.1530122909288; Wed, 27 Jun 2018 11:08:29 -0700 (PDT) Received: from 6wind.com (host.78.145.23.62.rev.coltfrance.com. [62.23.145.78]) by smtp.gmail.com with ESMTPSA id 132-v6sm8584514wmr.33.2018.06.27.11.08.28 (version=TLS1_2 cipher=ECDHE-RSA-AES128-GCM-SHA256 bits=128/128); Wed, 27 Jun 2018 11:08:28 -0700 (PDT) Date: Wed, 27 Jun 2018 20:08:12 +0200 From: Adrien Mazarguil To: Shahaf Shuler Cc: Nelio Laranjeiro , Yongseok Koh , dev@dpdk.org Message-ID: <20180627173355.4718-3-adrien.mazarguil@6wind.com> References: <20180627173355.4718-1-adrien.mazarguil@6wind.com> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <20180627173355.4718-1-adrien.mazarguil@6wind.com> X-Mailer: git-send-email 2.11.0 Subject: [dpdk-dev] [PATCH 2/6] net/mlx5: add framework for switch flow rules X-BeenThere: dev@dpdk.org X-Mailman-Version: 2.1.15 Precedence: list List-Id: DPDK patches and discussions List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 27 Jun 2018 18:08:30 -0000 Because mlx5 switch flow rules are configured through Netlink (TC interface) and have little in common with Verbs, this patch adds a separate parser function to handle them. - mlx5_nl_flow_transpose() converts a rte_flow rule to its TC equivalent and stores the result in a buffer. - mlx5_nl_flow_brand() gives a unique handle to a flow rule buffer. - mlx5_nl_flow_create() instantiates a flow rule on the device based on such a buffer. - mlx5_nl_flow_destroy() performs the reverse operation. These functions are called by the existing implementation when encountering flow rules which must be offloaded to the switch (currently relying on the transfer attribute). Signed-off-by: Adrien Mazarguil Signed-off-by: Nelio Laranjeiro --- drivers/net/mlx5/mlx5.h | 18 +++ drivers/net/mlx5/mlx5_flow.c | 113 ++++++++++++++ drivers/net/mlx5/mlx5_nl_flow.c | 295 +++++++++++++++++++++++++++++++++++ 3 files changed, 426 insertions(+) diff --git a/drivers/net/mlx5/mlx5.h b/drivers/net/mlx5/mlx5.h index 390249adb..aa16057d6 100644 --- a/drivers/net/mlx5/mlx5.h +++ b/drivers/net/mlx5/mlx5.h @@ -148,6 +148,12 @@ struct mlx5_drop { struct mlx5_rxq_ibv *rxq; /* Verbs Rx queue. */ }; +/** DPDK port to network interface index (ifindex) conversion. */ +struct mlx5_nl_flow_ptoi { + uint16_t port_id; /**< DPDK port ID. */ + unsigned int ifindex; /**< Network interface index. */ +}; + struct mnl_socket; struct priv { @@ -374,6 +380,18 @@ int mlx5_nl_allmulti(struct rte_eth_dev *dev, int enable); /* mlx5_nl_flow.c */ +int mlx5_nl_flow_transpose(void *buf, + size_t size, + const struct mlx5_nl_flow_ptoi *ptoi, + const struct rte_flow_attr *attr, + const struct rte_flow_item *pattern, + const struct rte_flow_action *actions, + struct rte_flow_error *error); +void mlx5_nl_flow_brand(void *buf, uint32_t handle); +int mlx5_nl_flow_create(struct mnl_socket *nl, void *buf, + struct rte_flow_error *error); +int mlx5_nl_flow_destroy(struct mnl_socket *nl, void *buf, + struct rte_flow_error *error); int mlx5_nl_flow_init(struct mnl_socket *nl, unsigned int ifindex, struct rte_flow_error *error); struct mnl_socket *mlx5_nl_flow_socket_create(void); diff --git a/drivers/net/mlx5/mlx5_flow.c b/drivers/net/mlx5/mlx5_flow.c index 9241855be..93b245991 100644 --- a/drivers/net/mlx5/mlx5_flow.c +++ b/drivers/net/mlx5/mlx5_flow.c @@ -4,6 +4,7 @@ */ #include +#include #include #include @@ -271,6 +272,7 @@ struct rte_flow { /**< Store tunnel packet type data to store in Rx queue. */ uint8_t key[40]; /**< RSS hash key. */ uint16_t (*queue)[]; /**< Destination queues to redirect traffic to. */ + void *nl_flow; /**< Netlink flow buffer if relevant. */ }; static const struct rte_flow_ops mlx5_flow_ops = { @@ -2403,6 +2405,106 @@ mlx5_flow_actions(struct rte_eth_dev *dev, } /** + * Validate flow rule and fill flow structure accordingly. + * + * @param dev + * Pointer to Ethernet device. + * @param[out] flow + * Pointer to flow structure. + * @param flow_size + * Size of allocated space for @p flow. + * @param[in] attr + * Flow rule attributes. + * @param[in] pattern + * Pattern specification (list terminated by the END pattern item). + * @param[in] actions + * Associated actions (list terminated by the END action). + * @param[out] error + * Perform verbose error reporting if not NULL. + * + * @return + * A positive value representing the size of the flow object in bytes + * regardless of @p flow_size on success, a negative errno value otherwise + * and rte_errno is set. + */ +static int +mlx5_flow_merge_switch(struct rte_eth_dev *dev, + struct rte_flow *flow, + size_t flow_size, + const struct rte_flow_attr *attr, + const struct rte_flow_item pattern[], + const struct rte_flow_action actions[], + struct rte_flow_error *error) +{ + struct priv *priv = dev->data->dev_private; + unsigned int n = mlx5_domain_to_port_id(priv->domain_id, NULL, 0); + uint16_t port_list[!n + n]; + struct mlx5_nl_flow_ptoi ptoi[!n + n + 1]; + size_t off = RTE_ALIGN_CEIL(sizeof(*flow), alignof(max_align_t)); + unsigned int i; + unsigned int own = 0; + int ret; + + /* At least one port is needed when no switch domain is present. */ + if (!n) { + n = 1; + port_list[0] = dev->data->port_id; + } else { + n = mlx5_domain_to_port_id(priv->domain_id, port_list, n); + if (n > RTE_DIM(port_list)) + n = RTE_DIM(port_list); + } + for (i = 0; i != n; ++i) { + struct rte_eth_dev_info dev_info; + + rte_eth_dev_info_get(port_list[i], &dev_info); + if (port_list[i] == dev->data->port_id) + own = i; + ptoi[i].port_id = port_list[i]; + ptoi[i].ifindex = dev_info.if_index; + } + /* Ensure first entry of ptoi[] is the current device. */ + if (own) { + ptoi[n] = ptoi[0]; + ptoi[0] = ptoi[own]; + ptoi[own] = ptoi[n]; + } + /* An entry with zero ifindex terminates ptoi[]. */ + ptoi[n].port_id = 0; + ptoi[n].ifindex = 0; + if (flow_size < off) + flow_size = 0; + ret = mlx5_nl_flow_transpose((uint8_t *)flow + off, + flow_size ? flow_size - off : 0, + ptoi, attr, pattern, actions, error); + if (ret < 0) + return ret; + if (flow_size) { + *flow = (struct rte_flow){ + .attributes = *attr, + .nl_flow = (uint8_t *)flow + off, + }; + /* + * Generate a reasonably unique handle based on the address + * of the target buffer. + * + * This is straightforward on 32-bit systems where the flow + * pointer can be used directly. Otherwise, its least + * significant part is taken after shifting it by the + * previous power of two of the pointed buffer size. + */ + if (sizeof(flow) <= 4) + mlx5_nl_flow_brand(flow->nl_flow, (uintptr_t)flow); + else + mlx5_nl_flow_brand + (flow->nl_flow, + (uintptr_t)flow >> + rte_log2_u32(rte_align32prevpow2(flow_size))); + } + return off + ret; +} + +/** * Validate the rule and return a flow structure filled accordingly. * * @param dev @@ -2439,6 +2541,9 @@ mlx5_flow_merge(struct rte_eth_dev *dev, struct rte_flow *flow, int ret; uint32_t i; + if (attr->transfer) + return mlx5_flow_merge_switch(dev, flow, flow_size, + attr, items, actions, error); if (!remain) flow = &local_flow; ret = mlx5_flow_attributes(dev, attr, flow, error); @@ -2554,8 +2659,11 @@ mlx5_flow_validate(struct rte_eth_dev *dev, static void mlx5_flow_fate_remove(struct rte_eth_dev *dev, struct rte_flow *flow) { + struct priv *priv = dev->data->dev_private; struct mlx5_flow_verbs *verbs; + if (flow->nl_flow && priv->mnl_socket) + mlx5_nl_flow_destroy(priv->mnl_socket, flow->nl_flow, NULL); LIST_FOREACH(verbs, &flow->verbs, next) { if (verbs->flow) { claim_zero(mlx5_glue->destroy_flow(verbs->flow)); @@ -2592,6 +2700,7 @@ static int mlx5_flow_fate_apply(struct rte_eth_dev *dev, struct rte_flow *flow, struct rte_flow_error *error) { + struct priv *priv = dev->data->dev_private; struct mlx5_flow_verbs *verbs; int err; @@ -2640,6 +2749,10 @@ mlx5_flow_fate_apply(struct rte_eth_dev *dev, struct rte_flow *flow, goto error; } } + if (flow->nl_flow && + priv->mnl_socket && + mlx5_nl_flow_create(priv->mnl_socket, flow->nl_flow, error)) + goto error; return 0; error: err = rte_errno; /* Save rte_errno before cleanup. */ diff --git a/drivers/net/mlx5/mlx5_nl_flow.c b/drivers/net/mlx5/mlx5_nl_flow.c index 7a8683b03..1fc62fb0a 100644 --- a/drivers/net/mlx5/mlx5_nl_flow.c +++ b/drivers/net/mlx5/mlx5_nl_flow.c @@ -5,7 +5,9 @@ #include #include +#include #include +#include #include #include #include @@ -14,11 +16,248 @@ #include #include +#include #include #include #include "mlx5.h" +/** Parser state definitions for mlx5_nl_flow_trans[]. */ +enum mlx5_nl_flow_trans { + INVALID, + BACK, + ATTR, + PATTERN, + ITEM_VOID, + ACTIONS, + ACTION_VOID, + END, +}; + +#define TRANS(...) (const enum mlx5_nl_flow_trans []){ __VA_ARGS__, INVALID, } + +#define PATTERN_COMMON \ + ITEM_VOID, ACTIONS +#define ACTIONS_COMMON \ + ACTION_VOID, END + +/** Parser state transitions used by mlx5_nl_flow_transpose(). */ +static const enum mlx5_nl_flow_trans *const mlx5_nl_flow_trans[] = { + [INVALID] = NULL, + [BACK] = NULL, + [ATTR] = TRANS(PATTERN), + [PATTERN] = TRANS(PATTERN_COMMON), + [ITEM_VOID] = TRANS(BACK), + [ACTIONS] = TRANS(ACTIONS_COMMON), + [ACTION_VOID] = TRANS(BACK), + [END] = NULL, +}; + +/** + * Transpose flow rule description to rtnetlink message. + * + * This function transposes a flow rule description to a traffic control + * (TC) filter creation message ready to be sent over Netlink. + * + * Target interface is specified as the first entry of the @p ptoi table. + * Subsequent entries enable this function to resolve other DPDK port IDs + * found in the flow rule. + * + * @param[out] buf + * Output message buffer. May be NULL when @p size is 0. + * @param size + * Size of @p buf. Message may be truncated if not large enough. + * @param[in] ptoi + * DPDK port ID to network interface index translation table. This table + * is terminated by an entry with a zero ifindex value. + * @param[in] attr + * Flow rule attributes. + * @param[in] pattern + * Pattern specification. + * @param[in] actions + * Associated actions. + * @param[out] error + * Perform verbose error reporting if not NULL. + * + * @return + * A positive value representing the exact size of the message in bytes + * regardless of the @p size parameter on success, a negative errno value + * otherwise and rte_errno is set. + */ +int +mlx5_nl_flow_transpose(void *buf, + size_t size, + const struct mlx5_nl_flow_ptoi *ptoi, + const struct rte_flow_attr *attr, + const struct rte_flow_item *pattern, + const struct rte_flow_action *actions, + struct rte_flow_error *error) +{ + alignas(struct nlmsghdr) + uint8_t buf_tmp[MNL_SOCKET_BUFFER_SIZE]; + const struct rte_flow_item *item; + const struct rte_flow_action *action; + unsigned int n; + struct nlattr *na_flower; + struct nlattr *na_flower_act; + const enum mlx5_nl_flow_trans *trans; + const enum mlx5_nl_flow_trans *back; + + if (!size) + goto error_nobufs; +init: + item = pattern; + action = actions; + n = 0; + na_flower = NULL; + na_flower_act = NULL; + trans = TRANS(ATTR); + back = trans; +trans: + switch (trans[n++]) { + struct nlmsghdr *nlh; + struct tcmsg *tcm; + + case INVALID: + if (item->type) + return rte_flow_error_set + (error, ENOTSUP, RTE_FLOW_ERROR_TYPE_ITEM, + item, "unsupported pattern item combination"); + else if (action->type) + return rte_flow_error_set + (error, ENOTSUP, RTE_FLOW_ERROR_TYPE_ACTION, + action, "unsupported action combination"); + return rte_flow_error_set + (error, ENOTSUP, RTE_FLOW_ERROR_TYPE_UNSPECIFIED, NULL, + "flow rule lacks some kind of fate action"); + case BACK: + trans = back; + n = 0; + goto trans; + case ATTR: + /* + * Supported attributes: no groups, some priorities and + * ingress only. Don't care about transfer as it is the + * caller's problem. + */ + if (attr->group) + return rte_flow_error_set + (error, ENOTSUP, + RTE_FLOW_ERROR_TYPE_ATTR_GROUP, + attr, "groups are not supported"); + if (attr->priority > 0xfffe) + return rte_flow_error_set + (error, ENOTSUP, + RTE_FLOW_ERROR_TYPE_ATTR_PRIORITY, + attr, "lowest priority level is 0xfffe"); + if (!attr->ingress) + return rte_flow_error_set + (error, ENOTSUP, + RTE_FLOW_ERROR_TYPE_ATTR_INGRESS, + attr, "only ingress is supported"); + if (attr->egress) + return rte_flow_error_set + (error, ENOTSUP, + RTE_FLOW_ERROR_TYPE_ATTR_INGRESS, + attr, "egress is not supported"); + if (size < mnl_nlmsg_size(sizeof(*tcm))) + goto error_nobufs; + nlh = mnl_nlmsg_put_header(buf); + nlh->nlmsg_type = 0; + nlh->nlmsg_flags = 0; + nlh->nlmsg_seq = 0; + tcm = mnl_nlmsg_put_extra_header(nlh, sizeof(*tcm)); + tcm->tcm_family = AF_UNSPEC; + tcm->tcm_ifindex = ptoi[0].ifindex; + /* + * Let kernel pick a handle by default. A predictable handle + * can be set by the caller on the resulting buffer through + * mlx5_nl_flow_brand(). + */ + tcm->tcm_handle = 0; + tcm->tcm_parent = TC_H_MAKE(TC_H_INGRESS, TC_H_MIN_INGRESS); + /* + * Priority cannot be zero to prevent the kernel from + * picking one automatically. + */ + tcm->tcm_info = TC_H_MAKE((attr->priority + 1) << 16, + RTE_BE16(ETH_P_ALL)); + break; + case PATTERN: + if (!mnl_attr_put_strz_check(buf, size, TCA_KIND, "flower")) + goto error_nobufs; + na_flower = mnl_attr_nest_start_check(buf, size, TCA_OPTIONS); + if (!na_flower) + goto error_nobufs; + if (!mnl_attr_put_u32_check(buf, size, TCA_FLOWER_FLAGS, + TCA_CLS_FLAGS_SKIP_SW)) + goto error_nobufs; + break; + case ITEM_VOID: + if (item->type != RTE_FLOW_ITEM_TYPE_VOID) + goto trans; + ++item; + break; + case ACTIONS: + if (item->type != RTE_FLOW_ITEM_TYPE_END) + goto trans; + assert(na_flower); + assert(!na_flower_act); + na_flower_act = + mnl_attr_nest_start_check(buf, size, TCA_FLOWER_ACT); + if (!na_flower_act) + goto error_nobufs; + break; + case ACTION_VOID: + if (action->type != RTE_FLOW_ACTION_TYPE_VOID) + goto trans; + ++action; + break; + case END: + if (item->type != RTE_FLOW_ITEM_TYPE_END || + action->type != RTE_FLOW_ACTION_TYPE_END) + goto trans; + if (na_flower_act) + mnl_attr_nest_end(buf, na_flower_act); + if (na_flower) + mnl_attr_nest_end(buf, na_flower); + nlh = buf; + return nlh->nlmsg_len; + } + back = trans; + trans = mlx5_nl_flow_trans[trans[n - 1]]; + n = 0; + goto trans; +error_nobufs: + if (buf != buf_tmp) { + buf = buf_tmp; + size = sizeof(buf_tmp); + goto init; + } + return rte_flow_error_set + (error, ENOBUFS, RTE_FLOW_ERROR_TYPE_UNSPECIFIED, NULL, + "generated TC message is too large"); +} + +/** + * Brand rtnetlink buffer with unique handle. + * + * This handle should be unique for a given network interface to avoid + * collisions. + * + * @param buf + * Flow rule buffer previously initialized by mlx5_nl_flow_transpose(). + * @param handle + * Unique 32-bit handle to use. + */ +void +mlx5_nl_flow_brand(void *buf, uint32_t handle) +{ + struct tcmsg *tcm = mnl_nlmsg_get_payload(buf); + + tcm->tcm_handle = handle; +} + /** * Send Netlink message with acknowledgment. * @@ -54,6 +293,62 @@ mlx5_nl_flow_nl_ack(struct mnl_socket *nl, struct nlmsghdr *nlh) } /** + * Create a Netlink flow rule. + * + * @param nl + * Libmnl socket to use. + * @param buf + * Flow rule buffer previously initialized by mlx5_nl_flow_transpose(). + * @param[out] error + * Perform verbose error reporting if not NULL. + * + * @return + * 0 on success, a negative errno value otherwise and rte_errno is set. + */ +int +mlx5_nl_flow_create(struct mnl_socket *nl, void *buf, + struct rte_flow_error *error) +{ + struct nlmsghdr *nlh = buf; + + nlh->nlmsg_type = RTM_NEWTFILTER; + nlh->nlmsg_flags = NLM_F_REQUEST | NLM_F_CREATE | NLM_F_EXCL; + if (!mlx5_nl_flow_nl_ack(nl, nlh)) + return 0; + return rte_flow_error_set + (error, rte_errno, RTE_FLOW_ERROR_TYPE_UNSPECIFIED, NULL, + "netlink: failed to create TC flow rule"); +} + +/** + * Destroy a Netlink flow rule. + * + * @param nl + * Libmnl socket to use. + * @param buf + * Flow rule buffer previously initialized by mlx5_nl_flow_transpose(). + * @param[out] error + * Perform verbose error reporting if not NULL. + * + * @return + * 0 on success, a negative errno value otherwise and rte_errno is set. + */ +int +mlx5_nl_flow_destroy(struct mnl_socket *nl, void *buf, + struct rte_flow_error *error) +{ + struct nlmsghdr *nlh = buf; + + nlh->nlmsg_type = RTM_DELTFILTER; + nlh->nlmsg_flags = NLM_F_REQUEST; + if (!mlx5_nl_flow_nl_ack(nl, nlh)) + return 0; + return rte_flow_error_set + (error, errno, RTE_FLOW_ERROR_TYPE_UNSPECIFIED, NULL, + "netlink: failed to destroy TC flow rule"); +} + +/** * Initialize ingress qdisc of a given network interface. * * @param nl -- 2.11.0