DPDK patches and discussions
From: Yongseok Koh <yskoh@mellanox.com>
To: Adrien Mazarguil <adrien.mazarguil@6wind.com>
Cc: Shahaf Shuler <shahafs@mellanox.com>,
	Nelio Laranjeiro <nelio.laranjeiro@6wind.com>,
	dev@dpdk.org
Subject: Re: [dpdk-dev] [PATCH 2/6] net/mlx5: add framework for switch flow rules
Date: Thu, 12 Jul 2018 11:25:27 -0700
Message-ID: <20180712182526.GA73570@yongseok-MBP.local>
In-Reply-To: <20180712104646.GT5211@6wind.com>

On Thu, Jul 12, 2018 at 12:46:46PM +0200, Adrien Mazarguil wrote:
> On Wed, Jul 11, 2018 at 05:59:18PM -0700, Yongseok Koh wrote:
> > On Wed, Jun 27, 2018 at 08:08:12PM +0200, Adrien Mazarguil wrote:
> > > Because mlx5 switch flow rules are configured through Netlink (TC
> > > interface) and have little in common with Verbs, this patch adds a separate
> > > parser function to handle them.
> > > 
> > > - mlx5_nl_flow_transpose() converts a rte_flow rule to its TC equivalent
> > >   and stores the result in a buffer.
> > > 
> > > - mlx5_nl_flow_brand() gives a unique handle to a flow rule buffer.
> > > 
> > > - mlx5_nl_flow_create() instantiates a flow rule on the device based on
> > >   such a buffer.
> > > 
> > > - mlx5_nl_flow_destroy() performs the reverse operation.
> > > 
> > > These functions are called by the existing implementation when encountering
> > > flow rules which must be offloaded to the switch (currently relying on the
> > > transfer attribute).
> > > 
> > > Signed-off-by: Adrien Mazarguil <adrien.mazarguil@6wind.com>
> > > Signed-off-by: Nelio Laranjeiro <nelio.laranjeiro@6wind.com>
> <snip>
> > > diff --git a/drivers/net/mlx5/mlx5_flow.c b/drivers/net/mlx5/mlx5_flow.c
> > > index 9241855be..93b245991 100644
> > > --- a/drivers/net/mlx5/mlx5_flow.c
> > > +++ b/drivers/net/mlx5/mlx5_flow.c
> > > @@ -4,6 +4,7 @@
> > >   */
> > >  
> > >  #include <sys/queue.h>
> > > +#include <stdalign.h>
> > >  #include <stdint.h>
> > >  #include <string.h>
> > >  
> > > @@ -271,6 +272,7 @@ struct rte_flow {
> > >  	/**< Store tunnel packet type data to store in Rx queue. */
> > >  	uint8_t key[40]; /**< RSS hash key. */
> > >  	uint16_t (*queue)[]; /**< Destination queues to redirect traffic to. */
> > > +	void *nl_flow; /**< Netlink flow buffer if relevant. */
> > >  };
> > >  
> > >  static const struct rte_flow_ops mlx5_flow_ops = {
> > > @@ -2403,6 +2405,106 @@ mlx5_flow_actions(struct rte_eth_dev *dev,
> > >  }
> > >  
> > >  /**
> > > + * Validate flow rule and fill flow structure accordingly.
> > > + *
> > > + * @param dev
> > > + *   Pointer to Ethernet device.
> > > + * @param[out] flow
> > > + *   Pointer to flow structure.
> > > + * @param flow_size
> > > + *   Size of allocated space for @p flow.
> > > + * @param[in] attr
> > > + *   Flow rule attributes.
> > > + * @param[in] pattern
> > > + *   Pattern specification (list terminated by the END pattern item).
> > > + * @param[in] actions
> > > + *   Associated actions (list terminated by the END action).
> > > + * @param[out] error
> > > + *   Perform verbose error reporting if not NULL.
> > > + *
> > > + * @return
> > > + *   A positive value representing the size of the flow object in bytes
> > > + *   regardless of @p flow_size on success, a negative errno value otherwise
> > > + *   and rte_errno is set.
> > > + */
> > > +static int
> > > +mlx5_flow_merge_switch(struct rte_eth_dev *dev,
> > > +		       struct rte_flow *flow,
> > > +		       size_t flow_size,
> > > +		       const struct rte_flow_attr *attr,
> > > +		       const struct rte_flow_item pattern[],
> > > +		       const struct rte_flow_action actions[],
> > > +		       struct rte_flow_error *error)
> > > +{
> > > +	struct priv *priv = dev->data->dev_private;
> > > +	unsigned int n = mlx5_domain_to_port_id(priv->domain_id, NULL, 0);
> > > +	uint16_t port_list[!n + n];
> > > +	struct mlx5_nl_flow_ptoi ptoi[!n + n + 1];
> > > +	size_t off = RTE_ALIGN_CEIL(sizeof(*flow), alignof(max_align_t));
> > > +	unsigned int i;
> > > +	unsigned int own = 0;
> > > +	int ret;
> > > +
> > > +	/* At least one port is needed when no switch domain is present. */
> > > +	if (!n) {
> > > +		n = 1;
> > > +		port_list[0] = dev->data->port_id;
> > > +	} else {
> > > +		n = mlx5_domain_to_port_id(priv->domain_id, port_list, n);
> > > +		if (n > RTE_DIM(port_list))
> > > +			n = RTE_DIM(port_list);
> > > +	}
> > > +	for (i = 0; i != n; ++i) {
> > > +		struct rte_eth_dev_info dev_info;
> > > +
> > > +		rte_eth_dev_info_get(port_list[i], &dev_info);
> > > +		if (port_list[i] == dev->data->port_id)
> > > +			own = i;
> > > +		ptoi[i].port_id = port_list[i];
> > > +		ptoi[i].ifindex = dev_info.if_index;
> > > +	}
> > > +	/* Ensure first entry of ptoi[] is the current device. */
> > > +	if (own) {
> > > +		ptoi[n] = ptoi[0];
> > > +		ptoi[0] = ptoi[own];
> > > +		ptoi[own] = ptoi[n];
> > > +	}
> > > +	/* An entry with zero ifindex terminates ptoi[]. */
> > > +	ptoi[n].port_id = 0;
> > > +	ptoi[n].ifindex = 0;
> > > +	if (flow_size < off)
> > > +		flow_size = 0;
> > > +	ret = mlx5_nl_flow_transpose((uint8_t *)flow + off,
> > > +				     flow_size ? flow_size - off : 0,
> > > +				     ptoi, attr, pattern, actions, error);
> > > +	if (ret < 0)
> > > +		return ret;
> > 
> > So, there's an assumption that the buffer allocated outside of this API is
> > enough to include all the messages in mlx5_nl_flow_transpose(), right? If
> > flow_size isn't enough, buf_tmp will be used and _transpose() doesn't return
> > error but required size. Sounds confusing, may need to make a change or to have
> > clearer documentation.
> 
> Well, isn't it already documented? Besides these are the usual snprintf()
> semantics used everywhere in these files, I think this was a major topic of
> discussion with Nelio on the flow rework series :)
> 
> buf_tmp[] is internal to mlx5_nl_flow_transpose() and used as a fallback to
> complete a pass when input buffer is not large enough (including
> zero-sized). Having a valid buffer is a constraint imposed by libmnl,
> because we badly want to know how much space will be needed assuming the
> flow rule was successfully processed.
> 
> Without libmnl, the helpers it provides would have been written in a way
> that doesn't require buf_tmp[]. However libmnl is just too convenient to
> pass up, hence this compromise.
> 
> (just to remind onlookers, we want to allocate the minimum amount of memory
> we possibly can for resources needed by each flow rule, and do so through a
> single allocation, end goal being to support millions of flow rules while
> wasting as little memory as possible.)
> 
> <snip>
> > > diff --git a/drivers/net/mlx5/mlx5_nl_flow.c b/drivers/net/mlx5/mlx5_nl_flow.c
> > > index 7a8683b03..1fc62fb0a 100644
> > > --- a/drivers/net/mlx5/mlx5_nl_flow.c
> > > +++ b/drivers/net/mlx5/mlx5_nl_flow.c
> > > @@ -5,7 +5,9 @@
> > >  
> > >  #include <errno.h>
> > >  #include <libmnl/libmnl.h>
> > > +#include <linux/if_ether.h>
> > >  #include <linux/netlink.h>
> > > +#include <linux/pkt_cls.h>
> > >  #include <linux/pkt_sched.h>
> > >  #include <linux/rtnetlink.h>
> > >  #include <stdalign.h>
> > > @@ -14,11 +16,248 @@
> > >  #include <stdlib.h>
> > >  #include <sys/socket.h>
> > >  
> > > +#include <rte_byteorder.h>
> > >  #include <rte_errno.h>
> > >  #include <rte_flow.h>
> > >  
> > >  #include "mlx5.h"
> > >  
> > > +/** Parser state definitions for mlx5_nl_flow_trans[]. */
> > > +enum mlx5_nl_flow_trans {
> > > +	INVALID,
> > > +	BACK,
> > > +	ATTR,
> > > +	PATTERN,
> > > +	ITEM_VOID,
> > > +	ACTIONS,
> > > +	ACTION_VOID,
> > > +	END,
> > > +};
> > > +
> > > +#define TRANS(...) (const enum mlx5_nl_flow_trans []){ __VA_ARGS__, INVALID, }
> > > +
> > > +#define PATTERN_COMMON \
> > > +	ITEM_VOID, ACTIONS
> > > +#define ACTIONS_COMMON \
> > > +	ACTION_VOID, END
> > > +
> > > +/** Parser state transitions used by mlx5_nl_flow_transpose(). */
> > > +static const enum mlx5_nl_flow_trans *const mlx5_nl_flow_trans[] = {
> > > +	[INVALID] = NULL,
> > > +	[BACK] = NULL,
> > > +	[ATTR] = TRANS(PATTERN),
> > > +	[PATTERN] = TRANS(PATTERN_COMMON),
> > > +	[ITEM_VOID] = TRANS(BACK),
> > > +	[ACTIONS] = TRANS(ACTIONS_COMMON),
> > > +	[ACTION_VOID] = TRANS(BACK),
> > > +	[END] = NULL,
> > > +};
> > > +
> > > +/**
> > > + * Transpose flow rule description to rtnetlink message.
> > > + *
> > > + * This function transposes a flow rule description to a traffic control
> > > + * (TC) filter creation message ready to be sent over Netlink.
> > > + *
> > > + * Target interface is specified as the first entry of the @p ptoi table.
> > > + * Subsequent entries enable this function to resolve other DPDK port IDs
> > > + * found in the flow rule.
> > > + *
> > > + * @param[out] buf
> > > + *   Output message buffer. May be NULL when @p size is 0.
> > > + * @param size
> > > + *   Size of @p buf. Message may be truncated if not large enough.
> > > + * @param[in] ptoi
> > > + *   DPDK port ID to network interface index translation table. This table
> > > + *   is terminated by an entry with a zero ifindex value.
> > > + * @param[in] attr
> > > + *   Flow rule attributes.
> > > + * @param[in] pattern
> > > + *   Pattern specification.
> > > + * @param[in] actions
> > > + *   Associated actions.
> > > + * @param[out] error
> > > + *   Perform verbose error reporting if not NULL.
> > > + *
> > > + * @return
> > > + *   A positive value representing the exact size of the message in bytes
> > > + *   regardless of the @p size parameter on success, a negative errno value
> > > + *   otherwise and rte_errno is set.
> > > + */
> > > +int
> > > +mlx5_nl_flow_transpose(void *buf,
> > > +		       size_t size,
> > > +		       const struct mlx5_nl_flow_ptoi *ptoi,
> > > +		       const struct rte_flow_attr *attr,
> > > +		       const struct rte_flow_item *pattern,
> > > +		       const struct rte_flow_action *actions,
> > > +		       struct rte_flow_error *error)
> > > +{
> > > +	alignas(struct nlmsghdr)
> > > +	uint8_t buf_tmp[MNL_SOCKET_BUFFER_SIZE];
> > > +	const struct rte_flow_item *item;
> > > +	const struct rte_flow_action *action;
> > > +	unsigned int n;
> > > +	struct nlattr *na_flower;
> > > +	struct nlattr *na_flower_act;
> > > +	const enum mlx5_nl_flow_trans *trans;
> > > +	const enum mlx5_nl_flow_trans *back;
> > > +
> > > +	if (!size)
> > > +		goto error_nobufs;
> > > +init:
> > > +	item = pattern;
> > > +	action = actions;
> > > +	n = 0;
> > > +	na_flower = NULL;
> > > +	na_flower_act = NULL;
> > > +	trans = TRANS(ATTR);
> > > +	back = trans;
> > > +trans:
> > > +	switch (trans[n++]) {
> > > +		struct nlmsghdr *nlh;
> > > +		struct tcmsg *tcm;
> > > +
> > > +	case INVALID:
> > > +		if (item->type)
> > > +			return rte_flow_error_set
> > > +				(error, ENOTSUP, RTE_FLOW_ERROR_TYPE_ITEM,
> > > +				 item, "unsupported pattern item combination");
> > > +		else if (action->type)
> > > +			return rte_flow_error_set
> > > +				(error, ENOTSUP, RTE_FLOW_ERROR_TYPE_ACTION,
> > > +				 action, "unsupported action combination");
> > > +		return rte_flow_error_set
> > > +			(error, ENOTSUP, RTE_FLOW_ERROR_TYPE_UNSPECIFIED, NULL,
> > > +			 "flow rule lacks some kind of fate action");
> > > +	case BACK:
> > > +		trans = back;
> > > +		n = 0;
> > > +		goto trans;
> > > +	case ATTR:
> > > +		/*
> > > +		 * Supported attributes: no groups, some priorities and
> > > +		 * ingress only. Don't care about transfer as it is the
> > > +		 * caller's problem.
> > > +		 */
> > > +		if (attr->group)
> > > +			return rte_flow_error_set
> > > +				(error, ENOTSUP,
> > > +				 RTE_FLOW_ERROR_TYPE_ATTR_GROUP,
> > > +				 attr, "groups are not supported");
> > > +		if (attr->priority > 0xfffe)
> > > +			return rte_flow_error_set
> > > +				(error, ENOTSUP,
> > > +				 RTE_FLOW_ERROR_TYPE_ATTR_PRIORITY,
> > > +				 attr, "lowest priority level is 0xfffe");
> > > +		if (!attr->ingress)
> > > +			return rte_flow_error_set
> > > +				(error, ENOTSUP,
> > > +				 RTE_FLOW_ERROR_TYPE_ATTR_INGRESS,
> > > +				 attr, "only ingress is supported");
> > > +		if (attr->egress)
> > > +			return rte_flow_error_set
> > > +				(error, ENOTSUP,
> > > +				 RTE_FLOW_ERROR_TYPE_ATTR_EGRESS,
> > > +				 attr, "egress is not supported");
> > > +		if (size < mnl_nlmsg_size(sizeof(*tcm)))
> > > +			goto error_nobufs;
> > > +		nlh = mnl_nlmsg_put_header(buf);
> > > +		nlh->nlmsg_type = 0;
> > > +		nlh->nlmsg_flags = 0;
> > > +		nlh->nlmsg_seq = 0;
> > > +		tcm = mnl_nlmsg_put_extra_header(nlh, sizeof(*tcm));
> > > +		tcm->tcm_family = AF_UNSPEC;
> > > +		tcm->tcm_ifindex = ptoi[0].ifindex;
> > > +		/*
> > > +		 * Let kernel pick a handle by default. A predictable handle
> > > +		 * can be set by the caller on the resulting buffer through
> > > +		 * mlx5_nl_flow_brand().
> > > +		 */
> > > +		tcm->tcm_handle = 0;
> > > +		tcm->tcm_parent = TC_H_MAKE(TC_H_INGRESS, TC_H_MIN_INGRESS);
> > > +		/*
> > > +		 * Priority cannot be zero to prevent the kernel from
> > > +		 * picking one automatically.
> > > +		 */
> > > +		tcm->tcm_info = TC_H_MAKE((attr->priority + 1) << 16,
> > > +					  RTE_BE16(ETH_P_ALL));
> > > +		break;
> > > +	case PATTERN:
> > > +		if (!mnl_attr_put_strz_check(buf, size, TCA_KIND, "flower"))
> > > +			goto error_nobufs;
> > > +		na_flower = mnl_attr_nest_start_check(buf, size, TCA_OPTIONS);
> > > +		if (!na_flower)
> > > +			goto error_nobufs;
> > > +		if (!mnl_attr_put_u32_check(buf, size, TCA_FLOWER_FLAGS,
> > > +					    TCA_CLS_FLAGS_SKIP_SW))
> > > +			goto error_nobufs;
> > > +		break;
> > > +	case ITEM_VOID:
> > > +		if (item->type != RTE_FLOW_ITEM_TYPE_VOID)
> > > +			goto trans;
> > > +		++item;
> > > +		break;
> > > +	case ACTIONS:
> > > +		if (item->type != RTE_FLOW_ITEM_TYPE_END)
> > > +			goto trans;
> > > +		assert(na_flower);
> > > +		assert(!na_flower_act);
> > > +		na_flower_act =
> > > +			mnl_attr_nest_start_check(buf, size, TCA_FLOWER_ACT);
> > > +		if (!na_flower_act)
> > > +			goto error_nobufs;
> > > +		break;
> > > +	case ACTION_VOID:
> > > +		if (action->type != RTE_FLOW_ACTION_TYPE_VOID)
> > > +			goto trans;
> > > +		++action;
> > > +		break;
> > > +	case END:
> > > +		if (item->type != RTE_FLOW_ITEM_TYPE_END ||
> > > +		    action->type != RTE_FLOW_ACTION_TYPE_END)
> > > +			goto trans;
> > > +		if (na_flower_act)
> > > +			mnl_attr_nest_end(buf, na_flower_act);
> > > +		if (na_flower)
> > > +			mnl_attr_nest_end(buf, na_flower);
> > > +		nlh = buf;
> > > +		return nlh->nlmsg_len;
> > > +	}
> > > +	back = trans;
> > > +	trans = mlx5_nl_flow_trans[trans[n - 1]];
> > > +	n = 0;
> > > +	goto trans;
> > > +error_nobufs:
> > > +	if (buf != buf_tmp) {
> > > +		buf = buf_tmp;
> > > +		size = sizeof(buf_tmp);
> > > +		goto init;
> > > +	}
> > 
> > Continuing my comment above.
> > This part is unclear. It looks to me that this func does:
> > 
> > 1) if size is zero, consider it as a testing call to know the amount of memory
> > required.
> 
> Yeah, in fact this one is a shortcut to speed up this specific scenario as
> it happens all the time in the two-pass use case. You can lump it with 2).
> 
> > 2) if size isn't zero but not enough, it stops writing to buf and starts over
> > to return the amount of memory required instead of returning an error.
> > 3) if size isn't zero and enough, it fills in buf.
> > 
> > Do I correctly understand?
> 
> Yes. Another minor note for 2), the returned buffer is also filled up to the
> point of failure (mimics snprintf()).
> 
> Perhaps the following snippet can better summarize the envisioned approach:
> 
>  int ret = snprintf(NULL, 0, "something", ...);
> 
>  if (ret < 0) {
>      goto court;
>  } else {
>      char buf[ret + 1]; /* +1 for the terminating NUL. */
> 
>      snprintf(buf, sizeof(buf), "something", ...); /* Guaranteed. */
>      [...]
>  }

I know you and Nelio mimicked snprintf(), but _merge() is an internal API
rather than a public one for users, so I didn't think it necessarily had to
behave that way. I hoped it would be used either for testing (to learn the
size) or for real translation, i.e. cases 1) and 3), with no possibility of 2);
case 2) would then have been handled by assert(). I believe this could have
made the code simpler.
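
To make the difference concrete, here is a minimal sketch of the two
conventions. All demo_* names are hypothetical stand-ins and do not come from
the patch; the "message" is a fixed byte array rather than a real Netlink
buffer, and only the size and truncation semantics are meant to match the
discussion above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical fixed "message" standing in for a generated TC filter. */
static const unsigned char demo_msg[] = { 0x10, 0x20, 0x30, 0x40, 0x50 };

/*
 * snprintf()-like convention: always return the full message size, write
 * at most size bytes (cases 1, 2 and 3 are all accepted).
 */
static int
demo_transpose(void *buf, size_t size)
{
	size_t len = sizeof(demo_msg);

	if (size)
		memcpy(buf, demo_msg, size < len ? size : len);
	return (int)len;
}

/*
 * Stricter convention: either a size query (size == 0) or a buffer that is
 * large enough; a short non-empty buffer is treated as a caller bug.
 */
static int
demo_transpose_strict(void *buf, size_t size)
{
	size_t len = sizeof(demo_msg);

	if (!size)
		return (int)len; /* Case 1: size query only. */
	assert(size >= len); /* Case 2 is not allowed. */
	memcpy(buf, demo_msg, len); /* Case 3: real translation. */
	return (int)len;
}

/* Two-pass use: query the size, allocate once, then fill the buffer. */
static void *
demo_two_pass(size_t *len)
{
	int ret = demo_transpose(NULL, 0);
	void *buf;

	if (ret < 0)
		return NULL;
	buf = malloc((size_t)ret);
	if (!buf)
		return NULL;
	ret = demo_transpose(buf, (size_t)ret);
	assert(ret >= 0);
	*len = (size_t)ret;
	return buf;
}
```

A caller following the two-pass pattern only ever exercises cases 1) and 3),
so both variants behave the same for it; they differ only when a non-empty
but too-small buffer is passed in.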

However, as I already acked Nelio's patchset, agreed on the idea and Nelio
already documented the behavior,

Acked-by: Yongseok Koh <yskoh@mellanox.com>

Thanks
