DPDK patches and discussions
* [dpdk-dev] [PATCH 0/6] net/mlx5: add support for switch flow rules
@ 2018-06-27 18:08 Adrien Mazarguil
  2018-06-27 18:08 ` [dpdk-dev] [PATCH 1/6] net/mlx5: lay groundwork for switch offloads Adrien Mazarguil
                   ` (7 more replies)
  0 siblings, 8 replies; 33+ messages in thread
From: Adrien Mazarguil @ 2018-06-27 18:08 UTC (permalink / raw)
  To: Shahaf Shuler; +Cc: Nelio Laranjeiro, Yongseok Koh, dev

This series adds support for switch flow rules, that is, rte_flow rules
applied to mlx5 devices at the switch level.

It allows applications to offload traffic redirection between DPDK ports in
hardware, optionally modifying the traffic in the process (e.g. performing
encap/decap).

For this to work, involved DPDK ports must be part of the same switch
domain, as is the case with port representors, and the transfer attribute
must be requested on flow rules.

Also, since the mlx5 switch is controlled through Netlink instead of Verbs,
and given how tedious it is to format Netlink messages by hand, a new
dependency on libmnl is added to mlx5. See the relevant patch.

This series depends on Nelio's mlx5 flow engine rework ("net/mlx5: flow
rework" [1][2]) which must be applied first.

[1] https://patches.dpdk.org/project/dpdk/list/?series=268
[2] https://mails.dpdk.org/archives/dev/2018-June/105499.html

Adrien Mazarguil (6):
  net/mlx5: lay groundwork for switch offloads
  net/mlx5: add framework for switch flow rules
  net/mlx5: add fate actions to switch flow rules
  net/mlx5: add L2-L4 pattern items to switch flow rules
  net/mlx5: add VLAN item and actions to switch flow rules
  net/mlx5: add port ID pattern item to switch flow rules

 drivers/net/mlx5/Makefile       |    2 +
 drivers/net/mlx5/mlx5.c         |   32 +
 drivers/net/mlx5/mlx5.h         |   28 +
 drivers/net/mlx5/mlx5_flow.c    |  113 ++++
 drivers/net/mlx5/mlx5_nl_flow.c | 1126 ++++++++++++++++++++++++++++++++++
 mk/rte.app.mk                   |    2 +-
 6 files changed, 1302 insertions(+), 1 deletion(-)
 create mode 100644 drivers/net/mlx5/mlx5_nl_flow.c

-- 
2.11.0


* [dpdk-dev] [PATCH 1/6] net/mlx5: lay groundwork for switch offloads
  2018-06-27 18:08 [dpdk-dev] [PATCH 0/6] net/mlx5: add support for switch flow rules Adrien Mazarguil
@ 2018-06-27 18:08 ` Adrien Mazarguil
  2018-07-12  0:17   ` Yongseok Koh
  2018-06-27 18:08 ` [dpdk-dev] [PATCH 2/6] net/mlx5: add framework for switch flow rules Adrien Mazarguil
                   ` (6 subsequent siblings)
  7 siblings, 1 reply; 33+ messages in thread
From: Adrien Mazarguil @ 2018-06-27 18:08 UTC (permalink / raw)
  To: Shahaf Shuler; +Cc: Nelio Laranjeiro, Yongseok Koh, dev

With mlx5, unlike normal flow rules implemented through Verbs for traffic
emitted and received by the application, flow rules targeting different
logical ports of the device (VF representors for instance) are offloaded at
the switch level and must be configured through Netlink (the TC interface).

This patch adds preliminary support to manage such flow rules through the
flow API (rte_flow).

Instead of rewriting tons of Netlink helpers, and as previously suggested by
Stephen [1], this patch introduces a new dependency on libmnl [2]
(LGPL-2.1) when compiling mlx5.

[1] https://mails.dpdk.org/archives/dev/2018-March/092676.html
[2] https://netfilter.org/projects/libmnl/

Signed-off-by: Adrien Mazarguil <adrien.mazarguil@6wind.com>
---
 drivers/net/mlx5/Makefile       |   2 +
 drivers/net/mlx5/mlx5.c         |  32 ++++++++
 drivers/net/mlx5/mlx5.h         |  10 +++
 drivers/net/mlx5/mlx5_nl_flow.c | 139 +++++++++++++++++++++++++++++++++++
 mk/rte.app.mk                   |   2 +-
 5 files changed, 184 insertions(+), 1 deletion(-)

diff --git a/drivers/net/mlx5/Makefile b/drivers/net/mlx5/Makefile
index 8a5229e61..3325eed06 100644
--- a/drivers/net/mlx5/Makefile
+++ b/drivers/net/mlx5/Makefile
@@ -33,6 +33,7 @@ SRCS-$(CONFIG_RTE_LIBRTE_MLX5_PMD) += mlx5_mr.c
 SRCS-$(CONFIG_RTE_LIBRTE_MLX5_PMD) += mlx5_flow.c
 SRCS-$(CONFIG_RTE_LIBRTE_MLX5_PMD) += mlx5_socket.c
 SRCS-$(CONFIG_RTE_LIBRTE_MLX5_PMD) += mlx5_nl.c
+SRCS-$(CONFIG_RTE_LIBRTE_MLX5_PMD) += mlx5_nl_flow.c
 
 ifeq ($(CONFIG_RTE_LIBRTE_MLX5_DLOPEN_DEPS),y)
 INSTALL-$(CONFIG_RTE_LIBRTE_MLX5_PMD)-lib += $(LIB_GLUE)
@@ -56,6 +57,7 @@ LDLIBS += -ldl
 else
 LDLIBS += -libverbs -lmlx5
 endif
+LDLIBS += -lmnl
 LDLIBS += -lrte_eal -lrte_mbuf -lrte_mempool -lrte_ring
 LDLIBS += -lrte_ethdev -lrte_net -lrte_kvargs
 LDLIBS += -lrte_bus_pci
diff --git a/drivers/net/mlx5/mlx5.c b/drivers/net/mlx5/mlx5.c
index 665a3c31f..d9b9097b1 100644
--- a/drivers/net/mlx5/mlx5.c
+++ b/drivers/net/mlx5/mlx5.c
@@ -279,6 +279,8 @@ mlx5_dev_close(struct rte_eth_dev *dev)
 		mlx5_nl_mac_addr_flush(dev);
 	if (priv->nl_socket >= 0)
 		close(priv->nl_socket);
+	if (priv->mnl_socket)
+		mlx5_nl_flow_socket_destroy(priv->mnl_socket);
 	ret = mlx5_hrxq_ibv_verify(dev);
 	if (ret)
 		DRV_LOG(WARNING, "port %u some hash Rx queue still remain",
@@ -1077,6 +1079,34 @@ mlx5_dev_spawn_one(struct rte_device *dpdk_dev,
 			priv->nl_socket = -1;
 		mlx5_nl_mac_addr_sync(eth_dev);
 	}
+	priv->mnl_socket = mlx5_nl_flow_socket_create();
+	if (!priv->mnl_socket) {
+		err = -rte_errno;
+		DRV_LOG(WARNING,
+			"flow rules relying on switch offloads will not be"
+			" supported: cannot open libmnl socket: %s",
+			strerror(rte_errno));
+	} else {
+		struct rte_flow_error error;
+		unsigned int ifindex = mlx5_ifindex(eth_dev);
+
+		if (!ifindex) {
+			err = -rte_errno;
+			error.message =
+				"cannot retrieve network interface index";
+		} else {
+			err = mlx5_nl_flow_init(priv->mnl_socket, ifindex,
+						&error);
+		}
+		if (err) {
+			DRV_LOG(WARNING,
+				"flow rules relying on switch offloads will"
+				" not be supported: %s: %s",
+				error.message, strerror(rte_errno));
+			mlx5_nl_flow_socket_destroy(priv->mnl_socket);
+			priv->mnl_socket = NULL;
+		}
+	}
 	TAILQ_INIT(&priv->flows);
 	TAILQ_INIT(&priv->ctrl_flows);
 	/* Hint libmlx5 to use PMD allocator for data plane resources */
@@ -1127,6 +1157,8 @@ mlx5_dev_spawn_one(struct rte_device *dpdk_dev,
 	if (priv) {
 		unsigned int i;
 
+		if (priv->mnl_socket)
+			mlx5_nl_flow_socket_destroy(priv->mnl_socket);
 		i = mlx5_domain_to_port_id(priv->domain_id, NULL, 0);
 		if (i == 1)
 			claim_zero(rte_eth_switch_domain_free(priv->domain_id));
diff --git a/drivers/net/mlx5/mlx5.h b/drivers/net/mlx5/mlx5.h
index 1d8e156c8..390249adb 100644
--- a/drivers/net/mlx5/mlx5.h
+++ b/drivers/net/mlx5/mlx5.h
@@ -148,6 +148,8 @@ struct mlx5_drop {
 	struct mlx5_rxq_ibv *rxq; /* Verbs Rx queue. */
 };
 
+struct mnl_socket;
+
 struct priv {
 	LIST_ENTRY(priv) mem_event_cb; /* Called by memory event callback. */
 	struct rte_eth_dev_data *dev_data;  /* Pointer to device data. */
@@ -207,6 +209,7 @@ struct priv {
 	/* Context for Verbs allocator. */
 	int nl_socket; /* Netlink socket. */
 	uint32_t nl_sn; /* Netlink message sequence number. */
+	struct mnl_socket *mnl_socket; /* Libmnl socket. */
 };
 
 #define PORT_ID(priv) ((priv)->dev_data->port_id)
@@ -369,4 +372,11 @@ void mlx5_nl_mac_addr_flush(struct rte_eth_dev *dev);
 int mlx5_nl_promisc(struct rte_eth_dev *dev, int enable);
 int mlx5_nl_allmulti(struct rte_eth_dev *dev, int enable);
 
+/* mlx5_nl_flow.c */
+
+int mlx5_nl_flow_init(struct mnl_socket *nl, unsigned int ifindex,
+		      struct rte_flow_error *error);
+struct mnl_socket *mlx5_nl_flow_socket_create(void);
+void mlx5_nl_flow_socket_destroy(struct mnl_socket *nl);
+
 #endif /* RTE_PMD_MLX5_H_ */
diff --git a/drivers/net/mlx5/mlx5_nl_flow.c b/drivers/net/mlx5/mlx5_nl_flow.c
new file mode 100644
index 000000000..7a8683b03
--- /dev/null
+++ b/drivers/net/mlx5/mlx5_nl_flow.c
@@ -0,0 +1,139 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright 2018 6WIND S.A.
+ * Copyright 2018 Mellanox Technologies, Ltd
+ */
+
+#include <errno.h>
+#include <libmnl/libmnl.h>
+#include <linux/netlink.h>
+#include <linux/pkt_sched.h>
+#include <linux/rtnetlink.h>
+#include <stdalign.h>
+#include <stddef.h>
+#include <stdint.h>
+#include <stdlib.h>
+#include <sys/socket.h>
+
+#include <rte_errno.h>
+#include <rte_flow.h>
+
+#include "mlx5.h"
+
+/**
+ * Send Netlink message with acknowledgment.
+ *
+ * @param nl
+ *   Libmnl socket to use.
+ * @param nlh
+ *   Message to send. This function always raises the NLM_F_ACK flag before
+ *   sending.
+ *
+ * @return
+ *   0 on success, a negative errno value otherwise and rte_errno is set.
+ */
+static int
+mlx5_nl_flow_nl_ack(struct mnl_socket *nl, struct nlmsghdr *nlh)
+{
+	alignas(struct nlmsghdr)
+	uint8_t ans[MNL_SOCKET_BUFFER_SIZE];
+	uint32_t seq = random();
+	int ret;
+
+	nlh->nlmsg_flags |= NLM_F_ACK;
+	nlh->nlmsg_seq = seq;
+	ret = mnl_socket_sendto(nl, nlh, nlh->nlmsg_len);
+	if (ret != -1)
+		ret = mnl_socket_recvfrom(nl, ans, sizeof(ans));
+	if (ret != -1)
+		ret = mnl_cb_run
+			(ans, ret, seq, mnl_socket_get_portid(nl), NULL, NULL);
+	if (!ret)
+		return 0;
+	rte_errno = errno;
+	return -rte_errno;
+}
+
+/**
+ * Initialize ingress qdisc of a given network interface.
+ *
+ * @param nl
+ *   Libmnl socket of the @p NETLINK_ROUTE kind.
+ * @param ifindex
+ *   Index of network interface to initialize.
+ * @param[out] error
+ *   Perform verbose error reporting if not NULL.
+ *
+ * @return
+ *   0 on success, a negative errno value otherwise and rte_errno is set.
+ */
+int
+mlx5_nl_flow_init(struct mnl_socket *nl, unsigned int ifindex,
+		  struct rte_flow_error *error)
+{
+	uint8_t buf[MNL_SOCKET_BUFFER_SIZE];
+	struct nlmsghdr *nlh;
+	struct tcmsg *tcm;
+
+	/* Destroy existing ingress qdisc and everything attached to it. */
+	nlh = mnl_nlmsg_put_header(buf);
+	nlh->nlmsg_type = RTM_DELQDISC;
+	nlh->nlmsg_flags = NLM_F_REQUEST;
+	tcm = mnl_nlmsg_put_extra_header(nlh, sizeof(*tcm));
+	tcm->tcm_family = AF_UNSPEC;
+	tcm->tcm_ifindex = ifindex;
+	tcm->tcm_handle = TC_H_MAKE(TC_H_INGRESS, 0);
+	tcm->tcm_parent = TC_H_INGRESS;
+	/* Ignore errors when qdisc is already absent. */
+	if (mlx5_nl_flow_nl_ack(nl, nlh) &&
+	    rte_errno != EINVAL && rte_errno != ENOENT)
+		return rte_flow_error_set
+			(error, rte_errno, RTE_FLOW_ERROR_TYPE_UNSPECIFIED,
+			 NULL, "netlink: failed to remove ingress qdisc");
+	/* Create fresh ingress qdisc. */
+	nlh = mnl_nlmsg_put_header(buf);
+	nlh->nlmsg_type = RTM_NEWQDISC;
+	nlh->nlmsg_flags = NLM_F_REQUEST | NLM_F_CREATE | NLM_F_EXCL;
+	tcm = mnl_nlmsg_put_extra_header(nlh, sizeof(*tcm));
+	tcm->tcm_family = AF_UNSPEC;
+	tcm->tcm_ifindex = ifindex;
+	tcm->tcm_handle = TC_H_MAKE(TC_H_INGRESS, 0);
+	tcm->tcm_parent = TC_H_INGRESS;
+	mnl_attr_put_strz_check(nlh, sizeof(buf), TCA_KIND, "ingress");
+	if (mlx5_nl_flow_nl_ack(nl, nlh))
+		return rte_flow_error_set
+			(error, rte_errno, RTE_FLOW_ERROR_TYPE_UNSPECIFIED,
+			 NULL, "netlink: failed to create ingress qdisc");
+	return 0;
+}
+
+/**
+ * Create and configure a libmnl socket for Netlink flow rules.
+ *
+ * @return
+ *   A valid libmnl socket object pointer on success, NULL otherwise and
+ *   rte_errno is set.
+ */
+struct mnl_socket *
+mlx5_nl_flow_socket_create(void)
+{
+	struct mnl_socket *nl = mnl_socket_open(NETLINK_ROUTE);
+
+	if (nl &&
+	    !mnl_socket_setsockopt(nl, NETLINK_CAP_ACK, &(int){ 1 },
+				   sizeof(int)) &&
+	    !mnl_socket_bind(nl, 0, MNL_SOCKET_AUTOPID))
+		return nl;
+	rte_errno = errno;
+	if (nl)
+		mnl_socket_close(nl);
+	return NULL;
+}
+
+/**
+ * Destroy a libmnl socket.
+ */
+void
+mlx5_nl_flow_socket_destroy(struct mnl_socket *nl)
+{
+	mnl_socket_close(nl);
+}
diff --git a/mk/rte.app.mk b/mk/rte.app.mk
index 7bcf6308d..414f1b967 100644
--- a/mk/rte.app.mk
+++ b/mk/rte.app.mk
@@ -145,7 +145,7 @@ endif
 ifeq ($(CONFIG_RTE_LIBRTE_MLX5_DLOPEN_DEPS),y)
 _LDLIBS-$(CONFIG_RTE_LIBRTE_MLX5_PMD)       += -lrte_pmd_mlx5 -ldl
 else
-_LDLIBS-$(CONFIG_RTE_LIBRTE_MLX5_PMD)       += -lrte_pmd_mlx5 -libverbs -lmlx5
+_LDLIBS-$(CONFIG_RTE_LIBRTE_MLX5_PMD)       += -lrte_pmd_mlx5 -libverbs -lmlx5 -lmnl
 endif
 _LDLIBS-$(CONFIG_RTE_LIBRTE_MVPP2_PMD)      += -lrte_pmd_mvpp2 -L$(LIBMUSDK_PATH)/lib -lmusdk
 _LDLIBS-$(CONFIG_RTE_LIBRTE_NFP_PMD)        += -lrte_pmd_nfp
-- 
2.11.0


* [dpdk-dev] [PATCH 2/6] net/mlx5: add framework for switch flow rules
  2018-06-27 18:08 [dpdk-dev] [PATCH 0/6] net/mlx5: add support for switch flow rules Adrien Mazarguil
  2018-06-27 18:08 ` [dpdk-dev] [PATCH 1/6] net/mlx5: lay groundwork for switch offloads Adrien Mazarguil
@ 2018-06-27 18:08 ` Adrien Mazarguil
  2018-07-12  0:59   ` Yongseok Koh
  2018-06-27 18:08 ` [dpdk-dev] [PATCH 3/6] net/mlx5: add fate actions to " Adrien Mazarguil
                   ` (5 subsequent siblings)
  7 siblings, 1 reply; 33+ messages in thread
From: Adrien Mazarguil @ 2018-06-27 18:08 UTC (permalink / raw)
  To: Shahaf Shuler; +Cc: Nelio Laranjeiro, Yongseok Koh, dev

Because mlx5 switch flow rules are configured through Netlink (TC
interface) and have little in common with Verbs, this patch adds a separate
parser function to handle them.

- mlx5_nl_flow_transpose() converts a rte_flow rule to its TC equivalent
  and stores the result in a buffer.

- mlx5_nl_flow_brand() gives a unique handle to a flow rule buffer.

- mlx5_nl_flow_create() instantiates a flow rule on the device based on
  such a buffer.

- mlx5_nl_flow_destroy() performs the reverse operation.

These functions are called by the existing implementation when it
encounters flow rules that must be offloaded to the switch (currently
identified by the presence of the transfer attribute).

Signed-off-by: Adrien Mazarguil <adrien.mazarguil@6wind.com>
Signed-off-by: Nelio Laranjeiro <nelio.laranjeiro@6wind.com>
---
 drivers/net/mlx5/mlx5.h         |  18 +++
 drivers/net/mlx5/mlx5_flow.c    | 113 ++++++++++++++
 drivers/net/mlx5/mlx5_nl_flow.c | 295 +++++++++++++++++++++++++++++++++++
 3 files changed, 426 insertions(+)

diff --git a/drivers/net/mlx5/mlx5.h b/drivers/net/mlx5/mlx5.h
index 390249adb..aa16057d6 100644
--- a/drivers/net/mlx5/mlx5.h
+++ b/drivers/net/mlx5/mlx5.h
@@ -148,6 +148,12 @@ struct mlx5_drop {
 	struct mlx5_rxq_ibv *rxq; /* Verbs Rx queue. */
 };
 
+/** DPDK port to network interface index (ifindex) conversion. */
+struct mlx5_nl_flow_ptoi {
+	uint16_t port_id; /**< DPDK port ID. */
+	unsigned int ifindex; /**< Network interface index. */
+};
+
 struct mnl_socket;
 
 struct priv {
@@ -374,6 +380,18 @@ int mlx5_nl_allmulti(struct rte_eth_dev *dev, int enable);
 
 /* mlx5_nl_flow.c */
 
+int mlx5_nl_flow_transpose(void *buf,
+			   size_t size,
+			   const struct mlx5_nl_flow_ptoi *ptoi,
+			   const struct rte_flow_attr *attr,
+			   const struct rte_flow_item *pattern,
+			   const struct rte_flow_action *actions,
+			   struct rte_flow_error *error);
+void mlx5_nl_flow_brand(void *buf, uint32_t handle);
+int mlx5_nl_flow_create(struct mnl_socket *nl, void *buf,
+			struct rte_flow_error *error);
+int mlx5_nl_flow_destroy(struct mnl_socket *nl, void *buf,
+			 struct rte_flow_error *error);
 int mlx5_nl_flow_init(struct mnl_socket *nl, unsigned int ifindex,
 		      struct rte_flow_error *error);
 struct mnl_socket *mlx5_nl_flow_socket_create(void);
diff --git a/drivers/net/mlx5/mlx5_flow.c b/drivers/net/mlx5/mlx5_flow.c
index 9241855be..93b245991 100644
--- a/drivers/net/mlx5/mlx5_flow.c
+++ b/drivers/net/mlx5/mlx5_flow.c
@@ -4,6 +4,7 @@
  */
 
 #include <sys/queue.h>
+#include <stdalign.h>
 #include <stdint.h>
 #include <string.h>
 
@@ -271,6 +272,7 @@ struct rte_flow {
 	/**< Store tunnel packet type data to store in Rx queue. */
 	uint8_t key[40]; /**< RSS hash key. */
 	uint16_t (*queue)[]; /**< Destination queues to redirect traffic to. */
+	void *nl_flow; /**< Netlink flow buffer if relevant. */
 };
 
 static const struct rte_flow_ops mlx5_flow_ops = {
@@ -2403,6 +2405,106 @@ mlx5_flow_actions(struct rte_eth_dev *dev,
 }
 
 /**
+ * Validate flow rule and fill flow structure accordingly.
+ *
+ * @param dev
+ *   Pointer to Ethernet device.
+ * @param[out] flow
+ *   Pointer to flow structure.
+ * @param flow_size
+ *   Size of allocated space for @p flow.
+ * @param[in] attr
+ *   Flow rule attributes.
+ * @param[in] pattern
+ *   Pattern specification (list terminated by the END pattern item).
+ * @param[in] actions
+ *   Associated actions (list terminated by the END action).
+ * @param[out] error
+ *   Perform verbose error reporting if not NULL.
+ *
+ * @return
+ *   A positive value representing the size of the flow object in bytes
+ *   regardless of @p flow_size on success, a negative errno value otherwise
+ *   and rte_errno is set.
+ */
+static int
+mlx5_flow_merge_switch(struct rte_eth_dev *dev,
+		       struct rte_flow *flow,
+		       size_t flow_size,
+		       const struct rte_flow_attr *attr,
+		       const struct rte_flow_item pattern[],
+		       const struct rte_flow_action actions[],
+		       struct rte_flow_error *error)
+{
+	struct priv *priv = dev->data->dev_private;
+	unsigned int n = mlx5_domain_to_port_id(priv->domain_id, NULL, 0);
+	uint16_t port_list[!n + n];
+	struct mlx5_nl_flow_ptoi ptoi[!n + n + 1];
+	size_t off = RTE_ALIGN_CEIL(sizeof(*flow), alignof(max_align_t));
+	unsigned int i;
+	unsigned int own = 0;
+	int ret;
+
+	/* At least one port is needed when no switch domain is present. */
+	if (!n) {
+		n = 1;
+		port_list[0] = dev->data->port_id;
+	} else {
+		n = mlx5_domain_to_port_id(priv->domain_id, port_list, n);
+		if (n > RTE_DIM(port_list))
+			n = RTE_DIM(port_list);
+	}
+	for (i = 0; i != n; ++i) {
+		struct rte_eth_dev_info dev_info;
+
+		rte_eth_dev_info_get(port_list[i], &dev_info);
+		if (port_list[i] == dev->data->port_id)
+			own = i;
+		ptoi[i].port_id = port_list[i];
+		ptoi[i].ifindex = dev_info.if_index;
+	}
+	/* Ensure first entry of ptoi[] is the current device. */
+	if (own) {
+		ptoi[n] = ptoi[0];
+		ptoi[0] = ptoi[own];
+		ptoi[own] = ptoi[n];
+	}
+	/* An entry with zero ifindex terminates ptoi[]. */
+	ptoi[n].port_id = 0;
+	ptoi[n].ifindex = 0;
+	if (flow_size < off)
+		flow_size = 0;
+	ret = mlx5_nl_flow_transpose((uint8_t *)flow + off,
+				     flow_size ? flow_size - off : 0,
+				     ptoi, attr, pattern, actions, error);
+	if (ret < 0)
+		return ret;
+	if (flow_size) {
+		*flow = (struct rte_flow){
+			.attributes = *attr,
+			.nl_flow = (uint8_t *)flow + off,
+		};
+		/*
+		 * Generate a reasonably unique handle based on the address
+		 * of the target buffer.
+		 *
+		 * This is straightforward on 32-bit systems where the flow
+		 * pointer can be used directly. Otherwise, its least
+		 * significant part is taken after shifting it by the
+		 * previous power of two of the pointed buffer size.
+		 */
+		if (sizeof(flow) <= 4)
+			mlx5_nl_flow_brand(flow->nl_flow, (uintptr_t)flow);
+		else
+			mlx5_nl_flow_brand
+				(flow->nl_flow,
+				 (uintptr_t)flow >>
+				 rte_log2_u32(rte_align32prevpow2(flow_size)));
+	}
+	return off + ret;
+}
+
+/**
  * Validate the rule and return a flow structure filled accordingly.
  *
  * @param dev
@@ -2439,6 +2541,9 @@ mlx5_flow_merge(struct rte_eth_dev *dev, struct rte_flow *flow,
 	int ret;
 	uint32_t i;
 
+	if (attr->transfer)
+		return mlx5_flow_merge_switch(dev, flow, flow_size,
+					      attr, items, actions, error);
 	if (!remain)
 		flow = &local_flow;
 	ret = mlx5_flow_attributes(dev, attr, flow, error);
@@ -2554,8 +2659,11 @@ mlx5_flow_validate(struct rte_eth_dev *dev,
 static void
 mlx5_flow_fate_remove(struct rte_eth_dev *dev, struct rte_flow *flow)
 {
+	struct priv *priv = dev->data->dev_private;
 	struct mlx5_flow_verbs *verbs;
 
+	if (flow->nl_flow && priv->mnl_socket)
+		mlx5_nl_flow_destroy(priv->mnl_socket, flow->nl_flow, NULL);
 	LIST_FOREACH(verbs, &flow->verbs, next) {
 		if (verbs->flow) {
 			claim_zero(mlx5_glue->destroy_flow(verbs->flow));
@@ -2592,6 +2700,7 @@ static int
 mlx5_flow_fate_apply(struct rte_eth_dev *dev, struct rte_flow *flow,
 		     struct rte_flow_error *error)
 {
+	struct priv *priv = dev->data->dev_private;
 	struct mlx5_flow_verbs *verbs;
 	int err;
 
@@ -2640,6 +2749,10 @@ mlx5_flow_fate_apply(struct rte_eth_dev *dev, struct rte_flow *flow,
 			goto error;
 		}
 	}
+	if (flow->nl_flow &&
+	    priv->mnl_socket &&
+	    mlx5_nl_flow_create(priv->mnl_socket, flow->nl_flow, error))
+		goto error;
 	return 0;
 error:
 	err = rte_errno; /* Save rte_errno before cleanup. */
diff --git a/drivers/net/mlx5/mlx5_nl_flow.c b/drivers/net/mlx5/mlx5_nl_flow.c
index 7a8683b03..1fc62fb0a 100644
--- a/drivers/net/mlx5/mlx5_nl_flow.c
+++ b/drivers/net/mlx5/mlx5_nl_flow.c
@@ -5,7 +5,9 @@
 
 #include <errno.h>
 #include <libmnl/libmnl.h>
+#include <linux/if_ether.h>
 #include <linux/netlink.h>
+#include <linux/pkt_cls.h>
 #include <linux/pkt_sched.h>
 #include <linux/rtnetlink.h>
 #include <stdalign.h>
@@ -14,11 +16,248 @@
 #include <stdlib.h>
 #include <sys/socket.h>
 
+#include <rte_byteorder.h>
 #include <rte_errno.h>
 #include <rte_flow.h>
 
 #include "mlx5.h"
 
+/** Parser state definitions for mlx5_nl_flow_trans[]. */
+enum mlx5_nl_flow_trans {
+	INVALID,
+	BACK,
+	ATTR,
+	PATTERN,
+	ITEM_VOID,
+	ACTIONS,
+	ACTION_VOID,
+	END,
+};
+
+#define TRANS(...) (const enum mlx5_nl_flow_trans []){ __VA_ARGS__, INVALID, }
+
+#define PATTERN_COMMON \
+	ITEM_VOID, ACTIONS
+#define ACTIONS_COMMON \
+	ACTION_VOID, END
+
+/** Parser state transitions used by mlx5_nl_flow_transpose(). */
+static const enum mlx5_nl_flow_trans *const mlx5_nl_flow_trans[] = {
+	[INVALID] = NULL,
+	[BACK] = NULL,
+	[ATTR] = TRANS(PATTERN),
+	[PATTERN] = TRANS(PATTERN_COMMON),
+	[ITEM_VOID] = TRANS(BACK),
+	[ACTIONS] = TRANS(ACTIONS_COMMON),
+	[ACTION_VOID] = TRANS(BACK),
+	[END] = NULL,
+};
+
+/**
+ * Transpose flow rule description to rtnetlink message.
+ *
+ * This function transposes a flow rule description to a traffic control
+ * (TC) filter creation message ready to be sent over Netlink.
+ *
+ * Target interface is specified as the first entry of the @p ptoi table.
+ * Subsequent entries enable this function to resolve other DPDK port IDs
+ * found in the flow rule.
+ *
+ * @param[out] buf
+ *   Output message buffer. May be NULL when @p size is 0.
+ * @param size
+ *   Size of @p buf. Message may be truncated if not large enough.
+ * @param[in] ptoi
+ *   DPDK port ID to network interface index translation table. This table
+ *   is terminated by an entry with a zero ifindex value.
+ * @param[in] attr
+ *   Flow rule attributes.
+ * @param[in] pattern
+ *   Pattern specification.
+ * @param[in] actions
+ *   Associated actions.
+ * @param[out] error
+ *   Perform verbose error reporting if not NULL.
+ *
+ * @return
+ *   A positive value representing the exact size of the message in bytes
+ *   regardless of the @p size parameter on success, a negative errno value
+ *   otherwise and rte_errno is set.
+ */
+int
+mlx5_nl_flow_transpose(void *buf,
+		       size_t size,
+		       const struct mlx5_nl_flow_ptoi *ptoi,
+		       const struct rte_flow_attr *attr,
+		       const struct rte_flow_item *pattern,
+		       const struct rte_flow_action *actions,
+		       struct rte_flow_error *error)
+{
+	alignas(struct nlmsghdr)
+	uint8_t buf_tmp[MNL_SOCKET_BUFFER_SIZE];
+	const struct rte_flow_item *item;
+	const struct rte_flow_action *action;
+	unsigned int n;
+	struct nlattr *na_flower;
+	struct nlattr *na_flower_act;
+	const enum mlx5_nl_flow_trans *trans;
+	const enum mlx5_nl_flow_trans *back;
+
+	if (!size)
+		goto error_nobufs;
+init:
+	item = pattern;
+	action = actions;
+	n = 0;
+	na_flower = NULL;
+	na_flower_act = NULL;
+	trans = TRANS(ATTR);
+	back = trans;
+trans:
+	switch (trans[n++]) {
+		struct nlmsghdr *nlh;
+		struct tcmsg *tcm;
+
+	case INVALID:
+		if (item->type)
+			return rte_flow_error_set
+				(error, ENOTSUP, RTE_FLOW_ERROR_TYPE_ITEM,
+				 item, "unsupported pattern item combination");
+		else if (action->type)
+			return rte_flow_error_set
+				(error, ENOTSUP, RTE_FLOW_ERROR_TYPE_ACTION,
+				 action, "unsupported action combination");
+		return rte_flow_error_set
+			(error, ENOTSUP, RTE_FLOW_ERROR_TYPE_UNSPECIFIED, NULL,
+			 "flow rule lacks some kind of fate action");
+	case BACK:
+		trans = back;
+		n = 0;
+		goto trans;
+	case ATTR:
+		/*
+		 * Supported attributes: no groups, some priorities and
+		 * ingress only. Don't care about transfer as it is the
+		 * caller's problem.
+		 */
+		if (attr->group)
+			return rte_flow_error_set
+				(error, ENOTSUP,
+				 RTE_FLOW_ERROR_TYPE_ATTR_GROUP,
+				 attr, "groups are not supported");
+		if (attr->priority > 0xfffe)
+			return rte_flow_error_set
+				(error, ENOTSUP,
+				 RTE_FLOW_ERROR_TYPE_ATTR_PRIORITY,
+				 attr, "lowest priority level is 0xfffe");
+		if (!attr->ingress)
+			return rte_flow_error_set
+				(error, ENOTSUP,
+				 RTE_FLOW_ERROR_TYPE_ATTR_INGRESS,
+				 attr, "only ingress is supported");
+		if (attr->egress)
+			return rte_flow_error_set
+				(error, ENOTSUP,
+				 RTE_FLOW_ERROR_TYPE_ATTR_INGRESS,
+				 attr, "egress is not supported");
+		if (size < mnl_nlmsg_size(sizeof(*tcm)))
+			goto error_nobufs;
+		nlh = mnl_nlmsg_put_header(buf);
+		nlh->nlmsg_type = 0;
+		nlh->nlmsg_flags = 0;
+		nlh->nlmsg_seq = 0;
+		tcm = mnl_nlmsg_put_extra_header(nlh, sizeof(*tcm));
+		tcm->tcm_family = AF_UNSPEC;
+		tcm->tcm_ifindex = ptoi[0].ifindex;
+		/*
+		 * Let kernel pick a handle by default. A predictable handle
+		 * can be set by the caller on the resulting buffer through
+		 * mlx5_nl_flow_brand().
+		 */
+		tcm->tcm_handle = 0;
+		tcm->tcm_parent = TC_H_MAKE(TC_H_INGRESS, TC_H_MIN_INGRESS);
+		/*
+		 * Priority cannot be zero to prevent the kernel from
+		 * picking one automatically.
+		 */
+		tcm->tcm_info = TC_H_MAKE((attr->priority + 1) << 16,
+					  RTE_BE16(ETH_P_ALL));
+		break;
+	case PATTERN:
+		if (!mnl_attr_put_strz_check(buf, size, TCA_KIND, "flower"))
+			goto error_nobufs;
+		na_flower = mnl_attr_nest_start_check(buf, size, TCA_OPTIONS);
+		if (!na_flower)
+			goto error_nobufs;
+		if (!mnl_attr_put_u32_check(buf, size, TCA_FLOWER_FLAGS,
+					    TCA_CLS_FLAGS_SKIP_SW))
+			goto error_nobufs;
+		break;
+	case ITEM_VOID:
+		if (item->type != RTE_FLOW_ITEM_TYPE_VOID)
+			goto trans;
+		++item;
+		break;
+	case ACTIONS:
+		if (item->type != RTE_FLOW_ITEM_TYPE_END)
+			goto trans;
+		assert(na_flower);
+		assert(!na_flower_act);
+		na_flower_act =
+			mnl_attr_nest_start_check(buf, size, TCA_FLOWER_ACT);
+		if (!na_flower_act)
+			goto error_nobufs;
+		break;
+	case ACTION_VOID:
+		if (action->type != RTE_FLOW_ACTION_TYPE_VOID)
+			goto trans;
+		++action;
+		break;
+	case END:
+		if (item->type != RTE_FLOW_ITEM_TYPE_END ||
+		    action->type != RTE_FLOW_ACTION_TYPE_END)
+			goto trans;
+		if (na_flower_act)
+			mnl_attr_nest_end(buf, na_flower_act);
+		if (na_flower)
+			mnl_attr_nest_end(buf, na_flower);
+		nlh = buf;
+		return nlh->nlmsg_len;
+	}
+	back = trans;
+	trans = mlx5_nl_flow_trans[trans[n - 1]];
+	n = 0;
+	goto trans;
+error_nobufs:
+	if (buf != buf_tmp) {
+		buf = buf_tmp;
+		size = sizeof(buf_tmp);
+		goto init;
+	}
+	return rte_flow_error_set
+		(error, ENOBUFS, RTE_FLOW_ERROR_TYPE_UNSPECIFIED, NULL,
+		 "generated TC message is too large");
+}
+
+/**
+ * Brand rtnetlink buffer with unique handle.
+ *
+ * This handle should be unique for a given network interface to avoid
+ * collisions.
+ *
+ * @param buf
+ *   Flow rule buffer previously initialized by mlx5_nl_flow_transpose().
+ * @param handle
+ *   Unique 32-bit handle to use.
+ */
+void
+mlx5_nl_flow_brand(void *buf, uint32_t handle)
+{
+	struct tcmsg *tcm = mnl_nlmsg_get_payload(buf);
+
+	tcm->tcm_handle = handle;
+}
+
 /**
  * Send Netlink message with acknowledgment.
  *
@@ -54,6 +293,62 @@ mlx5_nl_flow_nl_ack(struct mnl_socket *nl, struct nlmsghdr *nlh)
 }
 
 /**
+ * Create a Netlink flow rule.
+ *
+ * @param nl
+ *   Libmnl socket to use.
+ * @param buf
+ *   Flow rule buffer previously initialized by mlx5_nl_flow_transpose().
+ * @param[out] error
+ *   Perform verbose error reporting if not NULL.
+ *
+ * @return
+ *   0 on success, a negative errno value otherwise and rte_errno is set.
+ */
+int
+mlx5_nl_flow_create(struct mnl_socket *nl, void *buf,
+		    struct rte_flow_error *error)
+{
+	struct nlmsghdr *nlh = buf;
+
+	nlh->nlmsg_type = RTM_NEWTFILTER;
+	nlh->nlmsg_flags = NLM_F_REQUEST | NLM_F_CREATE | NLM_F_EXCL;
+	if (!mlx5_nl_flow_nl_ack(nl, nlh))
+		return 0;
+	return rte_flow_error_set
+		(error, rte_errno, RTE_FLOW_ERROR_TYPE_UNSPECIFIED, NULL,
+		 "netlink: failed to create TC flow rule");
+}
+
+/**
+ * Destroy a Netlink flow rule.
+ *
+ * @param nl
+ *   Libmnl socket to use.
+ * @param buf
+ *   Flow rule buffer previously initialized by mlx5_nl_flow_transpose().
+ * @param[out] error
+ *   Perform verbose error reporting if not NULL.
+ *
+ * @return
+ *   0 on success, a negative errno value otherwise and rte_errno is set.
+ */
+int
+mlx5_nl_flow_destroy(struct mnl_socket *nl, void *buf,
+		     struct rte_flow_error *error)
+{
+	struct nlmsghdr *nlh = buf;
+
+	nlh->nlmsg_type = RTM_DELTFILTER;
+	nlh->nlmsg_flags = NLM_F_REQUEST;
+	if (!mlx5_nl_flow_nl_ack(nl, nlh))
+		return 0;
+	return rte_flow_error_set
+		(error, rte_errno, RTE_FLOW_ERROR_TYPE_UNSPECIFIED, NULL,
+		 "netlink: failed to destroy TC flow rule");
+}
+
+/**
  * Initialize ingress qdisc of a given network interface.
  *
  * @param nl
-- 
2.11.0


* [dpdk-dev] [PATCH 3/6] net/mlx5: add fate actions to switch flow rules
  2018-06-27 18:08 [dpdk-dev] [PATCH 0/6] net/mlx5: add support for switch flow rules Adrien Mazarguil
  2018-06-27 18:08 ` [dpdk-dev] [PATCH 1/6] net/mlx5: lay groundwork for switch offloads Adrien Mazarguil
  2018-06-27 18:08 ` [dpdk-dev] [PATCH 2/6] net/mlx5: add framework for switch flow rules Adrien Mazarguil
@ 2018-06-27 18:08 ` Adrien Mazarguil
  2018-07-12  1:00   ` Yongseok Koh
  2018-06-27 18:08 ` [dpdk-dev] [PATCH 4/6] net/mlx5: add L2-L4 pattern items " Adrien Mazarguil
                   ` (4 subsequent siblings)
  7 siblings, 1 reply; 33+ messages in thread
From: Adrien Mazarguil @ 2018-06-27 18:08 UTC (permalink / raw)
  To: Shahaf Shuler; +Cc: Nelio Laranjeiro, Yongseok Koh, dev

This patch enables creation of rte_flow rules that direct matching traffic
to a different port (e.g. another VF representor) or drop it directly at
the switch level (PORT_ID and DROP actions).

Testpmd examples:

- Directing all traffic to port ID 0:

  flow create 1 ingress transfer pattern end actions port_id id 0 / end

- Dropping all traffic normally received by port ID 1:

  flow create 1 ingress transfer pattern end actions drop / end

Note the presence of the transfer attribute, which requests that these
rules be applied at the switch level. All traffic is matched due to the
empty pattern.

Signed-off-by: Adrien Mazarguil <adrien.mazarguil@6wind.com>
---
 drivers/net/mlx5/mlx5_nl_flow.c | 77 +++++++++++++++++++++++++++++++++++-
 1 file changed, 75 insertions(+), 2 deletions(-)

diff --git a/drivers/net/mlx5/mlx5_nl_flow.c b/drivers/net/mlx5/mlx5_nl_flow.c
index 1fc62fb0a..70da85fd5 100644
--- a/drivers/net/mlx5/mlx5_nl_flow.c
+++ b/drivers/net/mlx5/mlx5_nl_flow.c
@@ -10,6 +10,8 @@
 #include <linux/pkt_cls.h>
 #include <linux/pkt_sched.h>
 #include <linux/rtnetlink.h>
+#include <linux/tc_act/tc_gact.h>
+#include <linux/tc_act/tc_mirred.h>
 #include <stdalign.h>
 #include <stddef.h>
 #include <stdint.h>
@@ -31,6 +33,8 @@ enum mlx5_nl_flow_trans {
 	ITEM_VOID,
 	ACTIONS,
 	ACTION_VOID,
+	ACTION_PORT_ID,
+	ACTION_DROP,
 	END,
 };
 
@@ -39,7 +43,9 @@ enum mlx5_nl_flow_trans {
 #define PATTERN_COMMON \
 	ITEM_VOID, ACTIONS
 #define ACTIONS_COMMON \
-	ACTION_VOID, END
+	ACTION_VOID
+#define ACTIONS_FATE \
+	ACTION_PORT_ID, ACTION_DROP
 
 /** Parser state transitions used by mlx5_nl_flow_transpose(). */
 static const enum mlx5_nl_flow_trans *const mlx5_nl_flow_trans[] = {
@@ -48,8 +54,10 @@ static const enum mlx5_nl_flow_trans *const mlx5_nl_flow_trans[] = {
 	[ATTR] = TRANS(PATTERN),
 	[PATTERN] = TRANS(PATTERN_COMMON),
 	[ITEM_VOID] = TRANS(BACK),
-	[ACTIONS] = TRANS(ACTIONS_COMMON),
+	[ACTIONS] = TRANS(ACTIONS_FATE, ACTIONS_COMMON),
 	[ACTION_VOID] = TRANS(BACK),
+	[ACTION_PORT_ID] = TRANS(ACTION_VOID, END),
+	[ACTION_DROP] = TRANS(ACTION_VOID, END),
 	[END] = NULL,
 };
 
@@ -98,6 +106,7 @@ mlx5_nl_flow_transpose(void *buf,
 	const struct rte_flow_item *item;
 	const struct rte_flow_action *action;
 	unsigned int n;
+	uint32_t act_index_cur;
 	struct nlattr *na_flower;
 	struct nlattr *na_flower_act;
 	const enum mlx5_nl_flow_trans *trans;
@@ -109,14 +118,21 @@ mlx5_nl_flow_transpose(void *buf,
 	item = pattern;
 	action = actions;
 	n = 0;
+	act_index_cur = 0;
 	na_flower = NULL;
 	na_flower_act = NULL;
 	trans = TRANS(ATTR);
 	back = trans;
 trans:
 	switch (trans[n++]) {
+		union {
+			const struct rte_flow_action_port_id *port_id;
+		} conf;
 		struct nlmsghdr *nlh;
 		struct tcmsg *tcm;
+		struct nlattr *act_index;
+		struct nlattr *act;
+		unsigned int i;
 
 	case INVALID:
 		if (item->type)
@@ -207,12 +223,69 @@ mlx5_nl_flow_transpose(void *buf,
 			mnl_attr_nest_start_check(buf, size, TCA_FLOWER_ACT);
 		if (!na_flower_act)
 			goto error_nobufs;
+		act_index_cur = 1;
 		break;
 	case ACTION_VOID:
 		if (action->type != RTE_FLOW_ACTION_TYPE_VOID)
 			goto trans;
 		++action;
 		break;
+	case ACTION_PORT_ID:
+		if (action->type != RTE_FLOW_ACTION_TYPE_PORT_ID)
+			goto trans;
+		conf.port_id = action->conf;
+		if (conf.port_id->original)
+			i = 0;
+		else
+			for (i = 0; ptoi[i].ifindex; ++i)
+				if (ptoi[i].port_id == conf.port_id->id)
+					break;
+		if (!ptoi[i].ifindex)
+			return rte_flow_error_set
+				(error, ENODEV, RTE_FLOW_ERROR_TYPE_ACTION_CONF,
+				 conf.port_id,
+				 "missing data to convert port ID to ifindex");
+		act_index =
+			mnl_attr_nest_start_check(buf, size, act_index_cur++);
+		if (!act_index ||
+		    !mnl_attr_put_strz_check(buf, size, TCA_ACT_KIND, "mirred"))
+			goto error_nobufs;
+		act = mnl_attr_nest_start_check(buf, size, TCA_ACT_OPTIONS);
+		if (!act)
+			goto error_nobufs;
+		if (!mnl_attr_put_check(buf, size, TCA_MIRRED_PARMS,
+					sizeof(struct tc_mirred),
+					&(struct tc_mirred){
+						.action = TC_ACT_STOLEN,
+						.eaction = TCA_EGRESS_REDIR,
+						.ifindex = ptoi[i].ifindex,
+					}))
+			goto error_nobufs;
+		mnl_attr_nest_end(buf, act);
+		mnl_attr_nest_end(buf, act_index);
+		++action;
+		break;
+	case ACTION_DROP:
+		if (action->type != RTE_FLOW_ACTION_TYPE_DROP)
+			goto trans;
+		act_index =
+			mnl_attr_nest_start_check(buf, size, act_index_cur++);
+		if (!act_index ||
+		    !mnl_attr_put_strz_check(buf, size, TCA_ACT_KIND, "gact"))
+			goto error_nobufs;
+		act = mnl_attr_nest_start_check(buf, size, TCA_ACT_OPTIONS);
+		if (!act)
+			goto error_nobufs;
+		if (!mnl_attr_put_check(buf, size, TCA_GACT_PARMS,
+					sizeof(struct tc_gact),
+					&(struct tc_gact){
+						.action = TC_ACT_SHOT,
+					}))
+			goto error_nobufs;
+		mnl_attr_nest_end(buf, act);
+		mnl_attr_nest_end(buf, act_index);
+		++action;
+		break;
 	case END:
 		if (item->type != RTE_FLOW_ITEM_TYPE_END ||
 		    action->type != RTE_FLOW_ACTION_TYPE_END)
-- 
2.11.0


* [dpdk-dev] [PATCH 4/6] net/mlx5: add L2-L4 pattern items to switch flow rules
  2018-06-27 18:08 [dpdk-dev] [PATCH 0/6] net/mlx5: add support for switch flow rules Adrien Mazarguil
                   ` (2 preceding siblings ...)
  2018-06-27 18:08 ` [dpdk-dev] [PATCH 3/6] net/mlx5: add fate actions to " Adrien Mazarguil
@ 2018-06-27 18:08 ` Adrien Mazarguil
  2018-07-12  1:02   ` Yongseok Koh
  2018-06-27 18:08 ` [dpdk-dev] [PATCH 5/6] net/mlx5: add VLAN item and actions " Adrien Mazarguil
                   ` (3 subsequent siblings)
  7 siblings, 1 reply; 33+ messages in thread
From: Adrien Mazarguil @ 2018-06-27 18:08 UTC (permalink / raw)
  To: Shahaf Shuler; +Cc: Nelio Laranjeiro, Yongseok Koh, dev

This enables flow rules to explicitly match supported combinations of
Ethernet, IPv4, IPv6, TCP and UDP headers at the switch level.

Testpmd example:

- Dropping TCPv4 traffic with a specific TCP destination port on port ID 2:

  flow create 2 ingress transfer pattern eth / ipv4 / tcp dst is 42 / end
     actions drop / end
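
The item translation added by this patch hinges on a generic mask check:
every bit set in an item's mask must also be set in the driver's supported
mask, and when a spec/last range is provided, both ends must agree under
the mask. A standalone sketch of that byte-wise check (function and
parameter names are illustrative, not the driver's):

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Return true when @mask only covers fields present in @supported and,
 * if @last is non-NULL, when @spec and @last are equal under @mask
 * (i.e. the range collapses to a single value for the masked bits).
 */
static bool
mask_is_valid(const uint8_t *mask, const uint8_t *supported,
	      const uint8_t *spec, const uint8_t *last, size_t size)
{
	size_t i;

	for (i = 0; i != size; ++i) {
		if (!mask[i])
			continue;
		/* A bit outside the supported mask cannot be matched. */
		if ((mask[i] | supported[i]) != supported[i])
			return false;
		/* Ranges not comprised in the mask are rejected. */
		if (last && (spec[i] & mask[i]) != (last[i] & mask[i]))
			return false;
	}
	return true;
}
```

These two failure cases correspond to the ENOTSUP errors raised by
mlx5_nl_flow_item_mask() in the diff below.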

Signed-off-by: Adrien Mazarguil <adrien.mazarguil@6wind.com>
---
 drivers/net/mlx5/mlx5_nl_flow.c | 397 ++++++++++++++++++++++++++++++++++-
 1 file changed, 396 insertions(+), 1 deletion(-)

diff --git a/drivers/net/mlx5/mlx5_nl_flow.c b/drivers/net/mlx5/mlx5_nl_flow.c
index 70da85fd5..ad1e001c6 100644
--- a/drivers/net/mlx5/mlx5_nl_flow.c
+++ b/drivers/net/mlx5/mlx5_nl_flow.c
@@ -3,6 +3,7 @@
  * Copyright 2018 Mellanox Technologies, Ltd
  */
 
+#include <assert.h>
 #include <errno.h>
 #include <libmnl/libmnl.h>
 #include <linux/if_ether.h>
@@ -12,7 +13,9 @@
 #include <linux/rtnetlink.h>
 #include <linux/tc_act/tc_gact.h>
 #include <linux/tc_act/tc_mirred.h>
+#include <netinet/in.h>
 #include <stdalign.h>
+#include <stdbool.h>
 #include <stddef.h>
 #include <stdint.h>
 #include <stdlib.h>
@@ -20,6 +23,7 @@
 
 #include <rte_byteorder.h>
 #include <rte_errno.h>
+#include <rte_ether.h>
 #include <rte_flow.h>
 
 #include "mlx5.h"
@@ -31,6 +35,11 @@ enum mlx5_nl_flow_trans {
 	ATTR,
 	PATTERN,
 	ITEM_VOID,
+	ITEM_ETH,
+	ITEM_IPV4,
+	ITEM_IPV6,
+	ITEM_TCP,
+	ITEM_UDP,
 	ACTIONS,
 	ACTION_VOID,
 	ACTION_PORT_ID,
@@ -52,8 +61,13 @@ static const enum mlx5_nl_flow_trans *const mlx5_nl_flow_trans[] = {
 	[INVALID] = NULL,
 	[BACK] = NULL,
 	[ATTR] = TRANS(PATTERN),
-	[PATTERN] = TRANS(PATTERN_COMMON),
+	[PATTERN] = TRANS(ITEM_ETH, PATTERN_COMMON),
 	[ITEM_VOID] = TRANS(BACK),
+	[ITEM_ETH] = TRANS(ITEM_IPV4, ITEM_IPV6, PATTERN_COMMON),
+	[ITEM_IPV4] = TRANS(ITEM_TCP, ITEM_UDP, PATTERN_COMMON),
+	[ITEM_IPV6] = TRANS(ITEM_TCP, ITEM_UDP, PATTERN_COMMON),
+	[ITEM_TCP] = TRANS(PATTERN_COMMON),
+	[ITEM_UDP] = TRANS(PATTERN_COMMON),
 	[ACTIONS] = TRANS(ACTIONS_FATE, ACTIONS_COMMON),
 	[ACTION_VOID] = TRANS(BACK),
 	[ACTION_PORT_ID] = TRANS(ACTION_VOID, END),
@@ -61,6 +75,126 @@ static const enum mlx5_nl_flow_trans *const mlx5_nl_flow_trans[] = {
 	[END] = NULL,
 };
 
+/** Empty masks for known item types. */
+static const union {
+	struct rte_flow_item_eth eth;
+	struct rte_flow_item_ipv4 ipv4;
+	struct rte_flow_item_ipv6 ipv6;
+	struct rte_flow_item_tcp tcp;
+	struct rte_flow_item_udp udp;
+} mlx5_nl_flow_mask_empty;
+
+/** Supported masks for known item types. */
+static const struct {
+	struct rte_flow_item_eth eth;
+	struct rte_flow_item_ipv4 ipv4;
+	struct rte_flow_item_ipv6 ipv6;
+	struct rte_flow_item_tcp tcp;
+	struct rte_flow_item_udp udp;
+} mlx5_nl_flow_mask_supported = {
+	.eth = {
+		.type = RTE_BE16(0xffff),
+		.dst.addr_bytes = "\xff\xff\xff\xff\xff\xff",
+		.src.addr_bytes = "\xff\xff\xff\xff\xff\xff",
+	},
+	.ipv4.hdr = {
+		.next_proto_id = 0xff,
+		.src_addr = RTE_BE32(0xffffffff),
+		.dst_addr = RTE_BE32(0xffffffff),
+	},
+	.ipv6.hdr = {
+		.proto = 0xff,
+		.src_addr =
+			"\xff\xff\xff\xff\xff\xff\xff\xff"
+			"\xff\xff\xff\xff\xff\xff\xff\xff",
+		.dst_addr =
+			"\xff\xff\xff\xff\xff\xff\xff\xff"
+			"\xff\xff\xff\xff\xff\xff\xff\xff",
+	},
+	.tcp.hdr = {
+		.src_port = RTE_BE16(0xffff),
+		.dst_port = RTE_BE16(0xffff),
+	},
+	.udp.hdr = {
+		.src_port = RTE_BE16(0xffff),
+		.dst_port = RTE_BE16(0xffff),
+	},
+};
+
+/**
+ * Retrieve mask for pattern item.
+ *
+ * This function does basic sanity checks on a pattern item in order to
+ * return the most appropriate mask for it.
+ *
+ * @param[in] item
+ *   Item specification.
+ * @param[in] mask_default
+ *   Default mask for pattern item as specified by the flow API.
+ * @param[in] mask_supported
+ *   Mask fields supported by the implementation.
+ * @param[in] mask_empty
+ *   Empty mask to return when there is no specification.
+ * @param[out] error
+ *   Perform verbose error reporting if not NULL.
+ *
+ * @return
+ *   Either @p item->mask or one of the mask parameters on success, NULL
+ *   otherwise and rte_errno is set.
+ */
+static const void *
+mlx5_nl_flow_item_mask(const struct rte_flow_item *item,
+		       const void *mask_default,
+		       const void *mask_supported,
+		       const void *mask_empty,
+		       size_t mask_size,
+		       struct rte_flow_error *error)
+{
+	const uint8_t *mask;
+	size_t i;
+
+	/* item->last and item->mask cannot exist without item->spec. */
+	if (!item->spec && (item->mask || item->last)) {
+		rte_flow_error_set
+			(error, EINVAL, RTE_FLOW_ERROR_TYPE_ITEM, item,
+			 "\"mask\" or \"last\" field provided without a"
+			 " corresponding \"spec\"");
+		return NULL;
+	}
+	/* No spec, no mask, no problem. */
+	if (!item->spec)
+		return mask_empty;
+	mask = item->mask ? item->mask : mask_default;
+	assert(mask);
+	/*
+	 * Single-pass check to make sure that:
+	 * - Mask is supported, no bits are set outside mask_supported.
+	 * - Both item->spec and item->last are included in mask.
+	 */
+	for (i = 0; i != mask_size; ++i) {
+		if (!mask[i])
+			continue;
+		if ((mask[i] | ((const uint8_t *)mask_supported)[i]) !=
+		    ((const uint8_t *)mask_supported)[i]) {
+			rte_flow_error_set
+				(error, ENOTSUP, RTE_FLOW_ERROR_TYPE_ITEM_MASK,
+				 mask, "unsupported field found in \"mask\"");
+			return NULL;
+		}
+		if (item->last &&
+		    (((const uint8_t *)item->spec)[i] & mask[i]) !=
+		    (((const uint8_t *)item->last)[i] & mask[i])) {
+			rte_flow_error_set
+				(error, ENOTSUP, RTE_FLOW_ERROR_TYPE_ITEM_LAST,
+				 item->last,
+				 "range between \"spec\" and \"last\" not"
+				 " comprised in \"mask\"");
+			return NULL;
+		}
+	}
+	return mask;
+}
+
 /**
  * Transpose flow rule description to rtnetlink message.
  *
@@ -107,6 +241,8 @@ mlx5_nl_flow_transpose(void *buf,
 	const struct rte_flow_action *action;
 	unsigned int n;
 	uint32_t act_index_cur;
+	bool eth_type_set;
+	bool ip_proto_set;
 	struct nlattr *na_flower;
 	struct nlattr *na_flower_act;
 	const enum mlx5_nl_flow_trans *trans;
@@ -119,6 +255,8 @@ mlx5_nl_flow_transpose(void *buf,
 	action = actions;
 	n = 0;
 	act_index_cur = 0;
+	eth_type_set = false;
+	ip_proto_set = false;
 	na_flower = NULL;
 	na_flower_act = NULL;
 	trans = TRANS(ATTR);
@@ -126,6 +264,13 @@ mlx5_nl_flow_transpose(void *buf,
 trans:
 	switch (trans[n++]) {
 		union {
+			const struct rte_flow_item_eth *eth;
+			const struct rte_flow_item_ipv4 *ipv4;
+			const struct rte_flow_item_ipv6 *ipv6;
+			const struct rte_flow_item_tcp *tcp;
+			const struct rte_flow_item_udp *udp;
+		} spec, mask;
+		union {
 			const struct rte_flow_action_port_id *port_id;
 		} conf;
 		struct nlmsghdr *nlh;
@@ -214,6 +359,256 @@ mlx5_nl_flow_transpose(void *buf,
 			goto trans;
 		++item;
 		break;
+	case ITEM_ETH:
+		if (item->type != RTE_FLOW_ITEM_TYPE_ETH)
+			goto trans;
+		mask.eth = mlx5_nl_flow_item_mask
+			(item, &rte_flow_item_eth_mask,
+			 &mlx5_nl_flow_mask_supported.eth,
+			 &mlx5_nl_flow_mask_empty.eth,
+			 sizeof(mlx5_nl_flow_mask_supported.eth), error);
+		if (!mask.eth)
+			return -rte_errno;
+		if (mask.eth == &mlx5_nl_flow_mask_empty.eth) {
+			++item;
+			break;
+		}
+		spec.eth = item->spec;
+		if (mask.eth->type && mask.eth->type != RTE_BE16(0xffff))
+			return rte_flow_error_set
+				(error, ENOTSUP, RTE_FLOW_ERROR_TYPE_ITEM_MASK,
+				 mask.eth,
+				 "no support for partial mask on"
+				 " \"type\" field");
+		if (mask.eth->type) {
+			if (!mnl_attr_put_u16_check(buf, size,
+						    TCA_FLOWER_KEY_ETH_TYPE,
+						    spec.eth->type))
+				goto error_nobufs;
+			eth_type_set = 1;
+		}
+		if ((!is_zero_ether_addr(&mask.eth->dst) &&
+		     (!mnl_attr_put_check(buf, size,
+					  TCA_FLOWER_KEY_ETH_DST,
+					  ETHER_ADDR_LEN,
+					  spec.eth->dst.addr_bytes) ||
+		      !mnl_attr_put_check(buf, size,
+					  TCA_FLOWER_KEY_ETH_DST_MASK,
+					  ETHER_ADDR_LEN,
+					  mask.eth->dst.addr_bytes))) ||
+		    (!is_zero_ether_addr(&mask.eth->src) &&
+		     (!mnl_attr_put_check(buf, size,
+					  TCA_FLOWER_KEY_ETH_SRC,
+					  ETHER_ADDR_LEN,
+					  spec.eth->src.addr_bytes) ||
+		      !mnl_attr_put_check(buf, size,
+					  TCA_FLOWER_KEY_ETH_SRC_MASK,
+					  ETHER_ADDR_LEN,
+					  mask.eth->src.addr_bytes))))
+			goto error_nobufs;
+		++item;
+		break;
+	case ITEM_IPV4:
+		if (item->type != RTE_FLOW_ITEM_TYPE_IPV4)
+			goto trans;
+		mask.ipv4 = mlx5_nl_flow_item_mask
+			(item, &rte_flow_item_ipv4_mask,
+			 &mlx5_nl_flow_mask_supported.ipv4,
+			 &mlx5_nl_flow_mask_empty.ipv4,
+			 sizeof(mlx5_nl_flow_mask_supported.ipv4), error);
+		if (!mask.ipv4)
+			return -rte_errno;
+		if (!eth_type_set &&
+		    !mnl_attr_put_u16_check(buf, size,
+					    TCA_FLOWER_KEY_ETH_TYPE,
+					    RTE_BE16(ETH_P_IP)))
+			goto error_nobufs;
+		eth_type_set = 1;
+		if (mask.ipv4 == &mlx5_nl_flow_mask_empty.ipv4) {
+			++item;
+			break;
+		}
+		spec.ipv4 = item->spec;
+		if (mask.ipv4->hdr.next_proto_id &&
+		    mask.ipv4->hdr.next_proto_id != 0xff)
+			return rte_flow_error_set
+				(error, ENOTSUP, RTE_FLOW_ERROR_TYPE_ITEM_MASK,
+				 mask.ipv4,
+				 "no support for partial mask on"
+				 " \"hdr.next_proto_id\" field");
+		if (mask.ipv4->hdr.next_proto_id) {
+			if (!mnl_attr_put_u8_check
+			    (buf, size, TCA_FLOWER_KEY_IP_PROTO,
+			     spec.ipv4->hdr.next_proto_id))
+				goto error_nobufs;
+			ip_proto_set = 1;
+		}
+		if ((mask.ipv4->hdr.src_addr &&
+		     (!mnl_attr_put_u32_check(buf, size,
+					      TCA_FLOWER_KEY_IPV4_SRC,
+					      spec.ipv4->hdr.src_addr) ||
+		      !mnl_attr_put_u32_check(buf, size,
+					      TCA_FLOWER_KEY_IPV4_SRC_MASK,
+					      mask.ipv4->hdr.src_addr))) ||
+		    (mask.ipv4->hdr.dst_addr &&
+		     (!mnl_attr_put_u32_check(buf, size,
+					      TCA_FLOWER_KEY_IPV4_DST,
+					      spec.ipv4->hdr.dst_addr) ||
+		      !mnl_attr_put_u32_check(buf, size,
+					      TCA_FLOWER_KEY_IPV4_DST_MASK,
+					      mask.ipv4->hdr.dst_addr))))
+			goto error_nobufs;
+		++item;
+		break;
+	case ITEM_IPV6:
+		if (item->type != RTE_FLOW_ITEM_TYPE_IPV6)
+			goto trans;
+		mask.ipv6 = mlx5_nl_flow_item_mask
+			(item, &rte_flow_item_ipv6_mask,
+			 &mlx5_nl_flow_mask_supported.ipv6,
+			 &mlx5_nl_flow_mask_empty.ipv6,
+			 sizeof(mlx5_nl_flow_mask_supported.ipv6), error);
+		if (!mask.ipv6)
+			return -rte_errno;
+		if (!eth_type_set &&
+		    !mnl_attr_put_u16_check(buf, size,
+					    TCA_FLOWER_KEY_ETH_TYPE,
+					    RTE_BE16(ETH_P_IPV6)))
+			goto error_nobufs;
+		eth_type_set = 1;
+		if (mask.ipv6 == &mlx5_nl_flow_mask_empty.ipv6) {
+			++item;
+			break;
+		}
+		spec.ipv6 = item->spec;
+		if (mask.ipv6->hdr.proto && mask.ipv6->hdr.proto != 0xff)
+			return rte_flow_error_set
+				(error, ENOTSUP, RTE_FLOW_ERROR_TYPE_ITEM_MASK,
+				 mask.ipv6,
+				 "no support for partial mask on"
+				 " \"hdr.proto\" field");
+		if (mask.ipv6->hdr.proto) {
+			if (!mnl_attr_put_u8_check
+			    (buf, size, TCA_FLOWER_KEY_IP_PROTO,
+			     spec.ipv6->hdr.proto))
+				goto error_nobufs;
+			ip_proto_set = 1;
+		}
+		if ((!IN6_IS_ADDR_UNSPECIFIED(mask.ipv6->hdr.src_addr) &&
+		     (!mnl_attr_put_check(buf, size,
+					  TCA_FLOWER_KEY_IPV6_SRC,
+					  sizeof(spec.ipv6->hdr.src_addr),
+					  spec.ipv6->hdr.src_addr) ||
+		      !mnl_attr_put_check(buf, size,
+					  TCA_FLOWER_KEY_IPV6_SRC_MASK,
+					  sizeof(mask.ipv6->hdr.src_addr),
+					  mask.ipv6->hdr.src_addr))) ||
+		    (!IN6_IS_ADDR_UNSPECIFIED(mask.ipv6->hdr.dst_addr) &&
+		     (!mnl_attr_put_check(buf, size,
+					  TCA_FLOWER_KEY_IPV6_DST,
+					  sizeof(spec.ipv6->hdr.dst_addr),
+					  spec.ipv6->hdr.dst_addr) ||
+		      !mnl_attr_put_check(buf, size,
+					  TCA_FLOWER_KEY_IPV6_DST_MASK,
+					  sizeof(mask.ipv6->hdr.dst_addr),
+					  mask.ipv6->hdr.dst_addr))))
+			goto error_nobufs;
+		++item;
+		break;
+	case ITEM_TCP:
+		if (item->type != RTE_FLOW_ITEM_TYPE_TCP)
+			goto trans;
+		mask.tcp = mlx5_nl_flow_item_mask
+			(item, &rte_flow_item_tcp_mask,
+			 &mlx5_nl_flow_mask_supported.tcp,
+			 &mlx5_nl_flow_mask_empty.tcp,
+			 sizeof(mlx5_nl_flow_mask_supported.tcp), error);
+		if (!mask.tcp)
+			return -rte_errno;
+		if (!ip_proto_set &&
+		    !mnl_attr_put_u8_check(buf, size,
+					   TCA_FLOWER_KEY_IP_PROTO,
+					   IPPROTO_TCP))
+			goto error_nobufs;
+		if (mask.tcp == &mlx5_nl_flow_mask_empty.tcp) {
+			++item;
+			break;
+		}
+		spec.tcp = item->spec;
+		if ((mask.tcp->hdr.src_port &&
+		     mask.tcp->hdr.src_port != RTE_BE16(0xffff)) ||
+		    (mask.tcp->hdr.dst_port &&
+		     mask.tcp->hdr.dst_port != RTE_BE16(0xffff)))
+			return rte_flow_error_set
+				(error, ENOTSUP, RTE_FLOW_ERROR_TYPE_ITEM_MASK,
+				 mask.tcp,
+				 "no support for partial masks on"
+				 " \"hdr.src_port\" and \"hdr.dst_port\""
+				 " fields");
+		if ((mask.tcp->hdr.src_port &&
+		     (!mnl_attr_put_u16_check(buf, size,
+					      TCA_FLOWER_KEY_TCP_SRC,
+					      spec.tcp->hdr.src_port) ||
+		      !mnl_attr_put_u16_check(buf, size,
+					      TCA_FLOWER_KEY_TCP_SRC_MASK,
+					      mask.tcp->hdr.src_port))) ||
+		    (mask.tcp->hdr.dst_port &&
+		     (!mnl_attr_put_u16_check(buf, size,
+					      TCA_FLOWER_KEY_TCP_DST,
+					      spec.tcp->hdr.dst_port) ||
+		      !mnl_attr_put_u16_check(buf, size,
+					      TCA_FLOWER_KEY_TCP_DST_MASK,
+					      mask.tcp->hdr.dst_port))))
+			goto error_nobufs;
+		++item;
+		break;
+	case ITEM_UDP:
+		if (item->type != RTE_FLOW_ITEM_TYPE_UDP)
+			goto trans;
+		mask.udp = mlx5_nl_flow_item_mask
+			(item, &rte_flow_item_udp_mask,
+			 &mlx5_nl_flow_mask_supported.udp,
+			 &mlx5_nl_flow_mask_empty.udp,
+			 sizeof(mlx5_nl_flow_mask_supported.udp), error);
+		if (!mask.udp)
+			return -rte_errno;
+		if (!ip_proto_set &&
+		    !mnl_attr_put_u8_check(buf, size,
+					   TCA_FLOWER_KEY_IP_PROTO,
+					   IPPROTO_UDP))
+			goto error_nobufs;
+		if (mask.udp == &mlx5_nl_flow_mask_empty.udp) {
+			++item;
+			break;
+		}
+		spec.udp = item->spec;
+		if ((mask.udp->hdr.src_port &&
+		     mask.udp->hdr.src_port != RTE_BE16(0xffff)) ||
+		    (mask.udp->hdr.dst_port &&
+		     mask.udp->hdr.dst_port != RTE_BE16(0xffff)))
+			return rte_flow_error_set
+				(error, ENOTSUP, RTE_FLOW_ERROR_TYPE_ITEM_MASK,
+				 mask.udp,
+				 "no support for partial masks on"
+				 " \"hdr.src_port\" and \"hdr.dst_port\""
+				 " fields");
+		if ((mask.udp->hdr.src_port &&
+		     (!mnl_attr_put_u16_check(buf, size,
+					      TCA_FLOWER_KEY_UDP_SRC,
+					      spec.udp->hdr.src_port) ||
+		      !mnl_attr_put_u16_check(buf, size,
+					      TCA_FLOWER_KEY_UDP_SRC_MASK,
+					      mask.udp->hdr.src_port))) ||
+		    (mask.udp->hdr.dst_port &&
+		     (!mnl_attr_put_u16_check(buf, size,
+					      TCA_FLOWER_KEY_UDP_DST,
+					      spec.udp->hdr.dst_port) ||
+		      !mnl_attr_put_u16_check(buf, size,
+					      TCA_FLOWER_KEY_UDP_DST_MASK,
+					      mask.udp->hdr.dst_port))))
+			goto error_nobufs;
+		++item;
+		break;
 	case ACTIONS:
 		if (item->type != RTE_FLOW_ITEM_TYPE_END)
 			goto trans;
-- 
2.11.0


* [dpdk-dev] [PATCH 5/6] net/mlx5: add VLAN item and actions to switch flow rules
  2018-06-27 18:08 [dpdk-dev] [PATCH 0/6] net/mlx5: add support for switch flow rules Adrien Mazarguil
                   ` (3 preceding siblings ...)
  2018-06-27 18:08 ` [dpdk-dev] [PATCH 4/6] net/mlx5: add L2-L4 pattern items " Adrien Mazarguil
@ 2018-06-27 18:08 ` Adrien Mazarguil
  2018-07-12  1:10   ` Yongseok Koh
  2018-06-27 18:08 ` [dpdk-dev] [PATCH 6/6] net/mlx5: add port ID pattern item " Adrien Mazarguil
                   ` (2 subsequent siblings)
  7 siblings, 1 reply; 33+ messages in thread
From: Adrien Mazarguil @ 2018-06-27 18:08 UTC (permalink / raw)
  To: Shahaf Shuler; +Cc: Nelio Laranjeiro, Yongseok Koh, dev

This enables flow rules to explicitly match VLAN traffic (VLAN pattern
item) and perform various operations on VLAN headers at the switch level
(OF_POP_VLAN, OF_PUSH_VLAN, OF_SET_VLAN_VID and OF_SET_VLAN_PCP actions).

Testpmd examples:

- Directing all VLAN traffic received on port ID 1 to port ID 0:

  flow create 1 ingress transfer pattern eth / vlan / end actions
     port_id id 0 / end

- Adding a VLAN header to IPv6 traffic received on port ID 1 and directing
  it to port ID 0:

  flow create 1 ingress transfer pattern eth / ipv6 / end actions
     of_push_vlan ethertype 0x8100 / of_set_vlan_vid / port_id id 0 / end
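
The VLAN item carries PCP and VID together in the big-endian tci field;
once converted to host order, the patch splits them with the shifts
sketched below (0xe000 covers PCP, 0x1000 the DEI bit, which is not
supported here, and 0x0fff the VID). A self-contained illustration in
plain C, operating on a host-order TCI:

```c
#include <assert.h>
#include <stdint.h>

/* Priority Code Point: top 3 bits of the host-order TCI. */
static uint8_t
tci_pcp(uint16_t tci)
{
	return (tci >> 13) & 0x7;
}

/* VLAN Identifier: low 12 bits of the host-order TCI. */
static uint16_t
tci_vid(uint16_t tci)
{
	return tci & 0x0fff;
}
```

The extracted values feed the TCA_FLOWER_KEY_VLAN_PRIO and
TCA_FLOWER_KEY_VLAN_ID flower attributes respectively.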

Signed-off-by: Adrien Mazarguil <adrien.mazarguil@6wind.com>
---
 drivers/net/mlx5/mlx5_nl_flow.c | 177 ++++++++++++++++++++++++++++++++++-
 1 file changed, 173 insertions(+), 4 deletions(-)

diff --git a/drivers/net/mlx5/mlx5_nl_flow.c b/drivers/net/mlx5/mlx5_nl_flow.c
index ad1e001c6..a45d94fae 100644
--- a/drivers/net/mlx5/mlx5_nl_flow.c
+++ b/drivers/net/mlx5/mlx5_nl_flow.c
@@ -13,6 +13,7 @@
 #include <linux/rtnetlink.h>
 #include <linux/tc_act/tc_gact.h>
 #include <linux/tc_act/tc_mirred.h>
+#include <linux/tc_act/tc_vlan.h>
 #include <netinet/in.h>
 #include <stdalign.h>
 #include <stdbool.h>
@@ -36,6 +37,7 @@ enum mlx5_nl_flow_trans {
 	PATTERN,
 	ITEM_VOID,
 	ITEM_ETH,
+	ITEM_VLAN,
 	ITEM_IPV4,
 	ITEM_IPV6,
 	ITEM_TCP,
@@ -44,6 +46,10 @@ enum mlx5_nl_flow_trans {
 	ACTION_VOID,
 	ACTION_PORT_ID,
 	ACTION_DROP,
+	ACTION_OF_POP_VLAN,
+	ACTION_OF_PUSH_VLAN,
+	ACTION_OF_SET_VLAN_VID,
+	ACTION_OF_SET_VLAN_PCP,
 	END,
 };
 
@@ -52,7 +58,8 @@ enum mlx5_nl_flow_trans {
 #define PATTERN_COMMON \
 	ITEM_VOID, ACTIONS
 #define ACTIONS_COMMON \
-	ACTION_VOID
+	ACTION_VOID, ACTION_OF_POP_VLAN, ACTION_OF_PUSH_VLAN, \
+	ACTION_OF_SET_VLAN_VID, ACTION_OF_SET_VLAN_PCP
 #define ACTIONS_FATE \
 	ACTION_PORT_ID, ACTION_DROP
 
@@ -63,7 +70,8 @@ static const enum mlx5_nl_flow_trans *const mlx5_nl_flow_trans[] = {
 	[ATTR] = TRANS(PATTERN),
 	[PATTERN] = TRANS(ITEM_ETH, PATTERN_COMMON),
 	[ITEM_VOID] = TRANS(BACK),
-	[ITEM_ETH] = TRANS(ITEM_IPV4, ITEM_IPV6, PATTERN_COMMON),
+	[ITEM_ETH] = TRANS(ITEM_IPV4, ITEM_IPV6, ITEM_VLAN, PATTERN_COMMON),
+	[ITEM_VLAN] = TRANS(ITEM_IPV4, ITEM_IPV6, PATTERN_COMMON),
 	[ITEM_IPV4] = TRANS(ITEM_TCP, ITEM_UDP, PATTERN_COMMON),
 	[ITEM_IPV6] = TRANS(ITEM_TCP, ITEM_UDP, PATTERN_COMMON),
 	[ITEM_TCP] = TRANS(PATTERN_COMMON),
@@ -72,12 +80,17 @@ static const enum mlx5_nl_flow_trans *const mlx5_nl_flow_trans[] = {
 	[ACTION_VOID] = TRANS(BACK),
 	[ACTION_PORT_ID] = TRANS(ACTION_VOID, END),
 	[ACTION_DROP] = TRANS(ACTION_VOID, END),
+	[ACTION_OF_POP_VLAN] = TRANS(ACTIONS_FATE, ACTIONS_COMMON),
+	[ACTION_OF_PUSH_VLAN] = TRANS(ACTIONS_FATE, ACTIONS_COMMON),
+	[ACTION_OF_SET_VLAN_VID] = TRANS(ACTIONS_FATE, ACTIONS_COMMON),
+	[ACTION_OF_SET_VLAN_PCP] = TRANS(ACTIONS_FATE, ACTIONS_COMMON),
 	[END] = NULL,
 };
 
 /** Empty masks for known item types. */
 static const union {
 	struct rte_flow_item_eth eth;
+	struct rte_flow_item_vlan vlan;
 	struct rte_flow_item_ipv4 ipv4;
 	struct rte_flow_item_ipv6 ipv6;
 	struct rte_flow_item_tcp tcp;
@@ -87,6 +100,7 @@ static const union {
 /** Supported masks for known item types. */
 static const struct {
 	struct rte_flow_item_eth eth;
+	struct rte_flow_item_vlan vlan;
 	struct rte_flow_item_ipv4 ipv4;
 	struct rte_flow_item_ipv6 ipv6;
 	struct rte_flow_item_tcp tcp;
@@ -97,6 +111,11 @@ static const struct {
 		.dst.addr_bytes = "\xff\xff\xff\xff\xff\xff",
 		.src.addr_bytes = "\xff\xff\xff\xff\xff\xff",
 	},
+	.vlan = {
+		/* PCP and VID only, no DEI. */
+		.tci = RTE_BE16(0xefff),
+		.inner_type = RTE_BE16(0xffff),
+	},
 	.ipv4.hdr = {
 		.next_proto_id = 0xff,
 		.src_addr = RTE_BE32(0xffffffff),
@@ -242,9 +261,13 @@ mlx5_nl_flow_transpose(void *buf,
 	unsigned int n;
 	uint32_t act_index_cur;
 	bool eth_type_set;
+	bool vlan_present;
+	bool vlan_eth_type_set;
 	bool ip_proto_set;
 	struct nlattr *na_flower;
 	struct nlattr *na_flower_act;
+	struct nlattr *na_vlan_id;
+	struct nlattr *na_vlan_priority;
 	const enum mlx5_nl_flow_trans *trans;
 	const enum mlx5_nl_flow_trans *back;
 
@@ -256,15 +279,20 @@ mlx5_nl_flow_transpose(void *buf,
 	n = 0;
 	act_index_cur = 0;
 	eth_type_set = false;
+	vlan_present = false;
+	vlan_eth_type_set = false;
 	ip_proto_set = false;
 	na_flower = NULL;
 	na_flower_act = NULL;
+	na_vlan_id = NULL;
+	na_vlan_priority = NULL;
 	trans = TRANS(ATTR);
 	back = trans;
 trans:
 	switch (trans[n++]) {
 		union {
 			const struct rte_flow_item_eth *eth;
+			const struct rte_flow_item_vlan *vlan;
 			const struct rte_flow_item_ipv4 *ipv4;
 			const struct rte_flow_item_ipv6 *ipv6;
 			const struct rte_flow_item_tcp *tcp;
@@ -272,6 +300,11 @@ mlx5_nl_flow_transpose(void *buf,
 		} spec, mask;
 		union {
 			const struct rte_flow_action_port_id *port_id;
+			const struct rte_flow_action_of_push_vlan *of_push_vlan;
+			const struct rte_flow_action_of_set_vlan_vid *
+				of_set_vlan_vid;
+			const struct rte_flow_action_of_set_vlan_pcp *
+				of_set_vlan_pcp;
 		} conf;
 		struct nlmsghdr *nlh;
 		struct tcmsg *tcm;
@@ -408,6 +441,58 @@ mlx5_nl_flow_transpose(void *buf,
 			goto error_nobufs;
 		++item;
 		break;
+	case ITEM_VLAN:
+		if (item->type != RTE_FLOW_ITEM_TYPE_VLAN)
+			goto trans;
+		mask.vlan = mlx5_nl_flow_item_mask
+			(item, &rte_flow_item_vlan_mask,
+			 &mlx5_nl_flow_mask_supported.vlan,
+			 &mlx5_nl_flow_mask_empty.vlan,
+			 sizeof(mlx5_nl_flow_mask_supported.vlan), error);
+		if (!mask.vlan)
+			return -rte_errno;
+		if (!eth_type_set &&
+		    !mnl_attr_put_u16_check(buf, size,
+					    TCA_FLOWER_KEY_ETH_TYPE,
+					    RTE_BE16(ETH_P_8021Q)))
+			goto error_nobufs;
+		eth_type_set = 1;
+		vlan_present = 1;
+		if (mask.vlan == &mlx5_nl_flow_mask_empty.vlan) {
+			++item;
+			break;
+		}
+		spec.vlan = item->spec;
+		if ((mask.vlan->tci & RTE_BE16(0xe000) &&
+		     (mask.vlan->tci & RTE_BE16(0xe000)) != RTE_BE16(0xe000)) ||
+		    (mask.vlan->tci & RTE_BE16(0x0fff) &&
+		     (mask.vlan->tci & RTE_BE16(0x0fff)) != RTE_BE16(0x0fff)) ||
+		    (mask.vlan->inner_type &&
+		     mask.vlan->inner_type != RTE_BE16(0xffff)))
+			return rte_flow_error_set
+				(error, ENOTSUP, RTE_FLOW_ERROR_TYPE_ITEM_MASK,
+				 mask.vlan,
+				 "no support for partial masks on"
+				 " \"tci\" (PCP and VID parts) and"
+				 " \"inner_type\" fields");
+		if (mask.vlan->inner_type) {
+			if (!mnl_attr_put_u16_check
+			    (buf, size, TCA_FLOWER_KEY_VLAN_ETH_TYPE,
+			     spec.vlan->inner_type))
+				goto error_nobufs;
+			vlan_eth_type_set = 1;
+		}
+		if ((mask.vlan->tci & RTE_BE16(0xe000) &&
+		     !mnl_attr_put_u8_check
+		     (buf, size, TCA_FLOWER_KEY_VLAN_PRIO,
+		      (rte_be_to_cpu_16(spec.vlan->tci) >> 13) & 0x7)) ||
+		    (mask.vlan->tci & RTE_BE16(0x0fff) &&
+		     !mnl_attr_put_u16_check
+		     (buf, size, TCA_FLOWER_KEY_VLAN_ID,
+		      spec.vlan->tci & RTE_BE16(0x0fff))))
+			goto error_nobufs;
+		++item;
+		break;
 	case ITEM_IPV4:
 		if (item->type != RTE_FLOW_ITEM_TYPE_IPV4)
 			goto trans;
@@ -418,12 +503,15 @@ mlx5_nl_flow_transpose(void *buf,
 			 sizeof(mlx5_nl_flow_mask_supported.ipv4), error);
 		if (!mask.ipv4)
 			return -rte_errno;
-		if (!eth_type_set &&
+		if ((!eth_type_set || !vlan_eth_type_set) &&
 		    !mnl_attr_put_u16_check(buf, size,
+					    vlan_present ?
+					    TCA_FLOWER_KEY_VLAN_ETH_TYPE :
 					    TCA_FLOWER_KEY_ETH_TYPE,
 					    RTE_BE16(ETH_P_IP)))
 			goto error_nobufs;
 		eth_type_set = 1;
+		vlan_eth_type_set = 1;
 		if (mask.ipv4 == &mlx5_nl_flow_mask_empty.ipv4) {
 			++item;
 			break;
@@ -470,12 +558,15 @@ mlx5_nl_flow_transpose(void *buf,
 			 sizeof(mlx5_nl_flow_mask_supported.ipv6), error);
 		if (!mask.ipv6)
 			return -rte_errno;
-		if (!eth_type_set &&
+		if ((!eth_type_set || !vlan_eth_type_set) &&
 		    !mnl_attr_put_u16_check(buf, size,
+					    vlan_present ?
+					    TCA_FLOWER_KEY_VLAN_ETH_TYPE :
 					    TCA_FLOWER_KEY_ETH_TYPE,
 					    RTE_BE16(ETH_P_IPV6)))
 			goto error_nobufs;
 		eth_type_set = 1;
+		vlan_eth_type_set = 1;
 		if (mask.ipv6 == &mlx5_nl_flow_mask_empty.ipv6) {
 			++item;
 			break;
@@ -681,6 +772,84 @@ mlx5_nl_flow_transpose(void *buf,
 		mnl_attr_nest_end(buf, act_index);
 		++action;
 		break;
+	case ACTION_OF_POP_VLAN:
+		if (action->type != RTE_FLOW_ACTION_TYPE_OF_POP_VLAN)
+			goto trans;
+		conf.of_push_vlan = NULL;
+		i = TCA_VLAN_ACT_POP;
+		goto action_of_vlan;
+	case ACTION_OF_PUSH_VLAN:
+		if (action->type != RTE_FLOW_ACTION_TYPE_OF_PUSH_VLAN)
+			goto trans;
+		conf.of_push_vlan = action->conf;
+		i = TCA_VLAN_ACT_PUSH;
+		goto action_of_vlan;
+	case ACTION_OF_SET_VLAN_VID:
+		if (action->type != RTE_FLOW_ACTION_TYPE_OF_SET_VLAN_VID)
+			goto trans;
+		conf.of_set_vlan_vid = action->conf;
+		if (na_vlan_id)
+			goto override_na_vlan_id;
+		i = TCA_VLAN_ACT_MODIFY;
+		goto action_of_vlan;
+	case ACTION_OF_SET_VLAN_PCP:
+		if (action->type != RTE_FLOW_ACTION_TYPE_OF_SET_VLAN_PCP)
+			goto trans;
+		conf.of_set_vlan_pcp = action->conf;
+		if (na_vlan_priority)
+			goto override_na_vlan_priority;
+		i = TCA_VLAN_ACT_MODIFY;
+		goto action_of_vlan;
+action_of_vlan:
+		act_index =
+			mnl_attr_nest_start_check(buf, size, act_index_cur++);
+		if (!act_index ||
+		    !mnl_attr_put_strz_check(buf, size, TCA_ACT_KIND, "vlan"))
+			goto error_nobufs;
+		act = mnl_attr_nest_start_check(buf, size, TCA_ACT_OPTIONS);
+		if (!act)
+			goto error_nobufs;
+		if (!mnl_attr_put_check(buf, size, TCA_VLAN_PARMS,
+					sizeof(struct tc_vlan),
+					&(struct tc_vlan){
+						.action = TC_ACT_PIPE,
+						.v_action = i,
+					}))
+			goto error_nobufs;
+		if (i == TCA_VLAN_ACT_POP) {
+			mnl_attr_nest_end(buf, act);
+			++action;
+			break;
+		}
+		if (i == TCA_VLAN_ACT_PUSH &&
+		    !mnl_attr_put_u16_check(buf, size,
+					    TCA_VLAN_PUSH_VLAN_PROTOCOL,
+					    conf.of_push_vlan->ethertype))
+			goto error_nobufs;
+		na_vlan_id = mnl_nlmsg_get_payload_tail(buf);
+		if (!mnl_attr_put_u16_check(buf, size, TCA_VLAN_PAD, 0))
+			goto error_nobufs;
+		na_vlan_priority = mnl_nlmsg_get_payload_tail(buf);
+		if (!mnl_attr_put_u8_check(buf, size, TCA_VLAN_PAD, 0))
+			goto error_nobufs;
+		mnl_attr_nest_end(buf, act);
+		mnl_attr_nest_end(buf, act_index);
+		if (action->type == RTE_FLOW_ACTION_TYPE_OF_SET_VLAN_VID) {
+override_na_vlan_id:
+			na_vlan_id->nla_type = TCA_VLAN_PUSH_VLAN_ID;
+			*(uint16_t *)mnl_attr_get_payload(na_vlan_id) =
+				rte_be_to_cpu_16
+				(conf.of_set_vlan_vid->vlan_vid);
+		} else if (action->type ==
+			   RTE_FLOW_ACTION_TYPE_OF_SET_VLAN_PCP) {
+override_na_vlan_priority:
+			na_vlan_priority->nla_type =
+				TCA_VLAN_PUSH_VLAN_PRIORITY;
+			*(uint8_t *)mnl_attr_get_payload(na_vlan_priority) =
+				conf.of_set_vlan_pcp->vlan_pcp;
+		}
+		++action;
+		break;
 	case END:
 		if (item->type != RTE_FLOW_ITEM_TYPE_END ||
 		    action->type != RTE_FLOW_ACTION_TYPE_END)
-- 
2.11.0


* [dpdk-dev] [PATCH 6/6] net/mlx5: add port ID pattern item to switch flow rules
  2018-06-27 18:08 [dpdk-dev] [PATCH 0/6] net/mlx5: add support for switch flow rules Adrien Mazarguil
                   ` (4 preceding siblings ...)
  2018-06-27 18:08 ` [dpdk-dev] [PATCH 5/6] net/mlx5: add VLAN item and actions " Adrien Mazarguil
@ 2018-06-27 18:08 ` Adrien Mazarguil
  2018-07-12  1:13   ` Yongseok Koh
  2018-06-28  9:05 ` [dpdk-dev] [PATCH 0/6] net/mlx5: add support for " Nélio Laranjeiro
  2018-07-13  9:40 ` [dpdk-dev] [PATCH v2 " Adrien Mazarguil
  7 siblings, 1 reply; 33+ messages in thread
From: Adrien Mazarguil @ 2018-06-27 18:08 UTC (permalink / raw)
  To: Shahaf Shuler; +Cc: Nelio Laranjeiro, Yongseok Koh, dev

This enables flow rules to match traffic coming from a different DPDK port
ID associated with the device (PORT_ID pattern item), mainly for the
convenience of applications that want to manage all flow rules of a given
physical device through a single port ID.

Testpmd example:

- Creating a flow rule on port ID 1 to consume all traffic from port ID 0
  and direct it to port ID 2:

  flow create 1 ingress transfer pattern port_id id is 0 / end actions
     port_id id 2 / end

Signed-off-by: Adrien Mazarguil <adrien.mazarguil@6wind.com>
---
 drivers/net/mlx5/mlx5_nl_flow.c | 57 +++++++++++++++++++++++++++++++++++-
 1 file changed, 56 insertions(+), 1 deletion(-)

diff --git a/drivers/net/mlx5/mlx5_nl_flow.c b/drivers/net/mlx5/mlx5_nl_flow.c
index a45d94fae..ad7a53d36 100644
--- a/drivers/net/mlx5/mlx5_nl_flow.c
+++ b/drivers/net/mlx5/mlx5_nl_flow.c
@@ -36,6 +36,7 @@ enum mlx5_nl_flow_trans {
 	ATTR,
 	PATTERN,
 	ITEM_VOID,
+	ITEM_PORT_ID,
 	ITEM_ETH,
 	ITEM_VLAN,
 	ITEM_IPV4,
@@ -56,7 +57,7 @@ enum mlx5_nl_flow_trans {
 #define TRANS(...) (const enum mlx5_nl_flow_trans []){ __VA_ARGS__, INVALID, }
 
 #define PATTERN_COMMON \
-	ITEM_VOID, ACTIONS
+	ITEM_VOID, ITEM_PORT_ID, ACTIONS
 #define ACTIONS_COMMON \
 	ACTION_VOID, ACTION_OF_POP_VLAN, ACTION_OF_PUSH_VLAN, \
 	ACTION_OF_SET_VLAN_VID, ACTION_OF_SET_VLAN_PCP
@@ -70,6 +71,7 @@ static const enum mlx5_nl_flow_trans *const mlx5_nl_flow_trans[] = {
 	[ATTR] = TRANS(PATTERN),
 	[PATTERN] = TRANS(ITEM_ETH, PATTERN_COMMON),
 	[ITEM_VOID] = TRANS(BACK),
+	[ITEM_PORT_ID] = TRANS(BACK),
 	[ITEM_ETH] = TRANS(ITEM_IPV4, ITEM_IPV6, ITEM_VLAN, PATTERN_COMMON),
 	[ITEM_VLAN] = TRANS(ITEM_IPV4, ITEM_IPV6, PATTERN_COMMON),
 	[ITEM_IPV4] = TRANS(ITEM_TCP, ITEM_UDP, PATTERN_COMMON),
@@ -89,6 +91,7 @@ static const enum mlx5_nl_flow_trans *const mlx5_nl_flow_trans[] = {
 
 /** Empty masks for known item types. */
 static const union {
+	struct rte_flow_item_port_id port_id;
 	struct rte_flow_item_eth eth;
 	struct rte_flow_item_vlan vlan;
 	struct rte_flow_item_ipv4 ipv4;
@@ -99,6 +102,7 @@ static const union {
 
 /** Supported masks for known item types. */
 static const struct {
+	struct rte_flow_item_port_id port_id;
 	struct rte_flow_item_eth eth;
 	struct rte_flow_item_vlan vlan;
 	struct rte_flow_item_ipv4 ipv4;
@@ -106,6 +110,9 @@ static const struct {
 	struct rte_flow_item_tcp tcp;
 	struct rte_flow_item_udp udp;
 } mlx5_nl_flow_mask_supported = {
+	.port_id = {
+		.id = 0xffffffff,
+	},
 	.eth = {
 		.type = RTE_BE16(0xffff),
 		.dst.addr_bytes = "\xff\xff\xff\xff\xff\xff",
@@ -260,6 +267,7 @@ mlx5_nl_flow_transpose(void *buf,
 	const struct rte_flow_action *action;
 	unsigned int n;
 	uint32_t act_index_cur;
+	bool in_port_id_set;
 	bool eth_type_set;
 	bool vlan_present;
 	bool vlan_eth_type_set;
@@ -278,6 +286,7 @@ mlx5_nl_flow_transpose(void *buf,
 	action = actions;
 	n = 0;
 	act_index_cur = 0;
+	in_port_id_set = false;
 	eth_type_set = false;
 	vlan_present = false;
 	vlan_eth_type_set = false;
@@ -291,6 +300,7 @@ mlx5_nl_flow_transpose(void *buf,
 trans:
 	switch (trans[n++]) {
 		union {
+			const struct rte_flow_item_port_id *port_id;
 			const struct rte_flow_item_eth *eth;
 			const struct rte_flow_item_vlan *vlan;
 			const struct rte_flow_item_ipv4 *ipv4;
@@ -392,6 +402,51 @@ mlx5_nl_flow_transpose(void *buf,
 			goto trans;
 		++item;
 		break;
+	case ITEM_PORT_ID:
+		if (item->type != RTE_FLOW_ITEM_TYPE_PORT_ID)
+			goto trans;
+		mask.port_id = mlx5_nl_flow_item_mask
+			(item, &rte_flow_item_port_id_mask,
+			 &mlx5_nl_flow_mask_supported.port_id,
+			 &mlx5_nl_flow_mask_empty.port_id,
+			 sizeof(mlx5_nl_flow_mask_supported.port_id), error);
+		if (!mask.port_id)
+			return -rte_errno;
+		if (mask.port_id == &mlx5_nl_flow_mask_empty.port_id) {
+			in_port_id_set = 1;
+			++item;
+			break;
+		}
+		spec.port_id = item->spec;
+		if (mask.port_id->id && mask.port_id->id != 0xffffffff)
+			return rte_flow_error_set
+				(error, ENOTSUP, RTE_FLOW_ERROR_TYPE_ITEM_MASK,
+				 mask.port_id,
+				 "no support for partial mask on"
+				 " \"id\" field");
+		if (!mask.port_id->id)
+			i = 0;
+		else
+			for (i = 0; ptoi[i].ifindex; ++i)
+				if (ptoi[i].port_id == spec.port_id->id)
+					break;
+		if (!ptoi[i].ifindex)
+			return rte_flow_error_set
+				(error, ENODEV, RTE_FLOW_ERROR_TYPE_ITEM_SPEC,
+				 spec.port_id,
+				 "missing data to convert port ID to ifindex");
+		tcm = mnl_nlmsg_get_payload(buf);
+		if (in_port_id_set &&
+		    ptoi[i].ifindex != (unsigned int)tcm->tcm_ifindex)
+			return rte_flow_error_set
+				(error, ENOTSUP, RTE_FLOW_ERROR_TYPE_ITEM_SPEC,
+				 spec.port_id,
+				 "cannot match traffic for several port IDs"
+				 " through a single flow rule");
+		tcm->tcm_ifindex = ptoi[i].ifindex;
+		in_port_id_set = 1;
+		++item;
+		break;
 	case ITEM_ETH:
 		if (item->type != RTE_FLOW_ITEM_TYPE_ETH)
 			goto trans;
-- 
2.11.0

* Re: [dpdk-dev] [PATCH 0/6] net/mlx5: add support for switch flow rules
  2018-06-27 18:08 [dpdk-dev] [PATCH 0/6] net/mlx5: add support for switch flow rules Adrien Mazarguil
                   ` (5 preceding siblings ...)
  2018-06-27 18:08 ` [dpdk-dev] [PATCH 6/6] net/mlx5: add port ID pattern item " Adrien Mazarguil
@ 2018-06-28  9:05 ` Nélio Laranjeiro
  2018-07-13  9:40 ` [dpdk-dev] [PATCH v2 " Adrien Mazarguil
  7 siblings, 0 replies; 33+ messages in thread
From: Nélio Laranjeiro @ 2018-06-28  9:05 UTC (permalink / raw)
  To: Adrien Mazarguil; +Cc: Shahaf Shuler, Yongseok Koh, dev

On Wed, Jun 27, 2018 at 08:08:08PM +0200, Adrien Mazarguil wrote:
> This series adds support for switch flow rules, that is, rte_flow rules
> applied to mlx5 devices at the switch level.
> 
> It allows applications to offload traffic redirection between DPDK ports in
> hardware, while optionally modifying it (e.g. performing encap/decap).
> 
> For this to work, involved DPDK ports must be part of the same switch
> domain, as is the case with port representors, and the transfer attribute
> must be requested on flow rules.
> 
> Also since the mlx5 switch is controlled through Netlink instead of Verbs,
> and given how tedious formatting Netlink messages is, a new dependency is
> added to mlx5: libmnl. See relevant patch.
> 
> This series depends on Nelio's mlx5 flow engine rework ("net/mlx5: flow
> rework" [1][2]) which must be applied first.
> 
> [1] https://patches.dpdk.org/project/dpdk/list/?series=268
> [2] https://mails.dpdk.org/archives/dev/2018-June/105499.html
> 
> Adrien Mazarguil (6):
>   net/mlx5: lay groundwork for switch offloads
>   net/mlx5: add framework for switch flow rules
>   net/mlx5: add fate actions to switch flow rules
>   net/mlx5: add L2-L4 pattern items to switch flow rules
>   net/mlx5: add VLAN item and actions to switch flow rules
>   net/mlx5: add port ID pattern item to switch flow rules
> 
>  drivers/net/mlx5/Makefile       |    2 +
>  drivers/net/mlx5/mlx5.c         |   32 +
>  drivers/net/mlx5/mlx5.h         |   28 +
>  drivers/net/mlx5/mlx5_flow.c    |  113 ++++
>  drivers/net/mlx5/mlx5_nl_flow.c | 1126 ++++++++++++++++++++++++++++++++++
>  mk/rte.app.mk                   |    2 +-
>  6 files changed, 1302 insertions(+), 1 deletion(-)
>  create mode 100644 drivers/net/mlx5/mlx5_nl_flow.c
> 
> -- 
> 2.11.0

Acked-by: Nelio Laranjeiro <nelio.laranjeiro@6wind.com>

-- 
Nélio Laranjeiro
6WIND

* Re: [dpdk-dev] [PATCH 1/6] net/mlx5: lay groundwork for switch offloads
  2018-06-27 18:08 ` [dpdk-dev] [PATCH 1/6] net/mlx5: lay groundwork for switch offloads Adrien Mazarguil
@ 2018-07-12  0:17   ` Yongseok Koh
  2018-07-12 10:46     ` Adrien Mazarguil
  0 siblings, 1 reply; 33+ messages in thread
From: Yongseok Koh @ 2018-07-12  0:17 UTC (permalink / raw)
  To: Adrien Mazarguil; +Cc: Shahaf Shuler, Nelio Laranjeiro, dev

On Wed, Jun 27, 2018 at 08:08:10PM +0200, Adrien Mazarguil wrote:
> With mlx5, unlike normal flow rules implemented through Verbs for traffic
> emitted and received by the application, those targeting different logical
> ports of the device (VF representors for instance) are offloaded at the
> switch level and must be configured through Netlink (TC interface).
> 
> This patch adds preliminary support to manage such flow rules through the
> flow API (rte_flow).
> 
> Instead of rewriting tons of Netlink helpers and as previously suggested by
> Stephen [1], this patch introduces a new dependency to libmnl [2]
> (LGPL-2.1) when compiling mlx5.
> 
> [1] https://mails.dpdk.org/archives/dev/2018-March/092676.html
> [2] https://netfilter.org/projects/libmnl/
> 
> Signed-off-by: Adrien Mazarguil <adrien.mazarguil@6wind.com>
> ---
>  drivers/net/mlx5/Makefile       |   2 +
>  drivers/net/mlx5/mlx5.c         |  32 ++++++++
>  drivers/net/mlx5/mlx5.h         |  10 +++
>  drivers/net/mlx5/mlx5_nl_flow.c | 139 +++++++++++++++++++++++++++++++++++
>  mk/rte.app.mk                   |   2 +-
>  5 files changed, 184 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/net/mlx5/Makefile b/drivers/net/mlx5/Makefile
> index 8a5229e61..3325eed06 100644
> --- a/drivers/net/mlx5/Makefile
> +++ b/drivers/net/mlx5/Makefile
> @@ -33,6 +33,7 @@ SRCS-$(CONFIG_RTE_LIBRTE_MLX5_PMD) += mlx5_mr.c
>  SRCS-$(CONFIG_RTE_LIBRTE_MLX5_PMD) += mlx5_flow.c
>  SRCS-$(CONFIG_RTE_LIBRTE_MLX5_PMD) += mlx5_socket.c
>  SRCS-$(CONFIG_RTE_LIBRTE_MLX5_PMD) += mlx5_nl.c
> +SRCS-$(CONFIG_RTE_LIBRTE_MLX5_PMD) += mlx5_nl_flow.c
>  
>  ifeq ($(CONFIG_RTE_LIBRTE_MLX5_DLOPEN_DEPS),y)
>  INSTALL-$(CONFIG_RTE_LIBRTE_MLX5_PMD)-lib += $(LIB_GLUE)
> @@ -56,6 +57,7 @@ LDLIBS += -ldl
>  else
>  LDLIBS += -libverbs -lmlx5
>  endif
> +LDLIBS += -lmnl
>  LDLIBS += -lrte_eal -lrte_mbuf -lrte_mempool -lrte_ring
>  LDLIBS += -lrte_ethdev -lrte_net -lrte_kvargs
>  LDLIBS += -lrte_bus_pci
> diff --git a/drivers/net/mlx5/mlx5.c b/drivers/net/mlx5/mlx5.c
> index 665a3c31f..d9b9097b1 100644
> --- a/drivers/net/mlx5/mlx5.c
> +++ b/drivers/net/mlx5/mlx5.c
> @@ -279,6 +279,8 @@ mlx5_dev_close(struct rte_eth_dev *dev)
>  		mlx5_nl_mac_addr_flush(dev);
>  	if (priv->nl_socket >= 0)
>  		close(priv->nl_socket);
> +	if (priv->mnl_socket)
> +		mlx5_nl_flow_socket_destroy(priv->mnl_socket);
>  	ret = mlx5_hrxq_ibv_verify(dev);
>  	if (ret)
>  		DRV_LOG(WARNING, "port %u some hash Rx queue still remain",
> @@ -1077,6 +1079,34 @@ mlx5_dev_spawn_one(struct rte_device *dpdk_dev,
>  			priv->nl_socket = -1;
>  		mlx5_nl_mac_addr_sync(eth_dev);
>  	}
> +	priv->mnl_socket = mlx5_nl_flow_socket_create();
> +	if (!priv->mnl_socket) {
> +		err = -rte_errno;
> +		DRV_LOG(WARNING,
> +			"flow rules relying on switch offloads will not be"
> +			" supported: cannot open libmnl socket: %s",
> +			strerror(rte_errno));
> +	} else {
> +		struct rte_flow_error error;
> +		unsigned int ifindex = mlx5_ifindex(eth_dev);
> +
> +		if (!ifindex) {
> +			err = -rte_errno;
> +			error.message =
> +				"cannot retrieve network interface index";
> +		} else {
> +			err = mlx5_nl_flow_init(priv->mnl_socket, ifindex,
> +						&error);
> +		}
> +		if (err) {
> +			DRV_LOG(WARNING,
> +				"flow rules relying on switch offloads will"
> +				" not be supported: %s: %s",
> +				error.message, strerror(rte_errno));
> +			mlx5_nl_flow_socket_destroy(priv->mnl_socket);
> +			priv->mnl_socket = NULL;
> +		}
> +	}
>  	TAILQ_INIT(&priv->flows);
>  	TAILQ_INIT(&priv->ctrl_flows);
>  	/* Hint libmlx5 to use PMD allocator for data plane resources */
> @@ -1127,6 +1157,8 @@ mlx5_dev_spawn_one(struct rte_device *dpdk_dev,
>  	if (priv) {
>  		unsigned int i;
>  
> +		if (priv->mnl_socket)
> +			mlx5_nl_flow_socket_destroy(priv->mnl_socket);
>  		i = mlx5_domain_to_port_id(priv->domain_id, NULL, 0);
>  		if (i == 1)
>  			claim_zero(rte_eth_switch_domain_free(priv->domain_id));
> diff --git a/drivers/net/mlx5/mlx5.h b/drivers/net/mlx5/mlx5.h
> index 1d8e156c8..390249adb 100644
> --- a/drivers/net/mlx5/mlx5.h
> +++ b/drivers/net/mlx5/mlx5.h
> @@ -148,6 +148,8 @@ struct mlx5_drop {
>  	struct mlx5_rxq_ibv *rxq; /* Verbs Rx queue. */
>  };
>  
> +struct mnl_socket;
> +
>  struct priv {
>  	LIST_ENTRY(priv) mem_event_cb; /* Called by memory event callback. */
>  	struct rte_eth_dev_data *dev_data;  /* Pointer to device data. */
> @@ -207,6 +209,7 @@ struct priv {
>  	/* Context for Verbs allocator. */
>  	int nl_socket; /* Netlink socket. */
>  	uint32_t nl_sn; /* Netlink message sequence number. */
> +	struct mnl_socket *mnl_socket; /* Libmnl socket. */
>  };
>  
>  #define PORT_ID(priv) ((priv)->dev_data->port_id)
> @@ -369,4 +372,11 @@ void mlx5_nl_mac_addr_flush(struct rte_eth_dev *dev);
>  int mlx5_nl_promisc(struct rte_eth_dev *dev, int enable);
>  int mlx5_nl_allmulti(struct rte_eth_dev *dev, int enable);
>  
> +/* mlx5_nl_flow.c */
> +
> +int mlx5_nl_flow_init(struct mnl_socket *nl, unsigned int ifindex,
> +		      struct rte_flow_error *error);
> +struct mnl_socket *mlx5_nl_flow_socket_create(void);
> +void mlx5_nl_flow_socket_destroy(struct mnl_socket *nl);
> +
>  #endif /* RTE_PMD_MLX5_H_ */
> diff --git a/drivers/net/mlx5/mlx5_nl_flow.c b/drivers/net/mlx5/mlx5_nl_flow.c
> new file mode 100644
> index 000000000..7a8683b03
> --- /dev/null
> +++ b/drivers/net/mlx5/mlx5_nl_flow.c
> @@ -0,0 +1,139 @@
> +/* SPDX-License-Identifier: BSD-3-Clause
> + * Copyright 2018 6WIND S.A.
> + * Copyright 2018 Mellanox Technologies, Ltd
> + */
> +
> +#include <errno.h>
> +#include <libmnl/libmnl.h>
> +#include <linux/netlink.h>
> +#include <linux/pkt_sched.h>
> +#include <linux/rtnetlink.h>
> +#include <stdalign.h>
> +#include <stddef.h>
> +#include <stdint.h>
> +#include <stdlib.h>
> +#include <sys/socket.h>
> +
> +#include <rte_errno.h>
> +#include <rte_flow.h>
> +
> +#include "mlx5.h"
> +
> +/**
> + * Send Netlink message with acknowledgment.
> + *
> + * @param nl
> + *   Libmnl socket to use.
> + * @param nlh
> + *   Message to send. This function always raises the NLM_F_ACK flag before
> + *   sending.
> + *
> + * @return
> + *   0 on success, a negative errno value otherwise and rte_errno is set.
> + */
> +static int
> +mlx5_nl_flow_nl_ack(struct mnl_socket *nl, struct nlmsghdr *nlh)
> +{
> +	alignas(struct nlmsghdr)
> +	uint8_t ans[MNL_SOCKET_BUFFER_SIZE];

There are a total of 3 of these buffers. On a host with a large page size, this
can amount to 8kB * 3 = 24kB. That is not a gigantic amount, but since all the
functions here run sequentially, how about having just one global buffer instead?

> +	uint32_t seq = random();
> +	int ret;
> +
> +	nlh->nlmsg_flags |= NLM_F_ACK;
> +	nlh->nlmsg_seq = seq;
> +	ret = mnl_socket_sendto(nl, nlh, nlh->nlmsg_len);
> +	if (ret != -1)
> +		ret = mnl_socket_recvfrom(nl, ans, sizeof(ans));
> +	if (ret != -1)
> +		ret = mnl_cb_run
> +			(ans, ret, seq, mnl_socket_get_portid(nl), NULL, NULL);
> +	if (!ret)
> +		return 0;
> +	rte_errno = errno;
> +	return -rte_errno;
> +}
> +
> +/**
> + * Initialize ingress qdisc of a given network interface.
> + *
> + * @param nl
> + *   Libmnl socket of the @p NETLINK_ROUTE kind.
> + * @param ifindex
> + *   Index of network interface to initialize.
> + * @param[out] error
> + *   Perform verbose error reporting if not NULL.
> + *
> + * @return
> + *   0 on success, a negative errno value otherwise and rte_errno is set.
> + */
> +int
> +mlx5_nl_flow_init(struct mnl_socket *nl, unsigned int ifindex,
> +		  struct rte_flow_error *error)
> +{
> +	uint8_t buf[MNL_SOCKET_BUFFER_SIZE];
> +	struct nlmsghdr *nlh;
> +	struct tcmsg *tcm;
> +
> +	/* Destroy existing ingress qdisc and everything attached to it. */
> +	nlh = mnl_nlmsg_put_header(buf);
> +	nlh->nlmsg_type = RTM_DELQDISC;
> +	nlh->nlmsg_flags = NLM_F_REQUEST;
> +	tcm = mnl_nlmsg_put_extra_header(nlh, sizeof(*tcm));
> +	tcm->tcm_family = AF_UNSPEC;
> +	tcm->tcm_ifindex = ifindex;
> +	tcm->tcm_handle = TC_H_MAKE(TC_H_INGRESS, 0);
> +	tcm->tcm_parent = TC_H_INGRESS;
> +	/* Ignore errors when qdisc is already absent. */
> +	if (mlx5_nl_flow_nl_ack(nl, nlh) &&
> +	    rte_errno != EINVAL && rte_errno != ENOENT)
> +		return rte_flow_error_set
> +			(error, rte_errno, RTE_FLOW_ERROR_TYPE_UNSPECIFIED,
> +			 NULL, "netlink: failed to remove ingress qdisc");
> +	/* Create fresh ingress qdisc. */
> +	nlh = mnl_nlmsg_put_header(buf);
> +	nlh->nlmsg_type = RTM_NEWQDISC;
> +	nlh->nlmsg_flags = NLM_F_REQUEST | NLM_F_CREATE | NLM_F_EXCL;
> +	tcm = mnl_nlmsg_put_extra_header(nlh, sizeof(*tcm));
> +	tcm->tcm_family = AF_UNSPEC;
> +	tcm->tcm_ifindex = ifindex;
> +	tcm->tcm_handle = TC_H_MAKE(TC_H_INGRESS, 0);
> +	tcm->tcm_parent = TC_H_INGRESS;
> +	mnl_attr_put_strz_check(nlh, sizeof(buf), TCA_KIND, "ingress");
> +	if (mlx5_nl_flow_nl_ack(nl, nlh))
> +		return rte_flow_error_set
> +			(error, rte_errno, RTE_FLOW_ERROR_TYPE_UNSPECIFIED,
> +			 NULL, "netlink: failed to create ingress qdisc");
> +	return 0;
> +}
> +
> +/**
> + * Create and configure a libmnl socket for Netlink flow rules.
> + *
> + * @return
> + *   A valid libmnl socket object pointer on success, NULL otherwise and
> + *   rte_errno is set.
> + */
> +struct mnl_socket *
> +mlx5_nl_flow_socket_create(void)
> +{
> +	struct mnl_socket *nl = mnl_socket_open(NETLINK_ROUTE);
> +
> +	if (nl &&
> +	    !mnl_socket_setsockopt(nl, NETLINK_CAP_ACK, &(int){ 1 },
> +				   sizeof(int)) &&
> +	    !mnl_socket_bind(nl, 0, MNL_SOCKET_AUTOPID))
> +		return nl;
> +	rte_errno = errno;
> +	if (nl)
> +		mnl_socket_close(nl);
> +	return NULL;
> +}
> +
> +/**
> + * Destroy a libmnl socket.
> + */
> +void
> +mlx5_nl_flow_socket_destroy(struct mnl_socket *nl)
> +{
> +	mnl_socket_close(nl);
> +}
> diff --git a/mk/rte.app.mk b/mk/rte.app.mk
> index 7bcf6308d..414f1b967 100644
> --- a/mk/rte.app.mk
> +++ b/mk/rte.app.mk
> @@ -145,7 +145,7 @@ endif
>  ifeq ($(CONFIG_RTE_LIBRTE_MLX5_DLOPEN_DEPS),y)
>  _LDLIBS-$(CONFIG_RTE_LIBRTE_MLX5_PMD)       += -lrte_pmd_mlx5 -ldl
>  else
> -_LDLIBS-$(CONFIG_RTE_LIBRTE_MLX5_PMD)       += -lrte_pmd_mlx5 -libverbs -lmlx5
> +_LDLIBS-$(CONFIG_RTE_LIBRTE_MLX5_PMD)       += -lrte_pmd_mlx5 -libverbs -lmlx5 -lmnl
>  endif
>  _LDLIBS-$(CONFIG_RTE_LIBRTE_MVPP2_PMD)      += -lrte_pmd_mvpp2 -L$(LIBMUSDK_PATH)/lib -lmusdk
>  _LDLIBS-$(CONFIG_RTE_LIBRTE_NFP_PMD)        += -lrte_pmd_nfp
> -- 
> 2.11.0

* Re: [dpdk-dev] [PATCH 2/6] net/mlx5: add framework for switch flow rules
  2018-06-27 18:08 ` [dpdk-dev] [PATCH 2/6] net/mlx5: add framework for switch flow rules Adrien Mazarguil
@ 2018-07-12  0:59   ` Yongseok Koh
  2018-07-12 10:46     ` Adrien Mazarguil
  0 siblings, 1 reply; 33+ messages in thread
From: Yongseok Koh @ 2018-07-12  0:59 UTC (permalink / raw)
  To: Adrien Mazarguil; +Cc: Shahaf Shuler, Nelio Laranjeiro, dev

On Wed, Jun 27, 2018 at 08:08:12PM +0200, Adrien Mazarguil wrote:
> Because mlx5 switch flow rules are configured through Netlink (TC
> interface) and have little in common with Verbs, this patch adds a separate
> parser function to handle them.
> 
> - mlx5_nl_flow_transpose() converts a rte_flow rule to its TC equivalent
>   and stores the result in a buffer.
> 
> - mlx5_nl_flow_brand() gives a unique handle to a flow rule buffer.
> 
> - mlx5_nl_flow_create() instantiates a flow rule on the device based on
>   such a buffer.
> 
> - mlx5_nl_flow_destroy() performs the reverse operation.
> 
> These functions are called by the existing implementation when encountering
> flow rules which must be offloaded to the switch (currently relying on the
> transfer attribute).
> 
> Signed-off-by: Adrien Mazarguil <adrien.mazarguil@6wind.com>
> Signed-off-by: Nelio Laranjeiro <nelio.laranjeiro@6wind.com>
> ---
>  drivers/net/mlx5/mlx5.h         |  18 +++
>  drivers/net/mlx5/mlx5_flow.c    | 113 ++++++++++++++
>  drivers/net/mlx5/mlx5_nl_flow.c | 295 +++++++++++++++++++++++++++++++++++
>  3 files changed, 426 insertions(+)
> 
> diff --git a/drivers/net/mlx5/mlx5.h b/drivers/net/mlx5/mlx5.h
> index 390249adb..aa16057d6 100644
> --- a/drivers/net/mlx5/mlx5.h
> +++ b/drivers/net/mlx5/mlx5.h
> @@ -148,6 +148,12 @@ struct mlx5_drop {
>  	struct mlx5_rxq_ibv *rxq; /* Verbs Rx queue. */
>  };
>  
> +/** DPDK port to network interface index (ifindex) conversion. */
> +struct mlx5_nl_flow_ptoi {
> +	uint16_t port_id; /**< DPDK port ID. */
> +	unsigned int ifindex; /**< Network interface index. */
> +};
> +
>  struct mnl_socket;
>  
>  struct priv {
> @@ -374,6 +380,18 @@ int mlx5_nl_allmulti(struct rte_eth_dev *dev, int enable);
>  
>  /* mlx5_nl_flow.c */
>  
> +int mlx5_nl_flow_transpose(void *buf,
> +			   size_t size,
> +			   const struct mlx5_nl_flow_ptoi *ptoi,
> +			   const struct rte_flow_attr *attr,
> +			   const struct rte_flow_item *pattern,
> +			   const struct rte_flow_action *actions,
> +			   struct rte_flow_error *error);
> +void mlx5_nl_flow_brand(void *buf, uint32_t handle);
> +int mlx5_nl_flow_create(struct mnl_socket *nl, void *buf,
> +			struct rte_flow_error *error);
> +int mlx5_nl_flow_destroy(struct mnl_socket *nl, void *buf,
> +			 struct rte_flow_error *error);
>  int mlx5_nl_flow_init(struct mnl_socket *nl, unsigned int ifindex,
>  		      struct rte_flow_error *error);
>  struct mnl_socket *mlx5_nl_flow_socket_create(void);
> diff --git a/drivers/net/mlx5/mlx5_flow.c b/drivers/net/mlx5/mlx5_flow.c
> index 9241855be..93b245991 100644
> --- a/drivers/net/mlx5/mlx5_flow.c
> +++ b/drivers/net/mlx5/mlx5_flow.c
> @@ -4,6 +4,7 @@
>   */
>  
>  #include <sys/queue.h>
> +#include <stdalign.h>
>  #include <stdint.h>
>  #include <string.h>
>  
> @@ -271,6 +272,7 @@ struct rte_flow {
>  	/**< Store tunnel packet type data to store in Rx queue. */
>  	uint8_t key[40]; /**< RSS hash key. */
>  	uint16_t (*queue)[]; /**< Destination queues to redirect traffic to. */
> +	void *nl_flow; /**< Netlink flow buffer if relevant. */
>  };
>  
>  static const struct rte_flow_ops mlx5_flow_ops = {
> @@ -2403,6 +2405,106 @@ mlx5_flow_actions(struct rte_eth_dev *dev,
>  }
>  
>  /**
> + * Validate flow rule and fill flow structure accordingly.
> + *
> + * @param dev
> + *   Pointer to Ethernet device.
> + * @param[out] flow
> + *   Pointer to flow structure.
> + * @param flow_size
> + *   Size of allocated space for @p flow.
> + * @param[in] attr
> + *   Flow rule attributes.
> + * @param[in] pattern
> + *   Pattern specification (list terminated by the END pattern item).
> + * @param[in] actions
> + *   Associated actions (list terminated by the END action).
> + * @param[out] error
> + *   Perform verbose error reporting if not NULL.
> + *
> + * @return
> + *   A positive value representing the size of the flow object in bytes
> + *   regardless of @p flow_size on success, a negative errno value otherwise
> + *   and rte_errno is set.
> + */
> +static int
> +mlx5_flow_merge_switch(struct rte_eth_dev *dev,
> +		       struct rte_flow *flow,
> +		       size_t flow_size,
> +		       const struct rte_flow_attr *attr,
> +		       const struct rte_flow_item pattern[],
> +		       const struct rte_flow_action actions[],
> +		       struct rte_flow_error *error)
> +{
> +	struct priv *priv = dev->data->dev_private;
> +	unsigned int n = mlx5_domain_to_port_id(priv->domain_id, NULL, 0);
> +	uint16_t port_list[!n + n];
> +	struct mlx5_nl_flow_ptoi ptoi[!n + n + 1];
> +	size_t off = RTE_ALIGN_CEIL(sizeof(*flow), alignof(max_align_t));
> +	unsigned int i;
> +	unsigned int own = 0;
> +	int ret;
> +
> +	/* At least one port is needed when no switch domain is present. */
> +	if (!n) {
> +		n = 1;
> +		port_list[0] = dev->data->port_id;
> +	} else {
> +		n = mlx5_domain_to_port_id(priv->domain_id, port_list, n);
> +		if (n > RTE_DIM(port_list))
> +			n = RTE_DIM(port_list);
> +	}
> +	for (i = 0; i != n; ++i) {
> +		struct rte_eth_dev_info dev_info;
> +
> +		rte_eth_dev_info_get(port_list[i], &dev_info);
> +		if (port_list[i] == dev->data->port_id)
> +			own = i;
> +		ptoi[i].port_id = port_list[i];
> +		ptoi[i].ifindex = dev_info.if_index;
> +	}
> +	/* Ensure first entry of ptoi[] is the current device. */
> +	if (own) {
> +		ptoi[n] = ptoi[0];
> +		ptoi[0] = ptoi[own];
> +		ptoi[own] = ptoi[n];
> +	}
> +	/* An entry with zero ifindex terminates ptoi[]. */
> +	ptoi[n].port_id = 0;
> +	ptoi[n].ifindex = 0;
> +	if (flow_size < off)
> +		flow_size = 0;
> +	ret = mlx5_nl_flow_transpose((uint8_t *)flow + off,
> +				     flow_size ? flow_size - off : 0,
> +				     ptoi, attr, pattern, actions, error);
> +	if (ret < 0)
> +		return ret;

So there's an assumption that the buffer allocated outside of this API is large
enough to hold all the messages built by mlx5_nl_flow_transpose(), right? If
flow_size isn't enough, buf_tmp is used instead and _transpose() returns the
required size rather than an error. That sounds confusing; it may need a change
or at least clearer documentation.

> +	if (flow_size) {
> +		*flow = (struct rte_flow){
> +			.attributes = *attr,
> +			.nl_flow = (uint8_t *)flow + off,
> +		};
> +		/*
> +		 * Generate a reasonably unique handle based on the address
> +		 * of the target buffer.
> +		 *
> +		 * This is straightforward on 32-bit systems where the flow
> +		 * pointer can be used directly. Otherwise, its least
> +		 * significant part is taken after shifting it by the
> +		 * previous power of two of the pointed buffer size.
> +		 */
> +		if (sizeof(flow) <= 4)
> +			mlx5_nl_flow_brand(flow->nl_flow, (uintptr_t)flow);
> +		else
> +			mlx5_nl_flow_brand
> +				(flow->nl_flow,
> +				 (uintptr_t)flow >>
> +				 rte_log2_u32(rte_align32prevpow2(flow_size)));
> +	}
> +	return off + ret;
> +}
> +
> +/**
>   * Validate the rule and return a flow structure filled accordingly.
>   *
>   * @param dev
> @@ -2439,6 +2541,9 @@ mlx5_flow_merge(struct rte_eth_dev *dev, struct rte_flow *flow,
>  	int ret;
>  	uint32_t i;
>  
> +	if (attr->transfer)
> +		return mlx5_flow_merge_switch(dev, flow, flow_size,
> +					      attr, items, actions, error);
>  	if (!remain)
>  		flow = &local_flow;
>  	ret = mlx5_flow_attributes(dev, attr, flow, error);
> @@ -2554,8 +2659,11 @@ mlx5_flow_validate(struct rte_eth_dev *dev,
>  static void
>  mlx5_flow_fate_remove(struct rte_eth_dev *dev, struct rte_flow *flow)
>  {
> +	struct priv *priv = dev->data->dev_private;
>  	struct mlx5_flow_verbs *verbs;
>  
> +	if (flow->nl_flow && priv->mnl_socket)
> +		mlx5_nl_flow_destroy(priv->mnl_socket, flow->nl_flow, NULL);
>  	LIST_FOREACH(verbs, &flow->verbs, next) {
>  		if (verbs->flow) {
>  			claim_zero(mlx5_glue->destroy_flow(verbs->flow));
> @@ -2592,6 +2700,7 @@ static int
>  mlx5_flow_fate_apply(struct rte_eth_dev *dev, struct rte_flow *flow,
>  		     struct rte_flow_error *error)
>  {
> +	struct priv *priv = dev->data->dev_private;
>  	struct mlx5_flow_verbs *verbs;
>  	int err;
>  
> @@ -2640,6 +2749,10 @@ mlx5_flow_fate_apply(struct rte_eth_dev *dev, struct rte_flow *flow,
>  			goto error;
>  		}
>  	}
> +	if (flow->nl_flow &&
> +	    priv->mnl_socket &&
> +	    mlx5_nl_flow_create(priv->mnl_socket, flow->nl_flow, error))
> +		goto error;
>  	return 0;
>  error:
>  	err = rte_errno; /* Save rte_errno before cleanup. */
> diff --git a/drivers/net/mlx5/mlx5_nl_flow.c b/drivers/net/mlx5/mlx5_nl_flow.c
> index 7a8683b03..1fc62fb0a 100644
> --- a/drivers/net/mlx5/mlx5_nl_flow.c
> +++ b/drivers/net/mlx5/mlx5_nl_flow.c
> @@ -5,7 +5,9 @@
>  
>  #include <errno.h>
>  #include <libmnl/libmnl.h>
> +#include <linux/if_ether.h>
>  #include <linux/netlink.h>
> +#include <linux/pkt_cls.h>
>  #include <linux/pkt_sched.h>
>  #include <linux/rtnetlink.h>
>  #include <stdalign.h>
> @@ -14,11 +16,248 @@
>  #include <stdlib.h>
>  #include <sys/socket.h>
>  
> +#include <rte_byteorder.h>
>  #include <rte_errno.h>
>  #include <rte_flow.h>
>  
>  #include "mlx5.h"
>  
> +/** Parser state definitions for mlx5_nl_flow_trans[]. */
> +enum mlx5_nl_flow_trans {
> +	INVALID,
> +	BACK,
> +	ATTR,
> +	PATTERN,
> +	ITEM_VOID,
> +	ACTIONS,
> +	ACTION_VOID,
> +	END,
> +};
> +
> +#define TRANS(...) (const enum mlx5_nl_flow_trans []){ __VA_ARGS__, INVALID, }
> +
> +#define PATTERN_COMMON \
> +	ITEM_VOID, ACTIONS
> +#define ACTIONS_COMMON \
> +	ACTION_VOID, END
> +
> +/** Parser state transitions used by mlx5_nl_flow_transpose(). */
> +static const enum mlx5_nl_flow_trans *const mlx5_nl_flow_trans[] = {
> +	[INVALID] = NULL,
> +	[BACK] = NULL,
> +	[ATTR] = TRANS(PATTERN),
> +	[PATTERN] = TRANS(PATTERN_COMMON),
> +	[ITEM_VOID] = TRANS(BACK),
> +	[ACTIONS] = TRANS(ACTIONS_COMMON),
> +	[ACTION_VOID] = TRANS(BACK),
> +	[END] = NULL,
> +};
> +
> +/**
> + * Transpose flow rule description to rtnetlink message.
> + *
> + * This function transposes a flow rule description to a traffic control
> + * (TC) filter creation message ready to be sent over Netlink.
> + *
> + * Target interface is specified as the first entry of the @p ptoi table.
> + * Subsequent entries enable this function to resolve other DPDK port IDs
> + * found in the flow rule.
> + *
> + * @param[out] buf
> + *   Output message buffer. May be NULL when @p size is 0.
> + * @param size
> + *   Size of @p buf. Message may be truncated if not large enough.
> + * @param[in] ptoi
> + *   DPDK port ID to network interface index translation table. This table
> + *   is terminated by an entry with a zero ifindex value.
> + * @param[in] attr
> + *   Flow rule attributes.
> + * @param[in] pattern
> + *   Pattern specification.
> + * @param[in] actions
> + *   Associated actions.
> + * @param[out] error
> + *   Perform verbose error reporting if not NULL.
> + *
> + * @return
> + *   A positive value representing the exact size of the message in bytes
> + *   regardless of the @p size parameter on success, a negative errno value
> + *   otherwise and rte_errno is set.
> + */
> +int
> +mlx5_nl_flow_transpose(void *buf,
> +		       size_t size,
> +		       const struct mlx5_nl_flow_ptoi *ptoi,
> +		       const struct rte_flow_attr *attr,
> +		       const struct rte_flow_item *pattern,
> +		       const struct rte_flow_action *actions,
> +		       struct rte_flow_error *error)
> +{
> +	alignas(struct nlmsghdr)
> +	uint8_t buf_tmp[MNL_SOCKET_BUFFER_SIZE];
> +	const struct rte_flow_item *item;
> +	const struct rte_flow_action *action;
> +	unsigned int n;
> +	struct nlattr *na_flower;
> +	struct nlattr *na_flower_act;
> +	const enum mlx5_nl_flow_trans *trans;
> +	const enum mlx5_nl_flow_trans *back;
> +
> +	if (!size)
> +		goto error_nobufs;
> +init:
> +	item = pattern;
> +	action = actions;
> +	n = 0;
> +	na_flower = NULL;
> +	na_flower_act = NULL;
> +	trans = TRANS(ATTR);
> +	back = trans;
> +trans:
> +	switch (trans[n++]) {
> +		struct nlmsghdr *nlh;
> +		struct tcmsg *tcm;
> +
> +	case INVALID:
> +		if (item->type)
> +			return rte_flow_error_set
> +				(error, ENOTSUP, RTE_FLOW_ERROR_TYPE_ITEM,
> +				 item, "unsupported pattern item combination");
> +		else if (action->type)
> +			return rte_flow_error_set
> +				(error, ENOTSUP, RTE_FLOW_ERROR_TYPE_ACTION,
> +				 action, "unsupported action combination");
> +		return rte_flow_error_set
> +			(error, ENOTSUP, RTE_FLOW_ERROR_TYPE_UNSPECIFIED, NULL,
> +			 "flow rule lacks some kind of fate action");
> +	case BACK:
> +		trans = back;
> +		n = 0;
> +		goto trans;
> +	case ATTR:
> +		/*
> +		 * Supported attributes: no groups, some priorities and
> +		 * ingress only. Don't care about transfer as it is the
> +		 * caller's problem.
> +		 */
> +		if (attr->group)
> +			return rte_flow_error_set
> +				(error, ENOTSUP,
> +				 RTE_FLOW_ERROR_TYPE_ATTR_GROUP,
> +				 attr, "groups are not supported");
> +		if (attr->priority > 0xfffe)
> +			return rte_flow_error_set
> +				(error, ENOTSUP,
> +				 RTE_FLOW_ERROR_TYPE_ATTR_PRIORITY,
> +				 attr, "lowest priority level is 0xfffe");
> +		if (!attr->ingress)
> +			return rte_flow_error_set
> +				(error, ENOTSUP,
> +				 RTE_FLOW_ERROR_TYPE_ATTR_INGRESS,
> +				 attr, "only ingress is supported");
> +		if (attr->egress)
> +			return rte_flow_error_set
> +				(error, ENOTSUP,
> +				 RTE_FLOW_ERROR_TYPE_ATTR_INGRESS,
> +				 attr, "egress is not supported");
> +		if (size < mnl_nlmsg_size(sizeof(*tcm)))
> +			goto error_nobufs;
> +		nlh = mnl_nlmsg_put_header(buf);
> +		nlh->nlmsg_type = 0;
> +		nlh->nlmsg_flags = 0;
> +		nlh->nlmsg_seq = 0;
> +		tcm = mnl_nlmsg_put_extra_header(nlh, sizeof(*tcm));
> +		tcm->tcm_family = AF_UNSPEC;
> +		tcm->tcm_ifindex = ptoi[0].ifindex;
> +		/*
> +		 * Let kernel pick a handle by default. A predictable handle
> +		 * can be set by the caller on the resulting buffer through
> +		 * mlx5_nl_flow_brand().
> +		 */
> +		tcm->tcm_handle = 0;
> +		tcm->tcm_parent = TC_H_MAKE(TC_H_INGRESS, TC_H_MIN_INGRESS);
> +		/*
> +		 * Priority cannot be zero to prevent the kernel from
> +		 * picking one automatically.
> +		 */
> +		tcm->tcm_info = TC_H_MAKE((attr->priority + 1) << 16,
> +					  RTE_BE16(ETH_P_ALL));
> +		break;
> +	case PATTERN:
> +		if (!mnl_attr_put_strz_check(buf, size, TCA_KIND, "flower"))
> +			goto error_nobufs;
> +		na_flower = mnl_attr_nest_start_check(buf, size, TCA_OPTIONS);
> +		if (!na_flower)
> +			goto error_nobufs;
> +		if (!mnl_attr_put_u32_check(buf, size, TCA_FLOWER_FLAGS,
> +					    TCA_CLS_FLAGS_SKIP_SW))
> +			goto error_nobufs;
> +		break;
> +	case ITEM_VOID:
> +		if (item->type != RTE_FLOW_ITEM_TYPE_VOID)
> +			goto trans;
> +		++item;
> +		break;
> +	case ACTIONS:
> +		if (item->type != RTE_FLOW_ITEM_TYPE_END)
> +			goto trans;
> +		assert(na_flower);
> +		assert(!na_flower_act);
> +		na_flower_act =
> +			mnl_attr_nest_start_check(buf, size, TCA_FLOWER_ACT);
> +		if (!na_flower_act)
> +			goto error_nobufs;
> +		break;
> +	case ACTION_VOID:
> +		if (action->type != RTE_FLOW_ACTION_TYPE_VOID)
> +			goto trans;
> +		++action;
> +		break;
> +	case END:
> +		if (item->type != RTE_FLOW_ITEM_TYPE_END ||
> +		    action->type != RTE_FLOW_ACTION_TYPE_END)
> +			goto trans;
> +		if (na_flower_act)
> +			mnl_attr_nest_end(buf, na_flower_act);
> +		if (na_flower)
> +			mnl_attr_nest_end(buf, na_flower);
> +		nlh = buf;
> +		return nlh->nlmsg_len;
> +	}
> +	back = trans;
> +	trans = mlx5_nl_flow_trans[trans[n - 1]];
> +	n = 0;
> +	goto trans;
> +error_nobufs:
> +	if (buf != buf_tmp) {
> +		buf = buf_tmp;
> +		size = sizeof(buf_tmp);
> +		goto init;
> +	}

Continuing my comment above.
This part is unclear. It looks to me like this function does the following:

1) if size is zero, treat it as a probing call to learn the amount of memory
required.
2) if size isn't zero but not large enough, stop writing to buf and start over
in order to return the amount of memory required instead of returning an error.
3) if size isn't zero and is large enough, fill in buf.

Do I understand correctly?

Thanks,
Yongseok

> +	return rte_flow_error_set
> +		(error, ENOBUFS, RTE_FLOW_ERROR_TYPE_UNSPECIFIED, NULL,
> +		 "generated TC message is too large");
> +}
> +
> +/**
> + * Brand rtnetlink buffer with unique handle.
> + *
> + * This handle should be unique for a given network interface to avoid
> + * collisions.
> + *
> + * @param buf
> + *   Flow rule buffer previously initialized by mlx5_nl_flow_transpose().
> + * @param handle
> + *   Unique 32-bit handle to use.
> + */
> +void
> +mlx5_nl_flow_brand(void *buf, uint32_t handle)
> +{
> +	struct tcmsg *tcm = mnl_nlmsg_get_payload(buf);
> +
> +	tcm->tcm_handle = handle;
> +}
> +
>  /**
>   * Send Netlink message with acknowledgment.
>   *
> @@ -54,6 +293,62 @@ mlx5_nl_flow_nl_ack(struct mnl_socket *nl, struct nlmsghdr *nlh)
>  }
>  
>  /**
> + * Create a Netlink flow rule.
> + *
> + * @param nl
> + *   Libmnl socket to use.
> + * @param buf
> + *   Flow rule buffer previously initialized by mlx5_nl_flow_transpose().
> + * @param[out] error
> + *   Perform verbose error reporting if not NULL.
> + *
> + * @return
> + *   0 on success, a negative errno value otherwise and rte_errno is set.
> + */
> +int
> +mlx5_nl_flow_create(struct mnl_socket *nl, void *buf,
> +		    struct rte_flow_error *error)
> +{
> +	struct nlmsghdr *nlh = buf;
> +
> +	nlh->nlmsg_type = RTM_NEWTFILTER;
> +	nlh->nlmsg_flags = NLM_F_REQUEST | NLM_F_CREATE | NLM_F_EXCL;
> +	if (!mlx5_nl_flow_nl_ack(nl, nlh))
> +		return 0;
> +	return rte_flow_error_set
> +		(error, rte_errno, RTE_FLOW_ERROR_TYPE_UNSPECIFIED, NULL,
> +		 "netlink: failed to create TC flow rule");
> +}
> +
> +/**
> + * Destroy a Netlink flow rule.
> + *
> + * @param nl
> + *   Libmnl socket to use.
> + * @param buf
> + *   Flow rule buffer previously initialized by mlx5_nl_flow_transpose().
> + * @param[out] error
> + *   Perform verbose error reporting if not NULL.
> + *
> + * @return
> + *   0 on success, a negative errno value otherwise and rte_errno is set.
> + */
> +int
> +mlx5_nl_flow_destroy(struct mnl_socket *nl, void *buf,
> +		     struct rte_flow_error *error)
> +{
> +	struct nlmsghdr *nlh = buf;
> +
> +	nlh->nlmsg_type = RTM_DELTFILTER;
> +	nlh->nlmsg_flags = NLM_F_REQUEST;
> +	if (!mlx5_nl_flow_nl_ack(nl, nlh))
> +		return 0;
> +	return rte_flow_error_set
> +		(error, errno, RTE_FLOW_ERROR_TYPE_UNSPECIFIED, NULL,
> +		 "netlink: failed to destroy TC flow rule");
> +}
> +
> +/**
>   * Initialize ingress qdisc of a given network interface.
>   *
>   * @param nl
> -- 
> 2.11.0

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [dpdk-dev] [PATCH 3/6] net/mlx5: add fate actions to switch flow rules
  2018-06-27 18:08 ` [dpdk-dev] [PATCH 3/6] net/mlx5: add fate actions to " Adrien Mazarguil
@ 2018-07-12  1:00   ` Yongseok Koh
  0 siblings, 0 replies; 33+ messages in thread
From: Yongseok Koh @ 2018-07-12  1:00 UTC (permalink / raw)
  To: Adrien Mazarguil; +Cc: Shahaf Shuler, Nelio Laranjeiro, dev

On Wed, Jun 27, 2018 at 08:08:14PM +0200, Adrien Mazarguil wrote:
> This patch enables creation of rte_flow rules that direct matching traffic
> to a different port (e.g. another VF representor) or drop it directly at
> the switch level (PORT_ID and DROP actions).
> 
> Testpmd examples:
> 
> - Directing all traffic to port ID 0:
> 
>   flow create 1 ingress transfer pattern end actions port_id id 0 / end
> 
> - Dropping all traffic normally received by port ID 1:
> 
>   flow create 1 ingress transfer pattern end actions drop / end
> 
> Note the presence of the transfer attribute, which requests them to be
> applied at the switch level. All traffic is matched due to empty pattern.
> 
> Signed-off-by: Adrien Mazarguil <adrien.mazarguil@6wind.com>
> ---
Acked-by: Yongseok Koh <yskoh@mellanox.com>

Thanks


* Re: [dpdk-dev] [PATCH 4/6] net/mlx5: add L2-L4 pattern items to switch flow rules
  2018-06-27 18:08 ` [dpdk-dev] [PATCH 4/6] net/mlx5: add L2-L4 pattern items " Adrien Mazarguil
@ 2018-07-12  1:02   ` Yongseok Koh
  0 siblings, 0 replies; 33+ messages in thread
From: Yongseok Koh @ 2018-07-12  1:02 UTC (permalink / raw)
  To: Adrien Mazarguil; +Cc: Shahaf Shuler, Nelio Laranjeiro, dev

On Wed, Jun 27, 2018 at 08:08:16PM +0200, Adrien Mazarguil wrote:
> This enables flow rules to explicitly match supported combinations of
> Ethernet, IPv4, IPv6, TCP and UDP headers at the switch level.
> 
> Testpmd example:
> 
> - Dropping TCPv4 traffic with a specific destination on port ID 2:
> 
>   flow create 2 ingress transfer pattern eth / ipv4 / tcp dst is 42 / end
>      actions drop / end
> 
> Signed-off-by: Adrien Mazarguil <adrien.mazarguil@6wind.com>
> ---
Acked-by: Yongseok Koh <yskoh@mellanox.com>

Thanks


* Re: [dpdk-dev] [PATCH 5/6] net/mlx5: add VLAN item and actions to switch flow rules
  2018-06-27 18:08 ` [dpdk-dev] [PATCH 5/6] net/mlx5: add VLAN item and actions " Adrien Mazarguil
@ 2018-07-12  1:10   ` Yongseok Koh
  2018-07-12 10:47     ` Adrien Mazarguil
  0 siblings, 1 reply; 33+ messages in thread
From: Yongseok Koh @ 2018-07-12  1:10 UTC (permalink / raw)
  To: Adrien Mazarguil; +Cc: Shahaf Shuler, Nelio Laranjeiro, dev

On Wed, Jun 27, 2018 at 08:08:18PM +0200, Adrien Mazarguil wrote:
> This enables flow rules to explicitly match VLAN traffic (VLAN pattern
> item) and perform various operations on VLAN headers at the switch level
> (OF_POP_VLAN, OF_PUSH_VLAN, OF_SET_VLAN_VID and OF_SET_VLAN_PCP actions).
> 
> Testpmd examples:
> 
> - Directing all VLAN traffic received on port ID 1 to port ID 0:
> 
>   flow create 1 ingress transfer pattern eth / vlan / end actions
>      port_id id 0 / end
> 
> - Adding a VLAN header to IPv6 traffic received on port ID 1 and directing
>   it to port ID 0:
> 
>   flow create 1 ingress transfer pattern eth / ipv6 / end actions
>      of_push_vlan ethertype 0x8100 / of_set_vlan_vid / port_id id 0 / end
> 
> Signed-off-by: Adrien Mazarguil <adrien.mazarguil@6wind.com>
> ---
>  drivers/net/mlx5/mlx5_nl_flow.c | 177 ++++++++++++++++++++++++++++++++++-
>  1 file changed, 173 insertions(+), 4 deletions(-)
> 
> diff --git a/drivers/net/mlx5/mlx5_nl_flow.c b/drivers/net/mlx5/mlx5_nl_flow.c
> index ad1e001c6..a45d94fae 100644
> --- a/drivers/net/mlx5/mlx5_nl_flow.c
> +++ b/drivers/net/mlx5/mlx5_nl_flow.c
> @@ -13,6 +13,7 @@
>  #include <linux/rtnetlink.h>
>  #include <linux/tc_act/tc_gact.h>
>  #include <linux/tc_act/tc_mirred.h>
> +#include <linux/tc_act/tc_vlan.h>
>  #include <netinet/in.h>
>  #include <stdalign.h>
>  #include <stdbool.h>
> @@ -36,6 +37,7 @@ enum mlx5_nl_flow_trans {
>  	PATTERN,
>  	ITEM_VOID,
>  	ITEM_ETH,
> +	ITEM_VLAN,
>  	ITEM_IPV4,
>  	ITEM_IPV6,
>  	ITEM_TCP,
> @@ -44,6 +46,10 @@ enum mlx5_nl_flow_trans {
>  	ACTION_VOID,
>  	ACTION_PORT_ID,
>  	ACTION_DROP,
> +	ACTION_OF_POP_VLAN,
> +	ACTION_OF_PUSH_VLAN,
> +	ACTION_OF_SET_VLAN_VID,
> +	ACTION_OF_SET_VLAN_PCP,
>  	END,
>  };
>  
> @@ -52,7 +58,8 @@ enum mlx5_nl_flow_trans {
>  #define PATTERN_COMMON \
>  	ITEM_VOID, ACTIONS
>  #define ACTIONS_COMMON \
> -	ACTION_VOID
> +	ACTION_VOID, ACTION_OF_POP_VLAN, ACTION_OF_PUSH_VLAN, \
> +	ACTION_OF_SET_VLAN_VID, ACTION_OF_SET_VLAN_PCP
>  #define ACTIONS_FATE \
>  	ACTION_PORT_ID, ACTION_DROP
>  
> @@ -63,7 +70,8 @@ static const enum mlx5_nl_flow_trans *const mlx5_nl_flow_trans[] = {
>  	[ATTR] = TRANS(PATTERN),
>  	[PATTERN] = TRANS(ITEM_ETH, PATTERN_COMMON),
>  	[ITEM_VOID] = TRANS(BACK),
> -	[ITEM_ETH] = TRANS(ITEM_IPV4, ITEM_IPV6, PATTERN_COMMON),
> +	[ITEM_ETH] = TRANS(ITEM_IPV4, ITEM_IPV6, ITEM_VLAN, PATTERN_COMMON),
> +	[ITEM_VLAN] = TRANS(ITEM_IPV4, ITEM_IPV6, PATTERN_COMMON),
>  	[ITEM_IPV4] = TRANS(ITEM_TCP, ITEM_UDP, PATTERN_COMMON),
>  	[ITEM_IPV6] = TRANS(ITEM_TCP, ITEM_UDP, PATTERN_COMMON),
>  	[ITEM_TCP] = TRANS(PATTERN_COMMON),
> @@ -72,12 +80,17 @@ static const enum mlx5_nl_flow_trans *const mlx5_nl_flow_trans[] = {
>  	[ACTION_VOID] = TRANS(BACK),
>  	[ACTION_PORT_ID] = TRANS(ACTION_VOID, END),
>  	[ACTION_DROP] = TRANS(ACTION_VOID, END),
> +	[ACTION_OF_POP_VLAN] = TRANS(ACTIONS_FATE, ACTIONS_COMMON),
> +	[ACTION_OF_PUSH_VLAN] = TRANS(ACTIONS_FATE, ACTIONS_COMMON),
> +	[ACTION_OF_SET_VLAN_VID] = TRANS(ACTIONS_FATE, ACTIONS_COMMON),
> +	[ACTION_OF_SET_VLAN_PCP] = TRANS(ACTIONS_FATE, ACTIONS_COMMON),
>  	[END] = NULL,
>  };
>  
>  /** Empty masks for known item types. */
>  static const union {
>  	struct rte_flow_item_eth eth;
> +	struct rte_flow_item_vlan vlan;
>  	struct rte_flow_item_ipv4 ipv4;
>  	struct rte_flow_item_ipv6 ipv6;
>  	struct rte_flow_item_tcp tcp;
> @@ -87,6 +100,7 @@ static const union {
>  /** Supported masks for known item types. */
>  static const struct {
>  	struct rte_flow_item_eth eth;
> +	struct rte_flow_item_vlan vlan;
>  	struct rte_flow_item_ipv4 ipv4;
>  	struct rte_flow_item_ipv6 ipv6;
>  	struct rte_flow_item_tcp tcp;
> @@ -97,6 +111,11 @@ static const struct {
>  		.dst.addr_bytes = "\xff\xff\xff\xff\xff\xff",
>  		.src.addr_bytes = "\xff\xff\xff\xff\xff\xff",
>  	},
> +	.vlan = {
> +		/* PCP and VID only, no DEI. */
> +		.tci = RTE_BE16(0xefff),
> +		.inner_type = RTE_BE16(0xffff),
> +	},
>  	.ipv4.hdr = {
>  		.next_proto_id = 0xff,
>  		.src_addr = RTE_BE32(0xffffffff),
> @@ -242,9 +261,13 @@ mlx5_nl_flow_transpose(void *buf,
>  	unsigned int n;
>  	uint32_t act_index_cur;
>  	bool eth_type_set;
> +	bool vlan_present;
> +	bool vlan_eth_type_set;
>  	bool ip_proto_set;
>  	struct nlattr *na_flower;
>  	struct nlattr *na_flower_act;
> +	struct nlattr *na_vlan_id;
> +	struct nlattr *na_vlan_priority;
>  	const enum mlx5_nl_flow_trans *trans;
>  	const enum mlx5_nl_flow_trans *back;
>  
> @@ -256,15 +279,20 @@ mlx5_nl_flow_transpose(void *buf,
>  	n = 0;
>  	act_index_cur = 0;
>  	eth_type_set = false;
> +	vlan_present = false;
> +	vlan_eth_type_set = false;
>  	ip_proto_set = false;
>  	na_flower = NULL;
>  	na_flower_act = NULL;
> +	na_vlan_id = NULL;
> +	na_vlan_priority = NULL;
>  	trans = TRANS(ATTR);
>  	back = trans;
>  trans:
>  	switch (trans[n++]) {
>  		union {
>  			const struct rte_flow_item_eth *eth;
> +			const struct rte_flow_item_vlan *vlan;
>  			const struct rte_flow_item_ipv4 *ipv4;
>  			const struct rte_flow_item_ipv6 *ipv6;
>  			const struct rte_flow_item_tcp *tcp;
> @@ -272,6 +300,11 @@ mlx5_nl_flow_transpose(void *buf,
>  		} spec, mask;
>  		union {
>  			const struct rte_flow_action_port_id *port_id;
> +			const struct rte_flow_action_of_push_vlan *of_push_vlan;
> +			const struct rte_flow_action_of_set_vlan_vid *
> +				of_set_vlan_vid;
> +			const struct rte_flow_action_of_set_vlan_pcp *
> +				of_set_vlan_pcp;
>  		} conf;
>  		struct nlmsghdr *nlh;
>  		struct tcmsg *tcm;
> @@ -408,6 +441,58 @@ mlx5_nl_flow_transpose(void *buf,
>  			goto error_nobufs;
>  		++item;
>  		break;
> +	case ITEM_VLAN:
> +		if (item->type != RTE_FLOW_ITEM_TYPE_VLAN)
> +			goto trans;
> +		mask.vlan = mlx5_nl_flow_item_mask
> +			(item, &rte_flow_item_vlan_mask,
> +			 &mlx5_nl_flow_mask_supported.vlan,
> +			 &mlx5_nl_flow_mask_empty.vlan,
> +			 sizeof(mlx5_nl_flow_mask_supported.vlan), error);
> +		if (!mask.vlan)
> +			return -rte_errno;
> +		if (!eth_type_set &&
> +		    !mnl_attr_put_u16_check(buf, size,
> +					    TCA_FLOWER_KEY_ETH_TYPE,
> +					    RTE_BE16(ETH_P_8021Q)))
> +			goto error_nobufs;
> +		eth_type_set = 1;
> +		vlan_present = 1;
> +		if (mask.vlan == &mlx5_nl_flow_mask_empty.vlan) {
> +			++item;
> +			break;
> +		}
> +		spec.vlan = item->spec;
> +		if ((mask.vlan->tci & RTE_BE16(0xe000) &&
> +		     (mask.vlan->tci & RTE_BE16(0xe000)) != RTE_BE16(0xe000)) ||
> +		    (mask.vlan->tci & RTE_BE16(0x0fff) &&
> +		     (mask.vlan->tci & RTE_BE16(0x0fff)) != RTE_BE16(0x0fff)) ||
> +		    (mask.vlan->inner_type &&
> +		     mask.vlan->inner_type != RTE_BE16(0xffff)))
> +			return rte_flow_error_set
> +				(error, ENOTSUP, RTE_FLOW_ERROR_TYPE_ITEM_MASK,
> +				 mask.vlan,
> +				 "no support for partial masks on"
> +				 " \"tci\" (PCP and VID parts) and"
> +				 " \"inner_type\" fields");
> +		if (mask.vlan->inner_type) {
> +			if (!mnl_attr_put_u16_check
> +			    (buf, size, TCA_FLOWER_KEY_VLAN_ETH_TYPE,
> +			     spec.vlan->inner_type))
> +				goto error_nobufs;
> +			vlan_eth_type_set = 1;
> +		}
> +		if ((mask.vlan->tci & RTE_BE16(0xe000) &&
> +		     !mnl_attr_put_u8_check
> +		     (buf, size, TCA_FLOWER_KEY_VLAN_PRIO,
> +		      (rte_be_to_cpu_16(spec.vlan->tci) >> 13) & 0x7)) ||
> +		    (mask.vlan->tci & RTE_BE16(0x0fff) &&
> +		     !mnl_attr_put_u16_check
> +		     (buf, size, TCA_FLOWER_KEY_VLAN_ID,
> +		      spec.vlan->tci & RTE_BE16(0x0fff))))
> +			goto error_nobufs;
> +		++item;
> +		break;
>  	case ITEM_IPV4:
>  		if (item->type != RTE_FLOW_ITEM_TYPE_IPV4)
>  			goto trans;
> @@ -418,12 +503,15 @@ mlx5_nl_flow_transpose(void *buf,
>  			 sizeof(mlx5_nl_flow_mask_supported.ipv4), error);
>  		if (!mask.ipv4)
>  			return -rte_errno;
> -		if (!eth_type_set &&
> +		if ((!eth_type_set || !vlan_eth_type_set) &&
>  		    !mnl_attr_put_u16_check(buf, size,
> +					    vlan_present ?
> +					    TCA_FLOWER_KEY_VLAN_ETH_TYPE :
>  					    TCA_FLOWER_KEY_ETH_TYPE,
>  					    RTE_BE16(ETH_P_IP)))
>  			goto error_nobufs;
>  		eth_type_set = 1;
> +		vlan_eth_type_set = 1;
>  		if (mask.ipv4 == &mlx5_nl_flow_mask_empty.ipv4) {
>  			++item;
>  			break;
> @@ -470,12 +558,15 @@ mlx5_nl_flow_transpose(void *buf,
>  			 sizeof(mlx5_nl_flow_mask_supported.ipv6), error);
>  		if (!mask.ipv6)
>  			return -rte_errno;
> -		if (!eth_type_set &&
> +		if ((!eth_type_set || !vlan_eth_type_set) &&
>  		    !mnl_attr_put_u16_check(buf, size,
> +					    vlan_present ?
> +					    TCA_FLOWER_KEY_VLAN_ETH_TYPE :
>  					    TCA_FLOWER_KEY_ETH_TYPE,
>  					    RTE_BE16(ETH_P_IPV6)))
>  			goto error_nobufs;
>  		eth_type_set = 1;
> +		vlan_eth_type_set = 1;
>  		if (mask.ipv6 == &mlx5_nl_flow_mask_empty.ipv6) {
>  			++item;
>  			break;
> @@ -681,6 +772,84 @@ mlx5_nl_flow_transpose(void *buf,
>  		mnl_attr_nest_end(buf, act_index);
>  		++action;
>  		break;
> +	case ACTION_OF_POP_VLAN:
> +		if (action->type != RTE_FLOW_ACTION_TYPE_OF_POP_VLAN)
> +			goto trans;
> +		conf.of_push_vlan = NULL;
> +		i = TCA_VLAN_ACT_POP;
> +		goto action_of_vlan;
> +	case ACTION_OF_PUSH_VLAN:
> +		if (action->type != RTE_FLOW_ACTION_TYPE_OF_PUSH_VLAN)
> +			goto trans;
> +		conf.of_push_vlan = action->conf;
> +		i = TCA_VLAN_ACT_PUSH;
> +		goto action_of_vlan;
> +	case ACTION_OF_SET_VLAN_VID:
> +		if (action->type != RTE_FLOW_ACTION_TYPE_OF_SET_VLAN_VID)
> +			goto trans;
> +		conf.of_set_vlan_vid = action->conf;
> +		if (na_vlan_id)
> +			goto override_na_vlan_id;
> +		i = TCA_VLAN_ACT_MODIFY;
> +		goto action_of_vlan;
> +	case ACTION_OF_SET_VLAN_PCP:
> +		if (action->type != RTE_FLOW_ACTION_TYPE_OF_SET_VLAN_PCP)
> +			goto trans;
> +		conf.of_set_vlan_pcp = action->conf;
> +		if (na_vlan_priority)
> +			goto override_na_vlan_priority;
> +		i = TCA_VLAN_ACT_MODIFY;
> +		goto action_of_vlan;
> +action_of_vlan:
> +		act_index =
> +			mnl_attr_nest_start_check(buf, size, act_index_cur++);
> +		if (!act_index ||
> +		    !mnl_attr_put_strz_check(buf, size, TCA_ACT_KIND, "vlan"))
> +			goto error_nobufs;
> +		act = mnl_attr_nest_start_check(buf, size, TCA_ACT_OPTIONS);
> +		if (!act)
> +			goto error_nobufs;
> +		if (!mnl_attr_put_check(buf, size, TCA_VLAN_PARMS,
> +					sizeof(struct tc_vlan),
> +					&(struct tc_vlan){
> +						.action = TC_ACT_PIPE,
> +						.v_action = i,
> +					}))
> +			goto error_nobufs;
> +		if (i == TCA_VLAN_ACT_POP) {
> +			mnl_attr_nest_end(buf, act);
> +			++action;
> +			break;
> +		}
> +		if (i == TCA_VLAN_ACT_PUSH &&
> +		    !mnl_attr_put_u16_check(buf, size,
> +					    TCA_VLAN_PUSH_VLAN_PROTOCOL,
> +					    conf.of_push_vlan->ethertype))
> +			goto error_nobufs;
> +		na_vlan_id = mnl_nlmsg_get_payload_tail(buf);
> +		if (!mnl_attr_put_u16_check(buf, size, TCA_VLAN_PAD, 0))
> +			goto error_nobufs;
> +		na_vlan_priority = mnl_nlmsg_get_payload_tail(buf);
> +		if (!mnl_attr_put_u8_check(buf, size, TCA_VLAN_PAD, 0))
> +			goto error_nobufs;
> +		mnl_attr_nest_end(buf, act);
> +		mnl_attr_nest_end(buf, act_index);
> +		if (action->type == RTE_FLOW_ACTION_TYPE_OF_SET_VLAN_VID) {
> +override_na_vlan_id:
> +			na_vlan_id->nla_type = TCA_VLAN_PUSH_VLAN_ID;
> +			*(uint16_t *)mnl_attr_get_payload(na_vlan_id) =
> +				rte_be_to_cpu_16
> +				(conf.of_set_vlan_vid->vlan_vid);
> +		} else if (action->type ==
> +			   RTE_FLOW_ACTION_TYPE_OF_SET_VLAN_PCP) {
> +override_na_vlan_priority:
> +			na_vlan_priority->nla_type =
> +				TCA_VLAN_PUSH_VLAN_PRIORITY;
> +			*(uint8_t *)mnl_attr_get_payload(na_vlan_priority) =
> +				conf.of_set_vlan_pcp->vlan_pcp;
> +		}
> +		++action;
> +		break;

I'm wondering whether the presence of a VLAN item in the pattern should be
checked when VLAN modification actions are specified. For example, if a flow
has the OF_POP_VLAN action, its pattern has to include a VLAN item, doesn't it?
Even though the kernel driver has such validation checks, mlx5_flow_validate()
can't validate it.

In the PRM,
	8.18.2.7 Packet Classification Ambiguities
	...
	In addition, a flow should not match or attempt to modify (Modify Header
	action, Pop VLAN action) non-existing fields of a packet, as defined by
	the packet classification process.
	...

Thanks,
Yongseok

>  	case END:
>  		if (item->type != RTE_FLOW_ITEM_TYPE_END ||
>  		    action->type != RTE_FLOW_ACTION_TYPE_END)
> -- 
> 2.11.0


* Re: [dpdk-dev] [PATCH 6/6] net/mlx5: add port ID pattern item to switch flow rules
  2018-06-27 18:08 ` [dpdk-dev] [PATCH 6/6] net/mlx5: add port ID pattern item " Adrien Mazarguil
@ 2018-07-12  1:13   ` Yongseok Koh
  0 siblings, 0 replies; 33+ messages in thread
From: Yongseok Koh @ 2018-07-12  1:13 UTC (permalink / raw)
  To: Adrien Mazarguil; +Cc: Shahaf Shuler, Nelio Laranjeiro, dev

On Wed, Jun 27, 2018 at 08:08:20PM +0200, Adrien Mazarguil wrote:
> This enables flow rules to match traffic coming from a different DPDK port
> ID associated with the device (PORT_ID pattern item), mainly for the
> convenience of applications that want to deal with a single port ID for all
> flow rules associated with some physical device.
> 
> Testpmd example:
> 
> - Creating a flow rule on port ID 1 to consume all traffic from port ID 0
>   and direct it to port ID 2:
> 
>   flow create 1 ingress transfer pattern port_id id is 0 / end actions
>      port_id id 2 / end
> 
> Signed-off-by: Adrien Mazarguil <adrien.mazarguil@6wind.com>
> ---
Acked-by: Yongseok Koh <yskoh@mellanox.com>

Thanks


* Re: [dpdk-dev] [PATCH 1/6] net/mlx5: lay groundwork for switch offloads
  2018-07-12  0:17   ` Yongseok Koh
@ 2018-07-12 10:46     ` Adrien Mazarguil
  2018-07-12 17:33       ` Yongseok Koh
  0 siblings, 1 reply; 33+ messages in thread
From: Adrien Mazarguil @ 2018-07-12 10:46 UTC (permalink / raw)
  To: Yongseok Koh; +Cc: Shahaf Shuler, Nelio Laranjeiro, dev

On Wed, Jul 11, 2018 at 05:17:09PM -0700, Yongseok Koh wrote:
> On Wed, Jun 27, 2018 at 08:08:10PM +0200, Adrien Mazarguil wrote:
> > With mlx5, unlike normal flow rules implemented through Verbs for traffic
> > emitted and received by the application, those targeting different logical
> > ports of the device (VF representors for instance) are offloaded at the
> > switch level and must be configured through Netlink (TC interface).
> > 
> > This patch adds preliminary support to manage such flow rules through the
> > flow API (rte_flow).
> > 
> > Instead of rewriting tons of Netlink helpers and as previously suggested by
> > Stephen [1], this patch introduces a new dependency to libmnl [2]
> > (LGPL-2.1) when compiling mlx5.
> > 
> > [1] https://mails.dpdk.org/archives/dev/2018-March/092676.html
> > [2] https://netfilter.org/projects/libmnl/
> > 
> > Signed-off-by: Adrien Mazarguil <adrien.mazarguil@6wind.com>
<snip>
> > diff --git a/drivers/net/mlx5/mlx5_nl_flow.c b/drivers/net/mlx5/mlx5_nl_flow.c
> > new file mode 100644
> > index 000000000..7a8683b03
> > --- /dev/null
> > +++ b/drivers/net/mlx5/mlx5_nl_flow.c
> > @@ -0,0 +1,139 @@
> > +/* SPDX-License-Identifier: BSD-3-Clause
> > + * Copyright 2018 6WIND S.A.
> > + * Copyright 2018 Mellanox Technologies, Ltd
> > + */
> > +
> > +#include <errno.h>
> > +#include <libmnl/libmnl.h>
> > +#include <linux/netlink.h>
> > +#include <linux/pkt_sched.h>
> > +#include <linux/rtnetlink.h>
> > +#include <stdalign.h>
> > +#include <stddef.h>
> > +#include <stdint.h>
> > +#include <stdlib.h>
> > +#include <sys/socket.h>
> > +
> > +#include <rte_errno.h>
> > +#include <rte_flow.h>
> > +
> > +#include "mlx5.h"
> > +
> > +/**
> > + * Send Netlink message with acknowledgment.
> > + *
> > + * @param nl
> > + *   Libmnl socket to use.
> > + * @param nlh
> > + *   Message to send. This function always raises the NLM_F_ACK flag before
> > + *   sending.
> > + *
> > + * @return
> > + *   0 on success, a negative errno value otherwise and rte_errno is set.
> > + */
> > +static int
> > +mlx5_nl_flow_nl_ack(struct mnl_socket *nl, struct nlmsghdr *nlh)
> > +{
> > +	alignas(struct nlmsghdr)
> > +	uint8_t ans[MNL_SOCKET_BUFFER_SIZE];
> 
> There are 3 of these buffers in total. On a host with a large page size, this
> can be 8kB * 3 = 24kB. This is not a gigantic buffer, but as all the functions
> here are accessed sequentially, how about having just one global buffer
> instead?

All right, it's not ideal; I opted for simplicity though. This is a generic
ack function. When NETLINK_CAP_ACK is not supported (note: this was made
optional for v2, as some systems do not support it), an ack consumes a bit more
space than the original message, which may itself be huge, and failure to
receive acks is deemed fatal.

Its callers are mlx5_nl_flow_init(), called once per device during
initialization, and mlx5_nl_flow_create/destroy(), called for each
created/removed flow rule.

These last two are called often but do not put their own buffer on the
stack, they reuse previously generated messages from the heap.

So to improve stack consumption a bit, what I can do is size this buffer
according to nlh->nlmsg_len plus extra room for the ack header, yet still
allocate it locally since it would be a pain otherwise. Callers may not want
their own buffers to be overwritten with useless acks.

-- 
Adrien Mazarguil
6WIND


* Re: [dpdk-dev] [PATCH 2/6] net/mlx5: add framework for switch flow rules
  2018-07-12  0:59   ` Yongseok Koh
@ 2018-07-12 10:46     ` Adrien Mazarguil
  2018-07-12 18:25       ` Yongseok Koh
  0 siblings, 1 reply; 33+ messages in thread
From: Adrien Mazarguil @ 2018-07-12 10:46 UTC (permalink / raw)
  To: Yongseok Koh; +Cc: Shahaf Shuler, Nelio Laranjeiro, dev

On Wed, Jul 11, 2018 at 05:59:18PM -0700, Yongseok Koh wrote:
> On Wed, Jun 27, 2018 at 08:08:12PM +0200, Adrien Mazarguil wrote:
> > Because mlx5 switch flow rules are configured through Netlink (TC
> > interface) and have little in common with Verbs, this patch adds a separate
> > parser function to handle them.
> > 
> > - mlx5_nl_flow_transpose() converts a rte_flow rule to its TC equivalent
> >   and stores the result in a buffer.
> > 
> > - mlx5_nl_flow_brand() gives a unique handle to a flow rule buffer.
> > 
> > - mlx5_nl_flow_create() instantiates a flow rule on the device based on
> >   such a buffer.
> > 
> > - mlx5_nl_flow_destroy() performs the reverse operation.
> > 
> > These functions are called by the existing implementation when encountering
> > flow rules which must be offloaded to the switch (currently relying on the
> > transfer attribute).
> > 
> > Signed-off-by: Adrien Mazarguil <adrien.mazarguil@6wind.com>
> > Signed-off-by: Nelio Laranjeiro <nelio.laranjeiro@6wind.com>
<snip>
> > diff --git a/drivers/net/mlx5/mlx5_flow.c b/drivers/net/mlx5/mlx5_flow.c
> > index 9241855be..93b245991 100644
> > --- a/drivers/net/mlx5/mlx5_flow.c
> > +++ b/drivers/net/mlx5/mlx5_flow.c
> > @@ -4,6 +4,7 @@
> >   */
> >  
> >  #include <sys/queue.h>
> > +#include <stdalign.h>
> >  #include <stdint.h>
> >  #include <string.h>
> >  
> > @@ -271,6 +272,7 @@ struct rte_flow {
> >  	/**< Store tunnel packet type data to store in Rx queue. */
> >  	uint8_t key[40]; /**< RSS hash key. */
> >  	uint16_t (*queue)[]; /**< Destination queues to redirect traffic to. */
> > +	void *nl_flow; /**< Netlink flow buffer if relevant. */
> >  };
> >  
> >  static const struct rte_flow_ops mlx5_flow_ops = {
> > @@ -2403,6 +2405,106 @@ mlx5_flow_actions(struct rte_eth_dev *dev,
> >  }
> >  
> >  /**
> > + * Validate flow rule and fill flow structure accordingly.
> > + *
> > + * @param dev
> > + *   Pointer to Ethernet device.
> > + * @param[out] flow
> > + *   Pointer to flow structure.
> > + * @param flow_size
> > + *   Size of allocated space for @p flow.
> > + * @param[in] attr
> > + *   Flow rule attributes.
> > + * @param[in] pattern
> > + *   Pattern specification (list terminated by the END pattern item).
> > + * @param[in] actions
> > + *   Associated actions (list terminated by the END action).
> > + * @param[out] error
> > + *   Perform verbose error reporting if not NULL.
> > + *
> > + * @return
> > + *   A positive value representing the size of the flow object in bytes
> > + *   regardless of @p flow_size on success, a negative errno value otherwise
> > + *   and rte_errno is set.
> > + */
> > +static int
> > +mlx5_flow_merge_switch(struct rte_eth_dev *dev,
> > +		       struct rte_flow *flow,
> > +		       size_t flow_size,
> > +		       const struct rte_flow_attr *attr,
> > +		       const struct rte_flow_item pattern[],
> > +		       const struct rte_flow_action actions[],
> > +		       struct rte_flow_error *error)
> > +{
> > +	struct priv *priv = dev->data->dev_private;
> > +	unsigned int n = mlx5_domain_to_port_id(priv->domain_id, NULL, 0);
> > +	uint16_t port_list[!n + n];
> > +	struct mlx5_nl_flow_ptoi ptoi[!n + n + 1];
> > +	size_t off = RTE_ALIGN_CEIL(sizeof(*flow), alignof(max_align_t));
> > +	unsigned int i;
> > +	unsigned int own = 0;
> > +	int ret;
> > +
> > +	/* At least one port is needed when no switch domain is present. */
> > +	if (!n) {
> > +		n = 1;
> > +		port_list[0] = dev->data->port_id;
> > +	} else {
> > +		n = mlx5_domain_to_port_id(priv->domain_id, port_list, n);
> > +		if (n > RTE_DIM(port_list))
> > +			n = RTE_DIM(port_list);
> > +	}
> > +	for (i = 0; i != n; ++i) {
> > +		struct rte_eth_dev_info dev_info;
> > +
> > +		rte_eth_dev_info_get(port_list[i], &dev_info);
> > +		if (port_list[i] == dev->data->port_id)
> > +			own = i;
> > +		ptoi[i].port_id = port_list[i];
> > +		ptoi[i].ifindex = dev_info.if_index;
> > +	}
> > +	/* Ensure first entry of ptoi[] is the current device. */
> > +	if (own) {
> > +		ptoi[n] = ptoi[0];
> > +		ptoi[0] = ptoi[own];
> > +		ptoi[own] = ptoi[n];
> > +	}
> > +	/* An entry with zero ifindex terminates ptoi[]. */
> > +	ptoi[n].port_id = 0;
> > +	ptoi[n].ifindex = 0;
> > +	if (flow_size < off)
> > +		flow_size = 0;
> > +	ret = mlx5_nl_flow_transpose((uint8_t *)flow + off,
> > +				     flow_size ? flow_size - off : 0,
> > +				     ptoi, attr, pattern, actions, error);
> > +	if (ret < 0)
> > +		return ret;
> 
> So, there's an assumption that the buffer allocated outside of this API is
> large enough to hold all the messages in mlx5_nl_flow_transpose(), right? If
> flow_size isn't enough, buf_tmp will be used and _transpose() doesn't return
> an error but the required size. This sounds confusing; it may need a change
> or clearer documentation.

Well, isn't it already documented? Besides, these are the usual snprintf()
semantics used everywhere in these files; I think this was a major topic of
discussion with Nelio on the flow rework series :)

buf_tmp[] is internal to mlx5_nl_flow_transpose() and used as a fallback to
complete a pass when the input buffer is not large enough (including when it
is zero-sized). Having a valid buffer is a constraint imposed by libmnl,
because we badly want to know how much space will be needed assuming the
flow rule was successfully processed.

Without libmnl, the helpers it provides would have been written in a way
that doesn't require buf_tmp[]. However libmnl is just too convenient to
pass up, hence this compromise.

(just to remind onlookers, we want to allocate the minimum amount of memory
we possibly can for resources needed by each flow rule, and do so through a
single allocation, end goal being to support millions of flow rules while
wasting as little memory as possible.)

<snip>
> > diff --git a/drivers/net/mlx5/mlx5_nl_flow.c b/drivers/net/mlx5/mlx5_nl_flow.c
> > index 7a8683b03..1fc62fb0a 100644
> > --- a/drivers/net/mlx5/mlx5_nl_flow.c
> > +++ b/drivers/net/mlx5/mlx5_nl_flow.c
> > @@ -5,7 +5,9 @@
> >  
> >  #include <errno.h>
> >  #include <libmnl/libmnl.h>
> > +#include <linux/if_ether.h>
> >  #include <linux/netlink.h>
> > +#include <linux/pkt_cls.h>
> >  #include <linux/pkt_sched.h>
> >  #include <linux/rtnetlink.h>
> >  #include <stdalign.h>
> > @@ -14,11 +16,248 @@
> >  #include <stdlib.h>
> >  #include <sys/socket.h>
> >  
> > +#include <rte_byteorder.h>
> >  #include <rte_errno.h>
> >  #include <rte_flow.h>
> >  
> >  #include "mlx5.h"
> >  
> > +/** Parser state definitions for mlx5_nl_flow_trans[]. */
> > +enum mlx5_nl_flow_trans {
> > +	INVALID,
> > +	BACK,
> > +	ATTR,
> > +	PATTERN,
> > +	ITEM_VOID,
> > +	ACTIONS,
> > +	ACTION_VOID,
> > +	END,
> > +};
> > +
> > +#define TRANS(...) (const enum mlx5_nl_flow_trans []){ __VA_ARGS__, INVALID, }
> > +
> > +#define PATTERN_COMMON \
> > +	ITEM_VOID, ACTIONS
> > +#define ACTIONS_COMMON \
> > +	ACTION_VOID, END
> > +
> > +/** Parser state transitions used by mlx5_nl_flow_transpose(). */
> > +static const enum mlx5_nl_flow_trans *const mlx5_nl_flow_trans[] = {
> > +	[INVALID] = NULL,
> > +	[BACK] = NULL,
> > +	[ATTR] = TRANS(PATTERN),
> > +	[PATTERN] = TRANS(PATTERN_COMMON),
> > +	[ITEM_VOID] = TRANS(BACK),
> > +	[ACTIONS] = TRANS(ACTIONS_COMMON),
> > +	[ACTION_VOID] = TRANS(BACK),
> > +	[END] = NULL,
> > +};
> > +
> > +/**
> > + * Transpose flow rule description to rtnetlink message.
> > + *
> > + * This function transposes a flow rule description to a traffic control
> > + * (TC) filter creation message ready to be sent over Netlink.
> > + *
> > + * Target interface is specified as the first entry of the @p ptoi table.
> > + * Subsequent entries enable this function to resolve other DPDK port IDs
> > + * found in the flow rule.
> > + *
> > + * @param[out] buf
> > + *   Output message buffer. May be NULL when @p size is 0.
> > + * @param size
> > + *   Size of @p buf. Message may be truncated if not large enough.
> > + * @param[in] ptoi
> > + *   DPDK port ID to network interface index translation table. This table
> > + *   is terminated by an entry with a zero ifindex value.
> > + * @param[in] attr
> > + *   Flow rule attributes.
> > + * @param[in] pattern
> > + *   Pattern specification.
> > + * @param[in] actions
> > + *   Associated actions.
> > + * @param[out] error
> > + *   Perform verbose error reporting if not NULL.
> > + *
> > + * @return
> > + *   A positive value representing the exact size of the message in bytes
> > + *   regardless of the @p size parameter on success, a negative errno value
> > + *   otherwise and rte_errno is set.
> > + */
> > +int
> > +mlx5_nl_flow_transpose(void *buf,
> > +		       size_t size,
> > +		       const struct mlx5_nl_flow_ptoi *ptoi,
> > +		       const struct rte_flow_attr *attr,
> > +		       const struct rte_flow_item *pattern,
> > +		       const struct rte_flow_action *actions,
> > +		       struct rte_flow_error *error)
> > +{
> > +	alignas(struct nlmsghdr)
> > +	uint8_t buf_tmp[MNL_SOCKET_BUFFER_SIZE];
> > +	const struct rte_flow_item *item;
> > +	const struct rte_flow_action *action;
> > +	unsigned int n;
> > +	struct nlattr *na_flower;
> > +	struct nlattr *na_flower_act;
> > +	const enum mlx5_nl_flow_trans *trans;
> > +	const enum mlx5_nl_flow_trans *back;
> > +
> > +	if (!size)
> > +		goto error_nobufs;
> > +init:
> > +	item = pattern;
> > +	action = actions;
> > +	n = 0;
> > +	na_flower = NULL;
> > +	na_flower_act = NULL;
> > +	trans = TRANS(ATTR);
> > +	back = trans;
> > +trans:
> > +	switch (trans[n++]) {
> > +		struct nlmsghdr *nlh;
> > +		struct tcmsg *tcm;
> > +
> > +	case INVALID:
> > +		if (item->type)
> > +			return rte_flow_error_set
> > +				(error, ENOTSUP, RTE_FLOW_ERROR_TYPE_ITEM,
> > +				 item, "unsupported pattern item combination");
> > +		else if (action->type)
> > +			return rte_flow_error_set
> > +				(error, ENOTSUP, RTE_FLOW_ERROR_TYPE_ACTION,
> > +				 action, "unsupported action combination");
> > +		return rte_flow_error_set
> > +			(error, ENOTSUP, RTE_FLOW_ERROR_TYPE_UNSPECIFIED, NULL,
> > +			 "flow rule lacks some kind of fate action");
> > +	case BACK:
> > +		trans = back;
> > +		n = 0;
> > +		goto trans;
> > +	case ATTR:
> > +		/*
> > +		 * Supported attributes: no groups, some priorities and
> > +		 * ingress only. Don't care about transfer as it is the
> > +		 * caller's problem.
> > +		 */
> > +		if (attr->group)
> > +			return rte_flow_error_set
> > +				(error, ENOTSUP,
> > +				 RTE_FLOW_ERROR_TYPE_ATTR_GROUP,
> > +				 attr, "groups are not supported");
> > +		if (attr->priority > 0xfffe)
> > +			return rte_flow_error_set
> > +				(error, ENOTSUP,
> > +				 RTE_FLOW_ERROR_TYPE_ATTR_PRIORITY,
> > +				 attr, "lowest priority level is 0xfffe");
> > +		if (!attr->ingress)
> > +			return rte_flow_error_set
> > +				(error, ENOTSUP,
> > +				 RTE_FLOW_ERROR_TYPE_ATTR_INGRESS,
> > +				 attr, "only ingress is supported");
> > +		if (attr->egress)
> > +			return rte_flow_error_set
> > +				(error, ENOTSUP,
> > +				 RTE_FLOW_ERROR_TYPE_ATTR_INGRESS,
> > +				 attr, "egress is not supported");
> > +		if (size < mnl_nlmsg_size(sizeof(*tcm)))
> > +			goto error_nobufs;
> > +		nlh = mnl_nlmsg_put_header(buf);
> > +		nlh->nlmsg_type = 0;
> > +		nlh->nlmsg_flags = 0;
> > +		nlh->nlmsg_seq = 0;
> > +		tcm = mnl_nlmsg_put_extra_header(nlh, sizeof(*tcm));
> > +		tcm->tcm_family = AF_UNSPEC;
> > +		tcm->tcm_ifindex = ptoi[0].ifindex;
> > +		/*
> > +		 * Let kernel pick a handle by default. A predictable handle
> > +		 * can be set by the caller on the resulting buffer through
> > +		 * mlx5_nl_flow_brand().
> > +		 */
> > +		tcm->tcm_handle = 0;
> > +		tcm->tcm_parent = TC_H_MAKE(TC_H_INGRESS, TC_H_MIN_INGRESS);
> > +		/*
> > +		 * Priority cannot be zero to prevent the kernel from
> > +		 * picking one automatically.
> > +		 */
> > +		tcm->tcm_info = TC_H_MAKE((attr->priority + 1) << 16,
> > +					  RTE_BE16(ETH_P_ALL));
> > +		break;
> > +	case PATTERN:
> > +		if (!mnl_attr_put_strz_check(buf, size, TCA_KIND, "flower"))
> > +			goto error_nobufs;
> > +		na_flower = mnl_attr_nest_start_check(buf, size, TCA_OPTIONS);
> > +		if (!na_flower)
> > +			goto error_nobufs;
> > +		if (!mnl_attr_put_u32_check(buf, size, TCA_FLOWER_FLAGS,
> > +					    TCA_CLS_FLAGS_SKIP_SW))
> > +			goto error_nobufs;
> > +		break;
> > +	case ITEM_VOID:
> > +		if (item->type != RTE_FLOW_ITEM_TYPE_VOID)
> > +			goto trans;
> > +		++item;
> > +		break;
> > +	case ACTIONS:
> > +		if (item->type != RTE_FLOW_ITEM_TYPE_END)
> > +			goto trans;
> > +		assert(na_flower);
> > +		assert(!na_flower_act);
> > +		na_flower_act =
> > +			mnl_attr_nest_start_check(buf, size, TCA_FLOWER_ACT);
> > +		if (!na_flower_act)
> > +			goto error_nobufs;
> > +		break;
> > +	case ACTION_VOID:
> > +		if (action->type != RTE_FLOW_ACTION_TYPE_VOID)
> > +			goto trans;
> > +		++action;
> > +		break;
> > +	case END:
> > +		if (item->type != RTE_FLOW_ITEM_TYPE_END ||
> > +		    action->type != RTE_FLOW_ACTION_TYPE_END)
> > +			goto trans;
> > +		if (na_flower_act)
> > +			mnl_attr_nest_end(buf, na_flower_act);
> > +		if (na_flower)
> > +			mnl_attr_nest_end(buf, na_flower);
> > +		nlh = buf;
> > +		return nlh->nlmsg_len;
> > +	}
> > +	back = trans;
> > +	trans = mlx5_nl_flow_trans[trans[n - 1]];
> > +	n = 0;
> > +	goto trans;
> > +error_nobufs:
> > +	if (buf != buf_tmp) {
> > +		buf = buf_tmp;
> > +		size = sizeof(buf_tmp);
> > +		goto init;
> > +	}
> 
> Continuing my comment above.
> This part is unclear. It looks to me that this func does:
> 
> 1) if size is zero, consider it as a testing call to know the amount of memory
> required.

Yeah, in fact this one is a shortcut to speed up this specific scenario as
it happens all the time in the two-pass use case. You can lump it with 2).

> 2) if size isn't zero but not enough, it stops writing to buf and starts over to
> return the amount of memory required instead of returning an error.
> 3) if size isn't zero and enough, it fills in buf.
> 
> Do I correctly understand?

Yes. Another minor note for 2), the returned buffer is also filled up to the
point of failure (mimics snprintf()).

Perhaps the following snippet can better summarize the envisioned approach:

 int ret = snprintf(NULL, 0, "something", ...);

 if (ret < 0) {
     goto court;
 } else {
     char buf[ret];

     snprintf(buf, sizeof(buf), "something", ...); /* Guaranteed. */
     [...]
 }

-- 
Adrien Mazarguil
6WIND

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [dpdk-dev] [PATCH 5/6] net/mlx5: add VLAN item and actions to switch flow rules
  2018-07-12  1:10   ` Yongseok Koh
@ 2018-07-12 10:47     ` Adrien Mazarguil
  2018-07-12 18:49       ` Yongseok Koh
  0 siblings, 1 reply; 33+ messages in thread
From: Adrien Mazarguil @ 2018-07-12 10:47 UTC (permalink / raw)
  To: Yongseok Koh; +Cc: Shahaf Shuler, Nelio Laranjeiro, dev

On Wed, Jul 11, 2018 at 06:10:25PM -0700, Yongseok Koh wrote:
> On Wed, Jun 27, 2018 at 08:08:18PM +0200, Adrien Mazarguil wrote:
> > This enables flow rules to explicitly match VLAN traffic (VLAN pattern
> > item) and perform various operations on VLAN headers at the switch level
> > (OF_POP_VLAN, OF_PUSH_VLAN, OF_SET_VLAN_VID and OF_SET_VLAN_PCP actions).
> > 
> > Testpmd examples:
> > 
> > - Directing all VLAN traffic received on port ID 1 to port ID 0:
> > 
> >   flow create 1 ingress transfer pattern eth / vlan / end actions
> >      port_id id 0 / end
> > 
> > - Adding a VLAN header to IPv6 traffic received on port ID 1 and directing
> >   it to port ID 0:
> > 
> >   flow create 1 ingress transfer pattern eth / ipv6 / end actions
> >      of_push_vlan ethertype 0x8100 / of_set_vlan_vid / port_id id 0 / end
> > 
> > Signed-off-by: Adrien Mazarguil <adrien.mazarguil@6wind.com>
<snip>
> > @@ -681,6 +772,84 @@ mlx5_nl_flow_transpose(void *buf,
> >  		mnl_attr_nest_end(buf, act_index);
> >  		++action;
> >  		break;
> > +	case ACTION_OF_POP_VLAN:
> > +		if (action->type != RTE_FLOW_ACTION_TYPE_OF_POP_VLAN)
> > +			goto trans;
> > +		conf.of_push_vlan = NULL;
> > +		i = TCA_VLAN_ACT_POP;
> > +		goto action_of_vlan;
> > +	case ACTION_OF_PUSH_VLAN:
> > +		if (action->type != RTE_FLOW_ACTION_TYPE_OF_PUSH_VLAN)
> > +			goto trans;
> > +		conf.of_push_vlan = action->conf;
> > +		i = TCA_VLAN_ACT_PUSH;
> > +		goto action_of_vlan;
> > +	case ACTION_OF_SET_VLAN_VID:
> > +		if (action->type != RTE_FLOW_ACTION_TYPE_OF_SET_VLAN_VID)
> > +			goto trans;
> > +		conf.of_set_vlan_vid = action->conf;
> > +		if (na_vlan_id)
> > +			goto override_na_vlan_id;
> > +		i = TCA_VLAN_ACT_MODIFY;
> > +		goto action_of_vlan;
> > +	case ACTION_OF_SET_VLAN_PCP:
> > +		if (action->type != RTE_FLOW_ACTION_TYPE_OF_SET_VLAN_PCP)
> > +			goto trans;
> > +		conf.of_set_vlan_pcp = action->conf;
> > +		if (na_vlan_priority)
> > +			goto override_na_vlan_priority;
> > +		i = TCA_VLAN_ACT_MODIFY;
> > +		goto action_of_vlan;
> > +action_of_vlan:
> > +		act_index =
> > +			mnl_attr_nest_start_check(buf, size, act_index_cur++);
> > +		if (!act_index ||
> > +		    !mnl_attr_put_strz_check(buf, size, TCA_ACT_KIND, "vlan"))
> > +			goto error_nobufs;
> > +		act = mnl_attr_nest_start_check(buf, size, TCA_ACT_OPTIONS);
> > +		if (!act)
> > +			goto error_nobufs;
> > +		if (!mnl_attr_put_check(buf, size, TCA_VLAN_PARMS,
> > +					sizeof(struct tc_vlan),
> > +					&(struct tc_vlan){
> > +						.action = TC_ACT_PIPE,
> > +						.v_action = i,
> > +					}))
> > +			goto error_nobufs;
> > +		if (i == TCA_VLAN_ACT_POP) {
> > +			mnl_attr_nest_end(buf, act);
> > +			++action;
> > +			break;
> > +		}
> > +		if (i == TCA_VLAN_ACT_PUSH &&
> > +		    !mnl_attr_put_u16_check(buf, size,
> > +					    TCA_VLAN_PUSH_VLAN_PROTOCOL,
> > +					    conf.of_push_vlan->ethertype))
> > +			goto error_nobufs;
> > +		na_vlan_id = mnl_nlmsg_get_payload_tail(buf);
> > +		if (!mnl_attr_put_u16_check(buf, size, TCA_VLAN_PAD, 0))
> > +			goto error_nobufs;
> > +		na_vlan_priority = mnl_nlmsg_get_payload_tail(buf);
> > +		if (!mnl_attr_put_u8_check(buf, size, TCA_VLAN_PAD, 0))
> > +			goto error_nobufs;
> > +		mnl_attr_nest_end(buf, act);
> > +		mnl_attr_nest_end(buf, act_index);
> > +		if (action->type == RTE_FLOW_ACTION_TYPE_OF_SET_VLAN_VID) {
> > +override_na_vlan_id:
> > +			na_vlan_id->nla_type = TCA_VLAN_PUSH_VLAN_ID;
> > +			*(uint16_t *)mnl_attr_get_payload(na_vlan_id) =
> > +				rte_be_to_cpu_16
> > +				(conf.of_set_vlan_vid->vlan_vid);
> > +		} else if (action->type ==
> > +			   RTE_FLOW_ACTION_TYPE_OF_SET_VLAN_PCP) {
> > +override_na_vlan_priority:
> > +			na_vlan_priority->nla_type =
> > +				TCA_VLAN_PUSH_VLAN_PRIORITY;
> > +			*(uint8_t *)mnl_attr_get_payload(na_vlan_priority) =
> > +				conf.of_set_vlan_pcp->vlan_pcp;
> > +		}
> > +		++action;
> > +		break;
> 
> I'm wondering if there's no need to check for the existence of a VLAN item in
> the pattern when having VLAN modification actions. For example, if a flow has a
> POP_VLAN action, its pattern has to have a VLAN item, doesn't it?

Not necessarily. For instance, there is no need to explicitly match VLAN
traffic if you somehow guarantee that only VLAN traffic goes through,
e.g. when the peer is configured to always push a VLAN header regardless.
Requiring an explicit match in this sense can be thought of as an
unnecessary limitation.

I agree this check would have been mandatory if it wasn't performed
elsewhere, as discussed below:

> Even though the kernel driver has such
> validation checks, mlx5_flow_validate() can't validate it.

Yes, note this is consistent with the rest of this particular implementation
(VLAN POP is not an exception). This entire code is a somewhat generic
rte_flow-to-TC converter which doesn't check HW capabilities at all; it
doesn't check the private structure, the type of device and so on. This role
is left to the kernel implementation and (optionally) the caller function.

The only explicit checks are performed at the conversion stage, when
something cannot be converted from rte_flow to TC, as is the case for VLAN
DEI (hence the 0xefff mask). The rest is implicit; for instance, I didn't
bother to implement all pattern items and fields, only the bare minimum.
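To illustrate the VLAN DEI point, here is a minimal sketch of such a
conversion-stage check. The bit layout (PCP in bits 15-13, DEI in bit 12, VID
in bits 11-0, hence the 0xefff mask) comes from IEEE 802.1Q; the helper name
and its exact placement in the driver are hypothetical:

```c
#include <stdint.h>

/* TCI layout per 802.1Q: PCP(15:13) | DEI(12) | VID(11:0).
 * TC flower has no attribute for DEI, so only masks covered by
 * 0xefff can be transposed; anything touching bit 12 is rejected. */
#define VLAN_DEI_BIT UINT16_C(0x1000)

/* Return 0 when the requested TCI mask is convertible to TC,
 * -1 (reject at conversion stage) when it includes the DEI bit. */
static int
vlan_tci_mask_ok(uint16_t tci_mask)
{
	return (tci_mask & VLAN_DEI_BIT) ? -1 : 0;
}
```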

Further, ConnectX-4 and ConnectX-5 have different capabilities. The former
only supports offloading destination MAC matching and the drop action at the
switch level. Depending on driver/firmware combinations, such and such
feature may or may not be present.

Checking everything in order to print nice error messages would have been
nice, but would have required a lot of effort. Hence the decision to
restrict the scope of this function.

> In the PRM,
> 	8.18.2.7 Packet Classification Ambiguities
> 	...
> 	In addition, a flow should not match or attempt to modify (Modify Header
> 	action, Pop VLAN action) non-existing fields of a packet, as defined by
> 	the packet classification process.
> 	...

Fortunately this code is not running on top of PRM :)

This is my opinion anyway. If you think we need extra safety for (and only
for) VLAN POP, I'll add it; please confirm.

-- 
Adrien Mazarguil
6WIND

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [dpdk-dev] [PATCH 1/6] net/mlx5: lay groundwork for switch offloads
  2018-07-12 10:46     ` Adrien Mazarguil
@ 2018-07-12 17:33       ` Yongseok Koh
  0 siblings, 0 replies; 33+ messages in thread
From: Yongseok Koh @ 2018-07-12 17:33 UTC (permalink / raw)
  To: Adrien Mazarguil; +Cc: Shahaf Shuler, Nélio Laranjeiro, dev


> On Jul 12, 2018, at 3:46 AM, Adrien Mazarguil <adrien.mazarguil@6wind.com> wrote:
> 
> On Wed, Jul 11, 2018 at 05:17:09PM -0700, Yongseok Koh wrote:
>> On Wed, Jun 27, 2018 at 08:08:10PM +0200, Adrien Mazarguil wrote:
>>> With mlx5, unlike normal flow rules implemented through Verbs for traffic
>>> emitted and received by the application, those targeting different logical
>>> ports of the device (VF representors for instance) are offloaded at the
>>> switch level and must be configured through Netlink (TC interface).
>>> 
>>> This patch adds preliminary support to manage such flow rules through the
>>> flow API (rte_flow).
>>> 
>>> Instead of rewriting tons of Netlink helpers and as previously suggested by
>>> Stephen [1], this patch introduces a new dependency to libmnl [2]
>>> (LGPL-2.1) when compiling mlx5.
>>> 
>>> [1] https://mails.dpdk.org/archives/dev/2018-March/092676.html
>>> [2] https://netfilter.org/projects/libmnl/
>>> 
>>> Signed-off-by: Adrien Mazarguil <adrien.mazarguil@6wind.com>
> <snip>
>>> diff --git a/drivers/net/mlx5/mlx5_nl_flow.c b/drivers/net/mlx5/mlx5_nl_flow.c
>>> new file mode 100644
>>> index 000000000..7a8683b03
>>> --- /dev/null
>>> +++ b/drivers/net/mlx5/mlx5_nl_flow.c
>>> @@ -0,0 +1,139 @@
>>> +/* SPDX-License-Identifier: BSD-3-Clause
>>> + * Copyright 2018 6WIND S.A.
>>> + * Copyright 2018 Mellanox Technologies, Ltd
>>> + */
>>> +
>>> +#include <errno.h>
>>> +#include <libmnl/libmnl.h>
>>> +#include <linux/netlink.h>
>>> +#include <linux/pkt_sched.h>
>>> +#include <linux/rtnetlink.h>
>>> +#include <stdalign.h>
>>> +#include <stddef.h>
>>> +#include <stdint.h>
>>> +#include <stdlib.h>
>>> +#include <sys/socket.h>
>>> +
>>> +#include <rte_errno.h>
>>> +#include <rte_flow.h>
>>> +
>>> +#include "mlx5.h"
>>> +
>>> +/**
>>> + * Send Netlink message with acknowledgment.
>>> + *
>>> + * @param nl
>>> + *   Libmnl socket to use.
>>> + * @param nlh
>>> + *   Message to send. This function always raises the NLM_F_ACK flag before
>>> + *   sending.
>>> + *
>>> + * @return
>>> + *   0 on success, a negative errno value otherwise and rte_errno is set.
>>> + */
>>> +static int
>>> +mlx5_nl_flow_nl_ack(struct mnl_socket *nl, struct nlmsghdr *nlh)
>>> +{
>>> +	alignas(struct nlmsghdr)
>>> +	uint8_t ans[MNL_SOCKET_BUFFER_SIZE];
>> 
>> There are total 3 of this buffer. On a certain host having large pagesize, this
>> can be 8kB * 3 = 24kB. This is not a gigantic buffer but as all the functions
>> here are sequentially accessed, how about having just one global buffer instead?
> 
> All right, it's not ideal; I opted for simplicity though. This is a generic
> ack function. When NETLINK_CAP_ACK is not supported (note: this was made
> optional for v2, some systems do not support it), an ack consumes a bit more
> space than the original message, which may itself be huge, and failure to
> receive acks is deemed fatal.
> 
> Its callers are mlx5_nl_flow_init(), called once per device during
> initialization, and mlx5_nl_flow_create/destroy(), called for each
> created/removed flow rule.
> 
> These last two are called often but do not put their own buffer on the
> stack, they reuse previously generated messages from the heap.
> 
> So to improve stack consumption a bit, what I can do is size this buffer
> according to nlh->nlmsg_len + extra room for ack header, yet still allocate
> it locally since it would be a pain otherwise. Callers may not want their
> own buffers to be overwritten with useless acks.

I like this approach.
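A minimal sketch of that sizing, for reference. The constants are stubbed
locally rather than taken from <linux/netlink.h> so the snippet stands alone,
and the helper name is hypothetical; the idea is simply the request length plus
room for the error/ack header, since the kernel echoes the request inside the
ack when NETLINK_CAP_ACK is unavailable:

```c
#include <stddef.h>
#include <stdint.h>

/* Stubbed netlink constants, mirroring <linux/netlink.h> values. */
#define NLMSG_ALIGNTO 4u
#define NLMSG_ALIGN(len) (((len) + NLMSG_ALIGNTO - 1) & ~(NLMSG_ALIGNTO - 1))
#define NLMSG_HDRLEN  16u /* aligned sizeof(struct nlmsghdr) */
#define NLMSGERR_LEN  20u /* sizeof(struct nlmsgerr): error + echoed header */

/* Proposed ack buffer size for a request of @nlmsg_len bytes:
 * the whole (possibly echoed) request plus the ack overhead,
 * instead of a flat MNL_SOCKET_BUFFER_SIZE. */
static size_t
ack_buf_size(uint32_t nlmsg_len)
{
	return NLMSG_ALIGN(nlmsg_len) + NLMSG_HDRLEN + NLMSGERR_LEN;
}
```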

Thanks,
Yongseok

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [dpdk-dev] [PATCH 2/6] net/mlx5: add framework for switch flow rules
  2018-07-12 10:46     ` Adrien Mazarguil
@ 2018-07-12 18:25       ` Yongseok Koh
  0 siblings, 0 replies; 33+ messages in thread
From: Yongseok Koh @ 2018-07-12 18:25 UTC (permalink / raw)
  To: Adrien Mazarguil; +Cc: Shahaf Shuler, Nelio Laranjeiro, dev

On Thu, Jul 12, 2018 at 12:46:46PM +0200, Adrien Mazarguil wrote:
> On Wed, Jul 11, 2018 at 05:59:18PM -0700, Yongseok Koh wrote:
> > On Wed, Jun 27, 2018 at 08:08:12PM +0200, Adrien Mazarguil wrote:
> > > Because mlx5 switch flow rules are configured through Netlink (TC
> > > interface) and have little in common with Verbs, this patch adds a separate
> > > parser function to handle them.
> > > 
> > > - mlx5_nl_flow_transpose() converts a rte_flow rule to its TC equivalent
> > >   and stores the result in a buffer.
> > > 
> > > - mlx5_nl_flow_brand() gives a unique handle to a flow rule buffer.
> > > 
> > > - mlx5_nl_flow_create() instantiates a flow rule on the device based on
> > >   such a buffer.
> > > 
> > > - mlx5_nl_flow_destroy() performs the reverse operation.
> > > 
> > > These functions are called by the existing implementation when encountering
> > > flow rules which must be offloaded to the switch (currently relying on the
> > > transfer attribute).
> > > 
> > > Signed-off-by: Adrien Mazarguil <adrien.mazarguil@6wind.com>
> > > Signed-off-by: Nelio Laranjeiro <nelio.laranjeiro@6wind.com>
> <snip>
> > > diff --git a/drivers/net/mlx5/mlx5_flow.c b/drivers/net/mlx5/mlx5_flow.c
> > > index 9241855be..93b245991 100644
> > > --- a/drivers/net/mlx5/mlx5_flow.c
> > > +++ b/drivers/net/mlx5/mlx5_flow.c
> > > @@ -4,6 +4,7 @@
> > >   */
> > >  
> > >  #include <sys/queue.h>
> > > +#include <stdalign.h>
> > >  #include <stdint.h>
> > >  #include <string.h>
> > >  
> > > @@ -271,6 +272,7 @@ struct rte_flow {
> > >  	/**< Store tunnel packet type data to store in Rx queue. */
> > >  	uint8_t key[40]; /**< RSS hash key. */
> > >  	uint16_t (*queue)[]; /**< Destination queues to redirect traffic to. */
> > > +	void *nl_flow; /**< Netlink flow buffer if relevant. */
> > >  };
> > >  
> > >  static const struct rte_flow_ops mlx5_flow_ops = {
> > > @@ -2403,6 +2405,106 @@ mlx5_flow_actions(struct rte_eth_dev *dev,
> > >  }
> > >  
> > >  /**
> > > + * Validate flow rule and fill flow structure accordingly.
> > > + *
> > > + * @param dev
> > > + *   Pointer to Ethernet device.
> > > + * @param[out] flow
> > > + *   Pointer to flow structure.
> > > + * @param flow_size
> > > + *   Size of allocated space for @p flow.
> > > + * @param[in] attr
> > > + *   Flow rule attributes.
> > > + * @param[in] pattern
> > > + *   Pattern specification (list terminated by the END pattern item).
> > > + * @param[in] actions
> > > + *   Associated actions (list terminated by the END action).
> > > + * @param[out] error
> > > + *   Perform verbose error reporting if not NULL.
> > > + *
> > > + * @return
> > > + *   A positive value representing the size of the flow object in bytes
> > > + *   regardless of @p flow_size on success, a negative errno value otherwise
> > > + *   and rte_errno is set.
> > > + */
> > > +static int
> > > +mlx5_flow_merge_switch(struct rte_eth_dev *dev,
> > > +		       struct rte_flow *flow,
> > > +		       size_t flow_size,
> > > +		       const struct rte_flow_attr *attr,
> > > +		       const struct rte_flow_item pattern[],
> > > +		       const struct rte_flow_action actions[],
> > > +		       struct rte_flow_error *error)
> > > +{
> > > +	struct priv *priv = dev->data->dev_private;
> > > +	unsigned int n = mlx5_domain_to_port_id(priv->domain_id, NULL, 0);
> > > +	uint16_t port_list[!n + n];
> > > +	struct mlx5_nl_flow_ptoi ptoi[!n + n + 1];
> > > +	size_t off = RTE_ALIGN_CEIL(sizeof(*flow), alignof(max_align_t));
> > > +	unsigned int i;
> > > +	unsigned int own = 0;
> > > +	int ret;
> > > +
> > > +	/* At least one port is needed when no switch domain is present. */
> > > +	if (!n) {
> > > +		n = 1;
> > > +		port_list[0] = dev->data->port_id;
> > > +	} else {
> > > +		n = mlx5_domain_to_port_id(priv->domain_id, port_list, n);
> > > +		if (n > RTE_DIM(port_list))
> > > +			n = RTE_DIM(port_list);
> > > +	}
> > > +	for (i = 0; i != n; ++i) {
> > > +		struct rte_eth_dev_info dev_info;
> > > +
> > > +		rte_eth_dev_info_get(port_list[i], &dev_info);
> > > +		if (port_list[i] == dev->data->port_id)
> > > +			own = i;
> > > +		ptoi[i].port_id = port_list[i];
> > > +		ptoi[i].ifindex = dev_info.if_index;
> > > +	}
> > > +	/* Ensure first entry of ptoi[] is the current device. */
> > > +	if (own) {
> > > +		ptoi[n] = ptoi[0];
> > > +		ptoi[0] = ptoi[own];
> > > +		ptoi[own] = ptoi[n];
> > > +	}
> > > +	/* An entry with zero ifindex terminates ptoi[]. */
> > > +	ptoi[n].port_id = 0;
> > > +	ptoi[n].ifindex = 0;
> > > +	if (flow_size < off)
> > > +		flow_size = 0;
> > > +	ret = mlx5_nl_flow_transpose((uint8_t *)flow + off,
> > > +				     flow_size ? flow_size - off : 0,
> > > +				     ptoi, attr, pattern, actions, error);
> > > +	if (ret < 0)
> > > +		return ret;
> > 
> > So, there's an assumption that the buffer allocated outside of this API is
> > enough to include all the messages in mlx5_nl_flow_transpose(), right? If
> > flow_size isn't enough, buf_tmp will be used and _transpose() doesn't return
> > an error but the required size. That sounds confusing; it may need a change or
> > clearer documentation.
> 
> Well, isn't it already documented? Besides, these are the usual snprintf()
> semantics used everywhere in these files; I think this was a major topic of
> discussion with Nelio on the flow rework series :)
> 
> buf_tmp[] is internal to mlx5_nl_flow_transpose() and used as a fallback to
> complete a pass when the input buffer is not large enough (including the
> zero-sized case). Having a valid buffer is a constraint imposed by libmnl,
> because we badly want to know how much space will be needed assuming the
> flow rule was successfully processed.
> 
> Without libmnl, the helpers it provides would have been written in a way
> that doesn't require buf_tmp[]. However libmnl is just too convenient to
> pass up, hence this compromise.
> 
> (just to remind onlookers, we want to allocate the minimum amount of memory
> we possibly can for resources needed by each flow rule, and do so through a
> single allocation, end goal being to support millions of flow rules while
> wasting as little memory as possible.)
> 
> <snip>
> > > diff --git a/drivers/net/mlx5/mlx5_nl_flow.c b/drivers/net/mlx5/mlx5_nl_flow.c
> > > index 7a8683b03..1fc62fb0a 100644
> > > --- a/drivers/net/mlx5/mlx5_nl_flow.c
> > > +++ b/drivers/net/mlx5/mlx5_nl_flow.c
> > > @@ -5,7 +5,9 @@
> > >  
> > >  #include <errno.h>
> > >  #include <libmnl/libmnl.h>
> > > +#include <linux/if_ether.h>
> > >  #include <linux/netlink.h>
> > > +#include <linux/pkt_cls.h>
> > >  #include <linux/pkt_sched.h>
> > >  #include <linux/rtnetlink.h>
> > >  #include <stdalign.h>
> > > @@ -14,11 +16,248 @@
> > >  #include <stdlib.h>
> > >  #include <sys/socket.h>
> > >  
> > > +#include <rte_byteorder.h>
> > >  #include <rte_errno.h>
> > >  #include <rte_flow.h>
> > >  
> > >  #include "mlx5.h"
> > >  
> > > +/** Parser state definitions for mlx5_nl_flow_trans[]. */
> > > +enum mlx5_nl_flow_trans {
> > > +	INVALID,
> > > +	BACK,
> > > +	ATTR,
> > > +	PATTERN,
> > > +	ITEM_VOID,
> > > +	ACTIONS,
> > > +	ACTION_VOID,
> > > +	END,
> > > +};
> > > +
> > > +#define TRANS(...) (const enum mlx5_nl_flow_trans []){ __VA_ARGS__, INVALID, }
> > > +
> > > +#define PATTERN_COMMON \
> > > +	ITEM_VOID, ACTIONS
> > > +#define ACTIONS_COMMON \
> > > +	ACTION_VOID, END
> > > +
> > > +/** Parser state transitions used by mlx5_nl_flow_transpose(). */
> > > +static const enum mlx5_nl_flow_trans *const mlx5_nl_flow_trans[] = {
> > > +	[INVALID] = NULL,
> > > +	[BACK] = NULL,
> > > +	[ATTR] = TRANS(PATTERN),
> > > +	[PATTERN] = TRANS(PATTERN_COMMON),
> > > +	[ITEM_VOID] = TRANS(BACK),
> > > +	[ACTIONS] = TRANS(ACTIONS_COMMON),
> > > +	[ACTION_VOID] = TRANS(BACK),
> > > +	[END] = NULL,
> > > +};
> > > +
> > > +/**
> > > + * Transpose flow rule description to rtnetlink message.
> > > + *
> > > + * This function transposes a flow rule description to a traffic control
> > > + * (TC) filter creation message ready to be sent over Netlink.
> > > + *
> > > + * Target interface is specified as the first entry of the @p ptoi table.
> > > + * Subsequent entries enable this function to resolve other DPDK port IDs
> > > + * found in the flow rule.
> > > + *
> > > + * @param[out] buf
> > > + *   Output message buffer. May be NULL when @p size is 0.
> > > + * @param size
> > > + *   Size of @p buf. Message may be truncated if not large enough.
> > > + * @param[in] ptoi
> > > + *   DPDK port ID to network interface index translation table. This table
> > > + *   is terminated by an entry with a zero ifindex value.
> > > + * @param[in] attr
> > > + *   Flow rule attributes.
> > > + * @param[in] pattern
> > > + *   Pattern specification.
> > > + * @param[in] actions
> > > + *   Associated actions.
> > > + * @param[out] error
> > > + *   Perform verbose error reporting if not NULL.
> > > + *
> > > + * @return
> > > + *   A positive value representing the exact size of the message in bytes
> > > + *   regardless of the @p size parameter on success, a negative errno value
> > > + *   otherwise and rte_errno is set.
> > > + */
> > > +int
> > > +mlx5_nl_flow_transpose(void *buf,
> > > +		       size_t size,
> > > +		       const struct mlx5_nl_flow_ptoi *ptoi,
> > > +		       const struct rte_flow_attr *attr,
> > > +		       const struct rte_flow_item *pattern,
> > > +		       const struct rte_flow_action *actions,
> > > +		       struct rte_flow_error *error)
> > > +{
> > > +	alignas(struct nlmsghdr)
> > > +	uint8_t buf_tmp[MNL_SOCKET_BUFFER_SIZE];
> > > +	const struct rte_flow_item *item;
> > > +	const struct rte_flow_action *action;
> > > +	unsigned int n;
> > > +	struct nlattr *na_flower;
> > > +	struct nlattr *na_flower_act;
> > > +	const enum mlx5_nl_flow_trans *trans;
> > > +	const enum mlx5_nl_flow_trans *back;
> > > +
> > > +	if (!size)
> > > +		goto error_nobufs;
> > > +init:
> > > +	item = pattern;
> > > +	action = actions;
> > > +	n = 0;
> > > +	na_flower = NULL;
> > > +	na_flower_act = NULL;
> > > +	trans = TRANS(ATTR);
> > > +	back = trans;
> > > +trans:
> > > +	switch (trans[n++]) {
> > > +		struct nlmsghdr *nlh;
> > > +		struct tcmsg *tcm;
> > > +
> > > +	case INVALID:
> > > +		if (item->type)
> > > +			return rte_flow_error_set
> > > +				(error, ENOTSUP, RTE_FLOW_ERROR_TYPE_ITEM,
> > > +				 item, "unsupported pattern item combination");
> > > +		else if (action->type)
> > > +			return rte_flow_error_set
> > > +				(error, ENOTSUP, RTE_FLOW_ERROR_TYPE_ACTION,
> > > +				 action, "unsupported action combination");
> > > +		return rte_flow_error_set
> > > +			(error, ENOTSUP, RTE_FLOW_ERROR_TYPE_UNSPECIFIED, NULL,
> > > +			 "flow rule lacks some kind of fate action");
> > > +	case BACK:
> > > +		trans = back;
> > > +		n = 0;
> > > +		goto trans;
> > > +	case ATTR:
> > > +		/*
> > > +		 * Supported attributes: no groups, some priorities and
> > > +		 * ingress only. Don't care about transfer as it is the
> > > +		 * caller's problem.
> > > +		 */
> > > +		if (attr->group)
> > > +			return rte_flow_error_set
> > > +				(error, ENOTSUP,
> > > +				 RTE_FLOW_ERROR_TYPE_ATTR_GROUP,
> > > +				 attr, "groups are not supported");
> > > +		if (attr->priority > 0xfffe)
> > > +			return rte_flow_error_set
> > > +				(error, ENOTSUP,
> > > +				 RTE_FLOW_ERROR_TYPE_ATTR_PRIORITY,
> > > +				 attr, "lowest priority level is 0xfffe");
> > > +		if (!attr->ingress)
> > > +			return rte_flow_error_set
> > > +				(error, ENOTSUP,
> > > +				 RTE_FLOW_ERROR_TYPE_ATTR_INGRESS,
> > > +				 attr, "only ingress is supported");
> > > +		if (attr->egress)
> > > +			return rte_flow_error_set
> > > +				(error, ENOTSUP,
> > > +				 RTE_FLOW_ERROR_TYPE_ATTR_INGRESS,
> > > +				 attr, "egress is not supported");
> > > +		if (size < mnl_nlmsg_size(sizeof(*tcm)))
> > > +			goto error_nobufs;
> > > +		nlh = mnl_nlmsg_put_header(buf);
> > > +		nlh->nlmsg_type = 0;
> > > +		nlh->nlmsg_flags = 0;
> > > +		nlh->nlmsg_seq = 0;
> > > +		tcm = mnl_nlmsg_put_extra_header(nlh, sizeof(*tcm));
> > > +		tcm->tcm_family = AF_UNSPEC;
> > > +		tcm->tcm_ifindex = ptoi[0].ifindex;
> > > +		/*
> > > +		 * Let kernel pick a handle by default. A predictable handle
> > > +		 * can be set by the caller on the resulting buffer through
> > > +		 * mlx5_nl_flow_brand().
> > > +		 */
> > > +		tcm->tcm_handle = 0;
> > > +		tcm->tcm_parent = TC_H_MAKE(TC_H_INGRESS, TC_H_MIN_INGRESS);
> > > +		/*
> > > +		 * Priority cannot be zero to prevent the kernel from
> > > +		 * picking one automatically.
> > > +		 */
> > > +		tcm->tcm_info = TC_H_MAKE((attr->priority + 1) << 16,
> > > +					  RTE_BE16(ETH_P_ALL));
> > > +		break;
> > > +	case PATTERN:
> > > +		if (!mnl_attr_put_strz_check(buf, size, TCA_KIND, "flower"))
> > > +			goto error_nobufs;
> > > +		na_flower = mnl_attr_nest_start_check(buf, size, TCA_OPTIONS);
> > > +		if (!na_flower)
> > > +			goto error_nobufs;
> > > +		if (!mnl_attr_put_u32_check(buf, size, TCA_FLOWER_FLAGS,
> > > +					    TCA_CLS_FLAGS_SKIP_SW))
> > > +			goto error_nobufs;
> > > +		break;
> > > +	case ITEM_VOID:
> > > +		if (item->type != RTE_FLOW_ITEM_TYPE_VOID)
> > > +			goto trans;
> > > +		++item;
> > > +		break;
> > > +	case ACTIONS:
> > > +		if (item->type != RTE_FLOW_ITEM_TYPE_END)
> > > +			goto trans;
> > > +		assert(na_flower);
> > > +		assert(!na_flower_act);
> > > +		na_flower_act =
> > > +			mnl_attr_nest_start_check(buf, size, TCA_FLOWER_ACT);
> > > +		if (!na_flower_act)
> > > +			goto error_nobufs;
> > > +		break;
> > > +	case ACTION_VOID:
> > > +		if (action->type != RTE_FLOW_ACTION_TYPE_VOID)
> > > +			goto trans;
> > > +		++action;
> > > +		break;
> > > +	case END:
> > > +		if (item->type != RTE_FLOW_ITEM_TYPE_END ||
> > > +		    action->type != RTE_FLOW_ACTION_TYPE_END)
> > > +			goto trans;
> > > +		if (na_flower_act)
> > > +			mnl_attr_nest_end(buf, na_flower_act);
> > > +		if (na_flower)
> > > +			mnl_attr_nest_end(buf, na_flower);
> > > +		nlh = buf;
> > > +		return nlh->nlmsg_len;
> > > +	}
> > > +	back = trans;
> > > +	trans = mlx5_nl_flow_trans[trans[n - 1]];
> > > +	n = 0;
> > > +	goto trans;
> > > +error_nobufs:
> > > +	if (buf != buf_tmp) {
> > > +		buf = buf_tmp;
> > > +		size = sizeof(buf_tmp);
> > > +		goto init;
> > > +	}
> > 
> > Continuing my comment above.
> > This part is unclear. It looks to me that this func does:
> > 
> > 1) if size is zero, consider it as a testing call to know the amount of memory
> > required.
> 
> Yeah, in fact this one is a shortcut to speed up this specific scenario as
> it happens all the time in the two-pass use case. You can lump it with 2).
> 
> > 2) if size isn't zero but not enough, it stops writing to buf and start over to
> > return the amount of memory required instead of returning error.
> > 3) if size isn't zero and enough, it fills in buf.
> > 
> > Do I correctly understand?
> 
> Yes. Another minor note for 2), the returned buffer is also filled up to the
> point of failure (mimics snprintf()).
> 
> Perhaps the following snippet can better summarize the envisioned approach:
> 
>  int ret = snprintf(NULL, 0, "something", ...);
> 
>  if (ret < 0) {
>      goto court;
>  } else {
>      char buf[ret];
> 
>      snprintf(buf, sizeof(buf), "something", ...); /* Guaranteed. */
>      [...]
>  }

I know you and Nelio mimicked snprintf(), but since _merge() is an internal
API rather than one exposed to users, I didn't think it necessarily had to
behave that way. I hoped to have it used either for testing (querying the
size) or for real translation - cases 1) and 3) - with no possibility of 2);
case 2) would then have been handled by assert(). I believe this could've
made the code simpler.


However, as I already acked Nelio's patchset, agreed on the idea and Nelio
already documented the behavior,

Acked-by: Yongseok Koh <yskoh@mellanox.com>

Thanks

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [dpdk-dev] [PATCH 5/6] net/mlx5: add VLAN item and actions to switch flow rules
  2018-07-12 10:47     ` Adrien Mazarguil
@ 2018-07-12 18:49       ` Yongseok Koh
  0 siblings, 0 replies; 33+ messages in thread
From: Yongseok Koh @ 2018-07-12 18:49 UTC (permalink / raw)
  To: Adrien Mazarguil; +Cc: Shahaf Shuler, Nelio Laranjeiro, dev

On Thu, Jul 12, 2018 at 12:47:09PM +0200, Adrien Mazarguil wrote:
> On Wed, Jul 11, 2018 at 06:10:25PM -0700, Yongseok Koh wrote:
> > On Wed, Jun 27, 2018 at 08:08:18PM +0200, Adrien Mazarguil wrote:
> > > This enables flow rules to explicitly match VLAN traffic (VLAN pattern
> > > item) and perform various operations on VLAN headers at the switch level
> > > (OF_POP_VLAN, OF_PUSH_VLAN, OF_SET_VLAN_VID and OF_SET_VLAN_PCP actions).
> > > 
> > > Testpmd examples:
> > > 
> > > - Directing all VLAN traffic received on port ID 1 to port ID 0:
> > > 
> > >   flow create 1 ingress transfer pattern eth / vlan / end actions
> > >      port_id id 0 / end
> > > 
> > > - Adding a VLAN header to IPv6 traffic received on port ID 1 and directing
> > >   it to port ID 0:
> > > 
> > >   flow create 1 ingress transfer pattern eth / ipv6 / end actions
> > >      of_push_vlan ethertype 0x8100 / of_set_vlan_vid / port_id id 0 / end
> > > 
> > > Signed-off-by: Adrien Mazarguil <adrien.mazarguil@6wind.com>
> <snip>
> > > @@ -681,6 +772,84 @@ mlx5_nl_flow_transpose(void *buf,
> > >  		mnl_attr_nest_end(buf, act_index);
> > >  		++action;
> > >  		break;
> > > +	case ACTION_OF_POP_VLAN:
> > > +		if (action->type != RTE_FLOW_ACTION_TYPE_OF_POP_VLAN)
> > > +			goto trans;
> > > +		conf.of_push_vlan = NULL;
> > > +		i = TCA_VLAN_ACT_POP;
> > > +		goto action_of_vlan;
> > > +	case ACTION_OF_PUSH_VLAN:
> > > +		if (action->type != RTE_FLOW_ACTION_TYPE_OF_PUSH_VLAN)
> > > +			goto trans;
> > > +		conf.of_push_vlan = action->conf;
> > > +		i = TCA_VLAN_ACT_PUSH;
> > > +		goto action_of_vlan;
> > > +	case ACTION_OF_SET_VLAN_VID:
> > > +		if (action->type != RTE_FLOW_ACTION_TYPE_OF_SET_VLAN_VID)
> > > +			goto trans;
> > > +		conf.of_set_vlan_vid = action->conf;
> > > +		if (na_vlan_id)
> > > +			goto override_na_vlan_id;
> > > +		i = TCA_VLAN_ACT_MODIFY;
> > > +		goto action_of_vlan;
> > > +	case ACTION_OF_SET_VLAN_PCP:
> > > +		if (action->type != RTE_FLOW_ACTION_TYPE_OF_SET_VLAN_PCP)
> > > +			goto trans;
> > > +		conf.of_set_vlan_pcp = action->conf;
> > > +		if (na_vlan_priority)
> > > +			goto override_na_vlan_priority;
> > > +		i = TCA_VLAN_ACT_MODIFY;
> > > +		goto action_of_vlan;
> > > +action_of_vlan:
> > > +		act_index =
> > > +			mnl_attr_nest_start_check(buf, size, act_index_cur++);
> > > +		if (!act_index ||
> > > +		    !mnl_attr_put_strz_check(buf, size, TCA_ACT_KIND, "vlan"))
> > > +			goto error_nobufs;
> > > +		act = mnl_attr_nest_start_check(buf, size, TCA_ACT_OPTIONS);
> > > +		if (!act)
> > > +			goto error_nobufs;
> > > +		if (!mnl_attr_put_check(buf, size, TCA_VLAN_PARMS,
> > > +					sizeof(struct tc_vlan),
> > > +					&(struct tc_vlan){
> > > +						.action = TC_ACT_PIPE,
> > > +						.v_action = i,
> > > +					}))
> > > +			goto error_nobufs;
> > > +		if (i == TCA_VLAN_ACT_POP) {
> > > +			mnl_attr_nest_end(buf, act);
> > > +			++action;
> > > +			break;
> > > +		}
> > > +		if (i == TCA_VLAN_ACT_PUSH &&
> > > +		    !mnl_attr_put_u16_check(buf, size,
> > > +					    TCA_VLAN_PUSH_VLAN_PROTOCOL,
> > > +					    conf.of_push_vlan->ethertype))
> > > +			goto error_nobufs;
> > > +		na_vlan_id = mnl_nlmsg_get_payload_tail(buf);
> > > +		if (!mnl_attr_put_u16_check(buf, size, TCA_VLAN_PAD, 0))
> > > +			goto error_nobufs;
> > > +		na_vlan_priority = mnl_nlmsg_get_payload_tail(buf);
> > > +		if (!mnl_attr_put_u8_check(buf, size, TCA_VLAN_PAD, 0))
> > > +			goto error_nobufs;
> > > +		mnl_attr_nest_end(buf, act);
> > > +		mnl_attr_nest_end(buf, act_index);
> > > +		if (action->type == RTE_FLOW_ACTION_TYPE_OF_SET_VLAN_VID) {
> > > +override_na_vlan_id:
> > > +			na_vlan_id->nla_type = TCA_VLAN_PUSH_VLAN_ID;
> > > +			*(uint16_t *)mnl_attr_get_payload(na_vlan_id) =
> > > +				rte_be_to_cpu_16
> > > +				(conf.of_set_vlan_vid->vlan_vid);
> > > +		} else if (action->type ==
> > > +			   RTE_FLOW_ACTION_TYPE_OF_SET_VLAN_PCP) {
> > > +override_na_vlan_priority:
> > > +			na_vlan_priority->nla_type =
> > > +				TCA_VLAN_PUSH_VLAN_PRIORITY;
> > > +			*(uint8_t *)mnl_attr_get_payload(na_vlan_priority) =
> > > +				conf.of_set_vlan_pcp->vlan_pcp;
> > > +		}
> > > +		++action;
> > > +		break;
> > 
> > I'm wondering if there's no need to check the existence of VLAN in pattern when
> > having VLAN modification actions. For example, if flow has POP_VLAN action, its
> > pattern has to have VLAN item, doesn't it?
> 
> Not necessarily. For instance there is no need to explicitly match VLAN
> traffic if you somehow guarantee that only VLAN traffic goes through,
> e.g. in case peer is configured to always push a VLAN header regardless,
> requesting explicit match in this sense can be thought as an unnecessary
> limitation.
> 
> I agree this check would have been mandatory if this check wasn't performed
> elsewhere, as discussed below:

From a user's perspective, it may not be necessary to specify VLAN in the
pattern as specifying the POP_VLAN action implies it. But from a device/PMD
perspective, there could be two options: a) complain about the violation, or
b) silently add a VLAN pattern so as not to cause unexpected behavior in the
device.

> > Even though kernel driver has such
> > validation checks, mlx5_flow_validate() can't validate it.
> 
> Yes, note this is consistent with the rest of this particular implementation
> (VLAN POP is not an exception). This entire code is a somewhat generic
> rte_flow-to-TC converter which doesn't check HW capabilities at all, it
> doesn't check the private structure, type of device and so on. This role is
> left to the kernel implementation and (optionally) the caller function.
> 
> The only explicit checks are performed at conversion stage; if something
> cannot be converted from rte_flow to TC, as is the case for VLAN DEI (hence
> the 0xefff mask). The rest is implicit, for instance I didn't bother to
> implement all pattern items and fields, only the bare minimum.
> 
> Further, ConnectX-4 and ConnectX-5 have different capabilities. The former
> only supports offloading destination MAC matching and the drop action at the
> switch level. Depending on driver/firmware combinations, such and such
> feature may or may not be present.
> 
> Checking everything in order to print nice error messages would have been
> nice, but would have required a lot of effort. Hence the decision to
> restrict the scope of this function.

I worried about a case where a flow passes the validation call but the
creation call returns an error (from the kernel). That would violate a
requirement of the rte_flow library.

However, I agree that this implementation should have a limited scope for now
as the current library/kernel implementations are quite divergent. We have two
separate paths to configure flows and they should be unified. The good news is
we'll get the unified path eventually.

> > In the PRM,
> > 	8.18.2.7 Packet Classification Ambiguities
> > 	...
> > 	In addition, a flow should not match or attempt to modify (Modify Header
> > 	action, Pop VLAN action) non-existing fields of a packet, as defined by
> > 	the packet classification process.
> > 	...
> 
> Fortunately this code is not running on top of PRM :)
> 
> This is my opinion anyway. If you think we need extra safety for (and only
> for) VLAN POP, I'll add it, please confirm.

Well, I'll leave the decision to you.

Acked-by: Yongseok Koh <yskoh@mellanox.com>

Thanks

* [dpdk-dev] [PATCH v2 0/6] net/mlx5: add support for switch flow rules
  2018-06-27 18:08 [dpdk-dev] [PATCH 0/6] net/mlx5: add support for switch flow rules Adrien Mazarguil
                   ` (6 preceding siblings ...)
  2018-06-28  9:05 ` [dpdk-dev] [PATCH 0/6] net/mlx5: add support for " Nélio Laranjeiro
@ 2018-07-13  9:40 ` Adrien Mazarguil
  2018-07-13  9:40   ` [dpdk-dev] [PATCH v2 1/6] net/mlx5: lay groundwork for switch offloads Adrien Mazarguil
                     ` (6 more replies)
  7 siblings, 7 replies; 33+ messages in thread
From: Adrien Mazarguil @ 2018-07-13  9:40 UTC (permalink / raw)
  To: Shahaf Shuler; +Cc: Nelio Laranjeiro, Yongseok Koh, dev

This series adds support for switch flow rules, that is, rte_flow rules
applied to mlx5 devices at the switch level.

It allows applications to offload traffic redirection between DPDK ports in
hardware, while optionally modifying it (e.g. performing encap/decap).

For this to work, involved DPDK ports must be part of the same switch
domain, as is the case with port representors, and the transfer attribute
must be requested on flow rules.

Also since the mlx5 switch is controlled through Netlink instead of Verbs,
and given how tedious formatting Netlink messages is, a new dependency is
added to mlx5: libmnl. See relevant patch.

v2 changes:

- Mostly compilation fixes for missing Netlink definitions on older systems.
- Reduced stack consumption.
- Adapted series to rely on mlx5_dev_to_port_id() instead of
  mlx5_dev_to_domain_id().
- See relevant patches for more information.

Adrien Mazarguil (6):
  net/mlx5: lay groundwork for switch offloads
  net/mlx5: add framework for switch flow rules
  net/mlx5: add fate actions to switch flow rules
  net/mlx5: add L2-L4 pattern items to switch flow rules
  net/mlx5: add VLAN item and actions to switch flow rules
  net/mlx5: add port ID pattern item to switch flow rules

 drivers/net/mlx5/Makefile       |  142 ++++
 drivers/net/mlx5/mlx5.c         |   32 +
 drivers/net/mlx5/mlx5.h         |   28 +
 drivers/net/mlx5/mlx5_flow.c    |  111 +++
 drivers/net/mlx5/mlx5_nl_flow.c | 1247 ++++++++++++++++++++++++++++++++++
 mk/rte.app.mk                   |    2 +-
 6 files changed, 1561 insertions(+), 1 deletion(-)
 create mode 100644 drivers/net/mlx5/mlx5_nl_flow.c

-- 
2.11.0

* [dpdk-dev] [PATCH v2 1/6] net/mlx5: lay groundwork for switch offloads
  2018-07-13  9:40 ` [dpdk-dev] [PATCH v2 " Adrien Mazarguil
@ 2018-07-13  9:40   ` Adrien Mazarguil
  2018-07-14  1:29     ` Yongseok Koh
  2018-07-23 21:40     ` Ferruh Yigit
  2018-07-13  9:40   ` [dpdk-dev] [PATCH v2 2/6] net/mlx5: add framework for switch flow rules Adrien Mazarguil
                     ` (5 subsequent siblings)
  6 siblings, 2 replies; 33+ messages in thread
From: Adrien Mazarguil @ 2018-07-13  9:40 UTC (permalink / raw)
  To: Shahaf Shuler; +Cc: Nelio Laranjeiro, Yongseok Koh, dev

With mlx5, unlike normal flow rules implemented through Verbs for traffic
emitted and received by the application, those targeting different logical
ports of the device (VF representors for instance) are offloaded at the
switch level and must be configured through Netlink (TC interface).

This patch adds preliminary support to manage such flow rules through the
flow API (rte_flow).

Instead of rewriting tons of Netlink helpers and as previously suggested by
Stephen [1], this patch introduces a new dependency to libmnl [2]
(LGPL-2.1) when compiling mlx5.

[1] https://mails.dpdk.org/archives/dev/2018-March/092676.html
[2] https://netfilter.org/projects/libmnl/

Signed-off-by: Adrien Mazarguil <adrien.mazarguil@6wind.com>
Acked-by: Nelio Laranjeiro <nelio.laranjeiro@6wind.com>
Cc: Yongseok Koh <yskoh@mellanox.com>
--
v2 changes:

- Added NETLINK_CAP_ACK definition if missing from the host system. This
  parameter is also not mandatory anymore and won't prevent creation of
  NL sockets when not supported.
- Modified mlx5_nl_flow_nl_ack() and mlx5_nl_flow_init() to consume the
  least amount of stack space based on message size, instead of the fixed
  MNL_SOCKET_BUFFER_SIZE which is quite large.
---
 drivers/net/mlx5/Makefile       |   2 +
 drivers/net/mlx5/mlx5.c         |  32 ++++++++
 drivers/net/mlx5/mlx5.h         |  10 +++
 drivers/net/mlx5/mlx5_nl_flow.c | 147 +++++++++++++++++++++++++++++++++++
 mk/rte.app.mk                   |   2 +-
 5 files changed, 192 insertions(+), 1 deletion(-)

diff --git a/drivers/net/mlx5/Makefile b/drivers/net/mlx5/Makefile
index 9e274964b..8d3cb219b 100644
--- a/drivers/net/mlx5/Makefile
+++ b/drivers/net/mlx5/Makefile
@@ -33,6 +33,7 @@ SRCS-$(CONFIG_RTE_LIBRTE_MLX5_PMD) += mlx5_mr.c
 SRCS-$(CONFIG_RTE_LIBRTE_MLX5_PMD) += mlx5_flow.c
 SRCS-$(CONFIG_RTE_LIBRTE_MLX5_PMD) += mlx5_socket.c
 SRCS-$(CONFIG_RTE_LIBRTE_MLX5_PMD) += mlx5_nl.c
+SRCS-$(CONFIG_RTE_LIBRTE_MLX5_PMD) += mlx5_nl_flow.c
 
 ifeq ($(CONFIG_RTE_LIBRTE_MLX5_DLOPEN_DEPS),y)
 INSTALL-$(CONFIG_RTE_LIBRTE_MLX5_PMD)-lib += $(LIB_GLUE)
@@ -56,6 +57,7 @@ LDLIBS += -ldl
 else
 LDLIBS += -libverbs -lmlx5
 endif
+LDLIBS += -lmnl
 LDLIBS += -lrte_eal -lrte_mbuf -lrte_mempool -lrte_ring
 LDLIBS += -lrte_ethdev -lrte_net -lrte_kvargs
 LDLIBS += -lrte_bus_pci
diff --git a/drivers/net/mlx5/mlx5.c b/drivers/net/mlx5/mlx5.c
index 6d3421fae..8fb8c91eb 100644
--- a/drivers/net/mlx5/mlx5.c
+++ b/drivers/net/mlx5/mlx5.c
@@ -282,6 +282,8 @@ mlx5_dev_close(struct rte_eth_dev *dev)
 		close(priv->nl_socket_route);
 	if (priv->nl_socket_rdma >= 0)
 		close(priv->nl_socket_rdma);
+	if (priv->mnl_socket)
+		mlx5_nl_flow_socket_destroy(priv->mnl_socket);
 	ret = mlx5_hrxq_ibv_verify(dev);
 	if (ret)
 		DRV_LOG(WARNING, "port %u some hash Rx queue still remain",
@@ -1116,6 +1118,34 @@ mlx5_dev_spawn(struct rte_device *dpdk_dev,
 	claim_zero(mlx5_mac_addr_add(eth_dev, &mac, 0, 0));
 	if (vf && config.vf_nl_en)
 		mlx5_nl_mac_addr_sync(eth_dev);
+	priv->mnl_socket = mlx5_nl_flow_socket_create();
+	if (!priv->mnl_socket) {
+		err = -rte_errno;
+		DRV_LOG(WARNING,
+			"flow rules relying on switch offloads will not be"
+			" supported: cannot open libmnl socket: %s",
+			strerror(rte_errno));
+	} else {
+		struct rte_flow_error error;
+		unsigned int ifindex = mlx5_ifindex(eth_dev);
+
+		if (!ifindex) {
+			err = -rte_errno;
+			error.message =
+				"cannot retrieve network interface index";
+		} else {
+			err = mlx5_nl_flow_init(priv->mnl_socket, ifindex,
+						&error);
+		}
+		if (err) {
+			DRV_LOG(WARNING,
+				"flow rules relying on switch offloads will"
+				" not be supported: %s: %s",
+				error.message, strerror(rte_errno));
+			mlx5_nl_flow_socket_destroy(priv->mnl_socket);
+			priv->mnl_socket = NULL;
+		}
+	}
 	TAILQ_INIT(&priv->flows);
 	TAILQ_INIT(&priv->ctrl_flows);
 	/* Hint libmlx5 to use PMD allocator for data plane resources */
@@ -1168,6 +1198,8 @@ mlx5_dev_spawn(struct rte_device *dpdk_dev,
 			close(priv->nl_socket_route);
 		if (priv->nl_socket_rdma >= 0)
 			close(priv->nl_socket_rdma);
+		if (priv->mnl_socket)
+			mlx5_nl_flow_socket_destroy(priv->mnl_socket);
 		if (own_domain_id)
 			claim_zero(rte_eth_switch_domain_free(priv->domain_id));
 		rte_free(priv);
diff --git a/drivers/net/mlx5/mlx5.h b/drivers/net/mlx5/mlx5.h
index 131be334c..98b6ec07d 100644
--- a/drivers/net/mlx5/mlx5.h
+++ b/drivers/net/mlx5/mlx5.h
@@ -156,6 +156,8 @@ struct mlx5_drop {
 	struct mlx5_rxq_ibv *rxq; /* Verbs Rx queue. */
 };
 
+struct mnl_socket;
+
 struct priv {
 	LIST_ENTRY(priv) mem_event_cb; /* Called by memory event callback. */
 	struct rte_eth_dev_data *dev_data;  /* Pointer to device data. */
@@ -215,6 +217,7 @@ struct priv {
 	int nl_socket_rdma; /* Netlink socket (NETLINK_RDMA). */
 	int nl_socket_route; /* Netlink socket (NETLINK_ROUTE). */
 	uint32_t nl_sn; /* Netlink message sequence number. */
+	struct mnl_socket *mnl_socket; /* Libmnl socket. */
 };
 
 #define PORT_ID(priv) ((priv)->dev_data->port_id)
@@ -380,4 +383,11 @@ unsigned int mlx5_nl_ifindex(int nl, const char *name);
 int mlx5_nl_switch_info(int nl, unsigned int ifindex,
 			struct mlx5_switch_info *info);
 
+/* mlx5_nl_flow.c */
+
+int mlx5_nl_flow_init(struct mnl_socket *nl, unsigned int ifindex,
+		      struct rte_flow_error *error);
+struct mnl_socket *mlx5_nl_flow_socket_create(void);
+void mlx5_nl_flow_socket_destroy(struct mnl_socket *nl);
+
 #endif /* RTE_PMD_MLX5_H_ */
diff --git a/drivers/net/mlx5/mlx5_nl_flow.c b/drivers/net/mlx5/mlx5_nl_flow.c
new file mode 100644
index 000000000..60a4493e5
--- /dev/null
+++ b/drivers/net/mlx5/mlx5_nl_flow.c
@@ -0,0 +1,147 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright 2018 6WIND S.A.
+ * Copyright 2018 Mellanox Technologies, Ltd
+ */
+
+#include <errno.h>
+#include <libmnl/libmnl.h>
+#include <linux/netlink.h>
+#include <linux/pkt_sched.h>
+#include <linux/rtnetlink.h>
+#include <stdalign.h>
+#include <stddef.h>
+#include <stdint.h>
+#include <stdlib.h>
+#include <sys/socket.h>
+
+#include <rte_errno.h>
+#include <rte_flow.h>
+
+#include "mlx5.h"
+
+/* Normally found in linux/netlink.h. */
+#ifndef NETLINK_CAP_ACK
+#define NETLINK_CAP_ACK 10
+#endif
+
+/**
+ * Send Netlink message with acknowledgment.
+ *
+ * @param nl
+ *   Libmnl socket to use.
+ * @param nlh
+ *   Message to send. This function always raises the NLM_F_ACK flag before
+ *   sending.
+ *
+ * @return
+ *   0 on success, a negative errno value otherwise and rte_errno is set.
+ */
+static int
+mlx5_nl_flow_nl_ack(struct mnl_socket *nl, struct nlmsghdr *nlh)
+{
+	alignas(struct nlmsghdr)
+	uint8_t ans[mnl_nlmsg_size(sizeof(struct nlmsgerr)) +
+		    nlh->nlmsg_len - sizeof(*nlh)];
+	uint32_t seq = random();
+	int ret;
+
+	nlh->nlmsg_flags |= NLM_F_ACK;
+	nlh->nlmsg_seq = seq;
+	ret = mnl_socket_sendto(nl, nlh, nlh->nlmsg_len);
+	if (ret != -1)
+		ret = mnl_socket_recvfrom(nl, ans, sizeof(ans));
+	if (ret != -1)
+		ret = mnl_cb_run
+			(ans, ret, seq, mnl_socket_get_portid(nl), NULL, NULL);
+	if (!ret)
+		return 0;
+	rte_errno = errno;
+	return -rte_errno;
+}
+
+/**
+ * Initialize ingress qdisc of a given network interface.
+ *
+ * @param nl
+ *   Libmnl socket of the @p NETLINK_ROUTE kind.
+ * @param ifindex
+ *   Index of network interface to initialize.
+ * @param[out] error
+ *   Perform verbose error reporting if not NULL.
+ *
+ * @return
+ *   0 on success, a negative errno value otherwise and rte_errno is set.
+ */
+int
+mlx5_nl_flow_init(struct mnl_socket *nl, unsigned int ifindex,
+		  struct rte_flow_error *error)
+{
+	struct nlmsghdr *nlh;
+	struct tcmsg *tcm;
+	alignas(struct nlmsghdr)
+	uint8_t buf[mnl_nlmsg_size(sizeof(*tcm) + 128)];
+
+	/* Destroy existing ingress qdisc and everything attached to it. */
+	nlh = mnl_nlmsg_put_header(buf);
+	nlh->nlmsg_type = RTM_DELQDISC;
+	nlh->nlmsg_flags = NLM_F_REQUEST;
+	tcm = mnl_nlmsg_put_extra_header(nlh, sizeof(*tcm));
+	tcm->tcm_family = AF_UNSPEC;
+	tcm->tcm_ifindex = ifindex;
+	tcm->tcm_handle = TC_H_MAKE(TC_H_INGRESS, 0);
+	tcm->tcm_parent = TC_H_INGRESS;
+	/* Ignore errors when qdisc is already absent. */
+	if (mlx5_nl_flow_nl_ack(nl, nlh) &&
+	    rte_errno != EINVAL && rte_errno != ENOENT)
+		return rte_flow_error_set
+			(error, rte_errno, RTE_FLOW_ERROR_TYPE_UNSPECIFIED,
+			 NULL, "netlink: failed to remove ingress qdisc");
+	/* Create fresh ingress qdisc. */
+	nlh = mnl_nlmsg_put_header(buf);
+	nlh->nlmsg_type = RTM_NEWQDISC;
+	nlh->nlmsg_flags = NLM_F_REQUEST | NLM_F_CREATE | NLM_F_EXCL;
+	tcm = mnl_nlmsg_put_extra_header(nlh, sizeof(*tcm));
+	tcm->tcm_family = AF_UNSPEC;
+	tcm->tcm_ifindex = ifindex;
+	tcm->tcm_handle = TC_H_MAKE(TC_H_INGRESS, 0);
+	tcm->tcm_parent = TC_H_INGRESS;
+	mnl_attr_put_strz_check(nlh, sizeof(buf), TCA_KIND, "ingress");
+	if (mlx5_nl_flow_nl_ack(nl, nlh))
+		return rte_flow_error_set
+			(error, rte_errno, RTE_FLOW_ERROR_TYPE_UNSPECIFIED,
+			 NULL, "netlink: failed to create ingress qdisc");
+	return 0;
+}
+
+/**
+ * Create and configure a libmnl socket for Netlink flow rules.
+ *
+ * @return
+ *   A valid libmnl socket object pointer on success, NULL otherwise and
+ *   rte_errno is set.
+ */
+struct mnl_socket *
+mlx5_nl_flow_socket_create(void)
+{
+	struct mnl_socket *nl = mnl_socket_open(NETLINK_ROUTE);
+
+	if (nl) {
+		mnl_socket_setsockopt(nl, NETLINK_CAP_ACK, &(int){ 1 },
+				      sizeof(int));
+		if (!mnl_socket_bind(nl, 0, MNL_SOCKET_AUTOPID))
+			return nl;
+	}
+	rte_errno = errno;
+	if (nl)
+		mnl_socket_close(nl);
+	return NULL;
+}
+
+/**
+ * Destroy a libmnl socket.
+ */
+void
+mlx5_nl_flow_socket_destroy(struct mnl_socket *nl)
+{
+	mnl_socket_close(nl);
+}
diff --git a/mk/rte.app.mk b/mk/rte.app.mk
index 7bcf6308d..414f1b967 100644
--- a/mk/rte.app.mk
+++ b/mk/rte.app.mk
@@ -145,7 +145,7 @@ endif
 ifeq ($(CONFIG_RTE_LIBRTE_MLX5_DLOPEN_DEPS),y)
 _LDLIBS-$(CONFIG_RTE_LIBRTE_MLX5_PMD)       += -lrte_pmd_mlx5 -ldl
 else
-_LDLIBS-$(CONFIG_RTE_LIBRTE_MLX5_PMD)       += -lrte_pmd_mlx5 -libverbs -lmlx5
+_LDLIBS-$(CONFIG_RTE_LIBRTE_MLX5_PMD)       += -lrte_pmd_mlx5 -libverbs -lmlx5 -lmnl
 endif
 _LDLIBS-$(CONFIG_RTE_LIBRTE_MVPP2_PMD)      += -lrte_pmd_mvpp2 -L$(LIBMUSDK_PATH)/lib -lmusdk
 _LDLIBS-$(CONFIG_RTE_LIBRTE_NFP_PMD)        += -lrte_pmd_nfp
-- 
2.11.0

* [dpdk-dev] [PATCH v2 2/6] net/mlx5: add framework for switch flow rules
  2018-07-13  9:40 ` [dpdk-dev] [PATCH v2 " Adrien Mazarguil
  2018-07-13  9:40   ` [dpdk-dev] [PATCH v2 1/6] net/mlx5: lay groundwork for switch offloads Adrien Mazarguil
@ 2018-07-13  9:40   ` Adrien Mazarguil
  2018-07-13  9:40   ` [dpdk-dev] [PATCH v2 3/6] net/mlx5: add fate actions to " Adrien Mazarguil
                     ` (4 subsequent siblings)
  6 siblings, 0 replies; 33+ messages in thread
From: Adrien Mazarguil @ 2018-07-13  9:40 UTC (permalink / raw)
  To: Shahaf Shuler; +Cc: Nelio Laranjeiro, Yongseok Koh, dev

Because mlx5 switch flow rules are configured through Netlink (TC
interface) and have little in common with Verbs, this patch adds a separate
parser function to handle them.

- mlx5_nl_flow_transpose() converts a rte_flow rule to its TC equivalent
  and stores the result in a buffer.

- mlx5_nl_flow_brand() gives a unique handle to a flow rule buffer.

- mlx5_nl_flow_create() instantiates a flow rule on the device based on
  such a buffer.

- mlx5_nl_flow_destroy() performs the reverse operation.

These functions are called by the existing implementation when encountering
flow rules which must be offloaded to the switch (currently relying on the
transfer attribute).

Signed-off-by: Adrien Mazarguil <adrien.mazarguil@6wind.com>
Signed-off-by: Nelio Laranjeiro <nelio.laranjeiro@6wind.com>
Acked-by: Yongseok Koh <yskoh@mellanox.com>
--
v2 changes:

- Replaced mlx5_domain_to_port_id() with mlx5_dev_to_port_id().
- Added definitions for NETLINK_CAP_ACK, TC_H_MIN_INGRESS,
  TCA_CLS_FLAGS_SKIP_SW, TCA_FLOWER_ACT and TCA_FLOWER_FLAGS in case they
  are missing from the host system (e.g. RHEL 7.2).
- Modified the size of buf_tmp[] in mlx5_nl_flow_transpose() as
  MNL_SOCKET_BUFFER_SIZE was insane. 1 kiB of message payload is plenty
  enough for the time being.
---
 drivers/net/mlx5/Makefile       |  10 ++
 drivers/net/mlx5/mlx5.h         |  18 ++
 drivers/net/mlx5/mlx5_flow.c    | 111 +++++++++++++
 drivers/net/mlx5/mlx5_nl_flow.c | 311 +++++++++++++++++++++++++++++++++++
 4 files changed, 450 insertions(+)

diff --git a/drivers/net/mlx5/Makefile b/drivers/net/mlx5/Makefile
index 8d3cb219b..1ccfbb594 100644
--- a/drivers/net/mlx5/Makefile
+++ b/drivers/net/mlx5/Makefile
@@ -199,6 +199,16 @@ mlx5_autoconf.h.new: $(RTE_SDK)/buildtools/auto-config-h.sh
 		linux/if_link.h \
 		enum IFLA_PHYS_PORT_NAME \
 		$(AUTOCONF_OUTPUT)
+	$Q sh -- '$<' '$@' \
+		HAVE_TCA_FLOWER_ACT \
+		linux/pkt_cls.h \
+		enum TCA_FLOWER_ACT \
+		$(AUTOCONF_OUTPUT)
+	$Q sh -- '$<' '$@' \
+		HAVE_TCA_FLOWER_FLAGS \
+		linux/pkt_cls.h \
+		enum TCA_FLOWER_FLAGS \
+		$(AUTOCONF_OUTPUT)
 
 # Create mlx5_autoconf.h or update it in case it differs from the new one.
 
diff --git a/drivers/net/mlx5/mlx5.h b/drivers/net/mlx5/mlx5.h
index 98b6ec07d..5bad1b32b 100644
--- a/drivers/net/mlx5/mlx5.h
+++ b/drivers/net/mlx5/mlx5.h
@@ -156,6 +156,12 @@ struct mlx5_drop {
 	struct mlx5_rxq_ibv *rxq; /* Verbs Rx queue. */
 };
 
+/** DPDK port to network interface index (ifindex) conversion. */
+struct mlx5_nl_flow_ptoi {
+	uint16_t port_id; /**< DPDK port ID. */
+	unsigned int ifindex; /**< Network interface index. */
+};
+
 struct mnl_socket;
 
 struct priv {
@@ -385,6 +391,18 @@ int mlx5_nl_switch_info(int nl, unsigned int ifindex,
 
 /* mlx5_nl_flow.c */
 
+int mlx5_nl_flow_transpose(void *buf,
+			   size_t size,
+			   const struct mlx5_nl_flow_ptoi *ptoi,
+			   const struct rte_flow_attr *attr,
+			   const struct rte_flow_item *pattern,
+			   const struct rte_flow_action *actions,
+			   struct rte_flow_error *error);
+void mlx5_nl_flow_brand(void *buf, uint32_t handle);
+int mlx5_nl_flow_create(struct mnl_socket *nl, void *buf,
+			struct rte_flow_error *error);
+int mlx5_nl_flow_destroy(struct mnl_socket *nl, void *buf,
+			 struct rte_flow_error *error);
 int mlx5_nl_flow_init(struct mnl_socket *nl, unsigned int ifindex,
 		      struct rte_flow_error *error);
 struct mnl_socket *mlx5_nl_flow_socket_create(void);
diff --git a/drivers/net/mlx5/mlx5_flow.c b/drivers/net/mlx5/mlx5_flow.c
index 89bfc670f..890bf7d72 100644
--- a/drivers/net/mlx5/mlx5_flow.c
+++ b/drivers/net/mlx5/mlx5_flow.c
@@ -4,6 +4,7 @@
  */
 
 #include <sys/queue.h>
+#include <stdalign.h>
 #include <stdint.h>
 #include <string.h>
 
@@ -280,6 +281,7 @@ struct rte_flow {
 	struct rte_flow_action_rss rss;/**< RSS context. */
 	uint8_t key[MLX5_RSS_HASH_KEY_LEN]; /**< RSS hash key. */
 	uint16_t (*queue)[]; /**< Destination queues to redirect traffic to. */
+	void *nl_flow; /**< Netlink flow buffer if relevant. */
 };
 
 static const struct rte_flow_ops mlx5_flow_ops = {
@@ -2365,6 +2367,103 @@ mlx5_flow_actions(struct rte_eth_dev *dev,
 }
 
 /**
+ * Validate flow rule and fill flow structure accordingly.
+ *
+ * @param dev
+ *   Pointer to Ethernet device.
+ * @param[out] flow
+ *   Pointer to flow structure.
+ * @param flow_size
+ *   Size of allocated space for @p flow.
+ * @param[in] attr
+ *   Flow rule attributes.
+ * @param[in] pattern
+ *   Pattern specification (list terminated by the END pattern item).
+ * @param[in] actions
+ *   Associated actions (list terminated by the END action).
+ * @param[out] error
+ *   Perform verbose error reporting if not NULL.
+ *
+ * @return
+ *   A positive value representing the size of the flow object in bytes
+ *   regardless of @p flow_size on success, a negative errno value otherwise
+ *   and rte_errno is set.
+ */
+static int
+mlx5_flow_merge_switch(struct rte_eth_dev *dev,
+		       struct rte_flow *flow,
+		       size_t flow_size,
+		       const struct rte_flow_attr *attr,
+		       const struct rte_flow_item pattern[],
+		       const struct rte_flow_action actions[],
+		       struct rte_flow_error *error)
+{
+	unsigned int n = mlx5_dev_to_port_id(dev->device, NULL, 0);
+	uint16_t port_id[!n + n];
+	struct mlx5_nl_flow_ptoi ptoi[!n + n + 1];
+	size_t off = RTE_ALIGN_CEIL(sizeof(*flow), alignof(max_align_t));
+	unsigned int i;
+	unsigned int own = 0;
+	int ret;
+
+	/* At least one port is needed when no switch domain is present. */
+	if (!n) {
+		n = 1;
+		port_id[0] = dev->data->port_id;
+	} else {
+		n = RTE_MIN(mlx5_dev_to_port_id(dev->device, port_id, n), n);
+	}
+	for (i = 0; i != n; ++i) {
+		struct rte_eth_dev_info dev_info;
+
+		rte_eth_dev_info_get(port_id[i], &dev_info);
+		if (port_id[i] == dev->data->port_id)
+			own = i;
+		ptoi[i].port_id = port_id[i];
+		ptoi[i].ifindex = dev_info.if_index;
+	}
+	/* Ensure first entry of ptoi[] is the current device. */
+	if (own) {
+		ptoi[n] = ptoi[0];
+		ptoi[0] = ptoi[own];
+		ptoi[own] = ptoi[n];
+	}
+	/* An entry with zero ifindex terminates ptoi[]. */
+	ptoi[n].port_id = 0;
+	ptoi[n].ifindex = 0;
+	if (flow_size < off)
+		flow_size = 0;
+	ret = mlx5_nl_flow_transpose((uint8_t *)flow + off,
+				     flow_size ? flow_size - off : 0,
+				     ptoi, attr, pattern, actions, error);
+	if (ret < 0)
+		return ret;
+	if (flow_size) {
+		*flow = (struct rte_flow){
+			.attributes = *attr,
+			.nl_flow = (uint8_t *)flow + off,
+		};
+		/*
+		 * Generate a reasonably unique handle based on the address
+		 * of the target buffer.
+		 *
+		 * This is straightforward on 32-bit systems where the flow
+		 * pointer can be used directly. Otherwise, its least
+		 * significant part is taken after shifting it by the
+		 * previous power of two of the pointed buffer size.
+		 */
+		if (sizeof(flow) <= 4)
+			mlx5_nl_flow_brand(flow->nl_flow, (uintptr_t)flow);
+		else
+			mlx5_nl_flow_brand
+				(flow->nl_flow,
+				 (uintptr_t)flow >>
+				 rte_log2_u32(rte_align32prevpow2(flow_size)));
+	}
+	return off + ret;
+}
+
+/**
  * Convert the @p attributes, @p pattern, @p action, into an flow for the NIC
  * after ensuring the NIC will understand and process it correctly.
  * The conversion is only performed item/action per item/action, each of
@@ -2418,6 +2517,10 @@ mlx5_flow_merge(struct rte_eth_dev *dev, struct rte_flow *flow,
 	int ret;
 	uint32_t i;
 
+	if (attributes->transfer)
+		return mlx5_flow_merge_switch(dev, flow, flow_size,
+					      attributes, pattern,
+					      actions, error);
 	if (size > flow_size)
 		flow = &local_flow;
 	ret = mlx5_flow_attributes(dev, attributes, flow, error);
@@ -2708,8 +2811,11 @@ mlx5_flow_validate(struct rte_eth_dev *dev,
 static void
 mlx5_flow_remove(struct rte_eth_dev *dev, struct rte_flow *flow)
 {
+	struct priv *priv = dev->data->dev_private;
 	struct mlx5_flow_verbs *verbs;
 
+	if (flow->nl_flow && priv->mnl_socket)
+		mlx5_nl_flow_destroy(priv->mnl_socket, flow->nl_flow, NULL);
 	LIST_FOREACH(verbs, &flow->verbs, next) {
 		if (verbs->flow) {
 			claim_zero(mlx5_glue->destroy_flow(verbs->flow));
@@ -2746,6 +2852,7 @@ static int
 mlx5_flow_apply(struct rte_eth_dev *dev, struct rte_flow *flow,
 		struct rte_flow_error *error)
 {
+	struct priv *priv = dev->data->dev_private;
 	struct mlx5_flow_verbs *verbs;
 	int err;
 
@@ -2794,6 +2901,10 @@ mlx5_flow_apply(struct rte_eth_dev *dev, struct rte_flow *flow,
 			goto error;
 		}
 	}
+	if (flow->nl_flow &&
+	    priv->mnl_socket &&
+	    mlx5_nl_flow_create(priv->mnl_socket, flow->nl_flow, error))
+		goto error;
 	return 0;
 error:
 	err = rte_errno; /* Save rte_errno before cleanup. */
diff --git a/drivers/net/mlx5/mlx5_nl_flow.c b/drivers/net/mlx5/mlx5_nl_flow.c
index 60a4493e5..a9a5bac49 100644
--- a/drivers/net/mlx5/mlx5_nl_flow.c
+++ b/drivers/net/mlx5/mlx5_nl_flow.c
@@ -5,7 +5,9 @@
 
 #include <errno.h>
 #include <libmnl/libmnl.h>
+#include <linux/if_ether.h>
 #include <linux/netlink.h>
+#include <linux/pkt_cls.h>
 #include <linux/pkt_sched.h>
 #include <linux/rtnetlink.h>
 #include <stdalign.h>
@@ -14,6 +16,7 @@
 #include <stdlib.h>
 #include <sys/socket.h>
 
+#include <rte_byteorder.h>
 #include <rte_errno.h>
 #include <rte_flow.h>
 
@@ -24,6 +27,258 @@
 #define NETLINK_CAP_ACK 10
 #endif
 
+/* Normally found in linux/pkt_sched.h. */
+#ifndef TC_H_MIN_INGRESS
+#define TC_H_MIN_INGRESS 0xfff2u
+#endif
+
+/* Normally found in linux/pkt_cls.h. */
+#ifndef TCA_CLS_FLAGS_SKIP_SW
+#define TCA_CLS_FLAGS_SKIP_SW (1 << 1)
+#endif
+#ifndef HAVE_TCA_FLOWER_ACT
+#define TCA_FLOWER_ACT 3
+#endif
+#ifndef HAVE_TCA_FLOWER_FLAGS
+#define TCA_FLOWER_FLAGS 22
+#endif
+
+/** Parser state definitions for mlx5_nl_flow_trans[]. */
+enum mlx5_nl_flow_trans {
+	INVALID,
+	BACK,
+	ATTR,
+	PATTERN,
+	ITEM_VOID,
+	ACTIONS,
+	ACTION_VOID,
+	END,
+};
+
+#define TRANS(...) (const enum mlx5_nl_flow_trans []){ __VA_ARGS__, INVALID, }
+
+#define PATTERN_COMMON \
+	ITEM_VOID, ACTIONS
+#define ACTIONS_COMMON \
+	ACTION_VOID, END
+
+/** Parser state transitions used by mlx5_nl_flow_transpose(). */
+static const enum mlx5_nl_flow_trans *const mlx5_nl_flow_trans[] = {
+	[INVALID] = NULL,
+	[BACK] = NULL,
+	[ATTR] = TRANS(PATTERN),
+	[PATTERN] = TRANS(PATTERN_COMMON),
+	[ITEM_VOID] = TRANS(BACK),
+	[ACTIONS] = TRANS(ACTIONS_COMMON),
+	[ACTION_VOID] = TRANS(BACK),
+	[END] = NULL,
+};
+
+/**
+ * Transpose flow rule description to rtnetlink message.
+ *
+ * This function transposes a flow rule description to a traffic control
+ * (TC) filter creation message ready to be sent over Netlink.
+ *
+ * Target interface is specified as the first entry of the @p ptoi table.
+ * Subsequent entries enable this function to resolve other DPDK port IDs
+ * found in the flow rule.
+ *
+ * @param[out] buf
+ *   Output message buffer. May be NULL when @p size is 0.
+ * @param size
+ *   Size of @p buf. Message may be truncated if not large enough.
+ * @param[in] ptoi
+ *   DPDK port ID to network interface index translation table. This table
+ *   is terminated by an entry with a zero ifindex value.
+ * @param[in] attr
+ *   Flow rule attributes.
+ * @param[in] pattern
+ *   Pattern specification.
+ * @param[in] actions
+ *   Associated actions.
+ * @param[out] error
+ *   Perform verbose error reporting if not NULL.
+ *
+ * @return
+ *   A positive value representing the exact size of the message in bytes
+ *   regardless of the @p size parameter on success, a negative errno value
+ *   otherwise and rte_errno is set.
+ */
+int
+mlx5_nl_flow_transpose(void *buf,
+		       size_t size,
+		       const struct mlx5_nl_flow_ptoi *ptoi,
+		       const struct rte_flow_attr *attr,
+		       const struct rte_flow_item *pattern,
+		       const struct rte_flow_action *actions,
+		       struct rte_flow_error *error)
+{
+	alignas(struct nlmsghdr)
+	uint8_t buf_tmp[mnl_nlmsg_size(sizeof(struct tcmsg) + 1024)];
+	const struct rte_flow_item *item;
+	const struct rte_flow_action *action;
+	unsigned int n;
+	struct nlattr *na_flower;
+	struct nlattr *na_flower_act;
+	const enum mlx5_nl_flow_trans *trans;
+	const enum mlx5_nl_flow_trans *back;
+
+	if (!size)
+		goto error_nobufs;
+init:
+	item = pattern;
+	action = actions;
+	n = 0;
+	na_flower = NULL;
+	na_flower_act = NULL;
+	trans = TRANS(ATTR);
+	back = trans;
+trans:
+	switch (trans[n++]) {
+		struct nlmsghdr *nlh;
+		struct tcmsg *tcm;
+
+	case INVALID:
+		if (item->type)
+			return rte_flow_error_set
+				(error, ENOTSUP, RTE_FLOW_ERROR_TYPE_ITEM,
+				 item, "unsupported pattern item combination");
+		else if (action->type)
+			return rte_flow_error_set
+				(error, ENOTSUP, RTE_FLOW_ERROR_TYPE_ACTION,
+				 action, "unsupported action combination");
+		return rte_flow_error_set
+			(error, ENOTSUP, RTE_FLOW_ERROR_TYPE_UNSPECIFIED, NULL,
+			 "flow rule lacks some kind of fate action");
+	case BACK:
+		trans = back;
+		n = 0;
+		goto trans;
+	case ATTR:
+		/*
+		 * Supported attributes: no groups, some priorities and
+		 * ingress only. Don't care about transfer as it is the
+		 * caller's problem.
+		 */
+		if (attr->group)
+			return rte_flow_error_set
+				(error, ENOTSUP,
+				 RTE_FLOW_ERROR_TYPE_ATTR_GROUP,
+				 attr, "groups are not supported");
+		if (attr->priority > 0xfffe)
+			return rte_flow_error_set
+				(error, ENOTSUP,
+				 RTE_FLOW_ERROR_TYPE_ATTR_PRIORITY,
+				 attr, "lowest priority level is 0xfffe");
+		if (!attr->ingress)
+			return rte_flow_error_set
+				(error, ENOTSUP,
+				 RTE_FLOW_ERROR_TYPE_ATTR_INGRESS,
+				 attr, "only ingress is supported");
+		if (attr->egress)
+			return rte_flow_error_set
+				(error, ENOTSUP,
+				 RTE_FLOW_ERROR_TYPE_ATTR_EGRESS,
+				 attr, "egress is not supported");
+		if (size < mnl_nlmsg_size(sizeof(*tcm)))
+			goto error_nobufs;
+		nlh = mnl_nlmsg_put_header(buf);
+		nlh->nlmsg_type = 0;
+		nlh->nlmsg_flags = 0;
+		nlh->nlmsg_seq = 0;
+		tcm = mnl_nlmsg_put_extra_header(nlh, sizeof(*tcm));
+		tcm->tcm_family = AF_UNSPEC;
+		tcm->tcm_ifindex = ptoi[0].ifindex;
+		/*
+		 * Let kernel pick a handle by default. A predictable handle
+		 * can be set by the caller on the resulting buffer through
+		 * mlx5_nl_flow_brand().
+		 */
+		tcm->tcm_handle = 0;
+		tcm->tcm_parent = TC_H_MAKE(TC_H_INGRESS, TC_H_MIN_INGRESS);
+		/*
+		 * Priority cannot be zero to prevent the kernel from
+		 * picking one automatically.
+		 */
+		tcm->tcm_info = TC_H_MAKE((attr->priority + 1) << 16,
+					  RTE_BE16(ETH_P_ALL));
+		break;
+	case PATTERN:
+		if (!mnl_attr_put_strz_check(buf, size, TCA_KIND, "flower"))
+			goto error_nobufs;
+		na_flower = mnl_attr_nest_start_check(buf, size, TCA_OPTIONS);
+		if (!na_flower)
+			goto error_nobufs;
+		if (!mnl_attr_put_u32_check(buf, size, TCA_FLOWER_FLAGS,
+					    TCA_CLS_FLAGS_SKIP_SW))
+			goto error_nobufs;
+		break;
+	case ITEM_VOID:
+		if (item->type != RTE_FLOW_ITEM_TYPE_VOID)
+			goto trans;
+		++item;
+		break;
+	case ACTIONS:
+		if (item->type != RTE_FLOW_ITEM_TYPE_END)
+			goto trans;
+		assert(na_flower);
+		assert(!na_flower_act);
+		na_flower_act =
+			mnl_attr_nest_start_check(buf, size, TCA_FLOWER_ACT);
+		if (!na_flower_act)
+			goto error_nobufs;
+		break;
+	case ACTION_VOID:
+		if (action->type != RTE_FLOW_ACTION_TYPE_VOID)
+			goto trans;
+		++action;
+		break;
+	case END:
+		if (item->type != RTE_FLOW_ITEM_TYPE_END ||
+		    action->type != RTE_FLOW_ACTION_TYPE_END)
+			goto trans;
+		if (na_flower_act)
+			mnl_attr_nest_end(buf, na_flower_act);
+		if (na_flower)
+			mnl_attr_nest_end(buf, na_flower);
+		nlh = buf;
+		return nlh->nlmsg_len;
+	}
+	back = trans;
+	trans = mlx5_nl_flow_trans[trans[n - 1]];
+	n = 0;
+	goto trans;
+error_nobufs:
+	if (buf != buf_tmp) {
+		buf = buf_tmp;
+		size = sizeof(buf_tmp);
+		goto init;
+	}
+	return rte_flow_error_set
+		(error, ENOBUFS, RTE_FLOW_ERROR_TYPE_UNSPECIFIED, NULL,
+		 "generated TC message is too large");
+}
+
+/**
+ * Brand rtnetlink buffer with unique handle.
+ *
+ * This handle should be unique for a given network interface to avoid
+ * collisions.
+ *
+ * @param buf
+ *   Flow rule buffer previously initialized by mlx5_nl_flow_transpose().
+ * @param handle
+ *   Unique 32-bit handle to use.
+ */
+void
+mlx5_nl_flow_brand(void *buf, uint32_t handle)
+{
+	struct tcmsg *tcm = mnl_nlmsg_get_payload(buf);
+
+	tcm->tcm_handle = handle;
+}
+
 /**
  * Send Netlink message with acknowledgment.
  *
@@ -60,6 +315,62 @@ mlx5_nl_flow_nl_ack(struct mnl_socket *nl, struct nlmsghdr *nlh)
 }
 
 /**
+ * Create a Netlink flow rule.
+ *
+ * @param nl
+ *   Libmnl socket to use.
+ * @param buf
+ *   Flow rule buffer previously initialized by mlx5_nl_flow_transpose().
+ * @param[out] error
+ *   Perform verbose error reporting if not NULL.
+ *
+ * @return
+ *   0 on success, a negative errno value otherwise and rte_errno is set.
+ */
+int
+mlx5_nl_flow_create(struct mnl_socket *nl, void *buf,
+		    struct rte_flow_error *error)
+{
+	struct nlmsghdr *nlh = buf;
+
+	nlh->nlmsg_type = RTM_NEWTFILTER;
+	nlh->nlmsg_flags = NLM_F_REQUEST | NLM_F_CREATE | NLM_F_EXCL;
+	if (!mlx5_nl_flow_nl_ack(nl, nlh))
+		return 0;
+	return rte_flow_error_set
+		(error, rte_errno, RTE_FLOW_ERROR_TYPE_UNSPECIFIED, NULL,
+		 "netlink: failed to create TC flow rule");
+}
+
+/**
+ * Destroy a Netlink flow rule.
+ *
+ * @param nl
+ *   Libmnl socket to use.
+ * @param buf
+ *   Flow rule buffer previously initialized by mlx5_nl_flow_transpose().
+ * @param[out] error
+ *   Perform verbose error reporting if not NULL.
+ *
+ * @return
+ *   0 on success, a negative errno value otherwise and rte_errno is set.
+ */
+int
+mlx5_nl_flow_destroy(struct mnl_socket *nl, void *buf,
+		     struct rte_flow_error *error)
+{
+	struct nlmsghdr *nlh = buf;
+
+	nlh->nlmsg_type = RTM_DELTFILTER;
+	nlh->nlmsg_flags = NLM_F_REQUEST;
+	if (!mlx5_nl_flow_nl_ack(nl, nlh))
+		return 0;
+	return rte_flow_error_set
+		(error, rte_errno, RTE_FLOW_ERROR_TYPE_UNSPECIFIED, NULL,
+		 "netlink: failed to destroy TC flow rule");
+}
+
+/**
  * Initialize ingress qdisc of a given network interface.
  *
  * @param nl
-- 
2.11.0

^ permalink raw reply	[flat|nested] 33+ messages in thread

* [dpdk-dev] [PATCH v2 3/6] net/mlx5: add fate actions to switch flow rules
  2018-07-13  9:40 ` [dpdk-dev] [PATCH v2 " Adrien Mazarguil
  2018-07-13  9:40   ` [dpdk-dev] [PATCH v2 1/6] net/mlx5: lay groundwork for switch offloads Adrien Mazarguil
  2018-07-13  9:40   ` [dpdk-dev] [PATCH v2 2/6] net/mlx5: add framework for switch flow rules Adrien Mazarguil
@ 2018-07-13  9:40   ` Adrien Mazarguil
  2018-07-13  9:40   ` [dpdk-dev] [PATCH v2 4/6] net/mlx5: add L2-L4 pattern items " Adrien Mazarguil
                     ` (3 subsequent siblings)
  6 siblings, 0 replies; 33+ messages in thread
From: Adrien Mazarguil @ 2018-07-13  9:40 UTC (permalink / raw)
  To: Shahaf Shuler; +Cc: Nelio Laranjeiro, Yongseok Koh, dev

This patch enables creation of rte_flow rules that direct matching traffic
to a different port (e.g. another VF representor) or drop it directly at
the switch level (PORT_ID and DROP actions).

Testpmd examples:

- Directing all traffic to port ID 0:

  flow create 1 ingress transfer pattern end actions port_id id 0 / end

- Dropping all traffic normally received by port ID 1:

  flow create 1 ingress transfer pattern end actions drop / end

Note the presence of the transfer attribute, which requests these rules to
be applied at the switch level. All traffic is matched due to the empty
pattern.

Signed-off-by: Adrien Mazarguil <adrien.mazarguil@6wind.com>
Acked-by: Nelio Laranjeiro <nelio.laranjeiro@6wind.com>
Acked-by: Yongseok Koh <yskoh@mellanox.com>
---
 drivers/net/mlx5/mlx5_nl_flow.c | 77 +++++++++++++++++++++++++++++++++++-
 1 file changed, 75 insertions(+), 2 deletions(-)

diff --git a/drivers/net/mlx5/mlx5_nl_flow.c b/drivers/net/mlx5/mlx5_nl_flow.c
index a9a5bac49..42b7c655e 100644
--- a/drivers/net/mlx5/mlx5_nl_flow.c
+++ b/drivers/net/mlx5/mlx5_nl_flow.c
@@ -10,6 +10,8 @@
 #include <linux/pkt_cls.h>
 #include <linux/pkt_sched.h>
 #include <linux/rtnetlink.h>
+#include <linux/tc_act/tc_gact.h>
+#include <linux/tc_act/tc_mirred.h>
 #include <stdalign.h>
 #include <stddef.h>
 #include <stdint.h>
@@ -52,6 +54,8 @@ enum mlx5_nl_flow_trans {
 	ITEM_VOID,
 	ACTIONS,
 	ACTION_VOID,
+	ACTION_PORT_ID,
+	ACTION_DROP,
 	END,
 };
 
@@ -60,7 +64,9 @@ enum mlx5_nl_flow_trans {
 #define PATTERN_COMMON \
 	ITEM_VOID, ACTIONS
 #define ACTIONS_COMMON \
-	ACTION_VOID, END
+	ACTION_VOID
+#define ACTIONS_FATE \
+	ACTION_PORT_ID, ACTION_DROP
 
 /** Parser state transitions used by mlx5_nl_flow_transpose(). */
 static const enum mlx5_nl_flow_trans *const mlx5_nl_flow_trans[] = {
@@ -69,8 +75,10 @@ static const enum mlx5_nl_flow_trans *const mlx5_nl_flow_trans[] = {
 	[ATTR] = TRANS(PATTERN),
 	[PATTERN] = TRANS(PATTERN_COMMON),
 	[ITEM_VOID] = TRANS(BACK),
-	[ACTIONS] = TRANS(ACTIONS_COMMON),
+	[ACTIONS] = TRANS(ACTIONS_FATE, ACTIONS_COMMON),
 	[ACTION_VOID] = TRANS(BACK),
+	[ACTION_PORT_ID] = TRANS(ACTION_VOID, END),
+	[ACTION_DROP] = TRANS(ACTION_VOID, END),
 	[END] = NULL,
 };
 
@@ -119,6 +127,7 @@ mlx5_nl_flow_transpose(void *buf,
 	const struct rte_flow_item *item;
 	const struct rte_flow_action *action;
 	unsigned int n;
+	uint32_t act_index_cur;
 	struct nlattr *na_flower;
 	struct nlattr *na_flower_act;
 	const enum mlx5_nl_flow_trans *trans;
@@ -130,14 +139,21 @@ mlx5_nl_flow_transpose(void *buf,
 	item = pattern;
 	action = actions;
 	n = 0;
+	act_index_cur = 0;
 	na_flower = NULL;
 	na_flower_act = NULL;
 	trans = TRANS(ATTR);
 	back = trans;
 trans:
 	switch (trans[n++]) {
+		union {
+			const struct rte_flow_action_port_id *port_id;
+		} conf;
 		struct nlmsghdr *nlh;
 		struct tcmsg *tcm;
+		struct nlattr *act_index;
+		struct nlattr *act;
+		unsigned int i;
 
 	case INVALID:
 		if (item->type)
@@ -228,12 +244,69 @@ mlx5_nl_flow_transpose(void *buf,
 			mnl_attr_nest_start_check(buf, size, TCA_FLOWER_ACT);
 		if (!na_flower_act)
 			goto error_nobufs;
+		act_index_cur = 1;
 		break;
 	case ACTION_VOID:
 		if (action->type != RTE_FLOW_ACTION_TYPE_VOID)
 			goto trans;
 		++action;
 		break;
+	case ACTION_PORT_ID:
+		if (action->type != RTE_FLOW_ACTION_TYPE_PORT_ID)
+			goto trans;
+		conf.port_id = action->conf;
+		if (conf.port_id->original)
+			i = 0;
+		else
+			for (i = 0; ptoi[i].ifindex; ++i)
+				if (ptoi[i].port_id == conf.port_id->id)
+					break;
+		if (!ptoi[i].ifindex)
+			return rte_flow_error_set
+				(error, ENODEV, RTE_FLOW_ERROR_TYPE_ACTION_CONF,
+				 conf.port_id,
+				 "missing data to convert port ID to ifindex");
+		act_index =
+			mnl_attr_nest_start_check(buf, size, act_index_cur++);
+		if (!act_index ||
+		    !mnl_attr_put_strz_check(buf, size, TCA_ACT_KIND, "mirred"))
+			goto error_nobufs;
+		act = mnl_attr_nest_start_check(buf, size, TCA_ACT_OPTIONS);
+		if (!act)
+			goto error_nobufs;
+		if (!mnl_attr_put_check(buf, size, TCA_MIRRED_PARMS,
+					sizeof(struct tc_mirred),
+					&(struct tc_mirred){
+						.action = TC_ACT_STOLEN,
+						.eaction = TCA_EGRESS_REDIR,
+						.ifindex = ptoi[i].ifindex,
+					}))
+			goto error_nobufs;
+		mnl_attr_nest_end(buf, act);
+		mnl_attr_nest_end(buf, act_index);
+		++action;
+		break;
+	case ACTION_DROP:
+		if (action->type != RTE_FLOW_ACTION_TYPE_DROP)
+			goto trans;
+		act_index =
+			mnl_attr_nest_start_check(buf, size, act_index_cur++);
+		if (!act_index ||
+		    !mnl_attr_put_strz_check(buf, size, TCA_ACT_KIND, "gact"))
+			goto error_nobufs;
+		act = mnl_attr_nest_start_check(buf, size, TCA_ACT_OPTIONS);
+		if (!act)
+			goto error_nobufs;
+		if (!mnl_attr_put_check(buf, size, TCA_GACT_PARMS,
+					sizeof(struct tc_gact),
+					&(struct tc_gact){
+						.action = TC_ACT_SHOT,
+					}))
+			goto error_nobufs;
+		mnl_attr_nest_end(buf, act);
+		mnl_attr_nest_end(buf, act_index);
+		++action;
+		break;
 	case END:
 		if (item->type != RTE_FLOW_ITEM_TYPE_END ||
 		    action->type != RTE_FLOW_ACTION_TYPE_END)
-- 
2.11.0

^ permalink raw reply	[flat|nested] 33+ messages in thread

* [dpdk-dev] [PATCH v2 4/6] net/mlx5: add L2-L4 pattern items to switch flow rules
  2018-07-13  9:40 ` [dpdk-dev] [PATCH v2 " Adrien Mazarguil
                     ` (2 preceding siblings ...)
  2018-07-13  9:40   ` [dpdk-dev] [PATCH v2 3/6] net/mlx5: add fate actions to " Adrien Mazarguil
@ 2018-07-13  9:40   ` Adrien Mazarguil
  2018-07-13  9:40   ` [dpdk-dev] [PATCH v2 5/6] net/mlx5: add VLAN item and actions " Adrien Mazarguil
                     ` (2 subsequent siblings)
  6 siblings, 0 replies; 33+ messages in thread
From: Adrien Mazarguil @ 2018-07-13  9:40 UTC (permalink / raw)
  To: Shahaf Shuler; +Cc: Nelio Laranjeiro, Yongseok Koh, dev

This enables flow rules to explicitly match supported combinations of
Ethernet, IPv4, IPv6, TCP and UDP headers at the switch level.

Testpmd example:

- Dropping TCPv4 traffic with a specific destination on port ID 2:

  flow create 2 ingress transfer pattern eth / ipv4 / tcp dst is 42 / end
     actions drop / end

Signed-off-by: Adrien Mazarguil <adrien.mazarguil@6wind.com>
Acked-by: Nelio Laranjeiro <nelio.laranjeiro@6wind.com>
Acked-by: Yongseok Koh <yskoh@mellanox.com>
--
v2 changes:

- Added definitions for TCA_FLOWER_KEY_ETH_TYPE, TCA_FLOWER_KEY_ETH_DST,
  TCA_FLOWER_KEY_ETH_DST_MASK, TCA_FLOWER_KEY_ETH_SRC,
  TCA_FLOWER_KEY_ETH_SRC_MASK, TCA_FLOWER_KEY_IP_PROTO,
  TCA_FLOWER_KEY_IPV4_SRC, TCA_FLOWER_KEY_IPV4_SRC_MASK,
  TCA_FLOWER_KEY_IPV4_DST, TCA_FLOWER_KEY_IPV4_DST_MASK,
  TCA_FLOWER_KEY_IPV6_SRC, TCA_FLOWER_KEY_IPV6_SRC_MASK,
  TCA_FLOWER_KEY_IPV6_DST, TCA_FLOWER_KEY_IPV6_DST_MASK,
  TCA_FLOWER_KEY_TCP_SRC, TCA_FLOWER_KEY_TCP_SRC_MASK,
  TCA_FLOWER_KEY_TCP_DST, TCA_FLOWER_KEY_TCP_DST_MASK,
  TCA_FLOWER_KEY_UDP_SRC, TCA_FLOWER_KEY_UDP_SRC_MASK,
  TCA_FLOWER_KEY_UDP_DST and TCA_FLOWER_KEY_UDP_DST_MASK in case they are
  missing from the host system (e.g. RHEL 7.2).
---
 drivers/net/mlx5/Makefile       | 110 +++++++++
 drivers/net/mlx5/mlx5_nl_flow.c | 463 ++++++++++++++++++++++++++++++++++-
 2 files changed, 572 insertions(+), 1 deletion(-)

diff --git a/drivers/net/mlx5/Makefile b/drivers/net/mlx5/Makefile
index 1ccfbb594..5e28b4c87 100644
--- a/drivers/net/mlx5/Makefile
+++ b/drivers/net/mlx5/Makefile
@@ -209,6 +209,116 @@ mlx5_autoconf.h.new: $(RTE_SDK)/buildtools/auto-config-h.sh
 		linux/pkt_cls.h \
 		enum TCA_FLOWER_FLAGS \
 		$(AUTOCONF_OUTPUT)
+	$Q sh -- '$<' '$@' \
+		HAVE_TCA_FLOWER_KEY_ETH_TYPE \
+		linux/pkt_cls.h \
+		enum TCA_FLOWER_KEY_ETH_TYPE \
+		$(AUTOCONF_OUTPUT)
+	$Q sh -- '$<' '$@' \
+		HAVE_TCA_FLOWER_KEY_ETH_DST \
+		linux/pkt_cls.h \
+		enum TCA_FLOWER_KEY_ETH_DST \
+		$(AUTOCONF_OUTPUT)
+	$Q sh -- '$<' '$@' \
+		HAVE_TCA_FLOWER_KEY_ETH_DST_MASK \
+		linux/pkt_cls.h \
+		enum TCA_FLOWER_KEY_ETH_DST_MASK \
+		$(AUTOCONF_OUTPUT)
+	$Q sh -- '$<' '$@' \
+		HAVE_TCA_FLOWER_KEY_ETH_SRC \
+		linux/pkt_cls.h \
+		enum TCA_FLOWER_KEY_ETH_SRC \
+		$(AUTOCONF_OUTPUT)
+	$Q sh -- '$<' '$@' \
+		HAVE_TCA_FLOWER_KEY_ETH_SRC_MASK \
+		linux/pkt_cls.h \
+		enum TCA_FLOWER_KEY_ETH_SRC_MASK \
+		$(AUTOCONF_OUTPUT)
+	$Q sh -- '$<' '$@' \
+		HAVE_TCA_FLOWER_KEY_IP_PROTO \
+		linux/pkt_cls.h \
+		enum TCA_FLOWER_KEY_IP_PROTO \
+		$(AUTOCONF_OUTPUT)
+	$Q sh -- '$<' '$@' \
+		HAVE_TCA_FLOWER_KEY_IPV4_SRC \
+		linux/pkt_cls.h \
+		enum TCA_FLOWER_KEY_IPV4_SRC \
+		$(AUTOCONF_OUTPUT)
+	$Q sh -- '$<' '$@' \
+		HAVE_TCA_FLOWER_KEY_IPV4_SRC_MASK \
+		linux/pkt_cls.h \
+		enum TCA_FLOWER_KEY_IPV4_SRC_MASK \
+		$(AUTOCONF_OUTPUT)
+	$Q sh -- '$<' '$@' \
+		HAVE_TCA_FLOWER_KEY_IPV4_DST \
+		linux/pkt_cls.h \
+		enum TCA_FLOWER_KEY_IPV4_DST \
+		$(AUTOCONF_OUTPUT)
+	$Q sh -- '$<' '$@' \
+		HAVE_TCA_FLOWER_KEY_IPV4_DST_MASK \
+		linux/pkt_cls.h \
+		enum TCA_FLOWER_KEY_IPV4_DST_MASK \
+		$(AUTOCONF_OUTPUT)
+	$Q sh -- '$<' '$@' \
+		HAVE_TCA_FLOWER_KEY_IPV6_SRC \
+		linux/pkt_cls.h \
+		enum TCA_FLOWER_KEY_IPV6_SRC \
+		$(AUTOCONF_OUTPUT)
+	$Q sh -- '$<' '$@' \
+		HAVE_TCA_FLOWER_KEY_IPV6_SRC_MASK \
+		linux/pkt_cls.h \
+		enum TCA_FLOWER_KEY_IPV6_SRC_MASK \
+		$(AUTOCONF_OUTPUT)
+	$Q sh -- '$<' '$@' \
+		HAVE_TCA_FLOWER_KEY_IPV6_DST \
+		linux/pkt_cls.h \
+		enum TCA_FLOWER_KEY_IPV6_DST \
+		$(AUTOCONF_OUTPUT)
+	$Q sh -- '$<' '$@' \
+		HAVE_TCA_FLOWER_KEY_IPV6_DST_MASK \
+		linux/pkt_cls.h \
+		enum TCA_FLOWER_KEY_IPV6_DST_MASK \
+		$(AUTOCONF_OUTPUT)
+	$Q sh -- '$<' '$@' \
+		HAVE_TCA_FLOWER_KEY_TCP_SRC \
+		linux/pkt_cls.h \
+		enum TCA_FLOWER_KEY_TCP_SRC \
+		$(AUTOCONF_OUTPUT)
+	$Q sh -- '$<' '$@' \
+		HAVE_TCA_FLOWER_KEY_TCP_SRC_MASK \
+		linux/pkt_cls.h \
+		enum TCA_FLOWER_KEY_TCP_SRC_MASK \
+		$(AUTOCONF_OUTPUT)
+	$Q sh -- '$<' '$@' \
+		HAVE_TCA_FLOWER_KEY_TCP_DST \
+		linux/pkt_cls.h \
+		enum TCA_FLOWER_KEY_TCP_DST \
+		$(AUTOCONF_OUTPUT)
+	$Q sh -- '$<' '$@' \
+		HAVE_TCA_FLOWER_KEY_TCP_DST_MASK \
+		linux/pkt_cls.h \
+		enum TCA_FLOWER_KEY_TCP_DST_MASK \
+		$(AUTOCONF_OUTPUT)
+	$Q sh -- '$<' '$@' \
+		HAVE_TCA_FLOWER_KEY_UDP_SRC \
+		linux/pkt_cls.h \
+		enum TCA_FLOWER_KEY_UDP_SRC \
+		$(AUTOCONF_OUTPUT)
+	$Q sh -- '$<' '$@' \
+		HAVE_TCA_FLOWER_KEY_UDP_SRC_MASK \
+		linux/pkt_cls.h \
+		enum TCA_FLOWER_KEY_UDP_SRC_MASK \
+		$(AUTOCONF_OUTPUT)
+	$Q sh -- '$<' '$@' \
+		HAVE_TCA_FLOWER_KEY_UDP_DST \
+		linux/pkt_cls.h \
+		enum TCA_FLOWER_KEY_UDP_DST \
+		$(AUTOCONF_OUTPUT)
+	$Q sh -- '$<' '$@' \
+		HAVE_TCA_FLOWER_KEY_UDP_DST_MASK \
+		linux/pkt_cls.h \
+		enum TCA_FLOWER_KEY_UDP_DST_MASK \
+		$(AUTOCONF_OUTPUT)
 
 # Create mlx5_autoconf.h or update it in case it differs from the new one.
 
diff --git a/drivers/net/mlx5/mlx5_nl_flow.c b/drivers/net/mlx5/mlx5_nl_flow.c
index 42b7c655e..88e7cabd5 100644
--- a/drivers/net/mlx5/mlx5_nl_flow.c
+++ b/drivers/net/mlx5/mlx5_nl_flow.c
@@ -3,6 +3,7 @@
  * Copyright 2018 Mellanox Technologies, Ltd
  */
 
+#include <assert.h>
 #include <errno.h>
 #include <libmnl/libmnl.h>
 #include <linux/if_ether.h>
@@ -12,7 +13,9 @@
 #include <linux/rtnetlink.h>
 #include <linux/tc_act/tc_gact.h>
 #include <linux/tc_act/tc_mirred.h>
+#include <netinet/in.h>
 #include <stdalign.h>
+#include <stdbool.h>
 #include <stddef.h>
 #include <stdint.h>
 #include <stdlib.h>
@@ -20,6 +23,7 @@
 
 #include <rte_byteorder.h>
 #include <rte_errno.h>
+#include <rte_ether.h>
 #include <rte_flow.h>
 
 #include "mlx5.h"
@@ -44,6 +48,72 @@
 #ifndef HAVE_TCA_FLOWER_FLAGS
 #define TCA_FLOWER_FLAGS 22
 #endif
+#ifndef HAVE_TCA_FLOWER_KEY_ETH_TYPE
+#define TCA_FLOWER_KEY_ETH_TYPE 8
+#endif
+#ifndef HAVE_TCA_FLOWER_KEY_ETH_DST
+#define TCA_FLOWER_KEY_ETH_DST 4
+#endif
+#ifndef HAVE_TCA_FLOWER_KEY_ETH_DST_MASK
+#define TCA_FLOWER_KEY_ETH_DST_MASK 5
+#endif
+#ifndef HAVE_TCA_FLOWER_KEY_ETH_SRC
+#define TCA_FLOWER_KEY_ETH_SRC 6
+#endif
+#ifndef HAVE_TCA_FLOWER_KEY_ETH_SRC_MASK
+#define TCA_FLOWER_KEY_ETH_SRC_MASK 7
+#endif
+#ifndef HAVE_TCA_FLOWER_KEY_IP_PROTO
+#define TCA_FLOWER_KEY_IP_PROTO 9
+#endif
+#ifndef HAVE_TCA_FLOWER_KEY_IPV4_SRC
+#define TCA_FLOWER_KEY_IPV4_SRC 10
+#endif
+#ifndef HAVE_TCA_FLOWER_KEY_IPV4_SRC_MASK
+#define TCA_FLOWER_KEY_IPV4_SRC_MASK 11
+#endif
+#ifndef HAVE_TCA_FLOWER_KEY_IPV4_DST
+#define TCA_FLOWER_KEY_IPV4_DST 12
+#endif
+#ifndef HAVE_TCA_FLOWER_KEY_IPV4_DST_MASK
+#define TCA_FLOWER_KEY_IPV4_DST_MASK 13
+#endif
+#ifndef HAVE_TCA_FLOWER_KEY_IPV6_SRC
+#define TCA_FLOWER_KEY_IPV6_SRC 14
+#endif
+#ifndef HAVE_TCA_FLOWER_KEY_IPV6_SRC_MASK
+#define TCA_FLOWER_KEY_IPV6_SRC_MASK 15
+#endif
+#ifndef HAVE_TCA_FLOWER_KEY_IPV6_DST
+#define TCA_FLOWER_KEY_IPV6_DST 16
+#endif
+#ifndef HAVE_TCA_FLOWER_KEY_IPV6_DST_MASK
+#define TCA_FLOWER_KEY_IPV6_DST_MASK 17
+#endif
+#ifndef HAVE_TCA_FLOWER_KEY_TCP_SRC
+#define TCA_FLOWER_KEY_TCP_SRC 18
+#endif
+#ifndef HAVE_TCA_FLOWER_KEY_TCP_SRC_MASK
+#define TCA_FLOWER_KEY_TCP_SRC_MASK 35
+#endif
+#ifndef HAVE_TCA_FLOWER_KEY_TCP_DST
+#define TCA_FLOWER_KEY_TCP_DST 19
+#endif
+#ifndef HAVE_TCA_FLOWER_KEY_TCP_DST_MASK
+#define TCA_FLOWER_KEY_TCP_DST_MASK 36
+#endif
+#ifndef HAVE_TCA_FLOWER_KEY_UDP_SRC
+#define TCA_FLOWER_KEY_UDP_SRC 20
+#endif
+#ifndef HAVE_TCA_FLOWER_KEY_UDP_SRC_MASK
+#define TCA_FLOWER_KEY_UDP_SRC_MASK 37
+#endif
+#ifndef HAVE_TCA_FLOWER_KEY_UDP_DST
+#define TCA_FLOWER_KEY_UDP_DST 21
+#endif
+#ifndef HAVE_TCA_FLOWER_KEY_UDP_DST_MASK
+#define TCA_FLOWER_KEY_UDP_DST_MASK 38
+#endif
 
 /** Parser state definitions for mlx5_nl_flow_trans[]. */
 enum mlx5_nl_flow_trans {
@@ -52,6 +122,11 @@ enum mlx5_nl_flow_trans {
 	ATTR,
 	PATTERN,
 	ITEM_VOID,
+	ITEM_ETH,
+	ITEM_IPV4,
+	ITEM_IPV6,
+	ITEM_TCP,
+	ITEM_UDP,
 	ACTIONS,
 	ACTION_VOID,
 	ACTION_PORT_ID,
@@ -73,8 +148,13 @@ static const enum mlx5_nl_flow_trans *const mlx5_nl_flow_trans[] = {
 	[INVALID] = NULL,
 	[BACK] = NULL,
 	[ATTR] = TRANS(PATTERN),
-	[PATTERN] = TRANS(PATTERN_COMMON),
+	[PATTERN] = TRANS(ITEM_ETH, PATTERN_COMMON),
 	[ITEM_VOID] = TRANS(BACK),
+	[ITEM_ETH] = TRANS(ITEM_IPV4, ITEM_IPV6, PATTERN_COMMON),
+	[ITEM_IPV4] = TRANS(ITEM_TCP, ITEM_UDP, PATTERN_COMMON),
+	[ITEM_IPV6] = TRANS(ITEM_TCP, ITEM_UDP, PATTERN_COMMON),
+	[ITEM_TCP] = TRANS(PATTERN_COMMON),
+	[ITEM_UDP] = TRANS(PATTERN_COMMON),
 	[ACTIONS] = TRANS(ACTIONS_FATE, ACTIONS_COMMON),
 	[ACTION_VOID] = TRANS(BACK),
 	[ACTION_PORT_ID] = TRANS(ACTION_VOID, END),
@@ -82,6 +162,126 @@ static const enum mlx5_nl_flow_trans *const mlx5_nl_flow_trans[] = {
 	[END] = NULL,
 };
 
+/** Empty masks for known item types. */
+static const union {
+	struct rte_flow_item_eth eth;
+	struct rte_flow_item_ipv4 ipv4;
+	struct rte_flow_item_ipv6 ipv6;
+	struct rte_flow_item_tcp tcp;
+	struct rte_flow_item_udp udp;
+} mlx5_nl_flow_mask_empty;
+
+/** Supported masks for known item types. */
+static const struct {
+	struct rte_flow_item_eth eth;
+	struct rte_flow_item_ipv4 ipv4;
+	struct rte_flow_item_ipv6 ipv6;
+	struct rte_flow_item_tcp tcp;
+	struct rte_flow_item_udp udp;
+} mlx5_nl_flow_mask_supported = {
+	.eth = {
+		.type = RTE_BE16(0xffff),
+		.dst.addr_bytes = "\xff\xff\xff\xff\xff\xff",
+		.src.addr_bytes = "\xff\xff\xff\xff\xff\xff",
+	},
+	.ipv4.hdr = {
+		.next_proto_id = 0xff,
+		.src_addr = RTE_BE32(0xffffffff),
+		.dst_addr = RTE_BE32(0xffffffff),
+	},
+	.ipv6.hdr = {
+		.proto = 0xff,
+		.src_addr =
+			"\xff\xff\xff\xff\xff\xff\xff\xff"
+			"\xff\xff\xff\xff\xff\xff\xff\xff",
+		.dst_addr =
+			"\xff\xff\xff\xff\xff\xff\xff\xff"
+			"\xff\xff\xff\xff\xff\xff\xff\xff",
+	},
+	.tcp.hdr = {
+		.src_port = RTE_BE16(0xffff),
+		.dst_port = RTE_BE16(0xffff),
+	},
+	.udp.hdr = {
+		.src_port = RTE_BE16(0xffff),
+		.dst_port = RTE_BE16(0xffff),
+	},
+};
+
+/**
+ * Retrieve mask for pattern item.
+ *
+ * This function does basic sanity checks on a pattern item in order to
+ * return the most appropriate mask for it.
+ *
+ * @param[in] item
+ *   Item specification.
+ * @param[in] mask_default
+ *   Default mask for pattern item as specified by the flow API.
+ * @param[in] mask_supported
+ *   Mask fields supported by the implementation.
+ * @param[in] mask_empty
+ *   Empty mask to return when there is no specification.
+ * @param[out] error
+ *   Perform verbose error reporting if not NULL.
+ *
+ * @return
+ *   Either @p item->mask or one of the mask parameters on success, NULL
+ *   otherwise and rte_errno is set.
+ */
+static const void *
+mlx5_nl_flow_item_mask(const struct rte_flow_item *item,
+		       const void *mask_default,
+		       const void *mask_supported,
+		       const void *mask_empty,
+		       size_t mask_size,
+		       struct rte_flow_error *error)
+{
+	const uint8_t *mask;
+	size_t i;
+
+	/* item->last and item->mask cannot exist without item->spec. */
+	if (!item->spec && (item->mask || item->last)) {
+		rte_flow_error_set
+			(error, EINVAL, RTE_FLOW_ERROR_TYPE_ITEM, item,
+			 "\"mask\" or \"last\" field provided without a"
+			 " corresponding \"spec\"");
+		return NULL;
+	}
+	/* No spec, no mask, no problem. */
+	if (!item->spec)
+		return mask_empty;
+	mask = item->mask ? item->mask : mask_default;
+	assert(mask);
+	/*
+	 * Single-pass check to make sure that:
+	 * - Mask is supported, no bits are set outside mask_supported.
+	 * - Both item->spec and item->last are included in mask.
+	 */
+	for (i = 0; i != mask_size; ++i) {
+		if (!mask[i])
+			continue;
+		if ((mask[i] | ((const uint8_t *)mask_supported)[i]) !=
+		    ((const uint8_t *)mask_supported)[i]) {
+			rte_flow_error_set
+				(error, ENOTSUP, RTE_FLOW_ERROR_TYPE_ITEM_MASK,
+				 mask, "unsupported field found in \"mask\"");
+			return NULL;
+		}
+		if (item->last &&
+		    (((const uint8_t *)item->spec)[i] & mask[i]) !=
+		    (((const uint8_t *)item->last)[i] & mask[i])) {
+			rte_flow_error_set
+				(error, ENOTSUP, RTE_FLOW_ERROR_TYPE_ITEM_LAST,
+				 item->last,
+				 "range between \"spec\" and \"last\" not"
+				 " comprised in \"mask\"");
+			return NULL;
+		}
+	}
+	return mask;
+}
+
 /**
  * Transpose flow rule description to rtnetlink message.
  *
@@ -128,6 +328,8 @@ mlx5_nl_flow_transpose(void *buf,
 	const struct rte_flow_action *action;
 	unsigned int n;
 	uint32_t act_index_cur;
+	bool eth_type_set;
+	bool ip_proto_set;
 	struct nlattr *na_flower;
 	struct nlattr *na_flower_act;
 	const enum mlx5_nl_flow_trans *trans;
@@ -140,6 +342,8 @@ mlx5_nl_flow_transpose(void *buf,
 	action = actions;
 	n = 0;
 	act_index_cur = 0;
+	eth_type_set = false;
+	ip_proto_set = false;
 	na_flower = NULL;
 	na_flower_act = NULL;
 	trans = TRANS(ATTR);
@@ -147,6 +351,13 @@ mlx5_nl_flow_transpose(void *buf,
 trans:
 	switch (trans[n++]) {
 		union {
+			const struct rte_flow_item_eth *eth;
+			const struct rte_flow_item_ipv4 *ipv4;
+			const struct rte_flow_item_ipv6 *ipv6;
+			const struct rte_flow_item_tcp *tcp;
+			const struct rte_flow_item_udp *udp;
+		} spec, mask;
+		union {
 			const struct rte_flow_action_port_id *port_id;
 		} conf;
 		struct nlmsghdr *nlh;
@@ -235,6 +446,256 @@ mlx5_nl_flow_transpose(void *buf,
 			goto trans;
 		++item;
 		break;
+	case ITEM_ETH:
+		if (item->type != RTE_FLOW_ITEM_TYPE_ETH)
+			goto trans;
+		mask.eth = mlx5_nl_flow_item_mask
+			(item, &rte_flow_item_eth_mask,
+			 &mlx5_nl_flow_mask_supported.eth,
+			 &mlx5_nl_flow_mask_empty.eth,
+			 sizeof(mlx5_nl_flow_mask_supported.eth), error);
+		if (!mask.eth)
+			return -rte_errno;
+		if (mask.eth == &mlx5_nl_flow_mask_empty.eth) {
+			++item;
+			break;
+		}
+		spec.eth = item->spec;
+		if (mask.eth->type && mask.eth->type != RTE_BE16(0xffff))
+			return rte_flow_error_set
+				(error, ENOTSUP, RTE_FLOW_ERROR_TYPE_ITEM_MASK,
+				 mask.eth,
+				 "no support for partial mask on"
+				 " \"type\" field");
+		if (mask.eth->type) {
+			if (!mnl_attr_put_u16_check(buf, size,
+						    TCA_FLOWER_KEY_ETH_TYPE,
+						    spec.eth->type))
+				goto error_nobufs;
+			eth_type_set = 1;
+		}
+		if ((!is_zero_ether_addr(&mask.eth->dst) &&
+		     (!mnl_attr_put_check(buf, size,
+					  TCA_FLOWER_KEY_ETH_DST,
+					  ETHER_ADDR_LEN,
+					  spec.eth->dst.addr_bytes) ||
+		      !mnl_attr_put_check(buf, size,
+					  TCA_FLOWER_KEY_ETH_DST_MASK,
+					  ETHER_ADDR_LEN,
+					  mask.eth->dst.addr_bytes))) ||
+		    (!is_zero_ether_addr(&mask.eth->src) &&
+		     (!mnl_attr_put_check(buf, size,
+					  TCA_FLOWER_KEY_ETH_SRC,
+					  ETHER_ADDR_LEN,
+					  spec.eth->src.addr_bytes) ||
+		      !mnl_attr_put_check(buf, size,
+					  TCA_FLOWER_KEY_ETH_SRC_MASK,
+					  ETHER_ADDR_LEN,
+					  mask.eth->src.addr_bytes))))
+			goto error_nobufs;
+		++item;
+		break;
+	case ITEM_IPV4:
+		if (item->type != RTE_FLOW_ITEM_TYPE_IPV4)
+			goto trans;
+		mask.ipv4 = mlx5_nl_flow_item_mask
+			(item, &rte_flow_item_ipv4_mask,
+			 &mlx5_nl_flow_mask_supported.ipv4,
+			 &mlx5_nl_flow_mask_empty.ipv4,
+			 sizeof(mlx5_nl_flow_mask_supported.ipv4), error);
+		if (!mask.ipv4)
+			return -rte_errno;
+		if (!eth_type_set &&
+		    !mnl_attr_put_u16_check(buf, size,
+					    TCA_FLOWER_KEY_ETH_TYPE,
+					    RTE_BE16(ETH_P_IP)))
+			goto error_nobufs;
+		eth_type_set = 1;
+		if (mask.ipv4 == &mlx5_nl_flow_mask_empty.ipv4) {
+			++item;
+			break;
+		}
+		spec.ipv4 = item->spec;
+		if (mask.ipv4->hdr.next_proto_id &&
+		    mask.ipv4->hdr.next_proto_id != 0xff)
+			return rte_flow_error_set
+				(error, ENOTSUP, RTE_FLOW_ERROR_TYPE_ITEM_MASK,
+				 mask.ipv4,
+				 "no support for partial mask on"
+				 " \"hdr.next_proto_id\" field");
+		if (mask.ipv4->hdr.next_proto_id) {
+			if (!mnl_attr_put_u8_check
+			    (buf, size, TCA_FLOWER_KEY_IP_PROTO,
+			     spec.ipv4->hdr.next_proto_id))
+				goto error_nobufs;
+			ip_proto_set = 1;
+		}
+		if ((mask.ipv4->hdr.src_addr &&
+		     (!mnl_attr_put_u32_check(buf, size,
+					      TCA_FLOWER_KEY_IPV4_SRC,
+					      spec.ipv4->hdr.src_addr) ||
+		      !mnl_attr_put_u32_check(buf, size,
+					      TCA_FLOWER_KEY_IPV4_SRC_MASK,
+					      mask.ipv4->hdr.src_addr))) ||
+		    (mask.ipv4->hdr.dst_addr &&
+		     (!mnl_attr_put_u32_check(buf, size,
+					      TCA_FLOWER_KEY_IPV4_DST,
+					      spec.ipv4->hdr.dst_addr) ||
+		      !mnl_attr_put_u32_check(buf, size,
+					      TCA_FLOWER_KEY_IPV4_DST_MASK,
+					      mask.ipv4->hdr.dst_addr))))
+			goto error_nobufs;
+		++item;
+		break;
+	case ITEM_IPV6:
+		if (item->type != RTE_FLOW_ITEM_TYPE_IPV6)
+			goto trans;
+		mask.ipv6 = mlx5_nl_flow_item_mask
+			(item, &rte_flow_item_ipv6_mask,
+			 &mlx5_nl_flow_mask_supported.ipv6,
+			 &mlx5_nl_flow_mask_empty.ipv6,
+			 sizeof(mlx5_nl_flow_mask_supported.ipv6), error);
+		if (!mask.ipv6)
+			return -rte_errno;
+		if (!eth_type_set &&
+		    !mnl_attr_put_u16_check(buf, size,
+					    TCA_FLOWER_KEY_ETH_TYPE,
+					    RTE_BE16(ETH_P_IPV6)))
+			goto error_nobufs;
+		eth_type_set = 1;
+		if (mask.ipv6 == &mlx5_nl_flow_mask_empty.ipv6) {
+			++item;
+			break;
+		}
+		spec.ipv6 = item->spec;
+		if (mask.ipv6->hdr.proto && mask.ipv6->hdr.proto != 0xff)
+			return rte_flow_error_set
+				(error, ENOTSUP, RTE_FLOW_ERROR_TYPE_ITEM_MASK,
+				 mask.ipv6,
+				 "no support for partial mask on"
+				 " \"hdr.proto\" field");
+		if (mask.ipv6->hdr.proto) {
+			if (!mnl_attr_put_u8_check
+			    (buf, size, TCA_FLOWER_KEY_IP_PROTO,
+			     spec.ipv6->hdr.proto))
+				goto error_nobufs;
+			ip_proto_set = 1;
+		}
+		if ((!IN6_IS_ADDR_UNSPECIFIED(mask.ipv6->hdr.src_addr) &&
+		     (!mnl_attr_put_check(buf, size,
+					  TCA_FLOWER_KEY_IPV6_SRC,
+					  sizeof(spec.ipv6->hdr.src_addr),
+					  spec.ipv6->hdr.src_addr) ||
+		      !mnl_attr_put_check(buf, size,
+					  TCA_FLOWER_KEY_IPV6_SRC_MASK,
+					  sizeof(mask.ipv6->hdr.src_addr),
+					  mask.ipv6->hdr.src_addr))) ||
+		    (!IN6_IS_ADDR_UNSPECIFIED(mask.ipv6->hdr.dst_addr) &&
+		     (!mnl_attr_put_check(buf, size,
+					  TCA_FLOWER_KEY_IPV6_DST,
+					  sizeof(spec.ipv6->hdr.dst_addr),
+					  spec.ipv6->hdr.dst_addr) ||
+		      !mnl_attr_put_check(buf, size,
+					  TCA_FLOWER_KEY_IPV6_DST_MASK,
+					  sizeof(mask.ipv6->hdr.dst_addr),
+					  mask.ipv6->hdr.dst_addr))))
+			goto error_nobufs;
+		++item;
+		break;
+	case ITEM_TCP:
+		if (item->type != RTE_FLOW_ITEM_TYPE_TCP)
+			goto trans;
+		mask.tcp = mlx5_nl_flow_item_mask
+			(item, &rte_flow_item_tcp_mask,
+			 &mlx5_nl_flow_mask_supported.tcp,
+			 &mlx5_nl_flow_mask_empty.tcp,
+			 sizeof(mlx5_nl_flow_mask_supported.tcp), error);
+		if (!mask.tcp)
+			return -rte_errno;
+		if (!ip_proto_set &&
+		    !mnl_attr_put_u8_check(buf, size,
+					   TCA_FLOWER_KEY_IP_PROTO,
+					   IPPROTO_TCP))
+			goto error_nobufs;
+		if (mask.tcp == &mlx5_nl_flow_mask_empty.tcp) {
+			++item;
+			break;
+		}
+		spec.tcp = item->spec;
+		if ((mask.tcp->hdr.src_port &&
+		     mask.tcp->hdr.src_port != RTE_BE16(0xffff)) ||
+		    (mask.tcp->hdr.dst_port &&
+		     mask.tcp->hdr.dst_port != RTE_BE16(0xffff)))
+			return rte_flow_error_set
+				(error, ENOTSUP, RTE_FLOW_ERROR_TYPE_ITEM_MASK,
+				 mask.tcp,
+				 "no support for partial masks on"
+				 " \"hdr.src_port\" and \"hdr.dst_port\""
+				 " fields");
+		if ((mask.tcp->hdr.src_port &&
+		     (!mnl_attr_put_u16_check(buf, size,
+					      TCA_FLOWER_KEY_TCP_SRC,
+					      spec.tcp->hdr.src_port) ||
+		      !mnl_attr_put_u16_check(buf, size,
+					      TCA_FLOWER_KEY_TCP_SRC_MASK,
+					      mask.tcp->hdr.src_port))) ||
+		    (mask.tcp->hdr.dst_port &&
+		     (!mnl_attr_put_u16_check(buf, size,
+					      TCA_FLOWER_KEY_TCP_DST,
+					      spec.tcp->hdr.dst_port) ||
+		      !mnl_attr_put_u16_check(buf, size,
+					      TCA_FLOWER_KEY_TCP_DST_MASK,
+					      mask.tcp->hdr.dst_port))))
+			goto error_nobufs;
+		++item;
+		break;
+	case ITEM_UDP:
+		if (item->type != RTE_FLOW_ITEM_TYPE_UDP)
+			goto trans;
+		mask.udp = mlx5_nl_flow_item_mask
+			(item, &rte_flow_item_udp_mask,
+			 &mlx5_nl_flow_mask_supported.udp,
+			 &mlx5_nl_flow_mask_empty.udp,
+			 sizeof(mlx5_nl_flow_mask_supported.udp), error);
+		if (!mask.udp)
+			return -rte_errno;
+		if (!ip_proto_set &&
+		    !mnl_attr_put_u8_check(buf, size,
+					   TCA_FLOWER_KEY_IP_PROTO,
+					   IPPROTO_UDP))
+			goto error_nobufs;
+		if (mask.udp == &mlx5_nl_flow_mask_empty.udp) {
+			++item;
+			break;
+		}
+		spec.udp = item->spec;
+		if ((mask.udp->hdr.src_port &&
+		     mask.udp->hdr.src_port != RTE_BE16(0xffff)) ||
+		    (mask.udp->hdr.dst_port &&
+		     mask.udp->hdr.dst_port != RTE_BE16(0xffff)))
+			return rte_flow_error_set
+				(error, ENOTSUP, RTE_FLOW_ERROR_TYPE_ITEM_MASK,
+				 mask.udp,
+				 "no support for partial masks on"
+				 " \"hdr.src_port\" and \"hdr.dst_port\""
+				 " fields");
+		if ((mask.udp->hdr.src_port &&
+		     (!mnl_attr_put_u16_check(buf, size,
+					      TCA_FLOWER_KEY_UDP_SRC,
+					      spec.udp->hdr.src_port) ||
+		      !mnl_attr_put_u16_check(buf, size,
+					      TCA_FLOWER_KEY_UDP_SRC_MASK,
+					      mask.udp->hdr.src_port))) ||
+		    (mask.udp->hdr.dst_port &&
+		     (!mnl_attr_put_u16_check(buf, size,
+					      TCA_FLOWER_KEY_UDP_DST,
+					      spec.udp->hdr.dst_port) ||
+		      !mnl_attr_put_u16_check(buf, size,
+					      TCA_FLOWER_KEY_UDP_DST_MASK,
+					      mask.udp->hdr.dst_port))))
+			goto error_nobufs;
+		++item;
+		break;
 	case ACTIONS:
 		if (item->type != RTE_FLOW_ITEM_TYPE_END)
 			goto trans;
-- 
2.11.0

^ permalink raw reply	[flat|nested] 33+ messages in thread

* [dpdk-dev] [PATCH v2 5/6] net/mlx5: add VLAN item and actions to switch flow rules
  2018-07-13  9:40 ` [dpdk-dev] [PATCH v2 " Adrien Mazarguil
                     ` (3 preceding siblings ...)
  2018-07-13  9:40   ` [dpdk-dev] [PATCH v2 4/6] net/mlx5: add L2-L4 pattern items " Adrien Mazarguil
@ 2018-07-13  9:40   ` Adrien Mazarguil
  2018-07-13  9:40   ` [dpdk-dev] [PATCH v2 6/6] net/mlx5: add port ID pattern item " Adrien Mazarguil
  2018-07-22 11:21   ` [dpdk-dev] [PATCH v2 0/6] net/mlx5: add support for " Shahaf Shuler
  6 siblings, 0 replies; 33+ messages in thread
From: Adrien Mazarguil @ 2018-07-13  9:40 UTC (permalink / raw)
  To: Shahaf Shuler; +Cc: Nelio Laranjeiro, Yongseok Koh, dev

This enables flow rules to explicitly match VLAN traffic (VLAN pattern
item) and perform various operations on VLAN headers at the switch level
(OF_POP_VLAN, OF_PUSH_VLAN, OF_SET_VLAN_VID and OF_SET_VLAN_PCP actions).

Testpmd examples:

- Directing all VLAN traffic received on port ID 1 to port ID 0:

  flow create 1 ingress transfer pattern eth / vlan / end actions
     port_id id 0 / end

- Adding a VLAN header to IPv6 traffic received on port ID 1 and directing
  it to port ID 0:

  flow create 1 ingress transfer pattern eth / ipv6 / end actions
     of_push_vlan ethertype 0x8100 / of_set_vlan_vid vlan_vid 42 /
     port_id id 0 / end

Signed-off-by: Adrien Mazarguil <adrien.mazarguil@6wind.com>
Acked-by: Nelio Laranjeiro <nelio.laranjeiro@6wind.com>
Acked-by: Yongseok Koh <yskoh@mellanox.com>
--
v2 changes:

- Yongseok, I chose not to add extra safety to VLAN POP at this point since
  basic rte_flow_validate() requirements are satisfied: this implementation
  makes sure that a flow rule is fully understood and can be attempted; it
  just doesn't perform extra HW-specific checks and leaves them to the
  kernel. They can be added later if necessary.
- Added definitions for TC_ACT_VLAN, TCA_FLOWER_KEY_VLAN_ID,
  TCA_FLOWER_KEY_VLAN_PRIO and TCA_FLOWER_KEY_VLAN_ETH_TYPE in case they
  are missing from the host system (e.g. RHEL 7.2).
---
 drivers/net/mlx5/Makefile       |  20 ++++
 drivers/net/mlx5/mlx5_nl_flow.c | 208 ++++++++++++++++++++++++++++++++++-
 2 files changed, 224 insertions(+), 4 deletions(-)

diff --git a/drivers/net/mlx5/Makefile b/drivers/net/mlx5/Makefile
index 5e28b4c87..6dd218285 100644
--- a/drivers/net/mlx5/Makefile
+++ b/drivers/net/mlx5/Makefile
@@ -319,6 +319,26 @@ mlx5_autoconf.h.new: $(RTE_SDK)/buildtools/auto-config-h.sh
 		linux/pkt_cls.h \
 		enum TCA_FLOWER_KEY_UDP_DST_MASK \
 		$(AUTOCONF_OUTPUT)
+	$Q sh -- '$<' '$@' \
+		HAVE_TCA_FLOWER_KEY_VLAN_ID \
+		linux/pkt_cls.h \
+		enum TCA_FLOWER_KEY_VLAN_ID \
+		$(AUTOCONF_OUTPUT)
+	$Q sh -- '$<' '$@' \
+		HAVE_TCA_FLOWER_KEY_VLAN_PRIO \
+		linux/pkt_cls.h \
+		enum TCA_FLOWER_KEY_VLAN_PRIO \
+		$(AUTOCONF_OUTPUT)
+	$Q sh -- '$<' '$@' \
+		HAVE_TCA_FLOWER_KEY_VLAN_ETH_TYPE \
+		linux/pkt_cls.h \
+		enum TCA_FLOWER_KEY_VLAN_ETH_TYPE \
+		$(AUTOCONF_OUTPUT)
+	$Q sh -- '$<' '$@' \
+		HAVE_TC_ACT_VLAN \
+		linux/tc_act/tc_vlan.h \
+		enum TCA_VLAN_PUSH_VLAN_PRIORITY \
+		$(AUTOCONF_OUTPUT)
 
 # Create mlx5_autoconf.h or update it in case it differs from the new one.
 
diff --git a/drivers/net/mlx5/mlx5_nl_flow.c b/drivers/net/mlx5/mlx5_nl_flow.c
index 88e7cabd5..6c7bf7119 100644
--- a/drivers/net/mlx5/mlx5_nl_flow.c
+++ b/drivers/net/mlx5/mlx5_nl_flow.c
@@ -27,6 +27,29 @@
 #include <rte_flow.h>
 
 #include "mlx5.h"
+#include "mlx5_autoconf.h"
+
+#ifdef HAVE_TC_ACT_VLAN
+
+#include <linux/tc_act/tc_vlan.h>
+
+#else /* HAVE_TC_ACT_VLAN */
+
+#define TCA_VLAN_ACT_POP 1
+#define TCA_VLAN_ACT_PUSH 2
+#define TCA_VLAN_ACT_MODIFY 3
+#define TCA_VLAN_PARMS 2
+#define TCA_VLAN_PUSH_VLAN_ID 3
+#define TCA_VLAN_PUSH_VLAN_PROTOCOL 4
+#define TCA_VLAN_PAD 5
+#define TCA_VLAN_PUSH_VLAN_PRIORITY 6
+
+struct tc_vlan {
+	tc_gen;
+	int v_action;
+};
+
+#endif /* HAVE_TC_ACT_VLAN */
 
 /* Normally found in linux/netlink.h. */
 #ifndef NETLINK_CAP_ACK
@@ -114,6 +137,15 @@
 #ifndef HAVE_TCA_FLOWER_KEY_UDP_DST_MASK
 #define TCA_FLOWER_KEY_UDP_DST_MASK 38
 #endif
+#ifndef HAVE_TCA_FLOWER_KEY_VLAN_ID
+#define TCA_FLOWER_KEY_VLAN_ID 23
+#endif
+#ifndef HAVE_TCA_FLOWER_KEY_VLAN_PRIO
+#define TCA_FLOWER_KEY_VLAN_PRIO 24
+#endif
+#ifndef HAVE_TCA_FLOWER_KEY_VLAN_ETH_TYPE
+#define TCA_FLOWER_KEY_VLAN_ETH_TYPE 25
+#endif
 
 /** Parser state definitions for mlx5_nl_flow_trans[]. */
 enum mlx5_nl_flow_trans {
@@ -123,6 +155,7 @@ enum mlx5_nl_flow_trans {
 	PATTERN,
 	ITEM_VOID,
 	ITEM_ETH,
+	ITEM_VLAN,
 	ITEM_IPV4,
 	ITEM_IPV6,
 	ITEM_TCP,
@@ -131,6 +164,10 @@ enum mlx5_nl_flow_trans {
 	ACTION_VOID,
 	ACTION_PORT_ID,
 	ACTION_DROP,
+	ACTION_OF_POP_VLAN,
+	ACTION_OF_PUSH_VLAN,
+	ACTION_OF_SET_VLAN_VID,
+	ACTION_OF_SET_VLAN_PCP,
 	END,
 };
 
@@ -139,7 +176,8 @@ enum mlx5_nl_flow_trans {
 #define PATTERN_COMMON \
 	ITEM_VOID, ACTIONS
 #define ACTIONS_COMMON \
-	ACTION_VOID
+	ACTION_VOID, ACTION_OF_POP_VLAN, ACTION_OF_PUSH_VLAN, \
+	ACTION_OF_SET_VLAN_VID, ACTION_OF_SET_VLAN_PCP
 #define ACTIONS_FATE \
 	ACTION_PORT_ID, ACTION_DROP
 
@@ -150,7 +188,8 @@ static const enum mlx5_nl_flow_trans *const mlx5_nl_flow_trans[] = {
 	[ATTR] = TRANS(PATTERN),
 	[PATTERN] = TRANS(ITEM_ETH, PATTERN_COMMON),
 	[ITEM_VOID] = TRANS(BACK),
-	[ITEM_ETH] = TRANS(ITEM_IPV4, ITEM_IPV6, PATTERN_COMMON),
+	[ITEM_ETH] = TRANS(ITEM_IPV4, ITEM_IPV6, ITEM_VLAN, PATTERN_COMMON),
+	[ITEM_VLAN] = TRANS(ITEM_IPV4, ITEM_IPV6, PATTERN_COMMON),
 	[ITEM_IPV4] = TRANS(ITEM_TCP, ITEM_UDP, PATTERN_COMMON),
 	[ITEM_IPV6] = TRANS(ITEM_TCP, ITEM_UDP, PATTERN_COMMON),
 	[ITEM_TCP] = TRANS(PATTERN_COMMON),
@@ -159,12 +198,17 @@ static const enum mlx5_nl_flow_trans *const mlx5_nl_flow_trans[] = {
 	[ACTION_VOID] = TRANS(BACK),
 	[ACTION_PORT_ID] = TRANS(ACTION_VOID, END),
 	[ACTION_DROP] = TRANS(ACTION_VOID, END),
+	[ACTION_OF_POP_VLAN] = TRANS(ACTIONS_FATE, ACTIONS_COMMON),
+	[ACTION_OF_PUSH_VLAN] = TRANS(ACTIONS_FATE, ACTIONS_COMMON),
+	[ACTION_OF_SET_VLAN_VID] = TRANS(ACTIONS_FATE, ACTIONS_COMMON),
+	[ACTION_OF_SET_VLAN_PCP] = TRANS(ACTIONS_FATE, ACTIONS_COMMON),
 	[END] = NULL,
 };
 
 /** Empty masks for known item types. */
 static const union {
 	struct rte_flow_item_eth eth;
+	struct rte_flow_item_vlan vlan;
 	struct rte_flow_item_ipv4 ipv4;
 	struct rte_flow_item_ipv6 ipv6;
 	struct rte_flow_item_tcp tcp;
@@ -174,6 +218,7 @@ static const union {
 /** Supported masks for known item types. */
 static const struct {
 	struct rte_flow_item_eth eth;
+	struct rte_flow_item_vlan vlan;
 	struct rte_flow_item_ipv4 ipv4;
 	struct rte_flow_item_ipv6 ipv6;
 	struct rte_flow_item_tcp tcp;
@@ -184,6 +229,11 @@ static const struct {
 		.dst.addr_bytes = "\xff\xff\xff\xff\xff\xff",
 		.src.addr_bytes = "\xff\xff\xff\xff\xff\xff",
 	},
+	.vlan = {
+		/* PCP and VID only, no DEI. */
+		.tci = RTE_BE16(0xefff),
+		.inner_type = RTE_BE16(0xffff),
+	},
 	.ipv4.hdr = {
 		.next_proto_id = 0xff,
 		.src_addr = RTE_BE32(0xffffffff),
@@ -329,9 +379,13 @@ mlx5_nl_flow_transpose(void *buf,
 	unsigned int n;
 	uint32_t act_index_cur;
 	bool eth_type_set;
+	bool vlan_present;
+	bool vlan_eth_type_set;
 	bool ip_proto_set;
 	struct nlattr *na_flower;
 	struct nlattr *na_flower_act;
+	struct nlattr *na_vlan_id;
+	struct nlattr *na_vlan_priority;
 	const enum mlx5_nl_flow_trans *trans;
 	const enum mlx5_nl_flow_trans *back;
 
@@ -343,15 +397,20 @@ mlx5_nl_flow_transpose(void *buf,
 	n = 0;
 	act_index_cur = 0;
 	eth_type_set = false;
+	vlan_present = false;
+	vlan_eth_type_set = false;
 	ip_proto_set = false;
 	na_flower = NULL;
 	na_flower_act = NULL;
+	na_vlan_id = NULL;
+	na_vlan_priority = NULL;
 	trans = TRANS(ATTR);
 	back = trans;
 trans:
 	switch (trans[n++]) {
 		union {
 			const struct rte_flow_item_eth *eth;
+			const struct rte_flow_item_vlan *vlan;
 			const struct rte_flow_item_ipv4 *ipv4;
 			const struct rte_flow_item_ipv6 *ipv6;
 			const struct rte_flow_item_tcp *tcp;
@@ -359,6 +418,11 @@ mlx5_nl_flow_transpose(void *buf,
 		} spec, mask;
 		union {
 			const struct rte_flow_action_port_id *port_id;
+			const struct rte_flow_action_of_push_vlan *of_push_vlan;
+			const struct rte_flow_action_of_set_vlan_vid *
+				of_set_vlan_vid;
+			const struct rte_flow_action_of_set_vlan_pcp *
+				of_set_vlan_pcp;
 		} conf;
 		struct nlmsghdr *nlh;
 		struct tcmsg *tcm;
@@ -495,6 +559,58 @@ mlx5_nl_flow_transpose(void *buf,
 			goto error_nobufs;
 		++item;
 		break;
+	case ITEM_VLAN:
+		if (item->type != RTE_FLOW_ITEM_TYPE_VLAN)
+			goto trans;
+		mask.vlan = mlx5_nl_flow_item_mask
+			(item, &rte_flow_item_vlan_mask,
+			 &mlx5_nl_flow_mask_supported.vlan,
+			 &mlx5_nl_flow_mask_empty.vlan,
+			 sizeof(mlx5_nl_flow_mask_supported.vlan), error);
+		if (!mask.vlan)
+			return -rte_errno;
+		if (!eth_type_set &&
+		    !mnl_attr_put_u16_check(buf, size,
+					    TCA_FLOWER_KEY_ETH_TYPE,
+					    RTE_BE16(ETH_P_8021Q)))
+			goto error_nobufs;
+		eth_type_set = 1;
+		vlan_present = 1;
+		if (mask.vlan == &mlx5_nl_flow_mask_empty.vlan) {
+			++item;
+			break;
+		}
+		spec.vlan = item->spec;
+		if ((mask.vlan->tci & RTE_BE16(0xe000) &&
+		     (mask.vlan->tci & RTE_BE16(0xe000)) != RTE_BE16(0xe000)) ||
+		    (mask.vlan->tci & RTE_BE16(0x0fff) &&
+		     (mask.vlan->tci & RTE_BE16(0x0fff)) != RTE_BE16(0x0fff)) ||
+		    (mask.vlan->inner_type &&
+		     mask.vlan->inner_type != RTE_BE16(0xffff)))
+			return rte_flow_error_set
+				(error, ENOTSUP, RTE_FLOW_ERROR_TYPE_ITEM_MASK,
+				 mask.vlan,
+				 "no support for partial masks on"
+				 " \"tci\" (PCP and VID parts) and"
+				 " \"inner_type\" fields");
+		if (mask.vlan->inner_type) {
+			if (!mnl_attr_put_u16_check
+			    (buf, size, TCA_FLOWER_KEY_VLAN_ETH_TYPE,
+			     spec.vlan->inner_type))
+				goto error_nobufs;
+			vlan_eth_type_set = 1;
+		}
+		if ((mask.vlan->tci & RTE_BE16(0xe000) &&
+		     !mnl_attr_put_u8_check
+		     (buf, size, TCA_FLOWER_KEY_VLAN_PRIO,
+		      (rte_be_to_cpu_16(spec.vlan->tci) >> 13) & 0x7)) ||
+		    (mask.vlan->tci & RTE_BE16(0x0fff) &&
+		     !mnl_attr_put_u16_check
+		     (buf, size, TCA_FLOWER_KEY_VLAN_ID,
+		      spec.vlan->tci & RTE_BE16(0x0fff))))
+			goto error_nobufs;
+		++item;
+		break;
 	case ITEM_IPV4:
 		if (item->type != RTE_FLOW_ITEM_TYPE_IPV4)
 			goto trans;
@@ -505,12 +621,15 @@ mlx5_nl_flow_transpose(void *buf,
 			 sizeof(mlx5_nl_flow_mask_supported.ipv4), error);
 		if (!mask.ipv4)
 			return -rte_errno;
-		if (!eth_type_set &&
+		if ((!eth_type_set || !vlan_eth_type_set) &&
 		    !mnl_attr_put_u16_check(buf, size,
+					    vlan_present ?
+					    TCA_FLOWER_KEY_VLAN_ETH_TYPE :
 					    TCA_FLOWER_KEY_ETH_TYPE,
 					    RTE_BE16(ETH_P_IP)))
 			goto error_nobufs;
 		eth_type_set = 1;
+		vlan_eth_type_set = 1;
 		if (mask.ipv4 == &mlx5_nl_flow_mask_empty.ipv4) {
 			++item;
 			break;
@@ -557,12 +676,15 @@ mlx5_nl_flow_transpose(void *buf,
 			 sizeof(mlx5_nl_flow_mask_supported.ipv6), error);
 		if (!mask.ipv6)
 			return -rte_errno;
-		if (!eth_type_set &&
+		if ((!eth_type_set || !vlan_eth_type_set) &&
 		    !mnl_attr_put_u16_check(buf, size,
+					    vlan_present ?
+					    TCA_FLOWER_KEY_VLAN_ETH_TYPE :
 					    TCA_FLOWER_KEY_ETH_TYPE,
 					    RTE_BE16(ETH_P_IPV6)))
 			goto error_nobufs;
 		eth_type_set = 1;
+		vlan_eth_type_set = 1;
 		if (mask.ipv6 == &mlx5_nl_flow_mask_empty.ipv6) {
 			++item;
 			break;
@@ -768,6 +890,84 @@ mlx5_nl_flow_transpose(void *buf,
 		mnl_attr_nest_end(buf, act_index);
 		++action;
 		break;
+	case ACTION_OF_POP_VLAN:
+		if (action->type != RTE_FLOW_ACTION_TYPE_OF_POP_VLAN)
+			goto trans;
+		conf.of_push_vlan = NULL;
+		i = TCA_VLAN_ACT_POP;
+		goto action_of_vlan;
+	case ACTION_OF_PUSH_VLAN:
+		if (action->type != RTE_FLOW_ACTION_TYPE_OF_PUSH_VLAN)
+			goto trans;
+		conf.of_push_vlan = action->conf;
+		i = TCA_VLAN_ACT_PUSH;
+		goto action_of_vlan;
+	case ACTION_OF_SET_VLAN_VID:
+		if (action->type != RTE_FLOW_ACTION_TYPE_OF_SET_VLAN_VID)
+			goto trans;
+		conf.of_set_vlan_vid = action->conf;
+		if (na_vlan_id)
+			goto override_na_vlan_id;
+		i = TCA_VLAN_ACT_MODIFY;
+		goto action_of_vlan;
+	case ACTION_OF_SET_VLAN_PCP:
+		if (action->type != RTE_FLOW_ACTION_TYPE_OF_SET_VLAN_PCP)
+			goto trans;
+		conf.of_set_vlan_pcp = action->conf;
+		if (na_vlan_priority)
+			goto override_na_vlan_priority;
+		i = TCA_VLAN_ACT_MODIFY;
+		goto action_of_vlan;
+action_of_vlan:
+		act_index =
+			mnl_attr_nest_start_check(buf, size, act_index_cur++);
+		if (!act_index ||
+		    !mnl_attr_put_strz_check(buf, size, TCA_ACT_KIND, "vlan"))
+			goto error_nobufs;
+		act = mnl_attr_nest_start_check(buf, size, TCA_ACT_OPTIONS);
+		if (!act)
+			goto error_nobufs;
+		if (!mnl_attr_put_check(buf, size, TCA_VLAN_PARMS,
+					sizeof(struct tc_vlan),
+					&(struct tc_vlan){
+						.action = TC_ACT_PIPE,
+						.v_action = i,
+					}))
+			goto error_nobufs;
+		if (i == TCA_VLAN_ACT_POP) {
+			mnl_attr_nest_end(buf, act);
+			++action;
+			break;
+		}
+		if (i == TCA_VLAN_ACT_PUSH &&
+		    !mnl_attr_put_u16_check(buf, size,
+					    TCA_VLAN_PUSH_VLAN_PROTOCOL,
+					    conf.of_push_vlan->ethertype))
+			goto error_nobufs;
+		na_vlan_id = mnl_nlmsg_get_payload_tail(buf);
+		if (!mnl_attr_put_u16_check(buf, size, TCA_VLAN_PAD, 0))
+			goto error_nobufs;
+		na_vlan_priority = mnl_nlmsg_get_payload_tail(buf);
+		if (!mnl_attr_put_u8_check(buf, size, TCA_VLAN_PAD, 0))
+			goto error_nobufs;
+		mnl_attr_nest_end(buf, act);
+		mnl_attr_nest_end(buf, act_index);
+		if (action->type == RTE_FLOW_ACTION_TYPE_OF_SET_VLAN_VID) {
+override_na_vlan_id:
+			na_vlan_id->nla_type = TCA_VLAN_PUSH_VLAN_ID;
+			*(uint16_t *)mnl_attr_get_payload(na_vlan_id) =
+				rte_be_to_cpu_16
+				(conf.of_set_vlan_vid->vlan_vid);
+		} else if (action->type ==
+			   RTE_FLOW_ACTION_TYPE_OF_SET_VLAN_PCP) {
+override_na_vlan_priority:
+			na_vlan_priority->nla_type =
+				TCA_VLAN_PUSH_VLAN_PRIORITY;
+			*(uint8_t *)mnl_attr_get_payload(na_vlan_priority) =
+				conf.of_set_vlan_pcp->vlan_pcp;
+		}
+		++action;
+		break;
 	case END:
 		if (item->type != RTE_FLOW_ITEM_TYPE_END ||
 		    action->type != RTE_FLOW_ACTION_TYPE_END)
-- 
2.11.0

^ permalink raw reply	[flat|nested] 33+ messages in thread

* [dpdk-dev] [PATCH v2 6/6] net/mlx5: add port ID pattern item to switch flow rules
  2018-07-13  9:40 ` [dpdk-dev] [PATCH v2 " Adrien Mazarguil
                     ` (4 preceding siblings ...)
  2018-07-13  9:40   ` [dpdk-dev] [PATCH v2 5/6] net/mlx5: add VLAN item and actions " Adrien Mazarguil
@ 2018-07-13  9:40   ` Adrien Mazarguil
  2018-07-22 11:21   ` [dpdk-dev] [PATCH v2 0/6] net/mlx5: add support for " Shahaf Shuler
  6 siblings, 0 replies; 33+ messages in thread
From: Adrien Mazarguil @ 2018-07-13  9:40 UTC (permalink / raw)
  To: Shahaf Shuler; +Cc: Nelio Laranjeiro, Yongseok Koh, dev

This enables flow rules to match traffic coming from a different DPDK port
ID associated with the device (PORT_ID pattern item), mainly for the
convenience of applications that want to deal with a single port ID for all
flow rules associated with some physical device.

Testpmd example:

- Creating a flow rule on port ID 1 to consume all traffic from port ID 0
  and direct it to port ID 2:

  flow create 1 ingress transfer pattern port_id id is 0 / end actions
     port_id id 2 / end

Signed-off-by: Adrien Mazarguil <adrien.mazarguil@6wind.com>
Acked-by: Nelio Laranjeiro <nelio.laranjeiro@6wind.com>
Acked-by: Yongseok Koh <yskoh@mellanox.com>
---
 drivers/net/mlx5/mlx5_nl_flow.c | 57 +++++++++++++++++++++++++++++++++++-
 1 file changed, 56 insertions(+), 1 deletion(-)

diff --git a/drivers/net/mlx5/mlx5_nl_flow.c b/drivers/net/mlx5/mlx5_nl_flow.c
index 6c7bf7119..9bad1a418 100644
--- a/drivers/net/mlx5/mlx5_nl_flow.c
+++ b/drivers/net/mlx5/mlx5_nl_flow.c
@@ -154,6 +154,7 @@ enum mlx5_nl_flow_trans {
 	ATTR,
 	PATTERN,
 	ITEM_VOID,
+	ITEM_PORT_ID,
 	ITEM_ETH,
 	ITEM_VLAN,
 	ITEM_IPV4,
@@ -174,7 +175,7 @@ enum mlx5_nl_flow_trans {
 #define TRANS(...) (const enum mlx5_nl_flow_trans []){ __VA_ARGS__, INVALID, }
 
 #define PATTERN_COMMON \
-	ITEM_VOID, ACTIONS
+	ITEM_VOID, ITEM_PORT_ID, ACTIONS
 #define ACTIONS_COMMON \
 	ACTION_VOID, ACTION_OF_POP_VLAN, ACTION_OF_PUSH_VLAN, \
 	ACTION_OF_SET_VLAN_VID, ACTION_OF_SET_VLAN_PCP
@@ -188,6 +189,7 @@ static const enum mlx5_nl_flow_trans *const mlx5_nl_flow_trans[] = {
 	[ATTR] = TRANS(PATTERN),
 	[PATTERN] = TRANS(ITEM_ETH, PATTERN_COMMON),
 	[ITEM_VOID] = TRANS(BACK),
+	[ITEM_PORT_ID] = TRANS(BACK),
 	[ITEM_ETH] = TRANS(ITEM_IPV4, ITEM_IPV6, ITEM_VLAN, PATTERN_COMMON),
 	[ITEM_VLAN] = TRANS(ITEM_IPV4, ITEM_IPV6, PATTERN_COMMON),
 	[ITEM_IPV4] = TRANS(ITEM_TCP, ITEM_UDP, PATTERN_COMMON),
@@ -207,6 +209,7 @@ static const enum mlx5_nl_flow_trans *const mlx5_nl_flow_trans[] = {
 
 /** Empty masks for known item types. */
 static const union {
+	struct rte_flow_item_port_id port_id;
 	struct rte_flow_item_eth eth;
 	struct rte_flow_item_vlan vlan;
 	struct rte_flow_item_ipv4 ipv4;
@@ -217,6 +220,7 @@ static const union {
 
 /** Supported masks for known item types. */
 static const struct {
+	struct rte_flow_item_port_id port_id;
 	struct rte_flow_item_eth eth;
 	struct rte_flow_item_vlan vlan;
 	struct rte_flow_item_ipv4 ipv4;
@@ -224,6 +228,9 @@ static const struct {
 	struct rte_flow_item_tcp tcp;
 	struct rte_flow_item_udp udp;
 } mlx5_nl_flow_mask_supported = {
+	.port_id = {
+		.id = 0xffffffff,
+	},
 	.eth = {
 		.type = RTE_BE16(0xffff),
 		.dst.addr_bytes = "\xff\xff\xff\xff\xff\xff",
@@ -378,6 +385,7 @@ mlx5_nl_flow_transpose(void *buf,
 	const struct rte_flow_action *action;
 	unsigned int n;
 	uint32_t act_index_cur;
+	bool in_port_id_set;
 	bool eth_type_set;
 	bool vlan_present;
 	bool vlan_eth_type_set;
@@ -396,6 +404,7 @@ mlx5_nl_flow_transpose(void *buf,
 	action = actions;
 	n = 0;
 	act_index_cur = 0;
+	in_port_id_set = false;
 	eth_type_set = false;
 	vlan_present = false;
 	vlan_eth_type_set = false;
@@ -409,6 +418,7 @@ mlx5_nl_flow_transpose(void *buf,
 trans:
 	switch (trans[n++]) {
 		union {
+			const struct rte_flow_item_port_id *port_id;
 			const struct rte_flow_item_eth *eth;
 			const struct rte_flow_item_vlan *vlan;
 			const struct rte_flow_item_ipv4 *ipv4;
@@ -510,6 +520,51 @@ mlx5_nl_flow_transpose(void *buf,
 			goto trans;
 		++item;
 		break;
+	case ITEM_PORT_ID:
+		if (item->type != RTE_FLOW_ITEM_TYPE_PORT_ID)
+			goto trans;
+		mask.port_id = mlx5_nl_flow_item_mask
+			(item, &rte_flow_item_port_id_mask,
+			 &mlx5_nl_flow_mask_supported.port_id,
+			 &mlx5_nl_flow_mask_empty.port_id,
+			 sizeof(mlx5_nl_flow_mask_supported.port_id), error);
+		if (!mask.port_id)
+			return -rte_errno;
+		if (mask.port_id == &mlx5_nl_flow_mask_empty.port_id) {
+			in_port_id_set = 1;
+			++item;
+			break;
+		}
+		spec.port_id = item->spec;
+		if (mask.port_id->id && mask.port_id->id != 0xffffffff)
+			return rte_flow_error_set
+				(error, ENOTSUP, RTE_FLOW_ERROR_TYPE_ITEM_MASK,
+				 mask.port_id,
+				 "no support for partial mask on"
+				 " \"id\" field");
+		if (!mask.port_id->id)
+			i = 0;
+		else
+			for (i = 0; ptoi[i].ifindex; ++i)
+				if (ptoi[i].port_id == spec.port_id->id)
+					break;
+		if (!ptoi[i].ifindex)
+			return rte_flow_error_set
+				(error, ENODEV, RTE_FLOW_ERROR_TYPE_ITEM_SPEC,
+				 spec.port_id,
+				 "missing data to convert port ID to ifindex");
+		tcm = mnl_nlmsg_get_payload(buf);
+		if (in_port_id_set &&
+		    ptoi[i].ifindex != (unsigned int)tcm->tcm_ifindex)
+			return rte_flow_error_set
+				(error, ENOTSUP, RTE_FLOW_ERROR_TYPE_ITEM_SPEC,
+				 spec.port_id,
+				 "cannot match traffic for several port IDs"
+				 " through a single flow rule");
+		tcm->tcm_ifindex = ptoi[i].ifindex;
+		in_port_id_set = 1;
+		++item;
+		break;
 	case ITEM_ETH:
 		if (item->type != RTE_FLOW_ITEM_TYPE_ETH)
 			goto trans;
-- 
2.11.0

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [dpdk-dev] [PATCH v2 1/6] net/mlx5: lay groundwork for switch offloads
  2018-07-13  9:40   ` [dpdk-dev] [PATCH v2 1/6] net/mlx5: lay groundwork for switch offloads Adrien Mazarguil
@ 2018-07-14  1:29     ` Yongseok Koh
  2018-07-23 21:40     ` Ferruh Yigit
  1 sibling, 0 replies; 33+ messages in thread
From: Yongseok Koh @ 2018-07-14  1:29 UTC (permalink / raw)
  To: Adrien Mazarguil; +Cc: Shahaf Shuler, Nélio Laranjeiro, dev


On Jul 13, 2018, at 6:27 PM, Adrien Mazarguil <adrien.mazarguil@6wind.com> wrote:

With mlx5, unlike normal flow rules implemented through Verbs for traffic
emitted and received by the application, those targeting different logical
ports of the device (VF representors for instance) are offloaded at the
switch level and must be configured through Netlink (TC interface).

This patch adds preliminary support to manage such flow rules through the
flow API (rte_flow).

Instead of rewriting tons of Netlink helpers and as previously suggested by
Stephen [1], this patch introduces a new dependency to libmnl [2]
(LGPL-2.1) when compiling mlx5.

[1] https://mails.dpdk.org/archives/dev/2018-March/092676.html
[2] https://netfilter.org/projects/libmnl/

Signed-off-by: Adrien Mazarguil <adrien.mazarguil@6wind.com>
Acked-by: Nelio Laranjeiro <nelio.laranjeiro@6wind.com>
Cc: Yongseok Koh <yskoh@mellanox.com>
--
Acked-by: Yongseok Koh <yskoh@mellanox.com>

Thanks

v2 changes:

- Added NETLINK_CAP_ACK definition if missing from the host system. This
 parameter is also not mandatory anymore and won't prevent creation of
 NL sockets when not supported.
- Modified mlx5_nl_flow_nl_ack() and mlx5_nl_flow_init() to consume the
 least amount of stack space based on message size, instead of the fixed
 MNL_SOCKET_BUFFER_SIZE which is quite large.
---
drivers/net/mlx5/Makefile       |   2 +
drivers/net/mlx5/mlx5.c         |  32 ++++++++
drivers/net/mlx5/mlx5.h         |  10 +++
drivers/net/mlx5/mlx5_nl_flow.c | 147 +++++++++++++++++++++++++++++++++++
mk/rte.app.mk                   |   2 +-
5 files changed, 192 insertions(+), 1 deletion(-)

diff --git a/drivers/net/mlx5/Makefile b/drivers/net/mlx5/Makefile
index 9e274964b..8d3cb219b 100644
--- a/drivers/net/mlx5/Makefile
+++ b/drivers/net/mlx5/Makefile
@@ -33,6 +33,7 @@ SRCS-$(CONFIG_RTE_LIBRTE_MLX5_PMD) += mlx5_mr.c
SRCS-$(CONFIG_RTE_LIBRTE_MLX5_PMD) += mlx5_flow.c
SRCS-$(CONFIG_RTE_LIBRTE_MLX5_PMD) += mlx5_socket.c
SRCS-$(CONFIG_RTE_LIBRTE_MLX5_PMD) += mlx5_nl.c
+SRCS-$(CONFIG_RTE_LIBRTE_MLX5_PMD) += mlx5_nl_flow.c

ifeq ($(CONFIG_RTE_LIBRTE_MLX5_DLOPEN_DEPS),y)
INSTALL-$(CONFIG_RTE_LIBRTE_MLX5_PMD)-lib += $(LIB_GLUE)
@@ -56,6 +57,7 @@ LDLIBS += -ldl
else
LDLIBS += -libverbs -lmlx5
endif
+LDLIBS += -lmnl
LDLIBS += -lrte_eal -lrte_mbuf -lrte_mempool -lrte_ring
LDLIBS += -lrte_ethdev -lrte_net -lrte_kvargs
LDLIBS += -lrte_bus_pci
diff --git a/drivers/net/mlx5/mlx5.c b/drivers/net/mlx5/mlx5.c
index 6d3421fae..8fb8c91eb 100644
--- a/drivers/net/mlx5/mlx5.c
+++ b/drivers/net/mlx5/mlx5.c
@@ -282,6 +282,8 @@ mlx5_dev_close(struct rte_eth_dev *dev)
       close(priv->nl_socket_route);
   if (priv->nl_socket_rdma >= 0)
       close(priv->nl_socket_rdma);
+    if (priv->mnl_socket)
+        mlx5_nl_flow_socket_destroy(priv->mnl_socket);
   ret = mlx5_hrxq_ibv_verify(dev);
   if (ret)
       DRV_LOG(WARNING, "port %u some hash Rx queue still remain",
@@ -1116,6 +1118,34 @@ mlx5_dev_spawn(struct rte_device *dpdk_dev,
   claim_zero(mlx5_mac_addr_add(eth_dev, &mac, 0, 0));
   if (vf && config.vf_nl_en)
       mlx5_nl_mac_addr_sync(eth_dev);
+    priv->mnl_socket = mlx5_nl_flow_socket_create();
+    if (!priv->mnl_socket) {
+        err = -rte_errno;
+        DRV_LOG(WARNING,
+            "flow rules relying on switch offloads will not be"
+            " supported: cannot open libmnl socket: %s",
+            strerror(rte_errno));
+    } else {
+        struct rte_flow_error error;
+        unsigned int ifindex = mlx5_ifindex(eth_dev);
+
+        if (!ifindex) {
+            err = -rte_errno;
+            error.message =
+                "cannot retrieve network interface index";
+        } else {
+            err = mlx5_nl_flow_init(priv->mnl_socket, ifindex,
+                        &error);
+        }
+        if (err) {
+            DRV_LOG(WARNING,
+                "flow rules relying on switch offloads will"
+                " not be supported: %s: %s",
+                error.message, strerror(rte_errno));
+            mlx5_nl_flow_socket_destroy(priv->mnl_socket);
+            priv->mnl_socket = NULL;
+        }
+    }
   TAILQ_INIT(&priv->flows);
   TAILQ_INIT(&priv->ctrl_flows);
   /* Hint libmlx5 to use PMD allocator for data plane resources */
@@ -1168,6 +1198,8 @@ mlx5_dev_spawn(struct rte_device *dpdk_dev,
           close(priv->nl_socket_route);
       if (priv->nl_socket_rdma >= 0)
           close(priv->nl_socket_rdma);
+        if (priv->mnl_socket)
+            mlx5_nl_flow_socket_destroy(priv->mnl_socket);
       if (own_domain_id)
           claim_zero(rte_eth_switch_domain_free(priv->domain_id));
       rte_free(priv);
diff --git a/drivers/net/mlx5/mlx5.h b/drivers/net/mlx5/mlx5.h
index 131be334c..98b6ec07d 100644
--- a/drivers/net/mlx5/mlx5.h
+++ b/drivers/net/mlx5/mlx5.h
@@ -156,6 +156,8 @@ struct mlx5_drop {
   struct mlx5_rxq_ibv *rxq; /* Verbs Rx queue. */
};

+struct mnl_socket;
+
struct priv {
   LIST_ENTRY(priv) mem_event_cb; /* Called by memory event callback. */
   struct rte_eth_dev_data *dev_data;  /* Pointer to device data. */
@@ -215,6 +217,7 @@ struct priv {
   int nl_socket_rdma; /* Netlink socket (NETLINK_RDMA). */
   int nl_socket_route; /* Netlink socket (NETLINK_ROUTE). */
   uint32_t nl_sn; /* Netlink message sequence number. */
+    struct mnl_socket *mnl_socket; /* Libmnl socket. */
};

#define PORT_ID(priv) ((priv)->dev_data->port_id)
@@ -380,4 +383,11 @@ unsigned int mlx5_nl_ifindex(int nl, const char *name);
int mlx5_nl_switch_info(int nl, unsigned int ifindex,
           struct mlx5_switch_info *info);

+/* mlx5_nl_flow.c */
+
+int mlx5_nl_flow_init(struct mnl_socket *nl, unsigned int ifindex,
+              struct rte_flow_error *error);
+struct mnl_socket *mlx5_nl_flow_socket_create(void);
+void mlx5_nl_flow_socket_destroy(struct mnl_socket *nl);
+
#endif /* RTE_PMD_MLX5_H_ */
diff --git a/drivers/net/mlx5/mlx5_nl_flow.c b/drivers/net/mlx5/mlx5_nl_flow.c
new file mode 100644
index 000000000..60a4493e5
--- /dev/null
+++ b/drivers/net/mlx5/mlx5_nl_flow.c
@@ -0,0 +1,147 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright 2018 6WIND S.A.
+ * Copyright 2018 Mellanox Technologies, Ltd
+ */
+
+#include <errno.h>
+#include <libmnl/libmnl.h>
+#include <linux/netlink.h>
+#include <linux/pkt_sched.h>
+#include <linux/rtnetlink.h>
+#include <stdalign.h>
+#include <stddef.h>
+#include <stdint.h>
+#include <stdlib.h>
+#include <sys/socket.h>
+
+#include <rte_errno.h>
+#include <rte_flow.h>
+
+#include "mlx5.h"
+
+/* Normally found in linux/netlink.h. */
+#ifndef NETLINK_CAP_ACK
+#define NETLINK_CAP_ACK 10
+#endif
+
+/**
+ * Send Netlink message with acknowledgment.
+ *
+ * @param nl
+ *   Libmnl socket to use.
+ * @param nlh
+ *   Message to send. This function always raises the NLM_F_ACK flag before
+ *   sending.
+ *
+ * @return
+ *   0 on success, a negative errno value otherwise and rte_errno is set.
+ */
+static int
+mlx5_nl_flow_nl_ack(struct mnl_socket *nl, struct nlmsghdr *nlh)
+{
+    alignas(struct nlmsghdr)
+    uint8_t ans[mnl_nlmsg_size(sizeof(struct nlmsgerr)) +
+            nlh->nlmsg_len - sizeof(*nlh)];
+    uint32_t seq = random();
+    int ret;
+
+    nlh->nlmsg_flags |= NLM_F_ACK;
+    nlh->nlmsg_seq = seq;
+    ret = mnl_socket_sendto(nl, nlh, nlh->nlmsg_len);
+    if (ret != -1)
+        ret = mnl_socket_recvfrom(nl, ans, sizeof(ans));
+    if (ret != -1)
+        ret = mnl_cb_run
+            (ans, ret, seq, mnl_socket_get_portid(nl), NULL, NULL);
+    if (!ret)
+        return 0;
+    rte_errno = errno;
+    return -rte_errno;
+}
+
+/**
+ * Initialize ingress qdisc of a given network interface.
+ *
+ * @param nl
+ *   Libmnl socket of the @p NETLINK_ROUTE kind.
+ * @param ifindex
+ *   Index of network interface to initialize.
+ * @param[out] error
+ *   Perform verbose error reporting if not NULL.
+ *
+ * @return
+ *   0 on success, a negative errno value otherwise and rte_errno is set.
+ */
+int
+mlx5_nl_flow_init(struct mnl_socket *nl, unsigned int ifindex,
+          struct rte_flow_error *error)
+{
+    struct nlmsghdr *nlh;
+    struct tcmsg *tcm;
+    alignas(struct nlmsghdr)
+    uint8_t buf[mnl_nlmsg_size(sizeof(*tcm) + 128)];
+
+    /* Destroy existing ingress qdisc and everything attached to it. */
+    nlh = mnl_nlmsg_put_header(buf);
+    nlh->nlmsg_type = RTM_DELQDISC;
+    nlh->nlmsg_flags = NLM_F_REQUEST;
+    tcm = mnl_nlmsg_put_extra_header(nlh, sizeof(*tcm));
+    tcm->tcm_family = AF_UNSPEC;
+    tcm->tcm_ifindex = ifindex;
+    tcm->tcm_handle = TC_H_MAKE(TC_H_INGRESS, 0);
+    tcm->tcm_parent = TC_H_INGRESS;
+    /* Ignore errors when qdisc is already absent. */
+    if (mlx5_nl_flow_nl_ack(nl, nlh) &&
+        rte_errno != EINVAL && rte_errno != ENOENT)
+        return rte_flow_error_set
+            (error, rte_errno, RTE_FLOW_ERROR_TYPE_UNSPECIFIED,
+             NULL, "netlink: failed to remove ingress qdisc");
+    /* Create fresh ingress qdisc. */
+    nlh = mnl_nlmsg_put_header(buf);
+    nlh->nlmsg_type = RTM_NEWQDISC;
+    nlh->nlmsg_flags = NLM_F_REQUEST | NLM_F_CREATE | NLM_F_EXCL;
+    tcm = mnl_nlmsg_put_extra_header(nlh, sizeof(*tcm));
+    tcm->tcm_family = AF_UNSPEC;
+    tcm->tcm_ifindex = ifindex;
+    tcm->tcm_handle = TC_H_MAKE(TC_H_INGRESS, 0);
+    tcm->tcm_parent = TC_H_INGRESS;
+    mnl_attr_put_strz_check(nlh, sizeof(buf), TCA_KIND, "ingress");
+    if (mlx5_nl_flow_nl_ack(nl, nlh))
+        return rte_flow_error_set
+            (error, rte_errno, RTE_FLOW_ERROR_TYPE_UNSPECIFIED,
+             NULL, "netlink: failed to create ingress qdisc");
+    return 0;
+}
+
+/**
+ * Create and configure a libmnl socket for Netlink flow rules.
+ *
+ * @return
+ *   A valid libmnl socket object pointer on success, NULL otherwise and
+ *   rte_errno is set.
+ */
+struct mnl_socket *
+mlx5_nl_flow_socket_create(void)
+{
+    struct mnl_socket *nl = mnl_socket_open(NETLINK_ROUTE);
+
+    if (nl) {
+        mnl_socket_setsockopt(nl, NETLINK_CAP_ACK, &(int){ 1 },
+                      sizeof(int));
+        if (!mnl_socket_bind(nl, 0, MNL_SOCKET_AUTOPID))
+            return nl;
+    }
+    rte_errno = errno;
+    if (nl)
+        mnl_socket_close(nl);
+    return NULL;
+}
+
+/**
+ * Destroy a libmnl socket.
+ */
+void
+mlx5_nl_flow_socket_destroy(struct mnl_socket *nl)
+{
+    mnl_socket_close(nl);
+}
diff --git a/mk/rte.app.mk b/mk/rte.app.mk
index 7bcf6308d..414f1b967 100644
--- a/mk/rte.app.mk
+++ b/mk/rte.app.mk
@@ -145,7 +145,7 @@ endif
ifeq ($(CONFIG_RTE_LIBRTE_MLX5_DLOPEN_DEPS),y)
_LDLIBS-$(CONFIG_RTE_LIBRTE_MLX5_PMD)       += -lrte_pmd_mlx5 -ldl
else
-_LDLIBS-$(CONFIG_RTE_LIBRTE_MLX5_PMD)       += -lrte_pmd_mlx5 -libverbs -lmlx5
+_LDLIBS-$(CONFIG_RTE_LIBRTE_MLX5_PMD)       += -lrte_pmd_mlx5 -libverbs -lmlx5 -lmnl
endif
_LDLIBS-$(CONFIG_RTE_LIBRTE_MVPP2_PMD)      += -lrte_pmd_mvpp2 -L$(LIBMUSDK_PATH)/lib -lmusdk
_LDLIBS-$(CONFIG_RTE_LIBRTE_NFP_PMD)        += -lrte_pmd_nfp
--
2.11.0

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [dpdk-dev] [PATCH v2 0/6] net/mlx5: add support for switch flow rules
  2018-07-13  9:40 ` [dpdk-dev] [PATCH v2 " Adrien Mazarguil
                     ` (5 preceding siblings ...)
  2018-07-13  9:40   ` [dpdk-dev] [PATCH v2 6/6] net/mlx5: add port ID pattern item " Adrien Mazarguil
@ 2018-07-22 11:21   ` Shahaf Shuler
  6 siblings, 0 replies; 33+ messages in thread
From: Shahaf Shuler @ 2018-07-22 11:21 UTC (permalink / raw)
  To: Adrien Mazarguil
  Cc: Nélio Laranjeiro, Yongseok Koh, dev, Guillaume Gaudonville,
	Raslan Darawsheh, Wael Abualrub

Hi Adrien,

Friday, July 13, 2018 12:41 PM, Adrien Mazarguil:
> Subject: [PATCH v2 0/6] net/mlx5: add support for switch flow rules
> 
> This series adds support for switch flow rules, that is, rte_flow rules applied
> to mlx5 devices at the switch level.
> 
> It allows applications to offload traffic redirection between DPDK ports in
> hardware, while optionally modifying it (e.g. performing encap/decap).
> 
> For this to work, involved DPDK ports must be part of the same switch
> domain, as is the case with port representors, and the transfer attribute
> must be requested on flow rules.
> 
> Also since the mlx5 switch is controlled through Netlink instead of Verbs, and
> given how tedious formatting Netlink messages is, a new dependency is
> added to mlx5: libmnl. See relevant patch.

There are some checkpatch[1] warnings, but those are safe to ignore. 

Adrien, one thing that is missing is an update to the mlx5 documentation covering the new libmnl dependency. 
Just like rdma-core: how to get it, how to install it, the required version, and so on.

I won't postpone the series' acceptance over this (since I want to avoid big changes after rc2), but we must have this documentation before the 18.08 release. 
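[Editorial note: a hypothetical sketch of what such a doc/guides/nics/mlx5.rst entry might look like, modeled on the existing rdma-core entry; the version number and package names below are illustrative, not taken from this thread:]

```rst
- **libmnl** (Linux only)

  Minimal Netlink library, required for E-Switch (transfer) flow rule
  support. Version 1.0.3 or above is suggested. It is packaged as
  ``libmnl-dev`` on Debian/Ubuntu and ``libmnl-devel`` on RHEL/Fedora,
  and sources are available from https://netfilter.org/projects/libmnl/.
```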

Series applied to next-net-mlx, thanks!

[1]

### net/mlx5: add framework for switch flow rules

ERROR:COMPLEX_MACRO: Macros with complex values should be enclosed in parentheses
#307: FILE: drivers/net/mlx5/mlx5_nl_flow.c:60:
+#define PATTERN_COMMON \
+       ITEM_VOID, ACTIONS

ERROR:COMPLEX_MACRO: Macros with complex values should be enclosed in parentheses
#309: FILE: drivers/net/mlx5/mlx5_nl_flow.c:62:
+#define ACTIONS_COMMON \
+       ACTION_VOID, END

total: 2 errors, 0 warnings, 0 checks, 537 lines checked

### net/mlx5: add fate actions to switch flow rules

ERROR:COMPLEX_MACRO: Macros with complex values should be enclosed in parentheses
#55: FILE: drivers/net/mlx5/mlx5_nl_flow.c:68:
+#define ACTIONS_FATE \
+       ACTION_PORT_ID, ACTION_DROP

ERROR:ASSIGN_IN_IF: do not use assignment in if condition
#136: FILE: drivers/net/mlx5/mlx5_nl_flow.c:277:
+               if (!mnl_attr_put_check(buf, size, TCA_MIRRED_PARMS,

ERROR:ASSIGN_IN_IF: do not use assignment in if condition
#159: FILE: drivers/net/mlx5/mlx5_nl_flow.c:300:
+               if (!mnl_attr_put_check(buf, size, TCA_GACT_PARMS,

total: 3 errors, 0 warnings, 0 checks, 134 lines checked

### net/mlx5: add VLAN item and actions to switch flow rules

ERROR:ASSIGN_IN_IF: do not use assignment in if condition
#367: FILE: drivers/net/mlx5/mlx5_nl_flow.c:930:
+               if (!mnl_attr_put_check(buf, size, TCA_VLAN_PARMS,

total: 1 errors, 0 warnings, 0 checks, 358 lines checked


> 
> v2 changes:
> 
> - Mostly compilation fixes for missing Netlink definitions on older systems.
> - Reduced stack consumption.
> - Adapted series to rely on mlx5_dev_to_port_id() instead of
>   mlx5_dev_to_domain_id().
> - See relevant patches for more information.
> 
> Adrien Mazarguil (6):
>   net/mlx5: lay groundwork for switch offloads
>   net/mlx5: add framework for switch flow rules
>   net/mlx5: add fate actions to switch flow rules
>   net/mlx5: add L2-L4 pattern items to switch flow rules
>   net/mlx5: add VLAN item and actions to switch flow rules
>   net/mlx5: add port ID pattern item to switch flow rules
> 
>  drivers/net/mlx5/Makefile       |  142 ++++
>  drivers/net/mlx5/mlx5.c         |   32 +
>  drivers/net/mlx5/mlx5.h         |   28 +
>  drivers/net/mlx5/mlx5_flow.c    |  111 +++
>  drivers/net/mlx5/mlx5_nl_flow.c | 1247
> ++++++++++++++++++++++++++++++++++
>  mk/rte.app.mk                   |    2 +-
>  6 files changed, 1561 insertions(+), 1 deletion(-)  create mode 100644
> drivers/net/mlx5/mlx5_nl_flow.c
> 
> --
> 2.11.0



* Re: [dpdk-dev] [PATCH v2 1/6] net/mlx5: lay groundwork for switch offloads
  2018-07-13  9:40   ` [dpdk-dev] [PATCH v2 1/6] net/mlx5: lay groundwork for switch offloads Adrien Mazarguil
  2018-07-14  1:29     ` Yongseok Koh
@ 2018-07-23 21:40     ` Ferruh Yigit
  2018-07-24  0:50       ` Stephen Hemminger
  1 sibling, 1 reply; 33+ messages in thread
From: Ferruh Yigit @ 2018-07-23 21:40 UTC (permalink / raw)
  To: Adrien Mazarguil, Shahaf Shuler; +Cc: Nelio Laranjeiro, Yongseok Koh, dev

On 7/13/2018 10:40 AM, Adrien Mazarguil wrote:
> With mlx5, unlike normal flow rules implemented through Verbs for traffic
> emitted and received by the application, those targeting different logical
> ports of the device (VF representors for instance) are offloaded at the
> switch level and must be configured through Netlink (TC interface).
> 
> This patch adds preliminary support to manage such flow rules through the
> flow API (rte_flow).
> 
> Instead of rewriting tons of Netlink helpers and as previously suggested by
> Stephen [1], this patch introduces a new dependency to libmnl [2]
> (LGPL-2.1) when compiling mlx5.
> 
> [1] https://mails.dpdk.org/archives/dev/2018-March/092676.html
> [2] https://netfilter.org/projects/libmnl/

Just to highlight this new PMD level dependency to libmnl.

The tap PMD also uses Netlink, and vdev_netvsc also does Netlink communication; perhaps we
can discuss unifying netlink usage around this new library.

> 
> Signed-off-by: Adrien Mazarguil <adrien.mazarguil@6wind.com>
> Acked-by: Nelio Laranjeiro <nelio.laranjeiro@6wind.com>
> Cc: Yongseok Koh <yskoh@mellanox.com>
> --
> v2 changes:
> 
> - Added NETLINK_CAP_ACK definition if missing from the host system. This
>   parameter is also not mandatory anymore and won't prevent creation of
>   NL sockets when not supported.
> - Modified mlx5_nl_flow_nl_ack() and mlx5_nl_flow_init() to consume the
>   least amount of stack space based on message size, instead of the fixed
>   MNL_SOCKET_BUFFER_SIZE which is quite large.

<...>


* Re: [dpdk-dev] [PATCH v2 1/6] net/mlx5: lay groundwork for switch offloads
  2018-07-23 21:40     ` Ferruh Yigit
@ 2018-07-24  0:50       ` Stephen Hemminger
  2018-07-24  4:35         ` Shahaf Shuler
  0 siblings, 1 reply; 33+ messages in thread
From: Stephen Hemminger @ 2018-07-24  0:50 UTC (permalink / raw)
  To: Ferruh Yigit
  Cc: Adrien Mazarguil, Shahaf Shuler, Nelio Laranjeiro, Yongseok Koh, dev

On Mon, 23 Jul 2018 22:40:47 +0100
Ferruh Yigit <ferruh.yigit@intel.com> wrote:

> On 7/13/2018 10:40 AM, Adrien Mazarguil wrote:
> > With mlx5, unlike normal flow rules implemented through Verbs for traffic
> > emitted and received by the application, those targeting different logical
> > ports of the device (VF representors for instance) are offloaded at the
> > switch level and must be configured through Netlink (TC interface).
> > 
> > This patch adds preliminary support to manage such flow rules through the
> > flow API (rte_flow).
> > 
> > Instead of rewriting tons of Netlink helpers and as previously suggested by
> > Stephen [1], this patch introduces a new dependency to libmnl [2]
> > (LGPL-2.1) when compiling mlx5.
> > 
> > [1] https://mails.dpdk.org/archives/dev/2018-March/092676.html
> > [2] https://netfilter.org/projects/libmnl/  
> 
> Just to highlight this new PMD level dependency to libmnl.
> 
> tap pmd also uses netlink and vdev_netvsc also does nl communication, perhaps we
> can discuss unifying netlink usage around this new library.
> 
> > 
> > Signed-off-by: Adrien Mazarguil <adrien.mazarguil@6wind.com>
> > Acked-by: Nelio Laranjeiro <nelio.laranjeiro@6wind.com>
> > Cc: Yongseok Koh <yskoh@mellanox.com>
> > --
> > v2 changes:
> > 
> > - Added NETLINK_CAP_ACK definition if missing from the host system. This
> >   parameter is also not mandatory anymore and won't prevent creation of
> >   NL sockets when not supported.
> > - Modified mlx5_nl_flow_nl_ack() and mlx5_nl_flow_init() to consume the
> >   least amount of stack space based on message size, instead of the fixed
> >   MNL_SOCKET_BUFFER_SIZE which is quite large.  
> 
> <...>
> 

I am concerned that this won't work on FreeBSD and it will end up
farther behind.


* Re: [dpdk-dev] [PATCH v2 1/6] net/mlx5: lay groundwork for switch offloads
  2018-07-24  0:50       ` Stephen Hemminger
@ 2018-07-24  4:35         ` Shahaf Shuler
  2018-07-24 19:33           ` Stephen Hemminger
  0 siblings, 1 reply; 33+ messages in thread
From: Shahaf Shuler @ 2018-07-24  4:35 UTC (permalink / raw)
  To: Stephen Hemminger, Ferruh Yigit
  Cc: Adrien Mazarguil, Nélio Laranjeiro, Yongseok Koh, dev

Stephen,

Tuesday, July 24, 2018 3:51 AM, Stephen Hemminger:
> Subject: Re: [dpdk-dev] [PATCH v2 1/6] net/mlx5: lay groundwork for switch
> offloads
> 
> On Mon, 23 Jul 2018 22:40:47 +0100
> Ferruh Yigit <ferruh.yigit@intel.com> wrote:
> >
> > Just to highlight this new PMD level dependency to libmnl.
> >
> > tap pmd also uses netlink and vdev_netvsc also does nl communication,
> > perhaps we can discuss unifying netlink usage around this new library.
> >
> >
> 
> I am concerned that this won't work on FreeBSD and it will end up farther
> behind.

Can you elaborate? What is the reason it will not work?
 


* Re: [dpdk-dev] [PATCH v2 1/6] net/mlx5: lay groundwork for switch offloads
  2018-07-24  4:35         ` Shahaf Shuler
@ 2018-07-24 19:33           ` Stephen Hemminger
  0 siblings, 0 replies; 33+ messages in thread
From: Stephen Hemminger @ 2018-07-24 19:33 UTC (permalink / raw)
  To: Shahaf Shuler
  Cc: Ferruh Yigit, Adrien Mazarguil, Nélio Laranjeiro, Yongseok Koh, dev

On Tue, 24 Jul 2018 04:35:05 +0000
Shahaf Shuler <shahafs@mellanox.com> wrote:

> Stephen,
> 
> Tuesday, July 24, 2018 3:51 AM, Stephen Hemminger:
> > Subject: Re: [dpdk-dev] [PATCH v2 1/6] net/mlx5: lay groundwork for switch
> > offloads
> > 
> > On Mon, 23 Jul 2018 22:40:47 +0100
> > Ferruh Yigit <ferruh.yigit@intel.com> wrote:  
> > >
> > > Just to highlight this new PMD level dependency to libmnl.
> > >
> > > tap pmd also uses netlink and vdev_netvsc also does nl communication,
> > > perhaps we can discuss unifying netlink usage around this new library.
> > >
> > >  
> > 
> > I am concerned that this won't work on FreeBSD and it will end up farther
> > behind.  
> 
> Can you elaborate? What is the reason it will not work?
>  
> 

There is no working netlink on FreeBSD.
There is no eBPF on FreeBSD.
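[Editorial note: for context, mlx5 is only compiled for Linux targets in this DPDK release, so the new link dependency does not reach FreeBSD builds; a hypothetical sketch of how such a dependency can be fenced in the legacy build system:]

```makefile
# Hypothetical guard: only emit the libmnl link flag on Linux builds,
# leaving FreeBSD targets unaffected (mlx5 itself is already Linux-only).
ifeq ($(CONFIG_RTE_EXEC_ENV_LINUXAPP),y)
_LDLIBS-$(CONFIG_RTE_LIBRTE_MLX5_PMD) += -lmnl
endif
```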


end of thread, other threads:[~2018-07-24 19:33 UTC | newest]

Thread overview: 33+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2018-06-27 18:08 [dpdk-dev] [PATCH 0/6] net/mlx5: add support for switch flow rules Adrien Mazarguil
2018-06-27 18:08 ` [dpdk-dev] [PATCH 1/6] net/mlx5: lay groundwork for switch offloads Adrien Mazarguil
2018-07-12  0:17   ` Yongseok Koh
2018-07-12 10:46     ` Adrien Mazarguil
2018-07-12 17:33       ` Yongseok Koh
2018-06-27 18:08 ` [dpdk-dev] [PATCH 2/6] net/mlx5: add framework for switch flow rules Adrien Mazarguil
2018-07-12  0:59   ` Yongseok Koh
2018-07-12 10:46     ` Adrien Mazarguil
2018-07-12 18:25       ` Yongseok Koh
2018-06-27 18:08 ` [dpdk-dev] [PATCH 3/6] net/mlx5: add fate actions to " Adrien Mazarguil
2018-07-12  1:00   ` Yongseok Koh
2018-06-27 18:08 ` [dpdk-dev] [PATCH 4/6] net/mlx5: add L2-L4 pattern items " Adrien Mazarguil
2018-07-12  1:02   ` Yongseok Koh
2018-06-27 18:08 ` [dpdk-dev] [PATCH 5/6] net/mlx5: add VLAN item and actions " Adrien Mazarguil
2018-07-12  1:10   ` Yongseok Koh
2018-07-12 10:47     ` Adrien Mazarguil
2018-07-12 18:49       ` Yongseok Koh
2018-06-27 18:08 ` [dpdk-dev] [PATCH 6/6] net/mlx5: add port ID pattern item " Adrien Mazarguil
2018-07-12  1:13   ` Yongseok Koh
2018-06-28  9:05 ` [dpdk-dev] [PATCH 0/6] net/mlx5: add support for " Nélio Laranjeiro
2018-07-13  9:40 ` [dpdk-dev] [PATCH v2 " Adrien Mazarguil
2018-07-13  9:40   ` [dpdk-dev] [PATCH v2 1/6] net/mlx5: lay groundwork for switch offloads Adrien Mazarguil
2018-07-14  1:29     ` Yongseok Koh
2018-07-23 21:40     ` Ferruh Yigit
2018-07-24  0:50       ` Stephen Hemminger
2018-07-24  4:35         ` Shahaf Shuler
2018-07-24 19:33           ` Stephen Hemminger
2018-07-13  9:40   ` [dpdk-dev] [PATCH v2 2/6] net/mlx5: add framework for switch flow rules Adrien Mazarguil
2018-07-13  9:40   ` [dpdk-dev] [PATCH v2 3/6] net/mlx5: add fate actions to " Adrien Mazarguil
2018-07-13  9:40   ` [dpdk-dev] [PATCH v2 4/6] net/mlx5: add L2-L4 pattern items " Adrien Mazarguil
2018-07-13  9:40   ` [dpdk-dev] [PATCH v2 5/6] net/mlx5: add VLAN item and actions " Adrien Mazarguil
2018-07-13  9:40   ` [dpdk-dev] [PATCH v2 6/6] net/mlx5: add port ID pattern item " Adrien Mazarguil
2018-07-22 11:21   ` [dpdk-dev] [PATCH v2 0/6] net/mlx5: add support for " Shahaf Shuler
