DPDK patches and discussions

* [dpdk-dev] [PATCH 0/8] net/mlx5: add switch offload for VXLAN encap/decap
From: Adrien Mazarguil @ 2018-08-31  9:57 UTC
  To: Shahaf Shuler, Yongseok Koh, Slava Ovsiienko; +Cc: dev

This series adds support for RTE_FLOW_ACTION_TYPE_VXLAN_ENCAP and
RTE_FLOW_ACTION_TYPE_VXLAN_DECAP to mlx5.

Since these actions are supported at the switch level, the "transfer"
attribute must be set on such flow rules. They must also be combined with a
port redirection action to make sense.

A typical use case is port representors in switchdev mode, with VXLAN
traffic encapsulation performed on traffic coming *from* a representor and
decapsulation on traffic going *to* that representor, in order to
transparently assign a given VXLAN tunnel to VF traffic.

Since only ingress is supported, encapsulation flow rules are normally
applied on a physical port and emit traffic to a port representor. The
opposite order is used for decapsulation.
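
As an illustration, here is a hedged C sketch of such an encapsulation
rule through the flow API, mirroring the testpmd example given in patch 7
(port IDs, the simplified pattern and the outer header values are
assumptions, not code from this series):

  /* Match traffic on port 2, encapsulate, redirect to port 1. */
  struct rte_flow_error err;
  struct rte_flow_attr attr = { .ingress = 1, .transfer = 1 };
  struct rte_flow_item pattern[] = {
          { .type = RTE_FLOW_ITEM_TYPE_ETH },
          { .type = RTE_FLOW_ITEM_TYPE_END },
  };
  /* Outer headers prepended by the encap action. */
  struct rte_flow_item_eth eth = {
          .dst.addr_bytes = "\x66\x77\x88\x99\xaa\xbb",
  };
  struct rte_flow_item_ipv4 ipv4 = {
          .hdr = {
                  .src_addr = RTE_BE32(0x01010101), /* 1.1.1.1 */
                  .dst_addr = RTE_BE32(0x02020202), /* 2.2.2.2 */
          },
  };
  struct rte_flow_item_udp udp = { .hdr.dst_port = RTE_BE16(4789) };
  struct rte_flow_item_vxlan vxlan = { .vni = "\x11\x22\x33" };
  struct rte_flow_item vxlan_def[] = {
          { .type = RTE_FLOW_ITEM_TYPE_ETH, .spec = &eth },
          { .type = RTE_FLOW_ITEM_TYPE_IPV4, .spec = &ipv4 },
          { .type = RTE_FLOW_ITEM_TYPE_UDP, .spec = &udp },
          { .type = RTE_FLOW_ITEM_TYPE_VXLAN, .spec = &vxlan },
          { .type = RTE_FLOW_ITEM_TYPE_END },
  };
  struct rte_flow_action actions[] = {
          {
                  .type = RTE_FLOW_ACTION_TYPE_VXLAN_ENCAP,
                  .conf = &(struct rte_flow_action_vxlan_encap){
                          .definition = vxlan_def,
                  },
          },
          {
                  .type = RTE_FLOW_ACTION_TYPE_PORT_ID,
                  .conf = &(struct rte_flow_action_port_id){ .id = 1 },
          },
          { .type = RTE_FLOW_ACTION_TYPE_END },
  };
  struct rte_flow *flow = rte_flow_create(2, &attr, pattern, actions, &err);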

Like other mlx5 switch flow rule actions, these are implemented through
Linux's TC flower API. Since the Linux interface for VXLAN encap/decap
involves virtual network devices (i.e. ip link add type vxlan [...]), the
PMD automatically spawns them on an as-needed basis through Netlink calls. The
added complexity necessarily results in a rather convoluted PMD
implementation.

This series relies on "ethdev: add flow API object converter" [1], which
should be applied first since testpmd otherwise provides no means to test
VXLAN encap.

[1] https://patches.dpdk.org/project/dpdk/list/?series=1123

Adrien Mazarguil (8):
  net/mlx5: speed up interface index retrieval for flow rules
  net/mlx5: clean up redundant interface name getters
  net/mlx5: rename internal function
  net/mlx5: enhance TC flow rule send/ack function
  net/mlx5: prepare switch flow rule parser for encap offloads
  net/mlx5: add convenience macros to switch flow rule engine
  net/mlx5: add VXLAN encap support to switch flow rules
  net/mlx5: add VXLAN decap support to switch flow rules

 drivers/net/mlx5/Makefile       |   75 ++
 drivers/net/mlx5/mlx5.c         |   74 +-
 drivers/net/mlx5/mlx5.h         |   28 +-
 drivers/net/mlx5/mlx5_ethdev.c  |  188 ++--
 drivers/net/mlx5/mlx5_flow.c    |   16 +-
 drivers/net/mlx5/mlx5_nl.c      |    9 +-
 drivers/net/mlx5/mlx5_nl_flow.c | 1763 ++++++++++++++++++++++++++++++++--
 7 files changed, 1871 insertions(+), 282 deletions(-)

-- 
2.11.0

* [dpdk-dev] [PATCH 1/8] net/mlx5: speed up interface index retrieval for flow rules
From: Adrien Mazarguil @ 2018-08-31  9:57 UTC
  To: Shahaf Shuler, Yongseok Koh, Slava Ovsiienko; +Cc: dev

rte_eth_dev_info_get() can be avoided since the underlying device type and
data structure are known.

Caching the index before creating any flow rules avoids a number of
redundant system calls later, since users are not expected to destroy the
associated network interface while the PMD is bound and running.
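
A minimal sketch of the resulting pattern (the helper name is
hypothetical; priv->ifindex and mlx5_ifindex() are from this series):

  /* Resolve the index once, serve later queries from the cache. */
  static unsigned int
  cached_ifindex(struct rte_eth_dev *dev)
  {
          struct priv *priv = dev->data->dev_private;

          if (!priv->ifindex)
                  priv->ifindex = mlx5_ifindex(dev); /* expensive lookup */
          return priv->ifindex; /* no system call afterward */
  }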

Signed-off-by: Adrien Mazarguil <adrien.mazarguil@6wind.com>
---
 drivers/net/mlx5/mlx5.c        | 48 ++++++++++++++++++-------------------
 drivers/net/mlx5/mlx5.h        |  1 +
 drivers/net/mlx5/mlx5_ethdev.c |  4 +---
 drivers/net/mlx5/mlx5_flow.c   |  6 ++---
 drivers/net/mlx5/mlx5_nl.c     |  9 +++----
 5 files changed, 32 insertions(+), 36 deletions(-)

diff --git a/drivers/net/mlx5/mlx5.c b/drivers/net/mlx5/mlx5.c
index a8ae2b5d3..55b73a03b 100644
--- a/drivers/net/mlx5/mlx5.c
+++ b/drivers/net/mlx5/mlx5.c
@@ -736,6 +736,7 @@ mlx5_dev_spawn(struct rte_device *dpdk_dev,
 	struct ether_addr mac;
 	char name[RTE_ETH_NAME_MAX_LEN];
 	int own_domain_id = 0;
+	struct rte_flow_error flow_error;
 	unsigned int i;
 
 	/* Determine if this port representor is supposed to be spawned. */
@@ -959,6 +960,8 @@ mlx5_dev_spawn(struct rte_device *dpdk_dev,
 	priv->domain_id = RTE_ETH_DEV_SWITCH_DOMAIN_ID_INVALID;
 	priv->representor_id =
 		switch_info->representor ? switch_info->port_name : -1;
+	/* Interface index will be known once eth_dev is allocated. */
+	priv->ifindex = 0;
 	/*
 	 * Look for sibling devices in order to reuse their switch domain
 	 * if any, otherwise allocate one.
@@ -1087,6 +1090,16 @@ mlx5_dev_spawn(struct rte_device *dpdk_dev,
 		err = rte_errno;
 		goto error;
 	}
+	/*
+	 * Cache associated interface index since lookups are expensive.
+	 * It is not expected to change while a PMD instance is bound and
+	 * running.
+	 */
+	priv->ifindex = mlx5_ifindex(eth_dev);
+	if (!priv->ifindex)
+		DRV_LOG(WARNING,
+			"cannot retrieve network interface index: %s",
+			strerror(rte_errno));
 	/* Configure the first MAC address by default. */
 	if (mlx5_get_mac(eth_dev, &mac.addr_bytes)) {
 		DRV_LOG(ERR,
@@ -1131,32 +1144,19 @@ mlx5_dev_spawn(struct rte_device *dpdk_dev,
 	if (vf && config.vf_nl_en)
 		mlx5_nl_mac_addr_sync(eth_dev);
 	priv->mnl_socket = mlx5_nl_flow_socket_create();
-	if (!priv->mnl_socket) {
-		err = -rte_errno;
+	if (!priv->mnl_socket ||
+	    !priv->ifindex ||
+	    mlx5_nl_flow_init(priv->mnl_socket, priv->ifindex, &flow_error)) {
+		if (!priv->mnl_socket) {
+			flow_error.message = "cannot open libmnl socket";
+		} else if (!priv->ifindex) {
+			rte_errno = ENXIO;
+			flow_error.message = "unknown network interface index";
+		}
 		DRV_LOG(WARNING,
 			"flow rules relying on switch offloads will not be"
-			" supported: cannot open libmnl socket: %s",
-			strerror(rte_errno));
-	} else {
-		struct rte_flow_error error;
-		unsigned int ifindex = mlx5_ifindex(eth_dev);
-
-		if (!ifindex) {
-			err = -rte_errno;
-			error.message =
-				"cannot retrieve network interface index";
-		} else {
-			err = mlx5_nl_flow_init(priv->mnl_socket, ifindex,
-						&error);
-		}
-		if (err) {
-			DRV_LOG(WARNING,
-				"flow rules relying on switch offloads will"
-				" not be supported: %s: %s",
-				error.message, strerror(rte_errno));
-			mlx5_nl_flow_socket_destroy(priv->mnl_socket);
-			priv->mnl_socket = NULL;
-		}
+			" supported: %s: %s",
+			flow_error.message, strerror(rte_errno));
 	}
 	TAILQ_INIT(&priv->flows);
 	TAILQ_INIT(&priv->ctrl_flows);
diff --git a/drivers/net/mlx5/mlx5.h b/drivers/net/mlx5/mlx5.h
index 35a196e76..4c2dec644 100644
--- a/drivers/net/mlx5/mlx5.h
+++ b/drivers/net/mlx5/mlx5.h
@@ -183,6 +183,7 @@ struct priv {
 	unsigned int representor:1; /* Device is a port representor. */
 	uint16_t domain_id; /* Switch domain identifier. */
 	int32_t representor_id; /* Port representor identifier. */
+	unsigned int ifindex; /* Interface index associated with device. */
 	/* RX/TX queues. */
 	unsigned int rxqs_n; /* RX queues array size. */
 	unsigned int txqs_n; /* TX queues array size. */
diff --git a/drivers/net/mlx5/mlx5_ethdev.c b/drivers/net/mlx5/mlx5_ethdev.c
index 34c5b95ee..cf0b415b2 100644
--- a/drivers/net/mlx5/mlx5_ethdev.c
+++ b/drivers/net/mlx5/mlx5_ethdev.c
@@ -511,7 +511,6 @@ mlx5_dev_infos_get(struct rte_eth_dev *dev, struct rte_eth_dev_info *info)
 	struct priv *priv = dev->data->dev_private;
 	struct mlx5_dev_config *config = &priv->config;
 	unsigned int max;
-	char ifname[IF_NAMESIZE];
 
 	/* FIXME: we should ask the device for these values. */
 	info->min_rx_bufsize = 32;
@@ -532,8 +531,7 @@ mlx5_dev_infos_get(struct rte_eth_dev *dev, struct rte_eth_dev_info *info)
 	info->rx_offload_capa = (mlx5_get_rx_port_offloads() |
 				 info->rx_queue_offload_capa);
 	info->tx_offload_capa = mlx5_get_tx_port_offloads(dev);
-	if (mlx5_get_ifname(dev, &ifname) == 0)
-		info->if_index = if_nametoindex(ifname);
+	info->if_index = priv->ifindex;
 	info->reta_size = priv->reta_idx_n ?
 		priv->reta_idx_n : config->ind_table_max_size;
 	info->hash_key_size = MLX5_RSS_HASH_KEY_LEN;
diff --git a/drivers/net/mlx5/mlx5_flow.c b/drivers/net/mlx5/mlx5_flow.c
index 3f548a9a4..f093a5ed0 100644
--- a/drivers/net/mlx5/mlx5_flow.c
+++ b/drivers/net/mlx5/mlx5_flow.c
@@ -2466,13 +2466,13 @@ mlx5_flow_merge_switch(struct rte_eth_dev *dev,
 		n = RTE_MIN(mlx5_dev_to_port_id(dev->device, port_id, n), n);
 	}
 	for (i = 0; i != n; ++i) {
-		struct rte_eth_dev_info dev_info;
+		struct rte_eth_dev *i_dev = &rte_eth_devices[port_id[i]];
+		struct priv *i_priv = i_dev->data->dev_private;
 
-		rte_eth_dev_info_get(port_id[i], &dev_info);
 		if (port_id[i] == dev->data->port_id)
 			own = i;
 		ptoi[i].port_id = port_id[i];
-		ptoi[i].ifindex = dev_info.if_index;
+		ptoi[i].ifindex = i_priv->ifindex;
 	}
 	/* Ensure first entry of ptoi[] is the current device. */
 	if (own) {
diff --git a/drivers/net/mlx5/mlx5_nl.c b/drivers/net/mlx5/mlx5_nl.c
index d61826aea..a298db68c 100644
--- a/drivers/net/mlx5/mlx5_nl.c
+++ b/drivers/net/mlx5/mlx5_nl.c
@@ -362,7 +362,6 @@ mlx5_nl_mac_addr_list(struct rte_eth_dev *dev, struct ether_addr (*mac)[],
 		      int *mac_n)
 {
 	struct priv *priv = dev->data->dev_private;
-	unsigned int iface_idx = mlx5_ifindex(dev);
 	struct {
 		struct nlmsghdr	hdr;
 		struct ifinfomsg ifm;
@@ -374,7 +373,7 @@ mlx5_nl_mac_addr_list(struct rte_eth_dev *dev, struct ether_addr (*mac)[],
 		},
 		.ifm = {
 			.ifi_family = PF_BRIDGE,
-			.ifi_index = iface_idx,
+			.ifi_index = priv->ifindex,
 		},
 	};
 	struct mlx5_nl_mac_addr data = {
@@ -421,7 +420,6 @@ mlx5_nl_mac_addr_modify(struct rte_eth_dev *dev, struct ether_addr *mac,
 			int add)
 {
 	struct priv *priv = dev->data->dev_private;
-	unsigned int iface_idx = mlx5_ifindex(dev);
 	struct {
 		struct nlmsghdr hdr;
 		struct ndmsg ndm;
@@ -437,7 +435,7 @@ mlx5_nl_mac_addr_modify(struct rte_eth_dev *dev, struct ether_addr *mac,
 		.ndm = {
 			.ndm_family = PF_BRIDGE,
 			.ndm_state = NUD_NOARP | NUD_PERMANENT,
-			.ndm_ifindex = iface_idx,
+			.ndm_ifindex = priv->ifindex,
 			.ndm_flags = NTF_SELF,
 		},
 		.rta = {
@@ -600,7 +598,6 @@ static int
 mlx5_nl_device_flags(struct rte_eth_dev *dev, uint32_t flags, int enable)
 {
 	struct priv *priv = dev->data->dev_private;
-	unsigned int iface_idx = mlx5_ifindex(dev);
 	struct {
 		struct nlmsghdr hdr;
 		struct ifinfomsg ifi;
@@ -613,7 +610,7 @@ mlx5_nl_device_flags(struct rte_eth_dev *dev, uint32_t flags, int enable)
 		.ifi = {
 			.ifi_flags = enable ? flags : 0,
 			.ifi_change = flags,
-			.ifi_index = iface_idx,
+			.ifi_index = priv->ifindex,
 		},
 	};
 	int fd;
-- 
2.11.0

* [dpdk-dev] [PATCH 2/8] net/mlx5: clean up redundant interface name getters
From: Adrien Mazarguil @ 2018-08-31  9:57 UTC
  To: Shahaf Shuler, Yongseok Koh, Slava Ovsiienko; +Cc: dev

In order to return the network interface index (ifindex) associated with a
device, mlx5_ifindex() uses if_nametoindex() to convert the result of
mlx5_get_ifname(). This is inefficient because the latter first retrieves
the ifindex on its own, only to pass it through if_indextoname().

Since indices are much more reliable than names (less prone to change) and
involved in flow rule management where performance matters, this patch
moves ifindex-getting code directly into mlx5_ifindex() and replaces
remaining mlx5_get_ifname() calls with if_indextoname().

Similarly, the new function mlx5_master_ifindex() replaces
mlx5_get_master_ifname() while getting rid of irrelevant compatibility
code for unsupported Linux and MLNX_OFED versions.
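
A minimal sketch of the sysfs approach mlx5_master_ifindex() relies on
(the path is illustrative; the actual function scans the IB device's
"device/net" directory):

  #include <stdio.h>

  /* Read a nonzero ifindex from a sysfs file, 0 on failure. */
  static unsigned int
  read_ifindex(const char *path) /* e.g. "/sys/class/net/eth0/ifindex" */
  {
          unsigned int ifindex = 0;
          FILE *file = fopen(path, "rb");

          if (file) {
                  if (fscanf(file, "%u", &ifindex) != 1)
                          ifindex = 0;
                  fclose(file);
          }
          return ifindex;
  }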

Signed-off-by: Adrien Mazarguil <adrien.mazarguil@6wind.com>
---
 drivers/net/mlx5/mlx5.c        |  15 +--
 drivers/net/mlx5/mlx5.h        |   3 -
 drivers/net/mlx5/mlx5_ethdev.c | 184 ++++++++++++++----------------------
 3 files changed, 74 insertions(+), 128 deletions(-)

diff --git a/drivers/net/mlx5/mlx5.c b/drivers/net/mlx5/mlx5.c
index 55b73a03b..1414ce0c5 100644
--- a/drivers/net/mlx5/mlx5.c
+++ b/drivers/net/mlx5/mlx5.c
@@ -734,7 +734,7 @@ mlx5_dev_spawn(struct rte_device *dpdk_dev,
 	struct ibv_counter_set_description cs_desc = { .counter_type = 0 };
 #endif
 	struct ether_addr mac;
-	char name[RTE_ETH_NAME_MAX_LEN];
+	char name[RTE_MAX(IF_NAMESIZE, RTE_ETH_NAME_MAX_LEN)];
 	int own_domain_id = 0;
 	struct rte_flow_error flow_error;
 	unsigned int i;
@@ -1116,16 +1116,9 @@ mlx5_dev_spawn(struct rte_device *dpdk_dev,
 		mac.addr_bytes[2], mac.addr_bytes[3],
 		mac.addr_bytes[4], mac.addr_bytes[5]);
 #ifndef NDEBUG
-	{
-		char ifname[IF_NAMESIZE];
-
-		if (mlx5_get_ifname(eth_dev, &ifname) == 0)
-			DRV_LOG(DEBUG, "port %u ifname is \"%s\"",
-				eth_dev->data->port_id, ifname);
-		else
-			DRV_LOG(DEBUG, "port %u ifname is unknown",
-				eth_dev->data->port_id);
-	}
+	DRV_LOG(DEBUG, "port %u ifname is \"%s\"",
+		eth_dev->data->port_id,
+		if_indextoname(priv->ifindex, name) ? name : "");
 #endif
 	/* Get actual MTU if possible. */
 	err = mlx5_get_mtu(eth_dev, &priv->mtu);
diff --git a/drivers/net/mlx5/mlx5.h b/drivers/net/mlx5/mlx5.h
index 4c2dec644..0807cf689 100644
--- a/drivers/net/mlx5/mlx5.h
+++ b/drivers/net/mlx5/mlx5.h
@@ -241,9 +241,6 @@ int mlx5_getenv_int(const char *);
 
 /* mlx5_ethdev.c */
 
-int mlx5_get_master_ifname(const struct rte_eth_dev *dev,
-			   char (*ifname)[IF_NAMESIZE]);
-int mlx5_get_ifname(const struct rte_eth_dev *dev, char (*ifname)[IF_NAMESIZE]);
 unsigned int mlx5_ifindex(const struct rte_eth_dev *dev);
 int mlx5_ifreq(const struct rte_eth_dev *dev, int req, struct ifreq *ifr,
 	       int master);
diff --git a/drivers/net/mlx5/mlx5_ethdev.c b/drivers/net/mlx5/mlx5_ethdev.c
index cf0b415b2..67149b7b3 100644
--- a/drivers/net/mlx5/mlx5_ethdev.c
+++ b/drivers/net/mlx5/mlx5_ethdev.c
@@ -119,149 +119,104 @@ struct ethtool_link_settings {
 #endif
 
 /**
- * Get master interface name from private structure.
+ * Get network interface index associated with master device.
+ *
+ * Result differs from mlx5_ifindex() when the current device is a port
+ * representor.
  *
  * @param[in] dev
  *   Pointer to Ethernet device.
- * @param[out] ifname
- *   Interface name output buffer.
  *
  * @return
- *   0 on success, a negative errno value otherwise and rte_errno is set.
+ *   Nonzero interface index on success, zero otherwise and rte_errno is set.
  */
-int
-mlx5_get_master_ifname(const struct rte_eth_dev *dev,
-		       char (*ifname)[IF_NAMESIZE])
+static unsigned int
+mlx5_master_ifindex(const struct rte_eth_dev *dev)
 {
 	struct priv *priv = dev->data->dev_private;
-	DIR *dir;
-	struct dirent *dent;
-	unsigned int dev_type = 0;
-	unsigned int dev_port_prev = ~0u;
-	char match[IF_NAMESIZE] = "";
-
-	{
-		MKSTR(path, "%s/device/net", priv->ibdev_path);
-
-		dir = opendir(path);
-		if (dir == NULL) {
-			rte_errno = errno;
-			return -rte_errno;
-		}
-	}
-	while ((dent = readdir(dir)) != NULL) {
-		char *name = dent->d_name;
-		FILE *file;
-		unsigned int dev_port;
-		int r;
-
-		if ((name[0] == '.') &&
-		    ((name[1] == '\0') ||
-		     ((name[1] == '.') && (name[2] == '\0'))))
-			continue;
+	DIR *dir = NULL;
+	struct dirent *dent = NULL;
+	size_t size = 0;
+	unsigned int ifindex = 0;
 
-		MKSTR(path, "%s/device/net/%s/%s",
-		      priv->ibdev_path, name,
-		      (dev_type ? "dev_id" : "dev_port"));
+	while (1) {
+		char path[size];
+		FILE *file;
+		int ret;
 
-		file = fopen(path, "rb");
-		if (file == NULL) {
-			if (errno != ENOENT)
-				continue;
-			/*
-			 * Switch to dev_id when dev_port does not exist as
-			 * is the case with Linux kernel versions < 3.15.
-			 */
-try_dev_id:
-			match[0] = '\0';
-			if (dev_type)
-				break;
-			dev_type = 1;
-			dev_port_prev = ~0u;
-			rewinddir(dir);
+		ret = snprintf(path, size, "%s/device/net/%s%s",
+			       priv->ibdev_path,
+			       dent ? dent->d_name : "",
+			       dent ? "/ifindex" : "");
+		if (ret == -1)
+			goto error;
+		if (!size) {
+			size = ret + 1;
 			continue;
 		}
-		r = fscanf(file, (dev_type ? "%x" : "%u"), &dev_port);
-		fclose(file);
-		if (r != 1)
-			continue;
-		/*
-		 * Switch to dev_id when dev_port returns the same value for
-		 * all ports. May happen when using a MOFED release older than
-		 * 3.0 with a Linux kernel >= 3.15.
-		 */
-		if (dev_port == dev_port_prev)
-			goto try_dev_id;
-		dev_port_prev = dev_port;
-		if (dev_port == 0)
-			strlcpy(match, name, sizeof(match));
-	}
+		if (!dir) {
+			dir = opendir(path);
+			if (!dir)
+				goto error;
+		}
+		file = dent ? fopen(path, "rb") : NULL;
+		if (file) {
+			/* Only one ifindex is expected in there. */
+			ret = !!ifindex;
+			if (fscanf(file, "%u", &ifindex) != 1)
+				ret = 0;
+			fclose(file);
+			if (ret) {
+				errno = ENOTSUP;
+				goto error;
+			}
+		}
+		do {
+			dent = readdir(dir);
+		} while (dent &&
+			 (!strcmp(dent->d_name, ".") ||
+			  !strcmp(dent->d_name, "..")));
+		if (!dent)
+			break;
+		size = 0;
+	};
 	closedir(dir);
-	if (match[0] == '\0') {
-		rte_errno = ENOENT;
-		return -rte_errno;
-	}
-	strncpy(*ifname, match, sizeof(*ifname));
+	if (!ifindex)
+		rte_errno = ENXIO;
+	return ifindex;
+error:
+	rte_errno = errno;
+	if (dir)
+		closedir(dir);
 	return 0;
 }
 
 /**
- * Get interface name from private structure.
- *
- * This is a port representor-aware version of mlx5_get_master_ifname().
+ * Get network interface index associated with device.
  *
  * @param[in] dev
  *   Pointer to Ethernet device.
- * @param[out] ifname
- *   Interface name output buffer.
  *
  * @return
- *   0 on success, a negative errno value otherwise and rte_errno is set.
+ *   Nonzero interface index on success, zero otherwise and rte_errno is set.
  */
-int
-mlx5_get_ifname(const struct rte_eth_dev *dev, char (*ifname)[IF_NAMESIZE])
+unsigned int
+mlx5_ifindex(const struct rte_eth_dev *dev)
 {
 	struct priv *priv = dev->data->dev_private;
 	unsigned int ifindex =
 		priv->nl_socket_rdma >= 0 ?
 		mlx5_nl_ifindex(priv->nl_socket_rdma, priv->ibdev_name) : 0;
 
-	if (!ifindex) {
-		if (!priv->representor)
-			return mlx5_get_master_ifname(dev, ifname);
-		rte_errno = ENXIO;
-		return -rte_errno;
-	}
-	if (if_indextoname(ifindex, &(*ifname)[0]))
-		return 0;
-	rte_errno = errno;
+	if (ifindex)
+		return ifindex;
+	if (!priv->representor)
+		return mlx5_master_ifindex(dev);
+	rte_errno = ENXIO;
 	return -rte_errno;
 }
 
 /**
- * Get the interface index from device name.
- *
- * @param[in] dev
- *   Pointer to Ethernet device.
- *
- * @return
- *   Nonzero interface index on success, zero otherwise and rte_errno is set.
- */
-unsigned int
-mlx5_ifindex(const struct rte_eth_dev *dev)
-{
-	char ifname[IF_NAMESIZE];
-	unsigned int ifindex;
-
-	if (mlx5_get_ifname(dev, &ifname))
-		return 0;
-	ifindex = if_nametoindex(ifname);
-	if (!ifindex)
-		rte_errno = errno;
-	return ifindex;
-}
-
-/**
  * Perform ifreq ioctl() on associated Ethernet device.
  *
  * @param[in] dev
@@ -282,17 +237,18 @@ mlx5_ifreq(const struct rte_eth_dev *dev, int req, struct ifreq *ifr,
 	   int master)
 {
 	int sock = socket(PF_INET, SOCK_DGRAM, IPPROTO_IP);
-	int ret = 0;
+	unsigned int ifindex;
+	int ret;
 
 	if (sock == -1) {
 		rte_errno = errno;
 		return -rte_errno;
 	}
 	if (master)
-		ret = mlx5_get_master_ifname(dev, &ifr->ifr_name);
+		ifindex = mlx5_master_ifindex(dev);
 	else
-		ret = mlx5_get_ifname(dev, &ifr->ifr_name);
-	if (ret)
+		ifindex = mlx5_ifindex(dev);
+	if (!ifindex || !if_indextoname(ifindex, ifr->ifr_name))
 		goto error;
 	ret = ioctl(sock, req, ifr);
 	if (ret == -1) {
-- 
2.11.0

* [dpdk-dev] [PATCH 3/8] net/mlx5: rename internal function
From: Adrien Mazarguil @ 2018-08-31  9:57 UTC
  To: Shahaf Shuler, Yongseok Koh, Slava Ovsiienko; +Cc: dev

Clarify the difference between mlx5_nl_flow_create() and
mlx5_nl_flow_init() by renaming the latter to mlx5_nl_flow_ifindex_init().

Signed-off-by: Adrien Mazarguil <adrien.mazarguil@6wind.com>
---
 drivers/net/mlx5/mlx5.c         | 3 ++-
 drivers/net/mlx5/mlx5.h         | 4 ++--
 drivers/net/mlx5/mlx5_nl_flow.c | 4 ++--
 3 files changed, 6 insertions(+), 5 deletions(-)

diff --git a/drivers/net/mlx5/mlx5.c b/drivers/net/mlx5/mlx5.c
index 1414ce0c5..9a504a31c 100644
--- a/drivers/net/mlx5/mlx5.c
+++ b/drivers/net/mlx5/mlx5.c
@@ -1139,7 +1139,8 @@ mlx5_dev_spawn(struct rte_device *dpdk_dev,
 	priv->mnl_socket = mlx5_nl_flow_socket_create();
 	if (!priv->mnl_socket ||
 	    !priv->ifindex ||
-	    mlx5_nl_flow_init(priv->mnl_socket, priv->ifindex, &flow_error)) {
+	    mlx5_nl_flow_ifindex_init(priv->mnl_socket, priv->ifindex,
+				      &flow_error)) {
 		if (!priv->mnl_socket) {
 			flow_error.message = "cannot open libmnl socket";
 		} else if (!priv->ifindex) {
diff --git a/drivers/net/mlx5/mlx5.h b/drivers/net/mlx5/mlx5.h
index 0807cf689..287cfc643 100644
--- a/drivers/net/mlx5/mlx5.h
+++ b/drivers/net/mlx5/mlx5.h
@@ -408,8 +408,8 @@ int mlx5_nl_flow_create(struct mnl_socket *nl, void *buf,
 			struct rte_flow_error *error);
 int mlx5_nl_flow_destroy(struct mnl_socket *nl, void *buf,
 			 struct rte_flow_error *error);
-int mlx5_nl_flow_init(struct mnl_socket *nl, unsigned int ifindex,
-		      struct rte_flow_error *error);
+int mlx5_nl_flow_ifindex_init(struct mnl_socket *nl, unsigned int ifindex,
+			      struct rte_flow_error *error);
 struct mnl_socket *mlx5_nl_flow_socket_create(void);
 void mlx5_nl_flow_socket_destroy(struct mnl_socket *nl);
 
diff --git a/drivers/net/mlx5/mlx5_nl_flow.c b/drivers/net/mlx5/mlx5_nl_flow.c
index beb03c911..9ea2a1b55 100644
--- a/drivers/net/mlx5/mlx5_nl_flow.c
+++ b/drivers/net/mlx5/mlx5_nl_flow.c
@@ -1154,8 +1154,8 @@ mlx5_nl_flow_destroy(struct mnl_socket *nl, void *buf,
  *   0 on success, a negative errno value otherwise and rte_errno is set.
  */
 int
-mlx5_nl_flow_init(struct mnl_socket *nl, unsigned int ifindex,
-		  struct rte_flow_error *error)
+mlx5_nl_flow_ifindex_init(struct mnl_socket *nl, unsigned int ifindex,
+			  struct rte_flow_error *error)
 {
 	struct nlmsghdr *nlh;
 	struct tcmsg *tcm;
-- 
2.11.0

* [dpdk-dev] [PATCH 4/8] net/mlx5: enhance TC flow rule send/ack function
From: Adrien Mazarguil @ 2018-08-31  9:57 UTC
  To: Shahaf Shuler, Yongseok Koh, Slava Ovsiienko; +Cc: dev

A callback parameter to process replies will be useful for subsequent work
in this area. It implies the following:

- Replies may be much larger than requests. In fact their size cannot
  really be known in advance. Using MNL_SOCKET_BUFFER_SIZE (at least 8192
  bytes) is the recommended approach to make truncation less likely (look
  for NLMSG_GOODSIZE in Linux).

- Multipart replies are made of several messages. A loop is needed to
  process these, as sketched after this list.

- In case of a truncated message (since one cannot really be sure),
  its remaining parts must be flushed to prevent their reception by
  subsequent queries.

- Using rte_get_tsc_cycles() instead of random() for message sequence
  numbers is faster yet unlikely to pick the same number twice in a row.

- mlx5_nl_flow_init() can be simplified since the query message is never
  written over (this was in fact already the case).
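
For reference, a minimal sketch of the canonical libmnl receive loop this
function builds upon (nl, nlh, cb and arg stand for the socket, request,
callback and opaque argument; error handling is trimmed):

  uint8_t ans[MNL_SOCKET_BUFFER_SIZE];
  unsigned int portid = mnl_socket_get_portid(nl);
  uint32_t seq = nlh->nlmsg_seq;
  int ret = mnl_socket_sendto(nl, nlh, nlh->nlmsg_len);

  while (ret > 0) {
          ret = mnl_socket_recvfrom(nl, ans, sizeof(ans));
          if (ret <= 0)
                  break; /* transport error */
          /*
           * Returns >0 to keep reading (multipart), 0 once NLMSG_DONE
           * or the final ACK is seen, <0 on kernel-reported error.
           */
          ret = mnl_cb_run(ans, ret, seq, portid, cb, arg);
  }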

Signed-off-by: Adrien Mazarguil <adrien.mazarguil@6wind.com>
---
 drivers/net/mlx5/mlx5_nl_flow.c | 73 ++++++++++++++++++++++++------------
 1 file changed, 48 insertions(+), 25 deletions(-)

diff --git a/drivers/net/mlx5/mlx5_nl_flow.c b/drivers/net/mlx5/mlx5_nl_flow.c
index 9ea2a1b55..e720728b7 100644
--- a/drivers/net/mlx5/mlx5_nl_flow.c
+++ b/drivers/net/mlx5/mlx5_nl_flow.c
@@ -22,6 +22,7 @@
 #include <sys/socket.h>
 
 #include <rte_byteorder.h>
+#include <rte_cycles.h>
 #include <rte_errno.h>
 #include <rte_ether.h>
 #include <rte_flow.h>
@@ -1050,38 +1051,63 @@ mlx5_nl_flow_brand(void *buf, uint32_t handle)
 }
 
 /**
- * Send Netlink message with acknowledgment.
+ * Send Netlink message with acknowledgment and process reply.
  *
  * @param nl
  *   Libmnl socket to use.
  * @param nlh
- *   Message to send. This function always raises the NLM_F_ACK flag before
- *   sending.
+ *   Message to send. This function always raises the NLM_F_ACK flag and
+ *   sets its sequence number before sending.
+ * @param cb
+ *   Callback handler for received message.
+ * @param arg
+ *   Data pointer for callback handler.
  *
  * @return
  *   0 on success, a negative errno value otherwise and rte_errno is set.
  */
 static int
-mlx5_nl_flow_nl_ack(struct mnl_socket *nl, struct nlmsghdr *nlh)
+mlx5_nl_flow_chat(struct mnl_socket *nl, struct nlmsghdr *nlh,
+		  mnl_cb_t cb, void *arg)
 {
 	alignas(struct nlmsghdr)
-	uint8_t ans[mnl_nlmsg_size(sizeof(struct nlmsgerr)) +
-		    nlh->nlmsg_len - sizeof(*nlh)];
-	uint32_t seq = random();
+	uint8_t ans[MNL_SOCKET_BUFFER_SIZE];
+	unsigned int portid = mnl_socket_get_portid(nl);
+	uint32_t seq = rte_get_tsc_cycles();
+	int err = 0;
 	int ret;
 
 	nlh->nlmsg_flags |= NLM_F_ACK;
 	nlh->nlmsg_seq = seq;
 	ret = mnl_socket_sendto(nl, nlh, nlh->nlmsg_len);
-	if (ret != -1)
+	nlh = (void *)ans;
+	/*
+	 * The following loop postpones non-fatal errors until multipart
+	 * messages are complete.
+	 */
+	while (ret > 0) {
 		ret = mnl_socket_recvfrom(nl, ans, sizeof(ans));
-	if (ret != -1)
-		ret = mnl_cb_run
-			(ans, ret, seq, mnl_socket_get_portid(nl), NULL, NULL);
-	if (!ret)
+		if (ret == -1) {
+			err = errno;
+			if (err != ENOSPC)
+				break;
+			ret = sizeof(*nlh);
+		}
+		if (!err) {
+			ret = mnl_cb_run(nlh, ret, seq, portid, cb, arg);
+			if (ret < 0)
+				err = -ret;
+		}
+		if (!(nlh->nlmsg_flags & NLM_F_MULTI) ||
+		    nlh->nlmsg_type == NLMSG_DONE)
+			ret = -err;
+		else
+			ret = 1;
+	}
+	if (!err)
 		return 0;
-	rte_errno = errno;
-	return -rte_errno;
+	rte_errno = err;
+	return -err;
 }
 
 /**
@@ -1105,7 +1131,7 @@ mlx5_nl_flow_create(struct mnl_socket *nl, void *buf,
 
 	nlh->nlmsg_type = RTM_NEWTFILTER;
 	nlh->nlmsg_flags = NLM_F_REQUEST | NLM_F_CREATE | NLM_F_EXCL;
-	if (!mlx5_nl_flow_nl_ack(nl, nlh))
+	if (!mlx5_nl_flow_chat(nl, nlh, NULL, NULL))
 		return 0;
 	return rte_flow_error_set
 		(error, rte_errno, RTE_FLOW_ERROR_TYPE_UNSPECIFIED, NULL,
@@ -1133,7 +1159,7 @@ mlx5_nl_flow_destroy(struct mnl_socket *nl, void *buf,
 
 	nlh->nlmsg_type = RTM_DELTFILTER;
 	nlh->nlmsg_flags = NLM_F_REQUEST;
-	if (!mlx5_nl_flow_nl_ack(nl, nlh))
+	if (!mlx5_nl_flow_chat(nl, nlh, NULL, NULL))
 		return 0;
 	return rte_flow_error_set
 		(error, errno, RTE_FLOW_ERROR_TYPE_UNSPECIFIED, NULL,
@@ -1171,23 +1197,20 @@ mlx5_nl_flow_ifindex_init(struct mnl_socket *nl, unsigned int ifindex,
 	tcm->tcm_ifindex = ifindex;
 	tcm->tcm_handle = TC_H_MAKE(TC_H_INGRESS, 0);
 	tcm->tcm_parent = TC_H_INGRESS;
+	if (!mnl_attr_put_strz_check(nlh, sizeof(buf), TCA_KIND, "ingress"))
+		return rte_flow_error_set
+			(error, ENOBUFS, RTE_FLOW_ERROR_TYPE_UNSPECIFIED,
+			 NULL, "netlink: not enough space for message");
 	/* Ignore errors when qdisc is already absent. */
-	if (mlx5_nl_flow_nl_ack(nl, nlh) &&
+	if (mlx5_nl_flow_chat(nl, nlh, NULL, NULL) &&
 	    rte_errno != EINVAL && rte_errno != ENOENT)
 		return rte_flow_error_set
 			(error, rte_errno, RTE_FLOW_ERROR_TYPE_UNSPECIFIED,
 			 NULL, "netlink: failed to remove ingress qdisc");
 	/* Create fresh ingress qdisc. */
-	nlh = mnl_nlmsg_put_header(buf);
 	nlh->nlmsg_type = RTM_NEWQDISC;
 	nlh->nlmsg_flags = NLM_F_REQUEST | NLM_F_CREATE | NLM_F_EXCL;
-	tcm = mnl_nlmsg_put_extra_header(nlh, sizeof(*tcm));
-	tcm->tcm_family = AF_UNSPEC;
-	tcm->tcm_ifindex = ifindex;
-	tcm->tcm_handle = TC_H_MAKE(TC_H_INGRESS, 0);
-	tcm->tcm_parent = TC_H_INGRESS;
-	mnl_attr_put_strz_check(nlh, sizeof(buf), TCA_KIND, "ingress");
-	if (mlx5_nl_flow_nl_ack(nl, nlh))
+	if (mlx5_nl_flow_chat(nl, nlh, NULL, NULL))
 		return rte_flow_error_set
 			(error, rte_errno, RTE_FLOW_ERROR_TYPE_UNSPECIFIED,
 			 NULL, "netlink: failed to create ingress qdisc");
-- 
2.11.0

* [dpdk-dev] [PATCH 5/8] net/mlx5: prepare switch flow rule parser for encap offloads
From: Adrien Mazarguil @ 2018-08-31  9:57 UTC
  To: Shahaf Shuler, Yongseok Koh, Slava Ovsiienko; +Cc: dev

A mere message buffer is not enough to support the additional logic
required to manage flow rules with such offloads; a dedicated object
(struct mlx5_nl_flow) able to store extra information, including
adjustable target network interfaces, is needed, as well as a context
object for shared data (struct mlx5_nl_flow_ctx).

A predictable message sequence number can now be stored in the context
object as an improvement over CPU counters.
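
A minimal sketch of that sequence number (the helper name is
hypothetical; the expression matches the one used in mlx5_nl_flow_chat()
below):

  /* Monotonically increasing, never zero even on 32-bit wraparound. */
  static uint32_t
  nl_flow_seq_next(struct mlx5_nl_flow_ctx *ctx)
  {
          return ++ctx->seq ? ctx->seq : ++ctx->seq;
  }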

Signed-off-by: Adrien Mazarguil <adrien.mazarguil@6wind.com>
---
 drivers/net/mlx5/mlx5.c         |  18 ++--
 drivers/net/mlx5/mlx5.h         |  22 ++--
 drivers/net/mlx5/mlx5_flow.c    |  10 +-
 drivers/net/mlx5/mlx5_nl_flow.c | 189 ++++++++++++++++++++++++-----------
 4 files changed, 155 insertions(+), 84 deletions(-)

diff --git a/drivers/net/mlx5/mlx5.c b/drivers/net/mlx5/mlx5.c
index 9a504a31c..c10ca4ae5 100644
--- a/drivers/net/mlx5/mlx5.c
+++ b/drivers/net/mlx5/mlx5.c
@@ -282,8 +282,8 @@ mlx5_dev_close(struct rte_eth_dev *dev)
 		close(priv->nl_socket_route);
 	if (priv->nl_socket_rdma >= 0)
 		close(priv->nl_socket_rdma);
-	if (priv->mnl_socket)
-		mlx5_nl_flow_socket_destroy(priv->mnl_socket);
+	if (priv->nl_flow_ctx)
+		mlx5_nl_flow_ctx_destroy(priv->nl_flow_ctx);
 	ret = mlx5_hrxq_ibv_verify(dev);
 	if (ret)
 		DRV_LOG(WARNING, "port %u some hash Rx queue still remain",
@@ -1136,13 +1136,13 @@ mlx5_dev_spawn(struct rte_device *dpdk_dev,
 	claim_zero(mlx5_mac_addr_add(eth_dev, &mac, 0, 0));
 	if (vf && config.vf_nl_en)
 		mlx5_nl_mac_addr_sync(eth_dev);
-	priv->mnl_socket = mlx5_nl_flow_socket_create();
-	if (!priv->mnl_socket ||
+	priv->nl_flow_ctx = mlx5_nl_flow_ctx_create(eth_dev->device->numa_node);
+	if (!priv->nl_flow_ctx ||
 	    !priv->ifindex ||
-	    mlx5_nl_flow_ifindex_init(priv->mnl_socket, priv->ifindex,
+	    mlx5_nl_flow_ifindex_init(priv->nl_flow_ctx, priv->ifindex,
 				      &flow_error)) {
-		if (!priv->mnl_socket) {
-			flow_error.message = "cannot open libmnl socket";
+		if (!priv->nl_flow_ctx) {
+			flow_error.message = "cannot create NL flow context";
 		} else if (!priv->ifindex) {
 			rte_errno = ENXIO;
 			flow_error.message = "unknown network interface index";
@@ -1204,8 +1204,8 @@ mlx5_dev_spawn(struct rte_device *dpdk_dev,
 			close(priv->nl_socket_route);
 		if (priv->nl_socket_rdma >= 0)
 			close(priv->nl_socket_rdma);
-		if (priv->mnl_socket)
-			mlx5_nl_flow_socket_destroy(priv->mnl_socket);
+		if (priv->nl_flow_ctx)
+			mlx5_nl_flow_ctx_destroy(priv->nl_flow_ctx);
 		if (own_domain_id)
 			claim_zero(rte_eth_switch_domain_free(priv->domain_id));
 		rte_free(priv);
diff --git a/drivers/net/mlx5/mlx5.h b/drivers/net/mlx5/mlx5.h
index 287cfc643..210f4ea11 100644
--- a/drivers/net/mlx5/mlx5.h
+++ b/drivers/net/mlx5/mlx5.h
@@ -162,7 +162,8 @@ struct mlx5_nl_flow_ptoi {
 	unsigned int ifindex; /**< Network interface index. */
 };
 
-struct mnl_socket;
+struct mlx5_nl_flow;
+struct mlx5_nl_flow_ctx;
 
 struct priv {
 	LIST_ENTRY(priv) mem_event_cb; /* Called by memory event callback. */
@@ -229,7 +230,7 @@ struct priv {
 	rte_spinlock_t uar_lock[MLX5_UAR_PAGE_NUM_MAX];
 	/* UAR same-page access control required in 32bit implementations. */
 #endif
-	struct mnl_socket *mnl_socket; /* Libmnl socket. */
+	struct mlx5_nl_flow_ctx *nl_flow_ctx; /* Context for NL flow rules. */
 };
 
 #define PORT_ID(priv) ((priv)->dev_data->port_id)
@@ -396,21 +397,24 @@ int mlx5_nl_switch_info(int nl, unsigned int ifindex,
 
 /* mlx5_nl_flow.c */
 
-int mlx5_nl_flow_transpose(void *buf,
+int mlx5_nl_flow_transpose(struct mlx5_nl_flow *nl_flow,
 			   size_t size,
 			   const struct mlx5_nl_flow_ptoi *ptoi,
 			   const struct rte_flow_attr *attr,
 			   const struct rte_flow_item *pattern,
 			   const struct rte_flow_action *actions,
 			   struct rte_flow_error *error);
-void mlx5_nl_flow_brand(void *buf, uint32_t handle);
-int mlx5_nl_flow_create(struct mnl_socket *nl, void *buf,
+void mlx5_nl_flow_brand(struct mlx5_nl_flow *nl_flow, uint32_t handle);
+int mlx5_nl_flow_create(struct mlx5_nl_flow_ctx *ctx,
+			struct mlx5_nl_flow *nl_flow,
 			struct rte_flow_error *error);
-int mlx5_nl_flow_destroy(struct mnl_socket *nl, void *buf,
+int mlx5_nl_flow_destroy(struct mlx5_nl_flow_ctx *ctx,
+			 struct mlx5_nl_flow *nl_flow,
 			 struct rte_flow_error *error);
-int mlx5_nl_flow_ifindex_init(struct mnl_socket *nl, unsigned int ifindex,
+int mlx5_nl_flow_ifindex_init(struct mlx5_nl_flow_ctx *ctx,
+			      unsigned int ifindex,
 			      struct rte_flow_error *error);
-struct mnl_socket *mlx5_nl_flow_socket_create(void);
-void mlx5_nl_flow_socket_destroy(struct mnl_socket *nl);
+struct mlx5_nl_flow_ctx *mlx5_nl_flow_ctx_create(int socket);
+void mlx5_nl_flow_ctx_destroy(struct mlx5_nl_flow_ctx *ctx);
 
 #endif /* RTE_PMD_MLX5_H_ */
diff --git a/drivers/net/mlx5/mlx5_flow.c b/drivers/net/mlx5/mlx5_flow.c
index f093a5ed0..77e510dc3 100644
--- a/drivers/net/mlx5/mlx5_flow.c
+++ b/drivers/net/mlx5/mlx5_flow.c
@@ -2485,7 +2485,7 @@ mlx5_flow_merge_switch(struct rte_eth_dev *dev,
 	ptoi[n].ifindex = 0;
 	if (flow_size < off)
 		flow_size = 0;
-	ret = mlx5_nl_flow_transpose((uint8_t *)flow + off,
+	ret = mlx5_nl_flow_transpose((void *)((uint8_t *)flow + off),
 				     flow_size ? flow_size - off : 0,
 				     ptoi, attr, pattern, actions, error);
 	if (ret < 0)
@@ -2885,8 +2885,8 @@ mlx5_flow_remove(struct rte_eth_dev *dev, struct rte_flow *flow)
 	struct priv *priv = dev->data->dev_private;
 	struct mlx5_flow_verbs *verbs;
 
-	if (flow->nl_flow && priv->mnl_socket)
-		mlx5_nl_flow_destroy(priv->mnl_socket, flow->nl_flow, NULL);
+	if (flow->nl_flow && priv->nl_flow_ctx)
+		mlx5_nl_flow_destroy(priv->nl_flow_ctx, flow->nl_flow, NULL);
 	LIST_FOREACH(verbs, &flow->verbs, next) {
 		if (verbs->flow) {
 			claim_zero(mlx5_glue->destroy_flow(verbs->flow));
@@ -2975,8 +2975,8 @@ mlx5_flow_apply(struct rte_eth_dev *dev, struct rte_flow *flow,
 		}
 	}
 	if (flow->nl_flow &&
-	    priv->mnl_socket &&
-	    mlx5_nl_flow_create(priv->mnl_socket, flow->nl_flow, error))
+	    priv->nl_flow_ctx &&
+	    mlx5_nl_flow_create(priv->nl_flow_ctx, flow->nl_flow, error))
 		goto error;
 	return 0;
 error:
diff --git a/drivers/net/mlx5/mlx5_nl_flow.c b/drivers/net/mlx5/mlx5_nl_flow.c
index e720728b7..d20416026 100644
--- a/drivers/net/mlx5/mlx5_nl_flow.c
+++ b/drivers/net/mlx5/mlx5_nl_flow.c
@@ -22,10 +22,10 @@
 #include <sys/socket.h>
 
 #include <rte_byteorder.h>
-#include <rte_cycles.h>
 #include <rte_errno.h>
 #include <rte_ether.h>
 #include <rte_flow.h>
+#include <rte_malloc.h>
 
 #include "mlx5.h"
 #include "mlx5_autoconf.h"
@@ -148,6 +148,23 @@ struct tc_vlan {
 #define TCA_FLOWER_KEY_VLAN_ETH_TYPE 25
 #endif
 
+/** Context object required by most functions. */
+struct mlx5_nl_flow_ctx {
+	int socket; /**< NUMA socket for memory allocations. */
+	uint32_t seq; /**< Message sequence number for @p nl. */
+	struct mnl_socket *nl; /**< @p NETLINK_ROUTE libmnl socket. */
+};
+
+/** Flow rule descriptor. */
+struct mlx5_nl_flow {
+	uint32_t size; /**< Size of this object. */
+	uint32_t applied:1; /**< Whether rule is currently applied. */
+	unsigned int *ifindex_src; /**< Source interface. */
+	unsigned int *ifindex_dst; /**< Destination interface. */
+	alignas(struct nlmsghdr)
+	uint8_t msg[]; /**< Netlink message data. */
+};
+
 /** Parser state definitions for mlx5_nl_flow_trans[]. */
 enum mlx5_nl_flow_trans {
 	INVALID,
@@ -350,10 +367,10 @@ mlx5_nl_flow_item_mask(const struct rte_flow_item *item,
  * Subsequent entries enable this function to resolve other DPDK port IDs
  * found in the flow rule.
  *
- * @param[out] buf
- *   Output message buffer. May be NULL when @p size is 0.
+ * @param[out] nl_flow
+ *   Output buffer. May be NULL when @p size is 0.
  * @param size
- *   Size of @p buf. Message may be truncated if not large enough.
+ *   Size of @p nl_flow. May be truncated if not large enough.
  * @param[in] ptoi
  *   DPDK port ID to network interface index translation table. This table
  *   is terminated by an entry with a zero ifindex value.
@@ -372,7 +389,7 @@ mlx5_nl_flow_item_mask(const struct rte_flow_item *item,
  *   otherwise and rte_errno is set.
  */
 int
-mlx5_nl_flow_transpose(void *buf,
+mlx5_nl_flow_transpose(struct mlx5_nl_flow *nl_flow,
 		       size_t size,
 		       const struct mlx5_nl_flow_ptoi *ptoi,
 		       const struct rte_flow_attr *attr,
@@ -380,8 +397,9 @@ mlx5_nl_flow_transpose(void *buf,
 		       const struct rte_flow_action *actions,
 		       struct rte_flow_error *error)
 {
-	alignas(struct nlmsghdr)
-	uint8_t buf_tmp[mnl_nlmsg_size(sizeof(struct tcmsg) + 1024)];
+	alignas(struct mlx5_nl_flow)
+	uint8_t buf_tmp[1024];
+	void *buf;
 	const struct rte_flow_item *item;
 	const struct rte_flow_action *action;
 	unsigned int n;
@@ -398,9 +416,15 @@ mlx5_nl_flow_transpose(void *buf,
 	const enum mlx5_nl_flow_trans *trans;
 	const enum mlx5_nl_flow_trans *back;
 
-	if (!size)
-		goto error_nobufs;
 init:
+	buf = NULL;
+	if (size < offsetof(struct mlx5_nl_flow, msg))
+		goto error_nobufs;
+	nl_flow->size = offsetof(struct mlx5_nl_flow, msg);
+	nl_flow->applied = 0;
+	nl_flow->ifindex_src = NULL;
+	nl_flow->ifindex_dst = NULL;
+	size -= nl_flow->size;
 	item = pattern;
 	action = actions;
 	n = 0;
@@ -483,15 +507,21 @@ mlx5_nl_flow_transpose(void *buf,
 				(error, ENOTSUP,
 				 RTE_FLOW_ERROR_TYPE_ATTR_INGRESS,
 				 attr, "egress is not supported");
-		if (size < mnl_nlmsg_size(sizeof(*tcm)))
+		i = RTE_ALIGN_CEIL(nl_flow->size, alignof(struct nlmsghdr));
+		i -= nl_flow->size;
+		if (size < i + mnl_nlmsg_size(sizeof(*tcm)))
 			goto error_nobufs;
+		nl_flow->size += i;
+		buf = (void *)((uintptr_t)nl_flow + nl_flow->size);
+		size -= i;
 		nlh = mnl_nlmsg_put_header(buf);
-		nlh->nlmsg_type = 0;
+		nlh->nlmsg_type = RTM_NEWTFILTER;
 		nlh->nlmsg_flags = 0;
 		nlh->nlmsg_seq = 0;
 		tcm = mnl_nlmsg_put_extra_header(nlh, sizeof(*tcm));
 		tcm->tcm_family = AF_UNSPEC;
 		tcm->tcm_ifindex = ptoi[0].ifindex;
+		nl_flow->ifindex_src = (unsigned int *)&tcm->tcm_ifindex;
 		/*
 		 * Let kernel pick a handle by default. A predictable handle
 		 * can be set by the caller on the resulting buffer through
@@ -893,6 +923,10 @@ mlx5_nl_flow_transpose(void *buf,
 		act = mnl_attr_nest_start_check(buf, size, TCA_ACT_OPTIONS);
 		if (!act)
 			goto error_nobufs;
+		nl_flow->ifindex_dst =
+			&((struct tc_mirred *)
+			  mnl_attr_get_payload
+			  (mnl_nlmsg_get_payload_tail(buf)))->ifindex;
 		if (!mnl_attr_put_check(buf, size, TCA_MIRRED_PARMS,
 					sizeof(struct tc_mirred),
 					&(struct tc_mirred){
@@ -1014,15 +1048,18 @@ mlx5_nl_flow_transpose(void *buf,
 		if (na_flower)
 			mnl_attr_nest_end(buf, na_flower);
 		nlh = buf;
-		return nlh->nlmsg_len;
+		buf = NULL;
+		size -= nlh->nlmsg_len;
+		nl_flow->size += nlh->nlmsg_len;
+		return nl_flow->size;
 	}
 	back = trans;
 	trans = mlx5_nl_flow_trans[trans[n - 1]];
 	n = 0;
 	goto trans;
 error_nobufs:
-	if (buf != buf_tmp) {
-		buf = buf_tmp;
+	if (nl_flow != (void *)buf_tmp) {
+		nl_flow = (void *)buf_tmp;
 		size = sizeof(buf_tmp);
 		goto init;
 	}
@@ -1037,14 +1074,15 @@ mlx5_nl_flow_transpose(void *buf,
  * This handle should be unique for a given network interface to avoid
  * collisions.
  *
- * @param buf
+ * @param nl_flow
  *   Flow rule buffer previously initialized by mlx5_nl_flow_transpose().
  * @param handle
  *   Unique 32-bit handle to use.
  */
 void
-mlx5_nl_flow_brand(void *buf, uint32_t handle)
+mlx5_nl_flow_brand(struct mlx5_nl_flow *nl_flow, uint32_t handle)
 {
+	void *buf = nl_flow->msg;
 	struct tcmsg *tcm = mnl_nlmsg_get_payload(buf);
 
 	tcm->tcm_handle = handle;
@@ -1053,8 +1091,8 @@ mlx5_nl_flow_brand(void *buf, uint32_t handle)
 /**
  * Send Netlink message with acknowledgment and process reply.
  *
- * @param nl
- *   Libmnl socket to use.
+ * @param ctx
+ *   Context object initialized by mlx5_nl_flow_ctx_create().
  * @param nlh
  *   Message to send. This function always raises the NLM_F_ACK flag and
  *   sets its sequence number before sending.
@@ -1067,26 +1105,26 @@ mlx5_nl_flow_brand(void *buf, uint32_t handle)
  *   0 on success, a negative errno value otherwise and rte_errno is set.
  */
 static int
-mlx5_nl_flow_chat(struct mnl_socket *nl, struct nlmsghdr *nlh,
+mlx5_nl_flow_chat(struct mlx5_nl_flow_ctx *ctx, struct nlmsghdr *nlh,
 		  mnl_cb_t cb, void *arg)
 {
 	alignas(struct nlmsghdr)
 	uint8_t ans[MNL_SOCKET_BUFFER_SIZE];
-	unsigned int portid = mnl_socket_get_portid(nl);
-	uint32_t seq = rte_get_tsc_cycles();
+	unsigned int portid = mnl_socket_get_portid(ctx->nl);
+	uint32_t seq = ++ctx->seq ? ctx->seq : ++ctx->seq;
 	int err = 0;
 	int ret;
 
 	nlh->nlmsg_flags |= NLM_F_ACK;
 	nlh->nlmsg_seq = seq;
-	ret = mnl_socket_sendto(nl, nlh, nlh->nlmsg_len);
+	ret = mnl_socket_sendto(ctx->nl, nlh, nlh->nlmsg_len);
 	nlh = (void *)ans;
 	/*
 	 * The following loop postpones non-fatal errors until multipart
 	 * messages are complete.
 	 */
 	while (ret > 0) {
-		ret = mnl_socket_recvfrom(nl, ans, sizeof(ans));
+		ret = mnl_socket_recvfrom(ctx->nl, ans, sizeof(ans));
 		if (ret == -1) {
 			err = errno;
 			if (err != ENOSPC)
@@ -1113,9 +1151,9 @@ mlx5_nl_flow_chat(struct mnl_socket *nl, struct nlmsghdr *nlh,
 /**
  * Create a Netlink flow rule.
  *
- * @param nl
- *   Libmnl socket to use.
- * @param buf
+ * @param ctx
+ *   Context object initialized by mlx5_nl_flow_ctx_create().
+ * @param nl_flow
  *   Flow rule buffer previously initialized by mlx5_nl_flow_transpose().
  * @param[out] error
  *   Perform verbose error reporting if not NULL.
@@ -1124,15 +1162,19 @@ mlx5_nl_flow_chat(struct mnl_socket *nl, struct nlmsghdr *nlh,
  *   0 on success, a negative errno value otherwise and rte_errno is set.
  */
 int
-mlx5_nl_flow_create(struct mnl_socket *nl, void *buf,
+mlx5_nl_flow_create(struct mlx5_nl_flow_ctx *ctx, struct mlx5_nl_flow *nl_flow,
 		    struct rte_flow_error *error)
 {
-	struct nlmsghdr *nlh = buf;
+	struct nlmsghdr *nlh = (void *)nl_flow->msg;
 
+	if (nl_flow->applied)
+		return 0;
 	nlh->nlmsg_type = RTM_NEWTFILTER;
 	nlh->nlmsg_flags = NLM_F_REQUEST | NLM_F_CREATE | NLM_F_EXCL;
-	if (!mlx5_nl_flow_chat(nl, nlh, NULL, NULL))
+	if (!mlx5_nl_flow_chat(ctx, nlh, NULL, NULL)) {
+		nl_flow->applied = 1;
 		return 0;
+	}
 	return rte_flow_error_set
 		(error, rte_errno, RTE_FLOW_ERROR_TYPE_UNSPECIFIED, NULL,
 		 "netlink: failed to create TC flow rule");
@@ -1141,9 +1183,12 @@ mlx5_nl_flow_create(struct mnl_socket *nl, void *buf,
 /**
  * Destroy a Netlink flow rule.
  *
- * @param nl
- *   Libmnl socket to use.
- * @param buf
+ * In case of error, no recovery is possible; caller must suppose flow rule
+ * was destroyed.
+ *
+ * @param ctx
+ *   Context object initialized by mlx5_nl_flow_ctx_create().
+ * @param nl_flow
  *   Flow rule buffer previously initialized by mlx5_nl_flow_transpose().
  * @param[out] error
  *   Perform verbose error reporting if not NULL.
@@ -1152,26 +1197,31 @@ mlx5_nl_flow_create(struct mnl_socket *nl, void *buf,
  *   0 on success, a negative errno value otherwise and rte_errno is set.
  */
 int
-mlx5_nl_flow_destroy(struct mnl_socket *nl, void *buf,
+mlx5_nl_flow_destroy(struct mlx5_nl_flow_ctx *ctx, struct mlx5_nl_flow *nl_flow,
 		     struct rte_flow_error *error)
 {
-	struct nlmsghdr *nlh = buf;
+	struct nlmsghdr *nlh = (void *)nl_flow->msg;
+	int ret;
 
+	if (!nl_flow->applied)
+		return 0;
 	nlh->nlmsg_type = RTM_DELTFILTER;
 	nlh->nlmsg_flags = NLM_F_REQUEST;
-	if (!mlx5_nl_flow_chat(nl, nlh, NULL, NULL))
+	ret = mlx5_nl_flow_chat(ctx, nlh, NULL, NULL);
+	nl_flow->applied = 0;
+	if (!ret)
 		return 0;
 	return rte_flow_error_set
-		(error, errno, RTE_FLOW_ERROR_TYPE_UNSPECIFIED, NULL,
+		(error, rte_errno, RTE_FLOW_ERROR_TYPE_UNSPECIFIED, NULL,
 		 "netlink: failed to destroy TC flow rule");
 }
 
 /**
- * Initialize ingress qdisc of a given network interface.
+ * Initialize ingress qdisc of network interfaces.
  *
- * @param nl
- *   Libmnl socket of the @p NETLINK_ROUTE kind.
- * @param ifindex
+ * @param ctx
+ *   Context object initialized by mlx5_nl_flow_ctx_create().
+ * @param[in] ifindex
  *   Index of network interface to initialize.
  * @param[out] error
  *   Perform verbose error reporting if not NULL.
@@ -1180,7 +1230,8 @@ mlx5_nl_flow_destroy(struct mnl_socket *nl, void *buf,
  *   0 on success, a negative errno value otherwise and rte_errno is set.
  */
 int
-mlx5_nl_flow_ifindex_init(struct mnl_socket *nl, unsigned int ifindex,
+mlx5_nl_flow_ifindex_init(struct mlx5_nl_flow_ctx *ctx,
+			  const unsigned int ifindex,
 			  struct rte_flow_error *error)
 {
 	struct nlmsghdr *nlh;
@@ -1202,15 +1253,15 @@ mlx5_nl_flow_ifindex_init(struct mnl_socket *nl, unsigned int ifindex,
 			(error, ENOBUFS, RTE_FLOW_ERROR_TYPE_UNSPECIFIED,
 			 NULL, "netlink: not enough space for message");
 	/* Ignore errors when qdisc is already absent. */
-	if (mlx5_nl_flow_chat(nl, nlh, NULL, NULL) &&
-	    rte_errno != EINVAL && rte_errno != ENOENT)
+	if (mlx5_nl_flow_chat(ctx, nlh, NULL, NULL) &&
+	    rte_errno != EINVAL && rte_errno != ENOENT && rte_errno != EPERM)
 		return rte_flow_error_set
 			(error, rte_errno, RTE_FLOW_ERROR_TYPE_UNSPECIFIED,
 			 NULL, "netlink: failed to remove ingress qdisc");
 	/* Create fresh ingress qdisc. */
 	nlh->nlmsg_type = RTM_NEWQDISC;
 	nlh->nlmsg_flags = NLM_F_REQUEST | NLM_F_CREATE | NLM_F_EXCL;
-	if (mlx5_nl_flow_chat(nl, nlh, NULL, NULL))
+	if (mlx5_nl_flow_chat(ctx, nlh, NULL, NULL))
 		return rte_flow_error_set
 			(error, rte_errno, RTE_FLOW_ERROR_TYPE_UNSPECIFIED,
 			 NULL, "netlink: failed to create ingress qdisc");
@@ -1218,34 +1269,50 @@ mlx5_nl_flow_ifindex_init(struct mnl_socket *nl, unsigned int ifindex,
 }
 
 /**
- * Create and configure a libmnl socket for Netlink flow rules.
+ * Create NL flow rule context object.
  *
+ * @param socket
+ *   NUMA socket for memory allocations.
  * @return
- *   A valid libmnl socket object pointer on success, NULL otherwise and
- *   rte_errno is set.
+ *   A valid object on success, NULL otherwise and rte_errno is set.
  */
-struct mnl_socket *
-mlx5_nl_flow_socket_create(void)
+struct mlx5_nl_flow_ctx *
+mlx5_nl_flow_ctx_create(int socket)
 {
-	struct mnl_socket *nl = mnl_socket_open(NETLINK_ROUTE);
+	struct mlx5_nl_flow_ctx *ctx =
+		rte_zmalloc_socket(__func__, sizeof(*ctx), 0, socket);
 
-	if (nl) {
-		mnl_socket_setsockopt(nl, NETLINK_CAP_ACK, &(int){ 1 },
-				      sizeof(int));
-		if (!mnl_socket_bind(nl, 0, MNL_SOCKET_AUTOPID))
-			return nl;
-	}
+	if (!ctx)
+		goto error;
+	ctx->socket = socket;
+	ctx->seq = 0;
+	ctx->nl = mnl_socket_open(NETLINK_ROUTE);
+	if (!ctx->nl)
+		goto error;
+	mnl_socket_setsockopt(ctx->nl, NETLINK_CAP_ACK, &(int){ 1 },
+			      sizeof(int));
+	if (mnl_socket_bind(ctx->nl, 0, MNL_SOCKET_AUTOPID))
+		goto error;
+	return ctx;
+error:
 	rte_errno = errno;
-	if (nl)
-		mnl_socket_close(nl);
+	if (ctx) {
+		if (ctx->nl)
+			mnl_socket_close(ctx->nl);
+		rte_free(ctx);
+	}
 	return NULL;
 }
 
 /**
- * Destroy a libmnl socket.
+ * Destroy NL flow rule context object.
+ *
+ * @param ctx
+ *   Context object initialized by mlx5_nl_flow_ctx_create().
  */
 void
-mlx5_nl_flow_socket_destroy(struct mnl_socket *nl)
+mlx5_nl_flow_ctx_destroy(struct mlx5_nl_flow_ctx *ctx)
 {
-	mnl_socket_close(nl);
+	mnl_socket_close(ctx->nl);
+	rte_free(ctx);
 }
-- 
2.11.0

* [dpdk-dev] [PATCH 6/8] net/mlx5: add convenience macros to switch flow rule engine
From: Adrien Mazarguil @ 2018-08-31  9:57 UTC
  To: Shahaf Shuler, Yongseok Koh, Slava Ovsiienko; +Cc: dev

Upcoming patches will rely on them.

Signed-off-by: Adrien Mazarguil <adrien.mazarguil@6wind.com>
---
 drivers/net/mlx5/mlx5_nl_flow.c | 31 +++++++++++++++++--------------
 1 file changed, 17 insertions(+), 14 deletions(-)

diff --git a/drivers/net/mlx5/mlx5_nl_flow.c b/drivers/net/mlx5/mlx5_nl_flow.c
index d20416026..91ff90a13 100644
--- a/drivers/net/mlx5/mlx5_nl_flow.c
+++ b/drivers/net/mlx5/mlx5_nl_flow.c
@@ -236,6 +236,13 @@ static const union {
 	struct rte_flow_item_udp udp;
 } mlx5_nl_flow_mask_empty;
 
+#define ETHER_ADDR_MASK "\xff\xff\xff\xff\xff\xff"
+#define IN_ADDR_MASK RTE_BE32(0xffffffff)
+#define IN6_ADDR_MASK \
+	"\xff\xff\xff\xff\xff\xff\xff\xff" \
+	"\xff\xff\xff\xff\xff\xff\xff\xff"
+#define BE16_MASK RTE_BE16(0xffff)
+
 /** Supported masks for known item types. */
 static const struct {
 	struct rte_flow_item_port_id port_id;
@@ -251,8 +258,8 @@ static const struct {
 	},
 	.eth = {
 		.type = RTE_BE16(0xffff),
-		.dst.addr_bytes = "\xff\xff\xff\xff\xff\xff",
-		.src.addr_bytes = "\xff\xff\xff\xff\xff\xff",
+		.dst.addr_bytes = ETHER_ADDR_MASK,
+		.src.addr_bytes = ETHER_ADDR_MASK,
 	},
 	.vlan = {
 		/* PCP and VID only, no DEI. */
@@ -261,25 +268,21 @@ static const struct {
 	},
 	.ipv4.hdr = {
 		.next_proto_id = 0xff,
-		.src_addr = RTE_BE32(0xffffffff),
-		.dst_addr = RTE_BE32(0xffffffff),
+		.src_addr = IN_ADDR_MASK,
+		.dst_addr = IN_ADDR_MASK,
 	},
 	.ipv6.hdr = {
 		.proto = 0xff,
-		.src_addr =
-			"\xff\xff\xff\xff\xff\xff\xff\xff"
-			"\xff\xff\xff\xff\xff\xff\xff\xff",
-		.dst_addr =
-			"\xff\xff\xff\xff\xff\xff\xff\xff"
-			"\xff\xff\xff\xff\xff\xff\xff\xff",
+		.src_addr = IN6_ADDR_MASK,
+		.dst_addr = IN6_ADDR_MASK,
 	},
 	.tcp.hdr = {
-		.src_port = RTE_BE16(0xffff),
-		.dst_port = RTE_BE16(0xffff),
+		.src_port = BE16_MASK,
+		.dst_port = BE16_MASK,
 	},
 	.udp.hdr = {
-		.src_port = RTE_BE16(0xffff),
-		.dst_port = RTE_BE16(0xffff),
+		.src_port = BE16_MASK,
+		.dst_port = BE16_MASK,
 	},
 };
 
-- 
2.11.0

* [dpdk-dev] [PATCH 7/8] net/mlx5: add VXLAN encap support to switch flow rules
From: Adrien Mazarguil @ 2018-08-31  9:57 UTC
  To: Shahaf Shuler, Yongseok Koh, Slava Ovsiienko; +Cc: dev

This patch is huge because support for VXLAN encapsulation in switch flow
rules involves configuring virtual network interfaces on the host system,
including source addresses, routes and neighbor entries, for flow rules to
be offloadable by TC. All of this is done through Netlink.

VXLAN interfaces are dynamically created for each combination of local UDP
port and outer network interface associated with flow rules, then used as
targets for TC "flower" filters in order to perform encapsulation.
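
A hedged sketch of the kind of RTM_NEWLINK request involved, equivalent
to "ip link add ... type vxlan dstport 4789 external" (attribute choice,
device name and the buf/outer_ifindex variables are assumptions, not the
exact code found below):

  struct nlmsghdr *nlh = mnl_nlmsg_put_header(buf);
  struct ifinfomsg *ifm;
  struct nlattr *info, *data;

  nlh->nlmsg_type = RTM_NEWLINK;
  nlh->nlmsg_flags = NLM_F_REQUEST | NLM_F_CREATE | NLM_F_EXCL;
  ifm = mnl_nlmsg_put_extra_header(nlh, sizeof(*ifm));
  ifm->ifi_family = AF_UNSPEC;
  mnl_attr_put_strz(nlh, IFLA_IFNAME, "vxlan_4789"); /* illustrative */
  info = mnl_attr_nest_start(nlh, IFLA_LINKINFO);
  mnl_attr_put_strz(nlh, IFLA_INFO_KIND, "vxlan");
  data = mnl_attr_nest_start(nlh, IFLA_INFO_DATA);
  mnl_attr_put_u32(nlh, IFLA_VXLAN_LINK, outer_ifindex);
  mnl_attr_put_u8(nlh, IFLA_VXLAN_COLLECT_METADATA, 1);
  mnl_attr_put_u16(nlh, IFLA_VXLAN_PORT, RTE_BE16(4789));
  mnl_attr_nest_end(nlh, data);
  mnl_attr_nest_end(nlh, info);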

To automatically create and remove these interfaces on an as-needed basis
according to the applied flow rules, the PMD maintains global resources
shared between all PMD instances of the primary process.

Testpmd example:

- Setting up outer properties of VXLAN tunnel:

  set vxlan ip-version ipv4 vni 0x112233 udp-src 4242 udp-dst 4789
    ip-src 1.1.1.1 ip-dst 2.2.2.2
    eth-src 00:11:22:33:44:55 eth-dst 66:77:88:99:aa:bb

- Creating a flow rule on port ID 2 performing VXLAN encapsulation with the
  above properties and directing the resulting traffic to port ID 1:

  flow create 2 ingress transfer pattern eth src is 00:11:22:33:44:55 /
     ipv4 / udp dst is 5566 / end actions vxlan_encap / port_id id 1 / end

Signed-off-by: Adrien Mazarguil <adrien.mazarguil@6wind.com>
---
 drivers/net/mlx5/Makefile       |   10 +
 drivers/net/mlx5/mlx5_nl_flow.c | 1198 +++++++++++++++++++++++++++++++++-
 2 files changed, 1204 insertions(+), 4 deletions(-)

diff --git a/drivers/net/mlx5/Makefile b/drivers/net/mlx5/Makefile
index 2e70dec5b..1ba4ce612 100644
--- a/drivers/net/mlx5/Makefile
+++ b/drivers/net/mlx5/Makefile
@@ -384,6 +384,16 @@ mlx5_autoconf.h.new: $(RTE_SDK)/buildtools/auto-config-h.sh
 		/usr/include/assert.h \
 		define static_assert \
 		$(AUTOCONF_OUTPUT)
+	$Q sh -- '$<' '$@' \
+		HAVE_TC_ACT_TUNNEL_KEY \
+		linux/tc_act/tc_tunnel_key.h \
+		define TCA_ACT_TUNNEL_KEY \
+		$(AUTOCONF_OUTPUT)
+	$Q sh -- '$<' '$@' \
+		HAVE_TCA_TUNNEL_KEY_ENC_DST_PORT \
+		linux/tc_act/tc_tunnel_key.h \
+		enum TCA_TUNNEL_KEY_ENC_DST_PORT \
+		$(AUTOCONF_OUTPUT)
 
 # Create mlx5_autoconf.h or update it in case it differs from the new one.
 
diff --git a/drivers/net/mlx5/mlx5_nl_flow.c b/drivers/net/mlx5/mlx5_nl_flow.c
index 91ff90a13..672f92863 100644
--- a/drivers/net/mlx5/mlx5_nl_flow.c
+++ b/drivers/net/mlx5/mlx5_nl_flow.c
@@ -6,7 +6,31 @@
 #include <assert.h>
 #include <errno.h>
 #include <libmnl/libmnl.h>
+/*
+ * Older versions of linux/if.h do not have the required safeties to coexist
+ * with net/if.h. This causes a compilation failure due to symbol
+ * redefinitions even when including the latter first.
+ *
+ * One workaround is to prevent net/if.h from defining conflicting symbols
+ * by removing __USE_MISC, and maintaining it undefined while including
+ * linux/if.h.
+ *
+ * Alphabetical order cannot be preserved since net/if.h must always be
+ * included before linux/if.h regardless.
+ */
+#ifdef __USE_MISC
+#undef __USE_MISC
+#define RESTORE_USE_MISC
+#endif
+#include <net/if.h>
+#include <linux/if.h>
+#ifdef RESTORE_USE_MISC
+#undef RESTORE_USE_MISC
+#define __USE_MISC 1
+#endif
+#include <linux/if_arp.h>
 #include <linux/if_ether.h>
+#include <linux/if_link.h>
 #include <linux/netlink.h>
 #include <linux/pkt_cls.h>
 #include <linux/pkt_sched.h>
@@ -14,11 +38,13 @@
 #include <linux/tc_act/tc_gact.h>
 #include <linux/tc_act/tc_mirred.h>
 #include <netinet/in.h>
+#include <pthread.h>
 #include <stdalign.h>
 #include <stdbool.h>
 #include <stddef.h>
 #include <stdint.h>
 #include <stdlib.h>
+#include <sys/queue.h>
 #include <sys/socket.h>
 
 #include <rte_byteorder.h>
@@ -52,6 +78,34 @@ struct tc_vlan {
 
 #endif /* HAVE_TC_ACT_VLAN */
 
+#ifdef HAVE_TC_ACT_TUNNEL_KEY
+
+#include <linux/tc_act/tc_tunnel_key.h>
+
+#ifndef HAVE_TCA_TUNNEL_KEY_ENC_DST_PORT
+#define TCA_TUNNEL_KEY_ENC_DST_PORT 9
+#endif
+
+#else /* HAVE_TC_ACT_TUNNEL_KEY */
+
+#define TCA_ACT_TUNNEL_KEY 17
+#define TCA_TUNNEL_KEY_ACT_SET 1
+#define TCA_TUNNEL_KEY_ACT_RELEASE 2
+#define TCA_TUNNEL_KEY_PARMS 2
+#define TCA_TUNNEL_KEY_ENC_IPV4_SRC 3
+#define TCA_TUNNEL_KEY_ENC_IPV4_DST 4
+#define TCA_TUNNEL_KEY_ENC_IPV6_SRC 5
+#define TCA_TUNNEL_KEY_ENC_IPV6_DST 6
+#define TCA_TUNNEL_KEY_ENC_KEY_ID 7
+#define TCA_TUNNEL_KEY_ENC_DST_PORT 9
+
+struct tc_tunnel_key {
+	tc_gen;
+	int t_action;
+};
+
+#endif /* HAVE_TC_ACT_TUNNEL_KEY */
+
 /* Normally found in linux/netlink.h. */
 #ifndef NETLINK_CAP_ACK
 #define NETLINK_CAP_ACK 10
@@ -148,6 +202,71 @@ struct tc_vlan {
 #define TCA_FLOWER_KEY_VLAN_ETH_TYPE 25
 #endif
 
+#define BIT(b) (1 << (b))
+#define BIT_ENCAP(e) BIT(MLX5_NL_FLOW_ENCAP_ ## e)
+
+/** Flags used for @p mask in struct mlx5_nl_flow_encap. */
+enum mlx5_nl_flow_encap_flag {
+	MLX5_NL_FLOW_ENCAP_ETH_SRC,
+	MLX5_NL_FLOW_ENCAP_ETH_DST,
+	MLX5_NL_FLOW_ENCAP_IPV4_SRC,
+	MLX5_NL_FLOW_ENCAP_IPV4_DST,
+	MLX5_NL_FLOW_ENCAP_IPV6_SRC,
+	MLX5_NL_FLOW_ENCAP_IPV6_DST,
+	MLX5_NL_FLOW_ENCAP_UDP_SRC,
+	MLX5_NL_FLOW_ENCAP_UDP_DST,
+	MLX5_NL_FLOW_ENCAP_VXLAN_VNI,
+};
+
+/** Encapsulation structure with fixed format for convenience. */
+struct mlx5_nl_flow_encap {
+	uint32_t mask;
+	struct {
+		struct ether_addr src;
+		struct ether_addr dst;
+	} eth;
+	struct mlx5_nl_flow_encap_ip {
+		union mlx5_nl_flow_encap_ip_addr {
+			struct in_addr v4;
+			struct in6_addr v6;
+		} src;
+		union mlx5_nl_flow_encap_ip_addr dst;
+	} ip;
+	struct {
+		rte_be16_t src;
+		rte_be16_t dst;
+	} udp;
+	struct {
+		rte_be32_t vni;
+	} vxlan;
+};
+
+/** Generic address descriptor for encapsulation resources. */
+struct mlx5_nl_flow_encap_addr {
+	LIST_ENTRY(mlx5_nl_flow_encap_addr) next;
+	uint32_t refcnt;
+	uint32_t mask;
+	struct mlx5_nl_flow_encap_ip ip;
+};
+
+/** VXLAN-specific encapsulation resources. */
+struct mlx5_nl_flow_encap_vxlan {
+	LIST_ENTRY(mlx5_nl_flow_encap_vxlan) next;
+	uint32_t refcnt;
+	rte_be16_t port;
+	unsigned int inner;
+};
+
+/** Encapsulation interface descriptor. */
+struct mlx5_nl_flow_encap_ifindex {
+	LIST_ENTRY(mlx5_nl_flow_encap_ifindex) next;
+	uint32_t refcnt;
+	unsigned int outer;
+	LIST_HEAD(, mlx5_nl_flow_encap_vxlan) vxlan;
+	LIST_HEAD(, mlx5_nl_flow_encap_addr) local;
+	LIST_HEAD(, mlx5_nl_flow_encap_addr) neigh;
+};
+
 /** Context object required by most functions. */
 struct mlx5_nl_flow_ctx {
 	int socket; /**< NUMA socket for memory allocations. */
@@ -159,8 +278,10 @@ struct mlx5_nl_flow_ctx {
 struct mlx5_nl_flow {
 	uint32_t size; /**< Size of this object. */
 	uint32_t applied:1; /**< Whether rule is currently applied. */
+	unsigned int encap_ifindex; /**< Interface to use with @p encap. */
 	unsigned int *ifindex_src; /**< Source interface. */
 	unsigned int *ifindex_dst; /**< Destination interface. */
+	struct mlx5_nl_flow_encap *encap; /**< Encapsulation properties. */
 	alignas(struct nlmsghdr)
 	uint8_t msg[]; /**< Netlink message data. */
 };
@@ -179,6 +300,7 @@ enum mlx5_nl_flow_trans {
 	ITEM_IPV6,
 	ITEM_TCP,
 	ITEM_UDP,
+	ITEM_VXLAN,
 	ACTIONS,
 	ACTION_VOID,
 	ACTION_PORT_ID,
@@ -187,6 +309,8 @@ enum mlx5_nl_flow_trans {
 	ACTION_OF_PUSH_VLAN,
 	ACTION_OF_SET_VLAN_VID,
 	ACTION_OF_SET_VLAN_PCP,
+	ACTION_VXLAN_ENCAP,
+	ACTION_VXLAN_DECAP,
 	END,
 };
 
@@ -196,7 +320,8 @@ enum mlx5_nl_flow_trans {
 	ITEM_VOID, ITEM_PORT_ID, ACTIONS
 #define ACTIONS_COMMON \
 	ACTION_VOID, ACTION_OF_POP_VLAN, ACTION_OF_PUSH_VLAN, \
-	ACTION_OF_SET_VLAN_VID, ACTION_OF_SET_VLAN_PCP
+	ACTION_OF_SET_VLAN_VID, ACTION_OF_SET_VLAN_PCP, \
+	ACTION_VXLAN_ENCAP, ACTION_VXLAN_DECAP
 #define ACTIONS_FATE \
 	ACTION_PORT_ID, ACTION_DROP
 
@@ -213,7 +338,8 @@ static const enum mlx5_nl_flow_trans *const mlx5_nl_flow_trans[] = {
 	[ITEM_IPV4] = TRANS(ITEM_TCP, ITEM_UDP, PATTERN_COMMON),
 	[ITEM_IPV6] = TRANS(ITEM_TCP, ITEM_UDP, PATTERN_COMMON),
 	[ITEM_TCP] = TRANS(PATTERN_COMMON),
-	[ITEM_UDP] = TRANS(PATTERN_COMMON),
+	[ITEM_UDP] = TRANS(ITEM_VXLAN, PATTERN_COMMON),
+	[ITEM_VXLAN] = TRANS(PATTERN_COMMON),
 	[ACTIONS] = TRANS(ACTIONS_FATE, ACTIONS_COMMON),
 	[ACTION_VOID] = TRANS(BACK),
 	[ACTION_PORT_ID] = TRANS(ACTION_VOID, END),
@@ -222,6 +348,21 @@ static const enum mlx5_nl_flow_trans *const mlx5_nl_flow_trans[] = {
 	[ACTION_OF_PUSH_VLAN] = TRANS(ACTIONS_FATE, ACTIONS_COMMON),
 	[ACTION_OF_SET_VLAN_VID] = TRANS(ACTIONS_FATE, ACTIONS_COMMON),
 	[ACTION_OF_SET_VLAN_PCP] = TRANS(ACTIONS_FATE, ACTIONS_COMMON),
+	[ACTION_VXLAN_ENCAP] = TRANS(ACTIONS_FATE, ACTIONS_COMMON),
+	[ACTION_VXLAN_DECAP] = TRANS(ACTIONS_FATE, ACTIONS_COMMON),
+	[END] = NULL,
+};
+
+/** Parser state transitions used by mlx5_nl_flow_encap_reap(). */
+static const enum mlx5_nl_flow_trans *const mlx5_nl_flow_encap_reap_trans[] = {
+	[INVALID] = NULL,
+	[BACK] = NULL,
+	[ITEM_VOID] = TRANS(BACK),
+	[ITEM_ETH] = TRANS(ITEM_IPV4, ITEM_IPV6, ITEM_VOID),
+	[ITEM_IPV4] = TRANS(ITEM_UDP, ITEM_VOID),
+	[ITEM_IPV6] = TRANS(ITEM_UDP, ITEM_VOID),
+	[ITEM_UDP] = TRANS(ITEM_VXLAN, ITEM_VOID),
+	[ITEM_VXLAN] = TRANS(END),
 	[END] = NULL,
 };
 
@@ -234,6 +375,7 @@ static const union {
 	struct rte_flow_item_ipv6 ipv6;
 	struct rte_flow_item_tcp tcp;
 	struct rte_flow_item_udp udp;
+	struct rte_flow_item_vxlan vxlan;
 } mlx5_nl_flow_mask_empty;
 
 #define ETHER_ADDR_MASK "\xff\xff\xff\xff\xff\xff"
@@ -242,6 +384,7 @@ static const union {
 	"\xff\xff\xff\xff\xff\xff\xff\xff" \
 	"\xff\xff\xff\xff\xff\xff\xff\xff"
 #define BE16_MASK RTE_BE16(0xffff)
+#define VXLAN_VNI_MASK "\xff\xff\xff"
 
 /** Supported masks for known item types. */
 static const struct {
@@ -286,6 +429,35 @@ static const struct {
 	},
 };
 
+/** Supported masks for known encapsulation item types. */
+static const struct {
+	struct rte_flow_item_eth eth;
+	struct rte_flow_item_ipv4 ipv4;
+	struct rte_flow_item_ipv6 ipv6;
+	struct rte_flow_item_udp udp;
+	struct rte_flow_item_vxlan vxlan;
+} mlx5_nl_flow_encap_mask_supported = {
+	.eth = {
+		.dst.addr_bytes = ETHER_ADDR_MASK,
+		.src.addr_bytes = ETHER_ADDR_MASK,
+	},
+	.ipv4.hdr = {
+		.src_addr = IN_ADDR_MASK,
+		.dst_addr = IN_ADDR_MASK,
+	},
+	.ipv6.hdr = {
+		.src_addr = IN6_ADDR_MASK,
+		.dst_addr = IN6_ADDR_MASK,
+	},
+	.udp.hdr = {
+		.src_port = BE16_MASK,
+		.dst_port = BE16_MASK,
+	},
+	.vxlan = {
+		.vni = VXLAN_VNI_MASK,
+	},
+};
+
 /**
  * Retrieve mask for pattern item.
  *
@@ -361,6 +533,227 @@ mlx5_nl_flow_item_mask(const struct rte_flow_item *item,
 }
 
 /**
+ * Convert VXLAN VNI to 32-bit integer.
+ *
+ * @param[in] vni
+ *   VXLAN VNI in 24-bit wire format.
+ *
+ * @return
+ *   VXLAN VNI as a 32-bit integer value in network endian.
+ */
+static rte_be32_t
+vxlan_vni_as_be32(const uint8_t vni[3])
+{
+	return (volatile union { uint8_t u8[4]; rte_be32_t u32; })
+		{ { 0, vni[0], vni[1], vni[2] } }.u32;
+}
+
+/**
+ * Populate consolidated encapsulation object from list of pattern items.
+ *
+ * Helper function to process configuration of generic actions such as
+ * RTE_FLOW_ACTION_TYPE_VXLAN_ENCAP.
+ *
+ * @param[out] dst
+ *   Destination object.
+ * @param[in] src
+ *   List of pattern items to gather data from.
+ * @param[out] error
+ *   Perform verbose error reporting if not NULL.
+ *
+ * @return
+ *   0 on success, a negative errno value otherwise and rte_errno is set.
+ */
+static int
+mlx5_nl_flow_encap_reap(struct mlx5_nl_flow_encap *dst,
+			const struct rte_flow_item *src,
+			struct rte_flow_error *error)
+{
+	struct mlx5_nl_flow_encap tmp = {
+		.mask = 0,
+	};
+	unsigned int n = 0;
+	const enum mlx5_nl_flow_trans *trans = TRANS(ITEM_ETH);
+	const enum mlx5_nl_flow_trans *back = trans;
+
+trans:
+	switch (trans[n++]) {
+		union {
+			const struct rte_flow_item_eth *eth;
+			const struct rte_flow_item_ipv4 *ipv4;
+			const struct rte_flow_item_ipv6 *ipv6;
+			const struct rte_flow_item_udp *udp;
+			const struct rte_flow_item_vxlan *vxlan;
+		} spec, mask;
+
+	default:
+	case INVALID:
+		goto error_encap;
+	case BACK:
+		trans = back;
+		n = 0;
+		goto trans;
+	case ITEM_VOID:
+		if (src->type != RTE_FLOW_ITEM_TYPE_VOID)
+			goto trans;
+		++src;
+		break;
+	case ITEM_ETH:
+		if (src->type != RTE_FLOW_ITEM_TYPE_ETH)
+			goto trans;
+		mask.eth = mlx5_nl_flow_item_mask
+			(src, &rte_flow_item_eth_mask,
+			 &mlx5_nl_flow_encap_mask_supported.eth,
+			 &mlx5_nl_flow_mask_empty.eth,
+			 sizeof(rte_flow_item_eth_mask), error);
+		if (!mask.eth)
+			return -rte_errno;
+		if (mask.eth == &mlx5_nl_flow_mask_empty.eth)
+			goto error_spec;
+		spec.eth = src->spec;
+		if (!is_zero_ether_addr(&mask.eth->src)) {
+			if (!is_broadcast_ether_addr(&mask.eth->src))
+				goto error_mask;
+			tmp.eth.src = spec.eth->src;
+			tmp.mask |= BIT_ENCAP(ETH_SRC);
+		}
+		if (!is_zero_ether_addr(&mask.eth->dst)) {
+			if (!is_broadcast_ether_addr(&mask.eth->dst))
+				goto error_mask;
+			tmp.eth.dst = spec.eth->dst;
+			tmp.mask |= BIT_ENCAP(ETH_DST);
+		}
+		++src;
+		break;
+	case ITEM_IPV4:
+		if (src->type != RTE_FLOW_ITEM_TYPE_IPV4)
+			goto trans;
+		mask.ipv4 = mlx5_nl_flow_item_mask
+			(src, &rte_flow_item_ipv4_mask,
+			 &mlx5_nl_flow_encap_mask_supported.ipv4,
+			 &mlx5_nl_flow_mask_empty.ipv4,
+			 sizeof(rte_flow_item_ipv4_mask), error);
+		if (!mask.ipv4)
+			return -rte_errno;
+		if (mask.ipv4 == &mlx5_nl_flow_mask_empty.ipv4)
+			goto error_spec;
+		spec.ipv4 = src->spec;
+		if (mask.ipv4->hdr.src_addr) {
+			if (mask.ipv4->hdr.src_addr != IN_ADDR_MASK)
+				goto error_mask;
+			tmp.ip.src.v4.s_addr = spec.ipv4->hdr.src_addr;
+			tmp.mask |= BIT_ENCAP(IPV4_SRC);
+		}
+		if (mask.ipv4->hdr.dst_addr) {
+			if (mask.ipv4->hdr.dst_addr != IN_ADDR_MASK)
+				goto error_mask;
+			tmp.ip.dst.v4.s_addr = spec.ipv4->hdr.dst_addr;
+			tmp.mask |= BIT_ENCAP(IPV4_DST);
+		}
+		++src;
+		break;
+	case ITEM_IPV6:
+		if (src->type != RTE_FLOW_ITEM_TYPE_IPV6)
+			goto trans;
+		mask.ipv6 = mlx5_nl_flow_item_mask
+			(src, &rte_flow_item_ipv6_mask,
+			 &mlx5_nl_flow_encap_mask_supported.ipv6,
+			 &mlx5_nl_flow_mask_empty.ipv6,
+			 sizeof(rte_flow_item_ipv6_mask), error);
+		if (!mask.ipv6)
+			return -rte_errno;
+		if (mask.ipv6 == &mlx5_nl_flow_mask_empty.ipv6)
+			goto error_spec;
+		spec.ipv6 = src->spec;
+		if (!IN6_IS_ADDR_UNSPECIFIED(mask.ipv6->hdr.src_addr)) {
+			if (memcmp(mask.ipv6->hdr.src_addr, IN6_ADDR_MASK, 16))
+				goto error_mask;
+			tmp.ip.src.v6 = *(const struct in6_addr *)
+				spec.ipv6->hdr.src_addr;
+			tmp.mask |= BIT_ENCAP(IPV6_SRC);
+		}
+		if (!IN6_IS_ADDR_UNSPECIFIED(mask.ipv6->hdr.dst_addr)) {
+			if (memcmp(mask.ipv6->hdr.dst_addr, IN6_ADDR_MASK, 16))
+				goto error_mask;
+			tmp.ip.dst.v6 = *(const struct in6_addr *)
+				spec.ipv6->hdr.dst_addr;
+			tmp.mask |= BIT_ENCAP(IPV6_DST);
+		}
+		++src;
+		break;
+	case ITEM_UDP:
+		if (src->type != RTE_FLOW_ITEM_TYPE_UDP)
+			goto trans;
+		mask.udp = mlx5_nl_flow_item_mask
+			(src, &rte_flow_item_udp_mask,
+			 &mlx5_nl_flow_encap_mask_supported.udp,
+			 &mlx5_nl_flow_mask_empty.udp,
+			 sizeof(rte_flow_item_udp_mask), error);
+		if (!mask.udp)
+			return -rte_errno;
+		if (mask.udp == &mlx5_nl_flow_mask_empty.udp)
+			goto error_spec;
+		spec.udp = src->spec;
+		if (mask.udp->hdr.src_port) {
+			if (mask.udp->hdr.src_port != BE16_MASK)
+				goto error_mask;
+			tmp.udp.src = spec.udp->hdr.src_port;
+			tmp.mask |= BIT_ENCAP(UDP_SRC);
+		}
+		if (mask.udp->hdr.dst_port) {
+			if (mask.udp->hdr.dst_port != BE16_MASK)
+				goto error_mask;
+			tmp.udp.dst = spec.udp->hdr.dst_port;
+			tmp.mask |= BIT_ENCAP(UDP_DST);
+		}
+		++src;
+		break;
+	case ITEM_VXLAN:
+		if (src->type != RTE_FLOW_ITEM_TYPE_VXLAN)
+			goto trans;
+		mask.vxlan = mlx5_nl_flow_item_mask
+			(src, &rte_flow_item_vxlan_mask,
+			 &mlx5_nl_flow_encap_mask_supported.vxlan,
+			 &mlx5_nl_flow_mask_empty.vxlan,
+			 sizeof(rte_flow_item_vxlan_mask), error);
+		if (!mask.vxlan)
+			return -rte_errno;
+		if (mask.vxlan == &mlx5_nl_flow_mask_empty.vxlan)
+			goto error_spec;
+		spec.vxlan = src->spec;
+		if (vxlan_vni_as_be32(mask.vxlan->vni)) {
+			if (memcmp(mask.vxlan->vni, VXLAN_VNI_MASK, 3))
+				goto error_mask;
+			tmp.vxlan.vni = vxlan_vni_as_be32(spec.vxlan->vni);
+			tmp.mask |= BIT_ENCAP(VXLAN_VNI);
+		}
+		++src;
+		break;
+	case END:
+		if (src->type != RTE_FLOW_ITEM_TYPE_END)
+			goto trans;
+		*dst = tmp;
+		return 0;
+	}
+	back = trans;
+	trans = mlx5_nl_flow_encap_reap_trans[trans[n - 1]];
+	n = 0;
+	goto trans;
+error_encap:
+	return rte_flow_error_set
+		(error, ENOTSUP, RTE_FLOW_ERROR_TYPE_ITEM, src,
+		 "unsupported encapsulation format");
+error_spec:
+	return rte_flow_error_set
+		(error, ENOTSUP, RTE_FLOW_ERROR_TYPE_ITEM, src,
+		 "a specification structure is required for encapsulation");
+error_mask:
+	return rte_flow_error_set
+		(error, ENOTSUP, RTE_FLOW_ERROR_TYPE_ITEM, src,
+		 "partial masks are not supported for encapsulation");
+}
+
+/**
  * Transpose flow rule description to rtnetlink message.
  *
  * This function transposes a flow rule description to a traffic control
@@ -412,6 +805,7 @@ mlx5_nl_flow_transpose(struct mlx5_nl_flow *nl_flow,
 	bool vlan_present;
 	bool vlan_eth_type_set;
 	bool ip_proto_set;
+	struct mlx5_nl_flow_encap encap;
 	struct nlattr *na_flower;
 	struct nlattr *na_flower_act;
 	struct nlattr *na_vlan_id;
@@ -425,8 +819,10 @@ mlx5_nl_flow_transpose(struct mlx5_nl_flow *nl_flow,
 		goto error_nobufs;
 	nl_flow->size = offsetof(struct mlx5_nl_flow, msg);
 	nl_flow->applied = 0;
+	nl_flow->encap_ifindex = 0;
 	nl_flow->ifindex_src = NULL;
 	nl_flow->ifindex_dst = NULL;
+	nl_flow->encap = NULL;
 	size -= nl_flow->size;
 	item = pattern;
 	action = actions;
@@ -437,6 +833,7 @@ mlx5_nl_flow_transpose(struct mlx5_nl_flow *nl_flow,
 	vlan_present = false;
 	vlan_eth_type_set = false;
 	ip_proto_set = false;
+	memset(&encap, 0, sizeof(encap));
 	na_flower = NULL;
 	na_flower_act = NULL;
 	na_vlan_id = NULL;
@@ -461,6 +858,7 @@ mlx5_nl_flow_transpose(struct mlx5_nl_flow *nl_flow,
 				of_set_vlan_vid;
 			const struct rte_flow_action_of_set_vlan_pcp *
 				of_set_vlan_pcp;
+			const struct rte_flow_action_vxlan_encap *vxlan_encap;
 		} conf;
 		struct nlmsghdr *nlh;
 		struct tcmsg *tcm;
@@ -887,6 +1285,12 @@ mlx5_nl_flow_transpose(struct mlx5_nl_flow *nl_flow,
 			goto error_nobufs;
 		++item;
 		break;
+	case ITEM_VXLAN:
+		if (item->type != RTE_FLOW_ITEM_TYPE_VXLAN)
+			goto trans;
+		return rte_flow_error_set
+			(error, ENOTSUP, RTE_FLOW_ERROR_TYPE_ITEM, item,
+			 "VXLAN header matching is not supported yet");
 	case ACTIONS:
 		if (item->type != RTE_FLOW_ITEM_TYPE_END)
 			goto trans;
@@ -1042,6 +1446,77 @@ mlx5_nl_flow_transpose(struct mlx5_nl_flow *nl_flow,
 		}
 		++action;
 		break;
+	case ACTION_VXLAN_ENCAP:
+		if (action->type != RTE_FLOW_ACTION_TYPE_VXLAN_ENCAP)
+			goto trans;
+		conf.vxlan_encap = action->conf;
+		if (mlx5_nl_flow_encap_reap(&encap,
+					    conf.vxlan_encap->definition,
+					    error))
+			return -rte_errno;
+		act_index =
+			mnl_attr_nest_start_check(buf, size, act_index_cur++);
+		if (!act_index ||
+		    !mnl_attr_put_strz_check(buf, size, TCA_ACT_KIND,
+					     "tunnel_key"))
+			goto error_nobufs;
+		act = mnl_attr_nest_start_check(buf, size, TCA_ACT_OPTIONS);
+		if (!act)
+			goto error_nobufs;
+		if (!mnl_attr_put_check(buf, size, TCA_TUNNEL_KEY_PARMS,
+					sizeof(struct tc_tunnel_key),
+					&(struct tc_tunnel_key){
+						.action = TC_ACT_PIPE,
+						.t_action =
+							TCA_TUNNEL_KEY_ACT_SET,
+					}))
+			goto error_nobufs;
+		if (encap.mask & BIT_ENCAP(IPV4_SRC) &&
+		    !mnl_attr_put_u32_check
+		    (buf, size, TCA_TUNNEL_KEY_ENC_IPV4_SRC,
+		     encap.ip.src.v4.s_addr))
+			goto error_nobufs;
+		if (encap.mask & BIT_ENCAP(IPV4_DST) &&
+		    !mnl_attr_put_u32_check
+		    (buf, size, TCA_TUNNEL_KEY_ENC_IPV4_DST,
+		     encap.ip.dst.v4.s_addr))
+			goto error_nobufs;
+		if (encap.mask & BIT_ENCAP(IPV6_SRC) &&
+		    !mnl_attr_put_check
+		    (buf, size, TCA_TUNNEL_KEY_ENC_IPV6_SRC,
+		     sizeof(encap.ip.src.v6), &encap.ip.src.v6))
+			goto error_nobufs;
+		if (encap.mask & BIT_ENCAP(IPV6_DST) &&
+		    !mnl_attr_put_check
+		    (buf, size, TCA_TUNNEL_KEY_ENC_IPV6_DST,
+		     sizeof(encap.ip.dst.v6), &encap.ip.dst.v6))
+			goto error_nobufs;
+		if (encap.mask & BIT_ENCAP(UDP_SRC) &&
+		    nl_flow != (void *)buf_tmp)
+			DRV_LOG(WARNING,
+				"UDP source port cannot be forced"
+				" for VXLAN encap; parameter ignored");
+		if (encap.mask & BIT_ENCAP(UDP_DST) &&
+		    !mnl_attr_put_u16_check
+		    (buf, size, TCA_TUNNEL_KEY_ENC_DST_PORT, encap.udp.dst))
+			goto error_nobufs;
+		if (!(encap.mask & BIT_ENCAP(VXLAN_VNI)))
+			return rte_flow_error_set
+				(error, EINVAL, RTE_FLOW_ERROR_TYPE_ACTION_CONF,
+				 conf.vxlan_encap, "VXLAN VNI is missing");
+		if (!mnl_attr_put_u32_check
+		    (buf, size, TCA_TUNNEL_KEY_ENC_KEY_ID, encap.vxlan.vni))
+			goto error_nobufs;
+		mnl_attr_nest_end(buf, act);
+		mnl_attr_nest_end(buf, act_index);
+		++action;
+		break;
+	case ACTION_VXLAN_DECAP:
+		if (action->type != RTE_FLOW_ACTION_TYPE_VXLAN_DECAP)
+			goto trans;
+		return rte_flow_error_set
+			(error, ENOTSUP, RTE_FLOW_ERROR_TYPE_ACTION, action,
+			 "VXLAN decap is not supported yet");
 	case END:
 		if (item->type != RTE_FLOW_ITEM_TYPE_END ||
 		    action->type != RTE_FLOW_ACTION_TYPE_END)
@@ -1054,6 +1529,21 @@ mlx5_nl_flow_transpose(struct mlx5_nl_flow *nl_flow,
 		buf = NULL;
 		size -= nlh->nlmsg_len;
 		nl_flow->size += nlh->nlmsg_len;
+		if (!encap.mask)
+			return nl_flow->size;
+		i = RTE_ALIGN_CEIL(nl_flow->size,
+				   alignof(struct mlx5_nl_flow_encap));
+		i -= nl_flow->size;
+		if (size < i + sizeof(encap))
+			goto error_nobufs;
+		nl_flow->size += i;
+		buf = (void *)((uintptr_t)nl_flow + nl_flow->size);
+		size -= i;
+		nl_flow->encap = buf;
+		*nl_flow->encap = encap;
+		buf = NULL;
+		size -= sizeof(*nl_flow->encap);
+		nl_flow->size += sizeof(*nl_flow->encap);
 		return nl_flow->size;
 	}
 	back = trans;
@@ -1151,6 +1641,671 @@ mlx5_nl_flow_chat(struct mlx5_nl_flow_ctx *ctx, struct nlmsghdr *nlh,
 	return -err;
 }
 
+/** Data structure used by mlx5_nl_flow_init_vxlan_cb(). */
+struct mlx5_nl_flow_init_vxlan_data {
+	unsigned int ifindex; /**< Base interface index. */
+	rte_be16_t vxlan_port; /**< Remote UDP port. */
+	unsigned int *collect; /**< Collected interfaces. */
+	unsigned int collect_n; /**< Number of collected interfaces. */
+};
+
+/**
+ * Collect indices of VXLAN encap/decap interfaces associated with device.
+ *
+ * @param nlh
+ *   Pointer to reply header.
+ * @param arg
+ *   Opaque data pointer for this callback.
+ *
+ * @return
+ *   A positive, nonzero value on success, negative errno value otherwise
+ *   and rte_errno is set.
+ */
+static int
+mlx5_nl_flow_init_vxlan_cb(const struct nlmsghdr *nlh, void *arg)
+{
+	struct mlx5_nl_flow_init_vxlan_data *data = arg;
+	struct ifinfomsg *ifm;
+	struct nlattr *na;
+	struct nlattr *na_info = NULL;
+	struct nlattr *na_vxlan = NULL;
+	struct nlattr *na_vxlan_port = NULL;
+	bool found = false;
+	unsigned int *collect;
+
+	if (nlh->nlmsg_type != RTM_NEWLINK)
+		goto error_inval;
+	ifm = mnl_nlmsg_get_payload(nlh);
+	mnl_attr_for_each(na, nlh, sizeof(*ifm))
+		if (mnl_attr_get_type(na) == IFLA_LINKINFO) {
+			na_info = na;
+			break;
+		}
+	if (!na_info)
+		return 1;
+	mnl_attr_for_each_nested(na, na_info) {
+		switch (mnl_attr_get_type(na)) {
+		case IFLA_INFO_KIND:
+			if (!strncmp("vxlan", mnl_attr_get_str(na),
+				     mnl_attr_get_len(na)))
+				found = true;
+			break;
+		case IFLA_INFO_DATA:
+			na_vxlan = na;
+			break;
+		}
+		if (found && na_vxlan)
+			break;
+	}
+	if (!found || !na_vxlan)
+		return 1;
+	found = false;
+	mnl_attr_for_each_nested(na, na_vxlan) {
+		switch (mnl_attr_get_type(na)) {
+		case IFLA_VXLAN_LINK:
+			if (mnl_attr_get_u32(na) == data->ifindex)
+				found = true;
+			break;
+		case IFLA_VXLAN_PORT:
+			na_vxlan_port = na;
+			break;
+		}
+		if (found && na_vxlan_port)
+			break;
+	}
+	if (!found ||
+	    (na_vxlan_port &&
+	     mnl_attr_get_u16(na_vxlan_port) != data->vxlan_port))
+		return 1;
+	if (!ifm->ifi_index)
+		goto error_inval;
+	collect = realloc(data->collect,
+			  (data->collect_n + 1) * sizeof(*data->collect));
+	if (!collect) {
+		rte_errno = errno;
+		return -rte_errno;
+	}
+	collect[data->collect_n] = ifm->ifi_index;
+	data->collect = collect;
+	data->collect_n += 1;
+	return 1;
+error_inval:
+	rte_errno = EINVAL;
+	return -rte_errno;
+}
+
+/**
+ * Clean up and generate VXLAN encap/decap interface.
+ *
+ * @param ctx
+ *   Context object initialized by mlx5_nl_flow_ctx_create().
+ * @param ifindex
+ *   Network interface index to associate VXLAN encap/decap with.
+ * @param vxlan_port
+ *   Remote UDP port.
+ * @param enable
+ *   If disabled, stop after initial clean up.
+ * @param[out] error
+ *   Perform verbose error reporting if not NULL.
+ *
+ * @return
+ *   Interface index on success, zero otherwise and rte_errno is set.
+ *
+ *   If @p enable is set, the returned ifindex is that of the new VXLAN
+ *   interface, otherwise @p ifindex is simply returned as is.
+ */
+static unsigned int
+mlx5_nl_flow_ifindex_vxlan(struct mlx5_nl_flow_ctx *ctx, unsigned int ifindex,
+			   rte_be16_t vxlan_port, int enable,
+			   struct rte_flow_error *error)
+{
+	struct nlmsghdr *nlh;
+	struct ifinfomsg *ifm;
+	alignas(struct nlmsghdr)
+	uint8_t buf[mnl_nlmsg_size(sizeof(*ifm) + 256)];
+	unsigned int ifindex_vxlan = 0;
+	struct mlx5_nl_flow_init_vxlan_data data = {
+		.ifindex = ifindex,
+		.vxlan_port = vxlan_port,
+		.collect = NULL,
+		.collect_n = 0,
+	};
+	char name[IF_NAMESIZE];
+	struct nlattr *na_info;
+	struct nlattr *na_vxlan;
+	unsigned int i;
+	int ret;
+
+	if (!ifindex) {
+		ret = -EINVAL;
+		goto exit;
+	}
+	/*
+	 * Seek and destroy leftover VXLAN encap/decap interfaces with
+	 * matching properties.
+	 */
+	nlh = mnl_nlmsg_put_header(buf);
+	nlh->nlmsg_type = RTM_GETLINK;
+	nlh->nlmsg_flags = NLM_F_REQUEST | NLM_F_DUMP;
+	ifm = mnl_nlmsg_put_extra_header(nlh, sizeof(*ifm));
+	ifm->ifi_family = AF_UNSPEC;
+	ret = mlx5_nl_flow_chat(ctx, nlh, mlx5_nl_flow_init_vxlan_cb, &data);
+	if (ret)
+		goto exit;
+	nlh->nlmsg_type = RTM_DELLINK;
+	nlh->nlmsg_flags = NLM_F_REQUEST;
+	for (i = 0; i != data.collect_n; ++i) {
+		ifm->ifi_index = data.collect[i];
+		DRV_LOG(DEBUG, "cleaning up VXLAN encap/decap ifindex %u",
+			ifm->ifi_index);
+		ret = mlx5_nl_flow_chat(ctx, nlh, NULL, NULL);
+		if (ret)
+			goto exit;
+	}
+	if (!enable)
+		return ifindex;
+	/* Add fresh VXLAN encap/decap interface. */
+	nlh->nlmsg_type = RTM_NEWLINK;
+	nlh->nlmsg_flags = NLM_F_REQUEST | NLM_F_CREATE | NLM_F_REPLACE;
+	ifm->ifi_type = ARPHRD_ETHER;
+	ifm->ifi_index = 0;
+	ifm->ifi_flags = IFF_UP;
+	ifm->ifi_change = 0xffffffff;
+	if (snprintf(name, sizeof(name), "vxlan_%u_%u",
+		     rte_be_to_cpu_16(vxlan_port), ifindex) == -1) {
+		ret = -errno;
+		goto exit;
+	}
+	ret = -ENOBUFS;
+	if (!mnl_attr_put_strz_check(nlh, sizeof(buf), IFLA_IFNAME, name))
+		goto exit;
+	na_info = mnl_attr_nest_start_check(nlh, sizeof(buf), IFLA_LINKINFO);
+	if (!na_info)
+		goto exit;
+	if (!mnl_attr_put_strz_check(nlh, sizeof(buf), IFLA_INFO_KIND, "vxlan"))
+		goto exit;
+	na_vxlan = mnl_attr_nest_start_check(nlh, sizeof(buf), IFLA_INFO_DATA);
+	if (!na_vxlan)
+		goto exit;
+	if (!mnl_attr_put_u32_check(nlh, sizeof(buf), IFLA_VXLAN_LINK, ifindex))
+		goto exit;
+	if (!mnl_attr_put_u8_check(nlh, sizeof(buf),
+				   IFLA_VXLAN_COLLECT_METADATA, 1))
+		goto exit;
+	/*
+	 * When destination port or VNI are either undefined or set to fixed
+	 * values, kernel complains with EEXIST ("A VXLAN device with the
+	 * specified VNI already exist") when creating subsequent VXLAN
+	 * interfaces with the same properties, even if linked with
+	 * different physical devices.
+	 *
+	 * Also since only destination ports assigned to existing VXLAN
+	 * interfaces can be offloaded to the switch, the above limitation
+	 * cannot be worked around by picking a random value here and using
+	 * a different one when creating flow rules later.
+	 *
+	 * Therefore request a hopefully unique VNI based on the interface
+	 * index in order to work around EEXIST. VNI will be overridden
+	 * later on a flow rule basis thanks to IFLA_VXLAN_COLLECT_METADATA.
+	 */
+	if (!mnl_attr_put_u16_check(nlh, sizeof(buf), IFLA_VXLAN_PORT,
+				    vxlan_port))
+		goto exit;
+	if (!mnl_attr_put_u32_check(nlh, sizeof(buf), IFLA_VXLAN_ID, ifindex))
+		goto exit;
+	mnl_attr_nest_end(nlh, na_vxlan);
+	mnl_attr_nest_end(nlh, na_info);
+	ret = mlx5_nl_flow_chat(ctx, nlh, NULL, NULL);
+	if (ret)
+		goto exit;
+	/* Lastly, retrieve its ifindex value. */
+	nlh->nlmsg_type = RTM_GETLINK;
+	nlh->nlmsg_flags = NLM_F_REQUEST | NLM_F_DUMP;
+	data.collect_n = 0;
+	ret = mlx5_nl_flow_chat(ctx, nlh, mlx5_nl_flow_init_vxlan_cb, &data);
+	if (ret)
+		goto exit;
+	ret = -ENXIO;
+	if (data.collect_n != 1 || !*data.collect)
+		goto exit;
+	ifindex_vxlan = *data.collect;
+	DRV_LOG(DEBUG, "created VXLAN encap/decap ifindex %u (%s)",
+		ifindex_vxlan, name);
+	ret = mlx5_nl_flow_ifindex_init(ctx, ifindex_vxlan, error);
+	if (ret) {
+		mlx5_nl_flow_ifindex_vxlan(ctx, ifindex_vxlan, vxlan_port,
+					   false, NULL);
+		ifindex_vxlan = 0;
+		goto exit;
+	}
+	ret = 0;
+exit:
+	free(data.collect);
+	if (ret)
+		rte_flow_error_set
+			(error, -ret, RTE_FLOW_ERROR_TYPE_UNSPECIFIED, NULL,
+			 "netlink: failed to request VXLAN encap/decap"
+			 " interface creation/deletion");
+	return ifindex_vxlan;
+}
+
+/**
+ * Emit Netlink message to add/remove local address.
+ *
+ * Note that an implicit route is maintained by the kernel due to the
+ * presence of a peer address (IFA_ADDRESS).
+ *
+ * @param ctx
+ *   Context object initialized by mlx5_nl_flow_ctx_create().
+ * @param[in] encap
+ *   Encapsulation properties (source address).
+ * @param ifindex
+ *   Network interface.
+ * @param enable
+ *   Toggle between add and remove.
+ * @param[out] error
+ *   Perform verbose error reporting if not NULL.
+ *
+ * @return
+ *   0 on success, a negative errno value otherwise and rte_errno is set.
+ */
+static int
+mlx5_nl_flow_encap_local(struct mlx5_nl_flow_ctx *ctx,
+			 const struct mlx5_nl_flow_encap *encap,
+			 unsigned int ifindex,
+			 bool enable,
+			 struct rte_flow_error *error)
+{
+	struct nlmsghdr *nlh;
+	struct ifaddrmsg *ifa;
+	alignas(struct nlmsghdr)
+	uint8_t buf[mnl_nlmsg_size(sizeof(*ifa) + 128)];
+
+	nlh = mnl_nlmsg_put_header(buf);
+	nlh->nlmsg_type = enable ? RTM_NEWADDR : RTM_DELADDR;
+	nlh->nlmsg_flags =
+		NLM_F_REQUEST | (enable ? NLM_F_CREATE | NLM_F_REPLACE : 0);
+	nlh->nlmsg_seq = 0;
+	ifa = mnl_nlmsg_put_extra_header(nlh, sizeof(*ifa));
+	if (encap->mask & BIT_ENCAP(IPV4_SRC)) {
+		ifa->ifa_family = AF_INET;
+		ifa->ifa_prefixlen = 32;
+	} else if (encap->mask & BIT_ENCAP(IPV6_SRC)) {
+		ifa->ifa_family = AF_INET6;
+		ifa->ifa_prefixlen = 128;
+	} else {
+		ifa->ifa_family = AF_UNSPEC;
+		ifa->ifa_prefixlen = 0;
+	}
+	ifa->ifa_flags = IFA_F_PERMANENT;
+	ifa->ifa_scope = RT_SCOPE_LINK;
+	ifa->ifa_index = ifindex;
+	if (encap->mask & BIT_ENCAP(IPV4_SRC) &&
+	    !mnl_attr_put_u32_check(nlh, sizeof(buf), IFA_LOCAL,
+				    encap->ip.src.v4.s_addr))
+		goto error_nobufs;
+	if (encap->mask & BIT_ENCAP(IPV6_SRC) &&
+	    !mnl_attr_put_check(nlh, sizeof(buf), IFA_LOCAL,
+				sizeof(encap->ip.src.v6), &encap->ip.src.v6))
+		goto error_nobufs;
+	if (encap->mask & BIT_ENCAP(IPV4_DST) &&
+	    !mnl_attr_put_u32_check(nlh, sizeof(buf), IFA_ADDRESS,
+				    encap->ip.dst.v4.s_addr))
+		goto error_nobufs;
+	if (encap->mask & BIT_ENCAP(IPV6_DST) &&
+	    !mnl_attr_put_check(nlh, sizeof(buf), IFA_ADDRESS,
+				sizeof(encap->ip.dst.v6), &encap->ip.dst.v6))
+		goto error_nobufs;
+	if (!mlx5_nl_flow_chat(ctx, nlh, NULL, NULL))
+		return 0;
+	return rte_flow_error_set
+		(error, rte_errno, RTE_FLOW_ERROR_TYPE_UNSPECIFIED, NULL,
+		 "cannot complete IFA request");
+error_nobufs:
+	return rte_flow_error_set
+		(error, ENOBUFS, RTE_FLOW_ERROR_TYPE_UNSPECIFIED, NULL,
+		 "generated IFA message is too large");
+}
+
+/**
+ * Emit Netlink message to add/remove neighbor.
+ *
+ * @param ctx
+ *   Context object initialized by mlx5_nl_flow_ctx_create().
+ * @param[in] encap
+ *   Encapsulation properties (destination address).
+ * @param ifindex
+ *   Network interface.
+ * @param enable
+ *   Toggle between add and remove.
+ * @param[out] error
+ *   Perform verbose error reporting if not NULL.
+ *
+ * @return
+ *   0 on success, a negative errno value otherwise and rte_errno is set.
+ */
+static int
+mlx5_nl_flow_encap_neigh(struct mlx5_nl_flow_ctx *ctx,
+			 const struct mlx5_nl_flow_encap *encap,
+			 unsigned int ifindex,
+			 bool enable,
+			 struct rte_flow_error *error)
+{
+	struct nlmsghdr *nlh;
+	struct ndmsg *ndm;
+	alignas(struct nlmsghdr)
+	uint8_t buf[mnl_nlmsg_size(sizeof(*ndm) + 128)];
+
+	nlh = mnl_nlmsg_put_header(buf);
+	nlh->nlmsg_type = enable ? RTM_NEWNEIGH : RTM_DELNEIGH;
+	nlh->nlmsg_flags =
+		NLM_F_REQUEST | (enable ? NLM_F_CREATE | NLM_F_REPLACE : 0);
+	nlh->nlmsg_seq = 0;
+	ndm = mnl_nlmsg_put_extra_header(nlh, sizeof(*ndm));
+	if (encap->mask & BIT_ENCAP(IPV4_DST))
+		ndm->ndm_family = AF_INET;
+	else if (encap->mask & BIT_ENCAP(IPV6_DST))
+		ndm->ndm_family = AF_INET6;
+	else
+		ndm->ndm_family = AF_UNSPEC;
+	ndm->ndm_ifindex = ifindex;
+	ndm->ndm_state = NUD_PERMANENT;
+	ndm->ndm_flags = 0;
+	ndm->ndm_type = 0;
+	if (encap->mask & BIT_ENCAP(IPV4_DST) &&
+	    !mnl_attr_put_u32_check(nlh, sizeof(buf), NDA_DST,
+				    encap->ip.dst.v4.s_addr))
+		goto error_nobufs;
+	if (encap->mask & BIT_ENCAP(IPV6_DST) &&
+	    !mnl_attr_put_check(nlh, sizeof(buf), NDA_DST,
+				sizeof(encap->ip.dst.v6), &encap->ip.dst.v6))
+		goto error_nobufs;
+	if (encap->mask & BIT_ENCAP(ETH_SRC) && enable)
+		DRV_LOG(WARNING,
+			"Ethernet source address cannot be forced"
+			" for VXLAN encap; parameter ignored");
+	if (encap->mask & BIT_ENCAP(ETH_DST) &&
+	    !mnl_attr_put_check(nlh, sizeof(buf), NDA_LLADDR,
+				sizeof(encap->eth.dst), &encap->eth.dst))
+		goto error_nobufs;
+	if (!mlx5_nl_flow_chat(ctx, nlh, NULL, NULL))
+		return 0;
+	return rte_flow_error_set
+		(error, rte_errno, RTE_FLOW_ERROR_TYPE_UNSPECIFIED, NULL,
+		 "cannot complete ND request");
+error_nobufs:
+	return rte_flow_error_set
+		(error, ENOBUFS, RTE_FLOW_ERROR_TYPE_UNSPECIFIED, NULL,
+		 "generated ND message is too large");
+}
+
+/**
+ * Look for matching IP source/destination properties.
+ *
+ * @param[in] bag
+ *   Search target.
+ * @param bag_mask
+ *   Bit-mask for valid fields in @p bag.
+ * @param[in] what
+ *   Properties to look for in @p bag.
+ * @param what_mask
+ *   Bit-mask for valid fields in @p what.
+ *
+ * @return
+ *   True if @p what is found in @p bag, false otherwise.
+ */
+static bool
+mlx5_nl_flow_encap_ip_search(const struct mlx5_nl_flow_encap_ip *bag,
+			     uint32_t bag_mask,
+			     const struct mlx5_nl_flow_encap_ip *what,
+			     uint32_t what_mask)
+{
+	if ((what_mask & BIT_ENCAP(IPV4_SRC) &&
+	     (!(bag_mask & BIT_ENCAP(IPV4_SRC)) ||
+	      bag->src.v4.s_addr != what->src.v4.s_addr)) ||
+	    (what_mask & BIT_ENCAP(IPV4_DST) &&
+	     (!(bag_mask & BIT_ENCAP(IPV4_DST)) ||
+	      bag->dst.v4.s_addr != what->dst.v4.s_addr)) ||
+	    (what_mask & BIT_ENCAP(IPV6_SRC) &&
+	     (!(bag_mask & BIT_ENCAP(IPV6_SRC)) ||
+	      memcmp(&bag->src.v6, &what->src.v6, sizeof(bag->src.v6)))) ||
+	    (what_mask & BIT_ENCAP(IPV6_DST) &&
+	     (!(bag_mask & BIT_ENCAP(IPV6_DST)) ||
+	      memcmp(&bag->dst.v6, &what->dst.v6, sizeof(bag->dst.v6)))))
+		return false;
+	return true;
+}
+
+/**
+ * Interface resources list common to all driver instances of a given
+ * process. It is protected by a standard mutex because resource allocation
+ * is slow and involves system calls.
+ */
+static LIST_HEAD(, mlx5_nl_flow_encap_ifindex) mlx5_nl_flow_encap_ifindex_list =
+	LIST_HEAD_INITIALIZER();
+static pthread_mutex_t mlx5_nl_flow_encap_ifindex_list_lock =
+	PTHREAD_MUTEX_INITIALIZER;
+
+/**
+ * Retrieve target interface index for encapsulation.
+ *
+ * Resources are automatically allocated and released as necessary.
+ *
+ * @param ctx
+ *   Context object initialized by mlx5_nl_flow_ctx_create().
+ * @param[in] encap
+ *   Encapsulation properties.
+ * @param ifindex
+ *   Outer network interface.
+ * @param enable
+ *   Toggle whether resources are allocated or released.
+ * @param[out] error
+ *   Perform verbose error reporting if not NULL.
+ *
+ * @return
+ *   Interface index on success, zero otherwise and rte_errno is set.
+ *
+ *   If @p enable is set, the returned ifindex is that of the inner
+ *   interface, otherwise @p ifindex is simply returned as is.
+ */
+static unsigned int
+mlx5_nl_flow_encap_ifindex(struct mlx5_nl_flow_ctx *ctx,
+			   const struct mlx5_nl_flow_encap *encap,
+			   unsigned int ifindex,
+			   bool enable,
+			   struct rte_flow_error *error)
+{
+	struct mlx5_nl_flow_encap_ifindex *encap_ifindex = NULL;
+	struct mlx5_nl_flow_encap_vxlan *encap_vxlan = NULL;
+	struct mlx5_nl_flow_encap_addr *encap_local = NULL;
+	struct mlx5_nl_flow_encap_addr *encap_neigh = NULL;
+	unsigned int ifindex_inner = ifindex;
+	int ret;
+
+	pthread_mutex_lock(&mlx5_nl_flow_encap_ifindex_list_lock);
+	/* Interface descriptor. */
+	LIST_FOREACH(encap_ifindex, &mlx5_nl_flow_encap_ifindex_list, next) {
+		if (encap_ifindex->outer != ifindex)
+			continue;
+		if (enable)
+			++encap_ifindex->refcnt;
+		break;
+	}
+	if (enable && !encap_ifindex) {
+		encap_ifindex =
+			rte_zmalloc_socket(__func__, sizeof(*encap_ifindex),
+					   0, ctx->socket);
+		if (!encap_ifindex) {
+			rte_flow_error_set
+				(error, ENOMEM, RTE_FLOW_ERROR_TYPE_UNSPECIFIED,
+				 NULL, "missing ifindex encap data");
+			goto release;
+		}
+		*encap_ifindex = (struct mlx5_nl_flow_encap_ifindex){
+			.refcnt = 1,
+			.outer = ifindex,
+			.vxlan = LIST_HEAD_INITIALIZER(),
+			.local = LIST_HEAD_INITIALIZER(),
+			.neigh = LIST_HEAD_INITIALIZER(),
+		};
+		LIST_INSERT_HEAD(&mlx5_nl_flow_encap_ifindex_list,
+				 encap_ifindex, next);
+	}
+	if (!encap_ifindex) {
+		if (!enable)
+			goto release;
+		rte_flow_error_set
+			(error, EINVAL, RTE_FLOW_ERROR_TYPE_UNSPECIFIED, NULL,
+			 "nonexistent interface");
+		goto release;
+	}
+	/* VXLAN descriptor. */
+	if (!(encap->mask & BIT_ENCAP(VXLAN_VNI)) ||
+	    !(encap->mask & BIT_ENCAP(UDP_SRC)))
+		goto skip_vxlan;
+	LIST_FOREACH(encap_vxlan, &encap_ifindex->vxlan, next) {
+		if (encap->udp.src != encap_vxlan->port)
+			continue;
+		if (enable)
+			++encap_vxlan->refcnt;
+		break;
+	}
+	if (enable && !encap_vxlan) {
+		encap_vxlan =
+			rte_zmalloc_socket(__func__, sizeof(*encap_vxlan),
+					   0, ctx->socket);
+		if (!encap_vxlan) {
+			rte_flow_error_set
+				(error, ENOMEM, RTE_FLOW_ERROR_TYPE_UNSPECIFIED,
+				 NULL, "missing VXLAN encap data");
+			goto release;
+		}
+		*encap_vxlan = (struct mlx5_nl_flow_encap_vxlan){
+			.refcnt = 1,
+			.port = encap->udp.src,
+			.inner = mlx5_nl_flow_ifindex_vxlan
+				(ctx, ifindex, encap->udp.src, true, error),
+		};
+		if (!encap_vxlan->inner) {
+			rte_free(encap_vxlan);
+			encap_vxlan = NULL;
+			goto release;
+		}
+		LIST_INSERT_HEAD(&encap_ifindex->vxlan, encap_vxlan, next);
+	}
+	ifindex_inner = encap_vxlan->inner;
+skip_vxlan:
+	/* Local address descriptor (source). */
+	LIST_FOREACH(encap_local, &encap_ifindex->local, next) {
+		if (!mlx5_nl_flow_encap_ip_search
+		    (&encap->ip, encap->mask,
+		     &encap_local->ip, encap_local->mask &
+		     (BIT_ENCAP(IPV4_SRC) | BIT_ENCAP(IPV6_SRC))))
+			continue;
+		if (enable)
+			++encap_local->refcnt;
+		break;
+	}
+	if (enable && !encap_local &&
+	    encap->mask & (BIT_ENCAP(IPV4_SRC) | BIT_ENCAP(IPV6_SRC))) {
+		encap_local =
+			rte_zmalloc_socket(__func__, sizeof(*encap_local),
+					   0, ctx->socket);
+		if (!encap_local) {
+			rte_flow_error_set
+				(error, ENOMEM, RTE_FLOW_ERROR_TYPE_UNSPECIFIED,
+				 NULL, "missing local encap data");
+			goto release;
+		}
+		encap_local->refcnt = 1;
+		encap_local->mask =
+			encap->mask &
+			(BIT_ENCAP(IPV4_SRC) | BIT_ENCAP(IPV6_SRC));
+		if (encap->mask & BIT_ENCAP(IPV4_SRC))
+			encap_local->ip.src.v4 = encap->ip.src.v4;
+		if (encap->mask & BIT_ENCAP(IPV6_SRC))
+			encap_local->ip.src.v6 = encap->ip.src.v6;
+		ret = mlx5_nl_flow_encap_local(ctx, encap, ifindex, true,
+					       error);
+		if (ret) {
+			rte_free(encap_local);
+			encap_local = NULL;
+			goto release;
+		}
+		LIST_INSERT_HEAD(&encap_ifindex->local, encap_local, next);
+	}
+	/* Neighbor descriptor (destination). */
+	LIST_FOREACH(encap_neigh, &encap_ifindex->neigh, next) {
+		if (!mlx5_nl_flow_encap_ip_search
+		    (&encap->ip, encap->mask,
+		     &encap_neigh->ip, encap_neigh->mask &
+		     (BIT_ENCAP(IPV4_DST) | BIT_ENCAP(IPV6_DST))))
+			continue;
+		if (enable)
+			++encap_neigh->refcnt;
+		break;
+	}
+	if (enable && !encap_neigh &&
+	    encap->mask & (BIT_ENCAP(IPV4_DST) | BIT_ENCAP(IPV6_DST))) {
+		encap_neigh =
+			rte_zmalloc_socket(__func__, sizeof(*encap_neigh),
+					   0, ctx->socket);
+		if (!encap_neigh) {
+			rte_flow_error_set
+				(error, ENOMEM, RTE_FLOW_ERROR_TYPE_UNSPECIFIED,
+				 NULL, "missing neigh encap data");
+			goto release;
+		}
+		encap_neigh->refcnt = 1;
+		encap_neigh->mask =
+			encap->mask &
+			(BIT_ENCAP(IPV4_DST) | BIT_ENCAP(IPV6_DST));
+		if (encap->mask & BIT_ENCAP(IPV4_DST))
+			encap_neigh->ip.dst.v4 = encap->ip.dst.v4;
+		if (encap->mask & BIT_ENCAP(IPV6_DST))
+			encap_neigh->ip.dst.v6 = encap->ip.dst.v6;
+		ret = mlx5_nl_flow_encap_neigh(ctx, encap, ifindex, true,
+					       error);
+		if (ret) {
+			rte_free(encap_neigh);
+			encap_neigh = NULL;
+			goto release;
+		}
+		LIST_INSERT_HEAD(&encap_ifindex->neigh, encap_neigh, next);
+	}
+	if (!enable)
+		goto release;
+	pthread_mutex_unlock(&mlx5_nl_flow_encap_ifindex_list_lock);
+	return ifindex_inner;
+release:
+	ret = rte_errno;
+	if (encap_neigh && !--encap_neigh->refcnt) {
+		LIST_REMOVE(encap_neigh, next);
+		mlx5_nl_flow_encap_neigh(ctx, encap, ifindex, false, NULL);
+		rte_free(encap_neigh);
+	}
+	if (encap_local && !--encap_local->refcnt) {
+		LIST_REMOVE(encap_local, next);
+		mlx5_nl_flow_encap_local(ctx, encap, ifindex, false, NULL);
+		rte_free(encap_local);
+	}
+	if (encap_vxlan && !--encap_vxlan->refcnt) {
+		LIST_REMOVE(encap_vxlan, next);
+		mlx5_nl_flow_ifindex_vxlan
+			(ctx, ifindex, encap_vxlan->port, false, NULL);
+		rte_free(encap_vxlan);
+	}
+	if (encap_ifindex && !--encap_ifindex->refcnt) {
+		LIST_REMOVE(encap_ifindex, next);
+		rte_free(encap_ifindex);
+	}
+	pthread_mutex_unlock(&mlx5_nl_flow_encap_ifindex_list_lock);
+	if (!enable)
+		return ifindex;
+	rte_errno = ret;
+	return 0;
+}
+
 /**
  * Create a Netlink flow rule.
  *
@@ -1169,17 +2324,35 @@ mlx5_nl_flow_create(struct mlx5_nl_flow_ctx *ctx, struct mlx5_nl_flow *nl_flow,
 		    struct rte_flow_error *error)
 {
 	struct nlmsghdr *nlh = (void *)nl_flow->msg;
+	struct mlx5_nl_flow_encap *encap =
+		nl_flow->encap && nl_flow->ifindex_dst ?
+		nl_flow->encap : NULL;
+	unsigned int ifindex = encap ? *nl_flow->ifindex_dst : 0;
+	int ret;
 
 	if (nl_flow->applied)
 		return 0;
 	nlh->nlmsg_type = RTM_NEWTFILTER;
 	nlh->nlmsg_flags = NLM_F_REQUEST | NLM_F_CREATE | NLM_F_EXCL;
-	if (!mlx5_nl_flow_chat(ctx, nlh, NULL, NULL)) {
+	if (encap) {
+		nl_flow->encap_ifindex = mlx5_nl_flow_encap_ifindex
+			(ctx, encap, ifindex, true, error);
+		if (!nl_flow->encap_ifindex)
+			return -rte_errno;
+		*nl_flow->ifindex_dst = nl_flow->encap_ifindex;
+	}
+	ret = mlx5_nl_flow_chat(ctx, nlh, NULL, NULL);
+	if (encap)
+		*nl_flow->ifindex_dst = ifindex;
+	if (!ret) {
 		nl_flow->applied = 1;
 		return 0;
 	}
+	ret = rte_errno;
+	if (nl_flow->encap_ifindex)
+		mlx5_nl_flow_encap_ifindex(ctx, encap, ifindex, false, NULL);
 	return rte_flow_error_set
-		(error, rte_errno, RTE_FLOW_ERROR_TYPE_UNSPECIFIED, NULL,
+		(error, ret, RTE_FLOW_ERROR_TYPE_UNSPECIFIED, NULL,
 		 "netlink: failed to create TC flow rule");
 }
 
@@ -1204,14 +2377,31 @@ mlx5_nl_flow_destroy(struct mlx5_nl_flow_ctx *ctx, struct mlx5_nl_flow *nl_flow,
 		     struct rte_flow_error *error)
 {
 	struct nlmsghdr *nlh = (void *)nl_flow->msg;
+	struct mlx5_nl_flow_encap *encap =
+		nl_flow->encap && nl_flow->ifindex_dst ?
+		nl_flow->encap : NULL;
+	unsigned int ifindex = encap ? *nl_flow->ifindex_dst : 0;
+	int err = 0;
 	int ret;
 
 	if (!nl_flow->applied)
 		return 0;
 	nlh->nlmsg_type = RTM_DELTFILTER;
 	nlh->nlmsg_flags = NLM_F_REQUEST;
+	if (encap) {
+		if (!mlx5_nl_flow_encap_ifindex
+		    (ctx, encap, ifindex, false, error))
+			err = rte_errno;
+		*nl_flow->ifindex_dst = nl_flow->encap_ifindex;
+	}
 	ret = mlx5_nl_flow_chat(ctx, nlh, NULL, NULL);
+	if (encap)
+		*nl_flow->ifindex_dst = ifindex;
 	nl_flow->applied = 0;
+	if (err) {
+		rte_errno = err;
+		return -rte_errno;
+	}
 	if (!ret)
 		return 0;
 	return rte_flow_error_set
-- 
2.11.0

^ permalink raw reply	[flat|nested] 9+ messages in thread

* [dpdk-dev] [PATCH 8/8] net/mlx5: add VXLAN decap support to switch flow rules
  2018-08-31  9:57 [dpdk-dev] [PATCH 0/8] net/mlx5: add switch offload for VXLAN encap/decap Adrien Mazarguil
                   ` (6 preceding siblings ...)
  2018-08-31  9:57 ` [dpdk-dev] [PATCH 7/8] net/mlx5: add VXLAN encap support to switch flow rules Adrien Mazarguil
@ 2018-08-31  9:57 ` Adrien Mazarguil
  7 siblings, 0 replies; 9+ messages in thread
From: Adrien Mazarguil @ 2018-08-31  9:57 UTC (permalink / raw)
  To: Shahaf Shuler, Yongseok Koh, Slava Ovsiienko; +Cc: dev

This provides support for the VXLAN_DECAP action. Outer tunnel properties
are specified as the initial part of the flow rule pattern (up to and
including VXLAN item), optionally followed by inner traffic properties.

Testpmd examples:

- Creating a flow rule on port ID 1 performing VXLAN decapsulation and
  directing the result to port ID 2 without checking inner properties:

  flow create 1 ingress transfer pattern eth src is 66:77:88:99:aa:bb
     dst is 00:11:22:33:44:55 / ipv4 src is 2.2.2.2 dst is 1.1.1.1 /
     udp src is 4789 dst is 4242 / vxlan vni is 0x112233 / end
     actions vxlan_decap / port_id id 2 / end

- Same as above except only inner TCPv6 packets with destination port 42
  will be let through:

  flow create 1 ingress transfer pattern eth src is 66:77:88:99:aa:bb
     dst is 00:11:22:33:44:55 / ipv4 src is 2.2.2.2 dst is 1.1.1.1 /
     udp src is 4789 dst is 4242 / vxlan vni is 0x112233 /
     eth / ipv6 / tcp dst is 42 / end
     actions vxlan_decap / port_id id 2 / end

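For reference, the first rule above conceptually translates to a "flower"
filter attached to the VXLAN device spawned by the PMD (names below are
illustrative), matching the outer tunnel properties and releasing tunnel
metadata before redirecting to the representor:

  tc filter add dev vxlan_4242_5 ingress protocol ip flower skip_sw \
     enc_src_ip 2.2.2.2 enc_dst_ip 1.1.1.1 \
     enc_dst_port 4242 enc_key_id 0x112233 \
     action tunnel_key unset \
     action mirred egress redirect dev representor_2
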
Signed-off-by: Adrien Mazarguil <adrien.mazarguil@6wind.com>
---
 drivers/net/mlx5/Makefile       |  65 +++++++
 drivers/net/mlx5/mlx5_nl_flow.c | 344 ++++++++++++++++++++++++++++++++---
 2 files changed, 379 insertions(+), 30 deletions(-)

diff --git a/drivers/net/mlx5/Makefile b/drivers/net/mlx5/Makefile
index 1ba4ce612..85672abd6 100644
--- a/drivers/net/mlx5/Makefile
+++ b/drivers/net/mlx5/Makefile
@@ -335,6 +335,71 @@ mlx5_autoconf.h.new: $(RTE_SDK)/buildtools/auto-config-h.sh
 		enum TCA_FLOWER_KEY_VLAN_ETH_TYPE \
 		$(AUTOCONF_OUTPUT)
 	$Q sh -- '$<' '$@' \
+		HAVE_TCA_FLOWER_KEY_ENC_KEY_ID \
+		linux/pkt_cls.h \
+		enum TCA_FLOWER_KEY_ENC_KEY_ID \
+		$(AUTOCONF_OUTPUT)
+	$Q sh -- '$<' '$@' \
+		HAVE_TCA_FLOWER_KEY_ENC_IPV4_SRC \
+		linux/pkt_cls.h \
+		enum TCA_FLOWER_KEY_ENC_IPV4_SRC \
+		$(AUTOCONF_OUTPUT)
+	$Q sh -- '$<' '$@' \
+		HAVE_TCA_FLOWER_KEY_ENC_IPV4_SRC_MASK \
+		linux/pkt_cls.h \
+		enum TCA_FLOWER_KEY_ENC_IPV4_SRC_MASK \
+		$(AUTOCONF_OUTPUT)
+	$Q sh -- '$<' '$@' \
+		HAVE_TCA_FLOWER_KEY_ENC_IPV4_DST \
+		linux/pkt_cls.h \
+		enum TCA_FLOWER_KEY_ENC_IPV4_DST \
+		$(AUTOCONF_OUTPUT)
+	$Q sh -- '$<' '$@' \
+		HAVE_TCA_FLOWER_KEY_ENC_IPV4_DST_MASK \
+		linux/pkt_cls.h \
+		enum TCA_FLOWER_KEY_ENC_IPV4_DST_MASK \
+		$(AUTOCONF_OUTPUT)
+	$Q sh -- '$<' '$@' \
+		HAVE_TCA_FLOWER_KEY_ENC_IPV6_SRC \
+		linux/pkt_cls.h \
+		enum TCA_FLOWER_KEY_ENC_IPV6_SRC \
+		$(AUTOCONF_OUTPUT)
+	$Q sh -- '$<' '$@' \
+		HAVE_TCA_FLOWER_KEY_ENC_IPV6_SRC_MASK \
+		linux/pkt_cls.h \
+		enum TCA_FLOWER_KEY_ENC_IPV6_SRC_MASK \
+		$(AUTOCONF_OUTPUT)
+	$Q sh -- '$<' '$@' \
+		HAVE_TCA_FLOWER_KEY_ENC_IPV6_DST \
+		linux/pkt_cls.h \
+		enum TCA_FLOWER_KEY_ENC_IPV6_DST \
+		$(AUTOCONF_OUTPUT)
+	$Q sh -- '$<' '$@' \
+		HAVE_TCA_FLOWER_KEY_ENC_IPV6_DST_MASK \
+		linux/pkt_cls.h \
+		enum TCA_FLOWER_KEY_ENC_IPV6_DST_MASK \
+		$(AUTOCONF_OUTPUT)
+	$Q sh -- '$<' '$@' \
+		HAVE_TCA_FLOWER_KEY_ENC_UDP_SRC_PORT \
+		linux/pkt_cls.h \
+		enum TCA_FLOWER_KEY_ENC_UDP_SRC_PORT \
+		$(AUTOCONF_OUTPUT)
+	$Q sh -- '$<' '$@' \
+		HAVE_TCA_FLOWER_KEY_ENC_UDP_SRC_PORT_MASK \
+		linux/pkt_cls.h \
+		enum TCA_FLOWER_KEY_ENC_UDP_SRC_PORT_MASK \
+		$(AUTOCONF_OUTPUT)
+	$Q sh -- '$<' '$@' \
+		HAVE_TCA_FLOWER_KEY_ENC_UDP_DST_PORT \
+		linux/pkt_cls.h \
+		enum TCA_FLOWER_KEY_ENC_UDP_DST_PORT \
+		$(AUTOCONF_OUTPUT)
+	$Q sh -- '$<' '$@' \
+		HAVE_TCA_FLOWER_KEY_ENC_UDP_DST_PORT_MASK \
+		linux/pkt_cls.h \
+		enum TCA_FLOWER_KEY_ENC_UDP_DST_PORT_MASK \
+		$(AUTOCONF_OUTPUT)
+	$Q sh -- '$<' '$@' \
 		HAVE_TC_ACT_VLAN \
 		linux/tc_act/tc_vlan.h \
 		enum TCA_VLAN_PUSH_VLAN_PRIORITY \
diff --git a/drivers/net/mlx5/mlx5_nl_flow.c b/drivers/net/mlx5/mlx5_nl_flow.c
index 672f92863..12802796a 100644
--- a/drivers/net/mlx5/mlx5_nl_flow.c
+++ b/drivers/net/mlx5/mlx5_nl_flow.c
@@ -201,6 +201,45 @@ struct tc_tunnel_key {
 #ifndef HAVE_TCA_FLOWER_KEY_VLAN_ETH_TYPE
 #define TCA_FLOWER_KEY_VLAN_ETH_TYPE 25
 #endif
+#ifndef HAVE_TCA_FLOWER_KEY_ENC_KEY_ID
+#define TCA_FLOWER_KEY_ENC_KEY_ID 26
+#endif
+#ifndef HAVE_TCA_FLOWER_KEY_ENC_IPV4_SRC
+#define TCA_FLOWER_KEY_ENC_IPV4_SRC 27
+#endif
+#ifndef HAVE_TCA_FLOWER_KEY_ENC_IPV4_SRC_MASK
+#define TCA_FLOWER_KEY_ENC_IPV4_SRC_MASK 28
+#endif
+#ifndef HAVE_TCA_FLOWER_KEY_ENC_IPV4_DST
+#define TCA_FLOWER_KEY_ENC_IPV4_DST 29
+#endif
+#ifndef HAVE_TCA_FLOWER_KEY_ENC_IPV4_DST_MASK
+#define TCA_FLOWER_KEY_ENC_IPV4_DST_MASK 30
+#endif
+#ifndef HAVE_TCA_FLOWER_KEY_ENC_IPV6_SRC
+#define TCA_FLOWER_KEY_ENC_IPV6_SRC 31
+#endif
+#ifndef HAVE_TCA_FLOWER_KEY_ENC_IPV6_SRC_MASK
+#define TCA_FLOWER_KEY_ENC_IPV6_SRC_MASK 32
+#endif
+#ifndef HAVE_TCA_FLOWER_KEY_ENC_IPV6_DST
+#define TCA_FLOWER_KEY_ENC_IPV6_DST 33
+#endif
+#ifndef HAVE_TCA_FLOWER_KEY_ENC_IPV6_DST_MASK
+#define TCA_FLOWER_KEY_ENC_IPV6_DST_MASK 34
+#endif
+#ifndef HAVE_TCA_FLOWER_KEY_ENC_UDP_SRC_PORT
+#define TCA_FLOWER_KEY_ENC_UDP_SRC_PORT 43
+#endif
+#ifndef HAVE_TCA_FLOWER_KEY_ENC_UDP_SRC_PORT_MASK
+#define TCA_FLOWER_KEY_ENC_UDP_SRC_PORT_MASK 44
+#endif
+#ifndef HAVE_TCA_FLOWER_KEY_ENC_UDP_DST_PORT
+#define TCA_FLOWER_KEY_ENC_UDP_DST_PORT 45
+#endif
+#ifndef HAVE_TCA_FLOWER_KEY_ENC_UDP_DST_PORT_MASK
+#define TCA_FLOWER_KEY_ENC_UDP_DST_PORT_MASK 46
+#endif
 
 #define BIT(b) (1 << (b))
 #define BIT_ENCAP(e) BIT(MLX5_NL_FLOW_ENCAP_ ## e)
@@ -278,6 +317,7 @@ struct mlx5_nl_flow_ctx {
 struct mlx5_nl_flow {
 	uint32_t size; /**< Size of this object. */
 	uint32_t applied:1; /**< Whether rule is currently applied. */
+	uint32_t decap:1; /**< Decapsulate @p encap. */
 	unsigned int encap_ifindex; /**< Interface to use with @p encap. */
 	unsigned int *ifindex_src; /**< Source interface. */
 	unsigned int *ifindex_dst; /**< Destination interface. */
@@ -301,6 +341,11 @@ enum mlx5_nl_flow_trans {
 	ITEM_TCP,
 	ITEM_UDP,
 	ITEM_VXLAN,
+	ITEM_VXLAN_END,
+	ITEM_TUN_ETH,
+	ITEM_TUN_IPV4,
+	ITEM_TUN_IPV6,
+	ITEM_TUN_UDP,
 	ACTIONS,
 	ACTION_VOID,
 	ACTION_PORT_ID,
@@ -339,7 +384,12 @@ static const enum mlx5_nl_flow_trans *const mlx5_nl_flow_trans[] = {
 	[ITEM_IPV6] = TRANS(ITEM_TCP, ITEM_UDP, PATTERN_COMMON),
 	[ITEM_TCP] = TRANS(PATTERN_COMMON),
 	[ITEM_UDP] = TRANS(ITEM_VXLAN, PATTERN_COMMON),
-	[ITEM_VXLAN] = TRANS(PATTERN_COMMON),
+	[ITEM_VXLAN] = TRANS(ITEM_TUN_ETH, PATTERN_COMMON),
+	[ITEM_VXLAN_END] = TRANS(ITEM_ETH, PATTERN_COMMON),
+	[ITEM_TUN_ETH] = TRANS(ITEM_TUN_IPV4, ITEM_TUN_IPV6, PATTERN_COMMON),
+	[ITEM_TUN_IPV4] = TRANS(ITEM_TUN_UDP, PATTERN_COMMON),
+	[ITEM_TUN_IPV6] = TRANS(ITEM_TUN_UDP, PATTERN_COMMON),
+	[ITEM_TUN_UDP] = TRANS(ITEM_VXLAN_END, ITEM_VOID, ITEM_PORT_ID),
 	[ACTIONS] = TRANS(ACTIONS_FATE, ACTIONS_COMMON),
 	[ACTION_VOID] = TRANS(BACK),
 	[ACTION_PORT_ID] = TRANS(ACTION_VOID, END),
@@ -805,6 +855,7 @@ mlx5_nl_flow_transpose(struct mlx5_nl_flow *nl_flow,
 	bool vlan_present;
 	bool vlan_eth_type_set;
 	bool ip_proto_set;
+	bool vxlan_decap;
 	struct mlx5_nl_flow_encap encap;
 	struct nlattr *na_flower;
 	struct nlattr *na_flower_act;
@@ -819,6 +870,7 @@ mlx5_nl_flow_transpose(struct mlx5_nl_flow *nl_flow,
 		goto error_nobufs;
 	nl_flow->size = offsetof(struct mlx5_nl_flow, msg);
 	nl_flow->applied = 0;
+	nl_flow->decap = 0;
 	nl_flow->encap_ifindex = 0;
 	nl_flow->ifindex_src = NULL;
 	nl_flow->ifindex_dst = NULL;
@@ -833,6 +885,7 @@ mlx5_nl_flow_transpose(struct mlx5_nl_flow *nl_flow,
 	vlan_present = false;
 	vlan_eth_type_set = false;
 	ip_proto_set = false;
+	vxlan_decap = false;
 	memset(&encap, 0, sizeof(encap));
 	na_flower = NULL;
 	na_flower_act = NULL;
@@ -850,6 +903,7 @@ mlx5_nl_flow_transpose(struct mlx5_nl_flow *nl_flow,
 			const struct rte_flow_item_ipv6 *ipv6;
 			const struct rte_flow_item_tcp *tcp;
 			const struct rte_flow_item_udp *udp;
+			const struct rte_flow_item_vxlan *vxlan;
 		} spec, mask;
 		union {
 			const struct rte_flow_action_port_id *port_id;
@@ -943,9 +997,6 @@ mlx5_nl_flow_transpose(struct mlx5_nl_flow *nl_flow,
 		na_flower = mnl_attr_nest_start_check(buf, size, TCA_OPTIONS);
 		if (!na_flower)
 			goto error_nobufs;
-		if (!mnl_attr_put_u32_check(buf, size, TCA_FLOWER_FLAGS,
-					    TCA_CLS_FLAGS_SKIP_SW))
-			goto error_nobufs;
 		break;
 	case ITEM_VOID:
 		if (item->type != RTE_FLOW_ITEM_TYPE_VOID)
@@ -1286,16 +1337,215 @@ mlx5_nl_flow_transpose(struct mlx5_nl_flow *nl_flow,
 		++item;
 		break;
 	case ITEM_VXLAN:
+	case ITEM_VXLAN_END:
 		if (item->type != RTE_FLOW_ITEM_TYPE_VXLAN)
 			goto trans;
-		return rte_flow_error_set
-			(error, ENOTSUP, RTE_FLOW_ERROR_TYPE_ITEM, item,
-			 "VXLAN header matching is not supported yet");
+		if (vxlan_decap) {
+			/* Done with outer, continue with inner. */
+			++item;
+			break;
+		}
+		if (encap.mask)
+			return rte_flow_error_set
+				(error, ENOTSUP, RTE_FLOW_ERROR_TYPE_ITEM,
+				 item, "no support for stacked encapsulation");
+		mask.vxlan = mlx5_nl_flow_item_mask
+			(item, &rte_flow_item_vxlan_mask,
+			 &mlx5_nl_flow_encap_mask_supported.vxlan,
+			 &mlx5_nl_flow_mask_empty.vxlan,
+			 sizeof(rte_flow_item_vxlan_mask), error);
+		if (!mask.vxlan)
+			return -rte_errno;
+		spec.vxlan = item->spec;
+		/*
+		 * No TCA_FLOWER_* to match VXLAN traffic. This can only be
+		 * done indirectly through ACTION_VXLAN_DECAP.
+		 *
+		 * Since tunnel encapsulation information must be collected
+		 * from the previous pattern items, the message built so far
+		 * must be discarded, inner traffic will be matched by
+		 * subsequent pattern items.
+		 *
+		 * Reset inner context and process pattern again through a
+		 * different path.
+		 */
+		eth_type_set = false;
+		vlan_present = false;
+		vlan_eth_type_set = false;
+		ip_proto_set = false;
+		nlh = buf;
+		mnl_attr_nest_cancel(nlh, na_flower);
+		na_flower = mnl_attr_nest_start_check(buf, size, TCA_OPTIONS);
+		if (!na_flower)
+			goto error_nobufs;
+		if (memcmp(mask.vxlan->vni, VXLAN_VNI_MASK, 3))
+			return rte_flow_error_set
+				(error, ENOTSUP, RTE_FLOW_ERROR_TYPE_ITEM_MASK,
+				 mask.vxlan,
+				 "VXLAN VNI is either incomplete or missing");
+		if (!mnl_attr_put_u32_check(buf, size,
+					    TCA_FLOWER_KEY_ENC_KEY_ID,
+					    vxlan_vni_as_be32(spec.vxlan->vni)))
+			goto error_nobufs;
+		encap.vxlan.vni = vxlan_vni_as_be32(spec.vxlan->vni);
+		encap.mask |= BIT_ENCAP(VXLAN_VNI);
+		vxlan_decap = true;
+		item = pattern;
+		break;
+	case ITEM_TUN_ETH:
+		if (item->type != RTE_FLOW_ITEM_TYPE_ETH)
+			goto trans;
+		mask.eth = mlx5_nl_flow_item_mask
+			(item, &rte_flow_item_eth_mask,
+			 &mlx5_nl_flow_encap_mask_supported.eth,
+			 &mlx5_nl_flow_mask_empty.eth,
+			 sizeof(rte_flow_item_eth_mask), error);
+		if (!mask.eth)
+			return -rte_errno;
+		spec.eth = item->spec;
+		if ((!is_zero_ether_addr(&mask.eth->dst) ||
+		     !is_zero_ether_addr(&mask.eth->src)) &&
+		    nl_flow != (void *)buf_tmp)
+			DRV_LOG(WARNING,
+				"Ethernet source/destination addresses cannot"
+				" be matched along with VXLAN traffic;"
+				" parameters ignored");
+		/* Source and destination are swapped for decap. */
+		if (is_broadcast_ether_addr(&mask.eth->dst)) {
+			encap.eth.src = spec.eth->dst;
+			encap.mask |= BIT_ENCAP(ETH_SRC);
+		}
+		if (is_broadcast_ether_addr(&mask.eth->src)) {
+			encap.eth.dst = spec.eth->src;
+			encap.mask |= BIT_ENCAP(ETH_DST);
+		}
+		++item;
+		break;
+	case ITEM_TUN_IPV4:
+		if (item->type != RTE_FLOW_ITEM_TYPE_IPV4)
+			goto trans;
+		mask.ipv4 = mlx5_nl_flow_item_mask
+			(item, &rte_flow_item_ipv4_mask,
+			 &mlx5_nl_flow_encap_mask_supported.ipv4,
+			 &mlx5_nl_flow_mask_empty.ipv4,
+			 sizeof(rte_flow_item_ipv4_mask), error);
+		if (!mask.ipv4)
+			return -rte_errno;
+		spec.ipv4 = item->spec;
+		if ((mask.ipv4->hdr.src_addr &&
+		     (!mnl_attr_put_u32_check(buf, size,
+					      TCA_FLOWER_KEY_ENC_IPV4_SRC,
+					      spec.ipv4->hdr.src_addr) ||
+		      !mnl_attr_put_u32_check(buf, size,
+					      TCA_FLOWER_KEY_ENC_IPV4_SRC_MASK,
+					      mask.ipv4->hdr.src_addr))) ||
+		    (mask.ipv4->hdr.dst_addr &&
+		     (!mnl_attr_put_u32_check(buf, size,
+					      TCA_FLOWER_KEY_ENC_IPV4_DST,
+					      spec.ipv4->hdr.dst_addr) ||
+		      !mnl_attr_put_u32_check(buf, size,
+					      TCA_FLOWER_KEY_ENC_IPV4_DST_MASK,
+					      mask.ipv4->hdr.dst_addr))))
+			goto error_nobufs;
+		/* Source and destination are swapped for decap. */
+		if (mask.ipv4->hdr.src_addr == IN_ADDR_MASK) {
+			encap.ip.dst.v4.s_addr = spec.ipv4->hdr.src_addr;
+			encap.mask |= BIT_ENCAP(IPV4_DST);
+		}
+		if (mask.ipv4->hdr.dst_addr == IN_ADDR_MASK) {
+			encap.ip.src.v4.s_addr = spec.ipv4->hdr.dst_addr;
+			encap.mask |= BIT_ENCAP(IPV4_SRC);
+		}
+		++item;
+		break;
+	case ITEM_TUN_IPV6:
+		if (item->type != RTE_FLOW_ITEM_TYPE_IPV6)
+			goto trans;
+		mask.ipv6 = mlx5_nl_flow_item_mask
+			(item, &rte_flow_item_ipv6_mask,
+			 &mlx5_nl_flow_encap_mask_supported.ipv6,
+			 &mlx5_nl_flow_mask_empty.ipv6,
+			 sizeof(rte_flow_item_ipv6_mask), error);
+		if (!mask.ipv6)
+			return -rte_errno;
+		spec.ipv6 = item->spec;
+		if ((!IN6_IS_ADDR_UNSPECIFIED(mask.ipv6->hdr.src_addr) &&
+		     (!mnl_attr_put_check(buf, size,
+					  TCA_FLOWER_KEY_ENC_IPV6_SRC,
+					  sizeof(spec.ipv6->hdr.src_addr),
+					  spec.ipv6->hdr.src_addr) ||
+		      !mnl_attr_put_check(buf, size,
+					  TCA_FLOWER_KEY_ENC_IPV6_SRC_MASK,
+					  sizeof(mask.ipv6->hdr.src_addr),
+					  mask.ipv6->hdr.src_addr))) ||
+		    (!IN6_IS_ADDR_UNSPECIFIED(mask.ipv6->hdr.dst_addr) &&
+		     (!mnl_attr_put_check(buf, size,
+					  TCA_FLOWER_KEY_ENC_IPV6_DST,
+					  sizeof(spec.ipv6->hdr.dst_addr),
+					  spec.ipv6->hdr.dst_addr) ||
+		      !mnl_attr_put_check(buf, size,
+					  TCA_FLOWER_KEY_ENC_IPV6_DST_MASK,
+					  sizeof(mask.ipv6->hdr.dst_addr),
+					  mask.ipv6->hdr.dst_addr))))
+			goto error_nobufs;
+		/* Source and destination are swapped for decap. */
+		if (!memcmp(mask.ipv6->hdr.src_addr, IN6_ADDR_MASK, 16)) {
+			encap.ip.dst.v6 =
+				*(struct in6_addr *)&spec.ipv6->hdr.src_addr;
+			encap.mask |= BIT_ENCAP(IPV6_DST);
+		}
+		if (!memcmp(mask.ipv6->hdr.dst_addr, IN6_ADDR_MASK, 16)) {
+			encap.ip.src.v6 =
+				*(struct in6_addr *)&spec.ipv6->hdr.dst_addr;
+			encap.mask |= BIT_ENCAP(IPV6_SRC);
+		}
+		++item;
+		break;
+	case ITEM_TUN_UDP:
+		if (item->type != RTE_FLOW_ITEM_TYPE_UDP)
+			goto trans;
+		mask.udp = mlx5_nl_flow_item_mask
+			(item, &rte_flow_item_udp_mask,
+			 &mlx5_nl_flow_encap_mask_supported.udp,
+			 &mlx5_nl_flow_mask_empty.udp,
+			 sizeof(rte_flow_item_udp_mask), error);
+		if (!mask.udp)
+			return -rte_errno;
+		spec.udp = item->spec;
+		if ((mask.udp->hdr.src_port &&
+		     (!mnl_attr_put_u16_check(buf, size,
+					      TCA_FLOWER_KEY_ENC_UDP_SRC_PORT,
+					      spec.udp->hdr.src_port) ||
+		      !mnl_attr_put_u16_check
+			(buf, size, TCA_FLOWER_KEY_ENC_UDP_SRC_PORT_MASK,
+			 mask.udp->hdr.src_port))) ||
+		    (mask.udp->hdr.dst_port &&
+		     (!mnl_attr_put_u16_check(buf, size,
+					      TCA_FLOWER_KEY_ENC_UDP_DST_PORT,
+					      spec.udp->hdr.dst_port) ||
+		      !mnl_attr_put_u16_check
+			(buf, size, TCA_FLOWER_KEY_ENC_UDP_DST_PORT_MASK,
+			 mask.udp->hdr.dst_port))))
+			goto error_nobufs;
+		/* Source and destination are swapped for decap. */
+		if (mask.udp->hdr.src_port == BE16_MASK) {
+			encap.udp.dst = spec.udp->hdr.src_port;
+			encap.mask |= BIT_ENCAP(UDP_DST);
+		}
+		if (mask.udp->hdr.dst_port == BE16_MASK) {
+			encap.udp.src = spec.udp->hdr.dst_port;
+			encap.mask |= BIT_ENCAP(UDP_SRC);
+		}
+		++item;
+		break;
 	case ACTIONS:
 		if (item->type != RTE_FLOW_ITEM_TYPE_END)
 			goto trans;
 		assert(na_flower);
 		assert(!na_flower_act);
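+		/*
+		 * TCA_CLS_FLAGS_SKIP_SW requests hardware-only
+		 * classification; the kernel rejects rules it cannot
+		 * offload instead of silently falling back to software.
+		 */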
+		if (!mnl_attr_put_u32_check(buf, size, TCA_FLOWER_FLAGS,
+					    TCA_CLS_FLAGS_SKIP_SW))
+			goto error_nobufs;
 		na_flower_act =
 			mnl_attr_nest_start_check(buf, size, TCA_FLOWER_ACT);
 		if (!na_flower_act)
@@ -1446,14 +1696,35 @@ mlx5_nl_flow_transpose(struct mlx5_nl_flow *nl_flow,
 		}
 		++action;
 		break;
+	case ACTION_VXLAN_DECAP:
+		if (action->type != RTE_FLOW_ACTION_TYPE_VXLAN_DECAP)
+			goto trans;
+		if (!vxlan_decap)
+			return rte_flow_error_set
+				(error, ENOTSUP, RTE_FLOW_ERROR_TYPE_ACTION,
+				 action,
+				 "VXLAN decapsulation is only supported after"
+				 " matching VXLAN traffic explicitly first");
+		i = TCA_TUNNEL_KEY_ACT_RELEASE;
+		nl_flow->decap = 1;
+		conf.vxlan_encap = NULL;
+		goto vxlan_encap;
 	case ACTION_VXLAN_ENCAP:
 		if (action->type != RTE_FLOW_ACTION_TYPE_VXLAN_ENCAP)
 			goto trans;
+		if (vxlan_decap)
+			return rte_flow_error_set
+				(error, ENOTSUP, RTE_FLOW_ERROR_TYPE_ACTION,
+				 action,
+				 "cannot combine VXLAN header matching with"
+				 " encapsulation");
 		conf.vxlan_encap = action->conf;
 		if (mlx5_nl_flow_encap_reap(&encap,
 					    conf.vxlan_encap->definition,
 					    error))
 			return -rte_errno;
+		i = TCA_TUNNEL_KEY_ACT_SET;
+vxlan_encap:
 		act_index =
 			mnl_attr_nest_start_check(buf, size, act_index_cur++);
 		if (!act_index ||
@@ -1467,10 +1738,11 @@ mlx5_nl_flow_transpose(struct mlx5_nl_flow *nl_flow,
 					sizeof(struct tc_tunnel_key),
 					&(struct tc_tunnel_key){
 						.action = TC_ACT_PIPE,
-						.t_action =
-							TCA_TUNNEL_KEY_ACT_SET,
+						.t_action = i,
 					}))
 			goto error_nobufs;
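+		/* Decap carries no tunnel parameters (conf.vxlan_encap is
+		 * NULL); skip them and close the nests right away. */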
+		if (!conf.vxlan_encap)
+			goto vxlan_encap_end;
 		if (encap.mask & BIT_ENCAP(IPV4_SRC) &&
 		    !mnl_attr_put_u32_check
 		    (buf, size, TCA_TUNNEL_KEY_ENC_IPV4_SRC,
@@ -1507,16 +1779,11 @@ mlx5_nl_flow_transpose(struct mlx5_nl_flow *nl_flow,
 		if (!mnl_attr_put_u32_check
 		    (buf, size, TCA_TUNNEL_KEY_ENC_KEY_ID, encap.vxlan.vni))
 			goto error_nobufs;
+vxlan_encap_end:
 		mnl_attr_nest_end(buf, act);
 		mnl_attr_nest_end(buf, act_index);
 		++action;
 		break;
-	case ACTION_VXLAN_DECAP:
-		if (action->type != RTE_FLOW_ACTION_TYPE_VXLAN_DECAP)
-			goto trans;
-		return rte_flow_error_set
-			(error, ENOTSUP, RTE_FLOW_ERROR_TYPE_ACTION, action,
-			 "VXLAN decap is not supported yet");
 	case END:
 		if (item->type != RTE_FLOW_ITEM_TYPE_END ||
 		    action->type != RTE_FLOW_ACTION_TYPE_END)
@@ -1844,15 +2111,26 @@ mlx5_nl_flow_ifindex_vxlan(struct mlx5_nl_flow_ctx *ctx, unsigned int ifindex,
 	 * cannot be worked around by picking a random value here and using
 	 * a different one when creating flow rules later.
 	 *
-	 * Therefore request a hopefully unique VNI based on the interface
-	 * index in order to work around EEXIST. VNI will be overridden
-	 * later on a flow rule basis thanks to IFLA_VXLAN_COLLECT_METADATA.
+	 * There is another way to work around EEXIST by assigning a unique
+	 * VNI to the VXLAN interface (e.g. by emitting IFLA_VXLAN_ID based
+	 * on underlying ifindex), however doing so breaks decap as it
+	 * prevents the kernel from matching VNI when looking for a VXLAN
+	 * interface in that direction. Note that iproute2 doesn't allow
+	 * this combination either.
+	 *
+	 * Creating non-external VXLAN interfaces with fixed outer
+	 * properties was also considered. The problem is that not only
+	 * does this fail to scale to large numbers of interfaces, it
+	 * also appears that only interfaces with dynamic properties
+	 * (external) can be offloaded to hardware.
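+	 *
+	 * For reference, the interface spawned here is roughly equivalent
+	 * to the following iproute2 invocation, where device name and UDP
+	 * port are examples only:
+	 *
+	 *   ip link add vxlan_tmp type vxlan dstport 4789 external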
+	 *
+	 * Hence the following limitation: as long as VXLAN encap/decap flow
+	 * rules exist on a given DPDK port, the local UDP port they rely on
+	 * can only be used by flow rules on that port; flow rules on other
+	 * ports attempting to use the same UDP port fail with EEXIST.
 	 */
 	if (!mnl_attr_put_u16_check(nlh, sizeof(buf), IFLA_VXLAN_PORT,
 				    vxlan_port))
 		goto exit;
-	if (!mnl_attr_put_u32_check(nlh, sizeof(buf), IFLA_VXLAN_ID, ifindex))
-		goto exit;
 	mnl_attr_nest_end(nlh, na_vxlan);
 	mnl_attr_nest_end(nlh, na_info);
 	ret = mlx5_nl_flow_chat(ctx, nlh, NULL, NULL);
@@ -2022,8 +2300,9 @@ mlx5_nl_flow_encap_neigh(struct mlx5_nl_flow_ctx *ctx,
 		goto error_nobufs;
 	if (encap->mask & BIT_ENCAP(ETH_SRC) && enable)
 		DRV_LOG(WARNING,
-			"Ethernet source address cannot be forced"
-			" for VXLAN encap; parameter ignored");
+			"Ethernet source address (encap) or destination"
+			" address (decap) cannot be forced for VXLAN"
+			" encap/decap; parameter ignored");
 	if (encap->mask & BIT_ENCAP(ETH_DST) &&
 	    !mnl_attr_put_check(nlh, sizeof(buf), NDA_LLADDR,
 				sizeof(encap->eth.dst), &encap->eth.dst))
@@ -2325,9 +2604,12 @@ mlx5_nl_flow_create(struct mlx5_nl_flow_ctx *ctx, struct mlx5_nl_flow *nl_flow,
 {
 	struct nlmsghdr *nlh = (void *)nl_flow->msg;
 	struct mlx5_nl_flow_encap *encap =
-		nl_flow->encap && nl_flow->ifindex_dst ?
+		nl_flow->encap && nl_flow->ifindex_dst && nl_flow->ifindex_src ?
 		nl_flow->encap : NULL;
-	unsigned int ifindex = encap ? *nl_flow->ifindex_dst : 0;
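+	/*
+	 * Encap rules redirect traffic to the VXLAN device, while decap
+	 * rules are applied to traffic coming from it; substitute the
+	 * destination or source ifindex accordingly.
+	 */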
+	unsigned int *ifindex_target =
+		nl_flow->decap ?
+		nl_flow->ifindex_src : nl_flow->ifindex_dst;
+	unsigned int ifindex = encap ? *ifindex_target : 0;
 	int ret;
 
 	if (nl_flow->applied)
@@ -2339,11 +2621,11 @@ mlx5_nl_flow_create(struct mlx5_nl_flow_ctx *ctx, struct mlx5_nl_flow *nl_flow,
 			(ctx, encap, ifindex, true, error);
 		if (!nl_flow->encap_ifindex)
 			return -rte_errno;
-		*nl_flow->ifindex_dst = nl_flow->encap_ifindex;
+		*ifindex_target = nl_flow->encap_ifindex;
 	}
 	ret = mlx5_nl_flow_chat(ctx, nlh, NULL, NULL);
 	if (encap)
-		*nl_flow->ifindex_dst = ifindex;
+		*ifindex_target = ifindex;
 	if (!ret) {
 		nl_flow->applied = 1;
 		return 0;
@@ -2378,9 +2660,11 @@ mlx5_nl_flow_destroy(struct mlx5_nl_flow_ctx *ctx, struct mlx5_nl_flow *nl_flow,
 {
 	struct nlmsghdr *nlh = (void *)nl_flow->msg;
 	struct mlx5_nl_flow_encap *encap =
-		nl_flow->encap && nl_flow->ifindex_dst ?
+		nl_flow->encap && nl_flow->ifindex_dst && nl_flow->ifindex_src ?
 		nl_flow->encap : NULL;
-	unsigned int ifindex = encap ? *nl_flow->ifindex_dst : 0;
+	unsigned int *ifindex_target =
+		nl_flow->decap ? nl_flow->ifindex_src : nl_flow->ifindex_dst;
+	unsigned int ifindex = encap ? *ifindex_target : 0;
 	int err = 0;
 	int ret;
 
@@ -2392,11 +2676,11 @@ mlx5_nl_flow_destroy(struct mlx5_nl_flow_ctx *ctx, struct mlx5_nl_flow *nl_flow,
 		if (!mlx5_nl_flow_encap_ifindex
 		    (ctx, encap, ifindex, false, error))
 			err = rte_errno;
-		*nl_flow->ifindex_dst = nl_flow->encap_ifindex;
+		*ifindex_target = nl_flow->encap_ifindex;
 	}
 	ret = mlx5_nl_flow_chat(ctx, nlh, NULL, NULL);
 	if (encap)
-		*nl_flow->ifindex_dst = ifindex;
+		*ifindex_target = ifindex;
 	nl_flow->applied = 0;
 	if (err) {
 		rte_errno = err;
-- 
2.11.0

^ permalink raw reply	[flat|nested] 9+ messages in thread
