DPDK patches and discussions
* [dpdk-dev] [RFC PATCH 0/6] General tunneling APIs
@ 2015-12-23  8:49 Jijiang Liu
  2015-12-23  8:49 ` [dpdk-dev] [RFC PATCH 1/6] rte_ether: extend rte_eth_tunnel_flow structure Jijiang Liu
                   ` (7 more replies)
  0 siblings, 8 replies; 13+ messages in thread
From: Jijiang Liu @ 2015-12-23  8:49 UTC (permalink / raw)
  To: dev

I want to define a set of general tunneling APIs to accelerate tunneling packet processing in DPDK.
In this RFC patch set, I will explain my idea using some code.

1. Using flow director offload to define a tunnel flow in a pair of queues.
   
flow rule: src IP + dst IP + src port + dst port + tunnel ID (for VXLAN)

For example:
	struct rte_eth_tunnel_conf {
		.tunnel_type = VXLAN,
		.rx_queue = 1,
		.tx_queue = 1,
		.filter_type = 'src ip + dst ip + src port + dst port + tunnel id',
		.flow_tnl = {
			.tunnel_type = VXLAN,
			.tunnel_id = 100,
			.remote_mac = 11:22:33:44:55:66,
			.ip_type = ipv4,
			.outer_ipv4.src_ip = 192.168.10.1,
			.outer_ipv4.dst_ip = 10.239.129.11,
			.src_port = 1000,
			.dst_port = 2000,
		},
	};
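
As a rough, self-contained illustration of the flow rule above (not the structures from the patches), the rule can be modelled as a five-field key that a received packet must match exactly. The names tunnel_flow_key, ipv4_addr, example_rule and tunnel_flow_match are hypothetical and exist only for this sketch.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, simplified stand-in for the proposed flow rule;
 * field names are illustrative and are not taken from the patches. */
struct tunnel_flow_key {
	uint32_t src_ip;    /* outer IPv4 source address, host byte order */
	uint32_t dst_ip;    /* outer IPv4 destination address, host byte order */
	uint16_t src_port;  /* outer UDP source port */
	uint16_t dst_port;  /* outer UDP destination port */
	uint32_t tunnel_id; /* VXLAN VNI (24 bits used) */
};

/* Build a host-order IPv4 address from its four dotted-quad octets. */
static uint32_t
ipv4_addr(uint8_t a, uint8_t b, uint8_t c, uint8_t d)
{
	return ((uint32_t)a << 24) | ((uint32_t)b << 16) |
	       ((uint32_t)c << 8) | (uint32_t)d;
}

/* The rule from the example: 192.168.10.1:1000 to 10.239.129.11:2000, VNI 100. */
static struct tunnel_flow_key
example_rule(void)
{
	struct tunnel_flow_key rule = {
		.src_ip = ipv4_addr(192, 168, 10, 1),
		.dst_ip = ipv4_addr(10, 239, 129, 11),
		.src_port = 1000,
		.dst_port = 2000,
		.tunnel_id = 100,
	};

	return rule;
}

/* A packet belongs to the flow only if all five fields are equal. */
static bool
tunnel_flow_match(const struct tunnel_flow_key *rule,
		  const struct tunnel_flow_key *pkt)
{
	return rule->src_ip == pkt->src_ip &&
	       rule->dst_ip == pkt->dst_ip &&
	       rule->src_port == pkt->src_port &&
	       rule->dst_port == pkt->dst_port &&
	       rule->tunnel_id == pkt->tunnel_id;
}
```

In the real proposal this comparison would be done by the flow director in HW; the software version here only shows which fields take part in the match.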
       
2. Configure tunnel flow for a device and for a pair of queues.

rte_eth_dev_tunnel_configure(0, &rte_eth_tunnel_conf);

If the HW does not support encap/decap, this API registers the RX decapsulation and TX encapsulation callback functions.
It also allocates space for the tunnel configuration and stores a pointer to the newly allocated space in dev->post_rx/tx_burst_cbs[].param.

rte_eth_add_rx_callback(port_id, tunnel_conf.rx_queue,
                        rte_eth_tunnel_decap, (void *)tunnel_conf);
rte_eth_add_tx_callback(port_id, tunnel_conf.tx_queue,
                        rte_eth_tunnel_encap, (void *)tunnel_conf);
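
The callback mechanism relied on here can be sketched in isolation as follows. This is a minimal model, assuming one callback slot per queue; burst_cb_fn, add_rx_callback, run_rx_callback and tunnel_stats are hypothetical names and are not the ethdev API. It only shows how a per-queue user parameter, such as the tunnel configuration, travels with the registered function.

```c
#include <stddef.h>
#include <stdint.h>

#define MAX_QUEUES 4 /* illustrative limit, not a DPDK constant */

/* Hypothetical callback type: processes a burst and returns the packet count. */
typedef uint16_t (*burst_cb_fn)(void *pkts[], uint16_t nb_pkts, void *param);

struct queue_cb {
	burst_cb_fn fn; /* NULL when no callback is registered */
	void *param;    /* opaque per-queue data, e.g. a tunnel configuration */
};

static struct queue_cb post_rx_cbs[MAX_QUEUES];

/* Register a callback and its parameter on one RX queue. */
static int
add_rx_callback(uint16_t queue, burst_cb_fn fn, void *param)
{
	if (queue >= MAX_QUEUES || fn == NULL)
		return -1;
	post_rx_cbs[queue].fn = fn;
	post_rx_cbs[queue].param = param;
	return 0;
}

/* Called after a burst is received: run the callback if one is registered. */
static uint16_t
run_rx_callback(uint16_t queue, void *pkts[], uint16_t nb_pkts)
{
	if (queue >= MAX_QUEUES || post_rx_cbs[queue].fn == NULL)
		return nb_pkts;
	return post_rx_cbs[queue].fn(pkts, nb_pkts, post_rx_cbs[queue].param);
}

/* Example callback: only counts packets in the per-queue parameter. */
struct tunnel_stats {
	uint64_t decapped;
};

static uint16_t
count_decap(void *pkts[], uint16_t nb_pkts, void *param)
{
	(void)pkts;
	((struct tunnel_stats *)param)->decapped += nb_pkts;
	return nb_pkts;
}
```

A queue without a registered callback passes the burst through unchanged, which is the behaviour an application would expect when no tunnel is configured on that queue.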

3. Using rte_vxlan_decap_burst() to do decapsulation of tunneling packets.

4. Using rte_vxlan_encap_burst() to do encapsulation of tunneling packets.
   The 'src ip, dst ip, src port, dst port and tunnel ID' can be obtained from the tunnel configuration.
   SIMD is used to accelerate the operation.
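
Two pieces of the encapsulation step can be shown with plain scalar code. This is a sketch, not the SIMD implementation from patch 5: vxlan_src_port scales a 32-bit hash into the dynamic port range 49152..65535 (the same PORT_MIN/PORT_MAX values the patch defines), and vxlan_hdr_write lays out the 8-byte VXLAN header as described in RFC 7348. The function names are hypothetical.

```c
#include <stdint.h>

#define PORT_MIN   49152
#define PORT_MAX   65535
#define PORT_RANGE ((PORT_MAX - PORT_MIN) + 1)

/* Map a 32-bit hash onto [PORT_MIN, PORT_MAX]. The shift must be applied
 * before PORT_MIN is added, hence the explicit parentheses. */
static uint16_t
vxlan_src_port(uint32_t hash)
{
	return (uint16_t)((((uint64_t)hash * PORT_RANGE) >> 32) + PORT_MIN);
}

/* Write the 8-byte VXLAN header: flags (1), reserved (3), VNI (3), reserved (1). */
static void
vxlan_hdr_write(uint8_t hdr[8], uint32_t vni)
{
	hdr[0] = 0x08; /* I flag set: the VNI field is valid */
	hdr[1] = 0;
	hdr[2] = 0;
	hdr[3] = 0;
	hdr[4] = (uint8_t)(vni >> 16);
	hdr[5] = (uint8_t)(vni >> 8);
	hdr[6] = (uint8_t)vni;
	hdr[7] = 0;
}

/* Read the 24-bit VNI back out of a VXLAN header. */
static uint32_t
vxlan_hdr_vni(const uint8_t hdr[8])
{
	return ((uint32_t)hdr[4] << 16) | ((uint32_t)hdr[5] << 8) | hdr[6];
}
```

Deriving the source port from a hash of the inner headers keeps packets of one inner flow on one outer 5-tuple while still spreading different flows across ports.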

Here is an example of how to use these APIs:

1) at config phase

dev_config(port, ...);
tunnel_config(port,...);
...
dev_start(port);
...
rx_burst(port, rxq,... );
tx_burst(port, txq,...);


2) at transmitting packet phase
Only the outer src/dst MAC addresses need to be set for the TX tunnel configuration in dev->post_tx_burst_cbs[].param.

I have not finished all of the code in this patch set; the purpose of sending it is to collect more comments and suggestions on this idea.


Jijiang Liu (6):
  extend rte_eth_tunnel_flow
  define tunnel flow structure and APIs
  implement tunnel flow APIs
  define rte_vxlan_decap/encap
  implement rte_vxlan_decap/encap
  i40e tunnel configure

 drivers/net/i40e/i40e_ethdev.c             |   41 +++++
 lib/librte_ether/libtunnel/rte_vxlan_opt.c |  251 ++++++++++++++++++++++++++++
 lib/librte_ether/libtunnel/rte_vxlan_opt.h |   49 ++++++
 lib/librte_ether/rte_eth_ctrl.h            |   14 ++-
 lib/librte_ether/rte_ethdev.h              |   28 +++
 lib/librte_ether/rte_ethdev.c              |   60 ++
 6 files changed, 440 insertions(+), 3 deletions(-)
 create mode 100644 lib/librte_ether/libtunnel/rte_vxlan_opt.c
 create mode 100644 lib/librte_ether/libtunnel/rte_vxlan_opt.h

-- 
1.7.7.6


* [dpdk-dev] [RFC PATCH 1/6] rte_ether: extend rte_eth_tunnel_flow structure
  2015-12-23  8:49 [dpdk-dev] [RFC PATCH 0/6] General tunneling APIs Jijiang Liu
@ 2015-12-23  8:49 ` Jijiang Liu
  2015-12-23  8:49 ` [dpdk-dev] [RFC PATCH 2/6] rte_ether: define tunnel flow structure and APIs Jijiang Liu
                   ` (6 subsequent siblings)
  7 siblings, 0 replies; 13+ messages in thread
From: Jijiang Liu @ 2015-12-23  8:49 UTC (permalink / raw)
  To: dev

The purpose of extending this structure is to support more tunnel filter conditions.

Signed-off-by: Jijiang Liu <jijiang.liu@intel.com>
---
 lib/librte_ether/rte_eth_ctrl.h |   14 +++++++++++---
 1 files changed, 11 insertions(+), 3 deletions(-)

diff --git a/lib/librte_ether/rte_eth_ctrl.h b/lib/librte_ether/rte_eth_ctrl.h
index ce224ad..39f52d9 100644
--- a/lib/librte_ether/rte_eth_ctrl.h
+++ b/lib/librte_ether/rte_eth_ctrl.h
@@ -494,9 +494,17 @@ enum rte_eth_fdir_tunnel_type {
  * NVGRE
  */
 struct rte_eth_tunnel_flow {
-	enum rte_eth_fdir_tunnel_type tunnel_type; /**< Tunnel type to match. */
-	uint32_t tunnel_id;                        /**< Tunnel ID to match. TNI, VNI... */
-	struct ether_addr mac_addr;                /**< Mac address to match. */
+	enum rte_eth_tunnel_type tunnel_type; /**< Tunnel type to match. */
+	uint64_t tunnel_id;  /**< Tunnel ID to match. TNI, VNI... */
+	struct ether_addr outer_src_mac;  /**< Outer source MAC, for TX. */
+	struct ether_addr outer_peer_mac; /**< Outer peer MAC, for TX. */
+	enum rte_tunnel_iptype outer_ip_type; /**< IP address type. */
+	union {
+		struct rte_eth_ipv4_flow outer_ipv4;
+		struct rte_eth_ipv6_flow outer_ipv6;
+	} outer_ip_addr; /**< Outer IP addresses to match. */
+	uint16_t dst_port; /**< Destination port to match. */
+	uint16_t src_port; /**< Source port to match. */
 };
 
 /**
-- 
1.7.7.6


* [dpdk-dev] [RFC PATCH 2/6] rte_ether: define tunnel flow structure and APIs
  2015-12-23  8:49 [dpdk-dev] [RFC PATCH 0/6] General tunneling APIs Jijiang Liu
  2015-12-23  8:49 ` [dpdk-dev] [RFC PATCH 1/6] rte_ether: extend rte_eth_tunnel_flow structure Jijiang Liu
@ 2015-12-23  8:49 ` Jijiang Liu
  2015-12-23  8:49 ` [dpdk-dev] [RFC PATCH 3/6] rte_ether: implement tunnel config API Jijiang Liu
                   ` (5 subsequent siblings)
  7 siblings, 0 replies; 13+ messages in thread
From: Jijiang Liu @ 2015-12-23  8:49 UTC (permalink / raw)
  To: dev

Add the struct 'rte_eth_tunnel_conf' and the tunnel configuration API. 

Signed-off-by: Jijiang Liu <jijiang.liu@intel.com>
---
 lib/librte_ether/rte_ethdev.h |   28 ++++++++++++++++++++++++++++
 1 files changed, 28 insertions(+), 0 deletions(-)

diff --git a/lib/librte_ether/rte_ethdev.h b/lib/librte_ether/rte_ethdev.h
index bada8ad..cb4d9a2 100644
--- a/lib/librte_ether/rte_ethdev.h
+++ b/lib/librte_ether/rte_ethdev.h
@@ -630,6 +630,18 @@ struct rte_eth_rxconf {
 	uint8_t rx_deferred_start; /**< Do not start queue with rte_eth_dev_start(). */
 };
 
+/**
+ * A structure used to configure tunnel flow of an Ethernet port.
+ */
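+struct rte_eth_tunnel_conf {
+	uint16_t rx_queue;        /**< RX queue the tunnel flow is bound to. */
+	uint16_t tx_queue;        /**< TX queue the tunnel flow is bound to. */
+	uint16_t udp_tunnel_port; /**< UDP port of the tunnel. */
+	uint16_t nb_flow;         /**< Number of entries in tunnel_flow. */
+	uint16_t filter_type;     /**< Filter type of the flow rule. */
+	struct rte_eth_tunnel_flow *tunnel_flow; /**< Tunnel flow rules. */
+};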
+
 #define ETH_TXQ_FLAGS_NOMULTSEGS 0x0001 /**< nb_segs=1 for all mbufs */
 #define ETH_TXQ_FLAGS_NOREFCOUNT 0x0002 /**< refcnt can be ignored */
 #define ETH_TXQ_FLAGS_NOMULTMEMP 0x0004 /**< all bufs come from same mempool */
@@ -810,6 +822,7 @@ struct rte_eth_conf {
 #define DEV_RX_OFFLOAD_TCP_CKSUM   0x00000008
 #define DEV_RX_OFFLOAD_TCP_LRO     0x00000010
 #define DEV_RX_OFFLOAD_QINQ_STRIP  0x00000020
+#define DEV_RX_OFFLOAD_TUNNEL_DECAP 0x00000040
 
 /**
  * TX offload capabilities of a device.
@@ -1210,6 +1223,10 @@ typedef int (*eth_udp_tunnel_add_t)(struct rte_eth_dev *dev,
 
 typedef int (*eth_udp_tunnel_del_t)(struct rte_eth_dev *dev,
 				    struct rte_eth_udp_tunnel *tunnel_udp);
 /**< @internal Delete tunneling UDP info */
+
+typedef int (*eth_tunnel_flow_conf_t)(struct rte_eth_dev *dev,
+				      struct rte_eth_tunnel_conf *tunnel_conf);
+/**< @internal Configure tunnel flow */
 
 typedef int (*eth_set_mc_addr_list_t)(struct rte_eth_dev *dev,
@@ -1385,6 +1402,7 @@ struct eth_dev_ops {
 	eth_set_vf_vlan_filter_t   set_vf_vlan_filter;  /**< Set VF VLAN filter */
 	eth_udp_tunnel_add_t       udp_tunnel_add;
 	eth_udp_tunnel_del_t       udp_tunnel_del;
+	eth_tunnel_flow_conf_t     tunnel_configure;
 	eth_set_queue_rate_limit_t set_queue_rate_limit;   /**< Set queue rate limit */
 	eth_set_vf_rate_limit_t    set_vf_rate_limit;   /**< Set VF rate limit */
 	/** Update redirection table. */
@@ -1821,6 +1839,16 @@ extern int rte_eth_dev_configure(uint8_t port_id,
 				 const struct rte_eth_conf *eth_conf);
 
 /**
+ * Configure an Ethernet device for tunneling packets.
+ *
+ * @return
+ *   - 0: Success, device configured.
+ *   - <0: Error code returned by the driver configuration function.
+ */
+extern int rte_eth_dev_tunnel_configure(uint8_t port_id,
+					struct rte_eth_tunnel_conf *tunnel_conf);
+
+/**
  * Allocate and set up a receive queue for an Ethernet device.
  *
  * The function allocates a contiguous block of memory for *nb_rx_desc*
-- 
1.7.7.6


* [dpdk-dev] [RFC PATCH 3/6] rte_ether: implement tunnel config API
  2015-12-23  8:49 [dpdk-dev] [RFC PATCH 0/6] General tunneling APIs Jijiang Liu
  2015-12-23  8:49 ` [dpdk-dev] [RFC PATCH 1/6] rte_ether: extend rte_eth_tunnel_flow structure Jijiang Liu
  2015-12-23  8:49 ` [dpdk-dev] [RFC PATCH 2/6] rte_ether: define tunnel flow structure and APIs Jijiang Liu
@ 2015-12-23  8:49 ` Jijiang Liu
  2015-12-23  8:49 ` [dpdk-dev] [RFC PATCH 4/6] rte_ether: define rte_eth_vxlan_decap and rte_eth_vxlan_encap Jijiang Liu
                   ` (4 subsequent siblings)
  7 siblings, 0 replies; 13+ messages in thread
From: Jijiang Liu @ 2015-12-23  8:49 UTC (permalink / raw)
  To: dev

Signed-off-by: Jijiang Liu <jijiang.liu@intel.com>
---
 lib/librte_ether/rte_ethdev.c |   60 +++++++++++++++++++++++++++++++++++++++++
 1 files changed, 60 insertions(+), 0 deletions(-)

diff --git a/lib/librte_ether/rte_ethdev.c b/lib/librte_ether/rte_ethdev.c
index c3eed49..6725398 100644
--- a/lib/librte_ether/rte_ethdev.c
+++ b/lib/librte_ether/rte_ethdev.c
@@ -1004,6 +1004,66 @@ rte_eth_dev_configure(uint8_t port_id, uint16_t nb_rx_q, uint16_t nb_tx_q,
 	return 0;
 }
 
+int
+rte_eth_dev_tunnel_configure(uint8_t port_id,
+			     struct rte_eth_tunnel_conf *tunnel_conf)
+{
+	struct rte_eth_dev *dev;
+	struct rte_eth_dev_info dev_info;
+	int diag;
+
+	/* This function is only safe when called from the primary process
+	 * in a multi-process setup. */
+	RTE_PROC_PRIMARY_OR_ERR_RET(-E_RTE_SECONDARY);
+
+	RTE_ETH_VALID_PORTID_OR_ERR_RET(port_id, -EINVAL);
+	
+	dev = &rte_eth_devices[port_id];
+
+	RTE_FUNC_PTR_OR_ERR_RET(*dev->dev_ops->dev_infos_get, -ENOTSUP);
+
+	/*
+	 * Check that the requested RX and TX queue indexes are below
+	 * the number of RX and TX queues already configured on the
+	 * device.
+	 */
+	(*dev->dev_ops->dev_infos_get)(dev, &dev_info);
+	if (tunnel_conf->rx_queue >= dev->data->nb_rx_queues) {
+		RTE_PMD_DEBUG_TRACE("ethdev port_id=%d rx_queue=%d >= %d\n",
+				port_id, tunnel_conf->rx_queue, dev->data->nb_rx_queues);
+		return -EINVAL;
+	}
+
+	if (tunnel_conf->tx_queue >= dev->data->nb_tx_queues) {
+		RTE_PMD_DEBUG_TRACE("ethdev port_id=%d tx_queue=%d >= %d\n",
+				port_id, tunnel_conf->tx_queue, dev->data->nb_tx_queues);
+		return -EINVAL;
+	}
+
+	tunnel_conf->tunnel_flow = rte_zmalloc(NULL,
+				sizeof(struct rte_eth_tunnel_flow)
+				* tunnel_conf->nb_flow, 0);
+
+	/* Copy the tunnel_conf parameter into the dev structure */
+	memcpy(&dev->data->dev_conf.tunnel_conf[tunnel_conf->rx_queue],
+			tunnel_conf, sizeof(struct rte_eth_tunnel_conf));
+
+	rte_eth_add_rx_callback(port_id, tunnel_conf->rx_queue,
+				rte_eth_tunnel_decap, (void *)tunnel_conf);
+
+	rte_eth_add_tx_callback(port_id, tunnel_conf->tx_queue,
+				rte_eth_tunnel_encap, (void *)tunnel_conf);
+
+	diag = (*dev->dev_ops->tunnel_configure)(dev, tunnel_conf);
+	if (diag != 0) {
+		RTE_PMD_DEBUG_TRACE("port%d dev_tunnel_configure = %d\n",
+				port_id, diag);
+		return diag;
+	}
+
+	return 0;
+}
+
 static void
 rte_eth_dev_config_restore(uint8_t port_id)
 {
-- 
1.7.7.6


* [dpdk-dev] [RFC PATCH 4/6] rte_ether: define rte_eth_vxlan_decap and rte_eth_vxlan_encap
  2015-12-23  8:49 [dpdk-dev] [RFC PATCH 0/6] General tunneling APIs Jijiang Liu
                   ` (2 preceding siblings ...)
  2015-12-23  8:49 ` [dpdk-dev] [RFC PATCH 3/6] rte_ether: implement tunnel config API Jijiang Liu
@ 2015-12-23  8:49 ` Jijiang Liu
  2015-12-23  8:49 ` [dpdk-dev] [RFC PATCH 5/6] rte_ether: implement encap and decap APIs Jijiang Liu
                   ` (3 subsequent siblings)
  7 siblings, 0 replies; 13+ messages in thread
From: Jijiang Liu @ 2015-12-23  8:49 UTC (permalink / raw)
  To: dev

The parameters of these functions should match those of the callback function types (rte_rx/tx_callback_fn).

Parameters that are not needed can be marked as 'unused'.

Signed-off-by: Jijiang Liu <jijiang.liu@intel.com>
---
 lib/librte_ether/libtunnel/rte_vxlan_opt.h |   49 ++++++++++++++++++++++++++++
 1 files changed, 49 insertions(+), 0 deletions(-)
 create mode 100644 lib/librte_ether/libtunnel/rte_vxlan_opt.h

diff --git a/lib/librte_ether/libtunnel/rte_vxlan_opt.h b/lib/librte_ether/libtunnel/rte_vxlan_opt.h
new file mode 100644
index 0000000..d9412fc
--- /dev/null
+++ b/lib/librte_ether/libtunnel/rte_vxlan_opt.h
@@ -0,0 +1,49 @@
+/*-
+ *   BSD LICENSE
+ *
+ *   Copyright(c) 2010-2015 Intel Corporation. All rights reserved.
+ *   All rights reserved.
+ *
+ *   Redistribution and use in source and binary forms, with or without
+ *   modification, are permitted provided that the following conditions
+ *   are met:
+ *
+ *     * Redistributions of source code must retain the above copyright
+ *       notice, this list of conditions and the following disclaimer.
+ *     * Redistributions in binary form must reproduce the above copyright
+ *       notice, this list of conditions and the following disclaimer in
+ *       the documentation and/or other materials provided with the
+ *       distribution.
+ *     * Neither the name of Intel Corporation nor the names of its
+ *       contributors may be used to endorse or promote products derived
+ *       from this software without specific prior written permission.
+ *
+ *   THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+ *   "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+ *   LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+ *   A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+ *   OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+ *   SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+ *   LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+ *   DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+ *   THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+ *   (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+ *   OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+ */
+
+#ifndef _RTE_VXLAN_OPT_H_
+#define _RTE_VXLAN_OPT_H_
+
+extern uint16_t rte_vxlan_encap_burst(uint8_t port,
+				      uint16_t queue,
+				      struct rte_mbuf *pkts[],
+				      uint16_t nb_pkts,
+				      void *user_param);
+
+extern uint16_t rte_vxlan_decap_burst(uint8_t port, uint16_t queue,
+				      struct rte_mbuf *pkts[],
+				      uint16_t nb_pkts,
+				      uint16_t max_pkts,
+				      void *user_param);
+
+#endif /* _RTE_VXLAN_OPT_H_ */
-- 
1.7.7.6


* [dpdk-dev] [RFC PATCH 5/6] rte_ether: implement encap and decap APIs
  2015-12-23  8:49 [dpdk-dev] [RFC PATCH 0/6] General tunneling APIs Jijiang Liu
                   ` (3 preceding siblings ...)
  2015-12-23  8:49 ` [dpdk-dev] [RFC PATCH 4/6] rte_ether: define rte_eth_vxlan_decap and rte_eth_vxlan_encap Jijiang Liu
@ 2015-12-23  8:49 ` Jijiang Liu
  2015-12-23 18:32   ` Stephen Hemminger
  2015-12-23  8:49 ` [dpdk-dev] [RFC PATCH 6/6] driver/i40e: tunnel configure in i40e Jijiang Liu
                   ` (2 subsequent siblings)
  7 siblings, 1 reply; 13+ messages in thread
From: Jijiang Liu @ 2015-12-23  8:49 UTC (permalink / raw)
  To: dev

Use SIMD instructions to accelerate the encapsulation operation.

Signed-off-by: Jijiang Liu <jijiang.liu@intel.com>
---
 lib/librte_ether/libtunnel/rte_vxlan_opt.c |  251 ++++++++++++++++++++++++++++
 1 files changed, 251 insertions(+), 0 deletions(-)
 create mode 100644 lib/librte_ether/libtunnel/rte_vxlan_opt.c

diff --git a/lib/librte_ether/libtunnel/rte_vxlan_opt.c b/lib/librte_ether/libtunnel/rte_vxlan_opt.c
new file mode 100644
index 0000000..e59ed2c
--- /dev/null
+++ b/lib/librte_ether/libtunnel/rte_vxlan_opt.c
@@ -0,0 +1,251 @@
+/*-
+ *   BSD LICENSE
+ *
+ *   Copyright(c) 2010-2015 Intel Corporation. All rights reserved.
+ *   All rights reserved.
+ *
+ *   Redistribution and use in source and binary forms, with or without
+ *   modification, are permitted provided that the following conditions
+ *   are met:
+ *
+ *     * Redistributions of source code must retain the above copyright
+ *       notice, this list of conditions and the following disclaimer.
+ *     * Redistributions in binary form must reproduce the above copyright
+ *       notice, this list of conditions and the following disclaimer in
+ *       the documentation and/or other materials provided with the
+ *       distribution.
+ *     * Neither the name of Intel Corporation nor the names of its
+ *       contributors may be used to endorse or promote products derived
+ *       from this software without specific prior written permission.
+ *
+ *   THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+ *   "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+ *   LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+ *   A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+ *   OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+ *   SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+ *   LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+ *   DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+ *   THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+ *   (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+ *   OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+ */
+#include <stdint.h>
+#include <rte_mbuf.h>
+#include <rte_ether.h>
+#include <rte_ip.h>
+#include <rte_udp.h>
+#include <rte_tcp.h>
+#include <rte_byteorder.h>
+#include <rte_prefetch.h>
+#include <rte_ethdev.h>
+#include <rte_hash_crc.h>
+
+#include <immintrin.h>
+#include <tmmintrin.h>
+
+#include "rte_vxlan_opt.h"
+
+#ifndef __INTEL_COMPILER
+#pragma GCC diagnostic ignored "-Wcast-qual"
+#endif
+
+#pragma GCC diagnostic ignored "-Wstrict-aliasing"
+
+#define PORT_MIN    49152
+#define PORT_MAX    65535
+#define PORT_RANGE ((PORT_MAX - PORT_MIN) + 1)
+
+#define DUMMY_FOR_TEST
+#define RTE_DEFAULT_VXLAN_PORT 4789
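+#define _htons(x) ((uint16_t)((((x) & 0x00ffU) << 8) | (((x) & 0xff00U) >> 8))) /* little-endian hosts */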
+#define LOOP           4
+#define MAC_LEN        6
+#define PREFIX         (ETHER_HDR_LEN + 4)
+#define UDP_PRE_SZ     (sizeof(struct udp_hdr) + sizeof(struct vxlan_hdr))
+#define IP_PRE_SZ      (UDP_PRE_SZ + sizeof(struct ipv4_hdr))
+#define VXLAN_PKT_HDR_SIZE       (IP_PRE_SZ + ETHER_HDR_LEN)
+ 
+#define VXLAN_SIZE     sizeof(struct vxlan_hdr)
+#define INNER_PRE_SZ   (14 + 20 + 8 + 8)
+#define DECAP_OFFSET   (16 + 8 + 8)
+#define DETECT_OFFSET  12
+
+struct eth_pkt_info {
+	uint8_t l2_len;
+	uint16_t ethertype;
+	uint16_t l3_len;
+	uint16_t l4_proto;
+	uint16_t l4_len;
+};
+
+/* 16Bytes tx meta data */
+struct vxlan_tx_meta {
+	uint32_t sip;
+	uint32_t dip;
+	uint32_t vni;
+	uint16_t sport;
+} __attribute__((__aligned__(16)));
+
+
+/* Parse an IPv4 header to fill l3_len, l4_len, and l4_proto */
+static void
+parse_ipv4(struct ipv4_hdr *ipv4_hdr, struct eth_pkt_info *info)
+{
+	struct tcp_hdr *tcp_hdr;
+
+	info->l3_len = (ipv4_hdr->version_ihl & 0x0f) * 4;
+	info->l4_proto = ipv4_hdr->next_proto_id;
+
+	/* only fill l4_len for TCP, it's useful for TSO */
+	if (info->l4_proto == IPPROTO_TCP) {
+		tcp_hdr = (struct tcp_hdr *)((char *)ipv4_hdr + info->l3_len);
+		info->l4_len = (tcp_hdr->data_off & 0xf0) >> 2;
+	} else
+		info->l4_len = 0;
+}
+
+/* Parse an IPv6 header to fill l3_len, l4_len, and l4_proto */
+static void
+parse_ipv6(struct ipv6_hdr *ipv6_hdr, struct eth_pkt_info *info)
+{
+	struct tcp_hdr *tcp_hdr;
+
+	info->l3_len = sizeof(struct ipv6_hdr);
+	info->l4_proto = ipv6_hdr->proto;
+
+	/* only fill l4_len for TCP, it's useful for TSO */
+	if (info->l4_proto == IPPROTO_TCP) {
+		tcp_hdr = (struct tcp_hdr *)((char *)ipv6_hdr + info->l3_len);
+		info->l4_len = (tcp_hdr->data_off & 0xf0) >> 2;
+	} else
+		info->l4_len = 0;
+}
+
+/*
+ * Parse an ethernet header to fill the ethertype, l2_len, l3_len and
+ * ipproto. This function is able to recognize IPv4/IPv6 with one optional vlan
+ * header. The l4_len argument is only set in case of TCP (useful for TSO).
+ */
+static void
+parse_ethernet(struct ether_hdr *eth_hdr, struct eth_pkt_info *info)
+{
+	struct ipv4_hdr *ipv4_hdr;
+	struct ipv6_hdr *ipv6_hdr;
+
+	info->l2_len = sizeof(struct ether_hdr);
+	info->ethertype = eth_hdr->ether_type;
+
+	if (info->ethertype == _htons(ETHER_TYPE_VLAN)) {
+		struct vlan_hdr *vlan_hdr = (struct vlan_hdr *)(eth_hdr + 1);
+
+		info->l2_len  += sizeof(struct vlan_hdr);
+		info->ethertype = vlan_hdr->eth_proto;
+	}
+
+	switch (info->ethertype) {
+	case _htons(ETHER_TYPE_IPv4):
+		ipv4_hdr = (struct ipv4_hdr *) ((char *)eth_hdr + info->l2_len);
+		parse_ipv4(ipv4_hdr, info);
+		break;
+	case _htons(ETHER_TYPE_IPv6):
+		ipv6_hdr = (struct ipv6_hdr *) ((char *)eth_hdr + info->l2_len);
+		parse_ipv6(ipv6_hdr, info);
+		break;
+	default:
+		info->l4_len = 0;
+		info->l3_len = 0;
+		info->l4_proto = 0;
+		break;
+	}
+}
+
+uint16_t
+rte_vxlan_decap_burst(uint8_t port __rte_unused, uint16_t queue __rte_unused,
+                      struct rte_mbuf *pkts[], uint16_t nb_pkts,
+                      uint16_t max_pkts __rte_unused, void *user_param)
+{
+	struct ether_hdr *eth_hdr;
+	struct eth_pkt_info info;
+	uint16_t outer_hdr_len;
+	uint16_t nb_rx = 0;
+
+	if (user_param == NULL || nb_pkts == 0)
+		return nb_pkts;
+
+	/* Assume the same rule is used on this queue; parse only the first packet */
+	eth_hdr = rte_pktmbuf_mtod(pkts[0], struct ether_hdr *);
+	parse_ethernet(eth_hdr, &info);
+	outer_hdr_len = info.l2_len + info.l3_len + sizeof(struct udp_hdr) +
+			sizeof(struct vxlan_hdr);
+
+	/* Strip the outer Ethernet/IP/UDP/VXLAN headers from every packet */
+	while (nb_rx < nb_pkts)
+		rte_pktmbuf_adj(pkts[nb_rx++], outer_hdr_len);
+
+	return nb_pkts;
+}
+
+/* Encapsulation using SIMD and the flow rule to accelerate this operation */
+uint16_t
+rte_vxlan_encap_burst(uint8_t port __rte_unused, uint16_t queue __rte_unused,
+        struct rte_mbuf *pkts[], uint16_t nb_pkts, void *user_param)
+{
+	struct rte_eth_tunnel_flow *flow;
+	struct vxlan_tx_meta meta;
+	struct ether_hdr *phdr;
+	char *pkt;
+	uint32_t hash;
+	uint16_t len, nb_rx = 0;
+	__m256i temp, cur;
+	__m256i shuf_msk = _mm256_set_epi8(
+		0xFF, 0, 1, 2,           /* high octet 0~2, 24 bits vni */
+		0xFF, 0xFF, 0xFF, 0xFF,  /* skip vx_flags */
+		0xFF, 0xFF, 0xFF, 0xFF,  /* skip udp len, cksum */
+		0xFF, 0xFF,              /* skip udp dst port */
+		8, 9,                    /* high octet 8~9, 16 bits udp src port */
+		8, 9, 10, 11,            /* low octet 8~11, 32 bits dst ip */
+		0, 1, 2, 3,              /* low octet 0~3, 32 bits src ip */
+		0xFF, 0xFF, 0xFF, 0xFF,  /* skip ttl, proto_id, hdr_csum */
+		0xFF, 0xFF, 0xFF, 0xFF   /* skip packet_id, fragment_offset */
+	);
+
+	if (user_param == NULL || nb_pkts == 0)
+		return nb_pkts;
+	flow = ((struct rte_eth_tunnel_conf *)user_param)->tunnel_flow;
+	phdr = rte_pktmbuf_mtod(pkts[0], struct ether_hdr *);
+	hash = rte_hash_crc(phdr, 2 * ETHER_ADDR_LEN, phdr->ether_type);
+	meta.sip = flow->outer_ip_addr.outer_ipv4.dst_ip;
+	meta.dip = flow->outer_ip_addr.outer_ipv4.src_ip;
+	meta.vni = flow->tunnel_id;
+	meta.sport = rte_cpu_to_be_16((((uint64_t)hash * PORT_RANGE) >> 32) + PORT_MIN);
+
+	while (nb_rx < nb_pkts) {
+		len = rte_pktmbuf_pkt_len(pkts[nb_rx]);
+		pkt = rte_pktmbuf_prepend(pkts[nb_rx], VXLAN_PKT_HDR_SIZE);
+		/* load 16B meta into 32B register */
+		cur = _mm256_cvtepu32_epi64(_mm_loadu_si128((__m128i *)&meta));
+		temp = _mm256_set_epi16(0, 0, 0, 0,
+               		0, rte_cpu_to_be_16(len + UDP_PRE_SZ),
+               		rte_cpu_to_be_16(RTE_DEFAULT_VXLAN_PORT), 0,
+               		0, 0, 0, 0,
+               		0, 0x11FF, 0, 0);
+		rte_prefetch1(pkts);
+		cur = _mm256_shuffle_epi8(cur, shuf_msk);
+		cur = _mm256_or_si256(cur, temp);
+
+		/* write 4 Bytes, IP:4B */
+		*(uint32_t *)(pkt + ETHER_HDR_LEN) =
+			rte_cpu_to_be_32(0x4500 << 16 | (len + IP_PRE_SZ));
+		/* write 32Bytes, VXLAN:8 UDP:8 IP:16B */
+		_mm256_storeu_si256((__m256i *)(pkt + PREFIX), cur);
+
+		/* write L2 header */
+		rte_memcpy(pkt, &flow->outer_peer_mac, MAC_LEN);
+		rte_memcpy(pkt + MAC_LEN, &flow->outer_src_mac, MAC_LEN);
+		*(uint16_t *)(pkt + MAC_LEN * 2) = rte_cpu_to_be_16(ETHER_TYPE_IPv4);
+		nb_rx++;
+	}
+
+	return nb_pkts;
+}
-- 
1.7.7.6


* [dpdk-dev] [RFC PATCH 6/6] driver/i40e: tunnel configure in i40e
  2015-12-23  8:49 [dpdk-dev] [RFC PATCH 0/6] General tunneling APIs Jijiang Liu
                   ` (4 preceding siblings ...)
  2015-12-23  8:49 ` [dpdk-dev] [RFC PATCH 5/6] rte_ether: implement encap and decap APIs Jijiang Liu
@ 2015-12-23  8:49 ` Jijiang Liu
  2015-12-23 11:17 ` [dpdk-dev] [RFC PATCH 0/6] General tunneling APIs Walukiewicz, Miroslaw
  2015-12-23 18:31 ` Stephen Hemminger
  7 siblings, 0 replies; 13+ messages in thread
From: Jijiang Liu @ 2015-12-23  8:49 UTC (permalink / raw)
  To: dev

Add i40e_udp_tunnel_flow_configure() to configure a flow rule with 'src IP, dst IP, src port, dst port and tunnel ID' using flow director.

Signed-off-by: Jijiang Liu <jijiang.liu@intel.com>
---
 drivers/net/i40e/i40e_ethdev.c |   41 ++++++++++++++++++++++++++++++++++++++++
 1 files changed, 41 insertions(+), 0 deletions(-)

diff --git a/drivers/net/i40e/i40e_ethdev.c b/drivers/net/i40e/i40e_ethdev.c
index 7e03a1f..7d8c8d7 100644
--- a/drivers/net/i40e/i40e_ethdev.c
+++ b/drivers/net/i40e/i40e_ethdev.c
@@ -469,6 +469,7 @@ static const struct eth_dev_ops i40e_eth_dev_ops = {
 	.rss_hash_conf_get            = i40e_dev_rss_hash_conf_get,
 	.udp_tunnel_add               = i40e_dev_udp_tunnel_add,
 	.udp_tunnel_del               = i40e_dev_udp_tunnel_del,
+	.tunnel_configure             = i40e_dev_tunnel_configure,
 	.filter_ctrl                  = i40e_dev_filter_ctrl,
 	.rxq_info_get                 = i40e_rxq_info_get,
 	.txq_info_get                 = i40e_txq_info_get,
@@ -6029,6 +6030,46 @@ i40e_dev_udp_tunnel_del(struct rte_eth_dev *dev,
 	return ret;
 }
 
+static int
+i40e_udp_tunnel_flow_configure(struct i40e_pf *pf, struct rte_eth_tunnel_conf *tunnel_conf)
+{
+	int  idx, ret;
+	uint8_t filter_idx;
+	struct i40e_hw *hw = I40E_PF_TO_HW(pf);
+
+	/* set filter with src IP + dst IP + src port + dst port + tunnel id*/
+	/* flow director setting */	
+	
+	return 0;
+}
+
+/* Configure a tunnel flow */
+static int
+i40e_dev_tunnel_configure(struct rte_eth_dev *dev,
+			  struct rte_eth_tunnel_conf *tunnel_conf)
+{
+	int ret = 0;
+	struct i40e_pf *pf = I40E_DEV_PRIVATE_TO_PF(dev->data->dev_private);
+
+	if (tunnel_conf == NULL || tunnel_conf->tunnel_flow == NULL)
+		return -EINVAL;
+
+	switch (tunnel_conf->tunnel_flow->tunnel_type) {
+	case RTE_TUNNEL_TYPE_VXLAN:
+	case RTE_TUNNEL_TYPE_GENEVE:
+	case RTE_TUNNEL_TYPE_TEREDO:
+		ret = i40e_udp_tunnel_flow_configure(pf, tunnel_conf);
+		break;
+
+	default:
+		PMD_DRV_LOG(ERR, "Invalid tunnel type");
+		ret = -1;
+		break;
+	}
+
+	return ret;
+}
+
 /* Calculate the maximum number of contiguous PF queues that are configured */
 static int
 i40e_pf_calc_configured_queues_num(struct i40e_pf *pf)
-- 
1.7.7.6


* Re: [dpdk-dev] [RFC PATCH 0/6] General tunneling APIs
  2015-12-23  8:49 [dpdk-dev] [RFC PATCH 0/6] General tunneling APIs Jijiang Liu
                   ` (5 preceding siblings ...)
  2015-12-23  8:49 ` [dpdk-dev] [RFC PATCH 6/6] driver/i40e: tunnel configure in i40e Jijiang Liu
@ 2015-12-23 11:17 ` Walukiewicz, Miroslaw
  2015-12-28  5:54   ` Liu, Jijiang
  2015-12-23 18:31 ` Stephen Hemminger
  7 siblings, 1 reply; 13+ messages in thread
From: Walukiewicz, Miroslaw @ 2015-12-23 11:17 UTC (permalink / raw)
  To: Liu, Jijiang, dev

Hi Jijiang,

I like the idea of a tunnel API very much.

I have a few questions.

1. I see that you have only i40e support due to the lack of HW tunneling support in other NICs.
I don't see how you want to handle tunneling requests for NICs without HW offload.

I think that we should have one common function for sending tunneled packets, but the initialization should check the NIC capabilities and, where HW support is lacking, call a registered function that does the tunneling in SW.

I know that tunneling in SW is a very time-consuming process, but it makes the API more generic. Similarly, only 3 protocols are supported in HW by i40e, and we can imagine about 40 or more different tunnels working with this NIC.

With a SW implementation we could support the missing tunnels even for i40e.
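
To make the capability check concrete, here is a minimal sketch of the selection logic, assuming the NIC reports its HW tunnel offloads as a bitmask. All names (tunnel_type, nic_tunnel_caps, tunnel_select_path) are hypothetical and do not exist in DPDK.

```c
#include <stdint.h>

/* Hypothetical tunnel types; the values are used as bit positions. */
enum tunnel_type {
	TUNNEL_TYPE_VXLAN,
	TUNNEL_TYPE_GENEVE,
	TUNNEL_TYPE_TEREDO,
	TUNNEL_TYPE_NVGRE,
};

enum tunnel_path {
	TUNNEL_PATH_HW, /* NIC performs encap/decap */
	TUNNEL_PATH_SW, /* registered SW function performs encap/decap */
};

/* Assumed capability report: one bit per tunnel type offloaded in HW. */
struct nic_tunnel_caps {
	uint32_t hw_offload_mask;
};

/* Choose the HW path when the NIC supports the tunnel type, else fall back to SW. */
static enum tunnel_path
tunnel_select_path(const struct nic_tunnel_caps *caps, enum tunnel_type type)
{
	if (caps->hw_offload_mask & (UINT32_C(1) << type))
		return TUNNEL_PATH_HW;
	return TUNNEL_PATH_SW;
}
```

With this shape, the application always calls the same send function, and only the initialization decides which path sits behind it.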

2. I understand that we need the RX HW queue defined in struct rte_eth_tunnel_conf, but why is tx_queue necessary?
  As far as I know the i40e HW, we can set tunneled packet descriptors in any HW queue and receive only on one specific queue.

3. I see a similar problem with receiving tunneled packets on a single queue only. I know that some NICs like fm10k can hash packets and push the same tunnel to many queues. Maybe we should also support such an RSS-like feature in the design. I know that it is not supported by i40e, but it is good to have a more flexible API design.
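
A rough sketch of such an RSS-like spread, assuming the tunnel is given a contiguous range of RX queues and a hash of the inner headers is available. The names tunnel_queue_range and tunnel_pick_queue are hypothetical.

```c
#include <stdint.h>

/* Hypothetical range of RX queues assigned to one tunnel. */
struct tunnel_queue_range {
	uint16_t first_queue; /* first RX queue of the tunnel */
	uint16_t nb_queues;   /* number of consecutive queues */
};

/* Pick a queue inside the tunnel's range from a hash of the inner headers,
 * so that packets of one inner flow always land on the same queue. */
static uint16_t
tunnel_pick_queue(const struct tunnel_queue_range *range, uint32_t inner_hash)
{
	if (range->nb_queues == 0)
		return range->first_queue;
	return (uint16_t)(range->first_queue + inner_hash % range->nb_queues);
}
```

A single-queue tunnel, as in the RFC, is then just the special case nb_queues = 1.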

4. In your implementation you are assuming that there is one tunnel configured per DPDK interface:

rte_eth_dev_tunnel_configure(uint8_t port_id,
			     struct rte_eth_tunnel_conf *tunnel_conf)

Tunnels exist because the system runs short of interfaces: the number of possible VLANs is too small (4095).
In DPDK we would have only one tunnel per physical port, which is useless even with the big acceleration provided by i40e.

In normal use cases there is a need for 10,000s of tunnels per interface. Even VXLAN has 24 bits for the tunnel definition.

I think that we need a special API for sending, like rte_eth_dev_tunnel_send_burst, where we provide a tunnel number allocated by rte_eth_dev_tunnel_configure, to avoid setting the tunnel-specific information separately in each descriptor.

Similarly, on RX we should provide callback functions in struct rte_eth_tunnel_conf that take some specific action on a received tunnel; that could be pushing the packet to a user ring, setting the tunnel information in the RX descriptor, or something else.
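
The tunnel number idea could look roughly like this: configuring a tunnel returns a small handle, and the send path only passes that handle instead of the full tunnel description. This is only a sketch with a tiny fixed table; tunnel_alloc and tunnel_lookup are hypothetical helpers, not proposed API names.

```c
#include <stdbool.h>
#include <stdint.h>

#define MAX_TUNNELS   8         /* tiny table, for illustration only */
#define VXLAN_VNI_MAX 0xFFFFFFu /* 24-bit VNI: 16,777,216 possible IDs */

struct tunnel_entry {
	bool in_use;
	uint32_t vni;
};

static struct tunnel_entry tunnel_table[MAX_TUNNELS];

/* Store a tunnel and return its handle, or -1 if the VNI is out of
 * range or the table is full. */
static int
tunnel_alloc(uint32_t vni)
{
	int i;

	if (vni > VXLAN_VNI_MAX)
		return -1;
	for (i = 0; i < MAX_TUNNELS; i++) {
		if (!tunnel_table[i].in_use) {
			tunnel_table[i].in_use = true;
			tunnel_table[i].vni = vni;
			return i;
		}
	}
	return -1;
}

/* Resolve a handle on the send path; returns 0 and fills *vni on success. */
static int
tunnel_lookup(int handle, uint32_t *vni)
{
	if (handle < 0 || handle >= MAX_TUNNELS || !tunnel_table[handle].in_use)
		return -1;
	*vni = tunnel_table[handle].vni;
	return 0;
}
```

A real table would have to scale to the tens of thousands of tunnels mentioned above, so a hash table keyed by the tunnel parameters would be a more likely choice than a linear array.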

5. I see that you have implementations for VXLAN, TEREDO, and GENEVE tunnels in the i40e driver. I could only find the implementation for VXLAN encap/decap. Are all files present in the patch?

6. What about QinQ HW tunneling, which is also supported by the i40e HW? I know that the implementation lives in a different place, but why not include QinQ as an additional tunnel? It would be a very nice feature to have all tunnel APIs in a single place.

Regards,

Mirek




> -----Original Message-----
> From: dev [mailto:dev-bounces@dpdk.org] On Behalf Of Jijiang Liu
> Sent: Wednesday, December 23, 2015 9:50 AM
> To: dev@dpdk.org
> Subject: [dpdk-dev] [RFC PATCH 0/6] General tunneling APIs
> 
> I want to define a set of General tunneling APIs, which are used to
> accelarate tunneling packet processing in DPDK,
> In this RFC patch set, I wll explain my idea using some codes.
> 
> 1. Using flow director offload to define a tunnel flow in a pair of queues.
> 
> flow rule: src IP + dst IP + src port + dst port + tunnel ID (for VXLAN)
> 
> For example:
> 	struct rte_eth_tunnel_conf{
> 	.tunnel_type = VXLAN,
> 	.rx_queue = 1,
> 	.tx_queue = 1,
> 	.filter_type = 'src ip + dst ip + src port + dst port + tunnel id'
> 	.flow_tnl {
>          	.tunnel_type = VXLAN,
>          	.tunnel_id = 100,
>          	.remote_mac = 11.22.33.44.55.66,
>          .ip_type = ipv4,
>          .outer_ipv4.src_ip = 192.168.10.1
>          .outer_ipv4.dst_ip = 10.239.129.11
>          .src_port = 1000,
>          .dst_port =2000
> };
> 
> 2. Configure tunnel flow for a device and for a pair of queues.
> 
> rte_eth_dev_tunnel_configure(0, &rte_eth_tunnel_conf);
> 
> In this API, it will call RX decapsulation and TX encapsulation callback
> function if HW doesn't support encap/decap, and
> a space will be allocated for tunnel configuration and store a pointer to this
> new allocated space as dev->post_rx/tx_burst_cbs[].param.
> 
> rte_eth_add_rx_callback(port_id, tunnel_conf.rx_queue,
>                         rte_eth_tunnel_decap, (void *)tunnel_conf);
> rte_eth_add_tx_callback(port_id, tunnel_conf.tx_queue,
>                         rte_eth_tunnel_encap, (void *)tunnel_conf)
> 
> 3. Using rte_vxlan_decap_burst() to do decapsulation of tunneling packet.
> 
> 4. Using rte_vxlan_encap_burst() to do encapsulation of tunneling packet.
>    The 'src ip, dst ip, src port, dst port and  tunnel ID" can be got from tunnel
> configuration.
>    And SIMD is used to accelarate the operation.
> 
> How to use these APIs, there is a example below:
> 
> 1)at config phase
> 
> dev_config(port, ...);
> tunnel_config(port,...);
> ...
> dev_start(port);
> ...
> rx_burst(port, rxq,... );
> tx_burst(port, txq,...);
> 
> 
> 2)at transmitting packet phase
> The only outer src/dst MAC address need to be set for TX tunnel
> configuration in dev->post_tx_burst_cbs[].param.
> 
> In this patch set, I have not finished all of codes, the purpose of sending
> patch set is that I would like to collect more comments and sugestions on
> this idea.
> 
> 
> Jijiang Liu (6):
>   extend rte_eth_tunnel_flow
>   define tunnel flow structure and APIs
>   implement tunnel flow APIs
>   define rte_vxlan_decap/encap
>   implement rte_vxlan_decap/encap
>   i40e tunnel configure
> 
>  drivers/net/i40e/i40e_ethdev.c             |   41 +++++
>  lib/librte_ether/libtunnel/rte_vxlan_opt.c |  251
> ++++++++++++++++++++++++++++
>  lib/librte_ether/libtunnel/rte_vxlan_opt.h |   49 ++++++
>  lib/librte_ether/rte_eth_ctrl.h            |   14 ++-
>  lib/librte_ether/rte_ethdev.h              |   28 +++
>  lib/librte_ether/rte_ethdev.c              |   60 ++
>  5 files changed, 440 insertions(+), 3 deletions(-)
>  create mode 100644 lib/librte_ether/libtunnel/rte_vxlan_opt.c
>  create mode 100644 lib/librte_ether/libtunnel/rte_vxlan_opt.h
> 
> --
> 1.7.7.6


* Re: [dpdk-dev] [RFC PATCH 0/6] General tunneling APIs
  2015-12-23  8:49 [dpdk-dev] [RFC PATCH 0/6] General tunneling APIs Jijiang Liu
                   ` (6 preceding siblings ...)
  2015-12-23 11:17 ` [dpdk-dev] [RFC PATCH 0/6] General tunneling APIs Walukiewicz, Miroslaw
@ 2015-12-23 18:31 ` Stephen Hemminger
  2015-12-28  1:46   ` Liu, Jijiang
  7 siblings, 1 reply; 13+ messages in thread
From: Stephen Hemminger @ 2015-12-23 18:31 UTC (permalink / raw)
  To: Jijiang Liu; +Cc: dev

On Wed, 23 Dec 2015 16:49:46 +0800
Jijiang Liu <jijiang.liu@intel.com> wrote:

> 1)at config phase
> 
> dev_config(port, ...);
> tunnel_config(port,...);
> ...
> dev_start(port);
> ...
> rx_burst(port, rxq,... );
> tx_burst(port, txq,...);

What about dynamically adding and deleting multiple tunnels after the
device has started? This would be the more common case in a real-world
environment.
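
One way to picture runtime add and delete is a small table keyed on the RFC's flow rule fields (src IP, dst IP, src port, dst port, tunnel ID). The sketch below is a self-contained illustration, not code from the patch set: struct tnl_key, struct tnl_table and the tnl_add()/tnl_del()/tnl_find() helpers are hypothetical names, and a real implementation would also have to program the NIC and synchronize with the RX/TX data path.

```c
#include <assert.h>
#include <stdint.h>

#define TNL_TABLE_SIZE 8

/* Hypothetical flow key, following the rule fields listed in the RFC. */
struct tnl_key {
    uint32_t src_ip, dst_ip;
    uint16_t src_port, dst_port;
    uint32_t tunnel_id;
};

struct tnl_table {
    struct tnl_key keys[TNL_TABLE_SIZE];
    uint8_t used[TNL_TABLE_SIZE];
};

static int tnl_key_equal(const struct tnl_key *a, const struct tnl_key *b)
{
    return a->src_ip == b->src_ip && a->dst_ip == b->dst_ip &&
           a->src_port == b->src_port && a->dst_port == b->dst_port &&
           a->tunnel_id == b->tunnel_id;
}

/* Returns the slot index of key k, or -1 if it is not installed. */
static int tnl_find(const struct tnl_table *t, const struct tnl_key *k)
{
    for (int i = 0; i < TNL_TABLE_SIZE; i++)
        if (t->used[i] && tnl_key_equal(&t->keys[i], k))
            return i;
    return -1;
}

/* Installs k at runtime; returns 0, or -1 on a duplicate or a full table. */
static int tnl_add(struct tnl_table *t, const struct tnl_key *k)
{
    if (tnl_find(t, k) >= 0)
        return -1;
    for (int i = 0; i < TNL_TABLE_SIZE; i++) {
        if (!t->used[i]) {
            t->keys[i] = *k;
            t->used[i] = 1;
            return 0;
        }
    }
    return -1;
}

/* Removes k at runtime; returns 0, or -1 if it was not installed. */
static int tnl_del(struct tnl_table *t, const struct tnl_key *k)
{
    int i = tnl_find(t, k);
    if (i < 0)
        return -1;
    t->used[i] = 0;
    return 0;
}
```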


* Re: [dpdk-dev] [RFC PATCH 5/6] rte_ether: implement encap and decap APIs
  2015-12-23  8:49 ` [dpdk-dev] [RFC PATCH 5/6] rte_ether: implement encap and decap APIs Jijiang Liu
@ 2015-12-23 18:32   ` Stephen Hemminger
  0 siblings, 0 replies; 13+ messages in thread
From: Stephen Hemminger @ 2015-12-23 18:32 UTC (permalink / raw)
  To: Jijiang Liu; +Cc: dev

On Wed, 23 Dec 2015 16:49:51 +0800
Jijiang Liu <jijiang.liu@intel.com> wrote:

> +
> +#ifndef __INTEL_COMPILER
> +#pragma GCC diagnostic ignored "-Wcast-qual"
> +#endif
> +
> +#pragma GCC diagnostic ignored "-Wstrict-aliasing"
> +

Since this is new code, can't you please fix it to be warning safe?
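
As an illustration of what warning-safe can look like, header fields can be read and written with memcpy() through const-correct pointers, which needs neither -Wcast-qual nor -Wstrict-aliasing to be silenced. The sketch below is an assumption about the kind of access involved, not the patch's code: load_be32(), vxlan_get_vni() and vxlan_set_vni() are hypothetical helpers, and the only fact relied on is the VXLAN header layout (8 bytes, with the 24-bit VNI in bytes 4 to 6).

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Reads a big-endian 32-bit value without casting p to uint32_t *. */
static uint32_t load_be32(const void *p)
{
    uint8_t b[4];

    memcpy(b, p, sizeof(b));
    return ((uint32_t)b[0] << 24) | ((uint32_t)b[1] << 16) |
           ((uint32_t)b[2] << 8) | (uint32_t)b[3];
}

/* Returns the 24-bit VNI from an 8-byte VXLAN header. */
static uint32_t vxlan_get_vni(const void *vxlan_hdr)
{
    /* Bytes 4..6 hold the VNI; byte 7 is reserved. */
    return load_be32((const uint8_t *)vxlan_hdr + 4) >> 8;
}

/* Writes a 24-bit VNI into an 8-byte VXLAN header, clearing byte 7. */
static void vxlan_set_vni(void *vxlan_hdr, uint32_t vni)
{
    uint8_t b[4] = {
        (uint8_t)(vni >> 16), (uint8_t)(vni >> 8), (uint8_t)vni, 0
    };

    memcpy((uint8_t *)vxlan_hdr + 4, b, sizeof(b));
}
```

Compilers typically turn these small fixed-size memcpy() calls into plain loads and stores, so this style does not have to cost anything on the fast path.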


* Re: [dpdk-dev] [RFC PATCH 0/6] General tunneling APIs
  2015-12-23 18:31 ` Stephen Hemminger
@ 2015-12-28  1:46   ` Liu, Jijiang
  0 siblings, 0 replies; 13+ messages in thread
From: Liu, Jijiang @ 2015-12-28  1:46 UTC (permalink / raw)
  To: Stephen Hemminger; +Cc: dev



> -----Original Message-----
> From: Stephen Hemminger [mailto:stephen@networkplumber.org]
> Sent: Thursday, December 24, 2015 2:31 AM
> To: Liu, Jijiang
> Cc: dev@dpdk.org
> Subject: Re: [dpdk-dev] [RFC PATCH 0/6] General tunneling APIs
> 
> On Wed, 23 Dec 2015 16:49:46 +0800
> Jijiang Liu <jijiang.liu@intel.com> wrote:
> 
> > 1)at config phase
> >
> > dev_config(port, ...);
> > tunnel_config(port,...);
> > ...
> > dev_start(port);
> > ...
> > rx_burst(port, rxq,... );
> > tx_burst(port, txq,...);
> 
> What about dynamically adding and deleting multiple tunnels after device
> has started? This would be the more common case in a real world
> environment.
Yes, this makes sense; we will support it.


* Re: [dpdk-dev] [RFC PATCH 0/6] General tunneling APIs
  2015-12-23 11:17 ` [dpdk-dev] [RFC PATCH 0/6] General tunneling APIs Walukiewicz, Miroslaw
@ 2015-12-28  5:54   ` Liu, Jijiang
  2016-01-04 10:48     ` Walukiewicz, Miroslaw
  0 siblings, 1 reply; 13+ messages in thread
From: Liu, Jijiang @ 2015-12-28  5:54 UTC (permalink / raw)
  To: Walukiewicz, Miroslaw, dev

Hi Miroslaw,

A partial answer is below.

> -----Original Message-----
> From: Walukiewicz, Miroslaw
> Sent: Wednesday, December 23, 2015 7:18 PM
> To: Liu, Jijiang; dev@dpdk.org
> Subject: RE: [dpdk-dev] [RFC PATCH 0/6] General tunneling APIs
> 
> Hi Jijang,
> 
> I like an idea of tunnel API very much.
> 
> I have a few questions.
> 
> 1. I see that you have only i40e support due to lack of HW tunneling support
> in other NICs.
> I don't see a way how do you want to handle tunneling requests for NICs
> without HW offload.

The flow director offload mechanism is used here; flow director is a common feature in current NICs.
I don't use any tunneling-specific HW offload features here, because the goal is to support all NICs.

> I think that we should have one common function for sending tunneled
> packets but the initialization should check the NIC capabilities and call some
> registered function making tunneling in SW in case of lack of HW support.
Yes, we should check NIC capabilities.
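
A capability check with a software fallback could look roughly like the sketch below. It is only an illustration of the dispatch idea: the CAP_* bits, enum tnl_type, and the encap_hw()/encap_sw() stubs are hypothetical and are not real DPDK offload flags or functions.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical capability bits; these are not real DPDK offload flags. */
#define CAP_VXLAN_ENCAP  (1u << 0)
#define CAP_GENEVE_ENCAP (1u << 1)

enum tnl_type { TNL_VXLAN, TNL_GENEVE };
enum encap_path { ENCAP_HW, ENCAP_SW };

typedef enum encap_path (*encap_fn)(void);

/* Stubs that only report which path was taken. */
static enum encap_path encap_hw(void) { return ENCAP_HW; }
static enum encap_path encap_sw(void) { return ENCAP_SW; }

/*
 * Chosen once at configuration time: use the HW path if the NIC reports
 * the capability for this tunnel type, otherwise register the SW path.
 */
static encap_fn select_encap(uint32_t nic_caps, enum tnl_type type)
{
    uint32_t needed = (type == TNL_VXLAN) ? CAP_VXLAN_ENCAP
                                          : CAP_GENEVE_ENCAP;

    return (nic_caps & needed) ? encap_hw : encap_sw;
}
```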

> I know that making tunnel is very time consuming process, but it makes an
> API more generic. Similar only 3 protocols are supported by i40e by HW and
> we can imagine about 40 or more different tunnels working with this NIC.
> 
> Making the SW implementation we could support missing tunnels even for
> i40e.

In this patch set, I use only the VXLAN protocol to demonstrate the framework.
If the framework is accepted, other tunneling protocols will be added one by one in the future.

> 2. I understand that we need RX HW queue defined in struct
> rte_eth_tunnel_conf but why tx_queue is necessary?.
>   As I know i40e HW we can set tunneled packet descriptors in any HW queue
> and receive only on one specific queue.

As for adding tx_queue here, I have already explained it at [1].

[1] http://dpdk.org/ml/archives/dev/2015-December/030509.html

Do you think it makes sense?

> 4. In your implementation you are assuming the there is one tunnel
> configured per DPDK interface
> 
> rte_eth_dev_tunnel_configure(uint8_t port_id,
> +			     struct rte_eth_tunnel_conf *tunnel_conf)
> 
No. On i40e, there can be up to 8K tunnels on one DPDK interface;
it depends on the number of flow rules on a pair of queues.

struct rte_eth_tunnel_conf {
	uint16_t rx_queue;
	uint16_t tx_queue;
	uint16_t udp_tunnel_port;
	uint16_t nb_flow;
	uint16_t filter_type;
	struct rte_eth_tunnel_flow *tunnel_flow;
};

If 'nb_flow' is set to 2000, you can configure 2000 flow rules on one pair of queues on a port.
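
For illustration, filling such a configuration with 2000 flow rules might look like the sketch below. The types are stand-ins: struct sketch_tunnel_conf mirrors the fields quoted above, but struct sketch_tunnel_flow is a hypothetical, reduced flow rule (the RFC's rte_eth_tunnel_flow is not reproduced here), and sketch_conf_init() is an invented helper. The UDP port 4789 is the IANA-assigned VXLAN port.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical, reduced flow rule; not the RFC's rte_eth_tunnel_flow. */
struct sketch_tunnel_flow {
    uint32_t tunnel_id;
    uint32_t outer_src_ip, outer_dst_ip;
    uint16_t src_port, dst_port;
};

/* Mirrors the fields quoted above, with the sketch flow type swapped in. */
struct sketch_tunnel_conf {
    uint16_t rx_queue;
    uint16_t tx_queue;
    uint16_t udp_tunnel_port;
    uint16_t nb_flow;
    uint16_t filter_type;
    struct sketch_tunnel_flow *tunnel_flow;
};

/* Allocates nb_flow rules with consecutive tunnel IDs; returns 0 or -1. */
static int sketch_conf_init(struct sketch_tunnel_conf *conf, uint16_t queue,
                            uint16_t nb_flow, uint32_t first_id)
{
    if (nb_flow == 0)
        return -1;
    conf->tunnel_flow = calloc(nb_flow, sizeof(*conf->tunnel_flow));
    if (conf->tunnel_flow == NULL)
        return -1;
    conf->rx_queue = queue;
    conf->tx_queue = queue;
    conf->udp_tunnel_port = 4789;  /* IANA-assigned VXLAN port */
    conf->nb_flow = nb_flow;
    conf->filter_type = 0;         /* placeholder value for the sketch */
    for (uint16_t i = 0; i < nb_flow; i++)
        conf->tunnel_flow[i].tunnel_id = first_id + i;
    return 0;
}

/* Releases the flow table allocated by sketch_conf_init(). */
static void sketch_conf_free(struct sketch_tunnel_conf *conf)
{
    free(conf->tunnel_flow);
    conf->tunnel_flow = NULL;
    conf->nb_flow = 0;
}
```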

> The sense of tunnel is lack of interfaces in the system because number of
> possible VLANs is too small (4095).
> In the DPDK we have only one tunnel per physical port what is useless even
> with such big acceleration provided with i40e.

> In normal use cases there is a need for 10,000s of tunnels per interface. Even
> for Vxlan we have 24 bits for tunnel definition


We use flow director HW offload here; on i40e, it supports up to 8K exact-match flow rules.
This is a HW limitation: 10,000s of tunnels per interface are not supported by the HW.
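
To put the numbers side by side: a 24-bit VXLAN tunnel ID allows 2^24 = 16,777,216 distinct tunnels, while the exact-match table holds about 8K rules. The sketch below works through that arithmetic and shows one possible policy, which is an assumption and not something this patch set implements: flows beyond the HW limit are handled in software. "8K" is read as 8192 here, and place_flows() is a hypothetical helper.

```c
#include <assert.h>
#include <stdint.h>

#define VXLAN_VNI_BITS 24
#define HW_RULE_LIMIT  8192u  /* "8K" read as 8192; an assumption */

/* Number of distinct tunnel IDs a 24-bit VNI can express. */
static uint32_t vni_space(void)
{
    return 1u << VXLAN_VNI_BITS;
}

struct placement {
    uint32_t hw;  /* flows that fit in the HW exact-match table */
    uint32_t sw;  /* flows left for a software path */
};

/* Splits a requested flow count between the HW table and software. */
static struct placement place_flows(uint32_t requested)
{
    struct placement p;

    p.hw = requested < HW_RULE_LIMIT ? requested : HW_RULE_LIMIT;
    p.sw = requested - p.hw;
    return p;
}
```

With 10,000 requested flows this split puts 8192 in hardware and leaves 1808 for software.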


> 5. I see that you have implementations for VXLAN,TEREDO, and GENEVE
> tunnels in i40e drivers. I could  find the implementation for VXLAN
> encap/decap. Are all files in the patch present?
No, I have not finished all of the code; only VXLAN is here.
Other tunneling protocols will be added one by one in the future.



* Re: [dpdk-dev] [RFC PATCH 0/6] General tunneling APIs
  2015-12-28  5:54   ` Liu, Jijiang
@ 2016-01-04 10:48     ` Walukiewicz, Miroslaw
  0 siblings, 0 replies; 13+ messages in thread
From: Walukiewicz, Miroslaw @ 2016-01-04 10:48 UTC (permalink / raw)
  To: Liu, Jijiang, dev

Hi Jijiang,

My comments are below, marked MW>.

> -----Original Message-----
> From: Liu, Jijiang
> Sent: Monday, December 28, 2015 6:55 AM
> To: Walukiewicz, Miroslaw; dev@dpdk.org
> Subject: RE: [dpdk-dev] [RFC PATCH 0/6] General tunneling APIs
> 
> Hi Miroslaw,
> 
> The partial answer is below.
> 
> > -----Original Message-----
> > From: Walukiewicz, Miroslaw
> > Sent: Wednesday, December 23, 2015 7:18 PM
> > To: Liu, Jijiang; dev@dpdk.org
> > Subject: RE: [dpdk-dev] [RFC PATCH 0/6] General tunneling APIs
> >
> > Hi Jijang,
> >
> > I like an idea of tunnel API very much.
> >
> > I have a few questions.
> >
> > 1. I see that you have only i40e support due to lack of HW tunneling
> support
> > in other NICs.
> > I don't see a way how do you want to handle tunneling requests for NICs
> > without HW offload.
> 
> The flow director offload mechanism is used here, flow director is a
> common feature in current NICs.
> Here I don't use special related tunneling HW offload features, the goal is
> that we want to support  all of NICs.
> 
> > I think that we should have one common function for sending tunneled
> > packets but the initialization should check the NIC capabilities and call
> some
> > registered function making tunneling in SW in case of lack of HW support.
> Yes, we should check NIC capabilities.
> 
> > I know that making tunnel is very time consuming process, but it makes an
> > API more generic. Similar only 3 protocols are supported by i40e by HW
> and
> > we can imagine about 40 or more different tunnels working with this NIC.
> >
> > Making the SW implementation we could support missing tunnels even for
> > i40e.
> 
> In this patch set, I just use VXLAN protocol to demonstrate the framework,
> If the framework is accepted, other tunneling protocol will be added one by
> one in future.
> 
> > 2. I understand that we need RX HW queue defined in struct
> > rte_eth_tunnel_conf but why tx_queue is necessary?.
> >   As I know i40e HW we can set tunneled packet descriptors in any HW
> queue
> > and receive only on one specific queue.
> 
> As for adding tx_queue here, I have already explained here at [1]
> 
> [1] http://dpdk.org/ml/archives/dev/2015-December/030509.html
> 
> Do you think it makes sense?

MW> Unfortunately, I do not see any explanation for using the tx_queue parameter in that thread.
To me this parameter is not necessary. The tunnels will work without it anyway, as they are set in the packet descriptor.

> 
> > 4. In your implementation you are assuming the there is one tunnel
> > configured per DPDK interface
> >
> > rte_eth_dev_tunnel_configure(uint8_t port_id,
> > +			     struct rte_eth_tunnel_conf *tunnel_conf)
> >
> No, in terms of i40e,  there will  be up to 8K tunnels  in one DPDK interface,
> It depends on number of flow rules on a pair of queues.
> 
> struct rte_eth_tunnel_conf {
> 	uint16_t rx_queue;
> 	uint16_t tx_queue;
> 	uint16_t udp_tunnel_port;
> 	uint16_t nb_flow;
> 	uint16_t filter_type;
> 	struct rte_eth_tunnel_flow *tunnel_flow;
> };
> 
> If the ' nb_flow ' is set 2000, and you can configure 2000 flow rules on one
> queues on a port.

MW> So in your design, tunnel_flow is a table of rte_eth_tunnel_flow structures.
I did not catch that.

I hope that you will add the ability to dynamically add and remove tunnels on an interface.

What is the purpose of the udp_tunnel_port parameter, given that the tunnel_flow structure also provides the same parameter?

Similarly, tunnel_type should also be part of tunnel_flow, since we intend to support different tunnels on a single interface (not only VXLAN).
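
A rough sketch of what per-flow tunnel type and UDP port could look like is below. It is an illustration of the suggestion only: struct sk_flow and the sk_* helpers are hypothetical names, not part of the patch set. The default ports are the IANA-assigned ones (4789 for VXLAN, 6081 for GENEVE, 3544 for Teredo).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

enum sk_tunnel_type { SK_TUNNEL_VXLAN, SK_TUNNEL_GENEVE, SK_TUNNEL_TEREDO };

/* Hypothetical flow rule that carries its own tunnel type and UDP port. */
struct sk_flow {
    enum sk_tunnel_type tunnel_type;
    uint16_t udp_port;   /* 0 means "use the default for this type" */
    uint32_t tunnel_id;
};

/* IANA-assigned default UDP port for each tunnel type. */
static uint16_t sk_default_port(enum sk_tunnel_type type)
{
    switch (type) {
    case SK_TUNNEL_VXLAN:  return 4789;
    case SK_TUNNEL_GENEVE: return 6081;
    case SK_TUNNEL_TEREDO: return 3544;
    }
    return 0;
}

/* Effective UDP port of a flow: its own override, else the type default. */
static uint16_t sk_flow_port(const struct sk_flow *f)
{
    return f->udp_port != 0 ? f->udp_port : sk_default_port(f->tunnel_type);
}

/* Counts flows of one type in a mixed table on a single interface. */
static size_t sk_count_type(const struct sk_flow *flows, size_t n,
                            enum sk_tunnel_type type)
{
    size_t count = 0;

    for (size_t i = 0; i < n; i++)
        if (flows[i].tunnel_type == type)
            count++;
    return count;
}
```

With the type and port stored per flow, the top-level configuration no longer needs its own udp_tunnel_port, which is the redundancy questioned above.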

> 
> > The sense of tunnel is lack of interfaces in the system because number of
> > possible VLANs is too small (4095).
> > In the DPDK we have only one tunnel per physical port what is useless even
> > with such big acceleration provided with i40e.
> 
> > In normal use cases there is a need for 10,000s of tunnels per interface.
> Even
> > for Vxlan we have 24 bits for tunnel definition
> 
> 
> We use flow director HW offload here, in terms of i40e, it support up to 8K
> flow rules of exact match.
> This is HW limitation, 10,000s of tunnels per interface is not supported by
> HW.
> 
> 
> > 5. I see that you have implementations for VXLAN,TEREDO, and GENEVE
> > tunnels in i40e drivers. I could  find the implementation for VXLAN
> > encap/decap. Are all files in the patch present?
> No, I have not finished all of codes, just VXLAN here.
> Other tunneling protocol will be added one by one in future.
> 
