[dpdk-dev] [PATCH v4 0/3] i40e VXLAN TX checksum rework

DPDK patches and discussions
 help / color / mirror / Atom feed

* [dpdk-dev] [PATCH v4 0/3] i40e VXLAN TX checksum rework
@ 2014-12-02  6:52 Jijiang Liu
  2014-12-02  6:52 ` [dpdk-dev] [PATCH v4 1/3] mbuf:redefine three TX ol_flags Jijiang Liu
                   ` (2 more replies)
  0 siblings, 3 replies; 9+ messages in thread
From: Jijiang Liu @ 2014-12-02  6:52 UTC (permalink / raw)
  To: dev

We have got some feedback about backward compatibility of VXLAN TX checksum offload API with 1G/10G NIC after the i40e VXLAN TX checksum codes were applied, so we have to rework the APIs on i40e, including the changes of mbuf, i40e PMD and csum forward engine.
 
The main changes in mbuf are as follows, in place of removing PKT_TX_VXLAN_CKSUM, we introduce 4 new flags: PKT_TX_OUTER_IP_CKSUM, PKT_TX_OUTER_IPV4, PKT_TX_OUTER_IPV6 and PKT_TX_UDP_TUNNEL_PKT. Replace the inner_l2_len and the inner_l3_len field with the outer_l2_len and outer_l3_len field.
 
Let's use a few examples to demonstrate how to use these new flags and existing flags in rte_mbuf.h
Let say we have a tunnel packet: eth_hdr_out/ipv4_hdr_out/udp_hdr_out/vxlan_hdr/ehtr_hdr_in/ipv4_hdr_in/tcp_hdr_in. There could be several scenarios:
 
A) User requests HW offload for ipv4_hdr_out checksum.
He doesn't care is it a tunnelled packet or not. So he sets:
 
mb->l2_len = eth_hdr_out;
mb->l3_len = ipv4_hdr_out;
mb->ol_flags |= PKT_TX_IPV4_CSUM;
 
B) User is aware that it is a tunnelled packet and requests HW offload for ipv4_hdr_in and tcp_hdr_in *only*.
He doesn't care about outer IP checksum offload. In that case, for FVL  he has 2 choices:
   1. Treat that packet as a 'proper' tunnelled packet, and fill all the fields:
     mb->l2_len = udp_hdr_out + vxlan_hdr +eth_hdr_in;
     mb->l3_len = ipv4_hdr_in;
     mb->outer_l2_len = eth_hdr_out;
     mb->outer_l3_len = ipv4_hdr_out;
     mb->ol_flags |= PKT_TX_UDP_TUNNEL_PKT | PKT_TX_IP_CKSUM |  PKT_TX_TCP_CKSUM;

   2. As user doesn't care about outer IP hdr checksum, he can treat everything before ipv4_hdr_in as L2 header.
   So he knows, that it is a tunnelled packet, but makes HW to treat it as ordinary (non-tunnelled) packet:
     mb->l2_len = eth_hdr_out + ipv4_hdr_out + udp_hdr_out + vxlan_hdr + ehtr_hdr_in;
     mb->l3_len = ipv4_hdr_in;
     mb->ol_flags |= PKT_TX_IP_CKSUM |  PKT_TX_TCP_CKSUM;
 
i40e PMD will support both B.1 and B.2, but ixgbe/igb/em PMD supports only B.2.
if HW supports both - it will be up to user app which method to choose.
tespmd will support both methods, and it should be configurable by user which approach to use (cmdline parameter).
So the user can try/test both methods and select an appropriate for him.
 
C) User knows that is a tunnelled packet, and wants HW offload for all 3 checksums:  outer IP hdr checksum, inner IP checksum, inner TCP checksum.
Then he has to setup all TX checksum fields:
     mb->l2_len =  udp_hdr_out + vxlan_hdr +eth_hdr_in;;
     mb->l3_len = ipv4_hdr_in;
     mb->outer_l2_len = eth_hdr_out;
     mb->outer_l3_len = ipv4_hdr_out;
     mb->ol_flags |= PKT_TX_OUT_IP_CKSUM  | PKT_TX_UDP_TUNNEL_PKT | PKT_TX_IP_CKSUM |  PKT_TX_TCP_CKSUM;

Change notes:
v2 changes:
     remove PKT_TX_IP_CKSUM alias.
     add PKT_TX_OUT_IP_CKSUM and PKT_TX_OUTER_IPV6 in rte_get_tx_ol_flag_name.
     spliting mbuf changes into two patches.
     fix MACLEN caculation issue in i40e driver
     fix some issues in csumonly.c
     change cover letter.
v3 changes:
     fix MACLEN caculation issue in i40e driver when non-tunneling packet 
v4 changes:
     reorganize patches to avoid compilation to be broken between patches.
     remove l4_tun_len from mbuf structure.
     add PKT_TX_OUTER_IPV4 to indicate no IP checksum offload requirement for tunneling packet.
     change i40e PMD and csum engine due to above changes.
 
Jijiang Liu (3):
  Redefine PKT_TX_IPV4, PKT_TX_IPV6 and PKT_TX_VLAN_PKT;
  Replace PKT_TX_VXLAN_CKSUM with PKT_TX_UDP_TUNNEL_PKT, and add 3 TX flags, which are PKT_TX_OUTER_IP_CKSUM, PKT_TX_OUTER_IPV4 and PKT_TX_OUTER_IPV6,and rework csum forward engine and i40e pmd due to these changes;
  Replace the inner_l2_len and the inner_l3_len field with the outer_l2_len and outer_l3_len field, and rework csum forward engine and i40e pmd due to  these changes;

 app/test-pmd/csumonly.c         |   69 ++++++++++++++++++++++----------------
 lib/librte_mbuf/rte_mbuf.c      |    7 +++-
 lib/librte_mbuf/rte_mbuf.h      |   25 +++++++++----
 lib/librte_pmd_i40e/i40e_rxtx.c |   44 +++++++++++++------------
 4 files changed, 86 insertions(+), 59 deletions(-)

-- 
1.7.7.6

^ permalink raw reply	[flat|nested] 9+ messages in thread

* [dpdk-dev] [PATCH v4 1/3] mbuf:redefine three TX ol_flags
  2014-12-02  6:52 [dpdk-dev] [PATCH v4 0/3] i40e VXLAN TX checksum rework Jijiang Liu
@ 2014-12-02  6:52 ` Jijiang Liu
  2014-12-02  6:52 ` [dpdk-dev] [PATCH v4 2/3] mbuf:add three TX ol_flags and repalce PKT_TX_VXLAN_CKSUM Jijiang Liu
  2014-12-02  6:52 ` [dpdk-dev] [PATCH v4 3/3] mbuf:replace the inner_l2_len and the inner_l3_len fields Jijiang Liu
  2 siblings, 0 replies; 9+ messages in thread
From: Jijiang Liu @ 2014-12-02  6:52 UTC (permalink / raw)
  To: dev

The reason of redefining the PKT_TX_IPV4 and the PKT_TX_IPV6 is listed below,
It will avoid to send a packet with a bad info:
  - we receive a Ether/IP6/IP4/L4/data packet
  - the driver sets PKT_RX_IPV6_HDR
  - the stack decapsulates IP6
  - the stack sends the packet, it has the PKT_TX_IPV6 flag but it's an IPv4 packet.

Signed-off-by: Jijiang Liu <jijiang.liu@intel.com>
---
 lib/librte_mbuf/rte_mbuf.h |   10 +++++-----
 1 files changed, 5 insertions(+), 5 deletions(-)

diff --git a/lib/librte_mbuf/rte_mbuf.h b/lib/librte_mbuf/rte_mbuf.h
index 2e5fce5..cbadf8e 100644
--- a/lib/librte_mbuf/rte_mbuf.h
+++ b/lib/librte_mbuf/rte_mbuf.h
@@ -141,13 +141,13 @@ extern "C" {
 #define PKT_TX_IP_CKSUM      (1ULL << 54) /**< IP cksum of TX pkt. computed by NIC. */
 #define PKT_TX_IPV4_CSUM     PKT_TX_IP_CKSUM /**< Alias of PKT_TX_IP_CKSUM. */
 
-/** Tell the NIC it's an IPv4 packet. Required for L4 checksum offload or TSO. */
-#define PKT_TX_IPV4          PKT_RX_IPV4_HDR
+/** Packet is IPv4 without requiring IP checksum offload. */
+#define PKT_TX_IPV4          (1ULL << 55)
 
-/** Tell the NIC it's an IPv6 packet. Required for L4 checksum offload or TSO. */
-#define PKT_TX_IPV6          PKT_RX_IPV6_HDR
+/** Tell the NIC it's an IPv6 packet.*/
+#define PKT_TX_IPV6          (1ULL << 56)
 
-#define PKT_TX_VLAN_PKT      (1ULL << 55) /**< TX packet is a 802.1q VLAN packet. */
+#define PKT_TX_VLAN_PKT      (1ULL << 57) /**< TX packet is a 802.1q VLAN packet. */
 
 /* Use final bit of flags to indicate a control mbuf */
 #define CTRL_MBUF_FLAG       (1ULL << 63) /**< Mbuf contains control data */
-- 
1.7.7.6

^ permalink raw reply	[flat|nested] 9+ messages in thread

* [dpdk-dev] [PATCH v4 2/3] mbuf:add three TX ol_flags and repalce PKT_TX_VXLAN_CKSUM
  2014-12-02  6:52 [dpdk-dev] [PATCH v4 0/3] i40e VXLAN TX checksum rework Jijiang Liu
  2014-12-02  6:52 ` [dpdk-dev] [PATCH v4 1/3] mbuf:redefine three TX ol_flags Jijiang Liu
@ 2014-12-02  6:52 ` Jijiang Liu
  2014-12-02  6:52 ` [dpdk-dev] [PATCH v4 3/3] mbuf:replace the inner_l2_len and the inner_l3_len fields Jijiang Liu
  2 siblings, 0 replies; 9+ messages in thread
From: Jijiang Liu @ 2014-12-02  6:52 UTC (permalink / raw)
  To: dev

Replace PKT_TX_VXLAN_CKSUM with PKT_TX_UDP_TUNNEL_PKT in order to indicate a packet is an UDP tunneling packet, and introduce 3 TX offload flags for outer IP TX checksum, which are PKT_TX_OUTER_IP_CKSUM, PKT_TX_OUTER_IPV4 and PKT_TX_OUTER_IPV6 respectively;Rework csum forward engine and i40e PMD due to these changes.

Signed-off-by: Jijiang Liu <jijiang.liu@intel.com>
---
 app/test-pmd/csumonly.c         |    9 +++++++--
 lib/librte_mbuf/rte_mbuf.c      |    7 ++++++-
 lib/librte_mbuf/rte_mbuf.h      |   11 ++++++++++-
 lib/librte_pmd_i40e/i40e_rxtx.c |    6 +++---
 4 files changed, 26 insertions(+), 7 deletions(-)

diff --git a/app/test-pmd/csumonly.c b/app/test-pmd/csumonly.c
index d8c080a..9094967 100644
--- a/app/test-pmd/csumonly.c
+++ b/app/test-pmd/csumonly.c
@@ -257,7 +257,7 @@ process_outer_cksums(void *outer_l3_hdr, uint16_t outer_ethertype,
 	uint64_t ol_flags = 0;
 
 	if (testpmd_ol_flags & TESTPMD_TX_OFFLOAD_VXLAN_CKSUM)
-		ol_flags |= PKT_TX_VXLAN_CKSUM;
+		ol_flags |= PKT_TX_UDP_TUNNEL_PKT;
 
 	if (outer_ethertype == _htons(ETHER_TYPE_IPv4)) {
 		ipv4_hdr->hdr_checksum = 0;
@@ -470,7 +470,12 @@ pkt_burst_checksum_forward(struct fwd_stream *fs)
 				{ PKT_TX_UDP_CKSUM, PKT_TX_L4_MASK },
 				{ PKT_TX_TCP_CKSUM, PKT_TX_L4_MASK },
 				{ PKT_TX_SCTP_CKSUM, PKT_TX_L4_MASK },
-				{ PKT_TX_VXLAN_CKSUM, PKT_TX_VXLAN_CKSUM },
+				{ PKT_TX_UDP_TUNNEL_PKT, PKT_TX_UDP_TUNNEL_PKT },
+				{ PKT_TX_IPV4, PKT_TX_IPV4 },
+				{ PKT_TX_IPV6, PKT_TX_IPV6 },
+				{ PKT_TX_OUTER_IP_CKSUM, PKT_TX_OUTER_IP_CKSUM },
+				{ PKT_TX_OUTER_IPV4, PKT_TX_OUTER_IPV4 },
+				{ PKT_TX_OUTER_IPV6, PKT_TX_OUTER_IPV6 },
 				{ PKT_TX_TCP_SEG, PKT_TX_TCP_SEG },
 			};
 			unsigned j;
diff --git a/lib/librte_mbuf/rte_mbuf.c b/lib/librte_mbuf/rte_mbuf.c
index 87c2963..1b14e02 100644
--- a/lib/librte_mbuf/rte_mbuf.c
+++ b/lib/librte_mbuf/rte_mbuf.c
@@ -240,8 +240,13 @@ const char *rte_get_tx_ol_flag_name(uint64_t mask)
 	case PKT_TX_SCTP_CKSUM: return "PKT_TX_SCTP_CKSUM";
 	case PKT_TX_UDP_CKSUM: return "PKT_TX_UDP_CKSUM";
 	case PKT_TX_IEEE1588_TMST: return "PKT_TX_IEEE1588_TMST";
-	case PKT_TX_VXLAN_CKSUM: return "PKT_TX_VXLAN_CKSUM";
+	case PKT_TX_UDP_TUNNEL_PKT: return "PKT_TX_UDP_TUNNEL_PKT";
 	case PKT_TX_TCP_SEG: return "PKT_TX_TCP_SEG";
+	case PKT_TX_IPV4: return "PKT_TX_IPV4";
+	case PKT_TX_IPV6: return "PKT_TX_IPV6";
+	case PKT_TX_OUTER_IP_CKSUM: return "PKT_TX_OUTER_IP_CKSUM";
+	case PKT_TX_OUTER_IPV4: return "PKT_TX_OUTER_IPV4";
+	case PKT_TX_OUTER_IPV6: return "PKT_TX_OUTER_IPV6";
 	default: return NULL;
 	}
 }
diff --git a/lib/librte_mbuf/rte_mbuf.h b/lib/librte_mbuf/rte_mbuf.h
index cbadf8e..6eb898f 100644
--- a/lib/librte_mbuf/rte_mbuf.h
+++ b/lib/librte_mbuf/rte_mbuf.h
@@ -118,7 +118,7 @@ extern "C" {
  */
 #define PKT_TX_TCP_SEG       (1ULL << 49)
 
-#define PKT_TX_VXLAN_CKSUM   (1ULL << 50) /**< TX checksum of VXLAN computed by NIC */
+#define PKT_TX_UDP_TUNNEL_PKT (1ULL << 50) /**< TX packet is an UDP tunneling packet */
 #define PKT_TX_IEEE1588_TMST (1ULL << 51) /**< TX IEEE1588 packet to timestamp. */
 
 /**
@@ -149,6 +149,15 @@ extern "C" {
 
 #define PKT_TX_VLAN_PKT      (1ULL << 57) /**< TX packet is a 802.1q VLAN packet. */
 
+/** Outer IP checksum of TX packet, computed by NIC for tunneling packet */
+#define PKT_TX_OUTER_IP_CKSUM   (1ULL << 58)
+
+/** Packet is outer IPv4 without requiring IP checksum offload for tunneling packet. */
+#define PKT_TX_OUTER_IPV4   (1ULL << 59)
+
+/** Tell the NIC it's an outer IPv6 packet for tunneling packet */
+#define PKT_TX_OUTER_IPV6    (1ULL << 60)
+
 /* Use final bit of flags to indicate a control mbuf */
 #define CTRL_MBUF_FLAG       (1ULL << 63) /**< Mbuf contains control data */
 
diff --git a/lib/librte_pmd_i40e/i40e_rxtx.c b/lib/librte_pmd_i40e/i40e_rxtx.c
index 2d2ef04..078e973 100644
--- a/lib/librte_pmd_i40e/i40e_rxtx.c
+++ b/lib/librte_pmd_i40e/i40e_rxtx.c
@@ -478,7 +478,7 @@ i40e_txd_enable_checksum(uint64_t ol_flags,
 	}
 
 	/* VXLAN packet TX checksum offload */
-	if (unlikely(ol_flags & PKT_TX_VXLAN_CKSUM)) {
+	if (unlikely(ol_flags & PKT_TX_UDP_TUNNEL_PKT)) {
 		uint8_t l4tun_len;
 
 		l4tun_len = ETHER_VXLAN_HLEN + inner_l2_len;
@@ -1158,8 +1158,8 @@ i40e_calc_context_desc(uint64_t flags)
 {
 	uint64_t mask = 0ULL;
 
-	if (flags | PKT_TX_VXLAN_CKSUM)
-		mask |= PKT_TX_VXLAN_CKSUM;
+	if (flags | PKT_TX_UDP_TUNNEL_PKT)
+		mask |= PKT_TX_UDP_TUNNEL_PKT;
 
 #ifdef RTE_LIBRTE_IEEE1588
 	mask |= PKT_TX_IEEE1588_TMST;
-- 
1.7.7.6

^ permalink raw reply	[flat|nested] 9+ messages in thread

* [dpdk-dev] [PATCH v4 3/3] mbuf:replace the inner_l2_len and the inner_l3_len fields
  2014-12-02  6:52 [dpdk-dev] [PATCH v4 0/3] i40e VXLAN TX checksum rework Jijiang Liu
  2014-12-02  6:52 ` [dpdk-dev] [PATCH v4 1/3] mbuf:redefine three TX ol_flags Jijiang Liu
  2014-12-02  6:52 ` [dpdk-dev] [PATCH v4 2/3] mbuf:add three TX ol_flags and repalce PKT_TX_VXLAN_CKSUM Jijiang Liu
@ 2014-12-02  6:52 ` Jijiang Liu
  2014-12-02 14:53   ` didier.pallard
  2 siblings, 1 reply; 9+ messages in thread
From: Jijiang Liu @ 2014-12-02  6:52 UTC (permalink / raw)
  To: dev

Replace the inner_l2_len and the inner_l3_len field with the outer_l2_len and outer_l3_len field, and rework csum forward engine and i40e PMD due to  these changes.

Signed-off-by: Jijiang Liu <jijiang.liu@intel.com>
---
 app/test-pmd/csumonly.c         |   60 +++++++++++++++++++++-----------------
 lib/librte_mbuf/rte_mbuf.h      |    4 +-
 lib/librte_pmd_i40e/i40e_rxtx.c |   38 +++++++++++++-----------
 3 files changed, 55 insertions(+), 47 deletions(-)

diff --git a/app/test-pmd/csumonly.c b/app/test-pmd/csumonly.c
index 9094967..e874ac5 100644
--- a/app/test-pmd/csumonly.c
+++ b/app/test-pmd/csumonly.c
@@ -189,11 +189,12 @@ process_inner_cksums(void *l3_hdr, uint16_t ethertype, uint16_t l3_len,
 		} else {
 			if (testpmd_ol_flags & TESTPMD_TX_OFFLOAD_IP_CKSUM)
 				ol_flags |= PKT_TX_IP_CKSUM;
-			else
+			else {
 				ipv4_hdr->hdr_checksum =
 					rte_ipv4_cksum(ipv4_hdr);
+				ol_flags |= PKT_TX_IPV4;
+			}
 		}
-		ol_flags |= PKT_TX_IPV4;
 	} else if (ethertype == _htons(ETHER_TYPE_IPv6))
 		ol_flags |= PKT_TX_IPV6;
 	else
@@ -262,22 +263,25 @@ process_outer_cksums(void *outer_l3_hdr, uint16_t outer_ethertype,
 	if (outer_ethertype == _htons(ETHER_TYPE_IPv4)) {
 		ipv4_hdr->hdr_checksum = 0;
 
-		if ((testpmd_ol_flags & TESTPMD_TX_OFFLOAD_VXLAN_CKSUM) == 0)
+		if (testpmd_ol_flags & TESTPMD_TX_OFFLOAD_VXLAN_CKSUM)
+			ol_flags |= PKT_TX_OUTER_IP_CKSUM;
+		else {
+			ol_flags |= PKT_TX_OUTER_IPV4;
 			ipv4_hdr->hdr_checksum = rte_ipv4_cksum(ipv4_hdr);
-	}
+		}
+	} else
+		ol_flags |= PKT_TX_OUTER_IPV6;
 
 	udp_hdr = (struct udp_hdr *)((char *)outer_l3_hdr + outer_l3_len);
 	/* do not recalculate udp cksum if it was 0 */
 	if (udp_hdr->dgram_cksum != 0) {
 		udp_hdr->dgram_cksum = 0;
-		if ((testpmd_ol_flags & TESTPMD_TX_OFFLOAD_VXLAN_CKSUM) == 0) {
-			if (outer_ethertype == _htons(ETHER_TYPE_IPv4))
-				udp_hdr->dgram_cksum =
-					rte_ipv4_udptcp_cksum(ipv4_hdr, udp_hdr);
-			else
-				udp_hdr->dgram_cksum =
-					rte_ipv6_udptcp_cksum(ipv6_hdr, udp_hdr);
-		}
+		if (outer_ethertype == _htons(ETHER_TYPE_IPv4))
+			udp_hdr->dgram_cksum =
+				rte_ipv4_udptcp_cksum(ipv4_hdr, udp_hdr);
+		else
+			udp_hdr->dgram_cksum =
+				rte_ipv6_udptcp_cksum(ipv6_hdr, udp_hdr);
 	}
 
 	return ol_flags;
@@ -303,8 +307,7 @@ process_outer_cksums(void *outer_l3_hdr, uint16_t outer_ethertype,
  * TESTPMD_TX_OFFLOAD_* in ports[tx_port].tx_ol_flags. They control
  * wether a checksum must be calculated in software or in hardware. The
  * IP, UDP, TCP and SCTP flags always concern the inner layer.  The
- * VxLAN flag concerns the outer IP and UDP layer (if packet is
- * recognized as a vxlan packet).
+ * VxLAN flag concerns the outer IP(if packet is recognized as a vxlan packet).
  */
 static void
 pkt_burst_checksum_forward(struct fwd_stream *fs)
@@ -320,7 +323,7 @@ pkt_burst_checksum_forward(struct fwd_stream *fs)
 	uint16_t i;
 	uint64_t ol_flags;
 	uint16_t testpmd_ol_flags;
-	uint8_t l4_proto;
+	uint8_t l4_proto, l4_tun_len = 0;
 	uint16_t ethertype = 0, outer_ethertype = 0;
 	uint16_t l2_len = 0, l3_len = 0, l4_len = 0;
 	uint16_t outer_l2_len = 0, outer_l3_len = 0;
@@ -360,6 +363,7 @@ pkt_burst_checksum_forward(struct fwd_stream *fs)
 
 		ol_flags = 0;
 		tunnel = 0;
+		l4_tun_len = 0;
 		m = pkts_burst[i];
 
 		/* Update the L3/L4 checksum error packet statistics */
@@ -378,14 +382,16 @@ pkt_burst_checksum_forward(struct fwd_stream *fs)
 		if (l4_proto == IPPROTO_UDP) {
 			udp_hdr = (struct udp_hdr *)((char *)l3_hdr + l3_len);
 
+			/* check udp destination port, 4789 is the default
+			 * vxlan port (rfc7348) */
+			if (udp_hdr->dst_port == _htons(4789)) {
+				l4_tun_len = ETHER_VXLAN_HLEN;
+				tunnel = 1;
+
 			/* currently, this flag is set by i40e only if the
 			 * packet is vxlan */
-			if (((m->ol_flags & PKT_RX_TUNNEL_IPV4_HDR) ||
-					(m->ol_flags & PKT_RX_TUNNEL_IPV6_HDR)))
-				tunnel = 1;
-			/* else check udp destination port, 4789 is the default
-			 * vxlan port (rfc7348) */
-			else if (udp_hdr->dst_port == _htons(4789))
+			} else if (m->ol_flags & (PKT_RX_TUNNEL_IPV4_HDR |
+					PKT_RX_TUNNEL_IPV6_HDR))
 				tunnel = 1;
 
 			if (tunnel == 1) {
@@ -432,10 +438,10 @@ pkt_burst_checksum_forward(struct fwd_stream *fs)
 
 		if (tunnel == 1) {
 			if (testpmd_ol_flags & TESTPMD_TX_OFFLOAD_VXLAN_CKSUM) {
-				m->l2_len = outer_l2_len;
-				m->l3_len = outer_l3_len;
-				m->inner_l2_len = l2_len;
-				m->inner_l3_len = l3_len;
+				m->outer_l2_len = outer_l2_len;
+				m->outer_l3_len = outer_l3_len;
+				m->l2_len = l4_tun_len + l2_len;
+				m->l3_len = l3_len;
 			}
 			else {
 				/* if we don't do vxlan cksum in hw,
@@ -503,8 +509,8 @@ pkt_burst_checksum_forward(struct fwd_stream *fs)
 					m->l2_len, m->l3_len, m->l4_len);
 			if ((tunnel == 1) &&
 				(testpmd_ol_flags & TESTPMD_TX_OFFLOAD_VXLAN_CKSUM))
-				printf("tx: m->inner_l2_len=%d m->inner_l3_len=%d\n",
-					m->inner_l2_len, m->inner_l3_len);
+				printf("tx: m->outer_l2_len=%d m->outer_l3_len=%d\n",
+					m->outer_l2_len, m->outer_l3_len);
 			if (tso_segsz != 0)
 				printf("tx: m->tso_segsz=%d\n", m->tso_segsz);
 			printf("tx: flags=");
diff --git a/lib/librte_mbuf/rte_mbuf.h b/lib/librte_mbuf/rte_mbuf.h
index 6eb898f..0404261 100644
--- a/lib/librte_mbuf/rte_mbuf.h
+++ b/lib/librte_mbuf/rte_mbuf.h
@@ -276,8 +276,8 @@ struct rte_mbuf {
 			uint64_t tso_segsz:16; /**< TCP TSO segment size */
 
 			/* fields for TX offloading of tunnels */
-			uint64_t inner_l3_len:9; /**< inner L3 (IP) Hdr Length. */
-			uint64_t inner_l2_len:7; /**< inner L2 (MAC) Hdr Length. */
+			uint64_t outer_l3_len:9; /**< Outer L3 (IP) Hdr Length. */
+			uint64_t outer_l2_len:7; /**< Outer L2 (MAC) Hdr Length. */
 
 			/* uint64_t unused:8; */
 		};
diff --git a/lib/librte_pmd_i40e/i40e_rxtx.c b/lib/librte_pmd_i40e/i40e_rxtx.c
index 078e973..9f0d1eb 100644
--- a/lib/librte_pmd_i40e/i40e_rxtx.c
+++ b/lib/librte_pmd_i40e/i40e_rxtx.c
@@ -462,41 +462,43 @@ i40e_txd_enable_checksum(uint64_t ol_flags,
 			uint32_t *td_offset,
 			uint8_t l2_len,
 			uint16_t l3_len,
-			uint8_t inner_l2_len,
-			uint16_t inner_l3_len,
+			uint8_t outer_l2_len,
+			uint16_t outer_l3_len,
 			uint32_t *cd_tunneling)
 {
 	if (!l2_len) {
 		PMD_DRV_LOG(DEBUG, "L2 length set to 0");
 		return;
 	}
-	*td_offset |= (l2_len >> 1) << I40E_TX_DESC_LENGTH_MACLEN_SHIFT;
 
 	if (!l3_len) {
 		PMD_DRV_LOG(DEBUG, "L3 length set to 0");
 		return;
 	}
 
-	/* VXLAN packet TX checksum offload */
+	/* UDP tunneling packet TX checksum offload */
 	if (unlikely(ol_flags & PKT_TX_UDP_TUNNEL_PKT)) {
-		uint8_t l4tun_len;
 
-		l4tun_len = ETHER_VXLAN_HLEN + inner_l2_len;
+		*td_offset |= (outer_l2_len >> 1)
+				<< I40E_TX_DESC_LENGTH_MACLEN_SHIFT;
 
-		if (ol_flags & PKT_TX_IPV4_CSUM)
+		if (ol_flags & PKT_TX_OUTER_IP_CKSUM)
 			*cd_tunneling |= I40E_TX_CTX_EXT_IP_IPV4;
-		else if (ol_flags & PKT_TX_IPV6)
+		else if (ol_flags & PKT_TX_OUTER_IPV4)
+			*cd_tunneling |= I40E_TX_CTX_EXT_IP_IPV4_NO_CSUM;
+		else if (ol_flags & PKT_TX_OUTER_IPV6)
 			*cd_tunneling |= I40E_TX_CTX_EXT_IP_IPV6;
 
 		/* Now set the ctx descriptor fields */
-		*cd_tunneling |= (l3_len >> 2) <<
+		*cd_tunneling |= (outer_l3_len >> 2) <<
 				I40E_TXD_CTX_QW0_EXT_IPLEN_SHIFT |
 				I40E_TXD_CTX_UDP_TUNNELING |
-				(l4tun_len >> 1) <<
+				(l2_len >> 1) <<
 				I40E_TXD_CTX_QW0_NATLEN_SHIFT;
 
-		l3_len = inner_l3_len;
-	}
+	} else
+		*td_offset |= (l2_len >> 1)
+			<< I40E_TX_DESC_LENGTH_MACLEN_SHIFT;
 
 	/* Enable L3 checksum offloads */
 	if (ol_flags & PKT_TX_IPV4_CSUM) {
@@ -1190,8 +1192,8 @@ i40e_xmit_pkts(void *tx_queue, struct rte_mbuf **tx_pkts, uint16_t nb_pkts)
 	uint64_t ol_flags;
 	uint8_t l2_len;
 	uint16_t l3_len;
-	uint8_t inner_l2_len;
-	uint16_t inner_l3_len;
+	uint8_t outer_l2_len;
+	uint16_t outer_l3_len;
 	uint16_t nb_used;
 	uint16_t nb_ctx;
 	uint16_t tx_last;
@@ -1219,9 +1221,9 @@ i40e_xmit_pkts(void *tx_queue, struct rte_mbuf **tx_pkts, uint16_t nb_pkts)
 
 		ol_flags = tx_pkt->ol_flags;
 		l2_len = tx_pkt->l2_len;
-		inner_l2_len = tx_pkt->inner_l2_len;
 		l3_len = tx_pkt->l3_len;
-		inner_l3_len = tx_pkt->inner_l3_len;
+		outer_l2_len = tx_pkt->outer_l2_len;
+		outer_l3_len = tx_pkt->outer_l3_len;
 
 		/* Calculate the number of context descriptors needed. */
 		nb_ctx = i40e_calc_context_desc(ol_flags);
@@ -1271,8 +1273,8 @@ i40e_xmit_pkts(void *tx_queue, struct rte_mbuf **tx_pkts, uint16_t nb_pkts)
 		/* Enable checksum offloading */
 		cd_tunneling_params = 0;
 		i40e_txd_enable_checksum(ol_flags, &td_cmd, &td_offset,
-						l2_len, l3_len, inner_l2_len,
-						inner_l3_len,
+						l2_len, l3_len, outer_l2_len,
+						outer_l3_len,
 						&cd_tunneling_params);
 
 		if (unlikely(nb_ctx)) {
-- 
1.7.7.6

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [dpdk-dev] [PATCH v4 3/3] mbuf:replace the inner_l2_len and the inner_l3_len fields
  2014-12-02  6:52 ` [dpdk-dev] [PATCH v4 3/3] mbuf:replace the inner_l2_len and the inner_l3_len fields Jijiang Liu
@ 2014-12-02 14:53   ` didier.pallard
  2014-12-02 15:36     ` Ananyev, Konstantin
  0 siblings, 1 reply; 9+ messages in thread
From: didier.pallard @ 2014-12-02 14:53 UTC (permalink / raw)
  To: jijiang.liu, dev

Hello,

On 12/02/2014 07:52 AM, Jijiang Liu wrote:
> Replace the inner_l2_len and the inner_l3_len field with the outer_l2_len and outer_l3_len field, and rework csum forward engine and i40e PMD due to  these changes.
>
> Signed-off-by: Jijiang Liu <jijiang.liu@intel.com>
[...]
> --- a/lib/librte_mbuf/rte_mbuf.h
> +++ b/lib/librte_mbuf/rte_mbuf.h
> @@ -276,8 +276,8 @@ struct rte_mbuf {
>   			uint64_t tso_segsz:16; /**< TCP TSO segment size */
>   
>   			/* fields for TX offloading of tunnels */
> -			uint64_t inner_l3_len:9; /**< inner L3 (IP) Hdr Length. */
> -			uint64_t inner_l2_len:7; /**< inner L2 (MAC) Hdr Length. */
> +			uint64_t outer_l3_len:9; /**< Outer L3 (IP) Hdr Length. */
> +			uint64_t outer_l2_len:7; /**< Outer L2 (MAC) Hdr Length. */
>   
>   			/* uint64_t unused:8; */
>   		};

Sorry for entering lately this discussion, but i'm not convinced by the 
choice of outer_lx_len rather than inner_lx_len for new fields.
I agree with Olivier that new flags should only be related to the use of 
new fields, to maintain coherency with oldest implementations.
But from a stack point of view, i think it is better to have lx_len 
fields that target the outer layers, and to name new fields inner_lx_len.

Let's discuss the two possibilities.

1) outer_lx_len fields are introduced.
In this case, the stack should have knowledge that it is processing 
tunneled packets to use outer_lx_len rather than lx_len,
or stack should always use outer_lx_len packet and move those fields to 
lx_len packets if no tunneling occurs...
I think it will induce extra processing that does not seem to be really 
needed.

2) inner_lx_len fields are introduced.
In this case, the stack first uses lx_len fields. When packet should be 
tunneled, lx_len fields are moved to inner_lx_len fields.
Then the stack can process the outer layer and still use the lx_len fields.

For  example:
an eth/IP/TCP forged packet will look like this:

Ether/IP/UDP/xxx
   m->flags = IP_CKSUM
   m->l2_len = sizeof(ether)
   m->l3_len = sizeof(ip)
   m->l4_len = sizeof(udp)
   m->inner_l2_len = 0
   m->inner_l3_len = 0

When entering tunnel for example a VXLAN interface, lx_len will be moved 
to inner_lx_len

Ether/IP/UDP/xxx
   m->flags = INNER_IP_CKSUM
   m->l2_len = 0
   m->l3_len = 0
   m->l4_len = 0
   m->inner_l2_len = sizeof(ether)
   m->inner_l3_len = sizeof(ip)

once complete encapsulation is processed by the stack, the packet will 
look like

Ether/IP/UDP/VXLAN/Ether/IP/UDP/xxx
   m->flags = IP_CKSUM | INNER_IP_CKSUM
   m->l2_len = sizeof(ether)
   m->l3_len = sizeof(ip)
   m->l4_len = sizeof(udp)
   m->inner_l2_len = sizeof(ether) + sizeof (vxlan)
   m->inner_l3_len = sizeof(ip)

didier

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [dpdk-dev] [PATCH v4 3/3] mbuf:replace the inner_l2_len and the inner_l3_len fields
  2014-12-02 14:53   ` didier.pallard
@ 2014-12-02 15:36     ` Ananyev, Konstantin
  2014-12-03  8:57       ` Olivier MATZ
  0 siblings, 1 reply; 9+ messages in thread
From: Ananyev, Konstantin @ 2014-12-02 15:36 UTC (permalink / raw)
  To: didier.pallard, Liu, Jijiang, dev

Hi Didier

> -----Original Message-----
> From: dev [mailto:dev-bounces@dpdk.org] On Behalf Of didier.pallard
> Sent: Tuesday, December 02, 2014 2:53 PM
> To: Liu, Jijiang; dev@dpdk.org
> Subject: Re: [dpdk-dev] [PATCH v4 3/3] mbuf:replace the inner_l2_len and the inner_l3_len fields
> 
> Hello,
> 
> On 12/02/2014 07:52 AM, Jijiang Liu wrote:
> > Replace the inner_l2_len and the inner_l3_len field with the outer_l2_len and outer_l3_len field, and rework csum forward engine
> and i40e PMD due to  these changes.
> >
> > Signed-off-by: Jijiang Liu <jijiang.liu@intel.com>
> [...]
> > --- a/lib/librte_mbuf/rte_mbuf.h
> > +++ b/lib/librte_mbuf/rte_mbuf.h
> > @@ -276,8 +276,8 @@ struct rte_mbuf {
> >   			uint64_t tso_segsz:16; /**< TCP TSO segment size */
> >
> >   			/* fields for TX offloading of tunnels */
> > -			uint64_t inner_l3_len:9; /**< inner L3 (IP) Hdr Length. */
> > -			uint64_t inner_l2_len:7; /**< inner L2 (MAC) Hdr Length. */
> > +			uint64_t outer_l3_len:9; /**< Outer L3 (IP) Hdr Length. */
> > +			uint64_t outer_l2_len:7; /**< Outer L2 (MAC) Hdr Length. */
> >
> >   			/* uint64_t unused:8; */
> >   		};
> 
> Sorry for entering lately this discussion, but i'm not convinced by the
> choice of outer_lx_len rather than inner_lx_len for new fields.
> I agree with Olivier that new flags should only be related to the use of
> new fields, to maintain coherency with oldest implementations.
> But from a stack point of view, i think it is better to have lx_len
> fields that target the outer layers, and to name new fields inner_lx_len.
> 
> Let's discuss the two possibilities.
> 
> 1) outer_lx_len fields are introduced.
> In this case, the stack should have knowledge that it is processing
> tunneled packets to use outer_lx_len rather than lx_len,
> or stack should always use outer_lx_len packet and move those fields to
> lx_len packets if no tunneling occurs...
> I think it will induce extra processing that does not seem to be really
> needed.
> 
> 2) inner_lx_len fields are introduced.
> In this case, the stack first uses lx_len fields. When packet should be
> tunneled, lx_len fields are moved to inner_lx_len fields.
> Then the stack can process the outer layer and still use the lx_len fields.

Not sure, that I understood why 2) is better than 1).
Let say,  you have a 'normal' (non-tunnelling) packet: ether/IP/TCP
In that case you still use mbuf's l2_len/l3_len/l4_len fields and setup ol_flags as usual.
Then later, you decided to 'tunnel' that packet.
So you just fill mbuf's outer_l2_len/outer_l3_len, setup TX_OUTER_* and TX_TUNNEL_* bits in ol_flags and probably update l2_len.
l3_len/l4_len and ol_flags bits set for them remain intact.
That's with 1)

With 2) - you'll have to move l3_len/l4_len to inner_lx_len. 
And I suppose ol_flags values too:
ol_flags &= ~PKT_ IP_CKSUM;
ol_flgas  |=  PKT_INNER_IP_CKSUM
?
And same for L4_CKSUM flags too?

Konstantin

> 
> For  example:
> an eth/IP/TCP forged packet will look like this:
> 
> Ether/IP/UDP/xxx
>    m->flags = IP_CKSUM
>    m->l2_len = sizeof(ether)
>    m->l3_len = sizeof(ip)
>    m->l4_len = sizeof(udp)
>    m->inner_l2_len = 0
>    m->inner_l3_len = 0
> 
> When entering tunnel for example a VXLAN interface, lx_len will be moved
> to inner_lx_len
> 
> Ether/IP/UDP/xxx
>    m->flags = INNER_IP_CKSUM
>    m->l2_len = 0
>    m->l3_len = 0
>    m->l4_len = 0
>    m->inner_l2_len = sizeof(ether)
>    m->inner_l3_len = sizeof(ip)
> 
> 
> once complete encapsulation is processed by the stack, the packet will
> look like
> 
> Ether/IP/UDP/VXLAN/Ether/IP/UDP/xxx
>    m->flags = IP_CKSUM | INNER_IP_CKSUM
>    m->l2_len = sizeof(ether)
>    m->l3_len = sizeof(ip)
>    m->l4_len = sizeof(udp)
>    m->inner_l2_len = sizeof(ether) + sizeof (vxlan)
>    m->inner_l3_len = sizeof(ip)
> 
> 
> didier
> 

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [dpdk-dev] [PATCH v4 3/3] mbuf:replace the inner_l2_len and the inner_l3_len fields
  2014-12-02 15:36     ` Ananyev, Konstantin
@ 2014-12-03  8:57       ` Olivier MATZ
  2014-12-03 11:11         ` Ananyev, Konstantin
  0 siblings, 1 reply; 9+ messages in thread
From: Olivier MATZ @ 2014-12-03  8:57 UTC (permalink / raw)
  To: Ananyev, Konstantin, didier.pallard, Liu, Jijiang, dev

Hi Didier, Konstantin, Jijiang,

On 12/02/2014 04:36 PM, Ananyev, Konstantin wrote:
> Hi Didier
>
>> -----Original Message-----
>> From: dev [mailto:dev-bounces@dpdk.org] On Behalf Of didier.pallard
>> Sent: Tuesday, December 02, 2014 2:53 PM
>> To: Liu, Jijiang; dev@dpdk.org
>> Subject: Re: [dpdk-dev] [PATCH v4 3/3] mbuf:replace the inner_l2_len and the inner_l3_len fields
>>
>> Hello,
>>
>> On 12/02/2014 07:52 AM, Jijiang Liu wrote:
>>> Replace the inner_l2_len and the inner_l3_len field with the outer_l2_len and outer_l3_len field, and rework csum forward engine
>> and i40e PMD due to  these changes.
>>>
>>> Signed-off-by: Jijiang Liu <jijiang.liu@intel.com>
>> [...]
>>> --- a/lib/librte_mbuf/rte_mbuf.h
>>> +++ b/lib/librte_mbuf/rte_mbuf.h
>>> @@ -276,8 +276,8 @@ struct rte_mbuf {
>>>    			uint64_t tso_segsz:16; /**< TCP TSO segment size */
>>>
>>>    			/* fields for TX offloading of tunnels */
>>> -			uint64_t inner_l3_len:9; /**< inner L3 (IP) Hdr Length. */
>>> -			uint64_t inner_l2_len:7; /**< inner L2 (MAC) Hdr Length. */
>>> +			uint64_t outer_l3_len:9; /**< Outer L3 (IP) Hdr Length. */
>>> +			uint64_t outer_l2_len:7; /**< Outer L2 (MAC) Hdr Length. */
>>>
>>>    			/* uint64_t unused:8; */
>>>    		};
>>
>> Sorry for entering lately this discussion, but i'm not convinced by the
>> choice of outer_lx_len rather than inner_lx_len for new fields.
>> I agree with Olivier that new flags should only be related to the use of
>> new fields, to maintain coherency with oldest implementations.
>> But from a stack point of view, i think it is better to have lx_len
>> fields that target the outer layers, and to name new fields inner_lx_len.
>>
>> Let's discuss the two possibilities.
>>
>> 1) outer_lx_len fields are introduced.
>> In this case, the stack should have knowledge that it is processing
>> tunneled packets to use outer_lx_len rather than lx_len,
>> or stack should always use outer_lx_len packet and move those fields to
>> lx_len packets if no tunneling occurs...
>> I think it will induce extra processing that does not seem to be really
>> needed.
>>
>> 2) inner_lx_len fields are introduced.
>> In this case, the stack first uses lx_len fields. When packet should be
>> tunneled, lx_len fields are moved to inner_lx_len fields.
>> Then the stack can process the outer layer and still use the lx_len fields.
>
> Not sure, that I understood why 2) is better than 1).
> Let say,  you have a 'normal' (non-tunnelling) packet: ether/IP/TCP
> In that case you still use mbuf's l2_len/l3_len/l4_len fields and setup ol_flags as usual.
> Then later, you decided to 'tunnel' that packet.
> So you just fill mbuf's outer_l2_len/outer_l3_len, setup TX_OUTER_* and TX_TUNNEL_* bits in ol_flags and probably update l2_len.
> l3_len/l4_len and ol_flags bits set for them remain intact.
> That's with 1)
>
> With 2) - you'll have to move l3_len/l4_len to inner_lx_len.
> And I suppose ol_flags values too:
> ol_flags &= ~PKT_ IP_CKSUM;
> ol_flgas  |=  PKT_INNER_IP_CKSUM
> ?
> And same for L4_CKSUM flags too?

After reading Didier's mail, I start to be convinced that keeping inner
may be better than outer. From a network stack architecture point of
view, 2) looks better:

- the functions in the network stack that write the Ether/IP header
   always deal with l2_len/l3_len, whatever it's a tunnel or not.

- the function that adds the tunnel header moves the l2_len/l3_len and
   the flags to inner_l2_len/inner_l3_len and inner_flags.

Althought it was my first idea, now I cannot find a better argument in
favor of outer_lX_len. The initial argument was that the correspondance
between a flag and a lX_len should always remain the same, but it is
still possible with Didier's approach:
   - PKT_IP_CKSUM uses l2_len and l3_len
   - PKT_INNER_CKSUM uses inner_l2_len and inner_l3_len

Jijiang, I'm sorry to change my mind about this. If you want (and if
Konstantin is also ok with that), I can try to rebase your patches to
match this. Or do you prefer to do it by yourself?

Regards,
Olivier

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [dpdk-dev] [PATCH v4 3/3] mbuf:replace the inner_l2_len and the inner_l3_len fields
  2014-12-03  8:57       ` Olivier MATZ
@ 2014-12-03 11:11         ` Ananyev, Konstantin
  2014-12-03 11:27           ` Olivier MATZ
  0 siblings, 1 reply; 9+ messages in thread
From: Ananyev, Konstantin @ 2014-12-03 11:11 UTC (permalink / raw)
  To: Olivier MATZ, didier.pallard, Liu, Jijiang, dev

Hi Oliver,

> -----Original Message-----
> From: Olivier MATZ [mailto:olivier.matz@6wind.com]
> Sent: Wednesday, December 03, 2014 8:57 AM
> To: Ananyev, Konstantin; didier.pallard; Liu, Jijiang; dev@dpdk.org
> Subject: Re: [dpdk-dev] [PATCH v4 3/3] mbuf:replace the inner_l2_len and the inner_l3_len fields
> 
> Hi Didier, Konstantin, Jijiang,
> 
> On 12/02/2014 04:36 PM, Ananyev, Konstantin wrote:
> > Hi Didier
> >
> >> -----Original Message-----
> >> From: dev [mailto:dev-bounces@dpdk.org] On Behalf Of didier.pallard
> >> Sent: Tuesday, December 02, 2014 2:53 PM
> >> To: Liu, Jijiang; dev@dpdk.org
> >> Subject: Re: [dpdk-dev] [PATCH v4 3/3] mbuf:replace the inner_l2_len and the inner_l3_len fields
> >>
> >> Hello,
> >>
> >> On 12/02/2014 07:52 AM, Jijiang Liu wrote:
> >>> Replace the inner_l2_len and the inner_l3_len field with the outer_l2_len and outer_l3_len field, and rework csum forward
> engine
> >> and i40e PMD due to  these changes.
> >>>
> >>> Signed-off-by: Jijiang Liu <jijiang.liu@intel.com>
> >> [...]
> >>> --- a/lib/librte_mbuf/rte_mbuf.h
> >>> +++ b/lib/librte_mbuf/rte_mbuf.h
> >>> @@ -276,8 +276,8 @@ struct rte_mbuf {
> >>>    			uint64_t tso_segsz:16; /**< TCP TSO segment size */
> >>>
> >>>    			/* fields for TX offloading of tunnels */
> >>> -			uint64_t inner_l3_len:9; /**< inner L3 (IP) Hdr Length. */
> >>> -			uint64_t inner_l2_len:7; /**< inner L2 (MAC) Hdr Length. */
> >>> +			uint64_t outer_l3_len:9; /**< Outer L3 (IP) Hdr Length. */
> >>> +			uint64_t outer_l2_len:7; /**< Outer L2 (MAC) Hdr Length. */
> >>>
> >>>    			/* uint64_t unused:8; */
> >>>    		};
> >>
> >> Sorry for entering lately this discussion, but i'm not convinced by the
> >> choice of outer_lx_len rather than inner_lx_len for new fields.
> >> I agree with Olivier that new flags should only be related to the use of
> >> new fields, to maintain coherency with oldest implementations.
> >> But from a stack point of view, i think it is better to have lx_len
> >> fields that target the outer layers, and to name new fields inner_lx_len.
> >>
> >> Let's discuss the two possibilities.
> >>
> >> 1) outer_lx_len fields are introduced.
> >> In this case, the stack should have knowledge that it is processing
> >> tunneled packets to use outer_lx_len rather than lx_len,
> >> or stack should always use outer_lx_len packet and move those fields to
> >> lx_len packets if no tunneling occurs...
> >> I think it will induce extra processing that does not seem to be really
> >> needed.
> >>
> >> 2) inner_lx_len fields are introduced.
> >> In this case, the stack first uses lx_len fields. When packet should be
> >> tunneled, lx_len fields are moved to inner_lx_len fields.
> >> Then the stack can process the outer layer and still use the lx_len fields.
> >
> > Not sure, that I understood why 2) is better than 1).
> > Let say,  you have a 'normal' (non-tunnelling) packet: ether/IP/TCP
> > In that case you still use mbuf's l2_len/l3_len/l4_len fields and setup ol_flags as usual.
> > Then later, you decided to 'tunnel' that packet.
> > So you just fill mbuf's outer_l2_len/outer_l3_len, setup TX_OUTER_* and TX_TUNNEL_* bits in ol_flags and probably update l2_len.
> > l3_len/l4_len and ol_flags bits set for them remain intact.
> > That's with 1)
> >
> > With 2) - you'll have to move l3_len/l4_len to inner_lx_len.
> > And I suppose ol_flags values too:
> > ol_flags &= ~PKT_ IP_CKSUM;
> > ol_flgas  |=  PKT_INNER_IP_CKSUM
> > ?
> > And same for L4_CKSUM flags too?
> 
> After reading Didier's mail, I start to be convinced that keeping inner
> may be better than outer. From a network stack architecture point of
> view, 2) looks better:
> 
> - the functions in the network stack that write the Ether/IP header
>    always deal with l2_len/l3_len, whatever it's a tunnel or not.
> 
> - the function that adds the tunnel header moves the l2_len/l3_len and
>    the flags to inner_l2_len/inner_l3_len and inner_flags.

Hmm, still don't understand you here.
Why all that you mentioned above suddenly become 'better'?
Here is you original suggestion about introducing 'outer_lx_len':
http://dpdk.org/ml/archives/dev/2014-November/008268.html
As you pointed, it is a clean and straightforward way to extend DPDK HW TX offload API with tunnelling support.
And we agreed with you here.

>From other side, 2) approach looks like a mess:
If packet is going to be tunnelled, the upper layer has to:
1.  move lx_len values to inner_lx_len.
2. move PKT_TX_*_CKSUM bit to PKT_TX_INNER_*_CKSUM bits in ol_flags. 
Plus in the mbuf we'll either have to introduce PKT_TX_INNER_(TCP|UDP|SCTP)_CKSUM flags
(otherwise we'll have a weird situation when PKT_TX_IP_CKSUM we'll be for outer IP header, but
PKT_TX_TCP_CKSUM will be for inner).

So, from DPDK perspective, 2) looks like a waste of bits in ol_flags and unnecessary complication.  
At least to me.
You are talking about 'the network stack', but as I know at dpdk.org we don't have any open sourced L3/L4 stack supported. 
>From other side - in DPDK we just adding fields into mbuf for TX tunnelling.
So if any of the existing commercial stacks already support tunnelling - they should have their own fields for that in their own packet metadata structure.   
Which means - they somehow have to copy information from their packet structure into mbuf anyway. 
If they don't support tunnelling yet and plan to use mbuf directly (without copying info into their own packet metadata structure),
I suppose they can adopt  the DPDK approach.
So, from my point - let's implement it in a way that makes most sense from DPDK perspective: 1).
Konstantin

> 
> Althought it was my first idea, now I cannot find a better argument in
> favor of outer_lX_len. The initial argument was that the correspondance
> between a flag and a lX_len should always remain the same, but it is
> still possible with Didier's approach:
>    - PKT_IP_CKSUM uses l2_len and l3_len
>    - PKT_INNER_CKSUM uses inner_l2_len and inner_l3_len
> 
> Jijiang, I'm sorry to change my mind about this. If you want (and if
> Konstantin is also ok with that), I can try to rebase your patches to
> match this. Or do you prefer to do it by yourself?
> 
> Regards,
> Olivier
> 

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [dpdk-dev] [PATCH v4 3/3] mbuf:replace the inner_l2_len and the inner_l3_len fields
  2014-12-03 11:11         ` Ananyev, Konstantin
@ 2014-12-03 11:27           ` Olivier MATZ
  0 siblings, 0 replies; 9+ messages in thread
From: Olivier MATZ @ 2014-12-03 11:27 UTC (permalink / raw)
  To: Ananyev, Konstantin, didier.pallard, Liu, Jijiang, dev

Hi Konstantin,

On 12/03/2014 12:11 PM, Ananyev, Konstantin wrote:
>>>> Let's discuss the two possibilities.
>>>>
>>>> 1) outer_lx_len fields are introduced.
>>>> In this case, the stack should have knowledge that it is processing
>>>> tunneled packets to use outer_lx_len rather than lx_len,
>>>> or stack should always use outer_lx_len packet and move those fields to
>>>> lx_len packets if no tunneling occurs...
>>>> I think it will induce extra processing that does not seem to be really
>>>> needed.
>>>>
>>>> 2) inner_lx_len fields are introduced.
>>>> In this case, the stack first uses lx_len fields. When packet should be
>>>> tunneled, lx_len fields are moved to inner_lx_len fields.
>>>> Then the stack can process the outer layer and still use the lx_len fields.
>>>
>>> Not sure, that I understood why 2) is better than 1).
>>> Let say,  you have a 'normal' (non-tunnelling) packet: ether/IP/TCP
>>> In that case you still use mbuf's l2_len/l3_len/l4_len fields and setup ol_flags as usual.
>>> Then later, you decided to 'tunnel' that packet.
>>> So you just fill mbuf's outer_l2_len/outer_l3_len, setup TX_OUTER_* and TX_TUNNEL_* bits in ol_flags and probably update l2_len.
>>> l3_len/l4_len and ol_flags bits set for them remain intact.
>>> That's with 1)
>>>
>>> With 2) - you'll have to move l3_len/l4_len to inner_lx_len.
>>> And I suppose ol_flags values too:
>>> ol_flags &= ~PKT_ IP_CKSUM;
>>> ol_flgas  |=  PKT_INNER_IP_CKSUM
>>> ?
>>> And same for L4_CKSUM flags too?
>>
>> After reading Didier's mail, I start to be convinced that keeping inner
>> may be better than outer. From a network stack architecture point of
>> view, 2) looks better:
>>
>> - the functions in the network stack that write the Ether/IP header
>>     always deal with l2_len/l3_len, whatever it's a tunnel or not.
>>
>> - the function that adds the tunnel header moves the l2_len/l3_len and
>>     the flags to inner_l2_len/inner_l3_len and inner_flags.
>
> Hmm, still don't understand you here.
> Why all that you mentioned above suddenly become 'better'?
> Here is you original suggestion about introducing 'outer_lx_len':
> http://dpdk.org/ml/archives/dev/2014-November/008268.html
> As you pointed, it is a clean and straightforward way to extend DPDK HW TX offload API with tunnelling support.
> And we agreed with you here.
>
>  From other side, 2) approach looks like a mess:
> If packet is going to be tunnelled, the upper layer has to:
> 1.  move lx_len values to inner_lx_len.
> 2. move PKT_TX_*_CKSUM bit to PKT_TX_INNER_*_CKSUM bits in ol_flags.
> Plus in the mbuf we'll either have to introduce PKT_TX_INNER_(TCP|UDP|SCTP)_CKSUM flags
> (otherwise we'll have a weird situation when PKT_TX_IP_CKSUM we'll be for outer IP header, but
> PKT_TX_TCP_CKSUM will be for inner).
>
> So, from DPDK perspective, 2) looks like a waste of bits in ol_flags and unnecessary complication.
> At least to me.
> You are talking about 'the network stack', but as I know at dpdk.org we don't have any open sourced L3/L4 stack supported.
>  From other side - in DPDK we just adding fields into mbuf for TX tunnelling.
> So if any of the existing commercial stacks already support tunnelling - they should have their own fields for that in their own packet metadata structure.
> Which means - they somehow have to copy information from their packet structure into mbuf anyway.
> If they don't support tunnelling yet and plan to use mbuf directly (without copying info into their own packet metadata structure),
> I suppose they can adopt  the DPDK approach.
> So, from my point - let's implement it in a way that makes most sense from DPDK perspective: 1).

OK. your argumentation makes sense even though I think DPDK aims to be
a SDK for building network applications or stacks and should ease the
work done in application. But this is something we can handle in the
stack if the API is properly defined in DPDK.

Anyway, we need to find a way to conclude on this :)
And it does not prevent us to think again about it for 2.0, if a
dev_prep_tx() function is introduced.

I still have few comments on the v5 patch from Jijiang but I think we
can converge soon.

Regards,
Olivier

^ permalink raw reply	[flat|nested] 9+ messages in thread

end of thread, other threads:[~2014-12-03 11:27 UTC | newest]

Thread overview: 9+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2014-12-02  6:52 [dpdk-dev] [PATCH v4 0/3] i40e VXLAN TX checksum rework Jijiang Liu
2014-12-02  6:52 ` [dpdk-dev] [PATCH v4 1/3] mbuf:redefine three TX ol_flags Jijiang Liu
2014-12-02  6:52 ` [dpdk-dev] [PATCH v4 2/3] mbuf:add three TX ol_flags and repalce PKT_TX_VXLAN_CKSUM Jijiang Liu
2014-12-02  6:52 ` [dpdk-dev] [PATCH v4 3/3] mbuf:replace the inner_l2_len and the inner_l3_len fields Jijiang Liu
2014-12-02 14:53   ` didier.pallard
2014-12-02 15:36     ` Ananyev, Konstantin
2014-12-03  8:57       ` Olivier MATZ
2014-12-03 11:11         ` Ananyev, Konstantin
2014-12-03 11:27           ` Olivier MATZ

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).