DPDK patches and discussions
 help / color / mirror / Atom feed
* [dpdk-dev] [PATCH 0/3] i40e VXLAN TX checksum rework
@ 2014-11-27  8:18 Jijiang Liu
  2014-11-27  8:18 ` [dpdk-dev] [PATCH 1/3] mbuf:add two TX offload flags and change three fields Jijiang Liu
                   ` (4 more replies)
  0 siblings, 5 replies; 28+ messages in thread
From: Jijiang Liu @ 2014-11-27  8:18 UTC (permalink / raw)
  To: dev

We have got some feedback about backward compatibility of VXLAN TX checksum offload API with 1G/10G NIC after the i40e VXLAN TX checksum codes were applied, so we have to rework the APIs on i40e, including the changes of mbuf, i40e PMD and csum engine.

The main changes in mbuf are as follows,
In place of removing PKT_TX_VXLAN_CKSUM, we introducing 2 new flags: PKT_TX_OUT_IP_CKSUM, PKT_TX_UDP_TUNNEL_PKT, and a new field: l4_tun_len.
Replace the inner_l2_len and the inner_l3_len field with the outer_l2_len and outer_l3_len field.

The existing flags are listed below, 
PKT_TX_IP_CKSUM:     HW IPv4 checksum for non-tunnelling packet/ HW inner IPv4 checksum for tunnelling packet
PKT_TX_TCP_CKSUM:    HW TCP checksum for non-tunnelling packet/ HW inner TCP checksum for tunnelling packet
PKT_TX_SCTP_CKSUM:   HW SCTP checksum for non-tunnelling packet/ HW inner SCTP checksum for tunnelling packet
PKT_TX_UDP_CKSUM:    HW SCTP checksum for non-tunnelling packet/ HW inner SCTP checksum for tunnelling packet
PKT_TX_IPV4:        IPv4 with no HW checksum offload for non-tunnelling packet/inner IPv4 with no HW checksum offload for tunnelling packet 
PKT_TX_IPV6:        IPv6 non-tunnelling packet/ inner IPv6 with no HW checksum offload for tunnelling packet

let's use a few examples to demonstrate how to use these flags:
Let say we have a tunnel packet: eth_hdr_out/ipv4_hdr_out/udp_hdr_out/vxlan_hdr/ehtr_hdr_in/ipv4_hdr_in/tcp_hdr_in.There could be several scenarios:

A) User requests HW offload for ipv4_hdr_out checksum.
He doesn't care is it a tunnelled packet or not.
So he sets:

mb->l2_len =  eth_hdr_out;
mb->l3_len = ipv4_hdr_out;
mb->ol_flags |= PKT_TX_IPV4_CSUM;

B) User is aware that it is a tunnelled packet and requests HW offload for ipv4_hdr_in and tcp_hdr_in *only*.
He doesn't care about outer IP checksum offload.
In that case, for FVL  he has 2 choices:
   1. Treat that packet as a 'proper' tunnelled packet, and fill all the fields:
     mb->l2_len =  eth_hdr_in;
     mb->l3_len = ipv4_hdr_in;
     mb->outer_l2_len = eth_hdr_out;
     mb->outer_l3_len = ipv4_hdr_out;
     mb->l4tun_len = vxlan_hdr;     
     mb->ol_flags |= PKT_TX_UDP_TUNNEL_PKT | PKT_TX_IP_CKSUM |  PKT_TX_TCP_CKSUM;

   2. As user doesn't care about outer IP hdr checksum, he can treat everything before ipv4_hdr_in as L2 header.
   So he knows, that it is a tunnelled packet, but makes HW to treat it as ordinary (non-tunnelled) packet:
     mb->l2_len = eth_hdr_out + ipv4_hdr_out + udp_hdr_out + vxlan_hdr + ehtr_hdr_in;
     mb->l3_len = ipv4_hdr_in;
     mb->ol_flags |= PKT_TX_IP_CKSUM |  PKT_TX_TCP_CKSUM;

i40e PMD will support both B.1 and B.2.
ixgbe/igb/em PMD supports only B.2.
if HW supports both - it will be up to user app which method to choose.
tespmd will support both methods, and it should be configurable by user which approach to use (cmdline parameter).
So the user can try/test both methods and select an appropriate for him. 

Now, B.2 is exactly what Oliver suggested.
I think it has few important advantages over B.1:
First of all - compatibility. It works across all HW we currently support (i40e/ixgbe/igb/em).
Second - it is probably faster even on FVL, as for it we have to fill only TXD, while with approach #2 we have to fill both TCD and TXD.
  
C) User knows that is a tunnelled packet, and wants HW offload for all 3 checksums:  outer IP hdr checksum, inner IP checksum, inner TCP checksum.
Then he has to setup all TX checksum fields: 
     mb->l2_len =  eth_hdr_in;
     mb->l3_len = ipv4_hdr_in;
     mb->outer_l2_len = eth_hdr_out;
     mb->outer_l3_len = ipv4_hdr_out;
     mb->l4tun_len = vxlan_hdr;     
     mb->ol_flags |= PKT_TX_OUT_IP_CKSUM  | PKT_TX_UDP_TUNNEL | PKT_TX_IP_CKSUM |  PKT_TX_TCP_CKSUM;

Jijiang Liu (3):
  mbuf change
  i40e PMD change in i40e_rxtx.c  
  rework csum forward engine 

 app/test-pmd/csumonly.c         |   55 +++++++++++++++++++++-----------------
 lib/librte_mbuf/rte_mbuf.c      |    2 +-
 lib/librte_mbuf/rte_mbuf.h      |   23 ++++++++++------
 lib/librte_pmd_i40e/i40e_rxtx.c |   40 ++++++++++++---------------
 4 files changed, 63 insertions(+), 57 deletions(-)

-- 
1.7.7.6

^ permalink raw reply	[flat|nested] 28+ messages in thread

* [dpdk-dev] [PATCH 1/3] mbuf:add two TX offload flags and change three fields
  2014-11-27  8:18 [dpdk-dev] [PATCH 0/3] i40e VXLAN TX checksum rework Jijiang Liu
@ 2014-11-27  8:18 ` Jijiang Liu
  2014-11-27 10:00   ` Olivier MATZ
  2014-11-27  8:18 ` [dpdk-dev] [PATCH 2/3] i40e:PMD change for VXLAN TX checksum Jijiang Liu
                   ` (3 subsequent siblings)
  4 siblings, 1 reply; 28+ messages in thread
From: Jijiang Liu @ 2014-11-27  8:18 UTC (permalink / raw)
  To: dev

In place of removing the PKT_TX_VXLAN_CKSUM, we introduce 2 new flags: PKT_TX_OUT_IP_CKSUM, PKT_TX_UDP_TUNNEL_PKT, and a new field: l4_tun_len.
Replace the inner_l2_len and the inner_l3_len field with the outer_l2_len and outer_l3_len field.

PKT_TX_OUT_IP_CKSUM: is not used for non-tunnelling packet;hardware outer checksum for tunnelling packet.
PKT_TX_UDP_TUNNEL_PKT: is used to tell PMD that the transmit packet is a UDP tunneling packet.
l4_tun_len: for VXLAN packet, it should be udp header length plus VXLAN header length. 

Signed-off-by: Jijiang Liu <jijiang.liu@intel.com>
---
 lib/librte_mbuf/rte_mbuf.c |    2 +-
 lib/librte_mbuf/rte_mbuf.h |   23 ++++++++++++++---------
 2 files changed, 15 insertions(+), 10 deletions(-)

diff --git a/lib/librte_mbuf/rte_mbuf.c b/lib/librte_mbuf/rte_mbuf.c
index 87c2963..e89c310 100644
--- a/lib/librte_mbuf/rte_mbuf.c
+++ b/lib/librte_mbuf/rte_mbuf.c
@@ -240,7 +240,7 @@ const char *rte_get_tx_ol_flag_name(uint64_t mask)
 	case PKT_TX_SCTP_CKSUM: return "PKT_TX_SCTP_CKSUM";
 	case PKT_TX_UDP_CKSUM: return "PKT_TX_UDP_CKSUM";
 	case PKT_TX_IEEE1588_TMST: return "PKT_TX_IEEE1588_TMST";
-	case PKT_TX_VXLAN_CKSUM: return "PKT_TX_VXLAN_CKSUM";
+	case PKT_TX_UDP_TUNNEL_PKT: return "PKT_TX_UDP_TUNNEL_PKT";
 	case PKT_TX_TCP_SEG: return "PKT_TX_TCP_SEG";
 	default: return NULL;
 	}
diff --git a/lib/librte_mbuf/rte_mbuf.h b/lib/librte_mbuf/rte_mbuf.h
index 367fc56..48cd8e1 100644
--- a/lib/librte_mbuf/rte_mbuf.h
+++ b/lib/librte_mbuf/rte_mbuf.h
@@ -99,10 +99,9 @@ extern "C" {
 #define PKT_RX_TUNNEL_IPV6_HDR (1ULL << 12) /**< RX tunnel packet with IPv6 header. */
 #define PKT_RX_FDIR_ID       (1ULL << 13) /**< FD id reported if FDIR match. */
 #define PKT_RX_FDIR_FLX      (1ULL << 14) /**< Flexible bytes reported if FDIR match. */
-/* add new RX flags here */
 
 /* add new TX flags here */
-#define PKT_TX_VXLAN_CKSUM   (1ULL << 50) /**< TX checksum of VXLAN computed by NIC */
+#define PKT_TX_UDP_TUNNEL_PKT (1ULL << 50) /**< TX packet is an UDP tunneling packet */
 #define PKT_TX_IEEE1588_TMST (1ULL << 51) /**< TX IEEE1588 packet to timestamp. */
 
 /**
@@ -125,13 +124,20 @@ extern "C" {
 #define PKT_TX_IP_CKSUM      (1ULL << 54) /**< IP cksum of TX pkt. computed by NIC. */
 #define PKT_TX_IPV4_CSUM     PKT_TX_IP_CKSUM /**< Alias of PKT_TX_IP_CKSUM. */
 
+#define PKT_TX_VLAN_PKT      (1ULL << 55) /**< TX packet is a 802.1q VLAN packet. */
+
 /** Tell the NIC it's an IPv4 packet. Required for L4 checksum offload or TSO. */
-#define PKT_TX_IPV4          PKT_RX_IPV4_HDR
+#define PKT_TX_IPV4          (1ULL << 56)
 
 /** Tell the NIC it's an IPv6 packet. Required for L4 checksum offload or TSO. */
-#define PKT_TX_IPV6          PKT_RX_IPV6_HDR
+#define PKT_TX_IPV6          (1ULL << 57)
 
-#define PKT_TX_VLAN_PKT      (1ULL << 55) /**< TX packet is a 802.1q VLAN packet. */
+/** Outer IP cksum of TX pkt. computed by NIC for tunneling packet */
+#define PKT_TX_OUTER_IP_CKSUM   (1ULL << 58)
+#define PKT_TX_OUTER_IPV4_CSUM  PKT_TX_OUTER_IP_CKSUM /**< Alias of PKT_TX_OUTER_IP_CKSUM. */
+
+/** Tell the NIC it's an outer IPv6 packet for tunneling packet.*/
+#define PKT_TX_OUTER_IPV6    (1ULL << 59)
 
 /**
  * TCP segmentation offload. To enable this offload feature for a
@@ -266,10 +272,9 @@ struct rte_mbuf {
 			uint64_t tso_segsz:16; /**< TCP TSO segment size */
 
 			/* fields for TX offloading of tunnels */
-			uint64_t inner_l3_len:9; /**< inner L3 (IP) Hdr Length. */
-			uint64_t inner_l2_len:7; /**< inner L2 (MAC) Hdr Length. */
-
-			/* uint64_t unused:8; */
+			uint64_t outer_l3_len:9; /**< outer L3 (IP) Hdr Length. */
+			uint64_t outer_l2_len:7; /**< outer L2 (MAC) Hdr Length. */
+			uint64_t l4_tun_len:8; /**< L4 tunnelling header length */
 		};
 	};
 } __rte_cache_aligned;
-- 
1.7.7.6

^ permalink raw reply	[flat|nested] 28+ messages in thread

* [dpdk-dev] [PATCH 2/3] i40e:PMD change for VXLAN TX checksum
  2014-11-27  8:18 [dpdk-dev] [PATCH 0/3] i40e VXLAN TX checksum rework Jijiang Liu
  2014-11-27  8:18 ` [dpdk-dev] [PATCH 1/3] mbuf:add two TX offload flags and change three fields Jijiang Liu
@ 2014-11-27  8:18 ` Jijiang Liu
  2014-11-27  8:18 ` [dpdk-dev] [PATCH 3/3] testpmd:rework csum forward engine Jijiang Liu
                   ` (2 subsequent siblings)
  4 siblings, 0 replies; 28+ messages in thread
From: Jijiang Liu @ 2014-11-27  8:18 UTC (permalink / raw)
  To: dev

Rework the i40e PMD codes using the new introduced ol_flags and fields.  

Signed-off-by: Jijiang Liu <jijiang.liu@intel.com>
---
 lib/librte_pmd_i40e/i40e_rxtx.c |   40 +++++++++++++++++---------------------
 1 files changed, 18 insertions(+), 22 deletions(-)

diff --git a/lib/librte_pmd_i40e/i40e_rxtx.c b/lib/librte_pmd_i40e/i40e_rxtx.c
index cce6911..b5d14bd 100644
--- a/lib/librte_pmd_i40e/i40e_rxtx.c
+++ b/lib/librte_pmd_i40e/i40e_rxtx.c
@@ -462,8 +462,8 @@ i40e_txd_enable_checksum(uint64_t ol_flags,
 			uint32_t *td_offset,
 			uint8_t l2_len,
 			uint16_t l3_len,
-			uint8_t inner_l2_len,
-			uint16_t inner_l3_len,
+			uint8_t l4_tunnel_len,
+			uint16_t outer_l3_len,
 			uint32_t *cd_tunneling)
 {
 	if (!l2_len) {
@@ -477,25 +477,19 @@ i40e_txd_enable_checksum(uint64_t ol_flags,
 		return;
 	}
 
-	/* VXLAN packet TX checksum offload */
-	if (unlikely(ol_flags & PKT_TX_VXLAN_CKSUM)) {
-		uint8_t l4tun_len;
-
-		l4tun_len = ETHER_VXLAN_HLEN + inner_l2_len;
-
-		if (ol_flags & PKT_TX_IPV4_CSUM)
+	/* UDP tunneling packet TX checksum offload */
+	if (unlikely(ol_flags & PKT_TX_UDP_TUNNEL_PKT)) {
+		if (ol_flags & PKT_TX_OUTER_IPV4_CSUM)
 			*cd_tunneling |= I40E_TX_CTX_EXT_IP_IPV4;
-		else if (ol_flags & PKT_TX_IPV6)
+		else if (ol_flags & PKT_TX_OUTER_IPV6)
 			*cd_tunneling |= I40E_TX_CTX_EXT_IP_IPV6;
 
 		/* Now set the ctx descriptor fields */
-		*cd_tunneling |= (l3_len >> 2) <<
+		*cd_tunneling |= (outer_l3_len >> 2) <<
 				I40E_TXD_CTX_QW0_EXT_IPLEN_SHIFT |
 				I40E_TXD_CTX_UDP_TUNNELING |
-				(l4tun_len >> 1) <<
+				(l4_tunnel_len >> 1) <<
 				I40E_TXD_CTX_QW0_NATLEN_SHIFT;
-
-		l3_len = inner_l3_len;
 	}
 
 	/* Enable L3 checksum offloads */
@@ -1158,8 +1152,8 @@ i40e_calc_context_desc(uint64_t flags)
 {
 	uint64_t mask = 0ULL;
 
-	if (flags | PKT_TX_VXLAN_CKSUM)
-		mask |= PKT_TX_VXLAN_CKSUM;
+	if (flags | PKT_TX_UDP_TUNNEL_PKT)
+		mask |= PKT_TX_UDP_TUNNEL_PKT;
 
 #ifdef RTE_LIBRTE_IEEE1588
 	mask |= PKT_TX_IEEE1588_TMST;
@@ -1190,8 +1184,8 @@ i40e_xmit_pkts(void *tx_queue, struct rte_mbuf **tx_pkts, uint16_t nb_pkts)
 	uint64_t ol_flags;
 	uint8_t l2_len;
 	uint16_t l3_len;
-	uint8_t inner_l2_len;
-	uint16_t inner_l3_len;
+	uint16_t outer_l3_len;
+	uint8_t l4_tunnel_len;
 	uint16_t nb_used;
 	uint16_t nb_ctx;
 	uint16_t tx_last;
@@ -1219,9 +1213,11 @@ i40e_xmit_pkts(void *tx_queue, struct rte_mbuf **tx_pkts, uint16_t nb_pkts)
 
 		ol_flags = tx_pkt->ol_flags;
 		l2_len = tx_pkt->l2_len;
-		inner_l2_len = tx_pkt->inner_l2_len;
 		l3_len = tx_pkt->l3_len;
-		inner_l3_len = tx_pkt->inner_l3_len;
+		outer_l3_len = tx_pkt->outer_l3_len;
+
+		/* L4 Tunneling Length */
+		l4_tunnel_len = tx_pkt->l4_tun_len + l2_len;
 
 		/* Calculate the number of context descriptors needed. */
 		nb_ctx = i40e_calc_context_desc(ol_flags);
@@ -1271,8 +1267,8 @@ i40e_xmit_pkts(void *tx_queue, struct rte_mbuf **tx_pkts, uint16_t nb_pkts)
 		/* Enable checksum offloading */
 		cd_tunneling_params = 0;
 		i40e_txd_enable_checksum(ol_flags, &td_cmd, &td_offset,
-						l2_len, l3_len, inner_l2_len,
-						inner_l3_len,
+						l2_len, l3_len, l4_tunnel_len,
+						outer_l3_len,
 						&cd_tunneling_params);
 
 		if (unlikely(nb_ctx)) {
-- 
1.7.7.6

^ permalink raw reply	[flat|nested] 28+ messages in thread

* [dpdk-dev] [PATCH 3/3] testpmd:rework csum forward engine
  2014-11-27  8:18 [dpdk-dev] [PATCH 0/3] i40e VXLAN TX checksum rework Jijiang Liu
  2014-11-27  8:18 ` [dpdk-dev] [PATCH 1/3] mbuf:add two TX offload flags and change three fields Jijiang Liu
  2014-11-27  8:18 ` [dpdk-dev] [PATCH 2/3] i40e:PMD change for VXLAN TX checksum Jijiang Liu
@ 2014-11-27  8:18 ` Jijiang Liu
  2014-11-27 10:23   ` Olivier MATZ
  2014-11-27  8:50 ` [dpdk-dev] [PATCH 0/3] i40e VXLAN TX checksum rework Liu, Jijiang
  2014-11-27  9:44 ` Olivier MATZ
  4 siblings, 1 reply; 28+ messages in thread
From: Jijiang Liu @ 2014-11-27  8:18 UTC (permalink / raw)
  To: dev

The changes include:
1. use the new introduced ol_flags and fields in csumonly.c file;
2. fix an issue of outer UDP checksum check;
3. fix an issue that is if the TESTPMD_TX_OFFLOAD_IP_CKSUM is not set, and the "ol_flags |= PKT_TX_IPV4" should be done in the process_inner_cksums();

Signed-off-by: Jijiang Liu <jijiang.liu@intel.com>
---
 app/test-pmd/csumonly.c |   55 +++++++++++++++++++++++++---------------------
 1 files changed, 30 insertions(+), 25 deletions(-)

diff --git a/app/test-pmd/csumonly.c b/app/test-pmd/csumonly.c
index d8c080a..0727510 100644
--- a/app/test-pmd/csumonly.c
+++ b/app/test-pmd/csumonly.c
@@ -189,11 +189,12 @@ process_inner_cksums(void *l3_hdr, uint16_t ethertype, uint16_t l3_len,
 		} else {
 			if (testpmd_ol_flags & TESTPMD_TX_OFFLOAD_IP_CKSUM)
 				ol_flags |= PKT_TX_IP_CKSUM;
-			else
+			else {
 				ipv4_hdr->hdr_checksum =
 					rte_ipv4_cksum(ipv4_hdr);
+				ol_flags |= PKT_TX_IPV4;
+			}
 		}
-		ol_flags |= PKT_TX_IPV4;
 	} else if (ethertype == _htons(ETHER_TYPE_IPv6))
 		ol_flags |= PKT_TX_IPV6;
 	else
@@ -257,27 +258,28 @@ process_outer_cksums(void *outer_l3_hdr, uint16_t outer_ethertype,
 	uint64_t ol_flags = 0;
 
 	if (testpmd_ol_flags & TESTPMD_TX_OFFLOAD_VXLAN_CKSUM)
-		ol_flags |= PKT_TX_VXLAN_CKSUM;
+		ol_flags |= PKT_TX_UDP_TUNNEL_PKT;
 
 	if (outer_ethertype == _htons(ETHER_TYPE_IPv4)) {
 		ipv4_hdr->hdr_checksum = 0;
 
-		if ((testpmd_ol_flags & TESTPMD_TX_OFFLOAD_VXLAN_CKSUM) == 0)
+		if (testpmd_ol_flags & TESTPMD_TX_OFFLOAD_VXLAN_CKSUM)
+			ol_flags |= PKT_TX_OUTER_IPV4_CSUM;
+		else
 			ipv4_hdr->hdr_checksum = rte_ipv4_cksum(ipv4_hdr);
-	}
+	} else if (outer_ethertype == _htons(ETHER_TYPE_IPv6))
+		ol_flags |= PKT_TX_OUTER_IPV6;
 
 	udp_hdr = (struct udp_hdr *)((char *)outer_l3_hdr + outer_l3_len);
 	/* do not recalculate udp cksum if it was 0 */
 	if (udp_hdr->dgram_cksum != 0) {
 		udp_hdr->dgram_cksum = 0;
-		if ((testpmd_ol_flags & TESTPMD_TX_OFFLOAD_VXLAN_CKSUM) == 0) {
-			if (outer_ethertype == _htons(ETHER_TYPE_IPv4))
-				udp_hdr->dgram_cksum =
-					rte_ipv4_udptcp_cksum(ipv4_hdr, udp_hdr);
-			else
-				udp_hdr->dgram_cksum =
-					rte_ipv6_udptcp_cksum(ipv6_hdr, udp_hdr);
-		}
+		if (outer_ethertype == _htons(ETHER_TYPE_IPv4))
+			udp_hdr->dgram_cksum =
+				rte_ipv4_udptcp_cksum(ipv4_hdr, udp_hdr);
+		else
+			udp_hdr->dgram_cksum =
+				rte_ipv6_udptcp_cksum(ipv6_hdr, udp_hdr);
 	}
 
 	return ol_flags;
@@ -320,7 +322,7 @@ pkt_burst_checksum_forward(struct fwd_stream *fs)
 	uint16_t i;
 	uint64_t ol_flags;
 	uint16_t testpmd_ol_flags;
-	uint8_t l4_proto;
+	uint8_t l4_proto, l4_tun_len = 0;
 	uint16_t ethertype = 0, outer_ethertype = 0;
 	uint16_t l2_len = 0, l3_len = 0, l4_len = 0;
 	uint16_t outer_l2_len = 0, outer_l3_len = 0;
@@ -383,10 +385,6 @@ pkt_burst_checksum_forward(struct fwd_stream *fs)
 			if (((m->ol_flags & PKT_RX_TUNNEL_IPV4_HDR) ||
 					(m->ol_flags & PKT_RX_TUNNEL_IPV6_HDR)))
 				tunnel = 1;
-			/* else check udp destination port, 4789 is the default
-			 * vxlan port (rfc7348) */
-			else if (udp_hdr->dst_port == _htons(4789))
-				tunnel = 1;
 
 			if (tunnel == 1) {
 				outer_ethertype = ethertype;
@@ -394,6 +392,11 @@ pkt_burst_checksum_forward(struct fwd_stream *fs)
 				outer_l3_len = l3_len;
 				outer_l3_hdr = l3_hdr;
 
+				/* check udp destination port, 4789 is the default
+				 * vxlan port (rfc7348) */
+				if (udp_hdr->dst_port == _htons(4789))
+					l4_tun_len = ETHER_VXLAN_HLEN;
+
 				eth_hdr = (struct ether_hdr *)((char *)udp_hdr +
 					sizeof(struct udp_hdr) +
 					sizeof(struct vxlan_hdr));
@@ -432,10 +435,11 @@ pkt_burst_checksum_forward(struct fwd_stream *fs)
 
 		if (tunnel == 1) {
 			if (testpmd_ol_flags & TESTPMD_TX_OFFLOAD_VXLAN_CKSUM) {
-				m->l2_len = outer_l2_len;
-				m->l3_len = outer_l3_len;
-				m->inner_l2_len = l2_len;
-				m->inner_l3_len = l3_len;
+				m->outer_l2_len = outer_l2_len;
+				m->outer_l3_len = outer_l3_len;
+				m->l2_len = l2_len;
+				m->l3_len = l3_len;
+				m->l4_tun_len = l4_tun_len;
 			}
 			else {
 				/* if we don't do vxlan cksum in hw,
@@ -470,7 +474,7 @@ pkt_burst_checksum_forward(struct fwd_stream *fs)
 				{ PKT_TX_UDP_CKSUM, PKT_TX_L4_MASK },
 				{ PKT_TX_TCP_CKSUM, PKT_TX_L4_MASK },
 				{ PKT_TX_SCTP_CKSUM, PKT_TX_L4_MASK },
-				{ PKT_TX_VXLAN_CKSUM, PKT_TX_VXLAN_CKSUM },
+				{ PKT_TX_UDP_TUNNEL_PKT, PKT_TX_UDP_TUNNEL_PKT },
 				{ PKT_TX_TCP_SEG, PKT_TX_TCP_SEG },
 			};
 			unsigned j;
@@ -498,8 +502,9 @@ pkt_burst_checksum_forward(struct fwd_stream *fs)
 					m->l2_len, m->l3_len, m->l4_len);
 			if ((tunnel == 1) &&
 				(testpmd_ol_flags & TESTPMD_TX_OFFLOAD_VXLAN_CKSUM))
-				printf("tx: m->inner_l2_len=%d m->inner_l3_len=%d\n",
-					m->inner_l2_len, m->inner_l3_len);
+				printf("tx: m->outer_l2_len=%d m->outer_l3_len=%d\n"
+					"m->l4_tun_len=%d", m->outer_l2_len,
+					m->outer_l3_len, m->l4_tun_len);
 			if (tso_segsz != 0)
 				printf("tx: m->tso_segsz=%d\n", m->tso_segsz);
 			printf("tx: flags=");
-- 
1.7.7.6

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [dpdk-dev] [PATCH 0/3] i40e VXLAN TX checksum rework
  2014-11-27  8:18 [dpdk-dev] [PATCH 0/3] i40e VXLAN TX checksum rework Jijiang Liu
                   ` (2 preceding siblings ...)
  2014-11-27  8:18 ` [dpdk-dev] [PATCH 3/3] testpmd:rework csum forward engine Jijiang Liu
@ 2014-11-27  8:50 ` Liu, Jijiang
  2014-11-27  9:44 ` Olivier MATZ
  4 siblings, 0 replies; 28+ messages in thread
From: Liu, Jijiang @ 2014-11-27  8:50 UTC (permalink / raw)
  To: Ananyev, Konstantin, Olivier MATZ, Thomas Monjalon, dev


Could you review this patch today ? 
If possible, I hope this patch set can be included RC2.


> -----Original Message-----
> From: dev [mailto:dev-bounces@dpdk.org] On Behalf Of Jijiang Liu
> Sent: Thursday, November 27, 2014 4:19 PM
> To: dev@dpdk.org
> Subject: [dpdk-dev] [PATCH 0/3] i40e VXLAN TX checksum rework
> 
> We have got some feedback about backward compatibility of VXLAN TX checksum
> offload API with 1G/10G NIC after the i40e VXLAN TX checksum codes were
> applied, so we have to rework the APIs on i40e, including the changes of mbuf,
> i40e PMD and csum engine.
> 
> The main changes in mbuf are as follows, In place of removing
> PKT_TX_VXLAN_CKSUM, we introducing 2 new flags: PKT_TX_OUT_IP_CKSUM,
> PKT_TX_UDP_TUNNEL_PKT, and a new field: l4_tun_len.
> Replace the inner_l2_len and the inner_l3_len field with the outer_l2_len and
> outer_l3_len field.
> 
> The existing flags are listed below,
> PKT_TX_IP_CKSUM:     HW IPv4 checksum for non-tunnelling packet/ HW inner
> IPv4 checksum for tunnelling packet
> PKT_TX_TCP_CKSUM:    HW TCP checksum for non-tunnelling packet/ HW inner
> TCP checksum for tunnelling packet
> PKT_TX_SCTP_CKSUM:   HW SCTP checksum for non-tunnelling packet/ HW inner
> SCTP checksum for tunnelling packet
> PKT_TX_UDP_CKSUM:    HW SCTP checksum for non-tunnelling packet/ HW inner
> SCTP checksum for tunnelling packet
> PKT_TX_IPV4:        IPv4 with no HW checksum offload for non-tunnelling
> packet/inner IPv4 with no HW checksum offload for tunnelling packet
> PKT_TX_IPV6:        IPv6 non-tunnelling packet/ inner IPv6 with no HW checksum
> offload for tunnelling packet
> 
> let's use a few examples to demonstrate how to use these flags:
> Let say we have a tunnel packet:
> eth_hdr_out/ipv4_hdr_out/udp_hdr_out/vxlan_hdr/ehtr_hdr_in/ipv4_hdr_in/tcp
> _hdr_in.There could be several scenarios:
> 
> A) User requests HW offload for ipv4_hdr_out checksum.
> He doesn't care is it a tunnelled packet or not.
> So he sets:
> 
> mb->l2_len =  eth_hdr_out;
> mb->l3_len = ipv4_hdr_out;
> mb->ol_flags |= PKT_TX_IPV4_CSUM;
> 
> B) User is aware that it is a tunnelled packet and requests HW offload for
> ipv4_hdr_in and tcp_hdr_in *only*.
> He doesn't care about outer IP checksum offload.
> In that case, for FVL  he has 2 choices:
>    1. Treat that packet as a 'proper' tunnelled packet, and fill all the fields:
>      mb->l2_len =  eth_hdr_in;
>      mb->l3_len = ipv4_hdr_in;
>      mb->outer_l2_len = eth_hdr_out;
>      mb->outer_l3_len = ipv4_hdr_out;
>      mb->l4tun_len = vxlan_hdr;
>      mb->ol_flags |= PKT_TX_UDP_TUNNEL_PKT | PKT_TX_IP_CKSUM |
> PKT_TX_TCP_CKSUM;
> 
>    2. As user doesn't care about outer IP hdr checksum, he can treat everything
> before ipv4_hdr_in as L2 header.
>    So he knows, that it is a tunnelled packet, but makes HW to treat it as ordinary
> (non-tunnelled) packet:
>      mb->l2_len = eth_hdr_out + ipv4_hdr_out + udp_hdr_out + vxlan_hdr +
> ehtr_hdr_in;
>      mb->l3_len = ipv4_hdr_in;
>      mb->ol_flags |= PKT_TX_IP_CKSUM |  PKT_TX_TCP_CKSUM;
> 
> i40e PMD will support both B.1 and B.2.
> ixgbe/igb/em PMD supports only B.2.
> if HW supports both - it will be up to user app which method to choose.
> tespmd will support both methods, and it should be configurable by user which
> approach to use (cmdline parameter).
> So the user can try/test both methods and select an appropriate for him.
> 
> Now, B.2 is exactly what Oliver suggested.
> I think it has few important advantages over B.1:
> First of all - compatibility. It works across all HW we currently support
> (i40e/ixgbe/igb/em).
> Second - it is probably faster even on FVL, as for it we have to fill only TXD, while
> with approach #2 we have to fill both TCD and TXD.
> 
> C) User knows that is a tunnelled packet, and wants HW offload for all 3
> checksums:  outer IP hdr checksum, inner IP checksum, inner TCP checksum.
> Then he has to setup all TX checksum fields:
>      mb->l2_len =  eth_hdr_in;
>      mb->l3_len = ipv4_hdr_in;
>      mb->outer_l2_len = eth_hdr_out;
>      mb->outer_l3_len = ipv4_hdr_out;
>      mb->l4tun_len = vxlan_hdr;
>      mb->ol_flags |= PKT_TX_OUT_IP_CKSUM  | PKT_TX_UDP_TUNNEL |
> PKT_TX_IP_CKSUM |  PKT_TX_TCP_CKSUM;
> 
> Jijiang Liu (3):
>   mbuf change
>   i40e PMD change in i40e_rxtx.c
>   rework csum forward engine
> 
>  app/test-pmd/csumonly.c         |   55 +++++++++++++++++++++-----------------
>  lib/librte_mbuf/rte_mbuf.c      |    2 +-
>  lib/librte_mbuf/rte_mbuf.h      |   23 ++++++++++------
>  lib/librte_pmd_i40e/i40e_rxtx.c |   40 ++++++++++++---------------
>  4 files changed, 63 insertions(+), 57 deletions(-)
> 
> --
> 1.7.7.6

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [dpdk-dev] [PATCH 0/3] i40e VXLAN TX checksum rework
  2014-11-27  8:18 [dpdk-dev] [PATCH 0/3] i40e VXLAN TX checksum rework Jijiang Liu
                   ` (3 preceding siblings ...)
  2014-11-27  8:50 ` [dpdk-dev] [PATCH 0/3] i40e VXLAN TX checksum rework Liu, Jijiang
@ 2014-11-27  9:44 ` Olivier MATZ
  2014-11-27 10:12   ` Olivier MATZ
                     ` (2 more replies)
  4 siblings, 3 replies; 28+ messages in thread
From: Olivier MATZ @ 2014-11-27  9:44 UTC (permalink / raw)
  To: Jijiang Liu, dev

Hi Jijiang,

Please find below some comments about the specifications. The global
picture looks fine to me.

I've not reviewed the patch right now, but it's in the pipe.

On 11/27/2014 09:18 AM, Jijiang Liu wrote:
> We have got some feedback about backward compatibility of VXLAN TX checksum offload API with 1G/10G NIC after the i40e VXLAN TX checksum codes were applied, so we have to rework the APIs on i40e, including the changes of mbuf, i40e PMD and csum engine.
>
> The main changes in mbuf are as follows,
> In place of removing PKT_TX_VXLAN_CKSUM, we introducing 2 new flags: PKT_TX_OUT_IP_CKSUM, PKT_TX_UDP_TUNNEL_PKT, and a new field: l4_tun_len.

What about PKT_TX_OUT_UDP_CKSUM instead of PKT_TX_UDP_TUNNEL_PKT? It's
maybe more coherent with the other names.


> Replace the inner_l2_len and the inner_l3_len field with the outer_l2_len and outer_l3_len field.
>
> The existing flags are listed below,
> PKT_TX_IP_CKSUM:     HW IPv4 checksum for non-tunnelling packet/ HW inner IPv4 checksum for tunnelling packet
> PKT_TX_TCP_CKSUM:    HW TCP checksum for non-tunnelling packet/ HW inner TCP checksum for tunnelling packet
> PKT_TX_SCTP_CKSUM:   HW SCTP checksum for non-tunnelling packet/ HW inner SCTP checksum for tunnelling packet
> PKT_TX_UDP_CKSUM:    HW SCTP checksum for non-tunnelling packet/ HW inner SCTP checksum for tunnelling packet
> PKT_TX_IPV4:        IPv4 with no HW checksum offload for non-tunnelling packet/inner IPv4 with no HW checksum offload for tunnelling packet
> PKT_TX_IPV6:        IPv6 non-tunnelling packet/ inner IPv6 with no HW checksum offload for tunnelling packet

As I suggested in the TSO thread, I think the following semantics
is easier to understand for the user:

   - PKT_TX_IP_CKSUM: tell the NIC to compute IP cksum

   - PKT_TX_IPV4: tell the NIC it's an IPv4 packet. Required for L4
     checksum offload or TSO.

   - PKT_TX_IPV6: tell the NIC it's an IPv6 packet. Required for L4
     checksum offload or TSO.

I think it won't make a big difference in the FVL driver.


> let's use a few examples to demonstrate how to use these flags:
> Let say we have a tunnel packet: eth_hdr_out/ipv4_hdr_out/udp_hdr_out/vxlan_hdr/ehtr_hdr_in/ipv4_hdr_in/tcp_hdr_in.There could be several scenarios:
>
> A) User requests HW offload for ipv4_hdr_out checksum.
> He doesn't care is it a tunnelled packet or not.
> So he sets:
>
> mb->l2_len =  eth_hdr_out;
> mb->l3_len = ipv4_hdr_out;
> mb->ol_flags |= PKT_TX_IPV4_CSUM;
>
> B) User is aware that it is a tunnelled packet and requests HW offload for ipv4_hdr_in and tcp_hdr_in *only*.
> He doesn't care about outer IP checksum offload.
> In that case, for FVL  he has 2 choices:
>     1. Treat that packet as a 'proper' tunnelled packet, and fill all the fields:
>       mb->l2_len =  eth_hdr_in;
>       mb->l3_len = ipv4_hdr_in;
>       mb->outer_l2_len = eth_hdr_out;
>       mb->outer_l3_len = ipv4_hdr_out;
>       mb->l4tun_len = vxlan_hdr;
>       mb->ol_flags |= PKT_TX_UDP_TUNNEL_PKT | PKT_TX_IP_CKSUM |  PKT_TX_TCP_CKSUM;
>
>     2. As user doesn't care about outer IP hdr checksum, he can treat everything before ipv4_hdr_in as L2 header.
>     So he knows, that it is a tunnelled packet, but makes HW to treat it as ordinary (non-tunnelled) packet:
>       mb->l2_len = eth_hdr_out + ipv4_hdr_out + udp_hdr_out + vxlan_hdr + ehtr_hdr_in;
>       mb->l3_len = ipv4_hdr_in;
>       mb->ol_flags |= PKT_TX_IP_CKSUM |  PKT_TX_TCP_CKSUM;
>
> i40e PMD will support both B.1 and B.2.
> ixgbe/igb/em PMD supports only B.2.
> if HW supports both - it will be up to user app which method to choose.

I think we should have a flag to advertise outer ip and outer udp
checksum offload support, so the application knows which mode can
be used.


Regards,
Olivier

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [dpdk-dev] [PATCH 1/3] mbuf:add two TX offload flags and change three fields
  2014-11-27  8:18 ` [dpdk-dev] [PATCH 1/3] mbuf:add two TX offload flags and change three fields Jijiang Liu
@ 2014-11-27 10:00   ` Olivier MATZ
  2014-11-27 13:14     ` Liu, Jijiang
       [not found]     ` <1ED644BD7E0A5F4091CF203DAFB8E4CC01D9EEA0@SHSMSX101.ccr.corp.intel.com>
  0 siblings, 2 replies; 28+ messages in thread
From: Olivier MATZ @ 2014-11-27 10:00 UTC (permalink / raw)
  To: Jijiang Liu, dev

Hi Jijiang,

Please see some comments below.

On 11/27/2014 09:18 AM, Jijiang Liu wrote:
> In place of removing the PKT_TX_VXLAN_CKSUM, we introduce 2 new flags: PKT_TX_OUT_IP_CKSUM, PKT_TX_UDP_TUNNEL_PKT, and a new field: l4_tun_len.
> Replace the inner_l2_len and the inner_l3_len field with the outer_l2_len and outer_l3_len field.
>
> PKT_TX_OUT_IP_CKSUM: is not used for non-tunnelling packet;hardware outer checksum for tunnelling packet.
> PKT_TX_UDP_TUNNEL_PKT: is used to tell PMD that the transmit packet is a UDP tunneling packet.
> l4_tun_len: for VXLAN packet, it should be udp header length plus VXLAN header length.
>
> Signed-off-by: Jijiang Liu <jijiang.liu@intel.com>
> ---
>   lib/librte_mbuf/rte_mbuf.c |    2 +-
>   lib/librte_mbuf/rte_mbuf.h |   23 ++++++++++++++---------
>   2 files changed, 15 insertions(+), 10 deletions(-)
>
> diff --git a/lib/librte_mbuf/rte_mbuf.c b/lib/librte_mbuf/rte_mbuf.c
> index 87c2963..e89c310 100644
> --- a/lib/librte_mbuf/rte_mbuf.c
> +++ b/lib/librte_mbuf/rte_mbuf.c
> @@ -240,7 +240,7 @@ const char *rte_get_tx_ol_flag_name(uint64_t mask)
>   	case PKT_TX_SCTP_CKSUM: return "PKT_TX_SCTP_CKSUM";
>   	case PKT_TX_UDP_CKSUM: return "PKT_TX_UDP_CKSUM";
>   	case PKT_TX_IEEE1588_TMST: return "PKT_TX_IEEE1588_TMST";
> -	case PKT_TX_VXLAN_CKSUM: return "PKT_TX_VXLAN_CKSUM";
> +	case PKT_TX_UDP_TUNNEL_PKT: return "PKT_TX_UDP_TUNNEL_PKT";
>   	case PKT_TX_TCP_SEG: return "PKT_TX_TCP_SEG";
>   	default: return NULL;

As I said as a reply to the cover letter, I suggest to use
PKT_TX_OUT_UDP_CKSUM instead of PKT_TX_UDP_TUNNEL_PKT.

Also, the PKT_TX_OUT_IP_CKSUM case is missing here.

> diff --git a/lib/librte_mbuf/rte_mbuf.h b/lib/librte_mbuf/rte_mbuf.h
> index 367fc56..48cd8e1 100644
> --- a/lib/librte_mbuf/rte_mbuf.h
> +++ b/lib/librte_mbuf/rte_mbuf.h
> @@ -99,10 +99,9 @@ extern "C" {
>   #define PKT_RX_TUNNEL_IPV6_HDR (1ULL << 12) /**< RX tunnel packet with IPv6 header. */
>   #define PKT_RX_FDIR_ID       (1ULL << 13) /**< FD id reported if FDIR match. */
>   #define PKT_RX_FDIR_FLX      (1ULL << 14) /**< Flexible bytes reported if FDIR match. */
> -/* add new RX flags here */
>

We should probably not remove this line.


>   /* add new TX flags here */
> -#define PKT_TX_VXLAN_CKSUM   (1ULL << 50) /**< TX checksum of VXLAN computed by NIC */
> +#define PKT_TX_UDP_TUNNEL_PKT (1ULL << 50) /**< TX packet is an UDP tunneling packet */
>   #define PKT_TX_IEEE1588_TMST (1ULL << 51) /**< TX IEEE1588 packet to timestamp. */
>
>   /**
> @@ -125,13 +124,20 @@ extern "C" {
>   #define PKT_TX_IP_CKSUM      (1ULL << 54) /**< IP cksum of TX pkt. computed by NIC. */
>   #define PKT_TX_IPV4_CSUM     PKT_TX_IP_CKSUM /**< Alias of PKT_TX_IP_CKSUM. */
>
> +#define PKT_TX_VLAN_PKT      (1ULL << 55) /**< TX packet is a 802.1q VLAN packet. */
> +
>   /** Tell the NIC it's an IPv4 packet. Required for L4 checksum offload or TSO. */
> -#define PKT_TX_IPV4          PKT_RX_IPV4_HDR
> +#define PKT_TX_IPV4          (1ULL << 56)
>
>   /** Tell the NIC it's an IPv6 packet. Required for L4 checksum offload or TSO. */
> -#define PKT_TX_IPV6          PKT_RX_IPV6_HDR
> +#define PKT_TX_IPV6          (1ULL << 57)

The description in comment does not match the description in the cover
letter.

Also, I think replacing PKT_RX_IPV[46]_HDR by the value may go in
another commit.


> -#define PKT_TX_VLAN_PKT      (1ULL << 55) /**< TX packet is a 802.1q VLAN packet. */
> +/** Outer IP cksum of TX pkt. computed by NIC for tunneling packet */
> +#define PKT_TX_OUTER_IP_CKSUM   (1ULL << 58)
> +#define PKT_TX_OUTER_IPV4_CSUM  PKT_TX_OUTER_IP_CKSUM /**< Alias of PKT_TX_OUTER_IP_CKSUM. */

Why do we need an alias?

By the way, I think the alias of PKT_TX_IP_CKSUM is also uneeded and
can be removed. But it's not the topic of your series.

Also, the name PKT_TX_OUTER_IP_CKSUM does not match the name in the
cover letter and commit logs.


> +
> +/** Tell the NIC it's an outer IPv6 packet for tunneling packet.*/
> +#define PKT_TX_OUTER_IPV6    (1ULL << 59)
>

This flag is not in the cover letter or commit log. What is its purpose?


>   /**
>    * TCP segmentation offload. To enable this offload feature for a
> @@ -266,10 +272,9 @@ struct rte_mbuf {
>   			uint64_t tso_segsz:16; /**< TCP TSO segment size */
>
>   			/* fields for TX offloading of tunnels */
> -			uint64_t inner_l3_len:9; /**< inner L3 (IP) Hdr Length. */
> -			uint64_t inner_l2_len:7; /**< inner L2 (MAC) Hdr Length. */
> -
> -			/* uint64_t unused:8; */
> +			uint64_t outer_l3_len:9; /**< outer L3 (IP) Hdr Length. */
> +			uint64_t outer_l2_len:7; /**< outer L2 (MAC) Hdr Length. */
> +			uint64_t l4_tun_len:8; /**< L4 tunnelling header length */
>   		};
>   	};
>   } __rte_cache_aligned;
>

About l4_tun_len, I have another comment I forgot to add in the cover
letter. Can we remove it and include its length in outer_l2_len
instead? For instance, replace:

      mb->l2_len =  eth_hdr_in;
      mb->l3_len = ipv4_hdr_in;
      mb->outer_l2_len = eth_hdr_out;
      mb->outer_l3_len = ipv4_hdr_out;
      mb->l4tun_len = vxlan_hdr;
      mb->ol_flags |= PKT_TX_OUT_IP_CKSUM  | PKT_TX_UDP_TUNNEL |
        PKT_TX_IP_CKSUM |  PKT_TX_TCP_CKSUM;

by:

      mb->l2_len =  eth_hdr_in;
      mb->l3_len = ipv4_hdr_in;
      mb->outer_l2_len = eth_hdr_out + vxlan_hdr;
      mb->outer_l3_len = ipv4_hdr_out;
      mb->ol_flags |= PKT_TX_OUT_IP_CKSUM  | PKT_TX_UDP_TUNNEL |
        PKT_TX_IP_CKSUM |  PKT_TX_TCP_CKSUM;

I think it won't bother the driver, and it's coherent with case B.2 of
your cover letter.

Regards,
Olivier

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [dpdk-dev] [PATCH 0/3] i40e VXLAN TX checksum rework
  2014-11-27  9:44 ` Olivier MATZ
@ 2014-11-27 10:12   ` Olivier MATZ
  2014-11-27 12:06     ` Liu, Jijiang
  2014-11-27 12:07   ` Liu, Jijiang
  2014-11-27 15:29   ` Ananyev, Konstantin
  2 siblings, 1 reply; 28+ messages in thread
From: Olivier MATZ @ 2014-11-27 10:12 UTC (permalink / raw)
  To: Jijiang Liu, dev


>> We have got some feedback about backward compatibility of VXLAN TX
>> checksum offload API with 1G/10G NIC after the i40e VXLAN TX checksum
>> codes were applied, so we have to rework the APIs on i40e, including
>> the changes of mbuf, i40e PMD and csum engine.
>>
>> The main changes in mbuf are as follows,
>> In place of removing PKT_TX_VXLAN_CKSUM, we introducing 2 new flags:
>> PKT_TX_OUT_IP_CKSUM, PKT_TX_UDP_TUNNEL_PKT, and a new field: l4_tun_len.
>
> What about PKT_TX_OUT_UDP_CKSUM instead of PKT_TX_UDP_TUNNEL_PKT? It's
> maybe more coherent with the other names.

oh I just realized that the flag is not for asking to the hardware
to calculate the outer UDP checksum.
So why does the hardware need this information?

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [dpdk-dev] [PATCH 3/3] testpmd:rework csum forward engine
  2014-11-27  8:18 ` [dpdk-dev] [PATCH 3/3] testpmd:rework csum forward engine Jijiang Liu
@ 2014-11-27 10:23   ` Olivier MATZ
  0 siblings, 0 replies; 28+ messages in thread
From: Olivier MATZ @ 2014-11-27 10:23 UTC (permalink / raw)
  To: Jijiang Liu, dev

Hi Jijiang,

On 11/27/2014 09:18 AM, Jijiang Liu wrote:
> The changes include:
> 1. use the new introduced ol_flags and fields in csumonly.c file;
> 2. fix an issue of outer UDP checksum check;
> 3. fix an issue that is if the TESTPMD_TX_OFFLOAD_IP_CKSUM is not set, and the "ol_flags |= PKT_TX_IPV4" should be done in the process_inner_cksums();
>
> Signed-off-by: Jijiang Liu <jijiang.liu@intel.com>
> ---
>   app/test-pmd/csumonly.c |   55 +++++++++++++++++++++++++---------------------
>   1 files changed, 30 insertions(+), 25 deletions(-)
>
> diff --git a/app/test-pmd/csumonly.c b/app/test-pmd/csumonly.c
> index d8c080a..0727510 100644
> --- a/app/test-pmd/csumonly.c
> +++ b/app/test-pmd/csumonly.c
> @@ -189,11 +189,12 @@ process_inner_cksums(void *l3_hdr, uint16_t ethertype, uint16_t l3_len,
>   		} else {
>   			if (testpmd_ol_flags & TESTPMD_TX_OFFLOAD_IP_CKSUM)
>   				ol_flags |= PKT_TX_IP_CKSUM;
> -			else
> +			else {
>   				ipv4_hdr->hdr_checksum =
>   					rte_ipv4_cksum(ipv4_hdr);
> +				ol_flags |= PKT_TX_IPV4;
> +			}
>   		}
> -		ol_flags |= PKT_TX_IPV4;

Same remark than in the cover letter. I think we currently don't agree
on the meaning of PKT_TX_IPV4 and PKT_TX_IPV6.


>   	udp_hdr = (struct udp_hdr *)((char *)outer_l3_hdr + outer_l3_len);
>   	/* do not recalculate udp cksum if it was 0 */
>   	if (udp_hdr->dgram_cksum != 0) {
>   		udp_hdr->dgram_cksum = 0;
> -		if ((testpmd_ol_flags & TESTPMD_TX_OFFLOAD_VXLAN_CKSUM) == 0) {
> -			if (outer_ethertype == _htons(ETHER_TYPE_IPv4))
> -				udp_hdr->dgram_cksum =
> -					rte_ipv4_udptcp_cksum(ipv4_hdr, udp_hdr);
> -			else
> -				udp_hdr->dgram_cksum =
> -					rte_ipv6_udptcp_cksum(ipv6_hdr, udp_hdr);
> -		}
> +		if (outer_ethertype == _htons(ETHER_TYPE_IPv4))
> +			udp_hdr->dgram_cksum =
> +				rte_ipv4_udptcp_cksum(ipv4_hdr, udp_hdr);
> +		else
> +			udp_hdr->dgram_cksum =
> +				rte_ipv6_udptcp_cksum(ipv6_hdr, udp_hdr);

So, I understand that today no dpdk driver support the offload of
outer udp checksum, and that it has to be done in software?


> @@ -383,10 +385,6 @@ pkt_burst_checksum_forward(struct fwd_stream *fs)
>   			if (((m->ol_flags & PKT_RX_TUNNEL_IPV4_HDR) ||
>   					(m->ol_flags & PKT_RX_TUNNEL_IPV6_HDR)))
>   				tunnel = 1;
> -			/* else check udp destination port, 4789 is the default
> -			 * vxlan port (rfc7348) */
> -			else if (udp_hdr->dst_port == _htons(4789))
> -				tunnel = 1;
>
>   			if (tunnel == 1) {
>   				outer_ethertype = ethertype;
> @@ -394,6 +392,11 @@ pkt_burst_checksum_forward(struct fwd_stream *fs)
>   				outer_l3_len = l3_len;
>   				outer_l3_hdr = l3_hdr;
>
> +				/* check udp destination port, 4789 is the default
> +				 * vxlan port (rfc7348) */
> +				if (udp_hdr->dst_port == _htons(4789))
> +					l4_tun_len = ETHER_VXLAN_HLEN;
> +

Why moving this? It won't work anymore with drivers != i40e.


>   				eth_hdr = (struct ether_hdr *)((char *)udp_hdr +
>   					sizeof(struct udp_hdr) +
>   					sizeof(struct vxlan_hdr));
> @@ -432,10 +435,11 @@ pkt_burst_checksum_forward(struct fwd_stream *fs)
>
>   		if (tunnel == 1) {
>   			if (testpmd_ol_flags & TESTPMD_TX_OFFLOAD_VXLAN_CKSUM) {
> -				m->l2_len = outer_l2_len;
> -				m->l3_len = outer_l3_len;
> -				m->inner_l2_len = l2_len;
> -				m->inner_l3_len = l3_len;
> +				m->outer_l2_len = outer_l2_len;
> +				m->outer_l3_len = outer_l3_len;
> +				m->l2_len = l2_len;
> +				m->l3_len = l3_len;
> +				m->l4_tun_len = l4_tun_len;
>   			}
>   			else {
>   				/* if we don't do vxlan cksum in hw,
> @@ -470,7 +474,7 @@ pkt_burst_checksum_forward(struct fwd_stream *fs)
>   				{ PKT_TX_UDP_CKSUM, PKT_TX_L4_MASK },
>   				{ PKT_TX_TCP_CKSUM, PKT_TX_L4_MASK },
>   				{ PKT_TX_SCTP_CKSUM, PKT_TX_L4_MASK },
> -				{ PKT_TX_VXLAN_CKSUM, PKT_TX_VXLAN_CKSUM },
> +				{ PKT_TX_UDP_TUNNEL_PKT, PKT_TX_UDP_TUNNEL_PKT },
>   				{ PKT_TX_TCP_SEG, PKT_TX_TCP_SEG },
>   			};
>   			unsigned j;
> @@ -498,8 +502,9 @@ pkt_burst_checksum_forward(struct fwd_stream *fs)
>   					m->l2_len, m->l3_len, m->l4_len);
>   			if ((tunnel == 1) &&
>   				(testpmd_ol_flags & TESTPMD_TX_OFFLOAD_VXLAN_CKSUM))
> -				printf("tx: m->inner_l2_len=%d m->inner_l3_len=%d\n",
> -					m->inner_l2_len, m->inner_l3_len);
> +				printf("tx: m->outer_l2_len=%d m->outer_l3_len=%d\n"
> +					"m->l4_tun_len=%d", m->outer_l2_len,
> +					m->outer_l3_len, m->l4_tun_len);
>   			if (tso_segsz != 0)
>   				printf("tx: m->tso_segsz=%d\n", m->tso_segsz);
>   			printf("tx: flags=");
>

One more comment:

I think the help of csum forward engine in testpmd command lines +
all the comments in csumonly.c should be checked. For example, I
can see: " The VxLAN flag concerns the outer IP and UDP layer (if
packet is recognized as a vxlan packet)". If it concerns the IP header
only, the comment should be modified.

Regards,
Olivier

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [dpdk-dev] [PATCH 0/3] i40e VXLAN TX checksum rework
  2014-11-27 10:12   ` Olivier MATZ
@ 2014-11-27 12:06     ` Liu, Jijiang
  0 siblings, 0 replies; 28+ messages in thread
From: Liu, Jijiang @ 2014-11-27 12:06 UTC (permalink / raw)
  To: Olivier MATZ, dev



> -----Original Message-----
> From: Olivier MATZ [mailto:olivier.matz@6wind.com]
> Sent: Thursday, November 27, 2014 6:13 PM
> To: Liu, Jijiang; dev@dpdk.org
> Subject: Re: [dpdk-dev] [PATCH 0/3] i40e VXLAN TX checksum rework
> 
> 
> >> We have got some feedback about backward compatibility of VXLAN TX
> >> checksum offload API with 1G/10G NIC after the i40e VXLAN TX checksum
> >> codes were applied, so we have to rework the APIs on i40e, including
> >> the changes of mbuf, i40e PMD and csum engine.
> >>
> >> The main changes in mbuf are as follows, In place of removing
> >> PKT_TX_VXLAN_CKSUM, we introducing 2 new flags:
> >> PKT_TX_OUT_IP_CKSUM, PKT_TX_UDP_TUNNEL_PKT, and a new field:
> l4_tun_len.
> >
> > What about PKT_TX_OUT_UDP_CKSUM instead of
> PKT_TX_UDP_TUNNEL_PKT? It's
> > maybe more coherent with the other names.
> 
> oh I just realized that the flag is not for asking to the hardware to calculate the
> outer UDP checksum.
> So why does the hardware need this information?


As I said before, this flag is used to tell FVL that the transmit packet is a UDP tunneling packet.
In FVL, there is a register need to configured, see below.

L4 Tunneling Type (Teredo / GRE header / VXLAN header) indication:
00b - No UDP / GRE tunneling (field must be set to zero if EIPT equals to zero)
01b - UDP tunneling header (Any UDP tunneling, VxLAN and Geneve)
10b - GRE tunneling header
Else - reserved

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [dpdk-dev] [PATCH 0/3] i40e VXLAN TX checksum rework
  2014-11-27  9:44 ` Olivier MATZ
  2014-11-27 10:12   ` Olivier MATZ
@ 2014-11-27 12:07   ` Liu, Jijiang
  2014-11-27 15:29   ` Ananyev, Konstantin
  2 siblings, 0 replies; 28+ messages in thread
From: Liu, Jijiang @ 2014-11-27 12:07 UTC (permalink / raw)
  To: Olivier MATZ, dev



> -----Original Message-----
> From: Olivier MATZ [mailto:olivier.matz@6wind.com]
> Sent: Thursday, November 27, 2014 5:45 PM
> To: Liu, Jijiang; dev@dpdk.org
> Subject: Re: [dpdk-dev] [PATCH 0/3] i40e VXLAN TX checksum rework
> 
> Hi Jijiang,
> 
> Please find below some comments about the specifications. The global picture
> looks fine to me.
> 
> I've not reviewed the patch right now, but it's in the pipe.
> 
> On 11/27/2014 09:18 AM, Jijiang Liu wrote:
> > We have got some feedback about backward compatibility of VXLAN TX
> checksum offload API with 1G/10G NIC after the i40e VXLAN TX checksum codes
> were applied, so we have to rework the APIs on i40e, including the changes of
> mbuf, i40e PMD and csum engine.
> >
> > The main changes in mbuf are as follows, In place of removing
> > PKT_TX_VXLAN_CKSUM, we introducing 2 new flags: PKT_TX_OUT_IP_CKSUM,
> PKT_TX_UDP_TUNNEL_PKT, and a new field: l4_tun_len.
> 
> What about PKT_TX_OUT_UDP_CKSUM instead of PKT_TX_UDP_TUNNEL_PKT?
> It's maybe more coherent with the other names.
> 
> 
> > Replace the inner_l2_len and the inner_l3_len field with the outer_l2_len and
> outer_l3_len field.
> >
> > The existing flags are listed below,
> > PKT_TX_IP_CKSUM:     HW IPv4 checksum for non-tunnelling packet/ HW inner
> IPv4 checksum for tunnelling packet
> > PKT_TX_TCP_CKSUM:    HW TCP checksum for non-tunnelling packet/ HW inner
> TCP checksum for tunnelling packet
> > PKT_TX_SCTP_CKSUM:   HW SCTP checksum for non-tunnelling packet/ HW
> inner SCTP checksum for tunnelling packet
> > PKT_TX_UDP_CKSUM:    HW SCTP checksum for non-tunnelling packet/ HW
> inner SCTP checksum for tunnelling packet
> > PKT_TX_IPV4:        IPv4 with no HW checksum offload for non-tunnelling
> packet/inner IPv4 with no HW checksum offload for tunnelling packet
> > PKT_TX_IPV6:        IPv6 non-tunnelling packet/ inner IPv6 with no HW checksum
> offload for tunnelling packet
> 
> As I suggested in the TSO thread, I think the following semantics is easier to
> understand for the user:
> 
>    - PKT_TX_IP_CKSUM: tell the NIC to compute IP cksum
> 
>    - PKT_TX_IPV4: tell the NIC it's an IPv4 packet. Required for L4
>      checksum offload or TSO.
> 
>    - PKT_TX_IPV6: tell the NIC it's an IPv6 packet. Required for L4
>      checksum offload or TSO.
> 
> I think it won't make a big difference in the FVL driver.
> 
Ok.

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [dpdk-dev] [PATCH 1/3] mbuf:add two TX offload flags and change three fields
  2014-11-27 10:00   ` Olivier MATZ
@ 2014-11-27 13:14     ` Liu, Jijiang
  2014-11-28  9:17       ` Olivier MATZ
       [not found]     ` <1ED644BD7E0A5F4091CF203DAFB8E4CC01D9EEA0@SHSMSX101.ccr.corp.intel.com>
  1 sibling, 1 reply; 28+ messages in thread
From: Liu, Jijiang @ 2014-11-27 13:14 UTC (permalink / raw)
  To: Olivier MATZ, dev



> -----Original Message-----
> From: Olivier MATZ [mailto:olivier.matz@6wind.com]
> Sent: Thursday, November 27, 2014 6:00 PM
> To: Liu, Jijiang; dev@dpdk.org
> Subject: Re: [dpdk-dev] [PATCH 1/3] mbuf:add two TX offload flags and change
> three fields
> 
> Hi Jijiang,
> 
> Please see some comments below.
> 
> On 11/27/2014 09:18 AM, Jijiang Liu wrote:
> > In place of removing the PKT_TX_VXLAN_CKSUM, we introduce 2 new flags:
> PKT_TX_OUT_IP_CKSUM, PKT_TX_UDP_TUNNEL_PKT, and a new field:
> l4_tun_len.
> > Replace the inner_l2_len and the inner_l3_len field with the outer_l2_len and
> outer_l3_len field.
> >
> > PKT_TX_OUT_IP_CKSUM: is not used for non-tunnelling packet;hardware outer
> checksum for tunnelling packet.
> > PKT_TX_UDP_TUNNEL_PKT: is used to tell PMD that the transmit packet is a
> UDP tunneling packet.
> > l4_tun_len: for VXLAN packet, it should be udp header length plus VXLAN
> header length.
> >
> > Signed-off-by: Jijiang Liu <jijiang.liu@intel.com>
> > ---
> >   lib/librte_mbuf/rte_mbuf.c |    2 +-
> >   lib/librte_mbuf/rte_mbuf.h |   23 ++++++++++++++---------
> >   2 files changed, 15 insertions(+), 10 deletions(-)
> >
> > diff --git a/lib/librte_mbuf/rte_mbuf.c b/lib/librte_mbuf/rte_mbuf.c
> > index 87c2963..e89c310 100644
> > --- a/lib/librte_mbuf/rte_mbuf.c
> > +++ b/lib/librte_mbuf/rte_mbuf.c
> > @@ -240,7 +240,7 @@ const char *rte_get_tx_ol_flag_name(uint64_t mask)
> >   	case PKT_TX_SCTP_CKSUM: return "PKT_TX_SCTP_CKSUM";
> >   	case PKT_TX_UDP_CKSUM: return "PKT_TX_UDP_CKSUM";
> >   	case PKT_TX_IEEE1588_TMST: return "PKT_TX_IEEE1588_TMST";
> > -	case PKT_TX_VXLAN_CKSUM: return "PKT_TX_VXLAN_CKSUM";
> > +	case PKT_TX_UDP_TUNNEL_PKT: return "PKT_TX_UDP_TUNNEL_PKT";
> >   	case PKT_TX_TCP_SEG: return "PKT_TX_TCP_SEG";
> >   	default: return NULL;
> 
> As I said as a reply to the cover letter, I suggest to use
> PKT_TX_OUT_UDP_CKSUM instead of PKT_TX_UDP_TUNNEL_PKT.
> 
> Also, the PKT_TX_OUT_IP_CKSUM case is missing here.
Yes.

> > diff --git a/lib/librte_mbuf/rte_mbuf.h b/lib/librte_mbuf/rte_mbuf.h
> > index 367fc56..48cd8e1 100644
> > --- a/lib/librte_mbuf/rte_mbuf.h
> > +++ b/lib/librte_mbuf/rte_mbuf.h
> > @@ -99,10 +99,9 @@ extern "C" {
> >   #define PKT_RX_TUNNEL_IPV6_HDR (1ULL << 12) /**< RX tunnel packet with
> IPv6 header. */
> >   #define PKT_RX_FDIR_ID       (1ULL << 13) /**< FD id reported if FDIR match.
> */
> >   #define PKT_RX_FDIR_FLX      (1ULL << 14) /**< Flexible bytes reported if FDIR
> match. */
> > -/* add new RX flags here */
> >
> 
> We should probably not remove this line.

Why?
There are two lines "/* add new RX flags here */" in rte_mbuf.h file.

> 
> >   /* add new TX flags here */
> > -#define PKT_TX_VXLAN_CKSUM   (1ULL << 50) /**< TX checksum of VXLAN
> computed by NIC */
> > +#define PKT_TX_UDP_TUNNEL_PKT (1ULL << 50) /**< TX packet is an UDP
> > +tunneling packet */
> >   #define PKT_TX_IEEE1588_TMST (1ULL << 51) /**< TX IEEE1588 packet to
> > timestamp. */
> >
> >   /**
> > @@ -125,13 +124,20 @@ extern "C" {
> >   #define PKT_TX_IP_CKSUM      (1ULL << 54) /**< IP cksum of TX pkt. computed
> by NIC. */
> >   #define PKT_TX_IPV4_CSUM     PKT_TX_IP_CKSUM /**< Alias of
> PKT_TX_IP_CKSUM. */
> >
> > +#define PKT_TX_VLAN_PKT      (1ULL << 55) /**< TX packet is a 802.1q VLAN
> packet. */
> > +
> >   /** Tell the NIC it's an IPv4 packet. Required for L4 checksum offload or TSO.
> */
> > -#define PKT_TX_IPV4          PKT_RX_IPV4_HDR
> > +#define PKT_TX_IPV4          (1ULL << 56)
> >
> >   /** Tell the NIC it's an IPv6 packet. Required for L4 checksum offload or TSO.
> */
> > -#define PKT_TX_IPV6          PKT_RX_IPV6_HDR
> > +#define PKT_TX_IPV6          (1ULL << 57)
> 
> The description in comment does not match the description in the cover letter.
Change cover letter.
> Also, I think replacing PKT_RX_IPV[46]_HDR by the value may go in another
> commit.
Ok.
> 
> > -#define PKT_TX_VLAN_PKT      (1ULL << 55) /**< TX packet is a 802.1q VLAN
> packet. */
> > +/** Outer IP cksum of TX pkt. computed by NIC for tunneling packet */
> > +#define PKT_TX_OUTER_IP_CKSUM   (1ULL << 58)
> > +#define PKT_TX_OUTER_IPV4_CSUM  PKT_TX_OUTER_IP_CKSUM /**< Alias
> of
> > +PKT_TX_OUTER_IP_CKSUM. */
> 
> Why do we need an alias?
Ok, remove it.
> By the way, I think the alias of PKT_TX_IP_CKSUM is also uneeded and can be
> removed. But it's not the topic of your series.
> 
> Also, the name PKT_TX_OUTER_IP_CKSUM does not match the name in the cover
> letter and commit logs.
> 
> 
> > +
> > +/** Tell the NIC it's an outer IPv6 packet for tunneling packet.*/
> > +#define PKT_TX_OUTER_IPV6    (1ULL << 59)
> >
> 
> This flag is not in the cover letter or commit log. What is its purpose?
> 
With FVL, if outer L3 header is IPv6, to make HW TX checksum offload work , SW must be responsible to tell hardware this information.

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [dpdk-dev] [PATCH 1/3] mbuf:add two TX offload flags and change three fields
       [not found]     ` <1ED644BD7E0A5F4091CF203DAFB8E4CC01D9EEA0@SHSMSX101.ccr.corp.intel.com>
@ 2014-11-27 14:56       ` Ananyev, Konstantin
  2014-11-27 17:01         ` Ananyev, Konstantin
  0 siblings, 1 reply; 28+ messages in thread
From: Ananyev, Konstantin @ 2014-11-27 14:56 UTC (permalink / raw)
  To: Liu, Jijiang, Olivier Matz (olivier.matz@6wind.com); +Cc: dev



> 
> -----Original Message-----
> From: Olivier MATZ [mailto:olivier.matz@6wind.com]
> Sent: Thursday, November 27, 2014 6:00 PM
> To: Liu, Jijiang; dev@dpdk.org
> Subject: Re: [dpdk-dev] [PATCH 1/3] mbuf:add two TX offload flags and change three fields
> 
> Hi Jijiang,
> 
> Please see some comments below.
> 
> On 11/27/2014 09:18 AM, Jijiang Liu wrote:
> > In place of removing the PKT_TX_VXLAN_CKSUM, we introduce 2 new flags: PKT_TX_OUT_IP_CKSUM, PKT_TX_UDP_TUNNEL_PKT,
> and a new field: l4_tun_len.
> > Replace the inner_l2_len and the inner_l3_len field with the outer_l2_len and outer_l3_len field.
> >
> > PKT_TX_OUT_IP_CKSUM: is not used for non-tunnelling packet;hardware outer checksum for tunnelling packet.
> > PKT_TX_UDP_TUNNEL_PKT: is used to tell PMD that the transmit packet is a UDP tunneling packet.
> > l4_tun_len: for VXLAN packet, it should be udp header length plus VXLAN header length.
> >
> > Signed-off-by: Jijiang Liu <jijiang.liu@intel.com>
> > ---
> >   lib/librte_mbuf/rte_mbuf.c |    2 +-
> >   lib/librte_mbuf/rte_mbuf.h |   23 ++++++++++++++---------
> >   2 files changed, 15 insertions(+), 10 deletions(-)
> >
> > diff --git a/lib/librte_mbuf/rte_mbuf.c b/lib/librte_mbuf/rte_mbuf.c
> > index 87c2963..e89c310 100644
> > --- a/lib/librte_mbuf/rte_mbuf.c
> > +++ b/lib/librte_mbuf/rte_mbuf.c
> > @@ -240,7 +240,7 @@ const char *rte_get_tx_ol_flag_name(uint64_t mask)
> >   	case PKT_TX_SCTP_CKSUM: return "PKT_TX_SCTP_CKSUM";
> >   	case PKT_TX_UDP_CKSUM: return "PKT_TX_UDP_CKSUM";
> >   	case PKT_TX_IEEE1588_TMST: return "PKT_TX_IEEE1588_TMST";
> > -	case PKT_TX_VXLAN_CKSUM: return "PKT_TX_VXLAN_CKSUM";
> > +	case PKT_TX_UDP_TUNNEL_PKT: return "PKT_TX_UDP_TUNNEL_PKT";
> >   	case PKT_TX_TCP_SEG: return "PKT_TX_TCP_SEG";
> >   	default: return NULL;
> 
> As I said as a reply to the cover letter, I suggest to use PKT_TX_OUT_UDP_CKSUM instead of PKT_TX_UDP_TUNNEL_PKT.

HW don't support outer L4 checksum offload.
But to calculate inner checksums correctly, it needs a hint from SW about L4 Tunneling Type.
Currently the following values are recognised by HW:

L4 Tunneling Type (Teredo / GRE header / VXLAN header) indication:
00b - No UDP / GRE tunneling (field must be set to zero if EIPT equals to zero)
01b - UDP tunneling header (any UDP tunneling, VXLAN and Geneve).
10b - GRE tunneling header
Else - reserved

You can check yourself:
http://www.intel.com/content/www/us/en/embedded/products/networking/xl710-10-40-controller-datasheet.html
Sections 8.4.2.2.1 and 8.4.4.2

> 
> Also, the PKT_TX_OUT_IP_CKSUM case is missing here.
> 
> > diff --git a/lib/librte_mbuf/rte_mbuf.h b/lib/librte_mbuf/rte_mbuf.h
> > index 367fc56..48cd8e1 100644
> > --- a/lib/librte_mbuf/rte_mbuf.h
> > +++ b/lib/librte_mbuf/rte_mbuf.h
> > @@ -99,10 +99,9 @@ extern "C" {
> >   #define PKT_RX_TUNNEL_IPV6_HDR (1ULL << 12) /**< RX tunnel packet with IPv6 header. */
> >   #define PKT_RX_FDIR_ID       (1ULL << 13) /**< FD id reported if FDIR match. */
> >   #define PKT_RX_FDIR_FLX      (1ULL << 14) /**< Flexible bytes reported if FDIR match. */
> > -/* add new RX flags here */
> >
> 
> We should probably not remove this line.
> 
> 
> >   /* add new TX flags here */
> > -#define PKT_TX_VXLAN_CKSUM   (1ULL << 50) /**< TX checksum of VXLAN computed by NIC */
> > +#define PKT_TX_UDP_TUNNEL_PKT (1ULL << 50) /**< TX packet is an UDP
> > +tunneling packet */
> >   #define PKT_TX_IEEE1588_TMST (1ULL << 51) /**< TX IEEE1588 packet to
> > timestamp. */
> >
> >   /**
> > @@ -125,13 +124,20 @@ extern "C" {
> >   #define PKT_TX_IP_CKSUM      (1ULL << 54) /**< IP cksum of TX pkt. computed by NIC. */
> >   #define PKT_TX_IPV4_CSUM     PKT_TX_IP_CKSUM /**< Alias of PKT_TX_IP_CKSUM. */
> >
> > +#define PKT_TX_VLAN_PKT      (1ULL << 55) /**< TX packet is a 802.1q VLAN packet. */
> > +
> >   /** Tell the NIC it's an IPv4 packet. Required for L4 checksum offload or TSO. */
> > -#define PKT_TX_IPV4          PKT_RX_IPV4_HDR
> > +#define PKT_TX_IPV4          (1ULL << 56)
> >
> >   /** Tell the NIC it's an IPv6 packet. Required for L4 checksum offload or TSO. */
> > -#define PKT_TX_IPV6          PKT_RX_IPV6_HDR
> > +#define PKT_TX_IPV6          (1ULL << 57)
> 
> The description in comment does not match the description in the cover letter.
> 
> Also, I think replacing PKT_RX_IPV[46]_HDR by the value may go in another commit.
> 
> 
> > -#define PKT_TX_VLAN_PKT      (1ULL << 55) /**< TX packet is a 802.1q VLAN packet. */
> > +/** Outer IP cksum of TX pkt. computed by NIC for tunneling packet */
> > +#define PKT_TX_OUTER_IP_CKSUM   (1ULL << 58)
> > +#define PKT_TX_OUTER_IPV4_CSUM  PKT_TX_OUTER_IP_CKSUM /**< Alias of
> > +PKT_TX_OUTER_IP_CKSUM. */
> 
> Why do we need an alias?
> 
> By the way, I think the alias of PKT_TX_IP_CKSUM is also uneeded and can be removed. But it's not the topic of your series.
> 
> Also, the name PKT_TX_OUTER_IP_CKSUM does not match the name in the cover letter and commit logs.
> 
> 
> > +
> > +/** Tell the NIC it's an outer IPv6 packet for tunneling packet.*/
> > +#define PKT_TX_OUTER_IPV6    (1ULL << 59)
> >
> 
> This flag is not in the cover letter or commit log. What is its purpose?


My bad, forgot that for outer IP, will also need to specify it's type.
So same story here as for inner IP.
So in total, we might need 3 flags for outer IP:

/* Tells HW that outer IP is IPV4 and checksum for it should be calculated by HW. */
PKT_TX_OUTER_IP_CKSUM

/* Tells HW that outer IP is IPV4 and checksum for it should not be calculated by HW. */
PKT_TX_OUTER_IPV4

/* Tells HW that outer IP is IPV6. */
PKT_TX_OUTER_IPV6

> 
> 
> >   /**
> >    * TCP segmentation offload. To enable this offload feature for a @@
> > -266,10 +272,9 @@ struct rte_mbuf {
> >   			uint64_t tso_segsz:16; /**< TCP TSO segment size */
> >
> >   			/* fields for TX offloading of tunnels */
> > -			uint64_t inner_l3_len:9; /**< inner L3 (IP) Hdr Length. */
> > -			uint64_t inner_l2_len:7; /**< inner L2 (MAC) Hdr Length. */
> > -
> > -			/* uint64_t unused:8; */
> > +			uint64_t outer_l3_len:9; /**< outer L3 (IP) Hdr Length. */
> > +			uint64_t outer_l2_len:7; /**< outer L2 (MAC) Hdr Length. */
> > +			uint64_t l4_tun_len:8; /**< L4 tunnelling header length */
> >   		};
> >   	};
> >   } __rte_cache_aligned;
> >
> 
> About l4_tun_len, I have another comment I forgot to add in the cover letter. Can we remove it and include its length in outer_l2_len
> instead? For instance, replace:
> 
>       mb->l2_len =  eth_hdr_in;
>       mb->l3_len = ipv4_hdr_in;
>       mb->outer_l2_len = eth_hdr_out;
>       mb->outer_l3_len = ipv4_hdr_out;
>       mb->l4tun_len = vxlan_hdr;
>       mb->ol_flags |= PKT_TX_OUT_IP_CKSUM  | PKT_TX_UDP_TUNNEL |
>         PKT_TX_IP_CKSUM |  PKT_TX_TCP_CKSUM;
> 
> by:
> 
>       mb->l2_len =  eth_hdr_in;
>       mb->l3_len = ipv4_hdr_in;
>       mb->outer_l2_len = eth_hdr_out + vxlan_hdr;
>       mb->outer_l3_len = ipv4_hdr_out;
>       mb->ol_flags |= PKT_TX_OUT_IP_CKSUM  | PKT_TX_UDP_TUNNEL |
>         PKT_TX_IP_CKSUM |  PKT_TX_TCP_CKSUM;
> 
> I think it won't bother the driver, and it's coherent with case B.2 of your cover letter.

You probably meant:
mb->l2_len =  eth_hdr_in + vxlan_hdr;
?
Yes, I think it could be done that way too.
Though I still prefer to keep l4tun_len - it makes things a bit cleaner (at least to me). 
After all  we do have space for it in mbuf's tx_offload.
Konstantin

> 
> Regards,
> Olivier

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [dpdk-dev] [PATCH 0/3] i40e VXLAN TX checksum rework
  2014-11-27  9:44 ` Olivier MATZ
  2014-11-27 10:12   ` Olivier MATZ
  2014-11-27 12:07   ` Liu, Jijiang
@ 2014-11-27 15:29   ` Ananyev, Konstantin
  2014-11-27 16:31     ` Liu, Jijiang
  2014-11-28  9:26     ` Olivier MATZ
  2 siblings, 2 replies; 28+ messages in thread
From: Ananyev, Konstantin @ 2014-11-27 15:29 UTC (permalink / raw)
  To: Olivier MATZ, Liu, Jijiang, dev

Hi Oliver,

> -----Original Message-----
> From: dev [mailto:dev-bounces@dpdk.org] On Behalf Of Olivier MATZ
> Sent: Thursday, November 27, 2014 9:45 AM
> To: Liu, Jijiang; dev@dpdk.org
> Subject: Re: [dpdk-dev] [PATCH 0/3] i40e VXLAN TX checksum rework
> 
> Hi Jijiang,
> 
> Please find below some comments about the specifications. The global
> picture looks fine to me.
> 
> I've not reviewed the patch right now, but it's in the pipe.
> 
> On 11/27/2014 09:18 AM, Jijiang Liu wrote:
> > We have got some feedback about backward compatibility of VXLAN TX checksum offload API with 1G/10G NIC after the i40e VXLAN
> TX checksum codes were applied, so we have to rework the APIs on i40e, including the changes of mbuf, i40e PMD and csum engine.
> >
> > The main changes in mbuf are as follows,
> > In place of removing PKT_TX_VXLAN_CKSUM, we introducing 2 new flags: PKT_TX_OUT_IP_CKSUM, PKT_TX_UDP_TUNNEL_PKT,
> and a new field: l4_tun_len.
> 
> What about PKT_TX_OUT_UDP_CKSUM instead of PKT_TX_UDP_TUNNEL_PKT? It's
> maybe more coherent with the other names.

FVL HW don't support outer L4 checksum offload.
But to calculate inner checksums correctly, it needs a hint from SW about L4 Tunnelling Type.

> 
> 
> > Replace the inner_l2_len and the inner_l3_len field with the outer_l2_len and outer_l3_len field.
> >
> > The existing flags are listed below,
> > PKT_TX_IP_CKSUM:     HW IPv4 checksum for non-tunnelling packet/ HW inner IPv4 checksum for tunnelling packet
> > PKT_TX_TCP_CKSUM:    HW TCP checksum for non-tunnelling packet/ HW inner TCP checksum for tunnelling packet
> > PKT_TX_SCTP_CKSUM:   HW SCTP checksum for non-tunnelling packet/ HW inner SCTP checksum for tunnelling packet
> > PKT_TX_UDP_CKSUM:    HW SCTP checksum for non-tunnelling packet/ HW inner SCTP checksum for tunnelling packet
> > PKT_TX_IPV4:        IPv4 with no HW checksum offload for non-tunnelling packet/inner IPv4 with no HW checksum offload for
> tunnelling packet
> > PKT_TX_IPV6:        IPv6 non-tunnelling packet/ inner IPv6 with no HW checksum offload for tunnelling packet
> 
> As I suggested in the TSO thread, I think the following semantics
> is easier to understand for the user:
> 
>    - PKT_TX_IP_CKSUM: tell the NIC to compute IP cksum
> 
>    - PKT_TX_IPV4: tell the NIC it's an IPv4 packet. Required for L4
>      checksum offload or TSO.
> 
>    - PKT_TX_IPV6: tell the NIC it's an IPv6 packet. Required for L4
>      checksum offload or TSO.
> 
> I think it won't make a big difference in the FVL driver.

No, no big difference here, but I still think it will be a bit cleaner if all 3 flags would be nutually exclusive.
In fact,  we can unite all 3 of them them into 2 bits,    same as we doing for L4 checksum flags.

> 
> 
> > let's use a few examples to demonstrate how to use these flags:
> > Let say we have a tunnel packet: eth_hdr_out/ipv4_hdr_out/udp_hdr_out/vxlan_hdr/ehtr_hdr_in/ipv4_hdr_in/tcp_hdr_in.There
> could be several scenarios:
> >
> > A) User requests HW offload for ipv4_hdr_out checksum.
> > He doesn't care is it a tunnelled packet or not.
> > So he sets:
> >
> > mb->l2_len =  eth_hdr_out;
> > mb->l3_len = ipv4_hdr_out;
> > mb->ol_flags |= PKT_TX_IPV4_CSUM;
> >
> > B) User is aware that it is a tunnelled packet and requests HW offload for ipv4_hdr_in and tcp_hdr_in *only*.
> > He doesn't care about outer IP checksum offload.
> > In that case, for FVL  he has 2 choices:
> >     1. Treat that packet as a 'proper' tunnelled packet, and fill all the fields:
> >       mb->l2_len =  eth_hdr_in;
> >       mb->l3_len = ipv4_hdr_in;
> >       mb->outer_l2_len = eth_hdr_out;
> >       mb->outer_l3_len = ipv4_hdr_out;
> >       mb->l4tun_len = vxlan_hdr;
> >       mb->ol_flags |= PKT_TX_UDP_TUNNEL_PKT | PKT_TX_IP_CKSUM |  PKT_TX_TCP_CKSUM;
> >
> >     2. As user doesn't care about outer IP hdr checksum, he can treat everything before ipv4_hdr_in as L2 header.
> >     So he knows, that it is a tunnelled packet, but makes HW to treat it as ordinary (non-tunnelled) packet:
> >       mb->l2_len = eth_hdr_out + ipv4_hdr_out + udp_hdr_out + vxlan_hdr + ehtr_hdr_in;
> >       mb->l3_len = ipv4_hdr_in;
> >       mb->ol_flags |= PKT_TX_IP_CKSUM |  PKT_TX_TCP_CKSUM;
> >
> > i40e PMD will support both B.1 and B.2.
> > ixgbe/igb/em PMD supports only B.2.
> > if HW supports both - it will be up to user app which method to choose.
> 
> I think we should have a flag to advertise outer ip and outer udp
> checksum offload support, so the application knows which mode can
> be used.

You mean a new DEV_TX_OFFLOAD_* value, right?
Something like:  DEV_TX_OFFLOAD_UDP_TUNNEL?
And make i40e_dev_info_get() to return it?
Yes, forgot about it, sounds like a proper thing to do. 

Konstantin

> 
> 
> Regards,
> Olivier

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [dpdk-dev] [PATCH 0/3] i40e VXLAN TX checksum rework
  2014-11-27 15:29   ` Ananyev, Konstantin
@ 2014-11-27 16:31     ` Liu, Jijiang
  2014-12-03  8:02       ` Liu, Jijiang
  2014-11-28  9:26     ` Olivier MATZ
  1 sibling, 1 reply; 28+ messages in thread
From: Liu, Jijiang @ 2014-11-27 16:31 UTC (permalink / raw)
  To: Olivier MATZ; +Cc: dev



> -----Original Message-----
> From: Ananyev, Konstantin
> Sent: Thursday, November 27, 2014 11:30 PM
> To: Olivier MATZ; Liu, Jijiang; dev@dpdk.org
> Subject: RE: [dpdk-dev] [PATCH 0/3] i40e VXLAN TX checksum rework
> 
> Hi Oliver,
> 
> > -----Original Message-----
> > From: dev [mailto:dev-bounces@dpdk.org] On Behalf Of Olivier MATZ
> > Sent: Thursday, November 27, 2014 9:45 AM
> > To: Liu, Jijiang; dev@dpdk.org
> > Subject: Re: [dpdk-dev] [PATCH 0/3] i40e VXLAN TX checksum rework
> >
> > Hi Jijiang,
> >
> > Please find below some comments about the specifications. The global
> > picture looks fine to me.
> >
> > I've not reviewed the patch right now, but it's in the pipe.
> >
> > On 11/27/2014 09:18 AM, Jijiang Liu wrote:
> > > We have got some feedback about backward compatibility of VXLAN TX
> > > checksum offload API with 1G/10G NIC after the i40e VXLAN
> > TX checksum codes were applied, so we have to rework the APIs on i40e,
> including the changes of mbuf, i40e PMD and csum engine.
> > >
> > > The main changes in mbuf are as follows, In place of removing
> > > PKT_TX_VXLAN_CKSUM, we introducing 2 new flags:
> PKT_TX_OUT_IP_CKSUM,
> > > PKT_TX_UDP_TUNNEL_PKT,
> > and a new field: l4_tun_len.
> >
> > What about PKT_TX_OUT_UDP_CKSUM instead of
> PKT_TX_UDP_TUNNEL_PKT? It's
> > maybe more coherent with the other names.
> 
> FVL HW don't support outer L4 checksum offload.
> But to calculate inner checksums correctly, it needs a hint from SW about L4
> Tunnelling Type.
> 
> >
> >
> > > Replace the inner_l2_len and the inner_l3_len field with the outer_l2_len and
> outer_l3_len field.
> > >
> > > The existing flags are listed below,
> > > PKT_TX_IP_CKSUM:     HW IPv4 checksum for non-tunnelling packet/ HW
> inner IPv4 checksum for tunnelling packet
> > > PKT_TX_TCP_CKSUM:    HW TCP checksum for non-tunnelling packet/ HW
> inner TCP checksum for tunnelling packet
> > > PKT_TX_SCTP_CKSUM:   HW SCTP checksum for non-tunnelling packet/ HW
> inner SCTP checksum for tunnelling packet
> > > PKT_TX_UDP_CKSUM:    HW SCTP checksum for non-tunnelling packet/ HW
> inner SCTP checksum for tunnelling packet
> > > PKT_TX_IPV4:        IPv4 with no HW checksum offload for non-tunnelling
> packet/inner IPv4 with no HW checksum offload for
> > tunnelling packet
> > > PKT_TX_IPV6:        IPv6 non-tunnelling packet/ inner IPv6 with no HW
> checksum offload for tunnelling packet
> >
> > As I suggested in the TSO thread, I think the following semantics is
> > easier to understand for the user:
> >
> >    - PKT_TX_IP_CKSUM: tell the NIC to compute IP cksum
> >
> >    - PKT_TX_IPV4: tell the NIC it's an IPv4 packet. Required for L4
> >      checksum offload or TSO.
> >
> >    - PKT_TX_IPV6: tell the NIC it's an IPv6 packet. Required for L4
> >      checksum offload or TSO.
> >
> > I think it won't make a big difference in the FVL driver.
> 
> No, no big difference here, but I still think it will be a bit cleaner if all 3 flags would
> be nutually exclusive.
> In fact,  we can unite all 3 of them them into 2 bits,    same as we doing for L4
> checksum flags.
> 
> >
> >
> > > let's use a few examples to demonstrate how to use these flags:
> > > Let say we have a tunnel packet:
> > > eth_hdr_out/ipv4_hdr_out/udp_hdr_out/vxlan_hdr/ehtr_hdr_in/ipv4_hdr_
> > > in/tcp_hdr_in.There
> > could be several scenarios:
> > >
> > > A) User requests HW offload for ipv4_hdr_out checksum.
> > > He doesn't care is it a tunnelled packet or not.
> > > So he sets:
> > >
> > > mb->l2_len =  eth_hdr_out;
> > > mb->l3_len = ipv4_hdr_out;
> > > mb->ol_flags |= PKT_TX_IPV4_CSUM;
> > >
> > > B) User is aware that it is a tunnelled packet and requests HW offload for
> ipv4_hdr_in and tcp_hdr_in *only*.
> > > He doesn't care about outer IP checksum offload.
> > > In that case, for FVL  he has 2 choices:
> > >     1. Treat that packet as a 'proper' tunnelled packet, and fill all the fields:
> > >       mb->l2_len =  eth_hdr_in;
> > >       mb->l3_len = ipv4_hdr_in;
> > >       mb->outer_l2_len = eth_hdr_out;
> > >       mb->outer_l3_len = ipv4_hdr_out;
> > >       mb->l4tun_len = vxlan_hdr;
> > >       mb->ol_flags |= PKT_TX_UDP_TUNNEL_PKT | PKT_TX_IP_CKSUM |
> > > PKT_TX_TCP_CKSUM;
> > >
> > >     2. As user doesn't care about outer IP hdr checksum, he can treat
> everything before ipv4_hdr_in as L2 header.
> > >     So he knows, that it is a tunnelled packet, but makes HW to treat it as
> ordinary (non-tunnelled) packet:
> > >       mb->l2_len = eth_hdr_out + ipv4_hdr_out + udp_hdr_out + vxlan_hdr +
> ehtr_hdr_in;
> > >       mb->l3_len = ipv4_hdr_in;
> > >       mb->ol_flags |= PKT_TX_IP_CKSUM |  PKT_TX_TCP_CKSUM;
> > >
> > > i40e PMD will support both B.1 and B.2.
> > > ixgbe/igb/em PMD supports only B.2.
> > > if HW supports both - it will be up to user app which method to choose.
> >
> > I think we should have a flag to advertise outer ip and outer udp
> > checksum offload support, so the application knows which mode can be
> > used.
> 
> You mean a new DEV_TX_OFFLOAD_* value, right?
> Something like:  DEV_TX_OFFLOAD_UDP_TUNNEL?
> And make i40e_dev_info_get() to return it?
> Yes, forgot about it, sounds like a proper thing to do.
Yes, makes sense, I will send a separate patch(bug fixing) to do this. Thanks .

> Konstantin
> 
> >
> >
> > Regards,
> > Olivier

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [dpdk-dev] [PATCH 1/3] mbuf:add two TX offload flags and change three fields
  2014-11-27 14:56       ` Ananyev, Konstantin
@ 2014-11-27 17:01         ` Ananyev, Konstantin
  2014-11-28 10:45           ` Olivier MATZ
  0 siblings, 1 reply; 28+ messages in thread
From: Ananyev, Konstantin @ 2014-11-27 17:01 UTC (permalink / raw)
  To: Ananyev, Konstantin, Liu, Jijiang, Olivier Matz (olivier.matz@6wind.com)
  Cc: dev



> -----Original Message-----
> From: dev [mailto:dev-bounces@dpdk.org] On Behalf Of Ananyev, Konstantin
> Sent: Thursday, November 27, 2014 2:56 PM
> To: Liu, Jijiang; Olivier Matz (olivier.matz@6wind.com)
> Cc: dev@dpdk.org
> Subject: Re: [dpdk-dev] [PATCH 1/3] mbuf:add two TX offload flags and change three fields
> 
> 
> 
> >
> > -----Original Message-----
> > From: Olivier MATZ [mailto:olivier.matz@6wind.com]
> > Sent: Thursday, November 27, 2014 6:00 PM
> > To: Liu, Jijiang; dev@dpdk.org
> > Subject: Re: [dpdk-dev] [PATCH 1/3] mbuf:add two TX offload flags and change three fields
> >
> > Hi Jijiang,
> >
> > Please see some comments below.
> >
> > On 11/27/2014 09:18 AM, Jijiang Liu wrote:
> > > In place of removing the PKT_TX_VXLAN_CKSUM, we introduce 2 new flags: PKT_TX_OUT_IP_CKSUM,
> PKT_TX_UDP_TUNNEL_PKT,
> > and a new field: l4_tun_len.
> > > Replace the inner_l2_len and the inner_l3_len field with the outer_l2_len and outer_l3_len field.
> > >
> > > PKT_TX_OUT_IP_CKSUM: is not used for non-tunnelling packet;hardware outer checksum for tunnelling packet.
> > > PKT_TX_UDP_TUNNEL_PKT: is used to tell PMD that the transmit packet is a UDP tunneling packet.
> > > l4_tun_len: for VXLAN packet, it should be udp header length plus VXLAN header length.
> > >
> > > Signed-off-by: Jijiang Liu <jijiang.liu@intel.com>
> > > ---
> > >   lib/librte_mbuf/rte_mbuf.c |    2 +-
> > >   lib/librte_mbuf/rte_mbuf.h |   23 ++++++++++++++---------
> > >   2 files changed, 15 insertions(+), 10 deletions(-)
> > >
> > > diff --git a/lib/librte_mbuf/rte_mbuf.c b/lib/librte_mbuf/rte_mbuf.c
> > > index 87c2963..e89c310 100644
> > > --- a/lib/librte_mbuf/rte_mbuf.c
> > > +++ b/lib/librte_mbuf/rte_mbuf.c
> > > @@ -240,7 +240,7 @@ const char *rte_get_tx_ol_flag_name(uint64_t mask)
> > >   	case PKT_TX_SCTP_CKSUM: return "PKT_TX_SCTP_CKSUM";
> > >   	case PKT_TX_UDP_CKSUM: return "PKT_TX_UDP_CKSUM";
> > >   	case PKT_TX_IEEE1588_TMST: return "PKT_TX_IEEE1588_TMST";
> > > -	case PKT_TX_VXLAN_CKSUM: return "PKT_TX_VXLAN_CKSUM";
> > > +	case PKT_TX_UDP_TUNNEL_PKT: return "PKT_TX_UDP_TUNNEL_PKT";
> > >   	case PKT_TX_TCP_SEG: return "PKT_TX_TCP_SEG";
> > >   	default: return NULL;
> >
> > As I said as a reply to the cover letter, I suggest to use PKT_TX_OUT_UDP_CKSUM instead of PKT_TX_UDP_TUNNEL_PKT.
> 
> HW don't support outer L4 checksum offload.
> But to calculate inner checksums correctly, it needs a hint from SW about L4 Tunneling Type.
> Currently the following values are recognised by HW:
> 
> L4 Tunneling Type (Teredo / GRE header / VXLAN header) indication:
> 00b - No UDP / GRE tunneling (field must be set to zero if EIPT equals to zero)
> 01b - UDP tunneling header (any UDP tunneling, VXLAN and Geneve).
> 10b - GRE tunneling header
> Else - reserved
> 
> You can check yourself:
> http://www.intel.com/content/www/us/en/embedded/products/networking/xl710-10-40-controller-datasheet.html
> Sections 8.4.2.2.1 and 8.4.4.2
> 
> >
> > Also, the PKT_TX_OUT_IP_CKSUM case is missing here.
> >
> > > diff --git a/lib/librte_mbuf/rte_mbuf.h b/lib/librte_mbuf/rte_mbuf.h
> > > index 367fc56..48cd8e1 100644
> > > --- a/lib/librte_mbuf/rte_mbuf.h
> > > +++ b/lib/librte_mbuf/rte_mbuf.h
> > > @@ -99,10 +99,9 @@ extern "C" {
> > >   #define PKT_RX_TUNNEL_IPV6_HDR (1ULL << 12) /**< RX tunnel packet with IPv6 header. */
> > >   #define PKT_RX_FDIR_ID       (1ULL << 13) /**< FD id reported if FDIR match. */
> > >   #define PKT_RX_FDIR_FLX      (1ULL << 14) /**< Flexible bytes reported if FDIR match. */
> > > -/* add new RX flags here */
> > >
> >
> > We should probably not remove this line.
> >
> >
> > >   /* add new TX flags here */
> > > -#define PKT_TX_VXLAN_CKSUM   (1ULL << 50) /**< TX checksum of VXLAN computed by NIC */
> > > +#define PKT_TX_UDP_TUNNEL_PKT (1ULL << 50) /**< TX packet is an UDP
> > > +tunneling packet */
> > >   #define PKT_TX_IEEE1588_TMST (1ULL << 51) /**< TX IEEE1588 packet to
> > > timestamp. */
> > >
> > >   /**
> > > @@ -125,13 +124,20 @@ extern "C" {
> > >   #define PKT_TX_IP_CKSUM      (1ULL << 54) /**< IP cksum of TX pkt. computed by NIC. */
> > >   #define PKT_TX_IPV4_CSUM     PKT_TX_IP_CKSUM /**< Alias of PKT_TX_IP_CKSUM. */
> > >
> > > +#define PKT_TX_VLAN_PKT      (1ULL << 55) /**< TX packet is a 802.1q VLAN packet. */
> > > +
> > >   /** Tell the NIC it's an IPv4 packet. Required for L4 checksum offload or TSO. */
> > > -#define PKT_TX_IPV4          PKT_RX_IPV4_HDR
> > > +#define PKT_TX_IPV4          (1ULL << 56)
> > >
> > >   /** Tell the NIC it's an IPv6 packet. Required for L4 checksum offload or TSO. */
> > > -#define PKT_TX_IPV6          PKT_RX_IPV6_HDR
> > > +#define PKT_TX_IPV6          (1ULL << 57)
> >
> > The description in comment does not match the description in the cover letter.
> >
> > Also, I think replacing PKT_RX_IPV[46]_HDR by the value may go in another commit.
> >
> >
> > > -#define PKT_TX_VLAN_PKT      (1ULL << 55) /**< TX packet is a 802.1q VLAN packet. */
> > > +/** Outer IP cksum of TX pkt. computed by NIC for tunneling packet */
> > > +#define PKT_TX_OUTER_IP_CKSUM   (1ULL << 58)
> > > +#define PKT_TX_OUTER_IPV4_CSUM  PKT_TX_OUTER_IP_CKSUM /**< Alias of
> > > +PKT_TX_OUTER_IP_CKSUM. */
> >
> > Why do we need an alias?
> >
> > By the way, I think the alias of PKT_TX_IP_CKSUM is also uneeded and can be removed. But it's not the topic of your series.
> >
> > Also, the name PKT_TX_OUTER_IP_CKSUM does not match the name in the cover letter and commit logs.
> >
> >
> > > +
> > > +/** Tell the NIC it's an outer IPv6 packet for tunneling packet.*/
> > > +#define PKT_TX_OUTER_IPV6    (1ULL << 59)
> > >
> >
> > This flag is not in the cover letter or commit log. What is its purpose?
> 
> 
> My bad, forgot that for outer IP, will also need to specify it's type.
> So same story here as for inner IP.
> So in total, we might need 3 flags for outer IP:
> 
> /* Tells HW that outer IP is IPV4 and checksum for it should be calculated by HW. */
> PKT_TX_OUTER_IP_CKSUM
> 
> /* Tells HW that outer IP is IPV4 and checksum for it should not be calculated by HW. */
> PKT_TX_OUTER_IPV4
> 
> /* Tells HW that outer IP is IPV6. */
> PKT_TX_OUTER_IPV6
> 
> >
> >
> > >   /**
> > >    * TCP segmentation offload. To enable this offload feature for a @@
> > > -266,10 +272,9 @@ struct rte_mbuf {
> > >   			uint64_t tso_segsz:16; /**< TCP TSO segment size */
> > >
> > >   			/* fields for TX offloading of tunnels */
> > > -			uint64_t inner_l3_len:9; /**< inner L3 (IP) Hdr Length. */
> > > -			uint64_t inner_l2_len:7; /**< inner L2 (MAC) Hdr Length. */
> > > -
> > > -			/* uint64_t unused:8; */
> > > +			uint64_t outer_l3_len:9; /**< outer L3 (IP) Hdr Length. */
> > > +			uint64_t outer_l2_len:7; /**< outer L2 (MAC) Hdr Length. */
> > > +			uint64_t l4_tun_len:8; /**< L4 tunnelling header length */
> > >   		};
> > >   	};
> > >   } __rte_cache_aligned;
> > >
> >
> > About l4_tun_len, I have another comment I forgot to add in the cover letter. Can we remove it and include its length in
> outer_l2_len
> > instead? For instance, replace:
> >
> >       mb->l2_len =  eth_hdr_in;
> >       mb->l3_len = ipv4_hdr_in;
> >       mb->outer_l2_len = eth_hdr_out;
> >       mb->outer_l3_len = ipv4_hdr_out;
> >       mb->l4tun_len = vxlan_hdr;
> >       mb->ol_flags |= PKT_TX_OUT_IP_CKSUM  | PKT_TX_UDP_TUNNEL |
> >         PKT_TX_IP_CKSUM |  PKT_TX_TCP_CKSUM;
> >
> > by:
> >
> >       mb->l2_len =  eth_hdr_in;
> >       mb->l3_len = ipv4_hdr_in;
> >       mb->outer_l2_len = eth_hdr_out + vxlan_hdr;
> >       mb->outer_l3_len = ipv4_hdr_out;
> >       mb->ol_flags |= PKT_TX_OUT_IP_CKSUM  | PKT_TX_UDP_TUNNEL |
> >         PKT_TX_IP_CKSUM |  PKT_TX_TCP_CKSUM;
> >
> > I think it won't bother the driver, and it's coherent with case B.2 of your cover letter.
> 
> You probably meant:
> mb->l2_len =  eth_hdr_in + vxlan_hdr;
> ?
> Yes, I think it could be done that way too.
> Though I still prefer to keep l4tun_len - it makes things a bit cleaner (at least to me).
> After all  we do have space for it in mbuf's tx_offload.

As one more thing in favour of separate l4tun_len field:
l2_len is 7 bit long, so in theory it might be not enough, as for FVL:
12:18 L4TUNLEN L4 Tunneling Length (Teredo / GRE header / VXLAN header) defined in Words. 


> Konstantin
> 
> >
> > Regards,
> > Olivier

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [dpdk-dev] [PATCH 1/3] mbuf:add two TX offload flags and change three fields
  2014-11-27 13:14     ` Liu, Jijiang
@ 2014-11-28  9:17       ` Olivier MATZ
  0 siblings, 0 replies; 28+ messages in thread
From: Olivier MATZ @ 2014-11-28  9:17 UTC (permalink / raw)
  To: Liu, Jijiang, dev


Hi Jijiang,

On 11/27/2014 02:14 PM, Liu, Jijiang wrote:
>>> diff --git a/lib/librte_mbuf/rte_mbuf.h b/lib/librte_mbuf/rte_mbuf.h
>>> index 367fc56..48cd8e1 100644
>>> --- a/lib/librte_mbuf/rte_mbuf.h
>>> +++ b/lib/librte_mbuf/rte_mbuf.h
>>> @@ -99,10 +99,9 @@ extern "C" {
>>>   #define PKT_RX_TUNNEL_IPV6_HDR (1ULL << 12) /**< RX tunnel packet with
>> IPv6 header. */
>>>   #define PKT_RX_FDIR_ID       (1ULL << 13) /**< FD id reported if FDIR match.
>> */
>>>   #define PKT_RX_FDIR_FLX      (1ULL << 14) /**< Flexible bytes reported if FDIR
>> match. */
>>> -/* add new RX flags here */
>>>
>>
>> We should probably not remove this line.
> 
> Why?
> There are two lines "/* add new RX flags here */" in rte_mbuf.h file.

No, one is RX, the other is TX.


>>> +/** Tell the NIC it's an outer IPv6 packet for tunneling packet.*/
>>> +#define PKT_TX_OUTER_IPV6    (1ULL << 59)
>>>
>>
>> This flag is not in the cover letter or commit log. What is its purpose?
>>
> With FVL, if outer L3 header is IPv6, to make HW TX checksum offload work , SW must be responsible to tell hardware this information.


What hardware checksum are you talking about?
I understand that outer L4 checksum is not supported from one of
Konstantin's mail.
And there is no L3 checksum in IPv6.


Regards,
Olivier

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [dpdk-dev] [PATCH 0/3] i40e VXLAN TX checksum rework
  2014-11-27 15:29   ` Ananyev, Konstantin
  2014-11-27 16:31     ` Liu, Jijiang
@ 2014-11-28  9:26     ` Olivier MATZ
  1 sibling, 0 replies; 28+ messages in thread
From: Olivier MATZ @ 2014-11-28  9:26 UTC (permalink / raw)
  To: Ananyev, Konstantin, Liu, Jijiang, dev

Hi Konstantin,

On 11/27/2014 04:29 PM, Ananyev, Konstantin wrote:
>> As I suggested in the TSO thread, I think the following semantics
>> is easier to understand for the user:
>>
>>    - PKT_TX_IP_CKSUM: tell the NIC to compute IP cksum
>>
>>    - PKT_TX_IPV4: tell the NIC it's an IPv4 packet. Required for L4
>>      checksum offload or TSO.
>>
>>    - PKT_TX_IPV6: tell the NIC it's an IPv6 packet. Required for L4
>>      checksum offload or TSO.
>>
>> I think it won't make a big difference in the FVL driver.
> 
> No, no big difference here, but I still think it will be a bit cleaner if all 3 flags would be nutually exclusive.
> In fact,  we can unite all 3 of them them into 2 bits,    same as we doing for L4 checksum flags.

In case of TSO, you need to set the PKT_TX_IPV4 flag.
But as suggested by Yong Wang from Vmware [1], the vmxnet3 driver could
support TSO without offloading IP checksum, so I think it's better to
have flags for (is_ipv4 or is_ipv6), and another one to ask the
ip_checksum.


> You mean a new DEV_TX_OFFLOAD_* value, right?
> Something like:  DEV_TX_OFFLOAD_UDP_TUNNEL?
> And make i40e_dev_info_get() to return it?
> Yes, forgot about it, sounds like a proper thing to do. 

Yes. I've seen that Jijiang is planning to add it in a future bug fix
patch. That's fine to me.


[1] http://dpdk.org/ml/archives/dev/2014-November/007775.html

Regards,
Olivier

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [dpdk-dev] [PATCH 1/3] mbuf:add two TX offload flags and change three fields
  2014-11-27 17:01         ` Ananyev, Konstantin
@ 2014-11-28 10:45           ` Olivier MATZ
  2014-11-28 11:16             ` Ananyev, Konstantin
  2014-11-30 14:50             ` Ananyev, Konstantin
  0 siblings, 2 replies; 28+ messages in thread
From: Olivier MATZ @ 2014-11-28 10:45 UTC (permalink / raw)
  To: Ananyev, Konstantin, Liu, Jijiang; +Cc: dev

Hi Konstantin,

On 11/27/2014 06:01 PM, Ananyev, Konstantin wrote:
>> Yes, I think it could be done that way too.
>> Though I still prefer to keep l4tun_len - it makes things a bit cleaner (at least to me).
>> After all  we do have space for it in mbuf's tx_offload.
> 
> As one more thing in favour of separate l4tun_len field:
> l2_len is 7 bit long, so in theory it might be not enough, as for FVL:
> 12:18 L4TUNLEN L4 Tunneling Length (Teredo / GRE header / VXLAN header) defined in Words. 

The l2_len field is 7 bits long because it was mapped to ixgbe hardware.
If it's not enough (although I'm not sure it's possible to have a header
larger than 128 bytes in this case), it's probably because we should
not have been doing that.
Maybe these generic fields should be generic :)
If it's not enough, what about changing l2_len to 8 bits?

Often in the recent discussions, I see as an argument "fortville needs
this so we need to add it in the mbuf". I agree that currently
fortville is the only hardware supported for the new features, so it
can make sense to act like this, but we have to accept to come back
to this API in the future if it makes less sense with other drivers.

Also, application developers can be annoyed to see that the mbuf flags
and meta data are just duplicating information that is already present
in the packet.

About the l4tun_len, it's another field the application has to fill,
but it's maybe cleaner. I just wanted to clarify why I'm discussing
these points.

Regards,
Olivier

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [dpdk-dev] [PATCH 1/3] mbuf:add two TX offload flags and change three fields
  2014-11-28 10:45           ` Olivier MATZ
@ 2014-11-28 11:16             ` Ananyev, Konstantin
  2014-11-30 14:50             ` Ananyev, Konstantin
  1 sibling, 0 replies; 28+ messages in thread
From: Ananyev, Konstantin @ 2014-11-28 11:16 UTC (permalink / raw)
  To: Olivier MATZ, Liu, Jijiang; +Cc: dev

Hi Olver,

> -----Original Message-----
> From: Olivier MATZ [mailto:olivier.matz@6wind.com]
> Sent: Friday, November 28, 2014 10:46 AM
> To: Ananyev, Konstantin; Liu, Jijiang
> Cc: dev@dpdk.org
> Subject: Re: [dpdk-dev] [PATCH 1/3] mbuf:add two TX offload flags and change three fields
> 
> Hi Konstantin,
> 
> On 11/27/2014 06:01 PM, Ananyev, Konstantin wrote:
> >> Yes, I think it could be done that way too.
> >> Though I still prefer to keep l4tun_len - it makes things a bit cleaner (at least to me).
> >> After all  we do have space for it in mbuf's tx_offload.
> >
> > As one more thing in favour of separate l4tun_len field:
> > l2_len is 7 bit long, so in theory it might be not enough, as for FVL:
> > 12:18 L4TUNLEN L4 Tunneling Length (Teredo / GRE header / VXLAN header) defined in Words.
> 
> The l2_len field is 7 bits long because it was mapped to ixgbe hardware.

Yes.

> If it's not enough (although I'm not sure it's possible to have a header
> larger than 128 bytes in this case), it's probably because we should
> not have been doing that.

I also can't imagine the L2 header being that long.
>From other side - I just don't like an idea of PMD stripping off HW capabilities.

> Maybe these generic fields should be generic :)
> If it's not enough, what about changing l2_len to 8 bits?

Yes, was thinking about the same thing.
Maybe instead of introducing l4tun_len, we should increase l2_len and outer_l2_len sizes to 8bits each? 
Though that would break the pair:
l2_len : 7
l3_len :9 
Which is quite useful, as it fits into 2B, and maps exactly to ixgbe TCD layout.
Wonder if we change it, would be there any performance penalty for ixgbe, and if yes how big.

> 
> Often in the recent discussions, I see as an argument "fortville needs
> this so we need to add it in the mbuf". I agree that currently
> fortville is the only hardware supported for the new features, so it
> can make sense to act like this, but we have to accept to come back
> to this API in the future if it makes less sense with other drivers.

Yes, it is sometime hard to keep a balance.
>From one side people would like to have a  clean and generic API, from other side
people would like to have ab ability to use all features that HW supports. 

> 
> Also, application developers can be annoyed to see that the mbuf flags
> and meta data are just duplicating information that is already present
> in the packet.
> 
> About the l4tun_len, it's another field the application has to fill,
> but it's maybe cleaner. I just wanted to clarify why I'm discussing
> these points.
> 
> Regards,
> Olivier

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [dpdk-dev] [PATCH 1/3] mbuf:add two TX offload flags and change three fields
  2014-11-28 10:45           ` Olivier MATZ
  2014-11-28 11:16             ` Ananyev, Konstantin
@ 2014-11-30 14:50             ` Ananyev, Konstantin
  2014-12-01  2:30               ` Liu, Jijiang
  1 sibling, 1 reply; 28+ messages in thread
From: Ananyev, Konstantin @ 2014-11-30 14:50 UTC (permalink / raw)
  To: Olivier MATZ, Liu, Jijiang; +Cc: dev

Hi Olver,

> -----Original Message-----
> From: Olivier MATZ [mailto:olivier.matz@6wind.com]
> Sent: Friday, November 28, 2014 10:46 AM
> To: Ananyev, Konstantin; Liu, Jijiang
> Cc: dev@dpdk.org
> Subject: Re: [dpdk-dev] [PATCH 1/3] mbuf:add two TX offload flags and change three fields
> 
> Hi Konstantin,
> 
> On 11/27/2014 06:01 PM, Ananyev, Konstantin wrote:
> >> Yes, I think it could be done that way too.
> >> Though I still prefer to keep l4tun_len - it makes things a bit cleaner (at least to me).
> >> After all  we do have space for it in mbuf's tx_offload.
> >
> > As one more thing in favour of separate l4tun_len field:
> > l2_len is 7 bit long, so in theory it might be not enough, as for FVL:
> > 12:18 L4TUNLEN L4 Tunneling Length (Teredo / GRE header / VXLAN header) defined in Words.
> 
> The l2_len field is 7 bits long because it was mapped to ixgbe hardware.
> If it's not enough (although I'm not sure it's possible to have a header
> larger than 128 bytes in this case), it's probably because we should
> not have been doing that.
> Maybe these generic fields should be generic :)
> If it's not enough, what about changing l2_len to 8 bits?
> 
> Often in the recent discussions, I see as an argument "fortville needs
> this so we need to add it in the mbuf". I agree that currently
> fortville is the only hardware supported for the new features, so it
> can make sense to act like this, but we have to accept to come back
> to this API in the future if it makes less sense with other drivers.
> 
> Also, application developers can be annoyed to see that the mbuf flags
> and meta data are just duplicating information that is already present
> in the packet.
> 
> About the l4tun_len, it's another field the application has to fill,
> but it's maybe cleaner. I just wanted to clarify why I'm discussing
> these points.

After another thought, I think that the way you proposed is a better one.
I gives us more flexibility:  
let's say for now we'll keep both l2_len and outer_l2_len as 7 bit fields,
and upper layer would have to:
mb->l2_len =  eth_hdr_in + vxlan_hdr;
for VXLAN packets.
Then if in the future, we'll realise that 7 bit is not enough we can either:
- increase size of l2_len and outer_l2_len
- introduce new field (l4tun_len as we planned now)
In both cases backward compatibility wouldn't be affected.
>From other side - if we'' introduce a new field now (l4tun_len), it would be very hard to get rid of it in the future.  
So, I think we'd better follow your approach here.

Thanks
Konstantin


> 
> Regards,
> Olivier

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [dpdk-dev] [PATCH 1/3] mbuf:add two TX offload flags and change three fields
  2014-11-30 14:50             ` Ananyev, Konstantin
@ 2014-12-01  2:30               ` Liu, Jijiang
  2014-12-01  9:52                 ` Olivier MATZ
  0 siblings, 1 reply; 28+ messages in thread
From: Liu, Jijiang @ 2014-12-01  2:30 UTC (permalink / raw)
  To: Ananyev, Konstantin, Olivier MATZ; +Cc: dev



> -----Original Message-----
> From: Ananyev, Konstantin
> Sent: Sunday, November 30, 2014 10:51 PM
> To: Olivier MATZ; Liu, Jijiang
> Cc: dev@dpdk.org
> Subject: RE: [dpdk-dev] [PATCH 1/3] mbuf:add two TX offload flags and change
> three fields
> 
> Hi Olver,
> 
> > -----Original Message-----
> > From: Olivier MATZ [mailto:olivier.matz@6wind.com]
> > Sent: Friday, November 28, 2014 10:46 AM
> > To: Ananyev, Konstantin; Liu, Jijiang
> > Cc: dev@dpdk.org
> > Subject: Re: [dpdk-dev] [PATCH 1/3] mbuf:add two TX offload flags and
> > change three fields
> >
> > Hi Konstantin,
> >
> > On 11/27/2014 06:01 PM, Ananyev, Konstantin wrote:
> > >> Yes, I think it could be done that way too.
> > >> Though I still prefer to keep l4tun_len - it makes things a bit cleaner (at least
> to me).
> > >> After all  we do have space for it in mbuf's tx_offload.
> > >
> > > As one more thing in favour of separate l4tun_len field:
> > > l2_len is 7 bit long, so in theory it might be not enough, as for FVL:
> > > 12:18 L4TUNLEN L4 Tunneling Length (Teredo / GRE header / VXLAN header)
> defined in Words.
> >
> > The l2_len field is 7 bits long because it was mapped to ixgbe hardware.
> > If it's not enough (although I'm not sure it's possible to have a
> > header larger than 128 bytes in this case), it's probably because we
> > should not have been doing that.
> > Maybe these generic fields should be generic :) If it's not enough,
> > what about changing l2_len to 8 bits?
> >
> > Often in the recent discussions, I see as an argument "fortville needs
> > this so we need to add it in the mbuf". I agree that currently
> > fortville is the only hardware supported for the new features, so it
> > can make sense to act like this, but we have to accept to come back to
> > this API in the future if it makes less sense with other drivers.
> >
> > Also, application developers can be annoyed to see that the mbuf flags
> > and meta data are just duplicating information that is already present
> > in the packet.
> >
> > About the l4tun_len, it's another field the application has to fill,
> > but it's maybe cleaner. I just wanted to clarify why I'm discussing
> > these points.
> 
> After another thought, I think that the way you proposed is a better one.
> I gives us more flexibility:
> let's say for now we'll keep both l2_len and outer_l2_len as 7 bit fields, and upper
> layer would have to:
> mb->l2_len =  eth_hdr_in + vxlan_hdr;

A question, how to fill the mb->l2_len  for IP in IP, IP in GRE tunneling packet that is no inner L2 header?
What  is the rule?
Or you think it should be mb->l2_len =  0 + l4_tun_len;  ??

> for VXLAN packets.
> Then if in the future, we'll realise that 7 bit is not enough we can either:
> - increase size of l2_len and outer_l2_len
> - introduce new field (l4tun_len as we planned now) In both cases backward
> compatibility wouldn't be affected.
> From other side - if we'' introduce a new field now (l4tun_len), it would be very
> hard to get rid of it in the future.
> So, I think we'd better follow your approach here.
> 
> Thanks
> Konstantin
> 
> 
> >
> > Regards,
> > Olivier

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [dpdk-dev] [PATCH 1/3] mbuf:add two TX offload flags and change three fields
  2014-12-01  2:30               ` Liu, Jijiang
@ 2014-12-01  9:52                 ` Olivier MATZ
  2014-12-01 11:58                   ` Ananyev, Konstantin
  0 siblings, 1 reply; 28+ messages in thread
From: Olivier MATZ @ 2014-12-01  9:52 UTC (permalink / raw)
  To: Liu, Jijiang, Ananyev, Konstantin; +Cc: dev

Hi Jijiang,

On 12/01/2014 03:30 AM, Liu, Jijiang wrote:
>> After another thought, I think that the way you proposed is a better one.
>> I gives us more flexibility:
>> let's say for now we'll keep both l2_len and outer_l2_len as 7 bit fields, and upper
>> layer would have to:
>> mb->l2_len =  eth_hdr_in + vxlan_hdr;
> 
> A question, how to fill the mb->l2_len  for IP in IP, IP in GRE tunneling packet that is no inner L2 header?
> What  is the rule?
> Or you think it should be mb->l2_len =  0 + l4_tun_len;  ??

We could have the following rule:
- the l2_len starts after the end of outer_l2_len + outer_l3_len (if
  they are not 0)
- l2_len is the length of all headers up to the first ip or ipv6
  header

Examples:

Ether/IP/UDP/xxx
  m->outer_l2_len = 0
  m->outer_l3_len = 0
  m->l2_len = sizeof(ether)
  m->l3_len = sizeof(ip)
  m->l4_len = sizeof(udp)

Ether/IP/IP/UDP/xxx
  m->outer_l2_len = sizeof(ether)
  m->outer_l3_len = sizeof(ip)
  m->l2_len = 0
  m->l3_len = sizeof(ip)
  m->l4_len = sizeof(udp)

Ether/IP/GRE/IP/UDP/xxx
  m->outer_l2_len = sizeof(ether)
  m->outer_l3_len = sizeof(ip)
  m->l2_len = sizeof(gre)
  m->l3_len = sizeof(ip)
  m->l4_len = sizeof(udp)

Ether/IP/UDP/vxlan/Ether/IP/UDP/xxx
  m->outer_l2_len = sizeof(ether)
  m->outer_l3_len = sizeof(ip)
  m->l2_len = sizeof(udp) + sizeof(vxlan) + sizeof(ether)
  m->l3_len = sizeof(ip)
  m->l4_len = sizeof(udp)

Regards,
Olivier

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [dpdk-dev] [PATCH 1/3] mbuf:add two TX offload flags and change three fields
  2014-12-01  9:52                 ` Olivier MATZ
@ 2014-12-01 11:58                   ` Ananyev, Konstantin
  2014-12-01 12:28                     ` Olivier MATZ
  0 siblings, 1 reply; 28+ messages in thread
From: Ananyev, Konstantin @ 2014-12-01 11:58 UTC (permalink / raw)
  To: Olivier MATZ, Liu, Jijiang; +Cc: dev



> -----Original Message-----
> From: Olivier MATZ [mailto:olivier.matz@6wind.com]
> Sent: Monday, December 01, 2014 9:52 AM
> To: Liu, Jijiang; Ananyev, Konstantin
> Cc: dev@dpdk.org
> Subject: Re: [dpdk-dev] [PATCH 1/3] mbuf:add two TX offload flags and change three fields
> 
> Hi Jijiang,
> 
> On 12/01/2014 03:30 AM, Liu, Jijiang wrote:
> >> After another thought, I think that the way you proposed is a better one.
> >> I gives us more flexibility:
> >> let's say for now we'll keep both l2_len and outer_l2_len as 7 bit fields, and upper
> >> layer would have to:
> >> mb->l2_len =  eth_hdr_in + vxlan_hdr;
> >
> > A question, how to fill the mb->l2_len  for IP in IP, IP in GRE tunneling packet that is no inner L2 header?
> > What  is the rule?
> > Or you think it should be mb->l2_len =  0 + l4_tun_len;  ??

I think it should be:
l2_len =  0 + l4_tun_len;

>From your patch:
l4_tunnel_len = l4_tun_len + l2_len;

> 
> We could have the following rule:
> - the l2_len starts after the end of outer_l2_len + outer_l3_len (if
>   they are not 0)

I think it should start after the outer_l4_hdr.
I.e. right now we don't include udp header length into it.

> - l2_len is the length of all headers up to the first ip or ipv6
>   header

Yes, same thought here.

> 
> Examples:
> 
> Ether/IP/UDP/xxx
>   m->outer_l2_len = 0
>   m->outer_l3_len = 0
>   m->l2_len = sizeof(ether)
>   m->l3_len = sizeof(ip)
>   m->l4_len = sizeof(udp)
> 
> Ether/IP/IP/UDP/xxx
>   m->outer_l2_len = sizeof(ether)
>   m->outer_l3_len = sizeof(ip)
>   m->l2_len = 0
>   m->l3_len = sizeof(ip)
>   m->l4_len = sizeof(udp)
> 
> Ether/IP/GRE/IP/UDP/xxx
>   m->outer_l2_len = sizeof(ether)
>   m->outer_l3_len = sizeof(ip)
>   m->l2_len = sizeof(gre)
>   m->l3_len = sizeof(ip)
>   m->l4_len = sizeof(udp)
> 
> Ether/IP/UDP/vxlan/Ether/IP/UDP/xxx
>   m->outer_l2_len = sizeof(ether)
>   m->outer_l3_len = sizeof(ip)
>   m->l2_len = sizeof(udp) + sizeof(vxlan) + sizeof(ether)

I think it should be:
m->l2_len = sizeof(vxlan) + sizeof(ether)

We don't need to add sizeof(udp) as we already say to the HW that I t will be UDP TUNNELING vi the ol_flag: PKT_TX_UDP_TUNNEL_PKT.

>   m->l3_len = sizeof(ip)
>   m->l4_len = sizeof(udp)
> 
> Regards,
> Olivier

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [dpdk-dev] [PATCH 1/3] mbuf:add two TX offload flags and change three fields
  2014-12-01 11:58                   ` Ananyev, Konstantin
@ 2014-12-01 12:28                     ` Olivier MATZ
  2014-12-01 13:07                       ` Liu, Jijiang
  0 siblings, 1 reply; 28+ messages in thread
From: Olivier MATZ @ 2014-12-01 12:28 UTC (permalink / raw)
  To: Ananyev, Konstantin, Liu, Jijiang; +Cc: dev

Hi Konstantin,

On 12/01/2014 12:58 PM, Ananyev, Konstantin wrote:
>> Ether/IP/UDP/vxlan/Ether/IP/UDP/xxx
>>   m->outer_l2_len = sizeof(ether)
>>   m->outer_l3_len = sizeof(ip)
>>   m->l2_len = sizeof(udp) + sizeof(vxlan) + sizeof(ether)
> 
> I think it should be:
> m->l2_len = sizeof(vxlan) + sizeof(ether)
> 
> We don't need to add sizeof(udp) as we already say to the HW that I t will be UDP TUNNELING vi the ol_flag: PKT_TX_UDP_TUNNEL_PKT.
> 
>>   m->l3_len = sizeof(ip)
>>   m->l4_len = sizeof(udp)

I would agree if we had a m->outer_l4_len. Maybe we should add
it to be coherent with lX_len fields?

I think that to process the inner IP checksum, we should be able
to use these 2 notations:

Ether/IP/GRE/IP/UDP/xxx
  m->flags = OUTER_IP_CKSUM
  m->outer_l2_len = sizeof(ether)
  m->outer_l3_len = sizeof(ip)
  m->l2_len = sizeof(gre)
  m->l3_len = sizeof(ip)
  m->l4_len = sizeof(udp)

Ether/IP/GRE/IP/UDP/xxx
  m->flags = IP_CKSUM
  /* sum of all outer_lX_len and l2_len from above: */
  m->l2_len = sizeof(ether) + sizeof(ip) + sizeof(gre)
  m->l3_len = sizeof(ip)
  m->l4_len = sizeof(udp)

So, in case of vxlan, I suggest that either we include the size of
UDP in l2_len, or we add a outer_l4_len. What do you think?

Maybe adding outer_l4_len makes more sense.

Regards,
Olivier

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [dpdk-dev] [PATCH 1/3] mbuf:add two TX offload flags and change three fields
  2014-12-01 12:28                     ` Olivier MATZ
@ 2014-12-01 13:07                       ` Liu, Jijiang
  2014-12-01 14:31                         ` Ananyev, Konstantin
  0 siblings, 1 reply; 28+ messages in thread
From: Liu, Jijiang @ 2014-12-01 13:07 UTC (permalink / raw)
  To: Olivier MATZ, Ananyev, Konstantin; +Cc: dev



> -----Original Message-----
> From: Olivier MATZ [mailto:olivier.matz@6wind.com]
> Sent: Monday, December 1, 2014 8:28 PM
> To: Ananyev, Konstantin; Liu, Jijiang
> Cc: dev@dpdk.org
> Subject: Re: [dpdk-dev] [PATCH 1/3] mbuf:add two TX offload flags and change
> three fields
> 
> Hi Konstantin,
> 
> On 12/01/2014 12:58 PM, Ananyev, Konstantin wrote:
> >> Ether/IP/UDP/vxlan/Ether/IP/UDP/xxx
> >>   m->outer_l2_len = sizeof(ether)
> >>   m->outer_l3_len = sizeof(ip)
> >>   m->l2_len = sizeof(udp) + sizeof(vxlan) + sizeof(ether)
> >
> > I think it should be:
> > m->l2_len = sizeof(vxlan) + sizeof(ether)
> >
> > We don't need to add sizeof(udp) as we already say to the HW that I t will be
> UDP TUNNELING vi the ol_flag: PKT_TX_UDP_TUNNEL_PKT.
> >
> >>   m->l3_len = sizeof(ip)
> >>   m->l4_len = sizeof(udp)
> 
> I would agree if we had a m->outer_l4_len. Maybe we should add it to be
> coherent with lX_len fields?
> 
> I think that to process the inner IP checksum, we should be able to use these 2
> notations:
> 
> Ether/IP/GRE/IP/UDP/xxx
>   m->flags = OUTER_IP_CKSUM
>   m->outer_l2_len = sizeof(ether)
>   m->outer_l3_len = sizeof(ip)
>   m->l2_len = sizeof(gre)
>   m->l3_len = sizeof(ip)
>   m->l4_len = sizeof(udp)
> 
> Ether/IP/GRE/IP/UDP/xxx
>   m->flags = IP_CKSUM
>   /* sum of all outer_lX_len and l2_len from above: */
>   m->l2_len = sizeof(ether) + sizeof(ip) + sizeof(gre)
>   m->l3_len = sizeof(ip)
>   m->l4_len = sizeof(udp)
> 
> So, in case of vxlan, I suggest that either we include the size of UDP in l2_len, or
> we add a outer_l4_len. What do you think?
I agree to include the size of UDP in l2_len, for VXLAN, the UDP header is a part of VXLAN tunneling header as the UDP destination port indicate if a packet is VXLAN packet.
> Maybe adding outer_l4_len makes more sense.

> 
> Regards,
> Olivier

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [dpdk-dev] [PATCH 1/3] mbuf:add two TX offload flags and change three fields
  2014-12-01 13:07                       ` Liu, Jijiang
@ 2014-12-01 14:31                         ` Ananyev, Konstantin
  0 siblings, 0 replies; 28+ messages in thread
From: Ananyev, Konstantin @ 2014-12-01 14:31 UTC (permalink / raw)
  To: Liu, Jijiang, Olivier MATZ; +Cc: dev

Hi lads,

> -----Original Message-----
> From: Liu, Jijiang
> Sent: Monday, December 01, 2014 1:08 PM
> To: Olivier MATZ; Ananyev, Konstantin
> Cc: dev@dpdk.org
> Subject: RE: [dpdk-dev] [PATCH 1/3] mbuf:add two TX offload flags and change three fields
> 
> 
> 
> > -----Original Message-----
> > From: Olivier MATZ [mailto:olivier.matz@6wind.com]
> > Sent: Monday, December 1, 2014 8:28 PM
> > To: Ananyev, Konstantin; Liu, Jijiang
> > Cc: dev@dpdk.org
> > Subject: Re: [dpdk-dev] [PATCH 1/3] mbuf:add two TX offload flags and change
> > three fields
> >
> > Hi Konstantin,
> >
> > On 12/01/2014 12:58 PM, Ananyev, Konstantin wrote:
> > >> Ether/IP/UDP/vxlan/Ether/IP/UDP/xxx
> > >>   m->outer_l2_len = sizeof(ether)
> > >>   m->outer_l3_len = sizeof(ip)
> > >>   m->l2_len = sizeof(udp) + sizeof(vxlan) + sizeof(ether)
> > >
> > > I think it should be:
> > > m->l2_len = sizeof(vxlan) + sizeof(ether)
> > >
> > > We don't need to add sizeof(udp) as we already say to the HW that I t will be
> > UDP TUNNELING vi the ol_flag: PKT_TX_UDP_TUNNEL_PKT.
> > >
> > >>   m->l3_len = sizeof(ip)
> > >>   m->l4_len = sizeof(udp)
> >
> > I would agree if we had a m->outer_l4_len. Maybe we should add it to be
> > coherent with lX_len fields?
> >
> > I think that to process the inner IP checksum, we should be able to use these 2
> > notations:
> >
> > Ether/IP/GRE/IP/UDP/xxx
> >   m->flags = OUTER_IP_CKSUM
> >   m->outer_l2_len = sizeof(ether)
> >   m->outer_l3_len = sizeof(ip)
> >   m->l2_len = sizeof(gre)
> >   m->l3_len = sizeof(ip)
> >   m->l4_len = sizeof(udp)
> >
> > Ether/IP/GRE/IP/UDP/xxx
> >   m->flags = IP_CKSUM
> >   /* sum of all outer_lX_len and l2_len from above: */
> >   m->l2_len = sizeof(ether) + sizeof(ip) + sizeof(gre)
> >   m->l3_len = sizeof(ip)
> >   m->l4_len = sizeof(udp)
> >
> > So, in case of vxlan, I suggest that either we include the size of UDP in l2_len, or
> > we add a outer_l4_len. What do you think?
> I agree to include the size of UDP in l2_len, for VXLAN, the UDP header is a part of VXLAN tunnelling header as the UDP destination
> port indicate if a packet is VXLAN packet.

Actually it is my bad.
While looking at current implementation, I didn't realise that: ETHER_VXLAN_HLEN == (sizeof(struct udp_hdr) + sizeof(struct vxlan_hdr)).
So yes you are right for VXLAN packet it should be:
m->l2_len = sizeof(udp) + sizeof(vxlan) + sizeof(ether);
Sorry for confusing everyone.
Konstantin

> > Maybe adding outer_l4_len makes more sense.
> 
> >
> > Regards,
> > Olivier

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [dpdk-dev] [PATCH 0/3] i40e VXLAN TX checksum rework
  2014-11-27 16:31     ` Liu, Jijiang
@ 2014-12-03  8:02       ` Liu, Jijiang
  0 siblings, 0 replies; 28+ messages in thread
From: Liu, Jijiang @ 2014-12-03  8:02 UTC (permalink / raw)
  To: Thomas Monjalon; +Cc: 'dev@dpdk.org'

Hi Thomas,

> -----Original Message-----
> From: Liu, Jijiang
> Sent: Friday, November 28, 2014 12:32 AM
> To: Olivier MATZ
> Cc: Ananyev, Konstantin; dev@dpdk.org
> Subject: RE: [dpdk-dev] [PATCH 0/3] i40e VXLAN TX checksum rework
> 
> 
> 
> > -----Original Message-----
> > From: Ananyev, Konstantin
> > Sent: Thursday, November 27, 2014 11:30 PM
> > To: Olivier MATZ; Liu, Jijiang; dev@dpdk.org
> > Subject: RE: [dpdk-dev] [PATCH 0/3] i40e VXLAN TX checksum rework
> >
> > Hi Oliver,
> >
> > > -----Original Message-----
> > > From: dev [mailto:dev-bounces@dpdk.org] On Behalf Of Olivier MATZ
> > > Sent: Thursday, November 27, 2014 9:45 AM
> > > To: Liu, Jijiang; dev@dpdk.org
> > > Subject: Re: [dpdk-dev] [PATCH 0/3] i40e VXLAN TX checksum rework
> > >
> > > Hi Jijiang,
> > >
> > > Please find below some comments about the specifications. The global
> > > picture looks fine to me.
> > >
> > > I've not reviewed the patch right now, but it's in the pipe.
> > >
> > > On 11/27/2014 09:18 AM, Jijiang Liu wrote:
> > > > We have got some feedback about backward compatibility of VXLAN TX
> > > > checksum offload API with 1G/10G NIC after the i40e VXLAN
> > > TX checksum codes were applied, so we have to rework the APIs on
> > > i40e,
> > including the changes of mbuf, i40e PMD and csum engine.
> > > >
> > > > The main changes in mbuf are as follows, In place of removing
> > > > PKT_TX_VXLAN_CKSUM, we introducing 2 new flags:
> > PKT_TX_OUT_IP_CKSUM,
> > > > PKT_TX_UDP_TUNNEL_PKT,
> > > and a new field: l4_tun_len.
> > >
> > > What about PKT_TX_OUT_UDP_CKSUM instead of
> > PKT_TX_UDP_TUNNEL_PKT? It's
> > > maybe more coherent with the other names.
> >
> > FVL HW don't support outer L4 checksum offload.
> > But to calculate inner checksums correctly, it needs a hint from SW
> > about L4 Tunnelling Type.
> >
> > >
> > >
> > > > Replace the inner_l2_len and the inner_l3_len field with the
> > > > outer_l2_len and
> > outer_l3_len field.
> > > >
> > > > The existing flags are listed below,
> > > > PKT_TX_IP_CKSUM:     HW IPv4 checksum for non-tunnelling packet/ HW
> > inner IPv4 checksum for tunnelling packet
> > > > PKT_TX_TCP_CKSUM:    HW TCP checksum for non-tunnelling packet/ HW
> > inner TCP checksum for tunnelling packet
> > > > PKT_TX_SCTP_CKSUM:   HW SCTP checksum for non-tunnelling packet/ HW
> > inner SCTP checksum for tunnelling packet
> > > > PKT_TX_UDP_CKSUM:    HW SCTP checksum for non-tunnelling packet/ HW
> > inner SCTP checksum for tunnelling packet
> > > > PKT_TX_IPV4:        IPv4 with no HW checksum offload for non-tunnelling
> > packet/inner IPv4 with no HW checksum offload for
> > > tunnelling packet
> > > > PKT_TX_IPV6:        IPv6 non-tunnelling packet/ inner IPv6 with no HW
> > checksum offload for tunnelling packet
> > >
> > > As I suggested in the TSO thread, I think the following semantics is
> > > easier to understand for the user:
> > >
> > >    - PKT_TX_IP_CKSUM: tell the NIC to compute IP cksum
> > >
> > >    - PKT_TX_IPV4: tell the NIC it's an IPv4 packet. Required for L4
> > >      checksum offload or TSO.
> > >
> > >    - PKT_TX_IPV6: tell the NIC it's an IPv6 packet. Required for L4
> > >      checksum offload or TSO.
> > >
> > > I think it won't make a big difference in the FVL driver.
> >
> > No, no big difference here, but I still think it will be a bit cleaner
> > if all 3 flags would be nutually exclusive.
> > In fact,  we can unite all 3 of them them into 2 bits,    same as we doing for L4
> > checksum flags.
> >
> > >
> > >
> > > > let's use a few examples to demonstrate how to use these flags:
> > > > Let say we have a tunnel packet:
> > > > eth_hdr_out/ipv4_hdr_out/udp_hdr_out/vxlan_hdr/ehtr_hdr_in/ipv4_hd
> > > > r_
> > > > in/tcp_hdr_in.There
> > > could be several scenarios:
> > > >
> > > > A) User requests HW offload for ipv4_hdr_out checksum.
> > > > He doesn't care is it a tunnelled packet or not.
> > > > So he sets:
> > > >
> > > > mb->l2_len =  eth_hdr_out;
> > > > mb->l3_len = ipv4_hdr_out;
> > > > mb->ol_flags |= PKT_TX_IPV4_CSUM;
> > > >
> > > > B) User is aware that it is a tunnelled packet and requests HW
> > > > offload for
> > ipv4_hdr_in and tcp_hdr_in *only*.
> > > > He doesn't care about outer IP checksum offload.
> > > > In that case, for FVL  he has 2 choices:
> > > >     1. Treat that packet as a 'proper' tunnelled packet, and fill all the fields:
> > > >       mb->l2_len =  eth_hdr_in;
> > > >       mb->l3_len = ipv4_hdr_in;
> > > >       mb->outer_l2_len = eth_hdr_out;
> > > >       mb->outer_l3_len = ipv4_hdr_out;
> > > >       mb->l4tun_len = vxlan_hdr;
> > > >       mb->ol_flags |= PKT_TX_UDP_TUNNEL_PKT | PKT_TX_IP_CKSUM |
> > > > PKT_TX_TCP_CKSUM;
> > > >
> > > >     2. As user doesn't care about outer IP hdr checksum, he can
> > > > treat
> > everything before ipv4_hdr_in as L2 header.
> > > >     So he knows, that it is a tunnelled packet, but makes HW to
> > > > treat it as
> > ordinary (non-tunnelled) packet:
> > > >       mb->l2_len = eth_hdr_out + ipv4_hdr_out + udp_hdr_out +
> > > > vxlan_hdr +
> > ehtr_hdr_in;
> > > >       mb->l3_len = ipv4_hdr_in;
> > > >       mb->ol_flags |= PKT_TX_IP_CKSUM |  PKT_TX_TCP_CKSUM;
> > > >
> > > > i40e PMD will support both B.1 and B.2.
> > > > ixgbe/igb/em PMD supports only B.2.
> > > > if HW supports both - it will be up to user app which method to choose.
> > >
> > > I think we should have a flag to advertise outer ip and outer udp
> > > checksum offload support, so the application knows which mode can be
> > > used.
> >
> > You mean a new DEV_TX_OFFLOAD_* value, right?
> > Something like:  DEV_TX_OFFLOAD_UDP_TUNNEL?
> > And make i40e_dev_info_get() to return it?
> > Yes, forgot about it, sounds like a proper thing to do.

> Yes, makes sense, I will send a separate patch(bug fixing) to do this. Thanks .

I'm preparing this patch, and will send it out soon, I hope this patch also can be included in DPDK1.8
Thanks.

 

> > Konstantin
> >
> > >
> > >
> > > Regards,
> > > Olivier

^ permalink raw reply	[flat|nested] 28+ messages in thread

end of thread, other threads:[~2014-12-03  8:02 UTC | newest]

Thread overview: 28+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2014-11-27  8:18 [dpdk-dev] [PATCH 0/3] i40e VXLAN TX checksum rework Jijiang Liu
2014-11-27  8:18 ` [dpdk-dev] [PATCH 1/3] mbuf:add two TX offload flags and change three fields Jijiang Liu
2014-11-27 10:00   ` Olivier MATZ
2014-11-27 13:14     ` Liu, Jijiang
2014-11-28  9:17       ` Olivier MATZ
     [not found]     ` <1ED644BD7E0A5F4091CF203DAFB8E4CC01D9EEA0@SHSMSX101.ccr.corp.intel.com>
2014-11-27 14:56       ` Ananyev, Konstantin
2014-11-27 17:01         ` Ananyev, Konstantin
2014-11-28 10:45           ` Olivier MATZ
2014-11-28 11:16             ` Ananyev, Konstantin
2014-11-30 14:50             ` Ananyev, Konstantin
2014-12-01  2:30               ` Liu, Jijiang
2014-12-01  9:52                 ` Olivier MATZ
2014-12-01 11:58                   ` Ananyev, Konstantin
2014-12-01 12:28                     ` Olivier MATZ
2014-12-01 13:07                       ` Liu, Jijiang
2014-12-01 14:31                         ` Ananyev, Konstantin
2014-11-27  8:18 ` [dpdk-dev] [PATCH 2/3] i40e:PMD change for VXLAN TX checksum Jijiang Liu
2014-11-27  8:18 ` [dpdk-dev] [PATCH 3/3] testpmd:rework csum forward engine Jijiang Liu
2014-11-27 10:23   ` Olivier MATZ
2014-11-27  8:50 ` [dpdk-dev] [PATCH 0/3] i40e VXLAN TX checksum rework Liu, Jijiang
2014-11-27  9:44 ` Olivier MATZ
2014-11-27 10:12   ` Olivier MATZ
2014-11-27 12:06     ` Liu, Jijiang
2014-11-27 12:07   ` Liu, Jijiang
2014-11-27 15:29   ` Ananyev, Konstantin
2014-11-27 16:31     ` Liu, Jijiang
2014-12-03  8:02       ` Liu, Jijiang
2014-11-28  9:26     ` Olivier MATZ

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).