DPDK patches and discussions
* [dpdk-dev] [PATCH 00/12] add TSO support
@ 2014-11-10 15:59 Olivier Matz
  2014-11-10 15:59 ` [dpdk-dev] [PATCH 01/12] igb/ixgbe: fix IP checksum calculation Olivier Matz
                   ` (13 more replies)
  0 siblings, 14 replies; 112+ messages in thread
From: Olivier Matz @ 2014-11-10 15:59 UTC
  To: dev; +Cc: jigsaw

This series adds TSO support to the ixgbe DPDK driver. This is the third
version of the series, but as the previous version [1] was posted several
months ago and included an mbuf rework that is now in mainline, it can
be considered a new patch series. I'm open to comments on this
patchset, especially on the API (see [2]).

This series first fixes some bugs that were discovered during
development, adds some changes to the mbuf API (new l4_len and
tso_segsz fields), adds TSO support in ixgbe, reworks the testpmd
csum forward engine, and finally adds TSO support in testpmd so it
can be validated.

The new fields added in mbuf try to be generic enough to apply to
other hardware in the future. To delegate the TCP segmentation to the
hardware, the user has to:

  - set the PKT_TX_TCP_SEG flag in mbuf->ol_flags (this flag implies
    PKT_TX_TCP_CKSUM)
  - if it's IPv4, set the PKT_TX_IP_CKSUM flag and write the IP checksum
    to 0 in the packet
  - fill the mbuf offload information: l2_len, l3_len, l4_len, tso_segsz
  - calculate the pseudo header checksum and set it in the TCP header,
    as required when doing hardware TCP checksum offload
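
The flag and length steps above can be sketched as follows. This is only
an illustration: struct demo_mbuf and demo_request_tso() are hypothetical
stand-ins (not part of DPDK), the flag values are copied from the patches
in this series, and the two packet-data steps (zeroing the IPv4 checksum,
writing the pseudo header checksum) are left out.

```c
#include <assert.h>
#include <stdint.h>

/* Flag values as defined by the patches in this series. */
#define PKT_TX_IP_CKSUM (1ULL << 54)
#define PKT_TX_TCP_SEG  (1ULL << 49)

/* Hypothetical stand-in for the offload-related part of struct rte_mbuf.
 * The real fields are bitfields; plain integers keep the sketch simple. */
struct demo_mbuf {
	uint64_t ol_flags;
	uint16_t l2_len;    /* L2 (MAC) header length in bytes */
	uint16_t l3_len;    /* L3 (IP) header length in bytes */
	uint16_t l4_len;    /* L4 (TCP) header length in bytes */
	uint16_t tso_segsz; /* MSS used by the hardware for each segment */
};

/* Hypothetical helper: request TSO for a TCP packet. The caller must
 * still set the IPv4 checksum to 0 and write the pseudo header checksum
 * into the TCP header; those steps touch packet data and are not shown. */
static inline void
demo_request_tso(struct demo_mbuf *m, int is_ipv4, uint16_t l2_len,
		 uint16_t l3_len, uint16_t l4_len, uint16_t mss)
{
	m->ol_flags |= PKT_TX_TCP_SEG; /* implies PKT_TX_TCP_CKSUM */
	if (is_ipv4)
		m->ol_flags |= PKT_TX_IP_CKSUM;
	m->l2_len = l2_len;
	m->l3_len = l3_len;
	m->l4_len = l4_len;
	m->tso_segsz = mss;
}
```

For an untagged Ethernet frame carrying IPv4/TCP without options, a call
would look like demo_request_tso(&m, 1, 14, 20, 20, 1460).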

The test report will be posted as a reply to this cover letter and
can be referenced from the relevant commits.

[1] http://dpdk.org/ml/archives/dev/2014-May/002537.html
[2] http://dpdk.org/ml/archives/dev/2014-November/007940.html

Olivier Matz (12):
  igb/ixgbe: fix IP checksum calculation
  ixgbe: fix remaining pkt_flags variable size to 64 bits
  mbuf: move vxlan_cksum flag definition at the proper place
  mbuf: add help about TX checksum flags
  mbuf: remove too specific PKT_TX_OFFLOAD_MASK definition
  mbuf: add functions to get the name of an ol_flag
  mbuf: generic support for TCP segmentation offload
  ixgbe: support TCP segmentation offload
  testpmd: fix use of offload flags in testpmd
  testpmd: rework csum forward engine
  testpmd: support TSO in csum forward engine
  testpmd: add a verbose mode csum forward engine

 app/test-pmd/cmdline.c              | 243 ++++++++++--
 app/test-pmd/config.c               |  15 +-
 app/test-pmd/csumonly.c             | 740 +++++++++++++++++++-----------------
 app/test-pmd/macfwd.c               |   5 +-
 app/test-pmd/macswap.c              |   5 +-
 app/test-pmd/rxonly.c               |  36 +-
 app/test-pmd/testpmd.c              |   3 +-
 app/test-pmd/testpmd.h              |  24 +-
 app/test-pmd/txonly.c               |   9 +-
 examples/ipv4_multicast/main.c      |   3 +-
 lib/librte_mbuf/rte_mbuf.h          | 130 +++++--
 lib/librte_pmd_e1000/igb_rxtx.c     |  16 +-
 lib/librte_pmd_ixgbe/ixgbe_ethdev.c |   3 +-
 lib/librte_pmd_ixgbe/ixgbe_rxtx.c   | 222 ++++++++---
 lib/librte_pmd_ixgbe/ixgbe_rxtx.h   |  19 +-
 15 files changed, 921 insertions(+), 552 deletions(-)

-- 
2.1.0


* [dpdk-dev] [PATCH 01/12] igb/ixgbe: fix IP checksum calculation
  2014-11-10 15:59 [dpdk-dev] [PATCH 00/12] add TSO support Olivier Matz
@ 2014-11-10 15:59 ` Olivier Matz
  2014-11-10 15:59 ` [dpdk-dev] [PATCH 02/12] ixgbe: fix remaining pkt_flags variable size to 64 bits Olivier Matz
                   ` (12 subsequent siblings)
  13 siblings, 0 replies; 112+ messages in thread
From: Olivier Matz @ 2014-11-10 15:59 UTC
  To: dev; +Cc: jigsaw

According to the Intel® 82599 10 GbE Controller Datasheet (Table 7-38),
both L2 and L3 lengths are needed to offload the IP checksum.

Note that the e1000 driver does not need to be patched as it already
contains the fix.

Signed-off-by: Olivier Matz <olivier.matz@6wind.com>
Acked-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
---
 lib/librte_pmd_e1000/igb_rxtx.c   | 2 +-
 lib/librte_pmd_ixgbe/ixgbe_rxtx.c | 2 +-
 2 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/lib/librte_pmd_e1000/igb_rxtx.c b/lib/librte_pmd_e1000/igb_rxtx.c
index f09c525..321493e 100644
--- a/lib/librte_pmd_e1000/igb_rxtx.c
+++ b/lib/librte_pmd_e1000/igb_rxtx.c
@@ -262,7 +262,7 @@ igbe_set_xmit_ctx(struct igb_tx_queue* txq,
 
 	if (ol_flags & PKT_TX_IP_CKSUM) {
 		type_tucmd_mlhl = E1000_ADVTXD_TUCMD_IPV4;
-		cmp_mask |= TX_MAC_LEN_CMP_MASK;
+		cmp_mask |= TX_MACIP_LEN_CMP_MASK;
 	}
 
 	/* Specify which HW CTX to upload. */
diff --git a/lib/librte_pmd_ixgbe/ixgbe_rxtx.c b/lib/librte_pmd_ixgbe/ixgbe_rxtx.c
index 3a5a8ff..78be7e6 100644
--- a/lib/librte_pmd_ixgbe/ixgbe_rxtx.c
+++ b/lib/librte_pmd_ixgbe/ixgbe_rxtx.c
@@ -374,7 +374,7 @@ ixgbe_set_xmit_ctx(struct igb_tx_queue* txq,
 
 	if (ol_flags & PKT_TX_IP_CKSUM) {
 		type_tucmd_mlhl = IXGBE_ADVTXD_TUCMD_IPV4;
-		cmp_mask |= TX_MAC_LEN_CMP_MASK;
+		cmp_mask |= TX_MACIP_LEN_CMP_MASK;
 	}
 
 	/* Specify which HW CTX to upload. */
-- 
2.1.0


* [dpdk-dev] [PATCH 02/12] ixgbe: fix remaining pkt_flags variable size to 64 bits
  2014-11-10 15:59 [dpdk-dev] [PATCH 00/12] add TSO support Olivier Matz
  2014-11-10 15:59 ` [dpdk-dev] [PATCH 01/12] igb/ixgbe: fix IP checksum calculation Olivier Matz
@ 2014-11-10 15:59 ` Olivier Matz
  2014-11-10 16:59   ` Bruce Richardson
  2014-11-10 15:59 ` [dpdk-dev] [PATCH 03/12] mbuf: move vxlan_cksum flag definition at the proper place Olivier Matz
                   ` (11 subsequent siblings)
  13 siblings, 1 reply; 112+ messages in thread
From: Olivier Matz @ 2014-11-10 15:59 UTC
  To: dev; +Cc: jigsaw

Since commit 4332beee9 "mbuf: expand ol_flags field to 64-bits", the
packet flags are 64 bits wide. Some occurrences were missed in
the ixgbe driver.
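
To illustrate why the narrower variables matter, here is a minimal sketch
of the truncation. The DEMO_* flags and demo_* functions are hypothetical
and exist only for this example; the point is that any flag above bit 15
is silently lost when it passes through a uint16_t.

```c
#include <assert.h>
#include <stdint.h>

#define DEMO_FLAG_LOW  (1ULL << 1)  /* hypothetical flag that fits in 16 bits */
#define DEMO_FLAG_HIGH (1ULL << 49) /* hypothetical flag above bit 15 */

/* Mimics the old code: flags pass through a 16-bit local variable. */
static inline uint64_t
demo_flags_narrow(uint64_t in)
{
	uint16_t pkt_flags = (uint16_t)in; /* upper 48 bits are dropped */

	return pkt_flags;
}

/* Mimics the fixed code: the local variable is as wide as ol_flags. */
static inline uint64_t
demo_flags_wide(uint64_t in)
{
	uint64_t pkt_flags = in;

	return pkt_flags;
}
```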

Signed-off-by: Olivier Matz <olivier.matz@6wind.com>
---
 lib/librte_pmd_ixgbe/ixgbe_rxtx.c | 12 ++++++------
 1 file changed, 6 insertions(+), 6 deletions(-)

diff --git a/lib/librte_pmd_ixgbe/ixgbe_rxtx.c b/lib/librte_pmd_ixgbe/ixgbe_rxtx.c
index 78be7e6..042ee8a 100644
--- a/lib/librte_pmd_ixgbe/ixgbe_rxtx.c
+++ b/lib/librte_pmd_ixgbe/ixgbe_rxtx.c
@@ -817,7 +817,7 @@ end_of_tx:
 static inline uint64_t
 rx_desc_hlen_type_rss_to_pkt_flags(uint32_t hl_tp_rs)
 {
-	uint16_t pkt_flags;
+	uint64_t pkt_flags;
 
 	static uint64_t ip_pkt_types_map[16] = {
 		0, PKT_RX_IPV4_HDR, PKT_RX_IPV4_HDR_EXT, PKT_RX_IPV4_HDR_EXT,
@@ -834,7 +834,7 @@ rx_desc_hlen_type_rss_to_pkt_flags(uint32_t hl_tp_rs)
 	};
 
 #ifdef RTE_LIBRTE_IEEE1588
-	static uint32_t ip_pkt_etqf_map[8] = {
+	static uint64_t ip_pkt_etqf_map[8] = {
 		0, 0, 0, PKT_RX_IEEE1588_PTP,
 		0, 0, 0, 0,
 	};
@@ -903,7 +903,7 @@ ixgbe_rx_scan_hw_ring(struct igb_rx_queue *rxq)
 	struct igb_rx_entry *rxep;
 	struct rte_mbuf *mb;
 	uint16_t pkt_len;
-	uint16_t pkt_flags;
+	uint64_t pkt_flags;
 	int s[LOOK_AHEAD], nb_dd;
 	int i, j, nb_rx = 0;
 
@@ -1335,7 +1335,7 @@ ixgbe_recv_scattered_pkts(void *rx_queue, struct rte_mbuf **rx_pkts,
 	uint16_t nb_rx;
 	uint16_t nb_hold;
 	uint16_t data_len;
-	uint16_t pkt_flags;
+	uint64_t pkt_flags;
 
 	nb_rx = 0;
 	nb_hold = 0;
@@ -1511,9 +1511,9 @@ ixgbe_recv_scattered_pkts(void *rx_queue, struct rte_mbuf **rx_pkts,
 		first_seg->vlan_tci = rte_le_to_cpu_16(rxd.wb.upper.vlan);
 		hlen_type_rss = rte_le_to_cpu_32(rxd.wb.lower.lo_dword.data);
 		pkt_flags = rx_desc_hlen_type_rss_to_pkt_flags(hlen_type_rss);
-		pkt_flags = (uint16_t)(pkt_flags |
+		pkt_flags = (pkt_flags |
 				rx_desc_status_to_pkt_flags(staterr));
-		pkt_flags = (uint16_t)(pkt_flags |
+		pkt_flags = (pkt_flags |
 				rx_desc_error_to_pkt_flags(staterr));
 		first_seg->ol_flags = pkt_flags;
 
-- 
2.1.0


* [dpdk-dev] [PATCH 03/12] mbuf: move vxlan_cksum flag definition at the proper place
  2014-11-10 15:59 [dpdk-dev] [PATCH 00/12] add TSO support Olivier Matz
  2014-11-10 15:59 ` [dpdk-dev] [PATCH 01/12] igb/ixgbe: fix IP checksum calculation Olivier Matz
  2014-11-10 15:59 ` [dpdk-dev] [PATCH 02/12] ixgbe: fix remaining pkt_flags variable size to 64 bits Olivier Matz
@ 2014-11-10 15:59 ` Olivier Matz
  2014-11-10 17:09   ` Bruce Richardson
  2014-11-10 15:59 ` [dpdk-dev] [PATCH 04/12] mbuf: add help about TX checksum flags Olivier Matz
                   ` (10 subsequent siblings)
  13 siblings, 1 reply; 112+ messages in thread
From: Olivier Matz @ 2014-11-10 15:59 UTC
  To: dev; +Cc: jigsaw

The TX mbuf flags are ordered from the highest value to the
lowest. Move PKT_TX_VXLAN_CKSUM to the right place.

Signed-off-by: Olivier Matz <olivier.matz@6wind.com>
---
 lib/librte_mbuf/rte_mbuf.h | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/lib/librte_mbuf/rte_mbuf.h b/lib/librte_mbuf/rte_mbuf.h
index e8f9bfc..be15168 100644
--- a/lib/librte_mbuf/rte_mbuf.h
+++ b/lib/librte_mbuf/rte_mbuf.h
@@ -96,7 +96,6 @@ extern "C" {
 
 #define PKT_TX_VLAN_PKT      (1ULL << 55) /**< TX packet is a 802.1q VLAN packet. */
 #define PKT_TX_IP_CKSUM      (1ULL << 54) /**< IP cksum of TX pkt. computed by NIC. */
-#define PKT_TX_VXLAN_CKSUM   (1ULL << 50) /**< TX checksum of VXLAN computed by NIC */
 #define PKT_TX_IPV4_CSUM     PKT_TX_IP_CKSUM /**< Alias of PKT_TX_IP_CKSUM. */
 #define PKT_TX_IPV4          PKT_RX_IPV4_HDR /**< IPv4 with no IP checksum offload. */
 #define PKT_TX_IPV6          PKT_RX_IPV6_HDR /**< IPv6 packet */
@@ -114,9 +113,10 @@ extern "C" {
 #define PKT_TX_UDP_CKSUM     (3ULL << 52) /**< UDP cksum of TX pkt. computed by NIC. */
 #define PKT_TX_L4_MASK       (3ULL << 52) /**< Mask for L4 cksum offload request. */
 
-/* Bit 51 - IEEE1588*/
 #define PKT_TX_IEEE1588_TMST (1ULL << 51) /**< TX IEEE1588 packet to timestamp. */
 
+#define PKT_TX_VXLAN_CKSUM   (1ULL << 50) /**< TX checksum of VXLAN computed by NIC */
+
 /* Use final bit of flags to indicate a control mbuf */
 #define CTRL_MBUF_FLAG       (1ULL << 63) /**< Mbuf contains control data */
 
-- 
2.1.0


* [dpdk-dev] [PATCH 04/12] mbuf: add help about TX checksum flags
  2014-11-10 15:59 [dpdk-dev] [PATCH 00/12] add TSO support Olivier Matz
                   ` (2 preceding siblings ...)
  2014-11-10 15:59 ` [dpdk-dev] [PATCH 03/12] mbuf: move vxlan_cksum flag definition at the proper place Olivier Matz
@ 2014-11-10 15:59 ` Olivier Matz
  2014-11-10 17:10   ` Bruce Richardson
  2014-11-10 15:59 ` [dpdk-dev] [PATCH 05/12] mbuf: remove too specific PKT_TX_OFFLOAD_MASK definition Olivier Matz
                   ` (9 subsequent siblings)
  13 siblings, 1 reply; 112+ messages in thread
From: Olivier Matz @ 2014-11-10 15:59 UTC
  To: dev; +Cc: jigsaw

Describe how to use the hardware checksum API.

Signed-off-by: Olivier Matz <olivier.matz@6wind.com>
---
 lib/librte_mbuf/rte_mbuf.h | 25 +++++++++++++++++--------
 1 file changed, 17 insertions(+), 8 deletions(-)

diff --git a/lib/librte_mbuf/rte_mbuf.h b/lib/librte_mbuf/rte_mbuf.h
index be15168..96e322b 100644
--- a/lib/librte_mbuf/rte_mbuf.h
+++ b/lib/librte_mbuf/rte_mbuf.h
@@ -95,19 +95,28 @@ extern "C" {
 #define PKT_RX_TUNNEL_IPV6_HDR (1ULL << 12) /**< RX tunnel packet with IPv6 header. */
 
 #define PKT_TX_VLAN_PKT      (1ULL << 55) /**< TX packet is a 802.1q VLAN packet. */
-#define PKT_TX_IP_CKSUM      (1ULL << 54) /**< IP cksum of TX pkt. computed by NIC. */
+
+/**
+ * Enable hardware computation of IP cksum. To use it:
+ *  - fill l2_len and l3_len in mbuf
+ *  - set the flags PKT_TX_IP_CKSUM
+ *  - set the ip checksum to 0 in IP header
+ */
+#define PKT_TX_IP_CKSUM      (1ULL << 54)
 #define PKT_TX_IPV4_CSUM     PKT_TX_IP_CKSUM /**< Alias of PKT_TX_IP_CKSUM. */
 #define PKT_TX_IPV4          PKT_RX_IPV4_HDR /**< IPv4 with no IP checksum offload. */
 #define PKT_TX_IPV6          PKT_RX_IPV6_HDR /**< IPv6 packet */
 
-/*
- * Bits 52+53 used for L4 packet type with checksum enabled.
- *     00: Reserved
- *     01: TCP checksum
- *     10: SCTP checksum
- *     11: UDP checksum
+/**
+ * Bits 52+53 used for L4 packet type with checksum enabled: 00: Reserved,
+ * 01: TCP checksum, 10: SCTP checksum, 11: UDP checksum. To use hardware
+ * L4 checksum offload, the user needs to:
+ *  - fill l2_len and l3_len in mbuf
+ *  - set the flags PKT_TX_TCP_CKSUM, PKT_TX_SCTP_CKSUM or PKT_TX_UDP_CKSUM
+ *  - calculate the pseudo header checksum and set it in the L4 header (only
+ *    for TCP or UDP). For SCTP, set the crc field to 0.
  */
-#define PKT_TX_L4_NO_CKSUM   (0ULL << 52) /**< Disable L4 cksum of TX pkt. */
+#define PKT_TX_L4_NO_CKSUM   (0ULL << 52) /* Disable L4 cksum of TX pkt. */
 #define PKT_TX_TCP_CKSUM     (1ULL << 52) /**< TCP cksum of TX pkt. computed by NIC. */
 #define PKT_TX_SCTP_CKSUM    (2ULL << 52) /**< SCTP cksum of TX pkt. computed by NIC. */
 #define PKT_TX_UDP_CKSUM     (3ULL << 52) /**< UDP cksum of TX pkt. computed by NIC. */
-- 
2.1.0


* [dpdk-dev] [PATCH 05/12] mbuf: remove too specific PKT_TX_OFFLOAD_MASK definition
  2014-11-10 15:59 [dpdk-dev] [PATCH 00/12] add TSO support Olivier Matz
                   ` (3 preceding siblings ...)
  2014-11-10 15:59 ` [dpdk-dev] [PATCH 04/12] mbuf: add help about TX checksum flags Olivier Matz
@ 2014-11-10 15:59 ` Olivier Matz
  2014-11-10 17:14   ` Bruce Richardson
  2014-11-10 15:59 ` [dpdk-dev] [PATCH 06/12] mbuf: add functions to get the name of an ol_flag Olivier Matz
                   ` (8 subsequent siblings)
  13 siblings, 1 reply; 112+ messages in thread
From: Olivier Matz @ 2014-11-10 15:59 UTC
  To: dev; +Cc: jigsaw

This macro is specific to Intel PMD drivers, and its description
("indicate what bits required for building TX context") shows that it
belongs in the PMD driver rather than in the generic rte_mbuf.h.

Signed-off-by: Olivier Matz <olivier.matz@6wind.com>
---
 lib/librte_mbuf/rte_mbuf.h        | 5 -----
 lib/librte_pmd_e1000/igb_rxtx.c   | 3 ++-
 lib/librte_pmd_ixgbe/ixgbe_rxtx.c | 3 ++-
 3 files changed, 4 insertions(+), 7 deletions(-)

diff --git a/lib/librte_mbuf/rte_mbuf.h b/lib/librte_mbuf/rte_mbuf.h
index 96e322b..ff11b84 100644
--- a/lib/librte_mbuf/rte_mbuf.h
+++ b/lib/librte_mbuf/rte_mbuf.h
@@ -129,11 +129,6 @@ extern "C" {
 /* Use final bit of flags to indicate a control mbuf */
 #define CTRL_MBUF_FLAG       (1ULL << 63) /**< Mbuf contains control data */
 
-/**
- * Bit Mask to indicate what bits required for building TX context
- */
-#define PKT_TX_OFFLOAD_MASK (PKT_TX_VLAN_PKT | PKT_TX_IP_CKSUM | PKT_TX_L4_MASK)
-
 /* define a set of marker types that can be used to refer to set points in the
  * mbuf */
 typedef void    *MARKER[0];   /**< generic marker for a point in a structure */
diff --git a/lib/librte_pmd_e1000/igb_rxtx.c b/lib/librte_pmd_e1000/igb_rxtx.c
index 321493e..dbf5074 100644
--- a/lib/librte_pmd_e1000/igb_rxtx.c
+++ b/lib/librte_pmd_e1000/igb_rxtx.c
@@ -400,7 +400,8 @@ eth_igb_xmit_pkts(void *tx_queue, struct rte_mbuf **tx_pkts,
 		ol_flags = tx_pkt->ol_flags;
 		vlan_macip_lens.f.vlan_tci = tx_pkt->vlan_tci;
 		vlan_macip_lens.f.l2_l3_len = tx_pkt->l2_l3_len;
-		tx_ol_req = ol_flags & PKT_TX_OFFLOAD_MASK;
+		tx_ol_req = ol_flags & (PKT_TX_VLAN_PKT | PKT_TX_IP_CKSUM |
+			PKT_TX_L4_MASK);
 
 		/* If a Context Descriptor need be built . */
 		if (tx_ol_req) {
diff --git a/lib/librte_pmd_ixgbe/ixgbe_rxtx.c b/lib/librte_pmd_ixgbe/ixgbe_rxtx.c
index 042ee8a..70ca254 100644
--- a/lib/librte_pmd_ixgbe/ixgbe_rxtx.c
+++ b/lib/librte_pmd_ixgbe/ixgbe_rxtx.c
@@ -580,7 +580,8 @@ ixgbe_xmit_pkts(void *tx_queue, struct rte_mbuf **tx_pkts,
 		ol_flags = tx_pkt->ol_flags;
 
 		/* If hardware offload required */
-		tx_ol_req = ol_flags & PKT_TX_OFFLOAD_MASK;
+		tx_ol_req = ol_flags & (PKT_TX_VLAN_PKT | PKT_TX_IP_CKSUM |
+			PKT_TX_L4_MASK);
 		if (tx_ol_req) {
 			vlan_macip_lens.f.vlan_tci = tx_pkt->vlan_tci;
 			vlan_macip_lens.f.l2_l3_len = tx_pkt->l2_l3_len;
-- 
2.1.0


* [dpdk-dev] [PATCH 06/12] mbuf: add functions to get the name of an ol_flag
  2014-11-10 15:59 [dpdk-dev] [PATCH 00/12] add TSO support Olivier Matz
                   ` (4 preceding siblings ...)
  2014-11-10 15:59 ` [dpdk-dev] [PATCH 05/12] mbuf: remove too specific PKT_TX_OFFLOAD_MASK definition Olivier Matz
@ 2014-11-10 15:59 ` Olivier Matz
  2014-11-10 17:29   ` Bruce Richardson
  2014-11-10 15:59 ` [dpdk-dev] [PATCH 07/12] mbuf: generic support for TCP segmentation offload Olivier Matz
                   ` (7 subsequent siblings)
  13 siblings, 1 reply; 112+ messages in thread
From: Olivier Matz @ 2014-11-10 15:59 UTC
  To: dev; +Cc: jigsaw

In test-pmd (rxonly.c), the code is able to dump the list of ol_flags.
The issue is that the list of flags in the application has to be kept
synchronized with the flags defined in rte_mbuf.h.

This patch introduces 2 new functions, rte_get_rx_ol_flag_name()
and rte_get_tx_ol_flag_name(), that return the name of a flag from
its mask. It also fixes rxonly.c to use these new functions and to
display the proper flags.

Signed-off-by: Olivier Matz <olivier.matz@6wind.com>
---
 app/test-pmd/rxonly.c      | 36 ++++++++--------------------
 lib/librte_mbuf/rte_mbuf.h | 60 ++++++++++++++++++++++++++++++++++++++++++++++
 2 files changed, 70 insertions(+), 26 deletions(-)

diff --git a/app/test-pmd/rxonly.c b/app/test-pmd/rxonly.c
index 4410c3d..e7cd7e2 100644
--- a/app/test-pmd/rxonly.c
+++ b/app/test-pmd/rxonly.c
@@ -71,26 +71,6 @@
 
 #include "testpmd.h"
 
-#define MAX_PKT_RX_FLAGS 13
-static const char *pkt_rx_flag_names[MAX_PKT_RX_FLAGS] = {
-	"VLAN_PKT",
-	"RSS_HASH",
-	"PKT_RX_FDIR",
-	"IP_CKSUM",
-	"IP_CKSUM_BAD",
-
-	"IPV4_HDR",
-	"IPV4_HDR_EXT",
-	"IPV6_HDR",
-	"IPV6_HDR_EXT",
-
-	"IEEE1588_PTP",
-	"IEEE1588_TMST",
-
-	"TUNNEL_IPV4_HDR",
-	"TUNNEL_IPV6_HDR",
-};
-
 static inline void
 print_ether_addr(const char *what, struct ether_addr *eth_addr)
 {
@@ -219,12 +199,16 @@ pkt_burst_receive(struct fwd_stream *fs)
 		printf(" - Receive queue=0x%x", (unsigned) fs->rx_queue);
 		printf("\n");
 		if (ol_flags != 0) {
-			int rxf;
-
-			for (rxf = 0; rxf < MAX_PKT_RX_FLAGS; rxf++) {
-				if (ol_flags & (1 << rxf))
-					printf("  PKT_RX_%s\n",
-					       pkt_rx_flag_names[rxf]);
+			unsigned rxf;
+			const char *name;
+
+			for (rxf = 0; rxf < sizeof(mb->ol_flags) * 8; rxf++) {
+				if ((ol_flags & (1ULL << rxf)) == 0)
+					continue;
+				name = rte_get_rx_ol_flag_name(1ULL << rxf);
+				if (name == NULL)
+					continue;
+				printf("  %s\n", name);
 			}
 		}
 		rte_pktmbuf_free(mb);
diff --git a/lib/librte_mbuf/rte_mbuf.h b/lib/librte_mbuf/rte_mbuf.h
index ff11b84..bcd8996 100644
--- a/lib/librte_mbuf/rte_mbuf.h
+++ b/lib/librte_mbuf/rte_mbuf.h
@@ -129,6 +129,66 @@ extern "C" {
 /* Use final bit of flags to indicate a control mbuf */
 #define CTRL_MBUF_FLAG       (1ULL << 63) /**< Mbuf contains control data */
 
+/**
+ * Bit Mask to indicate what bits required for building TX context
+ * Get the name of a RX offload flag
+ *
+ * @param mask
+ *   The mask describing the flag. Usually only one bit must be set.
+ *   Several bits can be given if they belong to the same mask.
+ *   Ex: PKT_TX_L4_MASK.
+ * @return
+ *   The name of this flag, or NULL if it's not a valid RX flag.
+ */
+static inline const char *rte_get_rx_ol_flag_name(uint64_t mask)
+{
+	switch (mask) {
+	case PKT_RX_VLAN_PKT: return "PKT_RX_VLAN_PKT";
+	case PKT_RX_RSS_HASH: return "PKT_RX_RSS_HASH";
+	case PKT_RX_FDIR: return "PKT_RX_FDIR";
+	case PKT_RX_L4_CKSUM_BAD: return "PKT_RX_L4_CKSUM_BAD";
+	case PKT_RX_IP_CKSUM_BAD: return "PKT_RX_IP_CKSUM_BAD";
+	/* case PKT_RX_EIP_CKSUM_BAD: return "PKT_RX_EIP_CKSUM_BAD"; */
+	/* case PKT_RX_OVERSIZE: return "PKT_RX_OVERSIZE"; */
+	/* case PKT_RX_HBUF_OVERFLOW: return "PKT_RX_HBUF_OVERFLOW"; */
+	/* case PKT_RX_RECIP_ERR: return "PKT_RX_RECIP_ERR"; */
+	/* case PKT_RX_MAC_ERR: return "PKT_RX_MAC_ERR"; */
+	case PKT_RX_IPV4_HDR: return "PKT_RX_IPV4_HDR";
+	case PKT_RX_IPV4_HDR_EXT: return "PKT_RX_IPV4_HDR_EXT";
+	case PKT_RX_IPV6_HDR: return "PKT_RX_IPV6_HDR";
+	case PKT_RX_IPV6_HDR_EXT: return "PKT_RX_IPV6_HDR_EXT";
+	case PKT_RX_IEEE1588_PTP: return "PKT_RX_IEEE1588_PTP";
+	case PKT_RX_IEEE1588_TMST: return "PKT_RX_IEEE1588_TMST";
+	case PKT_RX_TUNNEL_IPV4_HDR: return "PKT_RX_TUNNEL_IPV4_HDR";
+	case PKT_RX_TUNNEL_IPV6_HDR: return "PKT_RX_TUNNEL_IPV6_HDR";
+	default: return NULL;
+	}
+}
+
+/**
+ * Get the name of a TX offload flag
+ *
+ * @param mask
+ *   The mask describing the flag. Usually only one bit must be set.
+ *   Several bits can be given if they belong to the same mask.
+ *   Ex: PKT_TX_L4_MASK.
+ * @return
+ *   The name of this flag, or NULL if it's not a valid TX flag.
+ */
+static inline const char *rte_get_tx_ol_flag_name(uint64_t mask)
+{
+	switch (mask) {
+	case PKT_TX_VLAN_PKT: return "PKT_TX_VLAN_PKT";
+	case PKT_TX_IP_CKSUM: return "PKT_TX_IP_CKSUM";
+	case PKT_TX_TCP_CKSUM: return "PKT_TX_TCP_CKSUM";
+	case PKT_TX_SCTP_CKSUM: return "PKT_TX_SCTP_CKSUM";
+	case PKT_TX_UDP_CKSUM: return "PKT_TX_UDP_CKSUM";
+	case PKT_TX_IEEE1588_TMST: return "PKT_TX_IEEE1588_TMST";
+	case PKT_TX_VXLAN_CKSUM: return "PKT_TX_VXLAN_CKSUM";
+	default: return NULL;
+	}
+}
+
 /* define a set of marker types that can be used to refer to set points in the
  * mbuf */
 typedef void    *MARKER[0];   /**< generic marker for a point in a structure */
-- 
2.1.0


* [dpdk-dev] [PATCH 07/12] mbuf: generic support for TCP segmentation offload
  2014-11-10 15:59 [dpdk-dev] [PATCH 00/12] add TSO support Olivier Matz
                   ` (5 preceding siblings ...)
  2014-11-10 15:59 ` [dpdk-dev] [PATCH 06/12] mbuf: add functions to get the name of an ol_flag Olivier Matz
@ 2014-11-10 15:59 ` Olivier Matz
  2014-11-11  3:17   ` Liu, Jijiang
  2014-11-12 13:09   ` Ananyev, Konstantin
  2014-11-10 15:59 ` [dpdk-dev] [PATCH 08/12] ixgbe: support " Olivier Matz
                   ` (6 subsequent siblings)
  13 siblings, 2 replies; 112+ messages in thread
From: Olivier Matz @ 2014-11-10 15:59 UTC
  To: dev; +Cc: jigsaw

Some of the NICs supported by DPDK can accelerate TCP traffic by using
segmentation offload. The application prepares a packet of up to 64K
with a valid TCP header and delegates the segmentation to the NIC.

Implement the generic part of TCP segmentation offload in rte_mbuf. It
introduces 2 new fields in rte_mbuf: l4_len (length of L4 header in bytes)
and tso_segsz (MSS of packets).
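
As a rough sketch of the layout idea, the offload lengths can share one
64-bit word so that they are reset or copied with a single store. The
union below is a hypothetical stand-in that only mirrors the field widths
used in this patch; it names the inner struct and relies on the compiler
accepting 64-bit bitfields, as GCC and Clang do.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for the TX offload fields of struct rte_mbuf. */
union demo_tx_offload {
	uint64_t tx_offload; /* all fields viewed as a single word */
	struct {
		uint64_t l2_len:7;     /* L2 (MAC) header length */
		uint64_t l3_len:9;     /* L3 (IP) header length */
		uint64_t l4_len:8;     /* L4 (TCP/UDP) header length */
		uint64_t tso_segsz:16; /* TCP TSO segment size */
	} f;
};

/* Clear every offload field with one 64-bit store. */
static inline void
demo_tx_offload_reset(union demo_tx_offload *o)
{
	o->tx_offload = 0;
}
```

The exact bit positions inside the word depend on the ABI, so code should
access the individual fields by name and use the combined word only for
whole-value operations such as reset, copy, or comparison.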

To delegate the TCP segmentation to the hardware, the user has to:

- set the PKT_TX_TCP_SEG flag in mbuf->ol_flags (this flag implies
  PKT_TX_TCP_CKSUM)
- set PKT_TX_IP_CKSUM if it's IPv4, and set the IP checksum to 0 in
  the packet
- fill the mbuf offload information: l2_len, l3_len, l4_len, tso_segsz
- calculate the pseudo header checksum and set it in the TCP header,
  as required when doing hardware TCP checksum offload

The API is inspired by the ixgbe hardware (the next commit adds the
support for ixgbe), but it seems generic enough to be used for other
hw/drivers in the future.

This commit also reworks the way l2_len and l3_len are used in igb
and ixgbe drivers as the l2_l3_len is not available anymore in mbuf.

Signed-off-by: Mirek Walukiewicz <miroslaw.walukiewicz@intel.com>
Signed-off-by: Olivier Matz <olivier.matz@6wind.com>
---
 app/test-pmd/testpmd.c            |  3 ++-
 examples/ipv4_multicast/main.c    |  3 ++-
 lib/librte_mbuf/rte_mbuf.h        | 44 +++++++++++++++++++++++----------------
 lib/librte_pmd_e1000/igb_rxtx.c   | 11 +++++++++-
 lib/librte_pmd_ixgbe/ixgbe_rxtx.c | 11 +++++++++-
 5 files changed, 50 insertions(+), 22 deletions(-)

diff --git a/app/test-pmd/testpmd.c b/app/test-pmd/testpmd.c
index 12adafa..a831e31 100644
--- a/app/test-pmd/testpmd.c
+++ b/app/test-pmd/testpmd.c
@@ -408,7 +408,8 @@ testpmd_mbuf_ctor(struct rte_mempool *mp,
 	mb->ol_flags     = 0;
 	mb->data_off     = RTE_PKTMBUF_HEADROOM;
 	mb->nb_segs      = 1;
-	mb->l2_l3_len       = 0;
+	mb->l2_len       = 0;
+	mb->l3_len       = 0;
 	mb->vlan_tci     = 0;
 	mb->hash.rss     = 0;
 }
diff --git a/examples/ipv4_multicast/main.c b/examples/ipv4_multicast/main.c
index de5e6be..a31d43d 100644
--- a/examples/ipv4_multicast/main.c
+++ b/examples/ipv4_multicast/main.c
@@ -302,7 +302,8 @@ mcast_out_pkt(struct rte_mbuf *pkt, int use_clone)
 	/* copy metadata from source packet*/
 	hdr->port = pkt->port;
 	hdr->vlan_tci = pkt->vlan_tci;
-	hdr->l2_l3_len = pkt->l2_l3_len;
+	hdr->l2_len = pkt->l2_len;
+	hdr->l3_len = pkt->l3_len;
 	hdr->hash = pkt->hash;
 
 	hdr->ol_flags = pkt->ol_flags;
diff --git a/lib/librte_mbuf/rte_mbuf.h b/lib/librte_mbuf/rte_mbuf.h
index bcd8996..f76b768 100644
--- a/lib/librte_mbuf/rte_mbuf.h
+++ b/lib/librte_mbuf/rte_mbuf.h
@@ -126,6 +126,19 @@ extern "C" {
 
 #define PKT_TX_VXLAN_CKSUM   (1ULL << 50) /**< TX checksum of VXLAN computed by NIC */
 
+/**
+ * TCP segmentation offload. To enable this offload feature for a
+ * packet to be transmitted on hardware supporting TSO:
+ *  - set the PKT_TX_TCP_SEG flag in mbuf->ol_flags (this flag implies
+ *    PKT_TX_TCP_CKSUM)
+ *  - if it's IPv4, set the PKT_TX_IP_CKSUM flag and write the IP checksum
+ *    to 0 in the packet
+ *  - fill the mbuf offload information: l2_len, l3_len, l4_len, tso_segsz
+ *  - calculate the pseudo header checksum and set it in the TCP header,
+ *    as required when doing hardware TCP checksum offload
+ */
+#define PKT_TX_TCP_SEG       (1ULL << 49)
+
 /* Use final bit of flags to indicate a control mbuf */
 #define CTRL_MBUF_FLAG       (1ULL << 63) /**< Mbuf contains control data */
 
@@ -185,6 +198,7 @@ static inline const char *rte_get_tx_ol_flag_name(uint64_t mask)
 	case PKT_TX_UDP_CKSUM: return "PKT_TX_UDP_CKSUM";
 	case PKT_TX_IEEE1588_TMST: return "PKT_TX_IEEE1588_TMST";
 	case PKT_TX_VXLAN_CKSUM: return "PKT_TX_VXLAN_CKSUM";
+	case PKT_TX_TCP_SEG: return "PKT_TX_TCP_SEG";
 	default: return NULL;
 	}
 }
@@ -264,22 +278,18 @@ struct rte_mbuf {
 
 	/* fields to support TX offloads */
 	union {
-		uint16_t l2_l3_len; /**< combined l2/l3 lengths as single var */
+		uint64_t tx_offload;       /**< combined for easy fetch */
 		struct {
-			uint16_t l3_len:9;      /**< L3 (IP) Header Length. */
-			uint16_t l2_len:7;      /**< L2 (MAC) Header Length. */
-		};
-	};
+			uint64_t l2_len:7; /**< L2 (MAC) Header Length. */
+			uint64_t l3_len:9; /**< L3 (IP) Header Length. */
+			uint64_t l4_len:8; /**< L4 (TCP/UDP) Header Length. */
+			uint64_t tso_segsz:16; /**< TCP TSO segment size */
 
-	/* fields for TX offloading of tunnels */
-	union {
-		uint16_t inner_l2_l3_len;
-		/**< combined inner l2/l3 lengths as single var */
-		struct {
-			uint16_t inner_l3_len:9;
-			/**< inner L3 (IP) Header Length. */
-			uint16_t inner_l2_len:7;
-			/**< inner L2 (MAC) Header Length. */
+			/* fields for TX offloading of tunnels */
+			uint16_t inner_l3_len:9; /**< inner L3 (IP) Hdr Length. */
+			uint16_t inner_l2_len:7; /**< inner L2 (MAC) Hdr Length. */
+
+			/* uint64_t unused:8; */
 		};
 	};
 } __rte_cache_aligned;
@@ -631,8 +641,7 @@ static inline void rte_pktmbuf_reset(struct rte_mbuf *m)
 {
 	m->next = NULL;
 	m->pkt_len = 0;
-	m->l2_l3_len = 0;
-	m->inner_l2_l3_len = 0;
+	m->tx_offload = 0;
 	m->vlan_tci = 0;
 	m->nb_segs = 1;
 	m->port = 0xff;
@@ -701,8 +710,7 @@ static inline void rte_pktmbuf_attach(struct rte_mbuf *mi, struct rte_mbuf *md)
 	mi->data_len = md->data_len;
 	mi->port = md->port;
 	mi->vlan_tci = md->vlan_tci;
-	mi->l2_l3_len = md->l2_l3_len;
-	mi->inner_l2_l3_len = md->inner_l2_l3_len;
+	mi->tx_offload = md->tx_offload;
 	mi->hash = md->hash;
 
 	mi->next = NULL;
diff --git a/lib/librte_pmd_e1000/igb_rxtx.c b/lib/librte_pmd_e1000/igb_rxtx.c
index dbf5074..0a9447e 100644
--- a/lib/librte_pmd_e1000/igb_rxtx.c
+++ b/lib/librte_pmd_e1000/igb_rxtx.c
@@ -361,6 +361,13 @@ eth_igb_xmit_pkts(void *tx_queue, struct rte_mbuf **tx_pkts,
 	struct rte_mbuf     *tx_pkt;
 	struct rte_mbuf     *m_seg;
 	union igb_vlan_macip vlan_macip_lens;
+	union {
+		uint16_t u16;
+		struct {
+			uint16_t l3_len:9;
+			uint16_t l2_len:7;
+		};
+	} l2_l3_len;
 	uint64_t buf_dma_addr;
 	uint32_t olinfo_status;
 	uint32_t cmd_type_len;
@@ -398,8 +405,10 @@ eth_igb_xmit_pkts(void *tx_queue, struct rte_mbuf **tx_pkts,
 		tx_last = (uint16_t) (tx_id + tx_pkt->nb_segs - 1);
 
 		ol_flags = tx_pkt->ol_flags;
+		l2_l3_len.l2_len = tx_pkt->l2_len;
+		l2_l3_len.l3_len = tx_pkt->l3_len;
 		vlan_macip_lens.f.vlan_tci = tx_pkt->vlan_tci;
-		vlan_macip_lens.f.l2_l3_len = tx_pkt->l2_l3_len;
+		vlan_macip_lens.f.l2_l3_len = l2_l3_len.u16;
 		tx_ol_req = ol_flags & (PKT_TX_VLAN_PKT | PKT_TX_IP_CKSUM |
 			PKT_TX_L4_MASK);
 
diff --git a/lib/librte_pmd_ixgbe/ixgbe_rxtx.c b/lib/librte_pmd_ixgbe/ixgbe_rxtx.c
index 70ca254..54a0fc1 100644
--- a/lib/librte_pmd_ixgbe/ixgbe_rxtx.c
+++ b/lib/librte_pmd_ixgbe/ixgbe_rxtx.c
@@ -540,6 +540,13 @@ ixgbe_xmit_pkts(void *tx_queue, struct rte_mbuf **tx_pkts,
 	struct rte_mbuf     *tx_pkt;
 	struct rte_mbuf     *m_seg;
 	union ixgbe_vlan_macip vlan_macip_lens;
+	union {
+		uint16_t u16;
+		struct {
+			uint16_t l3_len:9;
+			uint16_t l2_len:7;
+		};
+	} l2_l3_len;
 	uint64_t buf_dma_addr;
 	uint32_t olinfo_status;
 	uint32_t cmd_type_len;
@@ -583,8 +590,10 @@ ixgbe_xmit_pkts(void *tx_queue, struct rte_mbuf **tx_pkts,
 		tx_ol_req = ol_flags & (PKT_TX_VLAN_PKT | PKT_TX_IP_CKSUM |
 			PKT_TX_L4_MASK);
 		if (tx_ol_req) {
+			l2_l3_len.l2_len = tx_pkt->l2_len;
+			l2_l3_len.l3_len = tx_pkt->l3_len;
 			vlan_macip_lens.f.vlan_tci = tx_pkt->vlan_tci;
-			vlan_macip_lens.f.l2_l3_len = tx_pkt->l2_l3_len;
+			vlan_macip_lens.f.l2_l3_len = l2_l3_len.u16;
 
 			/* If new context need be built or reuse the exist ctx. */
 			ctx = what_advctx_update(txq, tx_ol_req,
-- 
2.1.0


* [dpdk-dev] [PATCH 08/12] ixgbe: support TCP segmentation offload
  2014-11-10 15:59 [dpdk-dev] [PATCH 00/12] add TSO support Olivier Matz
                   ` (6 preceding siblings ...)
  2014-11-10 15:59 ` [dpdk-dev] [PATCH 07/12] mbuf: generic support for TCP segmentation offload Olivier Matz
@ 2014-11-10 15:59 ` Olivier Matz
  2014-11-10 15:59 ` [dpdk-dev] [PATCH 09/12] testpmd: fix use of offload flags in testpmd Olivier Matz
                   ` (5 subsequent siblings)
  13 siblings, 0 replies; 112+ messages in thread
From: Olivier Matz @ 2014-11-10 15:59 UTC
  To: dev; +Cc: jigsaw

Implement TSO (TCP segmentation offload) in the ixgbe driver. The driver
is now able to use the PKT_TX_TCP_SEG mbuf flag and the mbuf hardware
offload information (l2_len, l3_len, l4_len, tso_segsz) to configure the
hardware support of TCP segmentation.

In ixgbe, when doing TSO, the IP length must not be included in the TCP
pseudo header checksum. A new function ixgbe_fix_tcp_phdr_cksum() is
used to fix the pseudo header checksum of the packet before giving it to
the hardware.
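
The adjustment amounts to a one's complement subtraction of the IP
payload length from the 16-bit pseudo header sum. A minimal sketch of
that arithmetic, mirroring the "add 0xffff on borrow" step used in the
patch, is shown here; demo_cksum_sub() and demo_cksum_add() are
hypothetical helper names, and both treat their arguments as raw 16-bit
words in whatever byte order the sum was computed.

```c
#include <assert.h>
#include <stdint.h>

/* One's complement subtraction: remove 'val' from a 16-bit sum.
 * On borrow, 0xffff is added first (end-around borrow). */
static inline uint16_t
demo_cksum_sub(uint16_t cksum, uint16_t val)
{
	uint32_t tmp = cksum;

	if (tmp < val)
		tmp += 0xffff;
	tmp -= val;
	return (uint16_t)tmp;
}

/* One's complement addition with end-around carry, used here only to
 * check that the subtraction can be undone. */
static inline uint16_t
demo_cksum_add(uint16_t a, uint16_t b)
{
	uint32_t tmp = (uint32_t)a + b;

	tmp = (tmp & 0xffff) + (tmp >> 16);
	return (uint16_t)tmp;
}
```

Adding the removed length back with demo_cksum_add() yields a value that
is equivalent to the original sum in one's complement arithmetic.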

In this patch, the tx_desc_cksum_flags_to_olinfo() and
tx_desc_ol_flags_to_cmdtype() functions have been reworked to make them
clearer. This should not impact performance, as gcc (version 4.8 in my
case) is smart enough to convert the tests into code that does not
contain any branch instructions.

Signed-off-by: Olivier Matz <olivier.matz@6wind.com>
---
 lib/librte_pmd_ixgbe/ixgbe_ethdev.c |   3 +-
 lib/librte_pmd_ixgbe/ixgbe_rxtx.c   | 220 +++++++++++++++++++++++++-----------
 lib/librte_pmd_ixgbe/ixgbe_rxtx.h   |  19 ++--
 3 files changed, 167 insertions(+), 75 deletions(-)

diff --git a/lib/librte_pmd_ixgbe/ixgbe_ethdev.c b/lib/librte_pmd_ixgbe/ixgbe_ethdev.c
index 9c73a30..1ab433a 100644
--- a/lib/librte_pmd_ixgbe/ixgbe_ethdev.c
+++ b/lib/librte_pmd_ixgbe/ixgbe_ethdev.c
@@ -1961,7 +1961,8 @@ ixgbe_dev_info_get(struct rte_eth_dev *dev, struct rte_eth_dev_info *dev_info)
 		DEV_TX_OFFLOAD_IPV4_CKSUM  |
 		DEV_TX_OFFLOAD_UDP_CKSUM   |
 		DEV_TX_OFFLOAD_TCP_CKSUM   |
-		DEV_TX_OFFLOAD_SCTP_CKSUM;
+		DEV_TX_OFFLOAD_SCTP_CKSUM  |
+		DEV_TX_OFFLOAD_TCP_TSO;
 
 	dev_info->default_rxconf = (struct rte_eth_rxconf) {
 			.rx_thresh = {
diff --git a/lib/librte_pmd_ixgbe/ixgbe_rxtx.c b/lib/librte_pmd_ixgbe/ixgbe_rxtx.c
index 54a0fc1..79f7395 100644
--- a/lib/librte_pmd_ixgbe/ixgbe_rxtx.c
+++ b/lib/librte_pmd_ixgbe/ixgbe_rxtx.c
@@ -354,62 +354,132 @@ ixgbe_xmit_pkts_simple(void *tx_queue, struct rte_mbuf **tx_pkts,
 	return nb_tx;
 }
 
+/* When doing TSO, the IP length must not be included in the pseudo
+ * header checksum of the packet given to the hardware */
+static inline void
+ixgbe_fix_tcp_phdr_cksum(struct rte_mbuf *m)
+{
+	char *data;
+	uint16_t *cksum_ptr;
+	uint16_t prev_cksum;
+	uint16_t new_cksum;
+	uint16_t ip_len, ip_paylen;
+	uint32_t tmp;
+	uint8_t ip_version;
+
+	/* get phdr cksum at offset 16 of TCP header */
+	data = rte_pktmbuf_mtod(m, char *);
+	cksum_ptr = (uint16_t *)(data + m->l2_len + m->l3_len + 16);
+	prev_cksum = *cksum_ptr;
+
+	/* get ip_version */
+	ip_version = (*(uint8_t *)(data + m->l2_len)) >> 4;
+
+	/* get ip_len at offset 2 of IP header or offset 4 of IPv6 header */
+	if (ip_version == 4) {
+		/* override ip cksum to 0 */
+		data[m->l2_len + 10] = 0;
+		data[m->l2_len + 11] = 0;
+
+		ip_len = *(uint16_t *)(data + m->l2_len + 2);
+		ip_paylen = rte_cpu_to_be_16(rte_be_to_cpu_16(ip_len) -
+			m->l3_len);
+	} else {
+		ip_paylen = *(uint16_t *)(data + m->l2_len + 4);
+	}
+
+	/* calculate the new phdr checksum that doesn't include ip_paylen */
+	tmp = prev_cksum;
+	if (tmp < ip_paylen)
+		tmp += 0xffff;
+	tmp -= ip_paylen;
+	new_cksum = tmp;
+
+	/* replace it in the packet */
+	*cksum_ptr = new_cksum;
+}
+
 static inline void
 ixgbe_set_xmit_ctx(struct igb_tx_queue* txq,
 		volatile struct ixgbe_adv_tx_context_desc *ctx_txd,
-		uint64_t ol_flags, uint32_t vlan_macip_lens)
+		uint64_t ol_flags, union ixgbe_tx_offload tx_offload)
 {
 	uint32_t type_tucmd_mlhl;
-	uint32_t mss_l4len_idx;
+	uint32_t mss_l4len_idx = 0;
 	uint32_t ctx_idx;
-	uint32_t cmp_mask;
+	uint32_t vlan_macip_lens;
+	union ixgbe_tx_offload tx_offload_mask;
 
 	ctx_idx = txq->ctx_curr;
-	cmp_mask = 0;
+	tx_offload_mask.data = 0;
 	type_tucmd_mlhl = 0;
 
+	/* Specify which HW CTX to upload. */
+	mss_l4len_idx |= (ctx_idx << IXGBE_ADVTXD_IDX_SHIFT);
+
 	if (ol_flags & PKT_TX_VLAN_PKT) {
-		cmp_mask |= TX_VLAN_CMP_MASK;
+		tx_offload_mask.vlan_tci = ~0;
 	}
 
-	if (ol_flags & PKT_TX_IP_CKSUM) {
-		type_tucmd_mlhl = IXGBE_ADVTXD_TUCMD_IPV4;
-		cmp_mask |= TX_MACIP_LEN_CMP_MASK;
-	}
+	/* check if TCP segmentation required for this packet */
+	if (ol_flags & PKT_TX_TCP_SEG) {
+		/* implies IP cksum and TCP cksum */
+		type_tucmd_mlhl = IXGBE_ADVTXD_TUCMD_IPV4 |
+			IXGBE_ADVTXD_TUCMD_L4T_TCP |
+			IXGBE_ADVTXD_DTYP_CTXT | IXGBE_ADVTXD_DCMD_DEXT;
+
+		tx_offload_mask.l2_len = ~0;
+		tx_offload_mask.l3_len = ~0;
+		tx_offload_mask.l4_len = ~0;
+		tx_offload_mask.tso_segsz = ~0;
+		mss_l4len_idx |= tx_offload.tso_segsz << IXGBE_ADVTXD_MSS_SHIFT;
+		mss_l4len_idx |= tx_offload.l4_len << IXGBE_ADVTXD_L4LEN_SHIFT;
+	} else { /* no TSO, check if hardware checksum is needed */
+		if (ol_flags & PKT_TX_IP_CKSUM) {
+			type_tucmd_mlhl = IXGBE_ADVTXD_TUCMD_IPV4;
+			tx_offload_mask.l2_len = ~0;
+			tx_offload_mask.l3_len = ~0;
+		}
 
-	/* Specify which HW CTX to upload. */
-	mss_l4len_idx = (ctx_idx << IXGBE_ADVTXD_IDX_SHIFT);
-	switch (ol_flags & PKT_TX_L4_MASK) {
-	case PKT_TX_UDP_CKSUM:
-		type_tucmd_mlhl |= IXGBE_ADVTXD_TUCMD_L4T_UDP |
+		switch (ol_flags & PKT_TX_L4_MASK) {
+		case PKT_TX_UDP_CKSUM:
+			type_tucmd_mlhl |= IXGBE_ADVTXD_TUCMD_L4T_UDP |
 				IXGBE_ADVTXD_DTYP_CTXT | IXGBE_ADVTXD_DCMD_DEXT;
-		mss_l4len_idx |= sizeof(struct udp_hdr) << IXGBE_ADVTXD_L4LEN_SHIFT;
-		cmp_mask |= TX_MACIP_LEN_CMP_MASK;
-		break;
-	case PKT_TX_TCP_CKSUM:
-		type_tucmd_mlhl |= IXGBE_ADVTXD_TUCMD_L4T_TCP |
+			mss_l4len_idx |= sizeof(struct udp_hdr) << IXGBE_ADVTXD_L4LEN_SHIFT;
+			tx_offload_mask.l2_len = ~0;
+			tx_offload_mask.l3_len = ~0;
+			break;
+		case PKT_TX_TCP_CKSUM:
+			type_tucmd_mlhl |= IXGBE_ADVTXD_TUCMD_L4T_TCP |
 				IXGBE_ADVTXD_DTYP_CTXT | IXGBE_ADVTXD_DCMD_DEXT;
-		mss_l4len_idx |= sizeof(struct tcp_hdr) << IXGBE_ADVTXD_L4LEN_SHIFT;
-		cmp_mask |= TX_MACIP_LEN_CMP_MASK;
-		break;
-	case PKT_TX_SCTP_CKSUM:
-		type_tucmd_mlhl |= IXGBE_ADVTXD_TUCMD_L4T_SCTP |
+			mss_l4len_idx |= sizeof(struct tcp_hdr) << IXGBE_ADVTXD_L4LEN_SHIFT;
+			tx_offload_mask.l2_len = ~0;
+			tx_offload_mask.l3_len = ~0;
+			tx_offload_mask.l4_len = ~0;
+			break;
+		case PKT_TX_SCTP_CKSUM:
+			type_tucmd_mlhl |= IXGBE_ADVTXD_TUCMD_L4T_SCTP |
 				IXGBE_ADVTXD_DTYP_CTXT | IXGBE_ADVTXD_DCMD_DEXT;
-		mss_l4len_idx |= sizeof(struct sctp_hdr) << IXGBE_ADVTXD_L4LEN_SHIFT;
-		cmp_mask |= TX_MACIP_LEN_CMP_MASK;
-		break;
-	default:
-		type_tucmd_mlhl |= IXGBE_ADVTXD_TUCMD_L4T_RSV |
+			mss_l4len_idx |= sizeof(struct sctp_hdr) << IXGBE_ADVTXD_L4LEN_SHIFT;
+			tx_offload_mask.l2_len = ~0;
+			tx_offload_mask.l3_len = ~0;
+			break;
+		default:
+			type_tucmd_mlhl |= IXGBE_ADVTXD_TUCMD_L4T_RSV |
 				IXGBE_ADVTXD_DTYP_CTXT | IXGBE_ADVTXD_DCMD_DEXT;
-		break;
+			break;
+		}
 	}
 
 	txq->ctx_cache[ctx_idx].flags = ol_flags;
-	txq->ctx_cache[ctx_idx].cmp_mask = cmp_mask;
-	txq->ctx_cache[ctx_idx].vlan_macip_lens.data =
-		vlan_macip_lens & cmp_mask;
+	txq->ctx_cache[ctx_idx].tx_offload.data  =
+		tx_offload_mask.data & tx_offload.data;
+	txq->ctx_cache[ctx_idx].tx_offload_mask    = tx_offload_mask;
 
 	ctx_txd->type_tucmd_mlhl = rte_cpu_to_le_32(type_tucmd_mlhl);
+	vlan_macip_lens = tx_offload.l3_len;
+	vlan_macip_lens |= (tx_offload.l2_len << IXGBE_ADVTXD_MACLEN_SHIFT);
+	vlan_macip_lens |= ((uint32_t)tx_offload.vlan_tci << IXGBE_ADVTXD_VLAN_SHIFT);
 	ctx_txd->vlan_macip_lens = rte_cpu_to_le_32(vlan_macip_lens);
 	ctx_txd->mss_l4len_idx   = rte_cpu_to_le_32(mss_l4len_idx);
 	ctx_txd->seqnum_seed     = 0;
@@ -421,20 +491,20 @@ ixgbe_set_xmit_ctx(struct igb_tx_queue* txq,
  */
 static inline uint32_t
 what_advctx_update(struct igb_tx_queue *txq, uint64_t flags,
-		uint32_t vlan_macip_lens)
+		union ixgbe_tx_offload tx_offload)
 {
 	/* If match with the current used context */
 	if (likely((txq->ctx_cache[txq->ctx_curr].flags == flags) &&
-		(txq->ctx_cache[txq->ctx_curr].vlan_macip_lens.data ==
-		(txq->ctx_cache[txq->ctx_curr].cmp_mask & vlan_macip_lens)))) {
+		(txq->ctx_cache[txq->ctx_curr].tx_offload.data ==
+		(txq->ctx_cache[txq->ctx_curr].tx_offload_mask.data & tx_offload.data)))) {
 			return txq->ctx_curr;
 	}
 
 	/* What if match with the next context  */
 	txq->ctx_curr ^= 1;
 	if (likely((txq->ctx_cache[txq->ctx_curr].flags == flags) &&
-		(txq->ctx_cache[txq->ctx_curr].vlan_macip_lens.data ==
-		(txq->ctx_cache[txq->ctx_curr].cmp_mask & vlan_macip_lens)))) {
+		(txq->ctx_cache[txq->ctx_curr].tx_offload.data ==
+		(txq->ctx_cache[txq->ctx_curr].tx_offload_mask.data & tx_offload.data)))) {
 			return txq->ctx_curr;
 	}
 
@@ -445,20 +515,25 @@ what_advctx_update(struct igb_tx_queue *txq, uint64_t flags,
 static inline uint32_t
 tx_desc_cksum_flags_to_olinfo(uint64_t ol_flags)
 {
-	static const uint32_t l4_olinfo[2] = {0, IXGBE_ADVTXD_POPTS_TXSM};
-	static const uint32_t l3_olinfo[2] = {0, IXGBE_ADVTXD_POPTS_IXSM};
-	uint32_t tmp;
-
-	tmp  = l4_olinfo[(ol_flags & PKT_TX_L4_MASK)  != PKT_TX_L4_NO_CKSUM];
-	tmp |= l3_olinfo[(ol_flags & PKT_TX_IP_CKSUM) != 0];
+	uint32_t tmp = 0;
+	if ((ol_flags & PKT_TX_L4_MASK) != PKT_TX_L4_NO_CKSUM)
+		tmp |= IXGBE_ADVTXD_POPTS_TXSM;
+	if (ol_flags & PKT_TX_IP_CKSUM)
+		tmp |= IXGBE_ADVTXD_POPTS_IXSM;
+	if (ol_flags & PKT_TX_TCP_SEG)
+		tmp |= IXGBE_ADVTXD_POPTS_TXSM;
 	return tmp;
 }
 
 static inline uint32_t
-tx_desc_vlan_flags_to_cmdtype(uint64_t ol_flags)
+tx_desc_ol_flags_to_cmdtype(uint64_t ol_flags)
 {
-	static const uint32_t vlan_cmd[2] = {0, IXGBE_ADVTXD_DCMD_VLE};
-	return vlan_cmd[(ol_flags & PKT_TX_VLAN_PKT) != 0];
+	uint32_t cmdtype = 0;
+	if (ol_flags & PKT_TX_VLAN_PKT)
+		cmdtype |= IXGBE_ADVTXD_DCMD_VLE;
+	if (ol_flags & PKT_TX_TCP_SEG)
+		cmdtype |= IXGBE_ADVTXD_DCMD_TSE;
+	return cmdtype;
 }
 
 /* Default RS bit threshold values */
@@ -539,14 +614,6 @@ ixgbe_xmit_pkts(void *tx_queue, struct rte_mbuf **tx_pkts,
 	volatile union ixgbe_adv_tx_desc *txd;
 	struct rte_mbuf     *tx_pkt;
 	struct rte_mbuf     *m_seg;
-	union ixgbe_vlan_macip vlan_macip_lens;
-	union {
-		uint16_t u16;
-		struct {
-			uint16_t l3_len:9;
-			uint16_t l2_len:7;
-		};
-	} l2_l3_len;
 	uint64_t buf_dma_addr;
 	uint32_t olinfo_status;
 	uint32_t cmd_type_len;
@@ -560,6 +627,7 @@ ixgbe_xmit_pkts(void *tx_queue, struct rte_mbuf **tx_pkts,
 	uint64_t tx_ol_req;
 	uint32_t ctx = 0;
 	uint32_t new_ctx;
+	union ixgbe_tx_offload tx_offload = { .data = 0 };
 
 	txq = tx_queue;
 	sw_ring = txq->sw_ring;
@@ -587,17 +655,19 @@ ixgbe_xmit_pkts(void *tx_queue, struct rte_mbuf **tx_pkts,
 		ol_flags = tx_pkt->ol_flags;
 
 		/* If hardware offload required */
-		tx_ol_req = ol_flags & (PKT_TX_VLAN_PKT | PKT_TX_IP_CKSUM |
-			PKT_TX_L4_MASK);
+		tx_ol_req = ol_flags &
+			(PKT_TX_VLAN_PKT | PKT_TX_IP_CKSUM | PKT_TX_L4_MASK |
+				PKT_TX_TCP_SEG);
 		if (tx_ol_req) {
-			l2_l3_len.l2_len = tx_pkt->l2_len;
-			l2_l3_len.l3_len = tx_pkt->l3_len;
-			vlan_macip_lens.f.vlan_tci = tx_pkt->vlan_tci;
-			vlan_macip_lens.f.l2_l3_len = l2_l3_len.u16;
+			tx_offload.l2_len = tx_pkt->l2_len;
+			tx_offload.l3_len = tx_pkt->l3_len;
+			tx_offload.l4_len = tx_pkt->l4_len;
+			tx_offload.vlan_tci = tx_pkt->vlan_tci;
+			tx_offload.tso_segsz = tx_pkt->tso_segsz;
 
 			/* If new context need be built or reuse the exist ctx. */
 			ctx = what_advctx_update(txq, tx_ol_req,
-				vlan_macip_lens.data);
+				tx_offload);
 			/* Only allocate context descriptor if required*/
 			new_ctx = (ctx == IXGBE_CTX_NUM);
 			ctx = txq->ctx_curr;
@@ -712,13 +782,26 @@ ixgbe_xmit_pkts(void *tx_queue, struct rte_mbuf **tx_pkts,
 		 */
 		cmd_type_len = IXGBE_ADVTXD_DTYP_DATA |
 			IXGBE_ADVTXD_DCMD_IFCS | IXGBE_ADVTXD_DCMD_DEXT;
-		olinfo_status = (pkt_len << IXGBE_ADVTXD_PAYLEN_SHIFT);
+
 #ifdef RTE_LIBRTE_IEEE1588
 		if (ol_flags & PKT_TX_IEEE1588_TMST)
 			cmd_type_len |= IXGBE_ADVTXD_MAC_1588;
 #endif
 
+		olinfo_status = 0;
 		if (tx_ol_req) {
+
+			if (ol_flags & PKT_TX_TCP_SEG) {
+				/* when TSO is on, paylen in the descriptor is
+				 * not the packet len but the TCP payload len */
+				pkt_len -= (tx_offload.l2_len +
+					tx_offload.l3_len + tx_offload.l4_len);
+
+				/* the pseudo header checksum must be modified:
+				 * it should not include the ip_len */
+				ixgbe_fix_tcp_phdr_cksum(tx_pkt);
+			}
+
 			/*
 			 * Setup the TX Advanced Context Descriptor if required
 			 */
@@ -739,7 +822,7 @@ ixgbe_xmit_pkts(void *tx_queue, struct rte_mbuf **tx_pkts,
 				}
 
 				ixgbe_set_xmit_ctx(txq, ctx_txd, tx_ol_req,
-				    vlan_macip_lens.data);
+					tx_offload);
 
 				txe->last_id = tx_last;
 				tx_id = txe->next_id;
@@ -751,11 +834,13 @@ ixgbe_xmit_pkts(void *tx_queue, struct rte_mbuf **tx_pkts,
 			 * This path will go through
 			 * whatever new/reuse the context descriptor
 			 */
-			cmd_type_len  |= tx_desc_vlan_flags_to_cmdtype(ol_flags);
+			cmd_type_len  |= tx_desc_ol_flags_to_cmdtype(ol_flags);
 			olinfo_status |= tx_desc_cksum_flags_to_olinfo(ol_flags);
 			olinfo_status |= ctx << IXGBE_ADVTXD_IDX_SHIFT;
 		}
 
+		olinfo_status |= (pkt_len << IXGBE_ADVTXD_PAYLEN_SHIFT);
+
 		m_seg = tx_pkt;
 		do {
 			txd = &txr[tx_id];
@@ -3600,9 +3685,10 @@ ixgbe_dev_tx_init(struct rte_eth_dev *dev)
 	PMD_INIT_FUNC_TRACE();
 	hw = IXGBE_DEV_PRIVATE_TO_HW(dev->data->dev_private);
 
-	/* Enable TX CRC (checksum offload requirement) */
+	/* Enable TX CRC (checksum offload requirement) and hw padding
+	 * (TSO requirement) */
 	hlreg0 = IXGBE_READ_REG(hw, IXGBE_HLREG0);
-	hlreg0 |= IXGBE_HLREG0_TXCRCEN;
+	hlreg0 |= (IXGBE_HLREG0_TXCRCEN | IXGBE_HLREG0_TXPADEN);
 	IXGBE_WRITE_REG(hw, IXGBE_HLREG0, hlreg0);
 
 	/* Setup the Base and Length of the Tx Descriptor Rings */
diff --git a/lib/librte_pmd_ixgbe/ixgbe_rxtx.h b/lib/librte_pmd_ixgbe/ixgbe_rxtx.h
index eb89715..13099af 100644
--- a/lib/librte_pmd_ixgbe/ixgbe_rxtx.h
+++ b/lib/librte_pmd_ixgbe/ixgbe_rxtx.h
@@ -145,13 +145,16 @@ enum ixgbe_advctx_num {
 };
 
 /** Offload features */
-union ixgbe_vlan_macip {
-	uint32_t data;
+union ixgbe_tx_offload {
+	uint64_t data;
 	struct {
-		uint16_t l2_l3_len; /**< combined 9-bit l3, 7-bit l2 lengths */
-		uint16_t vlan_tci;
+		uint64_t l2_len:7; /**< L2 (MAC) Header Length. */
+		uint64_t l3_len:9; /**< L3 (IP) Header Length. */
+		uint64_t l4_len:8; /**< L4 (TCP/UDP) Header Length. */
+		uint64_t tso_segsz:16; /**< TCP TSO segment size */
+		uint64_t vlan_tci:16;
 		/**< VLAN Tag Control Identifier (CPU order). */
-	} f;
+	};
 };
 
 /*
@@ -170,8 +173,10 @@ union ixgbe_vlan_macip {
 
 struct ixgbe_advctx_info {
 	uint64_t flags;           /**< ol_flags for context build. */
-	uint32_t cmp_mask;        /**< compare mask for vlan_macip_lens */
-	union ixgbe_vlan_macip vlan_macip_lens; /**< vlan, mac ip length. */
+	/** tx offload: vlan, tso, l2-l3-l4 lengths. */
+	union ixgbe_tx_offload tx_offload;
+	/** compare mask for tx offload. */
+	union ixgbe_tx_offload tx_offload_mask;
 };
 
 /**
-- 
2.1.0


* [dpdk-dev] [PATCH 09/12] testpmd: fix use of offload flags in testpmd
  2014-11-10 15:59 [dpdk-dev] [PATCH 00/12] add TSO support Olivier Matz
                   ` (7 preceding siblings ...)
  2014-11-10 15:59 ` [dpdk-dev] [PATCH 08/12] ixgbe: support " Olivier Matz
@ 2014-11-10 15:59 ` Olivier Matz
  2014-11-10 15:59 ` [dpdk-dev] [PATCH 10/12] testpmd: rework csum forward engine Olivier Matz
                   ` (4 subsequent siblings)
  13 siblings, 0 replies; 112+ messages in thread
From: Olivier Matz @ 2014-11-10 15:59 UTC (permalink / raw)
  To: dev; +Cc: jigsaw

In testpmd the rte_port->tx_ol_flags field was used in two incompatible
ways:
- sometimes with testpmd-specific flags (0xff for checksums, and
  bit 11 for vlan)
- sometimes assigned to m->ol_flags directly, which is wrong for the
  checksum flags

This commit replaces the hardcoded values with named definitions
(TESTPMD_TX_OFFLOAD_*), which are not interchangeable with mbuf flags.
The testpmd forward engines are fixed to use the flags properly.
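
A minimal sketch of the translation step, assuming a placeholder value
for the mbuf flag (the real PKT_TX_VLAN_PKT comes from rte_mbuf.h) and a
hypothetical helper name ex_port_flags_to_mbuf():

```c
#include <assert.h>
#include <stdint.h>

/* Values from testpmd.h in this patch. */
#define TESTPMD_TX_OFFLOAD_IP_CKSUM     0x0001
#define TESTPMD_TX_OFFLOAD_INSERT_VLAN  0x0100

/* Placeholder for the mbuf flag, for illustration only. */
#define EX_PKT_TX_VLAN_PKT (1ULL << 55)

/* Translate per-port testpmd flags into mbuf ol_flags instead of
 * copying them. Only the VLAN flag is translated unconditionally;
 * checksum flags depend on the packet type and are handled by the
 * csum forward engine. */
static inline uint64_t
ex_port_flags_to_mbuf(uint16_t tx_ol_flags)
{
	uint64_t ol_flags = 0;

	if (tx_ol_flags & TESTPMD_TX_OFFLOAD_INSERT_VLAN)
		ol_flags |= EX_PKT_TX_VLAN_PKT;
	return ol_flags;
}

/* A testpmd checksum bit must not leak into the mbuf flags. */
static inline void
ex_check_translation(void)
{
	assert(ex_port_flags_to_mbuf(TESTPMD_TX_OFFLOAD_INSERT_VLAN) ==
		EX_PKT_TX_VLAN_PKT);
	assert(ex_port_flags_to_mbuf(TESTPMD_TX_OFFLOAD_IP_CKSUM) == 0);
}
```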

Signed-off-by: Olivier Matz <olivier.matz@6wind.com>
---
 app/test-pmd/config.c   |  4 ++--
 app/test-pmd/csumonly.c | 40 +++++++++++++++++++++++-----------------
 app/test-pmd/macfwd.c   |  5 ++++-
 app/test-pmd/macswap.c  |  5 ++++-
 app/test-pmd/testpmd.h  | 28 +++++++++++++++++++++-------
 app/test-pmd/txonly.c   |  9 ++++++---
 6 files changed, 60 insertions(+), 31 deletions(-)

diff --git a/app/test-pmd/config.c b/app/test-pmd/config.c
index 9bc08f4..4b6fb91 100644
--- a/app/test-pmd/config.c
+++ b/app/test-pmd/config.c
@@ -1674,7 +1674,7 @@ tx_vlan_set(portid_t port_id, uint16_t vlan_id)
 		return;
 	if (vlan_id_is_invalid(vlan_id))
 		return;
-	ports[port_id].tx_ol_flags |= PKT_TX_VLAN_PKT;
+	ports[port_id].tx_ol_flags |= TESTPMD_TX_OFFLOAD_INSERT_VLAN;
 	ports[port_id].tx_vlan_id = vlan_id;
 }
 
@@ -1683,7 +1683,7 @@ tx_vlan_reset(portid_t port_id)
 {
 	if (port_id_is_invalid(port_id))
 		return;
-	ports[port_id].tx_ol_flags &= ~PKT_TX_VLAN_PKT;
+	ports[port_id].tx_ol_flags &= ~TESTPMD_TX_OFFLOAD_INSERT_VLAN;
 }
 
 void
diff --git a/app/test-pmd/csumonly.c b/app/test-pmd/csumonly.c
index 8d10bfd..743094a 100644
--- a/app/test-pmd/csumonly.c
+++ b/app/test-pmd/csumonly.c
@@ -322,7 +322,7 @@ pkt_burst_checksum_forward(struct fwd_stream *fs)
 			/* Do not delete, this is required by HW*/
 			ipv4_hdr->hdr_checksum = 0;
 
-			if (tx_ol_flags & 0x1) {
+			if (tx_ol_flags & TESTPMD_TX_OFFLOAD_IP_CKSUM) {
 				/* HW checksum */
 				ol_flags |= PKT_TX_IP_CKSUM;
 			}
@@ -336,7 +336,7 @@ pkt_burst_checksum_forward(struct fwd_stream *fs)
 			if (l4_proto == IPPROTO_UDP) {
 				udp_hdr = (struct udp_hdr*) (rte_pktmbuf_mtod(mb,
 						unsigned char *) + l2_len + l3_len);
-				if (tx_ol_flags & 0x2) {
+				if (tx_ol_flags & TESTPMD_TX_OFFLOAD_UDP_CKSUM) {
 					/* HW Offload */
 					ol_flags |= PKT_TX_UDP_CKSUM;
 					if (ipv4_tunnel)
@@ -358,7 +358,7 @@ pkt_burst_checksum_forward(struct fwd_stream *fs)
 					uint16_t len;
 
 					/* Check if inner L3/L4 checkum flag is set */
-					if (tx_ol_flags & 0xF0)
+					if (tx_ol_flags & TESTPMD_TX_OFFLOAD_INNER_CKSUM_MASK)
 						ol_flags |= PKT_TX_VXLAN_CKSUM;
 
 					inner_l2_len  = sizeof(struct ether_hdr);
@@ -381,7 +381,7 @@ pkt_burst_checksum_forward(struct fwd_stream *fs)
 								unsigned char *) + len);
 						inner_l4_proto = inner_ipv4_hdr->next_proto_id;
 
-						if (tx_ol_flags & 0x10) {
+						if (tx_ol_flags & TESTPMD_TX_OFFLOAD_INNER_IP_CKSUM) {
 
 							/* Do not delete, this is required by HW*/
 							inner_ipv4_hdr->hdr_checksum = 0;
@@ -394,7 +394,8 @@ pkt_burst_checksum_forward(struct fwd_stream *fs)
 								unsigned char *) + len);
 						inner_l4_proto = inner_ipv6_hdr->proto;
 					}
-					if ((inner_l4_proto == IPPROTO_UDP) && (tx_ol_flags & 0x20)) {
+					if ((inner_l4_proto == IPPROTO_UDP) &&
+						(tx_ol_flags & TESTPMD_TX_OFFLOAD_INNER_UDP_CKSUM)) {
 
 						/* HW Offload */
 						ol_flags |= PKT_TX_UDP_CKSUM;
@@ -405,7 +406,8 @@ pkt_burst_checksum_forward(struct fwd_stream *fs)
 						else if (eth_type == ETHER_TYPE_IPv6)
 							inner_udp_hdr->dgram_cksum = get_ipv6_psd_sum(inner_ipv6_hdr);
 
-					} else if ((inner_l4_proto == IPPROTO_TCP) && (tx_ol_flags & 0x40)) {
+					} else if ((inner_l4_proto == IPPROTO_TCP) &&
+						(tx_ol_flags & TESTPMD_TX_OFFLOAD_INNER_TCP_CKSUM)) {
 						/* HW Offload */
 						ol_flags |= PKT_TX_TCP_CKSUM;
 						inner_tcp_hdr = (struct tcp_hdr *) (rte_pktmbuf_mtod(mb,
@@ -414,7 +416,8 @@ pkt_burst_checksum_forward(struct fwd_stream *fs)
 							inner_tcp_hdr->cksum = get_ipv4_psd_sum(inner_ipv4_hdr);
 						else if (eth_type == ETHER_TYPE_IPv6)
 							inner_tcp_hdr->cksum = get_ipv6_psd_sum(inner_ipv6_hdr);
-					} else if ((inner_l4_proto == IPPROTO_SCTP) && (tx_ol_flags & 0x80)) {
+					} else if ((inner_l4_proto == IPPROTO_SCTP) &&
+						(tx_ol_flags & TESTPMD_TX_OFFLOAD_INNER_SCTP_CKSUM)) {
 						/* HW Offload */
 						ol_flags |= PKT_TX_SCTP_CKSUM;
 						inner_sctp_hdr = (struct sctp_hdr *) (rte_pktmbuf_mtod(mb,
@@ -427,7 +430,7 @@ pkt_burst_checksum_forward(struct fwd_stream *fs)
 			} else if (l4_proto == IPPROTO_TCP) {
 				tcp_hdr = (struct tcp_hdr*) (rte_pktmbuf_mtod(mb,
 						unsigned char *) + l2_len + l3_len);
-				if (tx_ol_flags & 0x4) {
+				if (tx_ol_flags & TESTPMD_TX_OFFLOAD_TCP_CKSUM) {
 					ol_flags |= PKT_TX_TCP_CKSUM;
 					tcp_hdr->cksum = get_ipv4_psd_sum(ipv4_hdr);
 				}
@@ -440,7 +443,7 @@ pkt_burst_checksum_forward(struct fwd_stream *fs)
 				sctp_hdr = (struct sctp_hdr*) (rte_pktmbuf_mtod(mb,
 						unsigned char *) + l2_len + l3_len);
 
-				if (tx_ol_flags & 0x8) {
+				if (tx_ol_flags & TESTPMD_TX_OFFLOAD_SCTP_CKSUM) {
 					ol_flags |= PKT_TX_SCTP_CKSUM;
 					sctp_hdr->cksum = 0;
 
@@ -465,7 +468,7 @@ pkt_burst_checksum_forward(struct fwd_stream *fs)
 			if (l4_proto == IPPROTO_UDP) {
 				udp_hdr = (struct udp_hdr*) (rte_pktmbuf_mtod(mb,
 						unsigned char *) + l2_len + l3_len);
-				if (tx_ol_flags & 0x2) {
+				if (tx_ol_flags & TESTPMD_TX_OFFLOAD_UDP_CKSUM) {
 					/* HW Offload */
 					ol_flags |= PKT_TX_UDP_CKSUM;
 					if (ipv6_tunnel)
@@ -487,7 +490,7 @@ pkt_burst_checksum_forward(struct fwd_stream *fs)
 					uint16_t len;
 
 					/* Check if inner L3/L4 checksum flag is set */
-					if (tx_ol_flags & 0xF0)
+					if (tx_ol_flags & TESTPMD_TX_OFFLOAD_INNER_CKSUM_MASK)
 						ol_flags |= PKT_TX_VXLAN_CKSUM;
 
 					inner_l2_len  = sizeof(struct ether_hdr);
@@ -511,7 +514,7 @@ pkt_burst_checksum_forward(struct fwd_stream *fs)
 						inner_l4_proto = inner_ipv4_hdr->next_proto_id;
 
 						/* HW offload */
-						if (tx_ol_flags & 0x10) {
+						if (tx_ol_flags & TESTPMD_TX_OFFLOAD_INNER_IP_CKSUM) {
 
 							/* Do not delete, this is required by HW*/
 							inner_ipv4_hdr->hdr_checksum = 0;
@@ -524,7 +527,8 @@ pkt_burst_checksum_forward(struct fwd_stream *fs)
 						inner_l4_proto = inner_ipv6_hdr->proto;
 					}
 
-					if ((inner_l4_proto == IPPROTO_UDP) && (tx_ol_flags & 0x20)) {
+					if ((inner_l4_proto == IPPROTO_UDP) &&
+						(tx_ol_flags & TESTPMD_TX_OFFLOAD_INNER_UDP_CKSUM)) {
 						inner_udp_hdr = (struct udp_hdr *) (rte_pktmbuf_mtod(mb,
 							unsigned char *) + len + inner_l3_len);
 						/* HW offload */
@@ -534,7 +538,8 @@ pkt_burst_checksum_forward(struct fwd_stream *fs)
 							inner_udp_hdr->dgram_cksum = get_ipv4_psd_sum(inner_ipv4_hdr);
 						else if (eth_type == ETHER_TYPE_IPv6)
 							inner_udp_hdr->dgram_cksum = get_ipv6_psd_sum(inner_ipv6_hdr);
-					} else if ((inner_l4_proto == IPPROTO_TCP) && (tx_ol_flags & 0x40)) {
+					} else if ((inner_l4_proto == IPPROTO_TCP) &&
+						(tx_ol_flags & TESTPMD_TX_OFFLOAD_INNER_TCP_CKSUM)) {
 						/* HW offload */
 						ol_flags |= PKT_TX_TCP_CKSUM;
 						inner_tcp_hdr = (struct tcp_hdr *) (rte_pktmbuf_mtod(mb,
@@ -545,7 +550,8 @@ pkt_burst_checksum_forward(struct fwd_stream *fs)
 						else if (eth_type == ETHER_TYPE_IPv6)
 							inner_tcp_hdr->cksum = get_ipv6_psd_sum(inner_ipv6_hdr);
 
-					} else if ((inner_l4_proto == IPPROTO_SCTP) && (tx_ol_flags & 0x80)) {
+					} else if ((inner_l4_proto == IPPROTO_SCTP) &&
+						(tx_ol_flags & TESTPMD_TX_OFFLOAD_INNER_SCTP_CKSUM)) {
 						/* HW offload */
 						ol_flags |= PKT_TX_SCTP_CKSUM;
 						inner_sctp_hdr = (struct sctp_hdr *) (rte_pktmbuf_mtod(mb,
@@ -559,7 +565,7 @@ pkt_burst_checksum_forward(struct fwd_stream *fs)
 			else if (l4_proto == IPPROTO_TCP) {
 				tcp_hdr = (struct tcp_hdr*) (rte_pktmbuf_mtod(mb,
 						unsigned char *) + l2_len + l3_len);
-				if (tx_ol_flags & 0x4) {
+				if (tx_ol_flags & TESTPMD_TX_OFFLOAD_TCP_CKSUM) {
 					ol_flags |= PKT_TX_TCP_CKSUM;
 					tcp_hdr->cksum = get_ipv6_psd_sum(ipv6_hdr);
 				}
@@ -573,7 +579,7 @@ pkt_burst_checksum_forward(struct fwd_stream *fs)
 				sctp_hdr = (struct sctp_hdr*) (rte_pktmbuf_mtod(mb,
 						unsigned char *) + l2_len + l3_len);
 
-				if (tx_ol_flags & 0x8) {
+				if (tx_ol_flags & TESTPMD_TX_OFFLOAD_SCTP_CKSUM) {
 					ol_flags |= PKT_TX_SCTP_CKSUM;
 					sctp_hdr->cksum = 0;
 					/* Sanity check, only number of 4 bytes supported by HW */
diff --git a/app/test-pmd/macfwd.c b/app/test-pmd/macfwd.c
index 38bae23..aa3d705 100644
--- a/app/test-pmd/macfwd.c
+++ b/app/test-pmd/macfwd.c
@@ -85,6 +85,7 @@ pkt_burst_mac_forward(struct fwd_stream *fs)
 	uint16_t nb_rx;
 	uint16_t nb_tx;
 	uint16_t i;
+	uint64_t ol_flags = 0;
 #ifdef RTE_TEST_PMD_RECORD_CORE_CYCLES
 	uint64_t start_tsc;
 	uint64_t end_tsc;
@@ -108,6 +109,8 @@ pkt_burst_mac_forward(struct fwd_stream *fs)
 #endif
 	fs->rx_packets += nb_rx;
 	txp = &ports[fs->tx_port];
+	if (txp->tx_ol_flags & TESTPMD_TX_OFFLOAD_INSERT_VLAN)
+		ol_flags = PKT_TX_VLAN_PKT;
 	for (i = 0; i < nb_rx; i++) {
 		mb = pkts_burst[i];
 		eth_hdr = rte_pktmbuf_mtod(mb, struct ether_hdr *);
@@ -115,7 +118,7 @@ pkt_burst_mac_forward(struct fwd_stream *fs)
 				&eth_hdr->d_addr);
 		ether_addr_copy(&ports[fs->tx_port].eth_addr,
 				&eth_hdr->s_addr);
-		mb->ol_flags = txp->tx_ol_flags;
+		mb->ol_flags = ol_flags;
 		mb->l2_len = sizeof(struct ether_hdr);
 		mb->l3_len = sizeof(struct ipv4_hdr);
 		mb->vlan_tci = txp->tx_vlan_id;
diff --git a/app/test-pmd/macswap.c b/app/test-pmd/macswap.c
index 1786095..ec61657 100644
--- a/app/test-pmd/macswap.c
+++ b/app/test-pmd/macswap.c
@@ -85,6 +85,7 @@ pkt_burst_mac_swap(struct fwd_stream *fs)
 	uint16_t nb_rx;
 	uint16_t nb_tx;
 	uint16_t i;
+	uint64_t ol_flags = 0;
 #ifdef RTE_TEST_PMD_RECORD_CORE_CYCLES
 	uint64_t start_tsc;
 	uint64_t end_tsc;
@@ -108,6 +109,8 @@ pkt_burst_mac_swap(struct fwd_stream *fs)
 #endif
 	fs->rx_packets += nb_rx;
 	txp = &ports[fs->tx_port];
+	if (txp->tx_ol_flags & TESTPMD_TX_OFFLOAD_INSERT_VLAN)
+		ol_flags = PKT_TX_VLAN_PKT;
 	for (i = 0; i < nb_rx; i++) {
 		mb = pkts_burst[i];
 		eth_hdr = rte_pktmbuf_mtod(mb, struct ether_hdr *);
@@ -117,7 +120,7 @@ pkt_burst_mac_swap(struct fwd_stream *fs)
 		ether_addr_copy(&eth_hdr->s_addr, &eth_hdr->d_addr);
 		ether_addr_copy(&addr, &eth_hdr->s_addr);
 
-		mb->ol_flags = txp->tx_ol_flags;
+		mb->ol_flags = ol_flags;
 		mb->l2_len = sizeof(struct ether_hdr);
 		mb->l3_len = sizeof(struct ipv4_hdr);
 		mb->vlan_tci = txp->tx_vlan_id;
diff --git a/app/test-pmd/testpmd.h b/app/test-pmd/testpmd.h
index 9cbfeac..82af2bd 100644
--- a/app/test-pmd/testpmd.h
+++ b/app/test-pmd/testpmd.h
@@ -123,14 +123,28 @@ struct fwd_stream {
 #endif
 };
 
+/** Offload IP checksum in csum forward engine */
+#define TESTPMD_TX_OFFLOAD_IP_CKSUM          0x0001
+/** Offload UDP checksum in csum forward engine */
+#define TESTPMD_TX_OFFLOAD_UDP_CKSUM         0x0002
+/** Offload TCP checksum in csum forward engine */
+#define TESTPMD_TX_OFFLOAD_TCP_CKSUM         0x0004
+/** Offload SCTP checksum in csum forward engine */
+#define TESTPMD_TX_OFFLOAD_SCTP_CKSUM        0x0008
+/** Offload inner IP checksum in csum forward engine */
+#define TESTPMD_TX_OFFLOAD_INNER_IP_CKSUM    0x0010
+/** Offload inner UDP checksum in csum forward engine */
+#define TESTPMD_TX_OFFLOAD_INNER_UDP_CKSUM   0x0020
+/** Offload inner TCP checksum in csum forward engine */
+#define TESTPMD_TX_OFFLOAD_INNER_TCP_CKSUM   0x0040
+/** Offload inner SCTP checksum in csum forward engine */
+#define TESTPMD_TX_OFFLOAD_INNER_SCTP_CKSUM  0x0080
+/** Mask of all inner checksum offload flags */
+#define TESTPMD_TX_OFFLOAD_INNER_CKSUM_MASK  0x00F0
+/** Insert VLAN header in forward engine */
+#define TESTPMD_TX_OFFLOAD_INSERT_VLAN       0x0100
 /**
  * The data structure associated with each port.
- * tx_ol_flags is slightly different from ol_flags of rte_mbuf.
- *   Bit  0: Insert IP checksum
- *   Bit  1: Insert UDP checksum
- *   Bit  2: Insert TCP checksum
- *   Bit  3: Insert SCTP checksum
- *   Bit 11: Insert VLAN Label
  */
 struct rte_port {
 	struct rte_eth_dev_info dev_info;   /**< PCI info + driver name */
@@ -141,7 +155,7 @@ struct rte_port {
 	struct fwd_stream       *rx_stream; /**< Port RX stream, if unique */
 	struct fwd_stream       *tx_stream; /**< Port TX stream, if unique */
 	unsigned int            socket_id;  /**< For NUMA support */
-	uint64_t                tx_ol_flags;/**< Offload Flags of TX packets. */
+	uint16_t                tx_ol_flags;/**< TX Offload Flags (TESTPMD_TX_OFFLOAD...). */
 	uint16_t                tx_vlan_id; /**< Tag Id. in TX VLAN packets. */
 	void                    *fwd_ctx;   /**< Forwarding mode context */
 	uint64_t                rx_bad_ip_csum; /**< rx pkts with bad ip checksum  */
diff --git a/app/test-pmd/txonly.c b/app/test-pmd/txonly.c
index 3d08005..c984670 100644
--- a/app/test-pmd/txonly.c
+++ b/app/test-pmd/txonly.c
@@ -196,6 +196,7 @@ static void
 pkt_burst_transmit(struct fwd_stream *fs)
 {
 	struct rte_mbuf *pkts_burst[MAX_PKT_BURST];
+	struct rte_port *txp;
 	struct rte_mbuf *pkt;
 	struct rte_mbuf *pkt_seg;
 	struct rte_mempool *mbp;
@@ -203,7 +204,7 @@ pkt_burst_transmit(struct fwd_stream *fs)
 	uint16_t nb_tx;
 	uint16_t nb_pkt;
 	uint16_t vlan_tci;
-	uint64_t ol_flags;
+	uint64_t ol_flags = 0;
 	uint8_t  i;
 #ifdef RTE_TEST_PMD_RECORD_CORE_CYCLES
 	uint64_t start_tsc;
@@ -216,8 +217,10 @@ pkt_burst_transmit(struct fwd_stream *fs)
 #endif
 
 	mbp = current_fwd_lcore()->mbp;
-	vlan_tci = ports[fs->tx_port].tx_vlan_id;
-	ol_flags = ports[fs->tx_port].tx_ol_flags;
+	txp = &ports[fs->tx_port];
+	vlan_tci = txp->tx_vlan_id;
+	if (txp->tx_ol_flags & TESTPMD_TX_OFFLOAD_INSERT_VLAN)
+		ol_flags = PKT_TX_VLAN_PKT;
 	for (nb_pkt = 0; nb_pkt < nb_pkt_per_burst; nb_pkt++) {
 		pkt = tx_mbuf_alloc(mbp);
 		if (pkt == NULL) {
-- 
2.1.0


* [dpdk-dev] [PATCH 10/12] testpmd: rework csum forward engine
  2014-11-10 15:59 [dpdk-dev] [PATCH 00/12] add TSO support Olivier Matz
                   ` (8 preceding siblings ...)
  2014-11-10 15:59 ` [dpdk-dev] [PATCH 09/12] testpmd: fix use of offload flags in testpmd Olivier Matz
@ 2014-11-10 15:59 ` Olivier Matz
  2014-11-11  8:35   ` Liu, Jijiang
  2014-11-10 15:59 ` [dpdk-dev] [PATCH 11/12] testpmd: support TSO in " Olivier Matz
                   ` (3 subsequent siblings)
  13 siblings, 1 reply; 112+ messages in thread
From: Olivier Matz @ 2014-11-10 15:59 UTC (permalink / raw)
  To: dev; +Cc: jigsaw

The csum forward engine had become too complex to use and extend (the
next commits add TSO support):

- no explanation of what the code does
- the code is not factorized: a lot of it is duplicated, especially
  between ipv4 and ipv6
- user command line api: it uses bitmasks that the user has to
  calculate
- the user flags don't all have the same semantics:
  - for legacy IP/UDP/TCP/SCTP, a flag selects software or hardware
    checksum
  - for the others (vxlan), it selects hardware checksum or no
    checksum
- the code relies too much on flags set by the driver, with no software
  alternative (e.g. PKT_RX_TUNNEL_IPV4_HDR). It is useful to be able to
  compare a software implementation with the hardware offload.

This commit tries to fix these issues, and provides a simple definition
of what the forward engine does:

 * Receive a burst of packets, and for supported packet types:
 *  - modify the IPs
 *  - reprocess the checksum in SW or HW, depending on testpmd command line
 *    configuration
 * Then packets are transmitted on the output port.
 *
 * Supported packets are:
 *   Ether / (vlan) / IP|IP6 / UDP|TCP|SCTP .
 *   Ether / (vlan) / IP|IP6 / UDP / VxLAN / Ether / IP|IP6 / UDP|TCP|SCTP
 *
 * The network parser supposes that the packet is contiguous, which may
 * not be the case in real life.

Signed-off-by: Olivier Matz <olivier.matz@6wind.com>
---
 app/test-pmd/cmdline.c  | 151 ++++++++---
 app/test-pmd/config.c   |  11 -
 app/test-pmd/csumonly.c | 668 ++++++++++++++++++++++--------------------------
 app/test-pmd/testpmd.h  |  17 +-
 4 files changed, 423 insertions(+), 424 deletions(-)

diff --git a/app/test-pmd/cmdline.c b/app/test-pmd/cmdline.c
index 4c3fc76..0361e58 100644
--- a/app/test-pmd/cmdline.c
+++ b/app/test-pmd/cmdline.c
@@ -310,19 +310,14 @@ static void cmd_help_long_parsed(void *parsed_result,
 			"    Disable hardware insertion of a VLAN header in"
 			" packets sent on a port.\n\n"
 
-			"tx_checksum set (mask) (port_id)\n"
-			"    Enable hardware insertion of checksum offload with"
-			" the 8-bit mask, 0~0xff, in packets sent on a port.\n"
-			"        bit 0 - insert ip   checksum offload if set\n"
-			"        bit 1 - insert udp  checksum offload if set\n"
-			"        bit 2 - insert tcp  checksum offload if set\n"
-			"        bit 3 - insert sctp checksum offload if set\n"
-			"        bit 4 - insert inner ip  checksum offload if set\n"
-			"        bit 5 - insert inner udp checksum offload if set\n"
-			"        bit 6 - insert inner tcp checksum offload if set\n"
-			"        bit 7 - insert inner sctp checksum offload if set\n"
+			"tx_cksum set (ip|udp|tcp|sctp|vxlan) (hw|sw) (port_id)\n"
+			"    Select hardware or software calculation of checksum when"
+			" transmitting a packet using 'csum' forward engine.\n"
 			"    Please check the NIC datasheet for HW limits.\n\n"
 
+			"tx_checksum show (port_id)\n"
+			"    Display tx checksum offload configuration\n\n"
+
 			"set fwd (%s)\n"
 			"    Set packet forwarding mode.\n\n"
 
@@ -2738,48 +2733,131 @@ cmdline_parse_inst_t cmd_tx_vlan_reset = {
 
 
 /* *** ENABLE HARDWARE INSERTION OF CHECKSUM IN TX PACKETS *** */
-struct cmd_tx_cksum_set_result {
+struct cmd_tx_cksum_result {
 	cmdline_fixed_string_t tx_cksum;
-	cmdline_fixed_string_t set;
-	uint8_t cksum_mask;
+	cmdline_fixed_string_t mode;
+	cmdline_fixed_string_t proto;
+	cmdline_fixed_string_t hwsw;
 	uint8_t port_id;
 };
 
 static void
-cmd_tx_cksum_set_parsed(void *parsed_result,
+cmd_tx_cksum_parsed(void *parsed_result,
 		       __attribute__((unused)) struct cmdline *cl,
 		       __attribute__((unused)) void *data)
 {
-	struct cmd_tx_cksum_set_result *res = parsed_result;
+	struct cmd_tx_cksum_result *res = parsed_result;
+	int hw = 0;
+	uint16_t ol_flags, mask = 0;
+	struct rte_eth_dev_info dev_info;
+
+	if (port_id_is_invalid(res->port_id)) {
+		printf("invalid port %d\n", res->port_id);
+		return;
+	}
 
-	tx_cksum_set(res->port_id, res->cksum_mask);
+	if (!strcmp(res->mode, "set")) {
+
+		if (!strcmp(res->hwsw, "hw"))
+			hw = 1;
+
+		if (!strcmp(res->proto, "ip")) {
+			mask = TESTPMD_TX_OFFLOAD_IP_CKSUM;
+		} else if (!strcmp(res->proto, "udp")) {
+			mask = TESTPMD_TX_OFFLOAD_UDP_CKSUM;
+		} else if (!strcmp(res->proto, "tcp")) {
+			mask = TESTPMD_TX_OFFLOAD_TCP_CKSUM;
+		} else if (!strcmp(res->proto, "sctp")) {
+			mask = TESTPMD_TX_OFFLOAD_SCTP_CKSUM;
+		} else if (!strcmp(res->proto, "vxlan")) {
+			mask = TESTPMD_TX_OFFLOAD_VXLAN_CKSUM;
+		}
+
+		if (hw)
+			ports[res->port_id].tx_ol_flags |= mask;
+		else
+			ports[res->port_id].tx_ol_flags &= (~mask);
+	}
+
+	ol_flags = ports[res->port_id].tx_ol_flags;
+	printf("IP checksum offload is %s\n",
+		(ol_flags & TESTPMD_TX_OFFLOAD_IP_CKSUM) ? "hw" : "sw");
+	printf("UDP checksum offload is %s\n",
+		(ol_flags & TESTPMD_TX_OFFLOAD_UDP_CKSUM) ? "hw" : "sw");
+	printf("TCP checksum offload is %s\n",
+		(ol_flags & TESTPMD_TX_OFFLOAD_TCP_CKSUM) ? "hw" : "sw");
+	printf("SCTP checksum offload is %s\n",
+		(ol_flags & TESTPMD_TX_OFFLOAD_SCTP_CKSUM) ? "hw" : "sw");
+	printf("VxLAN checksum offload is %s\n",
+		(ol_flags & TESTPMD_TX_OFFLOAD_VXLAN_CKSUM) ? "hw" : "sw");
+
+	/* display warnings if configuration is not supported by the NIC */
+	rte_eth_dev_info_get(res->port_id, &dev_info);
+	if ((ol_flags & TESTPMD_TX_OFFLOAD_IP_CKSUM) &&
+		(dev_info.tx_offload_capa & DEV_TX_OFFLOAD_IPV4_CKSUM) == 0) {
+		printf("Warning: hardware IP checksum enabled but not "
+			"supported by port %d\n", res->port_id);
+	}
+	if ((ol_flags & TESTPMD_TX_OFFLOAD_UDP_CKSUM) &&
+		(dev_info.tx_offload_capa & DEV_TX_OFFLOAD_UDP_CKSUM) == 0) {
+		printf("Warning: hardware UDP checksum enabled but not "
+			"supported by port %d\n", res->port_id);
+	}
+	if ((ol_flags & TESTPMD_TX_OFFLOAD_TCP_CKSUM) &&
+		(dev_info.tx_offload_capa & DEV_TX_OFFLOAD_TCP_CKSUM) == 0) {
+		printf("Warning: hardware TCP checksum enabled but not "
+			"supported by port %d\n", res->port_id);
+	}
+	if ((ol_flags & TESTPMD_TX_OFFLOAD_SCTP_CKSUM) &&
+		(dev_info.tx_offload_capa & DEV_TX_OFFLOAD_SCTP_CKSUM) == 0) {
+		printf("Warning: hardware SCTP checksum enabled but not "
+			"supported by port %d\n", res->port_id);
+	}
 }
 
-cmdline_parse_token_string_t cmd_tx_cksum_set_tx_cksum =
-	TOKEN_STRING_INITIALIZER(struct cmd_tx_cksum_set_result,
+cmdline_parse_token_string_t cmd_tx_cksum_tx_cksum =
+	TOKEN_STRING_INITIALIZER(struct cmd_tx_cksum_result,
 				tx_cksum, "tx_checksum");
-cmdline_parse_token_string_t cmd_tx_cksum_set_set =
-	TOKEN_STRING_INITIALIZER(struct cmd_tx_cksum_set_result,
-				set, "set");
-cmdline_parse_token_num_t cmd_tx_cksum_set_cksum_mask =
-	TOKEN_NUM_INITIALIZER(struct cmd_tx_cksum_set_result,
-				cksum_mask, UINT8);
-cmdline_parse_token_num_t cmd_tx_cksum_set_portid =
-	TOKEN_NUM_INITIALIZER(struct cmd_tx_cksum_set_result,
+cmdline_parse_token_string_t cmd_tx_cksum_mode =
+	TOKEN_STRING_INITIALIZER(struct cmd_tx_cksum_result,
+				mode, "set");
+cmdline_parse_token_string_t cmd_tx_cksum_proto =
+	TOKEN_STRING_INITIALIZER(struct cmd_tx_cksum_result,
+				proto, "ip#tcp#udp#sctp#vxlan");
+cmdline_parse_token_string_t cmd_tx_cksum_hwsw =
+	TOKEN_STRING_INITIALIZER(struct cmd_tx_cksum_result,
+				hwsw, "hw#sw");
+cmdline_parse_token_num_t cmd_tx_cksum_portid =
+	TOKEN_NUM_INITIALIZER(struct cmd_tx_cksum_result,
 				port_id, UINT8);
 
 cmdline_parse_inst_t cmd_tx_cksum_set = {
-	.f = cmd_tx_cksum_set_parsed,
+	.f = cmd_tx_cksum_parsed,
+	.data = NULL,
+	.help_str = "enable/disable hardware calculation of L3/L4 checksum when "
+		"using csum forward engine: tx_checksum set ip|tcp|udp|sctp|vxlan hw|sw <port>",
+	.tokens = {
+		(void *)&cmd_tx_cksum_tx_cksum,
+		(void *)&cmd_tx_cksum_mode,
+		(void *)&cmd_tx_cksum_proto,
+		(void *)&cmd_tx_cksum_hwsw,
+		(void *)&cmd_tx_cksum_portid,
+		NULL,
+	},
+};
+
+cmdline_parse_token_string_t cmd_tx_cksum_mode_show =
+	TOKEN_STRING_INITIALIZER(struct cmd_tx_cksum_result,
+				mode, "show");
+
+cmdline_parse_inst_t cmd_tx_cksum_show = {
+	.f = cmd_tx_cksum_parsed,
 	.data = NULL,
-	.help_str = "enable hardware insertion of L3/L4checksum with a given "
-	"mask in packets sent on a port, the bit mapping is given as, Bit 0 for ip, "
-	"Bit 1 for UDP, Bit 2 for TCP, Bit 3 for SCTP, Bit 4 for inner ip, "
-	"Bit 5 for inner UDP, Bit 6 for inner TCP, Bit 7 for inner SCTP",
+	.help_str = "show checksum offload configuration: tx_checksum show <port>",
 	.tokens = {
-		(void *)&cmd_tx_cksum_set_tx_cksum,
-		(void *)&cmd_tx_cksum_set_set,
-		(void *)&cmd_tx_cksum_set_cksum_mask,
-		(void *)&cmd_tx_cksum_set_portid,
+		(void *)&cmd_tx_cksum_tx_cksum,
+		(void *)&cmd_tx_cksum_mode_show,
+		(void *)&cmd_tx_cksum_portid,
 		NULL,
 	},
 };
@@ -7796,6 +7874,7 @@ cmdline_parse_ctx_t main_ctx[] = {
 	(cmdline_parse_inst_t *)&cmd_tx_vlan_reset,
 	(cmdline_parse_inst_t *)&cmd_tx_vlan_set_pvid,
 	(cmdline_parse_inst_t *)&cmd_tx_cksum_set,
+	(cmdline_parse_inst_t *)&cmd_tx_cksum_show,
 	(cmdline_parse_inst_t *)&cmd_link_flow_control_set,
 	(cmdline_parse_inst_t *)&cmd_link_flow_control_set_rx,
 	(cmdline_parse_inst_t *)&cmd_link_flow_control_set_tx,
diff --git a/app/test-pmd/config.c b/app/test-pmd/config.c
index 4b6fb91..6b234f6 100644
--- a/app/test-pmd/config.c
+++ b/app/test-pmd/config.c
@@ -1748,17 +1748,6 @@ set_qmap(portid_t port_id, uint8_t is_rx, uint16_t queue_id, uint8_t map_value)
 }
 
 void
-tx_cksum_set(portid_t port_id, uint64_t ol_flags)
-{
-	uint64_t tx_ol_flags;
-	if (port_id_is_invalid(port_id))
-		return;
-	/* Clear last 8 bits and then set L3/4 checksum mask again */
-	tx_ol_flags = ports[port_id].tx_ol_flags & (~0x0FFull);
-	ports[port_id].tx_ol_flags = ((ol_flags & 0xff) | tx_ol_flags);
-}
-
-void
 fdir_add_signature_filter(portid_t port_id, uint8_t queue_id,
 			  struct rte_fdir_filter *fdir_filter)
 {
diff --git a/app/test-pmd/csumonly.c b/app/test-pmd/csumonly.c
index 743094a..abc525c 100644
--- a/app/test-pmd/csumonly.c
+++ b/app/test-pmd/csumonly.c
@@ -73,13 +73,19 @@
 #include <rte_string_fns.h>
 #include "testpmd.h"
 
-
-
 #define IP_DEFTTL  64   /* from RFC 1340. */
 #define IP_VERSION 0x40
 #define IP_HDRLEN  0x05 /* default IP header length == five 32-bits words. */
 #define IP_VHL_DEF (IP_VERSION | IP_HDRLEN)
 
+/* we cannot use htons() from arpa/inet.h due to name conflicts, and we
+ * cannot use rte_cpu_to_be_16() on a constant in a switch/case */
+#if __BYTE_ORDER == __LITTLE_ENDIAN
+#define _htons(x) ((uint16_t)((((x) & 0x00ffU) << 8) | (((x) & 0xff00U) >> 8)))
+#else
+#define _htons(x) (x)
+#endif
+
 static inline uint16_t
 get_16b_sum(uint16_t *ptr16, uint32_t nr)
 {
@@ -112,7 +118,7 @@ get_ipv4_cksum(struct ipv4_hdr *ipv4_hdr)
 
 
 static inline uint16_t
-get_ipv4_psd_sum (struct ipv4_hdr * ip_hdr)
+get_ipv4_psd_sum(struct ipv4_hdr *ip_hdr)
 {
 	/* Pseudo Header for IPv4/UDP/TCP checksum */
 	union ipv4_psd_header {
@@ -136,7 +142,7 @@ get_ipv4_psd_sum (struct ipv4_hdr * ip_hdr)
 }
 
 static inline uint16_t
-get_ipv6_psd_sum (struct ipv6_hdr * ip_hdr)
+get_ipv6_psd_sum(struct ipv6_hdr *ip_hdr)
 {
 	/* Pseudo Header for IPv6/UDP/TCP checksum */
 	union ipv6_psd_header {
@@ -158,6 +164,15 @@ get_ipv6_psd_sum (struct ipv6_hdr * ip_hdr)
 	return get_16b_sum(psd_hdr.u16_arr, sizeof(psd_hdr));
 }
 
+static uint16_t
+get_psd_sum(void *l3_hdr, uint16_t ethertype)
+{
+	if (ethertype == _htons(ETHER_TYPE_IPv4))
+		return get_ipv4_psd_sum(l3_hdr);
+	else /* assume ethertype == ETHER_TYPE_IPv6 */
+		return get_ipv6_psd_sum(l3_hdr);
+}
+
 static inline uint16_t
 get_ipv4_udptcp_checksum(struct ipv4_hdr *ipv4_hdr, uint16_t *l4_hdr)
 {
@@ -174,7 +189,6 @@ get_ipv4_udptcp_checksum(struct ipv4_hdr *ipv4_hdr, uint16_t *l4_hdr)
 	if (cksum == 0)
 		cksum = 0xffff;
 	return (uint16_t)cksum;
-
 }
 
 static inline uint16_t
@@ -196,48 +210,218 @@ get_ipv6_udptcp_checksum(struct ipv6_hdr *ipv6_hdr, uint16_t *l4_hdr)
 	return (uint16_t)cksum;
 }
 
+static uint16_t
+get_udptcp_checksum(void *l3_hdr, void *l4_hdr, uint16_t ethertype)
+{
+	if (ethertype == _htons(ETHER_TYPE_IPv4))
+		return get_ipv4_udptcp_checksum(l3_hdr, l4_hdr);
+	else /* assume ethertype == ETHER_TYPE_IPv6 */
+		return get_ipv6_udptcp_checksum(l3_hdr, l4_hdr);
+}
 
 /*
- * Forwarding of packets. Change the checksum field with HW or SW methods
- * The HW/SW method selection depends on the ol_flags on every packet
+ * Parse an ethernet header to fill the ethertype, l2_len, l3_len and
+ * ipproto. This function is able to recognize IPv4/IPv6 with one optional vlan
+ * header.
+ */
+static void
+parse_ethernet(struct ether_hdr *eth_hdr, uint16_t *ethertype, uint16_t *l2_len,
+	uint16_t *l3_len, uint8_t *l4_proto)
+{
+	struct ipv4_hdr *ipv4_hdr;
+	struct ipv6_hdr *ipv6_hdr;
+
+	*l2_len = sizeof(struct ether_hdr);
+	*ethertype = eth_hdr->ether_type;
+
+	if (*ethertype == _htons(ETHER_TYPE_VLAN)) {
+		struct vlan_hdr *vlan_hdr = (struct vlan_hdr *)(eth_hdr + 1);
+
+		*l2_len += sizeof(struct vlan_hdr);
+		*ethertype = vlan_hdr->eth_proto;
+	}
+
+	switch (*ethertype) {
+	case _htons(ETHER_TYPE_IPv4):
+		ipv4_hdr = (struct ipv4_hdr *) ((char *)eth_hdr + *l2_len);
+		*l3_len = (ipv4_hdr->version_ihl & 0x0f) * 4;
+		*l4_proto = ipv4_hdr->next_proto_id;
+		break;
+	case _htons(ETHER_TYPE_IPv6):
+		ipv6_hdr = (struct ipv6_hdr *) ((char *)eth_hdr + *l2_len);
+		*l3_len = sizeof(struct ipv6_hdr);
+		*l4_proto = ipv6_hdr->proto;
+		break;
+	default:
+		*l3_len = 0;
+		*l4_proto = 0;
+		break;
+	}
+}
+
+/* modify the IPv4 or IPv6 source address of a packet */
+static void
+change_ip_addresses(void *l3_hdr, uint16_t ethertype)
+{
+	struct ipv4_hdr *ipv4_hdr = l3_hdr;
+	struct ipv6_hdr *ipv6_hdr = l3_hdr;
+
+	if (ethertype == _htons(ETHER_TYPE_IPv4)) {
+		ipv4_hdr->src_addr =
+			rte_cpu_to_be_32(rte_be_to_cpu_32(ipv4_hdr->src_addr) + 1);
+	}
+	else if (ethertype == _htons(ETHER_TYPE_IPv6)) {
+		ipv6_hdr->src_addr[15] = ipv6_hdr->src_addr[15] + 1;
+	}
+}
+
+/* if possible, calculate the checksum of a packet in hw or sw,
+ * depending on the testpmd command line configuration */
+static uint64_t
+process_inner_cksums(void *l3_hdr, uint16_t ethertype, uint16_t l3_len,
+	uint8_t l4_proto, uint16_t testpmd_ol_flags)
+{
+	struct ipv4_hdr *ipv4_hdr = l3_hdr;
+	struct udp_hdr *udp_hdr;
+	struct tcp_hdr *tcp_hdr;
+	struct sctp_hdr *sctp_hdr;
+	uint64_t ol_flags = 0;
+
+	if (ethertype == _htons(ETHER_TYPE_IPv4)) {
+		ipv4_hdr = l3_hdr;
+		ipv4_hdr->hdr_checksum = 0;
+
+		if (testpmd_ol_flags & TESTPMD_TX_OFFLOAD_IP_CKSUM)
+			ol_flags |= PKT_TX_IP_CKSUM;
+		else
+			ipv4_hdr->hdr_checksum = get_ipv4_cksum(ipv4_hdr);
+
+	}
+	else if (ethertype != _htons(ETHER_TYPE_IPv6))
+		return 0; /* packet type not supported, nothing to do */
+
+	if (l4_proto == IPPROTO_UDP) {
+		udp_hdr = (struct udp_hdr *)((char *)l3_hdr + l3_len);
+		/* do not recalculate udp cksum if it was 0 */
+		if (udp_hdr->dgram_cksum != 0) {
+			udp_hdr->dgram_cksum = 0;
+			if (testpmd_ol_flags & TESTPMD_TX_OFFLOAD_UDP_CKSUM) {
+				ol_flags |= PKT_TX_UDP_CKSUM;
+				udp_hdr->dgram_cksum = get_psd_sum(l3_hdr,
+					ethertype);
+			}
+			else {
+				udp_hdr->dgram_cksum =
+					get_udptcp_checksum(l3_hdr, udp_hdr,
+						ethertype);
+			}
+		}
+	}
+	else if (l4_proto == IPPROTO_TCP) {
+		tcp_hdr = (struct tcp_hdr *)((char *)l3_hdr + l3_len);
+		tcp_hdr->cksum = 0;
+		if (testpmd_ol_flags & TESTPMD_TX_OFFLOAD_TCP_CKSUM) {
+			ol_flags |= PKT_TX_TCP_CKSUM;
+			tcp_hdr->cksum = get_psd_sum(l3_hdr, ethertype);
+		}
+		else {
+			tcp_hdr->cksum =
+				get_udptcp_checksum(l3_hdr, tcp_hdr, ethertype);
+		}
+	}
+	else if (l4_proto == IPPROTO_SCTP) {
+		sctp_hdr = (struct sctp_hdr *)((char *)l3_hdr + l3_len);
+		sctp_hdr->cksum = 0;
+		/* sctp payload must be a multiple of 4 to be
+		 * offloaded */
+		if ((testpmd_ol_flags & TESTPMD_TX_OFFLOAD_SCTP_CKSUM) &&
+			((rte_be_to_cpu_16(ipv4_hdr->total_length) & 0x3) == 0)) {
+			ol_flags |= PKT_TX_SCTP_CKSUM;
+		}
+		else {
+			/* XXX implement CRC32c, example available in
+			 * RFC3309 */
+		}
+	}
+
+	return ol_flags;
+}
+
+/* Calculate the checksum of outer header (only vxlan is supported,
+ * meaning IP + UDP). The caller already checked that it's a vxlan
+ * packet */
+static uint64_t
+process_outer_cksums(void *outer_l3_hdr, uint16_t outer_ethertype,
+	uint16_t outer_l3_len, uint16_t testpmd_ol_flags)
+{
+	struct ipv4_hdr *ipv4_hdr = outer_l3_hdr;
+	struct ipv6_hdr *ipv6_hdr = outer_l3_hdr;
+	struct udp_hdr *udp_hdr;
+	uint64_t ol_flags = 0;
+
+	if (testpmd_ol_flags & TESTPMD_TX_OFFLOAD_VXLAN_CKSUM)
+		ol_flags |= PKT_TX_IP_CKSUM;
+
+	if (outer_ethertype == _htons(ETHER_TYPE_IPv4)) {
+		ipv4_hdr->hdr_checksum = 0;
+
+		if ((testpmd_ol_flags & TESTPMD_TX_OFFLOAD_VXLAN_CKSUM) == 0)
+			ipv4_hdr->hdr_checksum = get_ipv4_cksum(ipv4_hdr);
+	}
+
+	udp_hdr = (struct udp_hdr *)((char *)outer_l3_hdr + outer_l3_len);
+	/* do not recalculate udp cksum if it was 0 */
+	if (udp_hdr->dgram_cksum != 0) {
+		udp_hdr->dgram_cksum = 0;
+		if ((testpmd_ol_flags & TESTPMD_TX_OFFLOAD_VXLAN_CKSUM) == 0) {
+			if (outer_ethertype == _htons(ETHER_TYPE_IPv4))
+				udp_hdr->dgram_cksum =
+					get_ipv4_udptcp_checksum(ipv4_hdr,
+						(uint16_t *)udp_hdr);
+			else
+				udp_hdr->dgram_cksum =
+					get_ipv6_udptcp_checksum(ipv6_hdr,
+						(uint16_t *)udp_hdr);
+		}
+	}
+
+	return ol_flags;
+}
+
+/*
+ * Receive a burst of packets, and for supported packet types:
+ *  - modify the IPs
+ *  - reprocess the checksum in SW or HW, depending on testpmd command line
+ *    configuration
+ * Then packets are transmitted on the output port.
+ *
+ * Supported packets are:
+ *   Ether / (vlan) / IP|IP6 / UDP|TCP|SCTP .
+ *   Ether / (vlan) / IP|IP6 / UDP / VxLAN / Ether / IP|IP6 / UDP|TCP|SCTP
+ *
+ * The network parser supposes that the packet is contiguous, which may
+ * not be the case in real life.
  */
 static void
 pkt_burst_checksum_forward(struct fwd_stream *fs)
 {
-	struct rte_mbuf  *pkts_burst[MAX_PKT_BURST];
-	struct rte_port  *txp;
-	struct rte_mbuf  *mb;
+	struct rte_mbuf *pkts_burst[MAX_PKT_BURST];
+	struct rte_port *txp;
+	struct rte_mbuf *m;
 	struct ether_hdr *eth_hdr;
-	struct ipv4_hdr  *ipv4_hdr;
-	struct ether_hdr *inner_eth_hdr;
-	struct ipv4_hdr  *inner_ipv4_hdr = NULL;
-	struct ipv6_hdr  *ipv6_hdr;
-	struct ipv6_hdr  *inner_ipv6_hdr = NULL;
-	struct udp_hdr   *udp_hdr;
-	struct udp_hdr   *inner_udp_hdr;
-	struct tcp_hdr   *tcp_hdr;
-	struct tcp_hdr   *inner_tcp_hdr;
-	struct sctp_hdr  *sctp_hdr;
-	struct sctp_hdr  *inner_sctp_hdr;
-
+	void *l3_hdr = NULL, *outer_l3_hdr = NULL; /* can be IPv4 or IPv6 */
+	struct udp_hdr *udp_hdr;
 	uint16_t nb_rx;
 	uint16_t nb_tx;
 	uint16_t i;
 	uint64_t ol_flags;
-	uint64_t pkt_ol_flags;
-	uint64_t tx_ol_flags;
-	uint16_t l4_proto;
-	uint16_t inner_l4_proto = 0;
-	uint16_t eth_type;
-	uint8_t  l2_len;
-	uint8_t  l3_len;
-	uint8_t  inner_l2_len = 0;
-	uint8_t  inner_l3_len = 0;
-
+	uint16_t testpmd_ol_flags;
+	uint8_t l4_proto;
+	uint16_t ethertype = 0, outer_ethertype = 0;
+	uint16_t l2_len = 0, l3_len = 0, outer_l2_len = 0, outer_l3_len = 0;
+	int tunnel = 0;
 	uint32_t rx_bad_ip_csum;
 	uint32_t rx_bad_l4_csum;
-	uint8_t  ipv4_tunnel;
-	uint8_t  ipv6_tunnel;
 
 #ifdef RTE_TEST_PMD_RECORD_CORE_CYCLES
 	uint64_t start_tsc;
@@ -249,9 +433,7 @@ pkt_burst_checksum_forward(struct fwd_stream *fs)
 	start_tsc = rte_rdtsc();
 #endif
 
-	/*
-	 * Receive a burst of packets and forward them.
-	 */
+	/* receive a burst of packets */
 	nb_rx = rte_eth_rx_burst(fs->rx_port, fs->rx_queue, pkts_burst,
 				 nb_pkt_per_burst);
 	if (unlikely(nb_rx == 0))
@@ -265,348 +447,107 @@ pkt_burst_checksum_forward(struct fwd_stream *fs)
 	rx_bad_l4_csum = 0;
 
 	txp = &ports[fs->tx_port];
-	tx_ol_flags = txp->tx_ol_flags;
+	testpmd_ol_flags = txp->tx_ol_flags;
 
 	for (i = 0; i < nb_rx; i++) {
 
-		mb = pkts_burst[i];
-		l2_len  = sizeof(struct ether_hdr);
-		pkt_ol_flags = mb->ol_flags;
-		ol_flags = (pkt_ol_flags & (~PKT_TX_L4_MASK));
-		ipv4_tunnel = (pkt_ol_flags & PKT_RX_TUNNEL_IPV4_HDR) ?
-				1 : 0;
-		ipv6_tunnel = (pkt_ol_flags & PKT_RX_TUNNEL_IPV6_HDR) ?
-				1 : 0;
-		eth_hdr = rte_pktmbuf_mtod(mb, struct ether_hdr *);
-		eth_type = rte_be_to_cpu_16(eth_hdr->ether_type);
-		if (eth_type == ETHER_TYPE_VLAN) {
-			/* Only allow single VLAN label here */
-			l2_len  += sizeof(struct vlan_hdr);
-			 eth_type = rte_be_to_cpu_16(*(uint16_t *)
-				((uintptr_t)&eth_hdr->ether_type +
-				sizeof(struct vlan_hdr)));
+		ol_flags = 0;
+		tunnel = 0;
+		m = pkts_burst[i];
+
+		/* Update the L3/L4 checksum error packet statistics */
+		rx_bad_ip_csum += ((m->ol_flags & PKT_RX_IP_CKSUM_BAD) != 0);
+		rx_bad_l4_csum += ((m->ol_flags & PKT_RX_L4_CKSUM_BAD) != 0);
+
+		/* step 1: dissect packet, parsing optional vlan, ip4/ip6, vxlan
+		 * and inner headers */
+
+		eth_hdr = rte_pktmbuf_mtod(m, struct ether_hdr *);
+		parse_ethernet(eth_hdr, &ethertype, &l2_len, &l3_len, &l4_proto);
+		l3_hdr = (char *)eth_hdr + l2_len;
+
+		/* check if it's a supported tunnel (only vxlan for now) */
+		if (l4_proto == IPPROTO_UDP) {
+			udp_hdr = (struct udp_hdr *)((char *)l3_hdr + l3_len);
+
+			/* currently, this flag is set by i40e only if the
+			 * packet is vxlan */
+			if (((m->ol_flags & PKT_RX_TUNNEL_IPV4_HDR) ||
+					(m->ol_flags & PKT_RX_TUNNEL_IPV6_HDR)))
+				tunnel = 1;
+			/* else check udp destination port, 4789 is the default
+			 * vxlan port (rfc7348) */
+			else if (udp_hdr->dst_port == _htons(4789))
+				tunnel = 1;
+
+			if (tunnel == 1) {
+				outer_ethertype = ethertype;
+				outer_l2_len = l2_len;
+				outer_l3_len = l3_len;
+				outer_l3_hdr = l3_hdr;
+
+				eth_hdr = (struct ether_hdr *)((char *)udp_hdr +
+					sizeof(struct udp_hdr) +
+					sizeof(struct vxlan_hdr));
+
+				parse_ethernet(eth_hdr, &ethertype, &l2_len,
+					&l3_len, &l4_proto);
+				l3_hdr = (char *)eth_hdr + l2_len;
+			}
 		}
 
-		/* Update the L3/L4 checksum error packet count  */
-		rx_bad_ip_csum += (uint16_t) ((pkt_ol_flags & PKT_RX_IP_CKSUM_BAD) != 0);
-		rx_bad_l4_csum += (uint16_t) ((pkt_ol_flags & PKT_RX_L4_CKSUM_BAD) != 0);
-
-		/*
-		 * Try to figure out L3 packet type by SW.
-		 */
-		if ((pkt_ol_flags & (PKT_RX_IPV4_HDR | PKT_RX_IPV4_HDR_EXT |
-				PKT_RX_IPV6_HDR | PKT_RX_IPV6_HDR_EXT)) == 0) {
-			if (eth_type == ETHER_TYPE_IPv4)
-				pkt_ol_flags |= PKT_RX_IPV4_HDR;
-			else if (eth_type == ETHER_TYPE_IPv6)
-				pkt_ol_flags |= PKT_RX_IPV6_HDR;
-		}
+		/* step 2: change all source IPs (v4 or v6) so we need
+		 * to recompute the chksums even if they were correct */
 
-		/*
-		 * Simplify the protocol parsing
-		 * Assuming the incoming packets format as
-		 *      Ethernet2 + optional single VLAN
-		 *      + ipv4 or ipv6
-		 *      + udp or tcp or sctp or others
-		 */
-		if (pkt_ol_flags & (PKT_RX_IPV4_HDR | PKT_RX_TUNNEL_IPV4_HDR)) {
+		change_ip_addresses(l3_hdr, ethertype);
+		if (tunnel == 1)
+			change_ip_addresses(outer_l3_hdr, outer_ethertype);
 
-			/* Do not support ipv4 option field */
-			l3_len = sizeof(struct ipv4_hdr) ;
+		/* step 3: depending on user command line configuration,
+		 * recompute checksum either in software or flag the
+		 * mbuf to offload the calculation to the NIC */
 
-			ipv4_hdr = (struct ipv4_hdr *) (rte_pktmbuf_mtod(mb,
-					unsigned char *) + l2_len);
+		/* process checksums of inner headers first */
+		ol_flags |= process_inner_cksums(l3_hdr, ethertype,
+			l3_len, l4_proto, testpmd_ol_flags);
 
-			l4_proto = ipv4_hdr->next_proto_id;
+		/* Then process outer headers if any. Note that the software
+		 * checksum will be wrong if one of the inner checksums is
+		 * processed in hardware. */
+		if (tunnel == 1) {
+			ol_flags |= process_outer_cksums(outer_l3_hdr,
+				outer_ethertype, outer_l3_len, testpmd_ol_flags);
+		}
 
-			/* Do not delete, this is required by HW*/
-			ipv4_hdr->hdr_checksum = 0;
+		/* step 4: fill the mbuf meta data (flags and header lengths) */
 
-			if (tx_ol_flags & TESTPMD_TX_OFFLOAD_IP_CKSUM) {
-				/* HW checksum */
-				ol_flags |= PKT_TX_IP_CKSUM;
+		if (tunnel == 1) {
+			if (testpmd_ol_flags & TESTPMD_TX_OFFLOAD_VXLAN_CKSUM) {
+				m->l2_len = outer_l2_len;
+				m->l3_len = outer_l3_len;
+				m->inner_l2_len = l2_len;
+				m->inner_l3_len = l3_len;
 			}
 			else {
-				ol_flags |= PKT_TX_IPV4;
-				/* SW checksum calculation */
-				ipv4_hdr->src_addr++;
-				ipv4_hdr->hdr_checksum = get_ipv4_cksum(ipv4_hdr);
+				/* if we don't do vxlan cksum in hw,
+				   outer checksum will be wrong because
+				   we changed the ip, but it shows that
+				   we can process the inner header cksum
+				   in the nic */
+				m->l2_len = outer_l2_len + outer_l3_len +
+					sizeof(struct udp_hdr) +
+					sizeof(struct vxlan_hdr) + l2_len;
+				m->l3_len = l3_len;
 			}
-
-			if (l4_proto == IPPROTO_UDP) {
-				udp_hdr = (struct udp_hdr*) (rte_pktmbuf_mtod(mb,
-						unsigned char *) + l2_len + l3_len);
-				if (tx_ol_flags & TESTPMD_TX_OFFLOAD_UDP_CKSUM) {
-					/* HW Offload */
-					ol_flags |= PKT_TX_UDP_CKSUM;
-					if (ipv4_tunnel)
-						udp_hdr->dgram_cksum = 0;
-					else
-						/* Pseudo header sum need be set properly */
-						udp_hdr->dgram_cksum =
-							get_ipv4_psd_sum(ipv4_hdr);
-				}
-				else {
-					/* SW Implementation, clear checksum field first */
-					udp_hdr->dgram_cksum = 0;
-					udp_hdr->dgram_cksum = get_ipv4_udptcp_checksum(ipv4_hdr,
-									(uint16_t *)udp_hdr);
-				}
-
-				if (ipv4_tunnel) {
-
-					uint16_t len;
-
-					/* Check if inner L3/L4 checkum flag is set */
-					if (tx_ol_flags & TESTPMD_TX_OFFLOAD_INNER_CKSUM_MASK)
-						ol_flags |= PKT_TX_VXLAN_CKSUM;
-
-					inner_l2_len  = sizeof(struct ether_hdr);
-					inner_eth_hdr = (struct ether_hdr *) (rte_pktmbuf_mtod(mb,
-								unsigned char *) + l2_len + l3_len
-								 + ETHER_VXLAN_HLEN);
-
-					eth_type = rte_be_to_cpu_16(inner_eth_hdr->ether_type);
-					if (eth_type == ETHER_TYPE_VLAN) {
-						inner_l2_len += sizeof(struct vlan_hdr);
-						eth_type = rte_be_to_cpu_16(*(uint16_t *)
-							((uintptr_t)&eth_hdr->ether_type +
-								sizeof(struct vlan_hdr)));
-					}
-
-					len = l2_len + l3_len + ETHER_VXLAN_HLEN + inner_l2_len;
-					if (eth_type == ETHER_TYPE_IPv4) {
-						inner_l3_len = sizeof(struct ipv4_hdr);
-						inner_ipv4_hdr = (struct ipv4_hdr *) (rte_pktmbuf_mtod(mb,
-								unsigned char *) + len);
-						inner_l4_proto = inner_ipv4_hdr->next_proto_id;
-
-						if (tx_ol_flags & TESTPMD_TX_OFFLOAD_INNER_IP_CKSUM) {
-
-							/* Do not delete, this is required by HW*/
-							inner_ipv4_hdr->hdr_checksum = 0;
-							ol_flags |= PKT_TX_IPV4_CSUM;
-						}
-
-					} else if (eth_type == ETHER_TYPE_IPv6) {
-						inner_l3_len = sizeof(struct ipv6_hdr);
-						inner_ipv6_hdr = (struct ipv6_hdr *) (rte_pktmbuf_mtod(mb,
-								unsigned char *) + len);
-						inner_l4_proto = inner_ipv6_hdr->proto;
-					}
-					if ((inner_l4_proto == IPPROTO_UDP) &&
-						(tx_ol_flags & TESTPMD_TX_OFFLOAD_INNER_UDP_CKSUM)) {
-
-						/* HW Offload */
-						ol_flags |= PKT_TX_UDP_CKSUM;
-						inner_udp_hdr = (struct udp_hdr *) (rte_pktmbuf_mtod(mb,
-								unsigned char *) + len + inner_l3_len);
-						if (eth_type == ETHER_TYPE_IPv4)
-							inner_udp_hdr->dgram_cksum = get_ipv4_psd_sum(inner_ipv4_hdr);
-						else if (eth_type == ETHER_TYPE_IPv6)
-							inner_udp_hdr->dgram_cksum = get_ipv6_psd_sum(inner_ipv6_hdr);
-
-					} else if ((inner_l4_proto == IPPROTO_TCP) &&
-						(tx_ol_flags & TESTPMD_TX_OFFLOAD_INNER_TCP_CKSUM)) {
-						/* HW Offload */
-						ol_flags |= PKT_TX_TCP_CKSUM;
-						inner_tcp_hdr = (struct tcp_hdr *) (rte_pktmbuf_mtod(mb,
-								unsigned char *) + len + inner_l3_len);
-						if (eth_type == ETHER_TYPE_IPv4)
-							inner_tcp_hdr->cksum = get_ipv4_psd_sum(inner_ipv4_hdr);
-						else if (eth_type == ETHER_TYPE_IPv6)
-							inner_tcp_hdr->cksum = get_ipv6_psd_sum(inner_ipv6_hdr);
-					} else if ((inner_l4_proto == IPPROTO_SCTP) &&
-						(tx_ol_flags & TESTPMD_TX_OFFLOAD_INNER_SCTP_CKSUM)) {
-						/* HW Offload */
-						ol_flags |= PKT_TX_SCTP_CKSUM;
-						inner_sctp_hdr = (struct sctp_hdr *) (rte_pktmbuf_mtod(mb,
-								unsigned char *) + len + inner_l3_len);
-						inner_sctp_hdr->cksum = 0;
-					}
-
-				}
-
-			} else if (l4_proto == IPPROTO_TCP) {
-				tcp_hdr = (struct tcp_hdr*) (rte_pktmbuf_mtod(mb,
-						unsigned char *) + l2_len + l3_len);
-				if (tx_ol_flags & TESTPMD_TX_OFFLOAD_TCP_CKSUM) {
-					ol_flags |= PKT_TX_TCP_CKSUM;
-					tcp_hdr->cksum = get_ipv4_psd_sum(ipv4_hdr);
-				}
-				else {
-					tcp_hdr->cksum = 0;
-					tcp_hdr->cksum = get_ipv4_udptcp_checksum(ipv4_hdr,
-							(uint16_t*)tcp_hdr);
-				}
-			} else if (l4_proto == IPPROTO_SCTP) {
-				sctp_hdr = (struct sctp_hdr*) (rte_pktmbuf_mtod(mb,
-						unsigned char *) + l2_len + l3_len);
-
-				if (tx_ol_flags & TESTPMD_TX_OFFLOAD_SCTP_CKSUM) {
-					ol_flags |= PKT_TX_SCTP_CKSUM;
-					sctp_hdr->cksum = 0;
-
-					/* Sanity check, only number of 4 bytes supported */
-					if ((rte_be_to_cpu_16(ipv4_hdr->total_length) % 4) != 0)
-						printf("sctp payload must be a multiple "
-							"of 4 bytes for checksum offload");
-				}
-				else {
-					sctp_hdr->cksum = 0;
-					/* CRC32c sample code available in RFC3309 */
-				}
-			}
-			/* End of L4 Handling*/
-		} else if (pkt_ol_flags & (PKT_RX_IPV6_HDR | PKT_RX_TUNNEL_IPV6_HDR)) {
-			ipv6_hdr = (struct ipv6_hdr *) (rte_pktmbuf_mtod(mb,
-					unsigned char *) + l2_len);
-			l3_len = sizeof(struct ipv6_hdr) ;
-			l4_proto = ipv6_hdr->proto;
-			ol_flags |= PKT_TX_IPV6;
-
-			if (l4_proto == IPPROTO_UDP) {
-				udp_hdr = (struct udp_hdr*) (rte_pktmbuf_mtod(mb,
-						unsigned char *) + l2_len + l3_len);
-				if (tx_ol_flags & TESTPMD_TX_OFFLOAD_UDP_CKSUM) {
-					/* HW Offload */
-					ol_flags |= PKT_TX_UDP_CKSUM;
-					if (ipv6_tunnel)
-						udp_hdr->dgram_cksum = 0;
-					else
-						udp_hdr->dgram_cksum =
-							get_ipv6_psd_sum(ipv6_hdr);
-				}
-				else {
-					/* SW Implementation */
-					/* checksum field need be clear first */
-					udp_hdr->dgram_cksum = 0;
-					udp_hdr->dgram_cksum = get_ipv6_udptcp_checksum(ipv6_hdr,
-								(uint16_t *)udp_hdr);
-				}
-
-				if (ipv6_tunnel) {
-
-					uint16_t len;
-
-					/* Check if inner L3/L4 checksum flag is set */
-					if (tx_ol_flags & TESTPMD_TX_OFFLOAD_INNER_CKSUM_MASK)
-						ol_flags |= PKT_TX_VXLAN_CKSUM;
-
-					inner_l2_len  = sizeof(struct ether_hdr);
-					inner_eth_hdr = (struct ether_hdr *) (rte_pktmbuf_mtod(mb,
-						unsigned char *) + l2_len + l3_len + ETHER_VXLAN_HLEN);
-					eth_type = rte_be_to_cpu_16(inner_eth_hdr->ether_type);
-
-					if (eth_type == ETHER_TYPE_VLAN) {
-						inner_l2_len += sizeof(struct vlan_hdr);
-						eth_type = rte_be_to_cpu_16(*(uint16_t *)
-							((uintptr_t)&eth_hdr->ether_type +
-							sizeof(struct vlan_hdr)));
-					}
-
-					len = l2_len + l3_len + ETHER_VXLAN_HLEN + inner_l2_len;
-
-					if (eth_type == ETHER_TYPE_IPv4) {
-						inner_l3_len = sizeof(struct ipv4_hdr);
-						inner_ipv4_hdr = (struct ipv4_hdr *) (rte_pktmbuf_mtod(mb,
-								unsigned char *) + len);
-						inner_l4_proto = inner_ipv4_hdr->next_proto_id;
-
-						/* HW offload */
-						if (tx_ol_flags & TESTPMD_TX_OFFLOAD_INNER_IP_CKSUM) {
-
-							/* Do not delete, this is required by HW*/
-							inner_ipv4_hdr->hdr_checksum = 0;
-							ol_flags |= PKT_TX_IPV4_CSUM;
-						}
-					} else if (eth_type == ETHER_TYPE_IPv6) {
-						inner_l3_len = sizeof(struct ipv6_hdr);
-						inner_ipv6_hdr = (struct ipv6_hdr *) (rte_pktmbuf_mtod(mb,
-							unsigned char *) + len);
-						inner_l4_proto = inner_ipv6_hdr->proto;
-					}
-
-					if ((inner_l4_proto == IPPROTO_UDP) &&
-						(tx_ol_flags & TESTPMD_TX_OFFLOAD_INNER_UDP_CKSUM)) {
-						inner_udp_hdr = (struct udp_hdr *) (rte_pktmbuf_mtod(mb,
-							unsigned char *) + len + inner_l3_len);
-						/* HW offload */
-						ol_flags |= PKT_TX_UDP_CKSUM;
-						inner_udp_hdr->dgram_cksum = 0;
-						if (eth_type == ETHER_TYPE_IPv4)
-							inner_udp_hdr->dgram_cksum = get_ipv4_psd_sum(inner_ipv4_hdr);
-						else if (eth_type == ETHER_TYPE_IPv6)
-							inner_udp_hdr->dgram_cksum = get_ipv6_psd_sum(inner_ipv6_hdr);
-					} else if ((inner_l4_proto == IPPROTO_TCP) &&
-						(tx_ol_flags & TESTPMD_TX_OFFLOAD_INNER_TCP_CKSUM)) {
-						/* HW offload */
-						ol_flags |= PKT_TX_TCP_CKSUM;
-						inner_tcp_hdr = (struct tcp_hdr *) (rte_pktmbuf_mtod(mb,
-								unsigned char *) + len + inner_l3_len);
-
-						if (eth_type == ETHER_TYPE_IPv4)
-							inner_tcp_hdr->cksum = get_ipv4_psd_sum(inner_ipv4_hdr);
-						else if (eth_type == ETHER_TYPE_IPv6)
-							inner_tcp_hdr->cksum = get_ipv6_psd_sum(inner_ipv6_hdr);
-
-					} else if ((inner_l4_proto == IPPROTO_SCTP) &&
-						(tx_ol_flags & TESTPMD_TX_OFFLOAD_INNER_SCTP_CKSUM)) {
-						/* HW offload */
-						ol_flags |= PKT_TX_SCTP_CKSUM;
-						inner_sctp_hdr = (struct sctp_hdr *) (rte_pktmbuf_mtod(mb,
-								unsigned char *) + len + inner_l3_len);
-						inner_sctp_hdr->cksum = 0;
-					}
-
-				}
-
-			}
-			else if (l4_proto == IPPROTO_TCP) {
-				tcp_hdr = (struct tcp_hdr*) (rte_pktmbuf_mtod(mb,
-						unsigned char *) + l2_len + l3_len);
-				if (tx_ol_flags & TESTPMD_TX_OFFLOAD_TCP_CKSUM) {
-					ol_flags |= PKT_TX_TCP_CKSUM;
-					tcp_hdr->cksum = get_ipv6_psd_sum(ipv6_hdr);
-				}
-				else {
-					tcp_hdr->cksum = 0;
-					tcp_hdr->cksum = get_ipv6_udptcp_checksum(ipv6_hdr,
-							(uint16_t*)tcp_hdr);
-				}
-			}
-			else if (l4_proto == IPPROTO_SCTP) {
-				sctp_hdr = (struct sctp_hdr*) (rte_pktmbuf_mtod(mb,
-						unsigned char *) + l2_len + l3_len);
-
-				if (tx_ol_flags & TESTPMD_TX_OFFLOAD_SCTP_CKSUM) {
-					ol_flags |= PKT_TX_SCTP_CKSUM;
-					sctp_hdr->cksum = 0;
-					/* Sanity check, only number of 4 bytes supported by HW */
-					if ((rte_be_to_cpu_16(ipv6_hdr->payload_len) % 4) != 0)
-						printf("sctp payload must be a multiple "
-							"of 4 bytes for checksum offload");
-				}
-				else {
-					/* CRC32c sample code available in RFC3309 */
-					sctp_hdr->cksum = 0;
-				}
-			} else {
-				printf("Test flow control for 1G PMD \n");
-			}
-			/* End of L6 Handling*/
-		}
-		else {
-			l3_len = 0;
-			printf("Unhandled packet type: %#hx\n", eth_type);
+		} else {
+			/* this is only useful if an offload flag is
+			 * set, but it does not hurt to fill it in any
+			 * case */
+			m->l2_len = l2_len;
+			m->l3_len = l3_len;
 		}
+		m->ol_flags = ol_flags;
 
-		/* Combine the packet header write. VLAN is not consider here */
-		mb->l2_len = l2_len;
-		mb->l3_len = l3_len;
-		mb->inner_l2_len = inner_l2_len;
-		mb->inner_l3_len = inner_l3_len;
-		mb->ol_flags = ol_flags;
 	}
 	nb_tx = rte_eth_tx_burst(fs->tx_port, fs->tx_queue, pkts_burst, nb_rx);
 	fs->tx_packets += nb_tx;
@@ -629,7 +570,6 @@ pkt_burst_checksum_forward(struct fwd_stream *fs)
 #endif
 }
 
-
 struct fwd_engine csum_fwd_engine = {
 	.fwd_mode_name  = "csum",
 	.port_fwd_begin = NULL,
diff --git a/app/test-pmd/testpmd.h b/app/test-pmd/testpmd.h
index 82af2bd..c753d37 100644
--- a/app/test-pmd/testpmd.h
+++ b/app/test-pmd/testpmd.h
@@ -131,18 +131,11 @@ struct fwd_stream {
 #define TESTPMD_TX_OFFLOAD_TCP_CKSUM         0x0004
 /** Offload SCTP checksum in csum forward engine */
 #define TESTPMD_TX_OFFLOAD_SCTP_CKSUM        0x0008
-/** Offload inner IP checksum in csum forward engine */
-#define TESTPMD_TX_OFFLOAD_INNER_IP_CKSUM    0x0010
-/** Offload inner UDP checksum in csum forward engine */
-#define TESTPMD_TX_OFFLOAD_INNER_UDP_CKSUM   0x0020
-/** Offload inner TCP checksum in csum forward engine */
-#define TESTPMD_TX_OFFLOAD_INNER_TCP_CKSUM   0x0040
-/** Offload inner SCTP checksum in csum forward engine */
-#define TESTPMD_TX_OFFLOAD_INNER_SCTP_CKSUM  0x0080
-/** Offload inner IP checksum mask */
-#define TESTPMD_TX_OFFLOAD_INNER_CKSUM_MASK  0x00F0
+/** Offload VxLAN checksum in csum forward engine */
+#define TESTPMD_TX_OFFLOAD_VXLAN_CKSUM       0x0010
 /** Insert VLAN header in forward engine */
-#define TESTPMD_TX_OFFLOAD_INSERT_VLAN       0x0100
+#define TESTPMD_TX_OFFLOAD_INSERT_VLAN       0x0020
+
 /**
  * The data structure associated with each port.
  */
@@ -510,8 +503,6 @@ void tx_vlan_pvid_set(portid_t port_id, uint16_t vlan_id, int on);
 
 void set_qmap(portid_t port_id, uint8_t is_rx, uint16_t queue_id, uint8_t map_value);
 
-void tx_cksum_set(portid_t port_id, uint64_t ol_flags);
-
 void set_verbose_level(uint16_t vb_level);
 void set_tx_pkt_segments(unsigned *seg_lengths, unsigned nb_segs);
 void set_nb_pkt_per_burst(uint16_t pkt_burst);
-- 
2.1.0

* [dpdk-dev] [PATCH 11/12] testpmd: support TSO in csum forward engine
  2014-11-10 15:59 [dpdk-dev] [PATCH 00/12] add TSO support Olivier Matz
                   ` (9 preceding siblings ...)
  2014-11-10 15:59 ` [dpdk-dev] [PATCH 10/12] testpmd: rework csum forward engine Olivier Matz
@ 2014-11-10 15:59 ` Olivier Matz
  2014-11-10 15:59 ` [dpdk-dev] [PATCH 12/12] testpmd: add a verbose mode " Olivier Matz
                   ` (2 subsequent siblings)
  13 siblings, 0 replies; 112+ messages in thread
From: Olivier Matz @ 2014-11-10 15:59 UTC (permalink / raw)
  To: dev; +Cc: jigsaw

Add two new commands in testpmd:

- tso set <segsize> <portid>
- tso show <portid>

These commands can be used to enable TSO when transmitting TCP packets in
the csum forward engine. Ex:

  set fwd csum
  tx_checksum set ip hw 0
  tso set 800 0
  start

Signed-off-by: Olivier Matz <olivier.matz@6wind.com>
---
 app/test-pmd/cmdline.c  | 92 +++++++++++++++++++++++++++++++++++++++++++++++++
 app/test-pmd/csumonly.c | 53 +++++++++++++++++++++-------
 app/test-pmd/testpmd.h  |  1 +
 3 files changed, 133 insertions(+), 13 deletions(-)
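
For readers following the csumonly.c changes below, here is a minimal
standalone sketch of the two pieces of logic this patch adds: deriving
l4_len from the TCP data offset byte, and choosing the TX flags for an
IPv4/TCP packet. The SKETCH_* names and both helper functions are
hypothetical. The IP and TCP checksum bit positions follow the rte_mbuf.h
definitions quoted elsewhere in this thread, while the TCP_SEG bit position
is only an assumed placeholder.

```c
#include <stdint.h>

/* Illustrative stand-ins for the rte_mbuf.h TX flags (not the real header). */
#define SKETCH_TX_IP_CKSUM  (1ULL << 54)
#define SKETCH_TX_TCP_CKSUM (1ULL << 52)
#define SKETCH_TX_TCP_SEG   (1ULL << 49) /* assumed value, placeholder only */

/*
 * The upper nibble of data_off is the TCP header length in 32-bit words,
 * so ((data_off & 0xf0) >> 4) * 4 simplifies to (data_off & 0xf0) >> 2.
 */
static uint16_t
sketch_tcp_hdr_len(uint8_t data_off)
{
	return (uint16_t)((data_off & 0xf0) >> 2);
}

/*
 * Flags requested for an IPv4/TCP packet. When TSO is enabled, the IP
 * checksum and TCP segmentation are both delegated to the hardware,
 * whatever the per-checksum settings are.
 */
static uint64_t
sketch_ipv4_tcp_tx_flags(uint16_t tso_segsz, int ip_cksum_hw, int tcp_cksum_hw)
{
	uint64_t ol_flags = 0;

	if (tso_segsz != 0)
		return SKETCH_TX_IP_CKSUM | SKETCH_TX_TCP_SEG;

	if (ip_cksum_hw)
		ol_flags |= SKETCH_TX_IP_CKSUM;
	if (tcp_cksum_hw)
		ol_flags |= SKETCH_TX_TCP_CKSUM;
	return ol_flags;
}
```

With a TCP header without options (data_off = 0x50) and tso_segsz = 800,
the helpers give l4_len = 20 and the IP checksum plus TCP segmentation
flags, which is consistent with the verbose output shown in patch 12.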

diff --git a/app/test-pmd/cmdline.c b/app/test-pmd/cmdline.c
index 0361e58..5460415 100644
--- a/app/test-pmd/cmdline.c
+++ b/app/test-pmd/cmdline.c
@@ -318,6 +318,14 @@ static void cmd_help_long_parsed(void *parsed_result,
 			"tx_checksum show (port_id)\n"
 			"    Display tx checksum offload configuration\n\n"
 
+			"tso set (segsize) (portid)\n"
+			"    Enable TCP Segmentation Offload in csum forward"
+			" engine.\n"
+			"    Please check the NIC datasheet for HW limits.\n\n"
+
+			"tso show (portid)\n"
+			"    Display the status of TCP Segmentation Offload.\n\n"
+
 			"set fwd (%s)\n"
 			"    Set packet forwarding mode.\n\n"
 
@@ -2862,6 +2870,88 @@ cmdline_parse_inst_t cmd_tx_cksum_show = {
 	},
 };
 
+/* *** ENABLE HARDWARE SEGMENTATION IN TX PACKETS *** */
+struct cmd_tso_set_result {
+	cmdline_fixed_string_t tso;
+	cmdline_fixed_string_t mode;
+	uint16_t tso_segsz;
+	uint8_t port_id;
+};
+
+static void
+cmd_tso_set_parsed(void *parsed_result,
+		       __attribute__((unused)) struct cmdline *cl,
+		       __attribute__((unused)) void *data)
+{
+	struct cmd_tso_set_result *res = parsed_result;
+	struct rte_eth_dev_info dev_info;
+
+	if (port_id_is_invalid(res->port_id))
+		return;
+
+	if (!strcmp(res->mode, "set"))
+		ports[res->port_id].tso_segsz = res->tso_segsz;
+
+	if (ports[res->port_id].tso_segsz == 0)
+		printf("TSO is disabled\n");
+	else
+		printf("TSO segment size is %d\n",
+			ports[res->port_id].tso_segsz);
+
+	/* display warnings if configuration is not supported by the NIC */
+	rte_eth_dev_info_get(res->port_id, &dev_info);
+	if ((ports[res->port_id].tso_segsz != 0) &&
+		(dev_info.tx_offload_capa & DEV_TX_OFFLOAD_TCP_TSO) == 0) {
+		printf("Warning: TSO enabled but not "
+			"supported by port %d\n", res->port_id);
+	}
+}
+
+cmdline_parse_token_string_t cmd_tso_set_tso =
+	TOKEN_STRING_INITIALIZER(struct cmd_tso_set_result,
+				tso, "tso");
+cmdline_parse_token_string_t cmd_tso_set_mode =
+	TOKEN_STRING_INITIALIZER(struct cmd_tso_set_result,
+				mode, "set");
+cmdline_parse_token_num_t cmd_tso_set_tso_segsz =
+	TOKEN_NUM_INITIALIZER(struct cmd_tso_set_result,
+				tso_segsz, UINT16);
+cmdline_parse_token_num_t cmd_tso_set_portid =
+	TOKEN_NUM_INITIALIZER(struct cmd_tso_set_result,
+				port_id, UINT8);
+
+cmdline_parse_inst_t cmd_tso_set = {
+	.f = cmd_tso_set_parsed,
+	.data = NULL,
+	.help_str = "Set TSO segment size for csum engine (0 to disable): "
+	"tso set <tso_segsz> <port>",
+	.tokens = {
+		(void *)&cmd_tso_set_tso,
+		(void *)&cmd_tso_set_mode,
+		(void *)&cmd_tso_set_tso_segsz,
+		(void *)&cmd_tso_set_portid,
+		NULL,
+	},
+};
+
+cmdline_parse_token_string_t cmd_tso_show_mode =
+	TOKEN_STRING_INITIALIZER(struct cmd_tso_set_result,
+				mode, "show");
+
+
+cmdline_parse_inst_t cmd_tso_show = {
+	.f = cmd_tso_set_parsed,
+	.data = NULL,
+	.help_str = "Show TSO segment size for csum engine: "
+	"tso show <port>",
+	.tokens = {
+		(void *)&cmd_tso_set_tso,
+		(void *)&cmd_tso_show_mode,
+		(void *)&cmd_tso_set_portid,
+		NULL,
+	},
+};
+
 /* *** ENABLE/DISABLE FLUSH ON RX STREAMS *** */
 struct cmd_set_flush_rx {
 	cmdline_fixed_string_t set;
@@ -7875,6 +7965,8 @@ cmdline_parse_ctx_t main_ctx[] = {
 	(cmdline_parse_inst_t *)&cmd_tx_vlan_set_pvid,
 	(cmdline_parse_inst_t *)&cmd_tx_cksum_set,
 	(cmdline_parse_inst_t *)&cmd_tx_cksum_show,
+	(cmdline_parse_inst_t *)&cmd_tso_set,
+	(cmdline_parse_inst_t *)&cmd_tso_show,
 	(cmdline_parse_inst_t *)&cmd_link_flow_control_set,
 	(cmdline_parse_inst_t *)&cmd_link_flow_control_set_rx,
 	(cmdline_parse_inst_t *)&cmd_link_flow_control_set_tx,
diff --git a/app/test-pmd/csumonly.c b/app/test-pmd/csumonly.c
index abc525c..7995ff5 100644
--- a/app/test-pmd/csumonly.c
+++ b/app/test-pmd/csumonly.c
@@ -222,14 +222,15 @@ get_udptcp_checksum(void *l3_hdr, void *l4_hdr, uint16_t ethertype)
 /*
  * Parse an ethernet header to fill the ethertype, l2_len, l3_len and
  * ipproto. This function is able to recognize IPv4/IPv6 with one optional vlan
- * header.
+ * header. The l4_len argument is only set in case of TCP (useful for TSO).
  */
 static void
 parse_ethernet(struct ether_hdr *eth_hdr, uint16_t *ethertype, uint16_t *l2_len,
-	uint16_t *l3_len, uint8_t *l4_proto)
+	uint16_t *l3_len, uint8_t *l4_proto, uint16_t *l4_len)
 {
 	struct ipv4_hdr *ipv4_hdr;
 	struct ipv6_hdr *ipv6_hdr;
+	struct tcp_hdr *tcp_hdr;
 
 	*l2_len = sizeof(struct ether_hdr);
 	*ethertype = eth_hdr->ether_type;
@@ -257,6 +258,14 @@ parse_ethernet(struct ether_hdr *eth_hdr, uint16_t *ethertype, uint16_t *l2_len,
 		*l4_proto = 0;
 		break;
 	}
+
+	if (*l4_proto == IPPROTO_TCP) {
+		tcp_hdr = (struct tcp_hdr *)((char *)eth_hdr +
+			*l2_len + *l3_len);
+		*l4_len = (tcp_hdr->data_off & 0xf0) >> 2;
+	}
+	else
+		*l4_len = 0;
 }
 
 /* modify the IPv4 or IPv4 source address of a packet */
@@ -279,7 +288,7 @@ change_ip_addresses(void *l3_hdr, uint16_t ethertype)
  * depending on the testpmd command line configuration */
 static uint64_t
 process_inner_cksums(void *l3_hdr, uint16_t ethertype, uint16_t l3_len,
-	uint8_t l4_proto, uint16_t testpmd_ol_flags)
+	uint8_t l4_proto, uint16_t tso_segsz, uint16_t testpmd_ol_flags)
 {
 	struct ipv4_hdr *ipv4_hdr = l3_hdr;
 	struct udp_hdr *udp_hdr;
@@ -291,11 +300,15 @@ process_inner_cksums(void *l3_hdr, uint16_t ethertype, uint16_t l3_len,
 		ipv4_hdr = l3_hdr;
 		ipv4_hdr->hdr_checksum = 0;
 
-		if (testpmd_ol_flags & TESTPMD_TX_OFFLOAD_IP_CKSUM)
+		if (tso_segsz != 0 && l4_proto == IPPROTO_TCP) {
 			ol_flags |= PKT_TX_IP_CKSUM;
-		else
-			ipv4_hdr->hdr_checksum = get_ipv4_cksum(ipv4_hdr);
-
+		}
+		else {
+			if (testpmd_ol_flags & TESTPMD_TX_OFFLOAD_IP_CKSUM)
+				ol_flags |= PKT_TX_IP_CKSUM;
+			else
+				ipv4_hdr->hdr_checksum = get_ipv4_cksum(ipv4_hdr);
+		}
 	}
 	else if (ethertype != _htons(ETHER_TYPE_IPv6))
 		return 0; /* packet type not supported nothing to do */
@@ -320,7 +333,11 @@ process_inner_cksums(void *l3_hdr, uint16_t ethertype, uint16_t l3_len,
 	else if (l4_proto == IPPROTO_TCP) {
 		tcp_hdr = (struct tcp_hdr *)((char *)l3_hdr + l3_len);
 		tcp_hdr->cksum = 0;
-		if (testpmd_ol_flags & TESTPMD_TX_OFFLOAD_TCP_CKSUM) {
+		if (tso_segsz != 0) {
+			ol_flags |= PKT_TX_TCP_SEG;
+			tcp_hdr->cksum = get_psd_sum(l3_hdr, ethertype);
+		}
+		else if (testpmd_ol_flags & TESTPMD_TX_OFFLOAD_TCP_CKSUM) {
 			ol_flags |= PKT_TX_TCP_CKSUM;
 			tcp_hdr->cksum = get_psd_sum(l3_hdr, ethertype);
 		}
@@ -393,6 +410,8 @@ process_outer_cksums(void *outer_l3_hdr, uint16_t outer_ethertype,
  *  - modify the IPs
  *  - reprocess the checksum in SW or HW, depending on testpmd command line
  *    configuration
+ *  - if TSO is enabled in testpmd command line, also flag the mbuf for TCP
+ *    segmentation offload (this implies HW checksum)
  * Then packets are transmitted on the output port.
  *
  * Supported packets are:
@@ -418,7 +437,9 @@ pkt_burst_checksum_forward(struct fwd_stream *fs)
 	uint16_t testpmd_ol_flags;
 	uint8_t l4_proto;
 	uint16_t ethertype = 0, outer_ethertype = 0;
-	uint16_t  l2_len = 0, l3_len = 0, outer_l2_len = 0, outer_l3_len = 0;
+	uint16_t l2_len = 0, l3_len = 0, l4_len = 0;
+	uint16_t outer_l2_len = 0, outer_l3_len = 0;
+	uint16_t tso_segsz;
 	int tunnel = 0;
 	uint32_t rx_bad_ip_csum;
 	uint32_t rx_bad_l4_csum;
@@ -448,6 +469,7 @@ pkt_burst_checksum_forward(struct fwd_stream *fs)
 
 	txp = &ports[fs->tx_port];
 	testpmd_ol_flags = txp->tx_ol_flags;
+	tso_segsz = txp->tso_segsz;
 
 	for (i = 0; i < nb_rx; i++) {
 
@@ -463,7 +485,8 @@ pkt_burst_checksum_forward(struct fwd_stream *fs)
 		 * and inner headers */
 
 		eth_hdr = rte_pktmbuf_mtod(m, struct ether_hdr *);
-		parse_ethernet(eth_hdr, &ethertype, &l2_len, &l3_len, &l4_proto);
+		parse_ethernet(eth_hdr, &ethertype, &l2_len, &l3_len,
+			&l4_proto, &l4_len);
 		l3_hdr = (char *)eth_hdr + l2_len;
 
 		/* check if it's a supported tunnel (only vxlan for now) */
@@ -491,7 +514,7 @@ pkt_burst_checksum_forward(struct fwd_stream *fs)
 					sizeof(struct vxlan_hdr));
 
 				parse_ethernet(eth_hdr, &ethertype, &l2_len,
-					&l3_len, &l4_proto);
+					&l3_len, &l4_proto, &l4_len);
 				l3_hdr = (char *)eth_hdr + l2_len;
 			}
 		}
@@ -505,11 +528,12 @@ pkt_burst_checksum_forward(struct fwd_stream *fs)
 
 		/* step 3: depending on user command line configuration,
 		 * recompute checksum either in software or flag the
-		 * mbuf to offload the calculation to the NIC */
+		 * mbuf to offload the calculation to the NIC. If TSO
+		 * is configured, prepare the mbuf for TCP segmentation. */
 
 		/* process checksums of inner headers first */
 		ol_flags |= process_inner_cksums(l3_hdr, ethertype,
-			l3_len, l4_proto, testpmd_ol_flags);
+			l3_len, l4_proto, tso_segsz, testpmd_ol_flags);
 
 		/* Then process outer headers if any. Note that the software
 		 * checksum will be wrong if one of the inner checksums is
@@ -538,6 +562,7 @@ pkt_burst_checksum_forward(struct fwd_stream *fs)
 					sizeof(struct udp_hdr) +
 					sizeof(struct vxlan_hdr) + l2_len;
 				m->l3_len = l3_len;
+				m->l4_len = l4_len;
 			}
 		} else {
 			/* this is only useful if an offload flag is
@@ -545,7 +570,9 @@ pkt_burst_checksum_forward(struct fwd_stream *fs)
 			 * case */
 			m->l2_len = l2_len;
 			m->l3_len = l3_len;
+			m->l4_len = l4_len;
 		}
+		m->tso_segsz = tso_segsz;
 		m->ol_flags = ol_flags;
 
 	}
diff --git a/app/test-pmd/testpmd.h b/app/test-pmd/testpmd.h
index c753d37..c22863f 100644
--- a/app/test-pmd/testpmd.h
+++ b/app/test-pmd/testpmd.h
@@ -149,6 +149,7 @@ struct rte_port {
 	struct fwd_stream       *tx_stream; /**< Port TX stream, if unique */
 	unsigned int            socket_id;  /**< For NUMA support */
 	uint16_t                tx_ol_flags;/**< TX Offload Flags (TESTPMD_TX_OFFLOAD...). */
+	uint16_t                tso_segsz;  /**< MSS for segmentation offload. */
 	uint16_t                tx_vlan_id; /**< Tag Id. in TX VLAN packets. */
 	void                    *fwd_ctx;   /**< Forwarding mode context */
 	uint64_t                rx_bad_ip_csum; /**< rx pkts with bad ip checksum  */
-- 
2.1.0

* [dpdk-dev] [PATCH 12/12] testpmd: add a verbose mode csum forward engine
  2014-11-10 15:59 [dpdk-dev] [PATCH 00/12] add TSO support Olivier Matz
                   ` (10 preceding siblings ...)
  2014-11-10 15:59 ` [dpdk-dev] [PATCH 11/12] testpmd: support TSO in " Olivier Matz
@ 2014-11-10 15:59 ` Olivier Matz
  2014-11-11  9:21 ` [dpdk-dev] [PATCH 00/12] add TSO support Olivier MATZ
  2014-11-14 17:03 ` [dpdk-dev] [PATCH v2 00/13] " Olivier Matz
  13 siblings, 0 replies; 112+ messages in thread
From: Olivier Matz @ 2014-11-10 15:59 UTC (permalink / raw)
  To: dev; +Cc: jigsaw

If the user specifies 'set verbose 1' on the testpmd command line,
the csum forward engine will dump some information about received
and transmitted packets, especially which flags are set and what
values are assigned to l2_len, l3_len, l4_len and tso_segsz.

This can help someone implementing TSO or hardware checksum offload to
understand how to configure the mbufs.

Example of output for one packet:

 --------------
 rx: l2_len=14 ethertype=800 l3_len=20 l4_proto=6 l4_len=20
 tx: m->l2_len=14 m->l3_len=20 m->l4_len=20
 tx: m->tso_segsz=800
 tx: flags=PKT_TX_IP_CKSUM PKT_TX_TCP_SEG
 --------------

Signed-off-by: Olivier Matz <olivier.matz@6wind.com>
---
 app/test-pmd/csumonly.c | 51 +++++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 51 insertions(+)
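
The tx_flags table in the diff below pairs each flag with a mask because
the L4 checksum flags are values of a two-bit field (bits 52 and 53), not
independent bits. A minimal sketch of that comparison, using hypothetical
SKETCH_* copies of the values quoted from rte_mbuf.h in this thread (the
helper name is also an assumption):

```c
#include <stdint.h>

/* Illustrative copies of the L4 checksum field values (bits 52 and 53). */
#define SKETCH_TX_TCP_CKSUM  (1ULL << 52)
#define SKETCH_TX_SCTP_CKSUM (2ULL << 52)
#define SKETCH_TX_UDP_CKSUM  (3ULL << 52)
#define SKETCH_TX_L4_MASK    (3ULL << 52)

/*
 * Return non-zero if 'flag' is requested in ol_flags. The field is first
 * isolated with 'mask', then compared for equality, so a UDP request
 * (binary 11) is not mistaken for a TCP request (binary 01).
 */
static int
sketch_tx_flag_is_set(uint64_t ol_flags, uint64_t flag, uint64_t mask)
{
	return (ol_flags & mask) == flag;
}
```

A plain (ol_flags & SKETCH_TX_TCP_CKSUM) test would be non-zero for a UDP
checksum request as well, which is why single-bit flags use themselves as
the mask while the L4 flags share the L4 mask.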

diff --git a/app/test-pmd/csumonly.c b/app/test-pmd/csumonly.c
index 7995ff5..74521d4 100644
--- a/app/test-pmd/csumonly.c
+++ b/app/test-pmd/csumonly.c
@@ -575,6 +575,57 @@ pkt_burst_checksum_forward(struct fwd_stream *fs)
 		m->tso_segsz = tso_segsz;
 		m->ol_flags = ol_flags;
 
+		/* if verbose mode is enabled, dump debug info */
+		if (verbose_level > 0) {
+			struct {
+				uint64_t flag;
+				uint64_t mask;
+			} tx_flags[] = {
+				{ PKT_TX_IP_CKSUM, PKT_TX_IP_CKSUM },
+				{ PKT_TX_UDP_CKSUM, PKT_TX_L4_MASK },
+				{ PKT_TX_TCP_CKSUM, PKT_TX_L4_MASK },
+				{ PKT_TX_SCTP_CKSUM, PKT_TX_L4_MASK },
+				{ PKT_TX_VXLAN_CKSUM, PKT_TX_VXLAN_CKSUM },
+				{ PKT_TX_TCP_SEG, PKT_TX_TCP_SEG },
+			};
+			unsigned j;
+			const char *name;
+
+			printf("-----------------\n");
+			/* dump rx parsed packet info */
+			printf("rx: l2_len=%d ethertype=%x l3_len=%d "
+				"l4_proto=%d l4_len=%d\n",
+				l2_len, rte_be_to_cpu_16(ethertype),
+				l3_len, l4_proto, l4_len);
+			if (tunnel == 1)
+				printf("rx: outer_l2_len=%d outer_ethertype=%x "
+					"outer_l3_len=%d\n", outer_l2_len,
+					rte_be_to_cpu_16(outer_ethertype),
+					outer_l3_len);
+			/* dump tx packet info */
+			if ((testpmd_ol_flags & (TESTPMD_TX_OFFLOAD_IP_CKSUM |
+						TESTPMD_TX_OFFLOAD_UDP_CKSUM |
+						TESTPMD_TX_OFFLOAD_TCP_CKSUM |
+						TESTPMD_TX_OFFLOAD_SCTP_CKSUM)) ||
+				tso_segsz != 0)
+				printf("tx: m->l2_len=%d m->l3_len=%d "
+					"m->l4_len=%d\n",
+					m->l2_len, m->l3_len, m->l4_len);
+			if ((tunnel == 1) &&
+				(testpmd_ol_flags & TESTPMD_TX_OFFLOAD_VXLAN_CKSUM))
+				printf("tx: m->inner_l2_len=%d m->inner_l3_len=%d\n",
+					m->inner_l2_len, m->inner_l3_len);
+			if (tso_segsz != 0)
+				printf("tx: m->tso_segsz=%d\n", m->tso_segsz);
+			printf("tx: flags=");
+			for (j = 0; j < sizeof(tx_flags)/sizeof(*tx_flags); j++) {
+				name = rte_get_tx_ol_flag_name(tx_flags[j].flag);
+				if ((m->ol_flags & tx_flags[j].mask) ==
+					tx_flags[j].flag)
+					printf("%s ", name);
+			}
+			printf("\n");
+		}
 	}
 	nb_tx = rte_eth_tx_burst(fs->tx_port, fs->tx_queue, pkts_burst, nb_rx);
 	fs->tx_packets += nb_tx;
-- 
2.1.0

* Re: [dpdk-dev] [PATCH 02/12] ixgbe: fix remaining pkt_flags variable size to 64 bits
  2014-11-10 15:59 ` [dpdk-dev] [PATCH 02/12] ixgbe: fix remaining pkt_flags variable size to 64 bits Olivier Matz
@ 2014-11-10 16:59   ` Bruce Richardson
  0 siblings, 0 replies; 112+ messages in thread
From: Bruce Richardson @ 2014-11-10 16:59 UTC (permalink / raw)
  To: Olivier Matz; +Cc: jigsaw, dev

On Mon, Nov 10, 2014 at 04:59:16PM +0100, Olivier Matz wrote:
> Since commit 4332beee9 "mbuf: expand ol_flags field to 64-bits", the
> packet flags are now 64 bits wide. Some occurrences were forgotten in
> the ixgbe driver.
> 
> Signed-off-by: Olivier Matz <olivier.matz@6wind.com>

Acked-by: Bruce Richardson <bruce.richardson@intel.com>

> ---
>  lib/librte_pmd_ixgbe/ixgbe_rxtx.c | 12 ++++++------
>  1 file changed, 6 insertions(+), 6 deletions(-)
> 
> diff --git a/lib/librte_pmd_ixgbe/ixgbe_rxtx.c b/lib/librte_pmd_ixgbe/ixgbe_rxtx.c
> index 78be7e6..042ee8a 100644
> --- a/lib/librte_pmd_ixgbe/ixgbe_rxtx.c
> +++ b/lib/librte_pmd_ixgbe/ixgbe_rxtx.c
> @@ -817,7 +817,7 @@ end_of_tx:
>  static inline uint64_t
>  rx_desc_hlen_type_rss_to_pkt_flags(uint32_t hl_tp_rs)
>  {
> -	uint16_t pkt_flags;
> +	uint64_t pkt_flags;
>  
>  	static uint64_t ip_pkt_types_map[16] = {
>  		0, PKT_RX_IPV4_HDR, PKT_RX_IPV4_HDR_EXT, PKT_RX_IPV4_HDR_EXT,
> @@ -834,7 +834,7 @@ rx_desc_hlen_type_rss_to_pkt_flags(uint32_t hl_tp_rs)
>  	};
>  
>  #ifdef RTE_LIBRTE_IEEE1588
> -	static uint32_t ip_pkt_etqf_map[8] = {
> +	static uint64_t ip_pkt_etqf_map[8] = {
>  		0, 0, 0, PKT_RX_IEEE1588_PTP,
>  		0, 0, 0, 0,
>  	};
> @@ -903,7 +903,7 @@ ixgbe_rx_scan_hw_ring(struct igb_rx_queue *rxq)
>  	struct igb_rx_entry *rxep;
>  	struct rte_mbuf *mb;
>  	uint16_t pkt_len;
> -	uint16_t pkt_flags;
> +	uint64_t pkt_flags;
>  	int s[LOOK_AHEAD], nb_dd;
>  	int i, j, nb_rx = 0;
>  
> @@ -1335,7 +1335,7 @@ ixgbe_recv_scattered_pkts(void *rx_queue, struct rte_mbuf **rx_pkts,
>  	uint16_t nb_rx;
>  	uint16_t nb_hold;
>  	uint16_t data_len;
> -	uint16_t pkt_flags;
> +	uint64_t pkt_flags;
>  
>  	nb_rx = 0;
>  	nb_hold = 0;
> @@ -1511,9 +1511,9 @@ ixgbe_recv_scattered_pkts(void *rx_queue, struct rte_mbuf **rx_pkts,
>  		first_seg->vlan_tci = rte_le_to_cpu_16(rxd.wb.upper.vlan);
>  		hlen_type_rss = rte_le_to_cpu_32(rxd.wb.lower.lo_dword.data);
>  		pkt_flags = rx_desc_hlen_type_rss_to_pkt_flags(hlen_type_rss);
> -		pkt_flags = (uint16_t)(pkt_flags |
> +		pkt_flags = (pkt_flags |
>  				rx_desc_status_to_pkt_flags(staterr));
> -		pkt_flags = (uint16_t)(pkt_flags |
> +		pkt_flags = (pkt_flags |
>  				rx_desc_error_to_pkt_flags(staterr));
>  		first_seg->ol_flags = pkt_flags;
>  
> -- 
> 2.1.0
> 

* Re: [dpdk-dev] [PATCH 03/12] mbuf: move vxlan_cksum flag definition at the proper place
  2014-11-10 15:59 ` [dpdk-dev] [PATCH 03/12] mbuf: move vxlan_cksum flag definition at the proper place Olivier Matz
@ 2014-11-10 17:09   ` Bruce Richardson
  0 siblings, 0 replies; 112+ messages in thread
From: Bruce Richardson @ 2014-11-10 17:09 UTC (permalink / raw)
  To: Olivier Matz; +Cc: jigsaw, dev

On Mon, Nov 10, 2014 at 04:59:17PM +0100, Olivier Matz wrote:
> The tx mbuf flags are ordered from the highest value to the
> lowest. Move PKT_TX_VXLAN_CKSUM to the right place.
> 
> Signed-off-by: Olivier Matz <olivier.matz@6wind.com>

Acked-by: Bruce Richardson <bruce.richardson@intel.com>

> ---
>  lib/librte_mbuf/rte_mbuf.h | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
> 
> diff --git a/lib/librte_mbuf/rte_mbuf.h b/lib/librte_mbuf/rte_mbuf.h
> index e8f9bfc..be15168 100644
> --- a/lib/librte_mbuf/rte_mbuf.h
> +++ b/lib/librte_mbuf/rte_mbuf.h
> @@ -96,7 +96,6 @@ extern "C" {
>  
>  #define PKT_TX_VLAN_PKT      (1ULL << 55) /**< TX packet is a 802.1q VLAN packet. */
>  #define PKT_TX_IP_CKSUM      (1ULL << 54) /**< IP cksum of TX pkt. computed by NIC. */
> -#define PKT_TX_VXLAN_CKSUM   (1ULL << 50) /**< TX checksum of VXLAN computed by NIC */
>  #define PKT_TX_IPV4_CSUM     PKT_TX_IP_CKSUM /**< Alias of PKT_TX_IP_CKSUM. */
>  #define PKT_TX_IPV4          PKT_RX_IPV4_HDR /**< IPv4 with no IP checksum offload. */
>  #define PKT_TX_IPV6          PKT_RX_IPV6_HDR /**< IPv6 packet */
> @@ -114,9 +113,10 @@ extern "C" {
>  #define PKT_TX_UDP_CKSUM     (3ULL << 52) /**< UDP cksum of TX pkt. computed by NIC. */
>  #define PKT_TX_L4_MASK       (3ULL << 52) /**< Mask for L4 cksum offload request. */
>  
> -/* Bit 51 - IEEE1588*/
>  #define PKT_TX_IEEE1588_TMST (1ULL << 51) /**< TX IEEE1588 packet to timestamp. */
>  
> +#define PKT_TX_VXLAN_CKSUM   (1ULL << 50) /**< TX checksum of VXLAN computed by NIC */
> +
>  /* Use final bit of flags to indicate a control mbuf */
>  #define CTRL_MBUF_FLAG       (1ULL << 63) /**< Mbuf contains control data */
>  
> -- 
> 2.1.0
> 

* Re: [dpdk-dev] [PATCH 04/12] mbuf: add help about TX checksum flags
  2014-11-10 15:59 ` [dpdk-dev] [PATCH 04/12] mbuf: add help about TX checksum flags Olivier Matz
@ 2014-11-10 17:10   ` Bruce Richardson
  0 siblings, 0 replies; 112+ messages in thread
From: Bruce Richardson @ 2014-11-10 17:10 UTC (permalink / raw)
  To: Olivier Matz; +Cc: jigsaw, dev

On Mon, Nov 10, 2014 at 04:59:18PM +0100, Olivier Matz wrote:
> Describe how to use hardware checksum API.
> 
> Signed-off-by: Olivier Matz <olivier.matz@6wind.com>

Acked-by: Bruce Richardson <bruce.richardson@intel.com>

> ---
>  lib/librte_mbuf/rte_mbuf.h | 25 +++++++++++++++++--------
>  1 file changed, 17 insertions(+), 8 deletions(-)
> 
> diff --git a/lib/librte_mbuf/rte_mbuf.h b/lib/librte_mbuf/rte_mbuf.h
> index be15168..96e322b 100644
> --- a/lib/librte_mbuf/rte_mbuf.h
> +++ b/lib/librte_mbuf/rte_mbuf.h
> @@ -95,19 +95,28 @@ extern "C" {
>  #define PKT_RX_TUNNEL_IPV6_HDR (1ULL << 12) /**< RX tunnel packet with IPv6 header. */
>  
>  #define PKT_TX_VLAN_PKT      (1ULL << 55) /**< TX packet is a 802.1q VLAN packet. */
> -#define PKT_TX_IP_CKSUM      (1ULL << 54) /**< IP cksum of TX pkt. computed by NIC. */
> +
> +/**
> + * Enable hardware computation of IP cksum. To use it:
> + *  - fill l2_len and l3_len in mbuf
> + *  - set the flags PKT_TX_IP_CKSUM
> + *  - set the ip checksum to 0 in IP header
> + */
> +#define PKT_TX_IP_CKSUM      (1ULL << 54)
>  #define PKT_TX_IPV4_CSUM     PKT_TX_IP_CKSUM /**< Alias of PKT_TX_IP_CKSUM. */
>  #define PKT_TX_IPV4          PKT_RX_IPV4_HDR /**< IPv4 with no IP checksum offload. */
>  #define PKT_TX_IPV6          PKT_RX_IPV6_HDR /**< IPv6 packet */
>  
> -/*
> - * Bits 52+53 used for L4 packet type with checksum enabled.
> - *     00: Reserved
> - *     01: TCP checksum
> - *     10: SCTP checksum
> - *     11: UDP checksum
> +/**
> + * Bits 52+53 used for L4 packet type with checksum enabled: 00: Reserved,
> + * 01: TCP checksum, 10: SCTP checksum, 11: UDP checksum. To use hardware
> + * L4 checksum offload, the user needs to:
> + *  - fill l2_len and l3_len in mbuf
> + *  - set the flags PKT_TX_TCP_CKSUM, PKT_TX_SCTP_CKSUM or PKT_TX_UDP_CKSUM
> + *  - calculate the pseudo header checksum and set it in the L4 header (only
> + *    for TCP or UDP). For SCTP, set the crc field to 0.
>   */
> -#define PKT_TX_L4_NO_CKSUM   (0ULL << 52) /**< Disable L4 cksum of TX pkt. */
> +#define PKT_TX_L4_NO_CKSUM   (0ULL << 52) /* Disable L4 cksum of TX pkt. */
>  #define PKT_TX_TCP_CKSUM     (1ULL << 52) /**< TCP cksum of TX pkt. computed by NIC. */
>  #define PKT_TX_SCTP_CKSUM    (2ULL << 52) /**< SCTP cksum of TX pkt. computed by NIC. */
>  #define PKT_TX_UDP_CKSUM     (3ULL << 52) /**< UDP cksum of TX pkt. computed by NIC. */
> -- 
> 2.1.0
> 

* Re: [dpdk-dev] [PATCH 05/12] mbuf: remove too specific PKT_TX_OFFLOAD_MASK definition
  2014-11-10 15:59 ` [dpdk-dev] [PATCH 05/12] mbuf: remove too specific PKT_TX_OFFLOAD_MASK definition Olivier Matz
@ 2014-11-10 17:14   ` Bruce Richardson
  2014-11-10 20:59     ` Olivier MATZ
  0 siblings, 1 reply; 112+ messages in thread
From: Bruce Richardson @ 2014-11-10 17:14 UTC (permalink / raw)
  To: Olivier Matz; +Cc: jigsaw, dev

On Mon, Nov 10, 2014 at 04:59:19PM +0100, Olivier Matz wrote:
> This definition is specific to Intel PMD drivers, and its description
> "indicate what bits required for building TX context" shows that it
> should not be in the generic rte_mbuf.h but in the PMD driver.
> 
> Signed-off-by: Olivier Matz <olivier.matz@6wind.com>
> ---
>  lib/librte_mbuf/rte_mbuf.h        | 5 -----
>  lib/librte_pmd_e1000/igb_rxtx.c   | 3 ++-
>  lib/librte_pmd_ixgbe/ixgbe_rxtx.c | 3 ++-
>  3 files changed, 4 insertions(+), 7 deletions(-)
> 
> diff --git a/lib/librte_mbuf/rte_mbuf.h b/lib/librte_mbuf/rte_mbuf.h
> index 96e322b..ff11b84 100644
> --- a/lib/librte_mbuf/rte_mbuf.h
> +++ b/lib/librte_mbuf/rte_mbuf.h
> @@ -129,11 +129,6 @@ extern "C" {
>  /* Use final bit of flags to indicate a control mbuf */
>  #define CTRL_MBUF_FLAG       (1ULL << 63) /**< Mbuf contains control data */
>  
> -/**
> - * Bit Mask to indicate what bits required for building TX context
> - */
> -#define PKT_TX_OFFLOAD_MASK (PKT_TX_VLAN_PKT | PKT_TX_IP_CKSUM | PKT_TX_L4_MASK)
> -
>  /* define a set of marker types that can be used to refer to set points in the
>   * mbuf */
>  typedef void    *MARKER[0];   /**< generic marker for a point in a structure */
> diff --git a/lib/librte_pmd_e1000/igb_rxtx.c b/lib/librte_pmd_e1000/igb_rxtx.c
> index 321493e..dbf5074 100644
> --- a/lib/librte_pmd_e1000/igb_rxtx.c
> +++ b/lib/librte_pmd_e1000/igb_rxtx.c
> @@ -400,7 +400,8 @@ eth_igb_xmit_pkts(void *tx_queue, struct rte_mbuf **tx_pkts,
>  		ol_flags = tx_pkt->ol_flags;
>  		vlan_macip_lens.f.vlan_tci = tx_pkt->vlan_tci;
>  		vlan_macip_lens.f.l2_l3_len = tx_pkt->l2_l3_len;
> -		tx_ol_req = ol_flags & PKT_TX_OFFLOAD_MASK;
> +		tx_ol_req = ol_flags & (PKT_TX_VLAN_PKT | PKT_TX_IP_CKSUM |
> +			PKT_TX_L4_MASK);
>  

Rather than make the change like this, might it be clearer just to copy-paste
the macro definition into this file (perhaps as IGB_TX_OFFLOAD_MASK)? Similarly
with ixgbe below?

/Bruce
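
A rough sketch of what that suggestion could look like.
IGB_TX_OFFLOAD_MASK is the name proposed above; the SKETCH_* flag copies
and the helper function are hypothetical, with bit positions taken from the
rte_mbuf.h hunks quoted in this thread:

```c
#include <stdint.h>

/* Illustrative copies of the generic mbuf TX flags. */
#define SKETCH_TX_VLAN_PKT (1ULL << 55)
#define SKETCH_TX_IP_CKSUM (1ULL << 54)
#define SKETCH_TX_L4_MASK  (3ULL << 52)

/* Driver-local mask: the bits that require building a TX context. */
#define IGB_TX_OFFLOAD_MASK \
	(SKETCH_TX_VLAN_PKT | SKETCH_TX_IP_CKSUM | SKETCH_TX_L4_MASK)

/* Keep only the offload requests this driver has to act on. */
static uint64_t
sketch_tx_ol_req(uint64_t ol_flags)
{
	return ol_flags & IGB_TX_OFFLOAD_MASK;
}
```

The transmit path would then keep a one-line mask operation, and the list
of bits would live next to the driver code that depends on it.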

>  		/* If a Context Descriptor need be built . */
>  		if (tx_ol_req) {
> diff --git a/lib/librte_pmd_ixgbe/ixgbe_rxtx.c b/lib/librte_pmd_ixgbe/ixgbe_rxtx.c
> index 042ee8a..70ca254 100644
> --- a/lib/librte_pmd_ixgbe/ixgbe_rxtx.c
> +++ b/lib/librte_pmd_ixgbe/ixgbe_rxtx.c
> @@ -580,7 +580,8 @@ ixgbe_xmit_pkts(void *tx_queue, struct rte_mbuf **tx_pkts,
>  		ol_flags = tx_pkt->ol_flags;
>  
>  		/* If hardware offload required */
> -		tx_ol_req = ol_flags & PKT_TX_OFFLOAD_MASK;
> +		tx_ol_req = ol_flags & (PKT_TX_VLAN_PKT | PKT_TX_IP_CKSUM |
> +			PKT_TX_L4_MASK);
>  		if (tx_ol_req) {
>  			vlan_macip_lens.f.vlan_tci = tx_pkt->vlan_tci;
>  			vlan_macip_lens.f.l2_l3_len = tx_pkt->l2_l3_len;
> -- 
> 2.1.0
> 

* Re: [dpdk-dev] [PATCH 06/12] mbuf: add functions to get the name of an ol_flag
  2014-11-10 15:59 ` [dpdk-dev] [PATCH 06/12] mbuf: add functions to get the name of an ol_flag Olivier Matz
@ 2014-11-10 17:29   ` Bruce Richardson
  2014-11-10 20:54     ` Olivier MATZ
  2014-11-12 17:21     ` Ananyev, Konstantin
  0 siblings, 2 replies; 112+ messages in thread
From: Bruce Richardson @ 2014-11-10 17:29 UTC (permalink / raw)
  To: Olivier Matz; +Cc: jigsaw, dev

On Mon, Nov 10, 2014 at 04:59:20PM +0100, Olivier Matz wrote:
> In test-pmd (rxonly.c), the code is able to dump the list of ol_flags.
> The issue is that the list of flags in the application has to be
> synchronized with the flags defined in rte_mbuf.h.
> 
> This patch introduces 2 new functions rte_get_rx_ol_flag_name()
> and rte_get_tx_ol_flag_name() that return the name of a flag from
> its mask. It also fixes rxonly.c to use these new functions and to
> display the proper flags.

Good idea. Couple of minor comments below.

/Bruce

> 
> Signed-off-by: Olivier Matz <olivier.matz@6wind.com>
> ---
>  app/test-pmd/rxonly.c      | 36 ++++++++--------------------
>  lib/librte_mbuf/rte_mbuf.h | 60 ++++++++++++++++++++++++++++++++++++++++++++++
>  2 files changed, 70 insertions(+), 26 deletions(-)
> 
> diff --git a/app/test-pmd/rxonly.c b/app/test-pmd/rxonly.c
> index 4410c3d..e7cd7e2 100644
> --- a/app/test-pmd/rxonly.c
> +++ b/app/test-pmd/rxonly.c
> @@ -71,26 +71,6 @@
>  
>  #include "testpmd.h"
>  
> -#define MAX_PKT_RX_FLAGS 13
> -static const char *pkt_rx_flag_names[MAX_PKT_RX_FLAGS] = {
> -	"VLAN_PKT",
> -	"RSS_HASH",
> -	"PKT_RX_FDIR",
> -	"IP_CKSUM",
> -	"IP_CKSUM_BAD",
> -
> -	"IPV4_HDR",
> -	"IPV4_HDR_EXT",
> -	"IPV6_HDR",
> -	"IPV6_HDR_EXT",
> -
> -	"IEEE1588_PTP",
> -	"IEEE1588_TMST",
> -
> -	"TUNNEL_IPV4_HDR",
> -	"TUNNEL_IPV6_HDR",
> -};
> -
>  static inline void
>  print_ether_addr(const char *what, struct ether_addr *eth_addr)
>  {
> @@ -219,12 +199,16 @@ pkt_burst_receive(struct fwd_stream *fs)
>  		printf(" - Receive queue=0x%x", (unsigned) fs->rx_queue);
>  		printf("\n");
>  		if (ol_flags != 0) {
> -			int rxf;
> -
> -			for (rxf = 0; rxf < MAX_PKT_RX_FLAGS; rxf++) {
> -				if (ol_flags & (1 << rxf))
> -					printf("  PKT_RX_%s\n",
> -					       pkt_rx_flag_names[rxf]);
> +			unsigned rxf;
> +			const char *name;
> +
> +			for (rxf = 0; rxf < sizeof(mb->ol_flags) * 8; rxf++) {
> +				if ((ol_flags & (1ULL << rxf)) == 0)
> +					continue;
> +				name = rte_get_rx_ol_flag_name(1ULL << rxf);
> +				if (name == NULL)
> +					continue;
> +				printf("  %s\n", name);
>  			}
>  		}
>  		rte_pktmbuf_free(mb);
> diff --git a/lib/librte_mbuf/rte_mbuf.h b/lib/librte_mbuf/rte_mbuf.h
> index ff11b84..bcd8996 100644
> --- a/lib/librte_mbuf/rte_mbuf.h
> +++ b/lib/librte_mbuf/rte_mbuf.h
> @@ -129,6 +129,66 @@ extern "C" {
>  /* Use final bit of flags to indicate a control mbuf */
>  #define CTRL_MBUF_FLAG       (1ULL << 63) /**< Mbuf contains control data */
>  
> +/**
> + * Bit Mask to indicate what bits required for building TX context
I don't understand this first line - is it accidentally included?

> + * Get the name of a RX offload flag
> + *
> + * @param mask
> + *   The mask describing the flag. Usually only one bit must be set.
> + *   Several bits can be given if they belong to the same mask.
> + *   Ex: PKT_TX_L4_MASK.
TX mask given as an example for a function for RX flags is confusing.

> + * @return
> + *   The name of this flag, or NULL if it's not a valid RX flag.
> + */
> +static inline const char *rte_get_rx_ol_flag_name(uint64_t mask)
> +{
> +	switch (mask) {
> +	case PKT_RX_VLAN_PKT: return "PKT_RX_VLAN_PKT";
> +	case PKT_RX_RSS_HASH: return "PKT_RX_RSS_HASH";
> +	case PKT_RX_FDIR: return "PKT_RX_FDIR";
> +	case PKT_RX_L4_CKSUM_BAD: return "PKT_RX_L4_CKSUM_BAD";
> +	case PKT_RX_IP_CKSUM_BAD: return "PKT_RX_IP_CKSUM_BAD";
> +	/* case PKT_RX_EIP_CKSUM_BAD: return "PKT_RX_EIP_CKSUM_BAD"; */
> +	/* case PKT_RX_OVERSIZE: return "PKT_RX_OVERSIZE"; */
> +	/* case PKT_RX_HBUF_OVERFLOW: return "PKT_RX_HBUF_OVERFLOW"; */
> +	/* case PKT_RX_RECIP_ERR: return "PKT_RX_RECIP_ERR"; */
> +	/* case PKT_RX_MAC_ERR: return "PKT_RX_MAC_ERR"; */
> +	case PKT_RX_IPV4_HDR: return "PKT_RX_IPV4_HDR";
> +	case PKT_RX_IPV4_HDR_EXT: return "PKT_RX_IPV4_HDR_EXT";
> +	case PKT_RX_IPV6_HDR: return "PKT_RX_IPV6_HDR";
> +	case PKT_RX_IPV6_HDR_EXT: return "PKT_RX_IPV6_HDR_EXT";
> +	case PKT_RX_IEEE1588_PTP: return "PKT_RX_IEEE1588_PTP";
> +	case PKT_RX_IEEE1588_TMST: return "PKT_RX_IEEE1588_TMST";
> +	case PKT_RX_TUNNEL_IPV4_HDR: return "PKT_RX_TUNNEL_IPV4_HDR";
> +	case PKT_RX_TUNNEL_IPV6_HDR: return "PKT_RX_TUNNEL_IPV6_HDR";
> +	default: return NULL;
> +	}
> +}
> +
> +/**
> + * Get the name of a TX offload flag
> + *
> + * @param mask
> + *   The mask describing the flag. Usually only one bit must be set.
> + *   Several bits can be given if they belong to the same mask.
> + *   Ex: PKT_TX_L4_MASK.
> + * @return
> + *   The name of this flag, or NULL if it's not a valid TX flag.
> + */
> +static inline const char *rte_get_tx_ol_flag_name(uint64_t mask)
> +{
> +	switch (mask) {
> +	case PKT_TX_VLAN_PKT: return "PKT_TX_VLAN_PKT";
> +	case PKT_TX_IP_CKSUM: return "PKT_TX_IP_CKSUM";
> +	case PKT_TX_TCP_CKSUM: return "PKT_TX_TCP_CKSUM";
> +	case PKT_TX_SCTP_CKSUM: return "PKT_TX_SCTP_CKSUM";
> +	case PKT_TX_UDP_CKSUM: return "PKT_TX_UDP_CKSUM";
> +	case PKT_TX_IEEE1588_TMST: return "PKT_TX_IEEE1588_TMST";
> +	case PKT_TX_VXLAN_CKSUM: return "PKT_TX_VXLAN_CKSUM";
> +	default: return NULL;
> +	}
> +}
> +
>  /* define a set of marker types that can be used to refer to set points in the
>   * mbuf */
>  typedef void    *MARKER[0];   /**< generic marker for a point in a structure */
> -- 
> 2.1.0
> 

^ permalink raw reply	[flat|nested] 112+ messages in thread

* Re: [dpdk-dev] [PATCH 06/12] mbuf: add functions to get the name of an ol_flag
  2014-11-10 17:29   ` Bruce Richardson
@ 2014-11-10 20:54     ` Olivier MATZ
  2014-11-12 17:21     ` Ananyev, Konstantin
  1 sibling, 0 replies; 112+ messages in thread
From: Olivier MATZ @ 2014-11-10 20:54 UTC (permalink / raw)
  To: Bruce Richardson; +Cc: jigsaw, dev

Hi Bruce,

Thank you for the review.

On 11/10/2014 06:29 PM, Bruce Richardson wrote:
>> +/**
>> + * Bit Mask to indicate what bits required for building TX context
> 
> I don't understand this first line - is it accidentally included?

Right, it's a mistake, I'll remove this line.

>> + * Get the name of a RX offload flag
>> + *
>> + * @param mask
>> + *   The mask describing the flag. Usually only one bit must be set.
>> + *   Several bits can be given if they belong to the same mask.
>> + *   Ex: PKT_TX_L4_MASK.
> TX mask given as an example for a function for RX flags is confusing.

I'll remove the last two lines of the description as there is no example
for RX flags.


Regards,
Olivier
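
For readers following the thread, here is a minimal sketch of how a name
lookup of this shape could be used to dump every flag set in an ol_flags
value. The DEMO_* constants and both function names are hypothetical
stand-ins made up for this example (the real flags and
rte_get_rx_ol_flag_name() live in rte_mbuf.h), so treat it as an
illustration under those assumptions, not as the patch's code.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical stand-in values; the real flags are defined in rte_mbuf.h. */
#define DEMO_RX_VLAN_PKT     (1ULL << 0)
#define DEMO_RX_RSS_HASH     (1ULL << 1)
#define DEMO_RX_IP_CKSUM_BAD (1ULL << 4)

/* Same shape as the lookup under review: one flag in, its name out. */
static const char *demo_rx_flag_name(uint64_t mask)
{
	switch (mask) {
	case DEMO_RX_VLAN_PKT: return "DEMO_RX_VLAN_PKT";
	case DEMO_RX_RSS_HASH: return "DEMO_RX_RSS_HASH";
	case DEMO_RX_IP_CKSUM_BAD: return "DEMO_RX_IP_CKSUM_BAD";
	default: return NULL;
	}
}

/* Write the names of all known flags set in ol_flags into buf, separated
 * by spaces, lowest bit first. Unknown bits are skipped. Returns the
 * number of names written. */
static unsigned demo_rx_flags_to_str(uint64_t ol_flags, char *buf, size_t len)
{
	unsigned i, count = 0;
	size_t used = 0;

	assert(buf != NULL && len > 0);
	buf[0] = '\0';
	for (i = 0; i < 64; i++) {
		uint64_t bit = 1ULL << i;
		const char *name;
		int n;

		if ((ol_flags & bit) == 0)
			continue;
		name = demo_rx_flag_name(bit);
		if (name == NULL)
			continue;
		n = snprintf(buf + used, len - used, "%s%s",
			count ? " " : "", name);
		if (n < 0 || (size_t)n >= len - used) {
			buf[used] = '\0'; /* drop the partially written name */
			break;
		}
		used += (size_t)n;
		count++;
	}
	return count;
}
```

Testing one bit at a time keeps the lookup a plain switch/case; a
multi-bit field such as PKT_TX_L4_MASK would need to be masked out and
looked up as a whole instead.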

^ permalink raw reply	[flat|nested] 112+ messages in thread

* Re: [dpdk-dev] [PATCH 05/12] mbuf: remove too specific PKT_TX_OFFLOAD_MASK definition
  2014-11-10 17:14   ` Bruce Richardson
@ 2014-11-10 20:59     ` Olivier MATZ
  0 siblings, 0 replies; 112+ messages in thread
From: Olivier MATZ @ 2014-11-10 20:59 UTC (permalink / raw)
  To: Bruce Richardson; +Cc: jigsaw, dev

Hi Bruce,

On 11/10/2014 06:14 PM, Bruce Richardson wrote:
>> --- a/lib/librte_pmd_e1000/igb_rxtx.c
>> +++ b/lib/librte_pmd_e1000/igb_rxtx.c
>> @@ -400,7 +400,8 @@ eth_igb_xmit_pkts(void *tx_queue, struct rte_mbuf **tx_pkts,
>>  		ol_flags = tx_pkt->ol_flags;
>>  		vlan_macip_lens.f.vlan_tci = tx_pkt->vlan_tci;
>>  		vlan_macip_lens.f.l2_l3_len = tx_pkt->l2_l3_len;
>> -		tx_ol_req = ol_flags & PKT_TX_OFFLOAD_MASK;
>> +		tx_ol_req = ol_flags & (PKT_TX_VLAN_PKT | PKT_TX_IP_CKSUM |
>> +			PKT_TX_L4_MASK);
>>  
> 
> Rather than make the change like this, might it be clearer just to copy-paste
> the macro definition into this file (perhaps as IGB_TX_OFFLOAD_MASK). Similarly
> with ixgbe below?

As this definition was used only once per PMD, I thought it was clearer
to remove the definition. But... someone made the same comment as you
internally, so I'll change it in the next version!

Regards,
Olivier
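
As an illustration of the suggestion discussed above, a driver-local mask
could look roughly like the sketch below. The DEMO_* flag values and the
function name are hypothetical stand-ins chosen for this example; only
the idea of a per-PMD macro such as IGB_TX_OFFLOAD_MASK comes from the
thread.

```c
#include <assert.h>
#include <stdint.h>

/* Stand-in flag values for this sketch; the real ones are in rte_mbuf.h. */
#define DEMO_TX_VLAN_PKT (1ULL << 55)
#define DEMO_TX_IP_CKSUM (1ULL << 54)
#define DEMO_TX_L4_MASK  (3ULL << 52)
#define DEMO_TX_TCP_SEG  (1ULL << 49)

/* Driver-local mask: the offloads for which this imaginary PMD has to
 * build a TX context. It is kept in the driver file, not in the mbuf
 * header, because each driver supports a different set. */
#define DEMO_IGB_TX_OFFLOAD_MASK \
	(DEMO_TX_VLAN_PKT | DEMO_TX_IP_CKSUM | DEMO_TX_L4_MASK)

/* Return the subset of ol_flags the driver has to act on. */
static uint64_t demo_igb_tx_ol_req(uint64_t ol_flags)
{
	/* The imaginary PMD has no TSO support, so its mask must not claim it. */
	assert((DEMO_IGB_TX_OFFLOAD_MASK & DEMO_TX_TCP_SEG) == 0);
	return ol_flags & DEMO_IGB_TX_OFFLOAD_MASK;
}
```

A named mask keeps the transmit loop short and gives each driver one
place to extend when it gains a new offload.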

^ permalink raw reply	[flat|nested] 112+ messages in thread

* Re: [dpdk-dev] [PATCH 07/12] mbuf: generic support for TCP segmentation offload
  2014-11-10 15:59 ` [dpdk-dev] [PATCH 07/12] mbuf: generic support for TCP segmentation offload Olivier Matz
@ 2014-11-11  3:17   ` Liu, Jijiang
  2014-11-12 13:09   ` Ananyev, Konstantin
  1 sibling, 0 replies; 112+ messages in thread
From: Liu, Jijiang @ 2014-11-11  3:17 UTC (permalink / raw)
  To: Olivier Matz, dev; +Cc: jigsaw



> -----Original Message-----
> From: Olivier Matz [mailto:olivier.matz@6wind.com]
> Sent: Monday, November 10, 2014 11:59 PM
> To: dev@dpdk.org
> Cc: olivier.matz@6wind.com; Walukiewicz, Miroslaw; Liu, Jijiang; Liu, Yong;
> jigsaw@gmail.com; Richardson, Bruce; Ananyev, Konstantin
> Subject: [PATCH 07/12] mbuf: generic support for TCP segmentation offload
> 
> Some of the NICs supported by DPDK have a possibility to accelerate TCP traffic
> by using segmentation offload. The application prepares a packet with valid TCP
> header with size up to 64K and delegates the segmentation to the NIC.
> 
> Implement the generic part of TCP segmentation offload in rte_mbuf. It
> introduces 2 new fields in rte_mbuf: l4_len (length of L4 header in bytes) and
> tso_segsz (MSS of packets).
> 
> To delegate the TCP segmentation to the hardware, the user has to:
> 
> - set the PKT_TX_TCP_SEG flag in mbuf->ol_flags (this flag implies
>   PKT_TX_TCP_CKSUM)
> - set PKT_TX_IP_CKSUM if it's IPv4, and set the IP checksum to 0 in
>   the packet
> - fill the mbuf offload information: l2_len, l3_len, l4_len, tso_segsz
> - calculate the pseudo header checksum and set it in the TCP header,
>   as required when doing hardware TCP checksum offload
> 
> The API is inspired from ixgbe hardware (the next commit adds the support for
> ixgbe), but it seems generic enough to be used for other hw/drivers in the future.
> 
> This commit also reworks the way l2_len and l3_len are used in igb and ixgbe
> drivers as the l2_l3_len is not available anymore in mbuf.
> 
> Signed-off-by: Mirek Walukiewicz <miroslaw.walukiewicz@intel.com>
> Signed-off-by: Olivier Matz <olivier.matz@6wind.com>
> ---
>  app/test-pmd/testpmd.c            |  3 ++-
>  examples/ipv4_multicast/main.c    |  3 ++-
>  lib/librte_mbuf/rte_mbuf.h        | 44 +++++++++++++++++++++++----------------
>  lib/librte_pmd_e1000/igb_rxtx.c   | 11 +++++++++-
>  lib/librte_pmd_ixgbe/ixgbe_rxtx.c | 11 +++++++++-
>  5 files changed, 50 insertions(+), 22 deletions(-)
> 
> diff --git a/app/test-pmd/testpmd.c b/app/test-pmd/testpmd.c
> index 12adafa..a831e31 100644
> --- a/app/test-pmd/testpmd.c
> +++ b/app/test-pmd/testpmd.c
> @@ -408,7 +408,8 @@ testpmd_mbuf_ctor(struct rte_mempool *mp,
>  	mb->ol_flags     = 0;
>  	mb->data_off     = RTE_PKTMBUF_HEADROOM;
>  	mb->nb_segs      = 1;
> -	mb->l2_l3_len       = 0;
> +	mb->l2_len       = 0;
> +	mb->l3_len       = 0;

The mb->inner_l2_len and mb->inner_l3_len fields are missing here; I can also add them later.

>  	mb->vlan_tci     = 0;
>  	mb->hash.rss     = 0;
>  }
> diff --git a/examples/ipv4_multicast/main.c b/examples/ipv4_multicast/main.c
> index de5e6be..a31d43d 100644
> --- a/examples/ipv4_multicast/main.c
> +++ b/examples/ipv4_multicast/main.c
> @@ -302,7 +302,8 @@ mcast_out_pkt(struct rte_mbuf *pkt, int use_clone)
>  	/* copy metadata from source packet*/
>  	hdr->port = pkt->port;
>  	hdr->vlan_tci = pkt->vlan_tci;
> -	hdr->l2_l3_len = pkt->l2_l3_len;
> +	hdr->l2_len = pkt->l2_len;
> +	hdr->l3_len = pkt->l3_len;

The mb->inner_l2_len and mb->inner_l3_len fields are missing here, too.
    
>  	hdr->hash = pkt->hash;
> 
>  	hdr->ol_flags = pkt->ol_flags;
> diff --git a/lib/librte_mbuf/rte_mbuf.h b/lib/librte_mbuf/rte_mbuf.h
> index bcd8996..f76b768 100644
> --- a/lib/librte_mbuf/rte_mbuf.h
> +++ b/lib/librte_mbuf/rte_mbuf.h
> @@ -126,6 +126,19 @@ extern "C" {
> 
>  #define PKT_TX_VXLAN_CKSUM   (1ULL << 50) /**< TX checksum of VXLAN computed by NIC */
> 
> +/**
> + * TCP segmentation offload. To enable this offload feature for a
> + * packet to be transmitted on hardware supporting TSO:
> + *  - set the PKT_TX_TCP_SEG flag in mbuf->ol_flags (this flag implies
> + *    PKT_TX_TCP_CKSUM)
> + *  - if it's IPv4, set the PKT_TX_IP_CKSUM flag and write the IP checksum
> + *    to 0 in the packet
> + *  - fill the mbuf offload information: l2_len, l3_len, l4_len, tso_segsz
> + *  - calculate the pseudo header checksum and set it in the TCP header,
> + *    as required when doing hardware TCP checksum offload
> + */
> +#define PKT_TX_TCP_SEG       (1ULL << 49)
> +
>  /* Use final bit of flags to indicate a control mbuf */
>  #define CTRL_MBUF_FLAG       (1ULL << 63) /**< Mbuf contains control data */
> 
> @@ -185,6 +198,7 @@ static inline const char *rte_get_tx_ol_flag_name(uint64_t mask)
>  	case PKT_TX_UDP_CKSUM: return "PKT_TX_UDP_CKSUM";
>  	case PKT_TX_IEEE1588_TMST: return "PKT_TX_IEEE1588_TMST";
>  	case PKT_TX_VXLAN_CKSUM: return "PKT_TX_VXLAN_CKSUM";
> +	case PKT_TX_TCP_SEG: return "PKT_TX_TCP_SEG";
>  	default: return NULL;
>  	}
>  }
> @@ -264,22 +278,18 @@ struct rte_mbuf {
> 
>  	/* fields to support TX offloads */
>  	union {
> -		uint16_t l2_l3_len; /**< combined l2/l3 lengths as single var */
> +		uint64_t tx_offload;       /**< combined for easy fetch */
>  		struct {
> -			uint16_t l3_len:9;      /**< L3 (IP) Header Length. */
> -			uint16_t l2_len:7;      /**< L2 (MAC) Header Length. */
> -		};
> -	};
> +			uint64_t l2_len:7; /**< L2 (MAC) Header Length. */
> +			uint64_t l3_len:9; /**< L3 (IP) Header Length. */
> +			uint64_t l4_len:8; /**< L4 (TCP/UDP) Header Length. */
> +			uint64_t tso_segsz:16; /**< TCP TSO segment size */
> 
> -	/* fields for TX offloading of tunnels */
> -	union {
> -		uint16_t inner_l2_l3_len;
> -		/**< combined inner l2/l3 lengths as single var */
> -		struct {
> -			uint16_t inner_l3_len:9;
> -			/**< inner L3 (IP) Header Length. */
> -			uint16_t inner_l2_len:7;
> -			/**< inner L2 (MAC) Header Length. */
> +			/* fields for TX offloading of tunnels */
> +			uint16_t inner_l3_len:9; /**< inner L3 (IP) Hdr Length. */
> +			uint16_t inner_l2_len:7; /**< inner L2 (MAC) Hdr Length. */
> +
> +			/* uint64_t unused:8; */
>  		};
>  	};
>  } __rte_cache_aligned;
> @@ -631,8 +641,7 @@ static inline void rte_pktmbuf_reset(struct rte_mbuf *m)
> {
>  	m->next = NULL;
>  	m->pkt_len = 0;
> -	m->l2_l3_len = 0;
> -	m->inner_l2_l3_len = 0;
> +	m->tx_offload = 0;
>  	m->vlan_tci = 0;
>  	m->nb_segs = 1;
>  	m->port = 0xff;
> @@ -701,8 +710,7 @@ static inline void rte_pktmbuf_attach(struct rte_mbuf *mi, struct rte_mbuf *md)
>  	mi->data_len = md->data_len;
>  	mi->port = md->port;
>  	mi->vlan_tci = md->vlan_tci;
> -	mi->l2_l3_len = md->l2_l3_len;
> -	mi->inner_l2_l3_len = md->inner_l2_l3_len;
> +	mi->tx_offload = md->tx_offload;
>  	mi->hash = md->hash;
> 
>  	mi->next = NULL;
> diff --git a/lib/librte_pmd_e1000/igb_rxtx.c b/lib/librte_pmd_e1000/igb_rxtx.c
> index dbf5074..0a9447e 100644
> --- a/lib/librte_pmd_e1000/igb_rxtx.c
> +++ b/lib/librte_pmd_e1000/igb_rxtx.c
> @@ -361,6 +361,13 @@ eth_igb_xmit_pkts(void *tx_queue, struct rte_mbuf **tx_pkts,
>  	struct rte_mbuf     *tx_pkt;
>  	struct rte_mbuf     *m_seg;
>  	union igb_vlan_macip vlan_macip_lens;
> +	union {
> +		uint16_t u16;
> +		struct {
> +			uint16_t l3_len:9;
> +			uint16_t l2_len:7;
> +		};
> +	} l2_l3_len;
>  	uint64_t buf_dma_addr;
>  	uint32_t olinfo_status;
>  	uint32_t cmd_type_len;
> @@ -398,8 +405,10 @@ eth_igb_xmit_pkts(void *tx_queue, struct rte_mbuf **tx_pkts,
>  		tx_last = (uint16_t) (tx_id + tx_pkt->nb_segs - 1);
> 
>  		ol_flags = tx_pkt->ol_flags;
> +		l2_l3_len.l2_len = tx_pkt->l2_len;
> +		l2_l3_len.l3_len = tx_pkt->l3_len;
>  		vlan_macip_lens.f.vlan_tci = tx_pkt->vlan_tci;
> -		vlan_macip_lens.f.l2_l3_len = tx_pkt->l2_l3_len;
> +		vlan_macip_lens.f.l2_l3_len = l2_l3_len.u16;
>  		tx_ol_req = ol_flags & (PKT_TX_VLAN_PKT | PKT_TX_IP_CKSUM |
>  			PKT_TX_L4_MASK);
> 
> diff --git a/lib/librte_pmd_ixgbe/ixgbe_rxtx.c b/lib/librte_pmd_ixgbe/ixgbe_rxtx.c
> index 70ca254..54a0fc1 100644
> --- a/lib/librte_pmd_ixgbe/ixgbe_rxtx.c
> +++ b/lib/librte_pmd_ixgbe/ixgbe_rxtx.c
> @@ -540,6 +540,13 @@ ixgbe_xmit_pkts(void *tx_queue, struct rte_mbuf **tx_pkts,
>  	struct rte_mbuf     *tx_pkt;
>  	struct rte_mbuf     *m_seg;
>  	union ixgbe_vlan_macip vlan_macip_lens;
> +	union {
> +		uint16_t u16;
> +		struct {
> +			uint16_t l3_len:9;
> +			uint16_t l2_len:7;
> +		};
> +	} l2_l3_len;
>  	uint64_t buf_dma_addr;
>  	uint32_t olinfo_status;
>  	uint32_t cmd_type_len;
> @@ -583,8 +590,10 @@ ixgbe_xmit_pkts(void *tx_queue, struct rte_mbuf **tx_pkts,
>  		tx_ol_req = ol_flags & (PKT_TX_VLAN_PKT | PKT_TX_IP_CKSUM |
>  			PKT_TX_L4_MASK);
>  		if (tx_ol_req) {
> +			l2_l3_len.l2_len = tx_pkt->l2_len;
> +			l2_l3_len.l3_len = tx_pkt->l3_len;
>  			vlan_macip_lens.f.vlan_tci = tx_pkt->vlan_tci;
> -			vlan_macip_lens.f.l2_l3_len = tx_pkt->l2_l3_len;
> +			vlan_macip_lens.f.l2_l3_len = l2_l3_len.u16;
> 
>  			/* If new context need be built or reuse the exist ctx. */
>  			ctx = what_advctx_update(txq, tx_ol_req,
> --
> 2.1.0
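
The steps listed in the commit message above can be sketched as follows.
This is a minimal illustration under stated assumptions: struct demo_mbuf
is a reduced stand-in for struct rte_mbuf that mirrors only the
tx_offload layout from the patch, DEMO_TX_IP_CKSUM is a placeholder bit
value, and demo_request_tso() is a hypothetical helper, not a DPDK
function. The packet-data parts of the recipe (writing 0 in the IPv4
checksum field and storing the pseudo-header checksum in the TCP header)
are not shown.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* DEMO_TX_TCP_SEG uses the bit the patch assigns to PKT_TX_TCP_SEG;
 * DEMO_TX_IP_CKSUM is a stand-in value chosen for this sketch. */
#define DEMO_TX_TCP_SEG  (1ULL << 49)
#define DEMO_TX_IP_CKSUM (1ULL << 54)

/* Reduced stand-in for struct rte_mbuf: only what the sketch touches. */
struct demo_mbuf {
	uint64_t ol_flags;
	union {
		uint64_t tx_offload;       /* combined for easy fetch */
		struct {
			uint64_t l2_len:7;     /* L2 (MAC) header length */
			uint64_t l3_len:9;     /* L3 (IP) header length */
			uint64_t l4_len:8;     /* L4 (TCP) header length */
			uint64_t tso_segsz:16; /* TCP TSO segment size */
		};
	};
};

/* Fill the mbuf metadata needed to hand TCP segmentation to the NIC.
 * All lengths are in bytes and must fit in their bit-fields. */
static void demo_request_tso(struct demo_mbuf *m, unsigned l2_len,
	unsigned l3_len, unsigned l4_len, unsigned mss, int is_ipv4)
{
	assert(m != NULL);
	assert(l2_len < (1u << 7) && l3_len < (1u << 9) && l4_len < (1u << 8));
	assert(mss > 0 && mss < (1u << 16));

	/* request segmentation (the flag implies TCP checksum offload) */
	m->ol_flags |= DEMO_TX_TCP_SEG;
	/* IPv4 only: the IP header checksum is recomputed per segment */
	if (is_ipv4)
		m->ol_flags |= DEMO_TX_IP_CKSUM;

	/* clear every offload length at once, then fill the used ones */
	m->tx_offload = 0;
	m->l2_len = l2_len;
	m->l3_len = l3_len;
	m->l4_len = l4_len;
	m->tso_segsz = mss;
}
```

With the union in place, a single store to tx_offload resets all the
length fields, which is what rte_pktmbuf_reset() relies on in the patch.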

^ permalink raw reply	[flat|nested] 112+ messages in thread

* Re: [dpdk-dev] [PATCH 10/12] testpmd: rework csum forward engine
  2014-11-10 15:59 ` [dpdk-dev] [PATCH 10/12] testpmd: rework csum forward engine Olivier Matz
@ 2014-11-11  8:35   ` Liu, Jijiang
  2014-11-11  9:55     ` Olivier MATZ
  0 siblings, 1 reply; 112+ messages in thread
From: Liu, Jijiang @ 2014-11-11  8:35 UTC (permalink / raw)
  To: Olivier Matz, dev; +Cc: jigsaw

Hi Olivier,

The PKT_TX_VXLAN_CKSUM was not set in the patch, and VXLAN TX checksum offload would not work. 

Thanks
Jijiang Liu
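
To make the remark concrete, here is a minimal sketch of an outer-header
flag selection that does set the VXLAN flag. It is only an assumption
about one way the issue could be addressed, not the change made in the
series: the DEMO_* values are stand-ins (only the bit used for the VXLAN
flag is taken from the quoted rte_mbuf.h definition) and
demo_outer_tx_flags() is a hypothetical helper.

```c
#include <assert.h>
#include <stdint.h>

/* Stand-in values for this sketch only. */
#define DEMO_TESTPMD_TX_OFFLOAD_VXLAN_CKSUM 0x0010
#define DEMO_PKT_TX_IP_CKSUM    (1ULL << 54)
#define DEMO_PKT_TX_VXLAN_CKSUM (1ULL << 50)

/* Translate the testpmd per-port setting into the mbuf flag that asks
 * the NIC for VXLAN checksum offload on the outer header. */
static uint64_t demo_outer_tx_flags(uint16_t testpmd_ol_flags)
{
	uint64_t ol_flags = 0;

	/* the two requests must stay distinguishable for the driver */
	assert(DEMO_PKT_TX_VXLAN_CKSUM != DEMO_PKT_TX_IP_CKSUM);

	if (testpmd_ol_flags & DEMO_TESTPMD_TX_OFFLOAD_VXLAN_CKSUM)
		ol_flags |= DEMO_PKT_TX_VXLAN_CKSUM;

	return ol_flags;
}
```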

> -----Original Message-----
> From: Olivier Matz [mailto:olivier.matz@6wind.com]
> Sent: Monday, November 10, 2014 11:59 PM
> To: dev@dpdk.org
> Cc: olivier.matz@6wind.com; Walukiewicz, Miroslaw; Liu, Jijiang; Liu, Yong;
> jigsaw@gmail.com; Richardson, Bruce; Ananyev, Konstantin
> Subject: [PATCH 10/12] testpmd: rework csum forward engine
> 
> The csum forward engine was becoming too complex to be used and extended
> (the next commits want to add the support of TSO):
> 
> - no explanation about what the code does
> - code is not factorized, lots of code duplicated, especially between
>   ipv4/ipv6
> - user command line api: use of bitmasks that need to be calculated by
>   the user
> - the user flags don't have the same semantics:
>   - for legacy IP/UDP/TCP/SCTP, it selects software or hardware checksum
>   - for other (vxlan), it selects between hardware checksum or no
>     checksum
> - the code relies too much on flags set by the driver without software
>   alternative (ex: PKT_RX_TUNNEL_IPV4_HDR). It is nice to be able to
>   compare a software implementation with the hardware offload.
> 
> This commit tries to fix these issues, and provide a simple definition of what is
> done by the forward engine:
> 
>  * Receive a burst of packets, and for supported packet types:
>  *  - modify the IPs
>  *  - reprocess the checksum in SW or HW, depending on testpmd command line
>  *    configuration
>  * Then packets are transmitted on the output port.
>  *
>  * Supported packets are:
>  *   Ether / (vlan) / IP|IP6 / UDP|TCP|SCTP .
>  *   Ether / (vlan) / IP|IP6 / UDP / VxLAN / Ether / IP|IP6 / UDP|TCP|SCTP
>  *
>  * The network parser supposes that the packet is contiguous, which may
>  * not be the case in real life.
> 
> Signed-off-by: Olivier Matz <olivier.matz@6wind.com>
> ---
>  app/test-pmd/cmdline.c  | 151 ++++++++---
>  app/test-pmd/config.c   |  11 -
>  app/test-pmd/csumonly.c | 668 ++++++++++++++++++++++-------------------------
> -
>  app/test-pmd/testpmd.h  |  17 +-
>  4 files changed, 423 insertions(+), 424 deletions(-)
> 
> diff --git a/app/test-pmd/cmdline.c b/app/test-pmd/cmdline.c
> index 4c3fc76..0361e58 100644
> --- a/app/test-pmd/cmdline.c
> +++ b/app/test-pmd/cmdline.c
> @@ -310,19 +310,14 @@ static void cmd_help_long_parsed(void *parsed_result,
>  			"    Disable hardware insertion of a VLAN header in"
>  			" packets sent on a port.\n\n"
> 
> -			"tx_checksum set (mask) (port_id)\n"
> -			"    Enable hardware insertion of checksum offload with"
> -			" the 8-bit mask, 0~0xff, in packets sent on a port.\n"
> -			"        bit 0 - insert ip   checksum offload if set\n"
> -			"        bit 1 - insert udp  checksum offload if set\n"
> -			"        bit 2 - insert tcp  checksum offload if set\n"
> -			"        bit 3 - insert sctp checksum offload if set\n"
> -			"        bit 4 - insert inner ip  checksum offload if set\n"
> -			"        bit 5 - insert inner udp checksum offload if set\n"
> -			"        bit 6 - insert inner tcp checksum offload if set\n"
> -			"        bit 7 - insert inner sctp checksum offload if set\n"
> +			"tx_cksum set (ip|udp|tcp|sctp|vxlan) (hw|sw) (port_id)\n"
> +			"    Enable hardware calculation of checksum with when"
> +			" transmitting a packet using 'csum' forward engine.\n"
>  			"    Please check the NIC datasheet for HW limits.\n\n"
> 
> +			"tx_checksum show (port_id)\n"
> +			"    Display tx checksum offload configuration\n\n"
> +
>  			"set fwd (%s)\n"
>  			"    Set packet forwarding mode.\n\n"
> 
> @@ -2738,48 +2733,131 @@ cmdline_parse_inst_t cmd_tx_vlan_reset = {
> 
> 
>  /* *** ENABLE HARDWARE INSERTION OF CHECKSUM IN TX PACKETS *** */
> -struct cmd_tx_cksum_set_result {
> +struct cmd_tx_cksum_result {
>  	cmdline_fixed_string_t tx_cksum;
> -	cmdline_fixed_string_t set;
> -	uint8_t cksum_mask;
> +	cmdline_fixed_string_t mode;
> +	cmdline_fixed_string_t proto;
> +	cmdline_fixed_string_t hwsw;
>  	uint8_t port_id;
>  };
> 
>  static void
> -cmd_tx_cksum_set_parsed(void *parsed_result,
> +cmd_tx_cksum_parsed(void *parsed_result,
>  		       __attribute__((unused)) struct cmdline *cl,
>  		       __attribute__((unused)) void *data)
>  {
> -	struct cmd_tx_cksum_set_result *res = parsed_result;
> +	struct cmd_tx_cksum_result *res = parsed_result;
> +	int hw = 0;
> +	uint16_t ol_flags, mask = 0;
> +	struct rte_eth_dev_info dev_info;
> +
> +	if (port_id_is_invalid(res->port_id)) {
> +		printf("invalid port %d\n", res->port_id);
> +		return;
> +	}
> 
> -	tx_cksum_set(res->port_id, res->cksum_mask);
> +	if (!strcmp(res->mode, "set")) {
> +
> +		if (!strcmp(res->hwsw, "hw"))
> +			hw = 1;
> +
> +		if (!strcmp(res->proto, "ip")) {
> +			mask = TESTPMD_TX_OFFLOAD_IP_CKSUM;
> +		} else if (!strcmp(res->proto, "udp")) {
> +			mask = TESTPMD_TX_OFFLOAD_UDP_CKSUM;
> +		} else if (!strcmp(res->proto, "tcp")) {
> +			mask = TESTPMD_TX_OFFLOAD_TCP_CKSUM;
> +		} else if (!strcmp(res->proto, "sctp")) {
> +			mask = TESTPMD_TX_OFFLOAD_SCTP_CKSUM;
> +		} else if (!strcmp(res->proto, "vxlan")) {
> +			mask = TESTPMD_TX_OFFLOAD_VXLAN_CKSUM;
> +		}
> +
> +		if (hw)
> +			ports[res->port_id].tx_ol_flags |= mask;
> +		else
> +			ports[res->port_id].tx_ol_flags &= (~mask);
> +	}
> +
> +	ol_flags = ports[res->port_id].tx_ol_flags;
> +	printf("IP checksum offload is %s\n",
> +		(ol_flags & TESTPMD_TX_OFFLOAD_IP_CKSUM) ? "hw" : "sw");
> +	printf("UDP checksum offload is %s\n",
> +		(ol_flags & TESTPMD_TX_OFFLOAD_UDP_CKSUM) ? "hw" : "sw");
> +	printf("TCP checksum offload is %s\n",
> +		(ol_flags & TESTPMD_TX_OFFLOAD_TCP_CKSUM) ? "hw" : "sw");
> +	printf("SCTP checksum offload is %s\n",
> +		(ol_flags & TESTPMD_TX_OFFLOAD_SCTP_CKSUM) ? "hw" : "sw");
> +	printf("VxLAN checksum offload is %s\n",
> +		(ol_flags & TESTPMD_TX_OFFLOAD_VXLAN_CKSUM) ? "hw" : "sw");
> +
> +	/* display warnings if configuration is not supported by the NIC */
> +	rte_eth_dev_info_get(res->port_id, &dev_info);
> +	if ((ol_flags & TESTPMD_TX_OFFLOAD_IP_CKSUM) &&
> +		(dev_info.tx_offload_capa & DEV_TX_OFFLOAD_IPV4_CKSUM) == 0) {
> +		printf("Warning: hardware IP checksum enabled but not "
> +			"supported by port %d\n", res->port_id);
> +	}
> +	if ((ol_flags & TESTPMD_TX_OFFLOAD_UDP_CKSUM) &&
> +		(dev_info.tx_offload_capa & DEV_TX_OFFLOAD_UDP_CKSUM) == 0) {
> +		printf("Warning: hardware UDP checksum enabled but not "
> +			"supported by port %d\n", res->port_id);
> +	}
> +	if ((ol_flags & TESTPMD_TX_OFFLOAD_TCP_CKSUM) &&
> +		(dev_info.tx_offload_capa & DEV_TX_OFFLOAD_TCP_CKSUM) == 0) {
> +		printf("Warning: hardware TCP checksum enabled but not "
> +			"supported by port %d\n", res->port_id);
> +	}
> +	if ((ol_flags & TESTPMD_TX_OFFLOAD_SCTP_CKSUM) &&
> +		(dev_info.tx_offload_capa & DEV_TX_OFFLOAD_SCTP_CKSUM) == 0) {
> +		printf("Warning: hardware SCTP checksum enabled but not "
> +			"supported by port %d\n", res->port_id);
> +	}
>  }
> 
> -cmdline_parse_token_string_t cmd_tx_cksum_set_tx_cksum =
> -	TOKEN_STRING_INITIALIZER(struct cmd_tx_cksum_set_result,
> +cmdline_parse_token_string_t cmd_tx_cksum_tx_cksum =
> +	TOKEN_STRING_INITIALIZER(struct cmd_tx_cksum_result,
>  				tx_cksum, "tx_checksum");
> -cmdline_parse_token_string_t cmd_tx_cksum_set_set =
> -	TOKEN_STRING_INITIALIZER(struct cmd_tx_cksum_set_result,
> -				set, "set");
> -cmdline_parse_token_num_t cmd_tx_cksum_set_cksum_mask =
> -	TOKEN_NUM_INITIALIZER(struct cmd_tx_cksum_set_result,
> -				cksum_mask, UINT8);
> -cmdline_parse_token_num_t cmd_tx_cksum_set_portid =
> -	TOKEN_NUM_INITIALIZER(struct cmd_tx_cksum_set_result,
> +cmdline_parse_token_string_t cmd_tx_cksum_mode =
> +	TOKEN_STRING_INITIALIZER(struct cmd_tx_cksum_result,
> +				mode, "set");
> +cmdline_parse_token_string_t cmd_tx_cksum_proto =
> +	TOKEN_STRING_INITIALIZER(struct cmd_tx_cksum_result,
> +				proto, "ip#tcp#udp#sctp#vxlan");
> +cmdline_parse_token_string_t cmd_tx_cksum_hwsw =
> +	TOKEN_STRING_INITIALIZER(struct cmd_tx_cksum_result,
> +				hwsw, "hw#sw");
> +cmdline_parse_token_num_t cmd_tx_cksum_portid =
> +	TOKEN_NUM_INITIALIZER(struct cmd_tx_cksum_result,
>  				port_id, UINT8);
> 
>  cmdline_parse_inst_t cmd_tx_cksum_set = {
> -	.f = cmd_tx_cksum_set_parsed,
> +	.f = cmd_tx_cksum_parsed,
> +	.data = NULL,
> +	.help_str = "enable/disable hardware calculation of L3/L4 checksum when "
> +		"using csum forward engine: tx_cksum set ip|tcp|udp|sctp|vxlan hw|sw <port>",
> +	.tokens = {
> +		(void *)&cmd_tx_cksum_tx_cksum,
> +		(void *)&cmd_tx_cksum_mode,
> +		(void *)&cmd_tx_cksum_proto,
> +		(void *)&cmd_tx_cksum_hwsw,
> +		(void *)&cmd_tx_cksum_portid,
> +		NULL,
> +	},
> +};
> +
> +cmdline_parse_token_string_t cmd_tx_cksum_mode_show =
> +	TOKEN_STRING_INITIALIZER(struct cmd_tx_cksum_result,
> +				mode, "show");
> +
> +cmdline_parse_inst_t cmd_tx_cksum_show = {
> +	.f = cmd_tx_cksum_parsed,
>  	.data = NULL,
> -	.help_str = "enable hardware insertion of L3/L4checksum with a given "
> -	"mask in packets sent on a port, the bit mapping is given as, Bit 0 for ip, "
> -	"Bit 1 for UDP, Bit 2 for TCP, Bit 3 for SCTP, Bit 4 for inner ip, "
> -	"Bit 5 for inner UDP, Bit 6 for inner TCP, Bit 7 for inner SCTP",
> +	.help_str = "show checksum offload configuration: tx_cksum show <port>",
>  	.tokens = {
> -		(void *)&cmd_tx_cksum_set_tx_cksum,
> -		(void *)&cmd_tx_cksum_set_set,
> -		(void *)&cmd_tx_cksum_set_cksum_mask,
> -		(void *)&cmd_tx_cksum_set_portid,
> +		(void *)&cmd_tx_cksum_tx_cksum,
> +		(void *)&cmd_tx_cksum_mode_show,
> +		(void *)&cmd_tx_cksum_portid,
>  		NULL,
>  	},
>  };
> @@ -7796,6 +7874,7 @@ cmdline_parse_ctx_t main_ctx[] = {
>  	(cmdline_parse_inst_t *)&cmd_tx_vlan_reset,
>  	(cmdline_parse_inst_t *)&cmd_tx_vlan_set_pvid,
>  	(cmdline_parse_inst_t *)&cmd_tx_cksum_set,
> +	(cmdline_parse_inst_t *)&cmd_tx_cksum_show,
>  	(cmdline_parse_inst_t *)&cmd_link_flow_control_set,
>  	(cmdline_parse_inst_t *)&cmd_link_flow_control_set_rx,
>  	(cmdline_parse_inst_t *)&cmd_link_flow_control_set_tx,
> diff --git a/app/test-pmd/config.c b/app/test-pmd/config.c
> index 4b6fb91..6b234f6 100644
> --- a/app/test-pmd/config.c
> +++ b/app/test-pmd/config.c
> @@ -1748,17 +1748,6 @@ set_qmap(portid_t port_id, uint8_t is_rx, uint16_t queue_id, uint8_t map_value)
>  }
> 
>  void
> -tx_cksum_set(portid_t port_id, uint64_t ol_flags)
> -{
> -	uint64_t tx_ol_flags;
> -	if (port_id_is_invalid(port_id))
> -		return;
> -	/* Clear last 8 bits and then set L3/4 checksum mask again */
> -	tx_ol_flags = ports[port_id].tx_ol_flags & (~0x0FFull);
> -	ports[port_id].tx_ol_flags = ((ol_flags & 0xff) | tx_ol_flags);
> -}
> -
> -void
>  fdir_add_signature_filter(portid_t port_id, uint8_t queue_id,
>  			  struct rte_fdir_filter *fdir_filter)
>  {
> diff --git a/app/test-pmd/csumonly.c b/app/test-pmd/csumonly.c
> index 743094a..abc525c 100644
> --- a/app/test-pmd/csumonly.c
> +++ b/app/test-pmd/csumonly.c
> @@ -73,13 +73,19 @@
>  #include <rte_string_fns.h>
>  #include "testpmd.h"
> 
> -
> -
>  #define IP_DEFTTL  64   /* from RFC 1340. */
>  #define IP_VERSION 0x40
>  #define IP_HDRLEN  0x05 /* default IP header length == five 32-bits words. */
> #define IP_VHL_DEF (IP_VERSION | IP_HDRLEN)
> 
> +/* we cannot use htons() from arpa/inet.h due to name conflicts, and we
> + * cannot use rte_cpu_to_be_16() on a constant in a switch/case */
> +#if __BYTE_ORDER == __LITTLE_ENDIAN
> +#define _htons(x) ((uint16_t)((((x) & 0x00ffU) << 8) | (((x) & 0xff00U) >> 8)))
> +#else
> +#define _htons(x) (x)
> +#endif
> +
>  static inline uint16_t
>  get_16b_sum(uint16_t *ptr16, uint32_t nr)
>  {
> @@ -112,7 +118,7 @@ get_ipv4_cksum(struct ipv4_hdr *ipv4_hdr)
> 
> 
>  static inline uint16_t
> -get_ipv4_psd_sum (struct ipv4_hdr * ip_hdr)
> +get_ipv4_psd_sum(struct ipv4_hdr *ip_hdr)
>  {
>  	/* Pseudo Header for IPv4/UDP/TCP checksum */
>  	union ipv4_psd_header {
> @@ -136,7 +142,7 @@ get_ipv4_psd_sum (struct ipv4_hdr * ip_hdr)
>  }
> 
>  static inline uint16_t
> -get_ipv6_psd_sum (struct ipv6_hdr * ip_hdr)
> +get_ipv6_psd_sum(struct ipv6_hdr *ip_hdr)
>  {
>  	/* Pseudo Header for IPv6/UDP/TCP checksum */
>  	union ipv6_psd_header {
> @@ -158,6 +164,15 @@ get_ipv6_psd_sum (struct ipv6_hdr * ip_hdr)
>  	return get_16b_sum(psd_hdr.u16_arr, sizeof(psd_hdr));
>  }
> 
> +static uint16_t
> +get_psd_sum(void *l3_hdr, uint16_t ethertype)
> +{
> +	if (ethertype == _htons(ETHER_TYPE_IPv4))
> +		return get_ipv4_psd_sum(l3_hdr);
> +	else /* assume ethertype == ETHER_TYPE_IPv6 */
> +		return get_ipv6_psd_sum(l3_hdr);
> +}
> +
>  static inline uint16_t
>  get_ipv4_udptcp_checksum(struct ipv4_hdr *ipv4_hdr, uint16_t *l4_hdr)
>  {
> @@ -174,7 +189,6 @@ get_ipv4_udptcp_checksum(struct ipv4_hdr *ipv4_hdr, uint16_t *l4_hdr)
>  	if (cksum == 0)
>  		cksum = 0xffff;
>  	return (uint16_t)cksum;
> -
>  }
> 
>  static inline uint16_t
> @@ -196,48 +210,218 @@ get_ipv6_udptcp_checksum(struct ipv6_hdr *ipv6_hdr, uint16_t *l4_hdr)
>  	return (uint16_t)cksum;
>  }
> 
> +static uint16_t
> +get_udptcp_checksum(void *l3_hdr, void *l4_hdr, uint16_t ethertype)
> +{
> +	if (ethertype == _htons(ETHER_TYPE_IPv4))
> +		return get_ipv4_udptcp_checksum(l3_hdr, l4_hdr);
> +	else /* assume ethertype == ETHER_TYPE_IPv6 */
> +		return get_ipv6_udptcp_checksum(l3_hdr, l4_hdr);
> +}
> 
>  /*
> - * Forwarding of packets. Change the checksum field with HW or SW methods
> - * The HW/SW method selection depends on the ol_flags on every packet
> + * Parse an ethernet header to fill the ethertype, l2_len, l3_len and
> + * ipproto. This function is able to recognize IPv4/IPv6 with one optional vlan
> + * header.
> + */
> +static void
> +parse_ethernet(struct ether_hdr *eth_hdr, uint16_t *ethertype, uint16_t *l2_len,
> +	uint16_t *l3_len, uint8_t *l4_proto)
> +{
> +	struct ipv4_hdr *ipv4_hdr;
> +	struct ipv6_hdr *ipv6_hdr;
> +
> +	*l2_len = sizeof(struct ether_hdr);
> +	*ethertype = eth_hdr->ether_type;
> +
> +	if (*ethertype == _htons(ETHER_TYPE_VLAN)) {
> +		struct vlan_hdr *vlan_hdr = (struct vlan_hdr *)(eth_hdr + 1);
> +
> +		*l2_len  += sizeof(struct vlan_hdr);
> +		*ethertype = vlan_hdr->eth_proto;
> +	}
> +
> +	switch (*ethertype) {
> +	case _htons(ETHER_TYPE_IPv4):
> +		ipv4_hdr = (struct ipv4_hdr *) ((char *)eth_hdr + *l2_len);
> +		*l3_len = (ipv4_hdr->version_ihl & 0x0f) * 4;
> +		*l4_proto = ipv4_hdr->next_proto_id;
> +		break;
> +	case _htons(ETHER_TYPE_IPv6):
> +		ipv6_hdr = (struct ipv6_hdr *) ((char *)eth_hdr + *l2_len);
> +		*l3_len = sizeof(struct ipv6_hdr) ;
> +		*l4_proto = ipv6_hdr->proto;
> +		break;
> +	default:
> +		*l3_len = 0;
> +		*l4_proto = 0;
> +		break;
> +	}
> +}
> +
> +/* modify the IPv4 or IPv6 source address of a packet */
> +static void
> +change_ip_addresses(void *l3_hdr, uint16_t ethertype)
> +{
> +	struct ipv4_hdr *ipv4_hdr = l3_hdr;
> +	struct ipv6_hdr *ipv6_hdr = l3_hdr;
> +
> +	if (ethertype == _htons(ETHER_TYPE_IPv4)) {
> +		ipv4_hdr->src_addr =
> +			rte_cpu_to_be_32(rte_be_to_cpu_32(ipv4_hdr->src_addr) + 1);
> +	}
> +	else if (ethertype == _htons(ETHER_TYPE_IPv6)) {
> +		ipv6_hdr->src_addr[15] = ipv6_hdr->src_addr[15] + 1;
> +	}
> +}
> +
> +/* if possible, calculate the checksum of a packet in hw or sw,
> + * depending on the testpmd command line configuration */
> +static uint64_t
> +process_inner_cksums(void *l3_hdr, uint16_t ethertype, uint16_t l3_len,
> +	uint8_t l4_proto, uint16_t testpmd_ol_flags)
> +{
> +	struct ipv4_hdr *ipv4_hdr = l3_hdr;
> +	struct udp_hdr *udp_hdr;
> +	struct tcp_hdr *tcp_hdr;
> +	struct sctp_hdr *sctp_hdr;
> +	uint64_t ol_flags = 0;
> +
> +	if (ethertype == _htons(ETHER_TYPE_IPv4)) {
> +		ipv4_hdr = l3_hdr;
> +		ipv4_hdr->hdr_checksum = 0;
> +
> +		if (testpmd_ol_flags & TESTPMD_TX_OFFLOAD_IP_CKSUM)
> +			ol_flags |= PKT_TX_IP_CKSUM;
> +		else
> +			ipv4_hdr->hdr_checksum = get_ipv4_cksum(ipv4_hdr);
> +
> +	}
> +	else if (ethertype != _htons(ETHER_TYPE_IPv6))
> +		return 0; /* packet type not supported nothing to do */
> +
> +	if (l4_proto == IPPROTO_UDP) {
> +		udp_hdr = (struct udp_hdr *)((char *)l3_hdr + l3_len);
> +		/* do not recalculate udp cksum if it was 0 */
> +		if (udp_hdr->dgram_cksum != 0) {
> +			udp_hdr->dgram_cksum = 0;
> +			if (testpmd_ol_flags & TESTPMD_TX_OFFLOAD_UDP_CKSUM) {
> +				ol_flags |= PKT_TX_UDP_CKSUM;
> +				udp_hdr->dgram_cksum = get_psd_sum(l3_hdr,
> +					ethertype);
> +			}
> +			else {
> +				udp_hdr->dgram_cksum =
> +					get_udptcp_checksum(l3_hdr, udp_hdr,
> +						ethertype);
> +			}
> +		}
> +	}
> +	else if (l4_proto == IPPROTO_TCP) {
> +		tcp_hdr = (struct tcp_hdr *)((char *)l3_hdr + l3_len);
> +		tcp_hdr->cksum = 0;
> +		if (testpmd_ol_flags & TESTPMD_TX_OFFLOAD_TCP_CKSUM) {
> +			ol_flags |= PKT_TX_TCP_CKSUM;
> +			tcp_hdr->cksum = get_psd_sum(l3_hdr, ethertype);
> +		}
> +		else {
> +			tcp_hdr->cksum =
> +				get_udptcp_checksum(l3_hdr, tcp_hdr, ethertype);
> +		}
> +	}
> +	else if (l4_proto == IPPROTO_SCTP) {
> +		sctp_hdr = (struct sctp_hdr *)((char *)l3_hdr + l3_len);
> +		sctp_hdr->cksum = 0;
> +		/* sctp payload must be a multiple of 4 to be
> +		 * offloaded */
> +		if ((testpmd_ol_flags & TESTPMD_TX_OFFLOAD_SCTP_CKSUM) &&
> +			((ipv4_hdr->total_length & 0x3) == 0)) {
> +			ol_flags |= PKT_TX_SCTP_CKSUM;
> +		}
> +		else {
> +			/* XXX implement CRC32c, example available in
> +			 * RFC3309 */
> +		}
> +	}
> +
> +	return ol_flags;
> +}
> +
> +/* Calculate the checksum of outer header (only vxlan is supported,
> + * meaning IP + UDP). The caller already checked that it's a vxlan
> + * packet */
> +static uint64_t
> +process_outer_cksums(void *outer_l3_hdr, uint16_t outer_ethertype,
> +	uint16_t outer_l3_len, uint16_t testpmd_ol_flags)
> +{
> +	struct ipv4_hdr *ipv4_hdr = outer_l3_hdr;
> +	struct ipv6_hdr *ipv6_hdr = outer_l3_hdr;
> +	struct udp_hdr *udp_hdr;
> +	uint64_t ol_flags = 0;
> +
> +	if (testpmd_ol_flags & TESTPMD_TX_OFFLOAD_VXLAN_CKSUM)
> +		ol_flags |= PKT_TX_IP_CKSUM;
> +
> +	if (outer_ethertype == _htons(ETHER_TYPE_IPv4)) {
> +		ipv4_hdr->hdr_checksum = 0;
> +
> +		if ((testpmd_ol_flags & TESTPMD_TX_OFFLOAD_VXLAN_CKSUM) == 0)
> +			ipv4_hdr->hdr_checksum = get_ipv4_cksum(ipv4_hdr);
> +	}
> +
> +	udp_hdr = (struct udp_hdr *)((char *)outer_l3_hdr + outer_l3_len);
> +	/* do not recalculate udp cksum if it was 0 */
> +	if (udp_hdr->dgram_cksum != 0) {
> +		udp_hdr->dgram_cksum = 0;
> +		if ((testpmd_ol_flags & TESTPMD_TX_OFFLOAD_VXLAN_CKSUM) == 0) {
> +			if (outer_ethertype == _htons(ETHER_TYPE_IPv4))
> +				udp_hdr->dgram_cksum =
> +					get_ipv4_udptcp_checksum(ipv4_hdr,
> +						(uint16_t *)udp_hdr);
> +			else
> +				udp_hdr->dgram_cksum =
> +					get_ipv6_udptcp_checksum(ipv6_hdr,
> +						(uint16_t *)udp_hdr);
> +		}
> +	}
> +
> +	return ol_flags;
> +}
> +
> +/*
> + * Receive a burst of packets, and for supported packet types:
> + *  - modify the IPs
> + *  - reprocess the checksum in SW or HW, depending on testpmd command line
> + *    configuration
> + * Then packets are transmitted on the output port.
> + *
> + * Supported packets are:
> + *   Ether / (vlan) / IP|IP6 / UDP|TCP|SCTP .
> + *   Ether / (vlan) / IP|IP6 / UDP / VxLAN / Ether / IP|IP6 / UDP|TCP|SCTP
> + *
> + * The network parser supposes that the packet is contiguous, which may
> + * not be the case in real life.
>   */
>  static void
>  pkt_burst_checksum_forward(struct fwd_stream *fs)
>  {
> -	struct rte_mbuf  *pkts_burst[MAX_PKT_BURST];
> -	struct rte_port  *txp;
> -	struct rte_mbuf  *mb;
> +	struct rte_mbuf *pkts_burst[MAX_PKT_BURST];
> +	struct rte_port *txp;
> +	struct rte_mbuf *m;
>  	struct ether_hdr *eth_hdr;
> -	struct ipv4_hdr  *ipv4_hdr;
> -	struct ether_hdr *inner_eth_hdr;
> -	struct ipv4_hdr  *inner_ipv4_hdr = NULL;
> -	struct ipv6_hdr  *ipv6_hdr;
> -	struct ipv6_hdr  *inner_ipv6_hdr = NULL;
> -	struct udp_hdr   *udp_hdr;
> -	struct udp_hdr   *inner_udp_hdr;
> -	struct tcp_hdr   *tcp_hdr;
> -	struct tcp_hdr   *inner_tcp_hdr;
> -	struct sctp_hdr  *sctp_hdr;
> -	struct sctp_hdr  *inner_sctp_hdr;
> -
> +	void *l3_hdr = NULL, *outer_l3_hdr = NULL; /* can be IPv4 or IPv6 */
> +	struct udp_hdr *udp_hdr;
>  	uint16_t nb_rx;
>  	uint16_t nb_tx;
>  	uint16_t i;
>  	uint64_t ol_flags;
> -	uint64_t pkt_ol_flags;
> -	uint64_t tx_ol_flags;
> -	uint16_t l4_proto;
> -	uint16_t inner_l4_proto = 0;
> -	uint16_t eth_type;
> -	uint8_t  l2_len;
> -	uint8_t  l3_len;
> -	uint8_t  inner_l2_len = 0;
> -	uint8_t  inner_l3_len = 0;
> -
> +	uint16_t testpmd_ol_flags;
> +	uint8_t l4_proto;
> +	uint16_t ethertype = 0, outer_ethertype = 0;
> +	uint16_t  l2_len = 0, l3_len = 0, outer_l2_len = 0, outer_l3_len = 0;
> +	int tunnel = 0;
>  	uint32_t rx_bad_ip_csum;
>  	uint32_t rx_bad_l4_csum;
> -	uint8_t  ipv4_tunnel;
> -	uint8_t  ipv6_tunnel;
> 
>  #ifdef RTE_TEST_PMD_RECORD_CORE_CYCLES
>  	uint64_t start_tsc;
> @@ -249,9 +433,7 @@ pkt_burst_checksum_forward(struct fwd_stream *fs)
>  	start_tsc = rte_rdtsc();
>  #endif
> 
> -	/*
> -	 * Receive a burst of packets and forward them.
> -	 */
> +	/* receive a burst of packet */
>  	nb_rx = rte_eth_rx_burst(fs->rx_port, fs->rx_queue, pkts_burst,
>  				 nb_pkt_per_burst);
>  	if (unlikely(nb_rx == 0))
> @@ -265,348 +447,107 @@ pkt_burst_checksum_forward(struct fwd_stream *fs)
>  	rx_bad_l4_csum = 0;
> 
>  	txp = &ports[fs->tx_port];
> -	tx_ol_flags = txp->tx_ol_flags;
> +	testpmd_ol_flags = txp->tx_ol_flags;
> 
>  	for (i = 0; i < nb_rx; i++) {
> 
> -		mb = pkts_burst[i];
> -		l2_len  = sizeof(struct ether_hdr);
> -		pkt_ol_flags = mb->ol_flags;
> -		ol_flags = (pkt_ol_flags & (~PKT_TX_L4_MASK));
> -		ipv4_tunnel = (pkt_ol_flags & PKT_RX_TUNNEL_IPV4_HDR) ?
> -				1 : 0;
> -		ipv6_tunnel = (pkt_ol_flags & PKT_RX_TUNNEL_IPV6_HDR) ?
> -				1 : 0;
> -		eth_hdr = rte_pktmbuf_mtod(mb, struct ether_hdr *);
> -		eth_type = rte_be_to_cpu_16(eth_hdr->ether_type);
> -		if (eth_type == ETHER_TYPE_VLAN) {
> -			/* Only allow single VLAN label here */
> -			l2_len  += sizeof(struct vlan_hdr);
> -			 eth_type = rte_be_to_cpu_16(*(uint16_t *)
> -				((uintptr_t)&eth_hdr->ether_type +
> -				sizeof(struct vlan_hdr)));
> +		ol_flags = 0;
> +		tunnel = 0;
> +		m = pkts_burst[i];
> +
> +		/* Update the L3/L4 checksum error packet statistics */
> +		rx_bad_ip_csum += ((m->ol_flags & PKT_RX_IP_CKSUM_BAD) != 0);
> +		rx_bad_l4_csum += ((m->ol_flags & PKT_RX_L4_CKSUM_BAD) != 0);
> +
> +		/* step 1: dissect packet, parsing optional vlan, ip4/ip6, vxlan
> +		 * and inner headers */
> +
> +		eth_hdr = rte_pktmbuf_mtod(m, struct ether_hdr *);
> +		parse_ethernet(eth_hdr, &ethertype, &l2_len, &l3_len, &l4_proto);
> +		l3_hdr = (char *)eth_hdr + l2_len;
> +
> +		/* check if it's a supported tunnel (only vxlan for now) */
> +		if (l4_proto == IPPROTO_UDP) {
> +			udp_hdr = (struct udp_hdr *)((char *)l3_hdr + l3_len);
> +
> +			/* currently, this flag is set by i40e only if the
> +			 * packet is vxlan */
> +			if (((m->ol_flags & PKT_RX_TUNNEL_IPV4_HDR) ||
> +					(m->ol_flags & PKT_RX_TUNNEL_IPV6_HDR)))
> +				tunnel = 1;
> +			/* else check udp destination port, 4789 is the default
> +			 * vxlan port (rfc7348) */
> +			else if (udp_hdr->dst_port == _htons(4789))
> +				tunnel = 1;
> +
> +			if (tunnel == 1) {
> +				outer_ethertype = ethertype;
> +				outer_l2_len = l2_len;
> +				outer_l3_len = l3_len;
> +				outer_l3_hdr = l3_hdr;
> +
> +				eth_hdr = (struct ether_hdr *)((char *)udp_hdr +
> +					sizeof(struct udp_hdr) +
> +					sizeof(struct vxlan_hdr));
> +
> +				parse_ethernet(eth_hdr, &ethertype, &l2_len,
> +					&l3_len, &l4_proto);
> +				l3_hdr = (char *)eth_hdr + l2_len;
> +			}
>  		}
> 
> -		/* Update the L3/L4 checksum error packet count  */
> -		rx_bad_ip_csum += (uint16_t) ((pkt_ol_flags &
> PKT_RX_IP_CKSUM_BAD) != 0);
> -		rx_bad_l4_csum += (uint16_t) ((pkt_ol_flags &
> PKT_RX_L4_CKSUM_BAD) != 0);
> -
> -		/*
> -		 * Try to figure out L3 packet type by SW.
> -		 */
> -		if ((pkt_ol_flags & (PKT_RX_IPV4_HDR | PKT_RX_IPV4_HDR_EXT
> |
> -				PKT_RX_IPV6_HDR | PKT_RX_IPV6_HDR_EXT))
> == 0) {
> -			if (eth_type == ETHER_TYPE_IPv4)
> -				pkt_ol_flags |= PKT_RX_IPV4_HDR;
> -			else if (eth_type == ETHER_TYPE_IPv6)
> -				pkt_ol_flags |= PKT_RX_IPV6_HDR;
> -		}
> +		/* step 2: change all source IPs (v4 or v6) so we need
> +		 * to recompute the chksums even if they were correct */
> 
> -		/*
> -		 * Simplify the protocol parsing
> -		 * Assuming the incoming packets format as
> -		 *      Ethernet2 + optional single VLAN
> -		 *      + ipv4 or ipv6
> -		 *      + udp or tcp or sctp or others
> -		 */
> -		if (pkt_ol_flags & (PKT_RX_IPV4_HDR |
> PKT_RX_TUNNEL_IPV4_HDR)) {
> +		change_ip_addresses(l3_hdr, ethertype);
> +		if (tunnel == 1)
> +			change_ip_addresses(outer_l3_hdr, outer_ethertype);
> 
> -			/* Do not support ipv4 option field */
> -			l3_len = sizeof(struct ipv4_hdr) ;
> +		/* step 3: depending on user command line configuration,
> +		 * recompute checksum either in software or flag the
> +		 * mbuf to offload the calculation to the NIC */
> 
> -			ipv4_hdr = (struct ipv4_hdr *) (rte_pktmbuf_mtod(mb,
> -					unsigned char *) + l2_len);
> +		/* process checksums of inner headers first */
> +		ol_flags |= process_inner_cksums(l3_hdr, ethertype,
> +			l3_len, l4_proto, testpmd_ol_flags);
> 
> -			l4_proto = ipv4_hdr->next_proto_id;
> +		/* Then process outer headers if any. Note that the software
> +		 * checksum will be wrong if one of the inner checksums is
> +		 * processed in hardware. */
> +		if (tunnel == 1) {
> +			ol_flags |= process_outer_cksums(outer_l3_hdr,
> +				outer_ethertype, outer_l3_len, testpmd_ol_flags);
> +		}
> 
> -			/* Do not delete, this is required by HW*/
> -			ipv4_hdr->hdr_checksum = 0;
> +		/* step 4: fill the mbuf meta data (flags and header lengths) */
> 
> -			if (tx_ol_flags & TESTPMD_TX_OFFLOAD_IP_CKSUM) {
> -				/* HW checksum */
> -				ol_flags |= PKT_TX_IP_CKSUM;
> +		if (tunnel == 1) {
> +			if (testpmd_ol_flags & TESTPMD_TX_OFFLOAD_VXLAN_CKSUM) {
> +				m->l2_len = outer_l2_len;
> +				m->l3_len = outer_l3_len;
> +				m->inner_l2_len = l2_len;
> +				m->inner_l3_len = l3_len;
>  			}
>  			else {
> -				ol_flags |= PKT_TX_IPV4;
> -				/* SW checksum calculation */
> -				ipv4_hdr->src_addr++;
> -				ipv4_hdr->hdr_checksum =
> get_ipv4_cksum(ipv4_hdr);
> +				/* if we don't do vxlan cksum in hw,
> +				   outer checksum will be wrong because
> +				   we changed the ip, but it shows that
> +				   we can process the inner header cksum
> +				   in the nic */
> +				m->l2_len = outer_l2_len + outer_l3_len +
> +					sizeof(struct udp_hdr) +
> +					sizeof(struct vxlan_hdr) + l2_len;
> +				m->l3_len = l3_len;
>  			}
> -
> -			if (l4_proto == IPPROTO_UDP) {
> -				udp_hdr = (struct udp_hdr*)
> (rte_pktmbuf_mtod(mb,
> -						unsigned char *) + l2_len +
> l3_len);
> -				if (tx_ol_flags &
> TESTPMD_TX_OFFLOAD_UDP_CKSUM) {
> -					/* HW Offload */
> -					ol_flags |= PKT_TX_UDP_CKSUM;
> -					if (ipv4_tunnel)
> -						udp_hdr->dgram_cksum = 0;
> -					else
> -						/* Pseudo header sum need be
> set properly */
> -						udp_hdr->dgram_cksum =
> -
> 	get_ipv4_psd_sum(ipv4_hdr);
> -				}
> -				else {
> -					/* SW Implementation, clear checksum
> field first */
> -					udp_hdr->dgram_cksum = 0;
> -					udp_hdr->dgram_cksum =
> get_ipv4_udptcp_checksum(ipv4_hdr,
> -
> 	(uint16_t *)udp_hdr);
> -				}
> -
> -				if (ipv4_tunnel) {
> -
> -					uint16_t len;
> -
> -					/* Check if inner L3/L4 checkum flag is
> set */
> -					if (tx_ol_flags &
> TESTPMD_TX_OFFLOAD_INNER_CKSUM_MASK)
> -						ol_flags |=
> PKT_TX_VXLAN_CKSUM;
> -
> -					inner_l2_len  = sizeof(struct ether_hdr);
> -					inner_eth_hdr = (struct ether_hdr *)
> (rte_pktmbuf_mtod(mb,
> -								unsigned char
> *) + l2_len + l3_len
> -								 +
> ETHER_VXLAN_HLEN);
> -
> -					eth_type =
> rte_be_to_cpu_16(inner_eth_hdr->ether_type);
> -					if (eth_type == ETHER_TYPE_VLAN) {
> -						inner_l2_len += sizeof(struct
> vlan_hdr);
> -						eth_type =
> rte_be_to_cpu_16(*(uint16_t *)
> -							((uintptr_t)&eth_hdr-
> >ether_type +
> -								sizeof(struct
> vlan_hdr)));
> -					}
> -
> -					len = l2_len + l3_len +
> ETHER_VXLAN_HLEN + inner_l2_len;
> -					if (eth_type == ETHER_TYPE_IPv4) {
> -						inner_l3_len = sizeof(struct
> ipv4_hdr);
> -						inner_ipv4_hdr = (struct
> ipv4_hdr *) (rte_pktmbuf_mtod(mb,
> -								unsigned char
> *) + len);
> -						inner_l4_proto =
> inner_ipv4_hdr->next_proto_id;
> -
> -						if (tx_ol_flags &
> TESTPMD_TX_OFFLOAD_INNER_IP_CKSUM) {
> -
> -							/* Do not delete, this is
> required by HW*/
> -							inner_ipv4_hdr-
> >hdr_checksum = 0;
> -							ol_flags |=
> PKT_TX_IPV4_CSUM;
> -						}
> -
> -					} else if (eth_type ==
> ETHER_TYPE_IPv6) {
> -						inner_l3_len = sizeof(struct
> ipv6_hdr);
> -						inner_ipv6_hdr = (struct
> ipv6_hdr *) (rte_pktmbuf_mtod(mb,
> -								unsigned char
> *) + len);
> -						inner_l4_proto =
> inner_ipv6_hdr->proto;
> -					}
> -					if ((inner_l4_proto == IPPROTO_UDP)
> &&
> -						(tx_ol_flags &
> TESTPMD_TX_OFFLOAD_INNER_UDP_CKSUM)) {
> -
> -						/* HW Offload */
> -						ol_flags |=
> PKT_TX_UDP_CKSUM;
> -						inner_udp_hdr = (struct
> udp_hdr *) (rte_pktmbuf_mtod(mb,
> -								unsigned char
> *) + len + inner_l3_len);
> -						if (eth_type ==
> ETHER_TYPE_IPv4)
> -							inner_udp_hdr-
> >dgram_cksum = get_ipv4_psd_sum(inner_ipv4_hdr);
> -						else if (eth_type ==
> ETHER_TYPE_IPv6)
> -							inner_udp_hdr-
> >dgram_cksum = get_ipv6_psd_sum(inner_ipv6_hdr);
> -
> -					} else if ((inner_l4_proto ==
> IPPROTO_TCP) &&
> -						(tx_ol_flags &
> TESTPMD_TX_OFFLOAD_INNER_TCP_CKSUM)) {
> -						/* HW Offload */
> -						ol_flags |=
> PKT_TX_TCP_CKSUM;
> -						inner_tcp_hdr = (struct tcp_hdr
> *) (rte_pktmbuf_mtod(mb,
> -								unsigned char
> *) + len + inner_l3_len);
> -						if (eth_type ==
> ETHER_TYPE_IPv4)
> -							inner_tcp_hdr->cksum
> = get_ipv4_psd_sum(inner_ipv4_hdr);
> -						else if (eth_type ==
> ETHER_TYPE_IPv6)
> -							inner_tcp_hdr->cksum
> = get_ipv6_psd_sum(inner_ipv6_hdr);
> -					} else if ((inner_l4_proto ==
> IPPROTO_SCTP) &&
> -						(tx_ol_flags &
> TESTPMD_TX_OFFLOAD_INNER_SCTP_CKSUM)) {
> -						/* HW Offload */
> -						ol_flags |=
> PKT_TX_SCTP_CKSUM;
> -						inner_sctp_hdr = (struct
> sctp_hdr *) (rte_pktmbuf_mtod(mb,
> -								unsigned char
> *) + len + inner_l3_len);
> -						inner_sctp_hdr->cksum = 0;
> -					}
> -
> -				}
> -
> -			} else if (l4_proto == IPPROTO_TCP) {
> -				tcp_hdr = (struct tcp_hdr*)
> (rte_pktmbuf_mtod(mb,
> -						unsigned char *) + l2_len +
> l3_len);
> -				if (tx_ol_flags &
> TESTPMD_TX_OFFLOAD_TCP_CKSUM) {
> -					ol_flags |= PKT_TX_TCP_CKSUM;
> -					tcp_hdr->cksum =
> get_ipv4_psd_sum(ipv4_hdr);
> -				}
> -				else {
> -					tcp_hdr->cksum = 0;
> -					tcp_hdr->cksum =
> get_ipv4_udptcp_checksum(ipv4_hdr,
> -							(uint16_t*)tcp_hdr);
> -				}
> -			} else if (l4_proto == IPPROTO_SCTP) {
> -				sctp_hdr = (struct sctp_hdr*)
> (rte_pktmbuf_mtod(mb,
> -						unsigned char *) + l2_len +
> l3_len);
> -
> -				if (tx_ol_flags &
> TESTPMD_TX_OFFLOAD_SCTP_CKSUM) {
> -					ol_flags |= PKT_TX_SCTP_CKSUM;
> -					sctp_hdr->cksum = 0;
> -
> -					/* Sanity check, only number of 4 bytes
> supported */
> -					if ((rte_be_to_cpu_16(ipv4_hdr-
> >total_length) % 4) != 0)
> -						printf("sctp payload must be a
> multiple "
> -							"of 4 bytes for
> checksum offload");
> -				}
> -				else {
> -					sctp_hdr->cksum = 0;
> -					/* CRC32c sample code available in
> RFC3309 */
> -				}
> -			}
> -			/* End of L4 Handling*/
> -		} else if (pkt_ol_flags & (PKT_RX_IPV6_HDR |
> PKT_RX_TUNNEL_IPV6_HDR)) {
> -			ipv6_hdr = (struct ipv6_hdr *) (rte_pktmbuf_mtod(mb,
> -					unsigned char *) + l2_len);
> -			l3_len = sizeof(struct ipv6_hdr) ;
> -			l4_proto = ipv6_hdr->proto;
> -			ol_flags |= PKT_TX_IPV6;
> -
> -			if (l4_proto == IPPROTO_UDP) {
> -				udp_hdr = (struct udp_hdr*)
> (rte_pktmbuf_mtod(mb,
> -						unsigned char *) + l2_len +
> l3_len);
> -				if (tx_ol_flags &
> TESTPMD_TX_OFFLOAD_UDP_CKSUM) {
> -					/* HW Offload */
> -					ol_flags |= PKT_TX_UDP_CKSUM;
> -					if (ipv6_tunnel)
> -						udp_hdr->dgram_cksum = 0;
> -					else
> -						udp_hdr->dgram_cksum =
> -
> 	get_ipv6_psd_sum(ipv6_hdr);
> -				}
> -				else {
> -					/* SW Implementation */
> -					/* checksum field need be clear first */
> -					udp_hdr->dgram_cksum = 0;
> -					udp_hdr->dgram_cksum =
> get_ipv6_udptcp_checksum(ipv6_hdr,
> -								(uint16_t
> *)udp_hdr);
> -				}
> -
> -				if (ipv6_tunnel) {
> -
> -					uint16_t len;
> -
> -					/* Check if inner L3/L4 checksum flag is
> set */
> -					if (tx_ol_flags &
> TESTPMD_TX_OFFLOAD_INNER_CKSUM_MASK)
> -						ol_flags |=
> PKT_TX_VXLAN_CKSUM;
> -
> -					inner_l2_len  = sizeof(struct ether_hdr);
> -					inner_eth_hdr = (struct ether_hdr *)
> (rte_pktmbuf_mtod(mb,
> -						unsigned char *) + l2_len +
> l3_len + ETHER_VXLAN_HLEN);
> -					eth_type =
> rte_be_to_cpu_16(inner_eth_hdr->ether_type);
> -
> -					if (eth_type == ETHER_TYPE_VLAN) {
> -						inner_l2_len += sizeof(struct
> vlan_hdr);
> -						eth_type =
> rte_be_to_cpu_16(*(uint16_t *)
> -							((uintptr_t)&eth_hdr-
> >ether_type +
> -							sizeof(struct
> vlan_hdr)));
> -					}
> -
> -					len = l2_len + l3_len +
> ETHER_VXLAN_HLEN + inner_l2_len;
> -
> -					if (eth_type == ETHER_TYPE_IPv4) {
> -						inner_l3_len = sizeof(struct
> ipv4_hdr);
> -						inner_ipv4_hdr = (struct
> ipv4_hdr *) (rte_pktmbuf_mtod(mb,
> -								unsigned char
> *) + len);
> -						inner_l4_proto =
> inner_ipv4_hdr->next_proto_id;
> -
> -						/* HW offload */
> -						if (tx_ol_flags &
> TESTPMD_TX_OFFLOAD_INNER_IP_CKSUM) {
> -
> -							/* Do not delete, this is
> required by HW*/
> -							inner_ipv4_hdr-
> >hdr_checksum = 0;
> -							ol_flags |=
> PKT_TX_IPV4_CSUM;
> -						}
> -					} else if (eth_type ==
> ETHER_TYPE_IPv6) {
> -						inner_l3_len = sizeof(struct
> ipv6_hdr);
> -						inner_ipv6_hdr = (struct
> ipv6_hdr *) (rte_pktmbuf_mtod(mb,
> -							unsigned char *) + len);
> -						inner_l4_proto =
> inner_ipv6_hdr->proto;
> -					}
> -
> -					if ((inner_l4_proto == IPPROTO_UDP)
> &&
> -						(tx_ol_flags &
> TESTPMD_TX_OFFLOAD_INNER_UDP_CKSUM)) {
> -						inner_udp_hdr = (struct
> udp_hdr *) (rte_pktmbuf_mtod(mb,
> -							unsigned char *) + len
> + inner_l3_len);
> -						/* HW offload */
> -						ol_flags |=
> PKT_TX_UDP_CKSUM;
> -						inner_udp_hdr->dgram_cksum
> = 0;
> -						if (eth_type ==
> ETHER_TYPE_IPv4)
> -							inner_udp_hdr-
> >dgram_cksum = get_ipv4_psd_sum(inner_ipv4_hdr);
> -						else if (eth_type ==
> ETHER_TYPE_IPv6)
> -							inner_udp_hdr-
> >dgram_cksum = get_ipv6_psd_sum(inner_ipv6_hdr);
> -					} else if ((inner_l4_proto ==
> IPPROTO_TCP) &&
> -						(tx_ol_flags &
> TESTPMD_TX_OFFLOAD_INNER_TCP_CKSUM)) {
> -						/* HW offload */
> -						ol_flags |=
> PKT_TX_TCP_CKSUM;
> -						inner_tcp_hdr = (struct tcp_hdr
> *) (rte_pktmbuf_mtod(mb,
> -								unsigned char
> *) + len + inner_l3_len);
> -
> -						if (eth_type ==
> ETHER_TYPE_IPv4)
> -							inner_tcp_hdr->cksum
> = get_ipv4_psd_sum(inner_ipv4_hdr);
> -						else if (eth_type ==
> ETHER_TYPE_IPv6)
> -							inner_tcp_hdr->cksum
> = get_ipv6_psd_sum(inner_ipv6_hdr);
> -
> -					} else if ((inner_l4_proto ==
> IPPROTO_SCTP) &&
> -						(tx_ol_flags &
> TESTPMD_TX_OFFLOAD_INNER_SCTP_CKSUM)) {
> -						/* HW offload */
> -						ol_flags |=
> PKT_TX_SCTP_CKSUM;
> -						inner_sctp_hdr = (struct
> sctp_hdr *) (rte_pktmbuf_mtod(mb,
> -								unsigned char
> *) + len + inner_l3_len);
> -						inner_sctp_hdr->cksum = 0;
> -					}
> -
> -				}
> -
> -			}
> -			else if (l4_proto == IPPROTO_TCP) {
> -				tcp_hdr = (struct tcp_hdr*)
> (rte_pktmbuf_mtod(mb,
> -						unsigned char *) + l2_len +
> l3_len);
> -				if (tx_ol_flags &
> TESTPMD_TX_OFFLOAD_TCP_CKSUM) {
> -					ol_flags |= PKT_TX_TCP_CKSUM;
> -					tcp_hdr->cksum =
> get_ipv6_psd_sum(ipv6_hdr);
> -				}
> -				else {
> -					tcp_hdr->cksum = 0;
> -					tcp_hdr->cksum =
> get_ipv6_udptcp_checksum(ipv6_hdr,
> -							(uint16_t*)tcp_hdr);
> -				}
> -			}
> -			else if (l4_proto == IPPROTO_SCTP) {
> -				sctp_hdr = (struct sctp_hdr*)
> (rte_pktmbuf_mtod(mb,
> -						unsigned char *) + l2_len +
> l3_len);
> -
> -				if (tx_ol_flags &
> TESTPMD_TX_OFFLOAD_SCTP_CKSUM) {
> -					ol_flags |= PKT_TX_SCTP_CKSUM;
> -					sctp_hdr->cksum = 0;
> -					/* Sanity check, only number of 4 bytes
> supported by HW */
> -					if ((rte_be_to_cpu_16(ipv6_hdr-
> >payload_len) % 4) != 0)
> -						printf("sctp payload must be a
> multiple "
> -							"of 4 bytes for
> checksum offload");
> -				}
> -				else {
> -					/* CRC32c sample code available in
> RFC3309 */
> -					sctp_hdr->cksum = 0;
> -				}
> -			} else {
> -				printf("Test flow control for 1G PMD \n");
> -			}
> -			/* End of L6 Handling*/
> -		}
> -		else {
> -			l3_len = 0;
> -			printf("Unhandled packet type: %#hx\n", eth_type);
> +		} else {
> +			/* this is only useful if an offload flag is
> +			 * set, but it does not hurt to fill it in any
> +			 * case */
> +			m->l2_len = l2_len;
> +			m->l3_len = l3_len;
>  		}
> +		m->ol_flags = ol_flags;
> 
> -		/* Combine the packet header write. VLAN is not consider here
> */
> -		mb->l2_len = l2_len;
> -		mb->l3_len = l3_len;
> -		mb->inner_l2_len = inner_l2_len;
> -		mb->inner_l3_len = inner_l3_len;
> -		mb->ol_flags = ol_flags;
>  	}
>  	nb_tx = rte_eth_tx_burst(fs->tx_port, fs->tx_queue, pkts_burst, nb_rx);
>  	fs->tx_packets += nb_tx;
> @@ -629,7 +570,6 @@ pkt_burst_checksum_forward(struct fwd_stream *fs)
>  #endif
>  }
> 
> -
>  struct fwd_engine csum_fwd_engine = {
>  	.fwd_mode_name  = "csum",
>  	.port_fwd_begin = NULL,
> diff --git a/app/test-pmd/testpmd.h b/app/test-pmd/testpmd.h
> index 82af2bd..c753d37 100644
> --- a/app/test-pmd/testpmd.h
> +++ b/app/test-pmd/testpmd.h
> @@ -131,18 +131,11 @@ struct fwd_stream {
>  #define TESTPMD_TX_OFFLOAD_TCP_CKSUM         0x0004
>  /** Offload SCTP checksum in csum forward engine */
>  #define TESTPMD_TX_OFFLOAD_SCTP_CKSUM        0x0008
> -/** Offload inner IP checksum in csum forward engine */
> -#define TESTPMD_TX_OFFLOAD_INNER_IP_CKSUM    0x0010
> -/** Offload inner UDP checksum in csum forward engine */
> -#define TESTPMD_TX_OFFLOAD_INNER_UDP_CKSUM   0x0020
> -/** Offload inner TCP checksum in csum forward engine */
> -#define TESTPMD_TX_OFFLOAD_INNER_TCP_CKSUM   0x0040
> -/** Offload inner SCTP checksum in csum forward engine */
> -#define TESTPMD_TX_OFFLOAD_INNER_SCTP_CKSUM  0x0080
> -/** Offload inner IP checksum mask */
> -#define TESTPMD_TX_OFFLOAD_INNER_CKSUM_MASK  0x00F0
> +/** Offload VxLAN checksum in csum forward engine */
> +#define TESTPMD_TX_OFFLOAD_VXLAN_CKSUM       0x0010
>  /** Insert VLAN header in forward engine */
> -#define TESTPMD_TX_OFFLOAD_INSERT_VLAN       0x0100
> +#define TESTPMD_TX_OFFLOAD_INSERT_VLAN       0x0020
> +
>  /**
>   * The data structure associated with each port.
>   */
> @@ -510,8 +503,6 @@ void tx_vlan_pvid_set(portid_t port_id, uint16_t vlan_id, int on);
> 
>  void set_qmap(portid_t port_id, uint8_t is_rx, uint16_t queue_id, uint8_t map_value);
> 
> -void tx_cksum_set(portid_t port_id, uint64_t ol_flags);
> -
>  void set_verbose_level(uint16_t vb_level);
>  void set_tx_pkt_segments(unsigned *seg_lengths, unsigned nb_segs);
>  void set_nb_pkt_per_burst(uint16_t pkt_burst);
> --
> 2.1.0


* Re: [dpdk-dev] [PATCH 00/12] add TSO support
  2014-11-10 15:59 [dpdk-dev] [PATCH 00/12] add TSO support Olivier Matz
                   ` (11 preceding siblings ...)
  2014-11-10 15:59 ` [dpdk-dev] [PATCH 12/12] testpmd: add a verbose mode " Olivier Matz
@ 2014-11-11  9:21 ` Olivier MATZ
  2014-11-11  9:48   ` Olivier MATZ
  2014-11-14 17:03 ` [dpdk-dev] [PATCH v2 00/13] " Olivier Matz
  13 siblings, 1 reply; 112+ messages in thread
From: Olivier MATZ @ 2014-11-11  9:21 UTC (permalink / raw)
  To: dev; +Cc: jigsaw

This is the test report for the new TSO feature. The tests were run with
testpmd built for the x86_64-native-linuxapp-gcc target.

Platform:

  Tester (linux)   <---->   DUT (DPDK on westmere)
         ixgbe6             port0 (ixgbe)

Run testpmd on DUT:

  cd dpdk.org/
  make install T=x86_64-native-linuxapp-gcc
  cd x86_64-native-linuxapp-gcc/
  modprobe uio
  insmod kmod/igb_uio.ko
  python ../tools/dpdk_nic_bind.py -b igb_uio 0000:02:00.0
  echo 0 > /proc/sys/kernel/randomize_va_space
  echo 1000 > /sys/devices/system/node/node0/hugepages/hugepages-2048kB/nr_hugepages
  echo 1000 > /sys/devices/system/node/node1/hugepages/hugepages-2048kB/nr_hugepages
  mount -t hugetlbfs none /mnt/huge
  ./app/testpmd -c 0x55 -n 4 -m 800 -- -i --port-topology=chained --enable-rx-cksum

Disable all offload features on the tester, and start the capture:

  ethtool -K ixgbe6 rx off tx off tso off gso off gro off lro off
  ip l set ixgbe6 up
  tcpdump -n -e -i ixgbe6 -s 0 -w /tmp/cap

We use the following scapy script for testing (note: vxlan was not
tested because I have no i40e on my platform, but at least the test
scripts are provided if someone wants to check it):

class VXLAN(Packet):
    name = 'VXLAN'
    fields_desc = [
        FlagsField('flags', default=1 << 3, size=8,
            names=['R', 'R', 'R', 'R', 'I', 'R', 'R', 'R']),
        XBitField('reserved1', default=0x000000, size=24),
        BitField('vni', None, size=24),
        XBitField('reserved2', default=0x00, size=8),
    ]
    overload_fields = {
        UDP: {'sport': 4789, 'dport': 4789},
    }
    def mysummary(self):
        return self.sprintf("VXLAN (vni=%VXLAN.vni%)")

bind_layers(UDP, VXLAN, dport=4789)
bind_layers(VXLAN, Ether)

def test_v4(iface, macdst):
  macsrc = get_if_hwaddr(iface)
  v4 = Ether(dst=macdst, src=macsrc)/IP(src=RandIP(), dst=RandIP())
  # valid TCP packet
  p=v4/TCP(flags=0x10)/Raw(RandString(50))
  sendp(p, iface=iface, count=5)
  # valid UDP packet
  p=v4/UDP()/Raw(RandString(50))
  sendp(p, iface=iface, count=5)
  # bad IP checksum
  p=v4/TCP(flags=0x10)/Raw(RandString(50))
  p[IP].chksum=0x1234
  sendp(p, iface=iface, count=5)
  # bad TCP checksum
  p=v4/TCP(flags=0x10, chksum=0x1234)/Raw(RandString(50))
  sendp(p, iface=iface, count=5)
  # large packet
  p=v4/TCP(flags=0x10)/Raw(RandString(1400))
  sendp(p, iface=iface, count=5)

def test_v6(iface, macdst):
  macsrc = get_if_hwaddr(iface)
  v6 = Ether(dst=macdst, src=macsrc)/IPv6(src=RandIP6(), dst=RandIP6())
  # checksum TCP
  p=v6/TCP(flags=0x10)/Raw(RandString(50))
  sendp(p, iface=iface, count=5)
  # checksum UDP
  p=v6/UDP()/Raw(RandString(50))
  sendp(p, iface=iface, count=5)
  # bad TCP checksum
  p=v6/TCP(flags=0x10, chksum=0x1234)/Raw(RandString(50))
  sendp(p, iface=iface, count=5)
  # large packet
  p=v6/TCP(flags=0x10)/Raw(RandString(1400))
  sendp(p, iface=iface, count=5)

def test_vxlan(iface, macdst):
  macsrc = get_if_hwaddr(iface)
  vxlan = Ether(dst=macdst, src=macsrc)/IP(src=RandIP(), dst=RandIP())
  vxlan /= UDP()/VXLAN(vni=1234)/Ether(dst=macdst, src=macsrc)
  vxlan /= IP(src=RandIP(), dst=RandIP())
  # valid packet
  p=vxlan/TCP(flags=0x10)/Raw(RandString(50))
  sendp(p, iface=iface, count=5)
  # bad IP checksum
  p=vxlan/TCP(flags=0x10)/Raw(RandString(50))
  p[IP].payload[IP].chksum=0x1234 # inner header
  sendp(p, iface=iface, count=5)
  # bad TCP checksum
  p=vxlan/TCP(flags=0x10, chksum=0x1234)/Raw(RandString(50))
  sendp(p, iface=iface, count=5)
  # large TCP packet, no UDP checksum on outer
  p=vxlan/TCP(flags=0x10)/Raw(RandString(1400))
  p[UDP].chksum = 0
  sendp(p, iface=iface, count=5)

test_v4("ixgbe6", "00:1B:21:8E:B2:30")
test_v6("ixgbe6", "00:1B:21:8E:B2:30")
test_vxlan("ixgbe6", "00:1B:21:8E:B2:30")
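
For reference, the 8-byte VXLAN header described by the VXLAN class above
can also be packed by hand. This is only a minimal sketch using the Python
standard library, following the RFC 7348 layout (8 flag bits, 24 reserved
bits, 24-bit VNI, 8 reserved bits). The helper name vxlan_header() is
hypothetical: it is neither part of scapy nor of the test script.

```python
import struct

# The "I" (valid VNI) flag, same value as default=1 << 3 in the scapy class.
VXLAN_FLAG_VNI_VALID = 0x08


def vxlan_header(vni):
    """Pack an 8-byte VXLAN header (RFC 7348 layout) for the given VNI."""
    if not 0 <= vni < (1 << 24):
        raise ValueError("vni must fit in 24 bits")
    # first 32-bit word:  flags (8 bits) + reserved (24 bits)
    # second 32-bit word: vni (24 bits)  + reserved (8 bits)
    return struct.pack("!II", VXLAN_FLAG_VNI_VALID << 24, vni << 8)


hdr = vxlan_header(1234)
print(len(hdr), hdr.hex())
```

The 8-byte result is the size that the csum forward engine skips, together
with the UDP header, to reach the inner Ethernet header.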

Test 1: rxonly fwd engine
=========================

Check that the NIC is able to parse the packet headers and to flag bad
checksum values. test_vxlan() does not work on ixgbe because this NIC is
not able to recognize vxlan packets.
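
The length= values printed by testpmd in this test can be cross-checked
against the header sizes of the packets built by the scapy script. A small
sketch, assuming no VLAN tag, no IP options and no TCP options (the helper
frame_len() is hypothetical and only used for this illustration):

```python
ETHER_HDR = 14                        # dst MAC + src MAC + ethertype
L3_HDR = {"ipv4": 20, "ipv6": 40}     # fixed headers, no options/extensions
L4_HDR = {"tcp": 20, "udp": 8}        # TCP without options


def frame_len(l3, l4, payload):
    """Return the frame length in bytes, without the Ethernet CRC."""
    return ETHER_HDR + L3_HDR[l3] + L4_HDR[l4] + payload


for l3 in ("ipv4", "ipv6"):
    print(l3,
          frame_len(l3, "tcp", 50),     # small TCP packet
          frame_len(l3, "udp", 50),     # small UDP packet
          frame_len(l3, "tcp", 1400))   # large TCP packet
```

With the 50-byte and 1400-byte payloads used by test_v4() and test_v6(),
this gives 104, 92 and 1454 bytes for IPv4, and 124, 112 and 1474 bytes
for IPv6.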

testpmd command lines:

  set fwd rxonly
  set verbose 1
  start

Start test_v4() in scapy. Result is:

port 0/queue 0: received 1 packets
  src=90:E2:BA:2B:0F:4C - dst=00:1B:21:8E:B2:30 - type=0x0800 -
length=104 - nb_segs=1 - Receive queue=0x0
  PKT_RX_IPV4_HDR
port 0/queue 0: received 1 packets
  src=90:E2:BA:2B:0F:4C - dst=00:1B:21:8E:B2:30 - type=0x0800 -
length=104 - nb_segs=1 - Receive queue=0x0
  PKT_RX_IPV4_HDR
port 0/queue 0: received 1 packets
  src=90:E2:BA:2B:0F:4C - dst=00:1B:21:8E:B2:30 - type=0x0800 -
length=104 - nb_segs=1 - Receive queue=0x0
  PKT_RX_IPV4_HDR
port 0/queue 0: received 1 packets
  src=90:E2:BA:2B:0F:4C - dst=00:1B:21:8E:B2:30 - type=0x0800 -
length=104 - nb_segs=1 - Receive queue=0x0
  PKT_RX_IPV4_HDR
port 0/queue 0: received 1 packets
  src=90:E2:BA:2B:0F:4C - dst=00:1B:21:8E:B2:30 - type=0x0800 -
length=104 - nb_segs=1 - Receive queue=0x0
  PKT_RX_IPV4_HDR
port 0/queue 0: received 1 packets
  src=90:E2:BA:2B:0F:4C - dst=00:1B:21:8E:B2:30 - type=0x0800 -
length=92 - nb_segs=1 - Receive queue=0x0
  PKT_RX_IPV4_HDR
port 0/queue 0: received 1 packets
  src=90:E2:BA:2B:0F:4C - dst=00:1B:21:8E:B2:30 - type=0x0800 -
length=92 - nb_segs=1 - Receive queue=0x0
  PKT_RX_IPV4_HDR
port 0/queue 0: received 1 packets
  src=90:E2:BA:2B:0F:4C - dst=00:1B:21:8E:B2:30 - type=0x0800 -
length=92 - nb_segs=1 - Receive queue=0x0
  PKT_RX_IPV4_HDR
port 0/queue 0: received 1 packets
  src=90:E2:BA:2B:0F:4C - dst=00:1B:21:8E:B2:30 - type=0x0800 -
length=92 - nb_segs=1 - Receive queue=0x0
  PKT_RX_IPV4_HDR
port 0/queue 0: received 1 packets
  src=90:E2:BA:2B:0F:4C - dst=00:1B:21:8E:B2:30 - type=0x0800 -
length=92 - nb_segs=1 - Receive queue=0x0
  PKT_RX_IPV4_HDR
port 0/queue 0: received 1 packets
  src=90:E2:BA:2B:0F:4C - dst=00:1B:21:8E:B2:30 - type=0x0800 -
length=104 - nb_segs=1 - Receive queue=0x0
  PKT_RX_IP_CKSUM_BAD
  PKT_RX_IPV4_HDR
port 0/queue 0: received 1 packets
  src=90:E2:BA:2B:0F:4C - dst=00:1B:21:8E:B2:30 - type=0x0800 -
length=104 - nb_segs=1 - Receive queue=0x0
  PKT_RX_IP_CKSUM_BAD
  PKT_RX_IPV4_HDR
port 0/queue 0: received 1 packets
  src=90:E2:BA:2B:0F:4C - dst=00:1B:21:8E:B2:30 - type=0x0800 -
length=104 - nb_segs=1 - Receive queue=0x0
  PKT_RX_IP_CKSUM_BAD
  PKT_RX_IPV4_HDR
port 0/queue 0: received 1 packets
  src=90:E2:BA:2B:0F:4C - dst=00:1B:21:8E:B2:30 - type=0x0800 -
length=104 - nb_segs=1 - Receive queue=0x0
  PKT_RX_IP_CKSUM_BAD
  PKT_RX_IPV4_HDR
port 0/queue 0: received 1 packets
  src=90:E2:BA:2B:0F:4C - dst=00:1B:21:8E:B2:30 - type=0x0800 -
length=104 - nb_segs=1 - Receive queue=0x0
  PKT_RX_IP_CKSUM_BAD
  PKT_RX_IPV4_HDR
port 0/queue 0: received 1 packets
  src=90:E2:BA:2B:0F:4C - dst=00:1B:21:8E:B2:30 - type=0x0800 -
length=104 - nb_segs=1 - Receive queue=0x0
  PKT_RX_L4_CKSUM_BAD
  PKT_RX_IPV4_HDR
port 0/queue 0: received 1 packets
  src=90:E2:BA:2B:0F:4C - dst=00:1B:21:8E:B2:30 - type=0x0800 -
length=104 - nb_segs=1 - Receive queue=0x0
  PKT_RX_L4_CKSUM_BAD
  PKT_RX_IPV4_HDR
port 0/queue 0: received 1 packets
  src=90:E2:BA:2B:0F:4C - dst=00:1B:21:8E:B2:30 - type=0x0800 -
length=104 - nb_segs=1 - Receive queue=0x0
  PKT_RX_L4_CKSUM_BAD
  PKT_RX_IPV4_HDR
port 0/queue 0: received 1 packets
  src=90:E2:BA:2B:0F:4C - dst=00:1B:21:8E:B2:30 - type=0x0800 -
length=104 - nb_segs=1 - Receive queue=0x0
  PKT_RX_L4_CKSUM_BAD
  PKT_RX_IPV4_HDR
port 0/queue 0: received 1 packets
  src=90:E2:BA:2B:0F:4C - dst=00:1B:21:8E:B2:30 - type=0x0800 -
length=104 - nb_segs=1 - Receive queue=0x0
  PKT_RX_L4_CKSUM_BAD
  PKT_RX_IPV4_HDR
port 0/queue 0: received 1 packets
  src=90:E2:BA:2B:0F:4C - dst=00:1B:21:8E:B2:30 - type=0x0800 -
length=1454 - nb_segs=1 - Receive queue=0x0
  PKT_RX_IPV4_HDR
port 0/queue 0: received 1 packets
  src=90:E2:BA:2B:0F:4C - dst=00:1B:21:8E:B2:30 - type=0x0800 -
length=1454 - nb_segs=1 - Receive queue=0x0
  PKT_RX_IPV4_HDR
port 0/queue 0: received 1 packets
  src=90:E2:BA:2B:0F:4C - dst=00:1B:21:8E:B2:30 - type=0x0800 -
length=1454 - nb_segs=1 - Receive queue=0x0
  PKT_RX_IPV4_HDR
port 0/queue 0: received 1 packets
  src=90:E2:BA:2B:0F:4C - dst=00:1B:21:8E:B2:30 - type=0x0800 -
length=1454 - nb_segs=1 - Receive queue=0x0
  PKT_RX_IPV4_HDR
port 0/queue 0: received 1 packets
  src=90:E2:BA:2B:0F:4C - dst=00:1B:21:8E:B2:30 - type=0x0800 -
length=1454 - nb_segs=1 - Receive queue=0x0
  PKT_RX_IPV4_HDR

Start test_v6() in scapy. Result is:

port 0/queue 0: received 1 packets
  src=90:E2:BA:2B:0F:4C - dst=00:1B:21:8E:B2:30 - type=0x86dd -
length=124 - nb_segs=1 - Receive queue=0x0
  PKT_RX_IPV6_HDR
port 0/queue 0: received 1 packets
  src=90:E2:BA:2B:0F:4C - dst=00:1B:21:8E:B2:30 - type=0x86dd -
length=124 - nb_segs=1 - Receive queue=0x0
  PKT_RX_IPV6_HDR
port 0/queue 0: received 1 packets
  src=90:E2:BA:2B:0F:4C - dst=00:1B:21:8E:B2:30 - type=0x86dd -
length=124 - nb_segs=1 - Receive queue=0x0
  PKT_RX_IPV6_HDR
port 0/queue 0: received 1 packets
  src=90:E2:BA:2B:0F:4C - dst=00:1B:21:8E:B2:30 - type=0x86dd -
length=124 - nb_segs=1 - Receive queue=0x0
  PKT_RX_IPV6_HDR
port 0/queue 0: received 1 packets
  src=90:E2:BA:2B:0F:4C - dst=00:1B:21:8E:B2:30 - type=0x86dd -
length=124 - nb_segs=1 - Receive queue=0x0
  PKT_RX_IPV6_HDR
port 0/queue 0: received 1 packets
  src=90:E2:BA:2B:0F:4C - dst=00:1B:21:8E:B2:30 - type=0x86dd -
length=112 - nb_segs=1 - Receive queue=0x0
  PKT_RX_IPV6_HDR
port 0/queue 0: received 1 packets
  src=90:E2:BA:2B:0F:4C - dst=00:1B:21:8E:B2:30 - type=0x86dd -
length=112 - nb_segs=1 - Receive queue=0x0
  PKT_RX_IPV6_HDR
port 0/queue 0: received 1 packets
  src=90:E2:BA:2B:0F:4C - dst=00:1B:21:8E:B2:30 - type=0x86dd -
length=112 - nb_segs=1 - Receive queue=0x0
  PKT_RX_IPV6_HDR
port 0/queue 0: received 1 packets
  src=90:E2:BA:2B:0F:4C - dst=00:1B:21:8E:B2:30 - type=0x86dd -
length=112 - nb_segs=1 - Receive queue=0x0
  PKT_RX_IPV6_HDR
port 0/queue 0: received 1 packets
  src=90:E2:BA:2B:0F:4C - dst=00:1B:21:8E:B2:30 - type=0x86dd -
length=112 - nb_segs=1 - Receive queue=0x0
  PKT_RX_IPV6_HDR
port 0/queue 0: received 1 packets
  src=90:E2:BA:2B:0F:4C - dst=00:1B:21:8E:B2:30 - type=0x86dd -
length=124 - nb_segs=1 - Receive queue=0x0
  PKT_RX_L4_CKSUM_BAD
  PKT_RX_IPV6_HDR
port 0/queue 0: received 1 packets
  src=90:E2:BA:2B:0F:4C - dst=00:1B:21:8E:B2:30 - type=0x86dd -
length=124 - nb_segs=1 - Receive queue=0x0
  PKT_RX_L4_CKSUM_BAD
  PKT_RX_IPV6_HDR
port 0/queue 0: received 1 packets
  src=90:E2:BA:2B:0F:4C - dst=00:1B:21:8E:B2:30 - type=0x86dd -
length=124 - nb_segs=1 - Receive queue=0x0
  PKT_RX_L4_CKSUM_BAD
  PKT_RX_IPV6_HDR
port 0/queue 0: received 1 packets
  src=90:E2:BA:2B:0F:4C - dst=00:1B:21:8E:B2:30 - type=0x86dd -
length=124 - nb_segs=1 - Receive queue=0x0
  PKT_RX_L4_CKSUM_BAD
  PKT_RX_IPV6_HDR
port 0/queue 0: received 1 packets
  src=90:E2:BA:2B:0F:4C - dst=00:1B:21:8E:B2:30 - type=0x86dd -
length=124 - nb_segs=1 - Receive queue=0x0
  PKT_RX_L4_CKSUM_BAD
  PKT_RX_IPV6_HDR
port 0/queue 0: received 1 packets
  src=90:E2:BA:2B:0F:4C - dst=00:1B:21:8E:B2:30 - type=0x86dd -
length=1474 - nb_segs=1 - Receive queue=0x0
  PKT_RX_IPV6_HDR
port 0/queue 0: received 1 packets
  src=90:E2:BA:2B:0F:4C - dst=00:1B:21:8E:B2:30 - type=0x86dd -
length=1474 - nb_segs=1 - Receive queue=0x0
  PKT_RX_IPV6_HDR
port 0/queue 0: received 1 packets
  src=90:E2:BA:2B:0F:4C - dst=00:1B:21:8E:B2:30 - type=0x86dd -
length=1474 - nb_segs=1 - Receive queue=0x0
  PKT_RX_IPV6_HDR
port 0/queue 0: received 1 packets
  src=90:E2:BA:2B:0F:4C - dst=00:1B:21:8E:B2:30 - type=0x86dd -
length=1474 - nb_segs=1 - Receive queue=0x0
  PKT_RX_IPV6_HDR
port 0/queue 0: received 1 packets
  src=90:E2:BA:2B:0F:4C - dst=00:1B:21:8E:B2:30 - type=0x86dd -
length=1474 - nb_segs=1 - Receive queue=0x0
  PKT_RX_IPV6_HDR

Test 2: csum fwd engine, use sw checksum
========================================

The goal of this test is to show that the csum forward engine is able
to compute the checksums in software.
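
As a reminder of what a software checksum involves, here is a minimal
sketch of the 16-bit one's complement Internet checksum (RFC 1071) applied
to an IPv4 header. It is a Python illustration, not the testpmd code; the
function name and the sample header bytes are assumptions chosen for the
example.

```python
import struct


def internet_checksum(data):
    """Return the RFC 1071 checksum of data as a 16-bit integer."""
    if len(data) % 2:
        data += b"\x00"               # pad to a whole number of 16-bit words
    total = sum(struct.unpack("!%dH" % (len(data) // 2), data))
    while total >> 16:                # fold the carries back into 16 bits
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF


# Sample 20-byte IPv4 header with the checksum field (bytes 10-11) set to 0,
# as the forward engine does before recomputing it.
hdr = bytes.fromhex("4500003c1c4640004006" "0000" "ac100a63ac100a0c")
csum = internet_checksum(hdr)

# Verification: once the checksum is written back, the sum over the whole
# header folds to zero.
filled = hdr[:10] + struct.pack("!H", csum) + hdr[12:]
print(hex(csum), internet_checksum(filled))
```

The same routine, fed with a pseudo header followed by the L4 header and
payload, gives the UDP and TCP checksums.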

  # hw checksum and tso are disabled for port 0
  tx_checksum set ip sw 0
  tx_checksum set udp sw 0
  tx_checksum set tcp sw 0
  tx_checksum set sctp sw 0
  tx_checksum set vxlan sw 0
  tso set 0 0
  # set the forward engine
  set verbose 1
  set fwd csum
  start

Start test_v4() in scapy. Result is:

  -----------------
  rx: l2_len=14 ethertype=800 l3_len=20 l4_proto=6 l4_len=20
  tx: flags=
  -----------------
  rx: l2_len=14 ethertype=800 l3_len=20 l4_proto=6 l4_len=20
  tx: flags=
  -----------------
  rx: l2_len=14 ethertype=800 l3_len=20 l4_proto=6 l4_len=20
  tx: flags=
  -----------------
  rx: l2_len=14 ethertype=800 l3_len=20 l4_proto=6 l4_len=20
  tx: flags=
  -----------------
  rx: l2_len=14 ethertype=800 l3_len=20 l4_proto=6 l4_len=20
  tx: flags=
  -----------------
  rx: l2_len=14 ethertype=800 l3_len=20 l4_proto=17 l4_len=0
  tx: flags=
  -----------------
  rx: l2_len=14 ethertype=800 l3_len=20 l4_proto=17 l4_len=0
  tx: flags=
  -----------------
  rx: l2_len=14 ethertype=800 l3_len=20 l4_proto=17 l4_len=0
  tx: flags=
  -----------------
  rx: l2_len=14 ethertype=800 l3_len=20 l4_proto=17 l4_len=0
  tx: flags=
  -----------------
  rx: l2_len=14 ethertype=800 l3_len=20 l4_proto=17 l4_len=0
  tx: flags=
  -----------------
  rx: l2_len=14 ethertype=800 l3_len=20 l4_proto=6 l4_len=20
  tx: flags=
  -----------------
  rx: l2_len=14 ethertype=800 l3_len=20 l4_proto=6 l4_len=20
  tx: flags=
  -----------------
  rx: l2_len=14 ethertype=800 l3_len=20 l4_proto=6 l4_len=20
  tx: flags=
  -----------------
  rx: l2_len=14 ethertype=800 l3_len=20 l4_proto=6 l4_len=20
  tx: flags=
  -----------------
  rx: l2_len=14 ethertype=800 l3_len=20 l4_proto=6 l4_len=20
  tx: flags=
  -----------------
  rx: l2_len=14 ethertype=800 l3_len=20 l4_proto=6 l4_len=20
  tx: flags=
  -----------------
  rx: l2_len=14 ethertype=800 l3_len=20 l4_proto=6 l4_len=20
  tx: flags=
  -----------------
  rx: l2_len=14 ethertype=800 l3_len=20 l4_proto=6 l4_len=20
  tx: flags=
  -----------------
  rx: l2_len=14 ethertype=800 l3_len=20 l4_proto=6 l4_len=20
  tx: flags=
  -----------------
  rx: l2_len=14 ethertype=800 l3_len=20 l4_proto=6 l4_len=20
  tx: flags=
  -----------------
  rx: l2_len=14 ethertype=800 l3_len=20 l4_proto=6 l4_len=20
  tx: flags=
  -----------------
  rx: l2_len=14 ethertype=800 l3_len=20 l4_proto=6 l4_len=20
  tx: flags=
  -----------------
  rx: l2_len=14 ethertype=800 l3_len=20 l4_proto=6 l4_len=20
  tx: flags=
  -----------------
  rx: l2_len=14 ethertype=800 l3_len=20 l4_proto=6 l4_len=20
  tx: flags=
  -----------------
  rx: l2_len=14 ethertype=800 l3_len=20 l4_proto=6 l4_len=20
  tx: flags=

Start test_v6() in scapy, result is:

  -----------------
  rx: l2_len=14 ethertype=86dd l3_len=40 l4_proto=6 l4_len=20
  tx: flags=
  -----------------
  rx: l2_len=14 ethertype=86dd l3_len=40 l4_proto=6 l4_len=20
  tx: flags=
  -----------------
  rx: l2_len=14 ethertype=86dd l3_len=40 l4_proto=6 l4_len=20
  tx: flags=
  -----------------
  rx: l2_len=14 ethertype=86dd l3_len=40 l4_proto=6 l4_len=20
  tx: flags=
  -----------------
  rx: l2_len=14 ethertype=86dd l3_len=40 l4_proto=6 l4_len=20
  tx: flags=
  -----------------
  rx: l2_len=14 ethertype=86dd l3_len=40 l4_proto=17 l4_len=0
  tx: flags=
  -----------------
  rx: l2_len=14 ethertype=86dd l3_len=40 l4_proto=17 l4_len=0
  tx: flags=
  -----------------
  rx: l2_len=14 ethertype=86dd l3_len=40 l4_proto=17 l4_len=0
  tx: flags=
  -----------------
  rx: l2_len=14 ethertype=86dd l3_len=40 l4_proto=17 l4_len=0
  tx: flags=
  -----------------
  rx: l2_len=14 ethertype=86dd l3_len=40 l4_proto=17 l4_len=0
  tx: flags=
  -----------------
  rx: l2_len=14 ethertype=86dd l3_len=40 l4_proto=6 l4_len=20
  tx: flags=
  -----------------
  rx: l2_len=14 ethertype=86dd l3_len=40 l4_proto=6 l4_len=20
  tx: flags=
  -----------------
  rx: l2_len=14 ethertype=86dd l3_len=40 l4_proto=6 l4_len=20
  tx: flags=
  -----------------
  rx: l2_len=14 ethertype=86dd l3_len=40 l4_proto=6 l4_len=20
  tx: flags=
  -----------------
  rx: l2_len=14 ethertype=86dd l3_len=40 l4_proto=6 l4_len=20
  tx: flags=
  -----------------
  rx: l2_len=14 ethertype=86dd l3_len=40 l4_proto=6 l4_len=20
  tx: flags=
  -----------------
  rx: l2_len=14 ethertype=86dd l3_len=40 l4_proto=6 l4_len=20
  tx: flags=
  -----------------
  rx: l2_len=14 ethertype=86dd l3_len=40 l4_proto=6 l4_len=20
  tx: flags=
  -----------------
  rx: l2_len=14 ethertype=86dd l3_len=40 l4_proto=6 l4_len=20
  tx: flags=
  -----------------
  rx: l2_len=14 ethertype=86dd l3_len=40 l4_proto=6 l4_len=20
  tx: flags=

Start test_vxlan() in scapy, result is:

  -----------------
  rx: l2_len=14 ethertype=800 l3_len=20 l4_proto=6 l4_len=20
  rx: outer_l2_len=14 outer_ethertype=800 outer_l3_len=20
  tx: flags=
  -----------------
  rx: l2_len=14 ethertype=800 l3_len=20 l4_proto=6 l4_len=20
  rx: outer_l2_len=14 outer_ethertype=800 outer_l3_len=20
  tx: flags=
  -----------------
  rx: l2_len=14 ethertype=800 l3_len=20 l4_proto=6 l4_len=20
  rx: outer_l2_len=14 outer_ethertype=800 outer_l3_len=20
  tx: flags=
  -----------------
  rx: l2_len=14 ethertype=800 l3_len=20 l4_proto=6 l4_len=20
  rx: outer_l2_len=14 outer_ethertype=800 outer_l3_len=20
  tx: flags=
  -----------------
  rx: l2_len=14 ethertype=800 l3_len=20 l4_proto=6 l4_len=20
  rx: outer_l2_len=14 outer_ethertype=800 outer_l3_len=20
  tx: flags=
  -----------------
  rx: l2_len=14 ethertype=800 l3_len=20 l4_proto=6 l4_len=20
  rx: outer_l2_len=14 outer_ethertype=800 outer_l3_len=20
  tx: flags=
  -----------------
  rx: l2_len=14 ethertype=800 l3_len=20 l4_proto=6 l4_len=20
  rx: outer_l2_len=14 outer_ethertype=800 outer_l3_len=20
  tx: flags=
  -----------------
  rx: l2_len=14 ethertype=800 l3_len=20 l4_proto=6 l4_len=20
  rx: outer_l2_len=14 outer_ethertype=800 outer_l3_len=20
  tx: flags=
  -----------------
  rx: l2_len=14 ethertype=800 l3_len=20 l4_proto=6 l4_len=20
  rx: outer_l2_len=14 outer_ethertype=800 outer_l3_len=20
  tx: flags=
  -----------------
  rx: l2_len=14 ethertype=800 l3_len=20 l4_proto=6 l4_len=20
  rx: outer_l2_len=14 outer_ethertype=800 outer_l3_len=20
  tx: flags=
  -----------------
  rx: l2_len=14 ethertype=800 l3_len=20 l4_proto=6 l4_len=20
  rx: outer_l2_len=14 outer_ethertype=800 outer_l3_len=20
  tx: flags=
  -----------------
  rx: l2_len=14 ethertype=800 l3_len=20 l4_proto=6 l4_len=20
  rx: outer_l2_len=14 outer_ethertype=800 outer_l3_len=20
  tx: flags=
  -----------------
  rx: l2_len=14 ethertype=800 l3_len=20 l4_proto=6 l4_len=20
  rx: outer_l2_len=14 outer_ethertype=800 outer_l3_len=20
  tx: flags=
  -----------------
  rx: l2_len=14 ethertype=800 l3_len=20 l4_proto=6 l4_len=20
  rx: outer_l2_len=14 outer_ethertype=800 outer_l3_len=20
  tx: flags=
  -----------------
  rx: l2_len=14 ethertype=800 l3_len=20 l4_proto=6 l4_len=20
  rx: outer_l2_len=14 outer_ethertype=800 outer_l3_len=20
  tx: flags=
  -----------------
  rx: l2_len=14 ethertype=800 l3_len=20 l4_proto=6 l4_len=20
  rx: outer_l2_len=14 outer_ethertype=800 outer_l3_len=20
  tx: flags=
  -----------------
  rx: l2_len=14 ethertype=800 l3_len=20 l4_proto=6 l4_len=20
  rx: outer_l2_len=14 outer_ethertype=800 outer_l3_len=20
  tx: flags=
  -----------------
  rx: l2_len=14 ethertype=800 l3_len=20 l4_proto=6 l4_len=20
  rx: outer_l2_len=14 outer_ethertype=800 outer_l3_len=20
  tx: flags=
  -----------------
  rx: l2_len=14 ethertype=800 l3_len=20 l4_proto=6 l4_len=20
  rx: outer_l2_len=14 outer_ethertype=800 outer_l3_len=20
  tx: flags=
  -----------------
  rx: l2_len=14 ethertype=800 l3_len=20 l4_proto=6 l4_len=20
  rx: outer_l2_len=14 outer_ethertype=800 outer_l3_len=20
  tx: flags=

Check the capture file (test2-cap-sw-cksum.cap)
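
For readers who want to check the values in the capture by hand, here
is a minimal sketch of the standard Internet checksum (RFC 1071) that
a software path has to compute for the IPv4 header. It is not the
testpmd code; the function name and the sample header bytes are
illustrative assumptions.

```python
import struct

def internet_checksum(data: bytes) -> int:
    """One's-complement checksum over data, as described in RFC 1071."""
    if len(data) % 2:
        data += b"\x00"  # pad odd-length input with a zero byte
    total = sum(struct.unpack("!%dH" % (len(data) // 2), data))
    # fold the carries back into the low 16 bits
    while total >> 16:
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF

# Illustrative 20-byte IPv4 header (same size as l3_len=20 in the logs),
# with the checksum field (bytes 10-11) set to zero before computing.
header = bytes.fromhex("450000730000400040110000c0a80001c0a800c7")
csum = internet_checksum(header)  # 0xb861

# A receiver verifies by summing the header with the checksum in place;
# the result is then 0.
filled = header[:10] + struct.pack("!H", csum) + header[12:]
verify = internet_checksum(filled)
```

The same routine, applied to the pseudo header plus the L4 header and
payload, gives the TCP and UDP checksums.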

Test 3: csum fwd engine, use hw checksum
========================================

The goal of this test is to show that the csum forward engine is able to
offload checksum computation to the hardware.

  # enable hw cksum in csumonly test, disable tso
  tx_checksum set ip hw 0
  tx_checksum set udp hw 0
  tx_checksum set tcp hw 0
  tx_checksum set sctp hw 0
  # set the forward engine
  set verbose 1
  set fwd csum
  start
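
The cover letter of this series states that, for hardware TCP checksum
offload, the pseudo header checksum must be calculated and written in
the TCP header before transmission. The sketch below shows that
calculation for IPv4. It is an illustration only: the helper names and
addresses are hypothetical, and writing the folded, non-inverted sum is
an assumption about what the NIC expects that should be checked against
the driver.

```python
import socket
import struct

def fold16(total: int) -> int:
    """Fold a wide sum into 16 bits with end-around carry."""
    while total >> 16:
        total = (total & 0xFFFF) + (total >> 16)
    return total

def ipv4_pseudo_header_sum(src: str, dst: str, proto: int,
                           l4_total_len: int) -> int:
    """Sum of the IPv4 pseudo header: src, dst, zero, proto, L4 length.

    l4_total_len is the L4 header plus payload length, which is not the
    same thing as the m->l4_len field shown in the logs (header only).
    """
    pseudo = (socket.inet_aton(src) + socket.inet_aton(dst)
              + struct.pack("!BBH", 0, proto, l4_total_len))
    return fold16(sum(struct.unpack("!6H", pseudo)))

# Hypothetical addresses; proto 6 is TCP, 20 bytes is a bare TCP header.
seed = ipv4_pseudo_header_sum("192.168.0.1", "192.168.0.199", 6, 20)
```

The hardware then adds the sum of the TCP header and payload to this
seed and writes the final checksum into the packet.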

Start test_v4() in scapy. Result is:

  -----------------
  rx: l2_len=14 ethertype=800 l3_len=20 l4_proto=6 l4_len=20
  tx: m->l2_len=14 m->l3_len=20 m->l4_len=20
  tx: flags=PKT_TX_IP_CKSUM PKT_TX_TCP_CKSUM
  -----------------
  rx: l2_len=14 ethertype=800 l3_len=20 l4_proto=6 l4_len=20
  tx: m->l2_len=14 m->l3_len=20 m->l4_len=20
  tx: flags=PKT_TX_IP_CKSUM PKT_TX_TCP_CKSUM
  -----------------
  rx: l2_len=14 ethertype=800 l3_len=20 l4_proto=6 l4_len=20
  tx: m->l2_len=14 m->l3_len=20 m->l4_len=20
  tx: flags=PKT_TX_IP_CKSUM PKT_TX_TCP_CKSUM
  -----------------
  rx: l2_len=14 ethertype=800 l3_len=20 l4_proto=6 l4_len=20
  tx: m->l2_len=14 m->l3_len=20 m->l4_len=20
  tx: flags=PKT_TX_IP_CKSUM PKT_TX_TCP_CKSUM
  -----------------
  rx: l2_len=14 ethertype=800 l3_len=20 l4_proto=6 l4_len=20
  tx: m->l2_len=14 m->l3_len=20 m->l4_len=20
  tx: flags=PKT_TX_IP_CKSUM PKT_TX_TCP_CKSUM
  -----------------
  rx: l2_len=14 ethertype=800 l3_len=20 l4_proto=17 l4_len=0
  tx: m->l2_len=14 m->l3_len=20 m->l4_len=0
  tx: flags=PKT_TX_IP_CKSUM PKT_TX_UDP_CKSUM
  -----------------
  rx: l2_len=14 ethertype=800 l3_len=20 l4_proto=17 l4_len=0
  tx: m->l2_len=14 m->l3_len=20 m->l4_len=0
  tx: flags=PKT_TX_IP_CKSUM PKT_TX_UDP_CKSUM
  -----------------
  rx: l2_len=14 ethertype=800 l3_len=20 l4_proto=17 l4_len=0
  tx: m->l2_len=14 m->l3_len=20 m->l4_len=0
  tx: flags=PKT_TX_IP_CKSUM PKT_TX_UDP_CKSUM
  -----------------
  rx: l2_len=14 ethertype=800 l3_len=20 l4_proto=17 l4_len=0
  tx: m->l2_len=14 m->l3_len=20 m->l4_len=0
  tx: flags=PKT_TX_IP_CKSUM PKT_TX_UDP_CKSUM
  -----------------
  rx: l2_len=14 ethertype=800 l3_len=20 l4_proto=17 l4_len=0
  tx: m->l2_len=14 m->l3_len=20 m->l4_len=0
  tx: flags=PKT_TX_IP_CKSUM PKT_TX_UDP_CKSUM
  -----------------
  rx: l2_len=14 ethertype=800 l3_len=20 l4_proto=6 l4_len=20
  tx: m->l2_len=14 m->l3_len=20 m->l4_len=20
  tx: flags=PKT_TX_IP_CKSUM PKT_TX_TCP_CKSUM
  -----------------
  rx: l2_len=14 ethertype=800 l3_len=20 l4_proto=6 l4_len=20
  tx: m->l2_len=14 m->l3_len=20 m->l4_len=20
  tx: flags=PKT_TX_IP_CKSUM PKT_TX_TCP_CKSUM
  -----------------
  rx: l2_len=14 ethertype=800 l3_len=20 l4_proto=6 l4_len=20
  tx: m->l2_len=14 m->l3_len=20 m->l4_len=20
  tx: flags=PKT_TX_IP_CKSUM PKT_TX_TCP_CKSUM
  -----------------
  rx: l2_len=14 ethertype=800 l3_len=20 l4_proto=6 l4_len=20
  tx: m->l2_len=14 m->l3_len=20 m->l4_len=20
  tx: flags=PKT_TX_IP_CKSUM PKT_TX_TCP_CKSUM
  -----------------
  rx: l2_len=14 ethertype=800 l3_len=20 l4_proto=6 l4_len=20
  tx: m->l2_len=14 m->l3_len=20 m->l4_len=20
  tx: flags=PKT_TX_IP_CKSUM PKT_TX_TCP_CKSUM
  -----------------
  rx: l2_len=14 ethertype=800 l3_len=20 l4_proto=6 l4_len=20
  tx: m->l2_len=14 m->l3_len=20 m->l4_len=20
  tx: flags=PKT_TX_IP_CKSUM PKT_TX_TCP_CKSUM
  -----------------
  rx: l2_len=14 ethertype=800 l3_len=20 l4_proto=6 l4_len=20
  tx: m->l2_len=14 m->l3_len=20 m->l4_len=20
  tx: flags=PKT_TX_IP_CKSUM PKT_TX_TCP_CKSUM
  -----------------
  rx: l2_len=14 ethertype=800 l3_len=20 l4_proto=6 l4_len=20
  tx: m->l2_len=14 m->l3_len=20 m->l4_len=20
  tx: flags=PKT_TX_IP_CKSUM PKT_TX_TCP_CKSUM
  -----------------
  rx: l2_len=14 ethertype=800 l3_len=20 l4_proto=6 l4_len=20
  tx: m->l2_len=14 m->l3_len=20 m->l4_len=20
  tx: flags=PKT_TX_IP_CKSUM PKT_TX_TCP_CKSUM
  -----------------
  rx: l2_len=14 ethertype=800 l3_len=20 l4_proto=6 l4_len=20
  tx: m->l2_len=14 m->l3_len=20 m->l4_len=20
  tx: flags=PKT_TX_IP_CKSUM PKT_TX_TCP_CKSUM
  -----------------
  rx: l2_len=14 ethertype=800 l3_len=20 l4_proto=6 l4_len=20
  tx: m->l2_len=14 m->l3_len=20 m->l4_len=20
  tx: flags=PKT_TX_IP_CKSUM PKT_TX_TCP_CKSUM
  -----------------
  rx: l2_len=14 ethertype=800 l3_len=20 l4_proto=6 l4_len=20
  tx: m->l2_len=14 m->l3_len=20 m->l4_len=20
  tx: flags=PKT_TX_IP_CKSUM PKT_TX_TCP_CKSUM
  -----------------
  rx: l2_len=14 ethertype=800 l3_len=20 l4_proto=6 l4_len=20
  tx: m->l2_len=14 m->l3_len=20 m->l4_len=20
  tx: flags=PKT_TX_IP_CKSUM PKT_TX_TCP_CKSUM
  -----------------
  rx: l2_len=14 ethertype=800 l3_len=20 l4_proto=6 l4_len=20
  tx: m->l2_len=14 m->l3_len=20 m->l4_len=20
  tx: flags=PKT_TX_IP_CKSUM PKT_TX_TCP_CKSUM
  -----------------
  rx: l2_len=14 ethertype=800 l3_len=20 l4_proto=6 l4_len=20
  tx: m->l2_len=14 m->l3_len=20 m->l4_len=20
  tx: flags=PKT_TX_IP_CKSUM PKT_TX_TCP_CKSUM

Start test_v6() in scapy, result is:

  -----------------
  rx: l2_len=14 ethertype=86dd l3_len=40 l4_proto=6 l4_len=20
  tx: m->l2_len=14 m->l3_len=40 m->l4_len=20
  tx: flags=PKT_TX_TCP_CKSUM
  -----------------
  rx: l2_len=14 ethertype=86dd l3_len=40 l4_proto=6 l4_len=20
  tx: m->l2_len=14 m->l3_len=40 m->l4_len=20
  tx: flags=PKT_TX_TCP_CKSUM
  -----------------
  rx: l2_len=14 ethertype=86dd l3_len=40 l4_proto=6 l4_len=20
  tx: m->l2_len=14 m->l3_len=40 m->l4_len=20
  tx: flags=PKT_TX_TCP_CKSUM
  -----------------
  rx: l2_len=14 ethertype=86dd l3_len=40 l4_proto=6 l4_len=20
  tx: m->l2_len=14 m->l3_len=40 m->l4_len=20
  tx: flags=PKT_TX_TCP_CKSUM
  -----------------
  rx: l2_len=14 ethertype=86dd l3_len=40 l4_proto=6 l4_len=20
  tx: m->l2_len=14 m->l3_len=40 m->l4_len=20
  tx: flags=PKT_TX_TCP_CKSUM
  -----------------
  rx: l2_len=14 ethertype=86dd l3_len=40 l4_proto=17 l4_len=0
  tx: m->l2_len=14 m->l3_len=40 m->l4_len=0
  tx: flags=PKT_TX_UDP_CKSUM
  -----------------
  rx: l2_len=14 ethertype=86dd l3_len=40 l4_proto=17 l4_len=0
  tx: m->l2_len=14 m->l3_len=40 m->l4_len=0
  tx: flags=PKT_TX_UDP_CKSUM
  -----------------
  rx: l2_len=14 ethertype=86dd l3_len=40 l4_proto=17 l4_len=0
  tx: m->l2_len=14 m->l3_len=40 m->l4_len=0
  tx: flags=PKT_TX_UDP_CKSUM
  -----------------
  rx: l2_len=14 ethertype=86dd l3_len=40 l4_proto=17 l4_len=0
  tx: m->l2_len=14 m->l3_len=40 m->l4_len=0
  tx: flags=PKT_TX_UDP_CKSUM
  -----------------
  rx: l2_len=14 ethertype=86dd l3_len=40 l4_proto=17 l4_len=0
  tx: m->l2_len=14 m->l3_len=40 m->l4_len=0
  tx: flags=PKT_TX_UDP_CKSUM
  -----------------
  rx: l2_len=14 ethertype=86dd l3_len=40 l4_proto=6 l4_len=20
  tx: m->l2_len=14 m->l3_len=40 m->l4_len=20
  tx: flags=PKT_TX_TCP_CKSUM
  -----------------
  rx: l2_len=14 ethertype=86dd l3_len=40 l4_proto=6 l4_len=20
  tx: m->l2_len=14 m->l3_len=40 m->l4_len=20
  tx: flags=PKT_TX_TCP_CKSUM
  -----------------
  rx: l2_len=14 ethertype=86dd l3_len=40 l4_proto=6 l4_len=20
  tx: m->l2_len=14 m->l3_len=40 m->l4_len=20
  tx: flags=PKT_TX_TCP_CKSUM
  -----------------
  rx: l2_len=14 ethertype=86dd l3_len=40 l4_proto=6 l4_len=20
  tx: m->l2_len=14 m->l3_len=40 m->l4_len=20
  tx: flags=PKT_TX_TCP_CKSUM
  -----------------
  rx: l2_len=14 ethertype=86dd l3_len=40 l4_proto=6 l4_len=20
  tx: m->l2_len=14 m->l3_len=40 m->l4_len=20
  tx: flags=PKT_TX_TCP_CKSUM
  -----------------
  rx: l2_len=14 ethertype=86dd l3_len=40 l4_proto=6 l4_len=20
  tx: m->l2_len=14 m->l3_len=40 m->l4_len=20
  tx: flags=PKT_TX_TCP_CKSUM
  -----------------
  rx: l2_len=14 ethertype=86dd l3_len=40 l4_proto=6 l4_len=20
  tx: m->l2_len=14 m->l3_len=40 m->l4_len=20
  tx: flags=PKT_TX_TCP_CKSUM
  -----------------
  rx: l2_len=14 ethertype=86dd l3_len=40 l4_proto=6 l4_len=20
  tx: m->l2_len=14 m->l3_len=40 m->l4_len=20
  tx: flags=PKT_TX_TCP_CKSUM
  -----------------
  rx: l2_len=14 ethertype=86dd l3_len=40 l4_proto=6 l4_len=20
  tx: m->l2_len=14 m->l3_len=40 m->l4_len=20
  tx: flags=PKT_TX_TCP_CKSUM
  -----------------
  rx: l2_len=14 ethertype=86dd l3_len=40 l4_proto=6 l4_len=20
  tx: m->l2_len=14 m->l3_len=40 m->l4_len=20
  tx: flags=PKT_TX_TCP_CKSUM

Start test_vxlan() in scapy, result is:

  -----------------
  rx: l2_len=14 ethertype=800 l3_len=20 l4_proto=6 l4_len=20
  rx: outer_l2_len=14 outer_ethertype=800 outer_l3_len=20
  tx: m->l2_len=64 m->l3_len=20 m->l4_len=20
  tx: flags=PKT_TX_IP_CKSUM PKT_TX_TCP_CKSUM
  -----------------
  rx: l2_len=14 ethertype=800 l3_len=20 l4_proto=6 l4_len=20
  rx: outer_l2_len=14 outer_ethertype=800 outer_l3_len=20
  tx: m->l2_len=64 m->l3_len=20 m->l4_len=20
  tx: flags=PKT_TX_IP_CKSUM PKT_TX_TCP_CKSUM
  -----------------
  rx: l2_len=14 ethertype=800 l3_len=20 l4_proto=6 l4_len=20
  rx: outer_l2_len=14 outer_ethertype=800 outer_l3_len=20
  tx: m->l2_len=64 m->l3_len=20 m->l4_len=20
  tx: flags=PKT_TX_IP_CKSUM PKT_TX_TCP_CKSUM
  -----------------
  rx: l2_len=14 ethertype=800 l3_len=20 l4_proto=6 l4_len=20
  rx: outer_l2_len=14 outer_ethertype=800 outer_l3_len=20
  tx: m->l2_len=64 m->l3_len=20 m->l4_len=20
  tx: flags=PKT_TX_IP_CKSUM PKT_TX_TCP_CKSUM
  -----------------
  rx: l2_len=14 ethertype=800 l3_len=20 l4_proto=6 l4_len=20
  rx: outer_l2_len=14 outer_ethertype=800 outer_l3_len=20
  tx: m->l2_len=64 m->l3_len=20 m->l4_len=20
  tx: flags=PKT_TX_IP_CKSUM PKT_TX_TCP_CKSUM
  -----------------
  rx: l2_len=14 ethertype=800 l3_len=20 l4_proto=6 l4_len=20
  rx: outer_l2_len=14 outer_ethertype=800 outer_l3_len=20
  tx: m->l2_len=64 m->l3_len=20 m->l4_len=20
  tx: flags=PKT_TX_IP_CKSUM PKT_TX_TCP_CKSUM
  -----------------
  rx: l2_len=14 ethertype=800 l3_len=20 l4_proto=6 l4_len=20
  rx: outer_l2_len=14 outer_ethertype=800 outer_l3_len=20
  tx: m->l2_len=64 m->l3_len=20 m->l4_len=20
  tx: flags=PKT_TX_IP_CKSUM PKT_TX_TCP_CKSUM
  -----------------
  rx: l2_len=14 ethertype=800 l3_len=20 l4_proto=6 l4_len=20
  rx: outer_l2_len=14 outer_ethertype=800 outer_l3_len=20
  tx: m->l2_len=64 m->l3_len=20 m->l4_len=20
  tx: flags=PKT_TX_IP_CKSUM PKT_TX_TCP_CKSUM
  -----------------
  rx: l2_len=14 ethertype=800 l3_len=20 l4_proto=6 l4_len=20
  rx: outer_l2_len=14 outer_ethertype=800 outer_l3_len=20
  tx: m->l2_len=64 m->l3_len=20 m->l4_len=20
  tx: flags=PKT_TX_IP_CKSUM PKT_TX_TCP_CKSUM
  -----------------
  rx: l2_len=14 ethertype=800 l3_len=20 l4_proto=6 l4_len=20
  rx: outer_l2_len=14 outer_ethertype=800 outer_l3_len=20
  tx: m->l2_len=64 m->l3_len=20 m->l4_len=20
  tx: flags=PKT_TX_IP_CKSUM PKT_TX_TCP_CKSUM
  -----------------
  rx: l2_len=14 ethertype=800 l3_len=20 l4_proto=6 l4_len=20
  rx: outer_l2_len=14 outer_ethertype=800 outer_l3_len=20
  tx: m->l2_len=64 m->l3_len=20 m->l4_len=20
  tx: flags=PKT_TX_IP_CKSUM PKT_TX_TCP_CKSUM
  -----------------
  rx: l2_len=14 ethertype=800 l3_len=20 l4_proto=6 l4_len=20
  rx: outer_l2_len=14 outer_ethertype=800 outer_l3_len=20
  tx: m->l2_len=64 m->l3_len=20 m->l4_len=20
  tx: flags=PKT_TX_IP_CKSUM PKT_TX_TCP_CKSUM
  -----------------
  rx: l2_len=14 ethertype=800 l3_len=20 l4_proto=6 l4_len=20
  rx: outer_l2_len=14 outer_ethertype=800 outer_l3_len=20
  tx: m->l2_len=64 m->l3_len=20 m->l4_len=20
  tx: flags=PKT_TX_IP_CKSUM PKT_TX_TCP_CKSUM
  -----------------
  rx: l2_len=14 ethertype=800 l3_len=20 l4_proto=6 l4_len=20
  rx: outer_l2_len=14 outer_ethertype=800 outer_l3_len=20
  tx: m->l2_len=64 m->l3_len=20 m->l4_len=20
  tx: flags=PKT_TX_IP_CKSUM PKT_TX_TCP_CKSUM
  -----------------
  rx: l2_len=14 ethertype=800 l3_len=20 l4_proto=6 l4_len=20
  rx: outer_l2_len=14 outer_ethertype=800 outer_l3_len=20
  tx: m->l2_len=64 m->l3_len=20 m->l4_len=20
  tx: flags=PKT_TX_IP_CKSUM PKT_TX_TCP_CKSUM
  -----------------
  rx: l2_len=14 ethertype=800 l3_len=20 l4_proto=6 l4_len=20
  rx: outer_l2_len=14 outer_ethertype=800 outer_l3_len=20
  tx: m->l2_len=64 m->l3_len=20 m->l4_len=20
  tx: flags=PKT_TX_IP_CKSUM PKT_TX_TCP_CKSUM
  -----------------
  rx: l2_len=14 ethertype=800 l3_len=20 l4_proto=6 l4_len=20
  rx: outer_l2_len=14 outer_ethertype=800 outer_l3_len=20
  tx: m->l2_len=64 m->l3_len=20 m->l4_len=20
  tx: flags=PKT_TX_IP_CKSUM PKT_TX_TCP_CKSUM
  -----------------
  rx: l2_len=14 ethertype=800 l3_len=20 l4_proto=6 l4_len=20
  rx: outer_l2_len=14 outer_ethertype=800 outer_l3_len=20
  tx: m->l2_len=64 m->l3_len=20 m->l4_len=20
  tx: flags=PKT_TX_IP_CKSUM PKT_TX_TCP_CKSUM
  -----------------
  rx: l2_len=14 ethertype=800 l3_len=20 l4_proto=6 l4_len=20
  rx: outer_l2_len=14 outer_ethertype=800 outer_l3_len=20
  tx: m->l2_len=64 m->l3_len=20 m->l4_len=20
  tx: flags=PKT_TX_IP_CKSUM PKT_TX_TCP_CKSUM
  -----------------
  rx: l2_len=14 ethertype=800 l3_len=20 l4_proto=6 l4_len=20
  rx: outer_l2_len=14 outer_ethertype=800 outer_l3_len=20
  tx: m->l2_len=64 m->l3_len=20 m->l4_len=20
  tx: flags=PKT_TX_IP_CKSUM PKT_TX_TCP_CKSUM

Check the capture file (test3-cap-hw-cksum.cap)

Note that the outer UDP checksum is wrong when it is not 0. This is
expected: software cannot compute a valid outer UDP checksum, because it
covers the inner packet, whose checksum is filled in later by the
hardware.
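
A small sketch of why the outer checksum goes stale, under simplifying
assumptions: the inner packet is modelled as a 32-byte buffer, the
pseudo header and UDP header are ignored, and the offset and value
written by the "hardware" are made up for the illustration.

```python
import struct

def internet_checksum(data: bytes) -> int:
    """One's-complement checksum over data (RFC 1071)."""
    if len(data) % 2:
        data += b"\x00"
    total = sum(struct.unpack("!%dH" % (len(data) // 2), data))
    while total >> 16:
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF

# Stand-in for the encapsulated packet as software sees it before TX:
# the inner checksum field has not been filled in yet.
inner = bytearray(32)
outer_csum_sw = internet_checksum(bytes(inner))

# The hardware later writes the inner checksum (hypothetical offset 16,
# hypothetical value 0x1234), changing bytes the outer checksum covers.
struct.pack_into("!H", inner, 16, 0x1234)
outer_csum_wire = internet_checksum(bytes(inner))

stale = outer_csum_sw != outer_csum_wire  # True
```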

Test 4: csum fwd engine, use TSO
================================

The goal of this test is to verify that TSO is working properly.

  # enable hw checksum
  tx_checksum set ip hw 0
  tx_checksum set udp hw 0
  tx_checksum set tcp hw 0
  tx_checksum set sctp hw 0
  # enable TSO
  tso set 800 0
  # set fwd engine and start
  set verbose 1
  set fwd csum
  start
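
To know what to expect in the capture, the arithmetic behind
"tso set 800 0" can be sketched as follows. Each output frame carries a
copy of the headers (l2_len + l3_len + l4_len) followed by at most
tso_segsz bytes of TCP payload. The function name and the 2000-byte
payload are hypothetical; the actual payload sizes depend on the scapy
test functions.

```python
def tso_frame_lengths(payload_len: int, tso_segsz: int,
                      l2_len: int, l3_len: int, l4_len: int) -> list:
    """Return the length of each frame produced by segmentation."""
    hdr_len = l2_len + l3_len + l4_len  # headers replicated per segment
    frames = []
    remaining = payload_len
    while remaining > 0:
        chunk = min(tso_segsz, remaining)
        frames.append(hdr_len + chunk)
        remaining -= chunk
    return frames

# IPv4/TCP headers as printed in the logs: 14 + 20 + 20 = 54 bytes.
v4_frames = tso_frame_lengths(2000, 800, 14, 20, 20)  # [854, 854, 454]

# IPv6/TCP headers as printed in the logs: 14 + 40 + 20 = 74 bytes.
v6_frames = tso_frame_lengths(2000, 800, 14, 40, 20)  # [874, 874, 474]
```

A payload that already fits in one segment yields a single frame.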

Start test_v4() in scapy, result is:

  -----------------
  rx: l2_len=14 ethertype=800 l3_len=20 l4_proto=6 l4_len=20
  tx: m->l2_len=14 m->l3_len=20 m->l4_len=20
  tx: m->tso_segsz=800
  tx: flags=PKT_TX_IP_CKSUM PKT_TX_TCP_SEG
  -----------------
  rx: l2_len=14 ethertype=800 l3_len=20 l4_proto=6 l4_len=20
  tx: m->l2_len=14 m->l3_len=20 m->l4_len=20
  tx: m->tso_segsz=800
  tx: flags=PKT_TX_IP_CKSUM PKT_TX_TCP_SEG
  -----------------
  rx: l2_len=14 ethertype=800 l3_len=20 l4_proto=6 l4_len=20
  tx: m->l2_len=14 m->l3_len=20 m->l4_len=20
  tx: m->tso_segsz=800
  tx: flags=PKT_TX_IP_CKSUM PKT_TX_TCP_SEG
  -----------------
  rx: l2_len=14 ethertype=800 l3_len=20 l4_proto=6 l4_len=20
  tx: m->l2_len=14 m->l3_len=20 m->l4_len=20
  tx: m->tso_segsz=800
  tx: flags=PKT_TX_IP_CKSUM PKT_TX_TCP_SEG
  -----------------
  rx: l2_len=14 ethertype=800 l3_len=20 l4_proto=6 l4_len=20
  tx: m->l2_len=14 m->l3_len=20 m->l4_len=20
  tx: m->tso_segsz=800
  tx: flags=PKT_TX_IP_CKSUM PKT_TX_TCP_SEG
  -----------------
  rx: l2_len=14 ethertype=800 l3_len=20 l4_proto=17 l4_len=0
  tx: m->l2_len=14 m->l3_len=20 m->l4_len=0
  tx: m->tso_segsz=800
  tx: flags=PKT_TX_IP_CKSUM PKT_TX_UDP_CKSUM
  -----------------
  rx: l2_len=14 ethertype=800 l3_len=20 l4_proto=17 l4_len=0
  tx: m->l2_len=14 m->l3_len=20 m->l4_len=0
  tx: m->tso_segsz=800
  tx: flags=PKT_TX_IP_CKSUM PKT_TX_UDP_CKSUM
  -----------------
  rx: l2_len=14 ethertype=800 l3_len=20 l4_proto=17 l4_len=0
  tx: m->l2_len=14 m->l3_len=20 m->l4_len=0
  tx: m->tso_segsz=800
  tx: flags=PKT_TX_IP_CKSUM PKT_TX_UDP_CKSUM
  -----------------
  rx: l2_len=14 ethertype=800 l3_len=20 l4_proto=17 l4_len=0
  tx: m->l2_len=14 m->l3_len=20 m->l4_len=0
  tx: m->tso_segsz=800
  tx: flags=PKT_TX_IP_CKSUM PKT_TX_UDP_CKSUM
  -----------------
  rx: l2_len=14 ethertype=800 l3_len=20 l4_proto=17 l4_len=0
  tx: m->l2_len=14 m->l3_len=20 m->l4_len=0
  tx: m->tso_segsz=800
  tx: flags=PKT_TX_IP_CKSUM PKT_TX_UDP_CKSUM
  -----------------
  rx: l2_len=14 ethertype=800 l3_len=20 l4_proto=6 l4_len=20
  tx: m->l2_len=14 m->l3_len=20 m->l4_len=20
  tx: m->tso_segsz=800
  tx: flags=PKT_TX_IP_CKSUM PKT_TX_TCP_SEG
  -----------------
  rx: l2_len=14 ethertype=800 l3_len=20 l4_proto=6 l4_len=20
  tx: m->l2_len=14 m->l3_len=20 m->l4_len=20
  tx: m->tso_segsz=800
  tx: flags=PKT_TX_IP_CKSUM PKT_TX_TCP_SEG
  -----------------
  rx: l2_len=14 ethertype=800 l3_len=20 l4_proto=6 l4_len=20
  tx: m->l2_len=14 m->l3_len=20 m->l4_len=20
  tx: m->tso_segsz=800
  tx: flags=PKT_TX_IP_CKSUM PKT_TX_TCP_SEG
  -----------------
  rx: l2_len=14 ethertype=800 l3_len=20 l4_proto=6 l4_len=20
  tx: m->l2_len=14 m->l3_len=20 m->l4_len=20
  tx: m->tso_segsz=800
  tx: flags=PKT_TX_IP_CKSUM PKT_TX_TCP_SEG
  -----------------
  rx: l2_len=14 ethertype=800 l3_len=20 l4_proto=6 l4_len=20
  tx: m->l2_len=14 m->l3_len=20 m->l4_len=20
  tx: m->tso_segsz=800
  tx: flags=PKT_TX_IP_CKSUM PKT_TX_TCP_SEG
  -----------------
  rx: l2_len=14 ethertype=800 l3_len=20 l4_proto=6 l4_len=20
  tx: m->l2_len=14 m->l3_len=20 m->l4_len=20
  tx: m->tso_segsz=800
  tx: flags=PKT_TX_IP_CKSUM PKT_TX_TCP_SEG
  -----------------
  rx: l2_len=14 ethertype=800 l3_len=20 l4_proto=6 l4_len=20
  tx: m->l2_len=14 m->l3_len=20 m->l4_len=20
  tx: m->tso_segsz=800
  tx: flags=PKT_TX_IP_CKSUM PKT_TX_TCP_SEG
  -----------------
  rx: l2_len=14 ethertype=800 l3_len=20 l4_proto=6 l4_len=20
  tx: m->l2_len=14 m->l3_len=20 m->l4_len=20
  tx: m->tso_segsz=800
  tx: flags=PKT_TX_IP_CKSUM PKT_TX_TCP_SEG
  -----------------
  rx: l2_len=14 ethertype=800 l3_len=20 l4_proto=6 l4_len=20
  tx: m->l2_len=14 m->l3_len=20 m->l4_len=20
  tx: m->tso_segsz=800
  tx: flags=PKT_TX_IP_CKSUM PKT_TX_TCP_SEG
  -----------------
  rx: l2_len=14 ethertype=800 l3_len=20 l4_proto=6 l4_len=20
  tx: m->l2_len=14 m->l3_len=20 m->l4_len=20
  tx: m->tso_segsz=800
  tx: flags=PKT_TX_IP_CKSUM PKT_TX_TCP_SEG
  -----------------
  rx: l2_len=14 ethertype=800 l3_len=20 l4_proto=6 l4_len=20
  tx: m->l2_len=14 m->l3_len=20 m->l4_len=20
  tx: m->tso_segsz=800
  tx: flags=PKT_TX_IP_CKSUM PKT_TX_TCP_SEG
  -----------------
  rx: l2_len=14 ethertype=800 l3_len=20 l4_proto=6 l4_len=20
  tx: m->l2_len=14 m->l3_len=20 m->l4_len=20
  tx: m->tso_segsz=800
  tx: flags=PKT_TX_IP_CKSUM PKT_TX_TCP_SEG
  -----------------
  rx: l2_len=14 ethertype=800 l3_len=20 l4_proto=6 l4_len=20
  tx: m->l2_len=14 m->l3_len=20 m->l4_len=20
  tx: m->tso_segsz=800
  tx: flags=PKT_TX_IP_CKSUM PKT_TX_TCP_SEG
  -----------------
  rx: l2_len=14 ethertype=800 l3_len=20 l4_proto=6 l4_len=20
  tx: m->l2_len=14 m->l3_len=20 m->l4_len=20
  tx: m->tso_segsz=800
  tx: flags=PKT_TX_IP_CKSUM PKT_TX_TCP_SEG
  -----------------
  rx: l2_len=14 ethertype=800 l3_len=20 l4_proto=6 l4_len=20
  tx: m->l2_len=14 m->l3_len=20 m->l4_len=20
  tx: m->tso_segsz=800
  tx: flags=PKT_TX_IP_CKSUM PKT_TX_TCP_SEG

Start test_v6() in scapy, result is:

  -----------------
  rx: l2_len=14 ethertype=86dd l3_len=40 l4_proto=6 l4_len=20
  tx: m->l2_len=14 m->l3_len=40 m->l4_len=20
  tx: m->tso_segsz=800
  tx: flags=PKT_TX_TCP_SEG
  -----------------
  rx: l2_len=14 ethertype=86dd l3_len=40 l4_proto=6 l4_len=20
  tx: m->l2_len=14 m->l3_len=40 m->l4_len=20
  tx: m->tso_segsz=800
  tx: flags=PKT_TX_TCP_SEG
  -----------------
  rx: l2_len=14 ethertype=86dd l3_len=40 l4_proto=6 l4_len=20
  tx: m->l2_len=14 m->l3_len=40 m->l4_len=20
  tx: m->tso_segsz=800
  tx: flags=PKT_TX_TCP_SEG
  -----------------
  rx: l2_len=14 ethertype=86dd l3_len=40 l4_proto=6 l4_len=20
  tx: m->l2_len=14 m->l3_len=40 m->l4_len=20
  tx: m->tso_segsz=800
  tx: flags=PKT_TX_TCP_SEG
  -----------------
  rx: l2_len=14 ethertype=86dd l3_len=40 l4_proto=6 l4_len=20
  tx: m->l2_len=14 m->l3_len=40 m->l4_len=20
  tx: m->tso_segsz=800
  tx: flags=PKT_TX_TCP_SEG
  -----------------
  rx: l2_len=14 ethertype=86dd l3_len=40 l4_proto=17 l4_len=0
  tx: m->l2_len=14 m->l3_len=40 m->l4_len=0
  tx: m->tso_segsz=800
  tx: flags=PKT_TX_UDP_CKSUM
  -----------------
  rx: l2_len=14 ethertype=86dd l3_len=40 l4_proto=17 l4_len=0
  tx: m->l2_len=14 m->l3_len=40 m->l4_len=0
  tx: m->tso_segsz=800
  tx: flags=PKT_TX_UDP_CKSUM
  -----------------
  rx: l2_len=14 ethertype=86dd l3_len=40 l4_proto=17 l4_len=0
  tx: m->l2_len=14 m->l3_len=40 m->l4_len=0
  tx: m->tso_segsz=800
  tx: flags=PKT_TX_UDP_CKSUM
  -----------------
  rx: l2_len=14 ethertype=86dd l3_len=40 l4_proto=17 l4_len=0
  tx: m->l2_len=14 m->l3_len=40 m->l4_len=0
  tx: m->tso_segsz=800
  tx: flags=PKT_TX_UDP_CKSUM
  -----------------
  rx: l2_len=14 ethertype=86dd l3_len=40 l4_proto=17 l4_len=0
  tx: m->l2_len=14 m->l3_len=40 m->l4_len=0
  tx: m->tso_segsz=800
  tx: flags=PKT_TX_UDP_CKSUM
  -----------------
  rx: l2_len=14 ethertype=86dd l3_len=40 l4_proto=6 l4_len=20
  tx: m->l2_len=14 m->l3_len=40 m->l4_len=20
  tx: m->tso_segsz=800
  tx: flags=PKT_TX_TCP_SEG
  -----------------
  rx: l2_len=14 ethertype=86dd l3_len=40 l4_proto=6 l4_len=20
  tx: m->l2_len=14 m->l3_len=40 m->l4_len=20
  tx: m->tso_segsz=800
  tx: flags=PKT_TX_TCP_SEG
  -----------------
  rx: l2_len=14 ethertype=86dd l3_len=40 l4_proto=6 l4_len=20
  tx: m->l2_len=14 m->l3_len=40 m->l4_len=20
  tx: m->tso_segsz=800
  tx: flags=PKT_TX_TCP_SEG
  -----------------
  rx: l2_len=14 ethertype=86dd l3_len=40 l4_proto=6 l4_len=20
  tx: m->l2_len=14 m->l3_len=40 m->l4_len=20
  tx: m->tso_segsz=800
  tx: flags=PKT_TX_TCP_SEG
  -----------------
  rx: l2_len=14 ethertype=86dd l3_len=40 l4_proto=6 l4_len=20
  tx: m->l2_len=14 m->l3_len=40 m->l4_len=20
  tx: m->tso_segsz=800
  tx: flags=PKT_TX_TCP_SEG
  -----------------
  rx: l2_len=14 ethertype=86dd l3_len=40 l4_proto=6 l4_len=20
  tx: m->l2_len=14 m->l3_len=40 m->l4_len=20
  tx: m->tso_segsz=800
  tx: flags=PKT_TX_TCP_SEG
  -----------------
  rx: l2_len=14 ethertype=86dd l3_len=40 l4_proto=6 l4_len=20
  tx: m->l2_len=14 m->l3_len=40 m->l4_len=20
  tx: m->tso_segsz=800
  tx: flags=PKT_TX_TCP_SEG
  -----------------
  rx: l2_len=14 ethertype=86dd l3_len=40 l4_proto=6 l4_len=20
  tx: m->l2_len=14 m->l3_len=40 m->l4_len=20
  tx: m->tso_segsz=800
  tx: flags=PKT_TX_TCP_SEG
  -----------------
  rx: l2_len=14 ethertype=86dd l3_len=40 l4_proto=6 l4_len=20
  tx: m->l2_len=14 m->l3_len=40 m->l4_len=20
  tx: m->tso_segsz=800
  tx: flags=PKT_TX_TCP_SEG
  -----------------
  rx: l2_len=14 ethertype=86dd l3_len=40 l4_proto=6 l4_len=20
  tx: m->l2_len=14 m->l3_len=40 m->l4_len=20
  tx: m->tso_segsz=800
  tx: flags=PKT_TX_TCP_SEG

Start test_vxlan() in scapy, result is:

  -----------------
  rx: l2_len=14 ethertype=800 l3_len=20 l4_proto=6 l4_len=20
  rx: outer_l2_len=14 outer_ethertype=800 outer_l3_len=20
  tx: m->l2_len=64 m->l3_len=20 m->l4_len=20
  tx: m->tso_segsz=800
  tx: flags=PKT_TX_IP_CKSUM PKT_TX_TCP_SEG
  -----------------
  rx: l2_len=14 ethertype=800 l3_len=20 l4_proto=6 l4_len=20
  rx: outer_l2_len=14 outer_ethertype=800 outer_l3_len=20
  tx: m->l2_len=64 m->l3_len=20 m->l4_len=20
  tx: m->tso_segsz=800
  tx: flags=PKT_TX_IP_CKSUM PKT_TX_TCP_SEG
  -----------------
  rx: l2_len=14 ethertype=800 l3_len=20 l4_proto=6 l4_len=20
  rx: outer_l2_len=14 outer_ethertype=800 outer_l3_len=20
  tx: m->l2_len=64 m->l3_len=20 m->l4_len=20
  tx: m->tso_segsz=800
  tx: flags=PKT_TX_IP_CKSUM PKT_TX_TCP_SEG
  -----------------
  rx: l2_len=14 ethertype=800 l3_len=20 l4_proto=6 l4_len=20
  rx: outer_l2_len=14 outer_ethertype=800 outer_l3_len=20
  tx: m->l2_len=64 m->l3_len=20 m->l4_len=20
  tx: m->tso_segsz=800
  tx: flags=PKT_TX_IP_CKSUM PKT_TX_TCP_SEG
  -----------------
  rx: l2_len=14 ethertype=800 l3_len=20 l4_proto=6 l4_len=20
  rx: outer_l2_len=14 outer_ethertype=800 outer_l3_len=20
  tx: m->l2_len=64 m->l3_len=20 m->l4_len=20
  tx: m->tso_segsz=800
  tx: flags=PKT_TX_IP_CKSUM PKT_TX_TCP_SEG
  -----------------
  rx: l2_len=14 ethertype=800 l3_len=20 l4_proto=6 l4_len=20
  rx: outer_l2_len=14 outer_ethertype=800 outer_l3_len=20
  tx: m->l2_len=64 m->l3_len=20 m->l4_len=20
  tx: m->tso_segsz=800
  tx: flags=PKT_TX_IP_CKSUM PKT_TX_TCP_SEG
  -----------------
  rx: l2_len=14 ethertype=800 l3_len=20 l4_proto=6 l4_len=20
  rx: outer_l2_len=14 outer_ethertype=800 outer_l3_len=20
  tx: m->l2_len=64 m->l3_len=20 m->l4_len=20
  tx: m->tso_segsz=800
  tx: flags=PKT_TX_IP_CKSUM PKT_TX_TCP_SEG
  -----------------
  rx: l2_len=14 ethertype=800 l3_len=20 l4_proto=6 l4_len=20
  rx: outer_l2_len=14 outer_ethertype=800 outer_l3_len=20
  tx: m->l2_len=64 m->l3_len=20 m->l4_len=20
  tx: m->tso_segsz=800
  tx: flags=PKT_TX_IP_CKSUM PKT_TX_TCP_SEG
  -----------------
  rx: l2_len=14 ethertype=800 l3_len=20 l4_proto=6 l4_len=20
  rx: outer_l2_len=14 outer_ethertype=800 outer_l3_len=20
  tx: m->l2_len=64 m->l3_len=20 m->l4_len=20
  tx: m->tso_segsz=800
  tx: flags=PKT_TX_IP_CKSUM PKT_TX_TCP_SEG
  -----------------
  rx: l2_len=14 ethertype=800 l3_len=20 l4_proto=6 l4_len=20
  rx: outer_l2_len=14 outer_ethertype=800 outer_l3_len=20
  tx: m->l2_len=64 m->l3_len=20 m->l4_len=20
  tx: m->tso_segsz=800
  tx: flags=PKT_TX_IP_CKSUM PKT_TX_TCP_SEG
  -----------------
  rx: l2_len=14 ethertype=800 l3_len=20 l4_proto=6 l4_len=20
  rx: outer_l2_len=14 outer_ethertype=800 outer_l3_len=20
  tx: m->l2_len=64 m->l3_len=20 m->l4_len=20
  tx: m->tso_segsz=800
  tx: flags=PKT_TX_IP_CKSUM PKT_TX_TCP_SEG
  -----------------
  rx: l2_len=14 ethertype=800 l3_len=20 l4_proto=6 l4_len=20
  rx: outer_l2_len=14 outer_ethertype=800 outer_l3_len=20
  tx: m->l2_len=64 m->l3_len=20 m->l4_len=20
  tx: m->tso_segsz=800
  tx: flags=PKT_TX_IP_CKSUM PKT_TX_TCP_SEG
  -----------------
  rx: l2_len=14 ethertype=800 l3_len=20 l4_proto=6 l4_len=20
  rx: outer_l2_len=14 outer_ethertype=800 outer_l3_len=20
  tx: m->l2_len=64 m->l3_len=20 m->l4_len=20
  tx: m->tso_segsz=800
  tx: flags=PKT_TX_IP_CKSUM PKT_TX_TCP_SEG
  -----------------
  rx: l2_len=14 ethertype=800 l3_len=20 l4_proto=6 l4_len=20
  rx: outer_l2_len=14 outer_ethertype=800 outer_l3_len=20
  tx: m->l2_len=64 m->l3_len=20 m->l4_len=20
  tx: m->tso_segsz=800
  tx: flags=PKT_TX_IP_CKSUM PKT_TX_TCP_SEG
  -----------------
  rx: l2_len=14 ethertype=800 l3_len=20 l4_proto=6 l4_len=20
  rx: outer_l2_len=14 outer_ethertype=800 outer_l3_len=20
  tx: m->l2_len=64 m->l3_len=20 m->l4_len=20
  tx: m->tso_segsz=800
  tx: flags=PKT_TX_IP_CKSUM PKT_TX_TCP_SEG
  -----------------
  rx: l2_len=14 ethertype=800 l3_len=20 l4_proto=6 l4_len=20
  rx: outer_l2_len=14 outer_ethertype=800 outer_l3_len=20
  tx: m->l2_len=64 m->l3_len=20 m->l4_len=20
  tx: m->tso_segsz=800
  tx: flags=PKT_TX_IP_CKSUM PKT_TX_TCP_SEG
  -----------------
  rx: l2_len=14 ethertype=800 l3_len=20 l4_proto=6 l4_len=20
  rx: outer_l2_len=14 outer_ethertype=800 outer_l3_len=20
  tx: m->l2_len=64 m->l3_len=20 m->l4_len=20
  tx: m->tso_segsz=800
  tx: flags=PKT_TX_IP_CKSUM PKT_TX_TCP_SEG
  -----------------
  rx: l2_len=14 ethertype=800 l3_len=20 l4_proto=6 l4_len=20
  rx: outer_l2_len=14 outer_ethertype=800 outer_l3_len=20
  tx: m->l2_len=64 m->l3_len=20 m->l4_len=20
  tx: m->tso_segsz=800
  tx: flags=PKT_TX_IP_CKSUM PKT_TX_TCP_SEG
  -----------------
  rx: l2_len=14 ethertype=800 l3_len=20 l4_proto=6 l4_len=20
  rx: outer_l2_len=14 outer_ethertype=800 outer_l3_len=20
  tx: m->l2_len=64 m->l3_len=20 m->l4_len=20
  tx: m->tso_segsz=800
  tx: flags=PKT_TX_IP_CKSUM PKT_TX_TCP_SEG
  -----------------
  rx: l2_len=14 ethertype=800 l3_len=20 l4_proto=6 l4_len=20
  rx: outer_l2_len=14 outer_ethertype=800 outer_l3_len=20
  tx: m->l2_len=64 m->l3_len=20 m->l4_len=20
  tx: m->tso_segsz=800
  tx: flags=PKT_TX_IP_CKSUM PKT_TX_TCP_SEG

Check the capture file (test4-cap-tso.cap)

Note that the outer UDP checksum is wrong whenever it is not 0. This is
expected: the outer UDP checksum covers the inner packet, and the inner
checksums are rewritten by the hardware afterwards, so the software
cannot calculate a valid outer checksum in advance.

* Re: [dpdk-dev] [PATCH 00/12] add TSO support
  2014-11-11  9:21 ` [dpdk-dev] [PATCH 00/12] add TSO support Olivier MATZ
@ 2014-11-11  9:48   ` Olivier MATZ
  0 siblings, 0 replies; 112+ messages in thread
From: Olivier MATZ @ 2014-11-11  9:48 UTC (permalink / raw)
  To: dev; +Cc: jigsaw

On 11/11/2014 10:21 AM, Olivier MATZ wrote:
> Check the capture file (test2-cap-sw-cksum.cap)

> Check the capture file (test3-cap-hw-cksum.cap)

> Check the capture file (test4-cap-tso.cap)

Sorry, the attachments are automatically stripped by the list.
You can find them here:
https://www.droids-corp.org/~zer0/dpdk-tso-cap/

Olivier

* Re: [dpdk-dev] [PATCH 10/12] testpmd: rework csum forward engine
  2014-11-11  8:35   ` Liu, Jijiang
@ 2014-11-11  9:55     ` Olivier MATZ
  0 siblings, 0 replies; 112+ messages in thread
From: Olivier MATZ @ 2014-11-11  9:55 UTC (permalink / raw)
  To: Liu, Jijiang, dev; +Cc: jigsaw

Hi Jijiang,

On 11/11/2014 09:35 AM, Liu, Jijiang wrote:
> The PKT_TX_VXLAN_CKSUM was not set in the patch, and VXLAN TX checksum offload would not work. 

Thank you for reporting this. Indeed, there is an issue. See below.

>> +/* Calculate the checksum of outer header (only vxlan is supported,
>> + * meaning IP + UDP). The caller already checked that it's a vxlan
>> + * packet */
>> +static uint64_t
>> +process_outer_cksums(void *outer_l3_hdr, uint16_t outer_ethertype,
>> +	uint16_t outer_l3_len, uint16_t testpmd_ol_flags) {
>> +	struct ipv4_hdr *ipv4_hdr = outer_l3_hdr;
>> +	struct ipv6_hdr *ipv6_hdr = outer_l3_hdr;
>> +	struct udp_hdr *udp_hdr;
>> +	uint64_t ol_flags = 0;
>> +
>> +	if (testpmd_ol_flags & TESTPMD_TX_OFFLOAD_VXLAN_CKSUM)
>> +		ol_flags |= PKT_TX_IP_CKSUM;

Here it should be: ol_flags |= PKT_TX_VXLAN_CKSUM

I'll fix that in the next version.

Regards,
Olivier

* Re: [dpdk-dev] [PATCH 07/12] mbuf: generic support for TCP segmentation offload
  2014-11-10 15:59 ` [dpdk-dev] [PATCH 07/12] mbuf: generic support for TCP segmentation offload Olivier Matz
  2014-11-11  3:17   ` Liu, Jijiang
@ 2014-11-12 13:09   ` Ananyev, Konstantin
  1 sibling, 0 replies; 112+ messages in thread
From: Ananyev, Konstantin @ 2014-11-12 13:09 UTC (permalink / raw)
  To: Olivier Matz, dev; +Cc: jigsaw



> -----Original Message-----
> From: Olivier Matz [mailto:olivier.matz@6wind.com]
> Sent: Monday, November 10, 2014 3:59 PM
> To: dev@dpdk.org
> Cc: olivier.matz@6wind.com; Walukiewicz, Miroslaw; Liu, Jijiang; Liu, Yong; jigsaw@gmail.com; Richardson, Bruce; Ananyev, Konstantin
> Subject: [PATCH 07/12] mbuf: generic support for TCP segmentation offload
> 
> Some of the NICs supported by DPDK have a possibility to accelerate TCP
> traffic by using segmentation offload. The application prepares a packet
> with valid TCP header with size up to 64K and delegates the
> segmentation to the NIC.
> 
> Implement the generic part of TCP segmentation offload in rte_mbuf. It
> introduces 2 new fields in rte_mbuf: l4_len (length of L4 header in bytes)
> and tso_segsz (MSS of packets).
> 
> To delegate the TCP segmentation to the hardware, the user has to:
> 
> - set the PKT_TX_TCP_SEG flag in mbuf->ol_flags (this flag implies
>   PKT_TX_TCP_CKSUM)
> - set PKT_TX_IP_CKSUM if it's IPv4, and set the IP checksum to 0 in
>   the packet
> - fill the mbuf offload information: l2_len, l3_len, l4_len, tso_segsz
> - calculate the pseudo header checksum and set it in the TCP header,
>   as required when doing hardware TCP checksum offload
> 
> The API is inspired from ixgbe hardware (the next commit adds the
> support for ixgbe), but it seems generic enough to be used for other
> hw/drivers in the future.
> 
> This commit also reworks the way l2_len and l3_len are used in igb
> and ixgbe drivers as the l2_l3_len is not available anymore in mbuf.
> 
> Signed-off-by: Mirek Walukiewicz <miroslaw.walukiewicz@intel.com>
> Signed-off-by: Olivier Matz <olivier.matz@6wind.com>
> ---
>  app/test-pmd/testpmd.c            |  3 ++-
>  examples/ipv4_multicast/main.c    |  3 ++-
>  lib/librte_mbuf/rte_mbuf.h        | 44 +++++++++++++++++++++++----------------
>  lib/librte_pmd_e1000/igb_rxtx.c   | 11 +++++++++-
>  lib/librte_pmd_ixgbe/ixgbe_rxtx.c | 11 +++++++++-
>  5 files changed, 50 insertions(+), 22 deletions(-)
> 
> diff --git a/app/test-pmd/testpmd.c b/app/test-pmd/testpmd.c
> index 12adafa..a831e31 100644
> --- a/app/test-pmd/testpmd.c
> +++ b/app/test-pmd/testpmd.c
> @@ -408,7 +408,8 @@ testpmd_mbuf_ctor(struct rte_mempool *mp,
>  	mb->ol_flags     = 0;
>  	mb->data_off     = RTE_PKTMBUF_HEADROOM;
>  	mb->nb_segs      = 1;
> -	mb->l2_l3_len       = 0;
> +	mb->l2_len       = 0;
> +	mb->l3_len       = 0;
>  	mb->vlan_tci     = 0;
>  	mb->hash.rss     = 0;
>  }
> diff --git a/examples/ipv4_multicast/main.c b/examples/ipv4_multicast/main.c
> index de5e6be..a31d43d 100644
> --- a/examples/ipv4_multicast/main.c
> +++ b/examples/ipv4_multicast/main.c
> @@ -302,7 +302,8 @@ mcast_out_pkt(struct rte_mbuf *pkt, int use_clone)
>  	/* copy metadata from source packet*/
>  	hdr->port = pkt->port;
>  	hdr->vlan_tci = pkt->vlan_tci;
> -	hdr->l2_l3_len = pkt->l2_l3_len;
> +	hdr->l2_len = pkt->l2_len;
> +	hdr->l3_len = pkt->l3_len;
>  	hdr->hash = pkt->hash;
> 
>  	hdr->ol_flags = pkt->ol_flags;
> diff --git a/lib/librte_mbuf/rte_mbuf.h b/lib/librte_mbuf/rte_mbuf.h
> index bcd8996..f76b768 100644
> --- a/lib/librte_mbuf/rte_mbuf.h
> +++ b/lib/librte_mbuf/rte_mbuf.h
> @@ -126,6 +126,19 @@ extern "C" {
> 
>  #define PKT_TX_VXLAN_CKSUM   (1ULL << 50) /**< TX checksum of VXLAN computed by NIC */
> 
> +/**
> + * TCP segmentation offload. To enable this offload feature for a
> + * packet to be transmitted on hardware supporting TSO:
> + *  - set the PKT_TX_TCP_SEG flag in mbuf->ol_flags (this flag implies
> + *    PKT_TX_TCP_CKSUM)
> + *  - if it's IPv4, set the PKT_TX_IP_CKSUM flag and write the IP checksum
> + *    to 0 in the packet
> + *  - fill the mbuf offload information: l2_len, l3_len, l4_len, tso_segsz
> + *  - calculate the pseudo header checksum and set it in the TCP header,
> + *    as required when doing hardware TCP checksum offload
> + */
> +#define PKT_TX_TCP_SEG       (1ULL << 49)
> +
>  /* Use final bit of flags to indicate a control mbuf */
>  #define CTRL_MBUF_FLAG       (1ULL << 63) /**< Mbuf contains control data */
> 
> @@ -185,6 +198,7 @@ static inline const char *rte_get_tx_ol_flag_name(uint64_t mask)
>  	case PKT_TX_UDP_CKSUM: return "PKT_TX_UDP_CKSUM";
>  	case PKT_TX_IEEE1588_TMST: return "PKT_TX_IEEE1588_TMST";
>  	case PKT_TX_VXLAN_CKSUM: return "PKT_TX_VXLAN_CKSUM";
> +	case PKT_TX_TCP_SEG: return "PKT_TX_TCP_SEG";
>  	default: return NULL;
>  	}
>  }
> @@ -264,22 +278,18 @@ struct rte_mbuf {
> 
>  	/* fields to support TX offloads */
>  	union {
> -		uint16_t l2_l3_len; /**< combined l2/l3 lengths as single var */
> +		uint64_t tx_offload;       /**< combined for easy fetch */
>  		struct {
> -			uint16_t l3_len:9;      /**< L3 (IP) Header Length. */
> -			uint16_t l2_len:7;      /**< L2 (MAC) Header Length. */
> -		};
> -	};
> +			uint64_t l2_len:7; /**< L2 (MAC) Header Length. */
> +			uint64_t l3_len:9; /**< L3 (IP) Header Length. */
> +			uint64_t l4_len:8; /**< L4 (TCP/UDP) Header Length. */
> +			uint64_t tso_segsz:16; /**< TCP TSO segment size */
> 
> -	/* fields for TX offloading of tunnels */
> -	union {
> -		uint16_t inner_l2_l3_len;
> -		/**< combined inner l2/l3 lengths as single var */
> -		struct {
> -			uint16_t inner_l3_len:9;
> -			/**< inner L3 (IP) Header Length. */
> -			uint16_t inner_l2_len:7;
> -			/**< inner L2 (MAC) Header Length. */
> +			/* fields for TX offloading of tunnels */
> +			uint16_t inner_l3_len:9; /**< inner L3 (IP) Hdr Length. */
> +			uint16_t inner_l2_len:7; /**< inner L2 (MAC) Hdr Length. */
> +

Shouldn't these 2 fields be bit fields of uint64_t too?
uint64_t inner_l3_len:9;
uint64_t inner_l2_len:7;
So that it all fits into one uint64_t?

Konstantin

> +			/* uint64_t unused:8; */
>  		};
>  	};
>  } __rte_cache_aligned;
> @@ -631,8 +641,7 @@ static inline void rte_pktmbuf_reset(struct rte_mbuf *m)
>  {
>  	m->next = NULL;
>  	m->pkt_len = 0;
> -	m->l2_l3_len = 0;
> -	m->inner_l2_l3_len = 0;
> +	m->tx_offload = 0;
>  	m->vlan_tci = 0;
>  	m->nb_segs = 1;
>  	m->port = 0xff;
> @@ -701,8 +710,7 @@ static inline void rte_pktmbuf_attach(struct rte_mbuf *mi, struct rte_mbuf *md)
>  	mi->data_len = md->data_len;
>  	mi->port = md->port;
>  	mi->vlan_tci = md->vlan_tci;
> -	mi->l2_l3_len = md->l2_l3_len;
> -	mi->inner_l2_l3_len = md->inner_l2_l3_len;
> +	mi->tx_offload = md->tx_offload;
>  	mi->hash = md->hash;
> 
>  	mi->next = NULL;
> diff --git a/lib/librte_pmd_e1000/igb_rxtx.c b/lib/librte_pmd_e1000/igb_rxtx.c
> index dbf5074..0a9447e 100644
> --- a/lib/librte_pmd_e1000/igb_rxtx.c
> +++ b/lib/librte_pmd_e1000/igb_rxtx.c
> @@ -361,6 +361,13 @@ eth_igb_xmit_pkts(void *tx_queue, struct rte_mbuf **tx_pkts,
>  	struct rte_mbuf     *tx_pkt;
>  	struct rte_mbuf     *m_seg;
>  	union igb_vlan_macip vlan_macip_lens;
> +	union {
> +		uint16_t u16;
> +		struct {
> +			uint16_t l3_len:9;
> +			uint16_t l2_len:7;
> +		};
> +	} l2_l3_len;
>  	uint64_t buf_dma_addr;
>  	uint32_t olinfo_status;
>  	uint32_t cmd_type_len;
> @@ -398,8 +405,10 @@ eth_igb_xmit_pkts(void *tx_queue, struct rte_mbuf **tx_pkts,
>  		tx_last = (uint16_t) (tx_id + tx_pkt->nb_segs - 1);
> 
>  		ol_flags = tx_pkt->ol_flags;
> +		l2_l3_len.l2_len = tx_pkt->l2_len;
> +		l2_l3_len.l3_len = tx_pkt->l3_len;
>  		vlan_macip_lens.f.vlan_tci = tx_pkt->vlan_tci;
> -		vlan_macip_lens.f.l2_l3_len = tx_pkt->l2_l3_len;
> +		vlan_macip_lens.f.l2_l3_len = l2_l3_len.u16;
>  		tx_ol_req = ol_flags & (PKT_TX_VLAN_PKT | PKT_TX_IP_CKSUM |
>  			PKT_TX_L4_MASK);
> 
> diff --git a/lib/librte_pmd_ixgbe/ixgbe_rxtx.c b/lib/librte_pmd_ixgbe/ixgbe_rxtx.c
> index 70ca254..54a0fc1 100644
> --- a/lib/librte_pmd_ixgbe/ixgbe_rxtx.c
> +++ b/lib/librte_pmd_ixgbe/ixgbe_rxtx.c
> @@ -540,6 +540,13 @@ ixgbe_xmit_pkts(void *tx_queue, struct rte_mbuf **tx_pkts,
>  	struct rte_mbuf     *tx_pkt;
>  	struct rte_mbuf     *m_seg;
>  	union ixgbe_vlan_macip vlan_macip_lens;
> +	union {
> +		uint16_t u16;
> +		struct {
> +			uint16_t l3_len:9;
> +			uint16_t l2_len:7;
> +		};
> +	} l2_l3_len;
>  	uint64_t buf_dma_addr;
>  	uint32_t olinfo_status;
>  	uint32_t cmd_type_len;
> @@ -583,8 +590,10 @@ ixgbe_xmit_pkts(void *tx_queue, struct rte_mbuf **tx_pkts,
>  		tx_ol_req = ol_flags & (PKT_TX_VLAN_PKT | PKT_TX_IP_CKSUM |
>  			PKT_TX_L4_MASK);
>  		if (tx_ol_req) {
> +			l2_l3_len.l2_len = tx_pkt->l2_len;
> +			l2_l3_len.l3_len = tx_pkt->l3_len;
>  			vlan_macip_lens.f.vlan_tci = tx_pkt->vlan_tci;
> -			vlan_macip_lens.f.l2_l3_len = tx_pkt->l2_l3_len;
> +			vlan_macip_lens.f.l2_l3_len = l2_l3_len.u16;
> 
>  			/* If new context need be built or reuse the exist ctx. */
>  			ctx = what_advctx_update(txq, tx_ol_req,
> --
> 2.1.0
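
To make the bitfield question above concrete, here is a minimal sketch of
the combined layout with every field declared as a uint64_t bitfield. The
union name and the copy helper are hypothetical; only the field names and
widths come from the patch. It assumes a compiler that accepts uint64_t
bitfields and anonymous structs (GCC and Clang do). The layout of bitfields
with mixed base types is implementation-defined, so using a single base
type mainly makes the size easier to reason about.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical stand-in for the TX offload part of struct rte_mbuf.
 * Field names and widths follow the patch; the union name is made up.
 */
union sketch_tx_offload {
	uint64_t tx_offload;           /* all fields accessed as one word */
	struct {
		uint64_t l2_len:7;         /* L2 (MAC) header length */
		uint64_t l3_len:9;         /* L3 (IP) header length */
		uint64_t l4_len:8;         /* L4 (TCP/UDP) header length */
		uint64_t tso_segsz:16;     /* TSO segment size */
		uint64_t inner_l3_len:9;   /* inner L3 header length */
		uint64_t inner_l2_len:7;   /* inner L2 header length */
		/* 7 + 9 + 8 + 16 + 9 + 7 = 56 bits used, 8 bits left unused */
	};
};

/* Fails at compile time if the fields spill past one 64-bit word. */
_Static_assert(sizeof(union sketch_tx_offload) == sizeof(uint64_t),
	"TX offload fields must fit in one 64-bit word");

/* One assignment copies every offload field, as in rte_pktmbuf_attach(). */
static inline void
sketch_tx_offload_copy(union sketch_tx_offload *dst,
	const union sketch_tx_offload *src)
{
	dst->tx_offload = src->tx_offload;
}
```

With this layout, resetting or copying the offload metadata is a single
64-bit store, which is what the m->tx_offload = 0 lines in the patch rely on.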

* Re: [dpdk-dev] [PATCH 06/12] mbuf: add functions to get the name of an ol_flag
  2014-11-10 17:29   ` Bruce Richardson
  2014-11-10 20:54     ` Olivier MATZ
@ 2014-11-12 17:21     ` Ananyev, Konstantin
  2014-11-12 17:44       ` Olivier MATZ
  1 sibling, 1 reply; 112+ messages in thread
From: Ananyev, Konstantin @ 2014-11-12 17:21 UTC (permalink / raw)
  To: Richardson, Bruce, Olivier Matz; +Cc: dev, jigsaw



> -----Original Message-----
> From: Richardson, Bruce
> Sent: Monday, November 10, 2014 5:30 PM
> To: Olivier Matz
> Cc: dev@dpdk.org; Walukiewicz, Miroslaw; Liu, Jijiang; Liu, Yong; jigsaw@gmail.com; Ananyev, Konstantin
> Subject: Re: [PATCH 06/12] mbuf: add functions to get the name of an ol_flag
> 
> On Mon, Nov 10, 2014 at 04:59:20PM +0100, Olivier Matz wrote:
> > In test-pmd (rxonly.c), the code is able to dump the list of ol_flags.
> > The issue is that the list of flags in the application has to be
> > synchronized with the flags defined in rte_mbuf.h.
> >
> > This patch introduces 2 new functions, rte_get_rx_ol_flag_name()
> > and rte_get_tx_ol_flag_name(), that return the name of a flag from
> > its mask. It also fixes rxonly.c to use these new functions and to
> > display the proper flags.
> 
> Good idea. Couple of minor comments below.

Yes, that looks like a good idea to me too.
Just one thought: there is probably no need to make rte_get_*_ol_flag_name()
inline and put them into rte_mbuf.h.
rte_mbuf.c seems like a good place for the definitions of these 2 functions.

Konstantin

> 
> /Bruce
> 
> >
> > Signed-off-by: Olivier Matz <olivier.matz@6wind.com>
> > ---
> >  app/test-pmd/rxonly.c      | 36 ++++++++--------------------
> >  lib/librte_mbuf/rte_mbuf.h | 60 ++++++++++++++++++++++++++++++++++++++++++++++
> >  2 files changed, 70 insertions(+), 26 deletions(-)
> >
> > diff --git a/app/test-pmd/rxonly.c b/app/test-pmd/rxonly.c
> > index 4410c3d..e7cd7e2 100644
> > --- a/app/test-pmd/rxonly.c
> > +++ b/app/test-pmd/rxonly.c
> > @@ -71,26 +71,6 @@
> >
> >  #include "testpmd.h"
> >
> > -#define MAX_PKT_RX_FLAGS 13
> > -static const char *pkt_rx_flag_names[MAX_PKT_RX_FLAGS] = {
> > -	"VLAN_PKT",
> > -	"RSS_HASH",
> > -	"PKT_RX_FDIR",
> > -	"IP_CKSUM",
> > -	"IP_CKSUM_BAD",
> > -
> > -	"IPV4_HDR",
> > -	"IPV4_HDR_EXT",
> > -	"IPV6_HDR",
> > -	"IPV6_HDR_EXT",
> > -
> > -	"IEEE1588_PTP",
> > -	"IEEE1588_TMST",
> > -
> > -	"TUNNEL_IPV4_HDR",
> > -	"TUNNEL_IPV6_HDR",
> > -};
> > -
> >  static inline void
> >  print_ether_addr(const char *what, struct ether_addr *eth_addr)
> >  {
> > @@ -219,12 +199,16 @@ pkt_burst_receive(struct fwd_stream *fs)
> >  		printf(" - Receive queue=0x%x", (unsigned) fs->rx_queue);
> >  		printf("\n");
> >  		if (ol_flags != 0) {
> > -			int rxf;
> > -
> > -			for (rxf = 0; rxf < MAX_PKT_RX_FLAGS; rxf++) {
> > -				if (ol_flags & (1 << rxf))
> > -					printf("  PKT_RX_%s\n",
> > -					       pkt_rx_flag_names[rxf]);
> > +			unsigned rxf;
> > +			const char *name;
> > +
> > +			for (rxf = 0; rxf < sizeof(mb->ol_flags) * 8; rxf++) {
> > +				if ((ol_flags & (1ULL << rxf)) == 0)
> > +					continue;
> > +				name = rte_get_rx_ol_flag_name(1ULL << rxf);
> > +				if (name == NULL)
> > +					continue;
> > +				printf("  %s\n", name);
> >  			}
> >  		}
> >  		rte_pktmbuf_free(mb);
> > diff --git a/lib/librte_mbuf/rte_mbuf.h b/lib/librte_mbuf/rte_mbuf.h
> > index ff11b84..bcd8996 100644
> > --- a/lib/librte_mbuf/rte_mbuf.h
> > +++ b/lib/librte_mbuf/rte_mbuf.h
> > @@ -129,6 +129,66 @@ extern "C" {
> >  /* Use final bit of flags to indicate a control mbuf */
> >  #define CTRL_MBUF_FLAG       (1ULL << 63) /**< Mbuf contains control data */
> >
> > +/**
> > + * Bit Mask to indicate what bits required for building TX context
> I don't understand this first line - is it accidentally included?
> 
> > + * Get the name of a RX offload flag
> > + *
> > + * @param mask
> > + *   The mask describing the flag. Usually only one bit must be set.
> > + *   Several bits can be given if they belong to the same mask.
> > + *   Ex: PKT_TX_L4_MASK.
> TX mask given as an example for a function for RX flags is confusing.
> 
> > + * @return
> > + *   The name of this flag, or NULL if it's not a valid RX flag.
> > + */
> > +static inline const char *rte_get_rx_ol_flag_name(uint64_t mask)
> > +{
> > +	switch (mask) {
> > +	case PKT_RX_VLAN_PKT: return "PKT_RX_VLAN_PKT";
> > +	case PKT_RX_RSS_HASH: return "PKT_RX_RSS_HASH";
> > +	case PKT_RX_FDIR: return "PKT_RX_FDIR";
> > +	case PKT_RX_L4_CKSUM_BAD: return "PKT_RX_L4_CKSUM_BAD";
> > +	case PKT_RX_IP_CKSUM_BAD: return "PKT_RX_IP_CKSUM_BAD";
> > +	/* case PKT_RX_EIP_CKSUM_BAD: return "PKT_RX_EIP_CKSUM_BAD"; */
> > +	/* case PKT_RX_OVERSIZE: return "PKT_RX_OVERSIZE"; */
> > +	/* case PKT_RX_HBUF_OVERFLOW: return "PKT_RX_HBUF_OVERFLOW"; */
> > +	/* case PKT_RX_RECIP_ERR: return "PKT_RX_RECIP_ERR"; */
> > +	/* case PKT_RX_MAC_ERR: return "PKT_RX_MAC_ERR"; */
> > +	case PKT_RX_IPV4_HDR: return "PKT_RX_IPV4_HDR";
> > +	case PKT_RX_IPV4_HDR_EXT: return "PKT_RX_IPV4_HDR_EXT";
> > +	case PKT_RX_IPV6_HDR: return "PKT_RX_IPV6_HDR";
> > +	case PKT_RX_IPV6_HDR_EXT: return "PKT_RX_IPV6_HDR_EXT";
> > +	case PKT_RX_IEEE1588_PTP: return "PKT_RX_IEEE1588_PTP";
> > +	case PKT_RX_IEEE1588_TMST: return "PKT_RX_IEEE1588_TMST";
> > +	case PKT_RX_TUNNEL_IPV4_HDR: return "PKT_RX_TUNNEL_IPV4_HDR";
> > +	case PKT_RX_TUNNEL_IPV6_HDR: return "PKT_RX_TUNNEL_IPV6_HDR";
> > +	default: return NULL;
> > +	}
> > +}
> > +
> > +/**
> > + * Get the name of a TX offload flag
> > + *
> > + * @param mask
> > + *   The mask describing the flag. Usually only one bit must be set.
> > + *   Several bits can be given if they belong to the same mask.
> > + *   Ex: PKT_TX_L4_MASK.
> > + * @return
> > + *   The name of this flag, or NULL if it's not a valid TX flag.
> > + */
> > +static inline const char *rte_get_tx_ol_flag_name(uint64_t mask)
> > +{
> > +	switch (mask) {
> > +	case PKT_TX_VLAN_PKT: return "PKT_TX_VLAN_PKT";
> > +	case PKT_TX_IP_CKSUM: return "PKT_TX_IP_CKSUM";
> > +	case PKT_TX_TCP_CKSUM: return "PKT_TX_TCP_CKSUM";
> > +	case PKT_TX_SCTP_CKSUM: return "PKT_TX_SCTP_CKSUM";
> > +	case PKT_TX_UDP_CKSUM: return "PKT_TX_UDP_CKSUM";
> > +	case PKT_TX_IEEE1588_TMST: return "PKT_TX_IEEE1588_TMST";
> > +	case PKT_TX_VXLAN_CKSUM: return "PKT_TX_VXLAN_CKSUM";
> > +	default: return NULL;
> > +	}
> > +}
> > +
> >  /* define a set of marker types that can be used to refer to set points in the
> >   * mbuf */
> >  typedef void    *MARKER[0];   /**< generic marker for a point in a structure */
> > --
> > 2.1.0
> >

* Re: [dpdk-dev] [PATCH 06/12] mbuf: add functions to get the name of an ol_flag
  2014-11-12 17:21     ` Ananyev, Konstantin
@ 2014-11-12 17:44       ` Olivier MATZ
  0 siblings, 0 replies; 112+ messages in thread
From: Olivier MATZ @ 2014-11-12 17:44 UTC (permalink / raw)
  To: Ananyev, Konstantin, Richardson, Bruce; +Cc: dev, jigsaw

Hi Konstantin,

On 11/12/2014 06:21 PM, Ananyev, Konstantin wrote:
>>> This patch introduces 2 new functions, rte_get_rx_ol_flag_name()
>>> and rte_get_tx_ol_flag_name(), that return the name of a flag from
>>> its mask. It also fixes rxonly.c to use these new functions and to
>>> display the proper flags.
>>
>> Good idea. Couple of minor comments below.
>
> Yes, that looks like a good idea to me too.
> Just one thought: there is probably no need to make rte_get_*_ol_flag_name()
> inline and put them into rte_mbuf.h.
> rte_mbuf.c seems like a good place for the definitions of these 2 functions.
>
> Konstantin
>

Agree, I'll add this change to the list.

Thanks,
Olivier
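
For readers following the discussion, here is a rough sketch of how a
caller can walk the 64 flag bits and print the known names, mirroring the
loop added to rxonly.c but for TX flags. The sketch_* functions are
hypothetical stand-ins for rte_get_tx_ol_flag_name() and its caller, and
only three flags are listed, with the bit values used in this series. In
the real tree the lookup function would live in rte_mbuf.c with a
prototype in rte_mbuf.h, as agreed above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Bit values as used in this series; real code gets them from rte_mbuf.h. */
#define PKT_TX_VLAN_PKT    (1ULL << 55)
#define PKT_TX_IP_CKSUM    (1ULL << 54)
#define PKT_TX_VXLAN_CKSUM (1ULL << 50)

/* Hypothetical, shortened stand-in for rte_get_tx_ol_flag_name(). */
static const char *
sketch_tx_flag_name(uint64_t mask)
{
	switch (mask) {
	case PKT_TX_VLAN_PKT: return "PKT_TX_VLAN_PKT";
	case PKT_TX_IP_CKSUM: return "PKT_TX_IP_CKSUM";
	case PKT_TX_VXLAN_CKSUM: return "PKT_TX_VXLAN_CKSUM";
	default: return NULL;       /* not a flag this sketch knows about */
	}
}

/* Print the name of every known flag set in ol_flags; return the count. */
static unsigned
sketch_dump_tx_flags(uint64_t ol_flags)
{
	unsigned bit, printed = 0;

	for (bit = 0; bit < sizeof(ol_flags) * 8; bit++) {
		uint64_t mask = 1ULL << bit;
		const char *name;

		if ((ol_flags & mask) == 0)
			continue;
		name = sketch_tx_flag_name(mask);
		if (name == NULL)       /* bit set, but no name for it */
			continue;
		printf("  %s\n", name);
		printed++;
	}
	return printed;
}
```

Because the names come from one lookup function, an application no longer
has to keep its own table in sync with the flag definitions.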

* [dpdk-dev] [PATCH v2 00/13] add TSO support
  2014-11-10 15:59 [dpdk-dev] [PATCH 00/12] add TSO support Olivier Matz
                   ` (12 preceding siblings ...)
  2014-11-11  9:21 ` [dpdk-dev] [PATCH 00/12] add TSO support Olivier MATZ
@ 2014-11-14 17:03 ` Olivier Matz
  2014-11-14 17:03   ` [dpdk-dev] [PATCH v2 01/13] igb/ixgbe: fix IP checksum calculation Olivier Matz
                     ` (13 more replies)
  13 siblings, 14 replies; 112+ messages in thread
From: Olivier Matz @ 2014-11-14 17:03 UTC (permalink / raw)
  To: dev; +Cc: jigsaw

This series adds TSO support to the ixgbe DPDK driver. This is a rework
of the series sent earlier this week [1]. This work is based on
another version [2] that was posted several months ago and
which included an mbuf rework that is now in mainline.

Changes in v2:

- move rte_get_rx_ol_flag_name() and rte_get_tx_ol_flag_name() in
  rte_mbuf.c, and fix comments
- use IGB_TX_OFFLOAD_MASK and IXGBE_TX_OFFLOAD_MASK to replace
  PKT_TX_OFFLOAD_MASK
- fix inner_l2_len and inner_l3_len bitfields: use uint64_t instead
  of uint16_t
- replace the assignment of l2_len and l3_len with an assignment of
  tx_offload. It now includes inner_l2_len and inner_l3_len at the same time.
- introduce a new cksum api in rte_ip.h following discussion with
  Konstantin
- reorder commits to have all TSO commits at the end of the series
- use ol_flags for phdr checksum calculation (this now matches ixgbe
  API: standard pseudo hdr cksum for TCP cksum offload, pseudo hdr
  cksum without ip paylen for TSO). This will probably be changed
  with a dev_prep_tx() like function for 2.0 release.
- rebase on latest head


This series first fixes some bugs that were discovered during the
development, adds some changes to the mbuf API (new l4_len and
tso_segsz fields), adds TSO support in ixgbe, reworks testpmd
csum forward engine, and finally adds TSO support in testpmd so it
can be validated.

The new fields added in mbuf try to be generic enough to apply to
other hardware in the future. To delegate the TCP segmentation to the
hardware, the user has to:

  - set the PKT_TX_TCP_SEG flag in mbuf->ol_flags (this flag implies
    PKT_TX_TCP_CKSUM)
  - if it's IPv4, set the PKT_TX_IP_CKSUM flag and write the IP checksum
    to 0 in the packet
  - fill the mbuf offload information: l2_len, l3_len, l4_len, tso_segsz
  - calculate the pseudo header checksum and set it in the TCP header,
    as required when doing hardware TCP checksum offload
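
The first three steps above can be sketched as follows. This is only an
illustration under stated assumptions: struct sketch_mbuf and
sketch_request_tso() are hypothetical stand-ins (a real application uses
struct rte_mbuf from rte_mbuf.h), the flag values are the ones defined in
this series, and the pseudo header checksum of the last step is left to
the caller.

```c
#include <assert.h>
#include <stdint.h>

/* Flag values as defined in this series; real code uses rte_mbuf.h. */
#define PKT_TX_IP_CKSUM (1ULL << 54)
#define PKT_TX_TCP_SEG  (1ULL << 49)

/* Hypothetical stand-in for the few rte_mbuf fields used here. */
struct sketch_mbuf {
	uint64_t ol_flags;
	uint64_t l2_len:7;
	uint64_t l3_len:9;
	uint64_t l4_len:8;
	uint64_t tso_segsz:16;
};

/* l3_hdr points to the first byte of the IP header in the packet data. */
static void
sketch_request_tso(struct sketch_mbuf *m, uint8_t *l3_hdr, int is_ipv4,
	unsigned l2_len, unsigned l3_len, unsigned l4_len, uint16_t mss)
{
	/* The lengths must fit in the 7, 9 and 8 bit wide fields. */
	assert(l2_len < (1u << 7) && l3_len < (1u << 9) && l4_len < (1u << 8));

	/* Step 1: request segmentation (implies TCP checksum offload). */
	m->ol_flags |= PKT_TX_TCP_SEG;

	/* Step 2: for IPv4, request IP checksum offload and zero the
	 * header checksum field (bytes 10 and 11 of the IPv4 header). */
	if (is_ipv4) {
		m->ol_flags |= PKT_TX_IP_CKSUM;
		l3_hdr[10] = 0;
		l3_hdr[11] = 0;
	}

	/* Step 3: fill the offload information. */
	m->l2_len = l2_len;
	m->l3_len = l3_len;
	m->l4_len = l4_len;
	m->tso_segsz = mss;

	/* Step 4 (not shown): the caller writes the pseudo header
	 * checksum into the TCP header. */
}
```

For the plain TCP case in the test report, the call would use l2_len=14,
l3_len=20, l4_len=20 and a segment size of 800.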

The test report will be added as a reply to this cover letter and
can be linked from the relevant commits.

[1] http://dpdk.org/ml/archives/dev/2014-November/007953.html
[2] http://dpdk.org/ml/archives/dev/2014-May/002537.html


Olivier Matz (13):
  igb/ixgbe: fix IP checksum calculation
  ixgbe: fix remaining pkt_flags variable size to 64 bits
  mbuf: move vxlan_cksum flag definition at the proper place
  mbuf: add help about TX checksum flags
  mbuf: remove too specific PKT_TX_OFFLOAD_MASK definition
  mbuf: add functions to get the name of an ol_flag
  testpmd: fix use of offload flags in testpmd
  testpmd: rework csum forward engine
  mbuf: introduce new checksum API
  mbuf: generic support for TCP segmentation offload
  ixgbe: support TCP segmentation offload
  testpmd: support TSO in csum forward engine
  testpmd: add a verbose mode csum forward engine

 app/test-pmd/cmdline.c              | 243 +++++++++--
 app/test-pmd/config.c               |  15 +-
 app/test-pmd/csumonly.c             | 806 ++++++++++++++++--------------------
 app/test-pmd/macfwd.c               |   5 +-
 app/test-pmd/macswap.c              |   5 +-
 app/test-pmd/rxonly.c               |  36 +-
 app/test-pmd/testpmd.c              |   2 +-
 app/test-pmd/testpmd.h              |  24 +-
 app/test-pmd/txonly.c               |   9 +-
 examples/ipv4_multicast/main.c      |   2 +-
 lib/librte_mbuf/rte_mbuf.c          |  46 ++
 lib/librte_mbuf/rte_mbuf.h          |  95 +++--
 lib/librte_net/rte_ip.h             | 204 +++++++++
 lib/librte_pmd_e1000/igb_rxtx.c     |  21 +-
 lib/librte_pmd_ixgbe/ixgbe_ethdev.c |   3 +-
 lib/librte_pmd_ixgbe/ixgbe_rxtx.c   | 178 +++++---
 lib/librte_pmd_ixgbe/ixgbe_rxtx.h   |  19 +-
 17 files changed, 1069 insertions(+), 644 deletions(-)

-- 
2.1.0

* [dpdk-dev] [PATCH v2 01/13] igb/ixgbe: fix IP checksum calculation
  2014-11-14 17:03 ` [dpdk-dev] [PATCH v2 00/13] " Olivier Matz
@ 2014-11-14 17:03   ` Olivier Matz
  2014-11-14 17:03   ` [dpdk-dev] [PATCH v2 02/13] ixgbe: fix remaining pkt_flags variable size to 64 bits Olivier Matz
                     ` (12 subsequent siblings)
  13 siblings, 0 replies; 112+ messages in thread
From: Olivier Matz @ 2014-11-14 17:03 UTC (permalink / raw)
  To: dev; +Cc: jigsaw

According to Intel® 82599 10 GbE Controller Datasheet (Table 7-38), both
L2 and L3 lengths are needed to offload the IP checksum.

Note that the e1000 driver does not need to be patched as it already
contains the fix.

Signed-off-by: Olivier Matz <olivier.matz@6wind.com>
Acked-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
---
 lib/librte_pmd_e1000/igb_rxtx.c   | 2 +-
 lib/librte_pmd_ixgbe/ixgbe_rxtx.c | 2 +-
 2 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/lib/librte_pmd_e1000/igb_rxtx.c b/lib/librte_pmd_e1000/igb_rxtx.c
index 0dca7b7..b406397 100644
--- a/lib/librte_pmd_e1000/igb_rxtx.c
+++ b/lib/librte_pmd_e1000/igb_rxtx.c
@@ -262,7 +262,7 @@ igbe_set_xmit_ctx(struct igb_tx_queue* txq,
 
 	if (ol_flags & PKT_TX_IP_CKSUM) {
 		type_tucmd_mlhl = E1000_ADVTXD_TUCMD_IPV4;
-		cmp_mask |= TX_MAC_LEN_CMP_MASK;
+		cmp_mask |= TX_MACIP_LEN_CMP_MASK;
 	}
 
 	/* Specify which HW CTX to upload. */
diff --git a/lib/librte_pmd_ixgbe/ixgbe_rxtx.c b/lib/librte_pmd_ixgbe/ixgbe_rxtx.c
index f9b3fe3..ecebbf6 100644
--- a/lib/librte_pmd_ixgbe/ixgbe_rxtx.c
+++ b/lib/librte_pmd_ixgbe/ixgbe_rxtx.c
@@ -374,7 +374,7 @@ ixgbe_set_xmit_ctx(struct igb_tx_queue* txq,
 
 	if (ol_flags & PKT_TX_IP_CKSUM) {
 		type_tucmd_mlhl = IXGBE_ADVTXD_TUCMD_IPV4;
-		cmp_mask |= TX_MAC_LEN_CMP_MASK;
+		cmp_mask |= TX_MACIP_LEN_CMP_MASK;
 	}
 
 	/* Specify which HW CTX to upload. */
-- 
2.1.0

* [dpdk-dev] [PATCH v2 02/13] ixgbe: fix remaining pkt_flags variable size to 64 bits
  2014-11-14 17:03 ` [dpdk-dev] [PATCH v2 00/13] " Olivier Matz
  2014-11-14 17:03   ` [dpdk-dev] [PATCH v2 01/13] igb/ixgbe: fix IP checksum calculation Olivier Matz
@ 2014-11-14 17:03   ` Olivier Matz
  2014-11-17 16:47     ` Walukiewicz, Miroslaw
  2014-11-14 17:03   ` [dpdk-dev] [PATCH v2 03/13] mbuf: move vxlan_cksum flag definition at the proper place Olivier Matz
                     ` (11 subsequent siblings)
  13 siblings, 1 reply; 112+ messages in thread
From: Olivier Matz @ 2014-11-14 17:03 UTC (permalink / raw)
  To: dev; +Cc: jigsaw

Since commit 4332beee9 "mbuf: expand ol_flags field to 64-bits", the
packet flags are now 64 bits wide. Some occurrences were forgotten in
the ixgbe driver.

Signed-off-by: Olivier Matz <olivier.matz@6wind.com>
Acked-by: Bruce Richardson <bruce.richardson@intel.com>
---
 lib/librte_pmd_ixgbe/ixgbe_rxtx.c | 12 ++++++------
 1 file changed, 6 insertions(+), 6 deletions(-)

diff --git a/lib/librte_pmd_ixgbe/ixgbe_rxtx.c b/lib/librte_pmd_ixgbe/ixgbe_rxtx.c
index ecebbf6..7e470ce 100644
--- a/lib/librte_pmd_ixgbe/ixgbe_rxtx.c
+++ b/lib/librte_pmd_ixgbe/ixgbe_rxtx.c
@@ -817,7 +817,7 @@ end_of_tx:
 static inline uint64_t
 rx_desc_hlen_type_rss_to_pkt_flags(uint32_t hl_tp_rs)
 {
-	uint16_t pkt_flags;
+	uint64_t pkt_flags;
 
 	static uint64_t ip_pkt_types_map[16] = {
 		0, PKT_RX_IPV4_HDR, PKT_RX_IPV4_HDR_EXT, PKT_RX_IPV4_HDR_EXT,
@@ -834,7 +834,7 @@ rx_desc_hlen_type_rss_to_pkt_flags(uint32_t hl_tp_rs)
 	};
 
 #ifdef RTE_LIBRTE_IEEE1588
-	static uint32_t ip_pkt_etqf_map[8] = {
+	static uint64_t ip_pkt_etqf_map[8] = {
 		0, 0, 0, PKT_RX_IEEE1588_PTP,
 		0, 0, 0, 0,
 	};
@@ -903,7 +903,7 @@ ixgbe_rx_scan_hw_ring(struct igb_rx_queue *rxq)
 	struct igb_rx_entry *rxep;
 	struct rte_mbuf *mb;
 	uint16_t pkt_len;
-	uint16_t pkt_flags;
+	uint64_t pkt_flags;
 	int s[LOOK_AHEAD], nb_dd;
 	int i, j, nb_rx = 0;
 
@@ -1335,7 +1335,7 @@ ixgbe_recv_scattered_pkts(void *rx_queue, struct rte_mbuf **rx_pkts,
 	uint16_t nb_rx;
 	uint16_t nb_hold;
 	uint16_t data_len;
-	uint16_t pkt_flags;
+	uint64_t pkt_flags;
 
 	nb_rx = 0;
 	nb_hold = 0;
@@ -1511,9 +1511,9 @@ ixgbe_recv_scattered_pkts(void *rx_queue, struct rte_mbuf **rx_pkts,
 		first_seg->vlan_tci = rte_le_to_cpu_16(rxd.wb.upper.vlan);
 		hlen_type_rss = rte_le_to_cpu_32(rxd.wb.lower.lo_dword.data);
 		pkt_flags = rx_desc_hlen_type_rss_to_pkt_flags(hlen_type_rss);
-		pkt_flags = (uint16_t)(pkt_flags |
+		pkt_flags = (pkt_flags |
 				rx_desc_status_to_pkt_flags(staterr));
-		pkt_flags = (uint16_t)(pkt_flags |
+		pkt_flags = (pkt_flags |
 				rx_desc_error_to_pkt_flags(staterr));
 		first_seg->ol_flags = pkt_flags;
 
-- 
2.1.0
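
As a small illustration of why the variable width matters, the sketch
below passes a 64-bit flag word through a 16-bit temporary and through a
64-bit one. The function names and the flag at bit 20 are hypothetical.
The point is only that any flag above bit 15 is silently dropped by the
narrow variable, while flags in the low 16 bits survive, which is why
such a bug can stay unnoticed for a while.

```c
#include <assert.h>
#include <stdint.h>

/* Old pattern: the temporary is only 16 bits wide. */
static uint64_t
sketch_flags_via_u16(uint64_t hw_flags)
{
	uint16_t pkt_flags = (uint16_t)hw_flags;   /* bits 16..63 are lost */

	return pkt_flags;
}

/* Fixed pattern: the temporary matches the 64-bit ol_flags field. */
static uint64_t
sketch_flags_via_u64(uint64_t hw_flags)
{
	uint64_t pkt_flags = hw_flags;             /* all bits preserved */

	return pkt_flags;
}
```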

* [dpdk-dev] [PATCH v2 03/13] mbuf: move vxlan_cksum flag definition at the proper place
  2014-11-14 17:03 ` [dpdk-dev] [PATCH v2 00/13] " Olivier Matz
  2014-11-14 17:03   ` [dpdk-dev] [PATCH v2 01/13] igb/ixgbe: fix IP checksum calculation Olivier Matz
  2014-11-14 17:03   ` [dpdk-dev] [PATCH v2 02/13] ixgbe: fix remaining pkt_flags variable size to 64 bits Olivier Matz
@ 2014-11-14 17:03   ` Olivier Matz
  2014-11-17 22:05     ` Thomas Monjalon
  2014-11-14 17:03   ` [dpdk-dev] [PATCH v2 04/13] mbuf: add help about TX checksum flags Olivier Matz
                     ` (10 subsequent siblings)
  13 siblings, 1 reply; 112+ messages in thread
From: Olivier Matz @ 2014-11-14 17:03 UTC (permalink / raw)
  To: dev; +Cc: jigsaw

The tx mbuf flags are ordered from the highest value to the
lowest. Move PKT_TX_VXLAN_CKSUM to the right place.

Signed-off-by: Olivier Matz <olivier.matz@6wind.com>
Acked-by: Bruce Richardson <bruce.richardson@intel.com>
---
 lib/librte_mbuf/rte_mbuf.h | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/lib/librte_mbuf/rte_mbuf.h b/lib/librte_mbuf/rte_mbuf.h
index f5f8658..029c669 100644
--- a/lib/librte_mbuf/rte_mbuf.h
+++ b/lib/librte_mbuf/rte_mbuf.h
@@ -96,7 +96,6 @@ extern "C" {
 
 #define PKT_TX_VLAN_PKT      (1ULL << 55) /**< TX packet is a 802.1q VLAN packet. */
 #define PKT_TX_IP_CKSUM      (1ULL << 54) /**< IP cksum of TX pkt. computed by NIC. */
-#define PKT_TX_VXLAN_CKSUM   (1ULL << 50) /**< TX checksum of VXLAN computed by NIC */
 #define PKT_TX_IPV4_CSUM     PKT_TX_IP_CKSUM /**< Alias of PKT_TX_IP_CKSUM. */
 #define PKT_TX_IPV4          PKT_RX_IPV4_HDR /**< IPv4 with no IP checksum offload. */
 #define PKT_TX_IPV6          PKT_RX_IPV6_HDR /**< IPv6 packet */
@@ -114,9 +113,10 @@ extern "C" {
 #define PKT_TX_UDP_CKSUM     (3ULL << 52) /**< UDP cksum of TX pkt. computed by NIC. */
 #define PKT_TX_L4_MASK       (3ULL << 52) /**< Mask for L4 cksum offload request. */
 
-/* Bit 51 - IEEE1588*/
 #define PKT_TX_IEEE1588_TMST (1ULL << 51) /**< TX IEEE1588 packet to timestamp. */
 
+#define PKT_TX_VXLAN_CKSUM   (1ULL << 50) /**< TX checksum of VXLAN computed by NIC */
+
 /* Use final bit of flags to indicate a control mbuf */
 #define CTRL_MBUF_FLAG       (1ULL << 63) /**< Mbuf contains control data */
 
-- 
2.1.0


* [dpdk-dev] [PATCH v2 04/13] mbuf: add help about TX checksum flags
  2014-11-14 17:03 ` [dpdk-dev] [PATCH v2 00/13] " Olivier Matz
                     ` (2 preceding siblings ...)
  2014-11-14 17:03   ` [dpdk-dev] [PATCH v2 03/13] mbuf: move vxlan_cksum flag definition at the proper place Olivier Matz
@ 2014-11-14 17:03   ` Olivier Matz
  2014-11-14 17:03   ` [dpdk-dev] [PATCH v2 05/13] mbuf: remove too specific PKT_TX_OFFLOAD_MASK definition Olivier Matz
                     ` (9 subsequent siblings)
  13 siblings, 0 replies; 112+ messages in thread
From: Olivier Matz @ 2014-11-14 17:03 UTC (permalink / raw)
  To: dev; +Cc: jigsaw

Describe how to use hardware checksum API.
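
As a rough illustration of the steps documented by this patch, here is a
minimal sketch for an IPv4/TCP packet. It is not DPDK code: struct
demo_mbuf and demo_request_ipv4_tcp_cksum() are hypothetical stand-ins,
the flag values are copied from the rte_mbuf.h hunk of this patch, and
the pseudo header checksum is assumed to be computed by the caller.

```c
#include <stdint.h>
#include <string.h>

/* Flag values copied from the rte_mbuf.h hunk in this patch. */
#define PKT_TX_IP_CKSUM   (1ULL << 54)
#define PKT_TX_TCP_CKSUM  (1ULL << 52)
#define PKT_TX_L4_MASK    (3ULL << 52)

/* Byte offsets of the checksum fields in standard IPv4 and TCP headers. */
#define DEMO_IPV4_CKSUM_OFF 10
#define DEMO_TCP_CKSUM_OFF  16

/* Hypothetical stand-in for the few rte_mbuf fields used here. */
struct demo_mbuf {
	uint64_t ol_flags;
	uint16_t l2_len;
	uint16_t l3_len;
};

/*
 * Prepare a contiguous IPv4/TCP packet for hardware IP and TCP checksum.
 * psd_cksum is the pseudo header checksum, already computed by the caller
 * and already in the byte order expected on the wire.
 */
void
demo_request_ipv4_tcp_cksum(struct demo_mbuf *m, uint8_t *pkt,
		uint16_t l2_len, uint16_t l3_len, uint16_t psd_cksum)
{
	uint8_t *ip = pkt + l2_len;
	uint8_t *tcp = ip + l3_len;

	/* fill l2_len and l3_len in mbuf */
	m->l2_len = l2_len;
	m->l3_len = l3_len;

	/* the L4 type is a 2-bit field: clear it before selecting TCP */
	m->ol_flags &= ~PKT_TX_L4_MASK;
	m->ol_flags |= PKT_TX_IP_CKSUM | PKT_TX_TCP_CKSUM;

	/* set the ip checksum to 0 in IP header */
	memset(ip + DEMO_IPV4_CKSUM_OFF, 0, 2);

	/* set the pseudo header checksum in the L4 header */
	memcpy(tcp + DEMO_TCP_CKSUM_OFF, &psd_cksum, sizeof(psd_cksum));
}
```

For SCTP the last step would instead set the crc field to 0, as stated
in the new comment.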

Signed-off-by: Olivier Matz <olivier.matz@6wind.com>
Acked-by: Bruce Richardson <bruce.richardson@intel.com>
---
 lib/librte_mbuf/rte_mbuf.h | 25 +++++++++++++++++--------
 1 file changed, 17 insertions(+), 8 deletions(-)

diff --git a/lib/librte_mbuf/rte_mbuf.h b/lib/librte_mbuf/rte_mbuf.h
index 029c669..d7070e9 100644
--- a/lib/librte_mbuf/rte_mbuf.h
+++ b/lib/librte_mbuf/rte_mbuf.h
@@ -95,19 +95,28 @@ extern "C" {
 #define PKT_RX_TUNNEL_IPV6_HDR (1ULL << 12) /**< RX tunnel packet with IPv6 header. */
 
 #define PKT_TX_VLAN_PKT      (1ULL << 55) /**< TX packet is a 802.1q VLAN packet. */
-#define PKT_TX_IP_CKSUM      (1ULL << 54) /**< IP cksum of TX pkt. computed by NIC. */
+
+/**
+ * Enable hardware computation of IP cksum. To use it:
+ *  - fill l2_len and l3_len in mbuf
+ *  - set the flags PKT_TX_IP_CKSUM
+ *  - set the ip checksum to 0 in IP header
+ */
+#define PKT_TX_IP_CKSUM      (1ULL << 54)
 #define PKT_TX_IPV4_CSUM     PKT_TX_IP_CKSUM /**< Alias of PKT_TX_IP_CKSUM. */
 #define PKT_TX_IPV4          PKT_RX_IPV4_HDR /**< IPv4 with no IP checksum offload. */
 #define PKT_TX_IPV6          PKT_RX_IPV6_HDR /**< IPv6 packet */
 
-/*
- * Bits 52+53 used for L4 packet type with checksum enabled.
- *     00: Reserved
- *     01: TCP checksum
- *     10: SCTP checksum
- *     11: UDP checksum
+/**
+ * Bits 52+53 used for L4 packet type with checksum enabled: 00: Reserved,
+ * 01: TCP checksum, 10: SCTP checksum, 11: UDP checksum. To use hardware
+ * L4 checksum offload, the user needs to:
+ *  - fill l2_len and l3_len in mbuf
+ *  - set the flags PKT_TX_TCP_CKSUM, PKT_TX_SCTP_CKSUM or PKT_TX_UDP_CKSUM
+ *  - calculate the pseudo header checksum and set it in the L4 header (only
+ *    for TCP or UDP). For SCTP, set the crc field to 0.
  */
-#define PKT_TX_L4_NO_CKSUM   (0ULL << 52) /**< Disable L4 cksum of TX pkt. */
+#define PKT_TX_L4_NO_CKSUM   (0ULL << 52) /* Disable L4 cksum of TX pkt. */
 #define PKT_TX_TCP_CKSUM     (1ULL << 52) /**< TCP cksum of TX pkt. computed by NIC. */
 #define PKT_TX_SCTP_CKSUM    (2ULL << 52) /**< SCTP cksum of TX pkt. computed by NIC. */
 #define PKT_TX_UDP_CKSUM     (3ULL << 52) /**< UDP cksum of TX pkt. computed by NIC. */
-- 
2.1.0


* [dpdk-dev] [PATCH v2 05/13] mbuf: remove too specific PKT_TX_OFFLOAD_MASK definition
  2014-11-14 17:03 ` [dpdk-dev] [PATCH v2 00/13] " Olivier Matz
                     ` (3 preceding siblings ...)
  2014-11-14 17:03   ` [dpdk-dev] [PATCH v2 04/13] mbuf: add help about TX checksum flags Olivier Matz
@ 2014-11-14 17:03   ` Olivier Matz
  2014-11-17 10:35     ` Bruce Richardson
  2014-11-14 17:03   ` [dpdk-dev] [PATCH v2 06/13] mbuf: add functions to get the name of an ol_flag Olivier Matz
                     ` (8 subsequent siblings)
  13 siblings, 1 reply; 112+ messages in thread
From: Olivier Matz @ 2014-11-14 17:03 UTC (permalink / raw)
  To: dev; +Cc: jigsaw

This definition is specific to Intel PMD drivers, and its description
("indicate what bits required for building TX context") shows that it
belongs in the PMD drivers rather than in the generic rte_mbuf.h.

Signed-off-by: Olivier Matz <olivier.matz@6wind.com>
---
 lib/librte_mbuf/rte_mbuf.h        | 5 -----
 lib/librte_pmd_e1000/igb_rxtx.c   | 8 +++++++-
 lib/librte_pmd_ixgbe/ixgbe_rxtx.c | 8 +++++++-
 3 files changed, 14 insertions(+), 7 deletions(-)

diff --git a/lib/librte_mbuf/rte_mbuf.h b/lib/librte_mbuf/rte_mbuf.h
index d7070e9..68fb988 100644
--- a/lib/librte_mbuf/rte_mbuf.h
+++ b/lib/librte_mbuf/rte_mbuf.h
@@ -129,11 +129,6 @@ extern "C" {
 /* Use final bit of flags to indicate a control mbuf */
 #define CTRL_MBUF_FLAG       (1ULL << 63) /**< Mbuf contains control data */
 
-/**
- * Bit Mask to indicate what bits required for building TX context
- */
-#define PKT_TX_OFFLOAD_MASK (PKT_TX_VLAN_PKT | PKT_TX_IP_CKSUM | PKT_TX_L4_MASK)
-
 /* define a set of marker types that can be used to refer to set points in the
  * mbuf */
 typedef void    *MARKER[0];   /**< generic marker for a point in a structure */
diff --git a/lib/librte_pmd_e1000/igb_rxtx.c b/lib/librte_pmd_e1000/igb_rxtx.c
index b406397..433c616 100644
--- a/lib/librte_pmd_e1000/igb_rxtx.c
+++ b/lib/librte_pmd_e1000/igb_rxtx.c
@@ -84,6 +84,12 @@
 		ETH_RSS_IPV6_UDP | \
 		ETH_RSS_IPV6_UDP_EX)
 
+/* Bit Mask to indicate what bits required for building TX context */
+#define IGB_TX_OFFLOAD_MASK (			 \
+		PKT_TX_VLAN_PKT |		 \
+		PKT_TX_IP_CKSUM |		 \
+		PKT_TX_L4_MASK)
+
 static inline struct rte_mbuf *
 rte_rxmbuf_alloc(struct rte_mempool *mp)
 {
@@ -400,7 +406,7 @@ eth_igb_xmit_pkts(void *tx_queue, struct rte_mbuf **tx_pkts,
 		ol_flags = tx_pkt->ol_flags;
 		vlan_macip_lens.f.vlan_tci = tx_pkt->vlan_tci;
 		vlan_macip_lens.f.l2_l3_len = tx_pkt->l2_l3_len;
-		tx_ol_req = ol_flags & PKT_TX_OFFLOAD_MASK;
+		tx_ol_req = ol_flags & IGB_TX_OFFLOAD_MASK;
 
 		/* If a Context Descriptor need be built . */
 		if (tx_ol_req) {
diff --git a/lib/librte_pmd_ixgbe/ixgbe_rxtx.c b/lib/librte_pmd_ixgbe/ixgbe_rxtx.c
index 7e470ce..ca35db2 100644
--- a/lib/librte_pmd_ixgbe/ixgbe_rxtx.c
+++ b/lib/librte_pmd_ixgbe/ixgbe_rxtx.c
@@ -90,6 +90,12 @@
 		ETH_RSS_IPV6_UDP | \
 		ETH_RSS_IPV6_UDP_EX)
 
+/* Bit Mask to indicate what bits required for building TX context */
+#define IXGBE_TX_OFFLOAD_MASK (			 \
+		PKT_TX_VLAN_PKT |		 \
+		PKT_TX_IP_CKSUM |		 \
+		PKT_TX_L4_MASK)
+
 static inline struct rte_mbuf *
 rte_rxmbuf_alloc(struct rte_mempool *mp)
 {
@@ -580,7 +586,7 @@ ixgbe_xmit_pkts(void *tx_queue, struct rte_mbuf **tx_pkts,
 		ol_flags = tx_pkt->ol_flags;
 
 		/* If hardware offload required */
-		tx_ol_req = ol_flags & PKT_TX_OFFLOAD_MASK;
+		tx_ol_req = ol_flags & IXGBE_TX_OFFLOAD_MASK;
 		if (tx_ol_req) {
 			vlan_macip_lens.f.vlan_tci = tx_pkt->vlan_tci;
 			vlan_macip_lens.f.l2_l3_len = tx_pkt->l2_l3_len;
-- 
2.1.0


* [dpdk-dev] [PATCH v2 06/13] mbuf: add functions to get the name of an ol_flag
  2014-11-14 17:03 ` [dpdk-dev] [PATCH v2 00/13] " Olivier Matz
                     ` (4 preceding siblings ...)
  2014-11-14 17:03   ` [dpdk-dev] [PATCH v2 05/13] mbuf: remove too specific PKT_TX_OFFLOAD_MASK definition Olivier Matz
@ 2014-11-14 17:03   ` Olivier Matz
  2014-11-17 10:39     ` Bruce Richardson
  2014-11-17 19:00     ` Ananyev, Konstantin
  2014-11-14 17:03   ` [dpdk-dev] [PATCH v2 07/13] testpmd: fix use of offload flags in testpmd Olivier Matz
                     ` (7 subsequent siblings)
  13 siblings, 2 replies; 112+ messages in thread
From: Olivier Matz @ 2014-11-14 17:03 UTC (permalink / raw)
  To: dev; +Cc: jigsaw

In test-pmd (rxonly.c), the code is able to dump the list of ol_flags.
The issue is that the list of flags in the application has to be
synchronized with the flags defined in rte_mbuf.h.

This patch introduces 2 new functions, rte_get_rx_ol_flag_name()
and rte_get_tx_ol_flag_name(), that return the name of a flag from
its mask. It also fixes rxonly.c to use these new functions and to
display the proper flags.
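
On the TX side, a bit-by-bit loop like the one in rxonly.c would not be
enough, because the L4 checksum type is a 2-bit field (UDP sets both
bits 52 and 53). The sketch below shows one possible way to dump TX
flags under that constraint. It is only an illustration:
demo_tx_flag_name() is a stand-in that mirrors the switch added by this
patch, demo_dump_tx_flags() is a hypothetical helper, and the flag
values are copied from rte_mbuf.h as shown in this series.

```c
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Flag values copied from rte_mbuf.h as shown in this series. */
#define PKT_TX_VLAN_PKT      (1ULL << 55)
#define PKT_TX_IP_CKSUM      (1ULL << 54)
#define PKT_TX_TCP_CKSUM     (1ULL << 52)
#define PKT_TX_SCTP_CKSUM    (2ULL << 52)
#define PKT_TX_UDP_CKSUM     (3ULL << 52)
#define PKT_TX_L4_MASK       (3ULL << 52)
#define PKT_TX_IEEE1588_TMST (1ULL << 51)
#define PKT_TX_VXLAN_CKSUM   (1ULL << 50)

/* Stand-in mirroring rte_get_tx_ol_flag_name() from this patch. */
const char *demo_tx_flag_name(uint64_t mask)
{
	switch (mask) {
	case PKT_TX_VLAN_PKT: return "PKT_TX_VLAN_PKT";
	case PKT_TX_IP_CKSUM: return "PKT_TX_IP_CKSUM";
	case PKT_TX_TCP_CKSUM: return "PKT_TX_TCP_CKSUM";
	case PKT_TX_SCTP_CKSUM: return "PKT_TX_SCTP_CKSUM";
	case PKT_TX_UDP_CKSUM: return "PKT_TX_UDP_CKSUM";
	case PKT_TX_IEEE1588_TMST: return "PKT_TX_IEEE1588_TMST";
	case PKT_TX_VXLAN_CKSUM: return "PKT_TX_VXLAN_CKSUM";
	default: return NULL;
	}
}

/* Append a name to buf, space-separated; truncates if buf is too small. */
static void demo_append(char *buf, size_t len, const char *name)
{
	size_t used = strlen(buf);

	snprintf(buf + used, len - used, "%s%s", used ? " " : "", name);
}

/* Hypothetical helper: write the names of all known TX flags into buf. */
void demo_dump_tx_flags(uint64_t ol_flags, char *buf, size_t len)
{
	const char *name;
	unsigned int bit;

	if (len == 0)
		return;
	buf[0] = '\0';

	/* look up the 2-bit L4 field as a whole */
	name = demo_tx_flag_name(ol_flags & PKT_TX_L4_MASK);
	if (name != NULL)
		demo_append(buf, len, name);

	/* then test the remaining single-bit flags one by one */
	for (bit = 0; bit < 64; bit++) {
		uint64_t mask = 1ULL << bit;

		if ((mask & PKT_TX_L4_MASK) || (ol_flags & mask) == 0)
			continue;
		name = demo_tx_flag_name(mask);
		if (name != NULL)
			demo_append(buf, len, name);
	}
}
```

Testing bits 52 and 53 separately would report a UDP request as TCP plus
SCTP, which is why the L4 field is masked first.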

Signed-off-by: Olivier Matz <olivier.matz@6wind.com>
---
 app/test-pmd/rxonly.c      | 36 ++++++++++--------------------------
 lib/librte_mbuf/rte_mbuf.c | 45 +++++++++++++++++++++++++++++++++++++++++++++
 lib/librte_mbuf/rte_mbuf.h | 22 ++++++++++++++++++++++
 3 files changed, 77 insertions(+), 26 deletions(-)

diff --git a/app/test-pmd/rxonly.c b/app/test-pmd/rxonly.c
index 9ad1df6..51a530a 100644
--- a/app/test-pmd/rxonly.c
+++ b/app/test-pmd/rxonly.c
@@ -71,26 +71,6 @@
 
 #include "testpmd.h"
 
-#define MAX_PKT_RX_FLAGS 13
-static const char *pkt_rx_flag_names[MAX_PKT_RX_FLAGS] = {
-	"VLAN_PKT",
-	"RSS_HASH",
-	"PKT_RX_FDIR",
-	"IP_CKSUM",
-	"IP_CKSUM_BAD",
-
-	"IPV4_HDR",
-	"IPV4_HDR_EXT",
-	"IPV6_HDR",
-	"IPV6_HDR_EXT",
-
-	"IEEE1588_PTP",
-	"IEEE1588_TMST",
-
-	"TUNNEL_IPV4_HDR",
-	"TUNNEL_IPV6_HDR",
-};
-
 static inline void
 print_ether_addr(const char *what, struct ether_addr *eth_addr)
 {
@@ -214,12 +194,16 @@ pkt_burst_receive(struct fwd_stream *fs)
 		printf(" - Receive queue=0x%x", (unsigned) fs->rx_queue);
 		printf("\n");
 		if (ol_flags != 0) {
-			int rxf;
-
-			for (rxf = 0; rxf < MAX_PKT_RX_FLAGS; rxf++) {
-				if (ol_flags & (1 << rxf))
-					printf("  PKT_RX_%s\n",
-					       pkt_rx_flag_names[rxf]);
+			unsigned rxf;
+			const char *name;
+
+			for (rxf = 0; rxf < sizeof(mb->ol_flags) * 8; rxf++) {
+				if ((ol_flags & (1ULL << rxf)) == 0)
+					continue;
+				name = rte_get_rx_ol_flag_name(1ULL << rxf);
+				if (name == NULL)
+					continue;
+				printf("  %s\n", name);
 			}
 		}
 		rte_pktmbuf_free(mb);
diff --git a/lib/librte_mbuf/rte_mbuf.c b/lib/librte_mbuf/rte_mbuf.c
index 52e7574..5cd9137 100644
--- a/lib/librte_mbuf/rte_mbuf.c
+++ b/lib/librte_mbuf/rte_mbuf.c
@@ -196,3 +196,48 @@ rte_pktmbuf_dump(FILE *f, const struct rte_mbuf *m, unsigned dump_len)
 		nb_segs --;
 	}
 }
+
+/*
+ * Get the name of a RX offload flag
+ */
+const char *rte_get_rx_ol_flag_name(uint64_t mask)
+{
+	switch (mask) {
+	case PKT_RX_VLAN_PKT: return "PKT_RX_VLAN_PKT";
+	case PKT_RX_RSS_HASH: return "PKT_RX_RSS_HASH";
+	case PKT_RX_FDIR: return "PKT_RX_FDIR";
+	case PKT_RX_L4_CKSUM_BAD: return "PKT_RX_L4_CKSUM_BAD";
+	case PKT_RX_IP_CKSUM_BAD: return "PKT_RX_IP_CKSUM_BAD";
+	/* case PKT_RX_EIP_CKSUM_BAD: return "PKT_RX_EIP_CKSUM_BAD"; */
+	/* case PKT_RX_OVERSIZE: return "PKT_RX_OVERSIZE"; */
+	/* case PKT_RX_HBUF_OVERFLOW: return "PKT_RX_HBUF_OVERFLOW"; */
+	/* case PKT_RX_RECIP_ERR: return "PKT_RX_RECIP_ERR"; */
+	/* case PKT_RX_MAC_ERR: return "PKT_RX_MAC_ERR"; */
+	case PKT_RX_IPV4_HDR: return "PKT_RX_IPV4_HDR";
+	case PKT_RX_IPV4_HDR_EXT: return "PKT_RX_IPV4_HDR_EXT";
+	case PKT_RX_IPV6_HDR: return "PKT_RX_IPV6_HDR";
+	case PKT_RX_IPV6_HDR_EXT: return "PKT_RX_IPV6_HDR_EXT";
+	case PKT_RX_IEEE1588_PTP: return "PKT_RX_IEEE1588_PTP";
+	case PKT_RX_IEEE1588_TMST: return "PKT_RX_IEEE1588_TMST";
+	case PKT_RX_TUNNEL_IPV4_HDR: return "PKT_RX_TUNNEL_IPV4_HDR";
+	case PKT_RX_TUNNEL_IPV6_HDR: return "PKT_RX_TUNNEL_IPV6_HDR";
+	default: return NULL;
+	}
+}
+
+/*
+ * Get the name of a TX offload flag
+ */
+const char *rte_get_tx_ol_flag_name(uint64_t mask)
+{
+	switch (mask) {
+	case PKT_TX_VLAN_PKT: return "PKT_TX_VLAN_PKT";
+	case PKT_TX_IP_CKSUM: return "PKT_TX_IP_CKSUM";
+	case PKT_TX_TCP_CKSUM: return "PKT_TX_TCP_CKSUM";
+	case PKT_TX_SCTP_CKSUM: return "PKT_TX_SCTP_CKSUM";
+	case PKT_TX_UDP_CKSUM: return "PKT_TX_UDP_CKSUM";
+	case PKT_TX_IEEE1588_TMST: return "PKT_TX_IEEE1588_TMST";
+	case PKT_TX_VXLAN_CKSUM: return "PKT_TX_VXLAN_CKSUM";
+	default: return NULL;
+	}
+}
diff --git a/lib/librte_mbuf/rte_mbuf.h b/lib/librte_mbuf/rte_mbuf.h
index 68fb988..e76617f 100644
--- a/lib/librte_mbuf/rte_mbuf.h
+++ b/lib/librte_mbuf/rte_mbuf.h
@@ -129,6 +129,28 @@ extern "C" {
 /* Use final bit of flags to indicate a control mbuf */
 #define CTRL_MBUF_FLAG       (1ULL << 63) /**< Mbuf contains control data */
 
+/**
+ * Get the name of a RX offload flag
+ *
+ * @param mask
+ *   The mask describing the flag.
+ * @return
+ *   The name of this flag, or NULL if it's not a valid RX flag.
+ */
+const char *rte_get_rx_ol_flag_name(uint64_t mask);
+
+/**
+ * Get the name of a TX offload flag
+ *
+ * @param mask
+ *   The mask describing the flag. Usually only one bit must be set.
+ *   Several bits can be given if they belong to the same mask.
+ *   Ex: PKT_TX_L4_MASK.
+ * @return
+ *   The name of this flag, or NULL if it's not a valid TX flag.
+ */
+const char *rte_get_tx_ol_flag_name(uint64_t mask);
+
 /* define a set of marker types that can be used to refer to set points in the
  * mbuf */
 typedef void    *MARKER[0];   /**< generic marker for a point in a structure */
-- 
2.1.0


* [dpdk-dev] [PATCH v2 07/13] testpmd: fix use of offload flags in testpmd
  2014-11-14 17:03 ` [dpdk-dev] [PATCH v2 00/13] " Olivier Matz
                     ` (5 preceding siblings ...)
  2014-11-14 17:03   ` [dpdk-dev] [PATCH v2 06/13] mbuf: add functions to get the name of an ol_flag Olivier Matz
@ 2014-11-14 17:03   ` Olivier Matz
  2014-11-14 17:03   ` [dpdk-dev] [PATCH v2 08/13] testpmd: rework csum forward engine Olivier Matz
                     ` (6 subsequent siblings)
  13 siblings, 0 replies; 112+ messages in thread
From: Olivier Matz @ 2014-11-14 17:03 UTC (permalink / raw)
  To: dev; +Cc: jigsaw

In testpmd the rte_port->tx_ol_flags flag was used in 2 incompatible
manners:
- sometimes used with testpmd specific flags (0xff for checksums, and
  bit 11 for vlan)
- sometimes assigned to m->ol_flags directly, which is wrong in case
  of checksum flags

This commit replaces the hardcoded values by named definitions, which
are not compatible with mbuf flags. The testpmd forward engines are
fixed to use the flags properly.
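
The separation between the two flag namespaces can be sketched as
follows. This is a simplified illustration, not the testpmd code:
demo_testpmd_to_mbuf_flags() is a hypothetical helper, and the constants
are copied from the testpmd.h and rte_mbuf.h hunks of this series. In
the real csum engine the IP checksum flag is only requested for IPv4
packets.

```c
#include <stdint.h>

/* testpmd-private configuration bits (from the testpmd.h hunk) */
#define TESTPMD_TX_OFFLOAD_IP_CKSUM    0x0001
#define TESTPMD_TX_OFFLOAD_INSERT_VLAN 0x0100

/* mbuf offload flags (from rte_mbuf.h) */
#define PKT_TX_VLAN_PKT (1ULL << 55)
#define PKT_TX_IP_CKSUM (1ULL << 54)

/*
 * Hypothetical helper: translate the per-port testpmd configuration into
 * mbuf ol_flags instead of copying the value directly.
 */
uint64_t demo_testpmd_to_mbuf_flags(uint16_t tx_ol_flags)
{
	uint64_t ol_flags = 0;

	if (tx_ol_flags & TESTPMD_TX_OFFLOAD_INSERT_VLAN)
		ol_flags |= PKT_TX_VLAN_PKT;
	if (tx_ol_flags & TESTPMD_TX_OFFLOAD_IP_CKSUM)
		ol_flags |= PKT_TX_IP_CKSUM;
	return ol_flags;
}
```

With this split, a value such as 0x0100 never reaches m->ol_flags, where
bit 8 would be interpreted as an unrelated mbuf flag.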

Signed-off-by: Olivier Matz <olivier.matz@6wind.com>
---
 app/test-pmd/config.c   |  4 ++--
 app/test-pmd/csumonly.c | 40 +++++++++++++++++++++++-----------------
 app/test-pmd/macfwd.c   |  5 ++++-
 app/test-pmd/macswap.c  |  5 ++++-
 app/test-pmd/testpmd.h  | 28 +++++++++++++++++++++-------
 app/test-pmd/txonly.c   |  9 ++++++---
 6 files changed, 60 insertions(+), 31 deletions(-)

diff --git a/app/test-pmd/config.c b/app/test-pmd/config.c
index b102b72..34b6fdb 100644
--- a/app/test-pmd/config.c
+++ b/app/test-pmd/config.c
@@ -1670,7 +1670,7 @@ tx_vlan_set(portid_t port_id, uint16_t vlan_id)
 		return;
 	if (vlan_id_is_invalid(vlan_id))
 		return;
-	ports[port_id].tx_ol_flags |= PKT_TX_VLAN_PKT;
+	ports[port_id].tx_ol_flags |= TESTPMD_TX_OFFLOAD_INSERT_VLAN;
 	ports[port_id].tx_vlan_id = vlan_id;
 }
 
@@ -1679,7 +1679,7 @@ tx_vlan_reset(portid_t port_id)
 {
 	if (port_id_is_invalid(port_id))
 		return;
-	ports[port_id].tx_ol_flags &= ~PKT_TX_VLAN_PKT;
+	ports[port_id].tx_ol_flags &= ~TESTPMD_TX_OFFLOAD_INSERT_VLAN;
 }
 
 void
diff --git a/app/test-pmd/csumonly.c b/app/test-pmd/csumonly.c
index 8d10bfd..743094a 100644
--- a/app/test-pmd/csumonly.c
+++ b/app/test-pmd/csumonly.c
@@ -322,7 +322,7 @@ pkt_burst_checksum_forward(struct fwd_stream *fs)
 			/* Do not delete, this is required by HW*/
 			ipv4_hdr->hdr_checksum = 0;
 
-			if (tx_ol_flags & 0x1) {
+			if (tx_ol_flags & TESTPMD_TX_OFFLOAD_IP_CKSUM) {
 				/* HW checksum */
 				ol_flags |= PKT_TX_IP_CKSUM;
 			}
@@ -336,7 +336,7 @@ pkt_burst_checksum_forward(struct fwd_stream *fs)
 			if (l4_proto == IPPROTO_UDP) {
 				udp_hdr = (struct udp_hdr*) (rte_pktmbuf_mtod(mb,
 						unsigned char *) + l2_len + l3_len);
-				if (tx_ol_flags & 0x2) {
+				if (tx_ol_flags & TESTPMD_TX_OFFLOAD_UDP_CKSUM) {
 					/* HW Offload */
 					ol_flags |= PKT_TX_UDP_CKSUM;
 					if (ipv4_tunnel)
@@ -358,7 +358,7 @@ pkt_burst_checksum_forward(struct fwd_stream *fs)
 					uint16_t len;
 
 					/* Check if inner L3/L4 checkum flag is set */
-					if (tx_ol_flags & 0xF0)
+					if (tx_ol_flags & TESTPMD_TX_OFFLOAD_INNER_CKSUM_MASK)
 						ol_flags |= PKT_TX_VXLAN_CKSUM;
 
 					inner_l2_len  = sizeof(struct ether_hdr);
@@ -381,7 +381,7 @@ pkt_burst_checksum_forward(struct fwd_stream *fs)
 								unsigned char *) + len);
 						inner_l4_proto = inner_ipv4_hdr->next_proto_id;
 
-						if (tx_ol_flags & 0x10) {
+						if (tx_ol_flags & TESTPMD_TX_OFFLOAD_INNER_IP_CKSUM) {
 
 							/* Do not delete, this is required by HW*/
 							inner_ipv4_hdr->hdr_checksum = 0;
@@ -394,7 +394,8 @@ pkt_burst_checksum_forward(struct fwd_stream *fs)
 								unsigned char *) + len);
 						inner_l4_proto = inner_ipv6_hdr->proto;
 					}
-					if ((inner_l4_proto == IPPROTO_UDP) && (tx_ol_flags & 0x20)) {
+					if ((inner_l4_proto == IPPROTO_UDP) &&
+						(tx_ol_flags & TESTPMD_TX_OFFLOAD_INNER_UDP_CKSUM)) {
 
 						/* HW Offload */
 						ol_flags |= PKT_TX_UDP_CKSUM;
@@ -405,7 +406,8 @@ pkt_burst_checksum_forward(struct fwd_stream *fs)
 						else if (eth_type == ETHER_TYPE_IPv6)
 							inner_udp_hdr->dgram_cksum = get_ipv6_psd_sum(inner_ipv6_hdr);
 
-					} else if ((inner_l4_proto == IPPROTO_TCP) && (tx_ol_flags & 0x40)) {
+					} else if ((inner_l4_proto == IPPROTO_TCP) &&
+						(tx_ol_flags & TESTPMD_TX_OFFLOAD_INNER_TCP_CKSUM)) {
 						/* HW Offload */
 						ol_flags |= PKT_TX_TCP_CKSUM;
 						inner_tcp_hdr = (struct tcp_hdr *) (rte_pktmbuf_mtod(mb,
@@ -414,7 +416,8 @@ pkt_burst_checksum_forward(struct fwd_stream *fs)
 							inner_tcp_hdr->cksum = get_ipv4_psd_sum(inner_ipv4_hdr);
 						else if (eth_type == ETHER_TYPE_IPv6)
 							inner_tcp_hdr->cksum = get_ipv6_psd_sum(inner_ipv6_hdr);
-					} else if ((inner_l4_proto == IPPROTO_SCTP) && (tx_ol_flags & 0x80)) {
+					} else if ((inner_l4_proto == IPPROTO_SCTP) &&
+						(tx_ol_flags & TESTPMD_TX_OFFLOAD_INNER_SCTP_CKSUM)) {
 						/* HW Offload */
 						ol_flags |= PKT_TX_SCTP_CKSUM;
 						inner_sctp_hdr = (struct sctp_hdr *) (rte_pktmbuf_mtod(mb,
@@ -427,7 +430,7 @@ pkt_burst_checksum_forward(struct fwd_stream *fs)
 			} else if (l4_proto == IPPROTO_TCP) {
 				tcp_hdr = (struct tcp_hdr*) (rte_pktmbuf_mtod(mb,
 						unsigned char *) + l2_len + l3_len);
-				if (tx_ol_flags & 0x4) {
+				if (tx_ol_flags & TESTPMD_TX_OFFLOAD_TCP_CKSUM) {
 					ol_flags |= PKT_TX_TCP_CKSUM;
 					tcp_hdr->cksum = get_ipv4_psd_sum(ipv4_hdr);
 				}
@@ -440,7 +443,7 @@ pkt_burst_checksum_forward(struct fwd_stream *fs)
 				sctp_hdr = (struct sctp_hdr*) (rte_pktmbuf_mtod(mb,
 						unsigned char *) + l2_len + l3_len);
 
-				if (tx_ol_flags & 0x8) {
+				if (tx_ol_flags & TESTPMD_TX_OFFLOAD_SCTP_CKSUM) {
 					ol_flags |= PKT_TX_SCTP_CKSUM;
 					sctp_hdr->cksum = 0;
 
@@ -465,7 +468,7 @@ pkt_burst_checksum_forward(struct fwd_stream *fs)
 			if (l4_proto == IPPROTO_UDP) {
 				udp_hdr = (struct udp_hdr*) (rte_pktmbuf_mtod(mb,
 						unsigned char *) + l2_len + l3_len);
-				if (tx_ol_flags & 0x2) {
+				if (tx_ol_flags & TESTPMD_TX_OFFLOAD_UDP_CKSUM) {
 					/* HW Offload */
 					ol_flags |= PKT_TX_UDP_CKSUM;
 					if (ipv6_tunnel)
@@ -487,7 +490,7 @@ pkt_burst_checksum_forward(struct fwd_stream *fs)
 					uint16_t len;
 
 					/* Check if inner L3/L4 checksum flag is set */
-					if (tx_ol_flags & 0xF0)
+					if (tx_ol_flags & TESTPMD_TX_OFFLOAD_INNER_CKSUM_MASK)
 						ol_flags |= PKT_TX_VXLAN_CKSUM;
 
 					inner_l2_len  = sizeof(struct ether_hdr);
@@ -511,7 +514,7 @@ pkt_burst_checksum_forward(struct fwd_stream *fs)
 						inner_l4_proto = inner_ipv4_hdr->next_proto_id;
 
 						/* HW offload */
-						if (tx_ol_flags & 0x10) {
+						if (tx_ol_flags & TESTPMD_TX_OFFLOAD_INNER_IP_CKSUM) {
 
 							/* Do not delete, this is required by HW*/
 							inner_ipv4_hdr->hdr_checksum = 0;
@@ -524,7 +527,8 @@ pkt_burst_checksum_forward(struct fwd_stream *fs)
 						inner_l4_proto = inner_ipv6_hdr->proto;
 					}
 
-					if ((inner_l4_proto == IPPROTO_UDP) && (tx_ol_flags & 0x20)) {
+					if ((inner_l4_proto == IPPROTO_UDP) &&
+						(tx_ol_flags & TESTPMD_TX_OFFLOAD_INNER_UDP_CKSUM)) {
 						inner_udp_hdr = (struct udp_hdr *) (rte_pktmbuf_mtod(mb,
 							unsigned char *) + len + inner_l3_len);
 						/* HW offload */
@@ -534,7 +538,8 @@ pkt_burst_checksum_forward(struct fwd_stream *fs)
 							inner_udp_hdr->dgram_cksum = get_ipv4_psd_sum(inner_ipv4_hdr);
 						else if (eth_type == ETHER_TYPE_IPv6)
 							inner_udp_hdr->dgram_cksum = get_ipv6_psd_sum(inner_ipv6_hdr);
-					} else if ((inner_l4_proto == IPPROTO_TCP) && (tx_ol_flags & 0x40)) {
+					} else if ((inner_l4_proto == IPPROTO_TCP) &&
+						(tx_ol_flags & TESTPMD_TX_OFFLOAD_INNER_TCP_CKSUM)) {
 						/* HW offload */
 						ol_flags |= PKT_TX_TCP_CKSUM;
 						inner_tcp_hdr = (struct tcp_hdr *) (rte_pktmbuf_mtod(mb,
@@ -545,7 +550,8 @@ pkt_burst_checksum_forward(struct fwd_stream *fs)
 						else if (eth_type == ETHER_TYPE_IPv6)
 							inner_tcp_hdr->cksum = get_ipv6_psd_sum(inner_ipv6_hdr);
 
-					} else if ((inner_l4_proto == IPPROTO_SCTP) && (tx_ol_flags & 0x80)) {
+					} else if ((inner_l4_proto == IPPROTO_SCTP) &&
+						(tx_ol_flags & TESTPMD_TX_OFFLOAD_INNER_SCTP_CKSUM)) {
 						/* HW offload */
 						ol_flags |= PKT_TX_SCTP_CKSUM;
 						inner_sctp_hdr = (struct sctp_hdr *) (rte_pktmbuf_mtod(mb,
@@ -559,7 +565,7 @@ pkt_burst_checksum_forward(struct fwd_stream *fs)
 			else if (l4_proto == IPPROTO_TCP) {
 				tcp_hdr = (struct tcp_hdr*) (rte_pktmbuf_mtod(mb,
 						unsigned char *) + l2_len + l3_len);
-				if (tx_ol_flags & 0x4) {
+				if (tx_ol_flags & TESTPMD_TX_OFFLOAD_TCP_CKSUM) {
 					ol_flags |= PKT_TX_TCP_CKSUM;
 					tcp_hdr->cksum = get_ipv6_psd_sum(ipv6_hdr);
 				}
@@ -573,7 +579,7 @@ pkt_burst_checksum_forward(struct fwd_stream *fs)
 				sctp_hdr = (struct sctp_hdr*) (rte_pktmbuf_mtod(mb,
 						unsigned char *) + l2_len + l3_len);
 
-				if (tx_ol_flags & 0x8) {
+				if (tx_ol_flags & TESTPMD_TX_OFFLOAD_SCTP_CKSUM) {
 					ol_flags |= PKT_TX_SCTP_CKSUM;
 					sctp_hdr->cksum = 0;
 					/* Sanity check, only number of 4 bytes supported by HW */
diff --git a/app/test-pmd/macfwd.c b/app/test-pmd/macfwd.c
index 38bae23..aa3d705 100644
--- a/app/test-pmd/macfwd.c
+++ b/app/test-pmd/macfwd.c
@@ -85,6 +85,7 @@ pkt_burst_mac_forward(struct fwd_stream *fs)
 	uint16_t nb_rx;
 	uint16_t nb_tx;
 	uint16_t i;
+	uint64_t ol_flags = 0;
 #ifdef RTE_TEST_PMD_RECORD_CORE_CYCLES
 	uint64_t start_tsc;
 	uint64_t end_tsc;
@@ -108,6 +109,8 @@ pkt_burst_mac_forward(struct fwd_stream *fs)
 #endif
 	fs->rx_packets += nb_rx;
 	txp = &ports[fs->tx_port];
+	if (txp->tx_ol_flags & TESTPMD_TX_OFFLOAD_INSERT_VLAN)
+		ol_flags = PKT_TX_VLAN_PKT;
 	for (i = 0; i < nb_rx; i++) {
 		mb = pkts_burst[i];
 		eth_hdr = rte_pktmbuf_mtod(mb, struct ether_hdr *);
@@ -115,7 +118,7 @@ pkt_burst_mac_forward(struct fwd_stream *fs)
 				&eth_hdr->d_addr);
 		ether_addr_copy(&ports[fs->tx_port].eth_addr,
 				&eth_hdr->s_addr);
-		mb->ol_flags = txp->tx_ol_flags;
+		mb->ol_flags = ol_flags;
 		mb->l2_len = sizeof(struct ether_hdr);
 		mb->l3_len = sizeof(struct ipv4_hdr);
 		mb->vlan_tci = txp->tx_vlan_id;
diff --git a/app/test-pmd/macswap.c b/app/test-pmd/macswap.c
index 1786095..ec61657 100644
--- a/app/test-pmd/macswap.c
+++ b/app/test-pmd/macswap.c
@@ -85,6 +85,7 @@ pkt_burst_mac_swap(struct fwd_stream *fs)
 	uint16_t nb_rx;
 	uint16_t nb_tx;
 	uint16_t i;
+	uint64_t ol_flags = 0;
 #ifdef RTE_TEST_PMD_RECORD_CORE_CYCLES
 	uint64_t start_tsc;
 	uint64_t end_tsc;
@@ -108,6 +109,8 @@ pkt_burst_mac_swap(struct fwd_stream *fs)
 #endif
 	fs->rx_packets += nb_rx;
 	txp = &ports[fs->tx_port];
+	if (txp->tx_ol_flags & TESTPMD_TX_OFFLOAD_INSERT_VLAN)
+		ol_flags = PKT_TX_VLAN_PKT;
 	for (i = 0; i < nb_rx; i++) {
 		mb = pkts_burst[i];
 		eth_hdr = rte_pktmbuf_mtod(mb, struct ether_hdr *);
@@ -117,7 +120,7 @@ pkt_burst_mac_swap(struct fwd_stream *fs)
 		ether_addr_copy(&eth_hdr->s_addr, &eth_hdr->d_addr);
 		ether_addr_copy(&addr, &eth_hdr->s_addr);
 
-		mb->ol_flags = txp->tx_ol_flags;
+		mb->ol_flags = ol_flags;
 		mb->l2_len = sizeof(struct ether_hdr);
 		mb->l3_len = sizeof(struct ipv4_hdr);
 		mb->vlan_tci = txp->tx_vlan_id;
diff --git a/app/test-pmd/testpmd.h b/app/test-pmd/testpmd.h
index 9cbfeac..82af2bd 100644
--- a/app/test-pmd/testpmd.h
+++ b/app/test-pmd/testpmd.h
@@ -123,14 +123,28 @@ struct fwd_stream {
 #endif
 };
 
+/** Offload IP checksum in csum forward engine */
+#define TESTPMD_TX_OFFLOAD_IP_CKSUM          0x0001
+/** Offload UDP checksum in csum forward engine */
+#define TESTPMD_TX_OFFLOAD_UDP_CKSUM         0x0002
+/** Offload TCP checksum in csum forward engine */
+#define TESTPMD_TX_OFFLOAD_TCP_CKSUM         0x0004
+/** Offload SCTP checksum in csum forward engine */
+#define TESTPMD_TX_OFFLOAD_SCTP_CKSUM        0x0008
+/** Offload inner IP checksum in csum forward engine */
+#define TESTPMD_TX_OFFLOAD_INNER_IP_CKSUM    0x0010
+/** Offload inner UDP checksum in csum forward engine */
+#define TESTPMD_TX_OFFLOAD_INNER_UDP_CKSUM   0x0020
+/** Offload inner TCP checksum in csum forward engine */
+#define TESTPMD_TX_OFFLOAD_INNER_TCP_CKSUM   0x0040
+/** Offload inner SCTP checksum in csum forward engine */
+#define TESTPMD_TX_OFFLOAD_INNER_SCTP_CKSUM  0x0080
+/** Offload inner IP checksum mask */
+#define TESTPMD_TX_OFFLOAD_INNER_CKSUM_MASK  0x00F0
+/** Insert VLAN header in forward engine */
+#define TESTPMD_TX_OFFLOAD_INSERT_VLAN       0x0100
 /**
  * The data structure associated with each port.
- * tx_ol_flags is slightly different from ol_flags of rte_mbuf.
- *   Bit  0: Insert IP checksum
- *   Bit  1: Insert UDP checksum
- *   Bit  2: Insert TCP checksum
- *   Bit  3: Insert SCTP checksum
- *   Bit 11: Insert VLAN Label
  */
 struct rte_port {
 	struct rte_eth_dev_info dev_info;   /**< PCI info + driver name */
@@ -141,7 +155,7 @@ struct rte_port {
 	struct fwd_stream       *rx_stream; /**< Port RX stream, if unique */
 	struct fwd_stream       *tx_stream; /**< Port TX stream, if unique */
 	unsigned int            socket_id;  /**< For NUMA support */
-	uint64_t                tx_ol_flags;/**< Offload Flags of TX packets. */
+	uint16_t                tx_ol_flags;/**< TX Offload Flags (TESTPMD_TX_OFFLOAD...). */
 	uint16_t                tx_vlan_id; /**< Tag Id. in TX VLAN packets. */
 	void                    *fwd_ctx;   /**< Forwarding mode context */
 	uint64_t                rx_bad_ip_csum; /**< rx pkts with bad ip checksum  */
diff --git a/app/test-pmd/txonly.c b/app/test-pmd/txonly.c
index 3d08005..c984670 100644
--- a/app/test-pmd/txonly.c
+++ b/app/test-pmd/txonly.c
@@ -196,6 +196,7 @@ static void
 pkt_burst_transmit(struct fwd_stream *fs)
 {
 	struct rte_mbuf *pkts_burst[MAX_PKT_BURST];
+	struct rte_port *txp;
 	struct rte_mbuf *pkt;
 	struct rte_mbuf *pkt_seg;
 	struct rte_mempool *mbp;
@@ -203,7 +204,7 @@ pkt_burst_transmit(struct fwd_stream *fs)
 	uint16_t nb_tx;
 	uint16_t nb_pkt;
 	uint16_t vlan_tci;
-	uint64_t ol_flags;
+	uint64_t ol_flags = 0;
 	uint8_t  i;
 #ifdef RTE_TEST_PMD_RECORD_CORE_CYCLES
 	uint64_t start_tsc;
@@ -216,8 +217,10 @@ pkt_burst_transmit(struct fwd_stream *fs)
 #endif
 
 	mbp = current_fwd_lcore()->mbp;
-	vlan_tci = ports[fs->tx_port].tx_vlan_id;
-	ol_flags = ports[fs->tx_port].tx_ol_flags;
+	txp = &ports[fs->tx_port];
+	vlan_tci = txp->tx_vlan_id;
+	if (txp->tx_ol_flags & TESTPMD_TX_OFFLOAD_INSERT_VLAN)
+		ol_flags = PKT_TX_VLAN_PKT;
 	for (nb_pkt = 0; nb_pkt < nb_pkt_per_burst; nb_pkt++) {
 		pkt = tx_mbuf_alloc(mbp);
 		if (pkt == NULL) {
-- 
2.1.0


* [dpdk-dev] [PATCH v2 08/13] testpmd: rework csum forward engine
  2014-11-14 17:03 ` [dpdk-dev] [PATCH v2 00/13] " Olivier Matz
                     ` (6 preceding siblings ...)
  2014-11-14 17:03   ` [dpdk-dev] [PATCH v2 07/13] testpmd: fix use of offload flags in testpmd Olivier Matz
@ 2014-11-14 17:03   ` Olivier Matz
  2014-11-17  8:11     ` Liu, Jijiang
  2014-11-14 17:03   ` [dpdk-dev] [PATCH v2 09/13] mbuf: introduce new checksum API Olivier Matz
                     ` (5 subsequent siblings)
  13 siblings, 1 reply; 112+ messages in thread
From: Olivier Matz @ 2014-11-14 17:03 UTC (permalink / raw)
  To: dev; +Cc: jigsaw

The csum forward engine was becoming too complex to be used and
extended (the next commits want to add the support of TSO):

- no explanation of what the code does
- code is not factorized, lots of code duplicated, especially between
  ipv4/ipv6
- user command line api: use of bitmasks that need to be calculated by
  the user
- the user flags don't have the same semantics:
  - for legacy IP/UDP/TCP/SCTP, they select software or hardware checksum
  - for others (vxlan), they select between hardware checksum or no
    checksum
- the code relies too much on flags set by the driver without software
  alternative (ex: PKT_RX_TUNNEL_IPV4_HDR). It is nice to be able to
  compare a software implementation with the hardware offload.

This commit tries to fix these issues, and provides a simple definition
of what is done by the forward engine:

 * Receive a burst of packets, and for supported packet types:
 *  - modify the IPs
 *  - reprocess the checksum in SW or HW, depending on testpmd command line
 *    configuration
 * Then packets are transmitted on the output port.
 *
 * Supported packets are:
 *   Ether / (vlan) / IP|IP6 / UDP|TCP|SCTP .
 *   Ether / (vlan) / IP|IP6 / UDP / VxLAN / Ether / IP|IP6 / UDP|TCP|SCTP
 *
 * The network parser supposes that the packet is contiguous, which may
 * not be the case in real life.

Signed-off-by: Olivier Matz <olivier.matz@6wind.com>
---
 app/test-pmd/cmdline.c  | 151 ++++++++---
 app/test-pmd/config.c   |  11 -
 app/test-pmd/csumonly.c | 668 ++++++++++++++++++++++--------------------------
 app/test-pmd/testpmd.h  |  17 +-
 4 files changed, 423 insertions(+), 424 deletions(-)

diff --git a/app/test-pmd/cmdline.c b/app/test-pmd/cmdline.c
index 4c3fc76..0361e58 100644
--- a/app/test-pmd/cmdline.c
+++ b/app/test-pmd/cmdline.c
@@ -310,19 +310,14 @@ static void cmd_help_long_parsed(void *parsed_result,
 			"    Disable hardware insertion of a VLAN header in"
 			" packets sent on a port.\n\n"
 
-			"tx_checksum set (mask) (port_id)\n"
-			"    Enable hardware insertion of checksum offload with"
-			" the 8-bit mask, 0~0xff, in packets sent on a port.\n"
-			"        bit 0 - insert ip   checksum offload if set\n"
-			"        bit 1 - insert udp  checksum offload if set\n"
-			"        bit 2 - insert tcp  checksum offload if set\n"
-			"        bit 3 - insert sctp checksum offload if set\n"
-			"        bit 4 - insert inner ip  checksum offload if set\n"
-			"        bit 5 - insert inner udp checksum offload if set\n"
-			"        bit 6 - insert inner tcp checksum offload if set\n"
-			"        bit 7 - insert inner sctp checksum offload if set\n"
+			"tx_cksum set (ip|udp|tcp|sctp|vxlan) (hw|sw) (port_id)\n"
+			"    Enable hardware calculation of checksum with when"
+			" transmitting a packet using 'csum' forward engine.\n"
 			"    Please check the NIC datasheet for HW limits.\n\n"
 
+			"tx_checksum show (port_id)\n"
+			"    Display tx checksum offload configuration\n\n"
+
 			"set fwd (%s)\n"
 			"    Set packet forwarding mode.\n\n"
 
@@ -2738,48 +2733,131 @@ cmdline_parse_inst_t cmd_tx_vlan_reset = {
 
 
 /* *** ENABLE HARDWARE INSERTION OF CHECKSUM IN TX PACKETS *** */
-struct cmd_tx_cksum_set_result {
+struct cmd_tx_cksum_result {
 	cmdline_fixed_string_t tx_cksum;
-	cmdline_fixed_string_t set;
-	uint8_t cksum_mask;
+	cmdline_fixed_string_t mode;
+	cmdline_fixed_string_t proto;
+	cmdline_fixed_string_t hwsw;
 	uint8_t port_id;
 };
 
 static void
-cmd_tx_cksum_set_parsed(void *parsed_result,
+cmd_tx_cksum_parsed(void *parsed_result,
 		       __attribute__((unused)) struct cmdline *cl,
 		       __attribute__((unused)) void *data)
 {
-	struct cmd_tx_cksum_set_result *res = parsed_result;
+	struct cmd_tx_cksum_result *res = parsed_result;
+	int hw = 0;
+	uint16_t ol_flags, mask = 0;
+	struct rte_eth_dev_info dev_info;
+
+	if (port_id_is_invalid(res->port_id)) {
+		printf("invalid port %d\n", res->port_id);
+		return;
+	}
 
-	tx_cksum_set(res->port_id, res->cksum_mask);
+	if (!strcmp(res->mode, "set")) {
+
+		if (!strcmp(res->hwsw, "hw"))
+			hw = 1;
+
+		if (!strcmp(res->proto, "ip")) {
+			mask = TESTPMD_TX_OFFLOAD_IP_CKSUM;
+		} else if (!strcmp(res->proto, "udp")) {
+			mask = TESTPMD_TX_OFFLOAD_UDP_CKSUM;
+		} else if (!strcmp(res->proto, "tcp")) {
+			mask = TESTPMD_TX_OFFLOAD_TCP_CKSUM;
+		} else if (!strcmp(res->proto, "sctp")) {
+			mask = TESTPMD_TX_OFFLOAD_SCTP_CKSUM;
+		} else if (!strcmp(res->proto, "vxlan")) {
+			mask = TESTPMD_TX_OFFLOAD_VXLAN_CKSUM;
+		}
+
+		if (hw)
+			ports[res->port_id].tx_ol_flags |= mask;
+		else
+			ports[res->port_id].tx_ol_flags &= (~mask);
+	}
+
+	ol_flags = ports[res->port_id].tx_ol_flags;
+	printf("IP checksum offload is %s\n",
+		(ol_flags & TESTPMD_TX_OFFLOAD_IP_CKSUM) ? "hw" : "sw");
+	printf("UDP checksum offload is %s\n",
+		(ol_flags & TESTPMD_TX_OFFLOAD_UDP_CKSUM) ? "hw" : "sw");
+	printf("TCP checksum offload is %s\n",
+		(ol_flags & TESTPMD_TX_OFFLOAD_TCP_CKSUM) ? "hw" : "sw");
+	printf("SCTP checksum offload is %s\n",
+		(ol_flags & TESTPMD_TX_OFFLOAD_SCTP_CKSUM) ? "hw" : "sw");
+	printf("VxLAN checksum offload is %s\n",
+		(ol_flags & TESTPMD_TX_OFFLOAD_VXLAN_CKSUM) ? "hw" : "sw");
+
+	/* display warnings if configuration is not supported by the NIC */
+	rte_eth_dev_info_get(res->port_id, &dev_info);
+	if ((ol_flags & TESTPMD_TX_OFFLOAD_IP_CKSUM) &&
+		(dev_info.tx_offload_capa & DEV_TX_OFFLOAD_IPV4_CKSUM) == 0) {
+		printf("Warning: hardware IP checksum enabled but not "
+			"supported by port %d\n", res->port_id);
+	}
+	if ((ol_flags & TESTPMD_TX_OFFLOAD_UDP_CKSUM) &&
+		(dev_info.tx_offload_capa & DEV_TX_OFFLOAD_UDP_CKSUM) == 0) {
+		printf("Warning: hardware UDP checksum enabled but not "
+			"supported by port %d\n", res->port_id);
+	}
+	if ((ol_flags & TESTPMD_TX_OFFLOAD_TCP_CKSUM) &&
+		(dev_info.tx_offload_capa & DEV_TX_OFFLOAD_TCP_CKSUM) == 0) {
+		printf("Warning: hardware TCP checksum enabled but not "
+			"supported by port %d\n", res->port_id);
+	}
+	if ((ol_flags & TESTPMD_TX_OFFLOAD_SCTP_CKSUM) &&
+		(dev_info.tx_offload_capa & DEV_TX_OFFLOAD_SCTP_CKSUM) == 0) {
+		printf("Warning: hardware SCTP checksum enabled but not "
+			"supported by port %d\n", res->port_id);
+	}
 }
 
-cmdline_parse_token_string_t cmd_tx_cksum_set_tx_cksum =
-	TOKEN_STRING_INITIALIZER(struct cmd_tx_cksum_set_result,
+cmdline_parse_token_string_t cmd_tx_cksum_tx_cksum =
+	TOKEN_STRING_INITIALIZER(struct cmd_tx_cksum_result,
 				tx_cksum, "tx_checksum");
-cmdline_parse_token_string_t cmd_tx_cksum_set_set =
-	TOKEN_STRING_INITIALIZER(struct cmd_tx_cksum_set_result,
-				set, "set");
-cmdline_parse_token_num_t cmd_tx_cksum_set_cksum_mask =
-	TOKEN_NUM_INITIALIZER(struct cmd_tx_cksum_set_result,
-				cksum_mask, UINT8);
-cmdline_parse_token_num_t cmd_tx_cksum_set_portid =
-	TOKEN_NUM_INITIALIZER(struct cmd_tx_cksum_set_result,
+cmdline_parse_token_string_t cmd_tx_cksum_mode =
+	TOKEN_STRING_INITIALIZER(struct cmd_tx_cksum_result,
+				mode, "set");
+cmdline_parse_token_string_t cmd_tx_cksum_proto =
+	TOKEN_STRING_INITIALIZER(struct cmd_tx_cksum_result,
+				proto, "ip#tcp#udp#sctp#vxlan");
+cmdline_parse_token_string_t cmd_tx_cksum_hwsw =
+	TOKEN_STRING_INITIALIZER(struct cmd_tx_cksum_result,
+				hwsw, "hw#sw");
+cmdline_parse_token_num_t cmd_tx_cksum_portid =
+	TOKEN_NUM_INITIALIZER(struct cmd_tx_cksum_result,
 				port_id, UINT8);
 
 cmdline_parse_inst_t cmd_tx_cksum_set = {
-	.f = cmd_tx_cksum_set_parsed,
+	.f = cmd_tx_cksum_parsed,
+	.data = NULL,
+	.help_str = "enable/disable hardware calculation of L3/L4 checksum when "
+		"using csum forward engine: tx_checksum set ip|tcp|udp|sctp|vxlan hw|sw <port>",
+	.tokens = {
+		(void *)&cmd_tx_cksum_tx_cksum,
+		(void *)&cmd_tx_cksum_mode,
+		(void *)&cmd_tx_cksum_proto,
+		(void *)&cmd_tx_cksum_hwsw,
+		(void *)&cmd_tx_cksum_portid,
+		NULL,
+	},
+};
+
+cmdline_parse_token_string_t cmd_tx_cksum_mode_show =
+	TOKEN_STRING_INITIALIZER(struct cmd_tx_cksum_result,
+				mode, "show");
+
+cmdline_parse_inst_t cmd_tx_cksum_show = {
+	.f = cmd_tx_cksum_parsed,
 	.data = NULL,
-	.help_str = "enable hardware insertion of L3/L4checksum with a given "
-	"mask in packets sent on a port, the bit mapping is given as, Bit 0 for ip, "
-	"Bit 1 for UDP, Bit 2 for TCP, Bit 3 for SCTP, Bit 4 for inner ip, "
-	"Bit 5 for inner UDP, Bit 6 for inner TCP, Bit 7 for inner SCTP",
+	.help_str = "show checksum offload configuration: tx_checksum show <port>",
 	.tokens = {
-		(void *)&cmd_tx_cksum_set_tx_cksum,
-		(void *)&cmd_tx_cksum_set_set,
-		(void *)&cmd_tx_cksum_set_cksum_mask,
-		(void *)&cmd_tx_cksum_set_portid,
+		(void *)&cmd_tx_cksum_tx_cksum,
+		(void *)&cmd_tx_cksum_mode_show,
+		(void *)&cmd_tx_cksum_portid,
 		NULL,
 	},
 };
@@ -7796,6 +7874,7 @@ cmdline_parse_ctx_t main_ctx[] = {
 	(cmdline_parse_inst_t *)&cmd_tx_vlan_reset,
 	(cmdline_parse_inst_t *)&cmd_tx_vlan_set_pvid,
 	(cmdline_parse_inst_t *)&cmd_tx_cksum_set,
+	(cmdline_parse_inst_t *)&cmd_tx_cksum_show,
 	(cmdline_parse_inst_t *)&cmd_link_flow_control_set,
 	(cmdline_parse_inst_t *)&cmd_link_flow_control_set_rx,
 	(cmdline_parse_inst_t *)&cmd_link_flow_control_set_tx,
diff --git a/app/test-pmd/config.c b/app/test-pmd/config.c
index 34b6fdb..16d62ab 100644
--- a/app/test-pmd/config.c
+++ b/app/test-pmd/config.c
@@ -1744,17 +1744,6 @@ set_qmap(portid_t port_id, uint8_t is_rx, uint16_t queue_id, uint8_t map_value)
 }
 
 void
-tx_cksum_set(portid_t port_id, uint64_t ol_flags)
-{
-	uint64_t tx_ol_flags;
-	if (port_id_is_invalid(port_id))
-		return;
-	/* Clear last 8 bits and then set L3/4 checksum mask again */
-	tx_ol_flags = ports[port_id].tx_ol_flags & (~0x0FFull);
-	ports[port_id].tx_ol_flags = ((ol_flags & 0xff) | tx_ol_flags);
-}
-
-void
 fdir_add_signature_filter(portid_t port_id, uint8_t queue_id,
 			  struct rte_fdir_filter *fdir_filter)
 {
diff --git a/app/test-pmd/csumonly.c b/app/test-pmd/csumonly.c
index 743094a..dda5d9e 100644
--- a/app/test-pmd/csumonly.c
+++ b/app/test-pmd/csumonly.c
@@ -73,13 +73,19 @@
 #include <rte_string_fns.h>
 #include "testpmd.h"
 
-
-
 #define IP_DEFTTL  64   /* from RFC 1340. */
 #define IP_VERSION 0x40
 #define IP_HDRLEN  0x05 /* default IP header length == five 32-bits words. */
 #define IP_VHL_DEF (IP_VERSION | IP_HDRLEN)
 
+/* we cannot use htons() from arpa/inet.h due to name conflicts, and we
+ * cannot use rte_cpu_to_be_16() on a constant in a switch/case */
+#if __BYTE_ORDER == __LITTLE_ENDIAN
+#define _htons(x) ((uint16_t)((((x) & 0x00ffU) << 8) | (((x) & 0xff00U) >> 8)))
+#else
+#define _htons(x) (x)
+#endif
+
 static inline uint16_t
 get_16b_sum(uint16_t *ptr16, uint32_t nr)
 {
@@ -112,7 +118,7 @@ get_ipv4_cksum(struct ipv4_hdr *ipv4_hdr)
 
 
 static inline uint16_t
-get_ipv4_psd_sum (struct ipv4_hdr * ip_hdr)
+get_ipv4_psd_sum(struct ipv4_hdr *ip_hdr)
 {
 	/* Pseudo Header for IPv4/UDP/TCP checksum */
 	union ipv4_psd_header {
@@ -136,7 +142,7 @@ get_ipv4_psd_sum (struct ipv4_hdr * ip_hdr)
 }
 
 static inline uint16_t
-get_ipv6_psd_sum (struct ipv6_hdr * ip_hdr)
+get_ipv6_psd_sum(struct ipv6_hdr *ip_hdr)
 {
 	/* Pseudo Header for IPv6/UDP/TCP checksum */
 	union ipv6_psd_header {
@@ -158,6 +164,15 @@ get_ipv6_psd_sum (struct ipv6_hdr * ip_hdr)
 	return get_16b_sum(psd_hdr.u16_arr, sizeof(psd_hdr));
 }
 
+static uint16_t
+get_psd_sum(void *l3_hdr, uint16_t ethertype)
+{
+	if (ethertype == _htons(ETHER_TYPE_IPv4))
+		return get_ipv4_psd_sum(l3_hdr);
+	else /* assume ethertype == ETHER_TYPE_IPv6 */
+		return get_ipv6_psd_sum(l3_hdr);
+}
+
 static inline uint16_t
 get_ipv4_udptcp_checksum(struct ipv4_hdr *ipv4_hdr, uint16_t *l4_hdr)
 {
@@ -174,7 +189,6 @@ get_ipv4_udptcp_checksum(struct ipv4_hdr *ipv4_hdr, uint16_t *l4_hdr)
 	if (cksum == 0)
 		cksum = 0xffff;
 	return (uint16_t)cksum;
-
 }
 
 static inline uint16_t
@@ -196,48 +210,218 @@ get_ipv6_udptcp_checksum(struct ipv6_hdr *ipv6_hdr, uint16_t *l4_hdr)
 	return (uint16_t)cksum;
 }
 
+static uint16_t
+get_udptcp_checksum(void *l3_hdr, void *l4_hdr, uint16_t ethertype)
+{
+	if (ethertype == _htons(ETHER_TYPE_IPv4))
+		return get_ipv4_udptcp_checksum(l3_hdr, l4_hdr);
+	else /* assume ethertype == ETHER_TYPE_IPv6 */
+		return get_ipv6_udptcp_checksum(l3_hdr, l4_hdr);
+}
 
 /*
- * Forwarding of packets. Change the checksum field with HW or SW methods
- * The HW/SW method selection depends on the ol_flags on every packet
+ * Parse an ethernet header to fill the ethertype, l2_len, l3_len and
+ * l4_proto. This function is able to recognize IPv4/IPv6 with one optional
+ * vlan header.
+ */
+static void
+parse_ethernet(struct ether_hdr *eth_hdr, uint16_t *ethertype, uint16_t *l2_len,
+	uint16_t *l3_len, uint8_t *l4_proto)
+{
+	struct ipv4_hdr *ipv4_hdr;
+	struct ipv6_hdr *ipv6_hdr;
+
+	*l2_len = sizeof(struct ether_hdr);
+	*ethertype = eth_hdr->ether_type;
+
+	if (*ethertype == _htons(ETHER_TYPE_VLAN)) {
+		struct vlan_hdr *vlan_hdr = (struct vlan_hdr *)(eth_hdr + 1);
+
+		*l2_len  += sizeof(struct vlan_hdr);
+		*ethertype = vlan_hdr->eth_proto;
+	}
+
+	switch (*ethertype) {
+	case _htons(ETHER_TYPE_IPv4):
+		ipv4_hdr = (struct ipv4_hdr *) ((char *)eth_hdr + *l2_len);
+		*l3_len = (ipv4_hdr->version_ihl & 0x0f) * 4;
+		*l4_proto = ipv4_hdr->next_proto_id;
+		break;
+	case _htons(ETHER_TYPE_IPv6):
+		ipv6_hdr = (struct ipv6_hdr *) ((char *)eth_hdr + *l2_len);
+		*l3_len = sizeof(struct ipv6_hdr) ;
+		*l4_proto = ipv6_hdr->proto;
+		break;
+	default:
+		*l3_len = 0;
+		*l4_proto = 0;
+		break;
+	}
+}
+
+/* modify the IPv4 or IPv6 source address of a packet */
+static void
+change_ip_addresses(void *l3_hdr, uint16_t ethertype)
+{
+	struct ipv4_hdr *ipv4_hdr = l3_hdr;
+	struct ipv6_hdr *ipv6_hdr = l3_hdr;
+
+	if (ethertype == _htons(ETHER_TYPE_IPv4)) {
+		ipv4_hdr->src_addr =
+			rte_cpu_to_be_32(rte_be_to_cpu_32(ipv4_hdr->src_addr) + 1);
+	}
+	else if (ethertype == _htons(ETHER_TYPE_IPv6)) {
+		ipv6_hdr->src_addr[15] = ipv6_hdr->src_addr[15] + 1;
+	}
+}
+
+/* if possible, calculate the checksum of a packet in hw or sw,
+ * depending on the testpmd command line configuration */
+static uint64_t
+process_inner_cksums(void *l3_hdr, uint16_t ethertype, uint16_t l3_len,
+	uint8_t l4_proto, uint16_t testpmd_ol_flags)
+{
+	struct ipv4_hdr *ipv4_hdr = l3_hdr;
+	struct udp_hdr *udp_hdr;
+	struct tcp_hdr *tcp_hdr;
+	struct sctp_hdr *sctp_hdr;
+	uint64_t ol_flags = 0;
+
+	if (ethertype == _htons(ETHER_TYPE_IPv4)) {
+		ipv4_hdr = l3_hdr;
+		ipv4_hdr->hdr_checksum = 0;
+
+		if (testpmd_ol_flags & TESTPMD_TX_OFFLOAD_IP_CKSUM)
+			ol_flags |= PKT_TX_IP_CKSUM;
+		else
+			ipv4_hdr->hdr_checksum = get_ipv4_cksum(ipv4_hdr);
+
+	}
+	else if (ethertype != _htons(ETHER_TYPE_IPv6))
+		return 0; /* packet type not supported, nothing to do */
+
+	if (l4_proto == IPPROTO_UDP) {
+		udp_hdr = (struct udp_hdr *)((char *)l3_hdr + l3_len);
+		/* do not recalculate udp cksum if it was 0 */
+		if (udp_hdr->dgram_cksum != 0) {
+			udp_hdr->dgram_cksum = 0;
+			if (testpmd_ol_flags & TESTPMD_TX_OFFLOAD_UDP_CKSUM) {
+				ol_flags |= PKT_TX_UDP_CKSUM;
+				udp_hdr->dgram_cksum = get_psd_sum(l3_hdr,
+					ethertype);
+			}
+			else {
+				udp_hdr->dgram_cksum =
+					get_udptcp_checksum(l3_hdr, udp_hdr,
+						ethertype);
+			}
+		}
+	}
+	else if (l4_proto == IPPROTO_TCP) {
+		tcp_hdr = (struct tcp_hdr *)((char *)l3_hdr + l3_len);
+		tcp_hdr->cksum = 0;
+		if (testpmd_ol_flags & TESTPMD_TX_OFFLOAD_TCP_CKSUM) {
+			ol_flags |= PKT_TX_TCP_CKSUM;
+			tcp_hdr->cksum = get_psd_sum(l3_hdr, ethertype);
+		}
+		else {
+			tcp_hdr->cksum =
+				get_udptcp_checksum(l3_hdr, tcp_hdr, ethertype);
+		}
+	}
+	else if (l4_proto == IPPROTO_SCTP) {
+		sctp_hdr = (struct sctp_hdr *)((char *)l3_hdr + l3_len);
+		sctp_hdr->cksum = 0;
+		/* sctp payload must be a multiple of 4 to be
+		 * offloaded */
+		if ((testpmd_ol_flags & TESTPMD_TX_OFFLOAD_SCTP_CKSUM) &&
+			((ipv4_hdr->total_length & 0x3) == 0)) {
+			ol_flags |= PKT_TX_SCTP_CKSUM;
+		}
+		else {
+			/* XXX implement CRC32c, example available in
+			 * RFC3309 */
+		}
+	}
+
+	return ol_flags;
+}
+
+/* Calculate the checksum of outer header (only vxlan is supported,
+ * meaning IP + UDP). The caller already checked that it's a vxlan
+ * packet */
+static uint64_t
+process_outer_cksums(void *outer_l3_hdr, uint16_t outer_ethertype,
+	uint16_t outer_l3_len, uint16_t testpmd_ol_flags)
+{
+	struct ipv4_hdr *ipv4_hdr = outer_l3_hdr;
+	struct ipv6_hdr *ipv6_hdr = outer_l3_hdr;
+	struct udp_hdr *udp_hdr;
+	uint64_t ol_flags = 0;
+
+	if (testpmd_ol_flags & TESTPMD_TX_OFFLOAD_VXLAN_CKSUM)
+		ol_flags |= PKT_TX_VXLAN_CKSUM;
+
+	if (outer_ethertype == _htons(ETHER_TYPE_IPv4)) {
+		ipv4_hdr->hdr_checksum = 0;
+
+		if ((testpmd_ol_flags & TESTPMD_TX_OFFLOAD_VXLAN_CKSUM) == 0)
+			ipv4_hdr->hdr_checksum = get_ipv4_cksum(ipv4_hdr);
+	}
+
+	udp_hdr = (struct udp_hdr *)((char *)outer_l3_hdr + outer_l3_len);
+	/* do not recalculate udp cksum if it was 0 */
+	if (udp_hdr->dgram_cksum != 0) {
+		udp_hdr->dgram_cksum = 0;
+		if ((testpmd_ol_flags & TESTPMD_TX_OFFLOAD_VXLAN_CKSUM) == 0) {
+			if (outer_ethertype == _htons(ETHER_TYPE_IPv4))
+				udp_hdr->dgram_cksum =
+					get_ipv4_udptcp_checksum(ipv4_hdr,
+						(uint16_t *)udp_hdr);
+			else
+				udp_hdr->dgram_cksum =
+					get_ipv6_udptcp_checksum(ipv6_hdr,
+						(uint16_t *)udp_hdr);
+		}
+	}
+
+	return ol_flags;
+}
+
+/*
+ * Receive a burst of packets, and for supported packet types:
+ *  - modify the IPs
+ *  - reprocess the checksum in SW or HW, depending on testpmd command line
+ *    configuration
+ * Then packets are transmitted on the output port.
+ *
+ * Supported packets are:
+ *   Ether / (vlan) / IP|IP6 / UDP|TCP|SCTP .
+ *   Ether / (vlan) / IP|IP6 / UDP / VxLAN / Ether / IP|IP6 / UDP|TCP|SCTP
+ *
+ * The network parser supposes that the packet is contiguous, which may
+ * not be the case in real life.
  */
 static void
 pkt_burst_checksum_forward(struct fwd_stream *fs)
 {
-	struct rte_mbuf  *pkts_burst[MAX_PKT_BURST];
-	struct rte_port  *txp;
-	struct rte_mbuf  *mb;
+	struct rte_mbuf *pkts_burst[MAX_PKT_BURST];
+	struct rte_port *txp;
+	struct rte_mbuf *m;
 	struct ether_hdr *eth_hdr;
-	struct ipv4_hdr  *ipv4_hdr;
-	struct ether_hdr *inner_eth_hdr;
-	struct ipv4_hdr  *inner_ipv4_hdr = NULL;
-	struct ipv6_hdr  *ipv6_hdr;
-	struct ipv6_hdr  *inner_ipv6_hdr = NULL;
-	struct udp_hdr   *udp_hdr;
-	struct udp_hdr   *inner_udp_hdr;
-	struct tcp_hdr   *tcp_hdr;
-	struct tcp_hdr   *inner_tcp_hdr;
-	struct sctp_hdr  *sctp_hdr;
-	struct sctp_hdr  *inner_sctp_hdr;
-
+	void *l3_hdr = NULL, *outer_l3_hdr = NULL; /* can be IPv4 or IPv6 */
+	struct udp_hdr *udp_hdr;
 	uint16_t nb_rx;
 	uint16_t nb_tx;
 	uint16_t i;
 	uint64_t ol_flags;
-	uint64_t pkt_ol_flags;
-	uint64_t tx_ol_flags;
-	uint16_t l4_proto;
-	uint16_t inner_l4_proto = 0;
-	uint16_t eth_type;
-	uint8_t  l2_len;
-	uint8_t  l3_len;
-	uint8_t  inner_l2_len = 0;
-	uint8_t  inner_l3_len = 0;
-
+	uint16_t testpmd_ol_flags;
+	uint8_t l4_proto;
+	uint16_t ethertype = 0, outer_ethertype = 0;
+	uint16_t  l2_len = 0, l3_len = 0, outer_l2_len = 0, outer_l3_len = 0;
+	int tunnel = 0;
 	uint32_t rx_bad_ip_csum;
 	uint32_t rx_bad_l4_csum;
-	uint8_t  ipv4_tunnel;
-	uint8_t  ipv6_tunnel;
 
 #ifdef RTE_TEST_PMD_RECORD_CORE_CYCLES
 	uint64_t start_tsc;
@@ -249,9 +433,7 @@ pkt_burst_checksum_forward(struct fwd_stream *fs)
 	start_tsc = rte_rdtsc();
 #endif
 
-	/*
-	 * Receive a burst of packets and forward them.
-	 */
+	/* receive a burst of packets */
 	nb_rx = rte_eth_rx_burst(fs->rx_port, fs->rx_queue, pkts_burst,
 				 nb_pkt_per_burst);
 	if (unlikely(nb_rx == 0))
@@ -265,348 +447,107 @@ pkt_burst_checksum_forward(struct fwd_stream *fs)
 	rx_bad_l4_csum = 0;
 
 	txp = &ports[fs->tx_port];
-	tx_ol_flags = txp->tx_ol_flags;
+	testpmd_ol_flags = txp->tx_ol_flags;
 
 	for (i = 0; i < nb_rx; i++) {
 
-		mb = pkts_burst[i];
-		l2_len  = sizeof(struct ether_hdr);
-		pkt_ol_flags = mb->ol_flags;
-		ol_flags = (pkt_ol_flags & (~PKT_TX_L4_MASK));
-		ipv4_tunnel = (pkt_ol_flags & PKT_RX_TUNNEL_IPV4_HDR) ?
-				1 : 0;
-		ipv6_tunnel = (pkt_ol_flags & PKT_RX_TUNNEL_IPV6_HDR) ?
-				1 : 0;
-		eth_hdr = rte_pktmbuf_mtod(mb, struct ether_hdr *);
-		eth_type = rte_be_to_cpu_16(eth_hdr->ether_type);
-		if (eth_type == ETHER_TYPE_VLAN) {
-			/* Only allow single VLAN label here */
-			l2_len  += sizeof(struct vlan_hdr);
-			 eth_type = rte_be_to_cpu_16(*(uint16_t *)
-				((uintptr_t)&eth_hdr->ether_type +
-				sizeof(struct vlan_hdr)));
+		ol_flags = 0;
+		tunnel = 0;
+		m = pkts_burst[i];
+
+		/* Update the L3/L4 checksum error packet statistics */
+		rx_bad_ip_csum += ((m->ol_flags & PKT_RX_IP_CKSUM_BAD) != 0);
+		rx_bad_l4_csum += ((m->ol_flags & PKT_RX_L4_CKSUM_BAD) != 0);
+
+		/* step 1: dissect packet, parsing optional vlan, ip4/ip6, vxlan
+		 * and inner headers */
+
+		eth_hdr = rte_pktmbuf_mtod(m, struct ether_hdr *);
+		parse_ethernet(eth_hdr, &ethertype, &l2_len, &l3_len, &l4_proto);
+		l3_hdr = (char *)eth_hdr + l2_len;
+
+		/* check if it's a supported tunnel (only vxlan for now) */
+		if (l4_proto == IPPROTO_UDP) {
+			udp_hdr = (struct udp_hdr *)((char *)l3_hdr + l3_len);
+
+			/* currently, this flag is set by i40e only if the
+			 * packet is vxlan */
+			if (((m->ol_flags & PKT_RX_TUNNEL_IPV4_HDR) ||
+					(m->ol_flags & PKT_RX_TUNNEL_IPV6_HDR)))
+				tunnel = 1;
+			/* else check udp destination port, 4789 is the default
+			 * vxlan port (rfc7348) */
+			else if (udp_hdr->dst_port == _htons(4789))
+				tunnel = 1;
+
+			if (tunnel == 1) {
+				outer_ethertype = ethertype;
+				outer_l2_len = l2_len;
+				outer_l3_len = l3_len;
+				outer_l3_hdr = l3_hdr;
+
+				eth_hdr = (struct ether_hdr *)((char *)udp_hdr +
+					sizeof(struct udp_hdr) +
+					sizeof(struct vxlan_hdr));
+
+				parse_ethernet(eth_hdr, &ethertype, &l2_len,
+					&l3_len, &l4_proto);
+				l3_hdr = (char *)eth_hdr + l2_len;
+			}
 		}
 
-		/* Update the L3/L4 checksum error packet count  */
-		rx_bad_ip_csum += (uint16_t) ((pkt_ol_flags & PKT_RX_IP_CKSUM_BAD) != 0);
-		rx_bad_l4_csum += (uint16_t) ((pkt_ol_flags & PKT_RX_L4_CKSUM_BAD) != 0);
-
-		/*
-		 * Try to figure out L3 packet type by SW.
-		 */
-		if ((pkt_ol_flags & (PKT_RX_IPV4_HDR | PKT_RX_IPV4_HDR_EXT |
-				PKT_RX_IPV6_HDR | PKT_RX_IPV6_HDR_EXT)) == 0) {
-			if (eth_type == ETHER_TYPE_IPv4)
-				pkt_ol_flags |= PKT_RX_IPV4_HDR;
-			else if (eth_type == ETHER_TYPE_IPv6)
-				pkt_ol_flags |= PKT_RX_IPV6_HDR;
-		}
+		/* step 2: change all source IPs (v4 or v6) so we need
+		 * to recompute the chksums even if they were correct */
 
-		/*
-		 * Simplify the protocol parsing
-		 * Assuming the incoming packets format as
-		 *      Ethernet2 + optional single VLAN
-		 *      + ipv4 or ipv6
-		 *      + udp or tcp or sctp or others
-		 */
-		if (pkt_ol_flags & (PKT_RX_IPV4_HDR | PKT_RX_TUNNEL_IPV4_HDR)) {
+		change_ip_addresses(l3_hdr, ethertype);
+		if (tunnel == 1)
+			change_ip_addresses(outer_l3_hdr, outer_ethertype);
 
-			/* Do not support ipv4 option field */
-			l3_len = sizeof(struct ipv4_hdr) ;
+		/* step 3: depending on user command line configuration,
+		 * recompute checksum either in software or flag the
+		 * mbuf to offload the calculation to the NIC */
 
-			ipv4_hdr = (struct ipv4_hdr *) (rte_pktmbuf_mtod(mb,
-					unsigned char *) + l2_len);
+		/* process checksums of inner headers first */
+		ol_flags |= process_inner_cksums(l3_hdr, ethertype,
+			l3_len, l4_proto, testpmd_ol_flags);
 
-			l4_proto = ipv4_hdr->next_proto_id;
+		/* Then process outer headers if any. Note that the software
+		 * checksum will be wrong if one of the inner checksums is
+		 * processed in hardware. */
+		if (tunnel == 1) {
+			ol_flags |= process_outer_cksums(outer_l3_hdr,
+				outer_ethertype, outer_l3_len, testpmd_ol_flags);
+		}
 
-			/* Do not delete, this is required by HW*/
-			ipv4_hdr->hdr_checksum = 0;
+		/* step 4: fill the mbuf meta data (flags and header lengths) */
 
-			if (tx_ol_flags & TESTPMD_TX_OFFLOAD_IP_CKSUM) {
-				/* HW checksum */
-				ol_flags |= PKT_TX_IP_CKSUM;
+		if (tunnel == 1) {
+			if (testpmd_ol_flags & TESTPMD_TX_OFFLOAD_VXLAN_CKSUM) {
+				m->l2_len = outer_l2_len;
+				m->l3_len = outer_l3_len;
+				m->inner_l2_len = l2_len;
+				m->inner_l3_len = l3_len;
 			}
 			else {
-				ol_flags |= PKT_TX_IPV4;
-				/* SW checksum calculation */
-				ipv4_hdr->src_addr++;
-				ipv4_hdr->hdr_checksum = get_ipv4_cksum(ipv4_hdr);
+				/* if we don't do vxlan cksum in hw,
+				   outer checksum will be wrong because
+				   we changed the ip, but it shows that
+				   we can process the inner header cksum
+				   in the nic */
+				m->l2_len = outer_l2_len + outer_l3_len +
+					sizeof(struct udp_hdr) +
+					sizeof(struct vxlan_hdr) + l2_len;
+				m->l3_len = l3_len;
 			}
-
-			if (l4_proto == IPPROTO_UDP) {
-				udp_hdr = (struct udp_hdr*) (rte_pktmbuf_mtod(mb,
-						unsigned char *) + l2_len + l3_len);
-				if (tx_ol_flags & TESTPMD_TX_OFFLOAD_UDP_CKSUM) {
-					/* HW Offload */
-					ol_flags |= PKT_TX_UDP_CKSUM;
-					if (ipv4_tunnel)
-						udp_hdr->dgram_cksum = 0;
-					else
-						/* Pseudo header sum need be set properly */
-						udp_hdr->dgram_cksum =
-							get_ipv4_psd_sum(ipv4_hdr);
-				}
-				else {
-					/* SW Implementation, clear checksum field first */
-					udp_hdr->dgram_cksum = 0;
-					udp_hdr->dgram_cksum = get_ipv4_udptcp_checksum(ipv4_hdr,
-									(uint16_t *)udp_hdr);
-				}
-
-				if (ipv4_tunnel) {
-
-					uint16_t len;
-
-					/* Check if inner L3/L4 checkum flag is set */
-					if (tx_ol_flags & TESTPMD_TX_OFFLOAD_INNER_CKSUM_MASK)
-						ol_flags |= PKT_TX_VXLAN_CKSUM;
-
-					inner_l2_len  = sizeof(struct ether_hdr);
-					inner_eth_hdr = (struct ether_hdr *) (rte_pktmbuf_mtod(mb,
-								unsigned char *) + l2_len + l3_len
-								 + ETHER_VXLAN_HLEN);
-
-					eth_type = rte_be_to_cpu_16(inner_eth_hdr->ether_type);
-					if (eth_type == ETHER_TYPE_VLAN) {
-						inner_l2_len += sizeof(struct vlan_hdr);
-						eth_type = rte_be_to_cpu_16(*(uint16_t *)
-							((uintptr_t)&eth_hdr->ether_type +
-								sizeof(struct vlan_hdr)));
-					}
-
-					len = l2_len + l3_len + ETHER_VXLAN_HLEN + inner_l2_len;
-					if (eth_type == ETHER_TYPE_IPv4) {
-						inner_l3_len = sizeof(struct ipv4_hdr);
-						inner_ipv4_hdr = (struct ipv4_hdr *) (rte_pktmbuf_mtod(mb,
-								unsigned char *) + len);
-						inner_l4_proto = inner_ipv4_hdr->next_proto_id;
-
-						if (tx_ol_flags & TESTPMD_TX_OFFLOAD_INNER_IP_CKSUM) {
-
-							/* Do not delete, this is required by HW*/
-							inner_ipv4_hdr->hdr_checksum = 0;
-							ol_flags |= PKT_TX_IPV4_CSUM;
-						}
-
-					} else if (eth_type == ETHER_TYPE_IPv6) {
-						inner_l3_len = sizeof(struct ipv6_hdr);
-						inner_ipv6_hdr = (struct ipv6_hdr *) (rte_pktmbuf_mtod(mb,
-								unsigned char *) + len);
-						inner_l4_proto = inner_ipv6_hdr->proto;
-					}
-					if ((inner_l4_proto == IPPROTO_UDP) &&
-						(tx_ol_flags & TESTPMD_TX_OFFLOAD_INNER_UDP_CKSUM)) {
-
-						/* HW Offload */
-						ol_flags |= PKT_TX_UDP_CKSUM;
-						inner_udp_hdr = (struct udp_hdr *) (rte_pktmbuf_mtod(mb,
-								unsigned char *) + len + inner_l3_len);
-						if (eth_type == ETHER_TYPE_IPv4)
-							inner_udp_hdr->dgram_cksum = get_ipv4_psd_sum(inner_ipv4_hdr);
-						else if (eth_type == ETHER_TYPE_IPv6)
-							inner_udp_hdr->dgram_cksum = get_ipv6_psd_sum(inner_ipv6_hdr);
-
-					} else if ((inner_l4_proto == IPPROTO_TCP) &&
-						(tx_ol_flags & TESTPMD_TX_OFFLOAD_INNER_TCP_CKSUM)) {
-						/* HW Offload */
-						ol_flags |= PKT_TX_TCP_CKSUM;
-						inner_tcp_hdr = (struct tcp_hdr *) (rte_pktmbuf_mtod(mb,
-								unsigned char *) + len + inner_l3_len);
-						if (eth_type == ETHER_TYPE_IPv4)
-							inner_tcp_hdr->cksum = get_ipv4_psd_sum(inner_ipv4_hdr);
-						else if (eth_type == ETHER_TYPE_IPv6)
-							inner_tcp_hdr->cksum = get_ipv6_psd_sum(inner_ipv6_hdr);
-					} else if ((inner_l4_proto == IPPROTO_SCTP) &&
-						(tx_ol_flags & TESTPMD_TX_OFFLOAD_INNER_SCTP_CKSUM)) {
-						/* HW Offload */
-						ol_flags |= PKT_TX_SCTP_CKSUM;
-						inner_sctp_hdr = (struct sctp_hdr *) (rte_pktmbuf_mtod(mb,
-								unsigned char *) + len + inner_l3_len);
-						inner_sctp_hdr->cksum = 0;
-					}
-
-				}
-
-			} else if (l4_proto == IPPROTO_TCP) {
-				tcp_hdr = (struct tcp_hdr*) (rte_pktmbuf_mtod(mb,
-						unsigned char *) + l2_len + l3_len);
-				if (tx_ol_flags & TESTPMD_TX_OFFLOAD_TCP_CKSUM) {
-					ol_flags |= PKT_TX_TCP_CKSUM;
-					tcp_hdr->cksum = get_ipv4_psd_sum(ipv4_hdr);
-				}
-				else {
-					tcp_hdr->cksum = 0;
-					tcp_hdr->cksum = get_ipv4_udptcp_checksum(ipv4_hdr,
-							(uint16_t*)tcp_hdr);
-				}
-			} else if (l4_proto == IPPROTO_SCTP) {
-				sctp_hdr = (struct sctp_hdr*) (rte_pktmbuf_mtod(mb,
-						unsigned char *) + l2_len + l3_len);
-
-				if (tx_ol_flags & TESTPMD_TX_OFFLOAD_SCTP_CKSUM) {
-					ol_flags |= PKT_TX_SCTP_CKSUM;
-					sctp_hdr->cksum = 0;
-
-					/* Sanity check, only number of 4 bytes supported */
-					if ((rte_be_to_cpu_16(ipv4_hdr->total_length) % 4) != 0)
-						printf("sctp payload must be a multiple "
-							"of 4 bytes for checksum offload");
-				}
-				else {
-					sctp_hdr->cksum = 0;
-					/* CRC32c sample code available in RFC3309 */
-				}
-			}
-			/* End of L4 Handling*/
-		} else if (pkt_ol_flags & (PKT_RX_IPV6_HDR | PKT_RX_TUNNEL_IPV6_HDR)) {
-			ipv6_hdr = (struct ipv6_hdr *) (rte_pktmbuf_mtod(mb,
-					unsigned char *) + l2_len);
-			l3_len = sizeof(struct ipv6_hdr) ;
-			l4_proto = ipv6_hdr->proto;
-			ol_flags |= PKT_TX_IPV6;
-
-			if (l4_proto == IPPROTO_UDP) {
-				udp_hdr = (struct udp_hdr*) (rte_pktmbuf_mtod(mb,
-						unsigned char *) + l2_len + l3_len);
-				if (tx_ol_flags & TESTPMD_TX_OFFLOAD_UDP_CKSUM) {
-					/* HW Offload */
-					ol_flags |= PKT_TX_UDP_CKSUM;
-					if (ipv6_tunnel)
-						udp_hdr->dgram_cksum = 0;
-					else
-						udp_hdr->dgram_cksum =
-							get_ipv6_psd_sum(ipv6_hdr);
-				}
-				else {
-					/* SW Implementation */
-					/* checksum field need be clear first */
-					udp_hdr->dgram_cksum = 0;
-					udp_hdr->dgram_cksum = get_ipv6_udptcp_checksum(ipv6_hdr,
-								(uint16_t *)udp_hdr);
-				}
-
-				if (ipv6_tunnel) {
-
-					uint16_t len;
-
-					/* Check if inner L3/L4 checksum flag is set */
-					if (tx_ol_flags & TESTPMD_TX_OFFLOAD_INNER_CKSUM_MASK)
-						ol_flags |= PKT_TX_VXLAN_CKSUM;
-
-					inner_l2_len  = sizeof(struct ether_hdr);
-					inner_eth_hdr = (struct ether_hdr *) (rte_pktmbuf_mtod(mb,
-						unsigned char *) + l2_len + l3_len + ETHER_VXLAN_HLEN);
-					eth_type = rte_be_to_cpu_16(inner_eth_hdr->ether_type);
-
-					if (eth_type == ETHER_TYPE_VLAN) {
-						inner_l2_len += sizeof(struct vlan_hdr);
-						eth_type = rte_be_to_cpu_16(*(uint16_t *)
-							((uintptr_t)&eth_hdr->ether_type +
-							sizeof(struct vlan_hdr)));
-					}
-
-					len = l2_len + l3_len + ETHER_VXLAN_HLEN + inner_l2_len;
-
-					if (eth_type == ETHER_TYPE_IPv4) {
-						inner_l3_len = sizeof(struct ipv4_hdr);
-						inner_ipv4_hdr = (struct ipv4_hdr *) (rte_pktmbuf_mtod(mb,
-								unsigned char *) + len);
-						inner_l4_proto = inner_ipv4_hdr->next_proto_id;
-
-						/* HW offload */
-						if (tx_ol_flags & TESTPMD_TX_OFFLOAD_INNER_IP_CKSUM) {
-
-							/* Do not delete, this is required by HW*/
-							inner_ipv4_hdr->hdr_checksum = 0;
-							ol_flags |= PKT_TX_IPV4_CSUM;
-						}
-					} else if (eth_type == ETHER_TYPE_IPv6) {
-						inner_l3_len = sizeof(struct ipv6_hdr);
-						inner_ipv6_hdr = (struct ipv6_hdr *) (rte_pktmbuf_mtod(mb,
-							unsigned char *) + len);
-						inner_l4_proto = inner_ipv6_hdr->proto;
-					}
-
-					if ((inner_l4_proto == IPPROTO_UDP) &&
-						(tx_ol_flags & TESTPMD_TX_OFFLOAD_INNER_UDP_CKSUM)) {
-						inner_udp_hdr = (struct udp_hdr *) (rte_pktmbuf_mtod(mb,
-							unsigned char *) + len + inner_l3_len);
-						/* HW offload */
-						ol_flags |= PKT_TX_UDP_CKSUM;
-						inner_udp_hdr->dgram_cksum = 0;
-						if (eth_type == ETHER_TYPE_IPv4)
-							inner_udp_hdr->dgram_cksum = get_ipv4_psd_sum(inner_ipv4_hdr);
-						else if (eth_type == ETHER_TYPE_IPv6)
-							inner_udp_hdr->dgram_cksum = get_ipv6_psd_sum(inner_ipv6_hdr);
-					} else if ((inner_l4_proto == IPPROTO_TCP) &&
-						(tx_ol_flags & TESTPMD_TX_OFFLOAD_INNER_TCP_CKSUM)) {
-						/* HW offload */
-						ol_flags |= PKT_TX_TCP_CKSUM;
-						inner_tcp_hdr = (struct tcp_hdr *) (rte_pktmbuf_mtod(mb,
-								unsigned char *) + len + inner_l3_len);
-
-						if (eth_type == ETHER_TYPE_IPv4)
-							inner_tcp_hdr->cksum = get_ipv4_psd_sum(inner_ipv4_hdr);
-						else if (eth_type == ETHER_TYPE_IPv6)
-							inner_tcp_hdr->cksum = get_ipv6_psd_sum(inner_ipv6_hdr);
-
-					} else if ((inner_l4_proto == IPPROTO_SCTP) &&
-						(tx_ol_flags & TESTPMD_TX_OFFLOAD_INNER_SCTP_CKSUM)) {
-						/* HW offload */
-						ol_flags |= PKT_TX_SCTP_CKSUM;
-						inner_sctp_hdr = (struct sctp_hdr *) (rte_pktmbuf_mtod(mb,
-								unsigned char *) + len + inner_l3_len);
-						inner_sctp_hdr->cksum = 0;
-					}
-
-				}
-
-			}
-			else if (l4_proto == IPPROTO_TCP) {
-				tcp_hdr = (struct tcp_hdr*) (rte_pktmbuf_mtod(mb,
-						unsigned char *) + l2_len + l3_len);
-				if (tx_ol_flags & TESTPMD_TX_OFFLOAD_TCP_CKSUM) {
-					ol_flags |= PKT_TX_TCP_CKSUM;
-					tcp_hdr->cksum = get_ipv6_psd_sum(ipv6_hdr);
-				}
-				else {
-					tcp_hdr->cksum = 0;
-					tcp_hdr->cksum = get_ipv6_udptcp_checksum(ipv6_hdr,
-							(uint16_t*)tcp_hdr);
-				}
-			}
-			else if (l4_proto == IPPROTO_SCTP) {
-				sctp_hdr = (struct sctp_hdr*) (rte_pktmbuf_mtod(mb,
-						unsigned char *) + l2_len + l3_len);
-
-				if (tx_ol_flags & TESTPMD_TX_OFFLOAD_SCTP_CKSUM) {
-					ol_flags |= PKT_TX_SCTP_CKSUM;
-					sctp_hdr->cksum = 0;
-					/* Sanity check, only number of 4 bytes supported by HW */
-					if ((rte_be_to_cpu_16(ipv6_hdr->payload_len) % 4) != 0)
-						printf("sctp payload must be a multiple "
-							"of 4 bytes for checksum offload");
-				}
-				else {
-					/* CRC32c sample code available in RFC3309 */
-					sctp_hdr->cksum = 0;
-				}
-			} else {
-				printf("Test flow control for 1G PMD \n");
-			}
-			/* End of L6 Handling*/
-		}
-		else {
-			l3_len = 0;
-			printf("Unhandled packet type: %#hx\n", eth_type);
+		} else {
+			/* this is only useful if an offload flag is
+			 * set, but it does not hurt to fill it in any
+			 * case */
+			m->l2_len = l2_len;
+			m->l3_len = l3_len;
 		}
+		m->ol_flags = ol_flags;
 
-		/* Combine the packet header write. VLAN is not consider here */
-		mb->l2_len = l2_len;
-		mb->l3_len = l3_len;
-		mb->inner_l2_len = inner_l2_len;
-		mb->inner_l3_len = inner_l3_len;
-		mb->ol_flags = ol_flags;
 	}
 	nb_tx = rte_eth_tx_burst(fs->tx_port, fs->tx_queue, pkts_burst, nb_rx);
 	fs->tx_packets += nb_tx;
@@ -629,7 +570,6 @@ pkt_burst_checksum_forward(struct fwd_stream *fs)
 #endif
 }
 
-
 struct fwd_engine csum_fwd_engine = {
 	.fwd_mode_name  = "csum",
 	.port_fwd_begin = NULL,
diff --git a/app/test-pmd/testpmd.h b/app/test-pmd/testpmd.h
index 82af2bd..c753d37 100644
--- a/app/test-pmd/testpmd.h
+++ b/app/test-pmd/testpmd.h
@@ -131,18 +131,11 @@ struct fwd_stream {
 #define TESTPMD_TX_OFFLOAD_TCP_CKSUM         0x0004
 /** Offload SCTP checksum in csum forward engine */
 #define TESTPMD_TX_OFFLOAD_SCTP_CKSUM        0x0008
-/** Offload inner IP checksum in csum forward engine */
-#define TESTPMD_TX_OFFLOAD_INNER_IP_CKSUM    0x0010
-/** Offload inner UDP checksum in csum forward engine */
-#define TESTPMD_TX_OFFLOAD_INNER_UDP_CKSUM   0x0020
-/** Offload inner TCP checksum in csum forward engine */
-#define TESTPMD_TX_OFFLOAD_INNER_TCP_CKSUM   0x0040
-/** Offload inner SCTP checksum in csum forward engine */
-#define TESTPMD_TX_OFFLOAD_INNER_SCTP_CKSUM  0x0080
-/** Offload inner IP checksum mask */
-#define TESTPMD_TX_OFFLOAD_INNER_CKSUM_MASK  0x00F0
+/** Offload VxLAN checksum in csum forward engine */
+#define TESTPMD_TX_OFFLOAD_VXLAN_CKSUM       0x0010
 /** Insert VLAN header in forward engine */
-#define TESTPMD_TX_OFFLOAD_INSERT_VLAN       0x0100
+#define TESTPMD_TX_OFFLOAD_INSERT_VLAN       0x0020
+
 /**
  * The data structure associated with each port.
  */
@@ -510,8 +503,6 @@ void tx_vlan_pvid_set(portid_t port_id, uint16_t vlan_id, int on);
 
 void set_qmap(portid_t port_id, uint8_t is_rx, uint16_t queue_id, uint8_t map_value);
 
-void tx_cksum_set(portid_t port_id, uint64_t ol_flags);
-
 void set_verbose_level(uint16_t vb_level);
 void set_tx_pkt_segments(unsigned *seg_lengths, unsigned nb_segs);
 void set_nb_pkt_per_burst(uint16_t pkt_burst);
-- 
2.1.0


* [dpdk-dev] [PATCH v2 09/13] mbuf: introduce new checksum API
  2014-11-14 17:03 ` [dpdk-dev] [PATCH v2 00/13] " Olivier Matz
                     ` (7 preceding siblings ...)
  2014-11-14 17:03   ` [dpdk-dev] [PATCH v2 08/13] testpmd: rework csum forward engine Olivier Matz
@ 2014-11-14 17:03   ` Olivier Matz
  2014-11-17 18:15     ` Ananyev, Konstantin
  2014-11-14 17:03   ` [dpdk-dev] [PATCH v2 10/13] mbuf: generic support for TCP segmentation offload Olivier Matz
                     ` (4 subsequent siblings)
  13 siblings, 1 reply; 112+ messages in thread
From: Olivier Matz @ 2014-11-14 17:03 UTC (permalink / raw)
  To: dev; +Cc: jigsaw

Introduce new functions to calculate checksums. These new functions
are derived from the ones provided in csumonly.c, slightly reworked.
There is still some room for future optimization of these functions
(maybe SSE/AVX, ...).

This API will be modified in the next commits by the introduction of
TSO that requires a different pseudo header checksum to be set in the
packet.
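
As an illustration of what these helpers compute, here is a minimal
standalone sketch of the non-complemented 16-bit sum and of the IPv4
header checksum built on top of it. It does not use DPDK: the demo_*
names and the sample header are made up for the example, and the words
are read with memcpy() instead of a cast, which is an assumption of the
sketch rather than what the patch does.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Non-complemented 16-bit one's-complement sum over a byte buffer.
 * Words are read in memory order, so the result is already in network
 * order once it is stored back into a packet. (Hypothetical helper.) */
static uint16_t demo_raw_cksum(const void *buf, size_t len)
{
	const uint8_t *p = buf;
	uint32_t sum = 0;
	uint16_t w;

	while (len >= 2) {
		memcpy(&w, p, sizeof(w)); /* avoids unaligned access */
		sum += w;
		p += 2;
		len -= 2;
	}
	if (len == 1) {
		w = 0;
		memcpy(&w, p, 1); /* odd trailing byte, zero-padded */
		sum += w;
	}
	/* fold the carries back in; the second fold absorbs a new carry */
	sum = (sum >> 16) + (sum & 0xffff);
	sum = (sum >> 16) + (sum & 0xffff);
	return (uint16_t)sum;
}

/* Complemented checksum of a 20-byte IPv4 header whose checksum field
 * has been set to 0 by the caller. */
static uint16_t demo_ipv4_cksum(const uint8_t hdr[20])
{
	uint16_t cksum = demo_raw_cksum(hdr, 20);

	return (uint16_t)((cksum == 0xffff) ? cksum : ~cksum);
}

/* Returns 1 if the checksum written into a sample header makes the
 * header verify, i.e. its raw sum becomes 0xffff. */
static int demo_ipv4_cksum_selfcheck(void)
{
	uint8_t hdr[20] = {
		0x45, 0x00, 0x00, 0x73, 0x00, 0x00, 0x40, 0x00,
		0x40, 0x11, 0x00, 0x00, /* checksum field zeroed */
		0xc0, 0xa8, 0x00, 0x01, 0xc0, 0xa8, 0x00, 0xc7
	};
	uint16_t cksum = demo_ipv4_cksum(hdr);

	memcpy(&hdr[10], &cksum, sizeof(cksum));
	return hdr[10] == 0xb8 && hdr[11] == 0x61 &&
		demo_raw_cksum(hdr, 20) == 0xffff;
}
```

Calling demo_ipv4_cksum_selfcheck() from a test program is enough to
exercise both functions; the result is independent of host byte order
because the sum is never interpreted as a host integer.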

Signed-off-by: Olivier Matz <olivier.matz@6wind.com>
---
 app/test-pmd/csumonly.c    | 133 ++-------------------------------
 lib/librte_mbuf/rte_mbuf.h |   3 +-
 lib/librte_net/rte_ip.h    | 179 +++++++++++++++++++++++++++++++++++++++++++++
 3 files changed, 189 insertions(+), 126 deletions(-)

diff --git a/app/test-pmd/csumonly.c b/app/test-pmd/csumonly.c
index dda5d9e..39f974d 100644
--- a/app/test-pmd/csumonly.c
+++ b/app/test-pmd/csumonly.c
@@ -86,137 +86,22 @@
 #define _htons(x) (x)
 #endif
 
-static inline uint16_t
-get_16b_sum(uint16_t *ptr16, uint32_t nr)
-{
-	uint32_t sum = 0;
-	while (nr > 1)
-	{
-		sum +=*ptr16;
-		nr -= sizeof(uint16_t);
-		ptr16++;
-		if (sum > UINT16_MAX)
-			sum -= UINT16_MAX;
-	}
-
-	/* If length is in odd bytes */
-	if (nr)
-		sum += *((uint8_t*)ptr16);
-
-	sum = ((sum & 0xffff0000) >> 16) + (sum & 0xffff);
-	sum &= 0x0ffff;
-	return (uint16_t)sum;
-}
-
-static inline uint16_t
-get_ipv4_cksum(struct ipv4_hdr *ipv4_hdr)
-{
-	uint16_t cksum;
-	cksum = get_16b_sum((uint16_t*)ipv4_hdr, sizeof(struct ipv4_hdr));
-	return (uint16_t)((cksum == 0xffff)?cksum:~cksum);
-}
-
-
-static inline uint16_t
-get_ipv4_psd_sum(struct ipv4_hdr *ip_hdr)
-{
-	/* Pseudo Header for IPv4/UDP/TCP checksum */
-	union ipv4_psd_header {
-		struct {
-			uint32_t src_addr; /* IP address of source host. */
-			uint32_t dst_addr; /* IP address of destination host(s). */
-			uint8_t  zero;     /* zero. */
-			uint8_t  proto;    /* L4 protocol type. */
-			uint16_t len;      /* L4 length. */
-		} __attribute__((__packed__));
-		uint16_t u16_arr[0];
-	} psd_hdr;
-
-	psd_hdr.src_addr = ip_hdr->src_addr;
-	psd_hdr.dst_addr = ip_hdr->dst_addr;
-	psd_hdr.zero     = 0;
-	psd_hdr.proto    = ip_hdr->next_proto_id;
-	psd_hdr.len      = rte_cpu_to_be_16((uint16_t)(rte_be_to_cpu_16(ip_hdr->total_length)
-				- sizeof(struct ipv4_hdr)));
-	return get_16b_sum(psd_hdr.u16_arr, sizeof(psd_hdr));
-}
-
-static inline uint16_t
-get_ipv6_psd_sum(struct ipv6_hdr *ip_hdr)
-{
-	/* Pseudo Header for IPv6/UDP/TCP checksum */
-	union ipv6_psd_header {
-		struct {
-			uint8_t src_addr[16]; /* IP address of source host. */
-			uint8_t dst_addr[16]; /* IP address of destination host(s). */
-			uint32_t len;         /* L4 length. */
-			uint32_t proto;       /* L4 protocol - top 3 bytes must be zero */
-		} __attribute__((__packed__));
-
-		uint16_t u16_arr[0]; /* allow use as 16-bit values with safe aliasing */
-	} psd_hdr;
-
-	rte_memcpy(&psd_hdr.src_addr, ip_hdr->src_addr,
-			sizeof(ip_hdr->src_addr) + sizeof(ip_hdr->dst_addr));
-	psd_hdr.len       = ip_hdr->payload_len;
-	psd_hdr.proto     = (ip_hdr->proto << 24);
-
-	return get_16b_sum(psd_hdr.u16_arr, sizeof(psd_hdr));
-}
-
 static uint16_t
 get_psd_sum(void *l3_hdr, uint16_t ethertype)
 {
 	if (ethertype == _htons(ETHER_TYPE_IPv4))
-		return get_ipv4_psd_sum(l3_hdr);
+		return rte_ipv4_phdr_cksum(l3_hdr);
 	else /* assume ethertype == ETHER_TYPE_IPv6 */
-		return get_ipv6_psd_sum(l3_hdr);
-}
-
-static inline uint16_t
-get_ipv4_udptcp_checksum(struct ipv4_hdr *ipv4_hdr, uint16_t *l4_hdr)
-{
-	uint32_t cksum;
-	uint32_t l4_len;
-
-	l4_len = rte_be_to_cpu_16(ipv4_hdr->total_length) - sizeof(struct ipv4_hdr);
-
-	cksum = get_16b_sum(l4_hdr, l4_len);
-	cksum += get_ipv4_psd_sum(ipv4_hdr);
-
-	cksum = ((cksum & 0xffff0000) >> 16) + (cksum & 0xffff);
-	cksum = (~cksum) & 0xffff;
-	if (cksum == 0)
-		cksum = 0xffff;
-	return (uint16_t)cksum;
-}
-
-static inline uint16_t
-get_ipv6_udptcp_checksum(struct ipv6_hdr *ipv6_hdr, uint16_t *l4_hdr)
-{
-	uint32_t cksum;
-	uint32_t l4_len;
-
-	l4_len = rte_be_to_cpu_16(ipv6_hdr->payload_len);
-
-	cksum = get_16b_sum(l4_hdr, l4_len);
-	cksum += get_ipv6_psd_sum(ipv6_hdr);
-
-	cksum = ((cksum & 0xffff0000) >> 16) + (cksum & 0xffff);
-	cksum = (~cksum) & 0xffff;
-	if (cksum == 0)
-		cksum = 0xffff;
-
-	return (uint16_t)cksum;
+		return rte_ipv6_phdr_cksum(l3_hdr);
 }
 
 static uint16_t
 get_udptcp_checksum(void *l3_hdr, void *l4_hdr, uint16_t ethertype)
 {
 	if (ethertype == _htons(ETHER_TYPE_IPv4))
-		return get_ipv4_udptcp_checksum(l3_hdr, l4_hdr);
+		return rte_ipv4_udptcp_cksum(l3_hdr, l4_hdr);
 	else /* assume ethertype == ETHER_TYPE_IPv6 */
-		return get_ipv6_udptcp_checksum(l3_hdr, l4_hdr);
+		return rte_ipv6_udptcp_cksum(l3_hdr, l4_hdr);
 }
 
 /*
@@ -294,7 +179,7 @@ process_inner_cksums(void *l3_hdr, uint16_t ethertype, uint16_t l3_len,
 		if (testpmd_ol_flags & TESTPMD_TX_OFFLOAD_IP_CKSUM)
 			ol_flags |= PKT_TX_IP_CKSUM;
 		else
-			ipv4_hdr->hdr_checksum = get_ipv4_cksum(ipv4_hdr);
+			ipv4_hdr->hdr_checksum = rte_ipv4_cksum(ipv4_hdr);
 
 	}
 	else if (ethertype != _htons(ETHER_TYPE_IPv6))
@@ -366,7 +251,7 @@ process_outer_cksums(void *outer_l3_hdr, uint16_t outer_ethertype,
 		ipv4_hdr->hdr_checksum = 0;
 
 		if ((testpmd_ol_flags & TESTPMD_TX_OFFLOAD_VXLAN_CKSUM) == 0)
-			ipv4_hdr->hdr_checksum = get_ipv4_cksum(ipv4_hdr);
+			ipv4_hdr->hdr_checksum = rte_ipv4_cksum(ipv4_hdr);
 	}
 
 	udp_hdr = (struct udp_hdr *)((char *)outer_l3_hdr + outer_l3_len);
@@ -376,12 +261,10 @@ process_outer_cksums(void *outer_l3_hdr, uint16_t outer_ethertype,
 		if ((testpmd_ol_flags & TESTPMD_TX_OFFLOAD_VXLAN_CKSUM) == 0) {
 			if (outer_ethertype == _htons(ETHER_TYPE_IPv4))
 				udp_hdr->dgram_cksum =
-					get_ipv4_udptcp_checksum(ipv4_hdr,
-						(uint16_t *)udp_hdr);
+					rte_ipv4_udptcp_cksum(ipv4_hdr, udp_hdr);
 			else
 				udp_hdr->dgram_cksum =
-					get_ipv6_udptcp_checksum(ipv6_hdr,
-						(uint16_t *)udp_hdr);
+					rte_ipv6_udptcp_cksum(ipv6_hdr, udp_hdr);
 		}
 	}
 
diff --git a/lib/librte_mbuf/rte_mbuf.h b/lib/librte_mbuf/rte_mbuf.h
index e76617f..3c8e825 100644
--- a/lib/librte_mbuf/rte_mbuf.h
+++ b/lib/librte_mbuf/rte_mbuf.h
@@ -114,7 +114,8 @@ extern "C" {
  *  - fill l2_len and l3_len in mbuf
  *  - set the flags PKT_TX_TCP_CKSUM, PKT_TX_SCTP_CKSUM or PKT_TX_UDP_CKSUM
  *  - calculate the pseudo header checksum and set it in the L4 header (only
- *    for TCP or UDP). For SCTP, set the crc field to 0.
+ *    for TCP or UDP). See rte_ipv4_phdr_cksum() and rte_ipv6_phdr_cksum().
+ *    For SCTP, set the crc field to 0.
  */
 #define PKT_TX_L4_NO_CKSUM   (0ULL << 52) /* Disable L4 cksum of TX pkt. */
 #define PKT_TX_TCP_CKSUM     (1ULL << 52) /**< TCP cksum of TX pkt. computed by NIC. */
diff --git a/lib/librte_net/rte_ip.h b/lib/librte_net/rte_ip.h
index e3f65c1..9cfca7f 100644
--- a/lib/librte_net/rte_ip.h
+++ b/lib/librte_net/rte_ip.h
@@ -78,6 +78,9 @@
 
 #include <stdint.h>
 
+#include <rte_memcpy.h>
+#include <rte_byteorder.h>
+
 #ifdef __cplusplus
 extern "C" {
 #endif
@@ -247,6 +250,124 @@ struct ipv4_hdr {
 	((x) >= IPV4_MIN_MCAST && (x) <= IPV4_MAX_MCAST) /**< check if IPv4 address is multicast */
 
 /**
+ * Process the non-complemented checksum of a buffer.
+ *
+ * @param buf
+ *   Pointer to the buffer.
+ * @param len
+ *   Length of the buffer.
+ * @return
+ *   The non-complemented checksum.
+ */
+static inline uint16_t
+rte_raw_cksum(const char *buf, size_t len)
+{
+	const uint16_t *u16 = (const uint16_t *)buf;
+	uint32_t sum = 0;
+
+	while (len >= 8) {
+		sum += u16[0]; sum += u16[1]; sum += u16[2]; sum += u16[3];
+		len -= 8;
+		u16 += 4;
+	}
+	while (len >= 2) {
+		sum += *u16;
+		len -= 2;
+		u16 += 1;
+	}
+
+	/* if length is in odd bytes */
+	if (len == 1)
+		sum += *((const uint8_t *)u16);
+
+	sum = ((sum & 0xffff0000) >> 16) + (sum & 0xffff);
+	sum = ((sum & 0xffff0000) >> 16) + (sum & 0xffff);
+	return (uint16_t)sum;
+}
+
+/**
+ * Process the IPv4 checksum of an IPv4 header.
+ *
+ * The checksum field must be set to 0 by the caller.
+ *
+ * @param ipv4_hdr
+ *   The pointer to the contiguous IPv4 header.
+ * @return
+ *   The complemented checksum to set in the IP packet.
+ */
+static inline uint16_t
+rte_ipv4_cksum(const struct ipv4_hdr *ipv4_hdr)
+{
+	uint16_t cksum;
+	cksum = rte_raw_cksum((const char *)ipv4_hdr, sizeof(struct ipv4_hdr));
+	return ((cksum == 0xffff) ? cksum : ~cksum);
+}
+
+/**
+ * Process the pseudo-header checksum of an IPv4 header.
+ *
+ * The checksum field must be set to 0 by the caller.
+ *
+ * @param ipv4_hdr
+ *   The pointer to the contiguous IPv4 header.
+ * @return
+ *   The non-complemented checksum to set in the L4 header.
+ */
+static inline uint16_t
+rte_ipv4_phdr_cksum(const struct ipv4_hdr *ipv4_hdr)
+{
+	struct ipv4_psd_header {
+		uint32_t src_addr; /* IP address of source host. */
+		uint32_t dst_addr; /* IP address of destination host. */
+		uint8_t  zero;     /* zero. */
+		uint8_t  proto;    /* L4 protocol type. */
+		uint16_t len;      /* L4 length. */
+	} psd_hdr;
+
+	psd_hdr.src_addr = ipv4_hdr->src_addr;
+	psd_hdr.dst_addr = ipv4_hdr->dst_addr;
+	psd_hdr.zero = 0;
+	psd_hdr.proto = ipv4_hdr->next_proto_id;
+	psd_hdr.len = rte_cpu_to_be_16(
+		(uint16_t)(rte_be_to_cpu_16(ipv4_hdr->total_length)
+			- sizeof(struct ipv4_hdr)));
+	return rte_raw_cksum((const char *)&psd_hdr, sizeof(psd_hdr));
+}
+
+/**
+ * Process the IPv4 UDP or TCP checksum.
+ *
+ * The IPv4 header should not contain options. The IP and layer 4
+ * checksum must be set to 0 in the packet by the caller.
+ *
+ * @param ipv4_hdr
+ *   The pointer to the contiguous IPv4 header.
+ * @param l4_hdr
+ *   The pointer to the beginning of the L4 header.
+ * @return
+ *   The complemented checksum to set in the IP packet.
+ */
+static inline uint16_t
+rte_ipv4_udptcp_cksum(const struct ipv4_hdr *ipv4_hdr, const void *l4_hdr)
+{
+	uint32_t cksum;
+	uint32_t l4_len;
+
+	l4_len = rte_be_to_cpu_16(ipv4_hdr->total_length) -
+		sizeof(struct ipv4_hdr);
+
+	cksum = rte_raw_cksum(l4_hdr, l4_len);
+	cksum += rte_ipv4_phdr_cksum(ipv4_hdr);
+
+	cksum = ((cksum & 0xffff0000) >> 16) + (cksum & 0xffff);
+	cksum = (~cksum) & 0xffff;
+	if (cksum == 0)
+		cksum = 0xffff;
+
+	return cksum;
+}
+
+/**
  * IPv6 Header
  */
 struct ipv6_hdr {
@@ -258,6 +379,64 @@ struct ipv6_hdr {
 	uint8_t  dst_addr[16]; /**< IP address of destination host(s). */
 } __attribute__((__packed__));
 
+/**
+ * Process the pseudo-header checksum of an IPv6 header.
+ *
+ * @param ipv6_hdr
+ *   The pointer to the contiguous IPv6 header.
+ * @return
+ *   The non-complemented checksum to set in the L4 header.
+ */
+static inline uint16_t
+rte_ipv6_phdr_cksum(const struct ipv6_hdr *ipv6_hdr)
+{
+	struct ipv6_psd_header {
+		uint8_t src_addr[16]; /* IP address of source host. */
+		uint8_t dst_addr[16]; /* IP address of destination host. */
+		uint32_t len;         /* L4 length. */
+		uint32_t proto;       /* L4 protocol - top 3 bytes must be zero */
+	} psd_hdr;
+
+	rte_memcpy(&psd_hdr.src_addr, ipv6_hdr->src_addr,
+		sizeof(ipv6_hdr->src_addr) + sizeof(ipv6_hdr->dst_addr));
+	psd_hdr.proto = (ipv6_hdr->proto << 24);
+	psd_hdr.len = ipv6_hdr->payload_len;
+
+	return rte_raw_cksum((const char *)&psd_hdr, sizeof(psd_hdr));
+}
+
+/**
+ * Process the IPv6 UDP or TCP checksum.
+ *
+ * The IPv6 header should not contain extension headers. The layer 4
+ * checksum must be set to 0 in the packet by the caller.
+ *
+ * @param ipv6_hdr
+ *   The pointer to the contiguous IPv6 header.
+ * @param l4_hdr
+ *   The pointer to the beginning of the L4 header.
+ * @return
+ *   The complemented checksum to set in the IP packet.
+ */
+static inline uint16_t
+rte_ipv6_udptcp_cksum(const struct ipv6_hdr *ipv6_hdr, const void *l4_hdr)
+{
+	uint32_t cksum;
+	uint32_t l4_len;
+
+	l4_len = rte_be_to_cpu_16(ipv6_hdr->payload_len);
+
+	cksum = rte_raw_cksum(l4_hdr, l4_len);
+	cksum += rte_ipv6_phdr_cksum(ipv6_hdr, 0);
+
+	cksum = ((cksum & 0xffff0000) >> 16) + (cksum & 0xffff);
+	cksum = (~cksum) & 0xffff;
+	if (cksum == 0)
+		cksum = 0xffff;
+
+	return cksum;
+}
+
 #ifdef __cplusplus
 }
 #endif
-- 
2.1.0


* [dpdk-dev] [PATCH v2 10/13] mbuf: generic support for TCP segmentation offload
  2014-11-14 17:03 ` [dpdk-dev] [PATCH v2 00/13] " Olivier Matz
                     ` (8 preceding siblings ...)
  2014-11-14 17:03   ` [dpdk-dev] [PATCH v2 09/13] mbuf: introduce new checksum API Olivier Matz
@ 2014-11-14 17:03   ` Olivier Matz
  2014-11-17 23:33     ` Ananyev, Konstantin
  2014-11-14 17:03   ` [dpdk-dev] [PATCH v2 11/13] ixgbe: support " Olivier Matz
                     ` (3 subsequent siblings)
  13 siblings, 1 reply; 112+ messages in thread
From: Olivier Matz @ 2014-11-14 17:03 UTC (permalink / raw)
  To: dev; +Cc: jigsaw

Some of the NICs supported by DPDK can accelerate TCP traffic by using
segmentation offload. The application prepares a packet with a valid TCP
header and a size of up to 64K, and delegates the segmentation to the
NIC.

Implement the generic part of TCP segmentation offload in rte_mbuf. It
introduces 2 new fields in rte_mbuf: l4_len (length of L4 header in bytes)
and tso_segsz (MSS of packets).
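
To illustrate why the lengths are grouped behind a single 64-bit word,
here is a minimal standalone sketch. The union below only mirrors the
first four bit-fields of the patch; the demo_* names are hypothetical,
and the exact bit layout of bit-fields is compiler-dependent, so the
sketch only relies on "all fields cleared" meaning "word equals 0".

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for the tx_offload union added to rte_mbuf. */
union demo_tx_offload {
	uint64_t data; /* all fields accessed as one word */
	struct {
		uint64_t l2_len:7;     /* L2 (MAC) header length */
		uint64_t l3_len:9;     /* L3 (IP) header length */
		uint64_t l4_len:8;     /* L4 (TCP/UDP) header length */
		uint64_t tso_segsz:16; /* TSO segment size (MSS) */
	};
};

/* Returns 1 if one store to .data resets every field at once. */
static int demo_tx_offload_reset(void)
{
	union demo_tx_offload o;

	o.data = 0;
	o.l2_len = 14;      /* Ethernet header */
	o.l3_len = 20;      /* IPv4 header without options */
	o.l4_len = 20;      /* TCP header without options */
	o.tso_segsz = 1460; /* typical MSS for a 1500-byte MTU */
	if (o.data == 0)
		return 0;

	o.data = 0; /* same effect as m->tx_offload = 0 in the patch */
	return o.l2_len == 0 && o.l3_len == 0 &&
		o.l4_len == 0 && o.tso_segsz == 0;
}
```

The same property lets a driver copy or compare the whole set of
offload lengths with a single 64-bit operation.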

To delegate the TCP segmentation to the hardware, the user has to:

- set the PKT_TX_TCP_SEG flag in mbuf->ol_flags (this flag implies
  PKT_TX_TCP_CKSUM)
- set PKT_TX_IP_CKSUM if it's IPv4, and set the IP checksum to 0 in
  the packet
- fill the mbuf offload information: l2_len, l3_len, l4_len, tso_segsz
- calculate the pseudo header checksum without taking ip_len into account,
  and set it in the TCP header, for instance by using
  rte_ipv4_phdr_cksum(ip_hdr, ol_flags)
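
The last step can be sketched without DPDK as follows. This is only an
illustration under assumptions: the demo_* names, the sample addresses
and the byte-array form of the pseudo header are made up for the
example, and DEMO_TX_TCP_SEG merely reuses the bit value that this
patch gives to PKT_TX_TCP_SEG.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define DEMO_TX_TCP_SEG (1ULL << 49) /* stand-in for PKT_TX_TCP_SEG */

/* Sample addresses 192.168.0.1 and 192.168.0.199, in network order. */
static const uint8_t demo_src[4] = { 0xc0, 0xa8, 0x00, 0x01 };
static const uint8_t demo_dst[4] = { 0xc0, 0xa8, 0x00, 0xc7 };

/* Non-complemented 16-bit one's-complement sum (hypothetical helper). */
static uint16_t demo_raw_cksum(const void *buf, size_t len)
{
	const uint8_t *p = buf;
	uint32_t sum = 0;
	uint16_t w;

	while (len >= 2) {
		memcpy(&w, p, sizeof(w));
		sum += w;
		p += 2;
		len -= 2;
	}
	if (len == 1) {
		w = 0;
		memcpy(&w, p, 1); /* odd trailing byte, zero-padded */
		sum += w;
	}
	sum = (sum >> 16) + (sum & 0xffff);
	sum = (sum >> 16) + (sum & 0xffff);
	return (uint16_t)sum;
}

/* IPv4 pseudo-header sum; the L4 length is left out when TSO is on. */
static uint16_t demo_ipv4_phdr_cksum(const uint8_t src[4],
	const uint8_t dst[4], uint8_t proto, uint16_t l4_len,
	uint64_t ol_flags)
{
	uint8_t psd[12];

	if (ol_flags & DEMO_TX_TCP_SEG)
		l4_len = 0; /* hardware adds the per-segment length */

	memcpy(&psd[0], src, 4);
	memcpy(&psd[4], dst, 4);
	psd[8] = 0;
	psd[9] = proto;
	psd[10] = (uint8_t)(l4_len >> 8); /* big-endian length */
	psd[11] = (uint8_t)(l4_len & 0xff);
	return demo_raw_cksum(psd, sizeof(psd));
}

/* Value of the checksum as it would appear on the wire. */
static uint16_t demo_wire_value(uint16_t cksum)
{
	uint8_t b[2];

	memcpy(b, &cksum, sizeof(b));
	return (uint16_t)((b[0] << 8) | b[1]);
}
```

With protocol 6 (TCP) and a 20-byte L4 length, the two variants differ
only by the length term, which is what the ol_flags argument selects.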

The API is inspired by the ixgbe hardware (the next commit adds the
support for ixgbe), but it seems generic enough to be used for other
hw/drivers in the future.

This commit also reworks the way l2_len and l3_len are used in the igb
and ixgbe drivers, since the l2_l3_len field is no longer available in
mbuf.

Signed-off-by: Mirek Walukiewicz <miroslaw.walukiewicz@intel.com>
Signed-off-by: Olivier Matz <olivier.matz@6wind.com>
---
 app/test-pmd/testpmd.c            |  2 +-
 examples/ipv4_multicast/main.c    |  2 +-
 lib/librte_mbuf/rte_mbuf.c        |  1 +
 lib/librte_mbuf/rte_mbuf.h        | 44 +++++++++++++++++++++++----------------
 lib/librte_net/rte_ip.h           | 39 +++++++++++++++++++++++++++-------
 lib/librte_pmd_e1000/igb_rxtx.c   | 11 +++++++++-
 lib/librte_pmd_ixgbe/ixgbe_rxtx.c | 11 +++++++++-
 7 files changed, 81 insertions(+), 29 deletions(-)

diff --git a/app/test-pmd/testpmd.c b/app/test-pmd/testpmd.c
index 12adafa..632a993 100644
--- a/app/test-pmd/testpmd.c
+++ b/app/test-pmd/testpmd.c
@@ -408,7 +408,7 @@ testpmd_mbuf_ctor(struct rte_mempool *mp,
 	mb->ol_flags     = 0;
 	mb->data_off     = RTE_PKTMBUF_HEADROOM;
 	mb->nb_segs      = 1;
-	mb->l2_l3_len       = 0;
+	mb->tx_offload   = 0;
 	mb->vlan_tci     = 0;
 	mb->hash.rss     = 0;
 }
diff --git a/examples/ipv4_multicast/main.c b/examples/ipv4_multicast/main.c
index 590d11a..80c5140 100644
--- a/examples/ipv4_multicast/main.c
+++ b/examples/ipv4_multicast/main.c
@@ -302,7 +302,7 @@ mcast_out_pkt(struct rte_mbuf *pkt, int use_clone)
 	/* copy metadata from source packet*/
 	hdr->port = pkt->port;
 	hdr->vlan_tci = pkt->vlan_tci;
-	hdr->l2_l3_len = pkt->l2_l3_len;
+	hdr->tx_offload = pkt->tx_offload;
 	hdr->hash = pkt->hash;
 
 	hdr->ol_flags = pkt->ol_flags;
diff --git a/lib/librte_mbuf/rte_mbuf.c b/lib/librte_mbuf/rte_mbuf.c
index 5cd9137..75295c8 100644
--- a/lib/librte_mbuf/rte_mbuf.c
+++ b/lib/librte_mbuf/rte_mbuf.c
@@ -238,6 +238,7 @@ const char *rte_get_tx_ol_flag_name(uint64_t mask)
 	case PKT_TX_UDP_CKSUM: return "PKT_TX_UDP_CKSUM";
 	case PKT_TX_IEEE1588_TMST: return "PKT_TX_IEEE1588_TMST";
 	case PKT_TX_VXLAN_CKSUM: return "PKT_TX_VXLAN_CKSUM";
+	case PKT_TX_TCP_SEG: return "PKT_TX_TCP_SEG";
 	default: return NULL;
 	}
 }
diff --git a/lib/librte_mbuf/rte_mbuf.h b/lib/librte_mbuf/rte_mbuf.h
index 3c8e825..9f44d08 100644
--- a/lib/librte_mbuf/rte_mbuf.h
+++ b/lib/librte_mbuf/rte_mbuf.h
@@ -127,6 +127,20 @@ extern "C" {
 
 #define PKT_TX_VXLAN_CKSUM   (1ULL << 50) /**< TX checksum of VXLAN computed by NIC */
 
+/**
+ * TCP segmentation offload. To enable this offload feature for a
+ * packet to be transmitted on hardware supporting TSO:
+ *  - set the PKT_TX_TCP_SEG flag in mbuf->ol_flags (this flag implies
+ *    PKT_TX_TCP_CKSUM)
+ *  - if it's IPv4, set the PKT_TX_IP_CKSUM flag and write the IP checksum
+ *    to 0 in the packet
+ *  - fill the mbuf offload information: l2_len, l3_len, l4_len, tso_segsz
+ *  - calculate the pseudo header checksum without taking ip_len into account,
+ *    and set it in the TCP header. Refer to rte_ipv4_phdr_cksum() and
+ *    rte_ipv6_phdr_cksum() that can be used as helpers.
+ */
+#define PKT_TX_TCP_SEG       (1ULL << 49)
+
 /* Use final bit of flags to indicate a control mbuf */
 #define CTRL_MBUF_FLAG       (1ULL << 63) /**< Mbuf contains control data */
 
@@ -228,22 +242,18 @@ struct rte_mbuf {
 
 	/* fields to support TX offloads */
 	union {
-		uint16_t l2_l3_len; /**< combined l2/l3 lengths as single var */
+		uint64_t tx_offload;       /**< combined for easy fetch */
 		struct {
-			uint16_t l3_len:9;      /**< L3 (IP) Header Length. */
-			uint16_t l2_len:7;      /**< L2 (MAC) Header Length. */
-		};
-	};
+			uint64_t l2_len:7; /**< L2 (MAC) Header Length. */
+			uint64_t l3_len:9; /**< L3 (IP) Header Length. */
+			uint64_t l4_len:8; /**< L4 (TCP/UDP) Header Length. */
+			uint64_t tso_segsz:16; /**< TCP TSO segment size */
 
-	/* fields for TX offloading of tunnels */
-	union {
-		uint16_t inner_l2_l3_len;
-		/**< combined inner l2/l3 lengths as single var */
-		struct {
-			uint16_t inner_l3_len:9;
-			/**< inner L3 (IP) Header Length. */
-			uint16_t inner_l2_len:7;
-			/**< inner L2 (MAC) Header Length. */
+			/* fields for TX offloading of tunnels */
+			uint64_t inner_l3_len:9; /**< inner L3 (IP) Hdr Length. */
+			uint64_t inner_l2_len:7; /**< inner L2 (MAC) Hdr Length. */
+
+			/* uint64_t unused:8; */
 		};
 	};
 } __rte_cache_aligned;
@@ -595,8 +605,7 @@ static inline void rte_pktmbuf_reset(struct rte_mbuf *m)
 {
 	m->next = NULL;
 	m->pkt_len = 0;
-	m->l2_l3_len = 0;
-	m->inner_l2_l3_len = 0;
+	m->tx_offload = 0;
 	m->vlan_tci = 0;
 	m->nb_segs = 1;
 	m->port = 0xff;
@@ -665,8 +674,7 @@ static inline void rte_pktmbuf_attach(struct rte_mbuf *mi, struct rte_mbuf *md)
 	mi->data_len = md->data_len;
 	mi->port = md->port;
 	mi->vlan_tci = md->vlan_tci;
-	mi->l2_l3_len = md->l2_l3_len;
-	mi->inner_l2_l3_len = md->inner_l2_l3_len;
+	mi->tx_offload = md->tx_offload;
 	mi->hash = md->hash;
 
 	mi->next = NULL;
diff --git a/lib/librte_net/rte_ip.h b/lib/librte_net/rte_ip.h
index 9cfca7f..1fafa73 100644
--- a/lib/librte_net/rte_ip.h
+++ b/lib/librte_net/rte_ip.h
@@ -80,6 +80,7 @@
 
 #include <rte_memcpy.h>
 #include <rte_byteorder.h>
+#include <rte_mbuf.h>
 
 #ifdef __cplusplus
 extern "C" {
@@ -308,13 +309,21 @@ rte_ipv4_cksum(const struct ipv4_hdr *ipv4_hdr)
  *
  * The checksum field must be set to 0 by the caller.
  *
+ * Depending on the ol_flags, the pseudo-header checksum expected by the
+ * drivers is not the same. For instance, when TSO is enabled, the IP
+ * payload length must not be included in the pseudo-header checksum.
+ *
+ * When ol_flags is 0, it computes the standard pseudo-header checksum.
+ *
  * @param ipv4_hdr
  *   The pointer to the contiguous IPv4 header.
+ * @param ol_flags
+ *   The ol_flags of the associated mbuf.
  * @return
  *   The non-complemented checksum to set in the L4 header.
  */
 static inline uint16_t
-rte_ipv4_phdr_cksum(const struct ipv4_hdr *ipv4_hdr)
+rte_ipv4_phdr_cksum(const struct ipv4_hdr *ipv4_hdr, uint64_t ol_flags)
 {
 	struct ipv4_psd_header {
 		uint32_t src_addr; /* IP address of source host. */
@@ -328,9 +337,13 @@ rte_ipv4_phdr_cksum(const struct ipv4_hdr *ipv4_hdr)
 	psd_hdr.dst_addr = ipv4_hdr->dst_addr;
 	psd_hdr.zero = 0;
 	psd_hdr.proto = ipv4_hdr->next_proto_id;
-	psd_hdr.len = rte_cpu_to_be_16(
-		(uint16_t)(rte_be_to_cpu_16(ipv4_hdr->total_length)
-			- sizeof(struct ipv4_hdr)));
+	if (ol_flags & PKT_TX_TCP_SEG) {
+		psd_hdr.len = 0;
+	} else {
+		psd_hdr.len = rte_cpu_to_be_16(
+			(uint16_t)(rte_be_to_cpu_16(ipv4_hdr->total_length)
+				- sizeof(struct ipv4_hdr)));
+	}
 	return rte_raw_cksum((const char *)&psd_hdr, sizeof(psd_hdr));
 }
 
@@ -357,7 +370,7 @@ rte_ipv4_udptcp_cksum(const struct ipv4_hdr *ipv4_hdr, const void *l4_hdr)
 		sizeof(struct ipv4_hdr);
 
 	cksum = rte_raw_cksum(l4_hdr, l4_len);
-	cksum += rte_ipv4_phdr_cksum(ipv4_hdr);
+	cksum += rte_ipv4_phdr_cksum(ipv4_hdr, 0);
 
 	cksum = ((cksum & 0xffff0000) >> 16) + (cksum & 0xffff);
 	cksum = (~cksum) & 0xffff;
@@ -382,13 +395,21 @@ struct ipv6_hdr {
 /**
  * Process the pseudo-header checksum of an IPv6 header.
  *
+ * Depending on the ol_flags, the pseudo-header checksum expected by the
+ * drivers is not the same. For instance, when TSO is enabled, the IPv6
+ * payload length must not be included in the pseudo-header checksum.
+ *
+ * When ol_flags is 0, it computes the standard pseudo-header checksum.
+ *
  * @param ipv6_hdr
  *   The pointer to the contiguous IPv6 header.
+ * @param ol_flags
+ *   The ol_flags of the associated mbuf.
  * @return
  *   The non-complemented checksum to set in the L4 header.
  */
 static inline uint16_t
-rte_ipv6_phdr_cksum(const struct ipv6_hdr *ipv6_hdr)
+rte_ipv6_phdr_cksum(const struct ipv6_hdr *ipv6_hdr, uint64_t ol_flags)
 {
 	struct ipv6_psd_header {
 		uint8_t src_addr[16]; /* IP address of source host. */
@@ -400,7 +421,11 @@ rte_ipv6_phdr_cksum(const struct ipv6_hdr *ipv6_hdr)
 	rte_memcpy(&psd_hdr.src_addr, ipv6_hdr->src_addr,
 		sizeof(ipv6_hdr->src_addr) + sizeof(ipv6_hdr->dst_addr));
 	psd_hdr.proto = (ipv6_hdr->proto << 24);
-	psd_hdr.len = ipv6_hdr->payload_len;
+	if (ol_flags & PKT_TX_TCP_SEG) {
+		psd_hdr.len = 0;
+	} else {
+		psd_hdr.len = ipv6_hdr->payload_len;
+	}
 
 	return rte_raw_cksum((const char *)&psd_hdr, sizeof(psd_hdr));
 }
diff --git a/lib/librte_pmd_e1000/igb_rxtx.c b/lib/librte_pmd_e1000/igb_rxtx.c
index 433c616..848d5d1 100644
--- a/lib/librte_pmd_e1000/igb_rxtx.c
+++ b/lib/librte_pmd_e1000/igb_rxtx.c
@@ -367,6 +367,13 @@ eth_igb_xmit_pkts(void *tx_queue, struct rte_mbuf **tx_pkts,
 	struct rte_mbuf     *tx_pkt;
 	struct rte_mbuf     *m_seg;
 	union igb_vlan_macip vlan_macip_lens;
+	union {
+		uint16_t u16;
+		struct {
+			uint16_t l3_len:9;
+			uint16_t l2_len:7;
+		};
+	} l2_l3_len;
 	uint64_t buf_dma_addr;
 	uint32_t olinfo_status;
 	uint32_t cmd_type_len;
@@ -404,8 +411,10 @@ eth_igb_xmit_pkts(void *tx_queue, struct rte_mbuf **tx_pkts,
 		tx_last = (uint16_t) (tx_id + tx_pkt->nb_segs - 1);
 
 		ol_flags = tx_pkt->ol_flags;
+		l2_l3_len.l2_len = tx_pkt->l2_len;
+		l2_l3_len.l3_len = tx_pkt->l3_len;
 		vlan_macip_lens.f.vlan_tci = tx_pkt->vlan_tci;
-		vlan_macip_lens.f.l2_l3_len = tx_pkt->l2_l3_len;
+		vlan_macip_lens.f.l2_l3_len = l2_l3_len.u16;
 		tx_ol_req = ol_flags & IGB_TX_OFFLOAD_MASK;
 
 		/* If a Context Descriptor need be built . */
diff --git a/lib/librte_pmd_ixgbe/ixgbe_rxtx.c b/lib/librte_pmd_ixgbe/ixgbe_rxtx.c
index ca35db2..2df3385 100644
--- a/lib/librte_pmd_ixgbe/ixgbe_rxtx.c
+++ b/lib/librte_pmd_ixgbe/ixgbe_rxtx.c
@@ -546,6 +546,13 @@ ixgbe_xmit_pkts(void *tx_queue, struct rte_mbuf **tx_pkts,
 	struct rte_mbuf     *tx_pkt;
 	struct rte_mbuf     *m_seg;
 	union ixgbe_vlan_macip vlan_macip_lens;
+	union {
+		uint16_t u16;
+		struct {
+			uint16_t l3_len:9;
+			uint16_t l2_len:7;
+		};
+	} l2_l3_len;
 	uint64_t buf_dma_addr;
 	uint32_t olinfo_status;
 	uint32_t cmd_type_len;
@@ -588,8 +595,10 @@ ixgbe_xmit_pkts(void *tx_queue, struct rte_mbuf **tx_pkts,
 		/* If hardware offload required */
 		tx_ol_req = ol_flags & IXGBE_TX_OFFLOAD_MASK;
 		if (tx_ol_req) {
+			l2_l3_len.l2_len = tx_pkt->l2_len;
+			l2_l3_len.l3_len = tx_pkt->l3_len;
 			vlan_macip_lens.f.vlan_tci = tx_pkt->vlan_tci;
-			vlan_macip_lens.f.l2_l3_len = tx_pkt->l2_l3_len;
+			vlan_macip_lens.f.l2_l3_len = l2_l3_len.u16;
 
 			/* If new context need be built or reuse the exist ctx. */
 			ctx = what_advctx_update(txq, tx_ol_req,
-- 
2.1.0


* [dpdk-dev] [PATCH v2 11/13] ixgbe: support TCP segmentation offload
  2014-11-14 17:03 ` [dpdk-dev] [PATCH v2 00/13] " Olivier Matz
                     ` (9 preceding siblings ...)
  2014-11-14 17:03   ` [dpdk-dev] [PATCH v2 10/13] mbuf: generic support for TCP segmentation offload Olivier Matz
@ 2014-11-14 17:03   ` Olivier Matz
  2014-11-17 18:26     ` Ananyev, Konstantin
  2014-11-14 17:03   ` [dpdk-dev] [PATCH v2 12/13] testpmd: support TSO in csum forward engine Olivier Matz
                     ` (2 subsequent siblings)
  13 siblings, 1 reply; 112+ messages in thread
From: Olivier Matz @ 2014-11-14 17:03 UTC (permalink / raw)
  To: dev; +Cc: jigsaw

Implement TSO (TCP segmentation offload) in the ixgbe driver. The driver
is now able to use the PKT_TX_TCP_SEG mbuf flag and the mbuf hardware
offload information (l2_len, l3_len, l4_len, tso_segsz) to configure the
hardware support of TCP segmentation.

In ixgbe, when doing TSO, the IP length must not be included in the TCP
pseudo header checksum. A new function ixgbe_fix_tcp_phdr_cksum() is
used to fix the pseudo header checksum of the packet before giving it to
the hardware.

In the patch, the tx_desc_cksum_flags_to_olinfo() and
tx_desc_ol_flags_to_cmdtype() functions have been reworked to make them
clearer. This should not impact performance as gcc (version 4.8 in my
case) is smart enough to convert the tests into code that does not
contain any branch instructions.
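
As a rough illustration of that rework, the sketch below puts a
table-lookup form next to a test-based form and checks that they agree.
It is standalone and hypothetical: the DEMO_* constants are placeholder
values chosen for the example (they are not taken from the ixgbe
headers), and only the L4 and IP checksum cases are covered.

```c
#include <assert.h>
#include <stdint.h>

/* Placeholder values, assumed for the example only. */
#define DEMO_L4_MASK     (3ULL << 52)
#define DEMO_L4_NO_CKSUM (0ULL << 52)
#define DEMO_IP_CKSUM    (1ULL << 54)
#define DEMO_POPTS_TXSM  0x02u
#define DEMO_POPTS_IXSM  0x01u

/* Table-lookup form: index small arrays with comparison results. */
static uint32_t demo_olinfo_table(uint64_t ol_flags)
{
	static const uint32_t l4_olinfo[2] = {0, DEMO_POPTS_TXSM};
	static const uint32_t l3_olinfo[2] = {0, DEMO_POPTS_IXSM};
	uint32_t tmp;

	tmp  = l4_olinfo[(ol_flags & DEMO_L4_MASK) != DEMO_L4_NO_CKSUM];
	tmp |= l3_olinfo[(ol_flags & DEMO_IP_CKSUM) != 0];
	return tmp;
}

/* Test-based form: plain conditions, easier to read and extend. */
static uint32_t demo_olinfo_tests(uint64_t ol_flags)
{
	uint32_t tmp = 0;

	if ((ol_flags & DEMO_L4_MASK) != DEMO_L4_NO_CKSUM)
		tmp |= DEMO_POPTS_TXSM;
	if (ol_flags & DEMO_IP_CKSUM)
		tmp |= DEMO_POPTS_IXSM;
	return tmp;
}

/* Returns 1 if both forms agree for every L4/IP flag combination. */
static int demo_olinfo_forms_agree(void)
{
	uint64_t l4, ip;

	for (l4 = 0; l4 < 4; l4++) {
		for (ip = 0; ip < 2; ip++) {
			uint64_t flags = (l4 << 52) | (ip << 54);

			if (demo_olinfo_table(flags) !=
					demo_olinfo_tests(flags))
				return 0;
		}
	}
	return 1;
}
```

Whether a given compiler emits branch-free code for the second form
depends on its version and options, so this is worth checking in the
generated assembly rather than assuming.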

Signed-off-by: Olivier Matz <olivier.matz@6wind.com>
---
 lib/librte_pmd_ixgbe/ixgbe_ethdev.c |   3 +-
 lib/librte_pmd_ixgbe/ixgbe_rxtx.c   | 169 ++++++++++++++++++++++--------------
 lib/librte_pmd_ixgbe/ixgbe_rxtx.h   |  19 ++--
 3 files changed, 117 insertions(+), 74 deletions(-)

diff --git a/lib/librte_pmd_ixgbe/ixgbe_ethdev.c b/lib/librte_pmd_ixgbe/ixgbe_ethdev.c
index 2eb609c..2c2ecc0 100644
--- a/lib/librte_pmd_ixgbe/ixgbe_ethdev.c
+++ b/lib/librte_pmd_ixgbe/ixgbe_ethdev.c
@@ -1964,7 +1964,8 @@ ixgbe_dev_info_get(struct rte_eth_dev *dev, struct rte_eth_dev_info *dev_info)
 		DEV_TX_OFFLOAD_IPV4_CKSUM  |
 		DEV_TX_OFFLOAD_UDP_CKSUM   |
 		DEV_TX_OFFLOAD_TCP_CKSUM   |
-		DEV_TX_OFFLOAD_SCTP_CKSUM;
+		DEV_TX_OFFLOAD_SCTP_CKSUM  |
+		DEV_TX_OFFLOAD_TCP_TSO;
 
 	dev_info->default_rxconf = (struct rte_eth_rxconf) {
 			.rx_thresh = {
diff --git a/lib/librte_pmd_ixgbe/ixgbe_rxtx.c b/lib/librte_pmd_ixgbe/ixgbe_rxtx.c
index 2df3385..19e3b73 100644
--- a/lib/librte_pmd_ixgbe/ixgbe_rxtx.c
+++ b/lib/librte_pmd_ixgbe/ixgbe_rxtx.c
@@ -94,7 +94,8 @@
 #define IXGBE_TX_OFFLOAD_MASK (			 \
 		PKT_TX_VLAN_PKT |		 \
 		PKT_TX_IP_CKSUM |		 \
-		PKT_TX_L4_MASK)
+		PKT_TX_L4_MASK |		 \
+		PKT_TX_TCP_SEG)
 
 static inline struct rte_mbuf *
 rte_rxmbuf_alloc(struct rte_mempool *mp)
@@ -363,59 +364,84 @@ ixgbe_xmit_pkts_simple(void *tx_queue, struct rte_mbuf **tx_pkts,
 static inline void
 ixgbe_set_xmit_ctx(struct igb_tx_queue* txq,
 		volatile struct ixgbe_adv_tx_context_desc *ctx_txd,
-		uint64_t ol_flags, uint32_t vlan_macip_lens)
+		uint64_t ol_flags, union ixgbe_tx_offload tx_offload)
 {
 	uint32_t type_tucmd_mlhl;
-	uint32_t mss_l4len_idx;
+	uint32_t mss_l4len_idx = 0;
 	uint32_t ctx_idx;
-	uint32_t cmp_mask;
+	uint32_t vlan_macip_lens;
+	union ixgbe_tx_offload tx_offload_mask;
 
 	ctx_idx = txq->ctx_curr;
-	cmp_mask = 0;
+	tx_offload_mask.data = 0;
 	type_tucmd_mlhl = 0;
 
+	/* Specify which HW CTX to upload. */
+	mss_l4len_idx |= (ctx_idx << IXGBE_ADVTXD_IDX_SHIFT);
+
 	if (ol_flags & PKT_TX_VLAN_PKT) {
-		cmp_mask |= TX_VLAN_CMP_MASK;
+		tx_offload_mask.vlan_tci = ~0;
 	}
 
-	if (ol_flags & PKT_TX_IP_CKSUM) {
-		type_tucmd_mlhl = IXGBE_ADVTXD_TUCMD_IPV4;
-		cmp_mask |= TX_MACIP_LEN_CMP_MASK;
-	}
+	/* check if TCP segmentation required for this packet */
+	if (ol_flags & PKT_TX_TCP_SEG) {
+		/* implies IP cksum and TCP cksum */
+		type_tucmd_mlhl = IXGBE_ADVTXD_TUCMD_IPV4 |
+			IXGBE_ADVTXD_TUCMD_L4T_TCP |
+			IXGBE_ADVTXD_DTYP_CTXT | IXGBE_ADVTXD_DCMD_DEXT;
+
+		tx_offload_mask.l2_len = ~0;
+		tx_offload_mask.l3_len = ~0;
+		tx_offload_mask.l4_len = ~0;
+		tx_offload_mask.tso_segsz = ~0;
+		mss_l4len_idx |= tx_offload.tso_segsz << IXGBE_ADVTXD_MSS_SHIFT;
+		mss_l4len_idx |= tx_offload.l4_len << IXGBE_ADVTXD_L4LEN_SHIFT;
+	} else { /* no TSO, check if hardware checksum is needed */
+		if (ol_flags & PKT_TX_IP_CKSUM) {
+			type_tucmd_mlhl = IXGBE_ADVTXD_TUCMD_IPV4;
+			tx_offload_mask.l2_len = ~0;
+			tx_offload_mask.l3_len = ~0;
+		}
 
-	/* Specify which HW CTX to upload. */
-	mss_l4len_idx = (ctx_idx << IXGBE_ADVTXD_IDX_SHIFT);
-	switch (ol_flags & PKT_TX_L4_MASK) {
-	case PKT_TX_UDP_CKSUM:
-		type_tucmd_mlhl |= IXGBE_ADVTXD_TUCMD_L4T_UDP |
+		switch (ol_flags & PKT_TX_L4_MASK) {
+		case PKT_TX_UDP_CKSUM:
+			type_tucmd_mlhl |= IXGBE_ADVTXD_TUCMD_L4T_UDP |
 				IXGBE_ADVTXD_DTYP_CTXT | IXGBE_ADVTXD_DCMD_DEXT;
-		mss_l4len_idx |= sizeof(struct udp_hdr) << IXGBE_ADVTXD_L4LEN_SHIFT;
-		cmp_mask |= TX_MACIP_LEN_CMP_MASK;
-		break;
-	case PKT_TX_TCP_CKSUM:
-		type_tucmd_mlhl |= IXGBE_ADVTXD_TUCMD_L4T_TCP |
+			mss_l4len_idx |= sizeof(struct udp_hdr) << IXGBE_ADVTXD_L4LEN_SHIFT;
+			tx_offload_mask.l2_len = ~0;
+			tx_offload_mask.l3_len = ~0;
+			break;
+		case PKT_TX_TCP_CKSUM:
+			type_tucmd_mlhl |= IXGBE_ADVTXD_TUCMD_L4T_TCP |
 				IXGBE_ADVTXD_DTYP_CTXT | IXGBE_ADVTXD_DCMD_DEXT;
-		mss_l4len_idx |= sizeof(struct tcp_hdr) << IXGBE_ADVTXD_L4LEN_SHIFT;
-		cmp_mask |= TX_MACIP_LEN_CMP_MASK;
-		break;
-	case PKT_TX_SCTP_CKSUM:
-		type_tucmd_mlhl |= IXGBE_ADVTXD_TUCMD_L4T_SCTP |
+			mss_l4len_idx |= sizeof(struct tcp_hdr) << IXGBE_ADVTXD_L4LEN_SHIFT;
+			tx_offload_mask.l2_len = ~0;
+			tx_offload_mask.l3_len = ~0;
+			tx_offload_mask.l4_len = ~0;
+			break;
+		case PKT_TX_SCTP_CKSUM:
+			type_tucmd_mlhl |= IXGBE_ADVTXD_TUCMD_L4T_SCTP |
 				IXGBE_ADVTXD_DTYP_CTXT | IXGBE_ADVTXD_DCMD_DEXT;
-		mss_l4len_idx |= sizeof(struct sctp_hdr) << IXGBE_ADVTXD_L4LEN_SHIFT;
-		cmp_mask |= TX_MACIP_LEN_CMP_MASK;
-		break;
-	default:
-		type_tucmd_mlhl |= IXGBE_ADVTXD_TUCMD_L4T_RSV |
+			mss_l4len_idx |= sizeof(struct sctp_hdr) << IXGBE_ADVTXD_L4LEN_SHIFT;
+			tx_offload_mask.l2_len = ~0;
+			tx_offload_mask.l3_len = ~0;
+			break;
+		default:
+			type_tucmd_mlhl |= IXGBE_ADVTXD_TUCMD_L4T_RSV |
 				IXGBE_ADVTXD_DTYP_CTXT | IXGBE_ADVTXD_DCMD_DEXT;
-		break;
+			break;
+		}
 	}
 
 	txq->ctx_cache[ctx_idx].flags = ol_flags;
-	txq->ctx_cache[ctx_idx].cmp_mask = cmp_mask;
-	txq->ctx_cache[ctx_idx].vlan_macip_lens.data =
-		vlan_macip_lens & cmp_mask;
+	txq->ctx_cache[ctx_idx].tx_offload.data  =
+		tx_offload_mask.data & tx_offload.data;
+	txq->ctx_cache[ctx_idx].tx_offload_mask    = tx_offload_mask;
 
 	ctx_txd->type_tucmd_mlhl = rte_cpu_to_le_32(type_tucmd_mlhl);
+	vlan_macip_lens = tx_offload.l3_len;
+	vlan_macip_lens |= (tx_offload.l2_len << IXGBE_ADVTXD_MACLEN_SHIFT);
+	vlan_macip_lens |= ((uint32_t)tx_offload.vlan_tci << IXGBE_ADVTXD_VLAN_SHIFT);
 	ctx_txd->vlan_macip_lens = rte_cpu_to_le_32(vlan_macip_lens);
 	ctx_txd->mss_l4len_idx   = rte_cpu_to_le_32(mss_l4len_idx);
 	ctx_txd->seqnum_seed     = 0;
@@ -427,20 +453,20 @@ ixgbe_set_xmit_ctx(struct igb_tx_queue* txq,
  */
 static inline uint32_t
 what_advctx_update(struct igb_tx_queue *txq, uint64_t flags,
-		uint32_t vlan_macip_lens)
+		union ixgbe_tx_offload tx_offload)
 {
 	/* If match with the current used context */
 	if (likely((txq->ctx_cache[txq->ctx_curr].flags == flags) &&
-		(txq->ctx_cache[txq->ctx_curr].vlan_macip_lens.data ==
-		(txq->ctx_cache[txq->ctx_curr].cmp_mask & vlan_macip_lens)))) {
+		(txq->ctx_cache[txq->ctx_curr].tx_offload.data ==
+		(txq->ctx_cache[txq->ctx_curr].tx_offload_mask.data & tx_offload.data)))) {
 			return txq->ctx_curr;
 	}
 
 	/* What if match with the next context  */
 	txq->ctx_curr ^= 1;
 	if (likely((txq->ctx_cache[txq->ctx_curr].flags == flags) &&
-		(txq->ctx_cache[txq->ctx_curr].vlan_macip_lens.data ==
-		(txq->ctx_cache[txq->ctx_curr].cmp_mask & vlan_macip_lens)))) {
+		(txq->ctx_cache[txq->ctx_curr].tx_offload.data ==
+		(txq->ctx_cache[txq->ctx_curr].tx_offload_mask.data & tx_offload.data)))) {
 			return txq->ctx_curr;
 	}
 
@@ -451,20 +477,25 @@ what_advctx_update(struct igb_tx_queue *txq, uint64_t flags,
 static inline uint32_t
 tx_desc_cksum_flags_to_olinfo(uint64_t ol_flags)
 {
-	static const uint32_t l4_olinfo[2] = {0, IXGBE_ADVTXD_POPTS_TXSM};
-	static const uint32_t l3_olinfo[2] = {0, IXGBE_ADVTXD_POPTS_IXSM};
-	uint32_t tmp;
-
-	tmp  = l4_olinfo[(ol_flags & PKT_TX_L4_MASK)  != PKT_TX_L4_NO_CKSUM];
-	tmp |= l3_olinfo[(ol_flags & PKT_TX_IP_CKSUM) != 0];
+	uint32_t tmp = 0;
+	if ((ol_flags & PKT_TX_L4_MASK) != PKT_TX_L4_NO_CKSUM)
+		tmp |= IXGBE_ADVTXD_POPTS_TXSM;
+	if (ol_flags & PKT_TX_IP_CKSUM)
+		tmp |= IXGBE_ADVTXD_POPTS_IXSM;
+	if (ol_flags & PKT_TX_TCP_SEG)
+		tmp |= IXGBE_ADVTXD_POPTS_TXSM;
 	return tmp;
 }
 
 static inline uint32_t
-tx_desc_vlan_flags_to_cmdtype(uint64_t ol_flags)
+tx_desc_ol_flags_to_cmdtype(uint64_t ol_flags)
 {
-	static const uint32_t vlan_cmd[2] = {0, IXGBE_ADVTXD_DCMD_VLE};
-	return vlan_cmd[(ol_flags & PKT_TX_VLAN_PKT) != 0];
+	uint32_t cmdtype = 0;
+	if (ol_flags & PKT_TX_VLAN_PKT)
+		cmdtype |= IXGBE_ADVTXD_DCMD_VLE;
+	if (ol_flags & PKT_TX_TCP_SEG)
+		cmdtype |= IXGBE_ADVTXD_DCMD_TSE;
+	return cmdtype;
 }
 
 /* Default RS bit threshold values */
@@ -545,14 +576,6 @@ ixgbe_xmit_pkts(void *tx_queue, struct rte_mbuf **tx_pkts,
 	volatile union ixgbe_adv_tx_desc *txd;
 	struct rte_mbuf     *tx_pkt;
 	struct rte_mbuf     *m_seg;
-	union ixgbe_vlan_macip vlan_macip_lens;
-	union {
-		uint16_t u16;
-		struct {
-			uint16_t l3_len:9;
-			uint16_t l2_len:7;
-		};
-	} l2_l3_len;
 	uint64_t buf_dma_addr;
 	uint32_t olinfo_status;
 	uint32_t cmd_type_len;
@@ -566,6 +589,7 @@ ixgbe_xmit_pkts(void *tx_queue, struct rte_mbuf **tx_pkts,
 	uint64_t tx_ol_req;
 	uint32_t ctx = 0;
 	uint32_t new_ctx;
+	union ixgbe_tx_offload tx_offload = { .data = 0 };
 
 	txq = tx_queue;
 	sw_ring = txq->sw_ring;
@@ -595,14 +619,15 @@ ixgbe_xmit_pkts(void *tx_queue, struct rte_mbuf **tx_pkts,
 		/* If hardware offload required */
 		tx_ol_req = ol_flags & IXGBE_TX_OFFLOAD_MASK;
 		if (tx_ol_req) {
-			l2_l3_len.l2_len = tx_pkt->l2_len;
-			l2_l3_len.l3_len = tx_pkt->l3_len;
-			vlan_macip_lens.f.vlan_tci = tx_pkt->vlan_tci;
-			vlan_macip_lens.f.l2_l3_len = l2_l3_len.u16;
+			tx_offload.l2_len = tx_pkt->l2_len;
+			tx_offload.l3_len = tx_pkt->l3_len;
+			tx_offload.l4_len = tx_pkt->l4_len;
+			tx_offload.vlan_tci = tx_pkt->vlan_tci;
+			tx_offload.tso_segsz = tx_pkt->tso_segsz;
 
 			/* If new context need be built or reuse the exist ctx. */
 			ctx = what_advctx_update(txq, tx_ol_req,
-				vlan_macip_lens.data);
+				tx_offload);
 			/* Only allocate context descriptor if required*/
 			new_ctx = (ctx == IXGBE_CTX_NUM);
 			ctx = txq->ctx_curr;
@@ -717,13 +742,22 @@ ixgbe_xmit_pkts(void *tx_queue, struct rte_mbuf **tx_pkts,
 		 */
 		cmd_type_len = IXGBE_ADVTXD_DTYP_DATA |
 			IXGBE_ADVTXD_DCMD_IFCS | IXGBE_ADVTXD_DCMD_DEXT;
-		olinfo_status = (pkt_len << IXGBE_ADVTXD_PAYLEN_SHIFT);
+
 #ifdef RTE_LIBRTE_IEEE1588
 		if (ol_flags & PKT_TX_IEEE1588_TMST)
 			cmd_type_len |= IXGBE_ADVTXD_MAC_1588;
 #endif
 
+		olinfo_status = 0;
 		if (tx_ol_req) {
+
+			if (ol_flags & PKT_TX_TCP_SEG) {
+				/* when TSO is on, paylen in descriptor is not
+				 * the packet len but the tcp payload len */
+				pkt_len -= (tx_offload.l2_len +
+					tx_offload.l3_len + tx_offload.l4_len);
+			}
+
 			/*
 			 * Setup the TX Advanced Context Descriptor if required
 			 */
@@ -744,7 +778,7 @@ ixgbe_xmit_pkts(void *tx_queue, struct rte_mbuf **tx_pkts,
 				}
 
 				ixgbe_set_xmit_ctx(txq, ctx_txd, tx_ol_req,
-				    vlan_macip_lens.data);
+					tx_offload);
 
 				txe->last_id = tx_last;
 				tx_id = txe->next_id;
@@ -756,11 +790,13 @@ ixgbe_xmit_pkts(void *tx_queue, struct rte_mbuf **tx_pkts,
 			 * This path will go through
 			 * whatever new/reuse the context descriptor
 			 */
-			cmd_type_len  |= tx_desc_vlan_flags_to_cmdtype(ol_flags);
+			cmd_type_len  |= tx_desc_ol_flags_to_cmdtype(ol_flags);
 			olinfo_status |= tx_desc_cksum_flags_to_olinfo(ol_flags);
 			olinfo_status |= ctx << IXGBE_ADVTXD_IDX_SHIFT;
 		}
 
+		olinfo_status |= (pkt_len << IXGBE_ADVTXD_PAYLEN_SHIFT);
+
 		m_seg = tx_pkt;
 		do {
 			txd = &txr[tx_id];
@@ -3611,9 +3647,10 @@ ixgbe_dev_tx_init(struct rte_eth_dev *dev)
 	PMD_INIT_FUNC_TRACE();
 	hw = IXGBE_DEV_PRIVATE_TO_HW(dev->data->dev_private);
 
-	/* Enable TX CRC (checksum offload requirement) */
+	/* Enable TX CRC (checksum offload requirement) and hw padding
+	 * (TSO requirement) */
 	hlreg0 = IXGBE_READ_REG(hw, IXGBE_HLREG0);
-	hlreg0 |= IXGBE_HLREG0_TXCRCEN;
+	hlreg0 |= (IXGBE_HLREG0_TXCRCEN | IXGBE_HLREG0_TXPADEN);
 	IXGBE_WRITE_REG(hw, IXGBE_HLREG0, hlreg0);
 
 	/* Setup the Base and Length of the Tx Descriptor Rings */
diff --git a/lib/librte_pmd_ixgbe/ixgbe_rxtx.h b/lib/librte_pmd_ixgbe/ixgbe_rxtx.h
index eb89715..13099af 100644
--- a/lib/librte_pmd_ixgbe/ixgbe_rxtx.h
+++ b/lib/librte_pmd_ixgbe/ixgbe_rxtx.h
@@ -145,13 +145,16 @@ enum ixgbe_advctx_num {
 };
 
 /** Offload features */
-union ixgbe_vlan_macip {
-	uint32_t data;
+union ixgbe_tx_offload {
+	uint64_t data;
 	struct {
-		uint16_t l2_l3_len; /**< combined 9-bit l3, 7-bit l2 lengths */
-		uint16_t vlan_tci;
+		uint64_t l2_len:7; /**< L2 (MAC) Header Length. */
+		uint64_t l3_len:9; /**< L3 (IP) Header Length. */
+		uint64_t l4_len:8; /**< L4 (TCP/UDP) Header Length. */
+		uint64_t tso_segsz:16; /**< TCP TSO segment size */
+		uint64_t vlan_tci:16;
 		/**< VLAN Tag Control Identifier (CPU order). */
-	} f;
+	};
 };
 
 /*
@@ -170,8 +173,10 @@ union ixgbe_vlan_macip {
 
 struct ixgbe_advctx_info {
 	uint64_t flags;           /**< ol_flags for context build. */
-	uint32_t cmp_mask;        /**< compare mask for vlan_macip_lens */
-	union ixgbe_vlan_macip vlan_macip_lens; /**< vlan, mac ip length. */
+	/**< tx offload: vlan, tso, l2-l3-l4 lengths. */
+	union ixgbe_tx_offload tx_offload;
+	/** compare mask for tx offload. */
+	union ixgbe_tx_offload tx_offload_mask;
 };
 
 /**
-- 
2.1.0


* [dpdk-dev] [PATCH v2 12/13] testpmd: support TSO in csum forward engine
  2014-11-14 17:03 ` [dpdk-dev] [PATCH v2 00/13] " Olivier Matz
                     ` (10 preceding siblings ...)
  2014-11-14 17:03   ` [dpdk-dev] [PATCH v2 11/13] ixgbe: support " Olivier Matz
@ 2014-11-14 17:03   ` Olivier Matz
  2014-11-14 17:03   ` [dpdk-dev] [PATCH v2 13/13] testpmd: add a verbose mode " Olivier Matz
  2014-11-20 22:58   ` [dpdk-dev] [PATCH v3 00/13] add TSO support Olivier Matz
  13 siblings, 0 replies; 112+ messages in thread
From: Olivier Matz @ 2014-11-14 17:03 UTC (permalink / raw)
  To: dev; +Cc: jigsaw

Add two new commands in testpmd:

- tso set <segsize> <portid>
- tso show <portid>

These commands can be used to enable TSO when transmitting TCP packets in
the csum forward engine. Ex:

  set fwd csum
  tx_checksum set ip hw 0
  tso set 800 0
  start
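
As a rough illustration of what "tso set 800 0" asks the NIC to do, the
sketch below computes the TCP payload length that the ixgbe patch in this
series places in the data descriptor (packet length minus the l2/l3/l4
header lengths) and the number of segments that would result for a given
segment size. The helper names are hypothetical and are not part of DPDK
or testpmd; the segment count assumes the hardware simply cuts the payload
into chunks of at most tso_segsz bytes.

```c
#include <stdint.h>

/* Hypothetical helper (not a DPDK API): TCP payload length of a packet,
 * i.e. the total length minus the L2, L3 and L4 header lengths. */
static inline uint32_t
tso_payload_len(uint32_t pkt_len, uint32_t l2_len, uint32_t l3_len,
	uint32_t l4_len)
{
	return pkt_len - (l2_len + l3_len + l4_len);
}

/* Hypothetical helper (not a DPDK API): number of segments needed to
 * carry payload_len bytes when each segment holds at most tso_segsz
 * bytes. A segment size of 0 means TSO is disabled, so return 0. */
static inline uint32_t
tso_nb_segs(uint32_t payload_len, uint16_t tso_segsz)
{
	if (tso_segsz == 0)
		return 0;
	return (payload_len + tso_segsz - 1) / tso_segsz;
}
```

For instance, a 4054-byte frame with 14 + 20 + 20 bytes of headers carries
4000 bytes of payload, which gives 5 segments with a segment size of 800.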

Signed-off-by: Olivier Matz <olivier.matz@6wind.com>
---
 app/test-pmd/cmdline.c  | 92 +++++++++++++++++++++++++++++++++++++++++++++++++
 app/test-pmd/csumonly.c | 64 ++++++++++++++++++++++++----------
 app/test-pmd/testpmd.h  |  1 +
 3 files changed, 139 insertions(+), 18 deletions(-)

diff --git a/app/test-pmd/cmdline.c b/app/test-pmd/cmdline.c
index 0361e58..5460415 100644
--- a/app/test-pmd/cmdline.c
+++ b/app/test-pmd/cmdline.c
@@ -318,6 +318,14 @@ static void cmd_help_long_parsed(void *parsed_result,
 			"tx_checksum show (port_id)\n"
 			"    Display tx checksum offload configuration\n\n"
 
+			"tso set (segsize) (portid)\n"
+			"    Enable TCP Segmentation Offload in csum forward"
+			" engine.\n"
+			"    Please check the NIC datasheet for HW limits.\n\n"
+
+			"tso show (portid)\n"
+			"    Display the status of TCP Segmentation Offload.\n\n"
+
 			"set fwd (%s)\n"
 			"    Set packet forwarding mode.\n\n"
 
@@ -2862,6 +2870,88 @@ cmdline_parse_inst_t cmd_tx_cksum_show = {
 	},
 };
 
+/* *** ENABLE HARDWARE SEGMENTATION IN TX PACKETS *** */
+struct cmd_tso_set_result {
+	cmdline_fixed_string_t tso;
+	cmdline_fixed_string_t mode;
+	uint16_t tso_segsz;
+	uint8_t port_id;
+};
+
+static void
+cmd_tso_set_parsed(void *parsed_result,
+		       __attribute__((unused)) struct cmdline *cl,
+		       __attribute__((unused)) void *data)
+{
+	struct cmd_tso_set_result *res = parsed_result;
+	struct rte_eth_dev_info dev_info;
+
+	if (port_id_is_invalid(res->port_id))
+		return;
+
+	if (!strcmp(res->mode, "set"))
+		ports[res->port_id].tso_segsz = res->tso_segsz;
+
+	if (ports[res->port_id].tso_segsz == 0)
+		printf("TSO is disabled\n");
+	else
+		printf("TSO segment size is %d\n",
+			ports[res->port_id].tso_segsz);
+
+	/* display warnings if configuration is not supported by the NIC */
+	rte_eth_dev_info_get(res->port_id, &dev_info);
+	if ((ports[res->port_id].tso_segsz != 0) &&
+		(dev_info.tx_offload_capa & DEV_TX_OFFLOAD_TCP_TSO) == 0) {
+		printf("Warning: TSO enabled but not "
+			"supported by port %d\n", res->port_id);
+	}
+}
+
+cmdline_parse_token_string_t cmd_tso_set_tso =
+	TOKEN_STRING_INITIALIZER(struct cmd_tso_set_result,
+				tso, "tso");
+cmdline_parse_token_string_t cmd_tso_set_mode =
+	TOKEN_STRING_INITIALIZER(struct cmd_tso_set_result,
+				mode, "set");
+cmdline_parse_token_num_t cmd_tso_set_tso_segsz =
+	TOKEN_NUM_INITIALIZER(struct cmd_tso_set_result,
+				tso_segsz, UINT16);
+cmdline_parse_token_num_t cmd_tso_set_portid =
+	TOKEN_NUM_INITIALIZER(struct cmd_tso_set_result,
+				port_id, UINT8);
+
+cmdline_parse_inst_t cmd_tso_set = {
+	.f = cmd_tso_set_parsed,
+	.data = NULL,
+	.help_str = "Set TSO segment size for csum engine (0 to disable): "
+	"tso set <tso_segsz> <port>",
+	.tokens = {
+		(void *)&cmd_tso_set_tso,
+		(void *)&cmd_tso_set_mode,
+		(void *)&cmd_tso_set_tso_segsz,
+		(void *)&cmd_tso_set_portid,
+		NULL,
+	},
+};
+
+cmdline_parse_token_string_t cmd_tso_show_mode =
+	TOKEN_STRING_INITIALIZER(struct cmd_tso_set_result,
+				mode, "show");
+
+
+cmdline_parse_inst_t cmd_tso_show = {
+	.f = cmd_tso_set_parsed,
+	.data = NULL,
+	.help_str = "Show TSO segment size for csum engine: "
+	"tso show <port>",
+	.tokens = {
+		(void *)&cmd_tso_set_tso,
+		(void *)&cmd_tso_show_mode,
+		(void *)&cmd_tso_set_portid,
+		NULL,
+	},
+};
+
 /* *** ENABLE/DISABLE FLUSH ON RX STREAMS *** */
 struct cmd_set_flush_rx {
 	cmdline_fixed_string_t set;
@@ -7875,6 +7965,8 @@ cmdline_parse_ctx_t main_ctx[] = {
 	(cmdline_parse_inst_t *)&cmd_tx_vlan_set_pvid,
 	(cmdline_parse_inst_t *)&cmd_tx_cksum_set,
 	(cmdline_parse_inst_t *)&cmd_tx_cksum_show,
+	(cmdline_parse_inst_t *)&cmd_tso_set,
+	(cmdline_parse_inst_t *)&cmd_tso_show,
 	(cmdline_parse_inst_t *)&cmd_link_flow_control_set,
 	(cmdline_parse_inst_t *)&cmd_link_flow_control_set_rx,
 	(cmdline_parse_inst_t *)&cmd_link_flow_control_set_tx,
diff --git a/app/test-pmd/csumonly.c b/app/test-pmd/csumonly.c
index 39f974d..9a2beac 100644
--- a/app/test-pmd/csumonly.c
+++ b/app/test-pmd/csumonly.c
@@ -87,12 +87,12 @@
 #endif
 
 static uint16_t
-get_psd_sum(void *l3_hdr, uint16_t ethertype)
+get_psd_sum(void *l3_hdr, uint16_t ethertype, uint64_t ol_flags)
 {
 	if (ethertype == _htons(ETHER_TYPE_IPv4))
-		return rte_ipv4_phdr_cksum(l3_hdr);
+		return rte_ipv4_phdr_cksum(l3_hdr, ol_flags);
 	else /* assume ethertype == ETHER_TYPE_IPv6 */
-		return rte_ipv6_phdr_cksum(l3_hdr);
+		return rte_ipv6_phdr_cksum(l3_hdr, ol_flags);
 }
 
 static uint16_t
@@ -107,14 +107,15 @@ get_udptcp_checksum(void *l3_hdr, void *l4_hdr, uint16_t ethertype)
 /*
  * Parse an ethernet header to fill the ethertype, l2_len, l3_len and
  * ipproto. This function is able to recognize IPv4/IPv6 with one optional vlan
- * header.
+ * header. The l4_len argument is only set in case of TCP (useful for TSO).
  */
 static void
 parse_ethernet(struct ether_hdr *eth_hdr, uint16_t *ethertype, uint16_t *l2_len,
-	uint16_t *l3_len, uint8_t *l4_proto)
+	uint16_t *l3_len, uint8_t *l4_proto, uint16_t *l4_len)
 {
 	struct ipv4_hdr *ipv4_hdr;
 	struct ipv6_hdr *ipv6_hdr;
+	struct tcp_hdr *tcp_hdr;
 
 	*l2_len = sizeof(struct ether_hdr);
 	*ethertype = eth_hdr->ether_type;
@@ -142,6 +143,14 @@ parse_ethernet(struct ether_hdr *eth_hdr, uint16_t *ethertype, uint16_t *l2_len,
 		*l4_proto = 0;
 		break;
 	}
+
+	if (*l4_proto == IPPROTO_TCP) {
+		tcp_hdr = (struct tcp_hdr *)((char *)eth_hdr +
+			*l2_len + *l3_len);
+		*l4_len = (tcp_hdr->data_off & 0xf0) >> 2;
+	}
+	else
+		*l4_len = 0;
 }
 
 /* modify the IPv4 or IPv4 source address of a packet */
@@ -164,7 +173,7 @@ change_ip_addresses(void *l3_hdr, uint16_t ethertype)
  * depending on the testpmd command line configuration */
 static uint64_t
 process_inner_cksums(void *l3_hdr, uint16_t ethertype, uint16_t l3_len,
-	uint8_t l4_proto, uint16_t testpmd_ol_flags)
+	uint8_t l4_proto, uint16_t tso_segsz, uint16_t testpmd_ol_flags)
 {
 	struct ipv4_hdr *ipv4_hdr = l3_hdr;
 	struct udp_hdr *udp_hdr;
@@ -176,11 +185,16 @@ process_inner_cksums(void *l3_hdr, uint16_t ethertype, uint16_t l3_len,
 		ipv4_hdr = l3_hdr;
 		ipv4_hdr->hdr_checksum = 0;
 
-		if (testpmd_ol_flags & TESTPMD_TX_OFFLOAD_IP_CKSUM)
+		if (tso_segsz != 0 && l4_proto == IPPROTO_TCP) {
 			ol_flags |= PKT_TX_IP_CKSUM;
-		else
-			ipv4_hdr->hdr_checksum = rte_ipv4_cksum(ipv4_hdr);
-
+		}
+		else {
+			if (testpmd_ol_flags & TESTPMD_TX_OFFLOAD_IP_CKSUM)
+				ol_flags |= PKT_TX_IP_CKSUM;
+			else
+				ipv4_hdr->hdr_checksum =
+					rte_ipv4_cksum(ipv4_hdr);
+		}
 	}
 	else if (ethertype != _htons(ETHER_TYPE_IPv6))
 		return 0; /* packet type not supported nothing to do */
@@ -193,7 +207,7 @@ process_inner_cksums(void *l3_hdr, uint16_t ethertype, uint16_t l3_len,
 			if (testpmd_ol_flags & TESTPMD_TX_OFFLOAD_UDP_CKSUM) {
 				ol_flags |= PKT_TX_UDP_CKSUM;
 				udp_hdr->dgram_cksum = get_psd_sum(l3_hdr,
-					ethertype);
+					ethertype, ol_flags);
 			}
 			else {
 				udp_hdr->dgram_cksum =
@@ -205,9 +219,13 @@ process_inner_cksums(void *l3_hdr, uint16_t ethertype, uint16_t l3_len,
 	else if (l4_proto == IPPROTO_TCP) {
 		tcp_hdr = (struct tcp_hdr *)((char *)l3_hdr + l3_len);
 		tcp_hdr->cksum = 0;
-		if (testpmd_ol_flags & TESTPMD_TX_OFFLOAD_TCP_CKSUM) {
+		if (tso_segsz != 0) {
+			ol_flags |= PKT_TX_TCP_SEG;
+			tcp_hdr->cksum = get_psd_sum(l3_hdr, ethertype, ol_flags);
+		}
+		else if (testpmd_ol_flags & TESTPMD_TX_OFFLOAD_TCP_CKSUM) {
 			ol_flags |= PKT_TX_TCP_CKSUM;
-			tcp_hdr->cksum = get_psd_sum(l3_hdr, ethertype);
+			tcp_hdr->cksum = get_psd_sum(l3_hdr, ethertype, ol_flags);
 		}
 		else {
 			tcp_hdr->cksum =
@@ -276,6 +294,8 @@ process_outer_cksums(void *outer_l3_hdr, uint16_t outer_ethertype,
  *  - modify the IPs
  *  - reprocess the checksum in SW or HW, depending on testpmd command line
  *    configuration
+ *  - if TSO is enabled in testpmd command line, also flag the mbuf for TCP
+ *    segmentation offload (this implies HW checksum)
  * Then packets are transmitted on the output port.
  *
  * Supported packets are:
@@ -301,7 +321,9 @@ pkt_burst_checksum_forward(struct fwd_stream *fs)
 	uint16_t testpmd_ol_flags;
 	uint8_t l4_proto;
 	uint16_t ethertype = 0, outer_ethertype = 0;
-	uint16_t  l2_len = 0, l3_len = 0, outer_l2_len = 0, outer_l3_len = 0;
+	uint16_t l2_len = 0, l3_len = 0, l4_len = 0;
+	uint16_t outer_l2_len = 0, outer_l3_len = 0;
+	uint16_t tso_segsz;
 	int tunnel = 0;
 	uint32_t rx_bad_ip_csum;
 	uint32_t rx_bad_l4_csum;
@@ -331,6 +353,7 @@ pkt_burst_checksum_forward(struct fwd_stream *fs)
 
 	txp = &ports[fs->tx_port];
 	testpmd_ol_flags = txp->tx_ol_flags;
+	tso_segsz = txp->tso_segsz;
 
 	for (i = 0; i < nb_rx; i++) {
 
@@ -346,7 +369,8 @@ pkt_burst_checksum_forward(struct fwd_stream *fs)
 		 * and inner headers */
 
 		eth_hdr = rte_pktmbuf_mtod(m, struct ether_hdr *);
-		parse_ethernet(eth_hdr, &ethertype, &l2_len, &l3_len, &l4_proto);
+		parse_ethernet(eth_hdr, &ethertype, &l2_len, &l3_len,
+			&l4_proto, &l4_len);
 		l3_hdr = (char *)eth_hdr + l2_len;
 
 		/* check if it's a supported tunnel (only vxlan for now) */
@@ -374,7 +398,7 @@ pkt_burst_checksum_forward(struct fwd_stream *fs)
 					sizeof(struct vxlan_hdr));
 
 				parse_ethernet(eth_hdr, &ethertype, &l2_len,
-					&l3_len, &l4_proto);
+					&l3_len, &l4_proto, &l4_len);
 				l3_hdr = (char *)eth_hdr + l2_len;
 			}
 		}
@@ -388,11 +412,12 @@ pkt_burst_checksum_forward(struct fwd_stream *fs)
 
 		/* step 3: depending on user command line configuration,
 		 * recompute checksum either in software or flag the
-		 * mbuf to offload the calculation to the NIC */
+		 * mbuf to offload the calculation to the NIC. If TSO
+		 * is configured, prepare the mbuf for TCP segmentation. */
 
 		/* process checksums of inner headers first */
 		ol_flags |= process_inner_cksums(l3_hdr, ethertype,
-			l3_len, l4_proto, testpmd_ol_flags);
+			l3_len, l4_proto, tso_segsz, testpmd_ol_flags);
 
 		/* Then process outer headers if any. Note that the software
 		 * checksum will be wrong if one of the inner checksums is
@@ -421,6 +446,7 @@ pkt_burst_checksum_forward(struct fwd_stream *fs)
 					sizeof(struct udp_hdr) +
 					sizeof(struct vxlan_hdr) + l2_len;
 				m->l3_len = l3_len;
+				m->l4_len = l4_len;
 			}
 		} else {
 			/* this is only useful if an offload flag is
@@ -428,7 +454,9 @@ pkt_burst_checksum_forward(struct fwd_stream *fs)
 			 * case */
 			m->l2_len = l2_len;
 			m->l3_len = l3_len;
+			m->l4_len = l4_len;
 		}
+		m->tso_segsz = tso_segsz;
 		m->ol_flags = ol_flags;
 
 	}
diff --git a/app/test-pmd/testpmd.h b/app/test-pmd/testpmd.h
index c753d37..c22863f 100644
--- a/app/test-pmd/testpmd.h
+++ b/app/test-pmd/testpmd.h
@@ -149,6 +149,7 @@ struct rte_port {
 	struct fwd_stream       *tx_stream; /**< Port TX stream, if unique */
 	unsigned int            socket_id;  /**< For NUMA support */
 	uint16_t                tx_ol_flags;/**< TX Offload Flags (TESTPMD_TX_OFFLOAD...). */
+	uint16_t                tso_segsz;  /**< MSS for segmentation offload. */
 	uint16_t                tx_vlan_id; /**< Tag Id. in TX VLAN packets. */
 	void                    *fwd_ctx;   /**< Forwarding mode context */
 	uint64_t                rx_bad_ip_csum; /**< rx pkts with bad ip checksum  */
-- 
2.1.0


* [dpdk-dev] [PATCH v2 13/13] testpmd: add a verbose mode csum forward engine
  2014-11-14 17:03 ` [dpdk-dev] [PATCH v2 00/13] " Olivier Matz
                     ` (11 preceding siblings ...)
  2014-11-14 17:03   ` [dpdk-dev] [PATCH v2 12/13] testpmd: support TSO in csum forward engine Olivier Matz
@ 2014-11-14 17:03   ` Olivier Matz
  2014-11-20 22:58   ` [dpdk-dev] [PATCH v3 00/13] add TSO support Olivier Matz
  13 siblings, 0 replies; 112+ messages in thread
From: Olivier Matz @ 2014-11-14 17:03 UTC (permalink / raw)
  To: dev; +Cc: jigsaw

If the user specifies 'set verbose 1' on the testpmd command line,
the csum forward engine will dump some information about received
and transmitted packets, especially which flags are set and what
values are assigned to l2_len, l3_len, l4_len and tso_segsz.
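
The flag dump in this patch compares each flag under a mask rather than
testing single bits, because the L4 checksum request is a multi-bit field
(the driver also switches on ol_flags & PKT_TX_L4_MASK). A minimal
standalone sketch of that technique follows. The EX_* values and the
ex_decode_tx_flags() helper are made up for illustration and are not the
real PKT_TX_* definitions.

```c
#include <stddef.h>
#include <stdint.h>

/* Illustrative values only, NOT the real PKT_TX_* definitions. The L4
 * checksum type is a 2-bit field, the other flags are single bits. */
#define EX_TX_IP_CKSUM   (1ULL << 0)
#define EX_TX_TCP_CKSUM  (1ULL << 1)
#define EX_TX_SCTP_CKSUM (2ULL << 1)
#define EX_TX_UDP_CKSUM  (3ULL << 1)
#define EX_TX_L4_MASK    (3ULL << 1)
#define EX_TX_TCP_SEG    (1ULL << 3)

struct ex_flag {
	uint64_t flag;    /* value expected under the mask */
	uint64_t mask;    /* bits to look at */
	const char *name; /* name to report */
};

static const struct ex_flag ex_tx_flags[] = {
	{ EX_TX_IP_CKSUM, EX_TX_IP_CKSUM, "IP_CKSUM" },
	{ EX_TX_UDP_CKSUM, EX_TX_L4_MASK, "UDP_CKSUM" },
	{ EX_TX_TCP_CKSUM, EX_TX_L4_MASK, "TCP_CKSUM" },
	{ EX_TX_SCTP_CKSUM, EX_TX_L4_MASK, "SCTP_CKSUM" },
	{ EX_TX_TCP_SEG, EX_TX_TCP_SEG, "TCP_SEG" },
};

/* Store in names[] the name of every flag set in ol_flags (at most max
 * entries) and return how many names were stored. */
static inline size_t
ex_decode_tx_flags(uint64_t ol_flags, const char *names[], size_t max)
{
	size_t i, n = 0;

	for (i = 0; i < sizeof(ex_tx_flags) / sizeof(ex_tx_flags[0]); i++) {
		if ((ol_flags & ex_tx_flags[i].mask) == ex_tx_flags[i].flag &&
				n < max)
			names[n++] = ex_tx_flags[i].name;
	}
	return n;
}
```

With these example values, a plain bit test for EX_TX_TCP_CKSUM would also
match a packet flagged EX_TX_UDP_CKSUM, which is what the masked
comparison avoids.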

This can help someone implementing TSO or hardware checksum offload to
understand how to configure the mbufs.

Example of output for one packet:

 --------------
 rx: l2_len=14 ethertype=800 l3_len=20 l4_proto=6 l4_len=20
 tx: m->l2_len=14 m->l3_len=20 m->l4_len=20
 tx: m->tso_segsz=800
 tx: flags=PKT_TX_IP_CKSUM PKT_TX_TCP_SEG
 --------------
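
The l4_len=20 value in the output above comes from the TCP data offset
field, as parsed in the previous patch with (data_off & 0xf0) >> 2. The
short sketch below isolates that computation; tcp_hdr_len() is a
hypothetical name used only for this example.

```c
#include <stdint.h>

/* Hypothetical helper: the upper 4 bits of the TCP data_off byte hold
 * the header length in 32-bit words. Masking with 0xf0 and shifting
 * right by 2 is the same as (data_off >> 4) * 4, i.e. the length in
 * bytes. */
static inline uint16_t
tcp_hdr_len(uint8_t data_off)
{
	return (uint16_t)((data_off & 0xf0) >> 2);
}
```

A TCP header without options has data_off = 0x50, which gives 20 bytes;
the maximum value 0xf0 gives 60 bytes.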

Signed-off-by: Olivier Matz <olivier.matz@6wind.com>
---
 app/test-pmd/csumonly.c | 51 +++++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 51 insertions(+)

diff --git a/app/test-pmd/csumonly.c b/app/test-pmd/csumonly.c
index 9a2beac..9e8c811 100644
--- a/app/test-pmd/csumonly.c
+++ b/app/test-pmd/csumonly.c
@@ -459,6 +459,57 @@ pkt_burst_checksum_forward(struct fwd_stream *fs)
 		m->tso_segsz = tso_segsz;
 		m->ol_flags = ol_flags;
 
+		/* if verbose mode is enabled, dump debug info */
+		if (verbose_level > 0) {
+			struct {
+				uint64_t flag;
+				uint64_t mask;
+			} tx_flags[] = {
+				{ PKT_TX_IP_CKSUM, PKT_TX_IP_CKSUM },
+				{ PKT_TX_UDP_CKSUM, PKT_TX_L4_MASK },
+				{ PKT_TX_TCP_CKSUM, PKT_TX_L4_MASK },
+				{ PKT_TX_SCTP_CKSUM, PKT_TX_L4_MASK },
+				{ PKT_TX_VXLAN_CKSUM, PKT_TX_VXLAN_CKSUM },
+				{ PKT_TX_TCP_SEG, PKT_TX_TCP_SEG },
+			};
+			unsigned j;
+			const char *name;
+
+			printf("-----------------\n");
+			/* dump rx parsed packet info */
+			printf("rx: l2_len=%d ethertype=%x l3_len=%d "
+				"l4_proto=%d l4_len=%d\n",
+				l2_len, rte_be_to_cpu_16(ethertype),
+				l3_len, l4_proto, l4_len);
+			if (tunnel == 1)
+				printf("rx: outer_l2_len=%d outer_ethertype=%x "
+					"outer_l3_len=%d\n", outer_l2_len,
+					rte_be_to_cpu_16(outer_ethertype),
+					outer_l3_len);
+			/* dump tx packet info */
+			if ((testpmd_ol_flags & (TESTPMD_TX_OFFLOAD_IP_CKSUM |
+						TESTPMD_TX_OFFLOAD_UDP_CKSUM |
+						TESTPMD_TX_OFFLOAD_TCP_CKSUM |
+						TESTPMD_TX_OFFLOAD_SCTP_CKSUM)) ||
+				tso_segsz != 0)
+				printf("tx: m->l2_len=%d m->l3_len=%d "
+					"m->l4_len=%d\n",
+					m->l2_len, m->l3_len, m->l4_len);
+			if ((tunnel == 1) &&
+				(testpmd_ol_flags & TESTPMD_TX_OFFLOAD_VXLAN_CKSUM))
+				printf("tx: m->inner_l2_len=%d m->inner_l3_len=%d\n",
+					m->inner_l2_len, m->inner_l3_len);
+			if (tso_segsz != 0)
+				printf("tx: m->tso_segsz=%d\n", m->tso_segsz);
+			printf("tx: flags=");
+			for (j = 0; j < sizeof(tx_flags)/sizeof(*tx_flags); j++) {
+				name = rte_get_tx_ol_flag_name(tx_flags[j].flag);
+				if ((m->ol_flags & tx_flags[j].mask) ==
+					tx_flags[j].flag)
+					printf("%s ", name);
+			}
+			printf("\n");
+		}
 	}
 	nb_tx = rte_eth_tx_burst(fs->tx_port, fs->tx_queue, pkts_burst, nb_rx);
 	fs->tx_packets += nb_tx;
-- 
2.1.0


* Re: [dpdk-dev] [PATCH v2 08/13] testpmd: rework csum forward engine
  2014-11-14 17:03   ` [dpdk-dev] [PATCH v2 08/13] testpmd: rework csum forward engine Olivier Matz
@ 2014-11-17  8:11     ` Liu, Jijiang
  2014-11-17 13:00       ` Olivier MATZ
  0 siblings, 1 reply; 112+ messages in thread
From: Liu, Jijiang @ 2014-11-17  8:11 UTC (permalink / raw)
  To: Olivier Matz; +Cc: dev, jigsaw



> -----Original Message-----
> From: Olivier Matz [mailto:olivier.matz@6wind.com]
> Sent: Saturday, November 15, 2014 1:03 AM
> To: dev@dpdk.org
> Cc: olivier.matz@6wind.com; Walukiewicz, Miroslaw; Liu, Jijiang; Liu, Yong;
> jigsaw@gmail.com; Richardson, Bruce
> Subject: [PATCH v2 08/13] testpmd: rework csum forward engine
> 
> The csum forward engine was becoming too complex to be used and extended
> (the next commits want to add the support of TSO):
> 
> - no explanation about what the code does
> - code is not factorized, lots of code duplicated, especially between
>   ipv4/ipv6
> - user command line api: use of bitmasks that need to be calculated by
>   the user
> - the user flags don't have the same semantic:
>   - for legacy IP/UDP/TCP/SCTP, it selects software or hardware checksum
>   - for other (vxlan), it selects between hardware checksum or no
>     checksum
> - the code relies too much on flags set by the driver without software
>   alternative (ex: PKT_RX_TUNNEL_IPV4_HDR). It is nice to be able to
>   compare a software implementation with the hardware offload.
> 
> This commit tries to fix these issues, and provide a simple definition of what is
> done by the forward engine:
> 
>  * Receive a burst of packets, and for supported packet types:
>  *  - modify the IPs
>  *  - reprocess the checksum in SW or HW, depending on testpmd command line
>  *    configuration
>  * Then packets are transmitted on the output port.
>  *
>  * Supported packets are:
>  *   Ether / (vlan) / IP|IP6 / UDP|TCP|SCTP .
>  *   Ether / (vlan) / IP|IP6 / UDP / VxLAN / Ether / IP|IP6 / UDP|TCP|SCTP
>  *
>  * The network parser supposes that the packet is contiguous, which may
>  * not be the case in real life.
> 
> Signed-off-by: Olivier Matz <olivier.matz@6wind.com>
> ---
>  app/test-pmd/cmdline.c  | 151 ++++++++---
>  app/test-pmd/config.c   |  11 -
>  app/test-pmd/csumonly.c | 668 ++++++++++++++++++++++-------------------------
>  app/test-pmd/testpmd.h  |  17 +-
>  4 files changed, 423 insertions(+), 424 deletions(-)
> 
> diff --git a/app/test-pmd/cmdline.c b/app/test-pmd/cmdline.c index
> 4c3fc76..0361e58 100644
> --- a/app/test-pmd/cmdline.c
> +++ b/app/test-pmd/cmdline.c
> @@ -310,19 +310,14 @@ static void cmd_help_long_parsed(void
> *parsed_result,
>  			"    Disable hardware insertion of a VLAN header in"
>  			" packets sent on a port.\n\n"
> 
> -			"tx_checksum set (mask) (port_id)\n"
> -			"    Enable hardware insertion of checksum offload with"
> -			" the 8-bit mask, 0~0xff, in packets sent on a port.\n"
> -			"        bit 0 - insert ip   checksum offload if set\n"
> -			"        bit 1 - insert udp  checksum offload if set\n"
> -			"        bit 2 - insert tcp  checksum offload if set\n"
> -			"        bit 3 - insert sctp checksum offload if set\n"
> -			"        bit 4 - insert inner ip  checksum offload if set\n"
> -			"        bit 5 - insert inner udp checksum offload if set\n"
> -			"        bit 6 - insert inner tcp checksum offload if set\n"
> -			"        bit 7 - insert inner sctp checksum offload if set\n"
> +			"tx_cksum set (ip|udp|tcp|sctp|vxlan) (hw|sw)
> (port_id)\n"
> +			"    Enable hardware calculation of checksum with when"
> +			" transmitting a packet using 'csum' forward engine.\n"
>  			"    Please check the NIC datasheet for HW limits.\n\n"
> 
> +			"tx_checksum show (port_id)\n"
> +			"    Display tx checksum offload configuration\n\n"
> +
>  			"set fwd (%s)\n"
>  			"    Set packet forwarding mode.\n\n"
> 
> @@ -2738,48 +2733,131 @@ cmdline_parse_inst_t cmd_tx_vlan_reset = {
> 
> 
>  /* *** ENABLE HARDWARE INSERTION OF CHECKSUM IN TX PACKETS *** */ -
> struct cmd_tx_cksum_set_result {
> +struct cmd_tx_cksum_result {
>  	cmdline_fixed_string_t tx_cksum;
> -	cmdline_fixed_string_t set;
> -	uint8_t cksum_mask;
> +	cmdline_fixed_string_t mode;
> +	cmdline_fixed_string_t proto;
> +	cmdline_fixed_string_t hwsw;
>  	uint8_t port_id;
>  };
> 
>  static void
> -cmd_tx_cksum_set_parsed(void *parsed_result,
> +cmd_tx_cksum_parsed(void *parsed_result,
>  		       __attribute__((unused)) struct cmdline *cl,
>  		       __attribute__((unused)) void *data)  {
> -	struct cmd_tx_cksum_set_result *res = parsed_result;
> +	struct cmd_tx_cksum_result *res = parsed_result;
> +	int hw = 0;
> +	uint16_t ol_flags, mask = 0;
> +	struct rte_eth_dev_info dev_info;
> +
> +	if (port_id_is_invalid(res->port_id)) {
> +		printf("invalid port %d\n", res->port_id);
> +		return;
> +	}
> 
> -	tx_cksum_set(res->port_id, res->cksum_mask);
> +	if (!strcmp(res->mode, "set")) {
> +
> +		if (!strcmp(res->hwsw, "hw"))
> +			hw = 1;
> +
> +		if (!strcmp(res->proto, "ip")) {
> +			mask = TESTPMD_TX_OFFLOAD_IP_CKSUM;
> +		} else if (!strcmp(res->proto, "udp")) {
> +			mask = TESTPMD_TX_OFFLOAD_UDP_CKSUM;
> +		} else if (!strcmp(res->proto, "tcp")) {
> +			mask = TESTPMD_TX_OFFLOAD_TCP_CKSUM;
> +		} else if (!strcmp(res->proto, "sctp")) {
> +			mask = TESTPMD_TX_OFFLOAD_SCTP_CKSUM;
> +		} else if (!strcmp(res->proto, "vxlan")) {
> +			mask = TESTPMD_TX_OFFLOAD_VXLAN_CKSUM;
> +		}
> +
> +		if (hw)
> +			ports[res->port_id].tx_ol_flags |= mask;
> +		else
> +			ports[res->port_id].tx_ol_flags &= (~mask);
> +	}
> +
> +	ol_flags = ports[res->port_id].tx_ol_flags;
> +	printf("IP checksum offload is %s\n",
> +		(ol_flags & TESTPMD_TX_OFFLOAD_IP_CKSUM) ? "hw" : "sw");
> +	printf("UDP checksum offload is %s\n",
> +		(ol_flags & TESTPMD_TX_OFFLOAD_UDP_CKSUM) ? "hw" :
> "sw");
> +	printf("TCP checksum offload is %s\n",
> +		(ol_flags & TESTPMD_TX_OFFLOAD_TCP_CKSUM) ? "hw" : "sw");
> +	printf("SCTP checksum offload is %s\n",
> +		(ol_flags & TESTPMD_TX_OFFLOAD_SCTP_CKSUM) ? "hw" :
> "sw");
> +	printf("VxLAN checksum offload is %s\n",
> +		(ol_flags & TESTPMD_TX_OFFLOAD_VXLAN_CKSUM) ? "hw" :
> "sw");
> +
> +	/* display warnings if configuration is not supported by the NIC */
> +	rte_eth_dev_info_get(res->port_id, &dev_info);
> +	if ((ol_flags & TESTPMD_TX_OFFLOAD_IP_CKSUM) &&
> +		(dev_info.tx_offload_capa & DEV_TX_OFFLOAD_IPV4_CKSUM)
> == 0) {
> +		printf("Warning: hardware IP checksum enabled but not "
> +			"supported by port %d\n", res->port_id);
> +	}
> +	if ((ol_flags & TESTPMD_TX_OFFLOAD_UDP_CKSUM) &&
> +		(dev_info.tx_offload_capa & DEV_TX_OFFLOAD_UDP_CKSUM)
> == 0) {
> +		printf("Warning: hardware UDP checksum enabled but not "
> +			"supported by port %d\n", res->port_id);
> +	}
> +	if ((ol_flags & TESTPMD_TX_OFFLOAD_TCP_CKSUM) &&
> +		(dev_info.tx_offload_capa & DEV_TX_OFFLOAD_TCP_CKSUM) ==
> 0) {
> +		printf("Warning: hardware TCP checksum enabled but not "
> +			"supported by port %d\n", res->port_id);
> +	}
> +	if ((ol_flags & TESTPMD_TX_OFFLOAD_SCTP_CKSUM) &&
> +		(dev_info.tx_offload_capa & DEV_TX_OFFLOAD_SCTP_CKSUM)
> == 0) {
> +		printf("Warning: hardware SCTP checksum enabled but not "
> +			"supported by port %d\n", res->port_id);
> +	}
>  }
> 
> -cmdline_parse_token_string_t cmd_tx_cksum_set_tx_cksum =
> -	TOKEN_STRING_INITIALIZER(struct cmd_tx_cksum_set_result,
> +cmdline_parse_token_string_t cmd_tx_cksum_tx_cksum =
> +	TOKEN_STRING_INITIALIZER(struct cmd_tx_cksum_result,
>  				tx_cksum, "tx_checksum");
> -cmdline_parse_token_string_t cmd_tx_cksum_set_set =
> -	TOKEN_STRING_INITIALIZER(struct cmd_tx_cksum_set_result,
> -				set, "set");
> -cmdline_parse_token_num_t cmd_tx_cksum_set_cksum_mask =
> -	TOKEN_NUM_INITIALIZER(struct cmd_tx_cksum_set_result,
> -				cksum_mask, UINT8);
> -cmdline_parse_token_num_t cmd_tx_cksum_set_portid =
> -	TOKEN_NUM_INITIALIZER(struct cmd_tx_cksum_set_result,
> +cmdline_parse_token_string_t cmd_tx_cksum_mode =
> +	TOKEN_STRING_INITIALIZER(struct cmd_tx_cksum_result,
> +				mode, "set");
> +cmdline_parse_token_string_t cmd_tx_cksum_proto =
> +	TOKEN_STRING_INITIALIZER(struct cmd_tx_cksum_result,
> +				proto, "ip#tcp#udp#sctp#vxlan");
> +cmdline_parse_token_string_t cmd_tx_cksum_hwsw =
> +	TOKEN_STRING_INITIALIZER(struct cmd_tx_cksum_result,
> +				hwsw, "hw#sw");
> +cmdline_parse_token_num_t cmd_tx_cksum_portid =
> +	TOKEN_NUM_INITIALIZER(struct cmd_tx_cksum_result,
>  				port_id, UINT8);
> 
>  cmdline_parse_inst_t cmd_tx_cksum_set = {
> -	.f = cmd_tx_cksum_set_parsed,
> +	.f = cmd_tx_cksum_parsed,
> +	.data = NULL,
> +	.help_str = "enable/disable hardware calculation of L3/L4 checksum
> when "
> +		"using csum forward engine: tx_cksum set
> ip|tcp|udp|sctp|vxlan hw|sw <port>",
> +	.tokens = {
> +		(void *)&cmd_tx_cksum_tx_cksum,
> +		(void *)&cmd_tx_cksum_mode,
> +		(void *)&cmd_tx_cksum_proto,
> +		(void *)&cmd_tx_cksum_hwsw,
> +		(void *)&cmd_tx_cksum_portid,
> +		NULL,
> +	},
> +};
> +
> +cmdline_parse_token_string_t cmd_tx_cksum_mode_show =
> +	TOKEN_STRING_INITIALIZER(struct cmd_tx_cksum_result,
> +				mode, "show");
> +
> +cmdline_parse_inst_t cmd_tx_cksum_show = {
> +	.f = cmd_tx_cksum_parsed,
>  	.data = NULL,
> -	.help_str = "enable hardware insertion of L3/L4checksum with a given "
> -	"mask in packets sent on a port, the bit mapping is given as, Bit 0 for ip, "
> -	"Bit 1 for UDP, Bit 2 for TCP, Bit 3 for SCTP, Bit 4 for inner ip, "
> -	"Bit 5 for inner UDP, Bit 6 for inner TCP, Bit 7 for inner SCTP",
> +	.help_str = "show checksum offload configuration: tx_cksum show <port>",
>  	.tokens = {
> -		(void *)&cmd_tx_cksum_set_tx_cksum,
> -		(void *)&cmd_tx_cksum_set_set,
> -		(void *)&cmd_tx_cksum_set_cksum_mask,
> -		(void *)&cmd_tx_cksum_set_portid,
> +		(void *)&cmd_tx_cksum_tx_cksum,
> +		(void *)&cmd_tx_cksum_mode_show,
> +		(void *)&cmd_tx_cksum_portid,
>  		NULL,
>  	},
>  };
> @@ -7796,6 +7874,7 @@ cmdline_parse_ctx_t main_ctx[] = {
>  	(cmdline_parse_inst_t *)&cmd_tx_vlan_reset,
>  	(cmdline_parse_inst_t *)&cmd_tx_vlan_set_pvid,
>  	(cmdline_parse_inst_t *)&cmd_tx_cksum_set,
> +	(cmdline_parse_inst_t *)&cmd_tx_cksum_show,
>  	(cmdline_parse_inst_t *)&cmd_link_flow_control_set,
>  	(cmdline_parse_inst_t *)&cmd_link_flow_control_set_rx,
>  	(cmdline_parse_inst_t *)&cmd_link_flow_control_set_tx,
> diff --git a/app/test-pmd/config.c b/app/test-pmd/config.c
> index 34b6fdb..16d62ab 100644
> --- a/app/test-pmd/config.c
> +++ b/app/test-pmd/config.c
> @@ -1744,17 +1744,6 @@ set_qmap(portid_t port_id, uint8_t is_rx, uint16_t
> queue_id, uint8_t map_value)  }
> 
>  void
> -tx_cksum_set(portid_t port_id, uint64_t ol_flags) -{
> -	uint64_t tx_ol_flags;
> -	if (port_id_is_invalid(port_id))
> -		return;
> -	/* Clear last 8 bits and then set L3/4 checksum mask again */
> -	tx_ol_flags = ports[port_id].tx_ol_flags & (~0x0FFull);
> -	ports[port_id].tx_ol_flags = ((ol_flags & 0xff) | tx_ol_flags);
> -}
> -
> -void
>  fdir_add_signature_filter(portid_t port_id, uint8_t queue_id,
>  			  struct rte_fdir_filter *fdir_filter)  { diff --git a/app/test-
> pmd/csumonly.c b/app/test-pmd/csumonly.c index 743094a..dda5d9e 100644
> --- a/app/test-pmd/csumonly.c
> +++ b/app/test-pmd/csumonly.c
> @@ -73,13 +73,19 @@
>  #include <rte_string_fns.h>
>  #include "testpmd.h"
> 
> -
> -
>  #define IP_DEFTTL  64   /* from RFC 1340. */
>  #define IP_VERSION 0x40
>  #define IP_HDRLEN  0x05 /* default IP header length == five 32-bits words. */
> #define IP_VHL_DEF (IP_VERSION | IP_HDRLEN)
> 
> +/* we cannot use htons() from arpa/inet.h due to name conflicts, and we
> + * cannot use rte_cpu_to_be_16() on a constant in a switch/case */
> +#if __BYTE_ORDER == __LITTLE_ENDIAN
> +#define _htons(x) ((uint16_t)((((x) & 0x00ffU) << 8) | (((x) & 0xff00U) >> 8)))
> +#else
> +#define _htons(x) (x)
> +#endif
> +
>  static inline uint16_t
>  get_16b_sum(uint16_t *ptr16, uint32_t nr)  { @@ -112,7 +118,7 @@
> get_ipv4_cksum(struct ipv4_hdr *ipv4_hdr)
> 
> 
>  static inline uint16_t
> -get_ipv4_psd_sum (struct ipv4_hdr * ip_hdr)
> +get_ipv4_psd_sum(struct ipv4_hdr *ip_hdr)
>  {
>  	/* Pseudo Header for IPv4/UDP/TCP checksum */
>  	union ipv4_psd_header {
> @@ -136,7 +142,7 @@ get_ipv4_psd_sum (struct ipv4_hdr * ip_hdr)  }
> 
>  static inline uint16_t
> -get_ipv6_psd_sum (struct ipv6_hdr * ip_hdr)
> +get_ipv6_psd_sum(struct ipv6_hdr *ip_hdr)
>  {
>  	/* Pseudo Header for IPv6/UDP/TCP checksum */
>  	union ipv6_psd_header {
> @@ -158,6 +164,15 @@ get_ipv6_psd_sum (struct ipv6_hdr * ip_hdr)
>  	return get_16b_sum(psd_hdr.u16_arr, sizeof(psd_hdr));  }
> 
> +static uint16_t
> +get_psd_sum(void *l3_hdr, uint16_t ethertype) {
> +	if (ethertype == _htons(ETHER_TYPE_IPv4))
> +		return get_ipv4_psd_sum(l3_hdr);
> +	else /* assume ethertype == ETHER_TYPE_IPv6 */
> +		return get_ipv6_psd_sum(l3_hdr);
> +}
> +
>  static inline uint16_t
>  get_ipv4_udptcp_checksum(struct ipv4_hdr *ipv4_hdr, uint16_t *l4_hdr)  { @@ -
> 174,7 +189,6 @@ get_ipv4_udptcp_checksum(struct ipv4_hdr *ipv4_hdr,
> uint16_t *l4_hdr)
>  	if (cksum == 0)
>  		cksum = 0xffff;
>  	return (uint16_t)cksum;
> -
>  }
> 
>  static inline uint16_t
> @@ -196,48 +210,218 @@ get_ipv6_udptcp_checksum(struct ipv6_hdr
> *ipv6_hdr, uint16_t *l4_hdr)
>  	return (uint16_t)cksum;
>  }
> 
> +static uint16_t
> +get_udptcp_checksum(void *l3_hdr, void *l4_hdr, uint16_t ethertype) {
> +	if (ethertype == _htons(ETHER_TYPE_IPv4))
> +		return get_ipv4_udptcp_checksum(l3_hdr, l4_hdr);
> +	else /* assume ethertype == ETHER_TYPE_IPv6 */
> +		return get_ipv6_udptcp_checksum(l3_hdr, l4_hdr); }
> 
>  /*
> - * Forwarding of packets. Change the checksum field with HW or SW methods
> - * The HW/SW method selection depends on the ol_flags on every packet
> + * Parse an ethernet header to fill the ethertype, l2_len, l3_len and
> + * ipproto. This function is able to recognize IPv4/IPv6 with one
> +optional vlan
> + * header.
> + */
> +static void
> +parse_ethernet(struct ether_hdr *eth_hdr, uint16_t *ethertype, uint16_t
> *l2_len,
> +	uint16_t *l3_len, uint8_t *l4_proto)
> +{
> +	struct ipv4_hdr *ipv4_hdr;
> +	struct ipv6_hdr *ipv6_hdr;
> +
> +	*l2_len = sizeof(struct ether_hdr);
> +	*ethertype = eth_hdr->ether_type;
> +
> +	if (*ethertype == _htons(ETHER_TYPE_VLAN)) {
> +		struct vlan_hdr *vlan_hdr = (struct vlan_hdr *)(eth_hdr + 1);
> +
> +		*l2_len  += sizeof(struct vlan_hdr);
> +		*ethertype = vlan_hdr->eth_proto;
> +	}
> +
> +	switch (*ethertype) {
> +	case _htons(ETHER_TYPE_IPv4):
> +		ipv4_hdr = (struct ipv4_hdr *) ((char *)eth_hdr + *l2_len);
> +		*l3_len = (ipv4_hdr->version_ihl & 0x0f) * 4;
> +		*l4_proto = ipv4_hdr->next_proto_id;
> +		break;
> +	case _htons(ETHER_TYPE_IPv6):
> +		ipv6_hdr = (struct ipv6_hdr *) ((char *)eth_hdr + *l2_len);
> +		*l3_len = sizeof(struct ipv6_hdr) ;
> +		*l4_proto = ipv6_hdr->proto;
> +		break;
> +	default:
> +		*l3_len = 0;
> +		*l4_proto = 0;
> +		break;
> +	}
> +}
> +
> +/* modify the IPv4 or IPv6 source address of a packet */
> +static void
> +change_ip_addresses(void *l3_hdr, uint16_t ethertype)
> +{
> +	struct ipv4_hdr *ipv4_hdr = l3_hdr;
> +	struct ipv6_hdr *ipv6_hdr = l3_hdr;
> +
> +	if (ethertype == _htons(ETHER_TYPE_IPv4)) {
> +		ipv4_hdr->src_addr =
> +			rte_cpu_to_be_32(rte_be_to_cpu_32(ipv4_hdr-
> >src_addr) + 1);
> +	}
> +	else if (ethertype == _htons(ETHER_TYPE_IPv6)) {
> +		ipv6_hdr->src_addr[15] = ipv6_hdr->src_addr[15] + 1;
> +	}
> +}
> +
> +/* if possible, calculate the checksum of a packet in hw or sw,
> + * depending on the testpmd command line configuration */
> +static uint64_t
> +process_inner_cksums(void *l3_hdr, uint16_t ethertype, uint16_t l3_len,
> +	uint8_t l4_proto, uint16_t testpmd_ol_flags)
> +{
> +	struct ipv4_hdr *ipv4_hdr = l3_hdr;
> +	struct udp_hdr *udp_hdr;
> +	struct tcp_hdr *tcp_hdr;
> +	struct sctp_hdr *sctp_hdr;
> +	uint64_t ol_flags = 0;
> +
> +	if (ethertype == _htons(ETHER_TYPE_IPv4)) {
> +		ipv4_hdr = l3_hdr;
> +		ipv4_hdr->hdr_checksum = 0;
> +
> +		if (testpmd_ol_flags & TESTPMD_TX_OFFLOAD_IP_CKSUM)
> +			ol_flags |= PKT_TX_IP_CKSUM;
> +		else
> +			ipv4_hdr->hdr_checksum = get_ipv4_cksum(ipv4_hdr);
> +
> +	}
> +	else if (ethertype != _htons(ETHER_TYPE_IPv6))
> +		return 0; /* packet type not supported nothing to do */
> +
> +	if (l4_proto == IPPROTO_UDP) {
> +		udp_hdr = (struct udp_hdr *)((char *)l3_hdr + l3_len);
> +		/* do not recalculate udp cksum if it was 0 */
> +		if (udp_hdr->dgram_cksum != 0) {
> +			udp_hdr->dgram_cksum = 0;
> +			if (testpmd_ol_flags &
> TESTPMD_TX_OFFLOAD_UDP_CKSUM) {
> +				ol_flags |= PKT_TX_UDP_CKSUM;
> +				udp_hdr->dgram_cksum = get_psd_sum(l3_hdr,
> +					ethertype);
> +			}
> +			else {
> +				udp_hdr->dgram_cksum =
> +					get_udptcp_checksum(l3_hdr, udp_hdr,
> +						ethertype);
> +			}
> +		}
> +	}
> +	else if (l4_proto == IPPROTO_TCP) {
> +		tcp_hdr = (struct tcp_hdr *)((char *)l3_hdr + l3_len);
> +		tcp_hdr->cksum = 0;
> +		if (testpmd_ol_flags & TESTPMD_TX_OFFLOAD_TCP_CKSUM) {
> +			ol_flags |= PKT_TX_TCP_CKSUM;
> +			tcp_hdr->cksum = get_psd_sum(l3_hdr, ethertype);
> +		}
> +		else {
> +			tcp_hdr->cksum =
> +				get_udptcp_checksum(l3_hdr, tcp_hdr,
> ethertype);
> +		}
> +	}
> +	else if (l4_proto == IPPROTO_SCTP) {
> +		sctp_hdr = (struct sctp_hdr *)((char *)l3_hdr + l3_len);
> +		sctp_hdr->cksum = 0;
> +		/* sctp payload must be a multiple of 4 to be
> +		 * offloaded */
> +		if ((testpmd_ol_flags & TESTPMD_TX_OFFLOAD_SCTP_CKSUM)
> &&
> +			((ipv4_hdr->total_length & 0x3) == 0)) {
> +			ol_flags |= PKT_TX_SCTP_CKSUM;
> +		}
> +		else {
> +			/* XXX implement CRC32c, example available in
> +			 * RFC3309 */
> +		}
> +	}
> +
> +	return ol_flags;
> +}
> +
> +/* Calculate the checksum of outer header (only vxlan is supported,
> + * meaning IP + UDP). The caller already checked that it's a vxlan
> + * packet */
> +static uint64_t
> +process_outer_cksums(void *outer_l3_hdr, uint16_t outer_ethertype,
> +	uint16_t outer_l3_len, uint16_t testpmd_ol_flags) {
> +	struct ipv4_hdr *ipv4_hdr = outer_l3_hdr;
> +	struct ipv6_hdr *ipv6_hdr = outer_l3_hdr;
> +	struct udp_hdr *udp_hdr;
> +	uint64_t ol_flags = 0;
> +
> +	if (testpmd_ol_flags & TESTPMD_TX_OFFLOAD_VXLAN_CKSUM)
> +		ol_flags |= PKT_TX_VXLAN_CKSUM;
> +
> +	if (outer_ethertype == _htons(ETHER_TYPE_IPv4)) {
> +		ipv4_hdr->hdr_checksum = 0;
> +
> +		if ((testpmd_ol_flags & TESTPMD_TX_OFFLOAD_VXLAN_CKSUM)
> == 0)
> +			ipv4_hdr->hdr_checksum = get_ipv4_cksum(ipv4_hdr);
> +	}

As I mentioned, we should check TESTPMD_TX_OFFLOAD_IP_CKSUM, not TESTPMD_TX_OFFLOAD_VXLAN_CKSUM, to decide whether the outer IP checksum is offloaded.
In other words, even for a VXLAN packet, the outer IP TX checksum offload is still needed if TESTPMD_TX_OFFLOAD_IP_CKSUM is set.
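
To make the suggestion concrete, here is a rough sketch of the decision I have in mind. It is not a tested change to csumonly.c: the helper name outer_ip_cksum_flags() and the SK_* constants are stand-ins invented for this example (the real flags are defined in testpmd.h and rte_mbuf.h, and their values here are assumed). It only shows the outer IPv4 checksum being keyed on the IP flag while the VXLAN flag is handled independently.

```c
#include <stdint.h>

/* Stand-in values for illustration only; the real definitions live in
 * testpmd.h (TESTPMD_TX_OFFLOAD_*) and rte_mbuf.h (PKT_TX_*). */
#define SK_TESTPMD_TX_OFFLOAD_IP_CKSUM    0x0001
#define SK_TESTPMD_TX_OFFLOAD_VXLAN_CKSUM 0x0010
#define SK_PKT_TX_IP_CKSUM                (1ULL << 0)
#define SK_PKT_TX_VXLAN_CKSUM             (1ULL << 1)

/*
 * Hypothetical helper: return the mbuf offload flags for the outer IPv4
 * header of a tunnelled packet. *sw_cksum is set to 1 when the caller
 * still has to compute the outer IP checksum in software, 0 otherwise.
 */
static uint64_t
outer_ip_cksum_flags(uint16_t testpmd_ol_flags, int *sw_cksum)
{
	uint64_t ol_flags = 0;

	/* the tunnel flag is requested independently of the IP flag */
	if (testpmd_ol_flags & SK_TESTPMD_TX_OFFLOAD_VXLAN_CKSUM)
		ol_flags |= SK_PKT_TX_VXLAN_CKSUM;

	/* outer IP checksum: hardware only if the IP flag is set */
	if (testpmd_ol_flags & SK_TESTPMD_TX_OFFLOAD_IP_CKSUM) {
		ol_flags |= SK_PKT_TX_IP_CKSUM;
		*sw_cksum = 0;
	} else {
		*sw_cksum = 1;
	}

	return ol_flags;
}
```

With this shape, "ip hw" plus "vxlan sw" still offloads the outer IP checksum, and "ip sw" plus "vxlan hw" falls back to get_ipv4_cksum() for the outer header.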

> +	udp_hdr = (struct udp_hdr *)((char *)outer_l3_hdr + outer_l3_len);
> +	/* do not recalculate udp cksum if it was 0 */
> +	if (udp_hdr->dgram_cksum != 0) {
> +		udp_hdr->dgram_cksum = 0;
> +		if ((testpmd_ol_flags & TESTPMD_TX_OFFLOAD_VXLAN_CKSUM)
> == 0) {
> +			if (outer_ethertype == _htons(ETHER_TYPE_IPv4))
> +				udp_hdr->dgram_cksum =
> +					get_ipv4_udptcp_checksum(ipv4_hdr,
> +						(uint16_t *)udp_hdr);
> +			else
> +				udp_hdr->dgram_cksum =
> +					get_ipv6_udptcp_checksum(ipv6_hdr,
> +						(uint16_t *)udp_hdr);
> +		}
> +	}
> +
> +	return ol_flags;
> +}
> +
> +/*
> + * Receive a burst of packets, and for supported packet types:
> + *  - modify the IPs
> + *  - reprocess the checksum in SW or HW, depending on testpmd command line
> + *    configuration
> + * Then packets are transmitted on the output port.
> + *
> + * Supported packets are:
> + *   Ether / (vlan) / IP|IP6 / UDP|TCP|SCTP .
> + *   Ether / (vlan) / IP|IP6 / UDP / VxLAN / Ether / IP|IP6 / UDP|TCP|SCTP
> + *
> + * The network parser supposes that the packet is contiguous, which may
> + * not be the case in real life.
>   */
>  static void
>  pkt_burst_checksum_forward(struct fwd_stream *fs)  {
> -	struct rte_mbuf  *pkts_burst[MAX_PKT_BURST];
> -	struct rte_port  *txp;
> -	struct rte_mbuf  *mb;
> +	struct rte_mbuf *pkts_burst[MAX_PKT_BURST];
> +	struct rte_port *txp;
> +	struct rte_mbuf *m;
>  	struct ether_hdr *eth_hdr;
> -	struct ipv4_hdr  *ipv4_hdr;
> -	struct ether_hdr *inner_eth_hdr;
> -	struct ipv4_hdr  *inner_ipv4_hdr = NULL;
> -	struct ipv6_hdr  *ipv6_hdr;
> -	struct ipv6_hdr  *inner_ipv6_hdr = NULL;
> -	struct udp_hdr   *udp_hdr;
> -	struct udp_hdr   *inner_udp_hdr;
> -	struct tcp_hdr   *tcp_hdr;
> -	struct tcp_hdr   *inner_tcp_hdr;
> -	struct sctp_hdr  *sctp_hdr;
> -	struct sctp_hdr  *inner_sctp_hdr;
> -
> +	void *l3_hdr = NULL, *outer_l3_hdr = NULL; /* can be IPv4 or IPv6 */
> +	struct udp_hdr *udp_hdr;
>  	uint16_t nb_rx;
>  	uint16_t nb_tx;
>  	uint16_t i;
>  	uint64_t ol_flags;
> -	uint64_t pkt_ol_flags;
> -	uint64_t tx_ol_flags;
> -	uint16_t l4_proto;
> -	uint16_t inner_l4_proto = 0;
> -	uint16_t eth_type;
> -	uint8_t  l2_len;
> -	uint8_t  l3_len;
> -	uint8_t  inner_l2_len = 0;
> -	uint8_t  inner_l3_len = 0;
> -
> +	uint16_t testpmd_ol_flags;
> +	uint8_t l4_proto;
> +	uint16_t ethertype = 0, outer_ethertype = 0;
> +	uint16_t  l2_len = 0, l3_len = 0, outer_l2_len = 0, outer_l3_len = 0;
> +	int tunnel = 0;
>  	uint32_t rx_bad_ip_csum;
>  	uint32_t rx_bad_l4_csum;
> -	uint8_t  ipv4_tunnel;
> -	uint8_t  ipv6_tunnel;
> 
>  #ifdef RTE_TEST_PMD_RECORD_CORE_CYCLES
>  	uint64_t start_tsc;
> @@ -249,9 +433,7 @@ pkt_burst_checksum_forward(struct fwd_stream *fs)
>  	start_tsc = rte_rdtsc();
>  #endif
> 
> -	/*
> -	 * Receive a burst of packets and forward them.
> -	 */
> +	/* receive a burst of packet */
>  	nb_rx = rte_eth_rx_burst(fs->rx_port, fs->rx_queue, pkts_burst,
>  				 nb_pkt_per_burst);
>  	if (unlikely(nb_rx == 0))
> @@ -265,348 +447,107 @@ pkt_burst_checksum_forward(struct fwd_stream
> *fs)
>  	rx_bad_l4_csum = 0;
> 
>  	txp = &ports[fs->tx_port];
> -	tx_ol_flags = txp->tx_ol_flags;
> +	testpmd_ol_flags = txp->tx_ol_flags;
> 
>  	for (i = 0; i < nb_rx; i++) {
> 
> -		mb = pkts_burst[i];
> -		l2_len  = sizeof(struct ether_hdr);
> -		pkt_ol_flags = mb->ol_flags;
> -		ol_flags = (pkt_ol_flags & (~PKT_TX_L4_MASK));
> -		ipv4_tunnel = (pkt_ol_flags & PKT_RX_TUNNEL_IPV4_HDR) ?
> -				1 : 0;
> -		ipv6_tunnel = (pkt_ol_flags & PKT_RX_TUNNEL_IPV6_HDR) ?
> -				1 : 0;
> -		eth_hdr = rte_pktmbuf_mtod(mb, struct ether_hdr *);
> -		eth_type = rte_be_to_cpu_16(eth_hdr->ether_type);
> -		if (eth_type == ETHER_TYPE_VLAN) {
> -			/* Only allow single VLAN label here */
> -			l2_len  += sizeof(struct vlan_hdr);
> -			 eth_type = rte_be_to_cpu_16(*(uint16_t *)
> -				((uintptr_t)&eth_hdr->ether_type +
> -				sizeof(struct vlan_hdr)));
> +		ol_flags = 0;
> +		tunnel = 0;
> +		m = pkts_burst[i];
> +
> +		/* Update the L3/L4 checksum error packet statistics */
> +		rx_bad_ip_csum += ((m->ol_flags & PKT_RX_IP_CKSUM_BAD) !=
> 0);
> +		rx_bad_l4_csum += ((m->ol_flags & PKT_RX_L4_CKSUM_BAD) !=
> 0);
> +
> +		/* step 1: dissect packet, parsing optional vlan, ip4/ip6, vxlan
> +		 * and inner headers */
> +
> +		eth_hdr = rte_pktmbuf_mtod(m, struct ether_hdr *);
> +		parse_ethernet(eth_hdr, &ethertype, &l2_len, &l3_len,
> &l4_proto);
> +		l3_hdr = (char *)eth_hdr + l2_len;
> +
> +		/* check if it's a supported tunnel (only vxlan for now) */
> +		if (l4_proto == IPPROTO_UDP) {
> +			udp_hdr = (struct udp_hdr *)((char *)l3_hdr + l3_len);
> +
> +			/* currently, this flag is set by i40e only if the
> +			 * packet is vxlan */
> +			if (((m->ol_flags & PKT_RX_TUNNEL_IPV4_HDR) ||
> +					(m->ol_flags &
> PKT_RX_TUNNEL_IPV6_HDR)))
> +				tunnel = 1;
> +			/* else check udp destination port, 4789 is the default
> +			 * vxlan port (rfc7348) */
> +			else if (udp_hdr->dst_port == _htons(4789))
> +				tunnel = 1;
> +
> +			if (tunnel == 1) {
> +				outer_ethertype = ethertype;
> +				outer_l2_len = l2_len;
> +				outer_l3_len = l3_len;
> +				outer_l3_hdr = l3_hdr;
> +
> +				eth_hdr = (struct ether_hdr *)((char *)udp_hdr +
> +					sizeof(struct udp_hdr) +
> +					sizeof(struct vxlan_hdr));
> +
> +				parse_ethernet(eth_hdr, &ethertype, &l2_len,
> +					&l3_len, &l4_proto);
> +				l3_hdr = (char *)eth_hdr + l2_len;
> +			}
>  		}
> 
> -		/* Update the L3/L4 checksum error packet count  */
> -		rx_bad_ip_csum += (uint16_t) ((pkt_ol_flags &
> PKT_RX_IP_CKSUM_BAD) != 0);
> -		rx_bad_l4_csum += (uint16_t) ((pkt_ol_flags &
> PKT_RX_L4_CKSUM_BAD) != 0);
> -
> -		/*
> -		 * Try to figure out L3 packet type by SW.
> -		 */
> -		if ((pkt_ol_flags & (PKT_RX_IPV4_HDR | PKT_RX_IPV4_HDR_EXT
> |
> -				PKT_RX_IPV6_HDR | PKT_RX_IPV6_HDR_EXT))
> == 0) {
> -			if (eth_type == ETHER_TYPE_IPv4)
> -				pkt_ol_flags |= PKT_RX_IPV4_HDR;
> -			else if (eth_type == ETHER_TYPE_IPv6)
> -				pkt_ol_flags |= PKT_RX_IPV6_HDR;
> -		}
> +		/* step 2: change all source IPs (v4 or v6) so we need
> +		 * to recompute the chksums even if they were correct */
> 
> -		/*
> -		 * Simplify the protocol parsing
> -		 * Assuming the incoming packets format as
> -		 *      Ethernet2 + optional single VLAN
> -		 *      + ipv4 or ipv6
> -		 *      + udp or tcp or sctp or others
> -		 */
> -		if (pkt_ol_flags & (PKT_RX_IPV4_HDR |
> PKT_RX_TUNNEL_IPV4_HDR)) {
> +		change_ip_addresses(l3_hdr, ethertype);
> +		if (tunnel == 1)
> +			change_ip_addresses(outer_l3_hdr, outer_ethertype);
> 
> -			/* Do not support ipv4 option field */
> -			l3_len = sizeof(struct ipv4_hdr) ;
> +		/* step 3: depending on user command line configuration,
> +		 * recompute checksum either in software or flag the
> +		 * mbuf to offload the calculation to the NIC */
> 
> -			ipv4_hdr = (struct ipv4_hdr *) (rte_pktmbuf_mtod(mb,
> -					unsigned char *) + l2_len);
> +		/* process checksums of inner headers first */
> +		ol_flags |= process_inner_cksums(l3_hdr, ethertype,
> +			l3_len, l4_proto, testpmd_ol_flags);
> -			l4_proto = ipv4_hdr->next_proto_id;
> +		/* Then process outer headers if any. Note that the software
> +		 * checksum will be wrong if one of the inner checksums is
> +		 * processed in hardware. */
> +		if (tunnel == 1) {
> +			ol_flags |= process_outer_cksums(outer_l3_hdr,
> +				outer_ethertype, outer_l3_len,
> testpmd_ol_flags);
> +		}
> 
> -			/* Do not delete, this is required by HW*/
> -			ipv4_hdr->hdr_checksum = 0;
> +		/* step 4: fill the mbuf meta data (flags and header lengths) */
> 
> -			if (tx_ol_flags & TESTPMD_TX_OFFLOAD_IP_CKSUM) {
> -				/* HW checksum */
> -				ol_flags |= PKT_TX_IP_CKSUM;
> +		if (tunnel == 1) {
> +			if (testpmd_ol_flags &
> TESTPMD_TX_OFFLOAD_VXLAN_CKSUM) {
> +				m->l2_len = outer_l2_len;
> +				m->l3_len = outer_l3_len;
> +				m->inner_l2_len = l2_len;
> +				m->inner_l3_len = l3_len;
>  			}
>  			else {
> -				ol_flags |= PKT_TX_IPV4;
> -				/* SW checksum calculation */
> -				ipv4_hdr->src_addr++;
> -				ipv4_hdr->hdr_checksum =
> get_ipv4_cksum(ipv4_hdr);
> +				/* if we don't do vxlan cksum in hw,
> +				   outer checksum will be wrong because
> +				   we changed the ip, but it shows that
> +				   we can process the inner header cksum
> +				   in the nic */
> +				m->l2_len = outer_l2_len + outer_l3_len +
> +					sizeof(struct udp_hdr) +
> +					sizeof(struct vxlan_hdr) + l2_len;
> +				m->l3_len = l3_len;
>  			}
> -
> -			if (l4_proto == IPPROTO_UDP) {
> -				udp_hdr = (struct udp_hdr*)
> (rte_pktmbuf_mtod(mb,
> -						unsigned char *) + l2_len +
> l3_len);
> -				if (tx_ol_flags &
> TESTPMD_TX_OFFLOAD_UDP_CKSUM) {
> -					/* HW Offload */
> -					ol_flags |= PKT_TX_UDP_CKSUM;
> -					if (ipv4_tunnel)
> -						udp_hdr->dgram_cksum = 0;
> -					else
> -						/* Pseudo header sum need be
> set properly */
> -						udp_hdr->dgram_cksum =
> -
> 	get_ipv4_psd_sum(ipv4_hdr);
> -				}
> -				else {
> -					/* SW Implementation, clear checksum
> field first */
> -					udp_hdr->dgram_cksum = 0;
> -					udp_hdr->dgram_cksum =
> get_ipv4_udptcp_checksum(ipv4_hdr,
> -
> 	(uint16_t *)udp_hdr);
> -				}
> -
> -				if (ipv4_tunnel) {
> -
> -					uint16_t len;
> -
> -					/* Check if inner L3/L4 checkum flag is
> set */
> -					if (tx_ol_flags &
> TESTPMD_TX_OFFLOAD_INNER_CKSUM_MASK)
> -						ol_flags |=
> PKT_TX_VXLAN_CKSUM;
> -
> -					inner_l2_len  = sizeof(struct ether_hdr);
> -					inner_eth_hdr = (struct ether_hdr *)
> (rte_pktmbuf_mtod(mb,
> -								unsigned char
> *) + l2_len + l3_len
> -								 +
> ETHER_VXLAN_HLEN);
> -
> -					eth_type =
> rte_be_to_cpu_16(inner_eth_hdr->ether_type);
> -					if (eth_type == ETHER_TYPE_VLAN) {
> -						inner_l2_len += sizeof(struct
> vlan_hdr);
> -						eth_type =
> rte_be_to_cpu_16(*(uint16_t *)
> -							((uintptr_t)&eth_hdr-
> >ether_type +
> -								sizeof(struct
> vlan_hdr)));
> -					}
> -
> -					len = l2_len + l3_len +
> ETHER_VXLAN_HLEN + inner_l2_len;
> -					if (eth_type == ETHER_TYPE_IPv4) {
> -						inner_l3_len = sizeof(struct
> ipv4_hdr);
> -						inner_ipv4_hdr = (struct
> ipv4_hdr *) (rte_pktmbuf_mtod(mb,
> -								unsigned char
> *) + len);
> -						inner_l4_proto =
> inner_ipv4_hdr->next_proto_id;
> -
> -						if (tx_ol_flags &
> TESTPMD_TX_OFFLOAD_INNER_IP_CKSUM) {
> -
> -							/* Do not delete, this is
> required by HW*/
> -							inner_ipv4_hdr-
> >hdr_checksum = 0;
> -							ol_flags |=
> PKT_TX_IPV4_CSUM;
> -						}
> -
> -					} else if (eth_type ==
> ETHER_TYPE_IPv6) {
> -						inner_l3_len = sizeof(struct
> ipv6_hdr);
> -						inner_ipv6_hdr = (struct
> ipv6_hdr *) (rte_pktmbuf_mtod(mb,
> -								unsigned char
> *) + len);
> -						inner_l4_proto =
> inner_ipv6_hdr->proto;
> -					}
> -					if ((inner_l4_proto == IPPROTO_UDP)
> &&
> -						(tx_ol_flags &
> TESTPMD_TX_OFFLOAD_INNER_UDP_CKSUM)) {
> -
> -						/* HW Offload */
> -						ol_flags |=
> PKT_TX_UDP_CKSUM;
> -						inner_udp_hdr = (struct
> udp_hdr *) (rte_pktmbuf_mtod(mb,
> -								unsigned char
> *) + len + inner_l3_len);
> -						if (eth_type ==
> ETHER_TYPE_IPv4)
> -							inner_udp_hdr-
> >dgram_cksum = get_ipv4_psd_sum(inner_ipv4_hdr);
> -						else if (eth_type ==
> ETHER_TYPE_IPv6)
> -							inner_udp_hdr-
> >dgram_cksum = get_ipv6_psd_sum(inner_ipv6_hdr);
> -
> -					} else if ((inner_l4_proto ==
> IPPROTO_TCP) &&
> -						(tx_ol_flags &
> TESTPMD_TX_OFFLOAD_INNER_TCP_CKSUM)) {
> -						/* HW Offload */
> -						ol_flags |=
> PKT_TX_TCP_CKSUM;
> -						inner_tcp_hdr = (struct tcp_hdr
> *) (rte_pktmbuf_mtod(mb,
> -								unsigned char
> *) + len + inner_l3_len);
> -						if (eth_type ==
> ETHER_TYPE_IPv4)
> -							inner_tcp_hdr->cksum
> = get_ipv4_psd_sum(inner_ipv4_hdr);
> -						else if (eth_type ==
> ETHER_TYPE_IPv6)
> -							inner_tcp_hdr->cksum
> = get_ipv6_psd_sum(inner_ipv6_hdr);
> -					} else if ((inner_l4_proto ==
> IPPROTO_SCTP) &&
> -						(tx_ol_flags &
> TESTPMD_TX_OFFLOAD_INNER_SCTP_CKSUM)) {
> -						/* HW Offload */
> -						ol_flags |=
> PKT_TX_SCTP_CKSUM;
> -						inner_sctp_hdr = (struct
> sctp_hdr *) (rte_pktmbuf_mtod(mb,
> -								unsigned char
> *) + len + inner_l3_len);
> -						inner_sctp_hdr->cksum = 0;
> -					}
> -
> -				}
> -
> -			} else if (l4_proto == IPPROTO_TCP) {
> -				tcp_hdr = (struct tcp_hdr*)
> (rte_pktmbuf_mtod(mb,
> -						unsigned char *) + l2_len +
> l3_len);
> -				if (tx_ol_flags &
> TESTPMD_TX_OFFLOAD_TCP_CKSUM) {
> -					ol_flags |= PKT_TX_TCP_CKSUM;
> -					tcp_hdr->cksum =
> get_ipv4_psd_sum(ipv4_hdr);
> -				}
> -				else {
> -					tcp_hdr->cksum = 0;
> -					tcp_hdr->cksum =
> get_ipv4_udptcp_checksum(ipv4_hdr,
> -							(uint16_t*)tcp_hdr);
> -				}
> -			} else if (l4_proto == IPPROTO_SCTP) {
> -				sctp_hdr = (struct sctp_hdr*)
> (rte_pktmbuf_mtod(mb,
> -						unsigned char *) + l2_len +
> l3_len);
> -
> -				if (tx_ol_flags &
> TESTPMD_TX_OFFLOAD_SCTP_CKSUM) {
> -					ol_flags |= PKT_TX_SCTP_CKSUM;
> -					sctp_hdr->cksum = 0;
> -
> -					/* Sanity check, only number of 4 bytes
> supported */
> -					if ((rte_be_to_cpu_16(ipv4_hdr-
> >total_length) % 4) != 0)
> -						printf("sctp payload must be a
> multiple "
> -							"of 4 bytes for
> checksum offload");
> -				}
> -				else {
> -					sctp_hdr->cksum = 0;
> -					/* CRC32c sample code available in
> RFC3309 */
> -				}
> -			}
> -			/* End of L4 Handling*/
> -		} else if (pkt_ol_flags & (PKT_RX_IPV6_HDR |
> PKT_RX_TUNNEL_IPV6_HDR)) {
> -			ipv6_hdr = (struct ipv6_hdr *) (rte_pktmbuf_mtod(mb,
> -					unsigned char *) + l2_len);
> -			l3_len = sizeof(struct ipv6_hdr) ;
> -			l4_proto = ipv6_hdr->proto;
> -			ol_flags |= PKT_TX_IPV6;
> -
> -			if (l4_proto == IPPROTO_UDP) {
> -				udp_hdr = (struct udp_hdr*)
> (rte_pktmbuf_mtod(mb,
> -						unsigned char *) + l2_len +
> l3_len);
> -				if (tx_ol_flags &
> TESTPMD_TX_OFFLOAD_UDP_CKSUM) {
> -					/* HW Offload */
> -					ol_flags |= PKT_TX_UDP_CKSUM;
> -					if (ipv6_tunnel)
> -						udp_hdr->dgram_cksum = 0;
> -					else
> -						udp_hdr->dgram_cksum =
> -
> 	get_ipv6_psd_sum(ipv6_hdr);
> -				}
> -				else {
> -					/* SW Implementation */
> -					/* checksum field need be clear first */
> -					udp_hdr->dgram_cksum = 0;
> -					udp_hdr->dgram_cksum =
> get_ipv6_udptcp_checksum(ipv6_hdr,
> -								(uint16_t
> *)udp_hdr);
> -				}
> -
> -				if (ipv6_tunnel) {
> -
> -					uint16_t len;
> -
> -					/* Check if inner L3/L4 checksum flag is
> set */
> -					if (tx_ol_flags &
> TESTPMD_TX_OFFLOAD_INNER_CKSUM_MASK)
> -						ol_flags |=
> PKT_TX_VXLAN_CKSUM;
> -
> -					inner_l2_len  = sizeof(struct ether_hdr);
> -					inner_eth_hdr = (struct ether_hdr *)
> (rte_pktmbuf_mtod(mb,
> -						unsigned char *) + l2_len +
> l3_len + ETHER_VXLAN_HLEN);
> -					eth_type =
> rte_be_to_cpu_16(inner_eth_hdr->ether_type);
> -
> -					if (eth_type == ETHER_TYPE_VLAN) {
> -						inner_l2_len += sizeof(struct
> vlan_hdr);
> -						eth_type =
> rte_be_to_cpu_16(*(uint16_t *)
> -							((uintptr_t)&eth_hdr-
> >ether_type +
> -							sizeof(struct
> vlan_hdr)));
> -					}
> -
> -					len = l2_len + l3_len +
> ETHER_VXLAN_HLEN + inner_l2_len;
> -
> -					if (eth_type == ETHER_TYPE_IPv4) {
> -						inner_l3_len = sizeof(struct
> ipv4_hdr);
> -						inner_ipv4_hdr = (struct
> ipv4_hdr *) (rte_pktmbuf_mtod(mb,
> -								unsigned char
> *) + len);
> -						inner_l4_proto =
> inner_ipv4_hdr->next_proto_id;
> -
> -						/* HW offload */
> -						if (tx_ol_flags &
> TESTPMD_TX_OFFLOAD_INNER_IP_CKSUM) {
> -
> -							/* Do not delete, this is
> required by HW*/
> -							inner_ipv4_hdr-
> >hdr_checksum = 0;
> -							ol_flags |=
> PKT_TX_IPV4_CSUM;
> -						}
> -					} else if (eth_type ==
> ETHER_TYPE_IPv6) {
> -						inner_l3_len = sizeof(struct
> ipv6_hdr);
> -						inner_ipv6_hdr = (struct
> ipv6_hdr *) (rte_pktmbuf_mtod(mb,
> -							unsigned char *) + len);
> -						inner_l4_proto =
> inner_ipv6_hdr->proto;
> -					}
> -
> -					if ((inner_l4_proto == IPPROTO_UDP)
> &&
> -						(tx_ol_flags &
> TESTPMD_TX_OFFLOAD_INNER_UDP_CKSUM)) {
> -						inner_udp_hdr = (struct
> udp_hdr *) (rte_pktmbuf_mtod(mb,
> -							unsigned char *) + len
> + inner_l3_len);
> -						/* HW offload */
> -						ol_flags |=
> PKT_TX_UDP_CKSUM;
> -						inner_udp_hdr->dgram_cksum
> = 0;
> -						if (eth_type ==
> ETHER_TYPE_IPv4)
> -							inner_udp_hdr-
> >dgram_cksum = get_ipv4_psd_sum(inner_ipv4_hdr);
> -						else if (eth_type ==
> ETHER_TYPE_IPv6)
> -							inner_udp_hdr-
> >dgram_cksum = get_ipv6_psd_sum(inner_ipv6_hdr);
> -					} else if ((inner_l4_proto ==
> IPPROTO_TCP) &&
> -						(tx_ol_flags &
> TESTPMD_TX_OFFLOAD_INNER_TCP_CKSUM)) {
> -						/* HW offload */
> -						ol_flags |=
> PKT_TX_TCP_CKSUM;
> -						inner_tcp_hdr = (struct tcp_hdr
> *) (rte_pktmbuf_mtod(mb,
> -								unsigned char
> *) + len + inner_l3_len);
> -
> -						if (eth_type ==
> ETHER_TYPE_IPv4)
> -							inner_tcp_hdr->cksum
> = get_ipv4_psd_sum(inner_ipv4_hdr);
> -						else if (eth_type ==
> ETHER_TYPE_IPv6)
> -							inner_tcp_hdr->cksum
> = get_ipv6_psd_sum(inner_ipv6_hdr);
> -
> -					} else if ((inner_l4_proto ==
> IPPROTO_SCTP) &&
> -						(tx_ol_flags &
> TESTPMD_TX_OFFLOAD_INNER_SCTP_CKSUM)) {
> -						/* HW offload */
> -						ol_flags |=
> PKT_TX_SCTP_CKSUM;
> -						inner_sctp_hdr = (struct
> sctp_hdr *) (rte_pktmbuf_mtod(mb,
> -								unsigned char
> *) + len + inner_l3_len);
> -						inner_sctp_hdr->cksum = 0;
> -					}
> -
> -				}
> -
> -			}
> -			else if (l4_proto == IPPROTO_TCP) {
> -				tcp_hdr = (struct tcp_hdr*)
> (rte_pktmbuf_mtod(mb,
> -						unsigned char *) + l2_len +
> l3_len);
> -				if (tx_ol_flags &
> TESTPMD_TX_OFFLOAD_TCP_CKSUM) {
> -					ol_flags |= PKT_TX_TCP_CKSUM;
> -					tcp_hdr->cksum =
> get_ipv6_psd_sum(ipv6_hdr);
> -				}
> -				else {
> -					tcp_hdr->cksum = 0;
> -					tcp_hdr->cksum =
> get_ipv6_udptcp_checksum(ipv6_hdr,
> -							(uint16_t*)tcp_hdr);
> -				}
> -			}
> -			else if (l4_proto == IPPROTO_SCTP) {
> -				sctp_hdr = (struct sctp_hdr*)
> (rte_pktmbuf_mtod(mb,
> -						unsigned char *) + l2_len +
> l3_len);
> -
> -				if (tx_ol_flags &
> TESTPMD_TX_OFFLOAD_SCTP_CKSUM) {
> -					ol_flags |= PKT_TX_SCTP_CKSUM;
> -					sctp_hdr->cksum = 0;
> -					/* Sanity check, only number of 4 bytes
> supported by HW */
> -					if ((rte_be_to_cpu_16(ipv6_hdr-
> >payload_len) % 4) != 0)
> -						printf("sctp payload must be a
> multiple "
> -							"of 4 bytes for
> checksum offload");
> -				}
> -				else {
> -					/* CRC32c sample code available in
> RFC3309 */
> -					sctp_hdr->cksum = 0;
> -				}
> -			} else {
> -				printf("Test flow control for 1G PMD \n");
> -			}
> -			/* End of L6 Handling*/
> -		}
> -		else {
> -			l3_len = 0;
> -			printf("Unhandled packet type: %#hx\n", eth_type);
> +		} else {
> +			/* this is only useful if an offload flag is
> +			 * set, but it does not hurt to fill it in any
> +			 * case */
> +			m->l2_len = l2_len;
> +			m->l3_len = l3_len;
>  		}
> +		m->ol_flags = ol_flags;
> 
> -		/* Combine the packet header write. VLAN is not consider here
> */
> -		mb->l2_len = l2_len;
> -		mb->l3_len = l3_len;
> -		mb->inner_l2_len = inner_l2_len;
> -		mb->inner_l3_len = inner_l3_len;
> -		mb->ol_flags = ol_flags;
>  	}
>  	nb_tx = rte_eth_tx_burst(fs->tx_port, fs->tx_queue, pkts_burst, nb_rx);
>  	fs->tx_packets += nb_tx;
> @@ -629,7 +570,6 @@ pkt_burst_checksum_forward(struct fwd_stream *fs)
> #endif  }
> 
> -
>  struct fwd_engine csum_fwd_engine = {
>  	.fwd_mode_name  = "csum",
>  	.port_fwd_begin = NULL,
> diff --git a/app/test-pmd/testpmd.h b/app/test-pmd/testpmd.h index
> 82af2bd..c753d37 100644
> --- a/app/test-pmd/testpmd.h
> +++ b/app/test-pmd/testpmd.h
> @@ -131,18 +131,11 @@ struct fwd_stream {
>  #define TESTPMD_TX_OFFLOAD_TCP_CKSUM         0x0004
>  /** Offload SCTP checksum in csum forward engine */
>  #define TESTPMD_TX_OFFLOAD_SCTP_CKSUM        0x0008
> -/** Offload inner IP checksum in csum forward engine */
> -#define TESTPMD_TX_OFFLOAD_INNER_IP_CKSUM    0x0010
> -/** Offload inner UDP checksum in csum forward engine */
> -#define TESTPMD_TX_OFFLOAD_INNER_UDP_CKSUM   0x0020
> -/** Offload inner TCP checksum in csum forward engine */
> -#define TESTPMD_TX_OFFLOAD_INNER_TCP_CKSUM   0x0040
> -/** Offload inner SCTP checksum in csum forward engine */ -#define
> TESTPMD_TX_OFFLOAD_INNER_SCTP_CKSUM  0x0080
> -/** Offload inner IP checksum mask */
> -#define TESTPMD_TX_OFFLOAD_INNER_CKSUM_MASK  0x00F0
> +/** Offload VxLAN checksum in csum forward engine */
> +#define TESTPMD_TX_OFFLOAD_VXLAN_CKSUM       0x0010
>  /** Insert VLAN header in forward engine */
> -#define TESTPMD_TX_OFFLOAD_INSERT_VLAN       0x0100
> +#define TESTPMD_TX_OFFLOAD_INSERT_VLAN       0x0020
> +
>  /**
>   * The data structure associated with each port.
>   */
> @@ -510,8 +503,6 @@ void tx_vlan_pvid_set(portid_t port_id, uint16_t vlan_id,
> int on);
> 
>  void set_qmap(portid_t port_id, uint8_t is_rx, uint16_t queue_id, uint8_t
> map_value);
> 
> -void tx_cksum_set(portid_t port_id, uint64_t ol_flags);
> -
>  void set_verbose_level(uint16_t vb_level);  void set_tx_pkt_segments(unsigned
> *seg_lengths, unsigned nb_segs);  void set_nb_pkt_per_burst(uint16_t
> pkt_burst);
> --
> 2.1.0

* Re: [dpdk-dev] [PATCH v2 05/13] mbuf: remove too specific PKT_TX_OFFLOAD_MASK definition
  2014-11-14 17:03   ` [dpdk-dev] [PATCH v2 05/13] mbuf: remove too specific PKT_TX_OFFLOAD_MASK definition Olivier Matz
@ 2014-11-17 10:35     ` Bruce Richardson
  0 siblings, 0 replies; 112+ messages in thread
From: Bruce Richardson @ 2014-11-17 10:35 UTC (permalink / raw)
  To: Olivier Matz; +Cc: dev, jigsaw

On Fri, Nov 14, 2014 at 06:03:21PM +0100, Olivier Matz wrote:
> This definition is specific to Intel PMD drivers and its definition
> "indicate what bits required for building TX context" shows that it
> should not be in the generic rte_mbuf.h but in the PMD driver.
> 
> Signed-off-by: Olivier Matz <olivier.matz@6wind.com>

Acked-by: Bruce Richardson <bruce.richardson@intel.com>

> ---
>  lib/librte_mbuf/rte_mbuf.h        | 5 -----
>  lib/librte_pmd_e1000/igb_rxtx.c   | 8 +++++++-
>  lib/librte_pmd_ixgbe/ixgbe_rxtx.c | 8 +++++++-
>  3 files changed, 14 insertions(+), 7 deletions(-)
> 
> diff --git a/lib/librte_mbuf/rte_mbuf.h b/lib/librte_mbuf/rte_mbuf.h
> index d7070e9..68fb988 100644
> --- a/lib/librte_mbuf/rte_mbuf.h
> +++ b/lib/librte_mbuf/rte_mbuf.h
> @@ -129,11 +129,6 @@ extern "C" {
>  /* Use final bit of flags to indicate a control mbuf */
>  #define CTRL_MBUF_FLAG       (1ULL << 63) /**< Mbuf contains control data */
>  
> -/**
> - * Bit Mask to indicate what bits required for building TX context
> - */
> -#define PKT_TX_OFFLOAD_MASK (PKT_TX_VLAN_PKT | PKT_TX_IP_CKSUM | PKT_TX_L4_MASK)
> -
>  /* define a set of marker types that can be used to refer to set points in the
>   * mbuf */
>  typedef void    *MARKER[0];   /**< generic marker for a point in a structure */
> diff --git a/lib/librte_pmd_e1000/igb_rxtx.c b/lib/librte_pmd_e1000/igb_rxtx.c
> index b406397..433c616 100644
> --- a/lib/librte_pmd_e1000/igb_rxtx.c
> +++ b/lib/librte_pmd_e1000/igb_rxtx.c
> @@ -84,6 +84,12 @@
>  		ETH_RSS_IPV6_UDP | \
>  		ETH_RSS_IPV6_UDP_EX)
>  
> +/* Bit Mask to indicate what bits required for building TX context */
> +#define IGB_TX_OFFLOAD_MASK (			 \
> +		PKT_TX_VLAN_PKT |		 \
> +		PKT_TX_IP_CKSUM |		 \
> +		PKT_TX_L4_MASK)
> +
>  static inline struct rte_mbuf *
>  rte_rxmbuf_alloc(struct rte_mempool *mp)
>  {
> @@ -400,7 +406,7 @@ eth_igb_xmit_pkts(void *tx_queue, struct rte_mbuf **tx_pkts,
>  		ol_flags = tx_pkt->ol_flags;
>  		vlan_macip_lens.f.vlan_tci = tx_pkt->vlan_tci;
>  		vlan_macip_lens.f.l2_l3_len = tx_pkt->l2_l3_len;
> -		tx_ol_req = ol_flags & PKT_TX_OFFLOAD_MASK;
> +		tx_ol_req = ol_flags & IGB_TX_OFFLOAD_MASK;
>  
>  		/* If a Context Descriptor need be built . */
>  		if (tx_ol_req) {
> diff --git a/lib/librte_pmd_ixgbe/ixgbe_rxtx.c b/lib/librte_pmd_ixgbe/ixgbe_rxtx.c
> index 7e470ce..ca35db2 100644
> --- a/lib/librte_pmd_ixgbe/ixgbe_rxtx.c
> +++ b/lib/librte_pmd_ixgbe/ixgbe_rxtx.c
> @@ -90,6 +90,12 @@
>  		ETH_RSS_IPV6_UDP | \
>  		ETH_RSS_IPV6_UDP_EX)
>  
> +/* Bit Mask to indicate what bits required for building TX context */
> +#define IXGBE_TX_OFFLOAD_MASK (			 \
> +		PKT_TX_VLAN_PKT |		 \
> +		PKT_TX_IP_CKSUM |		 \
> +		PKT_TX_L4_MASK)
> +
>  static inline struct rte_mbuf *
>  rte_rxmbuf_alloc(struct rte_mempool *mp)
>  {
> @@ -580,7 +586,7 @@ ixgbe_xmit_pkts(void *tx_queue, struct rte_mbuf **tx_pkts,
>  		ol_flags = tx_pkt->ol_flags;
>  
>  		/* If hardware offload required */
> -		tx_ol_req = ol_flags & PKT_TX_OFFLOAD_MASK;
> +		tx_ol_req = ol_flags & IXGBE_TX_OFFLOAD_MASK;
>  		if (tx_ol_req) {
>  			vlan_macip_lens.f.vlan_tci = tx_pkt->vlan_tci;
>  			vlan_macip_lens.f.l2_l3_len = tx_pkt->l2_l3_len;
> -- 
> 2.1.0
> 

^ permalink raw reply	[flat|nested] 112+ messages in thread

* Re: [dpdk-dev] [PATCH v2 06/13] mbuf: add functions to get the name of an ol_flag
  2014-11-14 17:03   ` [dpdk-dev] [PATCH v2 06/13] mbuf: add functions to get the name of an ol_flag Olivier Matz
@ 2014-11-17 10:39     ` Bruce Richardson
  2014-11-17 12:51       ` Olivier MATZ
  2014-11-17 19:00     ` Ananyev, Konstantin
  1 sibling, 1 reply; 112+ messages in thread
From: Bruce Richardson @ 2014-11-17 10:39 UTC (permalink / raw)
  To: Olivier Matz; +Cc: dev, jigsaw

On Fri, Nov 14, 2014 at 06:03:22PM +0100, Olivier Matz wrote:
> In test-pmd (rxonly.c), the code is able to dump the list of ol_flags.
> The issue is that the list of flags in the application has to be
> synchronized with the flags defined in rte_mbuf.h.
> 
> This patch introduces 2 new functions rte_get_rx_ol_flag_name()
> and rte_get_tx_ol_flag_name() that returns the name of a flag from
> its mask. It also fixes rxonly.c to use this new functions and to
> display the proper flags.
> 
> Signed-off-by: Olivier Matz <olivier.matz@6wind.com>
> ---
>  app/test-pmd/rxonly.c      | 36 ++++++++++--------------------------
>  lib/librte_mbuf/rte_mbuf.c | 45 +++++++++++++++++++++++++++++++++++++++++++++
>  lib/librte_mbuf/rte_mbuf.h | 22 ++++++++++++++++++++++
>  3 files changed, 77 insertions(+), 26 deletions(-)
> 
> diff --git a/app/test-pmd/rxonly.c b/app/test-pmd/rxonly.c
> index 9ad1df6..51a530a 100644
> --- a/app/test-pmd/rxonly.c
> +++ b/app/test-pmd/rxonly.c
> @@ -71,26 +71,6 @@
>  
>  #include "testpmd.h"
>  
> -#define MAX_PKT_RX_FLAGS 13
> -static const char *pkt_rx_flag_names[MAX_PKT_RX_FLAGS] = {
> -	"VLAN_PKT",
> -	"RSS_HASH",
> -	"PKT_RX_FDIR",
> -	"IP_CKSUM",
> -	"IP_CKSUM_BAD",
> -
> -	"IPV4_HDR",
> -	"IPV4_HDR_EXT",
> -	"IPV6_HDR",
> -	"IPV6_HDR_EXT",
> -
> -	"IEEE1588_PTP",
> -	"IEEE1588_TMST",
> -
> -	"TUNNEL_IPV4_HDR",
> -	"TUNNEL_IPV6_HDR",
> -};
> -
>  static inline void
>  print_ether_addr(const char *what, struct ether_addr *eth_addr)
>  {
> @@ -214,12 +194,16 @@ pkt_burst_receive(struct fwd_stream *fs)
>  		printf(" - Receive queue=0x%x", (unsigned) fs->rx_queue);
>  		printf("\n");
>  		if (ol_flags != 0) {
> -			int rxf;
> -
> -			for (rxf = 0; rxf < MAX_PKT_RX_FLAGS; rxf++) {
> -				if (ol_flags & (1 << rxf))
> -					printf("  PKT_RX_%s\n",
> -					       pkt_rx_flag_names[rxf]);
> +			unsigned rxf;
> +			const char *name;
> +
> +			for (rxf = 0; rxf < sizeof(mb->ol_flags) * 8; rxf++) {
> +				if ((ol_flags & (1ULL << rxf)) == 0)
> +					continue;
> +				name = rte_get_rx_ol_flag_name(1ULL << rxf);
> +				if (name == NULL)
> +					continue;
> +				printf("  %s\n", name);
>  			}
>  		}
>  		rte_pktmbuf_free(mb);
> diff --git a/lib/librte_mbuf/rte_mbuf.c b/lib/librte_mbuf/rte_mbuf.c
> index 52e7574..5cd9137 100644
> --- a/lib/librte_mbuf/rte_mbuf.c
> +++ b/lib/librte_mbuf/rte_mbuf.c
> @@ -196,3 +196,48 @@ rte_pktmbuf_dump(FILE *f, const struct rte_mbuf *m, unsigned dump_len)
>  		nb_segs --;
>  	}
>  }
> +
> +/*
> + * Get the name of a RX offload flag
> + */
> +const char *rte_get_rx_ol_flag_name(uint64_t mask)
> +{
> +	switch (mask) {
> +	case PKT_RX_VLAN_PKT: return "PKT_RX_VLAN_PKT";
> +	case PKT_RX_RSS_HASH: return "PKT_RX_RSS_HASH";
> +	case PKT_RX_FDIR: return "PKT_RX_FDIR";
> +	case PKT_RX_L4_CKSUM_BAD: return "PKT_RX_L4_CKSUM_BAD";
> +	case PKT_RX_IP_CKSUM_BAD: return "PKT_RX_IP_CKSUM_BAD";
> +	/* case PKT_RX_EIP_CKSUM_BAD: return "PKT_RX_EIP_CKSUM_BAD"; */
> +	/* case PKT_RX_OVERSIZE: return "PKT_RX_OVERSIZE"; */
> +	/* case PKT_RX_HBUF_OVERFLOW: return "PKT_RX_HBUF_OVERFLOW"; */
> +	/* case PKT_RX_RECIP_ERR: return "PKT_RX_RECIP_ERR"; */
> +	/* case PKT_RX_MAC_ERR: return "PKT_RX_MAC_ERR"; */
> +	case PKT_RX_IPV4_HDR: return "PKT_RX_IPV4_HDR";
> +	case PKT_RX_IPV4_HDR_EXT: return "PKT_RX_IPV4_HDR_EXT";
> +	case PKT_RX_IPV6_HDR: return "PKT_RX_IPV6_HDR";
> +	case PKT_RX_IPV6_HDR_EXT: return "PKT_RX_IPV6_HDR_EXT";
> +	case PKT_RX_IEEE1588_PTP: return "PKT_RX_IEEE1588_PTP";
> +	case PKT_RX_IEEE1588_TMST: return "PKT_RX_IEEE1588_TMST";
> +	case PKT_RX_TUNNEL_IPV4_HDR: return "PKT_RX_TUNNEL_IPV4_HDR";
> +	case PKT_RX_TUNNEL_IPV6_HDR: return "PKT_RX_TUNNEL_IPV6_HDR";
> +	default: return NULL;
> +	}
> +}
> +
> +/*
> + * Get the name of a TX offload flag
> + */
> +const char *rte_get_tx_ol_flag_name(uint64_t mask)
> +{
> +	switch (mask) {
> +	case PKT_TX_VLAN_PKT: return "PKT_TX_VLAN_PKT";
> +	case PKT_TX_IP_CKSUM: return "PKT_TX_IP_CKSUM";
> +	case PKT_TX_TCP_CKSUM: return "PKT_TX_TCP_CKSUM";
> +	case PKT_TX_SCTP_CKSUM: return "PKT_TX_SCTP_CKSUM";
> +	case PKT_TX_UDP_CKSUM: return "PKT_TX_UDP_CKSUM";
> +	case PKT_TX_IEEE1588_TMST: return "PKT_TX_IEEE1588_TMST";
> +	case PKT_TX_VXLAN_CKSUM: return "PKT_TX_VXLAN_CKSUM";
> +	default: return NULL;
> +	}
> +}
> diff --git a/lib/librte_mbuf/rte_mbuf.h b/lib/librte_mbuf/rte_mbuf.h
> index 68fb988..e76617f 100644
> --- a/lib/librte_mbuf/rte_mbuf.h
> +++ b/lib/librte_mbuf/rte_mbuf.h
> @@ -129,6 +129,28 @@ extern "C" {
>  /* Use final bit of flags to indicate a control mbuf */
>  #define CTRL_MBUF_FLAG       (1ULL << 63) /**< Mbuf contains control data */
>  

I think this patch should perhaps also add a comment at the top of the list
of flags saying that any new flag should also be added to the appropriate
function in rte_mbuf.c. Having the comment in rte_mbuf.h, where people would add the flags,
should help remind them to keep the flag lists in sync.

/Bruce

> +/**
> + * Get the name of a RX offload flag
> + *
> + * @param mask
> + *   The mask describing the flag.
> + * @return
> + *   The name of this flag, or NULL if it's not a valid RX flag.
> + */
> +const char *rte_get_rx_ol_flag_name(uint64_t mask);
> +
> +/**
> + * Get the name of a TX offload flag
> + *
> + * @param mask
> + *   The mask describing the flag. Usually only one bit must be set.
> + *   Several bits can be given if they belong to the same mask.
> + *   Ex: PKT_TX_L4_MASK.
> + * @return
> + *   The name of this flag, or NULL if it's not a valid TX flag.
> + */
> +const char *rte_get_tx_ol_flag_name(uint64_t mask);
> +
>  /* define a set of marker types that can be used to refer to set points in the
>   * mbuf */
>  typedef void    *MARKER[0];   /**< generic marker for a point in a structure */
> -- 
> 2.1.0
> 

^ permalink raw reply	[flat|nested] 112+ messages in thread

* Re: [dpdk-dev] [PATCH v2 06/13] mbuf: add functions to get the name of an ol_flag
  2014-11-17 10:39     ` Bruce Richardson
@ 2014-11-17 12:51       ` Olivier MATZ
  0 siblings, 0 replies; 112+ messages in thread
From: Olivier MATZ @ 2014-11-17 12:51 UTC (permalink / raw)
  To: Bruce Richardson; +Cc: dev, jigsaw

Hi Bruce,

On 11/17/2014 11:39 AM, Bruce Richardson wrote:
>> +/*
>> + * Get the name of a TX offload flag
>> + */
>> +const char *rte_get_tx_ol_flag_name(uint64_t mask)
>> +{
>> +	switch (mask) {
>> +	case PKT_TX_VLAN_PKT: return "PKT_TX_VLAN_PKT";
>> +	case PKT_TX_IP_CKSUM: return "PKT_TX_IP_CKSUM";
>> +	case PKT_TX_TCP_CKSUM: return "PKT_TX_TCP_CKSUM";
>> +	case PKT_TX_SCTP_CKSUM: return "PKT_TX_SCTP_CKSUM";
>> +	case PKT_TX_UDP_CKSUM: return "PKT_TX_UDP_CKSUM";
>> +	case PKT_TX_IEEE1588_TMST: return "PKT_TX_IEEE1588_TMST";
>> +	case PKT_TX_VXLAN_CKSUM: return "PKT_TX_VXLAN_CKSUM";
>> +	default: return NULL;
>> +	}
>> +}
>> diff --git a/lib/librte_mbuf/rte_mbuf.h b/lib/librte_mbuf/rte_mbuf.h
>> index 68fb988..e76617f 100644
>> --- a/lib/librte_mbuf/rte_mbuf.h
>> +++ b/lib/librte_mbuf/rte_mbuf.h
>> @@ -129,6 +129,28 @@ extern "C" {
>>  /* Use final bit of flags to indicate a control mbuf */
>>  #define CTRL_MBUF_FLAG       (1ULL << 63) /**< Mbuf contains control data */
>>  
> 
> I think this patch should perhaps also add a comment at the top of the list
> of flags saying that any new flag should also be added to the appropriate
> function in rte_mbuf.c. Having the comment in rte_mbuf.h, where people would add the flags,
> should help remind them to keep the flag lists in sync.

Good idea, I'll add it.
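
To illustrate the pairing being discussed, here is a minimal standalone sketch of a flag list, its name-lookup function, and the bit-walking loop from rxonly.c. The DEMO_RX_* flags and the demo_* function names are hypothetical and their values are not the real PKT_RX_* ones; the reminder comment is only one possible wording of what Bruce suggests.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/*
 * Hypothetical flags for illustration; not the real PKT_RX_* values.
 * Reminder: any flag added here must also be added to
 * demo_rx_flag_name() below, so that the two lists stay in sync.
 */
#define DEMO_RX_VLAN_PKT (1ULL << 0)
#define DEMO_RX_RSS_HASH (1ULL << 1)
#define DEMO_RX_FDIR     (1ULL << 2)

/* Return the name of a single flag, or NULL if the mask is unknown. */
static const char *
demo_rx_flag_name(uint64_t mask)
{
	switch (mask) {
	case DEMO_RX_VLAN_PKT: return "DEMO_RX_VLAN_PKT";
	case DEMO_RX_RSS_HASH: return "DEMO_RX_RSS_HASH";
	case DEMO_RX_FDIR: return "DEMO_RX_FDIR";
	default: return NULL;
	}
}

/* Print the name of every known flag set in ol_flags; return how many. */
static unsigned
demo_dump_rx_flags(FILE *f, uint64_t ol_flags)
{
	unsigned bit;
	unsigned printed = 0;

	assert(f != NULL);

	for (bit = 0; bit < sizeof(ol_flags) * 8; bit++) {
		const char *name;

		if ((ol_flags & (1ULL << bit)) == 0)
			continue;
		name = demo_rx_flag_name(1ULL << bit);
		if (name == NULL)
			continue; /* set, but not known to the name table */
		fprintf(f, "  %s\n", name);
		printed++;
	}
	return printed;
}
```

A flag that is defined but missing from the switch is silently skipped by the loop, which is the kind of drift the comment is meant to prevent.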

Regards,
Olivier

^ permalink raw reply	[flat|nested] 112+ messages in thread

* Re: [dpdk-dev] [PATCH v2 08/13] testpmd: rework csum forward engine
  2014-11-17  8:11     ` Liu, Jijiang
@ 2014-11-17 13:00       ` Olivier MATZ
  0 siblings, 0 replies; 112+ messages in thread
From: Olivier MATZ @ 2014-11-17 13:00 UTC (permalink / raw)
  To: Liu, Jijiang; +Cc: dev, jigsaw

Hi Jijiang,

On 11/17/2014 09:11 AM, Liu, Jijiang wrote:
>> +/* Calculate the checksum of outer header (only vxlan is supported,
>> + * meaning IP + UDP). The caller already checked that it's a vxlan
>> + * packet */
>> +static uint64_t
>> +process_outer_cksums(void *outer_l3_hdr, uint16_t outer_ethertype,
>> +	uint16_t outer_l3_len, uint16_t testpmd_ol_flags) {
>> +	struct ipv4_hdr *ipv4_hdr = outer_l3_hdr;
>> +	struct ipv6_hdr *ipv6_hdr = outer_l3_hdr;
>> +	struct udp_hdr *udp_hdr;
>> +	uint64_t ol_flags = 0;
>> +
>> +	if (testpmd_ol_flags & TESTPMD_TX_OFFLOAD_VXLAN_CKSUM)
>> +		ol_flags |= PKT_TX_VXLAN_CKSUM;
>> +
>> +	if (outer_ethertype == _htons(ETHER_TYPE_IPv4)) {
>> +		ipv4_hdr->hdr_checksum = 0;
>> +
>> +		if ((testpmd_ol_flags & TESTPMD_TX_OFFLOAD_VXLAN_CKSUM)
>> == 0)
>> +			ipv4_hdr->hdr_checksum = get_ipv4_cksum(ipv4_hdr);
>> +	}
> 
> As I mentioned, we should use TESTPMD_TX_OFFLOAD_IP_CKSUM instead of using TESTPMD_TX_OFFLOAD_VXLAN_CKSUM flag to check if we need to set outer IP checksum offload.
> In other words, even if VXLAN packet, outer IP TX checksum offload is also needed if  TESTPMD_TX_OFFLOAD_IP_CKSUM is set.

The csum forward engine works as follows after the rework. I can add
some more comments in the patch or in the testpmd help to describe it
more clearly.

Receive a burst of packets, and for each packet:
 - parse packet, and try to recognize a supported packet type (1)
 - if it's not a supported packet type, don't touch the packet, else:
 - modify the IPs in inner headers and in outer headers if any
 - reprocess the checksum of all supported layers. This is done in SW
   or HW, depending on testpmd command line configuration
 - if TSO is enabled in testpmd command line, also flag the mbuf for TCP
   segmentation offload (this implies HW TCP checksum)
Then transmit packets on the output port.

(1) Supported packets are:
  Ether / (vlan) / IP|IP6 / UDP|TCP|SCTP
  Ether / (vlan) / outer IP|IP6 / outer UDP / VxLAN / Ether / IP|IP6 / UDP|TCP|SCTP

The testpmd command line for csum takes the following arguments:
  tx_cksum set (ip|udp|tcp|sctp|vxlan) (hw|sw) (port_id)

- "ip|udp|tcp|sctp" always concern the inner layer
- "vxlan" concerns the outer IP and UDP layer (if packet is recognized
  as a vxlan packet)

I hope this is described precisely enough to predict the output
of testpmd without reading the code or the i40e datasheets. This was
not so clear before.

So, following this description, there is no reason to check
TESTPMD_TX_OFFLOAD_IP_CKSUM when scheduling the hardware VxLAN checksum.
One thing may be wrong, however: the mbuf flags set in the packet.
But we cannot say it is wrong today, because the API is not documented:
the VXLAN feature is not documented well enough to be sure.
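
The per-packet decision described above can be sketched as a small standalone function. This is only an illustration under stated assumptions: the CFG_* and OL_* constants, struct cksum_plan and plan_cksums() are hypothetical names with made-up values, and setting the IP checksum flag for IPv4 when TSO is active follows the requirement stated in the cover letter rather than the testpmd code itself.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical configuration bits; the real TESTPMD_TX_OFFLOAD_* values differ. */
#define CFG_HW_IP_CKSUM    0x01u
#define CFG_HW_L4_CKSUM    0x02u /* stands for the udp|tcp|sctp setting */
#define CFG_HW_VXLAN_CKSUM 0x04u
#define CFG_TSO            0x08u
#define CFG_ALL            0x0fu

/* Hypothetical mbuf flags; not the real PKT_TX_* values. */
#define OL_IP_CKSUM    (1ULL << 0)
#define OL_L4_CKSUM    (1ULL << 1)
#define OL_VXLAN_CKSUM (1ULL << 2)
#define OL_TCP_SEG     (1ULL << 3)

struct cksum_plan {
	uint64_t ol_flags; /* work delegated to the hardware */
	bool sw_inner_ip;  /* inner IPv4 checksum computed in software */
	bool sw_inner_l4;  /* inner L4 checksum computed in software */
	bool sw_outer;     /* outer IP + UDP checksums computed in software */
};

static struct cksum_plan
plan_cksums(unsigned cfg, bool is_vxlan, bool is_ipv4, bool is_tcp)
{
	struct cksum_plan p = { 0, false, false, false };
	bool tso;

	assert((cfg & ~CFG_ALL) == 0);
	tso = (cfg & CFG_TSO) != 0 && is_tcp;

	/* "ip" only concerns the inner header; IPv6 has no header checksum. */
	if (is_ipv4) {
		if ((cfg & CFG_HW_IP_CKSUM) != 0 || tso)
			p.ol_flags |= OL_IP_CKSUM;
		else
			p.sw_inner_ip = true;
	}

	/* "udp|tcp|sctp" only concern the inner L4; TSO implies HW TCP checksum. */
	if ((cfg & CFG_HW_L4_CKSUM) != 0 || tso)
		p.ol_flags |= OL_L4_CKSUM;
	else
		p.sw_inner_l4 = true;
	if (tso)
		p.ol_flags |= OL_TCP_SEG;

	/* "vxlan" alone decides how the outer IP and UDP layers are handled. */
	if (is_vxlan) {
		if ((cfg & CFG_HW_VXLAN_CKSUM) != 0)
			p.ol_flags |= OL_VXLAN_CKSUM;
		else
			p.sw_outer = true;
	}
	return p;
}
```

With only the "ip hw" setting enabled, a vxlan packet still gets its outer checksums computed in software in this model, which is the behaviour argued for above.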


Regards,
Olivier

^ permalink raw reply	[flat|nested] 112+ messages in thread

* Re: [dpdk-dev] [PATCH v2 02/13] ixgbe: fix remaining pkt_flags variable size to 64 bits
  2014-11-14 17:03   ` [dpdk-dev] [PATCH v2 02/13] ixgbe: fix remaining pkt_flags variable size to 64 bits Olivier Matz
@ 2014-11-17 16:47     ` Walukiewicz, Miroslaw
  2014-11-17 17:03       ` Olivier MATZ
  0 siblings, 1 reply; 112+ messages in thread
From: Walukiewicz, Miroslaw @ 2014-11-17 16:47 UTC (permalink / raw)
  To: Olivier Matz, dev; +Cc: jigsaw



> -----Original Message-----
> From: Olivier Matz [mailto:olivier.matz@6wind.com]
> Sent: Friday, November 14, 2014 6:03 PM
> To: dev@dpdk.org
> Cc: olivier.matz@6wind.com; Walukiewicz, Miroslaw; Liu, Jijiang; Liu, Yong;
> jigsaw@gmail.com; Richardson, Bruce
> Subject: [PATCH v2 02/13] ixgbe: fix remaining pkt_flags variable size to 64
> bits
> 
> Since commit 4332beee9 "mbuf: expand ol_flags field to 64-bits", the
> packet flags are now 64 bits wide. Some occurrences were forgotten in
> the ixgbe driver.

I think it should be in a separate patch. I do not see any relation to TSO.

> 
> Signed-off-by: Olivier Matz <olivier.matz@6wind.com>
> Acked-by: Bruce Richardson <bruce.richardson@intel.com>
> ---
>  lib/librte_pmd_ixgbe/ixgbe_rxtx.c | 12 ++++++------
>  1 file changed, 6 insertions(+), 6 deletions(-)
> 
> diff --git a/lib/librte_pmd_ixgbe/ixgbe_rxtx.c
> b/lib/librte_pmd_ixgbe/ixgbe_rxtx.c
> index ecebbf6..7e470ce 100644
> --- a/lib/librte_pmd_ixgbe/ixgbe_rxtx.c
> +++ b/lib/librte_pmd_ixgbe/ixgbe_rxtx.c
> @@ -817,7 +817,7 @@ end_of_tx:
>  static inline uint64_t
>  rx_desc_hlen_type_rss_to_pkt_flags(uint32_t hl_tp_rs)
>  {
> -	uint16_t pkt_flags;
> +	uint64_t pkt_flags;
> 
>  	static uint64_t ip_pkt_types_map[16] = {
>  		0, PKT_RX_IPV4_HDR, PKT_RX_IPV4_HDR_EXT,
> PKT_RX_IPV4_HDR_EXT,
> @@ -834,7 +834,7 @@ rx_desc_hlen_type_rss_to_pkt_flags(uint32_t
> hl_tp_rs)
>  	};
> 
>  #ifdef RTE_LIBRTE_IEEE1588
> -	static uint32_t ip_pkt_etqf_map[8] = {
> +	static uint64_t ip_pkt_etqf_map[8] = {
>  		0, 0, 0, PKT_RX_IEEE1588_PTP,
>  		0, 0, 0, 0,
>  	};
> @@ -903,7 +903,7 @@ ixgbe_rx_scan_hw_ring(struct igb_rx_queue *rxq)
>  	struct igb_rx_entry *rxep;
>  	struct rte_mbuf *mb;
>  	uint16_t pkt_len;
> -	uint16_t pkt_flags;
> +	uint64_t pkt_flags;
>  	int s[LOOK_AHEAD], nb_dd;
>  	int i, j, nb_rx = 0;
> 
> @@ -1335,7 +1335,7 @@ ixgbe_recv_scattered_pkts(void *rx_queue, struct
> rte_mbuf **rx_pkts,
>  	uint16_t nb_rx;
>  	uint16_t nb_hold;
>  	uint16_t data_len;
> -	uint16_t pkt_flags;
> +	uint64_t pkt_flags;
> 
>  	nb_rx = 0;
>  	nb_hold = 0;
> @@ -1511,9 +1511,9 @@ ixgbe_recv_scattered_pkts(void *rx_queue, struct
> rte_mbuf **rx_pkts,
>  		first_seg->vlan_tci = rte_le_to_cpu_16(rxd.wb.upper.vlan);
>  		hlen_type_rss =
> rte_le_to_cpu_32(rxd.wb.lower.lo_dword.data);
>  		pkt_flags =
> rx_desc_hlen_type_rss_to_pkt_flags(hlen_type_rss);
> -		pkt_flags = (uint16_t)(pkt_flags |
> +		pkt_flags = (pkt_flags |
>  				rx_desc_status_to_pkt_flags(staterr));
> -		pkt_flags = (uint16_t)(pkt_flags |
> +		pkt_flags = (pkt_flags |
>  				rx_desc_error_to_pkt_flags(staterr));
>  		first_seg->ol_flags = pkt_flags;
> 
> --
> 2.1.0

^ permalink raw reply	[flat|nested] 112+ messages in thread

* Re: [dpdk-dev] [PATCH v2 02/13] ixgbe: fix remaining pkt_flags variable size to 64 bits
  2014-11-17 16:47     ` Walukiewicz, Miroslaw
@ 2014-11-17 17:03       ` Olivier MATZ
  2014-11-17 17:40         ` Thomas Monjalon
  0 siblings, 1 reply; 112+ messages in thread
From: Olivier MATZ @ 2014-11-17 17:03 UTC (permalink / raw)
  To: Walukiewicz, Miroslaw, dev, Thomas Monjalon; +Cc: jigsaw

Hi Miroslaw,

On 11/17/2014 05:47 PM, Walukiewicz, Miroslaw wrote:
> 
> 
>> -----Original Message-----
>> From: Olivier Matz [mailto:olivier.matz@6wind.com]
>> Sent: Friday, November 14, 2014 6:03 PM
>> To: dev@dpdk.org
>> Cc: olivier.matz@6wind.com; Walukiewicz, Miroslaw; Liu, Jijiang; Liu, Yong;
>> jigsaw@gmail.com; Richardson, Bruce
>> Subject: [PATCH v2 02/13] ixgbe: fix remaining pkt_flags variable size to 64
>> bits
>>
>> Since commit 4332beee9 "mbuf: expand ol_flags field to 64-bits", the
>> packet flags are now 64 bits wide. Some occurrences were forgotten in
>> the ixgbe driver.
> 
> I think it should be in a separate patch. I do not see any relation to TSO.

You are right, there is no relation with TSO. The reason why I initially
added it in the same patchset is because I discovered this bug while
implementing TSO and I wanted to avoid too much noise on the list.

I can take out some patches from the series, but maybe it's too late
and it would confuse patchwork.

Thomas, what do you think?

Regards,
Olivier

^ permalink raw reply	[flat|nested] 112+ messages in thread

* Re: [dpdk-dev] [PATCH v2 02/13] ixgbe: fix remaining pkt_flags variable size to 64 bits
  2014-11-17 17:03       ` Olivier MATZ
@ 2014-11-17 17:40         ` Thomas Monjalon
  0 siblings, 0 replies; 112+ messages in thread
From: Thomas Monjalon @ 2014-11-17 17:40 UTC (permalink / raw)
  To: Olivier MATZ; +Cc: dev, jigsaw

2014-11-17 18:03, Olivier MATZ:
> Hi Miroslaw,
> 
> On 11/17/2014 05:47 PM, Walukiewicz, Miroslaw wrote:
> > 
> > 
> >> -----Original Message-----
> >> From: Olivier Matz [mailto:olivier.matz@6wind.com]
> >> Sent: Friday, November 14, 2014 6:03 PM
> >> To: dev@dpdk.org
> >> Cc: olivier.matz@6wind.com; Walukiewicz, Miroslaw; Liu, Jijiang; Liu, Yong;
> >> jigsaw@gmail.com; Richardson, Bruce
> >> Subject: [PATCH v2 02/13] ixgbe: fix remaining pkt_flags variable size to 64
> >> bits
> >>
> >> Since commit 4332beee9 "mbuf: expand ol_flags field to 64-bits", the
> >> packet flags are now 64 bits wide. Some occurrences were forgotten in
> >> the ixgbe driver.
> > 
> > I think it should be in a separate patch. I do not see any relation to TSO.
> 
> You are right, there is no relation with TSO. The reason why I initially
> added it in the same patchset is because I discovered this bug while
> implementing TSO and I wanted to avoid too much noise on the list.
> 
> I can take out some patches from the series, but maybe it's too late
> and it would confuse patchwork.
> 
> Thomas, what do you think?

In general, it's better to have only one feature in a patchset.
It's not a real problem here because TSO is planned for release 1.8 and fixes
are also welcome. So all the patches should be merged in the coming days.
By the way, there is no problem with patchwork. You are free to choose :)

-- 
Thomas

^ permalink raw reply	[flat|nested] 112+ messages in thread

* Re: [dpdk-dev] [PATCH v2 09/13] mbuf: introduce new checksum API
  2014-11-14 17:03   ` [dpdk-dev] [PATCH v2 09/13] mbuf: introduce new checksum API Olivier Matz
@ 2014-11-17 18:15     ` Ananyev, Konstantin
  2014-11-18  9:10       ` Olivier MATZ
  0 siblings, 1 reply; 112+ messages in thread
From: Ananyev, Konstantin @ 2014-11-17 18:15 UTC (permalink / raw)
  To: Olivier Matz, dev; +Cc: jigsaw



> -----Original Message-----
> From: dev [mailto:dev-bounces@dpdk.org] On Behalf Of Olivier Matz
> Sent: Friday, November 14, 2014 5:03 PM
> To: dev@dpdk.org
> Cc: jigsaw@gmail.com
> Subject: [dpdk-dev] [PATCH v2 09/13] mbuf: introduce new checksum API
> 
> Introduce new functions to calculate checksums. These new functions
> are derived from the ones provided in csumonly.c but slightly reworked.
> There is still some room for future optimization of these functions
> (maybe SSE/AVX, ...).
> 
> This API will be modified in the next commits by the introduction of
> TSO that requires a different pseudo header checksum to be set in the
> packet.
> 
> Signed-off-by: Olivier Matz <olivier.matz@6wind.com>

Acked-by: Konstantin Ananyev <konstantin.ananyev@intel.com>

Just 2 nits from me:

1)
> +static inline uint16_t
> +rte_raw_cksum(const char *buf, size_t len)
> +{
...
> +	while (len >= 8) {
> +		sum += u16[0]; sum += u16[1]; sum += u16[2]; sum += u16[3];

Can you put each expression on its own line?
sum += u16[0];
sum += u16[1];
...

That would make it easier to read.
Or could it be rewritten as:
sum += (uint32_t)u16[0] + u16[1] + u16[2] + u16[3];
here?

2) 
> +	while (len >= 8) {
> +		sum += u16[0]; sum += u16[1]; sum += u16[2]; sum += u16[3];
> +		len -= 8;
> +		u16 += 4;
> +	}
> +	while (len >= 2) {
> +		sum += *u16;
> +		len -= 2;
> +		u16 += 1;
> +	}

In the code above, it would probably be better to use sizeof(u16[0]) wherever
appropriate, to make things a bit clearer and more consistent.
...
while (len >=  4 * sizeof(u16[0]))
len -= 4 * sizeof(u16[0]);
u16 += 4; 
...
Same for second loop.
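
Putting both nits together, the function could look roughly like the standalone sketch below. It is not the DPDK code: the name raw_cksum_sketch is hypothetical, and the memcpy-based loads are an assumption added so that the example does not depend on the alignment of the input buffer.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/*
 * Hypothetical standalone variant of the quoted rte_raw_cksum().
 * Returns the non-complemented 16-bit one's-complement sum of the buffer.
 * As in the quoted code, the buffer is assumed to be packet-sized, so the
 * 32-bit accumulator cannot overflow before the final folding.
 */
static inline uint16_t
raw_cksum_sketch(const void *buf, size_t len)
{
	const uint8_t *p = (const uint8_t *)buf;
	uint16_t u16[4];
	uint32_t sum = 0;

	assert(buf != NULL || len == 0);

	/* Unrolled loop: four 16-bit words per iteration. */
	while (len >= 4 * sizeof(u16[0])) {
		memcpy(u16, p, 4 * sizeof(u16[0]));
		sum += u16[0];
		sum += u16[1];
		sum += u16[2];
		sum += u16[3];
		len -= 4 * sizeof(u16[0]);
		p += 4 * sizeof(u16[0]);
	}

	/* Remaining whole words, one at a time. */
	while (len >= sizeof(u16[0])) {
		memcpy(u16, p, sizeof(u16[0]));
		sum += u16[0];
		len -= sizeof(u16[0]);
		p += sizeof(u16[0]);
	}

	/* Trailing odd byte, padded with a zero byte. */
	if (len == 1) {
		u16[0] = 0;
		memcpy(u16, p, 1);
		sum += u16[0];
	}

	/* Fold the carries back into the low 16 bits (twice, as in the patch). */
	sum = ((sum & 0xffff0000) >> 16) + (sum & 0xffff);
	sum = ((sum & 0xffff0000) >> 16) + (sum & 0xffff);
	return (uint16_t)sum;
}
```

A caller computing an IPv4 header checksum would then complement the result, as rte_ipv4_cksum() does in the patch.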


> ---
>  app/test-pmd/csumonly.c    | 133 ++-------------------------------
>  lib/librte_mbuf/rte_mbuf.h |   3 +-
>  lib/librte_net/rte_ip.h    | 179 +++++++++++++++++++++++++++++++++++++++++++++
>  3 files changed, 189 insertions(+), 126 deletions(-)
> 
> diff --git a/app/test-pmd/csumonly.c b/app/test-pmd/csumonly.c
> index dda5d9e..39f974d 100644
> --- a/app/test-pmd/csumonly.c
> +++ b/app/test-pmd/csumonly.c
> @@ -86,137 +86,22 @@
>  #define _htons(x) (x)
>  #endif
> 
> -static inline uint16_t
> -get_16b_sum(uint16_t *ptr16, uint32_t nr)
> -{
> -	uint32_t sum = 0;
> -	while (nr > 1)
> -	{
> -		sum +=*ptr16;
> -		nr -= sizeof(uint16_t);
> -		ptr16++;
> -		if (sum > UINT16_MAX)
> -			sum -= UINT16_MAX;
> -	}
> -
> -	/* If length is in odd bytes */
> -	if (nr)
> -		sum += *((uint8_t*)ptr16);
> -
> -	sum = ((sum & 0xffff0000) >> 16) + (sum & 0xffff);
> -	sum &= 0x0ffff;
> -	return (uint16_t)sum;
> -}
> -
> -static inline uint16_t
> -get_ipv4_cksum(struct ipv4_hdr *ipv4_hdr)
> -{
> -	uint16_t cksum;
> -	cksum = get_16b_sum((uint16_t*)ipv4_hdr, sizeof(struct ipv4_hdr));
> -	return (uint16_t)((cksum == 0xffff)?cksum:~cksum);
> -}
> -
> -
> -static inline uint16_t
> -get_ipv4_psd_sum(struct ipv4_hdr *ip_hdr)
> -{
> -	/* Pseudo Header for IPv4/UDP/TCP checksum */
> -	union ipv4_psd_header {
> -		struct {
> -			uint32_t src_addr; /* IP address of source host. */
> -			uint32_t dst_addr; /* IP address of destination host(s). */
> -			uint8_t  zero;     /* zero. */
> -			uint8_t  proto;    /* L4 protocol type. */
> -			uint16_t len;      /* L4 length. */
> -		} __attribute__((__packed__));
> -		uint16_t u16_arr[0];
> -	} psd_hdr;
> -
> -	psd_hdr.src_addr = ip_hdr->src_addr;
> -	psd_hdr.dst_addr = ip_hdr->dst_addr;
> -	psd_hdr.zero     = 0;
> -	psd_hdr.proto    = ip_hdr->next_proto_id;
> -	psd_hdr.len      = rte_cpu_to_be_16((uint16_t)(rte_be_to_cpu_16(ip_hdr->total_length)
> -				- sizeof(struct ipv4_hdr)));
> -	return get_16b_sum(psd_hdr.u16_arr, sizeof(psd_hdr));
> -}
> -
> -static inline uint16_t
> -get_ipv6_psd_sum(struct ipv6_hdr *ip_hdr)
> -{
> -	/* Pseudo Header for IPv6/UDP/TCP checksum */
> -	union ipv6_psd_header {
> -		struct {
> -			uint8_t src_addr[16]; /* IP address of source host. */
> -			uint8_t dst_addr[16]; /* IP address of destination host(s). */
> -			uint32_t len;         /* L4 length. */
> -			uint32_t proto;       /* L4 protocol - top 3 bytes must be zero */
> -		} __attribute__((__packed__));
> -
> -		uint16_t u16_arr[0]; /* allow use as 16-bit values with safe aliasing */
> -	} psd_hdr;
> -
> -	rte_memcpy(&psd_hdr.src_addr, ip_hdr->src_addr,
> -			sizeof(ip_hdr->src_addr) + sizeof(ip_hdr->dst_addr));
> -	psd_hdr.len       = ip_hdr->payload_len;
> -	psd_hdr.proto     = (ip_hdr->proto << 24);
> -
> -	return get_16b_sum(psd_hdr.u16_arr, sizeof(psd_hdr));
> -}
> -
>  static uint16_t
>  get_psd_sum(void *l3_hdr, uint16_t ethertype)
>  {
>  	if (ethertype == _htons(ETHER_TYPE_IPv4))
> -		return get_ipv4_psd_sum(l3_hdr);
> +		return rte_ipv4_phdr_cksum(l3_hdr);
>  	else /* assume ethertype == ETHER_TYPE_IPv6 */
> -		return get_ipv6_psd_sum(l3_hdr);
> -}
> -
> -static inline uint16_t
> -get_ipv4_udptcp_checksum(struct ipv4_hdr *ipv4_hdr, uint16_t *l4_hdr)
> -{
> -	uint32_t cksum;
> -	uint32_t l4_len;
> -
> -	l4_len = rte_be_to_cpu_16(ipv4_hdr->total_length) - sizeof(struct ipv4_hdr);
> -
> -	cksum = get_16b_sum(l4_hdr, l4_len);
> -	cksum += get_ipv4_psd_sum(ipv4_hdr);
> -
> -	cksum = ((cksum & 0xffff0000) >> 16) + (cksum & 0xffff);
> -	cksum = (~cksum) & 0xffff;
> -	if (cksum == 0)
> -		cksum = 0xffff;
> -	return (uint16_t)cksum;
> -}
> -
> -static inline uint16_t
> -get_ipv6_udptcp_checksum(struct ipv6_hdr *ipv6_hdr, uint16_t *l4_hdr)
> -{
> -	uint32_t cksum;
> -	uint32_t l4_len;
> -
> -	l4_len = rte_be_to_cpu_16(ipv6_hdr->payload_len);
> -
> -	cksum = get_16b_sum(l4_hdr, l4_len);
> -	cksum += get_ipv6_psd_sum(ipv6_hdr);
> -
> -	cksum = ((cksum & 0xffff0000) >> 16) + (cksum & 0xffff);
> -	cksum = (~cksum) & 0xffff;
> -	if (cksum == 0)
> -		cksum = 0xffff;
> -
> -	return (uint16_t)cksum;
> +		return rte_ipv6_phdr_cksum(l3_hdr);
>  }
> 
>  static uint16_t
>  get_udptcp_checksum(void *l3_hdr, void *l4_hdr, uint16_t ethertype)
>  {
>  	if (ethertype == _htons(ETHER_TYPE_IPv4))
> -		return get_ipv4_udptcp_checksum(l3_hdr, l4_hdr);
> +		return rte_ipv4_udptcp_cksum(l3_hdr, l4_hdr);
>  	else /* assume ethertype == ETHER_TYPE_IPv6 */
> -		return get_ipv6_udptcp_checksum(l3_hdr, l4_hdr);
> +		return rte_ipv6_udptcp_cksum(l3_hdr, l4_hdr);
>  }
> 
>  /*
> @@ -294,7 +179,7 @@ process_inner_cksums(void *l3_hdr, uint16_t ethertype, uint16_t l3_len,
>  		if (testpmd_ol_flags & TESTPMD_TX_OFFLOAD_IP_CKSUM)
>  			ol_flags |= PKT_TX_IP_CKSUM;
>  		else
> -			ipv4_hdr->hdr_checksum = get_ipv4_cksum(ipv4_hdr);
> +			ipv4_hdr->hdr_checksum = rte_ipv4_cksum(ipv4_hdr);
> 
>  	}
>  	else if (ethertype != _htons(ETHER_TYPE_IPv6))
> @@ -366,7 +251,7 @@ process_outer_cksums(void *outer_l3_hdr, uint16_t outer_ethertype,
>  		ipv4_hdr->hdr_checksum = 0;
> 
>  		if ((testpmd_ol_flags & TESTPMD_TX_OFFLOAD_VXLAN_CKSUM) == 0)
> -			ipv4_hdr->hdr_checksum = get_ipv4_cksum(ipv4_hdr);
> +			ipv4_hdr->hdr_checksum = rte_ipv4_cksum(ipv4_hdr);
>  	}
> 
>  	udp_hdr = (struct udp_hdr *)((char *)outer_l3_hdr + outer_l3_len);
> @@ -376,12 +261,10 @@ process_outer_cksums(void *outer_l3_hdr, uint16_t outer_ethertype,
>  		if ((testpmd_ol_flags & TESTPMD_TX_OFFLOAD_VXLAN_CKSUM) == 0) {
>  			if (outer_ethertype == _htons(ETHER_TYPE_IPv4))
>  				udp_hdr->dgram_cksum =
> -					get_ipv4_udptcp_checksum(ipv4_hdr,
> -						(uint16_t *)udp_hdr);
> +					rte_ipv4_udptcp_cksum(ipv4_hdr, udp_hdr);
>  			else
>  				udp_hdr->dgram_cksum =
> -					get_ipv6_udptcp_checksum(ipv6_hdr,
> -						(uint16_t *)udp_hdr);
> +					rte_ipv6_udptcp_cksum(ipv6_hdr, udp_hdr);
>  		}
>  	}
> 
> diff --git a/lib/librte_mbuf/rte_mbuf.h b/lib/librte_mbuf/rte_mbuf.h
> index e76617f..3c8e825 100644
> --- a/lib/librte_mbuf/rte_mbuf.h
> +++ b/lib/librte_mbuf/rte_mbuf.h
> @@ -114,7 +114,8 @@ extern "C" {
>   *  - fill l2_len and l3_len in mbuf
>   *  - set the flags PKT_TX_TCP_CKSUM, PKT_TX_SCTP_CKSUM or PKT_TX_UDP_CKSUM
>   *  - calculate the pseudo header checksum and set it in the L4 header (only
> - *    for TCP or UDP). For SCTP, set the crc field to 0.
> + *    for TCP or UDP). See rte_ipv4_phdr_cksum() and rte_ipv6_phdr_cksum().
> + *    For SCTP, set the crc field to 0.
>   */
>  #define PKT_TX_L4_NO_CKSUM   (0ULL << 52) /* Disable L4 cksum of TX pkt. */
>  #define PKT_TX_TCP_CKSUM     (1ULL << 52) /**< TCP cksum of TX pkt. computed by NIC. */
> diff --git a/lib/librte_net/rte_ip.h b/lib/librte_net/rte_ip.h
> index e3f65c1..9cfca7f 100644
> --- a/lib/librte_net/rte_ip.h
> +++ b/lib/librte_net/rte_ip.h
> @@ -78,6 +78,9 @@
> 
>  #include <stdint.h>
> 
> +#include <rte_memcpy.h>
> +#include <rte_byteorder.h>
> +
>  #ifdef __cplusplus
>  extern "C" {
>  #endif
> @@ -247,6 +250,124 @@ struct ipv4_hdr {
>  	((x) >= IPV4_MIN_MCAST && (x) <= IPV4_MAX_MCAST) /**< check if IPv4 address is multicast */
> 
>  /**
> + * Process the non-complemented checksum of a buffer.
> + *
> + * @param buf
> + *   Pointer to the buffer.
> + * @param len
> + *   Length of the buffer.
> + * @return
> + *   The non-complemented checksum.
> + */
> +static inline uint16_t
> +rte_raw_cksum(const char *buf, size_t len)
> +{
> +	const uint16_t *u16 = (const uint16_t *)buf;
> +	uint32_t sum = 0;
> +
> +	while (len >= 8) {
> +		sum += u16[0]; sum += u16[1]; sum += u16[2]; sum += u16[3];
> +		len -= 8;
> +		u16 += 4;
> +	}
> +	while (len >= 2) {
> +		sum += *u16;
> +		len -= 2;
> +		u16 += 1;
> +	}
> +
> +	/* if length is in odd bytes */
> +	if (len == 1)
> +		sum += *((const uint8_t *)u16);
> +
> +	sum = ((sum & 0xffff0000) >> 16) + (sum & 0xffff);
> +	sum = ((sum & 0xffff0000) >> 16) + (sum & 0xffff);
> +	return (uint16_t)sum;
> +}
> +
> +/**
> + * Process the IPv4 checksum of an IPv4 header.
> + *
> + * The checksum field must be set to 0 by the caller.
> + *
> + * @param ipv4_hdr
> + *   The pointer to the contiguous IPv4 header.
> + * @return
> + *   The complemented checksum to set in the IP packet.
> + */
> +static inline uint16_t
> +rte_ipv4_cksum(const struct ipv4_hdr *ipv4_hdr)
> +{
> +	uint16_t cksum;
> +	cksum = rte_raw_cksum((const char *)ipv4_hdr, sizeof(struct ipv4_hdr));
> +	return ((cksum == 0xffff) ? cksum : ~cksum);
> +}
> +
> +/**
> + * Process the pseudo-header checksum of an IPv4 header.
> + *
> + * The checksum field must be set to 0 by the caller.
> + *
> + * @param ipv4_hdr
> + *   The pointer to the contiguous IPv4 header.
> + * @return
> + *   The non-complemented checksum to set in the L4 header.
> + */
> +static inline uint16_t
> +rte_ipv4_phdr_cksum(const struct ipv4_hdr *ipv4_hdr)
> +{
> +	struct ipv4_psd_header {
> +		uint32_t src_addr; /* IP address of source host. */
> +		uint32_t dst_addr; /* IP address of destination host. */
> +		uint8_t  zero;     /* zero. */
> +		uint8_t  proto;    /* L4 protocol type. */
> +		uint16_t len;      /* L4 length. */
> +	} psd_hdr;
> +
> +	psd_hdr.src_addr = ipv4_hdr->src_addr;
> +	psd_hdr.dst_addr = ipv4_hdr->dst_addr;
> +	psd_hdr.zero = 0;
> +	psd_hdr.proto = ipv4_hdr->next_proto_id;
> +	psd_hdr.len = rte_cpu_to_be_16(
> +		(uint16_t)(rte_be_to_cpu_16(ipv4_hdr->total_length)
> +			- sizeof(struct ipv4_hdr)));
> +	return rte_raw_cksum((const char *)&psd_hdr, sizeof(psd_hdr));
> +}
> +
> +/**
> + * Process the IPv4 UDP or TCP checksum.
> + *
> + * The IPv4 header should not contains options. The IP and layer 4
> + * checksum must be set to 0 in the packet by the caller.
> + *
> + * @param ipv4_hdr
> + *   The pointer to the contiguous IPv4 header.
> + * @param l4_hdr
> + *   The pointer to the beginning of the L4 header.
> + * @return
> + *   The complemented checksum to set in the IP packet.
> + */
> +static inline uint16_t
> +rte_ipv4_udptcp_cksum(const struct ipv4_hdr *ipv4_hdr, const void *l4_hdr)
> +{
> +	uint32_t cksum;
> +	uint32_t l4_len;
> +
> +	l4_len = rte_be_to_cpu_16(ipv4_hdr->total_length) -
> +		sizeof(struct ipv4_hdr);
> +
> +	cksum = rte_raw_cksum(l4_hdr, l4_len);
> +	cksum += rte_ipv4_phdr_cksum(ipv4_hdr);
> +
> +	cksum = ((cksum & 0xffff0000) >> 16) + (cksum & 0xffff);
> +	cksum = (~cksum) & 0xffff;
> +	if (cksum == 0)
> +		cksum = 0xffff;
> +
> +	return cksum;
> +}
> +
> +/**
>   * IPv6 Header
>   */
>  struct ipv6_hdr {
> @@ -258,6 +379,64 @@ struct ipv6_hdr {
>  	uint8_t  dst_addr[16]; /**< IP address of destination host(s). */
>  } __attribute__((__packed__));
> 
> +/**
> + * Process the pseudo-header checksum of an IPv6 header.
> + *
> + * @param ipv6_hdr
> + *   The pointer to the contiguous IPv6 header.
> + * @return
> + *   The non-complemented checksum to set in the L4 header.
> + */
> +static inline uint16_t
> +rte_ipv6_phdr_cksum(const struct ipv6_hdr *ipv6_hdr)
> +{
> +	struct ipv6_psd_header {
> +		uint8_t src_addr[16]; /* IP address of source host. */
> +		uint8_t dst_addr[16]; /* IP address of destination host. */
> +		uint32_t len;         /* L4 length. */
> +		uint32_t proto;       /* L4 protocol - top 3 bytes must be zero */
> +	} psd_hdr;
> +
> +	rte_memcpy(&psd_hdr.src_addr, ipv6_hdr->src_addr,
> +		sizeof(ipv6_hdr->src_addr) + sizeof(ipv6_hdr->dst_addr));
> +	psd_hdr.proto = (ipv6_hdr->proto << 24);
> +	psd_hdr.len = ipv6_hdr->payload_len;
> +
> +	return rte_raw_cksum((const char *)&psd_hdr, sizeof(psd_hdr));
> +}
> +
> +/**
> + * Process the IPv6 UDP or TCP checksum.
> + *
> + * The IPv6 header should not contain options. The layer 4 checksum
> + * must be set to 0 in the packet by the caller.
> + *
> + * @param ipv6_hdr
> + *   The pointer to the contiguous IPv6 header.
> + * @param l4_hdr
> + *   The pointer to the beginning of the L4 header.
> + * @return
> + *   The complemented checksum to set in the IP packet.
> + */
> +static inline uint16_t
> +rte_ipv6_udptcp_cksum(const struct ipv6_hdr *ipv6_hdr, const void *l4_hdr)
> +{
> +	uint32_t cksum;
> +	uint32_t l4_len;
> +
> +	l4_len = rte_be_to_cpu_16(ipv6_hdr->payload_len);
> +
> +	cksum = rte_raw_cksum(l4_hdr, l4_len);
> +	cksum += rte_ipv6_phdr_cksum(ipv6_hdr);
> +
> +	cksum = ((cksum & 0xffff0000) >> 16) + (cksum & 0xffff);
> +	cksum = (~cksum) & 0xffff;
> +	if (cksum == 0)
> +		cksum = 0xffff;
> +
> +	return cksum;
> +}
> +
>  #ifdef __cplusplus
>  }
>  #endif
> --
> 2.1.0
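
For readers following the checksum helpers quoted above, here is a minimal standalone sketch of the same one's-complement arithmetic. It is not the code from the patch: demo_raw_cksum() and demo_ipv4_cksum() are hypothetical names, and the sketch assumes the buffer is read as big-endian 16-bit words so that the result is a host-order value.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: one's-complement sum of a buffer, read as
 * big-endian 16-bit words. A 32-bit accumulator is enough for buffers
 * of up to 64K bytes. */
static uint16_t
demo_raw_cksum(const uint8_t *buf, size_t len)
{
	uint32_t sum = 0;
	size_t i;

	assert(buf != NULL || len == 0);
	assert(len <= 65536);

	for (i = 0; i + 1 < len; i += 2)
		sum += ((uint32_t)buf[i] << 8) | buf[i + 1];
	if (len & 1) /* odd trailing byte is padded with a zero byte */
		sum += (uint32_t)buf[len - 1] << 8;

	/* fold the carries back in, twice as in the patch */
	sum = (sum >> 16) + (sum & 0xffff);
	sum = (sum >> 16) + (sum & 0xffff);
	return (uint16_t)sum;
}

/* Hypothetical helper: checksum of a 20-byte IPv4 header whose checksum
 * field (bytes 10 and 11) has been set to 0 by the caller. */
static uint16_t
demo_ipv4_cksum(const uint8_t hdr[20])
{
	return (uint16_t)~demo_raw_cksum(hdr, 20);
}
```

Once the computed value is written back into bytes 10 and 11, summing the header again with demo_raw_cksum() gives 0xffff, which is the usual way a receiver validates it. The patch additionally returns a raw sum of 0xffff unchanged instead of complementing it; the sketch leaves that special case out.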

* Re: [dpdk-dev] [PATCH v2 11/13] ixgbe: support TCP segmentation offload
  2014-11-14 17:03   ` [dpdk-dev] [PATCH v2 11/13] ixgbe: support " Olivier Matz
@ 2014-11-17 18:26     ` Ananyev, Konstantin
  2014-11-18  9:11       ` Olivier MATZ
  0 siblings, 1 reply; 112+ messages in thread
From: Ananyev, Konstantin @ 2014-11-17 18:26 UTC (permalink / raw)
  To: Olivier Matz, dev; +Cc: jigsaw



> -----Original Message-----
> From: dev [mailto:dev-bounces@dpdk.org] On Behalf Of Olivier Matz
> Sent: Friday, November 14, 2014 5:03 PM
> To: dev@dpdk.org
> Cc: jigsaw@gmail.com
> Subject: [dpdk-dev] [PATCH v2 11/13] ixgbe: support TCP segmentation offload
> 
> Implement TSO (TCP segmentation offload) in ixgbe driver. The driver is
> now able to use PKT_TX_TCP_SEG mbuf flag and mbuf hardware offload infos
> (l2_len, l3_len, l4_len, tso_segsz) to configure the hardware support of
> TCP segmentation.
> 
> In ixgbe, when doing TSO, the IP length must not be included in the TCP
> pseudo header checksum. A new function ixgbe_fix_tcp_phdr_cksum() is
> used to fix the pseudo header checksum of the packet before giving it to
> the hardware.
> 
> In the patch, the tx_desc_cksum_flags_to_olinfo() and
> tx_desc_ol_flags_to_cmdtype() functions have been reworked to make them
> clearer. This should not impact performance as gcc (version 4.8 in my
> case) is smart enough to convert the tests into code that does not
> contain any branch instruction.
> 
> Signed-off-by: Olivier Matz <olivier.matz@6wind.com>

Acked-by: Konstantin Ananyev <konstantin.ananyev@intel.com>

Just one thing: the double semicolon below looks like a typo:
> +	/* check if TCP segmentation required for this packet */
> +	if (ol_flags & PKT_TX_TCP_SEG) {
> +		/* implies IP cksum and TCP cksum */
> +		type_tucmd_mlhl = IXGBE_ADVTXD_TUCMD_IPV4 |
> +			IXGBE_ADVTXD_TUCMD_L4T_TCP |
> +			IXGBE_ADVTXD_DTYP_CTXT | IXGBE_ADVTXD_DCMD_DEXT;;
 

> ---
>  lib/librte_pmd_ixgbe/ixgbe_ethdev.c |   3 +-
>  lib/librte_pmd_ixgbe/ixgbe_rxtx.c   | 169 ++++++++++++++++++++++--------------
>  lib/librte_pmd_ixgbe/ixgbe_rxtx.h   |  19 ++--
>  3 files changed, 117 insertions(+), 74 deletions(-)
> 
> diff --git a/lib/librte_pmd_ixgbe/ixgbe_ethdev.c b/lib/librte_pmd_ixgbe/ixgbe_ethdev.c
> index 2eb609c..2c2ecc0 100644
> --- a/lib/librte_pmd_ixgbe/ixgbe_ethdev.c
> +++ b/lib/librte_pmd_ixgbe/ixgbe_ethdev.c
> @@ -1964,7 +1964,8 @@ ixgbe_dev_info_get(struct rte_eth_dev *dev, struct rte_eth_dev_info *dev_info)
>  		DEV_TX_OFFLOAD_IPV4_CKSUM  |
>  		DEV_TX_OFFLOAD_UDP_CKSUM   |
>  		DEV_TX_OFFLOAD_TCP_CKSUM   |
> -		DEV_TX_OFFLOAD_SCTP_CKSUM;
> +		DEV_TX_OFFLOAD_SCTP_CKSUM  |
> +		DEV_TX_OFFLOAD_TCP_TSO;
> 
>  	dev_info->default_rxconf = (struct rte_eth_rxconf) {
>  			.rx_thresh = {
> diff --git a/lib/librte_pmd_ixgbe/ixgbe_rxtx.c b/lib/librte_pmd_ixgbe/ixgbe_rxtx.c
> index 2df3385..19e3b73 100644
> --- a/lib/librte_pmd_ixgbe/ixgbe_rxtx.c
> +++ b/lib/librte_pmd_ixgbe/ixgbe_rxtx.c
> @@ -94,7 +94,8 @@
>  #define IXGBE_TX_OFFLOAD_MASK (			 \
>  		PKT_TX_VLAN_PKT |		 \
>  		PKT_TX_IP_CKSUM |		 \
> -		PKT_TX_L4_MASK)
> +		PKT_TX_L4_MASK |		 \
> +		PKT_TX_TCP_SEG)
> 
>  static inline struct rte_mbuf *
>  rte_rxmbuf_alloc(struct rte_mempool *mp)
> @@ -363,59 +364,84 @@ ixgbe_xmit_pkts_simple(void *tx_queue, struct rte_mbuf **tx_pkts,
>  static inline void
>  ixgbe_set_xmit_ctx(struct igb_tx_queue* txq,
>  		volatile struct ixgbe_adv_tx_context_desc *ctx_txd,
> -		uint64_t ol_flags, uint32_t vlan_macip_lens)
> +		uint64_t ol_flags, union ixgbe_tx_offload tx_offload)
>  {
>  	uint32_t type_tucmd_mlhl;
> -	uint32_t mss_l4len_idx;
> +	uint32_t mss_l4len_idx = 0;
>  	uint32_t ctx_idx;
> -	uint32_t cmp_mask;
> +	uint32_t vlan_macip_lens;
> +	union ixgbe_tx_offload tx_offload_mask;
> 
>  	ctx_idx = txq->ctx_curr;
> -	cmp_mask = 0;
> +	tx_offload_mask.data = 0;
>  	type_tucmd_mlhl = 0;
> 
> +	/* Specify which HW CTX to upload. */
> +	mss_l4len_idx |= (ctx_idx << IXGBE_ADVTXD_IDX_SHIFT);
> +
>  	if (ol_flags & PKT_TX_VLAN_PKT) {
> -		cmp_mask |= TX_VLAN_CMP_MASK;
> +		tx_offload_mask.vlan_tci = ~0;
>  	}
> 
> -	if (ol_flags & PKT_TX_IP_CKSUM) {
> -		type_tucmd_mlhl = IXGBE_ADVTXD_TUCMD_IPV4;
> -		cmp_mask |= TX_MACIP_LEN_CMP_MASK;
> -	}
> +	/* check if TCP segmentation required for this packet */
> +	if (ol_flags & PKT_TX_TCP_SEG) {
> +		/* implies IP cksum and TCP cksum */
> +		type_tucmd_mlhl = IXGBE_ADVTXD_TUCMD_IPV4 |
> +			IXGBE_ADVTXD_TUCMD_L4T_TCP |
> +			IXGBE_ADVTXD_DTYP_CTXT | IXGBE_ADVTXD_DCMD_DEXT;;
> +
> +		tx_offload_mask.l2_len = ~0;
> +		tx_offload_mask.l3_len = ~0;
> +		tx_offload_mask.l4_len = ~0;
> +		tx_offload_mask.tso_segsz = ~0;
> +		mss_l4len_idx |= tx_offload.tso_segsz << IXGBE_ADVTXD_MSS_SHIFT;
> +		mss_l4len_idx |= tx_offload.l4_len << IXGBE_ADVTXD_L4LEN_SHIFT;
> +	} else { /* no TSO, check if hardware checksum is needed */
> +		if (ol_flags & PKT_TX_IP_CKSUM) {
> +			type_tucmd_mlhl = IXGBE_ADVTXD_TUCMD_IPV4;
> +			tx_offload_mask.l2_len = ~0;
> +			tx_offload_mask.l3_len = ~0;
> +		}
> 
> -	/* Specify which HW CTX to upload. */
> -	mss_l4len_idx = (ctx_idx << IXGBE_ADVTXD_IDX_SHIFT);
> -	switch (ol_flags & PKT_TX_L4_MASK) {
> -	case PKT_TX_UDP_CKSUM:
> -		type_tucmd_mlhl |= IXGBE_ADVTXD_TUCMD_L4T_UDP |
> +		switch (ol_flags & PKT_TX_L4_MASK) {
> +		case PKT_TX_UDP_CKSUM:
> +			type_tucmd_mlhl |= IXGBE_ADVTXD_TUCMD_L4T_UDP |
>  				IXGBE_ADVTXD_DTYP_CTXT | IXGBE_ADVTXD_DCMD_DEXT;
> -		mss_l4len_idx |= sizeof(struct udp_hdr) << IXGBE_ADVTXD_L4LEN_SHIFT;
> -		cmp_mask |= TX_MACIP_LEN_CMP_MASK;
> -		break;
> -	case PKT_TX_TCP_CKSUM:
> -		type_tucmd_mlhl |= IXGBE_ADVTXD_TUCMD_L4T_TCP |
> +			mss_l4len_idx |= sizeof(struct udp_hdr) << IXGBE_ADVTXD_L4LEN_SHIFT;
> +			tx_offload_mask.l2_len = ~0;
> +			tx_offload_mask.l3_len = ~0;
> +			break;
> +		case PKT_TX_TCP_CKSUM:
> +			type_tucmd_mlhl |= IXGBE_ADVTXD_TUCMD_L4T_TCP |
>  				IXGBE_ADVTXD_DTYP_CTXT | IXGBE_ADVTXD_DCMD_DEXT;
> -		mss_l4len_idx |= sizeof(struct tcp_hdr) << IXGBE_ADVTXD_L4LEN_SHIFT;
> -		cmp_mask |= TX_MACIP_LEN_CMP_MASK;
> -		break;
> -	case PKT_TX_SCTP_CKSUM:
> -		type_tucmd_mlhl |= IXGBE_ADVTXD_TUCMD_L4T_SCTP |
> +			mss_l4len_idx |= sizeof(struct tcp_hdr) << IXGBE_ADVTXD_L4LEN_SHIFT;
> +			tx_offload_mask.l2_len = ~0;
> +			tx_offload_mask.l3_len = ~0;
> +			tx_offload_mask.l4_len = ~0;
> +			break;
> +		case PKT_TX_SCTP_CKSUM:
> +			type_tucmd_mlhl |= IXGBE_ADVTXD_TUCMD_L4T_SCTP |
>  				IXGBE_ADVTXD_DTYP_CTXT | IXGBE_ADVTXD_DCMD_DEXT;
> -		mss_l4len_idx |= sizeof(struct sctp_hdr) << IXGBE_ADVTXD_L4LEN_SHIFT;
> -		cmp_mask |= TX_MACIP_LEN_CMP_MASK;
> -		break;
> -	default:
> -		type_tucmd_mlhl |= IXGBE_ADVTXD_TUCMD_L4T_RSV |
> +			mss_l4len_idx |= sizeof(struct sctp_hdr) << IXGBE_ADVTXD_L4LEN_SHIFT;
> +			tx_offload_mask.l2_len = ~0;
> +			tx_offload_mask.l3_len = ~0;
> +			break;
> +		default:
> +			type_tucmd_mlhl |= IXGBE_ADVTXD_TUCMD_L4T_RSV |
>  				IXGBE_ADVTXD_DTYP_CTXT | IXGBE_ADVTXD_DCMD_DEXT;
> -		break;
> +			break;
> +		}
>  	}
> 
>  	txq->ctx_cache[ctx_idx].flags = ol_flags;
> -	txq->ctx_cache[ctx_idx].cmp_mask = cmp_mask;
> -	txq->ctx_cache[ctx_idx].vlan_macip_lens.data =
> -		vlan_macip_lens & cmp_mask;
> +	txq->ctx_cache[ctx_idx].tx_offload.data  =
> +		tx_offload_mask.data & tx_offload.data;
> +	txq->ctx_cache[ctx_idx].tx_offload_mask    = tx_offload_mask;
> 
>  	ctx_txd->type_tucmd_mlhl = rte_cpu_to_le_32(type_tucmd_mlhl);
> +	vlan_macip_lens = tx_offload.l3_len;
> +	vlan_macip_lens |= (tx_offload.l2_len << IXGBE_ADVTXD_MACLEN_SHIFT);
> +	vlan_macip_lens |= ((uint32_t)tx_offload.vlan_tci << IXGBE_ADVTXD_VLAN_SHIFT);
>  	ctx_txd->vlan_macip_lens = rte_cpu_to_le_32(vlan_macip_lens);
>  	ctx_txd->mss_l4len_idx   = rte_cpu_to_le_32(mss_l4len_idx);
>  	ctx_txd->seqnum_seed     = 0;
> @@ -427,20 +453,20 @@ ixgbe_set_xmit_ctx(struct igb_tx_queue* txq,
>   */
>  static inline uint32_t
>  what_advctx_update(struct igb_tx_queue *txq, uint64_t flags,
> -		uint32_t vlan_macip_lens)
> +		union ixgbe_tx_offload tx_offload)
>  {
>  	/* If match with the current used context */
>  	if (likely((txq->ctx_cache[txq->ctx_curr].flags == flags) &&
> -		(txq->ctx_cache[txq->ctx_curr].vlan_macip_lens.data ==
> -		(txq->ctx_cache[txq->ctx_curr].cmp_mask & vlan_macip_lens)))) {
> +		(txq->ctx_cache[txq->ctx_curr].tx_offload.data ==
> +		(txq->ctx_cache[txq->ctx_curr].tx_offload_mask.data & tx_offload.data)))) {
>  			return txq->ctx_curr;
>  	}
> 
>  	/* What if match with the next context  */
>  	txq->ctx_curr ^= 1;
>  	if (likely((txq->ctx_cache[txq->ctx_curr].flags == flags) &&
> -		(txq->ctx_cache[txq->ctx_curr].vlan_macip_lens.data ==
> -		(txq->ctx_cache[txq->ctx_curr].cmp_mask & vlan_macip_lens)))) {
> +		(txq->ctx_cache[txq->ctx_curr].tx_offload.data ==
> +		(txq->ctx_cache[txq->ctx_curr].tx_offload_mask.data & tx_offload.data)))) {
>  			return txq->ctx_curr;
>  	}
> 
> @@ -451,20 +477,25 @@ what_advctx_update(struct igb_tx_queue *txq, uint64_t flags,
>  static inline uint32_t
>  tx_desc_cksum_flags_to_olinfo(uint64_t ol_flags)
>  {
> -	static const uint32_t l4_olinfo[2] = {0, IXGBE_ADVTXD_POPTS_TXSM};
> -	static const uint32_t l3_olinfo[2] = {0, IXGBE_ADVTXD_POPTS_IXSM};
> -	uint32_t tmp;
> -
> -	tmp  = l4_olinfo[(ol_flags & PKT_TX_L4_MASK)  != PKT_TX_L4_NO_CKSUM];
> -	tmp |= l3_olinfo[(ol_flags & PKT_TX_IP_CKSUM) != 0];
> +	uint32_t tmp = 0;
> +	if ((ol_flags & PKT_TX_L4_MASK) != PKT_TX_L4_NO_CKSUM)
> +		tmp |= IXGBE_ADVTXD_POPTS_TXSM;
> +	if (ol_flags & PKT_TX_IP_CKSUM)
> +		tmp |= IXGBE_ADVTXD_POPTS_IXSM;
> +	if (ol_flags & PKT_TX_TCP_SEG)
> +		tmp |= IXGBE_ADVTXD_POPTS_TXSM;
>  	return tmp;
>  }
> 
>  static inline uint32_t
> -tx_desc_vlan_flags_to_cmdtype(uint64_t ol_flags)
> +tx_desc_ol_flags_to_cmdtype(uint64_t ol_flags)
>  {
> -	static const uint32_t vlan_cmd[2] = {0, IXGBE_ADVTXD_DCMD_VLE};
> -	return vlan_cmd[(ol_flags & PKT_TX_VLAN_PKT) != 0];
> +	uint32_t cmdtype = 0;
> +	if (ol_flags & PKT_TX_VLAN_PKT)
> +		cmdtype |= IXGBE_ADVTXD_DCMD_VLE;
> +	if (ol_flags & PKT_TX_TCP_SEG)
> +		cmdtype |= IXGBE_ADVTXD_DCMD_TSE;
> +	return cmdtype;
>  }
> 
>  /* Default RS bit threshold values */
> @@ -545,14 +576,6 @@ ixgbe_xmit_pkts(void *tx_queue, struct rte_mbuf **tx_pkts,
>  	volatile union ixgbe_adv_tx_desc *txd;
>  	struct rte_mbuf     *tx_pkt;
>  	struct rte_mbuf     *m_seg;
> -	union ixgbe_vlan_macip vlan_macip_lens;
> -	union {
> -		uint16_t u16;
> -		struct {
> -			uint16_t l3_len:9;
> -			uint16_t l2_len:7;
> -		};
> -	} l2_l3_len;
>  	uint64_t buf_dma_addr;
>  	uint32_t olinfo_status;
>  	uint32_t cmd_type_len;
> @@ -566,6 +589,7 @@ ixgbe_xmit_pkts(void *tx_queue, struct rte_mbuf **tx_pkts,
>  	uint64_t tx_ol_req;
>  	uint32_t ctx = 0;
>  	uint32_t new_ctx;
> +	union ixgbe_tx_offload tx_offload = { .data = 0 };
> 
>  	txq = tx_queue;
>  	sw_ring = txq->sw_ring;
> @@ -595,14 +619,15 @@ ixgbe_xmit_pkts(void *tx_queue, struct rte_mbuf **tx_pkts,
>  		/* If hardware offload required */
>  		tx_ol_req = ol_flags & IXGBE_TX_OFFLOAD_MASK;
>  		if (tx_ol_req) {
> -			l2_l3_len.l2_len = tx_pkt->l2_len;
> -			l2_l3_len.l3_len = tx_pkt->l3_len;
> -			vlan_macip_lens.f.vlan_tci = tx_pkt->vlan_tci;
> -			vlan_macip_lens.f.l2_l3_len = l2_l3_len.u16;
> +			tx_offload.l2_len = tx_pkt->l2_len;
> +			tx_offload.l3_len = tx_pkt->l3_len;
> +			tx_offload.l4_len = tx_pkt->l4_len;
> +			tx_offload.vlan_tci = tx_pkt->vlan_tci;
> +			tx_offload.tso_segsz = tx_pkt->tso_segsz;
> 
>  			/* If new context need be built or reuse the exist ctx. */
>  			ctx = what_advctx_update(txq, tx_ol_req,
> -				vlan_macip_lens.data);
> +				tx_offload);
>  			/* Only allocate context descriptor if required*/
>  			new_ctx = (ctx == IXGBE_CTX_NUM);
>  			ctx = txq->ctx_curr;
> @@ -717,13 +742,22 @@ ixgbe_xmit_pkts(void *tx_queue, struct rte_mbuf **tx_pkts,
>  		 */
>  		cmd_type_len = IXGBE_ADVTXD_DTYP_DATA |
>  			IXGBE_ADVTXD_DCMD_IFCS | IXGBE_ADVTXD_DCMD_DEXT;
> -		olinfo_status = (pkt_len << IXGBE_ADVTXD_PAYLEN_SHIFT);
> +
>  #ifdef RTE_LIBRTE_IEEE1588
>  		if (ol_flags & PKT_TX_IEEE1588_TMST)
>  			cmd_type_len |= IXGBE_ADVTXD_MAC_1588;
>  #endif
> 
> +		olinfo_status = 0;
>  		if (tx_ol_req) {
> +
> +			if (ol_flags & PKT_TX_TCP_SEG) {
> +				/* when TSO is on, paylen in descriptor is
> +				 * not the packet len but the tcp payload len */
> +				pkt_len -= (tx_offload.l2_len +
> +					tx_offload.l3_len + tx_offload.l4_len);
> +			}
> +
>  			/*
>  			 * Setup the TX Advanced Context Descriptor if required
>  			 */
> @@ -744,7 +778,7 @@ ixgbe_xmit_pkts(void *tx_queue, struct rte_mbuf **tx_pkts,
>  				}
> 
>  				ixgbe_set_xmit_ctx(txq, ctx_txd, tx_ol_req,
> -				    vlan_macip_lens.data);
> +					tx_offload);
> 
>  				txe->last_id = tx_last;
>  				tx_id = txe->next_id;
> @@ -756,11 +790,13 @@ ixgbe_xmit_pkts(void *tx_queue, struct rte_mbuf **tx_pkts,
>  			 * This path will go through
>  			 * whatever new/reuse the context descriptor
>  			 */
> -			cmd_type_len  |= tx_desc_vlan_flags_to_cmdtype(ol_flags);
> +			cmd_type_len  |= tx_desc_ol_flags_to_cmdtype(ol_flags);
>  			olinfo_status |= tx_desc_cksum_flags_to_olinfo(ol_flags);
>  			olinfo_status |= ctx << IXGBE_ADVTXD_IDX_SHIFT;
>  		}
> 
> +		olinfo_status |= (pkt_len << IXGBE_ADVTXD_PAYLEN_SHIFT);
> +
>  		m_seg = tx_pkt;
>  		do {
>  			txd = &txr[tx_id];
> @@ -3611,9 +3647,10 @@ ixgbe_dev_tx_init(struct rte_eth_dev *dev)
>  	PMD_INIT_FUNC_TRACE();
>  	hw = IXGBE_DEV_PRIVATE_TO_HW(dev->data->dev_private);
> 
> -	/* Enable TX CRC (checksum offload requirement) */
> +	/* Enable TX CRC (checksum offload requirement) and hw padding
> +	 * (TSO requirement) */
>  	hlreg0 = IXGBE_READ_REG(hw, IXGBE_HLREG0);
> -	hlreg0 |= IXGBE_HLREG0_TXCRCEN;
> +	hlreg0 |= (IXGBE_HLREG0_TXCRCEN | IXGBE_HLREG0_TXPADEN);
>  	IXGBE_WRITE_REG(hw, IXGBE_HLREG0, hlreg0);
> 
>  	/* Setup the Base and Length of the Tx Descriptor Rings */
> diff --git a/lib/librte_pmd_ixgbe/ixgbe_rxtx.h b/lib/librte_pmd_ixgbe/ixgbe_rxtx.h
> index eb89715..13099af 100644
> --- a/lib/librte_pmd_ixgbe/ixgbe_rxtx.h
> +++ b/lib/librte_pmd_ixgbe/ixgbe_rxtx.h
> @@ -145,13 +145,16 @@ enum ixgbe_advctx_num {
>  };
> 
>  /** Offload features */
> -union ixgbe_vlan_macip {
> -	uint32_t data;
> +union ixgbe_tx_offload {
> +	uint64_t data;
>  	struct {
> -		uint16_t l2_l3_len; /**< combined 9-bit l3, 7-bit l2 lengths */
> -		uint16_t vlan_tci;
> +		uint64_t l2_len:7; /**< L2 (MAC) Header Length. */
> +		uint64_t l3_len:9; /**< L3 (IP) Header Length. */
> +		uint64_t l4_len:8; /**< L4 (TCP/UDP) Header Length. */
> +		uint64_t tso_segsz:16; /**< TCP TSO segment size */
> +		uint64_t vlan_tci:16;
>  		/**< VLAN Tag Control Identifier (CPU order). */
> -	} f;
> +	};
>  };
> 
>  /*
> @@ -170,8 +173,10 @@ union ixgbe_vlan_macip {
> 
>  struct ixgbe_advctx_info {
>  	uint64_t flags;           /**< ol_flags for context build. */
> -	uint32_t cmp_mask;        /**< compare mask for vlan_macip_lens */
> -	union ixgbe_vlan_macip vlan_macip_lens; /**< vlan, mac ip length. */
> +	/** tx offload: vlan, tso, l2-l3-l4 lengths. */
> +	union ixgbe_tx_offload tx_offload;
> +	/** compare mask for tx offload. */
> +	union ixgbe_tx_offload tx_offload_mask;
>  };
> 
>  /**
> --
> 2.1.0
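
The context-cache comparison reworked in the patch above can be sketched in isolation. This is a simplified illustration, not the driver code: the demo_* names are hypothetical, the union only mirrors the field layout of ixgbe_tx_offload, and it assumes a compiler that accepts 64-bit bit-fields and C11 anonymous structs (gcc and clang do).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Same field layout as the union added by the patch (hypothetical name). */
union demo_tx_offload {
	uint64_t data;
	struct {
		uint64_t l2_len:7;
		uint64_t l3_len:9;
		uint64_t l4_len:8;
		uint64_t tso_segsz:16;
		uint64_t vlan_tci:16;
	};
};

/* One cached hardware context: flags, masked offload values, mask. */
struct demo_ctx_entry {
	uint64_t flags;
	union demo_tx_offload offload;
	union demo_tx_offload mask;
};

/* Remember only the fields that the context actually depends on. */
static void
demo_ctx_store(struct demo_ctx_entry *e, uint64_t flags,
		union demo_tx_offload offload, union demo_tx_offload mask)
{
	assert(e != NULL);
	e->flags = flags;
	e->offload.data = offload.data & mask.data;
	e->mask = mask;
}

/* A packet can reuse the context if its flags are identical and its
 * offload fields are equal on every bit selected by the mask. */
static bool
demo_ctx_matches(const struct demo_ctx_entry *e, uint64_t flags,
		union demo_tx_offload offload)
{
	assert(e != NULL);
	return e->flags == flags &&
		e->offload.data == (e->mask.data & offload.data);
}
```

With a mask covering only l2_len and l3_len (the IP checksum case), two packets that differ only in l4_len or tso_segsz still match the same context, whereas with TSO the mask also covers those fields and a change of MSS forces a new context descriptor.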

* Re: [dpdk-dev] [PATCH v2 06/13] mbuf: add functions to get the name of an ol_flag
  2014-11-14 17:03   ` [dpdk-dev] [PATCH v2 06/13] mbuf: add functions to get the name of an ol_flag Olivier Matz
  2014-11-17 10:39     ` Bruce Richardson
@ 2014-11-17 19:00     ` Ananyev, Konstantin
  2014-11-18  9:29       ` Olivier MATZ
  1 sibling, 1 reply; 112+ messages in thread
From: Ananyev, Konstantin @ 2014-11-17 19:00 UTC (permalink / raw)
  To: Olivier Matz, dev; +Cc: jigsaw

Hi Olivier,

> -----Original Message-----
> From: dev [mailto:dev-bounces@dpdk.org] On Behalf Of Olivier Matz
> Sent: Friday, November 14, 2014 5:03 PM
> To: dev@dpdk.org
> Cc: jigsaw@gmail.com
> Subject: [dpdk-dev] [PATCH v2 06/13] mbuf: add functions to get the name of an ol_flag
> 
> In test-pmd (rxonly.c), the code is able to dump the list of ol_flags.
> The issue is that the list of flags in the application has to be
> synchronized with the flags defined in rte_mbuf.h.
> 
> This patch introduces 2 new functions, rte_get_rx_ol_flag_name()
> and rte_get_tx_ol_flag_name(), that return the name of a flag from
> its mask. It also fixes rxonly.c to use these new functions and to
> display the proper flags.
> 
> Signed-off-by: Olivier Matz <olivier.matz@6wind.com>
> ---
>  app/test-pmd/rxonly.c      | 36 ++++++++++--------------------------
>  lib/librte_mbuf/rte_mbuf.c | 45 +++++++++++++++++++++++++++++++++++++++++++++
>  lib/librte_mbuf/rte_mbuf.h | 22 ++++++++++++++++++++++
>  3 files changed, 77 insertions(+), 26 deletions(-)
> 
> diff --git a/app/test-pmd/rxonly.c b/app/test-pmd/rxonly.c
> index 9ad1df6..51a530a 100644
> --- a/app/test-pmd/rxonly.c
> +++ b/app/test-pmd/rxonly.c
> @@ -71,26 +71,6 @@
> 
>  #include "testpmd.h"
> 
> -#define MAX_PKT_RX_FLAGS 13
> -static const char *pkt_rx_flag_names[MAX_PKT_RX_FLAGS] = {
> -	"VLAN_PKT",
> -	"RSS_HASH",
> -	"PKT_RX_FDIR",
> -	"IP_CKSUM",
> -	"IP_CKSUM_BAD",
> -
> -	"IPV4_HDR",
> -	"IPV4_HDR_EXT",
> -	"IPV6_HDR",
> -	"IPV6_HDR_EXT",
> -
> -	"IEEE1588_PTP",
> -	"IEEE1588_TMST",
> -
> -	"TUNNEL_IPV4_HDR",
> -	"TUNNEL_IPV6_HDR",
> -};
> -
>  static inline void
>  print_ether_addr(const char *what, struct ether_addr *eth_addr)
>  {
> @@ -214,12 +194,16 @@ pkt_burst_receive(struct fwd_stream *fs)
>  		printf(" - Receive queue=0x%x", (unsigned) fs->rx_queue);
>  		printf("\n");
>  		if (ol_flags != 0) {
> -			int rxf;
> -
> -			for (rxf = 0; rxf < MAX_PKT_RX_FLAGS; rxf++) {
> -				if (ol_flags & (1 << rxf))
> -					printf("  PKT_RX_%s\n",
> -					       pkt_rx_flag_names[rxf]);
> +			unsigned rxf;
> +			const char *name;
> +
> +			for (rxf = 0; rxf < sizeof(mb->ol_flags) * 8; rxf++) {
> +				if ((ol_flags & (1ULL << rxf)) == 0)
> +					continue;
> +				name = rte_get_rx_ol_flag_name(1ULL << rxf);
> +				if (name == NULL)
> +					continue;
> +				printf("  %s\n", name);
>  			}
>  		}
>  		rte_pktmbuf_free(mb);
> diff --git a/lib/librte_mbuf/rte_mbuf.c b/lib/librte_mbuf/rte_mbuf.c
> index 52e7574..5cd9137 100644
> --- a/lib/librte_mbuf/rte_mbuf.c
> +++ b/lib/librte_mbuf/rte_mbuf.c
> @@ -196,3 +196,48 @@ rte_pktmbuf_dump(FILE *f, const struct rte_mbuf *m, unsigned dump_len)
>  		nb_segs --;
>  	}
>  }
> +
> +/*
> + * Get the name of a RX offload flag
> + */
> +const char *rte_get_rx_ol_flag_name(uint64_t mask)
> +{
> +	switch (mask) {
> +	case PKT_RX_VLAN_PKT: return "PKT_RX_VLAN_PKT";
> +	case PKT_RX_RSS_HASH: return "PKT_RX_RSS_HASH";
> +	case PKT_RX_FDIR: return "PKT_RX_FDIR";
> +	case PKT_RX_L4_CKSUM_BAD: return "PKT_RX_L4_CKSUM_BAD";
> +	case PKT_RX_IP_CKSUM_BAD: return "PKT_RX_IP_CKSUM_BAD";
> +	/* case PKT_RX_EIP_CKSUM_BAD: return "PKT_RX_EIP_CKSUM_BAD"; */
> +	/* case PKT_RX_OVERSIZE: return "PKT_RX_OVERSIZE"; */
> +	/* case PKT_RX_HBUF_OVERFLOW: return "PKT_RX_HBUF_OVERFLOW"; */
> +	/* case PKT_RX_RECIP_ERR: return "PKT_RX_RECIP_ERR"; */
> +	/* case PKT_RX_MAC_ERR: return "PKT_RX_MAC_ERR"; */

I didn't spot it before, but why do you need these 5 commented-out lines?
In fact, why do we need these flags at all if they are all equal to zero right now?
I know these flags were not introduced by this patch; as far as I can see, they were a temporary measure,
because the old ol_flags were just 16 bits long:
http://dpdk.org/ml/archives/dev/2014-June/003308.html
So I wonder whether these flags should now either get proper values or be removed.

Konstantin

> +	case PKT_RX_IPV4_HDR: return "PKT_RX_IPV4_HDR";
> +	case PKT_RX_IPV4_HDR_EXT: return "PKT_RX_IPV4_HDR_EXT";
> +	case PKT_RX_IPV6_HDR: return "PKT_RX_IPV6_HDR";
> +	case PKT_RX_IPV6_HDR_EXT: return "PKT_RX_IPV6_HDR_EXT";
> +	case PKT_RX_IEEE1588_PTP: return "PKT_RX_IEEE1588_PTP";
> +	case PKT_RX_IEEE1588_TMST: return "PKT_RX_IEEE1588_TMST";
> +	case PKT_RX_TUNNEL_IPV4_HDR: return "PKT_RX_TUNNEL_IPV4_HDR";
> +	case PKT_RX_TUNNEL_IPV6_HDR: return "PKT_RX_TUNNEL_IPV6_HDR";
> +	default: return NULL;
> +	}
> +}
> +
> +/*
> + * Get the name of a TX offload flag
> + */
> +const char *rte_get_tx_ol_flag_name(uint64_t mask)
> +{
> +	switch (mask) {
> +	case PKT_TX_VLAN_PKT: return "PKT_TX_VLAN_PKT";
> +	case PKT_TX_IP_CKSUM: return "PKT_TX_IP_CKSUM";
> +	case PKT_TX_TCP_CKSUM: return "PKT_TX_TCP_CKSUM";
> +	case PKT_TX_SCTP_CKSUM: return "PKT_TX_SCTP_CKSUM";
> +	case PKT_TX_UDP_CKSUM: return "PKT_TX_UDP_CKSUM";
> +	case PKT_TX_IEEE1588_TMST: return "PKT_TX_IEEE1588_TMST";
> +	case PKT_TX_VXLAN_CKSUM: return "PKT_TX_VXLAN_CKSUM";
> +	default: return NULL;
> +	}
> +}
> diff --git a/lib/librte_mbuf/rte_mbuf.h b/lib/librte_mbuf/rte_mbuf.h
> index 68fb988..e76617f 100644
> --- a/lib/librte_mbuf/rte_mbuf.h
> +++ b/lib/librte_mbuf/rte_mbuf.h
> @@ -129,6 +129,28 @@ extern "C" {
>  /* Use final bit of flags to indicate a control mbuf */
>  #define CTRL_MBUF_FLAG       (1ULL << 63) /**< Mbuf contains control data */
> 
> +/**
> + * Get the name of a RX offload flag
> + *
> + * @param mask
> + *   The mask describing the flag.
> + * @return
> + *   The name of this flag, or NULL if it's not a valid RX flag.
> + */
> +const char *rte_get_rx_ol_flag_name(uint64_t mask);
> +
> +/**
> + * Get the name of a TX offload flag
> + *
> + * @param mask
> + *   The mask describing the flag. Usually only one bit must be set.
> + *   Several bits can be given if they belong to the same mask.
> + *   Ex: PKT_TX_L4_MASK.
> + * @return
> + *   The name of this flag, or NULL if it's not a valid TX flag.
> + */
> +const char *rte_get_tx_ol_flag_name(uint64_t mask);
> +
>  /* define a set of marker types that can be used to refer to set points in the
>   * mbuf */
>  typedef void    *MARKER[0];   /**< generic marker for a point in a structure */
> --
> 2.1.0
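
The lookup loop that replaces the name table in rxonly.c can be sketched without DPDK as follows. The flag values and the demo_* names are hypothetical and only stand in for the PKT_RX_* definitions and rte_get_rx_ol_flag_name().

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical flag values, not the ones from rte_mbuf.h. */
#define DEMO_RX_VLAN_PKT (1ULL << 0)
#define DEMO_RX_RSS_HASH (1ULL << 1)
#define DEMO_RX_FDIR     (1ULL << 2)

/* Return the name of a single flag, or NULL if the mask is unknown. */
static const char *
demo_rx_flag_name(uint64_t mask)
{
	switch (mask) {
	case DEMO_RX_VLAN_PKT: return "DEMO_RX_VLAN_PKT";
	case DEMO_RX_RSS_HASH: return "DEMO_RX_RSS_HASH";
	case DEMO_RX_FDIR: return "DEMO_RX_FDIR";
	default: return NULL;
	}
}

/* Store the names of all known flags set in ol_flags into names[] and
 * return how many were stored (at most max). */
static size_t
demo_list_rx_flags(uint64_t ol_flags, const char **names, size_t max)
{
	size_t n = 0;
	unsigned bit;

	assert(names != NULL || max == 0);
	for (bit = 0; bit < sizeof(ol_flags) * 8 && n < max; bit++) {
		const char *name;

		if ((ol_flags & (1ULL << bit)) == 0)
			continue;
		name = demo_rx_flag_name(1ULL << bit);
		if (name == NULL) /* bit set but no name is known */
			continue;
		names[n++] = name;
	}
	return n;
}
```

Because the names live next to the flag definitions, the application no longer needs its own table indexed by bit position. Note also that two flags defined with the same value (for instance both 0) could not each have a case label in such a switch, since C rejects duplicate case values.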

* Re: [dpdk-dev] [PATCH v2 03/13] mbuf: move vxlan_cksum flag definition at the proper place
  2014-11-14 17:03   ` [dpdk-dev] [PATCH v2 03/13] mbuf: move vxlan_cksum flag definition at the proper place Olivier Matz
@ 2014-11-17 22:05     ` Thomas Monjalon
  2014-11-18 14:10       ` Olivier MATZ
  0 siblings, 1 reply; 112+ messages in thread
From: Thomas Monjalon @ 2014-11-17 22:05 UTC (permalink / raw)
  To: Olivier Matz; +Cc: dev

2014-11-14 18:03, Olivier Matz:
> The tx mbuf flags are ordered from the highest value to the
> lowest. Move the PKT_TX_VXLAN_CKSUM to the right place.

Please, could you reorder them from the lowest to the highest?
It will be simpler to understand. There is already a comment to explain
the reverse allocation of Tx flags.

Thanks
-- 
Thomas

* Re: [dpdk-dev] [PATCH v2 10/13] mbuf: generic support for TCP segmentation offload
  2014-11-14 17:03   ` [dpdk-dev] [PATCH v2 10/13] mbuf: generic support for TCP segmentation offload Olivier Matz
@ 2014-11-17 23:33     ` Ananyev, Konstantin
  0 siblings, 0 replies; 112+ messages in thread
From: Ananyev, Konstantin @ 2014-11-17 23:33 UTC (permalink / raw)
  To: Olivier Matz, dev; +Cc: jigsaw



> -----Original Message-----
> From: dev [mailto:dev-bounces@dpdk.org] On Behalf Of Olivier Matz
> Sent: Friday, November 14, 2014 5:03 PM
> To: dev@dpdk.org
> Cc: jigsaw@gmail.com
> Subject: [dpdk-dev] [PATCH v2 10/13] mbuf: generic support for TCP segmentation offload
> 
> Some of the NICs supported by DPDK can accelerate TCP traffic by
> using segmentation offload. The application prepares a packet with
> a valid TCP header and a size of up to 64K, and delegates the
> segmentation to the NIC.
> 
> Implement the generic part of TCP segmentation offload in rte_mbuf. It
> introduces 2 new fields in rte_mbuf: l4_len (length of L4 header in bytes)
> and tso_segsz (MSS of packets).
> 
> To delegate the TCP segmentation to the hardware, the user has to:
> 
> - set the PKT_TX_TCP_SEG flag in mbuf->ol_flags (this flag implies
>   PKT_TX_TCP_CKSUM)
> - set PKT_TX_IP_CKSUM if it's IPv4, and set the IP checksum to 0 in
>   the packet
> - fill the mbuf offload information: l2_len, l3_len, l4_len, tso_segsz
> - calculate the pseudo header checksum without taking ip_len into account,
>   and set it in the TCP header, for instance by using
>   rte_ipv4_phdr_cksum(ip_hdr, ol_flags)
> 
> The API is inspired from ixgbe hardware (the next commit adds the
> support for ixgbe), but it seems generic enough to be used for other
> hw/drivers in the future.
> 
> This commit also reworks the way l2_len and l3_len are used in igb
> and ixgbe drivers as the l2_l3_len is not available anymore in mbuf.
> 
> Signed-off-by: Mirek Walukiewicz <miroslaw.walukiewicz@intel.com>
> Signed-off-by: Olivier Matz <olivier.matz@6wind.com>

Acked-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
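
The last step of the list in the commit message (pseudo-header checksum without the length when TSO is requested) can be illustrated with a standalone sketch. demo_ipv4_phdr_cksum() is a hypothetical helper, not the rte_ipv4_phdr_cksum() from the patch, and it assumes all arguments are given in host byte order.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper: non-complemented IPv4 pseudo-header checksum.
 * All arguments are in host byte order. */
static uint16_t
demo_ipv4_phdr_cksum(uint32_t src, uint32_t dst, uint8_t proto,
		uint16_t l4_len, bool tso)
{
	uint32_t sum = 0;

	sum += (src >> 16) + (src & 0xffff);
	sum += (dst >> 16) + (dst & 0xffff);
	sum += proto;          /* zero byte followed by the protocol */
	if (!tso)
		sum += l4_len; /* left out when the NIC segments the packet */

	/* fold the carries back into 16 bits */
	sum = (sum >> 16) + (sum & 0xffff);
	sum = (sum >> 16) + (sum & 0xffff);
	assert(sum <= 0xffff);
	return (uint16_t)sum;
}
```

For 192.168.0.1 to 192.168.0.199 with TCP (protocol 6) and a 20-byte L4 length, the standard value is 0x8233, while the TSO variant is 0x821f whatever the length. A plausible reading of why the length is left out is that each segment produced by the hardware has its own length, so the hardware has to account for it per segment.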

> ---
>  app/test-pmd/testpmd.c            |  2 +-
>  examples/ipv4_multicast/main.c    |  2 +-
>  lib/librte_mbuf/rte_mbuf.c        |  1 +
>  lib/librte_mbuf/rte_mbuf.h        | 44 +++++++++++++++++++++++----------------
>  lib/librte_net/rte_ip.h           | 39 +++++++++++++++++++++++++++-------
>  lib/librte_pmd_e1000/igb_rxtx.c   | 11 +++++++++-
>  lib/librte_pmd_ixgbe/ixgbe_rxtx.c | 11 +++++++++-
>  7 files changed, 81 insertions(+), 29 deletions(-)
> 
> diff --git a/app/test-pmd/testpmd.c b/app/test-pmd/testpmd.c
> index 12adafa..632a993 100644
> --- a/app/test-pmd/testpmd.c
> +++ b/app/test-pmd/testpmd.c
> @@ -408,7 +408,7 @@ testpmd_mbuf_ctor(struct rte_mempool *mp,
>  	mb->ol_flags     = 0;
>  	mb->data_off     = RTE_PKTMBUF_HEADROOM;
>  	mb->nb_segs      = 1;
> -	mb->l2_l3_len       = 0;
> +	mb->tx_offload   = 0;
>  	mb->vlan_tci     = 0;
>  	mb->hash.rss     = 0;
>  }
> diff --git a/examples/ipv4_multicast/main.c b/examples/ipv4_multicast/main.c
> index 590d11a..80c5140 100644
> --- a/examples/ipv4_multicast/main.c
> +++ b/examples/ipv4_multicast/main.c
> @@ -302,7 +302,7 @@ mcast_out_pkt(struct rte_mbuf *pkt, int use_clone)
>  	/* copy metadata from source packet*/
>  	hdr->port = pkt->port;
>  	hdr->vlan_tci = pkt->vlan_tci;
> -	hdr->l2_l3_len = pkt->l2_l3_len;
> +	hdr->tx_offload = pkt->tx_offload;
>  	hdr->hash = pkt->hash;
> 
>  	hdr->ol_flags = pkt->ol_flags;
> diff --git a/lib/librte_mbuf/rte_mbuf.c b/lib/librte_mbuf/rte_mbuf.c
> index 5cd9137..75295c8 100644
> --- a/lib/librte_mbuf/rte_mbuf.c
> +++ b/lib/librte_mbuf/rte_mbuf.c
> @@ -238,6 +238,7 @@ const char *rte_get_tx_ol_flag_name(uint64_t mask)
>  	case PKT_TX_UDP_CKSUM: return "PKT_TX_UDP_CKSUM";
>  	case PKT_TX_IEEE1588_TMST: return "PKT_TX_IEEE1588_TMST";
>  	case PKT_TX_VXLAN_CKSUM: return "PKT_TX_VXLAN_CKSUM";
> +	case PKT_TX_TCP_SEG: return "PKT_TX_TCP_SEG";
>  	default: return NULL;
>  	}
>  }
> diff --git a/lib/librte_mbuf/rte_mbuf.h b/lib/librte_mbuf/rte_mbuf.h
> index 3c8e825..9f44d08 100644
> --- a/lib/librte_mbuf/rte_mbuf.h
> +++ b/lib/librte_mbuf/rte_mbuf.h
> @@ -127,6 +127,20 @@ extern "C" {
> 
>  #define PKT_TX_VXLAN_CKSUM   (1ULL << 50) /**< TX checksum of VXLAN computed by NIC */
> 
> +/**
> + * TCP segmentation offload. To enable this offload feature for a
> + * packet to be transmitted on hardware supporting TSO:
> + *  - set the PKT_TX_TCP_SEG flag in mbuf->ol_flags (this flag implies
> + *    PKT_TX_TCP_CKSUM)
> + *  - if it's IPv4, set the PKT_TX_IP_CKSUM flag and write the IP checksum
> + *    to 0 in the packet
> + *  - fill the mbuf offload information: l2_len, l3_len, l4_len, tso_segsz
> + *  - calculate the pseudo header checksum without taking ip_len into account,
> + *    and set it in the TCP header. Refer to rte_ipv4_phdr_cksum() and
> + *    rte_ipv6_phdr_cksum() that can be used as helpers.
> + */
> +#define PKT_TX_TCP_SEG       (1ULL << 49)
> +
>  /* Use final bit of flags to indicate a control mbuf */
>  #define CTRL_MBUF_FLAG       (1ULL << 63) /**< Mbuf contains control data */
> 
> @@ -228,22 +242,18 @@ struct rte_mbuf {
> 
>  	/* fields to support TX offloads */
>  	union {
> -		uint16_t l2_l3_len; /**< combined l2/l3 lengths as single var */
> +		uint64_t tx_offload;       /**< combined for easy fetch */
>  		struct {
> -			uint16_t l3_len:9;      /**< L3 (IP) Header Length. */
> -			uint16_t l2_len:7;      /**< L2 (MAC) Header Length. */
> -		};
> -	};
> +			uint64_t l2_len:7; /**< L2 (MAC) Header Length. */
> +			uint64_t l3_len:9; /**< L3 (IP) Header Length. */
> +			uint64_t l4_len:8; /**< L4 (TCP/UDP) Header Length. */
> +			uint64_t tso_segsz:16; /**< TCP TSO segment size */
> 
> -	/* fields for TX offloading of tunnels */
> -	union {
> -		uint16_t inner_l2_l3_len;
> -		/**< combined inner l2/l3 lengths as single var */
> -		struct {
> -			uint16_t inner_l3_len:9;
> -			/**< inner L3 (IP) Header Length. */
> -			uint16_t inner_l2_len:7;
> -			/**< inner L2 (MAC) Header Length. */
> +			/* fields for TX offloading of tunnels */
> +			uint64_t inner_l3_len:9; /**< inner L3 (IP) Hdr Length. */
> +			uint64_t inner_l2_len:7; /**< inner L2 (MAC) Hdr Length. */
> +
> +			/* uint64_t unused:8; */
>  		};
>  	};
>  } __rte_cache_aligned;
> @@ -595,8 +605,7 @@ static inline void rte_pktmbuf_reset(struct rte_mbuf *m)
>  {
>  	m->next = NULL;
>  	m->pkt_len = 0;
> -	m->l2_l3_len = 0;
> -	m->inner_l2_l3_len = 0;
> +	m->tx_offload = 0;
>  	m->vlan_tci = 0;
>  	m->nb_segs = 1;
>  	m->port = 0xff;
> @@ -665,8 +674,7 @@ static inline void rte_pktmbuf_attach(struct rte_mbuf *mi, struct rte_mbuf *md)
>  	mi->data_len = md->data_len;
>  	mi->port = md->port;
>  	mi->vlan_tci = md->vlan_tci;
> -	mi->l2_l3_len = md->l2_l3_len;
> -	mi->inner_l2_l3_len = md->inner_l2_l3_len;
> +	mi->tx_offload = md->tx_offload;
>  	mi->hash = md->hash;
> 
>  	mi->next = NULL;
> diff --git a/lib/librte_net/rte_ip.h b/lib/librte_net/rte_ip.h
> index 9cfca7f..1fafa73 100644
> --- a/lib/librte_net/rte_ip.h
> +++ b/lib/librte_net/rte_ip.h
> @@ -80,6 +80,7 @@
> 
>  #include <rte_memcpy.h>
>  #include <rte_byteorder.h>
> +#include <rte_mbuf.h>
> 
>  #ifdef __cplusplus
>  extern "C" {
> @@ -308,13 +309,21 @@ rte_ipv4_cksum(const struct ipv4_hdr *ipv4_hdr)
>   *
>   * The checksum field must be set to 0 by the caller.
>   *
> + * Depending on the ol_flags, the pseudo-header checksum expected by the
> + * drivers is not the same. For instance, when TSO is enabled, the IP
> + * payload length must not be included in the packet.
> + *
> + * When ol_flags is 0, it computes the standard pseudo-header checksum.
> + *
>   * @param ipv4_hdr
>   *   The pointer to the contiguous IPv4 header.
> + * @param ol_flags
> + *   The ol_flags of the associated mbuf.
>   * @return
>   *   The non-complemented checksum to set in the L4 header.
>   */
>  static inline uint16_t
> -rte_ipv4_phdr_cksum(const struct ipv4_hdr *ipv4_hdr)
> +rte_ipv4_phdr_cksum(const struct ipv4_hdr *ipv4_hdr, uint64_t ol_flags)
>  {
>  	struct ipv4_psd_header {
>  		uint32_t src_addr; /* IP address of source host. */
> @@ -328,9 +337,13 @@ rte_ipv4_phdr_cksum(const struct ipv4_hdr *ipv4_hdr)
>  	psd_hdr.dst_addr = ipv4_hdr->dst_addr;
>  	psd_hdr.zero = 0;
>  	psd_hdr.proto = ipv4_hdr->next_proto_id;
> -	psd_hdr.len = rte_cpu_to_be_16(
> -		(uint16_t)(rte_be_to_cpu_16(ipv4_hdr->total_length)
> -			- sizeof(struct ipv4_hdr)));
> +	if (ol_flags & PKT_TX_TCP_SEG) {
> +		psd_hdr.len = 0;
> +	} else {
> +		psd_hdr.len = rte_cpu_to_be_16(
> +			(uint16_t)(rte_be_to_cpu_16(ipv4_hdr->total_length)
> +				- sizeof(struct ipv4_hdr)));
> +	}
>  	return rte_raw_cksum((const char *)&psd_hdr, sizeof(psd_hdr));
>  }
> 
> @@ -357,7 +370,7 @@ rte_ipv4_udptcp_cksum(const struct ipv4_hdr *ipv4_hdr, const void *l4_hdr)
>  		sizeof(struct ipv4_hdr);
> 
>  	cksum = rte_raw_cksum(l4_hdr, l4_len);
> -	cksum += rte_ipv4_phdr_cksum(ipv4_hdr);
> +	cksum += rte_ipv4_phdr_cksum(ipv4_hdr, 0);
> 
>  	cksum = ((cksum & 0xffff0000) >> 16) + (cksum & 0xffff);
>  	cksum = (~cksum) & 0xffff;
> @@ -382,13 +395,21 @@ struct ipv6_hdr {
>  /**
>   * Process the pseudo-header checksum of an IPv6 header.
>   *
> + * Depending on the ol_flags, the pseudo-header checksum expected by the
> + * drivers is not the same. For instance, when TSO is enabled, the IPv6
> + * payload length must not be included in the packet.
> + *
> + * When ol_flags is 0, it computes the standard pseudo-header checksum.
> + *
>   * @param ipv6_hdr
>   *   The pointer to the contiguous IPv6 header.
> + * @param ol_flags
> + *   The ol_flags of the associated mbuf.
>   * @return
>   *   The non-complemented checksum to set in the L4 header.
>   */
>  static inline uint16_t
> -rte_ipv6_phdr_cksum(const struct ipv6_hdr *ipv6_hdr)
> +rte_ipv6_phdr_cksum(const struct ipv6_hdr *ipv6_hdr, uint64_t ol_flags)
>  {
>  	struct ipv6_psd_header {
>  		uint8_t src_addr[16]; /* IP address of source host. */
> @@ -400,7 +421,11 @@ rte_ipv6_phdr_cksum(const struct ipv6_hdr *ipv6_hdr)
>  	rte_memcpy(&psd_hdr.src_addr, ipv6_hdr->src_addr,
>  		sizeof(ipv6_hdr->src_addr) + sizeof(ipv6_hdr->dst_addr));
>  	psd_hdr.proto = (ipv6_hdr->proto << 24);
> -	psd_hdr.len = ipv6_hdr->payload_len;
> +	if (ol_flags & PKT_TX_TCP_SEG) {
> +		psd_hdr.len = 0;
> +	} else {
> +		psd_hdr.len = ipv6_hdr->payload_len;
> +	}
> 
>  	return rte_raw_cksum((const char *)&psd_hdr, sizeof(psd_hdr));
>  }
> diff --git a/lib/librte_pmd_e1000/igb_rxtx.c b/lib/librte_pmd_e1000/igb_rxtx.c
> index 433c616..848d5d1 100644
> --- a/lib/librte_pmd_e1000/igb_rxtx.c
> +++ b/lib/librte_pmd_e1000/igb_rxtx.c
> @@ -367,6 +367,13 @@ eth_igb_xmit_pkts(void *tx_queue, struct rte_mbuf **tx_pkts,
>  	struct rte_mbuf     *tx_pkt;
>  	struct rte_mbuf     *m_seg;
>  	union igb_vlan_macip vlan_macip_lens;
> +	union {
> +		uint16_t u16;
> +		struct {
> +			uint16_t l3_len:9;
> +			uint16_t l2_len:7;
> +		};
> +	} l2_l3_len;
>  	uint64_t buf_dma_addr;
>  	uint32_t olinfo_status;
>  	uint32_t cmd_type_len;
> @@ -404,8 +411,10 @@ eth_igb_xmit_pkts(void *tx_queue, struct rte_mbuf **tx_pkts,
>  		tx_last = (uint16_t) (tx_id + tx_pkt->nb_segs - 1);
> 
>  		ol_flags = tx_pkt->ol_flags;
> +		l2_l3_len.l2_len = tx_pkt->l2_len;
> +		l2_l3_len.l3_len = tx_pkt->l3_len;
>  		vlan_macip_lens.f.vlan_tci = tx_pkt->vlan_tci;
> -		vlan_macip_lens.f.l2_l3_len = tx_pkt->l2_l3_len;
> +		vlan_macip_lens.f.l2_l3_len = l2_l3_len.u16;
>  		tx_ol_req = ol_flags & IGB_TX_OFFLOAD_MASK;
> 
>  		/* If a Context Descriptor need be built . */
> diff --git a/lib/librte_pmd_ixgbe/ixgbe_rxtx.c b/lib/librte_pmd_ixgbe/ixgbe_rxtx.c
> index ca35db2..2df3385 100644
> --- a/lib/librte_pmd_ixgbe/ixgbe_rxtx.c
> +++ b/lib/librte_pmd_ixgbe/ixgbe_rxtx.c
> @@ -546,6 +546,13 @@ ixgbe_xmit_pkts(void *tx_queue, struct rte_mbuf **tx_pkts,
>  	struct rte_mbuf     *tx_pkt;
>  	struct rte_mbuf     *m_seg;
>  	union ixgbe_vlan_macip vlan_macip_lens;
> +	union {
> +		uint16_t u16;
> +		struct {
> +			uint16_t l3_len:9;
> +			uint16_t l2_len:7;
> +		};
> +	} l2_l3_len;
>  	uint64_t buf_dma_addr;
>  	uint32_t olinfo_status;
>  	uint32_t cmd_type_len;
> @@ -588,8 +595,10 @@ ixgbe_xmit_pkts(void *tx_queue, struct rte_mbuf **tx_pkts,
>  		/* If hardware offload required */
>  		tx_ol_req = ol_flags & IXGBE_TX_OFFLOAD_MASK;
>  		if (tx_ol_req) {
> +			l2_l3_len.l2_len = tx_pkt->l2_len;
> +			l2_l3_len.l3_len = tx_pkt->l3_len;
>  			vlan_macip_lens.f.vlan_tci = tx_pkt->vlan_tci;
> -			vlan_macip_lens.f.l2_l3_len = tx_pkt->l2_l3_len;
> +			vlan_macip_lens.f.l2_l3_len = l2_l3_len.u16;
> 
>  			/* If new context need be built or reuse the exist ctx. */
>  			ctx = what_advctx_update(txq, tx_ol_req,
> --
> 2.1.0


* Re: [dpdk-dev] [PATCH v2 09/13] mbuf: introduce new checksum API
  2014-11-17 18:15     ` Ananyev, Konstantin
@ 2014-11-18  9:10       ` Olivier MATZ
  0 siblings, 0 replies; 112+ messages in thread
From: Olivier MATZ @ 2014-11-18  9:10 UTC (permalink / raw)
  To: Ananyev, Konstantin, dev; +Cc: jigsaw

Hi Konstantin,

On 11/17/2014 07:15 PM, Ananyev, Konstantin wrote:
> Just 2 nits from me:
>
> 1)
>> +static inline uint16_t
>> +rte_raw_cksum(const char *buf, size_t len)
>> +{
> ...
>> +	while (len >= 8) {
>> +		sum += u16[0]; sum += u16[1]; sum += u16[2]; sum += u16[3];
>
> Can you put each expression into a new line?
> sum += u16[0];
> sum += u16[1];
> ...
>
> To make it easier to read.
> Or can it be rewritten just like:
> sum = (uint32_t)u16[0] + u16[1] + u16[2] + u16[3];
> here?
>
> 2)
>> +	while (len >= 8) {
>> +		sum += u16[0]; sum += u16[1]; sum += u16[2]; sum += u16[3];
>> +		len -= 8;
>> +		u16 += 4;
>> +	}
>> +	while (len >= 2) {
>> +		sum += *u16;
>> +		len -= 2;
>> +		u16 += 1;
>> +	}
>
> In the code above, probably use sizeof(u16[0]) wherever appropriate.
> To make things a bit more clearer and consistent.
> ...
> while (len >=  4 * sizeof(u16[0]))
> len -= 4 * sizeof(u16[0]);
> u16 += 4;
> ...
> Same for second loop

OK, I'll add that to the todo list for v3.
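For reference, both suggestions applied together could look roughly like
the sketch below. It is not the code from the patch: demo_raw_cksum() is
a hypothetical stand-in for rte_raw_cksum(), and it uses memcpy() loads
instead of a cast pointer so that it can be compiled on its own.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/*
 * Hypothetical stand-in for rte_raw_cksum(): non-complemented 16-bit
 * one's complement sum of a buffer. The 32-bit accumulator is only
 * large enough for packet-sized buffers (well under 128 KB).
 */
static uint16_t
demo_raw_cksum(const void *buf, size_t len)
{
	const uint8_t *p = buf;
	uint16_t u16[4];
	uint16_t last = 0;
	uint32_t sum = 0;

	assert(buf != NULL || len == 0);

	/* unrolled loop: four 16-bit words per iteration, one add per line */
	while (len >= 4 * sizeof(u16[0])) {
		memcpy(u16, p, 4 * sizeof(u16[0]));
		sum += u16[0];
		sum += u16[1];
		sum += u16[2];
		sum += u16[3];
		len -= 4 * sizeof(u16[0]);
		p += 4 * sizeof(u16[0]);
	}

	/* remaining whole 16-bit words */
	while (len >= sizeof(u16[0])) {
		memcpy(u16, p, sizeof(u16[0]));
		sum += u16[0];
		len -= sizeof(u16[0]);
		p += sizeof(u16[0]);
	}

	/* trailing odd byte, treated as a word padded with a zero byte */
	if (len == 1) {
		memcpy(&last, p, 1);
		sum += last;
	}

	/* fold the carries back into the low 16 bits (twice is enough) */
	sum = (sum >> 16) + (sum & 0xffff);
	sum = (sum >> 16) + (sum & 0xffff);
	return (uint16_t)sum;
}
```

The words are summed in host byte order, so the returned value has the
same in-memory byte layout on little and big endian machines.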

Thanks,
Olivier


* Re: [dpdk-dev] [PATCH v2 11/13] ixgbe: support TCP segmentation offload
  2014-11-17 18:26     ` Ananyev, Konstantin
@ 2014-11-18  9:11       ` Olivier MATZ
  0 siblings, 0 replies; 112+ messages in thread
From: Olivier MATZ @ 2014-11-18  9:11 UTC (permalink / raw)
  To: Ananyev, Konstantin, dev; +Cc: jigsaw

Hi Konstantin,

On 11/17/2014 07:26 PM, Ananyev, Konstantin wrote:
> Just one thing - double semicolon -  looks like a typo:
>> +	/* check if TCP segmentation required for this packet */
>> +	if (ol_flags & PKT_TX_TCP_SEG) {
>> +		/* implies IP cksum and TCP cksum */
>> +		type_tucmd_mlhl = IXGBE_ADVTXD_TUCMD_IPV4 |
>> +			IXGBE_ADVTXD_TUCMD_L4T_TCP |
>> +			IXGBE_ADVTXD_DTYP_CTXT | IXGBE_ADVTXD_DCMD_DEXT;;

Good catch, I'll fix this too.


Olivier


* Re: [dpdk-dev] [PATCH v2 06/13] mbuf: add functions to get the name of an ol_flag
  2014-11-17 19:00     ` Ananyev, Konstantin
@ 2014-11-18  9:29       ` Olivier MATZ
  2014-11-19 11:06         ` Ananyev, Konstantin
  0 siblings, 1 reply; 112+ messages in thread
From: Olivier MATZ @ 2014-11-18  9:29 UTC (permalink / raw)
  To: Ananyev, Konstantin, dev; +Cc: jigsaw

Hi Konstantin,

On 11/17/2014 08:00 PM, Ananyev, Konstantin wrote:
>> +/*
>> + * Get the name of a RX offload flag
>> + */
>> +const char *rte_get_rx_ol_flag_name(uint64_t mask)
>> +{
>> +	switch (mask) {
>> +	case PKT_RX_VLAN_PKT: return "PKT_RX_VLAN_PKT";
>> +	case PKT_RX_RSS_HASH: return "PKT_RX_RSS_HASH";
>> +	case PKT_RX_FDIR: return "PKT_RX_FDIR";
>> +	case PKT_RX_L4_CKSUM_BAD: return "PKT_RX_L4_CKSUM_BAD";
>> +	case PKT_RX_IP_CKSUM_BAD: return "PKT_RX_IP_CKSUM_BAD";
>> +	/* case PKT_RX_EIP_CKSUM_BAD: return "PKT_RX_EIP_CKSUM_BAD"; */
>> +	/* case PKT_RX_OVERSIZE: return "PKT_RX_OVERSIZE"; */
>> +	/* case PKT_RX_HBUF_OVERFLOW: return "PKT_RX_HBUF_OVERFLOW"; */
>> +	/* case PKT_RX_RECIP_ERR: return "PKT_RX_RECIP_ERR"; */
>> +	/* case PKT_RX_MAC_ERR: return "PKT_RX_MAC_ERR"; */
>
> Didn't spot it before, wonder why do you need these 5 commented out lines?
> In fact, why do we need these flags if they all equal to zero right now?
> I know these flags were not introduced by that patch, in fact as I can see it was a temporary measure,
> as old ol_flags were just 16 bits long:
> http://dpdk.org/ml/archives/dev/2014-June/003308.html
> So wonder should now these flags either get proper values or be removed?

I would be in favor of removing them, or at least the following ones
(I don't understand how they can help the application):

- PKT_RX_OVERSIZE: Num of desc of an RX pkt oversize.
- PKT_RX_HBUF_OVERFLOW: Header buffer overflow.
- PKT_RX_RECIP_ERR: Hardware processing error.
- PKT_RX_MAC_ERR: MAC error.

I would say that a statistics counter in the driver is more
appropriate for this case (maybe there is already a counter in the
hardware).

I have no i40e hardware to test that, so I don't feel very comfortable
modifying the i40e driver code to add these stats.

Adding Helin to the CC list, maybe he has an idea.

Regards,
Olivier


* Re: [dpdk-dev] [PATCH v2 03/13] mbuf: move vxlan_cksum flag definition at the proper place
  2014-11-17 22:05     ` Thomas Monjalon
@ 2014-11-18 14:10       ` Olivier MATZ
  0 siblings, 0 replies; 112+ messages in thread
From: Olivier MATZ @ 2014-11-18 14:10 UTC (permalink / raw)
  To: Thomas Monjalon; +Cc: dev

Hi Thomas,

On 11/17/2014 11:05 PM, Thomas Monjalon wrote:
> 2014-11-14 18:03, Olivier Matz:
>> The tx mbuf flags are ordered from the highest value to the
>> the lowest. Move the PKT_TX_VXLAN_CKSUM at the right place.
>
> Please, could you reorder them from the lowest to the highest?
> It will be simpler to understand. There is already a comment to explain
> the reverse allocation of Tx flags.
>
> Thanks

Sure, I'll do that in a separate commit.

Olivier


* Re: [dpdk-dev] [PATCH v2 06/13] mbuf: add functions to get the name of an ol_flag
  2014-11-18  9:29       ` Olivier MATZ
@ 2014-11-19 11:06         ` Ananyev, Konstantin
  2014-11-25 10:37           ` Ananyev, Konstantin
  0 siblings, 1 reply; 112+ messages in thread
From: Ananyev, Konstantin @ 2014-11-19 11:06 UTC (permalink / raw)
  To: Olivier MATZ, dev; +Cc: jigsaw



> -----Original Message-----
> From: Olivier MATZ [mailto:olivier.matz@6wind.com]
> Sent: Tuesday, November 18, 2014 9:30 AM
> To: Ananyev, Konstantin; dev@dpdk.org
> Cc: jigsaw@gmail.com; Zhang, Helin
> Subject: Re: [dpdk-dev] [PATCH v2 06/13] mbuf: add functions to get the name of an ol_flag
> 
> Hi Konstantin,
> 
> On 11/17/2014 08:00 PM, Ananyev, Konstantin wrote:
> >> +/*
> >> + * Get the name of a RX offload flag
> >> + */
> >> +const char *rte_get_rx_ol_flag_name(uint64_t mask)
> >> +{
> >> +	switch (mask) {
> >> +	case PKT_RX_VLAN_PKT: return "PKT_RX_VLAN_PKT";
> >> +	case PKT_RX_RSS_HASH: return "PKT_RX_RSS_HASH";
> >> +	case PKT_RX_FDIR: return "PKT_RX_FDIR";
> >> +	case PKT_RX_L4_CKSUM_BAD: return "PKT_RX_L4_CKSUM_BAD";
> >> +	case PKT_RX_IP_CKSUM_BAD: return "PKT_RX_IP_CKSUM_BAD";
> >> +	/* case PKT_RX_EIP_CKSUM_BAD: return "PKT_RX_EIP_CKSUM_BAD"; */
> >> +	/* case PKT_RX_OVERSIZE: return "PKT_RX_OVERSIZE"; */
> >> +	/* case PKT_RX_HBUF_OVERFLOW: return "PKT_RX_HBUF_OVERFLOW"; */
> >> +	/* case PKT_RX_RECIP_ERR: return "PKT_RX_RECIP_ERR"; */
> >> +	/* case PKT_RX_MAC_ERR: return "PKT_RX_MAC_ERR"; */
> >
> > Didn't spot it before, wonder why do you need these 5 commented out lines?
> > In fact, why do we need these flags if they all equal to zero right now?
> > I know these flags were not introduced by that patch, in fact as I can see it was a temporary measure,
> > as old ol_flags were just 16 bits long:
> > http://dpdk.org/ml/archives/dev/2014-June/003308.html
> > So wonder should now these flags either get proper values or be removed?
> 
> I would be in favor of removing them, or at least the following ones
> (I don't understand how they can help the application):
> 
> - PKT_RX_OVERSIZE: Num of desc of an RX pkt oversize.
> - PKT_RX_HBUF_OVERFLOW: Header buffer overflow.
> - PKT_RX_RECIP_ERR: Hardware processing error.
> - PKT_RX_MAC_ERR: MAC error.

I tend to agree.
Alternatively, we could collapse these 4 flags into a single one: PKT_RX_ERR or similar.
It might still be used by someone for debugging purposes.
Helin, what do you think?

> 
> I would say that a statistics counter in the driver is more
> appropriate for this case (maybe there is already a counter in the
> hardware).
> 
> I have no i40e hardware to test that, so I don't feel very comfortable
> modifying the i40e driver code to add these stats.
> 
> Adding Helin to the CC list, maybe he has an idea.
> 
> Regards,
> Olivier


* [dpdk-dev] [PATCH v3 00/13] add TSO support
  2014-11-14 17:03 ` [dpdk-dev] [PATCH v2 00/13] " Olivier Matz
                     ` (12 preceding siblings ...)
  2014-11-14 17:03   ` [dpdk-dev] [PATCH v2 13/13] testpmd: add a verbose mode " Olivier Matz
@ 2014-11-20 22:58   ` Olivier Matz
  2014-11-20 22:58     ` [dpdk-dev] [PATCH v3 01/13] igb/ixgbe: fix IP checksum calculation Olivier Matz
                       ` (13 more replies)
  13 siblings, 14 replies; 112+ messages in thread
From: Olivier Matz @ 2014-11-20 22:58 UTC (permalink / raw)
  To: dev; +Cc: jigsaw

This series adds TSO support to the ixgbe DPDK driver. This is a rework
of the series sent earlier this week [1]. This work is based on
another version [2] that was posted several months ago and
which included an mbuf rework that is now in mainline.

Changes in v3:

- indicate that rte_get_rx_ol_flag_name() and rte_get_tx_ol_flag_name()
  should be kept synchronized with flags definition
- use sizeof() when appropriate in rte_raw_cksum()
- remove double semicolon in ixgbe driver
- reorder tx ol_flags as requested by Thomas
- add missing copyrights when big modifications are made
- enhance the help of tx_cksum command in testpmd
- enhance the description of csumonly (comments)

Changes in v2:

- move rte_get_rx_ol_flag_name() and rte_get_tx_ol_flag_name() in
  rte_mbuf.c, and fix comments
- use IGB_TX_OFFLOAD_MASK and IXGBE_TX_OFFLOAD_MASK to replace
  PKT_TX_OFFLOAD_MASK
- fix inner_l2_len and inner_l3_len bitfields: use uint64_t instead
  of uint16_t
- replace assignment of l2_len and l3_len with assignment of tx_offload.
  It now includes inner_l2_len and inner_l3_len at the same time.
- introduce a new cksum api in rte_ip.h following discussion with
  Konstantin
- reorder commits to have all TSO commits at the end of the series
- use ol_flags for phdr checksum calculation (this now matches ixgbe
  API: standard pseudo hdr cksum for TCP cksum offload, pseudo hdr
  cksum without ip paylen for TSO). This will probably be changed
  with a dev_prep_tx() like function for 2.0 release.
- rebase on latest head
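
To illustrate the "use ol_flags for phdr checksum calculation" item, here
is a minimal standalone sketch of an IPv4 pseudo-header checksum that
leaves the length out when the TSO flag is set. It is not the rte_ip.h
code: demo_ipv4_phdr_cksum() and its parameters are hypothetical, and
unlike the DPDK helper it sums big-endian words arithmetically, so the
result is a plain number that would still need converting to network
byte order before being stored in a header.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Same bit as PKT_TX_TCP_SEG in the rte_mbuf.h hunk of this series. */
#define DEMO_PKT_TX_TCP_SEG (1ULL << 49)

/*
 * Hypothetical helper: non-complemented IPv4 pseudo-header checksum.
 * src/dst are the addresses as they appear on the wire, l4_len is the
 * L4 length (header + payload) in host byte order.
 */
static uint16_t
demo_ipv4_phdr_cksum(const uint8_t src[4], const uint8_t dst[4],
		     uint8_t proto, uint16_t l4_len, uint64_t ol_flags)
{
	uint8_t psd[12];
	uint32_t sum = 0;
	size_t i;

	assert(src != NULL && dst != NULL);

	/* for TSO, the length is left out of the pseudo-header checksum */
	if (ol_flags & DEMO_PKT_TX_TCP_SEG)
		l4_len = 0;

	memcpy(&psd[0], src, 4);
	memcpy(&psd[4], dst, 4);
	psd[8] = 0;
	psd[9] = proto;
	psd[10] = (uint8_t)(l4_len >> 8);
	psd[11] = (uint8_t)(l4_len & 0xff);

	/* sum the six 16-bit big-endian words of the pseudo header */
	for (i = 0; i < sizeof(psd); i += 2)
		sum += ((uint32_t)psd[i] << 8) | psd[i + 1];

	/* fold the carries back into the low 16 bits */
	sum = (sum >> 16) + (sum & 0xffff);
	sum = (sum >> 16) + (sum & 0xffff);
	return (uint16_t)sum;
}
```

With ol_flags set to 0 this gives the standard pseudo-header checksum;
the two results differ exactly by the L4 length.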


This series first fixes some bugs that were discovered during the
development, adds some changes to the mbuf API (new l4_len and
tso_segsz fields), adds TSO support in ixgbe, reworks testpmd
csum forward engine, and finally adds TSO support in testpmd so it
can be validated.

The new fields added in mbuf try to be generic enough to apply to
other hardware in the future. To delegate the TCP segmentation to the
hardware, the user has to:

  - set the PKT_TX_TCP_SEG flag in mbuf->ol_flags (this flag implies
    PKT_TX_TCP_CKSUM)
  - if it's IPv4, set the PKT_TX_IP_CKSUM flag and write the IP checksum
    to 0 in the packet
  - fill the mbuf offload information: l2_len, l3_len, l4_len, tso_segsz
  - calculate the pseudo header checksum and set it in the TCP header,
    as required when doing hardware TCP checksum offload
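
The four steps above could be sketched as follows. This is only an
illustration under assumptions: struct demo_mbuf is a cut-down stand-in
for struct rte_mbuf (only the fields named above, using 64-bit bitfields
as the patch does), demo_request_tso() and its parameters are
hypothetical, and the pseudo header checksum is assumed to have been
computed by the caller.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Same bits as the flags in rte_mbuf.h in this series. */
#define DEMO_PKT_TX_TCP_SEG  (1ULL << 49)
#define DEMO_PKT_TX_IP_CKSUM (1ULL << 54)

/* Cut-down stand-in for struct rte_mbuf (assumption, not the real layout). */
struct demo_mbuf {
	uint64_t ol_flags;
	uint64_t l2_len:7;     /* L2 (MAC) header length */
	uint64_t l3_len:9;     /* L3 (IP) header length */
	uint64_t l4_len:8;     /* L4 (TCP) header length */
	uint64_t tso_segsz:16; /* TCP segment size */
};

/*
 * Hypothetical helper: request TSO for one packet. ip_cksum and
 * tcp_cksum point to the checksum fields inside the packet data.
 */
static void
demo_request_tso(struct demo_mbuf *m, int is_ipv4,
		 uint16_t *ip_cksum, uint16_t *tcp_cksum, uint16_t phdr_cksum,
		 unsigned l2_len, unsigned l3_len, unsigned l4_len,
		 unsigned segsz)
{
	assert(m != NULL && tcp_cksum != NULL);
	assert(l2_len < (1u << 7) && l3_len < (1u << 9));
	assert(l4_len < (1u << 8) && segsz < (1u << 16));

	/* 1. request segmentation (implies TCP checksum offload) */
	m->ol_flags |= DEMO_PKT_TX_TCP_SEG;

	/* 2. IPv4 only: request IP checksum offload, zero the field */
	if (is_ipv4) {
		assert(ip_cksum != NULL);
		m->ol_flags |= DEMO_PKT_TX_IP_CKSUM;
		*ip_cksum = 0;
	}

	/* 3. fill the offload information */
	m->l2_len = l2_len;
	m->l3_len = l3_len;
	m->l4_len = l4_len;
	m->tso_segsz = segsz;

	/* 4. store the pseudo header checksum in the TCP header */
	*tcp_cksum = phdr_cksum;
}
```

For an Ethernet/IPv4/TCP packet without options this would typically be
called with l2_len 14, l3_len 20, l4_len 20 and the wanted segment size.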

The test report will be added as a reply to this cover letter and
could be linked in the relevant commits.

[1] http://dpdk.org/ml/archives/dev/2014-November/007953.html
[2] http://dpdk.org/ml/archives/dev/2014-May/002537.html

Olivier Matz (13):
  igb/ixgbe: fix IP checksum calculation
  ixgbe: fix remaining pkt_flags variable size to 64 bits
  mbuf: reorder tx ol_flags
  mbuf: add help about TX checksum flags
  mbuf: remove too specific PKT_TX_OFFLOAD_MASK definition
  mbuf: add functions to get the name of an ol_flag
  testpmd: fix use of offload flags in testpmd
  testpmd: rework csum forward engine
  mbuf: introduce new checksum API
  mbuf: generic support for TCP segmentation offload
  ixgbe: support TCP segmentation offload
  testpmd: support TSO in csum forward engine
  testpmd: add a verbose mode csum forward engine

 app/test-pmd/cmdline.c              | 248 +++++++++--
 app/test-pmd/config.c               |  17 +-
 app/test-pmd/csumonly.c             | 814 ++++++++++++++++--------------------
 app/test-pmd/macfwd.c               |   5 +-
 app/test-pmd/macswap.c              |   5 +-
 app/test-pmd/rxonly.c               |  36 +-
 app/test-pmd/testpmd.c              |   2 +-
 app/test-pmd/testpmd.h              |  24 +-
 app/test-pmd/txonly.c               |   9 +-
 examples/ipv4_multicast/main.c      |   2 +-
 lib/librte_mbuf/rte_mbuf.c          |  49 +++
 lib/librte_mbuf/rte_mbuf.h          | 102 +++--
 lib/librte_net/rte_ip.h             | 208 +++++++++
 lib/librte_pmd_e1000/igb_rxtx.c     |  21 +-
 lib/librte_pmd_ixgbe/ixgbe_ethdev.c |   3 +-
 lib/librte_pmd_ixgbe/ixgbe_rxtx.c   | 179 +++++---
 lib/librte_pmd_ixgbe/ixgbe_rxtx.h   |  19 +-
 17 files changed, 1094 insertions(+), 649 deletions(-)

-- 
2.1.0


* [dpdk-dev] [PATCH v3 01/13] igb/ixgbe: fix IP checksum calculation
  2014-11-20 22:58   ` [dpdk-dev] [PATCH v3 00/13] add TSO support Olivier Matz
@ 2014-11-20 22:58     ` Olivier Matz
  2014-11-20 22:58     ` [dpdk-dev] [PATCH v3 02/13] ixgbe: fix remaining pkt_flags variable size to 64 bits Olivier Matz
                       ` (12 subsequent siblings)
  13 siblings, 0 replies; 112+ messages in thread
From: Olivier Matz @ 2014-11-20 22:58 UTC (permalink / raw)
  To: dev; +Cc: jigsaw

According to Intel® 82599 10 GbE Controller Datasheet (Table 7-38), both
L2 and L3 lengths are needed to offload the IP checksum.

Note that the e1000 driver does not need to be patched as it already
contains the fix.
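
To show why the wider compare mask matters, here is a simplified sketch
of a context-cache comparison. It is not the driver code: the mask
values are assumptions derived from the l3_len:9 / l2_len:7 layout
(l3_len in the low bits), and demo_pack_lens() and demo_ctx_matches()
are hypothetical helpers.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed mask values for a 16-bit field packed as l2_len:7 | l3_len:9. */
#define DEMO_TX_MAC_LEN_CMP_MASK   0x0000FE00u /* l2_len only */
#define DEMO_TX_MACIP_LEN_CMP_MASK 0x0000FFFFu /* l2_len and l3_len */

/* Pack the two header lengths the way the combined field is assumed to. */
static uint32_t
demo_pack_lens(unsigned l2_len, unsigned l3_len)
{
	assert(l2_len < 128 && l3_len < 512);
	return ((uint32_t)l2_len << 9) | l3_len;
}

/*
 * Return 1 when the cached context is considered reusable for a packet
 * whose lengths are "wanted", looking only at the bits in cmp_mask.
 */
static int
demo_ctx_matches(uint32_t cached, uint32_t wanted, uint32_t cmp_mask)
{
	return (cached & cmp_mask) == (wanted & cmp_mask);
}
```

With the MAC-only mask, a packet with a 40-byte IP header would reuse a
context built for a 20-byte IP header as long as l2_len is the same;
with the MAC+IP mask the mismatch is detected.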

Signed-off-by: Olivier Matz <olivier.matz@6wind.com>
Acked-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
---
 lib/librte_pmd_e1000/igb_rxtx.c   | 2 +-
 lib/librte_pmd_ixgbe/ixgbe_rxtx.c | 2 +-
 2 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/lib/librte_pmd_e1000/igb_rxtx.c b/lib/librte_pmd_e1000/igb_rxtx.c
index 0dca7b7..b406397 100644
--- a/lib/librte_pmd_e1000/igb_rxtx.c
+++ b/lib/librte_pmd_e1000/igb_rxtx.c
@@ -262,7 +262,7 @@ igbe_set_xmit_ctx(struct igb_tx_queue* txq,
 
 	if (ol_flags & PKT_TX_IP_CKSUM) {
 		type_tucmd_mlhl = E1000_ADVTXD_TUCMD_IPV4;
-		cmp_mask |= TX_MAC_LEN_CMP_MASK;
+		cmp_mask |= TX_MACIP_LEN_CMP_MASK;
 	}
 
 	/* Specify which HW CTX to upload. */
diff --git a/lib/librte_pmd_ixgbe/ixgbe_rxtx.c b/lib/librte_pmd_ixgbe/ixgbe_rxtx.c
index f9b3fe3..ecebbf6 100644
--- a/lib/librte_pmd_ixgbe/ixgbe_rxtx.c
+++ b/lib/librte_pmd_ixgbe/ixgbe_rxtx.c
@@ -374,7 +374,7 @@ ixgbe_set_xmit_ctx(struct igb_tx_queue* txq,
 
 	if (ol_flags & PKT_TX_IP_CKSUM) {
 		type_tucmd_mlhl = IXGBE_ADVTXD_TUCMD_IPV4;
-		cmp_mask |= TX_MAC_LEN_CMP_MASK;
+		cmp_mask |= TX_MACIP_LEN_CMP_MASK;
 	}
 
 	/* Specify which HW CTX to upload. */
-- 
2.1.0


* [dpdk-dev] [PATCH v3 02/13] ixgbe: fix remaining pkt_flags variable size to 64 bits
  2014-11-20 22:58   ` [dpdk-dev] [PATCH v3 00/13] add TSO support Olivier Matz
  2014-11-20 22:58     ` [dpdk-dev] [PATCH v3 01/13] igb/ixgbe: fix IP checksum calculation Olivier Matz
@ 2014-11-20 22:58     ` Olivier Matz
  2014-11-20 22:58     ` [dpdk-dev] [PATCH v3 03/13] mbuf: reorder tx ol_flags Olivier Matz
                       ` (11 subsequent siblings)
  13 siblings, 0 replies; 112+ messages in thread
From: Olivier Matz @ 2014-11-20 22:58 UTC (permalink / raw)
  To: dev; +Cc: jigsaw

Since commit 4332beee9 "mbuf: expand ol_flags field to 64-bits", the
packet flags are now 64 bits wide. Some occurrences were missed in
the ixgbe driver.
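
As a small illustration of the problem (not driver code), the sketch
below shows what happens to a flag above bit 15 when it goes through a
16-bit variable. DEMO_FLAG_HIGH is a hypothetical flag chosen only for
the demonstration.

```c
#include <assert.h>
#include <stdint.h>

#define DEMO_FLAG_LOW  (1ULL << 4)  /* fits in 16 bits */
#define DEMO_FLAG_HIGH (1ULL << 20) /* hypothetical flag above bit 15 */

/* Old pattern: the flags go through a 16-bit variable and a cast. */
static uint64_t
demo_merge_flags_u16(uint64_t a, uint64_t b)
{
	uint16_t pkt_flags;

	pkt_flags = (uint16_t)(a | b); /* bits 16..63 are dropped here */
	return pkt_flags;
}

/* Fixed pattern: the variable is as wide as ol_flags. */
static uint64_t
demo_merge_flags_u64(uint64_t a, uint64_t b)
{
	uint64_t pkt_flags;

	pkt_flags = (a | b);
	return pkt_flags;
}

/* Usage example: only the 64-bit version keeps the high flag. */
void
demo_truncation_selftest(void)
{
	uint64_t narrow = demo_merge_flags_u16(DEMO_FLAG_LOW, DEMO_FLAG_HIGH);
	uint64_t wide = demo_merge_flags_u64(DEMO_FLAG_LOW, DEMO_FLAG_HIGH);

	assert(narrow == DEMO_FLAG_LOW);
	assert(wide == (DEMO_FLAG_LOW | DEMO_FLAG_HIGH));
}
```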

Signed-off-by: Olivier Matz <olivier.matz@6wind.com>
Acked-by: Bruce Richardson <bruce.richardson@intel.com>
---
 lib/librte_pmd_ixgbe/ixgbe_rxtx.c | 12 ++++++------
 1 file changed, 6 insertions(+), 6 deletions(-)

diff --git a/lib/librte_pmd_ixgbe/ixgbe_rxtx.c b/lib/librte_pmd_ixgbe/ixgbe_rxtx.c
index ecebbf6..7e470ce 100644
--- a/lib/librte_pmd_ixgbe/ixgbe_rxtx.c
+++ b/lib/librte_pmd_ixgbe/ixgbe_rxtx.c
@@ -817,7 +817,7 @@ end_of_tx:
 static inline uint64_t
 rx_desc_hlen_type_rss_to_pkt_flags(uint32_t hl_tp_rs)
 {
-	uint16_t pkt_flags;
+	uint64_t pkt_flags;
 
 	static uint64_t ip_pkt_types_map[16] = {
 		0, PKT_RX_IPV4_HDR, PKT_RX_IPV4_HDR_EXT, PKT_RX_IPV4_HDR_EXT,
@@ -834,7 +834,7 @@ rx_desc_hlen_type_rss_to_pkt_flags(uint32_t hl_tp_rs)
 	};
 
 #ifdef RTE_LIBRTE_IEEE1588
-	static uint32_t ip_pkt_etqf_map[8] = {
+	static uint64_t ip_pkt_etqf_map[8] = {
 		0, 0, 0, PKT_RX_IEEE1588_PTP,
 		0, 0, 0, 0,
 	};
@@ -903,7 +903,7 @@ ixgbe_rx_scan_hw_ring(struct igb_rx_queue *rxq)
 	struct igb_rx_entry *rxep;
 	struct rte_mbuf *mb;
 	uint16_t pkt_len;
-	uint16_t pkt_flags;
+	uint64_t pkt_flags;
 	int s[LOOK_AHEAD], nb_dd;
 	int i, j, nb_rx = 0;
 
@@ -1335,7 +1335,7 @@ ixgbe_recv_scattered_pkts(void *rx_queue, struct rte_mbuf **rx_pkts,
 	uint16_t nb_rx;
 	uint16_t nb_hold;
 	uint16_t data_len;
-	uint16_t pkt_flags;
+	uint64_t pkt_flags;
 
 	nb_rx = 0;
 	nb_hold = 0;
@@ -1511,9 +1511,9 @@ ixgbe_recv_scattered_pkts(void *rx_queue, struct rte_mbuf **rx_pkts,
 		first_seg->vlan_tci = rte_le_to_cpu_16(rxd.wb.upper.vlan);
 		hlen_type_rss = rte_le_to_cpu_32(rxd.wb.lower.lo_dword.data);
 		pkt_flags = rx_desc_hlen_type_rss_to_pkt_flags(hlen_type_rss);
-		pkt_flags = (uint16_t)(pkt_flags |
+		pkt_flags = (pkt_flags |
 				rx_desc_status_to_pkt_flags(staterr));
-		pkt_flags = (uint16_t)(pkt_flags |
+		pkt_flags = (pkt_flags |
 				rx_desc_error_to_pkt_flags(staterr));
 		first_seg->ol_flags = pkt_flags;
 
-- 
2.1.0


* [dpdk-dev] [PATCH v3 03/13] mbuf: reorder tx ol_flags
  2014-11-20 22:58   ` [dpdk-dev] [PATCH v3 00/13] add TSO support Olivier Matz
  2014-11-20 22:58     ` [dpdk-dev] [PATCH v3 01/13] igb/ixgbe: fix IP checksum calculation Olivier Matz
  2014-11-20 22:58     ` [dpdk-dev] [PATCH v3 02/13] ixgbe: fix remaining pkt_flags variable size to 64 bits Olivier Matz
@ 2014-11-20 22:58     ` Olivier Matz
  2014-11-25 10:22       ` Thomas Monjalon
  2014-11-20 22:58     ` [dpdk-dev] [PATCH v3 04/13] mbuf: add help about TX checksum flags Olivier Matz
                       ` (10 subsequent siblings)
  13 siblings, 1 reply; 112+ messages in thread
From: Olivier Matz @ 2014-11-20 22:58 UTC (permalink / raw)
  To: dev; +Cc: jigsaw

The tx mbuf flags are now ordered from the lowest value to the
highest. Add comments to explain where to add new flags.

Also move PKT_TX_VXLAN_CKSUM to the right place.

Signed-off-by: Olivier Matz <olivier.matz@6wind.com>
---
 lib/librte_mbuf/rte_mbuf.h | 17 +++++++++--------
 1 file changed, 9 insertions(+), 8 deletions(-)

diff --git a/lib/librte_mbuf/rte_mbuf.h b/lib/librte_mbuf/rte_mbuf.h
index f5f8658..d3eba44 100644
--- a/lib/librte_mbuf/rte_mbuf.h
+++ b/lib/librte_mbuf/rte_mbuf.h
@@ -93,14 +93,11 @@ extern "C" {
 #define PKT_RX_IEEE1588_TMST (1ULL << 10) /**< RX IEEE1588 L2/L4 timestamped packet.*/
 #define PKT_RX_TUNNEL_IPV4_HDR (1ULL << 11) /**< RX tunnel packet with IPv4 header.*/
 #define PKT_RX_TUNNEL_IPV6_HDR (1ULL << 12) /**< RX tunnel packet with IPv6 header. */
+/* add new RX flags here */
 
-#define PKT_TX_VLAN_PKT      (1ULL << 55) /**< TX packet is a 802.1q VLAN packet. */
-#define PKT_TX_IP_CKSUM      (1ULL << 54) /**< IP cksum of TX pkt. computed by NIC. */
+/* add new TX flags here */
 #define PKT_TX_VXLAN_CKSUM   (1ULL << 50) /**< TX checksum of VXLAN computed by NIC */
-#define PKT_TX_IPV4_CSUM     PKT_TX_IP_CKSUM /**< Alias of PKT_TX_IP_CKSUM. */
-#define PKT_TX_IPV4          PKT_RX_IPV4_HDR /**< IPv4 with no IP checksum offload. */
-#define PKT_TX_IPV6          PKT_RX_IPV6_HDR /**< IPv6 packet */
-
+#define PKT_TX_IEEE1588_TMST (1ULL << 51) /**< TX IEEE1588 packet to timestamp. */
 /*
  * Bits 52+53 used for L4 packet type with checksum enabled.
  *     00: Reserved
@@ -114,8 +111,12 @@ extern "C" {
 #define PKT_TX_UDP_CKSUM     (3ULL << 52) /**< UDP cksum of TX pkt. computed by NIC. */
 #define PKT_TX_L4_MASK       (3ULL << 52) /**< Mask for L4 cksum offload request. */
 
-/* Bit 51 - IEEE1588*/
-#define PKT_TX_IEEE1588_TMST (1ULL << 51) /**< TX IEEE1588 packet to timestamp. */
+#define PKT_TX_IP_CKSUM      (1ULL << 54) /**< IP cksum of TX pkt. computed by NIC. */
+#define PKT_TX_IPV4_CSUM     PKT_TX_IP_CKSUM /**< Alias of PKT_TX_IP_CKSUM. */
+#define PKT_TX_IPV4          PKT_RX_IPV4_HDR /**< IPv4 with no IP checksum offload. */
+#define PKT_TX_IPV6          PKT_RX_IPV6_HDR /**< IPv6 packet */
+
+#define PKT_TX_VLAN_PKT      (1ULL << 55) /**< TX packet is a 802.1q VLAN packet. */
 
 /* Use final bit of flags to indicate a control mbuf */
 #define CTRL_MBUF_FLAG       (1ULL << 63) /**< Mbuf contains control data */
-- 
2.1.0


* [dpdk-dev] [PATCH v3 04/13] mbuf: add help about TX checksum flags
  2014-11-20 22:58   ` [dpdk-dev] [PATCH v3 00/13] add TSO support Olivier Matz
                       ` (2 preceding siblings ...)
  2014-11-20 22:58     ` [dpdk-dev] [PATCH v3 03/13] mbuf: reorder tx ol_flags Olivier Matz
@ 2014-11-20 22:58     ` Olivier Matz
  2014-11-20 22:58     ` [dpdk-dev] [PATCH v3 05/13] mbuf: remove too specific PKT_TX_OFFLOAD_MASK definition Olivier Matz
                       ` (9 subsequent siblings)
  13 siblings, 0 replies; 112+ messages in thread
From: Olivier Matz @ 2014-11-20 22:58 UTC (permalink / raw)
  To: dev; +Cc: jigsaw

Describe how to use the hardware checksum API.
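
One point worth keeping in mind when using these flags: bits 52 and 53
form a 2-bit field rather than independent flags, so the L4 type has to
be compared after masking. The sketch below illustrates that; the bit
values match the definitions in rte_mbuf.h, while the demo_* helpers
are hypothetical.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Same bits as the PKT_TX_*_CKSUM definitions in rte_mbuf.h. */
#define DEMO_PKT_TX_TCP_CKSUM  (1ULL << 52)
#define DEMO_PKT_TX_SCTP_CKSUM (2ULL << 52)
#define DEMO_PKT_TX_UDP_CKSUM  (3ULL << 52)
#define DEMO_PKT_TX_L4_MASK    (3ULL << 52)

/* Replace the L4 checksum request in ol_flags, keeping the other bits. */
static uint64_t
demo_set_l4_cksum(uint64_t ol_flags, uint64_t l4_type)
{
	assert((l4_type & ~DEMO_PKT_TX_L4_MASK) == 0);
	return (ol_flags & ~DEMO_PKT_TX_L4_MASK) | l4_type;
}

/* Decode the L4 checksum request: mask first, then compare for equality. */
static const char *
demo_l4_cksum_name(uint64_t ol_flags)
{
	switch (ol_flags & DEMO_PKT_TX_L4_MASK) {
	case DEMO_PKT_TX_TCP_CKSUM:
		return "tcp";
	case DEMO_PKT_TX_SCTP_CKSUM:
		return "sctp";
	case DEMO_PKT_TX_UDP_CKSUM:
		return "udp";
	default:
		return "none";
	}
}
```

A test such as (ol_flags & PKT_TX_TCP_CKSUM) is also true for a UDP
request, since the UDP value has both bits set.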

Signed-off-by: Olivier Matz <olivier.matz@6wind.com>
Acked-by: Bruce Richardson <bruce.richardson@intel.com>
---
 lib/librte_mbuf/rte_mbuf.h | 17 ++++++++++-------
 1 file changed, 10 insertions(+), 7 deletions(-)

diff --git a/lib/librte_mbuf/rte_mbuf.h b/lib/librte_mbuf/rte_mbuf.h
index d3eba44..0c96b00 100644
--- a/lib/librte_mbuf/rte_mbuf.h
+++ b/lib/librte_mbuf/rte_mbuf.h
@@ -98,14 +98,17 @@ extern "C" {
 /* add new TX flags here */
 #define PKT_TX_VXLAN_CKSUM   (1ULL << 50) /**< TX checksum of VXLAN computed by NIC */
 #define PKT_TX_IEEE1588_TMST (1ULL << 51) /**< TX IEEE1588 packet to timestamp. */
-/*
- * Bits 52+53 used for L4 packet type with checksum enabled.
- *     00: Reserved
- *     01: TCP checksum
- *     10: SCTP checksum
- *     11: UDP checksum
+
+/**
+ * Bits 52+53 used for L4 packet type with checksum enabled: 00: Reserved,
+ * 01: TCP checksum, 10: SCTP checksum, 11: UDP checksum. To use hardware
+ * L4 checksum offload, the user needs to:
+ *  - fill l2_len and l3_len in mbuf
+ *  - set the flags PKT_TX_TCP_CKSUM, PKT_TX_SCTP_CKSUM or PKT_TX_UDP_CKSUM
+ *  - calculate the pseudo header checksum and set it in the L4 header (only
+ *    for TCP or UDP). For SCTP, set the crc field to 0.
  */
-#define PKT_TX_L4_NO_CKSUM   (0ULL << 52) /**< Disable L4 cksum of TX pkt. */
+#define PKT_TX_L4_NO_CKSUM   (0ULL << 52) /* Disable L4 cksum of TX pkt. */
 #define PKT_TX_TCP_CKSUM     (1ULL << 52) /**< TCP cksum of TX pkt. computed by NIC. */
 #define PKT_TX_SCTP_CKSUM    (2ULL << 52) /**< SCTP cksum of TX pkt. computed by NIC. */
 #define PKT_TX_UDP_CKSUM     (3ULL << 52) /**< UDP cksum of TX pkt. computed by NIC. */
-- 
2.1.0


* [dpdk-dev] [PATCH v3 05/13] mbuf: remove too specific PKT_TX_OFFLOAD_MASK definition
  2014-11-20 22:58   ` [dpdk-dev] [PATCH v3 00/13] add TSO support Olivier Matz
                       ` (3 preceding siblings ...)
  2014-11-20 22:58     ` [dpdk-dev] [PATCH v3 04/13] mbuf: add help about TX checksum flags Olivier Matz
@ 2014-11-20 22:58     ` Olivier Matz
  2014-11-20 22:58     ` [dpdk-dev] [PATCH v3 06/13] mbuf: add functions to get the name of an ol_flag Olivier Matz
                       ` (8 subsequent siblings)
  13 siblings, 0 replies; 112+ messages in thread
From: Olivier Matz @ 2014-11-20 22:58 UTC (permalink / raw)
  To: dev; +Cc: jigsaw

This definition is specific to Intel PMD drivers and its description
"indicate what bits required for building TX context" shows that it
should not be in the generic rte_mbuf.h but in the PMD driver.

Signed-off-by: Olivier Matz <olivier.matz@6wind.com>
Acked-by: Bruce Richardson <bruce.richardson@intel.com>
---
 lib/librte_mbuf/rte_mbuf.h        | 5 -----
 lib/librte_pmd_e1000/igb_rxtx.c   | 8 +++++++-
 lib/librte_pmd_ixgbe/ixgbe_rxtx.c | 8 +++++++-
 3 files changed, 14 insertions(+), 7 deletions(-)

diff --git a/lib/librte_mbuf/rte_mbuf.h b/lib/librte_mbuf/rte_mbuf.h
index 0c96b00..62d952d 100644
--- a/lib/librte_mbuf/rte_mbuf.h
+++ b/lib/librte_mbuf/rte_mbuf.h
@@ -124,11 +124,6 @@ extern "C" {
 /* Use final bit of flags to indicate a control mbuf */
 #define CTRL_MBUF_FLAG       (1ULL << 63) /**< Mbuf contains control data */
 
-/**
- * Bit Mask to indicate what bits required for building TX context
- */
-#define PKT_TX_OFFLOAD_MASK (PKT_TX_VLAN_PKT | PKT_TX_IP_CKSUM | PKT_TX_L4_MASK)
-
 /* define a set of marker types that can be used to refer to set points in the
  * mbuf */
 typedef void    *MARKER[0];   /**< generic marker for a point in a structure */
diff --git a/lib/librte_pmd_e1000/igb_rxtx.c b/lib/librte_pmd_e1000/igb_rxtx.c
index b406397..433c616 100644
--- a/lib/librte_pmd_e1000/igb_rxtx.c
+++ b/lib/librte_pmd_e1000/igb_rxtx.c
@@ -84,6 +84,12 @@
 		ETH_RSS_IPV6_UDP | \
 		ETH_RSS_IPV6_UDP_EX)
 
+/* Bit Mask to indicate what bits required for building TX context */
+#define IGB_TX_OFFLOAD_MASK (			 \
+		PKT_TX_VLAN_PKT |		 \
+		PKT_TX_IP_CKSUM |		 \
+		PKT_TX_L4_MASK)
+
 static inline struct rte_mbuf *
 rte_rxmbuf_alloc(struct rte_mempool *mp)
 {
@@ -400,7 +406,7 @@ eth_igb_xmit_pkts(void *tx_queue, struct rte_mbuf **tx_pkts,
 		ol_flags = tx_pkt->ol_flags;
 		vlan_macip_lens.f.vlan_tci = tx_pkt->vlan_tci;
 		vlan_macip_lens.f.l2_l3_len = tx_pkt->l2_l3_len;
-		tx_ol_req = ol_flags & PKT_TX_OFFLOAD_MASK;
+		tx_ol_req = ol_flags & IGB_TX_OFFLOAD_MASK;
 
 		/* If a Context Descriptor need be built . */
 		if (tx_ol_req) {
diff --git a/lib/librte_pmd_ixgbe/ixgbe_rxtx.c b/lib/librte_pmd_ixgbe/ixgbe_rxtx.c
index 7e470ce..ca35db2 100644
--- a/lib/librte_pmd_ixgbe/ixgbe_rxtx.c
+++ b/lib/librte_pmd_ixgbe/ixgbe_rxtx.c
@@ -90,6 +90,12 @@
 		ETH_RSS_IPV6_UDP | \
 		ETH_RSS_IPV6_UDP_EX)
 
+/* Bit Mask to indicate what bits required for building TX context */
+#define IXGBE_TX_OFFLOAD_MASK (			 \
+		PKT_TX_VLAN_PKT |		 \
+		PKT_TX_IP_CKSUM |		 \
+		PKT_TX_L4_MASK)
+
 static inline struct rte_mbuf *
 rte_rxmbuf_alloc(struct rte_mempool *mp)
 {
@@ -580,7 +586,7 @@ ixgbe_xmit_pkts(void *tx_queue, struct rte_mbuf **tx_pkts,
 		ol_flags = tx_pkt->ol_flags;
 
 		/* If hardware offload required */
-		tx_ol_req = ol_flags & PKT_TX_OFFLOAD_MASK;
+		tx_ol_req = ol_flags & IXGBE_TX_OFFLOAD_MASK;
 		if (tx_ol_req) {
 			vlan_macip_lens.f.vlan_tci = tx_pkt->vlan_tci;
 			vlan_macip_lens.f.l2_l3_len = tx_pkt->l2_l3_len;
-- 
2.1.0


* [dpdk-dev] [PATCH v3 06/13] mbuf: add functions to get the name of an ol_flag
  2014-11-20 22:58   ` [dpdk-dev] [PATCH v3 00/13] add TSO support Olivier Matz
                       ` (4 preceding siblings ...)
  2014-11-20 22:58     ` [dpdk-dev] [PATCH v3 05/13] mbuf: remove too specific PKT_TX_OFFLOAD_MASK definition Olivier Matz
@ 2014-11-20 22:58     ` Olivier Matz
  2014-11-25 10:23       ` Thomas Monjalon
  2014-11-20 22:58     ` [dpdk-dev] [PATCH v3 07/13] testpmd: fix use of offload flags in testpmd Olivier Matz
                       ` (7 subsequent siblings)
  13 siblings, 1 reply; 112+ messages in thread
From: Olivier Matz @ 2014-11-20 22:58 UTC (permalink / raw)
  To: dev; +Cc: jigsaw

In test-pmd (rxonly.c), the code is able to dump the list of ol_flags.
The issue is that the list of flags in the application has to be
synchronized with the flags defined in rte_mbuf.h.

This patch introduces two new functions, rte_get_rx_ol_flag_name()
and rte_get_tx_ol_flag_name(), that return the name of a flag from
its mask. It also fixes rxonly.c to use these new functions and to
display the proper flags.
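
The lookup-by-mask pattern can be sketched in isolation as follows. This
is an illustration only: the EX_* flags, ex_get_rx_flag_name() and
ex_dump_rx_flags() are hypothetical stand-ins, not the rte_mbuf API.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Stand-in RX flag values; the real PKT_RX_* bits live in rte_mbuf.h. */
#define EX_RX_VLAN_PKT (1ULL << 0)
#define EX_RX_RSS_HASH (1ULL << 1)

/* Return the name of a single flag, or NULL if the mask is unknown. */
static const char *
ex_get_rx_flag_name(uint64_t mask)
{
	switch (mask) {
	case EX_RX_VLAN_PKT: return "EX_RX_VLAN_PKT";
	case EX_RX_RSS_HASH: return "EX_RX_RSS_HASH";
	default: return NULL;
	}
}

/* Print the name of every known flag set in ol_flags and return how
 * many names were printed. Unknown bits are skipped silently. */
static unsigned
ex_dump_rx_flags(FILE *f, uint64_t ol_flags)
{
	unsigned bit, count = 0;

	assert(f != NULL);
	for (bit = 0; bit < sizeof(ol_flags) * 8; bit++) {
		const char *name;

		if ((ol_flags & (1ULL << bit)) == 0)
			continue;
		name = ex_get_rx_flag_name(1ULL << bit);
		if (name == NULL)
			continue;
		fprintf(f, "  %s\n", name);
		count++;
	}
	return count;
}
```

The caller no longer carries its own table of names, so only the lookup
function has to follow changes to the flag definitions.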

Signed-off-by: Olivier Matz <olivier.matz@6wind.com>
---
 app/test-pmd/rxonly.c      | 36 ++++++++++------------------------
 lib/librte_mbuf/rte_mbuf.c | 48 ++++++++++++++++++++++++++++++++++++++++++++++
 lib/librte_mbuf/rte_mbuf.h | 25 ++++++++++++++++++++++++
 3 files changed, 83 insertions(+), 26 deletions(-)

diff --git a/app/test-pmd/rxonly.c b/app/test-pmd/rxonly.c
index 9ad1df6..51a530a 100644
--- a/app/test-pmd/rxonly.c
+++ b/app/test-pmd/rxonly.c
@@ -71,26 +71,6 @@
 
 #include "testpmd.h"
 
-#define MAX_PKT_RX_FLAGS 13
-static const char *pkt_rx_flag_names[MAX_PKT_RX_FLAGS] = {
-	"VLAN_PKT",
-	"RSS_HASH",
-	"PKT_RX_FDIR",
-	"IP_CKSUM",
-	"IP_CKSUM_BAD",
-
-	"IPV4_HDR",
-	"IPV4_HDR_EXT",
-	"IPV6_HDR",
-	"IPV6_HDR_EXT",
-
-	"IEEE1588_PTP",
-	"IEEE1588_TMST",
-
-	"TUNNEL_IPV4_HDR",
-	"TUNNEL_IPV6_HDR",
-};
-
 static inline void
 print_ether_addr(const char *what, struct ether_addr *eth_addr)
 {
@@ -214,12 +194,16 @@ pkt_burst_receive(struct fwd_stream *fs)
 		printf(" - Receive queue=0x%x", (unsigned) fs->rx_queue);
 		printf("\n");
 		if (ol_flags != 0) {
-			int rxf;
-
-			for (rxf = 0; rxf < MAX_PKT_RX_FLAGS; rxf++) {
-				if (ol_flags & (1 << rxf))
-					printf("  PKT_RX_%s\n",
-					       pkt_rx_flag_names[rxf]);
+			unsigned rxf;
+			const char *name;
+
+			for (rxf = 0; rxf < sizeof(mb->ol_flags) * 8; rxf++) {
+				if ((ol_flags & (1ULL << rxf)) == 0)
+					continue;
+				name = rte_get_rx_ol_flag_name(1ULL << rxf);
+				if (name == NULL)
+					continue;
+				printf("  %s\n", name);
 			}
 		}
 		rte_pktmbuf_free(mb);
diff --git a/lib/librte_mbuf/rte_mbuf.c b/lib/librte_mbuf/rte_mbuf.c
index 52e7574..9b57b3a 100644
--- a/lib/librte_mbuf/rte_mbuf.c
+++ b/lib/librte_mbuf/rte_mbuf.c
@@ -2,6 +2,7 @@
  *   BSD LICENSE
  *
  *   Copyright(c) 2010-2014 Intel Corporation. All rights reserved.
+ *   Copyright 2014 6WIND S.A.
  *   All rights reserved.
  *
  *   Redistribution and use in source and binary forms, with or without
@@ -196,3 +197,50 @@ rte_pktmbuf_dump(FILE *f, const struct rte_mbuf *m, unsigned dump_len)
 		nb_segs --;
 	}
 }
+
+/*
+ * Get the name of a RX offload flag. Must be kept synchronized with flag
+ * definitions in rte_mbuf.h.
+ */
+const char *rte_get_rx_ol_flag_name(uint64_t mask)
+{
+	switch (mask) {
+	case PKT_RX_VLAN_PKT: return "PKT_RX_VLAN_PKT";
+	case PKT_RX_RSS_HASH: return "PKT_RX_RSS_HASH";
+	case PKT_RX_FDIR: return "PKT_RX_FDIR";
+	case PKT_RX_L4_CKSUM_BAD: return "PKT_RX_L4_CKSUM_BAD";
+	case PKT_RX_IP_CKSUM_BAD: return "PKT_RX_IP_CKSUM_BAD";
+	/* case PKT_RX_EIP_CKSUM_BAD: return "PKT_RX_EIP_CKSUM_BAD"; */
+	/* case PKT_RX_OVERSIZE: return "PKT_RX_OVERSIZE"; */
+	/* case PKT_RX_HBUF_OVERFLOW: return "PKT_RX_HBUF_OVERFLOW"; */
+	/* case PKT_RX_RECIP_ERR: return "PKT_RX_RECIP_ERR"; */
+	/* case PKT_RX_MAC_ERR: return "PKT_RX_MAC_ERR"; */
+	case PKT_RX_IPV4_HDR: return "PKT_RX_IPV4_HDR";
+	case PKT_RX_IPV4_HDR_EXT: return "PKT_RX_IPV4_HDR_EXT";
+	case PKT_RX_IPV6_HDR: return "PKT_RX_IPV6_HDR";
+	case PKT_RX_IPV6_HDR_EXT: return "PKT_RX_IPV6_HDR_EXT";
+	case PKT_RX_IEEE1588_PTP: return "PKT_RX_IEEE1588_PTP";
+	case PKT_RX_IEEE1588_TMST: return "PKT_RX_IEEE1588_TMST";
+	case PKT_RX_TUNNEL_IPV4_HDR: return "PKT_RX_TUNNEL_IPV4_HDR";
+	case PKT_RX_TUNNEL_IPV6_HDR: return "PKT_RX_TUNNEL_IPV6_HDR";
+	default: return NULL;
+	}
+}
+
+/*
+ * Get the name of a TX offload flag. Must be kept synchronized with flag
+ * definitions in rte_mbuf.h.
+ */
+const char *rte_get_tx_ol_flag_name(uint64_t mask)
+{
+	switch (mask) {
+	case PKT_TX_VLAN_PKT: return "PKT_TX_VLAN_PKT";
+	case PKT_TX_IP_CKSUM: return "PKT_TX_IP_CKSUM";
+	case PKT_TX_TCP_CKSUM: return "PKT_TX_TCP_CKSUM";
+	case PKT_TX_SCTP_CKSUM: return "PKT_TX_SCTP_CKSUM";
+	case PKT_TX_UDP_CKSUM: return "PKT_TX_UDP_CKSUM";
+	case PKT_TX_IEEE1588_TMST: return "PKT_TX_IEEE1588_TMST";
+	case PKT_TX_VXLAN_CKSUM: return "PKT_TX_VXLAN_CKSUM";
+	default: return NULL;
+	}
+}
diff --git a/lib/librte_mbuf/rte_mbuf.h b/lib/librte_mbuf/rte_mbuf.h
index 62d952d..acc0385 100644
--- a/lib/librte_mbuf/rte_mbuf.h
+++ b/lib/librte_mbuf/rte_mbuf.h
@@ -74,6 +74,9 @@ extern "C" {
  * - The most-significant 8 bits are reserved for generic mbuf flags
  * - TX flags therefore start at bit position 55 (i.e. 63-8), and new flags get
  *   added to the right of the previously defined flags
+ *
+ * Keep these flags synchronized with rte_get_rx_ol_flag_name() and
+ * rte_get_tx_ol_flag_name().
  */
 #define PKT_RX_VLAN_PKT      (1ULL << 0)  /**< RX packet is a 802.1q VLAN packet. */
 #define PKT_RX_RSS_HASH      (1ULL << 1)  /**< RX packet with RSS hash result. */
@@ -124,6 +127,28 @@ extern "C" {
 /* Use final bit of flags to indicate a control mbuf */
 #define CTRL_MBUF_FLAG       (1ULL << 63) /**< Mbuf contains control data */
 
+/**
+ * Get the name of a RX offload flag
+ *
+ * @param mask
+ *   The mask describing the flag.
+ * @return
+ *   The name of this flag, or NULL if it's not a valid RX flag.
+ */
+const char *rte_get_rx_ol_flag_name(uint64_t mask);
+
+/**
+ * Get the name of a TX offload flag
+ *
+ * @param mask
+ *   The mask describing the flag. Usually only one bit must be set.
+ *   Several bits can be given if they belong to the same mask.
+ *   Ex: PKT_TX_L4_MASK.
+ * @return
+ *   The name of this flag, or NULL if it's not a valid TX flag.
+ */
+const char *rte_get_tx_ol_flag_name(uint64_t mask);
+
 /* define a set of marker types that can be used to refer to set points in the
  * mbuf */
 typedef void    *MARKER[0];   /**< generic marker for a point in a structure */
-- 
2.1.0


* [dpdk-dev] [PATCH v3 07/13] testpmd: fix use of offload flags in testpmd
  2014-11-20 22:58   ` [dpdk-dev] [PATCH v3 00/13] add TSO support Olivier Matz
                       ` (5 preceding siblings ...)
  2014-11-20 22:58     ` [dpdk-dev] [PATCH v3 06/13] mbuf: add functions to get the name of an ol_flag Olivier Matz
@ 2014-11-20 22:58     ` Olivier Matz
  2014-11-25 11:52       ` Ananyev, Konstantin
  2014-11-20 22:58     ` [dpdk-dev] [PATCH v3 08/13] testpmd: rework csum forward engine Olivier Matz
                       ` (6 subsequent siblings)
  13 siblings, 1 reply; 112+ messages in thread
From: Olivier Matz @ 2014-11-20 22:58 UTC (permalink / raw)
  To: dev; +Cc: jigsaw

In testpmd the rte_port->tx_ol_flags flag was used in 2 incompatible
manners:
- sometimes used with testpmd specific flags (0xff for checksums, and
  bit 11 for vlan)
- sometimes assigned to m->ol_flags directly, which is wrong in case
  of checksum flags

This commit replaces the hardcoded values with named testpmd-specific
definitions, which are not interchangeable with mbuf flags. The testpmd
forward engines are fixed to translate them into mbuf flags properly.
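
The separation between application flags and mbuf flags can be sketched
as below. The two TESTPMD_* values are the ones this patch adds to
testpmd.h; the EX_PKT_TX_* values and ex_port_flags_to_mbuf_flags() are
assumptions made for the example, not the real rte_mbuf.h definitions or
testpmd code.

```c
#include <assert.h>
#include <stdint.h>

/* Application-private configuration flags (values from this patch). */
#define TESTPMD_TX_OFFLOAD_IP_CKSUM    0x0001
#define TESTPMD_TX_OFFLOAD_INSERT_VLAN 0x0100

/* Stand-in mbuf flag values, for illustration only. */
#define EX_PKT_TX_VLAN_PKT (1ULL << 55)
#define EX_PKT_TX_IP_CKSUM (1ULL << 54)

/* Translate per-port configuration into per-packet mbuf flags.
 * is_ipv4 tells whether the packet carries an IPv4 header, since an
 * IP checksum offload only makes sense in that case. */
static uint64_t
ex_port_flags_to_mbuf_flags(uint16_t tx_ol_flags, int is_ipv4)
{
	uint64_t ol_flags = 0;

	/* The two flag spaces differ, which is why assigning one to the
	 * other directly gives wrong results. */
	assert(EX_PKT_TX_VLAN_PKT != TESTPMD_TX_OFFLOAD_INSERT_VLAN);

	if (tx_ol_flags & TESTPMD_TX_OFFLOAD_INSERT_VLAN)
		ol_flags |= EX_PKT_TX_VLAN_PKT;
	if (is_ipv4 && (tx_ol_flags & TESTPMD_TX_OFFLOAD_IP_CKSUM))
		ol_flags |= EX_PKT_TX_IP_CKSUM;
	return ol_flags;
}
```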

Signed-off-by: Olivier Matz <olivier.matz@6wind.com>
---
 app/test-pmd/config.c   |  4 ++--
 app/test-pmd/csumonly.c | 40 +++++++++++++++++++++++-----------------
 app/test-pmd/macfwd.c   |  5 ++++-
 app/test-pmd/macswap.c  |  5 ++++-
 app/test-pmd/testpmd.h  | 28 +++++++++++++++++++++-------
 app/test-pmd/txonly.c   |  9 ++++++---
 6 files changed, 60 insertions(+), 31 deletions(-)

diff --git a/app/test-pmd/config.c b/app/test-pmd/config.c
index b102b72..34b6fdb 100644
--- a/app/test-pmd/config.c
+++ b/app/test-pmd/config.c
@@ -1670,7 +1670,7 @@ tx_vlan_set(portid_t port_id, uint16_t vlan_id)
 		return;
 	if (vlan_id_is_invalid(vlan_id))
 		return;
-	ports[port_id].tx_ol_flags |= PKT_TX_VLAN_PKT;
+	ports[port_id].tx_ol_flags |= TESTPMD_TX_OFFLOAD_INSERT_VLAN;
 	ports[port_id].tx_vlan_id = vlan_id;
 }
 
@@ -1679,7 +1679,7 @@ tx_vlan_reset(portid_t port_id)
 {
 	if (port_id_is_invalid(port_id))
 		return;
-	ports[port_id].tx_ol_flags &= ~PKT_TX_VLAN_PKT;
+	ports[port_id].tx_ol_flags &= ~TESTPMD_TX_OFFLOAD_INSERT_VLAN;
 }
 
 void
diff --git a/app/test-pmd/csumonly.c b/app/test-pmd/csumonly.c
index 8d10bfd..743094a 100644
--- a/app/test-pmd/csumonly.c
+++ b/app/test-pmd/csumonly.c
@@ -322,7 +322,7 @@ pkt_burst_checksum_forward(struct fwd_stream *fs)
 			/* Do not delete, this is required by HW*/
 			ipv4_hdr->hdr_checksum = 0;
 
-			if (tx_ol_flags & 0x1) {
+			if (tx_ol_flags & TESTPMD_TX_OFFLOAD_IP_CKSUM) {
 				/* HW checksum */
 				ol_flags |= PKT_TX_IP_CKSUM;
 			}
@@ -336,7 +336,7 @@ pkt_burst_checksum_forward(struct fwd_stream *fs)
 			if (l4_proto == IPPROTO_UDP) {
 				udp_hdr = (struct udp_hdr*) (rte_pktmbuf_mtod(mb,
 						unsigned char *) + l2_len + l3_len);
-				if (tx_ol_flags & 0x2) {
+				if (tx_ol_flags & TESTPMD_TX_OFFLOAD_UDP_CKSUM) {
 					/* HW Offload */
 					ol_flags |= PKT_TX_UDP_CKSUM;
 					if (ipv4_tunnel)
@@ -358,7 +358,7 @@ pkt_burst_checksum_forward(struct fwd_stream *fs)
 					uint16_t len;
 
 					/* Check if inner L3/L4 checkum flag is set */
-					if (tx_ol_flags & 0xF0)
+					if (tx_ol_flags & TESTPMD_TX_OFFLOAD_INNER_CKSUM_MASK)
 						ol_flags |= PKT_TX_VXLAN_CKSUM;
 
 					inner_l2_len  = sizeof(struct ether_hdr);
@@ -381,7 +381,7 @@ pkt_burst_checksum_forward(struct fwd_stream *fs)
 								unsigned char *) + len);
 						inner_l4_proto = inner_ipv4_hdr->next_proto_id;
 
-						if (tx_ol_flags & 0x10) {
+						if (tx_ol_flags & TESTPMD_TX_OFFLOAD_INNER_IP_CKSUM) {
 
 							/* Do not delete, this is required by HW*/
 							inner_ipv4_hdr->hdr_checksum = 0;
@@ -394,7 +394,8 @@ pkt_burst_checksum_forward(struct fwd_stream *fs)
 								unsigned char *) + len);
 						inner_l4_proto = inner_ipv6_hdr->proto;
 					}
-					if ((inner_l4_proto == IPPROTO_UDP) && (tx_ol_flags & 0x20)) {
+					if ((inner_l4_proto == IPPROTO_UDP) &&
+						(tx_ol_flags & TESTPMD_TX_OFFLOAD_INNER_UDP_CKSUM)) {
 
 						/* HW Offload */
 						ol_flags |= PKT_TX_UDP_CKSUM;
@@ -405,7 +406,8 @@ pkt_burst_checksum_forward(struct fwd_stream *fs)
 						else if (eth_type == ETHER_TYPE_IPv6)
 							inner_udp_hdr->dgram_cksum = get_ipv6_psd_sum(inner_ipv6_hdr);
 
-					} else if ((inner_l4_proto == IPPROTO_TCP) && (tx_ol_flags & 0x40)) {
+					} else if ((inner_l4_proto == IPPROTO_TCP) &&
+						(tx_ol_flags & TESTPMD_TX_OFFLOAD_INNER_TCP_CKSUM)) {
 						/* HW Offload */
 						ol_flags |= PKT_TX_TCP_CKSUM;
 						inner_tcp_hdr = (struct tcp_hdr *) (rte_pktmbuf_mtod(mb,
@@ -414,7 +416,8 @@ pkt_burst_checksum_forward(struct fwd_stream *fs)
 							inner_tcp_hdr->cksum = get_ipv4_psd_sum(inner_ipv4_hdr);
 						else if (eth_type == ETHER_TYPE_IPv6)
 							inner_tcp_hdr->cksum = get_ipv6_psd_sum(inner_ipv6_hdr);
-					} else if ((inner_l4_proto == IPPROTO_SCTP) && (tx_ol_flags & 0x80)) {
+					} else if ((inner_l4_proto == IPPROTO_SCTP) &&
+						(tx_ol_flags & TESTPMD_TX_OFFLOAD_INNER_SCTP_CKSUM)) {
 						/* HW Offload */
 						ol_flags |= PKT_TX_SCTP_CKSUM;
 						inner_sctp_hdr = (struct sctp_hdr *) (rte_pktmbuf_mtod(mb,
@@ -427,7 +430,7 @@ pkt_burst_checksum_forward(struct fwd_stream *fs)
 			} else if (l4_proto == IPPROTO_TCP) {
 				tcp_hdr = (struct tcp_hdr*) (rte_pktmbuf_mtod(mb,
 						unsigned char *) + l2_len + l3_len);
-				if (tx_ol_flags & 0x4) {
+				if (tx_ol_flags & TESTPMD_TX_OFFLOAD_TCP_CKSUM) {
 					ol_flags |= PKT_TX_TCP_CKSUM;
 					tcp_hdr->cksum = get_ipv4_psd_sum(ipv4_hdr);
 				}
@@ -440,7 +443,7 @@ pkt_burst_checksum_forward(struct fwd_stream *fs)
 				sctp_hdr = (struct sctp_hdr*) (rte_pktmbuf_mtod(mb,
 						unsigned char *) + l2_len + l3_len);
 
-				if (tx_ol_flags & 0x8) {
+				if (tx_ol_flags & TESTPMD_TX_OFFLOAD_SCTP_CKSUM) {
 					ol_flags |= PKT_TX_SCTP_CKSUM;
 					sctp_hdr->cksum = 0;
 
@@ -465,7 +468,7 @@ pkt_burst_checksum_forward(struct fwd_stream *fs)
 			if (l4_proto == IPPROTO_UDP) {
 				udp_hdr = (struct udp_hdr*) (rte_pktmbuf_mtod(mb,
 						unsigned char *) + l2_len + l3_len);
-				if (tx_ol_flags & 0x2) {
+				if (tx_ol_flags & TESTPMD_TX_OFFLOAD_UDP_CKSUM) {
 					/* HW Offload */
 					ol_flags |= PKT_TX_UDP_CKSUM;
 					if (ipv6_tunnel)
@@ -487,7 +490,7 @@ pkt_burst_checksum_forward(struct fwd_stream *fs)
 					uint16_t len;
 
 					/* Check if inner L3/L4 checksum flag is set */
-					if (tx_ol_flags & 0xF0)
+					if (tx_ol_flags & TESTPMD_TX_OFFLOAD_INNER_CKSUM_MASK)
 						ol_flags |= PKT_TX_VXLAN_CKSUM;
 
 					inner_l2_len  = sizeof(struct ether_hdr);
@@ -511,7 +514,7 @@ pkt_burst_checksum_forward(struct fwd_stream *fs)
 						inner_l4_proto = inner_ipv4_hdr->next_proto_id;
 
 						/* HW offload */
-						if (tx_ol_flags & 0x10) {
+						if (tx_ol_flags & TESTPMD_TX_OFFLOAD_INNER_IP_CKSUM) {
 
 							/* Do not delete, this is required by HW*/
 							inner_ipv4_hdr->hdr_checksum = 0;
@@ -524,7 +527,8 @@ pkt_burst_checksum_forward(struct fwd_stream *fs)
 						inner_l4_proto = inner_ipv6_hdr->proto;
 					}
 
-					if ((inner_l4_proto == IPPROTO_UDP) && (tx_ol_flags & 0x20)) {
+					if ((inner_l4_proto == IPPROTO_UDP) &&
+						(tx_ol_flags & TESTPMD_TX_OFFLOAD_INNER_UDP_CKSUM)) {
 						inner_udp_hdr = (struct udp_hdr *) (rte_pktmbuf_mtod(mb,
 							unsigned char *) + len + inner_l3_len);
 						/* HW offload */
@@ -534,7 +538,8 @@ pkt_burst_checksum_forward(struct fwd_stream *fs)
 							inner_udp_hdr->dgram_cksum = get_ipv4_psd_sum(inner_ipv4_hdr);
 						else if (eth_type == ETHER_TYPE_IPv6)
 							inner_udp_hdr->dgram_cksum = get_ipv6_psd_sum(inner_ipv6_hdr);
-					} else if ((inner_l4_proto == IPPROTO_TCP) && (tx_ol_flags & 0x40)) {
+					} else if ((inner_l4_proto == IPPROTO_TCP) &&
+						(tx_ol_flags & TESTPMD_TX_OFFLOAD_INNER_TCP_CKSUM)) {
 						/* HW offload */
 						ol_flags |= PKT_TX_TCP_CKSUM;
 						inner_tcp_hdr = (struct tcp_hdr *) (rte_pktmbuf_mtod(mb,
@@ -545,7 +550,8 @@ pkt_burst_checksum_forward(struct fwd_stream *fs)
 						else if (eth_type == ETHER_TYPE_IPv6)
 							inner_tcp_hdr->cksum = get_ipv6_psd_sum(inner_ipv6_hdr);
 
-					} else if ((inner_l4_proto == IPPROTO_SCTP) && (tx_ol_flags & 0x80)) {
+					} else if ((inner_l4_proto == IPPROTO_SCTP) &&
+						(tx_ol_flags & TESTPMD_TX_OFFLOAD_INNER_SCTP_CKSUM)) {
 						/* HW offload */
 						ol_flags |= PKT_TX_SCTP_CKSUM;
 						inner_sctp_hdr = (struct sctp_hdr *) (rte_pktmbuf_mtod(mb,
@@ -559,7 +565,7 @@ pkt_burst_checksum_forward(struct fwd_stream *fs)
 			else if (l4_proto == IPPROTO_TCP) {
 				tcp_hdr = (struct tcp_hdr*) (rte_pktmbuf_mtod(mb,
 						unsigned char *) + l2_len + l3_len);
-				if (tx_ol_flags & 0x4) {
+				if (tx_ol_flags & TESTPMD_TX_OFFLOAD_TCP_CKSUM) {
 					ol_flags |= PKT_TX_TCP_CKSUM;
 					tcp_hdr->cksum = get_ipv6_psd_sum(ipv6_hdr);
 				}
@@ -573,7 +579,7 @@ pkt_burst_checksum_forward(struct fwd_stream *fs)
 				sctp_hdr = (struct sctp_hdr*) (rte_pktmbuf_mtod(mb,
 						unsigned char *) + l2_len + l3_len);
 
-				if (tx_ol_flags & 0x8) {
+				if (tx_ol_flags & TESTPMD_TX_OFFLOAD_SCTP_CKSUM) {
 					ol_flags |= PKT_TX_SCTP_CKSUM;
 					sctp_hdr->cksum = 0;
 					/* Sanity check, only number of 4 bytes supported by HW */
diff --git a/app/test-pmd/macfwd.c b/app/test-pmd/macfwd.c
index 38bae23..aa3d705 100644
--- a/app/test-pmd/macfwd.c
+++ b/app/test-pmd/macfwd.c
@@ -85,6 +85,7 @@ pkt_burst_mac_forward(struct fwd_stream *fs)
 	uint16_t nb_rx;
 	uint16_t nb_tx;
 	uint16_t i;
+	uint64_t ol_flags = 0;
 #ifdef RTE_TEST_PMD_RECORD_CORE_CYCLES
 	uint64_t start_tsc;
 	uint64_t end_tsc;
@@ -108,6 +109,8 @@ pkt_burst_mac_forward(struct fwd_stream *fs)
 #endif
 	fs->rx_packets += nb_rx;
 	txp = &ports[fs->tx_port];
+	if (txp->tx_ol_flags & TESTPMD_TX_OFFLOAD_INSERT_VLAN)
+		ol_flags = PKT_TX_VLAN_PKT;
 	for (i = 0; i < nb_rx; i++) {
 		mb = pkts_burst[i];
 		eth_hdr = rte_pktmbuf_mtod(mb, struct ether_hdr *);
@@ -115,7 +118,7 @@ pkt_burst_mac_forward(struct fwd_stream *fs)
 				&eth_hdr->d_addr);
 		ether_addr_copy(&ports[fs->tx_port].eth_addr,
 				&eth_hdr->s_addr);
-		mb->ol_flags = txp->tx_ol_flags;
+		mb->ol_flags = ol_flags;
 		mb->l2_len = sizeof(struct ether_hdr);
 		mb->l3_len = sizeof(struct ipv4_hdr);
 		mb->vlan_tci = txp->tx_vlan_id;
diff --git a/app/test-pmd/macswap.c b/app/test-pmd/macswap.c
index 1786095..ec61657 100644
--- a/app/test-pmd/macswap.c
+++ b/app/test-pmd/macswap.c
@@ -85,6 +85,7 @@ pkt_burst_mac_swap(struct fwd_stream *fs)
 	uint16_t nb_rx;
 	uint16_t nb_tx;
 	uint16_t i;
+	uint64_t ol_flags = 0;
 #ifdef RTE_TEST_PMD_RECORD_CORE_CYCLES
 	uint64_t start_tsc;
 	uint64_t end_tsc;
@@ -108,6 +109,8 @@ pkt_burst_mac_swap(struct fwd_stream *fs)
 #endif
 	fs->rx_packets += nb_rx;
 	txp = &ports[fs->tx_port];
+	if (txp->tx_ol_flags & TESTPMD_TX_OFFLOAD_INSERT_VLAN)
+		ol_flags = PKT_TX_VLAN_PKT;
 	for (i = 0; i < nb_rx; i++) {
 		mb = pkts_burst[i];
 		eth_hdr = rte_pktmbuf_mtod(mb, struct ether_hdr *);
@@ -117,7 +120,7 @@ pkt_burst_mac_swap(struct fwd_stream *fs)
 		ether_addr_copy(&eth_hdr->s_addr, &eth_hdr->d_addr);
 		ether_addr_copy(&addr, &eth_hdr->s_addr);
 
-		mb->ol_flags = txp->tx_ol_flags;
+		mb->ol_flags = ol_flags;
 		mb->l2_len = sizeof(struct ether_hdr);
 		mb->l3_len = sizeof(struct ipv4_hdr);
 		mb->vlan_tci = txp->tx_vlan_id;
diff --git a/app/test-pmd/testpmd.h b/app/test-pmd/testpmd.h
index 9cbfeac..82af2bd 100644
--- a/app/test-pmd/testpmd.h
+++ b/app/test-pmd/testpmd.h
@@ -123,14 +123,28 @@ struct fwd_stream {
 #endif
 };
 
+/** Offload IP checksum in csum forward engine */
+#define TESTPMD_TX_OFFLOAD_IP_CKSUM          0x0001
+/** Offload UDP checksum in csum forward engine */
+#define TESTPMD_TX_OFFLOAD_UDP_CKSUM         0x0002
+/** Offload TCP checksum in csum forward engine */
+#define TESTPMD_TX_OFFLOAD_TCP_CKSUM         0x0004
+/** Offload SCTP checksum in csum forward engine */
+#define TESTPMD_TX_OFFLOAD_SCTP_CKSUM        0x0008
+/** Offload inner IP checksum in csum forward engine */
+#define TESTPMD_TX_OFFLOAD_INNER_IP_CKSUM    0x0010
+/** Offload inner UDP checksum in csum forward engine */
+#define TESTPMD_TX_OFFLOAD_INNER_UDP_CKSUM   0x0020
+/** Offload inner TCP checksum in csum forward engine */
+#define TESTPMD_TX_OFFLOAD_INNER_TCP_CKSUM   0x0040
+/** Offload inner SCTP checksum in csum forward engine */
+#define TESTPMD_TX_OFFLOAD_INNER_SCTP_CKSUM  0x0080
+/** Offload inner IP checksum mask */
+#define TESTPMD_TX_OFFLOAD_INNER_CKSUM_MASK  0x00F0
+/** Insert VLAN header in forward engine */
+#define TESTPMD_TX_OFFLOAD_INSERT_VLAN       0x0100
 /**
  * The data structure associated with each port.
- * tx_ol_flags is slightly different from ol_flags of rte_mbuf.
- *   Bit  0: Insert IP checksum
- *   Bit  1: Insert UDP checksum
- *   Bit  2: Insert TCP checksum
- *   Bit  3: Insert SCTP checksum
- *   Bit 11: Insert VLAN Label
  */
 struct rte_port {
 	struct rte_eth_dev_info dev_info;   /**< PCI info + driver name */
@@ -141,7 +155,7 @@ struct rte_port {
 	struct fwd_stream       *rx_stream; /**< Port RX stream, if unique */
 	struct fwd_stream       *tx_stream; /**< Port TX stream, if unique */
 	unsigned int            socket_id;  /**< For NUMA support */
-	uint64_t                tx_ol_flags;/**< Offload Flags of TX packets. */
+	uint16_t                tx_ol_flags;/**< TX Offload Flags (TESTPMD_TX_OFFLOAD...). */
 	uint16_t                tx_vlan_id; /**< Tag Id. in TX VLAN packets. */
 	void                    *fwd_ctx;   /**< Forwarding mode context */
 	uint64_t                rx_bad_ip_csum; /**< rx pkts with bad ip checksum  */
diff --git a/app/test-pmd/txonly.c b/app/test-pmd/txonly.c
index 3d08005..c984670 100644
--- a/app/test-pmd/txonly.c
+++ b/app/test-pmd/txonly.c
@@ -196,6 +196,7 @@ static void
 pkt_burst_transmit(struct fwd_stream *fs)
 {
 	struct rte_mbuf *pkts_burst[MAX_PKT_BURST];
+	struct rte_port *txp;
 	struct rte_mbuf *pkt;
 	struct rte_mbuf *pkt_seg;
 	struct rte_mempool *mbp;
@@ -203,7 +204,7 @@ pkt_burst_transmit(struct fwd_stream *fs)
 	uint16_t nb_tx;
 	uint16_t nb_pkt;
 	uint16_t vlan_tci;
-	uint64_t ol_flags;
+	uint64_t ol_flags = 0;
 	uint8_t  i;
 #ifdef RTE_TEST_PMD_RECORD_CORE_CYCLES
 	uint64_t start_tsc;
@@ -216,8 +217,10 @@ pkt_burst_transmit(struct fwd_stream *fs)
 #endif
 
 	mbp = current_fwd_lcore()->mbp;
-	vlan_tci = ports[fs->tx_port].tx_vlan_id;
-	ol_flags = ports[fs->tx_port].tx_ol_flags;
+	txp = &ports[fs->tx_port];
+	vlan_tci = txp->tx_vlan_id;
+	if (txp->tx_ol_flags & TESTPMD_TX_OFFLOAD_INSERT_VLAN)
+		ol_flags = PKT_TX_VLAN_PKT;
 	for (nb_pkt = 0; nb_pkt < nb_pkt_per_burst; nb_pkt++) {
 		pkt = tx_mbuf_alloc(mbp);
 		if (pkt == NULL) {
-- 
2.1.0


* [dpdk-dev] [PATCH v3 08/13] testpmd: rework csum forward engine
  2014-11-20 22:58   ` [dpdk-dev] [PATCH v3 00/13] add TSO support Olivier Matz
                       ` (6 preceding siblings ...)
  2014-11-20 22:58     ` [dpdk-dev] [PATCH v3 07/13] testpmd: fix use of offload flags in testpmd Olivier Matz
@ 2014-11-20 22:58     ` Olivier Matz
  2014-11-26 10:10       ` Ananyev, Konstantin
  2014-11-20 22:58     ` [dpdk-dev] [PATCH v3 09/13] mbuf: introduce new checksum API Olivier Matz
                       ` (5 subsequent siblings)
  13 siblings, 1 reply; 112+ messages in thread
From: Olivier Matz @ 2014-11-20 22:58 UTC (permalink / raw)
  To: dev; +Cc: jigsaw

The csum forward engine was becoming too complex to be used and
extended (the next commits want to add the support of TSO):

- no explanation of what the code does
- code is not factored: lots of code is duplicated, especially between
  ipv4/ipv6
- user command line api: use of bitmasks that need to be calculated by
  the user
- the user flags don't have the same semantics:
  - for legacy IP/UDP/TCP/SCTP, it selects software or hardware checksum
  - for other (vxlan), it selects between hardware checksum or no
    checksum
- the code relies too much on flags set by the driver without software
  alternative (ex: PKT_RX_TUNNEL_IPV4_HDR). It is nice to be able to
  compare a software implementation with the hardware offload.

This commit tries to fix these issues, and provides a simple definition
of what is done by the forward engine:

 * Receive a burst of packets, and for supported packet types:
 *  - modify the IPs
 *  - reprocess the checksum in SW or HW, depending on testpmd command line
 *    configuration
 * Then packets are transmitted on the output port.
 *
 * Supported packets are:
 *   Ether / (vlan) / IP|IP6 / UDP|TCP|SCTP .
 *   Ether / (vlan) / IP|IP6 / UDP / VxLAN / Ether / IP|IP6 / UDP|TCP|SCTP
 *
 * The network parser supposes that the packet is contiguous, which may
 * not be the case in real life.
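
For reference, the software side of "reprocess the checksum in SW" can be
sketched with the standard Internet checksum (RFC 1071). This is a
generic illustration under the same contiguous-buffer assumption, not the
code used by the forward engine; ex_inet_cksum() is a hypothetical name.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Internet checksum (RFC 1071) over len bytes of a contiguous buffer.
 * Bytes are combined as big-endian 16-bit words, so the result is the
 * value as it reads on the wire. Intended for packet-sized buffers
 * (the 32-bit accumulator cannot overflow below 64 KB). */
static uint16_t
ex_inet_cksum(const uint8_t *buf, size_t len)
{
	uint32_t sum = 0;
	size_t i;

	assert(buf != NULL || len == 0);

	for (i = 0; i + 1 < len; i += 2)
		sum += ((uint32_t)buf[i] << 8) | buf[i + 1];
	if (len & 1) /* odd trailing byte is padded with zero */
		sum += (uint32_t)buf[len - 1] << 8;

	/* Fold the carries back into the low 16 bits. */
	while (sum >> 16)
		sum = (sum & 0xffff) + (sum >> 16);

	return (uint16_t)~sum;
}
```

For an IPv4 header, the checksum field is set to 0 before the
computation; running the same function over a header that already holds
a valid checksum returns 0.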

Signed-off-by: Olivier Matz <olivier.matz@6wind.com>
---
 app/test-pmd/cmdline.c  | 156 ++++++++---
 app/test-pmd/config.c   |  13 +-
 app/test-pmd/csumonly.c | 676 ++++++++++++++++++++++--------------------------
 app/test-pmd/testpmd.h  |  17 +-
 4 files changed, 437 insertions(+), 425 deletions(-)

diff --git a/app/test-pmd/cmdline.c b/app/test-pmd/cmdline.c
index 4c3fc76..61e4340 100644
--- a/app/test-pmd/cmdline.c
+++ b/app/test-pmd/cmdline.c
@@ -310,19 +310,19 @@ static void cmd_help_long_parsed(void *parsed_result,
 			"    Disable hardware insertion of a VLAN header in"
 			" packets sent on a port.\n\n"
 
-			"tx_checksum set (mask) (port_id)\n"
-			"    Enable hardware insertion of checksum offload with"
-			" the 8-bit mask, 0~0xff, in packets sent on a port.\n"
-			"        bit 0 - insert ip   checksum offload if set\n"
-			"        bit 1 - insert udp  checksum offload if set\n"
-			"        bit 2 - insert tcp  checksum offload if set\n"
-			"        bit 3 - insert sctp checksum offload if set\n"
-			"        bit 4 - insert inner ip  checksum offload if set\n"
-			"        bit 5 - insert inner udp checksum offload if set\n"
-			"        bit 6 - insert inner tcp checksum offload if set\n"
-			"        bit 7 - insert inner sctp checksum offload if set\n"
+			"tx_cksum set (ip|udp|tcp|sctp|vxlan) (hw|sw) (port_id)\n"
+			"    Select hardware or software calculation of the"
+			" checksum with when transmitting a packet using the"
+			" csum forward engine.\n"
+			"    ip|udp|tcp|sctp always concern the inner layer.\n"
+			"    vxlan concerns the outer IP and UDP layer (in"
+			" case the packet is recognized as a vxlan packet by"
+			" the forward engine)\n"
 			"    Please check the NIC datasheet for HW limits.\n\n"
 
+			"tx_checksum show (port_id)\n"
+			"    Display tx checksum offload configuration\n\n"
+
 			"set fwd (%s)\n"
 			"    Set packet forwarding mode.\n\n"
 
@@ -2738,48 +2738,131 @@ cmdline_parse_inst_t cmd_tx_vlan_reset = {
 
 
 /* *** ENABLE HARDWARE INSERTION OF CHECKSUM IN TX PACKETS *** */
-struct cmd_tx_cksum_set_result {
+struct cmd_tx_cksum_result {
 	cmdline_fixed_string_t tx_cksum;
-	cmdline_fixed_string_t set;
-	uint8_t cksum_mask;
+	cmdline_fixed_string_t mode;
+	cmdline_fixed_string_t proto;
+	cmdline_fixed_string_t hwsw;
 	uint8_t port_id;
 };
 
 static void
-cmd_tx_cksum_set_parsed(void *parsed_result,
+cmd_tx_cksum_parsed(void *parsed_result,
 		       __attribute__((unused)) struct cmdline *cl,
 		       __attribute__((unused)) void *data)
 {
-	struct cmd_tx_cksum_set_result *res = parsed_result;
+	struct cmd_tx_cksum_result *res = parsed_result;
+	int hw = 0;
+	uint16_t ol_flags, mask = 0;
+	struct rte_eth_dev_info dev_info;
+
+	if (port_id_is_invalid(res->port_id)) {
+		printf("invalid port %d\n", res->port_id);
+		return;
+	}
 
-	tx_cksum_set(res->port_id, res->cksum_mask);
+	if (!strcmp(res->mode, "set")) {
+
+		if (!strcmp(res->hwsw, "hw"))
+			hw = 1;
+
+		if (!strcmp(res->proto, "ip")) {
+			mask = TESTPMD_TX_OFFLOAD_IP_CKSUM;
+		} else if (!strcmp(res->proto, "udp")) {
+			mask = TESTPMD_TX_OFFLOAD_UDP_CKSUM;
+		} else if (!strcmp(res->proto, "tcp")) {
+			mask = TESTPMD_TX_OFFLOAD_TCP_CKSUM;
+		} else if (!strcmp(res->proto, "sctp")) {
+			mask = TESTPMD_TX_OFFLOAD_SCTP_CKSUM;
+		} else if (!strcmp(res->proto, "vxlan")) {
+			mask = TESTPMD_TX_OFFLOAD_VXLAN_CKSUM;
+		}
+
+		if (hw)
+			ports[res->port_id].tx_ol_flags |= mask;
+		else
+			ports[res->port_id].tx_ol_flags &= (~mask);
+	}
+
+	ol_flags = ports[res->port_id].tx_ol_flags;
+	printf("IP checksum offload is %s\n",
+		(ol_flags & TESTPMD_TX_OFFLOAD_IP_CKSUM) ? "hw" : "sw");
+	printf("UDP checksum offload is %s\n",
+		(ol_flags & TESTPMD_TX_OFFLOAD_UDP_CKSUM) ? "hw" : "sw");
+	printf("TCP checksum offload is %s\n",
+		(ol_flags & TESTPMD_TX_OFFLOAD_TCP_CKSUM) ? "hw" : "sw");
+	printf("SCTP checksum offload is %s\n",
+		(ol_flags & TESTPMD_TX_OFFLOAD_SCTP_CKSUM) ? "hw" : "sw");
+	printf("VxLAN checksum offload is %s\n",
+		(ol_flags & TESTPMD_TX_OFFLOAD_VXLAN_CKSUM) ? "hw" : "sw");
+
+	/* display warnings if configuration is not supported by the NIC */
+	rte_eth_dev_info_get(res->port_id, &dev_info);
+	if ((ol_flags & TESTPMD_TX_OFFLOAD_IP_CKSUM) &&
+		(dev_info.tx_offload_capa & DEV_TX_OFFLOAD_IPV4_CKSUM) == 0) {
+		printf("Warning: hardware IP checksum enabled but not "
+			"supported by port %d\n", res->port_id);
+	}
+	if ((ol_flags & TESTPMD_TX_OFFLOAD_UDP_CKSUM) &&
+		(dev_info.tx_offload_capa & DEV_TX_OFFLOAD_UDP_CKSUM) == 0) {
+		printf("Warning: hardware UDP checksum enabled but not "
+			"supported by port %d\n", res->port_id);
+	}
+	if ((ol_flags & TESTPMD_TX_OFFLOAD_TCP_CKSUM) &&
+		(dev_info.tx_offload_capa & DEV_TX_OFFLOAD_TCP_CKSUM) == 0) {
+		printf("Warning: hardware TCP checksum enabled but not "
+			"supported by port %d\n", res->port_id);
+	}
+	if ((ol_flags & TESTPMD_TX_OFFLOAD_SCTP_CKSUM) &&
+		(dev_info.tx_offload_capa & DEV_TX_OFFLOAD_SCTP_CKSUM) == 0) {
+		printf("Warning: hardware SCTP checksum enabled but not "
+			"supported by port %d\n", res->port_id);
+	}
 }
 
-cmdline_parse_token_string_t cmd_tx_cksum_set_tx_cksum =
-	TOKEN_STRING_INITIALIZER(struct cmd_tx_cksum_set_result,
+cmdline_parse_token_string_t cmd_tx_cksum_tx_cksum =
+	TOKEN_STRING_INITIALIZER(struct cmd_tx_cksum_result,
 				tx_cksum, "tx_checksum");
-cmdline_parse_token_string_t cmd_tx_cksum_set_set =
-	TOKEN_STRING_INITIALIZER(struct cmd_tx_cksum_set_result,
-				set, "set");
-cmdline_parse_token_num_t cmd_tx_cksum_set_cksum_mask =
-	TOKEN_NUM_INITIALIZER(struct cmd_tx_cksum_set_result,
-				cksum_mask, UINT8);
-cmdline_parse_token_num_t cmd_tx_cksum_set_portid =
-	TOKEN_NUM_INITIALIZER(struct cmd_tx_cksum_set_result,
+cmdline_parse_token_string_t cmd_tx_cksum_mode =
+	TOKEN_STRING_INITIALIZER(struct cmd_tx_cksum_result,
+				mode, "set");
+cmdline_parse_token_string_t cmd_tx_cksum_proto =
+	TOKEN_STRING_INITIALIZER(struct cmd_tx_cksum_result,
+				proto, "ip#tcp#udp#sctp#vxlan");
+cmdline_parse_token_string_t cmd_tx_cksum_hwsw =
+	TOKEN_STRING_INITIALIZER(struct cmd_tx_cksum_result,
+				hwsw, "hw#sw");
+cmdline_parse_token_num_t cmd_tx_cksum_portid =
+	TOKEN_NUM_INITIALIZER(struct cmd_tx_cksum_result,
 				port_id, UINT8);
 
 cmdline_parse_inst_t cmd_tx_cksum_set = {
-	.f = cmd_tx_cksum_set_parsed,
+	.f = cmd_tx_cksum_parsed,
+	.data = NULL,
+	.help_str = "enable/disable hardware calculation of L3/L4 checksum when "
+		"using csum forward engine: tx_cksum set ip|tcp|udp|sctp|vxlan hw|sw <port>",
+	.tokens = {
+		(void *)&cmd_tx_cksum_tx_cksum,
+		(void *)&cmd_tx_cksum_mode,
+		(void *)&cmd_tx_cksum_proto,
+		(void *)&cmd_tx_cksum_hwsw,
+		(void *)&cmd_tx_cksum_portid,
+		NULL,
+	},
+};
+
+cmdline_parse_token_string_t cmd_tx_cksum_mode_show =
+	TOKEN_STRING_INITIALIZER(struct cmd_tx_cksum_result,
+				mode, "show");
+
+cmdline_parse_inst_t cmd_tx_cksum_show = {
+	.f = cmd_tx_cksum_parsed,
 	.data = NULL,
-	.help_str = "enable hardware insertion of L3/L4checksum with a given "
-	"mask in packets sent on a port, the bit mapping is given as, Bit 0 for ip, "
-	"Bit 1 for UDP, Bit 2 for TCP, Bit 3 for SCTP, Bit 4 for inner ip, "
-	"Bit 5 for inner UDP, Bit 6 for inner TCP, Bit 7 for inner SCTP",
+	.help_str = "show checksum offload configuration: tx_cksum show <port>",
 	.tokens = {
-		(void *)&cmd_tx_cksum_set_tx_cksum,
-		(void *)&cmd_tx_cksum_set_set,
-		(void *)&cmd_tx_cksum_set_cksum_mask,
-		(void *)&cmd_tx_cksum_set_portid,
+		(void *)&cmd_tx_cksum_tx_cksum,
+		(void *)&cmd_tx_cksum_mode_show,
+		(void *)&cmd_tx_cksum_portid,
 		NULL,
 	},
 };
@@ -7796,6 +7879,7 @@ cmdline_parse_ctx_t main_ctx[] = {
 	(cmdline_parse_inst_t *)&cmd_tx_vlan_reset,
 	(cmdline_parse_inst_t *)&cmd_tx_vlan_set_pvid,
 	(cmdline_parse_inst_t *)&cmd_tx_cksum_set,
+	(cmdline_parse_inst_t *)&cmd_tx_cksum_show,
 	(cmdline_parse_inst_t *)&cmd_link_flow_control_set,
 	(cmdline_parse_inst_t *)&cmd_link_flow_control_set_rx,
 	(cmdline_parse_inst_t *)&cmd_link_flow_control_set_tx,
diff --git a/app/test-pmd/config.c b/app/test-pmd/config.c
index 34b6fdb..d093227 100644
--- a/app/test-pmd/config.c
+++ b/app/test-pmd/config.c
@@ -32,7 +32,7 @@
  */
 /*   BSD LICENSE
  *
- *   Copyright(c) 2013 6WIND.
+ *   Copyright 2013-2014 6WIND S.A.
  *
  *   Redistribution and use in source and binary forms, with or without
  *   modification, are permitted provided that the following conditions
@@ -1744,17 +1744,6 @@ set_qmap(portid_t port_id, uint8_t is_rx, uint16_t queue_id, uint8_t map_value)
 }
 
 void
-tx_cksum_set(portid_t port_id, uint64_t ol_flags)
-{
-	uint64_t tx_ol_flags;
-	if (port_id_is_invalid(port_id))
-		return;
-	/* Clear last 8 bits and then set L3/4 checksum mask again */
-	tx_ol_flags = ports[port_id].tx_ol_flags & (~0x0FFull);
-	ports[port_id].tx_ol_flags = ((ol_flags & 0xff) | tx_ol_flags);
-}
-
-void
 fdir_add_signature_filter(portid_t port_id, uint8_t queue_id,
 			  struct rte_fdir_filter *fdir_filter)
 {
diff --git a/app/test-pmd/csumonly.c b/app/test-pmd/csumonly.c
index 743094a..4d6f1ee 100644
--- a/app/test-pmd/csumonly.c
+++ b/app/test-pmd/csumonly.c
@@ -2,6 +2,7 @@
  *   BSD LICENSE
  *
  *   Copyright(c) 2010-2014 Intel Corporation. All rights reserved.
+ *   Copyright 2014 6WIND S.A.
  *   All rights reserved.
  *
  *   Redistribution and use in source and binary forms, with or without
@@ -73,13 +74,19 @@
 #include <rte_string_fns.h>
 #include "testpmd.h"
 
-
-
 #define IP_DEFTTL  64   /* from RFC 1340. */
 #define IP_VERSION 0x40
 #define IP_HDRLEN  0x05 /* default IP header length == five 32-bits words. */
 #define IP_VHL_DEF (IP_VERSION | IP_HDRLEN)
 
+/* we cannot use htons() from arpa/inet.h due to name conflicts, and we
+ * cannot use rte_cpu_to_be_16() on a constant in a switch/case */
+#if __BYTE_ORDER == __LITTLE_ENDIAN
+#define _htons(x) ((uint16_t)((((x) & 0x00ffU) << 8) | (((x) & 0xff00U) >> 8)))
+#else
+#define _htons(x) (x)
+#endif
+
 static inline uint16_t
 get_16b_sum(uint16_t *ptr16, uint32_t nr)
 {
@@ -112,7 +119,7 @@ get_ipv4_cksum(struct ipv4_hdr *ipv4_hdr)
 
 
 static inline uint16_t
-get_ipv4_psd_sum (struct ipv4_hdr * ip_hdr)
+get_ipv4_psd_sum(struct ipv4_hdr *ip_hdr)
 {
 	/* Pseudo Header for IPv4/UDP/TCP checksum */
 	union ipv4_psd_header {
@@ -136,7 +143,7 @@ get_ipv4_psd_sum (struct ipv4_hdr * ip_hdr)
 }
 
 static inline uint16_t
-get_ipv6_psd_sum (struct ipv6_hdr * ip_hdr)
+get_ipv6_psd_sum(struct ipv6_hdr *ip_hdr)
 {
 	/* Pseudo Header for IPv6/UDP/TCP checksum */
 	union ipv6_psd_header {
@@ -158,6 +165,15 @@ get_ipv6_psd_sum (struct ipv6_hdr * ip_hdr)
 	return get_16b_sum(psd_hdr.u16_arr, sizeof(psd_hdr));
 }
 
+static uint16_t
+get_psd_sum(void *l3_hdr, uint16_t ethertype)
+{
+	if (ethertype == _htons(ETHER_TYPE_IPv4))
+		return get_ipv4_psd_sum(l3_hdr);
+	else /* assume ethertype == ETHER_TYPE_IPv6 */
+		return get_ipv6_psd_sum(l3_hdr);
+}
+
 static inline uint16_t
 get_ipv4_udptcp_checksum(struct ipv4_hdr *ipv4_hdr, uint16_t *l4_hdr)
 {
@@ -174,7 +190,6 @@ get_ipv4_udptcp_checksum(struct ipv4_hdr *ipv4_hdr, uint16_t *l4_hdr)
 	if (cksum == 0)
 		cksum = 0xffff;
 	return (uint16_t)cksum;
-
 }
 
 static inline uint16_t
@@ -196,48 +211,225 @@ get_ipv6_udptcp_checksum(struct ipv6_hdr *ipv6_hdr, uint16_t *l4_hdr)
 	return (uint16_t)cksum;
 }
 
+static uint16_t
+get_udptcp_checksum(void *l3_hdr, void *l4_hdr, uint16_t ethertype)
+{
+	if (ethertype == _htons(ETHER_TYPE_IPv4))
+		return get_ipv4_udptcp_checksum(l3_hdr, l4_hdr);
+	else /* assume ethertype == ETHER_TYPE_IPv6 */
+		return get_ipv6_udptcp_checksum(l3_hdr, l4_hdr);
+}
 
 /*
- * Forwarding of packets. Change the checksum field with HW or SW methods
- * The HW/SW method selection depends on the ol_flags on every packet
+ * Parse an ethernet header to fill the ethertype, l2_len, l3_len and
+ * ipproto. This function is able to recognize IPv4/IPv6 with one optional vlan
+ * header.
+ */
+static void
+parse_ethernet(struct ether_hdr *eth_hdr, uint16_t *ethertype, uint16_t *l2_len,
+	uint16_t *l3_len, uint8_t *l4_proto)
+{
+	struct ipv4_hdr *ipv4_hdr;
+	struct ipv6_hdr *ipv6_hdr;
+
+	*l2_len = sizeof(struct ether_hdr);
+	*ethertype = eth_hdr->ether_type;
+
+	if (*ethertype == _htons(ETHER_TYPE_VLAN)) {
+		struct vlan_hdr *vlan_hdr = (struct vlan_hdr *)(eth_hdr + 1);
+
+		*l2_len  += sizeof(struct vlan_hdr);
+		*ethertype = vlan_hdr->eth_proto;
+	}
+
+	switch (*ethertype) {
+	case _htons(ETHER_TYPE_IPv4):
+		ipv4_hdr = (struct ipv4_hdr *) ((char *)eth_hdr + *l2_len);
+		*l3_len = (ipv4_hdr->version_ihl & 0x0f) * 4;
+		*l4_proto = ipv4_hdr->next_proto_id;
+		break;
+	case _htons(ETHER_TYPE_IPv6):
+		ipv6_hdr = (struct ipv6_hdr *) ((char *)eth_hdr + *l2_len);
+		*l3_len = sizeof(struct ipv6_hdr) ;
+		*l4_proto = ipv6_hdr->proto;
+		break;
+	default:
+		*l3_len = 0;
+		*l4_proto = 0;
+		break;
+	}
+}
+
+/* modify the IPv4 or IPv6 source address of a packet */
+static void
+change_ip_addresses(void *l3_hdr, uint16_t ethertype)
+{
+	struct ipv4_hdr *ipv4_hdr = l3_hdr;
+	struct ipv6_hdr *ipv6_hdr = l3_hdr;
+
+	if (ethertype == _htons(ETHER_TYPE_IPv4)) {
+		ipv4_hdr->src_addr =
+			rte_cpu_to_be_32(rte_be_to_cpu_32(ipv4_hdr->src_addr) + 1);
+	}
+	else if (ethertype == _htons(ETHER_TYPE_IPv6)) {
+		ipv6_hdr->src_addr[15] = ipv6_hdr->src_addr[15] + 1;
+	}
+}
+
+/* if possible, calculate the checksum of a packet in hw or sw,
+ * depending on the testpmd command line configuration */
+static uint64_t
+process_inner_cksums(void *l3_hdr, uint16_t ethertype, uint16_t l3_len,
+	uint8_t l4_proto, uint16_t testpmd_ol_flags)
+{
+	struct ipv4_hdr *ipv4_hdr = l3_hdr;
+	struct udp_hdr *udp_hdr;
+	struct tcp_hdr *tcp_hdr;
+	struct sctp_hdr *sctp_hdr;
+	uint64_t ol_flags = 0;
+
+	if (ethertype == _htons(ETHER_TYPE_IPv4)) {
+		ipv4_hdr = l3_hdr;
+		ipv4_hdr->hdr_checksum = 0;
+
+		if (testpmd_ol_flags & TESTPMD_TX_OFFLOAD_IP_CKSUM)
+			ol_flags |= PKT_TX_IP_CKSUM;
+		else
+			ipv4_hdr->hdr_checksum = get_ipv4_cksum(ipv4_hdr);
+
+	}
+	else if (ethertype != _htons(ETHER_TYPE_IPv6))
+		return 0; /* packet type not supported nothing to do */
+
+	if (l4_proto == IPPROTO_UDP) {
+		udp_hdr = (struct udp_hdr *)((char *)l3_hdr + l3_len);
+		/* do not recalculate udp cksum if it was 0 */
+		if (udp_hdr->dgram_cksum != 0) {
+			udp_hdr->dgram_cksum = 0;
+			if (testpmd_ol_flags & TESTPMD_TX_OFFLOAD_UDP_CKSUM) {
+				ol_flags |= PKT_TX_UDP_CKSUM;
+				udp_hdr->dgram_cksum = get_psd_sum(l3_hdr,
+					ethertype);
+			}
+			else {
+				udp_hdr->dgram_cksum =
+					get_udptcp_checksum(l3_hdr, udp_hdr,
+						ethertype);
+			}
+		}
+	}
+	else if (l4_proto == IPPROTO_TCP) {
+		tcp_hdr = (struct tcp_hdr *)((char *)l3_hdr + l3_len);
+		tcp_hdr->cksum = 0;
+		if (testpmd_ol_flags & TESTPMD_TX_OFFLOAD_TCP_CKSUM) {
+			ol_flags |= PKT_TX_TCP_CKSUM;
+			tcp_hdr->cksum = get_psd_sum(l3_hdr, ethertype);
+		}
+		else {
+			tcp_hdr->cksum =
+				get_udptcp_checksum(l3_hdr, tcp_hdr, ethertype);
+		}
+	}
+	else if (l4_proto == IPPROTO_SCTP) {
+		sctp_hdr = (struct sctp_hdr *)((char *)l3_hdr + l3_len);
+		sctp_hdr->cksum = 0;
+		/* sctp payload must be a multiple of 4 to be
+		 * offloaded */
+		if ((testpmd_ol_flags & TESTPMD_TX_OFFLOAD_SCTP_CKSUM) &&
+			((rte_be_to_cpu_16(ipv4_hdr->total_length) & 0x3) == 0)) {
+			ol_flags |= PKT_TX_SCTP_CKSUM;
+		}
+		else {
+			/* XXX implement CRC32c, example available in
+			 * RFC3309 */
+		}
+	}
+
+	return ol_flags;
+}
+
+/* Calculate the checksum of outer header (only vxlan is supported,
+ * meaning IP + UDP). The caller already checked that it's a vxlan
+ * packet */
+static uint64_t
+process_outer_cksums(void *outer_l3_hdr, uint16_t outer_ethertype,
+	uint16_t outer_l3_len, uint16_t testpmd_ol_flags)
+{
+	struct ipv4_hdr *ipv4_hdr = outer_l3_hdr;
+	struct ipv6_hdr *ipv6_hdr = outer_l3_hdr;
+	struct udp_hdr *udp_hdr;
+	uint64_t ol_flags = 0;
+
+	if (testpmd_ol_flags & TESTPMD_TX_OFFLOAD_VXLAN_CKSUM)
+		ol_flags |= PKT_TX_VXLAN_CKSUM;
+
+	if (outer_ethertype == _htons(ETHER_TYPE_IPv4)) {
+		ipv4_hdr->hdr_checksum = 0;
+
+		if ((testpmd_ol_flags & TESTPMD_TX_OFFLOAD_VXLAN_CKSUM) == 0)
+			ipv4_hdr->hdr_checksum = get_ipv4_cksum(ipv4_hdr);
+	}
+
+	udp_hdr = (struct udp_hdr *)((char *)outer_l3_hdr + outer_l3_len);
+	/* do not recalculate udp cksum if it was 0 */
+	if (udp_hdr->dgram_cksum != 0) {
+		udp_hdr->dgram_cksum = 0;
+		if ((testpmd_ol_flags & TESTPMD_TX_OFFLOAD_VXLAN_CKSUM) == 0) {
+			if (outer_ethertype == _htons(ETHER_TYPE_IPv4))
+				udp_hdr->dgram_cksum =
+					get_ipv4_udptcp_checksum(ipv4_hdr,
+						(uint16_t *)udp_hdr);
+			else
+				udp_hdr->dgram_cksum =
+					get_ipv6_udptcp_checksum(ipv6_hdr,
+						(uint16_t *)udp_hdr);
+		}
+	}
+
+	return ol_flags;
+}
+
+/*
+ * Receive a burst of packets, and for each packet:
+ *  - parse packet, and try to recognize a supported packet type (1)
+ *  - if it's not a supported packet type, don't touch the packet, else:
+ *  - modify the IPs in inner headers and in outer headers if any
+ *  - reprocess the checksum of all supported layers. This is done in SW
+ *    or HW, depending on testpmd command line configuration
+ * Then transmit packets on the output port.
+ *
+ * (1) Supported packets are:
+ *   Ether / (vlan) / IP|IP6 / UDP|TCP|SCTP .
+ *   Ether / (vlan) / outer IP|IP6 / outer UDP / VxLAN / Ether / IP|IP6 /
+ *           UDP|TCP|SCTP
+ *
+ * The testpmd command line for this forward engine sets the flags
+ * TESTPMD_TX_OFFLOAD_* in ports[tx_port].tx_ol_flags. They control
+ * whether a checksum must be calculated in software or in hardware. The
+ * IP, UDP, TCP and SCTP flags always concern the inner layer.  The
+ * VxLAN flag concerns the outer IP and UDP layer (if packet is
+ * recognized as a vxlan packet).
  */
 static void
 pkt_burst_checksum_forward(struct fwd_stream *fs)
 {
-	struct rte_mbuf  *pkts_burst[MAX_PKT_BURST];
-	struct rte_port  *txp;
-	struct rte_mbuf  *mb;
+	struct rte_mbuf *pkts_burst[MAX_PKT_BURST];
+	struct rte_port *txp;
+	struct rte_mbuf *m;
 	struct ether_hdr *eth_hdr;
-	struct ipv4_hdr  *ipv4_hdr;
-	struct ether_hdr *inner_eth_hdr;
-	struct ipv4_hdr  *inner_ipv4_hdr = NULL;
-	struct ipv6_hdr  *ipv6_hdr;
-	struct ipv6_hdr  *inner_ipv6_hdr = NULL;
-	struct udp_hdr   *udp_hdr;
-	struct udp_hdr   *inner_udp_hdr;
-	struct tcp_hdr   *tcp_hdr;
-	struct tcp_hdr   *inner_tcp_hdr;
-	struct sctp_hdr  *sctp_hdr;
-	struct sctp_hdr  *inner_sctp_hdr;
-
+	void *l3_hdr = NULL, *outer_l3_hdr = NULL; /* can be IPv4 or IPv6 */
+	struct udp_hdr *udp_hdr;
 	uint16_t nb_rx;
 	uint16_t nb_tx;
 	uint16_t i;
 	uint64_t ol_flags;
-	uint64_t pkt_ol_flags;
-	uint64_t tx_ol_flags;
-	uint16_t l4_proto;
-	uint16_t inner_l4_proto = 0;
-	uint16_t eth_type;
-	uint8_t  l2_len;
-	uint8_t  l3_len;
-	uint8_t  inner_l2_len = 0;
-	uint8_t  inner_l3_len = 0;
-
+	uint16_t testpmd_ol_flags;
+	uint8_t l4_proto;
+	uint16_t ethertype = 0, outer_ethertype = 0;
+	uint16_t  l2_len = 0, l3_len = 0, outer_l2_len = 0, outer_l3_len = 0;
+	int tunnel = 0;
 	uint32_t rx_bad_ip_csum;
 	uint32_t rx_bad_l4_csum;
-	uint8_t  ipv4_tunnel;
-	uint8_t  ipv6_tunnel;
 
 #ifdef RTE_TEST_PMD_RECORD_CORE_CYCLES
 	uint64_t start_tsc;
@@ -249,9 +441,7 @@ pkt_burst_checksum_forward(struct fwd_stream *fs)
 	start_tsc = rte_rdtsc();
 #endif
 
-	/*
-	 * Receive a burst of packets and forward them.
-	 */
+	/* receive a burst of packets */
 	nb_rx = rte_eth_rx_burst(fs->rx_port, fs->rx_queue, pkts_burst,
 				 nb_pkt_per_burst);
 	if (unlikely(nb_rx == 0))
@@ -265,348 +455,107 @@ pkt_burst_checksum_forward(struct fwd_stream *fs)
 	rx_bad_l4_csum = 0;
 
 	txp = &ports[fs->tx_port];
-	tx_ol_flags = txp->tx_ol_flags;
+	testpmd_ol_flags = txp->tx_ol_flags;
 
 	for (i = 0; i < nb_rx; i++) {
 
-		mb = pkts_burst[i];
-		l2_len  = sizeof(struct ether_hdr);
-		pkt_ol_flags = mb->ol_flags;
-		ol_flags = (pkt_ol_flags & (~PKT_TX_L4_MASK));
-		ipv4_tunnel = (pkt_ol_flags & PKT_RX_TUNNEL_IPV4_HDR) ?
-				1 : 0;
-		ipv6_tunnel = (pkt_ol_flags & PKT_RX_TUNNEL_IPV6_HDR) ?
-				1 : 0;
-		eth_hdr = rte_pktmbuf_mtod(mb, struct ether_hdr *);
-		eth_type = rte_be_to_cpu_16(eth_hdr->ether_type);
-		if (eth_type == ETHER_TYPE_VLAN) {
-			/* Only allow single VLAN label here */
-			l2_len  += sizeof(struct vlan_hdr);
-			 eth_type = rte_be_to_cpu_16(*(uint16_t *)
-				((uintptr_t)&eth_hdr->ether_type +
-				sizeof(struct vlan_hdr)));
+		ol_flags = 0;
+		tunnel = 0;
+		m = pkts_burst[i];
+
+		/* Update the L3/L4 checksum error packet statistics */
+		rx_bad_ip_csum += ((m->ol_flags & PKT_RX_IP_CKSUM_BAD) != 0);
+		rx_bad_l4_csum += ((m->ol_flags & PKT_RX_L4_CKSUM_BAD) != 0);
+
+		/* step 1: dissect packet, parsing optional vlan, ip4/ip6, vxlan
+		 * and inner headers */
+
+		eth_hdr = rte_pktmbuf_mtod(m, struct ether_hdr *);
+		parse_ethernet(eth_hdr, &ethertype, &l2_len, &l3_len, &l4_proto);
+		l3_hdr = (char *)eth_hdr + l2_len;
+
+		/* check if it's a supported tunnel (only vxlan for now) */
+		if (l4_proto == IPPROTO_UDP) {
+			udp_hdr = (struct udp_hdr *)((char *)l3_hdr + l3_len);
+
+			/* currently, this flag is set by i40e only if the
+			 * packet is vxlan */
+			if (((m->ol_flags & PKT_RX_TUNNEL_IPV4_HDR) ||
+					(m->ol_flags & PKT_RX_TUNNEL_IPV6_HDR)))
+				tunnel = 1;
+			/* else check udp destination port, 4789 is the default
+			 * vxlan port (rfc7348) */
+			else if (udp_hdr->dst_port == _htons(4789))
+				tunnel = 1;
+
+			if (tunnel == 1) {
+				outer_ethertype = ethertype;
+				outer_l2_len = l2_len;
+				outer_l3_len = l3_len;
+				outer_l3_hdr = l3_hdr;
+
+				eth_hdr = (struct ether_hdr *)((char *)udp_hdr +
+					sizeof(struct udp_hdr) +
+					sizeof(struct vxlan_hdr));
+
+				parse_ethernet(eth_hdr, &ethertype, &l2_len,
+					&l3_len, &l4_proto);
+				l3_hdr = (char *)eth_hdr + l2_len;
+			}
 		}
 
-		/* Update the L3/L4 checksum error packet count  */
-		rx_bad_ip_csum += (uint16_t) ((pkt_ol_flags & PKT_RX_IP_CKSUM_BAD) != 0);
-		rx_bad_l4_csum += (uint16_t) ((pkt_ol_flags & PKT_RX_L4_CKSUM_BAD) != 0);
-
-		/*
-		 * Try to figure out L3 packet type by SW.
-		 */
-		if ((pkt_ol_flags & (PKT_RX_IPV4_HDR | PKT_RX_IPV4_HDR_EXT |
-				PKT_RX_IPV6_HDR | PKT_RX_IPV6_HDR_EXT)) == 0) {
-			if (eth_type == ETHER_TYPE_IPv4)
-				pkt_ol_flags |= PKT_RX_IPV4_HDR;
-			else if (eth_type == ETHER_TYPE_IPv6)
-				pkt_ol_flags |= PKT_RX_IPV6_HDR;
-		}
+		/* step 2: change all source IPs (v4 or v6) so we need
+		 * to recompute the chksums even if they were correct */
 
-		/*
-		 * Simplify the protocol parsing
-		 * Assuming the incoming packets format as
-		 *      Ethernet2 + optional single VLAN
-		 *      + ipv4 or ipv6
-		 *      + udp or tcp or sctp or others
-		 */
-		if (pkt_ol_flags & (PKT_RX_IPV4_HDR | PKT_RX_TUNNEL_IPV4_HDR)) {
+		change_ip_addresses(l3_hdr, ethertype);
+		if (tunnel == 1)
+			change_ip_addresses(outer_l3_hdr, outer_ethertype);
 
-			/* Do not support ipv4 option field */
-			l3_len = sizeof(struct ipv4_hdr) ;
+		/* step 3: depending on user command line configuration,
+		 * recompute checksum either in software or flag the
+		 * mbuf to offload the calculation to the NIC */
 
-			ipv4_hdr = (struct ipv4_hdr *) (rte_pktmbuf_mtod(mb,
-					unsigned char *) + l2_len);
+		/* process checksums of inner headers first */
+		ol_flags |= process_inner_cksums(l3_hdr, ethertype,
+			l3_len, l4_proto, testpmd_ol_flags);
 
-			l4_proto = ipv4_hdr->next_proto_id;
+		/* Then process outer headers if any. Note that the software
+		 * checksum will be wrong if one of the inner checksums is
+		 * processed in hardware. */
+		if (tunnel == 1) {
+			ol_flags |= process_outer_cksums(outer_l3_hdr,
+				outer_ethertype, outer_l3_len, testpmd_ol_flags);
+		}
 
-			/* Do not delete, this is required by HW*/
-			ipv4_hdr->hdr_checksum = 0;
+		/* step 4: fill the mbuf meta data (flags and header lengths) */
 
-			if (tx_ol_flags & TESTPMD_TX_OFFLOAD_IP_CKSUM) {
-				/* HW checksum */
-				ol_flags |= PKT_TX_IP_CKSUM;
+		if (tunnel == 1) {
+			if (testpmd_ol_flags & TESTPMD_TX_OFFLOAD_VXLAN_CKSUM) {
+				m->l2_len = outer_l2_len;
+				m->l3_len = outer_l3_len;
+				m->inner_l2_len = l2_len;
+				m->inner_l3_len = l3_len;
 			}
 			else {
-				ol_flags |= PKT_TX_IPV4;
-				/* SW checksum calculation */
-				ipv4_hdr->src_addr++;
-				ipv4_hdr->hdr_checksum = get_ipv4_cksum(ipv4_hdr);
+				/* if we don't do vxlan cksum in hw,
+				   outer checksum will be wrong because
+				   we changed the ip, but it shows that
+				   we can process the inner header cksum
+				   in the nic */
+				m->l2_len = outer_l2_len + outer_l3_len +
+					sizeof(struct udp_hdr) +
+					sizeof(struct vxlan_hdr) + l2_len;
+				m->l3_len = l3_len;
 			}
-
-			if (l4_proto == IPPROTO_UDP) {
-				udp_hdr = (struct udp_hdr*) (rte_pktmbuf_mtod(mb,
-						unsigned char *) + l2_len + l3_len);
-				if (tx_ol_flags & TESTPMD_TX_OFFLOAD_UDP_CKSUM) {
-					/* HW Offload */
-					ol_flags |= PKT_TX_UDP_CKSUM;
-					if (ipv4_tunnel)
-						udp_hdr->dgram_cksum = 0;
-					else
-						/* Pseudo header sum need be set properly */
-						udp_hdr->dgram_cksum =
-							get_ipv4_psd_sum(ipv4_hdr);
-				}
-				else {
-					/* SW Implementation, clear checksum field first */
-					udp_hdr->dgram_cksum = 0;
-					udp_hdr->dgram_cksum = get_ipv4_udptcp_checksum(ipv4_hdr,
-									(uint16_t *)udp_hdr);
-				}
-
-				if (ipv4_tunnel) {
-
-					uint16_t len;
-
-					/* Check if inner L3/L4 checkum flag is set */
-					if (tx_ol_flags & TESTPMD_TX_OFFLOAD_INNER_CKSUM_MASK)
-						ol_flags |= PKT_TX_VXLAN_CKSUM;
-
-					inner_l2_len  = sizeof(struct ether_hdr);
-					inner_eth_hdr = (struct ether_hdr *) (rte_pktmbuf_mtod(mb,
-								unsigned char *) + l2_len + l3_len
-								 + ETHER_VXLAN_HLEN);
-
-					eth_type = rte_be_to_cpu_16(inner_eth_hdr->ether_type);
-					if (eth_type == ETHER_TYPE_VLAN) {
-						inner_l2_len += sizeof(struct vlan_hdr);
-						eth_type = rte_be_to_cpu_16(*(uint16_t *)
-							((uintptr_t)&eth_hdr->ether_type +
-								sizeof(struct vlan_hdr)));
-					}
-
-					len = l2_len + l3_len + ETHER_VXLAN_HLEN + inner_l2_len;
-					if (eth_type == ETHER_TYPE_IPv4) {
-						inner_l3_len = sizeof(struct ipv4_hdr);
-						inner_ipv4_hdr = (struct ipv4_hdr *) (rte_pktmbuf_mtod(mb,
-								unsigned char *) + len);
-						inner_l4_proto = inner_ipv4_hdr->next_proto_id;
-
-						if (tx_ol_flags & TESTPMD_TX_OFFLOAD_INNER_IP_CKSUM) {
-
-							/* Do not delete, this is required by HW*/
-							inner_ipv4_hdr->hdr_checksum = 0;
-							ol_flags |= PKT_TX_IPV4_CSUM;
-						}
-
-					} else if (eth_type == ETHER_TYPE_IPv6) {
-						inner_l3_len = sizeof(struct ipv6_hdr);
-						inner_ipv6_hdr = (struct ipv6_hdr *) (rte_pktmbuf_mtod(mb,
-								unsigned char *) + len);
-						inner_l4_proto = inner_ipv6_hdr->proto;
-					}
-					if ((inner_l4_proto == IPPROTO_UDP) &&
-						(tx_ol_flags & TESTPMD_TX_OFFLOAD_INNER_UDP_CKSUM)) {
-
-						/* HW Offload */
-						ol_flags |= PKT_TX_UDP_CKSUM;
-						inner_udp_hdr = (struct udp_hdr *) (rte_pktmbuf_mtod(mb,
-								unsigned char *) + len + inner_l3_len);
-						if (eth_type == ETHER_TYPE_IPv4)
-							inner_udp_hdr->dgram_cksum = get_ipv4_psd_sum(inner_ipv4_hdr);
-						else if (eth_type == ETHER_TYPE_IPv6)
-							inner_udp_hdr->dgram_cksum = get_ipv6_psd_sum(inner_ipv6_hdr);
-
-					} else if ((inner_l4_proto == IPPROTO_TCP) &&
-						(tx_ol_flags & TESTPMD_TX_OFFLOAD_INNER_TCP_CKSUM)) {
-						/* HW Offload */
-						ol_flags |= PKT_TX_TCP_CKSUM;
-						inner_tcp_hdr = (struct tcp_hdr *) (rte_pktmbuf_mtod(mb,
-								unsigned char *) + len + inner_l3_len);
-						if (eth_type == ETHER_TYPE_IPv4)
-							inner_tcp_hdr->cksum = get_ipv4_psd_sum(inner_ipv4_hdr);
-						else if (eth_type == ETHER_TYPE_IPv6)
-							inner_tcp_hdr->cksum = get_ipv6_psd_sum(inner_ipv6_hdr);
-					} else if ((inner_l4_proto == IPPROTO_SCTP) &&
-						(tx_ol_flags & TESTPMD_TX_OFFLOAD_INNER_SCTP_CKSUM)) {
-						/* HW Offload */
-						ol_flags |= PKT_TX_SCTP_CKSUM;
-						inner_sctp_hdr = (struct sctp_hdr *) (rte_pktmbuf_mtod(mb,
-								unsigned char *) + len + inner_l3_len);
-						inner_sctp_hdr->cksum = 0;
-					}
-
-				}
-
-			} else if (l4_proto == IPPROTO_TCP) {
-				tcp_hdr = (struct tcp_hdr*) (rte_pktmbuf_mtod(mb,
-						unsigned char *) + l2_len + l3_len);
-				if (tx_ol_flags & TESTPMD_TX_OFFLOAD_TCP_CKSUM) {
-					ol_flags |= PKT_TX_TCP_CKSUM;
-					tcp_hdr->cksum = get_ipv4_psd_sum(ipv4_hdr);
-				}
-				else {
-					tcp_hdr->cksum = 0;
-					tcp_hdr->cksum = get_ipv4_udptcp_checksum(ipv4_hdr,
-							(uint16_t*)tcp_hdr);
-				}
-			} else if (l4_proto == IPPROTO_SCTP) {
-				sctp_hdr = (struct sctp_hdr*) (rte_pktmbuf_mtod(mb,
-						unsigned char *) + l2_len + l3_len);
-
-				if (tx_ol_flags & TESTPMD_TX_OFFLOAD_SCTP_CKSUM) {
-					ol_flags |= PKT_TX_SCTP_CKSUM;
-					sctp_hdr->cksum = 0;
-
-					/* Sanity check, only number of 4 bytes supported */
-					if ((rte_be_to_cpu_16(ipv4_hdr->total_length) % 4) != 0)
-						printf("sctp payload must be a multiple "
-							"of 4 bytes for checksum offload");
-				}
-				else {
-					sctp_hdr->cksum = 0;
-					/* CRC32c sample code available in RFC3309 */
-				}
-			}
-			/* End of L4 Handling*/
-		} else if (pkt_ol_flags & (PKT_RX_IPV6_HDR | PKT_RX_TUNNEL_IPV6_HDR)) {
-			ipv6_hdr = (struct ipv6_hdr *) (rte_pktmbuf_mtod(mb,
-					unsigned char *) + l2_len);
-			l3_len = sizeof(struct ipv6_hdr) ;
-			l4_proto = ipv6_hdr->proto;
-			ol_flags |= PKT_TX_IPV6;
-
-			if (l4_proto == IPPROTO_UDP) {
-				udp_hdr = (struct udp_hdr*) (rte_pktmbuf_mtod(mb,
-						unsigned char *) + l2_len + l3_len);
-				if (tx_ol_flags & TESTPMD_TX_OFFLOAD_UDP_CKSUM) {
-					/* HW Offload */
-					ol_flags |= PKT_TX_UDP_CKSUM;
-					if (ipv6_tunnel)
-						udp_hdr->dgram_cksum = 0;
-					else
-						udp_hdr->dgram_cksum =
-							get_ipv6_psd_sum(ipv6_hdr);
-				}
-				else {
-					/* SW Implementation */
-					/* checksum field need be clear first */
-					udp_hdr->dgram_cksum = 0;
-					udp_hdr->dgram_cksum = get_ipv6_udptcp_checksum(ipv6_hdr,
-								(uint16_t *)udp_hdr);
-				}
-
-				if (ipv6_tunnel) {
-
-					uint16_t len;
-
-					/* Check if inner L3/L4 checksum flag is set */
-					if (tx_ol_flags & TESTPMD_TX_OFFLOAD_INNER_CKSUM_MASK)
-						ol_flags |= PKT_TX_VXLAN_CKSUM;
-
-					inner_l2_len  = sizeof(struct ether_hdr);
-					inner_eth_hdr = (struct ether_hdr *) (rte_pktmbuf_mtod(mb,
-						unsigned char *) + l2_len + l3_len + ETHER_VXLAN_HLEN);
-					eth_type = rte_be_to_cpu_16(inner_eth_hdr->ether_type);
-
-					if (eth_type == ETHER_TYPE_VLAN) {
-						inner_l2_len += sizeof(struct vlan_hdr);
-						eth_type = rte_be_to_cpu_16(*(uint16_t *)
-							((uintptr_t)&eth_hdr->ether_type +
-							sizeof(struct vlan_hdr)));
-					}
-
-					len = l2_len + l3_len + ETHER_VXLAN_HLEN + inner_l2_len;
-
-					if (eth_type == ETHER_TYPE_IPv4) {
-						inner_l3_len = sizeof(struct ipv4_hdr);
-						inner_ipv4_hdr = (struct ipv4_hdr *) (rte_pktmbuf_mtod(mb,
-								unsigned char *) + len);
-						inner_l4_proto = inner_ipv4_hdr->next_proto_id;
-
-						/* HW offload */
-						if (tx_ol_flags & TESTPMD_TX_OFFLOAD_INNER_IP_CKSUM) {
-
-							/* Do not delete, this is required by HW*/
-							inner_ipv4_hdr->hdr_checksum = 0;
-							ol_flags |= PKT_TX_IPV4_CSUM;
-						}
-					} else if (eth_type == ETHER_TYPE_IPv6) {
-						inner_l3_len = sizeof(struct ipv6_hdr);
-						inner_ipv6_hdr = (struct ipv6_hdr *) (rte_pktmbuf_mtod(mb,
-							unsigned char *) + len);
-						inner_l4_proto = inner_ipv6_hdr->proto;
-					}
-
-					if ((inner_l4_proto == IPPROTO_UDP) &&
-						(tx_ol_flags & TESTPMD_TX_OFFLOAD_INNER_UDP_CKSUM)) {
-						inner_udp_hdr = (struct udp_hdr *) (rte_pktmbuf_mtod(mb,
-							unsigned char *) + len + inner_l3_len);
-						/* HW offload */
-						ol_flags |= PKT_TX_UDP_CKSUM;
-						inner_udp_hdr->dgram_cksum = 0;
-						if (eth_type == ETHER_TYPE_IPv4)
-							inner_udp_hdr->dgram_cksum = get_ipv4_psd_sum(inner_ipv4_hdr);
-						else if (eth_type == ETHER_TYPE_IPv6)
-							inner_udp_hdr->dgram_cksum = get_ipv6_psd_sum(inner_ipv6_hdr);
-					} else if ((inner_l4_proto == IPPROTO_TCP) &&
-						(tx_ol_flags & TESTPMD_TX_OFFLOAD_INNER_TCP_CKSUM)) {
-						/* HW offload */
-						ol_flags |= PKT_TX_TCP_CKSUM;
-						inner_tcp_hdr = (struct tcp_hdr *) (rte_pktmbuf_mtod(mb,
-								unsigned char *) + len + inner_l3_len);
-
-						if (eth_type == ETHER_TYPE_IPv4)
-							inner_tcp_hdr->cksum = get_ipv4_psd_sum(inner_ipv4_hdr);
-						else if (eth_type == ETHER_TYPE_IPv6)
-							inner_tcp_hdr->cksum = get_ipv6_psd_sum(inner_ipv6_hdr);
-
-					} else if ((inner_l4_proto == IPPROTO_SCTP) &&
-						(tx_ol_flags & TESTPMD_TX_OFFLOAD_INNER_SCTP_CKSUM)) {
-						/* HW offload */
-						ol_flags |= PKT_TX_SCTP_CKSUM;
-						inner_sctp_hdr = (struct sctp_hdr *) (rte_pktmbuf_mtod(mb,
-								unsigned char *) + len + inner_l3_len);
-						inner_sctp_hdr->cksum = 0;
-					}
-
-				}
-
-			}
-			else if (l4_proto == IPPROTO_TCP) {
-				tcp_hdr = (struct tcp_hdr*) (rte_pktmbuf_mtod(mb,
-						unsigned char *) + l2_len + l3_len);
-				if (tx_ol_flags & TESTPMD_TX_OFFLOAD_TCP_CKSUM) {
-					ol_flags |= PKT_TX_TCP_CKSUM;
-					tcp_hdr->cksum = get_ipv6_psd_sum(ipv6_hdr);
-				}
-				else {
-					tcp_hdr->cksum = 0;
-					tcp_hdr->cksum = get_ipv6_udptcp_checksum(ipv6_hdr,
-							(uint16_t*)tcp_hdr);
-				}
-			}
-			else if (l4_proto == IPPROTO_SCTP) {
-				sctp_hdr = (struct sctp_hdr*) (rte_pktmbuf_mtod(mb,
-						unsigned char *) + l2_len + l3_len);
-
-				if (tx_ol_flags & TESTPMD_TX_OFFLOAD_SCTP_CKSUM) {
-					ol_flags |= PKT_TX_SCTP_CKSUM;
-					sctp_hdr->cksum = 0;
-					/* Sanity check, only number of 4 bytes supported by HW */
-					if ((rte_be_to_cpu_16(ipv6_hdr->payload_len) % 4) != 0)
-						printf("sctp payload must be a multiple "
-							"of 4 bytes for checksum offload");
-				}
-				else {
-					/* CRC32c sample code available in RFC3309 */
-					sctp_hdr->cksum = 0;
-				}
-			} else {
-				printf("Test flow control for 1G PMD \n");
-			}
-			/* End of L6 Handling*/
-		}
-		else {
-			l3_len = 0;
-			printf("Unhandled packet type: %#hx\n", eth_type);
+		} else {
+			/* this is only useful if an offload flag is
+			 * set, but it does not hurt to fill it in any
+			 * case */
+			m->l2_len = l2_len;
+			m->l3_len = l3_len;
 		}
+		m->ol_flags = ol_flags;
 
-		/* Combine the packet header write. VLAN is not consider here */
-		mb->l2_len = l2_len;
-		mb->l3_len = l3_len;
-		mb->inner_l2_len = inner_l2_len;
-		mb->inner_l3_len = inner_l3_len;
-		mb->ol_flags = ol_flags;
 	}
 	nb_tx = rte_eth_tx_burst(fs->tx_port, fs->tx_queue, pkts_burst, nb_rx);
 	fs->tx_packets += nb_tx;
@@ -629,7 +578,6 @@ pkt_burst_checksum_forward(struct fwd_stream *fs)
 #endif
 }
 
-
 struct fwd_engine csum_fwd_engine = {
 	.fwd_mode_name  = "csum",
 	.port_fwd_begin = NULL,
diff --git a/app/test-pmd/testpmd.h b/app/test-pmd/testpmd.h
index 82af2bd..c753d37 100644
--- a/app/test-pmd/testpmd.h
+++ b/app/test-pmd/testpmd.h
@@ -131,18 +131,11 @@ struct fwd_stream {
 #define TESTPMD_TX_OFFLOAD_TCP_CKSUM         0x0004
 /** Offload SCTP checksum in csum forward engine */
 #define TESTPMD_TX_OFFLOAD_SCTP_CKSUM        0x0008
-/** Offload inner IP checksum in csum forward engine */
-#define TESTPMD_TX_OFFLOAD_INNER_IP_CKSUM    0x0010
-/** Offload inner UDP checksum in csum forward engine */
-#define TESTPMD_TX_OFFLOAD_INNER_UDP_CKSUM   0x0020
-/** Offload inner TCP checksum in csum forward engine */
-#define TESTPMD_TX_OFFLOAD_INNER_TCP_CKSUM   0x0040
-/** Offload inner SCTP checksum in csum forward engine */
-#define TESTPMD_TX_OFFLOAD_INNER_SCTP_CKSUM  0x0080
-/** Offload inner IP checksum mask */
-#define TESTPMD_TX_OFFLOAD_INNER_CKSUM_MASK  0x00F0
+/** Offload VxLAN checksum in csum forward engine */
+#define TESTPMD_TX_OFFLOAD_VXLAN_CKSUM       0x0010
 /** Insert VLAN header in forward engine */
-#define TESTPMD_TX_OFFLOAD_INSERT_VLAN       0x0100
+#define TESTPMD_TX_OFFLOAD_INSERT_VLAN       0x0020
+
 /**
  * The data structure associated with each port.
  */
@@ -510,8 +503,6 @@ void tx_vlan_pvid_set(portid_t port_id, uint16_t vlan_id, int on);
 
 void set_qmap(portid_t port_id, uint8_t is_rx, uint16_t queue_id, uint8_t map_value);
 
-void tx_cksum_set(portid_t port_id, uint64_t ol_flags);
-
 void set_verbose_level(uint16_t vb_level);
 void set_tx_pkt_segments(unsigned *seg_lengths, unsigned nb_segs);
 void set_nb_pkt_per_burst(uint16_t pkt_burst);
-- 
2.1.0

* [dpdk-dev] [PATCH v3 09/13] mbuf: introduce new checksum API
  2014-11-20 22:58   ` [dpdk-dev] [PATCH v3 00/13] add TSO support Olivier Matz
                       ` (7 preceding siblings ...)
  2014-11-20 22:58     ` [dpdk-dev] [PATCH v3 08/13] testpmd: rework csum forward engine Olivier Matz
@ 2014-11-20 22:58     ` Olivier Matz
  2014-11-20 22:58     ` [dpdk-dev] [PATCH v3 10/13] mbuf: generic support for TCP segmentation offload Olivier Matz
                       ` (4 subsequent siblings)
  13 siblings, 0 replies; 112+ messages in thread
From: Olivier Matz @ 2014-11-20 22:58 UTC (permalink / raw)
  To: dev; +Cc: jigsaw

Introduce new functions to calculate checksums. These new functions
are derived from the ones provided in csumonly.c, but slightly reworked.
There is still some room for future optimization of these functions
(maybe SSE/AVX, ...).

This API will be modified in the next commits by the introduction of
TSO, which requires a different pseudo header checksum to be set in the
packet.
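
As an illustration of what these helpers compute, here is a minimal
standalone sketch of the Internet checksum (RFC 1071) and of the IPv4
pseudo header sum. It is not the code added to rte_ip.h by this patch:
the sketch_* names are hypothetical, addresses are passed as raw byte
arrays instead of struct ipv4_hdr, and results are returned in host byte
order. These are assumptions made to keep the example portable.

```c
#include <stddef.h>
#include <stdint.h>

/* Add a buffer to a running sum, reading it as big-endian 16-bit words.
 * The 32-bit accumulator cannot overflow for packet-sized inputs
 * (up to 64 KB). All sketch_* names are hypothetical. */
static uint32_t
sketch_raw_sum(const void *buf, size_t len, uint32_t sum)
{
	const uint8_t *p = buf;

	while (len > 1) {
		sum += ((uint32_t)p[0] << 8) | p[1];
		p += 2;
		len -= 2;
	}
	if (len == 1)
		sum += (uint32_t)p[0] << 8; /* odd trailing byte, zero padded */
	return sum;
}

/* Fold the carries back into the low 16 bits (one's complement add). */
static uint16_t
sketch_fold(uint32_t sum)
{
	while (sum >> 16)
		sum = (sum & 0xffff) + (sum >> 16);
	return (uint16_t)sum;
}

/* RFC 1071 checksum of a buffer, e.g. an IPv4 header whose checksum
 * field has been set to 0 beforehand. */
static uint16_t
sketch_cksum(const void *buf, size_t len)
{
	return (uint16_t)~sketch_fold(sketch_raw_sum(buf, len, 0));
}

/* Non-complemented sum of the IPv4 pseudo header: source address,
 * destination address, zero byte, protocol and L4 length. This is the
 * kind of value written in the L4 checksum field before handing the
 * packet to the hardware for checksum offload. */
static uint16_t
sketch_ipv4_phdr_sum(const uint8_t src[4], const uint8_t dst[4],
	uint8_t proto, uint16_t l4_len)
{
	uint32_t sum = 0;

	sum = sketch_raw_sum(src, 4, sum);
	sum = sketch_raw_sum(dst, 4, sum);
	sum += proto;
	sum += l4_len;
	return sketch_fold(sum);
}

/* Full software UDP/TCP checksum: pseudo header plus L4 header and
 * payload. The checksum field inside l4 must be 0 on entry. */
static uint16_t
sketch_ipv4_l4_cksum(const uint8_t src[4], const uint8_t dst[4],
	uint8_t proto, const void *l4, uint16_t l4_len)
{
	uint32_t sum = sketch_ipv4_phdr_sum(src, dst, proto, l4_len);
	uint16_t cksum;

	sum = sketch_raw_sum(l4, l4_len, sum);
	cksum = (uint16_t)~sketch_fold(sum);
	/* 0 means "no checksum" for UDP, so send 0xffff instead */
	return cksum == 0 ? 0xffff : cksum;
}
```

For the 20-byte header 45 00 00 73 00 00 40 00 40 11 00 00 c0 a8 00 01
c0 a8 00 c7 (checksum field zeroed), sketch_cksum() gives 0xb861.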

Signed-off-by: Olivier Matz <olivier.matz@6wind.com>
Acked-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
---
 app/test-pmd/csumonly.c    | 133 ++------------------------------
 lib/librte_mbuf/rte_mbuf.h |   3 +-
 lib/librte_net/rte_ip.h    | 183 +++++++++++++++++++++++++++++++++++++++++++++
 3 files changed, 193 insertions(+), 126 deletions(-)

diff --git a/app/test-pmd/csumonly.c b/app/test-pmd/csumonly.c
index 4d6f1ee..37d4129 100644
--- a/app/test-pmd/csumonly.c
+++ b/app/test-pmd/csumonly.c
@@ -87,137 +87,22 @@
 #define _htons(x) (x)
 #endif
 
-static inline uint16_t
-get_16b_sum(uint16_t *ptr16, uint32_t nr)
-{
-	uint32_t sum = 0;
-	while (nr > 1)
-	{
-		sum +=*ptr16;
-		nr -= sizeof(uint16_t);
-		ptr16++;
-		if (sum > UINT16_MAX)
-			sum -= UINT16_MAX;
-	}
-
-	/* If length is in odd bytes */
-	if (nr)
-		sum += *((uint8_t*)ptr16);
-
-	sum = ((sum & 0xffff0000) >> 16) + (sum & 0xffff);
-	sum &= 0x0ffff;
-	return (uint16_t)sum;
-}
-
-static inline uint16_t
-get_ipv4_cksum(struct ipv4_hdr *ipv4_hdr)
-{
-	uint16_t cksum;
-	cksum = get_16b_sum((uint16_t*)ipv4_hdr, sizeof(struct ipv4_hdr));
-	return (uint16_t)((cksum == 0xffff)?cksum:~cksum);
-}
-
-
-static inline uint16_t
-get_ipv4_psd_sum(struct ipv4_hdr *ip_hdr)
-{
-	/* Pseudo Header for IPv4/UDP/TCP checksum */
-	union ipv4_psd_header {
-		struct {
-			uint32_t src_addr; /* IP address of source host. */
-			uint32_t dst_addr; /* IP address of destination host(s). */
-			uint8_t  zero;     /* zero. */
-			uint8_t  proto;    /* L4 protocol type. */
-			uint16_t len;      /* L4 length. */
-		} __attribute__((__packed__));
-		uint16_t u16_arr[0];
-	} psd_hdr;
-
-	psd_hdr.src_addr = ip_hdr->src_addr;
-	psd_hdr.dst_addr = ip_hdr->dst_addr;
-	psd_hdr.zero     = 0;
-	psd_hdr.proto    = ip_hdr->next_proto_id;
-	psd_hdr.len      = rte_cpu_to_be_16((uint16_t)(rte_be_to_cpu_16(ip_hdr->total_length)
-				- sizeof(struct ipv4_hdr)));
-	return get_16b_sum(psd_hdr.u16_arr, sizeof(psd_hdr));
-}
-
-static inline uint16_t
-get_ipv6_psd_sum(struct ipv6_hdr *ip_hdr)
-{
-	/* Pseudo Header for IPv6/UDP/TCP checksum */
-	union ipv6_psd_header {
-		struct {
-			uint8_t src_addr[16]; /* IP address of source host. */
-			uint8_t dst_addr[16]; /* IP address of destination host(s). */
-			uint32_t len;         /* L4 length. */
-			uint32_t proto;       /* L4 protocol - top 3 bytes must be zero */
-		} __attribute__((__packed__));
-
-		uint16_t u16_arr[0]; /* allow use as 16-bit values with safe aliasing */
-	} psd_hdr;
-
-	rte_memcpy(&psd_hdr.src_addr, ip_hdr->src_addr,
-			sizeof(ip_hdr->src_addr) + sizeof(ip_hdr->dst_addr));
-	psd_hdr.len       = ip_hdr->payload_len;
-	psd_hdr.proto     = (ip_hdr->proto << 24);
-
-	return get_16b_sum(psd_hdr.u16_arr, sizeof(psd_hdr));
-}
-
 static uint16_t
 get_psd_sum(void *l3_hdr, uint16_t ethertype)
 {
 	if (ethertype == _htons(ETHER_TYPE_IPv4))
-		return get_ipv4_psd_sum(l3_hdr);
+		return rte_ipv4_phdr_cksum(l3_hdr);
 	else /* assume ethertype == ETHER_TYPE_IPv6 */
-		return get_ipv6_psd_sum(l3_hdr);
-}
-
-static inline uint16_t
-get_ipv4_udptcp_checksum(struct ipv4_hdr *ipv4_hdr, uint16_t *l4_hdr)
-{
-	uint32_t cksum;
-	uint32_t l4_len;
-
-	l4_len = rte_be_to_cpu_16(ipv4_hdr->total_length) - sizeof(struct ipv4_hdr);
-
-	cksum = get_16b_sum(l4_hdr, l4_len);
-	cksum += get_ipv4_psd_sum(ipv4_hdr);
-
-	cksum = ((cksum & 0xffff0000) >> 16) + (cksum & 0xffff);
-	cksum = (~cksum) & 0xffff;
-	if (cksum == 0)
-		cksum = 0xffff;
-	return (uint16_t)cksum;
-}
-
-static inline uint16_t
-get_ipv6_udptcp_checksum(struct ipv6_hdr *ipv6_hdr, uint16_t *l4_hdr)
-{
-	uint32_t cksum;
-	uint32_t l4_len;
-
-	l4_len = rte_be_to_cpu_16(ipv6_hdr->payload_len);
-
-	cksum = get_16b_sum(l4_hdr, l4_len);
-	cksum += get_ipv6_psd_sum(ipv6_hdr);
-
-	cksum = ((cksum & 0xffff0000) >> 16) + (cksum & 0xffff);
-	cksum = (~cksum) & 0xffff;
-	if (cksum == 0)
-		cksum = 0xffff;
-
-	return (uint16_t)cksum;
+		return rte_ipv6_phdr_cksum(l3_hdr);
 }
 
 static uint16_t
 get_udptcp_checksum(void *l3_hdr, void *l4_hdr, uint16_t ethertype)
 {
 	if (ethertype == _htons(ETHER_TYPE_IPv4))
-		return get_ipv4_udptcp_checksum(l3_hdr, l4_hdr);
+		return rte_ipv4_udptcp_cksum(l3_hdr, l4_hdr);
 	else /* assume ethertype == ETHER_TYPE_IPv6 */
-		return get_ipv6_udptcp_checksum(l3_hdr, l4_hdr);
+		return rte_ipv6_udptcp_cksum(l3_hdr, l4_hdr);
 }
 
 /*
@@ -295,7 +180,7 @@ process_inner_cksums(void *l3_hdr, uint16_t ethertype, uint16_t l3_len,
 		if (testpmd_ol_flags & TESTPMD_TX_OFFLOAD_IP_CKSUM)
 			ol_flags |= PKT_TX_IP_CKSUM;
 		else
-			ipv4_hdr->hdr_checksum = get_ipv4_cksum(ipv4_hdr);
+			ipv4_hdr->hdr_checksum = rte_ipv4_cksum(ipv4_hdr);
 
 	}
 	else if (ethertype != _htons(ETHER_TYPE_IPv6))
@@ -367,7 +252,7 @@ process_outer_cksums(void *outer_l3_hdr, uint16_t outer_ethertype,
 		ipv4_hdr->hdr_checksum = 0;
 
 		if ((testpmd_ol_flags & TESTPMD_TX_OFFLOAD_VXLAN_CKSUM) == 0)
-			ipv4_hdr->hdr_checksum = get_ipv4_cksum(ipv4_hdr);
+			ipv4_hdr->hdr_checksum = rte_ipv4_cksum(ipv4_hdr);
 	}
 
 	udp_hdr = (struct udp_hdr *)((char *)outer_l3_hdr + outer_l3_len);
@@ -377,12 +262,10 @@ process_outer_cksums(void *outer_l3_hdr, uint16_t outer_ethertype,
 		if ((testpmd_ol_flags & TESTPMD_TX_OFFLOAD_VXLAN_CKSUM) == 0) {
 			if (outer_ethertype == _htons(ETHER_TYPE_IPv4))
 				udp_hdr->dgram_cksum =
-					get_ipv4_udptcp_checksum(ipv4_hdr,
-						(uint16_t *)udp_hdr);
+					rte_ipv4_udptcp_cksum(ipv4_hdr, udp_hdr);
 			else
 				udp_hdr->dgram_cksum =
-					get_ipv6_udptcp_checksum(ipv6_hdr,
-						(uint16_t *)udp_hdr);
+					rte_ipv6_udptcp_cksum(ipv6_hdr, udp_hdr);
 		}
 	}
 
diff --git a/lib/librte_mbuf/rte_mbuf.h b/lib/librte_mbuf/rte_mbuf.h
index acc0385..10ddd93 100644
--- a/lib/librte_mbuf/rte_mbuf.h
+++ b/lib/librte_mbuf/rte_mbuf.h
@@ -109,7 +109,8 @@ extern "C" {
  *  - fill l2_len and l3_len in mbuf
  *  - set the flags PKT_TX_TCP_CKSUM, PKT_TX_SCTP_CKSUM or PKT_TX_UDP_CKSUM
  *  - calculate the pseudo header checksum and set it in the L4 header (only
- *    for TCP or UDP). For SCTP, set the crc field to 0.
+ *    for TCP or UDP). See rte_ipv4_phdr_cksum() and rte_ipv6_phdr_cksum().
+ *    For SCTP, set the crc field to 0.
  */
 #define PKT_TX_L4_NO_CKSUM   (0ULL << 52) /* Disable L4 cksum of TX pkt. */
 #define PKT_TX_TCP_CKSUM     (1ULL << 52) /**< TCP cksum of TX pkt. computed by NIC. */
diff --git a/lib/librte_net/rte_ip.h b/lib/librte_net/rte_ip.h
index e3f65c1..387b06c 100644
--- a/lib/librte_net/rte_ip.h
+++ b/lib/librte_net/rte_ip.h
@@ -2,6 +2,7 @@
  *   BSD LICENSE
  *
  *   Copyright(c) 2010-2014 Intel Corporation. All rights reserved.
+ *   Copyright 2014 6WIND S.A.
  *   All rights reserved.
  *
  *   Redistribution and use in source and binary forms, with or without
@@ -78,6 +79,9 @@
 
 #include <stdint.h>
 
+#include <rte_memcpy.h>
+#include <rte_byteorder.h>
+
 #ifdef __cplusplus
 extern "C" {
 #endif
@@ -247,6 +251,127 @@ struct ipv4_hdr {
 	((x) >= IPV4_MIN_MCAST && (x) <= IPV4_MAX_MCAST) /**< check if IPv4 address is multicast */
 
 /**
+ * Process the non-complemented checksum of a buffer.
+ *
+ * @param buf
+ *   Pointer to the buffer.
+ * @param len
+ *   Length of the buffer.
+ * @return
+ *   The non-complemented checksum.
+ */
+static inline uint16_t
+rte_raw_cksum(const char *buf, size_t len)
+{
+	const uint16_t *u16 = (const uint16_t *)buf;
+	uint32_t sum = 0;
+
+	while (len >= (sizeof(*u16) * 4)) {
+		sum += u16[0];
+		sum += u16[1];
+		sum += u16[2];
+		sum += u16[3];
+		len -= sizeof(*u16) * 4;
+		u16 += 4;
+	}
+	while (len >= sizeof(*u16)) {
+		sum += *u16;
+		len -= sizeof(*u16);
+		u16 += 1;
+	}
+
+	/* if length is in odd bytes */
+	if (len == 1)
+		sum += *((const uint8_t *)u16);
+
+	sum = ((sum & 0xffff0000) >> 16) + (sum & 0xffff);
+	sum = ((sum & 0xffff0000) >> 16) + (sum & 0xffff);
+	return (uint16_t)sum;
+}
+
+/**
+ * Process the IPv4 checksum of an IPv4 header.
+ *
+ * The checksum field must be set to 0 by the caller.
+ *
+ * @param ipv4_hdr
+ *   The pointer to the contiguous IPv4 header.
+ * @return
+ *   The complemented checksum to set in the IP packet.
+ */
+static inline uint16_t
+rte_ipv4_cksum(const struct ipv4_hdr *ipv4_hdr)
+{
+	uint16_t cksum;
+	cksum = rte_raw_cksum((const char *)ipv4_hdr, sizeof(struct ipv4_hdr));
+	return ((cksum == 0xffff) ? cksum : ~cksum);
+}
+
+/**
+ * Process the pseudo-header checksum of an IPv4 header.
+ *
+ * The checksum field must be set to 0 by the caller.
+ *
+ * @param ipv4_hdr
+ *   The pointer to the contiguous IPv4 header.
+ * @return
+ *   The non-complemented checksum to set in the L4 header.
+ */
+static inline uint16_t
+rte_ipv4_phdr_cksum(const struct ipv4_hdr *ipv4_hdr)
+{
+	struct ipv4_psd_header {
+		uint32_t src_addr; /* IP address of source host. */
+		uint32_t dst_addr; /* IP address of destination host. */
+		uint8_t  zero;     /* zero. */
+		uint8_t  proto;    /* L4 protocol type. */
+		uint16_t len;      /* L4 length. */
+	} psd_hdr;
+
+	psd_hdr.src_addr = ipv4_hdr->src_addr;
+	psd_hdr.dst_addr = ipv4_hdr->dst_addr;
+	psd_hdr.zero = 0;
+	psd_hdr.proto = ipv4_hdr->next_proto_id;
+	psd_hdr.len = rte_cpu_to_be_16(
+		(uint16_t)(rte_be_to_cpu_16(ipv4_hdr->total_length)
+			- sizeof(struct ipv4_hdr)));
+	return rte_raw_cksum((const char *)&psd_hdr, sizeof(psd_hdr));
+}
+
+/**
+ * Process the IPv4 UDP or TCP checksum.
+ *
+ * The IPv4 header should not contain options. The IP and layer 4
+ * checksum must be set to 0 in the packet by the caller.
+ *
+ * @param ipv4_hdr
+ *   The pointer to the contiguous IPv4 header.
+ * @param l4_hdr
+ *   The pointer to the beginning of the L4 header.
+ * @return
+ *   The complemented checksum to set in the IP packet.
+ */
+static inline uint16_t
+rte_ipv4_udptcp_cksum(const struct ipv4_hdr *ipv4_hdr, const void *l4_hdr)
+{
+	uint32_t cksum;
+	uint32_t l4_len;
+
+	l4_len = rte_be_to_cpu_16(ipv4_hdr->total_length) -
+		sizeof(struct ipv4_hdr);
+
+	cksum = rte_raw_cksum(l4_hdr, l4_len);
+	cksum += rte_ipv4_phdr_cksum(ipv4_hdr);
+
+	cksum = ((cksum & 0xffff0000) >> 16) + (cksum & 0xffff);
+	cksum = (~cksum) & 0xffff;
+	if (cksum == 0)
+		cksum = 0xffff;
+
+	return cksum;
+}
+
+/**
  * IPv6 Header
  */
 struct ipv6_hdr {
@@ -258,6 +383,64 @@ struct ipv6_hdr {
 	uint8_t  dst_addr[16]; /**< IP address of destination host(s). */
 } __attribute__((__packed__));
 
+/**
+ * Process the pseudo-header checksum of an IPv6 header.
+ *
+ * @param ipv6_hdr
+ *   The pointer to the contiguous IPv6 header.
+ * @return
+ *   The non-complemented checksum to set in the L4 header.
+ */
+static inline uint16_t
+rte_ipv6_phdr_cksum(const struct ipv6_hdr *ipv6_hdr)
+{
+	struct ipv6_psd_header {
+		uint8_t src_addr[16]; /* IP address of source host. */
+		uint8_t dst_addr[16]; /* IP address of destination host. */
+		uint32_t len;         /* L4 length. */
+		uint32_t proto;       /* L4 protocol - top 3 bytes must be zero */
+	} psd_hdr;
+
+	rte_memcpy(&psd_hdr.src_addr, ipv6_hdr->src_addr,
+		sizeof(ipv6_hdr->src_addr) + sizeof(ipv6_hdr->dst_addr));
+	psd_hdr.proto = (ipv6_hdr->proto << 24);
+	psd_hdr.len = ipv6_hdr->payload_len;
+
+	return rte_raw_cksum((const char *)&psd_hdr, sizeof(psd_hdr));
+}
+
+/**
+ * Process the IPv6 UDP or TCP checksum.
+ *
+ * The IPv6 header must not be followed by extension headers. The layer 4 checksum
+ * must be set to 0 in the packet by the caller.
+ *
+ * @param ipv6_hdr
+ *   The pointer to the contiguous IPv6 header.
+ * @param l4_hdr
+ *   The pointer to the beginning of the L4 header.
+ * @return
+ *   The complemented checksum to set in the IP packet.
+ */
+static inline uint16_t
+rte_ipv6_udptcp_cksum(const struct ipv6_hdr *ipv6_hdr, const void *l4_hdr)
+{
+	uint32_t cksum;
+	uint32_t l4_len;
+
+	l4_len = rte_be_to_cpu_16(ipv6_hdr->payload_len);
+
+	cksum = rte_raw_cksum(l4_hdr, l4_len);
+	cksum += rte_ipv6_phdr_cksum(ipv6_hdr, 0);
+
+	cksum = ((cksum & 0xffff0000) >> 16) + (cksum & 0xffff);
+	cksum = (~cksum) & 0xffff;
+	if (cksum == 0)
+		cksum = 0xffff;
+
+	return cksum;
+}
+
 #ifdef __cplusplus
 }
 #endif
-- 
2.1.0

* [dpdk-dev] [PATCH v3 10/13] mbuf: generic support for TCP segmentation offload
  2014-11-20 22:58   ` [dpdk-dev] [PATCH v3 00/13] add TSO support Olivier Matz
                       ` (8 preceding siblings ...)
  2014-11-20 22:58     ` [dpdk-dev] [PATCH v3 09/13] mbuf: introduce new checksum API Olivier Matz
@ 2014-11-20 22:58     ` Olivier Matz
  2014-11-20 22:58     ` [dpdk-dev] [PATCH v3 11/13] ixgbe: support " Olivier Matz
                       ` (3 subsequent siblings)
  13 siblings, 0 replies; 112+ messages in thread
From: Olivier Matz @ 2014-11-20 22:58 UTC (permalink / raw)
  To: dev; +Cc: jigsaw

Some of the NICs supported by DPDK can accelerate TCP traffic by using
segmentation offload. The application prepares a packet with a valid TCP
header and a size of up to 64K, and delegates the segmentation to the
NIC.

Implement the generic part of TCP segmentation offload in rte_mbuf. It
introduces 2 new fields in rte_mbuf: l4_len (length of L4 header in bytes)
and tso_segsz (MSS of packets).
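
As a rough illustration of the new layout, here is a minimal standalone
sketch of the offload fields. The union name and the helper are
hypothetical and not part of the patch; only the field names and widths
are taken from the rte_mbuf.h hunk below. It relies on 64-bit bit-fields
and an anonymous struct, which gcc and clang accept.

```c
#include <stdint.h>

/* Hypothetical stand-in for the TX offload part of struct rte_mbuf.
 * Field names and widths follow the patch; the type name does not. */
union tx_offload_sketch {
	uint64_t tx_offload;           /* all fields as one 64-bit word */
	struct {
		uint64_t l2_len:7;     /* L2 (MAC) header length */
		uint64_t l3_len:9;     /* L3 (IP) header length */
		uint64_t l4_len:8;     /* L4 (TCP/UDP) header length */
		uint64_t tso_segsz:16; /* MSS of the segments to emit */
		uint64_t inner_l3_len:9;
		uint64_t inner_l2_len:7;
	};
};

/* Hypothetical helper: describe a plain Ethernet/IPv4/TCP packet
 * whose IP and TCP headers carry no options. */
static inline void
tx_offload_fill(union tx_offload_sketch *o, uint16_t mss)
{
	o->tx_offload = 0; /* a single store resets every field */
	o->l2_len = 14;    /* Ethernet header */
	o->l3_len = 20;    /* IPv4 header without options */
	o->l4_len = 20;    /* TCP header without options */
	o->tso_segsz = mss;
}
```

Grouping the fields under one 64-bit word is what lets the reset and
attach paths in the patch copy or clear them with a single assignment.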

To delegate the TCP segmentation to the hardware, the user has to:

- set the PKT_TX_TCP_SEG flag in mbuf->ol_flags (this flag implies
  PKT_TX_TCP_CKSUM)
- set PKT_TX_IP_CKSUM if it's IPv4, and set the IP checksum to 0 in
  the packet
- fill the mbuf offload information: l2_len, l3_len, l4_len, tso_segsz
- calculate the pseudo header checksum without taking ip_len into account,
  and set it in the TCP header, for instance by using
  rte_ipv4_phdr_cksum(ip_hdr, ol_flags)
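
The last step above can be sketched without any DPDK dependency. The
code below is only an illustration of the idea, not the code from this
patch: raw_cksum() and ipv4_phdr_cksum() are hypothetical names, the
addresses are passed as raw bytes in network order, and a plain "tso"
argument stands in for testing PKT_TX_TCP_SEG in ol_flags.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Non-complemented 16-bit one's-complement sum over a buffer, in the
 * spirit of rte_raw_cksum() (simplified, no loop unrolling). */
static uint16_t
raw_cksum(const void *buf, size_t len)
{
	const uint8_t *p = buf;
	uint32_t sum = 0;
	uint16_t w;

	while (len >= 2) {
		memcpy(&w, p, 2); /* avoids unaligned 16-bit loads */
		sum += w;
		p += 2;
		len -= 2;
	}
	if (len == 1) {
		/* odd trailing byte, padded with a zero byte */
		w = 0;
		memcpy(&w, p, 1);
		sum += w;
	}
	/* fold the carries back into the low 16 bits */
	sum = (sum >> 16) + (sum & 0xffff);
	sum = (sum >> 16) + (sum & 0xffff);
	return (uint16_t)sum;
}

/* IPv4 pseudo-header checksum. When tso is non-zero the L4 length is
 * left at 0, as described for PKT_TX_TCP_SEG in this patch. */
static uint16_t
ipv4_phdr_cksum(const uint8_t src[4], const uint8_t dst[4],
		uint8_t proto, uint16_t l4_len, int tso)
{
	uint8_t psd[12];

	if (tso)
		l4_len = 0;
	memcpy(&psd[0], src, 4);
	memcpy(&psd[4], dst, 4);
	psd[8] = 0;
	psd[9] = proto;
	psd[10] = (uint8_t)(l4_len >> 8); /* big-endian length */
	psd[11] = (uint8_t)(l4_len & 0xff);
	return raw_cksum(psd, sizeof(psd));
}
```

With tso set, the result no longer depends on the packet length, so
every segment produced from the packet can start from the same value.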

The API is inspired by the ixgbe hardware (the next commit adds the
support for ixgbe), but it seems generic enough to be used for other
hw/drivers in the future.

This commit also reworks the way l2_len and l3_len are used in igb
and ixgbe drivers as the l2_l3_len is not available anymore in mbuf.
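
For illustration, the 16-bit value that the drivers rebuild from l2_len
and l3_len could also be written with explicit shifts. This is a sketch
under an assumption, not the code of the patch: the patch uses a local
union with bit-fields, whose layout is implementation-defined, while the
hypothetical helper below assumes l3_len sits in the low 9 bits and
l2_len in the 7 bits above it.

```c
#include <stdint.h>

/* Assumption: l3_len occupies bits 0-8, l2_len bits 9-15. */
#define SK_MACLEN_SHIFT 9

/* Hypothetical helper: combine both header lengths in one 16-bit word. */
static inline uint16_t
pack_l2_l3_len(uint8_t l2_len, uint16_t l3_len)
{
	return (uint16_t)(((l2_len & 0x7f) << SK_MACLEN_SHIFT) |
			  (l3_len & 0x1ff));
}
```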

Signed-off-by: Mirek Walukiewicz <miroslaw.walukiewicz@intel.com>
Signed-off-by: Olivier Matz <olivier.matz@6wind.com>
Acked-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
---
 app/test-pmd/testpmd.c            |  2 +-
 examples/ipv4_multicast/main.c    |  2 +-
 lib/librte_mbuf/rte_mbuf.c        |  1 +
 lib/librte_mbuf/rte_mbuf.h        | 45 +++++++++++++++++++++++----------------
 lib/librte_net/rte_ip.h           | 39 +++++++++++++++++++++++++++------
 lib/librte_pmd_e1000/igb_rxtx.c   | 11 +++++++++-
 lib/librte_pmd_ixgbe/ixgbe_rxtx.c | 11 +++++++++-
 7 files changed, 82 insertions(+), 29 deletions(-)

diff --git a/app/test-pmd/testpmd.c b/app/test-pmd/testpmd.c
index 12adafa..632a993 100644
--- a/app/test-pmd/testpmd.c
+++ b/app/test-pmd/testpmd.c
@@ -408,7 +408,7 @@ testpmd_mbuf_ctor(struct rte_mempool *mp,
 	mb->ol_flags     = 0;
 	mb->data_off     = RTE_PKTMBUF_HEADROOM;
 	mb->nb_segs      = 1;
-	mb->l2_l3_len       = 0;
+	mb->tx_offload   = 0;
 	mb->vlan_tci     = 0;
 	mb->hash.rss     = 0;
 }
diff --git a/examples/ipv4_multicast/main.c b/examples/ipv4_multicast/main.c
index 590d11a..80c5140 100644
--- a/examples/ipv4_multicast/main.c
+++ b/examples/ipv4_multicast/main.c
@@ -302,7 +302,7 @@ mcast_out_pkt(struct rte_mbuf *pkt, int use_clone)
 	/* copy metadata from source packet*/
 	hdr->port = pkt->port;
 	hdr->vlan_tci = pkt->vlan_tci;
-	hdr->l2_l3_len = pkt->l2_l3_len;
+	hdr->tx_offload = pkt->tx_offload;
 	hdr->hash = pkt->hash;
 
 	hdr->ol_flags = pkt->ol_flags;
diff --git a/lib/librte_mbuf/rte_mbuf.c b/lib/librte_mbuf/rte_mbuf.c
index 9b57b3a..87c2963 100644
--- a/lib/librte_mbuf/rte_mbuf.c
+++ b/lib/librte_mbuf/rte_mbuf.c
@@ -241,6 +241,7 @@ const char *rte_get_tx_ol_flag_name(uint64_t mask)
 	case PKT_TX_UDP_CKSUM: return "PKT_TX_UDP_CKSUM";
 	case PKT_TX_IEEE1588_TMST: return "PKT_TX_IEEE1588_TMST";
 	case PKT_TX_VXLAN_CKSUM: return "PKT_TX_VXLAN_CKSUM";
+	case PKT_TX_TCP_SEG: return "PKT_TX_TCP_SEG";
 	default: return NULL;
 	}
 }
diff --git a/lib/librte_mbuf/rte_mbuf.h b/lib/librte_mbuf/rte_mbuf.h
index 10ddd93..bc6c363 100644
--- a/lib/librte_mbuf/rte_mbuf.h
+++ b/lib/librte_mbuf/rte_mbuf.h
@@ -2,6 +2,7 @@
  *   BSD LICENSE
  *
  *   Copyright(c) 2010-2014 Intel Corporation. All rights reserved.
+ *   Copyright 2014 6WIND S.A.
  *   All rights reserved.
  *
  *   Redistribution and use in source and binary forms, with or without
@@ -125,6 +126,20 @@ extern "C" {
 
 #define PKT_TX_VLAN_PKT      (1ULL << 55) /**< TX packet is a 802.1q VLAN packet. */
 
+/**
+ * TCP segmentation offload. To enable this offload feature for a
+ * packet to be transmitted on hardware supporting TSO:
+ *  - set the PKT_TX_TCP_SEG flag in mbuf->ol_flags (this flag implies
+ *    PKT_TX_TCP_CKSUM)
+ *  - if it's IPv4, set the PKT_TX_IP_CKSUM flag and write the IP checksum
+ *    to 0 in the packet
+ *  - fill the mbuf offload information: l2_len, l3_len, l4_len, tso_segsz
+ *  - calculate the pseudo header checksum without taking ip_len into account,
+ *    and set it in the TCP header. Refer to rte_ipv4_phdr_cksum() and
+ *    rte_ipv6_phdr_cksum() that can be used as helpers.
+ */
+#define PKT_TX_TCP_SEG       (1ULL << 49)
+
 /* Use final bit of flags to indicate a control mbuf */
 #define CTRL_MBUF_FLAG       (1ULL << 63) /**< Mbuf contains control data */
 
@@ -226,22 +241,18 @@ struct rte_mbuf {
 
 	/* fields to support TX offloads */
 	union {
-		uint16_t l2_l3_len; /**< combined l2/l3 lengths as single var */
+		uint64_t tx_offload;       /**< combined for easy fetch */
 		struct {
-			uint16_t l3_len:9;      /**< L3 (IP) Header Length. */
-			uint16_t l2_len:7;      /**< L2 (MAC) Header Length. */
-		};
-	};
+			uint64_t l2_len:7; /**< L2 (MAC) Header Length. */
+			uint64_t l3_len:9; /**< L3 (IP) Header Length. */
+			uint64_t l4_len:8; /**< L4 (TCP/UDP) Header Length. */
+			uint64_t tso_segsz:16; /**< TCP TSO segment size */
 
-	/* fields for TX offloading of tunnels */
-	union {
-		uint16_t inner_l2_l3_len;
-		/**< combined inner l2/l3 lengths as single var */
-		struct {
-			uint16_t inner_l3_len:9;
-			/**< inner L3 (IP) Header Length. */
-			uint16_t inner_l2_len:7;
-			/**< inner L2 (MAC) Header Length. */
+			/* fields for TX offloading of tunnels */
+			uint64_t inner_l3_len:9; /**< inner L3 (IP) Hdr Length. */
+			uint64_t inner_l2_len:7; /**< inner L2 (MAC) Hdr Length. */
+
+			/* uint64_t unused:8; */
 		};
 	};
 } __rte_cache_aligned;
@@ -593,8 +604,7 @@ static inline void rte_pktmbuf_reset(struct rte_mbuf *m)
 {
 	m->next = NULL;
 	m->pkt_len = 0;
-	m->l2_l3_len = 0;
-	m->inner_l2_l3_len = 0;
+	m->tx_offload = 0;
 	m->vlan_tci = 0;
 	m->nb_segs = 1;
 	m->port = 0xff;
@@ -663,8 +673,7 @@ static inline void rte_pktmbuf_attach(struct rte_mbuf *mi, struct rte_mbuf *md)
 	mi->data_len = md->data_len;
 	mi->port = md->port;
 	mi->vlan_tci = md->vlan_tci;
-	mi->l2_l3_len = md->l2_l3_len;
-	mi->inner_l2_l3_len = md->inner_l2_l3_len;
+	mi->tx_offload = md->tx_offload;
 	mi->hash = md->hash;
 
 	mi->next = NULL;
diff --git a/lib/librte_net/rte_ip.h b/lib/librte_net/rte_ip.h
index 387b06c..20c3ae1 100644
--- a/lib/librte_net/rte_ip.h
+++ b/lib/librte_net/rte_ip.h
@@ -81,6 +81,7 @@
 
 #include <rte_memcpy.h>
 #include <rte_byteorder.h>
+#include <rte_mbuf.h>
 
 #ifdef __cplusplus
 extern "C" {
@@ -312,13 +313,21 @@ rte_ipv4_cksum(const struct ipv4_hdr *ipv4_hdr)
  *
  * The checksum field must be set to 0 by the caller.
  *
+ * Depending on the ol_flags, the pseudo-header checksum expected by the
+ * drivers is not the same. For instance, when TSO is enabled, the IP
+ * payload length must not be included in the pseudo-header checksum.
+ *
+ * When ol_flags is 0, it computes the standard pseudo-header checksum.
+ *
  * @param ipv4_hdr
  *   The pointer to the contiguous IPv4 header.
+ * @param ol_flags
+ *   The ol_flags of the associated mbuf.
  * @return
  *   The non-complemented checksum to set in the L4 header.
  */
 static inline uint16_t
-rte_ipv4_phdr_cksum(const struct ipv4_hdr *ipv4_hdr)
+rte_ipv4_phdr_cksum(const struct ipv4_hdr *ipv4_hdr, uint64_t ol_flags)
 {
 	struct ipv4_psd_header {
 		uint32_t src_addr; /* IP address of source host. */
@@ -332,9 +341,13 @@ rte_ipv4_phdr_cksum(const struct ipv4_hdr *ipv4_hdr)
 	psd_hdr.dst_addr = ipv4_hdr->dst_addr;
 	psd_hdr.zero = 0;
 	psd_hdr.proto = ipv4_hdr->next_proto_id;
-	psd_hdr.len = rte_cpu_to_be_16(
-		(uint16_t)(rte_be_to_cpu_16(ipv4_hdr->total_length)
-			- sizeof(struct ipv4_hdr)));
+	if (ol_flags & PKT_TX_TCP_SEG) {
+		psd_hdr.len = 0;
+	} else {
+		psd_hdr.len = rte_cpu_to_be_16(
+			(uint16_t)(rte_be_to_cpu_16(ipv4_hdr->total_length)
+				- sizeof(struct ipv4_hdr)));
+	}
 	return rte_raw_cksum((const char *)&psd_hdr, sizeof(psd_hdr));
 }
 
@@ -361,7 +374,7 @@ rte_ipv4_udptcp_cksum(const struct ipv4_hdr *ipv4_hdr, const void *l4_hdr)
 		sizeof(struct ipv4_hdr);
 
 	cksum = rte_raw_cksum(l4_hdr, l4_len);
-	cksum += rte_ipv4_phdr_cksum(ipv4_hdr);
+	cksum += rte_ipv4_phdr_cksum(ipv4_hdr, 0);
 
 	cksum = ((cksum & 0xffff0000) >> 16) + (cksum & 0xffff);
 	cksum = (~cksum) & 0xffff;
@@ -386,13 +399,21 @@ struct ipv6_hdr {
 /**
  * Process the pseudo-header checksum of an IPv6 header.
  *
+ * Depending on the ol_flags, the pseudo-header checksum expected by the
+ * drivers is not the same. For instance, when TSO is enabled, the IPv6
+ * payload length must not be included in the pseudo-header checksum.
+ *
+ * When ol_flags is 0, it computes the standard pseudo-header checksum.
+ *
  * @param ipv6_hdr
  *   The pointer to the contiguous IPv6 header.
+ * @param ol_flags
+ *   The ol_flags of the associated mbuf.
  * @return
  *   The non-complemented checksum to set in the L4 header.
  */
 static inline uint16_t
-rte_ipv6_phdr_cksum(const struct ipv6_hdr *ipv6_hdr)
+rte_ipv6_phdr_cksum(const struct ipv6_hdr *ipv6_hdr, uint64_t ol_flags)
 {
 	struct ipv6_psd_header {
 		uint8_t src_addr[16]; /* IP address of source host. */
@@ -404,7 +425,11 @@ rte_ipv6_phdr_cksum(const struct ipv6_hdr *ipv6_hdr)
 	rte_memcpy(&psd_hdr.src_addr, ipv6_hdr->src_addr,
 		sizeof(ipv6_hdr->src_addr) + sizeof(ipv6_hdr->dst_addr));
 	psd_hdr.proto = (ipv6_hdr->proto << 24);
-	psd_hdr.len = ipv6_hdr->payload_len;
+	if (ol_flags & PKT_TX_TCP_SEG) {
+		psd_hdr.len = 0;
+	} else {
+		psd_hdr.len = ipv6_hdr->payload_len;
+	}
 
 	return rte_raw_cksum((const char *)&psd_hdr, sizeof(psd_hdr));
 }
diff --git a/lib/librte_pmd_e1000/igb_rxtx.c b/lib/librte_pmd_e1000/igb_rxtx.c
index 433c616..848d5d1 100644
--- a/lib/librte_pmd_e1000/igb_rxtx.c
+++ b/lib/librte_pmd_e1000/igb_rxtx.c
@@ -367,6 +367,13 @@ eth_igb_xmit_pkts(void *tx_queue, struct rte_mbuf **tx_pkts,
 	struct rte_mbuf     *tx_pkt;
 	struct rte_mbuf     *m_seg;
 	union igb_vlan_macip vlan_macip_lens;
+	union {
+		uint16_t u16;
+		struct {
+			uint16_t l3_len:9;
+			uint16_t l2_len:7;
+		};
+	} l2_l3_len;
 	uint64_t buf_dma_addr;
 	uint32_t olinfo_status;
 	uint32_t cmd_type_len;
@@ -404,8 +411,10 @@ eth_igb_xmit_pkts(void *tx_queue, struct rte_mbuf **tx_pkts,
 		tx_last = (uint16_t) (tx_id + tx_pkt->nb_segs - 1);
 
 		ol_flags = tx_pkt->ol_flags;
+		l2_l3_len.l2_len = tx_pkt->l2_len;
+		l2_l3_len.l3_len = tx_pkt->l3_len;
 		vlan_macip_lens.f.vlan_tci = tx_pkt->vlan_tci;
-		vlan_macip_lens.f.l2_l3_len = tx_pkt->l2_l3_len;
+		vlan_macip_lens.f.l2_l3_len = l2_l3_len.u16;
 		tx_ol_req = ol_flags & IGB_TX_OFFLOAD_MASK;
 
 		/* If a Context Descriptor need be built . */
diff --git a/lib/librte_pmd_ixgbe/ixgbe_rxtx.c b/lib/librte_pmd_ixgbe/ixgbe_rxtx.c
index ca35db2..2df3385 100644
--- a/lib/librte_pmd_ixgbe/ixgbe_rxtx.c
+++ b/lib/librte_pmd_ixgbe/ixgbe_rxtx.c
@@ -546,6 +546,13 @@ ixgbe_xmit_pkts(void *tx_queue, struct rte_mbuf **tx_pkts,
 	struct rte_mbuf     *tx_pkt;
 	struct rte_mbuf     *m_seg;
 	union ixgbe_vlan_macip vlan_macip_lens;
+	union {
+		uint16_t u16;
+		struct {
+			uint16_t l3_len:9;
+			uint16_t l2_len:7;
+		};
+	} l2_l3_len;
 	uint64_t buf_dma_addr;
 	uint32_t olinfo_status;
 	uint32_t cmd_type_len;
@@ -588,8 +595,10 @@ ixgbe_xmit_pkts(void *tx_queue, struct rte_mbuf **tx_pkts,
 		/* If hardware offload required */
 		tx_ol_req = ol_flags & IXGBE_TX_OFFLOAD_MASK;
 		if (tx_ol_req) {
+			l2_l3_len.l2_len = tx_pkt->l2_len;
+			l2_l3_len.l3_len = tx_pkt->l3_len;
 			vlan_macip_lens.f.vlan_tci = tx_pkt->vlan_tci;
-			vlan_macip_lens.f.l2_l3_len = tx_pkt->l2_l3_len;
+			vlan_macip_lens.f.l2_l3_len = l2_l3_len.u16;
 
 			/* If new context need be built or reuse the exist ctx. */
 			ctx = what_advctx_update(txq, tx_ol_req,
-- 
2.1.0

* [dpdk-dev] [PATCH v3 11/13] ixgbe: support TCP segmentation offload
  2014-11-20 22:58   ` [dpdk-dev] [PATCH v3 00/13] add TSO support Olivier Matz
                       ` (9 preceding siblings ...)
  2014-11-20 22:58     ` [dpdk-dev] [PATCH v3 10/13] mbuf: generic support for TCP segmentation offload Olivier Matz
@ 2014-11-20 22:58     ` Olivier Matz
  2014-11-20 22:58     ` [dpdk-dev] [PATCH v3 12/13] testpmd: support TSO in csum forward engine Olivier Matz
                       ` (2 subsequent siblings)
  13 siblings, 0 replies; 112+ messages in thread
From: Olivier Matz @ 2014-11-20 22:58 UTC (permalink / raw)
  To: dev; +Cc: jigsaw

Implement TSO (TCP segmentation offload) in the ixgbe driver. The driver
is now able to use the PKT_TX_TCP_SEG mbuf flag and the mbuf hardware
offload information (l2_len, l3_len, l4_len, tso_segsz) to configure the
hardware support of TCP segmentation.
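
To make the role of these fields concrete, here is a small standalone
sketch. Both helper names are hypothetical. The first mirrors the
payload length adjustment done in ixgbe_xmit_pkts() below when
PKT_TX_TCP_SEG is set; the second is only an illustration of how many
segments would result for a given MSS, assuming tso_segsz is not 0.

```c
#include <stdint.h>

/* TCP payload length: packet length minus the L2, L3 and L4 headers.
 * This is the value used as paylen when TSO is requested. */
static inline uint32_t
tso_paylen(uint32_t pkt_len, uint32_t l2_len, uint32_t l3_len,
	   uint32_t l4_len)
{
	return pkt_len - (l2_len + l3_len + l4_len);
}

/* Number of segments needed to carry paylen bytes with at most
 * tso_segsz payload bytes each (tso_segsz must not be 0). */
static inline uint32_t
tso_nb_segs(uint32_t paylen, uint16_t tso_segsz)
{
	return (paylen + tso_segsz - 1) / tso_segsz;
}
```

For example, a 14654-byte packet with 14 + 20 + 20 bytes of headers
carries 14600 bytes of TCP payload, which fits in 10 segments when
tso_segsz is 1460.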

In ixgbe, when doing TSO, the IP length must not be included in the TCP
pseudo header checksum. A new function ixgbe_fix_tcp_phdr_cksum() is
used to fix the pseudo header checksum of the packet before giving it to
the hardware.

In the patch, the tx_desc_cksum_flags_to_olinfo() and
tx_desc_ol_flags_to_cmdtype() functions have been reworked to make them
clearer. This should not impact performance as gcc (version 4.8 in my
case) is smart enough to convert the tests into code that does not
contain any branch instruction.
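
The two styles can be compared in isolation with the sketch below. The
SK_* names are hypothetical: the two flag bits use the positions given
in rte_mbuf.h, but the command values are placeholders chosen for the
example and should not be read as the ixgbe register definitions.

```c
#include <stdint.h>

#define SK_TX_VLAN_PKT (1ULL << 55) /* bit position as in rte_mbuf.h */
#define SK_TX_TCP_SEG  (1ULL << 49) /* bit position as in this series */
#define SK_DCMD_VLE    0x40000000u  /* placeholder command bit */
#define SK_DCMD_TSE    0x80000000u  /* placeholder command bit */

/* Previous style: tables indexed by the boolean result of a test. */
static inline uint32_t
cmdtype_table(uint64_t ol_flags)
{
	static const uint32_t vlan_cmd[2] = {0, SK_DCMD_VLE};
	static const uint32_t tse_cmd[2] = {0, SK_DCMD_TSE};

	return vlan_cmd[(ol_flags & SK_TX_VLAN_PKT) != 0] |
		tse_cmd[(ol_flags & SK_TX_TCP_SEG) != 0];
}

/* Reworked style: plain tests, left to the compiler to optimize. */
static inline uint32_t
cmdtype_tests(uint64_t ol_flags)
{
	uint32_t cmdtype = 0;

	if (ol_flags & SK_TX_VLAN_PKT)
		cmdtype |= SK_DCMD_VLE;
	if (ol_flags & SK_TX_TCP_SEG)
		cmdtype |= SK_DCMD_TSE;
	return cmdtype;
}
```

Both functions return the same value for every input. Whether the second
one compiles to branch-free code depends on the compiler and its options,
which can be checked by looking at the generated assembly.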

Signed-off-by: Olivier Matz <olivier.matz@6wind.com>
Acked-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
---
 lib/librte_pmd_ixgbe/ixgbe_ethdev.c |   3 +-
 lib/librte_pmd_ixgbe/ixgbe_rxtx.c   | 170 ++++++++++++++++++++++--------------
 lib/librte_pmd_ixgbe/ixgbe_rxtx.h   |  19 ++--
 3 files changed, 118 insertions(+), 74 deletions(-)

diff --git a/lib/librte_pmd_ixgbe/ixgbe_ethdev.c b/lib/librte_pmd_ixgbe/ixgbe_ethdev.c
index 2eb609c..2c2ecc0 100644
--- a/lib/librte_pmd_ixgbe/ixgbe_ethdev.c
+++ b/lib/librte_pmd_ixgbe/ixgbe_ethdev.c
@@ -1964,7 +1964,8 @@ ixgbe_dev_info_get(struct rte_eth_dev *dev, struct rte_eth_dev_info *dev_info)
 		DEV_TX_OFFLOAD_IPV4_CKSUM  |
 		DEV_TX_OFFLOAD_UDP_CKSUM   |
 		DEV_TX_OFFLOAD_TCP_CKSUM   |
-		DEV_TX_OFFLOAD_SCTP_CKSUM;
+		DEV_TX_OFFLOAD_SCTP_CKSUM  |
+		DEV_TX_OFFLOAD_TCP_TSO;
 
 	dev_info->default_rxconf = (struct rte_eth_rxconf) {
 			.rx_thresh = {
diff --git a/lib/librte_pmd_ixgbe/ixgbe_rxtx.c b/lib/librte_pmd_ixgbe/ixgbe_rxtx.c
index 2df3385..63216fa 100644
--- a/lib/librte_pmd_ixgbe/ixgbe_rxtx.c
+++ b/lib/librte_pmd_ixgbe/ixgbe_rxtx.c
@@ -2,6 +2,7 @@
  *   BSD LICENSE
  *
  *   Copyright(c) 2010-2014 Intel Corporation. All rights reserved.
+ *   Copyright 2014 6WIND S.A.
  *   All rights reserved.
  *
  *   Redistribution and use in source and binary forms, with or without
@@ -94,7 +95,8 @@
 #define IXGBE_TX_OFFLOAD_MASK (			 \
 		PKT_TX_VLAN_PKT |		 \
 		PKT_TX_IP_CKSUM |		 \
-		PKT_TX_L4_MASK)
+		PKT_TX_L4_MASK |		 \
+		PKT_TX_TCP_SEG)
 
 static inline struct rte_mbuf *
 rte_rxmbuf_alloc(struct rte_mempool *mp)
@@ -363,59 +365,84 @@ ixgbe_xmit_pkts_simple(void *tx_queue, struct rte_mbuf **tx_pkts,
 static inline void
 ixgbe_set_xmit_ctx(struct igb_tx_queue* txq,
 		volatile struct ixgbe_adv_tx_context_desc *ctx_txd,
-		uint64_t ol_flags, uint32_t vlan_macip_lens)
+		uint64_t ol_flags, union ixgbe_tx_offload tx_offload)
 {
 	uint32_t type_tucmd_mlhl;
-	uint32_t mss_l4len_idx;
+	uint32_t mss_l4len_idx = 0;
 	uint32_t ctx_idx;
-	uint32_t cmp_mask;
+	uint32_t vlan_macip_lens;
+	union ixgbe_tx_offload tx_offload_mask;
 
 	ctx_idx = txq->ctx_curr;
-	cmp_mask = 0;
+	tx_offload_mask.data = 0;
 	type_tucmd_mlhl = 0;
 
+	/* Specify which HW CTX to upload. */
+	mss_l4len_idx |= (ctx_idx << IXGBE_ADVTXD_IDX_SHIFT);
+
 	if (ol_flags & PKT_TX_VLAN_PKT) {
-		cmp_mask |= TX_VLAN_CMP_MASK;
+		tx_offload_mask.vlan_tci = ~0;
 	}
 
-	if (ol_flags & PKT_TX_IP_CKSUM) {
-		type_tucmd_mlhl = IXGBE_ADVTXD_TUCMD_IPV4;
-		cmp_mask |= TX_MACIP_LEN_CMP_MASK;
-	}
+	/* check if TCP segmentation required for this packet */
+	if (ol_flags & PKT_TX_TCP_SEG) {
+		/* implies IP cksum and TCP cksum */
+		type_tucmd_mlhl = IXGBE_ADVTXD_TUCMD_IPV4 |
+			IXGBE_ADVTXD_TUCMD_L4T_TCP |
+			IXGBE_ADVTXD_DTYP_CTXT | IXGBE_ADVTXD_DCMD_DEXT;
+
+		tx_offload_mask.l2_len = ~0;
+		tx_offload_mask.l3_len = ~0;
+		tx_offload_mask.l4_len = ~0;
+		tx_offload_mask.tso_segsz = ~0;
+		mss_l4len_idx |= tx_offload.tso_segsz << IXGBE_ADVTXD_MSS_SHIFT;
+		mss_l4len_idx |= tx_offload.l4_len << IXGBE_ADVTXD_L4LEN_SHIFT;
+	} else { /* no TSO, check if hardware checksum is needed */
+		if (ol_flags & PKT_TX_IP_CKSUM) {
+			type_tucmd_mlhl = IXGBE_ADVTXD_TUCMD_IPV4;
+			tx_offload_mask.l2_len = ~0;
+			tx_offload_mask.l3_len = ~0;
+		}
 
-	/* Specify which HW CTX to upload. */
-	mss_l4len_idx = (ctx_idx << IXGBE_ADVTXD_IDX_SHIFT);
-	switch (ol_flags & PKT_TX_L4_MASK) {
-	case PKT_TX_UDP_CKSUM:
-		type_tucmd_mlhl |= IXGBE_ADVTXD_TUCMD_L4T_UDP |
+		switch (ol_flags & PKT_TX_L4_MASK) {
+		case PKT_TX_UDP_CKSUM:
+			type_tucmd_mlhl |= IXGBE_ADVTXD_TUCMD_L4T_UDP |
 				IXGBE_ADVTXD_DTYP_CTXT | IXGBE_ADVTXD_DCMD_DEXT;
-		mss_l4len_idx |= sizeof(struct udp_hdr) << IXGBE_ADVTXD_L4LEN_SHIFT;
-		cmp_mask |= TX_MACIP_LEN_CMP_MASK;
-		break;
-	case PKT_TX_TCP_CKSUM:
-		type_tucmd_mlhl |= IXGBE_ADVTXD_TUCMD_L4T_TCP |
+			mss_l4len_idx |= sizeof(struct udp_hdr) << IXGBE_ADVTXD_L4LEN_SHIFT;
+			tx_offload_mask.l2_len = ~0;
+			tx_offload_mask.l3_len = ~0;
+			break;
+		case PKT_TX_TCP_CKSUM:
+			type_tucmd_mlhl |= IXGBE_ADVTXD_TUCMD_L4T_TCP |
 				IXGBE_ADVTXD_DTYP_CTXT | IXGBE_ADVTXD_DCMD_DEXT;
-		mss_l4len_idx |= sizeof(struct tcp_hdr) << IXGBE_ADVTXD_L4LEN_SHIFT;
-		cmp_mask |= TX_MACIP_LEN_CMP_MASK;
-		break;
-	case PKT_TX_SCTP_CKSUM:
-		type_tucmd_mlhl |= IXGBE_ADVTXD_TUCMD_L4T_SCTP |
+			mss_l4len_idx |= sizeof(struct tcp_hdr) << IXGBE_ADVTXD_L4LEN_SHIFT;
+			tx_offload_mask.l2_len = ~0;
+			tx_offload_mask.l3_len = ~0;
+			tx_offload_mask.l4_len = ~0;
+			break;
+		case PKT_TX_SCTP_CKSUM:
+			type_tucmd_mlhl |= IXGBE_ADVTXD_TUCMD_L4T_SCTP |
 				IXGBE_ADVTXD_DTYP_CTXT | IXGBE_ADVTXD_DCMD_DEXT;
-		mss_l4len_idx |= sizeof(struct sctp_hdr) << IXGBE_ADVTXD_L4LEN_SHIFT;
-		cmp_mask |= TX_MACIP_LEN_CMP_MASK;
-		break;
-	default:
-		type_tucmd_mlhl |= IXGBE_ADVTXD_TUCMD_L4T_RSV |
+			mss_l4len_idx |= sizeof(struct sctp_hdr) << IXGBE_ADVTXD_L4LEN_SHIFT;
+			tx_offload_mask.l2_len = ~0;
+			tx_offload_mask.l3_len = ~0;
+			break;
+		default:
+			type_tucmd_mlhl |= IXGBE_ADVTXD_TUCMD_L4T_RSV |
 				IXGBE_ADVTXD_DTYP_CTXT | IXGBE_ADVTXD_DCMD_DEXT;
-		break;
+			break;
+		}
 	}
 
 	txq->ctx_cache[ctx_idx].flags = ol_flags;
-	txq->ctx_cache[ctx_idx].cmp_mask = cmp_mask;
-	txq->ctx_cache[ctx_idx].vlan_macip_lens.data =
-		vlan_macip_lens & cmp_mask;
+	txq->ctx_cache[ctx_idx].tx_offload.data  =
+		tx_offload_mask.data & tx_offload.data;
+	txq->ctx_cache[ctx_idx].tx_offload_mask    = tx_offload_mask;
 
 	ctx_txd->type_tucmd_mlhl = rte_cpu_to_le_32(type_tucmd_mlhl);
+	vlan_macip_lens = tx_offload.l3_len;
+	vlan_macip_lens |= (tx_offload.l2_len << IXGBE_ADVTXD_MACLEN_SHIFT);
+	vlan_macip_lens |= ((uint32_t)tx_offload.vlan_tci << IXGBE_ADVTXD_VLAN_SHIFT);
 	ctx_txd->vlan_macip_lens = rte_cpu_to_le_32(vlan_macip_lens);
 	ctx_txd->mss_l4len_idx   = rte_cpu_to_le_32(mss_l4len_idx);
 	ctx_txd->seqnum_seed     = 0;
@@ -427,20 +454,20 @@ ixgbe_set_xmit_ctx(struct igb_tx_queue* txq,
  */
 static inline uint32_t
 what_advctx_update(struct igb_tx_queue *txq, uint64_t flags,
-		uint32_t vlan_macip_lens)
+		union ixgbe_tx_offload tx_offload)
 {
 	/* If match with the current used context */
 	if (likely((txq->ctx_cache[txq->ctx_curr].flags == flags) &&
-		(txq->ctx_cache[txq->ctx_curr].vlan_macip_lens.data ==
-		(txq->ctx_cache[txq->ctx_curr].cmp_mask & vlan_macip_lens)))) {
+		(txq->ctx_cache[txq->ctx_curr].tx_offload.data ==
+		(txq->ctx_cache[txq->ctx_curr].tx_offload_mask.data & tx_offload.data)))) {
 			return txq->ctx_curr;
 	}
 
 	/* What if match with the next context  */
 	txq->ctx_curr ^= 1;
 	if (likely((txq->ctx_cache[txq->ctx_curr].flags == flags) &&
-		(txq->ctx_cache[txq->ctx_curr].vlan_macip_lens.data ==
-		(txq->ctx_cache[txq->ctx_curr].cmp_mask & vlan_macip_lens)))) {
+		(txq->ctx_cache[txq->ctx_curr].tx_offload.data ==
+		(txq->ctx_cache[txq->ctx_curr].tx_offload_mask.data & tx_offload.data)))) {
 			return txq->ctx_curr;
 	}
 
@@ -451,20 +478,25 @@ what_advctx_update(struct igb_tx_queue *txq, uint64_t flags,
 static inline uint32_t
 tx_desc_cksum_flags_to_olinfo(uint64_t ol_flags)
 {
-	static const uint32_t l4_olinfo[2] = {0, IXGBE_ADVTXD_POPTS_TXSM};
-	static const uint32_t l3_olinfo[2] = {0, IXGBE_ADVTXD_POPTS_IXSM};
-	uint32_t tmp;
-
-	tmp  = l4_olinfo[(ol_flags & PKT_TX_L4_MASK)  != PKT_TX_L4_NO_CKSUM];
-	tmp |= l3_olinfo[(ol_flags & PKT_TX_IP_CKSUM) != 0];
+	uint32_t tmp = 0;
+	if ((ol_flags & PKT_TX_L4_MASK) != PKT_TX_L4_NO_CKSUM)
+		tmp |= IXGBE_ADVTXD_POPTS_TXSM;
+	if (ol_flags & PKT_TX_IP_CKSUM)
+		tmp |= IXGBE_ADVTXD_POPTS_IXSM;
+	if (ol_flags & PKT_TX_TCP_SEG)
+		tmp |= IXGBE_ADVTXD_POPTS_TXSM;
 	return tmp;
 }
 
 static inline uint32_t
-tx_desc_vlan_flags_to_cmdtype(uint64_t ol_flags)
+tx_desc_ol_flags_to_cmdtype(uint64_t ol_flags)
 {
-	static const uint32_t vlan_cmd[2] = {0, IXGBE_ADVTXD_DCMD_VLE};
-	return vlan_cmd[(ol_flags & PKT_TX_VLAN_PKT) != 0];
+	uint32_t cmdtype = 0;
+	if (ol_flags & PKT_TX_VLAN_PKT)
+		cmdtype |= IXGBE_ADVTXD_DCMD_VLE;
+	if (ol_flags & PKT_TX_TCP_SEG)
+		cmdtype |= IXGBE_ADVTXD_DCMD_TSE;
+	return cmdtype;
 }
 
 /* Default RS bit threshold values */
@@ -545,14 +577,6 @@ ixgbe_xmit_pkts(void *tx_queue, struct rte_mbuf **tx_pkts,
 	volatile union ixgbe_adv_tx_desc *txd;
 	struct rte_mbuf     *tx_pkt;
 	struct rte_mbuf     *m_seg;
-	union ixgbe_vlan_macip vlan_macip_lens;
-	union {
-		uint16_t u16;
-		struct {
-			uint16_t l3_len:9;
-			uint16_t l2_len:7;
-		};
-	} l2_l3_len;
 	uint64_t buf_dma_addr;
 	uint32_t olinfo_status;
 	uint32_t cmd_type_len;
@@ -566,6 +590,7 @@ ixgbe_xmit_pkts(void *tx_queue, struct rte_mbuf **tx_pkts,
 	uint64_t tx_ol_req;
 	uint32_t ctx = 0;
 	uint32_t new_ctx;
+	union ixgbe_tx_offload tx_offload = { .data = 0 };
 
 	txq = tx_queue;
 	sw_ring = txq->sw_ring;
@@ -595,14 +620,15 @@ ixgbe_xmit_pkts(void *tx_queue, struct rte_mbuf **tx_pkts,
 		/* If hardware offload required */
 		tx_ol_req = ol_flags & IXGBE_TX_OFFLOAD_MASK;
 		if (tx_ol_req) {
-			l2_l3_len.l2_len = tx_pkt->l2_len;
-			l2_l3_len.l3_len = tx_pkt->l3_len;
-			vlan_macip_lens.f.vlan_tci = tx_pkt->vlan_tci;
-			vlan_macip_lens.f.l2_l3_len = l2_l3_len.u16;
+			tx_offload.l2_len = tx_pkt->l2_len;
+			tx_offload.l3_len = tx_pkt->l3_len;
+			tx_offload.l4_len = tx_pkt->l4_len;
+			tx_offload.vlan_tci = tx_pkt->vlan_tci;
+			tx_offload.tso_segsz = tx_pkt->tso_segsz;
 
 			/* If new context need be built or reuse the exist ctx. */
 			ctx = what_advctx_update(txq, tx_ol_req,
-				vlan_macip_lens.data);
+				tx_offload);
 			/* Only allocate context descriptor if required*/
 			new_ctx = (ctx == IXGBE_CTX_NUM);
 			ctx = txq->ctx_curr;
@@ -717,13 +743,22 @@ ixgbe_xmit_pkts(void *tx_queue, struct rte_mbuf **tx_pkts,
 		 */
 		cmd_type_len = IXGBE_ADVTXD_DTYP_DATA |
 			IXGBE_ADVTXD_DCMD_IFCS | IXGBE_ADVTXD_DCMD_DEXT;
-		olinfo_status = (pkt_len << IXGBE_ADVTXD_PAYLEN_SHIFT);
+
 #ifdef RTE_LIBRTE_IEEE1588
 		if (ol_flags & PKT_TX_IEEE1588_TMST)
 			cmd_type_len |= IXGBE_ADVTXD_MAC_1588;
 #endif
 
+		olinfo_status = 0;
 		if (tx_ol_req) {
+
+			if (ol_flags & PKT_TX_TCP_SEG) {
+				/* when TSO is on, paylen in descriptor is
+				 * not the packet len but the tcp payload len */
+				pkt_len -= (tx_offload.l2_len +
+					tx_offload.l3_len + tx_offload.l4_len);
+			}
+
 			/*
 			 * Setup the TX Advanced Context Descriptor if required
 			 */
@@ -744,7 +779,7 @@ ixgbe_xmit_pkts(void *tx_queue, struct rte_mbuf **tx_pkts,
 				}
 
 				ixgbe_set_xmit_ctx(txq, ctx_txd, tx_ol_req,
-				    vlan_macip_lens.data);
+					tx_offload);
 
 				txe->last_id = tx_last;
 				tx_id = txe->next_id;
@@ -756,11 +791,13 @@ ixgbe_xmit_pkts(void *tx_queue, struct rte_mbuf **tx_pkts,
 			 * This path will go through
 			 * whatever new/reuse the context descriptor
 			 */
-			cmd_type_len  |= tx_desc_vlan_flags_to_cmdtype(ol_flags);
+			cmd_type_len  |= tx_desc_ol_flags_to_cmdtype(ol_flags);
 			olinfo_status |= tx_desc_cksum_flags_to_olinfo(ol_flags);
 			olinfo_status |= ctx << IXGBE_ADVTXD_IDX_SHIFT;
 		}
 
+		olinfo_status |= (pkt_len << IXGBE_ADVTXD_PAYLEN_SHIFT);
+
 		m_seg = tx_pkt;
 		do {
 			txd = &txr[tx_id];
@@ -3611,9 +3648,10 @@ ixgbe_dev_tx_init(struct rte_eth_dev *dev)
 	PMD_INIT_FUNC_TRACE();
 	hw = IXGBE_DEV_PRIVATE_TO_HW(dev->data->dev_private);
 
-	/* Enable TX CRC (checksum offload requirement) */
+	/* Enable TX CRC (checksum offload requirement) and hw padding
+	 * (TSO requirement) */
 	hlreg0 = IXGBE_READ_REG(hw, IXGBE_HLREG0);
-	hlreg0 |= IXGBE_HLREG0_TXCRCEN;
+	hlreg0 |= (IXGBE_HLREG0_TXCRCEN | IXGBE_HLREG0_TXPADEN);
 	IXGBE_WRITE_REG(hw, IXGBE_HLREG0, hlreg0);
 
 	/* Setup the Base and Length of the Tx Descriptor Rings */
diff --git a/lib/librte_pmd_ixgbe/ixgbe_rxtx.h b/lib/librte_pmd_ixgbe/ixgbe_rxtx.h
index eb89715..13099af 100644
--- a/lib/librte_pmd_ixgbe/ixgbe_rxtx.h
+++ b/lib/librte_pmd_ixgbe/ixgbe_rxtx.h
@@ -145,13 +145,16 @@ enum ixgbe_advctx_num {
 };
 
 /** Offload features */
-union ixgbe_vlan_macip {
-	uint32_t data;
+union ixgbe_tx_offload {
+	uint64_t data;
 	struct {
-		uint16_t l2_l3_len; /**< combined 9-bit l3, 7-bit l2 lengths */
-		uint16_t vlan_tci;
+		uint64_t l2_len:7; /**< L2 (MAC) Header Length. */
+		uint64_t l3_len:9; /**< L3 (IP) Header Length. */
+		uint64_t l4_len:8; /**< L4 (TCP/UDP) Header Length. */
+		uint64_t tso_segsz:16; /**< TCP TSO segment size */
+		uint64_t vlan_tci:16;
 		/**< VLAN Tag Control Identifier (CPU order). */
-	} f;
+	};
 };
 
 /*
@@ -170,8 +173,10 @@ union ixgbe_vlan_macip {
 
 struct ixgbe_advctx_info {
 	uint64_t flags;           /**< ol_flags for context build. */
-	uint32_t cmp_mask;        /**< compare mask for vlan_macip_lens */
-	union ixgbe_vlan_macip vlan_macip_lens; /**< vlan, mac ip length. */
+	/** tx offload: vlan, tso, l2-l3-l4 lengths. */
+	union ixgbe_tx_offload tx_offload;
+	/** compare mask for tx offload. */
+	union ixgbe_tx_offload tx_offload_mask;
 };
 
 /**
-- 
2.1.0
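
Note on the context-cache lookup in the patch above: what_advctx_update() compares the cached tx_offload.data against the new value ANDed with tx_offload_mask.data, so only the fields relevant to the requested offloads take part in the match. The sketch below illustrates that idea under assumptions: the demo_* names are hypothetical, the union is a simplified copy of ixgbe_tx_offload, and storing the value pre-masked is assumed to be what the driver does when it builds a context.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical, simplified copy of union ixgbe_tx_offload from the patch. */
union demo_tx_offload {
	uint64_t data;
	struct {
		uint64_t l2_len:7;
		uint64_t l3_len:9;
		uint64_t l4_len:8;
		uint64_t tso_segsz:16;
		uint64_t vlan_tci:16;
	};
};

/* The masked compare only works if all bit-fields fit in the 64-bit word. */
static_assert(sizeof(union demo_tx_offload) == sizeof(uint64_t),
	"offload fields must fit in 64 bits");

/* Hypothetical stand-in for struct ixgbe_advctx_info. */
struct demo_ctx_info {
	uint64_t flags;
	union demo_tx_offload tx_offload;      /* stored already masked */
	union demo_tx_offload tx_offload_mask; /* fields that matter */
};

/* Remember a context; irrelevant fields are cleared by the mask. */
static void
demo_ctx_store(struct demo_ctx_info *ctx, uint64_t flags,
	union demo_tx_offload off, union demo_tx_offload mask)
{
	ctx->flags = flags;
	ctx->tx_offload_mask = mask;
	ctx->tx_offload.data = off.data & mask.data;
}

/* Same test as in the patch: flags equal and masked offload data equal. */
static int
demo_ctx_matches(const struct demo_ctx_info *ctx, uint64_t flags,
	union demo_tx_offload off)
{
	return ctx->flags == flags &&
		ctx->tx_offload.data == (ctx->tx_offload_mask.data & off.data);
}
```

With a mask that covers only l2_len and l3_len, two packets that differ only in tso_segsz reuse the same context; once tso_segsz is part of the mask they no longer match.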

^ permalink raw reply	[flat|nested] 112+ messages in thread

* [dpdk-dev] [PATCH v3 12/13] testpmd: support TSO in csum forward engine
  2014-11-20 22:58   ` [dpdk-dev] [PATCH v3 00/13] add TSO support Olivier Matz
                       ` (10 preceding siblings ...)
  2014-11-20 22:58     ` [dpdk-dev] [PATCH v3 11/13] ixgbe: support " Olivier Matz
@ 2014-11-20 22:58     ` Olivier Matz
  2014-11-20 22:58     ` [dpdk-dev] [PATCH v3 13/13] testpmd: add a verbose mode " Olivier Matz
  2014-11-26 15:04     ` [dpdk-dev] [PATCH v4 00/13] add TSO support Olivier Matz
  13 siblings, 0 replies; 112+ messages in thread
From: Olivier Matz @ 2014-11-20 22:58 UTC (permalink / raw)
  To: dev; +Cc: jigsaw

Add two new commands in testpmd:

- tso set <segsize> <portid>
- tso show <portid>

These commands can be used to enable TSO when transmitting TCP packets in
the csum forward engine. Ex:

  set fwd csum
  tx_checksum set ip hw 0
  tso set 800 0
  start

Signed-off-by: Olivier Matz <olivier.matz@6wind.com>
---
 app/test-pmd/cmdline.c  | 92 +++++++++++++++++++++++++++++++++++++++++++++++++
 app/test-pmd/csumonly.c | 64 ++++++++++++++++++++++++----------
 app/test-pmd/testpmd.h  |  1 +
 3 files changed, 139 insertions(+), 18 deletions(-)
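
Note for readers of the csum engine changes in this patch: l4_len is derived from the upper nibble of the TCP data_off byte (a count of 32-bit words), and the ixgbe patch in this series subtracts the three header lengths from the packet length to get the TSO payload length. Below is a minimal sketch of that arithmetic. The demo_* helpers are hypothetical names, not DPDK functions, and the segment count assumes the NIC emits one segment per tso_segsz bytes of payload.

```c
#include <assert.h>
#include <stdint.h>

/* TCP header length in bytes from the raw data_off byte.
 * The upper 4 bits hold the length in 32-bit words, so
 * ((data_off & 0xf0) >> 4) * 4 simplifies to (data_off & 0xf0) >> 2. */
static inline uint16_t
demo_tcp_hdr_len(uint8_t data_off)
{
	return (uint16_t)((data_off & 0xf0) >> 2);
}

/* TCP payload length: the packet length minus the L2, L3 and L4 headers. */
static inline uint32_t
demo_tso_payload_len(uint32_t pkt_len, uint16_t l2_len, uint16_t l3_len,
	uint16_t l4_len)
{
	uint32_t hdr_len = (uint32_t)l2_len + l3_len + l4_len;

	assert(pkt_len >= hdr_len);
	return pkt_len - hdr_len;
}

/* Number of segments expected for a given payload (rounded up). */
static inline uint32_t
demo_tso_nb_segs(uint32_t payload_len, uint16_t tso_segsz)
{
	assert(tso_segsz != 0);
	return (payload_len + tso_segsz - 1) / tso_segsz;
}
```

For a 1514-byte frame with l2_len=14, l3_len=20 and l4_len=20, the payload is 1460 bytes, which "tso set 800 0" would be expected to split into two segments.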

diff --git a/app/test-pmd/cmdline.c b/app/test-pmd/cmdline.c
index 61e4340..fe2ee41 100644
--- a/app/test-pmd/cmdline.c
+++ b/app/test-pmd/cmdline.c
@@ -323,6 +323,14 @@ static void cmd_help_long_parsed(void *parsed_result,
 			"tx_checksum show (port_id)\n"
 			"    Display tx checksum offload configuration\n\n"
 
+			"tso set (segsize) (portid)\n"
+			"    Enable TCP Segmentation Offload in csum forward"
+			" engine.\n"
+			"    Please check the NIC datasheet for HW limits.\n\n"
+
+			"tso show (portid)\n"
+			"    Display the status of TCP Segmentation Offload.\n\n"
+
 			"set fwd (%s)\n"
 			"    Set packet forwarding mode.\n\n"
 
@@ -2867,6 +2875,88 @@ cmdline_parse_inst_t cmd_tx_cksum_show = {
 	},
 };
 
+/* *** ENABLE HARDWARE SEGMENTATION IN TX PACKETS *** */
+struct cmd_tso_set_result {
+	cmdline_fixed_string_t tso;
+	cmdline_fixed_string_t mode;
+	uint16_t tso_segsz;
+	uint8_t port_id;
+};
+
+static void
+cmd_tso_set_parsed(void *parsed_result,
+		       __attribute__((unused)) struct cmdline *cl,
+		       __attribute__((unused)) void *data)
+{
+	struct cmd_tso_set_result *res = parsed_result;
+	struct rte_eth_dev_info dev_info;
+
+	if (port_id_is_invalid(res->port_id))
+		return;
+
+	if (!strcmp(res->mode, "set"))
+		ports[res->port_id].tso_segsz = res->tso_segsz;
+
+	if (ports[res->port_id].tso_segsz == 0)
+		printf("TSO is disabled\n");
+	else
+		printf("TSO segment size is %d\n",
+			ports[res->port_id].tso_segsz);
+
+	/* display warnings if configuration is not supported by the NIC */
+	rte_eth_dev_info_get(res->port_id, &dev_info);
+	if ((ports[res->port_id].tso_segsz != 0) &&
+		(dev_info.tx_offload_capa & DEV_TX_OFFLOAD_TCP_TSO) == 0) {
+		printf("Warning: TSO enabled but not "
+			"supported by port %d\n", res->port_id);
+	}
+}
+
+cmdline_parse_token_string_t cmd_tso_set_tso =
+	TOKEN_STRING_INITIALIZER(struct cmd_tso_set_result,
+				tso, "tso");
+cmdline_parse_token_string_t cmd_tso_set_mode =
+	TOKEN_STRING_INITIALIZER(struct cmd_tso_set_result,
+				mode, "set");
+cmdline_parse_token_num_t cmd_tso_set_tso_segsz =
+	TOKEN_NUM_INITIALIZER(struct cmd_tso_set_result,
+				tso_segsz, UINT16);
+cmdline_parse_token_num_t cmd_tso_set_portid =
+	TOKEN_NUM_INITIALIZER(struct cmd_tso_set_result,
+				port_id, UINT8);
+
+cmdline_parse_inst_t cmd_tso_set = {
+	.f = cmd_tso_set_parsed,
+	.data = NULL,
+	.help_str = "Set TSO segment size for csum engine (0 to disable): "
+	"tso set <tso_segsz> <port>",
+	.tokens = {
+		(void *)&cmd_tso_set_tso,
+		(void *)&cmd_tso_set_mode,
+		(void *)&cmd_tso_set_tso_segsz,
+		(void *)&cmd_tso_set_portid,
+		NULL,
+	},
+};
+
+cmdline_parse_token_string_t cmd_tso_show_mode =
+	TOKEN_STRING_INITIALIZER(struct cmd_tso_set_result,
+				mode, "show");
+
+
+cmdline_parse_inst_t cmd_tso_show = {
+	.f = cmd_tso_set_parsed,
+	.data = NULL,
+	.help_str = "Show TSO segment size for csum engine: "
+	"tso show <port>",
+	.tokens = {
+		(void *)&cmd_tso_set_tso,
+		(void *)&cmd_tso_show_mode,
+		(void *)&cmd_tso_set_portid,
+		NULL,
+	},
+};
+
 /* *** ENABLE/DISABLE FLUSH ON RX STREAMS *** */
 struct cmd_set_flush_rx {
 	cmdline_fixed_string_t set;
@@ -7880,6 +7970,8 @@ cmdline_parse_ctx_t main_ctx[] = {
 	(cmdline_parse_inst_t *)&cmd_tx_vlan_set_pvid,
 	(cmdline_parse_inst_t *)&cmd_tx_cksum_set,
 	(cmdline_parse_inst_t *)&cmd_tx_cksum_show,
+	(cmdline_parse_inst_t *)&cmd_tso_set,
+	(cmdline_parse_inst_t *)&cmd_tso_show,
 	(cmdline_parse_inst_t *)&cmd_link_flow_control_set,
 	(cmdline_parse_inst_t *)&cmd_link_flow_control_set_rx,
 	(cmdline_parse_inst_t *)&cmd_link_flow_control_set_tx,
diff --git a/app/test-pmd/csumonly.c b/app/test-pmd/csumonly.c
index 37d4129..ec9555f 100644
--- a/app/test-pmd/csumonly.c
+++ b/app/test-pmd/csumonly.c
@@ -88,12 +88,12 @@
 #endif
 
 static uint16_t
-get_psd_sum(void *l3_hdr, uint16_t ethertype)
+get_psd_sum(void *l3_hdr, uint16_t ethertype, uint64_t ol_flags)
 {
 	if (ethertype == _htons(ETHER_TYPE_IPv4))
-		return rte_ipv4_phdr_cksum(l3_hdr);
+		return rte_ipv4_phdr_cksum(l3_hdr, ol_flags);
 	else /* assume ethertype == ETHER_TYPE_IPv6 */
-		return rte_ipv6_phdr_cksum(l3_hdr);
+		return rte_ipv6_phdr_cksum(l3_hdr, ol_flags);
 }
 
 static uint16_t
@@ -108,14 +108,15 @@ get_udptcp_checksum(void *l3_hdr, void *l4_hdr, uint16_t ethertype)
 /*
  * Parse an ethernet header to fill the ethertype, l2_len, l3_len and
  * ipproto. This function is able to recognize IPv4/IPv6 with one optional vlan
- * header.
+ * header. The l4_len argument is only set in case of TCP (useful for TSO).
  */
 static void
 parse_ethernet(struct ether_hdr *eth_hdr, uint16_t *ethertype, uint16_t *l2_len,
-	uint16_t *l3_len, uint8_t *l4_proto)
+	uint16_t *l3_len, uint8_t *l4_proto, uint16_t *l4_len)
 {
 	struct ipv4_hdr *ipv4_hdr;
 	struct ipv6_hdr *ipv6_hdr;
+	struct tcp_hdr *tcp_hdr;
 
 	*l2_len = sizeof(struct ether_hdr);
 	*ethertype = eth_hdr->ether_type;
@@ -143,6 +144,14 @@ parse_ethernet(struct ether_hdr *eth_hdr, uint16_t *ethertype, uint16_t *l2_len,
 		*l4_proto = 0;
 		break;
 	}
+
+	if (*l4_proto == IPPROTO_TCP) {
+		tcp_hdr = (struct tcp_hdr *)((char *)eth_hdr +
+			*l2_len + *l3_len);
+		*l4_len = (tcp_hdr->data_off & 0xf0) >> 2;
+	}
+	else
+		*l4_len = 0;
 }
 
 /* modify the IPv4 or IPv4 source address of a packet */
@@ -165,7 +174,7 @@ change_ip_addresses(void *l3_hdr, uint16_t ethertype)
  * depending on the testpmd command line configuration */
 static uint64_t
 process_inner_cksums(void *l3_hdr, uint16_t ethertype, uint16_t l3_len,
-	uint8_t l4_proto, uint16_t testpmd_ol_flags)
+	uint8_t l4_proto, uint16_t tso_segsz, uint16_t testpmd_ol_flags)
 {
 	struct ipv4_hdr *ipv4_hdr = l3_hdr;
 	struct udp_hdr *udp_hdr;
@@ -177,11 +186,16 @@ process_inner_cksums(void *l3_hdr, uint16_t ethertype, uint16_t l3_len,
 		ipv4_hdr = l3_hdr;
 		ipv4_hdr->hdr_checksum = 0;
 
-		if (testpmd_ol_flags & TESTPMD_TX_OFFLOAD_IP_CKSUM)
+		if (tso_segsz != 0 && l4_proto == IPPROTO_TCP) {
 			ol_flags |= PKT_TX_IP_CKSUM;
-		else
-			ipv4_hdr->hdr_checksum = rte_ipv4_cksum(ipv4_hdr);
-
+		}
+		else {
+			if (testpmd_ol_flags & TESTPMD_TX_OFFLOAD_IP_CKSUM)
+				ol_flags |= PKT_TX_IP_CKSUM;
+			else
+				ipv4_hdr->hdr_checksum =
+					rte_ipv4_cksum(ipv4_hdr);
+		}
 	}
 	else if (ethertype != _htons(ETHER_TYPE_IPv6))
 		return 0; /* packet type not supported nothing to do */
@@ -194,7 +208,7 @@ process_inner_cksums(void *l3_hdr, uint16_t ethertype, uint16_t l3_len,
 			if (testpmd_ol_flags & TESTPMD_TX_OFFLOAD_UDP_CKSUM) {
 				ol_flags |= PKT_TX_UDP_CKSUM;
 				udp_hdr->dgram_cksum = get_psd_sum(l3_hdr,
-					ethertype);
+					ethertype, ol_flags);
 			}
 			else {
 				udp_hdr->dgram_cksum =
@@ -206,9 +220,13 @@ process_inner_cksums(void *l3_hdr, uint16_t ethertype, uint16_t l3_len,
 	else if (l4_proto == IPPROTO_TCP) {
 		tcp_hdr = (struct tcp_hdr *)((char *)l3_hdr + l3_len);
 		tcp_hdr->cksum = 0;
-		if (testpmd_ol_flags & TESTPMD_TX_OFFLOAD_TCP_CKSUM) {
+		if (tso_segsz != 0) {
+			ol_flags |= PKT_TX_TCP_SEG;
+			tcp_hdr->cksum = get_psd_sum(l3_hdr, ethertype, ol_flags);
+		}
+		else if (testpmd_ol_flags & TESTPMD_TX_OFFLOAD_TCP_CKSUM) {
 			ol_flags |= PKT_TX_TCP_CKSUM;
-			tcp_hdr->cksum = get_psd_sum(l3_hdr, ethertype);
+			tcp_hdr->cksum = get_psd_sum(l3_hdr, ethertype, ol_flags);
 		}
 		else {
 			tcp_hdr->cksum =
@@ -279,6 +297,8 @@ process_outer_cksums(void *outer_l3_hdr, uint16_t outer_ethertype,
  *  - modify the IPs in inner headers and in outer headers if any
  *  - reprocess the checksum of all supported layers. This is done in SW
  *    or HW, depending on testpmd command line configuration
+ *  - if TSO is enabled in testpmd command line, also flag the mbuf for TCP
+ *    segmentation offload (this implies HW TCP checksum)
  * Then transmit packets on the output port.
  *
  * (1) Supported packets are:
@@ -309,7 +329,9 @@ pkt_burst_checksum_forward(struct fwd_stream *fs)
 	uint16_t testpmd_ol_flags;
 	uint8_t l4_proto;
 	uint16_t ethertype = 0, outer_ethertype = 0;
-	uint16_t  l2_len = 0, l3_len = 0, outer_l2_len = 0, outer_l3_len = 0;
+	uint16_t l2_len = 0, l3_len = 0, l4_len = 0;
+	uint16_t outer_l2_len = 0, outer_l3_len = 0;
+	uint16_t tso_segsz;
 	int tunnel = 0;
 	uint32_t rx_bad_ip_csum;
 	uint32_t rx_bad_l4_csum;
@@ -339,6 +361,7 @@ pkt_burst_checksum_forward(struct fwd_stream *fs)
 
 	txp = &ports[fs->tx_port];
 	testpmd_ol_flags = txp->tx_ol_flags;
+	tso_segsz = txp->tso_segsz;
 
 	for (i = 0; i < nb_rx; i++) {
 
@@ -354,7 +377,8 @@ pkt_burst_checksum_forward(struct fwd_stream *fs)
 		 * and inner headers */
 
 		eth_hdr = rte_pktmbuf_mtod(m, struct ether_hdr *);
-		parse_ethernet(eth_hdr, &ethertype, &l2_len, &l3_len, &l4_proto);
+		parse_ethernet(eth_hdr, &ethertype, &l2_len, &l3_len,
+			&l4_proto, &l4_len);
 		l3_hdr = (char *)eth_hdr + l2_len;
 
 		/* check if it's a supported tunnel (only vxlan for now) */
@@ -382,7 +406,7 @@ pkt_burst_checksum_forward(struct fwd_stream *fs)
 					sizeof(struct vxlan_hdr));
 
 				parse_ethernet(eth_hdr, &ethertype, &l2_len,
-					&l3_len, &l4_proto);
+					&l3_len, &l4_proto, &l4_len);
 				l3_hdr = (char *)eth_hdr + l2_len;
 			}
 		}
@@ -396,11 +420,12 @@ pkt_burst_checksum_forward(struct fwd_stream *fs)
 
 		/* step 3: depending on user command line configuration,
 		 * recompute checksum either in software or flag the
-		 * mbuf to offload the calculation to the NIC */
+		 * mbuf to offload the calculation to the NIC. If TSO
+		 * is configured, prepare the mbuf for TCP segmentation. */
 
 		/* process checksums of inner headers first */
 		ol_flags |= process_inner_cksums(l3_hdr, ethertype,
-			l3_len, l4_proto, testpmd_ol_flags);
+			l3_len, l4_proto, tso_segsz, testpmd_ol_flags);
 
 		/* Then process outer headers if any. Note that the software
 		 * checksum will be wrong if one of the inner checksums is
@@ -429,6 +454,7 @@ pkt_burst_checksum_forward(struct fwd_stream *fs)
 					sizeof(struct udp_hdr) +
 					sizeof(struct vxlan_hdr) + l2_len;
 				m->l3_len = l3_len;
+				m->l4_len = l4_len;
 			}
 		} else {
 			/* this is only useful if an offload flag is
@@ -436,7 +462,9 @@ pkt_burst_checksum_forward(struct fwd_stream *fs)
 			 * case */
 			m->l2_len = l2_len;
 			m->l3_len = l3_len;
+			m->l4_len = l4_len;
 		}
+		m->tso_segsz = tso_segsz;
 		m->ol_flags = ol_flags;
 
 	}
diff --git a/app/test-pmd/testpmd.h b/app/test-pmd/testpmd.h
index c753d37..c22863f 100644
--- a/app/test-pmd/testpmd.h
+++ b/app/test-pmd/testpmd.h
@@ -149,6 +149,7 @@ struct rte_port {
 	struct fwd_stream       *tx_stream; /**< Port TX stream, if unique */
 	unsigned int            socket_id;  /**< For NUMA support */
 	uint16_t                tx_ol_flags;/**< TX Offload Flags (TESTPMD_TX_OFFLOAD...). */
+	uint16_t                tso_segsz;  /**< MSS for segmentation offload. */
 	uint16_t                tx_vlan_id; /**< Tag Id. in TX VLAN packets. */
 	void                    *fwd_ctx;   /**< Forwarding mode context */
 	uint64_t                rx_bad_ip_csum; /**< rx pkts with bad ip checksum  */
-- 
2.1.0

^ permalink raw reply	[flat|nested] 112+ messages in thread

* [dpdk-dev] [PATCH v3 13/13] testpmd: add a verbose mode csum forward engine
  2014-11-20 22:58   ` [dpdk-dev] [PATCH v3 00/13] add TSO support Olivier Matz
                       ` (11 preceding siblings ...)
  2014-11-20 22:58     ` [dpdk-dev] [PATCH v3 12/13] testpmd: support TSO in csum forward engine Olivier Matz
@ 2014-11-20 22:58     ` Olivier Matz
  2014-11-26 15:04     ` [dpdk-dev] [PATCH v4 00/13] add TSO support Olivier Matz
  13 siblings, 0 replies; 112+ messages in thread
From: Olivier Matz @ 2014-11-20 22:58 UTC (permalink / raw)
  To: dev; +Cc: jigsaw

If the user specifies 'set verbose 1' in testpmd command line,
the csum forward engine will dump some information about received
and transmitted packets, especially which flags are set and what
values are assigned to l2_len, l3_len, l4_len and tso_segsz.

This can help someone implementing TSO or hardware checksum offload to
understand how to configure the mbufs.

Example of output for one packet:

 --------------
 rx: l2_len=14 ethertype=800 l3_len=20 l4_proto=6 l4_len=20
 tx: m->l2_len=14 m->l3_len=20 m->l4_len=20
 tx: m->tso_segsz=800
 tx: flags=PKT_TX_IP_CKSUM PKT_TX_TCP_SEG
 --------------

Signed-off-by: Olivier Matz <olivier.matz@6wind.com>
---
 app/test-pmd/csumonly.c | 51 +++++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 51 insertions(+)
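
Note on the {flag, mask} table used by the dump code in this patch: the L4 checksum request is a multi-bit field rather than a set of independent bits, so each entry is tested as (ol_flags & mask) == flag instead of with a plain bit test. The sketch below shows the same matching logic in isolation. The DEMO_* values and demo_* names are illustrative assumptions, not the definitions from rte_mbuf.h.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Illustrative values only: a 2-bit L4 field plus two single-bit flags. */
#define DEMO_TX_TCP_SEG    (1ULL << 49)
#define DEMO_TX_TCP_CKSUM  (1ULL << 52)
#define DEMO_TX_SCTP_CKSUM (2ULL << 52)
#define DEMO_TX_UDP_CKSUM  (3ULL << 52)
#define DEMO_TX_L4_MASK    (3ULL << 52)
#define DEMO_TX_IP_CKSUM   (1ULL << 54)

struct demo_flag_desc {
	uint64_t flag;
	uint64_t mask;
	const char *name;
};

static const struct demo_flag_desc demo_tx_flags[] = {
	{ DEMO_TX_IP_CKSUM, DEMO_TX_IP_CKSUM, "PKT_TX_IP_CKSUM" },
	{ DEMO_TX_UDP_CKSUM, DEMO_TX_L4_MASK, "PKT_TX_UDP_CKSUM" },
	{ DEMO_TX_TCP_CKSUM, DEMO_TX_L4_MASK, "PKT_TX_TCP_CKSUM" },
	{ DEMO_TX_SCTP_CKSUM, DEMO_TX_L4_MASK, "PKT_TX_SCTP_CKSUM" },
	{ DEMO_TX_TCP_SEG, DEMO_TX_TCP_SEG, "PKT_TX_TCP_SEG" },
};

/* Append "NAME " to buf for every matching entry and return the number of
 * names written. Stops early if buf is too small for the next name. */
static unsigned
demo_format_tx_flags(uint64_t ol_flags, char *buf, size_t len)
{
	size_t used = 0;
	unsigned i, matched = 0;

	assert(buf != NULL && len > 0);
	buf[0] = '\0';
	for (i = 0; i < sizeof(demo_tx_flags) / sizeof(*demo_tx_flags); i++) {
		size_t n = strlen(demo_tx_flags[i].name);

		/* masked compare: a plain bit test would report TCP for UDP */
		if ((ol_flags & demo_tx_flags[i].mask) != demo_tx_flags[i].flag)
			continue;
		if (used + n + 2 > len)
			break;
		memcpy(buf + used, demo_tx_flags[i].name, n);
		buf[used + n] = ' ';
		buf[used + n + 1] = '\0';
		used += n + 1;
		matched++;
	}
	return matched;
}
```

With these values, DEMO_TX_UDP_CKSUM has the DEMO_TX_TCP_CKSUM bit set as well, so a test of the form (ol_flags & DEMO_TX_TCP_CKSUM) would wrongly report a TCP checksum request for a UDP packet; the masked compare reports only the UDP entry.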

diff --git a/app/test-pmd/csumonly.c b/app/test-pmd/csumonly.c
index ec9555f..72b984c 100644
--- a/app/test-pmd/csumonly.c
+++ b/app/test-pmd/csumonly.c
@@ -467,6 +467,57 @@ pkt_burst_checksum_forward(struct fwd_stream *fs)
 		m->tso_segsz = tso_segsz;
 		m->ol_flags = ol_flags;
 
+		/* if verbose mode is enabled, dump debug info */
+		if (verbose_level > 0) {
+			struct {
+				uint64_t flag;
+				uint64_t mask;
+			} tx_flags[] = {
+				{ PKT_TX_IP_CKSUM, PKT_TX_IP_CKSUM },
+				{ PKT_TX_UDP_CKSUM, PKT_TX_L4_MASK },
+				{ PKT_TX_TCP_CKSUM, PKT_TX_L4_MASK },
+				{ PKT_TX_SCTP_CKSUM, PKT_TX_L4_MASK },
+				{ PKT_TX_VXLAN_CKSUM, PKT_TX_VXLAN_CKSUM },
+				{ PKT_TX_TCP_SEG, PKT_TX_TCP_SEG },
+			};
+			unsigned j;
+			const char *name;
+
+			printf("-----------------\n");
+			/* dump rx parsed packet info */
+			printf("rx: l2_len=%d ethertype=%x l3_len=%d "
+				"l4_proto=%d l4_len=%d\n",
+				l2_len, rte_be_to_cpu_16(ethertype),
+				l3_len, l4_proto, l4_len);
+			if (tunnel == 1)
+				printf("rx: outer_l2_len=%d outer_ethertype=%x "
+					"outer_l3_len=%d\n", outer_l2_len,
+					rte_be_to_cpu_16(outer_ethertype),
+					outer_l3_len);
+			/* dump tx packet info */
+			if ((testpmd_ol_flags & (TESTPMD_TX_OFFLOAD_IP_CKSUM |
+						TESTPMD_TX_OFFLOAD_UDP_CKSUM |
+						TESTPMD_TX_OFFLOAD_TCP_CKSUM |
+						TESTPMD_TX_OFFLOAD_SCTP_CKSUM)) ||
+				tso_segsz != 0)
+				printf("tx: m->l2_len=%d m->l3_len=%d "
+					"m->l4_len=%d\n",
+					m->l2_len, m->l3_len, m->l4_len);
+			if ((tunnel == 1) &&
+				(testpmd_ol_flags & TESTPMD_TX_OFFLOAD_VXLAN_CKSUM))
+				printf("tx: m->inner_l2_len=%d m->inner_l3_len=%d\n",
+					m->inner_l2_len, m->inner_l3_len);
+			if (tso_segsz != 0)
+				printf("tx: m->tso_segsz=%d\n", m->tso_segsz);
+			printf("tx: flags=");
+			for (j = 0; j < sizeof(tx_flags)/sizeof(*tx_flags); j++) {
+				name = rte_get_tx_ol_flag_name(tx_flags[j].flag);
+				if ((m->ol_flags & tx_flags[j].mask) ==
+					tx_flags[j].flag)
+					printf("%s ", name);
+			}
+			printf("\n");
+		}
 	}
 	nb_tx = rte_eth_tx_burst(fs->tx_port, fs->tx_queue, pkts_burst, nb_rx);
 	fs->tx_packets += nb_tx;
-- 
2.1.0

^ permalink raw reply	[flat|nested] 112+ messages in thread

* Re: [dpdk-dev] [PATCH v3 03/13] mbuf: reorder tx ol_flags
  2014-11-20 22:58     ` [dpdk-dev] [PATCH v3 03/13] mbuf: reorder tx ol_flags Olivier Matz
@ 2014-11-25 10:22       ` Thomas Monjalon
  0 siblings, 0 replies; 112+ messages in thread
From: Thomas Monjalon @ 2014-11-25 10:22 UTC (permalink / raw)
  To: Olivier Matz; +Cc: dev

2014-11-20 23:58, Olivier Matz:
> The tx mbuf flags are now ordered from the lowest value to the
> highest. Add comments to explain where to add new flags.
> 
> By the way, move the PKT_TX_VXLAN_CKSUM at the right place.
> 
> Signed-off-by: Olivier Matz <olivier.matz@6wind.com>

Acked-by: Thomas Monjalon <thomas.monjalon@6wind.com>

^ permalink raw reply	[flat|nested] 112+ messages in thread

* Re: [dpdk-dev] [PATCH v3 06/13] mbuf: add functions to get the name of an ol_flag
  2014-11-20 22:58     ` [dpdk-dev] [PATCH v3 06/13] mbuf: add functions to get the name of an ol_flag Olivier Matz
@ 2014-11-25 10:23       ` Thomas Monjalon
  0 siblings, 0 replies; 112+ messages in thread
From: Thomas Monjalon @ 2014-11-25 10:23 UTC (permalink / raw)
  To: dev

> In test-pmd (rxonly.c), the code is able to dump the list of ol_flags.
> The issue is that the list of flags in the application has to be
> synchronized with the flags defined in rte_mbuf.h.
> 
> This patch introduces 2 new functions rte_get_rx_ol_flag_name()
> and rte_get_tx_ol_flag_name() that return the name of a flag from
> its mask. It also fixes rxonly.c to use these new functions and to
> display the proper flags.
> 
> Signed-off-by: Olivier Matz <olivier.matz@6wind.com>

Any comment or ack?

-- 
Thomas

^ permalink raw reply	[flat|nested] 112+ messages in thread

* Re: [dpdk-dev] [PATCH v2 06/13] mbuf: add functions to get the name of an ol_flag
  2014-11-19 11:06         ` Ananyev, Konstantin
@ 2014-11-25 10:37           ` Ananyev, Konstantin
  2014-11-25 12:15             ` Zhang, Helin
  0 siblings, 1 reply; 112+ messages in thread
From: Ananyev, Konstantin @ 2014-11-25 10:37 UTC (permalink / raw)
  To: 'Olivier MATZ', 'dev@dpdk.org'; +Cc: 'jigsaw@gmail.com'

Hi Helin,

> -----Original Message-----
> From: Ananyev, Konstantin
> Sent: Wednesday, November 19, 2014 11:07 AM
> To: Olivier MATZ; dev@dpdk.org
> Cc: jigsaw@gmail.com; Zhang, Helin
> Subject: RE: [dpdk-dev] [PATCH v2 06/13] mbuf: add functions to get the name of an ol_flag
> 
> 
> 
> > -----Original Message-----
> > From: Olivier MATZ [mailto:olivier.matz@6wind.com]
> > Sent: Tuesday, November 18, 2014 9:30 AM
> > To: Ananyev, Konstantin; dev@dpdk.org
> > Cc: jigsaw@gmail.com; Zhang, Helin
> > Subject: Re: [dpdk-dev] [PATCH v2 06/13] mbuf: add functions to get the name of an ol_flag
> >
> > Hi Konstantin,
> >
> > On 11/17/2014 08:00 PM, Ananyev, Konstantin wrote:
> > >> +/*
> > >> + * Get the name of a RX offload flag
> > >> + */
> > >> +const char *rte_get_rx_ol_flag_name(uint64_t mask)
> > >> +{
> > >> +	switch (mask) {
> > >> +	case PKT_RX_VLAN_PKT: return "PKT_RX_VLAN_PKT";
> > >> +	case PKT_RX_RSS_HASH: return "PKT_RX_RSS_HASH";
> > >> +	case PKT_RX_FDIR: return "PKT_RX_FDIR";
> > >> +	case PKT_RX_L4_CKSUM_BAD: return "PKT_RX_L4_CKSUM_BAD";
> > >> +	case PKT_RX_IP_CKSUM_BAD: return "PKT_RX_IP_CKSUM_BAD";
> > >> +	/* case PKT_RX_EIP_CKSUM_BAD: return "PKT_RX_EIP_CKSUM_BAD"; */
> > >> +	/* case PKT_RX_OVERSIZE: return "PKT_RX_OVERSIZE"; */
> > >> +	/* case PKT_RX_HBUF_OVERFLOW: return "PKT_RX_HBUF_OVERFLOW"; */
> > >> +	/* case PKT_RX_RECIP_ERR: return "PKT_RX_RECIP_ERR"; */
> > >> +	/* case PKT_RX_MAC_ERR: return "PKT_RX_MAC_ERR"; */
> > >
> > > Didn't spot it before, wonder why do you need these 5 commented out lines?
> > > In fact, why do we need these flags if they all equal to zero right now?
> > > I know these flags were not introduced by that patch, in fact as I can see it was a temporary measure,
> > > as old ol_flags were just 16 bits long:
> > > http://dpdk.org/ml/archives/dev/2014-June/003308.html
> > > So wonder should now these flags either get proper values or be removed?
> >
> > I would be in favor of removing them, or at least the following ones
> > (I don't understand how they can help the application):
> >
> > - PKT_RX_OVERSIZE: Num of desc of an RX pkt oversize.
> > - PKT_RX_HBUF_OVERFLOW: Header buffer overflow.
> > - PKT_RX_RECIP_ERR: Hardware processing error.
> > - PKT_RX_MAC_ERR: MAC error.
> 
> Tend to agree...
> Or probably collapse these 4 flags into one: flag PKT_RX_ERR or something.
> Might be still used by someone for debugging purposes.
> Helin, what do you think?

As there is no answer, I suppose you don't care about these flags any more.
So we can just remove them, right?

Konstantin

> 
> >
> > I would have said that a statistics counter in the driver is more
> > appropriate for this case (maybe there is already a counter in the
> > hardware).
> >
> > I have no i40e hardware to test that, so I don't feel very comfortable
> > to modify the i40e driver code to add these stats.
> >
> > Adding Helin in CC list, maybe he has an idea.
> >
> > Regards,
> > Olivier

^ permalink raw reply	[flat|nested] 112+ messages in thread

* Re: [dpdk-dev] [PATCH v3 07/13] testpmd: fix use of offload flags in testpmd
  2014-11-20 22:58     ` [dpdk-dev] [PATCH v3 07/13] testpmd: fix use of offload flags in testpmd Olivier Matz
@ 2014-11-25 11:52       ` Ananyev, Konstantin
  0 siblings, 0 replies; 112+ messages in thread
From: Ananyev, Konstantin @ 2014-11-25 11:52 UTC (permalink / raw)
  To: Olivier Matz, dev; +Cc: jigsaw



> -----Original Message-----
> From: Olivier Matz [mailto:olivier.matz@6wind.com]
> Sent: Thursday, November 20, 2014 10:59 PM
> To: dev@dpdk.org
> Cc: olivier.matz@6wind.com; Walukiewicz, Miroslaw; Liu, Jijiang; Liu, Yong; jigsaw@gmail.com; Richardson, Bruce; Ananyev, Konstantin
> Subject: [PATCH v3 07/13] testpmd: fix use of offload flags in testpmd
> 
> In testpmd the rte_port->tx_ol_flags flag was used in 2 incompatible
> manners:
> - sometimes used with testpmd specific flags (0xff for checksums, and
>   bit 11 for vlan)
> - sometimes assigned to m->ol_flags directly, which is wrong in case
>   of checksum flags
> 
> This commit replaces the hardcoded values by named definitions, which
> are not compatible with mbuf flags. The testpmd forward engines are
> fixed to use the flags properly.
> 
> Signed-off-by: Olivier Matz <olivier.matz@6wind.com>
> ---

Acked-by: Konstantin Ananyev <konstantin.ananyev@intel.com>

>  app/test-pmd/config.c   |  4 ++--
>  app/test-pmd/csumonly.c | 40 +++++++++++++++++++++++-----------------
>  app/test-pmd/macfwd.c   |  5 ++++-
>  app/test-pmd/macswap.c  |  5 ++++-
>  app/test-pmd/testpmd.h  | 28 +++++++++++++++++++++-------
>  app/test-pmd/txonly.c   |  9 ++++++---
>  6 files changed, 60 insertions(+), 31 deletions(-)
> 
> diff --git a/app/test-pmd/config.c b/app/test-pmd/config.c
> index b102b72..34b6fdb 100644
> --- a/app/test-pmd/config.c
> +++ b/app/test-pmd/config.c
> @@ -1670,7 +1670,7 @@ tx_vlan_set(portid_t port_id, uint16_t vlan_id)
>  		return;
>  	if (vlan_id_is_invalid(vlan_id))
>  		return;
> -	ports[port_id].tx_ol_flags |= PKT_TX_VLAN_PKT;
> +	ports[port_id].tx_ol_flags |= TESTPMD_TX_OFFLOAD_INSERT_VLAN;
>  	ports[port_id].tx_vlan_id = vlan_id;
>  }
> 
> @@ -1679,7 +1679,7 @@ tx_vlan_reset(portid_t port_id)
>  {
>  	if (port_id_is_invalid(port_id))
>  		return;
> -	ports[port_id].tx_ol_flags &= ~PKT_TX_VLAN_PKT;
> +	ports[port_id].tx_ol_flags &= ~TESTPMD_TX_OFFLOAD_INSERT_VLAN;
>  }
> 
>  void
> diff --git a/app/test-pmd/csumonly.c b/app/test-pmd/csumonly.c
> index 8d10bfd..743094a 100644
> --- a/app/test-pmd/csumonly.c
> +++ b/app/test-pmd/csumonly.c
> @@ -322,7 +322,7 @@ pkt_burst_checksum_forward(struct fwd_stream *fs)
>  			/* Do not delete, this is required by HW*/
>  			ipv4_hdr->hdr_checksum = 0;
> 
> -			if (tx_ol_flags & 0x1) {
> +			if (tx_ol_flags & TESTPMD_TX_OFFLOAD_IP_CKSUM) {
>  				/* HW checksum */
>  				ol_flags |= PKT_TX_IP_CKSUM;
>  			}
> @@ -336,7 +336,7 @@ pkt_burst_checksum_forward(struct fwd_stream *fs)
>  			if (l4_proto == IPPROTO_UDP) {
>  				udp_hdr = (struct udp_hdr*) (rte_pktmbuf_mtod(mb,
>  						unsigned char *) + l2_len + l3_len);
> -				if (tx_ol_flags & 0x2) {
> +				if (tx_ol_flags & TESTPMD_TX_OFFLOAD_UDP_CKSUM) {
>  					/* HW Offload */
>  					ol_flags |= PKT_TX_UDP_CKSUM;
>  					if (ipv4_tunnel)
> @@ -358,7 +358,7 @@ pkt_burst_checksum_forward(struct fwd_stream *fs)
>  					uint16_t len;
> 
>  					/* Check if inner L3/L4 checkum flag is set */
> -					if (tx_ol_flags & 0xF0)
> +					if (tx_ol_flags & TESTPMD_TX_OFFLOAD_INNER_CKSUM_MASK)
>  						ol_flags |= PKT_TX_VXLAN_CKSUM;
> 
>  					inner_l2_len  = sizeof(struct ether_hdr);
> @@ -381,7 +381,7 @@ pkt_burst_checksum_forward(struct fwd_stream *fs)
>  								unsigned char *) + len);
>  						inner_l4_proto = inner_ipv4_hdr->next_proto_id;
> 
> -						if (tx_ol_flags & 0x10) {
> +						if (tx_ol_flags & TESTPMD_TX_OFFLOAD_INNER_IP_CKSUM) {
> 
>  							/* Do not delete, this is required by HW*/
>  							inner_ipv4_hdr->hdr_checksum = 0;
> @@ -394,7 +394,8 @@ pkt_burst_checksum_forward(struct fwd_stream *fs)
>  								unsigned char *) + len);
>  						inner_l4_proto = inner_ipv6_hdr->proto;
>  					}
> -					if ((inner_l4_proto == IPPROTO_UDP) && (tx_ol_flags & 0x20)) {
> +					if ((inner_l4_proto == IPPROTO_UDP) &&
> +						(tx_ol_flags & TESTPMD_TX_OFFLOAD_INNER_UDP_CKSUM)) {
> 
>  						/* HW Offload */
>  						ol_flags |= PKT_TX_UDP_CKSUM;
> @@ -405,7 +406,8 @@ pkt_burst_checksum_forward(struct fwd_stream *fs)
>  						else if (eth_type == ETHER_TYPE_IPv6)
>  							inner_udp_hdr->dgram_cksum = get_ipv6_psd_sum(inner_ipv6_hdr);
> 
> -					} else if ((inner_l4_proto == IPPROTO_TCP) && (tx_ol_flags & 0x40)) {
> +					} else if ((inner_l4_proto == IPPROTO_TCP) &&
> +						(tx_ol_flags & TESTPMD_TX_OFFLOAD_INNER_TCP_CKSUM)) {
>  						/* HW Offload */
>  						ol_flags |= PKT_TX_TCP_CKSUM;
>  						inner_tcp_hdr = (struct tcp_hdr *) (rte_pktmbuf_mtod(mb,
> @@ -414,7 +416,8 @@ pkt_burst_checksum_forward(struct fwd_stream *fs)
>  							inner_tcp_hdr->cksum = get_ipv4_psd_sum(inner_ipv4_hdr);
>  						else if (eth_type == ETHER_TYPE_IPv6)
>  							inner_tcp_hdr->cksum = get_ipv6_psd_sum(inner_ipv6_hdr);
> -					} else if ((inner_l4_proto == IPPROTO_SCTP) && (tx_ol_flags & 0x80)) {
> +					} else if ((inner_l4_proto == IPPROTO_SCTP) &&
> +						(tx_ol_flags & TESTPMD_TX_OFFLOAD_INNER_SCTP_CKSUM)) {
>  						/* HW Offload */
>  						ol_flags |= PKT_TX_SCTP_CKSUM;
>  						inner_sctp_hdr = (struct sctp_hdr *) (rte_pktmbuf_mtod(mb,
> @@ -427,7 +430,7 @@ pkt_burst_checksum_forward(struct fwd_stream *fs)
>  			} else if (l4_proto == IPPROTO_TCP) {
>  				tcp_hdr = (struct tcp_hdr*) (rte_pktmbuf_mtod(mb,
>  						unsigned char *) + l2_len + l3_len);
> -				if (tx_ol_flags & 0x4) {
> +				if (tx_ol_flags & TESTPMD_TX_OFFLOAD_TCP_CKSUM) {
>  					ol_flags |= PKT_TX_TCP_CKSUM;
>  					tcp_hdr->cksum = get_ipv4_psd_sum(ipv4_hdr);
>  				}
> @@ -440,7 +443,7 @@ pkt_burst_checksum_forward(struct fwd_stream *fs)
>  				sctp_hdr = (struct sctp_hdr*) (rte_pktmbuf_mtod(mb,
>  						unsigned char *) + l2_len + l3_len);
> 
> -				if (tx_ol_flags & 0x8) {
> +				if (tx_ol_flags & TESTPMD_TX_OFFLOAD_SCTP_CKSUM) {
>  					ol_flags |= PKT_TX_SCTP_CKSUM;
>  					sctp_hdr->cksum = 0;
> 
> @@ -465,7 +468,7 @@ pkt_burst_checksum_forward(struct fwd_stream *fs)
>  			if (l4_proto == IPPROTO_UDP) {
>  				udp_hdr = (struct udp_hdr*) (rte_pktmbuf_mtod(mb,
>  						unsigned char *) + l2_len + l3_len);
> -				if (tx_ol_flags & 0x2) {
> +				if (tx_ol_flags & TESTPMD_TX_OFFLOAD_UDP_CKSUM) {
>  					/* HW Offload */
>  					ol_flags |= PKT_TX_UDP_CKSUM;
>  					if (ipv6_tunnel)
> @@ -487,7 +490,7 @@ pkt_burst_checksum_forward(struct fwd_stream *fs)
>  					uint16_t len;
> 
>  					/* Check if inner L3/L4 checksum flag is set */
> -					if (tx_ol_flags & 0xF0)
> +					if (tx_ol_flags & TESTPMD_TX_OFFLOAD_INNER_CKSUM_MASK)
>  						ol_flags |= PKT_TX_VXLAN_CKSUM;
> 
>  					inner_l2_len  = sizeof(struct ether_hdr);
> @@ -511,7 +514,7 @@ pkt_burst_checksum_forward(struct fwd_stream *fs)
>  						inner_l4_proto = inner_ipv4_hdr->next_proto_id;
> 
>  						/* HW offload */
> -						if (tx_ol_flags & 0x10) {
> +						if (tx_ol_flags & TESTPMD_TX_OFFLOAD_INNER_IP_CKSUM) {
> 
>  							/* Do not delete, this is required by HW*/
>  							inner_ipv4_hdr->hdr_checksum = 0;
> @@ -524,7 +527,8 @@ pkt_burst_checksum_forward(struct fwd_stream *fs)
>  						inner_l4_proto = inner_ipv6_hdr->proto;
>  					}
> 
> -					if ((inner_l4_proto == IPPROTO_UDP) && (tx_ol_flags & 0x20)) {
> +					if ((inner_l4_proto == IPPROTO_UDP) &&
> +						(tx_ol_flags & TESTPMD_TX_OFFLOAD_INNER_UDP_CKSUM)) {
>  						inner_udp_hdr = (struct udp_hdr *) (rte_pktmbuf_mtod(mb,
>  							unsigned char *) + len + inner_l3_len);
>  						/* HW offload */
> @@ -534,7 +538,8 @@ pkt_burst_checksum_forward(struct fwd_stream *fs)
>  							inner_udp_hdr->dgram_cksum = get_ipv4_psd_sum(inner_ipv4_hdr);
>  						else if (eth_type == ETHER_TYPE_IPv6)
>  							inner_udp_hdr->dgram_cksum = get_ipv6_psd_sum(inner_ipv6_hdr);
> -					} else if ((inner_l4_proto == IPPROTO_TCP) && (tx_ol_flags & 0x40)) {
> +					} else if ((inner_l4_proto == IPPROTO_TCP) &&
> +						(tx_ol_flags & TESTPMD_TX_OFFLOAD_INNER_TCP_CKSUM)) {
>  						/* HW offload */
>  						ol_flags |= PKT_TX_TCP_CKSUM;
>  						inner_tcp_hdr = (struct tcp_hdr *) (rte_pktmbuf_mtod(mb,
> @@ -545,7 +550,8 @@ pkt_burst_checksum_forward(struct fwd_stream *fs)
>  						else if (eth_type == ETHER_TYPE_IPv6)
>  							inner_tcp_hdr->cksum = get_ipv6_psd_sum(inner_ipv6_hdr);
> 
> -					} else if ((inner_l4_proto == IPPROTO_SCTP) && (tx_ol_flags & 0x80)) {
> +					} else if ((inner_l4_proto == IPPROTO_SCTP) &&
> +						(tx_ol_flags & TESTPMD_TX_OFFLOAD_INNER_SCTP_CKSUM)) {
>  						/* HW offload */
>  						ol_flags |= PKT_TX_SCTP_CKSUM;
>  						inner_sctp_hdr = (struct sctp_hdr *) (rte_pktmbuf_mtod(mb,
> @@ -559,7 +565,7 @@ pkt_burst_checksum_forward(struct fwd_stream *fs)
>  			else if (l4_proto == IPPROTO_TCP) {
>  				tcp_hdr = (struct tcp_hdr*) (rte_pktmbuf_mtod(mb,
>  						unsigned char *) + l2_len + l3_len);
> -				if (tx_ol_flags & 0x4) {
> +				if (tx_ol_flags & TESTPMD_TX_OFFLOAD_TCP_CKSUM) {
>  					ol_flags |= PKT_TX_TCP_CKSUM;
>  					tcp_hdr->cksum = get_ipv6_psd_sum(ipv6_hdr);
>  				}
> @@ -573,7 +579,7 @@ pkt_burst_checksum_forward(struct fwd_stream *fs)
>  				sctp_hdr = (struct sctp_hdr*) (rte_pktmbuf_mtod(mb,
>  						unsigned char *) + l2_len + l3_len);
> 
> -				if (tx_ol_flags & 0x8) {
> +				if (tx_ol_flags & TESTPMD_TX_OFFLOAD_SCTP_CKSUM) {
>  					ol_flags |= PKT_TX_SCTP_CKSUM;
>  					sctp_hdr->cksum = 0;
>  					/* Sanity check, only number of 4 bytes supported by HW */
> diff --git a/app/test-pmd/macfwd.c b/app/test-pmd/macfwd.c
> index 38bae23..aa3d705 100644
> --- a/app/test-pmd/macfwd.c
> +++ b/app/test-pmd/macfwd.c
> @@ -85,6 +85,7 @@ pkt_burst_mac_forward(struct fwd_stream *fs)
>  	uint16_t nb_rx;
>  	uint16_t nb_tx;
>  	uint16_t i;
> +	uint64_t ol_flags = 0;
>  #ifdef RTE_TEST_PMD_RECORD_CORE_CYCLES
>  	uint64_t start_tsc;
>  	uint64_t end_tsc;
> @@ -108,6 +109,8 @@ pkt_burst_mac_forward(struct fwd_stream *fs)
>  #endif
>  	fs->rx_packets += nb_rx;
>  	txp = &ports[fs->tx_port];
> +	if (txp->tx_ol_flags & TESTPMD_TX_OFFLOAD_INSERT_VLAN)
> +		ol_flags = PKT_TX_VLAN_PKT;
>  	for (i = 0; i < nb_rx; i++) {
>  		mb = pkts_burst[i];
>  		eth_hdr = rte_pktmbuf_mtod(mb, struct ether_hdr *);
> @@ -115,7 +118,7 @@ pkt_burst_mac_forward(struct fwd_stream *fs)
>  				&eth_hdr->d_addr);
>  		ether_addr_copy(&ports[fs->tx_port].eth_addr,
>  				&eth_hdr->s_addr);
> -		mb->ol_flags = txp->tx_ol_flags;
> +		mb->ol_flags = ol_flags;
>  		mb->l2_len = sizeof(struct ether_hdr);
>  		mb->l3_len = sizeof(struct ipv4_hdr);
>  		mb->vlan_tci = txp->tx_vlan_id;
> diff --git a/app/test-pmd/macswap.c b/app/test-pmd/macswap.c
> index 1786095..ec61657 100644
> --- a/app/test-pmd/macswap.c
> +++ b/app/test-pmd/macswap.c
> @@ -85,6 +85,7 @@ pkt_burst_mac_swap(struct fwd_stream *fs)
>  	uint16_t nb_rx;
>  	uint16_t nb_tx;
>  	uint16_t i;
> +	uint64_t ol_flags = 0;
>  #ifdef RTE_TEST_PMD_RECORD_CORE_CYCLES
>  	uint64_t start_tsc;
>  	uint64_t end_tsc;
> @@ -108,6 +109,8 @@ pkt_burst_mac_swap(struct fwd_stream *fs)
>  #endif
>  	fs->rx_packets += nb_rx;
>  	txp = &ports[fs->tx_port];
> +	if (txp->tx_ol_flags & TESTPMD_TX_OFFLOAD_INSERT_VLAN)
> +		ol_flags = PKT_TX_VLAN_PKT;
>  	for (i = 0; i < nb_rx; i++) {
>  		mb = pkts_burst[i];
>  		eth_hdr = rte_pktmbuf_mtod(mb, struct ether_hdr *);
> @@ -117,7 +120,7 @@ pkt_burst_mac_swap(struct fwd_stream *fs)
>  		ether_addr_copy(&eth_hdr->s_addr, &eth_hdr->d_addr);
>  		ether_addr_copy(&addr, &eth_hdr->s_addr);
> 
> -		mb->ol_flags = txp->tx_ol_flags;
> +		mb->ol_flags = ol_flags;
>  		mb->l2_len = sizeof(struct ether_hdr);
>  		mb->l3_len = sizeof(struct ipv4_hdr);
>  		mb->vlan_tci = txp->tx_vlan_id;
> diff --git a/app/test-pmd/testpmd.h b/app/test-pmd/testpmd.h
> index 9cbfeac..82af2bd 100644
> --- a/app/test-pmd/testpmd.h
> +++ b/app/test-pmd/testpmd.h
> @@ -123,14 +123,28 @@ struct fwd_stream {
>  #endif
>  };
> 
> +/** Offload IP checksum in csum forward engine */
> +#define TESTPMD_TX_OFFLOAD_IP_CKSUM          0x0001
> +/** Offload UDP checksum in csum forward engine */
> +#define TESTPMD_TX_OFFLOAD_UDP_CKSUM         0x0002
> +/** Offload TCP checksum in csum forward engine */
> +#define TESTPMD_TX_OFFLOAD_TCP_CKSUM         0x0004
> +/** Offload SCTP checksum in csum forward engine */
> +#define TESTPMD_TX_OFFLOAD_SCTP_CKSUM        0x0008
> +/** Offload inner IP checksum in csum forward engine */
> +#define TESTPMD_TX_OFFLOAD_INNER_IP_CKSUM    0x0010
> +/** Offload inner UDP checksum in csum forward engine */
> +#define TESTPMD_TX_OFFLOAD_INNER_UDP_CKSUM   0x0020
> +/** Offload inner TCP checksum in csum forward engine */
> +#define TESTPMD_TX_OFFLOAD_INNER_TCP_CKSUM   0x0040
> +/** Offload inner SCTP checksum in csum forward engine */
> +#define TESTPMD_TX_OFFLOAD_INNER_SCTP_CKSUM  0x0080
> +/** Offload inner IP checksum mask */
> +#define TESTPMD_TX_OFFLOAD_INNER_CKSUM_MASK  0x00F0
> +/** Insert VLAN header in forward engine */
> +#define TESTPMD_TX_OFFLOAD_INSERT_VLAN       0x0100
>  /**
>   * The data structure associated with each port.
> - * tx_ol_flags is slightly different from ol_flags of rte_mbuf.
> - *   Bit  0: Insert IP checksum
> - *   Bit  1: Insert UDP checksum
> - *   Bit  2: Insert TCP checksum
> - *   Bit  3: Insert SCTP checksum
> - *   Bit 11: Insert VLAN Label
>   */
>  struct rte_port {
>  	struct rte_eth_dev_info dev_info;   /**< PCI info + driver name */
> @@ -141,7 +155,7 @@ struct rte_port {
>  	struct fwd_stream       *rx_stream; /**< Port RX stream, if unique */
>  	struct fwd_stream       *tx_stream; /**< Port TX stream, if unique */
>  	unsigned int            socket_id;  /**< For NUMA support */
> -	uint64_t                tx_ol_flags;/**< Offload Flags of TX packets. */
> +	uint16_t                tx_ol_flags;/**< TX Offload Flags (TESTPMD_TX_OFFLOAD...). */
>  	uint16_t                tx_vlan_id; /**< Tag Id. in TX VLAN packets. */
>  	void                    *fwd_ctx;   /**< Forwarding mode context */
>  	uint64_t                rx_bad_ip_csum; /**< rx pkts with bad ip checksum  */
> diff --git a/app/test-pmd/txonly.c b/app/test-pmd/txonly.c
> index 3d08005..c984670 100644
> --- a/app/test-pmd/txonly.c
> +++ b/app/test-pmd/txonly.c
> @@ -196,6 +196,7 @@ static void
>  pkt_burst_transmit(struct fwd_stream *fs)
>  {
>  	struct rte_mbuf *pkts_burst[MAX_PKT_BURST];
> +	struct rte_port *txp;
>  	struct rte_mbuf *pkt;
>  	struct rte_mbuf *pkt_seg;
>  	struct rte_mempool *mbp;
> @@ -203,7 +204,7 @@ pkt_burst_transmit(struct fwd_stream *fs)
>  	uint16_t nb_tx;
>  	uint16_t nb_pkt;
>  	uint16_t vlan_tci;
> -	uint64_t ol_flags;
> +	uint64_t ol_flags = 0;
>  	uint8_t  i;
>  #ifdef RTE_TEST_PMD_RECORD_CORE_CYCLES
>  	uint64_t start_tsc;
> @@ -216,8 +217,10 @@ pkt_burst_transmit(struct fwd_stream *fs)
>  #endif
> 
>  	mbp = current_fwd_lcore()->mbp;
> -	vlan_tci = ports[fs->tx_port].tx_vlan_id;
> -	ol_flags = ports[fs->tx_port].tx_ol_flags;
> +	txp = &ports[fs->tx_port];
> +	vlan_tci = txp->tx_vlan_id;
> +	if (txp->tx_ol_flags & TESTPMD_TX_OFFLOAD_INSERT_VLAN)
> +		ol_flags = PKT_TX_VLAN_PKT;
>  	for (nb_pkt = 0; nb_pkt < nb_pkt_per_burst; nb_pkt++) {
>  		pkt = tx_mbuf_alloc(mbp);
>  		if (pkt == NULL) {
> --
> 2.1.0
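
For readers following the thread: the macfwd, macswap and txonly hunks above stop copying the testpmd per-port flags straight into mb->ol_flags and translate them instead. A minimal standalone sketch of that translation follows. The TESTPMD_TX_OFFLOAD_* values are copied from the testpmd.h hunk; EX_PKT_TX_VLAN_PKT and its bit position are assumptions for this example only (the real flag is defined in rte_mbuf.h), and ex_vlan_ol_flags() is a hypothetical helper name.

```c
#include <stdint.h>

/* testpmd per-port flags, values as in the testpmd.h hunk of the patch */
#define TESTPMD_TX_OFFLOAD_IP_CKSUM          0x0001
#define TESTPMD_TX_OFFLOAD_TCP_CKSUM         0x0004
#define TESTPMD_TX_OFFLOAD_INNER_IP_CKSUM    0x0010
#define TESTPMD_TX_OFFLOAD_INNER_UDP_CKSUM   0x0020
#define TESTPMD_TX_OFFLOAD_INNER_TCP_CKSUM   0x0040
#define TESTPMD_TX_OFFLOAD_INNER_SCTP_CKSUM  0x0080
#define TESTPMD_TX_OFFLOAD_INNER_CKSUM_MASK  0x00F0
#define TESTPMD_TX_OFFLOAD_INSERT_VLAN       0x0100

/* Stand-in for the mbuf flag; the bit position is an assumption. */
#define EX_PKT_TX_VLAN_PKT (1ULL << 55)

/*
 * Translate the 16-bit testpmd port flags into 64-bit mbuf ol_flags.
 * Only VLAN insertion is translated, as in macfwd/macswap/txonly;
 * checksum bits are handled separately by the csum engine.
 */
uint64_t ex_vlan_ol_flags(uint16_t tx_ol_flags)
{
	uint64_t ol_flags = 0;

	if (tx_ol_flags & TESTPMD_TX_OFFLOAD_INSERT_VLAN)
		ol_flags = EX_PKT_TX_VLAN_PKT;
	return ol_flags;
}
```

With this split, a checksum bit set on the port no longer leaks into the mbuf flags of the mac forward engines, and the inner mask is simply the OR of the four inner checksum bits.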

* Re: [dpdk-dev] [PATCH v2 06/13] mbuf: add functions to get the name of an ol_flag
  2014-11-25 10:37           ` Ananyev, Konstantin
@ 2014-11-25 12:15             ` Zhang, Helin
  2014-11-25 12:37               ` Olivier MATZ
  2014-11-25 13:49               ` Ananyev, Konstantin
  0 siblings, 2 replies; 112+ messages in thread
From: Zhang, Helin @ 2014-11-25 12:15 UTC (permalink / raw)
  To: Ananyev, Konstantin, 'Olivier MATZ', 'dev@dpdk.org'
  Cc: 'jigsaw@gmail.com'

Hi Konstantin,

> -----Original Message-----
> From: Ananyev, Konstantin
> Sent: Tuesday, November 25, 2014 6:38 PM
> To: 'Olivier MATZ'; 'dev@dpdk.org'
> Cc: 'jigsaw@gmail.com'; Zhang, Helin
> Subject: RE: [dpdk-dev] [PATCH v2 06/13] mbuf: add functions to get the name of
> an ol_flag
> 
> Hi Helin,
> 
> > -----Original Message-----
> > From: Ananyev, Konstantin
> > Sent: Wednesday, November 19, 2014 11:07 AM
> > To: Olivier MATZ; dev@dpdk.org
> > Cc: jigsaw@gmail.com; Zhang, Helin
> > Subject: RE: [dpdk-dev] [PATCH v2 06/13] mbuf: add functions to get
> > the name of an ol_flag
> >
> >
> >
> > > -----Original Message-----
> > > From: Olivier MATZ [mailto:olivier.matz@6wind.com]
> > > Sent: Tuesday, November 18, 2014 9:30 AM
> > > To: Ananyev, Konstantin; dev@dpdk.org
> > > Cc: jigsaw@gmail.com; Zhang, Helin
> > > Subject: Re: [dpdk-dev] [PATCH v2 06/13] mbuf: add functions to get
> > > the name of an ol_flag
> > >
> > > Hi Konstantin,
> > >
> > > On 11/17/2014 08:00 PM, Ananyev, Konstantin wrote:
> > > >> +/*
> > > >> + * Get the name of a RX offload flag  */ const char
> > > >> +*rte_get_rx_ol_flag_name(uint64_t mask) {
> > > >> +	switch (mask) {
> > > >> +	case PKT_RX_VLAN_PKT: return "PKT_RX_VLAN_PKT";
> > > >> +	case PKT_RX_RSS_HASH: return "PKT_RX_RSS_HASH";
> > > >> +	case PKT_RX_FDIR: return "PKT_RX_FDIR";
> > > >> +	case PKT_RX_L4_CKSUM_BAD: return "PKT_RX_L4_CKSUM_BAD";
> > > >> +	case PKT_RX_IP_CKSUM_BAD: return "PKT_RX_IP_CKSUM_BAD";
> > > >> +	/* case PKT_RX_EIP_CKSUM_BAD: return "PKT_RX_EIP_CKSUM_BAD";
> */
> > > >> +	/* case PKT_RX_OVERSIZE: return "PKT_RX_OVERSIZE"; */
> > > >> +	/* case PKT_RX_HBUF_OVERFLOW: return
> "PKT_RX_HBUF_OVERFLOW"; */
> > > >> +	/* case PKT_RX_RECIP_ERR: return "PKT_RX_RECIP_ERR"; */
> > > >> +	/* case PKT_RX_MAC_ERR: return "PKT_RX_MAC_ERR"; */
> > > >
> > > > Didn't spot it before, wonder why do you need these 5 commented out
> lines?
> > > > In fact, why do we need these flags if they all equal to zero right now?
> > > > I know these flags were not introduced by that patch, in fact as I
> > > > can see it was a temporary measure, as old ol_flags were just 16 bits long:
> > > > http://dpdk.org/ml/archives/dev/2014-June/003308.html
> > > > So wonder should now these flags either get proper values or be removed?
> > >
> > > I would be in favor of removing them, or at least the following ones
> > > (I don't understand how they can help the application):
> > >
> > > - PKT_RX_OVERSIZE: Num of desc of an RX pkt oversize.
> > > - PKT_RX_HBUF_OVERFLOW: Header buffer overflow.
> > > - PKT_RX_RECIP_ERR: Hardware processing error.
> > > - PKT_RX_MAC_ERR: MAC error.
> >
> > Tend to agree...
> > Or probably collapse these 4 flags into one: flag PKT_RX_ERR or something.
> > Might be still used by someone for debugging purposes.
> > Helin, what do you think?
> 
> As there is no answer, I suppose you don't care these flags any more.
> So we can just remove them, right?
Sorry, I do care about them a bit. I have a lot of emails to deal with after a whole week of training.
Yes, they were added before the new mbuf was defined. Why zero? Because there were not enough bits for them.
Unfortunately, I forgot to give them correct values after the new mbuf was introduced.
Thank you so much for spotting this!

The error flags were added according to the errors defined in the FVL datasheet. They could be
helpful for middle layer software or applications, as they identify the specific errors. I'd prefer
to add the correct values for those flags. What do you think?
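
As a rough illustration of what "correct values" would mean here: once each flag has its own non-zero bit, the commented-out cases in the name lookup can be enabled. While all four flags are defined as 0, enabling them would produce duplicate case labels, which C rejects at compile time; that is presumably why they are commented out in the patch. The EX_ names and the bit positions below are assumptions made only for this sketch, not values from rte_mbuf.h.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical bit assignments; any four distinct free bits would do. */
#define EX_PKT_RX_OVERSIZE      (1ULL << 10)
#define EX_PKT_RX_HBUF_OVERFLOW (1ULL << 11)
#define EX_PKT_RX_RECIP_ERR     (1ULL << 12)
#define EX_PKT_RX_MAC_ERR       (1ULL << 13)

/*
 * Return the name of a single RX error flag, or NULL if the mask
 * is not one of the known flags (including mask == 0).
 */
const char *ex_get_rx_err_flag_name(uint64_t mask)
{
	switch (mask) {
	case EX_PKT_RX_OVERSIZE: return "PKT_RX_OVERSIZE";
	case EX_PKT_RX_HBUF_OVERFLOW: return "PKT_RX_HBUF_OVERFLOW";
	case EX_PKT_RX_RECIP_ERR: return "PKT_RX_RECIP_ERR";
	case EX_PKT_RX_MAC_ERR: return "PKT_RX_MAC_ERR";
	default: return NULL;
	}
}
```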

Thanks and Regards,
Helin

> 
> Konstantin
> 
> >
> > >
> > > I would have say that a statistics counter in the driver is more
> > > appropriate for this case (maybe there is already a counter in the
> > > hardware).
> > >
> > > I have no i40e hardware to test that, so I don't feel very
> > > comfortable to modify the i40e driver code to add these stats.
> > >
> > > Adding Helin in CC list, maybe he has an idea.
> > >
> > > Regards,
> > > Olivier

* Re: [dpdk-dev] [PATCH v2 06/13] mbuf: add functions to get the name of an ol_flag
  2014-11-25 12:15             ` Zhang, Helin
@ 2014-11-25 12:37               ` Olivier MATZ
  2014-11-25 13:31                 ` Zhang, Helin
  2014-11-25 13:49               ` Ananyev, Konstantin
  1 sibling, 1 reply; 112+ messages in thread
From: Olivier MATZ @ 2014-11-25 12:37 UTC (permalink / raw)
  To: Zhang, Helin, Ananyev, Konstantin, 'dev@dpdk.org'
  Cc: 'jigsaw@gmail.com'

Hi Helin,

On 11/25/2014 01:15 PM, Zhang, Helin wrote:
>>>> I would be in favor of removing them, or at least the following ones
>>>> (I don't understand how they can help the application):
>>>>
>>>> - PKT_RX_OVERSIZE: Num of desc of an RX pkt oversize.
>>>> - PKT_RX_HBUF_OVERFLOW: Header buffer overflow.
>>>> - PKT_RX_RECIP_ERR: Hardware processing error.
>>>> - PKT_RX_MAC_ERR: MAC error.
>>>
>>> Tend to agree...
>>> Or probably collapse these 4 flags into one: flag PKT_RX_ERR or something.
>>> Might be still used by someone for debugging purposes.
>>> Helin, what do you think?
>>
>> As there is no answer, I suppose you don't care these flags any more.
>> So we can just remove them, right?
> Sorry, I think I care it a bit. I have a lot of emails to be dealt with, due to the whole week training.
> Yes, it was added there before new mbuf defined. Why zero? Because of lack of bits for them.
> Unfortunately, I forgot to add them with correct values after new mbuf introduced.
> Thank you so much for spotting it out!
>
> The error flags were added according to the errors defined by FVL datasheet. It could be
> helpful for middle layer software or applications with the specific errors identified. I'd prefer
> to add the correct values for those flags. What do you think?

Could you elaborate about why it could be useful for an application
to have this flag in the mbuf? When these flags are set, is the data
still present in the mbuf? How can the application use this data if
the hardware says "there is an error in the packet"?

I think a stats counter would do the job here.
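
To sketch what I mean by a stats counter: the driver would bump a per-queue counter for each error bit found in the RX descriptor, instead of exporting the bit in the mbuf. This is only an illustration; the structure, the function name and the EX_DESC_ERR_* bit positions are hypothetical and are not taken from the i40e driver or the FVL datasheet.

```c
#include <stdint.h>

/* Hypothetical RX descriptor error bits, for illustration only. */
#define EX_DESC_ERR_OVERSIZE      (1u << 0)
#define EX_DESC_ERR_HBUF_OVERFLOW (1u << 1)
#define EX_DESC_ERR_RECIP         (1u << 2)
#define EX_DESC_ERR_MAC           (1u << 3)

/* Per-queue software counters, one per error cause. */
struct ex_rxq_err_stats {
	uint64_t oversize;
	uint64_t hbuf_overflow;
	uint64_t recip_err;
	uint64_t mac_err;
};

/* Account for every error bit set in one RX descriptor. */
void ex_count_rx_errors(struct ex_rxq_err_stats *st, uint32_t error_bits)
{
	if (error_bits & EX_DESC_ERR_OVERSIZE)
		st->oversize++;
	if (error_bits & EX_DESC_ERR_HBUF_OVERFLOW)
		st->hbuf_overflow++;
	if (error_bits & EX_DESC_ERR_RECIP)
		st->recip_err++;
	if (error_bits & EX_DESC_ERR_MAC)
		st->mac_err++;
}
```

The root cause stays visible to the user through the counters, and no bits of ol_flags are consumed.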

Regards,
Olivier

* Re: [dpdk-dev] [PATCH v2 06/13] mbuf: add functions to get the name of an ol_flag
  2014-11-25 12:37               ` Olivier MATZ
@ 2014-11-25 13:31                 ` Zhang, Helin
  0 siblings, 0 replies; 112+ messages in thread
From: Zhang, Helin @ 2014-11-25 13:31 UTC (permalink / raw)
  To: Olivier MATZ, Ananyev, Konstantin, 'dev@dpdk.org'
  Cc: 'jigsaw@gmail.com'

Hi Olivier

> -----Original Message-----
> From: Olivier MATZ [mailto:olivier.matz@6wind.com]
> Sent: Tuesday, November 25, 2014 8:38 PM
> To: Zhang, Helin; Ananyev, Konstantin; 'dev@dpdk.org'
> Cc: 'jigsaw@gmail.com'
> Subject: Re: [dpdk-dev] [PATCH v2 06/13] mbuf: add functions to get the name of
> an ol_flag
> 
> Hi Helin,
> 
> On 11/25/2014 01:15 PM, Zhang, Helin wrote:
> >>>> I would be in favor of removing them, or at least the following
> >>>> ones (I don't understand how they can help the application):
> >>>>
> >>>> - PKT_RX_OVERSIZE: Num of desc of an RX pkt oversize.
> >>>> - PKT_RX_HBUF_OVERFLOW: Header buffer overflow.
> >>>> - PKT_RX_RECIP_ERR: Hardware processing error.
> >>>> - PKT_RX_MAC_ERR: MAC error.
> >>>
> >>> Tend to agree...
> >>> Or probably collapse these 4 flags into one: flag PKT_RX_ERR or something.
> >>> Might be still used by someone for debugging purposes.
> >>> Helin, what do you think?
> >>
> >> As there is no answer, I suppose you don't care these flags any more.
> >> So we can just remove them, right?
> > Sorry, I think I care it a bit. I have a lot of emails to be dealt with, due to the whole week training.
> > Yes, it was added there before new mbuf defined. Why zero? Because of lack of bits for them.
> > Unfortunately, I forgot to add them with correct values after new mbuf introduced.
> > Thank you so much for spotting it out!
> >
> > The error flags were added according to the errors defined by FVL datasheet. It could be
> > helpful for middle layer software or applications with the specific errors identified. I'd prefer
> > to add the correct values for those flags. What do you think?
> 
> Could you elaborate about why it could be useful for an application to have this
> flag in the mbuf? When these flags are set, is the data still present in the mbuf?
> How can the application use this data if the hardware says "there is an error in
> the packet"?
The mbuf has already been filled with data, even when an error happens. The error flags can
be used to indicate whether the data is valid or not.
It may not need many error flags, but error flags with specific root causes
could help users understand what happened.
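
A minimal sketch of how an application could use such a flag on the RX path, assuming a single error bit is exported in ol_flags. The struct ex_mbuf type, EX_PKT_RX_HW_ERR and its bit position are stand-ins invented for this example; they are not the real rte_mbuf definitions.

```c
#include <stdint.h>

/* Hypothetical error flag; the bit position is an assumption. */
#define EX_PKT_RX_HW_ERR (1ULL << 20)

/* Reduced stand-in for struct rte_mbuf: only the field used here. */
struct ex_mbuf {
	uint64_t ol_flags;
};

/*
 * Compact a received burst in place, keeping only packets without
 * the error flag. Returns the number of packets kept.
 */
uint16_t ex_drop_hw_errors(struct ex_mbuf **pkts, uint16_t nb_rx)
{
	uint16_t i;
	uint16_t nb_ok = 0;

	for (i = 0; i < nb_rx; i++) {
		if (pkts[i]->ol_flags & EX_PKT_RX_HW_ERR)
			continue; /* a real application would free the mbuf here */
		pkts[nb_ok++] = pkts[i];
	}
	return nb_ok;
}
```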

> 
> I think a stats counter would do the job here.
It already supports statistics collection in i40e.

> 
> Regards,
> Olivier

Regards,
Helin

* Re: [dpdk-dev] [PATCH v2 06/13] mbuf: add functions to get the name of an ol_flag
  2014-11-25 12:15             ` Zhang, Helin
  2014-11-25 12:37               ` Olivier MATZ
@ 2014-11-25 13:49               ` Ananyev, Konstantin
  2014-11-26  0:58                 ` Zhang, Helin
  1 sibling, 1 reply; 112+ messages in thread
From: Ananyev, Konstantin @ 2014-11-25 13:49 UTC (permalink / raw)
  To: Zhang, Helin, 'Olivier MATZ', 'dev@dpdk.org'
  Cc: 'jigsaw@gmail.com'



> -----Original Message-----
> From: Zhang, Helin
> Sent: Tuesday, November 25, 2014 12:15 PM
> To: Ananyev, Konstantin; 'Olivier MATZ'; 'dev@dpdk.org'
> Cc: 'jigsaw@gmail.com'
> Subject: RE: [dpdk-dev] [PATCH v2 06/13] mbuf: add functions to get the name of an ol_flag
> 
> HI Konstantin
> 
> > -----Original Message-----
> > From: Ananyev, Konstantin
> > Sent: Tuesday, November 25, 2014 6:38 PM
> > To: 'Olivier MATZ'; 'dev@dpdk.org'
> > Cc: 'jigsaw@gmail.com'; Zhang, Helin
> > Subject: RE: [dpdk-dev] [PATCH v2 06/13] mbuf: add functions to get the name of
> > an ol_flag
> >
> > Hi Helin,
> >
> > > -----Original Message-----
> > > From: Ananyev, Konstantin
> > > Sent: Wednesday, November 19, 2014 11:07 AM
> > > To: Olivier MATZ; dev@dpdk.org
> > > Cc: jigsaw@gmail.com; Zhang, Helin
> > > Subject: RE: [dpdk-dev] [PATCH v2 06/13] mbuf: add functions to get
> > > the name of an ol_flag
> > >
> > >
> > >
> > > > -----Original Message-----
> > > > From: Olivier MATZ [mailto:olivier.matz@6wind.com]
> > > > Sent: Tuesday, November 18, 2014 9:30 AM
> > > > To: Ananyev, Konstantin; dev@dpdk.org
> > > > Cc: jigsaw@gmail.com; Zhang, Helin
> > > > Subject: Re: [dpdk-dev] [PATCH v2 06/13] mbuf: add functions to get
> > > > the name of an ol_flag
> > > >
> > > > Hi Konstantin,
> > > >
> > > > On 11/17/2014 08:00 PM, Ananyev, Konstantin wrote:
> > > > >> +/*
> > > > >> + * Get the name of a RX offload flag  */ const char
> > > > >> +*rte_get_rx_ol_flag_name(uint64_t mask) {
> > > > >> +	switch (mask) {
> > > > >> +	case PKT_RX_VLAN_PKT: return "PKT_RX_VLAN_PKT";
> > > > >> +	case PKT_RX_RSS_HASH: return "PKT_RX_RSS_HASH";
> > > > >> +	case PKT_RX_FDIR: return "PKT_RX_FDIR";
> > > > >> +	case PKT_RX_L4_CKSUM_BAD: return "PKT_RX_L4_CKSUM_BAD";
> > > > >> +	case PKT_RX_IP_CKSUM_BAD: return "PKT_RX_IP_CKSUM_BAD";
> > > > >> +	/* case PKT_RX_EIP_CKSUM_BAD: return "PKT_RX_EIP_CKSUM_BAD";
> > */
> > > > >> +	/* case PKT_RX_OVERSIZE: return "PKT_RX_OVERSIZE"; */
> > > > >> +	/* case PKT_RX_HBUF_OVERFLOW: return
> > "PKT_RX_HBUF_OVERFLOW"; */
> > > > >> +	/* case PKT_RX_RECIP_ERR: return "PKT_RX_RECIP_ERR"; */
> > > > >> +	/* case PKT_RX_MAC_ERR: return "PKT_RX_MAC_ERR"; */
> > > > >
> > > > > Didn't spot it before, wonder why do you need these 5 commented out
> > lines?
> > > > > In fact, why do we need these flags if they all equal to zero right now?
> > > > > I know these flags were not introduced by that patch, in fact as I
> > > > > can see it was a temporary measure, as old ol_flags were just 16 bits long:
> > > > > http://dpdk.org/ml/archives/dev/2014-June/003308.html
> > > > > So wonder should now these flags either get proper values or be removed?
> > > >
> > > > I would be in favor of removing them, or at least the following ones
> > > > (I don't understand how they can help the application):
> > > >
> > > > - PKT_RX_OVERSIZE: Num of desc of an RX pkt oversize.
> > > > - PKT_RX_HBUF_OVERFLOW: Header buffer overflow.
> > > > - PKT_RX_RECIP_ERR: Hardware processing error.
> > > > - PKT_RX_MAC_ERR: MAC error.
> > >
> > > Tend to agree...
> > > Or probably collapse these 4 flags into one: flag PKT_RX_ERR or something.
> > > Might be still used by someone for debugging purposes.
> > > Helin, what do you think?
> >
> > As there is no answer, I suppose you don't care these flags any more.
> > So we can just remove them, right?
> Sorry, I think I care it a bit. I have a lot of emails to be dealt with, due to the whole week training.
> Yes, it was added there before new mbuf defined. Why zero? Because of lack of bits for them.
> Unfortunately, I forgot to add them with correct values after new mbuf introduced.
> Thank you so much for spotting it out!
> 
> The error flags were added according to the errors defined by FVL datasheet. It could be
> helpful for middle layer software or applications with the specific errors identified. I'd prefer
> to add the correct values for those flags. What do you think?


I am OK with having one flag for that, something like PKT_RX_HW_ERR.
I don't really understand why you need all 4 of them: the packet contains invalid data anyway,
so there is not much use for them.
For debugging purposes you can just add a debug log for all of them.
Something like:

if (unlikely(error_bits & ...)) {
     flags |= PKT_RX_MAC_ERR;
     PMD_DRV_LOG(DEBUG, ...);
     return flags;
}
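
A slightly fuller, standalone version of the idea above, as a sketch only: all descriptor error bits collapse into one mbuf flag, and the raw bits go to a debug log. EX_PKT_RX_HW_ERR, the EX_DESC_ERR_* bits and EX_DRV_LOG() are assumptions made for this example; in the driver the log call would be PMD_DRV_LOG(DEBUG, ...) and the bits would come from the RX descriptor layout.

```c
#include <stdint.h>
#include <stdio.h>

/* Hypothetical descriptor error bits and mbuf flag, illustration only. */
#define EX_DESC_ERR_OVERSIZE      (1u << 0)
#define EX_DESC_ERR_HBUF_OVERFLOW (1u << 1)
#define EX_DESC_ERR_RECIP         (1u << 2)
#define EX_DESC_ERR_MAC           (1u << 3)
#define EX_DESC_ERR_MASK          0xFu
#define EX_PKT_RX_HW_ERR          (1ULL << 20)

/* Branch hint, as unlikely() does in DPDK; falls back to a no-op. */
#if defined(__GNUC__)
#define ex_unlikely(x) __builtin_expect(!!(x), 0)
#else
#define ex_unlikely(x) (x)
#endif

/* Stand-in for PMD_DRV_LOG(DEBUG, ...): prints to stderr. */
#define EX_DRV_LOG(fmt, ...) fprintf(stderr, "PMD: " fmt "\n", __VA_ARGS__)

/* Map any RX descriptor error to one generic mbuf flag and log details. */
uint64_t ex_rxd_error_to_pkt_flags(uint32_t error_bits)
{
	uint64_t flags = 0;

	if (ex_unlikely(error_bits & EX_DESC_ERR_MASK)) {
		flags |= EX_PKT_RX_HW_ERR;
		EX_DRV_LOG("rx descriptor error bits: 0x%x",
			   (unsigned int)(error_bits & EX_DESC_ERR_MASK));
	}
	return flags;
}
```

The application only ever sees one bit, while the specific root cause remains available in the debug log.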

Konstantin

> 
> Thanks and Regards,
> Helin
> 
> >
> > Konstantin
> >
> > >
> > > >
> > > > I would have say that a statistics counter in the driver is more
> > > > appropriate for this case (maybe there is already a counter in the
> > > > hardware).
> > > >
> > > > I have no i40e hardware to test that, so I don't feel very
> > > > comfortable to modify the i40e driver code to add these stats.
> > > >
> > > > Adding Helin in CC list, maybe he has an idea.
> > > >
> > > > Regards,
> > > > Olivier

* Re: [dpdk-dev] [PATCH v2 06/13] mbuf: add functions to get the name of an ol_flag
  2014-11-25 13:49               ` Ananyev, Konstantin
@ 2014-11-26  0:58                 ` Zhang, Helin
  0 siblings, 0 replies; 112+ messages in thread
From: Zhang, Helin @ 2014-11-26  0:58 UTC (permalink / raw)
  To: Ananyev, Konstantin, 'Olivier MATZ', 'dev@dpdk.org'
  Cc: 'jigsaw@gmail.com'



> -----Original Message-----
> From: Ananyev, Konstantin
> Sent: Tuesday, November 25, 2014 9:49 PM
> To: Zhang, Helin; 'Olivier MATZ'; 'dev@dpdk.org'
> Cc: 'jigsaw@gmail.com'
> Subject: RE: [dpdk-dev] [PATCH v2 06/13] mbuf: add functions to get the name of
> an ol_flag
> 
> 
> 
> > -----Original Message-----
> > From: Zhang, Helin
> > Sent: Tuesday, November 25, 2014 12:15 PM
> > To: Ananyev, Konstantin; 'Olivier MATZ'; 'dev@dpdk.org'
> > Cc: 'jigsaw@gmail.com'
> > Subject: RE: [dpdk-dev] [PATCH v2 06/13] mbuf: add functions to get
> > the name of an ol_flag
> >
> > HI Konstantin
> >
> > > -----Original Message-----
> > > From: Ananyev, Konstantin
> > > Sent: Tuesday, November 25, 2014 6:38 PM
> > > To: 'Olivier MATZ'; 'dev@dpdk.org'
> > > Cc: 'jigsaw@gmail.com'; Zhang, Helin
> > > Subject: RE: [dpdk-dev] [PATCH v2 06/13] mbuf: add functions to get
> > > the name of an ol_flag
> > >
> > > Hi Helin,
> > >
> > > > -----Original Message-----
> > > > From: Ananyev, Konstantin
> > > > Sent: Wednesday, November 19, 2014 11:07 AM
> > > > To: Olivier MATZ; dev@dpdk.org
> > > > Cc: jigsaw@gmail.com; Zhang, Helin
> > > > Subject: RE: [dpdk-dev] [PATCH v2 06/13] mbuf: add functions to
> > > > get the name of an ol_flag
> > > >
> > > >
> > > >
> > > > > -----Original Message-----
> > > > > From: Olivier MATZ [mailto:olivier.matz@6wind.com]
> > > > > Sent: Tuesday, November 18, 2014 9:30 AM
> > > > > To: Ananyev, Konstantin; dev@dpdk.org
> > > > > Cc: jigsaw@gmail.com; Zhang, Helin
> > > > > Subject: Re: [dpdk-dev] [PATCH v2 06/13] mbuf: add functions to
> > > > > get the name of an ol_flag
> > > > >
> > > > > Hi Konstantin,
> > > > >
> > > > > On 11/17/2014 08:00 PM, Ananyev, Konstantin wrote:
> > > > > >> +/*
> > > > > >> + * Get the name of a RX offload flag  */ const char
> > > > > >> +*rte_get_rx_ol_flag_name(uint64_t mask) {
> > > > > >> +	switch (mask) {
> > > > > >> +	case PKT_RX_VLAN_PKT: return "PKT_RX_VLAN_PKT";
> > > > > >> +	case PKT_RX_RSS_HASH: return "PKT_RX_RSS_HASH";
> > > > > >> +	case PKT_RX_FDIR: return "PKT_RX_FDIR";
> > > > > >> +	case PKT_RX_L4_CKSUM_BAD: return
> "PKT_RX_L4_CKSUM_BAD";
> > > > > >> +	case PKT_RX_IP_CKSUM_BAD: return
> "PKT_RX_IP_CKSUM_BAD";
> > > > > >> +	/* case PKT_RX_EIP_CKSUM_BAD: return
> > > > > >> +"PKT_RX_EIP_CKSUM_BAD";
> > > */
> > > > > >> +	/* case PKT_RX_OVERSIZE: return "PKT_RX_OVERSIZE"; */
> > > > > >> +	/* case PKT_RX_HBUF_OVERFLOW: return
> > > "PKT_RX_HBUF_OVERFLOW"; */
> > > > > >> +	/* case PKT_RX_RECIP_ERR: return "PKT_RX_RECIP_ERR"; */
> > > > > >> +	/* case PKT_RX_MAC_ERR: return "PKT_RX_MAC_ERR"; */
> > > > > >
> > > > > > Didn't spot it before, wonder why do you need these 5
> > > > > > commented out
> > > lines?
> > > > > > In fact, why do we need these flags if they all equal to zero right now?
> > > > > > I know these flags were not introduced by that patch, in fact
> > > > > > as I can see it was a temporary measure, as old ol_flags were just 16 bits
> long:
> > > > > > http://dpdk.org/ml/archives/dev/2014-June/003308.html
> > > > > > So wonder should now these flags either get proper values or be
> removed?
> > > > >
> > > > > I would be in favor of removing them, or at least the following
> > > > > ones (I don't understand how they can help the application):
> > > > >
> > > > > - PKT_RX_OVERSIZE: Num of desc of an RX pkt oversize.
> > > > > - PKT_RX_HBUF_OVERFLOW: Header buffer overflow.
> > > > > - PKT_RX_RECIP_ERR: Hardware processing error.
> > > > > - PKT_RX_MAC_ERR: MAC error.
> > > >
> > > > Tend to agree...
> > > > Or probably collapse these 4 flags into one: flag PKT_RX_ERR or something.
> > > > Might be still used by someone for debugging purposes.
> > > > Helin, what do you think?
> > >
> > > As there is no answer, I suppose you don't care these flags any more.
> > > So we can just remove them, right?
> > Sorry, I think I care it a bit. I have a lot of emails to be dealt with, due to the whole week training.
> > Yes, it was added there before new mbuf defined. Why zero? Because of lack of bits for them.
> > Unfortunately, I forgot to add them with correct values after new mbuf introduced.
> > Thank you so much for spotting it out!
> >
> > The error flags were added according to the errors defined by FVL datasheet. It could be
> > helpful for middle layer software or applications with the specific errors identified. I'd prefer
> > to add the correct values for those flags. What do you think?
> 
> 
> I am ok to have one flag for that something like PKT_RX_HW_ERR (or something).
> Don't really understand why you need all 4 of them - the packet contains invalid
> data anyway, so there is not much use of it.
Yes, I agree with you that one bit might be enough. It seems that we had more
than one bit for errors previously.

Regards,
Helin

> For debugging purposes you can just add a debug log for all of them.
> Something like:
> 
> if (unlikely(error_bits & ...)) {
>      flags |= PKT_RX_MAC_ERR;
>      PMD_DRV_LOG(DEBUG, ...);
>      return flags;
> }
> 
> Konstantin
> 
> >
> > Thanks and Regards,
> > Helin
> >
> > >
> > > Konstantin
> > >
> > > >
> > > > >
> > > > > I would have say that a statistics counter in the driver is more
> > > > > appropriate for this case (maybe there is already a counter in
> > > > > the hardware).
> > > > >
> > > > > I have no i40e hardware to test that, so I don't feel very
> > > > > comfortable to modify the i40e driver code to add these stats.
> > > > >
> > > > > Adding Helin in CC list, maybe he has an idea.
> > > > >
> > > > > Regards,
> > > > > Olivier

* Re: [dpdk-dev] [PATCH v3 08/13] testpmd: rework csum forward engine
  2014-11-20 22:58     ` [dpdk-dev] [PATCH v3 08/13] testpmd: rework csum forward engine Olivier Matz
@ 2014-11-26 10:10       ` Ananyev, Konstantin
  2014-11-26 11:14         ` Olivier MATZ
  0 siblings, 1 reply; 112+ messages in thread
From: Ananyev, Konstantin @ 2014-11-26 10:10 UTC (permalink / raw)
  To: Olivier Matz, dev; +Cc: jigsaw

Hi Olivier,

> -----Original Message-----
> From: Olivier Matz [mailto:olivier.matz@6wind.com]
> Sent: Thursday, November 20, 2014 10:59 PM
> To: dev@dpdk.org
> Cc: olivier.matz@6wind.com; Walukiewicz, Miroslaw; Liu, Jijiang; Liu, Yong; jigsaw@gmail.com; Richardson, Bruce; Ananyev, Konstantin
> Subject: [PATCH v3 08/13] testpmd: rework csum forward engine
> 
> The csum forward engine was becoming too complex to be used and
> extended (the next commits want to add the support of TSO):
> 
> - no explaination about what the code does
> - code is not factorized, lots of code duplicated, especially between
>   ipv4/ipv6
> - user command line api: use of bitmasks that need to be calculated by
>   the user
> - the user flags don't have the same semantic:
>   - for legacy IP/UDP/TCP/SCTP, it selects software or hardware checksum
>   - for other (vxlan), it selects between hardware checksum or no
>     checksum
> - the code relies too much on flags set by the driver without software
>   alternative (ex: PKT_RX_TUNNEL_IPV4_HDR). It is nice to be able to
>   compare a software implementation with the hardware offload.
> 
> This commit tries to fix these issues, and provide a simple definition
> of what is done by the forward engine:
> 
>  * Receive a burst of packets, and for supported packet types:
>  *  - modify the IPs
>  *  - reprocess the checksum in SW or HW, depending on testpmd command line
>  *    configuration
>  * Then packets are transmitted on the output port.
>  *
>  * Supported packets are:
>  *   Ether / (vlan) / IP|IP6 / UDP|TCP|SCTP .
>  *   Ether / (vlan) / IP|IP6 / UDP / VxLAN / Ether / IP|IP6 / UDP|TCP|SCTP
>  *
>  * The network parser supposes that the packet is contiguous, which may
>  * not be the case in real life.

As far as I can see, you removed the code that sets PKT_TX_IPV4 and PKT_TX_IPV6 in ol_flags.
I think that we need to keep it.
The reason is:
With FVL, for HW TX checksum offload to work, SW is responsible for providing the HW with information about the L3 header.
Possible values are:
- IPv4 hdr with HW checksum calculation
- IPv4 hdr (checksum done by SW)
- IPv6 hdr
- unknown
So, let's say, for the packet ETHER_HDR/IPV6_HDR/TCP_HDR/DATA:
to request HW TCP checksum offload, SW has to tell the HW that it is a packet with an IPv6 header
(plus, as for ixgbe: l2_hdr_len, l3_hdr_len, l4_type, l4_hdr_len).
That's why PKT_TX_IPV4 and PKT_TX_IPV6 were introduced.
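
To make the four L3 cases above concrete, here is a rough sketch of how an application could pick the L3 part of ol_flags. The SK_* bit values and the sk_l3_tx_flags() helper are hypothetical and only stand in for PKT_TX_IP_CKSUM, PKT_TX_IPV4 and PKT_TX_IPV6; they are not the real mbuf flag values.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical bit values for illustration only; the real flags are
 * PKT_TX_IP_CKSUM, PKT_TX_IPV4 and PKT_TX_IPV6 in rte_mbuf.h */
#define SK_TX_IP_CKSUM (1ULL << 0) /* IPv4 hdr, checksum computed by HW */
#define SK_TX_IPV4     (1ULL << 1) /* IPv4 hdr, checksum done by SW */
#define SK_TX_IPV6     (1ULL << 2) /* IPv6 hdr */

/* Return the L3 type flag for a packet, given its ethertype in host
 * byte order and whether HW IPv4 checksum offload is requested
 * (hw_ip_cksum must be 0 or 1). Returns 0 for "unknown". */
static uint64_t
sk_l3_tx_flags(uint16_t ethertype, int hw_ip_cksum)
{
	assert(hw_ip_cksum == 0 || hw_ip_cksum == 1);

	if (ethertype == 0x0800) /* IPv4 */
		return hw_ip_cksum ? SK_TX_IP_CKSUM : SK_TX_IPV4;
	if (ethertype == 0x86DD) /* IPv6 */
		return SK_TX_IPV6;
	return 0; /* unknown L3 type */
}
```

The L4 checksum request flag would then be OR-ed on top of the returned value, so that the HW knows both the L3 type and which L4 checksum to compute.
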

Yes, it is a change in the public API for HW TX offload, but I don't see any other way to overcome it
(apart from making the TX function itself parse the packet, which is obviously not a good choice).
Note that existing apps working on existing HW (ixgbe/igb/em) are not affected.
However, apps that are supposed to run on FVL HW too have to follow the new convention.

So I suggest we keep setting these flags in csumonly.c.

Apart from that, the patch looks good to me.
And yes, we would need to change the way we handle TX offload for tunnelled packets.
Konstantin

> 
> Signed-off-by: Olivier Matz <olivier.matz@6wind.com>
> ---
>  app/test-pmd/cmdline.c  | 156 ++++++++---
>  app/test-pmd/config.c   |  13 +-
>  app/test-pmd/csumonly.c | 676 ++++++++++++++++++++++--------------------------
>  app/test-pmd/testpmd.h  |  17 +-
>  4 files changed, 437 insertions(+), 425 deletions(-)
> 
> diff --git a/app/test-pmd/cmdline.c b/app/test-pmd/cmdline.c
> index 4c3fc76..61e4340 100644
> --- a/app/test-pmd/cmdline.c
> +++ b/app/test-pmd/cmdline.c
> @@ -310,19 +310,19 @@ static void cmd_help_long_parsed(void *parsed_result,
>  			"    Disable hardware insertion of a VLAN header in"
>  			" packets sent on a port.\n\n"
> 
> -			"tx_checksum set (mask) (port_id)\n"
> -			"    Enable hardware insertion of checksum offload with"
> -			" the 8-bit mask, 0~0xff, in packets sent on a port.\n"
> -			"        bit 0 - insert ip   checksum offload if set\n"
> -			"        bit 1 - insert udp  checksum offload if set\n"
> -			"        bit 2 - insert tcp  checksum offload if set\n"
> -			"        bit 3 - insert sctp checksum offload if set\n"
> -			"        bit 4 - insert inner ip  checksum offload if set\n"
> -			"        bit 5 - insert inner udp checksum offload if set\n"
> -			"        bit 6 - insert inner tcp checksum offload if set\n"
> -			"        bit 7 - insert inner sctp checksum offload if set\n"
> +			"tx_cksum set (ip|udp|tcp|sctp|vxlan) (hw|sw) (port_id)\n"
> +			"    Select hardware or software calculation of the"
> +			" checksum with when transmitting a packet using the"
> +			" csum forward engine.\n"
> +			"    ip|udp|tcp|sctp always concern the inner layer.\n"
> +			"    vxlan concerns the outer IP and UDP layer (in"
> +			" case the packet is recognized as a vxlan packet by"
> +			" the forward engine)\n"
>  			"    Please check the NIC datasheet for HW limits.\n\n"
> 
> +			"tx_checksum show (port_id)\n"
> +			"    Display tx checksum offload configuration\n\n"
> +
>  			"set fwd (%s)\n"
>  			"    Set packet forwarding mode.\n\n"
> 
> @@ -2738,48 +2738,131 @@ cmdline_parse_inst_t cmd_tx_vlan_reset = {
> 
> 
>  /* *** ENABLE HARDWARE INSERTION OF CHECKSUM IN TX PACKETS *** */
> -struct cmd_tx_cksum_set_result {
> +struct cmd_tx_cksum_result {
>  	cmdline_fixed_string_t tx_cksum;
> -	cmdline_fixed_string_t set;
> -	uint8_t cksum_mask;
> +	cmdline_fixed_string_t mode;
> +	cmdline_fixed_string_t proto;
> +	cmdline_fixed_string_t hwsw;
>  	uint8_t port_id;
>  };
> 
>  static void
> -cmd_tx_cksum_set_parsed(void *parsed_result,
> +cmd_tx_cksum_parsed(void *parsed_result,
>  		       __attribute__((unused)) struct cmdline *cl,
>  		       __attribute__((unused)) void *data)
>  {
> -	struct cmd_tx_cksum_set_result *res = parsed_result;
> +	struct cmd_tx_cksum_result *res = parsed_result;
> +	int hw = 0;
> +	uint16_t ol_flags, mask = 0;
> +	struct rte_eth_dev_info dev_info;
> +
> +	if (port_id_is_invalid(res->port_id)) {
> +		printf("invalid port %d\n", res->port_id);
> +		return;
> +	}
> 
> -	tx_cksum_set(res->port_id, res->cksum_mask);
> +	if (!strcmp(res->mode, "set")) {
> +
> +		if (!strcmp(res->hwsw, "hw"))
> +			hw = 1;
> +
> +		if (!strcmp(res->proto, "ip")) {
> +			mask = TESTPMD_TX_OFFLOAD_IP_CKSUM;
> +		} else if (!strcmp(res->proto, "udp")) {
> +			mask = TESTPMD_TX_OFFLOAD_UDP_CKSUM;
> +		} else if (!strcmp(res->proto, "tcp")) {
> +			mask = TESTPMD_TX_OFFLOAD_TCP_CKSUM;
> +		} else if (!strcmp(res->proto, "sctp")) {
> +			mask = TESTPMD_TX_OFFLOAD_SCTP_CKSUM;
> +		} else if (!strcmp(res->proto, "vxlan")) {
> +			mask = TESTPMD_TX_OFFLOAD_VXLAN_CKSUM;
> +		}
> +
> +		if (hw)
> +			ports[res->port_id].tx_ol_flags |= mask;
> +		else
> +			ports[res->port_id].tx_ol_flags &= (~mask);
> +	}
> +
> +	ol_flags = ports[res->port_id].tx_ol_flags;
> +	printf("IP checksum offload is %s\n",
> +		(ol_flags & TESTPMD_TX_OFFLOAD_IP_CKSUM) ? "hw" : "sw");
> +	printf("UDP checksum offload is %s\n",
> +		(ol_flags & TESTPMD_TX_OFFLOAD_UDP_CKSUM) ? "hw" : "sw");
> +	printf("TCP checksum offload is %s\n",
> +		(ol_flags & TESTPMD_TX_OFFLOAD_TCP_CKSUM) ? "hw" : "sw");
> +	printf("SCTP checksum offload is %s\n",
> +		(ol_flags & TESTPMD_TX_OFFLOAD_SCTP_CKSUM) ? "hw" : "sw");
> +	printf("VxLAN checksum offload is %s\n",
> +		(ol_flags & TESTPMD_TX_OFFLOAD_VXLAN_CKSUM) ? "hw" : "sw");
> +
> +	/* display warnings if configuration is not supported by the NIC */
> +	rte_eth_dev_info_get(res->port_id, &dev_info);
> +	if ((ol_flags & TESTPMD_TX_OFFLOAD_IP_CKSUM) &&
> +		(dev_info.tx_offload_capa & DEV_TX_OFFLOAD_IPV4_CKSUM) == 0) {
> +		printf("Warning: hardware IP checksum enabled but not "
> +			"supported by port %d\n", res->port_id);
> +	}
> +	if ((ol_flags & TESTPMD_TX_OFFLOAD_UDP_CKSUM) &&
> +		(dev_info.tx_offload_capa & DEV_TX_OFFLOAD_UDP_CKSUM) == 0) {
> +		printf("Warning: hardware UDP checksum enabled but not "
> +			"supported by port %d\n", res->port_id);
> +	}
> +	if ((ol_flags & TESTPMD_TX_OFFLOAD_TCP_CKSUM) &&
> +		(dev_info.tx_offload_capa & DEV_TX_OFFLOAD_TCP_CKSUM) == 0) {
> +		printf("Warning: hardware TCP checksum enabled but not "
> +			"supported by port %d\n", res->port_id);
> +	}
> +	if ((ol_flags & TESTPMD_TX_OFFLOAD_SCTP_CKSUM) &&
> +		(dev_info.tx_offload_capa & DEV_TX_OFFLOAD_SCTP_CKSUM) == 0) {
> +		printf("Warning: hardware SCTP checksum enabled but not "
> +			"supported by port %d\n", res->port_id);
> +	}
>  }
> 
> -cmdline_parse_token_string_t cmd_tx_cksum_set_tx_cksum =
> -	TOKEN_STRING_INITIALIZER(struct cmd_tx_cksum_set_result,
> +cmdline_parse_token_string_t cmd_tx_cksum_tx_cksum =
> +	TOKEN_STRING_INITIALIZER(struct cmd_tx_cksum_result,
>  				tx_cksum, "tx_checksum");
> -cmdline_parse_token_string_t cmd_tx_cksum_set_set =
> -	TOKEN_STRING_INITIALIZER(struct cmd_tx_cksum_set_result,
> -				set, "set");
> -cmdline_parse_token_num_t cmd_tx_cksum_set_cksum_mask =
> -	TOKEN_NUM_INITIALIZER(struct cmd_tx_cksum_set_result,
> -				cksum_mask, UINT8);
> -cmdline_parse_token_num_t cmd_tx_cksum_set_portid =
> -	TOKEN_NUM_INITIALIZER(struct cmd_tx_cksum_set_result,
> +cmdline_parse_token_string_t cmd_tx_cksum_mode =
> +	TOKEN_STRING_INITIALIZER(struct cmd_tx_cksum_result,
> +				mode, "set");
> +cmdline_parse_token_string_t cmd_tx_cksum_proto =
> +	TOKEN_STRING_INITIALIZER(struct cmd_tx_cksum_result,
> +				proto, "ip#tcp#udp#sctp#vxlan");
> +cmdline_parse_token_string_t cmd_tx_cksum_hwsw =
> +	TOKEN_STRING_INITIALIZER(struct cmd_tx_cksum_result,
> +				hwsw, "hw#sw");
> +cmdline_parse_token_num_t cmd_tx_cksum_portid =
> +	TOKEN_NUM_INITIALIZER(struct cmd_tx_cksum_result,
>  				port_id, UINT8);
> 
>  cmdline_parse_inst_t cmd_tx_cksum_set = {
> -	.f = cmd_tx_cksum_set_parsed,
> +	.f = cmd_tx_cksum_parsed,
> +	.data = NULL,
> +	.help_str = "enable/disable hardware calculation of L3/L4 checksum when "
> +		"using csum forward engine: tx_cksum set ip|tcp|udp|sctp|vxlan hw|sw <port>",
> +	.tokens = {
> +		(void *)&cmd_tx_cksum_tx_cksum,
> +		(void *)&cmd_tx_cksum_mode,
> +		(void *)&cmd_tx_cksum_proto,
> +		(void *)&cmd_tx_cksum_hwsw,
> +		(void *)&cmd_tx_cksum_portid,
> +		NULL,
> +	},
> +};
> +
> +cmdline_parse_token_string_t cmd_tx_cksum_mode_show =
> +	TOKEN_STRING_INITIALIZER(struct cmd_tx_cksum_result,
> +				mode, "show");
> +
> +cmdline_parse_inst_t cmd_tx_cksum_show = {
> +	.f = cmd_tx_cksum_parsed,
>  	.data = NULL,
> -	.help_str = "enable hardware insertion of L3/L4checksum with a given "
> -	"mask in packets sent on a port, the bit mapping is given as, Bit 0 for ip, "
> -	"Bit 1 for UDP, Bit 2 for TCP, Bit 3 for SCTP, Bit 4 for inner ip, "
> -	"Bit 5 for inner UDP, Bit 6 for inner TCP, Bit 7 for inner SCTP",
> +	.help_str = "show checksum offload configuration: tx_cksum show <port>",
>  	.tokens = {
> -		(void *)&cmd_tx_cksum_set_tx_cksum,
> -		(void *)&cmd_tx_cksum_set_set,
> -		(void *)&cmd_tx_cksum_set_cksum_mask,
> -		(void *)&cmd_tx_cksum_set_portid,
> +		(void *)&cmd_tx_cksum_tx_cksum,
> +		(void *)&cmd_tx_cksum_mode_show,
> +		(void *)&cmd_tx_cksum_portid,
>  		NULL,
>  	},
>  };
> @@ -7796,6 +7879,7 @@ cmdline_parse_ctx_t main_ctx[] = {
>  	(cmdline_parse_inst_t *)&cmd_tx_vlan_reset,
>  	(cmdline_parse_inst_t *)&cmd_tx_vlan_set_pvid,
>  	(cmdline_parse_inst_t *)&cmd_tx_cksum_set,
> +	(cmdline_parse_inst_t *)&cmd_tx_cksum_show,
>  	(cmdline_parse_inst_t *)&cmd_link_flow_control_set,
>  	(cmdline_parse_inst_t *)&cmd_link_flow_control_set_rx,
>  	(cmdline_parse_inst_t *)&cmd_link_flow_control_set_tx,
> diff --git a/app/test-pmd/config.c b/app/test-pmd/config.c
> index 34b6fdb..d093227 100644
> --- a/app/test-pmd/config.c
> +++ b/app/test-pmd/config.c
> @@ -32,7 +32,7 @@
>   */
>  /*   BSD LICENSE
>   *
> - *   Copyright(c) 2013 6WIND.
> + *   Copyright 2013-2014 6WIND S.A.
>   *
>   *   Redistribution and use in source and binary forms, with or without
>   *   modification, are permitted provided that the following conditions
> @@ -1744,17 +1744,6 @@ set_qmap(portid_t port_id, uint8_t is_rx, uint16_t queue_id, uint8_t map_value)
>  }
> 
>  void
> -tx_cksum_set(portid_t port_id, uint64_t ol_flags)
> -{
> -	uint64_t tx_ol_flags;
> -	if (port_id_is_invalid(port_id))
> -		return;
> -	/* Clear last 8 bits and then set L3/4 checksum mask again */
> -	tx_ol_flags = ports[port_id].tx_ol_flags & (~0x0FFull);
> -	ports[port_id].tx_ol_flags = ((ol_flags & 0xff) | tx_ol_flags);
> -}
> -
> -void
>  fdir_add_signature_filter(portid_t port_id, uint8_t queue_id,
>  			  struct rte_fdir_filter *fdir_filter)
>  {
> diff --git a/app/test-pmd/csumonly.c b/app/test-pmd/csumonly.c
> index 743094a..4d6f1ee 100644
> --- a/app/test-pmd/csumonly.c
> +++ b/app/test-pmd/csumonly.c
> @@ -2,6 +2,7 @@
>   *   BSD LICENSE
>   *
>   *   Copyright(c) 2010-2014 Intel Corporation. All rights reserved.
> + *   Copyright 2014 6WIND S.A.
>   *   All rights reserved.
>   *
>   *   Redistribution and use in source and binary forms, with or without
> @@ -73,13 +74,19 @@
>  #include <rte_string_fns.h>
>  #include "testpmd.h"
> 
> -
> -
>  #define IP_DEFTTL  64   /* from RFC 1340. */
>  #define IP_VERSION 0x40
>  #define IP_HDRLEN  0x05 /* default IP header length == five 32-bits words. */
>  #define IP_VHL_DEF (IP_VERSION | IP_HDRLEN)
> 
> +/* we cannot use htons() from arpa/inet.h due to name conflicts, and we
> + * cannot use rte_cpu_to_be_16() on a constant in a switch/case */
> +#if __BYTE_ORDER == __LITTLE_ENDIAN
> +#define _htons(x) ((uint16_t)((((x) & 0x00ffU) << 8) | (((x) & 0xff00U) >> 8)))
> +#else
> +#define _htons(x) (x)
> +#endif
> +
>  static inline uint16_t
>  get_16b_sum(uint16_t *ptr16, uint32_t nr)
>  {
> @@ -112,7 +119,7 @@ get_ipv4_cksum(struct ipv4_hdr *ipv4_hdr)
> 
> 
>  static inline uint16_t
> -get_ipv4_psd_sum (struct ipv4_hdr * ip_hdr)
> +get_ipv4_psd_sum(struct ipv4_hdr *ip_hdr)
>  {
>  	/* Pseudo Header for IPv4/UDP/TCP checksum */
>  	union ipv4_psd_header {
> @@ -136,7 +143,7 @@ get_ipv4_psd_sum (struct ipv4_hdr * ip_hdr)
>  }
> 
>  static inline uint16_t
> -get_ipv6_psd_sum (struct ipv6_hdr * ip_hdr)
> +get_ipv6_psd_sum(struct ipv6_hdr *ip_hdr)
>  {
>  	/* Pseudo Header for IPv6/UDP/TCP checksum */
>  	union ipv6_psd_header {
> @@ -158,6 +165,15 @@ get_ipv6_psd_sum (struct ipv6_hdr * ip_hdr)
>  	return get_16b_sum(psd_hdr.u16_arr, sizeof(psd_hdr));
>  }
> 
> +static uint16_t
> +get_psd_sum(void *l3_hdr, uint16_t ethertype)
> +{
> +	if (ethertype == _htons(ETHER_TYPE_IPv4))
> +		return get_ipv4_psd_sum(l3_hdr);
> +	else /* assume ethertype == ETHER_TYPE_IPv6 */
> +		return get_ipv6_psd_sum(l3_hdr);
> +}
> +
>  static inline uint16_t
>  get_ipv4_udptcp_checksum(struct ipv4_hdr *ipv4_hdr, uint16_t *l4_hdr)
>  {
> @@ -174,7 +190,6 @@ get_ipv4_udptcp_checksum(struct ipv4_hdr *ipv4_hdr, uint16_t *l4_hdr)
>  	if (cksum == 0)
>  		cksum = 0xffff;
>  	return (uint16_t)cksum;
> -
>  }
> 
>  static inline uint16_t
> @@ -196,48 +211,225 @@ get_ipv6_udptcp_checksum(struct ipv6_hdr *ipv6_hdr, uint16_t *l4_hdr)
>  	return (uint16_t)cksum;
>  }
> 
> +static uint16_t
> +get_udptcp_checksum(void *l3_hdr, void *l4_hdr, uint16_t ethertype)
> +{
> +	if (ethertype == _htons(ETHER_TYPE_IPv4))
> +		return get_ipv4_udptcp_checksum(l3_hdr, l4_hdr);
> +	else /* assume ethertype == ETHER_TYPE_IPv6 */
> +		return get_ipv6_udptcp_checksum(l3_hdr, l4_hdr);
> +}
> 
>  /*
> - * Forwarding of packets. Change the checksum field with HW or SW methods
> - * The HW/SW method selection depends on the ol_flags on every packet
> + * Parse an ethernet header to fill the ethertype, l2_len, l3_len and
> + * ipproto. This function is able to recognize IPv4/IPv6 with one optional vlan
> + * header.
> + */
> +static void
> +parse_ethernet(struct ether_hdr *eth_hdr, uint16_t *ethertype, uint16_t *l2_len,
> +	uint16_t *l3_len, uint8_t *l4_proto)
> +{
> +	struct ipv4_hdr *ipv4_hdr;
> +	struct ipv6_hdr *ipv6_hdr;
> +
> +	*l2_len = sizeof(struct ether_hdr);
> +	*ethertype = eth_hdr->ether_type;
> +
> +	if (*ethertype == _htons(ETHER_TYPE_VLAN)) {
> +		struct vlan_hdr *vlan_hdr = (struct vlan_hdr *)(eth_hdr + 1);
> +
> +		*l2_len  += sizeof(struct vlan_hdr);
> +		*ethertype = vlan_hdr->eth_proto;
> +	}
> +
> +	switch (*ethertype) {
> +	case _htons(ETHER_TYPE_IPv4):
> +		ipv4_hdr = (struct ipv4_hdr *) ((char *)eth_hdr + *l2_len);
> +		*l3_len = (ipv4_hdr->version_ihl & 0x0f) * 4;
> +		*l4_proto = ipv4_hdr->next_proto_id;
> +		break;
> +	case _htons(ETHER_TYPE_IPv6):
> +		ipv6_hdr = (struct ipv6_hdr *) ((char *)eth_hdr + *l2_len);
> +		*l3_len = sizeof(struct ipv6_hdr) ;
> +		*l4_proto = ipv6_hdr->proto;
> +		break;
> +	default:
> +		*l3_len = 0;
> +		*l4_proto = 0;
> +		break;
> +	}
> +}
> +
> +/* modify the IPv4 or IPv6 source address of a packet */
> +static void
> +change_ip_addresses(void *l3_hdr, uint16_t ethertype)
> +{
> +	struct ipv4_hdr *ipv4_hdr = l3_hdr;
> +	struct ipv6_hdr *ipv6_hdr = l3_hdr;
> +
> +	if (ethertype == _htons(ETHER_TYPE_IPv4)) {
> +		ipv4_hdr->src_addr =
> +			rte_cpu_to_be_32(rte_be_to_cpu_32(ipv4_hdr->src_addr) + 1);
> +	}
> +	else if (ethertype == _htons(ETHER_TYPE_IPv6)) {
> +		ipv6_hdr->src_addr[15] = ipv6_hdr->src_addr[15] + 1;
> +	}
> +}
> +
> +/* if possible, calculate the checksum of a packet in hw or sw,
> + * depending on the testpmd command line configuration */
> +static uint64_t
> +process_inner_cksums(void *l3_hdr, uint16_t ethertype, uint16_t l3_len,
> +	uint8_t l4_proto, uint16_t testpmd_ol_flags)
> +{
> +	struct ipv4_hdr *ipv4_hdr = l3_hdr;
> +	struct udp_hdr *udp_hdr;
> +	struct tcp_hdr *tcp_hdr;
> +	struct sctp_hdr *sctp_hdr;
> +	uint64_t ol_flags = 0;
> +
> +	if (ethertype == _htons(ETHER_TYPE_IPv4)) {
> +		ipv4_hdr = l3_hdr;
> +		ipv4_hdr->hdr_checksum = 0;
> +
> +		if (testpmd_ol_flags & TESTPMD_TX_OFFLOAD_IP_CKSUM)
> +			ol_flags |= PKT_TX_IP_CKSUM;
> +		else
> +			ipv4_hdr->hdr_checksum = get_ipv4_cksum(ipv4_hdr);
> +
> +	}
> +	else if (ethertype != _htons(ETHER_TYPE_IPv6))
> +		return 0; /* packet type not supported nothing to do */
> +
> +	if (l4_proto == IPPROTO_UDP) {
> +		udp_hdr = (struct udp_hdr *)((char *)l3_hdr + l3_len);
> +		/* do not recalculate udp cksum if it was 0 */
> +		if (udp_hdr->dgram_cksum != 0) {
> +			udp_hdr->dgram_cksum = 0;
> +			if (testpmd_ol_flags & TESTPMD_TX_OFFLOAD_UDP_CKSUM) {
> +				ol_flags |= PKT_TX_UDP_CKSUM;
> +				udp_hdr->dgram_cksum = get_psd_sum(l3_hdr,
> +					ethertype);
> +			}
> +			else {
> +				udp_hdr->dgram_cksum =
> +					get_udptcp_checksum(l3_hdr, udp_hdr,
> +						ethertype);
> +			}
> +		}
> +	}
> +	else if (l4_proto == IPPROTO_TCP) {
> +		tcp_hdr = (struct tcp_hdr *)((char *)l3_hdr + l3_len);
> +		tcp_hdr->cksum = 0;
> +		if (testpmd_ol_flags & TESTPMD_TX_OFFLOAD_TCP_CKSUM) {
> +			ol_flags |= PKT_TX_TCP_CKSUM;
> +			tcp_hdr->cksum = get_psd_sum(l3_hdr, ethertype);
> +		}
> +		else {
> +			tcp_hdr->cksum =
> +				get_udptcp_checksum(l3_hdr, tcp_hdr, ethertype);
> +		}
> +	}
> +	else if (l4_proto == IPPROTO_SCTP) {
> +		sctp_hdr = (struct sctp_hdr *)((char *)l3_hdr + l3_len);
> +		sctp_hdr->cksum = 0;
> +		/* sctp payload must be a multiple of 4 to be
> +		 * offloaded */
> +		if ((testpmd_ol_flags & TESTPMD_TX_OFFLOAD_SCTP_CKSUM) &&
> +			((ipv4_hdr->total_length & 0x3) == 0)) {
> +			ol_flags |= PKT_TX_SCTP_CKSUM;
> +		}
> +		else {
> +			/* XXX implement CRC32c, example available in
> +			 * RFC3309 */
> +		}
> +	}
> +
> +	return ol_flags;
> +}
> +
> +/* Calculate the checksum of outer header (only vxlan is supported,
> + * meaning IP + UDP). The caller already checked that it's a vxlan
> + * packet */
> +static uint64_t
> +process_outer_cksums(void *outer_l3_hdr, uint16_t outer_ethertype,
> +	uint16_t outer_l3_len, uint16_t testpmd_ol_flags)
> +{
> +	struct ipv4_hdr *ipv4_hdr = outer_l3_hdr;
> +	struct ipv6_hdr *ipv6_hdr = outer_l3_hdr;
> +	struct udp_hdr *udp_hdr;
> +	uint64_t ol_flags = 0;
> +
> +	if (testpmd_ol_flags & TESTPMD_TX_OFFLOAD_VXLAN_CKSUM)
> +		ol_flags |= PKT_TX_VXLAN_CKSUM;
> +
> +	if (outer_ethertype == _htons(ETHER_TYPE_IPv4)) {
> +		ipv4_hdr->hdr_checksum = 0;
> +
> +		if ((testpmd_ol_flags & TESTPMD_TX_OFFLOAD_VXLAN_CKSUM) == 0)
> +			ipv4_hdr->hdr_checksum = get_ipv4_cksum(ipv4_hdr);
> +	}
> +
> +	udp_hdr = (struct udp_hdr *)((char *)outer_l3_hdr + outer_l3_len);
> +	/* do not recalculate udp cksum if it was 0 */
> +	if (udp_hdr->dgram_cksum != 0) {
> +		udp_hdr->dgram_cksum = 0;
> +		if ((testpmd_ol_flags & TESTPMD_TX_OFFLOAD_VXLAN_CKSUM) == 0) {
> +			if (outer_ethertype == _htons(ETHER_TYPE_IPv4))
> +				udp_hdr->dgram_cksum =
> +					get_ipv4_udptcp_checksum(ipv4_hdr,
> +						(uint16_t *)udp_hdr);
> +			else
> +				udp_hdr->dgram_cksum =
> +					get_ipv6_udptcp_checksum(ipv6_hdr,
> +						(uint16_t *)udp_hdr);
> +		}
> +	}
> +
> +	return ol_flags;
> +}
> +
> +/*
> + * Receive a burst of packets, and for each packet:
> + *  - parse packet, and try to recognize a supported packet type (1)
> + *  - if it's not a supported packet type, don't touch the packet, else:
> + *  - modify the IPs in inner headers and in outer headers if any
> + *  - reprocess the checksum of all supported layers. This is done in SW
> + *    or HW, depending on testpmd command line configuration
> + * Then transmit packets on the output port.
> + *
> + * (1) Supported packets are:
> + *   Ether / (vlan) / IP|IP6 / UDP|TCP|SCTP .
> + *   Ether / (vlan) / outer IP|IP6 / outer UDP / VxLAN / Ether / IP|IP6 /
> + *           UDP|TCP|SCTP
> + *
> + * The testpmd command line for this forward engine sets the flags
> + * TESTPMD_TX_OFFLOAD_* in ports[tx_port].tx_ol_flags. They control
> + * whether a checksum must be calculated in software or in hardware. The
> + * IP, UDP, TCP and SCTP flags always concern the inner layer.  The
> + * VxLAN flag concerns the outer IP and UDP layer (if packet is
> + * recognized as a vxlan packet).
>   */
>  static void
>  pkt_burst_checksum_forward(struct fwd_stream *fs)
>  {
> -	struct rte_mbuf  *pkts_burst[MAX_PKT_BURST];
> -	struct rte_port  *txp;
> -	struct rte_mbuf  *mb;
> +	struct rte_mbuf *pkts_burst[MAX_PKT_BURST];
> +	struct rte_port *txp;
> +	struct rte_mbuf *m;
>  	struct ether_hdr *eth_hdr;
> -	struct ipv4_hdr  *ipv4_hdr;
> -	struct ether_hdr *inner_eth_hdr;
> -	struct ipv4_hdr  *inner_ipv4_hdr = NULL;
> -	struct ipv6_hdr  *ipv6_hdr;
> -	struct ipv6_hdr  *inner_ipv6_hdr = NULL;
> -	struct udp_hdr   *udp_hdr;
> -	struct udp_hdr   *inner_udp_hdr;
> -	struct tcp_hdr   *tcp_hdr;
> -	struct tcp_hdr   *inner_tcp_hdr;
> -	struct sctp_hdr  *sctp_hdr;
> -	struct sctp_hdr  *inner_sctp_hdr;
> -
> +	void *l3_hdr = NULL, *outer_l3_hdr = NULL; /* can be IPv4 or IPv6 */
> +	struct udp_hdr *udp_hdr;
>  	uint16_t nb_rx;
>  	uint16_t nb_tx;
>  	uint16_t i;
>  	uint64_t ol_flags;
> -	uint64_t pkt_ol_flags;
> -	uint64_t tx_ol_flags;
> -	uint16_t l4_proto;
> -	uint16_t inner_l4_proto = 0;
> -	uint16_t eth_type;
> -	uint8_t  l2_len;
> -	uint8_t  l3_len;
> -	uint8_t  inner_l2_len = 0;
> -	uint8_t  inner_l3_len = 0;
> -
> +	uint16_t testpmd_ol_flags;
> +	uint8_t l4_proto;
> +	uint16_t ethertype = 0, outer_ethertype = 0;
> +	uint16_t  l2_len = 0, l3_len = 0, outer_l2_len = 0, outer_l3_len = 0;
> +	int tunnel = 0;
>  	uint32_t rx_bad_ip_csum;
>  	uint32_t rx_bad_l4_csum;
> -	uint8_t  ipv4_tunnel;
> -	uint8_t  ipv6_tunnel;
> 
>  #ifdef RTE_TEST_PMD_RECORD_CORE_CYCLES
>  	uint64_t start_tsc;
> @@ -249,9 +441,7 @@ pkt_burst_checksum_forward(struct fwd_stream *fs)
>  	start_tsc = rte_rdtsc();
>  #endif
> 
> -	/*
> -	 * Receive a burst of packets and forward them.
> -	 */
> +	/* receive a burst of packet */
>  	nb_rx = rte_eth_rx_burst(fs->rx_port, fs->rx_queue, pkts_burst,
>  				 nb_pkt_per_burst);
>  	if (unlikely(nb_rx == 0))
> @@ -265,348 +455,107 @@ pkt_burst_checksum_forward(struct fwd_stream *fs)
>  	rx_bad_l4_csum = 0;
> 
>  	txp = &ports[fs->tx_port];
> -	tx_ol_flags = txp->tx_ol_flags;
> +	testpmd_ol_flags = txp->tx_ol_flags;
> 
>  	for (i = 0; i < nb_rx; i++) {
> 
> -		mb = pkts_burst[i];
> -		l2_len  = sizeof(struct ether_hdr);
> -		pkt_ol_flags = mb->ol_flags;
> -		ol_flags = (pkt_ol_flags & (~PKT_TX_L4_MASK));
> -		ipv4_tunnel = (pkt_ol_flags & PKT_RX_TUNNEL_IPV4_HDR) ?
> -				1 : 0;
> -		ipv6_tunnel = (pkt_ol_flags & PKT_RX_TUNNEL_IPV6_HDR) ?
> -				1 : 0;
> -		eth_hdr = rte_pktmbuf_mtod(mb, struct ether_hdr *);
> -		eth_type = rte_be_to_cpu_16(eth_hdr->ether_type);
> -		if (eth_type == ETHER_TYPE_VLAN) {
> -			/* Only allow single VLAN label here */
> -			l2_len  += sizeof(struct vlan_hdr);
> -			 eth_type = rte_be_to_cpu_16(*(uint16_t *)
> -				((uintptr_t)&eth_hdr->ether_type +
> -				sizeof(struct vlan_hdr)));
> +		ol_flags = 0;
> +		tunnel = 0;
> +		m = pkts_burst[i];
> +
> +		/* Update the L3/L4 checksum error packet statistics */
> +		rx_bad_ip_csum += ((m->ol_flags & PKT_RX_IP_CKSUM_BAD) != 0);
> +		rx_bad_l4_csum += ((m->ol_flags & PKT_RX_L4_CKSUM_BAD) != 0);
> +
> +		/* step 1: dissect packet, parsing optional vlan, ip4/ip6, vxlan
> +		 * and inner headers */
> +
> +		eth_hdr = rte_pktmbuf_mtod(m, struct ether_hdr *);
> +		parse_ethernet(eth_hdr, &ethertype, &l2_len, &l3_len, &l4_proto);
> +		l3_hdr = (char *)eth_hdr + l2_len;
> +
> +		/* check if it's a supported tunnel (only vxlan for now) */
> +		if (l4_proto == IPPROTO_UDP) {
> +			udp_hdr = (struct udp_hdr *)((char *)l3_hdr + l3_len);
> +
> +			/* currently, this flag is set by i40e only if the
> +			 * packet is vxlan */
> +			if (((m->ol_flags & PKT_RX_TUNNEL_IPV4_HDR) ||
> +					(m->ol_flags & PKT_RX_TUNNEL_IPV6_HDR)))
> +				tunnel = 1;
> +			/* else check udp destination port, 4789 is the default
> +			 * vxlan port (rfc7348) */
> +			else if (udp_hdr->dst_port == _htons(4789))
> +				tunnel = 1;
> +
> +			if (tunnel == 1) {
> +				outer_ethertype = ethertype;
> +				outer_l2_len = l2_len;
> +				outer_l3_len = l3_len;
> +				outer_l3_hdr = l3_hdr;
> +
> +				eth_hdr = (struct ether_hdr *)((char *)udp_hdr +
> +					sizeof(struct udp_hdr) +
> +					sizeof(struct vxlan_hdr));
> +
> +				parse_ethernet(eth_hdr, &ethertype, &l2_len,
> +					&l3_len, &l4_proto);
> +				l3_hdr = (char *)eth_hdr + l2_len;
> +			}
>  		}
> 
> -		/* Update the L3/L4 checksum error packet count  */
> -		rx_bad_ip_csum += (uint16_t) ((pkt_ol_flags & PKT_RX_IP_CKSUM_BAD) != 0);
> -		rx_bad_l4_csum += (uint16_t) ((pkt_ol_flags & PKT_RX_L4_CKSUM_BAD) != 0);
> -
> -		/*
> -		 * Try to figure out L3 packet type by SW.
> -		 */
> -		if ((pkt_ol_flags & (PKT_RX_IPV4_HDR | PKT_RX_IPV4_HDR_EXT |
> -				PKT_RX_IPV6_HDR | PKT_RX_IPV6_HDR_EXT)) == 0) {
> -			if (eth_type == ETHER_TYPE_IPv4)
> -				pkt_ol_flags |= PKT_RX_IPV4_HDR;
> -			else if (eth_type == ETHER_TYPE_IPv6)
> -				pkt_ol_flags |= PKT_RX_IPV6_HDR;
> -		}
> +		/* step 2: change all source IPs (v4 or v6) so we need
> +		 * to recompute the chksums even if they were correct */
> 
> -		/*
> -		 * Simplify the protocol parsing
> -		 * Assuming the incoming packets format as
> -		 *      Ethernet2 + optional single VLAN
> -		 *      + ipv4 or ipv6
> -		 *      + udp or tcp or sctp or others
> -		 */
> -		if (pkt_ol_flags & (PKT_RX_IPV4_HDR | PKT_RX_TUNNEL_IPV4_HDR)) {
> +		change_ip_addresses(l3_hdr, ethertype);
> +		if (tunnel == 1)
> +			change_ip_addresses(outer_l3_hdr, outer_ethertype);
> 
> -			/* Do not support ipv4 option field */
> -			l3_len = sizeof(struct ipv4_hdr) ;
> +		/* step 3: depending on user command line configuration,
> +		 * recompute checksum either in software or flag the
> +		 * mbuf to offload the calculation to the NIC */
> 
> -			ipv4_hdr = (struct ipv4_hdr *) (rte_pktmbuf_mtod(mb,
> -					unsigned char *) + l2_len);
> +		/* process checksums of inner headers first */
> +		ol_flags |= process_inner_cksums(l3_hdr, ethertype,
> +			l3_len, l4_proto, testpmd_ol_flags);
> 
> -			l4_proto = ipv4_hdr->next_proto_id;
> +		/* Then process outer headers if any. Note that the software
> +		 * checksum will be wrong if one of the inner checksums is
> +		 * processed in hardware. */
> +		if (tunnel == 1) {
> +			ol_flags |= process_outer_cksums(outer_l3_hdr,
> +				outer_ethertype, outer_l3_len, testpmd_ol_flags);
> +		}
> 
> -			/* Do not delete, this is required by HW*/
> -			ipv4_hdr->hdr_checksum = 0;
> +		/* step 4: fill the mbuf meta data (flags and header lengths) */
> 
> -			if (tx_ol_flags & TESTPMD_TX_OFFLOAD_IP_CKSUM) {
> -				/* HW checksum */
> -				ol_flags |= PKT_TX_IP_CKSUM;
> +		if (tunnel == 1) {
> +			if (testpmd_ol_flags & TESTPMD_TX_OFFLOAD_VXLAN_CKSUM) {
> +				m->l2_len = outer_l2_len;
> +				m->l3_len = outer_l3_len;
> +				m->inner_l2_len = l2_len;
> +				m->inner_l3_len = l3_len;
>  			}
>  			else {
> -				ol_flags |= PKT_TX_IPV4;
> -				/* SW checksum calculation */
> -				ipv4_hdr->src_addr++;
> -				ipv4_hdr->hdr_checksum = get_ipv4_cksum(ipv4_hdr);
> +				/* if we don't do vxlan cksum in hw,
> +				   outer checksum will be wrong because
> +				   we changed the ip, but it shows that
> +				   we can process the inner header cksum
> +				   in the nic */
> +				m->l2_len = outer_l2_len + outer_l3_len +
> +					sizeof(struct udp_hdr) +
> +					sizeof(struct vxlan_hdr) + l2_len;
> +				m->l3_len = l3_len;
>  			}
> -
> -			if (l4_proto == IPPROTO_UDP) {
> -				udp_hdr = (struct udp_hdr*) (rte_pktmbuf_mtod(mb,
> -						unsigned char *) + l2_len + l3_len);
> -				if (tx_ol_flags & TESTPMD_TX_OFFLOAD_UDP_CKSUM) {
> -					/* HW Offload */
> -					ol_flags |= PKT_TX_UDP_CKSUM;
> -					if (ipv4_tunnel)
> -						udp_hdr->dgram_cksum = 0;
> -					else
> -						/* Pseudo header sum need be set properly */
> -						udp_hdr->dgram_cksum =
> -							get_ipv4_psd_sum(ipv4_hdr);
> -				}
> -				else {
> -					/* SW Implementation, clear checksum field first */
> -					udp_hdr->dgram_cksum = 0;
> -					udp_hdr->dgram_cksum = get_ipv4_udptcp_checksum(ipv4_hdr,
> -									(uint16_t *)udp_hdr);
> -				}
> -
> -				if (ipv4_tunnel) {
> -
> -					uint16_t len;
> -
> -					/* Check if inner L3/L4 checkum flag is set */
> -					if (tx_ol_flags & TESTPMD_TX_OFFLOAD_INNER_CKSUM_MASK)
> -						ol_flags |= PKT_TX_VXLAN_CKSUM;
> -
> -					inner_l2_len  = sizeof(struct ether_hdr);
> -					inner_eth_hdr = (struct ether_hdr *) (rte_pktmbuf_mtod(mb,
> -								unsigned char *) + l2_len + l3_len
> -								 + ETHER_VXLAN_HLEN);
> -
> -					eth_type = rte_be_to_cpu_16(inner_eth_hdr->ether_type);
> -					if (eth_type == ETHER_TYPE_VLAN) {
> -						inner_l2_len += sizeof(struct vlan_hdr);
> -						eth_type = rte_be_to_cpu_16(*(uint16_t *)
> -							((uintptr_t)&eth_hdr->ether_type +
> -								sizeof(struct vlan_hdr)));
> -					}
> -
> -					len = l2_len + l3_len + ETHER_VXLAN_HLEN + inner_l2_len;
> -					if (eth_type == ETHER_TYPE_IPv4) {
> -						inner_l3_len = sizeof(struct ipv4_hdr);
> -						inner_ipv4_hdr = (struct ipv4_hdr *) (rte_pktmbuf_mtod(mb,
> -								unsigned char *) + len);
> -						inner_l4_proto = inner_ipv4_hdr->next_proto_id;
> -
> -						if (tx_ol_flags & TESTPMD_TX_OFFLOAD_INNER_IP_CKSUM) {
> -
> -							/* Do not delete, this is required by HW*/
> -							inner_ipv4_hdr->hdr_checksum = 0;
> -							ol_flags |= PKT_TX_IPV4_CSUM;
> -						}
> -
> -					} else if (eth_type == ETHER_TYPE_IPv6) {
> -						inner_l3_len = sizeof(struct ipv6_hdr);
> -						inner_ipv6_hdr = (struct ipv6_hdr *) (rte_pktmbuf_mtod(mb,
> -								unsigned char *) + len);
> -						inner_l4_proto = inner_ipv6_hdr->proto;
> -					}
> -					if ((inner_l4_proto == IPPROTO_UDP) &&
> -						(tx_ol_flags & TESTPMD_TX_OFFLOAD_INNER_UDP_CKSUM)) {
> -
> -						/* HW Offload */
> -						ol_flags |= PKT_TX_UDP_CKSUM;
> -						inner_udp_hdr = (struct udp_hdr *) (rte_pktmbuf_mtod(mb,
> -								unsigned char *) + len + inner_l3_len);
> -						if (eth_type == ETHER_TYPE_IPv4)
> -							inner_udp_hdr->dgram_cksum = get_ipv4_psd_sum(inner_ipv4_hdr);
> -						else if (eth_type == ETHER_TYPE_IPv6)
> -							inner_udp_hdr->dgram_cksum = get_ipv6_psd_sum(inner_ipv6_hdr);
> -
> -					} else if ((inner_l4_proto == IPPROTO_TCP) &&
> -						(tx_ol_flags & TESTPMD_TX_OFFLOAD_INNER_TCP_CKSUM)) {
> -						/* HW Offload */
> -						ol_flags |= PKT_TX_TCP_CKSUM;
> -						inner_tcp_hdr = (struct tcp_hdr *) (rte_pktmbuf_mtod(mb,
> -								unsigned char *) + len + inner_l3_len);
> -						if (eth_type == ETHER_TYPE_IPv4)
> -							inner_tcp_hdr->cksum = get_ipv4_psd_sum(inner_ipv4_hdr);
> -						else if (eth_type == ETHER_TYPE_IPv6)
> -							inner_tcp_hdr->cksum = get_ipv6_psd_sum(inner_ipv6_hdr);
> -					} else if ((inner_l4_proto == IPPROTO_SCTP) &&
> -						(tx_ol_flags & TESTPMD_TX_OFFLOAD_INNER_SCTP_CKSUM)) {
> -						/* HW Offload */
> -						ol_flags |= PKT_TX_SCTP_CKSUM;
> -						inner_sctp_hdr = (struct sctp_hdr *) (rte_pktmbuf_mtod(mb,
> -								unsigned char *) + len + inner_l3_len);
> -						inner_sctp_hdr->cksum = 0;
> -					}
> -
> -				}
> -
> -			} else if (l4_proto == IPPROTO_TCP) {
> -				tcp_hdr = (struct tcp_hdr*) (rte_pktmbuf_mtod(mb,
> -						unsigned char *) + l2_len + l3_len);
> -				if (tx_ol_flags & TESTPMD_TX_OFFLOAD_TCP_CKSUM) {
> -					ol_flags |= PKT_TX_TCP_CKSUM;
> -					tcp_hdr->cksum = get_ipv4_psd_sum(ipv4_hdr);
> -				}
> -				else {
> -					tcp_hdr->cksum = 0;
> -					tcp_hdr->cksum = get_ipv4_udptcp_checksum(ipv4_hdr,
> -							(uint16_t*)tcp_hdr);
> -				}
> -			} else if (l4_proto == IPPROTO_SCTP) {
> -				sctp_hdr = (struct sctp_hdr*) (rte_pktmbuf_mtod(mb,
> -						unsigned char *) + l2_len + l3_len);
> -
> -				if (tx_ol_flags & TESTPMD_TX_OFFLOAD_SCTP_CKSUM) {
> -					ol_flags |= PKT_TX_SCTP_CKSUM;
> -					sctp_hdr->cksum = 0;
> -
> -					/* Sanity check, only number of 4 bytes supported */
> -					if ((rte_be_to_cpu_16(ipv4_hdr->total_length) % 4) != 0)
> -						printf("sctp payload must be a multiple "
> -							"of 4 bytes for checksum offload");
> -				}
> -				else {
> -					sctp_hdr->cksum = 0;
> -					/* CRC32c sample code available in RFC3309 */
> -				}
> -			}
> -			/* End of L4 Handling*/
> -		} else if (pkt_ol_flags & (PKT_RX_IPV6_HDR | PKT_RX_TUNNEL_IPV6_HDR)) {
> -			ipv6_hdr = (struct ipv6_hdr *) (rte_pktmbuf_mtod(mb,
> -					unsigned char *) + l2_len);
> -			l3_len = sizeof(struct ipv6_hdr) ;
> -			l4_proto = ipv6_hdr->proto;
> -			ol_flags |= PKT_TX_IPV6;
> -
> -			if (l4_proto == IPPROTO_UDP) {
> -				udp_hdr = (struct udp_hdr*) (rte_pktmbuf_mtod(mb,
> -						unsigned char *) + l2_len + l3_len);
> -				if (tx_ol_flags & TESTPMD_TX_OFFLOAD_UDP_CKSUM) {
> -					/* HW Offload */
> -					ol_flags |= PKT_TX_UDP_CKSUM;
> -					if (ipv6_tunnel)
> -						udp_hdr->dgram_cksum = 0;
> -					else
> -						udp_hdr->dgram_cksum =
> -							get_ipv6_psd_sum(ipv6_hdr);
> -				}
> -				else {
> -					/* SW Implementation */
> -					/* checksum field need be clear first */
> -					udp_hdr->dgram_cksum = 0;
> -					udp_hdr->dgram_cksum = get_ipv6_udptcp_checksum(ipv6_hdr,
> -								(uint16_t *)udp_hdr);
> -				}
> -
> -				if (ipv6_tunnel) {
> -
> -					uint16_t len;
> -
> -					/* Check if inner L3/L4 checksum flag is set */
> -					if (tx_ol_flags & TESTPMD_TX_OFFLOAD_INNER_CKSUM_MASK)
> -						ol_flags |= PKT_TX_VXLAN_CKSUM;
> -
> -					inner_l2_len  = sizeof(struct ether_hdr);
> -					inner_eth_hdr = (struct ether_hdr *) (rte_pktmbuf_mtod(mb,
> -						unsigned char *) + l2_len + l3_len + ETHER_VXLAN_HLEN);
> -					eth_type = rte_be_to_cpu_16(inner_eth_hdr->ether_type);
> -
> -					if (eth_type == ETHER_TYPE_VLAN) {
> -						inner_l2_len += sizeof(struct vlan_hdr);
> -						eth_type = rte_be_to_cpu_16(*(uint16_t *)
> -							((uintptr_t)&eth_hdr->ether_type +
> -							sizeof(struct vlan_hdr)));
> -					}
> -
> -					len = l2_len + l3_len + ETHER_VXLAN_HLEN + inner_l2_len;
> -
> -					if (eth_type == ETHER_TYPE_IPv4) {
> -						inner_l3_len = sizeof(struct ipv4_hdr);
> -						inner_ipv4_hdr = (struct ipv4_hdr *) (rte_pktmbuf_mtod(mb,
> -								unsigned char *) + len);
> -						inner_l4_proto = inner_ipv4_hdr->next_proto_id;
> -
> -						/* HW offload */
> -						if (tx_ol_flags & TESTPMD_TX_OFFLOAD_INNER_IP_CKSUM) {
> -
> -							/* Do not delete, this is required by HW*/
> -							inner_ipv4_hdr->hdr_checksum = 0;
> -							ol_flags |= PKT_TX_IPV4_CSUM;
> -						}
> -					} else if (eth_type == ETHER_TYPE_IPv6) {
> -						inner_l3_len = sizeof(struct ipv6_hdr);
> -						inner_ipv6_hdr = (struct ipv6_hdr *) (rte_pktmbuf_mtod(mb,
> -							unsigned char *) + len);
> -						inner_l4_proto = inner_ipv6_hdr->proto;
> -					}
> -
> -					if ((inner_l4_proto == IPPROTO_UDP) &&
> -						(tx_ol_flags & TESTPMD_TX_OFFLOAD_INNER_UDP_CKSUM)) {
> -						inner_udp_hdr = (struct udp_hdr *) (rte_pktmbuf_mtod(mb,
> -							unsigned char *) + len + inner_l3_len);
> -						/* HW offload */
> -						ol_flags |= PKT_TX_UDP_CKSUM;
> -						inner_udp_hdr->dgram_cksum = 0;
> -						if (eth_type == ETHER_TYPE_IPv4)
> -							inner_udp_hdr->dgram_cksum = get_ipv4_psd_sum(inner_ipv4_hdr);
> -						else if (eth_type == ETHER_TYPE_IPv6)
> -							inner_udp_hdr->dgram_cksum = get_ipv6_psd_sum(inner_ipv6_hdr);
> -					} else if ((inner_l4_proto == IPPROTO_TCP) &&
> -						(tx_ol_flags & TESTPMD_TX_OFFLOAD_INNER_TCP_CKSUM)) {
> -						/* HW offload */
> -						ol_flags |= PKT_TX_TCP_CKSUM;
> -						inner_tcp_hdr = (struct tcp_hdr *) (rte_pktmbuf_mtod(mb,
> -								unsigned char *) + len + inner_l3_len);
> -
> -						if (eth_type == ETHER_TYPE_IPv4)
> -							inner_tcp_hdr->cksum = get_ipv4_psd_sum(inner_ipv4_hdr);
> -						else if (eth_type == ETHER_TYPE_IPv6)
> -							inner_tcp_hdr->cksum = get_ipv6_psd_sum(inner_ipv6_hdr);
> -
> -					} else if ((inner_l4_proto == IPPROTO_SCTP) &&
> -						(tx_ol_flags & TESTPMD_TX_OFFLOAD_INNER_SCTP_CKSUM)) {
> -						/* HW offload */
> -						ol_flags |= PKT_TX_SCTP_CKSUM;
> -						inner_sctp_hdr = (struct sctp_hdr *) (rte_pktmbuf_mtod(mb,
> -								unsigned char *) + len + inner_l3_len);
> -						inner_sctp_hdr->cksum = 0;
> -					}
> -
> -				}
> -
> -			}
> -			else if (l4_proto == IPPROTO_TCP) {
> -				tcp_hdr = (struct tcp_hdr*) (rte_pktmbuf_mtod(mb,
> -						unsigned char *) + l2_len + l3_len);
> -				if (tx_ol_flags & TESTPMD_TX_OFFLOAD_TCP_CKSUM) {
> -					ol_flags |= PKT_TX_TCP_CKSUM;
> -					tcp_hdr->cksum = get_ipv6_psd_sum(ipv6_hdr);
> -				}
> -				else {
> -					tcp_hdr->cksum = 0;
> -					tcp_hdr->cksum = get_ipv6_udptcp_checksum(ipv6_hdr,
> -							(uint16_t*)tcp_hdr);
> -				}
> -			}
> -			else if (l4_proto == IPPROTO_SCTP) {
> -				sctp_hdr = (struct sctp_hdr*) (rte_pktmbuf_mtod(mb,
> -						unsigned char *) + l2_len + l3_len);
> -
> -				if (tx_ol_flags & TESTPMD_TX_OFFLOAD_SCTP_CKSUM) {
> -					ol_flags |= PKT_TX_SCTP_CKSUM;
> -					sctp_hdr->cksum = 0;
> -					/* Sanity check, only number of 4 bytes supported by HW */
> -					if ((rte_be_to_cpu_16(ipv6_hdr->payload_len) % 4) != 0)
> -						printf("sctp payload must be a multiple "
> -							"of 4 bytes for checksum offload");
> -				}
> -				else {
> -					/* CRC32c sample code available in RFC3309 */
> -					sctp_hdr->cksum = 0;
> -				}
> -			} else {
> -				printf("Test flow control for 1G PMD \n");
> -			}
> -			/* End of L6 Handling*/
> -		}
> -		else {
> -			l3_len = 0;
> -			printf("Unhandled packet type: %#hx\n", eth_type);
> +		} else {
> +			/* this is only useful if an offload flag is
> +			 * set, but it does not hurt to fill it in any
> +			 * case */
> +			m->l2_len = l2_len;
> +			m->l3_len = l3_len;
>  		}
> +		m->ol_flags = ol_flags;
> 
> -		/* Combine the packet header write. VLAN is not consider here */
> -		mb->l2_len = l2_len;
> -		mb->l3_len = l3_len;
> -		mb->inner_l2_len = inner_l2_len;
> -		mb->inner_l3_len = inner_l3_len;
> -		mb->ol_flags = ol_flags;
>  	}
>  	nb_tx = rte_eth_tx_burst(fs->tx_port, fs->tx_queue, pkts_burst, nb_rx);
>  	fs->tx_packets += nb_tx;
> @@ -629,7 +578,6 @@ pkt_burst_checksum_forward(struct fwd_stream *fs)
>  #endif
>  }
> 
> -
>  struct fwd_engine csum_fwd_engine = {
>  	.fwd_mode_name  = "csum",
>  	.port_fwd_begin = NULL,
> diff --git a/app/test-pmd/testpmd.h b/app/test-pmd/testpmd.h
> index 82af2bd..c753d37 100644
> --- a/app/test-pmd/testpmd.h
> +++ b/app/test-pmd/testpmd.h
> @@ -131,18 +131,11 @@ struct fwd_stream {
>  #define TESTPMD_TX_OFFLOAD_TCP_CKSUM         0x0004
>  /** Offload SCTP checksum in csum forward engine */
>  #define TESTPMD_TX_OFFLOAD_SCTP_CKSUM        0x0008
> -/** Offload inner IP checksum in csum forward engine */
> -#define TESTPMD_TX_OFFLOAD_INNER_IP_CKSUM    0x0010
> -/** Offload inner UDP checksum in csum forward engine */
> -#define TESTPMD_TX_OFFLOAD_INNER_UDP_CKSUM   0x0020
> -/** Offload inner TCP checksum in csum forward engine */
> -#define TESTPMD_TX_OFFLOAD_INNER_TCP_CKSUM   0x0040
> -/** Offload inner SCTP checksum in csum forward engine */
> -#define TESTPMD_TX_OFFLOAD_INNER_SCTP_CKSUM  0x0080
> -/** Offload inner IP checksum mask */
> -#define TESTPMD_TX_OFFLOAD_INNER_CKSUM_MASK  0x00F0
> +/** Offload VxLAN checksum in csum forward engine */
> +#define TESTPMD_TX_OFFLOAD_VXLAN_CKSUM       0x0010
>  /** Insert VLAN header in forward engine */
> -#define TESTPMD_TX_OFFLOAD_INSERT_VLAN       0x0100
> +#define TESTPMD_TX_OFFLOAD_INSERT_VLAN       0x0020
> +
>  /**
>   * The data structure associated with each port.
>   */
> @@ -510,8 +503,6 @@ void tx_vlan_pvid_set(portid_t port_id, uint16_t vlan_id, int on);
> 
>  void set_qmap(portid_t port_id, uint8_t is_rx, uint16_t queue_id, uint8_t map_value);
> 
> -void tx_cksum_set(portid_t port_id, uint64_t ol_flags);
> -
>  void set_verbose_level(uint16_t vb_level);
>  void set_tx_pkt_segments(unsigned *seg_lengths, unsigned nb_segs);
>  void set_nb_pkt_per_burst(uint16_t pkt_burst);
> --
> 2.1.0


* Re: [dpdk-dev] [PATCH v3 08/13] testpmd: rework csum forward engine
  2014-11-26 10:10       ` Ananyev, Konstantin
@ 2014-11-26 11:14         ` Olivier MATZ
  2014-11-26 12:25           ` Ananyev, Konstantin
  2014-11-26 13:59           ` Liu, Jijiang
  0 siblings, 2 replies; 112+ messages in thread
From: Olivier MATZ @ 2014-11-26 11:14 UTC (permalink / raw)
  To: Ananyev, Konstantin, dev; +Cc: jigsaw

Hi Konstantin,

On 11/26/2014 11:10 AM, Ananyev, Konstantin wrote:
> As I can see you removed code that sets up TX_PKT_IPV4 and TX_PKT_IPV6  of ol_flags.
> I think that we need to keep it.
> The reason for that is:
> With FVL, to make HW TX checksum offload work, SW is responsible to provide to the HW information about L3 header.
> Possible values are:
> - IPv4 hdr with HW checksum calculation
> - IPV4 hdr (checksum done by SW)
> - IPV6 hdr
> - unknown
> So let say to for the packet: ETHER_HDR/IPV6_HDR/TCP_HDR/DATA
> To request HW TCP checksum offload,  SW have to provide to HW information that it is a packet with IPV6 header
> (plus as for ixgbe: l2_hdr_len, l3_hdr_len, l4_type, l4_hdr_len).
> That's why TX_PKT_IPV4 and TX_PKT_IPV6   were introduced.
>
> Yes, it is  a change in public API for HW TX offload, but I don't see any other way we can overcome it
> (apart from make TX function itself to parse a packet, which is obviously not a good choice).
> Note that existing apps working on existing HW (ixgbe/igb/em) are not affected.
> Though apps that supposed to be run on FVL HW too have to follow new convention.
>
> So I suggest we keep setting these flags in csumonly.c

Right, I missed these flags.
It's indeed an API change, but maybe it makes sense, and setting it
is not a big cost for the application.

So I would also need to slightly modify the API help in the following
patches:
  - [04/13] mbuf: add help about TX checksum flags
  - [10/13] mbuf: generic support for TCP segmentation offload

I'll send a v4 this afternoon that integrates this change.

Do you know precisely when the flags PKT_TX_IPV4 and PKT_TX_IPV6 must
be set by the application? Is it only the hw checksum and tso use case?
If yes, I'll add it in the API help too.

By the way (this is probably off-topic), I'm wondering if the TX
flags should have the same values as the RX flags:

   #define PKT_TX_IPV4          PKT_RX_IPV4_HDR
   #define PKT_TX_IPV6          PKT_RX_IPV6_HDR

> Apart from that , the patch looks good to me.
> And yes, we would need to change the  the way we handle TX offload for tunnelled packets.

Thank you very much Konstantin for your review.

Regards,
Olivier


* Re: [dpdk-dev] [PATCH v3 08/13] testpmd: rework csum forward engine
  2014-11-26 11:14         ` Olivier MATZ
@ 2014-11-26 12:25           ` Ananyev, Konstantin
  2014-11-26 14:55             ` Olivier MATZ
  2014-11-26 13:59           ` Liu, Jijiang
  1 sibling, 1 reply; 112+ messages in thread
From: Ananyev, Konstantin @ 2014-11-26 12:25 UTC (permalink / raw)
  To: Olivier MATZ, dev; +Cc: jigsaw



> -----Original Message-----
> From: Olivier MATZ [mailto:olivier.matz@6wind.com]
> Sent: Wednesday, November 26, 2014 11:15 AM
> To: Ananyev, Konstantin; dev@dpdk.org
> Cc: Walukiewicz, Miroslaw; Liu, Jijiang; Liu, Yong; jigsaw@gmail.com; Richardson, Bruce
> Subject: Re: [PATCH v3 08/13] testpmd: rework csum forward engine
> 
> Hi Konstantin,
> 
> On 11/26/2014 11:10 AM, Ananyev, Konstantin wrote:
> > As I can see you removed code that sets up TX_PKT_IPV4 and TX_PKT_IPV6  of ol_flags.
> > I think that we need to keep it.
> > The reason for that is:
> > With FVL, to make HW TX checksum offload work, SW is responsible to provide to the HW information about L3 header.
> > Possible values are:
> > - IPv4 hdr with HW checksum calculation
> > - IPV4 hdr (checksum done by SW)
> > - IPV6 hdr
> > - unknown
> > So let say to for the packet: ETHER_HDR/IPV6_HDR/TCP_HDR/DATA
> > To request HW TCP checksum offload,  SW have to provide to HW information that it is a packet with IPV6 header
> > (plus as for ixgbe: l2_hdr_len, l3_hdr_len, l4_type, l4_hdr_len).
> > That's why TX_PKT_IPV4 and TX_PKT_IPV6   were introduced.
> >
> > Yes, it is  a change in public API for HW TX offload, but I don't see any other way we can overcome it
> > (apart from make TX function itself to parse a packet, which is obviously not a good choice).
> > Note that existing apps working on existing HW (ixgbe/igb/em) are not affected.
> > Though apps that supposed to be run on FVL HW too have to follow new convention.
> >
> > So I suggest we keep setting these flags in csumonly.c
> 
> Right, I missed these flags.
> It's indeed an API change, but maybe it makes sense, and setting it
> is not a big cost for the application.
> 
> So I would also need to slightly modify the API help in the following
> patches:
>   - [04/13] mbuf: add help about TX checksum flags
>   - [10/13] mbuf: generic support for TCP segmentation offload
> 
> I'll send a v4 this afternoon that integrates this change.

Ok, thanks.

> 
> Do you know precisely when the flags PKT_TX_IPV4 and PKT_TX_IPV6 must
> be set by the application? Is it only the hw checksum and tso use case?

Yes, I believe it should be set only for hw checksum and tso.

> If yes, I'll add it in the API help too.
> 
> By the way (this is probably off-topic), but I'm wondering if the TX
> flags should have the same values than the RX flags:
> 
>    #define PKT_TX_IPV4          PKT_RX_IPV4_HDR
>    #define PKT_TX_IPV6          PKT_RX_IPV6_HDR

Thought about that too.
On the one hand, it goes a bit against our concept of separate RX and TX flags.
On the other hand, it allows us to save 2 bits in ol_flags.
I don't have a strong opinion here.
What do you think?

> 
> > Apart from that , the patch looks good to me.
> > And yes, we would need to change the  the way we handle TX offload for tunnelled packets.
> 
> Thank you very much Konstantin for your review.
> 
> Regards,
> Olivier


* Re: [dpdk-dev] [PATCH v3 08/13] testpmd: rework csum forward engine
  2014-11-26 11:14         ` Olivier MATZ
  2014-11-26 12:25           ` Ananyev, Konstantin
@ 2014-11-26 13:59           ` Liu, Jijiang
  1 sibling, 0 replies; 112+ messages in thread
From: Liu, Jijiang @ 2014-11-26 13:59 UTC (permalink / raw)
  To: Olivier MATZ, Ananyev, Konstantin, dev; +Cc: jigsaw



> -----Original Message-----
> From: Olivier MATZ [mailto:olivier.matz@6wind.com]
> Sent: Wednesday, November 26, 2014 7:15 PM
> To: Ananyev, Konstantin; dev@dpdk.org
> Cc: Walukiewicz, Miroslaw; Liu, Jijiang; Liu, Yong; jigsaw@gmail.com; Richardson,
> Bruce
> Subject: Re: [PATCH v3 08/13] testpmd: rework csum forward engine
> 
> Hi Konstantin,
> 
> On 11/26/2014 11:10 AM, Ananyev, Konstantin wrote:
> > As I can see you removed code that sets up TX_PKT_IPV4 and TX_PKT_IPV6  of
> ol_flags.
> > I think that we need to keep it.
> > The reason for that is:
> > With FVL, to make HW TX checksum offload work, SW is responsible to provide
> to the HW information about L3 header.
> > Possible values are:
> > - IPv4 hdr with HW checksum calculation
> > - IPV4 hdr (checksum done by SW)
> > - IPV6 hdr
> > - unknown
> > So let say to for the packet: ETHER_HDR/IPV6_HDR/TCP_HDR/DATA To
> > request HW TCP checksum offload,  SW have to provide to HW information
> > that it is a packet with IPV6 header (plus as for ixgbe: l2_hdr_len, l3_hdr_len,
> l4_type, l4_hdr_len).
> > That's why TX_PKT_IPV4 and TX_PKT_IPV6   were introduced.
> >
> > Yes, it is  a change in public API for HW TX offload, but I don't see
> > any other way we can overcome it (apart from make TX function itself to parse
> a packet, which is obviously not a good choice).
> > Note that existing apps working on existing HW (ixgbe/igb/em) are not affected.
> > Though apps that supposed to be run on FVL HW too have to follow new
> convention.
> >
> > So I suggest we keep setting these flags in csumonly.c
> 
> Right, I missed these flags.
> It's indeed an API change, but maybe it makes sense, and setting it is not a big
> cost for the application.
> 
> So I would also need to slightly modify the API help in the following
> patches:
>   - [04/13] mbuf: add help about TX checksum flags
>   - [10/13] mbuf: generic support for TCP segmentation offload
> 
> I'll send a v4 this afternoon that integrates this change.

After your patch is applied, I will send an i40e driver patch for VXLAN Tx checksum.
 
> Do you know precisely when the flags PKT_TX_IPV4 and PKT_TX_IPV6 must be
> set by the application? Is it only the hw checksum and tso use case?
> If yes, I'll add it in the API help too.
> 
> By the way (this is probably off-topic), but I'm wondering if the TX flags should
> have the same values than the RX flags:
> 
>    #define PKT_TX_IPV4          PKT_RX_IPV4_HDR
>    #define PKT_TX_IPV6          PKT_RX_IPV6_HDR
> 
> > Apart from that , the patch looks good to me.
> > And yes, we would need to change the  the way we handle TX offload for
> tunnelled packets.
> 
> Thank you very much Konstantin for your review.
> 
> Regards,
> Olivier


* Re: [dpdk-dev] [PATCH v3 08/13] testpmd: rework csum forward engine
  2014-11-26 12:25           ` Ananyev, Konstantin
@ 2014-11-26 14:55             ` Olivier MATZ
  2014-11-26 16:34               ` Ananyev, Konstantin
  0 siblings, 1 reply; 112+ messages in thread
From: Olivier MATZ @ 2014-11-26 14:55 UTC (permalink / raw)
  To: Ananyev, Konstantin, dev; +Cc: jigsaw

Hi Konstantin,

On 11/26/2014 01:25 PM, Ananyev, Konstantin wrote:
>> By the way (this is probably off-topic), but I'm wondering if the TX
>> flags should have the same values than the RX flags:
>>
>>     #define PKT_TX_IPV4          PKT_RX_IPV4_HDR
>>     #define PKT_TX_IPV6          PKT_RX_IPV6_HDR
>
> Thought about that too.
>  From one side,  it is a bit out of our concept: separate RX and TX falgs.
>  From other side, it allows us to save 2 bits in the ol_flags.
> Don't have any strong opinion here.
> What do you think?

I have no strong opinion either, but I have a preference for 2 different
bit values for these flags:

- as you say, it matches the concept (RX and TX flags are separated)

- 64 bits is a lot, so we have some time before we run out of available
   bits... and I hope that never happens, because it would become
   complex for an application to handle them all

- it avoids sending a packet with bad info:
   - we receive a Ether/IP6/IP4/L4/data packet
   - the driver sets PKT_RX_IPV6_HDR
   - the stack decapsulates IP6
   - the stack sends the packet, it has the PKT_TX_IPV6 flag but it's an
     IPv4 packet

   This is not a real problem as the flag will not be used by the
   driver/hardware (it's only mandatory for hw cksum / tso), but
   it can be confusing.
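
To make the scenario above concrete, here is a minimal standalone
sketch. It is not DPDK code: the EX_* bit values and the helper name
are made up for this example. It only shows that when the TX flag
shares the RX bit, flags left untouched after decapsulation look like
a TX IPv6 request, while a dedicated TX bit does not.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical bit values, chosen only for this illustration. */
#define EX_RX_IPV6_HDR    (1ULL << 7)
#define EX_TX_IPV6_SHARED EX_RX_IPV6_HDR  /* option 1: TX reuses the RX bit */
#define EX_TX_IPV6_OWN    (1ULL << 56)    /* option 2: dedicated TX bit */

/* Return 1 if ol_flags would be read as "TX IPv6" for the given TX bit. */
static int reads_as_tx_ipv6(uint64_t ol_flags, uint64_t tx_ipv6_bit)
{
	return (ol_flags & tx_ipv6_bit) != 0;
}

/*
 * Self-check: the driver set EX_RX_IPV6_HDR on an Ether/IP6/IP4 packet and
 * the stack removed the IPv6 header without touching ol_flags.
 */
static void check_stale_flag(void)
{
	uint64_t ol_flags = EX_RX_IPV6_HDR;

	/* Shared bit: the now-IPv4 packet still reads as TX IPv6. */
	assert(reads_as_tx_ipv6(ol_flags, EX_TX_IPV6_SHARED) == 1);
	/* Dedicated bit: no stale TX information is carried over. */
	assert(reads_as_tx_ipv6(ol_flags, EX_TX_IPV6_OWN) == 0);
}
```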

Regards,
Olivier


* [dpdk-dev] [PATCH v4 00/13] add TSO support
  2014-11-20 22:58   ` [dpdk-dev] [PATCH v3 00/13] add TSO support Olivier Matz
                       ` (12 preceding siblings ...)
  2014-11-20 22:58     ` [dpdk-dev] [PATCH v3 13/13] testpmd: add a verbose mode " Olivier Matz
@ 2014-11-26 15:04     ` Olivier Matz
  2014-11-26 15:04       ` [dpdk-dev] [PATCH v4 01/13] igb/ixgbe: fix IP checksum calculation Olivier Matz
                         ` (13 more replies)
  13 siblings, 14 replies; 112+ messages in thread
From: Olivier Matz @ 2014-11-26 15:04 UTC (permalink / raw)
  To: dev; +Cc: jigsaw

This series adds TSO support to the ixgbe DPDK driver. This is a rework
of the series sent earlier this week [1]. This work is based on
another version [2] that was posted several months ago and
which included an mbuf rework that is now in mainline.

Changes in v4:

- csum fwd engine: use PKT_TX_IPV4 and PKT_TX_IPV6 to tell the hardware
  the IP version of the packet as suggested by Konstantin.
- document these 2 flags, explaining they should be set for hw L4 cksum
  offload or TSO.
- rebase on latest head

Changes in v3:

- indicate that rte_get_rx_ol_flag_name() and rte_get_tx_ol_flag_name()
  should be kept synchronized with flags definition
- use sizeof() when appropriate in rte_raw_cksum()
- remove double semicolon in ixgbe driver
- reorder tx ol_flags as requested by Thomas
- add missing copyrights when big modifications are made
- enhance the help of tx_cksum command in testpmd
- enhance the description of csumonly (comments)

Changes in v2:

- move rte_get_rx_ol_flag_name() and rte_get_tx_ol_flag_name() in
  rte_mbuf.c, and fix comments
- use IGB_TX_OFFLOAD_MASK and IXGBE_TX_OFFLOAD_MASK to replace
  PKT_TX_OFFLOAD_MASK
- fix inner_l2_len and inner_l3_len bitfields: use uint64_t instead
  of uint16_t
- replace the assignment of l2_len and l3_len with an assignment of
  tx_offload. It now includes inner_l2_len and inner_l3_len at the same time.
- introduce a new cksum api in rte_ip.h following discussion with
  Konstantin
- reorder commits to have all TSO commits at the end of the series
- use ol_flags for phdr checksum calculation (this now matches ixgbe
  API: standard pseudo hdr cksum for TCP cksum offload, pseudo hdr
  cksum without ip paylen for TSO). This will probably be changed
  with a dev_prep_tx() like function for 2.0 release.
- rebase on latest head


This series first fixes some bugs that were discovered during the
development, adds some changes to the mbuf API (new l4_len and
tso_segsz fields), adds TSO support in ixgbe, reworks testpmd
csum forward engine, and finally adds TSO support in testpmd so it
can be validated.

The new fields added in mbuf try to be generic enough to apply to
other hardware in the future. To delegate the TCP segmentation to the
hardware, the user has to:

  - set the PKT_TX_TCP_SEG flag in mbuf->ol_flags (this flag implies
    PKT_TX_TCP_CKSUM)
  - set the flag PKT_TX_IPV4 or PKT_TX_IPV6
  - if it's IPv4, set the PKT_TX_IP_CKSUM flag and write the IP checksum
    to 0 in the packet
  - fill the mbuf offload information: l2_len, l3_len, l4_len, tso_segsz
  - calculate the pseudo header checksum without taking ip_len into account,
    and set it in the TCP header, for instance by using
    rte_ipv4_phdr_cksum(ip_hdr, ol_flags)
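
The last step can be sketched in plain C as follows. This is only an
illustration, not the rte_ipv4_phdr_cksum() implementation from this
series: ipv4_phdr_cksum_sketch() and its parameters are hypothetical.
Addresses are passed as host-order 32-bit values, and the result is a
host-order 16-bit value that would still have to be converted to network
byte order before being written to the TCP header. The sketch assumes
the hardware expects the non-complemented one's complement sum of the
pseudo header.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical helper: one's complement sum of the IPv4 pseudo header
 * (source address, destination address, zero byte + protocol, L4 length).
 * When tso is non-zero, the length field is left out of the sum, which
 * mirrors "without taking ip_len into account".
 */
static uint16_t ipv4_phdr_cksum_sketch(uint32_t src, uint32_t dst,
				       uint8_t proto, uint16_t l4_len, int tso)
{
	uint32_t sum = 0;

	sum += (src >> 16) + (src & 0xffff);
	sum += (dst >> 16) + (dst & 0xffff);
	sum += proto;		/* zero byte followed by the protocol number */
	if (!tso)
		sum += l4_len;	/* only for plain L4 checksum offload */

	/* Fold the carries back into the low 16 bits. */
	while (sum >> 16)
		sum = (sum & 0xffff) + (sum >> 16);

	return (uint16_t)sum;	/* deliberately not complemented */
}

/* Self-check with 192.168.0.1 -> 192.168.0.2, TCP, 20 bytes of L4 data. */
static void check_phdr_cksum(void)
{
	assert(ipv4_phdr_cksum_sketch(0xc0a80001, 0xc0a80002, 6, 20, 0) == 0x816e);
	assert(ipv4_phdr_cksum_sketch(0xc0a80001, 0xc0a80002, 6, 20, 1) == 0x815a);
}
```

In the TSO case the two results differ only by the length term, which is
why the same helper can serve both the checksum-offload and TSO paths.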

The test report will be added as a reply to this cover letter and
could be linked from the relevant commits.

[1] http://dpdk.org/ml/archives/dev/2014-November/007953.html
[2] http://dpdk.org/ml/archives/dev/2014-May/002537.html

Olivier Matz (13):
  igb/ixgbe: fix IP checksum calculation
  ixgbe: fix remaining pkt_flags variable size to 64 bits
  mbuf: reorder tx ol_flags
  mbuf: add help about TX checksum flags
  mbuf: remove too specific PKT_TX_OFFLOAD_MASK definition
  mbuf: add functions to get the name of an ol_flag
  testpmd: fix use of offload flags in testpmd
  testpmd: rework csum forward engine
  mbuf: introduce new checksum API
  mbuf: generic support for TCP segmentation offload
  ixgbe: support TCP segmentation offload
  testpmd: support TSO in csum forward engine
  testpmd: add a verbose mode csum forward engine

 app/test-pmd/cmdline.c              | 248 +++++++++--
 app/test-pmd/config.c               |  17 +-
 app/test-pmd/csumonly.c             | 817 ++++++++++++++++--------------------
 app/test-pmd/macfwd.c               |   5 +-
 app/test-pmd/macswap.c              |   5 +-
 app/test-pmd/rxonly.c               |  36 +-
 app/test-pmd/testpmd.c              |   2 +-
 app/test-pmd/testpmd.h              |  24 +-
 app/test-pmd/txonly.c               |   9 +-
 examples/ipv4_multicast/main.c      |   2 +-
 lib/librte_mbuf/rte_mbuf.c          |  49 +++
 lib/librte_mbuf/rte_mbuf.h          | 108 +++--
 lib/librte_net/rte_ip.h             | 208 +++++++++
 lib/librte_pmd_e1000/igb_rxtx.c     |  21 +-
 lib/librte_pmd_ixgbe/ixgbe_ethdev.c |   3 +-
 lib/librte_pmd_ixgbe/ixgbe_rxtx.c   | 179 +++++---
 lib/librte_pmd_ixgbe/ixgbe_rxtx.h   |  19 +-
 17 files changed, 1103 insertions(+), 649 deletions(-)

-- 
2.1.0


* [dpdk-dev] [PATCH v4 01/13] igb/ixgbe: fix IP checksum calculation
  2014-11-26 15:04     ` [dpdk-dev] [PATCH v4 00/13] add TSO support Olivier Matz
@ 2014-11-26 15:04       ` Olivier Matz
  2014-11-26 15:04       ` [dpdk-dev] [PATCH v4 02/13] ixgbe: fix remaining pkt_flags variable size to 64 bits Olivier Matz
                         ` (12 subsequent siblings)
  13 siblings, 0 replies; 112+ messages in thread
From: Olivier Matz @ 2014-11-26 15:04 UTC (permalink / raw)
  To: dev; +Cc: jigsaw

According to the Intel® 82599 10 GbE Controller Datasheet (Table 7-38), both
L2 and L3 lengths are needed to offload the IP checksum.

Note that the e1000 driver does not need to be patched as it already
contains the fix.

Signed-off-by: Olivier Matz <olivier.matz@6wind.com>
Acked-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
---
 lib/librte_pmd_e1000/igb_rxtx.c   | 2 +-
 lib/librte_pmd_ixgbe/ixgbe_rxtx.c | 2 +-
 2 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/lib/librte_pmd_e1000/igb_rxtx.c b/lib/librte_pmd_e1000/igb_rxtx.c
index 0dca7b7..b406397 100644
--- a/lib/librte_pmd_e1000/igb_rxtx.c
+++ b/lib/librte_pmd_e1000/igb_rxtx.c
@@ -262,7 +262,7 @@ igbe_set_xmit_ctx(struct igb_tx_queue* txq,
 
 	if (ol_flags & PKT_TX_IP_CKSUM) {
 		type_tucmd_mlhl = E1000_ADVTXD_TUCMD_IPV4;
-		cmp_mask |= TX_MAC_LEN_CMP_MASK;
+		cmp_mask |= TX_MACIP_LEN_CMP_MASK;
 	}
 
 	/* Specify which HW CTX to upload. */
diff --git a/lib/librte_pmd_ixgbe/ixgbe_rxtx.c b/lib/librte_pmd_ixgbe/ixgbe_rxtx.c
index f9b3fe3..ecebbf6 100644
--- a/lib/librte_pmd_ixgbe/ixgbe_rxtx.c
+++ b/lib/librte_pmd_ixgbe/ixgbe_rxtx.c
@@ -374,7 +374,7 @@ ixgbe_set_xmit_ctx(struct igb_tx_queue* txq,
 
 	if (ol_flags & PKT_TX_IP_CKSUM) {
 		type_tucmd_mlhl = IXGBE_ADVTXD_TUCMD_IPV4;
-		cmp_mask |= TX_MAC_LEN_CMP_MASK;
+		cmp_mask |= TX_MACIP_LEN_CMP_MASK;
 	}
 
 	/* Specify which HW CTX to upload. */
-- 
2.1.0


* [dpdk-dev] [PATCH v4 02/13] ixgbe: fix remaining pkt_flags variable size to 64 bits
  2014-11-26 15:04     ` [dpdk-dev] [PATCH v4 00/13] add TSO support Olivier Matz
  2014-11-26 15:04       ` [dpdk-dev] [PATCH v4 01/13] igb/ixgbe: fix IP checksum calculation Olivier Matz
@ 2014-11-26 15:04       ` Olivier Matz
  2014-11-26 15:04       ` [dpdk-dev] [PATCH v4 03/13] mbuf: reorder tx ol_flags Olivier Matz
                         ` (11 subsequent siblings)
  13 siblings, 0 replies; 112+ messages in thread
From: Olivier Matz @ 2014-11-26 15:04 UTC (permalink / raw)
  To: dev; +Cc: jigsaw

Since commit 4332beee9 "mbuf: expand ol_flags field to 64-bits", the
packet flags are now 64 bits wide. Some occurrences were forgotten in
the ixgbe driver.
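
To illustrate the truncation, here is a minimal standalone sketch (not
the driver code). EXAMPLE_FLAG_BIT52 and the helper functions are
hypothetical names invented for this example; the point is only that a
16-bit accumulator silently drops any flag above bit 15.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical flag, for illustration only: any bit above 15 behaves the same. */
#define EXAMPLE_FLAG_BIT52 (1ULL << 52)

/* Old pattern: flags accumulated in a 16-bit variable lose their high bits. */
static uint64_t accumulate_flags_u16(uint64_t a, uint64_t b)
{
	uint16_t pkt_flags = (uint16_t)a;

	pkt_flags = (uint16_t)(pkt_flags | b);
	return pkt_flags;
}

/* Fixed pattern: a 64-bit accumulator keeps every bit. */
static uint64_t accumulate_flags_u64(uint64_t a, uint64_t b)
{
	uint64_t pkt_flags = a;

	pkt_flags = (pkt_flags | b);
	return pkt_flags;
}

/* Self-check: the 16-bit version drops bit 52, the 64-bit one keeps it. */
static void check_flag_width(void)
{
	assert(accumulate_flags_u16(0x1, EXAMPLE_FLAG_BIT52) == 0x1);
	assert(accumulate_flags_u64(0x1, EXAMPLE_FLAG_BIT52) ==
	       (0x1 | EXAMPLE_FLAG_BIT52));
}
```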

Signed-off-by: Olivier Matz <olivier.matz@6wind.com>
Acked-by: Bruce Richardson <bruce.richardson@intel.com>
---
 lib/librte_pmd_ixgbe/ixgbe_rxtx.c | 12 ++++++------
 1 file changed, 6 insertions(+), 6 deletions(-)

diff --git a/lib/librte_pmd_ixgbe/ixgbe_rxtx.c b/lib/librte_pmd_ixgbe/ixgbe_rxtx.c
index ecebbf6..7e470ce 100644
--- a/lib/librte_pmd_ixgbe/ixgbe_rxtx.c
+++ b/lib/librte_pmd_ixgbe/ixgbe_rxtx.c
@@ -817,7 +817,7 @@ end_of_tx:
 static inline uint64_t
 rx_desc_hlen_type_rss_to_pkt_flags(uint32_t hl_tp_rs)
 {
-	uint16_t pkt_flags;
+	uint64_t pkt_flags;
 
 	static uint64_t ip_pkt_types_map[16] = {
 		0, PKT_RX_IPV4_HDR, PKT_RX_IPV4_HDR_EXT, PKT_RX_IPV4_HDR_EXT,
@@ -834,7 +834,7 @@ rx_desc_hlen_type_rss_to_pkt_flags(uint32_t hl_tp_rs)
 	};
 
 #ifdef RTE_LIBRTE_IEEE1588
-	static uint32_t ip_pkt_etqf_map[8] = {
+	static uint64_t ip_pkt_etqf_map[8] = {
 		0, 0, 0, PKT_RX_IEEE1588_PTP,
 		0, 0, 0, 0,
 	};
@@ -903,7 +903,7 @@ ixgbe_rx_scan_hw_ring(struct igb_rx_queue *rxq)
 	struct igb_rx_entry *rxep;
 	struct rte_mbuf *mb;
 	uint16_t pkt_len;
-	uint16_t pkt_flags;
+	uint64_t pkt_flags;
 	int s[LOOK_AHEAD], nb_dd;
 	int i, j, nb_rx = 0;
 
@@ -1335,7 +1335,7 @@ ixgbe_recv_scattered_pkts(void *rx_queue, struct rte_mbuf **rx_pkts,
 	uint16_t nb_rx;
 	uint16_t nb_hold;
 	uint16_t data_len;
-	uint16_t pkt_flags;
+	uint64_t pkt_flags;
 
 	nb_rx = 0;
 	nb_hold = 0;
@@ -1511,9 +1511,9 @@ ixgbe_recv_scattered_pkts(void *rx_queue, struct rte_mbuf **rx_pkts,
 		first_seg->vlan_tci = rte_le_to_cpu_16(rxd.wb.upper.vlan);
 		hlen_type_rss = rte_le_to_cpu_32(rxd.wb.lower.lo_dword.data);
 		pkt_flags = rx_desc_hlen_type_rss_to_pkt_flags(hlen_type_rss);
-		pkt_flags = (uint16_t)(pkt_flags |
+		pkt_flags = (pkt_flags |
 				rx_desc_status_to_pkt_flags(staterr));
-		pkt_flags = (uint16_t)(pkt_flags |
+		pkt_flags = (pkt_flags |
 				rx_desc_error_to_pkt_flags(staterr));
 		first_seg->ol_flags = pkt_flags;
 
-- 
2.1.0


* [dpdk-dev] [PATCH v4 03/13] mbuf: reorder tx ol_flags
  2014-11-26 15:04     ` [dpdk-dev] [PATCH v4 00/13] add TSO support Olivier Matz
  2014-11-26 15:04       ` [dpdk-dev] [PATCH v4 01/13] igb/ixgbe: fix IP checksum calculation Olivier Matz
  2014-11-26 15:04       ` [dpdk-dev] [PATCH v4 02/13] ixgbe: fix remaining pkt_flags variable size to 64 bits Olivier Matz
@ 2014-11-26 15:04       ` Olivier Matz
  2014-11-26 15:04       ` [dpdk-dev] [PATCH v4 04/13] mbuf: add help about TX checksum flags Olivier Matz
                         ` (10 subsequent siblings)
  13 siblings, 0 replies; 112+ messages in thread
From: Olivier Matz @ 2014-11-26 15:04 UTC (permalink / raw)
  To: dev; +Cc: jigsaw

The tx mbuf flags are now ordered from the lowest value to the
highest. Add comments to explain where to add new flags.

Also move PKT_TX_VXLAN_CKSUM to its proper place.

Signed-off-by: Olivier Matz <olivier.matz@6wind.com>
Acked-by: Thomas Monjalon <thomas.monjalon@6wind.com>
---
 lib/librte_mbuf/rte_mbuf.h | 17 +++++++++--------
 1 file changed, 9 insertions(+), 8 deletions(-)

diff --git a/lib/librte_mbuf/rte_mbuf.h b/lib/librte_mbuf/rte_mbuf.h
index 5899e5c..faa9924 100644
--- a/lib/librte_mbuf/rte_mbuf.h
+++ b/lib/librte_mbuf/rte_mbuf.h
@@ -95,14 +95,11 @@ extern "C" {
 #define PKT_RX_TUNNEL_IPV6_HDR (1ULL << 12) /**< RX tunnel packet with IPv6 header. */
 #define PKT_RX_FDIR_ID       (1ULL << 13) /**< FD id reported if FDIR match. */
 #define PKT_RX_FDIR_FLX      (1ULL << 14) /**< Flexible bytes reported if FDIR match. */
+/* add new RX flags here */
 
-#define PKT_TX_VLAN_PKT      (1ULL << 55) /**< TX packet is a 802.1q VLAN packet. */
-#define PKT_TX_IP_CKSUM      (1ULL << 54) /**< IP cksum of TX pkt. computed by NIC. */
+/* add new TX flags here */
 #define PKT_TX_VXLAN_CKSUM   (1ULL << 50) /**< TX checksum of VXLAN computed by NIC */
-#define PKT_TX_IPV4_CSUM     PKT_TX_IP_CKSUM /**< Alias of PKT_TX_IP_CKSUM. */
-#define PKT_TX_IPV4          PKT_RX_IPV4_HDR /**< IPv4 with no IP checksum offload. */
-#define PKT_TX_IPV6          PKT_RX_IPV6_HDR /**< IPv6 packet */
-
+#define PKT_TX_IEEE1588_TMST (1ULL << 51) /**< TX IEEE1588 packet to timestamp. */
 /*
  * Bits 52+53 used for L4 packet type with checksum enabled.
  *     00: Reserved
@@ -116,8 +113,12 @@ extern "C" {
 #define PKT_TX_UDP_CKSUM     (3ULL << 52) /**< UDP cksum of TX pkt. computed by NIC. */
 #define PKT_TX_L4_MASK       (3ULL << 52) /**< Mask for L4 cksum offload request. */
 
-/* Bit 51 - IEEE1588*/
-#define PKT_TX_IEEE1588_TMST (1ULL << 51) /**< TX IEEE1588 packet to timestamp. */
+#define PKT_TX_IP_CKSUM      (1ULL << 54) /**< IP cksum of TX pkt. computed by NIC. */
+#define PKT_TX_IPV4_CSUM     PKT_TX_IP_CKSUM /**< Alias of PKT_TX_IP_CKSUM. */
+#define PKT_TX_IPV4          PKT_RX_IPV4_HDR /**< IPv4 with no IP checksum offload. */
+#define PKT_TX_IPV6          PKT_RX_IPV6_HDR /**< IPv6 packet */
+
+#define PKT_TX_VLAN_PKT      (1ULL << 55) /**< TX packet is a 802.1q VLAN packet. */
 
 /* Use final bit of flags to indicate a control mbuf */
 #define CTRL_MBUF_FLAG       (1ULL << 63) /**< Mbuf contains control data */
-- 
2.1.0


* [dpdk-dev] [PATCH v4 04/13] mbuf: add help about TX checksum flags
  2014-11-26 15:04     ` [dpdk-dev] [PATCH v4 00/13] add TSO support Olivier Matz
                         ` (2 preceding siblings ...)
  2014-11-26 15:04       ` [dpdk-dev] [PATCH v4 03/13] mbuf: reorder tx ol_flags Olivier Matz
@ 2014-11-26 15:04       ` Olivier Matz
  2014-11-26 15:04       ` [dpdk-dev] [PATCH v4 05/13] mbuf: remove too specific PKT_TX_OFFLOAD_MASK definition Olivier Matz
                         ` (9 subsequent siblings)
  13 siblings, 0 replies; 112+ messages in thread
From: Olivier Matz @ 2014-11-26 15:04 UTC (permalink / raw)
  To: dev; +Cc: jigsaw

Describe how to use hardware checksum API.
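
As an illustration of the steps documented in this patch, here is a
minimal standalone sketch of what an application might do before
requesting TCP checksum offload for an IPv4 packet. It is not code from
the series: struct demo_mbuf, the DEMO_* names and the helper functions
are hypothetical stand-ins, and the value chosen for the IPv4 marker
flag is an assumption. The sum is computed on host-order inputs; a real
caller would convert it to network byte order before storing it in the
TCP header.

```c
#include <stdint.h>

/* Hypothetical stand-ins for the mbuf flags. Bits 52-53 follow the
 * layout shown in the patch; the IPv4 marker bit is an assumption. */
#define DEMO_TX_TCP_CKSUM (1ULL << 52)
#define DEMO_TX_L4_MASK   (3ULL << 52)
#define DEMO_TX_IPV4      (1ULL << 5)	/* assumed value, demo only */

/* Hypothetical, reduced view of the offload fields of an mbuf. */
struct demo_mbuf {
	uint64_t ol_flags;
	uint16_t l2_len;
	uint16_t l3_len;
};

/* Fold a 32-bit accumulator into a 16-bit one's-complement sum. */
static uint16_t demo_fold(uint32_t sum)
{
	while (sum >> 16)
		sum = (sum & 0xffff) + (sum >> 16);
	return (uint16_t)sum;
}

/* Sum of the IPv4 pseudo header (src, dst, protocol, L4 length),
 * all in host byte order. The result is not complemented. */
static uint16_t demo_ipv4_psd_sum(uint32_t src, uint32_t dst,
				  uint8_t proto, uint16_t l4_len)
{
	uint32_t sum = 0;

	sum += src >> 16;
	sum += src & 0xffff;
	sum += dst >> 16;
	sum += dst & 0xffff;
	sum += proto;
	sum += l4_len;
	return demo_fold(sum);
}

/* Fill the offload fields, select TCP in the 2-bit L4 field, and
 * return the pseudo-header sum to be written in the TCP header. */
static uint16_t demo_request_tcp_cksum(struct demo_mbuf *m,
				       uint16_t l2_len, uint16_t l3_len,
				       uint32_t src, uint32_t dst,
				       uint16_t tcp_len)
{
	m->l2_len = l2_len;
	m->l3_len = l3_len;
	m->ol_flags &= ~DEMO_TX_L4_MASK;	/* L4 type is a field, not a bit */
	m->ol_flags |= DEMO_TX_TCP_CKSUM | DEMO_TX_IPV4;
	return demo_ipv4_psd_sum(src, dst, 6 /* TCP */, tcp_len);
}
```

Clearing the L4 mask before setting the new value matters because the
two bits encode a type (TCP, SCTP or UDP) rather than independent flags.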

Signed-off-by: Olivier Matz <olivier.matz@6wind.com>
Acked-by: Bruce Richardson <bruce.richardson@intel.com>
---
 lib/librte_mbuf/rte_mbuf.h | 28 ++++++++++++++++++----------
 1 file changed, 18 insertions(+), 10 deletions(-)

diff --git a/lib/librte_mbuf/rte_mbuf.h b/lib/librte_mbuf/rte_mbuf.h
index faa9924..6d9ef21 100644
--- a/lib/librte_mbuf/rte_mbuf.h
+++ b/lib/librte_mbuf/rte_mbuf.h
@@ -100,23 +100,31 @@ extern "C" {
 /* add new TX flags here */
 #define PKT_TX_VXLAN_CKSUM   (1ULL << 50) /**< TX checksum of VXLAN computed by NIC */
 #define PKT_TX_IEEE1588_TMST (1ULL << 51) /**< TX IEEE1588 packet to timestamp. */
-/*
- * Bits 52+53 used for L4 packet type with checksum enabled.
- *     00: Reserved
- *     01: TCP checksum
- *     10: SCTP checksum
- *     11: UDP checksum
+
+/**
+ * Bits 52+53 used for L4 packet type with checksum enabled: 00: Reserved,
+ * 01: TCP checksum, 10: SCTP checksum, 11: UDP checksum. To use hardware
+ * L4 checksum offload, the user needs to:
+ *  - fill l2_len and l3_len in mbuf
+ *  - set the flags PKT_TX_TCP_CKSUM, PKT_TX_SCTP_CKSUM or PKT_TX_UDP_CKSUM
+ *  - set the flag PKT_TX_IPV4 or PKT_TX_IPV6
+ *  - calculate the pseudo header checksum and set it in the L4 header (only
+ *    for TCP or UDP). For SCTP, set the crc field to 0.
  */
-#define PKT_TX_L4_NO_CKSUM   (0ULL << 52) /**< Disable L4 cksum of TX pkt. */
+#define PKT_TX_L4_NO_CKSUM   (0ULL << 52) /* Disable L4 cksum of TX pkt. */
 #define PKT_TX_TCP_CKSUM     (1ULL << 52) /**< TCP cksum of TX pkt. computed by NIC. */
 #define PKT_TX_SCTP_CKSUM    (2ULL << 52) /**< SCTP cksum of TX pkt. computed by NIC. */
 #define PKT_TX_UDP_CKSUM     (3ULL << 52) /**< UDP cksum of TX pkt. computed by NIC. */
 #define PKT_TX_L4_MASK       (3ULL << 52) /**< Mask for L4 cksum offload request. */
 
-#define PKT_TX_IP_CKSUM      (1ULL << 54) /**< IP cksum of TX pkt. computed by NIC. */
+#define PKT_TX_IP_CKSUM      (1ULL << 54)/**< IP cksum of TX pkt. computed by NIC. */
 #define PKT_TX_IPV4_CSUM     PKT_TX_IP_CKSUM /**< Alias of PKT_TX_IP_CKSUM. */
-#define PKT_TX_IPV4          PKT_RX_IPV4_HDR /**< IPv4 with no IP checksum offload. */
-#define PKT_TX_IPV6          PKT_RX_IPV6_HDR /**< IPv6 packet */
+
+/** Tell the NIC it's an IPv4 packet. Required for L4 checksum offload. */
+#define PKT_TX_IPV4          PKT_RX_IPV4_HDR
+
+/** Tell the NIC it's an IPv6 packet. Required for L4 checksum offload. */
+#define PKT_TX_IPV6          PKT_RX_IPV6_HDR
 
 #define PKT_TX_VLAN_PKT      (1ULL << 55) /**< TX packet is a 802.1q VLAN packet. */
 
-- 
2.1.0


* [dpdk-dev] [PATCH v4 05/13] mbuf: remove too specific PKT_TX_OFFLOAD_MASK definition
  2014-11-26 15:04     ` [dpdk-dev] [PATCH v4 00/13] add TSO support Olivier Matz
                         ` (3 preceding siblings ...)
  2014-11-26 15:04       ` [dpdk-dev] [PATCH v4 04/13] mbuf: add help about TX checksum flags Olivier Matz
@ 2014-11-26 15:04       ` Olivier Matz
  2014-11-26 15:04       ` [dpdk-dev] [PATCH v4 06/13] mbuf: add functions to get the name of an ol_flag Olivier Matz
                         ` (8 subsequent siblings)
  13 siblings, 0 replies; 112+ messages in thread
From: Olivier Matz @ 2014-11-26 15:04 UTC (permalink / raw)
  To: dev; +Cc: jigsaw

This macro is specific to the Intel PMD drivers, and its description
("indicate what bits required for building TX context") shows that it
belongs in each PMD driver rather than in the generic rte_mbuf.h.

Signed-off-by: Olivier Matz <olivier.matz@6wind.com>
Acked-by: Bruce Richardson <bruce.richardson@intel.com>
---
 lib/librte_mbuf/rte_mbuf.h        | 5 -----
 lib/librte_pmd_e1000/igb_rxtx.c   | 8 +++++++-
 lib/librte_pmd_ixgbe/ixgbe_rxtx.c | 8 +++++++-
 3 files changed, 14 insertions(+), 7 deletions(-)

diff --git a/lib/librte_mbuf/rte_mbuf.h b/lib/librte_mbuf/rte_mbuf.h
index 6d9ef21..c2f4685 100644
--- a/lib/librte_mbuf/rte_mbuf.h
+++ b/lib/librte_mbuf/rte_mbuf.h
@@ -131,11 +131,6 @@ extern "C" {
 /* Use final bit of flags to indicate a control mbuf */
 #define CTRL_MBUF_FLAG       (1ULL << 63) /**< Mbuf contains control data */
 
-/**
- * Bit Mask to indicate what bits required for building TX context
- */
-#define PKT_TX_OFFLOAD_MASK (PKT_TX_VLAN_PKT | PKT_TX_IP_CKSUM | PKT_TX_L4_MASK)
-
 /* define a set of marker types that can be used to refer to set points in the
  * mbuf */
 typedef void    *MARKER[0];   /**< generic marker for a point in a structure */
diff --git a/lib/librte_pmd_e1000/igb_rxtx.c b/lib/librte_pmd_e1000/igb_rxtx.c
index b406397..433c616 100644
--- a/lib/librte_pmd_e1000/igb_rxtx.c
+++ b/lib/librte_pmd_e1000/igb_rxtx.c
@@ -84,6 +84,12 @@
 		ETH_RSS_IPV6_UDP | \
 		ETH_RSS_IPV6_UDP_EX)
 
+/* Bit Mask to indicate what bits required for building TX context */
+#define IGB_TX_OFFLOAD_MASK (			 \
+		PKT_TX_VLAN_PKT |		 \
+		PKT_TX_IP_CKSUM |		 \
+		PKT_TX_L4_MASK)
+
 static inline struct rte_mbuf *
 rte_rxmbuf_alloc(struct rte_mempool *mp)
 {
@@ -400,7 +406,7 @@ eth_igb_xmit_pkts(void *tx_queue, struct rte_mbuf **tx_pkts,
 		ol_flags = tx_pkt->ol_flags;
 		vlan_macip_lens.f.vlan_tci = tx_pkt->vlan_tci;
 		vlan_macip_lens.f.l2_l3_len = tx_pkt->l2_l3_len;
-		tx_ol_req = ol_flags & PKT_TX_OFFLOAD_MASK;
+		tx_ol_req = ol_flags & IGB_TX_OFFLOAD_MASK;
 
 		/* If a Context Descriptor need be built . */
 		if (tx_ol_req) {
diff --git a/lib/librte_pmd_ixgbe/ixgbe_rxtx.c b/lib/librte_pmd_ixgbe/ixgbe_rxtx.c
index 7e470ce..ca35db2 100644
--- a/lib/librte_pmd_ixgbe/ixgbe_rxtx.c
+++ b/lib/librte_pmd_ixgbe/ixgbe_rxtx.c
@@ -90,6 +90,12 @@
 		ETH_RSS_IPV6_UDP | \
 		ETH_RSS_IPV6_UDP_EX)
 
+/* Bit Mask to indicate what bits required for building TX context */
+#define IXGBE_TX_OFFLOAD_MASK (			 \
+		PKT_TX_VLAN_PKT |		 \
+		PKT_TX_IP_CKSUM |		 \
+		PKT_TX_L4_MASK)
+
 static inline struct rte_mbuf *
 rte_rxmbuf_alloc(struct rte_mempool *mp)
 {
@@ -580,7 +586,7 @@ ixgbe_xmit_pkts(void *tx_queue, struct rte_mbuf **tx_pkts,
 		ol_flags = tx_pkt->ol_flags;
 
 		/* If hardware offload required */
-		tx_ol_req = ol_flags & PKT_TX_OFFLOAD_MASK;
+		tx_ol_req = ol_flags & IXGBE_TX_OFFLOAD_MASK;
 		if (tx_ol_req) {
 			vlan_macip_lens.f.vlan_tci = tx_pkt->vlan_tci;
 			vlan_macip_lens.f.l2_l3_len = tx_pkt->l2_l3_len;
-- 
2.1.0


* [dpdk-dev] [PATCH v4 06/13] mbuf: add functions to get the name of an ol_flag
  2014-11-26 15:04     ` [dpdk-dev] [PATCH v4 00/13] add TSO support Olivier Matz
                         ` (4 preceding siblings ...)
  2014-11-26 15:04       ` [dpdk-dev] [PATCH v4 05/13] mbuf: remove too specific PKT_TX_OFFLOAD_MASK definition Olivier Matz
@ 2014-11-26 15:04       ` Olivier Matz
  2014-11-26 15:04       ` [dpdk-dev] [PATCH v4 07/13] testpmd: fix use of offload flags in testpmd Olivier Matz
                         ` (7 subsequent siblings)
  13 siblings, 0 replies; 112+ messages in thread
From: Olivier Matz @ 2014-11-26 15:04 UTC (permalink / raw)
  To: dev; +Cc: jigsaw

In test-pmd (rxonly.c), the code is able to dump the list of ol_flags.
The issue is that the list of flags in the application has to be
synchronized with the flags defined in rte_mbuf.h.

This patch introduces 2 new functions, rte_get_rx_ol_flag_name()
and rte_get_tx_ol_flag_name(), that return the name of a flag from
its mask. It also fixes rxonly.c to use these new functions and to
display the proper flags.
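
The lookup-and-dump pattern can be sketched in isolation as follows.
This is only an illustration, not the code added by the patch: the
DEMO_* flags and both functions are hypothetical, and only two flags
are listed to keep it short. Note the 1ULL in the shift: the flags are
64 bits wide, so a plain int shift would not reach the upper bits.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical flags, mirroring the bit positions of the first two
 * RX flags in rte_mbuf.h. */
#define DEMO_RX_VLAN_PKT (1ULL << 0)
#define DEMO_RX_RSS_HASH (1ULL << 1)

/* Return the name of a single-bit flag, or NULL if it is unknown. */
static const char *demo_rx_flag_name(uint64_t mask)
{
	switch (mask) {
	case DEMO_RX_VLAN_PKT: return "DEMO_RX_VLAN_PKT";
	case DEMO_RX_RSS_HASH: return "DEMO_RX_RSS_HASH";
	default: return NULL;
	}
}

/* Print the name of every known flag set in ol_flags and return how
 * many names were printed. Unknown bits are silently skipped. */
static unsigned demo_dump_rx_flags(FILE *f, uint64_t ol_flags)
{
	unsigned bit, named = 0;

	for (bit = 0; bit < sizeof(ol_flags) * 8; bit++) {
		uint64_t mask = 1ULL << bit;
		const char *name;

		if ((ol_flags & mask) == 0)
			continue;
		name = demo_rx_flag_name(mask);
		if (name == NULL)
			continue;
		fprintf(f, "  %s\n", name);
		named++;
	}
	return named;
}
```

Keeping the names next to the flag definitions means an application
no longer needs its own table indexed by bit position.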

Signed-off-by: Olivier Matz <olivier.matz@6wind.com>
---
 app/test-pmd/rxonly.c      | 36 ++++++++++------------------------
 lib/librte_mbuf/rte_mbuf.c | 48 ++++++++++++++++++++++++++++++++++++++++++++++
 lib/librte_mbuf/rte_mbuf.h | 25 ++++++++++++++++++++++++
 3 files changed, 83 insertions(+), 26 deletions(-)

diff --git a/app/test-pmd/rxonly.c b/app/test-pmd/rxonly.c
index 88b65bc..fdfe990 100644
--- a/app/test-pmd/rxonly.c
+++ b/app/test-pmd/rxonly.c
@@ -71,26 +71,6 @@
 
 #include "testpmd.h"
 
-#define MAX_PKT_RX_FLAGS 13
-static const char *pkt_rx_flag_names[MAX_PKT_RX_FLAGS] = {
-	"VLAN_PKT",
-	"RSS_HASH",
-	"PKT_RX_FDIR",
-	"IP_CKSUM",
-	"IP_CKSUM_BAD",
-
-	"IPV4_HDR",
-	"IPV4_HDR_EXT",
-	"IPV6_HDR",
-	"IPV6_HDR_EXT",
-
-	"IEEE1588_PTP",
-	"IEEE1588_TMST",
-
-	"TUNNEL_IPV4_HDR",
-	"TUNNEL_IPV6_HDR",
-};
-
 static inline void
 print_ether_addr(const char *what, struct ether_addr *eth_addr)
 {
@@ -222,12 +202,16 @@ pkt_burst_receive(struct fwd_stream *fs)
 		printf(" - Receive queue=0x%x", (unsigned) fs->rx_queue);
 		printf("\n");
 		if (ol_flags != 0) {
-			int rxf;
-
-			for (rxf = 0; rxf < MAX_PKT_RX_FLAGS; rxf++) {
-				if (ol_flags & (1 << rxf))
-					printf("  PKT_RX_%s\n",
-					       pkt_rx_flag_names[rxf]);
+			unsigned rxf;
+			const char *name;
+
+			for (rxf = 0; rxf < sizeof(mb->ol_flags) * 8; rxf++) {
+				if ((ol_flags & (1ULL << rxf)) == 0)
+					continue;
+				name = rte_get_rx_ol_flag_name(1ULL << rxf);
+				if (name == NULL)
+					continue;
+				printf("  %s\n", name);
 			}
 		}
 		rte_pktmbuf_free(mb);
diff --git a/lib/librte_mbuf/rte_mbuf.c b/lib/librte_mbuf/rte_mbuf.c
index 52e7574..9b57b3a 100644
--- a/lib/librte_mbuf/rte_mbuf.c
+++ b/lib/librte_mbuf/rte_mbuf.c
@@ -2,6 +2,7 @@
  *   BSD LICENSE
  *
  *   Copyright(c) 2010-2014 Intel Corporation. All rights reserved.
+ *   Copyright 2014 6WIND S.A.
  *   All rights reserved.
  *
  *   Redistribution and use in source and binary forms, with or without
@@ -196,3 +197,50 @@ rte_pktmbuf_dump(FILE *f, const struct rte_mbuf *m, unsigned dump_len)
 		nb_segs --;
 	}
 }
+
+/*
+ * Get the name of a RX offload flag. Must be kept synchronized with flag
+ * definitions in rte_mbuf.h.
+ */
+const char *rte_get_rx_ol_flag_name(uint64_t mask)
+{
+	switch (mask) {
+	case PKT_RX_VLAN_PKT: return "PKT_RX_VLAN_PKT";
+	case PKT_RX_RSS_HASH: return "PKT_RX_RSS_HASH";
+	case PKT_RX_FDIR: return "PKT_RX_FDIR";
+	case PKT_RX_L4_CKSUM_BAD: return "PKT_RX_L4_CKSUM_BAD";
+	case PKT_RX_IP_CKSUM_BAD: return "PKT_RX_IP_CKSUM_BAD";
+	/* case PKT_RX_EIP_CKSUM_BAD: return "PKT_RX_EIP_CKSUM_BAD"; */
+	/* case PKT_RX_OVERSIZE: return "PKT_RX_OVERSIZE"; */
+	/* case PKT_RX_HBUF_OVERFLOW: return "PKT_RX_HBUF_OVERFLOW"; */
+	/* case PKT_RX_RECIP_ERR: return "PKT_RX_RECIP_ERR"; */
+	/* case PKT_RX_MAC_ERR: return "PKT_RX_MAC_ERR"; */
+	case PKT_RX_IPV4_HDR: return "PKT_RX_IPV4_HDR";
+	case PKT_RX_IPV4_HDR_EXT: return "PKT_RX_IPV4_HDR_EXT";
+	case PKT_RX_IPV6_HDR: return "PKT_RX_IPV6_HDR";
+	case PKT_RX_IPV6_HDR_EXT: return "PKT_RX_IPV6_HDR_EXT";
+	case PKT_RX_IEEE1588_PTP: return "PKT_RX_IEEE1588_PTP";
+	case PKT_RX_IEEE1588_TMST: return "PKT_RX_IEEE1588_TMST";
+	case PKT_RX_TUNNEL_IPV4_HDR: return "PKT_RX_TUNNEL_IPV4_HDR";
+	case PKT_RX_TUNNEL_IPV6_HDR: return "PKT_RX_TUNNEL_IPV6_HDR";
+	default: return NULL;
+	}
+}
+
+/*
+ * Get the name of a TX offload flag. Must be kept synchronized with flag
+ * definitions in rte_mbuf.h.
+ */
+const char *rte_get_tx_ol_flag_name(uint64_t mask)
+{
+	switch (mask) {
+	case PKT_TX_VLAN_PKT: return "PKT_TX_VLAN_PKT";
+	case PKT_TX_IP_CKSUM: return "PKT_TX_IP_CKSUM";
+	case PKT_TX_TCP_CKSUM: return "PKT_TX_TCP_CKSUM";
+	case PKT_TX_SCTP_CKSUM: return "PKT_TX_SCTP_CKSUM";
+	case PKT_TX_UDP_CKSUM: return "PKT_TX_UDP_CKSUM";
+	case PKT_TX_IEEE1588_TMST: return "PKT_TX_IEEE1588_TMST";
+	case PKT_TX_VXLAN_CKSUM: return "PKT_TX_VXLAN_CKSUM";
+	default: return NULL;
+	}
+}
diff --git a/lib/librte_mbuf/rte_mbuf.h b/lib/librte_mbuf/rte_mbuf.h
index c2f4685..2bc7a90 100644
--- a/lib/librte_mbuf/rte_mbuf.h
+++ b/lib/librte_mbuf/rte_mbuf.h
@@ -74,6 +74,9 @@ extern "C" {
  * - The most-significant 8 bits are reserved for generic mbuf flags
  * - TX flags therefore start at bit position 55 (i.e. 63-8), and new flags get
  *   added to the right of the previously defined flags
+ *
+ * Keep these flags synchronized with rte_get_rx_ol_flag_name() and
+ * rte_get_tx_ol_flag_name().
  */
 #define PKT_RX_VLAN_PKT      (1ULL << 0)  /**< RX packet is a 802.1q VLAN packet. */
 #define PKT_RX_RSS_HASH      (1ULL << 1)  /**< RX packet with RSS hash result. */
@@ -131,6 +134,28 @@ extern "C" {
 /* Use final bit of flags to indicate a control mbuf */
 #define CTRL_MBUF_FLAG       (1ULL << 63) /**< Mbuf contains control data */
 
+/**
+ * Get the name of a RX offload flag
+ *
+ * @param mask
+ *   The mask describing the flag.
+ * @return
+ *   The name of this flag, or NULL if it's not a valid RX flag.
+ */
+const char *rte_get_rx_ol_flag_name(uint64_t mask);
+
+/**
+ * Get the name of a TX offload flag
+ *
+ * @param mask
+ *   The mask describing the flag. Usually only one bit must be set.
+ *   Several bits can be given if they belong to the same mask.
+ *   Ex: PKT_TX_L4_MASK.
+ * @return
+ *   The name of this flag, or NULL if it's not a valid TX flag.
+ */
+const char *rte_get_tx_ol_flag_name(uint64_t mask);
+
 /* define a set of marker types that can be used to refer to set points in the
  * mbuf */
 typedef void    *MARKER[0];   /**< generic marker for a point in a structure */
-- 
2.1.0


* [dpdk-dev] [PATCH v4 07/13] testpmd: fix use of offload flags in testpmd
  2014-11-26 15:04     ` [dpdk-dev] [PATCH v4 00/13] add TSO support Olivier Matz
                         ` (5 preceding siblings ...)
  2014-11-26 15:04       ` [dpdk-dev] [PATCH v4 06/13] mbuf: add functions to get the name of an ol_flag Olivier Matz
@ 2014-11-26 15:04       ` Olivier Matz
  2014-11-26 15:04       ` [dpdk-dev] [PATCH v4 08/13] testpmd: rework csum forward engine Olivier Matz
                         ` (6 subsequent siblings)
  13 siblings, 0 replies; 112+ messages in thread
From: Olivier Matz @ 2014-11-26 15:04 UTC (permalink / raw)
  To: dev; +Cc: jigsaw

In testpmd, the rte_port->tx_ol_flags field was used in 2 incompatible
ways:
- sometimes used with testpmd specific flags (0xff for checksums, and
  bit 11 for vlan)
- sometimes assigned to m->ol_flags directly, which is wrong in case
  of checksum flags

This commit replaces the hardcoded values with named definitions, which
are distinct from (and not interchangeable with) the mbuf flags. The
testpmd forward engines are fixed to use the flags properly.
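
To make the separation concrete, here is a small standalone sketch of
translating application-level offload bits into mbuf-style flags
instead of copying one field into the other. The DEMO_* names and the
function are hypothetical; the numeric values simply mirror the ones
visible in this series.

```c
#include <stdint.h>

/* Hypothetical application-level configuration bits (16-bit field). */
#define DEMO_APP_OFFLOAD_IP_CKSUM    0x0001
#define DEMO_APP_OFFLOAD_INSERT_VLAN 0x0100

/* Hypothetical mbuf-level flags (64-bit field, high bits). */
#define DEMO_MBUF_TX_IP_CKSUM (1ULL << 54)
#define DEMO_MBUF_TX_VLAN_PKT (1ULL << 55)

/* Translate each application bit explicitly. Copying app_flags into
 * ol_flags would set unrelated low bits of the 64-bit field. */
static uint64_t demo_app_to_mbuf_flags(uint16_t app_flags)
{
	uint64_t ol_flags = 0;

	if (app_flags & DEMO_APP_OFFLOAD_IP_CKSUM)
		ol_flags |= DEMO_MBUF_TX_IP_CKSUM;
	if (app_flags & DEMO_APP_OFFLOAD_INSERT_VLAN)
		ol_flags |= DEMO_MBUF_TX_VLAN_PKT;
	return ol_flags;
}
```

Application bits without a mapping are dropped rather than leaking
into the packet flags.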

Signed-off-by: Olivier Matz <olivier.matz@6wind.com>
Acked-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
---
 app/test-pmd/config.c   |  4 ++--
 app/test-pmd/csumonly.c | 40 +++++++++++++++++++++++-----------------
 app/test-pmd/macfwd.c   |  5 ++++-
 app/test-pmd/macswap.c  |  5 ++++-
 app/test-pmd/testpmd.h  | 28 +++++++++++++++++++++-------
 app/test-pmd/txonly.c   |  9 ++++++---
 6 files changed, 60 insertions(+), 31 deletions(-)

diff --git a/app/test-pmd/config.c b/app/test-pmd/config.c
index a322d8b..c5ac8a5 100644
--- a/app/test-pmd/config.c
+++ b/app/test-pmd/config.c
@@ -1683,7 +1683,7 @@ tx_vlan_set(portid_t port_id, uint16_t vlan_id)
 		return;
 	if (vlan_id_is_invalid(vlan_id))
 		return;
-	ports[port_id].tx_ol_flags |= PKT_TX_VLAN_PKT;
+	ports[port_id].tx_ol_flags |= TESTPMD_TX_OFFLOAD_INSERT_VLAN;
 	ports[port_id].tx_vlan_id = vlan_id;
 }
 
@@ -1692,7 +1692,7 @@ tx_vlan_reset(portid_t port_id)
 {
 	if (port_id_is_invalid(port_id))
 		return;
-	ports[port_id].tx_ol_flags &= ~PKT_TX_VLAN_PKT;
+	ports[port_id].tx_ol_flags &= ~TESTPMD_TX_OFFLOAD_INSERT_VLAN;
 }
 
 void
diff --git a/app/test-pmd/csumonly.c b/app/test-pmd/csumonly.c
index 8d10bfd..743094a 100644
--- a/app/test-pmd/csumonly.c
+++ b/app/test-pmd/csumonly.c
@@ -322,7 +322,7 @@ pkt_burst_checksum_forward(struct fwd_stream *fs)
 			/* Do not delete, this is required by HW*/
 			ipv4_hdr->hdr_checksum = 0;
 
-			if (tx_ol_flags & 0x1) {
+			if (tx_ol_flags & TESTPMD_TX_OFFLOAD_IP_CKSUM) {
 				/* HW checksum */
 				ol_flags |= PKT_TX_IP_CKSUM;
 			}
@@ -336,7 +336,7 @@ pkt_burst_checksum_forward(struct fwd_stream *fs)
 			if (l4_proto == IPPROTO_UDP) {
 				udp_hdr = (struct udp_hdr*) (rte_pktmbuf_mtod(mb,
 						unsigned char *) + l2_len + l3_len);
-				if (tx_ol_flags & 0x2) {
+				if (tx_ol_flags & TESTPMD_TX_OFFLOAD_UDP_CKSUM) {
 					/* HW Offload */
 					ol_flags |= PKT_TX_UDP_CKSUM;
 					if (ipv4_tunnel)
@@ -358,7 +358,7 @@ pkt_burst_checksum_forward(struct fwd_stream *fs)
 					uint16_t len;
 
 					/* Check if inner L3/L4 checkum flag is set */
-					if (tx_ol_flags & 0xF0)
+					if (tx_ol_flags & TESTPMD_TX_OFFLOAD_INNER_CKSUM_MASK)
 						ol_flags |= PKT_TX_VXLAN_CKSUM;
 
 					inner_l2_len  = sizeof(struct ether_hdr);
@@ -381,7 +381,7 @@ pkt_burst_checksum_forward(struct fwd_stream *fs)
 								unsigned char *) + len);
 						inner_l4_proto = inner_ipv4_hdr->next_proto_id;
 
-						if (tx_ol_flags & 0x10) {
+						if (tx_ol_flags & TESTPMD_TX_OFFLOAD_INNER_IP_CKSUM) {
 
 							/* Do not delete, this is required by HW*/
 							inner_ipv4_hdr->hdr_checksum = 0;
@@ -394,7 +394,8 @@ pkt_burst_checksum_forward(struct fwd_stream *fs)
 								unsigned char *) + len);
 						inner_l4_proto = inner_ipv6_hdr->proto;
 					}
-					if ((inner_l4_proto == IPPROTO_UDP) && (tx_ol_flags & 0x20)) {
+					if ((inner_l4_proto == IPPROTO_UDP) &&
+						(tx_ol_flags & TESTPMD_TX_OFFLOAD_INNER_UDP_CKSUM)) {
 
 						/* HW Offload */
 						ol_flags |= PKT_TX_UDP_CKSUM;
@@ -405,7 +406,8 @@ pkt_burst_checksum_forward(struct fwd_stream *fs)
 						else if (eth_type == ETHER_TYPE_IPv6)
 							inner_udp_hdr->dgram_cksum = get_ipv6_psd_sum(inner_ipv6_hdr);
 
-					} else if ((inner_l4_proto == IPPROTO_TCP) && (tx_ol_flags & 0x40)) {
+					} else if ((inner_l4_proto == IPPROTO_TCP) &&
+						(tx_ol_flags & TESTPMD_TX_OFFLOAD_INNER_TCP_CKSUM)) {
 						/* HW Offload */
 						ol_flags |= PKT_TX_TCP_CKSUM;
 						inner_tcp_hdr = (struct tcp_hdr *) (rte_pktmbuf_mtod(mb,
@@ -414,7 +416,8 @@ pkt_burst_checksum_forward(struct fwd_stream *fs)
 							inner_tcp_hdr->cksum = get_ipv4_psd_sum(inner_ipv4_hdr);
 						else if (eth_type == ETHER_TYPE_IPv6)
 							inner_tcp_hdr->cksum = get_ipv6_psd_sum(inner_ipv6_hdr);
-					} else if ((inner_l4_proto == IPPROTO_SCTP) && (tx_ol_flags & 0x80)) {
+					} else if ((inner_l4_proto == IPPROTO_SCTP) &&
+						(tx_ol_flags & TESTPMD_TX_OFFLOAD_INNER_SCTP_CKSUM)) {
 						/* HW Offload */
 						ol_flags |= PKT_TX_SCTP_CKSUM;
 						inner_sctp_hdr = (struct sctp_hdr *) (rte_pktmbuf_mtod(mb,
@@ -427,7 +430,7 @@ pkt_burst_checksum_forward(struct fwd_stream *fs)
 			} else if (l4_proto == IPPROTO_TCP) {
 				tcp_hdr = (struct tcp_hdr*) (rte_pktmbuf_mtod(mb,
 						unsigned char *) + l2_len + l3_len);
-				if (tx_ol_flags & 0x4) {
+				if (tx_ol_flags & TESTPMD_TX_OFFLOAD_TCP_CKSUM) {
 					ol_flags |= PKT_TX_TCP_CKSUM;
 					tcp_hdr->cksum = get_ipv4_psd_sum(ipv4_hdr);
 				}
@@ -440,7 +443,7 @@ pkt_burst_checksum_forward(struct fwd_stream *fs)
 				sctp_hdr = (struct sctp_hdr*) (rte_pktmbuf_mtod(mb,
 						unsigned char *) + l2_len + l3_len);
 
-				if (tx_ol_flags & 0x8) {
+				if (tx_ol_flags & TESTPMD_TX_OFFLOAD_SCTP_CKSUM) {
 					ol_flags |= PKT_TX_SCTP_CKSUM;
 					sctp_hdr->cksum = 0;
 
@@ -465,7 +468,7 @@ pkt_burst_checksum_forward(struct fwd_stream *fs)
 			if (l4_proto == IPPROTO_UDP) {
 				udp_hdr = (struct udp_hdr*) (rte_pktmbuf_mtod(mb,
 						unsigned char *) + l2_len + l3_len);
-				if (tx_ol_flags & 0x2) {
+				if (tx_ol_flags & TESTPMD_TX_OFFLOAD_UDP_CKSUM) {
 					/* HW Offload */
 					ol_flags |= PKT_TX_UDP_CKSUM;
 					if (ipv6_tunnel)
@@ -487,7 +490,7 @@ pkt_burst_checksum_forward(struct fwd_stream *fs)
 					uint16_t len;
 
 					/* Check if inner L3/L4 checksum flag is set */
-					if (tx_ol_flags & 0xF0)
+					if (tx_ol_flags & TESTPMD_TX_OFFLOAD_INNER_CKSUM_MASK)
 						ol_flags |= PKT_TX_VXLAN_CKSUM;
 
 					inner_l2_len  = sizeof(struct ether_hdr);
@@ -511,7 +514,7 @@ pkt_burst_checksum_forward(struct fwd_stream *fs)
 						inner_l4_proto = inner_ipv4_hdr->next_proto_id;
 
 						/* HW offload */
-						if (tx_ol_flags & 0x10) {
+						if (tx_ol_flags & TESTPMD_TX_OFFLOAD_INNER_IP_CKSUM) {
 
 							/* Do not delete, this is required by HW*/
 							inner_ipv4_hdr->hdr_checksum = 0;
@@ -524,7 +527,8 @@ pkt_burst_checksum_forward(struct fwd_stream *fs)
 						inner_l4_proto = inner_ipv6_hdr->proto;
 					}
 
-					if ((inner_l4_proto == IPPROTO_UDP) && (tx_ol_flags & 0x20)) {
+					if ((inner_l4_proto == IPPROTO_UDP) &&
+						(tx_ol_flags & TESTPMD_TX_OFFLOAD_INNER_UDP_CKSUM)) {
 						inner_udp_hdr = (struct udp_hdr *) (rte_pktmbuf_mtod(mb,
 							unsigned char *) + len + inner_l3_len);
 						/* HW offload */
@@ -534,7 +538,8 @@ pkt_burst_checksum_forward(struct fwd_stream *fs)
 							inner_udp_hdr->dgram_cksum = get_ipv4_psd_sum(inner_ipv4_hdr);
 						else if (eth_type == ETHER_TYPE_IPv6)
 							inner_udp_hdr->dgram_cksum = get_ipv6_psd_sum(inner_ipv6_hdr);
-					} else if ((inner_l4_proto == IPPROTO_TCP) && (tx_ol_flags & 0x40)) {
+					} else if ((inner_l4_proto == IPPROTO_TCP) &&
+						(tx_ol_flags & TESTPMD_TX_OFFLOAD_INNER_TCP_CKSUM)) {
 						/* HW offload */
 						ol_flags |= PKT_TX_TCP_CKSUM;
 						inner_tcp_hdr = (struct tcp_hdr *) (rte_pktmbuf_mtod(mb,
@@ -545,7 +550,8 @@ pkt_burst_checksum_forward(struct fwd_stream *fs)
 						else if (eth_type == ETHER_TYPE_IPv6)
 							inner_tcp_hdr->cksum = get_ipv6_psd_sum(inner_ipv6_hdr);
 
-					} else if ((inner_l4_proto == IPPROTO_SCTP) && (tx_ol_flags & 0x80)) {
+					} else if ((inner_l4_proto == IPPROTO_SCTP) &&
+						(tx_ol_flags & TESTPMD_TX_OFFLOAD_INNER_SCTP_CKSUM)) {
 						/* HW offload */
 						ol_flags |= PKT_TX_SCTP_CKSUM;
 						inner_sctp_hdr = (struct sctp_hdr *) (rte_pktmbuf_mtod(mb,
@@ -559,7 +565,7 @@ pkt_burst_checksum_forward(struct fwd_stream *fs)
 			else if (l4_proto == IPPROTO_TCP) {
 				tcp_hdr = (struct tcp_hdr*) (rte_pktmbuf_mtod(mb,
 						unsigned char *) + l2_len + l3_len);
-				if (tx_ol_flags & 0x4) {
+				if (tx_ol_flags & TESTPMD_TX_OFFLOAD_TCP_CKSUM) {
 					ol_flags |= PKT_TX_TCP_CKSUM;
 					tcp_hdr->cksum = get_ipv6_psd_sum(ipv6_hdr);
 				}
@@ -573,7 +579,7 @@ pkt_burst_checksum_forward(struct fwd_stream *fs)
 				sctp_hdr = (struct sctp_hdr*) (rte_pktmbuf_mtod(mb,
 						unsigned char *) + l2_len + l3_len);
 
-				if (tx_ol_flags & 0x8) {
+				if (tx_ol_flags & TESTPMD_TX_OFFLOAD_SCTP_CKSUM) {
 					ol_flags |= PKT_TX_SCTP_CKSUM;
 					sctp_hdr->cksum = 0;
 					/* Sanity check, only number of 4 bytes supported by HW */
diff --git a/app/test-pmd/macfwd.c b/app/test-pmd/macfwd.c
index 38bae23..aa3d705 100644
--- a/app/test-pmd/macfwd.c
+++ b/app/test-pmd/macfwd.c
@@ -85,6 +85,7 @@ pkt_burst_mac_forward(struct fwd_stream *fs)
 	uint16_t nb_rx;
 	uint16_t nb_tx;
 	uint16_t i;
+	uint64_t ol_flags = 0;
 #ifdef RTE_TEST_PMD_RECORD_CORE_CYCLES
 	uint64_t start_tsc;
 	uint64_t end_tsc;
@@ -108,6 +109,8 @@ pkt_burst_mac_forward(struct fwd_stream *fs)
 #endif
 	fs->rx_packets += nb_rx;
 	txp = &ports[fs->tx_port];
+	if (txp->tx_ol_flags & TESTPMD_TX_OFFLOAD_INSERT_VLAN)
+		ol_flags = PKT_TX_VLAN_PKT;
 	for (i = 0; i < nb_rx; i++) {
 		mb = pkts_burst[i];
 		eth_hdr = rte_pktmbuf_mtod(mb, struct ether_hdr *);
@@ -115,7 +118,7 @@ pkt_burst_mac_forward(struct fwd_stream *fs)
 				&eth_hdr->d_addr);
 		ether_addr_copy(&ports[fs->tx_port].eth_addr,
 				&eth_hdr->s_addr);
-		mb->ol_flags = txp->tx_ol_flags;
+		mb->ol_flags = ol_flags;
 		mb->l2_len = sizeof(struct ether_hdr);
 		mb->l3_len = sizeof(struct ipv4_hdr);
 		mb->vlan_tci = txp->tx_vlan_id;
diff --git a/app/test-pmd/macswap.c b/app/test-pmd/macswap.c
index 1786095..ec61657 100644
--- a/app/test-pmd/macswap.c
+++ b/app/test-pmd/macswap.c
@@ -85,6 +85,7 @@ pkt_burst_mac_swap(struct fwd_stream *fs)
 	uint16_t nb_rx;
 	uint16_t nb_tx;
 	uint16_t i;
+	uint64_t ol_flags = 0;
 #ifdef RTE_TEST_PMD_RECORD_CORE_CYCLES
 	uint64_t start_tsc;
 	uint64_t end_tsc;
@@ -108,6 +109,8 @@ pkt_burst_mac_swap(struct fwd_stream *fs)
 #endif
 	fs->rx_packets += nb_rx;
 	txp = &ports[fs->tx_port];
+	if (txp->tx_ol_flags & TESTPMD_TX_OFFLOAD_INSERT_VLAN)
+		ol_flags = PKT_TX_VLAN_PKT;
 	for (i = 0; i < nb_rx; i++) {
 		mb = pkts_burst[i];
 		eth_hdr = rte_pktmbuf_mtod(mb, struct ether_hdr *);
@@ -117,7 +120,7 @@ pkt_burst_mac_swap(struct fwd_stream *fs)
 		ether_addr_copy(&eth_hdr->s_addr, &eth_hdr->d_addr);
 		ether_addr_copy(&addr, &eth_hdr->s_addr);
 
-		mb->ol_flags = txp->tx_ol_flags;
+		mb->ol_flags = ol_flags;
 		mb->l2_len = sizeof(struct ether_hdr);
 		mb->l3_len = sizeof(struct ipv4_hdr);
 		mb->vlan_tci = txp->tx_vlan_id;
diff --git a/app/test-pmd/testpmd.h b/app/test-pmd/testpmd.h
index 9371ba1..d8847cb 100644
--- a/app/test-pmd/testpmd.h
+++ b/app/test-pmd/testpmd.h
@@ -123,14 +123,28 @@ struct fwd_stream {
 #endif
 };
 
+/** Offload IP checksum in csum forward engine */
+#define TESTPMD_TX_OFFLOAD_IP_CKSUM          0x0001
+/** Offload UDP checksum in csum forward engine */
+#define TESTPMD_TX_OFFLOAD_UDP_CKSUM         0x0002
+/** Offload TCP checksum in csum forward engine */
+#define TESTPMD_TX_OFFLOAD_TCP_CKSUM         0x0004
+/** Offload SCTP checksum in csum forward engine */
+#define TESTPMD_TX_OFFLOAD_SCTP_CKSUM        0x0008
+/** Offload inner IP checksum in csum forward engine */
+#define TESTPMD_TX_OFFLOAD_INNER_IP_CKSUM    0x0010
+/** Offload inner UDP checksum in csum forward engine */
+#define TESTPMD_TX_OFFLOAD_INNER_UDP_CKSUM   0x0020
+/** Offload inner TCP checksum in csum forward engine */
+#define TESTPMD_TX_OFFLOAD_INNER_TCP_CKSUM   0x0040
+/** Offload inner SCTP checksum in csum forward engine */
+#define TESTPMD_TX_OFFLOAD_INNER_SCTP_CKSUM  0x0080
+/** Offload inner IP checksum mask */
+#define TESTPMD_TX_OFFLOAD_INNER_CKSUM_MASK  0x00F0
+/** Insert VLAN header in forward engine */
+#define TESTPMD_TX_OFFLOAD_INSERT_VLAN       0x0100
 /**
  * The data structure associated with each port.
- * tx_ol_flags is slightly different from ol_flags of rte_mbuf.
- *   Bit  0: Insert IP checksum
- *   Bit  1: Insert UDP checksum
- *   Bit  2: Insert TCP checksum
- *   Bit  3: Insert SCTP checksum
- *   Bit 11: Insert VLAN Label
  */
 struct rte_port {
 	struct rte_eth_dev_info dev_info;   /**< PCI info + driver name */
@@ -141,7 +155,7 @@ struct rte_port {
 	struct fwd_stream       *rx_stream; /**< Port RX stream, if unique */
 	struct fwd_stream       *tx_stream; /**< Port TX stream, if unique */
 	unsigned int            socket_id;  /**< For NUMA support */
-	uint64_t                tx_ol_flags;/**< Offload Flags of TX packets. */
+	uint16_t                tx_ol_flags;/**< TX Offload Flags (TESTPMD_TX_OFFLOAD...). */
 	uint16_t                tx_vlan_id; /**< Tag Id. in TX VLAN packets. */
 	void                    *fwd_ctx;   /**< Forwarding mode context */
 	uint64_t                rx_bad_ip_csum; /**< rx pkts with bad ip checksum  */
diff --git a/app/test-pmd/txonly.c b/app/test-pmd/txonly.c
index 3d08005..c984670 100644
--- a/app/test-pmd/txonly.c
+++ b/app/test-pmd/txonly.c
@@ -196,6 +196,7 @@ static void
 pkt_burst_transmit(struct fwd_stream *fs)
 {
 	struct rte_mbuf *pkts_burst[MAX_PKT_BURST];
+	struct rte_port *txp;
 	struct rte_mbuf *pkt;
 	struct rte_mbuf *pkt_seg;
 	struct rte_mempool *mbp;
@@ -203,7 +204,7 @@ pkt_burst_transmit(struct fwd_stream *fs)
 	uint16_t nb_tx;
 	uint16_t nb_pkt;
 	uint16_t vlan_tci;
-	uint64_t ol_flags;
+	uint64_t ol_flags = 0;
 	uint8_t  i;
 #ifdef RTE_TEST_PMD_RECORD_CORE_CYCLES
 	uint64_t start_tsc;
@@ -216,8 +217,10 @@ pkt_burst_transmit(struct fwd_stream *fs)
 #endif
 
 	mbp = current_fwd_lcore()->mbp;
-	vlan_tci = ports[fs->tx_port].tx_vlan_id;
-	ol_flags = ports[fs->tx_port].tx_ol_flags;
+	txp = &ports[fs->tx_port];
+	vlan_tci = txp->tx_vlan_id;
+	if (txp->tx_ol_flags & TESTPMD_TX_OFFLOAD_INSERT_VLAN)
+		ol_flags = PKT_TX_VLAN_PKT;
 	for (nb_pkt = 0; nb_pkt < nb_pkt_per_burst; nb_pkt++) {
 		pkt = tx_mbuf_alloc(mbp);
 		if (pkt == NULL) {
-- 
2.1.0


* [dpdk-dev] [PATCH v4 08/13] testpmd: rework csum forward engine
  2014-11-26 15:04     ` [dpdk-dev] [PATCH v4 00/13] add TSO support Olivier Matz
                         ` (6 preceding siblings ...)
  2014-11-26 15:04       ` [dpdk-dev] [PATCH v4 07/13] testpmd: fix use of offload flags in testpmd Olivier Matz
@ 2014-11-26 15:04       ` Olivier Matz
  2014-11-26 20:02         ` Ananyev, Konstantin
  2014-11-26 15:04       ` [dpdk-dev] [PATCH v4 09/13] mbuf: introduce new checksum API Olivier Matz
                         ` (5 subsequent siblings)
  13 siblings, 1 reply; 112+ messages in thread
From: Olivier Matz @ 2014-11-26 15:04 UTC (permalink / raw)
  To: dev; +Cc: jigsaw

The csum forward engine was becoming too complex to be used and
extended (the next commits add TSO support):

- no explanation of what the code does
- code is not factorized, lots of code duplicated, especially between
  ipv4/ipv6
- user command line api: use of bitmasks that need to be calculated by
  the user
- the user flags don't have the same semantics:
  - for legacy IP/UDP/TCP/SCTP, it selects software or hardware checksum
  - for other (vxlan), it selects between hardware checksum or no
    checksum
- the code relies too much on flags set by the driver without software
  alternative (ex: PKT_RX_TUNNEL_IPV4_HDR). It is nice to be able to
  compare a software implementation with the hardware offload.

This commit tries to fix these issues, and provides a simple definition
of what is done by the forward engine:

 * Receive a burst of packets, and for supported packet types:
 *  - modify the IPs
 *  - reprocess the checksum in SW or HW, depending on testpmd command line
 *    configuration
 * Then packets are transmitted on the output port.
 *
 * Supported packets are:
 *   Ether / (vlan) / IP|IP6 / UDP|TCP|SCTP .
 *   Ether / (vlan) / IP|IP6 / UDP / VxLAN / Ether / IP|IP6 / UDP|TCP|SCTP
 *
 * The network parser supposes that the packet is contiguous, which may
 * not be the case in real life.
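
As a rough illustration of the first parsing step implied by the
"Ether / (vlan)" part of the description above, the following
standalone sketch extracts the ethertype and the L2 length from a
contiguous buffer. It is not the parser from this patch: the function,
the DEMO_* constants and the single-tag limitation are assumptions made
for the example.

```c
#include <stddef.h>
#include <stdint.h>

#define DEMO_ETHER_TYPE_VLAN 0x8100
#define DEMO_ETHER_HDR_LEN   14
#define DEMO_VLAN_HDR_LEN    4

/* Parse Ether / (vlan) from a contiguous buffer. On success, store the
 * ethertype of the payload and the total L2 length, and return 0.
 * Return -1 if the buffer is too short. Only one VLAN tag is handled. */
static int demo_parse_l2(const uint8_t *pkt, size_t len,
			 uint16_t *ethertype, uint16_t *l2_len)
{
	uint16_t type;
	uint16_t off = DEMO_ETHER_HDR_LEN;

	if (len < DEMO_ETHER_HDR_LEN)
		return -1;
	/* The ethertype is big-endian at offset 12 of the header. */
	type = (uint16_t)((pkt[12] << 8) | pkt[13]);
	if (type == DEMO_ETHER_TYPE_VLAN) {
		if (len < DEMO_ETHER_HDR_LEN + DEMO_VLAN_HDR_LEN)
			return -1;
		/* The inner ethertype follows the 2-byte VLAN TCI. */
		type = (uint16_t)((pkt[16] << 8) | pkt[17]);
		off += DEMO_VLAN_HDR_LEN;
	}
	*ethertype = type;
	*l2_len = off;
	return 0;
}
```

The resulting length is the kind of value a forward engine would put in
the l2_len field before asking the hardware for a checksum.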

Signed-off-by: Olivier Matz <olivier.matz@6wind.com>
---
 app/test-pmd/cmdline.c  | 156 ++++++++---
 app/test-pmd/config.c   |  13 +-
 app/test-pmd/csumonly.c | 679 ++++++++++++++++++++++--------------------------
 app/test-pmd/testpmd.h  |  17 +-
 4 files changed, 440 insertions(+), 425 deletions(-)

diff --git a/app/test-pmd/cmdline.c b/app/test-pmd/cmdline.c
index bb4e75c..722cd76 100644
--- a/app/test-pmd/cmdline.c
+++ b/app/test-pmd/cmdline.c
@@ -316,19 +316,19 @@ static void cmd_help_long_parsed(void *parsed_result,
 			"    Disable hardware insertion of a VLAN header in"
 			" packets sent on a port.\n\n"
 
-			"tx_checksum set (mask) (port_id)\n"
-			"    Enable hardware insertion of checksum offload with"
-			" the 8-bit mask, 0~0xff, in packets sent on a port.\n"
-			"        bit 0 - insert ip   checksum offload if set\n"
-			"        bit 1 - insert udp  checksum offload if set\n"
-			"        bit 2 - insert tcp  checksum offload if set\n"
-			"        bit 3 - insert sctp checksum offload if set\n"
-			"        bit 4 - insert inner ip  checksum offload if set\n"
-			"        bit 5 - insert inner udp checksum offload if set\n"
-			"        bit 6 - insert inner tcp checksum offload if set\n"
-			"        bit 7 - insert inner sctp checksum offload if set\n"
+			"tx_cksum set (ip|udp|tcp|sctp|vxlan) (hw|sw) (port_id)\n"
+			"    Select hardware or software calculation of the"
+			" checksum when transmitting a packet using the"
+			" csum forward engine.\n"
+			"    ip|udp|tcp|sctp always concern the inner layer.\n"
+			"    vxlan concerns the outer IP and UDP layer (in"
+			" case the packet is recognized as a vxlan packet by"
+			" the forward engine)\n"
 			"    Please check the NIC datasheet for HW limits.\n\n"
 
+			"tx_cksum show (port_id)\n"
+			"    Display tx checksum offload configuration\n\n"
+
 			"set fwd (%s)\n"
 			"    Set packet forwarding mode.\n\n"
 
@@ -2855,48 +2855,131 @@ cmdline_parse_inst_t cmd_tx_vlan_reset = {
 
 
 /* *** ENABLE HARDWARE INSERTION OF CHECKSUM IN TX PACKETS *** */
-struct cmd_tx_cksum_set_result {
+struct cmd_tx_cksum_result {
 	cmdline_fixed_string_t tx_cksum;
-	cmdline_fixed_string_t set;
-	uint8_t cksum_mask;
+	cmdline_fixed_string_t mode;
+	cmdline_fixed_string_t proto;
+	cmdline_fixed_string_t hwsw;
 	uint8_t port_id;
 };
 
 static void
-cmd_tx_cksum_set_parsed(void *parsed_result,
+cmd_tx_cksum_parsed(void *parsed_result,
 		       __attribute__((unused)) struct cmdline *cl,
 		       __attribute__((unused)) void *data)
 {
-	struct cmd_tx_cksum_set_result *res = parsed_result;
+	struct cmd_tx_cksum_result *res = parsed_result;
+	int hw = 0;
+	uint16_t ol_flags, mask = 0;
+	struct rte_eth_dev_info dev_info;
+
+	if (port_id_is_invalid(res->port_id)) {
+		printf("invalid port %d\n", res->port_id);
+		return;
+	}
 
-	tx_cksum_set(res->port_id, res->cksum_mask);
+	if (!strcmp(res->mode, "set")) {
+
+		if (!strcmp(res->hwsw, "hw"))
+			hw = 1;
+
+		if (!strcmp(res->proto, "ip")) {
+			mask = TESTPMD_TX_OFFLOAD_IP_CKSUM;
+		} else if (!strcmp(res->proto, "udp")) {
+			mask = TESTPMD_TX_OFFLOAD_UDP_CKSUM;
+		} else if (!strcmp(res->proto, "tcp")) {
+			mask = TESTPMD_TX_OFFLOAD_TCP_CKSUM;
+		} else if (!strcmp(res->proto, "sctp")) {
+			mask = TESTPMD_TX_OFFLOAD_SCTP_CKSUM;
+		} else if (!strcmp(res->proto, "vxlan")) {
+			mask = TESTPMD_TX_OFFLOAD_VXLAN_CKSUM;
+		}
+
+		if (hw)
+			ports[res->port_id].tx_ol_flags |= mask;
+		else
+			ports[res->port_id].tx_ol_flags &= (~mask);
+	}
+
+	ol_flags = ports[res->port_id].tx_ol_flags;
+	printf("IP checksum offload is %s\n",
+		(ol_flags & TESTPMD_TX_OFFLOAD_IP_CKSUM) ? "hw" : "sw");
+	printf("UDP checksum offload is %s\n",
+		(ol_flags & TESTPMD_TX_OFFLOAD_UDP_CKSUM) ? "hw" : "sw");
+	printf("TCP checksum offload is %s\n",
+		(ol_flags & TESTPMD_TX_OFFLOAD_TCP_CKSUM) ? "hw" : "sw");
+	printf("SCTP checksum offload is %s\n",
+		(ol_flags & TESTPMD_TX_OFFLOAD_SCTP_CKSUM) ? "hw" : "sw");
+	printf("VxLAN checksum offload is %s\n",
+		(ol_flags & TESTPMD_TX_OFFLOAD_VXLAN_CKSUM) ? "hw" : "sw");
+
+	/* display warnings if configuration is not supported by the NIC */
+	rte_eth_dev_info_get(res->port_id, &dev_info);
+	if ((ol_flags & TESTPMD_TX_OFFLOAD_IP_CKSUM) &&
+		(dev_info.tx_offload_capa & DEV_TX_OFFLOAD_IPV4_CKSUM) == 0) {
+		printf("Warning: hardware IP checksum enabled but not "
+			"supported by port %d\n", res->port_id);
+	}
+	if ((ol_flags & TESTPMD_TX_OFFLOAD_UDP_CKSUM) &&
+		(dev_info.tx_offload_capa & DEV_TX_OFFLOAD_UDP_CKSUM) == 0) {
+		printf("Warning: hardware UDP checksum enabled but not "
+			"supported by port %d\n", res->port_id);
+	}
+	if ((ol_flags & TESTPMD_TX_OFFLOAD_TCP_CKSUM) &&
+		(dev_info.tx_offload_capa & DEV_TX_OFFLOAD_TCP_CKSUM) == 0) {
+		printf("Warning: hardware TCP checksum enabled but not "
+			"supported by port %d\n", res->port_id);
+	}
+	if ((ol_flags & TESTPMD_TX_OFFLOAD_SCTP_CKSUM) &&
+		(dev_info.tx_offload_capa & DEV_TX_OFFLOAD_SCTP_CKSUM) == 0) {
+		printf("Warning: hardware SCTP checksum enabled but not "
+			"supported by port %d\n", res->port_id);
+	}
 }
 
-cmdline_parse_token_string_t cmd_tx_cksum_set_tx_cksum =
-	TOKEN_STRING_INITIALIZER(struct cmd_tx_cksum_set_result,
+cmdline_parse_token_string_t cmd_tx_cksum_tx_cksum =
+	TOKEN_STRING_INITIALIZER(struct cmd_tx_cksum_result,
 				tx_cksum, "tx_checksum");
-cmdline_parse_token_string_t cmd_tx_cksum_set_set =
-	TOKEN_STRING_INITIALIZER(struct cmd_tx_cksum_set_result,
-				set, "set");
-cmdline_parse_token_num_t cmd_tx_cksum_set_cksum_mask =
-	TOKEN_NUM_INITIALIZER(struct cmd_tx_cksum_set_result,
-				cksum_mask, UINT8);
-cmdline_parse_token_num_t cmd_tx_cksum_set_portid =
-	TOKEN_NUM_INITIALIZER(struct cmd_tx_cksum_set_result,
+cmdline_parse_token_string_t cmd_tx_cksum_mode =
+	TOKEN_STRING_INITIALIZER(struct cmd_tx_cksum_result,
+				mode, "set");
+cmdline_parse_token_string_t cmd_tx_cksum_proto =
+	TOKEN_STRING_INITIALIZER(struct cmd_tx_cksum_result,
+				proto, "ip#tcp#udp#sctp#vxlan");
+cmdline_parse_token_string_t cmd_tx_cksum_hwsw =
+	TOKEN_STRING_INITIALIZER(struct cmd_tx_cksum_result,
+				hwsw, "hw#sw");
+cmdline_parse_token_num_t cmd_tx_cksum_portid =
+	TOKEN_NUM_INITIALIZER(struct cmd_tx_cksum_result,
 				port_id, UINT8);
 
 cmdline_parse_inst_t cmd_tx_cksum_set = {
-	.f = cmd_tx_cksum_set_parsed,
+	.f = cmd_tx_cksum_parsed,
+	.data = NULL,
+	.help_str = "enable/disable hardware calculation of L3/L4 checksum when "
+		"using csum forward engine: tx_checksum set ip|tcp|udp|sctp|vxlan hw|sw <port>",
+	.tokens = {
+		(void *)&cmd_tx_cksum_tx_cksum,
+		(void *)&cmd_tx_cksum_mode,
+		(void *)&cmd_tx_cksum_proto,
+		(void *)&cmd_tx_cksum_hwsw,
+		(void *)&cmd_tx_cksum_portid,
+		NULL,
+	},
+};
+
+cmdline_parse_token_string_t cmd_tx_cksum_mode_show =
+	TOKEN_STRING_INITIALIZER(struct cmd_tx_cksum_result,
+				mode, "show");
+
+cmdline_parse_inst_t cmd_tx_cksum_show = {
+	.f = cmd_tx_cksum_parsed,
 	.data = NULL,
-	.help_str = "enable hardware insertion of L3/L4checksum with a given "
-	"mask in packets sent on a port, the bit mapping is given as, Bit 0 for ip, "
-	"Bit 1 for UDP, Bit 2 for TCP, Bit 3 for SCTP, Bit 4 for inner ip, "
-	"Bit 5 for inner UDP, Bit 6 for inner TCP, Bit 7 for inner SCTP",
+	.help_str = "show checksum offload configuration: tx_checksum show <port>",
 	.tokens = {
-		(void *)&cmd_tx_cksum_set_tx_cksum,
-		(void *)&cmd_tx_cksum_set_set,
-		(void *)&cmd_tx_cksum_set_cksum_mask,
-		(void *)&cmd_tx_cksum_set_portid,
+		(void *)&cmd_tx_cksum_tx_cksum,
+		(void *)&cmd_tx_cksum_mode_show,
+		(void *)&cmd_tx_cksum_portid,
 		NULL,
 	},
 };
@@ -8576,6 +8659,7 @@ cmdline_parse_ctx_t main_ctx[] = {
 	(cmdline_parse_inst_t *)&cmd_tx_vlan_reset,
 	(cmdline_parse_inst_t *)&cmd_tx_vlan_set_pvid,
 	(cmdline_parse_inst_t *)&cmd_tx_cksum_set,
+	(cmdline_parse_inst_t *)&cmd_tx_cksum_show,
 	(cmdline_parse_inst_t *)&cmd_link_flow_control_set,
 	(cmdline_parse_inst_t *)&cmd_link_flow_control_set_rx,
 	(cmdline_parse_inst_t *)&cmd_link_flow_control_set_tx,
diff --git a/app/test-pmd/config.c b/app/test-pmd/config.c
index c5ac8a5..71b34dd 100644
--- a/app/test-pmd/config.c
+++ b/app/test-pmd/config.c
@@ -32,7 +32,7 @@
  */
 /*   BSD LICENSE
  *
- *   Copyright(c) 2013 6WIND.
+ *   Copyright 2013-2014 6WIND S.A.
  *
  *   Redistribution and use in source and binary forms, with or without
  *   modification, are permitted provided that the following conditions
@@ -1757,17 +1757,6 @@ set_qmap(portid_t port_id, uint8_t is_rx, uint16_t queue_id, uint8_t map_value)
 }
 
 void
-tx_cksum_set(portid_t port_id, uint64_t ol_flags)
-{
-	uint64_t tx_ol_flags;
-	if (port_id_is_invalid(port_id))
-		return;
-	/* Clear last 8 bits and then set L3/4 checksum mask again */
-	tx_ol_flags = ports[port_id].tx_ol_flags & (~0x0FFull);
-	ports[port_id].tx_ol_flags = ((ol_flags & 0xff) | tx_ol_flags);
-}
-
-void
 fdir_add_signature_filter(portid_t port_id, uint8_t queue_id,
 			  struct rte_fdir_filter *fdir_filter)
 {
diff --git a/app/test-pmd/csumonly.c b/app/test-pmd/csumonly.c
index 743094a..6b28003 100644
--- a/app/test-pmd/csumonly.c
+++ b/app/test-pmd/csumonly.c
@@ -2,6 +2,7 @@
  *   BSD LICENSE
  *
  *   Copyright(c) 2010-2014 Intel Corporation. All rights reserved.
+ *   Copyright 2014 6WIND S.A.
  *   All rights reserved.
  *
  *   Redistribution and use in source and binary forms, with or without
@@ -73,13 +74,19 @@
 #include <rte_string_fns.h>
 #include "testpmd.h"
 
-
-
 #define IP_DEFTTL  64   /* from RFC 1340. */
 #define IP_VERSION 0x40
 #define IP_HDRLEN  0x05 /* default IP header length == five 32-bits words. */
 #define IP_VHL_DEF (IP_VERSION | IP_HDRLEN)
 
+/* we cannot use htons() from arpa/inet.h due to name conflicts, and we
+ * cannot use rte_cpu_to_be_16() on a constant in a switch/case */
+#if __BYTE_ORDER == __LITTLE_ENDIAN
+#define _htons(x) ((uint16_t)((((x) & 0x00ffU) << 8) | (((x) & 0xff00U) >> 8)))
+#else
+#define _htons(x) (x)
+#endif
+
 static inline uint16_t
 get_16b_sum(uint16_t *ptr16, uint32_t nr)
 {
@@ -112,7 +119,7 @@ get_ipv4_cksum(struct ipv4_hdr *ipv4_hdr)
 
 
 static inline uint16_t
-get_ipv4_psd_sum (struct ipv4_hdr * ip_hdr)
+get_ipv4_psd_sum(struct ipv4_hdr *ip_hdr)
 {
 	/* Pseudo Header for IPv4/UDP/TCP checksum */
 	union ipv4_psd_header {
@@ -136,7 +143,7 @@ get_ipv4_psd_sum (struct ipv4_hdr * ip_hdr)
 }
 
 static inline uint16_t
-get_ipv6_psd_sum (struct ipv6_hdr * ip_hdr)
+get_ipv6_psd_sum(struct ipv6_hdr *ip_hdr)
 {
 	/* Pseudo Header for IPv6/UDP/TCP checksum */
 	union ipv6_psd_header {
@@ -158,6 +165,15 @@ get_ipv6_psd_sum (struct ipv6_hdr * ip_hdr)
 	return get_16b_sum(psd_hdr.u16_arr, sizeof(psd_hdr));
 }
 
+static uint16_t
+get_psd_sum(void *l3_hdr, uint16_t ethertype)
+{
+	if (ethertype == _htons(ETHER_TYPE_IPv4))
+		return get_ipv4_psd_sum(l3_hdr);
+	else /* assume ethertype == ETHER_TYPE_IPv6 */
+		return get_ipv6_psd_sum(l3_hdr);
+}
+
 static inline uint16_t
 get_ipv4_udptcp_checksum(struct ipv4_hdr *ipv4_hdr, uint16_t *l4_hdr)
 {
@@ -174,7 +190,6 @@ get_ipv4_udptcp_checksum(struct ipv4_hdr *ipv4_hdr, uint16_t *l4_hdr)
 	if (cksum == 0)
 		cksum = 0xffff;
 	return (uint16_t)cksum;
-
 }
 
 static inline uint16_t
@@ -196,48 +211,228 @@ get_ipv6_udptcp_checksum(struct ipv6_hdr *ipv6_hdr, uint16_t *l4_hdr)
 	return (uint16_t)cksum;
 }
 
+static uint16_t
+get_udptcp_checksum(void *l3_hdr, void *l4_hdr, uint16_t ethertype)
+{
+	if (ethertype == _htons(ETHER_TYPE_IPv4))
+		return get_ipv4_udptcp_checksum(l3_hdr, l4_hdr);
+	else /* assume ethertype == ETHER_TYPE_IPv6 */
+		return get_ipv6_udptcp_checksum(l3_hdr, l4_hdr);
+}
+
+/*
+ * Parse an ethernet header to fill the ethertype, l2_len, l3_len and
+ * l4_proto. This function recognizes IPv4/IPv6 with one optional vlan
+ * header.
+ */
+static void
+parse_ethernet(struct ether_hdr *eth_hdr, uint16_t *ethertype, uint16_t *l2_len,
+	uint16_t *l3_len, uint8_t *l4_proto)
+{
+	struct ipv4_hdr *ipv4_hdr;
+	struct ipv6_hdr *ipv6_hdr;
+
+	*l2_len = sizeof(struct ether_hdr);
+	*ethertype = eth_hdr->ether_type;
+
+	if (*ethertype == _htons(ETHER_TYPE_VLAN)) {
+		struct vlan_hdr *vlan_hdr = (struct vlan_hdr *)(eth_hdr + 1);
+
+		*l2_len += sizeof(struct vlan_hdr);
+		*ethertype = vlan_hdr->eth_proto;
+	}
+
+	switch (*ethertype) {
+	case _htons(ETHER_TYPE_IPv4):
+		ipv4_hdr = (struct ipv4_hdr *) ((char *)eth_hdr + *l2_len);
+		*l3_len = (ipv4_hdr->version_ihl & 0x0f) * 4;
+		*l4_proto = ipv4_hdr->next_proto_id;
+		break;
+	case _htons(ETHER_TYPE_IPv6):
+		ipv6_hdr = (struct ipv6_hdr *) ((char *)eth_hdr + *l2_len);
+		*l3_len = sizeof(struct ipv6_hdr);
+		*l4_proto = ipv6_hdr->proto;
+		break;
+	default:
+		*l3_len = 0;
+		*l4_proto = 0;
+		break;
+	}
+}
+
+/* modify the IPv4 or IPv6 source address of a packet */
+static void
+change_ip_addresses(void *l3_hdr, uint16_t ethertype)
+{
+	struct ipv4_hdr *ipv4_hdr = l3_hdr;
+	struct ipv6_hdr *ipv6_hdr = l3_hdr;
+
+	if (ethertype == _htons(ETHER_TYPE_IPv4)) {
+		ipv4_hdr->src_addr =
+			rte_cpu_to_be_32(rte_be_to_cpu_32(ipv4_hdr->src_addr) + 1);
+	}
+	else if (ethertype == _htons(ETHER_TYPE_IPv6)) {
+		ipv6_hdr->src_addr[15] = ipv6_hdr->src_addr[15] + 1;
+	}
+}
+
+/* if possible, calculate the checksum of a packet in hw or sw,
+ * depending on the testpmd command line configuration */
+static uint64_t
+process_inner_cksums(void *l3_hdr, uint16_t ethertype, uint16_t l3_len,
+	uint8_t l4_proto, uint16_t testpmd_ol_flags)
+{
+	struct ipv4_hdr *ipv4_hdr = l3_hdr;
+	struct udp_hdr *udp_hdr;
+	struct tcp_hdr *tcp_hdr;
+	struct sctp_hdr *sctp_hdr;
+	uint64_t ol_flags = 0;
+
+	if (ethertype == _htons(ETHER_TYPE_IPv4)) {
+		ipv4_hdr = l3_hdr;
+		ipv4_hdr->hdr_checksum = 0;
+
+		if (testpmd_ol_flags & TESTPMD_TX_OFFLOAD_IP_CKSUM)
+			ol_flags |= PKT_TX_IP_CKSUM;
+		else
+			ipv4_hdr->hdr_checksum = get_ipv4_cksum(ipv4_hdr);
+
+		ol_flags |= PKT_TX_IPV4;
+	}
+	else if (ethertype == _htons(ETHER_TYPE_IPv6))
+		ol_flags |= PKT_TX_IPV6;
+	else
+		return 0; /* packet type not supported, nothing to do */
+
+	if (l4_proto == IPPROTO_UDP) {
+		udp_hdr = (struct udp_hdr *)((char *)l3_hdr + l3_len);
+		/* do not recalculate udp cksum if it was 0 */
+		if (udp_hdr->dgram_cksum != 0) {
+			udp_hdr->dgram_cksum = 0;
+			if (testpmd_ol_flags & TESTPMD_TX_OFFLOAD_UDP_CKSUM) {
+				ol_flags |= PKT_TX_UDP_CKSUM;
+				udp_hdr->dgram_cksum = get_psd_sum(l3_hdr,
+					ethertype);
+			}
+			else {
+				udp_hdr->dgram_cksum =
+					get_udptcp_checksum(l3_hdr, udp_hdr,
+						ethertype);
+			}
+		}
+	}
+	else if (l4_proto == IPPROTO_TCP) {
+		tcp_hdr = (struct tcp_hdr *)((char *)l3_hdr + l3_len);
+		tcp_hdr->cksum = 0;
+		if (testpmd_ol_flags & TESTPMD_TX_OFFLOAD_TCP_CKSUM) {
+			ol_flags |= PKT_TX_TCP_CKSUM;
+			tcp_hdr->cksum = get_psd_sum(l3_hdr, ethertype);
+		}
+		else {
+			tcp_hdr->cksum =
+				get_udptcp_checksum(l3_hdr, tcp_hdr, ethertype);
+		}
+	}
+	else if (l4_proto == IPPROTO_SCTP) {
+		sctp_hdr = (struct sctp_hdr *)((char *)l3_hdr + l3_len);
+		sctp_hdr->cksum = 0;
+		/* sctp payload must be a multiple of 4 to be
+		 * offloaded */
+		if ((testpmd_ol_flags & TESTPMD_TX_OFFLOAD_SCTP_CKSUM) &&
+			((ipv4_hdr->total_length & _htons(0x3)) == 0)) {
+			ol_flags |= PKT_TX_SCTP_CKSUM;
+		}
+		else {
+			/* XXX implement CRC32c, example available in
+			 * RFC3309 */
+		}
+	}
+
+	return ol_flags;
+}
+
+/* Calculate the checksum of outer header (only vxlan is supported,
+ * meaning IP + UDP). The caller already checked that it's a vxlan
+ * packet */
+static uint64_t
+process_outer_cksums(void *outer_l3_hdr, uint16_t outer_ethertype,
+	uint16_t outer_l3_len, uint16_t testpmd_ol_flags)
+{
+	struct ipv4_hdr *ipv4_hdr = outer_l3_hdr;
+	struct ipv6_hdr *ipv6_hdr = outer_l3_hdr;
+	struct udp_hdr *udp_hdr;
+	uint64_t ol_flags = 0;
+
+	if (testpmd_ol_flags & TESTPMD_TX_OFFLOAD_VXLAN_CKSUM)
+		ol_flags |= PKT_TX_VXLAN_CKSUM;
+
+	if (outer_ethertype == _htons(ETHER_TYPE_IPv4)) {
+		ipv4_hdr->hdr_checksum = 0;
+
+		if ((testpmd_ol_flags & TESTPMD_TX_OFFLOAD_VXLAN_CKSUM) == 0)
+			ipv4_hdr->hdr_checksum = get_ipv4_cksum(ipv4_hdr);
+	}
+
+	udp_hdr = (struct udp_hdr *)((char *)outer_l3_hdr + outer_l3_len);
+	/* do not recalculate udp cksum if it was 0 */
+	if (udp_hdr->dgram_cksum != 0) {
+		udp_hdr->dgram_cksum = 0;
+		if ((testpmd_ol_flags & TESTPMD_TX_OFFLOAD_VXLAN_CKSUM) == 0) {
+			if (outer_ethertype == _htons(ETHER_TYPE_IPv4))
+				udp_hdr->dgram_cksum =
+					get_ipv4_udptcp_checksum(ipv4_hdr,
+						(uint16_t *)udp_hdr);
+			else
+				udp_hdr->dgram_cksum =
+					get_ipv6_udptcp_checksum(ipv6_hdr,
+						(uint16_t *)udp_hdr);
+		}
+	}
+
+	return ol_flags;
+}
 
 /*
- * Forwarding of packets. Change the checksum field with HW or SW methods
- * The HW/SW method selection depends on the ol_flags on every packet
+ * Receive a burst of packets, and for each packet:
+ *  - parse packet, and try to recognize a supported packet type (1)
+ *  - if it's not a supported packet type, don't touch the packet, else:
+ *  - modify the IPs in inner headers and in outer headers if any
+ *  - reprocess the checksum of all supported layers. This is done in SW
+ *    or HW, depending on testpmd command line configuration
+ * Then transmit packets on the output port.
+ *
+ * (1) Supported packets are:
+ *   Ether / (vlan) / IP|IP6 / UDP|TCP|SCTP
+ *   Ether / (vlan) / outer IP|IP6 / outer UDP / VxLAN / Ether / IP|IP6 /
+ *           UDP|TCP|SCTP
+ *
+ * The testpmd command line for this forward engine sets the flags
+ * TESTPMD_TX_OFFLOAD_* in ports[tx_port].tx_ol_flags. They control
+ * whether a checksum must be calculated in software or in hardware. The
+ * IP, UDP, TCP and SCTP flags always concern the inner layer.  The
+ * VxLAN flag concerns the outer IP and UDP layer (if packet is
+ * recognized as a vxlan packet).
  */
 static void
 pkt_burst_checksum_forward(struct fwd_stream *fs)
 {
-	struct rte_mbuf  *pkts_burst[MAX_PKT_BURST];
-	struct rte_port  *txp;
-	struct rte_mbuf  *mb;
+	struct rte_mbuf *pkts_burst[MAX_PKT_BURST];
+	struct rte_port *txp;
+	struct rte_mbuf *m;
 	struct ether_hdr *eth_hdr;
-	struct ipv4_hdr  *ipv4_hdr;
-	struct ether_hdr *inner_eth_hdr;
-	struct ipv4_hdr  *inner_ipv4_hdr = NULL;
-	struct ipv6_hdr  *ipv6_hdr;
-	struct ipv6_hdr  *inner_ipv6_hdr = NULL;
-	struct udp_hdr   *udp_hdr;
-	struct udp_hdr   *inner_udp_hdr;
-	struct tcp_hdr   *tcp_hdr;
-	struct tcp_hdr   *inner_tcp_hdr;
-	struct sctp_hdr  *sctp_hdr;
-	struct sctp_hdr  *inner_sctp_hdr;
-
+	void *l3_hdr = NULL, *outer_l3_hdr = NULL; /* can be IPv4 or IPv6 */
+	struct udp_hdr *udp_hdr;
 	uint16_t nb_rx;
 	uint16_t nb_tx;
 	uint16_t i;
 	uint64_t ol_flags;
-	uint64_t pkt_ol_flags;
-	uint64_t tx_ol_flags;
-	uint16_t l4_proto;
-	uint16_t inner_l4_proto = 0;
-	uint16_t eth_type;
-	uint8_t  l2_len;
-	uint8_t  l3_len;
-	uint8_t  inner_l2_len = 0;
-	uint8_t  inner_l3_len = 0;
-
+	uint16_t testpmd_ol_flags;
+	uint8_t l4_proto;
+	uint16_t ethertype = 0, outer_ethertype = 0;
+	uint16_t l2_len = 0, l3_len = 0, outer_l2_len = 0, outer_l3_len = 0;
+	int tunnel = 0;
 	uint32_t rx_bad_ip_csum;
 	uint32_t rx_bad_l4_csum;
-	uint8_t  ipv4_tunnel;
-	uint8_t  ipv6_tunnel;
 
 #ifdef RTE_TEST_PMD_RECORD_CORE_CYCLES
 	uint64_t start_tsc;
@@ -249,9 +444,7 @@ pkt_burst_checksum_forward(struct fwd_stream *fs)
 	start_tsc = rte_rdtsc();
 #endif
 
-	/*
-	 * Receive a burst of packets and forward them.
-	 */
+	/* receive a burst of packets */
 	nb_rx = rte_eth_rx_burst(fs->rx_port, fs->rx_queue, pkts_burst,
 				 nb_pkt_per_burst);
 	if (unlikely(nb_rx == 0))
@@ -265,348 +458,107 @@ pkt_burst_checksum_forward(struct fwd_stream *fs)
 	rx_bad_l4_csum = 0;
 
 	txp = &ports[fs->tx_port];
-	tx_ol_flags = txp->tx_ol_flags;
+	testpmd_ol_flags = txp->tx_ol_flags;
 
 	for (i = 0; i < nb_rx; i++) {
 
-		mb = pkts_burst[i];
-		l2_len  = sizeof(struct ether_hdr);
-		pkt_ol_flags = mb->ol_flags;
-		ol_flags = (pkt_ol_flags & (~PKT_TX_L4_MASK));
-		ipv4_tunnel = (pkt_ol_flags & PKT_RX_TUNNEL_IPV4_HDR) ?
-				1 : 0;
-		ipv6_tunnel = (pkt_ol_flags & PKT_RX_TUNNEL_IPV6_HDR) ?
-				1 : 0;
-		eth_hdr = rte_pktmbuf_mtod(mb, struct ether_hdr *);
-		eth_type = rte_be_to_cpu_16(eth_hdr->ether_type);
-		if (eth_type == ETHER_TYPE_VLAN) {
-			/* Only allow single VLAN label here */
-			l2_len  += sizeof(struct vlan_hdr);
-			 eth_type = rte_be_to_cpu_16(*(uint16_t *)
-				((uintptr_t)&eth_hdr->ether_type +
-				sizeof(struct vlan_hdr)));
+		ol_flags = 0;
+		tunnel = 0;
+		m = pkts_burst[i];
+
+		/* Update the L3/L4 checksum error packet statistics */
+		rx_bad_ip_csum += ((m->ol_flags & PKT_RX_IP_CKSUM_BAD) != 0);
+		rx_bad_l4_csum += ((m->ol_flags & PKT_RX_L4_CKSUM_BAD) != 0);
+
+		/* step 1: dissect packet, parsing optional vlan, ip4/ip6, vxlan
+		 * and inner headers */
+
+		eth_hdr = rte_pktmbuf_mtod(m, struct ether_hdr *);
+		parse_ethernet(eth_hdr, &ethertype, &l2_len, &l3_len, &l4_proto);
+		l3_hdr = (char *)eth_hdr + l2_len;
+
+		/* check if it's a supported tunnel (only vxlan for now) */
+		if (l4_proto == IPPROTO_UDP) {
+			udp_hdr = (struct udp_hdr *)((char *)l3_hdr + l3_len);
+
+			/* currently, this flag is set by i40e only if the
+			 * packet is vxlan */
+			if (((m->ol_flags & PKT_RX_TUNNEL_IPV4_HDR) ||
+					(m->ol_flags & PKT_RX_TUNNEL_IPV6_HDR)))
+				tunnel = 1;
+			/* else check udp destination port, 4789 is the default
+			 * vxlan port (rfc7348) */
+			else if (udp_hdr->dst_port == _htons(4789))
+				tunnel = 1;
+
+			if (tunnel == 1) {
+				outer_ethertype = ethertype;
+				outer_l2_len = l2_len;
+				outer_l3_len = l3_len;
+				outer_l3_hdr = l3_hdr;
+
+				eth_hdr = (struct ether_hdr *)((char *)udp_hdr +
+					sizeof(struct udp_hdr) +
+					sizeof(struct vxlan_hdr));
+
+				parse_ethernet(eth_hdr, &ethertype, &l2_len,
+					&l3_len, &l4_proto);
+				l3_hdr = (char *)eth_hdr + l2_len;
+			}
 		}
 
-		/* Update the L3/L4 checksum error packet count  */
-		rx_bad_ip_csum += (uint16_t) ((pkt_ol_flags & PKT_RX_IP_CKSUM_BAD) != 0);
-		rx_bad_l4_csum += (uint16_t) ((pkt_ol_flags & PKT_RX_L4_CKSUM_BAD) != 0);
-
-		/*
-		 * Try to figure out L3 packet type by SW.
-		 */
-		if ((pkt_ol_flags & (PKT_RX_IPV4_HDR | PKT_RX_IPV4_HDR_EXT |
-				PKT_RX_IPV6_HDR | PKT_RX_IPV6_HDR_EXT)) == 0) {
-			if (eth_type == ETHER_TYPE_IPv4)
-				pkt_ol_flags |= PKT_RX_IPV4_HDR;
-			else if (eth_type == ETHER_TYPE_IPv6)
-				pkt_ol_flags |= PKT_RX_IPV6_HDR;
-		}
+		/* step 2: change all source IPs (v4 or v6) so we need
+		 * to recompute the chksums even if they were correct */
 
-		/*
-		 * Simplify the protocol parsing
-		 * Assuming the incoming packets format as
-		 *      Ethernet2 + optional single VLAN
-		 *      + ipv4 or ipv6
-		 *      + udp or tcp or sctp or others
-		 */
-		if (pkt_ol_flags & (PKT_RX_IPV4_HDR | PKT_RX_TUNNEL_IPV4_HDR)) {
+		change_ip_addresses(l3_hdr, ethertype);
+		if (tunnel == 1)
+			change_ip_addresses(outer_l3_hdr, outer_ethertype);
 
-			/* Do not support ipv4 option field */
-			l3_len = sizeof(struct ipv4_hdr) ;
+		/* step 3: depending on user command line configuration,
+		 * recompute checksum either in software or flag the
+		 * mbuf to offload the calculation to the NIC */
 
-			ipv4_hdr = (struct ipv4_hdr *) (rte_pktmbuf_mtod(mb,
-					unsigned char *) + l2_len);
+		/* process checksums of inner headers first */
+		ol_flags |= process_inner_cksums(l3_hdr, ethertype,
+			l3_len, l4_proto, testpmd_ol_flags);
 
-			l4_proto = ipv4_hdr->next_proto_id;
+		/* Then process outer headers if any. Note that the software
+		 * checksum will be wrong if one of the inner checksums is
+		 * processed in hardware. */
+		if (tunnel == 1) {
+			ol_flags |= process_outer_cksums(outer_l3_hdr,
+				outer_ethertype, outer_l3_len, testpmd_ol_flags);
+		}
 
-			/* Do not delete, this is required by HW*/
-			ipv4_hdr->hdr_checksum = 0;
+		/* step 4: fill the mbuf meta data (flags and header lengths) */
 
-			if (tx_ol_flags & TESTPMD_TX_OFFLOAD_IP_CKSUM) {
-				/* HW checksum */
-				ol_flags |= PKT_TX_IP_CKSUM;
+		if (tunnel == 1) {
+			if (testpmd_ol_flags & TESTPMD_TX_OFFLOAD_VXLAN_CKSUM) {
+				m->l2_len = outer_l2_len;
+				m->l3_len = outer_l3_len;
+				m->inner_l2_len = l2_len;
+				m->inner_l3_len = l3_len;
 			}
 			else {
-				ol_flags |= PKT_TX_IPV4;
-				/* SW checksum calculation */
-				ipv4_hdr->src_addr++;
-				ipv4_hdr->hdr_checksum = get_ipv4_cksum(ipv4_hdr);
+				/* if we don't do vxlan cksum in hw,
+				   outer checksum will be wrong becaus