DPDK patches and discussions
* [dpdk-dev] [PATCH 0/5] Offload flags fixes
@ 2021-04-01  9:52 David Marchand
  2021-04-01  9:52 ` [dpdk-dev] [PATCH 1/5] mbuf: mark old offload flag as deprecated David Marchand
                   ` (7 more replies)
  0 siblings, 8 replies; 63+ messages in thread
From: David Marchand @ 2021-04-01  9:52 UTC (permalink / raw)
  To: dev; +Cc: maxime.coquelin, olivier.matz, fbl, i.maximets

The important part is the last patch, on vhost handling of offloading
requests coming from a virtio guest interface.

The rest are small fixes that I accumulated while reviewing the mbuf
offload flags.

This last patch has the potential to break existing
applications using the vhost library (OVS being impacted).
I did not mark it for backport, but I am having second thoughts.

The vhost example has not been updated yet, as I wanted to send this
series first to get feedback before looking more into the example code.


-- 
David Marchand

David Marchand (5):
  mbuf: mark old offload flag as deprecated
  net/tap: do not touch Tx offload flags
  net/virtio: do not touch Tx offload flags
  net/virtio: refactor Tx offload helper
  vhost: fix offload flags in Rx path

 drivers/net/tap/rte_eth_tap.c                |  17 ++-
 drivers/net/virtio/virtio_rxtx.c             |   7 +-
 drivers/net/virtio/virtio_rxtx_packed_avx.h  |   2 +-
 drivers/net/virtio/virtio_rxtx_packed_neon.h |   2 +-
 drivers/net/virtio/virtqueue.h               |  81 +++++-----
 examples/vhost/main.c                        |   6 +
 lib/librte_mbuf/rte_mbuf_core.h              |   3 +-
 lib/librte_vhost/virtio_net.c                | 148 ++++++++-----------
 8 files changed, 123 insertions(+), 143 deletions(-)

-- 
2.23.0


^ permalink raw reply	[flat|nested] 63+ messages in thread

* [dpdk-dev] [PATCH 1/5] mbuf: mark old offload flag as deprecated
  2021-04-01  9:52 [dpdk-dev] [PATCH 0/5] Offload flags fixes David Marchand
@ 2021-04-01  9:52 ` David Marchand
  2021-04-07 20:14   ` Flavio Leitner
  2021-04-08  7:23   ` Olivier Matz
  2021-04-01  9:52 ` [dpdk-dev] [PATCH 2/5] net/tap: do not touch Tx offload flags David Marchand
                   ` (6 subsequent siblings)
  7 siblings, 2 replies; 63+ messages in thread
From: David Marchand @ 2021-04-01  9:52 UTC (permalink / raw)
  To: dev; +Cc: maxime.coquelin, olivier.matz, fbl, i.maximets

PKT_RX_EIP_CKSUM_BAD was declared deprecated quite some time ago,
but there was no warning for applications still using it.
Fix this by marking it as deprecated with the newly introduced
RTE_DEPRECATED macro.

Signed-off-by: David Marchand <david.marchand@redhat.com>
---
 lib/librte_mbuf/rte_mbuf_core.h | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/lib/librte_mbuf/rte_mbuf_core.h b/lib/librte_mbuf/rte_mbuf_core.h
index c17dc95c51..bb38d7f581 100644
--- a/lib/librte_mbuf/rte_mbuf_core.h
+++ b/lib/librte_mbuf/rte_mbuf_core.h
@@ -83,7 +83,8 @@ extern "C" {
  * Deprecated.
  * This flag has been renamed, use PKT_RX_OUTER_IP_CKSUM_BAD instead.
  */
-#define PKT_RX_EIP_CKSUM_BAD PKT_RX_OUTER_IP_CKSUM_BAD
+#define PKT_RX_EIP_CKSUM_BAD \
+	RTE_DEPRECATED(PKT_RX_EIP_CKSUM_BAD) PKT_RX_OUTER_IP_CKSUM_BAD
 
 /**
  * A vlan has been stripped by the hardware and its tci is saved in
-- 
2.23.0



* [dpdk-dev] [PATCH 2/5] net/tap: do not touch Tx offload flags
  2021-04-01  9:52 [dpdk-dev] [PATCH 0/5] Offload flags fixes David Marchand
  2021-04-01  9:52 ` [dpdk-dev] [PATCH 1/5] mbuf: mark old offload flag as deprecated David Marchand
@ 2021-04-01  9:52 ` David Marchand
  2021-04-07 20:15   ` Flavio Leitner
  2021-04-08  7:53   ` Olivier Matz
  2021-04-01  9:52 ` [dpdk-dev] [PATCH 3/5] net/virtio: " David Marchand
                   ` (5 subsequent siblings)
  7 siblings, 2 replies; 63+ messages in thread
From: David Marchand @ 2021-04-01  9:52 UTC (permalink / raw)
  To: dev; +Cc: maxime.coquelin, olivier.matz, fbl, i.maximets, Keith Wiles

Tx offload flags are the application's responsibility.
Leave the mbuf alone and check for TSO where needed.

Signed-off-by: David Marchand <david.marchand@redhat.com>
---
 drivers/net/tap/rte_eth_tap.c | 17 ++++++++++-------
 1 file changed, 10 insertions(+), 7 deletions(-)

diff --git a/drivers/net/tap/rte_eth_tap.c b/drivers/net/tap/rte_eth_tap.c
index c36d4bf76e..285fe395c5 100644
--- a/drivers/net/tap/rte_eth_tap.c
+++ b/drivers/net/tap/rte_eth_tap.c
@@ -562,6 +562,7 @@ tap_tx_l3_cksum(char *packet, uint64_t ol_flags, unsigned int l2_len,
 		uint16_t *l4_phdr_cksum, uint32_t *l4_raw_cksum)
 {
 	void *l3_hdr = packet + l2_len;
+	uint64_t csum_l4;
 
 	if (ol_flags & (PKT_TX_IP_CKSUM | PKT_TX_IPV4)) {
 		struct rte_ipv4_hdr *iph = l3_hdr;
@@ -571,13 +572,17 @@ tap_tx_l3_cksum(char *packet, uint64_t ol_flags, unsigned int l2_len,
 		cksum = rte_raw_cksum(iph, l3_len);
 		iph->hdr_checksum = (cksum == 0xffff) ? cksum : ~cksum;
 	}
-	if (ol_flags & PKT_TX_L4_MASK) {
+
+	csum_l4 = ol_flags & PKT_TX_L4_MASK;
+	if (ol_flags & PKT_TX_TCP_SEG)
+		csum_l4 |= PKT_TX_TCP_CKSUM;
+	if (csum_l4) {
 		void *l4_hdr;
 
 		l4_hdr = packet + l2_len + l3_len;
-		if ((ol_flags & PKT_TX_L4_MASK) == PKT_TX_UDP_CKSUM)
+		if (csum_l4 == PKT_TX_UDP_CKSUM)
 			*l4_cksum = &((struct rte_udp_hdr *)l4_hdr)->dgram_cksum;
-		else if ((ol_flags & PKT_TX_L4_MASK) == PKT_TX_TCP_CKSUM)
+		else if (csum_l4 == PKT_TX_TCP_CKSUM)
 			*l4_cksum = &((struct rte_tcp_hdr *)l4_hdr)->cksum;
 		else
 			return;
@@ -648,7 +653,8 @@ tap_write_mbufs(struct tx_queue *txq, uint16_t num_mbufs,
 		if (txq->csum &&
 		    ((mbuf->ol_flags & (PKT_TX_IP_CKSUM | PKT_TX_IPV4) ||
 		     (mbuf->ol_flags & PKT_TX_L4_MASK) == PKT_TX_UDP_CKSUM ||
-		     (mbuf->ol_flags & PKT_TX_L4_MASK) == PKT_TX_TCP_CKSUM))) {
+		     (mbuf->ol_flags & PKT_TX_L4_MASK) == PKT_TX_TCP_CKSUM) ||
+		     (mbuf->ol_flags & PKT_TX_TCP_SEG))) {
 			is_cksum = 1;
 
 			/* Support only packets with at least layer 4
@@ -742,9 +748,6 @@ pmd_tx_burst(void *queue, struct rte_mbuf **bufs, uint16_t nb_pkts)
 		if (tso) {
 			struct rte_gso_ctx *gso_ctx = &txq->gso_ctx;
 
-			/* TCP segmentation implies TCP checksum offload */
-			mbuf_in->ol_flags |= PKT_TX_TCP_CKSUM;
-
 			/* gso size is calculated without RTE_ETHER_CRC_LEN */
 			hdrs_len = mbuf_in->l2_len + mbuf_in->l3_len +
 					mbuf_in->l4_len;
-- 
2.23.0



* [dpdk-dev] [PATCH 3/5] net/virtio: do not touch Tx offload flags
  2021-04-01  9:52 [dpdk-dev] [PATCH 0/5] Offload flags fixes David Marchand
  2021-04-01  9:52 ` [dpdk-dev] [PATCH 1/5] mbuf: mark old offload flag as deprecated David Marchand
  2021-04-01  9:52 ` [dpdk-dev] [PATCH 2/5] net/tap: do not touch Tx offload flags David Marchand
@ 2021-04-01  9:52 ` David Marchand
  2021-04-13 14:17   ` Maxime Coquelin
  2021-04-01  9:52 ` [dpdk-dev] [PATCH 4/5] net/virtio: refactor Tx offload helper David Marchand
                   ` (4 subsequent siblings)
  7 siblings, 1 reply; 63+ messages in thread
From: David Marchand @ 2021-04-01  9:52 UTC (permalink / raw)
  To: dev; +Cc: maxime.coquelin, olivier.matz, fbl, i.maximets, Chenbo Xia

Tx offload flags are the application's responsibility.
Leave the mbuf alone and use local storage for the implicit TCP checksum
offload in the TSO case.

Signed-off-by: David Marchand <david.marchand@redhat.com>
---
 drivers/net/virtio/virtqueue.h | 6 ++++--
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/drivers/net/virtio/virtqueue.h b/drivers/net/virtio/virtqueue.h
index 71b66f3208..2e8826bc28 100644
--- a/drivers/net/virtio/virtqueue.h
+++ b/drivers/net/virtio/virtqueue.h
@@ -618,10 +618,12 @@ virtqueue_xmit_offload(struct virtio_net_hdr *hdr,
 			uint8_t offload)
 {
 	if (offload) {
+		uint64_t csum_l4 = cookie->ol_flags & PKT_TX_L4_MASK;
+
 		if (cookie->ol_flags & PKT_TX_TCP_SEG)
-			cookie->ol_flags |= PKT_TX_TCP_CKSUM;
+			csum_l4 |= PKT_TX_TCP_CKSUM;
 
-		switch (cookie->ol_flags & PKT_TX_L4_MASK) {
+		switch (csum_l4) {
 		case PKT_TX_UDP_CKSUM:
 			hdr->csum_start = cookie->l2_len + cookie->l3_len;
 			hdr->csum_offset = offsetof(struct rte_udp_hdr,
-- 
2.23.0



* [dpdk-dev] [PATCH 4/5] net/virtio: refactor Tx offload helper
  2021-04-01  9:52 [dpdk-dev] [PATCH 0/5] Offload flags fixes David Marchand
                   ` (2 preceding siblings ...)
  2021-04-01  9:52 ` [dpdk-dev] [PATCH 3/5] net/virtio: " David Marchand
@ 2021-04-01  9:52 ` David Marchand
  2021-04-08 13:05   ` Flavio Leitner
  2021-04-09  2:31   ` Ruifeng Wang
  2021-04-01  9:52 ` [dpdk-dev] [PATCH 5/5] vhost: fix offload flags in Rx path David Marchand
                   ` (3 subsequent siblings)
  7 siblings, 2 replies; 63+ messages in thread
From: David Marchand @ 2021-04-01  9:52 UTC (permalink / raw)
  To: dev
  Cc: maxime.coquelin, olivier.matz, fbl, i.maximets, Chenbo Xia,
	Bruce Richardson, Konstantin Ananyev, Jerin Jacob, Ruifeng Wang

Purely cosmetic, but it is rather odd to have an "offload" helper that
checks whether it actually has anything to do.
We already have the same checks in most callers, so move this branch
into them.

Signed-off-by: David Marchand <david.marchand@redhat.com>
---
 drivers/net/virtio/virtio_rxtx.c             |  7 +-
 drivers/net/virtio/virtio_rxtx_packed_avx.h  |  2 +-
 drivers/net/virtio/virtio_rxtx_packed_neon.h |  2 +-
 drivers/net/virtio/virtqueue.h               | 83 +++++++++-----------
 4 files changed, 44 insertions(+), 50 deletions(-)

diff --git a/drivers/net/virtio/virtio_rxtx.c b/drivers/net/virtio/virtio_rxtx.c
index 40283001b0..a4e37ef379 100644
--- a/drivers/net/virtio/virtio_rxtx.c
+++ b/drivers/net/virtio/virtio_rxtx.c
@@ -448,7 +448,7 @@ virtqueue_enqueue_xmit_inorder(struct virtnet_tx *txvq,
 		if (!vq->hw->has_tx_offload)
 			virtqueue_clear_net_hdr(hdr);
 		else
-			virtqueue_xmit_offload(hdr, cookies[i], true);
+			virtqueue_xmit_offload(hdr, cookies[i]);
 
 		start_dp[idx].addr  = rte_mbuf_data_iova(cookies[i]) - head_size;
 		start_dp[idx].len   = cookies[i]->data_len + head_size;
@@ -495,7 +495,7 @@ virtqueue_enqueue_xmit_packed_fast(struct virtnet_tx *txvq,
 	if (!vq->hw->has_tx_offload)
 		virtqueue_clear_net_hdr(hdr);
 	else
-		virtqueue_xmit_offload(hdr, cookie, true);
+		virtqueue_xmit_offload(hdr, cookie);
 
 	dp->addr = rte_mbuf_data_iova(cookie) - head_size;
 	dp->len  = cookie->data_len + head_size;
@@ -581,7 +581,8 @@ virtqueue_enqueue_xmit(struct virtnet_tx *txvq, struct rte_mbuf *cookie,
 		idx = start_dp[idx].next;
 	}
 
-	virtqueue_xmit_offload(hdr, cookie, vq->hw->has_tx_offload);
+	if (vq->hw->has_tx_offload)
+		virtqueue_xmit_offload(hdr, cookie);
 
 	do {
 		start_dp[idx].addr  = rte_mbuf_data_iova(cookie);
diff --git a/drivers/net/virtio/virtio_rxtx_packed_avx.h b/drivers/net/virtio/virtio_rxtx_packed_avx.h
index 49e845d02a..33cac3244f 100644
--- a/drivers/net/virtio/virtio_rxtx_packed_avx.h
+++ b/drivers/net/virtio/virtio_rxtx_packed_avx.h
@@ -115,7 +115,7 @@ virtqueue_enqueue_batch_packed_vec(struct virtnet_tx *txvq,
 		virtio_for_each_try_unroll(i, 0, PACKED_BATCH_SIZE) {
 			hdr = rte_pktmbuf_mtod_offset(tx_pkts[i],
 					struct virtio_net_hdr *, -head_size);
-			virtqueue_xmit_offload(hdr, tx_pkts[i], true);
+			virtqueue_xmit_offload(hdr, tx_pkts[i]);
 		}
 	}
 
diff --git a/drivers/net/virtio/virtio_rxtx_packed_neon.h b/drivers/net/virtio/virtio_rxtx_packed_neon.h
index 851c81f312..1a49caf8af 100644
--- a/drivers/net/virtio/virtio_rxtx_packed_neon.h
+++ b/drivers/net/virtio/virtio_rxtx_packed_neon.h
@@ -134,7 +134,7 @@ virtqueue_enqueue_batch_packed_vec(struct virtnet_tx *txvq,
 		virtio_for_each_try_unroll(i, 0, PACKED_BATCH_SIZE) {
 			hdr = rte_pktmbuf_mtod_offset(tx_pkts[i],
 					struct virtio_net_hdr *, -head_size);
-			virtqueue_xmit_offload(hdr, tx_pkts[i], true);
+			virtqueue_xmit_offload(hdr, tx_pkts[i]);
 		}
 	}
 
diff --git a/drivers/net/virtio/virtqueue.h b/drivers/net/virtio/virtqueue.h
index 2e8826bc28..41a9b82a5f 100644
--- a/drivers/net/virtio/virtqueue.h
+++ b/drivers/net/virtio/virtqueue.h
@@ -613,52 +613,44 @@ virtqueue_notify(struct virtqueue *vq)
 } while (0)
 
 static inline void
-virtqueue_xmit_offload(struct virtio_net_hdr *hdr,
-			struct rte_mbuf *cookie,
-			uint8_t offload)
+virtqueue_xmit_offload(struct virtio_net_hdr *hdr, struct rte_mbuf *cookie)
 {
-	if (offload) {
-		uint64_t csum_l4 = cookie->ol_flags & PKT_TX_L4_MASK;
-
-		if (cookie->ol_flags & PKT_TX_TCP_SEG)
-			csum_l4 |= PKT_TX_TCP_CKSUM;
-
-		switch (csum_l4) {
-		case PKT_TX_UDP_CKSUM:
-			hdr->csum_start = cookie->l2_len + cookie->l3_len;
-			hdr->csum_offset = offsetof(struct rte_udp_hdr,
-				dgram_cksum);
-			hdr->flags = VIRTIO_NET_HDR_F_NEEDS_CSUM;
-			break;
-
-		case PKT_TX_TCP_CKSUM:
-			hdr->csum_start = cookie->l2_len + cookie->l3_len;
-			hdr->csum_offset = offsetof(struct rte_tcp_hdr, cksum);
-			hdr->flags = VIRTIO_NET_HDR_F_NEEDS_CSUM;
-			break;
-
-		default:
-			ASSIGN_UNLESS_EQUAL(hdr->csum_start, 0);
-			ASSIGN_UNLESS_EQUAL(hdr->csum_offset, 0);
-			ASSIGN_UNLESS_EQUAL(hdr->flags, 0);
-			break;
-		}
+	uint64_t csum_l4 = cookie->ol_flags & PKT_TX_L4_MASK;
+
+	if (cookie->ol_flags & PKT_TX_TCP_SEG)
+		csum_l4 |= PKT_TX_TCP_CKSUM;
+
+	switch (csum_l4) {
+	case PKT_TX_UDP_CKSUM:
+		hdr->csum_start = cookie->l2_len + cookie->l3_len;
+		hdr->csum_offset = offsetof(struct rte_udp_hdr, dgram_cksum);
+		hdr->flags = VIRTIO_NET_HDR_F_NEEDS_CSUM;
+		break;
+
+	case PKT_TX_TCP_CKSUM:
+		hdr->csum_start = cookie->l2_len + cookie->l3_len;
+		hdr->csum_offset = offsetof(struct rte_tcp_hdr, cksum);
+		hdr->flags = VIRTIO_NET_HDR_F_NEEDS_CSUM;
+		break;
+
+	default:
+		ASSIGN_UNLESS_EQUAL(hdr->csum_start, 0);
+		ASSIGN_UNLESS_EQUAL(hdr->csum_offset, 0);
+		ASSIGN_UNLESS_EQUAL(hdr->flags, 0);
+		break;
+	}
 
-		/* TCP Segmentation Offload */
-		if (cookie->ol_flags & PKT_TX_TCP_SEG) {
-			hdr->gso_type = (cookie->ol_flags & PKT_TX_IPV6) ?
-				VIRTIO_NET_HDR_GSO_TCPV6 :
-				VIRTIO_NET_HDR_GSO_TCPV4;
-			hdr->gso_size = cookie->tso_segsz;
-			hdr->hdr_len =
-				cookie->l2_len +
-				cookie->l3_len +
-				cookie->l4_len;
-		} else {
-			ASSIGN_UNLESS_EQUAL(hdr->gso_type, 0);
-			ASSIGN_UNLESS_EQUAL(hdr->gso_size, 0);
-			ASSIGN_UNLESS_EQUAL(hdr->hdr_len, 0);
-		}
+	/* TCP Segmentation Offload */
+	if (cookie->ol_flags & PKT_TX_TCP_SEG) {
+		hdr->gso_type = (cookie->ol_flags & PKT_TX_IPV6) ?
+			VIRTIO_NET_HDR_GSO_TCPV6 :
+			VIRTIO_NET_HDR_GSO_TCPV4;
+		hdr->gso_size = cookie->tso_segsz;
+		hdr->hdr_len = cookie->l2_len + cookie->l3_len + cookie->l4_len;
+	} else {
+		ASSIGN_UNLESS_EQUAL(hdr->gso_type, 0);
+		ASSIGN_UNLESS_EQUAL(hdr->gso_size, 0);
+		ASSIGN_UNLESS_EQUAL(hdr->hdr_len, 0);
 	}
 }
 
@@ -737,7 +729,8 @@ virtqueue_enqueue_xmit_packed(struct virtnet_tx *txvq, struct rte_mbuf *cookie,
 		}
 	}
 
-	virtqueue_xmit_offload(hdr, cookie, vq->hw->has_tx_offload);
+	if (vq->hw->has_tx_offload)
+		virtqueue_xmit_offload(hdr, cookie);
 
 	do {
 		uint16_t flags;
-- 
2.23.0



* [dpdk-dev] [PATCH 5/5] vhost: fix offload flags in Rx path
  2021-04-01  9:52 [dpdk-dev] [PATCH 0/5] Offload flags fixes David Marchand
                   ` (3 preceding siblings ...)
  2021-04-01  9:52 ` [dpdk-dev] [PATCH 4/5] net/virtio: refactor Tx offload helper David Marchand
@ 2021-04-01  9:52 ` David Marchand
  2021-04-08  8:28   ` Olivier Matz
  2021-04-08 18:38   ` Flavio Leitner
  2021-04-29  8:04 ` [dpdk-dev] [PATCH v2 0/4] Offload flags fixes David Marchand
                   ` (2 subsequent siblings)
  7 siblings, 2 replies; 63+ messages in thread
From: David Marchand @ 2021-04-01  9:52 UTC (permalink / raw)
  To: dev
  Cc: maxime.coquelin, olivier.matz, fbl, i.maximets, Chenbo Xia,
	Jijiang Liu, Yuanhan Liu

The vhost library currently configures Tx offloading (PKT_TX_*) on any
packet received from a guest virtio device that asks for some offloading.

This is problematic, as Tx offloading is something that the application
must ask for: the application needs to configure devices
to support every offload in use (IP and TCP checksumming, TSO, ...), and the
various l2/l3/l4 lengths must be set following any processing that
happened in the application itself.

On the other hand, the received packets are not marked with the current
packet l3/l4 checksum status.

Copy virtio rx processing to fix those offload flags.

The vhost example needs reworking as it was built with the assumption
that mbuf TSO configuration is set up by the vhost library.
This is not done in this patch, so for now TSO activation is forcibly
refused.

Fixes: 859b480d5afd ("vhost: add guest offload setting")

Signed-off-by: David Marchand <david.marchand@redhat.com>
---
 examples/vhost/main.c         |   6 ++
 lib/librte_vhost/virtio_net.c | 148 ++++++++++++++--------------------
 2 files changed, 67 insertions(+), 87 deletions(-)

diff --git a/examples/vhost/main.c b/examples/vhost/main.c
index 2ca7d98c58..819cd9909f 100644
--- a/examples/vhost/main.c
+++ b/examples/vhost/main.c
@@ -607,6 +607,12 @@ us_vhost_parse_args(int argc, char **argv)
 				us_vhost_usage(prgname);
 				return -1;
 			}
+			/* FIXME: tso support is broken */
+			if (ret != 0) {
+				RTE_LOG(INFO, VHOST_CONFIG, "TSO support is broken\n");
+				us_vhost_usage(prgname);
+				return -1;
+			}
 			enable_tso = ret;
 			break;
 
diff --git a/lib/librte_vhost/virtio_net.c b/lib/librte_vhost/virtio_net.c
index 583bf379c6..06089a4206 100644
--- a/lib/librte_vhost/virtio_net.c
+++ b/lib/librte_vhost/virtio_net.c
@@ -8,6 +8,7 @@
 
 #include <rte_mbuf.h>
 #include <rte_memcpy.h>
+#include <rte_net.h>
 #include <rte_ether.h>
 #include <rte_ip.h>
 #include <rte_vhost.h>
@@ -1821,105 +1822,75 @@ virtio_net_with_host_offload(struct virtio_net *dev)
 	return false;
 }
 
-static void
-parse_ethernet(struct rte_mbuf *m, uint16_t *l4_proto, void **l4_hdr)
-{
-	struct rte_ipv4_hdr *ipv4_hdr;
-	struct rte_ipv6_hdr *ipv6_hdr;
-	void *l3_hdr = NULL;
-	struct rte_ether_hdr *eth_hdr;
-	uint16_t ethertype;
-
-	eth_hdr = rte_pktmbuf_mtod(m, struct rte_ether_hdr *);
-
-	m->l2_len = sizeof(struct rte_ether_hdr);
-	ethertype = rte_be_to_cpu_16(eth_hdr->ether_type);
-
-	if (ethertype == RTE_ETHER_TYPE_VLAN) {
-		struct rte_vlan_hdr *vlan_hdr =
-			(struct rte_vlan_hdr *)(eth_hdr + 1);
-
-		m->l2_len += sizeof(struct rte_vlan_hdr);
-		ethertype = rte_be_to_cpu_16(vlan_hdr->eth_proto);
-	}
-
-	l3_hdr = (char *)eth_hdr + m->l2_len;
-
-	switch (ethertype) {
-	case RTE_ETHER_TYPE_IPV4:
-		ipv4_hdr = l3_hdr;
-		*l4_proto = ipv4_hdr->next_proto_id;
-		m->l3_len = rte_ipv4_hdr_len(ipv4_hdr);
-		*l4_hdr = (char *)l3_hdr + m->l3_len;
-		m->ol_flags |= PKT_TX_IPV4;
-		break;
-	case RTE_ETHER_TYPE_IPV6:
-		ipv6_hdr = l3_hdr;
-		*l4_proto = ipv6_hdr->proto;
-		m->l3_len = sizeof(struct rte_ipv6_hdr);
-		*l4_hdr = (char *)l3_hdr + m->l3_len;
-		m->ol_flags |= PKT_TX_IPV6;
-		break;
-	default:
-		m->l3_len = 0;
-		*l4_proto = 0;
-		*l4_hdr = NULL;
-		break;
-	}
-}
-
-static __rte_always_inline void
+static __rte_always_inline int
 vhost_dequeue_offload(struct virtio_net_hdr *hdr, struct rte_mbuf *m)
 {
-	uint16_t l4_proto = 0;
-	void *l4_hdr = NULL;
-	struct rte_tcp_hdr *tcp_hdr = NULL;
+	struct rte_net_hdr_lens hdr_lens;
+	uint32_t hdrlen, ptype;
+	int l4_supported = 0;
 
+	/* nothing to do */
 	if (hdr->flags == 0 && hdr->gso_type == VIRTIO_NET_HDR_GSO_NONE)
-		return;
-
-	parse_ethernet(m, &l4_proto, &l4_hdr);
-	if (hdr->flags == VIRTIO_NET_HDR_F_NEEDS_CSUM) {
-		if (hdr->csum_start == (m->l2_len + m->l3_len)) {
-			switch (hdr->csum_offset) {
-			case (offsetof(struct rte_tcp_hdr, cksum)):
-				if (l4_proto == IPPROTO_TCP)
-					m->ol_flags |= PKT_TX_TCP_CKSUM;
-				break;
-			case (offsetof(struct rte_udp_hdr, dgram_cksum)):
-				if (l4_proto == IPPROTO_UDP)
-					m->ol_flags |= PKT_TX_UDP_CKSUM;
-				break;
-			case (offsetof(struct rte_sctp_hdr, cksum)):
-				if (l4_proto == IPPROTO_SCTP)
-					m->ol_flags |= PKT_TX_SCTP_CKSUM;
-				break;
-			default:
-				break;
-			}
+		return 0;
+
+	m->ol_flags |= PKT_RX_IP_CKSUM_UNKNOWN;
+
+	ptype = rte_net_get_ptype(m, &hdr_lens, RTE_PTYPE_ALL_MASK);
+	m->packet_type = ptype;
+	if ((ptype & RTE_PTYPE_L4_MASK) == RTE_PTYPE_L4_TCP ||
+	    (ptype & RTE_PTYPE_L4_MASK) == RTE_PTYPE_L4_UDP ||
+	    (ptype & RTE_PTYPE_L4_MASK) == RTE_PTYPE_L4_SCTP)
+		l4_supported = 1;
+
+	if (hdr->flags & VIRTIO_NET_HDR_F_NEEDS_CSUM) {
+		hdrlen = hdr_lens.l2_len + hdr_lens.l3_len + hdr_lens.l4_len;
+		if (hdr->csum_start <= hdrlen && l4_supported) {
+			m->ol_flags |= PKT_RX_L4_CKSUM_NONE;
+		} else {
+			/* Unknown proto or tunnel, do sw cksum. We can assume
+			 * the cksum field is in the first segment since the
+			 * buffers we provided to the host are large enough.
+			 * In case of SCTP, this will be wrong since it's a CRC
+			 * but there's nothing we can do.
+			 */
+			uint16_t csum = 0, off;
+
+			if (rte_raw_cksum_mbuf(m, hdr->csum_start,
+				rte_pktmbuf_pkt_len(m) - hdr->csum_start,
+				&csum) < 0)
+				return -EINVAL;
+			if (likely(csum != 0xffff))
+				csum = ~csum;
+			off = hdr->csum_offset + hdr->csum_start;
+			if (rte_pktmbuf_data_len(m) >= off + 1)
+				*rte_pktmbuf_mtod_offset(m, uint16_t *,
+					off) = csum;
 		}
+	} else if (hdr->flags & VIRTIO_NET_HDR_F_DATA_VALID && l4_supported) {
+		m->ol_flags |= PKT_RX_L4_CKSUM_GOOD;
 	}
 
-	if (l4_hdr && hdr->gso_type != VIRTIO_NET_HDR_GSO_NONE) {
+	/* GSO request, save required information in mbuf */
+	if (hdr->gso_type != VIRTIO_NET_HDR_GSO_NONE) {
+		/* Check unsupported modes */
+		if ((hdr->gso_type & VIRTIO_NET_HDR_GSO_ECN) ||
+		    (hdr->gso_size == 0)) {
+			return -EINVAL;
+		}
+
+		/* Update mss lengths in mbuf */
+		m->tso_segsz = hdr->gso_size;
 		switch (hdr->gso_type & ~VIRTIO_NET_HDR_GSO_ECN) {
 		case VIRTIO_NET_HDR_GSO_TCPV4:
 		case VIRTIO_NET_HDR_GSO_TCPV6:
-			tcp_hdr = l4_hdr;
-			m->ol_flags |= PKT_TX_TCP_SEG;
-			m->tso_segsz = hdr->gso_size;
-			m->l4_len = (tcp_hdr->data_off & 0xf0) >> 2;
-			break;
-		case VIRTIO_NET_HDR_GSO_UDP:
-			m->ol_flags |= PKT_TX_UDP_SEG;
-			m->tso_segsz = hdr->gso_size;
-			m->l4_len = sizeof(struct rte_udp_hdr);
+			m->ol_flags |= PKT_RX_LRO | PKT_RX_L4_CKSUM_NONE;
 			break;
 		default:
-			VHOST_LOG_DATA(WARNING,
-				"unsupported gso type %u.\n", hdr->gso_type);
-			break;
+			return -EINVAL;
 		}
 	}
+
+	return 0;
 }
 
 static __rte_noinline void
@@ -2078,8 +2049,11 @@ copy_desc_to_mbuf(struct virtio_net *dev, struct vhost_virtqueue *vq,
 	prev->data_len = mbuf_offset;
 	m->pkt_len    += mbuf_offset;
 
-	if (hdr)
-		vhost_dequeue_offload(hdr, m);
+	if (hdr && vhost_dequeue_offload(hdr, m) < 0) {
+		VHOST_LOG_DATA(ERR, "Packet with invalid offloads.\n");
+		error = -1;
+		goto out;
+	}
 
 out:
 
-- 
2.23.0



* Re: [dpdk-dev] [PATCH 1/5] mbuf: mark old offload flag as deprecated
  2021-04-01  9:52 ` [dpdk-dev] [PATCH 1/5] mbuf: mark old offload flag as deprecated David Marchand
@ 2021-04-07 20:14   ` Flavio Leitner
  2021-04-08  7:23   ` Olivier Matz
  1 sibling, 0 replies; 63+ messages in thread
From: Flavio Leitner @ 2021-04-07 20:14 UTC (permalink / raw)
  To: David Marchand; +Cc: dev, maxime.coquelin, olivier.matz, i.maximets

On Thu, Apr 01, 2021 at 11:52:39AM +0200, David Marchand wrote:
> PKT_RX_EIP_CKSUM_BAD was declared deprecated quite some time ago,
> but there was no warning for applications still using it.
> Fix this by marking it as deprecated with the newly introduced
> RTE_DEPRECATED macro.
> 
> Signed-off-by: David Marchand <david.marchand@redhat.com>
> ---

Reviewed-by: Flavio Leitner <fbl@sysclose.org>



* Re: [dpdk-dev] [PATCH 2/5] net/tap: do not touch Tx offload flags
  2021-04-01  9:52 ` [dpdk-dev] [PATCH 2/5] net/tap: do not touch Tx offload flags David Marchand
@ 2021-04-07 20:15   ` Flavio Leitner
  2021-04-08  7:41     ` Olivier Matz
  2021-04-08  7:53   ` Olivier Matz
  1 sibling, 1 reply; 63+ messages in thread
From: Flavio Leitner @ 2021-04-07 20:15 UTC (permalink / raw)
  To: David Marchand
  Cc: dev, maxime.coquelin, olivier.matz, i.maximets, Keith Wiles

On Thu, Apr 01, 2021 at 11:52:40AM +0200, David Marchand wrote:
> Tx offload flags are the application's responsibility.
> Leave the mbuf alone and check for TSO where needed.
> 
> Signed-off-by: David Marchand <david.marchand@redhat.com>
> ---

The patch looks good, but maybe a better approach would be
to change the documentation to require the TCP_CKSUM flag
when TCP_SEG is used, otherwise this flag adjusting needs
to be replicated every time TCP_SEG is used.

The above could break existing applications, so perhaps doing
something like below would be better and backwards compatible?
Then we can remove those places tweaking the flags completely.

diff --git a/lib/librte_mbuf/rte_mbuf_core.h b/lib/librte_mbuf/rte_mbuf_core.h
index c17dc95c5..6a0c2cdd9 100644
--- a/lib/librte_mbuf/rte_mbuf_core.h
+++ b/lib/librte_mbuf/rte_mbuf_core.h
@@ -298,7 +298,7 @@ extern "C" {
  *  - if it's IPv4, set the PKT_TX_IP_CKSUM flag
  *  - fill the mbuf offload information: l2_len, l3_len, l4_len, tso_segsz
  */
-#define PKT_TX_TCP_SEG       (1ULL << 50)
+#define PKT_TX_TCP_SEG       (1ULL << 50) | PKT_TX_TCP_CKSUM
 
 /** TX IEEE1588 packet to timestamp. */
 #define PKT_TX_IEEE1588_TMST (1ULL << 51)

Thanks,
fbl


* Re: [dpdk-dev] [PATCH 1/5] mbuf: mark old offload flag as deprecated
  2021-04-01  9:52 ` [dpdk-dev] [PATCH 1/5] mbuf: mark old offload flag as deprecated David Marchand
  2021-04-07 20:14   ` Flavio Leitner
@ 2021-04-08  7:23   ` Olivier Matz
  2021-04-08  8:41     ` David Marchand
  1 sibling, 1 reply; 63+ messages in thread
From: Olivier Matz @ 2021-04-08  7:23 UTC (permalink / raw)
  To: David Marchand; +Cc: dev, maxime.coquelin, fbl, i.maximets

On Thu, Apr 01, 2021 at 11:52:39AM +0200, David Marchand wrote:
> PKT_RX_EIP_CKSUM_BAD was declared deprecated quite some time ago,

It's not that old, it was done by Lance in commit e8a419d6de4b ("mbuf:
rename outer IP checksum macro") 1 month ago.

> but there was no warning for applications still using it.
> Fix this by marking it as deprecated with the newly introduced
> RTE_DEPRECATED macro.
> 
> Signed-off-by: David Marchand <david.marchand@redhat.com>

Acked-by: Olivier Matz <olivier.matz@6wind.com>


* Re: [dpdk-dev] [PATCH 2/5] net/tap: do not touch Tx offload flags
  2021-04-07 20:15   ` Flavio Leitner
@ 2021-04-08  7:41     ` Olivier Matz
  2021-04-08 11:21       ` Flavio Leitner
  0 siblings, 1 reply; 63+ messages in thread
From: Olivier Matz @ 2021-04-08  7:41 UTC (permalink / raw)
  To: Flavio Leitner
  Cc: David Marchand, dev, maxime.coquelin, i.maximets, Keith Wiles

On Wed, Apr 07, 2021 at 05:15:39PM -0300, Flavio Leitner wrote:
> On Thu, Apr 01, 2021 at 11:52:40AM +0200, David Marchand wrote:
> > Tx offload flags are the application's responsibility.
> > Leave the mbuf alone and check for TSO where needed.
> > 
> > Signed-off-by: David Marchand <david.marchand@redhat.com>
> > ---
> 
> The patch looks good, but maybe a better approach would be
> to change the documentation to require the TCP_CKSUM flag
> when TCP_SEG is used, otherwise this flag adjusting needs
> to be replicated every time TCP_SEG is used.
> 
> The above could break existing applications, so perhaps doing
> something like below would be better and backwards compatible?
> Then we can remove those places tweaking the flags completely.

As a first step, I suggest to document that:
- applications must set TCP_CKSUM when setting TCP_SEG
- PMDs must assume that TCP_CKSUM is set when TCP_SEG is set

This is clearer than what we have today, and I think it does not break
anything. This will guide apps in the correct direction, facilitating
an eventual future PMD change.

> diff --git a/lib/librte_mbuf/rte_mbuf_core.h b/lib/librte_mbuf/rte_mbuf_core.h
> index c17dc95c5..6a0c2cdd9 100644
> --- a/lib/librte_mbuf/rte_mbuf_core.h
> +++ b/lib/librte_mbuf/rte_mbuf_core.h
> @@ -298,7 +298,7 @@ extern "C" {
>   *  - if it's IPv4, set the PKT_TX_IP_CKSUM flag
>   *  - fill the mbuf offload information: l2_len, l3_len, l4_len, tso_segsz
>   */
> -#define PKT_TX_TCP_SEG       (1ULL << 50)
> +#define PKT_TX_TCP_SEG       (1ULL << 50) | PKT_TX_TCP_CKSUM
>  
>  /** TX IEEE1588 packet to timestamp. */
>  #define PKT_TX_IEEE1588_TMST (1ULL << 51)

I'm afraid some applications or drivers use extended bit manipulations
to do the conversion from/to another domain (like hardware descriptors
or application-specific flags). They may expect this constant to be a
unique flag.


* Re: [dpdk-dev] [PATCH 2/5] net/tap: do not touch Tx offload flags
  2021-04-01  9:52 ` [dpdk-dev] [PATCH 2/5] net/tap: do not touch Tx offload flags David Marchand
  2021-04-07 20:15   ` Flavio Leitner
@ 2021-04-08  7:53   ` Olivier Matz
  2021-04-28 12:12     ` David Marchand
  1 sibling, 1 reply; 63+ messages in thread
From: Olivier Matz @ 2021-04-08  7:53 UTC (permalink / raw)
  To: David Marchand; +Cc: dev, maxime.coquelin, fbl, i.maximets, Keith Wiles

On Thu, Apr 01, 2021 at 11:52:40AM +0200, David Marchand wrote:
> Tx offload flags are the application's responsibility.
> Leave the mbuf alone and check for TSO where needed.
> 
> Signed-off-by: David Marchand <david.marchand@redhat.com>

Maybe the problem being solved should be better described in the commit
log. Is it a problem (other than cosmetic) to touch a mbuf in the Tx
function of a driver, where we could expect that the mbuf is owned by
the driver?

The only problem I can think about is in case we transmit a direct mbuf
whose refcnt is increased, but I wonder how much this is really
supported: for instance, several drivers add vlans using
rte_vlan_insert() in their Tx path.


> ---
>  drivers/net/tap/rte_eth_tap.c | 17 ++++++++++-------
>  1 file changed, 10 insertions(+), 7 deletions(-)
> 
> diff --git a/drivers/net/tap/rte_eth_tap.c b/drivers/net/tap/rte_eth_tap.c
> index c36d4bf76e..285fe395c5 100644
> --- a/drivers/net/tap/rte_eth_tap.c
> +++ b/drivers/net/tap/rte_eth_tap.c
> @@ -562,6 +562,7 @@ tap_tx_l3_cksum(char *packet, uint64_t ol_flags, unsigned int l2_len,
>  		uint16_t *l4_phdr_cksum, uint32_t *l4_raw_cksum)
>  {
>  	void *l3_hdr = packet + l2_len;
> +	uint64_t csum_l4;
>  
>  	if (ol_flags & (PKT_TX_IP_CKSUM | PKT_TX_IPV4)) {
>  		struct rte_ipv4_hdr *iph = l3_hdr;
> @@ -571,13 +572,17 @@ tap_tx_l3_cksum(char *packet, uint64_t ol_flags, unsigned int l2_len,
>  		cksum = rte_raw_cksum(iph, l3_len);
>  		iph->hdr_checksum = (cksum == 0xffff) ? cksum : ~cksum;
>  	}
> -	if (ol_flags & PKT_TX_L4_MASK) {
> +
> +	csum_l4 = ol_flags & PKT_TX_L4_MASK;
> +	if (ol_flags & PKT_TX_TCP_SEG)
> +		csum_l4 |= PKT_TX_TCP_CKSUM;
> +	if (csum_l4) {
>  		void *l4_hdr;
>  
>  		l4_hdr = packet + l2_len + l3_len;
> -		if ((ol_flags & PKT_TX_L4_MASK) == PKT_TX_UDP_CKSUM)
> +		if (csum_l4 == PKT_TX_UDP_CKSUM)
>  			*l4_cksum = &((struct rte_udp_hdr *)l4_hdr)->dgram_cksum;
> -		else if ((ol_flags & PKT_TX_L4_MASK) == PKT_TX_TCP_CKSUM)
> +		else if (csum_l4 == PKT_TX_TCP_CKSUM)
>  			*l4_cksum = &((struct rte_tcp_hdr *)l4_hdr)->cksum;
>  		else
>  			return;
> @@ -648,7 +653,8 @@ tap_write_mbufs(struct tx_queue *txq, uint16_t num_mbufs,
>  		if (txq->csum &&
>  		    ((mbuf->ol_flags & (PKT_TX_IP_CKSUM | PKT_TX_IPV4) ||
>  		     (mbuf->ol_flags & PKT_TX_L4_MASK) == PKT_TX_UDP_CKSUM ||
> -		     (mbuf->ol_flags & PKT_TX_L4_MASK) == PKT_TX_TCP_CKSUM))) {
> +		     (mbuf->ol_flags & PKT_TX_L4_MASK) == PKT_TX_TCP_CKSUM) ||
> +		     (mbuf->ol_flags & PKT_TX_TCP_SEG))) {
>  			is_cksum = 1;
>  
>  			/* Support only packets with at least layer 4
> @@ -742,9 +748,6 @@ pmd_tx_burst(void *queue, struct rte_mbuf **bufs, uint16_t nb_pkts)
>  		if (tso) {
>  			struct rte_gso_ctx *gso_ctx = &txq->gso_ctx;
>  
> -			/* TCP segmentation implies TCP checksum offload */
> -			mbuf_in->ol_flags |= PKT_TX_TCP_CKSUM;
> -
>  			/* gso size is calculated without RTE_ETHER_CRC_LEN */
>  			hdrs_len = mbuf_in->l2_len + mbuf_in->l3_len +
>  					mbuf_in->l4_len;
> -- 
> 2.23.0
> 

^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: [dpdk-dev] [PATCH 5/5] vhost: fix offload flags in Rx path
  2021-04-01  9:52 ` [dpdk-dev] [PATCH 5/5] vhost: fix offload flags in Rx path David Marchand
@ 2021-04-08  8:28   ` Olivier Matz
  2021-04-08 18:38   ` Flavio Leitner
  1 sibling, 0 replies; 63+ messages in thread
From: Olivier Matz @ 2021-04-08  8:28 UTC (permalink / raw)
  To: David Marchand
  Cc: dev, maxime.coquelin, fbl, i.maximets, Chenbo Xia, Jijiang Liu,
	Yuanhan Liu

Hi David,

On Thu, Apr 01, 2021 at 11:52:43AM +0200, David Marchand wrote:
> The vhost library currently configures Tx offloading (PKT_TX_*) on any
> packet received from a guest virtio device which asks for some offloading.
> 
> This is problematic, as Tx offloading is something that the application
> must ask for: the application needs to configure devices
> to support every offload in use (ip, tcp checksumming, tso..), and the
> various l2/l3/l4 lengths must be set following any processing that
> happened in the application itself.
> 
> On the other hand, the received packets are not marked wrt current
> packet l3/l4 checksumming info.
> 
> Copy virtio rx processing to fix those offload flags.
> 
> The vhost example needs reworking as it was built with the assumption
> that mbuf TSO configuration is set up by the vhost library.
> This is not done in this patch for now so TSO activation is forcibly
> refused.
> 
> Fixes: 859b480d5afd ("vhost: add guest offload setting")
> 
> Signed-off-by: David Marchand <david.marchand@redhat.com>
> ---

Reviewed-by: Olivier Matz <olivier.matz@6wind.com>

LGTM, just one little comment below.

<...>

> +	m->ol_flags |= PKT_RX_IP_CKSUM_UNKNOWN;
> +
> +	ptype = rte_net_get_ptype(m, &hdr_lens, RTE_PTYPE_ALL_MASK);
> +	m->packet_type = ptype;
> +	if ((ptype & RTE_PTYPE_L4_MASK) == RTE_PTYPE_L4_TCP ||
> +	    (ptype & RTE_PTYPE_L4_MASK) == RTE_PTYPE_L4_UDP ||
> +	    (ptype & RTE_PTYPE_L4_MASK) == RTE_PTYPE_L4_SCTP)
> +		l4_supported = 1;
> +
> +	if (hdr->flags & VIRTIO_NET_HDR_F_NEEDS_CSUM) {
> +		hdrlen = hdr_lens.l2_len + hdr_lens.l3_len + hdr_lens.l4_len;
> +		if (hdr->csum_start <= hdrlen && l4_supported) {
> +			m->ol_flags |= PKT_RX_L4_CKSUM_NONE;
> +		} else {
> +			/* Unknown proto or tunnel, do sw cksum. We can assume
> +			 * the cksum field is in the first segment since the
> +			 * buffers we provided to the host are large enough.
> +			 * In case of SCTP, this will be wrong since it's a CRC
> +			 * but there's nothing we can do.
> +			 */
> +			uint16_t csum = 0, off;
> +
> +			if (rte_raw_cksum_mbuf(m, hdr->csum_start,
> +				rte_pktmbuf_pkt_len(m) - hdr->csum_start,
> +				&csum) < 0)
> +				return -EINVAL;
> +			if (likely(csum != 0xffff))
> +				csum = ~csum;

I was trying to remember the reason for this last test (which is also
present in net/virtio).

If this is a UDP checksum (on top of an unrecognized tunnel), it's
indeed needed to do that, because we don't want to set the checksum to 0
in the packet (which means "no checksum" for UDPv4, or is forbidden for
UDPv6).

If this is something else than UDP, it shouldn't hurt to have a 0xffff in the
packet instead of 0.

Maybe it deserves a comment here, like:

  /* avoid 0 checksum for UDP, shouldn't hurt for other protocols */

What do you think?

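[Editorial note: the UDP special case discussed above can be illustrated with a
small standalone sketch. This is not the actual DPDK code; raw_cksum() is a
simplified stand-in for what rte_raw_cksum()/rte_raw_cksum_mbuf() compute. The
point is the final guard: a computed one's-complement checksum is transmitted
as 0xffff rather than 0, since 0 means "no checksum" for UDPv4 and is
forbidden for UDPv6.]

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Simplified one's-complement sum over a buffer; conceptually what
 * rte_raw_cksum() computes (illustrative stand-in only). */
static uint16_t raw_cksum(const uint8_t *buf, size_t len)
{
	uint32_t sum = 0;

	while (len > 1) {
		sum += ((uint32_t)buf[0] << 8) | buf[1];
		buf += 2;
		len -= 2;
	}
	if (len)
		sum += (uint32_t)buf[0] << 8;
	/* Fold carries back into the low 16 bits. */
	while (sum >> 16)
		sum = (sum & 0xffff) + (sum >> 16);
	return (uint16_t)sum;
}

/* The guard from the patch: complement the raw sum unless it is 0xffff,
 * so the checksum written into the packet is never 0. */
static uint16_t finalize_l4_cksum(uint16_t raw)
{
	uint16_t csum = raw;

	if (csum != 0xffff)
		csum = ~csum;
	return csum;
}
```

With this guard, a raw sum of 0xffff (whose complement would be 0) is left
as 0xffff; any other value is complemented as usual.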
^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: [dpdk-dev] [PATCH 1/5] mbuf: mark old offload flag as deprecated
  2021-04-08  7:23   ` Olivier Matz
@ 2021-04-08  8:41     ` David Marchand
  0 siblings, 0 replies; 63+ messages in thread
From: David Marchand @ 2021-04-08  8:41 UTC (permalink / raw)
  To: Olivier Matz; +Cc: dev, Maxime Coquelin, Flavio Leitner, Ilya Maximets

On Thu, Apr 8, 2021 at 9:24 AM Olivier Matz <olivier.matz@6wind.com> wrote:
>
> On Thu, Apr 01, 2021 at 11:52:39AM +0200, David Marchand wrote:
> > PKT_RX_EIP_CKSUM_BAD has been declared deprecated quite some time ago,
>
> It's not that old, it was done by Lance in commit e8a419d6de4b ("mbuf:
> rename outer IP checksum macro") 1 month ago.

Err, I was pretty sure it was older... I probably misread some date.
Ok, I'll reword this and add a Fixes: tag just for info.


-- 
David Marchand


^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: [dpdk-dev] [PATCH 2/5] net/tap: do not touch Tx offload flags
  2021-04-08  7:41     ` Olivier Matz
@ 2021-04-08 11:21       ` Flavio Leitner
  2021-04-08 12:05         ` Olivier Matz
  2021-04-08 12:16         ` Ananyev, Konstantin
  0 siblings, 2 replies; 63+ messages in thread
From: Flavio Leitner @ 2021-04-08 11:21 UTC (permalink / raw)
  To: Olivier Matz
  Cc: David Marchand, dev, maxime.coquelin, i.maximets, Keith Wiles

On Thu, Apr 08, 2021 at 09:41:59AM +0200, Olivier Matz wrote:
> On Wed, Apr 07, 2021 at 05:15:39PM -0300, Flavio Leitner wrote:
> > On Thu, Apr 01, 2021 at 11:52:40AM +0200, David Marchand wrote:
> > > Tx offload flags are of the application responsibility.
> > > Leave the mbuf alone and check for TSO where needed.
> > > 
> > > Signed-off-by: David Marchand <david.marchand@redhat.com>
> > > ---
> > 
> > The patch looks good, but maybe a better approach would be
> > to change the documentation to require the TCP_CKSUM flag
> > when TCP_SEG is used, otherwise this flag adjusting needs
> > to be replicated every time TCP_SEG is used.
> > 
> > The above could break existing applications, so perhaps doing
> > something like below would be better and backwards compatible?
> > Then we can remove those places tweaking the flags completely.
> 
> As a first step, I suggest to document that:
> - applications must set TCP_CKSUM when setting TCP_SEG

That's what I suggest above.

> - pmds must suppose that TCP_CKSUM is set when TCP_SEG is set

But that keeps the problem of implying the TCP_CKSUM flag in
various places.

> This is clearer than what we have today, and I think it does not break
> anything. This will guide apps in the correct direction, facilitating
> an eventual future PMD change.
> 
> > diff --git a/lib/librte_mbuf/rte_mbuf_core.h b/lib/librte_mbuf/rte_mbuf_core.h
> > index c17dc95c5..6a0c2cdd9 100644
> > --- a/lib/librte_mbuf/rte_mbuf_core.h
> > +++ b/lib/librte_mbuf/rte_mbuf_core.h
> > @@ -298,7 +298,7 @@ extern "C" {
> >   *  - if it's IPv4, set the PKT_TX_IP_CKSUM flag
> >   *  - fill the mbuf offload information: l2_len, l3_len, l4_len, tso_segsz
> >   */
> > -#define PKT_TX_TCP_SEG       (1ULL << 50)
> > +#define PKT_TX_TCP_SEG       (1ULL << 50) | PKT_TX_TCP_CKSUM
> >  
> >  /** TX IEEE1588 packet to timestamp. */
> >  #define PKT_TX_IEEE1588_TMST (1ULL << 51)
> 
> I'm afraid some applications or drivers use extended bit manipulations
> to do the conversion from/to another domain (like hardware descriptors
> or application-specific flags). They may expect this constant to be a
> unique flag.

Interesting, do you have an example? Because each flag still has a
separate meaning.

-- 
fbl

^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: [dpdk-dev] [PATCH 2/5] net/tap: do not touch Tx offload flags
  2021-04-08 11:21       ` Flavio Leitner
@ 2021-04-08 12:05         ` Olivier Matz
  2021-04-08 12:58           ` Flavio Leitner
  2021-04-08 12:16         ` Ananyev, Konstantin
  1 sibling, 1 reply; 63+ messages in thread
From: Olivier Matz @ 2021-04-08 12:05 UTC (permalink / raw)
  To: Flavio Leitner
  Cc: David Marchand, dev, maxime.coquelin, i.maximets, Keith Wiles

On Thu, Apr 08, 2021 at 08:21:58AM -0300, Flavio Leitner wrote:
> On Thu, Apr 08, 2021 at 09:41:59AM +0200, Olivier Matz wrote:
> > On Wed, Apr 07, 2021 at 05:15:39PM -0300, Flavio Leitner wrote:
> > > On Thu, Apr 01, 2021 at 11:52:40AM +0200, David Marchand wrote:
> > > > Tx offload flags are of the application responsibility.
> > > > Leave the mbuf alone and check for TSO where needed.
> > > > 
> > > > Signed-off-by: David Marchand <david.marchand@redhat.com>
> > > > ---
> > > 
> > > The patch looks good, but maybe a better approach would be
> > > to change the documentation to require the TCP_CKSUM flag
> > > when TCP_SEG is used, otherwise this flag adjusting needs
> > > to be replicated every time TCP_SEG is used.
> > > 
> > > The above could break existing applications, so perhaps doing
> > > something like below would be better and backwards compatible?
> > > Then we can remove those places tweaking the flags completely.
> > 
> > As a first step, I suggest to document that:
> > - applications must set TCP_CKSUM when setting TCP_SEG
> 
> That's what I suggest above.
> 
> > - pmds must suppose that TCP_CKSUM is set when TCP_SEG is set
> 
> But that keeps the problem of implying the TCP_CKSUM flag in
> various places.

Yes. What I propose is just a first step: better document what is the
current expected behavior, before doing something else.

> > This is clearer than what we have today, and I think it does not break
> > anything. This will guide apps in the correct direction, facilitating
> > an eventual future PMD change.
> > 
> > > diff --git a/lib/librte_mbuf/rte_mbuf_core.h b/lib/librte_mbuf/rte_mbuf_core.h
> > > index c17dc95c5..6a0c2cdd9 100644
> > > --- a/lib/librte_mbuf/rte_mbuf_core.h
> > > +++ b/lib/librte_mbuf/rte_mbuf_core.h
> > > @@ -298,7 +298,7 @@ extern "C" {
> > >   *  - if it's IPv4, set the PKT_TX_IP_CKSUM flag
> > >   *  - fill the mbuf offload information: l2_len, l3_len, l4_len, tso_segsz
> > >   */
> > > -#define PKT_TX_TCP_SEG       (1ULL << 50)
> > > +#define PKT_TX_TCP_SEG       (1ULL << 50) | PKT_TX_TCP_CKSUM
> > >  
> > >  /** TX IEEE1588 packet to timestamp. */
> > >  #define PKT_TX_IEEE1588_TMST (1ULL << 51)
> > 
> > I'm afraid some applications or drivers use extended bit manipulations
> > to do the conversion from/to another domain (like hardware descriptors
> > or application-specific flags). They may expect this constant to be a
> > unique flag.
> 
> > Interesting, do you have an example? Because each flag still has a
> separate meaning.

> Honestly no, I don't have any good example, just a (maybe unfounded) doubt.

I have in mind operations that are done with tables or vector
instructions inside the drivers, but this is mainly done for Rx, not Tx.
You can look at Tx functions like mlx5_set_cksum_table() or
nix_xmit_pkts_vector(), or Rx functions like desc_to_olflags_v() or
enic_noscatter_vec_recv_pkts() to see what kind of stuff I'm talking
about.
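[Editorial note: one concrete failure mode of a compound flag constant can be
sketched with hypothetical names and simplified bit values modeled on
rte_mbuf_core.h. The common driver-style test `ol_flags & PKT_TX_TCP_SEG`
succeeds on any overlapping bit, so a compound definition would also match
packets that only requested TCP checksum offload.]

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical, simplified flag values (modeled on rte_mbuf_core.h). */
#define TX_TCP_CKSUM        (1ULL << 52)
#define TX_TCP_SEG          (1ULL << 50)                  /* single bit */
#define TX_TCP_SEG_COMPOUND ((1ULL << 50) | TX_TCP_CKSUM) /* proposed */

/* Driver-style check: "did the application request TSO?" */
static int wants_tso(uint64_t ol_flags, uint64_t tso_flag)
{
	return (ol_flags & tso_flag) != 0;
}
```

With the single-bit definition a checksum-only mbuf does not match; with the
compound definition the same `&` test reports a false positive, so every such
test would need rewriting as `(ol_flags & flag) == flag`.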

^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: [dpdk-dev] [PATCH 2/5] net/tap: do not touch Tx offload flags
  2021-04-08 11:21       ` Flavio Leitner
  2021-04-08 12:05         ` Olivier Matz
@ 2021-04-08 12:16         ` Ananyev, Konstantin
  1 sibling, 0 replies; 63+ messages in thread
From: Ananyev, Konstantin @ 2021-04-08 12:16 UTC (permalink / raw)
  To: Flavio Leitner, Olivier Matz
  Cc: David Marchand, dev, maxime.coquelin, i.maximets, Wiles, Keith



> 
> On Thu, Apr 08, 2021 at 09:41:59AM +0200, Olivier Matz wrote:
> > On Wed, Apr 07, 2021 at 05:15:39PM -0300, Flavio Leitner wrote:
> > > On Thu, Apr 01, 2021 at 11:52:40AM +0200, David Marchand wrote:
> > > > Tx offload flags are of the application responsibility.
> > > > Leave the mbuf alone and check for TSO where needed.
> > > >
> > > > Signed-off-by: David Marchand <david.marchand@redhat.com>
> > > > ---
> > >
> > > The patch looks good, but maybe a better approach would be
> > > to change the documentation to require the TCP_CKSUM flag
> > > when TCP_SEG is used, otherwise this flag adjusting needs
> > > to be replicated every time TCP_SEG is used.
> > >
> > > The above could break existing applications, so perhaps doing
> > > something like below would be better and backwards compatible?
> > > Then we can remove those places tweaking the flags completely.
> >
> > As a first step, I suggest to document that:
> > - applications must set TCP_CKSUM when setting TCP_SEG
> 
> That's what I suggest above.
> 
> > - pmds must suppose that TCP_CKSUM is set when TCP_SEG is set
> 
> But that keeps the problem of implying the TCP_CKSUM flag in
> various places.
> 
> > This is clearer than what we have today, and I think it does not break
> > anything. This will guide apps in the correct direction, facilitating
> > an eventual future PMD change.
> >
> > > diff --git a/lib/librte_mbuf/rte_mbuf_core.h b/lib/librte_mbuf/rte_mbuf_core.h
> > > index c17dc95c5..6a0c2cdd9 100644
> > > --- a/lib/librte_mbuf/rte_mbuf_core.h
> > > +++ b/lib/librte_mbuf/rte_mbuf_core.h
> > > @@ -298,7 +298,7 @@ extern "C" {
> > >   *  - if it's IPv4, set the PKT_TX_IP_CKSUM flag
> > >   *  - fill the mbuf offload information: l2_len, l3_len, l4_len, tso_segsz
> > >   */
> > > -#define PKT_TX_TCP_SEG       (1ULL << 50)
> > > +#define PKT_TX_TCP_SEG       (1ULL << 50) | PKT_TX_TCP_CKSUM

I think that would be an ABI breakage.

> > >
> > >  /** TX IEEE1588 packet to timestamp. */
> > >  #define PKT_TX_IEEE1588_TMST (1ULL << 51)
> >
> > I'm afraid some applications or drivers use extended bit manipulations
> > to do the conversion from/to another domain (like hardware descriptors
> > or application-specific flags). They may expect this constant to be a
> > unique flag.
> 
> Interesting, do you have an example? Because each flag still has a
> separate meaning.


 

^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: [dpdk-dev] [PATCH 2/5] net/tap: do not touch Tx offload flags
  2021-04-08 12:05         ` Olivier Matz
@ 2021-04-08 12:58           ` Flavio Leitner
  2021-04-09 13:30             ` Olivier Matz
  0 siblings, 1 reply; 63+ messages in thread
From: Flavio Leitner @ 2021-04-08 12:58 UTC (permalink / raw)
  To: Olivier Matz
  Cc: David Marchand, dev, maxime.coquelin, i.maximets, Keith Wiles

On Thu, Apr 08, 2021 at 02:05:21PM +0200, Olivier Matz wrote:
> On Thu, Apr 08, 2021 at 08:21:58AM -0300, Flavio Leitner wrote:
> > On Thu, Apr 08, 2021 at 09:41:59AM +0200, Olivier Matz wrote:
> > > On Wed, Apr 07, 2021 at 05:15:39PM -0300, Flavio Leitner wrote:
> > > > On Thu, Apr 01, 2021 at 11:52:40AM +0200, David Marchand wrote:
> > > > > Tx offload flags are of the application responsibility.
> > > > > Leave the mbuf alone and check for TSO where needed.
> > > > > 
> > > > > Signed-off-by: David Marchand <david.marchand@redhat.com>
> > > > > ---
> > > > 
> > > > The patch looks good, but maybe a better approach would be
> > > > to change the documentation to require the TCP_CKSUM flag
> > > > when TCP_SEG is used, otherwise this flag adjusting needs
> > > > to be replicated every time TCP_SEG is used.
> > > > 
> > > > The above could break existing applications, so perhaps doing
> > > > something like below would be better and backwards compatible?
> > > > Then we can remove those places tweaking the flags completely.
> > > 
> > > As a first step, I suggest to document that:
> > > - applications must set TCP_CKSUM when setting TCP_SEG
> > 
> > That's what I suggest above.
> > 
> > > - pmds must suppose that TCP_CKSUM is set when TCP_SEG is set
> > 
> > But that keeps the problem of implying the TCP_CKSUM flag in
> > various places.
> 
> Yes. What I propose is just a first step: better document what is the
> current expected behavior, before doing something else.
> 
> > > This is clearer than what we have today, and I think it does not break
> > > anything. This will guide apps in the correct direction, facilitating
> > > an eventual future PMD change.
> > > 
> > > > diff --git a/lib/librte_mbuf/rte_mbuf_core.h b/lib/librte_mbuf/rte_mbuf_core.h
> > > > index c17dc95c5..6a0c2cdd9 100644
> > > > --- a/lib/librte_mbuf/rte_mbuf_core.h
> > > > +++ b/lib/librte_mbuf/rte_mbuf_core.h
> > > > @@ -298,7 +298,7 @@ extern "C" {
> > > >   *  - if it's IPv4, set the PKT_TX_IP_CKSUM flag
> > > >   *  - fill the mbuf offload information: l2_len, l3_len, l4_len, tso_segsz
> > > >   */
> > > > -#define PKT_TX_TCP_SEG       (1ULL << 50)
> > > > +#define PKT_TX_TCP_SEG       (1ULL << 50) | PKT_TX_TCP_CKSUM
> > > >  
> > > >  /** TX IEEE1588 packet to timestamp. */
> > > >  #define PKT_TX_IEEE1588_TMST (1ULL << 51)
> > > 
> > > I'm afraid some applications or drivers use extended bit manipulations
> > > to do the conversion from/to another domain (like hardware descriptors
> > > or application-specific flags). They may expect this constant to be a
> > > unique flag.
> > 
> > Interesting, do you have an example? Because each flag still has a
> > separate meaning.
> 
> > Honestly no, I don't have any good example, just a (maybe unfounded) doubt.
> 
> I have in mind operations that are done with tables or vector
> instructions inside the drivers, but this is mainly done for Rx, not Tx.
> You can look at Tx functions like mlx5_set_cksum_table() or
> nix_xmit_pkts_vector(), or Rx functions like desc_to_olflags_v() or
> enic_noscatter_vec_recv_pkts() to see what kind of stuff I'm talking
> about.

I see your point. Going back to improving the documentation as a
first step, what would be the next steps? Are we going to wait a few
releases and then remove the flag tweaking code assuming that PMDs
and apps are ok?

Thanks,
-- 
fbl

^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: [dpdk-dev] [PATCH 4/5] net/virtio: refactor Tx offload helper
  2021-04-01  9:52 ` [dpdk-dev] [PATCH 4/5] net/virtio: refactor Tx offload helper David Marchand
@ 2021-04-08 13:05   ` Flavio Leitner
  2021-04-09  2:31   ` Ruifeng Wang
  1 sibling, 0 replies; 63+ messages in thread
From: Flavio Leitner @ 2021-04-08 13:05 UTC (permalink / raw)
  To: David Marchand
  Cc: dev, maxime.coquelin, olivier.matz, i.maximets, Chenbo Xia,
	Bruce Richardson, Konstantin Ananyev, Jerin Jacob, Ruifeng Wang

On Thu, Apr 01, 2021 at 11:52:42AM +0200, David Marchand wrote:
> Purely cosmetic but it is rather odd to have an "offload" helper that
> checks if it actually must do something.
> We already have the same checks in most callers, so move this branch
> in them.
> 
> Signed-off-by: David Marchand <david.marchand@redhat.com>
> ---

Reviewed-by: Flavio Leitner <fbl@sysclose.org>


^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: [dpdk-dev] [PATCH 5/5] vhost: fix offload flags in Rx path
  2021-04-01  9:52 ` [dpdk-dev] [PATCH 5/5] vhost: fix offload flags in Rx path David Marchand
  2021-04-08  8:28   ` Olivier Matz
@ 2021-04-08 18:38   ` Flavio Leitner
  2021-04-13 15:27     ` Maxime Coquelin
  1 sibling, 1 reply; 63+ messages in thread
From: Flavio Leitner @ 2021-04-08 18:38 UTC (permalink / raw)
  To: David Marchand
  Cc: dev, maxime.coquelin, olivier.matz, i.maximets, Chenbo Xia,
	Jijiang Liu, Yuanhan Liu

On Thu, Apr 01, 2021 at 11:52:43AM +0200, David Marchand wrote:
> The vhost library currently configures Tx offloading (PKT_TX_*) on any
> packet received from a guest virtio device which asks for some offloading.
> 
> This is problematic, as Tx offloading is something that the application
> must ask for: the application needs to configure devices
> to support every offload in use (ip, tcp checksumming, tso..), and the
> various l2/l3/l4 lengths must be set following any processing that
> happened in the application itself.
> 
> On the other hand, the received packets are not marked wrt current
> packet l3/l4 checksumming info.
> 
> Copy virtio rx processing to fix those offload flags.
> 
> The vhost example needs reworking as it was built with the assumption
> that mbuf TSO configuration is set up by the vhost library.
> This is not done in this patch for now so TSO activation is forcibly
> refused.
> 
> Fixes: 859b480d5afd ("vhost: add guest offload setting")

There is a change here: before, ECN was ignored, and now it is invalid.
I think that's the right way to go, but not sure if virtio blocks
the negotiation of that feature.

Reviewed-by: Flavio Leitner <fbl@sysclose.org>

fbl

^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: [dpdk-dev] [PATCH 4/5] net/virtio: refactor Tx offload helper
  2021-04-01  9:52 ` [dpdk-dev] [PATCH 4/5] net/virtio: refactor Tx offload helper David Marchand
  2021-04-08 13:05   ` Flavio Leitner
@ 2021-04-09  2:31   ` Ruifeng Wang
  1 sibling, 0 replies; 63+ messages in thread
From: Ruifeng Wang @ 2021-04-09  2:31 UTC (permalink / raw)
  To: David Marchand, dev
  Cc: maxime.coquelin, olivier.matz, fbl, i.maximets, Chenbo Xia,
	Bruce Richardson, Konstantin Ananyev, jerinj, nd

> -----Original Message-----
> From: David Marchand <david.marchand@redhat.com>
> Sent: Thursday, April 1, 2021 5:53 PM
> To: dev@dpdk.org
> Cc: maxime.coquelin@redhat.com; olivier.matz@6wind.com;
> fbl@sysclose.org; i.maximets@ovn.org; Chenbo Xia <chenbo.xia@intel.com>;
> Bruce Richardson <bruce.richardson@intel.com>; Konstantin Ananyev
> <konstantin.ananyev@intel.com>; jerinj@marvell.com; Ruifeng Wang
> <Ruifeng.Wang@arm.com>
> Subject: [PATCH 4/5] net/virtio: refactor Tx offload helper
> 
> Purely cosmetic but it is rather odd to have an "offload" helper that checks if
> it actually must do something.
> We already have the same checks in most callers, so move this branch in
> them.
> 
> Signed-off-by: David Marchand <david.marchand@redhat.com>
> ---
>  drivers/net/virtio/virtio_rxtx.c             |  7 +-
>  drivers/net/virtio/virtio_rxtx_packed_avx.h  |  2 +-
> drivers/net/virtio/virtio_rxtx_packed_neon.h |  2 +-
>  drivers/net/virtio/virtqueue.h               | 83 +++++++++-----------
>  4 files changed, 44 insertions(+), 50 deletions(-)
> 
> diff --git a/drivers/net/virtio/virtio_rxtx.c b/drivers/net/virtio/virtio_rxtx.c
> index 40283001b0..a4e37ef379 100644
> --- a/drivers/net/virtio/virtio_rxtx.c
> +++ b/drivers/net/virtio/virtio_rxtx.c
> @@ -448,7 +448,7 @@ virtqueue_enqueue_xmit_inorder(struct virtnet_tx
> *txvq,
>  		if (!vq->hw->has_tx_offload)
>  			virtqueue_clear_net_hdr(hdr);
>  		else
> -			virtqueue_xmit_offload(hdr, cookies[i], true);
> +			virtqueue_xmit_offload(hdr, cookies[i]);
> 
>  		start_dp[idx].addr  = rte_mbuf_data_iova(cookies[i]) -
> head_size;
>  		start_dp[idx].len   = cookies[i]->data_len + head_size;
> @@ -495,7 +495,7 @@ virtqueue_enqueue_xmit_packed_fast(struct
> virtnet_tx *txvq,
>  	if (!vq->hw->has_tx_offload)
>  		virtqueue_clear_net_hdr(hdr);
>  	else
> -		virtqueue_xmit_offload(hdr, cookie, true);
> +		virtqueue_xmit_offload(hdr, cookie);
> 
>  	dp->addr = rte_mbuf_data_iova(cookie) - head_size;
>  	dp->len  = cookie->data_len + head_size; @@ -581,7 +581,8 @@
> virtqueue_enqueue_xmit(struct virtnet_tx *txvq, struct rte_mbuf *cookie,
>  		idx = start_dp[idx].next;
>  	}
> 
> -	virtqueue_xmit_offload(hdr, cookie, vq->hw->has_tx_offload);
> +	if (vq->hw->has_tx_offload)
> +		virtqueue_xmit_offload(hdr, cookie);
> 
>  	do {
>  		start_dp[idx].addr  = rte_mbuf_data_iova(cookie); diff --git
> a/drivers/net/virtio/virtio_rxtx_packed_avx.h
> b/drivers/net/virtio/virtio_rxtx_packed_avx.h
> index 49e845d02a..33cac3244f 100644
> --- a/drivers/net/virtio/virtio_rxtx_packed_avx.h
> +++ b/drivers/net/virtio/virtio_rxtx_packed_avx.h
> @@ -115,7 +115,7 @@ virtqueue_enqueue_batch_packed_vec(struct
> virtnet_tx *txvq,
>  		virtio_for_each_try_unroll(i, 0, PACKED_BATCH_SIZE) {
>  			hdr = rte_pktmbuf_mtod_offset(tx_pkts[i],
>  					struct virtio_net_hdr *, -head_size);
> -			virtqueue_xmit_offload(hdr, tx_pkts[i], true);
> +			virtqueue_xmit_offload(hdr, tx_pkts[i]);
>  		}
>  	}
> 
> diff --git a/drivers/net/virtio/virtio_rxtx_packed_neon.h
> b/drivers/net/virtio/virtio_rxtx_packed_neon.h
> index 851c81f312..1a49caf8af 100644
> --- a/drivers/net/virtio/virtio_rxtx_packed_neon.h
> +++ b/drivers/net/virtio/virtio_rxtx_packed_neon.h
> @@ -134,7 +134,7 @@ virtqueue_enqueue_batch_packed_vec(struct
> virtnet_tx *txvq,
>  		virtio_for_each_try_unroll(i, 0, PACKED_BATCH_SIZE) {
>  			hdr = rte_pktmbuf_mtod_offset(tx_pkts[i],
>  					struct virtio_net_hdr *, -head_size);
> -			virtqueue_xmit_offload(hdr, tx_pkts[i], true);
> +			virtqueue_xmit_offload(hdr, tx_pkts[i]);
>  		}
>  	}
> 
> diff --git a/drivers/net/virtio/virtqueue.h b/drivers/net/virtio/virtqueue.h
> index 2e8826bc28..41a9b82a5f 100644
> --- a/drivers/net/virtio/virtqueue.h
> +++ b/drivers/net/virtio/virtqueue.h
> @@ -613,52 +613,44 @@ virtqueue_notify(struct virtqueue *vq)  } while (0)
> 
>  static inline void
> -virtqueue_xmit_offload(struct virtio_net_hdr *hdr,
> -			struct rte_mbuf *cookie,
> -			uint8_t offload)
> +virtqueue_xmit_offload(struct virtio_net_hdr *hdr, struct rte_mbuf
> +*cookie)
>  {
> -	if (offload) {
> -		uint64_t csum_l4 = cookie->ol_flags & PKT_TX_L4_MASK;
> -
> -		if (cookie->ol_flags & PKT_TX_TCP_SEG)
> -			csum_l4 |= PKT_TX_TCP_CKSUM;
> -
> -		switch (csum_l4) {
> -		case PKT_TX_UDP_CKSUM:
> -			hdr->csum_start = cookie->l2_len + cookie->l3_len;
> -			hdr->csum_offset = offsetof(struct rte_udp_hdr,
> -				dgram_cksum);
> -			hdr->flags = VIRTIO_NET_HDR_F_NEEDS_CSUM;
> -			break;
> -
> -		case PKT_TX_TCP_CKSUM:
> -			hdr->csum_start = cookie->l2_len + cookie->l3_len;
> -			hdr->csum_offset = offsetof(struct rte_tcp_hdr,
> cksum);
> -			hdr->flags = VIRTIO_NET_HDR_F_NEEDS_CSUM;
> -			break;
> -
> -		default:
> -			ASSIGN_UNLESS_EQUAL(hdr->csum_start, 0);
> -			ASSIGN_UNLESS_EQUAL(hdr->csum_offset, 0);
> -			ASSIGN_UNLESS_EQUAL(hdr->flags, 0);
> -			break;
> -		}
> +	uint64_t csum_l4 = cookie->ol_flags & PKT_TX_L4_MASK;
> +
> +	if (cookie->ol_flags & PKT_TX_TCP_SEG)
> +		csum_l4 |= PKT_TX_TCP_CKSUM;
> +
> +	switch (csum_l4) {
> +	case PKT_TX_UDP_CKSUM:
> +		hdr->csum_start = cookie->l2_len + cookie->l3_len;
> +		hdr->csum_offset = offsetof(struct rte_udp_hdr,
> dgram_cksum);
> +		hdr->flags = VIRTIO_NET_HDR_F_NEEDS_CSUM;
> +		break;
> +
> +	case PKT_TX_TCP_CKSUM:
> +		hdr->csum_start = cookie->l2_len + cookie->l3_len;
> +		hdr->csum_offset = offsetof(struct rte_tcp_hdr, cksum);
> +		hdr->flags = VIRTIO_NET_HDR_F_NEEDS_CSUM;
> +		break;
> +
> +	default:
> +		ASSIGN_UNLESS_EQUAL(hdr->csum_start, 0);
> +		ASSIGN_UNLESS_EQUAL(hdr->csum_offset, 0);
> +		ASSIGN_UNLESS_EQUAL(hdr->flags, 0);
> +		break;
> +	}
> 
> -		/* TCP Segmentation Offload */
> -		if (cookie->ol_flags & PKT_TX_TCP_SEG) {
> -			hdr->gso_type = (cookie->ol_flags & PKT_TX_IPV6) ?
> -				VIRTIO_NET_HDR_GSO_TCPV6 :
> -				VIRTIO_NET_HDR_GSO_TCPV4;
> -			hdr->gso_size = cookie->tso_segsz;
> -			hdr->hdr_len =
> -				cookie->l2_len +
> -				cookie->l3_len +
> -				cookie->l4_len;
> -		} else {
> -			ASSIGN_UNLESS_EQUAL(hdr->gso_type, 0);
> -			ASSIGN_UNLESS_EQUAL(hdr->gso_size, 0);
> -			ASSIGN_UNLESS_EQUAL(hdr->hdr_len, 0);
> -		}
> +	/* TCP Segmentation Offload */
> +	if (cookie->ol_flags & PKT_TX_TCP_SEG) {
> +		hdr->gso_type = (cookie->ol_flags & PKT_TX_IPV6) ?
> +			VIRTIO_NET_HDR_GSO_TCPV6 :
> +			VIRTIO_NET_HDR_GSO_TCPV4;
> +		hdr->gso_size = cookie->tso_segsz;
> +		hdr->hdr_len = cookie->l2_len + cookie->l3_len + cookie-
> >l4_len;
> +	} else {
> +		ASSIGN_UNLESS_EQUAL(hdr->gso_type, 0);
> +		ASSIGN_UNLESS_EQUAL(hdr->gso_size, 0);
> +		ASSIGN_UNLESS_EQUAL(hdr->hdr_len, 0);
>  	}
>  }
> 
> @@ -737,7 +729,8 @@ virtqueue_enqueue_xmit_packed(struct virtnet_tx
> *txvq, struct rte_mbuf *cookie,
>  		}
>  	}
> 
> -	virtqueue_xmit_offload(hdr, cookie, vq->hw->has_tx_offload);
> +	if (vq->hw->has_tx_offload)
> +		virtqueue_xmit_offload(hdr, cookie);
> 
>  	do {
>  		uint16_t flags;
> --
> 2.23.0

Reviewed-by: Ruifeng Wang <ruifeng.wang@arm.com>

^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: [dpdk-dev] [PATCH 2/5] net/tap: do not touch Tx offload flags
  2021-04-08 12:58           ` Flavio Leitner
@ 2021-04-09 13:30             ` Olivier Matz
  2021-04-09 16:55               ` Flavio Leitner
  2021-04-28 12:17               ` David Marchand
  0 siblings, 2 replies; 63+ messages in thread
From: Olivier Matz @ 2021-04-09 13:30 UTC (permalink / raw)
  To: Flavio Leitner
  Cc: David Marchand, dev, maxime.coquelin, i.maximets, Keith Wiles

On Thu, Apr 08, 2021 at 09:58:35AM -0300, Flavio Leitner wrote:
> On Thu, Apr 08, 2021 at 02:05:21PM +0200, Olivier Matz wrote:
> > On Thu, Apr 08, 2021 at 08:21:58AM -0300, Flavio Leitner wrote:
> > > On Thu, Apr 08, 2021 at 09:41:59AM +0200, Olivier Matz wrote:
> > > > On Wed, Apr 07, 2021 at 05:15:39PM -0300, Flavio Leitner wrote:
> > > > > On Thu, Apr 01, 2021 at 11:52:40AM +0200, David Marchand wrote:
> > > > > > Tx offload flags are of the application responsibility.
> > > > > > Leave the mbuf alone and check for TSO where needed.
> > > > > > 
> > > > > > Signed-off-by: David Marchand <david.marchand@redhat.com>
> > > > > > ---
> > > > > 
> > > > > The patch looks good, but maybe a better approach would be
> > > > > to change the documentation to require the TCP_CKSUM flag
> > > > > when TCP_SEG is used, otherwise this flag adjusting needs
> > > > > to be replicated every time TCP_SEG is used.
> > > > > 
> > > > > The above could break existing applications, so perhaps doing
> > > > > something like below would be better and backwards compatible?
> > > > > Then we can remove those places tweaking the flags completely.
> > > > 
> > > > As a first step, I suggest to document that:
> > > > - applications must set TCP_CKSUM when setting TCP_SEG
> > > 
> > > That's what I suggest above.
> > > 
> > > > - pmds must suppose that TCP_CKSUM is set when TCP_SEG is set
> > > 
> > > But that keeps the problem of implying the TCP_CKSUM flag in
> > > various places.
> > 
> > Yes. What I propose is just a first step: better document what is the
> > current expected behavior, before doing something else.
> > 
> > > > This is clearer than what we have today, and I think it does not break
> > > > anything. This will guide apps in the correct direction, facilitating
> > > > an eventual future PMD change.
> > > > 
> > > > > diff --git a/lib/librte_mbuf/rte_mbuf_core.h b/lib/librte_mbuf/rte_mbuf_core.h
> > > > > index c17dc95c5..6a0c2cdd9 100644
> > > > > --- a/lib/librte_mbuf/rte_mbuf_core.h
> > > > > +++ b/lib/librte_mbuf/rte_mbuf_core.h
> > > > > @@ -298,7 +298,7 @@ extern "C" {
> > > > >   *  - if it's IPv4, set the PKT_TX_IP_CKSUM flag
> > > > >   *  - fill the mbuf offload information: l2_len, l3_len, l4_len, tso_segsz
> > > > >   */
> > > > > -#define PKT_TX_TCP_SEG       (1ULL << 50)
> > > > > +#define PKT_TX_TCP_SEG       (1ULL << 50) | PKT_TX_TCP_CKSUM
> > > > >  
> > > > >  /** TX IEEE1588 packet to timestamp. */
> > > > >  #define PKT_TX_IEEE1588_TMST (1ULL << 51)
> > > > 
> > > > I'm afraid some applications or drivers use extended bit manipulations
> > > > to do the conversion from/to another domain (like hardware descriptors
> > > > or application-specific flags). They may expect this constant to be a
> > > > unique flag.
> > > 
> > > Interesting, do you have an example? Because each flag still has a
> > > separate meaning.
> > 
> > > Honestly no, I don't have any good example, just a (maybe unfounded) doubt.
> > 
> > I have in mind operations that are done with tables or vector
> > instructions inside the drivers, but this is mainly done for Rx, not Tx.
> > You can look at Tx functions like mlx5_set_cksum_table() or
> > nix_xmit_pkts_vector(), or Rx functions like desc_to_olflags_v() or
> > enic_noscatter_vec_recv_pkts() to see what kind of stuff I'm talking
> > about.
> 
> I see your point. Going back to improving the documentation as a
> first step, what would be the next steps? Are we going to wait a few
> releases and then remove the flag tweaking code assuming that PMDs
> and apps are ok?

After this documentation step, in few releases, we could relax the
constraint on PMD: applications will be expected to set TCP_CKSUM when
TCP_SEG is set, so no need for the PMD to force TCP_CKSUM to 1 if
TCP_SEG is set. The documentation will be updated again.

This plan can be described in the deprecation notice, and later in the
release note.

How does it sound?
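
[Editorial note: the contract Olivier proposes to document can be sketched
from the application side. Flag names and bit values below are simplified
local stand-ins modeled on rte_mbuf_core.h, not the real macros; the point is
that an application requesting TSO sets TCP_CKSUM explicitly, so PMDs no
longer need to imply it.]

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical, simplified flag values (modeled on rte_mbuf_core.h). */
#define TX_TCP_CKSUM (1ULL << 52)
#define TX_TCP_SEG   (1ULL << 50)
#define TX_IP_CKSUM  (1ULL << 54)
#define TX_IPV4      (1ULL << 55)
#define TX_IPV6      (1ULL << 56)

/* Application-side sketch of the documented contract: a TSO request
 * always carries the TCP checksum flag alongside the segmentation flag. */
static uint64_t tso_request_flags(int is_ipv4)
{
	uint64_t ol_flags = TX_TCP_SEG | TX_TCP_CKSUM;

	if (is_ipv4)
		ol_flags |= TX_IP_CKSUM | TX_IPV4;
	else
		ol_flags |= TX_IPV6;
	return ol_flags;
}
```

Once applications follow this, the per-PMD "force TCP_CKSUM when TCP_SEG is
set" tweaks become dead code and can be removed in a later release.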

^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: [dpdk-dev] [PATCH 2/5] net/tap: do not touch Tx offload flags
  2021-04-09 13:30             ` Olivier Matz
@ 2021-04-09 16:55               ` Flavio Leitner
  2021-04-28 12:17               ` David Marchand
  1 sibling, 0 replies; 63+ messages in thread
From: Flavio Leitner @ 2021-04-09 16:55 UTC (permalink / raw)
  To: Olivier Matz
  Cc: David Marchand, dev, maxime.coquelin, i.maximets, Keith Wiles

On Fri, Apr 09, 2021 at 03:30:18PM +0200, Olivier Matz wrote:
> On Thu, Apr 08, 2021 at 09:58:35AM -0300, Flavio Leitner wrote:
> > On Thu, Apr 08, 2021 at 02:05:21PM +0200, Olivier Matz wrote:
> > > On Thu, Apr 08, 2021 at 08:21:58AM -0300, Flavio Leitner wrote:
> > > > On Thu, Apr 08, 2021 at 09:41:59AM +0200, Olivier Matz wrote:
> > > > > On Wed, Apr 07, 2021 at 05:15:39PM -0300, Flavio Leitner wrote:
> > > > > > On Thu, Apr 01, 2021 at 11:52:40AM +0200, David Marchand wrote:
> > > > > > > Tx offload flags are of the application responsibility.
> > > > > > > Leave the mbuf alone and check for TSO where needed.
> > > > > > > 
> > > > > > > Signed-off-by: David Marchand <david.marchand@redhat.com>
> > > > > > > ---
> > > > > > 
> > > > > > The patch looks good, but maybe a better approach would be
> > > > > > to change the documentation to require the TCP_CKSUM flag
> > > > > > when TCP_SEG is used, otherwise this flag adjusting needs
> > > > > > to be replicated every time TCP_SEG is used.
> > > > > > 
> > > > > > The above could break existing applications, so perhaps doing
> > > > > > something like below would be better and backwards compatible?
> > > > > > Then we can remove those places tweaking the flags completely.
> > > > > 
> > > > > As a first step, I suggest to document that:
> > > > > - applications must set TCP_CKSUM when setting TCP_SEG
> > > > 
> > > > That's what I suggest above.
> > > > 
> > > > > - pmds must suppose that TCP_CKSUM is set when TCP_SEG is set
> > > > 
> > > > But that keeps the problem of implying the TCP_CKSUM flag in
> > > > various places.
> > > 
> > > Yes. What I propose is just a first step: better document what is the
> > > current expected behavior, before doing something else.
> > > 
> > > > > This is clearer than what we have today, and I think it does not break
> > > > > anything. This will guide apps in the correct direction, facilitating
> > > > > an eventual future PMD change.
[...]
> > I see your point. Going back to improving the documentation as a
> > first step, what would be the next steps? Are we going to wait a few
> > releases and then remove the flag tweaking code assuming that PMDs
> > and apps are ok?
> 
> After this documentation step, in a few releases, we could relax the
> constraint on PMD: applications will be expected to set TCP_CKSUM when
> TCP_SEG is set, so no need for the PMD to force TCP_CKSUM to 1 if
> TCP_SEG is set. The documentation will be updated again.
> 
> This plan can be described in the deprecation notice, and later in the
> release note.
> 
> How does it sound?

Works for me.
Thanks,
-- 
fbl

^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: [dpdk-dev] [PATCH 3/5] net/virtio: do not touch Tx offload flags
  2021-04-01  9:52 ` [dpdk-dev] [PATCH 3/5] net/virtio: " David Marchand
@ 2021-04-13 14:17   ` Maxime Coquelin
  0 siblings, 0 replies; 63+ messages in thread
From: Maxime Coquelin @ 2021-04-13 14:17 UTC (permalink / raw)
  To: David Marchand, dev; +Cc: olivier.matz, fbl, i.maximets, Chenbo Xia



On 4/1/21 11:52 AM, David Marchand wrote:
> Tx offload flags are of the application responsibility.
> Leave the mbuf alone and use local storage for the implicit tcp checksum
> offloading in case of TSO.
> 
> Signed-off-by: David Marchand <david.marchand@redhat.com>
> ---
>  drivers/net/virtio/virtqueue.h | 6 ++++--
>  1 file changed, 4 insertions(+), 2 deletions(-)
> 
> diff --git a/drivers/net/virtio/virtqueue.h b/drivers/net/virtio/virtqueue.h
> index 71b66f3208..2e8826bc28 100644
> --- a/drivers/net/virtio/virtqueue.h
> +++ b/drivers/net/virtio/virtqueue.h
> @@ -618,10 +618,12 @@ virtqueue_xmit_offload(struct virtio_net_hdr *hdr,
>  			uint8_t offload)
>  {
>  	if (offload) {
> +		uint64_t csum_l4 = cookie->ol_flags & PKT_TX_L4_MASK;
> +
>  		if (cookie->ol_flags & PKT_TX_TCP_SEG)
> -			cookie->ol_flags |= PKT_TX_TCP_CKSUM;
> +			csum_l4 |= PKT_TX_TCP_CKSUM;
>  
> -		switch (cookie->ol_flags & PKT_TX_L4_MASK) {
> +		switch (csum_l4) {
>  		case PKT_TX_UDP_CKSUM:
>  			hdr->csum_start = cookie->l2_len + cookie->l3_len;
>  			hdr->csum_offset = offsetof(struct rte_udp_hdr,
> 

Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>

Thanks,
Maxime


^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: [dpdk-dev] [PATCH 5/5] vhost: fix offload flags in Rx path
  2021-04-08 18:38   ` Flavio Leitner
@ 2021-04-13 15:27     ` Maxime Coquelin
  2021-04-27 17:09       ` David Marchand
  0 siblings, 1 reply; 63+ messages in thread
From: Maxime Coquelin @ 2021-04-13 15:27 UTC (permalink / raw)
  To: Flavio Leitner, David Marchand
  Cc: dev, olivier.matz, i.maximets, Chenbo Xia, Jijiang Liu, Yuanhan Liu



On 4/8/21 8:38 PM, Flavio Leitner wrote:
> On Thu, Apr 01, 2021 at 11:52:43AM +0200, David Marchand wrote:
>> The vhost library currently configures Tx offloading (PKT_TX_*) on any
>> packet received from a guest virtio device which asks for some offloading.
>>
>> This is problematic, as Tx offloading is something that the application
>> must ask for: the application needs to configure devices
>> to support every used offload (ip, tcp checksumming, tso..), and the
>> various l2/l3/l4 lengths must be set following any processing that
>> happened in the application itself.
>>
>> On the other hand, the received packets are not marked wrt current
>> packet l3/l4 checksumming info.
>>
>> Copy virtio rx processing to fix those offload flags.
>>
>> The vhost example needs reworking as it was built with the assumption
>> that mbuf TSO configuration is set up by the vhost library.
>> This is not done in this patch for now so TSO activation is forcibly
>> refused.
>>
>> Fixes: 859b480d5afd ("vhost: add guest offload setting")
> 
> There is a change: before, ECN was ignored and now it is invalid.
> I think that's the right way to go, but not sure if virtio blocks
> the negotiation of that feature.

No, I just tested and the feature gets negotiated.

Disabling it in the Vhost lib should be avoided, so as not to break
live-migration.

It might be safer to revert to the older behavior for it, i.e. just
ignore the bit. I don't think it is ever set, because otherwise we would
have had lots of reports since the Vhost log would be flooded with:

VHOST_LOG_DATA(WARNING,
	"unsupported gso type %u.\n", hdr->gso_type);

David, what do you think?

> Reviewed-by: Flavio Leitner <fbl@sysclose.org>
> 
> fbl
> 


^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: [dpdk-dev] [PATCH 5/5] vhost: fix offload flags in Rx path
  2021-04-13 15:27     ` Maxime Coquelin
@ 2021-04-27 17:09       ` David Marchand
  2021-04-27 17:19         ` David Marchand
  0 siblings, 1 reply; 63+ messages in thread
From: David Marchand @ 2021-04-27 17:09 UTC (permalink / raw)
  To: Maxime Coquelin
  Cc: Flavio Leitner, dev, Olivier Matz, Ilya Maximets, Chenbo Xia,
	Jijiang Liu, Yuanhan Liu

On Tue, Apr 13, 2021 at 5:27 PM Maxime Coquelin
<maxime.coquelin@redhat.com> wrote:
> On 4/8/21 8:38 PM, Flavio Leitner wrote:
> > On Thu, Apr 01, 2021 at 11:52:43AM +0200, David Marchand wrote:
> >> The vhost library currently configures Tx offloading (PKT_TX_*) on any
> >> packet received from a guest virtio device which asks for some offloading.
> >>
> >> This is problematic, as Tx offloading is something that the application
> >> must ask for: the application needs to configure devices
> >> to support every used offload (ip, tcp checksumming, tso..), and the
> >> various l2/l3/l4 lengths must be set following any processing that
> >> happened in the application itself.
> >>
> >> On the other hand, the received packets are not marked wrt current
> >> packet l3/l4 checksumming info.
> >>
> >> Copy virtio rx processing to fix those offload flags.
> >>
> >> The vhost example needs reworking as it was built with the assumption
> >> that mbuf TSO configuration is set up by the vhost library.
> >> This is not done in this patch for now so TSO activation is forcibly
> >> refused.
> >>
> >> Fixes: 859b480d5afd ("vhost: add guest offload setting")
> >
> > There is a change: before, ECN was ignored and now it is invalid.
> > I think that's the right way to go, but not sure if virtio blocks
> > the negotiation of that feature.
>
> No, I just tested and the feature gets negotiated.

I suppose you tested with testpmd, because I can see ECN is disabled
by default with OVS.


>
> Disabling it in Vhost lib should be avoided to avoid breaking
> live-migration.
>
> It might be safer to revert to the older behavior for it, i.e. just
> ignore the bit. I don't think it is ever set, because otherwise we would
> have had lots of reports since the Vhost log would be flooded with:

-  The VIRTIO_NET_HDR_GSO_ECN bit is supposed to be coupled with TSO bits.
Copying a bit more of this code:
   switch (hdr->gso_type & ~VIRTIO_NET_HDR_GSO_ECN) {
...
   default:
>
> VHOST_LOG_DATA(WARNING,
>         "unsupported gso type %u.\n", hdr->gso_type);

The absence of logs does not mean the guest is not sending packets with
VIRTIO_NET_HDR_GSO_ECN set.
Otoh, getting this log instead indicates a bug in the virtio driver
(as we discussed offlist).


- It is not clear to me how widely deployed the ECN feature is.
I think the Linux kernel won't try to start a TCP connection with ECN
unless it is explicitly configured on a socket (but I am a bit lost).

By default, VIRTIO_NET_F_HOST_ECN is announced as supported by vhost-user.
So in theory, a guest virtio netdevice with NETIF_F_TSO_ECN can
transmit packets (with SKB_GSO_TCP_ECN translated to
VIRTIO_NET_HDR_GSO_ECN in virtio_net_hdr_from_skb) to a vhost-user
backend.


- Treating ECN with GSO requires special handling:
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=b0da8537037f337103348f239ad901477e907aa8

I can see some change in the i40e kernel driver at least.
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=059dab69652da3525d320d77ac5422ec708ced14
The ixgbe kernel driver is not flagged with NETIF_F_TSO_ECN.

We don't have such a distinction in DPDK: neither a per mbuf flag to
mark packets, nor a device offloading flag/capability.
And the rte_gso library probably does not handle CWR correctly.
About the i40e driver, I can't find the same configuration as in the
kernel driver.



- Now, about the next step...

The "good" (I suppose you might disagree here) news is that this
feature is disabled in OVS:
https://github.com/openvswitch/ovs/blob/master/lib/netdev-dpdk.c#L5162

About handling TSO + ECN, this is a generic problem with the DPDK API
and we have been living with it for a long time.
I understand that passing such packets to hw that does not handle this
correctly makes the ECN feature not work properly.
But "normal" TSO works.

I agree, we can let such packets be received by vhost like it was done
before my patch.

Investigating the other side (GUEST_ECN + the virtio pmd) could be
worth doing later, as I think GSO+ECN packets are dropped in the current
code.


-- 
David Marchand


^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: [dpdk-dev] [PATCH 5/5] vhost: fix offload flags in Rx path
  2021-04-27 17:09       ` David Marchand
@ 2021-04-27 17:19         ` David Marchand
  0 siblings, 0 replies; 63+ messages in thread
From: David Marchand @ 2021-04-27 17:19 UTC (permalink / raw)
  To: Maxime Coquelin
  Cc: Flavio Leitner, dev, Olivier Matz, Ilya Maximets, Chenbo Xia,
	Jijiang Liu, Yuanhan Liu

On Tue, Apr 27, 2021 at 7:09 PM David Marchand
<david.marchand@redhat.com> wrote:
> Investigating the other side (GUEST_ECN + the virtio pmd) could be
> worth later, as I think GSO+ECN packets are dropped in the current
> code.

Errr, but that would be a problem only for vhost-kernel -> virtio pmd.
Not sure this is a usecase we care about.


-- 
David Marchand


^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: [dpdk-dev] [PATCH 2/5] net/tap: do not touch Tx offload flags
  2021-04-08  7:53   ` Olivier Matz
@ 2021-04-28 12:12     ` David Marchand
  0 siblings, 0 replies; 63+ messages in thread
From: David Marchand @ 2021-04-28 12:12 UTC (permalink / raw)
  To: Olivier Matz
  Cc: dev, Maxime Coquelin, Flavio Leitner, Ilya Maximets, Keith Wiles

On Thu, Apr 8, 2021 at 9:53 AM Olivier Matz <olivier.matz@6wind.com> wrote:
>
> On Thu, Apr 01, 2021 at 11:52:40AM +0200, David Marchand wrote:
> > Tx offload flags are of the application responsibility.
> > Leave the mbuf alone and check for TSO where needed.
> >
> > Signed-off-by: David Marchand <david.marchand@redhat.com>

Self nack on this patch.

>
> Maybe the problem being solved should be better described in the commit
> log. Is it a problem (other than cosmetic) to touch a mbuf in the Tx
> function of a driver, where we could expect that the mbuf is owned by
> the driver?
>
> The only problem I can think about is in case we transmit a direct mbuf
> whose refcnt is increased, but I wonder how much this is really
> supported: for instance, several drivers add vlans using
> rte_vlan_insert() in their Tx path.

This was my initial thought, as I suspected issues with applications
which keep a refcount on the mbuf.
But after more discussions offlist, it is hard to find a usecase for this.

The gso library already touches the ol_flags from the input mbuf and
this is documented in its API.

Plus, my patch also has an issue.
This library gives back an array of direct/indirect mbufs pointing at
the data from the original mbuf.
Those mbufs are populated with the original mbuf ol_flags but nothing
has been done on the checksum in this data.
The net/tap driver relies on this feature: mbuf_in is marked with
PKT_TX_TCP_CKSUM so that the generated mbufs given back by the rte_gso
library have this offload flag set and PKT_TX_TCP_SEG is not present
anymore.
Later in the net/tap tx handler, PKT_TX_TCP_CKSUM presence triggers
tcp checksum computation.
So this patch breaks the tso support in net/tap.


Just for the record.
As far as rte_vlan_insert() is concerned, it ensures that the mbuf is
not shared.
https://git.dpdk.org/dpdk/tree/lib/net/rte_ether.h#n352


-- 
David Marchand


^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: [dpdk-dev] [PATCH 2/5] net/tap: do not touch Tx offload flags
  2021-04-09 13:30             ` Olivier Matz
  2021-04-09 16:55               ` Flavio Leitner
@ 2021-04-28 12:17               ` David Marchand
  1 sibling, 0 replies; 63+ messages in thread
From: David Marchand @ 2021-04-28 12:17 UTC (permalink / raw)
  To: Olivier Matz
  Cc: Flavio Leitner, dev, Maxime Coquelin, Ilya Maximets, Keith Wiles,
	Yigit, Ferruh, Thomas Monjalon, Andrew Rybchenko

On Fri, Apr 9, 2021 at 3:30 PM Olivier Matz <olivier.matz@6wind.com> wrote:
> > I see your point. Going back to improving the documentation as a
> > first step, what would be the next steps? Are we going to wait a few
> > releases and then remove the flag tweaking code assuming that PMDs
> > and apps are ok?
>
> After this documentation step, in a few releases, we could relax the
> constraint on PMD: applications will be expected to set TCP_CKSUM when
> TCP_SEG is set, so no need for the PMD to force TCP_CKSUM to 1 if
> TCP_SEG is set. The documentation will be updated again.
>
> This plan can be described in the deprecation notice, and later in the
> release note.

Looking at drivers, some of them already trigger tcp checksumming with
the presence of PKT_TX_TCP_SEG only.
See for example:
https://git.dpdk.org/dpdk/tree/drivers/net/ixgbe/ixgbe_rxtx.c#n391

The hw needs to fill in the TCP checksum when generating segments.
So I suppose this was the original meaning of the "implies" comment.
https://git.dpdk.org/dpdk/tree/lib/mbuf/rte_mbuf_core.h#n292

We could reword the comment, but I don't think there is anything to
change in the API.


-- 
David Marchand


^ permalink raw reply	[flat|nested] 63+ messages in thread

* [dpdk-dev] [PATCH v2 0/4] Offload flags fixes
  2021-04-01  9:52 [dpdk-dev] [PATCH 0/5] Offload flags fixes David Marchand
                   ` (4 preceding siblings ...)
  2021-04-01  9:52 ` [dpdk-dev] [PATCH 5/5] vhost: fix offload flags in Rx path David Marchand
@ 2021-04-29  8:04 ` David Marchand
  2021-04-29  8:04   ` [dpdk-dev] [PATCH v2 1/4] mbuf: mark old offload flag as deprecated David Marchand
                     ` (3 more replies)
  2021-05-03 13:26 ` [dpdk-dev] [PATCH v3 0/4] Offload flags fixes David Marchand
  2021-05-03 16:43 ` [dpdk-dev] [PATCH v4 0/3] " David Marchand
  7 siblings, 4 replies; 63+ messages in thread
From: David Marchand @ 2021-04-29  8:04 UTC (permalink / raw)
  To: dev; +Cc: maxime.coquelin, olivier.matz, fbl, i.maximets

The important part is the last patch on vhost handling of offloading
requests coming from a virtio guest interface.

The rest are small fixes that I accumulated while reviewing the mbuf
offload flags.

This last patch has the potential of breaking existing
applications using the vhost library (OVS being impacted).
I did not mark it for backport.

Changes since v1:
- dropped patch on net/tap,
- added missing bits in example/vhost,
- relaxed checks on VIRTIO_NET_HDR_GSO_ECN and VIRTIO_NET_HDR_GSO_UDP,

-- 
David Marchand

David Marchand (4):
  mbuf: mark old offload flag as deprecated
  net/virtio: do not touch Tx offload flags
  net/virtio: refactor Tx offload helper
  vhost: fix offload flags in Rx path

 drivers/net/virtio/virtio_rxtx.c             |   7 +-
 drivers/net/virtio/virtio_rxtx_packed_avx.h  |   2 +-
 drivers/net/virtio/virtio_rxtx_packed_neon.h |   2 +-
 drivers/net/virtio/virtqueue.h               |  81 +++++------
 examples/vhost/main.c                        |  42 +++---
 lib/mbuf/rte_mbuf_core.h                     |   3 +-
 lib/vhost/virtio_net.c                       | 139 ++++++++-----------
 7 files changed, 124 insertions(+), 152 deletions(-)

-- 
2.23.0


^ permalink raw reply	[flat|nested] 63+ messages in thread

* [dpdk-dev] [PATCH v2 1/4] mbuf: mark old offload flag as deprecated
  2021-04-29  8:04 ` [dpdk-dev] [PATCH v2 0/4] Offload flags fixes David Marchand
@ 2021-04-29  8:04   ` David Marchand
  2021-04-29 12:14     ` Lance Richardson
  2021-04-29 16:45     ` Ajit Khaparde
  2021-04-29  8:04   ` [dpdk-dev] [PATCH v2 2/4] net/virtio: do not touch Tx offload flags David Marchand
                     ` (2 subsequent siblings)
  3 siblings, 2 replies; 63+ messages in thread
From: David Marchand @ 2021-04-29  8:04 UTC (permalink / raw)
  To: dev
  Cc: maxime.coquelin, olivier.matz, fbl, i.maximets, Ferruh Yigit,
	Lance Richardson, Andrew Rybchenko, Ajit Khaparde

PKT_RX_EIP_CKSUM_BAD has been declared deprecated but there was no
warning to applications still using it.
Fix this by marking it as deprecated with the newly introduced
RTE_DEPRECATED macro.

Fixes: e8a419d6de4b ("mbuf: rename outer IP checksum macro")

Signed-off-by: David Marchand <david.marchand@redhat.com>
Reviewed-by: Flavio Leitner <fbl@sysclose.org>
Acked-by: Olivier Matz <olivier.matz@6wind.com>
---
Changes since v1:
- updated commitlog following Olivier comment,

---
 lib/mbuf/rte_mbuf_core.h | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/lib/mbuf/rte_mbuf_core.h b/lib/mbuf/rte_mbuf_core.h
index c17dc95c51..bb38d7f581 100644
--- a/lib/mbuf/rte_mbuf_core.h
+++ b/lib/mbuf/rte_mbuf_core.h
@@ -83,7 +83,8 @@ extern "C" {
  * Deprecated.
  * This flag has been renamed, use PKT_RX_OUTER_IP_CKSUM_BAD instead.
  */
-#define PKT_RX_EIP_CKSUM_BAD PKT_RX_OUTER_IP_CKSUM_BAD
+#define PKT_RX_EIP_CKSUM_BAD \
+	RTE_DEPRECATED(PKT_RX_EIP_CKSUM_BAD) PKT_RX_OUTER_IP_CKSUM_BAD
 
 /**
  * A vlan has been stripped by the hardware and its tci is saved in
-- 
2.23.0


^ permalink raw reply	[flat|nested] 63+ messages in thread

* [dpdk-dev] [PATCH v2 2/4] net/virtio: do not touch Tx offload flags
  2021-04-29  8:04 ` [dpdk-dev] [PATCH v2 0/4] Offload flags fixes David Marchand
  2021-04-29  8:04   ` [dpdk-dev] [PATCH v2 1/4] mbuf: mark old offload flag as deprecated David Marchand
@ 2021-04-29  8:04   ` David Marchand
  2021-04-29 13:51     ` Flavio Leitner
  2021-04-29  8:04   ` [dpdk-dev] [PATCH v2 3/4] net/virtio: refactor Tx offload helper David Marchand
  2021-04-29  8:04   ` [dpdk-dev] [PATCH v2 4/4] vhost: fix offload flags in Rx path David Marchand
  3 siblings, 1 reply; 63+ messages in thread
From: David Marchand @ 2021-04-29  8:04 UTC (permalink / raw)
  To: dev; +Cc: maxime.coquelin, olivier.matz, fbl, i.maximets, Chenbo Xia

Tx offload flags are of the application responsibility.
Leave the mbuf alone and use local storage for the implicit tcp checksum
offloading in case of TSO.

Signed-off-by: David Marchand <david.marchand@redhat.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
---
 drivers/net/virtio/virtqueue.h | 6 ++++--
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/drivers/net/virtio/virtqueue.h b/drivers/net/virtio/virtqueue.h
index e9992b745d..ed3b85080e 100644
--- a/drivers/net/virtio/virtqueue.h
+++ b/drivers/net/virtio/virtqueue.h
@@ -622,10 +622,12 @@ virtqueue_xmit_offload(struct virtio_net_hdr *hdr,
 			uint8_t offload)
 {
 	if (offload) {
+		uint64_t csum_l4 = cookie->ol_flags & PKT_TX_L4_MASK;
+
 		if (cookie->ol_flags & PKT_TX_TCP_SEG)
-			cookie->ol_flags |= PKT_TX_TCP_CKSUM;
+			csum_l4 |= PKT_TX_TCP_CKSUM;
 
-		switch (cookie->ol_flags & PKT_TX_L4_MASK) {
+		switch (csum_l4) {
 		case PKT_TX_UDP_CKSUM:
 			hdr->csum_start = cookie->l2_len + cookie->l3_len;
 			hdr->csum_offset = offsetof(struct rte_udp_hdr,
-- 
2.23.0


^ permalink raw reply	[flat|nested] 63+ messages in thread

* [dpdk-dev] [PATCH v2 3/4] net/virtio: refactor Tx offload helper
  2021-04-29  8:04 ` [dpdk-dev] [PATCH v2 0/4] Offload flags fixes David Marchand
  2021-04-29  8:04   ` [dpdk-dev] [PATCH v2 1/4] mbuf: mark old offload flag as deprecated David Marchand
  2021-04-29  8:04   ` [dpdk-dev] [PATCH v2 2/4] net/virtio: do not touch Tx offload flags David Marchand
@ 2021-04-29  8:04   ` David Marchand
  2021-04-29 12:59     ` Maxime Coquelin
  2021-04-29  8:04   ` [dpdk-dev] [PATCH v2 4/4] vhost: fix offload flags in Rx path David Marchand
  3 siblings, 1 reply; 63+ messages in thread
From: David Marchand @ 2021-04-29  8:04 UTC (permalink / raw)
  To: dev
  Cc: maxime.coquelin, olivier.matz, fbl, i.maximets, Ruifeng Wang,
	Chenbo Xia, Bruce Richardson, Konstantin Ananyev, Jerin Jacob

Purely cosmetic, but it is rather odd to have an "offload" helper that
checks whether it actually must do something.
We already have the same checks in most callers, so move this branch
into them.

Signed-off-by: David Marchand <david.marchand@redhat.com>
Reviewed-by: Flavio Leitner <fbl@sysclose.org>
Reviewed-by: Ruifeng Wang <ruifeng.wang@arm.com>
---
 drivers/net/virtio/virtio_rxtx.c             |  7 +-
 drivers/net/virtio/virtio_rxtx_packed_avx.h  |  2 +-
 drivers/net/virtio/virtio_rxtx_packed_neon.h |  2 +-
 drivers/net/virtio/virtqueue.h               | 83 +++++++++-----------
 4 files changed, 44 insertions(+), 50 deletions(-)

diff --git a/drivers/net/virtio/virtio_rxtx.c b/drivers/net/virtio/virtio_rxtx.c
index 8df913b0ba..34108fb946 100644
--- a/drivers/net/virtio/virtio_rxtx.c
+++ b/drivers/net/virtio/virtio_rxtx.c
@@ -448,7 +448,7 @@ virtqueue_enqueue_xmit_inorder(struct virtnet_tx *txvq,
 		if (!vq->hw->has_tx_offload)
 			virtqueue_clear_net_hdr(hdr);
 		else
-			virtqueue_xmit_offload(hdr, cookies[i], true);
+			virtqueue_xmit_offload(hdr, cookies[i]);
 
 		start_dp[idx].addr  = rte_mbuf_data_iova(cookies[i]) - head_size;
 		start_dp[idx].len   = cookies[i]->data_len + head_size;
@@ -495,7 +495,7 @@ virtqueue_enqueue_xmit_packed_fast(struct virtnet_tx *txvq,
 	if (!vq->hw->has_tx_offload)
 		virtqueue_clear_net_hdr(hdr);
 	else
-		virtqueue_xmit_offload(hdr, cookie, true);
+		virtqueue_xmit_offload(hdr, cookie);
 
 	dp->addr = rte_mbuf_data_iova(cookie) - head_size;
 	dp->len  = cookie->data_len + head_size;
@@ -581,7 +581,8 @@ virtqueue_enqueue_xmit(struct virtnet_tx *txvq, struct rte_mbuf *cookie,
 		idx = start_dp[idx].next;
 	}
 
-	virtqueue_xmit_offload(hdr, cookie, vq->hw->has_tx_offload);
+	if (vq->hw->has_tx_offload)
+		virtqueue_xmit_offload(hdr, cookie);
 
 	do {
 		start_dp[idx].addr  = rte_mbuf_data_iova(cookie);
diff --git a/drivers/net/virtio/virtio_rxtx_packed_avx.h b/drivers/net/virtio/virtio_rxtx_packed_avx.h
index 228cf5437b..c819d2e4f2 100644
--- a/drivers/net/virtio/virtio_rxtx_packed_avx.h
+++ b/drivers/net/virtio/virtio_rxtx_packed_avx.h
@@ -115,7 +115,7 @@ virtqueue_enqueue_batch_packed_vec(struct virtnet_tx *txvq,
 		virtio_for_each_try_unroll(i, 0, PACKED_BATCH_SIZE) {
 			hdr = rte_pktmbuf_mtod_offset(tx_pkts[i],
 					struct virtio_net_hdr *, -head_size);
-			virtqueue_xmit_offload(hdr, tx_pkts[i], true);
+			virtqueue_xmit_offload(hdr, tx_pkts[i]);
 		}
 	}
 
diff --git a/drivers/net/virtio/virtio_rxtx_packed_neon.h b/drivers/net/virtio/virtio_rxtx_packed_neon.h
index d4257e68f0..f19e618635 100644
--- a/drivers/net/virtio/virtio_rxtx_packed_neon.h
+++ b/drivers/net/virtio/virtio_rxtx_packed_neon.h
@@ -134,7 +134,7 @@ virtqueue_enqueue_batch_packed_vec(struct virtnet_tx *txvq,
 		virtio_for_each_try_unroll(i, 0, PACKED_BATCH_SIZE) {
 			hdr = rte_pktmbuf_mtod_offset(tx_pkts[i],
 					struct virtio_net_hdr *, -head_size);
-			virtqueue_xmit_offload(hdr, tx_pkts[i], true);
+			virtqueue_xmit_offload(hdr, tx_pkts[i]);
 		}
 	}
 
diff --git a/drivers/net/virtio/virtqueue.h b/drivers/net/virtio/virtqueue.h
index ed3b85080e..03957b2bd0 100644
--- a/drivers/net/virtio/virtqueue.h
+++ b/drivers/net/virtio/virtqueue.h
@@ -617,52 +617,44 @@ virtqueue_notify(struct virtqueue *vq)
 } while (0)
 
 static inline void
-virtqueue_xmit_offload(struct virtio_net_hdr *hdr,
-			struct rte_mbuf *cookie,
-			uint8_t offload)
+virtqueue_xmit_offload(struct virtio_net_hdr *hdr, struct rte_mbuf *cookie)
 {
-	if (offload) {
-		uint64_t csum_l4 = cookie->ol_flags & PKT_TX_L4_MASK;
-
-		if (cookie->ol_flags & PKT_TX_TCP_SEG)
-			csum_l4 |= PKT_TX_TCP_CKSUM;
-
-		switch (csum_l4) {
-		case PKT_TX_UDP_CKSUM:
-			hdr->csum_start = cookie->l2_len + cookie->l3_len;
-			hdr->csum_offset = offsetof(struct rte_udp_hdr,
-				dgram_cksum);
-			hdr->flags = VIRTIO_NET_HDR_F_NEEDS_CSUM;
-			break;
-
-		case PKT_TX_TCP_CKSUM:
-			hdr->csum_start = cookie->l2_len + cookie->l3_len;
-			hdr->csum_offset = offsetof(struct rte_tcp_hdr, cksum);
-			hdr->flags = VIRTIO_NET_HDR_F_NEEDS_CSUM;
-			break;
-
-		default:
-			ASSIGN_UNLESS_EQUAL(hdr->csum_start, 0);
-			ASSIGN_UNLESS_EQUAL(hdr->csum_offset, 0);
-			ASSIGN_UNLESS_EQUAL(hdr->flags, 0);
-			break;
-		}
+	uint64_t csum_l4 = cookie->ol_flags & PKT_TX_L4_MASK;
+
+	if (cookie->ol_flags & PKT_TX_TCP_SEG)
+		csum_l4 |= PKT_TX_TCP_CKSUM;
+
+	switch (csum_l4) {
+	case PKT_TX_UDP_CKSUM:
+		hdr->csum_start = cookie->l2_len + cookie->l3_len;
+		hdr->csum_offset = offsetof(struct rte_udp_hdr, dgram_cksum);
+		hdr->flags = VIRTIO_NET_HDR_F_NEEDS_CSUM;
+		break;
+
+	case PKT_TX_TCP_CKSUM:
+		hdr->csum_start = cookie->l2_len + cookie->l3_len;
+		hdr->csum_offset = offsetof(struct rte_tcp_hdr, cksum);
+		hdr->flags = VIRTIO_NET_HDR_F_NEEDS_CSUM;
+		break;
+
+	default:
+		ASSIGN_UNLESS_EQUAL(hdr->csum_start, 0);
+		ASSIGN_UNLESS_EQUAL(hdr->csum_offset, 0);
+		ASSIGN_UNLESS_EQUAL(hdr->flags, 0);
+		break;
+	}
 
-		/* TCP Segmentation Offload */
-		if (cookie->ol_flags & PKT_TX_TCP_SEG) {
-			hdr->gso_type = (cookie->ol_flags & PKT_TX_IPV6) ?
-				VIRTIO_NET_HDR_GSO_TCPV6 :
-				VIRTIO_NET_HDR_GSO_TCPV4;
-			hdr->gso_size = cookie->tso_segsz;
-			hdr->hdr_len =
-				cookie->l2_len +
-				cookie->l3_len +
-				cookie->l4_len;
-		} else {
-			ASSIGN_UNLESS_EQUAL(hdr->gso_type, 0);
-			ASSIGN_UNLESS_EQUAL(hdr->gso_size, 0);
-			ASSIGN_UNLESS_EQUAL(hdr->hdr_len, 0);
-		}
+	/* TCP Segmentation Offload */
+	if (cookie->ol_flags & PKT_TX_TCP_SEG) {
+		hdr->gso_type = (cookie->ol_flags & PKT_TX_IPV6) ?
+			VIRTIO_NET_HDR_GSO_TCPV6 :
+			VIRTIO_NET_HDR_GSO_TCPV4;
+		hdr->gso_size = cookie->tso_segsz;
+		hdr->hdr_len = cookie->l2_len + cookie->l3_len + cookie->l4_len;
+	} else {
+		ASSIGN_UNLESS_EQUAL(hdr->gso_type, 0);
+		ASSIGN_UNLESS_EQUAL(hdr->gso_size, 0);
+		ASSIGN_UNLESS_EQUAL(hdr->hdr_len, 0);
 	}
 }
 
@@ -741,7 +733,8 @@ virtqueue_enqueue_xmit_packed(struct virtnet_tx *txvq, struct rte_mbuf *cookie,
 		}
 	}
 
-	virtqueue_xmit_offload(hdr, cookie, vq->hw->has_tx_offload);
+	if (vq->hw->has_tx_offload)
+		virtqueue_xmit_offload(hdr, cookie);
 
 	do {
 		uint16_t flags;
-- 
2.23.0


^ permalink raw reply	[flat|nested] 63+ messages in thread

* [dpdk-dev] [PATCH v2 4/4] vhost: fix offload flags in Rx path
  2021-04-29  8:04 ` [dpdk-dev] [PATCH v2 0/4] Offload flags fixes David Marchand
                     ` (2 preceding siblings ...)
  2021-04-29  8:04   ` [dpdk-dev] [PATCH v2 3/4] net/virtio: refactor Tx offload helper David Marchand
@ 2021-04-29  8:04   ` David Marchand
  2021-04-29 13:30     ` Maxime Coquelin
  2021-04-29 18:39     ` Flavio Leitner
  3 siblings, 2 replies; 63+ messages in thread
From: David Marchand @ 2021-04-29  8:04 UTC (permalink / raw)
  To: dev
  Cc: maxime.coquelin, olivier.matz, fbl, i.maximets, Chenbo Xia,
	Jijiang Liu, Yuanhan Liu

The vhost library currently configures Tx offloading (PKT_TX_*) on any
packet received from a guest virtio device which asks for some offloading.

This is problematic, as Tx offloading is something that the application
must ask for: the application needs to configure devices
to support every used offload (ip, tcp checksumming, tso..), and the
various l2/l3/l4 lengths must be set following any processing that
happened in the application itself.

On the other hand, the received packets are not marked wrt current
packet l3/l4 checksumming info.

Copy the virtio rx processing to fix those offload flags, but accept
VIRTIO_NET_HDR_GSO_ECN and VIRTIO_NET_HDR_GSO_UDP too.

The vhost example has been updated accordingly: TSO is applied to any
packet marked LRO.

Fixes: 859b480d5afd ("vhost: add guest offload setting")

Signed-off-by: David Marchand <david.marchand@redhat.com>
---
Changes since v1:
- updated vhost example,
- restored VIRTIO_NET_HDR_GSO_ECN and VIRTIO_NET_HDR_GSO_UDP support,
- restored log on buggy offload request,

---
 examples/vhost/main.c  |  42 +++++++------
 lib/vhost/virtio_net.c | 139 +++++++++++++++++------------------------
 2 files changed, 78 insertions(+), 103 deletions(-)

diff --git a/examples/vhost/main.c b/examples/vhost/main.c
index ff48ba270d..4b3df254ba 100644
--- a/examples/vhost/main.c
+++ b/examples/vhost/main.c
@@ -19,6 +19,7 @@
 #include <rte_log.h>
 #include <rte_string_fns.h>
 #include <rte_malloc.h>
+#include <rte_net.h>
 #include <rte_vhost.h>
 #include <rte_ip.h>
 #include <rte_tcp.h>
@@ -1032,33 +1033,34 @@ find_local_dest(struct vhost_dev *vdev, struct rte_mbuf *m,
 	return 0;
 }
 
-static uint16_t
-get_psd_sum(void *l3_hdr, uint64_t ol_flags)
-{
-	if (ol_flags & PKT_TX_IPV4)
-		return rte_ipv4_phdr_cksum(l3_hdr, ol_flags);
-	else /* assume ethertype == RTE_ETHER_TYPE_IPV6 */
-		return rte_ipv6_phdr_cksum(l3_hdr, ol_flags);
-}
-
 static void virtio_tx_offload(struct rte_mbuf *m)
 {
+	struct rte_net_hdr_lens hdr_lens;
+	struct rte_ipv4_hdr *ipv4_hdr;
+	struct rte_tcp_hdr *tcp_hdr;
+	uint32_t ptype;
 	void *l3_hdr;
-	struct rte_ipv4_hdr *ipv4_hdr = NULL;
-	struct rte_tcp_hdr *tcp_hdr = NULL;
-	struct rte_ether_hdr *eth_hdr =
-		rte_pktmbuf_mtod(m, struct rte_ether_hdr *);
 
-	l3_hdr = (char *)eth_hdr + m->l2_len;
+	ptype = rte_net_get_ptype(m, &hdr_lens, RTE_PTYPE_ALL_MASK);
+	m->l2_len = hdr_lens.l2_len;
+	m->l3_len = hdr_lens.l3_len;
+	m->l4_len = hdr_lens.l4_len;
 
-	if (m->ol_flags & PKT_TX_IPV4) {
+	l3_hdr = rte_pktmbuf_mtod_offset(m, void *, m->l2_len);
+	tcp_hdr = rte_pktmbuf_mtod_offset(m, struct rte_tcp_hdr *,
+		m->l2_len + m->l3_len);
+
+	m->ol_flags |= PKT_TX_TCP_SEG;
+	if ((ptype & RTE_PTYPE_L3_MASK) == RTE_PTYPE_L3_IPV4) {
+		m->ol_flags |= PKT_TX_IPV4;
+		m->ol_flags |= PKT_TX_IP_CKSUM;
 		ipv4_hdr = l3_hdr;
 		ipv4_hdr->hdr_checksum = 0;
-		m->ol_flags |= PKT_TX_IP_CKSUM;
+		tcp_hdr->cksum = rte_ipv4_phdr_cksum(l3_hdr, m->ol_flags);
+	} else { /* assume ethertype == RTE_ETHER_TYPE_IPV6 */
+		m->ol_flags |= PKT_TX_IPV6;
+		tcp_hdr->cksum = rte_ipv6_phdr_cksum(l3_hdr, m->ol_flags);
 	}
-
-	tcp_hdr = (struct rte_tcp_hdr *)((char *)l3_hdr + m->l3_len);
-	tcp_hdr->cksum = get_psd_sum(l3_hdr, m->ol_flags);
 }
 
 static __rte_always_inline void
@@ -1151,7 +1153,7 @@ virtio_tx_route(struct vhost_dev *vdev, struct rte_mbuf *m, uint16_t vlan_tag)
 		m->vlan_tci = vlan_tag;
 	}
 
-	if (m->ol_flags & PKT_TX_TCP_SEG)
+	if (m->ol_flags & PKT_RX_LRO)
 		virtio_tx_offload(m);
 
 	tx_q->m_table[tx_q->len++] = m;
diff --git a/lib/vhost/virtio_net.c b/lib/vhost/virtio_net.c
index ff39878609..da15d11390 100644
--- a/lib/vhost/virtio_net.c
+++ b/lib/vhost/virtio_net.c
@@ -8,6 +8,7 @@
 
 #include <rte_mbuf.h>
 #include <rte_memcpy.h>
+#include <rte_net.h>
 #include <rte_ether.h>
 #include <rte_ip.h>
 #include <rte_vhost.h>
@@ -1827,105 +1828,74 @@ virtio_net_with_host_offload(struct virtio_net *dev)
 	return false;
 }
 
-static void
-parse_ethernet(struct rte_mbuf *m, uint16_t *l4_proto, void **l4_hdr)
-{
-	struct rte_ipv4_hdr *ipv4_hdr;
-	struct rte_ipv6_hdr *ipv6_hdr;
-	void *l3_hdr = NULL;
-	struct rte_ether_hdr *eth_hdr;
-	uint16_t ethertype;
-
-	eth_hdr = rte_pktmbuf_mtod(m, struct rte_ether_hdr *);
-
-	m->l2_len = sizeof(struct rte_ether_hdr);
-	ethertype = rte_be_to_cpu_16(eth_hdr->ether_type);
-
-	if (ethertype == RTE_ETHER_TYPE_VLAN) {
-		struct rte_vlan_hdr *vlan_hdr =
-			(struct rte_vlan_hdr *)(eth_hdr + 1);
-
-		m->l2_len += sizeof(struct rte_vlan_hdr);
-		ethertype = rte_be_to_cpu_16(vlan_hdr->eth_proto);
-	}
-
-	l3_hdr = (char *)eth_hdr + m->l2_len;
-
-	switch (ethertype) {
-	case RTE_ETHER_TYPE_IPV4:
-		ipv4_hdr = l3_hdr;
-		*l4_proto = ipv4_hdr->next_proto_id;
-		m->l3_len = rte_ipv4_hdr_len(ipv4_hdr);
-		*l4_hdr = (char *)l3_hdr + m->l3_len;
-		m->ol_flags |= PKT_TX_IPV4;
-		break;
-	case RTE_ETHER_TYPE_IPV6:
-		ipv6_hdr = l3_hdr;
-		*l4_proto = ipv6_hdr->proto;
-		m->l3_len = sizeof(struct rte_ipv6_hdr);
-		*l4_hdr = (char *)l3_hdr + m->l3_len;
-		m->ol_flags |= PKT_TX_IPV6;
-		break;
-	default:
-		m->l3_len = 0;
-		*l4_proto = 0;
-		*l4_hdr = NULL;
-		break;
-	}
-}
-
-static __rte_always_inline void
+static __rte_always_inline int
 vhost_dequeue_offload(struct virtio_net_hdr *hdr, struct rte_mbuf *m)
 {
-	uint16_t l4_proto = 0;
-	void *l4_hdr = NULL;
-	struct rte_tcp_hdr *tcp_hdr = NULL;
+	struct rte_net_hdr_lens hdr_lens;
+	uint32_t hdrlen, ptype;
+	int l4_supported = 0;
 
+	/* nothing to do */
 	if (hdr->flags == 0 && hdr->gso_type == VIRTIO_NET_HDR_GSO_NONE)
-		return;
-
-	parse_ethernet(m, &l4_proto, &l4_hdr);
-	if (hdr->flags == VIRTIO_NET_HDR_F_NEEDS_CSUM) {
-		if (hdr->csum_start == (m->l2_len + m->l3_len)) {
-			switch (hdr->csum_offset) {
-			case (offsetof(struct rte_tcp_hdr, cksum)):
-				if (l4_proto == IPPROTO_TCP)
-					m->ol_flags |= PKT_TX_TCP_CKSUM;
-				break;
-			case (offsetof(struct rte_udp_hdr, dgram_cksum)):
-				if (l4_proto == IPPROTO_UDP)
-					m->ol_flags |= PKT_TX_UDP_CKSUM;
-				break;
-			case (offsetof(struct rte_sctp_hdr, cksum)):
-				if (l4_proto == IPPROTO_SCTP)
-					m->ol_flags |= PKT_TX_SCTP_CKSUM;
-				break;
-			default:
-				break;
-			}
+		return 0;
+
+	m->ol_flags |= PKT_RX_IP_CKSUM_UNKNOWN;
+
+	ptype = rte_net_get_ptype(m, &hdr_lens, RTE_PTYPE_ALL_MASK);
+	m->packet_type = ptype;
+	if ((ptype & RTE_PTYPE_L4_MASK) == RTE_PTYPE_L4_TCP ||
+	    (ptype & RTE_PTYPE_L4_MASK) == RTE_PTYPE_L4_UDP ||
+	    (ptype & RTE_PTYPE_L4_MASK) == RTE_PTYPE_L4_SCTP)
+		l4_supported = 1;
+
+	if (hdr->flags & VIRTIO_NET_HDR_F_NEEDS_CSUM) {
+		hdrlen = hdr_lens.l2_len + hdr_lens.l3_len + hdr_lens.l4_len;
+		if (hdr->csum_start <= hdrlen && l4_supported) {
+			m->ol_flags |= PKT_RX_L4_CKSUM_NONE;
+		} else {
+			/* Unknown proto or tunnel, do sw cksum. We can assume
+			 * the cksum field is in the first segment since the
+			 * buffers we provided to the host are large enough.
+			 * In case of SCTP, this will be wrong since it's a CRC
+			 * but there's nothing we can do.
+			 */
+			uint16_t csum = 0, off;
+
+			if (rte_raw_cksum_mbuf(m, hdr->csum_start,
+					rte_pktmbuf_pkt_len(m) - hdr->csum_start, &csum) < 0)
+				return -EINVAL;
+			if (likely(csum != 0xffff))
+				csum = ~csum;
+			off = hdr->csum_offset + hdr->csum_start;
+			if (rte_pktmbuf_data_len(m) >= off + 1)
+				*rte_pktmbuf_mtod_offset(m, uint16_t *, off) = csum;
 		}
+	} else if (hdr->flags & VIRTIO_NET_HDR_F_DATA_VALID && l4_supported) {
+		m->ol_flags |= PKT_RX_L4_CKSUM_GOOD;
 	}
 
-	if (l4_hdr && hdr->gso_type != VIRTIO_NET_HDR_GSO_NONE) {
+	/* GSO request, save required information in mbuf */
+	if (hdr->gso_type != VIRTIO_NET_HDR_GSO_NONE) {
+		/* Check unsupported modes */
+		if (hdr->gso_size == 0)
+			return -EINVAL;
+
 		switch (hdr->gso_type & ~VIRTIO_NET_HDR_GSO_ECN) {
 		case VIRTIO_NET_HDR_GSO_TCPV4:
 		case VIRTIO_NET_HDR_GSO_TCPV6:
-			tcp_hdr = l4_hdr;
-			m->ol_flags |= PKT_TX_TCP_SEG;
-			m->tso_segsz = hdr->gso_size;
-			m->l4_len = (tcp_hdr->data_off & 0xf0) >> 2;
-			break;
 		case VIRTIO_NET_HDR_GSO_UDP:
-			m->ol_flags |= PKT_TX_UDP_SEG;
+			m->ol_flags |= PKT_RX_LRO | PKT_RX_L4_CKSUM_NONE;
+			/* Update mss lengths in mbuf */
 			m->tso_segsz = hdr->gso_size;
-			m->l4_len = sizeof(struct rte_udp_hdr);
 			break;
 		default:
 			VHOST_LOG_DATA(WARNING,
 				"unsupported gso type %u.\n", hdr->gso_type);
-			break;
+			return -EINVAL;
 		}
 	}
+
+	return 0;
 }
 
 static __rte_noinline void
@@ -2084,8 +2054,11 @@ copy_desc_to_mbuf(struct virtio_net *dev, struct vhost_virtqueue *vq,
 	prev->data_len = mbuf_offset;
 	m->pkt_len    += mbuf_offset;
 
-	if (hdr)
-		vhost_dequeue_offload(hdr, m);
+	if (hdr && vhost_dequeue_offload(hdr, m) < 0) {
+		VHOST_LOG_DATA(ERR, "Packet with invalid offloads.\n");
+		error = -1;
+		goto out;
+	}
 
 out:
 
-- 
2.23.0
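
The software-checksum fallback added in vhost_dequeue_offload() above can be sketched as a standalone routine. This is a minimal illustration, not the DPDK code: raw_cksum() stands in for rte_raw_cksum_mbuf() on a contiguous buffer, and all names here are local assumptions.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Standalone sketch: when the guest sets VIRTIO_NET_HDR_F_NEEDS_CSUM
 * for a protocol the host cannot recognize, the checksum is computed
 * in software over [csum_start, end) and stored at
 * csum_start + csum_offset. raw_cksum() mimics rte_raw_cksum_mbuf()
 * for a single contiguous segment.
 */
static uint16_t
raw_cksum(const uint8_t *buf, size_t len)
{
	uint32_t sum = 0;
	size_t i;

	for (i = 0; i + 1 < len; i += 2)
		sum += (uint32_t)buf[i] << 8 | buf[i + 1];
	if (len & 1) /* pad an odd trailing byte with zero */
		sum += (uint32_t)buf[len - 1] << 8;
	while (sum >> 16) /* fold carries into the low 16 bits */
		sum = (sum & 0xffff) + (sum >> 16);
	return (uint16_t)sum;
}

static void
sw_cksum_fixup(uint8_t *pkt, size_t len, size_t csum_start,
	       size_t csum_offset)
{
	uint16_t csum = raw_cksum(pkt + csum_start, len - csum_start);

	/* Same quirk as the patch: store 0xffff rather than 0x0000. */
	if (csum != 0xffff)
		csum = ~csum;
	pkt[csum_start + csum_offset] = csum >> 8;
	pkt[csum_start + csum_offset + 1] = csum & 0xff;
}
```

With the checksum field zeroed beforehand, summing the whole region after the fixup yields 0xffff, the usual ones'-complement verification value.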


^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: [dpdk-dev] [PATCH v2 1/4] mbuf: mark old offload flag as deprecated
  2021-04-29  8:04   ` [dpdk-dev] [PATCH v2 1/4] mbuf: mark old offload flag as deprecated David Marchand
@ 2021-04-29 12:14     ` Lance Richardson
  2021-04-29 16:45     ` Ajit Khaparde
  1 sibling, 0 replies; 63+ messages in thread
From: Lance Richardson @ 2021-04-29 12:14 UTC (permalink / raw)
  To: David Marchand
  Cc: dev, Maxime Coquelin, Olivier Matz, fbl, i.maximets,
	Ferruh Yigit, Andrew Rybchenko, Ajit Khaparde


On Thu, Apr 29, 2021 at 4:05 AM David Marchand
<david.marchand@redhat.com> wrote:
>
> PKT_RX_EIP_CKSUM_BAD has been declared deprecated but there was no
> warning to applications still using it.
> Fix this by marking as deprecated with the newly introduced
> RTE_DEPRECATED.
>
> Fixes: e8a419d6de4b ("mbuf: rename outer IP checksum macro")
>
> Signed-off-by: David Marchand <david.marchand@redhat.com>
> Reviewed-by: Flavio Leitner <fbl@sysclose.org>
> Acked-by: Olivier Matz <olivier.matz@6wind.com>
> ---
> Changes since v1:
> - updated commitlog following Olivier comment,
>
> ---
>  lib/mbuf/rte_mbuf_core.h | 3 ++-
>  1 file changed, 2 insertions(+), 1 deletion(-)
>
> diff --git a/lib/mbuf/rte_mbuf_core.h b/lib/mbuf/rte_mbuf_core.h
> index c17dc95c51..bb38d7f581 100644
> --- a/lib/mbuf/rte_mbuf_core.h
> +++ b/lib/mbuf/rte_mbuf_core.h
> @@ -83,7 +83,8 @@ extern "C" {
>   * Deprecated.
>   * This flag has been renamed, use PKT_RX_OUTER_IP_CKSUM_BAD instead.
>   */
> -#define PKT_RX_EIP_CKSUM_BAD PKT_RX_OUTER_IP_CKSUM_BAD
> +#define PKT_RX_EIP_CKSUM_BAD \
> +       RTE_DEPRECATED(PKT_RX_EIP_CKSUM_BAD) PKT_RX_OUTER_IP_CKSUM_BAD
>
>  /**
>   * A vlan has been stripped by the hardware and its tci is saved in
> --
> 2.23.0
>
Acked-by: Lance Richardson <lance.richardson@broadcom.com>


* Re: [dpdk-dev] [PATCH v2 3/4] net/virtio: refactor Tx offload helper
  2021-04-29  8:04   ` [dpdk-dev] [PATCH v2 3/4] net/virtio: refactor Tx offload helper David Marchand
@ 2021-04-29 12:59     ` Maxime Coquelin
  0 siblings, 0 replies; 63+ messages in thread
From: Maxime Coquelin @ 2021-04-29 12:59 UTC (permalink / raw)
  To: David Marchand, dev
  Cc: olivier.matz, fbl, i.maximets, Ruifeng Wang, Chenbo Xia,
	Bruce Richardson, Konstantin Ananyev, Jerin Jacob



On 4/29/21 10:04 AM, David Marchand wrote:
> Purely cosmetic but it is rather odd to have an "offload" helper that
> checks if it actually must do something.
> We already have the same checks in most callers, so move this branch
> in them.
> 
> Signed-off-by: David Marchand <david.marchand@redhat.com>
> Reviewed-by: Flavio Leitner <fbl@sysclose.org>
> Reviewed-by: Ruifeng Wang <ruifeng.wang@arm.com>
> ---
>  drivers/net/virtio/virtio_rxtx.c             |  7 +-
>  drivers/net/virtio/virtio_rxtx_packed_avx.h  |  2 +-
>  drivers/net/virtio/virtio_rxtx_packed_neon.h |  2 +-
>  drivers/net/virtio/virtqueue.h               | 83 +++++++++-----------
>  4 files changed, 44 insertions(+), 50 deletions(-)
> 

Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>

Thanks,
Maxime



* Re: [dpdk-dev] [PATCH v2 4/4] vhost: fix offload flags in Rx path
  2021-04-29  8:04   ` [dpdk-dev] [PATCH v2 4/4] vhost: fix offload flags in Rx path David Marchand
@ 2021-04-29 13:30     ` Maxime Coquelin
  2021-04-29 13:31       ` Maxime Coquelin
  2021-04-29 20:09       ` David Marchand
  2021-04-29 18:39     ` Flavio Leitner
  1 sibling, 2 replies; 63+ messages in thread
From: Maxime Coquelin @ 2021-04-29 13:30 UTC (permalink / raw)
  To: David Marchand, dev
  Cc: olivier.matz, fbl, i.maximets, Chenbo Xia, Jijiang Liu, Stokes, Ian



On 4/29/21 10:04 AM, David Marchand wrote:
> The vhost library current configures Tx offloading (PKT_TX_*) on any

s/current/currently/

> packet received from a guest virtio device which asks for some offloading.
> 
> This is problematic, as Tx offloading is something that the application
> must ask for: the application needs to configure devices
> to support every used offloads (ip, tcp checksumming, tso..), and the
> various l2/l3/l4 lengths must be set following any processing that
> happened in the application itself.
> 
> On the other hand, the received packets are not marked wrt current
> packet l3/l4 checksumming info.
> 
> Copy virtio rx processing to fix those offload flags but accepting
> VIRTIO_NET_HDR_GSO_ECN and VIRTIO_NET_HDR_GSO_UDP too.
> 
> The vhost example has been updated accordingly: TSO is applied to any
> packet marked LRO.
> 
> Fixes: 859b480d5afd ("vhost: add guest offload setting")
> 
> Signed-off-by: David Marchand <david.marchand@redhat.com>
> ---
> Changes since v1:
> - updated vhost example,
> - restored VIRTIO_NET_HDR_GSO_ECN and VIRTIO_NET_HDR_GSO_UDP support,
> - restored log on buggy offload request,
> 
> ---
>  examples/vhost/main.c  |  42 +++++++------
>  lib/vhost/virtio_net.c | 139 +++++++++++++++++------------------------
>  2 files changed, 78 insertions(+), 103 deletions(-)

Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>



* Re: [dpdk-dev] [PATCH v2 4/4] vhost: fix offload flags in Rx path
  2021-04-29 13:30     ` Maxime Coquelin
@ 2021-04-29 13:31       ` Maxime Coquelin
  2021-04-29 20:21         ` David Marchand
  2021-04-29 20:09       ` David Marchand
  1 sibling, 1 reply; 63+ messages in thread
From: Maxime Coquelin @ 2021-04-29 13:31 UTC (permalink / raw)
  To: David Marchand, dev
  Cc: olivier.matz, fbl, i.maximets, Chenbo Xia, Jijiang Liu, Stokes, Ian



On 4/29/21 3:30 PM, Maxime Coquelin wrote:
> 
> 
> On 4/29/21 10:04 AM, David Marchand wrote:
>> The vhost library current configures Tx offloading (PKT_TX_*) on any
> 
> s/current/currently/
> 
>> packet received from a guest virtio device which asks for some offloading.
>>
>> This is problematic, as Tx offloading is something that the application
>> must ask for: the application needs to configure devices
>> to support every used offloads (ip, tcp checksumming, tso..), and the
>> various l2/l3/l4 lengths must be set following any processing that
>> happened in the application itself.
>>
>> On the other hand, the received packets are not marked wrt current
>> packet l3/l4 checksumming info.
>>
>> Copy virtio rx processing to fix those offload flags but accepting
>> VIRTIO_NET_HDR_GSO_ECN and VIRTIO_NET_HDR_GSO_UDP too.
>>
>> The vhost example has been updated accordingly: TSO is applied to any
>> packet marked LRO.
>>
>> Fixes: 859b480d5afd ("vhost: add guest offload setting")
>>
>> Signed-off-by: David Marchand <david.marchand@redhat.com>
>> ---
>> Changes since v1:
>> - updated vhost example,
>> - restored VIRTIO_NET_HDR_GSO_ECN and VIRTIO_NET_HDR_GSO_UDP support,
>> - restored log on buggy offload request,
>>
>> ---
>>  examples/vhost/main.c  |  42 +++++++------
>>  lib/vhost/virtio_net.c | 139 +++++++++++++++++------------------------
>>  2 files changed, 78 insertions(+), 103 deletions(-)
> 
> Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
> 

As I understand it, this change kind of breaks the ABI, but it is
actually fixing a misuse of the mbuf API, so I think we should
take this patch.



* Re: [dpdk-dev] [PATCH v2 2/4] net/virtio: do not touch Tx offload flags
  2021-04-29  8:04   ` [dpdk-dev] [PATCH v2 2/4] net/virtio: do not touch Tx offload flags David Marchand
@ 2021-04-29 13:51     ` Flavio Leitner
  0 siblings, 0 replies; 63+ messages in thread
From: Flavio Leitner @ 2021-04-29 13:51 UTC (permalink / raw)
  To: David Marchand; +Cc: dev, maxime.coquelin, olivier.matz, i.maximets, Chenbo Xia

On Thu, Apr 29, 2021 at 10:04:36AM +0200, David Marchand wrote:
> Tx offload flags are of the application responsibility.
> Leave the mbuf alone and use a local storage for implicit tcp checksum
> offloading in case of TSO.
> 
> Signed-off-by: David Marchand <david.marchand@redhat.com>
> Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
> ---

Acked-by: Flavio Leitner <fbl@sysclose.org>



* Re: [dpdk-dev] [PATCH v2 1/4] mbuf: mark old offload flag as deprecated
  2021-04-29  8:04   ` [dpdk-dev] [PATCH v2 1/4] mbuf: mark old offload flag as deprecated David Marchand
  2021-04-29 12:14     ` Lance Richardson
@ 2021-04-29 16:45     ` Ajit Khaparde
  1 sibling, 0 replies; 63+ messages in thread
From: Ajit Khaparde @ 2021-04-29 16:45 UTC (permalink / raw)
  To: David Marchand
  Cc: dpdk-dev, Maxime Coquelin, Olivier Matz, fbl, i.maximets,
	Ferruh Yigit, Lance Richardson, Andrew Rybchenko


On Thu, Apr 29, 2021 at 1:05 AM David Marchand
<david.marchand@redhat.com> wrote:
>
> PKT_RX_EIP_CKSUM_BAD has been declared deprecated but there was no
> warning to applications still using it.
> Fix this by marking as deprecated with the newly introduced
> RTE_DEPRECATED.
>
> Fixes: e8a419d6de4b ("mbuf: rename outer IP checksum macro")
>
> Signed-off-by: David Marchand <david.marchand@redhat.com>
> Reviewed-by: Flavio Leitner <fbl@sysclose.org>
> Acked-by: Olivier Matz <olivier.matz@6wind.com>
Acked-by: Ajit Khaparde <ajit.khaparde@broadcom.com>

> ---
> Changes since v1:
> - updated commitlog following Olivier comment,
>
> ---
>  lib/mbuf/rte_mbuf_core.h | 3 ++-
>  1 file changed, 2 insertions(+), 1 deletion(-)
>
> diff --git a/lib/mbuf/rte_mbuf_core.h b/lib/mbuf/rte_mbuf_core.h
> index c17dc95c51..bb38d7f581 100644
> --- a/lib/mbuf/rte_mbuf_core.h
> +++ b/lib/mbuf/rte_mbuf_core.h
> @@ -83,7 +83,8 @@ extern "C" {
>   * Deprecated.
>   * This flag has been renamed, use PKT_RX_OUTER_IP_CKSUM_BAD instead.
>   */
> -#define PKT_RX_EIP_CKSUM_BAD PKT_RX_OUTER_IP_CKSUM_BAD
> +#define PKT_RX_EIP_CKSUM_BAD \
> +       RTE_DEPRECATED(PKT_RX_EIP_CKSUM_BAD) PKT_RX_OUTER_IP_CKSUM_BAD
>
>  /**
>   * A vlan has been stripped by the hardware and its tci is saved in
> --
> 2.23.0
>


* Re: [dpdk-dev] [PATCH v2 4/4] vhost: fix offload flags in Rx path
  2021-04-29  8:04   ` [dpdk-dev] [PATCH v2 4/4] vhost: fix offload flags in Rx path David Marchand
  2021-04-29 13:30     ` Maxime Coquelin
@ 2021-04-29 18:39     ` Flavio Leitner
  2021-04-29 19:18       ` David Marchand
  1 sibling, 1 reply; 63+ messages in thread
From: Flavio Leitner @ 2021-04-29 18:39 UTC (permalink / raw)
  To: David Marchand
  Cc: dev, maxime.coquelin, olivier.matz, i.maximets, Chenbo Xia,
	Jijiang Liu, Yuanhan Liu

On Thu, Apr 29, 2021 at 10:04:38AM +0200, David Marchand wrote:
> The vhost library current configures Tx offloading (PKT_TX_*) on any
> packet received from a guest virtio device which asks for some offloading.
> 
> This is problematic, as Tx offloading is something that the application
> must ask for: the application needs to configure devices
> to support every used offloads (ip, tcp checksumming, tso..), and the
> various l2/l3/l4 lengths must be set following any processing that
> happened in the application itself.
> 
> On the other hand, the received packets are not marked wrt current
> packet l3/l4 checksumming info.
> 
> Copy virtio rx processing to fix those offload flags but accepting
> VIRTIO_NET_HDR_GSO_ECN and VIRTIO_NET_HDR_GSO_UDP too.
> 
> The vhost example has been updated accordingly: TSO is applied to any
> packet marked LRO.
> 
> Fixes: 859b480d5afd ("vhost: add guest offload setting")
> 
> Signed-off-by: David Marchand <david.marchand@redhat.com>
> ---
> Changes since v1:
> - updated vhost example,
> - restored VIRTIO_NET_HDR_GSO_ECN and VIRTIO_NET_HDR_GSO_UDP support,
> - restored log on buggy offload request,
> 
> ---
>  examples/vhost/main.c  |  42 +++++++------
>  lib/vhost/virtio_net.c | 139 +++++++++++++++++------------------------
>  2 files changed, 78 insertions(+), 103 deletions(-)
> 
[...]

> -	if (l4_hdr && hdr->gso_type != VIRTIO_NET_HDR_GSO_NONE) {
> +	/* GSO request, save required information in mbuf */
> +	if (hdr->gso_type != VIRTIO_NET_HDR_GSO_NONE) {
> +		/* Check unsupported modes */
> +		if (hdr->gso_size == 0)
> +			return -EINVAL;
> +
>  		switch (hdr->gso_type & ~VIRTIO_NET_HDR_GSO_ECN) {
>  		case VIRTIO_NET_HDR_GSO_TCPV4:
>  		case VIRTIO_NET_HDR_GSO_TCPV6:
> -			tcp_hdr = l4_hdr;
> -			m->ol_flags |= PKT_TX_TCP_SEG;
> -			m->tso_segsz = hdr->gso_size;
> -			m->l4_len = (tcp_hdr->data_off & 0xf0) >> 2;
> -			break;
>  		case VIRTIO_NET_HDR_GSO_UDP:
> -			m->ol_flags |= PKT_TX_UDP_SEG;
> +			m->ol_flags |= PKT_RX_LRO | PKT_RX_L4_CKSUM_NONE;

My understanding of the virtio 1.1 spec is that GSO can be
used independently of CSUM. There is nothing preventing
sending a fully checksummed TSO packet.

Anyways, that's unusual and not the goal of this patch.

Acked-by: Flavio Leitner <fbl@sysclose.org>

fbl


> +			/* Update mss lengths in mbuf */
>  			m->tso_segsz = hdr->gso_size;
> -			m->l4_len = sizeof(struct rte_udp_hdr);
>  			break;
>  		default:
>  			VHOST_LOG_DATA(WARNING,
>  				"unsupported gso type %u.\n", hdr->gso_type);
> -			break;
> +			return -EINVAL;
>  		}
>  	}
> +
> +	return 0;
>  }
>  
>  static __rte_noinline void
> @@ -2084,8 +2054,11 @@ copy_desc_to_mbuf(struct virtio_net *dev, struct vhost_virtqueue *vq,
>  	prev->data_len = mbuf_offset;
>  	m->pkt_len    += mbuf_offset;
>  
> -	if (hdr)
> -		vhost_dequeue_offload(hdr, m);
> +	if (hdr && vhost_dequeue_offload(hdr, m) < 0) {
> +		VHOST_LOG_DATA(ERR, "Packet with invalid offloads.\n");
> +		error = -1;
> +		goto out;
> +	}
>  
>  out:
>  
> -- 
> 2.23.0
> 

-- 
fbl
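
For reference when reading the csum_start/csum_offset handling in this thread, the virtio-net header that carries these offload requests can be sketched locally. The struct name below is local (the spec calls it struct virtio_net_hdr); the field comments paraphrase the virtio 1.1 specification, and the 16-bit fields are little-endian on the wire.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Local sketch of the virtio-net header (struct virtio_net_hdr in the
 * virtio 1.1 spec). With VIRTIO_NET_HDR_F_NEEDS_CSUM, the device
 * checksums [csum_start, end) and stores the result at
 * csum_start + csum_offset.
 */
struct vnet_hdr {
	uint8_t flags;        /* F_NEEDS_CSUM, F_DATA_VALID */
	uint8_t gso_type;     /* GSO_NONE/TCPV4/UDP/TCPV6 (+ ECN bit) */
	uint16_t hdr_len;     /* header length: ethernet + ip + l4 */
	uint16_t gso_size;    /* MSS for GSO packets */
	uint16_t csum_start;  /* start of the region to checksum */
	uint16_t csum_offset; /* checksum location, relative to start */
};
```

For a plain TCP/IPv4 packet with a 14-byte Ethernet header and a 20-byte IP header, csum_start would be 34 and csum_offset 16 (the offset of the checksum field in the TCP header), which is what the `csum_start == m->l2_len + m->l3_len` comparison in the old code checked against.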


* Re: [dpdk-dev] [PATCH v2 4/4] vhost: fix offload flags in Rx path
  2021-04-29 18:39     ` Flavio Leitner
@ 2021-04-29 19:18       ` David Marchand
  0 siblings, 0 replies; 63+ messages in thread
From: David Marchand @ 2021-04-29 19:18 UTC (permalink / raw)
  To: Flavio Leitner, Olivier Matz
  Cc: dev, Maxime Coquelin, Ilya Maximets, Chenbo Xia

On Thu, Apr 29, 2021 at 8:39 PM Flavio Leitner <fbl@sysclose.org> wrote:
> > -     if (l4_hdr && hdr->gso_type != VIRTIO_NET_HDR_GSO_NONE) {
> > +     /* GSO request, save required information in mbuf */
> > +     if (hdr->gso_type != VIRTIO_NET_HDR_GSO_NONE) {
> > +             /* Check unsupported modes */
> > +             if (hdr->gso_size == 0)
> > +                     return -EINVAL;
> > +
> >               switch (hdr->gso_type & ~VIRTIO_NET_HDR_GSO_ECN) {
> >               case VIRTIO_NET_HDR_GSO_TCPV4:
> >               case VIRTIO_NET_HDR_GSO_TCPV6:
> > -                     tcp_hdr = l4_hdr;
> > -                     m->ol_flags |= PKT_TX_TCP_SEG;
> > -                     m->tso_segsz = hdr->gso_size;
> > -                     m->l4_len = (tcp_hdr->data_off & 0xf0) >> 2;
> > -                     break;
> >               case VIRTIO_NET_HDR_GSO_UDP:
> > -                     m->ol_flags |= PKT_TX_UDP_SEG;
> > +                     m->ol_flags |= PKT_RX_LRO | PKT_RX_L4_CKSUM_NONE;
>
> My understanding of the virtio 1.1 spec is that GSO can be
> used independently of CSUM. There is nothing preventing
> sending a fully checksummed TSO packet.

This forces a superfluous cksum in such a situation.
It can be fixed later if needed.

The virtio pmd rx side has the same behavior.


> Anyways, that's unusual and not the goal of this patch.
>
> Acked-by: Flavio Leitner <fbl@sysclose.org>

Thanks!


-- 
David Marchand



* Re: [dpdk-dev] [PATCH v2 4/4] vhost: fix offload flags in Rx path
  2021-04-29 13:30     ` Maxime Coquelin
  2021-04-29 13:31       ` Maxime Coquelin
@ 2021-04-29 20:09       ` David Marchand
  1 sibling, 0 replies; 63+ messages in thread
From: David Marchand @ 2021-04-29 20:09 UTC (permalink / raw)
  To: Maxime Coquelin
  Cc: dev, Olivier Matz, Flavio Leitner, Ilya Maximets, Chenbo Xia,
	Jijiang Liu, Stokes, Ian

On Thu, Apr 29, 2021 at 3:30 PM Maxime Coquelin
<maxime.coquelin@redhat.com> wrote:
> On 4/29/21 10:04 AM, David Marchand wrote:
> > The vhost library current configures Tx offloading (PKT_TX_*) on any
>
> s/current/currently/

Ok.

>
> > packet received from a guest virtio device which asks for some offloading.
> >
> > This is problematic, as Tx offloading is something that the application
> > must ask for: the application needs to configure devices
> > to support every used offloads (ip, tcp checksumming, tso..), and the
> > various l2/l3/l4 lengths must be set following any processing that
> > happened in the application itself.
> >
> > On the other hand, the received packets are not marked wrt current
> > packet l3/l4 checksumming info.
> >
> > Copy virtio rx processing to fix those offload flags but accepting
> > VIRTIO_NET_HDR_GSO_ECN and VIRTIO_NET_HDR_GSO_UDP too.
> >
> > The vhost example has been updated accordingly: TSO is applied to any
> > packet marked LRO.
> >
> > Fixes: 859b480d5afd ("vhost: add guest offload setting")
> >
> > Signed-off-by: David Marchand <david.marchand@redhat.com>
> > ---
> > Changes since v1:
> > - updated vhost example,
> > - restored VIRTIO_NET_HDR_GSO_ECN and VIRTIO_NET_HDR_GSO_UDP support,
> > - restored log on buggy offload request,
> >
> > ---
> >  examples/vhost/main.c  |  42 +++++++------
> >  lib/vhost/virtio_net.c | 139 +++++++++++++++++------------------------
> >  2 files changed, 78 insertions(+), 103 deletions(-)


A release note update is missing.


-- 
David Marchand



* Re: [dpdk-dev] [PATCH v2 4/4] vhost: fix offload flags in Rx path
  2021-04-29 13:31       ` Maxime Coquelin
@ 2021-04-29 20:21         ` David Marchand
  2021-04-30  8:38           ` Maxime Coquelin
  0 siblings, 1 reply; 63+ messages in thread
From: David Marchand @ 2021-04-29 20:21 UTC (permalink / raw)
  To: Maxime Coquelin
  Cc: dev, Olivier Matz, Flavio Leitner, Ilya Maximets, Chenbo Xia,
	Stokes, Ian

On Thu, Apr 29, 2021 at 3:31 PM Maxime Coquelin
<maxime.coquelin@redhat.com> wrote:
> On 4/29/21 3:30 PM, Maxime Coquelin wrote:
> >> The vhost library current configures Tx offloading (PKT_TX_*) on any
> >> packet received from a guest virtio device which asks for some offloading.
> >>
> >> This is problematic, as Tx offloading is something that the application
> >> must ask for: the application needs to configure devices
> >> to support every used offloads (ip, tcp checksumming, tso..), and the
> >> various l2/l3/l4 lengths must be set following any processing that
> >> happened in the application itself.
> >>
> >> On the other hand, the received packets are not marked wrt current
> >> packet l3/l4 checksumming info.
> >>
> >> Copy virtio rx processing to fix those offload flags but accepting
> >> VIRTIO_NET_HDR_GSO_ECN and VIRTIO_NET_HDR_GSO_UDP too.
> >>
> >> The vhost example has been updated accordingly: TSO is applied to any
> >> packet marked LRO.
> >>
> >> Fixes: 859b480d5afd ("vhost: add guest offload setting")
>
> As I understand it, this change kind of breaks the ABI, but it is
> actually fixing a misuse of the mbuf API, so I think we should
> take this patch.

Indeed, this breaks the v21 ABI.

But the only usecase I can think of is an application using TSO /
checksum offloads *only* for traffic coming from vhost.
I say *only* for traffic coming from vhost, because to have this
application do TSO / checksum offloading for traffic coming from a
physical port, it would comply with the mbuf API and set the PKT_TX_*
flags.

Apart from the example/vhost, I am not sure there is such an
application that only does v2v or v2p but _not_ p2v TSO / checksum
offloading.
(Note: I am unable to use this example... it seems unhappy with the
mlx5 port I use => FPE because this driver does not support vmdq o_O)


I see three options:
- fix the vhost library and break the ABI that only works in an
example (this current patch),
- maintain the v21 ABI
  * using symbol versioning, this adds no branch, recompiled
application use the new ABI, this can't be backported to 20.11,
  * keeping the current behavior by default, but introducing a new
flag that an application would pass to rte_vhost_driver_register().
This new flag triggers this current patch behavior but it would add an
additional branch per packets bulk in vhost dequeue path. This *could*
be backported to 20.11.


-- 
David Marchand



* Re: [dpdk-dev] [PATCH v2 4/4] vhost: fix offload flags in Rx path
  2021-04-29 20:21         ` David Marchand
@ 2021-04-30  8:38           ` Maxime Coquelin
  0 siblings, 0 replies; 63+ messages in thread
From: Maxime Coquelin @ 2021-04-30  8:38 UTC (permalink / raw)
  To: David Marchand
  Cc: dev, Olivier Matz, Flavio Leitner, Ilya Maximets, Chenbo Xia,
	Stokes, Ian

Hi David,

On 4/29/21 10:21 PM, David Marchand wrote:
> On Thu, Apr 29, 2021 at 3:31 PM Maxime Coquelin
> <maxime.coquelin@redhat.com> wrote:
>> On 4/29/21 3:30 PM, Maxime Coquelin wrote:
>>>> The vhost library current configures Tx offloading (PKT_TX_*) on any
>>>> packet received from a guest virtio device which asks for some offloading.
>>>>
>>>> This is problematic, as Tx offloading is something that the application
>>>> must ask for: the application needs to configure devices
>>>> to support every used offloads (ip, tcp checksumming, tso..), and the
>>>> various l2/l3/l4 lengths must be set following any processing that
>>>> happened in the application itself.
>>>>
>>>> On the other hand, the received packets are not marked wrt current
>>>> packet l3/l4 checksumming info.
>>>>
>>>> Copy virtio rx processing to fix those offload flags but accepting
>>>> VIRTIO_NET_HDR_GSO_ECN and VIRTIO_NET_HDR_GSO_UDP too.
>>>>
>>>> The vhost example has been updated accordingly: TSO is applied to any
>>>> packet marked LRO.
>>>>
>>>> Fixes: 859b480d5afd ("vhost: add guest offload setting")
>>
>> As I understand it, this change kind of breaks the ABI, but it is
>> actually fixing a misuse of the mbuf API, so I think we should
>> take this patch.
> 
> Indeed, this breaks the v21 ABI.
> 
> But the only usecase I can think of is an application using TSO /
> checksum offloads *only* for traffic coming from vhost.
> I say *only* for traffic coming from vhost, because to have this
> application do TSO / checksum offloading for traffic coming from a
> physical port, it would comply with the mbuf API and set the PKT_TX_*
> flags.
> 
> Apart from the example/vhost, I am not sure there is such an
> application that only does v2v or v2p but _not_ p2v TSO / checksum
> offloading.
> (Note: I am unable to use this example... it seems unhappy with the
> mlx5 port I use => FPE because this driver does not support vmdq o_O)
> 
> 
> I see three options:
> - fix the vhost library and break the ABI that only works in an
> example (this current patch),
> - maintain the v21 ABI
>   * using symbol versioning, this adds no branch, recompiled
> application use the new ABI, this can't be backported to 20.11,
>   * keeping the current behavior by default, but introducing a new
> flag that an application would pass to rte_vhost_driver_register().
> This new flag triggers this current patch behavior but it would add an
> additional branch per packet burst in the vhost dequeue path. This *could*
> be backported to 20.11.

The flag option seems best, as it will not break the ABI: applications
we don't know about that use vhost offloads won't be impacted and can
adopt the new behaviour smoothly.

The hardest part with this solution is to find a proper name for that
flag...

Thanks,
Maxime
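
The opt-in mechanism discussed in this subthread can be sketched as follows. Everything in this snippet is hypothetical: the flag name, its bit value, and the stubbed registration call only illustrate the idea (the real entry point is rte_vhost_driver_register(), and the final flag name was still open at this point).

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical flag: opt in to mbuf-API-compliant Rx offload flags.
 * Name and bit value are illustrative, not a DPDK definition. */
#define VHOST_USER_NET_COMPLIANT_OL_FLAGS (1ULL << 7)

static uint64_t socket_flags; /* stand-in for per-socket state */

/* Stub mimicking a registration call taking per-socket flags. */
static int
driver_register_stub(const char *path, uint64_t flags)
{
	(void)path;
	socket_flags = flags;
	return 0;
}

/* One branch per dequeued burst selects legacy vs. fixed behavior. */
static int
use_compliant_ol_flags(void)
{
	return (socket_flags & VHOST_USER_NET_COMPLIANT_OL_FLAGS) != 0;
}
```

An application opting in would pass the flag at registration time, leaving existing applications on the legacy behavior by default and keeping the ABI intact.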



* [dpdk-dev] [PATCH v3 0/4] Offload flags fixes
  2021-04-01  9:52 [dpdk-dev] [PATCH 0/5] Offload flags fixes David Marchand
                   ` (5 preceding siblings ...)
  2021-04-29  8:04 ` [dpdk-dev] [PATCH v2 0/4] Offload flags fixes David Marchand
@ 2021-05-03 13:26 ` David Marchand
  2021-05-03 13:26   ` [dpdk-dev] [PATCH v3 1/4] mbuf: mark old offload flag as deprecated David Marchand
                     ` (4 more replies)
  2021-05-03 16:43 ` [dpdk-dev] [PATCH v4 0/3] " David Marchand
  7 siblings, 5 replies; 63+ messages in thread
From: David Marchand @ 2021-05-03 13:26 UTC (permalink / raw)
  To: dev
  Cc: maxime.coquelin, olivier.matz, fbl, i.maximets, chenbo.xia, ian.stokes

The important part is the last patch on vhost handling of offloading
requests coming from a virtio guest interface.

The rest are small fixes that I accumulated while reviewing the mbuf
offload flags.

On this last patch, it has the potential of breaking existing
applications using the vhost library (OVS being impacted).
I did not mark it for backport.

Changes since v2:
- kept behavior untouched (to avoid breaking ABI) and introduced a new
  flag to select the new behavior,

Changes since v1:
- dropped patch on net/tap,
- added missing bits in example/vhost,
- relaxed checks on VIRTIO_NET_HDR_GSO_ECN and VIRTIO_NET_HDR_GSO_UDP,

-- 
David Marchand

David Marchand (4):
  mbuf: mark old offload flag as deprecated
  net/virtio: do not touch Tx offload flags
  net/virtio: refactor Tx offload helper
  vhost: fix offload flags in Rx path

 doc/guides/prog_guide/vhost_lib.rst          |  12 ++
 doc/guides/rel_notes/release_21_05.rst       |   6 +
 drivers/net/vhost/rte_eth_vhost.c            |   2 +-
 drivers/net/virtio/virtio_rxtx.c             |   7 +-
 drivers/net/virtio/virtio_rxtx_packed_avx.h  |   2 +-
 drivers/net/virtio/virtio_rxtx_packed_neon.h |   2 +-
 drivers/net/virtio/virtqueue.h               |  81 ++++----
 examples/vhost/main.c                        |  44 ++---
 lib/mbuf/rte_mbuf_core.h                     |   3 +-
 lib/vhost/rte_vhost.h                        |   1 +
 lib/vhost/socket.c                           |   5 +-
 lib/vhost/vhost.c                            |   6 +-
 lib/vhost/vhost.h                            |  14 +-
 lib/vhost/virtio_net.c                       | 185 ++++++++++++++++---
 14 files changed, 268 insertions(+), 102 deletions(-)

-- 
2.23.0



* [dpdk-dev] [PATCH v3 1/4] mbuf: mark old offload flag as deprecated
  2021-05-03 13:26 ` [dpdk-dev] [PATCH v3 0/4] Offload flags fixes David Marchand
@ 2021-05-03 13:26   ` David Marchand
  2021-05-03 14:02     ` Maxime Coquelin
  2021-05-03 14:12     ` David Marchand
  2021-05-03 13:26   ` [dpdk-dev] [PATCH v3 2/4] net/virtio: do not touch Tx offload flags David Marchand
                     ` (3 subsequent siblings)
  4 siblings, 2 replies; 63+ messages in thread
From: David Marchand @ 2021-05-03 13:26 UTC (permalink / raw)
  To: dev
  Cc: maxime.coquelin, olivier.matz, fbl, i.maximets, chenbo.xia,
	ian.stokes, Lance Richardson, Ajit Khaparde, Ferruh Yigit,
	Andrew Rybchenko

PKT_RX_EIP_CKSUM_BAD has been declared deprecated but there was no
warning to applications still using it.
Fix this by marking it as deprecated with the newly introduced
RTE_DEPRECATED macro.

Fixes: e8a419d6de4b ("mbuf: rename outer IP checksum macro")

Signed-off-by: David Marchand <david.marchand@redhat.com>
Reviewed-by: Flavio Leitner <fbl@sysclose.org>
Acked-by: Olivier Matz <olivier.matz@6wind.com>
Acked-by: Lance Richardson <lance.richardson@broadcom.com>
Acked-by: Ajit Khaparde <ajit.khaparde@broadcom.com>
---
Changes since v1:
- updated commitlog following Olivier comment,

---
 lib/mbuf/rte_mbuf_core.h | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/lib/mbuf/rte_mbuf_core.h b/lib/mbuf/rte_mbuf_core.h
index c17dc95c51..bb38d7f581 100644
--- a/lib/mbuf/rte_mbuf_core.h
+++ b/lib/mbuf/rte_mbuf_core.h
@@ -83,7 +83,8 @@ extern "C" {
  * Deprecated.
  * This flag has been renamed, use PKT_RX_OUTER_IP_CKSUM_BAD instead.
  */
-#define PKT_RX_EIP_CKSUM_BAD PKT_RX_OUTER_IP_CKSUM_BAD
+#define PKT_RX_EIP_CKSUM_BAD \
+	RTE_DEPRECATED(PKT_RX_EIP_CKSUM_BAD) PKT_RX_OUTER_IP_CKSUM_BAD
 
 /**
  * A vlan has been stripped by the hardware and its tci is saved in
-- 
2.23.0



* [dpdk-dev] [PATCH v3 2/4] net/virtio: do not touch Tx offload flags
  2021-05-03 13:26 ` [dpdk-dev] [PATCH v3 0/4] Offload flags fixes David Marchand
  2021-05-03 13:26   ` [dpdk-dev] [PATCH v3 1/4] mbuf: mark old offload flag as deprecated David Marchand
@ 2021-05-03 13:26   ` David Marchand
  2021-05-03 13:26   ` [dpdk-dev] [PATCH v3 3/4] net/virtio: refactor Tx offload helper David Marchand
                     ` (2 subsequent siblings)
  4 siblings, 0 replies; 63+ messages in thread
From: David Marchand @ 2021-05-03 13:26 UTC (permalink / raw)
  To: dev
  Cc: maxime.coquelin, olivier.matz, fbl, i.maximets, chenbo.xia, ian.stokes

Tx offload flags are the application's responsibility.
Leave the mbuf alone and use local storage for the implicit TCP
checksum offload request in the TSO case.

Signed-off-by: David Marchand <david.marchand@redhat.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Acked-by: Flavio Leitner <fbl@sysclose.org>
---
 drivers/net/virtio/virtqueue.h | 6 ++++--
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/drivers/net/virtio/virtqueue.h b/drivers/net/virtio/virtqueue.h
index e9992b745d..ed3b85080e 100644
--- a/drivers/net/virtio/virtqueue.h
+++ b/drivers/net/virtio/virtqueue.h
@@ -622,10 +622,12 @@ virtqueue_xmit_offload(struct virtio_net_hdr *hdr,
 			uint8_t offload)
 {
 	if (offload) {
+		uint64_t csum_l4 = cookie->ol_flags & PKT_TX_L4_MASK;
+
 		if (cookie->ol_flags & PKT_TX_TCP_SEG)
-			cookie->ol_flags |= PKT_TX_TCP_CKSUM;
+			csum_l4 |= PKT_TX_TCP_CKSUM;
 
-		switch (cookie->ol_flags & PKT_TX_L4_MASK) {
+		switch (csum_l4) {
 		case PKT_TX_UDP_CKSUM:
 			hdr->csum_start = cookie->l2_len + cookie->l3_len;
 			hdr->csum_offset = offsetof(struct rte_udp_hdr,
-- 
2.23.0



* [dpdk-dev] [PATCH v3 3/4] net/virtio: refactor Tx offload helper
  2021-05-03 13:26 ` [dpdk-dev] [PATCH v3 0/4] Offload flags fixes David Marchand
  2021-05-03 13:26   ` [dpdk-dev] [PATCH v3 1/4] mbuf: mark old offload flag as deprecated David Marchand
  2021-05-03 13:26   ` [dpdk-dev] [PATCH v3 2/4] net/virtio: do not touch Tx offload flags David Marchand
@ 2021-05-03 13:26   ` David Marchand
  2021-05-03 13:26   ` [dpdk-dev] [PATCH v3 4/4] vhost: fix offload flags in Rx path David Marchand
  2021-05-03 15:24   ` [dpdk-dev] [PATCH v3 0/4] Offload flags fixes Maxime Coquelin
  4 siblings, 0 replies; 63+ messages in thread
From: David Marchand @ 2021-05-03 13:26 UTC (permalink / raw)
  To: dev
  Cc: maxime.coquelin, olivier.matz, fbl, i.maximets, chenbo.xia,
	ian.stokes, Ruifeng Wang, Bruce Richardson, Konstantin Ananyev,
	Jerin Jacob

Purely cosmetic but it is rather odd to have an "offload" helper that
checks if it actually must do something.
We already have the same checks in most callers, so move this branch
into them.

Signed-off-by: David Marchand <david.marchand@redhat.com>
Reviewed-by: Flavio Leitner <fbl@sysclose.org>
Reviewed-by: Ruifeng Wang <ruifeng.wang@arm.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
---
 drivers/net/virtio/virtio_rxtx.c             |  7 +-
 drivers/net/virtio/virtio_rxtx_packed_avx.h  |  2 +-
 drivers/net/virtio/virtio_rxtx_packed_neon.h |  2 +-
 drivers/net/virtio/virtqueue.h               | 83 +++++++++-----------
 4 files changed, 44 insertions(+), 50 deletions(-)

diff --git a/drivers/net/virtio/virtio_rxtx.c b/drivers/net/virtio/virtio_rxtx.c
index 8df913b0ba..34108fb946 100644
--- a/drivers/net/virtio/virtio_rxtx.c
+++ b/drivers/net/virtio/virtio_rxtx.c
@@ -448,7 +448,7 @@ virtqueue_enqueue_xmit_inorder(struct virtnet_tx *txvq,
 		if (!vq->hw->has_tx_offload)
 			virtqueue_clear_net_hdr(hdr);
 		else
-			virtqueue_xmit_offload(hdr, cookies[i], true);
+			virtqueue_xmit_offload(hdr, cookies[i]);
 
 		start_dp[idx].addr  = rte_mbuf_data_iova(cookies[i]) - head_size;
 		start_dp[idx].len   = cookies[i]->data_len + head_size;
@@ -495,7 +495,7 @@ virtqueue_enqueue_xmit_packed_fast(struct virtnet_tx *txvq,
 	if (!vq->hw->has_tx_offload)
 		virtqueue_clear_net_hdr(hdr);
 	else
-		virtqueue_xmit_offload(hdr, cookie, true);
+		virtqueue_xmit_offload(hdr, cookie);
 
 	dp->addr = rte_mbuf_data_iova(cookie) - head_size;
 	dp->len  = cookie->data_len + head_size;
@@ -581,7 +581,8 @@ virtqueue_enqueue_xmit(struct virtnet_tx *txvq, struct rte_mbuf *cookie,
 		idx = start_dp[idx].next;
 	}
 
-	virtqueue_xmit_offload(hdr, cookie, vq->hw->has_tx_offload);
+	if (vq->hw->has_tx_offload)
+		virtqueue_xmit_offload(hdr, cookie);
 
 	do {
 		start_dp[idx].addr  = rte_mbuf_data_iova(cookie);
diff --git a/drivers/net/virtio/virtio_rxtx_packed_avx.h b/drivers/net/virtio/virtio_rxtx_packed_avx.h
index 228cf5437b..c819d2e4f2 100644
--- a/drivers/net/virtio/virtio_rxtx_packed_avx.h
+++ b/drivers/net/virtio/virtio_rxtx_packed_avx.h
@@ -115,7 +115,7 @@ virtqueue_enqueue_batch_packed_vec(struct virtnet_tx *txvq,
 		virtio_for_each_try_unroll(i, 0, PACKED_BATCH_SIZE) {
 			hdr = rte_pktmbuf_mtod_offset(tx_pkts[i],
 					struct virtio_net_hdr *, -head_size);
-			virtqueue_xmit_offload(hdr, tx_pkts[i], true);
+			virtqueue_xmit_offload(hdr, tx_pkts[i]);
 		}
 	}
 
diff --git a/drivers/net/virtio/virtio_rxtx_packed_neon.h b/drivers/net/virtio/virtio_rxtx_packed_neon.h
index d4257e68f0..f19e618635 100644
--- a/drivers/net/virtio/virtio_rxtx_packed_neon.h
+++ b/drivers/net/virtio/virtio_rxtx_packed_neon.h
@@ -134,7 +134,7 @@ virtqueue_enqueue_batch_packed_vec(struct virtnet_tx *txvq,
 		virtio_for_each_try_unroll(i, 0, PACKED_BATCH_SIZE) {
 			hdr = rte_pktmbuf_mtod_offset(tx_pkts[i],
 					struct virtio_net_hdr *, -head_size);
-			virtqueue_xmit_offload(hdr, tx_pkts[i], true);
+			virtqueue_xmit_offload(hdr, tx_pkts[i]);
 		}
 	}
 
diff --git a/drivers/net/virtio/virtqueue.h b/drivers/net/virtio/virtqueue.h
index ed3b85080e..03957b2bd0 100644
--- a/drivers/net/virtio/virtqueue.h
+++ b/drivers/net/virtio/virtqueue.h
@@ -617,52 +617,44 @@ virtqueue_notify(struct virtqueue *vq)
 } while (0)
 
 static inline void
-virtqueue_xmit_offload(struct virtio_net_hdr *hdr,
-			struct rte_mbuf *cookie,
-			uint8_t offload)
+virtqueue_xmit_offload(struct virtio_net_hdr *hdr, struct rte_mbuf *cookie)
 {
-	if (offload) {
-		uint64_t csum_l4 = cookie->ol_flags & PKT_TX_L4_MASK;
-
-		if (cookie->ol_flags & PKT_TX_TCP_SEG)
-			csum_l4 |= PKT_TX_TCP_CKSUM;
-
-		switch (csum_l4) {
-		case PKT_TX_UDP_CKSUM:
-			hdr->csum_start = cookie->l2_len + cookie->l3_len;
-			hdr->csum_offset = offsetof(struct rte_udp_hdr,
-				dgram_cksum);
-			hdr->flags = VIRTIO_NET_HDR_F_NEEDS_CSUM;
-			break;
-
-		case PKT_TX_TCP_CKSUM:
-			hdr->csum_start = cookie->l2_len + cookie->l3_len;
-			hdr->csum_offset = offsetof(struct rte_tcp_hdr, cksum);
-			hdr->flags = VIRTIO_NET_HDR_F_NEEDS_CSUM;
-			break;
-
-		default:
-			ASSIGN_UNLESS_EQUAL(hdr->csum_start, 0);
-			ASSIGN_UNLESS_EQUAL(hdr->csum_offset, 0);
-			ASSIGN_UNLESS_EQUAL(hdr->flags, 0);
-			break;
-		}
+	uint64_t csum_l4 = cookie->ol_flags & PKT_TX_L4_MASK;
+
+	if (cookie->ol_flags & PKT_TX_TCP_SEG)
+		csum_l4 |= PKT_TX_TCP_CKSUM;
+
+	switch (csum_l4) {
+	case PKT_TX_UDP_CKSUM:
+		hdr->csum_start = cookie->l2_len + cookie->l3_len;
+		hdr->csum_offset = offsetof(struct rte_udp_hdr, dgram_cksum);
+		hdr->flags = VIRTIO_NET_HDR_F_NEEDS_CSUM;
+		break;
+
+	case PKT_TX_TCP_CKSUM:
+		hdr->csum_start = cookie->l2_len + cookie->l3_len;
+		hdr->csum_offset = offsetof(struct rte_tcp_hdr, cksum);
+		hdr->flags = VIRTIO_NET_HDR_F_NEEDS_CSUM;
+		break;
+
+	default:
+		ASSIGN_UNLESS_EQUAL(hdr->csum_start, 0);
+		ASSIGN_UNLESS_EQUAL(hdr->csum_offset, 0);
+		ASSIGN_UNLESS_EQUAL(hdr->flags, 0);
+		break;
+	}
 
-		/* TCP Segmentation Offload */
-		if (cookie->ol_flags & PKT_TX_TCP_SEG) {
-			hdr->gso_type = (cookie->ol_flags & PKT_TX_IPV6) ?
-				VIRTIO_NET_HDR_GSO_TCPV6 :
-				VIRTIO_NET_HDR_GSO_TCPV4;
-			hdr->gso_size = cookie->tso_segsz;
-			hdr->hdr_len =
-				cookie->l2_len +
-				cookie->l3_len +
-				cookie->l4_len;
-		} else {
-			ASSIGN_UNLESS_EQUAL(hdr->gso_type, 0);
-			ASSIGN_UNLESS_EQUAL(hdr->gso_size, 0);
-			ASSIGN_UNLESS_EQUAL(hdr->hdr_len, 0);
-		}
+	/* TCP Segmentation Offload */
+	if (cookie->ol_flags & PKT_TX_TCP_SEG) {
+		hdr->gso_type = (cookie->ol_flags & PKT_TX_IPV6) ?
+			VIRTIO_NET_HDR_GSO_TCPV6 :
+			VIRTIO_NET_HDR_GSO_TCPV4;
+		hdr->gso_size = cookie->tso_segsz;
+		hdr->hdr_len = cookie->l2_len + cookie->l3_len + cookie->l4_len;
+	} else {
+		ASSIGN_UNLESS_EQUAL(hdr->gso_type, 0);
+		ASSIGN_UNLESS_EQUAL(hdr->gso_size, 0);
+		ASSIGN_UNLESS_EQUAL(hdr->hdr_len, 0);
 	}
 }
 
@@ -741,7 +733,8 @@ virtqueue_enqueue_xmit_packed(struct virtnet_tx *txvq, struct rte_mbuf *cookie,
 		}
 	}
 
-	virtqueue_xmit_offload(hdr, cookie, vq->hw->has_tx_offload);
+	if (vq->hw->has_tx_offload)
+		virtqueue_xmit_offload(hdr, cookie);
 
 	do {
 		uint16_t flags;
-- 
2.23.0



* [dpdk-dev] [PATCH v3 4/4] vhost: fix offload flags in Rx path
  2021-05-03 13:26 ` [dpdk-dev] [PATCH v3 0/4] Offload flags fixes David Marchand
                     ` (2 preceding siblings ...)
  2021-05-03 13:26   ` [dpdk-dev] [PATCH v3 3/4] net/virtio: refactor Tx offload helper David Marchand
@ 2021-05-03 13:26   ` David Marchand
  2021-05-03 15:24   ` [dpdk-dev] [PATCH v3 0/4] Offload flags fixes Maxime Coquelin
  4 siblings, 0 replies; 63+ messages in thread
From: David Marchand @ 2021-05-03 13:26 UTC (permalink / raw)
  To: dev
  Cc: maxime.coquelin, olivier.matz, fbl, i.maximets, chenbo.xia,
	ian.stokes, stable, Jijiang Liu, Yuanhan Liu

The vhost library currently configures Tx offloading (PKT_TX_*) on any
packet received from a guest virtio device which asks for some offloading.

This is problematic, as Tx offloading is something that the application
must ask for: the application needs to configure devices to support
every offload it uses (IP/TCP checksums, TSO, ...), and the various
l2/l3/l4 lengths must be set following any processing that happened in
the application itself.

On the other hand, the received packets are not marked with the Rx
flags describing their current l3/l4 checksum status.

Mirror the virtio Rx processing to fix those offload flags, with some
differences:
- accept VIRTIO_NET_HDR_GSO_ECN and VIRTIO_NET_HDR_GSO_UDP,
- ignore anything but the VIRTIO_NET_HDR_F_NEEDS_CSUM flag (to comply with
  the virtio spec),

Some applications might rely on the current behavior, so it is left
untouched by default.
A new RTE_VHOST_USER_NET_COMPLIANT_OL_FLAGS flag is added to enable the
new behavior.

The vhost example has been updated for the new behavior: TSO is applied to
any packet marked LRO.

Fixes: 859b480d5afd ("vhost: add guest offload setting")
Cc: stable@dpdk.org

Signed-off-by: David Marchand <david.marchand@redhat.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
---
Changes since v2:
- introduced a new flag to keep existing behavior as the default,
- packets with unrecognised offload are passed to the application with no
  offload metadata rather than dropped,
- ignored VIRTIO_NET_HDR_F_DATA_VALID since the virtio spec states that
  the virtio driver is not allowed to use this flag when transmitting
  packets,

Changes since v1:
- updated vhost example,
- restored VIRTIO_NET_HDR_GSO_ECN and VIRTIO_NET_HDR_GSO_UDP support,
- restored log on buggy offload request,

---
 doc/guides/prog_guide/vhost_lib.rst    |  12 ++
 doc/guides/rel_notes/release_21_05.rst |   6 +
 drivers/net/vhost/rte_eth_vhost.c      |   2 +-
 examples/vhost/main.c                  |  44 +++---
 lib/vhost/rte_vhost.h                  |   1 +
 lib/vhost/socket.c                     |   5 +-
 lib/vhost/vhost.c                      |   6 +-
 lib/vhost/vhost.h                      |  14 +-
 lib/vhost/virtio_net.c                 | 185 ++++++++++++++++++++++---
 9 files changed, 222 insertions(+), 53 deletions(-)

diff --git a/doc/guides/prog_guide/vhost_lib.rst b/doc/guides/prog_guide/vhost_lib.rst
index dc29229167..042875a9ca 100644
--- a/doc/guides/prog_guide/vhost_lib.rst
+++ b/doc/guides/prog_guide/vhost_lib.rst
@@ -118,6 +118,18 @@ The following is an overview of some key Vhost API functions:
 
     It is disabled by default.
 
+  - ``RTE_VHOST_USER_NET_COMPLIANT_OL_FLAGS``
+
+    Since v16.04, the vhost library forwards checksum and gso requests for
+    packets received from a virtio driver by filling Tx offload metadata in
+    the mbuf. This behavior is inconsistent with other drivers but it is left
+    untouched for existing applications that might rely on it.
+
+    This flag disables the legacy behavior and instead asks vhost to simply
+    populate Rx offload metadata in the mbuf.
+
+    It is disabled by default.
+
 * ``rte_vhost_driver_set_features(path, features)``
 
   This function sets the feature bits the vhost-user driver supports. The
diff --git a/doc/guides/rel_notes/release_21_05.rst b/doc/guides/rel_notes/release_21_05.rst
index b3224dc332..1cb06ce487 100644
--- a/doc/guides/rel_notes/release_21_05.rst
+++ b/doc/guides/rel_notes/release_21_05.rst
@@ -329,6 +329,12 @@ API Changes
   ``policer_action_recolor_supported`` and ``policer_action_drop_supported``
   have been removed.
 
+* vhost: The vhost library currently populates received mbufs from a virtio
+  driver with Tx offload flags while not filling Rx offload flags.
+  While this behavior is arguable, it is kept untouched.
+  A new flag ``RTE_VHOST_USER_NET_COMPLIANT_OL_FLAGS`` has been added to ask
+  for a behavior compliant with the mbuf offload API.
+
 
 ABI Changes
 -----------
diff --git a/drivers/net/vhost/rte_eth_vhost.c b/drivers/net/vhost/rte_eth_vhost.c
index d198fc8a8e..281379d6a3 100644
--- a/drivers/net/vhost/rte_eth_vhost.c
+++ b/drivers/net/vhost/rte_eth_vhost.c
@@ -1505,7 +1505,7 @@ rte_pmd_vhost_probe(struct rte_vdev_device *dev)
 	int ret = 0;
 	char *iface_name;
 	uint16_t queues;
-	uint64_t flags = 0;
+	uint64_t flags = RTE_VHOST_USER_NET_COMPLIANT_OL_FLAGS;
 	uint64_t disable_flags = 0;
 	int client_mode = 0;
 	int iommu_support = 0;
diff --git a/examples/vhost/main.c b/examples/vhost/main.c
index ff48ba270d..64295aaf7e 100644
--- a/examples/vhost/main.c
+++ b/examples/vhost/main.c
@@ -19,6 +19,7 @@
 #include <rte_log.h>
 #include <rte_string_fns.h>
 #include <rte_malloc.h>
+#include <rte_net.h>
 #include <rte_vhost.h>
 #include <rte_ip.h>
 #include <rte_tcp.h>
@@ -1032,33 +1033,34 @@ find_local_dest(struct vhost_dev *vdev, struct rte_mbuf *m,
 	return 0;
 }
 
-static uint16_t
-get_psd_sum(void *l3_hdr, uint64_t ol_flags)
-{
-	if (ol_flags & PKT_TX_IPV4)
-		return rte_ipv4_phdr_cksum(l3_hdr, ol_flags);
-	else /* assume ethertype == RTE_ETHER_TYPE_IPV6 */
-		return rte_ipv6_phdr_cksum(l3_hdr, ol_flags);
-}
-
 static void virtio_tx_offload(struct rte_mbuf *m)
 {
+	struct rte_net_hdr_lens hdr_lens;
+	struct rte_ipv4_hdr *ipv4_hdr;
+	struct rte_tcp_hdr *tcp_hdr;
+	uint32_t ptype;
 	void *l3_hdr;
-	struct rte_ipv4_hdr *ipv4_hdr = NULL;
-	struct rte_tcp_hdr *tcp_hdr = NULL;
-	struct rte_ether_hdr *eth_hdr =
-		rte_pktmbuf_mtod(m, struct rte_ether_hdr *);
 
-	l3_hdr = (char *)eth_hdr + m->l2_len;
+	ptype = rte_net_get_ptype(m, &hdr_lens, RTE_PTYPE_ALL_MASK);
+	m->l2_len = hdr_lens.l2_len;
+	m->l3_len = hdr_lens.l3_len;
+	m->l4_len = hdr_lens.l4_len;
 
-	if (m->ol_flags & PKT_TX_IPV4) {
+	l3_hdr = rte_pktmbuf_mtod_offset(m, void *, m->l2_len);
+	tcp_hdr = rte_pktmbuf_mtod_offset(m, struct rte_tcp_hdr *,
+		m->l2_len + m->l3_len);
+
+	m->ol_flags |= PKT_TX_TCP_SEG;
+	if ((ptype & RTE_PTYPE_L3_MASK) == RTE_PTYPE_L3_IPV4) {
+		m->ol_flags |= PKT_TX_IPV4;
+		m->ol_flags |= PKT_TX_IP_CKSUM;
 		ipv4_hdr = l3_hdr;
 		ipv4_hdr->hdr_checksum = 0;
-		m->ol_flags |= PKT_TX_IP_CKSUM;
+		tcp_hdr->cksum = rte_ipv4_phdr_cksum(l3_hdr, m->ol_flags);
+	} else { /* assume ethertype == RTE_ETHER_TYPE_IPV6 */
+		m->ol_flags |= PKT_TX_IPV6;
+		tcp_hdr->cksum = rte_ipv6_phdr_cksum(l3_hdr, m->ol_flags);
 	}
-
-	tcp_hdr = (struct rte_tcp_hdr *)((char *)l3_hdr + m->l3_len);
-	tcp_hdr->cksum = get_psd_sum(l3_hdr, m->ol_flags);
 }
 
 static __rte_always_inline void
@@ -1151,7 +1153,7 @@ virtio_tx_route(struct vhost_dev *vdev, struct rte_mbuf *m, uint16_t vlan_tag)
 		m->vlan_tci = vlan_tag;
 	}
 
-	if (m->ol_flags & PKT_TX_TCP_SEG)
+	if (m->ol_flags & PKT_RX_LRO)
 		virtio_tx_offload(m);
 
 	tx_q->m_table[tx_q->len++] = m;
@@ -1636,7 +1638,7 @@ main(int argc, char *argv[])
 	int ret, i;
 	uint16_t portid;
 	static pthread_t tid;
-	uint64_t flags = 0;
+	uint64_t flags = RTE_VHOST_USER_NET_COMPLIANT_OL_FLAGS;
 
 	signal(SIGINT, sigint_handler);
 
diff --git a/lib/vhost/rte_vhost.h b/lib/vhost/rte_vhost.h
index d0a8ae31f2..8d875e9322 100644
--- a/lib/vhost/rte_vhost.h
+++ b/lib/vhost/rte_vhost.h
@@ -36,6 +36,7 @@ extern "C" {
 /* support only linear buffers (no chained mbufs) */
 #define RTE_VHOST_USER_LINEARBUF_SUPPORT	(1ULL << 6)
 #define RTE_VHOST_USER_ASYNC_COPY	(1ULL << 7)
+#define RTE_VHOST_USER_NET_COMPLIANT_OL_FLAGS	(1ULL << 8)
 
 /* Features. */
 #ifndef VIRTIO_NET_F_GUEST_ANNOUNCE
diff --git a/lib/vhost/socket.c b/lib/vhost/socket.c
index 0169d36481..5d0d728d52 100644
--- a/lib/vhost/socket.c
+++ b/lib/vhost/socket.c
@@ -42,6 +42,7 @@ struct vhost_user_socket {
 	bool extbuf;
 	bool linearbuf;
 	bool async_copy;
+	bool net_compliant_ol_flags;
 
 	/*
 	 * The "supported_features" indicates the feature bits the
@@ -224,7 +225,8 @@ vhost_user_add_connection(int fd, struct vhost_user_socket *vsocket)
 	size = strnlen(vsocket->path, PATH_MAX);
 	vhost_set_ifname(vid, vsocket->path, size);
 
-	vhost_set_builtin_virtio_net(vid, vsocket->use_builtin_virtio_net);
+	vhost_setup_virtio_net(vid, vsocket->use_builtin_virtio_net,
+		vsocket->net_compliant_ol_flags);
 
 	vhost_attach_vdpa_device(vid, vsocket->vdpa_dev);
 
@@ -877,6 +879,7 @@ rte_vhost_driver_register(const char *path, uint64_t flags)
 	vsocket->extbuf = flags & RTE_VHOST_USER_EXTBUF_SUPPORT;
 	vsocket->linearbuf = flags & RTE_VHOST_USER_LINEARBUF_SUPPORT;
 	vsocket->async_copy = flags & RTE_VHOST_USER_ASYNC_COPY;
+	vsocket->net_compliant_ol_flags = flags & RTE_VHOST_USER_NET_COMPLIANT_OL_FLAGS;
 
 	if (vsocket->async_copy &&
 		(flags & (RTE_VHOST_USER_IOMMU_SUPPORT |
diff --git a/lib/vhost/vhost.c b/lib/vhost/vhost.c
index a70fe01d8f..846113d46f 100644
--- a/lib/vhost/vhost.c
+++ b/lib/vhost/vhost.c
@@ -752,7 +752,7 @@ vhost_set_ifname(int vid, const char *if_name, unsigned int if_len)
 }
 
 void
-vhost_set_builtin_virtio_net(int vid, bool enable)
+vhost_setup_virtio_net(int vid, bool enable, bool compliant_ol_flags)
 {
 	struct virtio_net *dev = get_device(vid);
 
@@ -763,6 +763,10 @@ vhost_set_builtin_virtio_net(int vid, bool enable)
 		dev->flags |= VIRTIO_DEV_BUILTIN_VIRTIO_NET;
 	else
 		dev->flags &= ~VIRTIO_DEV_BUILTIN_VIRTIO_NET;
+	if (!compliant_ol_flags)
+		dev->flags |= VIRTIO_DEV_LEGACY_OL_FLAGS;
+	else
+		dev->flags &= ~VIRTIO_DEV_LEGACY_OL_FLAGS;
 }
 
 void
diff --git a/lib/vhost/vhost.h b/lib/vhost/vhost.h
index f628714c24..65bcdc5301 100644
--- a/lib/vhost/vhost.h
+++ b/lib/vhost/vhost.h
@@ -27,15 +27,17 @@
 #include "rte_vhost_async.h"
 
 /* Used to indicate that the device is running on a data core */
-#define VIRTIO_DEV_RUNNING 1
+#define VIRTIO_DEV_RUNNING ((uint32_t)1 << 0)
 /* Used to indicate that the device is ready to operate */
-#define VIRTIO_DEV_READY 2
+#define VIRTIO_DEV_READY ((uint32_t)1 << 1)
 /* Used to indicate that the built-in vhost net device backend is enabled */
-#define VIRTIO_DEV_BUILTIN_VIRTIO_NET 4
+#define VIRTIO_DEV_BUILTIN_VIRTIO_NET ((uint32_t)1 << 2)
 /* Used to indicate that the device has its own data path and configured */
-#define VIRTIO_DEV_VDPA_CONFIGURED 8
+#define VIRTIO_DEV_VDPA_CONFIGURED ((uint32_t)1 << 3)
 /* Used to indicate that the feature negotiation failed */
-#define VIRTIO_DEV_FEATURES_FAILED 16
+#define VIRTIO_DEV_FEATURES_FAILED ((uint32_t)1 << 4)
+/* Used to indicate that the virtio_net tx code should fill TX ol_flags */
+#define VIRTIO_DEV_LEGACY_OL_FLAGS ((uint32_t)1 << 5)
 
 /* Backend value set by guest. */
 #define VIRTIO_DEV_STOPPED -1
@@ -674,7 +676,7 @@ int alloc_vring_queue(struct virtio_net *dev, uint32_t vring_idx);
 void vhost_attach_vdpa_device(int vid, struct rte_vdpa_device *dev);
 
 void vhost_set_ifname(int, const char *if_name, unsigned int if_len);
-void vhost_set_builtin_virtio_net(int vid, bool enable);
+void vhost_setup_virtio_net(int vid, bool enable, bool legacy_ol_flags);
 void vhost_enable_extbuf(int vid);
 void vhost_enable_linearbuf(int vid);
 int vhost_enable_guest_notification(struct virtio_net *dev,
diff --git a/lib/vhost/virtio_net.c b/lib/vhost/virtio_net.c
index ff39878609..aef30ad4fe 100644
--- a/lib/vhost/virtio_net.c
+++ b/lib/vhost/virtio_net.c
@@ -8,6 +8,7 @@
 
 #include <rte_mbuf.h>
 #include <rte_memcpy.h>
+#include <rte_net.h>
 #include <rte_ether.h>
 #include <rte_ip.h>
 #include <rte_vhost.h>
@@ -1875,15 +1876,12 @@ parse_ethernet(struct rte_mbuf *m, uint16_t *l4_proto, void **l4_hdr)
 }
 
 static __rte_always_inline void
-vhost_dequeue_offload(struct virtio_net_hdr *hdr, struct rte_mbuf *m)
+vhost_dequeue_offload_legacy(struct virtio_net_hdr *hdr, struct rte_mbuf *m)
 {
 	uint16_t l4_proto = 0;
 	void *l4_hdr = NULL;
 	struct rte_tcp_hdr *tcp_hdr = NULL;
 
-	if (hdr->flags == 0 && hdr->gso_type == VIRTIO_NET_HDR_GSO_NONE)
-		return;
-
 	parse_ethernet(m, &l4_proto, &l4_hdr);
 	if (hdr->flags == VIRTIO_NET_HDR_F_NEEDS_CSUM) {
 		if (hdr->csum_start == (m->l2_len + m->l3_len)) {
@@ -1928,6 +1926,94 @@ vhost_dequeue_offload(struct virtio_net_hdr *hdr, struct rte_mbuf *m)
 	}
 }
 
+static __rte_always_inline void
+vhost_dequeue_offload(struct virtio_net_hdr *hdr, struct rte_mbuf *m,
+	bool legacy_ol_flags)
+{
+	struct rte_net_hdr_lens hdr_lens;
+	int l4_supported = 0;
+	uint32_t ptype;
+
+	if (hdr->flags == 0 && hdr->gso_type == VIRTIO_NET_HDR_GSO_NONE)
+		return;
+
+	if (legacy_ol_flags) {
+		vhost_dequeue_offload_legacy(hdr, m);
+		return;
+	}
+
+	m->ol_flags |= PKT_RX_IP_CKSUM_UNKNOWN;
+
+	ptype = rte_net_get_ptype(m, &hdr_lens, RTE_PTYPE_ALL_MASK);
+	m->packet_type = ptype;
+	if ((ptype & RTE_PTYPE_L4_MASK) == RTE_PTYPE_L4_TCP ||
+	    (ptype & RTE_PTYPE_L4_MASK) == RTE_PTYPE_L4_UDP ||
+	    (ptype & RTE_PTYPE_L4_MASK) == RTE_PTYPE_L4_SCTP)
+		l4_supported = 1;
+
+	/* According to Virtio 1.1 spec, the device only needs to look at
+	 * VIRTIO_NET_HDR_F_NEEDS_CSUM in the packet transmission path.
+	 * This differs from the processing incoming packets path where the
+	 * driver could rely on VIRTIO_NET_HDR_F_DATA_VALID flag set by the
+	 * device.
+	 *
+	 * 5.1.6.2.1 Driver Requirements: Packet Transmission
+	 * The driver MUST NOT set the VIRTIO_NET_HDR_F_DATA_VALID and
+	 * VIRTIO_NET_HDR_F_RSC_INFO bits in flags.
+	 *
+	 * 5.1.6.2.2 Device Requirements: Packet Transmission
+	 * The device MUST ignore flag bits that it does not recognize.
+	 */
+	if (hdr->flags & VIRTIO_NET_HDR_F_NEEDS_CSUM) {
+		uint32_t hdrlen;
+
+		hdrlen = hdr_lens.l2_len + hdr_lens.l3_len + hdr_lens.l4_len;
+		if (hdr->csum_start <= hdrlen && l4_supported != 0) {
+			m->ol_flags |= PKT_RX_L4_CKSUM_NONE;
+		} else {
+			/* Unknown proto or tunnel, do sw cksum. We can assume
+			 * the cksum field is in the first segment since the
+			 * buffers we provided to the host are large enough.
+			 * In case of SCTP, this will be wrong since it's a CRC
+			 * but there's nothing we can do.
+			 */
+			uint16_t csum = 0, off;
+
+			if (rte_raw_cksum_mbuf(m, hdr->csum_start,
+					rte_pktmbuf_pkt_len(m) - hdr->csum_start, &csum) < 0)
+				return;
+			if (likely(csum != 0xffff))
+				csum = ~csum;
+			off = hdr->csum_offset + hdr->csum_start;
+			if (rte_pktmbuf_data_len(m) >= off + 1)
+				*rte_pktmbuf_mtod_offset(m, uint16_t *, off) = csum;
+		}
+	}
+
+	if (hdr->gso_type != VIRTIO_NET_HDR_GSO_NONE) {
+		if (hdr->gso_size == 0)
+			return;
+
+		switch (hdr->gso_type & ~VIRTIO_NET_HDR_GSO_ECN) {
+		case VIRTIO_NET_HDR_GSO_TCPV4:
+		case VIRTIO_NET_HDR_GSO_TCPV6:
+			if ((ptype & RTE_PTYPE_L4_MASK) != RTE_PTYPE_L4_TCP)
+				break;
+			m->ol_flags |= PKT_RX_LRO | PKT_RX_L4_CKSUM_NONE;
+			m->tso_segsz = hdr->gso_size;
+			break;
+		case VIRTIO_NET_HDR_GSO_UDP:
+			if ((ptype & RTE_PTYPE_L4_MASK) != RTE_PTYPE_L4_UDP)
+				break;
+			m->ol_flags |= PKT_RX_LRO | PKT_RX_L4_CKSUM_NONE;
+			m->tso_segsz = hdr->gso_size;
+			break;
+		default:
+			break;
+		}
+	}
+}
+
 static __rte_noinline void
 copy_vnet_hdr_from_desc(struct virtio_net_hdr *hdr,
 		struct buf_vector *buf_vec)
@@ -1952,7 +2038,8 @@ copy_vnet_hdr_from_desc(struct virtio_net_hdr *hdr,
 static __rte_always_inline int
 copy_desc_to_mbuf(struct virtio_net *dev, struct vhost_virtqueue *vq,
 		  struct buf_vector *buf_vec, uint16_t nr_vec,
-		  struct rte_mbuf *m, struct rte_mempool *mbuf_pool)
+		  struct rte_mbuf *m, struct rte_mempool *mbuf_pool,
+		  bool legacy_ol_flags)
 {
 	uint32_t buf_avail, buf_offset;
 	uint64_t buf_addr, buf_len;
@@ -2085,7 +2172,7 @@ copy_desc_to_mbuf(struct virtio_net *dev, struct vhost_virtqueue *vq,
 	m->pkt_len    += mbuf_offset;
 
 	if (hdr)
-		vhost_dequeue_offload(hdr, m);
+		vhost_dequeue_offload(hdr, m, legacy_ol_flags);
 
 out:
 
@@ -2168,9 +2255,11 @@ virtio_dev_pktmbuf_alloc(struct virtio_net *dev, struct rte_mempool *mp,
 	return NULL;
 }
 
-static __rte_noinline uint16_t
+__rte_always_inline
+static uint16_t
 virtio_dev_tx_split(struct virtio_net *dev, struct vhost_virtqueue *vq,
-	struct rte_mempool *mbuf_pool, struct rte_mbuf **pkts, uint16_t count)
+	struct rte_mempool *mbuf_pool, struct rte_mbuf **pkts, uint16_t count,
+	bool legacy_ol_flags)
 {
 	uint16_t i;
 	uint16_t free_entries;
@@ -2230,7 +2319,7 @@ virtio_dev_tx_split(struct virtio_net *dev, struct vhost_virtqueue *vq,
 		}
 
 		err = copy_desc_to_mbuf(dev, vq, buf_vec, nr_vec, pkts[i],
-				mbuf_pool);
+				mbuf_pool, legacy_ol_flags);
 		if (unlikely(err)) {
 			rte_pktmbuf_free(pkts[i]);
 			if (!allocerr_warned) {
@@ -2258,6 +2347,24 @@ virtio_dev_tx_split(struct virtio_net *dev, struct vhost_virtqueue *vq,
 	return (i - dropped);
 }
 
+__rte_noinline
+static uint16_t
+virtio_dev_tx_split_legacy(struct virtio_net *dev,
+	struct vhost_virtqueue *vq, struct rte_mempool *mbuf_pool,
+	struct rte_mbuf **pkts, uint16_t count)
+{
+	return virtio_dev_tx_split(dev, vq, mbuf_pool, pkts, count, true);
+}
+
+__rte_noinline
+static uint16_t
+virtio_dev_tx_split_compliant(struct virtio_net *dev,
+	struct vhost_virtqueue *vq, struct rte_mempool *mbuf_pool,
+	struct rte_mbuf **pkts, uint16_t count)
+{
+	return virtio_dev_tx_split(dev, vq, mbuf_pool, pkts, count, false);
+}
+
 static __rte_always_inline int
 vhost_reserve_avail_batch_packed(struct virtio_net *dev,
 				 struct vhost_virtqueue *vq,
@@ -2338,7 +2445,8 @@ static __rte_always_inline int
 virtio_dev_tx_batch_packed(struct virtio_net *dev,
 			   struct vhost_virtqueue *vq,
 			   struct rte_mempool *mbuf_pool,
-			   struct rte_mbuf **pkts)
+			   struct rte_mbuf **pkts,
+			   bool legacy_ol_flags)
 {
 	uint16_t avail_idx = vq->last_avail_idx;
 	uint32_t buf_offset = sizeof(struct virtio_net_hdr_mrg_rxbuf);
@@ -2362,7 +2470,7 @@ virtio_dev_tx_batch_packed(struct virtio_net *dev,
 	if (virtio_net_with_host_offload(dev)) {
 		vhost_for_each_try_unroll(i, 0, PACKED_BATCH_SIZE) {
 			hdr = (struct virtio_net_hdr *)(desc_addrs[i]);
-			vhost_dequeue_offload(hdr, pkts[i]);
+			vhost_dequeue_offload(hdr, pkts[i], legacy_ol_flags);
 		}
 	}
 
@@ -2383,7 +2491,8 @@ vhost_dequeue_single_packed(struct virtio_net *dev,
 			    struct rte_mempool *mbuf_pool,
 			    struct rte_mbuf **pkts,
 			    uint16_t *buf_id,
-			    uint16_t *desc_count)
+			    uint16_t *desc_count,
+			    bool legacy_ol_flags)
 {
 	struct buf_vector buf_vec[BUF_VECTOR_MAX];
 	uint32_t buf_len;
@@ -2410,7 +2519,7 @@ vhost_dequeue_single_packed(struct virtio_net *dev,
 	}
 
 	err = copy_desc_to_mbuf(dev, vq, buf_vec, nr_vec, *pkts,
-				mbuf_pool);
+				mbuf_pool, legacy_ol_flags);
 	if (unlikely(err)) {
 		if (!allocerr_warned) {
 			VHOST_LOG_DATA(ERR,
@@ -2429,14 +2538,15 @@ static __rte_always_inline int
 virtio_dev_tx_single_packed(struct virtio_net *dev,
 			    struct vhost_virtqueue *vq,
 			    struct rte_mempool *mbuf_pool,
-			    struct rte_mbuf **pkts)
+			    struct rte_mbuf **pkts,
+			    bool legacy_ol_flags)
 {
 
 	uint16_t buf_id, desc_count = 0;
 	int ret;
 
 	ret = vhost_dequeue_single_packed(dev, vq, mbuf_pool, pkts, &buf_id,
-					&desc_count);
+					&desc_count, legacy_ol_flags);
 
 	if (likely(desc_count > 0)) {
 		if (virtio_net_is_inorder(dev))
@@ -2452,12 +2562,14 @@ virtio_dev_tx_single_packed(struct virtio_net *dev,
 	return ret;
 }
 
-static __rte_noinline uint16_t
+__rte_always_inline
+static uint16_t
 virtio_dev_tx_packed(struct virtio_net *dev,
 		     struct vhost_virtqueue *__rte_restrict vq,
 		     struct rte_mempool *mbuf_pool,
 		     struct rte_mbuf **__rte_restrict pkts,
-		     uint32_t count)
+		     uint32_t count,
+		     bool legacy_ol_flags)
 {
 	uint32_t pkt_idx = 0;
 	uint32_t remained = count;
@@ -2467,7 +2579,8 @@ virtio_dev_tx_packed(struct virtio_net *dev,
 
 		if (remained >= PACKED_BATCH_SIZE) {
 			if (!virtio_dev_tx_batch_packed(dev, vq, mbuf_pool,
-							&pkts[pkt_idx])) {
+							&pkts[pkt_idx],
+							legacy_ol_flags)) {
 				pkt_idx += PACKED_BATCH_SIZE;
 				remained -= PACKED_BATCH_SIZE;
 				continue;
@@ -2475,7 +2588,8 @@ virtio_dev_tx_packed(struct virtio_net *dev,
 		}
 
 		if (virtio_dev_tx_single_packed(dev, vq, mbuf_pool,
-						&pkts[pkt_idx]))
+						&pkts[pkt_idx],
+						legacy_ol_flags))
 			break;
 		pkt_idx++;
 		remained--;
@@ -2492,6 +2606,24 @@ virtio_dev_tx_packed(struct virtio_net *dev,
 	return pkt_idx;
 }
 
+__rte_noinline
+static uint16_t
+virtio_dev_tx_packed_legacy(struct virtio_net *dev,
+	struct vhost_virtqueue *__rte_restrict vq, struct rte_mempool *mbuf_pool,
+	struct rte_mbuf **__rte_restrict pkts, uint32_t count)
+{
+	return virtio_dev_tx_packed(dev, vq, mbuf_pool, pkts, count, true);
+}
+
+__rte_noinline
+static uint16_t
+virtio_dev_tx_packed_compliant(struct virtio_net *dev,
+	struct vhost_virtqueue *__rte_restrict vq, struct rte_mempool *mbuf_pool,
+	struct rte_mbuf **__rte_restrict pkts, uint32_t count)
+{
+	return virtio_dev_tx_packed(dev, vq, mbuf_pool, pkts, count, false);
+}
+
 uint16_t
 rte_vhost_dequeue_burst(int vid, uint16_t queue_id,
 	struct rte_mempool *mbuf_pool, struct rte_mbuf **pkts, uint16_t count)
@@ -2567,10 +2699,17 @@ rte_vhost_dequeue_burst(int vid, uint16_t queue_id,
 		count -= 1;
 	}
 
-	if (vq_is_packed(dev))
-		count = virtio_dev_tx_packed(dev, vq, mbuf_pool, pkts, count);
-	else
-		count = virtio_dev_tx_split(dev, vq, mbuf_pool, pkts, count);
+	if (vq_is_packed(dev)) {
+		if (dev->flags & VIRTIO_DEV_LEGACY_OL_FLAGS)
+			count = virtio_dev_tx_packed_legacy(dev, vq, mbuf_pool, pkts, count);
+		else
+			count = virtio_dev_tx_packed_compliant(dev, vq, mbuf_pool, pkts, count);
+	} else {
+		if (dev->flags & VIRTIO_DEV_LEGACY_OL_FLAGS)
+			count = virtio_dev_tx_split_legacy(dev, vq, mbuf_pool, pkts, count);
+		else
+			count = virtio_dev_tx_split_compliant(dev, vq, mbuf_pool, pkts, count);
+	}
 
 out:
 	if (dev->features & (1ULL << VIRTIO_F_IOMMU_PLATFORM))
-- 
2.23.0



* Re: [dpdk-dev] [PATCH v3 1/4] mbuf: mark old offload flag as deprecated
  2021-05-03 13:26   ` [dpdk-dev] [PATCH v3 1/4] mbuf: mark old offload flag as deprecated David Marchand
@ 2021-05-03 14:02     ` Maxime Coquelin
  2021-05-03 14:12     ` David Marchand
  1 sibling, 0 replies; 63+ messages in thread
From: Maxime Coquelin @ 2021-05-03 14:02 UTC (permalink / raw)
  To: David Marchand, dev
  Cc: olivier.matz, fbl, i.maximets, chenbo.xia, ian.stokes,
	Lance Richardson, Ajit Khaparde, Ferruh Yigit, Andrew Rybchenko



On 5/3/21 3:26 PM, David Marchand wrote:
> PKT_RX_EIP_CKSUM_BAD has been declared deprecated but there was no
> warning to applications still using it.
> Fix this by marking as deprecated with the newly introduced
> RTE_DEPRECATED.
> 
> Fixes: e8a419d6de4b ("mbuf: rename outer IP checksum macro")
> 
> Signed-off-by: David Marchand <david.marchand@redhat.com>
> Reviewed-by: Flavio Leitner <fbl@sysclose.org>
> Acked-by: Olivier Matz <olivier.matz@6wind.com>
> Acked-by: Lance Richardson <lance.richardson@broadcom.com>
> Acked-by: Ajit Khaparde <ajit.khaparde@broadcom.com>
> ---
> Changes since v1:
> - updated commitlog following Olivier comment,
> 
> ---
>  lib/mbuf/rte_mbuf_core.h | 3 ++-
>  1 file changed, 2 insertions(+), 1 deletion(-)
> 

Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>

Thanks,
Maxime



* Re: [dpdk-dev] [PATCH v3 1/4] mbuf: mark old offload flag as deprecated
  2021-05-03 13:26   ` [dpdk-dev] [PATCH v3 1/4] mbuf: mark old offload flag as deprecated David Marchand
  2021-05-03 14:02     ` Maxime Coquelin
@ 2021-05-03 14:12     ` David Marchand
  1 sibling, 0 replies; 63+ messages in thread
From: David Marchand @ 2021-05-03 14:12 UTC (permalink / raw)
  To: dev
  Cc: Maxime Coquelin, Olivier Matz, Flavio Leitner, Ilya Maximets,
	Xia, Chenbo, Ian Stokes, Lance Richardson, Ajit Khaparde,
	Ferruh Yigit, Andrew Rybchenko

On Mon, May 3, 2021 at 3:27 PM David Marchand <david.marchand@redhat.com> wrote:
>
> PKT_RX_EIP_CKSUM_BAD has been declared deprecated but there was no
> warning to applications still using it.
> Fix this by marking as deprecated with the newly introduced
> RTE_DEPRECATED.
>
> Fixes: e8a419d6de4b ("mbuf: rename outer IP checksum macro")
>
> Signed-off-by: David Marchand <david.marchand@redhat.com>
> Reviewed-by: Flavio Leitner <fbl@sysclose.org>
> Acked-by: Olivier Matz <olivier.matz@6wind.com>
> Acked-by: Lance Richardson <lance.richardson@broadcom.com>
> Acked-by: Ajit Khaparde <ajit.khaparde@broadcom.com>

Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>

Applied to the main branch.
The rest of the series will go through next-virtio.


-- 
David Marchand



* Re: [dpdk-dev] [PATCH v3 0/4] Offload flags fixes
  2021-05-03 13:26 ` [dpdk-dev] [PATCH v3 0/4] Offload flags fixes David Marchand
                     ` (3 preceding siblings ...)
  2021-05-03 13:26   ` [dpdk-dev] [PATCH v3 4/4] vhost: fix offload flags in Rx path David Marchand
@ 2021-05-03 15:24   ` Maxime Coquelin
  2021-05-03 16:21     ` David Marchand
  4 siblings, 1 reply; 63+ messages in thread
From: Maxime Coquelin @ 2021-05-03 15:24 UTC (permalink / raw)
  To: David Marchand, dev; +Cc: olivier.matz, fbl, i.maximets, chenbo.xia, ian.stokes

Hi David,

On 5/3/21 3:26 PM, David Marchand wrote:
> The important part is the last patch on vhost handling of offloading
> requests coming from a virtio guest interface.
> 
> The rest are small fixes that I accumulated while reviewing the mbuf
> offload flags.
> 
> On this last patch, it has the potential of breaking existing
> applications using the vhost library (OVS being impacted).
> I did not mark it for backport.
> 
> Changes since v2:
> - kept behavior untouched (to avoid breaking ABI) and introduced a new
>   flag to select the new behavior,
> 
> Changes since v1:
> - dropped patch on net/tap,
> - added missing bits in example/vhost,
> - relaxed checks on VIRTIO_NET_HDR_GSO_ECN and VIRTIO_NET_HDR_GSO_UDP,
> 

Patch 4 does not apply on top of next-virtio/main branch.
Could you please send a rebased version?

Thanks,
Maxime



* Re: [dpdk-dev] [PATCH v3 0/4] Offload flags fixes
  2021-05-03 15:24   ` [dpdk-dev] [PATCH v3 0/4] Offload flags fixes Maxime Coquelin
@ 2021-05-03 16:21     ` David Marchand
  0 siblings, 0 replies; 63+ messages in thread
From: David Marchand @ 2021-05-03 16:21 UTC (permalink / raw)
  To: Maxime Coquelin
  Cc: dev, Olivier Matz, Flavio Leitner, Ilya Maximets, Xia, Chenbo,
	Ian Stokes

On Mon, May 3, 2021 at 5:24 PM Maxime Coquelin
<maxime.coquelin@redhat.com> wrote:
> On 5/3/21 3:26 PM, David Marchand wrote:
> > The important part is the last patch on vhost handling of offloading
> > requests coming from a virtio guest interface.
> >
> > The rest are small fixes that I accumulated while reviewing the mbuf
> > offload flags.
> >
> > On this last patch, it has the potential of breaking existing
> > applications using the vhost library (OVS being impacted).
> > I did not mark it for backport.
> >
> > Changes since v2:
> > - kept behavior untouched (to avoid breaking ABI) and introduced a new
> >   flag to select the new behavior,
> >
> > Changes since v1:
> > - dropped patch on net/tap,
> > - added missing bits in example/vhost,
> > - relaxed checks on VIRTIO_NET_HDR_GSO_ECN and VIRTIO_NET_HDR_GSO_UDP,
> >
>
> Patch 4 does not apply on top of next-virtio/main branch.
> Could you please send a rebased version?

The conflict is with Balazs rework.
Ok, preparing v4.


-- 
David Marchand



* [dpdk-dev] [PATCH v4 0/3] Offload flags fixes
  2021-04-01  9:52 [dpdk-dev] [PATCH 0/5] Offload flags fixes David Marchand
                   ` (6 preceding siblings ...)
  2021-05-03 13:26 ` [dpdk-dev] [PATCH v3 0/4] Offload flags fixes David Marchand
@ 2021-05-03 16:43 ` David Marchand
  2021-05-03 16:43   ` [dpdk-dev] [PATCH v4 1/3] net/virtio: do not touch Tx offload flags David Marchand
                     ` (3 more replies)
  7 siblings, 4 replies; 63+ messages in thread
From: David Marchand @ 2021-05-03 16:43 UTC (permalink / raw)
  To: dev
  Cc: maxime.coquelin, olivier.matz, fbl, i.maximets, chenbo.xia, ian.stokes

The important part is the last patch on vhost handling of offloading
requests coming from a virtio guest interface.

The rest are small fixes that I accumulated while reviewing the mbuf
offload flags.

This last patch has the potential to break existing
applications using the vhost library (OVS being impacted).
I did not mark it for backport.

Changes since v3:
- patch 1 went through the main repo,
- rebased on next-virtio,

Changes since v2:
- kept behavior untouched (to avoid breaking ABI) and introduced a new
  flag to select the new behavior,

Changes since v1:
- dropped patch on net/tap,
- added missing bits in example/vhost,
- relaxed checks on VIRTIO_NET_HDR_GSO_ECN and VIRTIO_NET_HDR_GSO_UDP,

-- 
David Marchand

David Marchand (3):
  net/virtio: do not touch Tx offload flags
  net/virtio: refactor Tx offload helper
  vhost: fix offload flags in Rx path

 doc/guides/prog_guide/vhost_lib.rst          |  12 ++
 doc/guides/rel_notes/release_21_05.rst       |   6 +
 drivers/net/vhost/rte_eth_vhost.c            |   2 +-
 drivers/net/virtio/virtio_rxtx.c             |   7 +-
 drivers/net/virtio/virtio_rxtx_packed_avx.h  |   2 +-
 drivers/net/virtio/virtio_rxtx_packed_neon.h |   2 +-
 drivers/net/virtio/virtqueue.h               |  81 ++++----
 examples/vhost/main.c                        |  44 ++---
 lib/vhost/rte_vhost.h                        |   1 +
 lib/vhost/socket.c                           |   5 +-
 lib/vhost/vhost.c                            |   6 +-
 lib/vhost/vhost.h                            |  14 +-
 lib/vhost/virtio_net.c                       | 185 ++++++++++++++++---
 13 files changed, 266 insertions(+), 101 deletions(-)

-- 
2.23.0



* [dpdk-dev] [PATCH v4 1/3] net/virtio: do not touch Tx offload flags
  2021-05-03 16:43 ` [dpdk-dev] [PATCH v4 0/3] " David Marchand
@ 2021-05-03 16:43   ` David Marchand
  2021-05-03 16:43   ` [dpdk-dev] [PATCH v4 2/3] net/virtio: refactor Tx offload helper David Marchand
                     ` (2 subsequent siblings)
  3 siblings, 0 replies; 63+ messages in thread
From: David Marchand @ 2021-05-03 16:43 UTC (permalink / raw)
  To: dev
  Cc: maxime.coquelin, olivier.matz, fbl, i.maximets, chenbo.xia, ian.stokes

Tx offload flags are the application's responsibility.
Leave the mbuf alone and use local storage for the implicit TCP checksum
offloading in case of TSO.
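
The principle can be sketched outside the driver. The flag bit values below
mirror rte_mbuf_core.h of this era, and effective_l4_csum() is an
illustrative helper, not the actual driver code:

```c
#include <stdint.h>

/* Bit values mirroring rte_mbuf_core.h at the time of this series. */
#define PKT_TX_TCP_SEG   (1ULL << 50)
#define PKT_TX_TCP_CKSUM (1ULL << 52)
#define PKT_TX_L4_MASK   (3ULL << 52)

/* Compute the implicit TCP checksum request for TSO in a local variable,
 * leaving the application-owned mbuf ol_flags untouched. */
static uint64_t
effective_l4_csum(uint64_t ol_flags)
{
	uint64_t csum_l4 = ol_flags & PKT_TX_L4_MASK;

	if (ol_flags & PKT_TX_TCP_SEG)
		csum_l4 |= PKT_TX_TCP_CKSUM;

	return csum_l4;
}
```

The driver then switches on the local value, so the application sees its
ol_flags unchanged after transmit.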

Signed-off-by: David Marchand <david.marchand@redhat.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Acked-by: Flavio Leitner <fbl@sysclose.org>
---
 drivers/net/virtio/virtqueue.h | 6 ++++--
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/drivers/net/virtio/virtqueue.h b/drivers/net/virtio/virtqueue.h
index e9992b745d..ed3b85080e 100644
--- a/drivers/net/virtio/virtqueue.h
+++ b/drivers/net/virtio/virtqueue.h
@@ -622,10 +622,12 @@ virtqueue_xmit_offload(struct virtio_net_hdr *hdr,
 			uint8_t offload)
 {
 	if (offload) {
+		uint64_t csum_l4 = cookie->ol_flags & PKT_TX_L4_MASK;
+
 		if (cookie->ol_flags & PKT_TX_TCP_SEG)
-			cookie->ol_flags |= PKT_TX_TCP_CKSUM;
+			csum_l4 |= PKT_TX_TCP_CKSUM;
 
-		switch (cookie->ol_flags & PKT_TX_L4_MASK) {
+		switch (csum_l4) {
 		case PKT_TX_UDP_CKSUM:
 			hdr->csum_start = cookie->l2_len + cookie->l3_len;
 			hdr->csum_offset = offsetof(struct rte_udp_hdr,
-- 
2.23.0



* [dpdk-dev] [PATCH v4 2/3] net/virtio: refactor Tx offload helper
  2021-05-03 16:43 ` [dpdk-dev] [PATCH v4 0/3] " David Marchand
  2021-05-03 16:43   ` [dpdk-dev] [PATCH v4 1/3] net/virtio: do not touch Tx offload flags David Marchand
@ 2021-05-03 16:43   ` David Marchand
  2021-05-03 16:43   ` [dpdk-dev] [PATCH v4 3/3] vhost: fix offload flags in Rx path David Marchand
  2021-05-04  8:29   ` [dpdk-dev] [PATCH v4 0/3] Offload flags fixes Maxime Coquelin
  3 siblings, 0 replies; 63+ messages in thread
From: David Marchand @ 2021-05-03 16:43 UTC (permalink / raw)
  To: dev
  Cc: maxime.coquelin, olivier.matz, fbl, i.maximets, chenbo.xia,
	ian.stokes, Ruifeng Wang, Bruce Richardson, Konstantin Ananyev,
	Jerin Jacob

Purely cosmetic but it is rather odd to have an "offload" helper that
checks if it actually must do something.
We already have the same checks in most callers, so move this branch
into them.
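
A rough sketch of the pattern (struct and function names here are
illustrative stand-ins, not the driver code): the helper now does
unconditional work and each caller carries the branch:

```c
#include <stdint.h>

struct net_hdr_sketch {
	uint16_t flags; /* stand-in for csum_start/csum_offset/flags */
};

/* After the refactor: the helper always fills the header. */
static void
xmit_offload(struct net_hdr_sketch *hdr)
{
	hdr->flags = 1;
}

/* The "has offload?" branch moved into the caller. */
static void
enqueue(struct net_hdr_sketch *hdr, int hw_has_tx_offload)
{
	if (hw_has_tx_offload)
		xmit_offload(hdr);
	else
		hdr->flags = 0; /* cleared-header path */
}
```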

Signed-off-by: David Marchand <david.marchand@redhat.com>
Reviewed-by: Flavio Leitner <fbl@sysclose.org>
Reviewed-by: Ruifeng Wang <ruifeng.wang@arm.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
---
 drivers/net/virtio/virtio_rxtx.c             |  7 +-
 drivers/net/virtio/virtio_rxtx_packed_avx.h  |  2 +-
 drivers/net/virtio/virtio_rxtx_packed_neon.h |  2 +-
 drivers/net/virtio/virtqueue.h               | 83 +++++++++-----------
 4 files changed, 44 insertions(+), 50 deletions(-)

diff --git a/drivers/net/virtio/virtio_rxtx.c b/drivers/net/virtio/virtio_rxtx.c
index 8df913b0ba..34108fb946 100644
--- a/drivers/net/virtio/virtio_rxtx.c
+++ b/drivers/net/virtio/virtio_rxtx.c
@@ -448,7 +448,7 @@ virtqueue_enqueue_xmit_inorder(struct virtnet_tx *txvq,
 		if (!vq->hw->has_tx_offload)
 			virtqueue_clear_net_hdr(hdr);
 		else
-			virtqueue_xmit_offload(hdr, cookies[i], true);
+			virtqueue_xmit_offload(hdr, cookies[i]);
 
 		start_dp[idx].addr  = rte_mbuf_data_iova(cookies[i]) - head_size;
 		start_dp[idx].len   = cookies[i]->data_len + head_size;
@@ -495,7 +495,7 @@ virtqueue_enqueue_xmit_packed_fast(struct virtnet_tx *txvq,
 	if (!vq->hw->has_tx_offload)
 		virtqueue_clear_net_hdr(hdr);
 	else
-		virtqueue_xmit_offload(hdr, cookie, true);
+		virtqueue_xmit_offload(hdr, cookie);
 
 	dp->addr = rte_mbuf_data_iova(cookie) - head_size;
 	dp->len  = cookie->data_len + head_size;
@@ -581,7 +581,8 @@ virtqueue_enqueue_xmit(struct virtnet_tx *txvq, struct rte_mbuf *cookie,
 		idx = start_dp[idx].next;
 	}
 
-	virtqueue_xmit_offload(hdr, cookie, vq->hw->has_tx_offload);
+	if (vq->hw->has_tx_offload)
+		virtqueue_xmit_offload(hdr, cookie);
 
 	do {
 		start_dp[idx].addr  = rte_mbuf_data_iova(cookie);
diff --git a/drivers/net/virtio/virtio_rxtx_packed_avx.h b/drivers/net/virtio/virtio_rxtx_packed_avx.h
index 228cf5437b..c819d2e4f2 100644
--- a/drivers/net/virtio/virtio_rxtx_packed_avx.h
+++ b/drivers/net/virtio/virtio_rxtx_packed_avx.h
@@ -115,7 +115,7 @@ virtqueue_enqueue_batch_packed_vec(struct virtnet_tx *txvq,
 		virtio_for_each_try_unroll(i, 0, PACKED_BATCH_SIZE) {
 			hdr = rte_pktmbuf_mtod_offset(tx_pkts[i],
 					struct virtio_net_hdr *, -head_size);
-			virtqueue_xmit_offload(hdr, tx_pkts[i], true);
+			virtqueue_xmit_offload(hdr, tx_pkts[i]);
 		}
 	}
 
diff --git a/drivers/net/virtio/virtio_rxtx_packed_neon.h b/drivers/net/virtio/virtio_rxtx_packed_neon.h
index d4257e68f0..f19e618635 100644
--- a/drivers/net/virtio/virtio_rxtx_packed_neon.h
+++ b/drivers/net/virtio/virtio_rxtx_packed_neon.h
@@ -134,7 +134,7 @@ virtqueue_enqueue_batch_packed_vec(struct virtnet_tx *txvq,
 		virtio_for_each_try_unroll(i, 0, PACKED_BATCH_SIZE) {
 			hdr = rte_pktmbuf_mtod_offset(tx_pkts[i],
 					struct virtio_net_hdr *, -head_size);
-			virtqueue_xmit_offload(hdr, tx_pkts[i], true);
+			virtqueue_xmit_offload(hdr, tx_pkts[i]);
 		}
 	}
 
diff --git a/drivers/net/virtio/virtqueue.h b/drivers/net/virtio/virtqueue.h
index ed3b85080e..03957b2bd0 100644
--- a/drivers/net/virtio/virtqueue.h
+++ b/drivers/net/virtio/virtqueue.h
@@ -617,52 +617,44 @@ virtqueue_notify(struct virtqueue *vq)
 } while (0)
 
 static inline void
-virtqueue_xmit_offload(struct virtio_net_hdr *hdr,
-			struct rte_mbuf *cookie,
-			uint8_t offload)
+virtqueue_xmit_offload(struct virtio_net_hdr *hdr, struct rte_mbuf *cookie)
 {
-	if (offload) {
-		uint64_t csum_l4 = cookie->ol_flags & PKT_TX_L4_MASK;
-
-		if (cookie->ol_flags & PKT_TX_TCP_SEG)
-			csum_l4 |= PKT_TX_TCP_CKSUM;
-
-		switch (csum_l4) {
-		case PKT_TX_UDP_CKSUM:
-			hdr->csum_start = cookie->l2_len + cookie->l3_len;
-			hdr->csum_offset = offsetof(struct rte_udp_hdr,
-				dgram_cksum);
-			hdr->flags = VIRTIO_NET_HDR_F_NEEDS_CSUM;
-			break;
-
-		case PKT_TX_TCP_CKSUM:
-			hdr->csum_start = cookie->l2_len + cookie->l3_len;
-			hdr->csum_offset = offsetof(struct rte_tcp_hdr, cksum);
-			hdr->flags = VIRTIO_NET_HDR_F_NEEDS_CSUM;
-			break;
-
-		default:
-			ASSIGN_UNLESS_EQUAL(hdr->csum_start, 0);
-			ASSIGN_UNLESS_EQUAL(hdr->csum_offset, 0);
-			ASSIGN_UNLESS_EQUAL(hdr->flags, 0);
-			break;
-		}
+	uint64_t csum_l4 = cookie->ol_flags & PKT_TX_L4_MASK;
+
+	if (cookie->ol_flags & PKT_TX_TCP_SEG)
+		csum_l4 |= PKT_TX_TCP_CKSUM;
+
+	switch (csum_l4) {
+	case PKT_TX_UDP_CKSUM:
+		hdr->csum_start = cookie->l2_len + cookie->l3_len;
+		hdr->csum_offset = offsetof(struct rte_udp_hdr, dgram_cksum);
+		hdr->flags = VIRTIO_NET_HDR_F_NEEDS_CSUM;
+		break;
+
+	case PKT_TX_TCP_CKSUM:
+		hdr->csum_start = cookie->l2_len + cookie->l3_len;
+		hdr->csum_offset = offsetof(struct rte_tcp_hdr, cksum);
+		hdr->flags = VIRTIO_NET_HDR_F_NEEDS_CSUM;
+		break;
+
+	default:
+		ASSIGN_UNLESS_EQUAL(hdr->csum_start, 0);
+		ASSIGN_UNLESS_EQUAL(hdr->csum_offset, 0);
+		ASSIGN_UNLESS_EQUAL(hdr->flags, 0);
+		break;
+	}
 
-		/* TCP Segmentation Offload */
-		if (cookie->ol_flags & PKT_TX_TCP_SEG) {
-			hdr->gso_type = (cookie->ol_flags & PKT_TX_IPV6) ?
-				VIRTIO_NET_HDR_GSO_TCPV6 :
-				VIRTIO_NET_HDR_GSO_TCPV4;
-			hdr->gso_size = cookie->tso_segsz;
-			hdr->hdr_len =
-				cookie->l2_len +
-				cookie->l3_len +
-				cookie->l4_len;
-		} else {
-			ASSIGN_UNLESS_EQUAL(hdr->gso_type, 0);
-			ASSIGN_UNLESS_EQUAL(hdr->gso_size, 0);
-			ASSIGN_UNLESS_EQUAL(hdr->hdr_len, 0);
-		}
+	/* TCP Segmentation Offload */
+	if (cookie->ol_flags & PKT_TX_TCP_SEG) {
+		hdr->gso_type = (cookie->ol_flags & PKT_TX_IPV6) ?
+			VIRTIO_NET_HDR_GSO_TCPV6 :
+			VIRTIO_NET_HDR_GSO_TCPV4;
+		hdr->gso_size = cookie->tso_segsz;
+		hdr->hdr_len = cookie->l2_len + cookie->l3_len + cookie->l4_len;
+	} else {
+		ASSIGN_UNLESS_EQUAL(hdr->gso_type, 0);
+		ASSIGN_UNLESS_EQUAL(hdr->gso_size, 0);
+		ASSIGN_UNLESS_EQUAL(hdr->hdr_len, 0);
 	}
 }
 
@@ -741,7 +733,8 @@ virtqueue_enqueue_xmit_packed(struct virtnet_tx *txvq, struct rte_mbuf *cookie,
 		}
 	}
 
-	virtqueue_xmit_offload(hdr, cookie, vq->hw->has_tx_offload);
+	if (vq->hw->has_tx_offload)
+		virtqueue_xmit_offload(hdr, cookie);
 
 	do {
 		uint16_t flags;
-- 
2.23.0



* [dpdk-dev] [PATCH v4 3/3] vhost: fix offload flags in Rx path
  2021-05-03 16:43 ` [dpdk-dev] [PATCH v4 0/3] " David Marchand
  2021-05-03 16:43   ` [dpdk-dev] [PATCH v4 1/3] net/virtio: do not touch Tx offload flags David Marchand
  2021-05-03 16:43   ` [dpdk-dev] [PATCH v4 2/3] net/virtio: refactor Tx offload helper David Marchand
@ 2021-05-03 16:43   ` David Marchand
  2021-05-04 11:07     ` Flavio Leitner
  2021-05-08  6:24     ` Wang, Yinan
  2021-05-04  8:29   ` [dpdk-dev] [PATCH v4 0/3] Offload flags fixes Maxime Coquelin
  3 siblings, 2 replies; 63+ messages in thread
From: David Marchand @ 2021-05-03 16:43 UTC (permalink / raw)
  To: dev
  Cc: maxime.coquelin, olivier.matz, fbl, i.maximets, chenbo.xia,
	ian.stokes, stable, Jijiang Liu, Yuanhan Liu

The vhost library currently configures Tx offloading (PKT_TX_*) on any
packet received from a guest virtio device that asks for some offloading.

This is problematic, as Tx offloading is something that the application
must ask for: the application needs to configure devices
to support every offload in use (IP/TCP checksumming, TSO...), and the
various l2/l3/l4 lengths must be set following any processing that
happened in the application itself.

On the other hand, the received packets are not marked with the current
l3/l4 checksum status.

Copy the virtio Rx processing to fix those offload flags, with some
differences:
- accept VIRTIO_NET_HDR_GSO_ECN and VIRTIO_NET_HDR_GSO_UDP,
- ignore anything but the VIRTIO_NET_HDR_F_NEEDS_CSUM flag (to comply with
  the virtio spec),

Some applications might rely on the current behavior, so it is left
untouched by default.
A new RTE_VHOST_USER_NET_COMPLIANT_OL_FLAGS flag is added to enable the
new behavior.

The vhost example has been updated for the new behavior: TSO is applied to
any packet marked LRO.
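
A self-contained sketch of the opt-in plumbing this patch adds (the flag
bit value is copied from the lib/vhost/vhost.h hunk; the struct and helper
names here are simplified stand-ins):

```c
#include <stdbool.h>
#include <stdint.h>

/* Bit value from the lib/vhost/vhost.h change in this patch. */
#define VIRTIO_DEV_LEGACY_OL_FLAGS ((uint32_t)1 << 5)

struct virtio_net_sketch {
	uint32_t flags;
};

/* Mirrors vhost_setup_virtio_net(): unless the application opts in to
 * the compliant behavior, the legacy Tx-flags-on-Rx behavior is kept. */
static void
setup_ol_flags(struct virtio_net_sketch *dev, bool compliant_ol_flags)
{
	if (!compliant_ol_flags)
		dev->flags |= VIRTIO_DEV_LEGACY_OL_FLAGS;
	else
		dev->flags &= ~VIRTIO_DEV_LEGACY_OL_FLAGS;
}
```

An application opts in by passing RTE_VHOST_USER_NET_COMPLIANT_OL_FLAGS to
rte_vhost_driver_register(), as the vhost PMD and vhost example do in this
patch.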

Fixes: 859b480d5afd ("vhost: add guest offload setting")
Cc: stable@dpdk.org

Signed-off-by: David Marchand <david.marchand@redhat.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
---
Changes since v3:
- rebased on next-virtio,

Changes since v2:
- introduced a new flag to keep existing behavior as the default,
- packets with unrecognised offload are passed to the application with no
  offload metadata rather than dropped,
- ignored VIRTIO_NET_HDR_F_DATA_VALID since the virtio spec states that
  the virtio driver is not allowed to use this flag when transmitting
  packets,

Changes since v1:
- updated vhost example,
- restored VIRTIO_NET_HDR_GSO_ECN and VIRTIO_NET_HDR_GSO_UDP support,
- restored log on buggy offload request,

---
 doc/guides/prog_guide/vhost_lib.rst    |  12 ++
 doc/guides/rel_notes/release_21_05.rst |   6 +
 drivers/net/vhost/rte_eth_vhost.c      |   2 +-
 examples/vhost/main.c                  |  44 +++---
 lib/vhost/rte_vhost.h                  |   1 +
 lib/vhost/socket.c                     |   5 +-
 lib/vhost/vhost.c                      |   6 +-
 lib/vhost/vhost.h                      |  14 +-
 lib/vhost/virtio_net.c                 | 185 ++++++++++++++++++++++---
 9 files changed, 222 insertions(+), 53 deletions(-)

diff --git a/doc/guides/prog_guide/vhost_lib.rst b/doc/guides/prog_guide/vhost_lib.rst
index 7afa351675..d18fb98910 100644
--- a/doc/guides/prog_guide/vhost_lib.rst
+++ b/doc/guides/prog_guide/vhost_lib.rst
@@ -118,6 +118,18 @@ The following is an overview of some key Vhost API functions:
 
     It is disabled by default.
 
+  - ``RTE_VHOST_USER_NET_COMPLIANT_OL_FLAGS``
+
+    Since v16.04, the vhost library forwards checksum and gso requests for
+    packets received from a virtio driver by filling Tx offload metadata in
+    the mbuf. This behavior is inconsistent with other drivers but it is left
+    untouched for existing applications that might rely on it.
+
+    This flag disables the legacy behavior and instead asks vhost to simply
+    populate Rx offload metadata in the mbuf.
+
+    It is disabled by default.
+
 * ``rte_vhost_driver_set_features(path, features)``
 
   This function sets the feature bits the vhost-user driver supports. The
diff --git a/doc/guides/rel_notes/release_21_05.rst b/doc/guides/rel_notes/release_21_05.rst
index a5f21f8425..6b7b0810a5 100644
--- a/doc/guides/rel_notes/release_21_05.rst
+++ b/doc/guides/rel_notes/release_21_05.rst
@@ -337,6 +337,12 @@ API Changes
   ``policer_action_recolor_supported`` and ``policer_action_drop_supported``
   have been removed.
 
+* vhost: The vhost library currently populates received mbufs from a virtio
+  driver with Tx offload flags while not filling Rx offload flags.
+  While this behavior is arguable, it is kept untouched.
+  A new flag ``RTE_VHOST_USER_NET_COMPLIANT_OL_FLAGS`` has been added to ask
+  for a behavior compliant with the mbuf offload API.
+
 
 ABI Changes
 -----------
diff --git a/drivers/net/vhost/rte_eth_vhost.c b/drivers/net/vhost/rte_eth_vhost.c
index d198fc8a8e..281379d6a3 100644
--- a/drivers/net/vhost/rte_eth_vhost.c
+++ b/drivers/net/vhost/rte_eth_vhost.c
@@ -1505,7 +1505,7 @@ rte_pmd_vhost_probe(struct rte_vdev_device *dev)
 	int ret = 0;
 	char *iface_name;
 	uint16_t queues;
-	uint64_t flags = 0;
+	uint64_t flags = RTE_VHOST_USER_NET_COMPLIANT_OL_FLAGS;
 	uint64_t disable_flags = 0;
 	int client_mode = 0;
 	int iommu_support = 0;
diff --git a/examples/vhost/main.c b/examples/vhost/main.c
index 0bee1f3321..d2179eadb9 100644
--- a/examples/vhost/main.c
+++ b/examples/vhost/main.c
@@ -19,6 +19,7 @@
 #include <rte_log.h>
 #include <rte_string_fns.h>
 #include <rte_malloc.h>
+#include <rte_net.h>
 #include <rte_vhost.h>
 #include <rte_ip.h>
 #include <rte_tcp.h>
@@ -1029,33 +1030,34 @@ find_local_dest(struct vhost_dev *vdev, struct rte_mbuf *m,
 	return 0;
 }
 
-static uint16_t
-get_psd_sum(void *l3_hdr, uint64_t ol_flags)
-{
-	if (ol_flags & PKT_TX_IPV4)
-		return rte_ipv4_phdr_cksum(l3_hdr, ol_flags);
-	else /* assume ethertype == RTE_ETHER_TYPE_IPV6 */
-		return rte_ipv6_phdr_cksum(l3_hdr, ol_flags);
-}
-
 static void virtio_tx_offload(struct rte_mbuf *m)
 {
+	struct rte_net_hdr_lens hdr_lens;
+	struct rte_ipv4_hdr *ipv4_hdr;
+	struct rte_tcp_hdr *tcp_hdr;
+	uint32_t ptype;
 	void *l3_hdr;
-	struct rte_ipv4_hdr *ipv4_hdr = NULL;
-	struct rte_tcp_hdr *tcp_hdr = NULL;
-	struct rte_ether_hdr *eth_hdr =
-		rte_pktmbuf_mtod(m, struct rte_ether_hdr *);
 
-	l3_hdr = (char *)eth_hdr + m->l2_len;
+	ptype = rte_net_get_ptype(m, &hdr_lens, RTE_PTYPE_ALL_MASK);
+	m->l2_len = hdr_lens.l2_len;
+	m->l3_len = hdr_lens.l3_len;
+	m->l4_len = hdr_lens.l4_len;
 
-	if (m->ol_flags & PKT_TX_IPV4) {
+	l3_hdr = rte_pktmbuf_mtod_offset(m, void *, m->l2_len);
+	tcp_hdr = rte_pktmbuf_mtod_offset(m, struct rte_tcp_hdr *,
+		m->l2_len + m->l3_len);
+
+	m->ol_flags |= PKT_TX_TCP_SEG;
+	if ((ptype & RTE_PTYPE_L3_MASK) == RTE_PTYPE_L3_IPV4) {
+		m->ol_flags |= PKT_TX_IPV4;
+		m->ol_flags |= PKT_TX_IP_CKSUM;
 		ipv4_hdr = l3_hdr;
 		ipv4_hdr->hdr_checksum = 0;
-		m->ol_flags |= PKT_TX_IP_CKSUM;
+		tcp_hdr->cksum = rte_ipv4_phdr_cksum(l3_hdr, m->ol_flags);
+	} else { /* assume ethertype == RTE_ETHER_TYPE_IPV6 */
+		m->ol_flags |= PKT_TX_IPV6;
+		tcp_hdr->cksum = rte_ipv6_phdr_cksum(l3_hdr, m->ol_flags);
 	}
-
-	tcp_hdr = (struct rte_tcp_hdr *)((char *)l3_hdr + m->l3_len);
-	tcp_hdr->cksum = get_psd_sum(l3_hdr, m->ol_flags);
 }
 
 static __rte_always_inline void
@@ -1148,7 +1150,7 @@ virtio_tx_route(struct vhost_dev *vdev, struct rte_mbuf *m, uint16_t vlan_tag)
 		m->vlan_tci = vlan_tag;
 	}
 
-	if (m->ol_flags & PKT_TX_TCP_SEG)
+	if (m->ol_flags & PKT_RX_LRO)
 		virtio_tx_offload(m);
 
 	tx_q->m_table[tx_q->len++] = m;
@@ -1633,7 +1635,7 @@ main(int argc, char *argv[])
 	int ret, i;
 	uint16_t portid;
 	static pthread_t tid;
-	uint64_t flags = 0;
+	uint64_t flags = RTE_VHOST_USER_NET_COMPLIANT_OL_FLAGS;
 
 	signal(SIGINT, sigint_handler);
 
diff --git a/lib/vhost/rte_vhost.h b/lib/vhost/rte_vhost.h
index d0a8ae31f2..8d875e9322 100644
--- a/lib/vhost/rte_vhost.h
+++ b/lib/vhost/rte_vhost.h
@@ -36,6 +36,7 @@ extern "C" {
 /* support only linear buffers (no chained mbufs) */
 #define RTE_VHOST_USER_LINEARBUF_SUPPORT	(1ULL << 6)
 #define RTE_VHOST_USER_ASYNC_COPY	(1ULL << 7)
+#define RTE_VHOST_USER_NET_COMPLIANT_OL_FLAGS	(1ULL << 8)
 
 /* Features. */
 #ifndef VIRTIO_NET_F_GUEST_ANNOUNCE
diff --git a/lib/vhost/socket.c b/lib/vhost/socket.c
index 0169d36481..5d0d728d52 100644
--- a/lib/vhost/socket.c
+++ b/lib/vhost/socket.c
@@ -42,6 +42,7 @@ struct vhost_user_socket {
 	bool extbuf;
 	bool linearbuf;
 	bool async_copy;
+	bool net_compliant_ol_flags;
 
 	/*
 	 * The "supported_features" indicates the feature bits the
@@ -224,7 +225,8 @@ vhost_user_add_connection(int fd, struct vhost_user_socket *vsocket)
 	size = strnlen(vsocket->path, PATH_MAX);
 	vhost_set_ifname(vid, vsocket->path, size);
 
-	vhost_set_builtin_virtio_net(vid, vsocket->use_builtin_virtio_net);
+	vhost_setup_virtio_net(vid, vsocket->use_builtin_virtio_net,
+		vsocket->net_compliant_ol_flags);
 
 	vhost_attach_vdpa_device(vid, vsocket->vdpa_dev);
 
@@ -877,6 +879,7 @@ rte_vhost_driver_register(const char *path, uint64_t flags)
 	vsocket->extbuf = flags & RTE_VHOST_USER_EXTBUF_SUPPORT;
 	vsocket->linearbuf = flags & RTE_VHOST_USER_LINEARBUF_SUPPORT;
 	vsocket->async_copy = flags & RTE_VHOST_USER_ASYNC_COPY;
+	vsocket->net_compliant_ol_flags = flags & RTE_VHOST_USER_NET_COMPLIANT_OL_FLAGS;
 
 	if (vsocket->async_copy &&
 		(flags & (RTE_VHOST_USER_IOMMU_SUPPORT |
diff --git a/lib/vhost/vhost.c b/lib/vhost/vhost.c
index c9b6379f73..9abfc0bfe7 100644
--- a/lib/vhost/vhost.c
+++ b/lib/vhost/vhost.c
@@ -752,7 +752,7 @@ vhost_set_ifname(int vid, const char *if_name, unsigned int if_len)
 }
 
 void
-vhost_set_builtin_virtio_net(int vid, bool enable)
+vhost_setup_virtio_net(int vid, bool enable, bool compliant_ol_flags)
 {
 	struct virtio_net *dev = get_device(vid);
 
@@ -763,6 +763,10 @@ vhost_set_builtin_virtio_net(int vid, bool enable)
 		dev->flags |= VIRTIO_DEV_BUILTIN_VIRTIO_NET;
 	else
 		dev->flags &= ~VIRTIO_DEV_BUILTIN_VIRTIO_NET;
+	if (!compliant_ol_flags)
+		dev->flags |= VIRTIO_DEV_LEGACY_OL_FLAGS;
+	else
+		dev->flags &= ~VIRTIO_DEV_LEGACY_OL_FLAGS;
 }
 
 void
diff --git a/lib/vhost/vhost.h b/lib/vhost/vhost.h
index b303635645..8078ddff79 100644
--- a/lib/vhost/vhost.h
+++ b/lib/vhost/vhost.h
@@ -27,15 +27,17 @@
 #include "rte_vhost_async.h"
 
 /* Used to indicate that the device is running on a data core */
-#define VIRTIO_DEV_RUNNING 1
+#define VIRTIO_DEV_RUNNING ((uint32_t)1 << 0)
 /* Used to indicate that the device is ready to operate */
-#define VIRTIO_DEV_READY 2
+#define VIRTIO_DEV_READY ((uint32_t)1 << 1)
 /* Used to indicate that the built-in vhost net device backend is enabled */
-#define VIRTIO_DEV_BUILTIN_VIRTIO_NET 4
+#define VIRTIO_DEV_BUILTIN_VIRTIO_NET ((uint32_t)1 << 2)
 /* Used to indicate that the device has its own data path and configured */
-#define VIRTIO_DEV_VDPA_CONFIGURED 8
+#define VIRTIO_DEV_VDPA_CONFIGURED ((uint32_t)1 << 3)
 /* Used to indicate that the feature negotiation failed */
-#define VIRTIO_DEV_FEATURES_FAILED 16
+#define VIRTIO_DEV_FEATURES_FAILED ((uint32_t)1 << 4)
+/* Used to indicate that the virtio_net tx code should fill TX ol_flags */
+#define VIRTIO_DEV_LEGACY_OL_FLAGS ((uint32_t)1 << 5)
 
 /* Backend value set by guest. */
 #define VIRTIO_DEV_STOPPED -1
@@ -683,7 +685,7 @@ int alloc_vring_queue(struct virtio_net *dev, uint32_t vring_idx);
 void vhost_attach_vdpa_device(int vid, struct rte_vdpa_device *dev);
 
 void vhost_set_ifname(int, const char *if_name, unsigned int if_len);
-void vhost_set_builtin_virtio_net(int vid, bool enable);
+void vhost_setup_virtio_net(int vid, bool enable, bool legacy_ol_flags);
 void vhost_enable_extbuf(int vid);
 void vhost_enable_linearbuf(int vid);
 int vhost_enable_guest_notification(struct virtio_net *dev,
diff --git a/lib/vhost/virtio_net.c b/lib/vhost/virtio_net.c
index 1a34867f3c..8e36f4c340 100644
--- a/lib/vhost/virtio_net.c
+++ b/lib/vhost/virtio_net.c
@@ -8,6 +8,7 @@
 
 #include <rte_mbuf.h>
 #include <rte_memcpy.h>
+#include <rte_net.h>
 #include <rte_ether.h>
 #include <rte_ip.h>
 #include <rte_vhost.h>
@@ -2303,15 +2304,12 @@ parse_ethernet(struct rte_mbuf *m, uint16_t *l4_proto, void **l4_hdr)
 }
 
 static __rte_always_inline void
-vhost_dequeue_offload(struct virtio_net_hdr *hdr, struct rte_mbuf *m)
+vhost_dequeue_offload_legacy(struct virtio_net_hdr *hdr, struct rte_mbuf *m)
 {
 	uint16_t l4_proto = 0;
 	void *l4_hdr = NULL;
 	struct rte_tcp_hdr *tcp_hdr = NULL;
 
-	if (hdr->flags == 0 && hdr->gso_type == VIRTIO_NET_HDR_GSO_NONE)
-		return;
-
 	parse_ethernet(m, &l4_proto, &l4_hdr);
 	if (hdr->flags == VIRTIO_NET_HDR_F_NEEDS_CSUM) {
 		if (hdr->csum_start == (m->l2_len + m->l3_len)) {
@@ -2356,6 +2354,94 @@ vhost_dequeue_offload(struct virtio_net_hdr *hdr, struct rte_mbuf *m)
 	}
 }
 
+static __rte_always_inline void
+vhost_dequeue_offload(struct virtio_net_hdr *hdr, struct rte_mbuf *m,
+	bool legacy_ol_flags)
+{
+	struct rte_net_hdr_lens hdr_lens;
+	int l4_supported = 0;
+	uint32_t ptype;
+
+	if (hdr->flags == 0 && hdr->gso_type == VIRTIO_NET_HDR_GSO_NONE)
+		return;
+
+	if (legacy_ol_flags) {
+		vhost_dequeue_offload_legacy(hdr, m);
+		return;
+	}
+
+	m->ol_flags |= PKT_RX_IP_CKSUM_UNKNOWN;
+
+	ptype = rte_net_get_ptype(m, &hdr_lens, RTE_PTYPE_ALL_MASK);
+	m->packet_type = ptype;
+	if ((ptype & RTE_PTYPE_L4_MASK) == RTE_PTYPE_L4_TCP ||
+	    (ptype & RTE_PTYPE_L4_MASK) == RTE_PTYPE_L4_UDP ||
+	    (ptype & RTE_PTYPE_L4_MASK) == RTE_PTYPE_L4_SCTP)
+		l4_supported = 1;
+
+	/* According to Virtio 1.1 spec, the device only needs to look at
+	 * VIRTIO_NET_HDR_F_NEEDS_CSUM in the packet transmission path.
+	 * This differs from the path for processing incoming packets, where
+	 * the driver could rely on the VIRTIO_NET_HDR_F_DATA_VALID flag set
+	 * by the device.
+	 *
+	 * 5.1.6.2.1 Driver Requirements: Packet Transmission
+	 * The driver MUST NOT set the VIRTIO_NET_HDR_F_DATA_VALID and
+	 * VIRTIO_NET_HDR_F_RSC_INFO bits in flags.
+	 *
+	 * 5.1.6.2.2 Device Requirements: Packet Transmission
+	 * The device MUST ignore flag bits that it does not recognize.
+	 */
+	if (hdr->flags & VIRTIO_NET_HDR_F_NEEDS_CSUM) {
+		uint32_t hdrlen;
+
+		hdrlen = hdr_lens.l2_len + hdr_lens.l3_len + hdr_lens.l4_len;
+		if (hdr->csum_start <= hdrlen && l4_supported != 0) {
+			m->ol_flags |= PKT_RX_L4_CKSUM_NONE;
+		} else {
+			/* Unknown proto or tunnel, do sw cksum. We can assume
+			 * the cksum field is in the first segment since the
+			 * buffers we provided to the host are large enough.
+			 * In case of SCTP, this will be wrong since it's a CRC
+			 * but there's nothing we can do.
+			 */
+			uint16_t csum = 0, off;
+
+			if (rte_raw_cksum_mbuf(m, hdr->csum_start,
+					rte_pktmbuf_pkt_len(m) - hdr->csum_start, &csum) < 0)
+				return;
+			if (likely(csum != 0xffff))
+				csum = ~csum;
+			off = hdr->csum_offset + hdr->csum_start;
+			if (rte_pktmbuf_data_len(m) >= off + 1)
+				*rte_pktmbuf_mtod_offset(m, uint16_t *, off) = csum;
+		}
+	}
+
+	if (hdr->gso_type != VIRTIO_NET_HDR_GSO_NONE) {
+		if (hdr->gso_size == 0)
+			return;
+
+		switch (hdr->gso_type & ~VIRTIO_NET_HDR_GSO_ECN) {
+		case VIRTIO_NET_HDR_GSO_TCPV4:
+		case VIRTIO_NET_HDR_GSO_TCPV6:
+			if ((ptype & RTE_PTYPE_L4_MASK) != RTE_PTYPE_L4_TCP)
+				break;
+			m->ol_flags |= PKT_RX_LRO | PKT_RX_L4_CKSUM_NONE;
+			m->tso_segsz = hdr->gso_size;
+			break;
+		case VIRTIO_NET_HDR_GSO_UDP:
+			if ((ptype & RTE_PTYPE_L4_MASK) != RTE_PTYPE_L4_UDP)
+				break;
+			m->ol_flags |= PKT_RX_LRO | PKT_RX_L4_CKSUM_NONE;
+			m->tso_segsz = hdr->gso_size;
+			break;
+		default:
+			break;
+		}
+	}
+}
+
 static __rte_noinline void
 copy_vnet_hdr_from_desc(struct virtio_net_hdr *hdr,
 		struct buf_vector *buf_vec)
@@ -2380,7 +2466,8 @@ copy_vnet_hdr_from_desc(struct virtio_net_hdr *hdr,
 static __rte_always_inline int
 copy_desc_to_mbuf(struct virtio_net *dev, struct vhost_virtqueue *vq,
 		  struct buf_vector *buf_vec, uint16_t nr_vec,
-		  struct rte_mbuf *m, struct rte_mempool *mbuf_pool)
+		  struct rte_mbuf *m, struct rte_mempool *mbuf_pool,
+		  bool legacy_ol_flags)
 {
 	uint32_t buf_avail, buf_offset;
 	uint64_t buf_addr, buf_len;
@@ -2513,7 +2600,7 @@ copy_desc_to_mbuf(struct virtio_net *dev, struct vhost_virtqueue *vq,
 	m->pkt_len    += mbuf_offset;
 
 	if (hdr)
-		vhost_dequeue_offload(hdr, m);
+		vhost_dequeue_offload(hdr, m, legacy_ol_flags);
 
 out:
 
@@ -2606,9 +2693,11 @@ virtio_dev_pktmbuf_alloc(struct virtio_net *dev, struct rte_mempool *mp,
 	return pkt;
 }
 
-static __rte_noinline uint16_t
+__rte_always_inline
+static uint16_t
 virtio_dev_tx_split(struct virtio_net *dev, struct vhost_virtqueue *vq,
-	struct rte_mempool *mbuf_pool, struct rte_mbuf **pkts, uint16_t count)
+	struct rte_mempool *mbuf_pool, struct rte_mbuf **pkts, uint16_t count,
+	bool legacy_ol_flags)
 {
 	uint16_t i;
 	uint16_t free_entries;
@@ -2668,7 +2757,7 @@ virtio_dev_tx_split(struct virtio_net *dev, struct vhost_virtqueue *vq,
 		}
 
 		err = copy_desc_to_mbuf(dev, vq, buf_vec, nr_vec, pkts[i],
-				mbuf_pool);
+				mbuf_pool, legacy_ol_flags);
 		if (unlikely(err)) {
 			rte_pktmbuf_free(pkts[i]);
 			if (!allocerr_warned) {
@@ -2696,6 +2785,24 @@ virtio_dev_tx_split(struct virtio_net *dev, struct vhost_virtqueue *vq,
 	return (i - dropped);
 }
 
+__rte_noinline
+static uint16_t
+virtio_dev_tx_split_legacy(struct virtio_net *dev,
+	struct vhost_virtqueue *vq, struct rte_mempool *mbuf_pool,
+	struct rte_mbuf **pkts, uint16_t count)
+{
+	return virtio_dev_tx_split(dev, vq, mbuf_pool, pkts, count, true);
+}
+
+__rte_noinline
+static uint16_t
+virtio_dev_tx_split_compliant(struct virtio_net *dev,
+	struct vhost_virtqueue *vq, struct rte_mempool *mbuf_pool,
+	struct rte_mbuf **pkts, uint16_t count)
+{
+	return virtio_dev_tx_split(dev, vq, mbuf_pool, pkts, count, false);
+}
+
 static __rte_always_inline int
 vhost_reserve_avail_batch_packed(struct virtio_net *dev,
 				 struct vhost_virtqueue *vq,
@@ -2770,7 +2877,8 @@ vhost_reserve_avail_batch_packed(struct virtio_net *dev,
 static __rte_always_inline int
 virtio_dev_tx_batch_packed(struct virtio_net *dev,
 			   struct vhost_virtqueue *vq,
-			   struct rte_mbuf **pkts)
+			   struct rte_mbuf **pkts,
+			   bool legacy_ol_flags)
 {
 	uint16_t avail_idx = vq->last_avail_idx;
 	uint32_t buf_offset = sizeof(struct virtio_net_hdr_mrg_rxbuf);
@@ -2794,7 +2902,7 @@ virtio_dev_tx_batch_packed(struct virtio_net *dev,
 	if (virtio_net_with_host_offload(dev)) {
 		vhost_for_each_try_unroll(i, 0, PACKED_BATCH_SIZE) {
 			hdr = (struct virtio_net_hdr *)(desc_addrs[i]);
-			vhost_dequeue_offload(hdr, pkts[i]);
+			vhost_dequeue_offload(hdr, pkts[i], legacy_ol_flags);
 		}
 	}
 
@@ -2815,7 +2923,8 @@ vhost_dequeue_single_packed(struct virtio_net *dev,
 			    struct rte_mempool *mbuf_pool,
 			    struct rte_mbuf *pkts,
 			    uint16_t *buf_id,
-			    uint16_t *desc_count)
+			    uint16_t *desc_count,
+			    bool legacy_ol_flags)
 {
 	struct buf_vector buf_vec[BUF_VECTOR_MAX];
 	uint32_t buf_len;
@@ -2841,7 +2950,7 @@ vhost_dequeue_single_packed(struct virtio_net *dev,
 	}
 
 	err = copy_desc_to_mbuf(dev, vq, buf_vec, nr_vec, pkts,
-				mbuf_pool);
+				mbuf_pool, legacy_ol_flags);
 	if (unlikely(err)) {
 		if (!allocerr_warned) {
 			VHOST_LOG_DATA(ERR,
@@ -2859,14 +2968,15 @@ static __rte_always_inline int
 virtio_dev_tx_single_packed(struct virtio_net *dev,
 			    struct vhost_virtqueue *vq,
 			    struct rte_mempool *mbuf_pool,
-			    struct rte_mbuf *pkts)
+			    struct rte_mbuf *pkts,
+			    bool legacy_ol_flags)
 {
 
 	uint16_t buf_id, desc_count = 0;
 	int ret;
 
 	ret = vhost_dequeue_single_packed(dev, vq, mbuf_pool, pkts, &buf_id,
-					&desc_count);
+					&desc_count, legacy_ol_flags);
 
 	if (likely(desc_count > 0)) {
 		if (virtio_net_is_inorder(dev))
@@ -2882,12 +2992,14 @@ virtio_dev_tx_single_packed(struct virtio_net *dev,
 	return ret;
 }
 
-static __rte_noinline uint16_t
+__rte_always_inline
+static uint16_t
 virtio_dev_tx_packed(struct virtio_net *dev,
 		     struct vhost_virtqueue *__rte_restrict vq,
 		     struct rte_mempool *mbuf_pool,
 		     struct rte_mbuf **__rte_restrict pkts,
-		     uint32_t count)
+		     uint32_t count,
+		     bool legacy_ol_flags)
 {
 	uint32_t pkt_idx = 0;
 
@@ -2899,14 +3011,16 @@ virtio_dev_tx_packed(struct virtio_net *dev,
 
 		if (count - pkt_idx >= PACKED_BATCH_SIZE) {
 			if (!virtio_dev_tx_batch_packed(dev, vq,
-							&pkts[pkt_idx])) {
+							&pkts[pkt_idx],
+							legacy_ol_flags)) {
 				pkt_idx += PACKED_BATCH_SIZE;
 				continue;
 			}
 		}
 
 		if (virtio_dev_tx_single_packed(dev, vq, mbuf_pool,
-						pkts[pkt_idx]))
+						pkts[pkt_idx],
+						legacy_ol_flags))
 			break;
 		pkt_idx++;
 	} while (pkt_idx < count);
@@ -2924,6 +3038,24 @@ virtio_dev_tx_packed(struct virtio_net *dev,
 	return pkt_idx;
 }
 
+__rte_noinline
+static uint16_t
+virtio_dev_tx_packed_legacy(struct virtio_net *dev,
+	struct vhost_virtqueue *__rte_restrict vq, struct rte_mempool *mbuf_pool,
+	struct rte_mbuf **__rte_restrict pkts, uint32_t count)
+{
+	return virtio_dev_tx_packed(dev, vq, mbuf_pool, pkts, count, true);
+}
+
+__rte_noinline
+static uint16_t
+virtio_dev_tx_packed_compliant(struct virtio_net *dev,
+	struct vhost_virtqueue *__rte_restrict vq, struct rte_mempool *mbuf_pool,
+	struct rte_mbuf **__rte_restrict pkts, uint32_t count)
+{
+	return virtio_dev_tx_packed(dev, vq, mbuf_pool, pkts, count, false);
+}
+
 uint16_t
 rte_vhost_dequeue_burst(int vid, uint16_t queue_id,
 	struct rte_mempool *mbuf_pool, struct rte_mbuf **pkts, uint16_t count)
@@ -2999,10 +3131,17 @@ rte_vhost_dequeue_burst(int vid, uint16_t queue_id,
 		count -= 1;
 	}
 
-	if (vq_is_packed(dev))
-		count = virtio_dev_tx_packed(dev, vq, mbuf_pool, pkts, count);
-	else
-		count = virtio_dev_tx_split(dev, vq, mbuf_pool, pkts, count);
+	if (vq_is_packed(dev)) {
+		if (dev->flags & VIRTIO_DEV_LEGACY_OL_FLAGS)
+			count = virtio_dev_tx_packed_legacy(dev, vq, mbuf_pool, pkts, count);
+		else
+			count = virtio_dev_tx_packed_compliant(dev, vq, mbuf_pool, pkts, count);
+	} else {
+		if (dev->flags & VIRTIO_DEV_LEGACY_OL_FLAGS)
+			count = virtio_dev_tx_split_legacy(dev, vq, mbuf_pool, pkts, count);
+		else
+			count = virtio_dev_tx_split_compliant(dev, vq, mbuf_pool, pkts, count);
+	}
 
 out:
 	if (dev->features & (1ULL << VIRTIO_F_IOMMU_PLATFORM))
-- 
2.23.0


^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: [dpdk-dev] [PATCH v4 0/3] Offload flags fixes
  2021-05-03 16:43 ` [dpdk-dev] [PATCH v4 0/3] " David Marchand
                     ` (2 preceding siblings ...)
  2021-05-03 16:43   ` [dpdk-dev] [PATCH v4 3/3] vhost: fix offload flags in Rx path David Marchand
@ 2021-05-04  8:29   ` Maxime Coquelin
  3 siblings, 0 replies; 63+ messages in thread
From: Maxime Coquelin @ 2021-05-04  8:29 UTC (permalink / raw)
  To: David Marchand, dev; +Cc: olivier.matz, fbl, i.maximets, chenbo.xia, ian.stokes



On 5/3/21 6:43 PM, David Marchand wrote:
> The important part is the last patch on vhost handling of offloading
> requests coming from a virtio guest interface.
> 
> The rest are small fixes that I accumulated while reviewing the mbuf
> offload flags.
> 
> On this last patch, it has the potential of breaking existing
> applications using the vhost library (OVS being impacted).
> I did not mark it for backport.
> 
> Changes since v3:
> - patch 1 went through the main repo,
> - rebased on next-virtio,
> 
> Changes since v2:
> - kept behavior untouched (to avoid breaking ABI) and introduced a new
>   flag to select the new behavior,
> 
> Changes since v1:
> - dropped patch on net/tap,
> - added missing bits in example/vhost,
> - relaxed checks on VIRTIO_NET_HDR_GSO_ECN and VIRTIO_NET_HDR_GSO_UDP,
> 


Applied to dpdk-next-virtio/main.

Thanks,
Maxime


^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: [dpdk-dev] [PATCH v4 3/3] vhost: fix offload flags in Rx path
  2021-05-03 16:43   ` [dpdk-dev] [PATCH v4 3/3] vhost: fix offload flags in Rx path David Marchand
@ 2021-05-04 11:07     ` Flavio Leitner
  2021-05-08  6:24     ` Wang, Yinan
  1 sibling, 0 replies; 63+ messages in thread
From: Flavio Leitner @ 2021-05-04 11:07 UTC (permalink / raw)
  To: David Marchand
  Cc: dev, maxime.coquelin, olivier.matz, i.maximets, chenbo.xia,
	ian.stokes, stable, Jijiang Liu, Yuanhan Liu

On Mon, May 03, 2021 at 06:43:44PM +0200, David Marchand wrote:
> The vhost library currently configures Tx offloading (PKT_TX_*) on any
> packet received from a guest virtio device which asks for some offloading.
> 
> This is problematic, as Tx offloading is something that the application
> must ask for: the application needs to configure devices
> to support every used offloads (ip, tcp checksumming, tso..), and the
> various l2/l3/l4 lengths must be set following any processing that
> happened in the application itself.
> 
> On the other hand, the received packets are not marked wrt current
> packet l3/l4 checksumming info.
> 
> Copy virtio rx processing to fix those offload flags with some
> differences:
> - accept VIRTIO_NET_HDR_GSO_ECN and VIRTIO_NET_HDR_GSO_UDP,
> - ignore anything but the VIRTIO_NET_HDR_F_NEEDS_CSUM flag (to comply with
>   the virtio spec),
> 
> Some applications might rely on the current behavior, so it is left
> untouched by default.
> A new RTE_VHOST_USER_NET_COMPLIANT_OL_FLAGS flag is added to enable the
> new behavior.
> 
> The vhost example has been updated for the new behavior: TSO is applied to
> any packet marked LRO.
> 
> Fixes: 859b480d5afd ("vhost: add guest offload setting")
> Cc: stable@dpdk.org
> 
> Signed-off-by: David Marchand <david.marchand@redhat.com>
> Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
> ---
> Changes since v3:
> - rebased on next-virtio,
> 
> Changes since v2:
> - introduced a new flag to keep existing behavior as the default,
> - packets with unrecognised offload are passed to the application with no
>   offload metadata rather than dropped,
> - ignored VIRTIO_NET_HDR_F_DATA_VALID since the virtio spec states that
>   the virtio driver is not allowed to use this flag when transmitting
>   packets,
> 
> Changes since v1:
> - updated vhost example,
> - restored VIRTIO_NET_HDR_GSO_ECN and VIRTIO_NET_HDR_GSO_UDP support,
> - restored log on buggy offload request,
> 
> ---
>  doc/guides/prog_guide/vhost_lib.rst    |  12 ++
>  doc/guides/rel_notes/release_21_05.rst |   6 +
>  drivers/net/vhost/rte_eth_vhost.c      |   2 +-
>  examples/vhost/main.c                  |  44 +++---
>  lib/vhost/rte_vhost.h                  |   1 +
>  lib/vhost/socket.c                     |   5 +-
>  lib/vhost/vhost.c                      |   6 +-
>  lib/vhost/vhost.h                      |  14 +-
>  lib/vhost/virtio_net.c                 | 185 ++++++++++++++++++++++---
>  9 files changed, 222 insertions(+), 53 deletions(-)
> 
> diff --git a/doc/guides/prog_guide/vhost_lib.rst b/doc/guides/prog_guide/vhost_lib.rst
> index 7afa351675..d18fb98910 100644
> --- a/doc/guides/prog_guide/vhost_lib.rst
> +++ b/doc/guides/prog_guide/vhost_lib.rst
> @@ -118,6 +118,18 @@ The following is an overview of some key Vhost API functions:
>  
>      It is disabled by default.
>  
> +  - ``RTE_VHOST_USER_NET_COMPLIANT_OL_FLAGS``
> +
> +    Since v16.04, the vhost library forwards checksum and gso requests for
> +    packets received from a virtio driver by filling Tx offload metadata in
> +    the mbuf. This behavior is inconsistent with other drivers but it is left
> +    untouched for existing applications that might rely on it.
> +
> +    This flag disables the legacy behavior and instead asks vhost to simply
> +    populate Rx offload metadata in the mbuf.
> +
> +    It is disabled by default.
> +
>  * ``rte_vhost_driver_set_features(path, features)``
>  
>    This function sets the feature bits the vhost-user driver supports. The
> diff --git a/doc/guides/rel_notes/release_21_05.rst b/doc/guides/rel_notes/release_21_05.rst
> index a5f21f8425..6b7b0810a5 100644
> --- a/doc/guides/rel_notes/release_21_05.rst
> +++ b/doc/guides/rel_notes/release_21_05.rst
> @@ -337,6 +337,12 @@ API Changes
>    ``policer_action_recolor_supported`` and ``policer_action_drop_supported``
>    have been removed.
>  
> +* vhost: The vhost library currently populates received mbufs from a virtio
> +  driver with Tx offload flags while not filling Rx offload flags.
> +  While this behavior is arguable, it is kept untouched.
> +  A new flag ``RTE_VHOST_USER_NET_COMPLIANT_OL_FLAGS`` has been added to ask
> +  for a behavior compliant with the mbuf offload API.
> +
>  
>  ABI Changes
>  -----------
> diff --git a/drivers/net/vhost/rte_eth_vhost.c b/drivers/net/vhost/rte_eth_vhost.c
> index d198fc8a8e..281379d6a3 100644
> --- a/drivers/net/vhost/rte_eth_vhost.c
> +++ b/drivers/net/vhost/rte_eth_vhost.c
> @@ -1505,7 +1505,7 @@ rte_pmd_vhost_probe(struct rte_vdev_device *dev)
>  	int ret = 0;
>  	char *iface_name;
>  	uint16_t queues;
> -	uint64_t flags = 0;
> +	uint64_t flags = RTE_VHOST_USER_NET_COMPLIANT_OL_FLAGS;
>  	uint64_t disable_flags = 0;
>  	int client_mode = 0;
>  	int iommu_support = 0;
> diff --git a/examples/vhost/main.c b/examples/vhost/main.c
> index 0bee1f3321..d2179eadb9 100644
> --- a/examples/vhost/main.c
> +++ b/examples/vhost/main.c
> @@ -19,6 +19,7 @@
>  #include <rte_log.h>
>  #include <rte_string_fns.h>
>  #include <rte_malloc.h>
> +#include <rte_net.h>
>  #include <rte_vhost.h>
>  #include <rte_ip.h>
>  #include <rte_tcp.h>
> @@ -1029,33 +1030,34 @@ find_local_dest(struct vhost_dev *vdev, struct rte_mbuf *m,
>  	return 0;
>  }
>  
> -static uint16_t
> -get_psd_sum(void *l3_hdr, uint64_t ol_flags)
> -{
> -	if (ol_flags & PKT_TX_IPV4)
> -		return rte_ipv4_phdr_cksum(l3_hdr, ol_flags);
> -	else /* assume ethertype == RTE_ETHER_TYPE_IPV6 */
> -		return rte_ipv6_phdr_cksum(l3_hdr, ol_flags);
> -}
> -
>  static void virtio_tx_offload(struct rte_mbuf *m)
>  {
> +	struct rte_net_hdr_lens hdr_lens;
> +	struct rte_ipv4_hdr *ipv4_hdr;
> +	struct rte_tcp_hdr *tcp_hdr;
> +	uint32_t ptype;
>  	void *l3_hdr;
> -	struct rte_ipv4_hdr *ipv4_hdr = NULL;
> -	struct rte_tcp_hdr *tcp_hdr = NULL;
> -	struct rte_ether_hdr *eth_hdr =
> -		rte_pktmbuf_mtod(m, struct rte_ether_hdr *);
>  
> -	l3_hdr = (char *)eth_hdr + m->l2_len;
> +	ptype = rte_net_get_ptype(m, &hdr_lens, RTE_PTYPE_ALL_MASK);
> +	m->l2_len = hdr_lens.l2_len;
> +	m->l3_len = hdr_lens.l3_len;
> +	m->l4_len = hdr_lens.l4_len;
>  
> -	if (m->ol_flags & PKT_TX_IPV4) {
> +	l3_hdr = rte_pktmbuf_mtod_offset(m, void *, m->l2_len);
> +	tcp_hdr = rte_pktmbuf_mtod_offset(m, struct rte_tcp_hdr *,
> +		m->l2_len + m->l3_len);
> +
> +	m->ol_flags |= PKT_TX_TCP_SEG;
> +	if ((ptype & RTE_PTYPE_L3_MASK) == RTE_PTYPE_L3_IPV4) {
> +		m->ol_flags |= PKT_TX_IPV4;
> +		m->ol_flags |= PKT_TX_IP_CKSUM;
>  		ipv4_hdr = l3_hdr;
>  		ipv4_hdr->hdr_checksum = 0;
> -		m->ol_flags |= PKT_TX_IP_CKSUM;
> +		tcp_hdr->cksum = rte_ipv4_phdr_cksum(l3_hdr, m->ol_flags);
> +	} else { /* assume ethertype == RTE_ETHER_TYPE_IPV6 */
> +		m->ol_flags |= PKT_TX_IPV6;
> +		tcp_hdr->cksum = rte_ipv6_phdr_cksum(l3_hdr, m->ol_flags);
>  	}
> -
> -	tcp_hdr = (struct rte_tcp_hdr *)((char *)l3_hdr + m->l3_len);
> -	tcp_hdr->cksum = get_psd_sum(l3_hdr, m->ol_flags);
>  }
>  
>  static __rte_always_inline void
> @@ -1148,7 +1150,7 @@ virtio_tx_route(struct vhost_dev *vdev, struct rte_mbuf *m, uint16_t vlan_tag)
>  		m->vlan_tci = vlan_tag;
>  	}
>  
> -	if (m->ol_flags & PKT_TX_TCP_SEG)
> +	if (m->ol_flags & PKT_RX_LRO)
>  		virtio_tx_offload(m);
>  
>  	tx_q->m_table[tx_q->len++] = m;
> @@ -1633,7 +1635,7 @@ main(int argc, char *argv[])
>  	int ret, i;
>  	uint16_t portid;
>  	static pthread_t tid;
> -	uint64_t flags = 0;
> +	uint64_t flags = RTE_VHOST_USER_NET_COMPLIANT_OL_FLAGS;
>  
>  	signal(SIGINT, sigint_handler);
>  
> diff --git a/lib/vhost/rte_vhost.h b/lib/vhost/rte_vhost.h
> index d0a8ae31f2..8d875e9322 100644
> --- a/lib/vhost/rte_vhost.h
> +++ b/lib/vhost/rte_vhost.h
> @@ -36,6 +36,7 @@ extern "C" {
>  /* support only linear buffers (no chained mbufs) */
>  #define RTE_VHOST_USER_LINEARBUF_SUPPORT	(1ULL << 6)
>  #define RTE_VHOST_USER_ASYNC_COPY	(1ULL << 7)
> +#define RTE_VHOST_USER_NET_COMPLIANT_OL_FLAGS	(1ULL << 8)
>  
>  /* Features. */
>  #ifndef VIRTIO_NET_F_GUEST_ANNOUNCE
> diff --git a/lib/vhost/socket.c b/lib/vhost/socket.c
> index 0169d36481..5d0d728d52 100644
> --- a/lib/vhost/socket.c
> +++ b/lib/vhost/socket.c
> @@ -42,6 +42,7 @@ struct vhost_user_socket {
>  	bool extbuf;
>  	bool linearbuf;
>  	bool async_copy;
> +	bool net_compliant_ol_flags;
>  
>  	/*
>  	 * The "supported_features" indicates the feature bits the
> @@ -224,7 +225,8 @@ vhost_user_add_connection(int fd, struct vhost_user_socket *vsocket)
>  	size = strnlen(vsocket->path, PATH_MAX);
>  	vhost_set_ifname(vid, vsocket->path, size);
>  
> -	vhost_set_builtin_virtio_net(vid, vsocket->use_builtin_virtio_net);
> +	vhost_setup_virtio_net(vid, vsocket->use_builtin_virtio_net,
> +		vsocket->net_compliant_ol_flags);
>  
>  	vhost_attach_vdpa_device(vid, vsocket->vdpa_dev);
>  
> @@ -877,6 +879,7 @@ rte_vhost_driver_register(const char *path, uint64_t flags)
>  	vsocket->extbuf = flags & RTE_VHOST_USER_EXTBUF_SUPPORT;
>  	vsocket->linearbuf = flags & RTE_VHOST_USER_LINEARBUF_SUPPORT;
>  	vsocket->async_copy = flags & RTE_VHOST_USER_ASYNC_COPY;
> +	vsocket->net_compliant_ol_flags = flags & RTE_VHOST_USER_NET_COMPLIANT_OL_FLAGS;
>  
>  	if (vsocket->async_copy &&
>  		(flags & (RTE_VHOST_USER_IOMMU_SUPPORT |
> diff --git a/lib/vhost/vhost.c b/lib/vhost/vhost.c
> index c9b6379f73..9abfc0bfe7 100644
> --- a/lib/vhost/vhost.c
> +++ b/lib/vhost/vhost.c
> @@ -752,7 +752,7 @@ vhost_set_ifname(int vid, const char *if_name, unsigned int if_len)
>  }
>  
>  void
> -vhost_set_builtin_virtio_net(int vid, bool enable)
> +vhost_setup_virtio_net(int vid, bool enable, bool compliant_ol_flags)
>  {
>  	struct virtio_net *dev = get_device(vid);
>  
> @@ -763,6 +763,10 @@ vhost_set_builtin_virtio_net(int vid, bool enable)
>  		dev->flags |= VIRTIO_DEV_BUILTIN_VIRTIO_NET;
>  	else
>  		dev->flags &= ~VIRTIO_DEV_BUILTIN_VIRTIO_NET;
> +	if (!compliant_ol_flags)
> +		dev->flags |= VIRTIO_DEV_LEGACY_OL_FLAGS;
> +	else
> +		dev->flags &= ~VIRTIO_DEV_LEGACY_OL_FLAGS;
>  }
>  
>  void
> diff --git a/lib/vhost/vhost.h b/lib/vhost/vhost.h
> index b303635645..8078ddff79 100644
> --- a/lib/vhost/vhost.h
> +++ b/lib/vhost/vhost.h
> @@ -27,15 +27,17 @@
>  #include "rte_vhost_async.h"
>  
>  /* Used to indicate that the device is running on a data core */
> -#define VIRTIO_DEV_RUNNING 1
> +#define VIRTIO_DEV_RUNNING ((uint32_t)1 << 0)
>  /* Used to indicate that the device is ready to operate */
> -#define VIRTIO_DEV_READY 2
> +#define VIRTIO_DEV_READY ((uint32_t)1 << 1)
>  /* Used to indicate that the built-in vhost net device backend is enabled */
> -#define VIRTIO_DEV_BUILTIN_VIRTIO_NET 4
> +#define VIRTIO_DEV_BUILTIN_VIRTIO_NET ((uint32_t)1 << 2)
>  /* Used to indicate that the device has its own data path and configured */
> -#define VIRTIO_DEV_VDPA_CONFIGURED 8
> +#define VIRTIO_DEV_VDPA_CONFIGURED ((uint32_t)1 << 3)
>  /* Used to indicate that the feature negotiation failed */
> -#define VIRTIO_DEV_FEATURES_FAILED 16
> +#define VIRTIO_DEV_FEATURES_FAILED ((uint32_t)1 << 4)
> +/* Used to indicate that the virtio_net tx code should fill TX ol_flags */
> +#define VIRTIO_DEV_LEGACY_OL_FLAGS ((uint32_t)1 << 5)
>  
>  /* Backend value set by guest. */
>  #define VIRTIO_DEV_STOPPED -1
> @@ -683,7 +685,7 @@ int alloc_vring_queue(struct virtio_net *dev, uint32_t vring_idx);
>  void vhost_attach_vdpa_device(int vid, struct rte_vdpa_device *dev);
>  
>  void vhost_set_ifname(int, const char *if_name, unsigned int if_len);
> -void vhost_set_builtin_virtio_net(int vid, bool enable);
> +void vhost_setup_virtio_net(int vid, bool enable, bool legacy_ol_flags);
>  void vhost_enable_extbuf(int vid);
>  void vhost_enable_linearbuf(int vid);
>  int vhost_enable_guest_notification(struct virtio_net *dev,
> diff --git a/lib/vhost/virtio_net.c b/lib/vhost/virtio_net.c
> index 1a34867f3c..8e36f4c340 100644
> --- a/lib/vhost/virtio_net.c
> +++ b/lib/vhost/virtio_net.c
> @@ -8,6 +8,7 @@
>  
>  #include <rte_mbuf.h>
>  #include <rte_memcpy.h>
> +#include <rte_net.h>
>  #include <rte_ether.h>
>  #include <rte_ip.h>
>  #include <rte_vhost.h>
> @@ -2303,15 +2304,12 @@ parse_ethernet(struct rte_mbuf *m, uint16_t *l4_proto, void **l4_hdr)
>  }
>  
>  static __rte_always_inline void
> -vhost_dequeue_offload(struct virtio_net_hdr *hdr, struct rte_mbuf *m)
> +vhost_dequeue_offload_legacy(struct virtio_net_hdr *hdr, struct rte_mbuf *m)
>  {
>  	uint16_t l4_proto = 0;
>  	void *l4_hdr = NULL;
>  	struct rte_tcp_hdr *tcp_hdr = NULL;
>  
> -	if (hdr->flags == 0 && hdr->gso_type == VIRTIO_NET_HDR_GSO_NONE)
> -		return;
> -
>  	parse_ethernet(m, &l4_proto, &l4_hdr);
>  	if (hdr->flags == VIRTIO_NET_HDR_F_NEEDS_CSUM) {
>  		if (hdr->csum_start == (m->l2_len + m->l3_len)) {
> @@ -2356,6 +2354,94 @@ vhost_dequeue_offload(struct virtio_net_hdr *hdr, struct rte_mbuf *m)
>  	}
>  }
>  
> +static __rte_always_inline void
> +vhost_dequeue_offload(struct virtio_net_hdr *hdr, struct rte_mbuf *m,
> +	bool legacy_ol_flags)
> +{
> +	struct rte_net_hdr_lens hdr_lens;
> +	int l4_supported = 0;
> +	uint32_t ptype;
> +
> +	if (hdr->flags == 0 && hdr->gso_type == VIRTIO_NET_HDR_GSO_NONE)
> +		return;
> +
> +	if (legacy_ol_flags) {
> +		vhost_dequeue_offload_legacy(hdr, m);
> +		return;
> +	}
> +
> +	m->ol_flags |= PKT_RX_IP_CKSUM_UNKNOWN;
> +
> +	ptype = rte_net_get_ptype(m, &hdr_lens, RTE_PTYPE_ALL_MASK);
> +	m->packet_type = ptype;

My _impression_ is that calling rte_net_get_ptype() makes the
receiving path a bit more expensive than before the patch, and
the call is not optional. However, the original parsing code was
limited and could be considered a bug.

Anyways, calling that has the nice side effect of providing the
packet_type which it didn't provide before the patch.

Acked-by: Flavio Leitner <fbl@sysclose.org>
(though this just got merged)

Thanks David, great work!
fbl


> +	if ((ptype & RTE_PTYPE_L4_MASK) == RTE_PTYPE_L4_TCP ||
> +	    (ptype & RTE_PTYPE_L4_MASK) == RTE_PTYPE_L4_UDP ||
> +	    (ptype & RTE_PTYPE_L4_MASK) == RTE_PTYPE_L4_SCTP)
> +		l4_supported = 1;
> +
> +	/* According to Virtio 1.1 spec, the device only needs to look at
> +	 * VIRTIO_NET_HDR_F_NEEDS_CSUM in the packet transmission path.
> +	 * This differs from the path for processing incoming packets, where
> +	 * the driver could rely on the VIRTIO_NET_HDR_F_DATA_VALID flag set
> +	 * by the device.
> +	 *
> +	 * 5.1.6.2.1 Driver Requirements: Packet Transmission
> +	 * The driver MUST NOT set the VIRTIO_NET_HDR_F_DATA_VALID and
> +	 * VIRTIO_NET_HDR_F_RSC_INFO bits in flags.
> +	 *
> +	 * 5.1.6.2.2 Device Requirements: Packet Transmission
> +	 * The device MUST ignore flag bits that it does not recognize.
> +	 */
> +	if (hdr->flags & VIRTIO_NET_HDR_F_NEEDS_CSUM) {
> +		uint32_t hdrlen;
> +
> +		hdrlen = hdr_lens.l2_len + hdr_lens.l3_len + hdr_lens.l4_len;
> +		if (hdr->csum_start <= hdrlen && l4_supported != 0) {
> +			m->ol_flags |= PKT_RX_L4_CKSUM_NONE;
> +		} else {
> +			/* Unknown proto or tunnel, do sw cksum. We can assume
> +			 * the cksum field is in the first segment since the
> +			 * buffers we provided to the host are large enough.
> +			 * In case of SCTP, this will be wrong since it's a CRC
> +			 * but there's nothing we can do.
> +			 */
> +			uint16_t csum = 0, off;
> +
> +			if (rte_raw_cksum_mbuf(m, hdr->csum_start,
> +					rte_pktmbuf_pkt_len(m) - hdr->csum_start, &csum) < 0)
> +				return;
> +			if (likely(csum != 0xffff))
> +				csum = ~csum;
> +			off = hdr->csum_offset + hdr->csum_start;
> +			if (rte_pktmbuf_data_len(m) >= off + 1)
> +				*rte_pktmbuf_mtod_offset(m, uint16_t *, off) = csum;
> +		}
> +	}
> +
> +	if (hdr->gso_type != VIRTIO_NET_HDR_GSO_NONE) {
> +		if (hdr->gso_size == 0)
> +			return;
> +
> +		switch (hdr->gso_type & ~VIRTIO_NET_HDR_GSO_ECN) {
> +		case VIRTIO_NET_HDR_GSO_TCPV4:
> +		case VIRTIO_NET_HDR_GSO_TCPV6:
> +			if ((ptype & RTE_PTYPE_L4_MASK) != RTE_PTYPE_L4_TCP)
> +				break;
> +			m->ol_flags |= PKT_RX_LRO | PKT_RX_L4_CKSUM_NONE;
> +			m->tso_segsz = hdr->gso_size;
> +			break;
> +		case VIRTIO_NET_HDR_GSO_UDP:
> +			if ((ptype & RTE_PTYPE_L4_MASK) != RTE_PTYPE_L4_UDP)
> +				break;
> +			m->ol_flags |= PKT_RX_LRO | PKT_RX_L4_CKSUM_NONE;
> +			m->tso_segsz = hdr->gso_size;
> +			break;
> +		default:
> +			break;
> +		}
> +	}
> +}
> +
>  static __rte_noinline void
>  copy_vnet_hdr_from_desc(struct virtio_net_hdr *hdr,
>  		struct buf_vector *buf_vec)
> @@ -2380,7 +2466,8 @@ copy_vnet_hdr_from_desc(struct virtio_net_hdr *hdr,
>  static __rte_always_inline int
>  copy_desc_to_mbuf(struct virtio_net *dev, struct vhost_virtqueue *vq,
>  		  struct buf_vector *buf_vec, uint16_t nr_vec,
> -		  struct rte_mbuf *m, struct rte_mempool *mbuf_pool)
> +		  struct rte_mbuf *m, struct rte_mempool *mbuf_pool,
> +		  bool legacy_ol_flags)
>  {
>  	uint32_t buf_avail, buf_offset;
>  	uint64_t buf_addr, buf_len;
> @@ -2513,7 +2600,7 @@ copy_desc_to_mbuf(struct virtio_net *dev, struct vhost_virtqueue *vq,
>  	m->pkt_len    += mbuf_offset;
>  
>  	if (hdr)
> -		vhost_dequeue_offload(hdr, m);
> +		vhost_dequeue_offload(hdr, m, legacy_ol_flags);
>  
>  out:
>  
> @@ -2606,9 +2693,11 @@ virtio_dev_pktmbuf_alloc(struct virtio_net *dev, struct rte_mempool *mp,
>  	return pkt;
>  }
>  
> -static __rte_noinline uint16_t
> +__rte_always_inline
> +static uint16_t
>  virtio_dev_tx_split(struct virtio_net *dev, struct vhost_virtqueue *vq,
> -	struct rte_mempool *mbuf_pool, struct rte_mbuf **pkts, uint16_t count)
> +	struct rte_mempool *mbuf_pool, struct rte_mbuf **pkts, uint16_t count,
> +	bool legacy_ol_flags)
>  {
>  	uint16_t i;
>  	uint16_t free_entries;
> @@ -2668,7 +2757,7 @@ virtio_dev_tx_split(struct virtio_net *dev, struct vhost_virtqueue *vq,
>  		}
>  
>  		err = copy_desc_to_mbuf(dev, vq, buf_vec, nr_vec, pkts[i],
> -				mbuf_pool);
> +				mbuf_pool, legacy_ol_flags);
>  		if (unlikely(err)) {
>  			rte_pktmbuf_free(pkts[i]);
>  			if (!allocerr_warned) {
> @@ -2696,6 +2785,24 @@ virtio_dev_tx_split(struct virtio_net *dev, struct vhost_virtqueue *vq,
>  	return (i - dropped);
>  }
>  
> +__rte_noinline
> +static uint16_t
> +virtio_dev_tx_split_legacy(struct virtio_net *dev,
> +	struct vhost_virtqueue *vq, struct rte_mempool *mbuf_pool,
> +	struct rte_mbuf **pkts, uint16_t count)
> +{
> +	return virtio_dev_tx_split(dev, vq, mbuf_pool, pkts, count, true);
> +}
> +
> +__rte_noinline
> +static uint16_t
> +virtio_dev_tx_split_compliant(struct virtio_net *dev,
> +	struct vhost_virtqueue *vq, struct rte_mempool *mbuf_pool,
> +	struct rte_mbuf **pkts, uint16_t count)
> +{
> +	return virtio_dev_tx_split(dev, vq, mbuf_pool, pkts, count, false);
> +}
> +
>  static __rte_always_inline int
>  vhost_reserve_avail_batch_packed(struct virtio_net *dev,
>  				 struct vhost_virtqueue *vq,
> @@ -2770,7 +2877,8 @@ vhost_reserve_avail_batch_packed(struct virtio_net *dev,
>  static __rte_always_inline int
>  virtio_dev_tx_batch_packed(struct virtio_net *dev,
>  			   struct vhost_virtqueue *vq,
> -			   struct rte_mbuf **pkts)
> +			   struct rte_mbuf **pkts,
> +			   bool legacy_ol_flags)
>  {
>  	uint16_t avail_idx = vq->last_avail_idx;
>  	uint32_t buf_offset = sizeof(struct virtio_net_hdr_mrg_rxbuf);
> @@ -2794,7 +2902,7 @@ virtio_dev_tx_batch_packed(struct virtio_net *dev,
>  	if (virtio_net_with_host_offload(dev)) {
>  		vhost_for_each_try_unroll(i, 0, PACKED_BATCH_SIZE) {
>  			hdr = (struct virtio_net_hdr *)(desc_addrs[i]);
> -			vhost_dequeue_offload(hdr, pkts[i]);
> +			vhost_dequeue_offload(hdr, pkts[i], legacy_ol_flags);
>  		}
>  	}
>  
> @@ -2815,7 +2923,8 @@ vhost_dequeue_single_packed(struct virtio_net *dev,
>  			    struct rte_mempool *mbuf_pool,
>  			    struct rte_mbuf *pkts,
>  			    uint16_t *buf_id,
> -			    uint16_t *desc_count)
> +			    uint16_t *desc_count,
> +			    bool legacy_ol_flags)
>  {
>  	struct buf_vector buf_vec[BUF_VECTOR_MAX];
>  	uint32_t buf_len;
> @@ -2841,7 +2950,7 @@ vhost_dequeue_single_packed(struct virtio_net *dev,
>  	}
>  
>  	err = copy_desc_to_mbuf(dev, vq, buf_vec, nr_vec, pkts,
> -				mbuf_pool);
> +				mbuf_pool, legacy_ol_flags);
>  	if (unlikely(err)) {
>  		if (!allocerr_warned) {
>  			VHOST_LOG_DATA(ERR,
> @@ -2859,14 +2968,15 @@ static __rte_always_inline int
>  virtio_dev_tx_single_packed(struct virtio_net *dev,
>  			    struct vhost_virtqueue *vq,
>  			    struct rte_mempool *mbuf_pool,
> -			    struct rte_mbuf *pkts)
> +			    struct rte_mbuf *pkts,
> +			    bool legacy_ol_flags)
>  {
>  
>  	uint16_t buf_id, desc_count = 0;
>  	int ret;
>  
>  	ret = vhost_dequeue_single_packed(dev, vq, mbuf_pool, pkts, &buf_id,
> -					&desc_count);
> +					&desc_count, legacy_ol_flags);
>  
>  	if (likely(desc_count > 0)) {
>  		if (virtio_net_is_inorder(dev))
> @@ -2882,12 +2992,14 @@ virtio_dev_tx_single_packed(struct virtio_net *dev,
>  	return ret;
>  }
>  
> -static __rte_noinline uint16_t
> +__rte_always_inline
> +static uint16_t
>  virtio_dev_tx_packed(struct virtio_net *dev,
>  		     struct vhost_virtqueue *__rte_restrict vq,
>  		     struct rte_mempool *mbuf_pool,
>  		     struct rte_mbuf **__rte_restrict pkts,
> -		     uint32_t count)
> +		     uint32_t count,
> +		     bool legacy_ol_flags)
>  {
>  	uint32_t pkt_idx = 0;
>  
> @@ -2899,14 +3011,16 @@ virtio_dev_tx_packed(struct virtio_net *dev,
>  
>  		if (count - pkt_idx >= PACKED_BATCH_SIZE) {
>  			if (!virtio_dev_tx_batch_packed(dev, vq,
> -							&pkts[pkt_idx])) {
> +							&pkts[pkt_idx],
> +							legacy_ol_flags)) {
>  				pkt_idx += PACKED_BATCH_SIZE;
>  				continue;
>  			}
>  		}
>  
>  		if (virtio_dev_tx_single_packed(dev, vq, mbuf_pool,
> -						pkts[pkt_idx]))
> +						pkts[pkt_idx],
> +						legacy_ol_flags))
>  			break;
>  		pkt_idx++;
>  	} while (pkt_idx < count);
> @@ -2924,6 +3038,24 @@ virtio_dev_tx_packed(struct virtio_net *dev,
>  	return pkt_idx;
>  }
>  
> +__rte_noinline
> +static uint16_t
> +virtio_dev_tx_packed_legacy(struct virtio_net *dev,
> +	struct vhost_virtqueue *__rte_restrict vq, struct rte_mempool *mbuf_pool,
> +	struct rte_mbuf **__rte_restrict pkts, uint32_t count)
> +{
> +	return virtio_dev_tx_packed(dev, vq, mbuf_pool, pkts, count, true);
> +}
> +
> +__rte_noinline
> +static uint16_t
> +virtio_dev_tx_packed_compliant(struct virtio_net *dev,
> +	struct vhost_virtqueue *__rte_restrict vq, struct rte_mempool *mbuf_pool,
> +	struct rte_mbuf **__rte_restrict pkts, uint32_t count)
> +{
> +	return virtio_dev_tx_packed(dev, vq, mbuf_pool, pkts, count, false);
> +}
> +
>  uint16_t
>  rte_vhost_dequeue_burst(int vid, uint16_t queue_id,
>  	struct rte_mempool *mbuf_pool, struct rte_mbuf **pkts, uint16_t count)
> @@ -2999,10 +3131,17 @@ rte_vhost_dequeue_burst(int vid, uint16_t queue_id,
>  		count -= 1;
>  	}
>  
> -	if (vq_is_packed(dev))
> -		count = virtio_dev_tx_packed(dev, vq, mbuf_pool, pkts, count);
> -	else
> -		count = virtio_dev_tx_split(dev, vq, mbuf_pool, pkts, count);
> +	if (vq_is_packed(dev)) {
> +		if (dev->flags & VIRTIO_DEV_LEGACY_OL_FLAGS)
> +			count = virtio_dev_tx_packed_legacy(dev, vq, mbuf_pool, pkts, count);
> +		else
> +			count = virtio_dev_tx_packed_compliant(dev, vq, mbuf_pool, pkts, count);
> +	} else {
> +		if (dev->flags & VIRTIO_DEV_LEGACY_OL_FLAGS)
> +			count = virtio_dev_tx_split_legacy(dev, vq, mbuf_pool, pkts, count);
> +		else
> +			count = virtio_dev_tx_split_compliant(dev, vq, mbuf_pool, pkts, count);
> +	}
>  
>  out:
>  	if (dev->features & (1ULL << VIRTIO_F_IOMMU_PLATFORM))
> -- 
> 2.23.0
> 

-- 
fbl

^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: [dpdk-dev] [PATCH v4 3/3] vhost: fix offload flags in Rx path
  2021-05-03 16:43   ` [dpdk-dev] [PATCH v4 3/3] vhost: fix offload flags in Rx path David Marchand
  2021-05-04 11:07     ` Flavio Leitner
@ 2021-05-08  6:24     ` Wang, Yinan
  2021-05-12  3:29       ` Wang, Yinan
  1 sibling, 1 reply; 63+ messages in thread
From: Wang, Yinan @ 2021-05-08  6:24 UTC (permalink / raw)
  To: David Marchand, dev
  Cc: maxime.coquelin, olivier.matz, fbl, i.maximets, Xia, Chenbo,
	Stokes, Ian, stable, Jijiang Liu, Yuanhan Liu

Hi David,

May I know how to configure Tx offloading with testpmd? Could you help provide an example case?
I added a case which needs the vhost Tx offload (TSO/cksum) function; this case does not work with the patch. Could you use it as the example if possible?

For example: VM2VM split ring vhost-user/virtio-net test with TCP traffic
=========================================================================

1. Launch the vhost sample on socket 0 with the commands below::

    rm -rf vhost-net*
    ./dpdk-testpmd -l 2-4 -n 4 --no-pci --file-prefix=vhost --vdev 'net_vhost0,iface=vhost-net0,queues=1' \
    --vdev 'net_vhost1,iface=vhost-net1,queues=1'  -- -i --nb-cores=2 --txd=1024 --rxd=1024
    testpmd>start

2. Launch VM1 and VM2 on socket 1::

    taskset -c 32 qemu-system-x86_64 -name vm1 -enable-kvm -cpu host -smp 1 -m 4096 \
    -object memory-backend-file,id=mem,size=4096M,mem-path=/mnt/huge,share=on \
    -numa node,memdev=mem -mem-prealloc -drive file=/home/osimg/ubuntu20-04.img  \
    -chardev socket,path=/tmp/vm2_qga0.sock,server,nowait,id=vm2_qga0 -device virtio-serial \
    -device virtserialport,chardev=vm2_qga0,name=org.qemu.guest_agent.2 -daemonize \
    -monitor unix:/tmp/vm2_monitor.sock,server,nowait -device e1000,netdev=nttsip1 \
    -netdev user,id=nttsip1,hostfwd=tcp:127.0.0.1:6002-:22 \
    -chardev socket,id=char0,path=./vhost-net0 \
    -netdev type=vhost-user,id=netdev0,chardev=char0,vhostforce \
    -device virtio-net-pci,netdev=netdev0,mac=52:54:00:00:00:01,disable-modern=false,mrg_rxbuf=on,csum=on,guest_csum=on,host_tso4=on,guest_tso4=on,guest_ecn=on -vnc :10

   taskset -c 33 qemu-system-x86_64 -name vm2 -enable-kvm -cpu host -smp 1 -m 4096 \
    -object memory-backend-file,id=mem,size=4096M,mem-path=/mnt/huge,share=on \
    -numa node,memdev=mem -mem-prealloc -drive file=/home/osimg/ubuntu20-04-2.img  \
    -chardev socket,path=/tmp/vm2_qga0.sock,server,nowait,id=vm2_qga0 -device virtio-serial \
    -device virtserialport,chardev=vm2_qga0,name=org.qemu.guest_agent.2 -daemonize \
    -monitor unix:/tmp/vm2_monitor.sock,server,nowait -device e1000,netdev=nttsip1 \
    -netdev user,id=nttsip1,hostfwd=tcp:127.0.0.1:6003-:22 \
    -chardev socket,id=char0,path=./vhost-net1 \
    -netdev type=vhost-user,id=netdev0,chardev=char0,vhostforce \
    -device virtio-net-pci,netdev=netdev0,mac=52:54:00:00:00:02,disable-modern=false,mrg_rxbuf=on,csum=on,guest_csum=on,host_tso4=on,guest_tso4=on,guest_ecn=on -vnc :12

3. On VM1, set the virtio device IP and add a static ARP entry::

    ifconfig ens5 1.1.1.2
    arp -s 1.1.1.8 52:54:00:00:00:02

4. On VM2, set the virtio device IP and add a static ARP entry::

    ifconfig ens5 1.1.1.8
    arp -s 1.1.1.2 52:54:00:00:00:01

5. Check the iperf performance with different packet sizes between the two VMs with the commands below::

    Under VM1, run: `iperf -s -i 1`
    Under VM2, run: `iperf -c 1.1.1.2 -i 1 -t 60`
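
For reference, when the compliant Rx path in the patch cannot recognize the L4 protocol, it falls back to completing the checksum in software (the `rte_raw_cksum_mbuf()` branch, with the `csum = ~csum` fixup). A minimal Python sketch of that 16-bit ones'-complement checksum, purely illustrative (the helper name is hypothetical, not a DPDK API):

```python
def raw_cksum(data: bytes) -> int:
    """RFC 1071 16-bit ones'-complement checksum; pads odd-length input."""
    if len(data) % 2:
        data += b"\x00"
    total = 0
    for i in range(0, len(data), 2):
        total += (data[i] << 8) | data[i + 1]
    # Fold the carries back into the low 16 bits.
    while total >> 16:
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF

# RFC 1071 sample data: the checksum comes out as 0x220d
print(hex(raw_cksum(bytes.fromhex("0001f203f4f5f6f7"))))
```

A packet whose checksum field already contains this value sums to zero end to end, which is what the offload fallback relies on.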

BR,
Yinan

> -----Original Message-----
> From: dev <dev-bounces@dpdk.org> On Behalf Of David Marchand
> Sent: 2021-05-04 0:44
> To: dev@dpdk.org
> Cc: maxime.coquelin@redhat.com; olivier.matz@6wind.com;
> fbl@sysclose.org; i.maximets@ovn.org; Xia, Chenbo
> <chenbo.xia@intel.com>; Stokes, Ian <ian.stokes@intel.com>;
> stable@dpdk.org; Jijiang Liu <jijiang.liu@intel.com>; Yuanhan Liu
> <yuanhan.liu@linux.intel.com>
> Subject: [dpdk-dev] [PATCH v4 3/3] vhost: fix offload flags in Rx path
> 
> The vhost library currently configures Tx offloading (PKT_TX_*) on any
> packet received from a guest virtio device which asks for some offloading.
> 
> This is problematic, as Tx offloading is something that the application
> must ask for: the application needs to configure devices
> to support every used offloads (ip, tcp checksumming, tso..), and the
> various l2/l3/l4 lengths must be set following any processing that
> happened in the application itself.
> 
> On the other hand, the received packets are not marked wrt current
> packet l3/l4 checksumming info.
> 
> Copy virtio rx processing to fix those offload flags with some
> differences:
> - accept VIRTIO_NET_HDR_GSO_ECN and VIRTIO_NET_HDR_GSO_UDP,
> - ignore anything but the VIRTIO_NET_HDR_F_NEEDS_CSUM flag (to comply
> with
>   the virtio spec),
> 
> Some applications might rely on the current behavior, so it is left
> untouched by default.
> A new RTE_VHOST_USER_NET_COMPLIANT_OL_FLAGS flag is added to
> enable the
> new behavior.
> 
> The vhost example has been updated for the new behavior: TSO is applied
> to
> any packet marked LRO.
> 
> Fixes: 859b480d5afd ("vhost: add guest offload setting")
> Cc: stable@dpdk.org
> 
> Signed-off-by: David Marchand <david.marchand@redhat.com>
> Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
> ---
> Changes since v3:
> - rebased on next-virtio,
> 
> Changes since v2:
> - introduced a new flag to keep existing behavior as the default,
> - packets with unrecognised offload are passed to the application with no
>   offload metadata rather than dropped,
> - ignored VIRTIO_NET_HDR_F_DATA_VALID since the virtio spec states that
>   the virtio driver is not allowed to use this flag when transmitting
>   packets,
> 
> Changes since v1:
> - updated vhost example,
> - restored VIRTIO_NET_HDR_GSO_ECN and VIRTIO_NET_HDR_GSO_UDP
> support,
> - restored log on buggy offload request,
> 
> ---
>  doc/guides/prog_guide/vhost_lib.rst    |  12 ++
>  doc/guides/rel_notes/release_21_05.rst |   6 +
>  drivers/net/vhost/rte_eth_vhost.c      |   2 +-
>  examples/vhost/main.c                  |  44 +++---
>  lib/vhost/rte_vhost.h                  |   1 +
>  lib/vhost/socket.c                     |   5 +-
>  lib/vhost/vhost.c                      |   6 +-
>  lib/vhost/vhost.h                      |  14 +-
>  lib/vhost/virtio_net.c                 | 185 ++++++++++++++++++++++---
>  9 files changed, 222 insertions(+), 53 deletions(-)
> 
> diff --git a/doc/guides/prog_guide/vhost_lib.rst
> b/doc/guides/prog_guide/vhost_lib.rst
> index 7afa351675..d18fb98910 100644
> --- a/doc/guides/prog_guide/vhost_lib.rst
> +++ b/doc/guides/prog_guide/vhost_lib.rst
> @@ -118,6 +118,18 @@ The following is an overview of some key Vhost
> API functions:
> 
>      It is disabled by default.
> 
> +  - ``RTE_VHOST_USER_NET_COMPLIANT_OL_FLAGS``
> +
> +    Since v16.04, the vhost library forwards checksum and gso requests for
> +    packets received from a virtio driver by filling Tx offload metadata in
> +    the mbuf. This behavior is inconsistent with other drivers but it is left
> +    untouched for existing applications that might rely on it.
> +
> +    This flag disables the legacy behavior and instead ask vhost to simply
> +    populate Rx offload metadata in the mbuf.
> +
> +    It is disabled by default.
> +
>  * ``rte_vhost_driver_set_features(path, features)``
> 
>    This function sets the feature bits the vhost-user driver supports. The
> diff --git a/doc/guides/rel_notes/release_21_05.rst
> b/doc/guides/rel_notes/release_21_05.rst
> index a5f21f8425..6b7b0810a5 100644
> --- a/doc/guides/rel_notes/release_21_05.rst
> +++ b/doc/guides/rel_notes/release_21_05.rst
> @@ -337,6 +337,12 @@ API Changes
>    ``policer_action_recolor_supported`` and
> ``policer_action_drop_supported``
>    have been removed.
> 
> +* vhost: The vhost library currently populates received mbufs from a virtio
> +  driver with Tx offload flags while not filling Rx offload flags.
> +  While this behavior is arguable, it is kept untouched.
> +  A new flag ``RTE_VHOST_USER_NET_COMPLIANT_OL_FLAGS`` has been
> added to ask
> +  for a behavior compliant with to the mbuf offload API.
> +
> 
>  ABI Changes
>  -----------
> diff --git a/drivers/net/vhost/rte_eth_vhost.c
> b/drivers/net/vhost/rte_eth_vhost.c
> index d198fc8a8e..281379d6a3 100644
> --- a/drivers/net/vhost/rte_eth_vhost.c
> +++ b/drivers/net/vhost/rte_eth_vhost.c
> @@ -1505,7 +1505,7 @@ rte_pmd_vhost_probe(struct rte_vdev_device
> *dev)
>  	int ret = 0;
>  	char *iface_name;
>  	uint16_t queues;
> -	uint64_t flags = 0;
> +	uint64_t flags = RTE_VHOST_USER_NET_COMPLIANT_OL_FLAGS;
>  	uint64_t disable_flags = 0;
>  	int client_mode = 0;
>  	int iommu_support = 0;
> diff --git a/examples/vhost/main.c b/examples/vhost/main.c
> index 0bee1f3321..d2179eadb9 100644
> --- a/examples/vhost/main.c
> +++ b/examples/vhost/main.c
> @@ -19,6 +19,7 @@
>  #include <rte_log.h>
>  #include <rte_string_fns.h>
>  #include <rte_malloc.h>
> +#include <rte_net.h>
>  #include <rte_vhost.h>
>  #include <rte_ip.h>
>  #include <rte_tcp.h>
> @@ -1029,33 +1030,34 @@ find_local_dest(struct vhost_dev *vdev,
> struct rte_mbuf *m,
>  	return 0;
>  }
> 
> -static uint16_t
> -get_psd_sum(void *l3_hdr, uint64_t ol_flags)
> -{
> -	if (ol_flags & PKT_TX_IPV4)
> -		return rte_ipv4_phdr_cksum(l3_hdr, ol_flags);
> -	else /* assume ethertype == RTE_ETHER_TYPE_IPV6 */
> -		return rte_ipv6_phdr_cksum(l3_hdr, ol_flags);
> -}
> -
>  static void virtio_tx_offload(struct rte_mbuf *m)
>  {
> +	struct rte_net_hdr_lens hdr_lens;
> +	struct rte_ipv4_hdr *ipv4_hdr;
> +	struct rte_tcp_hdr *tcp_hdr;
> +	uint32_t ptype;
>  	void *l3_hdr;
> -	struct rte_ipv4_hdr *ipv4_hdr = NULL;
> -	struct rte_tcp_hdr *tcp_hdr = NULL;
> -	struct rte_ether_hdr *eth_hdr =
> -		rte_pktmbuf_mtod(m, struct rte_ether_hdr *);
> 
> -	l3_hdr = (char *)eth_hdr + m->l2_len;
> +	ptype = rte_net_get_ptype(m, &hdr_lens, RTE_PTYPE_ALL_MASK);
> +	m->l2_len = hdr_lens.l2_len;
> +	m->l3_len = hdr_lens.l3_len;
> +	m->l4_len = hdr_lens.l4_len;
> 
> -	if (m->ol_flags & PKT_TX_IPV4) {
> +	l3_hdr = rte_pktmbuf_mtod_offset(m, void *, m->l2_len);
> +	tcp_hdr = rte_pktmbuf_mtod_offset(m, struct rte_tcp_hdr *,
> +		m->l2_len + m->l3_len);
> +
> +	m->ol_flags |= PKT_TX_TCP_SEG;
> +	if ((ptype & RTE_PTYPE_L3_MASK) == RTE_PTYPE_L3_IPV4) {
> +		m->ol_flags |= PKT_TX_IPV4;
> +		m->ol_flags |= PKT_TX_IP_CKSUM;
>  		ipv4_hdr = l3_hdr;
>  		ipv4_hdr->hdr_checksum = 0;
> -		m->ol_flags |= PKT_TX_IP_CKSUM;
> +		tcp_hdr->cksum = rte_ipv4_phdr_cksum(l3_hdr, m-
> >ol_flags);
> +	} else { /* assume ethertype == RTE_ETHER_TYPE_IPV6 */
> +		m->ol_flags |= PKT_TX_IPV6;
> +		tcp_hdr->cksum = rte_ipv6_phdr_cksum(l3_hdr, m-
> >ol_flags);
>  	}
> -
> -	tcp_hdr = (struct rte_tcp_hdr *)((char *)l3_hdr + m->l3_len);
> -	tcp_hdr->cksum = get_psd_sum(l3_hdr, m->ol_flags);
>  }
> 
>  static __rte_always_inline void
> @@ -1148,7 +1150,7 @@ virtio_tx_route(struct vhost_dev *vdev, struct
> rte_mbuf *m, uint16_t vlan_tag)
>  		m->vlan_tci = vlan_tag;
>  	}
> 
> -	if (m->ol_flags & PKT_TX_TCP_SEG)
> +	if (m->ol_flags & PKT_RX_LRO)
>  		virtio_tx_offload(m);
> 
>  	tx_q->m_table[tx_q->len++] = m;
> @@ -1633,7 +1635,7 @@ main(int argc, char *argv[])
>  	int ret, i;
>  	uint16_t portid;
>  	static pthread_t tid;
> -	uint64_t flags = 0;
> +	uint64_t flags = RTE_VHOST_USER_NET_COMPLIANT_OL_FLAGS;
> 
>  	signal(SIGINT, sigint_handler);
> 
> diff --git a/lib/vhost/rte_vhost.h b/lib/vhost/rte_vhost.h
> index d0a8ae31f2..8d875e9322 100644
> --- a/lib/vhost/rte_vhost.h
> +++ b/lib/vhost/rte_vhost.h
> @@ -36,6 +36,7 @@ extern "C" {
>  /* support only linear buffers (no chained mbufs) */
>  #define RTE_VHOST_USER_LINEARBUF_SUPPORT	(1ULL << 6)
>  #define RTE_VHOST_USER_ASYNC_COPY	(1ULL << 7)
> +#define RTE_VHOST_USER_NET_COMPLIANT_OL_FLAGS	(1ULL << 8)
> 
>  /* Features. */
>  #ifndef VIRTIO_NET_F_GUEST_ANNOUNCE
> diff --git a/lib/vhost/socket.c b/lib/vhost/socket.c
> index 0169d36481..5d0d728d52 100644
> --- a/lib/vhost/socket.c
> +++ b/lib/vhost/socket.c
> @@ -42,6 +42,7 @@ struct vhost_user_socket {
>  	bool extbuf;
>  	bool linearbuf;
>  	bool async_copy;
> +	bool net_compliant_ol_flags;
> 
>  	/*
>  	 * The "supported_features" indicates the feature bits the
> @@ -224,7 +225,8 @@ vhost_user_add_connection(int fd, struct
> vhost_user_socket *vsocket)
>  	size = strnlen(vsocket->path, PATH_MAX);
>  	vhost_set_ifname(vid, vsocket->path, size);
> 
> -	vhost_set_builtin_virtio_net(vid, vsocket->use_builtin_virtio_net);
> +	vhost_setup_virtio_net(vid, vsocket->use_builtin_virtio_net,
> +		vsocket->net_compliant_ol_flags);
> 
>  	vhost_attach_vdpa_device(vid, vsocket->vdpa_dev);
> 
> @@ -877,6 +879,7 @@ rte_vhost_driver_register(const char *path,
> uint64_t flags)
>  	vsocket->extbuf = flags & RTE_VHOST_USER_EXTBUF_SUPPORT;
>  	vsocket->linearbuf = flags &
> RTE_VHOST_USER_LINEARBUF_SUPPORT;
>  	vsocket->async_copy = flags & RTE_VHOST_USER_ASYNC_COPY;
> +	vsocket->net_compliant_ol_flags = flags &
> RTE_VHOST_USER_NET_COMPLIANT_OL_FLAGS;
> 
>  	if (vsocket->async_copy &&
>  		(flags & (RTE_VHOST_USER_IOMMU_SUPPORT |
> diff --git a/lib/vhost/vhost.c b/lib/vhost/vhost.c
> index c9b6379f73..9abfc0bfe7 100644
> --- a/lib/vhost/vhost.c
> +++ b/lib/vhost/vhost.c
> @@ -752,7 +752,7 @@ vhost_set_ifname(int vid, const char *if_name,
> unsigned int if_len)
>  }
> 
>  void
> -vhost_set_builtin_virtio_net(int vid, bool enable)
> +vhost_setup_virtio_net(int vid, bool enable, bool compliant_ol_flags)
>  {
>  	struct virtio_net *dev = get_device(vid);
> 
> @@ -763,6 +763,10 @@ vhost_set_builtin_virtio_net(int vid, bool enable)
>  		dev->flags |= VIRTIO_DEV_BUILTIN_VIRTIO_NET;
>  	else
>  		dev->flags &= ~VIRTIO_DEV_BUILTIN_VIRTIO_NET;
> +	if (!compliant_ol_flags)
> +		dev->flags |= VIRTIO_DEV_LEGACY_OL_FLAGS;
> +	else
> +		dev->flags &= ~VIRTIO_DEV_LEGACY_OL_FLAGS;
>  }
> 
>  void
> diff --git a/lib/vhost/vhost.h b/lib/vhost/vhost.h
> index b303635645..8078ddff79 100644
> --- a/lib/vhost/vhost.h
> +++ b/lib/vhost/vhost.h
> @@ -27,15 +27,17 @@
>  #include "rte_vhost_async.h"
> 
>  /* Used to indicate that the device is running on a data core */
> -#define VIRTIO_DEV_RUNNING 1
> +#define VIRTIO_DEV_RUNNING ((uint32_t)1 << 0)
>  /* Used to indicate that the device is ready to operate */
> -#define VIRTIO_DEV_READY 2
> +#define VIRTIO_DEV_READY ((uint32_t)1 << 1)
>  /* Used to indicate that the built-in vhost net device backend is enabled */
> -#define VIRTIO_DEV_BUILTIN_VIRTIO_NET 4
> +#define VIRTIO_DEV_BUILTIN_VIRTIO_NET ((uint32_t)1 << 2)
>  /* Used to indicate that the device has its own data path and configured */
> -#define VIRTIO_DEV_VDPA_CONFIGURED 8
> +#define VIRTIO_DEV_VDPA_CONFIGURED ((uint32_t)1 << 3)
>  /* Used to indicate that the feature negotiation failed */
> -#define VIRTIO_DEV_FEATURES_FAILED 16
> +#define VIRTIO_DEV_FEATURES_FAILED ((uint32_t)1 << 4)
> +/* Used to indicate that the virtio_net tx code should fill TX ol_flags */
> +#define VIRTIO_DEV_LEGACY_OL_FLAGS ((uint32_t)1 << 5)
> 
>  /* Backend value set by guest. */
>  #define VIRTIO_DEV_STOPPED -1
> @@ -683,7 +685,7 @@ int alloc_vring_queue(struct virtio_net *dev,
> uint32_t vring_idx);
>  void vhost_attach_vdpa_device(int vid, struct rte_vdpa_device *dev);
> 
>  void vhost_set_ifname(int, const char *if_name, unsigned int if_len);
> -void vhost_set_builtin_virtio_net(int vid, bool enable);
> +void vhost_setup_virtio_net(int vid, bool enable, bool legacy_ol_flags);
>  void vhost_enable_extbuf(int vid);
>  void vhost_enable_linearbuf(int vid);
>  int vhost_enable_guest_notification(struct virtio_net *dev,
> diff --git a/lib/vhost/virtio_net.c b/lib/vhost/virtio_net.c
> index 1a34867f3c..8e36f4c340 100644
> --- a/lib/vhost/virtio_net.c
> +++ b/lib/vhost/virtio_net.c
> @@ -8,6 +8,7 @@
> 
>  #include <rte_mbuf.h>
>  #include <rte_memcpy.h>
> +#include <rte_net.h>
>  #include <rte_ether.h>
>  #include <rte_ip.h>
>  #include <rte_vhost.h>
> @@ -2303,15 +2304,12 @@ parse_ethernet(struct rte_mbuf *m, uint16_t
> *l4_proto, void **l4_hdr)
>  }
> 
>  static __rte_always_inline void
> -vhost_dequeue_offload(struct virtio_net_hdr *hdr, struct rte_mbuf *m)
> +vhost_dequeue_offload_legacy(struct virtio_net_hdr *hdr, struct
> rte_mbuf *m)
>  {
>  	uint16_t l4_proto = 0;
>  	void *l4_hdr = NULL;
>  	struct rte_tcp_hdr *tcp_hdr = NULL;
> 
> -	if (hdr->flags == 0 && hdr->gso_type ==
> VIRTIO_NET_HDR_GSO_NONE)
> -		return;
> -
>  	parse_ethernet(m, &l4_proto, &l4_hdr);
>  	if (hdr->flags == VIRTIO_NET_HDR_F_NEEDS_CSUM) {
>  		if (hdr->csum_start == (m->l2_len + m->l3_len)) {
> @@ -2356,6 +2354,94 @@ vhost_dequeue_offload(struct virtio_net_hdr
> *hdr, struct rte_mbuf *m)
>  	}
>  }
> 
> +static __rte_always_inline void
> +vhost_dequeue_offload(struct virtio_net_hdr *hdr, struct rte_mbuf *m,
> +	bool legacy_ol_flags)
> +{
> +	struct rte_net_hdr_lens hdr_lens;
> +	int l4_supported = 0;
> +	uint32_t ptype;
> +
> +	if (hdr->flags == 0 && hdr->gso_type ==
> VIRTIO_NET_HDR_GSO_NONE)
> +		return;
> +
> +	if (legacy_ol_flags) {
> +		vhost_dequeue_offload_legacy(hdr, m);
> +		return;
> +	}
> +
> +	m->ol_flags |= PKT_RX_IP_CKSUM_UNKNOWN;
> +
> +	ptype = rte_net_get_ptype(m, &hdr_lens, RTE_PTYPE_ALL_MASK);
> +	m->packet_type = ptype;
> +	if ((ptype & RTE_PTYPE_L4_MASK) == RTE_PTYPE_L4_TCP ||
> +	    (ptype & RTE_PTYPE_L4_MASK) == RTE_PTYPE_L4_UDP ||
> +	    (ptype & RTE_PTYPE_L4_MASK) == RTE_PTYPE_L4_SCTP)
> +		l4_supported = 1;
> +
> +	/* According to Virtio 1.1 spec, the device only needs to look at
> +	 * VIRTIO_NET_HDR_F_NEEDS_CSUM in the packet transmission
> path.
> +	 * This differs from the processing incoming packets path where the
> +	 * driver could rely on VIRTIO_NET_HDR_F_DATA_VALID flag set by
> the
> +	 * device.
> +	 *
> +	 * 5.1.6.2.1 Driver Requirements: Packet Transmission
> +	 * The driver MUST NOT set the VIRTIO_NET_HDR_F_DATA_VALID
> and
> +	 * VIRTIO_NET_HDR_F_RSC_INFO bits in flags.
> +	 *
> +	 * 5.1.6.2.2 Device Requirements: Packet Transmission
> +	 * The device MUST ignore flag bits that it does not recognize.
> +	 */
> +	if (hdr->flags & VIRTIO_NET_HDR_F_NEEDS_CSUM) {
> +		uint32_t hdrlen;
> +
> +		hdrlen = hdr_lens.l2_len + hdr_lens.l3_len + hdr_lens.l4_len;
> +		if (hdr->csum_start <= hdrlen && l4_supported != 0) {
> +			m->ol_flags |= PKT_RX_L4_CKSUM_NONE;
> +		} else {
> +			/* Unknown proto or tunnel, do sw cksum. We can
> assume
> +			 * the cksum field is in the first segment since the
> +			 * buffers we provided to the host are large enough.
> +			 * In case of SCTP, this will be wrong since it's a CRC
> +			 * but there's nothing we can do.
> +			 */
> +			uint16_t csum = 0, off;
> +
> +			if (rte_raw_cksum_mbuf(m, hdr->csum_start,
> +					rte_pktmbuf_pkt_len(m) - hdr-
> >csum_start, &csum) < 0)
> +				return;
> +			if (likely(csum != 0xffff))
> +				csum = ~csum;
> +			off = hdr->csum_offset + hdr->csum_start;
> +			if (rte_pktmbuf_data_len(m) >= off + 1)
> +				*rte_pktmbuf_mtod_offset(m, uint16_t *,
> off) = csum;
> +		}
> +	}
> +
> +	if (hdr->gso_type != VIRTIO_NET_HDR_GSO_NONE) {
> +		if (hdr->gso_size == 0)
> +			return;
> +
> +		switch (hdr->gso_type & ~VIRTIO_NET_HDR_GSO_ECN) {
> +		case VIRTIO_NET_HDR_GSO_TCPV4:
> +		case VIRTIO_NET_HDR_GSO_TCPV6:
> +			if ((ptype & RTE_PTYPE_L4_MASK) !=
> RTE_PTYPE_L4_TCP)
> +				break;
> +			m->ol_flags |= PKT_RX_LRO |
> PKT_RX_L4_CKSUM_NONE;
> +			m->tso_segsz = hdr->gso_size;
> +			break;
> +		case VIRTIO_NET_HDR_GSO_UDP:
> +			if ((ptype & RTE_PTYPE_L4_MASK) !=
> RTE_PTYPE_L4_UDP)
> +				break;
> +			m->ol_flags |= PKT_RX_LRO |
> PKT_RX_L4_CKSUM_NONE;
> +			m->tso_segsz = hdr->gso_size;
> +			break;
> +		default:
> +			break;
> +		}
> +	}
> +}
> +
>  static __rte_noinline void
>  copy_vnet_hdr_from_desc(struct virtio_net_hdr *hdr,
>  		struct buf_vector *buf_vec)
> @@ -2380,7 +2466,8 @@ copy_vnet_hdr_from_desc(struct virtio_net_hdr
> *hdr,
>  static __rte_always_inline int
>  copy_desc_to_mbuf(struct virtio_net *dev, struct vhost_virtqueue *vq,
>  		  struct buf_vector *buf_vec, uint16_t nr_vec,
> -		  struct rte_mbuf *m, struct rte_mempool *mbuf_pool)
> +		  struct rte_mbuf *m, struct rte_mempool *mbuf_pool,
> +		  bool legacy_ol_flags)
>  {
>  	uint32_t buf_avail, buf_offset;
>  	uint64_t buf_addr, buf_len;
> @@ -2513,7 +2600,7 @@ copy_desc_to_mbuf(struct virtio_net *dev,
> struct vhost_virtqueue *vq,
>  	m->pkt_len    += mbuf_offset;
> 
>  	if (hdr)
> -		vhost_dequeue_offload(hdr, m);
> +		vhost_dequeue_offload(hdr, m, legacy_ol_flags);
> 
>  out:
> 
> @@ -2606,9 +2693,11 @@ virtio_dev_pktmbuf_alloc(struct virtio_net
> *dev, struct rte_mempool *mp,
>  	return pkt;
>  }
> 
> -static __rte_noinline uint16_t
> +__rte_always_inline
> +static uint16_t
>  virtio_dev_tx_split(struct virtio_net *dev, struct vhost_virtqueue *vq,
> -	struct rte_mempool *mbuf_pool, struct rte_mbuf **pkts, uint16_t
> count)
> +	struct rte_mempool *mbuf_pool, struct rte_mbuf **pkts, uint16_t
> count,
> +	bool legacy_ol_flags)
>  {
>  	uint16_t i;
>  	uint16_t free_entries;
> @@ -2668,7 +2757,7 @@ virtio_dev_tx_split(struct virtio_net *dev, struct
> vhost_virtqueue *vq,
>  		}
> 
>  		err = copy_desc_to_mbuf(dev, vq, buf_vec, nr_vec, pkts[i],
> -				mbuf_pool);
> +				mbuf_pool, legacy_ol_flags);
>  		if (unlikely(err)) {
>  			rte_pktmbuf_free(pkts[i]);
>  			if (!allocerr_warned) {
> @@ -2696,6 +2785,24 @@ virtio_dev_tx_split(struct virtio_net *dev,
> struct vhost_virtqueue *vq,
>  	return (i - dropped);
>  }
> 
> +__rte_noinline
> +static uint16_t
> +virtio_dev_tx_split_legacy(struct virtio_net *dev,
> +	struct vhost_virtqueue *vq, struct rte_mempool *mbuf_pool,
> +	struct rte_mbuf **pkts, uint16_t count)
> +{
> +	return virtio_dev_tx_split(dev, vq, mbuf_pool, pkts, count, true);
> +}
> +
> +__rte_noinline
> +static uint16_t
> +virtio_dev_tx_split_compliant(struct virtio_net *dev,
> +	struct vhost_virtqueue *vq, struct rte_mempool *mbuf_pool,
> +	struct rte_mbuf **pkts, uint16_t count)
> +{
> +	return virtio_dev_tx_split(dev, vq, mbuf_pool, pkts, count, false);
> +}
> +
>  static __rte_always_inline int
>  vhost_reserve_avail_batch_packed(struct virtio_net *dev,
>  				 struct vhost_virtqueue *vq,
> @@ -2770,7 +2877,8 @@ vhost_reserve_avail_batch_packed(struct
> virtio_net *dev,
>  static __rte_always_inline int
>  virtio_dev_tx_batch_packed(struct virtio_net *dev,
>  			   struct vhost_virtqueue *vq,
> -			   struct rte_mbuf **pkts)
> +			   struct rte_mbuf **pkts,
> +			   bool legacy_ol_flags)
>  {
>  	uint16_t avail_idx = vq->last_avail_idx;
>  	uint32_t buf_offset = sizeof(struct virtio_net_hdr_mrg_rxbuf);
> @@ -2794,7 +2902,7 @@ virtio_dev_tx_batch_packed(struct virtio_net
> *dev,
>  	if (virtio_net_with_host_offload(dev)) {
>  		vhost_for_each_try_unroll(i, 0, PACKED_BATCH_SIZE) {
>  			hdr = (struct virtio_net_hdr *)(desc_addrs[i]);
> -			vhost_dequeue_offload(hdr, pkts[i]);
> +			vhost_dequeue_offload(hdr, pkts[i], legacy_ol_flags);
>  		}
>  	}
> 
> @@ -2815,7 +2923,8 @@ vhost_dequeue_single_packed(struct virtio_net
> *dev,
>  			    struct rte_mempool *mbuf_pool,
>  			    struct rte_mbuf *pkts,
>  			    uint16_t *buf_id,
> -			    uint16_t *desc_count)
> +			    uint16_t *desc_count,
> +			    bool legacy_ol_flags)
>  {
>  	struct buf_vector buf_vec[BUF_VECTOR_MAX];
>  	uint32_t buf_len;
> @@ -2841,7 +2950,7 @@ vhost_dequeue_single_packed(struct virtio_net
> *dev,
>  	}
> 
>  	err = copy_desc_to_mbuf(dev, vq, buf_vec, nr_vec, pkts,
> -				mbuf_pool);
> +				mbuf_pool, legacy_ol_flags);
>  	if (unlikely(err)) {
>  		if (!allocerr_warned) {
>  			VHOST_LOG_DATA(ERR,
> @@ -2859,14 +2968,15 @@ static __rte_always_inline int
>  virtio_dev_tx_single_packed(struct virtio_net *dev,
>  			    struct vhost_virtqueue *vq,
>  			    struct rte_mempool *mbuf_pool,
> -			    struct rte_mbuf *pkts)
> +			    struct rte_mbuf *pkts,
> +			    bool legacy_ol_flags)
>  {
> 
>  	uint16_t buf_id, desc_count = 0;
>  	int ret;
> 
>  	ret = vhost_dequeue_single_packed(dev, vq, mbuf_pool, pkts,
> &buf_id,
> -					&desc_count);
> +					&desc_count, legacy_ol_flags);
> 
>  	if (likely(desc_count > 0)) {
>  		if (virtio_net_is_inorder(dev))
> @@ -2882,12 +2992,14 @@ virtio_dev_tx_single_packed(struct virtio_net
> *dev,
>  	return ret;
>  }
> 
> -static __rte_noinline uint16_t
> +__rte_always_inline
> +static uint16_t
>  virtio_dev_tx_packed(struct virtio_net *dev,
>  		     struct vhost_virtqueue *__rte_restrict vq,
>  		     struct rte_mempool *mbuf_pool,
>  		     struct rte_mbuf **__rte_restrict pkts,
> -		     uint32_t count)
> +		     uint32_t count,
> +		     bool legacy_ol_flags)
>  {
>  	uint32_t pkt_idx = 0;
> 
> @@ -2899,14 +3011,16 @@ virtio_dev_tx_packed(struct virtio_net *dev,
> 
>  		if (count - pkt_idx >= PACKED_BATCH_SIZE) {
>  			if (!virtio_dev_tx_batch_packed(dev, vq,
> -							&pkts[pkt_idx])) {
> +							&pkts[pkt_idx],
> +							legacy_ol_flags)) {
>  				pkt_idx += PACKED_BATCH_SIZE;
>  				continue;
>  			}
>  		}
> 
>  		if (virtio_dev_tx_single_packed(dev, vq, mbuf_pool,
> -						pkts[pkt_idx]))
> +						pkts[pkt_idx],
> +						legacy_ol_flags))
>  			break;
>  		pkt_idx++;
>  	} while (pkt_idx < count);
> @@ -2924,6 +3038,24 @@ virtio_dev_tx_packed(struct virtio_net *dev,
>  	return pkt_idx;
>  }
> 
> +__rte_noinline
> +static uint16_t
> +virtio_dev_tx_packed_legacy(struct virtio_net *dev,
> +	struct vhost_virtqueue *__rte_restrict vq, struct rte_mempool
> *mbuf_pool,
> +	struct rte_mbuf **__rte_restrict pkts, uint32_t count)
> +{
> +	return virtio_dev_tx_packed(dev, vq, mbuf_pool, pkts, count, true);
> +}
> +
> +__rte_noinline
> +static uint16_t
> +virtio_dev_tx_packed_compliant(struct virtio_net *dev,
> +	struct vhost_virtqueue *__rte_restrict vq, struct rte_mempool
> *mbuf_pool,
> +	struct rte_mbuf **__rte_restrict pkts, uint32_t count)
> +{
> +	return virtio_dev_tx_packed(dev, vq, mbuf_pool, pkts, count, false);
> +}
> +
>  uint16_t
>  rte_vhost_dequeue_burst(int vid, uint16_t queue_id,
>  	struct rte_mempool *mbuf_pool, struct rte_mbuf **pkts, uint16_t
> count)
> @@ -2999,10 +3131,17 @@ rte_vhost_dequeue_burst(int vid, uint16_t
> queue_id,
>  		count -= 1;
>  	}
> 
> -	if (vq_is_packed(dev))
> -		count = virtio_dev_tx_packed(dev, vq, mbuf_pool, pkts,
> count);
> -	else
> -		count = virtio_dev_tx_split(dev, vq, mbuf_pool, pkts, count);
> +	if (vq_is_packed(dev)) {
> +		if (dev->flags & VIRTIO_DEV_LEGACY_OL_FLAGS)
> +			count = virtio_dev_tx_packed_legacy(dev, vq,
> mbuf_pool, pkts, count);
> +		else
> +			count = virtio_dev_tx_packed_compliant(dev, vq,
> mbuf_pool, pkts, count);
> +	} else {
> +		if (dev->flags & VIRTIO_DEV_LEGACY_OL_FLAGS)
> +			count = virtio_dev_tx_split_legacy(dev, vq,
> mbuf_pool, pkts, count);
> +		else
> +			count = virtio_dev_tx_split_compliant(dev, vq,
> mbuf_pool, pkts, count);
> +	}
> 
>  out:
>  	if (dev->features & (1ULL << VIRTIO_F_IOMMU_PLATFORM))
> --
> 2.23.0


^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: [dpdk-dev] [PATCH v4 3/3] vhost: fix offload flags in Rx path
  2021-05-08  6:24     ` Wang, Yinan
@ 2021-05-12  3:29       ` Wang, Yinan
  2021-05-12 15:20         ` David Marchand
  0 siblings, 1 reply; 63+ messages in thread
From: Wang, Yinan @ 2021-05-12  3:29 UTC (permalink / raw)
  To: Wang, Yinan, David Marchand, dev
  Cc: maxime.coquelin, olivier.matz, fbl, i.maximets, Xia, Chenbo,
	Stokes, Ian, stable, Jijiang Liu, Yuanhan Liu

Hi David,

Since vhost Tx offload cannot work now, we reported a Bugzilla issue as below; could you help take a look?
https://bugs.dpdk.org/show_bug.cgi?id=702
We also tried the vhost example with a VM2VM iperf test; large packets also cannot be forwarded.

BR,
Yinan


> -----Original Message-----
> From: dev <dev-bounces@dpdk.org> On Behalf Of Wang, Yinan
> Sent: 2021-05-08 14:24
> To: David Marchand <david.marchand@redhat.com>; dev@dpdk.org
> Cc: maxime.coquelin@redhat.com; olivier.matz@6wind.com;
> fbl@sysclose.org; i.maximets@ovn.org; Xia, Chenbo
> <chenbo.xia@intel.com>; Stokes, Ian <ian.stokes@intel.com>;
> stable@dpdk.org; Jijiang Liu <jijiang.liu@intel.com>; Yuanhan Liu
> <yuanhan.liu@linux.intel.com>
> Subject: Re: [dpdk-dev] [PATCH v4 3/3] vhost: fix offload flags in Rx path
> 
> Hi David,
> 
> May I know how to configures Tx offloading by testpmd, could you help to
> provide an example case?
> I add a case which need vhost tx offload (TSO/cksum) function, this case
> can't work with the patch, could you use this case as the example if possible?
> 
> For example: VM2VM split ring vhost-user/virtio-net test with tcp traffic
> ==========================================================
> ===============
> 
> 1. Launch the Vhost sample on socket 0 by below commands::
> 
>     rm -rf vhost-net*
>     ./dpdk-testpmd -l 2-4 -n 4 --no-pci --file-prefix=vhost --vdev
> 'net_vhost0,iface=vhost-net0,queues=1' \
>     --vdev 'net_vhost1,iface=vhost-net1,queues=1'  -- -i --nb-cores=2 --
> txd=1024 --rxd=1024
>     testpmd>start
> 
> 2. Launch VM1 and VM2 on socket 1::
> 
>     taskset -c 32 qemu-system-x86_64 -name vm1 -enable-kvm -cpu host -
> smp 1 -m 4096 \
>     -object memory-backend-file,id=mem,size=4096M,mem-
> path=/mnt/huge,share=on \
>     -numa node,memdev=mem -mem-prealloc -drive
> file=/home/osimg/ubuntu20-04.img  \
>     -chardev socket,path=/tmp/vm2_qga0.sock,server,nowait,id=vm2_qga0
> -device virtio-serial \
>     -device virtserialport,chardev=vm2_qga0,name=org.qemu.guest_agent.2
> -daemonize \
>     -monitor unix:/tmp/vm2_monitor.sock,server,nowait -device
> e1000,netdev=nttsip1 \
>     -netdev user,id=nttsip1,hostfwd=tcp:127.0.0.1:6002-:22 \
>     -chardev socket,id=char0,path=./vhost-net0 \
>     -netdev type=vhost-user,id=netdev0,chardev=char0,vhostforce \
>     -device virtio-net-pci,netdev=netdev0,mac=52:54:00:00:00:01,disable-
> modern=false,mrg_rxbuf=on,csum=on,guest_csum=on,host_tso4=on,guest
> _tso4=on,guest_ecn=on -vnc :10
> 
>    taskset -c 33 qemu-system-x86_64 -name vm2 -enable-kvm -cpu host -
> smp 1 -m 4096 \
>     -object memory-backend-file,id=mem,size=4096M,mem-
> path=/mnt/huge,share=on \
>     -numa node,memdev=mem -mem-prealloc -drive
> file=/home/osimg/ubuntu20-04-2.img  \
>     -chardev socket,path=/tmp/vm2_qga0.sock,server,nowait,id=vm2_qga0
> -device virtio-serial \
>     -device virtserialport,chardev=vm2_qga0,name=org.qemu.guest_agent.2
> -daemonize \
>     -monitor unix:/tmp/vm2_monitor.sock,server,nowait -device
> e1000,netdev=nttsip1 \
>     -netdev user,id=nttsip1,hostfwd=tcp:127.0.0.1:6003-:22 \
>     -chardev socket,id=char0,path=./vhost-net1 \
>     -netdev type=vhost-user,id=netdev0,chardev=char0,vhostforce \
>     -device virtio-net-pci,netdev=netdev0,mac=52:54:00:00:00:02,disable-
> modern=false,mrg_rxbuf=on,csum=on,guest_csum=on,host_tso4=on,guest
> _tso4=on,guest_ecn=on -vnc :12
> 
> 3. On VM1, set virtio device IP and run arp protocol::
> 
>     ifconfig ens5 1.1.1.2
>     arp -s 1.1.1.8 52:54:00:00:00:02
> 
> 4. On VM2, set virtio device IP and run arp protocol::
> 
>     ifconfig ens5 1.1.1.8
>     arp -s 1.1.1.2 52:54:00:00:00:01
> 
> 5. Check the iperf performance with different packet size between two VMs
> by below commands::
> 
>     Under VM1, run: `iperf -s -i 1`
>     Under VM2, run: `iperf -c 1.1.1.2 -i 1 -t 60`
> 
> BR,
> Yinan
> 
> > -----Original Message-----
> > From: dev <dev-bounces@dpdk.org> On Behalf Of David Marchand
> > Sent: 2021年5月4日 0:44
> > To: dev@dpdk.org
> > Cc: maxime.coquelin@redhat.com; olivier.matz@6wind.com;
> > fbl@sysclose.org; i.maximets@ovn.org; Xia, Chenbo
> > <chenbo.xia@intel.com>; Stokes, Ian <ian.stokes@intel.com>;
> > stable@dpdk.org; Jijiang Liu <jijiang.liu@intel.com>; Yuanhan Liu
> > <yuanhan.liu@linux.intel.com>
> > Subject: [dpdk-dev] [PATCH v4 3/3] vhost: fix offload flags in Rx path
> >
> > The vhost library currently configures Tx offloading (PKT_TX_*) on any
> > packet received from a guest virtio device which asks for some offloading.
> >
> > This is problematic, as Tx offloading is something that the application
> > must ask for: the application needs to configure devices
> > to support every used offloads (ip, tcp checksumming, tso..), and the
> > various l2/l3/l4 lengths must be set following any processing that
> > happened in the application itself.
> >
> > On the other hand, the received packets are not marked wrt current
> > packet l3/l4 checksumming info.
> >
> > Copy virtio rx processing to fix those offload flags with some
> > differences:
> > - accept VIRTIO_NET_HDR_GSO_ECN and VIRTIO_NET_HDR_GSO_UDP,
> > - ignore anything but the VIRTIO_NET_HDR_F_NEEDS_CSUM flag (to
> comply
> > with
> >   the virtio spec),
> >
> > Some applications might rely on the current behavior, so it is left
> > untouched by default.
> > A new RTE_VHOST_USER_NET_COMPLIANT_OL_FLAGS flag is added to
> > enable the
> > new behavior.
> >
> > The vhost example has been updated for the new behavior: TSO is applied
> > to
> > any packet marked LRO.
> >
> > Fixes: 859b480d5afd ("vhost: add guest offload setting")
> > Cc: stable@dpdk.org
> >
> > Signed-off-by: David Marchand <david.marchand@redhat.com>
> > Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
> > ---
> > Changes since v3:
> > - rebased on next-virtio,
> >
> > Changes since v2:
> > - introduced a new flag to keep existing behavior as the default,
> > - packets with unrecognised offload are passed to the application with no
> >   offload metadata rather than dropped,
> > - ignored VIRTIO_NET_HDR_F_DATA_VALID since the virtio spec states
> that
> >   the virtio driver is not allowed to use this flag when transmitting
> >   packets,
> >
> > Changes since v1:
> > - updated vhost example,
> > - restored VIRTIO_NET_HDR_GSO_ECN and VIRTIO_NET_HDR_GSO_UDP
> > support,
> > - restored log on buggy offload request,
> >
> > ---
> >  doc/guides/prog_guide/vhost_lib.rst    |  12 ++
> >  doc/guides/rel_notes/release_21_05.rst |   6 +
> >  drivers/net/vhost/rte_eth_vhost.c      |   2 +-
> >  examples/vhost/main.c                  |  44 +++---
> >  lib/vhost/rte_vhost.h                  |   1 +
> >  lib/vhost/socket.c                     |   5 +-
> >  lib/vhost/vhost.c                      |   6 +-
> >  lib/vhost/vhost.h                      |  14 +-
> >  lib/vhost/virtio_net.c                 | 185 ++++++++++++++++++++++---
> >  9 files changed, 222 insertions(+), 53 deletions(-)
> >
> > diff --git a/doc/guides/prog_guide/vhost_lib.rst
> > b/doc/guides/prog_guide/vhost_lib.rst
> > index 7afa351675..d18fb98910 100644
> > --- a/doc/guides/prog_guide/vhost_lib.rst
> > +++ b/doc/guides/prog_guide/vhost_lib.rst
> > @@ -118,6 +118,18 @@ The following is an overview of some key Vhost
> > API functions:
> >
> >      It is disabled by default.
> >
> > +  - ``RTE_VHOST_USER_NET_COMPLIANT_OL_FLAGS``
> > +
> > +    Since v16.04, the vhost library forwards checksum and gso requests
> for
> > +    packets received from a virtio driver by filling Tx offload metadata in
> > +    the mbuf. This behavior is inconsistent with other drivers but it is left
> > +    untouched for existing applications that might rely on it.
> > +
> > +    This flag disables the legacy behavior and instead asks vhost to simply
> > +    populate Rx offload metadata in the mbuf.
> > +
> > +    It is disabled by default.
> > +
> >  * ``rte_vhost_driver_set_features(path, features)``
> >
> >    This function sets the feature bits the vhost-user driver supports. The
> > diff --git a/doc/guides/rel_notes/release_21_05.rst
> > b/doc/guides/rel_notes/release_21_05.rst
> > index a5f21f8425..6b7b0810a5 100644
> > --- a/doc/guides/rel_notes/release_21_05.rst
> > +++ b/doc/guides/rel_notes/release_21_05.rst
> > @@ -337,6 +337,12 @@ API Changes
> >    ``policer_action_recolor_supported`` and
> > ``policer_action_drop_supported``
> >    have been removed.
> >
> > +* vhost: The vhost library currently populates received mbufs from a
> virtio
> > +  driver with Tx offload flags while not filling Rx offload flags.
> > +  While this behavior is arguable, it is kept untouched.
> > +  A new flag ``RTE_VHOST_USER_NET_COMPLIANT_OL_FLAGS`` has been
> > added to ask
> > +  for a behavior compliant with the mbuf offload API.
> > +
> >
> >  ABI Changes
> >  -----------
> > diff --git a/drivers/net/vhost/rte_eth_vhost.c
> > b/drivers/net/vhost/rte_eth_vhost.c
> > index d198fc8a8e..281379d6a3 100644
> > --- a/drivers/net/vhost/rte_eth_vhost.c
> > +++ b/drivers/net/vhost/rte_eth_vhost.c
> > @@ -1505,7 +1505,7 @@ rte_pmd_vhost_probe(struct rte_vdev_device
> > *dev)
> >  	int ret = 0;
> >  	char *iface_name;
> >  	uint16_t queues;
> > -	uint64_t flags = 0;
> > +	uint64_t flags = RTE_VHOST_USER_NET_COMPLIANT_OL_FLAGS;
> >  	uint64_t disable_flags = 0;
> >  	int client_mode = 0;
> >  	int iommu_support = 0;
> > diff --git a/examples/vhost/main.c b/examples/vhost/main.c
> > index 0bee1f3321..d2179eadb9 100644
> > --- a/examples/vhost/main.c
> > +++ b/examples/vhost/main.c
> > @@ -19,6 +19,7 @@
> >  #include <rte_log.h>
> >  #include <rte_string_fns.h>
> >  #include <rte_malloc.h>
> > +#include <rte_net.h>
> >  #include <rte_vhost.h>
> >  #include <rte_ip.h>
> >  #include <rte_tcp.h>
> > @@ -1029,33 +1030,34 @@ find_local_dest(struct vhost_dev *vdev,
> > struct rte_mbuf *m,
> >  	return 0;
> >  }
> >
> > -static uint16_t
> > -get_psd_sum(void *l3_hdr, uint64_t ol_flags)
> > -{
> > -	if (ol_flags & PKT_TX_IPV4)
> > -		return rte_ipv4_phdr_cksum(l3_hdr, ol_flags);
> > -	else /* assume ethertype == RTE_ETHER_TYPE_IPV6 */
> > -		return rte_ipv6_phdr_cksum(l3_hdr, ol_flags);
> > -}
> > -
> >  static void virtio_tx_offload(struct rte_mbuf *m)
> >  {
> > +	struct rte_net_hdr_lens hdr_lens;
> > +	struct rte_ipv4_hdr *ipv4_hdr;
> > +	struct rte_tcp_hdr *tcp_hdr;
> > +	uint32_t ptype;
> >  	void *l3_hdr;
> > -	struct rte_ipv4_hdr *ipv4_hdr = NULL;
> > -	struct rte_tcp_hdr *tcp_hdr = NULL;
> > -	struct rte_ether_hdr *eth_hdr =
> > -		rte_pktmbuf_mtod(m, struct rte_ether_hdr *);
> >
> > -	l3_hdr = (char *)eth_hdr + m->l2_len;
> > +	ptype = rte_net_get_ptype(m, &hdr_lens, RTE_PTYPE_ALL_MASK);
> > +	m->l2_len = hdr_lens.l2_len;
> > +	m->l3_len = hdr_lens.l3_len;
> > +	m->l4_len = hdr_lens.l4_len;
> >
> > -	if (m->ol_flags & PKT_TX_IPV4) {
> > +	l3_hdr = rte_pktmbuf_mtod_offset(m, void *, m->l2_len);
> > +	tcp_hdr = rte_pktmbuf_mtod_offset(m, struct rte_tcp_hdr *,
> > +		m->l2_len + m->l3_len);
> > +
> > +	m->ol_flags |= PKT_TX_TCP_SEG;
> > +	if ((ptype & RTE_PTYPE_L3_MASK) == RTE_PTYPE_L3_IPV4) {
> > +		m->ol_flags |= PKT_TX_IPV4;
> > +		m->ol_flags |= PKT_TX_IP_CKSUM;
> >  		ipv4_hdr = l3_hdr;
> >  		ipv4_hdr->hdr_checksum = 0;
> > -		m->ol_flags |= PKT_TX_IP_CKSUM;
> > +		tcp_hdr->cksum = rte_ipv4_phdr_cksum(l3_hdr, m-
> > >ol_flags);
> > +	} else { /* assume ethertype == RTE_ETHER_TYPE_IPV6 */
> > +		m->ol_flags |= PKT_TX_IPV6;
> > +		tcp_hdr->cksum = rte_ipv6_phdr_cksum(l3_hdr, m-
> > >ol_flags);
> >  	}
> > -
> > -	tcp_hdr = (struct rte_tcp_hdr *)((char *)l3_hdr + m->l3_len);
> > -	tcp_hdr->cksum = get_psd_sum(l3_hdr, m->ol_flags);
> >  }
> >
> >  static __rte_always_inline void
> > @@ -1148,7 +1150,7 @@ virtio_tx_route(struct vhost_dev *vdev, struct
> > rte_mbuf *m, uint16_t vlan_tag)
> >  		m->vlan_tci = vlan_tag;
> >  	}
> >
> > -	if (m->ol_flags & PKT_TX_TCP_SEG)
> > +	if (m->ol_flags & PKT_RX_LRO)
> >  		virtio_tx_offload(m);
> >
> >  	tx_q->m_table[tx_q->len++] = m;
> > @@ -1633,7 +1635,7 @@ main(int argc, char *argv[])
> >  	int ret, i;
> >  	uint16_t portid;
> >  	static pthread_t tid;
> > -	uint64_t flags = 0;
> > +	uint64_t flags = RTE_VHOST_USER_NET_COMPLIANT_OL_FLAGS;
> >
> >  	signal(SIGINT, sigint_handler);
> >
> > diff --git a/lib/vhost/rte_vhost.h b/lib/vhost/rte_vhost.h
> > index d0a8ae31f2..8d875e9322 100644
> > --- a/lib/vhost/rte_vhost.h
> > +++ b/lib/vhost/rte_vhost.h
> > @@ -36,6 +36,7 @@ extern "C" {
> >  /* support only linear buffers (no chained mbufs) */
> >  #define RTE_VHOST_USER_LINEARBUF_SUPPORT	(1ULL << 6)
> >  #define RTE_VHOST_USER_ASYNC_COPY	(1ULL << 7)
> > +#define RTE_VHOST_USER_NET_COMPLIANT_OL_FLAGS	(1ULL << 8)
> >
> >  /* Features. */
> >  #ifndef VIRTIO_NET_F_GUEST_ANNOUNCE
> > diff --git a/lib/vhost/socket.c b/lib/vhost/socket.c
> > index 0169d36481..5d0d728d52 100644
> > --- a/lib/vhost/socket.c
> > +++ b/lib/vhost/socket.c
> > @@ -42,6 +42,7 @@ struct vhost_user_socket {
> >  	bool extbuf;
> >  	bool linearbuf;
> >  	bool async_copy;
> > +	bool net_compliant_ol_flags;
> >
> >  	/*
> >  	 * The "supported_features" indicates the feature bits the
> > @@ -224,7 +225,8 @@ vhost_user_add_connection(int fd, struct
> > vhost_user_socket *vsocket)
> >  	size = strnlen(vsocket->path, PATH_MAX);
> >  	vhost_set_ifname(vid, vsocket->path, size);
> >
> > -	vhost_set_builtin_virtio_net(vid, vsocket->use_builtin_virtio_net);
> > +	vhost_setup_virtio_net(vid, vsocket->use_builtin_virtio_net,
> > +		vsocket->net_compliant_ol_flags);
> >
> >  	vhost_attach_vdpa_device(vid, vsocket->vdpa_dev);
> >
> > @@ -877,6 +879,7 @@ rte_vhost_driver_register(const char *path,
> > uint64_t flags)
> >  	vsocket->extbuf = flags & RTE_VHOST_USER_EXTBUF_SUPPORT;
> >  	vsocket->linearbuf = flags &
> > RTE_VHOST_USER_LINEARBUF_SUPPORT;
> >  	vsocket->async_copy = flags & RTE_VHOST_USER_ASYNC_COPY;
> > +	vsocket->net_compliant_ol_flags = flags &
> > RTE_VHOST_USER_NET_COMPLIANT_OL_FLAGS;
> >
> >  	if (vsocket->async_copy &&
> >  		(flags & (RTE_VHOST_USER_IOMMU_SUPPORT |
> > diff --git a/lib/vhost/vhost.c b/lib/vhost/vhost.c
> > index c9b6379f73..9abfc0bfe7 100644
> > --- a/lib/vhost/vhost.c
> > +++ b/lib/vhost/vhost.c
> > @@ -752,7 +752,7 @@ vhost_set_ifname(int vid, const char *if_name,
> > unsigned int if_len)
> >  }
> >
> >  void
> > -vhost_set_builtin_virtio_net(int vid, bool enable)
> > +vhost_setup_virtio_net(int vid, bool enable, bool compliant_ol_flags)
> >  {
> >  	struct virtio_net *dev = get_device(vid);
> >
> > @@ -763,6 +763,10 @@ vhost_set_builtin_virtio_net(int vid, bool
> enable)
> >  		dev->flags |= VIRTIO_DEV_BUILTIN_VIRTIO_NET;
> >  	else
> >  		dev->flags &= ~VIRTIO_DEV_BUILTIN_VIRTIO_NET;
> > +	if (!compliant_ol_flags)
> > +		dev->flags |= VIRTIO_DEV_LEGACY_OL_FLAGS;
> > +	else
> > +		dev->flags &= ~VIRTIO_DEV_LEGACY_OL_FLAGS;
> >  }
> >
> >  void
> > diff --git a/lib/vhost/vhost.h b/lib/vhost/vhost.h
> > index b303635645..8078ddff79 100644
> > --- a/lib/vhost/vhost.h
> > +++ b/lib/vhost/vhost.h
> > @@ -27,15 +27,17 @@
> >  #include "rte_vhost_async.h"
> >
> >  /* Used to indicate that the device is running on a data core */
> > -#define VIRTIO_DEV_RUNNING 1
> > +#define VIRTIO_DEV_RUNNING ((uint32_t)1 << 0)
> >  /* Used to indicate that the device is ready to operate */
> > -#define VIRTIO_DEV_READY 2
> > +#define VIRTIO_DEV_READY ((uint32_t)1 << 1)
> >  /* Used to indicate that the built-in vhost net device backend is enabled
> */
> > -#define VIRTIO_DEV_BUILTIN_VIRTIO_NET 4
> > +#define VIRTIO_DEV_BUILTIN_VIRTIO_NET ((uint32_t)1 << 2)
> >  /* Used to indicate that the device has its own data path and configured
> */
> > -#define VIRTIO_DEV_VDPA_CONFIGURED 8
> > +#define VIRTIO_DEV_VDPA_CONFIGURED ((uint32_t)1 << 3)
> >  /* Used to indicate that the feature negotiation failed */
> > -#define VIRTIO_DEV_FEATURES_FAILED 16
> > +#define VIRTIO_DEV_FEATURES_FAILED ((uint32_t)1 << 4)
> > +/* Used to indicate that the virtio_net tx code should fill TX ol_flags */
> > +#define VIRTIO_DEV_LEGACY_OL_FLAGS ((uint32_t)1 << 5)
> >
> >  /* Backend value set by guest. */
> >  #define VIRTIO_DEV_STOPPED -1
> > @@ -683,7 +685,7 @@ int alloc_vring_queue(struct virtio_net *dev,
> > uint32_t vring_idx);
> >  void vhost_attach_vdpa_device(int vid, struct rte_vdpa_device *dev);
> >
> >  void vhost_set_ifname(int, const char *if_name, unsigned int if_len);
> > -void vhost_set_builtin_virtio_net(int vid, bool enable);
> > +void vhost_setup_virtio_net(int vid, bool enable, bool legacy_ol_flags);
> >  void vhost_enable_extbuf(int vid);
> >  void vhost_enable_linearbuf(int vid);
> >  int vhost_enable_guest_notification(struct virtio_net *dev,
> > diff --git a/lib/vhost/virtio_net.c b/lib/vhost/virtio_net.c
> > index 1a34867f3c..8e36f4c340 100644
> > --- a/lib/vhost/virtio_net.c
> > +++ b/lib/vhost/virtio_net.c
> > @@ -8,6 +8,7 @@
> >
> >  #include <rte_mbuf.h>
> >  #include <rte_memcpy.h>
> > +#include <rte_net.h>
> >  #include <rte_ether.h>
> >  #include <rte_ip.h>
> >  #include <rte_vhost.h>
> > @@ -2303,15 +2304,12 @@ parse_ethernet(struct rte_mbuf *m,
> uint16_t
> > *l4_proto, void **l4_hdr)
> >  }
> >
> >  static __rte_always_inline void
> > -vhost_dequeue_offload(struct virtio_net_hdr *hdr, struct rte_mbuf *m)
> > +vhost_dequeue_offload_legacy(struct virtio_net_hdr *hdr, struct
> > rte_mbuf *m)
> >  {
> >  	uint16_t l4_proto = 0;
> >  	void *l4_hdr = NULL;
> >  	struct rte_tcp_hdr *tcp_hdr = NULL;
> >
> > -	if (hdr->flags == 0 && hdr->gso_type ==
> > VIRTIO_NET_HDR_GSO_NONE)
> > -		return;
> > -
> >  	parse_ethernet(m, &l4_proto, &l4_hdr);
> >  	if (hdr->flags == VIRTIO_NET_HDR_F_NEEDS_CSUM) {
> >  		if (hdr->csum_start == (m->l2_len + m->l3_len)) {
> > @@ -2356,6 +2354,94 @@ vhost_dequeue_offload(struct virtio_net_hdr
> > *hdr, struct rte_mbuf *m)
> >  	}
> >  }
> >
> > +static __rte_always_inline void
> > +vhost_dequeue_offload(struct virtio_net_hdr *hdr, struct rte_mbuf *m,
> > +	bool legacy_ol_flags)
> > +{
> > +	struct rte_net_hdr_lens hdr_lens;
> > +	int l4_supported = 0;
> > +	uint32_t ptype;
> > +
> > +	if (hdr->flags == 0 && hdr->gso_type ==
> > VIRTIO_NET_HDR_GSO_NONE)
> > +		return;
> > +
> > +	if (legacy_ol_flags) {
> > +		vhost_dequeue_offload_legacy(hdr, m);
> > +		return;
> > +	}
> > +
> > +	m->ol_flags |= PKT_RX_IP_CKSUM_UNKNOWN;
> > +
> > +	ptype = rte_net_get_ptype(m, &hdr_lens, RTE_PTYPE_ALL_MASK);
> > +	m->packet_type = ptype;
> > +	if ((ptype & RTE_PTYPE_L4_MASK) == RTE_PTYPE_L4_TCP ||
> > +	    (ptype & RTE_PTYPE_L4_MASK) == RTE_PTYPE_L4_UDP ||
> > +	    (ptype & RTE_PTYPE_L4_MASK) == RTE_PTYPE_L4_SCTP)
> > +		l4_supported = 1;
> > +
> > +	/* According to Virtio 1.1 spec, the device only needs to look at
> > +	 * VIRTIO_NET_HDR_F_NEEDS_CSUM in the packet transmission
> > path.
> > +	 * This differs from the processing incoming packets path where the
> > +	 * driver could rely on VIRTIO_NET_HDR_F_DATA_VALID flag set by
> > the
> > +	 * device.
> > +	 *
> > +	 * 5.1.6.2.1 Driver Requirements: Packet Transmission
> > +	 * The driver MUST NOT set the VIRTIO_NET_HDR_F_DATA_VALID
> > and
> > +	 * VIRTIO_NET_HDR_F_RSC_INFO bits in flags.
> > +	 *
> > +	 * 5.1.6.2.2 Device Requirements: Packet Transmission
> > +	 * The device MUST ignore flag bits that it does not recognize.
> > +	 */
> > +	if (hdr->flags & VIRTIO_NET_HDR_F_NEEDS_CSUM) {
> > +		uint32_t hdrlen;
> > +
> > +		hdrlen = hdr_lens.l2_len + hdr_lens.l3_len + hdr_lens.l4_len;
> > +		if (hdr->csum_start <= hdrlen && l4_supported != 0) {
> > +			m->ol_flags |= PKT_RX_L4_CKSUM_NONE;
> > +		} else {
> > +			/* Unknown proto or tunnel, do sw cksum. We can
> > assume
> > +			 * the cksum field is in the first segment since the
> > +			 * buffers we provided to the host are large enough.
> > +			 * In case of SCTP, this will be wrong since it's a CRC
> > +			 * but there's nothing we can do.
> > +			 */
> > +			uint16_t csum = 0, off;
> > +
> > +			if (rte_raw_cksum_mbuf(m, hdr->csum_start,
> > +					rte_pktmbuf_pkt_len(m) - hdr-
> > >csum_start, &csum) < 0)
> > +				return;
> > +			if (likely(csum != 0xffff))
> > +				csum = ~csum;
> > +			off = hdr->csum_offset + hdr->csum_start;
> > +			if (rte_pktmbuf_data_len(m) >= off + 1)
> > +				*rte_pktmbuf_mtod_offset(m, uint16_t *,
> > off) = csum;
> > +		}
> > +	}
> > +
> > +	if (hdr->gso_type != VIRTIO_NET_HDR_GSO_NONE) {
> > +		if (hdr->gso_size == 0)
> > +			return;
> > +
> > +		switch (hdr->gso_type & ~VIRTIO_NET_HDR_GSO_ECN) {
> > +		case VIRTIO_NET_HDR_GSO_TCPV4:
> > +		case VIRTIO_NET_HDR_GSO_TCPV6:
> > +			if ((ptype & RTE_PTYPE_L4_MASK) !=
> > RTE_PTYPE_L4_TCP)
> > +				break;
> > +			m->ol_flags |= PKT_RX_LRO |
> > PKT_RX_L4_CKSUM_NONE;
> > +			m->tso_segsz = hdr->gso_size;
> > +			break;
> > +		case VIRTIO_NET_HDR_GSO_UDP:
> > +			if ((ptype & RTE_PTYPE_L4_MASK) !=
> > RTE_PTYPE_L4_UDP)
> > +				break;
> > +			m->ol_flags |= PKT_RX_LRO |
> > PKT_RX_L4_CKSUM_NONE;
> > +			m->tso_segsz = hdr->gso_size;
> > +			break;
> > +		default:
> > +			break;
> > +		}
> > +	}
> > +}
> > +
> >  static __rte_noinline void
> >  copy_vnet_hdr_from_desc(struct virtio_net_hdr *hdr,
> >  		struct buf_vector *buf_vec)
> > @@ -2380,7 +2466,8 @@ copy_vnet_hdr_from_desc(struct
> virtio_net_hdr
> > *hdr,
> >  static __rte_always_inline int
> >  copy_desc_to_mbuf(struct virtio_net *dev, struct vhost_virtqueue *vq,
> >  		  struct buf_vector *buf_vec, uint16_t nr_vec,
> > -		  struct rte_mbuf *m, struct rte_mempool *mbuf_pool)
> > +		  struct rte_mbuf *m, struct rte_mempool *mbuf_pool,
> > +		  bool legacy_ol_flags)
> >  {
> >  	uint32_t buf_avail, buf_offset;
> >  	uint64_t buf_addr, buf_len;
> > @@ -2513,7 +2600,7 @@ copy_desc_to_mbuf(struct virtio_net *dev,
> > struct vhost_virtqueue *vq,
> >  	m->pkt_len    += mbuf_offset;
> >
> >  	if (hdr)
> > -		vhost_dequeue_offload(hdr, m);
> > +		vhost_dequeue_offload(hdr, m, legacy_ol_flags);
> >
> >  out:
> >
> > @@ -2606,9 +2693,11 @@ virtio_dev_pktmbuf_alloc(struct virtio_net
> > *dev, struct rte_mempool *mp,
> >  	return pkt;
> >  }
> >
> > -static __rte_noinline uint16_t
> > +__rte_always_inline
> > +static uint16_t
> >  virtio_dev_tx_split(struct virtio_net *dev, struct vhost_virtqueue *vq,
> > -	struct rte_mempool *mbuf_pool, struct rte_mbuf **pkts, uint16_t
> > count)
> > +	struct rte_mempool *mbuf_pool, struct rte_mbuf **pkts, uint16_t
> > count,
> > +	bool legacy_ol_flags)
> >  {
> >  	uint16_t i;
> >  	uint16_t free_entries;
> > @@ -2668,7 +2757,7 @@ virtio_dev_tx_split(struct virtio_net *dev,
> struct
> > vhost_virtqueue *vq,
> >  		}
> >
> >  		err = copy_desc_to_mbuf(dev, vq, buf_vec, nr_vec, pkts[i],
> > -				mbuf_pool);
> > +				mbuf_pool, legacy_ol_flags);
> >  		if (unlikely(err)) {
> >  			rte_pktmbuf_free(pkts[i]);
> >  			if (!allocerr_warned) {
> > @@ -2696,6 +2785,24 @@ virtio_dev_tx_split(struct virtio_net *dev,
> > struct vhost_virtqueue *vq,
> >  	return (i - dropped);
> >  }
> >
> > +__rte_noinline
> > +static uint16_t
> > +virtio_dev_tx_split_legacy(struct virtio_net *dev,
> > +	struct vhost_virtqueue *vq, struct rte_mempool *mbuf_pool,
> > +	struct rte_mbuf **pkts, uint16_t count)
> > +{
> > +	return virtio_dev_tx_split(dev, vq, mbuf_pool, pkts, count, true);
> > +}
> > +
> > +__rte_noinline
> > +static uint16_t
> > +virtio_dev_tx_split_compliant(struct virtio_net *dev,
> > +	struct vhost_virtqueue *vq, struct rte_mempool *mbuf_pool,
> > +	struct rte_mbuf **pkts, uint16_t count)
> > +{
> > +	return virtio_dev_tx_split(dev, vq, mbuf_pool, pkts, count, false);
> > +}
> > +
> >  static __rte_always_inline int
> >  vhost_reserve_avail_batch_packed(struct virtio_net *dev,
> >  				 struct vhost_virtqueue *vq,
> > @@ -2770,7 +2877,8 @@ vhost_reserve_avail_batch_packed(struct
> > virtio_net *dev,
> >  static __rte_always_inline int
> >  virtio_dev_tx_batch_packed(struct virtio_net *dev,
> >  			   struct vhost_virtqueue *vq,
> > -			   struct rte_mbuf **pkts)
> > +			   struct rte_mbuf **pkts,
> > +			   bool legacy_ol_flags)
> >  {
> >  	uint16_t avail_idx = vq->last_avail_idx;
> >  	uint32_t buf_offset = sizeof(struct virtio_net_hdr_mrg_rxbuf);
> > @@ -2794,7 +2902,7 @@ virtio_dev_tx_batch_packed(struct virtio_net
> > *dev,
> >  	if (virtio_net_with_host_offload(dev)) {
> >  		vhost_for_each_try_unroll(i, 0, PACKED_BATCH_SIZE) {
> >  			hdr = (struct virtio_net_hdr *)(desc_addrs[i]);
> > -			vhost_dequeue_offload(hdr, pkts[i]);
> > +			vhost_dequeue_offload(hdr, pkts[i], legacy_ol_flags);
> >  		}
> >  	}
> >
> > @@ -2815,7 +2923,8 @@ vhost_dequeue_single_packed(struct
> virtio_net
> > *dev,
> >  			    struct rte_mempool *mbuf_pool,
> >  			    struct rte_mbuf *pkts,
> >  			    uint16_t *buf_id,
> > -			    uint16_t *desc_count)
> > +			    uint16_t *desc_count,
> > +			    bool legacy_ol_flags)
> >  {
> >  	struct buf_vector buf_vec[BUF_VECTOR_MAX];
> >  	uint32_t buf_len;
> > @@ -2841,7 +2950,7 @@ vhost_dequeue_single_packed(struct
> virtio_net
> > *dev,
> >  	}
> >
> >  	err = copy_desc_to_mbuf(dev, vq, buf_vec, nr_vec, pkts,
> > -				mbuf_pool);
> > +				mbuf_pool, legacy_ol_flags);
> >  	if (unlikely(err)) {
> >  		if (!allocerr_warned) {
> >  			VHOST_LOG_DATA(ERR,
> > @@ -2859,14 +2968,15 @@ static __rte_always_inline int
> >  virtio_dev_tx_single_packed(struct virtio_net *dev,
> >  			    struct vhost_virtqueue *vq,
> >  			    struct rte_mempool *mbuf_pool,
> > -			    struct rte_mbuf *pkts)
> > +			    struct rte_mbuf *pkts,
> > +			    bool legacy_ol_flags)
> >  {
> >
> >  	uint16_t buf_id, desc_count = 0;
> >  	int ret;
> >
> >  	ret = vhost_dequeue_single_packed(dev, vq, mbuf_pool, pkts,
> > &buf_id,
> > -					&desc_count);
> > +					&desc_count, legacy_ol_flags);
> >
> >  	if (likely(desc_count > 0)) {
> >  		if (virtio_net_is_inorder(dev))
> > @@ -2882,12 +2992,14 @@ virtio_dev_tx_single_packed(struct
> virtio_net
> > *dev,
> >  	return ret;
> >  }
> >
> > -static __rte_noinline uint16_t
> > +__rte_always_inline
> > +static uint16_t
> >  virtio_dev_tx_packed(struct virtio_net *dev,
> >  		     struct vhost_virtqueue *__rte_restrict vq,
> >  		     struct rte_mempool *mbuf_pool,
> >  		     struct rte_mbuf **__rte_restrict pkts,
> > -		     uint32_t count)
> > +		     uint32_t count,
> > +		     bool legacy_ol_flags)
> >  {
> >  	uint32_t pkt_idx = 0;
> >
> > @@ -2899,14 +3011,16 @@ virtio_dev_tx_packed(struct virtio_net *dev,
> >
> >  		if (count - pkt_idx >= PACKED_BATCH_SIZE) {
> >  			if (!virtio_dev_tx_batch_packed(dev, vq,
> > -							&pkts[pkt_idx])) {
> > +							&pkts[pkt_idx],
> > +							legacy_ol_flags)) {
> >  				pkt_idx += PACKED_BATCH_SIZE;
> >  				continue;
> >  			}
> >  		}
> >
> >  		if (virtio_dev_tx_single_packed(dev, vq, mbuf_pool,
> > -						pkts[pkt_idx]))
> > +						pkts[pkt_idx],
> > +						legacy_ol_flags))
> >  			break;
> >  		pkt_idx++;
> >  	} while (pkt_idx < count);
> > @@ -2924,6 +3038,24 @@ virtio_dev_tx_packed(struct virtio_net *dev,
> >  	return pkt_idx;
> >  }
> >
> > +__rte_noinline
> > +static uint16_t
> > +virtio_dev_tx_packed_legacy(struct virtio_net *dev,
> > +	struct vhost_virtqueue *__rte_restrict vq, struct rte_mempool
> > *mbuf_pool,
> > +	struct rte_mbuf **__rte_restrict pkts, uint32_t count)
> > +{
> > +	return virtio_dev_tx_packed(dev, vq, mbuf_pool, pkts, count, true);
> > +}
> > +
> > +__rte_noinline
> > +static uint16_t
> > +virtio_dev_tx_packed_compliant(struct virtio_net *dev,
> > +	struct vhost_virtqueue *__rte_restrict vq, struct rte_mempool
> > *mbuf_pool,
> > +	struct rte_mbuf **__rte_restrict pkts, uint32_t count)
> > +{
> > +	return virtio_dev_tx_packed(dev, vq, mbuf_pool, pkts, count, false);
> > +}
> > +
> >  uint16_t
> >  rte_vhost_dequeue_burst(int vid, uint16_t queue_id,
> >  	struct rte_mempool *mbuf_pool, struct rte_mbuf **pkts, uint16_t
> > count)
> > @@ -2999,10 +3131,17 @@ rte_vhost_dequeue_burst(int vid, uint16_t
> > queue_id,
> >  		count -= 1;
> >  	}
> >
> > -	if (vq_is_packed(dev))
> > -		count = virtio_dev_tx_packed(dev, vq, mbuf_pool, pkts,
> > count);
> > -	else
> > -		count = virtio_dev_tx_split(dev, vq, mbuf_pool, pkts, count);
> > +	if (vq_is_packed(dev)) {
> > +		if (dev->flags & VIRTIO_DEV_LEGACY_OL_FLAGS)
> > +			count = virtio_dev_tx_packed_legacy(dev, vq,
> > mbuf_pool, pkts, count);
> > +		else
> > +			count = virtio_dev_tx_packed_compliant(dev, vq,
> > mbuf_pool, pkts, count);
> > +	} else {
> > +		if (dev->flags & VIRTIO_DEV_LEGACY_OL_FLAGS)
> > +			count = virtio_dev_tx_split_legacy(dev, vq,
> > mbuf_pool, pkts, count);
> > +		else
> > +			count = virtio_dev_tx_split_compliant(dev, vq,
> > mbuf_pool, pkts, count);
> > +	}
> >
> >  out:
> >  	if (dev->features & (1ULL << VIRTIO_F_IOMMU_PLATFORM))
> > --
> > 2.23.0


^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: [dpdk-dev] [PATCH v4 3/3] vhost: fix offload flags in Rx path
  2021-05-12  3:29       ` Wang, Yinan
@ 2021-05-12 15:20         ` David Marchand
  2021-05-13  6:34           ` Wang, Yinan
  0 siblings, 1 reply; 63+ messages in thread
From: David Marchand @ 2021-05-12 15:20 UTC (permalink / raw)
  To: Wang, Yinan
  Cc: dev, maxime.coquelin, olivier.matz, fbl, i.maximets, Xia, Chenbo,
	Stokes, Ian, stable, Jijiang Liu, Yuanhan Liu

On Wed, May 12, 2021 at 5:30 AM Wang, Yinan <yinan.wang@intel.com> wrote:
>
> Hi David,
>
> Since vhost Tx offload can't work now, we reported a Bugzilla issue as below; could you help to take a look?
> https://bugs.dpdk.org/show_bug.cgi?id=702

(I discovered your mail from 05/08 only today, now that I got a new
mail, might be a pebcak from me, sorry...)


- Looking at the bz, there is a first issue/misconception.
testpmd only does TSO or any kind of tx offloading with the csum forward engine.
The iofwd engine won't make TSO possible.


- Let's say we use the csum fwd engine: testpmd configures drivers
through the ethdev API.
The ethdev API states that no offloading is enabled unless requested
by the application.
TSO, l3/l4 checksums offloading are documented as:
https://doc.dpdk.org/guides/nics/features.html#l3-checksum-offload
https://doc.dpdk.org/guides/nics/features.html#lro

But the vhost pmd does not report such capabilities.
https://git.dpdk.org/dpdk/tree/drivers/net/vhost/rte_eth_vhost.c#n1276

So we can't expect testpmd to have tso working with net/vhost pmd.


- The csum offloading engine swaps mac addresses.
I would expect issues with inter vm traffic.


In summary, I think this is a bad test.
If it worked with the commands in the bugzilla before my change (which
I doubt), it was wrong.


> We also tried the vhost example with a VM2VM iperf test; large pkts also can't be forwarded.

"large pkts", can you give details?

I tried to use this example, without/with my change, but:

When I try to start this example with a physical port and two vhosts,
I get a crash (division by 0 on vdmq stuff).
When I start it without a physical port, I get a complaint about no
port being enabled.
Passing a portmask 0x1 seems to work, the example starts but, next, no
traffic is forwarded (not even arp).
Hooking gdb, I never get packet dequeued from vhost.


-- 
David Marchand


^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: [dpdk-dev] [PATCH v4 3/3] vhost: fix offload flags in Rx path
  2021-05-12 15:20         ` David Marchand
@ 2021-05-13  6:34           ` Wang, Yinan
  0 siblings, 0 replies; 63+ messages in thread
From: Wang, Yinan @ 2021-05-13  6:34 UTC (permalink / raw)
  To: David Marchand
  Cc: dev, maxime.coquelin, olivier.matz, fbl, i.maximets, Xia, Chenbo,
	Stokes, Ian, stable, Jijiang Liu, Yuanhan Liu



> -----Original Message-----
> From: David Marchand <david.marchand@redhat.com>
> Sent: 2021年5月12日 23:20
> To: Wang, Yinan <yinan.wang@intel.com>
> Cc: dev@dpdk.org; maxime.coquelin@redhat.com;
> olivier.matz@6wind.com; fbl@sysclose.org; i.maximets@ovn.org; Xia,
> Chenbo <chenbo.xia@intel.com>; Stokes, Ian <ian.stokes@intel.com>;
> stable@dpdk.org; Jijiang Liu <jijiang.liu@intel.com>; Yuanhan Liu
> <yuanhan.liu@linux.intel.com>
> Subject: Re: [dpdk-dev] [PATCH v4 3/3] vhost: fix offload flags in Rx path
> 
> On Wed, May 12, 2021 at 5:30 AM Wang, Yinan <yinan.wang@intel.com>
> wrote:
> >
> > Hi David,
> >
> > Since vhost Tx offload can't work now, we reported a Bugzilla issue as below;
> > could you help to take a look?
> > https://bugs.dpdk.org/show_bug.cgi?id=702
> 
> (I discovered your mail from 05/08 only today, now that I got a new
> mail, might be a pebcak from me, sorry...)
> 
> 
> - Looking at the bz, there is a first issue/misconception.
> testpmd only does TSO or any kind of tx offloading with the csum forward
> engine.
> The iofwd engine won't make TSO possible.
> 
> 
> - Let's say we use the csum fwd engine, testpmd configures drivers
> through the ethdev API.
> The ethdev API states that no offloading is enabled unless requested
> by the application.
> TSO, l3/l4 checksums offloading are documented as:
> https://doc.dpdk.org/guides/nics/features.html#l3-checksum-offload
> https://doc.dpdk.org/guides/nics/features.html#lro
> 
> But the vhost pmd does not report such capabilities.
> https://git.dpdk.org/dpdk/tree/drivers/net/vhost/rte_eth_vhost.c#n1276
> 
> So we can't expect testpmd to have tso working with net/vhost pmd.
> 
> 
> - The csum offloading engine swaps mac addresses.
> I would expect issues with inter vm traffic.
> 
> 
> In summary, I think this is a bad test.
> If it worked with the commands in the bugzilla before my change (which
> I doubt), it was wrong.

Thanks for your kind explanation.
Before this patch, vhost could declare TSO offload: if we configure TSO/csum in QEMU, the TSO offload flags were set, so VM2VM could forward large pkts (64k when using iperf) with iofwd.
Now I understand this case will no longer work; we can move to using the vswitch sample.

> 
> > We also tried the vhost example with a VM2VM iperf test; large pkts also can't
> > be forwarded.
> 
> "large pkts", can you give details?
> 
> I tried to use this example, without/with my change, but:
> 
> When I try to start this example with a physical port and two vhosts,
> I get a crash (division by 0 on vdmq stuff).
> When I start it without a physical port, I get a complaint about no
> port being enabled.
> Passing a portmask 0x1 seems to work, the example starts but, next, no
> traffic is forwarded (not even arp).
> Hooking gdb, I never get packet dequeued from vhost.

I re-tested with the vswitch sample; the VM2VM iperf test works both with and without this patch. Sorry for the wrong result about the vhost example before.
There is some special configuration in the vswitch sample. The test steps below work:

1. Modify the vhost example code as follows::
	--- a/examples/vhost/main.c
	+++ b/examples/vhost/main.c
	@@ -29,7 +29,7 @@
	 #include "main.h"

	 #ifndef MAX_QUEUES
	-#define MAX_QUEUES 128
	+#define MAX_QUEUES 512
	 #endif
	 /* the maximum number of external ports supported */

2. Bind one physical port to vfio-pci, launch dpdk-vhost with the command below::

	./dpdk-vhost -l 26-28 -n 4 -- -p 0x1 --mergeable 1 --vm2vm 1 --socket-file /tmp/vhost-net0 --socket-file /tmp/vhost-net1

3. Start VM1::

 	/home/qemu-install/qemu-4.2.1/bin/qemu-system-x86_64 -name vm1 -enable-kvm -cpu host -smp 4 -m 4096 \
        -object memory-backend-file,id=mem,size=4096M,mem-path=/mnt/huge,share=on \
        -numa node,memdev=mem -mem-prealloc -drive file=/home/osimg/ubuntu20-04.img  \
        -chardev socket,path=/tmp/vm2_qga0.sock,server,nowait,id=vm2_qga0 -device virtio-serial \
        -device virtserialport,chardev=vm2_qga0,name=org.qemu.guest_agent.2 -daemonize \
        -monitor unix:/tmp/vm2_monitor.sock,server,nowait -device e1000,netdev=nttsip1 \
        -netdev user,id=nttsip1,hostfwd=tcp:127.0.0.1:6002-:22 \
        -chardev socket,id=char0,path=/tmp/vhost-net0 \
        -netdev type=vhost-user,id=netdev0,chardev=char0,vhostforce \
        -device virtio-net-pci,netdev=netdev0,mac=52:54:00:00:00:01,disable-modern=true,mrg_rxbuf=off,csum=on,guest_csum=on,host_tso4=on,guest_tso4=on,guest_ecn=on -vnc :10

4. Start VM2::

	/home/qemu-install/qemu-4.2.1/bin/qemu-system-x86_64 -name vm2 -enable-kvm -cpu host -smp 4 -m 4096 \
        -object memory-backend-file,id=mem,size=4096M,mem-path=/mnt/huge,share=on \
        -numa node,memdev=mem -mem-prealloc -drive file=/home/osimg/ubuntu20-04-2.img  \
        -chardev socket,path=/tmp/vm2_qga0.sock,server,nowait,id=vm2_qga0 -device virtio-serial \
        -device virtserialport,chardev=vm2_qga0,name=org.qemu.guest_agent.2 -daemonize \
        -monitor unix:/tmp/vm2_monitor.sock,server,nowait -device e1000,netdev=nttsip1 \
        -netdev user,id=nttsip1,hostfwd=tcp:127.0.0.1:6003-:22 \
        -chardev socket,id=char0,path=/tmp/vhost-net1 \
        -netdev type=vhost-user,id=netdev0,chardev=char0,vhostforce \
        -device virtio-net-pci,netdev=netdev0,mac=52:54:00:00:00:02,disable-modern=true,mrg_rxbuf=off,csum=on,guest_csum=on,host_tso4=on,guest_tso4=on,guest_ecn=on -vnc :12

5. On VM1, set the virtio device IP and add a static ARP entry::

    ifconfig ens5 1.1.1.2
    arp -s 1.1.1.8 52:54:00:00:00:02

6. On VM2, set the virtio device IP and add a static ARP entry::

    ifconfig ens5 1.1.1.8
    arp -s 1.1.1.2 52:54:00:00:00:01

7. Check iperf performance with different packet sizes between the two VMs using the commands below::

    Under VM1, run: `iperf -s -i 1`
    Under VM2, run: `iperf -c 1.1.1.2 -i 1 -t 60`

> 
> 
> --
> David Marchand


end of thread, other threads:[~2021-05-13  6:34 UTC | newest]

Thread overview: 63+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-04-01  9:52 [dpdk-dev] [PATCH 0/5] Offload flags fixes David Marchand
2021-04-01  9:52 ` [dpdk-dev] [PATCH 1/5] mbuf: mark old offload flag as deprecated David Marchand
2021-04-07 20:14   ` Flavio Leitner
2021-04-08  7:23   ` Olivier Matz
2021-04-08  8:41     ` David Marchand
2021-04-01  9:52 ` [dpdk-dev] [PATCH 2/5] net/tap: do not touch Tx offload flags David Marchand
2021-04-07 20:15   ` Flavio Leitner
2021-04-08  7:41     ` Olivier Matz
2021-04-08 11:21       ` Flavio Leitner
2021-04-08 12:05         ` Olivier Matz
2021-04-08 12:58           ` Flavio Leitner
2021-04-09 13:30             ` Olivier Matz
2021-04-09 16:55               ` Flavio Leitner
2021-04-28 12:17               ` David Marchand
2021-04-08 12:16         ` Ananyev, Konstantin
2021-04-08  7:53   ` Olivier Matz
2021-04-28 12:12     ` David Marchand
2021-04-01  9:52 ` [dpdk-dev] [PATCH 3/5] net/virtio: " David Marchand
2021-04-13 14:17   ` Maxime Coquelin
2021-04-01  9:52 ` [dpdk-dev] [PATCH 4/5] net/virtio: refactor Tx offload helper David Marchand
2021-04-08 13:05   ` Flavio Leitner
2021-04-09  2:31   ` Ruifeng Wang
2021-04-01  9:52 ` [dpdk-dev] [PATCH 5/5] vhost: fix offload flags in Rx path David Marchand
2021-04-08  8:28   ` Olivier Matz
2021-04-08 18:38   ` Flavio Leitner
2021-04-13 15:27     ` Maxime Coquelin
2021-04-27 17:09       ` David Marchand
2021-04-27 17:19         ` David Marchand
2021-04-29  8:04 ` [dpdk-dev] [PATCH v2 0/4] Offload flags fixes David Marchand
2021-04-29  8:04   ` [dpdk-dev] [PATCH v2 1/4] mbuf: mark old offload flag as deprecated David Marchand
2021-04-29 12:14     ` Lance Richardson
2021-04-29 16:45     ` Ajit Khaparde
2021-04-29  8:04   ` [dpdk-dev] [PATCH v2 2/4] net/virtio: do not touch Tx offload flags David Marchand
2021-04-29 13:51     ` Flavio Leitner
2021-04-29  8:04   ` [dpdk-dev] [PATCH v2 3/4] net/virtio: refactor Tx offload helper David Marchand
2021-04-29 12:59     ` Maxime Coquelin
2021-04-29  8:04   ` [dpdk-dev] [PATCH v2 4/4] vhost: fix offload flags in Rx path David Marchand
2021-04-29 13:30     ` Maxime Coquelin
2021-04-29 13:31       ` Maxime Coquelin
2021-04-29 20:21         ` David Marchand
2021-04-30  8:38           ` Maxime Coquelin
2021-04-29 20:09       ` David Marchand
2021-04-29 18:39     ` Flavio Leitner
2021-04-29 19:18       ` David Marchand
2021-05-03 13:26 ` [dpdk-dev] [PATCH v3 0/4] Offload flags fixes David Marchand
2021-05-03 13:26   ` [dpdk-dev] [PATCH v3 1/4] mbuf: mark old offload flag as deprecated David Marchand
2021-05-03 14:02     ` Maxime Coquelin
2021-05-03 14:12     ` David Marchand
2021-05-03 13:26   ` [dpdk-dev] [PATCH v3 2/4] net/virtio: do not touch Tx offload flags David Marchand
2021-05-03 13:26   ` [dpdk-dev] [PATCH v3 3/4] net/virtio: refactor Tx offload helper David Marchand
2021-05-03 13:26   ` [dpdk-dev] [PATCH v3 4/4] vhost: fix offload flags in Rx path David Marchand
2021-05-03 15:24   ` [dpdk-dev] [PATCH v3 0/4] Offload flags fixes Maxime Coquelin
2021-05-03 16:21     ` David Marchand
2021-05-03 16:43 ` [dpdk-dev] [PATCH v4 0/3] " David Marchand
2021-05-03 16:43   ` [dpdk-dev] [PATCH v4 1/3] net/virtio: do not touch Tx offload flags David Marchand
2021-05-03 16:43   ` [dpdk-dev] [PATCH v4 2/3] net/virtio: refactor Tx offload helper David Marchand
2021-05-03 16:43   ` [dpdk-dev] [PATCH v4 3/3] vhost: fix offload flags in Rx path David Marchand
2021-05-04 11:07     ` Flavio Leitner
2021-05-08  6:24     ` Wang, Yinan
2021-05-12  3:29       ` Wang, Yinan
2021-05-12 15:20         ` David Marchand
2021-05-13  6:34           ` Wang, Yinan
2021-05-04  8:29   ` [dpdk-dev] [PATCH v4 0/3] Offload flags fixes Maxime Coquelin
