* [dpdk-stable] [PATCH v3 4/4] vhost: fix offload flags in Rx path [not found] ` <20210503132646.16076-1-david.marchand@redhat.com> @ 2021-05-03 13:26 ` David Marchand 0 siblings, 0 replies; 7+ messages in thread From: David Marchand @ 2021-05-03 13:26 UTC (permalink / raw) To: dev Cc: maxime.coquelin, olivier.matz, fbl, i.maximets, chenbo.xia, ian.stokes, stable, Jijiang Liu, Yuanhan Liu The vhost library currently configures Tx offloading (PKT_TX_*) on any packet received from a guest virtio device which asks for some offloading. This is problematic, as Tx offloading is something that the application must ask for: the application needs to configure devices to support every used offloads (ip, tcp checksumming, tso..), and the various l2/l3/l4 lengths must be set following any processing that happened in the application itself. On the other hand, the received packets are not marked wrt current packet l3/l4 checksumming info. Copy virtio rx processing to fix those offload flags with some differences: - accept VIRTIO_NET_HDR_GSO_ECN and VIRTIO_NET_HDR_GSO_UDP, - ignore anything but the VIRTIO_NET_HDR_F_NEEDS_CSUM flag (to comply with the virtio spec), Some applications might rely on the current behavior, so it is left untouched by default. A new RTE_VHOST_USER_NET_COMPLIANT_OL_FLAGS flag is added to enable the new behavior. The vhost example has been updated for the new behavior: TSO is applied to any packet marked LRO. Fixes: 859b480d5afd ("vhost: add guest offload setting") Cc: stable@dpdk.org Signed-off-by: David Marchand <david.marchand@redhat.com> Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com> --- Changes since v2: - introduced a new flag to keep existing behavior as the default, - packets with unrecognised offload are passed to the application with no offload metadata rather than dropped, - ignored VIRTIO_NET_HDR_F_DATA_VALID since the virtio spec states that the virtio driver is not allowed to use this flag when transmitting packets, Changes since v1: - updated vhost example, - restored VIRTIO_NET_HDR_GSO_ECN and VIRTIO_NET_HDR_GSO_UDP support, - restored log on buggy offload request, --- doc/guides/prog_guide/vhost_lib.rst | 12 ++ doc/guides/rel_notes/release_21_05.rst | 6 + drivers/net/vhost/rte_eth_vhost.c | 2 +- examples/vhost/main.c | 44 +++--- lib/vhost/rte_vhost.h | 1 + lib/vhost/socket.c | 5 +- lib/vhost/vhost.c | 6 +- lib/vhost/vhost.h | 14 +- lib/vhost/virtio_net.c | 185 ++++++++++++++++++++++--- 9 files changed, 222 insertions(+), 53 deletions(-) diff --git a/doc/guides/prog_guide/vhost_lib.rst b/doc/guides/prog_guide/vhost_lib.rst index dc29229167..042875a9ca 100644 --- a/doc/guides/prog_guide/vhost_lib.rst +++ b/doc/guides/prog_guide/vhost_lib.rst @@ -118,6 +118,18 @@ The following is an overview of some key Vhost API functions: It is disabled by default. + - ``RTE_VHOST_USER_NET_COMPLIANT_OL_FLAGS`` + + Since v16.04, the vhost library forwards checksum and gso requests for + packets received from a virtio driver by filling Tx offload metadata in + the mbuf. This behavior is inconsistent with other drivers but it is left + untouched for existing applications that might rely on it. + + This flag disables the legacy behavior and instead ask vhost to simply + populate Rx offload metadata in the mbuf. + + It is disabled by default. + * ``rte_vhost_driver_set_features(path, features)`` This function sets the feature bits the vhost-user driver supports. The diff --git a/doc/guides/rel_notes/release_21_05.rst b/doc/guides/rel_notes/release_21_05.rst index b3224dc332..1cb06ce487 100644 --- a/doc/guides/rel_notes/release_21_05.rst +++ b/doc/guides/rel_notes/release_21_05.rst @@ -329,6 +329,12 @@ API Changes ``policer_action_recolor_supported`` and ``policer_action_drop_supported`` have been removed. +* vhost: The vhost library currently populates received mbufs from a virtio + driver with Tx offload flags while not filling Rx offload flags. + While this behavior is arguable, it is kept untouched. + A new flag ``RTE_VHOST_USER_NET_COMPLIANT_OL_FLAGS`` has been added to ask + for a behavior compliant with to the mbuf offload API. + ABI Changes ----------- diff --git a/drivers/net/vhost/rte_eth_vhost.c b/drivers/net/vhost/rte_eth_vhost.c index d198fc8a8e..281379d6a3 100644 --- a/drivers/net/vhost/rte_eth_vhost.c +++ b/drivers/net/vhost/rte_eth_vhost.c @@ -1505,7 +1505,7 @@ rte_pmd_vhost_probe(struct rte_vdev_device *dev) int ret = 0; char *iface_name; uint16_t queues; - uint64_t flags = 0; + uint64_t flags = RTE_VHOST_USER_NET_COMPLIANT_OL_FLAGS; uint64_t disable_flags = 0; int client_mode = 0; int iommu_support = 0; diff --git a/examples/vhost/main.c b/examples/vhost/main.c index ff48ba270d..64295aaf7e 100644 --- a/examples/vhost/main.c +++ b/examples/vhost/main.c @@ -19,6 +19,7 @@ #include <rte_log.h> #include <rte_string_fns.h> #include <rte_malloc.h> +#include <rte_net.h> #include <rte_vhost.h> #include <rte_ip.h> #include <rte_tcp.h> @@ -1032,33 +1033,34 @@ find_local_dest(struct vhost_dev *vdev, struct rte_mbuf *m, return 0; } -static uint16_t -get_psd_sum(void *l3_hdr, uint64_t ol_flags) -{ - if (ol_flags & PKT_TX_IPV4) - return rte_ipv4_phdr_cksum(l3_hdr, ol_flags); - else /* assume ethertype == RTE_ETHER_TYPE_IPV6 */ - return rte_ipv6_phdr_cksum(l3_hdr, ol_flags); -} - static void virtio_tx_offload(struct rte_mbuf *m) { + struct rte_net_hdr_lens hdr_lens; + struct rte_ipv4_hdr *ipv4_hdr; + struct rte_tcp_hdr *tcp_hdr; + uint32_t ptype; void *l3_hdr; - struct rte_ipv4_hdr *ipv4_hdr = NULL; - struct rte_tcp_hdr *tcp_hdr = NULL; - struct rte_ether_hdr *eth_hdr = - rte_pktmbuf_mtod(m, struct rte_ether_hdr *); - l3_hdr = (char *)eth_hdr + m->l2_len; + ptype = rte_net_get_ptype(m, &hdr_lens, RTE_PTYPE_ALL_MASK); + m->l2_len = hdr_lens.l2_len; + m->l3_len = hdr_lens.l3_len; + m->l4_len = hdr_lens.l4_len; - if (m->ol_flags & PKT_TX_IPV4) { + l3_hdr = rte_pktmbuf_mtod_offset(m, void *, m->l2_len); + tcp_hdr = rte_pktmbuf_mtod_offset(m, struct rte_tcp_hdr *, + m->l2_len + m->l3_len); + + m->ol_flags |= PKT_TX_TCP_SEG; + if ((ptype & RTE_PTYPE_L3_MASK) == RTE_PTYPE_L3_IPV4) { + m->ol_flags |= PKT_TX_IPV4; + m->ol_flags |= PKT_TX_IP_CKSUM; ipv4_hdr = l3_hdr; ipv4_hdr->hdr_checksum = 0; - m->ol_flags |= PKT_TX_IP_CKSUM; + tcp_hdr->cksum = rte_ipv4_phdr_cksum(l3_hdr, m->ol_flags); + } else { /* assume ethertype == RTE_ETHER_TYPE_IPV6 */ + m->ol_flags |= PKT_TX_IPV6; + tcp_hdr->cksum = rte_ipv6_phdr_cksum(l3_hdr, m->ol_flags); } - - tcp_hdr = (struct rte_tcp_hdr *)((char *)l3_hdr + m->l3_len); - tcp_hdr->cksum = get_psd_sum(l3_hdr, m->ol_flags); } static __rte_always_inline void @@ -1151,7 +1153,7 @@ virtio_tx_route(struct vhost_dev *vdev, struct rte_mbuf *m, uint16_t vlan_tag) m->vlan_tci = vlan_tag; } - if (m->ol_flags & PKT_TX_TCP_SEG) + if (m->ol_flags & PKT_RX_LRO) virtio_tx_offload(m); tx_q->m_table[tx_q->len++] = m; @@ -1636,7 +1638,7 @@ main(int argc, char *argv[]) int ret, i; uint16_t portid; static pthread_t tid; - uint64_t flags = 0; + uint64_t flags = RTE_VHOST_USER_NET_COMPLIANT_OL_FLAGS; signal(SIGINT, sigint_handler); diff --git a/lib/vhost/rte_vhost.h b/lib/vhost/rte_vhost.h index d0a8ae31f2..8d875e9322 100644 --- a/lib/vhost/rte_vhost.h +++ b/lib/vhost/rte_vhost.h @@ -36,6 +36,7 @@ extern "C" { /* support only linear buffers (no chained mbufs) */ #define RTE_VHOST_USER_LINEARBUF_SUPPORT (1ULL << 6) #define RTE_VHOST_USER_ASYNC_COPY (1ULL << 7) +#define RTE_VHOST_USER_NET_COMPLIANT_OL_FLAGS (1ULL << 8) /* Features. */ #ifndef VIRTIO_NET_F_GUEST_ANNOUNCE diff --git a/lib/vhost/socket.c b/lib/vhost/socket.c index 0169d36481..5d0d728d52 100644 --- a/lib/vhost/socket.c +++ b/lib/vhost/socket.c @@ -42,6 +42,7 @@ struct vhost_user_socket { bool extbuf; bool linearbuf; bool async_copy; + bool net_compliant_ol_flags; /* * The "supported_features" indicates the feature bits the @@ -224,7 +225,8 @@ vhost_user_add_connection(int fd, struct vhost_user_socket *vsocket) size = strnlen(vsocket->path, PATH_MAX); vhost_set_ifname(vid, vsocket->path, size); - vhost_set_builtin_virtio_net(vid, vsocket->use_builtin_virtio_net); + vhost_setup_virtio_net(vid, vsocket->use_builtin_virtio_net, + vsocket->net_compliant_ol_flags); vhost_attach_vdpa_device(vid, vsocket->vdpa_dev); @@ -877,6 +879,7 @@ rte_vhost_driver_register(const char *path, uint64_t flags) vsocket->extbuf = flags & RTE_VHOST_USER_EXTBUF_SUPPORT; vsocket->linearbuf = flags & RTE_VHOST_USER_LINEARBUF_SUPPORT; vsocket->async_copy = flags & RTE_VHOST_USER_ASYNC_COPY; + vsocket->net_compliant_ol_flags = flags & RTE_VHOST_USER_NET_COMPLIANT_OL_FLAGS; if (vsocket->async_copy && (flags & (RTE_VHOST_USER_IOMMU_SUPPORT | diff --git a/lib/vhost/vhost.c b/lib/vhost/vhost.c index a70fe01d8f..846113d46f 100644 --- a/lib/vhost/vhost.c +++ b/lib/vhost/vhost.c @@ -752,7 +752,7 @@ vhost_set_ifname(int vid, const char *if_name, unsigned int if_len) } void -vhost_set_builtin_virtio_net(int vid, bool enable) +vhost_setup_virtio_net(int vid, bool enable, bool compliant_ol_flags) { struct virtio_net *dev = get_device(vid); @@ -763,6 +763,10 @@ vhost_set_builtin_virtio_net(int vid, bool enable) dev->flags |= VIRTIO_DEV_BUILTIN_VIRTIO_NET; else dev->flags &= ~VIRTIO_DEV_BUILTIN_VIRTIO_NET; + if (!compliant_ol_flags) + dev->flags |= VIRTIO_DEV_LEGACY_OL_FLAGS; + else + dev->flags &= ~VIRTIO_DEV_LEGACY_OL_FLAGS; } void diff --git a/lib/vhost/vhost.h b/lib/vhost/vhost.h index f628714c24..65bcdc5301 100644 --- a/lib/vhost/vhost.h +++ b/lib/vhost/vhost.h @@ -27,15 +27,17 @@ #include "rte_vhost_async.h" /* Used to indicate that the device is running on a data core */ -#define VIRTIO_DEV_RUNNING 1 +#define VIRTIO_DEV_RUNNING ((uint32_t)1 << 0) /* Used to indicate that the device is ready to operate */ -#define VIRTIO_DEV_READY 2 +#define VIRTIO_DEV_READY ((uint32_t)1 << 1) /* Used to indicate that the built-in vhost net device backend is enabled */ -#define VIRTIO_DEV_BUILTIN_VIRTIO_NET 4 +#define VIRTIO_DEV_BUILTIN_VIRTIO_NET ((uint32_t)1 << 2) /* Used to indicate that the device has its own data path and configured */ -#define VIRTIO_DEV_VDPA_CONFIGURED 8 +#define VIRTIO_DEV_VDPA_CONFIGURED ((uint32_t)1 << 3) /* Used to indicate that the feature negotiation failed */ -#define VIRTIO_DEV_FEATURES_FAILED 16 +#define VIRTIO_DEV_FEATURES_FAILED ((uint32_t)1 << 4) +/* Used to indicate that the virtio_net tx code should fill TX ol_flags */ +#define VIRTIO_DEV_LEGACY_OL_FLAGS ((uint32_t)1 << 5) /* Backend value set by guest. */ #define VIRTIO_DEV_STOPPED -1 @@ -674,7 +676,7 @@ int alloc_vring_queue(struct virtio_net *dev, uint32_t vring_idx); void vhost_attach_vdpa_device(int vid, struct rte_vdpa_device *dev); void vhost_set_ifname(int, const char *if_name, unsigned int if_len); -void vhost_set_builtin_virtio_net(int vid, bool enable); +void vhost_setup_virtio_net(int vid, bool enable, bool legacy_ol_flags); void vhost_enable_extbuf(int vid); void vhost_enable_linearbuf(int vid); int vhost_enable_guest_notification(struct virtio_net *dev, diff --git a/lib/vhost/virtio_net.c b/lib/vhost/virtio_net.c index ff39878609..aef30ad4fe 100644 --- a/lib/vhost/virtio_net.c +++ b/lib/vhost/virtio_net.c @@ -8,6 +8,7 @@ #include <rte_mbuf.h> #include <rte_memcpy.h> +#include <rte_net.h> #include <rte_ether.h> #include <rte_ip.h> #include <rte_vhost.h> @@ -1875,15 +1876,12 @@ parse_ethernet(struct rte_mbuf *m, uint16_t *l4_proto, void **l4_hdr) } static __rte_always_inline void -vhost_dequeue_offload(struct virtio_net_hdr *hdr, struct rte_mbuf *m) +vhost_dequeue_offload_legacy(struct virtio_net_hdr *hdr, struct rte_mbuf *m) { uint16_t l4_proto = 0; void *l4_hdr = NULL; struct rte_tcp_hdr *tcp_hdr = NULL; - if (hdr->flags == 0 && hdr->gso_type == VIRTIO_NET_HDR_GSO_NONE) - return; - parse_ethernet(m, &l4_proto, &l4_hdr); if (hdr->flags == VIRTIO_NET_HDR_F_NEEDS_CSUM) { if (hdr->csum_start == (m->l2_len + m->l3_len)) { @@ -1928,6 +1926,94 @@ vhost_dequeue_offload(struct virtio_net_hdr *hdr, struct rte_mbuf *m) } } +static __rte_always_inline void +vhost_dequeue_offload(struct virtio_net_hdr *hdr, struct rte_mbuf *m, + bool legacy_ol_flags) +{ + struct rte_net_hdr_lens hdr_lens; + int l4_supported = 0; + uint32_t ptype; + + if (hdr->flags == 0 && hdr->gso_type == VIRTIO_NET_HDR_GSO_NONE) + return; + + if (legacy_ol_flags) { + vhost_dequeue_offload_legacy(hdr, m); + return; + } + + m->ol_flags |= PKT_RX_IP_CKSUM_UNKNOWN; + + ptype = rte_net_get_ptype(m, &hdr_lens, RTE_PTYPE_ALL_MASK); + m->packet_type = ptype; + if ((ptype & RTE_PTYPE_L4_MASK) == RTE_PTYPE_L4_TCP || + (ptype & RTE_PTYPE_L4_MASK) == RTE_PTYPE_L4_UDP || + (ptype & RTE_PTYPE_L4_MASK) == RTE_PTYPE_L4_SCTP) + l4_supported = 1; + + /* According to Virtio 1.1 spec, the device only needs to look at + * VIRTIO_NET_HDR_F_NEEDS_CSUM in the packet transmission path. + * This differs from the processing incoming packets path where the + * driver could rely on VIRTIO_NET_HDR_F_DATA_VALID flag set by the + * device. + * + * 5.1.6.2.1 Driver Requirements: Packet Transmission + * The driver MUST NOT set the VIRTIO_NET_HDR_F_DATA_VALID and + * VIRTIO_NET_HDR_F_RSC_INFO bits in flags. + * + * 5.1.6.2.2 Device Requirements: Packet Transmission + * The device MUST ignore flag bits that it does not recognize. + */ + if (hdr->flags & VIRTIO_NET_HDR_F_NEEDS_CSUM) { + uint32_t hdrlen; + + hdrlen = hdr_lens.l2_len + hdr_lens.l3_len + hdr_lens.l4_len; + if (hdr->csum_start <= hdrlen && l4_supported != 0) { + m->ol_flags |= PKT_RX_L4_CKSUM_NONE; + } else { + /* Unknown proto or tunnel, do sw cksum. We can assume + * the cksum field is in the first segment since the + * buffers we provided to the host are large enough. + * In case of SCTP, this will be wrong since it's a CRC + * but there's nothing we can do. + */ + uint16_t csum = 0, off; + + if (rte_raw_cksum_mbuf(m, hdr->csum_start, + rte_pktmbuf_pkt_len(m) - hdr->csum_start, &csum) < 0) + return; + if (likely(csum != 0xffff)) + csum = ~csum; + off = hdr->csum_offset + hdr->csum_start; + if (rte_pktmbuf_data_len(m) >= off + 1) + *rte_pktmbuf_mtod_offset(m, uint16_t *, off) = csum; + } + } + + if (hdr->gso_type != VIRTIO_NET_HDR_GSO_NONE) { + if (hdr->gso_size == 0) + return; + + switch (hdr->gso_type & ~VIRTIO_NET_HDR_GSO_ECN) { + case VIRTIO_NET_HDR_GSO_TCPV4: + case VIRTIO_NET_HDR_GSO_TCPV6: + if ((ptype & RTE_PTYPE_L4_MASK) != RTE_PTYPE_L4_TCP) + break; + m->ol_flags |= PKT_RX_LRO | PKT_RX_L4_CKSUM_NONE; + m->tso_segsz = hdr->gso_size; + break; + case VIRTIO_NET_HDR_GSO_UDP: + if ((ptype & RTE_PTYPE_L4_MASK) != RTE_PTYPE_L4_UDP) + break; + m->ol_flags |= PKT_RX_LRO | PKT_RX_L4_CKSUM_NONE; + m->tso_segsz = hdr->gso_size; + break; + default: + break; + } + } +} + static __rte_noinline void copy_vnet_hdr_from_desc(struct virtio_net_hdr *hdr, struct buf_vector *buf_vec) @@ -1952,7 +2038,8 @@ copy_vnet_hdr_from_desc(struct virtio_net_hdr *hdr, static __rte_always_inline int copy_desc_to_mbuf(struct virtio_net *dev, struct vhost_virtqueue *vq, struct buf_vector *buf_vec, uint16_t nr_vec, - struct rte_mbuf *m, struct rte_mempool *mbuf_pool) + struct rte_mbuf *m, struct rte_mempool *mbuf_pool, + bool legacy_ol_flags) { uint32_t buf_avail, buf_offset; uint64_t buf_addr, buf_len; @@ -2085,7 +2172,7 @@ copy_desc_to_mbuf(struct virtio_net *dev, struct vhost_virtqueue *vq, m->pkt_len += mbuf_offset; if (hdr) - vhost_dequeue_offload(hdr, m); + vhost_dequeue_offload(hdr, m, legacy_ol_flags); out: @@ -2168,9 +2255,11 @@ virtio_dev_pktmbuf_alloc(struct virtio_net *dev, struct rte_mempool *mp, return NULL; } -static __rte_noinline uint16_t +__rte_always_inline +static uint16_t virtio_dev_tx_split(struct virtio_net *dev, struct vhost_virtqueue *vq, - struct rte_mempool *mbuf_pool, struct rte_mbuf **pkts, uint16_t count) + struct rte_mempool *mbuf_pool, struct rte_mbuf **pkts, uint16_t count, + bool legacy_ol_flags) { uint16_t i; uint16_t free_entries; @@ -2230,7 +2319,7 @@ virtio_dev_tx_split(struct virtio_net *dev, struct vhost_virtqueue *vq, } err = copy_desc_to_mbuf(dev, vq, buf_vec, nr_vec, pkts[i], - mbuf_pool); + mbuf_pool, legacy_ol_flags); if (unlikely(err)) { rte_pktmbuf_free(pkts[i]); if (!allocerr_warned) { @@ -2258,6 +2347,24 @@ virtio_dev_tx_split(struct virtio_net *dev, struct vhost_virtqueue *vq, return (i - dropped); } +__rte_noinline +static uint16_t +virtio_dev_tx_split_legacy(struct virtio_net *dev, + struct vhost_virtqueue *vq, struct rte_mempool *mbuf_pool, + struct rte_mbuf **pkts, uint16_t count) +{ + return virtio_dev_tx_split(dev, vq, mbuf_pool, pkts, count, true); +} + +__rte_noinline +static uint16_t +virtio_dev_tx_split_compliant(struct virtio_net *dev, + struct vhost_virtqueue *vq, struct rte_mempool *mbuf_pool, + struct rte_mbuf **pkts, uint16_t count) +{ + return virtio_dev_tx_split(dev, vq, mbuf_pool, pkts, count, false); +} + static __rte_always_inline int vhost_reserve_avail_batch_packed(struct virtio_net *dev, struct vhost_virtqueue *vq, @@ -2338,7 +2445,8 @@ static __rte_always_inline int virtio_dev_tx_batch_packed(struct virtio_net *dev, struct vhost_virtqueue *vq, struct rte_mempool *mbuf_pool, - struct rte_mbuf **pkts) + struct rte_mbuf **pkts, + bool legacy_ol_flags) { uint16_t avail_idx = vq->last_avail_idx; uint32_t buf_offset = sizeof(struct virtio_net_hdr_mrg_rxbuf); @@ -2362,7 +2470,7 @@ virtio_dev_tx_batch_packed(struct virtio_net *dev, if (virtio_net_with_host_offload(dev)) { vhost_for_each_try_unroll(i, 0, PACKED_BATCH_SIZE) { hdr = (struct virtio_net_hdr *)(desc_addrs[i]); - vhost_dequeue_offload(hdr, pkts[i]); + vhost_dequeue_offload(hdr, pkts[i], legacy_ol_flags); } } @@ -2383,7 +2491,8 @@ vhost_dequeue_single_packed(struct virtio_net *dev, struct rte_mempool *mbuf_pool, struct rte_mbuf **pkts, uint16_t *buf_id, - uint16_t *desc_count) + uint16_t *desc_count, + bool legacy_ol_flags) { struct buf_vector buf_vec[BUF_VECTOR_MAX]; uint32_t buf_len; @@ -2410,7 +2519,7 @@ vhost_dequeue_single_packed(struct virtio_net *dev, } err = copy_desc_to_mbuf(dev, vq, buf_vec, nr_vec, *pkts, - mbuf_pool); + mbuf_pool, legacy_ol_flags); if (unlikely(err)) { if (!allocerr_warned) { VHOST_LOG_DATA(ERR, @@ -2429,14 +2538,15 @@ static __rte_always_inline int virtio_dev_tx_single_packed(struct virtio_net *dev, struct vhost_virtqueue *vq, struct rte_mempool *mbuf_pool, - struct rte_mbuf **pkts) + struct rte_mbuf **pkts, + bool legacy_ol_flags) { uint16_t buf_id, desc_count = 0; int ret; ret = vhost_dequeue_single_packed(dev, vq, mbuf_pool, pkts, &buf_id, - &desc_count); + &desc_count, legacy_ol_flags); if (likely(desc_count > 0)) { if (virtio_net_is_inorder(dev)) @@ -2452,12 +2562,14 @@ virtio_dev_tx_single_packed(struct virtio_net *dev, return ret; } -static __rte_noinline uint16_t +__rte_always_inline +static uint16_t virtio_dev_tx_packed(struct virtio_net *dev, struct vhost_virtqueue *__rte_restrict vq, struct rte_mempool *mbuf_pool, struct rte_mbuf **__rte_restrict pkts, - uint32_t count) + uint32_t count, + bool legacy_ol_flags) { uint32_t pkt_idx = 0; uint32_t remained = count; @@ -2467,7 +2579,8 @@ virtio_dev_tx_packed(struct virtio_net *dev, if (remained >= PACKED_BATCH_SIZE) { if (!virtio_dev_tx_batch_packed(dev, vq, mbuf_pool, - &pkts[pkt_idx])) { + &pkts[pkt_idx], + legacy_ol_flags)) { pkt_idx += PACKED_BATCH_SIZE; remained -= PACKED_BATCH_SIZE; continue; @@ -2475,7 +2588,8 @@ virtio_dev_tx_packed(struct virtio_net *dev, } if (virtio_dev_tx_single_packed(dev, vq, mbuf_pool, - &pkts[pkt_idx])) + &pkts[pkt_idx], + legacy_ol_flags)) break; pkt_idx++; remained--; @@ -2492,6 +2606,24 @@ virtio_dev_tx_packed(struct virtio_net *dev, return pkt_idx; } +__rte_noinline +static uint16_t +virtio_dev_tx_packed_legacy(struct virtio_net *dev, + struct vhost_virtqueue *__rte_restrict vq, struct rte_mempool *mbuf_pool, + struct rte_mbuf **__rte_restrict pkts, uint32_t count) +{ + return virtio_dev_tx_packed(dev, vq, mbuf_pool, pkts, count, true); +} + +__rte_noinline +static uint16_t +virtio_dev_tx_packed_compliant(struct virtio_net *dev, + struct vhost_virtqueue *__rte_restrict vq, struct rte_mempool *mbuf_pool, + struct rte_mbuf **__rte_restrict pkts, uint32_t count) +{ + return virtio_dev_tx_packed(dev, vq, mbuf_pool, pkts, count, false); +} + uint16_t rte_vhost_dequeue_burst(int vid, uint16_t queue_id, struct rte_mempool *mbuf_pool, struct rte_mbuf **pkts, uint16_t count) @@ -2567,10 +2699,17 @@ rte_vhost_dequeue_burst(int vid, uint16_t queue_id, count -= 1; } - if (vq_is_packed(dev)) - count = virtio_dev_tx_packed(dev, vq, mbuf_pool, pkts, count); - else - count = virtio_dev_tx_split(dev, vq, mbuf_pool, pkts, count); + if (vq_is_packed(dev)) { + if (dev->flags & VIRTIO_DEV_LEGACY_OL_FLAGS) + count = virtio_dev_tx_packed_legacy(dev, vq, mbuf_pool, pkts, count); + else + count = virtio_dev_tx_packed_compliant(dev, vq, mbuf_pool, pkts, count); + } else { + if (dev->flags & VIRTIO_DEV_LEGACY_OL_FLAGS) + count = virtio_dev_tx_split_legacy(dev, vq, mbuf_pool, pkts, count); + else + count = virtio_dev_tx_split_compliant(dev, vq, mbuf_pool, pkts, count); + } out: if (dev->features & (1ULL << VIRTIO_F_IOMMU_PLATFORM)) -- 2.23.0 ^ permalink raw reply [flat|nested] 7+ messages in thread
[parent not found: <20210503164344.27916-1-david.marchand@redhat.com>]
* [dpdk-stable] [PATCH v4 3/3] vhost: fix offload flags in Rx path [not found] ` <20210503164344.27916-1-david.marchand@redhat.com> @ 2021-05-03 16:43 ` David Marchand 2021-05-04 11:07 ` Flavio Leitner 2021-05-08 6:24 ` [dpdk-stable] [dpdk-dev] " Wang, Yinan 0 siblings, 2 replies; 7+ messages in thread From: David Marchand @ 2021-05-03 16:43 UTC (permalink / raw) To: dev Cc: maxime.coquelin, olivier.matz, fbl, i.maximets, chenbo.xia, ian.stokes, stable, Jijiang Liu, Yuanhan Liu The vhost library currently configures Tx offloading (PKT_TX_*) on any packet received from a guest virtio device which asks for some offloading. This is problematic, as Tx offloading is something that the application must ask for: the application needs to configure devices to support every used offloads (ip, tcp checksumming, tso..), and the various l2/l3/l4 lengths must be set following any processing that happened in the application itself. On the other hand, the received packets are not marked wrt current packet l3/l4 checksumming info. Copy virtio rx processing to fix those offload flags with some differences: - accept VIRTIO_NET_HDR_GSO_ECN and VIRTIO_NET_HDR_GSO_UDP, - ignore anything but the VIRTIO_NET_HDR_F_NEEDS_CSUM flag (to comply with the virtio spec), Some applications might rely on the current behavior, so it is left untouched by default. A new RTE_VHOST_USER_NET_COMPLIANT_OL_FLAGS flag is added to enable the new behavior. The vhost example has been updated for the new behavior: TSO is applied to any packet marked LRO. Fixes: 859b480d5afd ("vhost: add guest offload setting") Cc: stable@dpdk.org Signed-off-by: David Marchand <david.marchand@redhat.com> Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com> --- Changes since v3: - rebased on next-virtio, Changes since v2: - introduced a new flag to keep existing behavior as the default, - packets with unrecognised offload are passed to the application with no offload metadata rather than dropped, - ignored VIRTIO_NET_HDR_F_DATA_VALID since the virtio spec states that the virtio driver is not allowed to use this flag when transmitting packets, Changes since v1: - updated vhost example, - restored VIRTIO_NET_HDR_GSO_ECN and VIRTIO_NET_HDR_GSO_UDP support, - restored log on buggy offload request, --- doc/guides/prog_guide/vhost_lib.rst | 12 ++ doc/guides/rel_notes/release_21_05.rst | 6 + drivers/net/vhost/rte_eth_vhost.c | 2 +- examples/vhost/main.c | 44 +++--- lib/vhost/rte_vhost.h | 1 + lib/vhost/socket.c | 5 +- lib/vhost/vhost.c | 6 +- lib/vhost/vhost.h | 14 +- lib/vhost/virtio_net.c | 185 ++++++++++++++++++++++--- 9 files changed, 222 insertions(+), 53 deletions(-) diff --git a/doc/guides/prog_guide/vhost_lib.rst b/doc/guides/prog_guide/vhost_lib.rst index 7afa351675..d18fb98910 100644 --- a/doc/guides/prog_guide/vhost_lib.rst +++ b/doc/guides/prog_guide/vhost_lib.rst @@ -118,6 +118,18 @@ The following is an overview of some key Vhost API functions: It is disabled by default. + - ``RTE_VHOST_USER_NET_COMPLIANT_OL_FLAGS`` + + Since v16.04, the vhost library forwards checksum and gso requests for + packets received from a virtio driver by filling Tx offload metadata in + the mbuf. This behavior is inconsistent with other drivers but it is left + untouched for existing applications that might rely on it. + + This flag disables the legacy behavior and instead ask vhost to simply + populate Rx offload metadata in the mbuf. + + It is disabled by default. + * ``rte_vhost_driver_set_features(path, features)`` This function sets the feature bits the vhost-user driver supports. The diff --git a/doc/guides/rel_notes/release_21_05.rst b/doc/guides/rel_notes/release_21_05.rst index a5f21f8425..6b7b0810a5 100644 --- a/doc/guides/rel_notes/release_21_05.rst +++ b/doc/guides/rel_notes/release_21_05.rst @@ -337,6 +337,12 @@ API Changes ``policer_action_recolor_supported`` and ``policer_action_drop_supported`` have been removed. +* vhost: The vhost library currently populates received mbufs from a virtio + driver with Tx offload flags while not filling Rx offload flags. + While this behavior is arguable, it is kept untouched. + A new flag ``RTE_VHOST_USER_NET_COMPLIANT_OL_FLAGS`` has been added to ask + for a behavior compliant with to the mbuf offload API. + ABI Changes ----------- diff --git a/drivers/net/vhost/rte_eth_vhost.c b/drivers/net/vhost/rte_eth_vhost.c index d198fc8a8e..281379d6a3 100644 --- a/drivers/net/vhost/rte_eth_vhost.c +++ b/drivers/net/vhost/rte_eth_vhost.c @@ -1505,7 +1505,7 @@ rte_pmd_vhost_probe(struct rte_vdev_device *dev) int ret = 0; char *iface_name; uint16_t queues; - uint64_t flags = 0; + uint64_t flags = RTE_VHOST_USER_NET_COMPLIANT_OL_FLAGS; uint64_t disable_flags = 0; int client_mode = 0; int iommu_support = 0; diff --git a/examples/vhost/main.c b/examples/vhost/main.c index 0bee1f3321..d2179eadb9 100644 --- a/examples/vhost/main.c +++ b/examples/vhost/main.c @@ -19,6 +19,7 @@ #include <rte_log.h> #include <rte_string_fns.h> #include <rte_malloc.h> +#include <rte_net.h> #include <rte_vhost.h> #include <rte_ip.h> #include <rte_tcp.h> @@ -1029,33 +1030,34 @@ find_local_dest(struct vhost_dev *vdev, struct rte_mbuf *m, return 0; } -static uint16_t -get_psd_sum(void *l3_hdr, uint64_t ol_flags) -{ - if (ol_flags & PKT_TX_IPV4) - return rte_ipv4_phdr_cksum(l3_hdr, ol_flags); - else /* assume ethertype == RTE_ETHER_TYPE_IPV6 */ - return rte_ipv6_phdr_cksum(l3_hdr, ol_flags); -} - static void virtio_tx_offload(struct rte_mbuf *m) { + struct rte_net_hdr_lens hdr_lens; + struct rte_ipv4_hdr *ipv4_hdr; + struct rte_tcp_hdr *tcp_hdr; + uint32_t ptype; void *l3_hdr; - struct rte_ipv4_hdr *ipv4_hdr = NULL; - struct rte_tcp_hdr *tcp_hdr = NULL; - struct rte_ether_hdr *eth_hdr = - rte_pktmbuf_mtod(m, struct rte_ether_hdr *); - l3_hdr = (char *)eth_hdr + m->l2_len; + ptype = rte_net_get_ptype(m, &hdr_lens, RTE_PTYPE_ALL_MASK); + m->l2_len = hdr_lens.l2_len; + m->l3_len = hdr_lens.l3_len; + m->l4_len = hdr_lens.l4_len; - if (m->ol_flags & PKT_TX_IPV4) { + l3_hdr = rte_pktmbuf_mtod_offset(m, void *, m->l2_len); + tcp_hdr = rte_pktmbuf_mtod_offset(m, struct rte_tcp_hdr *, + m->l2_len + m->l3_len); + + m->ol_flags |= PKT_TX_TCP_SEG; + if ((ptype & RTE_PTYPE_L3_MASK) == RTE_PTYPE_L3_IPV4) { + m->ol_flags |= PKT_TX_IPV4; + m->ol_flags |= PKT_TX_IP_CKSUM; ipv4_hdr = l3_hdr; ipv4_hdr->hdr_checksum = 0; - m->ol_flags |= PKT_TX_IP_CKSUM; + tcp_hdr->cksum = rte_ipv4_phdr_cksum(l3_hdr, m->ol_flags); + } else { /* assume ethertype == RTE_ETHER_TYPE_IPV6 */ + m->ol_flags |= PKT_TX_IPV6; + tcp_hdr->cksum = rte_ipv6_phdr_cksum(l3_hdr, m->ol_flags); } - - tcp_hdr = (struct rte_tcp_hdr *)((char *)l3_hdr + m->l3_len); - tcp_hdr->cksum = get_psd_sum(l3_hdr, m->ol_flags); } static __rte_always_inline void @@ -1148,7 +1150,7 @@ virtio_tx_route(struct vhost_dev *vdev, struct rte_mbuf *m, uint16_t vlan_tag) m->vlan_tci = vlan_tag; } - if (m->ol_flags & PKT_TX_TCP_SEG) + if (m->ol_flags & PKT_RX_LRO) virtio_tx_offload(m); tx_q->m_table[tx_q->len++] = m; @@ -1633,7 +1635,7 @@ main(int argc, char *argv[]) int ret, i; uint16_t portid; static pthread_t tid; - uint64_t flags = 0; + uint64_t flags = RTE_VHOST_USER_NET_COMPLIANT_OL_FLAGS; signal(SIGINT, sigint_handler); diff --git a/lib/vhost/rte_vhost.h b/lib/vhost/rte_vhost.h index d0a8ae31f2..8d875e9322 100644 --- a/lib/vhost/rte_vhost.h +++ b/lib/vhost/rte_vhost.h @@ -36,6 +36,7 @@ extern "C" { /* support only linear buffers (no chained mbufs) */ #define RTE_VHOST_USER_LINEARBUF_SUPPORT (1ULL << 6) #define RTE_VHOST_USER_ASYNC_COPY (1ULL << 7) +#define RTE_VHOST_USER_NET_COMPLIANT_OL_FLAGS (1ULL << 8) /* Features. */ #ifndef VIRTIO_NET_F_GUEST_ANNOUNCE diff --git a/lib/vhost/socket.c b/lib/vhost/socket.c index 0169d36481..5d0d728d52 100644 --- a/lib/vhost/socket.c +++ b/lib/vhost/socket.c @@ -42,6 +42,7 @@ struct vhost_user_socket { bool extbuf; bool linearbuf; bool async_copy; + bool net_compliant_ol_flags; /* * The "supported_features" indicates the feature bits the @@ -224,7 +225,8 @@ vhost_user_add_connection(int fd, struct vhost_user_socket *vsocket) size = strnlen(vsocket->path, PATH_MAX); vhost_set_ifname(vid, vsocket->path, size); - vhost_set_builtin_virtio_net(vid, vsocket->use_builtin_virtio_net); + vhost_setup_virtio_net(vid, vsocket->use_builtin_virtio_net, + vsocket->net_compliant_ol_flags); vhost_attach_vdpa_device(vid, vsocket->vdpa_dev); @@ -877,6 +879,7 @@ rte_vhost_driver_register(const char *path, uint64_t flags) vsocket->extbuf = flags & RTE_VHOST_USER_EXTBUF_SUPPORT; vsocket->linearbuf = flags & RTE_VHOST_USER_LINEARBUF_SUPPORT; vsocket->async_copy = flags & RTE_VHOST_USER_ASYNC_COPY; + vsocket->net_compliant_ol_flags = flags & RTE_VHOST_USER_NET_COMPLIANT_OL_FLAGS; if (vsocket->async_copy && (flags & (RTE_VHOST_USER_IOMMU_SUPPORT | diff --git a/lib/vhost/vhost.c b/lib/vhost/vhost.c index c9b6379f73..9abfc0bfe7 100644 --- a/lib/vhost/vhost.c +++ b/lib/vhost/vhost.c @@ -752,7 +752,7 @@ vhost_set_ifname(int vid, const char *if_name, unsigned int if_len) } void -vhost_set_builtin_virtio_net(int vid, bool enable) +vhost_setup_virtio_net(int vid, bool enable, bool compliant_ol_flags) { struct virtio_net *dev = get_device(vid); @@ -763,6 +763,10 @@ vhost_set_builtin_virtio_net(int vid, bool enable) dev->flags |= VIRTIO_DEV_BUILTIN_VIRTIO_NET; else dev->flags &= ~VIRTIO_DEV_BUILTIN_VIRTIO_NET; + if (!compliant_ol_flags) + dev->flags |= VIRTIO_DEV_LEGACY_OL_FLAGS; + else + dev->flags &= ~VIRTIO_DEV_LEGACY_OL_FLAGS; } void diff --git a/lib/vhost/vhost.h b/lib/vhost/vhost.h index b303635645..8078ddff79 100644 --- a/lib/vhost/vhost.h +++ b/lib/vhost/vhost.h @@ -27,15 +27,17 @@ #include "rte_vhost_async.h" /* Used to indicate that the device is running on a data core */ -#define VIRTIO_DEV_RUNNING 1 +#define VIRTIO_DEV_RUNNING ((uint32_t)1 << 0) /* Used to indicate that the device is ready to operate */ -#define VIRTIO_DEV_READY 2 +#define VIRTIO_DEV_READY ((uint32_t)1 << 1) /* Used to indicate that the built-in vhost net device backend is enabled */ -#define VIRTIO_DEV_BUILTIN_VIRTIO_NET 4 +#define VIRTIO_DEV_BUILTIN_VIRTIO_NET ((uint32_t)1 << 2) /* Used to indicate that the device has its own data path and configured */ -#define VIRTIO_DEV_VDPA_CONFIGURED 8 +#define VIRTIO_DEV_VDPA_CONFIGURED ((uint32_t)1 << 3) /* Used to indicate that the feature negotiation failed */ -#define VIRTIO_DEV_FEATURES_FAILED 16 +#define VIRTIO_DEV_FEATURES_FAILED ((uint32_t)1 << 4) +/* Used to indicate that the virtio_net tx code should fill TX ol_flags */ +#define VIRTIO_DEV_LEGACY_OL_FLAGS ((uint32_t)1 << 5) /* Backend value set by guest. */ #define VIRTIO_DEV_STOPPED -1 @@ -683,7 +685,7 @@ int alloc_vring_queue(struct virtio_net *dev, uint32_t vring_idx); void vhost_attach_vdpa_device(int vid, struct rte_vdpa_device *dev); void vhost_set_ifname(int, const char *if_name, unsigned int if_len); -void vhost_set_builtin_virtio_net(int vid, bool enable); +void vhost_setup_virtio_net(int vid, bool enable, bool legacy_ol_flags); void vhost_enable_extbuf(int vid); void vhost_enable_linearbuf(int vid); int vhost_enable_guest_notification(struct virtio_net *dev, diff --git a/lib/vhost/virtio_net.c b/lib/vhost/virtio_net.c index 1a34867f3c..8e36f4c340 100644 --- a/lib/vhost/virtio_net.c +++ b/lib/vhost/virtio_net.c @@ -8,6 +8,7 @@ #include <rte_mbuf.h> #include <rte_memcpy.h> +#include <rte_net.h> #include <rte_ether.h> #include <rte_ip.h> #include <rte_vhost.h> @@ -2303,15 +2304,12 @@ parse_ethernet(struct rte_mbuf *m, uint16_t *l4_proto, void **l4_hdr) } static __rte_always_inline void -vhost_dequeue_offload(struct virtio_net_hdr *hdr, struct rte_mbuf *m) +vhost_dequeue_offload_legacy(struct virtio_net_hdr *hdr, struct rte_mbuf *m) { uint16_t l4_proto = 0; void *l4_hdr = NULL; struct rte_tcp_hdr *tcp_hdr = NULL; - if (hdr->flags == 0 && hdr->gso_type == VIRTIO_NET_HDR_GSO_NONE) - return; - parse_ethernet(m, &l4_proto, &l4_hdr); if (hdr->flags == VIRTIO_NET_HDR_F_NEEDS_CSUM) { if (hdr->csum_start == (m->l2_len + m->l3_len)) { @@ -2356,6 +2354,94 @@ vhost_dequeue_offload(struct virtio_net_hdr *hdr, struct rte_mbuf *m) } } +static __rte_always_inline void +vhost_dequeue_offload(struct virtio_net_hdr *hdr, struct rte_mbuf *m, + bool legacy_ol_flags) +{ + struct rte_net_hdr_lens hdr_lens; + int l4_supported = 0; + uint32_t ptype; + + if (hdr->flags == 0 && hdr->gso_type == VIRTIO_NET_HDR_GSO_NONE) + return; + + if (legacy_ol_flags) { + vhost_dequeue_offload_legacy(hdr, m); + return; + } + + m->ol_flags |= PKT_RX_IP_CKSUM_UNKNOWN; + + ptype = rte_net_get_ptype(m, &hdr_lens, RTE_PTYPE_ALL_MASK); + m->packet_type = ptype; + if ((ptype & RTE_PTYPE_L4_MASK) == RTE_PTYPE_L4_TCP || + (ptype & RTE_PTYPE_L4_MASK) == RTE_PTYPE_L4_UDP || + (ptype & RTE_PTYPE_L4_MASK) == RTE_PTYPE_L4_SCTP) + l4_supported = 1; + + /* According to Virtio 1.1 spec, the device only needs to look at + * VIRTIO_NET_HDR_F_NEEDS_CSUM in the packet transmission path. + * This differs from the processing incoming packets path where the + * driver could rely on VIRTIO_NET_HDR_F_DATA_VALID flag set by the + * device. + * + * 5.1.6.2.1 Driver Requirements: Packet Transmission + * The driver MUST NOT set the VIRTIO_NET_HDR_F_DATA_VALID and + * VIRTIO_NET_HDR_F_RSC_INFO bits in flags. + * + * 5.1.6.2.2 Device Requirements: Packet Transmission + * The device MUST ignore flag bits that it does not recognize. + */ + if (hdr->flags & VIRTIO_NET_HDR_F_NEEDS_CSUM) { + uint32_t hdrlen; + + hdrlen = hdr_lens.l2_len + hdr_lens.l3_len + hdr_lens.l4_len; + if (hdr->csum_start <= hdrlen && l4_supported != 0) { + m->ol_flags |= PKT_RX_L4_CKSUM_NONE; + } else { + /* Unknown proto or tunnel, do sw cksum. We can assume + * the cksum field is in the first segment since the + * buffers we provided to the host are large enough. + * In case of SCTP, this will be wrong since it's a CRC + * but there's nothing we can do. + */ + uint16_t csum = 0, off; + + if (rte_raw_cksum_mbuf(m, hdr->csum_start, + rte_pktmbuf_pkt_len(m) - hdr->csum_start, &csum) < 0) + return; + if (likely(csum != 0xffff)) + csum = ~csum; + off = hdr->csum_offset + hdr->csum_start; + if (rte_pktmbuf_data_len(m) >= off + 1) + *rte_pktmbuf_mtod_offset(m, uint16_t *, off) = csum; + } + } + + if (hdr->gso_type != VIRTIO_NET_HDR_GSO_NONE) { + if (hdr->gso_size == 0) + return; + + switch (hdr->gso_type & ~VIRTIO_NET_HDR_GSO_ECN) { + case VIRTIO_NET_HDR_GSO_TCPV4: + case VIRTIO_NET_HDR_GSO_TCPV6: + if ((ptype & RTE_PTYPE_L4_MASK) != RTE_PTYPE_L4_TCP) + break; + m->ol_flags |= PKT_RX_LRO | PKT_RX_L4_CKSUM_NONE; + m->tso_segsz = hdr->gso_size; + break; + case VIRTIO_NET_HDR_GSO_UDP: + if ((ptype & RTE_PTYPE_L4_MASK) != RTE_PTYPE_L4_UDP) + break; + m->ol_flags |= PKT_RX_LRO | PKT_RX_L4_CKSUM_NONE; + m->tso_segsz = hdr->gso_size; + break; + default: + break; + } + } +} + static __rte_noinline void copy_vnet_hdr_from_desc(struct virtio_net_hdr *hdr, struct buf_vector *buf_vec) @@ -2380,7 +2466,8 @@ copy_vnet_hdr_from_desc(struct virtio_net_hdr *hdr, static __rte_always_inline int copy_desc_to_mbuf(struct virtio_net *dev, struct vhost_virtqueue *vq, struct buf_vector *buf_vec, uint16_t nr_vec, - struct rte_mbuf *m, struct rte_mempool *mbuf_pool) + struct rte_mbuf *m, struct rte_mempool *mbuf_pool, + bool legacy_ol_flags) { uint32_t buf_avail, buf_offset; uint64_t buf_addr, buf_len; @@ -2513,7 +2600,7 @@ copy_desc_to_mbuf(struct virtio_net *dev, struct vhost_virtqueue *vq, m->pkt_len += mbuf_offset; if (hdr) - vhost_dequeue_offload(hdr, m); + vhost_dequeue_offload(hdr, m, legacy_ol_flags); out: @@ -2606,9 +2693,11 @@ virtio_dev_pktmbuf_alloc(struct virtio_net *dev, struct rte_mempool *mp, return pkt; } -static __rte_noinline uint16_t +__rte_always_inline +static uint16_t virtio_dev_tx_split(struct virtio_net *dev, struct vhost_virtqueue *vq, - struct rte_mempool *mbuf_pool, struct rte_mbuf **pkts, uint16_t count) + struct rte_mempool *mbuf_pool, struct rte_mbuf **pkts, uint16_t count, + bool legacy_ol_flags) { uint16_t i; uint16_t free_entries; @@ -2668,7 +2757,7 @@ virtio_dev_tx_split(struct virtio_net *dev, struct vhost_virtqueue *vq, } err = copy_desc_to_mbuf(dev, vq, buf_vec, nr_vec, pkts[i], - mbuf_pool); + mbuf_pool, legacy_ol_flags); if (unlikely(err)) { rte_pktmbuf_free(pkts[i]); if (!allocerr_warned) { @@ -2696,6 +2785,24 @@ virtio_dev_tx_split(struct virtio_net *dev, struct vhost_virtqueue *vq, return (i - dropped); } +__rte_noinline +static uint16_t +virtio_dev_tx_split_legacy(struct virtio_net *dev, + struct vhost_virtqueue *vq, struct rte_mempool *mbuf_pool, + struct rte_mbuf **pkts, uint16_t count) +{ + return virtio_dev_tx_split(dev, vq, mbuf_pool, pkts, count, true); +} + +__rte_noinline +static uint16_t +virtio_dev_tx_split_compliant(struct virtio_net *dev, + struct vhost_virtqueue *vq, struct rte_mempool *mbuf_pool, + struct rte_mbuf **pkts, uint16_t count) +{ + return virtio_dev_tx_split(dev, vq, mbuf_pool, pkts, count, false); +} + static __rte_always_inline int vhost_reserve_avail_batch_packed(struct virtio_net *dev, struct vhost_virtqueue *vq, @@ -2770,7 +2877,8 @@ vhost_reserve_avail_batch_packed(struct virtio_net *dev, static __rte_always_inline int virtio_dev_tx_batch_packed(struct virtio_net *dev, struct vhost_virtqueue *vq, - struct rte_mbuf **pkts) + struct rte_mbuf **pkts, + bool legacy_ol_flags) { uint16_t avail_idx = vq->last_avail_idx; uint32_t buf_offset = sizeof(struct virtio_net_hdr_mrg_rxbuf); @@ -2794,7 +2902,7 @@ virtio_dev_tx_batch_packed(struct virtio_net *dev, if (virtio_net_with_host_offload(dev)) { vhost_for_each_try_unroll(i, 0, PACKED_BATCH_SIZE) { hdr = (struct virtio_net_hdr *)(desc_addrs[i]); - vhost_dequeue_offload(hdr, pkts[i]); + vhost_dequeue_offload(hdr, pkts[i], legacy_ol_flags); } } @@ -2815,7 +2923,8 @@ vhost_dequeue_single_packed(struct virtio_net *dev, struct rte_mempool *mbuf_pool, struct rte_mbuf *pkts, uint16_t *buf_id, - uint16_t *desc_count) + uint16_t *desc_count, + bool legacy_ol_flags) { struct buf_vector buf_vec[BUF_VECTOR_MAX]; uint32_t buf_len; @@ -2841,7 +2950,7 @@ vhost_dequeue_single_packed(struct virtio_net *dev, } err = copy_desc_to_mbuf(dev, vq, buf_vec, nr_vec, pkts, - mbuf_pool); + mbuf_pool, legacy_ol_flags); if (unlikely(err)) { if (!allocerr_warned) { VHOST_LOG_DATA(ERR, @@ -2859,14 +2968,15 @@ static __rte_always_inline int virtio_dev_tx_single_packed(struct virtio_net *dev, struct vhost_virtqueue *vq, struct rte_mempool *mbuf_pool, - struct rte_mbuf *pkts) + struct rte_mbuf *pkts, + bool legacy_ol_flags) { uint16_t buf_id, desc_count = 0; int ret; ret = vhost_dequeue_single_packed(dev, vq, mbuf_pool, pkts, &buf_id, - &desc_count); + &desc_count, legacy_ol_flags); if (likely(desc_count > 0)) { if (virtio_net_is_inorder(dev)) @@ -2882,12 +2992,14 @@ virtio_dev_tx_single_packed(struct virtio_net *dev, return ret; } -static __rte_noinline uint16_t +__rte_always_inline +static uint16_t virtio_dev_tx_packed(struct virtio_net *dev, struct vhost_virtqueue *__rte_restrict vq, struct rte_mempool *mbuf_pool, struct rte_mbuf **__rte_restrict pkts, - uint32_t count) + uint32_t count, + bool legacy_ol_flags) { uint32_t pkt_idx = 0; @@ -2899,14 +3011,16 @@ virtio_dev_tx_packed(struct virtio_net *dev, if (count - pkt_idx >= PACKED_BATCH_SIZE) { if (!virtio_dev_tx_batch_packed(dev, vq, - &pkts[pkt_idx])) { + &pkts[pkt_idx], + legacy_ol_flags)) { pkt_idx += PACKED_BATCH_SIZE; continue; } } if (virtio_dev_tx_single_packed(dev, vq, mbuf_pool, - pkts[pkt_idx])) + pkts[pkt_idx], + legacy_ol_flags)) break; pkt_idx++; } while (pkt_idx < count); @@ -2924,6 +3038,24 @@ virtio_dev_tx_packed(struct virtio_net *dev, return pkt_idx; } +__rte_noinline +static uint16_t +virtio_dev_tx_packed_legacy(struct virtio_net *dev, + struct vhost_virtqueue *__rte_restrict vq, struct rte_mempool *mbuf_pool, + struct rte_mbuf **__rte_restrict pkts, uint32_t count) +{ + return virtio_dev_tx_packed(dev, vq, mbuf_pool, pkts, count, true); +} + +__rte_noinline +static uint16_t +virtio_dev_tx_packed_compliant(struct virtio_net *dev, + struct vhost_virtqueue *__rte_restrict vq, struct rte_mempool *mbuf_pool, + struct rte_mbuf **__rte_restrict pkts, uint32_t count) +{ + return virtio_dev_tx_packed(dev, vq, mbuf_pool, pkts, count, false); +} + uint16_t rte_vhost_dequeue_burst(int vid, uint16_t queue_id, struct rte_mempool *mbuf_pool, struct rte_mbuf **pkts, uint16_t count) @@ -2999,10 +3131,17 @@ rte_vhost_dequeue_burst(int vid, uint16_t queue_id, count -= 1; } - if (vq_is_packed(dev)) - count = virtio_dev_tx_packed(dev, vq, mbuf_pool, pkts, count); - else - count = virtio_dev_tx_split(dev, vq, mbuf_pool, pkts, count); + if (vq_is_packed(dev)) { + if (dev->flags & VIRTIO_DEV_LEGACY_OL_FLAGS) + count = virtio_dev_tx_packed_legacy(dev, vq, mbuf_pool, pkts, count); + else + count = virtio_dev_tx_packed_compliant(dev, vq, mbuf_pool, pkts, count); + } else { + if (dev->flags & VIRTIO_DEV_LEGACY_OL_FLAGS) + count = virtio_dev_tx_split_legacy(dev, vq, mbuf_pool, pkts, count); + else + count = virtio_dev_tx_split_compliant(dev, vq, mbuf_pool, pkts, count); + } out: if (dev->features & (1ULL << VIRTIO_F_IOMMU_PLATFORM)) -- 2.23.0 ^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: [dpdk-stable] [PATCH v4 3/3] vhost: fix offload flags in Rx path 2021-05-03 16:43 ` [dpdk-stable] [PATCH v4 3/3] " David Marchand @ 2021-05-04 11:07 ` Flavio Leitner 2021-05-08 6:24 ` [dpdk-stable] [dpdk-dev] " Wang, Yinan 1 sibling, 0 replies; 7+ messages in thread From: Flavio Leitner @ 2021-05-04 11:07 UTC (permalink / raw) To: David Marchand Cc: dev, maxime.coquelin, olivier.matz, i.maximets, chenbo.xia, ian.stokes, stable, Jijiang Liu, Yuanhan Liu On Mon, May 03, 2021 at 06:43:44PM +0200, David Marchand wrote: > The vhost library currently configures Tx offloading (PKT_TX_*) on any > packet received from a guest virtio device which asks for some offloading. > > This is problematic, as Tx offloading is something that the application > must ask for: the application needs to configure devices > to support every used offloads (ip, tcp checksumming, tso..), and the > various l2/l3/l4 lengths must be set following any processing that > happened in the application itself. > > On the other hand, the received packets are not marked wrt current > packet l3/l4 checksumming info. > > Copy virtio rx processing to fix those offload flags with some > differences: > - accept VIRTIO_NET_HDR_GSO_ECN and VIRTIO_NET_HDR_GSO_UDP, > - ignore anything but the VIRTIO_NET_HDR_F_NEEDS_CSUM flag (to comply with > the virtio spec), > > Some applications might rely on the current behavior, so it is left > untouched by default. > A new RTE_VHOST_USER_NET_COMPLIANT_OL_FLAGS flag is added to enable the > new behavior. > > The vhost example has been updated for the new behavior: TSO is applied to > any packet marked LRO. > > Fixes: 859b480d5afd ("vhost: add guest offload setting") > Cc: stable@dpdk.org > > Signed-off-by: David Marchand <david.marchand@redhat.com> > Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com> > --- > Changes since v3: > - rebased on next-virtio, > > Changes since v2: > - introduced a new flag to keep existing behavior as the default, > - packets with unrecognised offload are passed to the application with no > offload metadata rather than dropped, > - ignored VIRTIO_NET_HDR_F_DATA_VALID since the virtio spec states that > the virtio driver is not allowed to use this flag when transmitting > packets, > > Changes since v1: > - updated vhost example, > - restored VIRTIO_NET_HDR_GSO_ECN and VIRTIO_NET_HDR_GSO_UDP support, > - restored log on buggy offload request, > > --- > doc/guides/prog_guide/vhost_lib.rst | 12 ++ > doc/guides/rel_notes/release_21_05.rst | 6 + > drivers/net/vhost/rte_eth_vhost.c | 2 +- > examples/vhost/main.c | 44 +++--- > lib/vhost/rte_vhost.h | 1 + > lib/vhost/socket.c | 5 +- > lib/vhost/vhost.c | 6 +- > lib/vhost/vhost.h | 14 +- > lib/vhost/virtio_net.c | 185 ++++++++++++++++++++++--- > 9 files changed, 222 insertions(+), 53 deletions(-) > > diff --git a/doc/guides/prog_guide/vhost_lib.rst b/doc/guides/prog_guide/vhost_lib.rst > index 7afa351675..d18fb98910 100644 > --- a/doc/guides/prog_guide/vhost_lib.rst > +++ b/doc/guides/prog_guide/vhost_lib.rst > @@ -118,6 +118,18 @@ The following is an overview of some key Vhost API functions: > > It is disabled by default. > > + - ``RTE_VHOST_USER_NET_COMPLIANT_OL_FLAGS`` > + > + Since v16.04, the vhost library forwards checksum and gso requests for > + packets received from a virtio driver by filling Tx offload metadata in > + the mbuf. This behavior is inconsistent with other drivers but it is left > + untouched for existing applications that might rely on it. > + > + This flag disables the legacy behavior and instead ask vhost to simply > + populate Rx offload metadata in the mbuf. > + > + It is disabled by default. > + > * ``rte_vhost_driver_set_features(path, features)`` > > This function sets the feature bits the vhost-user driver supports. The > diff --git a/doc/guides/rel_notes/release_21_05.rst b/doc/guides/rel_notes/release_21_05.rst > index a5f21f8425..6b7b0810a5 100644 > --- a/doc/guides/rel_notes/release_21_05.rst > +++ b/doc/guides/rel_notes/release_21_05.rst > @@ -337,6 +337,12 @@ API Changes > ``policer_action_recolor_supported`` and ``policer_action_drop_supported`` > have been removed. > > +* vhost: The vhost library currently populates received mbufs from a virtio > + driver with Tx offload flags while not filling Rx offload flags. > + While this behavior is arguable, it is kept untouched. > + A new flag ``RTE_VHOST_USER_NET_COMPLIANT_OL_FLAGS`` has been added to ask > + for a behavior compliant with to the mbuf offload API. > + > > ABI Changes > ----------- > diff --git a/drivers/net/vhost/rte_eth_vhost.c b/drivers/net/vhost/rte_eth_vhost.c > index d198fc8a8e..281379d6a3 100644 > --- a/drivers/net/vhost/rte_eth_vhost.c > +++ b/drivers/net/vhost/rte_eth_vhost.c > @@ -1505,7 +1505,7 @@ rte_pmd_vhost_probe(struct rte_vdev_device *dev) > int ret = 0; > char *iface_name; > uint16_t queues; > - uint64_t flags = 0; > + uint64_t flags = RTE_VHOST_USER_NET_COMPLIANT_OL_FLAGS; > uint64_t disable_flags = 0; > int client_mode = 0; > int iommu_support = 0; > diff --git a/examples/vhost/main.c b/examples/vhost/main.c > index 0bee1f3321..d2179eadb9 100644 > --- a/examples/vhost/main.c > +++ b/examples/vhost/main.c > @@ -19,6 +19,7 @@ > #include <rte_log.h> > #include <rte_string_fns.h> > #include <rte_malloc.h> > +#include <rte_net.h> > #include <rte_vhost.h> > #include <rte_ip.h> > #include <rte_tcp.h> > @@ -1029,33 +1030,34 @@ find_local_dest(struct vhost_dev *vdev, struct rte_mbuf *m, > return 0; > } > > -static uint16_t > -get_psd_sum(void *l3_hdr, uint64_t ol_flags) > -{ > - if (ol_flags & PKT_TX_IPV4) > - return rte_ipv4_phdr_cksum(l3_hdr, ol_flags); > - else /* assume ethertype == RTE_ETHER_TYPE_IPV6 */ > - return rte_ipv6_phdr_cksum(l3_hdr, ol_flags); > -} > - > static void virtio_tx_offload(struct rte_mbuf *m) > { > + struct rte_net_hdr_lens hdr_lens; > + struct rte_ipv4_hdr *ipv4_hdr; > + struct rte_tcp_hdr *tcp_hdr; > + uint32_t ptype; > void *l3_hdr; > - struct rte_ipv4_hdr *ipv4_hdr = NULL; > - struct rte_tcp_hdr *tcp_hdr = NULL; > - struct rte_ether_hdr *eth_hdr = > - rte_pktmbuf_mtod(m, struct rte_ether_hdr *); > > - l3_hdr = (char *)eth_hdr + m->l2_len; > + ptype = rte_net_get_ptype(m, &hdr_lens, RTE_PTYPE_ALL_MASK); > + m->l2_len = hdr_lens.l2_len; > + m->l3_len = hdr_lens.l3_len; > + m->l4_len = hdr_lens.l4_len; > > - if (m->ol_flags & PKT_TX_IPV4) { > + l3_hdr = rte_pktmbuf_mtod_offset(m, void *, m->l2_len); > + tcp_hdr = rte_pktmbuf_mtod_offset(m, struct rte_tcp_hdr *, > + m->l2_len + m->l3_len); > + > + m->ol_flags |= PKT_TX_TCP_SEG; > + if ((ptype & RTE_PTYPE_L3_MASK) == RTE_PTYPE_L3_IPV4) { > + m->ol_flags |= PKT_TX_IPV4; > + m->ol_flags |= PKT_TX_IP_CKSUM; > ipv4_hdr = l3_hdr; > ipv4_hdr->hdr_checksum = 0; > - m->ol_flags |= PKT_TX_IP_CKSUM; > + tcp_hdr->cksum = rte_ipv4_phdr_cksum(l3_hdr, m->ol_flags); > + } else { /* assume ethertype == RTE_ETHER_TYPE_IPV6 */ > + m->ol_flags |= PKT_TX_IPV6; > + tcp_hdr->cksum = rte_ipv6_phdr_cksum(l3_hdr, m->ol_flags); > } > - > - tcp_hdr = (struct rte_tcp_hdr *)((char *)l3_hdr + m->l3_len); > - tcp_hdr->cksum = get_psd_sum(l3_hdr, m->ol_flags); > } > > static __rte_always_inline void > @@ -1148,7 +1150,7 @@ virtio_tx_route(struct vhost_dev *vdev, struct rte_mbuf *m, uint16_t vlan_tag) > m->vlan_tci = vlan_tag; > } > > - if (m->ol_flags & PKT_TX_TCP_SEG) > + if (m->ol_flags & PKT_RX_LRO) > virtio_tx_offload(m); > > tx_q->m_table[tx_q->len++] = m; > @@ -1633,7 +1635,7 @@ main(int argc, char *argv[]) > int ret, i; > uint16_t portid; > static pthread_t tid; > - uint64_t flags = 0; > + uint64_t flags = RTE_VHOST_USER_NET_COMPLIANT_OL_FLAGS; > > signal(SIGINT, sigint_handler); > > diff --git a/lib/vhost/rte_vhost.h b/lib/vhost/rte_vhost.h > index d0a8ae31f2..8d875e9322 100644 > --- a/lib/vhost/rte_vhost.h > +++ b/lib/vhost/rte_vhost.h > @@ -36,6 +36,7 @@ extern "C" { > /* support only linear buffers (no chained mbufs) */ > #define RTE_VHOST_USER_LINEARBUF_SUPPORT (1ULL << 6) > #define RTE_VHOST_USER_ASYNC_COPY (1ULL << 7) > +#define RTE_VHOST_USER_NET_COMPLIANT_OL_FLAGS (1ULL << 8) > > /* Features. */ > #ifndef VIRTIO_NET_F_GUEST_ANNOUNCE > diff --git a/lib/vhost/socket.c b/lib/vhost/socket.c > index 0169d36481..5d0d728d52 100644 > --- a/lib/vhost/socket.c > +++ b/lib/vhost/socket.c > @@ -42,6 +42,7 @@ struct vhost_user_socket { > bool extbuf; > bool linearbuf; > bool async_copy; > + bool net_compliant_ol_flags; > > /* > * The "supported_features" indicates the feature bits the > @@ -224,7 +225,8 @@ vhost_user_add_connection(int fd, struct vhost_user_socket *vsocket) > size = strnlen(vsocket->path, PATH_MAX); > vhost_set_ifname(vid, vsocket->path, size); > > - vhost_set_builtin_virtio_net(vid, vsocket->use_builtin_virtio_net); > + vhost_setup_virtio_net(vid, vsocket->use_builtin_virtio_net, > + vsocket->net_compliant_ol_flags); > > vhost_attach_vdpa_device(vid, vsocket->vdpa_dev); > > @@ -877,6 +879,7 @@ rte_vhost_driver_register(const char *path, uint64_t flags) > vsocket->extbuf = flags & RTE_VHOST_USER_EXTBUF_SUPPORT; > vsocket->linearbuf = flags & RTE_VHOST_USER_LINEARBUF_SUPPORT; > vsocket->async_copy = flags & RTE_VHOST_USER_ASYNC_COPY; > + vsocket->net_compliant_ol_flags = flags & RTE_VHOST_USER_NET_COMPLIANT_OL_FLAGS; > > if (vsocket->async_copy && > (flags & (RTE_VHOST_USER_IOMMU_SUPPORT | > diff --git a/lib/vhost/vhost.c b/lib/vhost/vhost.c > index c9b6379f73..9abfc0bfe7 100644 > --- a/lib/vhost/vhost.c > +++ b/lib/vhost/vhost.c > @@ -752,7 +752,7 @@ vhost_set_ifname(int vid, const char *if_name, unsigned int if_len) > } > > void > -vhost_set_builtin_virtio_net(int vid, bool enable) > +vhost_setup_virtio_net(int vid, bool enable, bool compliant_ol_flags) > { > struct virtio_net *dev = get_device(vid); > > @@ -763,6 +763,10 @@ vhost_set_builtin_virtio_net(int vid, bool enable) > dev->flags |= VIRTIO_DEV_BUILTIN_VIRTIO_NET; > else > dev->flags &= ~VIRTIO_DEV_BUILTIN_VIRTIO_NET; > + if (!compliant_ol_flags) > + dev->flags |= VIRTIO_DEV_LEGACY_OL_FLAGS; > + else > + dev->flags &= ~VIRTIO_DEV_LEGACY_OL_FLAGS; > } > > void > diff --git a/lib/vhost/vhost.h b/lib/vhost/vhost.h > index b303635645..8078ddff79 100644 > --- a/lib/vhost/vhost.h > +++ b/lib/vhost/vhost.h > @@ -27,15 +27,17 @@ > #include "rte_vhost_async.h" > > /* Used to indicate that the device is running on a data core */ > -#define VIRTIO_DEV_RUNNING 1 > +#define VIRTIO_DEV_RUNNING ((uint32_t)1 << 0) > /* Used to indicate that the device is ready to operate */ > -#define VIRTIO_DEV_READY 2 > +#define VIRTIO_DEV_READY ((uint32_t)1 << 1) > /* Used to indicate that the built-in vhost net device backend is enabled */ > -#define VIRTIO_DEV_BUILTIN_VIRTIO_NET 4 > +#define VIRTIO_DEV_BUILTIN_VIRTIO_NET ((uint32_t)1 << 2) > /* Used to indicate that the device has its own data path and configured */ > -#define VIRTIO_DEV_VDPA_CONFIGURED 8 > +#define VIRTIO_DEV_VDPA_CONFIGURED ((uint32_t)1 << 3) > /* Used to indicate that the feature negotiation failed */ > -#define VIRTIO_DEV_FEATURES_FAILED 16 > +#define VIRTIO_DEV_FEATURES_FAILED ((uint32_t)1 << 4) > +/* Used to indicate that the virtio_net tx code should fill TX ol_flags */ > +#define VIRTIO_DEV_LEGACY_OL_FLAGS ((uint32_t)1 << 5) > > /* Backend value set by guest. */ > #define VIRTIO_DEV_STOPPED -1 > @@ -683,7 +685,7 @@ int alloc_vring_queue(struct virtio_net *dev, uint32_t vring_idx); > void vhost_attach_vdpa_device(int vid, struct rte_vdpa_device *dev); > > void vhost_set_ifname(int, const char *if_name, unsigned int if_len); > -void vhost_set_builtin_virtio_net(int vid, bool enable); > +void vhost_setup_virtio_net(int vid, bool enable, bool legacy_ol_flags); > void vhost_enable_extbuf(int vid); > void vhost_enable_linearbuf(int vid); > int vhost_enable_guest_notification(struct virtio_net *dev, > diff --git a/lib/vhost/virtio_net.c b/lib/vhost/virtio_net.c > index 1a34867f3c..8e36f4c340 100644 > --- a/lib/vhost/virtio_net.c > +++ b/lib/vhost/virtio_net.c > @@ -8,6 +8,7 @@ > > #include <rte_mbuf.h> > #include <rte_memcpy.h> > +#include <rte_net.h> > #include <rte_ether.h> > #include <rte_ip.h> > #include <rte_vhost.h> > @@ -2303,15 +2304,12 @@ parse_ethernet(struct rte_mbuf *m, uint16_t *l4_proto, void **l4_hdr) > } > > static __rte_always_inline void > -vhost_dequeue_offload(struct virtio_net_hdr *hdr, struct rte_mbuf *m) > +vhost_dequeue_offload_legacy(struct virtio_net_hdr *hdr, struct rte_mbuf *m) > { > uint16_t l4_proto = 0; > void *l4_hdr = NULL; > struct rte_tcp_hdr *tcp_hdr = NULL; > > - if (hdr->flags == 0 && hdr->gso_type == VIRTIO_NET_HDR_GSO_NONE) > - return; > - > parse_ethernet(m, &l4_proto, &l4_hdr); > if (hdr->flags == VIRTIO_NET_HDR_F_NEEDS_CSUM) { > if (hdr->csum_start == (m->l2_len + m->l3_len)) { > @@ -2356,6 +2354,94 @@ vhost_dequeue_offload(struct virtio_net_hdr *hdr, struct rte_mbuf *m) > } > } > > +static __rte_always_inline void > +vhost_dequeue_offload(struct virtio_net_hdr *hdr, struct rte_mbuf *m, > + bool legacy_ol_flags) > +{ > + struct rte_net_hdr_lens hdr_lens; > + int l4_supported = 0; > + uint32_t ptype; > + > + if (hdr->flags == 0 && hdr->gso_type == VIRTIO_NET_HDR_GSO_NONE) > + return; > + > + if (legacy_ol_flags) { > + vhost_dequeue_offload_legacy(hdr, m); > + return; > + } > + > + m->ol_flags |= PKT_RX_IP_CKSUM_UNKNOWN; > + > + ptype = rte_net_get_ptype(m, &hdr_lens, RTE_PTYPE_ALL_MASK); > + m->packet_type = ptype; My _impression_ is that calling rte_net_get_ptype() makes the receiving process a bit more expensive than without the patch and it is not optional. However, the original parsing code was limited and could be considered a bug. Anyways, calling that has the nice side effect of providing the packet_type which it didn't provide before the patch. Acked-by: Flavio Leitner <fbl@sysclose.org> (though this just got merged) Thanks David, great work! fbl > + if ((ptype & RTE_PTYPE_L4_MASK) == RTE_PTYPE_L4_TCP || > + (ptype & RTE_PTYPE_L4_MASK) == RTE_PTYPE_L4_UDP || > + (ptype & RTE_PTYPE_L4_MASK) == RTE_PTYPE_L4_SCTP) > + l4_supported = 1; > + > + /* According to Virtio 1.1 spec, the device only needs to look at > + * VIRTIO_NET_HDR_F_NEEDS_CSUM in the packet transmission path. > + * This differs from the processing incoming packets path where the > + * driver could rely on VIRTIO_NET_HDR_F_DATA_VALID flag set by the > + * device. > + * > + * 5.1.6.2.1 Driver Requirements: Packet Transmission > + * The driver MUST NOT set the VIRTIO_NET_HDR_F_DATA_VALID and > + * VIRTIO_NET_HDR_F_RSC_INFO bits in flags. > + * > + * 5.1.6.2.2 Device Requirements: Packet Transmission > + * The device MUST ignore flag bits that it does not recognize. > + */ > + if (hdr->flags & VIRTIO_NET_HDR_F_NEEDS_CSUM) { > + uint32_t hdrlen; > + > + hdrlen = hdr_lens.l2_len + hdr_lens.l3_len + hdr_lens.l4_len; > + if (hdr->csum_start <= hdrlen && l4_supported != 0) { > + m->ol_flags |= PKT_RX_L4_CKSUM_NONE; > + } else { > + /* Unknown proto or tunnel, do sw cksum. We can assume > + * the cksum field is in the first segment since the > + * buffers we provided to the host are large enough. > + * In case of SCTP, this will be wrong since it's a CRC > + * but there's nothing we can do. > + */ > + uint16_t csum = 0, off; > + > + if (rte_raw_cksum_mbuf(m, hdr->csum_start, > + rte_pktmbuf_pkt_len(m) - hdr->csum_start, &csum) < 0) > + return; > + if (likely(csum != 0xffff)) > + csum = ~csum; > + off = hdr->csum_offset + hdr->csum_start; > + if (rte_pktmbuf_data_len(m) >= off + 1) > + *rte_pktmbuf_mtod_offset(m, uint16_t *, off) = csum; > + } > + } > + > + if (hdr->gso_type != VIRTIO_NET_HDR_GSO_NONE) { > + if (hdr->gso_size == 0) > + return; > + > + switch (hdr->gso_type & ~VIRTIO_NET_HDR_GSO_ECN) { > + case VIRTIO_NET_HDR_GSO_TCPV4: > + case VIRTIO_NET_HDR_GSO_TCPV6: > + if ((ptype & RTE_PTYPE_L4_MASK) != RTE_PTYPE_L4_TCP) > + break; > + m->ol_flags |= PKT_RX_LRO | PKT_RX_L4_CKSUM_NONE; > + m->tso_segsz = hdr->gso_size; > + break; > + case VIRTIO_NET_HDR_GSO_UDP: > + if ((ptype & RTE_PTYPE_L4_MASK) != RTE_PTYPE_L4_UDP) > + break; > + m->ol_flags |= PKT_RX_LRO | PKT_RX_L4_CKSUM_NONE; > + m->tso_segsz = hdr->gso_size; > + break; > + default: > + break; > + } > + } > +} > + > static __rte_noinline void > copy_vnet_hdr_from_desc(struct virtio_net_hdr *hdr, > struct buf_vector *buf_vec) > @@ -2380,7 +2466,8 @@ copy_vnet_hdr_from_desc(struct virtio_net_hdr *hdr, > static __rte_always_inline int > copy_desc_to_mbuf(struct virtio_net *dev, struct vhost_virtqueue *vq, > struct buf_vector *buf_vec, uint16_t nr_vec, > - struct rte_mbuf *m, struct rte_mempool *mbuf_pool) > + struct rte_mbuf *m, struct rte_mempool *mbuf_pool, > + bool legacy_ol_flags) > { > uint32_t buf_avail, buf_offset; > uint64_t buf_addr, buf_len; > @@ -2513,7 +2600,7 @@ copy_desc_to_mbuf(struct virtio_net *dev, struct vhost_virtqueue *vq, > m->pkt_len += mbuf_offset; > > if (hdr) > - vhost_dequeue_offload(hdr, m); > + vhost_dequeue_offload(hdr, m, legacy_ol_flags); > > out: > > @@ -2606,9 +2693,11 @@ virtio_dev_pktmbuf_alloc(struct virtio_net *dev, struct rte_mempool *mp, > return pkt; > } > > -static __rte_noinline uint16_t > +__rte_always_inline > +static uint16_t > virtio_dev_tx_split(struct virtio_net *dev, struct vhost_virtqueue *vq, > - struct rte_mempool *mbuf_pool, struct rte_mbuf **pkts, uint16_t count) > + struct rte_mempool *mbuf_pool, struct rte_mbuf **pkts, uint16_t count, > + bool legacy_ol_flags) > { > uint16_t i; > uint16_t free_entries; > @@ -2668,7 +2757,7 @@ virtio_dev_tx_split(struct virtio_net *dev, struct vhost_virtqueue *vq, > } > > err = copy_desc_to_mbuf(dev, vq, buf_vec, nr_vec, pkts[i], > - mbuf_pool); > + mbuf_pool, legacy_ol_flags); > if (unlikely(err)) { > rte_pktmbuf_free(pkts[i]); > if (!allocerr_warned) { > @@ -2696,6 +2785,24 @@ virtio_dev_tx_split(struct virtio_net *dev, struct vhost_virtqueue *vq, > return (i - dropped); > } > > +__rte_noinline > +static uint16_t > +virtio_dev_tx_split_legacy(struct virtio_net *dev, > + struct vhost_virtqueue *vq, struct rte_mempool *mbuf_pool, > + struct rte_mbuf **pkts, uint16_t count) > +{ > + return virtio_dev_tx_split(dev, vq, mbuf_pool, pkts, count, true); > +} > + > +__rte_noinline > +static uint16_t > +virtio_dev_tx_split_compliant(struct virtio_net *dev, > + struct vhost_virtqueue *vq, struct rte_mempool *mbuf_pool, > + struct rte_mbuf **pkts, uint16_t count) > +{ > + return virtio_dev_tx_split(dev, vq, mbuf_pool, pkts, count, false); > +} > + > static __rte_always_inline int > vhost_reserve_avail_batch_packed(struct virtio_net *dev, > struct vhost_virtqueue *vq, > @@ -2770,7 +2877,8 @@ vhost_reserve_avail_batch_packed(struct virtio_net *dev, > static __rte_always_inline int > virtio_dev_tx_batch_packed(struct virtio_net *dev, > struct vhost_virtqueue *vq, > - struct rte_mbuf **pkts) > + struct rte_mbuf **pkts, > + bool legacy_ol_flags) > { > uint16_t avail_idx = vq->last_avail_idx; > uint32_t buf_offset = sizeof(struct virtio_net_hdr_mrg_rxbuf); > @@ -2794,7 +2902,7 @@ virtio_dev_tx_batch_packed(struct virtio_net *dev, > if (virtio_net_with_host_offload(dev)) { > vhost_for_each_try_unroll(i, 0, PACKED_BATCH_SIZE) { > hdr = (struct virtio_net_hdr *)(desc_addrs[i]); > - vhost_dequeue_offload(hdr, pkts[i]); > + vhost_dequeue_offload(hdr, pkts[i], legacy_ol_flags); > } > } > > @@ -2815,7 +2923,8 @@ vhost_dequeue_single_packed(struct virtio_net *dev, > struct rte_mempool *mbuf_pool, > struct rte_mbuf *pkts, > uint16_t *buf_id, > - uint16_t *desc_count) > + uint16_t *desc_count, > + bool legacy_ol_flags) > { > struct buf_vector buf_vec[BUF_VECTOR_MAX]; > uint32_t buf_len; > @@ -2841,7 +2950,7 @@ vhost_dequeue_single_packed(struct virtio_net *dev, > } > > err = copy_desc_to_mbuf(dev, vq, buf_vec, nr_vec, pkts, > - mbuf_pool); > + mbuf_pool, legacy_ol_flags); > if (unlikely(err)) { > if (!allocerr_warned) { > VHOST_LOG_DATA(ERR, > @@ -2859,14 +2968,15 @@ static __rte_always_inline int > virtio_dev_tx_single_packed(struct virtio_net *dev, > struct vhost_virtqueue *vq, > struct rte_mempool *mbuf_pool, > - struct rte_mbuf *pkts) > + struct rte_mbuf *pkts, > + bool legacy_ol_flags) > { > > uint16_t buf_id, desc_count = 0; > int ret; > > ret = vhost_dequeue_single_packed(dev, vq, mbuf_pool, pkts, &buf_id, > - &desc_count); > + &desc_count, legacy_ol_flags); > > if (likely(desc_count > 0)) { > if (virtio_net_is_inorder(dev)) > @@ -2882,12 +2992,14 @@ virtio_dev_tx_single_packed(struct virtio_net *dev, > return ret; > } > > -static __rte_noinline uint16_t > +__rte_always_inline > +static uint16_t > virtio_dev_tx_packed(struct virtio_net *dev, > struct vhost_virtqueue *__rte_restrict vq, > struct rte_mempool *mbuf_pool, > struct rte_mbuf **__rte_restrict pkts, > - uint32_t count) > + uint32_t count, > + bool legacy_ol_flags) > { > uint32_t pkt_idx = 0; > > @@ -2899,14 +3011,16 @@ virtio_dev_tx_packed(struct virtio_net *dev, > > if (count - pkt_idx >= PACKED_BATCH_SIZE) { > if (!virtio_dev_tx_batch_packed(dev, vq, > - &pkts[pkt_idx])) { > + &pkts[pkt_idx], > + legacy_ol_flags)) { > pkt_idx += PACKED_BATCH_SIZE; > continue; > } > } > > if (virtio_dev_tx_single_packed(dev, vq, mbuf_pool, > - pkts[pkt_idx])) > + pkts[pkt_idx], > + legacy_ol_flags)) > break; > pkt_idx++; > } while (pkt_idx < count); > @@ -2924,6 +3038,24 @@ virtio_dev_tx_packed(struct virtio_net *dev, > return pkt_idx; > } > > +__rte_noinline > +static uint16_t > +virtio_dev_tx_packed_legacy(struct virtio_net *dev, > + struct vhost_virtqueue *__rte_restrict vq, struct rte_mempool *mbuf_pool, > + struct rte_mbuf **__rte_restrict pkts, uint32_t count) > +{ > + return virtio_dev_tx_packed(dev, vq, mbuf_pool, pkts, count, true); > +} > + > +__rte_noinline > +static uint16_t > +virtio_dev_tx_packed_compliant(struct virtio_net *dev, > + struct vhost_virtqueue *__rte_restrict vq, struct rte_mempool *mbuf_pool, > + struct rte_mbuf **__rte_restrict pkts, uint32_t count) > +{ > + return virtio_dev_tx_packed(dev, vq, mbuf_pool, pkts, count, false); > +} > + > uint16_t > rte_vhost_dequeue_burst(int vid, uint16_t queue_id, > struct rte_mempool *mbuf_pool, struct rte_mbuf **pkts, uint16_t count) > @@ -2999,10 +3131,17 @@ rte_vhost_dequeue_burst(int vid, uint16_t queue_id, > count -= 1; > } > > - if (vq_is_packed(dev)) > - count = virtio_dev_tx_packed(dev, vq, mbuf_pool, pkts, count); > - else > - count = virtio_dev_tx_split(dev, vq, mbuf_pool, pkts, count); > + if (vq_is_packed(dev)) { > + if (dev->flags & VIRTIO_DEV_LEGACY_OL_FLAGS) > + count = virtio_dev_tx_packed_legacy(dev, vq, mbuf_pool, pkts, count); > + else > + count = virtio_dev_tx_packed_compliant(dev, vq, mbuf_pool, pkts, count); > + } else { > + if (dev->flags & VIRTIO_DEV_LEGACY_OL_FLAGS) > + count = virtio_dev_tx_split_legacy(dev, vq, mbuf_pool, pkts, count); > + else > + count = virtio_dev_tx_split_compliant(dev, vq, mbuf_pool, pkts, count); > + } > > out: > if (dev->features & (1ULL << VIRTIO_F_IOMMU_PLATFORM)) > -- > 2.23.0 > -- fbl ^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: [dpdk-stable] [dpdk-dev] [PATCH v4 3/3] vhost: fix offload flags in Rx path 2021-05-03 16:43 ` [dpdk-stable] [PATCH v4 3/3] " David Marchand 2021-05-04 11:07 ` Flavio Leitner @ 2021-05-08 6:24 ` Wang, Yinan 2021-05-12 3:29 ` Wang, Yinan 1 sibling, 1 reply; 7+ messages in thread From: Wang, Yinan @ 2021-05-08 6:24 UTC (permalink / raw) To: David Marchand, dev Cc: maxime.coquelin, olivier.matz, fbl, i.maximets, Xia, Chenbo, Stokes, Ian, stable, Jijiang Liu, Yuanhan Liu Hi David, May I know how to configures Tx offloading by testpmd, could you help to provide an example case? I add a case which need vhost tx offload (TSO/cksum) function, this case can't work with the patch, could you use this case as the example if possible? For example: VM2VM split ring vhost-user/virtio-net test with tcp traffic ========================================================================= 1. Launch the Vhost sample on socket 0 by below commands:: rm -rf vhost-net* ./dpdk-testpmd -l 2-4 -n 4 --no-pci --file-prefix=vhost --vdev 'net_vhost0,iface=vhost-net0,queues=1' \ --vdev 'net_vhost1,iface=vhost-net1,queues=1' -- -i --nb-cores=2 --txd=1024 --rxd=1024 testpmd>start 2. Launch VM1 and VM2 on socket 1:: taskset -c 32 qemu-system-x86_64 -name vm1 -enable-kvm -cpu host -smp 1 -m 4096 \ -object memory-backend-file,id=mem,size=4096M,mem-path=/mnt/huge,share=on \ -numa node,memdev=mem -mem-prealloc -drive file=/home/osimg/ubuntu20-04.img \ -chardev socket,path=/tmp/vm2_qga0.sock,server,nowait,id=vm2_qga0 -device virtio-serial \ -device virtserialport,chardev=vm2_qga0,name=org.qemu.guest_agent.2 -daemonize \ -monitor unix:/tmp/vm2_monitor.sock,server,nowait -device e1000,netdev=nttsip1 \ -netdev user,id=nttsip1,hostfwd=tcp:127.0.0.1:6002-:22 \ -chardev socket,id=char0,path=./vhost-net0 \ -netdev type=vhost-user,id=netdev0,chardev=char0,vhostforce \ -device virtio-net-pci,netdev=netdev0,mac=52:54:00:00:00:01,disable-modern=false,mrg_rxbuf=on,csum=on,guest_csum=on,host_tso4=on,guest_tso4=on,guest_ecn=on -vnc :10 taskset -c 33 qemu-system-x86_64 -name vm2 -enable-kvm -cpu host -smp 1 -m 4096 \ -object memory-backend-file,id=mem,size=4096M,mem-path=/mnt/huge,share=on \ -numa node,memdev=mem -mem-prealloc -drive file=/home/osimg/ubuntu20-04-2.img \ -chardev socket,path=/tmp/vm2_qga0.sock,server,nowait,id=vm2_qga0 -device virtio-serial \ -device virtserialport,chardev=vm2_qga0,name=org.qemu.guest_agent.2 -daemonize \ -monitor unix:/tmp/vm2_monitor.sock,server,nowait -device e1000,netdev=nttsip1 \ -netdev user,id=nttsip1,hostfwd=tcp:127.0.0.1:6003-:22 \ -chardev socket,id=char0,path=./vhost-net1 \ -netdev type=vhost-user,id=netdev0,chardev=char0,vhostforce \ -device virtio-net-pci,netdev=netdev0,mac=52:54:00:00:00:02,disable-modern=false,mrg_rxbuf=on,csum=on,guest_csum=on,host_tso4=on,guest_tso4=on,guest_ecn=on -vnc :12 3. On VM1, set virtio device IP and run arp protocal:: ifconfig ens5 1.1.1.2 arp -s 1.1.1.8 52:54:00:00:00:02 4. On VM2, set virtio device IP and run arp protocal:: ifconfig ens5 1.1.1.8 arp -s 1.1.1.2 52:54:00:00:00:01 5. Check the iperf performance with different packet size between two VMs by below commands:: Under VM1, run: `iperf -s -i 1` Under VM2, run: `iperf -c 1.1.1.2 -i 1 -t 60` BR, Yinan > -----Original Message----- > From: dev <dev-bounces@dpdk.org> On Behalf Of David Marchand > Sent: 2021?5?4? 0:44 > To: dev@dpdk.org > Cc: maxime.coquelin@redhat.com; olivier.matz@6wind.com; > fbl@sysclose.org; i.maximets@ovn.org; Xia, Chenbo > <chenbo.xia@intel.com>; Stokes, Ian <ian.stokes@intel.com>; > stable@dpdk.org; Jijiang Liu <jijiang.liu@intel.com>; Yuanhan Liu > <yuanhan.liu@linux.intel.com> > Subject: [dpdk-dev] [PATCH v4 3/3] vhost: fix offload flags in Rx path > > The vhost library currently configures Tx offloading (PKT_TX_*) on any > packet received from a guest virtio device which asks for some offloading. > > This is problematic, as Tx offloading is something that the application > must ask for: the application needs to configure devices > to support every used offloads (ip, tcp checksumming, tso..), and the > various l2/l3/l4 lengths must be set following any processing that > happened in the application itself. > > On the other hand, the received packets are not marked wrt current > packet l3/l4 checksumming info. > > Copy virtio rx processing to fix those offload flags with some > differences: > - accept VIRTIO_NET_HDR_GSO_ECN and VIRTIO_NET_HDR_GSO_UDP, > - ignore anything but the VIRTIO_NET_HDR_F_NEEDS_CSUM flag (to comply > with > the virtio spec), > > Some applications might rely on the current behavior, so it is left > untouched by default. > A new RTE_VHOST_USER_NET_COMPLIANT_OL_FLAGS flag is added to > enable the > new behavior. > > The vhost example has been updated for the new behavior: TSO is applied > to > any packet marked LRO. > > Fixes: 859b480d5afd ("vhost: add guest offload setting") > Cc: stable@dpdk.org > > Signed-off-by: David Marchand <david.marchand@redhat.com> > Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com> > --- > Changes since v3: > - rebased on next-virtio, > > Changes since v2: > - introduced a new flag to keep existing behavior as the default, > - packets with unrecognised offload are passed to the application with no > offload metadata rather than dropped, > - ignored VIRTIO_NET_HDR_F_DATA_VALID since the virtio spec states that > the virtio driver is not allowed to use this flag when transmitting > packets, > > Changes since v1: > - updated vhost example, > - restored VIRTIO_NET_HDR_GSO_ECN and VIRTIO_NET_HDR_GSO_UDP > support, > - restored log on buggy offload request, > > --- > doc/guides/prog_guide/vhost_lib.rst | 12 ++ > doc/guides/rel_notes/release_21_05.rst | 6 + > drivers/net/vhost/rte_eth_vhost.c | 2 +- > examples/vhost/main.c | 44 +++--- > lib/vhost/rte_vhost.h | 1 + > lib/vhost/socket.c | 5 +- > lib/vhost/vhost.c | 6 +- > lib/vhost/vhost.h | 14 +- > lib/vhost/virtio_net.c | 185 ++++++++++++++++++++++--- > 9 files changed, 222 insertions(+), 53 deletions(-) > > diff --git a/doc/guides/prog_guide/vhost_lib.rst > b/doc/guides/prog_guide/vhost_lib.rst > index 7afa351675..d18fb98910 100644 > --- a/doc/guides/prog_guide/vhost_lib.rst > +++ b/doc/guides/prog_guide/vhost_lib.rst > @@ -118,6 +118,18 @@ The following is an overview of some key Vhost > API functions: > > It is disabled by default. > > + - ``RTE_VHOST_USER_NET_COMPLIANT_OL_FLAGS`` > + > + Since v16.04, the vhost library forwards checksum and gso requests for > + packets received from a virtio driver by filling Tx offload metadata in > + the mbuf. This behavior is inconsistent with other drivers but it is left > + untouched for existing applications that might rely on it. > + > + This flag disables the legacy behavior and instead ask vhost to simply > + populate Rx offload metadata in the mbuf. > + > + It is disabled by default. > + > * ``rte_vhost_driver_set_features(path, features)`` > > This function sets the feature bits the vhost-user driver supports. The > diff --git a/doc/guides/rel_notes/release_21_05.rst > b/doc/guides/rel_notes/release_21_05.rst > index a5f21f8425..6b7b0810a5 100644 > --- a/doc/guides/rel_notes/release_21_05.rst > +++ b/doc/guides/rel_notes/release_21_05.rst > @@ -337,6 +337,12 @@ API Changes > ``policer_action_recolor_supported`` and > ``policer_action_drop_supported`` > have been removed. > > +* vhost: The vhost library currently populates received mbufs from a virtio > + driver with Tx offload flags while not filling Rx offload flags. > + While this behavior is arguable, it is kept untouched. > + A new flag ``RTE_VHOST_USER_NET_COMPLIANT_OL_FLAGS`` has been > added to ask > + for a behavior compliant with to the mbuf offload API. > + > > ABI Changes > ----------- > diff --git a/drivers/net/vhost/rte_eth_vhost.c > b/drivers/net/vhost/rte_eth_vhost.c > index d198fc8a8e..281379d6a3 100644 > --- a/drivers/net/vhost/rte_eth_vhost.c > +++ b/drivers/net/vhost/rte_eth_vhost.c > @@ -1505,7 +1505,7 @@ rte_pmd_vhost_probe(struct rte_vdev_device > *dev) > int ret = 0; > char *iface_name; > uint16_t queues; > - uint64_t flags = 0; > + uint64_t flags = RTE_VHOST_USER_NET_COMPLIANT_OL_FLAGS; > uint64_t disable_flags = 0; > int client_mode = 0; > int iommu_support = 0; > diff --git a/examples/vhost/main.c b/examples/vhost/main.c > index 0bee1f3321..d2179eadb9 100644 > --- a/examples/vhost/main.c > +++ b/examples/vhost/main.c > @@ -19,6 +19,7 @@ > #include <rte_log.h> > #include <rte_string_fns.h> > #include <rte_malloc.h> > +#include <rte_net.h> > #include <rte_vhost.h> > #include <rte_ip.h> > #include <rte_tcp.h> > @@ -1029,33 +1030,34 @@ find_local_dest(struct vhost_dev *vdev, > struct rte_mbuf *m, > return 0; > } > > -static uint16_t > -get_psd_sum(void *l3_hdr, uint64_t ol_flags) > -{ > - if (ol_flags & PKT_TX_IPV4) > - return rte_ipv4_phdr_cksum(l3_hdr, ol_flags); > - else /* assume ethertype == RTE_ETHER_TYPE_IPV6 */ > - return rte_ipv6_phdr_cksum(l3_hdr, ol_flags); > -} > - > static void virtio_tx_offload(struct rte_mbuf *m) > { > + struct rte_net_hdr_lens hdr_lens; > + struct rte_ipv4_hdr *ipv4_hdr; > + struct rte_tcp_hdr *tcp_hdr; > + uint32_t ptype; > void *l3_hdr; > - struct rte_ipv4_hdr *ipv4_hdr = NULL; > - struct rte_tcp_hdr *tcp_hdr = NULL; > - struct rte_ether_hdr *eth_hdr = > - rte_pktmbuf_mtod(m, struct rte_ether_hdr *); > > - l3_hdr = (char *)eth_hdr + m->l2_len; > + ptype = rte_net_get_ptype(m, &hdr_lens, RTE_PTYPE_ALL_MASK); > + m->l2_len = hdr_lens.l2_len; > + m->l3_len = hdr_lens.l3_len; > + m->l4_len = hdr_lens.l4_len; > > - if (m->ol_flags & PKT_TX_IPV4) { > + l3_hdr = rte_pktmbuf_mtod_offset(m, void *, m->l2_len); > + tcp_hdr = rte_pktmbuf_mtod_offset(m, struct rte_tcp_hdr *, > + m->l2_len + m->l3_len); > + > + m->ol_flags |= PKT_TX_TCP_SEG; > + if ((ptype & RTE_PTYPE_L3_MASK) == RTE_PTYPE_L3_IPV4) { > + m->ol_flags |= PKT_TX_IPV4; > + m->ol_flags |= PKT_TX_IP_CKSUM; > ipv4_hdr = l3_hdr; > ipv4_hdr->hdr_checksum = 0; > - m->ol_flags |= PKT_TX_IP_CKSUM; > + tcp_hdr->cksum = rte_ipv4_phdr_cksum(l3_hdr, m- > >ol_flags); > + } else { /* assume ethertype == RTE_ETHER_TYPE_IPV6 */ > + m->ol_flags |= PKT_TX_IPV6; > + tcp_hdr->cksum = rte_ipv6_phdr_cksum(l3_hdr, m- > >ol_flags); > } > - > - tcp_hdr = (struct rte_tcp_hdr *)((char *)l3_hdr + m->l3_len); > - tcp_hdr->cksum = get_psd_sum(l3_hdr, m->ol_flags); > } > > static __rte_always_inline void > @@ -1148,7 +1150,7 @@ virtio_tx_route(struct vhost_dev *vdev, struct > rte_mbuf *m, uint16_t vlan_tag) > m->vlan_tci = vlan_tag; > } > > - if (m->ol_flags & PKT_TX_TCP_SEG) > + if (m->ol_flags & PKT_RX_LRO) > virtio_tx_offload(m); > > tx_q->m_table[tx_q->len++] = m; > @@ -1633,7 +1635,7 @@ main(int argc, char *argv[]) > int ret, i; > uint16_t portid; > static pthread_t tid; > - uint64_t flags = 0; > + uint64_t flags = RTE_VHOST_USER_NET_COMPLIANT_OL_FLAGS; > > signal(SIGINT, sigint_handler); > > diff --git a/lib/vhost/rte_vhost.h b/lib/vhost/rte_vhost.h > index d0a8ae31f2..8d875e9322 100644 > --- a/lib/vhost/rte_vhost.h > +++ b/lib/vhost/rte_vhost.h > @@ -36,6 +36,7 @@ extern "C" { > /* support only linear buffers (no chained mbufs) */ > #define RTE_VHOST_USER_LINEARBUF_SUPPORT (1ULL << 6) > #define RTE_VHOST_USER_ASYNC_COPY (1ULL << 7) > +#define RTE_VHOST_USER_NET_COMPLIANT_OL_FLAGS (1ULL << 8) > > /* Features. */ > #ifndef VIRTIO_NET_F_GUEST_ANNOUNCE > diff --git a/lib/vhost/socket.c b/lib/vhost/socket.c > index 0169d36481..5d0d728d52 100644 > --- a/lib/vhost/socket.c > +++ b/lib/vhost/socket.c > @@ -42,6 +42,7 @@ struct vhost_user_socket { > bool extbuf; > bool linearbuf; > bool async_copy; > + bool net_compliant_ol_flags; > > /* > * The "supported_features" indicates the feature bits the > @@ -224,7 +225,8 @@ vhost_user_add_connection(int fd, struct > vhost_user_socket *vsocket) > size = strnlen(vsocket->path, PATH_MAX); > vhost_set_ifname(vid, vsocket->path, size); > > - vhost_set_builtin_virtio_net(vid, vsocket->use_builtin_virtio_net); > + vhost_setup_virtio_net(vid, vsocket->use_builtin_virtio_net, > + vsocket->net_compliant_ol_flags); > > vhost_attach_vdpa_device(vid, vsocket->vdpa_dev); > > @@ -877,6 +879,7 @@ rte_vhost_driver_register(const char *path, > uint64_t flags) > vsocket->extbuf = flags & RTE_VHOST_USER_EXTBUF_SUPPORT; > vsocket->linearbuf = flags & > RTE_VHOST_USER_LINEARBUF_SUPPORT; > vsocket->async_copy = flags & RTE_VHOST_USER_ASYNC_COPY; > + vsocket->net_compliant_ol_flags = flags & > RTE_VHOST_USER_NET_COMPLIANT_OL_FLAGS; > > if (vsocket->async_copy && > (flags & (RTE_VHOST_USER_IOMMU_SUPPORT | > diff --git a/lib/vhost/vhost.c b/lib/vhost/vhost.c > index c9b6379f73..9abfc0bfe7 100644 > --- a/lib/vhost/vhost.c > +++ b/lib/vhost/vhost.c > @@ -752,7 +752,7 @@ vhost_set_ifname(int vid, const char *if_name, > unsigned int if_len) > } > > void > -vhost_set_builtin_virtio_net(int vid, bool enable) > +vhost_setup_virtio_net(int vid, bool enable, bool compliant_ol_flags) > { > struct virtio_net *dev = get_device(vid); > > @@ -763,6 +763,10 @@ vhost_set_builtin_virtio_net(int vid, bool enable) > dev->flags |= VIRTIO_DEV_BUILTIN_VIRTIO_NET; > else > dev->flags &= ~VIRTIO_DEV_BUILTIN_VIRTIO_NET; > + if (!compliant_ol_flags) > + dev->flags |= VIRTIO_DEV_LEGACY_OL_FLAGS; > + else > + dev->flags &= ~VIRTIO_DEV_LEGACY_OL_FLAGS; > } > > void > diff --git a/lib/vhost/vhost.h b/lib/vhost/vhost.h > index b303635645..8078ddff79 100644 > --- a/lib/vhost/vhost.h > +++ b/lib/vhost/vhost.h > @@ -27,15 +27,17 @@ > #include "rte_vhost_async.h" > > /* Used to indicate that the device is running on a data core */ > -#define VIRTIO_DEV_RUNNING 1 > +#define VIRTIO_DEV_RUNNING ((uint32_t)1 << 0) > /* Used to indicate that the device is ready to operate */ > -#define VIRTIO_DEV_READY 2 > +#define VIRTIO_DEV_READY ((uint32_t)1 << 1) > /* Used to indicate that the built-in vhost net device backend is enabled */ > -#define VIRTIO_DEV_BUILTIN_VIRTIO_NET 4 > +#define VIRTIO_DEV_BUILTIN_VIRTIO_NET ((uint32_t)1 << 2) > /* Used to indicate that the device has its own data path and configured */ > -#define VIRTIO_DEV_VDPA_CONFIGURED 8 > +#define VIRTIO_DEV_VDPA_CONFIGURED ((uint32_t)1 << 3) > /* Used to indicate that the feature negotiation failed */ > -#define VIRTIO_DEV_FEATURES_FAILED 16 > +#define VIRTIO_DEV_FEATURES_FAILED ((uint32_t)1 << 4) > +/* Used to indicate that the virtio_net tx code should fill TX ol_flags */ > +#define VIRTIO_DEV_LEGACY_OL_FLAGS ((uint32_t)1 << 5) > > /* Backend value set by guest. */ > #define VIRTIO_DEV_STOPPED -1 > @@ -683,7 +685,7 @@ int alloc_vring_queue(struct virtio_net *dev, > uint32_t vring_idx); > void vhost_attach_vdpa_device(int vid, struct rte_vdpa_device *dev); > > void vhost_set_ifname(int, const char *if_name, unsigned int if_len); > -void vhost_set_builtin_virtio_net(int vid, bool enable); > +void vhost_setup_virtio_net(int vid, bool enable, bool legacy_ol_flags); > void vhost_enable_extbuf(int vid); > void vhost_enable_linearbuf(int vid); > int vhost_enable_guest_notification(struct virtio_net *dev, > diff --git a/lib/vhost/virtio_net.c b/lib/vhost/virtio_net.c > index 1a34867f3c..8e36f4c340 100644 > --- a/lib/vhost/virtio_net.c > +++ b/lib/vhost/virtio_net.c > @@ -8,6 +8,7 @@ > > #include <rte_mbuf.h> > #include <rte_memcpy.h> > +#include <rte_net.h> > #include <rte_ether.h> > #include <rte_ip.h> > #include <rte_vhost.h> > @@ -2303,15 +2304,12 @@ parse_ethernet(struct rte_mbuf *m, uint16_t > *l4_proto, void **l4_hdr) > } > > static __rte_always_inline void > -vhost_dequeue_offload(struct virtio_net_hdr *hdr, struct rte_mbuf *m) > +vhost_dequeue_offload_legacy(struct virtio_net_hdr *hdr, struct > rte_mbuf *m) > { > uint16_t l4_proto = 0; > void *l4_hdr = NULL; > struct rte_tcp_hdr *tcp_hdr = NULL; > > - if (hdr->flags == 0 && hdr->gso_type == > VIRTIO_NET_HDR_GSO_NONE) > - return; > - > parse_ethernet(m, &l4_proto, &l4_hdr); > if (hdr->flags == VIRTIO_NET_HDR_F_NEEDS_CSUM) { > if (hdr->csum_start == (m->l2_len + m->l3_len)) { > @@ -2356,6 +2354,94 @@ vhost_dequeue_offload(struct virtio_net_hdr > *hdr, struct rte_mbuf *m) > } > } > > +static __rte_always_inline void > +vhost_dequeue_offload(struct virtio_net_hdr *hdr, struct rte_mbuf *m, > + bool legacy_ol_flags) > +{ > + struct rte_net_hdr_lens hdr_lens; > + int l4_supported = 0; > + uint32_t ptype; > + > + if (hdr->flags == 0 && hdr->gso_type == > VIRTIO_NET_HDR_GSO_NONE) > + return; > + > + if (legacy_ol_flags) { > + vhost_dequeue_offload_legacy(hdr, m); > + return; > + } > + > + m->ol_flags |= PKT_RX_IP_CKSUM_UNKNOWN; > + > + ptype = rte_net_get_ptype(m, &hdr_lens, RTE_PTYPE_ALL_MASK); > + m->packet_type = ptype; > + if ((ptype & RTE_PTYPE_L4_MASK) == RTE_PTYPE_L4_TCP || > + (ptype & RTE_PTYPE_L4_MASK) == RTE_PTYPE_L4_UDP || > + (ptype & RTE_PTYPE_L4_MASK) == RTE_PTYPE_L4_SCTP) > + l4_supported = 1; > + > + /* According to Virtio 1.1 spec, the device only needs to look at > + * VIRTIO_NET_HDR_F_NEEDS_CSUM in the packet transmission > path. > + * This differs from the processing incoming packets path where the > + * driver could rely on VIRTIO_NET_HDR_F_DATA_VALID flag set by > the > + * device. > + * > + * 5.1.6.2.1 Driver Requirements: Packet Transmission > + * The driver MUST NOT set the VIRTIO_NET_HDR_F_DATA_VALID > and > + * VIRTIO_NET_HDR_F_RSC_INFO bits in flags. > + * > + * 5.1.6.2.2 Device Requirements: Packet Transmission > + * The device MUST ignore flag bits that it does not recognize. > + */ > + if (hdr->flags & VIRTIO_NET_HDR_F_NEEDS_CSUM) { > + uint32_t hdrlen; > + > + hdrlen = hdr_lens.l2_len + hdr_lens.l3_len + hdr_lens.l4_len; > + if (hdr->csum_start <= hdrlen && l4_supported != 0) { > + m->ol_flags |= PKT_RX_L4_CKSUM_NONE; > + } else { > + /* Unknown proto or tunnel, do sw cksum. We can > assume > + * the cksum field is in the first segment since the > + * buffers we provided to the host are large enough. > + * In case of SCTP, this will be wrong since it's a CRC > + * but there's nothing we can do. > + */ > + uint16_t csum = 0, off; > + > + if (rte_raw_cksum_mbuf(m, hdr->csum_start, > + rte_pktmbuf_pkt_len(m) - hdr- > >csum_start, &csum) < 0) > + return; > + if (likely(csum != 0xffff)) > + csum = ~csum; > + off = hdr->csum_offset + hdr->csum_start; > + if (rte_pktmbuf_data_len(m) >= off + 1) > + *rte_pktmbuf_mtod_offset(m, uint16_t *, > off) = csum; > + } > + } > + > + if (hdr->gso_type != VIRTIO_NET_HDR_GSO_NONE) { > + if (hdr->gso_size == 0) > + return; > + > + switch (hdr->gso_type & ~VIRTIO_NET_HDR_GSO_ECN) { > + case VIRTIO_NET_HDR_GSO_TCPV4: > + case VIRTIO_NET_HDR_GSO_TCPV6: > + if ((ptype & RTE_PTYPE_L4_MASK) != > RTE_PTYPE_L4_TCP) > + break; > + m->ol_flags |= PKT_RX_LRO | > PKT_RX_L4_CKSUM_NONE; > + m->tso_segsz = hdr->gso_size; > + break; > + case VIRTIO_NET_HDR_GSO_UDP: > + if ((ptype & RTE_PTYPE_L4_MASK) != > RTE_PTYPE_L4_UDP) > + break; > + m->ol_flags |= PKT_RX_LRO | > PKT_RX_L4_CKSUM_NONE; > + m->tso_segsz = hdr->gso_size; > + break; > + default: > + break; > + } > + } > +} > + > static __rte_noinline void > copy_vnet_hdr_from_desc(struct virtio_net_hdr *hdr, > struct buf_vector *buf_vec) > @@ -2380,7 +2466,8 @@ copy_vnet_hdr_from_desc(struct virtio_net_hdr > *hdr, > static __rte_always_inline int > copy_desc_to_mbuf(struct virtio_net *dev, struct vhost_virtqueue *vq, > struct buf_vector *buf_vec, uint16_t nr_vec, > - struct rte_mbuf *m, struct rte_mempool *mbuf_pool) > + struct rte_mbuf *m, struct rte_mempool *mbuf_pool, > + bool legacy_ol_flags) > { > uint32_t buf_avail, buf_offset; > uint64_t buf_addr, buf_len; > @@ -2513,7 +2600,7 @@ copy_desc_to_mbuf(struct virtio_net *dev, > struct vhost_virtqueue *vq, > m->pkt_len += mbuf_offset; > > if (hdr) > - vhost_dequeue_offload(hdr, m); > + vhost_dequeue_offload(hdr, m, legacy_ol_flags); > > out: > > @@ -2606,9 +2693,11 @@ virtio_dev_pktmbuf_alloc(struct virtio_net > *dev, struct rte_mempool *mp, > return pkt; > } > > -static __rte_noinline uint16_t > +__rte_always_inline > +static uint16_t > virtio_dev_tx_split(struct virtio_net *dev, struct vhost_virtqueue *vq, > - struct rte_mempool *mbuf_pool, struct rte_mbuf **pkts, uint16_t > count) > + struct rte_mempool *mbuf_pool, struct rte_mbuf **pkts, uint16_t > count, > + bool legacy_ol_flags) > { > uint16_t i; > uint16_t free_entries; > @@ -2668,7 +2757,7 @@ virtio_dev_tx_split(struct virtio_net *dev, struct > vhost_virtqueue *vq, > } > > err = copy_desc_to_mbuf(dev, vq, buf_vec, nr_vec, pkts[i], > - mbuf_pool); > + mbuf_pool, legacy_ol_flags); > if (unlikely(err)) { > rte_pktmbuf_free(pkts[i]); > if (!allocerr_warned) { > @@ -2696,6 +2785,24 @@ virtio_dev_tx_split(struct virtio_net *dev, > struct vhost_virtqueue *vq, > return (i - dropped); > } > > +__rte_noinline > +static uint16_t > +virtio_dev_tx_split_legacy(struct virtio_net *dev, > + struct vhost_virtqueue *vq, struct rte_mempool *mbuf_pool, > + struct rte_mbuf **pkts, uint16_t count) > +{ > + return virtio_dev_tx_split(dev, vq, mbuf_pool, pkts, count, true); > +} > + > +__rte_noinline > +static uint16_t > +virtio_dev_tx_split_compliant(struct virtio_net *dev, > + struct vhost_virtqueue *vq, struct rte_mempool *mbuf_pool, > + struct rte_mbuf **pkts, uint16_t count) > +{ > + return virtio_dev_tx_split(dev, vq, mbuf_pool, pkts, count, false); > +} > + > static __rte_always_inline int > vhost_reserve_avail_batch_packed(struct virtio_net *dev, > struct vhost_virtqueue *vq, > @@ -2770,7 +2877,8 @@ vhost_reserve_avail_batch_packed(struct > virtio_net *dev, > static __rte_always_inline int > virtio_dev_tx_batch_packed(struct virtio_net *dev, > struct vhost_virtqueue *vq, > - struct rte_mbuf **pkts) > + struct rte_mbuf **pkts, > + bool legacy_ol_flags) > { > uint16_t avail_idx = vq->last_avail_idx; > uint32_t buf_offset = sizeof(struct virtio_net_hdr_mrg_rxbuf); > @@ -2794,7 +2902,7 @@ virtio_dev_tx_batch_packed(struct virtio_net > *dev, > if (virtio_net_with_host_offload(dev)) { > vhost_for_each_try_unroll(i, 0, PACKED_BATCH_SIZE) { > hdr = (struct virtio_net_hdr *)(desc_addrs[i]); > - vhost_dequeue_offload(hdr, pkts[i]); > + vhost_dequeue_offload(hdr, pkts[i], legacy_ol_flags); > } > } > > @@ -2815,7 +2923,8 @@ vhost_dequeue_single_packed(struct virtio_net > *dev, > struct rte_mempool *mbuf_pool, > struct rte_mbuf *pkts, > uint16_t *buf_id, > - uint16_t *desc_count) > + uint16_t *desc_count, > + bool legacy_ol_flags) > { > struct buf_vector buf_vec[BUF_VECTOR_MAX]; > uint32_t buf_len; > @@ -2841,7 +2950,7 @@ vhost_dequeue_single_packed(struct virtio_net > *dev, > } > > err = copy_desc_to_mbuf(dev, vq, buf_vec, nr_vec, pkts, > - mbuf_pool); > + mbuf_pool, legacy_ol_flags); > if (unlikely(err)) { > if (!allocerr_warned) { > VHOST_LOG_DATA(ERR, > @@ -2859,14 +2968,15 @@ static __rte_always_inline int > virtio_dev_tx_single_packed(struct virtio_net *dev, > struct vhost_virtqueue *vq, > struct rte_mempool *mbuf_pool, > - struct rte_mbuf *pkts) > + struct rte_mbuf *pkts, > + bool legacy_ol_flags) > { > > uint16_t buf_id, desc_count = 0; > int ret; > > ret = vhost_dequeue_single_packed(dev, vq, mbuf_pool, pkts, > &buf_id, > - &desc_count); > + &desc_count, legacy_ol_flags); > > if (likely(desc_count > 0)) { > if (virtio_net_is_inorder(dev)) > @@ -2882,12 +2992,14 @@ virtio_dev_tx_single_packed(struct virtio_net > *dev, > return ret; > } > > -static __rte_noinline uint16_t > +__rte_always_inline > +static uint16_t > virtio_dev_tx_packed(struct virtio_net *dev, > struct vhost_virtqueue *__rte_restrict vq, > struct rte_mempool *mbuf_pool, > struct rte_mbuf **__rte_restrict pkts, > - uint32_t count) > + uint32_t count, > + bool legacy_ol_flags) > { > uint32_t pkt_idx = 0; > > @@ -2899,14 +3011,16 @@ virtio_dev_tx_packed(struct virtio_net *dev, > > if (count - pkt_idx >= PACKED_BATCH_SIZE) { > if (!virtio_dev_tx_batch_packed(dev, vq, > - &pkts[pkt_idx])) { > + &pkts[pkt_idx], > + legacy_ol_flags)) { > pkt_idx += PACKED_BATCH_SIZE; > continue; > } > } > > if (virtio_dev_tx_single_packed(dev, vq, mbuf_pool, > - pkts[pkt_idx])) > + pkts[pkt_idx], > + legacy_ol_flags)) > break; > pkt_idx++; > } while (pkt_idx < count); > @@ -2924,6 +3038,24 @@ virtio_dev_tx_packed(struct virtio_net *dev, > return pkt_idx; > } > > +__rte_noinline > +static uint16_t > +virtio_dev_tx_packed_legacy(struct virtio_net *dev, > + struct vhost_virtqueue *__rte_restrict vq, struct rte_mempool > *mbuf_pool, > + struct rte_mbuf **__rte_restrict pkts, uint32_t count) > +{ > + return virtio_dev_tx_packed(dev, vq, mbuf_pool, pkts, count, true); > +} > + > +__rte_noinline > +static uint16_t > +virtio_dev_tx_packed_compliant(struct virtio_net *dev, > + struct vhost_virtqueue *__rte_restrict vq, struct rte_mempool > *mbuf_pool, > + struct rte_mbuf **__rte_restrict pkts, uint32_t count) > +{ > + return virtio_dev_tx_packed(dev, vq, mbuf_pool, pkts, count, false); > +} > + > uint16_t > rte_vhost_dequeue_burst(int vid, uint16_t queue_id, > struct rte_mempool *mbuf_pool, struct rte_mbuf **pkts, uint16_t > count) > @@ -2999,10 +3131,17 @@ rte_vhost_dequeue_burst(int vid, uint16_t > queue_id, > count -= 1; > } > > - if (vq_is_packed(dev)) > - count = virtio_dev_tx_packed(dev, vq, mbuf_pool, pkts, > count); > - else > - count = virtio_dev_tx_split(dev, vq, mbuf_pool, pkts, count); > + if (vq_is_packed(dev)) { > + if (dev->flags & VIRTIO_DEV_LEGACY_OL_FLAGS) > + count = virtio_dev_tx_packed_legacy(dev, vq, > mbuf_pool, pkts, count); > + else > + count = virtio_dev_tx_packed_compliant(dev, vq, > mbuf_pool, pkts, count); > + } else { > + if (dev->flags & VIRTIO_DEV_LEGACY_OL_FLAGS) > + count = virtio_dev_tx_split_legacy(dev, vq, > mbuf_pool, pkts, count); > + else > + count = virtio_dev_tx_split_compliant(dev, vq, > mbuf_pool, pkts, count); > + } > > out: > if (dev->features & (1ULL << VIRTIO_F_IOMMU_PLATFORM)) > -- > 2.23.0 ^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: [dpdk-stable] [dpdk-dev] [PATCH v4 3/3] vhost: fix offload flags in Rx path 2021-05-08 6:24 ` [dpdk-stable] [dpdk-dev] " Wang, Yinan @ 2021-05-12 3:29 ` Wang, Yinan 2021-05-12 15:20 ` David Marchand 0 siblings, 1 reply; 7+ messages in thread From: Wang, Yinan @ 2021-05-12 3:29 UTC (permalink / raw) To: Wang, Yinan, David Marchand, dev Cc: maxime.coquelin, olivier.matz, fbl, i.maximets, Xia, Chenbo, Stokes, Ian, stable, Jijiang Liu, Yuanhan Liu Hi David, Since vhost tx offload can’t work now, we report a Bugzilla as below, could you help to take a look? https://bugs.dpdk.org/show_bug.cgi?id=702 We also tried vhost example with VM2VM iperf test, large pkts also can't forwarding. BR, Yinan > -----Original Message----- > From: dev <dev-bounces@dpdk.org> On Behalf Of Wang, Yinan > Sent: 2021?5?8? 14:24 > To: David Marchand <david.marchand@redhat.com>; dev@dpdk.org > Cc: maxime.coquelin@redhat.com; olivier.matz@6wind.com; > fbl@sysclose.org; i.maximets@ovn.org; Xia, Chenbo > <chenbo.xia@intel.com>; Stokes, Ian <ian.stokes@intel.com>; > stable@dpdk.org; Jijiang Liu <jijiang.liu@intel.com>; Yuanhan Liu > <yuanhan.liu@linux.intel.com> > Subject: Re: [dpdk-dev] [PATCH v4 3/3] vhost: fix offload flags in Rx path > > Hi David, > > May I know how to configures Tx offloading by testpmd, could you help to > provide an example case? > I add a case which need vhost tx offload (TSO/cksum) function, this case > can't work with the patch, could you use this case as the example if possible? > > For example: VM2VM split ring vhost-user/virtio-net test with tcp traffic > ========================================================== > =============== > > 1. Launch the Vhost sample on socket 0 by below commands:: > > rm -rf vhost-net* > ./dpdk-testpmd -l 2-4 -n 4 --no-pci --file-prefix=vhost --vdev > 'net_vhost0,iface=vhost-net0,queues=1' \ > --vdev 'net_vhost1,iface=vhost-net1,queues=1' -- -i --nb-cores=2 -- > txd=1024 --rxd=1024 > testpmd>start > > 2. Launch VM1 and VM2 on socket 1:: > > taskset -c 32 qemu-system-x86_64 -name vm1 -enable-kvm -cpu host - > smp 1 -m 4096 \ > -object memory-backend-file,id=mem,size=4096M,mem- > path=/mnt/huge,share=on \ > -numa node,memdev=mem -mem-prealloc -drive > file=/home/osimg/ubuntu20-04.img \ > -chardev socket,path=/tmp/vm2_qga0.sock,server,nowait,id=vm2_qga0 > -device virtio-serial \ > -device virtserialport,chardev=vm2_qga0,name=org.qemu.guest_agent.2 > -daemonize \ > -monitor unix:/tmp/vm2_monitor.sock,server,nowait -device > e1000,netdev=nttsip1 \ > -netdev user,id=nttsip1,hostfwd=tcp:127.0.0.1:6002-:22 \ > -chardev socket,id=char0,path=./vhost-net0 \ > -netdev type=vhost-user,id=netdev0,chardev=char0,vhostforce \ > -device virtio-net-pci,netdev=netdev0,mac=52:54:00:00:00:01,disable- > modern=false,mrg_rxbuf=on,csum=on,guest_csum=on,host_tso4=on,guest > _tso4=on,guest_ecn=on -vnc :10 > > taskset -c 33 qemu-system-x86_64 -name vm2 -enable-kvm -cpu host - > smp 1 -m 4096 \ > -object memory-backend-file,id=mem,size=4096M,mem- > path=/mnt/huge,share=on \ > -numa node,memdev=mem -mem-prealloc -drive > file=/home/osimg/ubuntu20-04-2.img \ > -chardev socket,path=/tmp/vm2_qga0.sock,server,nowait,id=vm2_qga0 > -device virtio-serial \ > -device virtserialport,chardev=vm2_qga0,name=org.qemu.guest_agent.2 > -daemonize \ > -monitor unix:/tmp/vm2_monitor.sock,server,nowait -device > e1000,netdev=nttsip1 \ > -netdev user,id=nttsip1,hostfwd=tcp:127.0.0.1:6003-:22 \ > -chardev socket,id=char0,path=./vhost-net1 \ > -netdev type=vhost-user,id=netdev0,chardev=char0,vhostforce \ > -device virtio-net-pci,netdev=netdev0,mac=52:54:00:00:00:02,disable- > modern=false,mrg_rxbuf=on,csum=on,guest_csum=on,host_tso4=on,guest > _tso4=on,guest_ecn=on -vnc :12 > > 3. On VM1, set virtio device IP and run arp protocal:: > > ifconfig ens5 1.1.1.2 > arp -s 1.1.1.8 52:54:00:00:00:02 > > 4. On VM2, set virtio device IP and run arp protocal:: > > ifconfig ens5 1.1.1.8 > arp -s 1.1.1.2 52:54:00:00:00:01 > > 5. Check the iperf performance with different packet size between two VMs > by below commands:: > > Under VM1, run: `iperf -s -i 1` > Under VM2, run: `iperf -c 1.1.1.2 -i 1 -t 60` > > BR, > Yinan > > > -----Original Message----- > > From: dev <dev-bounces@dpdk.org> On Behalf Of David Marchand > > Sent: 2021?5?4? 0:44 > > To: dev@dpdk.org > > Cc: maxime.coquelin@redhat.com; olivier.matz@6wind.com; > > fbl@sysclose.org; i.maximets@ovn.org; Xia, Chenbo > > <chenbo.xia@intel.com>; Stokes, Ian <ian.stokes@intel.com>; > > stable@dpdk.org; Jijiang Liu <jijiang.liu@intel.com>; Yuanhan Liu > > <yuanhan.liu@linux.intel.com> > > Subject: [dpdk-dev] [PATCH v4 3/3] vhost: fix offload flags in Rx path > > > > The vhost library currently configures Tx offloading (PKT_TX_*) on any > > packet received from a guest virtio device which asks for some offloading. > > > > This is problematic, as Tx offloading is something that the application > > must ask for: the application needs to configure devices > > to support every used offloads (ip, tcp checksumming, tso..), and the > > various l2/l3/l4 lengths must be set following any processing that > > happened in the application itself. > > > > On the other hand, the received packets are not marked wrt current > > packet l3/l4 checksumming info. > > > > Copy virtio rx processing to fix those offload flags with some > > differences: > > - accept VIRTIO_NET_HDR_GSO_ECN and VIRTIO_NET_HDR_GSO_UDP, > > - ignore anything but the VIRTIO_NET_HDR_F_NEEDS_CSUM flag (to > comply > > with > > the virtio spec), > > > > Some applications might rely on the current behavior, so it is left > > untouched by default. > > A new RTE_VHOST_USER_NET_COMPLIANT_OL_FLAGS flag is added to > > enable the > > new behavior. > > > > The vhost example has been updated for the new behavior: TSO is applied > > to > > any packet marked LRO. > > > > Fixes: 859b480d5afd ("vhost: add guest offload setting") > > Cc: stable@dpdk.org > > > > Signed-off-by: David Marchand <david.marchand@redhat.com> > > Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com> > > --- > > Changes since v3: > > - rebased on next-virtio, > > > > Changes since v2: > > - introduced a new flag to keep existing behavior as the default, > > - packets with unrecognised offload are passed to the application with no > > offload metadata rather than dropped, > > - ignored VIRTIO_NET_HDR_F_DATA_VALID since the virtio spec states > that > > the virtio driver is not allowed to use this flag when transmitting > > packets, > > > > Changes since v1: > > - updated vhost example, > > - restored VIRTIO_NET_HDR_GSO_ECN and VIRTIO_NET_HDR_GSO_UDP > > support, > > - restored log on buggy offload request, > > > > --- > > doc/guides/prog_guide/vhost_lib.rst | 12 ++ > > doc/guides/rel_notes/release_21_05.rst | 6 + > > drivers/net/vhost/rte_eth_vhost.c | 2 +- > > examples/vhost/main.c | 44 +++--- > > lib/vhost/rte_vhost.h | 1 + > > lib/vhost/socket.c | 5 +- > > lib/vhost/vhost.c | 6 +- > > lib/vhost/vhost.h | 14 +- > > lib/vhost/virtio_net.c | 185 ++++++++++++++++++++++--- > > 9 files changed, 222 insertions(+), 53 deletions(-) > > > > diff --git a/doc/guides/prog_guide/vhost_lib.rst > > b/doc/guides/prog_guide/vhost_lib.rst > > index 7afa351675..d18fb98910 100644 > > --- a/doc/guides/prog_guide/vhost_lib.rst > > +++ b/doc/guides/prog_guide/vhost_lib.rst > > @@ -118,6 +118,18 @@ The following is an overview of some key Vhost > > API functions: > > > > It is disabled by default. > > > > + - ``RTE_VHOST_USER_NET_COMPLIANT_OL_FLAGS`` > > + > > + Since v16.04, the vhost library forwards checksum and gso requests > for > > + packets received from a virtio driver by filling Tx offload metadata in > > + the mbuf. This behavior is inconsistent with other drivers but it is left > > + untouched for existing applications that might rely on it. > > + > > + This flag disables the legacy behavior and instead ask vhost to simply > > + populate Rx offload metadata in the mbuf. > > + > > + It is disabled by default. > > + > > * ``rte_vhost_driver_set_features(path, features)`` > > > > This function sets the feature bits the vhost-user driver supports. The > > diff --git a/doc/guides/rel_notes/release_21_05.rst > > b/doc/guides/rel_notes/release_21_05.rst > > index a5f21f8425..6b7b0810a5 100644 > > --- a/doc/guides/rel_notes/release_21_05.rst > > +++ b/doc/guides/rel_notes/release_21_05.rst > > @@ -337,6 +337,12 @@ API Changes > > ``policer_action_recolor_supported`` and > > ``policer_action_drop_supported`` > > have been removed. > > > > +* vhost: The vhost library currently populates received mbufs from a > virtio > > + driver with Tx offload flags while not filling Rx offload flags. > > + While this behavior is arguable, it is kept untouched. > > + A new flag ``RTE_VHOST_USER_NET_COMPLIANT_OL_FLAGS`` has been > > added to ask > > + for a behavior compliant with to the mbuf offload API. > > + > > > > ABI Changes > > ----------- > > diff --git a/drivers/net/vhost/rte_eth_vhost.c > > b/drivers/net/vhost/rte_eth_vhost.c > > index d198fc8a8e..281379d6a3 100644 > > --- a/drivers/net/vhost/rte_eth_vhost.c > > +++ b/drivers/net/vhost/rte_eth_vhost.c > > @@ -1505,7 +1505,7 @@ rte_pmd_vhost_probe(struct rte_vdev_device > > *dev) > > int ret = 0; > > char *iface_name; > > uint16_t queues; > > - uint64_t flags = 0; > > + uint64_t flags = RTE_VHOST_USER_NET_COMPLIANT_OL_FLAGS; > > uint64_t disable_flags = 0; > > int client_mode = 0; > > int iommu_support = 0; > > diff --git a/examples/vhost/main.c b/examples/vhost/main.c > > index 0bee1f3321..d2179eadb9 100644 > > --- a/examples/vhost/main.c > > +++ b/examples/vhost/main.c > > @@ -19,6 +19,7 @@ > > #include <rte_log.h> > > #include <rte_string_fns.h> > > #include <rte_malloc.h> > > +#include <rte_net.h> > > #include <rte_vhost.h> > > #include <rte_ip.h> > > #include <rte_tcp.h> > > @@ -1029,33 +1030,34 @@ find_local_dest(struct vhost_dev *vdev, > > struct rte_mbuf *m, > > return 0; > > } > > > > -static uint16_t > > -get_psd_sum(void *l3_hdr, uint64_t ol_flags) > > -{ > > - if (ol_flags & PKT_TX_IPV4) > > - return rte_ipv4_phdr_cksum(l3_hdr, ol_flags); > > - else /* assume ethertype == RTE_ETHER_TYPE_IPV6 */ > > - return rte_ipv6_phdr_cksum(l3_hdr, ol_flags); > > -} > > - > > static void virtio_tx_offload(struct rte_mbuf *m) > > { > > + struct rte_net_hdr_lens hdr_lens; > > + struct rte_ipv4_hdr *ipv4_hdr; > > + struct rte_tcp_hdr *tcp_hdr; > > + uint32_t ptype; > > void *l3_hdr; > > - struct rte_ipv4_hdr *ipv4_hdr = NULL; > > - struct rte_tcp_hdr *tcp_hdr = NULL; > > - struct rte_ether_hdr *eth_hdr = > > - rte_pktmbuf_mtod(m, struct rte_ether_hdr *); > > > > - l3_hdr = (char *)eth_hdr + m->l2_len; > > + ptype = rte_net_get_ptype(m, &hdr_lens, RTE_PTYPE_ALL_MASK); > > + m->l2_len = hdr_lens.l2_len; > > + m->l3_len = hdr_lens.l3_len; > > + m->l4_len = hdr_lens.l4_len; > > > > - if (m->ol_flags & PKT_TX_IPV4) { > > + l3_hdr = rte_pktmbuf_mtod_offset(m, void *, m->l2_len); > > + tcp_hdr = rte_pktmbuf_mtod_offset(m, struct rte_tcp_hdr *, > > + m->l2_len + m->l3_len); > > + > > + m->ol_flags |= PKT_TX_TCP_SEG; > > + if ((ptype & RTE_PTYPE_L3_MASK) == RTE_PTYPE_L3_IPV4) { > > + m->ol_flags |= PKT_TX_IPV4; > > + m->ol_flags |= PKT_TX_IP_CKSUM; > > ipv4_hdr = l3_hdr; > > ipv4_hdr->hdr_checksum = 0; > > - m->ol_flags |= PKT_TX_IP_CKSUM; > > + tcp_hdr->cksum = rte_ipv4_phdr_cksum(l3_hdr, m- > > >ol_flags); > > + } else { /* assume ethertype == RTE_ETHER_TYPE_IPV6 */ > > + m->ol_flags |= PKT_TX_IPV6; > > + tcp_hdr->cksum = rte_ipv6_phdr_cksum(l3_hdr, m- > > >ol_flags); > > } > > - > > - tcp_hdr = (struct rte_tcp_hdr *)((char *)l3_hdr + m->l3_len); > > - tcp_hdr->cksum = get_psd_sum(l3_hdr, m->ol_flags); > > } > > > > static __rte_always_inline void > > @@ -1148,7 +1150,7 @@ virtio_tx_route(struct vhost_dev *vdev, struct > > rte_mbuf *m, uint16_t vlan_tag) > > m->vlan_tci = vlan_tag; > > } > > > > - if (m->ol_flags & PKT_TX_TCP_SEG) > > + if (m->ol_flags & PKT_RX_LRO) > > virtio_tx_offload(m); > > > > tx_q->m_table[tx_q->len++] = m; > > @@ -1633,7 +1635,7 @@ main(int argc, char *argv[]) > > int ret, i; > > uint16_t portid; > > static pthread_t tid; > > - uint64_t flags = 0; > > + uint64_t flags = RTE_VHOST_USER_NET_COMPLIANT_OL_FLAGS; > > > > signal(SIGINT, sigint_handler); > > > > diff --git a/lib/vhost/rte_vhost.h b/lib/vhost/rte_vhost.h > > index d0a8ae31f2..8d875e9322 100644 > > --- a/lib/vhost/rte_vhost.h > > +++ b/lib/vhost/rte_vhost.h > > @@ -36,6 +36,7 @@ extern "C" { > > /* support only linear buffers (no chained mbufs) */ > > #define RTE_VHOST_USER_LINEARBUF_SUPPORT (1ULL << 6) > > #define RTE_VHOST_USER_ASYNC_COPY (1ULL << 7) > > +#define RTE_VHOST_USER_NET_COMPLIANT_OL_FLAGS (1ULL << 8) > > > > /* Features. */ > > #ifndef VIRTIO_NET_F_GUEST_ANNOUNCE > > diff --git a/lib/vhost/socket.c b/lib/vhost/socket.c > > index 0169d36481..5d0d728d52 100644 > > --- a/lib/vhost/socket.c > > +++ b/lib/vhost/socket.c > > @@ -42,6 +42,7 @@ struct vhost_user_socket { > > bool extbuf; > > bool linearbuf; > > bool async_copy; > > + bool net_compliant_ol_flags; > > > > /* > > * The "supported_features" indicates the feature bits the > > @@ -224,7 +225,8 @@ vhost_user_add_connection(int fd, struct > > vhost_user_socket *vsocket) > > size = strnlen(vsocket->path, PATH_MAX); > > vhost_set_ifname(vid, vsocket->path, size); > > > > - vhost_set_builtin_virtio_net(vid, vsocket->use_builtin_virtio_net); > > + vhost_setup_virtio_net(vid, vsocket->use_builtin_virtio_net, > > + vsocket->net_compliant_ol_flags); > > > > vhost_attach_vdpa_device(vid, vsocket->vdpa_dev); > > > > @@ -877,6 +879,7 @@ rte_vhost_driver_register(const char *path, > > uint64_t flags) > > vsocket->extbuf = flags & RTE_VHOST_USER_EXTBUF_SUPPORT; > > vsocket->linearbuf = flags & > > RTE_VHOST_USER_LINEARBUF_SUPPORT; > > vsocket->async_copy = flags & RTE_VHOST_USER_ASYNC_COPY; > > + vsocket->net_compliant_ol_flags = flags & > > RTE_VHOST_USER_NET_COMPLIANT_OL_FLAGS; > > > > if (vsocket->async_copy && > > (flags & (RTE_VHOST_USER_IOMMU_SUPPORT | > > diff --git a/lib/vhost/vhost.c b/lib/vhost/vhost.c > > index c9b6379f73..9abfc0bfe7 100644 > > --- a/lib/vhost/vhost.c > > +++ b/lib/vhost/vhost.c > > @@ -752,7 +752,7 @@ vhost_set_ifname(int vid, const char *if_name, > > unsigned int if_len) > > } > > > > void > > -vhost_set_builtin_virtio_net(int vid, bool enable) > > +vhost_setup_virtio_net(int vid, bool enable, bool compliant_ol_flags) > > { > > struct virtio_net *dev = get_device(vid); > > > > @@ -763,6 +763,10 @@ vhost_set_builtin_virtio_net(int vid, bool > enable) > > dev->flags |= VIRTIO_DEV_BUILTIN_VIRTIO_NET; > > else > > dev->flags &= ~VIRTIO_DEV_BUILTIN_VIRTIO_NET; > > + if (!compliant_ol_flags) > > + dev->flags |= VIRTIO_DEV_LEGACY_OL_FLAGS; > > + else > > + dev->flags &= ~VIRTIO_DEV_LEGACY_OL_FLAGS; > > } > > > > void > > diff --git a/lib/vhost/vhost.h b/lib/vhost/vhost.h > > index b303635645..8078ddff79 100644 > > --- a/lib/vhost/vhost.h > > +++ b/lib/vhost/vhost.h > > @@ -27,15 +27,17 @@ > > #include "rte_vhost_async.h" > > > > /* Used to indicate that the device is running on a data core */ > > -#define VIRTIO_DEV_RUNNING 1 > > +#define VIRTIO_DEV_RUNNING ((uint32_t)1 << 0) > > /* Used to indicate that the device is ready to operate */ > > -#define VIRTIO_DEV_READY 2 > > +#define VIRTIO_DEV_READY ((uint32_t)1 << 1) > > /* Used to indicate that the built-in vhost net device backend is enabled > */ > > -#define VIRTIO_DEV_BUILTIN_VIRTIO_NET 4 > > +#define VIRTIO_DEV_BUILTIN_VIRTIO_NET ((uint32_t)1 << 2) > > /* Used to indicate that the device has its own data path and configured > */ > > -#define VIRTIO_DEV_VDPA_CONFIGURED 8 > > +#define VIRTIO_DEV_VDPA_CONFIGURED ((uint32_t)1 << 3) > > /* Used to indicate that the feature negotiation failed */ > > -#define VIRTIO_DEV_FEATURES_FAILED 16 > > +#define VIRTIO_DEV_FEATURES_FAILED ((uint32_t)1 << 4) > > +/* Used to indicate that the virtio_net tx code should fill TX ol_flags */ > > +#define VIRTIO_DEV_LEGACY_OL_FLAGS ((uint32_t)1 << 5) > > > > /* Backend value set by guest. */ > > #define VIRTIO_DEV_STOPPED -1 > > @@ -683,7 +685,7 @@ int alloc_vring_queue(struct virtio_net *dev, > > uint32_t vring_idx); > > void vhost_attach_vdpa_device(int vid, struct rte_vdpa_device *dev); > > > > void vhost_set_ifname(int, const char *if_name, unsigned int if_len); > > -void vhost_set_builtin_virtio_net(int vid, bool enable); > > +void vhost_setup_virtio_net(int vid, bool enable, bool legacy_ol_flags); > > void vhost_enable_extbuf(int vid); > > void vhost_enable_linearbuf(int vid); > > int vhost_enable_guest_notification(struct virtio_net *dev, > > diff --git a/lib/vhost/virtio_net.c b/lib/vhost/virtio_net.c > > index 1a34867f3c..8e36f4c340 100644 > > --- a/lib/vhost/virtio_net.c > > +++ b/lib/vhost/virtio_net.c > > @@ -8,6 +8,7 @@ > > > > #include <rte_mbuf.h> > > #include <rte_memcpy.h> > > +#include <rte_net.h> > > #include <rte_ether.h> > > #include <rte_ip.h> > > #include <rte_vhost.h> > > @@ -2303,15 +2304,12 @@ parse_ethernet(struct rte_mbuf *m, > uint16_t > > *l4_proto, void **l4_hdr) > > } > > > > static __rte_always_inline void > > -vhost_dequeue_offload(struct virtio_net_hdr *hdr, struct rte_mbuf *m) > > +vhost_dequeue_offload_legacy(struct virtio_net_hdr *hdr, struct > > rte_mbuf *m) > > { > > uint16_t l4_proto = 0; > > void *l4_hdr = NULL; > > struct rte_tcp_hdr *tcp_hdr = NULL; > > > > - if (hdr->flags == 0 && hdr->gso_type == > > VIRTIO_NET_HDR_GSO_NONE) > > - return; > > - > > parse_ethernet(m, &l4_proto, &l4_hdr); > > if (hdr->flags == VIRTIO_NET_HDR_F_NEEDS_CSUM) { > > if (hdr->csum_start == (m->l2_len + m->l3_len)) { > > @@ -2356,6 +2354,94 @@ vhost_dequeue_offload(struct virtio_net_hdr > > *hdr, struct rte_mbuf *m) > > } > > } > > > > +static __rte_always_inline void > > +vhost_dequeue_offload(struct virtio_net_hdr *hdr, struct rte_mbuf *m, > > + bool legacy_ol_flags) > > +{ > > + struct rte_net_hdr_lens hdr_lens; > > + int l4_supported = 0; > > + uint32_t ptype; > > + > > + if (hdr->flags == 0 && hdr->gso_type == > > VIRTIO_NET_HDR_GSO_NONE) > > + return; > > + > > + if (legacy_ol_flags) { > > + vhost_dequeue_offload_legacy(hdr, m); > > + return; > > + } > > + > > + m->ol_flags |= PKT_RX_IP_CKSUM_UNKNOWN; > > + > > + ptype = rte_net_get_ptype(m, &hdr_lens, RTE_PTYPE_ALL_MASK); > > + m->packet_type = ptype; > > + if ((ptype & RTE_PTYPE_L4_MASK) == RTE_PTYPE_L4_TCP || > > + (ptype & RTE_PTYPE_L4_MASK) == RTE_PTYPE_L4_UDP || > > + (ptype & RTE_PTYPE_L4_MASK) == RTE_PTYPE_L4_SCTP) > > + l4_supported = 1; > > + > > + /* According to Virtio 1.1 spec, the device only needs to look at > > + * VIRTIO_NET_HDR_F_NEEDS_CSUM in the packet transmission > > path. > > + * This differs from the processing incoming packets path where the > > + * driver could rely on VIRTIO_NET_HDR_F_DATA_VALID flag set by > > the > > + * device. > > + * > > + * 5.1.6.2.1 Driver Requirements: Packet Transmission > > + * The driver MUST NOT set the VIRTIO_NET_HDR_F_DATA_VALID > > and > > + * VIRTIO_NET_HDR_F_RSC_INFO bits in flags. > > + * > > + * 5.1.6.2.2 Device Requirements: Packet Transmission > > + * The device MUST ignore flag bits that it does not recognize. > > + */ > > + if (hdr->flags & VIRTIO_NET_HDR_F_NEEDS_CSUM) { > > + uint32_t hdrlen; > > + > > + hdrlen = hdr_lens.l2_len + hdr_lens.l3_len + hdr_lens.l4_len; > > + if (hdr->csum_start <= hdrlen && l4_supported != 0) { > > + m->ol_flags |= PKT_RX_L4_CKSUM_NONE; > > + } else { > > + /* Unknown proto or tunnel, do sw cksum. We can > > assume > > + * the cksum field is in the first segment since the > > + * buffers we provided to the host are large enough. > > + * In case of SCTP, this will be wrong since it's a CRC > > + * but there's nothing we can do. > > + */ > > + uint16_t csum = 0, off; > > + > > + if (rte_raw_cksum_mbuf(m, hdr->csum_start, > > + rte_pktmbuf_pkt_len(m) - hdr- > > >csum_start, &csum) < 0) > > + return; > > + if (likely(csum != 0xffff)) > > + csum = ~csum; > > + off = hdr->csum_offset + hdr->csum_start; > > + if (rte_pktmbuf_data_len(m) >= off + 1) > > + *rte_pktmbuf_mtod_offset(m, uint16_t *, > > off) = csum; > > + } > > + } > > + > > + if (hdr->gso_type != VIRTIO_NET_HDR_GSO_NONE) { > > + if (hdr->gso_size == 0) > > + return; > > + > > + switch (hdr->gso_type & ~VIRTIO_NET_HDR_GSO_ECN) { > > + case VIRTIO_NET_HDR_GSO_TCPV4: > > + case VIRTIO_NET_HDR_GSO_TCPV6: > > + if ((ptype & RTE_PTYPE_L4_MASK) != > > RTE_PTYPE_L4_TCP) > > + break; > > + m->ol_flags |= PKT_RX_LRO | > > PKT_RX_L4_CKSUM_NONE; > > + m->tso_segsz = hdr->gso_size; > > + break; > > + case VIRTIO_NET_HDR_GSO_UDP: > > + if ((ptype & RTE_PTYPE_L4_MASK) != > > RTE_PTYPE_L4_UDP) > > + break; > > + m->ol_flags |= PKT_RX_LRO | > > PKT_RX_L4_CKSUM_NONE; > > + m->tso_segsz = hdr->gso_size; > > + break; > > + default: > > + break; > > + } > > + } > > +} > > + > > static __rte_noinline void > > copy_vnet_hdr_from_desc(struct virtio_net_hdr *hdr, > > struct buf_vector *buf_vec) > > @@ -2380,7 +2466,8 @@ copy_vnet_hdr_from_desc(struct > virtio_net_hdr > > *hdr, > > static __rte_always_inline int > > copy_desc_to_mbuf(struct virtio_net *dev, struct vhost_virtqueue *vq, > > struct buf_vector *buf_vec, uint16_t nr_vec, > > - struct rte_mbuf *m, struct rte_mempool *mbuf_pool) > > + struct rte_mbuf *m, struct rte_mempool *mbuf_pool, > > + bool legacy_ol_flags) > > { > > uint32_t buf_avail, buf_offset; > > uint64_t buf_addr, buf_len; > > @@ -2513,7 +2600,7 @@ copy_desc_to_mbuf(struct virtio_net *dev, > > struct vhost_virtqueue *vq, > > m->pkt_len += mbuf_offset; > > > > if (hdr) > > - vhost_dequeue_offload(hdr, m); > > + vhost_dequeue_offload(hdr, m, legacy_ol_flags); > > > > out: > > > > @@ -2606,9 +2693,11 @@ virtio_dev_pktmbuf_alloc(struct virtio_net > > *dev, struct rte_mempool *mp, > > return pkt; > > } > > > > -static __rte_noinline uint16_t > > +__rte_always_inline > > +static uint16_t > > virtio_dev_tx_split(struct virtio_net *dev, struct vhost_virtqueue *vq, > > - struct rte_mempool *mbuf_pool, struct rte_mbuf **pkts, uint16_t > > count) > > + struct rte_mempool *mbuf_pool, struct rte_mbuf **pkts, uint16_t > > count, > > + bool legacy_ol_flags) > > { > > uint16_t i; > > uint16_t free_entries; > > @@ -2668,7 +2757,7 @@ virtio_dev_tx_split(struct virtio_net *dev, > struct > > vhost_virtqueue *vq, > > } > > > > err = copy_desc_to_mbuf(dev, vq, buf_vec, nr_vec, pkts[i], > > - mbuf_pool); > > + mbuf_pool, legacy_ol_flags); > > if (unlikely(err)) { > > rte_pktmbuf_free(pkts[i]); > > if (!allocerr_warned) { > > @@ -2696,6 +2785,24 @@ virtio_dev_tx_split(struct virtio_net *dev, > > struct vhost_virtqueue *vq, > > return (i - dropped); > > } > > > > +__rte_noinline > > +static uint16_t > > +virtio_dev_tx_split_legacy(struct virtio_net *dev, > > + struct vhost_virtqueue *vq, struct rte_mempool *mbuf_pool, > > + struct rte_mbuf **pkts, uint16_t count) > > +{ > > + return virtio_dev_tx_split(dev, vq, mbuf_pool, pkts, count, true); > > +} > > + > > +__rte_noinline > > +static uint16_t > > +virtio_dev_tx_split_compliant(struct virtio_net *dev, > > + struct vhost_virtqueue *vq, struct rte_mempool *mbuf_pool, > > + struct rte_mbuf **pkts, uint16_t count) > > +{ > > + return virtio_dev_tx_split(dev, vq, mbuf_pool, pkts, count, false); > > +} > > + > > static __rte_always_inline int > > vhost_reserve_avail_batch_packed(struct virtio_net *dev, > > struct vhost_virtqueue *vq, > > @@ -2770,7 +2877,8 @@ vhost_reserve_avail_batch_packed(struct > > virtio_net *dev, > > static __rte_always_inline int > > virtio_dev_tx_batch_packed(struct virtio_net *dev, > > struct vhost_virtqueue *vq, > > - struct rte_mbuf **pkts) > > + struct rte_mbuf **pkts, > > + bool legacy_ol_flags) > > { > > uint16_t avail_idx = vq->last_avail_idx; > > uint32_t buf_offset = sizeof(struct virtio_net_hdr_mrg_rxbuf); > > @@ -2794,7 +2902,7 @@ virtio_dev_tx_batch_packed(struct virtio_net > > *dev, > > if (virtio_net_with_host_offload(dev)) { > > vhost_for_each_try_unroll(i, 0, PACKED_BATCH_SIZE) { > > hdr = (struct virtio_net_hdr *)(desc_addrs[i]); > > - vhost_dequeue_offload(hdr, pkts[i]); > > + vhost_dequeue_offload(hdr, pkts[i], legacy_ol_flags); > > } > > } > > > > @@ -2815,7 +2923,8 @@ vhost_dequeue_single_packed(struct > virtio_net > > *dev, > > struct rte_mempool *mbuf_pool, > > struct rte_mbuf *pkts, > > uint16_t *buf_id, > > - uint16_t *desc_count) > > + uint16_t *desc_count, > > + bool legacy_ol_flags) > > { > > struct buf_vector buf_vec[BUF_VECTOR_MAX]; > > uint32_t buf_len; > > @@ -2841,7 +2950,7 @@ vhost_dequeue_single_packed(struct > virtio_net > > *dev, > > } > > > > err = copy_desc_to_mbuf(dev, vq, buf_vec, nr_vec, pkts, > > - mbuf_pool); > > + mbuf_pool, legacy_ol_flags); > > if (unlikely(err)) { > > if (!allocerr_warned) { > > VHOST_LOG_DATA(ERR, > > @@ -2859,14 +2968,15 @@ static __rte_always_inline int > > virtio_dev_tx_single_packed(struct virtio_net *dev, > > struct vhost_virtqueue *vq, > > struct rte_mempool *mbuf_pool, > > - struct rte_mbuf *pkts) > > + struct rte_mbuf *pkts, > > + bool legacy_ol_flags) > > { > > > > uint16_t buf_id, desc_count = 0; > > int ret; > > > > ret = vhost_dequeue_single_packed(dev, vq, mbuf_pool, pkts, > > &buf_id, > > - &desc_count); > > + &desc_count, legacy_ol_flags); > > > > if (likely(desc_count > 0)) { > > if (virtio_net_is_inorder(dev)) > > @@ -2882,12 +2992,14 @@ virtio_dev_tx_single_packed(struct > virtio_net > > *dev, > > return ret; > > } > > > > -static __rte_noinline uint16_t > > +__rte_always_inline > > +static uint16_t > > virtio_dev_tx_packed(struct virtio_net *dev, > > struct vhost_virtqueue *__rte_restrict vq, > > struct rte_mempool *mbuf_pool, > > struct rte_mbuf **__rte_restrict pkts, > > - uint32_t count) > > + uint32_t count, > > + bool legacy_ol_flags) > > { > > uint32_t pkt_idx = 0; > > > > @@ -2899,14 +3011,16 @@ virtio_dev_tx_packed(struct virtio_net *dev, > > > > if (count - pkt_idx >= PACKED_BATCH_SIZE) { > > if (!virtio_dev_tx_batch_packed(dev, vq, > > - &pkts[pkt_idx])) { > > + &pkts[pkt_idx], > > + legacy_ol_flags)) { > > pkt_idx += PACKED_BATCH_SIZE; > > continue; > > } > > } > > > > if (virtio_dev_tx_single_packed(dev, vq, mbuf_pool, > > - pkts[pkt_idx])) > > + pkts[pkt_idx], > > + legacy_ol_flags)) > > break; > > pkt_idx++; > > } while (pkt_idx < count); > > @@ -2924,6 +3038,24 @@ virtio_dev_tx_packed(struct virtio_net *dev, > > return pkt_idx; > > } > > > > +__rte_noinline > > +static uint16_t > > +virtio_dev_tx_packed_legacy(struct virtio_net *dev, > > + struct vhost_virtqueue *__rte_restrict vq, struct rte_mempool > > *mbuf_pool, > > + struct rte_mbuf **__rte_restrict pkts, uint32_t count) > > +{ > > + return virtio_dev_tx_packed(dev, vq, mbuf_pool, pkts, count, true); > > +} > > + > > +__rte_noinline > > +static uint16_t > > +virtio_dev_tx_packed_compliant(struct virtio_net *dev, > > + struct vhost_virtqueue *__rte_restrict vq, struct rte_mempool > > *mbuf_pool, > > + struct rte_mbuf **__rte_restrict pkts, uint32_t count) > > +{ > > + return virtio_dev_tx_packed(dev, vq, mbuf_pool, pkts, count, false); > > +} > > + > > uint16_t > > rte_vhost_dequeue_burst(int vid, uint16_t queue_id, > > struct rte_mempool *mbuf_pool, struct rte_mbuf **pkts, uint16_t > > count) > > @@ -2999,10 +3131,17 @@ rte_vhost_dequeue_burst(int vid, uint16_t > > queue_id, > > count -= 1; > > } > > > > - if (vq_is_packed(dev)) > > - count = virtio_dev_tx_packed(dev, vq, mbuf_pool, pkts, > > count); > > - else > > - count = virtio_dev_tx_split(dev, vq, mbuf_pool, pkts, count); > > + if (vq_is_packed(dev)) { > > + if (dev->flags & VIRTIO_DEV_LEGACY_OL_FLAGS) > > + count = virtio_dev_tx_packed_legacy(dev, vq, > > mbuf_pool, pkts, count); > > + else > > + count = virtio_dev_tx_packed_compliant(dev, vq, > > mbuf_pool, pkts, count); > > + } else { > > + if (dev->flags & VIRTIO_DEV_LEGACY_OL_FLAGS) > > + count = virtio_dev_tx_split_legacy(dev, vq, > > mbuf_pool, pkts, count); > > + else > > + count = virtio_dev_tx_split_compliant(dev, vq, > > mbuf_pool, pkts, count); > > + } > > > > out: > > if (dev->features & (1ULL << VIRTIO_F_IOMMU_PLATFORM)) > > -- > > 2.23.0 ^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: [dpdk-stable] [dpdk-dev] [PATCH v4 3/3] vhost: fix offload flags in Rx path 2021-05-12 3:29 ` Wang, Yinan @ 2021-05-12 15:20 ` David Marchand 2021-05-13 6:34 ` Wang, Yinan 0 siblings, 1 reply; 7+ messages in thread From: David Marchand @ 2021-05-12 15:20 UTC (permalink / raw) To: Wang, Yinan Cc: dev, maxime.coquelin, olivier.matz, fbl, i.maximets, Xia, Chenbo, Stokes, Ian, stable, Jijiang Liu, Yuanhan Liu On Wed, May 12, 2021 at 5:30 AM Wang, Yinan <yinan.wang@intel.com> wrote: > > Hi David, > > Since vhost tx offload can’t work now, we report a Bugzilla as below, could you help to take a look? > https://bugs.dpdk.org/show_bug.cgi?id=702 (I discovered your mail from 05/08 only today, now that I got a new mail, might be a pebcak from me, sorry...) - Looking at the bz, there is a first issue/misconception. testpmd only does TSO or any kind of tx offloading with the csum forward engine. The iofwd engine won't make TSO possible. - Let's say we use the csum fwd engine, testpmd configures drivers through the ethdev API. The ethdev API states that no offloading is enabled unless requested by the application. TSO, l3/l4 checksums offloading are documented as: https://doc.dpdk.org/guides/nics/features.html#l3-checksum-offload https://doc.dpdk.org/guides/nics/features.html#lro But the vhost pmd does not report such capabilities. https://git.dpdk.org/dpdk/tree/drivers/net/vhost/rte_eth_vhost.c#n1276 So we can't expect testpmd to have tso working with net/vhost pmd. - The csum offloading engine swaps mac addresses. I would expect issues with inter vm traffic. In summary, I think this is a bad test. If it worked with the commands in the bugzilla before my change (which I doubt), it was wrong. > We also tried vhost example with VM2VM iperf test, large pkts also can't forwarding. "large pkts", can you give details? I tried to use this example, without/with my change, but: When I try to start this example with a physical port and two vhosts, I get a crash (division by 0 on vdmq stuff). When I start it without a physical port, I get a complaint about no port being enabled. Passing a portmask 0x1 seems to work, the example starts but, next, no traffic is forwarded (not even arp). Hooking gdb, I never get packet dequeued from vhost. -- David Marchand ^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: [dpdk-stable] [dpdk-dev] [PATCH v4 3/3] vhost: fix offload flags in Rx path 2021-05-12 15:20 ` David Marchand @ 2021-05-13 6:34 ` Wang, Yinan 0 siblings, 0 replies; 7+ messages in thread From: Wang, Yinan @ 2021-05-13 6:34 UTC (permalink / raw) To: David Marchand Cc: dev, maxime.coquelin, olivier.matz, fbl, i.maximets, Xia, Chenbo, Stokes, Ian, stable, Jijiang Liu, Yuanhan Liu > -----Original Message----- > From: David Marchand <david.marchand@redhat.com> > Sent: 2021年5月12日 23:20 > To: Wang, Yinan <yinan.wang@intel.com> > Cc: dev@dpdk.org; maxime.coquelin@redhat.com; > olivier.matz@6wind.com; fbl@sysclose.org; i.maximets@ovn.org; Xia, > Chenbo <chenbo.xia@intel.com>; Stokes, Ian <ian.stokes@intel.com>; > stable@dpdk.org; Jijiang Liu <jijiang.liu@intel.com>; Yuanhan Liu > <yuanhan.liu@linux.intel.com> > Subject: Re: [dpdk-dev] [PATCH v4 3/3] vhost: fix offload flags in Rx path > > On Wed, May 12, 2021 at 5:30 AM Wang, Yinan <yinan.wang@intel.com> > wrote: > > > > Hi David, > > > > Since vhost tx offload can’t work now, we report a Bugzilla as below, could > you help to take a look? > > https://bugs.dpdk.org/show_bug.cgi?id=702 > > (I discovered your mail from 05/08 only today, now that I got a new > mail, might be a pebcak from me, sorry...) > > > - Looking at the bz, there is a first issue/misconception. > testpmd only does TSO or any kind of tx offloading with the csum forward > engine. > The iofwd engine won't make TSO possible. > > > - Let's say we use the csum fwd engine, testpmd configures drivers > through the ethdev API. > The ethdev API states that no offloading is enabled unless requested > by the application. > TSO, l3/l4 checksums offloading are documented as: > https://doc.dpdk.org/guides/nics/features.html#l3-checksum-offload > https://doc.dpdk.org/guides/nics/features.html#lro > > But the vhost pmd does not report such capabilities. > https://git.dpdk.org/dpdk/tree/drivers/net/vhost/rte_eth_vhost.c#n1276 > > So we can't expect testpmd to have tso working with net/vhost pmd. > > > - The csum offloading engine swaps mac addresses. > I would expect issues with inter vm traffic. > > > In summary, I think this is a bad test. > If it worked with the commands in the bugzilla before my change (which > I doubt), it was wrong. Thanks your kindly explanation. Before this patch, vhost can declare tso offload, if we configure TSO/csum in Qemu, tso offload flags can be marked, such vm2vm can fwd large pkts (64k when using iperf) with iofwd. Now I am understand this case will not work later, we can move to using vswitch. > > > We also tried vhost example with VM2VM iperf test, large pkts also can't > forwarding. > > "large pkts", can you give details? > > I tried to use this example, without/with my change, but: > > When I try to start this example with a physical port and two vhosts, > I get a crash (division by 0 on vdmq stuff). > When I start it without a physical port, I get a complaint about no > port being enabled. > Passing a portmask 0x1 seems to work, the example starts but, next, no > traffic is forwarded (not even arp). > Hooking gdb, I never get packet dequeued from vhost. I re-test with vswitch, vm2vm iperf test can work w/ and w/o this patch. Sorry for the wrong result about vhost example before. There are some special configuration in vswitch sample. Test steps can work as below: 1. Modify the testpmd code as following:: --- a/examples/vhost/main.c +++ b/examples/vhost/main.c @@ -29,7 +29,7 @@ #include "main.h" #ifndef MAX_QUEUES -#define MAX_QUEUES 128 +#define MAX_QUEUES 512 #endif /* the maximum number of external ports supported */ 2. Bind one physical ports to vfio-pci, launch dpdk-vhost by below command:: ./dpdk-vhost -l 26-28 -n 4 -- -p 0x1 --mergeable 1 --vm2vm 1 --socket-file /tmp/vhost-net0 --socket-file /tmp/vhost-net1 3. Start VM0:: /home/qemu-install/qemu-4.2.1/bin/qemu-system-x86_64 -name vm1 -enable-kvm -cpu host -smp 4 -m 4096 \ -object memory-backend-file,id=mem,size=4096M,mem-path=/mnt/huge,share=on \ -numa node,memdev=mem -mem-prealloc -drive file=/home/osimg/ubuntu20-04.img \ -chardev socket,path=/tmp/vm2_qga0.sock,server,nowait,id=vm2_qga0 -device virtio-serial \ -device virtserialport,chardev=vm2_qga0,name=org.qemu.guest_agent.2 -daemonize \ -monitor unix:/tmp/vm2_monitor.sock,server,nowait -device e1000,netdev=nttsip1 \ -netdev user,id=nttsip1,hostfwd=tcp:127.0.0.1:6002-:22 \ -chardev socket,id=char0,path=/tmp/vhost-net0 \ -netdev type=vhost-user,id=netdev0,chardev=char0,vhostforce \ -device virtio-net-pci,netdev=netdev0,mac=52:54:00:00:00:01,disable-modern=true,mrg_rxbuf=off,csum=on,guest_csum=on,host_tso4=on,guest_tso4=on,guest_ecn=on -vnc :10 4. Start VM1:: /home/qemu-install/qemu-4.2.1/bin/qemu-system-x86_64 -name vm2 -enable-kvm -cpu host -smp 4 -m 4096 \ -object memory-backend-file,id=mem,size=4096M,mem-path=/mnt/huge,share=on \ -numa node,memdev=mem -mem-prealloc -drive file=/home/osimg/ubuntu20-04-2.img \ -chardev socket,path=/tmp/vm2_qga0.sock,server,nowait,id=vm2_qga0 -device virtio-serial \ -device virtserialport,chardev=vm2_qga0,name=org.qemu.guest_agent.2 -daemonize \ -monitor unix:/tmp/vm2_monitor.sock,server,nowait -device e1000,netdev=nttsip1 \ -netdev user,id=nttsip1,hostfwd=tcp:127.0.0.1:6003-:22 \ -chardev socket,id=char0,path=/tmp/vhost-net1 \ -netdev type=vhost-user,id=netdev0,chardev=char0,vhostforce \ -device virtio-net-pci,netdev=netdev0,mac=52:54:00:00:00:02,disable-modern=true,mrg_rxbuf=off,csum=on,guest_csum=on,host_tso4=on,guest_tso4=on,guest_ecn=on -vnc :12 5. On VM1, set virtio device IP and run arp protocal:: ifconfig ens5 1.1.1.2 arp -s 1.1.1.8 52:54:00:00:00:02 6. On VM2, set virtio device IP and run arp protocal:: ifconfig ens5 1.1.1.8 arp -s 1.1.1.2 52:54:00:00:00:01 7. Check the iperf performance with different packet size between two VMs by below commands:: Under VM1, run: `iperf -s -i 1` Under VM2, run: `iperf -c 1.1.1.2 -i 1 -t 60` > > > -- > David Marchand ^ permalink raw reply [flat|nested] 7+ messages in thread
end of thread, other threads:[~2021-05-13 6:34 UTC | newest] Thread overview: 7+ messages (download: mbox.gz / follow: Atom feed) -- links below jump to the message on this page -- [not found] <20210401095243.18211-1-david.marchand@redhat.com> [not found] ` <20210503132646.16076-1-david.marchand@redhat.com> 2021-05-03 13:26 ` [dpdk-stable] [PATCH v3 4/4] vhost: fix offload flags in Rx path David Marchand [not found] ` <20210503164344.27916-1-david.marchand@redhat.com> 2021-05-03 16:43 ` [dpdk-stable] [PATCH v4 3/3] " David Marchand 2021-05-04 11:07 ` Flavio Leitner 2021-05-08 6:24 ` [dpdk-stable] [dpdk-dev] " Wang, Yinan 2021-05-12 3:29 ` Wang, Yinan 2021-05-12 15:20 ` David Marchand 2021-05-13 6:34 ` Wang, Yinan
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for NNTP newsgroup(s).