From: "Xia, Chenbo" <chenbo.xia@intel.com>
To: Maxime Coquelin <maxime.coquelin@redhat.com>,
"dev@dpdk.org" <dev@dpdk.org>,
"Stokes, Ian" <ian.stokes@intel.com>
Subject: Re: [dpdk-dev] [PATCH v2 4/4] vhost: remove dequeue zero-copy support
Date: Mon, 28 Sep 2020 11:14:21 +0000
Message-ID: <BY5PR11MB4055BC75E3DC1AC10DC8FCCE9C350@BY5PR11MB4055.namprd11.prod.outlook.com>
In-Reply-To: <20200928091712.7946-5-maxime.coquelin@redhat.com>
> -----Original Message-----
> From: Maxime Coquelin <maxime.coquelin@redhat.com>
> Sent: Monday, September 28, 2020 5:17 PM
> To: dev@dpdk.org; Xia, Chenbo <chenbo.xia@intel.com>; Stokes, Ian
> <ian.stokes@intel.com>
> Cc: Maxime Coquelin <maxime.coquelin@redhat.com>
> Subject: [PATCH v2 4/4] vhost: remove dequeue zero-copy support
>
> Dequeue zero-copy removal was announced in DPDK v20.08.
> This feature brings constraints which make the maintenance
> of the Vhost library difficult. Its limitations also make it
> difficult for applications to use (Tx vring starvation).
>
> Removing it makes it easier to add new features, and also removes
> some code from the hot path, which should bring a performance
> improvement for the standard path.
>
> Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com>
> ---
> doc/guides/prog_guide/vhost_lib.rst | 52 +----
> lib/librte_vhost/rte_vhost.h | 2 +-
> lib/librte_vhost/socket.c | 47 ----
> lib/librte_vhost/vhost.c | 14 --
> lib/librte_vhost/vhost.h | 28 ---
> lib/librte_vhost/vhost_user.c | 80 +------
> lib/librte_vhost/virtio_net.c | 326 +++-------------------------
> 7 files changed, 33 insertions(+), 516 deletions(-)
>
> diff --git a/doc/guides/prog_guide/vhost_lib.rst b/doc/guides/prog_guide/vhost_lib.rst
> index b892eec67a..ba4c62aeb8 100644
> --- a/doc/guides/prog_guide/vhost_lib.rst
> +++ b/doc/guides/prog_guide/vhost_lib.rst
> @@ -51,50 +51,6 @@ The following is an overview of some key Vhost API functions:
> This reconnect option is enabled by default. However, it can be turned off
> by setting this flag.
>
> - - ``RTE_VHOST_USER_DEQUEUE_ZERO_COPY``
> -
> - Dequeue zero copy will be enabled when this flag is set. It is disabled by
> - default.
> -
> - There are some truths (including limitations) you might want to know while
> - setting this flag:
> -
> - * zero copy is not good for small packets (typically for packet size below
> - 512).
> -
> - * zero copy is really good for VM2VM case. For iperf between two VMs, the
> - boost could be above 70% (when TSO is enabled).
> -
> - * For zero copy in VM2NIC case, guest Tx used vring may be starved if the
> - PMD driver consume the mbuf but not release them timely.
> -
> - For example, i40e driver has an optimization to maximum NIC pipeline which
> - postpones returning transmitted mbuf until only tx_free_threshold free
> - descs left. The virtio TX used ring will be starved if the formula
> - (num_i40e_tx_desc - num_virtio_tx_desc > tx_free_threshold) is true, since
> - i40e will not return back mbuf.
> -
> - A performance tip for tuning zero copy in VM2NIC case is to adjust the
> - frequency of mbuf free (i.e. adjust tx_free_threshold of i40e driver) to
> - balance consumer and producer.
> -
> - * Guest memory should be backended with huge pages to achieve better
> - performance. Using 1G page size is the best.
> -
> - When dequeue zero copy is enabled, the guest phys address and host phys
> - address mapping has to be established. Using non-huge pages means far
> - more page segments. To make it simple, DPDK vhost does a linear search
> - of those segments, thus the fewer the segments, the quicker we will get
> - the mapping. NOTE: we may speed it by using tree searching in future.
> -
> - * zero copy can not work when using vfio-pci with iommu mode currently, this
> - is because we don't setup iommu dma mapping for guest memory. If you have
> - to use vfio-pci driver, please insert vfio-pci kernel module in noiommu
> - mode.
> -
> - * The consumer of zero copy mbufs should consume these mbufs as soon as
> - possible, otherwise it may block the operations in vhost.
> -
> - ``RTE_VHOST_USER_IOMMU_SUPPORT``
>
> IOMMU support will be enabled when this flag is set. It is disabled by
> @@ -362,16 +318,16 @@ Guest memory requirement
>
> * Memory pre-allocation
>
> - For non-zerocopy non-async data path, guest memory pre-allocation is not a
> + For non-async data path, guest memory pre-allocation is not a
> must. This can help save of memory. If users really want the guest memory
> to be pre-allocated (e.g., for performance reason), we can add option
> ``-mem-prealloc`` when starting QEMU. Or, we can lock all memory at vhost
> side which will force memory to be allocated when mmap at vhost side;
> option --mlockall in ovs-dpdk is an example in hand.
>
> - For async and zerocopy data path, we force the VM memory to be
> - pre-allocated at vhost lib when mapping the guest memory; and also we need
> - to lock the memory to prevent pages being swapped out to disk.
> + For async data path, we force the VM memory to be pre-allocated at vhost
> + lib when mapping the guest memory; and also we need to lock the memory to
> + prevent pages being swapped out to disk.
>
> * Memory sharing
>
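The updated pre-allocation paragraph reads fine. For readers wondering what
"lock all memory at vhost side" looks like in practice, here is a minimal
sketch in plain POSIX (illustrative only, not the actual OVS --mlockall
implementation):

    #include <stdio.h>
    #include <sys/mman.h>

    /* Lock current and future mappings: guest memory then faults in when
     * the vhost library mmap()s it and can no longer be swapped out. */
    static void lock_process_memory(void)
    {
            if (mlockall(MCL_CURRENT | MCL_FUTURE) != 0)
                    perror("mlockall");
    }
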
> diff --git a/lib/librte_vhost/rte_vhost.h b/lib/librte_vhost/rte_vhost.h
> index a94c84134d..46019df6fe 100644
> --- a/lib/librte_vhost/rte_vhost.h
> +++ b/lib/librte_vhost/rte_vhost.h
> @@ -28,7 +28,7 @@ extern "C" {
>
> #define RTE_VHOST_USER_CLIENT (1ULL << 0)
> #define RTE_VHOST_USER_NO_RECONNECT (1ULL << 1)
> -#define RTE_VHOST_USER_DEQUEUE_ZERO_COPY (1ULL << 2)
> +#define RTE_VHOST_USER_RESERVED_1 (1ULL << 2)
> #define RTE_VHOST_USER_IOMMU_SUPPORT (1ULL << 3)
> #define RTE_VHOST_USER_POSTCOPY_SUPPORT (1ULL << 4)
> /* support mbuf with external buffer attached */
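Keeping bit 2 reserved instead of recycling it looks right: an application
still passing the old value cannot silently enable an unrelated feature.
For application authors following this thread, registration now simply drops
the flag and dequeue always copies into mbufs. A hypothetical snippet (the
socket path and remaining flags are only for illustration):

    #include <stdio.h>
    #include <rte_vhost.h>

    /* Before this patch, flags could include
     * RTE_VHOST_USER_DEQUEUE_ZERO_COPY; the bit is now reserved and
     * must not be set. */
    uint64_t flags = RTE_VHOST_USER_CLIENT;

    if (rte_vhost_driver_register("/tmp/vhost-user.sock", flags) != 0)
            fprintf(stderr, "vhost socket registration failed\n");
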
> diff --git a/lib/librte_vhost/socket.c b/lib/librte_vhost/socket.c
> index 73e1dca95e..0169d36481 100644
> --- a/lib/librte_vhost/socket.c
> +++ b/lib/librte_vhost/socket.c
> @@ -37,7 +37,6 @@ struct vhost_user_socket {
> struct sockaddr_un un;
> bool is_server;
> bool reconnect;
> - bool dequeue_zero_copy;
> bool iommu_support;
> bool use_builtin_virtio_net;
> bool extbuf;
> @@ -229,9 +228,6 @@ vhost_user_add_connection(int fd, struct vhost_user_socket *vsocket)
>
> vhost_attach_vdpa_device(vid, vsocket->vdpa_dev);
>
> - if (vsocket->dequeue_zero_copy)
> - vhost_enable_dequeue_zero_copy(vid);
> -
> if (vsocket->extbuf)
> vhost_enable_extbuf(vid);
>
> @@ -878,18 +874,8 @@ rte_vhost_driver_register(const char *path, uint64_t flags)
> goto out_free;
> }
> vsocket->vdpa_dev = NULL;
> - vsocket->dequeue_zero_copy = flags & RTE_VHOST_USER_DEQUEUE_ZERO_COPY;
> vsocket->extbuf = flags & RTE_VHOST_USER_EXTBUF_SUPPORT;
> vsocket->linearbuf = flags & RTE_VHOST_USER_LINEARBUF_SUPPORT;
> -
> - if (vsocket->dequeue_zero_copy &&
> - (flags & RTE_VHOST_USER_IOMMU_SUPPORT)) {
> - VHOST_LOG_CONFIG(ERR,
> - "error: enabling dequeue zero copy and IOMMU features "
> - "simultaneously is not supported\n");
> - goto out_mutex;
> - }
> -
> vsocket->async_copy = flags & RTE_VHOST_USER_ASYNC_COPY;
>
> if (vsocket->async_copy &&
> @@ -918,39 +904,6 @@ rte_vhost_driver_register(const char *path, uint64_t flags)
> vsocket->features = VIRTIO_NET_SUPPORTED_FEATURES;
> vsocket->protocol_features = VHOST_USER_PROTOCOL_FEATURES;
>
> - /*
> - * Dequeue zero copy can't assure descriptors returned in order.
> - * Also, it requires that the guest memory is populated, which is
> - * not compatible with postcopy.
> - */
> - if (vsocket->dequeue_zero_copy) {
> - if (vsocket->extbuf) {
> - VHOST_LOG_CONFIG(ERR,
> - "error: zero copy is incompatible with external buffers\n");
> - ret = -1;
> - goto out_mutex;
> - }
> - if (vsocket->linearbuf) {
> - VHOST_LOG_CONFIG(ERR,
> - "error: zero copy is incompatible with linear buffers\n");
> - ret = -1;
> - goto out_mutex;
> - }
> - if ((flags & RTE_VHOST_USER_CLIENT) != 0) {
> - VHOST_LOG_CONFIG(ERR,
> - "error: zero copy is incompatible with vhost client mode\n");
> - ret = -1;
> - goto out_mutex;
> - }
> - vsocket->supported_features &= ~(1ULL << VIRTIO_F_IN_ORDER);
> - vsocket->features &= ~(1ULL << VIRTIO_F_IN_ORDER);
> -
> - VHOST_LOG_CONFIG(INFO,
> - "Dequeue zero copy requested, disabling postcopy support\n");
> - vsocket->protocol_features &=
> - ~(1ULL << VHOST_USER_PROTOCOL_F_PAGEFAULT);
> - }
> -
> if (vsocket->async_copy) {
> vsocket->supported_features &= ~(1ULL << VHOST_F_LOG_ALL);
> vsocket->features &= ~(1ULL << VHOST_F_LOG_ALL);
> diff --git a/lib/librte_vhost/vhost.c b/lib/librte_vhost/vhost.c
> index 8f20a0818f..c7cd34e42b 100644
> --- a/lib/librte_vhost/vhost.c
> +++ b/lib/librte_vhost/vhost.c
> @@ -539,8 +539,6 @@ init_vring_queue(struct virtio_net *dev, uint32_t vring_idx)
> vhost_user_iotlb_init(dev, vring_idx);
> /* Backends are set to -1 indicating an inactive device. */
> vq->backend = -1;
> -
> - TAILQ_INIT(&vq->zmbuf_list);
> }
>
> static void
> @@ -704,18 +702,6 @@ vhost_set_ifname(int vid, const char *if_name, unsigned int if_len)
> dev->ifname[sizeof(dev->ifname) - 1] = '\0';
> }
>
> -void
> -vhost_enable_dequeue_zero_copy(int vid)
> -{
> - struct virtio_net *dev = get_device(vid);
> -
> - if (dev == NULL)
> - return;
> -
> - dev->dequeue_zero_copy = 1;
> - VHOST_LOG_CONFIG(INFO, "dequeue zero copy is enabled\n");
> -}
> -
> void
> vhost_set_builtin_virtio_net(int vid, bool enable)
> {
> diff --git a/lib/librte_vhost/vhost.h b/lib/librte_vhost/vhost.h
> index 73a1ed8895..20ccdc9bdc 100644
> --- a/lib/librte_vhost/vhost.h
> +++ b/lib/librte_vhost/vhost.h
> @@ -94,20 +94,6 @@ struct buf_vector {
> uint32_t desc_idx;
> };
>
> -/*
> - * A structure to hold some fields needed in zero copy code path,
> - * mainly for associating an mbuf with the right desc_idx.
> - */
> -struct zcopy_mbuf {
> - struct rte_mbuf *mbuf;
> - uint32_t desc_idx;
> - uint16_t desc_count;
> - uint16_t in_use;
> -
> - TAILQ_ENTRY(zcopy_mbuf) next;
> -};
> -TAILQ_HEAD(zcopy_mbuf_list, zcopy_mbuf);
> -
> /*
> * Structure contains the info for each batched memory copy.
> */
> @@ -185,12 +171,6 @@ struct vhost_virtqueue {
> struct rte_vhost_resubmit_info *resubmit_inflight;
> uint64_t global_counter;
>
> - uint16_t nr_zmbuf;
> - uint16_t zmbuf_size;
> - uint16_t last_zmbuf_idx;
> - struct zcopy_mbuf *zmbufs;
> - struct zcopy_mbuf_list zmbuf_list;
> -
> union {
> struct vring_used_elem *shadow_used_split;
> struct vring_used_elem_packed *shadow_used_packed;
> @@ -380,7 +360,6 @@ struct virtio_net {
> /* to tell if we need broadcast rarp packet */
> int16_t broadcast_rarp;
> uint32_t nr_vring;
> - int dequeue_zero_copy;
> int async_copy;
> int extbuf;
> int linearbuf;
> @@ -718,7 +697,6 @@ int alloc_vring_queue(struct virtio_net *dev, uint32_t vring_idx);
> void vhost_attach_vdpa_device(int vid, struct rte_vdpa_device *dev);
>
> void vhost_set_ifname(int, const char *if_name, unsigned int if_len);
> -void vhost_enable_dequeue_zero_copy(int vid);
> void vhost_set_builtin_virtio_net(int vid, bool enable);
> void vhost_enable_extbuf(int vid);
> void vhost_enable_linearbuf(int vid);
> @@ -898,10 +876,4 @@ mbuf_is_consumed(struct rte_mbuf *m)
> return true;
> }
>
> -static __rte_always_inline void
> -put_zmbuf(struct zcopy_mbuf *zmbuf)
> -{
> - zmbuf->in_use = 0;
> -}
> -
> #endif /* _VHOST_NET_CDEV_H_ */
> diff --git a/lib/librte_vhost/vhost_user.c b/lib/librte_vhost/vhost_user.c
> index 501218e192..3196391c32 100644
> --- a/lib/librte_vhost/vhost_user.c
> +++ b/lib/librte_vhost/vhost_user.c
> @@ -134,47 +134,15 @@ get_blk_size(int fd)
> return ret == -1 ? (uint64_t)-1 : (uint64_t)stat.st_blksize;
> }
>
> -/*
> - * Reclaim all the outstanding zmbufs for a virtqueue.
> - */
> -static void
> -drain_zmbuf_list(struct vhost_virtqueue *vq)
> -{
> - struct zcopy_mbuf *zmbuf, *next;
> -
> - for (zmbuf = TAILQ_FIRST(&vq->zmbuf_list);
> - zmbuf != NULL; zmbuf = next) {
> - next = TAILQ_NEXT(zmbuf, next);
> -
> - while (!mbuf_is_consumed(zmbuf->mbuf))
> - usleep(1000);
> -
> - TAILQ_REMOVE(&vq->zmbuf_list, zmbuf, next);
> - restore_mbuf(zmbuf->mbuf);
> - rte_pktmbuf_free(zmbuf->mbuf);
> - put_zmbuf(zmbuf);
> - vq->nr_zmbuf -= 1;
> - }
> -}
> -
> static void
> free_mem_region(struct virtio_net *dev)
> {
> uint32_t i;
> struct rte_vhost_mem_region *reg;
> - struct vhost_virtqueue *vq;
>
> if (!dev || !dev->mem)
> return;
>
> - if (dev->dequeue_zero_copy) {
> - for (i = 0; i < dev->nr_vring; i++) {
> - vq = dev->virtqueue[i];
> - if (vq)
> - drain_zmbuf_list(vq);
> - }
> - }
> -
> for (i = 0; i < dev->mem->nregions; i++) {
> reg = &dev->mem->regions[i];
> if (reg->host_user_addr) {
> @@ -454,23 +422,6 @@ vhost_user_set_vring_num(struct virtio_net **pdev,
> return RTE_VHOST_MSG_RESULT_ERR;
> }
>
> - if (dev->dequeue_zero_copy) {
> - vq->nr_zmbuf = 0;
> - vq->last_zmbuf_idx = 0;
> - vq->zmbuf_size = vq->size;
> - if (vq->zmbufs)
> - rte_free(vq->zmbufs);
> - vq->zmbufs = rte_zmalloc(NULL, vq->zmbuf_size *
> - sizeof(struct zcopy_mbuf), 0);
> - if (vq->zmbufs == NULL) {
> - VHOST_LOG_CONFIG(WARNING,
> - "failed to allocate mem for zero copy; "
> - "zero copy is force disabled\n");
> - dev->dequeue_zero_copy = 0;
> - }
> - TAILQ_INIT(&vq->zmbuf_list);
> - }
> -
> if (vq_is_packed(dev)) {
> if (vq->shadow_used_packed)
> rte_free(vq->shadow_used_packed);
> @@ -524,7 +475,6 @@ numa_realloc(struct virtio_net *dev, int index)
> int oldnode, newnode;
> struct virtio_net *old_dev;
> struct vhost_virtqueue *old_vq, *vq;
> - struct zcopy_mbuf *new_zmbuf;
> struct vring_used_elem *new_shadow_used_split;
> struct vring_used_elem_packed *new_shadow_used_packed;
> struct batch_copy_elem *new_batch_copy_elems;
> @@ -555,16 +505,6 @@ numa_realloc(struct virtio_net *dev, int index)
> return dev;
>
> memcpy(vq, old_vq, sizeof(*vq));
> - TAILQ_INIT(&vq->zmbuf_list);
> -
> - if (dev->dequeue_zero_copy) {
> - new_zmbuf = rte_malloc_socket(NULL, vq->zmbuf_size *
> - sizeof(struct zcopy_mbuf), 0, newnode);
> - if (new_zmbuf) {
> - rte_free(vq->zmbufs);
> - vq->zmbufs = new_zmbuf;
> - }
> - }
>
> if (vq_is_packed(dev)) {
> new_shadow_used_packed = rte_malloc_socket(NULL,
> @@ -1179,8 +1119,7 @@ vhost_user_set_mem_table(struct virtio_net **pdev, struct VhostUserMsg *msg,
> goto err_mmap;
> }
>
> - populate = (dev->dequeue_zero_copy || dev->async_copy) ?
> - MAP_POPULATE : 0;
> + populate = dev->async_copy ? MAP_POPULATE : 0;
> mmap_addr = mmap(NULL, mmap_size, PROT_READ | PROT_WRITE,
> MAP_SHARED | populate, fd, 0);
>
> @@ -1195,7 +1134,7 @@ vhost_user_set_mem_table(struct virtio_net **pdev, struct VhostUserMsg *msg,
> reg->host_user_addr = (uint64_t)(uintptr_t)mmap_addr +
> mmap_offset;
>
> - if (dev->dequeue_zero_copy || dev->async_copy)
> + if (dev->async_copy)
> if (add_guest_pages(dev, reg, alignment) < 0) {
> VHOST_LOG_CONFIG(ERR,
> "adding guest pages to region %u failed.\n",
> @@ -1933,15 +1872,6 @@ vhost_user_set_vring_kick(struct virtio_net **pdev, struct VhostUserMsg *msg,
> return RTE_VHOST_MSG_RESULT_OK;
> }
>
> -static void
> -free_zmbufs(struct vhost_virtqueue *vq)
> -{
> - drain_zmbuf_list(vq);
> -
> - rte_free(vq->zmbufs);
> - vq->zmbufs = NULL;
> -}
> -
> /*
> * when virtio is stopped, qemu will send us the GET_VRING_BASE message.
> */
> @@ -1996,8 +1926,6 @@ vhost_user_get_vring_base(struct virtio_net **pdev,
>
> vq->signalled_used_valid = false;
>
> - if (dev->dequeue_zero_copy)
> - free_zmbufs(vq);
> if (vq_is_packed(dev)) {
> rte_free(vq->shadow_used_packed);
> vq->shadow_used_packed = NULL;
> @@ -2051,10 +1979,6 @@ vhost_user_set_vring_enable(struct virtio_net **pdev,
> }
> }
>
> - /* On disable, rings have to be stopped being processed. */
> - if (!enable && dev->dequeue_zero_copy)
> - drain_zmbuf_list(dev->virtqueue[index]);
> -
> dev->virtqueue[index]->enabled = enable;
>
> return RTE_VHOST_MSG_RESULT_OK;
> diff --git a/lib/librte_vhost/virtio_net.c b/lib/librte_vhost/virtio_net.c
> index bd9303c8a9..0a0bea1a5a 100644
> --- a/lib/librte_vhost/virtio_net.c
> +++ b/lib/librte_vhost/virtio_net.c
> @@ -1946,7 +1946,7 @@ copy_desc_to_mbuf(struct virtio_net *dev, struct vhost_virtqueue *vq,
> struct rte_mbuf *m, struct rte_mempool *mbuf_pool)
> {
> uint32_t buf_avail, buf_offset;
> - uint64_t buf_addr, buf_iova, buf_len;
> + uint64_t buf_addr, buf_len;
> uint32_t mbuf_avail, mbuf_offset;
> uint32_t cpy_len;
> struct rte_mbuf *cur = m, *prev = m;
> @@ -1958,7 +1958,6 @@ copy_desc_to_mbuf(struct virtio_net *dev, struct vhost_virtqueue *vq,
> int error = 0;
>
> buf_addr = buf_vec[vec_idx].buf_addr;
> - buf_iova = buf_vec[vec_idx].buf_iova;
> buf_len = buf_vec[vec_idx].buf_len;
>
> if (unlikely(buf_len < dev->vhost_hlen && nr_vec <= 1)) {
> @@ -1988,14 +1987,12 @@ copy_desc_to_mbuf(struct virtio_net *dev, struct vhost_virtqueue *vq,
> buf_offset = dev->vhost_hlen - buf_len;
> vec_idx++;
> buf_addr = buf_vec[vec_idx].buf_addr;
> - buf_iova = buf_vec[vec_idx].buf_iova;
> buf_len = buf_vec[vec_idx].buf_len;
> buf_avail = buf_len - buf_offset;
> } else if (buf_len == dev->vhost_hlen) {
> if (unlikely(++vec_idx >= nr_vec))
> goto out;
> buf_addr = buf_vec[vec_idx].buf_addr;
> - buf_iova = buf_vec[vec_idx].buf_iova;
> buf_len = buf_vec[vec_idx].buf_len;
>
> buf_offset = 0;
> @@ -2012,48 +2009,23 @@ copy_desc_to_mbuf(struct virtio_net *dev, struct vhost_virtqueue *vq,
> mbuf_offset = 0;
> mbuf_avail = m->buf_len - RTE_PKTMBUF_HEADROOM;
> while (1) {
> - uint64_t hpa;
> -
> cpy_len = RTE_MIN(buf_avail, mbuf_avail);
>
> - /*
> - * A desc buf might across two host physical pages that are
> - * not continuous. In such case (gpa_to_hpa returns 0), data
> - * will be copied even though zero copy is enabled.
> - */
> - if (unlikely(dev->dequeue_zero_copy && (hpa = gpa_to_hpa(dev,
> - buf_iova + buf_offset, cpy_len)))) {
> - cur->data_len = cpy_len;
> - cur->data_off = 0;
> - cur->buf_addr =
> - (void *)(uintptr_t)(buf_addr + buf_offset);
> - cur->buf_iova = hpa;
> -
> - /*
> - * In zero copy mode, one mbuf can only reference data
> - * for one or partial of one desc buff.
> - */
> - mbuf_avail = cpy_len;
> - } else {
> - if (likely(cpy_len > MAX_BATCH_LEN ||
> - vq->batch_copy_nb_elems >= vq->size ||
> - (hdr && cur == m))) {
> - rte_memcpy(rte_pktmbuf_mtod_offset(cur, void *,
> - mbuf_offset),
> - (void *)((uintptr_t)(buf_addr +
> - buf_offset)),
> - cpy_len);
> - } else {
> - batch_copy[vq->batch_copy_nb_elems].dst =
> - rte_pktmbuf_mtod_offset(cur, void *,
> - mbuf_offset);
> - batch_copy[vq->batch_copy_nb_elems].src =
> + if (likely(cpy_len > MAX_BATCH_LEN ||
> + vq->batch_copy_nb_elems >= vq->size ||
> + (hdr && cur == m))) {
> + rte_memcpy(rte_pktmbuf_mtod_offset(cur, void *,
> + mbuf_offset),
> (void *)((uintptr_t)(buf_addr +
> - buf_offset));
> - batch_copy[vq->batch_copy_nb_elems].len =
> - cpy_len;
> - vq->batch_copy_nb_elems++;
> - }
> + buf_offset)), cpy_len);
> + } else {
> + batch_copy[vq->batch_copy_nb_elems].dst =
> + rte_pktmbuf_mtod_offset(cur, void *,
> + mbuf_offset);
> + batch_copy[vq->batch_copy_nb_elems].src =
> + (void *)((uintptr_t)(buf_addr + buf_offset));
> + batch_copy[vq->batch_copy_nb_elems].len = cpy_len;
> + vq->batch_copy_nb_elems++;
> }
>
> mbuf_avail -= cpy_len;
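The simplification above looks correct: with the zero-copy branch gone, the
dequeue path keeps only the pre-existing batched-copy optimization. For
reviewers skimming the diff, the pattern is roughly the following sketch
(dst/src stand for the mbuf and guest buffer pointers computed earlier; the
other names match the code):

    /* Large copies go through rte_memcpy() immediately; small ones are
     * queued in vq->batch_copy_elems and flushed in one pass later by
     * do_data_copy_dequeue(), amortizing per-copy overhead. */
    if (cpy_len > MAX_BATCH_LEN || vq->batch_copy_nb_elems >= vq->size) {
            rte_memcpy(dst, src, cpy_len);
    } else {
            struct batch_copy_elem *e =
                    &vq->batch_copy_elems[vq->batch_copy_nb_elems++];
            e->dst = dst;
            e->src = src;
            e->len = cpy_len;
    }
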
> @@ -2067,7 +2039,6 @@ copy_desc_to_mbuf(struct virtio_net *dev, struct vhost_virtqueue *vq,
> break;
>
> buf_addr = buf_vec[vec_idx].buf_addr;
> - buf_iova = buf_vec[vec_idx].buf_iova;
> buf_len = buf_vec[vec_idx].buf_len;
>
> buf_offset = 0;
> @@ -2089,8 +2060,6 @@ copy_desc_to_mbuf(struct virtio_net *dev, struct vhost_virtqueue *vq,
> error = -1;
> goto out;
> }
> - if (unlikely(dev->dequeue_zero_copy))
> - rte_mbuf_refcnt_update(cur, 1);
>
> prev->next = cur;
> prev->data_len = mbuf_offset;
> @@ -2114,37 +2083,6 @@ copy_desc_to_mbuf(struct virtio_net *dev, struct vhost_virtqueue *vq,
> return error;
> }
>
> -static __rte_always_inline struct zcopy_mbuf *
> -get_zmbuf(struct vhost_virtqueue *vq)
> -{
> - uint16_t i;
> - uint16_t last;
> - int tries = 0;
> -
> - /* search [last_zmbuf_idx, zmbuf_size) */
> - i = vq->last_zmbuf_idx;
> - last = vq->zmbuf_size;
> -
> -again:
> - for (; i < last; i++) {
> - if (vq->zmbufs[i].in_use == 0) {
> - vq->last_zmbuf_idx = i + 1;
> - vq->zmbufs[i].in_use = 1;
> - return &vq->zmbufs[i];
> - }
> - }
> -
> - tries++;
> - if (tries == 1) {
> - /* search [0, last_zmbuf_idx) */
> - i = 0;
> - last = vq->last_zmbuf_idx;
> - goto again;
> - }
> -
> - return NULL;
> -}
> -
> static void
> virtio_dev_extbuf_free(void *addr __rte_unused, void *opaque)
> {
> @@ -2244,30 +2182,6 @@ virtio_dev_tx_split(struct virtio_net *dev, struct vhost_virtqueue *vq,
> uint16_t dropped = 0;
> static bool allocerr_warned;
>
> - if (unlikely(dev->dequeue_zero_copy)) {
> - struct zcopy_mbuf *zmbuf, *next;
> -
> - for (zmbuf = TAILQ_FIRST(&vq->zmbuf_list);
> - zmbuf != NULL; zmbuf = next) {
> - next = TAILQ_NEXT(zmbuf, next);
> -
> - if (mbuf_is_consumed(zmbuf->mbuf)) {
> - update_shadow_used_ring_split(vq,
> - zmbuf->desc_idx, 0);
> - TAILQ_REMOVE(&vq->zmbuf_list, zmbuf, next);
> - restore_mbuf(zmbuf->mbuf);
> - rte_pktmbuf_free(zmbuf->mbuf);
> - put_zmbuf(zmbuf);
> - vq->nr_zmbuf -= 1;
> - }
> - }
> -
> - if (likely(vq->shadow_used_idx)) {
> - flush_shadow_used_ring_split(dev, vq);
> - vhost_vring_call_split(dev, vq);
> - }
> - }
> -
> /*
> * The ordering between avail index and
> * desc reads needs to be enforced.
> @@ -2300,8 +2214,7 @@ virtio_dev_tx_split(struct virtio_net *dev, struct vhost_virtqueue *vq,
> VHOST_ACCESS_RO) < 0))
> break;
>
> - if (likely(dev->dequeue_zero_copy == 0))
> - update_shadow_used_ring_split(vq, head_idx, 0);
> + update_shadow_used_ring_split(vq, head_idx, 0);
>
> pkts[i] = virtio_dev_pktmbuf_alloc(dev, mbuf_pool, buf_len);
> if (unlikely(pkts[i] == NULL)) {
> @@ -2335,42 +2248,16 @@ virtio_dev_tx_split(struct virtio_net *dev, struct vhost_virtqueue *vq,
> i++;
> break;
> }
> -
> - if (unlikely(dev->dequeue_zero_copy)) {
> - struct zcopy_mbuf *zmbuf;
> -
> - zmbuf = get_zmbuf(vq);
> - if (!zmbuf) {
> - rte_pktmbuf_free(pkts[i]);
> - dropped += 1;
> - i++;
> - break;
> - }
> - zmbuf->mbuf = pkts[i];
> - zmbuf->desc_idx = head_idx;
> -
> - /*
> - * Pin lock the mbuf; we will check later to see
> - * whether the mbuf is freed (when we are the last
> - * user) or not. If that's the case, we then could
> - * update the used ring safely.
> - */
> - rte_mbuf_refcnt_update(pkts[i], 1);
> -
> - vq->nr_zmbuf += 1;
> - TAILQ_INSERT_TAIL(&vq->zmbuf_list, zmbuf, next);
> - }
> }
> +
> vq->last_avail_idx += i;
>
> - if (likely(dev->dequeue_zero_copy == 0)) {
> - do_data_copy_dequeue(vq);
> - if (unlikely(i < count))
> - vq->shadow_used_idx = i;
> - if (likely(vq->shadow_used_idx)) {
> - flush_shadow_used_ring_split(dev, vq);
> - vhost_vring_call_split(dev, vq);
> - }
> + do_data_copy_dequeue(vq);
> + if (unlikely(i < count))
> + vq->shadow_used_idx = i;
> + if (likely(vq->shadow_used_idx)) {
> + flush_shadow_used_ring_split(dev, vq);
> + vhost_vring_call_split(dev, vq);
> }
>
> return (i - dropped);
> @@ -2570,162 +2457,6 @@ virtio_dev_tx_single_packed(struct virtio_net *dev,
> return ret;
> }
>
> -static __rte_always_inline int
> -virtio_dev_tx_batch_packed_zmbuf(struct virtio_net *dev,
> - struct vhost_virtqueue *vq,
> - struct rte_mempool *mbuf_pool,
> - struct rte_mbuf **pkts)
> -{
> - struct zcopy_mbuf *zmbufs[PACKED_BATCH_SIZE];
> - uintptr_t desc_addrs[PACKED_BATCH_SIZE];
> - uint16_t ids[PACKED_BATCH_SIZE];
> - uint16_t i;
> -
> - uint16_t avail_idx = vq->last_avail_idx;
> -
> - if (vhost_reserve_avail_batch_packed(dev, vq, mbuf_pool, pkts,
> - avail_idx, desc_addrs, ids))
> - return -1;
> -
> - vhost_for_each_try_unroll(i, 0, PACKED_BATCH_SIZE)
> - zmbufs[i] = get_zmbuf(vq);
> -
> - vhost_for_each_try_unroll(i, 0, PACKED_BATCH_SIZE) {
> - if (!zmbufs[i])
> - goto free_pkt;
> - }
> -
> - vhost_for_each_try_unroll(i, 0, PACKED_BATCH_SIZE) {
> - zmbufs[i]->mbuf = pkts[i];
> - zmbufs[i]->desc_idx = ids[i];
> - zmbufs[i]->desc_count = 1;
> - }
> -
> - vhost_for_each_try_unroll(i, 0, PACKED_BATCH_SIZE)
> - rte_mbuf_refcnt_update(pkts[i], 1);
> -
> - vhost_for_each_try_unroll(i, 0, PACKED_BATCH_SIZE)
> - TAILQ_INSERT_TAIL(&vq->zmbuf_list, zmbufs[i], next);
> -
> - vq->nr_zmbuf += PACKED_BATCH_SIZE;
> - vq_inc_last_avail_packed(vq, PACKED_BATCH_SIZE);
> -
> - return 0;
> -
> -free_pkt:
> - vhost_for_each_try_unroll(i, 0, PACKED_BATCH_SIZE)
> - rte_pktmbuf_free(pkts[i]);
> -
> - return -1;
> -}
> -
> -static __rte_always_inline int
> -virtio_dev_tx_single_packed_zmbuf(struct virtio_net *dev,
> - struct vhost_virtqueue *vq,
> - struct rte_mempool *mbuf_pool,
> - struct rte_mbuf **pkts)
> -{
> - uint16_t buf_id, desc_count;
> - struct zcopy_mbuf *zmbuf;
> -
> - if (vhost_dequeue_single_packed(dev, vq, mbuf_pool, pkts, &buf_id,
> - &desc_count))
> - return -1;
> -
> - zmbuf = get_zmbuf(vq);
> - if (!zmbuf) {
> - rte_pktmbuf_free(*pkts);
> - return -1;
> - }
> - zmbuf->mbuf = *pkts;
> - zmbuf->desc_idx = buf_id;
> - zmbuf->desc_count = desc_count;
> -
> - rte_mbuf_refcnt_update(*pkts, 1);
> -
> - vq->nr_zmbuf += 1;
> - TAILQ_INSERT_TAIL(&vq->zmbuf_list, zmbuf, next);
> -
> - vq_inc_last_avail_packed(vq, desc_count);
> - return 0;
> -}
> -
> -static __rte_always_inline void
> -free_zmbuf(struct vhost_virtqueue *vq)
> -{
> - struct zcopy_mbuf *next = NULL;
> - struct zcopy_mbuf *zmbuf;
> -
> - for (zmbuf = TAILQ_FIRST(&vq->zmbuf_list);
> - zmbuf != NULL; zmbuf = next) {
> - next = TAILQ_NEXT(zmbuf, next);
> -
> - uint16_t last_used_idx = vq->last_used_idx;
> -
> - if (mbuf_is_consumed(zmbuf->mbuf)) {
> - uint16_t flags;
> - flags = vq->desc_packed[last_used_idx].flags;
> - if (vq->used_wrap_counter) {
> - flags |= VRING_DESC_F_USED;
> - flags |= VRING_DESC_F_AVAIL;
> - } else {
> - flags &= ~VRING_DESC_F_USED;
> - flags &= ~VRING_DESC_F_AVAIL;
> - }
> -
> - vq->desc_packed[last_used_idx].id = zmbuf->desc_idx;
> - vq->desc_packed[last_used_idx].len = 0;
> -
> - rte_smp_wmb();
> - vq->desc_packed[last_used_idx].flags = flags;
> -
> - vq_inc_last_used_packed(vq, zmbuf->desc_count);
> -
> - TAILQ_REMOVE(&vq->zmbuf_list, zmbuf, next);
> - restore_mbuf(zmbuf->mbuf);
> - rte_pktmbuf_free(zmbuf->mbuf);
> - put_zmbuf(zmbuf);
> - vq->nr_zmbuf -= 1;
> - }
> - }
> -}
> -
> -static __rte_noinline uint16_t
> -virtio_dev_tx_packed_zmbuf(struct virtio_net *dev,
> - struct vhost_virtqueue *__rte_restrict vq,
> - struct rte_mempool *mbuf_pool,
> - struct rte_mbuf **__rte_restrict pkts,
> - uint32_t count)
> -{
> - uint32_t pkt_idx = 0;
> - uint32_t remained = count;
> -
> - free_zmbuf(vq);
> -
> - do {
> - if (remained >= PACKED_BATCH_SIZE) {
> - if (!virtio_dev_tx_batch_packed_zmbuf(dev, vq,
> - mbuf_pool, &pkts[pkt_idx])) {
> - pkt_idx += PACKED_BATCH_SIZE;
> - remained -= PACKED_BATCH_SIZE;
> - continue;
> - }
> - }
> -
> - if (virtio_dev_tx_single_packed_zmbuf(dev, vq, mbuf_pool,
> - &pkts[pkt_idx]))
> - break;
> - pkt_idx++;
> - remained--;
> -
> - } while (remained);
> -
> - if (pkt_idx)
> - vhost_vring_call_packed(dev, vq);
> -
> - return pkt_idx;
> -}
> -
> static __rte_noinline uint16_t
> virtio_dev_tx_packed(struct virtio_net *dev,
> struct vhost_virtqueue *__rte_restrict vq,
> @@ -2841,14 +2572,9 @@ rte_vhost_dequeue_burst(int vid, uint16_t queue_id,
> count -= 1;
> }
>
> - if (vq_is_packed(dev)) {
> - if (unlikely(dev->dequeue_zero_copy))
> - count = virtio_dev_tx_packed_zmbuf(dev, vq, mbuf_pool,
> - pkts, count);
> - else
> - count = virtio_dev_tx_packed(dev, vq, mbuf_pool, pkts,
> - count);
> - } else
> + if (vq_is_packed(dev))
> + count = virtio_dev_tx_packed(dev, vq, mbuf_pool, pkts, count);
> + else
> count = virtio_dev_tx_split(dev, vq, mbuf_pool, pkts, count);
>
> out:
> --
> 2.26.2
Reviewed-by: Chenbo Xia <chenbo.xia@intel.com>
Thread overview: 9+ messages
2020-09-28 9:17 [dpdk-dev] [PATCH v2 0/4] " Maxime Coquelin
2020-09-28 9:17 ` [dpdk-dev] [PATCH v2 1/4] net/vhost: " Maxime Coquelin
2020-09-28 11:07 ` Xia, Chenbo
2020-09-28 9:17 ` [dpdk-dev] [PATCH v2 2/4] examples/vhost_crypto: use vhost async-copy flag Maxime Coquelin
2020-09-28 9:17 ` [dpdk-dev] [PATCH v2 3/4] examples/vhost: remove dequeue zero-copy support Maxime Coquelin
2020-09-28 9:17 ` [dpdk-dev] [PATCH v2 4/4] vhost: " Maxime Coquelin
2020-09-28 11:14 ` Xia, Chenbo [this message]
2020-09-30 16:21 ` [dpdk-dev] [PATCH v2 0/4] " Maxime Coquelin
2020-09-30 20:56 ` Ferruh Yigit