From mboxrd@z Thu Jan 1 00:00:00 1970
From: Yuanhan Liu
To: dev@dpdk.org
Cc: Maxime Coquelin, Yuanhan Liu
Date: Fri, 23 Sep 2016 12:13:25 +0800
Message-Id: <1474604007-5221-6-git-send-email-yuanhan.liu@linux.intel.com>
In-Reply-To: <1474604007-5221-1-git-send-email-yuanhan.liu@linux.intel.com>
References: <1471939839-29778-1-git-send-email-yuanhan.liu@linux.intel.com> <1474604007-5221-1-git-send-email-yuanhan.liu@linux.intel.com>
Subject: [dpdk-dev] [PATCH v2 5/7] vhost: add a flag to enable dequeue zero copy

Dequeue zero copy is disabled by default. Add a new flag,
``RTE_VHOST_USER_DEQUEUE_ZERO_COPY``, to explicitly enable it.

Signed-off-by: Yuanhan Liu
---
v2: - update release log
    - doc dequeue zero copy in detail
---
 doc/guides/prog_guide/vhost_lib.rst    | 35 +++++++++++++++++++++++++++++++++-
 doc/guides/rel_notes/release_16_11.rst | 11 +++++++++++
 lib/librte_vhost/rte_virtio_net.h      |  1 +
 lib/librte_vhost/socket.c              |  5 +++++
 lib/librte_vhost/vhost.c               | 10 ++++++++++
 lib/librte_vhost/vhost.h               |  1 +
 6 files changed, 62 insertions(+), 1 deletion(-)

diff --git a/doc/guides/prog_guide/vhost_lib.rst b/doc/guides/prog_guide/vhost_lib.rst
index 6b0c6b2..3fa9dd7 100644
--- a/doc/guides/prog_guide/vhost_lib.rst
+++ b/doc/guides/prog_guide/vhost_lib.rst
@@ -79,7 +79,7 @@ The following is an overview of the Vhost API functions:
   ``/dev/path`` character device file will be created. For vhost-user server
   mode, a Unix domain socket file ``path`` will be created.
 
-  Currently two flags are supported (these are valid for vhost-user only):
+  Currently supported flags are (these are valid for vhost-user only):
 
   - ``RTE_VHOST_USER_CLIENT``
 
@@ -97,6 +97,39 @@ The following is an overview of the Vhost API functions:
     This reconnect option is enabled by default. However, it can be turned
     off by setting this flag.
 
+  - ``RTE_VHOST_USER_DEQUEUE_ZERO_COPY``
+
+    Dequeue zero copy will be enabled when this flag is set. It is disabled by
+    default.
+
+    There are some facts (including limitations) you might want to know while
+    setting this flag:
+
+    * zero copy is not good for small packets (typically for packet size below
+      512).
+
+    * zero copy is really good for VM2VM case. For iperf between two VMs, the
+      boost could be above 70% (when TSO is enabled).
+
+    * for VM2NIC case, the ``nb_tx_desc`` has to be small enough: <= 64 if virtio
+      indirect feature is not enabled and <= 128 if it is enabled.
+
+      This is because when dequeue zero copy is enabled, guest Tx used vring will
+      be updated only when the corresponding mbuf is freed. Thus, the nb_tx_desc
+      has to be small enough so that the PMD driver will run out of available
+      Tx descriptors and free mbufs promptly. Otherwise, guest Tx vring would be
+      starved.
+
+    * Guest memory should be backed by huge pages to achieve better
+      performance. Using 1G page size is the best.
+
+      When dequeue zero copy is enabled, the guest phys address and host phys
+      address mapping has to be established. Using non-huge pages means far
+      more page segments. To make it simple, DPDK vhost does a linear search
+      of those segments, thus the fewer the segments, the quicker we will get
+      the mapping. NOTE: we may speed it up by using radix tree searching in
+      future.
+
 * ``rte_vhost_driver_session_start()``
 
   This function starts the vhost session loop to handle vhost messages. It
diff --git a/doc/guides/rel_notes/release_16_11.rst b/doc/guides/rel_notes/release_16_11.rst
index 66916af..0c5756e 100644
--- a/doc/guides/rel_notes/release_16_11.rst
+++ b/doc/guides/rel_notes/release_16_11.rst
@@ -36,6 +36,17 @@ New Features
 
      This section is a comment. Make sure to start the actual text at the margin.
 
+* **Added vhost-user dequeue zero copy support**
+
+  The copy in dequeue path is saved, which is meant to improve the performance.
+  In the VM2VM case, the boost is quite impressive. The bigger the packet size,
+  the bigger performance boost you may get. However, for VM2NIC case, there
+  are some limitations, and the boost is not as impressive as in the VM2VM case.
+  It may even drop quite a bit for small packets.
+
+  For this reason, this feature is disabled by default. It can be enabled when
+  ``RTE_VHOST_USER_DEQUEUE_ZERO_COPY`` flag is given. Check the vhost section
+  of the programming guide for more information.
 
 Resolved Issues
 ---------------
diff --git a/lib/librte_vhost/rte_virtio_net.h b/lib/librte_vhost/rte_virtio_net.h
index a88aecd..c53ff64 100644
--- a/lib/librte_vhost/rte_virtio_net.h
+++ b/lib/librte_vhost/rte_virtio_net.h
@@ -53,6 +53,7 @@
 
 #define RTE_VHOST_USER_CLIENT		(1ULL << 0)
 #define RTE_VHOST_USER_NO_RECONNECT	(1ULL << 1)
+#define RTE_VHOST_USER_DEQUEUE_ZERO_COPY	(1ULL << 2)
 
 /* Enum for virtqueue management. */
 enum {VIRTIO_RXQ, VIRTIO_TXQ, VIRTIO_QNUM};
diff --git a/lib/librte_vhost/socket.c b/lib/librte_vhost/socket.c
index bf03f84..967cb65 100644
--- a/lib/librte_vhost/socket.c
+++ b/lib/librte_vhost/socket.c
@@ -62,6 +62,7 @@ struct vhost_user_socket {
 	int connfd;
 	bool is_server;
 	bool reconnect;
+	bool dequeue_zero_copy;
 };
 
 struct vhost_user_connection {
@@ -203,6 +204,9 @@ vhost_user_add_connection(int fd, struct vhost_user_socket *vsocket)
 	size = strnlen(vsocket->path, PATH_MAX);
 	vhost_set_ifname(vid, vsocket->path, size);
 
+	if (vsocket->dequeue_zero_copy)
+		vhost_enable_dequeue_zero_copy(vid);
+
 	RTE_LOG(INFO, VHOST_CONFIG, "new device, handle is %d\n", vid);
 
 	vsocket->connfd = fd;
@@ -499,6 +503,7 @@ rte_vhost_driver_register(const char *path, uint64_t flags)
 	memset(vsocket, 0, sizeof(struct vhost_user_socket));
 	vsocket->path = strdup(path);
 	vsocket->connfd = -1;
+	vsocket->dequeue_zero_copy = flags & RTE_VHOST_USER_DEQUEUE_ZERO_COPY;
 
 	if ((flags & RTE_VHOST_USER_CLIENT) != 0) {
 		vsocket->reconnect = !(flags & RTE_VHOST_USER_NO_RECONNECT);
diff --git a/lib/librte_vhost/vhost.c b/lib/librte_vhost/vhost.c
index ab25649..f5f8f92 100644
--- a/lib/librte_vhost/vhost.c
+++ b/lib/librte_vhost/vhost.c
@@ -290,6 +290,16 @@ vhost_set_ifname(int vid, const char *if_name, unsigned int if_len)
 	dev->ifname[sizeof(dev->ifname) - 1] = '\0';
 }
 
+void
+vhost_enable_dequeue_zero_copy(int vid)
+{
+	struct virtio_net *dev = get_device(vid);
+
+	if (dev == NULL)
+		return;
+
+	dev->dequeue_zero_copy = 1;
+}
 
 int
 rte_vhost_get_numa_node(int vid)
diff --git a/lib/librte_vhost/vhost.h b/lib/librte_vhost/vhost.h
index be8a398..53dbf33 100644
--- a/lib/librte_vhost/vhost.h
+++ b/lib/librte_vhost/vhost.h
@@ -278,6 +278,7 @@ void vhost_destroy_device(int);
 int alloc_vring_queue_pair(struct virtio_net *dev, uint32_t qp_idx);
 
 void vhost_set_ifname(int, const char *if_name, unsigned int if_len);
+void vhost_enable_dequeue_zero_copy(int vid);
 
 /*
  * Backend-specific cleanup. Defined by vhost-cuse and vhost-user.
-- 
1.9.0