From mboxrd@z Thu Jan 1 00:00:00 1970
From: Flavio Leitner <fbl@sysclose.org>
To: dev@dpdk.org
Cc: Ilya Maximets, Maxime Coquelin, Shahaf Shuler, David Marchand,
 Tiwei Bie, Obrembski MichalX, Stokes Ian
Date: Fri, 4 Oct 2019 17:10:08 -0300
Message-Id: <20191004201008.3981-1-fbl@sysclose.org>
In-Reply-To: <20191001221935.12140-1-fbl@sysclose.org>
References: <20191001221935.12140-1-fbl@sysclose.org>
Subject: [dpdk-dev] [PATCH v2] vhost: add support for large buffers

The rte_vhost_dequeue_burst() API supports two ways of dequeuing data.
If the data fits into a buffer, then all data is copied and a single
linear buffer is returned. Otherwise, it allocates additional mbufs
and chains them together to return a multi-segment mbuf.

While that covers most use cases, it forces applications that need to
work with larger data sizes to support multi-segment mbufs. The
non-linear characteristic brings complexity and performance
implications to the application.

To resolve the issue, let the host application indicate during
registration whether attaching an external buffer to a pktmbuf is
supported and whether only linear buffers are supported.
Signed-off-by: Flavio Leitner <fbl@sysclose.org>
---
 doc/guides/prog_guide/vhost_lib.rst |  35 ++++++++++
 lib/librte_vhost/rte_vhost.h        |   4 ++
 lib/librte_vhost/socket.c           |  10 +++
 lib/librte_vhost/vhost.c            |  22 ++++++
 lib/librte_vhost/vhost.h            |   4 ++
 lib/librte_vhost/virtio_net.c       | 103 ++++++++++++++++++++++++++--
 6 files changed, 172 insertions(+), 6 deletions(-)

- Changelog:
  V2:
    - Used rte_malloc() instead of another mempool as suggested by Shahaf.
    - Added the documentation section.
    - Using driver registration to negotiate the features.
    - OvS PoC code:
      https://github.com/fleitner/ovs/commit/8fc197c40b1d4fda331686a7b919e9e2b670dda7

diff --git a/doc/guides/prog_guide/vhost_lib.rst b/doc/guides/prog_guide/vhost_lib.rst
index fc3ee4353..07e40e3c5 100644
--- a/doc/guides/prog_guide/vhost_lib.rst
+++ b/doc/guides/prog_guide/vhost_lib.rst
@@ -117,6 +117,41 @@ The following is an overview of some key Vhost API functions:
     Enabling this flag should only be done when the calling application does
     not pre-fault the guest shared memory, otherwise migration would fail.
 
+  - ``RTE_VHOST_USER_LINEARBUF_SUPPORT``
+
+    Enabling this flag forces the vhost dequeue function to only provide
+    linear pktmbufs (no multi-segmented pktmbuf).
+
+    The vhost library by default provides a single pktmbuf for a given
+    packet, but if for some reason the data doesn't fit into a single
+    pktmbuf (e.g., TSO is enabled), the library will allocate additional
+    pktmbufs from the same mempool and chain them together to create a
+    multi-segmented pktmbuf.
+
+    However, the vhost application needs to support the multi-segmented
+    format. If the vhost application does not support that format and
+    requires large buffers to be dequeued, this flag should be enabled to
+    force only linear buffers (see RTE_VHOST_USER_EXTBUF_SUPPORT) or
+    drop the packet.
+
+    It is disabled by default.
+
+  - ``RTE_VHOST_USER_EXTBUF_SUPPORT``
+
+    Enabling this flag allows the vhost dequeue function to allocate and
+    attach an external buffer to a pktmbuf if the pktmbuf doesn't provide
+    enough space to store all the data.
+
+    This is useful when the vhost application wants to support large packets
+    but doesn't want to increase the default mempool object size nor to
+    support multi-segmented mbufs (non-linear). In this case, a fresh buffer
+    is allocated using rte_malloc() which gets attached to a pktmbuf using
+    rte_pktmbuf_attach_extbuf().
+
+    See RTE_VHOST_USER_LINEARBUF_SUPPORT as well to disable multi-segmented
+    mbufs for applications that don't support chained mbufs.
+
+    It is disabled by default.
+
 * ``rte_vhost_driver_set_features(path, features)``
 
   This function sets the feature bits the vhost-user driver supports. The
diff --git a/lib/librte_vhost/rte_vhost.h b/lib/librte_vhost/rte_vhost.h
index 19474bca0..b821b5df4 100644
--- a/lib/librte_vhost/rte_vhost.h
+++ b/lib/librte_vhost/rte_vhost.h
@@ -30,6 +30,10 @@ extern "C" {
 #define RTE_VHOST_USER_DEQUEUE_ZERO_COPY	(1ULL << 2)
 #define RTE_VHOST_USER_IOMMU_SUPPORT		(1ULL << 3)
 #define RTE_VHOST_USER_POSTCOPY_SUPPORT		(1ULL << 4)
+/* support mbuf with external buffer attached */
+#define RTE_VHOST_USER_EXTBUF_SUPPORT		(1ULL << 5)
+/* support only linear buffers (no chained mbufs) */
+#define RTE_VHOST_USER_LINEARBUF_SUPPORT	(1ULL << 6)
 
 /** Protocol features.
  */
 #ifndef VHOST_USER_PROTOCOL_F_MQ
diff --git a/lib/librte_vhost/socket.c b/lib/librte_vhost/socket.c
index 274988c4d..0ba610fda 100644
--- a/lib/librte_vhost/socket.c
+++ b/lib/librte_vhost/socket.c
@@ -40,6 +40,8 @@ struct vhost_user_socket {
 	bool dequeue_zero_copy;
 	bool iommu_support;
 	bool use_builtin_virtio_net;
+	bool extbuf;
+	bool linearbuf;
 
 	/*
 	 * The "supported_features" indicates the feature bits the
@@ -232,6 +234,12 @@ vhost_user_add_connection(int fd, struct vhost_user_socket *vsocket)
 	if (vsocket->dequeue_zero_copy)
 		vhost_enable_dequeue_zero_copy(vid);
 
+	if (vsocket->extbuf)
+		vhost_enable_extbuf(vid);
+
+	if (vsocket->linearbuf)
+		vhost_enable_linearbuf(vid);
+
 	RTE_LOG(INFO, VHOST_CONFIG, "new device, handle is %d\n", vid);
 
 	if (vsocket->notify_ops->new_connection) {
@@ -870,6 +878,8 @@ rte_vhost_driver_register(const char *path, uint64_t flags)
 		goto out_free;
 	}
 	vsocket->dequeue_zero_copy = flags & RTE_VHOST_USER_DEQUEUE_ZERO_COPY;
+	vsocket->extbuf = flags & RTE_VHOST_USER_EXTBUF_SUPPORT;
+	vsocket->linearbuf = flags & RTE_VHOST_USER_LINEARBUF_SUPPORT;
 
 	/*
 	 * Set the supported features correctly for the builtin vhost-user
diff --git a/lib/librte_vhost/vhost.c b/lib/librte_vhost/vhost.c
index cea44df8c..77457f538 100644
--- a/lib/librte_vhost/vhost.c
+++ b/lib/librte_vhost/vhost.c
@@ -605,6 +605,28 @@ vhost_set_builtin_virtio_net(int vid, bool enable)
 		dev->flags &= ~VIRTIO_DEV_BUILTIN_VIRTIO_NET;
 }
 
+void
+vhost_enable_extbuf(int vid)
+{
+	struct virtio_net *dev = get_device(vid);
+
+	if (dev == NULL)
+		return;
+
+	dev->extbuf = 1;
+}
+
+void
+vhost_enable_linearbuf(int vid)
+{
+	struct virtio_net *dev = get_device(vid);
+
+	if (dev == NULL)
+		return;
+
+	dev->linearbuf = 1;
+}
+
 int
 rte_vhost_get_mtu(int vid, uint16_t *mtu)
 {
diff --git a/lib/librte_vhost/vhost.h b/lib/librte_vhost/vhost.h
index 5131a97a3..0346bd118 100644
--- a/lib/librte_vhost/vhost.h
+++ b/lib/librte_vhost/vhost.h
@@ -302,6 +302,8 @@ struct virtio_net {
 	rte_atomic16_t		broadcast_rarp;
 	uint32_t		nr_vring;
 	int			dequeue_zero_copy;
+	int			extbuf;
+	int			linearbuf;
 	struct vhost_virtqueue	*virtqueue[VHOST_MAX_QUEUE_PAIRS * 2];
 #define IF_NAME_SZ (PATH_MAX > IFNAMSIZ ? PATH_MAX : IFNAMSIZ)
 	char			ifname[IF_NAME_SZ];
@@ -476,6 +478,8 @@ void vhost_attach_vdpa_device(int vid, int did);
 void vhost_set_ifname(int, const char *if_name, unsigned int if_len);
 void vhost_enable_dequeue_zero_copy(int vid);
 void vhost_set_builtin_virtio_net(int vid, bool enable);
+void vhost_enable_extbuf(int vid);
+void vhost_enable_linearbuf(int vid);
 
 struct vhost_device_ops const *vhost_driver_callback_get(const char *path);
diff --git a/lib/librte_vhost/virtio_net.c b/lib/librte_vhost/virtio_net.c
index 5b85b832d..fca75161d 100644
--- a/lib/librte_vhost/virtio_net.c
+++ b/lib/librte_vhost/virtio_net.c
@@ -2,6 +2,7 @@
  * Copyright(c) 2010-2016 Intel Corporation
  */
 
+#include <assert.h>
 #include <linux/virtio_net.h>
 #include <stdint.h>
 #include <stdlib.h>
@@ -1289,6 +1290,96 @@ get_zmbuf(struct vhost_virtqueue *vq)
 	return NULL;
 }
 
+static void
+virtio_dev_extbuf_free(void *addr __rte_unused, void *opaque)
+{
+	rte_free(opaque);
+}
+
+static int
+virtio_dev_extbuf_alloc(struct rte_mbuf *pkt, uint16_t size)
+{
+	struct rte_mbuf_ext_shared_info *shinfo;
+	uint16_t buf_len;
+	rte_iova_t iova;
+	void *buf;
+
+	shinfo = NULL;
+	buf_len = size + RTE_PKTMBUF_HEADROOM;
+
+	/* Try to use pkt buffer to store shinfo to reduce the amount of memory
+	 * required, otherwise store shinfo in the new buffer.
+	 */
+	if (rte_pktmbuf_tailroom(pkt) > sizeof(*shinfo))
+		shinfo = rte_pktmbuf_mtod(pkt,
+				struct rte_mbuf_ext_shared_info *);
+	else {
+		if (unlikely(buf_len + sizeof(*shinfo) > UINT16_MAX)) {
+			RTE_LOG(ERR, VHOST_DATA,
+				"buffer size exceeded maximum.\n");
+			return -ENOSPC;
+		}
+
+		buf_len += sizeof(*shinfo);
+	}
+
+	buf = rte_malloc(NULL, buf_len, RTE_CACHE_LINE_SIZE);
+	if (unlikely(buf == NULL)) {
+		RTE_LOG(ERR, VHOST_DATA,
+			"Failed to allocate memory for mbuf.\n");
+		return -ENOMEM;
+	}
+
+	/* initialize shinfo */
+	if (shinfo) {
+		shinfo->free_cb = virtio_dev_extbuf_free;
+		shinfo->fcb_opaque = buf;
+		rte_mbuf_ext_refcnt_set(shinfo, 1);
+	} else {
+		shinfo = rte_pktmbuf_ext_shinfo_init_helper(buf, &buf_len,
+					virtio_dev_extbuf_free, buf);
+		assert(shinfo);
+	}
+
+	iova = rte_malloc_virt2iova(buf);
+	rte_pktmbuf_attach_extbuf(pkt, buf, iova, buf_len, shinfo);
+	rte_pktmbuf_reset_headroom(pkt);
+	assert(pkt->ol_flags == EXT_ATTACHED_MBUF);
+
+	return 0;
+}
+
+/*
+ * Allocate a host supported pktmbuf.
+ */
+static __rte_always_inline struct rte_mbuf *
+virtio_dev_pktmbuf_alloc(struct virtio_net *dev, struct rte_mempool *mp,
+			 uint16_t data_len)
+{
+	struct rte_mbuf *pkt = rte_pktmbuf_alloc(mp);
+
+	if (unlikely(pkt == NULL))
+		return NULL;
+
+	if (rte_pktmbuf_tailroom(pkt) >= data_len)
+		return pkt;
+
+	/* attach an external buffer if supported */
+	if (dev->extbuf && !virtio_dev_extbuf_alloc(pkt, data_len))
+		return pkt;
+
+	/* check if chained buffers are allowed */
+	if (!dev->linearbuf)
+		return pkt;
+
+	/* Data doesn't fit into the buffer and the host supports
+	 * only linear buffers
+	 */
+	rte_pktmbuf_free(pkt);
+
+	return NULL;
+}
+
 static __rte_noinline uint16_t
 virtio_dev_tx_split(struct virtio_net *dev, struct vhost_virtqueue *vq,
 	struct rte_mempool *mbuf_pool, struct rte_mbuf **pkts, uint16_t count)
@@ -1343,21 +1434,21 @@ virtio_dev_tx_split(struct virtio_net *dev, struct vhost_virtqueue *vq,
 	for (i = 0; i < count; i++) {
 		struct buf_vector buf_vec[BUF_VECTOR_MAX];
 		uint16_t head_idx;
-		uint32_t dummy_len;
+		uint32_t buf_len;
 		uint16_t nr_vec = 0;
 		int err;
 
 		if (unlikely(fill_vec_buf_split(dev, vq,
 						vq->last_avail_idx + i,
 						&nr_vec, buf_vec,
-						&head_idx, &dummy_len,
+						&head_idx, &buf_len,
 						VHOST_ACCESS_RO) < 0))
 			break;
 
 		if (likely(dev->dequeue_zero_copy == 0))
 			update_shadow_used_ring_split(vq, head_idx, 0);
 
-		pkts[i] = rte_pktmbuf_alloc(mbuf_pool);
+		pkts[i] = virtio_dev_pktmbuf_alloc(dev, mbuf_pool, buf_len);
 		if (unlikely(pkts[i] == NULL)) {
 			RTE_LOG(ERR, VHOST_DATA,
 				"Failed to allocate memory for mbuf.\n");
@@ -1451,14 +1542,14 @@ virtio_dev_tx_packed(struct virtio_net *dev, struct vhost_virtqueue *vq,
 	for (i = 0; i < count; i++) {
 		struct buf_vector buf_vec[BUF_VECTOR_MAX];
 		uint16_t buf_id;
-		uint32_t dummy_len;
+		uint32_t buf_len;
 		uint16_t desc_count, nr_vec = 0;
 		int err;
 
 		if (unlikely(fill_vec_buf_packed(dev, vq,
 						vq->last_avail_idx, &desc_count,
 						buf_vec, &nr_vec,
-						&buf_id, &dummy_len,
+						&buf_id, &buf_len,
 						VHOST_ACCESS_RO) < 0))
 			break;
 
@@ -1466,7 +1557,7 @@
virtio_dev_tx_packed(struct virtio_net *dev, struct vhost_virtqueue *vq,
 			update_shadow_used_ring_packed(vq, buf_id, 0,
 					desc_count);
 
-		pkts[i] = rte_pktmbuf_alloc(mbuf_pool);
+		pkts[i] = virtio_dev_pktmbuf_alloc(dev, mbuf_pool, buf_len);
 		if (unlikely(pkts[i] == NULL)) {
 			RTE_LOG(ERR, VHOST_DATA,
 				"Failed to allocate memory for mbuf.\n");
-- 
2.20.1