From mboxrd@z Thu Jan  1 00:00:00 1970
From: Tetsuya Mukawa <mukawa@igel.co.jp>
To: dev@dpdk.org
Cc: nakajima.yoshihiro@lab.ntt.co.jp, masutani.hitoshi@lab.ntt.co.jp
Date: Thu, 6 Nov 2014 20:14:27 +0900
Message-Id: <1415272471-3299-4-git-send-email-mukawa@igel.co.jp>
X-Mailer: git-send-email 1.9.1
In-Reply-To: <1415272471-3299-1-git-send-email-mukawa@igel.co.jp>
References: <1415272471-3299-1-git-send-email-mukawa@igel.co.jp>
Subject: [dpdk-dev] [RFC PATCH 3/7] lib/librte_vhost: Add an abstraction layer to interpret messages

This patch adds an abstraction layer for interpreting messages from QEMU. The abstraction layer is needed because vhost-cuse and vhost-user use different message formats.
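For readers new to this layering, here is a minimal, self-contained sketch of the idea, illustrative only and not part of the patch: each message transport provides its own vhost_net_device_ops table, and callers obtain it through a type-keyed getter. The names get_virtio_net_callbacks(), vhost_driver_type_t and VHOST_DRV_CUSE come from the diff below; the trimmed-down struct, the stub callbacks and VHOST_DRV_USER are hypothetical placeholders, since the real ops table in vhost-net.h carries many more callbacks.

/*
 * Sketch: one ops table per transport, selected by driver type.
 * Only two callbacks are modelled here; the real table also covers
 * memory-table, vring and owner handling.
 */
#include <stdio.h>

typedef enum {
	VHOST_DRV_CUSE,	/* ioctl-based vhost-cuse (this patch) */
	VHOST_DRV_USER,	/* hypothetical: socket-based vhost-user */
} vhost_driver_type_t;

struct vhost_net_device_ops {
	int  (*new_device)(int ctx);
	void (*destroy_device)(int ctx);
};

static int  cuse_new_device(int ctx)     { printf("cuse: new device %d\n", ctx); return 0; }
static void cuse_destroy_device(int ctx) { printf("cuse: destroy device %d\n", ctx); }

static const struct vhost_net_device_ops vhost_cuse_device_ops = {
	.new_device     = cuse_new_device,
	.destroy_device = cuse_destroy_device,
};

/* Callers never touch the transport-specific table directly. */
static const struct vhost_net_device_ops *
get_virtio_net_callbacks(vhost_driver_type_t type)
{
	switch (type) {
	case VHOST_DRV_CUSE:
		return &vhost_cuse_device_ops;
	default:
		return NULL;	/* no interpreter registered for this transport */
	}
}

int main(void)
{
	const struct vhost_net_device_ops *ops =
		get_virtio_net_callbacks(VHOST_DRV_CUSE);

	if (ops != NULL)
		ops->new_device(0);
	return 0;
}
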
Signed-off-by: Tetsuya Mukawa --- lib/librte_vhost/vhost-net-cdev.c | 2 +- lib/librte_vhost/vhost-net.h | 3 +- lib/librte_vhost/virtio-net-cdev.c | 492 +++++++++++++++++++++++++++++++++++++ lib/librte_vhost/virtio-net.c | 484 ++---------------------------------- 4 files changed, 517 insertions(+), 464 deletions(-) create mode 100644 lib/librte_vhost/virtio-net-cdev.c diff --git a/lib/librte_vhost/vhost-net-cdev.c b/lib/librte_vhost/vhost-net-cdev.c index 83e1d14..12d0f68 100644 --- a/lib/librte_vhost/vhost-net-cdev.c +++ b/lib/librte_vhost/vhost-net-cdev.c @@ -342,7 +342,7 @@ vhost_cuse_driver_register(struct vhost_driver *drv) cuse_info.dev_info_argv = device_argv; cuse_info.flags = CUSE_UNRESTRICTED_IOCTL; - ops = get_virtio_net_callbacks(); + ops = get_virtio_net_callbacks(drv->type); drv->session = cuse_lowlevel_setup(3, fuse_argv, &cuse_info, &vhost_net_ops, 0, NULL); diff --git a/lib/librte_vhost/vhost-net.h b/lib/librte_vhost/vhost-net.h index 03a5c57..09a99ce 100644 --- a/lib/librte_vhost/vhost-net.h +++ b/lib/librte_vhost/vhost-net.h @@ -109,5 +109,6 @@ struct vhost_net_device_ops { }; -struct vhost_net_device_ops const *get_virtio_net_callbacks(void); +struct vhost_net_device_ops const *get_virtio_net_callbacks( + vhost_driver_type_t type); #endif /* _VHOST_NET_CDEV_H_ */ diff --git a/lib/librte_vhost/virtio-net-cdev.c b/lib/librte_vhost/virtio-net-cdev.c new file mode 100644 index 0000000..f225bf5 --- /dev/null +++ b/lib/librte_vhost/virtio-net-cdev.c @@ -0,0 +1,492 @@ +/*- + * BSD LICENSE + * + * Copyright(c) 2014 IGEL Co.,Ltd. All rights reserved. + * All rights reserved. + * + * Redistribution and use in source and binary forms, with or without + * modification, are permitted provided that the following conditions + * are met: + * + * * Redistributions of source code must retain the above copyright + * notice, this list of conditions and the following disclaimer. + * * Redistributions in binary form must reproduce the above copyright + * notice, this list of conditions and the following disclaimer in + * the documentation and/or other materials provided with the + * distribution. + * * Neither the name of IGEL nor the names of its + * contributors may be used to endorse or promote products derived + * from this software without specific prior written permission. + * + * THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS + * "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT + * LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR + * A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT + * OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, + * SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT + * LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, + * DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY + * THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT + * (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE + * OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. + */ + +#include +#include +#include +#include + +#include + +#include "vhost-net.h" +#include "eventfd_link/eventfd_link.h" + +const char eventfd_cdev[] = "/dev/eventfd-link"; + +/* Line size for reading maps file. */ +const uint32_t BUFSIZE = PATH_MAX; + +/* Size of prot char array in procmap. */ +#define PROT_SZ 5 + +/* Number of elements in procmap struct. 
*/ +#define PROCMAP_SZ 8 + +/* Structure containing information gathered from maps file. */ +struct procmap { + uint64_t va_start; /* Start virtual address in file. */ + uint64_t va_end; /* End virtual address in file. */ + uint64_t len; /* Size of file. */ + uint64_t pgoff; /* Not used. */ + uint32_t maj; /* Not used. */ + uint32_t min; /* Not used. */ + uint32_t ino; /* Not used. */ + char prot[PROT_SZ]; /* Not used. */ + char fname[PATH_MAX];/* File name. */ +}; + +/* + * Locate the file containing QEMU's memory space and map it to our address space. + */ +static int +host_memory_map(struct virtio_net *dev, struct virtio_memory *mem, + pid_t pid, uint64_t addr) +{ + struct dirent *dptr = NULL; + struct procmap procmap; + DIR *dp = NULL; + int fd; + int i; + char memfile[PATH_MAX]; + char mapfile[PATH_MAX]; + char procdir[PATH_MAX]; + char resolved_path[PATH_MAX]; + char *path = NULL; + FILE *fmap; + void *map; + uint8_t found = 0; + char line[BUFSIZE]; + char dlm[] = "- : "; + char *str, *sp, *in[PROCMAP_SZ]; + char *end = NULL; + + /* Path where mem files are located. */ + snprintf(procdir, PATH_MAX, "/proc/%u/fd/", pid); + /* Maps file used to locate mem file. */ + snprintf(mapfile, PATH_MAX, "/proc/%u/maps", pid); + + fmap = fopen(mapfile, "r"); + if (fmap == NULL) { + RTE_LOG(ERR, VHOST_CONFIG, + "(%"PRIu64") Failed to open maps file for pid %d\n", + dev->device_fh, pid); + return -1; + } + + /* Read through maps file until we find out base_address. */ + while (fgets(line, BUFSIZE, fmap) != 0) { + str = line; + errno = 0; + /* Split line in to fields. */ + for (i = 0; i < PROCMAP_SZ; i++) { + in[i] = strtok_r(str, &dlm[i], &sp); + if ((in[i] == NULL) || (errno != 0)) { + fclose(fmap); + return -1; + } + str = NULL; + } + + /* Convert/Copy each field as needed. */ + procmap.va_start = strtoull(in[0], &end, 16); + if ((in[0] == '\0') || (end == NULL) || (*end != '\0') || (errno != 0)) { + fclose(fmap); + return -1; + } + + procmap.va_end = strtoull(in[1], &end, 16); + if ((in[1] == '\0') || (end == NULL) || (*end != '\0') || (errno != 0)) { + fclose(fmap); + return -1; + } + + procmap.pgoff = strtoull(in[3], &end, 16); + if ((in[3] == '\0') || (end == NULL) || (*end != '\0') || (errno != 0)) { + fclose(fmap); + return -1; + } + + procmap.maj = strtoul(in[4], &end, 16); + if ((in[4] == '\0') || (end == NULL) || (*end != '\0') || (errno != 0)) { + fclose(fmap); + return -1; + } + + procmap.min = strtoul(in[5], &end, 16); + if ((in[5] == '\0') || (end == NULL) || (*end != '\0') || (errno != 0)) { + fclose(fmap); + return -1; + } + + procmap.ino = strtoul(in[6], &end, 16); + if ((in[6] == '\0') || (end == NULL) || (*end != '\0') || (errno != 0)) { + fclose(fmap); + return -1; + } + + memcpy(&procmap.prot, in[2], PROT_SZ); + memcpy(&procmap.fname, in[7], PATH_MAX); + + if ((procmap.va_start <= addr) && (procmap.va_end >= addr)) { + procmap.len = procmap.va_end - procmap.va_start; + found = 1; + break; + } + } + fclose(fmap); + + if (!found) { + RTE_LOG(ERR, VHOST_CONFIG, "(%"PRIu64") Failed to find memory file in pid %d maps file\n", dev->device_fh, pid); + return -1; + } + + /* Find the guest memory file among the process fds. */ + dp = opendir(procdir); + if (dp == NULL) { + RTE_LOG(ERR, VHOST_CONFIG, "(%"PRIu64") Cannot open pid %d process directory\n", dev->device_fh, pid); + return -1; + + } + + found = 0; + + /* Read the fd directory contents. 
*/ + while (NULL != (dptr = readdir(dp))) { + snprintf(memfile, PATH_MAX, "/proc/%u/fd/%s", + pid, dptr->d_name); + path = realpath(memfile, resolved_path); + if (path == NULL) { + RTE_LOG(ERR, VHOST_CONFIG, + "(%"PRIu64") Failed to resolve fd directory\n", + dev->device_fh); + closedir(dp); + return -1; + } + if (strncmp(resolved_path, procmap.fname, + strnlen(procmap.fname, PATH_MAX)) == 0) { + found = 1; + break; + } + } + + closedir(dp); + + if (found == 0) { + RTE_LOG(ERR, VHOST_CONFIG, "(%"PRIu64") Failed to find memory file for pid %d\n", dev->device_fh, pid); + return -1; + } + /* Open the shared memory file and map the memory into this process. */ + fd = open(memfile, O_RDWR); + + if (fd == -1) { + RTE_LOG(ERR, VHOST_CONFIG, "(%"PRIu64") Failed to open %s for pid %d\n", dev->device_fh, memfile, pid); + return -1; + } + + map = mmap(0, (size_t)procmap.len, PROT_READ|PROT_WRITE , + MAP_POPULATE|MAP_SHARED, fd, 0); + close(fd); + + if (map == MAP_FAILED) { + RTE_LOG(ERR, VHOST_CONFIG, "(%"PRIu64") Error mapping the file %s for pid %d\n", dev->device_fh, memfile, pid); + return -1; + } + + /* Store the memory address and size in the device data structure */ + mem->mapped_address = (uint64_t)(uintptr_t)map; + mem->mapped_size = procmap.len; + + LOG_DEBUG(VHOST_CONFIG, "(%"PRIu64") Mem File: %s->%s - Size: %llu - VA: %p\n", dev->device_fh, + memfile, resolved_path, (long long unsigned)mem->mapped_size, map); + + return 0; +} + +/* + * Called from CUSE IOCTL: VHOST_SET_MEM_TABLE + * This function creates and populates the memory structure for the device. This includes + * storing offsets used to translate buffer addresses. + */ +static int +cuse_set_mem_table(struct vhost_device_ctx ctx, const void *mem_regions_addr, + uint32_t nregions) +{ + struct virtio_net *dev; + struct vhost_memory_region *mem_regions; + struct virtio_memory *mem; + uint64_t size = offsetof(struct vhost_memory, regions); + uint32_t regionidx, valid_regions; + + dev = get_device(ctx); + if (dev == NULL) + return -1; + + if (dev->mem) { + munmap((void *)(uintptr_t)dev->mem->mapped_address, + (size_t)dev->mem->mapped_size); + free(dev->mem); + } + + /* Malloc the memory structure depending on the number of regions. */ + mem = calloc(1, sizeof(struct virtio_memory) + + (sizeof(struct virtio_memory_regions) * nregions)); + if (mem == NULL) { + RTE_LOG(ERR, VHOST_CONFIG, "(%"PRIu64") Failed to allocate memory for dev->mem.\n", dev->device_fh); + return -1; + } + + mem->nregions = nregions; + + mem_regions = (void *)(uintptr_t) + ((uint64_t)(uintptr_t)mem_regions_addr + size); + + for (regionidx = 0; regionidx < mem->nregions; regionidx++) { + /* Populate the region structure for each region. 
*/ + mem->regions[regionidx].guest_phys_address = + mem_regions[regionidx].guest_phys_addr; + mem->regions[regionidx].guest_phys_address_end = + mem->regions[regionidx].guest_phys_address + + mem_regions[regionidx].memory_size; + mem->regions[regionidx].memory_size = + mem_regions[regionidx].memory_size; + mem->regions[regionidx].userspace_address = + mem_regions[regionidx].userspace_addr; + + LOG_DEBUG(VHOST_CONFIG, "(%"PRIu64") REGION: %u - GPA: %p - QEMU VA: %p - SIZE (%"PRIu64")\n", dev->device_fh, + regionidx, (void *)(uintptr_t)mem->regions[regionidx].guest_phys_address, + (void *)(uintptr_t)mem->regions[regionidx].userspace_address, + mem->regions[regionidx].memory_size); + + /*set the base address mapping*/ + if (mem->regions[regionidx].guest_phys_address == 0x0) { + mem->base_address = mem->regions[regionidx].userspace_address; + /* Map VM memory file */ + if (host_memory_map(dev, mem, ctx.pid, mem->base_address) != 0) { + free(mem); + return -1; + } + } + } + + /* Check that we have a valid base address. */ + if (mem->base_address == 0) { + RTE_LOG(ERR, VHOST_CONFIG, "(%"PRIu64") Failed to find base address of qemu memory file.\n", dev->device_fh); + free(mem); + return -1; + } + + /* Check if all of our regions have valid mappings. Usually one does not exist in the QEMU memory file. */ + valid_regions = mem->nregions; + for (regionidx = 0; regionidx < mem->nregions; regionidx++) { + if ((mem->regions[regionidx].userspace_address < mem->base_address) || + (mem->regions[regionidx].userspace_address > (mem->base_address + mem->mapped_size))) + valid_regions--; + } + + /* If a region does not have a valid mapping we rebuild our memory struct to contain only valid entries. */ + if (valid_regions != mem->nregions) { + LOG_DEBUG(VHOST_CONFIG, "(%"PRIu64") Not all memory regions exist in the QEMU mem file. Re-populating mem structure\n", + dev->device_fh); + + /* Re-populate the memory structure with only valid regions. Invalid regions are over-written with memmove. */ + valid_regions = 0; + + for (regionidx = mem->nregions; 0 != regionidx--;) { + if ((mem->regions[regionidx].userspace_address < mem->base_address) || + (mem->regions[regionidx].userspace_address > (mem->base_address + mem->mapped_size))) { + memmove(&mem->regions[regionidx], &mem->regions[regionidx + 1], + sizeof(struct virtio_memory_regions) * valid_regions); + } else { + valid_regions++; + } + } + } + mem->nregions = valid_regions; + dev->mem = mem; + + /* + * Calculate the address offset for each region. This offset is used to identify the vhost virtual address + * corresponding to a QEMU guest physical address. + */ + for (regionidx = 0; regionidx < dev->mem->nregions; regionidx++) { + dev->mem->regions[regionidx].address_offset = dev->mem->regions[regionidx].userspace_address - dev->mem->base_address + + dev->mem->mapped_address - dev->mem->regions[regionidx].guest_phys_address; + + } + return 0; +} + +/* + * Called from CUSE IOCTL: VHOST_GET_VRING_BASE + * We send the virtio device our available ring last used index. + */ +static int +cuse_get_vring_base(struct vhost_device_ctx ctx, uint32_t index, + struct vhost_vring_state *state) +{ + struct virtio_net *dev; + + dev = get_device(ctx); + if (dev == NULL) + return -1; + + state->index = index; + /* State->index refers to the queue index. The TX queue is 1, RX queue is 0. 
*/ + state->num = dev->virtqueue[state->index]->last_used_idx; + + return 0; +} + +/* + * This function uses the eventfd_link kernel module to copy an eventfd file descriptor + * provided by QEMU in to our process space. + */ +static int +eventfd_copy(struct virtio_net *dev, struct eventfd_copy *eventfd_copy) +{ + int eventfd_link, ret; + + /* Open the character device to the kernel module. */ + eventfd_link = open(eventfd_cdev, O_RDWR); + if (eventfd_link < 0) { + RTE_LOG(ERR, VHOST_CONFIG, "(%"PRIu64") eventfd_link module is not loaded\n", dev->device_fh); + return -1; + } + + /* Call the IOCTL to copy the eventfd. */ + ret = ioctl(eventfd_link, EVENTFD_COPY, eventfd_copy); + close(eventfd_link); + + if (ret < 0) { + RTE_LOG(ERR, VHOST_CONFIG, "(%"PRIu64") EVENTFD_COPY ioctl failed\n", dev->device_fh); + return -1; + } + + + return 0; +} + +/* + * Called from CUSE IOCTL: VHOST_SET_VRING_CALL + * The virtio device sends an eventfd to interrupt the guest. This fd gets copied in + * to our process space. + */ +static int +cuse_set_vring_call(struct vhost_device_ctx ctx, struct vhost_vring_file *file) +{ + struct virtio_net *dev; + struct eventfd_copy eventfd_kick; + struct vhost_virtqueue *vq; + + dev = get_device(ctx); + if (dev == NULL) + return -1; + + /* file->index refers to the queue index. The TX queue is 1, RX queue is 0. */ + vq = dev->virtqueue[file->index]; + + if (vq->kickfd) + close((int)vq->kickfd); + + /* Populate the eventfd_copy structure and call eventfd_copy. */ + vq->kickfd = eventfd(0, EFD_NONBLOCK | EFD_CLOEXEC); + eventfd_kick.source_fd = vq->kickfd; + eventfd_kick.target_fd = file->fd; + eventfd_kick.target_pid = ctx.pid; + + if (eventfd_copy(dev, &eventfd_kick)) + return -1; + + return 0; +} + +/* + * Called from CUSE IOCTL: VHOST_SET_VRING_KICK + * The virtio device sends an eventfd that it can use to notify us. This fd gets copied in + * to our process space. + */ +static int +cuse_set_vring_kick(struct vhost_device_ctx ctx, struct vhost_vring_file *file) +{ + struct virtio_net *dev; + struct eventfd_copy eventfd_call; + struct vhost_virtqueue *vq; + + dev = get_device(ctx); + if (dev == NULL) + return -1; + + /* file->index refers to the queue index. The TX queue is 1, RX queue is 0. */ + vq = dev->virtqueue[file->index]; + + if (vq->callfd) + close((int)vq->callfd); + + /* Populate the eventfd_copy structure and call eventfd_copy. */ + vq->callfd = eventfd(0, EFD_NONBLOCK | EFD_CLOEXEC); + eventfd_call.source_fd = vq->callfd; + eventfd_call.target_fd = file->fd; + eventfd_call.target_pid = ctx.pid; + + if (eventfd_copy(dev, &eventfd_call)) + return -1; + + return 0; +} + +/* + * Function pointers are set for the device operations to allow CUSE to call functions + * when an IOCTL, device_add or device_release is received. 
+ */ +static const struct vhost_net_device_ops vhost_cuse_device_ops = { + .new_device = new_device, + .destroy_device = destroy_device, + + .get_features = get_features, + .set_features = set_features, + + .set_mem_table = cuse_set_mem_table, + + .set_vring_num = set_vring_num, + .set_vring_addr = set_vring_addr, + .set_vring_base = set_vring_base, + .get_vring_base = cuse_get_vring_base, + + .set_vring_kick = cuse_set_vring_kick, + .set_vring_call = cuse_set_vring_call, + + .set_backend = set_backend, + + .set_owner = set_owner, + .reset_owner = reset_owner, +}; diff --git a/lib/librte_vhost/virtio-net.c b/lib/librte_vhost/virtio-net.c index 1dee1d8..985c66b 100644 --- a/lib/librte_vhost/virtio-net.c +++ b/lib/librte_vhost/virtio-net.c @@ -50,7 +50,6 @@ #include #include "vhost-net.h" -#include "eventfd_link/eventfd_link.h" /** * Device linked list structure for configuration. @@ -60,8 +59,6 @@ struct virtio_net_config_ll { struct virtio_net_config_ll *next; /* Next entry on linked list.*/ }; -const char eventfd_cdev[] = "/dev/eventfd-link"; - /* device ops to add/remove device to data core. */ static struct virtio_net_device_ops const *notify_ops; /* Root address of the linked list in the configuration core. */ @@ -71,28 +68,6 @@ static struct virtio_net_config_ll *ll_root; #define VHOST_SUPPORTED_FEATURES (1ULL << VIRTIO_NET_F_MRG_RXBUF) static uint64_t VHOST_FEATURES = VHOST_SUPPORTED_FEATURES; -/* Line size for reading maps file. */ -const uint32_t BUFSIZE = PATH_MAX; - -/* Size of prot char array in procmap. */ -#define PROT_SZ 5 - -/* Number of elements in procmap struct. */ -#define PROCMAP_SZ 8 - -/* Structure containing information gathered from maps file. */ -struct procmap { - uint64_t va_start; /* Start virtual address in file. */ - uint64_t va_end; /* End virtual address in file. */ - uint64_t len; /* Size of file. */ - uint64_t pgoff; /* Not used. */ - uint32_t maj; /* Not used. */ - uint32_t min; /* Not used. */ - uint32_t ino; /* Not used. */ - char prot[PROT_SZ]; /* Not used. */ - char fname[PATH_MAX];/* File name. */ -}; - /* * Converts QEMU virtual address to Vhost virtual address. This function is used * to convert the ring addresses to our address space. @@ -119,173 +94,6 @@ qva_to_vva(struct virtio_net *dev, uint64_t qemu_va) } /* - * Locate the file containing QEMU's memory space and map it to our address space. - */ -static int -host_memory_map(struct virtio_net *dev, struct virtio_memory *mem, - pid_t pid, uint64_t addr) -{ - struct dirent *dptr = NULL; - struct procmap procmap; - DIR *dp = NULL; - int fd; - int i; - char memfile[PATH_MAX]; - char mapfile[PATH_MAX]; - char procdir[PATH_MAX]; - char resolved_path[PATH_MAX]; - char *path = NULL; - FILE *fmap; - void *map; - uint8_t found = 0; - char line[BUFSIZE]; - char dlm[] = "- : "; - char *str, *sp, *in[PROCMAP_SZ]; - char *end = NULL; - - /* Path where mem files are located. */ - snprintf(procdir, PATH_MAX, "/proc/%u/fd/", pid); - /* Maps file used to locate mem file. */ - snprintf(mapfile, PATH_MAX, "/proc/%u/maps", pid); - - fmap = fopen(mapfile, "r"); - if (fmap == NULL) { - RTE_LOG(ERR, VHOST_CONFIG, - "(%"PRIu64") Failed to open maps file for pid %d\n", - dev->device_fh, pid); - return -1; - } - - /* Read through maps file until we find out base_address. */ - while (fgets(line, BUFSIZE, fmap) != 0) { - str = line; - errno = 0; - /* Split line in to fields. 
*/ - for (i = 0; i < PROCMAP_SZ; i++) { - in[i] = strtok_r(str, &dlm[i], &sp); - if ((in[i] == NULL) || (errno != 0)) { - fclose(fmap); - return -1; - } - str = NULL; - } - - /* Convert/Copy each field as needed. */ - procmap.va_start = strtoull(in[0], &end, 16); - if ((in[0] == '\0') || (end == NULL) || (*end != '\0') || (errno != 0)) { - fclose(fmap); - return -1; - } - - procmap.va_end = strtoull(in[1], &end, 16); - if ((in[1] == '\0') || (end == NULL) || (*end != '\0') || (errno != 0)) { - fclose(fmap); - return -1; - } - - procmap.pgoff = strtoull(in[3], &end, 16); - if ((in[3] == '\0') || (end == NULL) || (*end != '\0') || (errno != 0)) { - fclose(fmap); - return -1; - } - - procmap.maj = strtoul(in[4], &end, 16); - if ((in[4] == '\0') || (end == NULL) || (*end != '\0') || (errno != 0)) { - fclose(fmap); - return -1; - } - - procmap.min = strtoul(in[5], &end, 16); - if ((in[5] == '\0') || (end == NULL) || (*end != '\0') || (errno != 0)) { - fclose(fmap); - return -1; - } - - procmap.ino = strtoul(in[6], &end, 16); - if ((in[6] == '\0') || (end == NULL) || (*end != '\0') || (errno != 0)) { - fclose(fmap); - return -1; - } - - memcpy(&procmap.prot, in[2], PROT_SZ); - memcpy(&procmap.fname, in[7], PATH_MAX); - - if ((procmap.va_start <= addr) && (procmap.va_end >= addr)) { - procmap.len = procmap.va_end - procmap.va_start; - found = 1; - break; - } - } - fclose(fmap); - - if (!found) { - RTE_LOG(ERR, VHOST_CONFIG, "(%"PRIu64") Failed to find memory file in pid %d maps file\n", dev->device_fh, pid); - return -1; - } - - /* Find the guest memory file among the process fds. */ - dp = opendir(procdir); - if (dp == NULL) { - RTE_LOG(ERR, VHOST_CONFIG, "(%"PRIu64") Cannot open pid %d process directory\n", dev->device_fh, pid); - return -1; - - } - - found = 0; - - /* Read the fd directory contents. */ - while (NULL != (dptr = readdir(dp))) { - snprintf(memfile, PATH_MAX, "/proc/%u/fd/%s", - pid, dptr->d_name); - path = realpath(memfile, resolved_path); - if (path == NULL) { - RTE_LOG(ERR, VHOST_CONFIG, - "(%"PRIu64") Failed to resolve fd directory\n", - dev->device_fh); - closedir(dp); - return -1; - } - if (strncmp(resolved_path, procmap.fname, - strnlen(procmap.fname, PATH_MAX)) == 0) { - found = 1; - break; - } - } - - closedir(dp); - - if (found == 0) { - RTE_LOG(ERR, VHOST_CONFIG, "(%"PRIu64") Failed to find memory file for pid %d\n", dev->device_fh, pid); - return -1; - } - /* Open the shared memory file and map the memory into this process. */ - fd = open(memfile, O_RDWR); - - if (fd == -1) { - RTE_LOG(ERR, VHOST_CONFIG, "(%"PRIu64") Failed to open %s for pid %d\n", dev->device_fh, memfile, pid); - return -1; - } - - map = mmap(0, (size_t)procmap.len, PROT_READ|PROT_WRITE , - MAP_POPULATE|MAP_SHARED, fd, 0); - close(fd); - - if (map == MAP_FAILED) { - RTE_LOG(ERR, VHOST_CONFIG, "(%"PRIu64") Error mapping the file %s for pid %d\n", dev->device_fh, memfile, pid); - return -1; - } - - /* Store the memory address and size in the device data structure */ - mem->mapped_address = (uint64_t)(uintptr_t)map; - mem->mapped_size = procmap.len; - - LOG_DEBUG(VHOST_CONFIG, "(%"PRIu64") Mem File: %s->%s - Size: %llu - VA: %p\n", dev->device_fh, - memfile, resolved_path, (long long unsigned)mem->mapped_size, map); - - return 0; -} - -/* * Retrieves an entry from the devices configuration linked list. */ static struct virtio_net_config_ll * @@ -439,7 +247,7 @@ init_device(struct virtio_net *dev) } /* - * Function is called from the CUSE open function. 
The device structure is + * Function is called from the open function. The device structure is * initialised and a new entry is added to the device configuration linked * list. */ @@ -492,7 +300,7 @@ new_device(struct vhost_device_ctx ctx) } /* - * Function is called from the CUSE release function. This function will cleanup + * Function is called from the release function. This function will cleanup * the device and remove it from device configuration linked list. */ static void @@ -521,7 +329,7 @@ destroy_device(struct vhost_device_ctx ctx) } /* - * Called from CUSE IOCTL: VHOST_SET_OWNER + * Called from IOCTL: VHOST_SET_OWNER * This function just returns success at the moment unless the device hasn't been initialised. */ static int @@ -537,7 +345,7 @@ set_owner(struct vhost_device_ctx ctx) } /* - * Called from CUSE IOCTL: VHOST_RESET_OWNER + * Called from IOCTL: VHOST_RESET_OWNER */ static int reset_owner(struct vhost_device_ctx ctx) @@ -553,7 +361,7 @@ reset_owner(struct vhost_device_ctx ctx) } /* - * Called from CUSE IOCTL: VHOST_GET_FEATURES + * Called from IOCTL: VHOST_GET_FEATURES * The features that we support are requested. */ static int @@ -571,7 +379,7 @@ get_features(struct vhost_device_ctx ctx, uint64_t *pu) } /* - * Called from CUSE IOCTL: VHOST_SET_FEATURES + * Called from IOCTL: VHOST_SET_FEATURES * We receive the negotiated set of features supported by us and the virtio device. */ static int @@ -605,123 +413,8 @@ set_features(struct vhost_device_ctx ctx, uint64_t *pu) return 0; } - -/* - * Called from CUSE IOCTL: VHOST_SET_MEM_TABLE - * This function creates and populates the memory structure for the device. This includes - * storing offsets used to translate buffer addresses. - */ -static int -set_mem_table(struct vhost_device_ctx ctx, const void *mem_regions_addr, - uint32_t nregions) -{ - struct virtio_net *dev; - struct vhost_memory_region *mem_regions; - struct virtio_memory *mem; - uint64_t size = offsetof(struct vhost_memory, regions); - uint32_t regionidx, valid_regions; - - dev = get_device(ctx); - if (dev == NULL) - return -1; - - if (dev->mem) { - munmap((void *)(uintptr_t)dev->mem->mapped_address, - (size_t)dev->mem->mapped_size); - free(dev->mem); - } - - /* Malloc the memory structure depending on the number of regions. */ - mem = calloc(1, sizeof(struct virtio_memory) + - (sizeof(struct virtio_memory_regions) * nregions)); - if (mem == NULL) { - RTE_LOG(ERR, VHOST_CONFIG, "(%"PRIu64") Failed to allocate memory for dev->mem.\n", dev->device_fh); - return -1; - } - - mem->nregions = nregions; - - mem_regions = (void *)(uintptr_t) - ((uint64_t)(uintptr_t)mem_regions_addr + size); - - for (regionidx = 0; regionidx < mem->nregions; regionidx++) { - /* Populate the region structure for each region. 
*/ - mem->regions[regionidx].guest_phys_address = - mem_regions[regionidx].guest_phys_addr; - mem->regions[regionidx].guest_phys_address_end = - mem->regions[regionidx].guest_phys_address + - mem_regions[regionidx].memory_size; - mem->regions[regionidx].memory_size = - mem_regions[regionidx].memory_size; - mem->regions[regionidx].userspace_address = - mem_regions[regionidx].userspace_addr; - - LOG_DEBUG(VHOST_CONFIG, "(%"PRIu64") REGION: %u - GPA: %p - QEMU VA: %p - SIZE (%"PRIu64")\n", dev->device_fh, - regionidx, (void *)(uintptr_t)mem->regions[regionidx].guest_phys_address, - (void *)(uintptr_t)mem->regions[regionidx].userspace_address, - mem->regions[regionidx].memory_size); - - /*set the base address mapping*/ - if (mem->regions[regionidx].guest_phys_address == 0x0) { - mem->base_address = mem->regions[regionidx].userspace_address; - /* Map VM memory file */ - if (host_memory_map(dev, mem, ctx.pid, mem->base_address) != 0) { - free(mem); - return -1; - } - } - } - - /* Check that we have a valid base address. */ - if (mem->base_address == 0) { - RTE_LOG(ERR, VHOST_CONFIG, "(%"PRIu64") Failed to find base address of qemu memory file.\n", dev->device_fh); - free(mem); - return -1; - } - - /* Check if all of our regions have valid mappings. Usually one does not exist in the QEMU memory file. */ - valid_regions = mem->nregions; - for (regionidx = 0; regionidx < mem->nregions; regionidx++) { - if ((mem->regions[regionidx].userspace_address < mem->base_address) || - (mem->regions[regionidx].userspace_address > (mem->base_address + mem->mapped_size))) - valid_regions--; - } - - /* If a region does not have a valid mapping we rebuild our memory struct to contain only valid entries. */ - if (valid_regions != mem->nregions) { - LOG_DEBUG(VHOST_CONFIG, "(%"PRIu64") Not all memory regions exist in the QEMU mem file. Re-populating mem structure\n", - dev->device_fh); - - /* Re-populate the memory structure with only valid regions. Invalid regions are over-written with memmove. */ - valid_regions = 0; - - for (regionidx = mem->nregions; 0 != regionidx--;) { - if ((mem->regions[regionidx].userspace_address < mem->base_address) || - (mem->regions[regionidx].userspace_address > (mem->base_address + mem->mapped_size))) { - memmove(&mem->regions[regionidx], &mem->regions[regionidx + 1], - sizeof(struct virtio_memory_regions) * valid_regions); - } else { - valid_regions++; - } - } - } - mem->nregions = valid_regions; - dev->mem = mem; - - /* - * Calculate the address offset for each region. This offset is used to identify the vhost virtual address - * corresponding to a QEMU guest physical address. - */ - for (regionidx = 0; regionidx < dev->mem->nregions; regionidx++) { - dev->mem->regions[regionidx].address_offset = dev->mem->regions[regionidx].userspace_address - dev->mem->base_address - + dev->mem->mapped_address - dev->mem->regions[regionidx].guest_phys_address; - - } - return 0; -} - /* - * Called from CUSE IOCTL: VHOST_SET_VRING_NUM + * Called from IOCTL: VHOST_SET_VRING_NUM * The virtio device sends us the size of the descriptor ring. */ static int @@ -740,7 +433,7 @@ set_vring_num(struct vhost_device_ctx ctx, struct vhost_vring_state *state) } /* - * Called from CUSE IOCTL: VHOST_SET_VRING_ADDR + * Called from IOCTL: VHOST_SET_VRING_ADDR * The virtio device sends us the desc, used and avail ring addresses. This function * then converts these to our address space. 
*/ @@ -784,7 +477,7 @@ set_vring_addr(struct vhost_device_ctx ctx, struct vhost_vring_addr *addr) } /* - * Called from CUSE IOCTL: VHOST_SET_VRING_BASE + * Called from IOCTL: VHOST_SET_VRING_BASE * The virtio device sends us the available ring last used index. */ static int @@ -804,125 +497,7 @@ set_vring_base(struct vhost_device_ctx ctx, struct vhost_vring_state *state) } /* - * Called from CUSE IOCTL: VHOST_GET_VRING_BASE - * We send the virtio device our available ring last used index. - */ -static int -get_vring_base(struct vhost_device_ctx ctx, uint32_t index, - struct vhost_vring_state *state) -{ - struct virtio_net *dev; - - dev = get_device(ctx); - if (dev == NULL) - return -1; - - state->index = index; - /* State->index refers to the queue index. The TX queue is 1, RX queue is 0. */ - state->num = dev->virtqueue[state->index]->last_used_idx; - - return 0; -} - -/* - * This function uses the eventfd_link kernel module to copy an eventfd file descriptor - * provided by QEMU in to our process space. - */ -static int -eventfd_copy(struct virtio_net *dev, struct eventfd_copy *eventfd_copy) -{ - int eventfd_link, ret; - - /* Open the character device to the kernel module. */ - eventfd_link = open(eventfd_cdev, O_RDWR); - if (eventfd_link < 0) { - RTE_LOG(ERR, VHOST_CONFIG, "(%"PRIu64") eventfd_link module is not loaded\n", dev->device_fh); - return -1; - } - - /* Call the IOCTL to copy the eventfd. */ - ret = ioctl(eventfd_link, EVENTFD_COPY, eventfd_copy); - close(eventfd_link); - - if (ret < 0) { - RTE_LOG(ERR, VHOST_CONFIG, "(%"PRIu64") EVENTFD_COPY ioctl failed\n", dev->device_fh); - return -1; - } - - - return 0; -} - -/* - * Called from CUSE IOCTL: VHOST_SET_VRING_CALL - * The virtio device sends an eventfd to interrupt the guest. This fd gets copied in - * to our process space. - */ -static int -set_vring_call(struct vhost_device_ctx ctx, struct vhost_vring_file *file) -{ - struct virtio_net *dev; - struct eventfd_copy eventfd_kick; - struct vhost_virtqueue *vq; - - dev = get_device(ctx); - if (dev == NULL) - return -1; - - /* file->index refers to the queue index. The TX queue is 1, RX queue is 0. */ - vq = dev->virtqueue[file->index]; - - if (vq->kickfd) - close((int)vq->kickfd); - - /* Populate the eventfd_copy structure and call eventfd_copy. */ - vq->kickfd = eventfd(0, EFD_NONBLOCK | EFD_CLOEXEC); - eventfd_kick.source_fd = vq->kickfd; - eventfd_kick.target_fd = file->fd; - eventfd_kick.target_pid = ctx.pid; - - if (eventfd_copy(dev, &eventfd_kick)) - return -1; - - return 0; -} - -/* - * Called from CUSE IOCTL: VHOST_SET_VRING_KICK - * The virtio device sends an eventfd that it can use to notify us. This fd gets copied in - * to our process space. - */ -static int -set_vring_kick(struct vhost_device_ctx ctx, struct vhost_vring_file *file) -{ - struct virtio_net *dev; - struct eventfd_copy eventfd_call; - struct vhost_virtqueue *vq; - - dev = get_device(ctx); - if (dev == NULL) - return -1; - - /* file->index refers to the queue index. The TX queue is 1, RX queue is 0. */ - vq = dev->virtqueue[file->index]; - - if (vq->callfd) - close((int)vq->callfd); - - /* Populate the eventfd_copy structure and call eventfd_copy. 
*/ - vq->callfd = eventfd(0, EFD_NONBLOCK | EFD_CLOEXEC); - eventfd_call.source_fd = vq->callfd; - eventfd_call.target_fd = file->fd; - eventfd_call.target_pid = ctx.pid; - - if (eventfd_copy(dev, &eventfd_call)) - return -1; - - return 0; -} - -/* - * Called from CUSE IOCTL: VHOST_NET_SET_BACKEND + * Called from IOCTL: VHOST_NET_SET_BACKEND * To complete device initialisation when the virtio driver is loaded we are provided with a * valid fd for a tap device (not used by us). If this happens then we can add the device to a * data core. When the virtio driver is removed we get fd=-1. At that point we remove the device @@ -953,39 +528,24 @@ set_backend(struct vhost_device_ctx ctx, struct vhost_vring_file *file) } /* - * Function pointers are set for the device operations to allow CUSE to call functions - * when an IOCTL, device_add or device_release is received. + * Include cuse depend functions and definitions. */ -static const struct vhost_net_device_ops vhost_device_ops = { - .new_device = new_device, - .destroy_device = destroy_device, - - .get_features = get_features, - .set_features = set_features, - - .set_mem_table = set_mem_table, - - .set_vring_num = set_vring_num, - .set_vring_addr = set_vring_addr, - .set_vring_base = set_vring_base, - .get_vring_base = get_vring_base, - - .set_vring_kick = set_vring_kick, - .set_vring_call = set_vring_call, - - .set_backend = set_backend, - - .set_owner = set_owner, - .reset_owner = reset_owner, -}; +#include "virtio-net-cdev.c" /* - * Called by main to setup callbacks when registering CUSE device. + * Called by main to setup callbacks when registering device. */ struct vhost_net_device_ops const * -get_virtio_net_callbacks(void) +get_virtio_net_callbacks(vhost_driver_type_t type) { - return &vhost_device_ops; + switch (type) { + case VHOST_DRV_CUSE: + return &vhost_cuse_device_ops; + default: + break; + } + + return NULL; } int rte_vhost_enable_guest_notification(struct virtio_net *dev, -- 1.9.1
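
One footnote on the memory-mapping code that moves into virtio-net-cdev.c above: the per-region address_offset computed at the end of cuse_set_mem_table() is what later converts a guest physical address (GPA) into a vhost virtual address inside the mmap'd guest memory file. A minimal sketch of that arithmetic, with made-up example addresses, follows; the variable names mirror fields of struct virtio_memory / struct virtio_memory_regions, everything else is illustrative.

/*
 * Sketch of the translation enabled by cuse_set_mem_table():
 *   address_offset = userspace_address - base_address
 *                  + mapped_address    - guest_phys_address
 *   vhost VA       = GPA + address_offset
 * The addresses below are invented purely for demonstration.
 */
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

int main(void)
{
	uint64_t guest_phys_address = 0x40000000;     /* region start (GPA)          */
	uint64_t userspace_address  = 0x7f0040000000; /* QEMU VA of the region       */
	uint64_t base_address       = 0x7f0000000000; /* QEMU VA of the GPA-0 region */
	uint64_t mapped_address     = 0x7f8000000000; /* where we mmap'd the file    */

	uint64_t address_offset = userspace_address - base_address
				+ mapped_address - guest_phys_address;

	uint64_t gpa = 0x40001000;               /* some guest buffer address   */
	uint64_t vva = gpa + address_offset;     /* usable pointer in our space */

	printf("GPA 0x%" PRIx64 " -> vhost VA 0x%" PRIx64 "\n", gpa, vva);
	return 0;
}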