From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mail-pa0-f53.google.com (mail-pa0-f53.google.com [209.85.220.53]) by dpdk.org (Postfix) with ESMTP id 44EAB7F7D for ; Mon, 17 Nov 2014 06:54:40 +0100 (CET) Received: by mail-pa0-f53.google.com with SMTP id kq14so2622860pab.26 for ; Sun, 16 Nov 2014 22:04:54 -0800 (PST) X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20130820; h=x-gm-message-state:message-id:date:from:user-agent:mime-version:to :subject:references:in-reply-to:content-type :content-transfer-encoding; bh=di+yjy5im8khuu7SCpBbynj/aWFjfAjEJfIx8uAmJvg=; b=ER1G1VNQuLBPDQ8+qSUioETrqQ78Cqe3l21EWo/08KZPNAUC7AJDo52gEvmHYhqKDo flo0Y6KmXgc4MjXm2PBAWzHn7vIxpviPrBjXkVfa+IcXowF5sHNCKMpC133YVTWG65ez KspDRXG6KdOotMy6xboi8eckgHcyN0mp2YL5eGsmVmg1A7VBev6usneCCm5z3k5U/72r DMhA1s9nJ6SmULRgT7uRvX5SZFk4cOqjG60BYY2Dtpyl+lOrChFeeDU3jtKjiG5IzehK 6+G6wBvOHiZeMLRxCR5i/nD6PkozWOCArPIWvQChTlh8X9xXc8oNBVTd1tuhjS4qvmso GgvQ== X-Gm-Message-State: ALoCoQltGTX/uUeOBClVHV5To2tj4n8KIBGXlOv1Zau7+czePfNeSIYny7F2L3PNpMXJIrJ1HHff X-Received: by 10.70.94.197 with SMTP id de5mr10795pdb.161.1416204294598; Sun, 16 Nov 2014 22:04:54 -0800 (PST) Received: from [10.16.129.101] (napt.igel.co.jp. [219.106.231.132]) by mx.google.com with ESMTPSA id nb5sm33914381pbc.25.2014.11.16.22.04.52 for (version=TLSv1 cipher=ECDHE-RSA-RC4-SHA bits=128/128); Sun, 16 Nov 2014 22:04:53 -0800 (PST) Message-ID: <54699004.4040609@igel.co.jp> Date: Mon, 17 Nov 2014 15:04:52 +0900 From: Tetsuya Mukawa User-Agent: Mozilla/5.0 (Windows NT 6.1; WOW64; rv:24.0) Gecko/20100101 Thunderbird/24.6.0 MIME-Version: 1.0 To: Huawei Xie , dev@dpdk.org References: <1416014087-22499-1-git-send-email-huawei.xie@intel.com> In-Reply-To: <1416014087-22499-1-git-send-email-huawei.xie@intel.com> Content-Type: text/plain; charset=ISO-2022-JP Content-Transfer-Encoding: 7bit Subject: Re: [dpdk-dev] [PATCH RFC] lib/librte_vhost: vhost-user X-BeenThere: dev@dpdk.org X-Mailman-Version: 2.1.15 Precedence: list List-Id: patches and discussions about DPDK List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Mon, 17 Nov 2014 05:54:42 -0000 Hi Xie, (2014/11/15 10:14), Huawei Xie wrote: > implement socket server > fd event dispatch mechanism > vhost sock message handling > memory map for each region > VHOST_USER_SET_VRING_KICK_FD as the indicator that vring is available > VHOST_USER_GET_VRING_BASE as the message that vring should be released > > The message flow between vhost-user and vhost-cuse is kindof different, > which makes virtio-net common message handler layer difficult and complicated to handle > both cases in new_device/destroy_device/memory map/resource cleanup. > > Will only leave the most common messag handling in virtio-net, and move the > control logic to cuse/fuse layer. > > > Signed-off-by: Huawei Xie Great patch! I guess we can start from this patch to implement vhost-user and abstraction layer. I've checked patch. 1. White space, tab and indent patch. I will send patch that clears white space, tab and indent. Could you please check it? It might be difficult to see the difference, if your editor doesn't show a space or tab. 2. Some files are based on old codes. At least, following patch is not included. - vhost: fix build without unused result Also vhost_rxtx.c isn't probably based on latest code. 3. Device abstraction layer code I will send the device abstraction layer code after this email. Anyway, I guess we need to decide whether, or not we still keep vhost-cuse code 4. Multiple devices operation. For example, when thread1 opens vhost-user device1 and thread2 opens vhost-user device2, each thread may want to register own callbacks. Current implementation may not allow this. I guess we need to eliminate global variables in librte_vhost as much as possible. Thanks, Tetsuya > --- > lib/librte_vhost/Makefile | 14 +- > lib/librte_vhost/eventfd_link/eventfd_link.c | 27 +- > lib/librte_vhost/eventfd_link/eventfd_link.h | 48 +- > lib/librte_vhost/libvirt/qemu-wrap.py | 367 --------------- > lib/librte_vhost/rte_virtio_net.h | 106 ++--- > lib/librte_vhost/vhost-cuse/vhost-net-cdev.c | 436 ++++++++++++++++++ > lib/librte_vhost/vhost-cuse/virtio-net-cdev.c | 314 +++++++++++++ > lib/librte_vhost/vhost-cuse/virtio-net-cdev.h | 43 ++ > lib/librte_vhost/vhost-net-cdev.c | 389 ---------------- > lib/librte_vhost/vhost-net-cdev.h | 113 ----- > lib/librte_vhost/vhost-user/fd_man.c | 158 +++++++ > lib/librte_vhost/vhost-user/fd_man.h | 31 ++ > lib/librte_vhost/vhost-user/vhost-net-user.c | 417 +++++++++++++++++ > lib/librte_vhost/vhost-user/vhost-net-user.h | 74 +++ > lib/librte_vhost/vhost-user/virtio-net-user.c | 208 +++++++++ > lib/librte_vhost/vhost-user/virtio-net-user.h | 11 + > lib/librte_vhost/vhost_rxtx.c | 625 ++++---------------------- > lib/librte_vhost/virtio-net.c | 450 ++++--------------- > 18 files changed, 1939 insertions(+), 1892 deletions(-) > delete mode 100755 lib/librte_vhost/libvirt/qemu-wrap.py > create mode 100644 lib/librte_vhost/vhost-cuse/vhost-net-cdev.c > create mode 100644 lib/librte_vhost/vhost-cuse/virtio-net-cdev.c > create mode 100644 lib/librte_vhost/vhost-cuse/virtio-net-cdev.h > delete mode 100644 lib/librte_vhost/vhost-net-cdev.c > delete mode 100644 lib/librte_vhost/vhost-net-cdev.h > create mode 100644 lib/librte_vhost/vhost-user/fd_man.c > create mode 100644 lib/librte_vhost/vhost-user/fd_man.h > create mode 100644 lib/librte_vhost/vhost-user/vhost-net-user.c > create mode 100644 lib/librte_vhost/vhost-user/vhost-net-user.h > create mode 100644 lib/librte_vhost/vhost-user/virtio-net-user.c > create mode 100644 lib/librte_vhost/vhost-user/virtio-net-user.h > > diff --git a/lib/librte_vhost/Makefile b/lib/librte_vhost/Makefile > index c008d64..cb4e172 100644 > --- a/lib/librte_vhost/Makefile > +++ b/lib/librte_vhost/Makefile > @@ -34,17 +34,19 @@ include $(RTE_SDK)/mk/rte.vars.mk > # library name > LIB = librte_vhost.a > > -CFLAGS += $(WERROR_FLAGS) -I$(SRCDIR) -O3 -D_FILE_OFFSET_BITS=64 -lfuse > +CFLAGS += $(WERROR_FLAGS) -I$(SRCDIR) -I. -I vhost-user -I vhost-cuse -O3 -D_FILE_OFFSET_BITS=64 -lfuse > LDFLAGS += -lfuse > # all source are stored in SRCS-y > -SRCS-$(CONFIG_RTE_LIBRTE_VHOST) := vhost-net-cdev.c virtio-net.c vhost_rxtx.c > +#SRCS-$(CONFIG_RTE_LIBRTE_VHOST) := vhost-cuse/vhost-net-cdev.c vhost-cuse/virtio-net-cdev.c > + > +SRCS-$(CONFIG_RTE_LIBRTE_VHOST) := vhost-user/fd_man.c vhost-user/vhost-net-user.c vhost-user/virtio-net-user.c > + > +SRCS-$(CONFIG_RTE_LIBRTE_VHOST) += virtio-net.c vhost_rxtx.c > > # install includes > SYMLINK-$(CONFIG_RTE_LIBRTE_VHOST)-include += rte_virtio_net.h > > -# dependencies > -DEPDIRS-$(CONFIG_RTE_LIBRTE_VHOST) += lib/librte_eal > -DEPDIRS-$(CONFIG_RTE_LIBRTE_VHOST) += lib/librte_ether > -DEPDIRS-$(CONFIG_RTE_LIBRTE_VHOST) += lib/librte_mbuf > +# this lib needs eal > +DEPDIRS-$(CONFIG_RTE_LIBRTE_VHOST) += lib/librte_eal lib/librte_mbuf > > include $(RTE_SDK)/mk/rte.lib.mk > diff --git a/lib/librte_vhost/eventfd_link/eventfd_link.c b/lib/librte_vhost/eventfd_link/eventfd_link.c > index 7755dd6..4c9b628 100644 > --- a/lib/librte_vhost/eventfd_link/eventfd_link.c > +++ b/lib/librte_vhost/eventfd_link/eventfd_link.c > @@ -13,8 +13,7 @@ > * General Public License for more details. > * > * You should have received a copy of the GNU General Public License > - * along with this program; if not, write to the Free Software > - * Foundation, Inc., 51 Franklin St - Fifth Floor, Boston, MA 02110-1301 USA. > + * along with this program; If not, see . > * The full GNU General Public License is included in this distribution > * in the file called LICENSE.GPL. > * > @@ -78,8 +77,7 @@ eventfd_link_ioctl(struct file *f, unsigned int ioctl, unsigned long arg) > > switch (ioctl) { > case EVENTFD_COPY: > - if (copy_from_user(&eventfd_copy, argp, > - sizeof(struct eventfd_copy))) > + if (copy_from_user(&eventfd_copy, argp, sizeof(struct eventfd_copy))) > return -EFAULT; > > /* > @@ -88,28 +86,28 @@ eventfd_link_ioctl(struct file *f, unsigned int ioctl, unsigned long arg) > task_target = > pid_task(find_vpid(eventfd_copy.target_pid), PIDTYPE_PID); > if (task_target == NULL) { > - pr_debug("Failed to get mem ctx for target pid\n"); > + printk(KERN_DEBUG "Failed to get mem ctx for target pid\n"); > return -EFAULT; > } > > files = get_files_struct(current); > if (files == NULL) { > - pr_debug("Failed to get files struct\n"); > + printk(KERN_DEBUG "Failed to get files struct\n"); > return -EFAULT; > } > > rcu_read_lock(); > file = fcheck_files(files, eventfd_copy.source_fd); > if (file) { > - if (file->f_mode & FMODE_PATH || > - !atomic_long_inc_not_zero(&file->f_count)) > + if (file->f_mode & FMODE_PATH > + || !atomic_long_inc_not_zero(&file->f_count)) > file = NULL; > } > rcu_read_unlock(); > put_files_struct(files); > > if (file == NULL) { > - pr_debug("Failed to get file from source pid\n"); > + printk(KERN_DEBUG "Failed to get file from source pid\n"); > return 0; > } > > @@ -128,25 +126,26 @@ eventfd_link_ioctl(struct file *f, unsigned int ioctl, unsigned long arg) > > files = get_files_struct(task_target); > if (files == NULL) { > - pr_debug("Failed to get files struct\n"); > + printk(KERN_DEBUG "Failed to get files struct\n"); > return -EFAULT; > } > > rcu_read_lock(); > file = fcheck_files(files, eventfd_copy.target_fd); > if (file) { > - if (file->f_mode & FMODE_PATH || > - !atomic_long_inc_not_zero(&file->f_count)) > - file = NULL; > + if (file->f_mode & FMODE_PATH > + || !atomic_long_inc_not_zero(&file->f_count)) > + file = NULL; > } > rcu_read_unlock(); > put_files_struct(files); > > if (file == NULL) { > - pr_debug("Failed to get file from target pid\n"); > + printk(KERN_DEBUG "Failed to get file from target pid\n"); > return 0; > } > > + > /* > * Install the file struct from the target process into the > * file desciptor of the source process, > diff --git a/lib/librte_vhost/eventfd_link/eventfd_link.h b/lib/librte_vhost/eventfd_link/eventfd_link.h > index ea619ec..38052e2 100644 > --- a/lib/librte_vhost/eventfd_link/eventfd_link.h > +++ b/lib/librte_vhost/eventfd_link/eventfd_link.h > @@ -1,7 +1,4 @@ > /*- > - * This file is provided under a dual BSD/GPLv2 license. When using or > - * redistributing this file, you may do so under either license. > - * > * GPL LICENSE SUMMARY > * > * Copyright(c) 2010-2014 Intel Corporation. All rights reserved. > @@ -16,61 +13,28 @@ > * General Public License for more details. > * > * You should have received a copy of the GNU General Public License > - * along with this program; if not, write to the Free Software > - * Foundation, Inc., 51 Franklin St - Fifth Floor, Boston, MA 02110-1301 USA. > + * along with this program; If not, see . > * The full GNU General Public License is included in this distribution > * in the file called LICENSE.GPL. > * > * Contact Information: > * Intel Corporation > - * > - * BSD LICENSE > - * > - * Copyright(c) 2010-2014 Intel Corporation. All rights reserved. > - * All rights reserved. > - * > - * Redistribution and use in source and binary forms, with or without > - * modification, are permitted provided that the following conditions > - * are met: > - * > - * Redistributions of source code must retain the above copyright > - * notice, this list of conditions and the following disclaimer. > - * Redistributions in binary form must reproduce the above copyright > - * notice, this list of conditions and the following disclaimer in > - * the documentation and/or other materials provided with the > - * distribution. > - * Neither the name of Intel Corporation nor the names of its > - * contributors may be used to endorse or promote products derived > - * from this software without specific prior written permission. > - * > - * THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS > - * "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT > - * LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR > - * A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT > - * OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, > - * SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT > - * LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, > - * DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY > - * THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT > - * (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE > - * OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. > - * > */ > > #ifndef _EVENTFD_LINK_H_ > #define _EVENTFD_LINK_H_ > > /* > - * ioctl to copy an fd entry in calling process to an fd in a target process > + * ioctl to copy an fd entry in calling process to an fd in a target process > */ > #define EVENTFD_COPY 1 > > /* > - * arguements for the EVENTFD_COPY ioctl > + * arguements for the EVENTFD_COPY ioctl > */ > struct eventfd_copy { > - unsigned target_fd; /* fd in the target pid */ > - unsigned source_fd; /* fd in the calling pid */ > - pid_t target_pid; /* pid of the target pid */ > + unsigned target_fd; /**< fd in the target pid */ > + unsigned source_fd; /**< fd in the calling pid */ > + pid_t target_pid; /**< pid of the target pid */ > }; > #endif /* _EVENTFD_LINK_H_ */ > diff --git a/lib/librte_vhost/libvirt/qemu-wrap.py b/lib/librte_vhost/libvirt/qemu-wrap.py > deleted file mode 100755 > index e2d68a0..0000000 > --- a/lib/librte_vhost/libvirt/qemu-wrap.py > +++ /dev/null > @@ -1,367 +0,0 @@ > -#!/usr/bin/python > -#/* > -# * BSD LICENSE > -# * > -# * Copyright(c) 2010-2014 Intel Corporation. All rights reserved. > -# * All rights reserved. > -# * > -# * Redistribution and use in source and binary forms, with or without > -# * modification, are permitted provided that the following conditions > -# * are met: > -# * > -# * * Redistributions of source code must retain the above copyright > -# * notice, this list of conditions and the following disclaimer. > -# * * Redistributions in binary form must reproduce the above copyright > -# * notice, this list of conditions and the following disclaimer in > -# * the documentation and/or other materials provided with the > -# * distribution. > -# * * Neither the name of Intel Corporation nor the names of its > -# * contributors may be used to endorse or promote products derived > -# * from this software without specific prior written permission. > -# * > -# * THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS > -# * "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT > -# * LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR > -# * A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT > -# * OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, > -# * SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT > -# * LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, > -# * DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY > -# * THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT > -# * (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE > -# * OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. > -# */ > - > -##################################################################### > -# This script is designed to modify the call to the QEMU emulator > -# to support userspace vhost when starting a guest machine through > -# libvirt with vhost enabled. The steps to enable this are as follows > -# and should be run as root: > -# > -# 1. Place this script in a libvirtd's binary search PATH ($PATH) > -# A good location would be in the same directory that the QEMU > -# binary is located > -# > -# 2. Ensure that the script has the same owner/group and file > -# permissions as the QEMU binary > -# > -# 3. Update the VM xml file using "virsh edit VM.xml" > -# > -# 3.a) Set the VM to use the launch script > -# > -# Set the emulator path contained in the > -# tags > -# > -# e.g replace /usr/bin/qemu-kvm > -# with /usr/bin/qemu-wrap.py > -# > -# 3.b) Set the VM's device's to use vhost-net offload > -# > -# > -# > -# > -# > -# > -# 4. Enable libvirt to access our userpace device file by adding it to > -# controllers cgroup for libvirtd using the following steps > -# > -# 4.a) In /etc/libvirt/qemu.conf add/edit the following lines: > -# 1) cgroup_controllers = [ ... "devices", ... ] > -# 2) clear_emulator_capabilities = 0 > -# 3) user = "root" > -# 4) group = "root" > -# 5) cgroup_device_acl = [ > -# "/dev/null", "/dev/full", "/dev/zero", > -# "/dev/random", "/dev/urandom", > -# "/dev/ptmx", "/dev/kvm", "/dev/kqemu", > -# "/dev/rtc", "/dev/hpet", "/dev/net/tun", > -# "/dev/-", > -# ] > -# > -# 4.b) Disable SELinux or set to permissive mode > -# > -# 4.c) Mount cgroup device controller > -# "mkdir /dev/cgroup" > -# "mount -t cgroup none /dev/cgroup -o devices" > -# > -# 4.d) Set hugetlbfs_mount variable - ( Optional ) > -# VMs using userspace vhost must use hugepage backed > -# memory. This can be enabled in the libvirt XML > -# config by adding a memory backing section to the > -# XML config e.g. > -# > -# > -# > -# This memory backing section should be added after the > -# and sections. This will add > -# flags "-mem-prealloc -mem-path " to the QEMU > -# command line. The hugetlbfs_mount variable can be used > -# to override the default passed through by libvirt. > -# > -# if "-mem-prealloc" or "-mem-path " are not passed > -# through and a vhost device is detected then these options will > -# be automatically added by this script. This script will detect > -# the system hugetlbfs mount point to be used for . The > -# default for this script can be overidden by the > -# hugetlbfs_dir variable in the configuration section of this script. > -# > -# > -# 4.e) Restart the libvirtd system process > -# e.g. on Fedora "systemctl restart libvirtd.service" > -# > -# > -# 4.f) Edit the Configuration Parameters section of this script > -# to point to the correct emulator location and set any > -# addition options > -# > -# The script modifies the libvirtd Qemu call by modifying/adding > -# options based on the configuration parameters below. > -# NOTE: > -# emul_path and us_vhost_path must be set > -# All other parameters are optional > -##################################################################### > - > - > -############################################# > -# Configuration Parameters > -############################################# > -#Path to QEMU binary > -emul_path = "/usr/local/bin/qemu-system-x86_64" > - > -#Path to userspace vhost device file > -# This filename should match the --dev-basename --dev-index parameters of > -# the command used to launch the userspace vhost sample application e.g. > -# if the sample app lauch command is: > -# ./build/vhost-switch ..... --dev-basename usvhost --dev-index 1 > -# then this variable should be set to: > -# us_vhost_path = "/dev/usvhost-1" > -us_vhost_path = "/dev/usvhost-1" > - > -#List of additional user defined emulation options. These options will > -#be added to all Qemu calls > -emul_opts_user = [] > - > -#List of additional user defined emulation options for vhost only. > -#These options will only be added to vhost enabled guests > -emul_opts_user_vhost = [] > - > -#For all VHOST enabled VMs, the VM memory is preallocated from hugetlbfs > -# Set this variable to one to enable this option for all VMs > -use_huge_all = 0 > - > -#Instead of autodetecting, override the hugetlbfs directory by setting > -#this variable > -hugetlbfs_dir = "" > - > -############################################# > - > - > -############################################# > -# ****** Do Not Modify Below this Line ****** > -############################################# > - > -import sys, os, subprocess > - > - > -#List of open userspace vhost file descriptors > -fd_list = [] > - > -#additional virtio device flags when using userspace vhost > -vhost_flags = [ "csum=off", > - "gso=off", > - "guest_tso4=off", > - "guest_tso6=off", > - "guest_ecn=off" > - ] > - > - > -############################################# > -# Find the system hugefile mount point. > -# Note: > -# if multiple hugetlbfs mount points exist > -# then the first one found will be used > -############################################# > -def find_huge_mount(): > - > - if (len(hugetlbfs_dir)): > - return hugetlbfs_dir > - > - huge_mount = "" > - > - if (os.access("/proc/mounts", os.F_OK)): > - f = open("/proc/mounts", "r") > - line = f.readline() > - while line: > - line_split = line.split(" ") > - if line_split[2] == 'hugetlbfs': > - huge_mount = line_split[1] > - break > - line = f.readline() > - else: > - print "/proc/mounts not found" > - exit (1) > - > - f.close > - if len(huge_mount) == 0: > - print "Failed to find hugetlbfs mount point" > - exit (1) > - > - return huge_mount > - > - > -############################################# > -# Get a userspace Vhost file descriptor > -############################################# > -def get_vhost_fd(): > - > - if (os.access(us_vhost_path, os.F_OK)): > - fd = os.open( us_vhost_path, os.O_RDWR) > - else: > - print ("US-Vhost file %s not found" %us_vhost_path) > - exit (1) > - > - return fd > - > - > -############################################# > -# Check for vhostfd. if found then replace > -# with our own vhost fd and append any vhost > -# flags onto the end > -############################################# > -def modify_netdev_arg(arg): > - > - global fd_list > - vhost_in_use = 0 > - s = '' > - new_opts = [] > - netdev_opts = arg.split(",") > - > - for opt in netdev_opts: > - #check if vhost is used > - if "vhost" == opt[:5]: > - vhost_in_use = 1 > - else: > - new_opts.append(opt) > - > - #if using vhost append vhost options > - if vhost_in_use == 1: > - #append vhost on option > - new_opts.append('vhost=on') > - #append vhostfd ption > - new_fd = get_vhost_fd() > - new_opts.append('vhostfd=' + str(new_fd)) > - fd_list.append(new_fd) > - > - #concatenate all options > - for opt in new_opts: > - if len(s) > 0: > - s+=',' > - > - s+=opt > - > - return s > - > - > -############################################# > -# Main > -############################################# > -def main(): > - > - global fd_list > - global vhost_in_use > - new_args = [] > - num_cmd_args = len(sys.argv) > - emul_call = '' > - mem_prealloc_set = 0 > - mem_path_set = 0 > - num = 0; > - > - #parse the parameters > - while (num < num_cmd_args): > - arg = sys.argv[num] > - > - #Check netdev +1 parameter for vhostfd > - if arg == '-netdev': > - num_vhost_devs = len(fd_list) > - new_args.append(arg) > - > - num+=1 > - arg = sys.argv[num] > - mod_arg = modify_netdev_arg(arg) > - new_args.append(mod_arg) > - > - #append vhost flags if this is a vhost device > - # and -device is the next arg > - # i.e -device -opt1,-opt2,...,-opt3,%vhost > - if (num_vhost_devs < len(fd_list)): > - num+=1 > - arg = sys.argv[num] > - if arg == '-device': > - new_args.append(arg) > - num+=1 > - new_arg = sys.argv[num] > - for flag in vhost_flags: > - new_arg = ''.join([new_arg,',',flag]) > - new_args.append(new_arg) > - else: > - new_args.append(arg) > - elif arg == '-mem-prealloc': > - mem_prealloc_set = 1 > - new_args.append(arg) > - elif arg == '-mem-path': > - mem_path_set = 1 > - new_args.append(arg) > - > - else: > - new_args.append(arg) > - > - num+=1 > - > - #Set Qemu binary location > - emul_call+=emul_path > - emul_call+=" " > - > - #Add prealloc mem options if using vhost and not already added > - if ((len(fd_list) > 0) and (mem_prealloc_set == 0)): > - emul_call += "-mem-prealloc " > - > - #Add mempath mem options if using vhost and not already added > - if ((len(fd_list) > 0) and (mem_path_set == 0)): > - #Detect and add hugetlbfs mount point > - mp = find_huge_mount() > - mp = "".join(["-mem-path ", mp]) > - emul_call += mp > - emul_call += " " > - > - > - #add user options > - for opt in emul_opts_user: > - emul_call += opt > - emul_call += " " > - > - #Add add user vhost only options > - if len(fd_list) > 0: > - for opt in emul_opts_user_vhost: > - emul_call += opt > - emul_call += " " > - > - #Add updated libvirt options > - iter_args = iter(new_args) > - #skip 1st arg i.e. call to this script > - next(iter_args) > - for arg in iter_args: > - emul_call+=str(arg) > - emul_call+= " " > - > - #Call QEMU > - subprocess.call(emul_call, shell=True) > - > - > - #Close usvhost files > - for fd in fd_list: > - os.close(fd) > - > - > -if __name__ == "__main__": > - main() > - > diff --git a/lib/librte_vhost/rte_virtio_net.h b/lib/librte_vhost/rte_virtio_net.h > index 00b1328..7a05dab 100644 > --- a/lib/librte_vhost/rte_virtio_net.h > +++ b/lib/librte_vhost/rte_virtio_net.h > @@ -34,11 +34,6 @@ > #ifndef _VIRTIO_NET_H_ > #define _VIRTIO_NET_H_ > > -/** > - * @file > - * Interface to vhost net > - */ > - > #include > #include > #include > @@ -48,66 +43,38 @@ > #include > #include > > -/* Used to indicate that the device is running on a data core */ > -#define VIRTIO_DEV_RUNNING 1 > - > -/* Backend value set by guest. */ > -#define VIRTIO_DEV_STOPPED -1 > - > +#define VIRTIO_DEV_RUNNING 1 /**< Used to indicate that the device is running on a data core. */ > +#define VIRTIO_DEV_STOPPED -1 /**< Backend value set by guest. */ > > /* Enum for virtqueue management. */ > enum {VIRTIO_RXQ, VIRTIO_TXQ, VIRTIO_QNUM}; > > -#define BUF_VECTOR_MAX 256 > - > -/** > - * Structure contains buffer address, length and descriptor index > - * from vring to do scatter RX. > - */ > -struct buf_vector { > - uint64_t buf_addr; > - uint32_t buf_len; > - uint32_t desc_idx; > -}; > - > /** > * Structure contains variables relevant to RX/TX virtqueues. > */ > struct vhost_virtqueue { > - struct vring_desc *desc; /**< Virtqueue descriptor ring. */ > - struct vring_avail *avail; /**< Virtqueue available ring. */ > - struct vring_used *used; /**< Virtqueue used ring. */ > - uint32_t size; /**< Size of descriptor ring. */ > - uint32_t backend; /**< Backend value to determine if device should started/stopped. */ > - uint16_t vhost_hlen; /**< Vhost header length (varies depending on RX merge buffers. */ > - volatile uint16_t last_used_idx; /**< Last index used on the available ring */ > - volatile uint16_t last_used_idx_res; /**< Used for multiple devices reserving buffers. */ > - eventfd_t callfd; /**< Currently unused as polling mode is enabled. */ > - eventfd_t kickfd; /**< Used to notify the guest (trigger interrupt). */ > - struct buf_vector buf_vec[BUF_VECTOR_MAX]; /**< for scatter RX. */ > -} __rte_cache_aligned; > - > -/** > - * Device structure contains all configuration information relating to the device. > - */ > -struct virtio_net { > - struct vhost_virtqueue *virtqueue[VIRTIO_QNUM]; /**< Contains all virtqueue information. */ > - struct virtio_memory *mem; /**< QEMU memory and memory region information. */ > - uint64_t features; /**< Negotiated feature set. */ > - uint64_t device_fh; /**< device identifier. */ > - uint32_t flags; /**< Device flags. Only used to check if device is running on data core. */ > - void *priv; /**< private context */ > + struct vring_desc *desc; /**< descriptor ring. */ > + struct vring_avail *avail; /**< available ring. */ > + struct vring_used *used; /**< used ring. */ > + uint32_t size; /**< Size of descriptor ring. */ > + uint32_t backend; /**< Backend value to determine if device should be started/stopped. */ > + uint16_t vhost_hlen; /**< Vhost header length (varies depending on RX merge buffers. */ > + volatile uint16_t last_used_idx; /**< Last index used on the available ring. */ > + volatile uint16_t last_used_idx_res; /**< Used for multiple devices reserving buffers. */ > + eventfd_t callfd; /**< Currently unused as polling mode is enabled. */ > + eventfd_t kickfd; /**< Used to notify the guest (trigger interrupt). */ > } __rte_cache_aligned; > > /** > - * Information relating to memory regions including offsets to addresses in QEMUs memory file. > + * Information relating to memory regions including offsets to > + * addresses in QEMUs memory file. > */ > struct virtio_memory_regions { > - uint64_t guest_phys_address; /**< Base guest physical address of region. */ > - uint64_t guest_phys_address_end; /**< End guest physical address of region. */ > - uint64_t memory_size; /**< Size of region. */ > - uint64_t userspace_address; /**< Base userspace address of region. */ > - uint64_t address_offset; /**< Offset of region for address translation. */ > + uint64_t guest_phys_address; /**< Base guest physical address of region. */ > + uint64_t guest_phys_address_end; /**< End guest physical address of region. */ > + uint64_t memory_size; /**< Size of region. */ > + uint64_t userspace_address; /**< Base userspace address of region. */ > + uint64_t address_offset; /**< Offset of region for address translation. */ > }; > > > @@ -115,21 +82,34 @@ struct virtio_memory_regions { > * Memory structure includes region and mapping information. > */ > struct virtio_memory { > - uint64_t base_address; /**< Base QEMU userspace address of the memory file. */ > - uint64_t mapped_address; /**< Mapped address of memory file base in our applications memory space. */ > - uint64_t mapped_size; /**< Total size of memory file. */ > - uint32_t nregions; /**< Number of memory regions. */ > + uint64_t base_address; /**< Base QEMU userspace address of the memory file. */ > + uint64_t mapped_address; /**< Mapped address of memory file base in our applications memory space. */ > + uint64_t mapped_size; /**< Total size of memory file. */ > + uint32_t nregions; /**< Number of memory regions. */ > struct virtio_memory_regions regions[0]; /**< Memory region information. */ > }; > > /** > + * Device structure contains all configuration information relating to the device. > + */ > +struct virtio_net { > + struct vhost_virtqueue *virtqueue[VIRTIO_QNUM]; /**< Contains all virtqueue information. */ > + struct virtio_memory *mem; /**< QEMU memory and memory region information. */ > + uint64_t features; /**< Negotiated feature set. */ > + uint64_t device_fh; /**< Device identifier. */ > + uint32_t flags; /**< Device flags. Only used to check if device is running on data core. */ > + void *priv; > +} __rte_cache_aligned; > + > +/** > * Device operations to add/remove device. > */ > struct virtio_net_device_ops { > - int (*new_device)(struct virtio_net *); /**< Add device. */ > - void (*destroy_device)(volatile struct virtio_net *); /**< Remove device. */ > + int (*new_device)(struct virtio_net *); /**< Add device. */ > + void (*destroy_device)(struct virtio_net *); /**< Remove device. */ > }; > > + > static inline uint16_t __attribute__((always_inline)) > rte_vring_available_entries(struct virtio_net *dev, uint16_t queue_id) > { > @@ -179,7 +159,7 @@ int rte_vhost_driver_register(const char *dev_name); > > /* Register callbacks. */ > int rte_vhost_driver_callback_register(struct virtio_net_device_ops const * const); > -/* Start vhost driver session blocking loop. */ > + > int rte_vhost_driver_session_start(void); > > /** > @@ -192,8 +172,8 @@ int rte_vhost_driver_session_start(void); > * @return > * num of packets enqueued > */ > -uint16_t rte_vhost_enqueue_burst(struct virtio_net *dev, uint16_t queue_id, > - struct rte_mbuf **pkts, uint16_t count); > +uint32_t rte_vhost_enqueue_burst(struct virtio_net *dev, uint16_t queue_id, > + struct rte_mbuf **pkts, uint32_t count); > > /** > * This function gets guest buffers from the virtio device TX virtqueue, > @@ -206,7 +186,7 @@ uint16_t rte_vhost_enqueue_burst(struct virtio_net *dev, uint16_t queue_id, > * @return > * num of packets dequeued > */ > -uint16_t rte_vhost_dequeue_burst(struct virtio_net *dev, uint16_t queue_id, > - struct rte_mempool *mbuf_pool, struct rte_mbuf **pkts, uint16_t count); > +uint32_t rte_vhost_dequeue_burst(struct virtio_net *dev, uint16_t queue_id, > + struct rte_mempool *mbuf_pool, struct rte_mbuf **pkts, uint32_t count); > > #endif /* _VIRTIO_NET_H_ */ > diff --git a/lib/librte_vhost/vhost-cuse/vhost-net-cdev.c b/lib/librte_vhost/vhost-cuse/vhost-net-cdev.c > new file mode 100644 > index 0000000..4671643 > --- /dev/null > +++ b/lib/librte_vhost/vhost-cuse/vhost-net-cdev.c > @@ -0,0 +1,436 @@ > +/*- > + * BSD LICENSE > + * > + * Copyright(c) 2010-2014 Intel Corporation. All rights reserved. > + * All rights reserved. > + * > + * Redistribution and use in source and binary forms, with or without > + * modification, are permitted provided that the following conditions > + * are met: > + * > + * * Redistributions of source code must retain the above copyright > + * notice, this list of conditions and the following disclaimer. > + * * Redistributions in binary form must reproduce the above copyright > + * notice, this list of conditions and the following disclaimer in > + * the documentation and/or other materials provided with the > + * distribution. > + * * Neither the name of Intel Corporation nor the names of its > + * contributors may be used to endorse or promote products derived > + * from this software without specific prior written permission. > + * > + * THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS > + * "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT > + * LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR > + * A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT > + * OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, > + * SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT > + * LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, > + * DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY > + * THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT > + * (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE > + * OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. > + */ > + > +#include > +#include > +#include > +#include > +#include > +#include > +#include > +#include > + > +#include > +#include > +#include > +#include > + > +#include "virtio-net-cdev.h" > +#include "vhost-net.h" > +#include "eventfd_link/eventfd_link.h" > + > +#define FUSE_OPT_DUMMY "\0\0" > +#define FUSE_OPT_FORE "-f\0\0" > +#define FUSE_OPT_NOMULTI "-s\0\0" > + > +static const uint32_t default_major = 231; > +static const uint32_t default_minor = 1; > +static const char cuse_device_name[] = "/dev/cuse"; > +static const char default_cdev[] = "vhost-net"; > +static const char eventfd_cdev[] = "/dev/eventfd-link"; > + > +static struct fuse_session *session; > +const struct vhost_net_device_ops const *ops; > + > +/* > + * Returns vhost_device_ctx from given fuse_req_t. The index is populated later > + * when the device is added to the device linked list. > + */ > +static struct vhost_device_ctx > +fuse_req_to_vhost_ctx(fuse_req_t req, struct fuse_file_info *fi) > +{ > + struct vhost_device_ctx ctx; > + struct fuse_ctx const *const req_ctx = fuse_req_ctx(req); > + > + ctx.pid = req_ctx->pid; > + ctx.fh = fi->fh; > + > + return ctx; > +} > + > +/* > + * When the device is created in QEMU it gets initialised here and > + * added to the device linked list. > + */ > +static void > +vhost_net_open(fuse_req_t req, struct fuse_file_info *fi) > +{ > + struct vhost_device_ctx ctx = fuse_req_to_vhost_ctx(req, fi); > + int err = 0; > + > + err = ops->new_device(ctx); > + if (err == -1) { > + fuse_reply_err(req, EPERM); > + return; > + } > + > + fi->fh = err; > + > + RTE_LOG(INFO, VHOST_CONFIG, > + "(%"PRIu64") Device configuration started\n", fi->fh); > + fuse_reply_open(req, fi); > +} > + > +/* > + * When QEMU is shutdown or killed the device gets released. > + */ > +static void > +vhost_net_release(fuse_req_t req, struct fuse_file_info *fi) > +{ > + int err = 0; > + struct vhost_device_ctx ctx = fuse_req_to_vhost_ctx(req, fi); > + > + ops->destroy_device(ctx); > + RTE_LOG(INFO, VHOST_CONFIG, "(%"PRIu64") Device released\n", ctx.fh); > + fuse_reply_err(req, err); > +} > + > +/* > + * Boilerplate code for CUSE IOCTL > + * Implicit arguments: ctx, req, result. > + */ > +#define VHOST_IOCTL(func) do { \ > + result = (func)(ctx); \ > + fuse_reply_ioctl(req, result, NULL, 0); \ > +} while (0) > + > +/* > + * Boilerplate IOCTL RETRY > + * Implicit arguments: req. > + */ > +#define VHOST_IOCTL_RETRY(size_r, size_w) do { \ > + struct iovec iov_r = { arg, (size_r) }; \ > + struct iovec iov_w = { arg, (size_w) }; \ > + fuse_reply_ioctl_retry(req, &iov_r, \ > + (size_r) ? 1 : 0, &iov_w, (size_w) ? 1 : 0);\ > +} while (0) > + > +/* > + * Boilerplate code for CUSE Read IOCTL > + * Implicit arguments: ctx, req, result, in_bufsz, in_buf. > + */ > +#define VHOST_IOCTL_R(type, var, func) do { \ > + if (!in_bufsz) { \ > + VHOST_IOCTL_RETRY(sizeof(type), 0);\ > + } else { \ > + (var) = *(const type*)in_buf; \ > + result = func(ctx, &(var)); \ > + fuse_reply_ioctl(req, result, NULL, 0);\ > + } \ > +} while (0) > + > +/* > + * Boilerplate code for CUSE Write IOCTL > + * Implicit arguments: ctx, req, result, out_bufsz. > + */ > +#define VHOST_IOCTL_W(type, var, func) do { \ > + if (!out_bufsz) { \ > + VHOST_IOCTL_RETRY(0, sizeof(type));\ > + } else { \ > + result = (func)(ctx, &(var));\ > + fuse_reply_ioctl(req, result, &(var), sizeof(type));\ > + } \ > +} while (0) > + > +/* > + * Boilerplate code for CUSE Read/Write IOCTL > + * Implicit arguments: ctx, req, result, in_bufsz, in_buf. > + */ > +#define VHOST_IOCTL_RW(type1, var1, type2, var2, func) do { \ > + if (!in_bufsz) { \ > + VHOST_IOCTL_RETRY(sizeof(type1), sizeof(type2));\ > + } else { \ > + (var1) = *(const type1*) (in_buf); \ > + result = (func)(ctx, (var1), &(var2)); \ > + fuse_reply_ioctl(req, result, &(var2), sizeof(type2));\ > + } \ > +} while (0) > + > +/* > + * This function uses the eventfd_link kernel module to copy an eventfd file > + * descriptor provided by QEMU in to our process space. > + */ > +static int > +eventfd_copy(int target_fd, int target_pid) > +{ > + int eventfd_link, ret; > + struct eventfd_copy eventfd_copy; > + int fd = eventfd(0, EFD_NONBLOCK | EFD_CLOEXEC); > + > + if (fd == -1) > + return -1; > + > + /* Open the character device to the kernel module. */ > + /* TODO: check this earlier rather than fail until VM boots! */ > + eventfd_link = open(eventfd_cdev, O_RDWR); > + if (eventfd_link < 0) { > + RTE_LOG(ERR, VHOST_CONFIG, > + "eventfd_link module is not loaded\n"); > + return -1; > + } > + > + eventfd_copy.source_fd = fd; > + eventfd_copy.target_fd = target_fd; > + eventfd_copy.target_pid = target_pid; > + /* Call the IOCTL to copy the eventfd. */ > + ret = ioctl(eventfd_link, EVENTFD_COPY, &eventfd_copy); > + close(eventfd_link); > + > + if (ret < 0) { > + RTE_LOG(ERR, VHOST_CONFIG, > + "EVENTFD_COPY ioctl failed\n"); > + return -1; > + } > + > + return fd; > +} > + > +/* > + * The IOCTLs are handled using CUSE/FUSE in userspace. Depending on > + * the type of IOCTL a buffer is requested to read or to write. This > + * request is handled by FUSE and the buffer is then given to CUSE. > + */ > +static void > +vhost_net_ioctl(fuse_req_t req, int cmd, void *arg, > + struct fuse_file_info *fi, __rte_unused unsigned flags, > + const void *in_buf, size_t in_bufsz, size_t out_bufsz) > +{ > + struct vhost_device_ctx ctx = fuse_req_to_vhost_ctx(req, fi); > + struct vhost_vring_file file; > + struct vhost_vring_state state; > + struct vhost_vring_addr addr; > + uint64_t features; > + uint32_t index; > + int result = 0; > + > + switch (cmd) { > + case VHOST_NET_SET_BACKEND: > + LOG_DEBUG(VHOST_CONFIG, > + "(%"PRIu64") IOCTL: VHOST_NET_SET_BACKEND\n", ctx.fh); > + VHOST_IOCTL_R(struct vhost_vring_file, file, ops->set_backend); > + break; > + > + case VHOST_GET_FEATURES: > + LOG_DEBUG(VHOST_CONFIG, > + "(%"PRIu64") IOCTL: VHOST_GET_FEATURES\n", ctx.fh); > + VHOST_IOCTL_W(uint64_t, features, ops->get_features); > + break; > + > + case VHOST_SET_FEATURES: > + LOG_DEBUG(VHOST_CONFIG, > + "(%"PRIu64") IOCTL: VHOST_SET_FEATURES\n", ctx.fh); > + VHOST_IOCTL_R(uint64_t, features, ops->set_features); > + break; > + > + case VHOST_RESET_OWNER: > + LOG_DEBUG(VHOST_CONFIG, > + "(%"PRIu64") IOCTL: VHOST_RESET_OWNER\n", ctx.fh); > + VHOST_IOCTL(ops->reset_owner); > + break; > + > + case VHOST_SET_OWNER: > + LOG_DEBUG(VHOST_CONFIG, > + "(%"PRIu64") IOCTL: VHOST_SET_OWNER\n", ctx.fh); > + VHOST_IOCTL(ops->set_owner); > + break; > + > + case VHOST_SET_MEM_TABLE: > + /*TODO fix race condition.*/ > + LOG_DEBUG(VHOST_CONFIG, > + "(%"PRIu64") IOCTL: VHOST_SET_MEM_TABLE\n", ctx.fh); > + static struct vhost_memory mem_temp; > + switch (in_bufsz) { > + case 0: > + VHOST_IOCTL_RETRY(sizeof(struct vhost_memory), 0); > + break; > + > + case sizeof(struct vhost_memory): > + mem_temp = *(const struct vhost_memory *) in_buf; > + > + if (mem_temp.nregions > 0) { > + VHOST_IOCTL_RETRY(sizeof(struct vhost_memory) + > + (sizeof(struct vhost_memory_region) * > + mem_temp.nregions), 0); > + } else { > + result = -1; > + fuse_reply_ioctl(req, result, NULL, 0); > + } > + break; > + > + default: > + result = cuse_set_mem_table(ctx, in_buf, > + mem_temp.nregions); > + if (result) > + fuse_reply_err(req, EINVAL); > + else > + fuse_reply_ioctl(req, result, NULL, 0); > + } > + break; > + > + case VHOST_SET_VRING_NUM: > + LOG_DEBUG(VHOST_CONFIG, > + "(%"PRIu64") IOCTL: VHOST_SET_VRING_NUM\n", ctx.fh); > + VHOST_IOCTL_R(struct vhost_vring_state, state, ops->set_vring_num); > + break; > + > + case VHOST_SET_VRING_BASE: > + LOG_DEBUG(VHOST_CONFIG, > + "(%"PRIu64") IOCTL: VHOST_SET_VRING_BASE\n", ctx.fh); > + VHOST_IOCTL_R(struct vhost_vring_state, state, ops->set_vring_base); > + break; > + > + case VHOST_GET_VRING_BASE: > + LOG_DEBUG(VHOST_CONFIG, > + "(%"PRIu64") IOCTL: VHOST_GET_VRING_BASE\n", ctx.fh); > + VHOST_IOCTL_RW(uint32_t, index, > + struct vhost_vring_state, state, ops->get_vring_base); > + break; > + > + case VHOST_SET_VRING_ADDR: > + LOG_DEBUG(VHOST_CONFIG, > + "(%"PRIu64") IOCTL: VHOST_SET_VRING_ADDR\n", ctx.fh); > + VHOST_IOCTL_R(struct vhost_vring_addr, addr, ops->set_vring_addr); > + break; > + > + case VHOST_SET_VRING_KICK: > + case VHOST_SET_VRING_CALL: > + if (!in_buf) { > + VHOST_IOCTL_RETRY(sizeof(struct vhost_vring_file), 0); > + } else { > + int fd; > + file = *(const struct vhost_vring_file *)in_buf; > + LOG_DEBUG(VHOST_CONFIG, > + "kick/call idx:%d fd:%d\n", file.index, file.fd); > + if ((fd = eventfd_copy(file.fd, ctx.pid)) < 0){ > + fuse_reply_ioctl(req, -1, NULL, 0); > + } > + file.fd = fd; > + if (cmd == VHOST_SET_VRING_KICK) { > + VHOST_IOCTL_R(struct vhost_vring_file, file, ops->set_vring_call); > + } > + else { > + VHOST_IOCTL_R(struct vhost_vring_file, file, ops->set_vring_kick); > + } > + } > + break; > + > + default: > + RTE_LOG(ERR, VHOST_CONFIG, > + "(%"PRIu64") IOCTL: DOESN NOT EXIST\n", ctx.fh); > + result = -1; > + fuse_reply_ioctl(req, result, NULL, 0); > + } > + > + if (result < 0) > + LOG_DEBUG(VHOST_CONFIG, > + "(%"PRIu64") IOCTL: FAIL\n", ctx.fh); > + else > + LOG_DEBUG(VHOST_CONFIG, > + "(%"PRIu64") IOCTL: SUCCESS\n", ctx.fh); > +} > + > +/* > + * Structure handling open, release and ioctl function pointers is populated. > + */ > +static const struct cuse_lowlevel_ops vhost_net_ops = { > + .open = vhost_net_open, > + .release = vhost_net_release, > + .ioctl = vhost_net_ioctl, > +}; > + > +/* > + * cuse_info is populated and used to register the cuse device. > + * vhost_net_device_ops are also passed when the device is registered in app. > + */ > +int > +rte_vhost_driver_register(const char *dev_name) > +{ > + struct cuse_info cuse_info; > + char device_name[PATH_MAX] = ""; > + char char_device_name[PATH_MAX] = ""; > + const char *device_argv[] = { device_name }; > + > + char fuse_opt_dummy[] = FUSE_OPT_DUMMY; > + char fuse_opt_fore[] = FUSE_OPT_FORE; > + char fuse_opt_nomulti[] = FUSE_OPT_NOMULTI; > + char *fuse_argv[] = {fuse_opt_dummy, fuse_opt_fore, fuse_opt_nomulti}; > + > + if (access(cuse_device_name, R_OK | W_OK) < 0) { > + RTE_LOG(ERR, VHOST_CONFIG, > + "char device %s can't be accessed, maybe not exist\n", > + cuse_device_name); > + return -1; > + } > + > + /* > + * The device name is created. This is passed to QEMU so that it can > + * register the device with our application. > + */ > + snprintf(device_name, PATH_MAX, "DEVNAME=%s", dev_name); > + snprintf(char_device_name, PATH_MAX, "/dev/%s", dev_name); > + > + /* Check if device already exists. */ > + if (access(char_device_name, F_OK) != -1) { > + RTE_LOG(ERR, VHOST_CONFIG, > + "char device %s already exists\n", char_device_name); > + return -1; > + } > + > + memset(&cuse_info, 0, sizeof(cuse_info)); > + cuse_info.dev_major = default_major; > + cuse_info.dev_minor = default_minor; > + cuse_info.dev_info_argc = 1; > + cuse_info.dev_info_argv = device_argv; > + cuse_info.flags = CUSE_UNRESTRICTED_IOCTL; > + > + ops = get_virtio_net_callbacks(); > + > + session = cuse_lowlevel_setup(3, fuse_argv, > + &cuse_info, &vhost_net_ops, 0, NULL); > + if (session == NULL) > + return -1; > + > + return 0; > +} > + > +/** > + * The CUSE session is launched allowing the application to receive open, > + * release and ioctl calls. > + */ > +int > +rte_vhost_driver_session_start(void) > +{ > + fuse_session_loop(session); > + > + return 0; > +} > diff --git a/lib/librte_vhost/vhost-cuse/virtio-net-cdev.c b/lib/librte_vhost/vhost-cuse/virtio-net-cdev.c > new file mode 100644 > index 0000000..5c16aa5 > --- /dev/null > +++ b/lib/librte_vhost/vhost-cuse/virtio-net-cdev.c > @@ -0,0 +1,314 @@ > +/*- > + * BSD LICENSE > + * > + * Copyright(c) 2010-2014 Intel Corporation. All rights reserved. > + * All rights reserved. > + * > + * Redistribution and use in source and binary forms, with or without > + * modification, are permitted provided that the following conditions > + * are met: > + * > + * * Redistributions of source code must retain the above copyright > + * notice, this list of conditions and the following disclaimer. > + * * Redistributions in binary form must reproduce the above copyright > + * notice, this list of conditions and the following disclaimer in > + * the documentation and/or other materials provided with the > + * distribution. > + * * Neither the name of Intel Corporation nor the names of its > + * contributors may be used to endorse or promote products derived > + * from this software without specific prior written permission. > + * > + * THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS > + * "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT > + * LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR > + * A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT > + * OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, > + * SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT > + * LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, > + * DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY > + * THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT > + * (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE > + * OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. > + */ > + > +#include > +#include > +#include > +#include > +#include > +#include > +#include > +#include > +#include > +#include > +#include > +#include > +#include > + > +#include > + > +#include "vhost-net.h" > +#include "virtio-net-cdev.h" > + > +extern struct vhost_net_device_ops const *ops; > + > +/* Line size for reading maps file. */ > +static const uint32_t BUFSIZE = PATH_MAX; > + > +/* Size of prot char array in procmap. */ > +#define PROT_SZ 5 > + > +/* Number of elements in procmap struct. */ > +#define PROCMAP_SZ 8 > + > +/* Structure containing information gathered from maps file. */ > +struct procmap { > + uint64_t va_start; /* Start virtual address in file. */ > + uint64_t len; /* Size of file. */ > + uint64_t pgoff; /* Not used. */ > + uint32_t maj; /* Not used. */ > + uint32_t min; /* Not used. */ > + uint32_t ino; /* Not used. */ > + char prot[PROT_SZ]; /* Not used. */ > + char fname[PATH_MAX]; /* File name. */ > +}; > + > +/* > + * Locate the file containing QEMU's memory space and > + * map it to our address space. > + */ > +static int > +host_memory_map(pid_t pid, uint64_t addr, > + uint64_t *mapped_address, uint64_t *mapped_size) > +{ > + struct dirent *dptr = NULL; > + struct procmap procmap; > + DIR *dp = NULL; > + int fd; > + int i; > + char memfile[PATH_MAX]; > + char mapfile[PATH_MAX]; > + char procdir[PATH_MAX]; > + char resolved_path[PATH_MAX]; > + FILE *fmap; > + void *map; > + uint8_t found = 0; > + char line[BUFSIZE]; > + char dlm[] = "- : "; > + char *str, *sp, *in[PROCMAP_SZ]; > + char *end = NULL; > + > + /* Path where mem files are located. */ > + snprintf(procdir, PATH_MAX, "/proc/%u/fd/", pid); > + /* Maps file used to locate mem file. */ > + snprintf(mapfile, PATH_MAX, "/proc/%u/maps", pid); > + > + fmap = fopen(mapfile, "r"); > + if (fmap == NULL) { > + RTE_LOG(ERR, VHOST_CONFIG, > + "Failed to open maps file for pid %d\n", pid); > + return -1; > + } > + > + /* Read through maps file until we find out base_address. */ > + while (fgets(line, BUFSIZE, fmap) != 0) { > + str = line; > + errno = 0; > + /* Split line in to fields. */ > + for (i = 0; i < PROCMAP_SZ; i++) { > + in[i] = strtok_r(str, &dlm[i], &sp); > + if ((in[i] == NULL) || (errno != 0)) { > + fclose(fmap); > + return -1; > + } > + str = NULL; > + } > + > + /* Convert/Copy each field as needed. */ > + procmap.va_start = strtoull(in[0], &end, 16); > + if ((in[0] == '\0') || (end == NULL) || (*end != '\0') || > + (errno != 0)) { > + fclose(fmap); > + return -1; > + } > + > + procmap.len = strtoull(in[1], &end, 16); > + if ((in[1] == '\0') || (end == NULL) || (*end != '\0') || > + (errno != 0)) { > + fclose(fmap); > + return -1; > + } > + > + procmap.pgoff = strtoull(in[3], &end, 16); > + if ((in[3] == '\0') || (end == NULL) || (*end != '\0') || > + (errno != 0)) { > + fclose(fmap); > + return -1; > + } > + > + procmap.maj = strtoul(in[4], &end, 16); > + if ((in[4] == '\0') || (end == NULL) || (*end != '\0') || > + (errno != 0)) { > + fclose(fmap); > + return -1; > + } > + > + procmap.min = strtoul(in[5], &end, 16); > + if ((in[5] == '\0') || (end == NULL) || (*end != '\0') || > + (errno != 0)) { > + fclose(fmap); > + return -1; > + } > + > + procmap.ino = strtoul(in[6], &end, 16); > + if ((in[6] == '\0') || (end == NULL) || (*end != '\0') || > + (errno != 0)) { > + fclose(fmap); > + return -1; > + } > + > + memcpy(&procmap.prot, in[2], PROT_SZ); > + memcpy(&procmap.fname, in[7], PATH_MAX); > + > + if (procmap.va_start == addr) { > + procmap.len = procmap.len - procmap.va_start; > + found = 1; > + break; > + } > + } > + fclose(fmap); > + > + if (!found) { > + RTE_LOG(ERR, VHOST_CONFIG, > + "Failed to find memory file in pid %d maps file\n", pid); > + return -1; > + } > + > + /* Find the guest memory file among the process fds. */ > + dp = opendir(procdir); > + if (dp == NULL) { > + RTE_LOG(ERR, VHOST_CONFIG, > + "Cannot open pid %d process directory\n", > + pid); > + return -1; > + > + } > + > + found = 0; > + > + /* Read the fd directory contents. */ > + while (NULL != (dptr = readdir(dp))) { > + snprintf(memfile, PATH_MAX, "/proc/%u/fd/%s", > + pid, dptr->d_name); > + realpath(memfile, resolved_path); > + if (resolved_path == NULL) { > + RTE_LOG(ERR, VHOST_CONFIG, > + "Failed to resolve fd directory\n"); > + closedir(dp); > + return -1; > + } > + if (strncmp(resolved_path, procmap.fname, > + strnlen(procmap.fname, PATH_MAX)) == 0) { > + found = 1; > + break; > + } > + } > + > + closedir(dp); > + > + if (found == 0) { > + RTE_LOG(ERR, VHOST_CONFIG, > + "Failed to find memory file for pid %d\n", > + pid); > + return -1; > + } > + /* Open the shared memory file and map the memory into this process. */ > + fd = open(memfile, O_RDWR); > + > + if (fd == -1) { > + RTE_LOG(ERR, VHOST_CONFIG, > + "Failed to open %s for pid %d\n", > + memfile, pid); > + return -1; > + } > + > + map = mmap(0, (size_t)procmap.len, PROT_READ|PROT_WRITE , > + MAP_POPULATE|MAP_SHARED, fd, 0); > + close(fd); > + > + if (map == MAP_FAILED) { > + RTE_LOG(ERR, VHOST_CONFIG, > + "Error mapping the file %s for pid %d\n", > + memfile, pid); > + return -1; > + } > + > + /* Store the memory address and size in the device data structure */ > + *mapped_address = (uint64_t)(uintptr_t)map; > + *mapped_size = procmap.len; > + > + LOG_DEBUG(VHOST_CONFIG, > + "Mem File: %s->%s - Size: %llu - VA: %p\n", > + memfile, resolved_path, > + (unsigned long long)mapped_size, map); > + > + return 0; > +} > + > +int > +cuse_set_mem_table(struct vhost_device_ctx ctx, const struct vhost_memory *mem_regions_addr, > + uint32_t nregions) > +{ > + uint64_t size = offsetof(struct vhost_memory, regions); > + uint32_t idx; > + struct virtio_memory_regions regions[8]; /* VHOST_MAX_MEMORY_REGIONS */ > + struct vhost_memory_region *mem_regions = (void *)(uintptr_t) > + ((uint64_t)(uintptr_t)mem_regions_addr + size); > + uint64_t base_address = 0, mapped_address, mapped_size; > + > + for (idx = 0; idx < nregions; idx++) { > + regions[idx].guest_phys_address = > + mem_regions[idx].guest_phys_addr; > + regions[idx].guest_phys_address_end = > + regions[idx].guest_phys_address + > + mem_regions[idx].memory_size; > + regions[idx].memory_size = > + mem_regions[idx].memory_size; > + regions[idx].userspace_address = > + mem_regions[idx].userspace_addr; > + > + LOG_DEBUG(VHOST_CONFIG, "REGION: %u - GPA: %p - QEMU VA: %p - SIZE (%"PRIu64")\n", > + idx, > + (void *)(uintptr_t)regions[idx].guest_phys_address, > + (void *)(uintptr_t)regions[idx].userspace_address, > + regions[idx].memory_size); > + > + /*set the base address mapping*/ > + if (regions[idx].guest_phys_address == 0x0) { > + base_address = > + regions[idx].userspace_address; > + /* Map VM memory file */ > + if (host_memory_map(ctx.pid, base_address, > + &mapped_address, &mapped_size) != 0) { > + return -1; > + } > + } > + } > + > + /* Check that we have a valid base address. */ > + if (base_address == 0) { > + RTE_LOG(ERR, VHOST_CONFIG, > + "Failed to find base address of qemu memory file.\n"); > + return -1; > + } > + > + for (idx = 0; idx < nregions; idx++) { > + regions[idx].address_offset = > + mapped_address - base_address + > + regions[idx].userspace_address - > + regions[idx].guest_phys_address; > + } > + > + ops->set_mem_table(ctx, ®ions[0], nregions); > + return 0; > +} > diff --git a/lib/librte_vhost/vhost-cuse/virtio-net-cdev.h b/lib/librte_vhost/vhost-cuse/virtio-net-cdev.h > new file mode 100644 > index 0000000..6f98ce8 > --- /dev/null > +++ b/lib/librte_vhost/vhost-cuse/virtio-net-cdev.h > @@ -0,0 +1,43 @@ > +/*- > + * BSD LICENSE > + * > + * Copyright(c) 2010-2014 Intel Corporation. All rights reserved. > + * All rights reserved. > + * > + * Redistribution and use in source and binary forms, with or without > + * modification, are permitted provided that the following conditions > + * are met: > + * > + * * Redistributions of source code must retain the above copyright > + * notice, this list of conditions and the following disclaimer. > + * * Redistributions in binary form must reproduce the above copyright > + * notice, this list of conditions and the following disclaimer in > + * the documentation and/or other materials provided with the > + * distribution. > + * * Neither the name of Intel Corporation nor the names of its > + * contributors may be used to endorse or promote products derived > + * from this software without specific prior written permission. > + * > + * THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS > + * "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT > + * LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR > + * A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT > + * OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, > + * SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT > + * LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, > + * DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY > + * THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT > + * (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE > + * OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. > + */ > +#ifndef _VIRTIO_NET_CDEV_H > +#define _VIRTIO_NET_CDEV_H > +#include > + > +#include "vhost-net.h" > + > +int > +cuse_set_mem_table(struct vhost_device_ctx ctx, const struct vhost_memory *mem_regions_addr, > + uint32_t nregions); > + > +#endif > diff --git a/lib/librte_vhost/vhost-net-cdev.c b/lib/librte_vhost/vhost-net-cdev.c > deleted file mode 100644 > index 57c76cb..0000000 > --- a/lib/librte_vhost/vhost-net-cdev.c > +++ /dev/null > @@ -1,389 +0,0 @@ > -/*- > - * BSD LICENSE > - * > - * Copyright(c) 2010-2014 Intel Corporation. All rights reserved. > - * All rights reserved. > - * > - * Redistribution and use in source and binary forms, with or without > - * modification, are permitted provided that the following conditions > - * are met: > - * > - * * Redistributions of source code must retain the above copyright > - * notice, this list of conditions and the following disclaimer. > - * * Redistributions in binary form must reproduce the above copyright > - * notice, this list of conditions and the following disclaimer in > - * the documentation and/or other materials provided with the > - * distribution. > - * * Neither the name of Intel Corporation nor the names of its > - * contributors may be used to endorse or promote products derived > - * from this software without specific prior written permission. > - * > - * THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS > - * "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT > - * LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR > - * A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT > - * OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, > - * SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT > - * LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, > - * DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY > - * THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT > - * (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE > - * OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. > - */ > - > -#include > -#include > -#include > -#include > -#include > -#include > -#include > - > -#include > -#include > -#include > -#include > - > -#include "vhost-net-cdev.h" > - > -#define FUSE_OPT_DUMMY "\0\0" > -#define FUSE_OPT_FORE "-f\0\0" > -#define FUSE_OPT_NOMULTI "-s\0\0" > - > -static const uint32_t default_major = 231; > -static const uint32_t default_minor = 1; > -static const char cuse_device_name[] = "/dev/cuse"; > -static const char default_cdev[] = "vhost-net"; > - > -static struct fuse_session *session; > -static struct vhost_net_device_ops const *ops; > - > -/* > - * Returns vhost_device_ctx from given fuse_req_t. The index is populated later > - * when the device is added to the device linked list. > - */ > -static struct vhost_device_ctx > -fuse_req_to_vhost_ctx(fuse_req_t req, struct fuse_file_info *fi) > -{ > - struct vhost_device_ctx ctx; > - struct fuse_ctx const *const req_ctx = fuse_req_ctx(req); > - > - ctx.pid = req_ctx->pid; > - ctx.fh = fi->fh; > - > - return ctx; > -} > - > -/* > - * When the device is created in QEMU it gets initialised here and > - * added to the device linked list. > - */ > -static void > -vhost_net_open(fuse_req_t req, struct fuse_file_info *fi) > -{ > - struct vhost_device_ctx ctx = fuse_req_to_vhost_ctx(req, fi); > - int err = 0; > - > - err = ops->new_device(ctx); > - if (err == -1) { > - fuse_reply_err(req, EPERM); > - return; > - } > - > - fi->fh = err; > - > - RTE_LOG(INFO, VHOST_CONFIG, > - "(%"PRIu64") Device configuration started\n", fi->fh); > - fuse_reply_open(req, fi); > -} > - > -/* > - * When QEMU is shutdown or killed the device gets released. > - */ > -static void > -vhost_net_release(fuse_req_t req, struct fuse_file_info *fi) > -{ > - int err = 0; > - struct vhost_device_ctx ctx = fuse_req_to_vhost_ctx(req, fi); > - > - ops->destroy_device(ctx); > - RTE_LOG(INFO, VHOST_CONFIG, "(%"PRIu64") Device released\n", ctx.fh); > - fuse_reply_err(req, err); > -} > - > -/* > - * Boilerplate code for CUSE IOCTL > - * Implicit arguments: ctx, req, result. > - */ > -#define VHOST_IOCTL(func) do { \ > - result = (func)(ctx); \ > - fuse_reply_ioctl(req, result, NULL, 0); \ > -} while (0) > - > -/* > - * Boilerplate IOCTL RETRY > - * Implicit arguments: req. > - */ > -#define VHOST_IOCTL_RETRY(size_r, size_w) do { \ > - struct iovec iov_r = { arg, (size_r) }; \ > - struct iovec iov_w = { arg, (size_w) }; \ > - fuse_reply_ioctl_retry(req, &iov_r, \ > - (size_r) ? 1 : 0, &iov_w, (size_w) ? 1 : 0);\ > -} while (0) > - > -/* > - * Boilerplate code for CUSE Read IOCTL > - * Implicit arguments: ctx, req, result, in_bufsz, in_buf. > - */ > -#define VHOST_IOCTL_R(type, var, func) do { \ > - if (!in_bufsz) { \ > - VHOST_IOCTL_RETRY(sizeof(type), 0);\ > - } else { \ > - (var) = *(const type*)in_buf; \ > - result = func(ctx, &(var)); \ > - fuse_reply_ioctl(req, result, NULL, 0);\ > - } \ > -} while (0) > - > -/* > - * Boilerplate code for CUSE Write IOCTL > - * Implicit arguments: ctx, req, result, out_bufsz. > - */ > -#define VHOST_IOCTL_W(type, var, func) do { \ > - if (!out_bufsz) { \ > - VHOST_IOCTL_RETRY(0, sizeof(type));\ > - } else { \ > - result = (func)(ctx, &(var));\ > - fuse_reply_ioctl(req, result, &(var), sizeof(type));\ > - } \ > -} while (0) > - > -/* > - * Boilerplate code for CUSE Read/Write IOCTL > - * Implicit arguments: ctx, req, result, in_bufsz, in_buf. > - */ > -#define VHOST_IOCTL_RW(type1, var1, type2, var2, func) do { \ > - if (!in_bufsz) { \ > - VHOST_IOCTL_RETRY(sizeof(type1), sizeof(type2));\ > - } else { \ > - (var1) = *(const type1*) (in_buf); \ > - result = (func)(ctx, (var1), &(var2)); \ > - fuse_reply_ioctl(req, result, &(var2), sizeof(type2));\ > - } \ > -} while (0) > - > -/* > - * The IOCTLs are handled using CUSE/FUSE in userspace. Depending on the type > - * of IOCTL a buffer is requested to read or to write. This request is handled > - * by FUSE and the buffer is then given to CUSE. > - */ > -static void > -vhost_net_ioctl(fuse_req_t req, int cmd, void *arg, > - struct fuse_file_info *fi, __rte_unused unsigned flags, > - const void *in_buf, size_t in_bufsz, size_t out_bufsz) > -{ > - struct vhost_device_ctx ctx = fuse_req_to_vhost_ctx(req, fi); > - struct vhost_vring_file file; > - struct vhost_vring_state state; > - struct vhost_vring_addr addr; > - uint64_t features; > - uint32_t index; > - int result = 0; > - > - switch (cmd) { > - case VHOST_NET_SET_BACKEND: > - LOG_DEBUG(VHOST_CONFIG, > - "(%"PRIu64") IOCTL: VHOST_NET_SET_BACKEND\n", ctx.fh); > - VHOST_IOCTL_R(struct vhost_vring_file, file, ops->set_backend); > - break; > - > - case VHOST_GET_FEATURES: > - LOG_DEBUG(VHOST_CONFIG, > - "(%"PRIu64") IOCTL: VHOST_GET_FEATURES\n", ctx.fh); > - VHOST_IOCTL_W(uint64_t, features, ops->get_features); > - break; > - > - case VHOST_SET_FEATURES: > - LOG_DEBUG(VHOST_CONFIG, > - "(%"PRIu64") IOCTL: VHOST_SET_FEATURES\n", ctx.fh); > - VHOST_IOCTL_R(uint64_t, features, ops->set_features); > - break; > - > - case VHOST_RESET_OWNER: > - LOG_DEBUG(VHOST_CONFIG, > - "(%"PRIu64") IOCTL: VHOST_RESET_OWNER\n", ctx.fh); > - VHOST_IOCTL(ops->reset_owner); > - break; > - > - case VHOST_SET_OWNER: > - LOG_DEBUG(VHOST_CONFIG, > - "(%"PRIu64") IOCTL: VHOST_SET_OWNER\n", ctx.fh); > - VHOST_IOCTL(ops->set_owner); > - break; > - > - case VHOST_SET_MEM_TABLE: > - /*TODO fix race condition.*/ > - LOG_DEBUG(VHOST_CONFIG, > - "(%"PRIu64") IOCTL: VHOST_SET_MEM_TABLE\n", ctx.fh); > - static struct vhost_memory mem_temp; > - > - switch (in_bufsz) { > - case 0: > - VHOST_IOCTL_RETRY(sizeof(struct vhost_memory), 0); > - break; > - > - case sizeof(struct vhost_memory): > - mem_temp = *(const struct vhost_memory *) in_buf; > - > - if (mem_temp.nregions > 0) { > - VHOST_IOCTL_RETRY(sizeof(struct vhost_memory) + > - (sizeof(struct vhost_memory_region) * > - mem_temp.nregions), 0); > - } else { > - result = -1; > - fuse_reply_ioctl(req, result, NULL, 0); > - } > - break; > - > - default: > - result = ops->set_mem_table(ctx, > - in_buf, mem_temp.nregions); > - if (result) > - fuse_reply_err(req, EINVAL); > - else > - fuse_reply_ioctl(req, result, NULL, 0); > - } > - break; > - > - case VHOST_SET_VRING_NUM: > - LOG_DEBUG(VHOST_CONFIG, > - "(%"PRIu64") IOCTL: VHOST_SET_VRING_NUM\n", ctx.fh); > - VHOST_IOCTL_R(struct vhost_vring_state, state, > - ops->set_vring_num); > - break; > - > - case VHOST_SET_VRING_BASE: > - LOG_DEBUG(VHOST_CONFIG, > - "(%"PRIu64") IOCTL: VHOST_SET_VRING_BASE\n", ctx.fh); > - VHOST_IOCTL_R(struct vhost_vring_state, state, > - ops->set_vring_base); > - break; > - > - case VHOST_GET_VRING_BASE: > - LOG_DEBUG(VHOST_CONFIG, > - "(%"PRIu64") IOCTL: VHOST_GET_VRING_BASE\n", ctx.fh); > - VHOST_IOCTL_RW(uint32_t, index, > - struct vhost_vring_state, state, ops->get_vring_base); > - break; > - > - case VHOST_SET_VRING_ADDR: > - LOG_DEBUG(VHOST_CONFIG, > - "(%"PRIu64") IOCTL: VHOST_SET_VRING_ADDR\n", ctx.fh); > - VHOST_IOCTL_R(struct vhost_vring_addr, addr, > - ops->set_vring_addr); > - break; > - > - case VHOST_SET_VRING_KICK: > - LOG_DEBUG(VHOST_CONFIG, > - "(%"PRIu64") IOCTL: VHOST_SET_VRING_KICK\n", ctx.fh); > - VHOST_IOCTL_R(struct vhost_vring_file, file, > - ops->set_vring_kick); > - break; > - > - case VHOST_SET_VRING_CALL: > - LOG_DEBUG(VHOST_CONFIG, > - "(%"PRIu64") IOCTL: VHOST_SET_VRING_CALL\n", ctx.fh); > - VHOST_IOCTL_R(struct vhost_vring_file, file, > - ops->set_vring_call); > - break; > - > - default: > - RTE_LOG(ERR, VHOST_CONFIG, > - "(%"PRIu64") IOCTL: DOESN NOT EXIST\n", ctx.fh); > - result = -1; > - fuse_reply_ioctl(req, result, NULL, 0); > - } > - > - if (result < 0) > - LOG_DEBUG(VHOST_CONFIG, > - "(%"PRIu64") IOCTL: FAIL\n", ctx.fh); > - else > - LOG_DEBUG(VHOST_CONFIG, > - "(%"PRIu64") IOCTL: SUCCESS\n", ctx.fh); > -} > - > -/* > - * Structure handling open, release and ioctl function pointers is populated. > - */ > -static const struct cuse_lowlevel_ops vhost_net_ops = { > - .open = vhost_net_open, > - .release = vhost_net_release, > - .ioctl = vhost_net_ioctl, > -}; > - > -/* > - * cuse_info is populated and used to register the cuse device. > - * vhost_net_device_ops are also passed when the device is registered in app. > - */ > -int > -rte_vhost_driver_register(const char *dev_name) > -{ > - struct cuse_info cuse_info; > - char device_name[PATH_MAX] = ""; > - char char_device_name[PATH_MAX] = ""; > - const char *device_argv[] = { device_name }; > - > - char fuse_opt_dummy[] = FUSE_OPT_DUMMY; > - char fuse_opt_fore[] = FUSE_OPT_FORE; > - char fuse_opt_nomulti[] = FUSE_OPT_NOMULTI; > - char *fuse_argv[] = {fuse_opt_dummy, fuse_opt_fore, fuse_opt_nomulti}; > - > - if (access(cuse_device_name, R_OK | W_OK) < 0) { > - RTE_LOG(ERR, VHOST_CONFIG, > - "char device %s can't be accessed, maybe not exist\n", > - cuse_device_name); > - return -1; > - } > - > - /* > - * The device name is created. This is passed to QEMU so that it can > - * register the device with our application. > - */ > - snprintf(device_name, PATH_MAX, "DEVNAME=%s", dev_name); > - snprintf(char_device_name, PATH_MAX, "/dev/%s", dev_name); > - > - /* Check if device already exists. */ > - if (access(char_device_name, F_OK) != -1) { > - RTE_LOG(ERR, VHOST_CONFIG, > - "char device %s already exists\n", char_device_name); > - return -1; > - } > - > - memset(&cuse_info, 0, sizeof(cuse_info)); > - cuse_info.dev_major = default_major; > - cuse_info.dev_minor = default_minor; > - cuse_info.dev_info_argc = 1; > - cuse_info.dev_info_argv = device_argv; > - cuse_info.flags = CUSE_UNRESTRICTED_IOCTL; > - > - ops = get_virtio_net_callbacks(); > - > - session = cuse_lowlevel_setup(3, fuse_argv, > - &cuse_info, &vhost_net_ops, 0, NULL); > - if (session == NULL) > - return -1; > - > - return 0; > -} > - > -/** > - * The CUSE session is launched allowing the application to receive open, > - * release and ioctl calls. > - */ > -int > -rte_vhost_driver_session_start(void) > -{ > - fuse_session_loop(session); > - > - return 0; > -} > diff --git a/lib/librte_vhost/vhost-net-cdev.h b/lib/librte_vhost/vhost-net-cdev.h > deleted file mode 100644 > index 03a5c57..0000000 > --- a/lib/librte_vhost/vhost-net-cdev.h > +++ /dev/null > @@ -1,113 +0,0 @@ > -/*- > - * BSD LICENSE > - * > - * Copyright(c) 2010-2014 Intel Corporation. All rights reserved. > - * All rights reserved. > - * > - * Redistribution and use in source and binary forms, with or without > - * modification, are permitted provided that the following conditions > - * are met: > - * > - * * Redistributions of source code must retain the above copyright > - * notice, this list of conditions and the following disclaimer. > - * * Redistributions in binary form must reproduce the above copyright > - * notice, this list of conditions and the following disclaimer in > - * the documentation and/or other materials provided with the > - * distribution. > - * * Neither the name of Intel Corporation nor the names of its > - * contributors may be used to endorse or promote products derived > - * from this software without specific prior written permission. > - * > - * THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS > - * "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT > - * LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR > - * A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT > - * OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, > - * SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT > - * LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, > - * DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY > - * THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT > - * (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE > - * OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. > - */ > - > -#ifndef _VHOST_NET_CDEV_H_ > -#define _VHOST_NET_CDEV_H_ > -#include > -#include > -#include > -#include > -#include > - > -#include > - > -/* Macros for printing using RTE_LOG */ > -#define RTE_LOGTYPE_VHOST_CONFIG RTE_LOGTYPE_USER1 > -#define RTE_LOGTYPE_VHOST_DATA RTE_LOGTYPE_USER1 > - > -#ifdef RTE_LIBRTE_VHOST_DEBUG > -#define VHOST_MAX_PRINT_BUFF 6072 > -#define LOG_LEVEL RTE_LOG_DEBUG > -#define LOG_DEBUG(log_type, fmt, args...) RTE_LOG(DEBUG, log_type, fmt, ##args) > -#define PRINT_PACKET(device, addr, size, header) do { \ > - char *pkt_addr = (char *)(addr); \ > - unsigned int index; \ > - char packet[VHOST_MAX_PRINT_BUFF]; \ > - \ > - if ((header)) \ > - snprintf(packet, VHOST_MAX_PRINT_BUFF, "(%"PRIu64") Header size %d: ", (device->device_fh), (size)); \ > - else \ > - snprintf(packet, VHOST_MAX_PRINT_BUFF, "(%"PRIu64") Packet size %d: ", (device->device_fh), (size)); \ > - for (index = 0; index < (size); index++) { \ > - snprintf(packet + strnlen(packet, VHOST_MAX_PRINT_BUFF), VHOST_MAX_PRINT_BUFF - strnlen(packet, VHOST_MAX_PRINT_BUFF), \ > - "%02hhx ", pkt_addr[index]); \ > - } \ > - snprintf(packet + strnlen(packet, VHOST_MAX_PRINT_BUFF), VHOST_MAX_PRINT_BUFF - strnlen(packet, VHOST_MAX_PRINT_BUFF), "\n"); \ > - \ > - LOG_DEBUG(VHOST_DATA, "%s", packet); \ > -} while (0) > -#else > -#define LOG_LEVEL RTE_LOG_INFO > -#define LOG_DEBUG(log_type, fmt, args...) do {} while (0) > -#define PRINT_PACKET(device, addr, size, header) do {} while (0) > -#endif > - > - > -/* > - * Structure used to identify device context. > - */ > -struct vhost_device_ctx { > - pid_t pid; /* PID of process calling the IOCTL. */ > - uint64_t fh; /* Populated with fi->fh to track the device index. */ > -}; > - > -/* > - * Structure contains function pointers to be defined in virtio-net.c. These > - * functions are called in CUSE context and are used to configure devices. > - */ > -struct vhost_net_device_ops { > - int (*new_device)(struct vhost_device_ctx); > - void (*destroy_device)(struct vhost_device_ctx); > - > - int (*get_features)(struct vhost_device_ctx, uint64_t *); > - int (*set_features)(struct vhost_device_ctx, uint64_t *); > - > - int (*set_mem_table)(struct vhost_device_ctx, const void *, uint32_t); > - > - int (*set_vring_num)(struct vhost_device_ctx, struct vhost_vring_state *); > - int (*set_vring_addr)(struct vhost_device_ctx, struct vhost_vring_addr *); > - int (*set_vring_base)(struct vhost_device_ctx, struct vhost_vring_state *); > - int (*get_vring_base)(struct vhost_device_ctx, uint32_t, struct vhost_vring_state *); > - > - int (*set_vring_kick)(struct vhost_device_ctx, struct vhost_vring_file *); > - int (*set_vring_call)(struct vhost_device_ctx, struct vhost_vring_file *); > - > - int (*set_backend)(struct vhost_device_ctx, struct vhost_vring_file *); > - > - int (*set_owner)(struct vhost_device_ctx); > - int (*reset_owner)(struct vhost_device_ctx); > -}; > - > - > -struct vhost_net_device_ops const *get_virtio_net_callbacks(void); > -#endif /* _VHOST_NET_CDEV_H_ */ > diff --git a/lib/librte_vhost/vhost-user/fd_man.c b/lib/librte_vhost/vhost-user/fd_man.c > new file mode 100644 > index 0000000..c7fd3f2 > --- /dev/null > +++ b/lib/librte_vhost/vhost-user/fd_man.c > @@ -0,0 +1,158 @@ > +#include > +#include > +#include > +#include > +#include > +#include > +#include > +#include > + > +#include > + > +#include "fd_man.h" > + > +/** > + * Returns the index in the fdset for a fd. > + * If fd is -1, it means to search for a free entry. > + * @return > + * Index for the fd, or -1 if fd isn't in the fdset. > + */ > +static int > +fdset_find_fd(struct fdset *pfdset, int fd) > +{ > + int i; > + > + for (i = 0; i < pfdset->num && pfdset->fd[i].fd != fd; i++); > + > + return i == pfdset->num ? -1 : i; > +} > + > +static int > +fdset_find_free_slot(struct fdset *pfdset) > +{ > + return fdset_find_fd(pfdset, -1); > + > +} > + > +static void > +fdset_add_fd(struct fdset *pfdset, int idx, int fd, fd_cb rcb, > + fd_cb wcb, uint64_t dat) > +{ > + struct fdentry *pfdentry = &pfdset->fd[idx]; > + > + pfdentry->fd = fd; > + pfdentry->rcb = rcb; > + pfdentry->wcb = wcb; > + pfdentry->dat = dat; > +} > + > +/** > + * Fill the read/write fdset with the fds in the fdset. > + * @return > + * the maximum fds filled in the read/write fd_set. > + */ > +static int > +fdset_fill(fd_set *rfset, fd_set *wfset, struct fdset *pfdset) > +{ > + struct fdentry *pfdentry; > + int i, maxfds = -1; > + int num = MAX_FDS; > + > + for (i = 0; i < num ; i++) { > + pfdentry = &pfdset->fd[i]; > + if (pfdentry->fd != -1) { > + int added = 0; > + if (pfdentry->rcb && rfset) { > + FD_SET(pfdentry->fd, rfset); > + added = 1; > + } > + if (pfdentry->wcb && wfset) { > + FD_SET(pfdentry->fd, wfset); > + added = 1; > + } > + if (added) > + maxfds = pfdentry->fd < maxfds ? > + maxfds : pfdentry->fd; > + } > + } > + return maxfds; > +} > + > +void > +fdset_init(struct fdset *pfdset) > +{ > + int i; > + > + for (i = 0; i < MAX_FDS; i++) > + pfdset->fd[i].fd = -1; > + pfdset->num = MAX_FDS; > + > +} > + > +/** > + * Register the fd in the fdset with its read/write handler and context. > + */ > +int > +fdset_add(struct fdset *pfdset, int fd, fd_cb rcb, fd_cb wcb, uint64_t dat) > +{ > + int i; > + > + if (fd == -1) > + return -1; > + > + /* Find a free slot in the list. */ > + i = fdset_find_free_slot(pfdset); > + if (i == -1) > + return -2; > + > + fdset_add_fd(pfdset, i, fd, rcb, wcb, dat); > + > + return 0; > +} > + > +/** > + * Unregister the fd from the fdset. > + */ > +void > +fdset_del(struct fdset *pfdset, int fd) > +{ > + int i; > + > + i = fdset_find_fd(pfdset, fd); > + if (i != -1) { > + pfdset->fd[i].fd = -1; > + } > +} > + > + > +void > +fdset_event_dispatch(struct fdset *pfdset) > +{ > + fd_set rfds,wfds; > + int i, maxfds; > + struct fdentry *pfdentry; > + int num = MAX_FDS; > + > + if (pfdset == NULL) > + return; > + while (1) { > + FD_ZERO(&rfds); > + FD_ZERO(&wfds); > + maxfds = fdset_fill(&rfds, &wfds, pfdset); > + /* fd management runs in one thread */ > + if (maxfds == -1) { > + return; > + } > + > + select(maxfds + 1, &rfds, &wfds, NULL, NULL); > + > + for (i = 0; i < num; i++) { > + pfdentry = &pfdset->fd[i]; > + if (FD_ISSET(pfdentry->fd, &rfds)) > + pfdentry->rcb(pfdentry->fd, pfdentry->dat); > + if (FD_ISSET(pfdentry->fd, &wfds)) > + pfdentry->wcb(pfdentry->fd, pfdentry->dat); > + } > + > + } > +} > diff --git a/lib/librte_vhost/vhost-user/fd_man.h b/lib/librte_vhost/vhost-user/fd_man.h > new file mode 100644 > index 0000000..57cc81d > --- /dev/null > +++ b/lib/librte_vhost/vhost-user/fd_man.h > @@ -0,0 +1,31 @@ > +#ifndef _FD_MAN_H_ > +#define _FD_MAN_H_ > +#include > + > +#define MAX_FDS 1024 > + > +typedef void (*fd_cb)(int fd, uint64_t dat); > + > +struct fdentry { > + int fd; /* -1 indicates this entry is empty */ > + fd_cb rcb; /* callback when this fd is readable. */ > + fd_cb wcb; /* callback when this fd is writeable.*/ > + uint64_t dat; /* fd context */ > +}; > + > +struct fdset { > + struct fdentry fd[MAX_FDS]; > + int num; > +}; > + > + > +void fdset_init(struct fdset *pfdset); > + > +int fdset_add(struct fdset *pfdset, int fd, fd_cb rcb, > + fd_cb wcb, uint64_t ctx); > + > +void fdset_del(struct fdset *pfdset, int fd); > + > +void fdset_event_dispatch(struct fdset *pfdset); > + > +#endif > diff --git a/lib/librte_vhost/vhost-user/vhost-net-user.c b/lib/librte_vhost/vhost-user/vhost-net-user.c > new file mode 100644 > index 0000000..34450f4 > --- /dev/null > +++ b/lib/librte_vhost/vhost-user/vhost-net-user.c > @@ -0,0 +1,417 @@ > +/*- > + * BSD LICENSE > + * > + * Copyright(c) 2010-2014 Intel Corporation. All rights reserved. > + * All rights reserved. > + * > + * Redistribution and use in source and binary forms, with or without > + * modification, are permitted provided that the following conditions > + * are met: > + * > + * * Redistributions of source code must retain the above copyright > + * notice, this list of conditions and the following disclaimer. > + * * Redistributions in binary form must reproduce the above copyright > + * notice, this list of conditions and the following disclaimer in > + * the documentation and/or other materials provided with the > + * distribution. > + * * Neither the name of Intel Corporation nor the names of its > + * contributors may be used to endorse or promote products derived > + * from this software without specific prior written permission. > + * > + * THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS > + * "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT > + * LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR > + * A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT > + * OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, > + * SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT > + * LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, > + * DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY > + * THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT > + * (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE > + * OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. > + */ > + > +#include > +#include > +#include > +#include > +#include > +#include > +#include > +#include > +#include > +#include > + > +#include > +#include > + > +#include "fd_man.h" > +#include "vhost-net-user.h" > +#include "vhost-net.h" > +#include "virtio-net-user.h" > + > +static void vserver_new_vq_conn(int fd, uint64_t data); > +static void vserver_message_handler(int fd, uint64_t dat); > +const struct vhost_net_device_ops *ops; > + > +static struct vhost_server *g_vhost_server; > + > +static const char *vhost_message_str[VHOST_USER_MAX] = > +{ > + [VHOST_USER_NONE] = "VHOST_USER_NONE", > + [VHOST_USER_GET_FEATURES] = "VHOST_USER_GET_FEATURES", > + [VHOST_USER_SET_FEATURES] = "VHOST_USER_SET_FEATURES", > + [VHOST_USER_SET_OWNER] = "VHOST_USER_SET_OWNER", > + [VHOST_USER_RESET_OWNER] = "VHOST_USER_RESET_OWNER", > + [VHOST_USER_SET_MEM_TABLE] = "VHOST_USER_SET_MEM_TABLE", > + [VHOST_USER_SET_LOG_BASE] = "VHOST_USER_SET_LOG_BASE", > + [VHOST_USER_SET_LOG_FD] = "VHOST_USER_SET_LOG_FD", > + [VHOST_USER_SET_VRING_NUM] = "VHOST_USER_SET_VRING_NUM", > + [VHOST_USER_SET_VRING_ADDR] = "VHOST_USER_SET_VRING_ADDR", > + [VHOST_USER_SET_VRING_BASE] = "VHOST_USER_SET_VRING_BASE", > + [VHOST_USER_GET_VRING_BASE] = "VHOST_USER_GET_VRING_BASE", > + [VHOST_USER_SET_VRING_KICK] = "VHOST_USER_SET_VRING_KICK", > + [VHOST_USER_SET_VRING_CALL] = "VHOST_USER_SET_VRING_CALL", > + [VHOST_USER_SET_VRING_ERR] = "VHOST_USER_SET_VRING_ERR" > +}; > + > +/** > + * Create a unix domain socket and bind to path. > + * @return > + * socket fd or -1 on failure > + */ > +static int > +uds_socket(const char *path) > +{ > + struct sockaddr_un un; > + int sockfd; > + int ret; > + > + if (path == NULL) > + return -1; > + > + sockfd = socket(AF_UNIX, SOCK_STREAM, 0); > + if (sockfd < 0) > + return -1; > + RTE_LOG(INFO, VHOST_CONFIG, "socket created, fd:%d\n", sockfd); > + > + memset(&un, 0, sizeof(un)); > + un.sun_family = AF_UNIX; > + snprintf(un.sun_path, sizeof(un.sun_path), "%s", path); > + ret = bind(sockfd, (struct sockaddr *)&un, sizeof(un)); > + if (ret == -1) > + goto err; > + RTE_LOG(INFO, VHOST_CONFIG, "bind to %s\n", path); > + > + ret = listen(sockfd, 1); > + if (ret == -1) > + goto err; > + > + return sockfd; > + > +err: > + close(sockfd); > + return -1; > +} > + > + > +/* return bytes# of read */ > +static int > +read_fd_message(int sockfd, char *buf, int buflen, int *fds, int fd_num) > +{ > + > + struct iovec iov; > + struct msghdr msgh = { 0 }; > + size_t fdsize = fd_num * sizeof(int); > + char control[CMSG_SPACE(fdsize)]; > + struct cmsghdr *cmsg; > + int ret; > + > + iov.iov_base = buf; > + iov.iov_len = buflen; > + > + msgh.msg_iov = &iov; > + msgh.msg_iovlen = 1; > + msgh.msg_control = control; > + msgh.msg_controllen = sizeof(control); > + > + ret = recvmsg(sockfd, &msgh, 0); > + if (ret <= 0) { > + RTE_LOG(ERR, VHOST_CONFIG, "%s failed\n", __func__); > + return ret; > + } > + /* ret == buflen */ > + if (msgh.msg_flags & (MSG_TRUNC | MSG_CTRUNC)) { > + RTE_LOG(ERR, VHOST_CONFIG, "%s failed\n", __func__); > + return -1; > + } > + > + for (cmsg = CMSG_FIRSTHDR(&msgh); cmsg != NULL; > + cmsg = CMSG_NXTHDR(&msgh, cmsg)) { > + if ( (cmsg->cmsg_level == SOL_SOCKET) && > + (cmsg->cmsg_type == SCM_RIGHTS)) { > + memcpy(fds, CMSG_DATA(cmsg), fdsize); > + break; > + } > + } > + return ret; > +} > + > +static int > +read_vhost_message(int sockfd, struct VhostUserMsg *msg) > +{ > + int ret; > + > + ret = read_fd_message(sockfd, (char *)msg, VHOST_USER_HDR_SIZE, > + msg->fds, VHOST_MEMORY_MAX_NREGIONS); > + if (ret <= 0) > + return ret; > + > + if (msg->size) { > + if (msg->size > sizeof(msg->payload)) { > + RTE_LOG(ERR, VHOST_CONFIG, > + "%s: invalid size:%d\n", __func__, msg->size); > + return -1; > + } > + ret = read(sockfd, &msg->payload, msg->size); > + if (ret == 0) > + return 0; > + if (ret != (int)msg->size) { > + printf("read control message failed\n"); > + return -1; > + } > + } > + > + return ret; > +} > + > +static int > +send_fd_message(int sockfd, char *buf, int buflen, int *fds, int fd_num) > +{ > + > + struct iovec iov; > + struct msghdr msgh = { 0 }; > + size_t fdsize = fd_num * sizeof(int); > + char control[CMSG_SPACE(fdsize)]; > + struct cmsghdr *cmsg; > + int ret; > + > + iov.iov_base = buf; > + iov.iov_len = buflen; > + msgh.msg_iov = &iov; > + msgh.msg_iovlen = 1; > + > + if (fds && fd_num > 0) { > + msgh.msg_control = control; > + msgh.msg_controllen = sizeof(control); > + cmsg = CMSG_FIRSTHDR(&msgh); > + cmsg->cmsg_len = CMSG_LEN(fdsize); > + cmsg->cmsg_level = SOL_SOCKET; > + cmsg->cmsg_type = SCM_RIGHTS; > + memcpy(CMSG_DATA(cmsg), fds, fdsize); > + } else { > + msgh.msg_control = NULL; > + msgh.msg_controllen = 0; > + } > + > + do { > + ret = sendmsg(sockfd, &msgh, 0); > + } while (ret < 0 && errno == EINTR); > + > + if (ret < 0) { > + RTE_LOG(ERR, VHOST_CONFIG, "sendmsg error\n"); > + return -1; > + } > + > + return 0; > +} > + > +static int > +send_vhost_message(int sockfd, struct VhostUserMsg *msg) > +{ > + int ret; > + > + msg->flags &= ~VHOST_USER_VERSION_MASK; > + msg->flags |= VHOST_USER_VERSION; > + msg->flags |= VHOST_USER_REPLY_MASK; > + > + ret = send_fd_message(sockfd, (char *)msg, > + VHOST_USER_HDR_SIZE + msg->size, NULL, 0); > + > + return ret; > +} > + > +/* call back when there is new connection. */ > +static void > +vserver_new_vq_conn(int fd, uint64_t dat) > +{ > + struct vhost_server *vserver = (void *)(uintptr_t)dat; > + int conn_fd; > + uint32_t fh; > + struct vhost_device_ctx vdev_ctx = { 0 }; > + > + conn_fd = accept(fd, NULL, NULL); > + RTE_LOG(INFO, VHOST_CONFIG, > + "%s: new connection is %d\n", __func__, conn_fd); > + if (conn_fd < 0) > + return; > + > + fh = ops->new_device(vdev_ctx); > + RTE_LOG(INFO, VHOST_CONFIG, "new device, handle is %d\n", fh); > + > + fdset_add(&vserver->fdset, > + conn_fd, vserver_message_handler, NULL, fh); > +} > + > +/* callback when there is message on the connfd */ > +static void > +vserver_message_handler(int connfd, uint64_t dat) > +{ > + struct vhost_device_ctx ctx; > + uint32_t fh = (uint32_t)dat; > + struct VhostUserMsg msg; > + uint64_t features; > + int ret; > + > + ctx.fh = fh; > + ret = read_vhost_message(connfd, &msg); > + if (ret < 0) { > + printf("vhost read message failed\n"); > + > + /*TODO: cleanup */ > + close(connfd); > + fdset_del(&g_vhost_server->fdset, connfd); > + ops->destroy_device(ctx); > + > + return; > + } else if (ret == 0) { > + /*TODO: cleanup */ > + RTE_LOG(INFO, VHOST_CONFIG, > + "vhost peer closed\n"); > + close(connfd); > + fdset_del(&g_vhost_server->fdset, connfd); > + ops->destroy_device(ctx); > + > + return; > + } > + if (msg.request > VHOST_USER_MAX) { > + /*TODO: cleanup */ > + RTE_LOG(INFO, VHOST_CONFIG, > + "vhost read incorrect message\n"); > + close(connfd); > + fdset_del(&g_vhost_server->fdset, connfd); > + > + return; > + } > + > + RTE_LOG(INFO, VHOST_CONFIG, "read message %s\n", > + vhost_message_str[msg.request]); > + switch (msg.request) { > + case VHOST_USER_GET_FEATURES: > + ret = ops->get_features(ctx, &features); > + msg.payload.u64 = ret; > + msg.size = sizeof(msg.payload.u64); > + send_vhost_message(connfd, &msg); > + break; > + case VHOST_USER_SET_FEATURES: > + ops->set_features(ctx, &features); > + break; > + > + case VHOST_USER_SET_OWNER: > + ops->set_owner(ctx); > + break; > + case VHOST_USER_RESET_OWNER: > + ops->reset_owner(ctx); > + break; > + > + case VHOST_USER_SET_MEM_TABLE: > + user_set_mem_table(ctx, &msg); > + break; > + > + case VHOST_USER_SET_LOG_BASE: > + case VHOST_USER_SET_LOG_FD: > + RTE_LOG(INFO, VHOST_CONFIG, "not implemented.\n"); > + break; > + > + case VHOST_USER_SET_VRING_NUM: > + ops->set_vring_num(ctx, &msg.payload.state); > + break; > + case VHOST_USER_SET_VRING_ADDR: > + ops->set_vring_addr(ctx, &msg.payload.addr); > + break; > + case VHOST_USER_SET_VRING_BASE: > + ops->set_vring_base(ctx, &msg.payload.state); > + break; > + > + case VHOST_USER_GET_VRING_BASE: > + ret = ops->get_vring_base(ctx, msg.payload.state.index, > + &msg.payload.state); > + msg.size = sizeof(msg.payload.state); > + send_vhost_message(connfd, &msg); > + break; > + > + case VHOST_USER_SET_VRING_KICK: > + user_set_vring_kick(ctx, &msg); > + break; > + case VHOST_USER_SET_VRING_CALL: > + user_set_vring_call(ctx, &msg); > + break; > + > + case VHOST_USER_SET_VRING_ERR: > + RTE_LOG(INFO, VHOST_CONFIG, "not implemented\n"); > + break; > + > + default: > + break; > + > + } > +} > + > + > +/** > + * Creates and initialise the vhost server. > + */ > +int > +rte_vhost_driver_register(const char *path) > +{ > + > + struct vhost_server *vserver; > + > + if (g_vhost_server != NULL) > + return -1; > + > + vserver = calloc(sizeof(struct vhost_server), 1); > + /*TODO: all allocation is through DPDK memory allocation */ > + if (vserver == NULL) > + return -1; > + > + fdset_init(&vserver->fdset); > + > + unlink(path); > + > + vserver->listenfd = uds_socket(path); > + if (vserver->listenfd < 0) { > + free(vserver); > + return -1; > + } > + vserver->path = path; > + > + fdset_add(&vserver->fdset, vserver->listenfd, > + vserver_new_vq_conn, NULL, > + (uint64_t)(uintptr_t)vserver); > + > + ops = get_virtio_net_callbacks(); > + > + g_vhost_server = vserver; > + > + return 0; > +} > + > + > +int > +rte_vhost_driver_session_start(void) > +{ > + fdset_event_dispatch(&g_vhost_server->fdset); > + return 0; > +} > + > diff --git a/lib/librte_vhost/vhost-user/vhost-net-user.h b/lib/librte_vhost/vhost-user/vhost-net-user.h > new file mode 100644 > index 0000000..c9df9fa > --- /dev/null > +++ b/lib/librte_vhost/vhost-user/vhost-net-user.h > @@ -0,0 +1,74 @@ > +#ifndef _VHOST_NET_USER_H > +#define _VHOST_NET_USER_H > +#include > +#include > + > +#include "fd_man.h" > + > +struct vhost_server { > + const char *path; /**< The path the uds is bind to. */ > + int listenfd; /**< The listener sockfd. */ > + struct fdset fdset; /**< The fd list this vhost server manages. */ > +}; > + > +/*********** FROM hw/virtio/vhost-user.c *************************************/ > + > +#define VHOST_MEMORY_MAX_NREGIONS 8 > + > +typedef enum VhostUserRequest { > + VHOST_USER_NONE = 0, > + VHOST_USER_GET_FEATURES = 1, > + VHOST_USER_SET_FEATURES = 2, > + VHOST_USER_SET_OWNER = 3, > + VHOST_USER_RESET_OWNER = 4, > + VHOST_USER_SET_MEM_TABLE = 5, > + VHOST_USER_SET_LOG_BASE = 6, > + VHOST_USER_SET_LOG_FD = 7, > + VHOST_USER_SET_VRING_NUM = 8, > + VHOST_USER_SET_VRING_ADDR = 9, > + VHOST_USER_SET_VRING_BASE = 10, > + VHOST_USER_GET_VRING_BASE = 11, > + VHOST_USER_SET_VRING_KICK = 12, > + VHOST_USER_SET_VRING_CALL = 13, > + VHOST_USER_SET_VRING_ERR = 14, > + VHOST_USER_MAX > +} VhostUserRequest; > + > +typedef struct VhostUserMemoryRegion { > + uint64_t guest_phys_addr; > + uint64_t memory_size; > + uint64_t userspace_addr; > + uint64_t mmap_offset; > +} VhostUserMemoryRegion; > + > +typedef struct VhostUserMemory { > + uint32_t nregions; > + uint32_t padding; > + VhostUserMemoryRegion regions[VHOST_MEMORY_MAX_NREGIONS]; > +} VhostUserMemory; > + > +typedef struct VhostUserMsg { > + VhostUserRequest request; > + > +#define VHOST_USER_VERSION_MASK (0x3) > +#define VHOST_USER_REPLY_MASK (0x1 << 2) > + uint32_t flags; > + uint32_t size; /* the following payload size */ > + union { > +#define VHOST_USER_VRING_IDX_MASK (0xff) > +#define VHOST_USER_VRING_NOFD_MASK (0x1<<8) > + uint64_t u64; > + struct vhost_vring_state state; > + struct vhost_vring_addr addr; > + VhostUserMemory memory; > + } payload; > + int fds[VHOST_MEMORY_MAX_NREGIONS]; > +} __attribute__((packed)) VhostUserMsg; > + > +#define VHOST_USER_HDR_SIZE (intptr_t)(&((VhostUserMsg *)0)->payload.u64) > + > +/* The version of the protocol we support */ > +#define VHOST_USER_VERSION (0x1) > + > +/*****************************************************************************/ > +#endif > diff --git a/lib/librte_vhost/vhost-user/virtio-net-user.c b/lib/librte_vhost/vhost-user/virtio-net-user.c > new file mode 100644 > index 0000000..f38e6cc > --- /dev/null > +++ b/lib/librte_vhost/vhost-user/virtio-net-user.c > @@ -0,0 +1,208 @@ > +/*- > + * BSD LICENSE > + * > + * Copyright(c) 2010-2014 Intel Corporation. All rights reserved. > + * All rights reserved. > + * > + * Redistribution and use in source and binary forms, with or without > + * modification, are permitted provided that the following conditions > + * are met: > + * > + * * Redistributions of source code must retain the above copyright > + * notice, this list of conditions and the following disclaimer. > + * * Redistributions in binary form must reproduce the above copyright > + * notice, this list of conditions and the following disclaimer in > + * the documentation and/or other materials provided with the > + * distribution. > + * * Neither the name of Intel Corporation nor the names of its > + * contributors may be used to endorse or promote products derived > + * from this software without specific prior written permission. > + * > + * THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS > + * "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT > + * LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR > + * A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT > + * OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, > + * SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT > + * LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, > + * DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY > + * THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT > + * (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE > + * OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. > + */ > + > +#include > +#include > +#include > +#include > +#include > + > +#include > + > +#include "virtio-net-user.h" > +#include "vhost-net-user.h" > +#include "vhost-net.h" > + > +extern const struct vhost_net_device_ops *ops; > + > +#if 0 > +int > +user_set_mem_table(struct vhost_device_ctx ctx, struct VhostUserMsg *pmsg) > +{ > + unsigned int idx; > + struct VhostUserMemory memory = pmsg->payload.memory; > + struct virtio_memory_regions regions[VHOST_MEMORY_MAX_NREGIONS]; > + uint64_t mapped_address, base_address = 0, mem_size = 0; > + > + for (idx = 0; idx < memory.nregions; idx++) { > + if (memory.regions[idx].guest_phys_addr == 0) > + base_address = memory.regions[idx].userspace_addr; > + } > + if (base_address == 0) { > + RTE_LOG(ERR, VHOST_CONFIG, > + "couldn't find the mem region whose gpa is 0.\n"); > + return -1; > + } > + > + for (idx = 0; idx < memory.nregions; idx++) { > + uint64_t size = memory.regions[idx].userspace_addr - > + base_address + memory.regions[idx].memory_size; > + if (mem_size < size) > + mem_size = size; > + } > + > + /* > + * here we assume qemu will map only one file for memory allocation, > + * we only use fds[0] with offset 0. > + */ > + mapped_address = (uint64_t)(uintptr_t)mmap(NULL, mem_size, > + PROT_READ | PROT_WRITE, MAP_SHARED, pmsg->fds[0], 0); > + > + if (mapped_address == (uint64_t)(uintptr_t)MAP_FAILED) { > + RTE_LOG(ERR, VHOST_CONFIG, " mmap qemu guest failed.\n"); > + return -1; > + } > + > + for (idx = 0; idx < memory.nregions; idx++) { > + regions[idx].guest_phys_address = > + memory.regions[idx].guest_phys_addr; > + regions[idx].guest_phys_address_end = > + memory.regions[idx].guest_phys_addr + > + memory.regions[idx].memory_size; > + regions[idx].memory_size = memory.regions[idx].memory_size; > + regions[idx].userspace_address = > + memory.regions[idx].userspace_addr; > + > + regions[idx].address_offset = mapped_address - base_address + > + regions[idx].userspace_address - > + regions[idx].guest_phys_address; > + LOG_DEBUG(VHOST_CONFIG, > + "REGION: %u - GPA: %p - QEMU VA: %p - SIZE (%"PRIu64")\n", > + idx, > + (void *)(uintptr_t)regions[idx].guest_phys_address, > + (void *)(uintptr_t)regions[idx].userspace_address, > + regions[idx].memory_size); > + } > + ops->set_mem_table(ctx, regions, memory.nregions); > + return 0; > +} > + > +#else > + > +int > +user_set_mem_table(struct vhost_device_ctx ctx, struct VhostUserMsg *pmsg) > +{ > + unsigned int idx; > + struct VhostUserMemory memory = pmsg->payload.memory; > + struct virtio_memory_regions regions[VHOST_MEMORY_MAX_NREGIONS]; > + uint64_t mapped_address, base_address = 0; > + > + for (idx = 0; idx < memory.nregions; idx++) { > + if (memory.regions[idx].guest_phys_addr == 0) > + base_address = memory.regions[idx].userspace_addr; > + } > + if (base_address == 0) { > + RTE_LOG(ERR, VHOST_CONFIG, > + "couldn't find the mem region whose gpa is 0.\n"); > + return -1; > + } > + > + > + for (idx = 0; idx < memory.nregions; idx++) { > + regions[idx].guest_phys_address = > + memory.regions[idx].guest_phys_addr; > + regions[idx].guest_phys_address_end = > + memory.regions[idx].guest_phys_addr + > + memory.regions[idx].memory_size; > + regions[idx].memory_size = memory.regions[idx].memory_size; > + regions[idx].userspace_address = > + memory.regions[idx].userspace_addr; > +/* > + mapped_address = (uint64_t)(uintptr_t)mmap(NULL, > + regions[idx].memory_size, > + PROT_READ | PROT_WRITE, MAP_SHARED, > + pmsg->fds[idx], > + memory.regions[idx].mmap_offset); > +*/ > + > +/* This is ugly */ > + mapped_address = (uint64_t)(uintptr_t)mmap(NULL, > + regions[idx].memory_size + > + memory.regions[idx].mmap_offset, > + PROT_READ | PROT_WRITE, MAP_SHARED, > + pmsg->fds[idx], > + 0); > + printf("mapped to %p\n", (void *)mapped_address); > + > + if (mapped_address == (uint64_t)(uintptr_t)MAP_FAILED) { > + RTE_LOG(ERR, VHOST_CONFIG, " mmap qemu guest failed.\n"); > + return -1; > + } > + > +// printf("ret=%d\n", munmap((void *)mapped_address, (regions[idx].memory_size + memory.regions[idx].mmap_offset + 0x3FFFFFFF) & ~0x3FFFFFFF)); > +// printf("unaligned ret=%d\n", munmap((void *)mapped_address, (regions[idx].memory_size + memory.regions[idx].mmap_offset ) )); > + mapped_address += memory.regions[idx].mmap_offset; > + > + regions[idx].address_offset = mapped_address - > + regions[idx].guest_phys_address; > + LOG_DEBUG(VHOST_CONFIG, > + "REGION: %u - GPA: %p - QEMU VA: %p - SIZE (%"PRIu64")\n", > + idx, > + (void *)(uintptr_t)regions[idx].guest_phys_address, > + (void *)(uintptr_t)regions[idx].userspace_address, > + regions[idx].memory_size); > + } > + ops->set_mem_table(ctx, regions, memory.nregions); > + return 0; > +} > + > + > + > + > +#endif > + > + > +void > +user_set_vring_call(struct vhost_device_ctx ctx, struct VhostUserMsg *pmsg) > +{ > + struct vhost_vring_file file; > + > + file.index = pmsg->payload.u64 & VHOST_USER_VRING_IDX_MASK; > + file.fd = pmsg->fds[0]; > + RTE_LOG(INFO, VHOST_CONFIG, > + "vring call idx:%d file:%d\n", file.index, file.fd); > + ops->set_vring_call(ctx, &file); > +} > + > + > +void > +user_set_vring_kick(struct vhost_device_ctx ctx, struct VhostUserMsg *pmsg) > +{ > + struct vhost_vring_file file; > + > + file.index = pmsg->payload.u64 & VHOST_USER_VRING_IDX_MASK; > + file.fd = pmsg->fds[0]; > + RTE_LOG(INFO, VHOST_CONFIG, > + "vring kick idx:%d file:%d\n", file.index, file.fd); > + ops->set_vring_kick(ctx, &file); > +} > diff --git a/lib/librte_vhost/vhost-user/virtio-net-user.h b/lib/librte_vhost/vhost-user/virtio-net-user.h > new file mode 100644 > index 0000000..0969376 > --- /dev/null > +++ b/lib/librte_vhost/vhost-user/virtio-net-user.h > @@ -0,0 +1,11 @@ > +#ifndef _VIRTIO_NET_USER_H > +#define _VIRTIO_NET_USER_H > + > +#include "vhost-net.h" > +#include "vhost-net-user.h" > + > +int user_set_mem_table(struct vhost_device_ctx, struct VhostUserMsg *); > +void user_set_vring_kick(struct vhost_device_ctx, struct VhostUserMsg *); > +void user_set_vring_call(struct vhost_device_ctx, struct VhostUserMsg *); > + > +#endif > diff --git a/lib/librte_vhost/vhost_rxtx.c b/lib/librte_vhost/vhost_rxtx.c > index ccfd82f..8ff0301 100644 > --- a/lib/librte_vhost/vhost_rxtx.c > +++ b/lib/librte_vhost/vhost_rxtx.c > @@ -38,19 +38,14 @@ > #include > #include > > -#include "vhost-net-cdev.h" > +#include "vhost-net.h" > > -#define MAX_PKT_BURST 32 > +#define VHOST_MAX_PKT_BURST 64 > +#define VHOST_MAX_MRG_PKT_BURST 64 > > -/** > - * This function adds buffers to the virtio devices RX virtqueue. Buffers can > - * be received from the physical port or from another virtio device. A packet > - * count is returned to indicate the number of packets that are succesfully > - * added to the RX queue. This function works when mergeable is disabled. > - */ > -static inline uint32_t __attribute__((always_inline)) > -virtio_dev_rx(struct virtio_net *dev, uint16_t queue_id, > - struct rte_mbuf **pkts, uint32_t count) > + > +uint32_t > +rte_vhost_enqueue_burst(struct virtio_net *dev, uint16_t queue_id, struct rte_mbuf **pkts, uint32_t count) > { > struct vhost_virtqueue *vq; > struct vring_desc *desc; > @@ -59,26 +54,23 @@ virtio_dev_rx(struct virtio_net *dev, uint16_t queue_id, > struct virtio_net_hdr_mrg_rxbuf virtio_hdr = {{0, 0, 0, 0, 0, 0}, 0}; > uint64_t buff_addr = 0; > uint64_t buff_hdr_addr = 0; > - uint32_t head[MAX_PKT_BURST], packet_len = 0; > + uint32_t head[VHOST_MAX_PKT_BURST], packet_len = 0; > uint32_t head_idx, packet_success = 0; > + uint32_t mergeable, mrg_count = 0; > uint16_t avail_idx, res_cur_idx; > uint16_t res_base_idx, res_end_idx; > uint16_t free_entries; > uint8_t success = 0; > > - LOG_DEBUG(VHOST_DATA, "(%"PRIu64") virtio_dev_rx()\n", dev->device_fh); > + LOG_DEBUG(VHOST_DATA, "(%"PRIu64") %s()\n", dev->device_fh, __func__); > if (unlikely(queue_id != VIRTIO_RXQ)) { > LOG_DEBUG(VHOST_DATA, "mq isn't supported in this version.\n"); > return 0; > } > > vq = dev->virtqueue[VIRTIO_RXQ]; > - count = (count > MAX_PKT_BURST) ? MAX_PKT_BURST : count; > - > - /* > - * As many data cores may want access to available buffers, > - * they need to be reserved. > - */ > + count = (count > VHOST_MAX_PKT_BURST) ? VHOST_MAX_PKT_BURST : count; > + /* As many data cores may want access to available buffers, they need to be reserved. */ > do { > res_base_idx = vq->last_used_idx_res; > avail_idx = *((volatile uint16_t *)&vq->avail->idx); > @@ -93,21 +85,25 @@ virtio_dev_rx(struct virtio_net *dev, uint16_t queue_id, > > res_end_idx = res_base_idx + count; > /* vq->last_used_idx_res is atomically updated. */ > - /* TODO: Allow to disable cmpset if no concurrency in application. */ > + /* TODO: Allow to disable cmpset if no concurrency in application */ > success = rte_atomic16_cmpset(&vq->last_used_idx_res, > res_base_idx, res_end_idx); > + /* If there is contention here and failed, try again. */ > } while (unlikely(success == 0)); > res_cur_idx = res_base_idx; > LOG_DEBUG(VHOST_DATA, "(%"PRIu64") Current Index %d| End Index %d\n", > - dev->device_fh, res_cur_idx, res_end_idx); > + dev->device_fh, > + res_cur_idx, res_end_idx); > > /* Prefetch available ring to retrieve indexes. */ > rte_prefetch0(&vq->avail->ring[res_cur_idx & (vq->size - 1)]); > > + /* Check if the VIRTIO_NET_F_MRG_RXBUF feature is enabled. */ > + mergeable = dev->features & (1 << VIRTIO_NET_F_MRG_RXBUF); > + > /* Retrieve all of the head indexes first to avoid caching issues. */ > for (head_idx = 0; head_idx < count; head_idx++) > - head[head_idx] = vq->avail->ring[(res_cur_idx + head_idx) & > - (vq->size - 1)]; > + head[head_idx] = vq->avail->ring[(res_cur_idx + head_idx) & (vq->size - 1)]; > > /*Prefetch descriptor index. */ > rte_prefetch0(&vq->desc[head[packet_success]]); > @@ -123,46 +119,57 @@ virtio_dev_rx(struct virtio_net *dev, uint16_t queue_id, > /* Prefetch buffer address. */ > rte_prefetch0((void *)(uintptr_t)buff_addr); > > - /* Copy virtio_hdr to packet and increment buffer address */ > - buff_hdr_addr = buff_addr; > - packet_len = rte_pktmbuf_data_len(buff) + vq->vhost_hlen; > - > - /* > - * If the descriptors are chained the header and data are > - * placed in separate buffers. > - */ > - if (desc->flags & VRING_DESC_F_NEXT) { > - desc->len = vq->vhost_hlen; > - desc = &vq->desc[desc->next]; > - /* Buffer address translation. */ > - buff_addr = gpa_to_vva(dev, desc->addr); > - desc->len = rte_pktmbuf_data_len(buff); > + if (mergeable && (mrg_count != 0)) { > + desc->len = packet_len = rte_pktmbuf_data_len(buff); > } else { > - buff_addr += vq->vhost_hlen; > - desc->len = packet_len; > + /* Copy virtio_hdr to packet and increment buffer address */ > + buff_hdr_addr = buff_addr; > + packet_len = rte_pktmbuf_data_len(buff) + vq->vhost_hlen; > + > + /* > + * If the descriptors are chained the header and data are placed in > + * separate buffers. > + */ > + if (desc->flags & VRING_DESC_F_NEXT) { > + desc->len = vq->vhost_hlen; > + desc = &vq->desc[desc->next]; > + /* Buffer address translation. */ > + buff_addr = gpa_to_vva(dev, desc->addr); > + desc->len = rte_pktmbuf_data_len(buff); > + } else { > + buff_addr += vq->vhost_hlen; > + desc->len = packet_len; > + } > } > > + VHOST_PRINT_PACKET(dev, (uintptr_t)buff_addr, rte_pktmbuf_data_len(buff), 0); > + > /* Update used ring with desc information */ > - vq->used->ring[res_cur_idx & (vq->size - 1)].id = > - head[packet_success]; > + vq->used->ring[res_cur_idx & (vq->size - 1)].id = head[packet_success]; > vq->used->ring[res_cur_idx & (vq->size - 1)].len = packet_len; > > /* Copy mbuf data to buffer */ > - /* FIXME for sg mbuf and the case that desc couldn't hold the mbuf data */ > - rte_memcpy((void *)(uintptr_t)buff_addr, > - rte_pktmbuf_mtod(buff, const void *), > - rte_pktmbuf_data_len(buff)); > - PRINT_PACKET(dev, (uintptr_t)buff_addr, > - rte_pktmbuf_data_len(buff), 0); > + /* TODO fixme for sg mbuf and the case that desc couldn't hold the mbuf data */ > + rte_memcpy((void *)(uintptr_t)buff_addr, (const void *)buff->pkt.data, rte_pktmbuf_data_len(buff)); > > res_cur_idx++; > packet_success++; > > - rte_memcpy((void *)(uintptr_t)buff_hdr_addr, > - (const void *)&virtio_hdr, vq->vhost_hlen); > - > - PRINT_PACKET(dev, (uintptr_t)buff_hdr_addr, vq->vhost_hlen, 1); > - > + /* If mergeable is disabled then a header is required per buffer. */ > + if (!mergeable) { > + rte_memcpy((void *)(uintptr_t)buff_hdr_addr, (const void *)&virtio_hdr, vq->vhost_hlen); > + VHOST_PRINT_PACKET(dev, (uintptr_t)buff_hdr_addr, vq->vhost_hlen, 1); > + } else { > + mrg_count++; > + /* Merge buffer can only handle so many buffers at a time. Tell the guest if this limit is reached. */ > + if ((mrg_count == VHOST_MAX_MRG_PKT_BURST) || (res_cur_idx == res_end_idx)) { > + virtio_hdr.num_buffers = mrg_count; > + LOG_DEBUG(VHOST_DATA, "(%"PRIu64") RX: Num merge buffers %d\n", dev->device_fh, virtio_hdr.num_buffers); > + rte_memcpy((void *)(uintptr_t)buff_hdr_addr, (const void *)&virtio_hdr, vq->vhost_hlen); > + VHOST_PRINT_PACKET(dev, (uintptr_t)buff_hdr_addr, vq->vhost_hlen, 1); > + mrg_count = 0; > + } > + } > if (res_cur_idx < res_end_idx) { > /* Prefetch descriptor index. */ > rte_prefetch0(&vq->desc[head[packet_success]]); > @@ -184,357 +191,18 @@ virtio_dev_rx(struct virtio_net *dev, uint16_t queue_id, > return count; > } > > -static inline uint32_t __attribute__((always_inline)) > -copy_from_mbuf_to_vring(struct virtio_net *dev, uint16_t res_base_idx, > - uint16_t res_end_idx, struct rte_mbuf *pkt) > -{ > - uint32_t vec_idx = 0; > - uint32_t entry_success = 0; > - struct vhost_virtqueue *vq; > - /* The virtio_hdr is initialised to 0. */ > - struct virtio_net_hdr_mrg_rxbuf virtio_hdr = { > - {0, 0, 0, 0, 0, 0}, 0}; > - uint16_t cur_idx = res_base_idx; > - uint64_t vb_addr = 0; > - uint64_t vb_hdr_addr = 0; > - uint32_t seg_offset = 0; > - uint32_t vb_offset = 0; > - uint32_t seg_avail; > - uint32_t vb_avail; > - uint32_t cpy_len, entry_len; > - > - if (pkt == NULL) > - return 0; > - > - LOG_DEBUG(VHOST_DATA, "(%"PRIu64") Current Index %d| " > - "End Index %d\n", > - dev->device_fh, cur_idx, res_end_idx); > - > - /* > - * Convert from gpa to vva > - * (guest physical addr -> vhost virtual addr) > - */ > - vq = dev->virtqueue[VIRTIO_RXQ]; > - vb_addr = > - gpa_to_vva(dev, vq->buf_vec[vec_idx].buf_addr); > - vb_hdr_addr = vb_addr; > - > - /* Prefetch buffer address. */ > - rte_prefetch0((void *)(uintptr_t)vb_addr); > - > - virtio_hdr.num_buffers = res_end_idx - res_base_idx; > - > - LOG_DEBUG(VHOST_DATA, "(%"PRIu64") RX: Num merge buffers %d\n", > - dev->device_fh, virtio_hdr.num_buffers); > > - rte_memcpy((void *)(uintptr_t)vb_hdr_addr, > - (const void *)&virtio_hdr, vq->vhost_hlen); > - > - PRINT_PACKET(dev, (uintptr_t)vb_hdr_addr, vq->vhost_hlen, 1); > - > - seg_avail = rte_pktmbuf_data_len(pkt); > - vb_offset = vq->vhost_hlen; > - vb_avail = > - vq->buf_vec[vec_idx].buf_len - vq->vhost_hlen; > - > - entry_len = vq->vhost_hlen; > - > - if (vb_avail == 0) { > - uint32_t desc_idx = > - vq->buf_vec[vec_idx].desc_idx; > - vq->desc[desc_idx].len = vq->vhost_hlen; > - > - if ((vq->desc[desc_idx].flags > - & VRING_DESC_F_NEXT) == 0) { > - /* Update used ring with desc information */ > - vq->used->ring[cur_idx & (vq->size - 1)].id > - = vq->buf_vec[vec_idx].desc_idx; > - vq->used->ring[cur_idx & (vq->size - 1)].len > - = entry_len; > - > - entry_len = 0; > - cur_idx++; > - entry_success++; > - } > - > - vec_idx++; > - vb_addr = > - gpa_to_vva(dev, vq->buf_vec[vec_idx].buf_addr); > - > - /* Prefetch buffer address. */ > - rte_prefetch0((void *)(uintptr_t)vb_addr); > - vb_offset = 0; > - vb_avail = vq->buf_vec[vec_idx].buf_len; > - } > - > - cpy_len = RTE_MIN(vb_avail, seg_avail); > - > - while (cpy_len > 0) { > - /* Copy mbuf data to vring buffer */ > - rte_memcpy((void *)(uintptr_t)(vb_addr + vb_offset), > - (const void *)(rte_pktmbuf_mtod(pkt, char*) + seg_offset), > - cpy_len); > - > - PRINT_PACKET(dev, > - (uintptr_t)(vb_addr + vb_offset), > - cpy_len, 0); > - > - seg_offset += cpy_len; > - vb_offset += cpy_len; > - seg_avail -= cpy_len; > - vb_avail -= cpy_len; > - entry_len += cpy_len; > - > - if (seg_avail != 0) { > - /* > - * The virtio buffer in this vring > - * entry reach to its end. > - * But the segment doesn't complete. > - */ > - if ((vq->desc[vq->buf_vec[vec_idx].desc_idx].flags & > - VRING_DESC_F_NEXT) == 0) { > - /* Update used ring with desc information */ > - vq->used->ring[cur_idx & (vq->size - 1)].id > - = vq->buf_vec[vec_idx].desc_idx; > - vq->used->ring[cur_idx & (vq->size - 1)].len > - = entry_len; > - entry_len = 0; > - cur_idx++; > - entry_success++; > - } > - > - vec_idx++; > - vb_addr = gpa_to_vva(dev, > - vq->buf_vec[vec_idx].buf_addr); > - vb_offset = 0; > - vb_avail = vq->buf_vec[vec_idx].buf_len; > - cpy_len = RTE_MIN(vb_avail, seg_avail); > - } else { > - /* > - * This current segment complete, need continue to > - * check if the whole packet complete or not. > - */ > - pkt = pkt->next; > - if (pkt != NULL) { > - /* > - * There are more segments. > - */ > - if (vb_avail == 0) { > - /* > - * This current buffer from vring is > - * used up, need fetch next buffer > - * from buf_vec. > - */ > - uint32_t desc_idx = > - vq->buf_vec[vec_idx].desc_idx; > - vq->desc[desc_idx].len = vb_offset; > - > - if ((vq->desc[desc_idx].flags & > - VRING_DESC_F_NEXT) == 0) { > - uint16_t wrapped_idx = > - cur_idx & (vq->size - 1); > - /* > - * Update used ring with the > - * descriptor information > - */ > - vq->used->ring[wrapped_idx].id > - = desc_idx; > - vq->used->ring[wrapped_idx].len > - = entry_len; > - entry_success++; > - entry_len = 0; > - cur_idx++; > - } > - > - /* Get next buffer from buf_vec. */ > - vec_idx++; > - vb_addr = gpa_to_vva(dev, > - vq->buf_vec[vec_idx].buf_addr); > - vb_avail = > - vq->buf_vec[vec_idx].buf_len; > - vb_offset = 0; > - } > - > - seg_offset = 0; > - seg_avail = rte_pktmbuf_data_len(pkt); > - cpy_len = RTE_MIN(vb_avail, seg_avail); > - } else { > - /* > - * This whole packet completes. > - */ > - uint32_t desc_idx = > - vq->buf_vec[vec_idx].desc_idx; > - vq->desc[desc_idx].len = vb_offset; > - > - while (vq->desc[desc_idx].flags & > - VRING_DESC_F_NEXT) { > - desc_idx = vq->desc[desc_idx].next; > - vq->desc[desc_idx].len = 0; > - } > - > - /* Update used ring with desc information */ > - vq->used->ring[cur_idx & (vq->size - 1)].id > - = vq->buf_vec[vec_idx].desc_idx; > - vq->used->ring[cur_idx & (vq->size - 1)].len > - = entry_len; > - entry_len = 0; > - cur_idx++; > - entry_success++; > - seg_avail = 0; > - cpy_len = RTE_MIN(vb_avail, seg_avail); > - } > - } > - } > - > - return entry_success; > -} > - > -/* > - * This function works for mergeable RX. > - */ > -static inline uint32_t __attribute__((always_inline)) > -virtio_dev_merge_rx(struct virtio_net *dev, uint16_t queue_id, > - struct rte_mbuf **pkts, uint32_t count) > +uint32_t > +rte_vhost_dequeue_burst(struct virtio_net *dev, uint16_t queue_id, struct rte_mempool *mbuf_pool, struct rte_mbuf **pkts, uint32_t count) > { > - struct vhost_virtqueue *vq; > - uint32_t pkt_idx = 0, entry_success = 0; > - uint16_t avail_idx, res_cur_idx; > - uint16_t res_base_idx, res_end_idx; > - uint8_t success = 0; > - > - LOG_DEBUG(VHOST_DATA, "(%"PRIu64") virtio_dev_merge_rx()\n", > - dev->device_fh); > - if (unlikely(queue_id != VIRTIO_RXQ)) { > - LOG_DEBUG(VHOST_DATA, "mq isn't supported in this version.\n"); > - } > - > - vq = dev->virtqueue[VIRTIO_RXQ]; > - count = RTE_MIN((uint32_t)MAX_PKT_BURST, count); > - > - if (count == 0) > - return 0; > - > - for (pkt_idx = 0; pkt_idx < count; pkt_idx++) { > - uint32_t secure_len = 0; > - uint16_t need_cnt; > - uint32_t vec_idx = 0; > - uint32_t pkt_len = pkts[pkt_idx]->pkt_len + vq->vhost_hlen; > - uint16_t i, id; > - > - do { > - /* > - * As many data cores may want access to available > - * buffers, they need to be reserved. > - */ > - res_base_idx = vq->last_used_idx_res; > - res_cur_idx = res_base_idx; > - > - do { > - avail_idx = *((volatile uint16_t *)&vq->avail->idx); > - if (unlikely(res_cur_idx == avail_idx)) { > - LOG_DEBUG(VHOST_DATA, > - "(%"PRIu64") Failed " > - "to get enough desc from " > - "vring\n", > - dev->device_fh); > - return pkt_idx; > - } else { > - uint16_t wrapped_idx = > - (res_cur_idx) & (vq->size - 1); > - uint32_t idx = > - vq->avail->ring[wrapped_idx]; > - uint8_t next_desc; > - > - do { > - next_desc = 0; > - secure_len += vq->desc[idx].len; > - if (vq->desc[idx].flags & > - VRING_DESC_F_NEXT) { > - idx = vq->desc[idx].next; > - next_desc = 1; > - } > - } while (next_desc); > - > - res_cur_idx++; > - } > - } while (pkt_len > secure_len); > - > - /* vq->last_used_idx_res is atomically updated. */ > - success = rte_atomic16_cmpset(&vq->last_used_idx_res, > - res_base_idx, > - res_cur_idx); > - } while (success == 0); > - > - id = res_base_idx; > - need_cnt = res_cur_idx - res_base_idx; > - > - for (i = 0; i < need_cnt; i++, id++) { > - uint16_t wrapped_idx = id & (vq->size - 1); > - uint32_t idx = vq->avail->ring[wrapped_idx]; > - uint8_t next_desc; > - do { > - next_desc = 0; > - vq->buf_vec[vec_idx].buf_addr = > - vq->desc[idx].addr; > - vq->buf_vec[vec_idx].buf_len = > - vq->desc[idx].len; > - vq->buf_vec[vec_idx].desc_idx = idx; > - vec_idx++; > - > - if (vq->desc[idx].flags & VRING_DESC_F_NEXT) { > - idx = vq->desc[idx].next; > - next_desc = 1; > - } > - } while (next_desc); > - } > - > - res_end_idx = res_cur_idx; > - > - entry_success = copy_from_mbuf_to_vring(dev, res_base_idx, > - res_end_idx, pkts[pkt_idx]); > - > - rte_compiler_barrier(); > - > - /* > - * Wait until it's our turn to add our buffer > - * to the used ring. > - */ > - while (unlikely(vq->last_used_idx != res_base_idx)) > - rte_pause(); > - > - *(volatile uint16_t *)&vq->used->idx += entry_success; > - vq->last_used_idx = res_end_idx; > - > - /* Kick the guest if necessary. */ > - if (!(vq->avail->flags & VRING_AVAIL_F_NO_INTERRUPT)) > - eventfd_write((int)vq->kickfd, 1); > - } > - > - return count; > -} > - > -uint16_t > -rte_vhost_enqueue_burst(struct virtio_net *dev, uint16_t queue_id, > - struct rte_mbuf **pkts, uint16_t count) > -{ > - if (unlikely(dev->features & (1 << VIRTIO_NET_F_MRG_RXBUF))) > - return virtio_dev_merge_rx(dev, queue_id, pkts, count); > - else > - return virtio_dev_rx(dev, queue_id, pkts, count); > -} > - > -uint16_t > -rte_vhost_dequeue_burst(struct virtio_net *dev, uint16_t queue_id, > - struct rte_mempool *mbuf_pool, struct rte_mbuf **pkts, uint16_t count) > -{ > - struct rte_mbuf *m, *prev; > + struct rte_mbuf *mbuf; > struct vhost_virtqueue *vq; > struct vring_desc *desc; > - uint64_t vb_addr = 0; > - uint32_t head[MAX_PKT_BURST]; > + uint64_t buff_addr = 0; > + uint32_t head[VHOST_MAX_PKT_BURST]; > uint32_t used_idx; > uint32_t i; > - uint16_t free_entries, entry_success = 0; > + uint16_t free_entries, packet_success = 0; > uint16_t avail_idx; > > if (unlikely(queue_id != VIRTIO_TXQ)) { > @@ -549,8 +217,8 @@ rte_vhost_dequeue_burst(struct virtio_net *dev, uint16_t queue_id, > if (vq->last_used_idx == avail_idx) > return 0; > > - LOG_DEBUG(VHOST_DATA, "%s (%"PRIu64")\n", __func__, > - dev->device_fh); > + LOG_DEBUG(VHOST_DATA, "(%"PRIu64") %s(%d->%d)\n", > + dev->device_fh, __func__, vq->last_used_idx, avail_idx); > > /* Prefetch available ring to retrieve head indexes. */ > rte_prefetch0(&vq->avail->ring[vq->last_used_idx & (vq->size - 1)]); > @@ -558,173 +226,68 @@ rte_vhost_dequeue_burst(struct virtio_net *dev, uint16_t queue_id, > /*get the number of free entries in the ring*/ > free_entries = (avail_idx - vq->last_used_idx); > > - free_entries = RTE_MIN(free_entries, count); > + if (free_entries > count) > + free_entries = count; > /* Limit to MAX_PKT_BURST. */ > - free_entries = RTE_MIN(free_entries, MAX_PKT_BURST); > + if (free_entries > VHOST_MAX_PKT_BURST) > + free_entries = VHOST_MAX_PKT_BURST; > > - LOG_DEBUG(VHOST_DATA, "(%"PRIu64") Buffers available %d\n", > - dev->device_fh, free_entries); > + LOG_DEBUG(VHOST_DATA, "(%"PRIu64") Buffers available %d\n", dev->device_fh, free_entries); > /* Retrieve all of the head indexes first to avoid caching issues. */ > for (i = 0; i < free_entries; i++) > head[i] = vq->avail->ring[(vq->last_used_idx + i) & (vq->size - 1)]; > > /* Prefetch descriptor index. */ > - rte_prefetch0(&vq->desc[head[entry_success]]); > + rte_prefetch0(&vq->desc[head[packet_success]]); > rte_prefetch0(&vq->used->ring[vq->last_used_idx & (vq->size - 1)]); > > - while (entry_success < free_entries) { > - uint32_t vb_avail, vb_offset; > - uint32_t seg_avail, seg_offset; > - uint32_t cpy_len; > - uint32_t seg_num = 0; > - struct rte_mbuf *cur; > - uint8_t alloc_err = 0; > - > - desc = &vq->desc[head[entry_success]]; > + while (packet_success < free_entries) { > + desc = &vq->desc[head[packet_success]]; > > /* Discard first buffer as it is the virtio header */ > desc = &vq->desc[desc->next]; > > /* Buffer address translation. */ > - vb_addr = gpa_to_vva(dev, desc->addr); > + buff_addr = gpa_to_vva(dev, desc->addr); > /* Prefetch buffer address. */ > - rte_prefetch0((void *)(uintptr_t)vb_addr); > + rte_prefetch0((void *)(uintptr_t)buff_addr); > > used_idx = vq->last_used_idx & (vq->size - 1); > > - if (entry_success < (free_entries - 1)) { > + if (packet_success < (free_entries - 1)) { > /* Prefetch descriptor index. */ > - rte_prefetch0(&vq->desc[head[entry_success+1]]); > + rte_prefetch0(&vq->desc[head[packet_success+1]]); > rte_prefetch0(&vq->used->ring[(used_idx + 1) & (vq->size - 1)]); > } > > /* Update used index buffer information. */ > - vq->used->ring[used_idx].id = head[entry_success]; > + vq->used->ring[used_idx].id = head[packet_success]; > vq->used->ring[used_idx].len = 0; > > - vb_offset = 0; > - vb_avail = desc->len; > - /* Allocate an mbuf and populate the structure. */ > - m = rte_pktmbuf_alloc(mbuf_pool); > - if (unlikely(m == NULL)) { > - RTE_LOG(ERR, VHOST_DATA, > - "Failed to allocate memory for mbuf.\n"); > - return entry_success; > + mbuf = rte_pktmbuf_alloc(mbuf_pool); > + if (unlikely(mbuf == NULL)) { > + RTE_LOG(ERR, VHOST_DATA, "Failed to allocate memory for mbuf.\n"); > + return packet_success; > } > - seg_offset = 0; > - seg_avail = m->buf_len - RTE_PKTMBUF_HEADROOM; > - cpy_len = RTE_MIN(vb_avail, seg_avail); > - > - PRINT_PACKET(dev, (uintptr_t)vb_addr, desc->len, 0); > - > - seg_num++; > - cur = m; > - prev = m; > - while (cpy_len != 0) { > - rte_memcpy((void *)(rte_pktmbuf_mtod(cur, char *) + seg_offset), > - (void *)((uintptr_t)(vb_addr + vb_offset)), > - cpy_len); > - > - seg_offset += cpy_len; > - vb_offset += cpy_len; > - vb_avail -= cpy_len; > - seg_avail -= cpy_len; > - > - if (vb_avail != 0) { > - /* > - * The segment reachs to its end, > - * while the virtio buffer in TX vring has > - * more data to be copied. > - */ > - cur->data_len = seg_offset; > - m->pkt_len += seg_offset; > - /* Allocate mbuf and populate the structure. */ > - cur = rte_pktmbuf_alloc(mbuf_pool); > - if (unlikely(cur == NULL)) { > - RTE_LOG(ERR, VHOST_DATA, "Failed to " > - "allocate memory for mbuf.\n"); > - rte_pktmbuf_free(m); > - alloc_err = 1; > - break; > - } > - > - seg_num++; > - prev->next = cur; > - prev = cur; > - seg_offset = 0; > - seg_avail = cur->buf_len - RTE_PKTMBUF_HEADROOM; > - } else { > - if (desc->flags & VRING_DESC_F_NEXT) { > - /* > - * There are more virtio buffers in > - * same vring entry need to be copied. > - */ > - if (seg_avail == 0) { > - /* > - * The current segment hasn't > - * room to accomodate more > - * data. > - */ > - cur->data_len = seg_offset; > - m->pkt_len += seg_offset; > - /* > - * Allocate an mbuf and > - * populate the structure. > - */ > - cur = rte_pktmbuf_alloc(mbuf_pool); > - if (unlikely(cur == NULL)) { > - RTE_LOG(ERR, > - VHOST_DATA, > - "Failed to " > - "allocate memory " > - "for mbuf\n"); > - rte_pktmbuf_free(m); > - alloc_err = 1; > - break; > - } > - seg_num++; > - prev->next = cur; > - prev = cur; > - seg_offset = 0; > - seg_avail = cur->buf_len - RTE_PKTMBUF_HEADROOM; > - } > - > - desc = &vq->desc[desc->next]; > - > - /* Buffer address translation. */ > - vb_addr = gpa_to_vva(dev, desc->addr); > - /* Prefetch buffer address. */ > - rte_prefetch0((void *)(uintptr_t)vb_addr); > - vb_offset = 0; > - vb_avail = desc->len; > - > - PRINT_PACKET(dev, (uintptr_t)vb_addr, > - desc->len, 0); > - } else { > - /* The whole packet completes. */ > - cur->data_len = seg_offset; > - m->pkt_len += seg_offset; > - vb_avail = 0; > - } > - } > + mbuf->pkt.data_len = desc->len; > + mbuf->pkt.pkt_len = mbuf->pkt.data_len; > > - cpy_len = RTE_MIN(vb_avail, seg_avail); > - } > + rte_memcpy((void *) mbuf->pkt.data, > + (const void *) buff_addr, mbuf->pkt.data_len); > > - if (unlikely(alloc_err == 1)) > - break; > + pkts[packet_success] = mbuf; > > - m->nb_segs = seg_num; > + VHOST_PRINT_PACKET(dev, (uintptr_t)buff_addr, desc->len, 0); > > - pkts[entry_success] = m; > vq->last_used_idx++; > - entry_success++; > + packet_success++; > } > > rte_compiler_barrier(); > - vq->used->idx += entry_success; > + vq->used->idx += packet_success; > /* Kick guest if required. */ > if (!(vq->avail->flags & VRING_AVAIL_F_NO_INTERRUPT)) > eventfd_write((int)vq->kickfd, 1); > - return entry_success; > + > + return packet_success; > } > diff --git a/lib/librte_vhost/virtio-net.c b/lib/librte_vhost/virtio-net.c > index 852b6d1..516e743 100644 > --- a/lib/librte_vhost/virtio-net.c > +++ b/lib/librte_vhost/virtio-net.c > @@ -31,17 +31,14 @@ > * OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. > */ > > -#include > -#include > #include > #include > #include > #include > #include > -#include > -#include > #include > #include > +#include > > #include > #include > @@ -49,10 +46,8 @@ > #include > #include > > -#include "vhost-net-cdev.h" > -#include "eventfd_link/eventfd_link.h" > - > -/* > +#include "vhost-net.h" > +/** > * Device linked list structure for configuration. > */ > struct virtio_net_config_ll { > @@ -60,38 +55,15 @@ struct virtio_net_config_ll { > struct virtio_net_config_ll *next; /* Next dev on linked list.*/ > }; > > -const char eventfd_cdev[] = "/dev/eventfd-link"; > - > -/* device ops to add/remove device to/from data core. */ > +/* device ops to add/remove device to data core. */ > static struct virtio_net_device_ops const *notify_ops; > -/* root address of the linked list of managed virtio devices */ > +/* root address of the linked list in the configuration core. */ > static struct virtio_net_config_ll *ll_root; > > /* Features supported by this lib. */ > -#define VHOST_SUPPORTED_FEATURES ((1ULL << VIRTIO_NET_F_MRG_RXBUF) | \ > - (1ULL << VIRTIO_NET_F_CTRL_RX)) > +#define VHOST_SUPPORTED_FEATURES (1ULL << VIRTIO_NET_F_MRG_RXBUF) > static uint64_t VHOST_FEATURES = VHOST_SUPPORTED_FEATURES; > > -/* Line size for reading maps file. */ > -static const uint32_t BUFSIZE = PATH_MAX; > - > -/* Size of prot char array in procmap. */ > -#define PROT_SZ 5 > - > -/* Number of elements in procmap struct. */ > -#define PROCMAP_SZ 8 > - > -/* Structure containing information gathered from maps file. */ > -struct procmap { > - uint64_t va_start; /* Start virtual address in file. */ > - uint64_t len; /* Size of file. */ > - uint64_t pgoff; /* Not used. */ > - uint32_t maj; /* Not used. */ > - uint32_t min; /* Not used. */ > - uint32_t ino; /* Not used. */ > - char prot[PROT_SZ]; /* Not used. */ > - char fname[PATH_MAX]; /* File name. */ > -}; > > /* > * Converts QEMU virtual address to Vhost virtual address. This function is > @@ -110,199 +82,15 @@ qva_to_vva(struct virtio_net *dev, uint64_t qemu_va) > if ((qemu_va >= region->userspace_address) && > (qemu_va <= region->userspace_address + > region->memory_size)) { > - vhost_va = dev->mem->mapped_address + qemu_va - > - dev->mem->base_address; > + vhost_va = qemu_va + region->guest_phys_address + > + region->address_offset - > + region->userspace_address; > break; > } > } > return vhost_va; > } > > -/* > - * Locate the file containing QEMU's memory space and > - * map it to our address space. > - */ > -static int > -host_memory_map(struct virtio_net *dev, struct virtio_memory *mem, > - pid_t pid, uint64_t addr) > -{ > - struct dirent *dptr = NULL; > - struct procmap procmap; > - DIR *dp = NULL; > - int fd; > - int i; > - char memfile[PATH_MAX]; > - char mapfile[PATH_MAX]; > - char procdir[PATH_MAX]; > - char resolved_path[PATH_MAX]; > - char *path = NULL; > - FILE *fmap; > - void *map; > - uint8_t found = 0; > - char line[BUFSIZE]; > - char dlm[] = "- : "; > - char *str, *sp, *in[PROCMAP_SZ]; > - char *end = NULL; > - > - /* Path where mem files are located. */ > - snprintf(procdir, PATH_MAX, "/proc/%u/fd/", pid); > - /* Maps file used to locate mem file. */ > - snprintf(mapfile, PATH_MAX, "/proc/%u/maps", pid); > - > - fmap = fopen(mapfile, "r"); > - if (fmap == NULL) { > - RTE_LOG(ERR, VHOST_CONFIG, > - "(%"PRIu64") Failed to open maps file for pid %d\n", > - dev->device_fh, pid); > - return -1; > - } > - > - /* Read through maps file until we find out base_address. */ > - while (fgets(line, BUFSIZE, fmap) != 0) { > - str = line; > - errno = 0; > - /* Split line into fields. */ > - for (i = 0; i < PROCMAP_SZ; i++) { > - in[i] = strtok_r(str, &dlm[i], &sp); > - if ((in[i] == NULL) || (errno != 0)) { > - fclose(fmap); > - return -1; > - } > - str = NULL; > - } > - > - /* Convert/Copy each field as needed. */ > - procmap.va_start = strtoull(in[0], &end, 16); > - if ((in[0] == '\0') || (end == NULL) || (*end != '\0') || > - (errno != 0)) { > - fclose(fmap); > - return -1; > - } > - > - procmap.len = strtoull(in[1], &end, 16); > - if ((in[1] == '\0') || (end == NULL) || (*end != '\0') || > - (errno != 0)) { > - fclose(fmap); > - return -1; > - } > - > - procmap.pgoff = strtoull(in[3], &end, 16); > - if ((in[3] == '\0') || (end == NULL) || (*end != '\0') || > - (errno != 0)) { > - fclose(fmap); > - return -1; > - } > - > - procmap.maj = strtoul(in[4], &end, 16); > - if ((in[4] == '\0') || (end == NULL) || (*end != '\0') || > - (errno != 0)) { > - fclose(fmap); > - return -1; > - } > - > - procmap.min = strtoul(in[5], &end, 16); > - if ((in[5] == '\0') || (end == NULL) || (*end != '\0') || > - (errno != 0)) { > - fclose(fmap); > - return -1; > - } > - > - procmap.ino = strtoul(in[6], &end, 16); > - if ((in[6] == '\0') || (end == NULL) || (*end != '\0') || > - (errno != 0)) { > - fclose(fmap); > - return -1; > - } > - > - memcpy(&procmap.prot, in[2], PROT_SZ); > - memcpy(&procmap.fname, in[7], PATH_MAX); > - > - if (procmap.va_start == addr) { > - procmap.len = procmap.len - procmap.va_start; > - found = 1; > - break; > - } > - } > - fclose(fmap); > - > - if (!found) { > - RTE_LOG(ERR, VHOST_CONFIG, > - "(%"PRIu64") Failed to find memory file in pid %d maps file\n", > - dev->device_fh, pid); > - return -1; > - } > - > - /* Find the guest memory file among the process fds. */ > - dp = opendir(procdir); > - if (dp == NULL) { > - RTE_LOG(ERR, VHOST_CONFIG, > - "(%"PRIu64") Cannot open pid %d process directory\n", > - dev->device_fh, pid); > - return -1; > - } > - > - found = 0; > - > - /* Read the fd directory contents. */ > - while (NULL != (dptr = readdir(dp))) { > - snprintf(memfile, PATH_MAX, "/proc/%u/fd/%s", > - pid, dptr->d_name); > - path = realpath(memfile, resolved_path); > - if ((path == NULL) && (strlen(resolved_path) == 0)) { > - RTE_LOG(ERR, VHOST_CONFIG, > - "(%"PRIu64") Failed to resolve fd directory\n", > - dev->device_fh); > - closedir(dp); > - return -1; > - } > - if (strncmp(resolved_path, procmap.fname, > - strnlen(procmap.fname, PATH_MAX)) == 0) { > - found = 1; > - break; > - } > - } > - > - closedir(dp); > - > - if (found == 0) { > - RTE_LOG(ERR, VHOST_CONFIG, > - "(%"PRIu64") Failed to find memory file for pid %d\n", > - dev->device_fh, pid); > - return -1; > - } > - /* Open the shared memory file and map the memory into this process. */ > - fd = open(memfile, O_RDWR); > - > - if (fd == -1) { > - RTE_LOG(ERR, VHOST_CONFIG, > - "(%"PRIu64") Failed to open %s for pid %d\n", > - dev->device_fh, memfile, pid); > - return -1; > - } > - > - map = mmap(0, (size_t)procmap.len, PROT_READ|PROT_WRITE, > - MAP_POPULATE|MAP_SHARED, fd, 0); > - close(fd); > - > - if (map == MAP_FAILED) { > - RTE_LOG(ERR, VHOST_CONFIG, > - "(%"PRIu64") Error mapping the file %s for pid %d\n", > - dev->device_fh, memfile, pid); > - return -1; > - } > - > - /* Store the memory address and size in the device data structure */ > - mem->mapped_address = (uint64_t)(uintptr_t)map; > - mem->mapped_size = procmap.len; > - > - LOG_DEBUG(VHOST_CONFIG, > - "(%"PRIu64") Mem File: %s->%s - Size: %llu - VA: %p\n", > - dev->device_fh, > - memfile, resolved_path, > - (unsigned long long)mem->mapped_size, map); > - > - return 0; > -} > > /* > * Retrieves an entry from the devices configuration linked list. > @@ -376,7 +164,7 @@ add_config_ll_entry(struct virtio_net_config_ll *new_ll_dev) > } > > } > - > +/*TODO dpdk alloc/free if possible */ > /* > * Unmap any memory, close any file descriptors and > * free any memory owned by a device. > @@ -389,16 +177,17 @@ cleanup_device(struct virtio_net *dev) > munmap((void *)(uintptr_t)dev->mem->mapped_address, > (size_t)dev->mem->mapped_size); > free(dev->mem); > + dev->mem = NULL; > } > > /* Close any event notifiers opened by device. */ > - if (dev->virtqueue[VIRTIO_RXQ]->callfd) > + if (dev->virtqueue[VIRTIO_RXQ]->callfd > 0) > close((int)dev->virtqueue[VIRTIO_RXQ]->callfd); > - if (dev->virtqueue[VIRTIO_RXQ]->kickfd) > + if (dev->virtqueue[VIRTIO_RXQ]->kickfd > 0) > close((int)dev->virtqueue[VIRTIO_RXQ]->kickfd); > - if (dev->virtqueue[VIRTIO_TXQ]->callfd) > + if (dev->virtqueue[VIRTIO_TXQ]->callfd > 0) > close((int)dev->virtqueue[VIRTIO_TXQ]->callfd); > - if (dev->virtqueue[VIRTIO_TXQ]->kickfd) > + if (dev->virtqueue[VIRTIO_TXQ]->kickfd > 0) > close((int)dev->virtqueue[VIRTIO_TXQ]->kickfd); > } > > @@ -522,8 +311,8 @@ new_device(struct vhost_device_ctx ctx) > } > > /* > - * Function is called from the CUSE release function. This function will > - * cleanup the device and remove it from device configuration linked list. > + * Function is called from the CUSE release function. This function will cleanup > + * the device and remove it from device configuration linked list. > */ > static void > destroy_device(struct vhost_device_ctx ctx) > @@ -569,6 +358,7 @@ set_owner(struct vhost_device_ctx ctx) > return -1; > > return 0; > + /* TODO check ctx.fh is meaningfull here */ > } > > /* > @@ -651,14 +441,12 @@ set_features(struct vhost_device_ctx ctx, uint64_t *pu) > * This includes storing offsets used to translate buffer addresses. > */ > static int > -set_mem_table(struct vhost_device_ctx ctx, const void *mem_regions_addr, > - uint32_t nregions) > +set_mem_table(struct vhost_device_ctx ctx, > + const struct virtio_memory_regions *regions, uint32_t nregions) > { > struct virtio_net *dev; > - struct vhost_memory_region *mem_regions; > struct virtio_memory *mem; > - uint64_t size = offsetof(struct vhost_memory, regions); > - uint32_t regionidx, valid_regions; > + uint32_t regionidx; > > dev = get_device(ctx); > if (dev == NULL) > @@ -682,107 +470,24 @@ set_mem_table(struct vhost_device_ctx ctx, const void *mem_regions_addr, > > mem->nregions = nregions; > > - mem_regions = (void *)(uintptr_t) > - ((uint64_t)(uintptr_t)mem_regions_addr + size); > - > for (regionidx = 0; regionidx < mem->nregions; regionidx++) { > /* Populate the region structure for each region. */ > - mem->regions[regionidx].guest_phys_address = > - mem_regions[regionidx].guest_phys_addr; > - mem->regions[regionidx].guest_phys_address_end = > - mem->regions[regionidx].guest_phys_address + > - mem_regions[regionidx].memory_size; > - mem->regions[regionidx].memory_size = > - mem_regions[regionidx].memory_size; > - mem->regions[regionidx].userspace_address = > - mem_regions[regionidx].userspace_addr; > - > - LOG_DEBUG(VHOST_CONFIG, "(%"PRIu64") REGION: %u - GPA: %p - QEMU VA: %p - SIZE (%"PRIu64")\n", dev->device_fh, > - regionidx, > - (void *)(uintptr_t)mem->regions[regionidx].guest_phys_address, > - (void *)(uintptr_t)mem->regions[regionidx].userspace_address, > - mem->regions[regionidx].memory_size); > - > - /*set the base address mapping*/ > + mem->regions[regionidx] = regions[regionidx]; > if (mem->regions[regionidx].guest_phys_address == 0x0) { > mem->base_address = > mem->regions[regionidx].userspace_address; > - /* Map VM memory file */ > - if (host_memory_map(dev, mem, ctx.pid, > - mem->base_address) != 0) { > - free(mem); > - return -1; > - } > + mem->mapped_address = > + mem->regions[regionidx].address_offset; > } > } > > - /* Check that we have a valid base address. */ > - if (mem->base_address == 0) { > - RTE_LOG(ERR, VHOST_CONFIG, "(%"PRIu64") Failed to find base address of qemu memory file.\n", dev->device_fh); > - free(mem); > - return -1; > - } > - > - /* > - * Check if all of our regions have valid mappings. > - * Usually one does not exist in the QEMU memory file. > - */ > - valid_regions = mem->nregions; > - for (regionidx = 0; regionidx < mem->nregions; regionidx++) { > - if ((mem->regions[regionidx].userspace_address < > - mem->base_address) || > - (mem->regions[regionidx].userspace_address > > - (mem->base_address + mem->mapped_size))) > - valid_regions--; > - } > - > - /* > - * If a region does not have a valid mapping, > - * we rebuild our memory struct to contain only valid entries. > - */ > - if (valid_regions != mem->nregions) { > - LOG_DEBUG(VHOST_CONFIG, "(%"PRIu64") Not all memory regions exist in the QEMU mem file. Re-populating mem structure\n", > - dev->device_fh); > - > - /* > - * Re-populate the memory structure with only valid regions. > - * Invalid regions are over-written with memmove. > - */ > - valid_regions = 0; > - > - for (regionidx = mem->nregions; 0 != regionidx--;) { > - if ((mem->regions[regionidx].userspace_address < > - mem->base_address) || > - (mem->regions[regionidx].userspace_address > > - (mem->base_address + mem->mapped_size))) { > - memmove(&mem->regions[regionidx], > - &mem->regions[regionidx + 1], > - sizeof(struct virtio_memory_regions) * > - valid_regions); > - } else { > - valid_regions++; > - } > - } > - } > - mem->nregions = valid_regions; > + /*TODO addback the logic that remove invalid memory regions */ > dev->mem = mem; > > - /* > - * Calculate the address offset for each region. > - * This offset is used to identify the vhost virtual address > - * corresponding to a QEMU guest physical address. > - */ > - for (regionidx = 0; regionidx < dev->mem->nregions; regionidx++) { > - dev->mem->regions[regionidx].address_offset = > - dev->mem->regions[regionidx].userspace_address - > - dev->mem->base_address + > - dev->mem->mapped_address - > - dev->mem->regions[regionidx].guest_phys_address; > - > - } > return 0; > } > > + > /* > * Called from CUSE IOCTL: VHOST_SET_VRING_NUM > * The virtio device sends us the size of the descriptor ring. > @@ -896,38 +601,62 @@ get_vring_base(struct vhost_device_ctx ctx, uint32_t index, > /* State->index refers to the queue index. The txq is 1, rxq is 0. */ > state->num = dev->virtqueue[state->index]->last_used_idx; > > - return 0; > -} > + if (dev->flags & VIRTIO_DEV_RUNNING) { > + RTE_LOG(INFO, VHOST_CONFIG, > + "get_vring_base message is for release\n"); > + notify_ops->destroy_device(dev); > + /* > + * sync call. > + * when it returns, it means it si removed from data core. > + */ > + } > + /* TODO fix all munmap */ > + if (dev->mem) { > + munmap((void *)(uintptr_t)dev->mem->mapped_address, > + (size_t)dev->mem->mapped_size); > + free(dev->mem); > + dev->mem = NULL; > + } > > -/* > - * This function uses the eventfd_link kernel module to copy an eventfd file > - * descriptor provided by QEMU in to our process space. > - */ > -static int > -eventfd_copy(struct virtio_net *dev, struct eventfd_copy *eventfd_copy) > -{ > - int eventfd_link, ret; > > - /* Open the character device to the kernel module. */ > - eventfd_link = open(eventfd_cdev, O_RDWR); > - if (eventfd_link < 0) { > - RTE_LOG(ERR, VHOST_CONFIG, > - "(%"PRIu64") eventfd_link module is not loaded\n", > - dev->device_fh); > - return -1; > - } > + if (dev->virtqueue[VIRTIO_RXQ]->callfd > 0) > + close((int)dev->virtqueue[VIRTIO_RXQ]->callfd); > + dev->virtqueue[VIRTIO_RXQ]->callfd = -1; > + if (dev->virtqueue[VIRTIO_TXQ]->callfd > 0) > + close((int)dev->virtqueue[VIRTIO_TXQ]->callfd); > + dev->virtqueue[VIRTIO_TXQ]->callfd = -1; > + /* We don't cleanup callfd here as we willn't get CALLFD again */ > + > + dev->virtqueue[VIRTIO_RXQ]->desc = NULL; > + dev->virtqueue[VIRTIO_RXQ]->avail = NULL; > + dev->virtqueue[VIRTIO_RXQ]->used = NULL; > + dev->virtqueue[VIRTIO_RXQ]->last_used_idx = 0; > + dev->virtqueue[VIRTIO_RXQ]->last_used_idx_res = 0; > + > + dev->virtqueue[VIRTIO_TXQ]->desc = NULL; > + dev->virtqueue[VIRTIO_TXQ]->avail = NULL; > + dev->virtqueue[VIRTIO_TXQ]->used = NULL; > + dev->virtqueue[VIRTIO_TXQ]->last_used_idx = 0; > + dev->virtqueue[VIRTIO_TXQ]->last_used_idx_res = 0; > > - /* Call the IOCTL to copy the eventfd. */ > - ret = ioctl(eventfd_link, EVENTFD_COPY, eventfd_copy); > - close(eventfd_link); > > - if (ret < 0) { > - RTE_LOG(ERR, VHOST_CONFIG, > - "(%"PRIu64") EVENTFD_COPY ioctl failed\n", > - dev->device_fh); > - return -1; > - } > + return 0; > +} > > +static int > +virtio_is_ready(struct virtio_net *dev, int index) > +{ > + struct vhost_virtqueue *vq1, *vq2; > + /* mq support in future.*/ > + vq1 = dev->virtqueue[index]; > + vq2 = dev->virtqueue[index ^ 1]; > + if (vq1 && vq2 && vq1->desc && vq2->desc && > + (vq1->kickfd > 0) && (vq1->callfd > 0) && > + (vq2->kickfd > 0) && (vq2->callfd > 0)) { > + LOG_DEBUG(VHOST_CONFIG, "virtio is ready for processing.\n"); > + return 1; > + } > + LOG_DEBUG(VHOST_CONFIG, "virtio isn't ready for processing.\n"); > return 0; > } > > @@ -940,7 +669,6 @@ static int > set_vring_call(struct vhost_device_ctx ctx, struct vhost_vring_file *file) > { > struct virtio_net *dev; > - struct eventfd_copy eventfd_kick; > struct vhost_virtqueue *vq; > > dev = get_device(ctx); > @@ -953,14 +681,7 @@ set_vring_call(struct vhost_device_ctx ctx, struct vhost_vring_file *file) > if (vq->kickfd) > close((int)vq->kickfd); > > - /* Populate the eventfd_copy structure and call eventfd_copy. */ > - vq->kickfd = eventfd(0, EFD_NONBLOCK | EFD_CLOEXEC); > - eventfd_kick.source_fd = vq->kickfd; > - eventfd_kick.target_fd = file->fd; > - eventfd_kick.target_pid = ctx.pid; > - > - if (eventfd_copy(dev, &eventfd_kick)) > - return -1; > + vq->kickfd = file->fd; > > return 0; > } > @@ -974,7 +695,6 @@ static int > set_vring_kick(struct vhost_device_ctx ctx, struct vhost_vring_file *file) > { > struct virtio_net *dev; > - struct eventfd_copy eventfd_call; > struct vhost_virtqueue *vq; > > dev = get_device(ctx); > @@ -986,16 +706,11 @@ set_vring_kick(struct vhost_device_ctx ctx, struct vhost_vring_file *file) > > if (vq->callfd) > close((int)vq->callfd); > + vq->callfd = file->fd; > > - /* Populate the eventfd_copy structure and call eventfd_copy. */ > - vq->callfd = eventfd(0, EFD_NONBLOCK | EFD_CLOEXEC); > - eventfd_call.source_fd = vq->callfd; > - eventfd_call.target_fd = file->fd; > - eventfd_call.target_pid = ctx.pid; > - > - if (eventfd_copy(dev, &eventfd_call)) > - return -1; > - > + if (virtio_is_ready(dev, file->index) && > + !(dev->flags & VIRTIO_DEV_RUNNING)) > + notify_ops->new_device(dev); > return 0; > } > > @@ -1024,6 +739,7 @@ set_backend(struct vhost_device_ctx ctx, struct vhost_vring_file *file) > * If the device isn't already running and both backend fds are set, > * we add the device. > */ > + LOG_DEBUG(VHOST_CONFIG, "%s %d\n", __func__, file->fd); > if (!(dev->flags & VIRTIO_DEV_RUNNING)) { > if (((int)dev->virtqueue[VIRTIO_TXQ]->backend != VIRTIO_DEV_STOPPED) && > ((int)dev->virtqueue[VIRTIO_RXQ]->backend != VIRTIO_DEV_STOPPED))