From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mx1.redhat.com (mx1.redhat.com [209.132.183.28]) by dpdk.org (Postfix) with ESMTP id CFB9B592C for ; Wed, 11 Jan 2017 04:23:10 +0100 (CET) Received: from int-mx11.intmail.prod.int.phx2.redhat.com (int-mx11.intmail.prod.int.phx2.redhat.com [10.5.11.24]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mx1.redhat.com (Postfix) with ESMTPS id 29F522173D; Wed, 11 Jan 2017 03:23:10 +0000 (UTC) Received: from [10.72.5.47] (vpn1-5-47.pek2.redhat.com [10.72.5.47]) by int-mx11.intmail.prod.int.phx2.redhat.com (8.14.4/8.14.4) with ESMTP id v0B3N4Iv023473 (version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-SHA bits=256 verify=NO); Tue, 10 Jan 2017 22:23:07 -0500 To: "Tan, Jianfeng" , dev@dpdk.org References: <1480689075-66977-1-git-send-email-jianfeng.tan@intel.com> <1482477266-39199-1-git-send-email-jianfeng.tan@intel.com> <1482477266-39199-6-git-send-email-jianfeng.tan@intel.com> <46af618f-c01b-3571-78fc-12d10859a4a1@redhat.com> <4fa4d06c-d359-df12-a073-7c2c2540b634@intel.com> Cc: yuanhan.liu@linux.intel.com, ferruh.yigit@intel.com, cunming.liang@intel.com From: Jason Wang Message-ID: <83c1bc3a-7b16-6f73-f0ba-6ed5c863eaa5@redhat.com> Date: Wed, 11 Jan 2017 11:23:02 +0800 User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:45.0) Gecko/20100101 Thunderbird/45.5.1 MIME-Version: 1.0 In-Reply-To: Content-Type: text/plain; charset=utf-8; format=flowed Content-Transfer-Encoding: 8bit X-Scanned-By: MIMEDefang 2.68 on 10.5.11.24 X-Greylist: Sender IP whitelisted, not delayed by milter-greylist-4.5.16 (mx1.redhat.com [10.5.110.28]); Wed, 11 Jan 2017 03:23:10 +0000 (UTC) Subject: Re: [dpdk-dev] [PATCH v2 5/7] net/virtio_user: add vhost kernel support X-BeenThere: dev@dpdk.org X-Mailman-Version: 2.1.15 Precedence: list List-Id: DPDK patches and discussions List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 11 Jan 2017 03:23:11 -0000 On 2017年01月11日 11:13, Tan, Jianfeng wrote: > > > On 1/11/2017 10:42 AM, Jason Wang wrote: >> >> >> On 2017年01月10日 14:11, Tan, Jianfeng wrote: >>> >>> Hi Jason, >>> >>> >>> On 1/9/2017 12:39 PM, Jason Wang wrote: >>>> >>>> >>>> On 2016年12月23日 15:14, Jianfeng Tan wrote: >>>>> This patch add support vhost kernel as the backend for virtio_user. >>>>> Three main hook functions are added: >>>>> - vhost_kernel_setup() to open char device, each vq pair needs one >>>>> vhostfd; >>>>> - vhost_kernel_ioctl() to communicate control messages with vhost >>>>> kernel module; >>>>> - vhost_kernel_enable_queue_pair() to open tap device and set it >>>>> as the backend of corresonding vhost fd (that is to say, vq >>>>> pair). >>>>> >>>>> Signed-off-by: Jianfeng Tan >>>>> --- >>>>> >>> [...] >>>>> +/* TUNSETIFF ifr flags */ >>>>> +#define IFF_TAP 0x0002 >>>>> +#define IFF_NO_PI 0x1000 >>>>> +#define IFF_ONE_QUEUE 0x2000 >>>>> +#define IFF_VNET_HDR 0x4000 >>>>> +#define IFF_MULTI_QUEUE 0x0100 >>>>> +#define IFF_ATTACH_QUEUE 0x0200 >>>>> +#define IFF_DETACH_QUEUE 0x0400 >>>> >>>> Do we really want to duplicate those things which has been exposed >>>> by uapi here? >>> >>> You mean those defined in ? Redefine those common >>> macros, or include standard header file, with respective pros and >>> cons. DPDK prefers the redefinition way as far as I understand, >>> doesn't it? >>> >> >> Well, if you really want to do this, you may want to use an >> independent file. Then you can sync it with linux headers with a bash >> script. > > Agreed! > > [...] >>>>> +static struct vhost_memory_kernel * >>>>> +prepare_vhost_memory_kernel(void) >>>>> +{ >>>>> + uint32_t i, j, k = 0; >>>>> + struct rte_memseg *seg; >>>>> + struct vhost_memory_region *mr; >>>>> + struct vhost_memory_kernel *vm; >>>>> + >>>>> + vm = malloc(sizeof(struct vhost_memory_kernel) + >>>>> + VHOST_KERNEL_MAX_REGIONS * >>>>> + sizeof(struct vhost_memory_region)); >>>>> + >>>>> + for (i = 0; i < RTE_MAX_MEMSEG; ++i) { >>>>> + seg = &rte_eal_get_configuration()->mem_config->memseg[i]; >>>>> + if (!seg->addr) >>>>> + break; >>>> >>>> If we're sure the number of regions is less than 64(or the module >>>> parameter read from /sys), can we avoid the iteration here? >>> >>> The "if" statement under the "for" statement can save us from all >>> RTE_MAX_MEMSEG iteration. >> >> But if we know the number of regions is short than the limit, there's >> even no need for this? > > The problem is: we don't have a variable to know how many segments are > there in DPDK. Anyway, we need to report error in the situation that # > of segments > # of regions. Or can you give a more specific example? > > [...] If we don't know #segments, I'm fine with current code. >>>>> + if (dev->ifname) >>>>> + strncpy(ifr.ifr_name, dev->ifname, IFNAMSIZ); >>>>> + else >>>>> + strncpy(ifr.ifr_name, "tap%d", IFNAMSIZ); >>>>> + if (ioctl(tapfd, TUNSETIFF, (void *)&ifr) == -1) { >>>>> + PMD_DRV_LOG(ERR, "TUNSETIFF failed: %s", strerror(errno)); >>>>> + goto error; >>>>> + } >>>> >>>> This requires CAP_NET_ADMIN, so we should really consider to accept >>>> a pre-created fd here. >>> >>> It sounds like a future work for me. So far, all DPDK apps are >>> running in privileged mode (including CAP_NET_ADMIN?). >>> >> >> That's not safe. > > Yes. > >> Accepting a per-created fd can solve this, and can even support macvtap > > So you are proposing a way to specify the virtio_user parameter like, > --vdev=virtio_user,tapfd=fd1,xxx, right? As replied in your another > comment, it's a great way to go. But I'm afraid we have no time to > implement that in this cycle. > > Thanks, > Jianfeng Ok, just want to show its advantages. It can be added on top. And two more suggestions: - better to split tap support out of vhost file - kernel support more than 8 queues on recent kernel, so there's no need to limit it to 8. When running on old kernel, TUNSETIFF will fail which can give user a hint to reduce #queues. Thanks