From: "Xie, Huawei"
To: Tetsuya Mukawa, "dev@dpdk.org"
Date: Fri, 14 Nov 2014 00:22:27 +0000
Subject: Re: [dpdk-dev] vhost-user technical issues
In-Reply-To: <5462DE39.1070006@igel.co.jp>

> -----Original Message-----
> From: Tetsuya Mukawa [mailto:mukawa@igel.co.jp]
> Sent: Tuesday, November 11, 2014 9:13 PM
> To: Xie, Huawei; dev@dpdk.org
> Cc: Long, Thomas
> Subject: Re: vhost-user technical issues
>
> Hi Xie,
>
> (2014/11/12 6:37), Xie, Huawei wrote:
> > Hi Tetsuya:
> > There are two major technical issues in my mind for the vhost-user
> > implementation.
> >
> > 1) memory region map
> > Vhost-user passes us a file fd and an offset for each memory region.
> > Unfortunately the mmap offset is "very" wrong. I discovered this issue
> > a long time ago, and also found that I couldn't mmap the huge page file
> > even with the correct offset (needs double checking).
> > Just now I found that people reported this issue on Nov 3:
> > [Qemu-devel] [PULL 27/29] vhost-user: fix mmap offset calculation
> > Anyway, I turned to the same idea used in our DPDK vhost-cuse: only use
> > the fd of region(0) to map the whole file.
> > I think we should use this approach temporarily to support qemu-2.1, as
> > it has that bug.
> I agree with you.
> Also, we may have an issue with un-mapping a file on hugetlbfs on Linux.
> When I checked munmap(), it seems 'size' needs to be aligned to the
> hugepage size.
> (I guess it may be a kernel bug. It might be fixed already.)
> Please add return value checking code for munmap(); it might still fail.
>
> > 2) what message is the indicator for vhost start/release?
> > Previously, vhost-cuse had the SET_BACKEND message.
> > What should we do for vhost-user?
> > SET_VRING_KICK for start?
> I think so.
>
> > What about for release?
> > Unlike the kernel virtio, the DPDK virtio in the guest could be restarted.
> >
> > Thoughts?
> I guess we need to consider 2 types of restarting.
> One is virtio-net driver restarting, the other is vhost-user backend
> restarting.
> But, for now, it's best to think about virtio-net driver restarting first.
>
> Probably we need to implement a way to let the vhost-user backend know the
> virtio-net driver has restarted.
> I am not sure of a good way to let the vhost-user backend know it.
> But how about the following RFC?

I checked your code today, and didn't find the logic to deal with virtio
reconfiguration.

>
> - When the unix domain socket is closed, the vhost-user backend should
> treat it as "release".
> It is useful when QEMU itself is gone suddenly.

This is the simple case.

>
> - Also, implement a new ioctl command like VHOST_RESET_BACKEND.
> This command should be sent from the virtio-net device of QEMU when the
> VIRTIO_CONFIG_STATUS_RESET register of the virtio-net device is set by
> the virtio-net driver.
> (Usually this register is set when the virtio-net driver is initialized
> or stopped.)
> It means we need to change QEMU. ;)
> It seems the virtio-net PMD already sets this register when the PMD is
> initialized or stopped.
> So this framework should work, and can let the vhost-user backend know
> about driver resetting.
> (And I guess we can say the same for the virtio-net kernel driver.)
> It might be enough to close the unix domain socket, instead of
> implementing a new command.

I don't understand the point about closing the socket.
The socket connection from qemu will be opened and closed once during the
life cycle of virtio. This is correct behavior. But the virtio driver can be
reconfigured several times by the guest; this is done by writing a status
value to the STATUS register.

> But in that case, we may need an auto reconnection mechanism.
>
> - We also need to consider that the DPDK application may be gone suddenly
> without setting the reset register.
> In that case, the vhost-user backend cannot know it. Only the user (or
> some kind of watchdog application on the guest) knows it.
> Because of this, the user (or app) should have the responsibility to
> solve this situation.
> To be more precise, the user should let the vhost-user backend know about
> the device release.
> If the user starts another DPDK application without solving the issue,
> the new DPDK application may access memory that the vhost-user backend is
> also accessing.
> I guess the user can solve the issue using "dpdk_nic_bind.py".
> The script can move the virtio-net device to the kernel virtio-net
> driver, and then return it to igb_uio.
> During those steps, the virtio-net device is initialized by the
> virtio-net kernel driver.
> So the vhost-user backend can know about the device release.
>

My thought, without new message support:
When vhost-user receives another configuration message after the device was
last made ready for processing, we could release the device from the data
core, process the reconfiguration message, and then re-add it to the data
core when it is ready again (checking for the new kick message as before).
The candidate message is set_mem_table.
It is OK to keep the device on the data core until we receive the new
reconfiguration message; vhost just wastes some cycles checking the avail
idx. (Rough sketches of this flow and of the region(0) mmap/munmap handling
discussed above are appended at the end of this mail.)

> Tetsuya
>
> >
> > -huawei
>
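
To make point 1 concrete, here is a minimal sketch of mapping the whole
hugepage-backed file through the fd passed for region(0), and unmapping it
with a hugepage-aligned size plus a return value check. It assumes 2MB
hugepages and an fd taken from the SET_MEM_TABLE ancillary data; the names
(map_guest_memory, unmap_guest_memory, HUGEPAGE_SIZE) are illustrative, not
the actual DPDK vhost code:

#include <stdint.h>
#include <stdio.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <sys/types.h>

#define HUGEPAGE_SIZE (2ULL * 1024 * 1024)   /* assume 2MB hugepages */

/* Round a length up to the hugepage size so munmap() accepts it. */
static uint64_t align_up(uint64_t len, uint64_t align)
{
    return (len + align - 1) & ~(align - 1);
}

/*
 * Map the whole hugepage file once, using only the fd received for
 * region(0), instead of trusting the per-region mmap offsets that
 * qemu-2.1 reports incorrectly.
 */
static void *map_guest_memory(int region0_fd, uint64_t *mapped_size)
{
    struct stat st;

    if (fstat(region0_fd, &st) != 0)
        return NULL;
    /* keep the aligned size so the later munmap() length is valid */
    *mapped_size = align_up((uint64_t)st.st_size, HUGEPAGE_SIZE);
    void *addr = mmap(NULL, *mapped_size, PROT_READ | PROT_WRITE,
                      MAP_SHARED, region0_fd, 0);
    return addr == MAP_FAILED ? NULL : addr;
}

/* Unmap with the same (hugepage-aligned) size and check the result. */
static void unmap_guest_memory(void *addr, uint64_t mapped_size)
{
    if (munmap(addr, mapped_size) != 0)
        perror("munmap of guest memory failed");
}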
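
And a rough sketch of the reconfiguration flow described above, using
hypothetical helpers (remove_device_from_data_core, add_device_to_data_core)
and a simplified message enum standing in for the real vhost-user protocol
definitions and the DPDK vhost library API:

/*
 * Hypothetical types; the real library has its own message and device
 * structures, this only mirrors the control flow described above.
 */
enum vhost_user_msg_type {
    VHOST_USER_SET_MEM_TABLE,
    VHOST_USER_SET_VRING_KICK,
    /* the real protocol defines many more messages */
};

struct vhost_dev {
    int running;          /* currently polled by a data core? */
    int mem_table_valid;  /* guest memory mapped and usable   */
    int kick_fd_valid;    /* kick fd received for the rings   */
};

static void remove_device_from_data_core(struct vhost_dev *dev) { dev->running = 0; }
static void add_device_to_data_core(struct vhost_dev *dev)      { dev->running = 1; }

/* Called for each control message read from the vhost-user socket. */
static void handle_msg(struct vhost_dev *dev, enum vhost_user_msg_type type)
{
    switch (type) {
    case VHOST_USER_SET_MEM_TABLE:
        /*
         * A new memory table after the device was made ready indicates a
         * guest reconfiguration: release the device from the data core
         * before touching the mappings, then remap guest memory.
         */
        if (dev->running)
            remove_device_from_data_core(dev);
        dev->mem_table_valid = 1;   /* remapping would happen here */
        dev->kick_fd_valid = 0;     /* wait for a fresh kick        */
        break;
    case VHOST_USER_SET_VRING_KICK:
        /* As with vhost-cuse, treat the kick message as "start". */
        dev->kick_fd_valid = 1;
        if (dev->mem_table_valid && !dev->running)
            add_device_to_data_core(dev);
        break;
    }
}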