From: "Xie, Huawei"
To: Tetsuya Mukawa, "dev@dpdk.org"
Date: Fri, 14 Nov 2014 00:22:27 +0000
Subject: Re: [dpdk-dev] vhost-user technical issues
In-Reply-To: <5462DE39.1070006@igel.co.jp>

> -----Original Message-----
> From: Tetsuya Mukawa [mailto:mukawa@igel.co.jp]
> Sent: Tuesday, November 11, 2014 9:13 PM
> To: Xie, Huawei; dev@dpdk.org
> Cc: Long, Thomas
> Subject: Re: vhost-user technical issues
>
> Hi Xie,
>
> (2014/11/12 6:37), Xie, Huawei wrote:
> > Hi Tetsuya:
> > There are two major technical issues in my mind for the vhost-user
> > implementation.
> >
> > 1) memory region map
> > Vhost-user passes us a file fd and an offset for each memory region.
> > Unfortunately the mmap offset is "very" wrong. I discovered this issue
> > a long time ago, and also found that I couldn't mmap the huge page file
> > even with the correct offset (needs double checking).
> > Just now I found that people reported this issue on Nov 3:
> > [Qemu-devel] [PULL 27/29] vhost-user: fix mmap offset calculation
> > Anyway, I turned to the same idea used in our DPDK vhost-cuse: only use
> > the fd of region(0) to map the whole file.
> > I think we should use this approach temporarily to support qemu-2.1, as
> > it has that bug.
> I agree with you.
> Also, we may have an issue with un-mapping a file on hugetlbfs on Linux.
> When I checked munmap(), it seems 'size' needs to be aligned to the
> hugepage size.
> (I guess it may be a kernel bug. It might be fixed already.)
> Please add return value checking code for munmap(); it might still fail.
>
> > 2) what message is the indicator for vhost start/release?
> > Previously, vhost-cuse had the SET_BACKEND message.
> > What should we do for vhost-user?
> > SET_VRING_KICK for start?
> I think so.
>
> > What about for release?
> > Unlike the kernel virtio, the DPDK virtio in the guest could be restarted.
> >
> > Thoughts?
> I guess we need to consider 2 types of restarting.
> One is virtio-net driver restarting, the other is vhost-user backend
> restarting.
> But, for now, it's best to think about virtio-net driver restarting first.
>
> Probably we need to implement a way to let the vhost-user backend know the
> virtio-net driver has restarted.
> I am not sure of a good way to let the vhost-user backend know it.
> But how about the following RFC?

I checked your code today, and didn't find the logic to deal with virtio
reconfiguration.

>
> - When the unix domain socket is closed, the vhost-user backend should
> treat it as "release".
> It is useful when QEMU itself is gone suddenly.

This is the simple case.

>
> - Also, implement a new ioctl command like VHOST_RESET_BACKEND.
> This command should be sent from the virtio-net device of QEMU when the
> VIRTIO_CONFIG_STATUS_RESET register of the virtio-net device is set by
> the virtio-net driver.
> (Usually this register is set when the virtio-net driver is initialized
> or stopped.)
> It means we need to change QEMU. ;)
> It seems the virtio-net PMD already sets this register when the PMD is
> initialized or stopped.
> So this framework should work, and can let the vhost-user backend know
> about driver resetting.
> (And I guess we can say the same for the virtio-net kernel driver.)
> It might be enough to close the unix domain socket, instead of
> implementing a new command.

I don't understand the point about closing the socket.
The socket connection from qemu will be opened and closed once during the
life cycle of virtio. This is correct behavior. But the virtio driver can be
reconfigured several times by the guest; this is done by writing a status
value to the STATUS register.

> But in that case, we may need an auto reconnection mechanism.
>
> - We also need to consider that the DPDK application may be gone suddenly
> without setting the reset register.
> In that case, the vhost-user backend cannot know it. Only the user (or
> some kind of watchdog application on the guest) knows it.
> Because of this, the user (or app) should have the responsibility to
> solve this situation.
> To be more precise, the user should let the vhost-user backend know about
> the device release.
> If the user starts another DPDK application without solving the issue,
> the new DPDK application may access memory that the vhost-user backend is
> also accessing.
> I guess the user can solve the issue using "dpdk_nic_bind.py".
> The script can move the virtio-net device to the kernel virtio-net
> driver, and then return it to igb_uio.
> During those steps, the virtio-net device is initialized by the
> virtio-net kernel driver.
> So the vhost-user backend can know about the device release.
>

My thought, without new message support:
When vhost-user receives another configuration message after the device was
last made ready for processing, we could release the device from the data
core, process the reconfiguration message, and then re-add it to the data
core when it is ready again (checking for the new kick message as before).
The candidate message is set_mem_table.
It is OK to keep the device on the data core until we receive the new
reconfiguration message; vhost just wastes some cycles checking the avail
idx. (Rough sketches of this flow and of the region(0) mmap/munmap handling
discussed above are appended at the end of this mail.)

> Tetsuya
>
> >
> > -huawei
>
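
To make point 1 concrete, here is a minimal sketch of mapping the whole
hugepage-backed file through the fd passed for region(0), and unmapping it
with a hugepage-aligned size plus a return value check. It assumes 2MB
hugepages and an fd taken from the SET_MEM_TABLE ancillary data; the names
(map_guest_memory, unmap_guest_memory, HUGEPAGE_SIZE) are illustrative, not
the actual DPDK vhost code:

#include <stdint.h>
#include <stdio.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <sys/types.h>

#define HUGEPAGE_SIZE (2ULL * 1024 * 1024)   /* assume 2MB hugepages */

/* Round a length up to the hugepage size so munmap() accepts it. */
static uint64_t align_up(uint64_t len, uint64_t align)
{
    return (len + align - 1) & ~(align - 1);
}

/*
 * Map the whole hugepage file once, using only the fd received for
 * region(0), instead of trusting the per-region mmap offsets that
 * qemu-2.1 reports incorrectly.
 */
static void *map_guest_memory(int region0_fd, uint64_t *mapped_size)
{
    struct stat st;

    if (fstat(region0_fd, &st) != 0)
        return NULL;
    /* keep the aligned size so the later munmap() length is valid */
    *mapped_size = align_up((uint64_t)st.st_size, HUGEPAGE_SIZE);
    void *addr = mmap(NULL, *mapped_size, PROT_READ | PROT_WRITE,
                      MAP_SHARED, region0_fd, 0);
    return addr == MAP_FAILED ? NULL : addr;
}

/* Unmap with the same (hugepage-aligned) size and check the result. */
static void unmap_guest_memory(void *addr, uint64_t mapped_size)
{
    if (munmap(addr, mapped_size) != 0)
        perror("munmap of guest memory failed");
}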
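
And a rough sketch of the reconfiguration flow described above, using
hypothetical helpers (remove_device_from_data_core, add_device_to_data_core)
and a simplified message enum standing in for the real vhost-user protocol
definitions and the DPDK vhost library API:

/*
 * Hypothetical types; the real library has its own message and device
 * structures, this only mirrors the control flow described above.
 */
enum vhost_user_msg_type {
    VHOST_USER_SET_MEM_TABLE,
    VHOST_USER_SET_VRING_KICK,
    /* the real protocol defines many more messages */
};

struct vhost_dev {
    int running;          /* currently polled by a data core? */
    int mem_table_valid;  /* guest memory mapped and usable   */
    int kick_fd_valid;    /* kick fd received for the rings   */
};

static void remove_device_from_data_core(struct vhost_dev *dev) { dev->running = 0; }
static void add_device_to_data_core(struct vhost_dev *dev)      { dev->running = 1; }

/* Called for each control message read from the vhost-user socket. */
static void handle_msg(struct vhost_dev *dev, enum vhost_user_msg_type type)
{
    switch (type) {
    case VHOST_USER_SET_MEM_TABLE:
        /*
         * A new memory table after the device was made ready indicates a
         * guest reconfiguration: release the device from the data core
         * before touching the mappings, then remap guest memory.
         */
        if (dev->running)
            remove_device_from_data_core(dev);
        dev->mem_table_valid = 1;   /* remapping would happen here */
        dev->kick_fd_valid = 0;     /* wait for a fresh kick        */
        break;
    case VHOST_USER_SET_VRING_KICK:
        /* As with vhost-cuse, treat the kick message as "start". */
        dev->kick_fd_valid = 1;
        if (dev->mem_table_valid && !dev->running)
            add_device_to_data_core(dev);
        break;
    }
}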