From: Thibaut Collet
To: Peter Xu
Cc: dev@dpdk.org, Victor Kaplansky
Date: Tue, 15 Dec 2015 12:43:15 +0100
Subject: Re: [dpdk-dev] [PATCH 0/4 for 2.3] vhost-user live migration support
In-Reply-To: <20151215100548.GD32243@pxdev.xzpeter.org>

On Tue, Dec 15, 2015 at 11:05 AM, Peter Xu wrote:

> On Tue, Dec 15, 2015 at 11:45:56AM +0300, Pavel Fedin wrote:
> > To tell the truth, i don't know. I am also learning qemu internals on
> > the fly. Indeed, i see that it should announce itself. But
> > this brings up a question: why do we need special announce procedure in
> > vhost-user then?
>
> I have the same question. Here is my guess...
>
> In customized networks, maybe people are not using ARP at all? When
> we use DPDK, we directly pass through the network logic inside
> kernel itself. So logically all the network protocols could be
> customized by the user of it. In the customized network, maybe there
> is some other protocol (rather than RARP) that would do the same
> thing as what ARP/RARP does. So, this SEND_RARP request could give
> the vhost-user backend a chance to format its own announce packet
> and broadcast (in the SEND_RARP request, the guest's mac address
> will be appended).
>
> CCing Victor to better know the truth...
>
> Peter
>

Hi,

After a migration, to avoid network outage, the guest must announce its new
location to the L2 layer, typically with a GARP.
Otherwise, requests sent to the guest arrive at the old host until an ARP
request is sent (after 30 seconds) or the guest sends some data.

The QEMU implementation of self-announce after a migration with a vhost
backend is the following:
- If the VIRTIO_GUEST_ANNOUNCE feature has been negotiated, the guest
  automatically sends a GARP.
- Else, if the vhost backend implements VHOST_USER_SEND_RARP, this request is
  sent to the vhost backend. When this message is received, the vhost backend
  must act as if it had received a RARP from the guest (the purpose of this
  RARP is to update the switches' MAC->port mapping, as a GARP does). This
  RARP is a fake one, created by the vhost backend.
- Else, nothing is done and we have a network outage until an ARP is sent or
  the guest sends some data.

The VIRTIO_GUEST_ANNOUNCE feature is negotiated if:
- the vhost backend announces support for this feature. Maybe QEMU can be
  updated to support this feature unconditionally.
- the virtio driver of the guest implements this feature. This is not the
  case for old kernels or the dpdk virtio pmd.

Regarding dpdk, to have a migration of a vhost interface with limited network
outage we have to:
- Implement handling of the VHOST_USER_SEND_RARP request to emulate a fake
  RARP for the guest.

To do that we have to consider two kinds of guest:
1. Guest with a virtio driver implementing the VIRTIO_GUEST_ANNOUNCE feature.
2. Guest with a virtio driver that does not have the VIRTIO_GUEST_ANNOUNCE
   feature. This is the case with old kernels or guests running dpdk (the
   virtio pmd of dpdk does not have this feature).

A guest with the VIRTIO_GUEST_ANNOUNCE feature automatically sends some GARPs
after a migration if this feature has been negotiated. So the only thing to do
is to negotiate the VIRTIO_GUEST_ANNOUNCE feature between QEMU, the vhost
backend and the guest. For this kind of guest the vhost backend must announce
support for the VIRTIO_GUEST_ANNOUNCE feature.
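
The three-way fallback and the negotiation condition described above can be
sketched in C as follows. This is only an illustration of the logic as I
understand it: the enum, the function names and the boolean parameters are
hypothetical and are not QEMU or DPDK identifiers.

```c
#include <stdbool.h>

/* Hypothetical names, for illustration only (not QEMU/DPDK symbols). */
enum announce_method {
    ANNOUNCE_GUEST_GARP,   /* the guest sends the GARP itself */
    ANNOUNCE_BACKEND_RARP, /* QEMU sends VHOST_USER_SEND_RARP to the backend */
    ANNOUNCE_NONE          /* outage until an ARP or some guest traffic */
};

/*
 * VIRTIO_GUEST_ANNOUNCE is negotiated only if the vhost backend offers it
 * and the guest's virtio driver implements it.
 */
static bool
is_guest_announce_negotiated(bool backend_offers, bool guest_driver_supports)
{
    return backend_offers && guest_driver_supports;
}

/* Pick the announce path after a migration, in the order listed above. */
static enum announce_method
choose_announce(bool guest_announce_negotiated, bool backend_has_send_rarp)
{
    if (guest_announce_negotiated)
        return ANNOUNCE_GUEST_GARP;
    if (backend_has_send_rarp)
        return ANNOUNCE_BACKEND_RARP;
    return ANNOUNCE_NONE;
}
```

For example, a guest running the dpdk virtio pmd (no VIRTIO_GUEST_ANNOUNCE in
the driver) on a backend that handles VHOST_USER_SEND_RARP ends up on the
ANNOUNCE_BACKEND_RARP path, which is the case dpdk vhost has to implement.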
As the vhost backend has no particular action to take for a guest with
VIRTIO_GUEST_ANNOUNCE, support for the feature could be set unconditionally in
QEMU in the future.

For a guest without the VIRTIO_GUEST_ANNOUNCE feature we have to send a fake
RARP: QEMU knows the MAC address of the guest and can create and broadcast a
RARP. But with a vhost backend, QEMU is not able to broadcast this fake RARP
itself and must ask the vhost backend to do it through the
VHOST_USER_SEND_RARP request. When the vhost backend receives this message it
must create a fake RARP message (as done by QEMU) and process it as if it had
been sent by the guest through the virtio rings.

To solve this point 2 solutions are implemented:
- After the migration the guest automatically sends a GARP. This solution
  applies if the VIRTIO_GUEST_ANNOUNCE feature has been negotiated between
  QEMU and the guest.
  * VIRTIO_GUEST_ANNOUNCE