From mboxrd@z Thu Jan 1 00:00:00 1970
From: Thibaut Collet
To: Peter Xu
Cc: dev@dpdk.org, Victor Kaplansky
Date: Tue, 15 Dec 2015 12:47:47 +0100
Subject: Re: [dpdk-dev] [PATCH 0/4 for 2.3] vhost-user live migration support
References: <000001d133ed$b2446eb0$16cd4c10$@samsung.com>
 <20151211094934.GX29571@yliu-dev.sh.intel.com>
 <001c01d133fd$d3a7d870$7af78950$@samsung.com>
 <20151214035842.GB18437@pxdev.xzpeter.org>
 <20151215082324.GG29571@yliu-dev.sh.intel.com>
 <007f01d13715$042a0a80$0c7e1f80$@samsung.com>
 <20151215100548.GD32243@pxdev.xzpeter.org>
List-Id: patches and discussions about DPDK

On Tue, Dec 15, 2015 at 12:43 PM, Thibaut Collet wrote:

>
>
> On Tue, Dec 15, 2015 at 11:05 AM, Peter Xu wrote:
>
>> On Tue, Dec 15, 2015 at 11:45:56AM +0300, Pavel Fedin wrote:
>> > To tell the truth, i don't know. I am also learning qemu internals on
>> > the fly. Indeed, i see that it should announce itself. But
>> > this brings up a question: why do we need special announce procedure in
>> > vhost-user then?
>>
>> I have the same question. Here is my guess...
>>
>> In customized networks, maybe people are not using ARP at all? When
>> we use DPDK, we directly pass through the network logic inside
>> kernel itself. So logically all the network protocols could be
>> customized by the user of it. In the customized network, maybe there
>> is some other protocol (rather than RARP) that would do the same
>> thing as what ARP/RARP does. So, this SEND_RARP request could give
>> the vhost-user backend a chance to format its own announce packet
>> and broadcast (in the SEND_RARP request, the guest's mac address
>> will be appended).
>>
>> CCing Victor to better know the truth...
>>
>> Peter
>>
>
>
> Hi,
>
> After a migration, to avoid network outage, the guest must announce its
> new location to the L2 layer, typically with a GARP.
> Otherwise requests sent to the guest arrive at the old host until an ARP
> request is sent (after 30 seconds) or the guest sends some data.
>
> QEMU implementation of self announce after a migration with a vhost
> backend is the following:
>  - If the VIRTIO_GUEST_ANNOUNCE feature has been negotiated the guest
> automatically sends a GARP.
>  - Else if the vhost backend implements VHOST_USER_SEND_RARP this request
> is sent to the vhost backend. When this message is received the vhost
> backend must act as if it had received a RARP from the guest (the purpose
> of this RARP is to update switches' MAC->port mapping, as a GARP does).
> This RARP is a fake one, created by the vhost backend.
>  - Else nothing is done and we have a network outage until an ARP is sent
> or the guest sends some data.
>
>
> VIRTIO_GUEST_ANNOUNCE feature is negotiated if:
>  - the vhost backend announces the support of this feature. Maybe QEMU
> can be updated to support this feature unconditionally
>  - the virtio driver of the guest implements this feature. It is not the
> case for old kernel or dpdk virtio pmd.
>
> Regarding dpdk, to have a migration of vhost interface with limited network
> outage we have to:
>
>  - Implement management of VHOST_USER_SEND_RARP request to emulate a fake
> RARP for guest
>
> To do that we have to consider two kinds of guest:
>   1. Guest with virtio driver implementing VIRTIO_GUEST_ANNOUNCE feature
>   2. Guest with virtio driver that does not have the VIRTIO_GUEST_ANNOUNCE
> feature. This is the case with old kernel or guest running dpdk (the virtio
> pmd of dpdk does not have this feature)
>
> A guest with the VIRTIO_GUEST_ANNOUNCE feature automatically sends some GARP
> after a migration if this feature has been negotiated. So the only thing to
> do is to negotiate the VIRTIO_GUEST_ANNOUNCE feature between QEMU, vhost
> backend and the guest.
> For this kind of guest the vhost-backend must announce the support of
> VIRTIO_GUEST_ANNOUNCE feature.
> As vhost-backend has no particular action to do in this case, the support
> of VIRTIO_GUEST_ANNOUNCE feature can be unconditionally set in QEMU in the
> future.
>
> For guest without VIRTIO_GUEST_ANNOUNCE feature we have to send a fake
> RARP: QEMU knows the MAC address of the guest and can create and broadcast
> a RARP. But in case of vhost-backend QEMU is not able to broadcast this
> fake RARP and must ask the vhost backend to do it through the
> VHOST_USER_SEND_RARP request. When the vhost backend receives this message
> it must create a fake RARP message (as done by QEMU) and do the appropriate
> operation as if this message had been sent by the guest through the virtio
> rings.
>
>
> To solve this point 2 solutions are implemented:
>  - After the migration the guest automatically sends GARP. This solution
> occurs if VIRTIO_GUEST_ANNOUNCE feature has been negotiated between QEMU
> and the guest.
>    * VIRTIO_GUEST_ANNOUNCE
>

Sorry, my previous message was sent by mistake (it was a draft with rework
in progress).

The full explanation is:

Hi,

After a migration, to avoid a network outage, the guest must announce its
new location to the L2 layer, typically with a GARP. Otherwise requests
sent to the guest arrive at the old host until an ARP request is sent
(after 30 seconds) or the guest sends some data.

The QEMU implementation of self announce after a migration with a vhost
backend is the following:
 - If the VIRTIO_GUEST_ANNOUNCE feature has been negotiated, the guest
   automatically sends a GARP.
 - Else if the vhost backend implements VHOST_USER_SEND_RARP, this request
   is sent to the vhost backend. When this message is received, the vhost
   backend must act as if it had received a RARP from the guest (the purpose
   of this RARP is to update the switches' MAC->port mapping, as a GARP
   does). This RARP is a fake one, created by the vhost backend.
 - Else nothing is done and we have a network outage until an ARP is sent
   or the guest sends some data.
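
The three branches above, and the fake RARP of the second branch, can be sketched in C as follows. This is only an illustration under stated assumptions, not the QEMU or dpdk code: pick_announce(), build_fake_rarp() and the enum are hypothetical names. The bit numbers are the ones I recall from the virtio and vhost-user specifications (VIRTIO_NET_F_GUEST_ANNOUNCE, called VIRTIO_GUEST_ANNOUNCE above, and VHOST_USER_PROTOCOL_F_RARP, which advertises VHOST_USER_SEND_RARP) and should be checked against the headers in use. The frame layout follows RFC 903, with the guest MAC as both sender and target hardware address and zeroed IP fields.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Assumed bit numbers, to be checked against the virtio / vhost-user specs. */
#define VIRTIO_NET_F_GUEST_ANNOUNCE 21 /* guest can announce itself */
#define VHOST_USER_PROTOCOL_F_RARP   2 /* backend handles VHOST_USER_SEND_RARP */

#define RARP_FRAME_LEN 42 /* 14-byte Ethernet header + 28-byte ARP body */

/* Hypothetical names for the three outcomes described in the mail. */
enum announce_action {
    ANNOUNCE_BY_GUEST,        /* guest sends a GARP by itself */
    ANNOUNCE_BY_BACKEND_RARP, /* QEMU sends VHOST_USER_SEND_RARP to the backend */
    ANNOUNCE_NONE,            /* outage until an ARP or guest traffic */
};

/* Choose who announces the new location after a migration. */
static enum announce_action
pick_announce(uint64_t negotiated_features, uint64_t backend_protocol_features)
{
    if (negotiated_features & (1ULL << VIRTIO_NET_F_GUEST_ANNOUNCE))
        return ANNOUNCE_BY_GUEST;
    if (backend_protocol_features & (1ULL << VHOST_USER_PROTOCOL_F_RARP))
        return ANNOUNCE_BY_BACKEND_RARP;
    return ANNOUNCE_NONE;
}

/*
 * Build a broadcast RARP frame on behalf of the guest.
 * Returns the frame length, or 0 if the buffer is too small.
 */
static size_t
build_fake_rarp(uint8_t *buf, size_t len, const uint8_t mac[6])
{
    static const uint8_t arp_fixed[8] = {
        0x00, 0x01, /* hardware type: Ethernet */
        0x08, 0x00, /* protocol type: IPv4 */
        6, 4,       /* hardware and protocol address lengths */
        0x00, 0x03, /* opcode 3: reverse ARP request */
    };

    if (len < RARP_FRAME_LEN)
        return 0;

    memset(buf, 0, RARP_FRAME_LEN);
    memset(buf, 0xff, 6);           /* Ethernet destination: broadcast */
    memcpy(buf + 6, mac, 6);        /* Ethernet source: guest MAC */
    buf[12] = 0x80;                 /* EtherType 0x8035: RARP */
    buf[13] = 0x35;
    memcpy(buf + 14, arp_fixed, 8); /* fixed part of the ARP body */
    memcpy(buf + 22, mac, 6);       /* sender hardware address */
    /* bytes 28..31: sender IP, left as zero */
    memcpy(buf + 32, mac, 6);       /* target hardware address */
    /* bytes 38..41: target IP, left as zero */
    return RARP_FRAME_LEN;
}
```

A backend would call build_fake_rarp() with the MAC address carried by the VHOST_USER_SEND_RARP request and then inject the frame into its switching path as if the guest had transmitted it. How the frame is injected depends on the backend and is not shown here.
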

The VIRTIO_GUEST_ANNOUNCE feature is negotiated if:
 - the vhost backend announces support for this feature. Maybe QEMU
   can be updated to support this feature unconditionally.
 - the virtio driver of the guest implements this feature. This is not the
   case for old kernels or for the dpdk virtio pmd.

For dpdk, to migrate a vhost interface with a limited network outage, we
have to:
 - In the vhost pmd
   * Announce support for the VIRTIO_GUEST_ANNOUNCE feature
   * Implement handling of the VHOST_USER_SEND_RARP request to emulate a
     fake RARP if the VIRTIO_GUEST_ANNOUNCE feature is not implemented by
     the guest
 - In the virtio pmd
   * Support the VIRTIO_GUEST_ANNOUNCE feature to avoid RARP emission by
     the host after a migration.

Hope this explanation helps.

Regards.

Thibaut.