DPDK patches and discussions
 help / color / mirror / Atom feed
* Re: [dpdk-dev] [PATCH 0/4 for 2.3] vhost-user live migration support
@ 2015-12-11  8:26 Pavel Fedin
  2015-12-11  9:49 ` Yuanhan Liu
  0 siblings, 1 reply; 42+ messages in thread
From: Pavel Fedin @ 2015-12-11  8:26 UTC (permalink / raw)
  To: dev

 Hello!

 I am currently testing this patchset with qemu and have problems.

 The guest migrates correctly, but after the migration it cries in the log:

Vhost user backend fails to broadcast fake RARP

 and pinging the (new) host doesn't work. When I migrate it back to the old host, the network resumes working.

 I have analyzed the code, and this problem happens because we support neither VHOST_USER_PROTOCOL_F_RARP nor
VIRTIO_NET_F_GUEST_ANNOUNCE. Since the latter seems to be related only to the guest, I simply enabled it in qemu by force, and after
this the network doesn't work at all.
 Can anybody help me and explain how this is supposed to work? I expected gratuitous ARP packets to be harmless, but they seem to break
things somehow. And what was used for testing the implementation?

Kind regards,
Pavel Fedin
Expert Engineer
Samsung Electronics Research center Russia

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [dpdk-dev] [PATCH 0/4 for 2.3] vhost-user live migration support
  2015-12-11  8:26 [dpdk-dev] [PATCH 0/4 for 2.3] vhost-user live migration support Pavel Fedin
@ 2015-12-11  9:49 ` Yuanhan Liu
  2015-12-11 10:22   ` Pavel Fedin
  0 siblings, 1 reply; 42+ messages in thread
From: Yuanhan Liu @ 2015-12-11  9:49 UTC (permalink / raw)
  To: Pavel Fedin; +Cc: dev

On Fri, Dec 11, 2015 at 11:26:55AM +0300, Pavel Fedin wrote:
>  Hello!
> 
>  I am currently testing this patchset with qemu and have problems.

Hi,

Thanks for testing!
> 
>  The guest migrates correctly, but after the migration it cries in the log:
> 
> Vhost user backend fails to broadcast fake RARP

Yes, because I haven't enabled it on the DPDK side yet; I intended to
try it in the v2 patchset, which is likely to be sent out next week.

	--yliu

> 
>  and pinging the (new) host doesn't work. When i migrate it back to the old host, the network resumes working.
> 
>  I have analyzed the code, and this problem happens because we support neither VHOST_USER_PROTOCOL_F_RARP, nor
> VIRTIO_NET_F_GUEST_ANNOUNCE. Since the latter seems to be related only to guest, i simply enabled it in qemu by force, and after
> this the network doesn't work at all.
>  Can anybody help me and explain how the thing works? I expected that gratuitous ARP packets are harmless, but they seem to break
> things somehow. And what was used for testing the implementation?
> 
> Kind regards,
> Pavel Fedin
> Expert Engineer
> Samsung Electronics Research center Russia
> 
> 

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [dpdk-dev] [PATCH 0/4 for 2.3] vhost-user live migration support
  2015-12-11  9:49 ` Yuanhan Liu
@ 2015-12-11 10:22   ` Pavel Fedin
  2015-12-14  3:58     ` Peter Xu
  0 siblings, 1 reply; 42+ messages in thread
From: Pavel Fedin @ 2015-12-11 10:22 UTC (permalink / raw)
  To: 'Yuanhan Liu'; +Cc: dev

 Hello!

> On Fri, Dec 11, 2015 at 11:26:55AM +0300, Pavel Fedin wrote:
> >  Hello!
> >
> >  I am currently testing this patchset with qemu and have problems.
> 
> Hi,
> 
> Thanks for testing!

 Not at all :)

 BTW, it works, and it was my bad: openvswitch was configured incorrectly on the other side; the vhost port number was different for
some reason, while the ruleset was the same. I reconfigured it and now everything migrates correctly, except for increased downtime
because of the missing GARP (the guest misses some pings, then retries ARP, which brings the link back up).

 Tested-by: Pavel Fedin <p.fedin@samsung.com>

Kind regards,
Pavel Fedin
Expert Engineer
Samsung Electronics Research center Russia

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [dpdk-dev] [PATCH 0/4 for 2.3] vhost-user live migration support
  2015-12-11 10:22   ` Pavel Fedin
@ 2015-12-14  3:58     ` Peter Xu
  2015-12-14  7:30       ` Pavel Fedin
  2015-12-15  8:23       ` Yuanhan Liu
  0 siblings, 2 replies; 42+ messages in thread
From: Peter Xu @ 2015-12-14  3:58 UTC (permalink / raw)
  To: Pavel Fedin, yuanhan.liu; +Cc: dev

On Fri, Dec 11, 2015 at 01:22:23PM +0300, Pavel Fedin wrote:
>  BTW, it works, and it was my bad. openvswitch was configured incorrectly on the other side, vhost port number was different for
> some reason, while ruleset was the same. I reconfigured it and now everything migrates correctly, except increased downtime because
> of missing GARP (the guest misses some PINGs, then it retries ARP, which brings the link back up).

Hi,

When doing the ping, was it from the guest (to another host) or to
the guest (from another host)?

In any case, I still could not understand why the ping loss happened
in this test.

If pinging from the guest, no ARP refresh should be required at all, right?

If pinging the guest from outside: when the migration finishes on the
target side, qemu_announce_self() will be called. Although
we might see a warning like "Vhost user backend fails to broadcast
fake RARP" (the notify is done by hacking vhost_user_receive(); even if
the notify fails, things will still move on), QEMU should still send a
RARP onto the link.

Not sure whether I missed anything.

Thanks.
Peter

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [dpdk-dev] [PATCH 0/4 for 2.3] vhost-user live migration support
  2015-12-14  3:58     ` Peter Xu
@ 2015-12-14  7:30       ` Pavel Fedin
  2015-12-14  9:04         ` Peter Xu
  2015-12-15  8:23       ` Yuanhan Liu
  1 sibling, 1 reply; 42+ messages in thread
From: Pavel Fedin @ 2015-12-14  7:30 UTC (permalink / raw)
  To: 'Peter Xu', yuanhan.liu; +Cc: dev

 Hello!

> When doing the ping, was it from the guest (to another host) or to
> the guest (from another host)?
> 
> In any case, I still could not understand why the ping loss happened
> in this test.
> 
> If ping from guest, no ARP refresh is required at all?

 Ping from the guest to the host.

 Ok, my setup was:

Host<------->openVSwitch<----------->guest
      LOCAL               vhostuser

 So, in order to migrate the guest, I simply replicated this setup on both hosts, with the same IPs on the host side. And on both hosts I set up the following ruleset for openvswitch:

ovs-ofctl add-flow ovs-br0 in_port=1,actions=output:LOCAL
ovs-ofctl add-flow ovs-br0 in_port=LOCAL,actions=output:1

 And on the second host, for some reason, the vhostuser port got number 2 in the database instead of 1, probably because I first added the wrong port, then added the correct one, then removed the wrong one. So, as I wrote before, please don't worry: the patch works fine, it was totally my own fault.
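
For reference, the topology above could be reproduced with something like the following (names and addresses are only examples, and this assumes an OVS build with DPDK support and the dpdkvhostuser port type):

ovs-vsctl add-br ovs-br0 -- set bridge ovs-br0 datapath_type=netdev
ovs-vsctl add-port ovs-br0 vhost-user0 \
          -- set Interface vhost-user0 type=dpdkvhostuser
ovs-ofctl show ovs-br0        # check which ofport number the port actually got
ip addr add 192.168.6.1/24 dev ovs-br0
ip link set ovs-br0 up

The "ovs-ofctl show" step is what would have caught the port-number mismatch described above.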

Kind regards,
Pavel Fedin
Expert Engineer
Samsung Electronics Research center Russia

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [dpdk-dev] [PATCH 0/4 for 2.3] vhost-user live migration support
  2015-12-14  7:30       ` Pavel Fedin
@ 2015-12-14  9:04         ` Peter Xu
  2015-12-14  9:46           ` Pavel Fedin
  0 siblings, 1 reply; 42+ messages in thread
From: Peter Xu @ 2015-12-14  9:04 UTC (permalink / raw)
  To: Pavel Fedin; +Cc: dev

On Mon, Dec 14, 2015 at 10:30:54AM +0300, Pavel Fedin wrote:
>  Hello!

Hi, Pavel!

> 
> > When doing the ping, was it from the guest (to another host) or to
> > the guest (from another host)?
> > 
> > In any case, I still could not understand why the ping loss happened
> > in this test.
> > 
> > If ping from guest, no ARP refresh is required at all?
> 
>  ping from guest to host.
> 
>  Ok, my setup was:
> 
> Host<------->openVSwitch<----------->guest
>       LOCAL               vhostuser
> 
>  So, in order to migrate the guest, i simply replicated this setup on both hosts, with the same IPs on host side. And on both hosts i set up the following ruleset for openvswitch:

Regarding "with the same IPs on the host side": do you mean that you
configured the same IP on the two hosts in the intranet? I think this
does not matter if we are testing it functionally (whether live
migration works); however, I would still prefer to try pinging
another host (say, host3) inside the intranet. What do you think?

When pinging host3, I assume there should be no ping loss. There
should also be no loss in the reverse direction (reason as in the
previous mail).

> 
> ovs-ofctl add-flow ovs-br0 in_port=1,actions=output:LOCAL
> ovs-ofctl add-flow ovs-br0 in_port=LOCAL,actions=output:1
> 
>  And on the second host, for some reason, vhostuser port got no 2 in the database instead of 1. Probably because first i added wrong port, then added correct one, then removed the wrong one. So, as i wrote before - please don't worry, the patch works fine, it was totally my lame fault.

Yes, thanks for letting me know that the patch is working. Actually, what
I am interested in is the downtime when host3 pings the guest from
outside during migration. Would you please let me know the result if
you do such tests in the future? Please just ignore this
if there is no requirement on your side.

Thanks!
Peter

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [dpdk-dev] [PATCH 0/4 for 2.3] vhost-user live migration support
  2015-12-14  9:04         ` Peter Xu
@ 2015-12-14  9:46           ` Pavel Fedin
  2015-12-14 10:09             ` Peter Xu
  2015-12-14 12:09             ` Yuanhan Liu
  0 siblings, 2 replies; 42+ messages in thread
From: Pavel Fedin @ 2015-12-14  9:46 UTC (permalink / raw)
  To: 'Peter Xu'; +Cc: dev

 Hello!

> > Host<------->openVSwitch<----------->guest
> >       LOCAL               vhostuser
> >
> >  So, in order to migrate the guest, i simply replicated this setup on both hosts, with the
> same IPs on host side. And on both hosts i set up the following ruleset for openvswitch:
> 
> Regarding to "with the same IPs on host side": do you mean that you
> configured the same IP on two hosts in the intranet?

 No intranet. You can think of it as an isolated network between the host and the guest, and that's all. I just assigned an IP to ovs' LOCAL interface on both hosts, and these ovs instances knew nothing about each other, nor did they forward packets to each other. I didn't want to make things overcomplicated and decided not to mess with the host's own connection to the intranet; something that sits on the other side of vhost-user and replies to pings was perfectly OK for me.

> I think this
> does not matter if we are testing it functionally (whether live
> migration could work), However I would still perfer to try ping
> another host (say, host3) inside the intranet. What do you think?

 Yes, perhaps that would be a better test; maybe next time I'll do it. Anyway, IIRC, PATCH v2 is coming.

Kind regards,
Pavel Fedin
Expert Engineer
Samsung Electronics Research center Russia

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [dpdk-dev] [PATCH 0/4 for 2.3] vhost-user live migration support
  2015-12-14  9:46           ` Pavel Fedin
@ 2015-12-14 10:09             ` Peter Xu
  2015-12-14 12:09             ` Yuanhan Liu
  1 sibling, 0 replies; 42+ messages in thread
From: Peter Xu @ 2015-12-14 10:09 UTC (permalink / raw)
  To: Pavel Fedin; +Cc: dev

On Mon, Dec 14, 2015 at 12:46:57PM +0300, Pavel Fedin wrote:
> > Regarding to "with the same IPs on host side": do you mean that you
> > configured the same IP on two hosts in the intranet?
> 
>  No intranet. You can think of it as an isolated network between the host and guest, and that's all. I just assigned an IP to ovs' LOCAL interface on both hosts, and these ovs instances knew nothing about each other, neither they forwarded packets between each other. I didn't want to make things overcomplicated and decided not to mess with host's own connection to the intranet, just something that sits on the other side of vhost-user and replies to PINGs was perfectly OK for me.

I see.

> 
> > I think this
> > does not matter if we are testing it functionally (whether live
> > migration could work), However I would still perfer to try ping
> > another host (say, host3) inside the intranet. What do you think?
> 
>  Yes, perhaps this would be better test, may be next time i'll do it. Anyway, IIRC, PATCH v2 is coming.

Agreed.

Thanks!
Peter

> 
> Kind regards,
> Pavel Fedin
> Expert Engineer
> Samsung Electronics Research center Russia
> 
> 

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [dpdk-dev] [PATCH 0/4 for 2.3] vhost-user live migration support
  2015-12-14  9:46           ` Pavel Fedin
  2015-12-14 10:09             ` Peter Xu
@ 2015-12-14 12:09             ` Yuanhan Liu
  2015-12-14 13:00               ` Peter Xu
  1 sibling, 1 reply; 42+ messages in thread
From: Yuanhan Liu @ 2015-12-14 12:09 UTC (permalink / raw)
  To: Pavel Fedin; +Cc: dev, Victor Kaplansky, Michael S. Tsirkin

On Mon, Dec 14, 2015 at 12:46:57PM +0300, Pavel Fedin wrote:
>  Hello!
> 
> > > Host<------->openVSwitch<----------->guest
> > >       LOCAL               vhostuser
> > >
> > >  So, in order to migrate the guest, i simply replicated this setup on both hosts, with the
> > same IPs on host side. And on both hosts i set up the following ruleset for openvswitch:
> > 
> > Regarding to "with the same IPs on host side": do you mean that you
> > configured the same IP on two hosts in the intranet?
> 
>  No intranet. You can think of it as an isolated network between the host and guest, and that's all. I just assigned an IP to ovs' LOCAL interface on both hosts, and these ovs instances knew nothing about each other, neither they forwarded packets between each other. I didn't want to make things overcomplicated and decided not to mess with host's own connection to the intranet, just something that sits on the other side of vhost-user and replies to PINGs was perfectly OK for me.

Pavel,

It seems that we have exactly the same test environment set up: I have
one server (where I normally do vhost testing) and one desktop (my
dev box).

On both hosts, there is an ovs bridge, with IP address 192.168.100.1
assigned manually. Later, I started a VM on the server and manually
assigned it the IP 192.168.100.10. I then ran "ping 192.168.100.1" for
live migration testing.

The migration to my desktop somehow works (even though there are some
bugs in this patch set); however, I did see what Pavel saw: about 12
packets were lost, which means the network was not working for about
12 seconds.

Besides that, there was always an error message from the target host
after the migration:

    KVM: injection failed, MSI lost (Operation not permitted)

Firstly, I have very limited knowledge of OVS, so I'm not sure whether
this kind of live migration test environment is set up correctly. I'd
appreciate it if anyone could shed some light on it. Anyway, I'm digging
through the code to see if I can find something abnormal there.

> > I think this
> > does not matter if we are testing it functionally (whether live
> > migration could work), However I would still perfer to try ping
> > another host (say, host3) inside the intranet. What do you think?
> 
>  Yes, perhaps this would be better test, may be next time i'll do it.

Again, appreciate your testing!

> Anyway, IIRC, PATCH v2 is coming.

Hopefully I can fix this gap this week and send out v2.

	--yliu

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [dpdk-dev] [PATCH 0/4 for 2.3] vhost-user live migration support
  2015-12-14 12:09             ` Yuanhan Liu
@ 2015-12-14 13:00               ` Peter Xu
  2015-12-14 13:21                 ` Yuanhan Liu
  0 siblings, 1 reply; 42+ messages in thread
From: Peter Xu @ 2015-12-14 13:00 UTC (permalink / raw)
  To: Yuanhan Liu; +Cc: dev, Victor Kaplansky, Michael S. Tsirkin

On Mon, Dec 14, 2015 at 08:09:37PM +0800, Yuanhan Liu wrote:
> It seems that we have exactly the same test environment set up: I have
> one server (where I normally do vhost test there) and one desktop (my
> dev box), 
> 
> On both hosts, there is an ovs bridge, with IP address 192.168.100.1
> assigned manually. Later, I started a VM on the server, and manually
> assigned IP to 192.168.100.10. I then run "ping 192.168.100.1" for
> live migration testing.
> 
> The migration to my desktop somehow works (even though there are some
> bugs in this patch set), however, I did see what Pavel saw: about 12
> packets has been lost, which means about 12 seconds the network is not
> working well.

Hi, Yuanhan,

I _guess_ the problem with the ping might be: the guest's ARP entry for
192.168.100.1 is not updated. In other words, after the guest migrates
to host2 from host1, it is still trying to send packets to host1's NIC
(no one is telling it to update, right?), so no one responds to the
ping. When the entry expires, the guest resends the ARP request, and
this time host2 responds, with its own MAC address. After that, the
ping works again.

(I am not familiar with OVS either, so I am just treating it as a
"virtual" switch)
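
A rough way to check this guess from inside the guest right after the migration could be (interface name and addresses are only examples):

ip neigh show 192.168.100.1          # stale entry, still pointing at host1's MAC
ip neigh flush dev eth0              # drop it; the next ping triggers a fresh ARP
arping -U -I eth0 192.168.100.10     # or announce the guest's own address by hand

The last command does by hand roughly what the VIRTIO_NET_F_GUEST_ANNOUNCE feature asks the guest to do automatically.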

Thanks.
Peter

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [dpdk-dev] [PATCH 0/4 for 2.3] vhost-user live migration support
  2015-12-14 13:00               ` Peter Xu
@ 2015-12-14 13:21                 ` Yuanhan Liu
  2015-12-14 13:28                   ` Peter Xu
  2015-12-14 14:54                   ` Pavel Fedin
  0 siblings, 2 replies; 42+ messages in thread
From: Yuanhan Liu @ 2015-12-14 13:21 UTC (permalink / raw)
  To: Peter Xu; +Cc: dev, Victor Kaplansky, Michael S. Tsirkin

On Mon, Dec 14, 2015 at 09:00:22PM +0800, Peter Xu wrote:
> On Mon, Dec 14, 2015 at 08:09:37PM +0800, Yuanhan Liu wrote:
> > It seems that we have exactly the same test environment set up: I have
> > one server (where I normally do vhost test there) and one desktop (my
> > dev box), 
> > 
> > On both hosts, there is an ovs bridge, with IP address 192.168.100.1
> > assigned manually. Later, I started a VM on the server, and manually
> > assigned IP to 192.168.100.10. I then run "ping 192.168.100.1" for
> > live migration testing.
> > 
> > The migration to my desktop somehow works (even though there are some
> > bugs in this patch set), however, I did see what Pavel saw: about 12
> > packets has been lost, which means about 12 seconds the network is not
> > working well.
> 
> Hi, Yuanhan,
> 
> I _guess_ the problem for ping might be: guest ARP entry for
> 192.168.100.1 is not updated. Or say, after guest migrated to host2
> from host1, guest is still trying to send packet to host1's NIC (no
> one is telling it to update, right?), so no one is responding the
> ping. When the entry is expired, guest will resend the ARP request,
> and host2 will respond this time, with mac address on host2 provided
> this time. After that, ping works again.

Peter,

Thanks for your input, and that sounds reasonable. You just reminded
me that host1's NIC is indeed different from host2's NIC: the ovs
bridge MAC address is different.

I then had a quick try, setting the two ovs bridges to the same MAC
address, and it works like a charm: the gap is gone :)
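
(For the record, one way to pin the bridge MAC on both hosts might be, with a made-up address:

ovs-vsctl set bridge ovs-br0 other-config:hwaddr=52:54:00:12:34:56

so that the guest's ARP entry stays valid across the migration.)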

	--yliu
> 
> (not familiar with OVS too, so am just taking it as a "vritual"
> switch)
> 
> Thanks.
> Peter

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [dpdk-dev] [PATCH 0/4 for 2.3] vhost-user live migration support
  2015-12-14 13:21                 ` Yuanhan Liu
@ 2015-12-14 13:28                   ` Peter Xu
  2015-12-14 13:51                     ` Yuanhan Liu
  2015-12-14 14:54                   ` Pavel Fedin
  1 sibling, 1 reply; 42+ messages in thread
From: Peter Xu @ 2015-12-14 13:28 UTC (permalink / raw)
  To: Yuanhan Liu; +Cc: dev, Victor Kaplansky, Michael S. Tsirkin

On Mon, Dec 14, 2015 at 09:21:15PM +0800, Yuanhan Liu wrote:
> Peter,
> 
> Thanks for your input, and that sounds reasonable. You just reminded
> me that the host1's NIC is indeed different with host2's NIC: the ovs
> bridge mac address is different.
> 
> I then had a quick try, setting the two ovs bridge with same mac
> address, and it works like a charm: the gap is gone :)

Good to know that. :)

I will try to do some tests too using the patchset. Not sure whether
I will encounter the same KVM warning (it seems related to the APIC,
however I still cannot tell more than that). I will update you if
there is anything helpful.

Peter

> 
> 	--yliu

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [dpdk-dev] [PATCH 0/4 for 2.3] vhost-user live migration support
  2015-12-14 13:28                   ` Peter Xu
@ 2015-12-14 13:51                     ` Yuanhan Liu
  0 siblings, 0 replies; 42+ messages in thread
From: Yuanhan Liu @ 2015-12-14 13:51 UTC (permalink / raw)
  To: Peter Xu; +Cc: dev, Victor Kaplansky, Michael S. Tsirkin

On Mon, Dec 14, 2015 at 09:28:08PM +0800, Peter Xu wrote:
> On Mon, Dec 14, 2015 at 09:21:15PM +0800, Yuanhan Liu wrote:
> > Peter,
> > 
> > Thanks for your input, and that sounds reasonable. You just reminded
> > me that the host1's NIC is indeed different with host2's NIC: the ovs
> > bridge mac address is different.
> > 
> > I then had a quick try, setting the two ovs bridge with same mac
> > address, and it works like a charm: the gap is gone :)
> 
> Good to know that. :)
> 
> I will try to do some tests too using the patchset. Not sure whether
> I will encounter the same KVM warning (seems related to APIC,
> however still could not tell more than that). Will update you if
> there is anything helpful.

Appreciate that! It'd be good if you could test with my v2 patch set:
hopefully I can cook it up by the end of tomorrow.

	--yliu

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [dpdk-dev] [PATCH 0/4 for 2.3] vhost-user live migration support
  2015-12-14 13:21                 ` Yuanhan Liu
  2015-12-14 13:28                   ` Peter Xu
@ 2015-12-14 14:54                   ` Pavel Fedin
  1 sibling, 0 replies; 42+ messages in thread
From: Pavel Fedin @ 2015-12-14 14:54 UTC (permalink / raw)
  To: 'Yuanhan Liu', 'Peter Xu'
  Cc: dev, 'Victor Kaplansky', 'Michael S. Tsirkin'

 Hello!

> > I _guess_ the problem for ping might be: guest ARP entry for
> > 192.168.100.1 is not updated. Or say, after guest migrated to host2
> > from host1, guest is still trying to send packet to host1's NIC (no
> > one is telling it to update, right?), so no one is responding the
> > ping. When the entry is expired, guest will resend the ARP request,
> > and host2 will respond this time, with mac address on host2 provided
> > this time. After that, ping works again.
> 
> Peter,
> 
> Thanks for your input, and that sounds reasonable. You just reminded
> me that the host1's NIC is indeed different with host2's NIC: the ovs
> bridge mac address is different.

 Yes, this is indeed what is happening, and actually I already wrote about it. In Wireshark it looks exactly like that: some
pings are sent without replies, then the guest redoes ARP, and ping replies resume.

Kind regards,
Pavel Fedin
Expert Engineer
Samsung Electronics Research center Russia

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [dpdk-dev] [PATCH 0/4 for 2.3] vhost-user live migration support
  2015-12-14  3:58     ` Peter Xu
  2015-12-14  7:30       ` Pavel Fedin
@ 2015-12-15  8:23       ` Yuanhan Liu
  2015-12-15  8:45         ` Pavel Fedin
  2015-12-15  9:42         ` Peter Xu
  1 sibling, 2 replies; 42+ messages in thread
From: Yuanhan Liu @ 2015-12-15  8:23 UTC (permalink / raw)
  To: Peter Xu; +Cc: dev

On Mon, Dec 14, 2015 at 11:58:42AM +0800, Peter Xu wrote:
> On Fri, Dec 11, 2015 at 01:22:23PM +0300, Pavel Fedin wrote:
> >  BTW, it works, and it was my bad. openvswitch was configured incorrectly on the other side, vhost port number was different for
> > some reason, while ruleset was the same. I reconfigured it and now everything migrates correctly, except increased downtime because
> > of missing GARP (the guest misses some PINGs, then it retries ARP, which brings the link back up).
> 
> Hi,
> 
> When doing the ping, was it from the guest (to another host) or to
> the guest (from another host)?
> 
> In any case, I still could not understand why the ping loss happened
> in this test.
> 
> If ping from guest, no ARP refresh is required at all?
> 
> If ping to guest from outside, when the migration finishes on the
> target side of qemu, qemu_self_announce() will be called.

I'm supposed to see some ARP requests if I run tcpdump against
the ovs bridge, right? However, in fact, I saw nothing from
tcpdump on the target host when the migration was done.

I mean, I do see that qemu_announce_self() composes a broadcast
ARP request, but it seems that I didn't catch it on
the target host.

Is something wrong, or is there something I missed?
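
One way to catch both kinds of announcement on the bridge, since the self-announce frame may be a RARP rather than an ARP, could be something like:

tcpdump -eni ovs-br0 'arp or rarp'

(-e prints the link-level header, so the announced MAC is visible; the filter is pcap syntax.)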

	--yliu

> Although
> we might see a warning like "Vhost user backend fails to broadcast
> fake RARP" (notify is done by hacking vhost_user_receive(), even if
> notify fails, things will still move on), QEMU should still send a
> RARP onto the link.
> 
> Not sure whether I missed anything.
> 
> Thanks.
> Peter

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [dpdk-dev] [PATCH 0/4 for 2.3] vhost-user live migration support
  2015-12-15  8:23       ` Yuanhan Liu
@ 2015-12-15  8:45         ` Pavel Fedin
  2015-12-15  8:56           ` Yuanhan Liu
  2015-12-15 10:05           ` Peter Xu
  2015-12-15  9:42         ` Peter Xu
  1 sibling, 2 replies; 42+ messages in thread
From: Pavel Fedin @ 2015-12-15  8:45 UTC (permalink / raw)
  To: 'Yuanhan Liu', 'Peter Xu'; +Cc: dev

 Hello!

> I mean I do find that qemu_annouce_self composes an ARP
> broadcoast request, but it seems that I didn't catch it on
> the target host.
> 
> Something wrong, or someting I missed?

 To tell the truth, I don't know; I am also learning the qemu internals on the fly. Indeed, I see that it should announce itself. But
this brings up a question: why do we need a special announce procedure in vhost-user then?
 I think you can add some debug output and see how it works in real time. This is what I normally do when I don't understand in which
sequence things happen.

Kind regards,
Pavel Fedin
Expert Engineer
Samsung Electronics Research center Russia

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [dpdk-dev] [PATCH 0/4 for 2.3] vhost-user live migration support
  2015-12-15  8:45         ` Pavel Fedin
@ 2015-12-15  8:56           ` Yuanhan Liu
  2015-12-15  9:04             ` Pavel Fedin
  2015-12-15 10:05           ` Peter Xu
  1 sibling, 1 reply; 42+ messages in thread
From: Yuanhan Liu @ 2015-12-15  8:56 UTC (permalink / raw)
  To: Pavel Fedin, Thibaut Collet; +Cc: dev

On Tue, Dec 15, 2015 at 11:45:56AM +0300, Pavel Fedin wrote:
>  Hello!
> 
> > I mean I do find that qemu_annouce_self composes an ARP
> > broadcoast request, but it seems that I didn't catch it on
> > the target host.
> > 
> > Something wrong, or someting I missed?
> 
>  To tell the truth, i don't know. I am also learning qemu internals on the fly. Indeed, i see that it should announce itself.

I was actually asking Peter. Sorry for not making it clear, and
thanks for your reply anyway :)

> But
> this brings up a question: why do we need special announce procedure in vhost-user then?

Not quite sure. I found that Thibaut submitted a patch months ago to
send a VHOST_USER_SEND_RARP request after migration is done.
Thibaut, would you please elaborate a bit more on what
should be done in the vhost-user backend? Construct a gratuitous
ARP request and broadcast it?

>  I think you can add some debug output and see how it works in realtime. This is what i normally do when i don't understand in which
> sequence things happen.

Thanks.

	--yliu
> 
> Kind regards,
> Pavel Fedin
> Expert Engineer
> Samsung Electronics Research center Russia
> 

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [dpdk-dev] [PATCH 0/4 for 2.3] vhost-user live migration support
  2015-12-15  8:56           ` Yuanhan Liu
@ 2015-12-15  9:04             ` Pavel Fedin
  0 siblings, 0 replies; 42+ messages in thread
From: Pavel Fedin @ 2015-12-15  9:04 UTC (permalink / raw)
  To: 'Yuanhan Liu', 'Thibaut Collet'; +Cc: dev

 Hello!

> Note quite sure. I found Thibaut submitted a patch to send
> VHOST_USER_SEND_RARP request after migration is done months
> ago. Thibaut, would you please elaborate it a bit more what
> should be done on vhost-user backend? To construct a gratuitous
> ARP request and broadcast it?

 By the way, some more info for you all.
1. I've just examined qemu_announce_self() and I see that the IPs are all set to 0 in the packet it generates. It's quite logical,
because qemu has no idea what address is used by the guest; moreover, theoretically it might not be IPv4 at all. But then, how can
it work at all, and what is the use of this packet?
2. I tried to work around it by adding VIRTIO_NET_F_GUEST_ANNOUNCE. I expected that the guest would see it and make the announcement by
itself. But the result was quite the opposite: ping stopped working at all, right from the beginning, even without migration.

 Can local qemu/DPDK/etc. gurus give some explanation?

Kind regards,
Pavel Fedin
Expert Engineer
Samsung Electronics Research center Russia

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [dpdk-dev] [PATCH 0/4 for 2.3] vhost-user live migration support
  2015-12-15  8:23       ` Yuanhan Liu
  2015-12-15  8:45         ` Pavel Fedin
@ 2015-12-15  9:42         ` Peter Xu
  1 sibling, 0 replies; 42+ messages in thread
From: Peter Xu @ 2015-12-15  9:42 UTC (permalink / raw)
  To: Yuanhan Liu; +Cc: dev

On Tue, Dec 15, 2015 at 04:23:24PM +0800, Yuanhan Liu wrote:
> On Mon, Dec 14, 2015 at 11:58:42AM +0800, Peter Xu wrote:
> > If ping to guest from outside, when the migration finishes on the
> > target side of qemu, qemu_self_announce() will be called.
> 
> It's supposed to see some ARP requests if I run tcpdump against
> with the ovs bridge, right? However, in fact, I saw nothing from
> tcpdump on the target host when the migration is done.
> 
> I mean I do find that qemu_annouce_self composes an ARP
> broadcoast request, but it seems that I didn't catch it on
> the target host.
> 
> Something wrong, or someting I missed?

AFAIK, it should be a RARP rather than an ARP request. However, sorry
that I do not know the reason for its loss either.

Btw, I did a very basic live migration using the v1 patchset locally
today (one host, two QEMU instances attached to the same vhost-user
socket); it's working on my host too. :)

Thanks.
Peter

> 
> 	--yliu
> 
> > Although
> > we might see a warning like "Vhost user backend fails to broadcast
> > fake RARP" (notify is done by hacking vhost_user_receive(), even if
> > notify fails, things will still move on), QEMU should still send a
> > RARP onto the link.
> > 
> > Not sure whether I missed anything.
> > 
> > Thanks.
> > Peter

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [dpdk-dev] [PATCH 0/4 for 2.3] vhost-user live migration support
  2015-12-15  8:45         ` Pavel Fedin
  2015-12-15  8:56           ` Yuanhan Liu
@ 2015-12-15 10:05           ` Peter Xu
  2015-12-15 11:43             ` Thibaut Collet
  1 sibling, 1 reply; 42+ messages in thread
From: Peter Xu @ 2015-12-15 10:05 UTC (permalink / raw)
  To: Pavel Fedin; +Cc: dev, Victor Kaplansky

On Tue, Dec 15, 2015 at 11:45:56AM +0300, Pavel Fedin wrote:
>  To tell the truth, i don't know. I am also learning qemu internals on the fly. Indeed, i see that it should announce itself. But
> this brings up a question: why do we need special announce procedure in vhost-user then?

I have the same question. Here is my guess...

In customized networks, maybe people are not using ARP at all? When
we use DPDK, we bypass the network logic inside the kernel entirely,
so logically all the network protocols could be customized by the
user. In such a customized network, maybe there is some other protocol
(rather than RARP) that does the same thing ARP/RARP does. So, this
SEND_RARP request could give the vhost-user backend a chance to format
its own announce packet and broadcast it (in the SEND_RARP request,
the guest's MAC address is appended).

CCing Victor to better know the truth...

Peter

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [dpdk-dev] [PATCH 0/4 for 2.3] vhost-user live migration support
  2015-12-15 10:05           ` Peter Xu
@ 2015-12-15 11:43             ` Thibaut Collet
  2015-12-15 11:47               ` Thibaut Collet
  0 siblings, 1 reply; 42+ messages in thread
From: Thibaut Collet @ 2015-12-15 11:43 UTC (permalink / raw)
  To: Peter Xu; +Cc: dev, Victor Kaplansky

On Tue, Dec 15, 2015 at 11:05 AM, Peter Xu <peterx@redhat.com> wrote:

> On Tue, Dec 15, 2015 at 11:45:56AM +0300, Pavel Fedin wrote:
> >  To tell the truth, i don't know. I am also learning qemu internals on
> the fly. Indeed, i see that it should announce itself. But
> > this brings up a question: why do we need special announce procedure in
> vhost-user then?
>
> I have the same question. Here is my guess...
>
> In customized networks, maybe people are not using ARP at all? When
> we use DPDK, we directly pass through the network logic inside
> kernel itself. So logically all the network protocols could be
> customized by the user of it. In the customized network, maybe there
> is some other protocol (rather than RARP) that would do the same
> thing as what ARP/RARP does. So, this SEND_RARP request could give
> the vhost-user backend a chance to format its own announce packet
> and broadcast (in the SEND_RARP request, the guest's mac address
> will be appended).
>
> CCing Victor to better know the truth...
>
> Peter
>


Hi,

After a migration, to avoid a network outage, the guest must announce its new
location to the L2 layer, typically with a GARP. Otherwise requests sent to
the guest arrive at the old host until an ARP request is sent (after 30
seconds) or the guest sends some data.

QEMU's implementation of self announce after a migration with a vhost backend
is the following:
 - If the VIRTIO_GUEST_ANNOUNCE feature has been negotiated, the guest sends
a GARP automatically.
 - Else, if the vhost backend implements VHOST_USER_SEND_RARP, this request
is sent to the vhost backend. When this message is received, the vhost
backend must act as if it had received a RARP from the guest (the purpose of
this RARP is to update the switches' MAC->port mapping, like a GARP). This
RARP is a fake one, created by the vhost backend.
 - Else nothing is done and we have a network outage until an ARP is sent or
the guest sends some data.


The VIRTIO_GUEST_ANNOUNCE feature is negotiated if:
  - the vhost backend announces support for this feature. Maybe QEMU can
be updated to support this feature unconditionally;
  - the virtio driver of the guest implements this feature. This is not the
case for old kernels or the dpdk virtio pmd.

Regarding dpdk, to have a migration of a vhost interface with limited network
outage we have to:

  - Implement management of the VHOST_USER_SEND_RARP request to emulate a fake
RARP for the guest

To do that we have to consider two kinds of guests:
  1. Guests whose virtio driver implements the VIRTIO_GUEST_ANNOUNCE feature
  2. Guests whose virtio driver does not have the VIRTIO_GUEST_ANNOUNCE
feature. This is the case with old kernels or a guest running dpdk (the virtio
pmd of dpdk does not have this feature)

Guests with the VIRTIO_GUEST_ANNOUNCE feature automatically send some GARPs
after a migration if this feature has been negotiated. So the only thing to
do is to negotiate the VIRTIO_GUEST_ANNOUNCE feature between QEMU, the vhost
backend and the guest.
For this kind of guest the vhost backend must announce support for the
VIRTIO_GUEST_ANNOUNCE feature. As the vhost backend has no particular action
to do in this case, support for the VIRTIO_GUEST_ANNOUNCE feature can be
set unconditionally in QEMU in the future.

For guests without the VIRTIO_GUEST_ANNOUNCE feature we have to send a fake
RARP: QEMU knows the MAC address of the guest and can create and broadcast
a RARP. But in the case of a vhost backend, QEMU is not able to broadcast this
fake RARP and must ask the vhost backend to do it through the
VHOST_USER_SEND_RARP request. When the vhost backend receives this message
it must create a fake RARP message (as done by QEMU) and do the appropriate
operation, as if this message had been sent by the guest through the virtio
rings.


To solve this point 2 solutions are implemented:
 - After the migration the guest automatically sends GARP. This solution
occurs if VIRTIO_GUEST_ANNOUNCE feature has been negotiated between QEMU
and the guest.
         * VIRTIO_GUEST_ANNOUNCE

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [dpdk-dev] [PATCH 0/4 for 2.3] vhost-user live migration support
  2015-12-15 11:43             ` Thibaut Collet
@ 2015-12-15 11:47               ` Thibaut Collet
  2015-12-15 12:24                 ` Pavel Fedin
  2015-12-15 13:18                 ` Yuanhan Liu
  0 siblings, 2 replies; 42+ messages in thread
From: Thibaut Collet @ 2015-12-15 11:47 UTC (permalink / raw)
  To: Peter Xu; +Cc: dev, Victor Kaplansky

On Tue, Dec 15, 2015 at 12:43 PM, Thibaut Collet <thibaut.collet@6wind.com>
wrote:

>
>
> On Tue, Dec 15, 2015 at 11:05 AM, Peter Xu <peterx@redhat.com> wrote:
>
>> On Tue, Dec 15, 2015 at 11:45:56AM +0300, Pavel Fedin wrote:
>> >  To tell the truth, i don't know. I am also learning qemu internals on
>> the fly. Indeed, i see that it should announce itself. But
>> > this brings up a question: why do we need special announce procedure in
>> vhost-user then?
>>
>> I have the same question. Here is my guess...
>>
>> In customized networks, maybe people are not using ARP at all? When
>> we use DPDK, we directly pass through the network logic inside
>> kernel itself. So logically all the network protocols could be
>> customized by the user of it. In the customized network, maybe there
>> is some other protocol (rather than RARP) that would do the same
>> thing as what ARP/RARP does. So, this SEND_RARP request could give
>> the vhost-user backend a chance to format its own announce packet
>> and broadcast (in the SEND_RARP request, the guest's mac address
>> will be appended).
>>
>> CCing Victor to better know the truth...
>>
>> Peter
>>
>
>
> Hi,
>
> After a migration, to avoid network outage, the guest must announce its
> new location to the L2 layer, typically with a GARP. Otherwise requests
> sent to the guest arrive to the old host until a ARP request is sent (after
> 30 seconds) or the guest sends some data.
>
> QEMU implementation of self announce after a migration with a vhost
> backend is the following:
>  - If the VIRTIO_GUEST_ANNOUNCE feature has been negotiated the guest
> sends automatically a GARP.
>  - Else if the vhost backend implements VHOST_USER_SEND_RARP this request
> is sent to the vhost backend. When this message is received the vhost
> backend must act as it receives a RARP from the guest (purpose of this RARP
> is to update switches' MAC->port maaping as a GARP). This RARP is a false
> one, created by the vhost backend,
>  - Else nothing is done and we have a network outage until a ARP is sent
> or the guest sends some data.
>
>
> VIRTIO_GUEST_ANNOUNCE feature is negotiated if:
>   - the vhost backend announces the support of this feature. Maybe QEMU
> can be updated to support unconditionnaly this feature
>   - the virtio driver of the guest implements this feature. It is not the
> case for old kernel or dpdk virtio pmd.
>
> Regarding dpdk to have a migration of vhost interface with limited network
> outage we have to:
>
>   - Implement management VHOST_USER_SEND_RARP request to emulate a fake
> RARP for guest
>
> To do that we have to consider two kinds of guest:
>   1. Guest with virtio driver implementing VIRTIO_GUEST_ANNOUNCE feature
>   2. Guest with virtio driver that does not have the VIRTIO_GUEST_ANNOUNCE
> feature. This is the case with old kernel or guest running a dpdk (virtio
> pmd of dpdk does not have this feature)
>
> Guest with VIRTIO_GUEST_ANNOUNCE feature sends automatically some GARP
> after a migration if this feature has been negotiated. So the only thing to
> do it is to negotiate the VIRTIO_GUEST_ANNOUNCE feature between QEMU, vhost
> backend and the guest.
> For this kind of guest the vhost-backend must announce the support of
> VIRTIO_GUEST_ANNOUNCE feature. As vhost-backend has no particular action to
> do in this case the support of VIRTIO_GUEST_ANNOUNCE feature can be
> unconditionally set in QEMU in the future.
>
> For guest without VIRTIO_GUEST_ANNOUNCE feature we have to send a fake
> RARP: QEMU knows the MAC address of the guest and can create and broadcast
> a RARP. But in case of vhost-backend QEMU is not able to broadcast this
> fake RARP and must ask to the vhost backend to do it through the
> VHOST_USER_SEND_RARP request. When the vhost backend receives this message
> it must create a fake RARP message (as done by QEMU) and do the appropriate
> operation as this message has been sent by the guest through the virtio
> rings.
>
>
> To solve this point 2 solutions are implemented:
>  - After the migration the guest automatically sends GARP. This solution
> occurs if VIRTIO_GUEST_ANNOUNCE feature has been negotiated between QEMU
> and the guest.
>          * VIRTIO_GUEST_ANNOUNCE
>


Sorry, my previous message was sent by error (it is a draft with rework
in progress).

The full explanation is:

Hi,

After a migration, to avoid a network outage, the guest must announce its new
location to the L2 layer, typically with a GARP. Otherwise requests sent to
the guest arrive at the old host until an ARP request is sent (after 30
seconds) or the guest sends some data.

QEMU's implementation of self announce after a migration with a vhost backend
is the following:
 - If the VIRTIO_GUEST_ANNOUNCE feature has been negotiated, the guest sends
a GARP automatically.
 - Else, if the vhost backend implements VHOST_USER_SEND_RARP, this request
is sent to the vhost backend. When this message is received, the vhost
backend must act as if it had received a RARP from the guest (the purpose of
this RARP is to update the switches' MAC->port mapping, like a GARP). This
RARP is a fake one, created by the vhost backend.
 - Else nothing is done and we have a network outage until an ARP is sent or
the guest sends some data.


The VIRTIO_GUEST_ANNOUNCE feature is negotiated if:
  - the vhost backend announces support for this feature. Maybe QEMU can
be updated to support this feature unconditionally;
  - the virtio driver of the guest implements this feature. This is not the
case for old kernels or the dpdk virtio pmd.

Regarding dpdk, to have a migration of a vhost interface with limited network
outage we have to:
  - In the vhost pmd
        * Announce support of the VIRTIO_GUEST_ANNOUNCE feature
        * Implement management of the VHOST_USER_SEND_RARP request to emulate
a fake RARP if the VIRTIO_GUEST_ANNOUNCE feature is not implemented by the
guest (a sketch of such a fake RARP frame is given below)

  - In the virtio pmd
        * Support the VIRTIO_GUEST_ANNOUNCE feature to avoid RARP emission by
the host after a migration.
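
To make the "fake RARP" concrete, here is a minimal sketch in plain C (not the actual DPDK or QEMU code) of the frame a backend could build from the guest MAC carried by VHOST_USER_SEND_RARP. Field values follow RFC 903; the IP fields stay zero because the backend does not know the guest's address:

--- cut ---
#include <stdint.h>
#include <string.h>
#include <arpa/inet.h>

#define ETHER_TYPE_RARP         0x8035
#define RARP_OP_REQUEST_REVERSE 3

struct rarp_frame {
	uint8_t  dst[6];      /* ff:ff:ff:ff:ff:ff, broadcast  */
	uint8_t  src[6];      /* guest MAC                     */
	uint16_t ether_type;  /* 0x8035, RARP                  */
	uint16_t htype;       /* 1, Ethernet                   */
	uint16_t ptype;       /* 0x0800, IPv4                  */
	uint8_t  hlen;        /* 6                             */
	uint8_t  plen;        /* 4                             */
	uint16_t op;          /* 3, "request reverse"          */
	uint8_t  sha[6];      /* sender MAC = guest MAC        */
	uint8_t  spa[4];      /* sender IP, left as 0.0.0.0    */
	uint8_t  tha[6];      /* target MAC = guest MAC        */
	uint8_t  tpa[4];      /* target IP, left as 0.0.0.0    */
} __attribute__((packed));

static void make_fake_rarp(struct rarp_frame *f, const uint8_t mac[6])
{
	memset(f, 0, sizeof(*f));
	memset(f->dst, 0xff, 6);
	memcpy(f->src, mac, 6);
	f->ether_type = htons(ETHER_TYPE_RARP);
	f->htype      = htons(1);
	f->ptype      = htons(0x0800);
	f->hlen       = 6;
	f->plen       = 4;
	f->op         = htons(RARP_OP_REQUEST_REVERSE);
	memcpy(f->sha, mac, 6);
	memcpy(f->tha, mac, 6);
}
--- cut ---

The backend would then hand this frame to its switching logic exactly as if the guest itself had transmitted it, so that the MAC->port mapping is relearned on the new host.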


Hope this explanation will help

Regards.

Thibaut.

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [dpdk-dev] [PATCH 0/4 for 2.3] vhost-user live migration support
  2015-12-15 11:47               ` Thibaut Collet
@ 2015-12-15 12:24                 ` Pavel Fedin
  2015-12-15 13:36                   ` Yuanhan Liu
  2015-12-15 13:18                 ` Yuanhan Liu
  1 sibling, 1 reply; 42+ messages in thread
From: Pavel Fedin @ 2015-12-15 12:24 UTC (permalink / raw)
  To: 'Thibaut Collet', 'Peter Xu'
  Cc: dev, 'Victor Kaplansky'

 Hello!

> After a migration, to avoid network outage, the guest must announce its new location to the L2 layer, typically with a GARP. Otherwise requests sent to
> the guest arrive to the old host until a ARP request is sent (after 30 seconds) or the guest sends some data.
> QEMU implementation of self announce after a migration with a vhost backend is the following:
> - If the VIRTIO_GUEST_ANNOUNCE feature has been negotiated the guest sends automatically a GARP.
> - Else if the vhost backend implements VHOST_USER_SEND_RARP this request is sent to the vhost backend. When this message is received the vhost backend
> must act as it receives a RARP from the guest (purpose of this RARP is to update switches' MAC->port maaping as a GARP). This RARP is a false one,
> created by the vhost backend,
> - Else nothing is done and we have a network outage until a ARP is sent or the guest sends some data.

 But what is qemu_announce_self() then? It's just unconditionally triggered after migration, but it indeed sends a somewhat strange packet.

> VIRTIO_GUEST_ANNOUNCE feature is negotiated if:
>  - the vhost backend announces the support of this feature. Maybe QEMU can be updated to support unconditionnaly this feature

 Wrong. I tried to unconditionally enforce it in qemu (my guest does support it), and the link stopped working entirely. I don't understand why.

Kind regards,
Pavel Fedin
Expert Engineer
Samsung Electronics Research center Russia

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [dpdk-dev] [PATCH 0/4 for 2.3] vhost-user live migration support
  2015-12-15 11:47               ` Thibaut Collet
  2015-12-15 12:24                 ` Pavel Fedin
@ 2015-12-15 13:18                 ` Yuanhan Liu
  2015-12-15 15:07                   ` Thibaut Collet
  1 sibling, 1 reply; 42+ messages in thread
From: Yuanhan Liu @ 2015-12-15 13:18 UTC (permalink / raw)
  To: Thibaut Collet; +Cc: dev, Victor Kaplansky

On Tue, Dec 15, 2015 at 12:47:47PM +0100, Thibaut Collet wrote:
> On Tue, Dec 15, 2015 at 12:43 PM, Thibaut Collet <thibaut.collet@6wind.com>
> wrote:
> 
> >
> >
> > On Tue, Dec 15, 2015 at 11:05 AM, Peter Xu <peterx@redhat.com> wrote:
> >
> >> On Tue, Dec 15, 2015 at 11:45:56AM +0300, Pavel Fedin wrote:
> >> >  To tell the truth, i don't know. I am also learning qemu internals on
> >> the fly. Indeed, i see that it should announce itself. But
> >> > this brings up a question: why do we need special announce procedure in
> >> vhost-user then?
> >>
> >> I have the same question. Here is my guess...
> >>
> >> In customized networks, maybe people are not using ARP at all? When
> >> we use DPDK, we directly pass through the network logic inside
> >> kernel itself. So logically all the network protocols could be
> >> customized by the user of it. In the customized network, maybe there
> >> is some other protocol (rather than RARP) that would do the same
> >> thing as what ARP/RARP does. So, this SEND_RARP request could give
> >> the vhost-user backend a chance to format its own announce packet
> >> and broadcast (in the SEND_RARP request, the guest's mac address
> >> will be appended).
> >>
> >> CCing Victor to better know the truth...
> >>
> >> Peter
> >>
> >

Hey Thibaut,

First of all, thanks a lot for your lengthy explanation.

> > Hi,
> >
> > After a migration, to avoid network outage, the guest must announce its
> > new location to the L2 layer, typically with a GARP. Otherwise requests
> > sent to the guest arrive to the old host until a ARP request is sent (after
> > 30 seconds) or the guest sends some data.
> >
> > QEMU implementation of self announce after a migration with a vhost
> > backend is the following:
> >  - If the VIRTIO_GUEST_ANNOUNCE feature has been negotiated the guest
> > sends automatically a GARP.

I'm fairly clear on how VIRTIO_GUEST_ANNOUNCE works so far, except that I
met a bug, which I will describe in another email.

> >  - Else if the vhost backend implements VHOST_USER_SEND_RARP this request
> > is sent to the vhost backend. When this message is received the vhost
> > backend must act as it receives a RARP from the guest (purpose of this RARP

Can you be more specific about this? Say, what exactly should the vhost
backend do?

> > is to update switches' MAC->port maaping as a GARP).

Isn't the vhost library unaware of the switch at all? How could we
update the switch's MAC->port mapping inside the vhost library?

> This RARP is a false
> > one, created by the vhost backend,

I'm a bit confused now. You were just saying "the vhost backend must act
as if it __receives__ a RARP from the guest", and you are now saying
"the RARP is a fake one __created__ by the vhost backend".

Thanks.

	--yliu

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [dpdk-dev] [PATCH 0/4 for 2.3] vhost-user live migration support
  2015-12-15 12:24                 ` Pavel Fedin
@ 2015-12-15 13:36                   ` Yuanhan Liu
  2015-12-15 13:48                     ` Pavel Fedin
  0 siblings, 1 reply; 42+ messages in thread
From: Yuanhan Liu @ 2015-12-15 13:36 UTC (permalink / raw)
  To: Pavel Fedin; +Cc: dev, 'Victor Kaplansky'

On Tue, Dec 15, 2015 at 03:24:48PM +0300, Pavel Fedin wrote:
>  Hello!
> 
> > After a migration, to avoid network outage, the guest must announce its new location to the L2 layer, typically with a GARP. Otherwise requests sent to
> > the guest arrive to the old host until a ARP request is sent (after 30 seconds) or the guest sends some data.
> > QEMU implementation of self announce after a migration with a vhost backend is the following:
> > - If the VIRTIO_GUEST_ANNOUNCE feature has been negotiated the guest sends automatically a GARP.
> > - Else if the vhost backend implements VHOST_USER_SEND_RARP this request is sent to the vhost backend. When this message is received the vhost backend
> > must act as it receives a RARP from the guest (purpose of this RARP is to update switches' MAC->port maaping as a GARP). This RARP is a false one,
> > created by the vhost backend,
> > - Else nothing is done and we have a network outage until a ARP is sent or the guest sends some data.
> 
>  But what is qemu_announce_self() then? It's just unconditionally triggered after migration, but indeed sends some strange thing.
> 
> > VIRTIO_GUEST_ANNOUNCE feature is negotiated if:
> >  - the vhost backend announces the support of this feature. Maybe QEMU can be updated to support unconditionnaly this feature
> 
>  Wrong. I tried to unconditionally enforce it in qemu (my guest does support it), and the link stopped working at all. I don't understand why.

I'm wondering how you did that? Why do you need to enforce it in QEMU?
Isn't it already supported so far?

Actually, what we need to do is to add such a feature bit in the vhost
library, to claim we support it so that the guest will send a
gratuitous ARP when migration is done (check virtio_net_load()).

----
diff --git a/lib/librte_vhost/virtio-net.c b/lib/librte_vhost/virtio-net.c
index 03044f6..0ba5045 100644
--- a/lib/librte_vhost/virtio-net.c
+++ b/lib/librte_vhost/virtio-net.c
@@ -74,6 +74,7 @@ static struct virtio_net_config_ll *ll_root;
 #define VHOST_SUPPORTED_FEATURES ((1ULL << VIRTIO_NET_F_MRG_RXBUF) | \
                                (1ULL << VIRTIO_NET_F_CTRL_VQ) | \
                                (1ULL << VIRTIO_NET_F_CTRL_RX) | \
+                               (1ULL << VIRTIO_NET_F_GUEST_ANNOUNCE) | \
                                (VHOST_SUPPORTS_MQ)            | \
                                (1ULL << VIRTIO_F_VERSION_1)   | \
                                (1ULL << VHOST_F_LOG_ALL)      | \

However, I found that the GARP is not sent out at all, due to an error
I met and reported before:

    KVM: injection failed, MSI lost (Operation not permitted)

This happened at the time QEMU was about to send the interrupt to the
guest for the announce event; it failed, hence no GARP was received.

One thing worth noting is that it happened only when I did live migration
on two different hosts (the two hosts happen to be using the same old
kernel: v3.11.10).  It works pretty well on the same host. So, it seems
like a KVM bug then?

	--yliu

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [dpdk-dev] [PATCH 0/4 for 2.3] vhost-user live migration support
  2015-12-15 13:36                   ` Yuanhan Liu
@ 2015-12-15 13:48                     ` Pavel Fedin
  2015-12-15 13:59                       ` Yuanhan Liu
  0 siblings, 1 reply; 42+ messages in thread
From: Pavel Fedin @ 2015-12-15 13:48 UTC (permalink / raw)
  To: 'Yuanhan Liu'; +Cc: dev, 'Victor Kaplansky'

 Hello!

> >  Wrong. I tried to unconditionally enforce it in qemu (my guest does support it), and the
> link stopped working at all. I don't understand why.
> 
> I'm wondering how did you do that? Why do you need enforece it in QEMU?
> Isn't it already supported so far?

 I mean: qemu first asks the vhost-user server (ovs/DPDK in our case) about capabilities, then negotiates them with the guest. And DPDK
doesn't report VIRTIO_NET_F_GUEST_ANNOUNCE, so I just ORed this flag in qemu before the negotiation with the guest (because indeed my
logic says that the host should not have to do anything special about it). So the overall effect is the same as in your patch.

> diff --git a/lib/librte_vhost/virtio-net.c
> b/lib/librte_vhost/virtio-net.c
> index 03044f6..0ba5045 100644
> --- a/lib/librte_vhost/virtio-net.c
> +++ b/lib/librte_vhost/virtio-net.c
> @@ -74,6 +74,7 @@ static struct virtio_net_config_ll *ll_root;
>  #define VHOST_SUPPORTED_FEATURES ((1ULL << VIRTIO_NET_F_MRG_RXBUF) | \
>                                 (1ULL << VIRTIO_NET_F_CTRL_VQ) | \
>                                 (1ULL << VIRTIO_NET_F_CTRL_RX) | \
> +                               (1ULL << VIRTIO_NET_F_GUEST_ANNOUNCE) | \
>                                 (VHOST_SUPPORTS_MQ)            | \
>                                 (1ULL << VIRTIO_F_VERSION_1)   | \
>                                 (1ULL << VHOST_F_LOG_ALL)      | \

 But I was somehow wrong, and this causes the whole thing to stop working instead. Even right after booting up, the network doesn't
work and pings do not pass.

> However, I found the GARP is not sent out at all, due to an error
> I met and reported before:
> 
>     KVM: injection failed, MSI lost (Operation not permitted)

 Interesting, I don't have this problem here. Some bug in your kernel/hardware?

> One thing worth noting is that it happened only when I did live migration
> on two different hosts (the two hosts happened to be using a same old
> kernel: v3.11.10).  It works pretty well on same host. So, seems like
> a KVM bug then?

 3.18.9 here, and no such problem.

Kind regards,
Pavel Fedin
Expert Engineer
Samsung Electronics Research center Russia

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [dpdk-dev] [PATCH 0/4 for 2.3] vhost-user live migration support
  2015-12-15 13:48                     ` Pavel Fedin
@ 2015-12-15 13:59                       ` Yuanhan Liu
  2015-12-15 14:58                         ` Pavel Fedin
  0 siblings, 1 reply; 42+ messages in thread
From: Yuanhan Liu @ 2015-12-15 13:59 UTC (permalink / raw)
  To: Pavel Fedin; +Cc: dev, 'Victor Kaplansky'

On Tue, Dec 15, 2015 at 04:48:12PM +0300, Pavel Fedin wrote:
>  Hello!
> 
> > >  Wrong. I tried to unconditionally enforce it in qemu (my guest does support it), and the
> > link stopped working at all. I don't understand why.
> > 
> > I'm wondering how did you do that? Why do you need enforece it in QEMU?
> > Isn't it already supported so far?
> 
>  I mean - qemu first asks vhost-user server (ovs/DPDK in our case) about capabilities, then negotiates them with the guest. And DPDK
> doesn't report VIRTIO_NET_F_GUEST_ANNOUNCE, so i just ORed this flag in qemu before the negotiation with guest (because indeed my
> logic says that the host should not do anything special about it). So the overall effect is the same as in your patch

I see.

> 
> > diff --git a/lib/librte_vhost/virtio-net.c
> > b/lib/librte_vhost/virtio-net.c
> > index 03044f6..0ba5045 100644
> > --- a/lib/librte_vhost/virtio-net.c
> > +++ b/lib/librte_vhost/virtio-net.c
> > @@ -74,6 +74,7 @@ static struct virtio_net_config_ll *ll_root;
> >  #define VHOST_SUPPORTED_FEATURES ((1ULL << VIRTIO_NET_F_MRG_RXBUF) | \
> >                                 (1ULL << VIRTIO_NET_F_CTRL_VQ) | \
> >                                 (1ULL << VIRTIO_NET_F_CTRL_RX) | \
> > +                               (1ULL << VIRTIO_NET_F_GUEST_ANNOUNCE) | \
> >                                 (VHOST_SUPPORTS_MQ)            | \
> >                                 (1ULL << VIRTIO_F_VERSION_1)   | \
> >                                 (1ULL << VHOST_F_LOG_ALL)      | \
> 
>  But i was somehow wrong and this causes the whole thing to stop working instead. Even after just booting up the network doesn't
> work and PINGs do not pass.

No idea. Maybe you changed some other configuration (such as of ovs)
without noticing? Or the ovs bridge interface reset?

BTW, would you please try my v1 patch set with the above diff applied, to
see if the ping loss is still there? You might also want to run tcpdump
on the destination host's ovs bridge, to see if the GARP is actually sent.

> 
> > However, I found the GARP is not sent out at all, due to an error
> > I met and reported before:
> > 
> >     KVM: injection failed, MSI lost (Operation not permitted)

I was thinking that may be caused by the difference between my two hosts (a
desktop and a server). I will try to find two similar hosts tomorrow
to do more tests. Besides that, it'd be great if you could do one more
test with the above diff applied.

	--yliu
> 
>  Interesting, I don't have this problem here. Some bug in your kernel/hardware?
> 
> > One thing worth noting is that it happened only when I did live migration
> > on two different hosts (the two hosts happened to be using the same old
> > kernel: v3.11.10).  It works pretty well on the same host. So, seems like
> > a KVM bug then?
> 
>  3.18.9 here and no such problem.
> 
> Kind regards,
> Pavel Fedin
> Expert Engineer
> Samsung Electronics Research center Russia
> 

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [dpdk-dev] [PATCH 0/4 for 2.3] vhost-user live migration support
  2015-12-15 13:59                       ` Yuanhan Liu
@ 2015-12-15 14:58                         ` Pavel Fedin
  2015-12-16  7:28                           ` Yuanhan Liu
  0 siblings, 1 reply; 42+ messages in thread
From: Pavel Fedin @ 2015-12-15 14:58 UTC (permalink / raw)
  To: 'Yuanhan Liu'; +Cc: dev, 'Victor Kaplansky'

 Hello!

> No idea. Maybe you have changed some other configuration (such as of ovs)
> without noticing? Or the ovs bridge interface resets?

 I don't touch ovs at all. I just shut down the guest, rebuild qemu, reinstall it, and run the guest.

> 
> BTW, would you please try my v1 patch set with the above diff applied to
> see if the ping loss is still there? You might also want to run tcpdump
> on the dest host's ovs bridge, to see if the GARP is actually sent.

 Retested with wireshark running on the host. I used my qemu patch instead, but it should not matter at all:
--- cut ---
diff --git a/hw/virtio/vhost-user.c b/hw/virtio/vhost-user.c
index 1b6c5ac..5ca2987 100644
--- a/hw/virtio/vhost-user.c
+++ b/hw/virtio/vhost-user.c
@@ -480,7 +480,12 @@ static int vhost_user_get_u64(struct vhost_dev *dev, int request, uint64_t *u64)

 static int vhost_user_get_features(struct vhost_dev *dev, uint64_t *features)
 {
-    return vhost_user_get_u64(dev, VHOST_USER_GET_FEATURES, features);
+    int ret = vhost_user_get_u64(dev, VHOST_USER_GET_FEATURES, features);
+
+    if (!ret) {
+        virtio_add_feature(features, VIRTIO_NET_F_GUEST_ANNOUNCE);
+    }
+    return ret;
 }

 static int vhost_user_set_owner(struct vhost_dev *dev)
--- cut ---

 So, here are both wireshark captures on the host side:

1. Without the patch

root@nfv_test_x86_64 / # tshark -i ovs-br0
Running as user "root" and group "root". This could be dangerous.
Capturing on 'ovs-br0'
  1   0.000000           :: -> ff02::16     ICMPv6 90 Multicast Listener Report Message v2
  2   0.003304 RealtekU_3b:83:1a -> Broadcast    ARP 42 Gratuitous ARP for 192.168.6.2 (Reply)
  3   0.669957           :: -> ff02::16     ICMPv6 90 Multicast Listener Report Message v2
  4   0.858957           :: -> ff02::1:ff3b:831a ICMPv6 78 Neighbor Solicitation for fe80::5054:ff:fe3b:831a
  5   1.858968 fe80::5054:ff:fe3b:831a -> ff02::16     ICMPv6 90 Multicast Listener Report Message v2
  6   2.300948 fe80::5054:ff:fe3b:831a -> ff02::16     ICMPv6 90 Multicast Listener Report Message v2
  7   2.527088 fe80::5054:ff:fe3b:831a -> ff02::2      ICMPv6 62 Router Solicitation
  8   2.527800 RealtekU_3b:83:1a -> Broadcast    ARP 42 Gratuitous ARP for 192.168.6.2 (Request)
  9   6.526814 fe80::5054:ff:fe3b:831a -> ff02::2      ICMPv6 62 Router Solicitation
 10  10.526993 fe80::5054:ff:fe3b:831a -> ff02::2      ICMPv6 62 Router Solicitation
 11  15.984632 RealtekU_3b:83:1a -> Broadcast    ARP 42 Who has 192.168.6.1?  Tell 192.168.6.2
 12  15.984643 be:e1:71:c1:47:4d -> RealtekU_3b:83:1a ARP 42 192.168.6.1 is at be:e1:71:c1:47:4d
 13  15.984772  192.168.6.2 -> 192.168.6.1  ICMP 98 Echo (ping) request  id=0x0477, seq=1/256, ttl=64
 14  15.984798  192.168.6.1 -> 192.168.6.2  ICMP 98 Echo (ping) reply    id=0x0477, seq=1/256, ttl=64 (request in 13)
 15  16.984970  192.168.6.2 -> 192.168.6.1  ICMP 98 Echo (ping) request  id=0x0477, seq=2/512, ttl=64
 16  16.984991  192.168.6.1 -> 192.168.6.2  ICMP 98 Echo (ping) reply    id=0x0477, seq=2/512, ttl=64 (request in 15)
 17  17.984956  192.168.6.2 -> 192.168.6.1  ICMP 98 Echo (ping) request  id=0x0477, seq=3/768, ttl=64
 18  17.984975  192.168.6.1 -> 192.168.6.2  ICMP 98 Echo (ping) reply    id=0x0477, seq=3/768, ttl=64 (request in 17)
 19  20.994535 be:e1:71:c1:47:4d -> RealtekU_3b:83:1a ARP 42 Who has 192.168.6.2?  Tell 192.168.6.1
 20  20.994637 RealtekU_3b:83:1a -> be:e1:71:c1:47:4d ARP 42 192.168.6.2 is at 52:54:00:3b:83:1a
^C20 packets captured

2. With the patch

root@nfv_test_x86_64 / # tshark -i ovs-br0
Running as user "root" and group "root". This could be dangerous.
Capturing on 'ovs-br0'
  1   0.000000           :: -> ff02::16     ICMPv6 90 Multicast Listener Report Message v2
  2   0.000969 RealtekU_3b:83:1a -> Broadcast    ARP 42 Gratuitous ARP for 192.168.6.2 (Reply)
  3   0.156966           :: -> ff02::1:ff3b:831a ICMPv6 78 Neighbor Solicitation for fe80::5054:ff:fe3b:831a
  4   0.536948           :: -> ff02::16     ICMPv6 90 Multicast Listener Report Message v2
  5   1.156968 fe80::5054:ff:fe3b:831a -> ff02::16     ICMPv6 90 Multicast Listener Report Message v2
  6   1.312708 fe80::5054:ff:fe3b:831a -> ff02::2      ICMPv6 62 Router Solicitation
  7   1.629960 fe80::5054:ff:fe3b:831a -> ff02::16     ICMPv6 90 Multicast Listener Report Message v2
  8   2.314713 RealtekU_3b:83:1a -> Broadcast    ARP 42 Gratuitous ARP for 192.168.6.2 (Request)
  9   5.313333 fe80::5054:ff:fe3b:831a -> ff02::2      ICMPv6 62 Router Solicitation
 10   9.315486 fe80::5054:ff:fe3b:831a -> ff02::2      ICMPv6 62 Router Solicitation
 11  21.536450 RealtekU_3b:83:1a -> Broadcast    ARP 42 Who has 192.168.6.1?  Tell 192.168.6.2
 12  21.536461 be:e1:71:c1:47:4d -> RealtekU_3b:83:1a ARP 42 192.168.6.1 is at be:e1:71:c1:47:4d
 13  22.538937 RealtekU_3b:83:1a -> Broadcast    ARP 42 Who has 192.168.6.1?  Tell 192.168.6.2
 14  22.538943 be:e1:71:c1:47:4d -> RealtekU_3b:83:1a ARP 42 192.168.6.1 is at be:e1:71:c1:47:4d
 15  23.540937 RealtekU_3b:83:1a -> Broadcast    ARP 42 Who has 192.168.6.1?  Tell 192.168.6.2
 16  23.540942 be:e1:71:c1:47:4d -> RealtekU_3b:83:1a ARP 42 192.168.6.1 is at be:e1:71:c1:47:4d
 17  25.537519 RealtekU_3b:83:1a -> Broadcast    ARP 42 Who has 192.168.6.1?  Tell 192.168.6.2
 18  25.537525 be:e1:71:c1:47:4d -> RealtekU_3b:83:1a ARP 42 192.168.6.1 is at be:e1:71:c1:47:4d
 19  26.538939 RealtekU_3b:83:1a -> Broadcast    ARP 42 Who has 192.168.6.1?  Tell 192.168.6.2
 20  26.538944 be:e1:71:c1:47:4d -> RealtekU_3b:83:1a ARP 42 192.168.6.1 is at be:e1:71:c1:47:4d
 21  27.540937 RealtekU_3b:83:1a -> Broadcast    ARP 42 Who has 192.168.6.1?  Tell 192.168.6.2
 22  27.540942 be:e1:71:c1:47:4d -> RealtekU_3b:83:1a ARP 42 192.168.6.1 is at be:e1:71:c1:47:4d
 23  29.538475 RealtekU_3b:83:1a -> Broadcast    ARP 42 Who has 192.168.6.1?  Tell 192.168.6.2
 24  29.538482 be:e1:71:c1:47:4d -> RealtekU_3b:83:1a ARP 42 192.168.6.1 is at be:e1:71:c1:47:4d
 25  30.538935 RealtekU_3b:83:1a -> Broadcast    ARP 42 Who has 192.168.6.1?  Tell 192.168.6.2
 26  30.538941 be:e1:71:c1:47:4d -> RealtekU_3b:83:1a ARP 42 192.168.6.1 is at be:e1:71:c1:47:4d
 27  31.540935 RealtekU_3b:83:1a -> Broadcast    ARP 42 Who has 192.168.6.1?  Tell 192.168.6.2
 28  31.540941 be:e1:71:c1:47:4d -> RealtekU_3b:83:1a ARP 42 192.168.6.1 is at be:e1:71:c1:47:4d
^C28 packets captured

 Obviously, the guest simply doesn't read incoming packets. ifconfig for the interface on the guest side shows:

RX packets 0 bytes 0 (0.0 B)
RX errors 0 dropped 9 overruns 0 frame 9

 BTW, the number 9 exactly matches the number of ARP replies from the host. The question is - why? It looks like the guest's behavior changes
somehow. Is it a bug in the guest? It's very strange, because between these sessions I see only one difference in the IPv6 packets:

  4   0.858957           :: -> ff02::1:ff3b:831a ICMPv6 78 Neighbor Solicitation for fe80::5054:ff:fe3b:831a

This is present in session #1 and missing from session #2. Can it affect the whole thing somehow? But I don't even use IPv6.

Kind regards,
Pavel Fedin
Expert Engineer
Samsung Electronics Research center Russia

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [dpdk-dev] [PATCH 0/4 for 2.3] vhost-user live migration support
  2015-12-15 13:18                 ` Yuanhan Liu
@ 2015-12-15 15:07                   ` Thibaut Collet
  2015-12-15 15:36                     ` Pavel Fedin
  2015-12-16  2:38                     ` Peter Xu
  0 siblings, 2 replies; 42+ messages in thread
From: Thibaut Collet @ 2015-12-15 15:07 UTC (permalink / raw)
  To: Yuanhan Liu; +Cc: dev, Victor Kaplansky

On Tue, Dec 15, 2015 at 2:18 PM, Yuanhan Liu <yuanhan.liu@linux.intel.com>
wrote:

> On Tue, Dec 15, 2015 at 12:47:47PM +0100, Thibaut Collet wrote:
> > On Tue, Dec 15, 2015 at 12:43 PM, Thibaut Collet <
> thibaut.collet@6wind.com>
> > wrote:
> >
> > >
> > >
> > > On Tue, Dec 15, 2015 at 11:05 AM, Peter Xu <peterx@redhat.com> wrote:
> > >
> > >> On Tue, Dec 15, 2015 at 11:45:56AM +0300, Pavel Fedin wrote:
> > >> >  To tell the truth, I don't know. I am also learning qemu internals on
> > >> > the fly. Indeed, I see that it should announce itself. But
> > >> > this brings up a question: why do we need a special announce procedure in
> > >> > vhost-user then?
> > >>
> > >> I have the same question. Here is my guess...
> > >>
> > >> In customized networks, maybe people are not using ARP at all? When
> > >> we use DPDK, we directly pass through the network logic inside
> > >> kernel itself. So logically all the network protocols could be
> > >> customized by the user of it. In the customized network, maybe there
> > >> is some other protocol (rather than RARP) that would do the same
> > >> thing as what ARP/RARP does. So, this SEND_RARP request could give
> > >> the vhost-user backend a chance to format its own announce packet
> > >> and broadcast (in the SEND_RARP request, the guest's mac address
> > >> will be appended).
> > >>
> > >> CCing Victor to better know the truth...
> > >>
> > >> Peter
> > >>
> > >
>
> Hey Thibaut,
>
> First of all, thanks a lot for your lengthy explanation.
>
> > > Hi,
> > >
> > > After a migration, to avoid network outage, the guest must announce its
> > > new location to the L2 layer, typically with a GARP. Otherwise requests
> > > sent to the guest arrive at the old host until an ARP request is sent
> > > (after 30 seconds) or the guest sends some data.
> > >
> > > QEMU's implementation of self announce after a migration with a vhost
> > > backend is the following:
> > >  - If the VIRTIO_GUEST_ANNOUNCE feature has been negotiated the guest
> > > automatically sends a GARP.
>
> I'm kind of clear how VIRTIO_GUEST_ANNOUNCE works so far, except that I
> met a bug, which I will describe in another email.
>
> > >  - Else if the vhost backend implements VHOST_USER_SEND_RARP, this request
> > > is sent to the vhost backend. When this message is received the vhost
> > > backend must act as if it had received a RARP from the guest (the purpose of this RARP
>
> Can you be more specific about this? Say, what exactly should the vhost
> backend do?
>
> > > is to update the switches' MAC->port mapping, as a GARP would).
>
> Isn't it that the vhost library is not aware of the switch at all? How could we
> update the switches' MAC-port mapping inside the vhost library?
>
> > > This RARP is a false
> > > one, created by the vhost backend,
>
> I'm a bit confused now. You were just saying "vhost backend must act
> as if it had __received__ a RARP from the guest", and you are now saying
> "the RARP is a false one __created__ by the vhost backend".
>
> Thanks.
>
>         --yliu
>

After a migration, to avoid network outage, all interfaces of the guest
must send a packet to update the switches' MAC-to-port mapping (ideally a GARP).
As some interfaces do not do it, QEMU does it on behalf of the guest by
sending a RARP (this RARP is not forged by the guest but by QEMU). This is
the purpose of qemu_announce_self(), which "spoofs" a RARP to all backends of the guest's
ethernet interfaces. For a vhost-user backend, QEMU cannot do it directly
and asks the vhost-user backend to do it with the VHOST_USER_SEND_RARP
request, which contains the MAC address of the guest interface.
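
To make it concrete, here is a minimal sketch of the kind of announce frame a
vhost-user backend could forge from the MAC address carried in the
VHOST_USER_SEND_RARP payload. The names and layout below are only
illustrative (this is not the actual DPDK code); it assumes the usual
RARP-style announce with all-zero IP addresses:

#include <stdint.h>
#include <string.h>

#define RARP_PKT_SIZE 60

static int make_rarp_announce(uint8_t *buf, const uint8_t mac[6])
{
    /* Ethernet header: broadcast destination, guest MAC as source,
     * EtherType 0x8035 (RARP). */
    memset(buf, 0xff, 6);                    /* dst: ff:ff:ff:ff:ff:ff  */
    memcpy(buf + 6, mac, 6);                 /* src: guest MAC          */
    buf[12] = 0x80; buf[13] = 0x35;          /* EtherType: RARP         */

    /* RARP payload (same layout as ARP). */
    uint8_t *p = buf + 14;
    p[0] = 0x00; p[1] = 0x01;                /* htype: Ethernet         */
    p[2] = 0x08; p[3] = 0x00;                /* ptype: IPv4             */
    p[4] = 6;    p[5] = 4;                   /* hlen / plen             */
    p[6] = 0x00; p[7] = 0x03;                /* op: reverse request     */
    memcpy(p + 8,  mac, 6);                  /* sender hardware address */
    memset(p + 14, 0, 4);                    /* sender IP: 0.0.0.0      */
    memcpy(p + 18, mac, 6);                  /* target hardware address */
    memset(p + 24, 0, 4);                    /* target IP: 0.0.0.0      */

    memset(buf + 42, 0, RARP_PKT_SIZE - 42); /* pad to minimum frame    */
    return RARP_PKT_SIZE;
}

The switches only need to see the guest MAC as the source address of a frame
arriving on the new port, so even this IP-less announcement is enough to
repoint their MAC-to-port mapping after the migration.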

Thibaut.

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [dpdk-dev] [PATCH 0/4 for 2.3] vhost-user live migration support
  2015-12-15 15:07                   ` Thibaut Collet
@ 2015-12-15 15:36                     ` Pavel Fedin
  2015-12-16  2:38                     ` Peter Xu
  1 sibling, 0 replies; 42+ messages in thread
From: Pavel Fedin @ 2015-12-15 15:36 UTC (permalink / raw)
  To: 'Thibaut Collet', 'Yuanhan Liu'
  Cc: dev, 'Victor Kaplansky'

 Hello!
 
> After a migration, to avoid network outage, all interfaces of the guest must send a packet to update the switches' MAC-to-port mapping (ideally a GARP).
> As some interfaces do not do it, QEMU does it on behalf of the guest by sending a RARP (this RARP is not forged by the guest but by QEMU). This is the
> purpose of qemu_announce_self(), which "spoofs" a RARP to all backends of the guest's ethernet interfaces. For a vhost-user backend, QEMU cannot do it directly

 Aha, I see it now. qemu_announce_self() uses qemu_foreach_nic(), which actually iterates only over NET_CLIENT_OPTIONS_KIND_NIC interfaces. I expect these are fully emulated hardware controllers; virtio uses another type (see enum NetClientOptionsKind).
 So, we can happily ignore qemu_announce_self(); it does not do anything for us. Thanks for pointing it out.

Kind regards,
Pavel Fedin
Expert Engineer
Samsung Electronics Research center Russia

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [dpdk-dev] [PATCH 0/4 for 2.3] vhost-user live migration support
  2015-12-15 15:07                   ` Thibaut Collet
  2015-12-15 15:36                     ` Pavel Fedin
@ 2015-12-16  2:38                     ` Peter Xu
  2015-12-16  2:50                       ` Yuanhan Liu
  2015-12-16  7:05                       ` Pavel Fedin
  1 sibling, 2 replies; 42+ messages in thread
From: Peter Xu @ 2015-12-16  2:38 UTC (permalink / raw)
  To: Thibaut Collet; +Cc: dev, Victor Kaplansky

On Tue, Dec 15, 2015 at 04:07:57PM +0100, Thibaut Collet wrote:
> After a migration, to avoid network outage, all interfaces of the guest
> must send a packet to update the switches' MAC-to-port mapping (ideally a GARP).
> As some interfaces do not do it, QEMU does it on behalf of the guest by
> sending a RARP (this RARP is not forged by the guest but by QEMU). This is
> the purpose of qemu_announce_self(), which "spoofs" a RARP to all backends of the guest's
> ethernet interfaces. For a vhost-user backend, QEMU cannot do it directly
> and asks the vhost-user backend to do it with the VHOST_USER_SEND_RARP
> request, which contains the MAC address of the guest interface.
> 
> Thibaut.

Hi, Thibaut,

Thanks for the explanation.

Two more questions:

1. if the vhost-user backend (or say, DPDK) supports GUEST_ANNOUNCE, and
   sends another RARP (or say, GARP, I will use RARP as the example),
   then there will be two RARPs on the line later, right? (since the
   QEMU one is sent unconditionally from qemu_announce_self).

2. if the only thing the vhost-user backend does is send the same RARP again
   when it gets a SEND_RARP request, why would it bother if QEMU will
   unconditionally send one? (or say, I still do not know why we
   need this SEND_RARP request, if the vhost-user backend is going
   to do the same thing again as QEMU already does)

Thanks in advance.
Peter

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [dpdk-dev] [PATCH 0/4 for 2.3] vhost-user live migration support
  2015-12-16  2:38                     ` Peter Xu
@ 2015-12-16  2:50                       ` Yuanhan Liu
  2015-12-16  7:05                       ` Pavel Fedin
  1 sibling, 0 replies; 42+ messages in thread
From: Yuanhan Liu @ 2015-12-16  2:50 UTC (permalink / raw)
  To: Peter Xu; +Cc: dev, Victor Kaplansky

On Wed, Dec 16, 2015 at 10:38:03AM +0800, Peter Xu wrote:
> On Tue, Dec 15, 2015 at 04:07:57PM +0100, Thibaut Collet wrote:
> > After a migration, to avoid network outage, all interfaces of the guest
> > must send a packet to update the switches' MAC-to-port mapping (ideally a GARP).
> > As some interfaces do not do it, QEMU does it on behalf of the guest by
> > sending a RARP (this RARP is not forged by the guest but by QEMU). This is
> > the purpose of qemu_announce_self(), which "spoofs" a RARP to all backends of the guest's
> > ethernet interfaces. For a vhost-user backend, QEMU cannot do it directly
> > and asks the vhost-user backend to do it with the VHOST_USER_SEND_RARP
> > request, which contains the MAC address of the guest interface.
> > 
> > Thibaut.
> 
> Hi, Thibaut,
> 
> Thanks for the explanation.
> 
> Two more questions:
> 
> 1. if the vhost-user backend (or say, DPDK) supports GUEST_ANNOUNCE, and
>    sends another RARP (or say, GARP, I will use RARP as the example),
>    then there will be two RARPs on the line later, right? (since the
>    QEMU one is sent unconditionally from qemu_announce_self).

The one sent by qemu_announce_self() will be caught by
vhost_user_receive(), which ends up invoking vhost_user_migration_done().
And it will be dropped there when VIRTIO_NET_F_GUEST_ANNOUNCE has been
negotiated.

> 2. if the only thing the vhost-user backend does is send the same RARP again
>    when it gets a SEND_RARP request, why would it bother if QEMU will
>    unconditionally send one? (or say, I still do not know why we
>    need this SEND_RARP request, if the vhost-user backend is going
>    to do the same thing again as QEMU already does)

Because that one is caught by vhost-user, and vhost-user just relays
it to the backend when necessary (say when GUEST_ANNOUNCE is not
supported)?

	--yliu

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [dpdk-dev] [PATCH 0/4 for 2.3] vhost-user live migration support
  2015-12-16  2:38                     ` Peter Xu
  2015-12-16  2:50                       ` Yuanhan Liu
@ 2015-12-16  7:05                       ` Pavel Fedin
  1 sibling, 0 replies; 42+ messages in thread
From: Pavel Fedin @ 2015-12-16  7:05 UTC (permalink / raw)
  To: 'Peter Xu', 'Thibaut Collet'
  Cc: dev, 'Victor Kaplansky'

 Hello!

> 1. if the vhost-user backend (or say, DPDK) supports GUEST_ANNOUNCE, and
>    sends another RARP (or say, GARP, I will use RARP as the example),
>    then there will be two RARPs on the line later, right? (since the
>    QEMU one is sent unconditionally from qemu_announce_self).

 qemu_announce_self() is NOT unconditional. It applies only to emulated physical NICs and bypasses virtio/vhost. So it will not send anything at all for vhost-user.

> 2. if the only thing the vhost-user backend does is send the same RARP again
>    when it gets a SEND_RARP request, why would it bother if QEMU will
>    unconditionally send one?

 See above, it won't send one.
 It looks to me like qemu_announce_self() is just a poor man's solution which doesn't even always work (because a GARP should reassociate an existing IP with the new MAC, shouldn't it? And qemu doesn't know the IP, so it just sets both src and dst to 0.0.0.0).

Kind regards,
Pavel Fedin
Expert Engineer
Samsung Electronics Research center Russia

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [dpdk-dev] [PATCH 0/4 for 2.3] vhost-user live migration support
  2015-12-15 14:58                         ` Pavel Fedin
@ 2015-12-16  7:28                           ` Yuanhan Liu
  2015-12-16 11:57                             ` Pavel Fedin
  0 siblings, 1 reply; 42+ messages in thread
From: Yuanhan Liu @ 2015-12-16  7:28 UTC (permalink / raw)
  To: Pavel Fedin; +Cc: dev, 'Victor Kaplansky'

On Tue, Dec 15, 2015 at 05:58:28PM +0300, Pavel Fedin wrote:
>  Hello!
> 
> > No idea. Maybe you have changed some other configuration (such as of ovs)
> > without noticing? Or the ovs bridge interface resets?
> 
>  I don't touch ovs at all. I just shut down the guest, rebuild qemu, reinstall it, and run the guest.
> 
> > 
> > BTW, would you please try my v1 patch set with the above diff applied to
> > see if the ping loss is still there? You might also want to run tcpdump
> > on the dest host's ovs bridge, to see if the GARP is actually sent.
> 
>  Retested with wireshark running on the host. I used my qemu patch instead, but it should not matter at all:
> --- cut ---
> diff --git a/hw/virtio/vhost-user.c b/hw/virtio/vhost-user.c
> index 1b6c5ac..5ca2987 100644
> --- a/hw/virtio/vhost-user.c
> +++ b/hw/virtio/vhost-user.c
> @@ -480,7 +480,12 @@ static int vhost_user_get_u64(struct vhost_dev *dev, int request, uint64_t *u64)
> 
>  static int vhost_user_get_features(struct vhost_dev *dev, uint64_t *features)
>  {
> -    return vhost_user_get_u64(dev, VHOST_USER_GET_FEATURES, features);
> +    int ret = vhost_user_get_u64(dev, VHOST_USER_GET_FEATURES, features);
> +
> +    if (!ret) {
> +        virtio_add_feature(features, VIRTIO_NET_F_GUEST_ANNOUNCE);
> +    }
> +    return ret;
>  }
> 
>  static int vhost_user_set_owner(struct vhost_dev *dev)
> --- cut ---
> 
>  So, here are both wireshark captures on the host side:

Pavel,

I can reproduce your issue on my side with the above patch (and only when
F_GUEST_ANNOUNCE is not set in the DPDK vhost lib). TBH, I don't know
why that happened; the cause could be subtle, and I don't think it's
worthwhile to dig into it, especially as it's not the right way to do it.

So, would you please try to set the F_GUEST_ANNOUNCE flag on the DPDK vhost
lib side, as my earlier diff showed, and run another test?

On the other hand, I failed to find two identical servers; the two closest
I found are E5-2695 and E5-2699. However, the fatal MSI-lost bug still
occurred. I'm out of thoughts on what could be the root cause. I'm asking
for help from some KVM gurus; hopefully they can shed some light on it.
Meanwhile, I may need to try to debug it.

Since you don't hit this issue, I'd hope you could run a test and
tell me how it works :)

Thanks.

	--yliu

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [dpdk-dev] [PATCH 0/4 for 2.3] vhost-user live migration support
  2015-12-16  7:28                           ` Yuanhan Liu
@ 2015-12-16 11:57                             ` Pavel Fedin
  2015-12-16 12:08                               ` Yuanhan Liu
  0 siblings, 1 reply; 42+ messages in thread
From: Pavel Fedin @ 2015-12-16 11:57 UTC (permalink / raw)
  To: 'Yuanhan Liu'; +Cc: dev, 'Victor Kaplansky'

 Hello!

> I can reproduce your issue on my side with the above patch (and only when
> F_GUEST_ANNOUNCE is not set in the DPDK vhost lib). TBH, I don't know
> why that happened; the cause could be subtle, and I don't think it's
> worthwhile to dig into it, especially as it's not the right way to do it.

 Maybe it's not right, maybe it can be done... Actually, I found what was wrong. qemu tries to feed the features back to vhost-user via
VHOST_USER_SET_FEATURES, and DPDK barfs on the unknown bit. More tweaking is needed for qemu to do the trick correctly.
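
 Just to spell out what I mean by "barfs": the backend validates the acked bits against what it advertised, roughly like this simplified sketch (not the exact DPDK code; the names are illustrative):

#include <stdint.h>

/*
 * Simplified sketch of a SET_FEATURES handler: if QEMU acks a bit the
 * backend never offered (e.g. GUEST_ANNOUNCE forced in only on the QEMU
 * side), the whole request is refused and negotiation breaks down.
 */
static int set_features(uint64_t supported_features, uint64_t acked_features,
                        uint64_t *negotiated_features)
{
    if (acked_features & ~supported_features)
        return -1;                      /* unknown bit -> request refused */

    *negotiated_features = acked_features;
    return 0;
}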

> So, would you please try to set the F_GUEST_ANNOUNCE flag on the DPDK vhost
> lib side, as my earlier diff showed, and run another test?

 Tried it, works fine, thank you.
 I had almost implemented the workaround in qemu... However, now I start to think that you are right. Theoretically, the application
may want to suppress GUEST_ANNOUNCE for some reason. So, let it stay this way. Please include this bit in your v2.

Kind regards,
Pavel Fedin
Expert Engineer
Samsung Electronics Research center Russia

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [dpdk-dev] [PATCH 0/4 for 2.3] vhost-user live migration support
  2015-12-16 11:57                             ` Pavel Fedin
@ 2015-12-16 12:08                               ` Yuanhan Liu
  2015-12-16 12:43                                 ` Pavel Fedin
  0 siblings, 1 reply; 42+ messages in thread
From: Yuanhan Liu @ 2015-12-16 12:08 UTC (permalink / raw)
  To: Pavel Fedin; +Cc: dev, 'Victor Kaplansky'

On Wed, Dec 16, 2015 at 02:57:15PM +0300, Pavel Fedin wrote:
>  Hello!
> 
> > I can reproduce your issue on my side with the above patch (and only when
> > F_GUEST_ANNOUNCE is not set in the DPDK vhost lib). TBH, I don't know
> > why that happened; the cause could be subtle, and I don't think it's
> > worthwhile to dig into it, especially as it's not the right way to do it.
> 
>  Maybe it's not right, maybe it can be done... Actually, I found what was wrong. qemu tries to feed the features back to vhost-user via
> VHOST_USER_SET_FEATURES, and DPDK barfs on the unknown bit. More tweaking is needed for qemu to do the trick correctly.
> 
> > So, would you please try to set the F_GUEST_ANNOUNCE flag on the DPDK vhost
> > lib side, as my earlier diff showed, and run another test?
> 
>  Tried it, works fine, thank you.

Thanks for the test.

However, I'm more curious about the ping loss. Did you still see
that? And to be more specific, has wireshark captured the
GARP from the guest?  And what's the output of 'grep virtio /proc/interrupts'
inside the guest?

	--yliu


>  I had almost implemented the workaround in qemu... However, now I start to think that you are right. Theoretically, the application
> may want to suppress GUEST_ANNOUNCE for some reason. So, let it stay this way. Please include this bit in your v2.
> 
> Kind regards,
> Pavel Fedin
> Expert Engineer
> Samsung Electronics Research center Russia
> 

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [dpdk-dev] [PATCH 0/4 for 2.3] vhost-user live migration support
  2015-12-16 12:08                               ` Yuanhan Liu
@ 2015-12-16 12:43                                 ` Pavel Fedin
  2015-12-16 13:00                                   ` Yuanhan Liu
  0 siblings, 1 reply; 42+ messages in thread
From: Pavel Fedin @ 2015-12-16 12:43 UTC (permalink / raw)
  To: 'Yuanhan Liu'; +Cc: dev, 'Victor Kaplansky'

 Hello!

> However, I'm more curious about the ping loss. Did you still see
> that? And to be more specific, has wireshark captured the
> GARP from the guest?

 Yes, everything is fine.

root@nfv_test_x86_64 /var/log/libvirt/qemu # tshark -i ovs-br0
Running as user "root" and group "root". This could be dangerous.
Capturing on 'ovs-br0'
  1   0.000000 RealtekU_3b:83:1a -> Broadcast    ARP 42 Gratuitous ARP for 192.168.6.2 (Request)
  2   0.000024 fe80::5054:ff:fe3b:831a -> ff02::1      ICMPv6 86 Neighbor Advertisement fe80::5054:ff:fe3b:831a (ovr) is at
52:54:00:3b:83:1a
  3   0.049490 RealtekU_3b:83:1a -> Broadcast    ARP 42 Gratuitous ARP for 192.168.6.2 (Request)
  4   0.049497 fe80::5054:ff:fe3b:831a -> ff02::1      ICMPv6 86 Neighbor Advertisement fe80::5054:ff:fe3b:831a (ovr) is at
52:54:00:3b:83:1a
  5   0.199485 RealtekU_3b:83:1a -> Broadcast    ARP 42 Gratuitous ARP for 192.168.6.2 (Request)
  6   0.199492 fe80::5054:ff:fe3b:831a -> ff02::1      ICMPv6 86 Neighbor Advertisement fe80::5054:ff:fe3b:831a (ovr) is at
52:54:00:3b:83:1a
  7   0.449500 RealtekU_3b:83:1a -> Broadcast    ARP 42 Gratuitous ARP for 192.168.6.2 (Request)
  8   0.449508 fe80::5054:ff:fe3b:831a -> ff02::1      ICMPv6 86 Neighbor Advertisement fe80::5054:ff:fe3b:831a (ovr) is at
52:54:00:3b:83:1a
  9   0.517229  192.168.6.2 -> 192.168.6.1  ICMP 98 Echo (ping) request  id=0x04af, seq=70/17920, ttl=64
 10   0.517277  192.168.6.1 -> 192.168.6.2  ICMP 98 Echo (ping) reply    id=0x04af, seq=70/17920, ttl=64 (request in 9)
 11   0.799521 RealtekU_3b:83:1a -> Broadcast    ARP 42 Gratuitous ARP for 192.168.6.2 (Request)
 12   0.799553 fe80::5054:ff:fe3b:831a -> ff02::1      ICMPv6 86 Neighbor Advertisement fe80::5054:ff:fe3b:831a (ovr) is at
52:54:00:3b:83:1a
 13   1.517210  192.168.6.2 -> 192.168.6.1  ICMP 98 Echo (ping) request  id=0x04af, seq=71/18176, ttl=64
 14   1.517238  192.168.6.1 -> 192.168.6.2  ICMP 98 Echo (ping) reply    id=0x04af, seq=71/18176, ttl=64 (request in 13)
 15   2.517219  192.168.6.2 -> 192.168.6.1  ICMP 98 Echo (ping) request  id=0x04af, seq=72/18432, ttl=64
 16   2.517256  192.168.6.1 -> 192.168.6.2  ICMP 98 Echo (ping) reply    id=0x04af, seq=72/18432, ttl=64 (request in 15)
 17   3.517497  192.168.6.2 -> 192.168.6.1  ICMP 98 Echo (ping) request  id=0x04af, seq=73/18688, ttl=64
 18   3.517518  192.168.6.1 -> 192.168.6.2  ICMP 98 Echo (ping) reply    id=0x04af, seq=73/18688, ttl=64 (request in 17)
 19   4.517219  192.168.6.2 -> 192.168.6.1  ICMP 98 Echo (ping) request  id=0x04af, seq=74/18944, ttl=64
 20   4.517237  192.168.6.1 -> 192.168.6.2  ICMP 98 Echo (ping) reply    id=0x04af, seq=74/18944, ttl=64 (request in 19)
 21   5.517222  192.168.6.2 -> 192.168.6.1  ICMP 98 Echo (ping) request  id=0x04af, seq=75/19200, ttl=64
 22   5.517242  192.168.6.1 -> 192.168.6.2  ICMP 98 Echo (ping) reply    id=0x04af, seq=75/19200, ttl=64 (request in 21)
 23   6.517235  192.168.6.2 -> 192.168.6.1  ICMP 98 Echo (ping) request  id=0x04af, seq=76/19456, ttl=64
 24   6.517256  192.168.6.1 -> 192.168.6.2  ICMP 98 Echo (ping) reply    id=0x04af, seq=76/19456, ttl=64 (request in 23)
 25   6.531466 be:e1:71:c1:47:4d -> RealtekU_3b:83:1a ARP 42 Who has 192.168.6.2?  Tell 192.168.6.1
 26   6.531619 RealtekU_3b:83:1a -> be:e1:71:c1:47:4d ARP 42 192.168.6.2 is at 52:54:00:3b:83:1a
 27   7.517212  192.168.6.2 -> 192.168.6.1  ICMP 98 Echo (ping) request  id=0x04af, seq=77/19712, ttl=64
 28   7.517229  192.168.6.1 -> 192.168.6.2  ICMP 98 Echo (ping) reply    id=0x04af, seq=77/19712, ttl=64 (request in 27)

 But there's one important detail here. Any replicated network interfaces (the LOCAL port in my example) should be fully cloned on both
hosts, including MAC addresses. Otherwise, after the migration the guest continues to send packets to the old MAC, and, obviously, there's
still ping loss until it redoes the ARP for its ping target.

>  And what's the output of 'grep virtio /proc/interrupts' inside the guest?

11:          0          0          0          0   IO-APIC  11-fasteoi   uhci_hcd:usb1, virtio3
 24:          0          0          0          0   PCI-MSI 114688-edge      virtio2-config
 25:       3544          0          0          0   PCI-MSI 114689-edge      virtio2-req.0
 26:         10          0          0          0   PCI-MSI 49152-edge      virtio0-config
 27:        852          0          0          0   PCI-MSI 49153-edge      virtio0-input.0
 28:          3          0          0          0   PCI-MSI 49154-edge      virtio0-output.0
 29:         10          0          0          0   PCI-MSI 65536-edge      virtio1-config
 30:        172          0          0          0   PCI-MSI 65537-edge      virtio1-input.0
 31:          1          0          0          0   PCI-MSI 65538-edge      virtio1-output.0

Kind regards,
Pavel Fedin
Expert Engineer
Samsung Electronics Research center Russia

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [dpdk-dev] [PATCH 0/4 for 2.3] vhost-user live migration support
  2015-12-16 12:43                                 ` Pavel Fedin
@ 2015-12-16 13:00                                   ` Yuanhan Liu
  0 siblings, 0 replies; 42+ messages in thread
From: Yuanhan Liu @ 2015-12-16 13:00 UTC (permalink / raw)
  To: Pavel Fedin; +Cc: dev, 'Victor Kaplansky'

On Wed, Dec 16, 2015 at 03:43:06PM +0300, Pavel Fedin wrote:
> 
>  Hello!
> 
> > However, I'm more curious about the ping loss. Did you still see
> > that? And to be more specific, has wireshark captured the
> > GARP from the guest?
> 
>  Yes, everything is fine.

Great!

> 
> root@nfv_test_x86_64 /var/log/libvirt/qemu # tshark -i ovs-br0
> Running as user "root" and group "root". This could be dangerous.
> Capturing on 'ovs-br0'
>   1   0.000000 RealtekU_3b:83:1a -> Broadcast    ARP 42 Gratuitous ARP for 192.168.6.2 (Request)
>   2   0.000024 fe80::5054:ff:fe3b:831a -> ff02::1      ICMPv6 86 Neighbor Advertisement fe80::5054:ff:fe3b:831a (ovr) is at
> 52:54:00:3b:83:1a
>   3   0.049490 RealtekU_3b:83:1a -> Broadcast    ARP 42 Gratuitous ARP for 192.168.6.2 (Request)
>   4   0.049497 fe80::5054:ff:fe3b:831a -> ff02::1      ICMPv6 86 Neighbor Advertisement fe80::5054:ff:fe3b:831a (ovr) is at
> 52:54:00:3b:83:1a
>   5   0.199485 RealtekU_3b:83:1a -> Broadcast    ARP 42 Gratuitous ARP for 192.168.6.2 (Request)
>   6   0.199492 fe80::5054:ff:fe3b:831a -> ff02::1      ICMPv6 86 Neighbor Advertisement fe80::5054:ff:fe3b:831a (ovr) is at
> 52:54:00:3b:83:1a
>   7   0.449500 RealtekU_3b:83:1a -> Broadcast    ARP 42 Gratuitous ARP for 192.168.6.2 (Request)
>   8   0.449508 fe80::5054:ff:fe3b:831a -> ff02::1      ICMPv6 86 Neighbor Advertisement fe80::5054:ff:fe3b:831a (ovr) is at
> 52:54:00:3b:83:1a
>   9   0.517229  192.168.6.2 -> 192.168.6.1  ICMP 98 Echo (ping) request  id=0x04af, seq=70/17920, ttl=64
>  10   0.517277  192.168.6.1 -> 192.168.6.2  ICMP 98 Echo (ping) reply    id=0x04af, seq=70/17920, ttl=64 (request in 9)
>  11   0.799521 RealtekU_3b:83:1a -> Broadcast    ARP 42 Gratuitous ARP for 192.168.6.2 (Request)
>  12   0.799553 fe80::5054:ff:fe3b:831a -> ff02::1      ICMPv6 86 Neighbor Advertisement fe80::5054:ff:fe3b:831a (ovr) is at
> 52:54:00:3b:83:1a
>  13   1.517210  192.168.6.2 -> 192.168.6.1  ICMP 98 Echo (ping) request  id=0x04af, seq=71/18176, ttl=64
>  14   1.517238  192.168.6.1 -> 192.168.6.2  ICMP 98 Echo (ping) reply    id=0x04af, seq=71/18176, ttl=64 (request in 13)
>  15   2.517219  192.168.6.2 -> 192.168.6.1  ICMP 98 Echo (ping) request  id=0x04af, seq=72/18432, ttl=64
>  16   2.517256  192.168.6.1 -> 192.168.6.2  ICMP 98 Echo (ping) reply    id=0x04af, seq=72/18432, ttl=64 (request in 15)
>  17   3.517497  192.168.6.2 -> 192.168.6.1  ICMP 98 Echo (ping) request  id=0x04af, seq=73/18688, ttl=64
>  18   3.517518  192.168.6.1 -> 192.168.6.2  ICMP 98 Echo (ping) reply    id=0x04af, seq=73/18688, ttl=64 (request in 17)
>  19   4.517219  192.168.6.2 -> 192.168.6.1  ICMP 98 Echo (ping) request  id=0x04af, seq=74/18944, ttl=64
>  20   4.517237  192.168.6.1 -> 192.168.6.2  ICMP 98 Echo (ping) reply    id=0x04af, seq=74/18944, ttl=64 (request in 19)
>  21   5.517222  192.168.6.2 -> 192.168.6.1  ICMP 98 Echo (ping) request  id=0x04af, seq=75/19200, ttl=64
>  22   5.517242  192.168.6.1 -> 192.168.6.2  ICMP 98 Echo (ping) reply    id=0x04af, seq=75/19200, ttl=64 (request in 21)
>  23   6.517235  192.168.6.2 -> 192.168.6.1  ICMP 98 Echo (ping) request  id=0x04af, seq=76/19456, ttl=64
>  24   6.517256  192.168.6.1 -> 192.168.6.2  ICMP 98 Echo (ping) reply    id=0x04af, seq=76/19456, ttl=64 (request in 23)
>  25   6.531466 be:e1:71:c1:47:4d -> RealtekU_3b:83:1a ARP 42 Who has 192.168.6.2?  Tell 192.168.6.1
>  26   6.531619 RealtekU_3b:83:1a -> be:e1:71:c1:47:4d ARP 42 192.168.6.2 is at 52:54:00:3b:83:1a
>  27   7.517212  192.168.6.2 -> 192.168.6.1  ICMP 98 Echo (ping) request  id=0x04af, seq=77/19712, ttl=64
>  28   7.517229  192.168.6.1 -> 192.168.6.2  ICMP 98 Echo (ping) reply    id=0x04af, seq=77/19712, ttl=64 (request in 27)
> 
>  But there's one important detail here. Any replicated network interfaces (the LOCAL port in my example) should be fully cloned on both
> hosts, including MAC addresses. Otherwise, after the migration the guest continues to send packets to the old MAC, and, obviously, there's
> still ping loss until it redoes the ARP for its ping target.

I see. And here I care more about whether we can get the GARP from the
target guest just after the migration. If you can, everything should
be fine.
> 
> >  And what's the output of 'grep virtio /proc/interrupts' inside the guest?
> 
> 11:          0          0          0          0   IO-APIC  11-fasteoi   uhci_hcd:usb1, virtio3
>  24:          0          0          0          0   PCI-MSI 114688-edge      virtio2-config
>  25:       3544          0          0          0   PCI-MSI 114689-edge      virtio2-req.0
>  26:         10          0          0          0   PCI-MSI 49152-edge      virtio0-config

The GUEST_ANNOUNCE has indeed been triggered. That's great! I just have
no idea why I can't get any config IRQ from the guest after the migration.
(I can for migration within one host, but not across two hosts.)

In my first tries, I just got an error message telling me that the MSI is
lost. I then found it may be because I'm using a customized guest kernel.
After switching to the kernel shipped by Fedora 22, I no longer see such an
error, but I still don't see such an interrupt generated inside the guest, either.

It might still be an issue on my side. Even if it's not, it's likely a KVM
bug, not one in vhost-user. And I'm glad it works on your side :)

So, I will send v2 tomorrow.

	--yliu

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [dpdk-dev] [PATCH 0/4 for 2.3] vhost-user live migration support
  2015-12-02  3:43 Yuanhan Liu
  2015-12-02 14:10 ` Victor Kaplansky
@ 2015-12-09  3:41 ` Xie, Huawei
  1 sibling, 0 replies; 42+ messages in thread
From: Xie, Huawei @ 2015-12-09  3:41 UTC (permalink / raw)
  To: Yuanhan Liu, dev; +Cc: Victor Kaplansky, Michael S. Tsirkin

On 12/2/2015 11:40 AM, Yuanhan Liu wrote:
> This patch set adds the initial vhost-user live migration support.
>
> The major task behind that is to log pages we touched during
> live migration. So, this patch is basically about adding vhost
> log support, and using it.
>
> Patchset
> ========
> - Patch 1 handles VHOST_USER_SET_LOG_BASE, which tells us where
>   the dirty memory bitmap is.
>     
> - Patch 2 introduces a vhost_log_write() helper function to log
>   pages we are gonna change.
>
> - Patch 3 logs changes we made to used vring.
>
> - Patch 4 sets the log_shmfd protocol feature bit, which actually
>   enables the vhost-user live migration support.
>
> A simple test guide (on same host)
> ==================================
>
> The following test is based on OVS + DPDK. And here is guide
> to setup OVS + DPDK:
>
>     http://wiki.qemu.org/Features/vhost-user-ovs-dpdk
>
> 1. start ovs-vswitchd
>
> 2. Add two ovs vhost-user ports, say vhost0 and vhost1
>
> 3. Start a VM1 to connect to vhost0. Here is my example:
>
>    $QEMU -enable-kvm -m 1024 -smp 4 \
>        -chardev socket,id=char0,path=/var/run/openvswitch/vhost0  \
>        -netdev type=vhost-user,id=mynet1,chardev=char0,vhostforce \
>        -device virtio-net-pci,netdev=mynet1,mac=52:54:00:12:34:58 \
>        -object memory-backend-file,id=mem,size=1024M,mem-path=$HOME/hugetlbfs,share=on \
>        -numa node,memdev=mem -mem-prealloc \
>        -kernel $HOME/iso/vmlinuz -append "root=/dev/sda1" \
>        -hda fc-19-i386.img \
>        -monitor telnet::3333,server,nowait -curses
>
> 4. run "ping $host" inside VM1
>
> 5. Start VM2 to connect to vhost1, and mark it as the target
>    of live migration (by adding the -incoming tcp:0:4444 option)
>
>    $QEMU -enable-kvm -m 1024 -smp 4 \
>        -chardev socket,id=char0,path=/var/run/openvswitch/vhost1  \
>        -netdev type=vhost-user,id=mynet1,chardev=char0,vhostforce \
>        -device virtio-net-pci,netdev=mynet1,mac=52:54:00:12:34:58 \
>        -object memory-backend-file,id=mem,size=1024M,mem-path=$HOME/hugetlbfs,share=on \
>        -numa node,memdev=mem -mem-prealloc \
>        -kernel $HOME/iso/vmlinuz -append "root=/dev/sda1" \
>        -hda fc-19-i386.img \
>        -monitor telnet::3334,server,nowait -curses \
>        -incoming tcp:0:4444 
>
> 6. connect to VM1 monitor, and start migration:
>
>    > migrate tcp:0:4444
>
> 7. After a while, you will find that VM1 has been migrated to VM2,
>    and the "ping" command continues running, perfectly.
Is there some formal verification that migration is truly successful? At
least that the memory we care about in our vhost-user case has been migrated
successfully?
For instance, we miss logging guest RX buffers in this patch set, yet we have
no way to tell.

[...]

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [dpdk-dev] [PATCH 0/4 for 2.3] vhost-user live migration support
  2015-12-02 14:10 ` Victor Kaplansky
@ 2015-12-02 14:33   ` Yuanhan Liu
  0 siblings, 0 replies; 42+ messages in thread
From: Yuanhan Liu @ 2015-12-02 14:33 UTC (permalink / raw)
  To: Victor Kaplansky; +Cc: dev, Michael S. Tsirkin

On Wed, Dec 02, 2015 at 04:10:56PM +0200, Victor Kaplansky wrote:
...
> > Note: this patch set has mostly been based on Victor Kaplansky's demo
> > work (vhost-user-bridge) in the QEMU project. I was thinking of adding Victor
> > as a co-author. Victor, what do you think of that? :)
> 
> Thanks for adding me to the credits list!

Great, I will add your signed-off-by from v2 onwards. Will that be okay with you?

	--yliu

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [dpdk-dev] [PATCH 0/4 for 2.3] vhost-user live migration support
  2015-12-02  3:43 Yuanhan Liu
@ 2015-12-02 14:10 ` Victor Kaplansky
  2015-12-02 14:33   ` Yuanhan Liu
  2015-12-09  3:41 ` Xie, Huawei
  1 sibling, 1 reply; 42+ messages in thread
From: Victor Kaplansky @ 2015-12-02 14:10 UTC (permalink / raw)
  To: Yuanhan Liu; +Cc: dev, Michael S. Tsirkin

On Wed, Dec 02, 2015 at 11:43:09AM +0800, Yuanhan Liu wrote:
> This patch set adds the initial vhost-user live migration support.
> 
> The major task behind that is to log pages we touched during
> live migration. So, this patch is basically about adding vhost
> log support, and using it.
> 
> Patchset
> ========
> - Patch 1 handles VHOST_USER_SET_LOG_BASE, which tells us where
>   the dirty memory bitmap is.
>     
> - Patch 2 introduces a vhost_log_write() helper function to log
>   pages we are gonna change.
> 
> - Patch 3 logs changes we made to used vring.
> 
> - Patch 4 sets the log_shmfd protocol feature bit, which actually
>   enables the vhost-user live migration support.
> 
> A simple test guide (on same host)
> ==================================
> 
> The following test is based on OVS + DPDK. And here is guide
> to setup OVS + DPDK:
> 
>     http://wiki.qemu.org/Features/vhost-user-ovs-dpdk
> 
> 1. start ovs-vswitchd
> 
> 2. Add two ovs vhost-user ports, say vhost0 and vhost1
> 
> 3. Start a VM1 to connect to vhost0. Here is my example:
> 
>    $QEMU -enable-kvm -m 1024 -smp 4 \
>        -chardev socket,id=char0,path=/var/run/openvswitch/vhost0  \
>        -netdev type=vhost-user,id=mynet1,chardev=char0,vhostforce \
>        -device virtio-net-pci,netdev=mynet1,mac=52:54:00:12:34:58 \
>        -object memory-backend-file,id=mem,size=1024M,mem-path=$HOME/hugetlbfs,share=on \
>        -numa node,memdev=mem -mem-prealloc \
>        -kernel $HOME/iso/vmlinuz -append "root=/dev/sda1" \
>        -hda fc-19-i386.img \
>        -monitor telnet::3333,server,nowait -curses
> 
> 4. run "ping $host" inside VM1
> 
> 5. Start VM2 to connect to vhost1, and mark it as the target
>    of live migration (by adding the -incoming tcp:0:4444 option)
> 
>    $QEMU -enable-kvm -m 1024 -smp 4 \
>        -chardev socket,id=char0,path=/var/run/openvswitch/vhost1  \
>        -netdev type=vhost-user,id=mynet1,chardev=char0,vhostforce \
>        -device virtio-net-pci,netdev=mynet1,mac=52:54:00:12:34:58 \
>        -object memory-backend-file,id=mem,size=1024M,mem-path=$HOME/hugetlbfs,share=on \
>        -numa node,memdev=mem -mem-prealloc \
>        -kernel $HOME/iso/vmlinuz -append "root=/dev/sda1" \
>        -hda fc-19-i386.img \
>        -monitor telnet::3334,server,nowait -curses \
>        -incoming tcp:0:4444 
> 
> 6. connect to VM1 monitor, and start migration:
> 
>    > migrate tcp:0:4444
> 
> 7. After a while, you will find that VM1 has been migrated to VM2,
>    and the "ping" command continues running, perfectly.
> 
> 
> Note: this patch set has mostly been based on Victor Kaplansky's demo
> work (vhost-user-bridge) in the QEMU project. I was thinking of adding Victor
> as a co-author. Victor, what do you think of that? :)

Thanks for adding me to the credits list!
-- Victor

> 
> Comments are welcome!
> 
> ---
> Yuanhan Liu (4):
>   vhost: handle VHOST_USER_SET_LOG_BASE request
>   vhost: introduce vhost_log_write
>   vhost: log vring changes
>   vhost: enable log_shmfd protocol feature
> 
>  lib/librte_vhost/rte_virtio_net.h             | 35 ++++++++++++++
>  lib/librte_vhost/vhost_rxtx.c                 | 70 ++++++++++++++++++---------
>  lib/librte_vhost/vhost_user/vhost-net-user.c  |  7 ++-
>  lib/librte_vhost/vhost_user/vhost-net-user.h  |  6 +++
>  lib/librte_vhost/vhost_user/virtio-net-user.c | 44 +++++++++++++++++
>  lib/librte_vhost/vhost_user/virtio-net-user.h |  5 +-
>  lib/librte_vhost/virtio-net.c                 |  4 ++
>  7 files changed, 145 insertions(+), 26 deletions(-)
> 
> -- 
> 1.9.0

^ permalink raw reply	[flat|nested] 42+ messages in thread

* [dpdk-dev] [PATCH 0/4 for 2.3] vhost-user live migration support
@ 2015-12-02  3:43 Yuanhan Liu
  2015-12-02 14:10 ` Victor Kaplansky
  2015-12-09  3:41 ` Xie, Huawei
  0 siblings, 2 replies; 42+ messages in thread
From: Yuanhan Liu @ 2015-12-02  3:43 UTC (permalink / raw)
  To: dev; +Cc: Victor Kaplansky, Michael S. Tsirkin

This patch set adds the initial vhost-user live migration support.

The major task behind that is to log pages we touched during
live migration. So, this patch is basically about adding vhost
log support, and using it.

Patchset
========
- Patch 1 handles VHOST_USER_SET_LOG_BASE, which tells us where
  the dirty memory bitmap is.
    
- Patch 2 introduces a vhost_log_write() helper function to log
  pages we are gonna change (a rough sketch of the idea follows this list).

- Patch 3 logs changes we made to used vring.

- Patch 4 sets the log_shmfd protocol feature bit, which actually
  enables the vhost-user live migration support.
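
As a quick illustration of what patch 2 does: the helper conceptually just
sets one bit per touched guest-physical page in the log bitmap that QEMU
shares with us via SET_LOG_BASE. A minimal sketch (names simplified here;
the real helper differs in details such as atomic updates):

#include <stdint.h>

#define VHOST_LOG_PAGE 4096

/* Mark [addr, addr + len) dirty in the mmap'ed log bitmap
 * (one bit per 4 KiB guest-physical page). */
static void log_dirty_pages(uint8_t *log_base, uint64_t log_size,
                            uint64_t addr, uint64_t len)
{
    uint64_t page, last;

    if (len == 0)
        return;

    page = addr / VHOST_LOG_PAGE;
    last = (addr + len - 1) / VHOST_LOG_PAGE;

    for (; page <= last; page++) {
        if (page / 8 >= log_size)
            break;                      /* outside the shared bitmap */
        log_base[page / 8] |= 1u << (page % 8);
    }
}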

A simple test guide (on same host)
==================================

The following test is based on OVS + DPDK. And here is guide
to setup OVS + DPDK:

    http://wiki.qemu.org/Features/vhost-user-ovs-dpdk

1. start ovs-vswitchd

2. Add two ovs vhost-user ports, say vhost0 and vhost1

3. Start a VM1 to connect to vhost0. Here is my example:

   $QEMU -enable-kvm -m 1024 -smp 4 \
       -chardev socket,id=char0,path=/var/run/openvswitch/vhost0  \
       -netdev type=vhost-user,id=mynet1,chardev=char0,vhostforce \
       -device virtio-net-pci,netdev=mynet1,mac=52:54:00:12:34:58 \
       -object memory-backend-file,id=mem,size=1024M,mem-path=$HOME/hugetlbfs,share=on \
       -numa node,memdev=mem -mem-prealloc \
       -kernel $HOME/iso/vmlinuz -append "root=/dev/sda1" \
       -hda fc-19-i386.img \
       -monitor telnet::3333,server,nowait -curses

4. run "ping $host" inside VM1

5. Start VM2 to connect to vhost1, and mark it as the target
   of live migration (by adding the -incoming tcp:0:4444 option)

   $QEMU -enable-kvm -m 1024 -smp 4 \
       -chardev socket,id=char0,path=/var/run/openvswitch/vhost1  \
       -netdev type=vhost-user,id=mynet1,chardev=char0,vhostforce \
       -device virtio-net-pci,netdev=mynet1,mac=52:54:00:12:34:58 \
       -object memory-backend-file,id=mem,size=1024M,mem-path=$HOME/hugetlbfs,share=on \
       -numa node,memdev=mem -mem-prealloc \
       -kernel $HOME/iso/vmlinuz -append "root=/dev/sda1" \
       -hda fc-19-i386.img \
       -monitor telnet::3334,server,nowait -curses \
       -incoming tcp:0:4444 

6. connect to VM1 monitor, and start migration:

   > migrate tcp:0:4444

7. After a while, you will find that VM1 has been migrated to VM2,
   and the "ping" command continues running, perfectly.


Note: this patch set has mostly been based on Victor Kaplansky's demo
work (vhost-user-bridge) in the QEMU project. I was thinking of adding Victor
as a co-author. Victor, what do you think of that? :)

Comments are welcome!

---
Yuanhan Liu (4):
  vhost: handle VHOST_USER_SET_LOG_BASE request
  vhost: introduce vhost_log_write
  vhost: log vring changes
  vhost: enable log_shmfd protocol feature

 lib/librte_vhost/rte_virtio_net.h             | 35 ++++++++++++++
 lib/librte_vhost/vhost_rxtx.c                 | 70 ++++++++++++++++++---------
 lib/librte_vhost/vhost_user/vhost-net-user.c  |  7 ++-
 lib/librte_vhost/vhost_user/vhost-net-user.h  |  6 +++
 lib/librte_vhost/vhost_user/virtio-net-user.c | 44 +++++++++++++++++
 lib/librte_vhost/vhost_user/virtio-net-user.h |  5 +-
 lib/librte_vhost/virtio-net.c                 |  4 ++
 7 files changed, 145 insertions(+), 26 deletions(-)

-- 
1.9.0

^ permalink raw reply	[flat|nested] 42+ messages in thread

end of thread, other threads:[~2015-12-16 13:00 UTC | newest]

Thread overview: 42+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2015-12-11  8:26 [dpdk-dev] [PATCH 0/4 for 2.3] vhost-user live migration support Pavel Fedin
2015-12-11  9:49 ` Yuanhan Liu
2015-12-11 10:22   ` Pavel Fedin
2015-12-14  3:58     ` Peter Xu
2015-12-14  7:30       ` Pavel Fedin
2015-12-14  9:04         ` Peter Xu
2015-12-14  9:46           ` Pavel Fedin
2015-12-14 10:09             ` Peter Xu
2015-12-14 12:09             ` Yuanhan Liu
2015-12-14 13:00               ` Peter Xu
2015-12-14 13:21                 ` Yuanhan Liu
2015-12-14 13:28                   ` Peter Xu
2015-12-14 13:51                     ` Yuanhan Liu
2015-12-14 14:54                   ` Pavel Fedin
2015-12-15  8:23       ` Yuanhan Liu
2015-12-15  8:45         ` Pavel Fedin
2015-12-15  8:56           ` Yuanhan Liu
2015-12-15  9:04             ` Pavel Fedin
2015-12-15 10:05           ` Peter Xu
2015-12-15 11:43             ` Thibaut Collet
2015-12-15 11:47               ` Thibaut Collet
2015-12-15 12:24                 ` Pavel Fedin
2015-12-15 13:36                   ` Yuanhan Liu
2015-12-15 13:48                     ` Pavel Fedin
2015-12-15 13:59                       ` Yuanhan Liu
2015-12-15 14:58                         ` Pavel Fedin
2015-12-16  7:28                           ` Yuanhan Liu
2015-12-16 11:57                             ` Pavel Fedin
2015-12-16 12:08                               ` Yuanhan Liu
2015-12-16 12:43                                 ` Pavel Fedin
2015-12-16 13:00                                   ` Yuanhan Liu
2015-12-15 13:18                 ` Yuanhan Liu
2015-12-15 15:07                   ` Thibaut Collet
2015-12-15 15:36                     ` Pavel Fedin
2015-12-16  2:38                     ` Peter Xu
2015-12-16  2:50                       ` Yuanhan Liu
2015-12-16  7:05                       ` Pavel Fedin
2015-12-15  9:42         ` Peter Xu
  -- strict thread matches above, loose matches on Subject: below --
2015-12-02  3:43 Yuanhan Liu
2015-12-02 14:10 ` Victor Kaplansky
2015-12-02 14:33   ` Yuanhan Liu
2015-12-09  3:41 ` Xie, Huawei
