DPDK patches and discussions
* Re: [dpdk-dev] [RFC 0/5] virtio support for container
@ 2015-12-30  9:46 Pavel Fedin
  2015-12-31  9:19 ` Tan, Jianfeng
  0 siblings, 1 reply; 16+ messages in thread
From: Pavel Fedin @ 2015-12-30  9:46 UTC (permalink / raw)
  To: dev

 Hello everybody!

 I am currently working on an improved version of this patchset, and I am testing it with Open vSwitch. I run two Open vSwitch
instances: one on the host and one in a container. Both OVS instances forward packets between their LOCAL port and a vhost/virtio
port. This way I can comfortably run ping between my host and the container.
 The problem is that the patchset seems to be broken somehow: ovs-vswitchd fails to open the dpdk0 device, and if I set
--log-level=9 for DPDK, I see this in the console:
--- cut ---
Broadcast message from systemd-journald@localhost.localdomain (Wed 2015-12-30 11:13:00 MSK):

ovs-vswitchd[557]: EAL: TSC frequency is ~3400032 KHz


Broadcast message from systemd-journald@localhost.localdomain (Wed 2015-12-30 11:13:00 MSK):

ovs-vswitchd[560]: EAL: memzone_reserve_aligned_thread_unsafe(): memzone <RG_MP_ovs_mp_1500_0_262144> already exists


Broadcast message from systemd-journald@localhost.localdomain (Wed 2015-12-30 11:13:00 MSK):

ovs-vswitchd[560]: RING: Cannot reserve memory
--- cut ---

 How can I debug this?

Kind regards,
Pavel Fedin
Expert Engineer
Samsung Electronics Research center Russia
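
The memzone failure in the log above is the EAL refusing to reserve a zone whose name already exists: rte_mempool_create() backs each pool's ring with a memzone named "RG_MP_<pool name>", so a second process that shares the first one's memory map and creates a same-named pool can hit it. A minimal sketch of that failure mode, with the name taken from the log and a placeholder size:

#include <errno.h>
#include <stdio.h>
#include <rte_memzone.h>
#include <rte_errno.h>

/* Illustrative only: reserving an already-existing memzone name fails
 * and sets rte_errno to EEXIST, producing the log lines quoted above. */
static const struct rte_memzone *
reserve_ring_zone(void)
{
	const struct rte_memzone *mz = rte_memzone_reserve(
		"RG_MP_ovs_mp_1500_0_262144", 262144, SOCKET_ID_ANY, 0);

	if (mz == NULL && rte_errno == EEXIST)
		printf("memzone already exists\n");
	return mz;
}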

* Re: [dpdk-dev] [RFC 0/5] virtio support for container
  2015-12-30  9:46 [dpdk-dev] [RFC 0/5] virtio support for container Pavel Fedin
@ 2015-12-31  9:19 ` Tan, Jianfeng
  2015-12-31  9:40   ` Pavel Fedin
  0 siblings, 1 reply; 16+ messages in thread
From: Tan, Jianfeng @ 2015-12-31  9:19 UTC (permalink / raw)
  To: Pavel Fedin, dev

Hi Fedin,

First of all, when you say openvswitch, are you referring to ovs-dpdk?

And can you detail your test case? Like, how do you want ovs_on_host and ovs_in_container to be connected?
Through two directly connected physical NICs, or one vhost port in ovs_on_host and one virtio port in ovs_in_container?

Thanks,
Jianfeng

> -----Original Message-----
> From: dev [mailto:dev-bounces@dpdk.org] On Behalf Of Pavel Fedin
> Sent: Wednesday, December 30, 2015 5:47 PM
> To: dev@dpdk.org
> Subject: Re: [dpdk-dev] [RFC 0/5] virtio support for container
> 
>  Hello everybody!
> 
>  I am currently working on an improved version of this patchset, and I am testing
> it with Open vSwitch. I run two Open vSwitch instances:
> one on the host and one in a container. Both OVS instances forward packets between
> their LOCAL port and a vhost/virtio port. This way I can
> comfortably run ping between my host and the container.
>  The problem is that the patchset seems to be broken somehow: ovs-
> vswitchd fails to open the dpdk0 device, and if I set --log-level=9
> for DPDK, I see this in the console:
> --- cut ---
> Broadcast message from systemd-journald@localhost.localdomain (Wed
> 2015-12-30 11:13:00 MSK):
> 
> ovs-vswitchd[557]: EAL: TSC frequency is ~3400032 KHz
> 
> 
> Broadcast message from systemd-journald@localhost.localdomain (Wed
> 2015-12-30 11:13:00 MSK):
> 
> ovs-vswitchd[560]: EAL: memzone_reserve_aligned_thread_unsafe():
> memzone <RG_MP_ovs_mp_1500_0_262144> already exists
> 
> 
> Broadcast message from systemd-journald@localhost.localdomain (Wed
> 2015-12-30 11:13:00 MSK):
> 
> ovs-vswitchd[560]: RING: Cannot reserve memory
> --- cut ---
> 
>  How can I debug this?
> 
> Kind regards,
> Pavel Fedin
> Expert Engineer
> Samsung Electronics Research center Russia
> 
> 

* Re: [dpdk-dev] [RFC 0/5] virtio support for container
  2015-12-31  9:19 ` Tan, Jianfeng
@ 2015-12-31  9:40   ` Pavel Fedin
  2015-12-31 10:02     ` Tan, Jianfeng
  0 siblings, 1 reply; 16+ messages in thread
From: Pavel Fedin @ 2015-12-31  9:40 UTC (permalink / raw)
  To: 'Tan, Jianfeng', dev

 Hello!

> First of all, when you say openvswitch, are you referring to ovs-dpdk?

 I am referring to mainline OVS, compiled with DPDK and using the userspace dataplane.
 AFAIK ovs-dpdk is an early Intel fork, which is abandoned at the moment.

> And can you detail your test case? Like, how do you want ovs_on_host and ovs_in_container to
> be connected?
> Through two-direct-connected physical NICs, or one vhost port in ovs_on_host and one virtio
> port in ovs_in_container?

 A vhost port, i.e.:

                             |
LOCAL------dpdkvhostuser<----+---->cvio----->LOCAL
      ovs                    |          ovs
                             |
                host         |        container

 By this time I have advanced in my research. OVS not only crashes by itself, but also manages to crash the host side. It does this by
performing a reconfiguration sequence without sending VHOST_USER_SET_MEM_TABLE, so the host-side OVS tries to refer to old addresses
and dies badly.
 Those messages about the memory pool already being present are perhaps OK.

Kind regards,
Pavel Fedin
Expert Engineer
Samsung Electronics Research center Russia

* Re: [dpdk-dev] [RFC 0/5] virtio support for container
  2015-12-31  9:40   ` Pavel Fedin
@ 2015-12-31 10:02     ` Tan, Jianfeng
  2015-12-31 10:38       ` Pavel Fedin
  0 siblings, 1 reply; 16+ messages in thread
From: Tan, Jianfeng @ 2015-12-31 10:02 UTC (permalink / raw)
  To: Pavel Fedin, dev



> -----Original Message-----
> From: Pavel Fedin [mailto:p.fedin@samsung.com]
> Sent: Thursday, December 31, 2015 5:40 PM
> To: Tan, Jianfeng; dev@dpdk.org
> Subject: RE: [dpdk-dev] [RFC 0/5] virtio support for container
> 
>  Hello!
> 
> > First of all, when you say openvswitch, are you referring to ovs-dpdk?
> 
>  I am referring to mainline OVS, compiled with DPDK and using the userspace
> dataplane.
>  AFAIK ovs-dpdk is an early Intel fork, which is abandoned at the moment.
> 
> > And can you detail your test case? Like, how do you want ovs_on_host and
> ovs_in_container to
> > be connected?
> > Through two directly connected physical NICs, or one vhost port in
> ovs_on_host and one virtio
> > port in ovs_in_container?
> 
>  A vhost port, i.e.:
> 
>                              |
> LOCAL------dpdkvhostuser<----+---->cvio----->LOCAL
>       ovs                    |          ovs
>                              |
>                 host         |        container
> 
>  By this time I have advanced in my research. OVS not only crashes by itself, but
> also manages to crash the host side. It does this by performing a
> reconfiguration sequence without sending VHOST_USER_SET_MEM_TABLE,
> so the host-side OVS tries to refer to old addresses and dies
> badly.

Yes, this is exactly the case this patchset is aimed at.

Before you start another ovs_in_container, do the previous ones get killed? If so, the vhost information
in ovs_on_host will be wiped, as the unix socket is broken.
And by the way, OVS allows just one virtio device per vhost port, quite different from the example,
vhost-switch.

Thanks,
Jianfeng  

>  Those messages about memory pool already being present are perhaps OK.
> 
> Kind regards,
> Pavel Fedin
> Expert Engineer
> Samsung Electronics Research center Russia
> 

* Re: [dpdk-dev] [RFC 0/5] virtio support for container
  2015-12-31 10:02     ` Tan, Jianfeng
@ 2015-12-31 10:38       ` Pavel Fedin
  2015-12-31 11:58         ` Tan, Jianfeng
  0 siblings, 1 reply; 16+ messages in thread
From: Pavel Fedin @ 2015-12-31 10:38 UTC (permalink / raw)
  To: 'Tan, Jianfeng', dev

 Hello!

> Before you start another ovs_in_container, previous ones get killed?

 Of course. It crashes.

> If so, vhost information in ovs_on_host will be wiped as the unix socket is broken.

 Yes. And ovs_on_host crashes because:
a) ovs_in_container does not send VHOST_USER_SET_MEM_TABLE (I don't know why yet)
b) set_vring_addr() does not make sure that dev->mem is set.

 I am preparing a patch to fix (b).
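
A minimal sketch of the guard that (b) calls for, modeled loosely on the DPDK 2.2-era handler in lib/librte_vhost; the function shape and helper names are assumptions, not the actual patch:

static int
set_vring_addr(struct vhost_device_ctx ctx, struct vhost_vring_addr *addr)
{
	struct virtio_net *dev = get_device(ctx);

	/* Fail the request if VHOST_USER_SET_MEM_TABLE has not been
	 * received yet, instead of translating ring addresses through
	 * a NULL dev->mem and crashing. */
	if (dev == NULL || dev->mem == NULL)
		return -1;

	/* ... translate addr->desc_user_addr etc. via qva_to_vva(dev, ...) ... */
	return 0;
}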

Kind regards,
Pavel Fedin
Expert Engineer
Samsung Electronics Research center Russia

* Re: [dpdk-dev] [RFC 0/5] virtio support for container
  2015-12-31 10:38       ` Pavel Fedin
@ 2015-12-31 11:58         ` Tan, Jianfeng
  2015-12-31 12:44           ` Pavel Fedin
                             ` (2 more replies)
  0 siblings, 3 replies; 16+ messages in thread
From: Tan, Jianfeng @ 2015-12-31 11:58 UTC (permalink / raw)
  To: Pavel Fedin, dev

Hi,

> a) ovs_in_container does not send VHOST_USER_SET_MEM_TABLE
Please check if rte_eth_dev_start() is called.
(rte_eth_dev_start -> virtio_dev_start -> vtpci_reinit_complete -> kick_all_vq)

> b) set_vring_addr() does not make sure that dev->mem is set. 
>  I am preparing a patch to fix (b).

Yes, it seems like a bug; it lacks a necessary check.

Thanks,
Jianfeng

* Re: [dpdk-dev] [RFC 0/5] virtio support for container
  2015-12-31 11:58         ` Tan, Jianfeng
@ 2015-12-31 12:44           ` Pavel Fedin
  2015-12-31 12:54             ` Tan, Jianfeng
  2015-12-31 13:47           ` Pavel Fedin
  2015-12-31 15:39           ` Pavel Fedin
  2 siblings, 1 reply; 16+ messages in thread
From: Pavel Fedin @ 2015-12-31 12:44 UTC (permalink / raw)
  To: 'Tan, Jianfeng', dev

Hello!

> > a) ovs_in_container does not send VHOST_USER_SET_MEM_TABLE
> Please check if rte_eth_dev_start() is called.
> (rte_eth_dev_start -> virtio_dev_start -> vtpci_reinit_complete -> kick_all_vq)
> 
> > b) set_vring_addr() does not make sure that dev->mem is set.
> >  I am preparing a patch to fix (b).
> 
> Yes, it seems like a bug, lack of necessary check.

 I've made some progress on (a). It's tricky. It is caused by this fragment:

        if (vhost_user_read(vhost->sockfd, &msg, len, fds, fd_num) < 0)
                return 0;

 Here you ignore errors, and this particular request for some reason ends up with EBADF. The most magical part is that sometimes it
just works...
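
A minimal sketch of what checking that return value could look like; the names follow the fragment above (where, as clarified below, vhost_user_read() is actually the sender), and the logging macro is assumed to be available:

	if (vhost_user_read(vhost->sockfd, &msg, len, fds, fd_num) < 0) {
		/* Propagate the sendmsg() failure (e.g. EBADF) to the
		 * caller instead of silently returning success. */
		PMD_DRV_LOG(ERR, "vhost-user send failed: %s",
			    strerror(errno));
		return -1;
	}
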
 I am not sure if I can finish it today, and here in Russia we have New Year holidays until the 11th.

Kind regards,
Pavel Fedin
Expert Engineer
Samsung Electronics Research center Russia

* Re: [dpdk-dev] [RFC 0/5] virtio support for container
  2015-12-31 12:44           ` Pavel Fedin
@ 2015-12-31 12:54             ` Tan, Jianfeng
  2015-12-31 13:07               ` Pavel Fedin
  0 siblings, 1 reply; 16+ messages in thread
From: Tan, Jianfeng @ 2015-12-31 12:54 UTC (permalink / raw)
  To: Pavel Fedin, dev

Hello!

> 
>  I've made some progress on (a). It's tricky. It is caused by this fragment:
> 
>         if (vhost_user_read(vhost->sockfd, &msg, len, fds, fd_num) < 0)
>                 return 0;
> 
>  Here you ignore errors, and this particular request for some reason ends up
> with EBADF. The most magical part is that sometimes it just
> works...
>  I am not sure if I can finish it today, and here in Russia we have New Year
> holidays until the 11th.

Oops, I made a mistake here. I got vhost_user_read() and vhost_user_write() backwards.

+	len = VHOST_USER_HDR_SIZE + msg.size;
+	if (vhost_user_read(hw->sockfd, &msg, len, fds, fd_num) < 0)
+		return 0;
+
+	if (need_reply) {
+		if (vhost_user_write(hw->sockfd, &msg) < 0)
+			return -1;
+
+		if (req != msg.request) {
+			PMD_DRV_LOG(ERR, "Received unexpected msg type."
+					" Expected %d received %d",
+					req, msg.request);
+			return -1;
+		}

Thanks,
Jianfeng

* Re: [dpdk-dev] [RFC 0/5] virtio support for container
  2015-12-31 12:54             ` Tan, Jianfeng
@ 2015-12-31 13:07               ` Pavel Fedin
  0 siblings, 0 replies; 16+ messages in thread
From: Pavel Fedin @ 2015-12-31 13:07 UTC (permalink / raw)
  To: 'Tan, Jianfeng', dev

 Hello!

> >  Here you ignore errors, and this particular request for some reason ends up
> > with EBADF. The most magical part is that sometimes it just
> > works...
> >  I am not sure if I can finish it today, and here in Russia we have New Year
> > holidays until the 11th.
> 
> Oops, I made a mistake here. I got vhost_user_read() and vhost_user_write() backwards.

 But nevertheless they do the right thing: vhost_user_read() actually writes the message into the socket, and vhost_user_write()
reads it. So they should work correctly.
 I've just checked; the fd number is not corrupted.

Kind regards,
Pavel Fedin
Expert Engineer
Samsung Electronics Research center Russia

* Re: [dpdk-dev] [RFC 0/5] virtio support for container
  2015-12-31 11:58         ` Tan, Jianfeng
  2015-12-31 12:44           ` Pavel Fedin
@ 2015-12-31 13:47           ` Pavel Fedin
  2015-12-31 15:39           ` Pavel Fedin
  2 siblings, 0 replies; 16+ messages in thread
From: Pavel Fedin @ 2015-12-31 13:47 UTC (permalink / raw)
  To: 'Tan, Jianfeng', dev

 Hello!

> > a) ovs_in_container does not send VHOST_USER_SET_MEM_TABLE
> Please check if rte_eth_dev_start() is called.
> (rte_eth_dev_start -> virtio_dev_start -> vtpci_reinit_complete -> kick_all_vq)

 I've figured out what happened, and it's my fault only :( I modified your patchset and added a --shared-mem option, and forgot
to pass it to gdb :) Without it, memory is not shared, and rte_memseg_info_get() returned fd = -1. And if you put that into the
control message for sendmsg(), you get -EBADF.
 So please ignore this.
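
A minimal sketch of the fd-passing path that produces that errno; the function below is illustrative standard unix-socket code, not the patchset's:

#include <string.h>
#include <sys/socket.h>
#include <sys/uio.h>

/* Send a payload plus one fd over a unix socket via SCM_RIGHTS.
 * sendmsg() fails with EBADF when fd is -1, e.g. when the memory
 * segment has no shareable backing file. */
static ssize_t
send_with_fd(int sockfd, void *payload, size_t len, int fd)
{
	struct iovec iov = { .iov_base = payload, .iov_len = len };
	char control[CMSG_SPACE(sizeof(int))];
	struct msghdr msgh;
	struct cmsghdr *cmsg;

	memset(&msgh, 0, sizeof(msgh));
	msgh.msg_iov = &iov;
	msgh.msg_iovlen = 1;
	msgh.msg_control = control;
	msgh.msg_controllen = sizeof(control);

	cmsg = CMSG_FIRSTHDR(&msgh);
	cmsg->cmsg_level = SOL_SOCKET;
	cmsg->cmsg_type = SCM_RIGHTS;
	cmsg->cmsg_len = CMSG_LEN(sizeof(int));
	memcpy(CMSG_DATA(cmsg), &fd, sizeof(int));

	return sendmsg(sockfd, &msgh, 0);
}
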
 But nevertheless, OVS in the container still dies with:
--- cut ---
Program received signal SIGSEGV, Segmentation fault.
[Switching to Thread 0x7fff97fff700 (LWP 3866)]
virtio_recv_mergeable_pkts (rx_queue=0x7fffd46a9a80, rx_pkts=0x7fff97ffe850, nb_pkts=32) at
/home/p.fedin/dpdk/drivers/net/virtio/virtio_rxtx.c:683
683	/home/p.fedin/dpdk/drivers/net/virtio/virtio_rxtx.c: No such file or directory.
Missing separate debuginfos, use: dnf debuginfo-install keyutils-libs-1.5.9-7.fc23.x86_64 krb5-libs-1.13.2-11.fc23.x86_64
libcap-ng-0.7.7-2.fc23.x86_64 libcom_err-1.42.13-3.fc23.x86_64 libselinux-2.4-4.fc23.x86_64 openssl-libs-1.0.2d-2.fc23.x86_64
pcre-8.37-4.fc23.x86_64 zlib-1.2.8-9.fc23.x86_64
(gdb) where
#0  virtio_recv_mergeable_pkts (rx_queue=0x7fffd46a9a80, rx_pkts=0x7fff97ffe850, nb_pkts=32) at
/home/p.fedin/dpdk/drivers/net/virtio/virtio_rxtx.c:683
#1  0x0000000000669ee8 in rte_eth_rx_burst (nb_pkts=32, rx_pkts=0x7fff97ffe850, queue_id=0, port_id=0 '\000') at
/home/p.fedin/dpdk/build/include/rte_ethdev.h:2510
#2  netdev_dpdk_rxq_recv (rxq_=<optimized out>, packets=0x7fff97ffe850, c=0x7fff97ffe84c) at lib/netdev-dpdk.c:1033
#3  0x00000000005e8ca1 in netdev_rxq_recv (rx=<optimized out>, buffers=buffers@entry=0x7fff97ffe850, cnt=cnt@entry=0x7fff97ffe84c)
at lib/netdev.c:654
#4  0x00000000005cb338 in dp_netdev_process_rxq_port (pmd=pmd@entry=0x7fffac7f8010, rxq=<optimized out>, port=<optimized out>,
port=<optimized out>) at lib/dpif-netdev.c:2510
#5  0x00000000005cc649 in pmd_thread_main (f_=0x7fffac7f8010) at lib/dpif-netdev.c:2671
#6  0x0000000000628424 in ovsthread_wrapper (aux_=<optimized out>) at lib/ovs-thread.c:340
#7  0x00007ffff70f660a in start_thread () from /lib64/libpthread.so.0
#8  0x00007ffff6926bbd in clone () from /lib64/libc.so.6
(gdb)
--- cut ---

 and l2fwd does not reproduce this. So, let's wait until 11.01.2016. And happy New Year to everybody who reads it (and who doesn't)
:)

Kind regards,
Pavel Fedin
Expert Engineer
Samsung Electronics Research center Russia

* Re: [dpdk-dev] [RFC 0/5] virtio support for container
  2015-12-31 11:58         ` Tan, Jianfeng
  2015-12-31 12:44           ` Pavel Fedin
  2015-12-31 13:47           ` Pavel Fedin
@ 2015-12-31 15:39           ` Pavel Fedin
  2016-01-06  5:47             ` Tan, Jianfeng
  2 siblings, 1 reply; 16+ messages in thread
From: Pavel Fedin @ 2015-12-31 15:39 UTC (permalink / raw)
  To: 'Tan, Jianfeng', dev

 Hello!

 Last minute note: I have found the problem, but have no time to research and fix it.
 It happens because OVS first creates the device, starts it, then stops it and reconfigures the queues. The second queue allocation
happens from within netdev_set_multiq(). Then OVS restarts the device and proceeds to actually use it.
 But queues are not initialized properly in DPDK after the second allocation, because of this:

	/* On restart after stop do not touch queues */
	if (hw->started)
		return 0;

 It keeps us from calling virtio_dev_rxtx_start(), which should in turn call virtio_dev_vring_start(), which calls
vring_init(). So VIRTQUEUE_NUSED() dies badly, because vq->vq_ring contains all NULLs.
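
A minimal sketch of the start path being described, modeled loosely on the RFC-era virtio_dev_start() in drivers/net/virtio/virtio_ethdev.c; the surrounding code and the suggested fix are assumptions, not the final patch:

/* Sketch: the early return skips ring re-initialization on a
 * stop/start cycle, so queues reallocated in between (as OVS does via
 * netdev_set_multiq()) are used while vq->vq_ring is still NULL. */
static int
virtio_dev_start(struct rte_eth_dev *dev)
{
	struct virtio_hw *hw = dev->data->dev_private;

	/* On restart after stop do not touch queues */
	if (hw->started)
		return 0;

	virtio_dev_rxtx_start(dev);	/* -> virtio_dev_vring_start()
					 * -> vring_init() per queue */
	hw->started = 1;
	return 0;
}

/* A fix needs to re-run the vring setup whenever queues were
 * reallocated since the last start, e.g. by clearing hw->started in
 * virtio_dev_stop(). */
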
 See you all after the 10th. And happy New Year again!

Kind regards,
Pavel Fedin
Expert Engineer
Samsung Electronics Research center Russia

> -----Original Message-----
> From: Pavel Fedin [mailto:p.fedin@samsung.com]
> Sent: Thursday, December 31, 2015 4:47 PM
> To: 'Tan, Jianfeng'; 'dev@dpdk.org'
> Subject: RE: [dpdk-dev] [RFC 0/5] virtio support for container
> 
>  Hello!
> 
> > > a) ovs_in_container does not send VHOST_USER_SET_MEM_TABLE
> > Please check if rte_eth_dev_start() is called.
> > (rte_eth_dev_start -> virtio_dev_start -> vtpci_reinit_complete -> kick_all_vq)
> 
>  I've figured out what happened, and it's my fault only :( I modified your patchset and
> added a --shared-mem option, and forgot to pass it to gdb :) Without it, memory is not shared,
> and rte_memseg_info_get() returned fd = -1. And if you put that into the control message for
> sendmsg(), you get -EBADF.
>  So please ignore this.
>  But nevertheless, OVS in the container still dies with:
> --- cut ---
> Program received signal SIGSEGV, Segmentation fault.
> [Switching to Thread 0x7fff97fff700 (LWP 3866)]
> virtio_recv_mergeable_pkts (rx_queue=0x7fffd46a9a80, rx_pkts=0x7fff97ffe850, nb_pkts=32) at
> /home/p.fedin/dpdk/drivers/net/virtio/virtio_rxtx.c:683
> 683	/home/p.fedin/dpdk/drivers/net/virtio/virtio_rxtx.c: No such file or directory.
> Missing separate debuginfos, use: dnf debuginfo-install keyutils-libs-1.5.9-7.fc23.x86_64
> krb5-libs-1.13.2-11.fc23.x86_64 libcap-ng-0.7.7-2.fc23.x86_64 libcom_err-1.42.13-
> 3.fc23.x86_64 libselinux-2.4-4.fc23.x86_64 openssl-libs-1.0.2d-2.fc23.x86_64 pcre-8.37-
> 4.fc23.x86_64 zlib-1.2.8-9.fc23.x86_64
> (gdb) where
> #0  virtio_recv_mergeable_pkts (rx_queue=0x7fffd46a9a80, rx_pkts=0x7fff97ffe850, nb_pkts=32)
> at /home/p.fedin/dpdk/drivers/net/virtio/virtio_rxtx.c:683
> #1  0x0000000000669ee8 in rte_eth_rx_burst (nb_pkts=32, rx_pkts=0x7fff97ffe850, queue_id=0,
> port_id=0 '\000') at /home/p.fedin/dpdk/build/include/rte_ethdev.h:2510
> #2  netdev_dpdk_rxq_recv (rxq_=<optimized out>, packets=0x7fff97ffe850, c=0x7fff97ffe84c) at
> lib/netdev-dpdk.c:1033
> #3  0x00000000005e8ca1 in netdev_rxq_recv (rx=<optimized out>,
> buffers=buffers@entry=0x7fff97ffe850, cnt=cnt@entry=0x7fff97ffe84c) at lib/netdev.c:654
> #4  0x00000000005cb338 in dp_netdev_process_rxq_port (pmd=pmd@entry=0x7fffac7f8010,
> rxq=<optimized out>, port=<optimized out>, port=<optimized out>) at lib/dpif-netdev.c:2510
> #5  0x00000000005cc649 in pmd_thread_main (f_=0x7fffac7f8010) at lib/dpif-netdev.c:2671
> #6  0x0000000000628424 in ovsthread_wrapper (aux_=<optimized out>) at lib/ovs-thread.c:340
> #7  0x00007ffff70f660a in start_thread () from /lib64/libpthread.so.0
> #8  0x00007ffff6926bbd in clone () from /lib64/libc.so.6
> (gdb)
> --- cut ---
> 
>  and l2fwd does not reproduce this. So, let's wait until 11.01.2016. And happy New Year to
> everybody who reads it (and who doesn't) :)
> 
> Kind regards,
> Pavel Fedin
> Expert Engineer
> Samsung Electronics Research center Russia

* Re: [dpdk-dev] [RFC 0/5] virtio support for container
  2015-12-31 15:39           ` Pavel Fedin
@ 2016-01-06  5:47             ` Tan, Jianfeng
  0 siblings, 0 replies; 16+ messages in thread
From: Tan, Jianfeng @ 2016-01-06  5:47 UTC (permalink / raw)
  To: Pavel Fedin, dev



On 12/31/2015 11:39 PM, Pavel Fedin wrote:
>   Hello!
>
>   Last minute note. I have found the problem but have no time to research and fix it.
>   It happens because OVS first creates the device, starts it, then stops it and reconfigures the queues. The second queue allocation
> happens from within netdev_set_multiq(). Then OVS restarts the device and proceeds to actually use it.
>   But queues are not initialized properly in DPDK after the second allocation, because of this:
>
> 	/* On restart after stop do not touch queues */
> 	if (hw->started)
> 		return 0;
Hi Fedin,

As you can see, I also think it is a bug. A device should be OK to
start/stop/start...

I already sent a patch to fix this:
http://dpdk.org/ml/archives/dev/2016-January/031010.html

Thanks,
Jianfeng
>
>   It keeps us from calling virtio_dev_rxtx_start(), which should in turn call virtio_dev_vring_start(), which calls
> vring_init(). So VIRTQUEUE_NUSED() dies badly, because vq->vq_ring contains all NULLs.
>   See you all after the 10th. And happy New Year again!
>
>

* Re: [dpdk-dev] [RFC 0/5] virtio support for container
@ 2017-06-15  8:21 Avi Cohen (A)
  0 siblings, 0 replies; 16+ messages in thread
From: Avi Cohen (A) @ 2017-06-15  8:21 UTC (permalink / raw)
  To: dev

Hello,
Just want to check the status of this project.
Is it alive? Working?
Can I run a container connected to OVS-DPDK via a virtio device?
Where can I download the code/patches?
Best Regards
avi

* Re: [dpdk-dev] [RFC 0/5] virtio support for container
  2015-11-24  3:53 ` Zhuangyanying
@ 2015-11-24  6:19   ` Tan, Jianfeng
  0 siblings, 0 replies; 16+ messages in thread
From: Tan, Jianfeng @ 2015-11-24  6:19 UTC (permalink / raw)
  To: Zhuangyanying, dev
  Cc: nakajima.yoshihiro, Zhbzg, mst, gaoxiaoqiu, Zhangbo (Oscar),
	Zhoujingbin, Guohongzhen



> -----Original Message-----
> From: Zhuangyanying [mailto:ann.zhuangyanying@huawei.com]
> Sent: Tuesday, November 24, 2015 11:53 AM
> To: Tan, Jianfeng; dev@dpdk.org
> Cc: mst@redhat.com; mukawa@igel.co.jp; nakajima.yoshihiro@lab.ntt.co.jp;
> Qiu, Michael; Guohongzhen; Zhoujingbin; Zhangbo (Oscar); gaoxiaoqiu;
> Zhbzg; Xie, Huawei
> Subject: RE: [RFC 0/5] virtio support for container
> 
> 
> 
> > -----Original Message-----
> > From: Jianfeng Tan [mailto:jianfeng.tan@intel.com]
> > Sent: Friday, November 06, 2015 2:31 AM
> > To: dev@dpdk.org
> > Cc: mst@redhat.com; mukawa@igel.co.jp;
> nakajima.yoshihiro@lab.ntt.co.jp;
> > michael.qiu@intel.com; Guohongzhen; Zhoujingbin; Zhuangyanying;
> Zhangbo
> > (Oscar); gaoxiaoqiu; Zhbzg; huawei.xie@intel.com; Jianfeng Tan
> > Subject: [RFC 0/5] virtio support for container
> >
...
> > 2.1.4
> 
> This patch raises a good idea: adding an extra abstracted IO layer, which
> would make it simple to extend the function to a kernel-mode switch (such
> as OVS). That's great.
> But I have one question here:
>     it's the issue of VHOST_USER_SET_MEM_TABLE. You allocate memory from a
> tmpfs filesystem with just one fd, and can use rte_memseg_info_get() to
> directly get the memory topology. However, things change in kernel
> space, because the mempool has to be created on each container's
> hugetlbfs (rather than tmpfs), which is separated from the others; and
> finally, consider the ioctl's parameters.
>        My solution is as follows, for your reference:
> /*
> 	reg = mem->regions;
> 	reg->guest_phys_addr = (__u64) ((struct virtqueue *)(dev->data->rx_queues[0]))->mpool->elt_va_start;
> 	reg->userspace_addr = reg->guest_phys_addr;
> 	reg->memory_size = ((struct virtqueue *)(dev->data->rx_queues[0]))->mpool->elt_va_end - reg->guest_phys_addr;
> 
> 	reg = mem->regions + 1;
> 	reg->guest_phys_addr = (__u64)(((struct virtqueue *)(dev->data->tx_queues[0]))->virtio_net_hdr_mem);
> 	reg->userspace_addr = reg->guest_phys_addr;
> 	reg->memory_size = vq_size * internals->vtnet_hdr_size;
> */
> 	   But it's a little ugly, any better idea?

Hi Yanying,

Your solution seems OK to me when used with the kernel vhost-net, because the vhost
kthread just shares the same mm_struct with the virtio process. But it will not work
with vhost-user, which realizes memory sharing by putting an fd in sendmsg().
Worse, it will not work with userspace vhost_cuse (see
lib/librte_vhost/vhost_cuse/) either, because the current implementation supposes
the VM's physical memory is backed by one huge file. Actually, what we need to do
is enhance userspace vhost_cuse so that it supports cross-file memory regions.

Below are two solutions to support hugetlbfs, FYI:

To support hugetlbfs, my previous idea was to use the -v option of "docker run"
to map hugetlbfs into the container's /dev/shm, so that we can create a "huge" shm file
on hugetlbfs. But this does not seem to be accepted by others.

You mentioned that DPDK now creates a file for each hugepage.
Maybe we just need to share all these hugepages with vhost. To minimize the
memory translation effort, we would need to use as few pages as
possible. Can you accept this solution?

Thanks,
Jianfeng

* Re: [dpdk-dev] [RFC 0/5] virtio support for container
  2015-11-05 18:31 Jianfeng Tan
@ 2015-11-24  3:53 ` Zhuangyanying
  2015-11-24  6:19   ` Tan, Jianfeng
  0 siblings, 1 reply; 16+ messages in thread
From: Zhuangyanying @ 2015-11-24  3:53 UTC (permalink / raw)
  To: Jianfeng Tan, dev
  Cc: nakajima.yoshihiro, Zhbzg, mst, gaoxiaoqiu, Zhangbo (Oscar),
	Zhoujingbin, Guohongzhen



> -----Original Message-----
> From: Jianfeng Tan [mailto:jianfeng.tan@intel.com]
> Sent: Friday, November 06, 2015 2:31 AM
> To: dev@dpdk.org
> Cc: mst@redhat.com; mukawa@igel.co.jp; nakajima.yoshihiro@lab.ntt.co.jp;
> michael.qiu@intel.com; Guohongzhen; Zhoujingbin; Zhuangyanying; Zhangbo
> (Oscar); gaoxiaoqiu; Zhbzg; huawei.xie@intel.com; Jianfeng Tan
> Subject: [RFC 0/5] virtio support for container
> 
> This patchset only acts as a PoC to request the community for comments.
> 
> This patchset is to provide high performance networking interface
> (virtio) for container-based DPDK applications. The way of starting DPDK
> applications in containers with ownership of NIC devices exclusively is beyond
> the scope. The basic idea here is to present a new virtual device (named
> eth_cvio), which can be discovered and initialized in container-based DPDK
> applications rte_eal_init().
> To minimize the change, we reuse already-existing virtio frontend driver code
> (driver/net/virtio/).
> 
> Compared to QEMU/VM case, virtio device framework (translates I/O port r/w
> operations into unix socket/cuse protocol, which is originally provided in QEMU),
> is integrated in virtio frontend driver. Aka, this new converged driver actually
> plays the role of original frontend driver and the role of QEMU device
> framework.
> 
> The biggest difference here lies in how to calculate relative address for backend.
> The principle of virtio is that: based on one or multiple shared memory
> segments, vhost maintains a reference system with the base addresses and
> length of these segments so that an address from VM comes (usually GPA,
> Guest Physical Address), vhost can translate it into self-recognizable address
> (aka VVA, Vhost Virtual Address). To decrease the overhead of address
> translation, we should maintain as few segments as better. In the context of
> virtual machines, GPA is always locally continuous. So it's a good choice. In
> container's case, CVA (Container Virtual Address) can be used. This means
> that:
> a. when set_base_addr, CVA address is used; b. when preparing RX's
> descriptors, CVA address is used; c. when transmitting packets, CVA is filled in
> TX's descriptors; d. in TX and CQ's header, CVA is used.
> 
> How to share memory? In VM's case, qemu always shares all physical layout to
> backend. But it's not feasible for a container, as a process, to share all virtual
> memory regions to backend. So only specified virtual memory regions (type is
> shared) are sent to backend. It leads to a limitation that only addresses in
> these areas can be used to transmit or receive packets. For now, the shared
> memory is created in /dev/shm using shm_open() in the memory initialization
> process.
> 
> How to use?
> 
> a. Apply the patch of virtio for container. We need two copies of patched code
> (referred as dpdk-app/ and dpdk-vhost/)
> 
> b. To compile container apps:
> $: cd dpdk-app
> $: vim config/common_linuxapp (uncomment "CONFIG_RTE_VIRTIO_VDEV=y")
> $: make config RTE_SDK=`pwd` T=x86_64-native-linuxapp-gcc
> $: make install RTE_SDK=`pwd` T=x86_64-native-linuxapp-gcc
> $: make -C examples/l2fwd RTE_SDK=`pwd` T=x86_64-native-linuxapp-gcc
> 
> c. To build a docker image using Dockerfile below.
> $: cat ./Dockerfile
> FROM ubuntu:latest
> WORKDIR /usr/src/dpdk
> COPY . /usr/src/dpdk
> CMD ["/usr/src/dpdk/examples/l2fwd/build/l2fwd", "-c", "0xc", "-n", "4",
> "--no-huge", "--no-pci",
> "--vdev=eth_cvio0,queue_num=256,rx=1,tx=1,cq=0,path=/var/run/usvhost",
> "--", "-p", "0x1"]
> $: docker build -t dpdk-app-l2fwd .
> 
> d. To compile vhost:
> $: cd dpdk-vhost
> $: make config RTE_SDK=`pwd` T=x86_64-native-linuxapp-gcc
> $: make install RTE_SDK=`pwd` T=x86_64-native-linuxapp-gcc
> $: make -C examples/vhost RTE_SDK=`pwd` T=x86_64-native-linuxapp-gcc
> 
> e. Start vhost-switch
> $: ./examples/vhost/build/vhost-switch -c 3 -n 4 --socket-mem 1024,1024 -- -p
> 0x1 --stats 1
> 
> f. Start docker
> $: docker run -i -t -v <path to vhost unix socket>:/var/run/usvhost
> dpdk-app-l2fwd
> 
> Signed-off-by: Huawei Xie <huawei.xie@intel.com>
> Signed-off-by: Jianfeng Tan <jianfeng.tan@intel.com>
> 
> Jianfeng Tan (5):
>   virtio/container: add handler for ioport rd/wr
>   virtio/container: add a new virtual device named eth_cvio
>   virtio/container: unify desc->addr assignment
>   virtio/container: adjust memory initialization process
>   vhost/container: change mode of vhost listening socket
> 
>  config/common_linuxapp                       |   5 +
>  drivers/net/virtio/Makefile                  |   4 +
>  drivers/net/virtio/vhost-user.c              | 433
> +++++++++++++++++++++++++++
>  drivers/net/virtio/vhost-user.h              | 137 +++++++++
>  drivers/net/virtio/virtio_ethdev.c           | 319 +++++++++++++++-----
>  drivers/net/virtio/virtio_ethdev.h           |  16 +
>  drivers/net/virtio/virtio_pci.h              |  32 +-
>  drivers/net/virtio/virtio_rxtx.c             |   9 +-
>  drivers/net/virtio/virtio_rxtx_simple.c      |   9 +-
>  drivers/net/virtio/virtqueue.h               |   9 +-
>  lib/librte_eal/common/include/rte_memory.h   |   5 +
>  lib/librte_eal/linuxapp/eal/eal_memory.c     |  58 +++-
>  lib/librte_mempool/rte_mempool.c             |  16 +-
>  lib/librte_vhost/vhost_user/vhost-net-user.c |   5 +
>  14 files changed, 967 insertions(+), 90 deletions(-)  create mode 100644
> drivers/net/virtio/vhost-user.c  create mode 100644
> drivers/net/virtio/vhost-user.h
> 
> --
> 2.1.4

This patch raises a good idea: adding an extra abstracted IO layer, which would make it simple to extend the function to a kernel-mode switch (such as OVS). That's great.
But I have one question here:
    it's the issue of VHOST_USER_SET_MEM_TABLE. You allocate memory from a tmpfs filesystem with just one fd, and can use rte_memseg_info_get() to
	directly get the memory topology. However, things change in kernel space, because the mempool has to be created on each container's
	hugetlbfs (rather than tmpfs), which is separated from the others; and finally, consider the ioctl's parameters.
       My solution is as follows, for your reference:
/*
	reg = mem->regions;
	reg->guest_phys_addr = (__u64) ((struct virtqueue *)(dev->data->rx_queues[0]))->mpool->elt_va_start;
	reg->userspace_addr = reg->guest_phys_addr;
	reg->memory_size = ((struct virtqueue *)(dev->data->rx_queues[0]))->mpool->elt_va_end - reg->guest_phys_addr;

	reg = mem->regions + 1;
	reg->guest_phys_addr = (__u64)(((struct virtqueue *)(dev->data->tx_queues[0]))->virtio_net_hdr_mem);
	reg->userspace_addr = reg->guest_phys_addr;
	reg->memory_size = vq_size * internals->vtnet_hdr_size;
*/	  
	   But it's a little ugly, any better idea?

* [dpdk-dev] [RFC 0/5] virtio support for container
@ 2015-11-05 18:31 Jianfeng Tan
  2015-11-24  3:53 ` Zhuangyanying
  0 siblings, 1 reply; 16+ messages in thread
From: Jianfeng Tan @ 2015-11-05 18:31 UTC (permalink / raw)
  To: dev
  Cc: nakajima.yoshihiro, zhbzg, mst, gaoxiaoqiu, oscar.zhangbo,
	ann.zhuangyanying, zhoujingbin, guohongzhen

This patchset only acts as a PoC to request comments from the community.
 
This patchset provides a high performance networking interface
(virtio) for container-based DPDK applications. How to start DPDK
applications in containers with exclusive ownership of NIC devices
is beyond its scope. The basic idea here is to present
a new virtual device (named eth_cvio), which can be discovered
and initialized by container-based DPDK applications in rte_eal_init().
To minimize the change, we reuse the already-existing virtio frontend
driver code (drivers/net/virtio/).
 
Compared to the QEMU/VM case, the virtio device framework (which
translates I/O port r/w operations into the unix socket/cuse protocol,
and is originally provided in QEMU) is integrated into the virtio
frontend driver. That is, this new converged driver actually plays both
the role of the original frontend driver and the role of the QEMU
device framework.
 
The biggest difference here lies in how to calculate the relative address
for the backend. The principle of virtio is that, based on one or multiple
shared memory segments, vhost maintains a reference system with
the base addresses and lengths of these segments, so that when an address
comes from the VM (usually a GPA, Guest Physical Address), vhost can
translate it into a self-recognizable address (aka a VVA, Vhost Virtual
Address). To decrease the overhead of address translation, we should
maintain as few segments as possible. In the context of virtual machines,
the GPA is always locally continuous, so it's a good choice. In a container's
case, the CVA (Container Virtual Address) can be used. This means that:
a. when set_base_addr, the CVA address is used; b. when preparing RX's
descriptors, the CVA address is used; c. when transmitting packets, the CVA is
filled in TX's descriptors; d. in TX and CQ's headers, the CVA is used.
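
A minimal sketch of the lookup described above; the struct and function names are illustrative, not the patchset's:

#include <stdint.h>

/* Each shared region is tracked as (front-end address base, backend
 * virtual address base, length). */
struct mem_region {
	uint64_t base_addr;   /* GPA in the VM case, CVA in the container case */
	uint64_t backend_va;  /* VVA: where the backend mapped the region */
	uint64_t size;
};

/* Translate a front-end address (GPA/CVA) into a vhost virtual address. */
static uint64_t
to_vhost_va(const struct mem_region *regions, int nregions, uint64_t addr)
{
	int i;

	for (i = 0; i < nregions; i++) {
		const struct mem_region *r = &regions[i];

		if (addr >= r->base_addr && addr < r->base_addr + r->size)
			return r->backend_va + (addr - r->base_addr);
	}
	return 0; /* not inside any shared region */
}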
 
How to share memory? In the VM's case, qemu always shares the whole physical
layout with the backend. But it's not feasible for a container, as a process,
to share all of its virtual memory regions with the backend. So only specified
virtual memory regions (whose type is shared) are sent to the backend. This
leads to a limitation: only addresses in these areas can be used to
transmit or receive packets. For now, the shared memory is created
in /dev/shm using shm_open() in the memory initialization process.
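
A minimal sketch of that initialization step; the segment name, size, and error handling are placeholders:

#include <fcntl.h>
#include <sys/mman.h>
#include <unistd.h>

/* Create a shared, fd-backed memory segment in /dev/shm. The returned
 * fd is what can later be handed to the vhost backend over the unix
 * socket, so the same pages are visible on both sides. */
static void *
create_shared_segment(const char *name, size_t size, int *out_fd)
{
	void *va;
	int fd = shm_open(name, O_CREAT | O_RDWR, 0600);

	if (fd < 0)
		return NULL;
	if (ftruncate(fd, size) < 0) {
		close(fd);
		return NULL;
	}
	va = mmap(NULL, size, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
	if (va == MAP_FAILED) {
		close(fd);
		return NULL;
	}
	*out_fd = fd;
	return va;
}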
 
How to use?
 
a. Apply the patch of virtio for container. We need two copies of
patched code (referred as dpdk-app/ and dpdk-vhost/)
 
b. To compile container apps:
$: cd dpdk-app
$: vim config/common_linuxapp (uncomment "CONFIG_RTE_VIRTIO_VDEV=y")
$: make config RTE_SDK=`pwd` T=x86_64-native-linuxapp-gcc
$: make install RTE_SDK=`pwd` T=x86_64-native-linuxapp-gcc
$: make -C examples/l2fwd RTE_SDK=`pwd` T=x86_64-native-linuxapp-gcc
 
c. To build a docker image using Dockerfile below.
$: cat ./Dockerfile
FROM ubuntu:latest
WORKDIR /usr/src/dpdk
COPY . /usr/src/dpdk
CMD ["/usr/src/dpdk/examples/l2fwd/build/l2fwd", "-c", "0xc", "-n", "4", "--no-huge", "--no-pci", "--vdev=eth_cvio0,queue_num=256,rx=1,tx=1,cq=0,path=/var/run/usvhost", "--", "-p", "0x1"]
$: docker build -t dpdk-app-l2fwd .
 
d. To compile vhost:
$: cd dpdk-vhost
$: make config RTE_SDK=`pwd` T=x86_64-native-linuxapp-gcc
$: make install RTE_SDK=`pwd` T=x86_64-native-linuxapp-gcc
$: make -C examples/vhost RTE_SDK=`pwd` T=x86_64-native-linuxapp-gcc
 
e. Start vhost-switch
$: ./examples/vhost/build/vhost-switch -c 3 -n 4 --socket-mem 1024,1024 -- -p 0x1 --stats 1
 
f. Start docker
$: docker run -i -t -v <path to vhost unix socket>:/var/run/usvhost dpdk-app-l2fwd

Signed-off-by: Huawei Xie <huawei.xie@intel.com>
Signed-off-by: Jianfeng Tan <jianfeng.tan@intel.com>

Jianfeng Tan (5):
  virtio/container: add handler for ioport rd/wr
  virtio/container: add a new virtual device named eth_cvio
  virtio/container: unify desc->addr assignment
  virtio/container: adjust memory initialization process
  vhost/container: change mode of vhost listening socket

 config/common_linuxapp                       |   5 +
 drivers/net/virtio/Makefile                  |   4 +
 drivers/net/virtio/vhost-user.c              | 433 +++++++++++++++++++++++++++
 drivers/net/virtio/vhost-user.h              | 137 +++++++++
 drivers/net/virtio/virtio_ethdev.c           | 319 +++++++++++++++-----
 drivers/net/virtio/virtio_ethdev.h           |  16 +
 drivers/net/virtio/virtio_pci.h              |  32 +-
 drivers/net/virtio/virtio_rxtx.c             |   9 +-
 drivers/net/virtio/virtio_rxtx_simple.c      |   9 +-
 drivers/net/virtio/virtqueue.h               |   9 +-
 lib/librte_eal/common/include/rte_memory.h   |   5 +
 lib/librte_eal/linuxapp/eal/eal_memory.c     |  58 +++-
 lib/librte_mempool/rte_mempool.c             |  16 +-
 lib/librte_vhost/vhost_user/vhost-net-user.c |   5 +
 14 files changed, 967 insertions(+), 90 deletions(-)
 create mode 100644 drivers/net/virtio/vhost-user.c
 create mode 100644 drivers/net/virtio/vhost-user.h

-- 
2.1.4
