DPDK patches and discussions
From: Daniele Di Proietto <diproiettod@vmware.com>
To: Christian Ehrhardt <christian.ehrhardt@canonical.com>
Cc: "discuss@openvswitch.org" <discuss@openvswitch.org>,
	dev <dev@dpdk.org>,
	"ciara.loftus@intel.com" <ciara.loftus@intel.com>
Subject: Re: [dpdk-dev] Issues with openvswitch2.5+dpdk2.2+kvm/virtio-pci
Date: Tue, 22 Mar 2016 20:47:06 +0000
Message-ID: <D316F5F2.1826D%diproiettod@vmware.com>
In-Reply-To: <CAATJJ0KUxrXVhLug3f=XzBPo4mXQtiJ207OS4Sa_iAeO-uA6bw@mail.gmail.com>

Hi Christian,

thanks for the report.  I've managed to reproduce the problem and I
observed two separate issues:

1) short version: it appears to be a problem in DPDK 2.2 and it should be
fixed by 9a0615af7746("virtio: fix restart"), now on master.

long version:

for every DPDK device added to OVS, these calls are issued:

netdev_open()
...
   rte_eth_dev_configure()
   rte_eth_rx_queue_setup()
   rte_eth_dev_start()

before receiving packets, the datapath calls netdev_set_multiq, which
reinitializes the device

netdev_set_multiq()
...
   rte_eth_dev_stop()
   rte_eth_dev_configure()
   rte_eth_rx_queue_setup()
   rte_eth_dev_start()

The problem is that the second rte_eth_dev_start() doesn't actually
initialize dev->data->rx_queues[0], because it (wrongly?) assumes that
it's already initialized. This causes the next call to rte_eth_rx_burst()
to crash.

It appears that simply backporting 9a0615af7746("virtio: fix restart")
fixes the issue.
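
To make the sequence above concrete, here is a toy Python model of it.
This is only a sketch, not the virtio PMD code: the ToyDev class, its
"started" flag and the clear_started_on_stop switch are assumptions made
up for illustration. It shows one way a second start() can skip queue
initialization after a stop()/configure()/rx_queue_setup() cycle.

```python
class ToyDev:
    """Toy stand-in for an ethdev port; hypothetical, not real DPDK code."""

    def __init__(self, clear_started_on_stop):
        # Assumption for illustration: a flag that start() trusts.
        self.clear_started_on_stop = clear_started_on_stop
        self.started = False
        self.rx_queues = []

    def configure(self, n_rxq):
        # Reconfiguring drops the old queue state.
        self.rx_queues = [None] * n_rxq

    def rx_queue_setup(self, qid):
        # The queue exists after setup, but is not usable until start().
        self.rx_queues[qid] = {"ready": False}

    def start(self):
        if self.started:
            # Assumes the queues are already initialized and returns early.
            return
        for q in self.rx_queues:
            q["ready"] = True
        self.started = True

    def stop(self):
        if self.clear_started_on_stop:
            self.started = False

    def rx_burst(self, qid):
        q = self.rx_queues[qid]
        if not q["ready"]:
            raise RuntimeError("rx queue %d used before initialization" % qid)
        return []


def open_then_set_multiq(dev):
    # Same order of calls as netdev_open() ...
    dev.configure(1)
    dev.rx_queue_setup(0)
    dev.start()
    # ... followed by netdev_set_multiq().
    dev.stop()
    dev.configure(1)
    dev.rx_queue_setup(0)
    dev.start()
    return dev.rx_burst(0)
```

With ToyDev(clear_started_on_stop=False) the final rx_burst() raises,
which stands in for the crash; with clear_started_on_stop=True the second
start() initializes the new queue and rx_burst() returns normally.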

2) When ovs-vswitchd is started with --monitor, there will be a parent
process (the monitor) that simply restarts the child if it crashes, while
the child does the actual ovs-vswitchd job.

It appears that the monitor feature is broken with DPDK, because the DPDK
library is wrongly initialized in the parent process.  If the child
crashes, all the DPDK memzones are preserved, meaning that new allocations
will likely fail.

The fix for this is calling rte_eal_init() in the child process, after
parsing other ovs-vswitchd command line options. This is done (among other
things) in Aaron's series currently under review:

http://openvswitch.org/pipermail/dev/2016-March/067422.html

I think we need a separate fix for branch-2.5.
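
The effect of initializing in the parent versus the child can be sketched
with another toy Python model. Everything here is hypothetical (ToyEal,
child_run and monitor are names invented for this sketch, and no real
fork or DPDK call is involved); only the memzone names are taken from the
log below. The point is just that state set up once by the monitor
outlives a crashed child, so the restarted child finds its zones already
reserved.

```python
class ToyEal:
    """Toy stand-in for EAL state holding named memzones; not real DPDK."""

    def __init__(self):
        self.memzones = set()

    def reserve(self, name):
        if name in self.memzones:
            raise RuntimeError("memzone <%s> already exists" % name)
        self.memzones.add(name)


def child_run(eal):
    # Each child reserves the same named zones when it sets up its ports.
    eal.reserve("port0_tvq0")
    eal.reserve("port0_rvq0")


def monitor(init_in_parent, restarts=1):
    """Run the child once, then `restarts` more times as if it had crashed.

    Returns how many runs failed to reserve their memzones.
    """
    failures = 0
    parent_eal = ToyEal() if init_in_parent else None
    for _ in range(1 + restarts):
        # Either reuse the state created by the monitor, or create fresh
        # state inside each child.
        eal = parent_eal if init_in_parent else ToyEal()
        try:
            child_run(eal)
        except RuntimeError:
            failures += 1
    return failures
```

In this model monitor(init_in_parent=True) reports one failed run (the
restarted child), while monitor(init_in_parent=False) reports none.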

On 18/03/2016 08:20, "Christian Ehrhardt"
<christian.ehrhardt@canonical.com> wrote:

>Hi,
>I was trying to replicate a setup that I have working on physical devices
>(ixgbe) under kvm since there is a virtio pmd driver.
>
>
>TL;DR:
>- under KVM with virtio-pci (working on baremetal with ixgbe cards)
>- adding dpdk port to ovs fails with memzone <port0_tvq0> already exists
>and causes a segfault
>- I couldn't find a solution in similar mails that popped up here
>recently, any help or pointer appreciated.
>
>
>## Details ##
>I thought I had read that others have it working, and I thought that
>would be a great way to gain more debug control of the environment, but
>something seems to be eluding me.
>
>
>There were quite a few similar mails on the list recently, but none seemed
>to hit the same issue as I do. At least none of the tunings/workarounds
>seemed to apply to me.
>
>As versions I have Openvswitch 2.5, DPDK 2.2, Qemu 2.5, Kernel 4.4 - so a
>fairly recent software stack.
>
>
>The super-short repro summary is:
>1. starting ovs-dpdk like
>ovs-vswitchd --dpdk -c 0x1 -n 4 --pci-blacklist 0000:00:03.0 -m 2048 --
>unix:/var/run/openvswitch/db.sock -vconsole:emer -vsyslog:err -vfile:info
>--mlockall --no-chdir --log-file=/var/log/openvswitch/ovs-vswitchd.log
>--pidfile=/var/run/openvswitch/ovs-vswitchd.pid
> --detach --monitor
>2. add a bridge and an ovs dpdk port
>
>ovs-vsctl add-port ovsdpdkbr0 dpdk0 -- set Interface dpdk0 type=dpdk
>
>
>
>The log of the initialization after #1 looks good to me - I can see two
>of my three virtio devices recognized and one blacklisted.
>Memory allocation looks good, ... I'll attach the log at the end of the
>mail
>
>
>
>
>## ISSUE ##
>But when I add a port and refer to one of the dpdk ports it fails with
>the following:
>ovs-vsctl[14023]: ovs|00001|vsctl|INFO|Called as ovs-vsctl add-port
>ovsdpdkbr0 dpdk0 -- set Interface dpdk0 type=dpdk
>ovs-vswitchd[13903]: EAL: memzone_reserve_aligned_thread_unsafe():
>memzone <port0_tvq0> already exists
>ovs-vswitchd[13903]: EAL: memzone_reserve_aligned_thread_unsafe():
>memzone <port0_tvq0_hdrzone> already exists
>ovs-vswitchd[13903]: EAL: memzone_reserve_aligned_thread_unsafe():
>memzone <port0_rvq0> already exists
>kernel: show_signal_msg: 18 callbacks suppressed
>kernel: pmd12[14025]: segfault at 2 ip 00007f3eb205eab2 sp
>00007f3e3dffa590 error 4 in libdpdk.so.0[7f3eb1fdf000+1e9000]
>ovs-vswitchd[13902]: ovs|00003|daemon_unix(monitor)|ERR|1 crashes: pid
>13903 died, killed (Segmentation fault), core dumped, restarting
>systemd-udevd[14040]: Could not generate persistent MAC address for
>ovs-netdev: No such file or directory
>kernel: device ovs-netdev entered promiscuous mode
>ovs-vswitchd[14036]: EAL: memzone_reserve_aligned_thread_unsafe():
>memzone <RG_MP_ovs_mp_1500_0_262144> already exists
>ovs-vswitchd[14036]: RING: Cannot reserve memory
>kernel: device ovsdpdkbr0 entered promiscuous mode
>ovs-vswitchd[14036]: EAL: memzone_reserve_aligned_thread_unsafe():
>memzone <RG_MP_ovs_mp_1500_0_262144> already exists
>ovs-vswitchd[14036]: RING: Cannot reserve memory
>
>
>
>
>
>## Experiments (failed) ##
>I thought it could be related to all the multiqueue changes that recently
>went in.
>My usual setup has 4 vCPUs and 4 queues per virtio-net device.
>I tried them with only 1 of 4 queues, also with only 1 queue defined and
>only 1 CPU - but all fail the same way.
>I have testpmd and l2fwd on the same devices working, so I hope they are
>not totally set up badly.
>
>
>I also tried hilarious things like reassigning to uio_pci_generic before,
>but, well, it's virtio_pmd eventually anyway - so it made no difference.
>
>
>From how it appears I felt that it could be related to the old
>discussions around
>[1] http://dpdk.org/ml/archives/dev/2015-May/017589.html
>[2] http://openvswitch.org/pipermail/dev/2015-March/052344.html
>But they are (partially) applied upstream already and the issue doesn't
>100% match the old discussions.
>
>
>
>
>## Logs ##
>[3] log of openvswitch start:
>systemd[1]: Starting Open vSwitch Internal Unit...
>ovs-ctl[13868]:  * Starting ovsdb-server
>ovs-vsctl[13893]: ovs|00001|vsctl|INFO|Called as ovs-vsctl --no-wait --
>init -- set Open_vSwitch . db-version=7.12.1
>ovs-vsctl[13898]: ovs|00001|vsctl|INFO|Called as ovs-vsctl --no-wait set
>Open_vSwitch . ovs-version=2.5.0
>"external-ids:system-id=\"8ddb892c-53a5-410d-a765-0031ad6eb1be\""
>"system-type=\"Ubuntu\"" "system-version=\"16.04-xenial\""
>ovs-ctl[13868]:  * Configuring Open vSwitch system IDs
>ovs-ctl[13868]: 2016-03-18T14:28:28Z|00001|dpdk|INFO|No -vhost_sock_dir
>provided - defaulting to /var/run/openvswitch
>ovs-vswitchd[13900]: ovs|00001|dpdk|INFO|No -vhost_sock_dir provided -
>defaulting to /var/run/openvswitch
>ovs-ctl[13868]: EAL: Detected lcore 0 as core 0 on socket 0
>ovs-ctl[13868]: EAL: Detected lcore 1 as core 0 on socket 0
>ovs-ctl[13868]: EAL: Detected lcore 2 as core 0 on socket 0
>ovs-ctl[13868]: EAL: Detected lcore 3 as core 0 on socket 0
>ovs-ctl[13868]: EAL: Support maximum 128 logical core(s) by configuration.
>ovs-ctl[13868]: EAL: Detected 4 lcore(s)
>ovs-ctl[13868]: EAL: No free hugepages reported in hugepages-1048576kB
>ovs-ctl[13868]: EAL: VFIO modules not all loaded, skip VFIO support...
>ovs-ctl[13868]: EAL: Setting up physically contiguous memory...
>ovs-ctl[13868]: EAL: Ask a virtual area of 0x200000 bytes
>ovs-ctl[13868]: EAL: Virtual area found at 0x7f3eaf800000 (size =
>0x200000)
>ovs-ctl[13868]: EAL: Ask a virtual area of 0x5ac00000 bytes
>ovs-ctl[13868]: EAL: Virtual area found at 0x7f3e54a00000 (size =
>0x5ac00000)
>ovs-ctl[13868]: EAL: Ask a virtual area of 0xc00000 bytes
>ovs-ctl[13868]: EAL: Virtual area found at 0x7f3e53c00000 (size =
>0xc00000)
>ovs-ctl[13868]: EAL: Ask a virtual area of 0x1200000 bytes
>ovs-ctl[13868]: EAL: Virtual area found at 0x7f3e52800000 (size =
>0x1200000)
>ovs-ctl[13868]: EAL: Ask a virtual area of 0x400000 bytes
>ovs-ctl[13868]: EAL: Virtual area found at 0x7f3e52200000 (size =
>0x400000)
>ovs-ctl[13868]: EAL: Ask a virtual area of 0x200000 bytes
>ovs-ctl[13868]: EAL: Virtual area found at 0x7f3e51e00000 (size =
>0x200000)
>ovs-ctl[13868]: EAL: Ask a virtual area of 0x400000 bytes
>ovs-ctl[13868]: EAL: Virtual area found at 0x7f3e51800000 (size =
>0x400000)
>ovs-ctl[13868]: EAL: Ask a virtual area of 0x200000 bytes
>ovs-ctl[13868]: EAL: Virtual area found at 0x7f3e51400000 (size =
>0x200000)
>ovs-ctl[13868]: EAL: Ask a virtual area of 0xc00000 bytes
>ovs-ctl[13868]: EAL: Virtual area found at 0x7f3e50600000 (size =
>0xc00000)
>ovs-ctl[13868]: EAL: Ask a virtual area of 0x1000000 bytes
>ovs-ctl[13868]: EAL: Virtual area found at 0x7f3e4f400000 (size =
>0x1000000)
>ovs-ctl[13868]: EAL: Ask a virtual area of 0x200000 bytes
>ovs-ctl[13868]: EAL: Virtual area found at 0x7f3e4f000000 (size =
>0x200000)
>ovs-ctl[13868]: EAL: Ask a virtual area of 0x200000 bytes
>ovs-ctl[13868]: EAL: Virtual area found at 0x7f3e4ec00000 (size =
>0x200000)
>ovs-ctl[13868]: EAL: Ask a virtual area of 0x200000 bytes
>ovs-ctl[13868]: EAL: Virtual area found at 0x7f3e4e800000 (size =
>0x200000)
>ovs-ctl[13868]: EAL: Ask a virtual area of 0x200000 bytes
>ovs-ctl[13868]: EAL: Virtual area found at 0x7f3e4e400000 (size =
>0x200000)
>ovs-ctl[13868]: EAL: Ask a virtual area of 0x400000 bytes
>ovs-ctl[13868]: EAL: Virtual area found at 0x7f3e4de00000 (size =
>0x400000)
>ovs-ctl[13868]: EAL: Ask a virtual area of 0x200000 bytes
>ovs-ctl[13868]: EAL: Virtual area found at 0x7f3e4da00000 (size =
>0x200000)
>ovs-ctl[13868]: EAL: Ask a virtual area of 0x5ac00000 bytes
>ovs-ctl[13868]: EAL: Virtual area found at 0x7f3df2c00000 (size =
>0x5ac00000)
>ovs-ctl[13868]: EAL: Ask a virtual area of 0xc00000 bytes
>ovs-ctl[13868]: EAL: Virtual area found at 0x7f3df1e00000 (size =
>0xc00000)
>ovs-ctl[13868]: EAL: Ask a virtual area of 0x400000 bytes
>ovs-ctl[13868]: EAL: Virtual area found at 0x7f3df1800000 (size =
>0x400000)
>ovs-ctl[13868]: EAL: Ask a virtual area of 0x1000000 bytes
>ovs-ctl[13868]: EAL: Virtual area found at 0x7f3df0600000 (size =
>0x1000000)
>ovs-ctl[13868]: EAL: Ask a virtual area of 0x200000 bytes
>ovs-ctl[13868]: EAL: Virtual area found at 0x7f3df0200000 (size =
>0x200000)
>ovs-ctl[13868]: EAL: Ask a virtual area of 0x400000 bytes
>ovs-ctl[13868]: EAL: Virtual area found at 0x7f3defc00000 (size =
>0x400000)
>ovs-ctl[13868]: EAL: Ask a virtual area of 0x200000 bytes
>ovs-ctl[13868]: EAL: Virtual area found at 0x7f3def800000 (size =
>0x200000)
>ovs-ctl[13868]: EAL: Ask a virtual area of 0x400000 bytes
>ovs-ctl[13868]: EAL: Virtual area found at 0x7f3def200000 (size =
>0x400000)
>ovs-ctl[13868]: EAL: Ask a virtual area of 0x600000 bytes
>ovs-ctl[13868]: EAL: Virtual area found at 0x7f3deea00000 (size =
>0x600000)
>ovs-ctl[13868]: EAL: Ask a virtual area of 0x1800000 bytes
>ovs-ctl[13868]: EAL: Virtual area found at 0x7f3ded000000 (size =
>0x1800000)
>ovs-ctl[13868]: EAL: Ask a virtual area of 0x600000 bytes
>ovs-ctl[13868]: EAL: Virtual area found at 0x7f3dec800000 (size =
>0x600000)
>ovs-ctl[13868]: EAL: Ask a virtual area of 0x200000 bytes
>ovs-ctl[13868]: EAL: Virtual area found at 0x7f3dec400000 (size =
>0x200000)
>ovs-ctl[13868]: EAL: Requesting 1024 pages of size 2MB from socket 0
>ovs-ctl[13868]: EAL: TSC frequency is ~2397235 KHz
>ovs-vswitchd[13900]: EAL: TSC frequency is ~2397235 KHz
>ovs-ctl[13868]: EAL: WARNING: cpu flags constant_tsc=yes nonstop_tsc=no
>-> using unreliable clock cycles !
>ovs-ctl[13868]: EAL: Master lcore 0 is ready (tid=b2f7ab00;cpuset=[0])
>ovs-ctl[13868]: EAL: PCI device 0000:00:03.0 on NUMA socket -1
>ovs-ctl[13868]: EAL:   probe driver: 1af4:1000 rte_virtio_pmd
>ovs-ctl[13868]: EAL:   Device is blacklisted, not initializing
>ovs-ctl[13868]: EAL: PCI device 0000:00:04.0 on NUMA socket -1
>ovs-ctl[13868]: EAL:   probe driver: 1af4:1000 rte_virtio_pmd
>ovs-vswitchd[13900]: EAL: WARNING: cpu flags constant_tsc=yes
>nonstop_tsc=no -> using unreliable clock cycles !
>ovs-ctl[13868]: EAL: PCI device 0000:00:05.0 on NUMA socket -1
>ovs-ctl[13868]: EAL:   probe driver: 1af4:1000 rte_virtio_pmd
>ovs-vswitchd[13900]: EAL: Master lcore 0 is ready
>(tid=b2f7ab00;cpuset=[0])
>ovs-vswitchd[13900]: EAL: PCI device 0000:00:03.0 on NUMA socket -1
>ovs-vswitchd[13900]: EAL:   probe driver: 1af4:1000 rte_virtio_pmd
>ovs-vswitchd[13900]: EAL:   Device is blacklisted, not initializing
>ovs-vswitchd[13900]: EAL: PCI device 0000:00:04.0 on NUMA socket -1
>ovs-vswitchd[13900]: EAL:   probe driver: 1af4:1000 rte_virtio_pmd
>ovs-vswitchd[13900]: EAL: PCI device 0000:00:05.0 on NUMA socket -1
>ovs-vswitchd[13900]: EAL:   probe driver: 1af4:1000 rte_virtio_pmd
>ovs-ctl[13868]: Zone 0: name:<RG_MP_log_history>, phys:0x1495fdec0,
>len:0x2080, virt:0x7f3e4dbfdec0, socket_id:0, flags:0
>ovs-ctl[13868]: Zone 1: name:<MP_log_history>, phys:0xb7d75f00,
>len:0x28a0c0, virt:0x7f3e4df75f00, socket_id:0, flags:0
>ovs-ctl[13868]: Zone 2: name:<rte_eth_dev_data>, phys:0xb7d467c0,
>len:0x2f700, virt:0x7f3e4df467c0, socket_id:0, flags:0
>ovs-ctl[13868]: Zone 3: name:<port0_cvq>, phys:0xb7d43000, len:0x2000,
>virt:0x7f3e4df43000, socket_id:0, flags:0
>ovs-ctl[13868]: Zone 4: name:<port0_cvq_hdrzone>, phys:0xb7d41fc0,
>len:0x1000, virt:0x7f3e4df41fc0, socket_id:0, flags:0
>ovs-ctl[13868]: Zone 5: name:<port1_cvq>, phys:0xb7d3f000, len:0x2000,
>virt:0x7f3e4df3f000, socket_id:0, flags:0
>ovs-ctl[13868]: Zone 6: name:<port1_cvq_hdrzone>, phys:0xb7d3dfc0,
>len:0x1000, virt:0x7f3e4df3dfc0, socket_id:0, flags:0
>ovs-ctl[13868]:  * Starting ovs-vswitchd
>ovs-ctl[13868]:  * Enabling remote OVSDB managers
>systemd[1]: Started Open vSwitch Internal Unit.
>
>
>
>P.S. adding a few people I usually see replying on these topics directly.
>Thanks in advance for sharing your thoughts!
>
>Christian Ehrhardt
>Software Engineer, Ubuntu Server
>Canonical Ltd
>
>
>
>
>
>
