DPDK patches and discussions
From: "Traynor, Kevin" <kevin.traynor@intel.com>
To: Pravin Shelar <pshelar@nicira.com>,
	Oleg Strikov <oleg.strikov@canonical.com>
Cc: "dev@dpdk.org" <dev@dpdk.org>,
	Daniele Di Proietto <diproiettod@vmware.com>
Subject: Re: [dpdk-dev] Issues met while running openvswitch/dpdk/virtio inside the VM
Date: Mon, 11 May 2015 12:10:14 +0000
Message-ID: <BC0FEEC7D7650749874CEC11314A88F73074D10F@IRSMSX104.ger.corp.intel.com>
In-Reply-To: <CALnjE+q0Nb886obPvq7odBxYohf=pN5pjhzhpkyf3ZciEZbH-Q@mail.gmail.com>


> -----Original Message-----
> From: dev [mailto:dev-bounces@dpdk.org] On Behalf Of Pravin Shelar
> Sent: Friday, May 8, 2015 2:20 AM
> To: Oleg Strikov
> Cc: dev@dpdk.org
> Subject: Re: [dpdk-dev] Issues met while running openvswitch/dpdk/virtio
> inside the VM
> 
> On Thu, May 7, 2015 at 9:22 AM, Oleg Strikov <oleg.strikov@canonical.com>
> wrote:
> > Hi DPDK users and developers,
> >
> > A few weeks ago I came up with the idea of running openvswitch with a
> > dpdk backend inside a qemu-kvm virtual machine. I don't have enough
> > supported NICs yet, and my plan was to start experimenting inside the
> > virtualized environment, get all the components to a functional state,
> > and then switch to real hardware. An additional useful side-effect of
> > doing things inside the vm is that issues can easily be reproduced by
> > someone else in a different environment.
> >
> > I (fondly) hoped that running openvswitch/dpdk inside the vm would be
> > simpler than running the same set of components on real hardware.
> > Unfortunately I ran into a number of issues along the way. All these
> > issues lie on the borderline between dpdk and openvswitch, but I think
> > you might be interested in my story. Please note that I still don't
> > have openvswitch/dpdk working inside the vm, although I have
> > definitely made some progress.
> >
> Thanks for summarizing all the issues.
> DPDK testing is done on real hardware, and we are planning to test it
> in a VM as well. This will certainly help in fixing issues sooner.
> 
> > Q: Does it sound okay from a functional (not performance) standpoint
> > to run openvswitch/dpdk inside the vm? Do we want to be able to do
> > this? Does anyone from the dpdk development team do this?
> >
> > ## Issue 1 ##
> >
> > Openvswitch requires the backend pmd driver to provide N_CORES tx
> > queues, where N_CORES is the number of cores available on the machine
> > (openvswitch counts the number of cpu* entries inside the
> > /sys/devices/system/node/node0/ folder). To my understanding it
> > doesn't take into account the actual number of cores used by dpdk and
> > just allocates a tx queue for each available core. You may refer to
> > this chunk of code for details:
> > https://github.com/openvswitch/ovs/blob/master/lib/dpif-netdev.c#L1067
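> >
> > Roughly, this amounts to asking the PMD for one tx queue per host
> > core, i.e. something like the following sketch of the DPDK calls
> > involved (illustrative only, not the actual OVS code; n_cores and
> > port_conf are placeholder names; rx queue setup and
> > rte_eth_dev_start() are omitted):
> >
> >     #include <rte_ethdev.h>
> >
> >     static int
> >     setup_txqs_per_core(uint8_t port_id, uint16_t n_cores,
> >                         const struct rte_eth_conf *port_conf)
> >     {
> >         /* One rx queue, and one tx queue per core; a PMD that only
> >          * supports a single tx queue cannot satisfy this. */
> >         int ret = rte_eth_dev_configure(port_id, 1, n_cores, port_conf);
> >         if (ret < 0)
> >             return ret;
> >
> >         for (uint16_t q = 0; q < n_cores; q++) {
> >             ret = rte_eth_tx_queue_setup(port_id, q, 512,
> >                                          rte_eth_dev_socket_id(port_id),
> >                                          NULL /* default txconf */);
> >             if (ret < 0)
> >                 return ret;
> >         }
> >         return 0;
> >     }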
> >
> In the case of OVS DPDK, there is no dpdk thread, so all polling cores
> are managed by OVS and there is no need to account for the cores used
> by DPDK. You can assign specific cores to OVS to limit the number of
> cores it uses.
> 
> > This approach works fine on real hardware but causes some issues when
> > we run openvswitch/dpdk inside a virtual machine. I tried both the
> > emulated e1000 NIC and the virtio NIC, and neither of them worked out
> > of the box. The emulated e1000 NIC doesn't support multiple tx queues
> > at all (see
> > http://dpdk.org/browse/dpdk/tree/lib/librte_pmd_e1000/em_ethdev.c#n884)
> > and the virtio NIC doesn't support multiple tx queues by default. To
> > enable multiple tx queues for the virtio NIC I had to add the
> > following line to the interface section of my libvirt config:
> > '<driver name="vhost" queues="4"/>'
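> >
> > A quick way to see what a given PMD offers before configuring it
> > (again just an illustrative snippet, not code from either project) is
> > to query the device capabilities:
> >
> >     struct rte_eth_dev_info dev_info;
> >
> >     rte_eth_dev_info_get(port_id, &dev_info);
> >     /* e1000/em offers a single tx queue; for virtio, multiple queues
> >      * need <driver name="vhost" queues="N"/> in the libvirt
> >      * interface config. */
> >     if (dev_info.max_tx_queues < n_cores)
> >         printf("port %u: only %u tx queues supported, %u wanted\n",
> >                (unsigned)port_id, (unsigned)dev_info.max_tx_queues,
> >                (unsigned)n_cores);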
> >
> Good point. We should document this. Can you send a patch to update
> README.DPDK?

Daniele's patch http://openvswitch.org/pipermail/dev/2015-March/052344.html
also allows for having a limited set of queues available. The documentation
patch is a good idea too.

> 
> > ## Issue 2 ##
> >
> > Openvswitch calls rte_eth_tx_queue_setup() twice for the same
> > port_id/queue_id. The first call takes place during device
> > initialization (see the call to dpdk_eth_dev_init() inside
> > netdev_dpdk_init():
> > https://github.com/openvswitch/ovs/blob/master/lib/netdev-dpdk.c#L522).
> > The second call takes place when openvswitch tries to add more tx
> > queues to the device (see the call to dpdk_eth_dev_init() inside
> > netdev_dpdk_set_multiq():
> > https://github.com/openvswitch/ovs/blob/master/lib/netdev-dpdk.c#L697).
> > The second call not only initializes new queues but also tries to
> > re-initialize existing ones.
> >
> > Unfortunately the virtio driver can't handle the second call to
> > rte_eth_tx_queue_setup() and returns an error here:
> > http://dpdk.org/browse/dpdk/tree/lib/librte_pmd_virtio/virtio_ethdev.c#n316
> > This happens because a memzone with the name portN_tvqN already exists
> > when the second call takes place (the memzone was created during the
> > first call). To deal with this issue I had to manually add an
> > rte_memzone_lookup-based check for this situation and avoid allocating
> > a new memzone if it already exists.
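> >
> > In essence the workaround is the following lookup-or-reserve pattern
> > in the virtio queue setup path (a sketch of the idea rather than the
> > exact patch; vq_name, size, socket_id and align stand for the values
> > that function already computes):
> >
> >     const struct rte_memzone *mz;
> >
> >     /* Reuse the memzone created by the first rte_eth_tx_queue_setup()
> >      * call instead of failing when it already exists. */
> >     mz = rte_memzone_lookup(vq_name);
> >     if (mz == NULL)
> >         mz = rte_memzone_reserve_aligned(vq_name, size, socket_id,
> >                                          0, align);
> >     if (mz == NULL)
> >         return -ENOMEM;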
> >
> This sounds like an issue with the virtio driver. I think we need to
> fix DPDK upstream for this to work correctly.
> 
> > Q: Is it okay that openvswitch calls rte_eth_tx_queue_setup() twice?
> > Right now I can't tell whether this is an issue with the virtio pmd
> > driver or incorrect API usage by openvswitch. Could someone shed some
> > light on this so I can move forward and maybe propose a fix?
> >
> > ## Issue 3 ##
> >
> > This issue is also (somehow) related to the fact that openvswitch
> > calls rte_eth_tx_queue_setup() twice. I fixed the previous issue using
> > the method described above and initialization finishes. The whole
> > machinery starts to work but crashes at the very beginning (maybe
> > while fetching the first packet from the NIC). The crash happens here:
> > http://dpdk.org/browse/dpdk/tree/lib/librte_pmd_virtio/virtio_rxtx.c#n588
> > It takes place because the vq_ring structure contains zeros instead of
> > the correct values:
> > vq_ring = {num = 0, desc = 0x0, avail = 0x0, used = 0x0}
> > My understanding is that vq_ring gets initialized after the first call
> > to rte_eth_tx_queue_setup(), then overwritten by the second call to
> > rte_eth_tx_queue_setup(), but without being properly initialized the
> > second time. I'm trying to fix this issue right now.
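> >
> > One direction I'm experimenting with (purely a sketch, not a tested
> > fix) is to make the second setup call leave an already-initialized
> > queue alone, so the state built up by the first call survives:
> >
> >     /* In the PMD's tx_queue_setup path: if this queue index was
> >      * already set up, keep the existing virtqueue instead of
> >      * rebuilding it from scratch. */
> >     if (dev->data->tx_queues[queue_idx] != NULL) {
> >         PMD_INIT_LOG(DEBUG, "tx queue %u already set up, reusing it",
> >                      queue_idx);
> >         return 0;
> >     }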
> >
> This also sounds like DPDK issue.
> 
> > Q: Does it sound like a realistic goal to make the virtio driver work
> > in openvswitch-like scenarios? I'm definitely not an expert in the
> > dpdk area and can't estimate the time and resources required. Maybe
> > it's better to wait until I get proper hardware?
> >
> It would be nice to make OVS-DPDK work in a VM. As I said, I am also
> planning to work on it. Thanks for the heads up.
