DPDK patches and discussions
* Re: [dpdk-dev] Status of Open vSwitch with DPDK
       [not found] <D1F17A83.5ED1%diproiettod@vmware.com>
@ 2015-08-14 16:04 ` Gray, Mark D
  2015-08-14 21:24   ` [dpdk-dev] [ovs-dev] " Thomas F Herbert
  2015-08-15  7:16   ` Flavio Leitner
  0 siblings, 2 replies; 4+ messages in thread
From: Gray, Mark D @ 2015-08-14 16:04 UTC (permalink / raw)
  To: Daniele Di Proietto, dev; +Cc: dev

Hi Daniele,

Thanks for starting this conversation. It is a good list :) I have cross-posted this
to dpdk.org as I feel that some of the points could be interesting to that community
as they are related to how DPDK is used.

How do "users" of OVS with DPDK feel about this list? Does anyone disagree or
does anyone have any additions? What are your experiences?

> 
> There has been some discussion lately about the status of the Open vSwitch
> port to DPDK.  While part of the code has been tested for quite some time,
> I think we can agree that there are a few rough spots that prevent it from
> being easily deployed and used.
> 
> I was hoping to get some feedback from the community about those rough
> spots, i.e. areas where OVS+DPDK can/needs to improve to become more
> "production ready" and user-friendly.
> 
> - PMD threads and queues management: the code has shown several bugs and
>   the netdev interfaces don't seem up to the job anymore.

You had a few ideas about how to refactor this before but I was concerned 
about the effect it would have on throughput. I can't find the thread. 

Do you have some further ideas about how to achieve this?

> 
>   There's a lot of margin of improvement: we could factor out the code from
>   dpif-netdev, add configuration parameters for advanced users, and figure
>   out a way to add unit tests.
> 

I think this is a general issue with both the kernel datapath (and netdevs)
and the userspace datapath. There isn't much unit testing (or testing) outside
of the slow path. 

>   Related to this, the system should be as fast as possible out-of-the-box,
>   without requiring too much tuning.

This is a good point. I think the kernel datapath has a similar issue. You can
get a certain level of performance without compiling with -Ofast or
pinning threads but you will (even with the kernel datapath) get better
performance if you pin threads (and possibly compile differently). I guess
it is more visible with the dpdk datapath as performance is one of the key
values. It is also more detrimental to the performance if you don't set it
up correctly.

Perhaps we could provide scripts to help do this?

I think this is also interesting to the DPDK community. There is 
knowledge required when running DPDK enabled apps to
get good performance: core pinning is one thing that comes to mind.
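
For example, a typical manual tuning step today looks something like this
(the mask is illustrative, and assumes an OVS build with the pmd-cpu-mask
knob):

    # Pin the PMD threads to cores 1-2, away from core 0.
    ovs-vsctl set Open_vSwitch . other_config:pmd-cpu-mask=0x6

A script could derive values like this from the NUMA topology instead of
asking the user to work them out.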

> 
> - Userspace tunneling: while the code has been there for quite some time it
>   hasn't received the level of testing that the Linux kernel datapath
>   tunneling has.
> 

Again, there is a lack of test infrastructure in general for OVS. vsperf is a good
start, and it would be great to see more people use and contribute to it!

> - Documentation: other than a step by step tutorial, it cannot be said that
>   DPDK is a first class citizen in the OVS documentation.  Manpages could be
>   improved.

Easily done. The INSTALL guide is pretty good but the structure could be better. 
There is also a lack of manpages. Good point.

> 
> - Vhost: the code has not received the level of testing of the kernel vhost.
>   Another doubt shared by some developers is whether we should keep
>   vhost-cuse, given its relatively low ease of use and the overlapping with
>   the far more standard vhost-user.

vhost-cuse is required for older versions of qemu. I'm aware of some companies
using it as they are restricted to an older version of qemu. I think it is deprecated
at the moment? Is there a notice to that effect? We just need a plan for when to
remove it and make sure that plan is clear?

> 
> - Interface management and naming: interfaces must be manually removed from
>   the kernel drivers.
> 
>   We still don't have an easy way to identify them. Ideas are welcome: how
>   can we make this user friendly?  Is there a better solution on the DPDK side?

This is a tough one and is interesting to the DPDK community.  The basic issue
here is that users are more familiar with linux interfaces and linux naming
conventions.

"ovs-vsctl add-port bro eth0" makes a lot more sense than

"dpdk_nic_bind -b igb_uio <pci_id>", then check the order that the ports
are enumerated and then run "ovs-vsctl add-port br0 dpdkN".
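
For reference, the full flow today is something like this (paths and the
PCI ID are illustrative):

    modprobe uio
    insmod $DPDK_BUILD/kmod/igb_uio.ko
    $DPDK_DIR/tools/dpdk_nic_bind.py --status           # find the device
    $DPDK_DIR/tools/dpdk_nic_bind.py -b igb_uio 0000:01:00.0
    ovs-vsctl add-port br0 dpdk0 -- set Interface dpdk0 type=dpdk

which is a lot of steps compared to the kernel datapath.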

I can think of ways to do this with physical NICs. For example,
you could reference the port by the linux name and when you try to add it, OVS
could unbind from the kernel module and bind it to igb_uio?

However, I am not sure how you would do it with virtual nics as there is not
even a real device.

I think a general solution from the dpdk community would be really helpful here.
> 
>   How are DPDK interfaces handled by linux distributions? I've heard about
>   ongoing work for RHEL and Ubuntu, it would be interesting to coordinate.
> 
> 
> - Insight into the system and debuggability: nothing beats tcpdump for the
>   kernel datapath.  Can something similar be done for the userspace datapath?

Yeah, this would be useful. I have my own way of dealing with this. For example,
you could dump from the LOCAL port on a NORMAL bridge or add a rule to 
mirror a flow to another port but I feel there could be a better way to do this in
DPDK. I have recently heard that the DPDK team do something with a pcap pmd
to help with debugging. A more general approach from dpdk would help a lot.
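
For example, mirroring all traffic on dpdk0 out through dpdk1 for capture
can be done today with something like this (port names illustrative):

    ovs-vsctl -- --id=@p get Port dpdk0 \
              -- --id=@out get Port dpdk1 \
              -- --id=@m create Mirror name=dbg select-src-port=@p \
                 select-dst-port=@p output-port=@out \
              -- set Bridge br0 mirrors=@m

but that still needs a second port to capture on, which is the gap a
pcap-based approach could fill.
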
> 
> - Consistency of the tools: some commands are slightly different for the
>   userspace/kernel datapath.  Ideally there shouldn't be any difference.

Yeah, there are some things that could be changed. DPDK just works differently but
the benefits are significant :)

We need to mount hugepages, bind nics to igb_uio, etc
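
i.e. today that means something along the lines of (sizes illustrative):

    # Reserve 2MB hugepages and mount hugetlbfs.
    echo 1024 > /sys/kernel/mm/hugepages/hugepages-2048kB/nr_hugepages
    mount -t hugetlbfs none /dev/hugepages

plus the driver binding shown earlier, before ovs-vswitchd can even start
with DPDK enabled.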

With a lot of this stuff, maybe the DPDK community's tools don't need to emulate
the linux networking tools exactly. Maybe over time as the DPDK community
and user-base expands, people will become more familiar with the tools, processes, etc
and this will be less of an issue?


> 
> - Packaging: how should the distributions package DPDK and OVS? Should there
>   only be a single build to handle both the kernel and the userspace datapath,
>   eventually dynamically linked to DPDK?

Yeah. Does OVS need to initialize DPDK at startup if it was compiled with DPDK support?
> 
> - Benchmarks: we often rely on extremely simple flow tables with single flow
>   traffic to evaluate the effect of a change.  That may be ok during
>   development, but OVS with the kernel datapath has been tested in different
>   scenarios with more complicated flow tables and even with hostile traffic
>   patterns.
> 
>   Efforts in this sense are being made, like the vsperf project, or even the
>   simple ovs-pipeline.py

vsperf will really help this.

> 
> I would appreciate feedback on the above points, not (only) in terms of
> solutions, but in terms of requirements that you feel are important for our
> system to be considered ready.
> 
> Cheers,
> 
> Daniele
> 



* Re: [dpdk-dev] [ovs-dev] Status of Open vSwitch with DPDK
  2015-08-14 16:04 ` [dpdk-dev] Status of Open vSwitch with DPDK Gray, Mark D
@ 2015-08-14 21:24   ` Thomas F Herbert
  2015-08-15  7:16   ` Flavio Leitner
  1 sibling, 0 replies; 4+ messages in thread
From: Thomas F Herbert @ 2015-08-14 21:24 UTC (permalink / raw)
  To: Gray, Mark D, Daniele Di Proietto, dev; +Cc: dev

On 8/14/15 12:04 PM, Gray, Mark D wrote:
> Hi Daniele,
>
> Thanks for starting this conversation. It is a good list :) I have cross-posted this
> to dpdk.org as I feel that some of the points could be interesting to that community
> as they are related to how DPDK is used.
>
> How do "users" of OVS with DPDK feel about this list? Does anyone disagree or
> does anyone have any additions? What are your experiences?
Daniele,

Although I think Mark posted this information to @openvswitch before, I 
want to mention again the new project in OPNFV, Open vSwitch for NFV 
(tagged ovsnfv), whose purpose is to deploy Open vSwitch with software 
datapath acceleration in OPNFV. The goal is to test ovs-dpdk, or other 
contributed accelerated datapaths, in more complex, user-focused 
scenarios such as SFC and OPNFV vsperf.
>
>>
>> There has been some discussion lately about the status of the Open vSwitch
>> port to DPDK.  While part of the code has been tested for quite some time,
>> I think we can agree that there are a few rough spots that prevent it from
>> being easily deployed and used.
>>
>> I was hoping to get some feedback from the community about those rough
>> spots, i.e. areas where OVS+DPDK can/needs to improve to become more
>> "production ready" and user-friendly.
>>
>> - PMD threads and queues management: the code has shown several bugs and
>>    the netdev interfaces don't seem up to the job anymore.
>
> You had a few ideas about how to refactor this before but I was concerned
> about the effect it would have on throughput. I can't find the thread.
>
> Do you have some further ideas about how to achieve this?
>
>>
>>    There's a lot of margin of improvement: we could factor out the code from
>>    dpif-netdev, add configuration parameters for advanced users, and figure
>>    out a way to add unit tests.
>>
>
> I think this is a general issue with both the kernel datapath (and netdevs)
> and the userspace datapath. There isn't much unit testing (or testing) outside
> of the slow path.
Well, yes, of course, but there is quite a bit of tradecraft accumulated 
over many years about how to debug and test a kernel-based protocol that 
just doesn't exist yet for DPDK.
>
>>    Related to this, the system should be as fast as possible out-of-the-box,
>>    without requiring too much tuning.
I know there have been some off-line discussions about the possibility 
of creating some canned tuning profiles, including a default profile, to 
improve the "out of the box" experience of DPDK, so new deployers of 
DPDK/OVS could experience some of its benefits without needing 
to deep dive into the mysteries of tuning DPDK.
>
> This is a good point. I think the kernel datapath has a similar issue. You can
> get a certain level of performance without compiling with -Ofast or
> pinning threads but you will (even with the kernel datapath) get better
> performance if you pin threads (and possibly compile differently). I guess
> it is more visible with the dpdk datapath as performance is one of the key
> values. It is also more detrimental to the performance if you don't set it
> up correctly.
>
> Perhaps we could provide scripts to help do this?
>
> I think this is also interesting to the DPDK community. There is
> knowledge required when running DPDK enabled apps to
> get good performance: core pinning is one thing that comes to mind.
>
>>
>> - Userspace tunneling: while the code has been there for quite some time it
>>    hasn't received the level of testing that the Linux kernel datapath
>>    tunneling has.
>>
>
> Again, there is a lack of test infrastructure in general for OVS. vsperf is a good
> start, and it would be great to see more people use and contribute to it!
>
>> - Documentation: other than a step by step tutorial, it cannot be said that
>>    DPDK is a first class citizen in the OVS documentation.  Manpages could be
>>    improved.
>
> Easily done. The INSTALL guide is pretty good but the structure could be better.
> There is also a lack of manpages. Good point.
>
>>
>> - Vhost: the code has not received the level of testing of the kernel vhost.
>>    Another doubt shared by some developers is whether we should keep
>>    vhost-cuse, given its relatively low ease of use and the overlapping with
>>    the far more standard vhost-user.
>
> vhost-cuse is required for older versions of qemu. I'm aware of some companies
> using it as they are restricted to an older version of qemu. I think it is deprecated
> at the moment? Is there a notice to that effect? We just need a plan for when to
> remove it and make sure that plan is clear?
+1
>
>>
>> - Interface management and naming: interfaces must be manually removed from
>>    the kernel drivers.
>>
>>    We still don't have an easy way to identify them. Ideas are welcome: how
>>    can we make this user friendly?  Is there a better solution on the DPDK side?
>
> This is a tough one and is interesting to the DPDK community.  The basic issue
> here is that users are more familiar with linux interfaces and linux naming
> conventions.
>
> "ovs-vsctl add-port bro eth0" makes a lot more sense than
>
> "dpdk_nic_bind -b igb_uio <pci_id>", then check the order that the ports
> are enumerated and then run "ovs-vsctl add-port br0 dpdkN".
>
> I can think of ways to do this with physical NICs. For example,
> you could reference the port by the linux name and when you try to add it, OVS
> could unbind from the kernel module and bind it to igb_uio?
>
> However, I am not sure how you would do it with virtual nics as there is not
> even a real device.
>
> I think a general solution from the dpdk community would be really helpful here.
>>
>>    How are DPDK interfaces handled by linux distributions? I've heard about
>>    ongoing work for RHEL and Ubuntu, it would be interesting to coordinate.
>>
>>
>> - Insight into the system and debuggability: nothing beats tcpdump for the
>>    kernel datapath.  Can something similar be done for the userspace datapath?
>
> Yeah, this would be useful. I have my own way of dealing with this. For example,
> you could dump from the LOCAL port on a NORMAL bridge or add a rule to
> mirror a flow to another port but I feel there could be a better way to do this in
> DPDK.
+1
> I have recently heard that the DPDK team do something with a pcap pmd
> to help with debugging. A more general approach from dpdk would help a lot.
I agree that a libpcap interface would be really useful, maybe one where 
a core with a hugepage could be allocated for buffering.
>>
>> - Consistency of the tools: some commands are slightly different for the
>>    userspace/kernel datapath.  Ideally there shouldn't be any difference.
>
> Yeah, there are some things that could be changed. DPDK just works differently but
> the benefits are significant :)
>
> We need to mount hugepages, bind nics to igb_uio, etc
>
> With a lot of this stuff, maybe the DPDK community's tools don't need to emulate
> the linux networking tools exactly. Maybe over time as the DPDK community
> and user-base expands, people will become more familiar with the tools, processes, etc
> and this will be less of an issue?
>
>
>>
>> - Packaging: how should the distributions package DPDK and OVS? Should there
>>    only be a single build to handle both the kernel and the userspace datapath,
>>    eventually dynamically linked to DPDK?
>
> Yeah. Does OVS need to initialize DPDK at startup if it was compiled with DPDK support?
>>
>> - Benchmarks: we often rely on extremely simple flow tables with single flow
>>    traffic to evaluate the effect of a change.  That may be ok during
>>    development, but OVS with the kernel datapath has been tested in different
>>    scenarios with more complicated flow tables and even with hostile traffic
>>    patterns.
>>
>>    Efforts in this sense are being made, like the vsperf project, or even the
>>    simple ovs-pipeline.py
>
> vsperf will really help this.
>
>>
>> I would appreciate feedback on the above points, not (only) in terms of
>> solutions, but in terms of requirements that you feel are important for our
>> system to be considered ready.
Thanks for making these good points and starting this thread.

--TFH
>>
>> Cheers,
>>
>> Daniele
>>
>


* Re: [dpdk-dev] [ovs-dev] Status of Open vSwitch with DPDK
  2015-08-14 16:04 ` [dpdk-dev] Status of Open vSwitch with DPDK Gray, Mark D
  2015-08-14 21:24   ` [dpdk-dev] [ovs-dev] " Thomas F Herbert
@ 2015-08-15  7:16   ` Flavio Leitner
  2015-08-17 14:53     ` Mark D. Gray
  1 sibling, 1 reply; 4+ messages in thread
From: Flavio Leitner @ 2015-08-15  7:16 UTC (permalink / raw)
  To: Gray, Mark D; +Cc: dev, dev

On Fri, Aug 14, 2015 at 04:04:40PM +0000, Gray, Mark D wrote:
> Hi Daniele,
> 
> Thanks for starting this conversation. It is a good list :) I have cross-posted this
> to dpdk.org as I feel that some of the points could be interesting to that community
> as they are related to how DPDK is used.
> 
> How do "users" of OVS with DPDK feel about this list? Does anyone disagree or
> does anyone have any additions? What are your experiences?
> 
> > 
> > There has been some discussion lately about the status of the Open vSwitch
> > port to DPDK.  While part of the code has been tested for quite some time,
> > I think we can agree that there are a few rough spots that prevent it from
> > being easily deployed and used.
> > 
> > I was hoping to get some feedback from the community about those rough
> > spots, i.e. areas where OVS+DPDK can/needs to improve to become more
> > "production ready" and user-friendly.
> > 
> > - PMD threads and queues management: the code has shown several bugs and
> >   the netdev interfaces don't seem up to the job anymore.
> 
> You had a few ideas about how to refactor this before but I was concerned 
> about the effect it would have on throughput. I can't find the thread. 
> 
> Do you have some further ideas about how to achieve this?

Two things I miss: we can't tell which queue should go to which PMD,
and all devices must have the same number of rx queues. I agree
that there are other issues, but it seems the kind of configuration
knobs I am looking for might not be the end goal, since what has been
said is to look for a more automated way.  Having said so, I would
also like to hear if you have further ideas about how to achieve that.


> >   There's a lot of margin of improvement: we could factor out the code from
> >   dpif-netdev, add configuration parameters for advanced users, and figure
> >   out a way to add unit tests.
> > 
> 
> I think this is a general issue with both the kernel datapath (and netdevs)
> and the userspace datapath. There isn't much unit testing (or testing) outside
> of the slow path. 

Maybe we could exercise the interfaces using pcap pmd.
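
Something like feeding a canned capture through a pcap vdev and diffing
the output, e.g. (EAL arguments illustrative, vdev naming as in DPDK 2.x):

    ovs-vswitchd --dpdk -c 0x1 -n 4 --socket-mem 1024 \
        --vdev 'eth_pcap0,rx_pcap=input.pcap,tx_pcap=output.pcap' \
        -- unix:$DB_SOCK --pidfile

Then output.pcap could be compared against an expected capture in a test
harness.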


> >   Related to this, the system should be as fast as possible out-of-the-box,
> >   without requiring too much tuning.
> 
> This is a good point. I think the kernel datapath has a similar issue. You can
> get a certain level of performance without compiling with -Ofast or
> pinning threads but you will (even with the kernel datapath) get better
> performance if you pin threads (and possibly compile differently). I guess
> it is more visible with the dpdk datapath as performance is one of the key
> values. It is also more detrimental to the performance if you don't set it
> up correctly.

Not only that, you need to consider how the resources will be
distributed upfront so that you don't run out of hugepages, perhaps
isolate PMD CPUs from the Linux scheduler, etc.  So, I think a more
realistic goal would be: the system should require minimal/no tuning
to run with acceptable performance.
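
For instance, a deployment typically has to settle things like this at
boot time (values illustrative):

    # Kernel command line: reserve 1G hugepages and keep the PMD
    # cores away from the Linux scheduler.
    default_hugepagesz=1G hugepagesz=1G hugepages=8 isolcpus=1-3

and only finds out at packet-forwarding time whether it got them right.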


> Perhaps we could provide scripts to help do this?

Or profiles (if that isn't included in your scripts definition)


> I think this is also interesting to the DPDK community. There is 
> knowledge required when running DPDK enabled apps to
> get good performance: core pinning is one thing that comes to mind.
> 
> > 
> > - Userspace tunneling: while the code has been there for quite some time it
> >   hasn't received the level of testing that the Linux kernel datapath
> >   tunneling has.
> > 
> 
> Again, there is a lack of test infrastructure in general for OVS. vsperf is a good
> start, and it would be great to see more people use and contribute to it!

Yes.


> > - Documentation: other than a step by step tutorial, it cannot be said that
> >   DPDK is a first class citizen in the OVS documentation.  Manpages could be
> >   improved.
> 
> Easily done. The INSTALL guide is pretty good but the structure could be better. 
> There is also a lack of manpages. Good point.

Yup.


> > - Vhost: the code has not received the level of testing of the kernel vhost.
> >   Another doubt shared by some developers is whether we should keep
> >   vhost-cuse, given its relatively low ease of use and the overlapping with
> >   the far more standard vhost-user.
> 
> vhost-cuse is required for older versions of qemu. I'm aware of some companies
> using it as they are restricted to an older version of qemu. I think it is deprecated
> at the moment? Is there a notice to that effect? We just need a plan for when to
> remove it and make sure that plan is clear?

Apparently having two solutions to address the same issue causes more
harm than good, so removing vhost-cuse would be helpful.  I agree that
we need a clear plan with a soak time so users can either upgrade to
vhost-user or tell us why they can't.


> > - Interface management and naming: interfaces must be manually removed from
> >   the kernel drivers.
> > 
> >   We still don't have an easy way to identify them. Ideas are welcome: how
> >   can we make this user friendly?  Is there a better solution on the DPDK side?
> 
> This is a tough one and is interesting to the DPDK community.  The basic issue
> here is that users are more familiar with linux interfaces and linux naming
> conventions.
> 
> "ovs-vsctl add-port bro eth0" makes a lot more sense than
> 
> "dpdk_nic_bind -b igb_uio <pci_id>", then check the order that the ports
> are enumerated and then run "ovs-vsctl add-port br0 dpdkN".
> 
> I can think of ways to do this with physical NICs. For example,
> you could reference the port by the linux name and when you try to add it, OVS
> could unbind from the kernel module and bind it to igb_uio?
> 
> However, I am not sure how you would do it with virtual nics as there is not
> even a real device.
> 
> I think a general solution from the dpdk community would be really helpful here.


It doesn't look like Open vSwitch is the right place to fix this.
Open vSwitch should deal with the port, and the system should provide
the port somehow.  That's what happens with the kernel datapath: for
instance, Open vSwitch doesn't load any NIC driver.

So, it seems to be more related to udev/systemd configuration, in which
the sysadmin would specify the interfaces and the appropriate driver
(UIO/VFIO/Bifurcated...).

Even if the system delivers the DPDK port ready, it would be great to
have some friendly mapping so that users can refer to ports with known
names.


> >   How are DPDK interfaces handled by linux distributions? I've heard about
> >   ongoing work for RHEL and Ubuntu, it would be interesting to coordinate.

We have implemented dpdk/vhost support in initscripts so you can
configure the ports in the same way as kernel devices, but how to
properly bind to the driver is still unclear.
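
As a sketch of what we are aiming for (the exact keys are not settled,
so treat these as illustrative):

    # /etc/sysconfig/network-scripts/ifcfg-dpdk0
    DEVICE=dpdk0
    TYPE=OVSDPDKPort
    OVS_BRIDGE=br0
    ONBOOT=yes

with the open question being where the UIO/VFIO binding happens.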


> > - Insight into the system and debuggability: nothing beats tcpdump for the
> >   kernel datapath.  Can something similar be done for the userspace datapath?
> 
> Yeah, this would be useful. I have my own way of dealing with this. For example,
> you could dump from the LOCAL port on a NORMAL bridge or add a rule to 
> mirror a flow to another port but I feel there could be a better way to do this in
> DPDK. I have recently heard that the DPDK team do something with a pcap pmd
> to help with debugging. A more general approach from dpdk would help a lot.

One idea is that Open vSwitch could provide a mode to clone TX/RX
packets to a pcap PMD. Or write the packets in pcap format directly
to a file (avoiding another PMD, which might not be available). Or even
push them out through a tap device. Either way, tcpdump or wireshark would work.


> > - Consistency of the tools: some commands are slightly different for the
> >   userspace/kernel datapath.  Ideally there shouldn't be any difference.

Could you give some examples?


> Yeah, there are some things that could be changed. DPDK just works differently but
> the benefits are significant :)
> 
> We need to mount hugepages, bind nics to igb_uio, etc
> 
> With a lot of this stuff, maybe the DPDK community's tools don't need to emulate
> the linux networking tools exactly. Maybe over time as the DPDK community
> and user-base expands, people will become more familiar with the tools, processes, etc
> and this will be less of an issue?
> 
> 
> > 
> > - Packaging: how should the distributions package DPDK and OVS? Should there
> >   only be a single build to handle both the kernel and the userspace datapath,
> >   eventually dynamically linked to DPDK?
> 
> Yeah. Does OVS need to initialize DPDK at startup if it was compiled with DPDK support?

Well, certainly not everybody wants to have DPDK dependencies, whether
shared or static.  Maybe the path is a plug-in architecture?


> > - Benchmarks: we often rely on extremely simple flow tables with single flow
> >   traffic to evaluate the effect of a change.  That may be ok during
> >   development, but OVS with the kernel datapath has been tested in different
> >   scenarios with more complicated flow tables and even with hostile traffic
> >   patterns.
> > 
> >   Efforts in this sense are being made, like the vsperf project, or even the
> >   simple ovs-pipeline.py
> 
> vsperf will really help this.

Indeed, but how is the OVS kernel datapath being tested? Is there a
script?  Maybe we can use the same tests for DPDK.


> > I would appreciate feedback on the above points, not (only) in terms of
> > solutions, but in terms of requirements that you feel are important for our
> > system to be considered ready.

The list covers technical issues, documentation issues and usability
issues, which is great, thanks for doing it.  However, as said, one
important use-case is extreme performance, and that requires configuration
or tuning flexibility, which adds usability/supportability issues.  Will
those knobs be a valid option provided that the defaults work well enough?

Thanks,
fbl


* Re: [dpdk-dev] [ovs-dev] Status of Open vSwitch with DPDK
  2015-08-15  7:16   ` Flavio Leitner
@ 2015-08-17 14:53     ` Mark D. Gray
  0 siblings, 0 replies; 4+ messages in thread
From: Mark D. Gray @ 2015-08-17 14:53 UTC (permalink / raw)
  To: Daniele Di Proietto, dev, dev

On 08/15/15 08:16, Flavio Leitner wrote:
> On Fri, Aug 14, 2015 at 04:04:40PM +0000, Gray, Mark D wrote:
>> Hi Daniele,
>>
>> Thanks for starting this conversation. It is a good list :) I have cross-posted this
>> to dpdk.org as I feel that some of the points could be interesting to that community
>> as they are related to how DPDK is used.
>>
>> How do "users" of OVS with DPDK feel about this list? Does anyone disagree or
>> does anyone have any additions? What are your experiences?
>>
>>>
>>> There has been some discussion lately about the status of the Open vSwitch
>>> port to DPDK.  While part of the code has been tested for quite some time,
>>> I think we can agree that there are a few rough spots that prevent it from
>>> being easily deployed and used.
>>>
>>> I was hoping to get some feedback from the community about those rough
>>> spots, i.e. areas where OVS+DPDK can/needs to improve to become more
>>> "production ready" and user-friendly.
>>>
>>> - PMD threads and queues management: the code has shown several bugs and
>>>    the netdev interfaces don't seem up to the job anymore.
>>
>> You had a few ideas about how to refactor this before but I was concerned
>> about the effect it would have on throughput. I can't find the thread.
>>
>> Do you have some further ideas about how to achieve this?
>
> Two things I miss: we can't tell which queue should go to which PMD,
> and all devices must have the same number of rx queues. I agree
> that there are other issues, but it seems the kind of configuration
> knobs I am looking for might not be the end goal, since what has been
> said is to look for a more automated way.  Having said so, I would
> also like to hear if you have further ideas about how to achieve that.
>
>
>>>    There's a lot of margin of improvement: we could factor out the code from
>>>    dpif-netdev, add configuration parameters for advanced users, and figure
>>>    out a way to add unit tests.
>>>
>>
>> I think this is a general issue with both the kernel datapath (and netdevs)
>> and the userspace datapath. There isn't much unit testing (or testing) outside
>> of the slow path.
>
> Maybe we could exercise the interfaces using pcap pmd.
>
>

We had a similar idea. Using this, it would be possible to test the 
entire datapath or netdev for functionality! I don't think there is an 
equivalent for the kernel datapath?

>>>    Related to this, the system should be as fast as possible out-of-the-box,
>>>    without requiring too much tuning.
>>
>> This is a good point. I think the kernel datapath has a similar issue. You can
>> get a certain level of performance without compiling with -Ofast or
>> pinning threads but you will (even with the kernel datapath) get better
>> performance if you pin threads (and possibly compile differently). I guess
>> it is more visible with the dpdk datapath as performance is one of the key
>> values. It is also more detrimental to the performance if you don't set it
>> up correctly.
>
> Not only that, you need to consider how the resources will be
> distributed upfront so that you don't run out of hugepages, perhaps
> isolate PMD CPUs from the Linux scheduler, etc.  So, I think a more
> realistic goal would be: the system should require minimal/no tuning
> to run with acceptable performance.
>

How do you define "acceptable" performance :)?

>
>> Perhaps we could provide scripts to help do this?
>
> Or profiles (if that isn't included in your scripts definition)
>

Maybe we should define profiles like "performance", "minimum cores", etc

>
>> I think this is also interesting to the DPDK community. There is
>> knowledge required when running DPDK enabled apps to
>> get good performance: core pinning is one thing that comes to mind.
>>
>>>
>>> - Userspace tunneling: while the code has been there for quite some time it
>>>    hasn't received the level of testing that the Linux kernel datapath
>>>    tunneling has.
>>>
>>
>> Again, there is a lack of test infrastructure in general for OVS. vsperf is a good
>> start, and it would be great to see more people use and contribute to it!
>
> Yes.
>
>
>>> - Documentation: other than a step by step tutorial, it cannot be said that
>>>    DPDK is a first class citizen in the OVS documentation.  Manpages could be
>>>    improved.
>>
>> Easily done. The INSTALL guide is pretty good but the structure could be better.
>> There is also a lack of manpages. Good point.
>
> Yup.
>
>
>>> - Vhost: the code has not received the level of testing of the kernel vhost.
>>>    Another doubt shared by some developers is whether we should keep
>>>    vhost-cuse, given its relatively low ease of use and the overlapping with
>>>    the far more standard vhost-user.
>>
>> vhost-cuse is required for older versions of qemu. I'm aware of some companies
>> using it as they are restricted to an older version of qemu. I think it is deprecated
>> at the moment? Is there a notice to that effect? We just need a plan for when to
>> remove it and make sure that plan is clear?
>
> Apparently having two solutions to address the same issue causes more
> harm than good, so removing vhost-cuse would be helpful.  I agree that
> we need a clear plan with a soak time so users can either upgrade to
> vhost-user or tell us why they can't.
>
>
>>> - Interface management and naming: interfaces must be manually removed from
>>>    the kernel drivers.
>>>
>>>    We still don't have an easy way to identify them. Ideas are welcome: how
>>>    can we make this user friendly?  Is there a better solution on the DPDK side?
>>
>> This is a tough one and is interesting to the DPDK community.  The basic issue
>> here is that users are more familiar with linux interfaces and linux naming
>> conventions.
>>
>> "ovs-vsctl add-port bro eth0" makes a lot more sense than
>>
>> "dpdk_nic_bind -b igb_uio<pci_id>", then check the order that the ports
>> are enumerated and then run "ovs-vsctl add-port br0 dpdkN".
>>
>> I can think of ways to do this with physical NICs. For example,
>> you could reference the port by the linux name and when you try to add it, OVS
>> could unbind from the kernel module and bind it to igb_uio?
>>
>> However, I am not sure how you would do it with virtual nics as there is not
>> even a real device.
>>
>> I think a general solution from the dpdk community would be really helpful here.
>
>
> It doesn't look like Open vSwitch is the right place to fix this.
> Open vSwitch should deal with the port, and the system should provide
> the port somehow.  That's what happens with the kernel datapath: for
> instance, Open vSwitch doesn't load any NIC driver.
>
> So, it seems to be more related to udev/systemd configuration, in which
> the sysadmin would specify the interfaces and the appropriate driver
> (UIO/VFIO/Bifurcated...).
>
> Even if the system delivers the DPDK port ready, it would be great to
> have some friendly mapping so that users can refer to ports with known
> names.
>

Agreed

>
>>>    How are DPDK interfaces handled by linux distributions? I've heard about
>>>    ongoing work for RHEL and Ubuntu, it would be interesting to coordinate.
>
> We have implemented dpdk/vhost support in initscripts so you can
> configure the ports in the same way as kernel devices, but how to
> properly bind to the driver is still unclear.
>
>
>>> - Insight into the system and debuggability: nothing beats tcpdump for the
>>>    kernel datapath.  Can something similar be done for the userspace datapath?
>>
>> Yeah, this would be useful. I have my own way of dealing with this. For example,
>> you could dump from the LOCAL port on a NORMAL bridge or add a rule to
>> mirror a flow to another port but I feel there could be a better way to do this in
>> DPDK. I have recently heard that the DPDK team do something with a pcap pmd
>> to help with debugging. A more general approach from dpdk would help a lot.
>
> One idea is that Open vSwitch could provide a mode to clone TX/RX
> packets to a pcap PMD. Or write the packets in pcap format directly
> to a file (avoiding another PMD, which might not be available). Or even
> push them out through a tap device. Either way, tcpdump or wireshark would work.
>
>
>>> - Consistency of the tools: some commands are slightly different for the
>>>    userspace/kernel datapath.  Ideally there shouldn't be any difference.
>
> Could you give some examples?
>
>
>> Yeah, there are some things that could be changed. DPDK just works differently but
>> the benefits are significant :)
>>
>> We need to mount hugepages, bind nics to igb_uio, etc
>>
>> With a lot of this stuff, maybe the DPDK community's tools don't need to emulate
>> the linux networking tools exactly. Maybe over time as the DPDK community
>> and user-base expands, people will become more familiar with the tools, processes, etc
>> and this will be less of an issue?
>>
>>
>>>
>>> - Packaging: how should the distributions package DPDK and OVS? Should there
>>>    only be a single build to handle both the kernel and the userspace datapath,
>>>    eventually dynamically linked to DPDK?
>>
>> Yeah. Does OVS need to initialize DPDK at startup if it was compiled with DPDK support?
>
> Well, certainly not everybody wants to have DPDK dependencies, whether
> shared or static.  Maybe the path is a plug-in architecture?
>
>
>>> - Benchmarks: we often rely on extremely simple flow tables with single flow
>>>    traffic to evaluate the effect of a change.  That may be ok during
>>>    development, but OVS with the kernel datapath has been tested in different
>>>    scenarios with more complicated flow tables and even with hostile traffic
>>>    patterns.
>>>
>>>    Efforts in this sense are being made, like the vsperf project, or even the
>>>    simple ovs-pipeline.py
>>
>> vsperf will really help this.
>
> Indeed, but how is the OVS kernel datapath being tested? Is there a
> script?  Maybe we can use the same tests for DPDK.
>
>
>>> I would appreciate feedback on the above points, not (only) in terms of
>>> solutions, but in terms of requirements that you feel are important for our
>>> system to be considered ready.
>
> The list covers technical issues, documentation issues and usability
> issues, which is great, thanks for doing it.  However, as said, one
> important use-case is extreme performance, and that requires configuration
> or tuning flexibility, which adds usability/supportability issues.  Will
> those knobs be a valid option provided that the defaults work well enough?
>


I feel that we need to expose knobs up through Open vSwitch in order to 
tune for extreme performance; otherwise, how do we highlight the value of 
what we are doing? I think we need some way to allow users to do this 
type of configuration when they know what they are doing (without having 
to recompile the code).

> Thanks,
> fbl
>


end of thread, other threads:[~2015-08-17 14:53 UTC | newest]

Thread overview: 4+ messages
     [not found] <D1F17A83.5ED1%diproiettod@vmware.com>
2015-08-14 16:04 ` [dpdk-dev] Status of Open vSwitch with DPDK Gray, Mark D
2015-08-14 21:24   ` [dpdk-dev] [ovs-dev] " Thomas F Herbert
2015-08-15  7:16   ` Flavio Leitner
2015-08-17 14:53     ` Mark D. Gray
