From: "Mark D. Gray" <mark.d.gray@intel.com>
Date: Mon, 17 Aug 2015 15:53:01 +0100
To: Daniele Di Proietto, dev@openvswitch.org, dev@dpdk.org
Subject: Re: [dpdk-dev] [ovs-dev] Status of Open vSwitch with DPDK

On 08/15/15 08:16, Flavio Leitner wrote:
> On Fri, Aug 14, 2015 at 04:04:40PM +0000, Gray, Mark D wrote:
>> Hi Daniele,
>>
>> Thanks for starting this conversation. It is a good list :) I have
>> cross-posted this to dpdk.org as I feel that some of the points could be
>> interesting to that community, as they are related to how DPDK is used.
>>
>> How do "users" of OVS with DPDK feel about this list? Does anyone
>> disagree, or does anyone have any additions? What are your experiences?
>>
>>> There has been some discussion lately about the status of the Open vSwitch
>>> port to DPDK. While part of the code has been tested for quite some time,
>>> I think we can agree that there are a few rough spots that prevent it from
>>> being easily deployed and used.
>>>
>>> I was hoping to get some feedback from the community about those rough
>>> spots, i.e. areas where OVS+DPDK can/needs to improve to become more
>>> "production ready" and user-friendly.
>>>
>>> - PMD threads and queues management: the code has shown several bugs, and
>>> the netdev interfaces don't seem up to the job anymore.
>>
>> You had a few ideas about how to refactor this before, but I was concerned
>> about the effect it would have on throughput. I can't find the thread.
>>
>> Do you have some further ideas about how to achieve this?
>
> What I miss is that we can't tell which queue goes to each PMD, and also
> that all devices must have the same number of rx queues. I agree that
> there are other issues, but the kind of configuration knobs I am looking
> for might not be the end goal, since what has been said is to look for a
> more automated way. That said, I would also like to hear if you have
> further ideas about how to achieve that.
>
>>> There's a lot of room for improvement: we could factor out the code from
>>> dpif-netdev, add configuration parameters for advanced users, and figure
>>> out a way to add unit tests.
>>
>> I think this is a general issue with both the kernel datapath (and
>> netdevs) and the userspace datapath. There isn't much unit testing (or
>> testing at all) outside of the slow path.
>
> Maybe we could exercise the interfaces using the pcap PMD.
>
We had a similar idea. Using this, it would be possible to test the
entire datapath or netdev for functionality! I don't think there is an
equivalent for the kernel datapath?
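Roughly what we had in mind (an untested sketch: the eth_pcap0 vdev
syntax is from the DPDK pcap PMD docs, and passing EAL arguments through
ovs-vswitchd's --dpdk section is as described in INSTALL.DPDK.md, but I
have not verified this exact combination):

  # Start ovs-vswitchd with a pcap vdev in place of a physical NIC.
  ovs-vswitchd --dpdk -c 0x1 -n 4 --socket-mem 1024 \
      --vdev 'eth_pcap0,rx_pcap=input.pcap,tx_pcap=output.pcap' \
      -- unix:$DB_SOCK --pidfile

  # The vdev enumerates like any other DPDK port, so it should come up
  # as dpdk0. input.pcap is replayed through the full fast path and
  # whatever gets forwarded back out lands in output.pcap.
  ovs-vsctl add-port br0 dpdk0 -- set Interface dpdk0 type=dpdk

Comparing output.pcap against a golden capture would give a functional
test of the whole userspace datapath without any hardware.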
>>> Related to this, the system should be as fast as possible out-of-the-box,
>>> without requiring too much tuning.
>>
>> This is a good point. I think the kernel datapath has a similar issue. You
>> can get a certain level of performance without compiling with -Ofast or
>> pinning threads, but you will (even with the kernel datapath) get better
>> performance if you pin threads (and possibly compile differently). I guess
>> it is more visible with the DPDK datapath, as performance is one of its
>> key values. It is also more detrimental to performance if you don't set it
>> up correctly.
>
> Not only that, you need to consider how the resources will be distributed
> upfront so that you don't run out of hugepages, perhaps isolate PMD CPUs
> from the Linux scheduler, etc. So I think a more realistic goal would be:
> the system should require minimal or no tuning to run with acceptable
> performance.
>
How do you define "acceptable" performance :)?

>> Perhaps we could provide scripts to help do this?
>
> Or profiles (if that isn't included in your definition of scripts).
>
Maybe we should define profiles like "performance", "minimum cores", etc.

>> I think this is also interesting to the DPDK community. There is knowledge
>> required when running DPDK-enabled apps to get good performance: core
>> pinning is one thing that comes to mind.
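On the scripts/profiles idea, a "performance" profile could start out as
small as the sketch below (pmd-cpu-mask is the existing OVS knob; the
hugepage setup is the standard one from INSTALL.DPDK.md; the sizes and
core mask are just example values):

  # Reserve 2MB hugepages and make them available to ovs-vswitchd.
  echo 1024 > /sys/kernel/mm/hugepages/hugepages-2048kB/nr_hugepages
  mount -t hugetlbfs none /dev/hugepages

  # Pin PMD threads to dedicated cores (1 and 2 here). For best results
  # those cores should also be isolated from the scheduler with
  # isolcpus= on the kernel command line, which a script can only
  # recommend, not set at runtime.
  ovs-vsctl set Open_vSwitch . other_config:pmd-cpu-mask=0x6

A "minimum cores" profile would be the same with a single-core mask and a
smaller hugepage reservation.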
>>> - Userspace tunneling: while the code has been there for quite some time,
>>> it hasn't received the level of testing that the Linux kernel datapath
>>> tunneling has.
>>
>> Again, there is a lack of test infrastructure in general for OVS. vsperf
>> is a good start, and it would be great to see more people use and
>> contribute to it!
>
> Yes.
>
>>> - Documentation: other than a step-by-step tutorial, it cannot be said
>>> that DPDK is a first-class citizen in the OVS documentation. Manpages
>>> could be improved.
>>
>> Easily done. The INSTALL guide is pretty good, but the structure could be
>> better. There is also a lack of manpages. Good point.
>
> Yup.
>
>>> - Vhost: the code has not received the level of testing of the kernel
>>> vhost. Another doubt shared by some developers is whether we should keep
>>> vhost-cuse, given its relatively low ease of use and its overlap with the
>>> far more standard vhost-user.
>>
>> vhost-cuse is required for older versions of QEMU. I'm aware of some
>> companies using it as they are restricted to an older version of QEMU. I
>> think it is deprecated at the moment? Is there a notice to that effect? We
>> just need a plan for when to remove it and make sure that plan is clear.
>
> Apparently having two solutions to address the same issue causes more harm
> than good, so removing vhost-cuse would be helpful. I agree that we need a
> clear plan with a soak time so users can either upgrade to vhost-user or
> tell us why they can't.
>
>>> - Interface management and naming: interfaces must be manually removed
>>> from the kernel drivers.
>>>
>>> We still don't have an easy way to identify them. Ideas are welcome: how
>>> can we make this user-friendly? Is there a better solution on the DPDK
>>> side?
>>
>> This is a tough one and is interesting to the DPDK community. The basic
>> issue here is that users are more familiar with Linux interfaces and
>> Linux naming conventions.
>>
>> "ovs-vsctl add-port br0 eth0" makes a lot more sense than
>>
>> "dpdk_nic_bind -b igb_uio", then checking the order in which the ports
>> are enumerated, and then running "ovs-vsctl add-port br0 dpdkN".
>>
>> I can think of ways to do this with physical NICs. For example, you could
>> reference the port by its Linux name and, when you try to add it, OVS
>> could unbind it from the kernel module and bind it to igb_uio.
>>
>> However, I am not sure how you would do it with virtual NICs, as there is
>> not even a real device.
>>
>> I think a general solution from the DPDK community would be really
>> helpful here.
>
> It doesn't look like Open vSwitch is the right place to fix this. Open
> vSwitch should deal with the port, and the system should provide the port
> somehow. That's what happens with the kernel datapath, for instance:
> Open vSwitch doesn't load any NIC driver.
>
> So it seems to be more related to udev/systemd configuration, in which
> the sysadmin would specify the interfaces and the appropriate driver
> (UIO/VFIO/bifurcated...).
>
> Even if the system delivers the DPDK port ready to use, it would be great
> to have some friendly mapping so that users can refer to ports by known
> names.
>
Agreed.

>>> How are DPDK interfaces handled by Linux distributions? I've heard about
>>> ongoing work for RHEL and Ubuntu; it would be interesting to coordinate.
>
> We have implemented dpdk/vhost support in initscripts so you can configure
> the ports in the same way as kernel devices, but how to properly bind to
> the driver is still unclear.
>
>>> - Insight into the system and debuggability: nothing beats tcpdump for
>>> the kernel datapath. Can something similar be done for the userspace
>>> datapath?
>>
>> Yeah, this would be useful. I have my own way of dealing with this. For
>> example, you could dump from the LOCAL port on a NORMAL bridge, or add a
>> rule to mirror a flow to another port, but I feel there could be a better
>> way to do this in DPDK. I have recently heard that the DPDK team do
>> something with a pcap PMD to help with debugging. A more general approach
>> from DPDK would help a lot.
>
> One idea is that Open vSwitch could provide a mode to clone TX/RX packets
> to a pcap PMD. Or write the packets in pcap format directly to a file
> (avoiding another PMD, which might not be available). Or even push them
> out through a tap device. Either way, tcpdump or wireshark would work.
>
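For reference, the mirror-to-another-port workaround I mentioned looks
roughly like this today, using a tap port as the capture point (fine for
debugging, far too slow for production; the create/Mirror syntax is from
the ovs-vsctl manpage, the device name is just an example):

  # Create a tap device and attach it to the bridge.
  ip tuntap add dev dbg0 mode tap
  ip link set dbg0 up
  ovs-vsctl add-port br0 dbg0

  # Mirror all traffic on br0 to the tap port.
  ovs-vsctl -- set Bridge br0 mirrors=@m \
            -- --id=@dbg0 get Port dbg0 \
            -- --id=@m create Mirror name=debug select-all=true \
               output-port=@dbg0

  # Now the standard tools work again.
  tcpdump -i dbg0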
>>> - Consistency of the tools: some commands are slightly different for the
>>> userspace/kernel datapath. Ideally there shouldn't be any difference.
>
> Could you give some examples?
>
>> Yeah, there are some things that could be changed. DPDK just works
>> differently, but the benefits are significant :)
>>
>> We need to mount hugepages, bind NICs to igb_uio, etc.
>>
>> With a lot of this stuff, maybe the DPDK community's tools don't need to
>> emulate the Linux networking tools exactly. Maybe over time, as the DPDK
>> community and user base expand, people will become more familiar with the
>> tools, processes, etc., and this will be less of an issue?
>>
>>> - Packaging: how should the distributions package DPDK and OVS? Should
>>> there only be a single build to handle both the kernel and the userspace
>>> datapath, eventually dynamically linked to DPDK?
>>
>> Yeah. Do we need to start with DPDK if we have compiled with DPDK support?
>
> Well, certainly not everybody wants to have DPDK dependencies, whether
> shared or statically linked. Maybe the path is a plug-in architecture?
>
>>> - Benchmarks: we often rely on extremely simple flow tables with
>>> single-flow traffic to evaluate the effect of a change. That may be ok
>>> during development, but OVS with the kernel datapath has been tested in
>>> different scenarios with more complicated flow tables and even with
>>> hostile traffic patterns.
>>>
>>> Efforts in this sense are being made, like the vsperf project, or even
>>> the simple ovs-pipeline.py.
>>
>> vsperf will really help this.
>
> Indeed, but how is the OVS kernel datapath being tested? Is there a
> script? Maybe we can use the same tests for DPDK.
>
>>> I would appreciate feedback on the above points, not (only) in terms of
>>> solutions, but in terms of requirements that you feel are important for
>>> our system to be considered ready.
>
> The list covers technical issues, documentation issues and usability
> issues, which is great; thanks for doing it. However, as said, one
> important use case is extreme performance, and that requires
> configuration or tuning flexibility, which adds usability/supportability
> issues. Will those knobs be a valid option, provided that the defaults
> work well enough?
>
I feel that we need to expose knobs up through Open vSwitch in order to
tune for extreme performance; otherwise, how do we highlight the value in
what we are doing? I think we need some way to allow a user to do this
type of configuration when they know what they are doing, without having
to recompile the code. Something along the lines of the sketch below, for
instance.
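To make that concrete, this is the sort of thing I mean (both knobs exist
on master today, names from memory; note the rx queue knob is currently
global rather than per-port, which is exactly the limitation raised at
the top of this thread):

  # Pin PMD threads to cores 2 and 3.
  ovs-vsctl set Open_vSwitch . other_config:pmd-cpu-mask=0xC

  # Poll each DPDK interface with two rx queues (applies to all ports).
  ovs-vsctl set Open_vSwitch . other_config:n-dpdk-rxqs=2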