DPDK usage discussions
From: "Xia, Chenbo" <chenbo.xia@intel.com>
To: Vipul Ujawane <vipul999ujawane@gmail.com>,
	David Christensen <drc@linux.vnet.ibm.com>
Cc: "users@dpdk.org" <users@dpdk.org>
Subject: Re: [dpdk-users] Poor performance when using OVS with DPDK
Date: Tue, 30 Jun 2020 04:41:57 +0000	[thread overview]
Message-ID: <MN2PR11MB4063917A9A4B7EC230BA68029C6F0@MN2PR11MB4063.namprd11.prod.outlook.com> (raw)
In-Reply-To: <CABgxuK5gdmjiViAk7Fq9GomAdp=0j=U4yrTUyCtQ+DE2mwUAdw@mail.gmail.com>

Hi Vipul,

Did you check the core affinity of the forwarding threads in OVS? For optimal performance, each forwarding (PMD) thread should have one dedicated core.
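
For example, to see which core each PMD thread and rx queue landed on, and to
pin the PMD threads to dedicated cores (the mask below is only an illustration):

ovs-appctl dpif-netdev/pmd-rxq-show
ovs-appctl dpif-netdev/pmd-stats-show
ovs-vsctl set Open_vSwitch . other_config:pmd-cpu-mask=0x0e000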

BRs,
Chenbo

> -----Original Message-----
> From: users <users-bounces@dpdk.org> On Behalf Of Vipul Ujawane
> Sent: Monday, June 29, 2020 4:33 PM
> To: David Christensen <drc@linux.vnet.ibm.com>
> Cc: users@dpdk.org
> Subject: Re: [dpdk-users] Poor performance when using OVS with DPDK
> 
> So,
> > You don't mention how many different flows you're using in the test.
> Don't be surprised as throughput drops when you move from 1,000 flows to
> 1,000,000 flows.
> 
> We currently only have 1 flow, the basic packet-forwarding rule. We used
> Pktgen's standard built-in packet generation, without any pcap or script
> that would vary the flows.
> Therefore, increasing the number of queues (and cores per queue) cannot
> help; that flow will always be handled in one specific queue.
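> 
> (If we later want more flows, something like the following in Pktgen should
> vary the destination IP and source port per packet; the exact syntax depends
> on the Pktgen version, so treat it as a sketch:
> 
> range 0 dst ip 10.0.0.1 10.0.0.1 10.0.3.255 0.0.0.1
> range 0 src port 1024 1024 2047 1
> enable 0 range
> start 0
> )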
> 
> Increasing the overall core assignment to DPDK should then help, but it does not.
> On the other hand, we re-tested the VM-to-VM performance via iperf and the
> dpdkvhostuser interfaces in the KVM machines, but the performance is still
> poor with the new settings, although a bit higher; it's around 10G now.
> Note again, it's iperf using TCP and MTU-sized packets (but with OVS-Kernel,
> the performance is 20G with a similar setup).
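> 
> (For reference, the measurement was roughly of this form; iperf3 shown, the
> address is illustrative:
> 
> iperf3 -s                      # in the receiving VM
> iperf3 -c 192.168.100.2 -t 30  # in the sending VM
> )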
> 
> Thanks.
> 
> On Sat, Jun 27, 2020 at 3:32 AM David Christensen <drc@linux.vnet.ibm.com>
> wrote:
> 
> > >  > Why don't you reserve any CPUs for OVS/DPDK or VM usage?  All
> > >  > published performance white papers recommend settings for CPU
> > >  > isolation like this Mellanox DPDK performance report:
> > >  >
> > >  > https://fast.dpdk.org/doc/perf/DPDK_19_08_Mellanox_NIC_performance_report.pdf
> > >  >
> > >  > For their test system:
> > >  >
> > >  > isolcpus=24-47 intel_idle.max_cstate=0 processor.max_cstate=0
> > >  > intel_pstate=disable nohz_full=24-47 rcu_nocbs=24-47
> > >  > rcu_nocb_poll default_hugepagesz=1G hugepagesz=1G hugepages=64
> > >  > audit=0 nosoftlockup
> > >  >
> > >  > Using the tuned service (CPU partitioning profile) makes this
> > >  > process easier:
> > >  >
> > >  > https://tuned-project.org/
> > >  >
> > > Nice tutorial, thanks for sharing. I have checked it and configured
> > > our server like this:
> > >
> > > isolcpus=12-19 intel_idle.max_cstate=0 processor.max_cstate=0
> > > nohz_full=12-19 rcu_nocbs=12-19 intel_pstate=disable
> > > default_hugepagesz=1G hugepagesz=1G hugepages=24 audit=0
> > > nosoftlockup intel_iommu=on iommu=pt rcu_nocb_poll
> > >
> > >
> > > Even though our servers are NUMA-capable and NUMA-aware, we only
> > > have one CPU installed in one socket.
> > > And one CPU has 20 physical cores (40 threads), so I decided to use
> > > the "top-most" cores for DPDK/OVS; that's the reason for
> > > isolcpus=12-19
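> > >
> > > (For completeness, a sketch of how we would apply such parameters on a
> > > Debian-style system using GRUB; the file path may differ:
> > >
> > > # in /etc/default/grub
> > > GRUB_CMDLINE_LINUX_DEFAULT="isolcpus=12-19 nohz_full=12-19 rcu_nocbs=12-19 ..."
> > > update-grub && reboot
> > > )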
> >
> > You can never have too many cores.  On POWER systems I'll sometimes
> > reserve 76 out of 80 available cores to improve overall throughput.
> >
> > >  > >
> > >  > > ./usertools/dpdk-devbind.py --status
> > >  > > Network devices using kernel driver
> > >  > > ===================================
> > >  > > 0000:b3:00.0 'MT27800 Family [ConnectX-5] 1017' if=ens2 drv=mlx5_core unused=igb_uio,vfio-pci
> > >  > >
> > >  > > Due to the way Mellanox cards and their driver work, I have not bound
> > >  > > igb_uio to the interface; however, the uio, igb_uio and vfio-pci kernel
> > >  > > modules are loaded.
> > >  > >
> > >  > >
> > >  > > Relevant part of the VM-config for Qemu/KVM  > >
> > > -------------------------------------------
> > >  > >    <cputune>
> > >  > >      <shares>4096</shares>
> > >  > >      <vcpupin vcpu='0' cpuset='4'/>
> > >  > >      <vcpupin vcpu='1' cpuset='5'/>
> > >  >
> > >  > Where did you get these CPU mapping values?  x86 systems typically
> > >  > map even-numbered CPUs to one NUMA node and odd-numbered CPUs to a
> > >  > different NUMA node.  You generally want to select CPUs from the same
> > >  > NUMA node as the mlx5 NIC you're using for DPDK.
> > >  >
> > >  > You should have at least 4 CPUs in the VM, selected according to the
> > >  > NUMA topology of the system.
> > > As per my answer above, our system has no second NUMA node; all
> > > mappings are to the same socket/CPU.
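> > >
> > > (For reference, one way to double-check which NUMA node the NIC sits on:
> > >
> > > cat /sys/bus/pci/devices/0000:b3:00.0/numa_node
> > > lscpu | grep NUMA
> > > )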
> > >
> > >  >
> > >  > Take a look at this bash script written for Red Hat:
> > >  >
> > >  > https://github.com/ctrautma/RHEL_NIC_QUALIFICATION/blob/ansible/ansible/get_cpulist.sh
> > >  >
> > >  > It gives you a good starting reference for which CPUs to select for
> > >  > the OVS/DPDK and VM configurations on your particular system.  Also
> > >  > review the Ansible script pvp_ovsdpdk.yml; it provides a lot of
> > >  > other useful steps you might be able to apply to your Debian OS.
> > >  >
> > >  > >      <emulatorpin cpuset='4-5'/>
> > >  > >    </cputune>
> > >  > >    <cpu mode='host-model' check='partial'>
> > >  > >      <model fallback='allow'/>
> > >  > >      <topology sockets='2' cores='1' threads='1'/>
> > >  > >      <numa>
> > >  > >        <cell id='0' cpus='0-1' memory='4194304' unit='KiB'
> > >  > > memAccess='shared'/>
> > >  > >      </numa>
> > >  > >    </cpu>
> > >  > >      <interface type='vhostuser'>
> > >  > >        <mac address='00:00:00:00:00:aa'/>
> > >  > >        <source type='unix'
> > >  > path='/usr/local/var/run/openvswitch/vhostuser'
> > >  > > mo$
> > >  > >        <model type='virtio'/>
> > >  > >        <driver queues='2'>
> > >  > >          <host mrg_rxbuf='on'/>
> > >  >
> > >  > Is there a requirement for mergeable RX buffers?  Some PMDs like
> > >  > mlx5 can take advantage of SSE instructions when this is disabled,
> > >  > yielding better performance.
> > > Good point, there is no requirement; I just took an example config
> > > and thought it was necessary for the driver queues setting.
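> > >
> > > (I.e., if we disable it, the interface snippet would become something like:
> > >
> > >   <driver queues='2'>
> > >     <host mrg_rxbuf='off'/>
> > >   </driver>
> > > )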
> >
> > That's how we all learn :-)
> >
> > >  >
> > >  > >        </driver>
> > >  > >        <address type='pci' domain='0x0000' bus='0x07' slot='0x00'
> > >  > > function='0x0'$
> > >  > >      </interface>
> > >  > >
> > >  >
> > >  > I don't see hugepage usage in the libvirt XML.  Something similar to:
> > >  >
> > >  >    <memory unit='KiB'>8388608</memory>
> > >  >    <currentMemory unit='KiB'>8388608</currentMemory>
> > >  >    <memoryBacking>
> > >  >      <hugepages>
> > >  >        <page size='1048576' unit='KiB' nodeset='0'/>
> > >  >      </hugepages>
> > >  >    </memoryBacking>
> > > I did not copy this part of the XML, but we have hugepages
> > > configured properly.
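> > >
> > > (For reference, the allocation can be checked with:
> > >
> > > grep Huge /proc/meminfo
> > > cat /sys/kernel/mm/hugepages/hugepages-1048576kB/nr_hugepages
> > > )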
> > >  >
> > >  >
> > >  > > -----------------------------------
> > >  > > OVS Start Config
> > >  > > -----------------------------------
> > >  > > ovs-vsctl --no-wait set Open_vSwitch . other_config:dpdk-init=true
> > >  > > ovs-vsctl --no-wait set Open_vSwitch . other_config:dpdk-socket-mem="4096,0"
> > >  > > ovs-vsctl --no-wait set Open_vSwitch . other_config:dpdk-lcore-mask=0xff
> > >  > > ovs-vsctl --no-wait set Open_vSwitch . other_config:pmd-cpu-mask=0e
> > >  >
> > >  > These two masks shouldn't overlap:
> > >  >
> > >  > https://developers.redhat.com/blog/2017/06/28/ovs-dpdk-parameters-dealing-with-multi-numa/
> > >  >
> > > Thanks, this really did help me understand the order in which these
> > > commands should be issued.
> > >
> > > So, the problem now is the following.
> > > I did all the changes you shared, and started OVS/DPDK in a proper
> > > way and set these features:
> > >
> > > ovs-vsctl --no-wait set Open_vSwitch . other_config:dpdk-socket-mem="8192,0"
> > >
> > > ovs-vsctl --no-wait set Open_vSwitch . other_config:dpdk-lcore-mask=0x01000
> > >
> > > ovs-vsctl --no-wait set Open_vSwitch . other_config:dpdk-init=true
> > >
> > > and, finally this:
> > > ovs-vsctl --no-wait set Open_vSwitch . other_config:pmd-cpu-mask=0x0e000
> > >
> > > The documentation you shared says this last one can even be set at
> > > runtime, so I was playing with it to see if there is any change.
> > >
> > > I did not start any VM on top of OVS/DPDK, just set up a port-forward
> > > rule (in_port=1, actions=output:IN_PORT), since I only have one
> > > physical port on each Mellanox card.
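> > >
> > > (The rule was added along these lines; the bridge name is illustrative:
> > >
> > > ovs-ofctl add-flow br0 in_port=1,actions=output:IN_PORT
> > > )
> > >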
> > > Then, I generated traffic from the other server towards OVS. Using a
> > > packet size of 64B, the max throughput Pktgen reports is 8 Gbps.
> > > In particular, I got these metrics:
> > > Size        Sent_pps       Recv_pps      Recv_Gbps
> > > 64B           93M            11M             ~8
> > > 128B          65M            12.5M           ~15
> > > 256B          42.5M          12.3M           ~27
> > > 512B          23.5M          11.9M           ~51
> > > 1024B         11.9M          10M             ~83
> > > 1280B         9.6M           8.3M            ~86
> > > 1500B         8.3M           6.7M            ~82
> > >
> > > It's quite interesting that for 64B the received pps is lower than
> > > for larger sizes, since pps should be the practical limit on
> > > throughput, and from the packet size we can then compute the
> > > throughput in Gbps.
> >
> > Looking at 64B performance gives you a sense of the per-packet
> > overhead associated with the DPDK framework and your application.  At
> > 100Gb/s line rate, 64B frames will arrive every 6.72ns.  Since your
> > received PPS is peaking around 12.5MPPS I'd guess that it's taking
> > about 80ns of CPU time per frame.  I don't know how well OVS scales
> > with additional CPUs, something to look at.
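> >
> > (For reference, the arithmetic: a 64B frame plus 20B of preamble and
> > inter-frame gap is 84B = 672 bits, and 672 bits / 100 Gb/s = 6.72 ns;
> > likewise 1 / 12.5 Mpps = 80 ns of budget per frame.)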
> >
> > You don't mention how many different flows you're using in the test.
> > Don't be surprised as throughput drops when you move from 1,000 flows
> > to
> > 1,000,000 flows.
> >
> > It's likely that most of your frame loss is due to the NIC's RX buffers
> > overflowing and dropping frames due to back pressure (i.e. DPDK/OVS
> > can't process packets fast enough).  Look at the mlx5's hardware
> > statistics to verify.
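> >
> > (For example, assuming the kernel netdev is still ens2 alongside the mlx5
> > PMD, counters such as rx_discards_phy show drops at the NIC:
> >
> > ethtool -S ens2 | grep -Ei 'discard|drop'
> > ovs-vsctl get Interface <dpdk-port> statistics
> > )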
> >
> > You may be able to improve the performance by increasing the number of
> > RX queues and RX descriptors per queue, and assigning more lcores to
> > match the number of queues, allowing the work to be spread more evenly
> > and reducing buffer overflows.  This often works when running testpmd
> > alone since the app overhead is low but has less effect on OVS
> > performance.  You might consider benchmarking testpmd alone vs
> > OVS/DPDK to understand the OVS overhead.
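> >
> > (A sketch of the relevant knobs, with values that are purely illustrative:
> >
> > ovs-vsctl set Interface dpdk0 options:n_rxq=4
> > ovs-vsctl set Interface dpdk0 options:n_rxq_desc=2048
> > ovs-vsctl set Open_vSwitch . other_config:pmd-cpu-mask=0x1e000
> >
> > And for a testpmd-only baseline, something along the lines of:
> >
> > testpmd -l 12-15 -n 4 -w 0000:b3:00.0 -- --rxq=4 --txq=4 --nb-cores=3 --forward-mode=macswap
> > )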
> >
> > >
> > > Anyway, OVS-DPDK has 3 cores to use, but only one rx queue is
> > > assigned to the port (so, basically --- as `top` also shows --- it
> > > is the one-core performance).
> >
> > Increasing the number of RX queues/descriptors and assigning a
> > dedicated lcore to each queue will generally improve performance if
> > your bottleneck is RX in the PMD.
> >
> > > Increasing the cores did not help, and the performance remained the
> > > same. Is this performance normal for OVS/DPDK?
> >
> > That's been my experience, though there are others who have more
> > experience with performance testing OVS.  The platform matters.  Look
> > for existing whitepapers and compare your system configuration to
> > theirs to see what you need to achieve the performance you're looking for.
> >
> > Dave
> >
> 
> 
> --
> 
> Vipul Ujawane <https://vipul999ujawane.github.io/>
> Pre-Final Year Undergraduate
> Department of Industrial and Systems Engineering, Indian Institute of Technology,
> Kharagpur
