DPDK usage discussions
* [dpdk-users] Poor performance when using OVS with DPDK
@ 2020-06-24 10:56 Vipul Ujawane
  0 siblings, 0 replies; 7+ messages in thread
From: Vipul Ujawane @ 2020-06-24 10:56 UTC (permalink / raw)
  To: ovs-discuss

Dear all,

I am observing a very low performance when running OVS-DPDK when compared
to OVS running with the Kernel Datapath.
I have OvS version 2.13.90 compiled from source with the latest stable DPDK
v19.11.3 on a stable Debian system running kernel 4.19.0-9-amd64 (real
version:4.19.118).

I have also tried the latest released OvS (2.12) with the same LTS DPDK. As
a last resort, I have tried an older kernel (4.19.0-8-amd64, real version
4.19.98) to check whether the kernel version was the problem.

I have not been able to troubleshoot the problem, and kindly request your
help regarding the same.

HW configuration
================
We have two identical servers (Debian stable, Intel(R) Xeon(R)
Gold 6230 CPU, 96G Mem), each running a KVM virtual machine. On the
hypervisor layer, we have OvS for traffic routing. The servers are
connected directly via a Mellanox ConnectX-5 (1x100G).
The OVS forwarding tables are configured for simple port forwarding only,
to avoid any packet-processing-related issues.

Problem
=======
When both servers are running OVS-Kernel at the hypervisor layer and VMs
are connected to it via libvirt and virtio interfaces, the
VM->Server1->Server2->VM throughput is around 16-18Gbps.
However, when using OVS-DPDK with the same setting, the throughput drops
down to 4-6Gbps.


SW/driver configurations:
==================
DPDK
----
In config/common_base, besides the defaults, I have enabled the following
extra drivers/features to be compiled in:
CONFIG_RTE_LIBRTE_MLX5_PMD=y
CONFIG_RTE_LIBRTE_VHOST=y
CONFIG_RTE_LIBRTE_VHOST_NUMA=y
CONFIG_RTE_LIBRTE_PMD_VHOST=y
CONFIG_RTE_VIRTIO_USER=n
CONFIG_RTE_EAL_VFIO=y


OVS
---
$ovs-vswitchd --version
ovs-vswitchd (Open vSwitch) 2.13.90

$sudo ovs-vsctl get Open_vSwitch . dpdk_initialized
true

$sudo ovs-vsctl get Open_vSwitch . dpdk_version
"DPDK 19.11.3"

OS settings
-----------
$ lsb_release -a
No LSB modules are available.
Distributor ID: Debian
Description: Debian GNU/Linux 10 (buster)
Release: 10
Codename: buster


$ cat /proc/cmdline
BOOT_IMAGE=/vmlinuz-4.19.0-9-amd64 root=/dev/mapper/Volume0-debian--stable
ro default_hugepagesz=1G hugepagesz=1G hugepages=16 intel_iommu=on iommu=pt
quiet

./usertools/dpdk-devbind.py --status
Network devices using kernel driver
===================================
0000:b3:00.0 'MT27800 Family [ConnectX-5] 1017' if=ens2 drv=mlx5_core
unused=igb_uio,vfio-pci

Due to the way Mellanox cards and their driver work, I have not bound
igb_uio to the interface; however, the uio, igb_uio and vfio-pci kernel
modules are loaded.
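
As a quick sanity check of this bifurcated-driver setup (a minimal sketch;
it assumes the rdma-core tools are installed, and the output differs per
system):

$ lsmod | grep -E 'mlx5_core|mlx5_ib|ib_uverbs'
$ ibv_devinfo | grep -E 'hca_id|state'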


Relevant part of the VM-config for Qemu/KVM
-------------------------------------------
  <cputune>
    <shares>4096</shares>
    <vcpupin vcpu='0' cpuset='4'/>
    <vcpupin vcpu='1' cpuset='5'/>
    <emulatorpin cpuset='4-5'/>
  </cputune>
  <cpu mode='host-model' check='partial'>
    <model fallback='allow'/>
    <topology sockets='2' cores='1' threads='1'/>
    <numa>
      <cell id='0' cpus='0-1' memory='4194304' unit='KiB'
memAccess='shared'/>
    </numa>
  </cpu>
    <interface type='vhostuser'>
      <mac address='00:00:00:00:00:aa'/>
      <source type='unix' path='/usr/local/var/run/openvswitch/vhostuser'
mo$
      <model type='virtio'/>
      <driver queues='2'>
        <host mrg_rxbuf='on'/>
      </driver>
      <address type='pci' domain='0x0000' bus='0x07' slot='0x00'
function='0x0'$
    </interface>

-----------------------------------
OVS Start Config
-----------------------------------
ovs-vsctl --no-wait set Open_vSwitch . other_config:dpdk-init=true
ovs-vsctl --no-wait set Open_vSwitch . other_config:dpdk-socket-mem="4096,0"
ovs-vsctl --no-wait set Open_vSwitch . other_config:dpdk-lcore-mask=0xff
ovs-vsctl --no-wait set Open_vSwitch . other_config:pmd-cpu-mask=0e
ovs-vsctl add-port ovsbr dpdk0 -- set Interface dpdk0 type=dpdk
options:dpdk-devargs=0000:b3:00.0
ovs-vsctl set interface dpdk0 options:n_rxq=2
ovs-vsctl add-port ovsbr vhost-vm -- set Interface vhostuser
type=dpdkvhostuser
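
For reference, once the bridge is up, the rx-queue-to-PMD placement can be
checked with the following (only a sketch; the output format varies with
the OVS version):

$ sudo ovs-appctl dpif-netdev/pmd-rxq-show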



-------------------------------------------------------
$cat /proc/cmdline
BOOT_IMAGE=/vmlinuz-4.19.0-9-amd64 root=/dev/mapper/Volume0-debian--stable
ro default_hugepagesz=1G hugepagesz=1G hugepages=16 intel_iommu=on iommu=pt
quiet


Is there anything I should be aware of regarding the versions and settings
I am using? Did I compile DPDK and/or OvS in the wrong way?

Thank you for your kind help ;)

-- 

Vipul Ujawane

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [dpdk-users] Poor performance when using OVS with DPDK
  2020-06-29  8:33       ` Vipul Ujawane
@ 2020-06-30  4:41         ` Xia, Chenbo
  0 siblings, 0 replies; 7+ messages in thread
From: Xia, Chenbo @ 2020-06-30  4:41 UTC (permalink / raw)
  To: Vipul Ujawane, David Christensen; +Cc: users

Hi Vipul,

Did you check the core affinity of the forwarding threads in OVS? For optimal performance, each forwarding thread should have one dedicated core.
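
For example (a sketch; the exact output varies with the OVS version), the
per-PMD load and core pinning can be checked with:

$ sudo ovs-appctl dpif-netdev/pmd-stats-show
$ top -H -p $(pidof ovs-vswitchd)

Each PMD thread should sit at ~100% on its own isolated core.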

BRs,
Chenbo

> -----Original Message-----
> From: users <users-bounces@dpdk.org> On Behalf Of Vipul Ujawane
> Sent: Monday, June 29, 2020 4:33 PM
> To: David Christensen <drc@linux.vnet.ibm.com>
> Cc: users@dpdk.org
> Subject: Re: [dpdk-users] Poor performance when using OVS with DPDK
> 
> So,
> > You don't mention how many different flows you're using in the test.
> Don't be surprised as throughput drops when you move from 1,000 flows to
> 1,000,000 flows.
> 
> We currently only have 1 flow, the basic packet forwarding rule. We used pktgen
> standard built-in packet generation without any pcap or script that would
> change the flows!
> Therefore, increasing the number of queues (and cores/queues) cannot help;
> that flow will always be handled in one specific queue.
> 
> Increasing the overall core assignment to DPDK should then help, but it does not.
> On the other hand, we tested again the VM-to-VM performance as well via iperf
> and the dpdkvhost user interfaces in the KVM machines, but the performance is
> still bad with the new settings, although a bit increased; it's around 10G now.
> Note again, it's iperf using TCP and MTU sized packets (but with OVS- Kernel, the
> performance is 20G with a similar setup).
> 
> Thanks.
> 
> On Sat, Jun 27, 2020 at 3:32 AM David Christensen <drc@linux.vnet.ibm.com>
> wrote:
> 
> > >  > Why don't you reserve any CPUs for OVS/DPDK or VM usage?  All  >
> > > published  > performance white papers recommend settings for CPU
> > > isolation like  > this  > Mellanox DPDK performance report:
> > >  >
> > >  >
> > >
> >
> > > https://fast.dpdk.org/doc/perf/DPDK_19_08_Mellanox_NIC_performance_report.pdf
> > >  >
> > >  > For their test system:
> > >  >
> > >  > isolcpus=24-47 intel_idle.max_cstate=0 processor.max_cstate=0  >
> > > intel_pstate=disable nohz_full=24-47  > rcu_nocbs=24-47
> > > rcu_nocb_poll default_hugepagesz=1G hugepagesz=1G  > hugepages=64
> > > audit=0  > nosoftlockup  >  > Using the tuned service (CPU
> > > partitioning profile) make this process  > easier:
> > >  >
> > >  > https://tuned-project.org/
> > >
> > >  >
> > > Nice tutorial, thanks for sharing. I have checked it and configured
> > > our server like this:
> > >
> > > isolcpus=12-19 intel_idle.max_cstate=0 processor.max_cstate=0
> > > nohz_full=12-19 rcu_nocbs=12-19 intel_pstate=disable
> > > default_hugepagesz=1G hugepagesz=1G hugepages=24 audit=0
> > > nosoftlockup intel_iommu=on iommu=pt rcu_nocb_poll
> > >
> > >
> > > Even though our servers are NUMA-capable and NUMA-aware, we only
> > > have one CPU installed in one socket.
> > > And one CPU has 20 physical cores (40 threads), so I figured out to
> > > use the "top-most" cores for DPDK/OVS, that's the reason of
> > > isolcpus=12-19
> >
> > You can never have too many cores.  On POWER systems I'll sometimes
> > reserve 76 out of 80 available cores to improve overall throughput.
> >
> > >  > >
> > >  > > ./usertools/dpdk-devbind.py --status  > > Network devices using
> > > kernel driver  > > ===================================
> > >  > > 0000:b3:00.0 'MT27800 Family [ConnectX-5] 1017' if=ens2  >
> > > drv=mlx5_core  > > unused=igb_uio,vfio-pci  > >  > > Due to the way
> > > how Mellanox cards and their driver work, I have not  > bond  > >
> > > igb_uio to the interface, however, uio, igb_uio and vfio-pci kernel
> > > > modules  > > are loaded.
> > >  > >
> > >  > >
> > >  > > Relevant part of the VM-config for Qemu/KVM  > >
> > > -------------------------------------------
> > >  > >    <cputune>
> > >  > >      <shares>4096</shares>
> > >  > >      <vcpupin vcpu='0' cpuset='4'/>
> > >  > >      <vcpupin vcpu='1' cpuset='5'/>
> > >  >
> > >  > Where did you get these CPU mapping values?  x86 systems
> > > typically  > map  > even-numbered CPUs to one NUMA node and
> > > odd-numbered CPUs to a  > different  > NUMA node.  You generally
> > > want to select CPUs from the same NUMA node  > as  > the mlx5 NIC
> > > you're using for DPDK.
> > >  >
> > >  > You should have at least 4 CPUs in the VM, selected according to
> > > the  > NUMA topology of the system.
> > > as per my answer above, our system has no secondary NUMA node, all
> > > mappings are to the same socket/CPU.
> > >
> > >  >
> > >  > Take a look at this bash script written for Red Hat:
> > >  >
> > >  >
> > >
> > > https://github.com/ctrautma/RHEL_NIC_QUALIFICATION/blob/ansible/ansible/get_cpulist.sh
> > >
> > >  >
> > >  > It gives you a good starting reference which CPUs to select for
> > > the  > OVS/DPDK and VM configurations on your particular system.
> > > Also  > review  > the Ansible script pvp_ovsdpdk.yml, it provides a
> > > lot of other  > useful  > steps you might be able to apply to your
> > > Debian OS.
> > >  >
> > >  > >      <emulatorpin cpuset='4-5'/>
> > >  > >    </cputune>
> > >  > >    <cpu mode='host-model' check='partial'>
> > >  > >      <model fallback='allow'/>
> > >  > >      <topology sockets='2' cores='1' threads='1'/>
> > >  > >      <numa>
> > >  > >        <cell id='0' cpus='0-1' memory='4194304' unit='KiB'
> > >  > > memAccess='shared'/>
> > >  > >      </numa>
> > >  > >    </cpu>
> > >  > >      <interface type='vhostuser'>
> > >  > >        <mac address='00:00:00:00:00:aa'/>
> > >  > >        <source type='unix'
> > >  > path='/usr/local/var/run/openvswitch/vhostuser'
> > >  > > mo$
> > >  > >        <model type='virtio'/>
> > >  > >        <driver queues='2'>
> > >  > >          <host mrg_rxbuf='on'/>
> > >  >
> > >  > Is there a requirement for mergeable RX buffers?  Some PMDs like
> > > > mlx5  > can take advantage of SSE instructions when this is
> > > disabled,  > yielding  > better performance.
> > > Good point, there is no requirement, I just took an example config
> > > and though it's necessary for the driver queues setting.
> >
> > That's how we all learn :-)
> >
> > >  >
> > >  > >        </driver>
> > >  > >        <address type='pci' domain='0x0000' bus='0x07' slot='0x00'
> > >  > > function='0x0'$
> > >  > >      </interface>
> > >  > >
> > >  >
> > >  > I don't see hugepage usage in the libvirt XML.  Something similar to:
> > >  >
> > >  >    <memory unit='KiB'>8388608</memory>
> > >  >    <currentMemory unit='KiB'>8388608</currentMemory>
> > >  >    <memoryBacking>
> > >  >      <hugepages>
> > >  >        <page size='1048576' unit='KiB' nodeset='0'/>
> > >  >      </hugepages>
> > >  >    </memoryBacking>
> > > I did not copy this part of the XML, but we have hugepages
> > > configured properly.
> > >  >
> > >  >
> > >  > > -----------------------------------
> > >  > > OVS Start Config
> > >  > > -----------------------------------
> > >  > > ovs-vsctl --no-wait set Open_vSwitch .
> > > other_config:dpdk-init=true  > > ovs-vsctl --no-wait set
> > > Open_vSwitch . other_config:dpdk-socket-  > mem="4096,0"
> > >  > > ovs-vsctl --no-wait set Open_vSwitch . other_config:dpdk-lcore-
> > > > mask=0xff  > > ovs-vsctl --no-wait set Open_vSwitch .
> > > other_config:pmd-cpu-mask=0e  >  > These two masks shouldn't
> > > overlap:
> > >  >
> > >
> > > https://developers.redhat.com/blog/2017/06/28/ovs-dpdk-parameters-dealing-with-multi-numa/
> > >
> > >  >
> > > Thanks, this did really help me understand in which order these
> > > commands should be issued.
> > >
> > > So, the problem now is the following.
> > > I did all the changes you shared, and started OVS/DPDK in a proper
> > > way and set these features:
> > >
> > > ovs-vsctl --no-wait set Open_vSwitch . other_config:dpdk-socket-
> > > mem="8192,0"
> > >
> > > ovs-vsctl --no-wait set Open_vSwitch . other_config:dpdk-lcore-
> > > mask=0x01000
> > >
> > > ovs-vsctl --no-wait set Open_vSwitch . other_config:dpdk-init=true
> > >
> > > and, finally this:
> > > ovs-vsctl --no-wait set Open_vSwitch . other_config:pmd-cpu-
> > > mask=0x0e000
> > >
> > > The documentation you shared say this last one can be even set
> > > during runtime. So, I was playing with it to see there is any change.
> > >
> > > I did not start any VM on top of OVS/DPDK, just set up a port
> > > forward rule (in_port=1, actions=output:IN_PORT), since I only have
> > > one physical ports on each mellanox card.
> > > Then, I generated traffic from the other server towards OVS Using
> > > pktsize 64B, the max throughput Pktgen reports is 8Gbps.
> > > In particular, I got these metrics:
> > > Size        Sent_pps       Recv_pps      Recv_Gbps
> > > 64B           93M            11M             ~8
> > > 128B          65M            12.5M           ~15
> > > 256B          42.5M          12.3M           ~27
> > > 512B          23.5M          11.9M           ~51
> > > 1024B         11.9M          10M             ~83
> > > 1280B         9.6M           8.3M            ~86
> > > 1500B         8.3M           6.7M            ~82
> > >
> > > It's quite interesting that for 64B, the pps is less than for
> > > greater sizes. Because PPS should be the practical limitation in
> > > throughput, and according to the packet size we can count the throughput in
> Gbps.
> >
> > Looking at 64B performance gives you a sense of the per-packet
> > overhead associated with the DPDK framework and your application.  At
> > 100Gb/s line rate, 64B frames will arrive every 6.72ns.  Since your
> > received PPS is peaking around 12.5MPPS I'd guess that it's taking
> > about 80ns of CPU time per frame.  I don't know how well OVS scales
> > with additional CPUs, something to look at.
> >
> > You don't mention how many different flows you're using in the test.
> > Don't be surprised as throughput drops when you move from 1,000 flows
> > to
> > 1,000,000 flows.
> >
> > It's likely that most of your frame loss is due to the NICs RX buffers
> > overflowing and dropping frames due to back pressure (i.e. DPDK/OVS
> > can't process packets fast enough).  Look at the mlx5's hardware
> > statistics to verify.
> >
> > You may be able to improve the performance by increasing the number of
> > RX queues and RX descriptors per queue, and assigning more lcores to
> > match the number of queues, allowing the work to be spread more evenly
> > and reducing buffer overflows.  This often works when running testpmd
> > alone since the app overhead is low but has less effect on OVS
> > performance.  You might consider benchmarking testpmd alone vs
> > OVS/DPDK to understand the OVS overhead.
> >
> > >
> > > Anyway, OVS-DPDK have 3 cores to use, but only one rx queue is
> > > assigned to the port (so, basically --- as `top` also shows --- it
> > > is the one- core performance.
> >
> > Increasing the number of RX queues/descriptors and assigning a
> > dedicated lcore to each queue will generally improve performance if
> > your bottleneck is RX in the PMD.
> >
> > > Increasing the cores did not help, and the performance remained the
> > > same. Is this performance normal for OVS/DPDK?
> >
> > That's been my experience, though there are others who have more
> > experience with performance testing OVS.  The platform matters.  Look
> > for existing whitepapers and compare your system configuration to
> > theirs to see what you need to achieve the performance you're looking for.
> >
> > Dave
> >
> 
> 
> --
> 
> Vipul Ujawane <https://vipul999ujawane.github.io/>
> Pre-Final Year Undergraduate
> Department of Industrial and Systems Engineering Indian Institute of Technology,
> Kharagpur

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [dpdk-users] Poor performance when using OVS with DPDK
  2020-06-26 19:32     ` David Christensen
@ 2020-06-29  8:33       ` Vipul Ujawane
  2020-06-30  4:41         ` Xia, Chenbo
  0 siblings, 1 reply; 7+ messages in thread
From: Vipul Ujawane @ 2020-06-29  8:33 UTC (permalink / raw)
  To: David Christensen; +Cc: users

So,
> You don't mention how many different flows you're using in the test.
Don't be surprised as throughput drops when you move from 1,000 flows to
1,000,000 flows.

We currently only have 1 flow, the basic packet-forwarding rule. We used
Pktgen's standard built-in packet generation without any pcap or script
that would vary the flows.
Therefore, increasing the number of queues (and cores per queue) cannot
help; that flow will always be handled by one specific queue.

Increasing the overall core assignment to DPDK should then help, but it
does not. On the other hand, we tested the VM-to-VM performance again via
iperf and the dpdkvhostuser interfaces in the KVM machines, but the
performance is still poor with the new settings, although a bit better;
it's around 10G now.
Note again, it's iperf using TCP and MTU-sized packets (but with
OVS-Kernel, the performance is 20G with a similar setup).

Thanks.

On Sat, Jun 27, 2020 at 3:32 AM David Christensen <drc@linux.vnet.ibm.com>
wrote:

> >  > Why don't you reserve any CPUs for OVS/DPDK or VM usage?  All
> >  > published
> >  > performance white papers recommend settings for CPU isolation like
> >  > this
> >  > Mellanox DPDK performance report:
> >  >
> >  >
> >
> https://fast.dpdk.org/doc/perf/DPDK_19_08_Mellanox_NIC_performance_report.pdf
> >  >
> >  > For their test system:
> >  >
> >  > isolcpus=24-47 intel_idle.max_cstate=0 processor.max_cstate=0
> >  > intel_pstate=disable nohz_full=24-47
> >  > rcu_nocbs=24-47 rcu_nocb_poll default_hugepagesz=1G hugepagesz=1G
> >  > hugepages=64 audit=0
> >  > nosoftlockup
> >  >
> >  > Using the tuned service (CPU partitioning profile) make this process
> >  > easier:
> >  >
> >  > https://tuned-project.org/
> >  >
> > Nice tutorial, thanks for sharing. I have checked it and configured our
> > server like this:
> >
> > isolcpus=12-19 intel_idle.max_cstate=0 processor.max_cstate=0
> > nohz_full=12-19 rcu_nocbs=12-19 intel_pstate=disable
> > default_hugepagesz=1G hugepagesz=1G hugepages=24 audit=0 nosoftlockup
> > intel_iommu=on iommu=pt rcu_nocb_poll
> >
> >
> > Even though our servers are NUMA-capable and NUMA-aware, we only have
> > one CPU installed in one socket.
> > And one CPU has 20 physical cores (40 threads), so I figured out to use
> > the "top-most" cores for DPDK/OVS, that's the reason of isolcpus=12-19
>
> You can never have too many cores.  On POWER systems I'll sometimes
> reserve 76 out of 80 available cores to improve overall throughput.
>
> >  > >
> >  > > ./usertools/dpdk-devbind.py --status
> >  > > Network devices using kernel driver
> >  > > ===================================
> >  > > 0000:b3:00.0 'MT27800 Family [ConnectX-5] 1017' if=ens2
> >  > drv=mlx5_core
> >  > > unused=igb_uio,vfio-pci
> >  > >
> >  > > Due to the way how Mellanox cards and their driver work, I have not
> >  > bond
> >  > > igb_uio to the interface, however, uio, igb_uio and vfio-pci kernel
> >  > modules
> >  > > are loaded.
> >  > >
> >  > >
> >  > > Relevant part of the VM-config for Qemu/KVM
> >  > > -------------------------------------------
> >  > >    <cputune>
> >  > >      <shares>4096</shares>
> >  > >      <vcpupin vcpu='0' cpuset='4'/>
> >  > >      <vcpupin vcpu='1' cpuset='5'/>
> >  >
> >  > Where did you get these CPU mapping values?  x86 systems typically
> >  > map
> >  > even-numbered CPUs to one NUMA node and odd-numbered CPUs to a
> >  > different
> >  > NUMA node.  You generally want to select CPUs from the same NUMA node
> >  > as
> >  > the mlx5 NIC you're using for DPDK.
> >  >
> >  > You should have at least 4 CPUs in the VM, selected according to the
> >  > NUMA topology of the system.
> > as per my answer above, our system has no secondary NUMA node, all
> > mappings are to the same socket/CPU.
> >
> >  >
> >  > Take a look at this bash script written for Red Hat:
> >  >
> >  >
> >
> https://github.com/ctrautma/RHEL_NIC_QUALIFICATION/blob/ansible/ansible/get_cpulist.sh
> >  >
> >  > It gives you a good starting reference which CPUs to select for the
> >  > OVS/DPDK and VM configurations on your particular system.  Also
> >  > review
> >  > the Ansible script pvp_ovsdpdk.yml, it provides a lot of other
> >  > useful
> >  > steps you might be able to apply to your Debian OS.
> >  >
> >  > >      <emulatorpin cpuset='4-5'/>
> >  > >    </cputune>
> >  > >    <cpu mode='host-model' check='partial'>
> >  > >      <model fallback='allow'/>
> >  > >      <topology sockets='2' cores='1' threads='1'/>
> >  > >      <numa>
> >  > >        <cell id='0' cpus='0-1' memory='4194304' unit='KiB'
> >  > > memAccess='shared'/>
> >  > >      </numa>
> >  > >    </cpu>
> >  > >      <interface type='vhostuser'>
> >  > >        <mac address='00:00:00:00:00:aa'/>
> >  > >        <source type='unix'
> >  > path='/usr/local/var/run/openvswitch/vhostuser'
> >  > > mo$
> >  > >        <model type='virtio'/>
> >  > >        <driver queues='2'>
> >  > >          <host mrg_rxbuf='on'/>
> >  >
> >  > Is there a requirement for mergeable RX buffers?  Some PMDs like
> >  > mlx5
> >  > can take advantage of SSE instructions when this is disabled,
> >  > yielding
> >  > better performance.
> > Good point, there is no requirement, I just took an example config and
> > though it's necessary for the driver queues setting.
>
> That's how we all learn :-)
>
> >  >
> >  > >        </driver>
> >  > >        <address type='pci' domain='0x0000' bus='0x07' slot='0x00'
> >  > > function='0x0'$
> >  > >      </interface>
> >  > >
> >  >
> >  > I don't see hugepage usage in the libvirt XML.  Something similar to:
> >  >
> >  >    <memory unit='KiB'>8388608</memory>
> >  >    <currentMemory unit='KiB'>8388608</currentMemory>
> >  >    <memoryBacking>
> >  >      <hugepages>
> >  >        <page size='1048576' unit='KiB' nodeset='0'/>
> >  >      </hugepages>
> >  >    </memoryBacking>
> > I did not copy this part of the XML, but we have hugepages configured
> > properly.
> >  >
> >  >
> >  > > -----------------------------------
> >  > > OVS Start Config
> >  > > -----------------------------------
> >  > > ovs-vsctl --no-wait set Open_vSwitch . other_config:dpdk-init=true
> >  > > ovs-vsctl --no-wait set Open_vSwitch . other_config:dpdk-socket-
> >  > mem="4096,0"
> >  > > ovs-vsctl --no-wait set Open_vSwitch . other_config:dpdk-lcore-
> >  > mask=0xff
> >  > > ovs-vsctl --no-wait set Open_vSwitch . other_config:pmd-cpu-mask=0e
> >  >
> >  > These two masks shouldn't overlap:
> >  >
> >
> https://developers.redhat.com/blog/2017/06/28/ovs-dpdk-parameters-dealing-with-multi-numa/
> >  >
> > Thanks, this did really help me understand in which order these
> > commands should be issued.
> >
> > So, the problem now is the following.
> > I did all the changes you shared, and started OVS/DPDK in a proper way
> > and set these features:
> >
> > ovs-vsctl --no-wait set Open_vSwitch . other_config:dpdk-socket-
> > mem="8192,0"
> >
> > ovs-vsctl --no-wait set Open_vSwitch . other_config:dpdk-lcore-
> > mask=0x01000
> >
> > ovs-vsctl --no-wait set Open_vSwitch . other_config:dpdk-init=true
> >
> > and, finally this:
> > ovs-vsctl --no-wait set Open_vSwitch . other_config:pmd-cpu-
> > mask=0x0e000
> >
> > The documentation you shared say this last one can be even set during
> > runtime. So, I was playing with it to see there is any change.
> >
> > I did not start any VM on top of OVS/DPDK, just set up a port forward
> > rule (in_port=1, actions=output:IN_PORT), since I only have one
> > physical ports on each mellanox card.
> > Then, I generated traffic from the other server towards OVS
> > Using pktsize 64B, the max throughput Pktgen reports is 8Gbps.
> > In particular, I got these metrics:
> > Size        Sent_pps       Recv_pps      Recv_Gbps
> > 64B           93M            11M             ~8
> > 128B          65M            12.5M           ~15
> > 256B          42.5M          12.3M           ~27
> > 512B          23.5M          11.9M           ~51
> > 1024B         11.9M          10M             ~83
> > 1280B         9.6M           8.3M            ~86
> > 1500B         8.3M           6.7M            ~82
> >
> > It's quite interesting that for 64B, the pps is less than for greater
> > sizes. Because PPS should be the practical limitation in throughput,
> > and according to the packet size we can count the throughput in Gbps.
>
> Looking at 64B performance gives you a sense of the per-packet overhead
> associated with the DPDK framework and your application.  At 100Gb/s
> line rate, 64B frames will arrive every 6.72ns.  Since your received PPS
> is peaking around 12.5MPPS I'd guess that it's taking about 80ns of CPU
> time per frame.  I don't know how well OVS scales with additional CPUs,
> something to look at.
>
> You don't mention how many different flows you're using in the test.
> Don't be surprised as throughput drops when you move from 1,000 flows to
> 1,000,000 flows.
>
> It's likely that most of your frame loss is due to the NICs RX buffers
> overflowing and dropping frames due to back pressure (i.e. DPDK/OVS
> can't process packets fast enough).  Look at the mlx5's hardware
> statistics to verify.
>
> You may be able to improve the performance by increasing the number of
> RX queues and RX descriptors per queue, and assigning more lcores to
> match the number of queues, allowing the work to be spread more evenly
> and reducing buffer overflows.  This often works when running testpmd
> alone since the app overhead is low but has less effect on OVS
> performance.  You might consider benchmarking testpmd alone vs OVS/DPDK
> to understand the OVS overhead.
>
> >
> > Anyway, OVS-DPDK have 3 cores to use, but only one rx queue is assigned
> > to the port (so, basically --- as `top` also shows --- it is the one-
> > core performance.
>
> Increasing the number of RX queues/descriptors and assigning a dedicated
> lcore to each queue will generally improve performance if your
> bottleneck is RX in the PMD.
>
> > Increasing the cores did not help, and the performance remained the
> > same. Is this performance normal for OVS/DPDK?
>
> That's been my experience, though there are others who have more
> experience with performance testing OVS.  The platform matters.  Look
> for existing whitepapers and compare your system configuration to theirs
> to see what you need to achieve the performance you're looking for.
>
> Dave
>


-- 

Vipul Ujawane <https://vipul999ujawane.github.io/>
Pre-Final Year Undergraduate
Department of Industrial and Systems Engineering
Indian Institute of Technology, Kharagpur

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [dpdk-users] Poor performance when using OVS with DPDK
  2020-06-26  9:39   ` Vipul Ujawane
@ 2020-06-26 19:32     ` David Christensen
  2020-06-29  8:33       ` Vipul Ujawane
  0 siblings, 1 reply; 7+ messages in thread
From: David Christensen @ 2020-06-26 19:32 UTC (permalink / raw)
  To: Vipul Ujawane; +Cc: users

>  > Why don't you reserve any CPUs for OVS/DPDK or VM usage?  All
>  > published
>  > performance white papers recommend settings for CPU isolation like
>  > this
>  > Mellanox DPDK performance report:
>  >
>  > 
> https://fast.dpdk.org/doc/perf/DPDK_19_08_Mellanox_NIC_performance_report.pdf 
>  >
>  > For their test system:
>  >
>  > isolcpus=24-47 intel_idle.max_cstate=0 processor.max_cstate=0
>  > intel_pstate=disable nohz_full=24-47
>  > rcu_nocbs=24-47 rcu_nocb_poll default_hugepagesz=1G hugepagesz=1G
>  > hugepages=64 audit=0
>  > nosoftlockup
>  >
>  > Using the tuned service (CPU partitioning profile) make this process
>  > easier:
>  >
>  > https://tuned-project.org/ 
>  >
> Nice tutorial, thanks for sharing. I have checked it and configured our
> server like this:
> 
> isolcpus=12-19 intel_idle.max_cstate=0 processor.max_cstate=0
> nohz_full=12-19 rcu_nocbs=12-19 intel_pstate=disable
> default_hugepagesz=1G hugepagesz=1G hugepages=24 audit=0 nosoftlockup
> intel_iommu=on iommu=pt rcu_nocb_poll
> 
> 
> Even though our servers are NUMA-capable and NUMA-aware, we only have
> one CPU installed in one socket.
> And one CPU has 20 physical cores (40 threads), so I figured out to use
> the "top-most" cores for DPDK/OVS, that's the reason of isolcpus=12-19

You can never have too many cores.  On POWER systems I'll sometimes 
reserve 76 out of 80 available cores to improve overall throughput.

>  > >
>  > > ./usertools/dpdk-devbind.py --status
>  > > Network devices using kernel driver
>  > > ===================================
>  > > 0000:b3:00.0 'MT27800 Family [ConnectX-5] 1017' if=ens2
>  > drv=mlx5_core
>  > > unused=igb_uio,vfio-pci
>  > >
>  > > Due to the way how Mellanox cards and their driver work, I have not
>  > bond
>  > > igb_uio to the interface, however, uio, igb_uio and vfio-pci kernel
>  > modules
>  > > are loaded.
>  > >
>  > >
>  > > Relevant part of the VM-config for Qemu/KVM
>  > > -------------------------------------------
>  > >    <cputune>
>  > >      <shares>4096</shares>
>  > >      <vcpupin vcpu='0' cpuset='4'/>
>  > >      <vcpupin vcpu='1' cpuset='5'/>
>  >
>  > Where did you get these CPU mapping values?  x86 systems typically
>  > map
>  > even-numbered CPUs to one NUMA node and odd-numbered CPUs to a
>  > different
>  > NUMA node.  You generally want to select CPUs from the same NUMA node
>  > as
>  > the mlx5 NIC you're using for DPDK.
>  >
>  > You should have at least 4 CPUs in the VM, selected according to the
>  > NUMA topology of the system.
> as per my answer above, our system has no secondary NUMA node, all
> mappings are to the same socket/CPU.
> 
>  >
>  > Take a look at this bash script written for Red Hat:
>  >
>  > 
> https://github.com/ctrautma/RHEL_NIC_QUALIFICATION/blob/ansible/ansible/get_cpulist.sh 
>  >
>  > It gives you a good starting reference which CPUs to select for the
>  > OVS/DPDK and VM configurations on your particular system.  Also
>  > review
>  > the Ansible script pvp_ovsdpdk.yml, it provides a lot of other
>  > useful
>  > steps you might be able to apply to your Debian OS.
>  >
>  > >      <emulatorpin cpuset='4-5'/>
>  > >    </cputune>
>  > >    <cpu mode='host-model' check='partial'>
>  > >      <model fallback='allow'/>
>  > >      <topology sockets='2' cores='1' threads='1'/>
>  > >      <numa>
>  > >        <cell id='0' cpus='0-1' memory='4194304' unit='KiB'
>  > > memAccess='shared'/>
>  > >      </numa>
>  > >    </cpu>
>  > >      <interface type='vhostuser'>
>  > >        <mac address='00:00:00:00:00:aa'/>
>  > >        <source type='unix'
>  > path='/usr/local/var/run/openvswitch/vhostuser'
>  > > mo$
>  > >        <model type='virtio'/>
>  > >        <driver queues='2'>
>  > >          <host mrg_rxbuf='on'/>
>  >
>  > Is there a requirement for mergeable RX buffers?  Some PMDs like
>  > mlx5
>  > can take advantage of SSE instructions when this is disabled,
>  > yielding
>  > better performance.
> Good point, there is no requirement, I just took an example config and
> though it's necessary for the driver queues setting.

That's how we all learn :-)

>  >
>  > >        </driver>
>  > >        <address type='pci' domain='0x0000' bus='0x07' slot='0x00'
>  > > function='0x0'$
>  > >      </interface>
>  > >
>  >
>  > I don't see hugepage usage in the libvirt XML.  Something similar to:
>  >
>  >    <memory unit='KiB'>8388608</memory>
>  >    <currentMemory unit='KiB'>8388608</currentMemory>
>  >    <memoryBacking>
>  >      <hugepages>
>  >        <page size='1048576' unit='KiB' nodeset='0'/>
>  >      </hugepages>
>  >    </memoryBacking>
> I did not copy this part of the XML, but we have hugepages configured
> properly.
>  >
>  >
>  > > -----------------------------------
>  > > OVS Start Config
>  > > -----------------------------------
>  > > ovs-vsctl --no-wait set Open_vSwitch . other_config:dpdk-init=true
>  > > ovs-vsctl --no-wait set Open_vSwitch . other_config:dpdk-socket-
>  > mem="4096,0"
>  > > ovs-vsctl --no-wait set Open_vSwitch . other_config:dpdk-lcore-
>  > mask=0xff
>  > > ovs-vsctl --no-wait set Open_vSwitch . other_config:pmd-cpu-mask=0e
>  >
>  > These two masks shouldn't overlap:
>  > 
> https://developers.redhat.com/blog/2017/06/28/ovs-dpdk-parameters-dealing-with-multi-numa/ 
>  >
> Thanks, this did really help me understand in which order these
> commands should be issued.
> 
> So, the problem now is the following.
> I did all the changes you shared, and started OVS/DPDK in a proper way
> and set these features:
> 
> ovs-vsctl --no-wait set Open_vSwitch . other_config:dpdk-socket-
> mem="8192,0"
> 
> ovs-vsctl --no-wait set Open_vSwitch . other_config:dpdk-lcore-
> mask=0x01000
> 
> ovs-vsctl --no-wait set Open_vSwitch . other_config:dpdk-init=true
> 
> and, finally this:
> ovs-vsctl --no-wait set Open_vSwitch . other_config:pmd-cpu-
> mask=0x0e000
> 
> The documentation you shared say this last one can be even set during
> runtime. So, I was playing with it to see there is any change.
> 
> I did not start any VM on top of OVS/DPDK, just set up a port forward
> rule (in_port=1, actions=output:IN_PORT), since I only have one
> physical ports on each mellanox card.
> Then, I generated traffic from the other server towards OVS
> Using pktsize 64B, the max throughput Pktgen reports is 8Gbps.
> In particular, I got these metrics:
> Size        Sent_pps       Recv_pps      Recv_Gbps
> 64B           93M            11M             ~8
> 128B          65M            12.5M           ~15
> 256B          42.5M          12.3M           ~27
> 512B          23.5M          11.9M           ~51
> 1024B         11.9M          10M             ~83
> 1280B         9.6M           8.3M            ~86
> 1500B         8.3M           6.7M            ~82
> 
> It's quite interesting that for 64B, the pps is less than for greater
> sizes. Because PPS should be the practical limitation in throughput,
> and according to the packet size we can count the throughput in Gbps.

Looking at 64B performance gives you a sense of the per-packet overhead 
associated with the DPDK framework and your application.  At 100Gb/s 
line rate, 64B frames will arrive every 6.72ns.  Since your received PPS 
is peaking around 12.5MPPS I'd guess that it's taking about 80ns of CPU 
time per frame.  I don't know how well OVS scales with additional CPUs, 
something to look at.
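
(Spelling that arithmetic out: a 64B frame plus 20B of preamble and
inter-frame gap is 84B = 672 bits on the wire, and 672 bits / 100 Gb/s =
6.72 ns per frame, while 1 / 12.5 Mpps = 80 ns of processing time per
received frame.)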

You don't mention how many different flows you're using in the test. 
Don't be surprised if throughput drops when you move from 1,000 flows to 
1,000,000 flows.

It's likely that most of your frame loss is due to the NIC's RX buffers 
overflowing and dropping frames due to back pressure (i.e. DPDK/OVS 
can't process packets fast enough).  Look at the mlx5's hardware 
statistics to verify.
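
A rough way to check that (counter names vary with the driver and firmware
version) is the NIC's ethtool statistics, e.g.:

$ ethtool -S ens2 | grep -iE 'out_of_buffer|discard|drop'

A steadily growing rx_out_of_buffer counter would mean the host is not
draining the RX queues fast enough.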

You may be able to improve the performance by increasing the number of 
RX queues and RX descriptors per queue, and assigning more lcores to 
match the number of queues, allowing the work to be spread more evenly 
and reducing buffer overflows.  This often works when running testpmd 
alone since the app overhead is low but has less effect on OVS 
performance.  You might consider benchmarking testpmd alone vs OVS/DPDK 
to understand the OVS overhead.
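
A sketch of what that could look like here (values are only examples;
n_rxq_desc must be a power of two and the pmd-cpu-mask has to cover enough
isolated cores to service the extra queues):

$ ovs-vsctl set Interface dpdk0 options:n_rxq=4
$ ovs-vsctl set Interface dpdk0 options:n_rxq_desc=2048
$ ovs-vsctl set Open_vSwitch . other_config:pmd-cpu-mask=0x1e000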

> 
> Anyway, OVS-DPDK have 3 cores to use, but only one rx queue is assigned
> to the port (so, basically --- as `top` also shows --- it is the one-
> core performance.

Increasing the number of RX queues/descriptors and assigning a dedicated 
lcore to each queue will generally improve performance if your 
bottleneck is RX in the PMD.

> Increasing the cores did not help, and the performance remained the
> same. Is this performance normal for OVS/DPDK?

That's been my experience, though there are others who have more 
experience with performance testing OVS.  The platform matters.  Look 
for existing whitepapers and compare your system configuration to theirs 
to see what you need to achieve the performance you're looking for.

Dave

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [dpdk-users] Poor performance when using OVS with DPDK
  2020-06-25 18:03 ` David Christensen
@ 2020-06-26  9:39   ` Vipul Ujawane
  2020-06-26 19:32     ` David Christensen
  0 siblings, 1 reply; 7+ messages in thread
From: Vipul Ujawane @ 2020-06-26  9:39 UTC (permalink / raw)
  To: David Christensen; +Cc: users

On Fri, 2020-06-26 at 02:08 +0800, Vipul Ujawane wrote:
>
>
> ---------- Forwarded message ---------
> From: David Christensen <drc@linux.vnet.ibm.com>
> Date: Fri, Jun 26, 2020, 02:03
> Subject: Re: [dpdk-users] Poor performance when using OVS with DPDK
> To: Vipul Ujawane <vipul999ujawane@gmail.com>, <users@dpdk.org>
>
>
>
>
> On 6/24/20 4:03 AM, Vipul Ujawane wrote:
> > Dear all,
> >
> > I am observing a very low performance when running OVS-DPDK when
> compared
> > to OVS running with the Kernel Datapath.
> > I have OvS version 2.13.90 compiled from source with the latest
> stable DPDK
> > v19.11.3 on a stable Debian system running kernel 4.19.0-9-amd64
> (real
> > version:4.19.118).
> >
> > I have tried to use the latest released OvS as well (2.12) with the
> same
> > LTS DPDK. As a last resort, I have tried an older kernel, whether
> it has
> > any problem (4.19.0-8-amd64 (real version:4.19.98)).
> >
> > I have not been able to troubleshoot the problem, and kindly
> request your
> > help regarding the same.
> >
> > HW configuration
> > ================
> > We have to two totally identical servers (Debian stable, Intel(R)
> Xeon(R)
> > Gold 6230 CPU, 96G Mem), each runs KVM virtual machine. On the
> hypervisor
> > layer, we have OvS for traffic routing. The servers are connected
> directly
> > via a Mellanox ConnectX-5 (1x100G).
> > OVS Forwarding tables are configured for simple port-forwarding
> only to
> > avoid any packet processing-related issue.
> >
> > Problem
> > =======
> > When both servers are running OVS-Kernel at the hypervisor layer
> and VMs
> > are connected to it via libvirt and virtio interfaces, the
> > VM->Server1->Server2->VM throughput is around 16-18Gbps.
> > However, when using OVS-DPDK with the same setting, the throughput
> drops
> > down to 4-6Gbps.
>
> You don't mention the traffic profile.  I assume 64 byte frames but
> best
> to be explicit.

Sure, sorry about that! We used iperf (MTU-sized packets), and the
resulting throughput was 4-6 Gbps.
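
(A single TCP stream often cannot fill a 100G link on its own; something
like "iperf3 -c <address of the second VM> -P 4 -t 30" would show whether
multiple parallel streams change the picture.)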


> >
> > SW/driver configurations:
> > ==================
> > DPDK
> > ----
> > In config common_base, besides the defaults, I have enabled the
> following
> > extra drivers/features to be compiled/enabled.
> > CONFIG_RTE_LIBRTE_MLX5_PMD=y
> > CONFIG_RTE_LIBRTE_VHOST=y
> > CONFIG_RTE_LIBRTE_VHOST_NUMA=y
> > CONFIG_RTE_LIBRTE_PMD_VHOST=y
> > CONFIG_RTE_VIRTIO_USER=n
> > CONFIG_RTE_EAL_VFIO=y
> >
> >
> > OVS
> > ---
> > $ovs-vswitchd --version
> > ovs-vswitchd (Open vSwitch) 2.13.90
> >
> > $sudo ovs-vsctl get Open_vSwitch . dpdk_initialized
> > true
> >
> > $sudo ovs-vsctl get Open_vSwitch . dpdk_version
> > "DPDK 19.11.3"
> >
> > OS settings
> > -----------
> > $ lsb_release -a
> > No LSB modules are available.
> > Distributor ID: Debian
> > Description: Debian GNU/Linux 10 (buster)
> > Release: 10
> > Codename: buster
> >
> >
> > $ cat /proc/cmdline
> > BOOT_IMAGE=/vmlinuz-4.19.0-9-amd64 root=/dev/mapper/Volume0-debian-
> -stable
> > ro default_hugepagesz=1G hugepagesz=1G hugepages=16 intel_iommu=on
> iommu=pt
> > quiet
>
> Why don't you reserve any CPUs for OVS/DPDK or VM usage?  All
> published
> performance white papers recommend settings for CPU isolation like
> this
> Mellanox DPDK performance report:
>
>
https://fast.dpdk.org/doc/perf/DPDK_19_08_Mellanox_NIC_performance_report.pdf
>
> For their test system:
>
> isolcpus=24-47 intel_idle.max_cstate=0 processor.max_cstate=0
> intel_pstate=disable nohz_full=24-47
> rcu_nocbs=24-47 rcu_nocb_poll default_hugepagesz=1G hugepagesz=1G
> hugepages=64 audit=0
> nosoftlockup
>
> Using the tuned service (CPU partitioning profile) make this process
> easier:
>
> https://tuned-project.org/
>
Nice tutorial, thanks for sharing. I have checked it and configured our
server like this:

isolcpus=12-19 intel_idle.max_cstate=0 processor.max_cstate=0
nohz_full=12-19 rcu_nocbs=12-19 intel_pstate=disable
default_hugepagesz=1G hugepagesz=1G hugepages=24 audit=0 nosoftlockup
intel_iommu=on iommu=pt rcu_nocb_poll


Even though our servers are NUMA-capable and NUMA-aware, we only have
one CPU installed in one socket.
And one CPU has 20 physical cores (40 threads), so I decided to use
the "top-most" cores for DPDK/OVS; that's the reason for isolcpus=12-19.

> >
> > ./usertools/dpdk-devbind.py --status
> > Network devices using kernel driver
> > ===================================
> > 0000:b3:00.0 'MT27800 Family [ConnectX-5] 1017' if=ens2
> drv=mlx5_core
> > unused=igb_uio,vfio-pci
> >
> > Due to the way how Mellanox cards and their driver work, I have not
> bond
> > igb_uio to the interface, however, uio, igb_uio and vfio-pci kernel
> modules
> > are loaded.
> >
> >
> > Relevant part of the VM-config for Qemu/KVM
> > -------------------------------------------
> >    <cputune>
> >      <shares>4096</shares>
> >      <vcpupin vcpu='0' cpuset='4'/>
> >      <vcpupin vcpu='1' cpuset='5'/>
>
> Where did you get these CPU mapping values?  x86 systems typically
> map
> even-numbered CPUs to one NUMA node and odd-numbered CPUs to a
> different
> NUMA node.  You generally want to select CPUs from the same NUMA node
> as
> the mlx5 NIC you're using for DPDK.
>
> You should have at least 4 CPUs in the VM, selected according to the
> NUMA topology of the system.
As per my answer above, our system has no second NUMA node; all
mappings are to the same socket/CPU.

>
> Take a look at this bash script written for Red Hat:
>
>
https://github.com/ctrautma/RHEL_NIC_QUALIFICATION/blob/ansible/ansible/get_cpulist.sh
>
> It gives you a good starting reference which CPUs to select for the
> OVS/DPDK and VM configurations on your particular system.  Also
> review
> the Ansible script pvp_ovsdpdk.yml, it provides a lot of other
> useful
> steps you might be able to apply to your Debian OS.
>
> >      <emulatorpin cpuset='4-5'/>
> >    </cputune>
> >    <cpu mode='host-model' check='partial'>
> >      <model fallback='allow'/>
> >      <topology sockets='2' cores='1' threads='1'/>
> >      <numa>
> >        <cell id='0' cpus='0-1' memory='4194304' unit='KiB'
> > memAccess='shared'/>
> >      </numa>
> >    </cpu>
> >      <interface type='vhostuser'>
> >        <mac address='00:00:00:00:00:aa'/>
> >        <source type='unix'
> path='/usr/local/var/run/openvswitch/vhostuser'
> > mo$
> >        <model type='virtio'/>
> >        <driver queues='2'>
> >          <host mrg_rxbuf='on'/>
>
> Is there a requirement for mergeable RX buffers?  Some PMDs like
> mlx5
> can take advantage of SSE instructions when this is disabled,
> yielding
> better performance.
Good point, there is no requirement; I just took an example config and
thought it was necessary for the driver queues setting.
>
> >        </driver>
> >        <address type='pci' domain='0x0000' bus='0x07' slot='0x00'
> > function='0x0'$
> >      </interface>
> >
>
> I don't see hugepage usage in the libvirt XML.  Something similar to:
>
>    <memory unit='KiB'>8388608</memory>
>    <currentMemory unit='KiB'>8388608</currentMemory>
>    <memoryBacking>
>      <hugepages>
>        <page size='1048576' unit='KiB' nodeset='0'/>
>      </hugepages>
>    </memoryBacking>
I did not copy this part of the XML, but we have hugepages configured
properly.
>
>
> > -----------------------------------
> > OVS Start Config
> > -----------------------------------
> > ovs-vsctl --no-wait set Open_vSwitch . other_config:dpdk-init=true
> > ovs-vsctl --no-wait set Open_vSwitch . other_config:dpdk-socket-
> mem="4096,0"
> > ovs-vsctl --no-wait set Open_vSwitch . other_config:dpdk-lcore-
> mask=0xff
> > ovs-vsctl --no-wait set Open_vSwitch . other_config:pmd-cpu-mask=0e
>
> These two masks shouldn't overlap:
>
https://developers.redhat.com/blog/2017/06/28/ovs-dpdk-parameters-dealing-with-multi-numa/
>
Thanks, this really helped me understand in which order these
commands should be issued.

So, the problem now is the following.
I did all the changes you shared, started OVS/DPDK in the proper order,
and set these options:

ovs-vsctl --no-wait set Open_vSwitch . other_config:dpdk-socket-
mem="8192,0"

ovs-vsctl --no-wait set Open_vSwitch . other_config:dpdk-lcore-
mask=0x01000

ovs-vsctl --no-wait set Open_vSwitch . other_config:dpdk-init=true

and, finally this:
ovs-vsctl --no-wait set Open_vSwitch . other_config:pmd-cpu-
mask=0x0e000

The documentation you shared says this last one can even be set at
runtime, so I was playing with it to see whether there is any change.

I did not start any VM on top of OVS/DPDK, just set up a port-forward
rule (in_port=1, actions=output:IN_PORT), since I only have one
physical port on each Mellanox card.
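
For reference, that rule was installed roughly as follows (assuming the
bridge name ovsbr from the earlier setup):

$ sudo ovs-ofctl del-flows ovsbr
$ sudo ovs-ofctl add-flow ovsbr "in_port=1,actions=output:IN_PORT"
$ sudo ovs-ofctl dump-flows ovsbr
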
Then, I generated traffic from the other server towards OVS.
Using pktsize 64B, the max throughput Pktgen reports is 8 Gbps.
In particular, I got these metrics:
Size        Sent_pps       Recv_pps      Recv_Gbps
64B           93M            11M             ~8
128B          65M            12.5M           ~15
256B          42.5M          12.3M           ~27
512B          23.5M          11.9M           ~51
1024B         11.9M          10M             ~83
1280B         9.6M           8.3M            ~86
1500B         8.3M           6.7M            ~82

It's quite interesting that for 64B the pps is lower than for larger
sizes, because PPS should be the practical limitation on throughput, and
from the packet size we can compute the throughput in Gbps.

Anyway, OVS-DPDK has 3 cores to use, but only one rx queue is assigned
to the port (so, basically --- as `top` also shows --- it is the
one-core performance).

Increasing the cores did not help, and the performance remained the
same. Is this performance normal for OVS/DPDK?

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [dpdk-users] Poor performance when using OVS with DPDK
  2020-06-24 11:03 Vipul Ujawane
@ 2020-06-25 18:03 ` David Christensen
  2020-06-26  9:39   ` Vipul Ujawane
  0 siblings, 1 reply; 7+ messages in thread
From: David Christensen @ 2020-06-25 18:03 UTC (permalink / raw)
  To: Vipul Ujawane, users



On 6/24/20 4:03 AM, Vipul Ujawane wrote:
> Dear all,
> 
> I am observing a very low performance when running OVS-DPDK when compared
> to OVS running with the Kernel Datapath.
> I have OvS version 2.13.90 compiled from source with the latest stable DPDK
> v19.11.3 on a stable Debian system running kernel 4.19.0-9-amd64 (real
> version:4.19.118).
> 
> I have tried to use the latest released OvS as well (2.12) with the same
> LTS DPDK. As a last resort, I have tried an older kernel, whether it has
> any problem (4.19.0-8-amd64 (real version:4.19.98)).
> 
> I have not been able to troubleshoot the problem, and kindly request your
> help regarding the same.
> 
> HW configuration
> ================
> We have to two totally identical servers (Debian stable, Intel(R) Xeon(R)
> Gold 6230 CPU, 96G Mem), each runs KVM virtual machine. On the hypervisor
> layer, we have OvS for traffic routing. The servers are connected directly
> via a Mellanox ConnectX-5 (1x100G).
> OVS Forwarding tables are configured for simple port-forwarding only to
> avoid any packet processing-related issue.
> 
> Problem
> =======
> When both servers are running OVS-Kernel at the hypervisor layer and VMs
> are connected to it via libvirt and virtio interfaces, the
> VM->Server1->Server2->VM throughput is around 16-18Gbps.
> However, when using OVS-DPDK with the same setting, the throughput drops
> down to 4-6Gbps.

You don't mention the traffic profile.  I assume 64 byte frames but best 
to be explicit.

> 
> SW/driver configurations:
> ==================
> DPDK
> ----
> In config common_base, besides the defaults, I have enabled the following
> extra drivers/features to be compiled/enabled.
> CONFIG_RTE_LIBRTE_MLX5_PMD=y
> CONFIG_RTE_LIBRTE_VHOST=y
> CONFIG_RTE_LIBRTE_VHOST_NUMA=y
> CONFIG_RTE_LIBRTE_PMD_VHOST=y
> CONFIG_RTE_VIRTIO_USER=n
> CONFIG_RTE_EAL_VFIO=y
> 
> 
> OVS
> ---
> $ovs-vswitchd --version
> ovs-vswitchd (Open vSwitch) 2.13.90
> 
> $sudo ovs-vsctl get Open_vSwitch . dpdk_initialized
> true
> 
> $sudo ovs-vsctl get Open_vSwitch . dpdk_version
> "DPDK 19.11.3"
> 
> OS settings
> -----------
> $ lsb_release -a
> No LSB modules are available.
> Distributor ID: Debian
> Description: Debian GNU/Linux 10 (buster)
> Release: 10
> Codename: buster
> 
> 
> $ cat /proc/cmdline
> BOOT_IMAGE=/vmlinuz-4.19.0-9-amd64 root=/dev/mapper/Volume0-debian--stable
> ro default_hugepagesz=1G hugepagesz=1G hugepages=16 intel_iommu=on iommu=pt
> quiet

Why don't you reserve any CPUs for OVS/DPDK or VM usage?  All published 
performance white papers recommend settings for CPU isolation like this 
Mellanox DPDK performance report:

https://fast.dpdk.org/doc/perf/DPDK_19_08_Mellanox_NIC_performance_report.pdf

For their test system:

isolcpus=24-47 intel_idle.max_cstate=0 processor.max_cstate=0 
intel_pstate=disable nohz_full=24-47
rcu_nocbs=24-47 rcu_nocb_poll default_hugepagesz=1G hugepagesz=1G 
hugepages=64 audit=0
nosoftlockup

Using the tuned service (CPU partitioning profile) makes this process easier:

https://tuned-project.org/
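
On a Red Hat-style system the steps are roughly the following (package and
profile names may differ on Debian):

# echo 'isolated_cores=<cores reserved for OVS/DPDK and the VM>' >> /etc/tuned/cpu-partitioning-variables.conf
# tuned-adm profile cpu-partitioning
# reboot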

> 
> ./usertools/dpdk-devbind.py --status
> Network devices using kernel driver
> ===================================
> 0000:b3:00.0 'MT27800 Family [ConnectX-5] 1017' if=ens2 drv=mlx5_core
> unused=igb_uio,vfio-pci
> 
> Due to the way how Mellanox cards and their driver work, I have not bond
> igb_uio to the interface, however, uio, igb_uio and vfio-pci kernel modules
> are loaded.
> 
> 
> Relevant part of the VM-config for Qemu/KVM
> -------------------------------------------
>    <cputune>
>      <shares>4096</shares>
>      <vcpupin vcpu='0' cpuset='4'/>
>      <vcpupin vcpu='1' cpuset='5'/>

Where did you get these CPU mapping values?  x86 systems typically map 
even-numbered CPUs to one NUMA node and odd-numbered CPUs to a different 
NUMA node.  You generally want to select CPUs from the same NUMA node as 
the mlx5 NIC you're using for DPDK.

You should have at least 4 CPUs in the VM, selected according to the 
NUMA topology of the system.
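
A quick way to check which NUMA node the NIC and the CPUs belong to (adjust
the PCI address to your card):

$ cat /sys/bus/pci/devices/0000:b3:00.0/numa_node
$ lscpu | grep -i numa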

Take a look at this bash script written for Red Hat:

https://github.com/ctrautma/RHEL_NIC_QUALIFICATION/blob/ansible/ansible/get_cpulist.sh

It gives you a good starting reference for which CPUs to select for the 
OVS/DPDK and VM configurations on your particular system.  Also review 
the Ansible script pvp_ovsdpdk.yml, it provides a lot of other useful 
steps you might be able to apply to your Debian OS.

>      <emulatorpin cpuset='4-5'/>
>    </cputune>
>    <cpu mode='host-model' check='partial'>
>      <model fallback='allow'/>
>      <topology sockets='2' cores='1' threads='1'/>
>      <numa>
>        <cell id='0' cpus='0-1' memory='4194304' unit='KiB'
> memAccess='shared'/>
>      </numa>
>    </cpu>
>      <interface type='vhostuser'>
>        <mac address='00:00:00:00:00:aa'/>
>        <source type='unix' path='/usr/local/var/run/openvswitch/vhostuser'
> mo$
>        <model type='virtio'/>
>        <driver queues='2'>
>          <host mrg_rxbuf='on'/>

Is there a requirement for mergeable RX buffers?  Some PMDs like mlx5 
can take advantage of SSE instructions when this is disabled, yielding 
better performance.

>        </driver>
>        <address type='pci' domain='0x0000' bus='0x07' slot='0x00'
> function='0x0'$
>      </interface>
> 

I don't see hugepage usage in the libvirt XML.  Something similar to:

   <memory unit='KiB'>8388608</memory>
   <currentMemory unit='KiB'>8388608</currentMemory>
   <memoryBacking>
     <hugepages>
       <page size='1048576' unit='KiB' nodeset='0'/>
     </hugepages>
   </memoryBacking>


> -----------------------------------
> OVS Start Config
> -----------------------------------
> ovs-vsctl --no-wait set Open_vSwitch . other_config:dpdk-init=true
> ovs-vsctl --no-wait set Open_vSwitch . other_config:dpdk-socket-mem="4096,0"
> ovs-vsctl --no-wait set Open_vSwitch . other_config:dpdk-lcore-mask=0xff
> ovs-vsctl --no-wait set Open_vSwitch . other_config:pmd-cpu-mask=0e

These two masks shouldn't overlap:
https://developers.redhat.com/blog/2017/06/28/ovs-dpdk-parameters-dealing-with-multi-numa/
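
One possible non-overlapping split, purely as an example (assuming, say,
cores 12-19 were the ones isolated for OVS/DPDK):

ovs-vsctl --no-wait set Open_vSwitch . other_config:dpdk-lcore-mask=0x01000   # core 12
ovs-vsctl --no-wait set Open_vSwitch . other_config:pmd-cpu-mask=0x0e000      # cores 13-15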

> ovs-vsctl add-port ovsbr dpdk0 -- set Interface dpdk0 type=dpdk
> options:dpdk-devargs=0000:b3:00.0
> ovs-vsctl set interface dpdk0 options:n_rxq=2
> ovs-vsctl add-port ovsbr vhost-vm -- set Interface vhostuser
> type=dpdkvhostuser
> 
> 
> 
> -------------------------------------------------------
> $cat /proc/cmdline
> BOOT_IMAGE=/vmlinuz-4.19.0-9-amd64 root=/dev/mapper/Volume0-debian--stable
> ro default_hugepagesz=1G hugepagesz=1G hugepages=16 intel_iommu=on iommu=pt
> quiet
> 
> 
> Is there anything I should be aware of the versions and setting I am using?
> Did I compile DPDK and/or OvS in a wrong way?
> 
> Thank you for your kind help ;)
> 

^ permalink raw reply	[flat|nested] 7+ messages in thread

* [dpdk-users] Poor performance when using OVS with DPDK
@ 2020-06-24 11:03 Vipul Ujawane
  2020-06-25 18:03 ` David Christensen
  0 siblings, 1 reply; 7+ messages in thread
From: Vipul Ujawane @ 2020-06-24 11:03 UTC (permalink / raw)
  To: users

Dear all,

I am observing a very low performance when running OVS-DPDK when compared
to OVS running with the Kernel Datapath.
I have OvS version 2.13.90 compiled from source with the latest stable DPDK
v19.11.3 on a stable Debian system running kernel 4.19.0-9-amd64 (real
version:4.19.118).

I have also tried the latest released OvS (2.12) with the same LTS DPDK. As
a last resort, I have tried an older kernel (4.19.0-8-amd64, real version
4.19.98) to check whether the kernel version was the problem.

I have not been able to troubleshoot the problem, and kindly request your
help regarding the same.

HW configuration
================
We have two identical servers (Debian stable, Intel(R) Xeon(R)
Gold 6230 CPU, 96G Mem), each running a KVM virtual machine. On the
hypervisor layer, we have OvS for traffic routing. The servers are
connected directly via a Mellanox ConnectX-5 (1x100G).
The OVS forwarding tables are configured for simple port forwarding only,
to avoid any packet-processing-related issues.

Problem
=======
When both servers are running OVS-Kernel at the hypervisor layer and VMs
are connected to it via libvirt and virtio interfaces, the
VM->Server1->Server2->VM throughput is around 16-18Gbps.
However, when using OVS-DPDK with the same setting, the throughput drops
down to 4-6Gbps.


SW/driver configurations:
==================
DPDK
----
In config/common_base, besides the defaults, I have enabled the following
extra drivers/features to be compiled in:
CONFIG_RTE_LIBRTE_MLX5_PMD=y
CONFIG_RTE_LIBRTE_VHOST=y
CONFIG_RTE_LIBRTE_VHOST_NUMA=y
CONFIG_RTE_LIBRTE_PMD_VHOST=y
CONFIG_RTE_VIRTIO_USER=n
CONFIG_RTE_EAL_VFIO=y


OVS
---
$ovs-vswitchd --version
ovs-vswitchd (Open vSwitch) 2.13.90

$sudo ovs-vsctl get Open_vSwitch . dpdk_initialized
true

$sudo ovs-vsctl get Open_vSwitch . dpdk_version
"DPDK 19.11.3"

OS settings
-----------
$ lsb_release -a
No LSB modules are available.
Distributor ID: Debian
Description: Debian GNU/Linux 10 (buster)
Release: 10
Codename: buster


$ cat /proc/cmdline
BOOT_IMAGE=/vmlinuz-4.19.0-9-amd64 root=/dev/mapper/Volume0-debian--stable
ro default_hugepagesz=1G hugepagesz=1G hugepages=16 intel_iommu=on iommu=pt
quiet

./usertools/dpdk-devbind.py --status
Network devices using kernel driver
===================================
0000:b3:00.0 'MT27800 Family [ConnectX-5] 1017' if=ens2 drv=mlx5_core
unused=igb_uio,vfio-pci

Due to the way Mellanox cards and their driver work, I have not bound
igb_uio to the interface; however, the uio, igb_uio and vfio-pci kernel
modules are loaded.


Relevant part of the VM-config for Qemu/KVM
-------------------------------------------
  <cputune>
    <shares>4096</shares>
    <vcpupin vcpu='0' cpuset='4'/>
    <vcpupin vcpu='1' cpuset='5'/>
    <emulatorpin cpuset='4-5'/>
  </cputune>
  <cpu mode='host-model' check='partial'>
    <model fallback='allow'/>
    <topology sockets='2' cores='1' threads='1'/>
    <numa>
      <cell id='0' cpus='0-1' memory='4194304' unit='KiB'
memAccess='shared'/>
    </numa>
  </cpu>
    <interface type='vhostuser'>
      <mac address='00:00:00:00:00:aa'/>
      <source type='unix' path='/usr/local/var/run/openvswitch/vhostuser'
mo$
      <model type='virtio'/>
      <driver queues='2'>
        <host mrg_rxbuf='on'/>
      </driver>
      <address type='pci' domain='0x0000' bus='0x07' slot='0x00'
function='0x0'$
    </interface>

-----------------------------------
OVS Start Config
-----------------------------------
ovs-vsctl --no-wait set Open_vSwitch . other_config:dpdk-init=true
ovs-vsctl --no-wait set Open_vSwitch . other_config:dpdk-socket-mem="4096,0"
ovs-vsctl --no-wait set Open_vSwitch . other_config:dpdk-lcore-mask=0xff
ovs-vsctl --no-wait set Open_vSwitch . other_config:pmd-cpu-mask=0e
ovs-vsctl add-port ovsbr dpdk0 -- set Interface dpdk0 type=dpdk
options:dpdk-devargs=0000:b3:00.0
ovs-vsctl set interface dpdk0 options:n_rxq=2
ovs-vsctl add-port ovsbr vhost-vm -- set Interface vhostuser
type=dpdkvhostuser



-------------------------------------------------------
$cat /proc/cmdline
BOOT_IMAGE=/vmlinuz-4.19.0-9-amd64 root=/dev/mapper/Volume0-debian--stable
ro default_hugepagesz=1G hugepagesz=1G hugepages=16 intel_iommu=on iommu=pt
quiet


Is there anything I should be aware of regarding the versions and settings
I am using? Did I compile DPDK and/or OvS in the wrong way?

Thank you for your kind help ;)

-- 

Vipul Ujawane

^ permalink raw reply	[flat|nested] 7+ messages in thread

end of thread, other threads:[~2020-06-30  7:33 UTC | newest]

Thread overview: 7+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-06-24 10:56 [dpdk-users] Poor performance when using OVS with DPDK Vipul Ujawane
2020-06-24 11:03 Vipul Ujawane
2020-06-25 18:03 ` David Christensen
2020-06-26  9:39   ` Vipul Ujawane
2020-06-26 19:32     ` David Christensen
2020-06-29  8:33       ` Vipul Ujawane
2020-06-30  4:41         ` Xia, Chenbo
