From: David Christensen <drc@linux.vnet.ibm.com>
To: Vipul Ujawane <vipul999ujawane@gmail.com>, users@dpdk.org
Subject: Re: [dpdk-users] Poor performance when using OVS with DPDK
Date: Thu, 25 Jun 2020 11:03:41 -0700
Message-ID: <f06f3d0a-2306-03a1-f368-aa5cb312a52d@linux.vnet.ibm.com>
In-Reply-To: <CABgxuK5UHr9bDF_HyUOUXRRN-S4z9onbgux0+1KjDHF1VhkoFA@mail.gmail.com>
On 6/24/20 4:03 AM, Vipul Ujawane wrote:
> Dear all,
>
> I am observing very low performance when running OVS-DPDK compared
> to OVS running with the kernel datapath.
> I have OvS version 2.13.90 compiled from source with the latest stable DPDK
> v19.11.3 on a stable Debian system running kernel 4.19.0-9-amd64 (real
> version:4.19.118).
>
> I have tried the latest released OvS as well (2.12) with the same
> LTS DPDK. As a last resort, I have also tried an older kernel in case the
> kernel was the problem (4.19.0-8-amd64 (real version:4.19.98)).
>
> I have not been able to troubleshoot the problem, and kindly request your
> help regarding the same.
>
> HW configuration
> ================
> We have two identical servers (Debian stable, Intel(R) Xeon(R)
> Gold 6230 CPU, 96G Mem), each running a KVM virtual machine. On the hypervisor
> layer, we have OvS for traffic routing. The servers are connected directly
> via a Mellanox ConnectX-5 (1x100G).
> OVS Forwarding tables are configured for simple port-forwarding only to
> avoid any packet processing-related issue.
>
> Problem
> =======
> When both servers are running OVS-Kernel at the hypervisor layer and VMs
> are connected to it via libvirt and virtio interfaces, the
> VM->Server1->Server2->VM throughput is around 16-18Gbps.
> However, when using OVS-DPDK with the same setting, the throughput drops
> down to 4-6Gbps.
You don't mention the traffic profile. I assume 64-byte frames, but it's
best to be explicit.
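To see why the frame size matters, here is a minimal sketch (the function name is hypothetical) that converts a throughput figure into the packet rate it implies. It assumes the figure is the on-the-wire bit rate and adds 20 bytes of preamble and inter-frame gap per frame; it uses shell integer arithmetic, so the throughput must be a whole number of Gbit/s:

```shell
# Hypothetical helper: packets per second implied by a throughput in Gbit/s
# for a given Ethernet frame size in bytes. Assumes the throughput is the
# on-the-wire rate, so 20 bytes (8 preamble + 12 inter-frame gap) are added
# to every frame. Integer arithmetic only; the result is rounded down.
pps_for_gbps() {
    gbps=$1
    frame_bytes=$2
    echo $(( gbps * 1000000000 / ((frame_bytes + 20) * 8) ))
}

pps_for_gbps 5 64     # prints 7440476
pps_for_gbps 5 1518   # prints 406371
```

So 5 Gbit/s is roughly 7.4 Mpps with 64-byte frames but only about 0.4 Mpps with 1518-byte frames; the same Gbps figure can mean very different per-packet loads.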
>
> SW/driver configurations:
> ==================
> DPDK
> ----
> In config common_base, besides the defaults, I have enabled the following
> extra drivers/features to be compiled/enabled.
> CONFIG_RTE_LIBRTE_MLX5_PMD=y
> CONFIG_RTE_LIBRTE_VHOST=y
> CONFIG_RTE_LIBRTE_VHOST_NUMA=y
> CONFIG_RTE_LIBRTE_PMD_VHOST=y
> CONFIG_RTE_VIRTIO_USER=n
> CONFIG_RTE_EAL_VFIO=y
>
>
> OVS
> ---
> $ovs-vswitchd --version
> ovs-vswitchd (Open vSwitch) 2.13.90
>
> $sudo ovs-vsctl get Open_vSwitch . dpdk_initialized
> true
>
> $sudo ovs-vsctl get Open_vSwitch . dpdk_version
> "DPDK 19.11.3"
>
> OS settings
> -----------
> $ lsb_release -a
> No LSB modules are available.
> Distributor ID: Debian
> Description: Debian GNU/Linux 10 (buster)
> Release: 10
> Codename: buster
>
>
> $ cat /proc/cmdline
> BOOT_IMAGE=/vmlinuz-4.19.0-9-amd64 root=/dev/mapper/Volume0-debian--stable
> ro default_hugepagesz=1G hugepagesz=1G hugepages=16 intel_iommu=on iommu=pt
> quiet
Why don't you reserve any CPUs for OVS/DPDK or VM usage? Published DPDK
performance reports generally recommend CPU isolation settings, for
example this Mellanox DPDK performance report:
https://fast.dpdk.org/doc/perf/DPDK_19_08_Mellanox_NIC_performance_report.pdf
For their test system, the kernel command line includes the following
(adjust the CPU ranges to your own topology):
isolcpus=24-47 intel_idle.max_cstate=0 processor.max_cstate=0
intel_pstate=disable nohz_full=24-47 rcu_nocbs=24-47 rcu_nocb_poll
default_hugepagesz=1G hugepagesz=1G hugepages=64 audit=0 nosoftlockup
Using the tuned service (CPU partitioning profile) makes this process
easier:
https://tuned-project.org/
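If you isolate CPUs with isolcpus= as in the example above, the kernel lists them in /sys/devices/system/cpu/isolated after reboot. A minimal sketch of a check, with a hypothetical helper name, that a given CPU (for example one intended for an OVS PMD thread or a vCPU) is in such a cpulist:

```shell
# Hypothetical helper: succeed if CPU number $1 appears in a Linux cpulist
# string such as "24-47" or "2,4-7,10" (the format used by
# /sys/devices/system/cpu/isolated). An empty list never matches.
cpu_in_list() {
    cpu=$1
    list=$2
    old_ifs=$IFS
    IFS=,
    for range in $list; do
        case $range in
            *-*) lo=${range%-*}; hi=${range#*-} ;;   # a range like 24-47
            *)   lo=$range; hi=$range ;;             # a single CPU
        esac
        if [ "$cpu" -ge "$lo" ] && [ "$cpu" -le "$hi" ]; then
            IFS=$old_ifs
            return 0
        fi
    done
    IFS=$old_ifs
    return 1
}
```

For example, on the host: cpu_in_list 5 "$(cat /sys/devices/system/cpu/isolated)" && echo "CPU 5 is isolated".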
>
> ./usertools/dpdk-devbind.py --status
> Network devices using kernel driver
> ===================================
> 0000:b3:00.0 'MT27800 Family [ConnectX-5] 1017' if=ens2 drv=mlx5_core
> unused=igb_uio,vfio-pci
>
> Because of the way Mellanox cards and their driver work, I have not bound
> igb_uio to the interface; however, the uio, igb_uio and vfio-pci kernel
> modules are loaded.
>
>
> Relevant part of the VM-config for Qemu/KVM
> -------------------------------------------
> <cputune>
> <shares>4096</shares>
> <vcpupin vcpu='0' cpuset='4'/>
> <vcpupin vcpu='1' cpuset='5'/>
Where did you get these CPU mapping values? On many dual-socket x86
systems, even-numbered CPUs belong to one NUMA node and odd-numbered CPUs
to the other, but the numbering depends on the platform, so check it
(for example with lscpu). You generally want to select CPUs from the same
NUMA node as the mlx5 NIC you're using for DPDK.
You should give the VM at least 4 CPUs, selected according to the NUMA
topology of the system.
Take a look at this bash script written for Red Hat:
https://github.com/ctrautma/RHEL_NIC_QUALIFICATION/blob/ansible/ansible/get_cpulist.sh
It gives you a good starting point for choosing which CPUs to use for the
OVS/DPDK and VM configurations on your particular system. Also review
the Ansible playbook pvp_ovsdpdk.yml; it includes many other useful
steps you might be able to apply to your Debian system.
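Rather than relying on a numbering convention, the NUMA node of the NIC and the CPUs on that node can be read from sysfs. A minimal sketch with a hypothetical function name; the SYSFS_ROOT variable is an assumption added only so the function can be tried against a fake directory tree:

```shell
# Hypothetical helper: print the NUMA node of a PCI device and the CPUs that
# belong to that node, read from sysfs. SYSFS_ROOT defaults to /sys.
nic_numa_cpus() {
    pci_addr=$1
    sysfs=${SYSFS_ROOT:-/sys}
    node=$(cat "$sysfs/bus/pci/devices/$pci_addr/numa_node") || return 1
    if [ "$node" -lt 0 ]; then
        # The kernel reports -1 when it has no NUMA affinity for the device.
        echo "node=unknown (kernel reports no NUMA affinity)"
        return 0
    fi
    echo "node=$node cpus=$(cat "$sysfs/devices/system/node/node$node/cpulist")"
}
```

On the host this would be run as nic_numa_cpus 0000:b3:00.0 (the PCI address from the dpdk-devbind.py output); the PMD threads and the VM's vCPUs would then be picked from the printed CPU list, keeping the two sets disjoint.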
> <emulatorpin cpuset='4-5'/>
> </cputune>
> <cpu mode='host-model' check='partial'>
> <model fallback='allow'/>
> <topology sockets='2' cores='1' threads='1'/>
> <numa>
> <cell id='0' cpus='0-1' memory='4194304' unit='KiB'
> memAccess='shared'/>
> </numa>
> </cpu>
> <interface type='vhostuser'>
> <mac address='00:00:00:00:00:aa'/>
> <source type='unix' path='/usr/local/var/run/openvswitch/vhostuser'
> mo$
> <model type='virtio'/>
> <driver queues='2'>
> <host mrg_rxbuf='on'/>
Is there a requirement for mergeable RX buffers? Some PMDs, such as the
virtio PMD when DPDK runs inside the guest, can only use their vectorized
(SSE) receive path when this is disabled, which may yield better
performance.
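If the guest's virtio NIC is still bound to the kernel virtio-net driver, you can check whether mergeable RX buffers were actually negotiated. Linux exposes the negotiated feature bits in /sys/bus/virtio/devices/virtioN/features as a string of 0/1 characters, where character i is feature bit i, and the virtio specification defines VIRTIO_NET_F_MRG_RXBUF as bit 15. A minimal sketch with a hypothetical helper name:

```shell
# Feature bit number from the virtio specification.
VIRTIO_NET_F_MRG_RXBUF=15

# Hypothetical helper: succeed if feature bit $2 is set in a virtio features
# string like the one in /sys/bus/virtio/devices/virtioN/features.
virtio_feature_set() {
    features=$1
    bit=$2
    # cut counts characters from 1, so feature bit N is character N+1.
    [ "$(printf '%s' "$features" | cut -c "$((bit + 1))")" = "1" ]
}
```

Inside the guest: virtio_feature_set "$(cat /sys/bus/virtio/devices/virtio0/features)" "$VIRTIO_NET_F_MRG_RXBUF" && echo "mergeable RX buffers negotiated". virtio0 may be another virtio device (a disk, for example), so check which virtioN is the NIC first.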
> </driver>
> <address type='pci' domain='0x0000' bus='0x07' slot='0x00'
> function='0x0'$
> </interface>
>
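I don't see hugepage usage in the libvirt XML. vhost-user needs the guest
memory to be shared with OVS, which is normally done by backing it with
hugepages, using something similar to: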
<memory unit='KiB'>8388608</memory>
<currentMemory unit='KiB'>8388608</currentMemory>
<memoryBacking>
<hugepages>
<page size='1048576' unit='KiB' nodeset='0'/>
</hugepages>
</memoryBacking>
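It is also worth checking that enough 1 GiB pages are free on the NUMA node that the guest and OVS will use. A minimal sketch, with a hypothetical helper name, that totals the pages needed for the guest memory (in KiB, as in the <memory> element) plus the OVS dpdk-socket-mem value for that node (in MiB), rounding each up to whole 1 GiB pages:

```shell
# Hypothetical helper: number of 1 GiB hugepages needed for one guest plus
# the OVS-DPDK socket memory on the same node, each rounded up to whole pages.
#   $1: guest memory in KiB (libvirt <memory unit='KiB'>)
#   $2: OVS socket memory in MiB (one entry of other_config:dpdk-socket-mem)
hugepages_1g_needed() {
    guest_kib=$1
    ovs_mib=$2
    guest_pages=$(( (guest_kib + 1048575) / 1048576 ))
    ovs_pages=$(( (ovs_mib + 1023) / 1024 ))
    echo $(( guest_pages + ovs_pages ))
}
```

With the 8388608 KiB example above and dpdk-socket-mem="4096,0", hugepages_1g_needed 8388608 4096 gives 12, which can be compared against /sys/devices/system/node/node0/hugepages/hugepages-1048576kB/free_hugepages. Pages reserved on the kernel command line may be spread across NUMA nodes, so the per-node count matters more than the total.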
> -----------------------------------
> OVS Start Config
> -----------------------------------
> ovs-vsctl --no-wait set Open_vSwitch . other_config:dpdk-init=true
> ovs-vsctl --no-wait set Open_vSwitch . other_config:dpdk-socket-mem="4096,0"
> ovs-vsctl --no-wait set Open_vSwitch . other_config:dpdk-lcore-mask=0xff
> ovs-vsctl --no-wait set Open_vSwitch . other_config:pmd-cpu-mask=0e
The dpdk-lcore-mask (0xff, CPUs 0-7) and pmd-cpu-mask (0x0e, CPUs 1-3)
overlap. These two masks shouldn't overlap:
https://developers.redhat.com/blog/2017/06/28/ovs-dpdk-parameters-dealing-with-multi-numa/
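A quick way to see the overlap is to decode both masks into CPU lists. A minimal sketch with hypothetical helper names; it accepts hex masks with or without a 0x prefix and does not handle masks wider than 63 bits:

```shell
# Hypothetical helper: add a 0x prefix if the mask lacks one.
normalize_mask() {
    case $1 in
        0x*|0X*) echo "$1" ;;
        *)       echo "0x$1" ;;
    esac
}

# Print the CPUs set in a hex mask as a comma-separated list.
mask_to_cpus() {
    mask=$(( $(normalize_mask "$1") ))
    cpu=0
    out=
    while [ "$mask" -ne 0 ]; do
        if [ $(( mask & 1 )) -eq 1 ]; then
            out=${out:+$out,}$cpu
        fi
        mask=$(( mask >> 1 ))
        cpu=$(( cpu + 1 ))
    done
    echo "$out"
}

# Print the CPUs present in both masks (empty output means no overlap).
mask_overlap() {
    a=$(( $(normalize_mask "$1") ))
    b=$(( $(normalize_mask "$2") ))
    mask_to_cpus "$(printf '%x' $(( a & b )))"
}
```

Here mask_overlap 0xff 0e prints 1,2,3. Note also that mask_to_cpus 0xff includes CPUs 4 and 5, which the libvirt XML pins the guest vCPUs and emulator threads to.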
> ovs-vsctl add-port ovsbr dpdk0 -- set Interface dpdk0 type=dpdk
> options:dpdk-devargs=0000:b3:00.0
> ovs-vsctl set interface dpdk0 options:n_rxq=2
> ovs-vsctl add-port ovsbr vhost-vm -- set Interface vhostuser
> type=dpdkvhostuser
>
>
>
>
>
> Is there anything I should be aware of regarding the versions and settings I am using?
> Did I compile DPDK and/or OvS in a wrong way?
>
> Thank you for your kind help ;)
>