DPDK usage discussions
From: David Christensen <drc@linux.vnet.ibm.com>
To: Vipul Ujawane <vipul999ujawane@gmail.com>, users@dpdk.org
Subject: Re: [dpdk-users] Poor performance when using OVS with DPDK
Date: Thu, 25 Jun 2020 11:03:41 -0700
Message-ID: <f06f3d0a-2306-03a1-f368-aa5cb312a52d@linux.vnet.ibm.com>
In-Reply-To: <CABgxuK5UHr9bDF_HyUOUXRRN-S4z9onbgux0+1KjDHF1VhkoFA@mail.gmail.com>



On 6/24/20 4:03 AM, Vipul Ujawane wrote:
> Dear all,
> 
> I am observing very low performance when running OVS-DPDK compared
> to OVS running with the kernel datapath.
> I have OvS version 2.13.90 compiled from source with the latest stable DPDK
> v19.11.3 on a stable Debian system running kernel 4.19.0-9-amd64 (real
> version:4.19.118).
> 
> I have tried the latest released OvS as well (2.12) with the same
> LTS DPDK. As a last resort, I have also tried an older kernel
> (4.19.0-8-amd64, real version 4.19.98) to see whether the kernel is
> the problem.
> 
> I have not been able to troubleshoot the problem, and kindly request your
> help regarding the same.
> 
> HW configuration
> ================
> We have two identical servers (Debian stable, Intel(R) Xeon(R)
> Gold 6230 CPU, 96G Mem); each runs a KVM virtual machine. On the hypervisor
> layer, we have OvS for traffic routing. The servers are connected directly
> via a Mellanox ConnectX-5 (1x100G).
> OVS Forwarding tables are configured for simple port-forwarding only to
> avoid any packet processing-related issue.
> 
> Problem
> =======
> When both servers are running OVS-Kernel at the hypervisor layer and VMs
> are connected to it via libvirt and virtio interfaces, the
> VM->Server1->Server2->VM throughput is around 16-18Gbps.
> However, when using OVS-DPDK with the same setting, the throughput drops
> down to 4-6Gbps.

You don't mention the traffic profile.  I assume 64-byte frames, but it's 
best to be explicit.
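
The frame size matters because the same Gbps figure corresponds to very 
different packet rates.  A minimal Python sketch of that arithmetic, 
assuming standard Ethernet framing with 20 bytes of per-frame wire 
overhead and treating the reported throughput as wire rate (both are 
assumptions; the function name is only illustrative):

```python
# Per-frame overhead on the wire: preamble (7) + SFD (1) + inter-frame gap (12).
WIRE_OVERHEAD_BYTES = 20

def packets_per_second(rate_gbps, frame_bytes):
    """Packet rate needed to fill rate_gbps of wire rate with fixed-size frames.

    frame_bytes is the Ethernet frame size including the 4-byte FCS.
    """
    bits_per_frame = (frame_bytes + WIRE_OVERHEAD_BYTES) * 8
    return rate_gbps * 1e9 / bits_per_frame

# Compare small and large frames at a rate in the reported 4-6 Gbps range.
for size in (64, 1518):
    mpps = packets_per_second(5, size) / 1e6
    print(f"{size}-byte frames at 5 Gbps: {mpps:.2f} Mpps")
```

With small frames the per-packet cost in the vswitch dominates, so a 
throughput number is hard to interpret without the frame size.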

> 
> SW/driver configurations:
> ==================
> DPDK
> ----
> In config common_base, besides the defaults, I have enabled the following
> extra drivers/features to be compiled/enabled.
> CONFIG_RTE_LIBRTE_MLX5_PMD=y
> CONFIG_RTE_LIBRTE_VHOST=y
> CONFIG_RTE_LIBRTE_VHOST_NUMA=y
> CONFIG_RTE_LIBRTE_PMD_VHOST=y
> CONFIG_RTE_VIRTIO_USER=n
> CONFIG_RTE_EAL_VFIO=y
> 
> 
> OVS
> ---
> $ovs-vswitchd --version
> ovs-vswitchd (Open vSwitch) 2.13.90
> 
> $sudo ovs-vsctl get Open_vSwitch . dpdk_initialized
> true
> 
> $sudo ovs-vsctl get Open_vSwitch . dpdk_version
> "DPDK 19.11.3"
> 
> OS settings
> -----------
> $ lsb_release -a
> No LSB modules are available.
> Distributor ID: Debian
> Description: Debian GNU/Linux 10 (buster)
> Release: 10
> Codename: buster
> 
> 
> $ cat /proc/cmdline
> BOOT_IMAGE=/vmlinuz-4.19.0-9-amd64 root=/dev/mapper/Volume0-debian--stable
> ro default_hugepagesz=1G hugepagesz=1G hugepages=16 intel_iommu=on iommu=pt
> quiet

Why don't you reserve any CPUs for OVS/DPDK or VM usage?  Published 
performance white papers generally recommend CPU isolation settings, 
such as those in this Mellanox DPDK performance report:

https://fast.dpdk.org/doc/perf/DPDK_19_08_Mellanox_NIC_performance_report.pdf

For their test system:

isolcpus=24-47 intel_idle.max_cstate=0 processor.max_cstate=0 
intel_pstate=disable nohz_full=24-47
rcu_nocbs=24-47 rcu_nocb_poll default_hugepagesz=1G hugepagesz=1G 
hugepages=64 audit=0
nosoftlockup

Using the tuned service (CPU partitioning profile) makes this process easier:

https://tuned-project.org/

> 
> ./usertools/dpdk-devbind.py --status
> Network devices using kernel driver
> ===================================
> 0000:b3:00.0 'MT27800 Family [ConnectX-5] 1017' if=ens2 drv=mlx5_core
> unused=igb_uio,vfio-pci
> 
> Because of the way Mellanox cards and their driver work, I have not bound
> igb_uio to the interface; however, the uio, igb_uio and vfio-pci kernel
> modules are loaded.
> 
> 
> Relevant part of the VM-config for Qemu/KVM
> -------------------------------------------
>    <cputune>
>      <shares>4096</shares>
>      <vcpupin vcpu='0' cpuset='4'/>
>      <vcpupin vcpu='1' cpuset='5'/>

Where did you get these CPU mapping values?  x86 systems typically map 
even-numbered CPUs to one NUMA node and odd-numbered CPUs to a different 
NUMA node.  You generally want to select CPUs from the same NUMA node as 
the mlx5 NIC you're using for DPDK.

You should have at least 4 CPUs in the VM, selected according to the 
NUMA topology of the system.
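
One way to check which CPUs are local to the NIC is to read the NUMA 
node from sysfs.  A minimal Python sketch, assuming the usual Linux 
sysfs layout (numa_node under the PCI device, cpulist under the node 
directory); the helper names are made up for illustration:

```python
from pathlib import Path

def parse_cpulist(text):
    """Expand a kernel cpulist string such as '0-3,8,10-11' into a sorted list."""
    cpus = set()
    for part in text.strip().split(","):
        if not part:
            continue
        if "-" in part:
            first, last = part.split("-")
            cpus.update(range(int(first), int(last) + 1))
        else:
            cpus.add(int(part))
    return sorted(cpus)

def cpus_local_to_nic(pci_addr, sysfs=Path("/sys")):
    """Return (numa_node, cpus) for a PCI device, as reported by sysfs."""
    node_file = sysfs / "bus/pci/devices" / pci_addr / "numa_node"
    node = int(node_file.read_text())
    if node < 0:
        # The kernel reports -1 when it has no NUMA affinity for the device.
        return node, []
    cpulist = sysfs / "devices/system/node" / f"node{node}" / "cpulist"
    return node, parse_cpulist(cpulist.read_text())

# Example call on the host (not executed here):
#   cpus_local_to_nic("0000:b3:00.0")
```

The vcpupin and pmd-cpu-mask values can then be chosen from the 
returned CPU list instead of being guessed.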

Take a look at this bash script written for Red Hat:

https://github.com/ctrautma/RHEL_NIC_QUALIFICATION/blob/ansible/ansible/get_cpulist.sh

It gives you a good starting reference for which CPUs to select for the 
OVS/DPDK and VM configurations on your particular system.  Also review 
the Ansible script pvp_ovsdpdk.yml; it provides a lot of other useful 
steps you might be able to apply to your Debian OS.

>      <emulatorpin cpuset='4-5'/>
>    </cputune>
>    <cpu mode='host-model' check='partial'>
>      <model fallback='allow'/>
>      <topology sockets='2' cores='1' threads='1'/>
>      <numa>
>        <cell id='0' cpus='0-1' memory='4194304' unit='KiB'
> memAccess='shared'/>
>      </numa>
>    </cpu>
>      <interface type='vhostuser'>
>        <mac address='00:00:00:00:00:aa'/>
>        <source type='unix' path='/usr/local/var/run/openvswitch/vhostuser'
> mo$
>        <model type='virtio'/>
>        <driver queues='2'>
>          <host mrg_rxbuf='on'/>

Is there a requirement for mergeable RX buffers?  Some PMDs like mlx5 
can take advantage of SSE instructions when this is disabled, yielding 
better performance.

>        </driver>
>        <address type='pci' domain='0x0000' bus='0x07' slot='0x00'
> function='0x0'$
>      </interface>
> 

I don't see hugepage usage in the libvirt XML.  Something similar to:

   <memory unit='KiB'>8388608</memory>
   <currentMemory unit='KiB'>8388608</currentMemory>
   <memoryBacking>
     <hugepages>
       <page size='1048576' unit='KiB' nodeset='0'/>
     </hugepages>
   </memoryBacking>
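
As a rough budget check that the hugepage pool covers both OVS and the 
guest, the arithmetic can be sketched in Python.  This assumes the 
16 x 1G pages from your kernel command line, your 
dpdk-socket-mem="4096,0" setting (in MB), and the 8 GiB guest from the 
example XML above.  pages_needed is a made-up helper, and the sketch 
ignores how the pool is split across host NUMA nodes:

```python
PAGE_KIB = 1048576  # one 1G hugepage, as in <page size='1048576' unit='KiB'>

def pages_needed(mem_kib, page_kib=PAGE_KIB):
    """Hugepages required to back mem_kib of memory, rounded up."""
    return -(-mem_kib // page_kib)

total_pages = 16                        # hugepages=16 on the kernel command line
ovs_pages = pages_needed(4096 * 1024)   # dpdk-socket-mem="4096,0": 4096 MB on socket 0
guest_pages = pages_needed(8388608)     # <memory unit='KiB'>8388608</memory>

print("OVS-DPDK pages:", ovs_pages)
print("guest pages:", guest_pages)
print("pages left:", total_pages - ovs_pages - guest_pages)
```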


> -----------------------------------
> OVS Start Config
> -----------------------------------
> ovs-vsctl --no-wait set Open_vSwitch . other_config:dpdk-init=true
> ovs-vsctl --no-wait set Open_vSwitch . other_config:dpdk-socket-mem="4096,0"
> ovs-vsctl --no-wait set Open_vSwitch . other_config:dpdk-lcore-mask=0xff
> ovs-vsctl --no-wait set Open_vSwitch . other_config:pmd-cpu-mask=0e

These two masks shouldn't overlap:
https://developers.redhat.com/blog/2017/06/28/ovs-dpdk-parameters-dealing-with-multi-numa/
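
To make the overlap concrete, the two masks can be expanded into CPU 
lists.  A small Python sketch using the values from your config above 
(the helper name is only illustrative):

```python
def cpus_in_mask(mask):
    """Return the CPU ids whose bits are set in a hex CPU mask string."""
    value = int(mask, 16)  # accepts both 'ff' and '0xff'
    return [cpu for cpu in range(value.bit_length()) if (value >> cpu) & 1]

lcore = cpus_in_mask("0xff")   # dpdk-lcore-mask
pmd = cpus_in_mask("0e")       # pmd-cpu-mask
overlap = sorted(set(lcore) & set(pmd))

print("lcore CPUs:", lcore)    # [0, 1, 2, 3, 4, 5, 6, 7]
print("PMD CPUs:  ", pmd)      # [1, 2, 3]
print("overlap:   ", overlap)  # [1, 2, 3]
```

Every PMD CPU (1-3) is also in the lcore mask, so the non-datapath 
DPDK threads can be scheduled on the same cores as the PMD threads.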

> ovs-vsctl add-port ovsbr dpdk0 -- set Interface dpdk0 type=dpdk
> options:dpdk-devargs=0000:b3:00.0
> ovs-vsctl set interface dpdk0 options:n_rxq=2
> ovs-vsctl add-port ovsbr vhost-vm -- set Interface vhostuser
> type=dpdkvhostuser
> 
> 
> 
> Is there anything I should be aware of regarding the versions and settings
> I am using?
> Did I compile DPDK and/or OvS in a wrong way?
> 
> Thank you for your kind help ;)
> 
