From mboxrd@z Thu Jan 1 00:00:00 1970
From: David Christensen
To: Vipul Ujawane
Cc: users@dpdk.org
Message-ID: <80f92bdf-7c29-31fa-3be8-dcaee8d9e4a7@linux.vnet.ibm.com>
Date: Fri, 26 Jun 2020 12:32:47 -0700
Subject: Re: [dpdk-users] Poor performance when using OVS with DPDK
List-Id: DPDK usage discussions
Errors-To: users-bounces@dpdk.org
Sender: "users"

> > Why don't you reserve any CPUs for OVS/DPDK or VM usage?
> > All published performance white papers recommend settings for CPU
> > isolation like this Mellanox DPDK performance report:
> >
> > https://fast.dpdk.org/doc/perf/DPDK_19_08_Mellanox_NIC_performance_report.pdf
> >
> > For their test system:
> >
> > isolcpus=24-47 intel_idle.max_cstate=0 processor.max_cstate=0
> > intel_pstate=disable nohz_full=24-47 rcu_nocbs=24-47 rcu_nocb_poll
> > default_hugepagesz=1G hugepagesz=1G hugepages=64 audit=0 nosoftlockup
> >
> > Using the tuned service (CPU partitioning profile) makes this process
> > easier:
> >
> > https://tuned-project.org/
>
> Nice tutorial, thanks for sharing. I have checked it and configured our
> server like this:
>
> isolcpus=12-19 intel_idle.max_cstate=0 processor.max_cstate=0
> nohz_full=12-19 rcu_nocbs=12-19 intel_pstate=disable
> default_hugepagesz=1G hugepagesz=1G hugepages=24 audit=0 nosoftlockup
> intel_iommu=on iommu=pt rcu_nocb_poll
>
> Even though our servers are NUMA-capable and NUMA-aware, we only have
> one CPU installed in one socket, and one CPU has 20 physical cores
> (40 threads), so I decided to use the "top-most" cores for DPDK/OVS;
> that's the reason for isolcpus=12-19.

You can never have too many cores. On POWER systems I'll sometimes
reserve 76 out of 80 available cores to improve overall throughput.

> > > ./usertools/dpdk-devbind.py --status
> > > Network devices using kernel driver
> > > ===================================
> > > 0000:b3:00.0 'MT27800 Family [ConnectX-5] 1017' if=ens2
> > > drv=mlx5_core unused=igb_uio,vfio-pci
> > >
> > > Due to the way Mellanox cards and their driver work, I have not
> > > bound igb_uio to the interface; however, the uio, igb_uio and
> > > vfio-pci kernel modules are loaded.
> > >
> > > Relevant part of the VM-config for Qemu/KVM
> > > -------------------------------------------
> > > [vCPU/cputune XML with value 4096; tags stripped in the archive]
> >
> > Where did you get these CPU mapping values?
> > x86 systems typically map even-numbered CPUs to one NUMA node and
> > odd-numbered CPUs to a different NUMA node. You generally want to
> > select CPUs from the same NUMA node as the mlx5 NIC you're using for
> > DPDK.
> >
> > You should have at least 4 CPUs in the VM, selected according to the
> > NUMA topology of the system.
>
> As per my answer above, our system has no secondary NUMA node; all
> mappings are to the same socket/CPU.
>
> > Take a look at this bash script written for Red Hat:
> >
> > https://github.com/ctrautma/RHEL_NIC_QUALIFICATION/blob/ansible/ansible/get_cpulist.sh
> >
> > It gives you a good starting reference for which CPUs to select for
> > the OVS/DPDK and VM configurations on your particular system. Also
> > review the Ansible script pvp_ovsdpdk.yml; it provides a lot of
> > other useful steps you might be able to apply to your Debian OS.
>
> > > [NUMA cell XML with memAccess='shared' and a vhost-user interface
> > > with path='/usr/local/var/run/openvswitch/vhostuser'; tags
> > > stripped in the archive]
> >
> > Is there a requirement for mergeable RX buffers? Some PMDs like mlx5
> > can take advantage of SSE instructions when this is disabled,
> > yielding better performance.
>
> Good point, there is no requirement. I just took an example config and
> thought it was necessary for the driver queues setting.

That's how we all learn :-)
> > > [PCI address XML with function='0x0'; tags stripped in the archive]

> > I don't see hugepage usage in the libvirt XML. Something similar to:
> >
> > <memory unit='KiB'>8388608</memory>
> > <currentMemory unit='KiB'>8388608</currentMemory>
> > <memoryBacking>
> >   <hugepages>
> >     <page size='1048576' unit='KiB'/>
> >   </hugepages>
> > </memoryBacking>
>
> I did not copy this part of the XML, but we have hugepages configured
> properly.
>
> > > -----------------------------------
> > > OVS Start Config
> > > -----------------------------------
> > > ovs-vsctl --no-wait set Open_vSwitch . other_config:dpdk-init=true
> > > ovs-vsctl --no-wait set Open_vSwitch . other_config:dpdk-socket-mem="4096,0"
> > > ovs-vsctl --no-wait set Open_vSwitch . other_config:dpdk-lcore-mask=0xff
> > > ovs-vsctl --no-wait set Open_vSwitch . other_config:pmd-cpu-mask=0e
> >
> > These two masks shouldn't overlap:
> >
> > https://developers.redhat.com/blog/2017/06/28/ovs-dpdk-parameters-dealing-with-multi-numa/
>
> Thanks, this really helped me understand the order in which these
> commands should be issued.
>
> So, the problem now is the following. I made all the changes you
> shared, started OVS/DPDK in the proper order, and set these features:
>
> ovs-vsctl --no-wait set Open_vSwitch . other_config:dpdk-socket-mem="8192,0"
> ovs-vsctl --no-wait set Open_vSwitch . other_config:dpdk-lcore-mask=0x01000
> ovs-vsctl --no-wait set Open_vSwitch . other_config:dpdk-init=true
>
> and, finally, this:
>
> ovs-vsctl --no-wait set Open_vSwitch . other_config:pmd-cpu-mask=0x0e000
>
> The documentation you shared says this last one can even be set at
> runtime, so I was playing with it to see whether anything changed.
>
> I did not start any VM on top of OVS/DPDK; I just set up a port-forward
> rule (in_port=1, actions=output:IN_PORT), since I only have one
> physical port on each Mellanox card. Then I generated traffic from the
> other server towards OVS. Using pktsize 64B, the max throughput Pktgen
> reports is 8Gbps.
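[Editor's note: the non-overlap requirement on the two masks above can be
checked mechanically. A small shell sketch, using the mask values from the
commands quoted above; the core assignments in the comments are inferred
from those values:]

```shell
# dpdk-lcore-mask=0x01000 selects core 12 for OVS/DPDK control threads;
# pmd-cpu-mask=0x0e000 selects cores 13-15 for the PMD polling threads.
lcore_mask=$((0x01000))
pmd_mask=$((0x0e000))

# A non-zero AND would mean control and PMD threads share a core.
printf 'overlap=0x%x\n' $(( lcore_mask & pmd_mask ))   # -> overlap=0x0

# For reference, the mask covering all isolated cores 12-19:
full=0
for cpu in $(seq 12 19); do full=$(( full | (1 << cpu) )); done
printf 'isolated_mask=0x%x\n' "$full"                  # -> isolated_mask=0xff000
```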
> In particular, I got these metrics:
>
> Size      Sent_pps    Recv_pps    Recv_Gbps
> 64B       93M         11M         ~8
> 128B      65M         12.5M       ~15
> 256B      42.5M       12.3M       ~27
> 512B      23.5M       11.9M       ~51
> 1024B     11.9M       10M         ~83
> 1280B     9.6M        8.3M        ~86
> 1500B     8.3M        6.7M        ~82
>
> It's quite interesting that for 64B the received pps is lower than for
> greater sizes, because PPS should be the practical limitation on
> throughput, and from the packet size we can compute the throughput in
> Gbps.

Looking at 64B performance gives you a sense of the per-packet overhead
associated with the DPDK framework and your application. At 100Gb/s line
rate, 64B frames will arrive every 6.72ns. Since your received PPS is
peaking around 12.5Mpps, I'd guess that it's taking about 80ns of CPU
time per frame. I don't know how well OVS scales with additional CPUs;
that's something to look at.

You don't mention how many different flows you're using in the test.
Don't be surprised when throughput drops as you move from 1,000 flows to
1,000,000 flows.

It's likely that most of your frame loss is due to the NIC's RX buffers
overflowing and dropping frames due to back pressure (i.e. DPDK/OVS
can't process packets fast enough). Look at the mlx5's hardware
statistics to verify.

You may be able to improve the performance by increasing the number of
RX queues and RX descriptors per queue, and assigning more lcores to
match the number of queues, allowing the work to be spread more evenly
and reducing buffer overflows. This often works when running testpmd
alone, since the app overhead is low, but has less effect on OVS
performance. You might consider benchmarking testpmd alone vs OVS/DPDK
to understand the OVS overhead.
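[Editor's note: the 6.72ns and 80ns figures above can be reproduced with
back-of-the-envelope arithmetic; the 20B of per-frame overhead below is the
Ethernet preamble plus inter-frame gap, and 12.5Mpps is the peak receive
rate from the table:]

```shell
# A 64B frame plus 20B of preamble/IFG is 84B on the wire, so at 100 Gb/s
# a new frame arrives every 84*8/100e9 s = 6.72 ns. A receive rate of
# 12.5 Mpps leaves a budget of 1/12.5e6 s = 80 ns of CPU time per frame.
awk 'BEGIN {
  printf "arrival_ns=%.2f\n", (64 + 20) * 8 / 100e9 * 1e9
  printf "budget_ns=%.0f\n", 1e9 / 12.5e6
}'
# prints:
#   arrival_ns=6.72
#   budget_ns=80
```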
> Anyway, OVS-DPDK has 3 cores to use, but only one rx queue is assigned
> to the port, so basically --- as `top` also shows --- it is one-core
> performance.

Increasing the number of RX queues/descriptors and assigning a dedicated
lcore to each queue will generally improve performance if your
bottleneck is RX in the PMD.

> Increasing the cores did not help, and the performance remained the
> same. Is this performance normal for OVS/DPDK?

That's been my experience, though there are others who have more
experience with performance testing OVS. The platform matters. Look for
existing whitepapers and compare your system configuration to theirs to
see what you need to achieve the performance you're looking for.

Dave
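[Editor's note: the RX-queue scaling discussed above could be applied with
commands along these lines. The port name dpdk-p0 and the exact counts are
illustrative; n_rxq, n_rxq_desc, and pmd-rxq-affinity are standard
OVS-DPDK options:]

```shell
# Spread RX work over four hardware queues with deeper descriptor rings.
ovs-vsctl set Interface dpdk-p0 options:n_rxq=4
ovs-vsctl set Interface dpdk-p0 options:n_rxq_desc=2048

# Give the PMD four cores (13-16) and pin one queue to each of them.
ovs-vsctl set Open_vSwitch . other_config:pmd-cpu-mask=0x1e000
ovs-vsctl set Interface dpdk-p0 \
    other_config:pmd-rxq-affinity="0:13,1:14,2:15,3:16"
```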