DPDK usage discussions
* [dpdk-users] Vhost PMD Performance Doesn't Scale as Expected
@ 2020-08-12 18:16 David Christensen
From: David Christensen @ 2020-08-12 18:16 UTC
  To: users; +Cc: Maxime Coquelin, Chenbo Xia, Zhihong Wang

I'm examining performance between two VMs connected with a vhost-user 
interface on DPDK 20.08 and testpmd.  Each VM (client-0, server-0) has 4 
vCPUs, 4 RX/TX queues per port, and 4 GB of RAM, and runs 8 containers, 
each with an instance of qperf running the tcp_bw test.  The 
configuration targets all CPU/memory activity at NUMA node 1.

When I look at cumulative throughput as I increase the number of qperf 
pairs, I notice that performance doesn't scale as I had hoped.  Here's a 
table with some results:

                     concurrent qperf pairs
msg_size     1           2           4           8
8,192    12.74 Gb/s  21.68 Gb/s  27.89 Gb/s  30.94 Gb/s
16,384   13.84 Gb/s  24.06 Gb/s  28.51 Gb/s  30.47 Gb/s
32,768   16.13 Gb/s  24.49 Gb/s  28.89 Gb/s  30.23 Gb/s
65,536   16.19 Gb/s  22.53 Gb/s  29.79 Gb/s  30.46 Gb/s
131,072  15.37 Gb/s  23.89 Gb/s  29.65 Gb/s  30.88 Gb/s
262,144  14.73 Gb/s  22.97 Gb/s  29.54 Gb/s  31.28 Gb/s
524,288  14.62 Gb/s  23.39 Gb/s  28.70 Gb/s  30.98 Gb/s
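
For reference, the shortfall relative to ideal linear scaling works out 
like this (quick sketch, throughput numbers copied from the 8,192-byte 
row above; the other rows behave similarly):

```python
# Scaling efficiency relative to perfect linear scaling from the
# 1-pair baseline, using the 8,192-byte-message row of the table.
pairs = [1, 2, 4, 8]
gbps = [12.74, 21.68, 27.89, 30.94]

baseline = gbps[0]
for n, t in zip(pairs, gbps):
    eff = t / (n * baseline)  # fraction of perfect linear scaling
    print(f"{n} pair(s): {t:.2f} Gb/s, {eff:.0%} of linear")
```

So by 8 pairs the setup delivers roughly 30% of what linear scaling 
from the single-pair result would predict.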

Can anyone suggest a configuration change that might improve 
performance, or is this generally what's expected?  I was expecting 
throughput to nearly double as I move from 1 to 2 to 4 queues.

Even single-queue performance is below Intel's published results (see 
https://fast.dpdk.org/doc/perf/DPDK_20_05_Intel_virtio_performance_report.pdf), 
though I was unable to get the vhost-switch example application to run 
due to an mbuf allocation error in the i40e PMD and had to fall back to 
the testpmd app.

Configuration details below.

Dave

/proc/cmdline:
--------------
BOOT_IMAGE=(hd0,gpt2)/vmlinuz-4.18.0-147.el8.x86_64 
root=/dev/mapper/rhel-root ro intel_iommu=on iommu=pt 
default_hugepagesz=1G hugepagesz=1G hugepages=64 crashkernel=auto 
resume=/dev/mapper/rhel-swap rd.lvm.lv=rhel/root rd.lvm.lv=rhel/swap 
rhgb quiet =1 nohz=on nohz_full=8-15,24-31 rcu_nocbs=8-15,24-31 
tuned.non_isolcpus=00ff00ff intel_pstate=disable nosoftlockup
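
For reference, a quick way to sanity-check the isolation parameters is 
to parse the command line into a dict (sketch below embeds a subset of 
the string above for illustration; in practice you'd read 
/proc/cmdline):

```python
# Parse kernel command-line tokens into a dict for easy inspection.
# Flag-only tokens (e.g. "ro") are stored as True.
cmdline = (
    "root=/dev/mapper/rhel-root ro intel_iommu=on iommu=pt "
    "default_hugepagesz=1G hugepagesz=1G hugepages=64 "
    "nohz=on nohz_full=8-15,24-31 rcu_nocbs=8-15,24-31"
)

def parse_cmdline(s):
    params = {}
    for tok in s.split():
        key, _, val = tok.partition("=")
        params[key] = val if val else True
    return params

p = parse_cmdline(cmdline)
print(p["nohz_full"])  # CPUs excluded from the scheduler tick
print(p["rcu_nocbs"])  # CPUs whose RCU callbacks are offloaded
```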

testpmd command-line:
---------------------
~/src/dpdk/build/app/dpdk-testpmd -l 7,24-31 -n 4 --no-pci --vdev 
'net_vhost0,iface=/tmp/vhost-dpdk-server-0,dequeue-zero-copy=1,tso=1,queues=4' 
--vdev 
'net_vhost1,iface=/tmp/vhost-dpdk-client-0,dequeue-zero-copy=1,tso=1,queues=4' 
  -- -i --nb-cores=8 --numa --rxq=4 --txq=4

testpmd forwarding core mapping:
--------------------------------
Start automatic packet forwarding
io packet forwarding - ports=2 - cores=8 - streams=8 - NUMA support 
enabled, MP allocation mode: native
Logical Core 24 (socket 1) forwards packets on 1 streams:
   RX P=0/Q=0 (socket 0) -> TX P=1/Q=0 (socket 0) peer=02:00:00:00:00:01
Logical Core 25 (socket 1) forwards packets on 1 streams:
   RX P=1/Q=0 (socket 0) -> TX P=0/Q=0 (socket 0) peer=02:00:00:00:00:00
Logical Core 26 (socket 1) forwards packets on 1 streams:
   RX P=0/Q=1 (socket 0) -> TX P=1/Q=1 (socket 0) peer=02:00:00:00:00:01
Logical Core 27 (socket 1) forwards packets on 1 streams:
   RX P=1/Q=1 (socket 0) -> TX P=0/Q=1 (socket 0) peer=02:00:00:00:00:00
Logical Core 28 (socket 1) forwards packets on 1 streams:
   RX P=0/Q=2 (socket 0) -> TX P=1/Q=2 (socket 0) peer=02:00:00:00:00:01
Logical Core 29 (socket 1) forwards packets on 1 streams:
   RX P=1/Q=2 (socket 0) -> TX P=0/Q=2 (socket 0) peer=02:00:00:00:00:00
Logical Core 30 (socket 1) forwards packets on 1 streams:
   RX P=0/Q=3 (socket 0) -> TX P=1/Q=3 (socket 0) peer=02:00:00:00:00:01
Logical Core 31 (socket 1) forwards packets on 1 streams:
   RX P=1/Q=3 (socket 0) -> TX P=0/Q=3 (socket 0) peer=02:00:00:00:00:00

   io packet forwarding packets/burst=32
   nb forwarding cores=8 - nb forwarding ports=2
   port 0: RX queue number: 4 Tx queue number: 4
     Rx offloads=0x0 Tx offloads=0x0
     RX queue: 0
       RX desc=0 - RX free threshold=0
       RX threshold registers: pthresh=0 hthresh=0  wthresh=0
       RX Offloads=0x0
     TX queue: 0
       TX desc=0 - TX free threshold=0
       TX threshold registers: pthresh=0 hthresh=0  wthresh=0
       TX offloads=0x0 - TX RS bit threshold=0
   port 1: RX queue number: 4 Tx queue number: 4
     Rx offloads=0x0 Tx offloads=0x0
     RX queue: 0
       RX desc=0 - RX free threshold=0
       RX threshold registers: pthresh=0 hthresh=0  wthresh=0
       RX Offloads=0x0
     TX queue: 0
       TX desc=0 - TX free threshold=0
       TX threshold registers: pthresh=0 hthresh=0  wthresh=0
       TX offloads=0x0 - TX RS bit threshold=0

lscpu:
------
Architecture:        x86_64
CPU op-mode(s):      32-bit, 64-bit
Byte Order:          Little Endian
CPU(s):              32
On-line CPU(s) list: 0-31
Thread(s) per core:  2
Core(s) per socket:  8
Socket(s):           2
NUMA node(s):        2
Vendor ID:           GenuineIntel
CPU family:          6
Model:               85
Model name:          Intel(R) Xeon(R) Silver 4110 CPU @ 2.10GHz
Stepping:            4
CPU MHz:             2400.075
BogoMIPS:            4200.00
Virtualization:      VT-x
L1d cache:           32K
L1i cache:           32K
L2 cache:            1024K
L3 cache:            11264K
NUMA node0 CPU(s):   0-7,16-23
NUMA node1 CPU(s):   8-15,24-31
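
One thing I'd like to rule out is hyperthread sibling overlap between 
the guest vCPUs and the testpmd forwarding cores.  Assuming the common 
Linux enumeration where the second thread of each core is offset by the 
total physical core count (16 here; verify against 
/sys/devices/system/cpu/cpu*/topology/thread_siblings_list), the node 1 
pairings would be:

```python
# Derive hyperthread sibling pairs for NUMA node 1 from the lscpu
# output above (2 threads/core, 8 cores/socket, 2 sockets,
# node1 CPUs 8-15,24-31).  Assumes sibling threads are offset by the
# total physical core count; check sysfs to confirm on real hardware.
node1_cpus = set(range(8, 16)) | set(range(24, 32))
offset = 2 * 8  # sockets * cores_per_socket
siblings = sorted((c, c + offset) for c in node1_cpus
                  if c + offset in node1_cpus)
print(siblings)  # [(8, 24), (9, 25), ..., (15, 31)]
```

If that enumeration holds, the guest vCPUs (cpuset 8-11) share physical 
cores with four of the testpmd forwarding cores (24-27), which could 
plausibly cap scaling.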

server-0 libvirt XML:
---------------------
...
   <memory unit='KiB'>4194304</memory>
   <currentMemory unit='KiB'>4194304</currentMemory>
   <memoryBacking>
     <hugepages>
       <page size='1048576' unit='KiB' nodeset='0'/>
     </hugepages>
   </memoryBacking>
   <vcpu placement='static' cpuset='8-11'>4</vcpu>
   <numatune>
     <memory mode='strict' nodeset='1'/>
   </numatune>
   <os>
     <type arch='x86_64' machine='pc-q35-rhel8.2.0'>hvm</type>
   </os>
   <features>
     <acpi/>
     <apic/>
     <vmport state='off'/>
   </features>
   <cpu mode='host-passthrough' check='none'>
     <numa>
       <cell id='0' cpus='0-3' memory='4194304' unit='KiB' 
memAccess='shared'/>
     </numa>
   </cpu>
...
     <interface type='vhostuser'>
       <mac address='52:54:00:2a:fc:10'/>
       <source type='unix' path='/tmp/vhost-dpdk-server-0' mode='client'/>
       <model type='virtio'/>
       <driver name='vhost' queues='4' rx_queue_size='1024'>
         <host mrg_rxbuf='on'/>
       </driver>
       <address type='pci' domain='0x0000' bus='0x01' slot='0x00' 
function='0x0'/>
     </interface>
...


