DPDK usage discussions
* [dpdk-users] Explanation for poor performance of DPDK not found
@ 2018-08-27 15:21 Victor Huertas
  2018-08-27 16:05 ` Stephen Hemminger
  0 siblings, 1 reply; 4+ messages in thread
From: Victor Huertas @ 2018-08-27 15:21 UTC (permalink / raw)
  To: users

Dear colleagues,

I am seeing a strange behaviour in terms of performance when I run the L3
forwarding pipeline app example of DPDK.

The diagram is as simple as this:

PC1 <--------1 Gbps link----------> DPDK app (L3 forwarding) <--------1
Gbps link--------> PC2

I have implemented a new pipeline which performs the ARP task in order to
configure the Routing pipeline's table 1 (the one that translates the
next-hop IP address into a MAC address).
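
(For reference only, here is a minimal sketch of how an entry could be
pushed into a librte_pipeline table once ARP has resolved the next hop.
The key layout and the action used below are illustrative assumptions,
not the actual structures of the ip_pipeline routing pipeline, which adds
its entries from the pipeline thread through its own message queues.)

#include <rte_pipeline.h>

/* Hypothetical key for the ARP table: next-hop IP plus output port. */
struct arp_key {
	uint32_t next_hop_ip;
	uint32_t port_out_id;
};

static int
add_arp_entry(struct rte_pipeline *p, uint32_t table_id,
	      struct arp_key *key, uint32_t port_out_id)
{
	/* Forward matching packets straight to the resolved output port. */
	struct rte_pipeline_table_entry entry = {
		.action = RTE_PIPELINE_ACTION_PORT,
		.port_id = port_out_id,
	};
	struct rte_pipeline_table_entry *entry_ptr;
	int key_found;

	return rte_pipeline_table_entry_add(p, table_id, key, &entry,
		&key_found, &entry_ptr);
}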

The first strange thing I see is that when I ping from PC1 to PC2, the
ping works, but it reports a delay of 19.9 ms. Moreover, each ping report
(one per second) shows the delay decreasing by 1 ms, like this:
PING 192.168.1.101 (192.168.1.101) 56(84) bytes of data.
64 bytes from 192.168.1.101: icmp_seq=2 ttl=64 time=17.2 ms
64 bytes from 192.168.1.101: icmp_seq=3 ttl=64 time=15.9 ms
64 bytes from 192.168.1.101: icmp_seq=4 ttl=64 time=14.9 ms
64 bytes from 192.168.1.101: icmp_seq=5 ttl=64 time=13.9 ms
64 bytes from 192.168.1.101: icmp_seq=6 ttl=64 time=12.9 ms
64 bytes from 192.168.1.101: icmp_seq=7 ttl=64 time=11.9 ms
64 bytes from 192.168.1.101: icmp_seq=8 ttl=64 time=10.9 ms
64 bytes from 192.168.1.101: icmp_seq=9 ttl=64 time=19.9 ms
64 bytes from 192.168.1.101: icmp_seq=10 ttl=64 time=18.9 ms
64 bytes from 192.168.1.101: icmp_seq=11 ttl=64 time=17.9 ms

As you can see, the delay decreases by 1 ms with each ping report and then
suddenly jumps back to 19.9 ms.
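
(As background on where buffering could enter the picture: a
librte_pipeline application typically polls in a loop like the sketch
below, where packets sit in the output ports until a burst fills up or
rte_pipeline_flush() is called. This is only an illustration; the flush
period shown is an assumption, not the value from my configuration.)

#include <rte_pipeline.h>

#define FLUSH_PERIOD 1024	/* assumption: flush every 1024 iterations */

static void
pipeline_run_loop(struct rte_pipeline *p)
{
	uint64_t i;

	for (i = 0; ; i++) {
		/* Poll the input ports, walk the tables, buffer TX packets. */
		rte_pipeline_run(p);

		/* Periodically push out partially filled TX bursts. */
		if ((i & (FLUSH_PERIOD - 1)) == 0)
			rte_pipeline_flush(p);
	}
}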

The second issue comes up when I send a 700 Mbps UDP stream (using iperf
v2.0.5 on both sides) from PC1 to PC2: I see a slight packet loss on
reception.
[  4]  0.0-509.3 sec  1 datagrams received out-of-order
[  3] local 192.168.0.101 port 5001 connected with 192.168.1.101 port 60184
[  3]  0.0- 5.0 sec   437 MBytes   733 Mbits/sec   0.022 ms   39/311788 (0.013%)
[  3]  5.0-10.0 sec   437 MBytes   733 Mbits/sec   0.025 ms  166/311988 (0.053%)
[  3] 10.0-15.0 sec   437 MBytes   734 Mbits/sec   0.022 ms    0/312067 (0%)
[  3] 15.0-20.0 sec   437 MBytes   733 Mbits/sec   0.029 ms  151/311916 (0.048%)
[  3] 20.0-25.0 sec   437 MBytes   734 Mbits/sec   0.016 ms   30/311926 (0.0096%)
[  3] 25.0-30.0 sec   437 MBytes   734 Mbits/sec   0.022 ms  143/312118 (0.046%)
[  3] 30.0-35.0 sec   437 MBytes   733 Mbits/sec   0.022 ms   20/311801 (0.0064%)
[  3] 35.0-40.0 sec   437 MBytes   733 Mbits/sec   0.020 ms  202/311857 (0.065%)
[  3] 40.0-45.0 sec   437 MBytes   733 Mbits/sec   0.017 ms  242/311921 (0.078%)
[  3] 45.0-50.0 sec   437 MBytes   733 Mbits/sec   0.021 ms  280/311890 (0.09%)
[  3] 50.0-55.0 sec   438 MBytes   734 Mbits/sec   0.019 ms    0/312119 (0%)
[  3] 55.0-60.0 sec   436 MBytes   732 Mbits/sec   0.018 ms  152/311339 (0.049%)
[  3] 60.0-65.0 sec   437 MBytes   734 Mbits/sec   0.017 ms  113/312048 (0.036%)
[  3] 65.0-70.0 sec   437 MBytes   733 Mbits/sec   0.023 ms  180/311756 (0.058%)
[  3] 70.0-75.0 sec   437 MBytes   734 Mbits/sec   0.020 ms    0/311960 (0%)
[  3] 75.0-80.0 sec   437 MBytes   734 Mbits/sec   0.013 ms  118/312060 (0.038%)
[  3] 80.0-85.0 sec   437 MBytes   734 Mbits/sec   0.019 ms  122/312060 (0.039%)
[  3] 85.0-90.0 sec   437 MBytes   733 Mbits/sec   0.025 ms   55/311904 (0.018%)
[  3] 90.0-95.0 sec   437 MBytes   733 Mbits/sec   0.024 ms  259/312002 (0.083%)
[  3]  0.0-97.0 sec  8.28 GBytes   733 Mbits/sec   0.034 ms 2271/6053089 (0.038%)

Sometimes I even see out-of-order packet reports from the iperf receiving side.

I did not expect such performance in terms of delay and throughput, and I
would like to find an explanation. That's why I need your help.

Allow me to tell you some particularities of the machine that runs the DPDK
application and of the environment, which could help explain this behaviour.


   1. I run the application from the "Debug" environment of Eclipse on
   Linux openSUSE Leap 42.3.
   2. The hugepage size on this machine is 2 MB.
   3. 1024 hugepages have been reserved for the application.
   4. The lscpu output is shown below.

cuda1@cuda1:~/eclipse-workspace/SimpleModelDPDK> lscpu
Architecture:          x86_64
CPU op-mode(s):        32-bit, 64-bit
Byte Order:            Little Endian
CPU(s):                8
On-line CPU(s) list:   0-7
Thread(s) per core:    1
Core(s) per socket:    4
Socket(s):             2
NUMA node(s):          2
Vendor ID:             GenuineIntel
CPU family:            6
Model:                 26
Model name:            Intel(R) Xeon(R) CPU           E5506  @ 2.13GHz
Stepping:              5
CPU MHz:               2133.000
CPU max MHz:           2133.0000
CPU min MHz:           1600.0000
BogoMIPS:              4267.10
Virtualization:        VT-x
L1d cache:             32K
L1i cache:             32K
L2 cache:              256K
L3 cache:              4096K
NUMA node0 CPU(s):     0-3
NUMA node1 CPU(s):     4-7
Flags:                 fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge
mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall
nx rdtscp lm constant_tsc arch_perfmon pebs bts rep_good nopl xtopology
nonstop_tsc aperfmperf pni dtes64 monitor ds_cpl vmx est tm2 ssse3 cx16
xtpr pdcm dca sse4_1 sse4_2 popcnt lahf_lm dtherm retpoline kaiser
tpr_shadow vnmi flexpriority ept vpid

5. The Routing pipeline runs on core 1, the Master pipeline runs on core 0,
and the new ARP pipeline runs on core 2.

6. The two NICs I am using do not seem to be assigned to any NUMA node:

cuda1@cuda1:~/eclipse-workspace/SimpleModelDPDK> cat /sys/bus/pci/devices/0000\:04\:00.0/numa_node
-1
cuda1@cuda1:~/eclipse-workspace/SimpleModelDPDK> cat /sys/bus/pci/devices/0000\:04\:00.1/numa_node
-1
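
(Since numa_node reports -1 here, a small sanity check like the following
could be added at initialisation to compare the socket reported for each
port with the socket of the lcore that polls it. This is just an
illustrative check of mine, not part of the example application.)

#include <rte_ethdev.h>
#include <rte_lcore.h>
#include <rte_log.h>

static void
check_port_numa(uint16_t port_id, unsigned int rx_lcore_id)
{
	int port_socket = rte_eth_dev_socket_id(port_id);   /* -1 if unknown */
	unsigned int lcore_socket = rte_lcore_to_socket_id(rx_lcore_id);

	if (port_socket < 0)
		RTE_LOG(WARNING, USER1,
			"port %u: NUMA node unknown (sysfs numa_node = -1)\n",
			port_id);
	else if ((unsigned int)port_socket != lcore_socket)
		RTE_LOG(WARNING, USER1,
			"port %u is on socket %d but polled from lcore %u (socket %u)\n",
			port_id, port_socket, rx_lcore_id, lcore_socket);
}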

7. According to the ROUTING pipeline statistics (for table 0 and table 1),
very few miss drops are reported at table 0, and they do not coincide at
all with the losses reported by iperf (the iperf losses are much higher
than the table 0 and table 1 drops). The links used in the application do
not report any drops at all either.

So where are these packets dropped?
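
(In case it helps to narrow this down, a minimal sketch of reading the
NIC-level counters with rte_eth_stats_get() is shown below; drops caused
by the RX ring not being refilled fast enough show up as imissed, and mbuf
pool exhaustion as rx_nombuf, neither of which appears in the pipeline
table or link statistics. This is only an illustrative check, not code
from the example application.)

#include <inttypes.h>
#include <stdio.h>
#include <rte_ethdev.h>

static void
print_port_drop_counters(uint16_t port_id)
{
	struct rte_eth_stats stats;

	if (rte_eth_stats_get(port_id, &stats) != 0)
		return;

	printf("port %u: ipackets=%" PRIu64 " imissed=%" PRIu64
	       " ierrors=%" PRIu64 " rx_nombuf=%" PRIu64 "\n",
	       port_id, stats.ipackets, stats.imissed,
	       stats.ierrors, stats.rx_nombuf);
}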

Does any of you have an idea whether these particularities of my PC can
justify this behaviour?

I need to find an answer to this, because I expected much better
performance given the usual DPDK performance expectations.

Thanks for your attention

Victor

-- 
Victor


* Re: [dpdk-users] Explanation for poor performance of DPDK not found
  2018-08-27 15:21 [dpdk-users] Explanation for poor performance of DPDK not found Victor Huertas
@ 2018-08-27 16:05 ` Stephen Hemminger
  2018-08-28  7:22   ` Victor Huertas
  0 siblings, 1 reply; 4+ messages in thread
From: Stephen Hemminger @ 2018-08-27 16:05 UTC (permalink / raw)
  To: users

On Mon, 27 Aug 2018 17:21:03 +0200
Victor Huertas <vhuertas@gmail.com> wrote:

> [... original message quoted in full; snipped ...]

What NIC?


* Re: [dpdk-users] Explanation for poor performance of DPDK not found
  2018-08-27 16:05 ` Stephen Hemminger
@ 2018-08-28  7:22   ` Victor Huertas
  2018-08-28  8:36     ` Victor Huertas
  0 siblings, 1 reply; 4+ messages in thread
From: Victor Huertas @ 2018-08-28  7:22 UTC (permalink / raw)
  To: stephen; +Cc: users

You are right, Stephen.
I missed the NICs' description. Sorry about that. Below you can find the
NICs I have in this machine (for DPDK I am using 0000:04:00.0 and
0000:04:00.1).

cuda1@cuda1:~/eclipse-workspace/SimpleModelDPDK> sudo lspci -D | grep 'Network\|Ethernet'

0000:02:00.0 Ethernet controller: Intel Corporation 82572EI Gigabit Ethernet Controller (Copper) (rev 06)
0000:04:00.0 Ethernet controller: Intel Corporation 82576 Gigabit Network Connection (rev 01)
0000:04:00.1 Ethernet controller: Intel Corporation 82576 Gigabit Network Connection (rev 01)

Thanks for your attention

Regards,

On Mon, 27 Aug 2018 at 18:05, Stephen Hemminger (<stephen@networkplumber.org>) wrote:

> On Mon, 27 Aug 2018 17:21:03 +0200
> Victor Huertas <vhuertas@gmail.com> wrote:
>
> > [... original message quoted in full; snipped ...]
>
> What NIC?
>


-- 
Victor


* Re: [dpdk-users] Explanation for poor performance of DPDK not found
  2018-08-28  7:22   ` Victor Huertas
@ 2018-08-28  8:36     ` Victor Huertas
  0 siblings, 0 replies; 4+ messages in thread
From: Victor Huertas @ 2018-08-28  8:36 UTC (permalink / raw)
  To: stephen; +Cc: users

Just to provide additional info: the DPDK version used is 17.11, and I am
running the application with these two NICs bound to the VFIO driver.

Regards,

PS: I am resending this message to the whole dpdk-users group.

On Tue, 28 Aug 2018 at 09:22, Victor Huertas (<vhuertas@gmail.com>) wrote:

> [... previous message quoted in full; snipped ...]


-- 
Victor

