* [dpdk-users] Explanation for poor performance of DPDK not found
From: Victor Huertas @ 2018-08-27 15:21 UTC
To: users
Dear colleagues,
I am seeing strange performance behaviour when I run the L3 forwarding
pipeline example application of DPDK.
The diagram is as simple as this:
PC1 <---- 1 Gbps link ----> DPDK app (L3 forwarding) <---- 1 Gbps link ----> PC2
I have implemented a new pipeline which performs the ARP task in order to
populate table 1 of the Routing-type pipeline (the table that translates the
next-hop IP address into a MAC address).
The first strange thing I see is that when I ping from PC1 to PC2, the ping
works but it reports a delay of 19.9 ms. Moreover, each successive ping report
(one per second) shows the delay decreasing by 1 ms, like this:
PING 192.168.1.101 (192.168.1.101) 56(84) bytes of data.
64 bytes from 192.168.1.101: icmp_seq=2 ttl=64 time=17.2 ms
64 bytes from 192.168.1.101: icmp_seq=3 ttl=64 time=15.9 ms
64 bytes from 192.168.1.101: icmp_seq=4 ttl=64 time=14.9 ms
64 bytes from 192.168.1.101: icmp_seq=5 ttl=64 time=13.9 ms
64 bytes from 192.168.1.101: icmp_seq=6 ttl=64 time=12.9 ms
64 bytes from 192.168.1.101: icmp_seq=7 ttl=64 time=11.9 ms
64 bytes from 192.168.1.101: icmp_seq=8 ttl=64 time=10.9 ms
64 bytes from 192.168.1.101: icmp_seq=9 ttl=64 time=19.9 ms
64 bytes from 192.168.1.101: icmp_seq=10 ttl=64 time=18.9 ms
64 bytes from 192.168.1.101: icmp_seq=11 ttl=64 time=17.9 ms
As you can see, the delay decreases by 1 ms with each ping report and then
suddenly jumps back to 19.9 ms.
The second issue comes up when I send a 700 Mbps UDP stream (using iperf
v2.0.5 on both sides) from PC1 to PC2: I see a slight packet loss on
reception.
[ 4] 0.0-509.3 sec 1 datagrams received out-of-order
[ 3] local 192.168.0.101 port 5001 connected with 192.168.1.101 port 60184
[ 3]  0.0- 5.0 sec 437 MBytes 733 Mbits/sec 0.022 ms   39/311788 (0.013%)
[ 3]  5.0-10.0 sec 437 MBytes 733 Mbits/sec 0.025 ms  166/311988 (0.053%)
[ 3] 10.0-15.0 sec 437 MBytes 734 Mbits/sec 0.022 ms    0/312067 (0%)
[ 3] 15.0-20.0 sec 437 MBytes 733 Mbits/sec 0.029 ms  151/311916 (0.048%)
[ 3] 20.0-25.0 sec 437 MBytes 734 Mbits/sec 0.016 ms   30/311926 (0.0096%)
[ 3] 25.0-30.0 sec 437 MBytes 734 Mbits/sec 0.022 ms  143/312118 (0.046%)
[ 3] 30.0-35.0 sec 437 MBytes 733 Mbits/sec 0.022 ms   20/311801 (0.0064%)
[ 3] 35.0-40.0 sec 437 MBytes 733 Mbits/sec 0.020 ms  202/311857 (0.065%)
[ 3] 40.0-45.0 sec 437 MBytes 733 Mbits/sec 0.017 ms  242/311921 (0.078%)
[ 3] 45.0-50.0 sec 437 MBytes 733 Mbits/sec 0.021 ms  280/311890 (0.09%)
[ 3] 50.0-55.0 sec 438 MBytes 734 Mbits/sec 0.019 ms    0/312119 (0%)
[ 3] 55.0-60.0 sec 436 MBytes 732 Mbits/sec 0.018 ms  152/311339 (0.049%)
[ 3] 60.0-65.0 sec 437 MBytes 734 Mbits/sec 0.017 ms  113/312048 (0.036%)
[ 3] 65.0-70.0 sec 437 MBytes 733 Mbits/sec 0.023 ms  180/311756 (0.058%)
[ 3] 70.0-75.0 sec 437 MBytes 734 Mbits/sec 0.020 ms    0/311960 (0%)
[ 3] 75.0-80.0 sec 437 MBytes 734 Mbits/sec 0.013 ms  118/312060 (0.038%)
[ 3] 80.0-85.0 sec 437 MBytes 734 Mbits/sec 0.019 ms  122/312060 (0.039%)
[ 3] 85.0-90.0 sec 437 MBytes 733 Mbits/sec 0.025 ms   55/311904 (0.018%)
[ 3] 90.0-95.0 sec 437 MBytes 733 Mbits/sec 0.024 ms  259/312002 (0.083%)
[ 3]  0.0-97.0 sec 8.28 GBytes 733 Mbits/sec 0.034 ms 2271/6053089 (0.038%)
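For reference, I generated the stream with iperf 2.0.5 along these lines (the flags below are a reconstruction rather than a verbatim copy of what I typed; 192.168.0.101 is the receiver address shown in the report above):

# receiving side: UDP server on the default port 5001, reporting every 5 seconds
iperf -s -u -i 5
# sending side: 700 Mbps UDP stream towards the receiver for roughly 95 seconds
iperf -c 192.168.0.101 -u -b 700M -i 5 -t 95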
Sometimes the iperf receiving side even reports packet reordering.
I did not expect such performance in terms of delay and throughput, and I
would like to find an explanation. That's why I need your help.
Allow me to describe some particularities of the machine that runs the DPDK
application and of the environment, which could help explain this behaviour.
1. I run the application from the "Debug" environment of Eclipse on Linux
openSUSE Leap 42.3.
2. The hugepage size on this machine is 2 MB.
3. 1024 hugepages have been reserved for the application (see the example
commands after this list).
4. The lscpu output is shown below:
cuda1@cuda1:~/eclipse-workspace/SimpleModelDPDK> lscpu
Architecture: x86_64
CPU op-mode(s): 32-bit, 64-bit
Byte Order: Little Endian
CPU(s): 8
On-line CPU(s) list: 0-7
Thread(s) per core: 1
Core(s) per socket: 4
Socket(s): 2
NUMA node(s): 2
Vendor ID: GenuineIntel
CPU family: 6
Model: 26
Model name: Intel(R) Xeon(R) CPU E5506 @ 2.13GHz
Stepping: 5
CPU MHz: 2133.000
CPU max MHz: 2133.0000
CPU min MHz: 1600.0000
BogoMIPS: 4267.10
Virtualization: VT-x
L1d cache: 32K
L1i cache: 32K
L2 cache: 256K
L3 cache: 4096K
NUMA node0 CPU(s): 0-3
NUMA node1 CPU(s): 4-7
Flags: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge
mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall
nx rdtscp lm constant_tsc arch_perfmon pebs bts rep_good nopl xtopology
nonstop_tsc aperfmperf pni dtes64 monitor ds_cpl vmx est tm2 ssse3 cx16
xtpr pdcm dca sse4_1 sse4_2 popcnt lahf_lm dtherm retpoline kaiser
tpr_shadow vnmi flexpriority ept vpid
5. The Routing pipeline runs on core 1, the Master pipeline on core 0, and
the new ARP pipeline on core 2.
6. The two NICs I am using do not seem to be assigned to any NUMA node (see
the example commands after this list for how locality can be checked):
cuda1@cuda1:~/eclipse-workspace/SimpleModelDPDK> cat /sys/bus/pci/devices/0000\:04\:00.0/numa_node
-1
cuda1@cuda1:~/eclipse-workspace/SimpleModelDPDK> cat /sys/bus/pci/devices/0000\:04\:00.1/numa_node
-1
7. According to the ROUTING pipeline statistics for table 0 and table 1, very
few table-miss drops are reported at table 0, and they do not match the
losses reported by iperf at all (the iperf losses are much higher than the
table 0 and table 1 drops); the links used in the application do not report
any drops either.
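For points 2, 3 and 6 above, this is roughly how the hugepage reservation and the NUMA locality check can be done (a sketch using the standard sysfs paths; my actual setup steps may have differed slightly):

# reserve 1024 x 2 MB hugepages, here on NUMA node 0 (the node hosting cores 0-3)
echo 1024 > /sys/devices/system/node/node0/hugepages/hugepages-2048kB/nr_hugepages
# check which CPUs are local to each DPDK port (useful when numa_node reports -1)
cat /sys/bus/pci/devices/0000\:04\:00.0/local_cpulist
cat /sys/bus/pci/devices/0000\:04\:00.1/local_cpulist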
So where are these packets dropped?
Do any of you have an idea whether these particularities of my PC could
explain this behaviour?
I need to find an answer to this, because I expected much better performance
given what DPDK is expected to deliver.
Thanks for your attention
Victor
--
Victor
* Re: [dpdk-users] Explanation for poor performance of DPDK not found
From: Stephen Hemminger @ 2018-08-27 16:05 UTC
To: users
On Mon, 27 Aug 2018 17:21:03 +0200
Victor Huertas <vhuertas@gmail.com> wrote:
> [...]
>
What NIC?
* Re: [dpdk-users] Explanation for poor performance of DPDK not found
From: Victor Huertas @ 2018-08-28 7:22 UTC
To: stephen; +Cc: users
You are right, Stephen.
I missed the NICs' description; sorry about that. Below you can find the
NICs I have in this machine (for DPDK I am using 0000:04:00.0 and
0000:04:00.1):
cuda1@cuda1:~/eclipse-workspace/SimpleModelDPDK> sudo lspci -D | grep 'Network\|Ethernet'
0000:02:00.0 Ethernet controller: Intel Corporation 82572EI Gigabit Ethernet Controller (Copper) (rev 06)
0000:04:00.0 Ethernet controller: Intel Corporation 82576 Gigabit Network Connection (rev 01)
0000:04:00.1 Ethernet controller: Intel Corporation 82576 Gigabit Network Connection (rev 01)
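For completeness, the current binding of these ports can be confirmed with the devbind script shipped with DPDK (illustrative invocation, run from the DPDK source tree):

./usertools/dpdk-devbind.py --status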
Thanks for your attention
Regards,
On Mon, 27 Aug 2018 at 18:05, Stephen Hemminger (<
stephen@networkplumber.org>) wrote:
> On Mon, 27 Aug 2018 17:21:03 +0200
> Victor Huertas <vhuertas@gmail.com> wrote:
>
> > [...]
>
> What NIC?
>
--
Victor
* Re: [dpdk-users] Explanation for poor performance of DPDK not found
From: Victor Huertas @ 2018-08-28 8:36 UTC
To: stephen; +Cc: users
Just to provide additional info: the DPDK version used is 17.11, and I am
running the application with these two NICs bound to the VFIO (vfio-pci)
driver.
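The binding was done roughly like this (a sketch; the exact commands may have differed slightly, and VFIO additionally requires the IOMMU to be enabled, e.g. intel_iommu=on on the kernel command line):

modprobe vfio-pci
./usertools/dpdk-devbind.py --bind=vfio-pci 0000:04:00.0 0000:04:00.1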
Regards,
PS: I am resending this message to the whole dpdk-users list.
On Tue, 28 Aug 2018 at 9:22, Victor Huertas (<vhuertas@gmail.com>)
wrote:
> [...]
--
Victor