* Re: [dpdk-dev] Performance regression in DPDK 1.8/2.0
2015-04-27 22:28 ` Paul Emmerich
@ 2015-04-28 5:50 ` Matthew Hall
2015-04-28 10:56 ` Paul Emmerich
2015-04-28 10:43 ` Paul Emmerich
` (2 subsequent siblings)
3 siblings, 1 reply; 13+ messages in thread
From: Matthew Hall @ 2015-04-28 5:50 UTC (permalink / raw)
To: Paul Emmerich; +Cc: dev
On Apr 27, 2015, at 3:28 PM, Paul Emmerich <emmericp@net.in.tum.de> wrote:
> Let me know if you need any additional information.
> I'd also be interested in the configuration that resulted in the 20% speed-
> up that was mentioned in the original mbuf patch
Not sure if it's relevant or not, but there was another mail claiming PCIe MSI-X wasn't necessarily working in DPDK 2.x. Not sure if that could be causing slowdowns when there are drastic volumes of 64-byte packets causing a lot of PCI activity.
Also, you are mentioning some specific patches were involved... so I have to ask if anybody tried git bisect yet or not. Maybe easier than trying to guess at the answer.
Matthew.
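
A bisect of this kind could be automated with `git bisect run`, which treats exit code 0 as "good", 1 to 127 (other than 125) as "bad", and 125 as "cannot test, skip this commit". The sketch below only shows how a measured throughput might be mapped to those exit codes. The threshold value and the `measure_mpps` hook are hypothetical; how the rate is actually obtained (rebuilding DPDK, running l2fwd, reading the generator's counters) is left open.

```python
# Exit codes understood by `git bisect run`.
GOOD, BAD, SKIP = 0, 1, 125


def bisect_exit_code(mpps, threshold_mpps):
    """Map a measured throughput (Mpps) to a `git bisect run` exit code.

    mpps is None when the commit could not be built or measured.
    """
    if mpps is None:
        return SKIP
    return GOOD if mpps >= threshold_mpps else BAD


def measure_mpps():
    """Hypothetical hook: build this commit, run the test, return Mpps.

    Not implemented here; returning None makes the commit count as 'skip'.
    """
    return None


def main(threshold_mpps=10.0):
    # The default threshold is an assumed value that would have to sit
    # between the known-good and known-bad throughput.
    return bisect_exit_code(measure_mpps(), threshold_mpps)
```

A wrapper script would end with `sys.exit(main())` and be passed to `git bisect run` after marking one known-good and one known-bad commit.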
^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [dpdk-dev] Performance regression in DPDK 1.8/2.0
2015-04-28 5:50 ` Matthew Hall
@ 2015-04-28 10:56 ` Paul Emmerich
0 siblings, 0 replies; 13+ messages in thread
From: Paul Emmerich @ 2015-04-28 10:56 UTC (permalink / raw)
To: Matthew Hall; +Cc: dev
Hi,
Matthew Hall <mhall@mhcomputing.net>:
> Not sure if it's relevant or not, but there was another mail claiming PCIe MSI-X wasn't necessarily working in DPDK 2.x. Not sure if that could be causing slowdowns when there are drastic volumes of 64-byte packets causing a lot of PCI activity.
Interrupts should not be relevant here.
> Also, you are mentioning some specific patches were involved... so I have to ask if anybody tried git bisect yet or not. Maybe easier than trying to guess at the answer.
I have not yet tried to bisect it, but that’s the next step
on my todo list*. The mbuf patch was just an educated
guess to start a discussion.
I hoped that I was just doing something obviously wrong,
and/or that someone could point me to performance
regression tests that were executed to prove that the mbuf
patch does not affect performance.
However, there don’t seem to be any 'official' performance
regression tests, are there?
Paul
* I probably won’t be able to do it until next week, though, as
I have to finish the paper about my packet generator.
* Re: [dpdk-dev] Performance regression in DPDK 1.8/2.0
2015-04-27 22:28 ` Paul Emmerich
2015-04-28 5:50 ` Matthew Hall
@ 2015-04-28 10:43 ` Paul Emmerich
2015-04-28 10:55 ` Bruce Richardson
2015-04-28 10:58 ` Bruce Richardson
2015-04-28 11:31 ` De Lara Guarch, Pablo
3 siblings, 1 reply; 13+ messages in thread
From: Paul Emmerich @ 2015-04-28 10:43 UTC (permalink / raw)
To: De Lara Guarch, Pablo; +Cc: dev
Hi,
sorry, I mixed up the hardware I used for my tests.
Paul Emmerich <emmericp@net.in.tum.de>:
> CPU: Intel(R) Xeon(R) CPU E3-1230 v2
> TurboBoost and HyperThreading disabled.
> Frequency fixed at 3.30 GHz via acpi_cpufreq.
The CPU frequency was fixed at 1.60 GHz to enforce
a CPU bottleneck.
My original post said that I used a Xeon E5-2620 v3
at 1.2 GHz; this is incorrect. The calculation for Cycles/Pkt
in the original post used the correct 1.6 GHz figure, though.
(I used the E5 CPU for the evaluation of my packet generator
performance with 1.7.1/2.0.0, not for the l2fwd test.)
Sorry for the confusion.
Paul
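
For reference, the Cycles/Pkt figure is just the fixed clock frequency divided by the forwarded packet rate, so using the wrong frequency would have skewed it. The sketch below illustrates this; the 10 Mpps rate is a hypothetical placeholder and not a measurement from this thread.

```python
def cycles_per_packet(cpu_hz, packets_per_second):
    """CPU cycles spent per packet when the CPU is the bottleneck."""
    if packets_per_second <= 0:
        raise ValueError("packet rate must be positive")
    return cpu_hz / packets_per_second


EXAMPLE_RATE_PPS = 10.0e6  # hypothetical rate, not a figure from this thread

# Correct clock (1.6 GHz) versus the mistakenly reported one (1.2 GHz).
print(cycles_per_packet(1.6e9, EXAMPLE_RATE_PPS))  # 160.0
print(cycles_per_packet(1.2e9, EXAMPLE_RATE_PPS))  # 120.0
```

At the same packet rate, the mistaken 1.2 GHz value would understate the per-packet cost by a quarter.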
* Re: [dpdk-dev] Performance regression in DPDK 1.8/2.0
2015-04-28 10:43 ` Paul Emmerich
@ 2015-04-28 10:55 ` Bruce Richardson
2015-04-28 11:32 ` De Lara Guarch, Pablo
0 siblings, 1 reply; 13+ messages in thread
From: Bruce Richardson @ 2015-04-28 10:55 UTC (permalink / raw)
To: Paul Emmerich; +Cc: dev
On Tue, Apr 28, 2015 at 12:43:16PM +0200, Paul Emmerich wrote:
> Hi,
>
> sorry, I mixed up the hardware I used for my tests.
>
>
> Paul Emmerich <emmericp@net.in.tum.de>:
> > CPU: Intel(R) Xeon(R) CPU E3-1230 v2
> > TurboBoost and HyperThreading disabled.
> > Frequency fixed at 3.30 GHz via acpi_cpufreq.
>
> The CPU frequency was fixed at 1.60 GHz to enforce
> a CPU bottleneck.
>
>
> My original post said that I used a Xeon E5-2620 v3
> at 1.2 GHz, this is incorrect. The calculation for Cycles/Pkt
> in the original post used the correct 1.6 GHz figure, though.
>
> (I used the E5 CPU for the evaluation of my packet generator
> performance with 1.7.1/2.0.0, not for the l2fwd test.)
>
>
> Sorry for the confusion.
>
>
> Paul
Thanks for the update - we are investigating.
/Bruce
* Re: [dpdk-dev] Performance regression in DPDK 1.8/2.0
2015-04-28 10:55 ` Bruce Richardson
@ 2015-04-28 11:32 ` De Lara Guarch, Pablo
0 siblings, 0 replies; 13+ messages in thread
From: De Lara Guarch, Pablo @ 2015-04-28 11:32 UTC (permalink / raw)
To: Richardson, Bruce, Paul Emmerich; +Cc: dev
> -----Original Message-----
> From: Richardson, Bruce
> Sent: Tuesday, April 28, 2015 11:55 AM
> To: Paul Emmerich
> Cc: De Lara Guarch, Pablo; dev@dpdk.org
> Subject: Re: [dpdk-dev] Performance regression in DPDK 1.8/2.0
>
> On Tue, Apr 28, 2015 at 12:43:16PM +0200, Paul Emmerich wrote:
> > Hi,
> >
> > sorry, I mixed up the hardware I used for my tests.
> >
> >
> > Paul Emmerich <emmericp@net.in.tum.de>:
> > > CPU: Intel(R) Xeon(R) CPU E3-1230 v2
> > > TurboBoost and HyperThreading disabled.
> > > Frequency fixed at 3.30 GHz via acpi_cpufreq.
> >
> > The CPU frequency was fixed at 1.60 GHz to enforce
> > a CPU bottleneck.
> >
> >
> > My original post said that I used a Xeon E5-2620 v3
> > at 1.2 GHz, this is incorrect. The calculation for Cycles/Pkt
> > in the original post used the correct 1.6 GHz figure, though.
> >
> > (I used the E5 CPU for the evaluation of my packet generator
> > performance with 1.7.1/2.0.0, not for the l2fwd test.)
Thanks for the update. So, just for clarification,
for l2fwd you used E3-1230 v2 (Ivy Bridge), at 1.6 GHz or 3.3 GHz?
Pablo
> >
> >
> > Sorry for the confusion.
> >
> >
> > Paul
> Thanks for the update - we are investigating.
>
> /Bruce
* Re: [dpdk-dev] Performance regression in DPDK 1.8/2.0
2015-04-27 22:28 ` Paul Emmerich
2015-04-28 5:50 ` Matthew Hall
2015-04-28 10:43 ` Paul Emmerich
@ 2015-04-28 10:58 ` Bruce Richardson
2015-04-28 11:31 ` De Lara Guarch, Pablo
3 siblings, 0 replies; 13+ messages in thread
From: Bruce Richardson @ 2015-04-28 10:58 UTC (permalink / raw)
To: Paul Emmerich; +Cc: dev
On Tue, Apr 28, 2015 at 12:28:34AM +0200, Paul Emmerich wrote:
> Let me know if you need any additional information.
> I'd also be interested in the configuration that resulted in the 20% speed-
> up that was mentioned in the original mbuf patch
>
> Paul
>
The speed-up would be for apps that were doing RX of scattered packets, i.e.
across mbufs. Before 1.8, this was using a scalar function which was rather
slow compared to the fast-path vector function. In 1.8 we introduced a new
vector function which supported scattered packets - it still isn't as fast as
the non-scattered packet RX function, but it was a good improvement over the
older version.
/Bruce
* Re: [dpdk-dev] Performance regression in DPDK 1.8/2.0
2015-04-27 22:28 ` Paul Emmerich
` (2 preceding siblings ...)
2015-04-28 10:58 ` Bruce Richardson
@ 2015-04-28 11:31 ` De Lara Guarch, Pablo
2015-04-28 11:48 ` Paul Emmerich
3 siblings, 1 reply; 13+ messages in thread
From: De Lara Guarch, Pablo @ 2015-04-28 11:31 UTC (permalink / raw)
To: Paul Emmerich; +Cc: dev
> -----Original Message-----
> From: Paul Emmerich [mailto:emmericp@net.in.tum.de]
> Sent: Monday, April 27, 2015 11:29 PM
> To: De Lara Guarch, Pablo
> Cc: Pavel Odintsov; dev@dpdk.org
> Subject: Re: [dpdk-dev] Performance regression in DPDK 1.8/2.0
>
> Hi,
>
> Pablo <pablo.de.lara.guarch@intel.com>:
> > Could you tell me how you got the L1 cache miss ratio? Perf?
>
> perf stat -e L1-dcache-loads,L1-dcache-misses l2fwd ...
>
>
> > Could you provide more information on how you run the l2fwd app,
> > in order to try to reproduce the issue:
> > - L2fwd Command line
>
> ./build/l2fwd -c 3 -n 2 -- -p 3 -q 2
>
>
> > - L2fwd initialization (to check memory/CPU/NICs)
>
> I unfortunately did not save the output, but I wrote down the important
> parts:
>
> 1.7.1: no output regarding rx/tx code paths as init debug wasn't enabled
> 1.8.0 and 2.0.0: simple tx code path, vector rx
>
>
> Hardware:
>
> CPU: Intel(R) Xeon(R) CPU E3-1230 v2
> TurboBoost and HyperThreading disabled.
> Frequency fixed at 3.30 GHz via acpi_cpufreq.
>
> NIC: X540-T2
>
> Memory: Dual Channel DDR3 1333 MHz, 4x 4GB
>
> > Did you change the l2fwd app between versions? L2fwd uses simple rx on
> 1.7.1,
> > whereas it uses vector rx on 2.0 (enable IXGBE_DEBUG_INIT to check it).
>
> Yes, I had to update l2fwd when going from 1.7.1 to 1.8.0. However, the
> changes in the app were minimal.
Could you tell me which changes you made here? I see you are using the simple tx code path on 1.8.0,
but with the default values, you should be using vector tx,
unless you have changed something in the tx configuration.
I am also not sure whether you are then using the simple tx code path on 1.7.1, plus scattered rx.
(Without changing the l2fwd app, I use scattered rx and vector tx.)
Thanks!
Pablo
>
> 1.8.0 and 2.0.0 used vector rx. Disabling vector rx via DPDK .config file
> causes another 30% performance loss so I kept it enabled.
>
>
>
> > Which packet format/size did you use? Does your traffic generator take
> into account the Inter-packet gap?
>
> 64 Byte packets, full line rate on both ports, i.e. 14.88 Mpps per port.
> The packet's content doesn't matter as l2fwd doesn't look at it, but it was
> just some random stuff: EthType 0x1234.
>
>
> Let me know if you need any additional information.
> I'd also be interested in the configuration that resulted in the 20% speed-
> up that was mentioned in the original mbuf patch
>
> Paul
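
As a cross-check of the 14.88 Mpps figure quoted above: each Ethernet frame occupies an extra 8 bytes of preamble/SFD and 12 bytes of inter-frame gap on the wire, so a 64-byte frame takes 84 byte times. The sketch below assumes a 10 Gbit/s link per port, which the thread implies but does not state explicitly.

```python
PREAMBLE_SFD_BYTES = 8     # preamble plus start-of-frame delimiter
INTERFRAME_GAP_BYTES = 12  # minimum inter-frame gap


def line_rate_pps(link_bps, frame_bytes):
    """Maximum frames per second for a given link speed and frame size."""
    wire_bits = (frame_bytes + PREAMBLE_SFD_BYTES + INTERFRAME_GAP_BYTES) * 8
    return link_bps / wire_bits


# Assumed link speed: 10 Gbit/s per port.
rate = line_rate_pps(10e9, 64)
print(f"{rate / 1e6:.2f} Mpps")  # 14.88 Mpps
```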
* Re: [dpdk-dev] Performance regression in DPDK 1.8/2.0
2015-04-28 11:31 ` De Lara Guarch, Pablo
@ 2015-04-28 11:48 ` Paul Emmerich
2015-05-05 14:56 ` De Lara Guarch, Pablo
0 siblings, 1 reply; 13+ messages in thread
From: Paul Emmerich @ 2015-04-28 11:48 UTC (permalink / raw)
To: De Lara Guarch, Pablo; +Cc: dev
Hi,
De Lara Guarch, Pablo <pablo.de.lara.guarch@intel.com>:
> Could you tell me which changes you made here? I see you are using simple tx code path on 1.8.0,
> but with the default values, you should be using vector tx,
> unless you have changed anything in the tx configuration.
sorry, I might have written that down wrong or read the output wrong.
I did not modify the l2fwd example.
> So, just for clarification,
> for l2fwd you used E3-1230 v2 (Ivy Bridge), at 1.6 GHz or 3.3 GHz?
At 1.6 GHz as it is simply too fast at 3.3 GHz ;)
I’ll probably write a minimal example that shows my
problem with tx only sometime next week.
I just used the l2fwd example to illustrate my point
with a 'builtin' example.
Paul
* Re: [dpdk-dev] Performance regression in DPDK 1.8/2.0
2015-04-28 11:48 ` Paul Emmerich
@ 2015-05-05 14:56 ` De Lara Guarch, Pablo
0 siblings, 0 replies; 13+ messages in thread
From: De Lara Guarch, Pablo @ 2015-05-05 14:56 UTC (permalink / raw)
To: Paul Emmerich; +Cc: dev
Hi Paul,
> -----Original Message-----
> From: Paul Emmerich [mailto:emmericp@net.in.tum.de]
> Sent: Tuesday, April 28, 2015 12:48 PM
> To: De Lara Guarch, Pablo
> Cc: Pavel Odintsov; dev@dpdk.org
> Subject: Re: [dpdk-dev] Performance regression in DPDK 1.8/2.0
>
> Hi,
>
>
> De Lara Guarch, Pablo <pablo.de.lara.guarch@intel.com>:
> > Could you tell me which changes you made here? I see you are using
> simple tx code path on 1.8.0,
> > but with the default values, you should be using vector tx,
> > unless you have changed anything in the tx configuration.
>
> sorry, I might have written that down wrong or read the output wrong.
> I did not modify the l2fwd example.
>
>
> > So, just for clarification,
> > for l2fwd you used E3-1230 v2 (Ivy Bridge), at 1.6 GHz or 3.3 GHz?
>
> At 1.6 GHz as it is simply too fast at 3.3 GHz ;)
>
>
> I'll probably write a minimal example that shows my
> problem with tx only sometime next week.
> I just used the l2fwd example to illustrate my point
> with a 'builtin' example.
Thanks for the clarification. I tested it on Ivy Bridge as well, and I could not reproduce the issue.
Make sure that you use vector rx/tx anyway, to get the best performance
(you should be seeing better performance, since l2fwd in 1.8/2.0 uses both vector rx/tx).
Thanks,
Pablo
>
> Paul