* Re: [dpdk-dev] [PATCH 0/2] L3FWD sample optimisation
@ 2014-05-28 9:17 Ananyev, Konstantin
2014-06-10 22:44 ` Thomas Monjalon
0 siblings, 1 reply; 6+ messages in thread
From: Ananyev, Konstantin @ 2014-05-28 9:17 UTC
To: Thomas Monjalon; +Cc: dev
Hi Thomas,
>As you are doing optimizations, it's important to know the performance gain.
>It could help to mitigate future reworks.
>So please, could you provide some benchmarking numbers in the commit log?
Some performance data below.
Also, I forgot to mention that the new code path can be switched on/off by setting
the ENABLE_MULTI_BUFFER_OPTIMIZE macro to 1/0.
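A minimal sketch of how a 1/0 macro like this usually gates a code path at compile time. Only the macro name comes from this message; the helper l3fwd_path_name() and the default value of 1 are assumptions made for illustration, not the patch's actual code.

```c
/*
 * Default used by this sketch only. Building with
 * -DENABLE_MULTI_BUFFER_OPTIMIZE=0 would select the other branch.
 */
#ifndef ENABLE_MULTI_BUFFER_OPTIMIZE
#define ENABLE_MULTI_BUFFER_OPTIMIZE 1
#endif

/* Hypothetical helper: reports which path was compiled in. */
static const char *l3fwd_path_name(void)
{
#if (ENABLE_MULTI_BUFFER_OPTIMIZE == 1)
	return "multi-buffer";	/* new, burst-oriented path */
#else
	return "per-packet";	/* original path */
#endif
}
```

Because the choice is made by the preprocessor, the unused path costs nothing at run time, and comparing OLD against NEW means rebuilding with the macro flipped.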
Do I need to resubmit the whole patch series, or just a cover letter, or ...?
Konstantin
SUT: dual-socket IVB 2.8 GHz board with 4 ports on 4 NICs (all on socket 0), connected to the traffic generator.
2x1GB pages, kernel: 3.11.3-201.fc19.x86_64, gcc 4.8.2.
64B packets, using the packet flooding method.
All 4 ports are managed by one logical core.
Optimised scalar PMD RX/TX was used.
DIFF % (NEW-OLD)
IPV4-CONT-BURST:      +23%
IPV6-CONT-BURST:      +13%
IPV4/IPV6-CONT-BURST: +8%
IPV4-4STREAMSX8:      +7%
IPV4-4STREAMSX1:      -2%
Test cases description:
IPV4-CONT-BURST - IPV4 packets; all packets from one input port are destined for the same output port.
IPV6-CONT-BURST - IPV6 packets; all packets from one input port are destined for the same output port.
IPV4/IPV6-CONT-BURST - mix of the first 2 with interleave=1 (e.g.: IPV4, IPV6, IPV4, IPV6, ...)
IPV4-4STREAMSX1 - 4 streams of IPV4 packets, where all packets from the same stream are destined for the same output port
(e.g.: IPV4_DST_P0, IPV4_DST_P1, IPV4_DST_P2, IPV4_DST_P3, IPV4_DST_P0, ...)
IPV4-4STREAMSX8 - same as above, but packets for each stream arrive in groups of 8
(e.g.: IPV4_DST_P0 X 8, IPV4_DST_P1 X 8, IPV4_DST_P2 X 8, IPV4_DST_P3 X 8, IPV4_DST_P0 X 8, ...)
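The two 4STREAMS patterns can be sketched as a function of the packet index. This is only an illustration of the ordering described above: the names NB_STREAMS, stream_dst_port() and count_port_switches() are hypothetical and do not come from the traffic generator setup.

```c
#define NB_STREAMS 4	/* four streams, one per output port P0..P3 */

/*
 * Output port of the i-th packet when each stream sends `group`
 * packets back to back: group = 1 gives the 4STREAMSX1 ordering,
 * group = 8 gives the 4STREAMSX8 ordering. `group` must be non-zero.
 */
static unsigned int stream_dst_port(unsigned int i, unsigned int group)
{
	return (i / group) % NB_STREAMS;
}

/*
 * Counts how often two consecutive packets among the first n
 * go to different output ports.
 */
static unsigned int count_port_switches(unsigned int n, unsigned int group)
{
	unsigned int i, switches = 0;

	for (i = 1; i < n; i++)
		if (stream_dst_port(i, group) != stream_dst_port(i - 1, group))
			switches++;
	return switches;
}
```

Taking 32 packets as an example length (an assumption, not a figure from the test setup), the X1 ordering changes output port between every pair of neighbouring packets, while the X8 ordering changes only at the three group boundaries.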
* Re: [dpdk-dev] [PATCH 0/2] L3FWD sample optimisation
2014-05-28 9:17 [dpdk-dev] [PATCH 0/2] L3FWD sample optimisation Ananyev, Konstantin
@ 2014-06-10 22:44 ` Thomas Monjalon
0 siblings, 0 replies; 6+ messages in thread
From: Thomas Monjalon @ 2014-06-10 22:44 UTC
To: Ananyev, Konstantin; +Cc: dev
Hi Konstantin,
2014-05-28 09:17, Ananyev, Konstantin:
> Hi Thomas,
>
> >As you are doing optimizations, it's important to know the performance gain.
> >It could help to mitigate future reworks.
> >So please, could you provide some benchmarking numbers in the commit log?
>
> Some performance data below.
> Also, forgot to mention that new code path can be switched on/off by setting
> ENABLE_MULTI_BUFFER_OPTIMIZE macro to 1/0.
> Do I need to resubmit the whole patch series, or just a cover letter, or ...?
I think you should resubmit the whole series after checking it with checkpatch.pl.
Please keep Acked-by and Tested-by lines from previous mails.
Thanks
--
Thomas
* Re: [dpdk-dev] [PATCH 0/2] L3FWD sample optimisation
2014-05-22 16:55 Konstantin Ananyev
2014-05-23 8:05 ` Thomas Monjalon
2014-06-04 13:47 ` Cao, Waterman
@ 2014-06-06 8:26 ` De Lara Guarch, Pablo
2 siblings, 0 replies; 6+ messages in thread
From: De Lara Guarch, Pablo @ 2014-06-06 8:26 UTC
To: Ananyev, Konstantin, dev
Acked-by: Pablo de Lara Guarch <pablo.de.lara.guarch@intel.com>
> -----Original Message-----
> From: dev [mailto:dev-bounces@dpdk.org] On Behalf Of Konstantin Ananyev
> Sent: Thursday, May 22, 2014 5:56 PM
> To: dev@dpdk.org; dev@dpdk.org
> Subject: [dpdk-dev] [PATCH 0/2] L3FWD sample optimisation
>
> With latest HW and optimised RX/TX path there is a huge gap between
> testpmd iofwd and l3fwd performance results.
> So there is an attempt to optimise l3fwd LPM code path and reduce the gap:
> - Instead of processing each input packet up to completion -
> divide packet processing into several stages and perform
> stage by stage for the whole burst.
> - Unroll things by the factor of 4 whenever possible.
> - Use SSE intrinsics for some operations (bswap, replace MAC addresses, etc).
> - Avoid TX packet buffering whenever possible.
> - Move some checks from RX/TX into setup phase.
>
> app/test/test_lpm.c | 70 ++++
> examples/l3fwd/main.c | 467 +++++++++++++++++++++-
> lib/librte_eal/common/Makefile | 1 +
> lib/librte_eal/common/include/rte_common_vect.h | 93 +++++
> lib/librte_lpm/rte_lpm.h | 117 ++++++
> 5 files changed, 726 insertions(+), 22 deletions(-)
> create mode 100644 lib/librte_eal/common/include/rte_common_vect.h
>
> --
> 1.7.7.6
* Re: [dpdk-dev] [PATCH 0/2] L3FWD sample optimisation
2014-05-22 16:55 Konstantin Ananyev
2014-05-23 8:05 ` Thomas Monjalon
@ 2014-06-04 13:47 ` Cao, Waterman
2014-06-06 8:26 ` De Lara Guarch, Pablo
2 siblings, 0 replies; 6+ messages in thread
From: Cao, Waterman @ 2014-06-04 13:47 UTC
To: Ananyev, Konstantin, dev, Thomas Monjalon
Tested-by: Waterman Cao <waterman.cao@intel.com>
This patch has been tested by Intel. We performed an l3fwd performance test.
The test results show that l3fwd performance with this ‘lpm optimization’ patch is much higher than without it.
Test environment: Fedora 20, Linux Kernel 3.11.10, GCC 4.8.2, Intel Xeon processor E5-2680 v2, with 2 ports on 2 Niantic NICs (all on socket 0)
Please refer to the performance data in the separate email:
http://dpdk.org/ml/archives/dev/2014-May/002703.html
* Re: [dpdk-dev] [PATCH 0/2] L3FWD sample optimisation
2014-05-22 16:55 Konstantin Ananyev
@ 2014-05-23 8:05 ` Thomas Monjalon
2014-06-04 13:47 ` Cao, Waterman
2014-06-06 8:26 ` De Lara Guarch, Pablo
2 siblings, 0 replies; 6+ messages in thread
From: Thomas Monjalon @ 2014-05-23 8:05 UTC
To: Konstantin Ananyev; +Cc: dev
Hi Konstantin,
2014-05-22 17:55, Konstantin Ananyev:
> With latest HW and optimised RX/TX path there is a huge gap between
> testpmd iofwd and l3fwd performance results.
> So there is an attempt to optimise l3fwd LPM code path and reduce the gap:
> - Instead of processing each input packet up to completion -
> divide packet processing into several stages and perform
> stage by stage for the whole burst.
> - Unroll things by the factor of 4 whenever possible.
> - Use SSE intrinsics for some operations (bswap, replace MAC addresses, etc).
> - Avoid TX packet buffering whenever possible.
> - Move some checks from RX/TX into setup phase.
As you are doing optimizations, it's important to know the performance gain.
It could help to mitigate future reworks.
So please, could you provide some benchmarking numbers in the commit log?
Thanks
--
Thomas
* [dpdk-dev] [PATCH 0/2] L3FWD sample optimisation
@ 2014-05-22 16:55 Konstantin Ananyev
2014-05-23 8:05 ` Thomas Monjalon
` (2 more replies)
0 siblings, 3 replies; 6+ messages in thread
From: Konstantin Ananyev @ 2014-05-22 16:55 UTC
To: dev, dev
With latest HW and optimised RX/TX path there is a huge gap between
testpmd iofwd and l3fwd performance results.
So there is an attempt to optimise l3fwd LPM code path and reduce the gap:
- Instead of processing each input packet up to completion -
divide packet processing into several stages and perform
stage by stage for the whole burst.
- Unroll things by the factor of 4 whenever possible.
- Use SSE intrinsics for some operations (bswap, replace MAC addresses, etc).
- Avoid TX packet buffering whenever possible.
- Move some checks from RX/TX into setup phase.
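As a rough illustration of the first two items in the list above, here is a self-contained sketch that contrasts per-packet processing with stage-by-stage processing of a whole burst, unrolled by 4. Everything in it is an assumption made for the example: struct pkt, BURST, toy_lookup() and the function names are hypothetical, the lookup is a stand-in for a real LPM lookup, and the scalar bswap32() stands in for the SSE byte swap. It is not the code from the patch.

```c
#include <stdint.h>

#define BURST 32	/* hypothetical maximum burst size */

/* Hypothetical packet: only the fields this sketch needs. */
struct pkt {
	uint32_t dst_ip_be;	/* destination address, network byte order */
	uint8_t out_port;	/* filled in by the lookup stage */
};

/* Portable 32-bit byte swap (scalar stand-in for an SSE shuffle). */
static uint32_t bswap32(uint32_t x)
{
	return (x >> 24) | ((x >> 8) & 0x0000ff00u) |
	       ((x << 8) & 0x00ff0000u) | (x << 24);
}

/* Toy stand-in for an LPM lookup: picks a port from the top octet. */
static uint8_t toy_lookup(uint32_t dst_ip_host)
{
	return (uint8_t)((dst_ip_host >> 24) & 0x3);
}

/* Per-packet style: each packet is processed up to completion. */
static void process_per_packet(struct pkt *p, unsigned int n)
{
	unsigned int i;

	for (i = 0; i < n; i++)
		p[i].out_port = toy_lookup(bswap32(p[i].dst_ip_be));
}

/*
 * Staged style: run each stage over the whole burst, 4 packets per
 * loop iteration, with a scalar tail for the remainder.
 * Processes at most BURST packets and returns how many were handled.
 */
static unsigned int process_staged(struct pkt *p, unsigned int n)
{
	uint32_t dip[BURST];
	unsigned int i;

	if (n > BURST)
		n = BURST;

	/* Stage 1: convert all destination addresses to host order. */
	for (i = 0; i + 4 <= n; i += 4) {
		dip[i + 0] = bswap32(p[i + 0].dst_ip_be);
		dip[i + 1] = bswap32(p[i + 1].dst_ip_be);
		dip[i + 2] = bswap32(p[i + 2].dst_ip_be);
		dip[i + 3] = bswap32(p[i + 3].dst_ip_be);
	}
	for (; i < n; i++)
		dip[i] = bswap32(p[i].dst_ip_be);

	/* Stage 2: look up the output port for every packet. */
	for (i = 0; i + 4 <= n; i += 4) {
		p[i + 0].out_port = toy_lookup(dip[i + 0]);
		p[i + 1].out_port = toy_lookup(dip[i + 1]);
		p[i + 2].out_port = toy_lookup(dip[i + 2]);
		p[i + 3].out_port = toy_lookup(dip[i + 3]);
	}
	for (; i < n; i++)
		p[i].out_port = toy_lookup(dip[i]);

	return n;
}
```

Both functions produce the same out_port values; the staged form only reorders the work so that each stage is a tight loop over independent packets, which is the shape that lends itself to unrolling and to SIMD.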
app/test/test_lpm.c | 70 ++++
examples/l3fwd/main.c | 467 +++++++++++++++++++++-
lib/librte_eal/common/Makefile | 1 +
lib/librte_eal/common/include/rte_common_vect.h | 93 +++++
lib/librte_lpm/rte_lpm.h | 117 ++++++
5 files changed, 726 insertions(+), 22 deletions(-)
create mode 100644 lib/librte_eal/common/include/rte_common_vect.h
--
1.7.7.6