* DPDK performances surprise
@ 2022-05-18 16:40 Antonio Di Bacco
2022-05-18 16:53 ` Sanford, Robert
0 siblings, 1 reply; 8+ messages in thread
From: Antonio Di Bacco @ 2022-05-18 16:40 UTC (permalink / raw)
To: users
I recently read a performance test in which l2fwd received packets (8000 B) from a 100 Gbps card, swapped the L2 addresses, and sent them back out of the same port to be received by an Ethernet analyzer. The throughput achieved was close to 100 Gbps on a Xeon machine (Intel(R) Xeon(R) Platinum 8176 CPU @ 2.10GHz). This is the same processor I have, and I know that if I try to write around 8000 B buffers to the attached DDR4 (2666 MT/s) in an allocated 1 GB hugepage, I get a maximum throughput of around 20 GB/s.
Now, a 100 Gbps link can generate a flow of around 12 GB/s. These packets have to be written to the DDR and then read back to swap the L2 addresses, which leads to a cumulative bandwidth on the DDR of around 2 x 12 GB/s, more than the 20 GB/s of bandwidth available on the DDR4.
How is this possible?
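For reference, here is a minimal sketch of the arithmetic behind these numbers. The 24 B of per-frame overhead (preamble, inter-frame gap, FCS) and the "one DRAM write plus one DRAM read per packet" model are illustrative assumptions, not measured values:

#include <stdio.h>

int main(void)
{
    const double link_bps  = 100e9;   /* 100 Gbps line rate                     */
    const double frame_len = 8000.0;  /* frame size used in the test            */
    const double overhead  = 24.0;    /* assumed preamble + IFG + FCS per frame */

    /* Packet data actually delivered to the host, in bytes per second. */
    double data_Bps = (link_bps / 8.0) * frame_len / (frame_len + overhead);

    /* If every packet were written to DRAM on RX and read back for TX,
     * the memory controller would see roughly twice that traffic.      */
    double dram_Bps = 2.0 * data_Bps;

    printf("packet data rate : %.2f GB/s\n", data_Bps / 1e9);
    printf("naive DRAM demand: %.2f GB/s\n", dram_Bps / 1e9);
    return 0;
}

This prints roughly 12.5 GB/s and 25 GB/s, which is the apparent contradiction with the ~20 GB/s of measured DDR4 write bandwidth.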
* Re: DPDK performances surprise
2022-05-18 16:40 DPDK performances surprise Antonio Di Bacco
@ 2022-05-18 16:53 ` Sanford, Robert
2022-05-18 17:04 ` Stephen Hemminger
2022-05-19 8:29 ` Kinsella, Ray
0 siblings, 2 replies; 8+ messages in thread
From: Sanford, Robert @ 2022-05-18 16:53 UTC (permalink / raw)
To: Antonio Di Bacco, users
My guess is that most of the packet data has a short life in the L3 cache (before being overwritten by newer packets), but is never flushed to memory.
* Re: DPDK performances surprise
2022-05-18 16:53 ` Sanford, Robert
@ 2022-05-18 17:04 ` Stephen Hemminger
2022-05-19 9:03 ` Antonio Di Bacco
2022-05-19 8:29 ` Kinsella, Ray
1 sibling, 1 reply; 8+ messages in thread
From: Stephen Hemminger @ 2022-05-18 17:04 UTC (permalink / raw)
To: Sanford, Robert; +Cc: Antonio Di Bacco, users
On Wed, 18 May 2022 16:53:04 +0000
"Sanford, Robert" <rsanford@akamai.com> wrote:
> My guess is that most of the packet data has a short life in the L3 cache (before being overwritten by newer packets), but is never flushed to memory.
Likely cache effects from DDIO. What is your packet size?
* RE: DPDK performances surprise
2022-05-18 16:53 ` Sanford, Robert
2022-05-18 17:04 ` Stephen Hemminger
@ 2022-05-19 8:29 ` Kinsella, Ray
2022-05-19 9:04 ` Antonio Di Bacco
1 sibling, 1 reply; 8+ messages in thread
From: Kinsella, Ray @ 2022-05-19 8:29 UTC (permalink / raw)
To: Sanford, Robert, Antonio Di Bacco, users
I’d say that is likely yes.
FYI - pcm-memory is a very handy tool for looking at memory traffic.
https://github.com/opcm/pcm
Thanks,
Ray K
* Re: DPDK performances surprise
2022-05-18 17:04 ` Stephen Hemminger
@ 2022-05-19 9:03 ` Antonio Di Bacco
0 siblings, 0 replies; 8+ messages in thread
From: Antonio Di Bacco @ 2022-05-19 9:03 UTC (permalink / raw)
To: Stephen Hemminger; +Cc: Sanford, Robert, users
The packets are 8000B long.
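A rough check that is consistent with the L3-cache/DDIO explanation: with DDIO (Data Direct I/O), the NIC DMAs received packet data directly into part of the last-level cache rather than into DRAM, so as long as the pool of in-flight 8000 B buffers stays resident in the roughly 38.5 MB L3 of the Platinum 8176, the payloads can be received, modified and transmitted without ever touching DDR. The ring and buffer sizes below are assumptions for illustration, not values from the cited test:

#include <stdio.h>

int main(void)
{
    /* Assumed ring/buffer sizes, not taken from the cited l2fwd test. */
    const size_t rx_descs  = 512;
    const size_t tx_descs  = 512;
    const size_t buf_size  = 9 * 1024;            /* data room holding an 8000 B frame */
    const size_t llc_bytes = 38ull * 1024 * 1024
                           + 512 * 1024;          /* ~38.5 MB L3 on the Platinum 8176  */

    size_t in_flight = (rx_descs + tx_descs) * buf_size;

    printf("in-flight packet buffers: %.1f MB\n", in_flight / (1024.0 * 1024.0));
    printf("share of the LLC        : %.0f %%\n", 100.0 * in_flight / llc_bytes);
    return 0;
}

With these assumptions the hot working set is around 9 MB, about a quarter of the LLC, so recently received payloads are still cached when they are read back for transmission; pcm-memory (mentioned elsewhere in the thread) is the direct way to confirm how much of the traffic actually reaches DRAM.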
On Wed, May 18, 2022 at 7:04 PM Stephen Hemminger <
stephen@networkplumber.org> wrote:
> Likely cache effects from DDIO. What is your packet size?
* Re: DPDK performances surprise
2022-05-19 8:29 ` Kinsella, Ray
@ 2022-05-19 9:04 ` Antonio Di Bacco
2022-05-19 9:07 ` Kinsella, Ray
0 siblings, 1 reply; 8+ messages in thread
From: Antonio Di Bacco @ 2022-05-19 9:04 UTC (permalink / raw)
To: Kinsella, Ray; +Cc: Sanford, Robert, users
This tool seems awesome!!! Better than VTune?
On Thu, May 19, 2022 at 10:29 AM Kinsella, Ray <ray.kinsella@intel.com>
wrote:
> I’d say that is likely yes.
>
> FYI - pcm-memory is a very handy tool for looking at memory traffic.
> https://github.com/opcm/pcm
>
> Thanks,
> Ray K
* RE: DPDK performances surprise
2022-05-19 9:04 ` Antonio Di Bacco
@ 2022-05-19 9:07 ` Kinsella, Ray
2022-05-19 15:05 ` Stephen Hemminger
0 siblings, 1 reply; 8+ messages in thread
From: Kinsella, Ray @ 2022-05-19 9:07 UTC (permalink / raw)
To: Antonio Di Bacco; +Cc: Sanford, Robert, users
Better is _very_ subjective.
pcm-memory does one thing well.
That whole suite is worth playing with though.
Ray K
* Re: DPDK performances surprise
2022-05-19 9:07 ` Kinsella, Ray
@ 2022-05-19 15:05 ` Stephen Hemminger
0 siblings, 0 replies; 8+ messages in thread
From: Stephen Hemminger @ 2022-05-19 15:05 UTC (permalink / raw)
To: Kinsella, Ray; +Cc: Antonio Di Bacco, Sanford, Robert, users
On Thu, 19 May 2022 09:07:28 +0000
"Kinsella, Ray" <ray.kinsella@intel.com> wrote:
> Better is _very_ subjective.
>
> pcm-memory does one thing well.
> That whole suite is worth playing with though.
>
> Ray K
If you are comparing forwarding versus writing the whole packet:
- for the forwarding case, swapping the MAC addresses is a single cache-line read/write;
- for software writing the whole packet, it ends up walking through many cache lines and dirtying them.
Also, for that test, are you rewriting the same packet or walking a much larger memory area?
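To make the cache-line argument concrete, here is a minimal sketch of the two access patterns. It mirrors the shape of a testpmd/l2fwd-style MAC swap but is not the actual example code, and the rte_ether_hdr field names below assume a recent DPDK release (older releases use s_addr/d_addr instead of src_addr/dst_addr):

#include <string.h>
#include <rte_ether.h>
#include <rte_mbuf.h>

/* Forwarding case: only the 14-byte Ethernet header is touched,
 * i.e. a single 64-byte cache line per packet, regardless of the
 * 8000 B of payload behind it. */
static void
mac_swap(struct rte_mbuf *m)
{
    struct rte_ether_hdr *eth = rte_pktmbuf_mtod(m, struct rte_ether_hdr *);
    struct rte_ether_addr tmp;

    rte_ether_addr_copy(&eth->src_addr, &tmp);
    rte_ether_addr_copy(&eth->dst_addr, &eth->src_addr);
    rte_ether_addr_copy(&tmp, &eth->dst_addr);
}

/* Memory-benchmark case: writing the whole 8000 B buffer walks and
 * dirties ~125 cache lines (8000 / 64), which must eventually be
 * written back to DRAM once the buffer area no longer fits in the LLC. */
static void
fill_buffer(struct rte_mbuf *m)
{
    memset(rte_pktmbuf_mtod(m, void *), 0xab, rte_pktmbuf_data_len(m));
}

Whether the 20 GB/s hugepage test measures the same thing therefore depends on Stephen's last question: rewriting one packet-sized buffer stays in cache, while sweeping a region much larger than the LLC measures true DRAM write bandwidth.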