"Better" is _very_ subjective. pcm-memory does one thing well. That whole suite is worth playing with, though.

Ray K

From: Antonio Di Bacco
Sent: Thursday 19 May 2022 10:04
To: Kinsella, Ray
Cc: Sanford, Robert; users@dpdk.org
Subject: Re: DPDK performances surprise

This tool seems awesome!!! Better than VTUNE?

On Thu, May 19, 2022 at 10:29 AM Kinsella, Ray wrote:

I’d say that is likely, yes.
FYI - pcm-memory is a very handy tool for looking at memory traffic.
https://github.com/opcm/pcm

Thanks,
Ray K

From: Sanford, Robert
Sent: Wednesday 18 May 2022 17:53
To: Antonio Di Bacco; users@dpdk.org
Subject: Re: DPDK performances surprise

My guess is that most of the packet data has a short life in the L3 cache (before being overwritten by newer packets), but is never flushed to memory.

From: Antonio Di Bacco
Date: Wednesday, May 18, 2022 at 12:40 PM
To: "users@dpdk.org"
Subject: DPDK performances surprise

I recently read a performance test where l2fwd was able to receive packets (8000B) from a 100 Gbps card, swap the L2 addresses and send them back to the same port to be received by an Ethernet analyzer. The throughput achieved was close to 100 Gbps on a Xeon machine (Intel(R) Xeon(R) Platinum 8176 CPU @ 2.10GHz).

This is the same processor I have, and I know that if I try to write blocks of around 8000B to the attached DDR4 (2666MT/s) on an allocated 1GB hugepage, I get a maximum throughput of around 20GB/s.

Now, a 100 Gbps link can generate a flow of around 12 GB/s. These packets have to be written to the DDR and then read back to swap the L2 addresses, which leads to a cumulative bandwidth on the DDR of around 2x12 GB/s, more than the 20GB/s of available bandwidth on the DDR4.

How can this be possible?
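
For reference, the arithmetic in the original question can be checked with a few lines of Python. This is only a back-of-the-envelope sketch. It assumes decimal units (1 GB = 1e9 bytes), ignores Ethernet framing and descriptor traffic, and assumes every packet byte crosses the memory bus twice (written on receive, read back on transmit). The function and constant names are made up for illustration.

```python
# Back-of-the-envelope check of the figures quoted in the question.
# Assumptions (not from the thread): decimal units (1 GB = 1e9 bytes) and
# no allowance for Ethernet framing overhead or descriptor traffic.

LINE_RATE_GBPS = 100          # NIC line rate from the question, in Gbit/s
MEASURED_DDR_WRITE_GBS = 20   # write throughput the poster measured, in GB/s


def line_rate_gbytes_per_s(gbits_per_s: float) -> float:
    """Convert a line rate in Gbit/s to GB/s (8 bits per byte)."""
    return gbits_per_s / 8


def dram_traffic_if_uncached(gbits_per_s: float, passes: int = 2) -> float:
    """DRAM traffic in GB/s if every packet byte crossed the memory bus
    `passes` times (one write on receive, one read on transmit)."""
    return passes * line_rate_gbytes_per_s(gbits_per_s)


rx = line_rate_gbytes_per_s(LINE_RATE_GBPS)
total = dram_traffic_if_uncached(LINE_RATE_GBPS)

print(f"receive stream:    {rx:.1f} GB/s")
print(f"write + read back: {total:.1f} GB/s")
print(f"exceeds measured {MEASURED_DDR_WRITE_GBS} GB/s: {total > MEASURED_DDR_WRITE_GBS}")
```

Under these assumptions the full write-plus-read model needs about 25 GB/s, which is above the measured 20 GB/s. That gap is what the L3 cache explanation in the reply above would account for: if most packet data never reaches DRAM, the two-pass model overstates the real memory traffic.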