DPDK usage discussions
* Cache misses on allocating mbufs
From: fwefew 4t4tg @ 2024-06-22 23:22 UTC
To: users

I happened to be looking into mbuf allocation and am a little
underwhelmed by default DPDK performance: there are a lot of
last-level cache (LLC) misses, as measured by Intel's PMU.
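(Counts like the ones below come from the usual perf tooling, roughly:

  perf stat -e LLC-loads,LLC-load-misses -C <pinned-core> ./bench

with the caveat that exact event names vary by kernel and PMU; the
command is illustrative.)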

I made a single-producer/single-consumer mempool and benchmarked
rte_pktmbuf_alloc() and rte_pktmbuf_free() using 1 GB huge pages. This
ran on one pinned core doing nothing else. A representative loop:

  for (uint64_t i = 0; i < MAX; ++i) {
    data[i] = rte_pktmbuf_alloc(pool);
  }
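
The cycle figures below come from wrapping loops like that with
rte_rdtsc() from rte_cycles.h. Roughly, and illustrative rather than
the exact harness:

  uint64_t t0 = rte_rdtsc();
  for (uint64_t i = 0; i < MAX; ++i) {
    data[i] = rte_pktmbuf_alloc(pool);
  }
  uint64_t t1 = rte_rdtsc();
  printf("%.6f cycles/alloc\n", (double)(t1 - t0) / (double)MAX);

  /* matching free pass */
  for (uint64_t i = 0; i < MAX; ++i) {
    rte_pktmbuf_free(data[i]);
  }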

The salient results are:

- allocating 100,000 mbufs in a tight loop:
  * 246.836380 cycles (72.598935 ns) per alloc
  * 797231 LLC misses, or 90.8% of all LLC references

That's darn near 8 LLC misses per allocation.

Now, some coldness is to be expected the first time through. If one
frees everything and then reallocates 100,000 mbufs from the same
pool, the numbers are slightly better:

- reallocating the same 100,000 mbufs in a tight loop:
  * 221.091480 cycles (65.026906 ns) per alloc
  * 521377 LLC misses, or 62.8% of all LLC references

It's not so much the allocation of memory; it's DPDK's initialization
of it. Contrast with raw allocations via rte_mbuf_raw_alloc(), which
skip the reset work rte_pktmbuf_alloc() does (see the sketch after the
numbers below). The result is roughly an order of magnitude better:

- allocating 100,000 mbufs with rte_mbuf_raw_alloc():
  * 62.446280 cycles (18.366553 ns) per alloc
  * 8244 LLC misses, or 14.9% of all LLC references
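
For reference, rte_pktmbuf_alloc() in rte_mbuf.h is essentially the
raw alloc plus a reset, and the reset is what writes the header fields
and drags those cache lines in (paraphrasing the DPDK source):

  static inline struct rte_mbuf *rte_pktmbuf_alloc(struct rte_mempool *mp)
  {
    struct rte_mbuf *m;

    if ((m = rte_mbuf_raw_alloc(mp)) != NULL)
      rte_pktmbuf_reset(m);  /* pkt_len, data_off, nb_segs, ol_flags, ... */

    return m;
  }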

The bigger problem is that there seems to be no way around this. Even
if I allocate my own memory with mmap and add it to the DPDK heap, I
still have to build a mempool on top of it and go through the same
rte_pktmbuf_alloc() calls.
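
For completeness, the mmap-plus-heap path I mean is roughly this (a
sketch assuming the external-heap API in rte_malloc.h; names are
illustrative, and error handling and real IOVA setup are elided):

  void *va = mmap(NULL, LEN, PROT_READ | PROT_WRITE,
                  MAP_PRIVATE | MAP_ANONYMOUS | MAP_HUGETLB, -1, 0);
  rte_malloc_heap_create("my_heap");
  /* NULL IOVA table here; real code needs valid IOVAs for DMA */
  rte_malloc_heap_memory_add("my_heap", va, NULL, LEN / PG_SZ, PG_SZ);
  int sock = rte_malloc_heap_get_socket("my_heap");

  /* ...and it still funnels into the same pool + alloc calls: */
  struct rte_mempool *p = rte_pktmbuf_pool_create_by_ops(
    "my_pool", 102399, 512, 8, 1700, sock, "ring_sp_sc");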

Ideally, I want one mempool per TX queue, used by one thread pinned to
one core, with NO/NONE/ZERO contention with anything else.
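
Concretely, something like one small sp/sc pool per queue on the
port's socket (illustrative sizes; rte_eth_dev_socket_id() for NUMA
placement):

  char name[RTE_MEMPOOL_NAMESIZE];
  snprintf(name, sizeof(name), "txq%u_pool", qid);
  struct rte_mempool *txp = rte_pktmbuf_pool_create_by_ops(
    name, 8191, 512, 8, 1700,
    rte_eth_dev_socket_id(port_id), "ring_sp_sc");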

Ideas?

Env:
CPU: Intel(R) Xeon(R) E-2278G CPU @ 3.40GHz
Ubuntu 24.04 stock
MLX5 driver
DPDK 24.07-rc1
OFED MLNX_OFED_LINUX-24.04-0.6.6.0-ubuntu24.04-x86_64.tgz

Mempool created like this:

    struct rte_mempool *mempool = rte_pktmbuf_pool_create_by_ops(
      name,
      102399,         /* n: mbufs in the pool */
      512,            /* cache_size: per-lcore cache */
      8,              /* priv_size: private area per mbuf */
      1700,           /* data_room_size: data buffer per mbuf, bytes */
      0,              /* socket_id */
      "ring_sp_sc");  /* single-producer/single-consumer ring ops */
