Optimizing for common burst sizes


@ 2022-09-01  8:21 Morten Brørup
From: Morten Brørup @ 2022-09-01  8:21 UTC
  To: dev

Triggered by the discussion about the performance cost of function pointers [1], I want to share some thoughts about variables vs. constants:

A lot of flexibility, some of it only required to support more or less exotic scenarios, has been sneaking into DPDK at the cost of performance. This applies not only to function pointers, but also to variables that might as well be build-time constants.

For example, when the i40e driver allocates a bulk of mbufs, the non-vector function allocates rxq->rx_free_thresh mbufs each time, but the optimized vector functions allocate RTE_I40E_RXQ_REARM_THRESH (#defined as 32) mbufs each time. Using a constant provides higher performance: the rte_mempool_do_generic_get() function gets inlined into the i40e driver, so the compiler knows the size of its mempool cache copy loop at build time and can optimize the loop accordingly. I suppose the non-vector variant uses a variable to support latency-sensitive applications requiring very small bursts, but a build-time configurable constant would provide even higher performance (i.e. lower latency).
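To illustrate the point, here is a minimal sketch using a simplified stand-in for the inlined mempool cache copy (it is not the actual rte_mempool_do_generic_get() code). The same inline copy function is called once with a burst size read from the queue structure at run time and once with a build-time constant. `struct rxq_sketch`, `cache_get()`, `rearm_variable()`, `rearm_constant()` and `REARM_THRESH` are hypothetical names chosen for the example.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for RTE_I40E_RXQ_REARM_THRESH (32 in i40e). */
#define REARM_THRESH 32

/* Hypothetical, simplified Rx queue: only the field used here. */
struct rxq_sketch {
	unsigned int rx_free_thresh; /* burst size chosen at run time */
};

/* Simplified stand-in for the inlined mempool cache copy loop:
 * copies n object pointers from the cache into dst. */
static inline void
cache_get(void **dst, void *const *cache_objs, unsigned int n)
{
	for (unsigned int i = 0; i < n; i++)
		dst[i] = cache_objs[i];
}

/* n is loaded from memory at run time, so even after inlining the
 * compiler must emit a loop that works for any trip count. */
void
rearm_variable(const struct rxq_sketch *rxq, void **dst, void *const *objs)
{
	cache_get(dst, objs, rxq->rx_free_thresh);
}

/* n is a build-time constant, so after inlining the trip count is known
 * and the compiler may unroll or vectorize the copy. */
void
rearm_constant(void **dst, void *const *objs)
{
	cache_get(dst, objs, REARM_THRESH);
}
```

Comparing the generated assembly of the two callers (e.g. at -O2 or -O3) is one way to see whether a given compiler exploits the known trip count; the exact code it emits will vary by compiler and target.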

We might be able to achieve some general performance improvements by agreeing on a few "extremely common" burst sizes and giving them special treatment. Besides improving performance, this might also reduce complexity by using common standards in places where burst sizes are implementation specific today. The constants could also serve as helpful guidance for new DPDK application developers. These constants could be:

RTE_BURST_MICRO
---------------
The burst size for latency-sensitive applications.
Default: 4 (?)

RTE_BURST_SMALL
---------------
A small, but still efficient burst; e.g. a cache line of pointers.
Default: RTE_CACHE_LINE_SIZE / sizeof(void *) = 8 or 16, depending on the cache line size and pointer size

RTE_BURST_DEFAULT
-----------------
The typical application burst size.
Default: 32

RTE_BURST_HUGE
--------------
A very large burst, but small enough to fit into a typical PMD egress queue, e.g. the mempool cache size or half of it.
Default: RTE_MEMPOOL_CACHE_MAX_SIZE = 512 (or half of it?)
Note: This one might be too rare to deserve special treatment, but is included for the sake of discussion.

Obviously, these should be build-time configurable.
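A minimal sketch of how the proposed constants might be defined as build-time configurable macros, together with one possible form of "special treatment" for the common sizes. The RTE_BURST_* names are only the proposal above, not an existing DPDK API. The fallback definitions of RTE_CACHE_LINE_SIZE and RTE_MEMPOOL_CACHE_MAX_SIZE exist only so the sketch compiles outside a DPDK build, the ordering checks are an assumption about how the sizes relate, and burst_copy() and burst_copy_dispatch() are hypothetical helpers.

```c
#include <assert.h>
#include <stddef.h>

/* Fallbacks so the sketch compiles outside a DPDK build; in DPDK these
 * come from the build configuration. */
#ifndef RTE_CACHE_LINE_SIZE
#define RTE_CACHE_LINE_SIZE 64
#endif
#ifndef RTE_MEMPOOL_CACHE_MAX_SIZE
#define RTE_MEMPOOL_CACHE_MAX_SIZE 512
#endif

/* Hypothetical proposed constants; each can be overridden at build time,
 * e.g. with -DRTE_BURST_MICRO=2. */
#ifndef RTE_BURST_MICRO
#define RTE_BURST_MICRO 4
#endif
#ifndef RTE_BURST_SMALL
#define RTE_BURST_SMALL (RTE_CACHE_LINE_SIZE / sizeof(void *))
#endif
#ifndef RTE_BURST_DEFAULT
#define RTE_BURST_DEFAULT 32
#endif
#ifndef RTE_BURST_HUGE
#define RTE_BURST_HUGE RTE_MEMPOOL_CACHE_MAX_SIZE
#endif

/* Assumed ordering: reject configurations where the sizes are out of order. */
_Static_assert(RTE_BURST_MICRO <= RTE_BURST_SMALL, "MICRO must not exceed SMALL");
_Static_assert(RTE_BURST_SMALL <= RTE_BURST_DEFAULT, "SMALL must not exceed DEFAULT");
_Static_assert(RTE_BURST_DEFAULT <= RTE_BURST_HUGE, "DEFAULT must not exceed HUGE");

/* Generic copy; when inlined with a constant n, the trip count is known. */
static inline void
burst_copy(void **dst, void *const *src, size_t n)
{
	for (size_t i = 0; i < n; i++)
		dst[i] = src[i];
}

/* Route the common sizes to call sites where n is a compile-time constant
 * and fall back to the generic loop for any other size. An if-chain is
 * used instead of a switch so that two constants configured to the same
 * value do not produce duplicate case labels. */
static void
burst_copy_dispatch(void **dst, void *const *src, size_t n)
{
	if (n == RTE_BURST_DEFAULT)
		burst_copy(dst, src, RTE_BURST_DEFAULT);
	else if (n == RTE_BURST_SMALL)
		burst_copy(dst, src, RTE_BURST_SMALL);
	else if (n == RTE_BURST_MICRO)
		burst_copy(dst, src, RTE_BURST_MICRO);
	else
		burst_copy(dst, src, n);
}
```

Whether the constant-size branches actually end up faster depends on the compiler inlining burst_copy(); the dispatch only gives it the opportunity.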

[1] http://inbox.dpdk.org/dev/20220818114449.1408226-1-cristian.dumitrescu@intel.com/T/#m679f356f097c89d3a542b7a0967069d6d0bc25e3

Med venlig hilsen / Kind regards,
-Morten Brørup
