We have a DPDK application that only calls rte_eth_rx_burst() (we
do not transmit packets) and it must process the payload very quickly. The payload of a single network packet
MUST be in contiguous memory.
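For context, here is roughly what our hot loop looks like (a minimal sketch; port_id, queue_id, the burst size and process_payload() are placeholders, not our real code):

```c
#include <stdint.h>
#include <rte_ethdev.h>
#include <rte_mbuf.h>

#define BURST_SIZE 32

/* Hypothetical parser entry point: it needs the entire payload behind a
 * single pointer, which is why contiguity matters to us. */
extern void process_payload(const uint8_t *data, uint16_t len);

static void
rx_loop_once(uint16_t port_id, uint16_t queue_id)
{
    struct rte_mbuf *bufs[BURST_SIZE];
    uint16_t nb = rte_eth_rx_burst(port_id, queue_id, bufs, BURST_SIZE);

    for (uint16_t i = 0; i < nb; i++) {
        struct rte_mbuf *m = bufs[i];
        if (rte_pktmbuf_is_contiguous(m))   /* i.e. m->nb_segs == 1 */
            process_payload(rte_pktmbuf_mtod(m, const uint8_t *),
                            rte_pktmbuf_data_len(m));
        /* else: the segmented case this question is about */
        rte_pktmbuf_free(m);
    }
}
```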
The DPDK API is optimized around memory pools of fixed-size mbufs. If a packet received on the DPDK port is larger than the mbuf data room but smaller than the max MTU, it is segmented into a chain of mbufs, as shown in the figure in the mbuf documentation.
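To make the fixed-size aspect concrete, pool creation looks roughly like this (a sketch; the pool name and the counts are illustrative). Every mbuf in the pool carries the same data room, no matter how small the received packet is:

```c
#include <rte_lcore.h>
#include <rte_mbuf.h>

/* One pool, one fixed data room for every packet: sized for jumbo
 * frames, so a 100-byte packet still occupies a whole ~9K buffer. */
static struct rte_mempool *
create_rx_pool(void)
{
    return rte_pktmbuf_pool_create(
        "rx_pool",                   /* name (illustrative) */
        8192,                        /* number of mbufs (illustrative) */
        256,                         /* per-lcore cache size */
        0,                           /* application private area size */
        9216 + RTE_PKTMBUF_HEADROOM, /* fixed data room per mbuf */
        rte_socket_id());
}
```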
This leads us to the following problems:
· If we configure the memory pool to store large packets (for example, max MTU size), then we will always store the payload in contiguous memory, but we will waste huge amounts of memory whenever the traffic consists of small packets. Imagine that our mbuf size is 9216 bytes but we are receiving mostly packets of 100-300 bytes: we are wasting memory by a factor of up to ~90!
· If we reduce the mbuf size to, say, 512 bytes, then we need special handling of those segments to get the payload into contiguous memory (see the gather sketch after this list). Special handling and copying of segments hurt our performance, so they should be kept to a minimum.
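For reference, the special handling meant in the second point amounts to a gather step like the following (a sketch; gather_payload() and the scratch buffer are our naming for illustration):

```c
#include <stdint.h>
#include <rte_mbuf.h>
#include <rte_memcpy.h>

/* Copy a segmented packet's payload into one contiguous buffer by
 * walking the mbuf chain; this per-packet copy is the cost we want
 * to avoid. */
static inline int
gather_payload(const struct rte_mbuf *m, uint8_t *scratch, uint32_t cap)
{
    if (rte_pktmbuf_pkt_len(m) > cap)
        return -1;

    uint32_t off = 0;
    for (const struct rte_mbuf *seg = m; seg != NULL; seg = seg->next) {
        rte_memcpy(scratch + off,
                   rte_pktmbuf_mtod(seg, const void *),
                   rte_pktmbuf_data_len(seg));
        off += rte_pktmbuf_data_len(seg);
    }
    return 0;
}
```

(DPDK's rte_pktmbuf_read() performs essentially this walk internally, copying into a caller buffer when the requested range spans segments.)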
Considering the above, my questions are as follows:
1. What strategy is recommended for a DPDK application that needs to process the payload of network packets in contiguous memory, handling both small (100-300 bytes) and large (9216 bytes) packets, without wasting huge amounts of memory on 9K-sized mbuf pools? Is copying segmented jumbo frames into a larger max-MTU mbuf the only option?
2. Some frameworks and drivers allow decoupling the RX descriptors (mbufs) from the payload, such that payloads are stored contiguously. This does not seem to be possible in the DPDK framework for segmented packets, since each mbuf is stored right next to its payload in memory (see the layout sketch below). Is that understanding correct?
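To illustrate the coupling I mean in question 2, here is a sketch that assumes the default rte_pktmbuf_pool_create() layout, where each mempool object holds the mbuf header, the private area, the headroom, and the data room back to back:

```c
#include <stdio.h>
#include <rte_mbuf.h>

/* Show that buf_addr points just past the mbuf header (plus the private
 * area): descriptor and payload live in the same mempool object. */
static void
show_layout(struct rte_mempool *pool)
{
    struct rte_mbuf *m = rte_pktmbuf_alloc(pool);
    if (m == NULL)
        return;

    printf("mbuf at %p, buf_addr at %p (offset %td bytes)\n",
           (void *)m, m->buf_addr,
           (char *)m->buf_addr - (char *)m);
    rte_pktmbuf_free(m);
}
```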