Thank you for taking the time to provide such a helpful reply. The upshot
here is that the NIC already uses DMA in a smart way to pull packet data
from the mbufs enqueued to its TXQs. I presume the reverse also happens:
the NIC uses DMA to move received packets out of its HW RXQs into the host
machine's memory, writing into mbufs taken from the mempool associated with
the RXQ.
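
For concreteness, here is how I picture the RX side in code (the port ID,
queue ID, and burst size below are placeholders of mine, not something
taken from your reply):

#include <rte_ethdev.h>
#include <rte_mbuf.h>

#define BURST_SIZE 32

/* The NIC has already DMA-written received frames into mbufs it took
 * from the RXQ's mempool; rte_eth_rx_burst() only harvests completed
 * descriptors -- no packet data is copied by the CPU here. */
static void poll_rxq(uint16_t port_id, uint16_t rxq_id)
{
    struct rte_mbuf *pkts[BURST_SIZE];
    uint16_t nb_rx = rte_eth_rx_burst(port_id, rxq_id, pkts, BURST_SIZE);

    for (uint16_t i = 0; i < nb_rx; i++) {
        /* ... inspect/process pkts[i] ... */
        rte_pktmbuf_free(pkts[i]); /* return the mbuf to its mempool */
    }
}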



On Wed, Jan 11, 2023 at 6:26 AM Dmitry Kozlyuk <dmitry.kozliuk@gmail.com> wrote:
2023-01-08 16:05 (UTC-0500), fwefew 4t4tg:
> Consider a valid DPDK TXQ with its mempool of rte_mbufs. Application code
> will allocate a mbuf from the pool and prepare it with headers, data, and
> so on.
>
> When the mbuf(s) are enqueued to the NIC with rte_eth_tx_burst() does DPDK
> DMA the memory into the NIC? Is this an optimization worth considering?

DPDK is SW running on the CPU.
DMA is a way for HW to access RAM while bypassing the CPU (hence "direct").

What happens in rte_eth_tx_burst():
DPDK fills the packet descriptor and requests the NIC to send the packet.
The NIC then asynchronously reads the packet data from the mbuf via DMA.
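
A minimal sketch of that sequence from the application's side (the
function and its parameters are illustrative, not from the mail above):

#include <rte_ethdev.h>
#include <rte_mbuf.h>

/* rte_eth_tx_burst() only writes a TX descriptor and rings the NIC's
 * doorbell; the NIC later DMA-reads the packet bytes from the mbuf,
 * and the PMD frees the mbuf back to its mempool after completion. */
static int send_one(uint16_t port_id, uint16_t txq_id, struct rte_mbuf *m)
{
    uint16_t nb_tx = rte_eth_tx_burst(port_id, txq_id, &m, 1);
    if (nb_tx == 0) {
        rte_pktmbuf_free(m); /* TX ring full: drop (or retry) */
        return -1;
    }
    /* Do not touch m past this point: the NIC now owns it. */
    return 0;
}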

Regarding optimizations:
1. Even if the NIC has some internal buffer where it stores packet data
before sending it to the wire, those buffers are not usually exposed.
2. If the NIC has on-board memory to store packet data,
this would be implemented by a mempool driver working with such memory.

> DPDK provides a DMA example here:
> http://doc.dpdk.org/api/examples_2dma_2dmafwd_8c-example.html
>
> Now, to be fair, whether or not DMA helps must ultimately be demonstrated
> by a benchmark. Still, is there any serious reason to make mempools and
> their mbufs DMA into and out of the NIC?

DMA devices in DPDK allow the CPU to initiate an operation on RAM
that is then performed asynchronously by dedicated HW.
For example, instead of calling memset(), DPDK can tell a DMA device
to zero a memory block and avoid spending CPU cycles on it
(but the CPU will need to check later that the zeroing has completed).
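
A hedged sketch of that pattern using the dmadev API (DPDK 21.11+);
it assumes a DMA device and virtual channel have already been
configured and started, which is omitted here:

#include <stdbool.h>
#include <rte_dmadev.h>

/* Ask a DMA engine to zero `len` bytes at IOVA `dst` instead of
 * spending CPU cycles in memset(). */
static int dma_zero(int16_t dev_id, uint16_t vchan, rte_iova_t dst,
                    uint32_t len)
{
    /* Enqueue a fill operation with pattern 0 and submit immediately. */
    int idx = rte_dma_fill(dev_id, vchan, 0, dst, len,
                           RTE_DMA_OP_FLAG_SUBMIT);
    if (idx < 0)
        return idx;

    /* The CPU can do useful work here; before reading the zeroed
     * memory it must confirm completion, e.g. by polling: */
    uint16_t last_idx;
    bool has_error;
    while (rte_dma_completed(dev_id, vchan, 1, &last_idx, &has_error) == 0)
        ;
    return has_error ? -1 : 0;
}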