Thank you for taking the time to provide such a helpful reply. The upshot
here is that DPDK already arranges for the NIC to use DMA efficiently to
pull packet data for its TXQs. I presume the reverse also happens on
receive: the NIC uses DMA to move packets out of its HW RXQs into host
memory, writing into mbufs drawn from the mempool associated with the RX
queue. To confirm my understanding, I have appended two small sketches
below the quote: one of the TX path, one of the dmadev API.

On Wed, Jan 11, 2023 at 6:26 AM Dmitry Kozlyuk wrote:

> 2023-01-08 16:05 (UTC-0500), fwefew 4t4tg:
> > Consider a valid DPDK TXQ with its mempool of rte_mbufs. Application
> > code will allocate an mbuf from the pool and prepare it with headers,
> > data, and so on.
> >
> > When the mbuf(s) are enqueued to the NIC with rte_eth_tx_burst(), does
> > DPDK DMA the memory into the NIC? Is this an optimization worth
> > considering?
>
> DPDK is SW running on the CPU.
> DMA is a way for HW to access RAM while bypassing the CPU (thus it is
> "direct").
>
> What happens in rte_eth_tx_burst():
> DPDK fills the packet descriptor and requests the NIC to send the packet.
> The NIC subsequently and asynchronously uses DMA to read the packet data.
>
> Regarding optimizations:
> 1. Even if the NIC has some internal buffer where it stores packet data
> before sending it to the wire, those buffers are not usually exposed.
> 2. If the NIC has on-board memory to store packet data,
> this would be implemented by a mempool driver working with such memory.
>
> > DPDK provides a DMA example here:
> > http://doc.dpdk.org/api/examples_2dma_2dmafwd_8c-example.html
> >
> > Now, to be fair, ultimately whether or not DMA helps must be evidenced
> > by a benchmark. Still, is there any serious reason to make mempools and
> > their bufs DMA into and out of the NIC?
>
> DMA devices in DPDK allow the CPU to initiate an operation on RAM
> that will be performed asynchronously by some special HW.
> For example, instead of memset(), DPDK can tell a DMA device
> to zero a memory block and avoid spending CPU cycles
> (but the CPU will need to check for zeroing completion later).
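
P.S. To make sure I follow the TX path you describe, here is a minimal
sketch. It assumes port_id, queue_id, and the mempool were set up
elsewhere (rte_eth_dev_configure() / rte_eth_tx_queue_setup()); those
names and the 64-byte frame are illustrative, but the API calls are the
standard ethdev/mbuf ones:

    #include <rte_ethdev.h>
    #include <rte_mbuf.h>

    static void
    send_one(uint16_t port_id, uint16_t queue_id, struct rte_mempool *pool)
    {
        /* CPU work: allocate the mbuf and write the frame into host RAM. */
        struct rte_mbuf *m = rte_pktmbuf_alloc(pool);
        if (m == NULL)
            return;

        char *payload = rte_pktmbuf_append(m, 64); /* room for a 64-byte frame */
        if (payload == NULL) {
            rte_pktmbuf_free(m);
            return;
        }
        /* ... fill headers and data at 'payload' ... */

        /* This only posts a descriptor pointing at the mbuf's buffer;
         * the NIC later DMA-reads the data itself, asynchronously. The
         * mbuf must not be reused until the driver frees it after
         * transmit completion. */
        uint16_t sent = rte_eth_tx_burst(port_id, queue_id, &m, 1);
        if (sent == 0)
            rte_pktmbuf_free(m); /* queue full: mbuf is still ours */
    }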
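
And here is a sketch of your last point, offloading a memset() to a
dmadev. It assumes device 0 was already configured with one vchan
(rte_dma_configure() / rte_dma_vchan_setup() / rte_dma_start()); dst and
len are illustrative, while rte_dma_fill(), RTE_DMA_OP_FLAG_SUBMIT, and
rte_dma_completed() are, as far as I can tell, the real dmadev calls
(DPDK >= 21.11):

    #include <stdbool.h>
    #include <rte_dmadev.h>

    static void
    zero_async(rte_iova_t dst, uint32_t len)
    {
        const int16_t dev_id = 0;
        const uint16_t vchan = 0;

        /* Enqueue a fill of pattern 0 over [dst, dst + len) and submit
         * it; returns a ring index (>= 0) or a negative errno. The CPU
         * is free to do other work once this returns. */
        if (rte_dma_fill(dev_id, vchan, 0 /* pattern */, dst, len,
                         RTE_DMA_OP_FLAG_SUBMIT) < 0)
            return;

        /* Later, the CPU must confirm completion before using the
         * memory, exactly as you said. */
        uint16_t last_idx;
        bool has_error;
        while (rte_dma_completed(dev_id, vchan, 1, &last_idx,
                                 &has_error) == 0)
            ; /* poll; real code would do useful work here instead */
    }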