Consider a valid DPDK TXQ with its mempool of rte_mbufs. Application code
will allocate a mbuf from the pool and prepare it with headers, data, and
so on.

When the mbuf(s) are enqueued to the NIC with rte_eth_tx_burst(), does DPDK
DMA the memory into the NIC? Is this an optimization worth considering?

DPDK provides a DMA example here:
http://doc.dpdk.org/api/examples_2dma_2dmafwd_8c-example.html

Now, to be fair, ultimately whether or not DMA helps must be evidenced by a
benchmark. Still, is there any serious reason to make mempools and their
bufs DMA into and out of the NIC? Any perspective here is appreciated.
2023-01-08 16:05 (UTC-0500), fwefew 4t4tg:
> Consider a valid DPDK TXQ with its mempool of rte_mbufs. Application code
> will allocate a mbuf from the pool and prepare it with headers, data, and
> so on.
>
> When the mbuf(s) are enqueued to the NIC with rte_eth_tx_burst() does DPDK
> DMA the memory into the NIC? Is this an optimization worth considering?

DPDK is SW running on CPU.
DMA is a way for HW to access RAM bypassing CPU (thus it is "direct").

What happens in rte_eth_tx_burst():
DPDK fills the packet descriptor and requests the NIC to send the packet.
The NIC subsequently and asynchronously uses DMA to read the packet data.

Regarding optimizations:
1. Even if the NIC has some internal buffer where it stores packet data
   before sending it to the wire, those buffers are not usually exposed.
2. If the NIC has on-board memory to store packet data,
   this would be implemented by a mempool driver working with such memory.

> DPDK provides a DMA example here:
> http://doc.dpdk.org/api/examples_2dma_2dmafwd_8c-example.html
>
> Now, to be fair, ultimately whether or not DMA helps must be evidenced by a
> benchmark. Still, is there any serious reason to make mempools and its
> bufs DMA into and out of the NIC?

DMA devices in DPDK allow the CPU to initiate an operation on RAM
that will be performed asynchronously by some special HW.
For example, instead of memset() DPDK can tell DMA device
to zero a memory block and avoid spending CPU cycles
(but CPU will need to ensure zeroing completion later).
Thank you for taking time to provide a nice reply. The upshot here is that
DPDK already uses DMA in a smart way to move packet data into TXQs. I
presume the reverse also happens: the NIC uses DMA to move packets out of
its HW RXQs into the host machine's memory using the mempool associated
with it.

On Wed, Jan 11, 2023 at 6:26 AM Dmitry Kozlyuk <dmitry.kozliuk@gmail.com>
wrote:

> 2023-01-08 16:05 (UTC-0500), fwefew 4t4tg:
> > Consider a valid DPDK TXQ with its mempool of rte_mbufs. Application code
> > will allocate a mbuf from the pool and prepare it with headers, data, and
> > so on.
> >
> > When the mbuf(s) are enqueued to the NIC with rte_eth_tx_burst() does DPDK
> > DMA the memory into the NIC? Is this an optimization worth considering?
>
> DPDK is SW running on CPU.
> DMA is a way for HW to access RAM bypassing CPU (thus it is "direct").
>
> What happens in rte_eth_tx_burst():
> DPDK fills the packet descriptor and requests the NIC to send the packet.
> The NIC subsequently and asynchronously uses DMA to read the packet data.
>
> Regarding optimizations:
> 1. Even if the NIC has some internal buffer where it stores packet data
>    before sending it to the wire, those buffers are not usually exposed.
> 2. If the NIC has on-board memory to store packet data,
>    this would be implemented by a mempool driver working with such memory.
>
> > DPDK provides a DMA example here:
> > http://doc.dpdk.org/api/examples_2dma_2dmafwd_8c-example.html
> >
> > Now, to be fair, ultimately whether or not DMA helps must be evidenced by a
> > benchmark. Still, is there any serious reason to make mempools and its
> > bufs DMA into and out of the NIC?
>
> DMA devices in DPDK allow the CPU to initiate an operation on RAM
> that will be performed asynchronously by some special HW.
> For example, instead of memset() DPDK can tell DMA device
> to zero a memory block and avoid spending CPU cycles
> (but CPU will need to ensure zeroing completion later).
On Wed, 11 Jan 2023 13:05:07 -0500
fwefew 4t4tg <7532yahoo@gmail.com> wrote:
> Thank you for taking time to provide a nice reply. The upshot here is that
> DPDK
> already uses DMA in a smart way to move packet data into TXQs. I presume the
> reverse also happens: NIC uses DMA to move packets out of its HW RXQs into
> the host machine's memory using the mempool associated with it.
>
>
>
> On Wed, Jan 11, 2023 at 6:26 AM Dmitry Kozlyuk <dmitry.kozliuk@gmail.com>
> wrote:
>
> > 2023-01-08 16:05 (UTC-0500), fwefew 4t4tg:
> > > Consider a valid DPDK TXQ with its mempool of rte_mbufs. Application code
> > > will allocate a mbuf from the pool and prepare it with headers, data, and
> > > so on.
> > >
> > > When the mbuf(s) are enqueued to the NIC with rte_eth_tx_burst() does
> > DPDK
> > > DMA the memory into the NIC? Is this an optimization worth considering?
> >
> > DPDK is SW running on CPU.
> > DMA is a way for HW to access RAM bypassing CPU (thus it is "direct").
> >
> > What happens in rte_eth_tx_burst():
> > DPDK fills the packet descriptor and requests the NIC to send the packet.
> > The NIC subsequently and asynchronously uses DMA to read the packet data.
> >
> > Regarding optimizations:
> > 1. Even if the NIC has some internal buffer where it stores packet data
> > before sending it to the wire, those buffers are not usually exposed.
> > 2. If the NIC has on-board memory to store packet data,
> > this would be implemented by a mempool driver working with such memory.
> >
> > > DPDK provides a DMA example here:
> > > http://doc.dpdk.org/api/examples_2dma_2dmafwd_8c-example.html
> > >
> > > Now, to be fair, ultimately whether or not DMA helps must be evidenced
> > by a
> > > benchmark. Still, is there any serious reason to make mempools and its
> > > bufs DMA into and out of the NIC?
> >
> > DMA devices in DPDK allow the CPU to initiate an operation on RAM
> > that will be performed asynchronously by some special HW.
> > For example, instead of memset() DPDK can tell DMA device
> > to zero a memory block and avoid spending CPU cycles
> > (but CPU will need to ensure zeroing completion later).
> >
The setup and DMA is done in the hardware specific poll mode driver (PMD).
There can be some drivers that can't do direct DMA and need to copy data,
but these are the exception, mostly for virtual devices.
2023-01-11 13:05 (UTC-0500), fwefew 4t4tg:
> Thank you for taking time to provide a nice reply. The upshot here is that
> DPDK already uses DMA in a smart way to move packet data into TXQs. I
> presume the reverse also happens: NIC uses DMA to move packets out of its
> HW RXQs into the host machine's memory using the mempool associated with it.

Almost, except that the NIC does not know about mempools.
It's the DPDK PMD (as Stephen clarified) that allocates mbufs from the
mempool, does Rx descriptor setup to point to the buffers, and then
requests the NIC to write packet data to host RAM using DMA.

If you want to study this area, take a look at the "ixy" project family:
https://github.com/ixy-languages/ixy-languages
It's like a "micro DPDK" for educational purposes.