I completed code to initialize an AWS ENA adapter with RX and TX queues.
With this work in hand, DPDK creates one thread pinned to the right core as
per the --lcores argument. So far so good. The DPDK documentation and
example code are fairly clear here.

What's not as clear is how RX packets are handled. As far as I can tell, the
canonical way to deal with RX packets is to call 'rte_eth_add_rx_callback'
for each RXQ. This allows one to process each received packet (for a given
RXQ) via a provided callback in the same lcore/hardware-thread that DPDK
created for me. As such, there is no need to create additional threads.
Correct?

Furthermore, I hope the mbufs the callback gets somehow correspond to mbufs
associated with the RX descriptors provided to the RXQs, so there's no need
to copy packets after the NIC receives them and before the callback acts on
them. As far as I can tell, this hope is ill-founded. A lot of DPDK code
I've seen allocates more mbufs per RXQ than the number of RX descriptors. To
me this seems to imply that DPDK's RXQ threads copy the packets received
off the wire into separate mbufs for delivery to app code.
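
To make the RX question concrete, here is a toy model in plain C of the
alternative reading I would like confirmed or refuted: the extra mbufs exist
only so that a descriptor can be re-armed with a fresh buffer while the app
still holds the one that was just filled. This is not DPDK code and calls no
DPDK function; every name in it (struct buf, struct pool, struct rxq,
fake_dma, rx_burst) is hypothetical and only stands in for the idea.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define POOL_SIZE 8 /* assumption: pool is larger than the ring */
#define RING_SIZE 4 /* assumption: number of RX descriptors */

struct buf { char data[64]; int len; };

struct pool {
    struct buf storage[POOL_SIZE];
    struct buf *free_list[POOL_SIZE];
    int nfree;
};

struct rxq {
    struct buf *desc[RING_SIZE]; /* buffer currently posted to each descriptor */
    int filled[RING_SIZE];       /* 1 once the pretend NIC wrote a packet */
    struct pool *mp;
};

static void pool_init(struct pool *p) {
    p->nfree = 0;
    for (int i = 0; i < POOL_SIZE; i++)
        p->free_list[p->nfree++] = &p->storage[i];
}

static struct buf *pool_get(struct pool *p) {
    return p->nfree > 0 ? p->free_list[--p->nfree] : NULL;
}

static void pool_put(struct pool *p, struct buf *b) {
    p->free_list[p->nfree++] = b;
}

/* Post one buffer from the pool to every descriptor. */
static void rxq_init(struct rxq *q, struct pool *p) {
    q->mp = p;
    for (int i = 0; i < RING_SIZE; i++) {
        q->desc[i] = pool_get(p);
        q->filled[i] = 0;
    }
}

/* Stand-in for NIC DMA: the only place payload bytes are written. */
static void fake_dma(struct rxq *q, int slot, const char *payload, int len) {
    assert(len >= 0 && len <= (int)sizeof(q->desc[slot]->data));
    memcpy(q->desc[slot]->data, payload, (size_t)len);
    q->desc[slot]->len = len;
    q->filled[slot] = 1;
}

/* Hand filled buffers to the caller by pointer and re-arm each
 * descriptor with a fresh buffer; no payload bytes are copied here. */
static int rx_burst(struct rxq *q, struct buf **pkts, int max) {
    int n = 0;
    for (int i = 0; i < RING_SIZE && n < max; i++) {
        if (!q->filled[i])
            continue;
        struct buf *fresh = pool_get(q->mp);
        if (fresh == NULL)
            break; /* pool exhausted: leave the packet in the ring */
        pkts[n++] = q->desc[i];
        q->desc[i] = fresh;
        q->filled[i] = 0;
    }
    return n;
}
```

In this model the pointer returned by rx_burst is the very buffer that was
posted to the descriptor, and the app returns it with pool_put when done.
If the real RX path works like this, then a pool larger than the ring does
not imply any copying.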

TX is less clear to me.

For TX there seems to be no way to transmit packets (burst or otherwise)
without creating another thread, that is, another thread beyond what
DPDK makes for me. This other thread must at the appropriate time
prepare mbufs and call 'rte_eth_tx_burst' on the correct TXQ. DPDK seems to
want to keep its thread for its own work. Yes, DPDK provides
'rte_eth_add_tx_callback', but that only runs after the mbufs have been
created and submitted for transmission, so it cannot be where they are
created. Putting this together, for TX, unlike RX, DPDK requires me to
create new threads. Correct?

While creating additional threads for TX is not the end of the world, I do
not want the DPDK TX thread to copy mbufs; I want zero-copy. Here, then, I
gather DPDK's TXQ thread takes the mbufs the helper TX thread provides in
the 'rte_eth_tx_burst' call and attaches them to the TXQ's descriptors so
they go out on the wire without copying. Is this correct?

Now, it's worth pointing out here that 'rte_eth_tx_queue_setup', unlike the
RX equivalent, does not accept a mempool. So in addition to the above
points, those additional TX helper threads (the ones that call
'rte_eth_tx_burst') will need to arrange for their own mempools. That's not
hard to do, but I just want confirmation.
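
To spell out the TX behavior I am hoping for, here is a second toy model in
plain C. Again it is not DPDK code and every name (struct tx_buf, struct
tx_pool, struct txq, tx_burst, tx_complete) is hypothetical. It assumes the
sender owns the pool, the TX queue only records the address of each buffer's
data in a descriptor, and the buffer goes back to the sender's pool once the
send has completed.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define TX_POOL_SIZE 8 /* assumption: pool owned by the sending code */
#define TX_RING_SIZE 4 /* assumption: number of TX descriptors */

struct tx_buf { char data[64]; int len; };

struct tx_pool {
    struct tx_buf storage[TX_POOL_SIZE];
    struct tx_buf *free_list[TX_POOL_SIZE];
    int nfree;
};

struct txq {
    const char *desc_addr[TX_RING_SIZE]; /* address the pretend NIC reads from */
    struct tx_buf *owner[TX_RING_SIZE];  /* buffer to release after the send */
    int used;
};

static void tx_pool_init(struct tx_pool *p) {
    p->nfree = 0;
    for (int i = 0; i < TX_POOL_SIZE; i++)
        p->free_list[p->nfree++] = &p->storage[i];
}

static struct tx_buf *tx_pool_get(struct tx_pool *p) {
    return p->nfree > 0 ? p->free_list[--p->nfree] : NULL;
}

static void txq_init(struct txq *q) {
    q->used = 0;
}

/* Queue up to n buffers: only addresses are stored, payload is not copied.
 * Returns how many were accepted; the caller keeps ownership of the rest. */
static int tx_burst(struct txq *q, struct tx_buf **pkts, int n) {
    int sent = 0;
    while (sent < n && q->used < TX_RING_SIZE) {
        q->desc_addr[q->used] = pkts[sent]->data;
        q->owner[q->used] = pkts[sent];
        q->used++;
        sent++;
    }
    return sent;
}

/* Pretend all queued sends finished: return the buffers to their pool. */
static int tx_complete(struct txq *q, struct tx_pool *p) {
    int done = q->used;
    for (int i = 0; i < done; i++)
        p->free_list[p->nfree++] = q->owner[i];
    q->used = 0;
    return done;
}
```

If this matches what 'rte_eth_tx_burst' does with the mbufs I hand it, then
the only thing my sending code has to bring is its own mempool, and the
payload is written exactly once, by my code.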

Thanks.