My NIC has 32 RX and 32 TX queues.

Given this API:

```c
static uint16_t rte_eth_tx_burst(uint16_t port_id,
                                 uint16_t queue_id,
                                 struct rte_mbuf **tx_pkts,
                                 uint16_t nb_pkts);
```

As I understand it, the most natural approach to efficient packet transmission is:

- assign one core to transmit on each (port, queue) pair, since I/O on different queues is essentially independent
- arrange packets in memory so that all packets destined for the same queue are in the same buffer
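A minimal, DPDK-free sketch of the second point: a mixed set of packets is sorted into one burst array per TX queue before transmission. `struct pkt`, `flow_id`, `pick_tx_queue`, and the policy of mapping `flow_id` modulo the queue count (which keeps every packet of a flow on one queue, and so preserves its ordering) are hypothetical stand-ins; a real application would hold `struct rte_mbuf *` pointers and pick its own policy.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define NB_TX_QUEUES 4   /* hypothetical; the NIC above exposes 32 */
#define BURST_SIZE   32

/* Hypothetical packet stand-in; a DPDK app would use struct rte_mbuf *. */
struct pkt {
    uint32_t flow_id;    /* hypothetical per-flow identifier */
};

/* One burst array per TX queue. */
struct tx_batch {
    struct pkt *pkts[BURST_SIZE];
    uint16_t    count;
};

/* Hypothetical policy: all packets of a flow go to the same queue. */
static uint16_t pick_tx_queue(const struct pkt *p)
{
    return (uint16_t)(p->flow_id % NB_TX_QUEUES);
}

/* Sort n packets into per-queue batches.
 * Returns how many packets did not fit because their batch was full. */
static size_t group_by_queue(struct pkt *pkts, size_t n,
                             struct tx_batch batches[NB_TX_QUEUES])
{
    size_t overflow = 0;

    assert(n == 0 || pkts != NULL);
    for (uint16_t q = 0; q < NB_TX_QUEUES; q++)
        batches[q].count = 0;

    for (size_t i = 0; i < n; i++) {
        struct tx_batch *b = &batches[pick_tx_queue(&pkts[i])];
        if (b->count < BURST_SIZE)
            b->pkts[b->count++] = &pkts[i];
        else
            overflow++;
    }
    return overflow;
}
```

With real mbuf pointers, each non-empty batch would then be handed to `rte_eth_tx_burst(port_id, q, ...)`. That call returns how many packets the NIC actually accepted, so any remainder is still owned by the caller and has to be retried or freed.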

The RX side is much the same:

```c
static uint16_t rte_eth_rx_burst(uint16_t port_id,
                                 uint16_t queue_id,
                                 struct rte_mbuf **rx_pkts,
                                 const uint16_t nb_pkts);
```

- assign one core to receive on each (port, queue) pair, since I/O on one queue is essentially independent of the other queues
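The one-core-per-queue receive model can be sketched without DPDK, using POSIX threads as stand-ins for lcores and in-memory arrays as stand-ins for the NIC's RX rings. `sim_rxq`, `sim_rx_burst` (a hypothetical analogue of `rte_eth_rx_burst`), and the fixed per-queue packet counts are assumptions for illustration only. What it shows is that each queue has exactly one reader, so the workers share no state and need no locks.

```c
#include <assert.h>
#include <pthread.h>
#include <stdint.h>
#include <stdlib.h>

#define NB_RX_QUEUES 4    /* hypothetical; the NIC above exposes 32 */
#define QUEUE_DEPTH  100
#define BURST_SIZE   32

/* Hypothetical simulated RX ring standing in for one NIC RX queue. */
struct sim_rxq {
    uint32_t pkts[QUEUE_DEPTH];
    uint16_t head;   /* advanced only by the owning worker */
    uint16_t tail;   /* set before the workers start */
};

static struct sim_rxq rxqs[NB_RX_QUEUES];

/* Hypothetical stand-in for rte_eth_rx_burst(port_id, queue_id, ...):
 * copies up to nb_pkts packets out of a single queue. */
static uint16_t sim_rx_burst(uint16_t queue_id, uint32_t *out, uint16_t nb_pkts)
{
    assert(queue_id < NB_RX_QUEUES);
    struct sim_rxq *q = &rxqs[queue_id];
    uint16_t n = 0;
    while (n < nb_pkts && q->head < q->tail)
        out[n++] = q->pkts[q->head++];
    return n;
}

struct worker {
    pthread_t tid;
    uint16_t  queue_id;   /* the one queue this "core" owns */
    uint64_t  received;
};

/* A real poll loop spins forever; this one stops once its
 * pre-filled queue is empty so the sketch terminates. */
static void *rx_worker(void *arg)
{
    struct worker *w = arg;
    uint32_t burst[BURST_SIZE];
    uint16_t n;

    /* No locks: this thread is the only reader of its queue. */
    while ((n = sim_rx_burst(w->queue_id, burst, BURST_SIZE)) > 0)
        w->received += n;
    return NULL;
}

/* Fill queue q with 10 * (q + 1) packets, start one worker per queue,
 * wait for all of them, and return the total received. */
static uint64_t run_rx_workers(struct worker workers[NB_RX_QUEUES])
{
    uint64_t total = 0;

    for (uint16_t q = 0; q < NB_RX_QUEUES; q++) {
        rxqs[q].head = 0;
        rxqs[q].tail = (uint16_t)(10 * (q + 1));
        for (uint16_t i = 0; i < rxqs[q].tail; i++)
            rxqs[q].pkts[i] = i;
        workers[q].queue_id = q;
        workers[q].received = 0;
    }
    for (uint16_t q = 0; q < NB_RX_QUEUES; q++)
        if (pthread_create(&workers[q].tid, NULL, rx_worker, &workers[q]) != 0)
            abort();   /* keep the sketch simple */
    for (uint16_t q = 0; q < NB_RX_QUEUES; q++) {
        pthread_join(workers[q].tid, NULL);
        total += workers[q].received;
    }
    return total;
}
```

In a DPDK program the workers would be lcores, each calling `rte_eth_rx_burst` with its own `queue_id` in an endless loop.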

Is this the right starting point?

My NIC also supports RSS (output elided):

```
ethtool -x eth0

RSS hash key:
e8:27:27:e4:d9:fa:e4:1e:c6:89:67:95:52:4b:7a:41:3a:a6:68:5f:12:ec:4c:2f:51:18:a0:9b:bb:e1:7a:fb:a7:fb:7f:68:39:47:c2:83
RSS hash function:
    toeplitz: on
    xor: off
    crc32: off
```
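To make the RSS settings above concrete, here is a sketch of how a Toeplitz hash (the function shown as `on`) combined with an indirection table turns a packet's addresses and ports into an RX queue index. The Toeplitz routine follows the Microsoft RSS definition: for every set input bit, XOR in the 32-bit slice of the key starting at that bit. The 128-entry round-robin table in `rss_queue_for_hash` is a hypothetical example; the real table size and contents depend on the NIC and driver, and `ethtool -x` also prints the table itself.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Toeplitz hash as defined for Microsoft RSS. Input bits are consumed
 * most-significant bit first; each set bit XORs in the 32-bit key window
 * that starts at the same bit position. Requires key_len >= data_len + 4. */
static uint32_t toeplitz_hash(const uint8_t *key, size_t key_len,
                              const uint8_t *data, size_t data_len)
{
    assert(key_len >= data_len + 4);
    uint32_t window = ((uint32_t)key[0] << 24) | ((uint32_t)key[1] << 16) |
                      ((uint32_t)key[2] << 8)  |  (uint32_t)key[3];
    uint32_t result = 0;

    for (size_t i = 0; i < data_len; i++) {
        uint8_t next_key = key[i + 4];   /* key bits sliding into the window */
        for (int bit = 7; bit >= 0; bit--) {
            if (data[i] & (1u << bit))
                result ^= window;
            window = (window << 1) | ((uint32_t)(next_key >> bit) & 1u);
        }
    }
    return result;
}

/* Hypothetical redirection table (RETA): 128 entries filled round-robin
 * over nb_queues and indexed by the low 7 bits of the hash. Rebuilt on
 * every call only to keep the sketch short. */
#define RETA_SIZE 128
static uint16_t rss_queue_for_hash(uint32_t hash, uint16_t nb_queues)
{
    uint16_t reta[RETA_SIZE];

    assert(nb_queues > 0);
    for (uint16_t i = 0; i < RETA_SIZE; i++)
        reta[i] = (uint16_t)(i % nb_queues);
    return reta[hash & (RETA_SIZE - 1)];
}
```

The input is the source address, destination address, source port and destination port in network byte order. For example, with the widely published Microsoft RSS verification key, the IPv4 tuple 66.9.149.187:2794 to 161.142.100.80:1766 hashes to 0x323e8fc2 over the addresses alone and to 0x51ccc178 over addresses and ports.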


But doesn't specifying the queue_id in the RX/TX burst functions undercut RSS? That is, if the application names the queue up front, where does the NIC get a chance to decide which queue a packet is received on or transmitted from?