My NIC has 32 RX and 32 TX queues. Given this API,

    static uint16_t rte_eth_tx_burst(uint16_t port_id, uint16_t queue_id,
                                     struct rte_mbuf **tx_pkts, uint16_t nb_pkts)

I gather the most natural approach for efficient packet transmission is to:

- assign one core to TX on a given (port, queue) pair, since I/O through different queues is essentially independent
- arrange packets in memory buffers so that packets for the same queue are in the same buffer

The RX side is much the same:

    static uint16_t rte_eth_rx_burst(uint16_t port_id, uint16_t queue_id,
                                     struct rte_mbuf **rx_pkts, const uint16_t nb_pkts)

- assign one core to RX on a given (port, queue) pair, since I/O through one queue is essentially independent of the other queues

Is this pretty much the starting place?

Now, my NIC also has RSS (output elided):

    ethtool -x eth0
    RSS hash key:
    e8:27:27:e4:d9:fa:e4:1e:c6:89:67:95:52:4b:7a:41:3a:a6:68:5f:12:ec:4c:2f:51:18:a0:9b:bb:e1:7a:fb:a7:fb:7f:68:39:47:c2:83
    RSS hash function:
        toeplitz: on
        xor: off
        crc32: off

But doesn't specifying the queue_id in the RX/TX burst functions undercut RSS? That is, there seems to be no opportunity for the NIC to decide which queue a TX or RX packet goes to, since it was told at the outset.
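
To make the RSS part concrete, here is a minimal sketch of how I understand the NIC to choose an RX queue: a Toeplitz hash over the flow tuple, then a lookup in an indirection table. The key is the one from the ethtool output above. Everything else is an assumption for illustration: the table size of 128, the round-robin fill (my real table is elided), the 12-byte IPv4/TCP tuple layout, and all function names, none of which are DPDK or driver APIs.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

#define RETA_SIZE    128 /* assumed indirection table size; real value is elided */
#define NB_RX_QUEUES 32

/* RSS hash key copied from the ethtool -x output (40 bytes). */
static const uint8_t rss_key[40] = {
    0xe8, 0x27, 0x27, 0xe4, 0xd9, 0xfa, 0xe4, 0x1e, 0xc6, 0x89,
    0x67, 0x95, 0x52, 0x4b, 0x7a, 0x41, 0x3a, 0xa6, 0x68, 0x5f,
    0x12, 0xec, 0x4c, 0x2f, 0x51, 0x18, 0xa0, 0x9b, 0xbb, 0xe1,
    0x7a, 0xfb, 0xa7, 0xfb, 0x7f, 0x68, 0x39, 0x47, 0xc2, 0x83
};

/* Toeplitz hash: for every set input bit (MSB first), XOR in the 32-bit
 * window of the key that starts at that bit position. key_len must be >= 4. */
uint32_t toeplitz_hash(const uint8_t *key, size_t key_len,
                       const uint8_t *data, size_t data_len)
{
    uint32_t hash = 0;
    uint32_t window = ((uint32_t)key[0] << 24) | ((uint32_t)key[1] << 16) |
                      ((uint32_t)key[2] << 8) | (uint32_t)key[3];
    size_t next_bit = 32; /* index of the next key bit to shift into the window */

    for (size_t i = 0; i < data_len; i++) {
        for (int b = 7; b >= 0; b--) {
            if (data[i] & (1u << b))
                hash ^= window;
            uint32_t in = 0; /* zero-fill once the key is exhausted */
            if (next_bit < key_len * 8)
                in = (key[next_bit / 8] >> (7 - next_bit % 8)) & 1u;
            window = (window << 1) | in;
            next_bit++;
        }
    }
    return hash;
}

/* Hash an IPv4/TCP flow: src IP, dst IP, src port, dst port, big-endian. */
uint32_t rss_hash_ipv4_tcp(uint32_t src_ip, uint32_t dst_ip,
                           uint16_t src_port, uint16_t dst_port)
{
    const uint8_t tuple[12] = {
        (uint8_t)(src_ip >> 24), (uint8_t)(src_ip >> 16),
        (uint8_t)(src_ip >> 8),  (uint8_t)src_ip,
        (uint8_t)(dst_ip >> 24), (uint8_t)(dst_ip >> 16),
        (uint8_t)(dst_ip >> 8),  (uint8_t)dst_ip,
        (uint8_t)(src_port >> 8), (uint8_t)src_port,
        (uint8_t)(dst_port >> 8), (uint8_t)dst_port
    };
    return toeplitz_hash(rss_key, sizeof rss_key, tuple, sizeof tuple);
}

/* Assumed table contents: queues 0..nb_queues-1 repeated round-robin. */
void reta_fill_round_robin(uint16_t *reta, size_t reta_size, uint16_t nb_queues)
{
    for (size_t i = 0; i < reta_size; i++)
        reta[i] = (uint16_t)(i % nb_queues);
}

/* The hash selects a table slot; the slot names the RX queue. */
uint16_t rss_pick_queue(uint32_t hash, const uint16_t *reta, size_t reta_size)
{
    assert(reta_size != 0);
    return reta[hash % reta_size];
}

#ifdef RSS_SKETCH_DEMO /* build with -DRSS_SKETCH_DEMO to run the demo */
int main(void)
{
    uint16_t reta[RETA_SIZE];
    reta_fill_round_robin(reta, RETA_SIZE, NB_RX_QUEUES);
    /* Hypothetical flow: 192.0.2.1:1234 -> 198.51.100.2:80 */
    uint32_t h = rss_hash_ipv4_tcp(0xC0000201u, 0xC6336402u, 1234, 80);
    printf("hash=0x%08x queue=%u\n", (unsigned)h,
           (unsigned)rss_pick_queue(h, reta, RETA_SIZE));
    return 0;
}
#endif
```

If this picture is right, the RSS decision is made per flow before a packet ever lands in a queue, which is exactly why I am unsure how the queue_id argument fits in.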