Hi everyone,
I'm trying to use a secondary process to rx/tx packets through a NIC and am running into segfaults because some data structures are not populated in the secondary. I've found a workaround, but it relies on internal APIs, so I wanted to check with the community whether there's a better way to do this.
Primary process:
- Command line: sudo LD_LIBRARY_PATH=/path/to/dpdk/shared/libs/ ./test-ethdev --main-lcore 0 --file-prefix=mpdemo --socket-mem=1024 -d /path/to/dpdk/shared/libs/dpdk/pmds-25.0 --proc-type primary
- Calls rte_eal_init
- Registers a message handler using rte_mp_action_register
- Waits indefinitely for CTRL+C
Secondary process:
- Command line: Same as primary, except --proc-type=secondary
- Calls rte_eal_init
- Registers a message handler using rte_mp_action_register
- Creates a mempool using rte_pktmbuf_pool_create
- Sends a request message to the primary using rte_mp_request_async
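
A minimal sketch of one way the request payload could be laid out, assuming the secondary passes the mempool name and the PCI address to the primary so the primary can do the lookups described in the next step. The struct, the function names and the sizes below are hypothetical and not part of DPDK; the 256-byte limit is assumed to match RTE_MP_MAX_PARAM_LEN and should be checked against rte_eal.h for the release in use. Copying the struct as raw bytes is only reasonable here because both processes run the same binary on the same host.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical limits; DEMO_PARAM_MAX is assumed to mirror RTE_MP_MAX_PARAM_LEN. */
#define DEMO_PARAM_MAX 256
#define DEMO_NAME_MAX  64

/* Hypothetical request body (not a DPDK type): what the primary needs to know. */
struct demo_start_req {
    char mempool_name[DEMO_NAME_MAX]; /* pool created by the secondary */
    char pci_addr[DEMO_NAME_MAX];     /* device the primary should start */
    uint16_t nb_rxd;                  /* requested Rx descriptors */
    uint16_t nb_txd;                  /* requested Tx descriptors */
};

/* Fill a request. Returns 0 on success, -1 if a string does not fit. */
int demo_req_fill(struct demo_start_req *req, const char *pool,
                  const char *pci, uint16_t nb_rxd, uint16_t nb_txd)
{
    if (strlen(pool) >= DEMO_NAME_MAX || strlen(pci) >= DEMO_NAME_MAX)
        return -1;
    memset(req, 0, sizeof(*req));
    strcpy(req->mempool_name, pool);
    strcpy(req->pci_addr, pci);
    req->nb_rxd = nb_rxd;
    req->nb_txd = nb_txd;
    return 0;
}

/* Copy the request into a parameter buffer. Returns bytes written or -1. */
int demo_req_pack(const struct demo_start_req *req, uint8_t *param, size_t cap)
{
    if (cap < sizeof(*req))
        return -1;
    memcpy(param, req, sizeof(*req));
    return (int)sizeof(*req);
}

/* Validate and decode a received buffer. Returns 0 on success, -1 otherwise. */
int demo_req_unpack(struct demo_start_req *out, const uint8_t *param, int len)
{
    if (len != (int)sizeof(*out))
        return -1;
    memcpy(out, param, sizeof(*out));
    /* Reject strings that are not NUL-terminated inside their fields. */
    if (memchr(out->mempool_name, '\0', DEMO_NAME_MAX) == NULL ||
        memchr(out->pci_addr, '\0', DEMO_NAME_MAX) == NULL)
        return -1;
    return 0;
}
```

On the sending side, the packed bytes would go into the param field of struct rte_mp_msg, with len_param set to the value returned by demo_req_pack; the handler registered on the primary would call demo_req_unpack on the same fields.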
Primary process (upon receiving the async request):
- Look up the mempool created by the secondary
- Get port id for a particular ethdev using PCI address
- Call rte_eth_dev_configure, rte_eth_rx_queue_setup, rte_eth_tx_queue_setup and rte_eth_dev_start to start the ethdev
- Send response to secondary
Secondary process (upon receiving response):
- Call rte_eth_dev_get_port_by_name to fetch the port id
- Call rte_eth_dev_attach_secondary and rte_eth_dev_probing_finish to initialize data structures. Without this, the next step segfaults because rte_eth_fp_ops.rxq.data is NULL
- Call rte_eth_rx_burst to receive packets
The above works, but I have a few questions about whether this process can be improved. I'm using an Nvidia ConnectX-6 NIC (mlx5 driver), in case that matters.
- Are there public APIs I can call instead of rte_eth_dev_attach_secondary and rte_eth_dev_probing_finish?
- Is it possible to perform rte_eth_dev_configure, tx/rx queue setup and rte_eth_dev_start from the secondary process? These functions return various error codes when I try this.
- When I call rte_eth_dev_start on the primary, it sends a message to the secondary that goes unhandled. Is this a problem, and could it be the reason rte_eth_fp_ops is not populated on the secondary? This is the error message I see from the primary:
EAL: Fail to recv reply for request /var/run/dpdk/mpdemo/mp_socket_1094695_b7080eff23acd4:common_mlx5_mp
mlx5_net: port 2 failed to request stop/start Rx/Tx (5)
Thanks for all your help,
Ashish