Hi everyone,
I'm trying to use a secondary process to rx/tx packets through a NIC and am running into segfaults because some data structures are not populated on the secondary. I've found a workaround, but it relies on internal APIs, so I wanted to check with the community whether there's a better way to do this.

I'm using DPDK 24.11 LTS for this test. The test code is at https://gist.github.com/praetorian20/0c1b69abbc7843d958da72fdf611a5d7. I'll describe the steps I'm following here.

Primary process:
  1. Command line: sudo LD_LIBRARY_PATH=/path/to/dpdk/shared/libs/ ./test-ethdev --main-lcore 0 --file-prefix=mpdemo --socket-mem=1024 -d /path/to/dpdk/shared/libs/dpdk/pmds-25.0 --proc-type primary
  2. Calls rte_eal_init
  3. Registers a message handler using rte_mp_action_register
  4. Waits indefinitely for CTRL+C
Secondary process:
  1. Command line: Same as primary, except --proc-type=secondary
  2. Calls rte_eal_init
  3. Registers a message handler using rte_mp_action_register
  4. Creates a mempool using rte_pktmbuf_pool_create
  5. Sends a request message to the primary using rte_mp_request_async
Primary process (upon receiving the async request):
  1. Looks up the mempool created by the secondary
  2. Gets the port id for a particular ethdev using its PCI address
  3. Calls rte_eth_dev_configure, rte_eth_rx_queue_setup, rte_eth_tx_queue_setup and rte_eth_dev_start to start the ethdev
  4. Sends a response to the secondary
Secondary process (upon receiving response):
  1. Calls rte_eth_dev_get_port_by_name to fetch the port id
  2. Calls rte_eth_dev_attach_secondary and rte_eth_dev_probing_finish to initialize data structures. Without this, the next step segfaults because rte_eth_fp_ops.rxq.data is null
  3. Calls rte_eth_rx_burst to receive packets
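
For context on the request in step 5 of the secondary: it has to tell the primary which mempool to look up and which PCI device to configure. Below is a minimal sketch of one way to pack that into the param buffer of a struct rte_mp_msg. It is not copied from the gist. The struct, the helper names, the field sizes and the descriptor-count fields are all hypothetical, and MP_PARAM_MAX is a local stand-in for RTE_MP_MAX_PARAM_LEN so that the snippet compiles without DPDK headers.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Local stand-in for RTE_MP_MAX_PARAM_LEN (256) from rte_eal.h.
 * Real code should use the DPDK macro instead of this copy. */
#define MP_PARAM_MAX 256

/* Hypothetical field sizes chosen for this sketch only. */
#define REQ_POOL_NAME_MAX 32
#define REQ_PCI_ADDR_MAX  16

/* Hypothetical request payload sent from the secondary to the primary. */
struct port_setup_req {
    char pool_name[REQ_POOL_NAME_MAX]; /* mempool created by the secondary */
    char pci_addr[REQ_PCI_ADDR_MAX];   /* PCI address of the ethdev */
    uint16_t nb_rxd;                   /* requested Rx descriptors */
    uint16_t nb_txd;                   /* requested Tx descriptors */
};

static_assert(sizeof(struct port_setup_req) <= MP_PARAM_MAX,
              "request must fit in the rte_mp_msg param buffer");

/* Pack a request into a param buffer.
 * Returns the number of bytes written, or -1 on invalid input. */
static int
port_setup_req_pack(uint8_t *param, size_t param_len,
                    const char *pool_name, const char *pci_addr,
                    uint16_t nb_rxd, uint16_t nb_txd)
{
    struct port_setup_req req;
    size_t pool_len, pci_len;

    if (param == NULL || pool_name == NULL || pci_addr == NULL)
        return -1;
    if (param_len < sizeof(req))
        return -1;

    pool_len = strlen(pool_name);
    pci_len = strlen(pci_addr);
    if (pool_len >= sizeof(req.pool_name) || pci_len >= sizeof(req.pci_addr))
        return -1;

    /* Zero the struct so padding and unused bytes are deterministic. */
    memset(&req, 0, sizeof(req));
    memcpy(req.pool_name, pool_name, pool_len + 1);
    memcpy(req.pci_addr, pci_addr, pci_len + 1);
    req.nb_rxd = nb_rxd;
    req.nb_txd = nb_txd;

    memcpy(param, &req, sizeof(req));
    return (int)sizeof(req);
}

/* Unpack and validate a request on the receiving side.
 * Returns 0 on success, -1 if the payload is malformed. */
static int
port_setup_req_unpack(const uint8_t *param, int len_param,
                      struct port_setup_req *out)
{
    if (param == NULL || out == NULL || len_param != (int)sizeof(*out))
        return -1;

    memcpy(out, param, sizeof(*out));

    /* Reject strings that are not NUL-terminated within their fields. */
    if (memchr(out->pool_name, '\0', sizeof(out->pool_name)) == NULL ||
        memchr(out->pci_addr, '\0', sizeof(out->pci_addr)) == NULL)
        return -1;
    return 0;
}
```

With something like this, the secondary would pack into msg.param and store the returned size in msg.len_param before calling rte_mp_request_async, and the primary's handler would unpack and then pass pool_name to rte_mempool_lookup.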

The above works, but I have a few questions about whether this process can be improved. I'm using an NVIDIA ConnectX-6 NIC (mlx5 driver) in case that matters.

  • Are there public APIs I can call instead of rte_eth_dev_attach_secondary and rte_eth_dev_probing_finish?
  • Is it possible to perform rte_eth_dev_configure, tx/rx queue setup and rte_eth_dev_start from the secondary process? These functions return various error codes when I try this.
  • When I call rte_eth_dev_start on the primary, it's sending a message to the secondary that goes unhandled. Is this a problem, and possibly the reason for the unpopulated rte_eth_fp_ops on the secondary? This is the error message I see from the primary:
EAL: Fail to recv reply for request /var/run/dpdk/mpdemo/mp_socket_1094695_b7080eff23acd4:common_mlx5_mp
mlx5_net: port 2 failed to request stop/start Rx/Tx (5)

Thanks for all your help,
Ashish