Hello commit authors (and maintainers),

I'm currently using rte_flow_async_create() with the postpone flag, together with rte_flow_push()/rte_flow_pull() for batching, to insert thousands of flow rules on a BlueField-2 system.

My goal is to use hardware steering so that both ingress and egress traffic bypass the Arm cores of the BF2.

According to the DPDK documentation, rte_flow_push()/rte_flow_pull() appear to be intended for batching. The application enqueues many postponed flow operations in a loop, calls rte_flow_push() once to commit them to hardware in one go, and then calls rte_flow_pull() to retrieve their results.
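For reference, the batching pattern above can be modeled as follows. This is a minimal, self-contained sketch, not DPDK code. sim_enqueue(), sim_push(), and sim_pull() are hypothetical stand-ins for rte_flow_async_create() with the postpone flag, rte_flow_push(), and rte_flow_pull(). QUEUE_SIZE is an assumed stand-in for the per-queue size configured through rte_flow_configure(). The model also assumes one flow queue per lcore, because, as I read the rte_flow programmer's guide, queue operations are not thread-safe and results must be pulled in time to avoid queue overflows.

```c
#include <assert.h>
#include <stdint.h>

/* HYPOTHETICAL model, not DPDK code. QUEUE_SIZE stands in for the
 * per-queue size an application passes to rte_flow_configure(). */
#define QUEUE_SIZE 64u
#define PULL_BURST 32u

/* One model queue per lcore (assumes each lcore owns its own flow queue). */
struct sim_queue {
	uint32_t postponed;  /* enqueued with postpone = 1, not yet pushed */
	uint32_t in_flight;  /* pushed, result not yet pulled */
	uint32_t completed;  /* results retrieved by sim_pull() */
	uint32_t push_calls; /* number of sim_push() calls */
};

/* Stand-in for rte_flow_async_create() with op_attr.postpone = 1.
 * Returns 0 on success, -1 when the queue has no free slot. */
static int sim_enqueue(struct sim_queue *q)
{
	if (q->postponed + q->in_flight >= QUEUE_SIZE)
		return -1;
	q->postponed++;
	return 0;
}

/* Stand-in for rte_flow_push(): hand all postponed operations to HW. */
static void sim_push(struct sim_queue *q)
{
	q->in_flight += q->postponed;
	q->postponed = 0;
	q->push_calls++;
}

/* Stand-in for rte_flow_pull(): this model completes every pushed
 * operation immediately; returns how many results were retrieved. */
static uint32_t sim_pull(struct sim_queue *q, uint32_t max_results)
{
	uint32_t n = q->in_flight < max_results ? q->in_flight : max_results;

	q->in_flight -= n;
	q->completed += n;
	return n;
}

/* Insert n_rules rules: enqueue up to batch_size postponed operations,
 * push once, then pull until every pushed operation has a result so the
 * queue never overflows. batch_size == 1 models push/pull per rule. */
static uint32_t insert_rules(struct sim_queue *q, uint32_t n_rules,
			     uint32_t batch_size)
{
	uint32_t done = 0;

	assert(batch_size > 0);
	while (done < n_rules) {
		uint32_t in_batch = 0;

		/* End the batch early if the queue has no free slot. */
		while (in_batch < batch_size && done + in_batch < n_rules &&
		       sim_enqueue(q) == 0)
			in_batch++;
		sim_push(q);
		while (q->in_flight > 0)
			sim_pull(q, PULL_BURST);
		done += in_batch;
	}
	return q->completed;
}
```

With QUEUE_SIZE at 64, inserting 1000 rules with batch_size 256 takes 16 push calls, because each batch is capped by the free queue slots. With batch_size 1, which corresponds to calling push/pull after every create, it takes 1000 push calls.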

However, I have observed that when multiple cores insert flow rules simultaneously, using rte_flow_push()/rte_flow_pull() in this batched way can prevent the insertion operations from reaching the hardware. Specifically, the internal function mlx5dr_send_all_dep_wqe() gets stuck in its while loop.

Interestingly, if I call rte_flow_push()/rte_flow_pull() after every individual rte_flow_async_create(), even though this seems contrary to the intended batching model, the problem is largely mitigated. The loop in mlx5dr_send_all_dep_wqe() hangs far less often, although it still happens occasionally.

In summary, calling rte_flow_push()/rte_flow_pull() after each rte_flow_async_create() greatly reduces, but does not eliminate, the infinite loop, and I am unsure whether this is an expected usage pattern. I would like to ask:

Is this behavior intentional?
Am I misunderstanding the design or the usage expectations of rte_flow_push()/rte_flow_pull() in multi-core scenarios?

Thank you for your time and support.

Sincerely,

Seongjong Bae
M.S. Student, T-Networking Lab.

Email: sjbae1999@gmail.com
Mobile: (+82)01089640524
Web: https://tnet.snu.ac.kr/