Hello,

@Ivan Malov, I use one flow queue per lcore.

Sincerely,
Seongjong Bae M.S. Student T-Networking Lab.

Email.sjbae1999@gmail.com
Mobile.(+82)01089640524
Web.https://tnet.snu.ac.kr/


On Tue, Aug 12, 2025 at 5:30 PM, Bing Zhao <bingz@nvidia.com> wrote:
@Ivan Malov, which version of DPDK are you using? Last year's RC?

@Erez Shitrit, could you help confirm whether the GCC loop-expansion bug seen with some Arm compilers is also present in this branch?
I remember there was a GCC bug that made the loop always compare with 1 and jump into an infinite loop.

Thanks

> -----Original Message-----
> From: Ivan Malov <ivan.malov@arknetworks.am>
> Sent: Tuesday, August 12, 2025 12:09 AM
> To: 배성종 <sjbae1999@gmail.com>
> Cc: users@dpdk.org; Dariusz Sosnowski <dsosnowski@nvidia.com>; Slava
> Ovsiienko <viacheslavo@nvidia.com>; Bing Zhao <bingz@nvidia.com>; Ori Kam
> <orika@nvidia.com>; Suanming Mou <suanmingm@nvidia.com>; Matan Azrad
> <matan@nvidia.com>
> Subject: Re: [DPDK 24.11.3-rc1] rte_flow_async_create() stucks in while
> loop (infinite loop)
>
> Hi,
>
> On Mon, 28 Jul 2025, 배성종 wrote:
>
> > Hello commit authors (and maintainers),
> >
> > I'm currently working with rte_flow_async_create() using the postpone
> > flag, along with rte_flow_push/pull() for batching, in a scenario
> > involving thousands of flows on a BlueField-2 system.
> >
> > My goal is to implement hardware steering such that ingress traffic
> > bypasses the ARM core of the BF2, and egress traffic does the same.
> >
> > According to the DPDK documentation, rte_flow_push/pull() seems to be
> > intended for use as a batch operation, wrapping a large for loop that
> > issues multiple flow operations, and then committing them to hardware in
> > one go.
> >
> > However, I’ve observed that when multiple cores simultaneously insert
> > flow rules, using rte_flow_push/pull() in such a batched way can result
> > in the rule insertion operations not being properly transmitted to the
> > hardware. Specifically, the internal function mlx5dr_send_all_dep_wqe()
> > ends up getting stuck in its while loop.
> >
> > Interestingly, if I call rte_flow_push/pull() after each individual
> > rte_flow_async_create() operation, even though that usage seems contrary
> > to the intended batching model, the infinite loop issue is significantly
> > mitigated. The frequency of getting stuck in mlx5dr_send_all_dep_wqe()
> > drops drastically, though it still occurs occasionally.
> >
> > In summary, calling rte_flow_push/pull() after each
> > rte_flow_async_create() seems to avoid the infinite loop, but I’m unsure
> > if this is an expected usage pattern. I would like to ask:
> >
> >  * Is this behavior intentional?
> >
> >  * Am I misunderstanding the design or usage expectations for
> >    rte_flow_push/pull() in multi-core scenarios?
> >
>
> Perhaps my question is a bit out of place and wrong, but, given the fact
> there are no code snippets to take a look at, are you using separate flow
> queues for submitting the operations, one flow queue per lcore?
>
> Thank you.
>
> > Thank you for your time and support.
> >
> > Sincerely,
> > Seongjong Bae M.S. Student T-Networking Lab.
> > Email
> > sjbae1999@gmail.com
> > Mobile
> > (+82)01089640524
> > Web.
> > https://tnet.snu.ac.kr/
> >
> >