On Fri, Jan 26, 2024 at 4:01 PM Pavel Vazharov wrote:

> On Fri, Jan 26, 2024 at 1:53 AM Stephen Hemminger <stephen@networkplumber.org> wrote:
>
>> On Thu, 25 Jan 2024 10:48:07 +0200 Pavel Vazharov wrote:
>>
>> > Hi there,
>> >
>> > I'd like to ask for advice about a weird issue I'm facing while trying
>> > to run XDP on top of a bonding device (802.3ad), and on the physical
>> > interfaces behind the bond.
>> >
>> > I have a DPDK application which runs on top of XDP sockets, using the
>> > DPDK AF_XDP driver. It was a pure DPDK application, but lately it was
>> > migrated to run on top of XDP sockets because we need to split the
>> > traffic entering the machine between the DPDK application and other
>> > "standard-Linux" applications running on the same machine.
>> > The application works fine when running on top of a single interface,
>> > but it has problems when it runs on top of a bonding interface. It
>> > needs to be able to run with multiple XDP sockets, where each socket
>> > (or group of sockets) is handled in a separate thread. However, the
>> > bonding device is reported with a single queue, so the application
>> > can't open more than one XDP socket for it. So I've tried binding the
>> > XDP sockets to the queues of the physical interfaces. For example:
>> > - 3 interfaces, each one set up with 8 queues.
>> > - I've created 3 virtual af_xdp devices, each one with 8 queues, i.e.
>> >   24 XDP sockets in total, each bound to a separate queue (this
>> >   functionality is provided by DPDK itself).
>> > - I've run the application on 2 threads, where the first thread
>> >   handled the first 12 queues (XDP sockets) and the second thread
>> >   handled the next 12, i.e. the first thread worked with all 8 queues
>> >   from af_xdp device 0 and the first 4 queues from af_xdp device 1,
>> >   while the second thread worked with the last 4 queues from af_xdp
>> >   device 1 and all 8 queues from af_xdp device 2. I've also tried
>> >   another distribution scheme (see below). The given threads just call
>> >   the receive/transmit functions provided by DPDK for the assigned
>> >   queues.
>> > - The problem is that with this scheme the network device on the other
>> >   side reports: "The member of the LACP mode Eth-Trunk interface
>> >   received an abnormal LACPDU, which may be caused by optical fiber
>> >   misconnection". This error is always reported for the last
>> >   device/interface in the bonding, and the bonding/LACP doesn't work.
>> > - On the other hand, if I run the DPDK application on a single thread,
>> >   so that the sending/receiving on all queues is handled by one
>> >   thread, then the bonding seems to work correctly and the above error
>> >   is not reported.
>> > - I've checked the code multiple times and I'm sure that each thread
>> >   is accessing only its own group of queues/sockets.
>> > - I've tried 2 different schemes of accessing, but each one led to the
>> >   same issue. For example (device_idx - queue_idx), I've tried these
>> >   two orders of accessing:
>> >
>> > Thread 1    Thread 2
>> > (0 - 0)     (1 - 4)
>> > (0 - 1)     (1 - 5)
>> > ...         (1 - 6)
>> > ...         (1 - 7)
>> > (0 - 7)     (2 - 0)
>> > (1 - 0)     (2 - 1)
>> > (1 - 1)     ...
>> > (1 - 2)     ...
>> > (1 - 3)     (2 - 7)
>> >
>> > Thread 1    Thread 2
>> > (0 - 0)     (0 - 4)
>> > (1 - 0)     (1 - 4)
>> > (2 - 0)     (2 - 4)
>> > (0 - 1)     (0 - 5)
>> > (1 - 1)     (1 - 5)
>> > (2 - 1)     (2 - 5)
>> > ...         ...
>> > (0 - 3)     (0 - 7)
>> > (1 - 3)     (1 - 7)
>> > (2 - 3)     (2 - 7)
>> >
>> > And here are my questions based on the above situation:
>> > 1. I assumed that it's not possible to run multiple XDP sockets on top
>> >    of the bonding device itself, and that I need to "bind" the XDP
>> >    sockets to the physical interfaces behind the bonding device. Am I
>> >    right about this, or am I missing something?
>> > 2. Is the bonding logic (LACP management traffic) affected by the
>> >    access pattern of the XDP sockets?
>> > 3. Is this scheme supposed to work, or is the design simply wrong? I
>> >    mean, maybe a group of queues/sockets shouldn't be handled on a
>> >    given thread, and only a single queue should be handled per
>> >    application thread. It's just that the physical devices have more
>> >    queues set up on them than the number of threads in the DPDK
>> >    application, so multiple queues need to be handled on a single
>> >    application thread.
>> >
>> > Any ideas are appreciated!
>> >
>> > Regards,
>> > Pavel.
>>
>> Look at recent discussions on the netdev mailing list.
>> The Linux bonding device still needs more work to fully support XDP.
>
> Thank you. Will do so.

Just for info, if somebody hits the same issue: forcing the copy of the packets between the kernel and the user space with 'force_copy=1' fixes the issue explained above. There was another person on the netdev mailing list reporting the same thing for the case of bonding. I tried it and it worked in my case too.
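
For anyone reproducing this, here is a minimal sketch of the setup described above. The interface names (eth0/eth1/eth2) are placeholders for the bond's slave interfaces; 'queue_count' and 'force_copy' are documented devargs of DPDK's net/af_xdp PMD, though 'force_copy' only exists in reasonably recent DPDK releases:

    /* Minimal EAL bring-up with three af_xdp vdevs, 8 queues each,
     * and copy mode forced (the fix described above). Interface
     * names are placeholders for the slaves of the bond. */
    #include <stdlib.h>
    #include <rte_eal.h>

    int main(void)
    {
        char *eal_argv[] = {
            "xdp-app",
            "--no-pci",
            "--vdev=net_af_xdp0,iface=eth0,queue_count=8,force_copy=1",
            "--vdev=net_af_xdp1,iface=eth1,queue_count=8,force_copy=1",
            "--vdev=net_af_xdp2,iface=eth2,queue_count=8,force_copy=1",
        };
        int eal_argc = sizeof(eal_argv) / sizeof(eal_argv[0]);

        if (rte_eal_init(eal_argc, eal_argv) < 0)
            return EXIT_FAILURE;
        /* ... usual rte_eth_dev_configure()/queue setup per port ... */
        return EXIT_SUCCESS;
    }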
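
The per-thread part is then just each thread polling its own fixed set of (device, queue) pairs, matching the tables above. A sketch with hypothetical names, error handling and the actual packet processing omitted:

    /* One thread polls a fixed set of (port, queue) pairs; the sets
     * assigned to different threads never overlap. */
    #include <rte_ethdev.h>
    #include <rte_mbuf.h>

    struct pq { uint16_t port; uint16_t queue; };

    static void rx_loop(const struct pq *assigned, unsigned int n)
    {
        struct rte_mbuf *burst[32];

        for (;;) {
            for (unsigned int i = 0; i < n; i++) {
                uint16_t nb = rte_eth_rx_burst(assigned[i].port,
                                               assigned[i].queue,
                                               burst, 32);
                for (uint16_t j = 0; j < nb; j++)
                    rte_pktmbuf_free(burst[j]); /* real app: process/tx */
            }
        }
    }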
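
For context, as far as I can tell 'force_copy=1' simply makes the PMD bind each XSK with the XDP_COPY flag instead of letting the kernel pick zero-copy mode. Outside DPDK, the equivalent with libxdp's xsk helpers would look roughly like this (umem and ring setup omitted):

    /* Bind one AF_XDP socket to a queue in forced copy mode;
     * equivalent in spirit to force_copy=1. */
    #include <stdint.h>
    #include <linux/if_xdp.h>
    #include <xdp/xsk.h> /* libxdp; older code uses <bpf/xsk.h> */

    static int bind_copy_mode(struct xsk_socket **xsk, const char *ifname,
                              uint32_t queue_id, struct xsk_umem *umem,
                              struct xsk_ring_cons *rx,
                              struct xsk_ring_prod *tx)
    {
        struct xsk_socket_config cfg = {
            .rx_size = XSK_RING_CONS__DEFAULT_NUM_DESCS,
            .tx_size = XSK_RING_PROD__DEFAULT_NUM_DESCS,
            .bind_flags = XDP_COPY, /* force kernel<->user copies */
        };

        return xsk_socket__create(xsk, ifname, queue_id, umem, rx, tx,
                                  &cfg);
    }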