DPDK usage discussions
* Questions about running XDP sockets on top of bonding device or on the physical interfaces behind the bond
@ 2024-01-25  8:48 Pavel Vazharov
  2024-01-25 23:53 ` Stephen Hemminger
  0 siblings, 1 reply; 4+ messages in thread
From: Pavel Vazharov @ 2024-01-25  8:48 UTC (permalink / raw)
  To: users

[-- Attachment #1: Type: text/plain, Size: 4097 bytes --]

Hi there,

I'd like to ask for advice about a weird issue that I'm facing when trying to
run XDP on top of a bonding device (802.3ad), and also on the physical
interfaces behind the bond.

I have a DPDK application which runs on top of XDP sockets, using the DPDK AF_XDP
driver <https://doc.dpdk.org/guides/nics/af_xdp.html>. It was a pure DPDK
application, but it was recently migrated to run on top of XDP sockets because
we need to split the traffic entering the machine between the DPDK
application and other "standard-Linux" applications running on the same
machine.
The application works fine when running on top of a single interface, but it
has problems when it runs on top of a bonding interface. It needs to be
able to run with multiple XDP sockets, where each socket (or group of XDP
sockets) is handled in a separate thread. However, the bonding device
is reported with a single queue, and thus the application can't open more
than one XDP socket for it. So I've tried binding the XDP sockets to the
queues of the physical interfaces. For example:
- 3 interfaces, each one set up with 8 queues.
- I've created 3 virtual af_xdp devices, each one with 8 queues, i.e. 24 XDP
sockets in total, each bound to a separate queue (this functionality is
provided by DPDK itself).
- I've run the application on 2 threads, where the first thread handled the
first 12 queues (XDP sockets) and the second thread handled the next 12
queues (XDP sockets), i.e. the first thread worked with all 8 queues from
af_xdp device 0 and the first 4 queues from af_xdp device 1, and the second
thread worked with the next 4 queues from af_xdp device 1 and all 8 queues
from af_xdp device 2. I've also tried another distribution scheme (see
below). The threads just call the receive/transmit functions provided by
DPDK for their assigned queues (a simplified sketch of this polling loop is
included after the tables below).
- The problem is that with this scheme the network device on the other side
reports: "The member of the LACP mode Eth-Trunk interface received an
abnormal LACPDU, which may be caused by optical fiber misconnection". This
error is always reported for the last device/interface in the bonding, and
the bonding/LACP doesn't work.
- Another thing is that if I run the DPDK application on a single thread, so
that the sending/receiving on all queues is handled by that one thread, then
the bonding seems to work correctly and the above error is not reported.
- I've checked the code multiple times and I'm sure that each thread is
accessing only its own group of queues/sockets.
- I've tried 2 different access schemes, but each one led to the same issue.
In (device_idx - queue_idx) notation, these are the two orders of access I've
tried:
Thread 1        Thread 2
(0 - 0)         (1 - 4)
(0 - 1)         (1 - 5)
...             (1 - 6)
...             (1 - 7)
(0 - 7)         (2 - 0)
(1 - 0)         (2 - 1)
(1 - 1)         ...
(1 - 2)         ...
(1 - 3)         (2 - 7)

Thread 1        Thread 2
(0 - 0)         (0 - 4)
(1 - 0)         (1 - 4)
(2 - 0)         (2 - 4)
(0 - 1)         (0 - 5)
(1 - 1)         (1 - 5)
(2 - 1)         (2 - 5)
...             ...
(0 - 3)         (0 - 7)
(1 - 3)         (1 - 7)
(2 - 3)         (2 - 7)
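
To make the polling pattern concrete, here is a minimal, simplified sketch
(not our actual code) of how each application thread drives its assigned
(device, queue) pairs through the DPDK burst API. The interface names, lcore
ids and EAL vdev arguments in the comments are placeholders for our setup
(each NIC is configured with 8 combined queues beforehand, e.g. with
'ethtool -L <iface> combined 8'):

/*
 * Assumed EAL arguments (names are placeholders), one af_xdp vdev per
 * physical interface behind the bond:
 *   --vdev net_af_xdp0,iface=eth0,queue_count=8
 *   --vdev net_af_xdp1,iface=eth1,queue_count=8
 *   --vdev net_af_xdp2,iface=eth2,queue_count=8
 */
#include <stdint.h>
#include <rte_common.h>
#include <rte_eal.h>
#include <rte_ethdev.h>
#include <rte_lcore.h>
#include <rte_mbuf.h>

#define BURST_SIZE 32

struct pq { uint16_t port; uint16_t queue; };

/* Each worker owns a disjoint set of (port, queue) pairs. */
struct worker_ctx {
    const struct pq *pairs;
    unsigned int nb_pairs;
};

static int worker_main(void *arg)
{
    const struct worker_ctx *ctx = arg;
    struct rte_mbuf *bufs[BURST_SIZE];

    for (;;) {
        for (unsigned int i = 0; i < ctx->nb_pairs; i++) {
            const struct pq *pq = &ctx->pairs[i];
            uint16_t nb_rx = rte_eth_rx_burst(pq->port, pq->queue,
                                              bufs, BURST_SIZE);
            if (nb_rx == 0)
                continue;
            /* ... application processing happens here ... */
            uint16_t nb_tx = rte_eth_tx_burst(pq->port, pq->queue,
                                              bufs, nb_rx);
            /* Free whatever could not be transmitted. */
            for (uint16_t j = nb_tx; j < nb_rx; j++)
                rte_pktmbuf_free(bufs[j]);
        }
    }
    return 0;
}

/* First distribution scheme from the tables above: 12 pairs per thread. */
static const struct pq t1_pairs[] = {
    {0, 0}, {0, 1}, {0, 2}, {0, 3}, {0, 4}, {0, 5}, {0, 6}, {0, 7},
    {1, 0}, {1, 1}, {1, 2}, {1, 3},
};
static const struct pq t2_pairs[] = {
    {1, 4}, {1, 5}, {1, 6}, {1, 7},
    {2, 0}, {2, 1}, {2, 2}, {2, 3}, {2, 4}, {2, 5}, {2, 6}, {2, 7},
};

static void launch_workers(void)
{
    static struct worker_ctx c1 = { t1_pairs, RTE_DIM(t1_pairs) };
    static struct worker_ctx c2 = { t2_pairs, RTE_DIM(t2_pairs) };

    /* Lcore ids 1 and 2 are placeholders; ports and queues are assumed to
     * have been set up during EAL/ethdev initialization. */
    rte_eal_remote_launch(worker_main, &c1, 1);
    rte_eal_remote_launch(worker_main, &c2, 2);
}

In the single-threaded case mentioned above nothing else changes, only all 24
(port, queue) pairs are polled from one lcore.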

And here are my questions based on the above situation:
1. I assumed that it's not possible to run multiple XDP sockets on top of
the bonding device itself and that I need to bind the XDP sockets to the
physical interfaces behind the bonding device. Am I right about this, or am
I missing something?
2. Is the bonding logic (LACP management traffic) affected by the access
pattern of the XDP sockets?
3. Is this scheme supposed to work, or is the design simply wrong? I mean,
maybe a group of queues/sockets shouldn't be handled on a given thread, and
only a single queue should be handled on a given application thread. It's
just that the physical devices have more queues set up on them than there
are threads in the DPDK application, and thus multiple queues need to be
handled on a single application thread.

Any ideas are appreciated!

Regards,
Pavel.

[-- Attachment #2: Type: text/html, Size: 5160 bytes --]

^ permalink raw reply	[flat|nested] 4+ messages in thread

* Re: Questions about running XDP sockets on top of bonding device or on the physical interfaces behind the bond
  2024-01-25  8:48 Questions about running XDP sockets on top of bonding device or on the physical interfaces behind the bond Pavel Vazharov
@ 2024-01-25 23:53 ` Stephen Hemminger
  2024-01-26 14:01   ` Pavel Vazharov
  0 siblings, 1 reply; 4+ messages in thread
From: Stephen Hemminger @ 2024-01-25 23:53 UTC (permalink / raw)
  To: Pavel Vazharov; +Cc: users

On Thu, 25 Jan 2024 10:48:07 +0200
Pavel Vazharov <freakpv@gmail.com> wrote:

> Hi there,
> 
> I'd like to ask for advice about a weird issue that I'm facing when trying to
> run XDP on top of a bonding device (802.3ad), and also on the physical
> interfaces behind the bond.
> [...]

Look at recent discussions on the netdev mailing list.
The Linux bonding device still needs more work to fully support XDP.

^ permalink raw reply	[flat|nested] 4+ messages in thread

* Re: Questions about running XDP sockets on top of bonding device or on the physical interfaces behind the bond
  2024-01-25 23:53 ` Stephen Hemminger
@ 2024-01-26 14:01   ` Pavel Vazharov
  2024-01-30 13:58     ` Pavel Vazharov
  0 siblings, 1 reply; 4+ messages in thread
From: Pavel Vazharov @ 2024-01-26 14:01 UTC (permalink / raw)
  To: Stephen Hemminger; +Cc: users

[-- Attachment #1: Type: text/plain, Size: 4898 bytes --]

On Fri, Jan 26, 2024 at 1:53 AM Stephen Hemminger <
stephen@networkplumber.org> wrote:

> On Thu, 25 Jan 2024 10:48:07 +0200
> Pavel Vazharov <freakpv@gmail.com> wrote:
>
> > Hi there,
> >
> > [...]
>
> Look at recent discussions on the netdev mailing list.
> The Linux bonding device still needs more work to fully support XDP.
>
Thank you. Will do so.

[-- Attachment #2: Type: text/html, Size: 5989 bytes --]

^ permalink raw reply	[flat|nested] 4+ messages in thread

* Re: Questions about running XDP sockets on top of bonding device or on the physical interfaces behind the bond
  2024-01-26 14:01   ` Pavel Vazharov
@ 2024-01-30 13:58     ` Pavel Vazharov
  0 siblings, 0 replies; 4+ messages in thread
From: Pavel Vazharov @ 2024-01-30 13:58 UTC (permalink / raw)
  To: Stephen Hemminger; +Cc: users

[-- Attachment #1: Type: text/plain, Size: 5582 bytes --]

On Fri, Jan 26, 2024 at 4:01 PM Pavel Vazharov <freakpv@gmail.com> wrote:

> On Fri, Jan 26, 2024 at 1:53 AM Stephen Hemminger <
> stephen@networkplumber.org> wrote:
>
>> On Thu, 25 Jan 2024 10:48:07 +0200
>> Pavel Vazharov <freakpv@gmail.com> wrote:
>>
>> > Hi there,
>> >
>> > [...]
>>
>> Look at recent discussions on the netdev mailing list.
>> The Linux bonding device still needs more work to fully support XDP.
>>
> Thank you. Will do so.
>
Just for info, in case somebody hits the same issue: forcing the copy of the
packets between the kernel and the user space with 'force_copy=1' fixes the
issue explained above. There was another person on the netdev mailing list
reporting the same thing for the bonding case, and it worked in my case too.
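
For reference, we pass it through the device arguments of the af_xdp vdevs,
roughly like this (the interface name and queue count are placeholders for
our setup; one such --vdev argument is used per physical interface behind
the bond):

    --vdev net_af_xdp0,iface=eth0,queue_count=8,force_copy=1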

[-- Attachment #2: Type: text/html, Size: 7293 bytes --]

^ permalink raw reply	[flat|nested] 4+ messages in thread

end of thread, other threads:[~2024-01-30 13:58 UTC | newest]

Thread overview: 4+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2024-01-25  8:48 Questions about running XDP sockets on top of bonding device or on the physical interfaces behind the bond Pavel Vazharov
2024-01-25 23:53 ` Stephen Hemminger
2024-01-26 14:01   ` Pavel Vazharov
2024-01-30 13:58     ` Pavel Vazharov
