DPDK usage discussions
* dumpcap: weird failure with six IPv6 hosts in the filter
@ 2024-06-17  7:11 Isaac Boukris
  2024-06-17 14:44 ` Stephen Hemminger
  2024-06-17 15:30 ` Stephen Hemminger
  0 siblings, 2 replies; 10+ messages in thread
From: Isaac Boukris @ 2024-06-17  7:11 UTC
  To: users, Stephen Hemminger

Hi Stephen,

For instance, the following filter fails as follows (if I omit one host,
it works):
-f "host 1::1 or host 1::1 or host 1::1 or host 1::1 or host 1::1 or host 1::1"

EAL: Error - exiting with code: 1
  Cause: Packet dump enable on 0:0000:13:00.0 failed Connection timed out

On the server side I see:
Jun 16 15:17:08: EAL: failed to send to (/tmp/dpdk/rte/mp_socket_262131_4a103955de0b7a) due to No such file or directory
Jun 16 15:17:08: pdump_server(): failed to send to client:No such file or directory
Jun 16 15:17:08: EAL: Fail to handle message: mp_pdump

Subsequent requests then fail with the following (even with no filter):
pdump_register_rx_callbacks(): rx callback for port=0 queue=0, already exists

I debugged the dpdk-mp-msg thread with gdb. As far as I can tell, it
spends an awful lot of time in rte_bpf_load() (~15 seconds in my
environment), so the client times out, and by the time the server tries
to respond, the client socket no longer exists.

Thoughts?

Thanks!

* Re: dumpcap: weird failure with six IPv6 hosts in the filter
  2024-06-17  7:11 dumpcap: weird failure with six IPv6 hosts in the filter Isaac Boukris
@ 2024-06-17 14:44 ` Stephen Hemminger
  2024-06-17 15:30 ` Stephen Hemminger
  1 sibling, 0 replies; 10+ messages in thread
From: Stephen Hemminger @ 2024-06-17 14:44 UTC
  To: Isaac Boukris; +Cc: users

On Mon, 17 Jun 2024 10:11:47 +0300
Isaac Boukris <iboukris@gmail.com> wrote:

> Hi Stephen,
> 
> For instance, the following filter fails as follows (if I omit one host
> it works):
> -f "host 1::1 or host 1::1 or host 1::1 or host 1::1 or host 1::1 or host 1::1"
> 
> EAL: Error - exiting with code: 1
>   Cause: Packet dump enable on 0:0000:13:00.0 failed Connection timed out
> 
> On the server side I see:
> Jun 16 15:17:08: EAL: failed to send to
> (/tmp/dpdk/rte/mp_socket_262131_4a103955de0b7a) due to No such file or
> directory
> Jun 16 15:17:08: pdump_server(): failed to send to client:No such file
> or directory
> Jun 16 15:17:08: EAL: Fail to handle message: mp_pdump
> 
> Then subsequent requests fail with (even with no filter):
> pdump_register_rx_callbacks(): rx callback for port=0 queue=0, already exists
> 
> I debugged the dpdk-mp-msg thread with gdb, as far as I can tell it
> hangs an awful lot of time on rte_bpf_load() (~15 secs in my env), so
> the client times out and by the time the server tries to respond the
> client socket doesn't exist anymore.
> 
> Thoughts?
> 
> Thanks!

What is the resulting BPF code? It looks like a BPF bug with larger programs.

* Re: dumpcap: weird failure with six IPv6 hosts in the filter
  2024-06-17  7:11 dumpcap: weird failure with six IPv6 hosts in the filter Isaac Boukris
  2024-06-17 14:44 ` Stephen Hemminger
@ 2024-06-17 15:30 ` Stephen Hemminger
  2024-06-17 15:57   ` Isaac Boukris
  1 sibling, 1 reply; 10+ messages in thread
From: Stephen Hemminger @ 2024-06-17 15:30 UTC
  To: Isaac Boukris; +Cc: users

On Mon, 17 Jun 2024 10:11:47 +0300
Isaac Boukris <iboukris@gmail.com> wrote:

> Hi Stephen,
> 
> For instance, the following filter fails as follows (if I omit one host
> it works):
> -f "host 1::1 or host 1::1 or host 1::1 or host 1::1 or host 1::1 or host 1::1"
> 
> EAL: Error - exiting with code: 1
>   Cause: Packet dump enable on 0:0000:13:00.0 failed Connection timed out
> 
> On the server side I see:
> Jun 16 15:17:08: EAL: failed to send to
> (/tmp/dpdk/rte/mp_socket_262131_4a103955de0b7a) due to No such file or
> directory
> Jun 16 15:17:08: pdump_server(): failed to send to client:No such file
> or directory
> Jun 16 15:17:08: EAL: Fail to handle message: mp_pdump
> 
> Then subsequent requests fail with (even with no filter):
> pdump_register_rx_callbacks(): rx callback for port=0 queue=0, already exists
> 
> I debugged the dpdk-mp-msg thread with gdb, as far as I can tell it
> hangs an awful lot of time on rte_bpf_load() (~15 secs in my env), so
> the client times out and by the time the server tries to respond the
> client socket doesn't exist anymore.
> 
> Thoughts?

I tried testing this with the current 24.07-rc code base and do not see any problem
(I don't have the real hardware, and needed a fix to vdev to get tap to work).

# ./build/app/dpdk-dumpcap -f "host 1::1 or host 1::1 or host 1::1 or host 1::1 or host 1::1 or host 1::1"
File: /tmp/dpdk-dumpcap_0_net_tap0_20240617082758.pcapng
Capturing on 'net_tap0'
Packets captured: 0 ^C
Packets received/dropped on interface 'net_tap0': 0/0 (0.0)

* Re: dumpcap: weird failure with six IPv6 hosts in the filter
  2024-06-17 15:30 ` Stephen Hemminger
@ 2024-06-17 15:57   ` Isaac Boukris
  2024-06-17 18:32     ` Isaac Boukris
  0 siblings, 1 reply; 10+ messages in thread
From: Isaac Boukris @ 2024-06-17 15:57 UTC
  To: Stephen Hemminger; +Cc: users

On Mon, Jun 17, 2024 at 6:30 PM Stephen Hemminger
<stephen@networkplumber.org> wrote:
>
> On Mon, 17 Jun 2024 10:11:47 +0300
> Isaac Boukris <iboukris@gmail.com> wrote:
>
> > Hi Stephen,
> >
> > For instance, the following filter fails as follows (if I omit one host
> > it works):
> > -f "host 1::1 or host 1::1 or host 1::1 or host 1::1 or host 1::1 or host 1::1"
> >
> > EAL: Error - exiting with code: 1
> >   Cause: Packet dump enable on 0:0000:13:00.0 failed Connection timed out
> >
> > On the server side I see:
> > Jun 16 15:17:08: EAL: failed to send to
> > (/tmp/dpdk/rte/mp_socket_262131_4a103955de0b7a) due to No such file or
> > directory
> > Jun 16 15:17:08: pdump_server(): failed to send to client:No such file
> > or directory
> > Jun 16 15:17:08: EAL: Fail to handle message: mp_pdump
> >
> > Then subsequent requests fail with (even with no filter):
> > pdump_register_rx_callbacks(): rx callback for port=0 queue=0, already exists
> >
> > I debugged the dpdk-mp-msg thread with gdb, as far as I can tell it
> > hangs an awful lot of time on rte_bpf_load() (~15 secs in my env), so
> > the client times out and by the time the server tries to respond the
> > client socket doesn't exist anymore.
> >
> > Thoughts?
>
> I tried testing this with current 24.07-rc code base and do not see any problem
> (don't have real hardware needed to fix vdev to get tap to work).
>
> # ./build/app/dpdk-dumpcap -f "host 1::1 or host 1::1 or host 1::1 or host 1::1 or host 1::1 or host 1::1"
> File: /tmp/dpdk-dumpcap_0_net_tap0_20240617082758.pcapng
> Capturing on 'net_tap0'
> Packets captured: 0 ^C
> Packets received/dropped on interface 'net_tap0': 0/0 (0.0)

Thanks for giving it a try. I had thought it would reproduce easily, as
it does for me (perhaps it's worth trying with a host or two more).

I'll try a more recent DPDK version (I'm currently running 23.11.0), and
otherwise try to narrow it down further.

* Re: dumpcap: weird failure with six IPv6 hosts in the filter
  2024-06-17 15:57   ` Isaac Boukris
@ 2024-06-17 18:32     ` Isaac Boukris
  2024-06-17 19:37       ` Isaac Boukris
  0 siblings, 1 reply; 10+ messages in thread
From: Isaac Boukris @ 2024-06-17 18:32 UTC
  To: Stephen Hemminger; +Cc: users

On Mon, Jun 17, 2024 at 6:57 PM Isaac Boukris <iboukris@gmail.com> wrote:
>
> On Mon, Jun 17, 2024 at 6:30 PM Stephen Hemminger
> <stephen@networkplumber.org> wrote:
> >
> > On Mon, 17 Jun 2024 10:11:47 +0300
> > Isaac Boukris <iboukris@gmail.com> wrote:
> >
> > > Hi Stephen,
> > >
> > > For instance, the following filter fails as follows (if I omit one host
> > > it works):
> > > -f "host 1::1 or host 1::1 or host 1::1 or host 1::1 or host 1::1 or host 1::1"
> > >
> > > EAL: Error - exiting with code: 1
> > >   Cause: Packet dump enable on 0:0000:13:00.0 failed Connection timed out
> > >
> > > On the server side I see:
> > > Jun 16 15:17:08: EAL: failed to send to
> > > (/tmp/dpdk/rte/mp_socket_262131_4a103955de0b7a) due to No such file or
> > > directory
> > > Jun 16 15:17:08: pdump_server(): failed to send to client:No such file
> > > or directory
> > > Jun 16 15:17:08: EAL: Fail to handle message: mp_pdump
> > >
> > > Then subsequent requests fail with (even with no filter):
> > > pdump_register_rx_callbacks(): rx callback for port=0 queue=0, already exists
> > >
> > > I debugged the dpdk-mp-msg thread with gdb, as far as I can tell it
> > > hangs an awful lot of time on rte_bpf_load() (~15 secs in my env), so
> > > the client times out and by the time the server tries to respond the
> > > client socket doesn't exist anymore.
> > >
> > > Thoughts?
> >
> > I tried testing this with current 24.07-rc code base and do not see any problem
> > (don't have real hardware needed to fix vdev to get tap to work).
> >
> > # ./build/app/dpdk-dumpcap -f "host 1::1 or host 1::1 or host 1::1 or host 1::1 or host 1::1 or host 1::1"
> > File: /tmp/dpdk-dumpcap_0_net_tap0_20240617082758.pcapng
> > Capturing on 'net_tap0'
> > Packets captured: 0 ^C
> > Packets received/dropped on interface 'net_tap0': 0/0 (0.0)
>
> Thanks for giving it a try, I had thought it would easily reproduce as
> it does for me (perhaps worth trying adding a host or two).
>
> I'll try a more recent dpdk version (currently running 23.11.0) and
> try to narrow it further otherwise.

Just a quick update: I still see the issue in my environment with the
master branch (24.07.0-rc0). I'm now testing by adding the filter to
'sample_filters' in test_bpf.c and running:
time sudo build/app/dpdk-test bpf_convert_autotest

With 5 hosts it takes less than 2 seconds; with 6 it takes about 25
seconds. I'll maybe try to strace it.

* Re: dumpcap: weird failure with six IPv6 hosts in the filter
  2024-06-17 18:32     ` Isaac Boukris
@ 2024-06-17 19:37       ` Isaac Boukris
  2024-06-17 20:43         ` Isaac Boukris
  0 siblings, 1 reply; 10+ messages in thread
From: Isaac Boukris @ 2024-06-17 19:37 UTC
  To: Stephen Hemminger; +Cc: users

> Just a quick update that I still see the issue in my env with the
> master branch (24.07.0-rc0), I'm now testing by adding the filter to
> 'sample_filters' in test_bpf.c and running:
> time sudo build/app/dpdk-test bpf_convert_autotest
>
> With 5 hosts it takes less than 2 secs, with 6 it takes about 25 secs,
> i'll try to strace it maybe.

strace was useless: there are no syscalls for ~18 seconds. I'm not sure
how to debug it further, since valgrind/callgrind don't work with DPDK.

It doesn't seem to be about program size, though: I was able to produce
larger BPF code with IPv4 addresses, and it worked fine.

* Re: dumpcap: weird failure with six IPv6 hosts in the filter
  2024-06-17 19:37       ` Isaac Boukris
@ 2024-06-17 20:43         ` Isaac Boukris
  2024-06-17 21:40           ` Stephen Hemminger
  0 siblings, 1 reply; 10+ messages in thread
From: Isaac Boukris @ 2024-06-17 20:43 UTC
  To: Stephen Hemminger; +Cc: users

On Mon, Jun 17, 2024 at 10:37 PM Isaac Boukris <iboukris@gmail.com> wrote:
>
> > Just a quick update that I still see the issue in my env with the
> > master branch (24.07.0-rc0), I'm now testing by adding the filter to
> > 'sample_filters' in test_bpf.c and running:
> > time sudo build/app/dpdk-test bpf_convert_autotest
> >
> > With 5 hosts it takes less than 2 secs, with 6 it takes about 25 secs,
> > i'll try to strace it maybe.
>
> strace was useless, no syscalls for ~18 secs, not sure how to debug it
> further, valgrind / callgrind don't work on dpdk..
>
> It doesn't seem to be about the size though, I was able to produce
> larger bpf code with ipv4 addresses and it worked fine too.

I debugged a bit further with gdb. It looks like it is stuck in a while
loop in lib/bpf/bpf_validate.c:evaluate(). There is a comment saying
"make sure we evaluate each node only once", but as far as I can tell
it goes back and forth over the same indices (idx).

* Re: dumpcap: weird failure with six IPv6 hosts in the filter
  2024-06-17 20:43         ` Isaac Boukris
@ 2024-06-17 21:40           ` Stephen Hemminger
  2024-06-18 23:06             ` Konstantin Ananyev
  0 siblings, 1 reply; 10+ messages in thread
From: Stephen Hemminger @ 2024-06-17 21:40 UTC
  To: Isaac Boukris; +Cc: users, Konstantin Ananyev

On Mon, 17 Jun 2024 23:43:19 +0300
Isaac Boukris <iboukris@gmail.com> wrote:

> On Mon, Jun 17, 2024 at 10:37 PM Isaac Boukris <iboukris@gmail.com> wrote:
> >  
> > > Just a quick update that I still see the issue in my env with the
> > > master branch (24.07.0-rc0), I'm now testing by adding the filter to
> > > 'sample_filters' in test_bpf.c and running:
> > > time sudo build/app/dpdk-test bpf_convert_autotest
> > >
> > > With 5 hosts it takes less than 2 secs, with 6 it takes about 25 secs,
> > > i'll try to strace it maybe.  
> >
> > strace was useless, no syscalls for ~18 secs, not sure how to debug it
> > further, valgrind / callgrind don't work on dpdk..
> >
> > It doesn't seem to be about the size though, I was able to produce
> > larger bpf code with ipv4 addresses and it worked fine too.  
> 
> Debugged a bit further with gdb, it looks like it is stuck in a while
> loop in lib/bpf/bpf_validate.c:evaluate(), there is a comment saying
> "make sure we evaluate each node only once" but it seem to go back and
> forth on the same idx's afaict.

No idea; only the original author understands the verifier.
Having our own unique verifier may not be a good idea.
There are some other userspace BPF projects; this seems like a good
place for convergence.


https://lpc.events/event/17/contributions/1639/attachments/1280/2585/userspace-ebpf-bpftime-lpc.pdf

* Re: dumpcap: weird failure with six IPv6 hosts in the filter
  2024-06-17 21:40           ` Stephen Hemminger
@ 2024-06-18 23:06             ` Konstantin Ananyev
  2024-06-19  8:45               ` Isaac Boukris
  0 siblings, 1 reply; 10+ messages in thread
From: Konstantin Ananyev @ 2024-06-18 23:06 UTC
  To: Stephen Hemminger, Isaac Boukris; +Cc: users

17.06.2024 22:40, Stephen Hemminger wrote:
> On Mon, 17 Jun 2024 23:43:19 +0300
> Isaac Boukris <iboukris@gmail.com> wrote:
> 
>> On Mon, Jun 17, 2024 at 10:37 PM Isaac Boukris <iboukris@gmail.com> wrote:
>>>   
>>>> Just a quick update that I still see the issue in my env with the
>>>> master branch (24.07.0-rc0), I'm now testing by adding the filter to
>>>> 'sample_filters' in test_bpf.c and running:
>>>> time sudo build/app/dpdk-test bpf_convert_autotest
>>>>
>>>> With 5 hosts it takes less than 2 secs, with 6 it takes about 25 secs,
>>>> i'll try to strace it maybe.
>>>
>>> strace was useless, no syscalls for ~18 secs, not sure how to debug it
>>> further, valgrind / callgrind don't work on dpdk..
>>>
>>> It doesn't seem to be about the size though, I was able to produce
>>> larger bpf code with ipv4 addresses and it worked fine too.
>>
>> Debugged a bit further with gdb, it looks like it is stuck in a while
>> loop in lib/bpf/bpf_validate.c:evaluate(), there is a comment saying
>> "make sure we evaluate each node only once" but it seem to go back and
>> forth on the same idx's afaict.
> 
> No idea, only original author understands the verifier.
> Having our own unique verifier may not be a good idea.
> There some other userspace BPF projects, seems like a good place for
> convergence.
> 
> 
> https://lpc.events/event/17/contributions/1639/attachments/1280/2585/userspace-ebpf-bpftime-lpc.pdf

Hi Isaac,
please create a bug report in the DPDK Bugzilla, ideally with a clear
and simple way to reproduce the bug you are facing in the description.
I'll try to have a look when I have some free time.
Thanks
Konstantin

* Re: dumpcap: weird failure with six IPv6 hosts in the filter
  2024-06-18 23:06             ` Konstantin Ananyev
@ 2024-06-19  8:45               ` Isaac Boukris
  0 siblings, 0 replies; 10+ messages in thread
From: Isaac Boukris @ 2024-06-19  8:45 UTC
  To: Konstantin Ananyev; +Cc: Stephen Hemminger, users

Hi Konstantin,

On Wed, Jun 19, 2024 at 2:06 AM Konstantin Ananyev
<konstantin.v.ananyev@yandex.ru> wrote:
>
> 17.06.2024 22:40, Stephen Hemminger wrote:
> > On Mon, 17 Jun 2024 23:43:19 +0300
> > Isaac Boukris <iboukris@gmail.com> wrote:
> >
> >> On Mon, Jun 17, 2024 at 10:37 PM Isaac Boukris <iboukris@gmail.com> wrote:
> >>>
> >>>> Just a quick update that I still see the issue in my env with the
> >>>> master branch (24.07.0-rc0), I'm now testing by adding the filter to
> >>>> 'sample_filters' in test_bpf.c and running:
> >>>> time sudo build/app/dpdk-test bpf_convert_autotest
> >>>>
> >>>> With 5 hosts it takes less than 2 secs, with 6 it takes about 25 secs,
> >>>> i'll try to strace it maybe.
> >>>
> >>> strace was useless, no syscalls for ~18 secs, not sure how to debug it
> >>> further, valgrind / callgrind don't work on dpdk..
> >>>
> >>> It doesn't seem to be about the size though, I was able to produce
> >>> larger bpf code with ipv4 addresses and it worked fine too.
> >>
> >> Debugged a bit further with gdb, it looks like it is stuck in a while
> >> loop in lib/bpf/bpf_validate.c:evaluate(), there is a comment saying
> >> "make sure we evaluate each node only once" but it seem to go back and
> >> forth on the same idx's afaict.
> >
> > No idea, only original author understands the verifier.
> > Having our own unique verifier may not be a good idea.
> > There some other userspace BPF projects, seems like a good place for
> > convergence.
> >
> >
> > https://lpc.events/event/17/contributions/1639/attachments/1280/2585/userspace-ebpf-bpftime-lpc.pdf
>
> hi Isaac,
> please create a bug report in DPDK bugzilla.
> Ideally with a clear and simple way to reproduce
> the bug you are facing in the description.
> I'll try to have a look when I'll have some free time.
> Thanks
> Konstantin

Done:

https://bugs.dpdk.org/show_bug.cgi?id=1465

Thanks a lot!

end of thread [~2024-06-19  8:46 UTC]

Thread overview: 10+ messages
2024-06-17  7:11 dumpcap: weird failure with six IPv6 hosts in the filter Isaac Boukris
2024-06-17 14:44 ` Stephen Hemminger
2024-06-17 15:30 ` Stephen Hemminger
2024-06-17 15:57   ` Isaac Boukris
2024-06-17 18:32     ` Isaac Boukris
2024-06-17 19:37       ` Isaac Boukris
2024-06-17 20:43         ` Isaac Boukris
2024-06-17 21:40           ` Stephen Hemminger
2024-06-18 23:06             ` Konstantin Ananyev
2024-06-19  8:45               ` Isaac Boukris