DPDK usage discussions
* Re: [dpdk-users] DPDK on Mellanox BlueField Ref Platform
@ 2019-09-24 16:31 Jim Vaigl
  2019-09-24 17:18 ` Stephen Hemminger
  0 siblings, 1 reply; 12+ messages in thread
From: Jim Vaigl @ 2019-09-24 16:31 UTC (permalink / raw)
  To: users

Since no one has chimed in with any build/install/configure suggestion for the
BlueField, I've spent some time debugging and thought I'd share the results.
Building the l3fwd example application and running it as the docs suggest, when
I try to send it UDP packets from another machine, it dumps core.

Debugging a bit with gdb and printf, I can see that from inside process_packet()
and processx4_step1(), the calls to rte_pktmbuf_mtod() return NULL or suspicious
pointer values (e.g. 0x80).  The sample apps don't guard against NULL pointers
being returned from this rte call, so that's why it's dumping core.
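
For illustration, here's a minimal sketch of the kind of guard I mean.  It is
my own sketch, not code from the sample apps; it assumes a DPDK release with
the rte_-prefixed header structs (19.08 or later), and checked_ipv4_hdr() is
just a made-up helper name:

    #include <stdint.h>
    #include <rte_mbuf.h>
    #include <rte_ether.h>
    #include <rte_ip.h>

    /* Illustrative guard only: validate the mbuf and the frame length
     * before trusting the pointer that rte_pktmbuf_mtod() hands back. */
    static inline struct rte_ipv4_hdr *
    checked_ipv4_hdr(struct rte_mbuf *m)
    {
        struct rte_ether_hdr *eth;

        if (m == NULL || rte_pktmbuf_data_len(m) <
                sizeof(struct rte_ether_hdr) + sizeof(struct rte_ipv4_hdr))
            return NULL;                  /* runt or corrupt mbuf */

        eth = rte_pktmbuf_mtod(m, struct rte_ether_hdr *);
        if (eth == NULL || (uintptr_t)eth < 0x1000)
            return NULL;                  /* the bogus pointers I'm seeing */

        return (struct rte_ipv4_hdr *)(eth + 1);
    }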

I still think the problem is related to the driver config, but thought this
might ring a bell for anyone who's had problems like this.

The thing that still bothers me is that rather than seeing what I was expecting
at init, based on what the documentation shows:
    [...]
    EAL: probe driver: 15b3:1013 librte_pmd_mlx5

... when rte_eal_init() runs, I'm seeing:
    [...]
    EAL: Selected IOVA mode 'PA'
    EAL: Probing VFIO support...

This still seems wrong, and I've verified that specifying the BlueField target
ID string in the make is causing "CONFIG_RTE_LIBRTE_MLX5_PMD=y" to appear in
the .config.

Regards,
--Jim Vaigl
614 886 5999



^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [dpdk-users] DPDK on Mellanox BlueField Ref Platform
  2019-09-24 16:31 [dpdk-users] DPDK on Mellanox BlueField Ref Platform Jim Vaigl
@ 2019-09-24 17:18 ` Stephen Hemminger
  2019-09-24 19:10   ` Jim Vaigl
  0 siblings, 1 reply; 12+ messages in thread
From: Stephen Hemminger @ 2019-09-24 17:18 UTC (permalink / raw)
  To: Jim Vaigl; +Cc: users

On Tue, 24 Sep 2019 12:31:51 -0400
"Jim Vaigl" <jimv@rockbridgesoftware.com> wrote:

> Since no one has chimed in with any build/install/configure suggestion for
> the
> BlueField, I've spent some time debugging and thought I'd share the results.
> Building the l3fwd example application and running it as the docs suggest,
> when
> I try to send it UDP packets from another machine, it dumps core.
> 
> Debugging a bit with gdb and printf, I can see that from inside
> process_packet()
> and processx4_step1() the calls to rte_pktmbuf_mtod() return Nil or
> suspicious
> pointer values (i.e. 0x80).  The sample apps don't guard against NULL
> pointers
> being returned from this rte call, so that's why it's dumping core.
> 
> I still think the problem is related to the driver config, but thought this
> might ring a bell for anyone who's had problems like this.
> 
> The thing that still bothers me is that rather than seeing what I was
> expecting
> at init based on what the documentation shows:
>     [...]
>     EAL: probe driver: 15b3:1013 librte_pmd_mlx5
> 
> ... when rte_eal_init() runs, I'm seeing:
>     [...]
>     EAL:  Selected IOVA mode 'PA'
>     EAL:  Probing VFIO support...
> 
> This still seems wrong, and I've verified that specifying the BlueField
> target ID
> string in the make is causing "CONFIG_RTE_LIBRTE_MLX5_PMD=y" to appear in
> the .config.
> 
> Regards,
> --Jim Vaigl
> 614 886 5999
> 
> 

Make sure you have the latest version of rdma-core installed (v25).
The right version is not in most distros.

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [dpdk-users] DPDK on Mellanox BlueField Ref Platform
  2019-09-24 17:18 ` Stephen Hemminger
@ 2019-09-24 19:10   ` Jim Vaigl
  2019-09-26 10:59     ` Asaf Penso
  0 siblings, 1 reply; 12+ messages in thread
From: Jim Vaigl @ 2019-09-24 19:10 UTC (permalink / raw)
  To: 'Stephen Hemminger'; +Cc: users

On Tue, 24 Sep 2019 12:31:51 -0400
"Jim Vaigl" <jimv@rockbridgesoftware.com> wrote:

>> Since no one has chimed in with any build/install/configure suggestion
for
>> the
>> BlueField, I've spent some time debugging and thought I'd share the
results.
>> Building the l3fwd example application and running it as the docs
suggest,
>> when
>> I try to send it UDP packets from another machine, it dumps core.
>> 
>> Debugging a bit with gdb and printf, I can see that from inside
>> process_packet()
>> and processx4_step1() the calls to rte_pktmbuf_mtod() return Nil or
>> suspicious
>> pointer values (i.e. 0x80).  The sample apps don't guard against NULL
>> pointers
>> being returned from this rte call, so that's why it's dumping core.
>> 
>> I still think the problem is related to the driver config, but thought
this
>> might ring a bell for anyone who's had problems like this.
>> 
>> The thing that still bothers me is that rather than seeing what I was
>> expecting
>> at init based on what the documentation shows:
>>     [...]
>>     EAL: probe driver: 15b3:1013 librte_pmd_mlx5
>> 
>> ... when rte_eal_init() runs, I'm seeing:
>>     [...]
>>     EAL:  Selected IOVA mode 'PA'
>>     EAL:  Probing VFIO support...
>> 
>> This still seems wrong, and I've verified that specifying the BlueField
>> target ID
>> string in the make is causing "CONFIG_RTE_LIBRTE_MLX5_PMD=y" to appear in
>> the .config.
>> 
>> Regards,
>> --Jim Vaigl
>> 614 886 5999
>> 
>> 
>
>From: Stephen Hemminger [mailto:stephen@networkplumber.org] 
>Sent: Tuesday, September 24, 2019 1:18 PM
>To: Jim Vaigl
>Cc: users@dpdk.org
>
>Subject: Re: [dpdk-users] DPDK on Mellanox BlueField Ref Platform
>make sure you have latest version of rdma-core installed (v25).
>The right version is not in most distros

Great suggestion.  I'm using the rdma-core from the MLNX_OFED 4.6-3.5.8.0
install.  I can't figure out how to tell what version that includes, even
looking at the source, since there's no version information in the source
files, but I went to GitHub and downloaded rdma-core v24 and v25 and neither
diffs cleanly with the source RPM that comes in the OFED install.  I don't
know yet whether that's because this is some different version or because
Mellanox has made their own tweaks.

I would hope that the very latest OFED from Mellanox would include an
up-to-date and working set of libs/modules, but maybe you're on to
something.  It sounds like a risky move, but maybe I'll try just
installing rdma-core from GitHub on top of the OFED install.  I have a
fear that I'll end up with inconsistent versions, but it's worth a try.

Thanks,
--Jim




^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [dpdk-users] DPDK on Mellanox BlueField Ref Platform
  2019-09-24 19:10   ` Jim Vaigl
@ 2019-09-26 10:59     ` Asaf Penso
  2019-09-26 19:46       ` Jim Vaigl
  2019-10-04 17:35       ` Jim Vaigl
  0 siblings, 2 replies; 12+ messages in thread
From: Asaf Penso @ 2019-09-26 10:59 UTC (permalink / raw)
  To: Jim Vaigl, 'Stephen Hemminger'
  Cc: users, Kiran Vedere, Erez Ferber, Olga Shern

Hello Jim,

Thanks for your mail.
In order for us to reach a better resolution, please send a mail to our support team - support@mellanox.com.
Please provide as much info about the setup, configuration, etc. as you can.

In parallel, I added Erez Ferber here to assist.

Regards,
Asaf Penso

> -----Original Message-----
> From: users <users-bounces@dpdk.org> On Behalf Of Jim Vaigl
> Sent: Tuesday, September 24, 2019 10:11 PM
> To: 'Stephen Hemminger' <stephen@networkplumber.org>
> Cc: users@dpdk.org
> Subject: Re: [dpdk-users] DPDK on Mellanox BlueField Ref Platform
> 
> On Tue, 24 Sep 2019 12:31:51 -0400
> "Jim Vaigl" <jimv@rockbridgesoftware.com> wrote:
> 
> >> Since no one has chimed in with any build/install/configure suggestion
> for
> >> the
> >> BlueField, I've spent some time debugging and thought I'd share the
> results.
> >> Building the l3fwd example application and running it as the docs
> suggest,
> >> when
> >> I try to send it UDP packets from another machine, it dumps core.
> >>
> >> Debugging a bit with gdb and printf, I can see that from inside
> >> process_packet()
> >> and processx4_step1() the calls to rte_pktmbuf_mtod() return Nil or
> >> suspicious
> >> pointer values (i.e. 0x80).  The sample apps don't guard against NULL
> >> pointers
> >> being returned from this rte call, so that's why it's dumping core.
> >>
> >> I still think the problem is related to the driver config, but thought
> this
> >> might ring a bell for anyone who's had problems like this.
> >>
> >> The thing that still bothers me is that rather than seeing what I was
> >> expecting
> >> at init based on what the documentation shows:
> >>     [...]
> >>     EAL: probe driver: 15b3:1013 librte_pmd_mlx5
> >>
> >> ... when rte_eal_init() runs, I'm seeing:
> >>     [...]
> >>     EAL:  Selected IOVA mode 'PA'
> >>     EAL:  Probing VFIO support...
> >>
> >> This still seems wrong, and I've verified that specifying the BlueField
> >> target ID
> >> string in the make is causing "CONFIG_RTE_LIBRTE_MLX5_PMD=y" to
> appear in
> >> the .config.
> >>
> >> Regards,
> >> --Jim Vaigl
> >> 614 886 5999
> >>
> >>
> >
> >From: Stephen Hemminger [mailto:stephen@networkplumber.org]
> >Sent: Tuesday, September 24, 2019 1:18 PM
> >To: Jim Vaigl
> >Cc: users@dpdk.org
> >
> >Subject: Re: [dpdk-users] DPDK on Mellanox BlueField Ref Platform
> >make sure you have latest version of rdma-core installed (v25).
> >The right version is not in most distros
> 
> Great suggestion.  I'm using the rdma-core from the MLNX_OFED 4.6-3.5.8.0
> install.  I can't figure out how to tell what version that thing includes,
> even looking at the source, since there's no version information in the
> source files, BUT I went to github and downloaded rdma-core v24 and v25
> and neither diff cleanly with the source RPM that comes in the OFED
> install.  I don't know yet if it's because this is some different version
> or if it's because Mellanox has made their own tweaks.
> 
> I would hope that the very latest OFED from Mellanox would include an
> up-to-date and working set of libs/modules, but maybe you're on to
> something.  It sounds like a risky move, but maybe I'll try just
> installing rdma-core from github over top of the OFED install.  I have a
> fear that I'll end up with inconsistent versions, but it's worth a try.
> 
> Thanks,
> --Jim
> 
> 


^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [dpdk-users] DPDK on Mellanox BlueField Ref Platform
  2019-09-26 10:59     ` Asaf Penso
@ 2019-09-26 19:46       ` Jim Vaigl
  2019-10-04 17:35       ` Jim Vaigl
  1 sibling, 0 replies; 12+ messages in thread
From: Jim Vaigl @ 2019-09-26 19:46 UTC (permalink / raw)
  To: 'Asaf Penso', 'Stephen Hemminger'
  Cc: users, 'Kiran Vedere', 'Erez Ferber',
	'Olga Shern'

> From: Asaf Penso [mailto:asafp@mellanox.com] 
> Sent: Thursday, September 26, 2019 7:00 AM
> To: Jim Vaigl; 'Stephen Hemminger'
> Cc: users@dpdk.org; Kiran Vedere; Erez Ferber; Olga Shern
> Subject: RE: [dpdk-users] DPDK on Mellanox BlueField Ref Platform
>
> Hello Jim,
>
> Thanks for your mail.
> In order  for us to have a better resolution please send a mail to our
support team > - support@mellanox.com
> Please provide as much info about the setup, configuration etc as you can.
>
> In parallel, I added Erez Ferber here to assist.
>
> Regards,
> Asaf Penso

Thanks for the kind offer, Asaf.  I'll take this debug effort
off-line with you and Erez and post back to the list here later
with any resolution so everyone can see the result.

By the way, the prior suggestion of using v25 of rdma-core
didn't pan out:  the current build script just makes a local
build in a subdirectory off the source tree, and there's no
obvious way to integrate it with the MLNX_OFED environment
and the DPDK install.  After resolving package dependencies
to get rdma-core to build from the GitHub repo, I realized
the instructions say this:

  ---
  Building
  This project uses a cmake based build system. Quick start:

  $ bash build.sh
  build/bin will contain the sample programs and build/lib
  will contain the shared libraries. The build is configured
  to run all the programs 'in-place' and cannot be installed.

  NOTE: It is not currently easy to run from the build
  directory, the plugins only load from the system path.
  ---

--Jim

>> -----Original Message-----
>> From: users <users-bounces@dpdk.org> On Behalf Of Jim Vaigl
>> Sent: Tuesday, September 24, 2019 10:11 PM
>> To: 'Stephen Hemminger' <stephen@networkplumber.org>
>> Cc: users@dpdk.org
>> Subject: Re: [dpdk-users] DPDK on Mellanox BlueField Ref Platform
>> 
>> On Tue, 24 Sep 2019 12:31:51 -0400
>> "Jim Vaigl" <jimv@rockbridgesoftware.com> wrote:
>> 
>>>> Since no one has chimed in with any build/install/configure suggestion
>> for
>> >> the
>> >> BlueField, I've spent some time debugging and thought I'd share the
>> results.
>> >> Building the l3fwd example application and running it as the docs
>> suggest,
>> >> when
>> >> I try to send it UDP packets from another machine, it dumps core.
>> >>
>> >> Debugging a bit with gdb and printf, I can see that from inside
>> >> process_packet()
>> >> and processx4_step1() the calls to rte_pktmbuf_mtod() return Nil or
>> >> suspicious
>> >> pointer values (i.e. 0x80).  The sample apps don't guard against NULL
>> >> pointers
>> >> being returned from this rte call, so that's why it's dumping core.
>> >>
>> >> I still think the problem is related to the driver config, but thought
>> this
>> >> might ring a bell for anyone who's had problems like this.
>> >>
>> >> The thing that still bothers me is that rather than seeing what I was
>> >> expecting
>> >> at init based on what the documentation shows:
>> >>     [...]
>> >>     EAL: probe driver: 15b3:1013 librte_pmd_mlx5
>> >>
>> >> ... when rte_eal_init() runs, I'm seeing:
>> >>     [...]
>> >>     EAL:  Selected IOVA mode 'PA'
>> >>     EAL:  Probing VFIO support...
>> >>
>> >> This still seems wrong, and I've verified that specifying the
BlueField
>> >> target ID
>> >> string in the make is causing "CONFIG_RTE_LIBRTE_MLX5_PMD=y" to
>> appear in
>> >> the .config.
>> >>
>> >> Regards,
>> >> --Jim Vaigl
>> >> 614 886 5999
>> >>
>> >>
>> >
>> >From: Stephen Hemminger [mailto:stephen@networkplumber.org]
>> >Sent: Tuesday, September 24, 2019 1:18 PM
>> >To: Jim Vaigl
>> >Cc: users@dpdk.org
>> >
>> >Subject: Re: [dpdk-users] DPDK on Mellanox BlueField Ref Platform
>> >make sure you have latest version of rdma-core installed (v25).
>> >The right version is not in most distros
>> 
>> Great suggestion.  I'm using the rdma-core from the MLNX_OFED 4.6-3.5.8.0
>> install.  I can't figure out how to tell what version that thing
includes,
>> even looking at the source, since there's no version information in the
>> source files, BUT I went to github and downloaded rdma-core v24 and v25
>> and neither diff cleanly with the source RPM that comes in the OFED
>> install.  I don't know yet if it's because this is some different version
>> or if it's because Mellanox has made their own tweaks.
>> 
>> I would hope that the very latest OFED from Mellanox would include an
>> up-to-date and working set of libs/modules, but maybe you're on to
>> something.  It sounds like a risky move, but maybe I'll try just
>> installing rdma-core from github over top of the OFED install.  I have a
>> fear that I'll end up with inconsistent versions, but it's worth a try.
>> 
>> Thanks,
>> --Jim
 


^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [dpdk-users] DPDK on Mellanox BlueField Ref Platform
  2019-09-26 10:59     ` Asaf Penso
  2019-09-26 19:46       ` Jim Vaigl
@ 2019-10-04 17:35       ` Jim Vaigl
  2019-10-04 18:28         ` Kiran Vedere
  1 sibling, 1 reply; 12+ messages in thread
From: Jim Vaigl @ 2019-10-04 17:35 UTC (permalink / raw)
  To: 'Asaf Penso', 'Stephen Hemminger'
  Cc: users, 'Kiran Vedere', 'Erez Ferber',
	'Olga Shern', 'Dan Vogel'

A final update on this issue.  Kiran Vedere went above and beyond the
call of duty: he completely reproduced my hardware setup, showed that it
worked using TRex to generate traffic similar to mine, and then provided
me with a bundled-up .bfb of his CentOS (with an updated kernel) and OFED
install to try, so that there would be no configuration for me to
mess up.

Using this, I saw exactly the same crashes I had seen in my setup.
After some thought, I realized the only meaningful difference was that
my traffic generator and IP configuration relied on an MTU size of 9000.
Once I set the MTU size down to 1500, the crashes stopped.

So, the answer is clearly that I'm just not setting up for the larger
MTU size.  I need to start to understand how to get DPDK to manage
that, but the crashing is at least understood now, and I have a way
forward.

Thanks very much to Kiran.

Regards,
--Jim

-----Original Message-----
From: Jim Vaigl [mailto:jimv@rockbridgesoftware.com] 
Sent: Thursday, September 26, 2019 3:47 PM
To: 'Asaf Penso'; 'Stephen Hemminger'
Cc: 'users@dpdk.org'; 'Kiran Vedere'; 'Erez Ferber'; 'Olga Shern'
Subject: RE: [dpdk-users] DPDK on Mellanox BlueField Ref Platform

> From: Asaf Penso [mailto:asafp@mellanox.com] 
> Sent: Thursday, September 26, 2019 7:00 AM
> To: Jim Vaigl; 'Stephen Hemminger'
> Cc: users@dpdk.org; Kiran Vedere; Erez Ferber; Olga Shern
> Subject: RE: [dpdk-users] DPDK on Mellanox BlueField Ref Platform
>
> Hello Jim,
>
> Thanks for your mail.
> In order  for us to have a better resolution please send a mail to our
support team > - support@mellanox.com
> Please provide as much info about the setup, configuration etc as you can.
>
> In parallel, I added Erez Ferber here to assist.
>
> Regards,
> Asaf Penso

Thanks for the kind offer, Asaf.  I'll take this debug effort
off-line with you and Erez and post back to the list here later
with any resolution so everyone can see the result.

By the way, the prior suggestion of using v. 25 of rdma-core
didn't pan out:  the current build script just makes a local
build in a subdirectory off the source tree and there's no
obvious way to integrate it with the MLNX_OFED environment
and the dpdk install.  After resolving package dependencies
to get rdma-core to build from the GitHub repo, I realized
the instructions say this:

  ---
  Building
  This project uses a cmake based build system. Quick start:

  $ bash build.sh
  build/bin will contain the sample programs and build/lib
  will contain the shared libraries. The build is configured
  to run all the programs 'in-place' and cannot be installed.

  NOTE: It is not currently easy to run from the build
  directory, the plugins only load from the system path.
  ---

--Jim

>> -----Original Message-----
>> From: users <users-bounces@dpdk.org> On Behalf Of Jim Vaigl
>> Sent: Tuesday, September 24, 2019 10:11 PM
>> To: 'Stephen Hemminger' <stephen@networkplumber.org>
>> Cc: users@dpdk.org
>> Subject: Re: [dpdk-users] DPDK on Mellanox BlueField Ref Platform
>> 
>> On Tue, 24 Sep 2019 12:31:51 -0400
>> "Jim Vaigl" <jimv@rockbridgesoftware.com> wrote:
>> 
>>>> Since no one has chimed in with any build/install/configure suggestion
>> for
>> >> the
>> >> BlueField, I've spent some time debugging and thought I'd share the
>> results.
>> >> Building the l3fwd example application and running it as the docs
>> suggest,
>> >> when
>> >> I try to send it UDP packets from another machine, it dumps core.
>> >>
>> >> Debugging a bit with gdb and printf, I can see that from inside
>> >> process_packet()
>> >> and processx4_step1() the calls to rte_pktmbuf_mtod() return Nil or
>> >> suspicious
>> >> pointer values (i.e. 0x80).  The sample apps don't guard against NULL
>> >> pointers
>> >> being returned from this rte call, so that's why it's dumping core.
>> >>
>> >> I still think the problem is related to the driver config, but thought
>> this
>> >> might ring a bell for anyone who's had problems like this.
>> >>
>> >> The thing that still bothers me is that rather than seeing what I was
>> >> expecting
>> >> at init based on what the documentation shows:
>> >>     [...]
>> >>     EAL: probe driver: 15b3:1013 librte_pmd_mlx5
>> >>
>> >> ... when rte_eal_init() runs, I'm seeing:
>> >>     [...]
>> >>     EAL:  Selected IOVA mode 'PA'
>> >>     EAL:  Probing VFIO support...
>> >>
>> >> This still seems wrong, and I've verified that specifying the
BlueField
>> >> target ID
>> >> string in the make is causing "CONFIG_RTE_LIBRTE_MLX5_PMD=y" to
>> appear in
>> >> the .config.
>> >>
>> >> Regards,
>> >> --Jim Vaigl
>> >> 614 886 5999
>> >>
>> >>
>> >
>> >From: Stephen Hemminger [mailto:stephen@networkplumber.org]
>> >Sent: Tuesday, September 24, 2019 1:18 PM
>> >To: Jim Vaigl
>> >Cc: users@dpdk.org
>> >
>> >Subject: Re: [dpdk-users] DPDK on Mellanox BlueField Ref Platform
>> >make sure you have latest version of rdma-core installed (v25).
>> >The right version is not in most distros
>> 
>> Great suggestion.  I'm using the rdma-core from the MLNX_OFED 4.6-3.5.8.0
>> install.  I can't figure out how to tell what version that thing
includes,
>> even looking at the source, since there's no version information in the
>> source files, BUT I went to github and downloaded rdma-core v24 and v25
>> and neither diff cleanly with the source RPM that comes in the OFED
>> install.  I don't know yet if it's because this is some different version
>> or if it's because Mellanox has made their own tweaks.
>> 
>> I would hope that the very latest OFED from Mellanox would include an
>> up-to-date and working set of libs/modules, but maybe you're on to
>> something.  It sounds like a risky move, but maybe I'll try just
>> installing rdma-core from github over top of the OFED install.  I have a
>> fear that I'll end up with inconsistent versions, but it's worth a try.
>> 
>> Thanks,
>> --Jim
 


^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [dpdk-users] DPDK on Mellanox BlueField Ref Platform
  2019-10-04 17:35       ` Jim Vaigl
@ 2019-10-04 18:28         ` Kiran Vedere
  2019-10-07 16:52           ` Jim Vaigl
  0 siblings, 1 reply; 12+ messages in thread
From: Kiran Vedere @ 2019-10-04 18:28 UTC (permalink / raw)
  To: Jim Vaigl, Asaf Penso, 'Stephen Hemminger'
  Cc: users, Erez Ferber, Olga Shern, Danny Vogel

Hi Jim,

I tried your test with a 9000-byte MTU. On the BlueField Reference Platform I set the MTU of the interface to 9000, and from TRex I am sending 8096-byte packets. I am able to loop packets back fine without any issues. Below is the command line I use for testpmd:

./testpmd --log-level="mlx5,8" -l 3,4,5,6,7,8,9,10,11,12,13,14,15 -n 4 -w 17:00.0 --socket-mem=2048 -- --socket-num=0 --burst=64 --txd=2048 --rxd=2048 --mbcache=512 --rxq=12 --txq=12 --nb-cores=12 -i -a --forward-mode=mac --max-pkt-len=9000 --mbuf-size=16384

Two things to consider: the max Rx packet length is used by the PMD during its Rx queue initialization. By default this is set to 1518 bytes for testpmd/l3fwd. For jumbo frames you need to pass --max-pkt-len=9000 (for testpmd) or --enable-jumbo --max-pkt-len=9000 (for l3fwd). Are you passing these values to l3fwd/testpmd when you run your test? Also, since the mbuf size is 2048 bytes by default, you need to increase it to more than the jumbo frame size unless you enable scatter in the PMD. For testpmd you can increase the mbuf size with the --mbuf-size parameter. For l3fwd I don't think there is a command line option to change the mbuf size at runtime, so you might need to recompile the l3fwd code to increase it (see the sketch below). Are you doing this?
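
Roughly, the l3fwd change would look something like this. This is a sketch
only, against the rte_pktmbuf_pool_create() call in init_mem() in
examples/l3fwd/main.c; the surrounding variable names and pool-array indexing
vary a little between DPDK releases, and JUMBO_MBUF_DATA_ROOM is just an
illustrative name, not stock l3fwd code:

    /* Sketch only: give each mbuf a data room large enough for a whole
     * jumbo frame plus headroom, instead of the default
     * RTE_MBUF_DEFAULT_BUF_SIZE (sized for ~1500-byte frames). */
    #define JUMBO_MBUF_DATA_ROOM (9216 + RTE_PKTMBUF_HEADROOM)

    /* in l3fwd's init_mem(), where the per-socket pool is created: */
    pktmbuf_pool[socketid] = rte_pktmbuf_pool_create(s, nb_mbuf,
            MEMPOOL_CACHE_SIZE, 0, JUMBO_MBUF_DATA_ROOM, socketid);
    if (pktmbuf_pool[socketid] == NULL)
        rte_exit(EXIT_FAILURE, "Cannot init mbuf pool on socket %d\n",
            socketid);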

Hope this helps.

Regards,
Kiran



-----Original Message-----
From: Jim Vaigl <jimv@rockbridgesoftware.com> 
Sent: Friday, October 4, 2019 1:35 PM
To: Asaf Penso <asafp@mellanox.com>; 'Stephen Hemminger' <stephen@networkplumber.org>
Cc: users@dpdk.org; Kiran Vedere <kiranv@mellanox.com>; Erez Ferber <erezf@mellanox.com>; Olga Shern <olgas@mellanox.com>; Danny Vogel <dan@mellanoxfederal.com>
Subject: RE: [dpdk-users] DPDK on Mellanox BlueField Ref Platform

A final update on this issue.  Kiran Vedere went above and beyond the call of duty: he completely reproduced my hardware setup, showed that it worked using trex to generate similar traffic to mine, and then provided me with a bundled-up .bfb of his CentOS (with updated kernel) and OFED install to try so that there would be no configuration stuff for me to mess up.

Using this, I saw exactly the same crashes I had seen in my setup.
After some thought, I realized the only meaningful difference was that my traffic generator and IP configuration relied on an MTU size of 9000.
Once I set the MTU size down to 1500, the crashes stopped.

So, the answer is clearly that I'm just not setting up for the larger MTU size.  I need to start to understand how to get DPDK to manage that, but the crashing is at least understood now, and I have a way forward.

Thanks very much to Kiran.

Regards,
--Jim

-----Original Message-----
From: Jim Vaigl [mailto:jimv@rockbridgesoftware.com]
Sent: Thursday, September 26, 2019 3:47 PM
To: 'Asaf Penso'; 'Stephen Hemminger'
Cc: 'users@dpdk.org'; 'Kiran Vedere'; 'Erez Ferber'; 'Olga Shern'
Subject: RE: [dpdk-users] DPDK on Mellanox BlueField Ref Platform

> From: Asaf Penso [mailto:asafp@mellanox.com]
> Sent: Thursday, September 26, 2019 7:00 AM
> To: Jim Vaigl; 'Stephen Hemminger'
> Cc: users@dpdk.org; Kiran Vedere; Erez Ferber; Olga Shern
> Subject: RE: [dpdk-users] DPDK on Mellanox BlueField Ref Platform
>
> Hello Jim,
>
> Thanks for your mail.
> In order  for us to have a better resolution please send a mail to our
support team > - support@mellanox.com
> Please provide as much info about the setup, configuration etc as you can.
>
> In parallel, I added Erez Ferber here to assist.
>
> Regards,
> Asaf Penso

Thanks for the kind offer, Asaf.  I'll take this debug effort off-line with you and Erez and post back to the list here later with any resolution so everyone can see the result.

By the way, the prior suggestion of using v. 25 of rdma-core didn't pan out:  the current build script just makes a local build in a subdirectory off the source tree and there's no obvious way to integrate it with the MLNX_OFED environment and the dpdk install.  After resolving package dependencies to get rdma-core to build from the GitHub repo, I realized the instructions say this:

  ---
  Building
  This project uses a cmake based build system. Quick start:

  $ bash build.sh
  build/bin will contain the sample programs and build/lib
  will contain the shared libraries. The build is configured
  to run all the programs 'in-place' and cannot be installed.

  NOTE: It is not currently easy to run from the build
  directory, the plugins only load from the system path.
  ---

--Jim

>> -----Original Message-----
>> From: users <users-bounces@dpdk.org> On Behalf Of Jim Vaigl
>> Sent: Tuesday, September 24, 2019 10:11 PM
>> To: 'Stephen Hemminger' <stephen@networkplumber.org>
>> Cc: users@dpdk.org
>> Subject: Re: [dpdk-users] DPDK on Mellanox BlueField Ref Platform
>> 
>> On Tue, 24 Sep 2019 12:31:51 -0400
>> "Jim Vaigl" <jimv@rockbridgesoftware.com> wrote:
>> 
>>>> Since no one has chimed in with any build/install/configure 
>>>> suggestion
>> for
>> >> the
>> >> BlueField, I've spent some time debugging and thought I'd share 
>> >> the
>> results.
>> >> Building the l3fwd example application and running it as the docs
>> suggest,
>> >> when
>> >> I try to send it UDP packets from another machine, it dumps core.
>> >>
>> >> Debugging a bit with gdb and printf, I can see that from inside
>> >> process_packet()
>> >> and processx4_step1() the calls to rte_pktmbuf_mtod() return Nil 
>> >> or suspicious pointer values (i.e. 0x80).  The sample apps don't 
>> >> guard against NULL pointers being returned from this rte call, so 
>> >> that's why it's dumping core.
>> >>
>> >> I still think the problem is related to the driver config, but 
>> >> thought
>> this
>> >> might ring a bell for anyone who's had problems like this.
>> >>
>> >> The thing that still bothers me is that rather than seeing what I 
>> >> was expecting at init based on what the documentation shows:
>> >>     [...]
>> >>     EAL: probe driver: 15b3:1013 librte_pmd_mlx5
>> >>
>> >> ... when rte_eal_init() runs, I'm seeing:
>> >>     [...]
>> >>     EAL:  Selected IOVA mode 'PA'
>> >>     EAL:  Probing VFIO support...
>> >>
>> >> This still seems wrong, and I've verified that specifying the
BlueField
>> >> target ID
>> >> string in the make is causing "CONFIG_RTE_LIBRTE_MLX5_PMD=y" to
>> appear in
>> >> the .config.
>> >>
>> >> Regards,
>> >> --Jim Vaigl
>> >> 614 886 5999
>> >>
>> >>
>> >
>> >From: Stephen Hemminger [mailto:stephen@networkplumber.org]
>> >Sent: Tuesday, September 24, 2019 1:18 PM
>> >To: Jim Vaigl
>> >Cc: users@dpdk.org
>> >
>> >Subject: Re: [dpdk-users] DPDK on Mellanox BlueField Ref Platform 
>> >make sure you have latest version of rdma-core installed (v25).
>> >The right version is not in most distros
>> 
>> Great suggestion.  I'm using the rdma-core from the MLNX_OFED 
>> 4.6-3.5.8.0 install.  I can't figure out how to tell what version 
>> that thing
includes,
>> even looking at the source, since there's no version information in 
>> the source files, BUT I went to github and downloaded rdma-core v24 
>> and v25 and neither diff cleanly with the source RPM that comes in 
>> the OFED install.  I don't know yet if it's because this is some 
>> different version or if it's because Mellanox has made their own tweaks.
>> 
>> I would hope that the very latest OFED from Mellanox would include an 
>> up-to-date and working set of libs/modules, but maybe you're on to 
>> something.  It sounds like a risky move, but maybe I'll try just 
>> installing rdma-core from github over top of the OFED install.  I 
>> have a fear that I'll end up with inconsistent versions, but it's worth a try.
>> 
>> Thanks,
>> --Jim
 


^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [dpdk-users] DPDK on Mellanox BlueField Ref Platform
  2019-10-04 18:28         ` Kiran Vedere
@ 2019-10-07 16:52           ` Jim Vaigl
  2019-10-07 17:00             ` Kiran Vedere
  0 siblings, 1 reply; 12+ messages in thread
From: Jim Vaigl @ 2019-10-07 16:52 UTC (permalink / raw)
  To: 'Kiran Vedere', 'Asaf Penso',
	'Stephen Hemminger'
  Cc: users, 'Erez Ferber', 'Olga Shern',
	'Danny Vogel'

Hi Kiran,

When I try this command line with testpmd (with the -w just changed to
my port 0's PCIe address), I get "Creation of mbuf pool for socket 0
failed: Cannot allocate memory".  I've tried adding --total-num-mbufs
to restrict that, but that didn't help.  It runs if I restrict it to
just two cores, but then I drop most of my packets.  Here's the output
from running it as you suggested:

    [root@localhost bin]# ./testpmd --log-level="mlx5,8" -l 3,4,5,6,7,8,9,10,11,12,13,14,15 -n 4 -w 0f:00.0 --socket-mem=2048 -- --socket-num=0 --burst=64 --txd=2048 --rxd=2048 --mbcache=512 --rxq=12 --txq=12 --nb-cores=12 -i -a --forward-mode=mac --max-pkt-len=9000 --mbuf-size=16384

    EAL: Detected 16 lcore(s)
    EAL: Detected 1 NUMA nodes
    EAL: Multi-process socket /var/run/dpdk/rte/mp_socket
    EAL: Selected IOVA mode 'PA'
    EAL: Probing VFIO support...
    EAL: VFIO support initialized
    EAL: PCI device 0000:0f:00.0 on NUMA socket -1
    EAL:   Invalid NUMA socket, default to 0
    EAL:   probe driver: 15b3:a2d2 net_mlx5
    net_mlx5: mlx5.c:2145: mlx5_pci_probe(): checking device "mlx5_1"
    net_mlx5: mlx5.c:2145: mlx5_pci_probe(): checking device "mlx5_0"
    net_mlx5: mlx5.c:2154: mlx5_pci_probe(): PCI information matches for device "mlx5_0"
    net_mlx5: mlx5.c:2342: mlx5_pci_probe(): no E-Switch support detected
    net_mlx5: mlx5.c:1557: mlx5_dev_spawn(): naming Ethernet device "0f:00.0"
    net_mlx5: mlx5.c:363: mlx5_alloc_shared_ibctx(): DevX is NOT supported
    net_mlx5: mlx5_mr.c:212: mlx5_mr_btree_init(): initialized B-tree 0x17fec8c68 with table 0x17fec60c0
    net_mlx5: mlx5.c:1610: mlx5_dev_spawn(): enhanced MPW is supported
    net_mlx5: mlx5.c:1623: mlx5_dev_spawn(): SWP support: 7
    net_mlx5: mlx5.c:1632: mlx5_dev_spawn(): min_single_stride_log_num_of_bytes: 6
    net_mlx5: mlx5.c:1634: mlx5_dev_spawn(): max_single_stride_log_num_of_bytes: 13
    net_mlx5: mlx5.c:1636: mlx5_dev_spawn(): min_single_wqe_log_num_of_strides: 3
    net_mlx5: mlx5.c:1638: mlx5_dev_spawn(): max_single_wqe_log_num_of_strides: 16
    net_mlx5: mlx5.c:1640: mlx5_dev_spawn():        supported_qpts: 256
    net_mlx5: mlx5.c:1641: mlx5_dev_spawn(): device supports Multi-Packet RQ
    net_mlx5: mlx5.c:1674: mlx5_dev_spawn(): tunnel offloading is supported
    net_mlx5: mlx5.c:1686: mlx5_dev_spawn(): MPLS over GRE/UDP tunnel offloading is not supported
    net_mlx5: mlx5.c:1783: mlx5_dev_spawn(): checksum offloading is supported
    net_mlx5: mlx5.c:1803: mlx5_dev_spawn(): maximum Rx indirection table size is 512
    net_mlx5: mlx5.c:1807: mlx5_dev_spawn(): VLAN stripping is supported
    net_mlx5: mlx5.c:1811: mlx5_dev_spawn(): FCS stripping configuration is supported
    net_mlx5: mlx5.c:1840: mlx5_dev_spawn(): enhanced MPS is enabled
    net_mlx5: mlx5.c:1938: mlx5_dev_spawn(): port 0 MAC address is 50:6b:4b:e0:9a:22
    net_mlx5: mlx5.c:1945: mlx5_dev_spawn(): port 0 ifname is "enp15s0f0"
    net_mlx5: mlx5.c:1958: mlx5_dev_spawn(): port 0 MTU is 9000
    net_mlx5: mlx5.c:1980: mlx5_dev_spawn(): port 0 forcing Ethernet interface up
    net_mlx5: mlx5.c:1356: mlx5_set_min_inline(): min tx inline configured: 0
    net_mlx5: mlx5_flow.c:377: mlx5_flow_discover_priorities(): port 0 flow maximum priority: 5
    Interactive-mode selected
    Auto-start selected
    Set mac packet forwarding mode
    testpmd: create a new mbuf pool <mbuf_pool_socket_0>: n=344064, size=16384, socket=0
    testpmd: preferred mempool ops selected: ring_mp_mc
    EAL: Error - exiting with code: 1
      Cause: Creation of mbuf pool for socket 0 failed: Cannot allocate memory

This is with 2048 2M hugepages defined, so I think I have plenty of
memory available.  I used dpdk-setup to set and verify the hugepage
configuration and availability.  I'm trying to do some experiments to
see if I can get to the bottom of this.

Any thoughts?

Regards,
--Jim

-----Original Message-----
From: Kiran Vedere [mailto:kiranv@mellanox.com] 
Sent: Friday, October 04, 2019 2:28 PM
To: Jim Vaigl; Asaf Penso; 'Stephen Hemminger'
Cc: users@dpdk.org; Erez Ferber; Olga Shern; Danny Vogel
Subject: RE: [dpdk-users] DPDK on Mellanox BlueField Ref Platform

Hi Jim,

I tried your test with 9000 Byte MTU Size. On BlueField Reference Platform I
set the MTU of the interface to 9000 and on TRex I am sending 8096 size byte
packets. I am able to loop back packets fine w/o any issues. Below is the
command line I use for testpmd

./testpmd --log-level="mlx5,8" -l 3,4,5,6,7,8,9,10,11,12,13,14,15 -n 4 -w
17:00.0 --socket-mem=2048 -- --socket-num=0 --burst=64 --txd=2048 --rxd=2048
--mbcache=512 --rxq=12 --txq=12 --nb-cores=12 -i -a --forward-mode=mac
--max-pkt-len=9000 --mbuf-size=16384

Two things to consider: The max Rx packet len  is used by the PMD during its
Rx Queue initialization. By default this is set to 1518 Bytes for
testpmd/l3fwd. For jumbo frames you need to pass --max-pkt-len=9000 (for
testpmd) or --enable-jumbo --max-pkt-len=9000 (for l3fwd). Are you passing
these values to l3fwd/testpmd when you run your test? Also since the
mbuf_size is 2048 by default, you need to increase the mbuf_size to > Jumbo
frame size unless you enable scatter in the PMD. For testpmd you can
increase the mbuf size by using --mbuf-size parameter. For l3fwd I don't
think there is a command line option to increase mbuf size in runtime. So
you might need to recompile the l3fwd code to increase mbuf size. Are you
doing this?

Hope this helps.

Regards,
Kiran



-----Original Message-----
From: Jim Vaigl <jimv@rockbridgesoftware.com> 
Sent: Friday, October 4, 2019 1:35 PM
To: Asaf Penso <asafp@mellanox.com>; 'Stephen Hemminger'
<stephen@networkplumber.org>
Cc: users@dpdk.org; Kiran Vedere <kiranv@mellanox.com>; Erez Ferber
<erezf@mellanox.com>; Olga Shern <olgas@mellanox.com>; Danny Vogel
<dan@mellanoxfederal.com>
Subject: RE: [dpdk-users] DPDK on Mellanox BlueField Ref Platform

A final update on this issue.  Kiran Vedere went above and beyond the call
of duty: he completely reproduced my hardware setup, showed that it worked
using trex to generate similar traffic to mine, and then provided me with a
bundled-up .bfb of his CentOS (with updated kernel) and OFED install to try
so that there would be no configuration stuff for me to mess up.

Using this, I saw exactly the same crashes I had seen in my setup.
After some thought, I realized the only meaningful difference was that my
traffic generator and IP configuration relied on an MTU size of 9000.
Once I set the MTU size down to 1500, the crashes stopped.

So, the answer is clearly that I'm just not setting up for the larger MTU
size.  I need to start to understand how to get DPDK to manage that, but the
crashing is at least understood now, and I have a way forward.

Thanks very much to Kiran.

Regards,
--Jim

-----Original Message-----
From: Jim Vaigl [mailto:jimv@rockbridgesoftware.com]
Sent: Thursday, September 26, 2019 3:47 PM
To: 'Asaf Penso'; 'Stephen Hemminger'
Cc: 'users@dpdk.org'; 'Kiran Vedere'; 'Erez Ferber'; 'Olga Shern'
Subject: RE: [dpdk-users] DPDK on Mellanox BlueField Ref Platform

> From: Asaf Penso [mailto:asafp@mellanox.com]
> Sent: Thursday, September 26, 2019 7:00 AM
> To: Jim Vaigl; 'Stephen Hemminger'
> Cc: users@dpdk.org; Kiran Vedere; Erez Ferber; Olga Shern
> Subject: RE: [dpdk-users] DPDK on Mellanox BlueField Ref Platform
>
> Hello Jim,
>
> Thanks for your mail.
> In order  for us to have a better resolution please send a mail to our
support team > - support@mellanox.com
> Please provide as much info about the setup, configuration etc as you can.
>
> In parallel, I added Erez Ferber here to assist.
>
> Regards,
> Asaf Penso

Thanks for the kind offer, Asaf.  I'll take this debug effort off-line with
you and Erez and post back to the list here later with any resolution so
everyone can see the result.

By the way, the prior suggestion of using v. 25 of rdma-core didn't pan out:
the current build script just makes a local build in a subdirectory off the
source tree and there's no obvious way to integrate it with the MLNX_OFED
environment and the dpdk install.  After resolving package dependencies to
get rdma-core to build from the GitHub repo, I realized the instructions say
this:

  ---
  Building
  This project uses a cmake based build system. Quick start:

  $ bash build.sh
  build/bin will contain the sample programs and build/lib
  will contain the shared libraries. The build is configured
  to run all the programs 'in-place' and cannot be installed.

  NOTE: It is not currently easy to run from the build
  directory, the plugins only load from the system path.
  ---

--Jim

>> -----Original Message-----
>> From: users <users-bounces@dpdk.org> On Behalf Of Jim Vaigl
>> Sent: Tuesday, September 24, 2019 10:11 PM
>> To: 'Stephen Hemminger' <stephen@networkplumber.org>
>> Cc: users@dpdk.org
>> Subject: Re: [dpdk-users] DPDK on Mellanox BlueField Ref Platform
>> 
>> On Tue, 24 Sep 2019 12:31:51 -0400
>> "Jim Vaigl" <jimv@rockbridgesoftware.com> wrote:
>> 
>>>> Since no one has chimed in with any build/install/configure 
>>>> suggestion
>> for
>> >> the
>> >> BlueField, I've spent some time debugging and thought I'd share 
>> >> the
>> results.
>> >> Building the l3fwd example application and running it as the docs
>> suggest,
>> >> when
>> >> I try to send it UDP packets from another machine, it dumps core.
>> >>
>> >> Debugging a bit with gdb and printf, I can see that from inside
>> >> process_packet()
>> >> and processx4_step1() the calls to rte_pktmbuf_mtod() return Nil 
>> >> or suspicious pointer values (i.e. 0x80).  The sample apps don't 
>> >> guard against NULL pointers being returned from this rte call, so 
>> >> that's why it's dumping core.
>> >>
>> >> I still think the problem is related to the driver config, but 
>> >> thought
>> this
>> >> might ring a bell for anyone who's had problems like this.
>> >>
>> >> The thing that still bothers me is that rather than seeing what I 
>> >> was expecting at init based on what the documentation shows:
>> >>     [...]
>> >>     EAL: probe driver: 15b3:1013 librte_pmd_mlx5
>> >>
>> >> ... when rte_eal_init() runs, I'm seeing:
>> >>     [...]
>> >>     EAL:  Selected IOVA mode 'PA'
>> >>     EAL:  Probing VFIO support...
>> >>
>> >> This still seems wrong, and I've verified that specifying the
BlueField
>> >> target ID
>> >> string in the make is causing "CONFIG_RTE_LIBRTE_MLX5_PMD=y" to
>> appear in
>> >> the .config.
>> >>
>> >> Regards,
>> >> --Jim Vaigl
>> >> 614 886 5999
>> >>
>> >>
>> >
>> >From: Stephen Hemminger [mailto:stephen@networkplumber.org]
>> >Sent: Tuesday, September 24, 2019 1:18 PM
>> >To: Jim Vaigl
>> >Cc: users@dpdk.org
>> >
>> >Subject: Re: [dpdk-users] DPDK on Mellanox BlueField Ref Platform 
>> >make sure you have latest version of rdma-core installed (v25).
>> >The right version is not in most distros
>> 
>> Great suggestion.  I'm using the rdma-core from the MLNX_OFED 
>> 4.6-3.5.8.0 install.  I can't figure out how to tell what version 
>> that thing
includes,
>> even looking at the source, since there's no version information in 
>> the source files, BUT I went to github and downloaded rdma-core v24 
>> and v25 and neither diff cleanly with the source RPM that comes in 
>> the OFED install.  I don't know yet if it's because this is some 
>> different version or if it's because Mellanox has made their own tweaks.
>> 
>> I would hope that the very latest OFED from Mellanox would include an 
>> up-to-date and working set of libs/modules, but maybe you're on to 
>> something.  It sounds like a risky move, but maybe I'll try just 
>> installing rdma-core from github over top of the OFED install.  I 
>> have a fear that I'll end up with inconsistent versions, but it's worth a
try.
>> 
>> Thanks,
>> --Jim
 


^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [dpdk-users] DPDK on Mellanox BlueField Ref Platform
  2019-10-07 16:52           ` Jim Vaigl
@ 2019-10-07 17:00             ` Kiran Vedere
  2019-10-07 17:02               ` Kiran Vedere
  2019-10-07 18:10               ` Jim Vaigl
  0 siblings, 2 replies; 12+ messages in thread
From: Kiran Vedere @ 2019-10-07 17:00 UTC (permalink / raw)
  To: Jim Vaigl, Asaf Penso, 'Stephen Hemminger'
  Cc: users, Erez Ferber, Olga Shern, Danny Vogel

Hi Jim,

Looks like n=344064, size=16384 exceeds 5 GB. I used 4K 2M pages (so that's 8 GB). Can you try with that? You can use more hugepages (8K, for example) as well, just to be on the safe side, or reduce the max-pkt-len to a little over 9000 (maybe 9216) and give it a try?
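
Rough numbers, ignoring per-mbuf overhead:

    344064 mbufs x 16384 bytes each  ~= 5.6 GB needed for the pool
      2048 hugepages x 2 MB           =  4.0 GB available
      4096 hugepages x 2 MB           =  8.0 GB available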

Regards,
Kiran

-----Original Message-----
From: Jim Vaigl <jimv@rockbridgesoftware.com> 
Sent: Monday, October 7, 2019 12:52 PM
To: Kiran Vedere <kiranv@mellanox.com>; Asaf Penso <asafp@mellanox.com>; 'Stephen Hemminger' <stephen@networkplumber.org>
Cc: users@dpdk.org; Erez Ferber <erezf@mellanox.com>; Olga Shern <olgas@mellanox.com>; Danny Vogel <dan@mellanoxfederal.com>
Subject: RE: [dpdk-users] DPDK on Mellanox BlueField Ref Platform

Hi Kiran,

When I try this command line with testpmd (with the -w just changed to my port 0's PCIe address), I get "Creation of mbuf pool for socket
0 failed:  Cannot allocate memory".  I've tried adding --total-num-mbufs to restrict that, but that didn't help.  It runs if I try restricting it to just two cores, but then I drop most of my packets.  Here's the output running it as you suggested:

    [root@localhost bin]# ./testpmd --log-level="mlx5,8" -l 3,4,5,6,7,8,
     9,10,11,12,13,14,15 -n 4 -w 0f:00.0 --socket-mem=2048 ---socket-num=0
    --burst=64 --txd=2048 --rxd=2048 --mbcache=512 --rxq=12 --txq=12
    --nb-cores=12 -i -a --forward-mode=mac --max-pkt-len=9000
    --mbuf-size=16384

    EAL: Detected 16 lcore(s)
    EAL: Detected 1 NUMA nodes
    EAL: Multi-process socket /var/run/dpdk/rte/mp_socket
    EAL: Selected IOVA mode 'PA'
    EAL: Probing VFIO support...
    EAL: VFIO support initialized
    EAL: PCI device 0000:0f:00.0 on NUMA socket -1
    EAL:   Invalid NUMA socket, default to 0
    EAL:   probe driver: 15b3:a2d2 net_mlx5
    net_mlx5: mlx5.c:2145: mlx5_pci_probe(): checking device "mlx5_1"
    net_mlx5: mlx5.c:2145: mlx5_pci_probe(): checking device "mlx5_0"
    net_mlx5: mlx5.c:2154: mlx5_pci_probe(): PCI information matches for device "mlx5_0"
    net_mlx5: mlx5.c:2342: mlx5_pci_probe(): no E-Switch support detected
    net_mlx5: mlx5.c:1557: mlx5_dev_spawn(): naming Ethernet device "0f:00.0"
    net_mlx5: mlx5.c:363: mlx5_alloc_shared_ibctx(): DevX is NOT supported
    net_mlx5: mlx5_mr.c:212: mlx5_mr_btree_init(): initialized B-tree
0x17fec8c68 with table     0x17fec60c0
    net_mlx5: mlx5.c:1610: mlx5_dev_spawn(): enhanced MPW is supported
    net_mlx5: mlx5.c:1623: mlx5_dev_spawn(): SWP support: 7
    net_mlx5: mlx5.c:1632: mlx5_dev_spawn():
min_single_stride_log_num_of_bytes: 6
    net_mlx5: mlx5.c:1634: mlx5_dev_spawn():
max_single_stride_log_num_of_bytes: 13
    net_mlx5: mlx5.c:1636: mlx5_dev_spawn():
min_single_wqe_log_num_of_strides: 3
    net_mlx5: mlx5.c:1638: mlx5_dev_spawn():
max_single_wqe_log_num_of_strides: 16
    net_mlx5: mlx5.c:1640: mlx5_dev_spawn():        supported_qpts: 256
    net_mlx5: mlx5.c:1641: mlx5_dev_spawn(): device supports Multi-Packet RQ
    net_mlx5: mlx5.c:1674: mlx5_dev_spawn(): tunnel offloading is supported
    net_mlx5: mlx5.c:1686: mlx5_dev_spawn(): MPLS over GRE/UDP tunnel
offloading is not     supported
    net_mlx5: mlx5.c:1783: mlx5_dev_spawn(): checksum offloading is supported
    net_mlx5: mlx5.c:1803: mlx5_dev_spawn(): maximum Rx indirection table size is 512
    net_mlx5: mlx5.c:1807: mlx5_dev_spawn(): VLAN stripping is supported
    net_mlx5: mlx5.c:1811: mlx5_dev_spawn(): FCS stripping configuration is supported
    net_mlx5: mlx5.c:1840: mlx5_dev_spawn(): enhanced MPS is enabled
    net_mlx5: mlx5.c:1938: mlx5_dev_spawn(): port 0 MAC address is
50:6b:4b:e0:9a:22
    net_mlx5: mlx5.c:1945: mlx5_dev_spawn(): port 0 ifname is "enp15s0f0"
    net_mlx5: mlx5.c:1958: mlx5_dev_spawn(): port 0 MTU is 9000
    net_mlx5: mlx5.c:1980: mlx5_dev_spawn(): port 0 forcing Ethernet interface up
    net_mlx5: mlx5.c:1356: mlx5_set_min_inline(): min tx inline configured:
0
    net_mlx5: mlx5_flow.c:377: mlx5_flow_discover_priorities(): port 0 flow
maximum     priority: 5
    Interactive-mode selected
    Auto-start selected
    Set mac packet forwarding mode
    testpmd: create a new mbuf pool <mbuf_pool_socket_0>: n=344064, size=16384, socket=0
    testpmd: preferred mempool ops selected: ring_mp_mc
    EAL: Error - exiting with code: 1
      Cause: Creation of mbuf pool for socket 0 failed: Cannot allocate memory

This is with 2048 2M hugepages defined, so I think I have plenty of memory available.  I used dpdk-setup to set and verify the hugepages'
configuration and availability.  I'm trying to do some experiments to see if I get to the bottom of this.

Any thoughts?

Regards,
--Jim

-----Original Message-----
From: Kiran Vedere [mailto:kiranv@mellanox.com]
Sent: Friday, October 04, 2019 2:28 PM
To: Jim Vaigl; Asaf Penso; 'Stephen Hemminger'
Cc: users@dpdk.org; Erez Ferber; Olga Shern; Danny Vogel
Subject: RE: [dpdk-users] DPDK on Mellanox BlueField Ref Platform

Hi Jim,

I tried your test with 9000 Byte MTU Size. On BlueField Reference Platform I set the MTU of the interface to 9000 and on TRex I am sending 8096 size byte packets. I am able to loop back packets fine w/o any issues. Below is the command line I use for testpmd

./testpmd --log-level="mlx5,8" -l 3,4,5,6,7,8,9,10,11,12,13,14,15 -n 4 -w
17:00.0 --socket-mem=2048 -- --socket-num=0 --burst=64 --txd=2048 --rxd=2048
--mbcache=512 --rxq=12 --txq=12 --nb-cores=12 -i -a --forward-mode=mac
--max-pkt-len=9000 --mbuf-size=16384

Two things to consider: The max Rx packet len  is used by the PMD during its Rx Queue initialization. By default this is set to 1518 Bytes for testpmd/l3fwd. For jumbo frames you need to pass --max-pkt-len=9000 (for
testpmd) or --enable-jumbo --max-pkt-len=9000 (for l3fwd). Are you passing these values to l3fwd/testpmd when you run your test? Also since the mbuf_size is 2048 by default, you need to increase the mbuf_size to > Jumbo frame size unless you enable scatter in the PMD. For testpmd you can increase the mbuf size by using --mbuf-size parameter. For l3fwd I don't think there is a command line option to increase mbuf size in runtime. So you might need to recompile the l3fwd code to increase mbuf size. Are you doing this?

Hope this helps.

Regards,
Kiran



-----Original Message-----
From: Jim Vaigl <jimv@rockbridgesoftware.com>
Sent: Friday, October 4, 2019 1:35 PM
To: Asaf Penso <asafp@mellanox.com>; 'Stephen Hemminger'
<stephen@networkplumber.org>
Cc: users@dpdk.org; Kiran Vedere <kiranv@mellanox.com>; Erez Ferber <erezf@mellanox.com>; Olga Shern <olgas@mellanox.com>; Danny Vogel <dan@mellanoxfederal.com>
Subject: RE: [dpdk-users] DPDK on Mellanox BlueField Ref Platform

A final update on this issue.  Kiran Vedere went above and beyond the call of duty: he completely reproduced my hardware setup, showed that it worked using trex to generate similar traffic to mine, and then provided me with a bundled-up .bfb of his CentOS (with updated kernel) and OFED install to try so that there would be no configuration stuff for me to mess up.

Using this, I saw exactly the same crashes I had seen in my setup.
After some thought, I realized the only meaningful difference was that my traffic generator and IP configuration relied on an MTU size of 9000.
Once I set the MTU size down to 1500, the crashes stopped.

So, the answer is clearly that I'm just not setting up for the larger MTU size.  I need to start to understand how to get DPDK to manage that, but the crashing is at least understood now, and I have a way forward.

Thanks very much to Kiran.

Regards,
--Jim

-----Original Message-----
From: Jim Vaigl [mailto:jimv@rockbridgesoftware.com]
Sent: Thursday, September 26, 2019 3:47 PM
To: 'Asaf Penso'; 'Stephen Hemminger'
Cc: 'users@dpdk.org'; 'Kiran Vedere'; 'Erez Ferber'; 'Olga Shern'
Subject: RE: [dpdk-users] DPDK on Mellanox BlueField Ref Platform

> From: Asaf Penso [mailto:asafp@mellanox.com]
> Sent: Thursday, September 26, 2019 7:00 AM
> To: Jim Vaigl; 'Stephen Hemminger'
> Cc: users@dpdk.org; Kiran Vedere; Erez Ferber; Olga Shern
> Subject: RE: [dpdk-users] DPDK on Mellanox BlueField Ref Platform
>
> Hello Jim,
>
> Thanks for your mail.
> In order  for us to have a better resolution please send a mail to our
support team > - support@mellanox.com
> Please provide as much info about the setup, configuration etc as you can.
>
> In parallel, I added Erez Ferber here to assist.
>
> Regards,
> Asaf Penso

Thanks for the kind offer, Asaf.  I'll take this debug effort off-line with you and Erez and post back to the list here later with any resolution so everyone can see the result.

By the way, the prior suggestion of using v. 25 of rdma-core didn't pan out:
the current build script just makes a local build in a subdirectory off the source tree and there's no obvious way to integrate it with the MLNX_OFED environment and the dpdk install.  After resolving package dependencies to get rdma-core to build from the GitHub repo, I realized the instructions say
this:

  ---
  Building
  This project uses a cmake based build system. Quick start:

  $ bash build.sh
  build/bin will contain the sample programs and build/lib
  will contain the shared libraries. The build is configured
  to run all the programs 'in-place' and cannot be installed.

  NOTE: It is not currently easy to run from the build
  directory, the plugins only load from the system path.
  ---

--Jim

>> -----Original Message-----
>> From: users <users-bounces@dpdk.org> On Behalf Of Jim Vaigl
>> Sent: Tuesday, September 24, 2019 10:11 PM
>> To: 'Stephen Hemminger' <stephen@networkplumber.org>
>> Cc: users@dpdk.org
>> Subject: Re: [dpdk-users] DPDK on Mellanox BlueField Ref Platform
>> 
>> On Tue, 24 Sep 2019 12:31:51 -0400
>> "Jim Vaigl" <jimv@rockbridgesoftware.com> wrote:
>> 
>>>> Since no one has chimed in with any build/install/configure 
>>>> suggestion
>> for
>> >> the
>> >> BlueField, I've spent some time debugging and thought I'd share 
>> >> the
>> results.
>> >> Building the l3fwd example application and running it as the docs
>> suggest,
>> >> when
>> >> I try to send it UDP packets from another machine, it dumps core.
>> >>
>> >> Debugging a bit with gdb and printf, I can see that from inside
>> >> process_packet()
>> >> and processx4_step1() the calls to rte_pktmbuf_mtod() return Nil 
>> >> or suspicious pointer values (i.e. 0x80).  The sample apps don't 
>> >> guard against NULL pointers being returned from this rte call, so 
>> >> that's why it's dumping core.
>> >>
>> >> I still think the problem is related to the driver config, but 
>> >> thought
>> this
>> >> might ring a bell for anyone who's had problems like this.
>> >>
>> >> The thing that still bothers me is that rather than seeing what I 
>> >> was expecting at init based on what the documentation shows:
>> >>     [...]
>> >>     EAL: probe driver: 15b3:1013 librte_pmd_mlx5
>> >>
>> >> ... when rte_eal_init() runs, I'm seeing:
>> >>     [...]
>> >>     EAL:  Selected IOVA mode 'PA'
>> >>     EAL:  Probing VFIO support...
>> >>
>> >> This still seems wrong, and I've verified that specifying the
BlueField
>> >> target ID
>> >> string in the make is causing "CONFIG_RTE_LIBRTE_MLX5_PMD=y" to
>> appear in
>> >> the .config.
>> >>
>> >> Regards,
>> >> --Jim Vaigl
>> >> 614 886 5999
>> >>
>> >>
>> >
>> >From: Stephen Hemminger [mailto:stephen@networkplumber.org]
>> >Sent: Tuesday, September 24, 2019 1:18 PM
>> >To: Jim Vaigl
>> >Cc: users@dpdk.org
>> >
>> >Subject: Re: [dpdk-users] DPDK on Mellanox BlueField Ref Platform 
>> >make sure you have latest version of rdma-core installed (v25).
>> >The right version is not in most distros
>> 
>> Great suggestion.  I'm using the rdma-core from the MLNX_OFED
>> 4.6-3.5.8.0 install.  I can't figure out how to tell what version 
>> that thing
includes,
>> even looking at the source, since there's no version information in 
>> the source files, BUT I went to github and downloaded rdma-core v24 
>> and v25 and neither diff cleanly with the source RPM that comes in 
>> the OFED install.  I don't know yet if it's because this is some 
>> different version or if it's because Mellanox has made their own tweaks.
>> 
>> I would hope that the very latest OFED from Mellanox would include an 
>> up-to-date and working set of libs/modules, but maybe you're on to 
>> something.  It sounds like a risky move, but maybe I'll try just 
>> installing rdma-core from github over top of the OFED install.  I 
>> have a fear that I'll end up with inconsistent versions, but it's 
>> worth a
try.
>> 
>> Thanks,
>> --Jim
 


^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [dpdk-users] DPDK on Mellanox BlueField Ref Platform
  2019-10-07 17:00             ` Kiran Vedere
@ 2019-10-07 17:02               ` Kiran Vedere
  2019-10-07 18:10               ` Jim Vaigl
  1 sibling, 0 replies; 12+ messages in thread
From: Kiran Vedere @ 2019-10-07 17:02 UTC (permalink / raw)
  To: Jim Vaigl, Asaf Penso, 'Stephen Hemminger'
  Cc: users, Erez Ferber, Olga Shern, Danny Vogel

Hi Jim,

I am sorry. I meant reduce the --mbuf-size to a little over the jumbo frame size (e.g. 9216).

Regards,
Kiran

-----Original Message-----
From: Kiran Vedere 
Sent: Monday, October 7, 2019 1:01 PM
To: Jim Vaigl <jimv@rockbridgesoftware.com>; Asaf Penso <asafp@mellanox.com>; 'Stephen Hemminger' <stephen@networkplumber.org>
Cc: users@dpdk.org; Erez Ferber <erezf@mellanox.com>; Olga Shern <olgas@mellanox.com>; Danny Vogel <dan@mellanoxfederal.com>
Subject: RE: [dpdk-users] DPDK on Mellanox BlueField Ref Platform

Hi Jim,

Looks like n=344064, size=16384 exceeds 5 G. I used 4K 2M Pages (so that's 8G). Can you try with that? You can use more hugepages (8K for ex) as well just to be on safeside or reduce the max-pkt-len to little over 9000 (9216 maybe) and give it a try?

Regards,
Kiran

-----Original Message-----
From: Jim Vaigl <jimv@rockbridgesoftware.com>
Sent: Monday, October 7, 2019 12:52 PM
To: Kiran Vedere <kiranv@mellanox.com>; Asaf Penso <asafp@mellanox.com>; 'Stephen Hemminger' <stephen@networkplumber.org>
Cc: users@dpdk.org; Erez Ferber <erezf@mellanox.com>; Olga Shern <olgas@mellanox.com>; Danny Vogel <dan@mellanoxfederal.com>
Subject: RE: [dpdk-users] DPDK on Mellanox BlueField Ref Platform

Hi Kiran,

When I try this command line with testpmd (with the -w just changed to my port 0's PCIe address), I get "Creation of mbuf pool for socket 0 failed:
Cannot allocate memory".  I've tried adding --total-num-mbufs to limit the pool size, but that didn't help.  It runs if I restrict it to just two cores, but then I drop most of my packets.  Here's the output running it as you suggested:

    [root@localhost bin]# ./testpmd --log-level="mlx5,8" -l 3,4,5,6,7,8,
     9,10,11,12,13,14,15 -n 4 -w 0f:00.0 --socket-mem=2048 -- --socket-num=0
    --burst=64 --txd=2048 --rxd=2048 --mbcache=512 --rxq=12 --txq=12
    --nb-cores=12 -i -a --forward-mode=mac --max-pkt-len=9000
    --mbuf-size=16384

    EAL: Detected 16 lcore(s)
    EAL: Detected 1 NUMA nodes
    EAL: Multi-process socket /var/run/dpdk/rte/mp_socket
    EAL: Selected IOVA mode 'PA'
    EAL: Probing VFIO support...
    EAL: VFIO support initialized
    EAL: PCI device 0000:0f:00.0 on NUMA socket -1
    EAL:   Invalid NUMA socket, default to 0
    EAL:   probe driver: 15b3:a2d2 net_mlx5
    net_mlx5: mlx5.c:2145: mlx5_pci_probe(): checking device "mlx5_1"
    net_mlx5: mlx5.c:2145: mlx5_pci_probe(): checking device "mlx5_0"
    net_mlx5: mlx5.c:2154: mlx5_pci_probe(): PCI information matches for device "mlx5_0"
    net_mlx5: mlx5.c:2342: mlx5_pci_probe(): no E-Switch support detected
    net_mlx5: mlx5.c:1557: mlx5_dev_spawn(): naming Ethernet device "0f:00.0"
    net_mlx5: mlx5.c:363: mlx5_alloc_shared_ibctx(): DevX is NOT supported
    net_mlx5: mlx5_mr.c:212: mlx5_mr_btree_init(): initialized B-tree
0x17fec8c68 with table     0x17fec60c0
    net_mlx5: mlx5.c:1610: mlx5_dev_spawn(): enhanced MPW is supported
    net_mlx5: mlx5.c:1623: mlx5_dev_spawn(): SWP support: 7
    net_mlx5: mlx5.c:1632: mlx5_dev_spawn():
min_single_stride_log_num_of_bytes: 6
    net_mlx5: mlx5.c:1634: mlx5_dev_spawn():
max_single_stride_log_num_of_bytes: 13
    net_mlx5: mlx5.c:1636: mlx5_dev_spawn():
min_single_wqe_log_num_of_strides: 3
    net_mlx5: mlx5.c:1638: mlx5_dev_spawn():
max_single_wqe_log_num_of_strides: 16
    net_mlx5: mlx5.c:1640: mlx5_dev_spawn():        supported_qpts: 256
    net_mlx5: mlx5.c:1641: mlx5_dev_spawn(): device supports Multi-Packet RQ
    net_mlx5: mlx5.c:1674: mlx5_dev_spawn(): tunnel offloading is supported
    net_mlx5: mlx5.c:1686: mlx5_dev_spawn(): MPLS over GRE/UDP tunnel
offloading is not     supported
    net_mlx5: mlx5.c:1783: mlx5_dev_spawn(): checksum offloading is supported
    net_mlx5: mlx5.c:1803: mlx5_dev_spawn(): maximum Rx indirection table size is 512
    net_mlx5: mlx5.c:1807: mlx5_dev_spawn(): VLAN stripping is supported
    net_mlx5: mlx5.c:1811: mlx5_dev_spawn(): FCS stripping configuration is supported
    net_mlx5: mlx5.c:1840: mlx5_dev_spawn(): enhanced MPS is enabled
    net_mlx5: mlx5.c:1938: mlx5_dev_spawn(): port 0 MAC address is
50:6b:4b:e0:9a:22
    net_mlx5: mlx5.c:1945: mlx5_dev_spawn(): port 0 ifname is "enp15s0f0"
    net_mlx5: mlx5.c:1958: mlx5_dev_spawn(): port 0 MTU is 9000
    net_mlx5: mlx5.c:1980: mlx5_dev_spawn(): port 0 forcing Ethernet interface up
    net_mlx5: mlx5.c:1356: mlx5_set_min_inline(): min tx inline configured:
0
    net_mlx5: mlx5_flow.c:377: mlx5_flow_discover_priorities(): port 0 flow
maximum     priority: 5
    Interactive-mode selected
    Auto-start selected
    Set mac packet forwarding mode
    testpmd: create a new mbuf pool <mbuf_pool_socket_0>: n=344064, size=16384, socket=0
    testpmd: preferred mempool ops selected: ring_mp_mc
    EAL: Error - exiting with code: 1
      Cause: Creation of mbuf pool for socket 0 failed: Cannot allocate memory

This is with 2048 2M hugepages defined, so I think I have plenty of memory available.  I used dpdk-setup to set and verify the hugepages'
configuration and availability.  I'm running some experiments to see if I can get to the bottom of this.

Any thoughts?

Regards,
--Jim

-----Original Message-----
From: Kiran Vedere [mailto:kiranv@mellanox.com]
Sent: Friday, October 04, 2019 2:28 PM
To: Jim Vaigl; Asaf Penso; 'Stephen Hemminger'
Cc: users@dpdk.org; Erez Ferber; Olga Shern; Danny Vogel
Subject: RE: [dpdk-users] DPDK on Mellanox BlueField Ref Platform

Hi Jim,

I tried your test with a 9000-byte MTU. On the BlueField Reference Platform I set the MTU of the interface to 9000, and from TRex I am sending 8096-byte packets. I am able to loop the packets back fine without any issues. Below is the command line I use for testpmd:

./testpmd --log-level="mlx5,8" -l 3,4,5,6,7,8,9,10,11,12,13,14,15 -n 4 -w
17:00.0 --socket-mem=2048 -- --socket-num=0 --burst=64 --txd=2048 --rxd=2048
--mbcache=512 --rxq=12 --txq=12 --nb-cores=12 -i -a --forward-mode=mac
--max-pkt-len=9000 --mbuf-size=16384

Two things to consider: the max Rx packet length is used by the PMD during its Rx queue initialization. By default this is set to 1518 bytes for testpmd/l3fwd. For jumbo frames you need to pass --max-pkt-len=9000 (for
testpmd) or --enable-jumbo --max-pkt-len=9000 (for l3fwd). Are you passing these values to l3fwd/testpmd when you run your test? Also, since the mbuf size is 2048 by default, you need to increase it to more than the jumbo frame size unless you enable scatter in the PMD. For testpmd you can increase the mbuf size with the --mbuf-size parameter. For l3fwd I don't think there is a command-line option to change the mbuf size at runtime, so you might need to recompile the l3fwd code to increase it. Are you doing this?
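
For illustration only (this is not the actual l3fwd source; the pool name,
pool size, and cache size below are placeholders), sizing an mbuf pool so
that a single mbuf can hold a full 9000-byte frame looks roughly like this:

    #include <stdio.h>
    #include <rte_mbuf.h>
    #include <rte_errno.h>

    #define JUMBO_FRAME_LEN 9000          /* matches --max-pkt-len=9000 */
    #define NB_MBUF         8192          /* placeholder pool size */
    #define MBUF_CACHE_SIZE 256

    /* The data room must cover the headroom plus the whole frame; otherwise
     * the PMD needs Rx scatter to spread one packet over several mbufs. */
    static struct rte_mempool *
    create_jumbo_pool(int socket_id)
    {
            uint16_t data_room = RTE_PKTMBUF_HEADROOM + JUMBO_FRAME_LEN;
            struct rte_mempool *mp;

            mp = rte_pktmbuf_pool_create("mbuf_pool_jumbo", NB_MBUF,
                            MBUF_CACHE_SIZE, 0 /* priv size */,
                            data_room, socket_id);
            if (mp == NULL)
                    printf("mbuf pool creation failed: %s\n",
                           rte_strerror(rte_errno));
            return mp;
    }

testpmd's --mbuf-size=16384 is effectively setting that same data room value
from the command line.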

Hope this helps.

Regards,
Kiran



-----Original Message-----
From: Jim Vaigl <jimv@rockbridgesoftware.com>
Sent: Friday, October 4, 2019 1:35 PM
To: Asaf Penso <asafp@mellanox.com>; 'Stephen Hemminger'
<stephen@networkplumber.org>
Cc: users@dpdk.org; Kiran Vedere <kiranv@mellanox.com>; Erez Ferber <erezf@mellanox.com>; Olga Shern <olgas@mellanox.com>; Danny Vogel <dan@mellanoxfederal.com>
Subject: RE: [dpdk-users] DPDK on Mellanox BlueField Ref Platform

A final update on this issue.  Kiran Vedere went above and beyond the call of duty: he completely reproduced my hardware setup, showed that it worked using trex to generate similar traffic to mine, and then provided me with a bundled-up .bfb of his CentOS (with updated kernel) and OFED install to try so that there would be no configuration stuff for me to mess up.

Using this, I saw exactly the same crashes I had seen in my setup.
After some thought, I realized the only meaningful difference was that my traffic generator and IP configuration relied on an MTU size of 9000.
Once I set the MTU size down to 1500, the crashes stopped.

So, the answer is clearly that I'm just not setting up for the larger MTU size.  I need to start to understand how to get DPDK to manage that, but the crashing is at least understood now, and I have a way forward.

Thanks very much to Kiran.

Regards,
--Jim

-----Original Message-----
From: Jim Vaigl [mailto:jimv@rockbridgesoftware.com]
Sent: Thursday, September 26, 2019 3:47 PM
To: 'Asaf Penso'; 'Stephen Hemminger'
Cc: 'users@dpdk.org'; 'Kiran Vedere'; 'Erez Ferber'; 'Olga Shern'
Subject: RE: [dpdk-users] DPDK on Mellanox BlueField Ref Platform

> From: Asaf Penso [mailto:asafp@mellanox.com]
> Sent: Thursday, September 26, 2019 7:00 AM
> To: Jim Vaigl; 'Stephen Hemminger'
> Cc: users@dpdk.org; Kiran Vedere; Erez Ferber; Olga Shern
> Subject: RE: [dpdk-users] DPDK on Mellanox BlueField Ref Platform
>
> Hello Jim,
>
> Thanks for your mail.
> In order for us to have a better resolution please send a mail to our
> support team - support@mellanox.com
> Please provide as much info about the setup, configuration etc as you can.
>
> In parallel, I added Erez Ferber here to assist.
>
> Regards,
> Asaf Penso

Thanks for the kind offer, Asaf.  I'll take this debug effort off-line with you and Erez and post back to the list here later with any resolution so everyone can see the result.

By the way, the prior suggestion of using v. 25 of rdma-core didn't pan out:
the current build script just makes a local build in a subdirectory off the source tree and there's no obvious way to integrate it with the MLNX_OFED environment and the dpdk install.  After resolving package dependencies to get rdma-core to build from the GitHub repo, I realized the instructions say
this:

  ---
  Building
  This project uses a cmake based build system. Quick start:

  $ bash build.sh
  build/bin will contain the sample programs and build/lib
  will contain the shared libraries. The build is configured
  to run all the programs 'in-place' and cannot be installed.

  NOTE: It is not currently easy to run from the build
  directory, the plugins only load from the system path.
  ---

--Jim

>> -----Original Message-----
>> From: users <users-bounces@dpdk.org> On Behalf Of Jim Vaigl
>> Sent: Tuesday, September 24, 2019 10:11 PM
>> To: 'Stephen Hemminger' <stephen@networkplumber.org>
>> Cc: users@dpdk.org
>> Subject: Re: [dpdk-users] DPDK on Mellanox BlueField Ref Platform
>> 
>> On Tue, 24 Sep 2019 12:31:51 -0400
>> "Jim Vaigl" <jimv@rockbridgesoftware.com> wrote:
>> 
>>>> Since no one has chimed in with any build/install/configure 
>>>> suggestion
>> for
>> >> the
>> >> BlueField, I've spent some time debugging and thought I'd share 
>> >> the
>> results.
>> >> Building the l3fwd example application and running it as the docs
>> suggest,
>> >> when
>> >> I try to send it UDP packets from another machine, it dumps core.
>> >>
>> >> Debugging a bit with gdb and printf, I can see that from inside
>> >> process_packet()
>> >> and processx4_step1() the calls to rte_pktmbuf_mtod() return Nil 
>> >> or suspicious pointer values (i.e. 0x80).  The sample apps don't 
>> >> guard against NULL pointers being returned from this rte call, so 
>> >> that's why it's dumping core.
>> >>
>> >> I still think the problem is related to the driver config, but 
>> >> thought
>> this
>> >> might ring a bell for anyone who's had problems like this.
>> >>
>> >> The thing that still bothers me is that rather than seeing what I 
>> >> was expecting at init based on what the documentation shows:
>> >>     [...]
>> >>     EAL: probe driver: 15b3:1013 librte_pmd_mlx5
>> >>
>> >> ... when rte_eal_init() runs, I'm seeing:
>> >>     [...]
>> >>     EAL:  Selected IOVA mode 'PA'
>> >>     EAL:  Probing VFIO support...
>> >>
>> >> This still seems wrong, and I've verified that specifying the
BlueField
>> >> target ID
>> >> string in the make is causing "CONFIG_RTE_LIBRTE_MLX5_PMD=y" to
>> appear in
>> >> the .config.
>> >>
>> >> Regards,
>> >> --Jim Vaigl
>> >> 614 886 5999
>> >>
>> >>
>> >
>> >From: Stephen Hemminger [mailto:stephen@networkplumber.org]
>> >Sent: Tuesday, September 24, 2019 1:18 PM
>> >To: Jim Vaigl
>> >Cc: users@dpdk.org
>> >
>> >Subject: Re: [dpdk-users] DPDK on Mellanox BlueField Ref Platform 
>> >make sure you have latest version of rdma-core installed (v25).
>> >The right version is not in most distros
>> 
>> Great suggestion.  I'm using the rdma-core from the MLNX_OFED
>> 4.6-3.5.8.0 install.  I can't figure out how to tell what version 
>> that thing
includes,
>> even looking at the source, since there's no version information in 
>> the source files, BUT I went to github and downloaded rdma-core v24 
>> and v25 and neither diff cleanly with the source RPM that comes in 
>> the OFED install.  I don't know yet if it's because this is some 
>> different version or if it's because Mellanox has made their own tweaks.
>> 
>> I would hope that the very latest OFED from Mellanox would include an 
>> up-to-date and working set of libs/modules, but maybe you're on to 
>> something.  It sounds like a risky move, but maybe I'll try just 
>> installing rdma-core from github over top of the OFED install.  I 
>> have a fear that I'll end up with inconsistent versions, but it's 
>> worth a
try.
>> 
>> Thanks,
>> --Jim
 


^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [dpdk-users] DPDK on Mellanox BlueField Ref Platform
  2019-10-07 17:00             ` Kiran Vedere
  2019-10-07 17:02               ` Kiran Vedere
@ 2019-10-07 18:10               ` Jim Vaigl
  1 sibling, 0 replies; 12+ messages in thread
From: Jim Vaigl @ 2019-10-07 18:10 UTC (permalink / raw)
  To: 'Kiran Vedere', 'Asaf Penso',
	'Stephen Hemminger'
  Cc: users, 'Erez Ferber', 'Olga Shern',
	'Danny Vogel'

That did it: I wasn't doing the arithmetic right.  With 8K 2 MB hugepages
I get no memory errors, so I'm good now.  I can't thank you enough for
all the help.
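
For the record, the arithmetic I was getting wrong:

    mbuf pool:  344064 mbufs x 16384 B ~= 5.6 GB (plus mempool overhead)
    before:       2048 x 2 MB hugepages =  4 GB  -> not enough
    now:          8192 x 2 MB hugepages = 16 GB  -> plenty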

To recap this topic for anyone following:

1) The combination of versions I was using at the beginning (and
   doubting) was almost certainly just fine for use on the BlueField
   reference platform:

     CentOS 7.4.1708
     Kernel 4.14.139
     MLNX_OFED 4.6-3.5.8
     DPDK 19.08

2) The combination of versions Kiran provided to me is also
   fine:

     CentOS 7.6
     Kernel 4.20
     MLNX_OFED 4.6-3.5.8
     DPDK 19.08

3)  If in doubt about your own cobbled-together environment, ask
    your friendly Mellanox support agent to provide a known-good
    .bfb image to compare behavior against.

4)  The core dumps I was seeing were a result of my running the
    testpmd/l3fwd tools with a large MTU but not telling the tools
    about it.  Accounting for this properly solves my issue; a rough
    sketch of what that means at the API level is below.
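
For anyone following along at the API level, here is a rough sketch
(illustrative only, not the actual testpmd/l3fwd code; the function name and
values are placeholders) of what telling DPDK 19.08 about the large MTU
amounts to:

    #include <rte_ethdev.h>

    #define JUMBO_MAX_PKT_LEN 9000

    /* Rough equivalent of testpmd's --max-pkt-len=9000 or l3fwd's
     * --enable-jumbo --max-pkt-len=9000 for one port. */
    static int
    configure_jumbo_port(uint16_t port_id, uint16_t nb_rxq, uint16_t nb_txq)
    {
            struct rte_eth_conf conf = { 0 };

            /* Let the Rx queues accept frames up to 9000 bytes. */
            conf.rxmode.max_rx_pkt_len = JUMBO_MAX_PKT_LEN;
            conf.rxmode.offloads |= DEV_RX_OFFLOAD_JUMBO_FRAME;
            /* If the mbuf data room is smaller than the frame, Rx scatter
             * would also be needed:
             * conf.rxmode.offloads |= DEV_RX_OFFLOAD_SCATTER; */

            return rte_eth_dev_configure(port_id, nb_rxq, nb_txq, &conf);
    }

The mbuf pool still has to be sized for the frame (or scatter enabled), as
Kiran explained earlier in the thread.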

Thanks all,
--Jim

-----Original Message-----
From: Kiran Vedere [mailto:kiranv@mellanox.com] 
Sent: Monday, October 07, 2019 1:01 PM
To: Jim Vaigl; Asaf Penso; 'Stephen Hemminger'
Cc: users@dpdk.org; Erez Ferber; Olga Shern; Danny Vogel
Subject: RE: [dpdk-users] DPDK on Mellanox BlueField Ref Platform

Hi Jim,

Looks like n=344064, size=16384 exceeds 5 G. I used 4K 2M Pages (so that's
8G). Can you try with that? You can use more hugepages (8K for ex) as well
just to be on safeside or reduce the max-pkt-len to little over 9000 (9216
maybe) and give it a try?

Regards,
Kiran

-----Original Message-----
From: Jim Vaigl <jimv@rockbridgesoftware.com> 
Sent: Monday, October 7, 2019 12:52 PM
To: Kiran Vedere <kiranv@mellanox.com>; Asaf Penso <asafp@mellanox.com>;
'Stephen Hemminger' <stephen@networkplumber.org>
Cc: users@dpdk.org; Erez Ferber <erezf@mellanox.com>; Olga Shern
<olgas@mellanox.com>; Danny Vogel <dan@mellanoxfederal.com>
Subject: RE: [dpdk-users] DPDK on Mellanox BlueField Ref Platform

Hi Kiran,

When I try this command line with testpmd (with the -w just changed to my
port 0's PCIe address), I get "Creation of mbuf pool for socket
0 failed:  Cannot allocate memory".  I've tried adding --total-num-mbufs to
restrict that, but that didn't help.  It runs if I try restricting it to
just two cores, but then I drop most of my packets.  Here's the output
running it as you suggested:

    [root@localhost bin]# ./testpmd --log-level="mlx5,8" -l 3,4,5,6,7,8,
     9,10,11,12,13,14,15 -n 4 -w 0f:00.0 --socket-mem=2048 ---socket-num=0
    --burst=64 --txd=2048 --rxd=2048 --mbcache=512 --rxq=12 --txq=12
    --nb-cores=12 -i -a --forward-mode=mac --max-pkt-len=9000
    --mbuf-size=16384

    EAL: Detected 16 lcore(s)
    EAL: Detected 1 NUMA nodes
    EAL: Multi-process socket /var/run/dpdk/rte/mp_socket
    EAL: Selected IOVA mode 'PA'
    EAL: Probing VFIO support...
    EAL: VFIO support initialized
    EAL: PCI device 0000:0f:00.0 on NUMA socket -1
    EAL:   Invalid NUMA socket, default to 0
    EAL:   probe driver: 15b3:a2d2 net_mlx5
    net_mlx5: mlx5.c:2145: mlx5_pci_probe(): checking device "mlx5_1"
    net_mlx5: mlx5.c:2145: mlx5_pci_probe(): checking device "mlx5_0"
    net_mlx5: mlx5.c:2154: mlx5_pci_probe(): PCI information matches for
device "mlx5_0"
    net_mlx5: mlx5.c:2342: mlx5_pci_probe(): no E-Switch support detected
    net_mlx5: mlx5.c:1557: mlx5_dev_spawn(): naming Ethernet device
"0f:00.0"
    net_mlx5: mlx5.c:363: mlx5_alloc_shared_ibctx(): DevX is NOT supported
    net_mlx5: mlx5_mr.c:212: mlx5_mr_btree_init(): initialized B-tree
0x17fec8c68 with table     0x17fec60c0
    net_mlx5: mlx5.c:1610: mlx5_dev_spawn(): enhanced MPW is supported
    net_mlx5: mlx5.c:1623: mlx5_dev_spawn(): SWP support: 7
    net_mlx5: mlx5.c:1632: mlx5_dev_spawn():
min_single_stride_log_num_of_bytes: 6
    net_mlx5: mlx5.c:1634: mlx5_dev_spawn():
max_single_stride_log_num_of_bytes: 13
    net_mlx5: mlx5.c:1636: mlx5_dev_spawn():
min_single_wqe_log_num_of_strides: 3
    net_mlx5: mlx5.c:1638: mlx5_dev_spawn():
max_single_wqe_log_num_of_strides: 16
    net_mlx5: mlx5.c:1640: mlx5_dev_spawn():        supported_qpts: 256
    net_mlx5: mlx5.c:1641: mlx5_dev_spawn(): device supports Multi-Packet RQ
    net_mlx5: mlx5.c:1674: mlx5_dev_spawn(): tunnel offloading is supported
    net_mlx5: mlx5.c:1686: mlx5_dev_spawn(): MPLS over GRE/UDP tunnel
offloading is not     supported
    net_mlx5: mlx5.c:1783: mlx5_dev_spawn(): checksum offloading is
supported
    net_mlx5: mlx5.c:1803: mlx5_dev_spawn(): maximum Rx indirection table
size is 512
    net_mlx5: mlx5.c:1807: mlx5_dev_spawn(): VLAN stripping is supported
    net_mlx5: mlx5.c:1811: mlx5_dev_spawn(): FCS stripping configuration is
supported
    net_mlx5: mlx5.c:1840: mlx5_dev_spawn(): enhanced MPS is enabled
    net_mlx5: mlx5.c:1938: mlx5_dev_spawn(): port 0 MAC address is
50:6b:4b:e0:9a:22
    net_mlx5: mlx5.c:1945: mlx5_dev_spawn(): port 0 ifname is "enp15s0f0"
    net_mlx5: mlx5.c:1958: mlx5_dev_spawn(): port 0 MTU is 9000
    net_mlx5: mlx5.c:1980: mlx5_dev_spawn(): port 0 forcing Ethernet
interface up
    net_mlx5: mlx5.c:1356: mlx5_set_min_inline(): min tx inline configured:
0
    net_mlx5: mlx5_flow.c:377: mlx5_flow_discover_priorities(): port 0 flow
maximum     priority: 5
    Interactive-mode selected
    Auto-start selected
    Set mac packet forwarding mode
    testpmd: create a new mbuf pool <mbuf_pool_socket_0>: n=344064,
size=16384, socket=0
    testpmd: preferred mempool ops selected: ring_mp_mc
    EAL: Error - exiting with code: 1
      Cause: Creation of mbuf pool for socket 0 failed: Cannot allocate
memory

This is with 2048 2M hugepages defined, so I think I have plenty of memory
available.  I used dpdk-setup to set and verify the hugepages'
configuration and availability.  I'm trying to do some experiments to see if
I get to the bottom of this.

Any thoughts?

Regards,
--Jim

-----Original Message-----
From: Kiran Vedere [mailto:kiranv@mellanox.com]
Sent: Friday, October 04, 2019 2:28 PM
To: Jim Vaigl; Asaf Penso; 'Stephen Hemminger'
Cc: users@dpdk.org; Erez Ferber; Olga Shern; Danny Vogel
Subject: RE: [dpdk-users] DPDK on Mellanox BlueField Ref Platform

Hi Jim,

I tried your test with 9000 Byte MTU Size. On BlueField Reference Platform I
set the MTU of the interface to 9000 and on TRex I am sending 8096 size byte
packets. I am able to loop back packets fine w/o any issues. Below is the
command line I use for testpmd

./testpmd --log-level="mlx5,8" -l 3,4,5,6,7,8,9,10,11,12,13,14,15 -n 4 -w
17:00.0 --socket-mem=2048 -- --socket-num=0 --burst=64 --txd=2048 --rxd=2048
--mbcache=512 --rxq=12 --txq=12 --nb-cores=12 -i -a --forward-mode=mac
--max-pkt-len=9000 --mbuf-size=16384

Two things to consider: The max Rx packet len  is used by the PMD during its
Rx Queue initialization. By default this is set to 1518 Bytes for
testpmd/l3fwd. For jumbo frames you need to pass --max-pkt-len=9000 (for
testpmd) or --enable-jumbo --max-pkt-len=9000 (for l3fwd). Are you passing
these values to l3fwd/testpmd when you run your test? Also since the
mbuf_size is 2048 by default, you need to increase the mbuf_size to > Jumbo
frame size unless you enable scatter in the PMD. For testpmd you can
increase the mbuf size by using --mbuf-size parameter. For l3fwd I don't
think there is a command line option to increase mbuf size in runtime. So
you might need to recompile the l3fwd code to increase mbuf size. Are you
doing this?

Hope this helps.

Regards,
Kiran



-----Original Message-----
From: Jim Vaigl <jimv@rockbridgesoftware.com>
Sent: Friday, October 4, 2019 1:35 PM
To: Asaf Penso <asafp@mellanox.com>; 'Stephen Hemminger'
<stephen@networkplumber.org>
Cc: users@dpdk.org; Kiran Vedere <kiranv@mellanox.com>; Erez Ferber
<erezf@mellanox.com>; Olga Shern <olgas@mellanox.com>; Danny Vogel
<dan@mellanoxfederal.com>
Subject: RE: [dpdk-users] DPDK on Mellanox BlueField Ref Platform

A final update on this issue.  Kiran Vedere went above and beyond the call
of duty: he completely reproduced my hardware setup, showed that it worked
using trex to generate similar traffic to mine, and then provided me with a
bundled-up .bfb of his CentOS (with updated kernel) and OFED install to try
so that there would be no configuration stuff for me to mess up.

Using this, I saw exactly the same crashes I had seen in my setup.
After some thought, I realized the only meaningful difference was that my
traffic generator and IP configuration relied on an MTU size of 9000.
Once I set the MTU size down to 1500, the crashes stopped.

So, the answer is clearly that I'm just not setting up for the larger MTU
size.  I need to start to understand how to get DPDK to manage that, but the
crashing is at least understood now, and I have a way forward.

Thanks very much to Kiran.

Regards,
--Jim

-----Original Message-----
From: Jim Vaigl [mailto:jimv@rockbridgesoftware.com]
Sent: Thursday, September 26, 2019 3:47 PM
To: 'Asaf Penso'; 'Stephen Hemminger'
Cc: 'users@dpdk.org'; 'Kiran Vedere'; 'Erez Ferber'; 'Olga Shern'
Subject: RE: [dpdk-users] DPDK on Mellanox BlueField Ref Platform

> From: Asaf Penso [mailto:asafp@mellanox.com]
> Sent: Thursday, September 26, 2019 7:00 AM
> To: Jim Vaigl; 'Stephen Hemminger'
> Cc: users@dpdk.org; Kiran Vedere; Erez Ferber; Olga Shern
> Subject: RE: [dpdk-users] DPDK on Mellanox BlueField Ref Platform
>
> Hello Jim,
>
> Thanks for your mail.
> In order  for us to have a better resolution please send a mail to our
support team > - support@mellanox.com
> Please provide as much info about the setup, configuration etc as you can.
>
> In parallel, I added Erez Ferber here to assist.
>
> Regards,
> Asaf Penso

Thanks for the kind offer, Asaf.  I'll take this debug effort off-line with
you and Erez and post back to the list here later with any resolution so
everyone can see the result.

By the way, the prior suggestion of using v. 25 of rdma-core didn't pan out:
the current build script just makes a local build in a subdirectory off the
source tree and there's no obvious way to integrate it with the MLNX_OFED
environment and the dpdk install.  After resolving package dependencies to
get rdma-core to build from the GitHub repo, I realized the instructions say
this:

  ---
  Building
  This project uses a cmake based build system. Quick start:

  $ bash build.sh
  build/bin will contain the sample programs and build/lib
  will contain the shared libraries. The build is configured
  to run all the programs 'in-place' and cannot be installed.

  NOTE: It is not currently easy to run from the build
  directory, the plugins only load from the system path.
  ---

--Jim

>> -----Original Message-----
>> From: users <users-bounces@dpdk.org> On Behalf Of Jim Vaigl
>> Sent: Tuesday, September 24, 2019 10:11 PM
>> To: 'Stephen Hemminger' <stephen@networkplumber.org>
>> Cc: users@dpdk.org
>> Subject: Re: [dpdk-users] DPDK on Mellanox BlueField Ref Platform
>> 
>> On Tue, 24 Sep 2019 12:31:51 -0400
>> "Jim Vaigl" <jimv@rockbridgesoftware.com> wrote:
>> 
>>>> Since no one has chimed in with any build/install/configure 
>>>> suggestion
>> for
>> >> the
>> >> BlueField, I've spent some time debugging and thought I'd share 
>> >> the
>> results.
>> >> Building the l3fwd example application and running it as the docs
>> suggest,
>> >> when
>> >> I try to send it UDP packets from another machine, it dumps core.
>> >>
>> >> Debugging a bit with gdb and printf, I can see that from inside
>> >> process_packet()
>> >> and processx4_step1() the calls to rte_pktmbuf_mtod() return Nil 
>> >> or suspicious pointer values (i.e. 0x80).  The sample apps don't 
>> >> guard against NULL pointers being returned from this rte call, so 
>> >> that's why it's dumping core.
>> >>
>> >> I still think the problem is related to the driver config, but 
>> >> thought
>> this
>> >> might ring a bell for anyone who's had problems like this.
>> >>
>> >> The thing that still bothers me is that rather than seeing what I 
>> >> was expecting at init based on what the documentation shows:
>> >>     [...]
>> >>     EAL: probe driver: 15b3:1013 librte_pmd_mlx5
>> >>
>> >> ... when rte_eal_init() runs, I'm seeing:
>> >>     [...]
>> >>     EAL:  Selected IOVA mode 'PA'
>> >>     EAL:  Probing VFIO support...
>> >>
>> >> This still seems wrong, and I've verified that specifying the
BlueField
>> >> target ID
>> >> string in the make is causing "CONFIG_RTE_LIBRTE_MLX5_PMD=y" to
>> appear in
>> >> the .config.
>> >>
>> >> Regards,
>> >> --Jim Vaigl
>> >> 614 886 5999
>> >>
>> >>
>> >
>> >From: Stephen Hemminger [mailto:stephen@networkplumber.org]
>> >Sent: Tuesday, September 24, 2019 1:18 PM
>> >To: Jim Vaigl
>> >Cc: users@dpdk.org
>> >
>> >Subject: Re: [dpdk-users] DPDK on Mellanox BlueField Ref Platform 
>> >make sure you have latest version of rdma-core installed (v25).
>> >The right version is not in most distros
>> 
>> Great suggestion.  I'm using the rdma-core from the MLNX_OFED
>> 4.6-3.5.8.0 install.  I can't figure out how to tell what version 
>> that thing
includes,
>> even looking at the source, since there's no version information in 
>> the source files, BUT I went to github and downloaded rdma-core v24 
>> and v25 and neither diff cleanly with the source RPM that comes in 
>> the OFED install.  I don't know yet if it's because this is some 
>> different version or if it's because Mellanox has made their own tweaks.
>> 
>> I would hope that the very latest OFED from Mellanox would include an 
>> up-to-date and working set of libs/modules, but maybe you're on to 
>> something.  It sounds like a risky move, but maybe I'll try just 
>> installing rdma-core from github over top of the OFED install.  I 
>> have a fear that I'll end up with inconsistent versions, but it's 
>> worth a
try.
>> 
>> Thanks,
>> --Jim
 


^ permalink raw reply	[flat|nested] 12+ messages in thread

* [dpdk-users] DPDK on Mellanox BlueField Ref Platform
@ 2019-09-20 19:34 Jim Vaigl
  0 siblings, 0 replies; 12+ messages in thread
From: Jim Vaigl @ 2019-09-20 19:34 UTC (permalink / raw)
  To: users

I'm trying to get DPDK to run on a Mellanox BlueField Reference Platform.
For what it's worth, I'm using CentOS 7.4.1708, Kernel 4.14.139 built from
source, MLNX_OFED 4.6-3.5.8, and DPDK 19.08.  I can get stuff to build and
run, but I'm failing to receive packets.  Either I get none, or I get core
dumps.  I've become convinced it's at least partly because the wrong driver
is in use in the configuration I've created.

 

In https://doc.dpdk.org/guides/nics/mlx5.html, it says this:

    User space I/O kernel modules (uio and igb_uio) are not used
    and do not have to be loaded.

And the example output on that page shows:

    EAL: PCI device 0000:05:00.0 on NUMA socket 0
    EAL:   probe driver: 15b3:1013 librte_pmd_mlx5
    PMD: librte_pmd_mlx5: PCI information matches, using device "mlx5_0" (VF: false)
    PMD: librte_pmd_mlx5: 1 port(s) detected
    PMD: librte_pmd_mlx5: port 1 MAC address is e4:1d:2d:e7:0c:fe

...so, it's working and not using VFIO or igb_uio.  But in DPDK's
usertools/dpdk-setup.sh, I see this:

#
# Calls dpdk-devbind.py --status to show the devices and what they
# are all bound to, in terms of drivers.
#
show_devices()
{
       if [ -d /sys/module/vfio_pci -o -d /sys/module/igb_uio ]; then
              ${RTE_SDK}/usertools/dpdk-devbind.py --status
       else
              echo "# Please load the 'igb_uio' or 'vfio-pci' kernel module before "
              echo "# querying or adjusting device bindings"
       fi
}

So the setup script bails unless VFIO or igb_uio is loaded, but this doesn't
match the testpmd example output above, which shows something else active.
I've got a request in to Mellanox, but it's been weeks and the only feedback
so far is that 'the DPDK developer brought up DPDK on Bluefield and confirmed
operation'.

If anyone is aware of a combination of versions of CentOS, kernel, OFED, and
DPDK that works, and a self-consistent and compatible combination of
build/install options, I'd be grateful.  I've gone as far as digging through
the kernel .config and turning off VFIO and any built-in-looking support for
mlx5 or dpdk so that I can be sure the support is coming from the dpdk
modules, but this feels like guessing.

Thanks for any suggestions,

--Jim


^ permalink raw reply	[flat|nested] 12+ messages in thread

end of thread, other threads:[~2019-10-07 21:11 UTC | newest]

Thread overview: 12+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2019-09-24 16:31 [dpdk-users] DPDK on Mellanox BlueField Ref Platform Jim Vaigl
2019-09-24 17:18 ` Stephen Hemminger
2019-09-24 19:10   ` Jim Vaigl
2019-09-26 10:59     ` Asaf Penso
2019-09-26 19:46       ` Jim Vaigl
2019-10-04 17:35       ` Jim Vaigl
2019-10-04 18:28         ` Kiran Vedere
2019-10-07 16:52           ` Jim Vaigl
2019-10-07 17:00             ` Kiran Vedere
2019-10-07 17:02               ` Kiran Vedere
2019-10-07 18:10               ` Jim Vaigl
  -- strict thread matches above, loose matches on Subject: below --
2019-09-20 19:34 Jim Vaigl

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).