DPDK patches and discussions
* Should we try to be more graceful in library init on old Hardware?
@ 2023-03-30 12:53 Christian Ehrhardt
  2023-03-30 13:15 ` Bruce Richardson
  0 siblings, 1 reply; 4+ messages in thread
From: Christian Ehrhardt @ 2023-03-30 12:53 UTC (permalink / raw)
  To: dev, Luca Boccassi

Hi,
I recently got the kind of bug report I had been expecting for many years.
In fact, I wondered whether it would ever come up, as each passing year made it less likely.
But it happened: I got a crash report from someone running DPDK on
rather old, pre-SSE4.2 hardware.
=> https://bugs.launchpad.net/ubuntu/+source/dpdk/+bug/2009635/comments/9

The reporter kindly tried the newer 22.11 as well, but it is affected in the same way.

I understand that DPDK, as a project, has set SSE4.2 as the minimum
supported hardware capability.
But some programs (UHD in this case) can do many other things and merely
link against DPDK because they can optionally use it. Such a program
crashes while loading the library, even if its current configuration never
uses DPDK. In theory, other tools with DPDK support, such as collectd,
would be affected in the same way.

Example:
root@1bee22d20ca0:/# uhd_usrp_probe
Illegal instruction (core dumped)

(gdb) bt
#0 0x00007f4b2d3a3374 in rte_srand () from /lib/x86_64-linux-gnu/librte_eal.so.23
#1 0x00007f4b2d3967ec in ?? () from /lib/x86_64-linux-gnu/librte_eal.so.23
#2 0x00007f4b2e5d1fbe in call_init (l=<optimized out>, argc=argc@entry=1, argv=argv@entry=0x7ffeabf5b488, env=env@entry=0x7ffeabf5b498) at ./elf/dl-init.c:70
#3 0x00007f4b2e5d20a8 in call_init (env=0x7ffeabf5b498, argv=0x7ffeabf5b488, argc=1, l=<optimized out>) at ./elf/dl-init.c:33
#4 _dl_init (main_map=0x7f4b2e6042e0, argc=1, argv=0x7ffeabf5b488, env=0x7ffeabf5b498) at ./elf/dl-init.c:117
#5 0x00007f4b2e5ea8b0 in _dl_start_user () from /lib64/ld-linux-x86-64.so.2
#6 0x0000000000000001 in ?? ()
#7 0x00007ffeabf5c844 in ?? ()
#8 0x0000000000000000 in ?? ()

Right now our options are:
a) tell users of old hardware they are out of luck (not nice)
b) maintain complex alternative builds with and without DPDK support
c) ask the DPDK project to support non-SSE4.2 hardware (unlikely, and too
late in 2023, I guess)
d) somehow make the initialization graceful (that is what this RFC is about)

If DPDK could ensure that the library loading paths are free of SSE4.2
instructions, we could check the CPU capabilities during the actual
initialization and return a proper error instead of crashing.
Then only real users of DPDK would need sufficiently new hardware,
while users of software that links against DPDK but does not use it in
the current configuration would suffer less.
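
To make option d) concrete, here is a minimal sketch of what it could look
like from the application side, assuming GCC or Clang. All names
(example_open_transport, example_dpdk_backend_init, example_cpu_has_sse42)
are hypothetical and not part of UHD or DPDK. It only helps if the library
load path itself executes no SSE4.2 instructions before this code runs.

```c
#include <errno.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: 1 if the running CPU reports SSE4.2, else 0. */
static int example_cpu_has_sse42(void)
{
#if defined(__GNUC__) && (defined(__x86_64__) || defined(__i386__))
    __builtin_cpu_init();
    return __builtin_cpu_supports("sse4.2") ? 1 : 0;
#else
    return 1; /* not x86: this particular check does not apply */
#endif
}

/* Hypothetical DPDK-backed init: fails cleanly instead of crashing. */
static int example_dpdk_backend_init(void)
{
    if (!example_cpu_has_sse42()) {
        fprintf(stderr, "example: CPU lacks SSE4.2, DPDK transport unavailable\n");
        return -ENOTSUP;
    }
    /* A real program would call rte_eal_init() here. */
    return 0;
}

/* Only a caller that actually asks for DPDK reaches the CPU check. */
int example_open_transport(const char *kind)
{
    if (strcmp(kind, "dpdk") == 0)
        return example_dpdk_backend_init();
    return 0; /* other transports never touch DPDK */
}
```

With this shape, example_open_transport("udp") succeeds on any CPU, and
example_open_transport("dpdk") returns -ENOTSUP on pre-SSE4.2 hardware
rather than dying with SIGILL.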

WDYT?
Maybe this has already been discussed and I neither remembered nor found it?

-- 
Christian Ehrhardt
Senior Staff Engineer, Ubuntu Server
Canonical Ltd

* Re: Should we try to be more graceful in library init on old Hardware?
  2023-03-30 12:53 Should we try to be more graceful in library init on old Hardware? Christian Ehrhardt
@ 2023-03-30 13:15 ` Bruce Richardson
  2023-03-30 13:28   ` Bruce Richardson
  0 siblings, 1 reply; 4+ messages in thread
From: Bruce Richardson @ 2023-03-30 13:15 UTC (permalink / raw)
  To: Christian Ehrhardt; +Cc: dev, Luca Boccassi

On Thu, Mar 30, 2023 at 02:53:41PM +0200, Christian Ehrhardt wrote:
> [...]
> WDYT?
> Maybe this has already been discussed and I neither remembered nor found it?
> 
It certainly hasn't been discussed previously, but there is meant to be
support for this in EAL init itself. Almost the first function called
from eal_init() is "rte_cpu_is_supported()" [1] which checks the build-time
CPU flags against those of the current system.
Unfortunately, from the error message you are getting, that doesn't seem to
be working ok in the case of SSE4.2. It seems the compiler is inserting
SSE4 instructions before we even get to that point. :-(

Perhaps we need to move eal init to a new file, and compile it (and the
cpuflag checks) with very minimal CPU flags.
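
For illustration, the idea of checking build-time CPU flags against the
running system can be sketched as below. This is not the DPDK
implementation; the function names and the ECX_* constants are made up for
the example, and only the CPUID leaf 1 ECX bit positions and the compiler's
predefined __SSE*__ macros are taken as given. As discussed above, such a
check is only useful if it is itself compiled with minimal CPU flags.

```c
#include <stdint.h>
#if defined(__x86_64__) || defined(__i386__)
#include <cpuid.h>
#endif

/* CPUID leaf 1, ECX feature bits. */
#define ECX_SSE3   (UINT32_C(1) << 0)
#define ECX_SSSE3  (UINT32_C(1) << 9)
#define ECX_SSE4_1 (UINT32_C(1) << 19)
#define ECX_SSE4_2 (UINT32_C(1) << 20)
#define ECX_POPCNT (UINT32_C(1) << 23)

/* Features the compiler was allowed to use for this translation unit. */
static uint32_t build_time_features(void)
{
    uint32_t f = 0;
#ifdef __SSE3__
    f |= ECX_SSE3;
#endif
#ifdef __SSSE3__
    f |= ECX_SSSE3;
#endif
#ifdef __SSE4_1__
    f |= ECX_SSE4_1;
#endif
#ifdef __SSE4_2__
    f |= ECX_SSE4_2;
#endif
#ifdef __POPCNT__
    f |= ECX_POPCNT;
#endif
    return f;
}

/* Features the running CPU reports; 0 if CPUID leaf 1 is unavailable. */
static uint32_t run_time_features(void)
{
#if defined(__x86_64__) || defined(__i386__)
    unsigned int eax, ebx, ecx, edx;
    if (__get_cpuid(1, &eax, &ebx, &ecx, &edx))
        return ecx;
#endif
    return 0;
}

/* Bits that were required at build time but are absent at run time. */
static uint32_t missing_features(uint32_t required, uint32_t available)
{
    return required & ~available;
}

/* Returns 1 if every build-time feature is present on this CPU. */
int example_cpu_is_supported(void)
{
    return missing_features(build_time_features(), run_time_features()) == 0;
}
```

A binary built with SSE4.2 enabled and run on a CPU without it would get
ECX_SSE4_2 back from missing_features(), provided execution reaches the
check at all.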

/Bruce


[1] http://git.dpdk.org/dpdk/tree/lib/eal/common/eal_common_cpuflags.c

* Re: Should we try to be more graceful in library init on old Hardware?
  2023-03-30 13:15 ` Bruce Richardson
@ 2023-03-30 13:28   ` Bruce Richardson
  2023-03-30 14:31     ` Dmitry Kozlyuk
  0 siblings, 1 reply; 4+ messages in thread
From: Bruce Richardson @ 2023-03-30 13:28 UTC (permalink / raw)
  To: Christian Ehrhardt; +Cc: dev, Luca Boccassi

On Thu, Mar 30, 2023 at 02:15:42PM +0100, Bruce Richardson wrote:
> On Thu, Mar 30, 2023 at 02:53:41PM +0200, Christian Ehrhardt wrote:
> > [...]
> It certainly hasn't been discussed previously, but there is meant to be
> support for this in EAL init itself. Almost the first function called
> from eal_init() is "rte_cpu_is_supported()" [1] which checks the build-time
> CPU flags against those of the current system.
> Unfortunately, from the error message you are getting, that doesn't seem to
> be working ok in the case of SSE4.2. It seems the compiler is inserting
> SSE4 instructions before we even get to that point. :-(
> 
> Perhaps we need to move eal init to a new file, and compile it (and the
> cpuflag checks) with very minimal CPU flags.
> 

Following up to my own mail...

I believe we may be able to solve this more easily by using the "target"
attribute for those functions. For x86 builds I don't see why eal init
cannot be compiled for an earlier SSE version (-march=core2, perhaps). It's
not a performance-sensitive function.
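
A rough sketch of the attribute approach, assuming GCC or Clang on x86. The
macro and function names are hypothetical. It uses the "no-sse4.1,no-sse4.2"
form of the target attribute; GCC also documents an "arch=" form, which
would be closer to the -march=core2 idea. Other extensions the build enables
(POPCNT, for instance) would presumably need the same treatment.

```c
#include <errno.h>
#include <stdio.h>

#if defined(__GNUC__) && (defined(__x86_64__) || defined(__i386__))
/* Keep SSE4.1/SSE4.2 out of the annotated function, whatever -march
 * the rest of the file is built with. */
#define EXAMPLE_NO_SSE4 __attribute__((target("no-sse4.1,no-sse4.2")))
#define EXAMPLE_CPU_HAS_SSE42() (__builtin_cpu_supports("sse4.2") != 0)
#else
#define EXAMPLE_NO_SSE4
#define EXAMPLE_CPU_HAS_SSE42() 1
#endif

/* Hypothetical stand-in for an early init function. */
EXAMPLE_NO_SSE4
int example_early_init(void)
{
    if (!EXAMPLE_CPU_HAS_SSE42()) {
        fputs("example: SSE4.2 not available, refusing to initialize\n", stderr);
        return -ENOTSUP;
    }
    /* From here on it is safe to call code built with SSE4.2. */
    return 0;
}
```

The attribute only covers the annotated function. Anything it calls that is
built with the default flags can still contain SSE4.2 instructions, so the
check has to come before any such call.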

Thoughts?
/Bruce

* Re: Should we try to be more graceful in library init on old Hardware?
  2023-03-30 13:28   ` Bruce Richardson
@ 2023-03-30 14:31     ` Dmitry Kozlyuk
  0 siblings, 0 replies; 4+ messages in thread
From: Dmitry Kozlyuk @ 2023-03-30 14:31 UTC (permalink / raw)
  To: Bruce Richardson; +Cc: Christian Ehrhardt, dev, Luca Boccassi

2023-03-30 14:28 (UTC+0100), Bruce Richardson:
> On Thu, Mar 30, 2023 at 02:15:42PM +0100, Bruce Richardson wrote:
> > On Thu, Mar 30, 2023 at 02:53:41PM +0200, Christian Ehrhardt wrote:  
> > > [...]
> > It certainly hasn't been discussed previously, but there is meant to be
> > support for this in EAL init itself. Almost the first function called
> > from eal_init() is "rte_cpu_is_supported()" [1] which checks the build-time
> > CPU flags against those of the current system.
> > Unfortunately, from the error message you are getting, that doesn't seem to
> > be working ok in the case of SSE4.2. It seems the compiler is inserting
> > SSE4 instructions before we even get to that point. :-(
> > 
> > Perhaps we need to move eal init to a new file, and compile it (and the
> > cpuflag checks) with very minimal CPU flags.
> >   
> 
> Following up to my own mail...
> 
> I believe we may be able to solve this easier by maybe using the "target"
> attribute for those functions. For x86 builds I don't see why eal init
> cannot be compiled for an earlier SSE version, (march=core2, perhaps). It's
> not a performance-sensitive function.
> 
> Thoughts?
> /Bruce

The error originates from some RTE_INIT() routine called on library load.
These routines can also be augmented with the "target" attribute
and a CPU check before calling the actual code supplied by the DPDK developer.
The check is needed because we can't systematically ensure
that this code doesn't call some external function that uses SSE4.2.
As for rte_eal_init(), I think the check there is enough, with one big "if":
main() must also be compiled for a generic CPU to get that far.
So app developers can't be completely freed from thinking about this.
BTW, rte_cpu_is_supported() itself is not protected
against being compiled into unsupported instructions :)
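
A guarded constructor along these lines might look like the sketch below,
assuming GCC or Clang. EXAMPLE_GUARDED_INIT is a made-up macro, not the real
RTE_INIT(), and the SSE4.2 check stands in for whatever set of features the
build actually requires. The wrapper is compiled without SSE4.1/SSE4.2 and
calls the developer-supplied body only if the CPU check passes.

```c
#include <stdio.h>

#if defined(__GNUC__) && (defined(__x86_64__) || defined(__i386__))
#define EXAMPLE_NO_SSE4    __attribute__((target("no-sse4.1,no-sse4.2")))
#define EXAMPLE_CPU_INIT() __builtin_cpu_init()
#define EXAMPLE_CPU_OK()   (__builtin_cpu_supports("sse4.2") != 0)
#else
#define EXAMPLE_NO_SSE4
#define EXAMPLE_CPU_INIT() ((void)0)
#define EXAMPLE_CPU_OK()   1
#endif

/* Defines a load-time constructor that runs the following body only
 * on a CPU that passes the check; otherwise the body is skipped. */
#define EXAMPLE_GUARDED_INIT(name)                                        \
    static void name##_body(void);                                        \
    static void EXAMPLE_NO_SSE4 __attribute__((constructor, used))        \
    name(void)                                                            \
    {                                                                     \
        EXAMPLE_CPU_INIT();                                               \
        if (EXAMPLE_CPU_OK())                                             \
            name##_body();                                                \
    }                                                                     \
    static void name##_body(void)

/* Counts constructor bodies that actually ran (illustration only). */
static int example_init_calls;

/* Hypothetical init routine; its body may be built with SSE4.2. */
EXAMPLE_GUARDED_INIT(example_rand_init)
{
    example_init_calls++;
}
```

On an unsupported CPU the body never runs, so later entry points still have
to report an error rather than assume the constructors did their work.
__builtin_cpu_init() is called explicitly because the wrapper may execute
before the compiler runtime's own constructors.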
