* Should we try to be more graceful in library init on old Hardware?
From: Christian Ehrhardt @ 2023-03-30 12:53 UTC
To: dev, Luca Boccassi

Hi,
I've recently received a kind of bug report I had been expecting for many years.
In fact, I wondered whether it would still come up, as each year made it less likely.
But it happened: I got a crash report from someone using DPDK on rather old,
pre-SSE4.2 hardware.
=> https://bugs.launchpad.net/ubuntu/+source/dpdk/+bug/2009635/comments/9

The reporter was nice and tried the newer 22.11, but that is just as affected.

I understand that DPDK, as a project, has set SSE4.2 as the minimum accepted
hardware capability.
But some programs - in this case UHD - can do many other things, so UHD or
any other program may merely link to DPDK (because it could be used with it)
and, because of that, crash when loading. In theory, other tools such as
collectd, which has DPDK support, would be affected in the same way.

Example:
root@1bee22d20ca0:/# uhd_usrp_probe
Illegal instruction (core dumped)

(gdb) bt
#0 0x00007f4b2d3a3374 in rte_srand () from /lib/x86_64-linux-gnu/librte_eal.so.23
#1 0x00007f4b2d3967ec in ?? () from /lib/x86_64-linux-gnu/librte_eal.so.23
#2 0x00007f4b2e5d1fbe in call_init (l=<optimized out>, argc=argc@entry=1, argv=argv@entry=0x7ffeabf5b488, env=env@entry=0x7ffeabf5b498) at ./elf/dl-init.c:70
#3 0x00007f4b2e5d20a8 in call_init (env=0x7ffeabf5b498, argv=0x7ffeabf5b488, argc=1, l=<optimized out>) at ./elf/dl-init.c:33
#4 _dl_init (main_map=0x7f4b2e6042e0, argc=1, argv=0x7ffeabf5b488, env=0x7ffeabf5b498) at ./elf/dl-init.c:117
#5 0x00007f4b2e5ea8b0 in _dl_start_user () from /lib64/ld-linux-x86-64.so.2
#6 0x0000000000000001 in ?? ()
#7 0x00007ffeabf5c844 in ?? ()
#8 0x0000000000000000 in ?? ()

Right now all we could do is:
a) say "bad luck, old hardware" (not nice)
b) make super complex alternative builds with and without DPDK support
c) ask the DPDK project to work on non-SSE4.2 hardware (unlikely and too late
   in 2023, I guess)
d) somehow make the initialization graceful (that is what this RFC is about)

Suppose DPDK could ensure that the library loading paths are free of SSE4.2
instructions. Then we could check the capabilities during the actual
initialization and return a proper error instead of crashing.
That way, only real users of DPDK would be required to have sufficiently new
hardware. And OTOH users of software that links to DPDK, but in its current
configuration would not use it, would suffer less.

WDYT?
Maybe it has already been discussed and I neither remembered nor found it?

--
Christian Ehrhardt
Senior Staff Engineer, Ubuntu Server
Canonical Ltd
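
As a rough illustration of option d), here is a minimal sketch in C, assuming GCC or Clang on x86. The name `mylib_init()` is hypothetical and only stands in for an explicit init entry point; it is not DPDK code. The point is that the capability check runs when the caller asks for initialization, and a failure comes back as an error code rather than as SIGILL:

```c
#include <errno.h>
#include <stdio.h>

/* Hypothetical stand-in for a library's explicit init entry point. */
int mylib_init(void)
{
#if defined(__GNUC__) && (defined(__x86_64__) || defined(__i386__))
    /* GCC/Clang builtin: asks the running CPU whether it has SSE4.2. */
    if (!__builtin_cpu_supports("sse4.2")) {
        fprintf(stderr, "mylib: CPU lacks SSE4.2, refusing to initialize\n");
        return -ENOTSUP;
    }
#endif
    /* ... the real, SSE4.2-dependent initialization would follow here ... */
    return 0;
}
```

This only helps if nothing on the library load path (constructors included) has already executed an SSE4.2 instruction, which is exactly the condition the proposal asks for.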
* Re: Should we try to be more graceful in library init on old Hardware?
From: Bruce Richardson @ 2023-03-30 13:15 UTC
To: Christian Ehrhardt; +Cc: dev, Luca Boccassi

On Thu, Mar 30, 2023 at 02:53:41PM +0200, Christian Ehrhardt wrote:
> [...]
> If we could manage to get that DPDK to ensure the lib loading paths
> are SSE4.2 free.
> Then we could check the capabilities on the actual initialization and
> return a proper bad result instead of a crash.
> Due to that only real-users of DPDK would be required to have
> sufficiently new hardware.
> And OTOH users of software that links, but in the current config would
> not use DPDK would suffer less.
>
> WDYT?
> Maybe it has been already discussed and I did neither remember nor find it?
>

It certainly hasn't been discussed previously, but there is meant to be
support for this in EAL init itself. Almost the first function called
from eal_init() is "rte_cpu_is_supported()" [1] which checks the build-time
CPU flags against those of the current system.
Unfortunately, from the error message you are getting, that doesn't seem to
be working ok in the case of SSE4.2. It seems the compiler is inserting
SSE4 instructions before we even get to that point. :-(

Perhaps we need to move eal init to a new file, and compile it (and the
cpuflag checks) with very minimal CPU flags.

/Bruce

[1] http://git.dpdk.org/dpdk/tree/lib/eal/common/eal_common_cpuflags.c
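
The idea of checking build-time CPU flags against the running system could be sketched roughly as follows. This is not the code in eal_common_cpuflags.c: `build_flags_supported()` and the `REQUIRE` macro are hypothetical names, and the sketch assumes GCC or Clang on x86, where predefined macros such as `__SSE4_2__` reflect the ISA extensions enabled at build time.

```c
#include <stdio.h>

/* Hypothetical helper: returns 1 if every ISA extension the compiler was
 * allowed to use at build time is also present on the running CPU. */
int build_flags_supported(void)
{
    int ok = 1;

/* __builtin_cpu_supports() needs a string literal, hence a macro. */
#define REQUIRE(feature)                                         \
    do {                                                         \
        if (!__builtin_cpu_supports(feature)) {                  \
            fprintf(stderr, "CPU lacks %s\n", feature);          \
            ok = 0;                                              \
        }                                                        \
    } while (0)

#if defined(__GNUC__) && (defined(__x86_64__) || defined(__i386__))
#ifdef __SSE4_2__
    REQUIRE("sse4.2");   /* build was allowed to emit SSE4.2 */
#endif
#ifdef __AVX__
    REQUIRE("avx");      /* build was allowed to emit AVX */
#endif
#ifdef __AVX2__
    REQUIRE("avx2");     /* build was allowed to emit AVX2 */
#endif
#endif

#undef REQUIRE
    return ok;
}
```

As the thread goes on to note, such a check is only useful if the checking function itself, and everything executed before it, is compiled without the instructions being tested for.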
* Re: Should we try to be more graceful in library init on old Hardware?
From: Bruce Richardson @ 2023-03-30 13:28 UTC
To: Christian Ehrhardt; +Cc: dev, Luca Boccassi

On Thu, Mar 30, 2023 at 02:15:42PM +0100, Bruce Richardson wrote:
> [...]
> Perhaps we need to move eal init to a new file, and compile it (and the
> cpuflag checks) with very minimal CPU flags.
>

Following up to my own mail...

I believe we may be able to solve this more easily by using the "target"
attribute for those functions. For x86 builds I don't see why eal init
cannot be compiled for an earlier SSE version (march=core2, perhaps). It's
not a performance-sensitive function.

Thoughts?
/Bruce
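
A minimal sketch of the "target" attribute idea, assuming GCC or Clang on x86. `INIT_BASELINE` and `checked_init()` are hypothetical names, not DPDK identifiers. The sketch uses `target("no-sse4.2")` to keep SSE4.2 out of one function even when the rest of the build enables it; whether `target("arch=core2")` would give the same effect on top of a higher `-march` setting is something that would need checking against the compiler in use.

```c
#include <errno.h>

#if defined(__GNUC__) && (defined(__x86_64__) || defined(__i386__))
/* Per-function override: do not generate SSE4.2 code for this function. */
#define INIT_BASELINE __attribute__((target("no-sse4.2")))
#else
#define INIT_BASELINE
#endif

/* Hypothetical init function that is safe to enter on a pre-SSE4.2 CPU. */
INIT_BASELINE int checked_init(void)
{
#if defined(__GNUC__) && (defined(__x86_64__) || defined(__i386__))
    if (!__builtin_cpu_supports("sse4.2"))
        return -ENOTSUP;   /* report the problem instead of crashing */
#endif
    /* ... hand over to the normally compiled initialization code ... */
    return 0;
}
```

The attribute covers only the function it is attached to; anything that function calls is still compiled with the flags of its own definition.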
* Re: Should we try to be more graceful in library init on old Hardware?
From: Dmitry Kozlyuk @ 2023-03-30 14:31 UTC
To: Bruce Richardson; +Cc: Christian Ehrhardt, dev, Luca Boccassi

2023-03-30 14:28 (UTC+0100), Bruce Richardson:
> [...]
> I believe we may be able to solve this easier by maybe using the "target"
> attribute for those functions. For x86 builds I don't see why eal init
> cannot be compiled for an earlier SSE version, (march=core2, perhaps). It's
> not a performance-sensitive function.
>
> Thoughts?
> /Bruce

The error originates from some RTE_INIT() routine called on library load.
These routines can also be augmented with the "target" attribute and a check
before calling the actual code supplied by the DPDK developer.
The latter is needed because we can't ensure (systematically)
that this code doesn't call some external function that uses SSE4.2.

As for rte_eal_init(), I think the check there is enough, with one big "if":
main() must also be compiled for the generic CPU to get there.
So app developers can't be completely freed from thinking about this.

BTW, rte_cpu_is_supported() itself is not protected
against being compiled into unsupported instructions :)
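
The suggestion for load-time constructors could be sketched as below, assuming GCC or Clang on x86. `GUARDED_INIT`, `BASELINE_ISA`, `cpu_has_required_isa()` and `example_init_done` are hypothetical names; the macro is written in the spirit of RTE_INIT() but is not DPDK's definition. The constructor itself is built without SSE4.2 and calls the developer-supplied body only after the CPU check passes:

```c
#if defined(__GNUC__) && (defined(__x86_64__) || defined(__i386__))
#define BASELINE_ISA __attribute__((target("no-sse4.2")))
#else
#define BASELINE_ISA
#endif

/* Hypothetical helper: 1 if the CPU has what the library needs, else 0. */
BASELINE_ISA int cpu_has_required_isa(void)
{
#if defined(__GNUC__) && (defined(__x86_64__) || defined(__i386__))
    __builtin_cpu_init();  /* required when running this early in startup */
    return __builtin_cpu_supports("sse4.2") ? 1 : 0;
#else
    return 1;
#endif
}

/* Hypothetical constructor macro: the wrapper is baseline code, the body
 * keeps the normal build flags and is skipped on unsupported CPUs. */
#define GUARDED_INIT(name)                                            \
    static void name##_body(void);                                    \
    BASELINE_ISA __attribute__((constructor)) static void name(void)  \
    {                                                                 \
        if (cpu_has_required_isa())                                   \
            name##_body();                                            \
    }                                                                 \
    __attribute__((noinline)) static void name##_body(void)

int example_init_done;  /* set by the guarded body below */

GUARDED_INIT(example_init)
{
    /* Developer-supplied code; may freely use SSE4.2 or call code that does. */
    example_init_done = 1;
}
```

On a CPU without SSE4.2 the body is skipped, so the library loads without crashing but its constructors have not run; the explicit init call would then still have to refuse to proceed.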