How to make DPDK processes tolerant of segmentation faults?

DPDK usage discussions

* How to make DPDK processes tolerant of segmentation faults?
From: Fuji Nafiul @ 2023-11-30  7:45 UTC
  To: users

In a normal C program, I saw that a segmentation fault in one loosely
coupled thread doesn't necessarily affect the other threads or the main
program. There, I can check all the threads by process ID at regular
intervals, and if one hits an unexpected segmentation fault or gets
killed, I can rerun it and it works fine. I can later check the logs
and inspect the situation.

But I saw that a segmentation fault or other unexpected error in a
function launched remotely (using DPDK) on a different core affects the
whole DPDK process, and the whole DPDK program crashes. Why is that?

Is there any alternative way to handle this scenario? What measures can
I take against unexpected future errors, so that DPDK remote processes
are rerun automatically in a live system?

* Re: How to make DPDK processes tolerant of segmentation faults?
From: Dmitry Kozlyuk @ 2023-11-30 16:24 UTC
  To: Fuji Nafiul; +Cc: users

2023-11-30 13:45 (UTC+0600), Fuji Nafiul:
> In a normal C program, I saw that a segmentation fault in one loosely
> coupled thread doesn't necessarily affect the other threads or the main
> program. There, I can check all the threads by process ID at regular
> intervals, and if one hits an unexpected segmentation fault or gets
> killed, I can rerun it and it works fine. I can later check the logs
> and inspect the situation.
> 
> But I saw that a segmentation fault or other unexpected error in a
> function launched remotely (using DPDK) on a different core affects the
> whole DPDK process, and the whole DPDK program crashes. Why is that?
> 
> Is there any alternative way to handle this scenario? What measures can
> I take against unexpected future errors, so that DPDK remote processes
> are rerun automatically in a live system?

Please consider running the buggy code that causes SIGSEGV
in a separate process rather than a thread.
If it must use DPDK, can it be made an independent app?

DPDK is unlikely to ever support the described scenario.
Continuing to run the process after SIGSEGV is inherently unsafe.
Specifically, DPDK communicates with its lcore threads
using pipes allocated at startup.
If such a thread crashed and a SIGSEGV handler that does not kill
the app were installed, the communication would hang.
More generally, DPDK employs user-space synchronization primitives,
which cannot recover if one of the threads using them crashes.
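
A minimal sketch of the separate-process approach suggested above, in C.
It is not DPDK code: the worker functions, run_isolated() and supervise()
are hypothetical names invented for illustration, and the crash is
simulated with raise(). The idea is only that a crash in a child process
is reported to the parent through waitpid(), so the parent survives and
can start a fresh child.

```c
#include <errno.h>
#include <signal.h>
#include <stdio.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical worker type: stands in for the code that may crash. */
typedef int (*worker_fn)(void);

/* Demo worker that simulates a segmentation fault. */
static int crashing_worker(void)
{
    raise(SIGSEGV);          /* default action terminates the child */
    return 1;
}

/* Demo worker that finishes normally. */
static int healthy_worker(void)
{
    return 0;
}

/*
 * Run `worker` in a child process and wait for it.
 * Returns the number of the signal that killed the child, 0 if the
 * child exited on its own, or -1 if fork() or waitpid() failed.
 */
static int run_isolated(worker_fn worker)
{
    pid_t pid = fork();
    if (pid < 0)
        return -1;
    if (pid == 0)
        _exit(worker());     /* child: never returns to the caller */

    int status;
    while (waitpid(pid, &status, 0) < 0) {
        if (errno != EINTR)  /* retry only if interrupted by a signal */
            return -1;
    }
    return WIFSIGNALED(status) ? WTERMSIG(status) : 0;
}

/*
 * Rerun `worker` for as long as it keeps being killed by a signal,
 * at most `max_restarts` times. Returns the number of restarts done.
 */
static int supervise(worker_fn worker, int max_restarts)
{
    int restarts = 0;
    while (run_isolated(worker) > 0 && restarts < max_restarts) {
        fprintf(stderr, "worker killed by a signal, restarting\n");
        restarts++;
    }
    return restarts;
}
```

Calling supervise(crashing_worker, 2) runs the worker three times in total
and returns 2. In a real deployment the child would more likely exec() an
independent DPDK application that performs its own EAL initialization,
rather than call a function in a forked copy of the parent.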

* Re: How to make DPDK processes tolerant of segmentation faults?
From: Stephen Hemminger @ 2023-11-30 21:19 UTC
  To: Dmitry Kozlyuk; +Cc: Fuji Nafiul, users

On Thu, 30 Nov 2023 19:24:01 +0300
Dmitry Kozlyuk <dmitry.kozliuk@gmail.com> wrote:

> Please consider running the buggy code that causes SIGSEGV
> in a separate process rather than a thread.
> If it must use DPDK, can it be made an independent app?
> 
> DPDK is unlikely to ever support the described scenario.
> Continuing to run the process after SIGSEGV is inherently unsafe.
> Specifically, DPDK communicates with its lcore threads
> using pipes allocated at startup.
> If such a thread crashed and a SIGSEGV handler that does not kill
> the app were installed, the communication would hang.
> More generally, DPDK employs user-space synchronization primitives,
> which cannot recover if one of the threads using them crashes.


A couple of things you can do:
  - Run your DPDK application as a systemd service, which will restart it
    when it crashes.
  - Catch SIGSEGV in the application and print a backtrace, then abort.
