DPDK usage discussions
* [dpdk-users] Multi-process recovery (is it even possible?)
@ 2018-02-28 18:53 Lazarenko, Vlad (WorldQuant)
  2018-03-01  8:19 ` Tan, Jianfeng
  0 siblings, 1 reply; 4+ messages in thread
From: Lazarenko, Vlad (WorldQuant) @ 2018-02-28 18:53 UTC (permalink / raw)
  To: 'users@dpdk.org'

Guys,

I am looking for possible solutions to the following problems that come along with an asymmetric multi-process architecture...

Given that multiple processes share the same RX/TX queue(s) and packet pool(s), and that a single packet from an RX queue may be fanned out to multiple slave processes, is there a way to recover from a slave crashing (or exiting without cleaning up properly)? In theory it could have incremented an mbuf's reference count more than once, and unless everything is restarted, I don't see a reliable way to release those mbufs back to the pool.
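
For illustration, the fan-out I have in mind looks roughly like the sketch below (fan_out and slave_rings are made-up names; I assume per-slave rings, at least one slave, and single-segment mbufs):

#include <rte_mbuf.h>
#include <rte_ring.h>

/* One mbuf from the RX queue is handed to every slave ring, so its
 * reference count is bumped once per extra consumer. If a slave dies
 * before calling rte_pktmbuf_free(), those references are never dropped. */
static void
fan_out(struct rte_mbuf *m, struct rte_ring *slave_rings[], unsigned int n)
{
    unsigned int i;

    /* We already own one reference from rte_eth_rx_burst(). */
    rte_pktmbuf_refcnt_update(m, (int16_t)(n - 1));
    for (i = 0; i < n; i++) {
        if (rte_ring_enqueue(slave_rings[i], m) < 0)
            rte_pktmbuf_free(m); /* drop the reference meant for this slave */
    }
}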

Also, if a spinlock is involved and either the master or a slave crashes while holding it, everything simply gets stuck. Is there any way to detect this (i.e., outside of the data path)?
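
To make that failure mode concrete, here is a minimal sketch, assuming the lock lives in a shared memzone so every process sees the same state:

#include <rte_spinlock.h>

struct shared_state {
    rte_spinlock_t lock;   /* resides in a shared memzone */
    /* ... shared rings, counters, etc. ... */
};

static void
touch_shared(struct shared_state *s)
{
    rte_spinlock_lock(&s->lock);
    /* If this process is killed right here, the lock is never released,
     * and every peer now spins forever inside rte_spinlock_lock(). */
    rte_spinlock_unlock(&s->lock);
}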

Thanks,
Vlad


* Re: [dpdk-users] Multi-process recovery (is it even possible?)
  2018-02-28 18:53 [dpdk-users] Multi-process recovery (is it even possible?) Lazarenko, Vlad (WorldQuant)
@ 2018-03-01  8:19 ` Tan, Jianfeng
  2018-03-01 14:53   ` Lazarenko, Vlad (WorldQuant)
  0 siblings, 1 reply; 4+ messages in thread
From: Tan, Jianfeng @ 2018-03-01  8:19 UTC (permalink / raw)
  To: Lazarenko, Vlad (WorldQuant), 'users@dpdk.org'



> -----Original Message-----
> From: users [mailto:users-bounces@dpdk.org] On Behalf Of Lazarenko, Vlad
> (WorldQuant)
> Sent: Thursday, March 1, 2018 2:54 AM
> To: 'users@dpdk.org'
> Subject: [dpdk-users] Multi-process recovery (is it even possible?)
> 
> Guys,
> 
> I am looking for possible solutions to the following problems that come
> along with an asymmetric multi-process architecture...
> 
> Given that multiple processes share the same RX/TX queue(s) and packet
> pool(s), and that a single packet from an RX queue may be fanned out to
> multiple slave processes, is there a way to recover from a slave crashing
> (or exiting without cleaning up properly)? In theory it could have
> incremented an mbuf's reference count more than once, and unless
> everything is restarted, I don't see a reliable way to release those
> mbufs back to the pool.

Recycling an individual element is very difficult; from what I know, it's next to impossible.
Recycling a whole memzone/mempool is easier. So in your case, you might want to use different pools for different queues (processes).
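
A minimal sketch of that idea, assuming the primary creates one pool per secondary at startup (the sizes and naming scheme below are made up):

#include <stdio.h>
#include <rte_mbuf.h>
#include <rte_mempool.h>

/* One mbuf pool per secondary process: if that process dies, only its
 * own pool is poisoned, and the primary can tear it down and re-create
 * it without disturbing the other processes. */
static struct rte_mempool *
create_pool_for_secondary(unsigned int proc_id, int socket_id)
{
    char name[RTE_MEMPOOL_NAMESIZE];

    snprintf(name, sizeof(name), "mbuf_pool_%u", proc_id);
    return rte_pktmbuf_pool_create(name, 8192 /* mbufs */, 256 /* cache */,
            0 /* priv size */, RTE_MBUF_DEFAULT_BUF_SIZE, socket_id);
}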

If you really want to recycle an element, an rte_mbuf in your case, it might be doable by:
1. setting up an RX callback in each process, and in the callback storing a special per-process flag in rte_mbuf->udata64;
2. when the primary detects that a secondary is down, iterating over all elements carrying that flag and putting them back into the pool.

There is a small chance of failure: if an mbuf is allocated by a secondary process and the secondary crashes before the mbuf is flagged, it still leaks.
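
A sketch of both steps under those assumptions (the tag encoding and the use of rte_mempool_obj_iter() for the recovery walk are my own illustration, not a hardened recipe):

#include <rte_ethdev.h>
#include <rte_mbuf.h>
#include <rte_mempool.h>

#define OWNER_TAG(proc_id) (0xD00D000000000000ULL | (uint64_t)(proc_id))

/* Step 1: RX callback run in each process; tags every mbuf it receives. */
static uint16_t
tag_owner_cb(uint16_t port, uint16_t queue, struct rte_mbuf *pkts[],
        uint16_t nb_pkts, uint16_t max_pkts, void *user_param)
{
    uint64_t tag = *(const uint64_t *)user_param;
    uint16_t i;

    (void)port; (void)queue; (void)max_pkts;
    for (i = 0; i < nb_pkts; i++)
        pkts[i]->udata64 = tag;
    return nb_pkts;
}

/* Step 2: run by the primary once it decides a secondary is dead.
 * Caveat: rte_mempool_obj_iter() also visits objects already back in
 * the pool, so a real implementation must clear udata64 whenever an
 * mbuf is legitimately freed, or this walk may double-free stale objects. */
static void
reclaim_cb(struct rte_mempool *mp, void *opaque, void *obj, unsigned int idx)
{
    struct rte_mbuf *m = obj;
    uint64_t dead_tag = *(const uint64_t *)opaque;

    (void)mp; (void)idx;
    if (m->udata64 == dead_tag) {
        m->udata64 = 0;
        rte_pktmbuf_free(m);
    }
}

static void
reclaim_from_dead_secondary(struct rte_mempool *mp, uint64_t dead_tag)
{
    rte_mempool_obj_iter(mp, reclaim_cb, &dead_tag);
}

Each process would register the callback at startup, e.g. rte_eth_add_rx_callback(port, queue, tag_owner_cb, &my_tag).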

Thanks,
Jianfeng


> 
> Also, if a spinlock is involved and either the master or a slave crashes
> while holding it, everything simply gets stuck. Is there any way to detect
> this (i.e., outside of the data path)?
> 
> Thanks,
> Vlad


* Re: [dpdk-users] Multi-process recovery (is it even possible?)
  2018-03-01  8:19 ` Tan, Jianfeng
@ 2018-03-01 14:53   ` Lazarenko, Vlad (WorldQuant)
  2018-03-02  0:39     ` Tan, Jianfeng
  0 siblings, 1 reply; 4+ messages in thread
From: Lazarenko, Vlad (WorldQuant) @ 2018-03-01 14:53 UTC (permalink / raw)
  To: 'Tan, Jianfeng', 'users@dpdk.org'

Hello Jianfeng,

Thanks for getting back to me. I thought about using "udata64", too, but that didn't work for me when a single packet was fanned out to multiple slave processes. More importantly, it looks like if a slave process crashes somewhere in the middle of getting or putting packets from/to a pool, we could end up with a deadlock. So I guess I'll have to think about a different design or be ready to bounce all of the processes if one of them fails.

Thanks,
Vlad

> -----Original Message-----
> From: Tan, Jianfeng [mailto:jianfeng.tan@intel.com]
> Sent: Thursday, March 01, 2018 3:20 AM
> To: Lazarenko, Vlad (WorldQuant); 'users@dpdk.org'
> Subject: RE: Multi-process recovery (is it even possible?)
> 
> 
> 
> > -----Original Message-----
> > From: users [mailto:users-bounces@dpdk.org] On Behalf Of Lazarenko,
> > Vlad
> > (WorldQuant)
> > Sent: Thursday, March 1, 2018 2:54 AM
> > To: 'users@dpdk.org'
> > Subject: [dpdk-users] Multi-process recovery (is it even possible?)
> >
> > Guys,
> >
> > I am looking for possible solutions to the following problems that
> > come along with an asymmetric multi-process architecture...
> >
> > Given that multiple processes share the same RX/TX queue(s) and packet
> > pool(s), and that a single packet from an RX queue may be fanned out to
> > multiple slave processes, is there a way to recover from a slave
> > crashing (or exiting without cleaning up properly)? In theory it could
> > have incremented an mbuf's reference count more than once, and unless
> > everything is restarted, I don't see a reliable way to release those
> > mbufs back to the pool.
> 
> Recycling an individual element is very difficult; from what I know, it's
> next to impossible. Recycling a whole memzone/mempool is easier. So in
> your case, you might want to use different pools for different queues
> (processes).
> 
> If you really want to recycle an element, an rte_mbuf in your case, it
> might be doable by:
> 1. setting up an RX callback in each process, and in the callback storing
> a special per-process flag in rte_mbuf->udata64;
> 2. when the primary detects that a secondary is down, iterating over all
> elements carrying that flag and putting them back into the pool.
> 
> There is a small chance of failure: if an mbuf is allocated by a secondary
> process and the secondary crashes before the mbuf is flagged, it still
> leaks.
> 
> Thanks,
> Jianfeng
> 
> 
> >
> > Also, if a spinlock is involved and either the master or a slave
> > crashes while holding it, everything simply gets stuck. Is there any
> > way to detect this (i.e., outside of the data path)?
> >
> > Thanks,
> > Vlad
> >





* Re: [dpdk-users] Multi-process recovery (is it even possible?)
  2018-03-01 14:53   ` Lazarenko, Vlad (WorldQuant)
@ 2018-03-02  0:39     ` Tan, Jianfeng
  0 siblings, 0 replies; 4+ messages in thread
From: Tan, Jianfeng @ 2018-03-02  0:39 UTC (permalink / raw)
  To: Lazarenko, Vlad (WorldQuant), 'users@dpdk.org'



> -----Original Message-----
> From: Lazarenko, Vlad (WorldQuant)
> [mailto:Vlad.Lazarenko@worldquant.com]
> Sent: Thursday, March 1, 2018 10:53 PM
> To: Tan, Jianfeng; 'users@dpdk.org'
> Subject: RE: Multi-process recovery (is it even possible?)
> 
> Hello Jianfeng,
> 
> Thanks for getting back to me. I thought about using "udata64", too, but
> that didn't work for me when a single packet was fanned out to multiple
> slave processes. More importantly, it looks like if a slave process
> crashes somewhere in the middle of getting or putting packets from/to a
> pool, we could end up with a deadlock. So I guess I'll have to think about
> a different design or be ready to bounce all of the processes if one of
> them fails.

OK, a better design that avoids such a hard problem is a good way to go. Good luck!

Thanks,
Jianfeng

> 
> Thanks,
> Vlad
> 
> > -----Original Message-----
> > From: Tan, Jianfeng [mailto:jianfeng.tan@intel.com]
> > Sent: Thursday, March 01, 2018 3:20 AM
> > To: Lazarenko, Vlad (WorldQuant); 'users@dpdk.org'
> > Subject: RE: Multi-process recovery (is it even possible?)
> >
> >
> >
> > > -----Original Message-----
> > > From: users [mailto:users-bounces@dpdk.org] On Behalf Of Lazarenko,
> > > Vlad
> > > (WorldQuant)
> > > Sent: Thursday, March 1, 2018 2:54 AM
> > > To: 'users@dpdk.org'
> > > Subject: [dpdk-users] Multi-process recovery (is it even possible?)
> > >
> > > Guys,
> > >
> > > I am looking for possible solutions to the following problems that
> > > come along with an asymmetric multi-process architecture...
> > >
> > > Given that multiple processes share the same RX/TX queue(s) and
> > > packet pool(s), and that a single packet from an RX queue may be
> > > fanned out to multiple slave processes, is there a way to recover
> > > from a slave crashing (or exiting without cleaning up properly)? In
> > > theory it could have incremented an mbuf's reference count more than
> > > once, and unless everything is restarted, I don't see a reliable way
> > > to release those mbufs back to the pool.
> >
> > Recycling an individual element is very difficult; from what I know,
> > it's next to impossible. Recycling a whole memzone/mempool is easier.
> > So in your case, you might want to use different pools for different
> > queues (processes).
> >
> > If you really want to recycle an element, an rte_mbuf in your case, it
> > might be doable by:
> > 1. setting up an RX callback in each process, and in the callback
> > storing a special per-process flag in rte_mbuf->udata64;
> > 2. when the primary detects that a secondary is down, iterating over
> > all elements carrying that flag and putting them back into the pool.
> >
> > There is a small chance of failure: if an mbuf is allocated by a
> > secondary process and the secondary crashes before the mbuf is flagged,
> > it still leaks.
> >
> > Thanks,
> > Jianfeng
> >
> >
> > >
> > > Also, if a spinlock is involved and either the master or a slave
> > > crashes while holding it, everything simply gets stuck. Is there any
> > > way to detect this (i.e., outside of the data path)?
> > >
> > > Thanks,
> > > Vlad
> > >


end of thread, other threads:[~2018-03-02  0:40 UTC | newest]

Thread overview: 4+ messages
-- links below jump to the message on this page --
2018-02-28 18:53 [dpdk-users] Multi-process recovery (is it even possible?) Lazarenko, Vlad (WorldQuant)
2018-03-01  8:19 ` Tan, Jianfeng
2018-03-01 14:53   ` Lazarenko, Vlad (WorldQuant)
2018-03-02  0:39     ` Tan, Jianfeng
