From: "Tan, Jianfeng" <jianfeng.tan@intel.com>
To: "Lazarenko, Vlad (WorldQuant)" <Vlad.Lazarenko@worldquant.com>,
"'users@dpdk.org'" <users@dpdk.org>
Subject: Re: [dpdk-users] Multi-process recovery (is it even possible?)
Date: Fri, 2 Mar 2018 00:39:58 +0000 [thread overview]
Message-ID: <ED26CBA2FAD1BF48A8719AEF02201E365145A788@SHSMSX103.ccr.corp.intel.com> (raw)
In-Reply-To: <790E2AC11206AC46B8F4BB82078E34F8081E29C2@PWSSMTEXMBX002.AD.MLP.com>
> -----Original Message-----
> From: Lazarenko, Vlad (WorldQuant)
> [mailto:Vlad.Lazarenko@worldquant.com]
> Sent: Thursday, March 1, 2018 10:53 PM
> To: Tan, Jianfeng; 'users@dpdk.org'
> Subject: RE: Multi-process recovery (is it even possible?)
>
> Hello Jianfeng,
>
> Thanks for getting back to me. I thought about using "udata64", too. But that
> didn't work for me if a single packet was fanned out to multiple slave
> processes. But most importantly, it looks like if a slave process crashes
> somewhere in the middle of getting or putting packets from/to a pool, we
> could end up with a deadlock. So I guess I'd have to think about a different
> design or be ready to bounce all of the processes if one of them fails.
OK, a better design that avoids such a hard issue is a good way to go. Good luck!
Thanks,
Jianfeng
>
> Thanks,
> Vlad
>
> > -----Original Message-----
> > From: Tan, Jianfeng [mailto:jianfeng.tan@intel.com]
> > Sent: Thursday, March 01, 2018 3:20 AM
> > To: Lazarenko, Vlad (WorldQuant); 'users@dpdk.org'
> > Subject: RE: Multi-process recovery (is it even possible?)
> >
> >
> >
> > > -----Original Message-----
> > > From: users [mailto:users-bounces@dpdk.org] On Behalf Of Lazarenko,
> > > Vlad
> > > (WorldQuant)
> > > Sent: Thursday, March 1, 2018 2:54 AM
> > > To: 'users@dpdk.org'
> > > Subject: [dpdk-users] Multi-process recovery (is it even possible?)
> > >
> > > Guys,
> > >
> > > I am looking for possible solutions to the following problems that
> > > come along with an asymmetric multi-process architecture...
> > >
> > > Given that multiple processes share the same RX/TX queue(s) and packet
> > > pool(s), and that one packet from an RX queue may be fanned out to
> > > multiple slave processes, is there a way to recover from a slave
> > > crashing (or exiting w/o cleaning up properly)? In theory it could have
> > > incremented the mbuf reference count more than once, and unless everything
> > > is restarted, I don't see a reliable way to release those mbufs back to the
> > > pool.
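A minimal sketch (not from the original mail) of the fan-out/refcount pattern being described; nb_secondaries and fanout_ring[] are hypothetical names for however the packets reach the slaves:

  /* Primary: one extra reference per additional consumer of the same mbuf. */
  rte_mbuf_refcnt_update(m, nb_secondaries - 1);   /* refcnt == nb_secondaries */
  for (i = 0; i < nb_secondaries; i++)
      rte_ring_enqueue(fanout_ring[i], m);

  /* Each secondary, when finished with the packet: */
  rte_pktmbuf_free(m);   /* drops one reference; the mbuf returns to the pool only
                          * when the count hits zero, so a secondary that crashes
                          * before this call leaks its reference for good. */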
> >
> > Recycling an element is too difficult; from what I know, it's next to impossible.
> > Recycling a whole memzone/mempool is easier. So in your case, you might want
> > to use different pools for different queues (processes).
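A minimal sketch of that pool-per-process layout, assuming the primary names each pool after a per-secondary id; proc_id and the pool sizes are made-up values:

  #include <stdio.h>
  #include <rte_lcore.h>
  #include <rte_mbuf.h>
  #include <rte_mempool.h>

  /* Primary: a dedicated pool per secondary, so a crashed secondary only
   * poisons its own pool, which can then be abandoned or re-created. */
  char name[RTE_MEMPOOL_NAMESIZE];
  snprintf(name, sizeof(name), "mbuf_pool_%u", proc_id);
  struct rte_mempool *pool = rte_pktmbuf_pool_create(name, 8192, 256, 0,
          RTE_MBUF_DEFAULT_BUF_SIZE, rte_socket_id());

  /* Secondary: attach to its own pool by name rather than to a shared one. */
  struct rte_mempool *my_pool = rte_mempool_lookup(name);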
> >
> > If you really want to recycle an element, an rte_mbuf in your case, it might be
> > doable by:
> > 1. setting up an rx callback for each process and, in the callback, storing a
> > special flag at rte_mbuf->udata64;
> > 2. when the primary detects that a secondary is down, iterating over all elements
> > with the special flag and putting them back into the ring.
> >
> > There is a small chance this fails: an mbuf is allocated by a secondary process,
> > and the process crashes before the mbuf is flagged.
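A minimal sketch of that flagging scheme, assuming a DPDK of this era where rte_mbuf still carries the udata64 field; the magic value, port/queue ids, and the crash-detection trigger are all made up, and the fragility noted above (mbufs not yet flagged, or freed with the flag still set) is not solved here:

  #include <rte_ethdev.h>
  #include <rte_mbuf.h>
  #include <rte_mempool.h>

  #define PROC_MAGIC 0xBADC0DEULL          /* hypothetical per-process flag value */

  /* 1. Rx callback in each secondary: tag every mbuf it receives. */
  static uint16_t
  tag_cb(uint16_t port, uint16_t queue, struct rte_mbuf *pkts[],
         uint16_t nb_pkts, uint16_t max_pkts, void *arg)
  {
      uint16_t i;
      for (i = 0; i < nb_pkts; i++)
          pkts[i]->udata64 = PROC_MAGIC;
      return nb_pkts;
  }
  /* registered with: rte_eth_add_rx_callback(port, queue, tag_cb, NULL); */

  /* 2. Primary, once it decides the secondary is dead: walk the pool and free
   * every object still carrying that secondary's flag. */
  static void
  reclaim_cb(struct rte_mempool *mp, void *arg, void *obj, unsigned int idx)
  {
      struct rte_mbuf *m = obj;
      if (m->udata64 == PROC_MAGIC) {
          m->udata64 = 0;          /* clear so it is not reclaimed twice */
          rte_pktmbuf_free(m);
      }
  }
  /* invoked with: rte_mempool_obj_iter(pool, reclaim_cb, NULL); */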
> >
> > Thanks,
> > Jianfeng
> >
> >
> > >
> > > Also, if a spinlock is involved and either the master or a slave crashes,
> > > everything simply gets stuck. Is there any way to detect this (i.e. outside
> > > of the data path)?
> > >
> > > Thanks,
> > > Vlad
> > >
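On the spinlock question above: rte_spinlock has no notion of a robust owner, so one possible workaround (purely a sketch, not a DPDK facility) is to keep the holder's pid next to the lock in shared memory and have a monitor outside the data path check liveness with kill(pid, 0):

  #include <errno.h>
  #include <signal.h>
  #include <rte_spinlock.h>

  /* Hypothetical lock wrapper kept in a shared memzone; the locker must set
   * owner to its own pid right after acquiring, and clear it before release. */
  struct owned_lock {
      rte_spinlock_t lock;
      pid_t owner;                 /* 0 when unlocked */
  };

  /* Monitor: a lock held by a process that no longer exists means the
   * application is stuck and must be recovered. */
  static int
  lock_holder_dead(const struct owned_lock *l)
  {
      pid_t owner = l->owner;
      return owner != 0 && kill(owner, 0) != 0 && errno == ESRCH;
  }

Even when detection works, safely releasing a lock held by a dead process is another matter, so in practice this still tends to end in bouncing all of the processes, as discussed above.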
Thread overview: 4+ messages
2018-02-28 18:53 Lazarenko, Vlad (WorldQuant)
2018-03-01 8:19 ` Tan, Jianfeng
2018-03-01 14:53 ` Lazarenko, Vlad (WorldQuant)
2018-03-02 0:39 ` Tan, Jianfeng [this message]