From: "Tan, Jianfeng"
To: "Lazarenko, Vlad (WorldQuant)", "'users@dpdk.org'"
Thread-Topic: Multi-process recovery (is it even possible?)
Date: Fri, 2 Mar 2018 00:39:58 +0000
In-Reply-To: <790E2AC11206AC46B8F4BB82078E34F8081E29C2@PWSSMTEXMBX002.AD.MLP.com>
Subject: Re: [dpdk-users] Multi-process recovery (is it even possible?)
> -----Original Message-----
> From: Lazarenko, Vlad (WorldQuant)
> [mailto:Vlad.Lazarenko@worldquant.com]
> Sent: Thursday, March 1, 2018 10:53 PM
> To: Tan, Jianfeng; 'users@dpdk.org'
> Subject: RE: Multi-process recovery (is it even possible?)
>
> Hello Jianfeng,
>
> Thanks for getting back to me. I thought about using "udata64", too. But that
> didn't work for me if a single packet was fanned out to multiple slave
> processes. Most importantly, it looks like if a slave process crashes
> somewhere in the middle of getting or putting packets from/to a pool, we
> could end up with a deadlock. So I guess I'd have to think about a different
> design, or be ready to bounce all of the processes if one of them fails.

OK, a better design that avoids such a hard problem is the way to go. Good luck!

Thanks,
Jianfeng

>
> Thanks,
> Vlad
>
> > -----Original Message-----
> > From: Tan, Jianfeng [mailto:jianfeng.tan@intel.com]
> > Sent: Thursday, March 01, 2018 3:20 AM
> > To: Lazarenko, Vlad (WorldQuant); 'users@dpdk.org'
> > Subject: RE: Multi-process recovery (is it even possible?)
> >
> > > -----Original Message-----
> > > From: users [mailto:users-bounces@dpdk.org] On Behalf Of Lazarenko,
> > > Vlad (WorldQuant)
> > > Sent: Thursday, March 1, 2018 2:54 AM
> > > To: 'users@dpdk.org'
> > > Subject: [dpdk-users] Multi-process recovery (is it even possible?)
> > >
> > > Guys,
> > >
> > > I am looking for possible solutions for the following problems that
> > > come along with an asymmetric multi-process architecture...
> > >
> > > Given that multiple processes share the same RX/TX queue(s) and packet
> > > pool(s), and that one packet from the RX queue may be fanned
> > > out to multiple slave processes, is there a way to recover from a slave
> > > crashing (or exiting without cleaning up properly)? In theory it could have
> > > incremented the mbuf reference count more than once, and unless everything
> > > is restarted, I don't see a reliable way to release those mbufs back to the
> > > pool.
> >
> > Recycling a single element is too difficult; from what I know, it's next to
> > impossible. Recycling a whole memzone/mempool is easier. So in your case, you
> > might want to use different pools for different queues (processes).
> >
> > If you really want to recycle an element, an rte_mbuf in your case, it might
> > be doable by:
> > 1. Setting up an RX callback for each process, and in the callback, storing
> > a special flag in rte_mbuf->udata64.
> > 2. When the primary detects that a secondary is down, iterating over all
> > elements carrying that process's flag and putting them back into the ring.
> >
> > There is a small chance of failure: an mbuf is allocated by a secondary
> > process, and the process crashes before the mbuf is flagged.
> >
> > Thanks,
> > Jianfeng
> >
> > > Also, if a spinlock is involved and either the master or a slave crashes,
> > > everything simply gets stuck. Is there any way to detect this (i.e.
> > > outside of the data path)..?
> > >
> > > Thanks,
> > > Vlad