From: "Lazarenko, Vlad (WorldQuant)"
To: "'Tan, Jianfeng'" <jianfeng.tan@intel.com>, "'users@dpdk.org'" <users@dpdk.org>
Date: Thu, 1 Mar 2018 14:53:18 +0000
Message-ID: <790E2AC11206AC46B8F4BB82078E34F8081E29C2@PWSSMTEXMBX002.AD.MLP.com>
References: <790E2AC11206AC46B8F4BB82078E34F8081E0EAB@PWSSMTEXMBX002.AD.MLP.com>
Subject: Re: [dpdk-users] Multi-process recovery (is it even possible?)
Hello Jianfeng,

Thanks for getting back to me. I thought about using "udata64", too. But that didn't work for me if a single packet was fanned out to multiple slave processes. More importantly, it looks like if a slave process crashes somewhere in the middle of getting or putting packets from/to a pool, we could end up with a deadlock. So I guess I'd have to think about a different design, or be ready to bounce all of the processes if one of them fails.

Thanks,
Vlad

> -----Original Message-----
> From: Tan, Jianfeng [mailto:jianfeng.tan@intel.com]
> Sent: Thursday, March 01, 2018 3:20 AM
> To: Lazarenko, Vlad (WorldQuant); 'users@dpdk.org'
> Subject: RE: Multi-process recovery (is it even possible?)
>
> > -----Original Message-----
> > From: users [mailto:users-bounces@dpdk.org] On Behalf Of Lazarenko,
> > Vlad (WorldQuant)
> > Sent: Thursday, March 1, 2018 2:54 AM
> > To: 'users@dpdk.org'
> > Subject: [dpdk-users] Multi-process recovery (is it even possible?)
> >
> > Guys,
> >
> > I am looking for possible solutions for the following problems that
> > come along with an asymmetric multi-process architecture...
> >
> > Given that multiple processes share the same RX/TX queue(s) and packet
> > pool(s), and that one packet from an RX queue may be fanned out to
> > multiple slave processes, is there a way to recover from a slave
> > crashing (or exiting w/o cleaning up properly)? In theory it could have
> > incremented an mbuf reference count more than once, and unless
> > everything is restarted, I don't see a reliable way to release those
> > mbufs back to the pool.
>
> Recycling an element is too difficult; from what I know, it's next to impossible.
> Recycling a memzone/mempool is easier.
> So in your case, you might want to
> use different pools for different queues (processes).
>
> If you really want to recycle an element, an rte_mbuf in your case, it might
> be doable by:
> 1. Set up an rx callback for each process, and in the callback, store a
> special flag in rte_mbuf->udata64.
> 2. When the primary detects that a secondary is down, iterate over all
> elements carrying the special flag, and put them back into the ring.
>
> There is a small chance of failure: an mbuf is allocated by a secondary
> process, and the secondary crashes before the mbuf is flagged.
>
> Thanks,
> Jianfeng
>
> > Also, if a spinlock is involved and either the master or a slave crashes,
> > everything simply gets stuck. Is there any way to detect this (i.e.
> > outside of the data path)..?
> >
> > Thanks,
> > Vlad