From: "Burakov, Anatoly"
To: "Burakov, Anatoly", "dev@dpdk.org"
Date: Mon, 10 Jul 2017 10:18:12 +0000
Subject: Re: [dpdk-dev] [RFC 0/4] DPDK multiprocess rework
In-Reply-To: <1495211986-15177-1-git-send-email-anatoly.burakov@intel.com>
> From: dev [mailto:dev-bounces@dpdk.org] On Behalf Of Anatoly Burakov
> Sent: Friday, May 19, 2017 5:40 PM
> To: dev@dpdk.org
> Subject: [dpdk-dev] [RFC 0/4] DPDK multiprocess rework
>
> This is a proof-of-concept proposal for reworking how DPDK secondary
> processes work. While the code has some limitations, it works well enough to
> demonstrate the concept, and it can successfully run all existing multiprocess
> applications.
>
> Current problems with DPDK secondary processes:
> * ASLR interferes with mappings
> * "Fixed" by disabling ASLR, but not really a solution
> * Secondary process may map things into the address range where we want to
>   map shared memory
> * _Almost_ works with --base-virtaddr, but unreliable and tedious
> * Function pointers don't work (so e.g. hash library is broken)
>
> Proposed solution:
>
> Instead of running a secondary process and mapping resources from the primary
> process, the following is done:
> 0) compile all applications as position-independent executables, compile DPDK
>    as a shared library
> 1) fork() from primary process
> 2) dlopen() secondary process binary
> 3) use dlsym() to find entry point
> 4) run the application code while having all resources already mapped
>
> Benefits:
> * No more ASLR issues
> * No need for --base-virtaddr
> * Function pointers from primary process will work in secondaries
> * Hash library (and any other library that uses function pointers internally)
>   will work correctly in multi-process scenario
> * ethdev data can be moved to shared memory
> * Primary process interrupt callbacks can be run by secondary process
> * More secure, as all applications are compiled as position-independent
>   binaries (default on Fedora)
>
> Potential drawbacks (that we could think of):
> * Kind of a hack
> * Puts some code
>   restrictions on secondary processes
> * Anything happening before EAL init will be run twice
> * Some use cases are no longer possible (attaching to a dead primary)
> * May impact binaries compiled to use a lot (kilobytes) of thread-local
>   storage[1]
> * Likely wouldn't work for static linking
>
> There are also a number of issues that need to be resolved, but those are
> implementation details and are out of scope for this RFC.
>
> What is explicitly out of scope:
> * Fixing interrupts in secondary processes
> * Fixing hotplug in secondary processes
>
> These currently do not work in secondary processes, and this proposal does
> nothing to change that. They are better addressed by a dedicated EAL-internal
> IPC proposal.
>
>
> Technical nitty-gritty
>
> Things quickly get confusing, so terminology:
> - Original Primary is a normal DPDK primary process
> - Forked Primary is a "clean slate" primary process, from which all secondary
>   processes will be forked (threads and fork don't mix well, so the fork is
>   done after all the hugepage and PCI data is mapped, but before any threads
>   are spun up)
> - Original Secondary is a process that connects to Forked Primary, sends
>   some data and triggers a fork
> - Forked Secondary is the _actual_ secondary process (forked from Forked
>   Primary)
>
> Timeline:
> - Original Primary starts
> - Forked Primary is forked from Original Primary
> - Original Secondary starts and connects to Forked Primary
> - Forked Primary forks into Forked Secondary
> - Original Secondary waits until Forked Secondary dies
>
> During EAL init, Original Primary does a fork() to form a Forked Primary, a
> "clean slate" starting point for secondary processes. Forked Primary opens a
> local socket (a la VFIO) and starts listening for incoming connections.
>
> Original Secondary process connects to Forked Primary, sends stdout/log
> fds, command line parameters, etc.
> over the local socket, and sits around waiting for Forked Secondary to die,
> then exits (Original Secondary does _not_ map anything or do any EAL init; it
> rte_exit()'s from inside rte_eal_init()). Forked Secondary process then
> executes main(), passing all command-line arguments, and execution of the
> secondary process resumes.
>
> Why pre-fork and not pthread like VFIO?
>
> Pthreads and fork() don't mix well, because fork() stops the world (in the
> child, every thread except the one that called fork() disappears, leaving
> behind thread stacks, locks and possibly inconsistent state of both app data
> and system libraries). On the other hand, forking from a single-threaded
> context is safe. The current implementation doesn't _exactly_ fork from a
> single-threaded context, but this can be fixed later by rearranging EAL init.
>
> [1]: https://www.redhat.com/archives/phil-list/2003-February/msg00077.html
>
Ping
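Steps 1) to 4) of the proposed solution (fork, dlopen(), dlsym(), run) could look roughly like the sketch below. It is illustrative only, not the RFC's code: run_secondary(), the main-like `int (*)(int, char **)` entry-point signature, and the 126/127 exit codes are hypothetical choices. For simplicity the parent waits for the child here, whereas in the RFC the Forked Primary keeps serving connections.

```c
#include <dlfcn.h>
#include <stdio.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical entry-point signature for the secondary (main-like). */
typedef int (*entry_fn)(int argc, char **argv);

/*
 * Fork; in the child ("Forked Secondary") load `path` (NULL means the
 * main program, per dlopen() semantics), look up `symbol` and run it.
 * Everything mapped before fork() (hugepages, PCI resources, ...) is
 * inherited by the child at the same virtual addresses.
 * Returns the child's exit status, or -1 on fork/wait failure.
 */
int run_secondary(const char *path, const char *symbol, int argc, char **argv)
{
    fflush(NULL);               /* don't duplicate unflushed stdio buffers */
    pid_t pid = fork();
    if (pid < 0)
        return -1;

    if (pid == 0) {
        void *handle = dlopen(path, RTLD_NOW | RTLD_LOCAL);
        if (handle == NULL) {
            fprintf(stderr, "dlopen: %s\n", dlerror());
            _exit(126);         /* arbitrary code: could not load binary */
        }
        entry_fn entry = (entry_fn)dlsym(handle, symbol);
        if (entry == NULL) {
            fprintf(stderr, "dlsym: %s\n", dlerror());
            _exit(127);         /* arbitrary code: entry point not found */
        }
        int ret = entry(argc, argv);
        fflush(NULL);           /* flush the secondary's own output */
        _exit(ret & 0xff);      /* skip atexit handlers inherited from parent */
    }

    int status;
    if (waitpid(pid, &status, 0) < 0)
        return -1;
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

Note that recent glibc releases (2.30 and later, as far as I know) refuse to dlopen() a position-independent executable, so on current systems the secondary would likely need to be built as a shared object that exports its entry point. Older glibc also needs -ldl at link time.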
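Handing stdout/log fds from Original Secondary to Forked Primary over the local socket would presumably use SCM_RIGHTS ancillary data on an AF_UNIX socket, which makes the kernel install a duplicate of the descriptor in the receiving process. A minimal sketch; the helper names send_fd()/recv_fd() and the one-byte payload are assumptions, not taken from the RFC.

```c
#include <string.h>
#include <sys/socket.h>
#include <sys/uio.h>
#include <unistd.h>

/* Control buffer large and aligned enough for one int in SCM_RIGHTS. */
union fd_cmsg {
    char buf[CMSG_SPACE(sizeof(int))];
    struct cmsghdr align;
};

/* Send `fd` over the AF_UNIX socket `sock`. Returns 0 on success. */
int send_fd(int sock, int fd)
{
    char byte = 'F';            /* at least one byte of real data is sent */
    struct iovec iov = { .iov_base = &byte, .iov_len = 1 };
    union fd_cmsg u;
    struct msghdr msg = {0};

    memset(&u, 0, sizeof(u));
    msg.msg_iov = &iov;
    msg.msg_iovlen = 1;
    msg.msg_control = u.buf;
    msg.msg_controllen = sizeof(u.buf);

    struct cmsghdr *cmsg = CMSG_FIRSTHDR(&msg);
    cmsg->cmsg_level = SOL_SOCKET;
    cmsg->cmsg_type = SCM_RIGHTS;
    cmsg->cmsg_len = CMSG_LEN(sizeof(int));
    memcpy(CMSG_DATA(cmsg), &fd, sizeof(int));

    return sendmsg(sock, &msg, 0) == 1 ? 0 : -1;
}

/* Receive one fd from `sock`. Returns the new descriptor, or -1. */
int recv_fd(int sock)
{
    char byte;
    struct iovec iov = { .iov_base = &byte, .iov_len = 1 };
    union fd_cmsg u;
    struct msghdr msg = {0};

    msg.msg_iov = &iov;
    msg.msg_iovlen = 1;
    msg.msg_control = u.buf;
    msg.msg_controllen = sizeof(u.buf);

    if (recvmsg(sock, &msg, 0) != 1)
        return -1;

    struct cmsghdr *cmsg = CMSG_FIRSTHDR(&msg);
    if (cmsg == NULL || cmsg->cmsg_level != SOL_SOCKET ||
        cmsg->cmsg_type != SCM_RIGHTS ||
        cmsg->cmsg_len != CMSG_LEN(sizeof(int)))
        return -1;

    int fd;
    memcpy(&fd, CMSG_DATA(cmsg), sizeof(int));
    return fd;
}
```

Command line parameters could travel as ordinary payload bytes on the same connection; only the descriptors need the ancillary-data path.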
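The RFC doesn't say how Original Secondary learns that Forked Secondary has died; since it is not the parent, it cannot waitpid() for it. One hypothetical scheme, sketched below: Forked Primary (the real parent) reaps the child and writes its exit code back over the connection, and Original Secondary blocks reading it. Both function names and the "send one int" protocol are assumptions.

```c
#include <errno.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Forked Primary side: reap the Forked Secondary `child`, then send its
 * exit code (-1 if it did not exit normally) over `fd`. Returns 0 on success.
 */
int report_child_exit(int fd, pid_t child)
{
    int status;
    while (waitpid(child, &status, 0) < 0) {
        if (errno != EINTR)
            return -1;
    }
    int code = WIFEXITED(status) ? WEXITSTATUS(status) : -1;
    return write(fd, &code, sizeof(code)) == (ssize_t)sizeof(code) ? 0 : -1;
}

/*
 * Original Secondary side: block until the exit code arrives. Returns it,
 * or -1 if the connection closed first (e.g. Forked Primary went away).
 */
int wait_for_secondary(int fd)
{
    int code;
    size_t got = 0;
    while (got < sizeof(code)) {
        ssize_t n = read(fd, (char *)&code + got, sizeof(code) - got);
        if (n < 0 && errno == EINTR)
            continue;           /* interrupted by a signal: retry */
        if (n <= 0)
            return -1;          /* EOF or read error */
        got += (size_t)n;
    }
    return code;
}
```

Original Secondary could then exit with the returned code, so scripts that launch a secondary still see its real exit status.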