From: Anatoly Burakov
To: dev@dpdk.org
Date: Fri, 19 May 2017 17:39:42 +0100
Message-Id: <1495211986-15177-1-git-send-email-anatoly.burakov@intel.com>
Subject: [dpdk-dev] [RFC 0/4] DPDK multiprocess rework

This is a proof-of-concept proposal for reworking how DPDK secondary processes work. While the code has some limitations, it works well enough to demonstrate the concept, and it can successfully run all existing multiprocess applications.
Current problems with DPDK secondary processes:
 * ASLR interferes with mappings
   * "Fixed" by disabling ASLR, but not really a solution
 * Secondary process may map things into where we want to map shared memory
   * _Almost_ works with --base-virtaddr, but unreliable and tedious
 * Function pointers don't work (so e.g. the hash library is broken)

Proposed solution:

Instead of running the secondary process and mapping resources from the primary process, the following is done:
 0) compile all applications as position-independent executables, and compile DPDK as a shared library
 1) fork() from the primary process
 2) dlopen() the secondary process binary
 3) use dlsym() to find the entry point
 4) run the application code with all resources already mapped

Benefits:
 * No more ASLR issues
 * No need for --base-virtaddr
 * Function pointers from the primary process will work in secondaries
 * The hash library (and any other library that uses function pointers internally) will work correctly in multi-process scenarios
 * ethdev data can be moved to shared memory
 * Primary process interrupt callbacks can be run by a secondary process
 * More secure, as all applications are compiled as position-independent binaries (the default on Fedora)

Potential drawbacks (that we could think of):
 * Kind of a hack
 * Puts some code restrictions on secondary processes
 * Anything happening before EAL init will be run twice (once in the original secondary process, and again in the forked one)
 * Some use cases are no longer possible (attaching to a dead primary)
 * May impact binaries compiled to use a lot (kilobytes) of thread-local storage[1]
 * Likely wouldn't work for static linking

There are also a number of issues that need to be resolved, but those are implementation details and are out of scope for this RFC.

What is explicitly out of scope:
 * Fixing interrupts in secondary processes
 * Fixing hotplug in secondary processes

These currently do not work in secondary processes, and this proposal does nothing to change that. They are better addressed by a dedicated EAL-internal IPC proposal.
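Steps 1) to 4) of the proposed solution can be sketched in C roughly as follows. This is a minimal illustration under stated assumptions, not the code from this patch set: the function name run_secondary(), the entry-point signature int (*)(int, char **) and the exit codes 126/127 are hypothetical choices. It uses only standard POSIX calls (fork(), dlopen(), dlsym(), waitpid()); on glibc older than 2.34 it needs to be linked with -ldl. Newer glibc versions may also refuse to dlopen() a position-independent executable, so treat this purely as an illustration of the mechanism.

```c
#include <dlfcn.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical entry-point signature looked up in the secondary binary. */
typedef int (*secondary_entry_t)(int argc, char **argv);

/*
 * Sketch of steps 1) to 4): fork() from the already initialized primary,
 * dlopen() the secondary binary in the child, find its entry point with
 * dlsym() and run it with every primary mapping already in place.
 * Returns the child's exit status, or -1 if fork()/waitpid() fails or
 * the child did not exit normally.
 */
static int run_secondary(const char *path, const char *entry_name,
                         int argc, char **argv)
{
    /* Flush stdio first so the child does not inherit pending output. */
    fflush(NULL);

    pid_t pid = fork();
    if (pid < 0)
        return -1;

    if (pid == 0) {
        /* Child: inherits the parent's mappings at the same addresses. */
        void *handle = dlopen(path, RTLD_NOW | RTLD_LOCAL);
        if (handle == NULL) {
            fprintf(stderr, "dlopen: %s\n", dlerror());
            _exit(127);
        }
        /* POSIX requires dlsym()'s result to be convertible to a
         * function pointer. */
        secondary_entry_t entry =
            (secondary_entry_t)dlsym(handle, entry_name);
        if (entry == NULL) {
            fprintf(stderr, "dlsym: %s\n", dlerror());
            _exit(126);
        }
        /* Behave like returning from main(): exit() flushes the
         * secondary's own stdio output. */
        exit(entry(argc, argv));
    }

    /* Parent: wait for the secondary and report its exit status. */
    int status;
    if (waitpid(pid, &status, 0) < 0)
        return -1;
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

For example, run_secondary("./my_secondary.so", "main", argc, argv) (with a hypothetical library path) would return whatever the loaded main() returned, truncated to 8 bits like any exit status. For simplicity the parent waits for the child itself here; in the RFC it is the Original Secondary process that waits.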
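To illustrate why the function-pointer benefit listed above holds, here is a small self-contained sketch. A "primary" stores a function pointer in MAP_SHARED memory (standing in for hugepage memory) and then forks; the child calls through that pointer. Because fork() gives the child the same virtual address layout, the code address is valid there, which is not guaranteed for an independently started secondary binary whose code may be loaded elsewhere (for example under ASLR). The names toy_hash(), struct shared_table and hash_in_forked_child() are hypothetical and only loosely modelled on a hash table that keeps its hash function in shared memory.

```c
#define _GNU_SOURCE /* for MAP_ANONYMOUS on strict compilers */
#include <stdint.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical hash callback registered by the "primary". */
static uint32_t toy_hash(uint32_t key)
{
    return key * 2654435761u; /* multiplicative hash, wraps mod 2^32 */
}

/* Hypothetical shared structure that stores a function pointer,
 * the way a hash table header might store its hash function. */
struct shared_table {
    uint32_t (*hash_func)(uint32_t key);
    uint32_t result;
};

/*
 * The "primary" places a function pointer in shared memory and forks;
 * the "secondary" (the child) calls through it. This is valid because
 * the forked child has the same code at the same virtual addresses.
 * Returns the child's result, or 0 on error.
 */
static uint32_t hash_in_forked_child(uint32_t key)
{
    struct shared_table *t = mmap(NULL, sizeof(*t), PROT_READ | PROT_WRITE,
                                  MAP_SHARED | MAP_ANONYMOUS, -1, 0);
    if (t == MAP_FAILED)
        return 0;
    t->hash_func = toy_hash;
    t->result = 0;

    pid_t pid = fork();
    if (pid == 0) {
        /* Secondary: call the primary's function pointer directly. */
        t->result = t->hash_func(key);
        _exit(0);
    }

    uint32_t result = 0;
    int status;
    if (pid > 0 && waitpid(pid, &status, 0) == pid &&
        WIFEXITED(status) && WEXITSTATUS(status) == 0)
        result = t->result;
    munmap(t, sizeof(*t));
    return result;
}
```

Here hash_in_forked_child(1) returns 2654435761, computed in the child through a pointer that the parent stored.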
Technical nitty-gritty

Things quickly get confusing, so here is the terminology:
 - Original Primary is the normal DPDK primary process
 - Forked Primary is a "clean slate" primary process, from which all secondary processes will be forked (threads and fork don't mix well, so the fork is done after all the hugepage and PCI data is mapped, but before the threads are spun up)
 - Original Secondary is a process that connects to Forked Primary, sends some data and triggers a fork
 - Forked Secondary is the _actual_ secondary process (forked from Forked Primary)

Timeline:
 - Original Primary starts
 - Forked Primary is forked from Original Primary
 - Original Secondary starts and connects to Forked Primary
 - Forked Primary forks into Forked Secondary
 - Original Secondary waits until Forked Secondary dies

During EAL init, Original Primary does a fork() to create Forked Primary, a "clean slate" starting point for secondary processes. Forked Primary opens a local socket (à la VFIO) and starts listening for incoming connections.

Original Secondary connects to Forked Primary, sends its stdout/log fds, command-line parameters, etc. over the local socket, then waits for Forked Secondary to die and exits. Original Secondary does _not_ map anything or do any EAL init; it calls rte_exit() from inside rte_eal_init(). Forked Secondary then executes the application's main(), passing along all command-line arguments, and execution of the secondary process resumes.

Why pre-fork, rather than use a pthread as VFIO does? Pthreads and fork() don't mix well: fork() duplicates only the calling thread, so in the child all other threads disappear, leaving behind their stacks, any locks they held, and possibly inconsistent state in both application data and system libraries. Forking from a single-threaded context, on the other hand, is safe. The current implementation doesn't _exactly_ fork from a single-threaded context, but this can be fixed later by rearranging EAL init.
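The RFC does not describe the message format Original Secondary uses, but the standard way to hand file descriptors such as stdout or a log fd to another process over a local (AF_UNIX) socket is an SCM_RIGHTS control message. The sketch below assumes exactly one fd per message plus an opaque payload (which could, for instance, carry packed command-line arguments); send_fd() and recv_fd() are hypothetical helper names, not functions from this patch set.

```c
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <sys/uio.h>
#include <unistd.h>

/*
 * Send one file descriptor plus a small payload over a connected
 * AF_UNIX socket using an SCM_RIGHTS control message.
 * Returns 0 on success, -1 on error.
 */
static int send_fd(int sock, int fd, const void *buf, size_t len)
{
    struct iovec iov = { .iov_base = (void *)buf, .iov_len = len };
    union { /* union guarantees correct alignment for cmsghdr */
        char buf[CMSG_SPACE(sizeof(int))];
        struct cmsghdr align;
    } ctrl;
    memset(&ctrl, 0, sizeof(ctrl));

    struct msghdr msg = {0};
    msg.msg_iov = &iov;
    msg.msg_iovlen = 1;
    msg.msg_control = ctrl.buf;
    msg.msg_controllen = sizeof(ctrl.buf);

    struct cmsghdr *cmsg = CMSG_FIRSTHDR(&msg);
    cmsg->cmsg_level = SOL_SOCKET;
    cmsg->cmsg_type = SCM_RIGHTS;
    cmsg->cmsg_len = CMSG_LEN(sizeof(int));
    memcpy(CMSG_DATA(cmsg), &fd, sizeof(int));

    return sendmsg(sock, &msg, 0) < 0 ? -1 : 0;
}

/*
 * Receive a payload and a descriptor sent by send_fd().
 * Returns the new descriptor, or -1 on error.
 */
static int recv_fd(int sock, void *buf, size_t len, ssize_t *nread)
{
    struct iovec iov = { .iov_base = buf, .iov_len = len };
    union {
        char buf[CMSG_SPACE(sizeof(int))];
        struct cmsghdr align;
    } ctrl;

    struct msghdr msg = {0};
    msg.msg_iov = &iov;
    msg.msg_iovlen = 1;
    msg.msg_control = ctrl.buf;
    msg.msg_controllen = sizeof(ctrl.buf);

    ssize_t n = recvmsg(sock, &msg, 0);
    if (n < 0)
        return -1;
    if (nread != NULL)
        *nread = n;

    struct cmsghdr *cmsg = CMSG_FIRSTHDR(&msg);
    if (cmsg == NULL || cmsg->cmsg_level != SOL_SOCKET ||
        cmsg->cmsg_type != SCM_RIGHTS)
        return -1;

    int fd;
    memcpy(&fd, CMSG_DATA(cmsg), sizeof(int));
    return fd;
}
```

The received descriptor is a new fd number in the receiving process that refers to the same open file description, so the forked secondary could, for example, dup2() it onto STDOUT_FILENO to make its output appear in the Original Secondary's terminal.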
[1]: https://www.redhat.com/archives/phil-list/2003-February/msg00077.html

Anatoly Burakov (4):
  vfio: refactor sockets into separate files
  eal: enable experimental dlopen()-based secondary process support
  apps: enable new secondary process support in multiprocess apps
  mk: default to compiling shared libraries

 config/common_base                             |   2 +-
 .../client_server_mp/mp_client/Makefile        |   2 +-
 examples/multi_process/simple_mp/Makefile      |   2 +-
 examples/multi_process/symmetric_mp/Makefile   |   2 +-
 lib/librte_eal/linuxapp/eal/Makefile           |   3 +
 lib/librte_eal/linuxapp/eal/eal.c              | 105 ++++-
 lib/librte_eal/linuxapp/eal/eal_mp.h           |  54 +++
 lib/librte_eal/linuxapp/eal/eal_mp_primary.c   | 477 +++++++++++++++++++++
 lib/librte_eal/linuxapp/eal/eal_mp_secondary.c | 301 +++++++++++++
 lib/librte_eal/linuxapp/eal/eal_mp_socket.c    | 301 +++++++++++++
 lib/librte_eal/linuxapp/eal/eal_mp_socket.h    |  54 +++
 lib/librte_eal/linuxapp/eal/eal_vfio.c         |  20 +-
 lib/librte_eal/linuxapp/eal/eal_vfio.h         |  24 +-
 lib/librte_eal/linuxapp/eal/eal_vfio_mp_sync.c | 243 ++---------
 14 files changed, 1347 insertions(+), 243 deletions(-)
 create mode 100755 lib/librte_eal/linuxapp/eal/eal_mp.h
 create mode 100755 lib/librte_eal/linuxapp/eal/eal_mp_primary.c
 create mode 100755 lib/librte_eal/linuxapp/eal/eal_mp_secondary.c
 create mode 100755 lib/librte_eal/linuxapp/eal/eal_mp_socket.c
 create mode 100755 lib/librte_eal/linuxapp/eal/eal_mp_socket.h

-- 
2.7.4