Date: Mon, 23 May 2016 21:36:43 +0800
From: Yuanhan Liu
To: Yoni Gilad
Cc: "dev@dpdk.org"
Message-ID: <20160523133643.GL5641@yliu-dev.sh.intel.com>
Subject: Re: [dpdk-dev] virtio: crash when using multiple processes (16.04 regression)

On Thu, May 19, 2016 at 04:20:40PM +0000, Yoni Gilad wrote:
> Hi,
>
> We have encountered a crash in virtio_xmit_pkts (specifically, in the call to virtqueue_notify) when running DPDK in a multi-process setup. This is a regression in DPDK 16.04.
>
> The culprit seems to be the field vtpci_ops in the virtio_hw structure. This field is stored in shared memory, but points to a struct in the primary process's address space. If the same struct was loaded in a different address in the secondary process, it will lead to a crash or other issues when this field is dereferenced there. The referenced virtio_pci_ops struct contains function pointers, which can also be different in the secondary process.

That indeed sounds like the culprit.

Function pointers are known to be unfriendly to multiple processes; see section 18.c of the DPDK programmer's guide (http://dpdk.org/doc/guides/prog_guide/multi_proc_support.html):

    The use of function pointers between multiple processes running based
    of different compiled binaries is not supported, since the location of
    a given function in one process may be different to its location in a
    second. This prevents the librte_hash library from behaving properly
    as in a multi-threaded instance, since it uses a pointer to the hash
    function internally.

TBH, I missed this bit (multiple processes) when introducing this function pointer; well, we never tested it before, either.

We could fix or work around it by setting up the right function pointers dynamically, but that is far from perfect.

	--yliu
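
For illustration only, here is a minimal sketch of what "setting up the right function pointers dynamically" could look like. It is not the virtio PMD code. The idea is that shared memory holds only a small integer id, and each process resolves that id against its own table, so no address from the primary process is ever dereferenced in a secondary. All names below (shared_hw, pci_ops, ops_table, hw_ops, the OPS_* ids) are hypothetical stand-ins, and the notify functions return an id purely so the sketch is easy to check.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical stand-in for virtio_pci_ops: a table of function pointers. */
struct pci_ops {
	int (*notify_queue)(uint16_t queue_id);
};

/* Hypothetical ids; only these small integers would live in shared memory. */
enum ops_id { OPS_LEGACY = 0, OPS_MODERN = 1, OPS_COUNT };

/* Hypothetical stand-in for virtio_hw, the struct placed in shared memory. */
struct shared_hw {
	uint8_t ops_id;	/* meaningful in every process, unlike a raw pointer */
};

/* Each function returns the id of the implementation that ran. */
static int legacy_notify(uint16_t queue_id)
{
	printf("legacy notify, queue %u\n", (unsigned)queue_id);
	return OPS_LEGACY;
}

static int modern_notify(uint16_t queue_id)
{
	printf("modern notify, queue %u\n", (unsigned)queue_id);
	return OPS_MODERN;
}

/*
 * Per-process table in ordinary (non-shared) memory. The addresses stored
 * here are filled in for the process that owns this copy of the table, so
 * they are valid even if another process has the code mapped elsewhere.
 */
static const struct pci_ops ops_table[OPS_COUNT] = {
	[OPS_LEGACY] = { .notify_queue = legacy_notify },
	[OPS_MODERN] = { .notify_queue = modern_notify },
};

/* Resolve the ops locally instead of following a pointer kept in shared memory. */
static const struct pci_ops *hw_ops(const struct shared_hw *hw)
{
	assert(hw->ops_id < OPS_COUNT);
	return &ops_table[hw->ops_id];
}
```

A caller in either process would then write hw_ops(hw)->notify_queue(queue_id) rather than going through a pointer stored in the shared structure. The cost is an extra lookup on each call, which is part of why this kind of workaround is less than ideal.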