From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mga17.intel.com (mga17.intel.com [192.55.52.151]) by dpdk.org (Postfix) with ESMTP id 03E551B53B for ; Fri, 15 Jun 2018 17:16:04 +0200 (CEST) X-Amp-Result: SKIPPED(no attachment in message) X-Amp-File-Uploaded: False Received: from fmsmga003.fm.intel.com ([10.253.24.29]) by fmsmga107.fm.intel.com with ESMTP/TLS/DHE-RSA-AES256-GCM-SHA384; 15 Jun 2018 08:16:03 -0700 X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="5.51,227,1526367600"; d="scan'208";a="57624380" Received: from aburakov-mobl.ger.corp.intel.com (HELO [10.237.220.66]) ([10.237.220.66]) by FMSMGA003.fm.intel.com with ESMTP; 15 Jun 2018 08:16:01 -0700 To: Qi Zhang , thomas@monjalon.net Cc: konstantin.ananyev@intel.com, dev@dpdk.org, bruce.richardson@intel.com, ferruh.yigit@intel.com, benjamin.h.shelton@intel.com, narender.vangati@intel.com References: <20180607123849.14439-1-qi.z.zhang@intel.com> From: "Burakov, Anatoly" Message-ID: Date: Fri, 15 Jun 2018 16:16:00 +0100 User-Agent: Mozilla/5.0 (Windows NT 10.0; WOW64; rv:52.0) Gecko/20100101 Thunderbird/52.8.0 MIME-Version: 1.0 In-Reply-To: <20180607123849.14439-1-qi.z.zhang@intel.com> Content-Type: text/plain; charset=utf-8; format=flowed Content-Language: en-US Content-Transfer-Encoding: 7bit Subject: Re: [dpdk-dev] [PATCH 00/22] enable hotplug on multi-process X-BeenThere: dev@dpdk.org X-Mailman-Version: 2.1.15 Precedence: list List-Id: DPDK patches and discussions List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Fri, 15 Jun 2018 15:16:05 -0000 Hi Qi, I haven't read the code yet, and i'll be the first to admit that i'm not too well versed on how shared/private device data works, so my apologies in advance if all of below comments are addressed by implementation details or are way off base! On 07-Jun-18 1:38 PM, Qi Zhang wrote: > > 1. Attach a share device from primary > 2. Detach a share device from primary > 3. Attach a share device from secondary > 4. Detach a share device from secondary > 5. Attach a private device from secondary > 6. Detach a private device from secondary > 7. Detach a share device from secondary privately > 8. Attach a share device from secondary privately > <...> > Case 7, 8: > Secondary process can also temporally to detach a share device > "privately" then attach it back later, this action also not impact other > processes. > Do we really need to implement these cases? It seems to me that this "reattach it later" introduces unnecessary complexity. If secondary has detached the device, i think it is safer if we cannot reattach it, period, because it was a shared device. What if we try to attach it when a handshake has already completed and all other processes expect to detach it? (in fact, do we differentiate between non-existent device and shared device that has been "privately detached"? I would expect that we keep the device as detached as opposed to forgetting about it, so that, come handshake, we can safely reply "yeah, we can detach the device", but maybe it's OK to not treat request to detach a non-existent device as an error... thoughts? am i getting something wrong?) > APIs chenages: > ============== > > rte_eth_dev_attach and rte_eth_dev_attach are extended to support > share device attach/detach in primary-secondary process model, it will > be called in case 1,2,3,4. > > New API rte_eth_dev_attach_private and rte_eth_dev_detach_private are > introduced to cover case 5,6,7,8, this API can only be invoked in > secondary process. > > New API rte_eth_dev_lock and rte_eth_dev_unlock are introduced to let > application lock or unlock on specific ethdev, a locked device > can't be detached. This help applicaiton to prevent unexpected > device detaching, especially in multi-process envrionment. > Aslo the new API let application to register a callback function > which will be invoked before a device is going to be detached, > the return value of the function will decide if device will continue > be detached or not, this support application to do condition check > at runtime. I assume that you've added device locking to avoid having to do handshake before detach, correct? Is this a shared lock of some kind, or is it a private lock? If it's shared lock, what happens if the process holding that lock crashes? > > PMD Impact: > =========== > > Currently device removing is not handled well in secondary process on most > pmd drivers, rte_eth_dev_relase_port will be invoked and will mess up > primary process since it reset all shared data. So we introduced new API > rte_eth_dev_release_port_local which only reset ethdev's state to unsued but > not touch shared data so other process will not be impacted. > Since not all device driver is target to support primary-secondary > process model, so the patch set only fix this on all Intel devices and > vdev, it can be refereneced by other driver when equevalent fix is required Nitpick - why the naming mismatch between *_private() and *_local()? > > Limitation: > =========== > > The solution does not cover the case that primary process exit while > secondary processes still be active. Though this is not a typial use > case, but if this happens: > 1. secondary process can't attach / detach any shared device since no > primary exist. > 2. secondary process still can attach / detach private device. > 3. secondary process still can detach a share device privately but may > not attach it back, that ethdev slot will become zombie slot. I think this should be explicit and by design. Shared devices can only be communicated to all secondaries through a primary process. No primary - no shared devices. I don't think we can do anything about it unless we implement some kind of peer-to-peer IPC (which isn't happening as far as i'm aware). Thanks for your work on this patchset! -- Thanks, Anatoly