From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mga03.intel.com (mga03.intel.com [134.134.136.65]) by dpdk.org (Postfix) with ESMTP id D10AA1B5F1 for ; Tue, 26 Jun 2018 17:30:28 +0200 (CEST) X-Amp-Result: SKIPPED(no attachment in message) X-Amp-File-Uploaded: False Received: from fmsmga002.fm.intel.com ([10.253.24.26]) by orsmga103.jf.intel.com with ESMTP/TLS/DHE-RSA-AES256-GCM-SHA384; 26 Jun 2018 08:30:27 -0700 X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="5.51,274,1526367600"; d="scan'208";a="61471739" Received: from jguo15x-mobl3.ccr.corp.intel.com (HELO [10.255.135.254]) ([10.255.135.254]) by fmsmga002.fm.intel.com with ESMTP; 26 Jun 2018 08:30:12 -0700 To: =?UTF-8?Q?Ga=c3=abtan_Rivet?= References: <1498711073-42917-1-git-send-email-jia.guo@intel.com> <1529668268-7462-1-git-send-email-jia.guo@intel.com> <1529668268-7462-2-git-send-email-jia.guo@intel.com> <20180622125908.r5r7qvvndn4dp6ln@bidouze.vm.6wind.com> Cc: stephen@networkplumber.org, bruce.richardson@intel.com, ferruh.yigit@intel.com, konstantin.ananyev@intel.com, jingjing.wu@intel.com, thomas@monjalon.net, motih@mellanox.com, matan@mellanox.com, harry.van.haaren@intel.com, qi.z.zhang@intel.com, shaopeng.he@intel.com, jblunck@infradead.org, shreyansh.jain@nxp.com, dev@dpdk.org, helin.zhang@intel.com From: "Guo, Jia" Message-ID: <3cc2fd5f-48b9-7d0b-e942-1716b0622462@intel.com> Date: Tue, 26 Jun 2018 23:30:00 +0800 User-Agent: Mozilla/5.0 (Windows NT 6.3; WOW64; rv:45.0) Gecko/20100101 Thunderbird/45.4.0 MIME-Version: 1.0 In-Reply-To: <20180622125908.r5r7qvvndn4dp6ln@bidouze.vm.6wind.com> Content-Type: text/plain; charset=windows-1252; format=flowed Content-Transfer-Encoding: 8bit Subject: Re: [dpdk-dev] [PATCH v2 1/4] bus/pci: handle device hot unplug X-BeenThere: dev@dpdk.org X-Mailman-Version: 2.1.15 Precedence: list List-Id: DPDK patches and discussions List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 26 Jun 2018 15:30:29 -0000 hi, gaetan, thanks for your review, see comment as bellow On 6/22/2018 8:59 PM, Gaëtan Rivet wrote: > Hi Jeff, > > Sorry, I followed this development from afar, > I have a remark regarding this API, I think it can be made simpler. > Details below. > > On Fri, Jun 22, 2018 at 07:51:05PM +0800, Jeff Guo wrote: >> When a hardware device is removed physically or the software disables >> it, the hot unplug occur. App need to call ether dev API to detach the >> device, to unplug the device at the bus level and make access to the device >> invalid. But the problem is that, the removal of the device from the >> software lists is not going to be instantaneous, at this time if the data >> path still read/write the device, it will cause MMIO error and result of >> the app crash out. So a hot unplug handle mechanism need to guaranty app >> will not crash out when hot unplug device. >> >> To handle device hot unplug is bus-specific behavior, this patch introduces >> a bus ops so that each kind of bus can implement its own logic. Further, >> this patch implements the ops for PCI bus: remap a dummy memory to avoid >> bus read/write error. >> >> Signed-off-by: Jeff Guo >> --- >> v2->v1(v21): >> refind commit log >> --- >> drivers/bus/pci/pci_common.c | 65 +++++++++++++++++++++++++++++++++ >> drivers/bus/pci/pci_common_uio.c | 33 +++++++++++++++++ >> drivers/bus/pci/private.h | 12 ++++++ >> lib/librte_eal/common/include/rte_bus.h | 16 ++++++++ >> 4 files changed, 126 insertions(+) >> >> diff --git a/drivers/bus/pci/pci_common.c b/drivers/bus/pci/pci_common.c >> index 7215aae..74d9aa8 100644 >> --- a/drivers/bus/pci/pci_common.c >> +++ b/drivers/bus/pci/pci_common.c >> @@ -472,6 +472,70 @@ pci_find_device(const struct rte_device *start, rte_dev_cmp_t cmp, >> return NULL; >> } >> >> +static struct rte_pci_device * >> +pci_find_device_by_addr(void *failure_addr) >> +{ >> + struct rte_pci_device *pdev = NULL; >> + int i; >> + >> + FOREACH_DEVICE_ON_PCIBUS(pdev) { >> + for (i = 0; i != RTE_DIM(pdev->mem_resource); i++) { >> + if ((uint64_t)(uintptr_t)failure_addr >= >> + (uint64_t)(uintptr_t)pdev->mem_resource[i].addr && >> + (uint64_t)(uintptr_t)failure_addr < >> + (uint64_t)(uintptr_t)pdev->mem_resource[i].addr + >> + pdev->mem_resource[i].len) { >> + RTE_LOG(ERR, EAL, "Failure address " >> + "%16.16"PRIx64" belongs to " >> + "device %s!\n", >> + (uint64_t)(uintptr_t)failure_addr, >> + pdev->device.name); >> + return pdev; >> + } >> + } >> + } >> + return NULL; >> +} > You define here a new bus ops that takes either an rte_device or an > arbitrary address as input. > > In the uev handler that would call this ops afterward, you similarly try > to find either a bus using the device name, or then iterate over all > buses and try to find one able to handle the error. > > This seems redundant and prone to ambiguity: should one check that the > device address is actually linked with the provided address? If not, is > it an improper call or a special case? This is unclear. > > Note: I haven't followed the previous discussion, maybe the > dual dev_addr + failure_addr is warranted in the API here, > if so why not. Otherwise it just seems redundant: > the dev addr will never be within a physical BAR mapping, > and for all buses / drivers not using physical mappings, > the addr is only meaningful as a cue to find an internal > resource. > > You can use the bus->find_device() to iterate over buses, and design > your bus ops such that when provided with an addr, would do whatever it > needs internally to find a relevant resource and either handle the > error, or return that the error was not handled. if bus->find_device() can make think more simpler, why not? i will check that and modify it. > Something like that: > > /* new bus ops: */ > /* If generic error codes are defined as part of the API, > >0 should mean that the sigbus was not handled, > <0 that an error occured but that one should stop trying, > 0 that everything is ok. > */ > int (*handle_sigbus)(void *addr); > > /* new rte_bus API: */ > > static int > bus_handle_sigbus(const struct rte_bus *bus, > const void *addr) > { > /* If additional error codes are defined as part of the API, > negative values should stop the iteration. > In this case, rte_errno would need to be set as well. > */ > return !(bus->handle_sigbus && bus->handle_sigbus(addr) <= 0); > } > > int > rte_bus_sigbus_handler(void *addr) > { > struct rte_bus *bus; > int old_errno = rte_errno; > > rte_errno = 0; > bus = rte_bus_find(NULL, bus_handle_sigbus, addr); > if (bus == NULL) { > /* ERROR: no bus could handle the error. */ > RTE_LOG(ERR, EAL, "No bus was able to handle the error"); > return -1; > } else if {rte_errno != 0) { > /* ERROR: a generic sigbus handling error. */ > RTE_LOG(ERR, EAL, "Say what the error is"); > return -1; > } > rte_errno = old_errno; > return 0; > } > > Which would afterward be implemented, for example in PCI bus: > > static rte_pci_device * > pci_find_device_by_addr(void *addr) > { > struct rte_pci_device *pdev; > > FOREACH_DEVICE_ON_PCIBUS(pdev) > if (&pdev->device == addr || > /* addr within mappings of pdev */) > return pdev; > return NULL; > } > > static int > pci_handle_sigbus(void *addr) > { > static rte_pci_device *pdev; > > pdev = pci_find_device_by_addr(addr); > if (pdev == NULL) > return -1; > /* Here, handle uio remap if needed. */ > } > > ------------------------- > > - Leave the bus iteration and check within rte_bus. Centralize the call > in a tighter rte_bus API, do not use directly your OPS from your other > EAL facility. > > - You have left several error messages for signaling success (!), or > even simply that a bus could not handle a specific address. This is > bad. An error should only appear on error. Otherwise, all of this can > easily be traced using a debugger, so I don't think it's necessary > to leave it at a DEBUG level. > > But in any case, remove all ERR-level messages for success. make sense. i will check which debug message is no need at least. > <...> > >> diff --git a/lib/librte_eal/common/include/rte_bus.h b/lib/librte_eal/common/include/rte_bus.h >> index eb9eded..6a5609f 100644 >> --- a/lib/librte_eal/common/include/rte_bus.h >> +++ b/lib/librte_eal/common/include/rte_bus.h >> @@ -168,6 +168,20 @@ typedef int (*rte_bus_unplug_t)(struct rte_device *dev); >> typedef int (*rte_bus_parse_t)(const char *name, void *addr); >> >> /** >> + * Implementation a specific hot unplug handler, which is responsible >> + * for handle the failure when hot unplug the device, guaranty the system >> + * would not hung in the case. >> + * @param dev >> + * Pointer of the device structure. >> + * >> + * @return >> + * 0 on success. >> + * !0 on error. >> + */ >> +typedef int (*rte_bus_handle_hot_unplug_t)(struct rte_device *dev, >> + void *dev_addr); >> + > I don't like the name of the OPS. > The documentation evokes only "the failure". > So is it a handle for any and all error possibly happening to a device? > If so, where is the input to describe the error? > If it is only meant to handle SIGBUS, because it is a very specific > error state only meant to happen on certain parts of the bus (the queue > mappings, if relevant), then it makes sense to only have an arbitrary > address as context for handling. > > But then, it needs to be called as such. The expected failure to be > handled should be explicit in the name of the ops, and the documentation > should be more precise about what a bus developper should do with the > input. I agree with your point of let the name more explicit, but i think here maybe we should spit it into two ops, the one is hotplug_handler, the other is sigbus_handler, because there are 2 path that both, data path and control path, they are also need to call remap function when detect the hot remove event,even there are no sigbus happen. >> +/** >> * Bus scan policies >> */ >> enum rte_bus_scan_mode { >> @@ -209,6 +223,8 @@ struct rte_bus { >> rte_bus_plug_t plug; /**< Probe single device for drivers */ >> rte_bus_unplug_t unplug; /**< Remove single device from driver */ >> rte_bus_parse_t parse; /**< Parse a device name */ >> + rte_bus_handle_hot_unplug_t handle_hot_unplug; /**< handle hot unplug >> + device event */ > The new ops should be added at the end of the structure. ok. > Regards,