From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mga07.intel.com (mga07.intel.com [134.134.136.100]) by dpdk.org (Postfix) with ESMTP id 53B441B4A8 for ; Thu, 27 Sep 2018 17:07:26 +0200 (CEST) X-Amp-Result: SKIPPED(no attachment in message) X-Amp-File-Uploaded: False Received: from fmsmga007.fm.intel.com ([10.253.24.52]) by orsmga105.jf.intel.com with ESMTP/TLS/DHE-RSA-AES256-GCM-SHA384; 27 Sep 2018 08:07:25 -0700 X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="5.54,311,1534834800"; d="scan'208";a="73458491" Received: from fyigit-mobl.ger.corp.intel.com (HELO [10.252.3.51]) ([10.252.3.51]) by fmsmga007.fm.intel.com with ESMTP; 27 Sep 2018 08:07:20 -0700 To: Jeff Guo , stephen@networkplumber.org, bruce.richardson@intel.com, konstantin.ananyev@intel.com, gaetan.rivet@6wind.com, jingjing.wu@intel.com, thomas@monjalon.net, motih@mellanox.com, matan@mellanox.com, harry.van.haaren@intel.com, qi.z.zhang@intel.com, shaopeng.he@intel.com, bernard.iremonger@intel.com, arybchenko@solarflare.com, wenzhuo.lu@intel.com Cc: jblunck@infradead.org, shreyansh.jain@nxp.com, dev@dpdk.org, helin.zhang@intel.com References: <1498711073-42917-1-git-send-email-jia.guo@intel.com> <1534502916-31636-1-git-send-email-jia.guo@intel.com> <1534502916-31636-9-git-send-email-jia.guo@intel.com> From: Ferruh Yigit Openpgp: preference=signencrypt Message-ID: Date: Thu, 27 Sep 2018 16:07:19 +0100 User-Agent: Mozilla/5.0 (Windows NT 10.0; WOW64; rv:52.0) Gecko/20100101 Thunderbird/52.9.1 MIME-Version: 1.0 In-Reply-To: <1534502916-31636-9-git-send-email-jia.guo@intel.com> Content-Type: text/plain; charset=utf-8 Content-Language: en-US Content-Transfer-Encoding: 7bit Subject: Re: [dpdk-dev] [PATCH v10 7/8] igb_uio: fix unexpected remove issue for hotplug X-BeenThere: dev@dpdk.org X-Mailman-Version: 2.1.15 Precedence: list List-Id: DPDK patches and discussions List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 27 Sep 2018 15:07:26 -0000 On 8/17/2018 11:48 AM, Jeff Guo wrote: > When a device is hotplugged out, the PCI resource is released in the > kernel, the UIO file descriptor will disappear and the irq will be > released. After this, a kernel crash will be caused if the igb uio driver > tries to access or release these resources. > > And more, uio_remove will be called unexpectedly before uio_release > when device be hotpluggged out, the uio_remove procedure will > free resources that are required by uio_release. This will later affect the > usage of interrupt as there is no way to disable the interrupt which is > defined in uio_release. > > To prevent this, the hotplug removal needs to be identified and processed > accordingly in igb uio driver. > > This patch proposes the addition of enum rte_udev_state in the > rte_uio_pci_dev struct. This will store the state of the uio device as one > of the following: probed/opened/released/removed. > > This patch also checks the kobject's remove_uevent_sent state to detect if > the removal status is hotplug-out. Once a hotplug-out is detected, it will > call uio_release and set the uio status to "removed". After that, uio will > check the status in the uio_release function. If uio has already been > removed, it will only free the dirty uio resource. > > Signed-off-by: Jeff Guo > Acked-by: Shaopeng He <...> > @@ -331,20 +351,35 @@ igbuio_pci_open(struct uio_info *info, struct inode *inode) > > /* enable interrupts */ > err = igbuio_pci_enable_interrupts(udev); > - mutex_unlock(&udev->lock); > if (err) { > dev_err(&dev->dev, "Enable interrupt fails\n"); > + pci_clear_master(dev); Why pci_clear_master required here. btw, some part of this patch conflicts with [1], which removes mutes and use atomic refcnt operations, but introducing state seems needs mutex. [1] igb_uio: fix refcount if open returns error https://patches.dpdk.org/patch/44732/ > + mutex_unlock(&udev->lock); > return err; > } > + udev->state = RTE_UDEV_OPENNED; > + mutex_unlock(&udev->lock); > return 0; > } > > +/** > + * This gets called while closing uio device file. > + */ > static int > igbuio_pci_release(struct uio_info *info, struct inode *inode) > { > struct rte_uio_pci_dev *udev = info->priv; > struct pci_dev *dev = udev->pdev; > > + if (udev->state == RTE_UDEV_REMOVED) { > + mutex_destroy(&udev->lock); > + igbuio_pci_release_iomem(&udev->info); > + pci_disable_device(dev); > + pci_set_drvdata(dev, NULL); > + kfree(udev); > + return 0; This branch taken when pci_remove called before pci_release. - At this stage is "dev" valid, since pci_remove() called? - In this path uio_unregister_device() is missing, who unregisters uio? - sysfs_remove_group() also missing, it is not clear if it is forgotten or left out, what do you think move common part of pci_remove into new function and call both in pci_remove and here? And as a logic, can we make pci_remove clear everything, instead of doing some cleanup here. Like: pci_remove: - calls pci_release - instead of return keeps doing pci_remove work - set state to REMOVED pci_release: - if state is REMOVED, return without doing nothing btw, even after uio_unregister_device() how pci_release called? It can help to share crash backtrace in commit log, to describe problem in more detail.