From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mga04.intel.com (mga04.intel.com [192.55.52.120]) by dpdk.org (Postfix) with ESMTP id CD8205F25 for ; Thu, 18 Oct 2018 07:51:56 +0200 (CEST) X-Amp-Result: SKIPPED(no attachment in message) X-Amp-File-Uploaded: False Received: from orsmga002.jf.intel.com ([10.7.209.21]) by fmsmga104.fm.intel.com with ESMTP/TLS/DHE-RSA-AES256-GCM-SHA384; 17 Oct 2018 22:51:55 -0700 X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="5.54,395,1534834800"; d="scan'208";a="101190062" Received: from jguo15x-mobl.ccr.corp.intel.com (HELO [10.67.68.85]) ([10.67.68.85]) by orsmga002.jf.intel.com with ESMTP; 17 Oct 2018 22:51:50 -0700 To: Ferruh Yigit , stephen@networkplumber.org, bruce.richardson@intel.com, konstantin.ananyev@intel.com, gaetan.rivet@6wind.com, jingjing.wu@intel.com, thomas@monjalon.net, motih@mellanox.com, matan@mellanox.com, harry.van.haaren@intel.com, qi.z.zhang@intel.com, shaopeng.he@intel.com, bernard.iremonger@intel.com, arybchenko@solarflare.com, wenzhuo.lu@intel.com Cc: jblunck@infradead.org, shreyansh.jain@nxp.com, dev@dpdk.org, helin.zhang@intel.com References: <1498711073-42917-1-git-send-email-jia.guo@intel.com> <1534502916-31636-1-git-send-email-jia.guo@intel.com> <1534502916-31636-9-git-send-email-jia.guo@intel.com> From: Jeff Guo Message-ID: <920f4be1-cb58-803a-1aff-aaeb167ca682@intel.com> Date: Thu, 18 Oct 2018 13:51:49 +0800 User-Agent: Mozilla/5.0 (Windows NT 10.0; WOW64; rv:60.0) Gecko/20100101 Thunderbird/60.2.1 MIME-Version: 1.0 In-Reply-To: Content-Type: text/plain; charset=utf-8; format=flowed Content-Transfer-Encoding: 8bit Content-Language: en-US Subject: Re: [dpdk-dev] [PATCH v10 7/8] igb_uio: fix unexpected remove issue for hotplug X-BeenThere: dev@dpdk.org X-Mailman-Version: 2.1.15 Precedence: list List-Id: DPDK patches and discussions List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 18 Oct 2018 05:51:57 -0000 hi, ferruh On 9/27/2018 11:07 PM, Ferruh Yigit wrote: > On 8/17/2018 11:48 AM, Jeff Guo wrote: >> When a device is hotplugged out, the PCI resource is released in the >> kernel, the UIO file descriptor will disappear and the irq will be >> released. After this, a kernel crash will be caused if the igb uio driver >> tries to access or release these resources. >> >> And more, uio_remove will be called unexpectedly before uio_release >> when device be hotpluggged out, the uio_remove procedure will >> free resources that are required by uio_release. This will later affect the >> usage of interrupt as there is no way to disable the interrupt which is >> defined in uio_release. >> >> To prevent this, the hotplug removal needs to be identified and processed >> accordingly in igb uio driver. >> >> This patch proposes the addition of enum rte_udev_state in the >> rte_uio_pci_dev struct. This will store the state of the uio device as one >> of the following: probed/opened/released/removed. >> >> This patch also checks the kobject's remove_uevent_sent state to detect if >> the removal status is hotplug-out. Once a hotplug-out is detected, it will >> call uio_release and set the uio status to "removed". After that, uio will >> check the status in the uio_release function. If uio has already been >> removed, it will only free the dirty uio resource. >> >> Signed-off-by: Jeff Guo >> Acked-by: Shaopeng He > <...> > >> @@ -331,20 +351,35 @@ igbuio_pci_open(struct uio_info *info, struct inode *inode) >> >> /* enable interrupts */ >> err = igbuio_pci_enable_interrupts(udev); >> - mutex_unlock(&udev->lock); >> if (err) { >> dev_err(&dev->dev, "Enable interrupt fails\n"); >> + pci_clear_master(dev); > Why pci_clear_master required here. It is because set master is before interrupt enabling, if enable interrupt fails should clear master before return i think. Anyway it is not belong to this patch perspective, it could be separated to another one. > btw, some part of this patch conflicts with [1], which removes mutes and use > atomic refcnt operations, but introducing state seems needs mutex. > > [1] > igb_uio: fix refcount if open returns error > https://patches.dpdk.org/patch/44732/ yes, i see and will rework for that if need. >> + mutex_unlock(&udev->lock); >> return err; >> } >> + udev->state = RTE_UDEV_OPENNED; >> + mutex_unlock(&udev->lock); >> return 0; >> } >> >> +/** >> + * This gets called while closing uio device file. >> + */ >> static int >> igbuio_pci_release(struct uio_info *info, struct inode *inode) >> { >> struct rte_uio_pci_dev *udev = info->priv; >> struct pci_dev *dev = udev->pdev; >> >> + if (udev->state == RTE_UDEV_REMOVED) { >> + mutex_destroy(&udev->lock); >> + igbuio_pci_release_iomem(&udev->info); >> + pci_disable_device(dev); >> + pci_set_drvdata(dev, NULL); >> + kfree(udev); >> + return 0; > This branch taken when pci_remove called before pci_release. > - At this stage is "dev" valid, since pci_remove() called? > - In this path uio_unregister_device() is missing, who unregisters uio? > - sysfs_remove_group() also missing, it is not clear if it is forgotten or left > out, what do you think move common part of pci_remove into new function and call > both in pci_remove and here? It is not forgotten but specific left out, since the if uio remove before uio release it will cause kernel error, which is double free the already-free irq issue when uio unregister device. > And as a logic, can we make pci_remove clear everything, instead of doing some > cleanup here. Like: > pci_remove: > - calls pci_release > - instead of return keeps doing pci_remove work > - set state to REMOVED > > pci_release: > - if state is REMOVED, return without doing nothing I think the logic you said here is make sense, just make release and remove more focus their own work. > > btw, even after uio_unregister_device() how pci_release called?  The consequence of igb uio removal is that, igb uio remove be called, then igb uio release be called when detaching device, then if quit app it will call pci remove. > > It can help to share crash backtrace in commit log, to describe problem in more > detail. I will do that. And i think the 2 thing need to fix is that, one is the double free irq issue, the other one is give the chance to disable interrupt when uio remove be called before uio release. I check again and find that just add release before remove could both fix these issues, so please review my coming update patch. Thanks.