From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <jingjing.wu@intel.com>
Received: from mga09.intel.com (mga09.intel.com [134.134.136.24])
 by dpdk.org (Postfix) with ESMTP id A864825B3;
 Thu, 12 Oct 2017 07:43:34 +0200 (CEST)
Received: from fmsmga005.fm.intel.com ([10.253.24.32])
 by orsmga102.jf.intel.com with ESMTP/TLS/DHE-RSA-AES256-GCM-SHA384;
 11 Oct 2017 22:43:33 -0700
X-ExtLoop1: 1
X-IronPort-AV: E=Sophos;i="5.43,364,1503385200"; d="scan'208";a="161746924"
Received: from fmsmsx104.amr.corp.intel.com ([10.18.124.202])
 by fmsmga005.fm.intel.com with ESMTP; 11 Oct 2017 22:43:32 -0700
Received: from fmsmsx156.amr.corp.intel.com (10.18.116.74) by
 fmsmsx104.amr.corp.intel.com (10.18.124.202) with Microsoft SMTP Server (TLS)
 id 14.3.319.2; Wed, 11 Oct 2017 22:43:32 -0700
Received: from shsmsx152.ccr.corp.intel.com (10.239.6.52) by
 fmsmsx156.amr.corp.intel.com (10.18.116.74) with Microsoft SMTP Server (TLS)
 id 14.3.319.2; Wed, 11 Oct 2017 22:43:32 -0700
Received: from shsmsx103.ccr.corp.intel.com ([169.254.4.213]) by
 SHSMSX152.ccr.corp.intel.com ([169.254.6.93]) with mapi id 14.03.0319.002;
 Thu, 12 Oct 2017 13:43:30 +0800
From: "Wu, Jingjing" <jingjing.wu@intel.com>
To: "Yigit, Ferruh" <ferruh.yigit@intel.com>, "Tan, Jianfeng"
 <jianfeng.tan@intel.com>, "shijith.thotton@caviumnetworks.com"
 <shijith.thotton@caviumnetworks.com>, "gregory@weka.io" <gregory@weka.io>,
 "Xing, Beilei" <beilei.xing@intel.com>
CC: "dev@dpdk.org" <dev@dpdk.org>, "stable@dpdk.org" <stable@dpdk.org>
Thread-Topic: [PATCH v2 2/2] igb_uio: fix interrupt enablement after FLR in VM
Thread-Index: AQHTQYiJirQH/y4uiEin3zVjl/Hz6KLftsbA
Date: Thu, 12 Oct 2017 05:43:29 +0000
Message-ID: <9BB6961774997848B5B42BEC655768F810E92434@SHSMSX103.ccr.corp.intel.com>
References: <1507581083-33667-2-git-send-email-jingjing.wu@intel.com>
 <1507586960-35508-1-git-send-email-jingjing.wu@intel.com>
In-Reply-To: <1507586960-35508-1-git-send-email-jingjing.wu@intel.com>
Accept-Language: en-US
Content-Language: en-US
X-MS-Has-Attach: 
X-MS-TNEF-Correlator: 
x-originating-ip: [10.239.127.40]
Content-Type: text/plain; charset="us-ascii"
Content-Transfer-Encoding: quoted-printable
MIME-Version: 1.0
Subject: Re: [dpdk-stable] [PATCH v2 2/2] igb_uio: fix interrupt enablement
	after FLR in VM
X-BeenThere: stable@dpdk.org
X-Mailman-Version: 2.1.15
Precedence: list
List-Id: patches for DPDK stable branches <stable.dpdk.org>
List-Unsubscribe: <http://dpdk.org/ml/options/stable>,
 <mailto:stable-request@dpdk.org?subject=unsubscribe>
List-Archive: <http://dpdk.org/ml/archives/stable/>
List-Post: <mailto:stable@dpdk.org>
List-Help: <mailto:stable-request@dpdk.org?subject=help>
List-Subscribe: <http://dpdk.org/ml/listinfo/stable>,
 <mailto:stable-request@dpdk.org?subject=subscribe>
X-List-Received-Date: Thu, 12 Oct 2017 05:43:35 -0000

Hi, Shjith

Could you help to review and verify if this fix can meet your requirement?

Thanks
Jingjing

> -----Original Message-----
> From: Wu, Jingjing
> Sent: Tuesday, October 10, 2017 6:09 AM
> To: Yigit, Ferruh <ferruh.yigit@intel.com>; Tan, Jianfeng
> <jianfeng.tan@intel.com>; shijith.thotton@caviumnetworks.com;
> gregory@weka.io; Xing, Beilei <beilei.xing@intel.com>
> Cc: dev@dpdk.org; Wu, Jingjing <jingjing.wu@intel.com>; stable@dpdk.org
> Subject: [PATCH v2 2/2] igb_uio: fix interrupt enablement after FLR in VM
>=20
> If pass-through a VF by vfio-pci to a Qemu VM, after FLR in VM, the inter=
rupt
> setting is not recoverd correctly to host as below:
>  in VM guest:
>         Capabilities: [70] MSI-X: Enable+ Count=3D5 Masked-  in Host:
>         Capabilities: [70] MSI-X: Enable+ Count=3D5 Masked-
>=20
> That was because in pci_reset_function, it first reads the PCI configure =
and set
> FLR reset, and then writes PCI configure as restoration. But not all the =
writing
> are successful to Host.
> Because vfio-pci driver doesn't allow directly write PCI MSI-X Cap.
>=20
> To fix this issue, we need to move the interrupt enablement from igb_uio =
probe
> to open device file. While it is also the similar as the behaviour in vfi=
o_pci
> kernel module code.
>=20
> Fixes: b58eedfc7dd5 ("igb_uio: issue FLR during open and release of devic=
e file")
>=20
> Cc: stable@dpdk.org
>=20
> Signed-off-by: Jingjing Wu <jingjing.wu@intel.com>
> Signed-off-by: Jianfeng Tan <jianfeng.tan@intel.com>
> ---
> v2 change:
>  - fix typo
>  - remove duplicated debug info
>=20
>  lib/librte_eal/linuxapp/igb_uio/igb_uio.c | 221 ++++++++++++++++--------=
------
>  1 file changed, 119 insertions(+), 102 deletions(-)
>=20
> diff --git a/lib/librte_eal/linuxapp/igb_uio/igb_uio.c
> b/lib/librte_eal/linuxapp/igb_uio/igb_uio.c
> index c8dd5f4..d943795 100644
> --- a/lib/librte_eal/linuxapp/igb_uio/igb_uio.c
> +++ b/lib/librte_eal/linuxapp/igb_uio/igb_uio.c
> @@ -49,7 +49,6 @@ struct rte_uio_pci_dev {
>=20
>  static char *intr_mode;
>  static enum rte_intr_mode igbuio_intr_mode_preferred =3D
> RTE_INTR_MODE_MSIX;
> -
>  /* sriov sysfs */
>  static ssize_t
>  show_max_vfs(struct device *dev, struct device_attribute *attr, @@ -144,=
8
> +143,10 @@ igbuio_pci_irqcontrol(struct uio_info *info, s32 irq_state)
>   * If yes, disable it here and will be enable later.
>   */
>  static irqreturn_t
> -igbuio_pci_irqhandler(int irq, struct uio_info *info)
> +igbuio_pci_irqhandler(int irq, void *dev_id)
>  {
> +	struct uio_device *idev =3D (struct uio_device *)dev_id;
> +	struct uio_info *info =3D idev->info;
>  	struct rte_uio_pci_dev *udev =3D info->priv;
>=20
>  	/* Legacy mode need to mask in hardware */ @@ -153,10 +154,115
> @@ igbuio_pci_irqhandler(int irq, struct uio_info *info)
>  	    !pci_check_and_mask_intx(udev->pdev))
>  		return IRQ_NONE;
>=20
> +	uio_event_notify(info);
> +
>  	/* Message signal mode, no share IRQ and automasked */
>  	return IRQ_HANDLED;
>  }
>=20
> +static int
> +igbuio_pci_enable_interrupts(struct rte_uio_pci_dev *udev) {
> +	int err =3D 0;
> +#ifndef HAVE_ALLOC_IRQ_VECTORS
> +	struct msix_entry msix_entry;
> +#endif
> +
> +	switch (igbuio_intr_mode_preferred) {
> +	case RTE_INTR_MODE_MSIX:
> +		/* Only 1 msi-x vector needed */
> +#ifndef HAVE_ALLOC_IRQ_VECTORS
> +		msix_entry.entry =3D 0;
> +		if (pci_enable_msix(udev->pdev, &msix_entry, 1) =3D=3D 0) {
> +			dev_dbg(&udev->pdev->dev, "using MSI-X");
> +			udev->info.irq_flags =3D IRQF_NO_THREAD;
> +			udev->info.irq =3D msix_entry.vector;
> +			udev->mode =3D RTE_INTR_MODE_MSIX;
> +			break;
> +		}
> +#else
> +		if (pci_alloc_irq_vectors(udev->pdev, 1, 1, PCI_IRQ_MSIX) =3D=3D 1)
> {
> +			dev_dbg(&udev->pdev->dev, "using MSI-X");
> +			udev->info.irq_flags =3D IRQF_NO_THREAD;
> +			udev->info.irq =3D pci_irq_vector(udev->pdev, 0);
> +			udev->mode =3D RTE_INTR_MODE_MSIX;
> +			break;
> +		}
> +#endif
> +
> +	/* fall back to MSI */
> +	case RTE_INTR_MODE_MSI:
> +#ifndef HAVE_ALLOC_IRQ_VECTORS
> +		if (pci_enable_msi(udev->pdev) =3D=3D 0) {
> +			dev_dbg(&udev->pdev->dev, "using MSI");
> +			udev->info.irq_flags =3D IRQF_NO_THREAD;
> +			udev->info.irq =3D udev->pdev->irq;
> +			udev->mode =3D RTE_INTR_MODE_MSI;
> +			break;
> +		}
> +#else
> +		if (pci_alloc_irq_vectors(udev->pdev, 1, 1, PCI_IRQ_MSI) =3D=3D 1)
> {
> +			dev_dbg(&udev->pdev->dev, "using MSI");
> +			udev->info.irq_flags =3D IRQF_NO_THREAD;
> +			udev->info.irq =3D pci_irq_vector(udev->pdev, 0);
> +			udev->mode =3D RTE_INTR_MODE_MSI;
> +			break;
> +		}
> +#endif
> +	/* fall back to INTX */
> +	case RTE_INTR_MODE_LEGACY:
> +		if (pci_intx_mask_supported(udev->pdev)) {
> +			dev_dbg(&udev->pdev->dev, "using INTX");
> +			udev->info.irq_flags =3D IRQF_SHARED |
> IRQF_NO_THREAD;
> +			udev->info.irq =3D udev->pdev->irq;
> +			udev->mode =3D RTE_INTR_MODE_LEGACY;
> +			break;
> +		}
> +		dev_notice(&udev->pdev->dev, "PCI INTX mask not
> supported\n");
> +	/* fall back to no IRQ */
> +	case RTE_INTR_MODE_NONE:
> +		udev->mode =3D RTE_INTR_MODE_NONE;
> +		udev->info.irq =3D UIO_IRQ_NONE;
> +		break;
> +
> +	default:
> +		dev_err(&udev->pdev->dev, "invalid IRQ mode %u",
> +			igbuio_intr_mode_preferred);
> +		udev->info.irq =3D UIO_IRQ_NONE;
> +		err =3D -EINVAL;
> +	}
> +
> +	if (udev->info.irq !=3D UIO_IRQ_NONE)
> +		err =3D request_irq(udev->info.irq, igbuio_pci_irqhandler,
> +				  udev->info.irq_flags, udev->info.name,
> +				  udev->info.uio_dev);
> +	dev_info(&udev->pdev->dev, "uio device registered with irq %lx\n",
> +		 udev->info.irq);
> +
> +	return err;
> +}
> +
> +static void
> +igbuio_pci_disable_interrupts(struct rte_uio_pci_dev *udev) {
> +	if (udev->info.irq) {
> +		free_irq(udev->info.irq, udev->info.uio_dev);
> +		udev->info.irq =3D 0;
> +	}
> +
> +#ifndef HAVE_ALLOC_IRQ_VECTORS
> +	if (udev->mode =3D=3D RTE_INTR_MODE_MSIX)
> +		pci_disable_msix(udev->pdev);
> +	if (udev->mode =3D=3D RTE_INTR_MODE_MSI)
> +		pci_disable_msi(udev->pdev);
> +#else
> +	if (udev->mode =3D=3D RTE_INTR_MODE_MSIX ||
> +	    udev->mode =3D=3D RTE_INTR_MODE_MSI)
> +		pci_free_irq_vectors(udev->pdev);
> +#endif
> +}
> +
> +
>  /**
>   * This gets called while opening uio device file.
>   */
> @@ -165,12 +271,19 @@ igbuio_pci_open(struct uio_info *info, struct inode
> *inode)  {
>  	struct rte_uio_pci_dev *udev =3D info->priv;
>  	struct pci_dev *dev =3D udev->pdev;
> +	int err;
>=20
>  	pci_reset_function(dev);
>=20
>  	/* set bus master, which was cleared by the reset function */
>  	pci_set_master(dev);
>=20
> +	/* enable interrupts */
> +	err =3D igbuio_pci_enable_interrupts(udev);
> +	if (err) {
> +		dev_err(&dev->dev, "Enable interrupt fails\n");
> +		return err;
> +	}
>  	return 0;
>  }
>=20
> @@ -180,6 +293,9 @@ igbuio_pci_release(struct uio_info *info, struct inod=
e
> *inode)
>  	struct rte_uio_pci_dev *udev =3D info->priv;
>  	struct pci_dev *dev =3D udev->pdev;
>=20
> +	/* disable interrupts */
> +	igbuio_pci_disable_interrupts(udev);
> +
>  	/* stop the device from further DMA */
>  	pci_clear_master(dev);
>=20
> @@ -250,94 +366,6 @@ igbuio_pci_release_iomem(struct uio_info *info)  }
>=20
>  static int
> -igbuio_pci_enable_interrupts(struct rte_uio_pci_dev *udev) -{
> -	int err =3D 0;
> -#ifndef HAVE_ALLOC_IRQ_VECTORS
> -	struct msix_entry msix_entry;
> -#endif
> -
> -	switch (igbuio_intr_mode_preferred) {
> -	case RTE_INTR_MODE_MSIX:
> -		/* Only 1 msi-x vector needed */
> -#ifndef HAVE_ALLOC_IRQ_VECTORS
> -		msix_entry.entry =3D 0;
> -		if (pci_enable_msix(udev->pdev, &msix_entry, 1) =3D=3D 0) {
> -			dev_dbg(&udev->pdev->dev, "using MSI-X");
> -			udev->info.irq_flags =3D IRQF_NO_THREAD;
> -			udev->info.irq =3D msix_entry.vector;
> -			udev->mode =3D RTE_INTR_MODE_MSIX;
> -			break;
> -		}
> -#else
> -		if (pci_alloc_irq_vectors(udev->pdev, 1, 1, PCI_IRQ_MSIX) =3D=3D 1)
> {
> -			dev_dbg(&udev->pdev->dev, "using MSI-X");
> -			udev->info.irq_flags =3D IRQF_NO_THREAD;
> -			udev->info.irq =3D pci_irq_vector(udev->pdev, 0);
> -			udev->mode =3D RTE_INTR_MODE_MSIX;
> -			break;
> -		}
> -#endif
> -	/* fall back to MSI */
> -	case RTE_INTR_MODE_MSI:
> -#ifndef HAVE_ALLOC_IRQ_VECTORS
> -		if (pci_enable_msi(udev->pdev) =3D=3D 0) {
> -			dev_dbg(&udev->pdev->dev, "using MSI");
> -			udev->info.irq_flags =3D IRQF_NO_THREAD;
> -			udev->info.irq =3D udev->pdev->irq;
> -			udev->mode =3D RTE_INTR_MODE_MSI;
> -			break;
> -		}
> -#else
> -		if (pci_alloc_irq_vectors(udev->pdev, 1, 1, PCI_IRQ_MSI) =3D=3D 1)
> {
> -			dev_dbg(&udev->pdev->dev, "using MSI");
> -			udev->info.irq_flags =3D IRQF_NO_THREAD;
> -			udev->info.irq =3D pci_irq_vector(udev->pdev, 0);
> -			udev->mode =3D RTE_INTR_MODE_MSI;
> -			break;
> -		}
> -#endif
> -	/* fall back to INTX */
> -	case RTE_INTR_MODE_LEGACY:
> -		if (pci_intx_mask_supported(udev->pdev)) {
> -			dev_dbg(&udev->pdev->dev, "using INTX");
> -			udev->info.irq_flags =3D IRQF_SHARED |
> IRQF_NO_THREAD;
> -			udev->info.irq =3D udev->pdev->irq;
> -			udev->mode =3D RTE_INTR_MODE_LEGACY;
> -			break;
> -		}
> -		dev_notice(&udev->pdev->dev, "PCI INTX mask not
> supported\n");
> -	/* fall back to no IRQ */
> -	case RTE_INTR_MODE_NONE:
> -		udev->mode =3D RTE_INTR_MODE_NONE;
> -		udev->info.irq =3D UIO_IRQ_NONE;
> -		break;
> -
> -	default:
> -		dev_err(&udev->pdev->dev, "invalid IRQ mode %u",
> -			igbuio_intr_mode_preferred);
> -		err =3D -EINVAL;
> -	}
> -
> -	return err;
> -}
> -
> -static void
> -igbuio_pci_disable_interrupts(struct rte_uio_pci_dev *udev) -{ -#ifndef
> HAVE_ALLOC_IRQ_VECTORS
> -	if (udev->mode =3D=3D RTE_INTR_MODE_MSIX)
> -		pci_disable_msix(udev->pdev);
> -	if (udev->mode =3D=3D RTE_INTR_MODE_MSI)
> -		pci_disable_msi(udev->pdev);
> -#else
> -	if (udev->mode =3D=3D RTE_INTR_MODE_MSIX ||
> -	    udev->mode =3D=3D RTE_INTR_MODE_MSI)
> -		pci_free_irq_vectors(udev->pdev);
> -#endif
> -}
> -
> -static int
>  igbuio_setup_bars(struct pci_dev *dev, struct uio_info *info)  {
>  	int i, iom, iop, ret;
> @@ -427,20 +455,15 @@ igbuio_pci_probe(struct pci_dev *dev, const struct
> pci_device_id *id)
>  	/* fill uio infos */
>  	udev->info.name =3D "igb_uio";
>  	udev->info.version =3D "0.1";
> -	udev->info.handler =3D igbuio_pci_irqhandler;
>  	udev->info.irqcontrol =3D igbuio_pci_irqcontrol;
>  	udev->info.open =3D igbuio_pci_open;
>  	udev->info.release =3D igbuio_pci_release;
>  	udev->info.priv =3D udev;
>  	udev->pdev =3D dev;
>=20
> -	err =3D igbuio_pci_enable_interrupts(udev);
> -	if (err !=3D 0)
> -		goto fail_release_iomem;
> -
>  	err =3D sysfs_create_group(&dev->dev.kobj, &dev_attr_grp);
>  	if (err !=3D 0)
> -		goto fail_disable_interrupts;
> +		goto fail_release_iomem;
>=20
>  	/* register uio driver */
>  	err =3D uio_register_device(&dev->dev, &udev->info); @@ -449,9 +472,6
> @@ igbuio_pci_probe(struct pci_dev *dev, const struct pci_device_id *id)
>=20
>  	pci_set_drvdata(dev, udev);
>=20
> -	dev_info(&dev->dev, "uio device registered with irq %lx\n",
> -		 udev->info.irq);
> -
>  	/*
>  	 * Doing a harmless dma mapping for attaching the device to
>  	 * the iommu identity mapping if kernel boots with iommu=3Dpt.
> @@ -477,8 +497,6 @@ igbuio_pci_probe(struct pci_dev *dev, const struct
> pci_device_id *id)
>=20
>  fail_remove_group:
>  	sysfs_remove_group(&dev->dev.kobj, &dev_attr_grp);
> -fail_disable_interrupts:
> -	igbuio_pci_disable_interrupts(udev);
>  fail_release_iomem:
>  	igbuio_pci_release_iomem(&udev->info);
>  	pci_disable_device(dev);
> @@ -495,7 +513,6 @@ igbuio_pci_remove(struct pci_dev *dev)
>=20
>  	sysfs_remove_group(&dev->dev.kobj, &dev_attr_grp);
>  	uio_unregister_device(&udev->info);
> -	igbuio_pci_disable_interrupts(udev);
>  	igbuio_pci_release_iomem(&udev->info);
>  	pci_disable_device(dev);
>  	pci_set_drvdata(dev, NULL);
> --
> 2.7.4