From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <konstantin.ananyev@intel.com>
Received: from mga03.intel.com (mga03.intel.com [134.134.136.65])
 by dpdk.org (Postfix) with ESMTP id 9E95C7EC7
 for <dev@dpdk.org>; Fri, 20 Apr 2018 12:32:05 +0200 (CEST)
X-Amp-Result: SKIPPED(no attachment in message)
X-Amp-File-Uploaded: False
Received: from orsmga001.jf.intel.com ([10.7.209.18])
 by orsmga103.jf.intel.com with ESMTP/TLS/DHE-RSA-AES256-GCM-SHA384;
 20 Apr 2018 03:32:05 -0700
X-ExtLoop1: 1
X-IronPort-AV: E=Sophos;i="5.49,300,1520924400"; d="scan'208";a="49424464"
Received: from irsmsx151.ger.corp.intel.com ([163.33.192.59])
 by orsmga001.jf.intel.com with ESMTP; 20 Apr 2018 03:32:03 -0700
Received: from irsmsx102.ger.corp.intel.com ([169.254.2.164]) by
 IRSMSX151.ger.corp.intel.com ([169.254.4.181]) with mapi id 14.03.0319.002;
 Fri, 20 Apr 2018 11:32:02 +0100
From: "Ananyev, Konstantin" <konstantin.ananyev@intel.com>
To: "Guo, Jia" <jia.guo@intel.com>, "stephen@networkplumber.org"
 <stephen@networkplumber.org>, "Richardson, Bruce"
 <bruce.richardson@intel.com>, "Yigit, Ferruh" <ferruh.yigit@intel.com>,
 "gaetan.rivet@6wind.com" <gaetan.rivet@6wind.com>, "Wu, Jingjing"
 <jingjing.wu@intel.com>, "thomas@monjalon.net" <thomas@monjalon.net>,
 "motih@mellanox.com" <motih@mellanox.com>, "matan@mellanox.com"
 <matan@mellanox.com>, "Van Haaren, Harry" <harry.van.haaren@intel.com>,
 "Tan, Jianfeng" <jianfeng.tan@intel.com>
CC: "jblunck@infradead.org" <jblunck@infradead.org>, "shreyansh.jain@nxp.com"
 <shreyansh.jain@nxp.com>, "dev@dpdk.org" <dev@dpdk.org>, "Zhang, Helin"
 <helin.zhang@intel.com>
Thread-Topic: [PATCH V20 1/4] bus/pci: introduce device hot unplug handle
Thread-Index: AQHT1xqy3MPgIc/PH0e+Cvfaf7uu2KQJdHXQ
Date: Fri, 20 Apr 2018 10:32:01 +0000
Message-ID: <2601191342CEEE43887BDE71AB977258AE918A3B@IRSMSX102.ger.corp.intel.com>
References: <1498711073-42917-1-git-send-email-jia.guo@intel.com>
 <1524058689-4954-1-git-send-email-jia.guo@intel.com>
 <1524058689-4954-2-git-send-email-jia.guo@intel.com>
In-Reply-To: <1524058689-4954-2-git-send-email-jia.guo@intel.com>
Accept-Language: en-IE, en-US
Content-Language: en-US
Content-Type: text/plain; charset="us-ascii"
Content-Transfer-Encoding: quoted-printable
MIME-Version: 1.0
Subject: Re: [dpdk-dev] [PATCH V20 1/4] bus/pci: introduce device hot unplug
	handle
X-BeenThere: dev@dpdk.org
X-Mailman-Version: 2.1.15
Precedence: list
List-Id: DPDK patches and discussions <dev.dpdk.org>
List-Unsubscribe: <https://dpdk.org/ml/options/dev>,
 <mailto:dev-request@dpdk.org?subject=unsubscribe>
List-Archive: <http://dpdk.org/ml/archives/dev/>
List-Post: <mailto:dev@dpdk.org>
List-Help: <mailto:dev-request@dpdk.org?subject=help>
List-Subscribe: <https://dpdk.org/ml/listinfo/dev>,
 <mailto:dev-request@dpdk.org?subject=subscribe>
X-List-Received-Date: Fri, 20 Apr 2018 10:32:06 -0000

Hi Jeff,

> As of device hot unplug, we need some preparatory measures so that we will
> not encounter memory fault after device be plug out of the system,
> and also let we could recover the running data path but not been break.
> This patch allows the buses to handle device hot unplug event.
> The patch only enable the ops in pci bus, when handle device hot unplug
> event, remap a dummy memory to avoid bus read/write error.
> Other buses could accordingly implement this ops specific by themselves.
>
> Signed-off-by: Jeff Guo <jia.guo@intel.com>
> ---
> v20->19:
> clean the code
> ---
>  drivers/bus/pci/pci_common.c            | 67 +++++++++++++++++++++++++++++++++
>  drivers/bus/pci/pci_common_uio.c        | 32 ++++++++++++++++
>  drivers/bus/pci/private.h               | 12 ++++++
>  lib/librte_eal/common/include/rte_bus.h | 16 ++++++++
>  4 files changed, 127 insertions(+)
>
> diff --git a/drivers/bus/pci/pci_common.c b/drivers/bus/pci/pci_common.c
> index 2a00f36..709eaf3 100644
> --- a/drivers/bus/pci/pci_common.c
> +++ b/drivers/bus/pci/pci_common.c
> @@ -474,6 +474,72 @@ pci_find_device(const struct rte_device *start, rte_dev_cmp_t cmp,
>  }
>
>  static int
> +pci_handle_hot_unplug(struct rte_device *dev, void *failure_addr)
> +{
> +	struct rte_pci_device *pdev = NULL;
> +	int ret = 0, i, isfound = 0;
> +
> +	if (failure_addr != NULL) {
> +		FOREACH_DEVICE_ON_PCIBUS(pdev) {
> +			for (i = 0; i != sizeof(pdev->mem_resource) /
> +				sizeof(pdev->mem_resource[0]); i++) {

You can use i != RTE_DIM(pdev->mem_resource) here.
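
For reference, RTE_DIM() in rte_common.h is just the usual sizeof-based
element count. A rough standalone sketch of the same loop bound follows.
ARRAY_DIM, struct mem_res, struct fake_pci_dev and count_mapped() are
made-up stand-ins so that it builds without the DPDK headers:

```c
#include <stddef.h>
#include <stdint.h>

/* Local stand-in for RTE_DIM(); same sizeof-based idea. */
#define ARRAY_DIM(a) (sizeof(a) / sizeof((a)[0]))

/* Hypothetical, cut-down types; not the real rte_pci_device layout. */
struct mem_res {
	void *addr;
	uint64_t len;
};

struct fake_pci_dev {
	struct mem_res mem_resource[6];
};

/* Count the resources that have a mapping (addr != NULL). */
static unsigned int
count_mapped(const struct fake_pci_dev *pdev)
{
	unsigned int i, n = 0;

	/* The bound follows the array declaration automatically. */
	for (i = 0; i != ARRAY_DIM(pdev->mem_resource); i++)
		if (pdev->mem_resource[i].addr != NULL)
			n++;
	return n;
}
```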

> +				if ((uint64_t)failure_addr >=
> +				    (uint64_t)pdev->mem_resource[i].addr &&
> +				    (uint64_t)failure_addr <=
> +				    (uint64_t)pdev->mem_resource[i].addr +
> +				    pdev->mem_resource[i].len) {


I think it should be failure_addr < addr + len:
addr + len is already the first byte past the resource.
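
To illustrate the boundary case, here is a minimal sketch of a half-open
range check. addr_in_range() is a hypothetical helper, not something from
the patch. It compares the offset against len instead of computing
base + len, so the sum cannot wrap around:

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Return true when addr lies in the half-open range [base, base + len).
 * base + len itself is one past the end of the resource, so it is excluded.
 */
static bool
addr_in_range(const void *addr, const void *base, uint64_t len)
{
	uintptr_t a = (uintptr_t)addr;
	uintptr_t b = (uintptr_t)base;

	/* a - b is only evaluated when a >= b, so it cannot underflow. */
	return a >= b && a - b < len;
}
```

With a 16-byte resource, offsets 0..15 match and offset 16 does not.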

> +					RTE_LOG(ERR, EAL, "Failure address "
> +						"%16.16"PRIx64" is belong to "
> +						"resource of device %s!\n",
> +						(uint64_t)failure_addr,
> +						pdev->device.name);
> +					isfound = 1;
> +					break;
> +				}
> +			}
> +			if (isfound)
> +				break;


It might be a good idea to move the code that searches for the address into
a separate function.
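
Something along these lines, as a rough sketch only. The types, MAX_RES
and find_dev_by_addr() are made up for illustration, and a plain 'next'
pointer stands in for the bus device list that FOREACH_DEVICE_ON_PCIBUS
walks in the real code:

```c
#include <stddef.h>
#include <stdint.h>

#define MAX_RES 6 /* stand-in for PCI_MAX_RESOURCE */

/* Hypothetical, cut-down types; not the real DPDK structures. */
struct mem_res {
	void *addr;
	uint64_t len;
};

struct fake_pci_dev {
	struct fake_pci_dev *next;
	struct mem_res mem_resource[MAX_RES];
};

/*
 * Return the device owning failure_addr, or NULL if no resource of any
 * device on the list contains it. Returning from inside the inner loop
 * removes the need for an 'isfound' flag and the second break.
 */
static struct fake_pci_dev *
find_dev_by_addr(struct fake_pci_dev *head, const void *failure_addr)
{
	uintptr_t fa = (uintptr_t)failure_addr;
	struct fake_pci_dev *pdev;
	unsigned int i;

	for (pdev = head; pdev != NULL; pdev = pdev->next) {
		for (i = 0; i != MAX_RES; i++) {
			uintptr_t base = (uintptr_t)pdev->mem_resource[i].addr;
			uint64_t len = pdev->mem_resource[i].len;

			/* half-open range: [base, base + len) */
			if (fa >= base && fa - base < len)
				return pdev;
		}
	}
	return NULL;
}
```

The caller then only has to handle the NULL case once.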

> +		}
> +	} else if (dev != NULL) {
> +		pdev = RTE_DEV_TO_PCI(dev);
> +	} else {
> +		return -EINVAL;
> +	}
> +
> +	if (!pdev)
> +		return -1;
> +
> +	/* remap resources for devices */
> +	switch (pdev->kdrv) {
> +	case RTE_KDRV_VFIO:
> +#ifdef VFIO_PRESENT
> +		/* TODO */
> +#endif

Should set ret = -1 here, as this is not implemented yet.

> +		break;
> +	case RTE_KDRV_IGB_UIO:
> +	case RTE_KDRV_UIO_GENERIC:
> +		if (rte_eal_using_phys_addrs()) {
> +			/* map resources for devices that use uio */
> +			ret = pci_uio_remap_resource(pdev);
> +		}
> +		break;
> +	case RTE_KDRV_NIC_UIO:
> +		ret = pci_uio_remap_resource(pdev);
> +		break;
> +	default:
> +		RTE_LOG(DEBUG, EAL,
> +			"  Not managed by a supported kernel driver, skipped\n");
> +		ret = -1;
> +		break;
> +	}
> +
> +	if (ret != 0)
> +		RTE_LOG(ERR, EAL, "failed to handle hot unplug of %s",
> +			pdev->name);
> +	return ret;
> +}
> +
> +static int
>  pci_plug(struct rte_device *dev)
>  {
>  	return pci_probe_all_drivers(RTE_DEV_TO_PCI(dev));
> @@ -503,6 +569,7 @@ struct rte_pci_bus rte_pci_bus = {
>  		.unplug = pci_unplug,
>  		.parse = pci_parse,
>  		.get_iommu_class = rte_pci_get_iommu_class,
> +		.handle_hot_unplug = pci_handle_hot_unplug,
>  	},
>  	.device_list = TAILQ_HEAD_INITIALIZER(rte_pci_bus.device_list),
>  	.driver_list = TAILQ_HEAD_INITIALIZER(rte_pci_bus.driver_list),
> diff --git a/drivers/bus/pci/pci_common_uio.c b/drivers/bus/pci/pci_common_uio.c
> index 54bc20b..ba2c458 100644
> --- a/drivers/bus/pci/pci_common_uio.c
> +++ b/drivers/bus/pci/pci_common_uio.c
> @@ -146,6 +146,38 @@ pci_uio_unmap(struct mapped_pci_resource *uio_res)
>  	}
>  }
>
> +/* remap the PCI resource of a PCI device in anonymous virtual memory */
> +int
> +pci_uio_remap_resource(struct rte_pci_device *dev)
> +{
> +	int i;
> +	void *map_address;
> +
> +	if (dev == NULL)
> +		return -1;
> +
> +	/* Remap all BARs */
> +	for (i = 0; i != PCI_MAX_RESOURCE; i++) {
> +		/* skip empty BAR */
> +		if (dev->mem_resource[i].phys_addr == 0)
> +			continue;
> +		pci_unmap_resource(dev->mem_resource[i].addr,
> +				(size_t)dev->mem_resource[i].len);
> +		map_address = pci_map_resource(
> +				dev->mem_resource[i].addr, -1, 0,
> +				(size_t)dev->mem_resource[i].len,
> +				MAP_ANONYMOUS | MAP_FIXED);

Instead of using munmap()/mmap(), can we use mremap() here?
It might be a bit safer.
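
One possible way to do that, as a Linux-only sketch and not a drop-in for
the patch: allocate the anonymous pages somewhere else first, then move
them over the BAR range with MREMAP_FIXED, so the range is never left
unmapped in between. replace_with_anon() is a hypothetical name, and the
protection and mapping flags are my assumption:

```c
#define _GNU_SOURCE /* for mremap() and MREMAP_* with glibc */
#include <stddef.h>
#include <sys/mman.h>

/*
 * Replace whatever is mapped at [addr, addr + len) with zero-filled
 * anonymous memory. addr and len must be page aligned.
 * Returns 0 on success, -1 on failure.
 */
static int
replace_with_anon(void *addr, size_t len)
{
	void *tmp, *res;

	/* 1. Get fresh anonymous pages at an address the kernel picks. */
	tmp = mmap(NULL, len, PROT_READ | PROT_WRITE,
		   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
	if (tmp == MAP_FAILED)
		return -1;

	/*
	 * 2. Move them onto the old range. With MREMAP_FIXED the kernel
	 *    unmaps the previous mapping at addr as part of this call.
	 */
	res = mremap(tmp, len, len, MREMAP_MAYMOVE | MREMAP_FIXED, addr);
	if (res == MAP_FAILED) {
		munmap(tmp, len);
		return -1;
	}
	return 0;
}
```

A quick check is to map a page, write to it, call replace_with_anon() on
it and verify that the same address now reads back as zero.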

> +		if (map_address == MAP_FAILED) {
> +			RTE_LOG(ERR, EAL,
> +				"Cannot remap resource for device %s\n",
> +				dev->name);
> +			return -1;
> +		}
> +	}
> +
> +	return 0;
> +}
> +
>  static struct mapped_pci_resource *
>  pci_uio_find_resource(struct rte_pci_device *dev)
>  {
> diff --git a/drivers/bus/pci/private.h b/drivers/bus/pci/private.h
> index 88fa587..cc1668c 100644
> --- a/drivers/bus/pci/private.h
> +++ b/drivers/bus/pci/private.h
> @@ -173,6 +173,18 @@ void pci_uio_free_resource(struct rte_pci_device *dev,
>  		struct mapped_pci_resource *uio_res);
>
>  /**
> + * remap the pci uio resource.
> + *
> + * @param dev
> + *   Point to the struct rte pci device.
> + * @return
> + *   - On success, zero.
> + *   - On failure, a negative value.
> + */
> +int
> +pci_uio_remap_resource(struct rte_pci_device *dev);
> +
> +/**
>   * Map device memory to uio resource
>   *
>   * This function is private to EAL.
> diff --git a/lib/librte_eal/common/include/rte_bus.h b/lib/librte_eal/common/include/rte_bus.h
> index 6fb0834..d2c5778 100644
> --- a/lib/librte_eal/common/include/rte_bus.h
> +++ b/lib/librte_eal/common/include/rte_bus.h
> @@ -168,6 +168,20 @@ typedef int (*rte_bus_unplug_t)(struct rte_device *dev);
>  typedef int (*rte_bus_parse_t)(const char *name, void *addr);
>
>  /**
> + * Implementation specific hot unplug handler function which is responsible
> + * for handle the failure when hot unplug the device, guaranty the system
> + * would not crash in the case.
> + * would not crash in the case.
> + * @param dev
> + *	Pointer of the device structure.
> + *
> + * @return
> + *	0 on success.
> + *	!0 on error.
> + */
> +typedef int (*rte_bus_handle_hot_unplug_t)(struct rte_device *dev,
> +						void *dev_addr);
> +
> +/**
>   * Bus scan policies
>   */
>  enum rte_bus_scan_mode {
> @@ -209,6 +223,8 @@ struct rte_bus {
>  	rte_bus_plug_t plug;         /**< Probe single device for drivers */
>  	rte_bus_unplug_t unplug;     /**< Remove single device from driver */
>  	rte_bus_parse_t parse;       /**< Parse a device name */
> +	rte_bus_handle_hot_unplug_t handle_hot_unplug; /**< handle hot unplug
> +							device event */
>  	struct rte_bus_conf conf;    /**< Bus configuration */
>  	rte_bus_get_iommu_class_t get_iommu_class; /**< Get iommu class */
>  };
> --
> 2.7.4