DPDK patches and discussions
 help / color / mirror / Atom feed
* [dpdk-dev] [PATCH 0/7] vfio/pci: SR-IOV support
@ 2020-02-11 23:05 Alex Williamson
  2020-02-11 23:05 ` [dpdk-dev] [PATCH 1/7] vfio: Include optional device match in vfio_device_ops callbacks Alex Williamson
                   ` (7 more replies)
  0 siblings, 8 replies; 19+ messages in thread
From: Alex Williamson @ 2020-02-11 23:05 UTC (permalink / raw)
  To: kvm
  Cc: linux-pci, linux-kernel, dev, mtosatti, thomas, bluca,
	jerinjacobk, bruce.richardson, cohuck

Given the mostly positive feedback from the RFC[1], here's a new
non-RFC revision.  Changes since RFC:

 - vfio_device_ops.match semantics refined
 - Use helpers for struct pci_dev.physfn to avoid breakage without
   CONFIG_PCI_IOV
 - Relax to allow SR-IOV configuration changes while PF is opened.
   There are potentially interesting use cases here, including
   perhaps QEMU emulating an SR-IOV capability and calling out
   to a privileged entity to manipulate sriov_numvfs and corral
   the resulting devices.
 - Retest vfio_device_feature.argsz to include uuid length.
 - Add Connie's R-b on 6/7

I still wish we had a solution to make it less opaque to the user
why a VFIO_GROUP_GET_DEVICE_FD() has failed if a VF token is
required, but this is still the best I've been able to come up with.
If there are objections or better ideas, please raise them now.

The synopsis of this series is that we have an ongoing desire to drive
PCIe SR-IOV PFs from userspace with VFIO.  There's an immediate need
for this with DPDK drivers and potentially interesting future use
cases in virtualization.  We've been reluctant to add this support
previously due to the dependency and trust relationship between the
VF device and PF driver.  Minimally the PF driver can induce a denial
of service to the VF, but depending on the specific implementation,
the PF driver might also be responsible for moving data between VFs
or have direct access to the state of the VF, including data or state
otherwise private to the VF or VF driver.

To help resolve these concerns, we introduce a VF token into the VFIO
PCI ABI, which acts as a shared secret key between drivers.  The
userspace PF driver is required to set the VF token to a known value
and userspace VF drivers are required to provide the token to access
the VF device.  If a PF driver is restarted with VF drivers in use, it
must also provide the current token in order to prevent a rogue
untrusted PF driver from replacing a known driver.  The degree to
which this new token is considered secret is left to the userspace
drivers, the kernel intentionally provides no means to retrieve the
current token.

Note that the above token is only required for this new model where
both the PF and VF devices are usable through vfio-pci.  Existing
models of VFIO drivers where the PF is used without SR-IOV enabled
or the VF is bound to a userspace driver with an in-kernel, host PF
driver are unaffected.

The latter configuration above also highlights a new inverted scenario
that is now possible, a userspace PF driver with in-kernel VF drivers.
I believe this is a scenario that should be allowed, but should not be
enabled by default.  This series includes code to set a default
driver_override for VFs sourced from a vfio-pci user owned PF, such
that the VFs are also bound to vfio-pci.  This model is compatible
with tools like driverctl and allows the system administrator to
decide if other bindings should be enabled.  The VF token interface
above exists only between vfio-pci PF and VF drivers, once a VF is
bound to another driver, the administrator has effectively pronounced
the device as trusted.  The vfio-pci driver will note alternate
binding in dmesg for logging and debugging purposes.

Please review, comment, and test.  The example QEMU implementation
provided with the RFC[2] is still current for this version.  Thanks,

Alex

[1] https://lore.kernel.org/lkml/158085337582.9445.17682266437583505502.stgit@gimli.home/
[2] https://lore.kernel.org/lkml/20200204161737.34696b91@w520.home/
---

Alex Williamson (7):
      vfio: Include optional device match in vfio_device_ops callbacks
      vfio/pci: Implement match ops
      vfio/pci: Introduce VF token
      vfio: Introduce VFIO_DEVICE_FEATURE ioctl and first user
      vfio/pci: Add sriov_configure support
      vfio/pci: Remove dev_fmt definition
      vfio/pci: Cleanup .probe() exit paths


 drivers/vfio/pci/vfio_pci.c         |  312 ++++++++++++++++++++++++++++++++---
 drivers/vfio/pci/vfio_pci_private.h |   10 +
 drivers/vfio/vfio.c                 |   20 ++
 include/linux/vfio.h                |    4 
 include/uapi/linux/vfio.h           |   37 ++++
 5 files changed, 355 insertions(+), 28 deletions(-)


^ permalink raw reply	[flat|nested] 19+ messages in thread

* [dpdk-dev] [PATCH 1/7] vfio: Include optional device match in vfio_device_ops callbacks
  2020-02-11 23:05 [dpdk-dev] [PATCH 0/7] vfio/pci: SR-IOV support Alex Williamson
@ 2020-02-11 23:05 ` Alex Williamson
  2020-02-13 10:31   ` Cornelia Huck
  2020-02-11 23:05 ` [dpdk-dev] [PATCH 2/7] vfio/pci: Implement match ops Alex Williamson
                   ` (6 subsequent siblings)
  7 siblings, 1 reply; 19+ messages in thread
From: Alex Williamson @ 2020-02-11 23:05 UTC (permalink / raw)
  To: kvm
  Cc: linux-pci, linux-kernel, dev, mtosatti, thomas, bluca,
	jerinjacobk, bruce.richardson, cohuck

Allow bus drivers to provide their own callback to match a device to
the user provided string.

Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
---
 drivers/vfio/vfio.c  |   20 ++++++++++++++++----
 include/linux/vfio.h |    4 ++++
 2 files changed, 20 insertions(+), 4 deletions(-)

diff --git a/drivers/vfio/vfio.c b/drivers/vfio/vfio.c
index c8482624ca34..0bd77d6ea691 100644
--- a/drivers/vfio/vfio.c
+++ b/drivers/vfio/vfio.c
@@ -875,11 +875,23 @@ EXPORT_SYMBOL_GPL(vfio_device_get_from_dev);
 static struct vfio_device *vfio_device_get_from_name(struct vfio_group *group,
 						     char *buf)
 {
-	struct vfio_device *it, *device = NULL;
+	struct vfio_device *it, *device = ERR_PTR(-ENODEV);
 
 	mutex_lock(&group->device_lock);
 	list_for_each_entry(it, &group->device_list, group_next) {
-		if (!strcmp(dev_name(it->dev), buf)) {
+		int ret;
+
+		if (it->ops->match) {
+			ret = it->ops->match(it->device_data, buf);
+			if (ret < 0) {
+				device = ERR_PTR(ret);
+				break;
+			}
+		} else {
+			ret = !strcmp(dev_name(it->dev), buf);
+		}
+
+		if (ret) {
 			device = it;
 			vfio_device_get(device);
 			break;
@@ -1430,8 +1442,8 @@ static int vfio_group_get_device_fd(struct vfio_group *group, char *buf)
 		return -EPERM;
 
 	device = vfio_device_get_from_name(group, buf);
-	if (!device)
-		return -ENODEV;
+	if (IS_ERR(device))
+		return PTR_ERR(device);
 
 	ret = device->ops->open(device->device_data);
 	if (ret) {
diff --git a/include/linux/vfio.h b/include/linux/vfio.h
index e42a711a2800..029694b977f2 100644
--- a/include/linux/vfio.h
+++ b/include/linux/vfio.h
@@ -26,6 +26,9 @@
  *         operations documented below
  * @mmap: Perform mmap(2) on a region of the device file descriptor
  * @request: Request for the bus driver to release the device
+ * @match: Optional device name match callback (return: 0 for no-match, >0 for
+ *         match, -errno for abort (ex. match with insufficient or incorrect
+ *         additional args)
  */
 struct vfio_device_ops {
 	char	*name;
@@ -39,6 +42,7 @@ struct vfio_device_ops {
 			 unsigned long arg);
 	int	(*mmap)(void *device_data, struct vm_area_struct *vma);
 	void	(*request)(void *device_data, unsigned int count);
+	int	(*match)(void *device_data, char *buf);
 };
 
 extern struct iommu_group *vfio_iommu_group_get(struct device *dev);


^ permalink raw reply	[flat|nested] 19+ messages in thread

* [dpdk-dev] [PATCH 2/7] vfio/pci: Implement match ops
  2020-02-11 23:05 [dpdk-dev] [PATCH 0/7] vfio/pci: SR-IOV support Alex Williamson
  2020-02-11 23:05 ` [dpdk-dev] [PATCH 1/7] vfio: Include optional device match in vfio_device_ops callbacks Alex Williamson
@ 2020-02-11 23:05 ` Alex Williamson
  2020-02-13 11:04   ` Cornelia Huck
  2020-02-11 23:05 ` [dpdk-dev] [PATCH 3/7] vfio/pci: Introduce VF token Alex Williamson
                   ` (5 subsequent siblings)
  7 siblings, 1 reply; 19+ messages in thread
From: Alex Williamson @ 2020-02-11 23:05 UTC (permalink / raw)
  To: kvm
  Cc: linux-pci, linux-kernel, dev, mtosatti, thomas, bluca,
	jerinjacobk, bruce.richardson, cohuck

This currently serves the same purpose as the default implementation
but will be expanded for additional functionality.

Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
---
 drivers/vfio/pci/vfio_pci.c |    8 ++++++++
 1 file changed, 8 insertions(+)

diff --git a/drivers/vfio/pci/vfio_pci.c b/drivers/vfio/pci/vfio_pci.c
index 379a02c36e37..2ec6c31d0ab0 100644
--- a/drivers/vfio/pci/vfio_pci.c
+++ b/drivers/vfio/pci/vfio_pci.c
@@ -1278,6 +1278,13 @@ static void vfio_pci_request(void *device_data, unsigned int count)
 	mutex_unlock(&vdev->igate);
 }
 
+static int vfio_pci_match(void *device_data, char *buf)
+{
+	struct vfio_pci_device *vdev = device_data;
+
+	return !strcmp(pci_name(vdev->pdev), buf);
+}
+
 static const struct vfio_device_ops vfio_pci_ops = {
 	.name		= "vfio-pci",
 	.open		= vfio_pci_open,
@@ -1287,6 +1294,7 @@ static const struct vfio_device_ops vfio_pci_ops = {
 	.write		= vfio_pci_write,
 	.mmap		= vfio_pci_mmap,
 	.request	= vfio_pci_request,
+	.match		= vfio_pci_match,
 };
 
 static int vfio_pci_reflck_attach(struct vfio_pci_device *vdev);


^ permalink raw reply	[flat|nested] 19+ messages in thread

* [dpdk-dev] [PATCH 3/7] vfio/pci: Introduce VF token
  2020-02-11 23:05 [dpdk-dev] [PATCH 0/7] vfio/pci: SR-IOV support Alex Williamson
  2020-02-11 23:05 ` [dpdk-dev] [PATCH 1/7] vfio: Include optional device match in vfio_device_ops callbacks Alex Williamson
  2020-02-11 23:05 ` [dpdk-dev] [PATCH 2/7] vfio/pci: Implement match ops Alex Williamson
@ 2020-02-11 23:05 ` Alex Williamson
  2020-02-13 11:46   ` Cornelia Huck
  2020-02-11 23:05 ` [dpdk-dev] [PATCH 4/7] vfio: Introduce VFIO_DEVICE_FEATURE ioctl and first user Alex Williamson
                   ` (4 subsequent siblings)
  7 siblings, 1 reply; 19+ messages in thread
From: Alex Williamson @ 2020-02-11 23:05 UTC (permalink / raw)
  To: kvm
  Cc: linux-pci, linux-kernel, dev, mtosatti, thomas, bluca,
	jerinjacobk, bruce.richardson, cohuck

If we enable SR-IOV on a vfio-pci owned PF, the resulting VFs are not
fully isolated from the PF.  The PF can always cause a denial of
service to the VF, if not access data passed through the VF directly.
This is why vfio-pci currently does not bind to PFs with SR-IOV enabled
and does not provide access itself to enabling SR-IOV on a PF.  The
IOMMU grouping mechanism might allow us a solution to this lack of
isolation, however the deficiency isn't actually in the DMA path, so
much as the potential cooperation between PF and VF devices.  Also,
if we were to force VFs into the same IOMMU group as the PF, we severely
limit the utility of having independent drivers managing PFs and VFs
with vfio.

Therefore we introduce the concept of a VF token.  The token is
implemented as a UUID and represents a shared secret which must be set
by the PF driver and used by the VF drivers in order to access a vfio
device file descriptor for the VF.  The ioctl to set the VF token will
be provided in a later commit, this commit implements the underlying
infrastructure.  The concept here is to augment the string the user
passes to match a device within a group in order to retrieve access to
the device descriptor.  For example, rather than passing only the PCI
device name (ex. "0000:03:00.0") the user would also pass a vf_token
UUID (ex. "2ab74924-c335-45f4-9b16-8569e5b08258").  The device match
string therefore becomes:

"0000:03:00.0 vf_token=2ab74924-c335-45f4-9b16-8569e5b08258"

This syntax is expected to be extensible to future options as well, with
the standard being:

"$DEVICE_NAME $OPTION1=$VALUE1 $OPTION2=$VALUE2"

The device name must be first and option=value pairs are separated by
spaces.

The vf_token option is only required for VFs where the PF device is
bound to vfio-pci.  There is no change for PFs using existing host
drivers.

Note that in order to protect existing VF users, not only is it required
to set a vf_token on the PF before VFs devices can be accessed, but also
if there are existing VF users, (re)opening the PF device must also
provide the current vf_token as authentication.  This is intended to
prevent a VF driver starting with a trusted PF driver and later being
replaced by an unknown driver.  A vf_token is not required to open the
PF device when none of the VF devices are in use by vfio-pci drivers.

Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
---
 drivers/vfio/pci/vfio_pci.c         |  127 +++++++++++++++++++++++++++++++++++
 drivers/vfio/pci/vfio_pci_private.h |    8 ++
 2 files changed, 134 insertions(+), 1 deletion(-)

diff --git a/drivers/vfio/pci/vfio_pci.c b/drivers/vfio/pci/vfio_pci.c
index 2ec6c31d0ab0..26aea9ac4863 100644
--- a/drivers/vfio/pci/vfio_pci.c
+++ b/drivers/vfio/pci/vfio_pci.c
@@ -466,6 +466,35 @@ static void vfio_pci_disable(struct vfio_pci_device *vdev)
 		vfio_pci_set_power_state(vdev, PCI_D3hot);
 }
 
+static struct pci_driver vfio_pci_driver;
+
+static void vfio_pci_vf_token_user_add(struct vfio_pci_device *vdev, int val)
+{
+	struct pci_dev *physfn = pci_physfn(vdev->pdev);
+	struct vfio_device *pf_dev;
+	struct vfio_pci_device *pf_vdev;
+
+	if (!vdev->pdev->is_virtfn)
+		return;
+
+	pf_dev = vfio_device_get_from_dev(&physfn->dev);
+	if (!pf_dev)
+		return;
+
+	if (pci_dev_driver(physfn) != &vfio_pci_driver) {
+		vfio_device_put(pf_dev);
+		return;
+	}
+
+	pf_vdev = vfio_device_data(pf_dev);
+
+	mutex_lock(&pf_vdev->vf_token->lock);
+	pf_vdev->vf_token->users += val;
+	mutex_unlock(&pf_vdev->vf_token->lock);
+
+	vfio_device_put(pf_dev);
+}
+
 static void vfio_pci_release(void *device_data)
 {
 	struct vfio_pci_device *vdev = device_data;
@@ -475,6 +504,7 @@ static void vfio_pci_release(void *device_data)
 	if (!(--vdev->refcnt)) {
 		vfio_spapr_pci_eeh_release(vdev->pdev);
 		vfio_pci_disable(vdev);
+		vfio_pci_vf_token_user_add(vdev, -1);
 	}
 
 	mutex_unlock(&vdev->reflck->lock);
@@ -493,6 +523,7 @@ static int vfio_pci_open(void *device_data)
 	mutex_lock(&vdev->reflck->lock);
 
 	if (!vdev->refcnt) {
+		vfio_pci_vf_token_user_add(vdev, 1);
 		ret = vfio_pci_enable(vdev);
 		if (ret)
 			goto error;
@@ -1278,11 +1309,86 @@ static void vfio_pci_request(void *device_data, unsigned int count)
 	mutex_unlock(&vdev->igate);
 }
 
+#define VF_TOKEN_ARG "vf_token="
+
+/* Called with vdev->vf_token->lock */
+static int vfio_pci_vf_token_match(struct vfio_pci_device *vdev, char *opts)
+{
+	char *token;
+	uuid_t uuid;
+	int ret;
+
+	if (!opts)
+		return -EINVAL;
+
+	token = strstr(opts, VF_TOKEN_ARG);
+	if (!token)
+		return -EINVAL;
+
+	token += strlen(VF_TOKEN_ARG);
+
+	ret = uuid_parse(token, &uuid);
+	if (ret)
+		return ret;
+
+	if (!uuid_equal(&uuid, &vdev->vf_token->uuid))
+		return -EACCES;
+
+	return 0;
+}
+
 static int vfio_pci_match(void *device_data, char *buf)
 {
 	struct vfio_pci_device *vdev = device_data;
+	char *opts;
+
+	opts = strchr(buf, ' ');
+	if (opts) {
+		*opts = 0;
+		opts++;
+	}
+
+	if (strcmp(pci_name(vdev->pdev), buf))
+		return 0; /* No match */
+
+	if (vdev->pdev->is_virtfn) {
+		struct pci_dev *physfn = pci_physfn(vdev->pdev);
+		struct vfio_device *pf_dev;
+		int ret = 0;
+
+		pf_dev = vfio_device_get_from_dev(&physfn->dev);
+		if (pf_dev) {
+			if (pci_dev_driver(physfn) == &vfio_pci_driver) {
+				struct vfio_pci_device *pf_vdev =
+						vfio_device_data(pf_dev);
+
+				mutex_lock(&pf_vdev->vf_token->lock);
+				ret = vfio_pci_vf_token_match(pf_vdev, opts);
+				mutex_unlock(&pf_vdev->vf_token->lock);
+			}
+
+			vfio_device_put(pf_dev);
+
+			if (ret)
+				return -EACCES;
+		}
+	}
 
-	return !strcmp(pci_name(vdev->pdev), buf);
+	if (vdev->vf_token) {
+		int ret = 0;
+
+		mutex_lock(&vdev->vf_token->lock);
+
+		if (vdev->vf_token->users)
+			ret = vfio_pci_vf_token_match(vdev, opts);
+
+		mutex_unlock(&vdev->vf_token->lock);
+
+		if (ret)
+			return -EACCES;
+	}
+
+	return 1; /* Match */
 }
 
 static const struct vfio_device_ops vfio_pci_ops = {
@@ -1354,6 +1460,19 @@ static int vfio_pci_probe(struct pci_dev *pdev, const struct pci_device_id *id)
 		return ret;
 	}
 
+	if (pdev->is_physfn) {
+		vdev->vf_token = kzalloc(sizeof(*vdev->vf_token), GFP_KERNEL);
+		if (!vdev->vf_token) {
+			vfio_pci_reflck_put(vdev->reflck);
+			vfio_del_group_dev(&pdev->dev);
+			vfio_iommu_group_put(group, &pdev->dev);
+			kfree(vdev);
+			return -ENOMEM;
+		}
+		mutex_init(&vdev->vf_token->lock);
+		uuid_gen(&vdev->vf_token->uuid);
+	}
+
 	if (vfio_pci_is_vga(pdev)) {
 		vga_client_register(pdev, vdev, NULL, vfio_pci_set_vga_decode);
 		vga_set_legacy_decoding(pdev,
@@ -1387,6 +1506,12 @@ static void vfio_pci_remove(struct pci_dev *pdev)
 	if (!vdev)
 		return;
 
+	if (vdev->vf_token) {
+		WARN_ON(vdev->vf_token->users);
+		mutex_destroy(&vdev->vf_token->lock);
+		kfree(vdev->vf_token);
+	}
+
 	vfio_pci_reflck_put(vdev->reflck);
 
 	vfio_iommu_group_put(pdev->dev.iommu_group, &pdev->dev);
diff --git a/drivers/vfio/pci/vfio_pci_private.h b/drivers/vfio/pci/vfio_pci_private.h
index 8a2c7607d513..6da2b4748abf 100644
--- a/drivers/vfio/pci/vfio_pci_private.h
+++ b/drivers/vfio/pci/vfio_pci_private.h
@@ -12,6 +12,7 @@
 #include <linux/pci.h>
 #include <linux/irqbypass.h>
 #include <linux/types.h>
+#include <linux/uuid.h>
 
 #ifndef VFIO_PCI_PRIVATE_H
 #define VFIO_PCI_PRIVATE_H
@@ -84,6 +85,12 @@ struct vfio_pci_reflck {
 	struct mutex		lock;
 };
 
+struct vfio_pci_vf_token {
+	struct mutex		lock;
+	uuid_t			uuid;
+	u32			users;
+};
+
 struct vfio_pci_device {
 	struct pci_dev		*pdev;
 	void __iomem		*barmap[PCI_STD_NUM_BARS];
@@ -122,6 +129,7 @@ struct vfio_pci_device {
 	struct list_head	dummy_resources_list;
 	struct mutex		ioeventfds_lock;
 	struct list_head	ioeventfds_list;
+	struct vfio_pci_vf_token	*vf_token;
 };
 
 #define is_intx(vdev) (vdev->irq_type == VFIO_PCI_INTX_IRQ_INDEX)


^ permalink raw reply	[flat|nested] 19+ messages in thread

* [dpdk-dev] [PATCH 4/7] vfio: Introduce VFIO_DEVICE_FEATURE ioctl and first user
  2020-02-11 23:05 [dpdk-dev] [PATCH 0/7] vfio/pci: SR-IOV support Alex Williamson
                   ` (2 preceding siblings ...)
  2020-02-11 23:05 ` [dpdk-dev] [PATCH 3/7] vfio/pci: Introduce VF token Alex Williamson
@ 2020-02-11 23:05 ` Alex Williamson
  2020-02-13 12:41   ` Cornelia Huck
  2020-02-11 23:06 ` [dpdk-dev] [PATCH 5/7] vfio/pci: Add sriov_configure support Alex Williamson
                   ` (3 subsequent siblings)
  7 siblings, 1 reply; 19+ messages in thread
From: Alex Williamson @ 2020-02-11 23:05 UTC (permalink / raw)
  To: kvm
  Cc: linux-pci, linux-kernel, dev, mtosatti, thomas, bluca,
	jerinjacobk, bruce.richardson, cohuck

The VFIO_DEVICE_FEATURE ioctl is meant to be a general purpose, device
agnostic ioctl for setting, retrieving, and probing device features.
This implementation provides a 16-bit field for specifying a feature
index, where the data porition of the ioctl is determined by the
semantics for the given feature.  Additional flag bits indicate the
direction and nature of the operation; SET indicates user data is
provided into the device feature, GET indicates the device feature is
written out into user data.  The PROBE flag augments determining
whether the given feature is supported, and if provided, whether the
given operation on the feature is supported.

The first user of this ioctl is for setting the vfio-pci VF token,
where the user provides a shared secret key (UUID) on a SR-IOV PF
device, which users must provide when opening associated VF devices.

Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
---
 drivers/vfio/pci/vfio_pci.c |   52 +++++++++++++++++++++++++++++++++++++++++++
 include/uapi/linux/vfio.h   |   37 +++++++++++++++++++++++++++++++
 2 files changed, 89 insertions(+)

diff --git a/drivers/vfio/pci/vfio_pci.c b/drivers/vfio/pci/vfio_pci.c
index 26aea9ac4863..5414744a3ead 100644
--- a/drivers/vfio/pci/vfio_pci.c
+++ b/drivers/vfio/pci/vfio_pci.c
@@ -1171,6 +1171,58 @@ static long vfio_pci_ioctl(void *device_data,
 
 		return vfio_pci_ioeventfd(vdev, ioeventfd.offset,
 					  ioeventfd.data, count, ioeventfd.fd);
+	} else if (cmd == VFIO_DEVICE_FEATURE) {
+		struct vfio_device_feature feature;
+		uuid_t uuid;
+
+		minsz = offsetofend(struct vfio_device_feature, flags);
+
+		if (copy_from_user(&feature, (void __user *)arg, minsz))
+			return -EFAULT;
+
+		if (feature.argsz < minsz)
+			return -EINVAL;
+
+		if (feature.flags & ~(VFIO_DEVICE_FEATURE_MASK |
+				      VFIO_DEVICE_FEATURE_SET |
+				      VFIO_DEVICE_FEATURE_GET |
+				      VFIO_DEVICE_FEATURE_PROBE))
+			return -EINVAL;
+
+		switch (feature.flags & VFIO_DEVICE_FEATURE_MASK) {
+		case VFIO_DEVICE_FEATURE_PCI_VF_TOKEN:
+			if (!vdev->vf_token)
+				return -ENOTTY;
+
+			/*
+			 * We do not support GET of the VF Token UUID as this
+			 * could expose the token of the previous device user.
+			 */
+			if (feature.flags & VFIO_DEVICE_FEATURE_GET)
+				return -EINVAL;
+
+			if (feature.flags & VFIO_DEVICE_FEATURE_PROBE)
+				return 0;
+
+			/* Don't SET unless told to do so */
+			if (!(feature.flags & VFIO_DEVICE_FEATURE_SET))
+				return -EINVAL;
+
+			if (feature.argsz < minsz + sizeof(uuid))
+				return -EINVAL;
+
+			if (copy_from_user(&uuid, (void __user *)(arg + minsz),
+					   sizeof(uuid)))
+				return -EFAULT;
+
+			mutex_lock(&vdev->vf_token->lock);
+			uuid_copy(&vdev->vf_token->uuid, &uuid);
+			mutex_unlock(&vdev->vf_token->lock);
+
+			return 0;
+		default:
+			return -ENOTTY;
+		}
 	}
 
 	return -ENOTTY;
diff --git a/include/uapi/linux/vfio.h b/include/uapi/linux/vfio.h
index 9e843a147ead..c5cbf04ce5a7 100644
--- a/include/uapi/linux/vfio.h
+++ b/include/uapi/linux/vfio.h
@@ -707,6 +707,43 @@ struct vfio_device_ioeventfd {
 
 #define VFIO_DEVICE_IOEVENTFD		_IO(VFIO_TYPE, VFIO_BASE + 16)
 
+/**
+ * VFIO_DEVICE_FEATURE - _IORW(VFIO_TYPE, VFIO_BASE + 17,
+ *			       struct vfio_device_feature
+ *
+ * Get, set, or probe feature data of the device.  The feature is selected
+ * using the FEATURE_MASK portion of the flags field.  Support for a feature
+ * can be probed by setting both the FEATURE_MASK and PROBE bits.  A probe
+ * may optionally include the GET and/or SET bits to determine read vs write
+ * access of the feature respectively.  Probing a feature will return success
+ * if the feature is supported and all of the optionally indicated GET/SET
+ * methods are supported.  The format of the data portion of the structure is
+ * specific to the given feature.  The data portion is not required for
+ * probing.
+ *
+ * Return 0 on success, -errno on failure.
+ */
+struct vfio_device_feature {
+	__u32	argsz;
+	__u32	flags;
+#define VFIO_DEVICE_FEATURE_MASK	(0xffff) /* 16-bit feature index */
+#define VFIO_DEVICE_FEATURE_GET		(1 << 16) /* Get feature into data[] */
+#define VFIO_DEVICE_FEATURE_SET		(1 << 17) /* Set feature from data[] */
+#define VFIO_DEVICE_FEATURE_PROBE	(1 << 18) /* Probe feature support */
+	__u8	data[];
+};
+
+#define VFIO_DEVICE_FEATURE		_IO(VFIO_TYPE, VFIO_BASE + 17)
+
+/*
+ * Provide support for setting a PCI VF Token, which is used as a shared
+ * secret between PF and VF drivers.  This feature may only be set on a
+ * PCI SR-IOV PF when SR-IOV is enabled on the PF and there are no existing
+ * open VFs.  Data provided when setting this feature is a 16-byte array
+ * (__u8 b[16]), representing a UUID.
+ */
+#define VFIO_DEVICE_FEATURE_PCI_VF_TOKEN	(0)
+
 /* -------- API for Type1 VFIO IOMMU -------- */
 
 /**


^ permalink raw reply	[flat|nested] 19+ messages in thread

* [dpdk-dev] [PATCH 5/7] vfio/pci: Add sriov_configure support
  2020-02-11 23:05 [dpdk-dev] [PATCH 0/7] vfio/pci: SR-IOV support Alex Williamson
                   ` (3 preceding siblings ...)
  2020-02-11 23:05 ` [dpdk-dev] [PATCH 4/7] vfio: Introduce VFIO_DEVICE_FEATURE ioctl and first user Alex Williamson
@ 2020-02-11 23:06 ` Alex Williamson
  2020-02-11 23:06 ` [dpdk-dev] [PATCH 6/7] vfio/pci: Remove dev_fmt definition Alex Williamson
                   ` (2 subsequent siblings)
  7 siblings, 0 replies; 19+ messages in thread
From: Alex Williamson @ 2020-02-11 23:06 UTC (permalink / raw)
  To: kvm
  Cc: linux-pci, linux-kernel, dev, mtosatti, thomas, bluca,
	jerinjacobk, bruce.richardson, cohuck

With the VF Token interface we can now expect that a vfio userspace
driver must be in collaboration with the PF driver, an unwitting
userspace driver will not be able to get past the GET_DEVICE_FD step
in accessing the device.  We can now move on to actually allowing
SR-IOV to be enabled by vfio-pci on the PF.  Support for this is not
enabled by default in this commit, but it does provide a module option
for this to be enabled (enable_sriov=1).  Enabling VFs is rather
straightforward, except we don't want to risk that a VF might get
autoprobed and bound to other drivers, so a bus notifier is used to
"capture" VFs to vfio-pci using the driver_override support.  We
assume any later action to bind the device to other drivers is
condoned by the system admin and allow it with a log warning.

vfio-pci will disable SR-IOV on a PF before releasing the device,
allowing a VF driver to be assured other drivers cannot take over the
PF and that any other userspace driver must know the shared VF token.
This support also does not provide a mechanism for the PF userspace
driver itself to manipulate SR-IOV.  With this patch SR-IOV can only
be enabled via the host sysfs interface and the PF driver user cannot
create and remove VF.

Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
---
 drivers/vfio/pci/vfio_pci.c         |  106 +++++++++++++++++++++++++++++++----
 drivers/vfio/pci/vfio_pci_private.h |    2 +
 2 files changed, 97 insertions(+), 11 deletions(-)

diff --git a/drivers/vfio/pci/vfio_pci.c b/drivers/vfio/pci/vfio_pci.c
index 5414744a3ead..cce6127a9c56 100644
--- a/drivers/vfio/pci/vfio_pci.c
+++ b/drivers/vfio/pci/vfio_pci.c
@@ -54,6 +54,12 @@ module_param(disable_idle_d3, bool, S_IRUGO | S_IWUSR);
 MODULE_PARM_DESC(disable_idle_d3,
 		 "Disable using the PCI D3 low power state for idle, unused devices");
 
+static bool enable_sriov;
+#ifdef CONFIG_PCI_IOV
+module_param(enable_sriov, bool, 0644);
+MODULE_PARM_DESC(enable_sriov, "Enable support for SR-IOV configuration");
+#endif
+
 static inline bool vfio_vga_disabled(void)
 {
 #ifdef CONFIG_VFIO_PCI_VGA
@@ -1457,6 +1463,35 @@ static const struct vfio_device_ops vfio_pci_ops = {
 
 static int vfio_pci_reflck_attach(struct vfio_pci_device *vdev);
 static void vfio_pci_reflck_put(struct vfio_pci_reflck *reflck);
+static struct pci_driver vfio_pci_driver;
+
+static int vfio_pci_bus_notifier(struct notifier_block *nb,
+				 unsigned long action, void *data)
+{
+	struct vfio_pci_device *vdev = container_of(nb,
+						    struct vfio_pci_device, nb);
+	struct device *dev = data;
+	struct pci_dev *pdev = to_pci_dev(dev);
+	struct pci_dev *physfn = pci_physfn(pdev);
+
+	if (action == BUS_NOTIFY_ADD_DEVICE &&
+	    pdev->is_virtfn && physfn == vdev->pdev) {
+		pci_info(vdev->pdev, "Captured SR-IOV VF %s driver_override\n",
+			 pci_name(pdev));
+		pdev->driver_override = kasprintf(GFP_KERNEL, "%s",
+						  vfio_pci_ops.name);
+	} else if (action == BUS_NOTIFY_BOUND_DRIVER &&
+		   pdev->is_virtfn && physfn == vdev->pdev) {
+		struct pci_driver *drv = pci_dev_driver(pdev);
+
+		if (drv && drv != &vfio_pci_driver)
+			pci_warn(vdev->pdev,
+				 "VF %s bound to driver %s while PF bound to vfio-pci\n",
+				 pci_name(pdev), drv->name);
+	}
+
+	return 0;
+}
 
 static int vfio_pci_probe(struct pci_dev *pdev, const struct pci_device_id *id)
 {
@@ -1468,12 +1503,12 @@ static int vfio_pci_probe(struct pci_dev *pdev, const struct pci_device_id *id)
 		return -EINVAL;
 
 	/*
-	 * Prevent binding to PFs with VFs enabled, this too easily allows
-	 * userspace instance with VFs and PFs from the same device, which
-	 * cannot work.  Disabling SR-IOV here would initiate removing the
-	 * VFs, which would unbind the driver, which is prone to blocking
-	 * if that VF is also in use by vfio-pci.  Just reject these PFs
-	 * and let the user sort it out.
+	 * Prevent binding to PFs with VFs enabled, the VFs might be in use
+	 * by the host or other users.  We cannot capture the VFs if they
+	 * already exist, nor can we track VF users.  Disabling SR-IOV here
+	 * would initiate removing the VFs, which would unbind the driver,
+	 * which is prone to blocking if that VF is also in use by vfio-pci.
+	 * Just reject these PFs and let the user sort it out.
 	 */
 	if (pci_num_vf(pdev)) {
 		pci_warn(pdev, "Cannot bind to PF with SR-IOV enabled\n");
@@ -1521,6 +1556,18 @@ static int vfio_pci_probe(struct pci_dev *pdev, const struct pci_device_id *id)
 			kfree(vdev);
 			return -ENOMEM;
 		}
+
+		vdev->nb.notifier_call = vfio_pci_bus_notifier;
+		ret = bus_register_notifier(&pci_bus_type, &vdev->nb);
+		if (ret) {
+			kfree(vdev->vf_token);
+			vfio_pci_reflck_put(vdev->reflck);
+			vfio_del_group_dev(&pdev->dev);
+			vfio_iommu_group_put(group, &pdev->dev);
+			kfree(vdev);
+			return ret;
+		}
+
 		mutex_init(&vdev->vf_token->lock);
 		uuid_gen(&vdev->vf_token->uuid);
 	}
@@ -1554,6 +1601,8 @@ static void vfio_pci_remove(struct pci_dev *pdev)
 {
 	struct vfio_pci_device *vdev;
 
+	pci_disable_sriov(pdev);
+
 	vdev = vfio_del_group_dev(&pdev->dev);
 	if (!vdev)
 		return;
@@ -1564,6 +1613,9 @@ static void vfio_pci_remove(struct pci_dev *pdev)
 		kfree(vdev->vf_token);
 	}
 
+	if (vdev->nb.notifier_call)
+		bus_unregister_notifier(&pci_bus_type, &vdev->nb);
+
 	vfio_pci_reflck_put(vdev->reflck);
 
 	vfio_iommu_group_put(pdev->dev.iommu_group, &pdev->dev);
@@ -1612,16 +1664,48 @@ static pci_ers_result_t vfio_pci_aer_err_detected(struct pci_dev *pdev,
 	return PCI_ERS_RESULT_CAN_RECOVER;
 }
 
+static int vfio_pci_sriov_configure(struct pci_dev *pdev, int nr_virtfn)
+{
+	struct vfio_pci_device *vdev;
+	struct vfio_device *device;
+	int ret = 0;
+
+	might_sleep();
+
+	if (!enable_sriov)
+		return -ENOENT;
+
+	device = vfio_device_get_from_dev(&pdev->dev);
+	if (!device)
+		return -ENODEV;
+
+	vdev = vfio_device_data(device);
+	if (!vdev) {
+		vfio_device_put(device);
+		return -ENODEV;
+	}
+
+	if (nr_virtfn == 0)
+		pci_disable_sriov(pdev);
+	else
+		ret = pci_enable_sriov(pdev, nr_virtfn);
+
+	vfio_device_put(device);
+
+	return ret < 0 ? ret : nr_virtfn;
+}
+
 static const struct pci_error_handlers vfio_err_handlers = {
 	.error_detected = vfio_pci_aer_err_detected,
 };
 
 static struct pci_driver vfio_pci_driver = {
-	.name		= "vfio-pci",
-	.id_table	= NULL, /* only dynamic ids */
-	.probe		= vfio_pci_probe,
-	.remove		= vfio_pci_remove,
-	.err_handler	= &vfio_err_handlers,
+	.name			= "vfio-pci",
+	.id_table		= NULL, /* only dynamic ids */
+	.probe			= vfio_pci_probe,
+	.remove			= vfio_pci_remove,
+	.sriov_configure	= vfio_pci_sriov_configure,
+	.err_handler		= &vfio_err_handlers,
 };
 
 static DEFINE_MUTEX(reflck_lock);
diff --git a/drivers/vfio/pci/vfio_pci_private.h b/drivers/vfio/pci/vfio_pci_private.h
index 6da2b4748abf..b651a007334d 100644
--- a/drivers/vfio/pci/vfio_pci_private.h
+++ b/drivers/vfio/pci/vfio_pci_private.h
@@ -13,6 +13,7 @@
 #include <linux/irqbypass.h>
 #include <linux/types.h>
 #include <linux/uuid.h>
+#include <linux/notifier.h>
 
 #ifndef VFIO_PCI_PRIVATE_H
 #define VFIO_PCI_PRIVATE_H
@@ -130,6 +131,7 @@ struct vfio_pci_device {
 	struct mutex		ioeventfds_lock;
 	struct list_head	ioeventfds_list;
 	struct vfio_pci_vf_token	*vf_token;
+	struct notifier_block	nb;
 };
 
 #define is_intx(vdev) (vdev->irq_type == VFIO_PCI_INTX_IRQ_INDEX)


^ permalink raw reply	[flat|nested] 19+ messages in thread

* [dpdk-dev] [PATCH 6/7] vfio/pci: Remove dev_fmt definition
  2020-02-11 23:05 [dpdk-dev] [PATCH 0/7] vfio/pci: SR-IOV support Alex Williamson
                   ` (4 preceding siblings ...)
  2020-02-11 23:06 ` [dpdk-dev] [PATCH 5/7] vfio/pci: Add sriov_configure support Alex Williamson
@ 2020-02-11 23:06 ` Alex Williamson
  2020-02-11 23:06 ` [dpdk-dev] [PATCH 7/7] vfio/pci: Cleanup .probe() exit paths Alex Williamson
  2020-02-14  4:57 ` [dpdk-dev] [PATCH 0/7] vfio/pci: SR-IOV support Alexey Kardashevskiy
  7 siblings, 0 replies; 19+ messages in thread
From: Alex Williamson @ 2020-02-11 23:06 UTC (permalink / raw)
  To: kvm
  Cc: linux-pci, linux-kernel, dev, mtosatti, thomas, bluca,
	jerinjacobk, bruce.richardson, cohuck

It currently results in messages like:

 "vfio-pci 0000:03:00.0: vfio_pci: ..."

Which is quite a bit redundant.

Reviewed-by: Cornelia Huck <cohuck@redhat.com>
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
---
 drivers/vfio/pci/vfio_pci.c |    1 -
 1 file changed, 1 deletion(-)

diff --git a/drivers/vfio/pci/vfio_pci.c b/drivers/vfio/pci/vfio_pci.c
index cce6127a9c56..a88b45ce1cc7 100644
--- a/drivers/vfio/pci/vfio_pci.c
+++ b/drivers/vfio/pci/vfio_pci.c
@@ -9,7 +9,6 @@
  */
 
 #define pr_fmt(fmt) KBUILD_MODNAME ": " fmt
-#define dev_fmt pr_fmt
 
 #include <linux/device.h>
 #include <linux/eventfd.h>


^ permalink raw reply	[flat|nested] 19+ messages in thread

* [dpdk-dev] [PATCH 7/7] vfio/pci: Cleanup .probe() exit paths
  2020-02-11 23:05 [dpdk-dev] [PATCH 0/7] vfio/pci: SR-IOV support Alex Williamson
                   ` (5 preceding siblings ...)
  2020-02-11 23:06 ` [dpdk-dev] [PATCH 6/7] vfio/pci: Remove dev_fmt definition Alex Williamson
@ 2020-02-11 23:06 ` Alex Williamson
  2020-02-14  4:57 ` [dpdk-dev] [PATCH 0/7] vfio/pci: SR-IOV support Alexey Kardashevskiy
  7 siblings, 0 replies; 19+ messages in thread
From: Alex Williamson @ 2020-02-11 23:06 UTC (permalink / raw)
  To: kvm
  Cc: linux-pci, linux-kernel, dev, mtosatti, thomas, bluca,
	jerinjacobk, bruce.richardson, cohuck

The cleanup is getting a tad long.

Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
---
 drivers/vfio/pci/vfio_pci.c |   54 ++++++++++++++++++++-----------------------
 1 file changed, 25 insertions(+), 29 deletions(-)

diff --git a/drivers/vfio/pci/vfio_pci.c b/drivers/vfio/pci/vfio_pci.c
index a88b45ce1cc7..fff49dfc742a 100644
--- a/drivers/vfio/pci/vfio_pci.c
+++ b/drivers/vfio/pci/vfio_pci.c
@@ -1520,8 +1520,8 @@ static int vfio_pci_probe(struct pci_dev *pdev, const struct pci_device_id *id)
 
 	vdev = kzalloc(sizeof(*vdev), GFP_KERNEL);
 	if (!vdev) {
-		vfio_iommu_group_put(group, &pdev->dev);
-		return -ENOMEM;
+		ret = -ENOMEM;
+		goto out_group_put;
 	}
 
 	vdev->pdev = pdev;
@@ -1532,43 +1532,27 @@ static int vfio_pci_probe(struct pci_dev *pdev, const struct pci_device_id *id)
 	INIT_LIST_HEAD(&vdev->ioeventfds_list);
 
 	ret = vfio_add_group_dev(&pdev->dev, &vfio_pci_ops, vdev);
-	if (ret) {
-		vfio_iommu_group_put(group, &pdev->dev);
-		kfree(vdev);
-		return ret;
-	}
+	if (ret)
+		goto out_free;
 
 	ret = vfio_pci_reflck_attach(vdev);
-	if (ret) {
-		vfio_del_group_dev(&pdev->dev);
-		vfio_iommu_group_put(group, &pdev->dev);
-		kfree(vdev);
-		return ret;
-	}
+	if (ret)
+		goto out_del_group_dev;
 
 	if (pdev->is_physfn) {
 		vdev->vf_token = kzalloc(sizeof(*vdev->vf_token), GFP_KERNEL);
 		if (!vdev->vf_token) {
-			vfio_pci_reflck_put(vdev->reflck);
-			vfio_del_group_dev(&pdev->dev);
-			vfio_iommu_group_put(group, &pdev->dev);
-			kfree(vdev);
-			return -ENOMEM;
-		}
-
-		vdev->nb.notifier_call = vfio_pci_bus_notifier;
-		ret = bus_register_notifier(&pci_bus_type, &vdev->nb);
-		if (ret) {
-			kfree(vdev->vf_token);
-			vfio_pci_reflck_put(vdev->reflck);
-			vfio_del_group_dev(&pdev->dev);
-			vfio_iommu_group_put(group, &pdev->dev);
-			kfree(vdev);
-			return ret;
+			ret = -ENOMEM;
+			goto out_reflck;
 		}
 
 		mutex_init(&vdev->vf_token->lock);
 		uuid_gen(&vdev->vf_token->uuid);
+
+		vdev->nb.notifier_call = vfio_pci_bus_notifier;
+		ret = bus_register_notifier(&pci_bus_type, &vdev->nb);
+		if (ret)
+			goto out_vf_token;
 	}
 
 	if (vfio_pci_is_vga(pdev)) {
@@ -1594,6 +1578,18 @@ static int vfio_pci_probe(struct pci_dev *pdev, const struct pci_device_id *id)
 	}
 
 	return ret;
+
+out_vf_token:
+	kfree(vdev->vf_token);
+out_reflck:
+	vfio_pci_reflck_put(vdev->reflck);
+out_del_group_dev:
+	vfio_del_group_dev(&pdev->dev);
+out_free:
+	kfree(vdev);
+out_group_put:
+	vfio_iommu_group_put(group, &pdev->dev);
+	return ret;
 }
 
 static void vfio_pci_remove(struct pci_dev *pdev)


^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [dpdk-dev] [PATCH 1/7] vfio: Include optional device match in vfio_device_ops callbacks
  2020-02-11 23:05 ` [dpdk-dev] [PATCH 1/7] vfio: Include optional device match in vfio_device_ops callbacks Alex Williamson
@ 2020-02-13 10:31   ` Cornelia Huck
  0 siblings, 0 replies; 19+ messages in thread
From: Cornelia Huck @ 2020-02-13 10:31 UTC (permalink / raw)
  To: Alex Williamson
  Cc: kvm, linux-pci, linux-kernel, dev, mtosatti, thomas, bluca,
	jerinjacobk, bruce.richardson

On Tue, 11 Feb 2020 16:05:25 -0700
Alex Williamson <alex.williamson@redhat.com> wrote:

> Allow bus drivers to provide their own callback to match a device to
> the user provided string.
> 
> Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
> ---
>  drivers/vfio/vfio.c  |   20 ++++++++++++++++----
>  include/linux/vfio.h |    4 ++++
>  2 files changed, 20 insertions(+), 4 deletions(-)

Reviewed-by: Cornelia Huck <cohuck@redhat.com>


^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [dpdk-dev] [PATCH 2/7] vfio/pci: Implement match ops
  2020-02-11 23:05 ` [dpdk-dev] [PATCH 2/7] vfio/pci: Implement match ops Alex Williamson
@ 2020-02-13 11:04   ` Cornelia Huck
  0 siblings, 0 replies; 19+ messages in thread
From: Cornelia Huck @ 2020-02-13 11:04 UTC (permalink / raw)
  To: Alex Williamson
  Cc: kvm, linux-pci, linux-kernel, dev, mtosatti, thomas, bluca,
	jerinjacobk, bruce.richardson

On Tue, 11 Feb 2020 16:05:34 -0700
Alex Williamson <alex.williamson@redhat.com> wrote:

> This currently serves the same purpose as the default implementation
> but will be expanded for additional functionality.
> 
> Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
> ---
>  drivers/vfio/pci/vfio_pci.c |    8 ++++++++
>  1 file changed, 8 insertions(+)

Reviewed-by: Cornelia Huck <cohuck@redhat.com>


^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [dpdk-dev] [PATCH 3/7] vfio/pci: Introduce VF token
  2020-02-11 23:05 ` [dpdk-dev] [PATCH 3/7] vfio/pci: Introduce VF token Alex Williamson
@ 2020-02-13 11:46   ` Cornelia Huck
  2020-02-13 17:23     ` Alex Williamson
  0 siblings, 1 reply; 19+ messages in thread
From: Cornelia Huck @ 2020-02-13 11:46 UTC (permalink / raw)
  To: Alex Williamson
  Cc: kvm, linux-pci, linux-kernel, dev, mtosatti, thomas, bluca,
	jerinjacobk, bruce.richardson

On Tue, 11 Feb 2020 16:05:42 -0700
Alex Williamson <alex.williamson@redhat.com> wrote:

> If we enable SR-IOV on a vfio-pci owned PF, the resulting VFs are not
> fully isolated from the PF.  The PF can always cause a denial of
> service to the VF, if not access data passed through the VF directly.
> This is why vfio-pci currently does not bind to PFs with SR-IOV enabled
> and does not provide access itself to enabling SR-IOV on a PF.  The
> IOMMU grouping mechanism might allow us a solution to this lack of
> isolation, however the deficiency isn't actually in the DMA path, so
> much as the potential cooperation between PF and VF devices.  Also,
> if we were to force VFs into the same IOMMU group as the PF, we severely
> limit the utility of having independent drivers managing PFs and VFs
> with vfio.
> 
> Therefore we introduce the concept of a VF token.  The token is
> implemented as a UUID and represents a shared secret which must be set
> by the PF driver and used by the VF drivers in order to access a vfio
> device file descriptor for the VF.  The ioctl to set the VF token will
> be provided in a later commit, this commit implements the underlying
> infrastructure.  The concept here is to augment the string the user
> passes to match a device within a group in order to retrieve access to
> the device descriptor.  For example, rather than passing only the PCI
> device name (ex. "0000:03:00.0") the user would also pass a vf_token
> UUID (ex. "2ab74924-c335-45f4-9b16-8569e5b08258").  The device match
> string therefore becomes:
> 
> "0000:03:00.0 vf_token=2ab74924-c335-45f4-9b16-8569e5b08258"
> 
> This syntax is expected to be extensible to future options as well, with
> the standard being:
> 
> "$DEVICE_NAME $OPTION1=$VALUE1 $OPTION2=$VALUE2"

Is this designed to be an AND condition? (I read it as such.)

> 
> The device name must be first and option=value pairs are separated by
> spaces.
> 
> The vf_token option is only required for VFs where the PF device is
> bound to vfio-pci.  There is no change for PFs using existing host
> drivers.
> 
> Note that in order to protect existing VF users, not only is it required
> to set a vf_token on the PF before VFs devices can be accessed, but also
> if there are existing VF users, (re)opening the PF device must also
> provide the current vf_token as authentication.  This is intended to
> prevent a VF driver starting with a trusted PF driver and later being
> replaced by an unknown driver.  A vf_token is not required to open the
> PF device when none of the VF devices are in use by vfio-pci drivers.
> 
> Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
> ---
>  drivers/vfio/pci/vfio_pci.c         |  127 +++++++++++++++++++++++++++++++++++
>  drivers/vfio/pci/vfio_pci_private.h |    8 ++
>  2 files changed, 134 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/vfio/pci/vfio_pci.c b/drivers/vfio/pci/vfio_pci.c
> index 2ec6c31d0ab0..26aea9ac4863 100644
> --- a/drivers/vfio/pci/vfio_pci.c
> +++ b/drivers/vfio/pci/vfio_pci.c
> @@ -466,6 +466,35 @@ static void vfio_pci_disable(struct vfio_pci_device *vdev)
>  		vfio_pci_set_power_state(vdev, PCI_D3hot);
>  }
>  
> +static struct pci_driver vfio_pci_driver;
> +
> +static void vfio_pci_vf_token_user_add(struct vfio_pci_device *vdev, int val)

Suggestion: call this _user_modify(), and have _user_add() and
_user_remove() as wrappers. That said, ...

> +{
> +	struct pci_dev *physfn = pci_physfn(vdev->pdev);
> +	struct vfio_device *pf_dev;
> +	struct vfio_pci_device *pf_vdev;
> +
> +	if (!vdev->pdev->is_virtfn)
> +		return;
> +
> +	pf_dev = vfio_device_get_from_dev(&physfn->dev);
> +	if (!pf_dev)
> +		return;
> +
> +	if (pci_dev_driver(physfn) != &vfio_pci_driver) {
> +		vfio_device_put(pf_dev);
> +		return;
> +	}
> +
> +	pf_vdev = vfio_device_data(pf_dev);
> +
> +	mutex_lock(&pf_vdev->vf_token->lock);
> +	pf_vdev->vf_token->users += val;

...is this expected to always be >= 0? If yes, it looks like a bug if
this is called with val==-n for n > users.

> +	mutex_unlock(&pf_vdev->vf_token->lock);
> +
> +	vfio_device_put(pf_dev);
> +}
> +
>  static void vfio_pci_release(void *device_data)
>  {
>  	struct vfio_pci_device *vdev = device_data;
> @@ -475,6 +504,7 @@ static void vfio_pci_release(void *device_data)
>  	if (!(--vdev->refcnt)) {
>  		vfio_spapr_pci_eeh_release(vdev->pdev);
>  		vfio_pci_disable(vdev);
> +		vfio_pci_vf_token_user_add(vdev, -1);
>  	}
>  
>  	mutex_unlock(&vdev->reflck->lock);
> @@ -493,6 +523,7 @@ static int vfio_pci_open(void *device_data)
>  	mutex_lock(&vdev->reflck->lock);
>  
>  	if (!vdev->refcnt) {
> +		vfio_pci_vf_token_user_add(vdev, 1);
>  		ret = vfio_pci_enable(vdev);
>  		if (ret)
>  			goto error;
> @@ -1278,11 +1309,86 @@ static void vfio_pci_request(void *device_data, unsigned int count)
>  	mutex_unlock(&vdev->igate);
>  }
>  
> +#define VF_TOKEN_ARG "vf_token="
> +
> +/* Called with vdev->vf_token->lock */
> +static int vfio_pci_vf_token_match(struct vfio_pci_device *vdev, char *opts)
> +{
> +	char *token;
> +	uuid_t uuid;
> +	int ret;
> +
> +	if (!opts)
> +		return -EINVAL;
> +
> +	token = strstr(opts, VF_TOKEN_ARG);
> +	if (!token)
> +		return -EINVAL;
> +
> +	token += strlen(VF_TOKEN_ARG);
> +
> +	ret = uuid_parse(token, &uuid);
> +	if (ret)
> +		return ret;
> +
> +	if (!uuid_equal(&uuid, &vdev->vf_token->uuid))
> +		return -EACCES;
> +
> +	return 0;
> +}
> +
>  static int vfio_pci_match(void *device_data, char *buf)
>  {
>  	struct vfio_pci_device *vdev = device_data;
> +	char *opts;
> +
> +	opts = strchr(buf, ' ');
> +	if (opts) {
> +		*opts = 0;
> +		opts++;
> +	}
> +
> +	if (strcmp(pci_name(vdev->pdev), buf))
> +		return 0; /* No match */

Up to here, this function is fine; but below, it gets a bit hard to
follow...

> +
> +	if (vdev->pdev->is_virtfn) {
> +		struct pci_dev *physfn = pci_physfn(vdev->pdev);
> +		struct vfio_device *pf_dev;
> +		int ret = 0;
> +
> +		pf_dev = vfio_device_get_from_dev(&physfn->dev);
> +		if (pf_dev) {
> +			if (pci_dev_driver(physfn) == &vfio_pci_driver) {
> +				struct vfio_pci_device *pf_vdev =
> +						vfio_device_data(pf_dev);
> +
> +				mutex_lock(&pf_vdev->vf_token->lock);
> +				ret = vfio_pci_vf_token_match(pf_vdev, opts);
> +				mutex_unlock(&pf_vdev->vf_token->lock);
> +			}
> +
> +			vfio_device_put(pf_dev);
> +
> +			if (ret)
> +				return -EACCES;
> +		}
> +	}

If we are a VF, and the PF is bound to vfio, pass whatever stuff other
than the device name that was passed in to an opaque match function.

>  
> -	return !strcmp(pci_name(vdev->pdev), buf);
> +	if (vdev->vf_token) {
> +		int ret = 0;
> +
> +		mutex_lock(&vdev->vf_token->lock);
> +
> +		if (vdev->vf_token->users)
> +			ret = vfio_pci_vf_token_match(vdev, opts);
> +
> +		mutex_unlock(&vdev->vf_token->lock);
> +
> +		if (ret)
> +			return -EACCES;
> +	}

If we have a VF token with users, pass whatever stuff other than the
device name that was passed in to an opaque match function.

What about the following instead:

- parse the passed-in string into device name and key/value pairs
- maybe reject anything with an unknown key
- check the device name
- if we're a VF with the PF bound to vfio, require a VF token to be
  specified and pass it to a token match function
- if we have a VF token with users, require a VF token to be specified
  and pass it to a token match function

My main gripes with the current code are:
- key=value parsing is done in the match function for vf_token
- it looks hard to extend the list of supported key/value pairs
- I don't see a good way to find out (as the user) _why_ the VF isn't
  matching


> +
> +	return 1; /* Match */
>  }

(...)


^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [dpdk-dev] [PATCH 4/7] vfio: Introduce VFIO_DEVICE_FEATURE ioctl and first user
  2020-02-11 23:05 ` [dpdk-dev] [PATCH 4/7] vfio: Introduce VFIO_DEVICE_FEATURE ioctl and first user Alex Williamson
@ 2020-02-13 12:41   ` Cornelia Huck
  2020-02-13 17:39     ` Alex Williamson
  0 siblings, 1 reply; 19+ messages in thread
From: Cornelia Huck @ 2020-02-13 12:41 UTC (permalink / raw)
  To: Alex Williamson
  Cc: kvm, linux-pci, linux-kernel, dev, mtosatti, thomas, bluca,
	jerinjacobk, bruce.richardson

On Tue, 11 Feb 2020 16:05:51 -0700
Alex Williamson <alex.williamson@redhat.com> wrote:

> The VFIO_DEVICE_FEATURE ioctl is meant to be a general purpose, device
> agnostic ioctl for setting, retrieving, and probing device features.
> This implementation provides a 16-bit field for specifying a feature
> index, where the data porition of the ioctl is determined by the
> semantics for the given feature.  Additional flag bits indicate the
> direction and nature of the operation; SET indicates user data is
> provided into the device feature, GET indicates the device feature is
> written out into user data.  The PROBE flag augments determining
> whether the given feature is supported, and if provided, whether the
> given operation on the feature is supported.
> 
> The first user of this ioctl is for setting the vfio-pci VF token,
> where the user provides a shared secret key (UUID) on a SR-IOV PF
> device, which users must provide when opening associated VF devices.
> 
> Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
> ---
>  drivers/vfio/pci/vfio_pci.c |   52 +++++++++++++++++++++++++++++++++++++++++++
>  include/uapi/linux/vfio.h   |   37 +++++++++++++++++++++++++++++++
>  2 files changed, 89 insertions(+)

(...)

> diff --git a/include/uapi/linux/vfio.h b/include/uapi/linux/vfio.h
> index 9e843a147ead..c5cbf04ce5a7 100644
> --- a/include/uapi/linux/vfio.h
> +++ b/include/uapi/linux/vfio.h
> @@ -707,6 +707,43 @@ struct vfio_device_ioeventfd {
>  
>  #define VFIO_DEVICE_IOEVENTFD		_IO(VFIO_TYPE, VFIO_BASE + 16)
>  
> +/**
> + * VFIO_DEVICE_FEATURE - _IORW(VFIO_TYPE, VFIO_BASE + 17,
> + *			       struct vfio_device_feature

Missing ')'

> + *
> + * Get, set, or probe feature data of the device.  The feature is selected
> + * using the FEATURE_MASK portion of the flags field.  Support for a feature
> + * can be probed by setting both the FEATURE_MASK and PROBE bits.  A probe
> + * may optionally include the GET and/or SET bits to determine read vs write
> + * access of the feature respectively.  Probing a feature will return success
> + * if the feature is supported and all of the optionally indicated GET/SET
> + * methods are supported.  The format of the data portion of the structure is

If neither GET nor SET are specified, will it return success if any of
the two are supported?

> + * specific to the given feature.  The data portion is not required for
> + * probing.
> + *
> + * Return 0 on success, -errno on failure.
> + */
> +struct vfio_device_feature {
> +	__u32	argsz;
> +	__u32	flags;
> +#define VFIO_DEVICE_FEATURE_MASK	(0xffff) /* 16-bit feature index */
> +#define VFIO_DEVICE_FEATURE_GET		(1 << 16) /* Get feature into data[] */
> +#define VFIO_DEVICE_FEATURE_SET		(1 << 17) /* Set feature from data[] */
> +#define VFIO_DEVICE_FEATURE_PROBE	(1 << 18) /* Probe feature support */
> +	__u8	data[];
> +};

I'm not sure I'm a fan of cramming both feature selection and operation
selection into flags. What about:

struct vfio_device_feature {
	__u32 argsz;
	__u32 flags;
/* GET/SET/PROBE #defines */
	__u32 feature;
	__u8  data[];
};

Getting/setting more than one feature at the same time does not sound
like a common use case; you would need to specify some kind of
algorithm for that anyway, and just doing it individually seems much
easier than that.

> +
> +#define VFIO_DEVICE_FEATURE		_IO(VFIO_TYPE, VFIO_BASE + 17)
> +
> +/*
> + * Provide support for setting a PCI VF Token, which is used as a shared
> + * secret between PF and VF drivers.  This feature may only be set on a
> + * PCI SR-IOV PF when SR-IOV is enabled on the PF and there are no existing
> + * open VFs.  Data provided when setting this feature is a 16-byte array
> + * (__u8 b[16]), representing a UUID.

No objection to that.

> + */
> +#define VFIO_DEVICE_FEATURE_PCI_VF_TOKEN	(0)
> +
>  /* -------- API for Type1 VFIO IOMMU -------- */
>  
>  /**
> 


^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [dpdk-dev] [PATCH 3/7] vfio/pci: Introduce VF token
  2020-02-13 11:46   ` Cornelia Huck
@ 2020-02-13 17:23     ` Alex Williamson
  2020-02-13 18:35       ` Cornelia Huck
  0 siblings, 1 reply; 19+ messages in thread
From: Alex Williamson @ 2020-02-13 17:23 UTC (permalink / raw)
  To: Cornelia Huck
  Cc: kvm, linux-pci, linux-kernel, dev, mtosatti, thomas, bluca,
	jerinjacobk, bruce.richardson

On Thu, 13 Feb 2020 12:46:54 +0100
Cornelia Huck <cohuck@redhat.com> wrote:

> On Tue, 11 Feb 2020 16:05:42 -0700
> Alex Williamson <alex.williamson@redhat.com> wrote:
> 
> > If we enable SR-IOV on a vfio-pci owned PF, the resulting VFs are not
> > fully isolated from the PF.  The PF can always cause a denial of
> > service to the VF, if not access data passed through the VF directly.
> > This is why vfio-pci currently does not bind to PFs with SR-IOV enabled
> > and does not provide access itself to enabling SR-IOV on a PF.  The
> > IOMMU grouping mechanism might allow us a solution to this lack of
> > isolation, however the deficiency isn't actually in the DMA path, so
> > much as the potential cooperation between PF and VF devices.  Also,
> > if we were to force VFs into the same IOMMU group as the PF, we severely
> > limit the utility of having independent drivers managing PFs and VFs
> > with vfio.
> > 
> > Therefore we introduce the concept of a VF token.  The token is
> > implemented as a UUID and represents a shared secret which must be set
> > by the PF driver and used by the VF drivers in order to access a vfio
> > device file descriptor for the VF.  The ioctl to set the VF token will
> > be provided in a later commit, this commit implements the underlying
> > infrastructure.  The concept here is to augment the string the user
> > passes to match a device within a group in order to retrieve access to
> > the device descriptor.  For example, rather than passing only the PCI
> > device name (ex. "0000:03:00.0") the user would also pass a vf_token
> > UUID (ex. "2ab74924-c335-45f4-9b16-8569e5b08258").  The device match
> > string therefore becomes:
> > 
> > "0000:03:00.0 vf_token=2ab74924-c335-45f4-9b16-8569e5b08258"
> > 
> > This syntax is expected to be extensible to future options as well, with
> > the standard being:
> > 
> > "$DEVICE_NAME $OPTION1=$VALUE1 $OPTION2=$VALUE2"  
> 
> Is this designed to be an AND condition? (I read it as such.)

Not sure I understand, the device name is always required for
compatibility, then zero or more key/value pairs may be needed
depending on the vfio bus driver and device requirements.  I'm not
suggesting that the user would pass multiple vf_token= options and the
implementation here would only parse the first.  I'm really only
suggesting the parsing format we'd use for multiple options, I'm not
trying to dictate how a bus driver might make use of them.  Does that
make sense, should I change some wording?
 
> > 
> > The device name must be first and option=value pairs are separated by
> > spaces.
> > 
> > The vf_token option is only required for VFs where the PF device is
> > bound to vfio-pci.  There is no change for PFs using existing host
> > drivers.
> > 
> > Note that in order to protect existing VF users, not only is it required
> > to set a vf_token on the PF before VFs devices can be accessed, but also
> > if there are existing VF users, (re)opening the PF device must also
> > provide the current vf_token as authentication.  This is intended to
> > prevent a VF driver starting with a trusted PF driver and later being
> > replaced by an unknown driver.  A vf_token is not required to open the
> > PF device when none of the VF devices are in use by vfio-pci drivers.
> > 
> > Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
> > ---
> >  drivers/vfio/pci/vfio_pci.c         |  127 +++++++++++++++++++++++++++++++++++
> >  drivers/vfio/pci/vfio_pci_private.h |    8 ++
> >  2 files changed, 134 insertions(+), 1 deletion(-)
> > 
> > diff --git a/drivers/vfio/pci/vfio_pci.c b/drivers/vfio/pci/vfio_pci.c
> > index 2ec6c31d0ab0..26aea9ac4863 100644
> > --- a/drivers/vfio/pci/vfio_pci.c
> > +++ b/drivers/vfio/pci/vfio_pci.c
> > @@ -466,6 +466,35 @@ static void vfio_pci_disable(struct vfio_pci_device *vdev)
> >  		vfio_pci_set_power_state(vdev, PCI_D3hot);
> >  }
> >  
> > +static struct pci_driver vfio_pci_driver;
> > +
> > +static void vfio_pci_vf_token_user_add(struct vfio_pci_device *vdev, int val)  
> 
> Suggestion: call this _user_modify(), and have _user_add() and
> _user_remove() as wrappers. That said, ...

I did consider something along these lines, but it seems overly
explicit for a helper that's used in two places with only two obvious
and discrete values.  If this were an exposed API, absolutely I'd agree.

> > +{
> > +	struct pci_dev *physfn = pci_physfn(vdev->pdev);
> > +	struct vfio_device *pf_dev;
> > +	struct vfio_pci_device *pf_vdev;
> > +
> > +	if (!vdev->pdev->is_virtfn)
> > +		return;
> > +
> > +	pf_dev = vfio_device_get_from_dev(&physfn->dev);
> > +	if (!pf_dev)
> > +		return;
> > +
> > +	if (pci_dev_driver(physfn) != &vfio_pci_driver) {
> > +		vfio_device_put(pf_dev);
> > +		return;
> > +	}
> > +
> > +	pf_vdev = vfio_device_data(pf_dev);
> > +
> > +	mutex_lock(&pf_vdev->vf_token->lock);
> > +	pf_vdev->vf_token->users += val;  
> 
> ...is this expected to always be >= 0? If yes, it looks like a bug if
> this is called with val==-n for n > users.

I'm not sure if you're inadvertently pointing out the bug in the
vfio_pci_open() path below where we increment token users before
vfio_pci_enable() which can fail, or your suggesting a WARN_ON here
should this go negative.  This is a static helper function, so I
generally don't try to sanitize the inputs like I would for an exposed
API.
 
> > +	mutex_unlock(&pf_vdev->vf_token->lock);
> > +
> > +	vfio_device_put(pf_dev);
> > +}
> > +
> >  static void vfio_pci_release(void *device_data)
> >  {
> >  	struct vfio_pci_device *vdev = device_data;
> > @@ -475,6 +504,7 @@ static void vfio_pci_release(void *device_data)
> >  	if (!(--vdev->refcnt)) {
> >  		vfio_spapr_pci_eeh_release(vdev->pdev);
> >  		vfio_pci_disable(vdev);
> > +		vfio_pci_vf_token_user_add(vdev, -1);
> >  	}
> >  
> >  	mutex_unlock(&vdev->reflck->lock);
> > @@ -493,6 +523,7 @@ static int vfio_pci_open(void *device_data)
> >  	mutex_lock(&vdev->reflck->lock);
> >  
> >  	if (!vdev->refcnt) {
> > +		vfio_pci_vf_token_user_add(vdev, 1);
> >  		ret = vfio_pci_enable(vdev);
> >  		if (ret)
> >  			goto error;

I think we want to include decrementing token users in the error path
here.

> > @@ -1278,11 +1309,86 @@ static void vfio_pci_request(void *device_data, unsigned int count)
> >  	mutex_unlock(&vdev->igate);
> >  }
> >  
> > +#define VF_TOKEN_ARG "vf_token="
> > +
> > +/* Called with vdev->vf_token->lock */
> > +static int vfio_pci_vf_token_match(struct vfio_pci_device *vdev, char *opts)
> > +{
> > +	char *token;
> > +	uuid_t uuid;
> > +	int ret;
> > +
> > +	if (!opts)
> > +		return -EINVAL;
> > +
> > +	token = strstr(opts, VF_TOKEN_ARG);
> > +	if (!token)
> > +		return -EINVAL;
> > +
> > +	token += strlen(VF_TOKEN_ARG);
> > +
> > +	ret = uuid_parse(token, &uuid);
> > +	if (ret)
> > +		return ret;
> > +
> > +	if (!uuid_equal(&uuid, &vdev->vf_token->uuid))
> > +		return -EACCES;
> > +
> > +	return 0;
> > +}
> > +
> >  static int vfio_pci_match(void *device_data, char *buf)
> >  {
> >  	struct vfio_pci_device *vdev = device_data;
> > +	char *opts;
> > +
> > +	opts = strchr(buf, ' ');
> > +	if (opts) {
> > +		*opts = 0;
> > +		opts++;
> > +	}
> > +
> > +	if (strcmp(pci_name(vdev->pdev), buf))
> > +		return 0; /* No match */  
> 
> Up to here, this function is fine; but below, it gets a bit hard to
> follow...
> 
> > +
> > +	if (vdev->pdev->is_virtfn) {
> > +		struct pci_dev *physfn = pci_physfn(vdev->pdev);
> > +		struct vfio_device *pf_dev;
> > +		int ret = 0;
> > +
> > +		pf_dev = vfio_device_get_from_dev(&physfn->dev);
> > +		if (pf_dev) {
> > +			if (pci_dev_driver(physfn) == &vfio_pci_driver) {
> > +				struct vfio_pci_device *pf_vdev =
> > +						vfio_device_data(pf_dev);
> > +
> > +				mutex_lock(&pf_vdev->vf_token->lock);
> > +				ret = vfio_pci_vf_token_match(pf_vdev, opts);
> > +				mutex_unlock(&pf_vdev->vf_token->lock);
> > +			}
> > +
> > +			vfio_device_put(pf_dev);
> > +
> > +			if (ret)
> > +				return -EACCES;
> > +		}
> > +	}  
> 
> If we are a VF, and the PF is bound to vfio, pass whatever stuff other
> than the device name that was passed in to an opaque match function.

vfio_pci_match() is an opaque match function relative to vfio.c, but
there's nothing opaque here.  We have a VF where the associated PF is
bound to vfio-pci, therefore we require that the additional options
include a vf_token matching the PF and we go looking to verify that.
 
> > -	return !strcmp(pci_name(vdev->pdev), buf);
> > +	if (vdev->vf_token) {
> > +		int ret = 0;
> > +
> > +		mutex_lock(&vdev->vf_token->lock);
> > +
> > +		if (vdev->vf_token->users)
> > +			ret = vfio_pci_vf_token_match(vdev, opts);
> > +
> > +		mutex_unlock(&vdev->vf_token->lock);
> > +
> > +		if (ret)
> > +			return -EACCES;
> > +	}  
> 
> If we have a VF token with users, pass whatever stuff other than the
> device name that was passed in to an opaque match function.

This description strays further off course a bit.  If we have a
vf_token then we are a PF and clearly bound to vfio-pci.  If there are
existing VF users then we require the user to provide a vf_token
matching the one currently on the device.

> What about the following instead:
> 
> - parse the passed-in string into device name and key/value pairs
> - maybe reject anything with an unknown key

This is definitely something that we should decided whether or not we
want to do it.  I think an argument for it is that a user can pick
arbitrary key=value options that would be ignored with this
implementation, but later might match a key that gets defined and then
we break them.  Misuse of the API on the part of the user, but maybe
better to just prevent it ahead of time.

> - check the device name
> - if we're a VF with the PF bound to vfio, require a VF token to be
>   specified and pass it to a token match function
> - if we have a VF token with users, require a VF token to be specified
>   and pass it to a token match function

This is essentially what we do above.  Maybe we just disagree about
whether we're calling an "opaque match function" versus a "token match
function".

> 
> My main gripes with the current code are:
> - key=value parsing is done in the match function for vf_token
> - it looks hard to extend the list of supported key/value pairs
> - I don't see a good way to find out (as the user) _why_ the VF isn't
>   matching

If we want to reject unknown options, then yes, the parsing should be
done ahead.  I don't see that it's hard to extend though, each new
requirement can follow the same methodology to check for it in the
remaining options string.

The last point is the one I brought up in the cover letter and where
I'm also not happy with the opaque error condition, but I have no
thoughts on how to resolve it.  Either we block the user from getting
the device file descriptor, and they're left scratching their heads
why, or we give them access to the file descriptor AND we need to
impose barriers to access mechanisms (ex. block read/write/mmap, limit
ioctls) AND we need to use VFIO_DEVICE_FEATURE and VFIO_DEVICE_GET_INFO
as a mechanism for the user to figure out that the device requires
"something" to get full access.  With the latter, I'm concerned whether
existing userspace code will fail in predictable ways and that it
snowballs into an ugly API.  For instance, if we add a flag to device
info to indicate it's locked, existing code won't know about that flag,
so we have to cripple device info to report no regions and no irqs to
make that code fail.  Then a user needs to know which feature to probe
for to figure out how the device is locked, then once they do we make
device info report real values?  It's maybe a little more deterministic
than blocking access to the file descriptor, but I'm not sure it's
worth it.  We could do a log-once if the match fails for token, but we
need to be careful not to provide an obvious point where the user could
spam the logs.  I've also considered if we could write an error back
into the user's buffer, but the ioctl isn't designed that way and we
don't know if we'd break how the user consumes that buffer later.

Ideas greatly welcomed in this space.  Thanks for the review!

Alex


^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [dpdk-dev] [PATCH 4/7] vfio: Introduce VFIO_DEVICE_FEATURE ioctl and first user
  2020-02-13 12:41   ` Cornelia Huck
@ 2020-02-13 17:39     ` Alex Williamson
  2020-02-13 18:08       ` Cornelia Huck
  0 siblings, 1 reply; 19+ messages in thread
From: Alex Williamson @ 2020-02-13 17:39 UTC (permalink / raw)
  To: Cornelia Huck
  Cc: kvm, linux-pci, linux-kernel, dev, mtosatti, thomas, bluca,
	jerinjacobk, bruce.richardson

On Thu, 13 Feb 2020 13:41:21 +0100
Cornelia Huck <cohuck@redhat.com> wrote:

> On Tue, 11 Feb 2020 16:05:51 -0700
> Alex Williamson <alex.williamson@redhat.com> wrote:
> 
> > The VFIO_DEVICE_FEATURE ioctl is meant to be a general purpose, device
> > agnostic ioctl for setting, retrieving, and probing device features.
> > This implementation provides a 16-bit field for specifying a feature
> > index, where the data porition of the ioctl is determined by the
> > semantics for the given feature.  Additional flag bits indicate the
> > direction and nature of the operation; SET indicates user data is
> > provided into the device feature, GET indicates the device feature is
> > written out into user data.  The PROBE flag augments determining
> > whether the given feature is supported, and if provided, whether the
> > given operation on the feature is supported.
> > 
> > The first user of this ioctl is for setting the vfio-pci VF token,
> > where the user provides a shared secret key (UUID) on a SR-IOV PF
> > device, which users must provide when opening associated VF devices.
> > 
> > Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
> > ---
> >  drivers/vfio/pci/vfio_pci.c |   52 +++++++++++++++++++++++++++++++++++++++++++
> >  include/uapi/linux/vfio.h   |   37 +++++++++++++++++++++++++++++++
> >  2 files changed, 89 insertions(+)  
> 
> (...)
> 
> > diff --git a/include/uapi/linux/vfio.h b/include/uapi/linux/vfio.h
> > index 9e843a147ead..c5cbf04ce5a7 100644
> > --- a/include/uapi/linux/vfio.h
> > +++ b/include/uapi/linux/vfio.h
> > @@ -707,6 +707,43 @@ struct vfio_device_ioeventfd {
> >  
> >  #define VFIO_DEVICE_IOEVENTFD		_IO(VFIO_TYPE, VFIO_BASE + 16)
> >  
> > +/**
> > + * VFIO_DEVICE_FEATURE - _IORW(VFIO_TYPE, VFIO_BASE + 17,
> > + *			       struct vfio_device_feature  
> 
> Missing ')'

Fixed.
 
> > + *
> > + * Get, set, or probe feature data of the device.  The feature is selected
> > + * using the FEATURE_MASK portion of the flags field.  Support for a feature
> > + * can be probed by setting both the FEATURE_MASK and PROBE bits.  A probe
> > + * may optionally include the GET and/or SET bits to determine read vs write
> > + * access of the feature respectively.  Probing a feature will return success
> > + * if the feature is supported and all of the optionally indicated GET/SET
> > + * methods are supported.  The format of the data portion of the structure is  
> 
> If neither GET nor SET are specified, will it return success if any of
> the two are supported?

Yes, that's how I've implemented this first feature.

> > + * specific to the given feature.  The data portion is not required for
> > + * probing.
> > + *
> > + * Return 0 on success, -errno on failure.
> > + */
> > +struct vfio_device_feature {
> > +	__u32	argsz;
> > +	__u32	flags;
> > +#define VFIO_DEVICE_FEATURE_MASK	(0xffff) /* 16-bit feature index */
> > +#define VFIO_DEVICE_FEATURE_GET		(1 << 16) /* Get feature into data[] */
> > +#define VFIO_DEVICE_FEATURE_SET		(1 << 17) /* Set feature from data[] */
> > +#define VFIO_DEVICE_FEATURE_PROBE	(1 << 18) /* Probe feature support */
> > +	__u8	data[];
> > +};  
> 
> I'm not sure I'm a fan of cramming both feature selection and operation
> selection into flags. What about:
> 
> struct vfio_device_feature {
> 	__u32 argsz;
> 	__u32 flags;
> /* GET/SET/PROBE #defines */
> 	__u32 feature;
> 	__u8  data[];
> };

Then data is unaligned so we either need to expand feature or add
padding.  So this makes the structure at least 8 bytes bigger and buys
us...?  What's so special about the bottom half of flags that we can't
designate it as the flags that specify the feature?  We still have
another 13 bits of flags for future use.

> Getting/setting more than one feature at the same time does not sound
> like a common use case; you would need to specify some kind of
> algorithm for that anyway, and just doing it individually seems much
> easier than that.

Yup.  I just figured 2^16 features is a nice way to make use of the
structure vs 2^32 features and 4 bytes of padding or 2^64 features.  I
don't think I'm being optimistic in thinking we'll have far less than
16K features and we can always reserve feature 0xffff as an extended
feature where the first 8-bytes of data defines that extended feature
index.

> > +
> > +#define VFIO_DEVICE_FEATURE		_IO(VFIO_TYPE, VFIO_BASE + 17)
> > +
> > +/*
> > + * Provide support for setting a PCI VF Token, which is used as a shared
> > + * secret between PF and VF drivers.  This feature may only be set on a
> > + * PCI SR-IOV PF when SR-IOV is enabled on the PF and there are no existing
> > + * open VFs.  Data provided when setting this feature is a 16-byte array
> > + * (__u8 b[16]), representing a UUID.  
> 
> No objection to that.

:)  Thanks!

Alex


^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [dpdk-dev] [PATCH 4/7] vfio: Introduce VFIO_DEVICE_FEATURE ioctl and first user
  2020-02-13 17:39     ` Alex Williamson
@ 2020-02-13 18:08       ` Cornelia Huck
  0 siblings, 0 replies; 19+ messages in thread
From: Cornelia Huck @ 2020-02-13 18:08 UTC (permalink / raw)
  To: Alex Williamson
  Cc: kvm, linux-pci, linux-kernel, dev, mtosatti, thomas, bluca,
	jerinjacobk, bruce.richardson

On Thu, 13 Feb 2020 10:39:57 -0700
Alex Williamson <alex.williamson@redhat.com> wrote:

> On Thu, 13 Feb 2020 13:41:21 +0100
> Cornelia Huck <cohuck@redhat.com> wrote:
> 
> > On Tue, 11 Feb 2020 16:05:51 -0700
> > Alex Williamson <alex.williamson@redhat.com> wrote:

> > > +struct vfio_device_feature {
> > > +	__u32	argsz;
> > > +	__u32	flags;
> > > +#define VFIO_DEVICE_FEATURE_MASK	(0xffff) /* 16-bit feature index */
> > > +#define VFIO_DEVICE_FEATURE_GET		(1 << 16) /* Get feature into data[] */
> > > +#define VFIO_DEVICE_FEATURE_SET		(1 << 17) /* Set feature from data[] */
> > > +#define VFIO_DEVICE_FEATURE_PROBE	(1 << 18) /* Probe feature support */
> > > +	__u8	data[];
> > > +};    
> > 
> > I'm not sure I'm a fan of cramming both feature selection and operation
> > selection into flags. What about:
> > 
> > struct vfio_device_feature {
> > 	__u32 argsz;
> > 	__u32 flags;
> > /* GET/SET/PROBE #defines */
> > 	__u32 feature;
> > 	__u8  data[];
> > };  
> 
> Then data is unaligned so we either need to expand feature or add
> padding.  So this makes the structure at least 8 bytes bigger and buys
> us...?  What's so special about the bottom half of flags that we can't
> designate it as the flags that specify the feature?  We still have
> another 13 bits of flags for future use.

It is more my general dislike of bit fiddling here, no strong
objection, certainly.

> 
> > Getting/setting more than one feature at the same time does not sound
> > like a common use case; you would need to specify some kind of
> > algorithm for that anyway, and just doing it individually seems much
> > easier than that.  
> 
> Yup.  I just figured 2^16 features is a nice way to make use of the
> structure vs 2^32 features and 4 bytes of padding or 2^64 features.  I
> don't think I'm being optimistic in thinking we'll have far less than
> 16K features and we can always reserve feature 0xffff as an extended
> feature where the first 8-bytes of data defines that extended feature
> index.

Agreed, we're probably not going to end up with a flood of features
here.

Anyway, much of this seems to be a matter of personal taste, so let's
keep it as it is.


^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [dpdk-dev] [PATCH 3/7] vfio/pci: Introduce VF token
  2020-02-13 17:23     ` Alex Williamson
@ 2020-02-13 18:35       ` Cornelia Huck
  2020-02-14 23:40         ` Alex Williamson
  0 siblings, 1 reply; 19+ messages in thread
From: Cornelia Huck @ 2020-02-13 18:35 UTC (permalink / raw)
  To: Alex Williamson
  Cc: kvm, linux-pci, linux-kernel, dev, mtosatti, thomas, bluca,
	jerinjacobk, bruce.richardson

On Thu, 13 Feb 2020 10:23:21 -0700
Alex Williamson <alex.williamson@redhat.com> wrote:

> On Thu, 13 Feb 2020 12:46:54 +0100
> Cornelia Huck <cohuck@redhat.com> wrote:
> 
> > On Tue, 11 Feb 2020 16:05:42 -0700
> > Alex Williamson <alex.williamson@redhat.com> wrote:
> >   
> > > If we enable SR-IOV on a vfio-pci owned PF, the resulting VFs are not
> > > fully isolated from the PF.  The PF can always cause a denial of
> > > service to the VF, if not access data passed through the VF directly.
> > > This is why vfio-pci currently does not bind to PFs with SR-IOV enabled
> > > and does not provide access itself to enabling SR-IOV on a PF.  The
> > > IOMMU grouping mechanism might allow us a solution to this lack of
> > > isolation, however the deficiency isn't actually in the DMA path, so
> > > much as the potential cooperation between PF and VF devices.  Also,
> > > if we were to force VFs into the same IOMMU group as the PF, we severely
> > > limit the utility of having independent drivers managing PFs and VFs
> > > with vfio.
> > > 
> > > Therefore we introduce the concept of a VF token.  The token is
> > > implemented as a UUID and represents a shared secret which must be set
> > > by the PF driver and used by the VF drivers in order to access a vfio
> > > device file descriptor for the VF.  The ioctl to set the VF token will
> > > be provided in a later commit, this commit implements the underlying
> > > infrastructure.  The concept here is to augment the string the user
> > > passes to match a device within a group in order to retrieve access to
> > > the device descriptor.  For example, rather than passing only the PCI
> > > device name (ex. "0000:03:00.0") the user would also pass a vf_token
> > > UUID (ex. "2ab74924-c335-45f4-9b16-8569e5b08258").  The device match
> > > string therefore becomes:
> > > 
> > > "0000:03:00.0 vf_token=2ab74924-c335-45f4-9b16-8569e5b08258"
> > > 
> > > This syntax is expected to be extensible to future options as well, with
> > > the standard being:
> > > 
> > > "$DEVICE_NAME $OPTION1=$VALUE1 $OPTION2=$VALUE2"    
> > 
> > Is this designed to be an AND condition? (I read it as such.)  
> 
> Not sure I understand, the device name is always required for
> compatibility, then zero or more key/value pairs may be needed
> depending on the vfio bus driver and device requirements.  I'm not
> suggesting that the user would pass multiple vf_token= options and the
> implementation here would only parse the first.  I'm really only
> suggesting the parsing format we'd use for multiple options, I'm not
> trying to dictate how a bus driver might make use of them.  Does that
> make sense, should I change some wording?

Not multiple vf_token= options, but multiple, different options.

E.g. we have something like "$NAME foo=xyz bar=zyx". What is supposed
to happen?

- both the foo= and bar= values have to give a match
- either foo= or bar= have to match
- if foo= doesn't match, try bar=
- foo= and bar= are ignored, if not applicable

(My understanding is that $NAME matching continues to be mandatory?)

What should happen for vf_token= is reasonably clear, but I'm wondering
about further extensions, as you already talk about it.

>  
> > > 
> > > The device name must be first and option=value pairs are separated by
> > > spaces.
> > > 
> > > The vf_token option is only required for VFs where the PF device is
> > > bound to vfio-pci.  There is no change for PFs using existing host
> > > drivers.
> > > 
> > > Note that in order to protect existing VF users, not only is it required
> > > to set a vf_token on the PF before VFs devices can be accessed, but also
> > > if there are existing VF users, (re)opening the PF device must also
> > > provide the current vf_token as authentication.  This is intended to
> > > prevent a VF driver starting with a trusted PF driver and later being
> > > replaced by an unknown driver.  A vf_token is not required to open the
> > > PF device when none of the VF devices are in use by vfio-pci drivers.
> > > 
> > > Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
> > > ---
> > >  drivers/vfio/pci/vfio_pci.c         |  127 +++++++++++++++++++++++++++++++++++
> > >  drivers/vfio/pci/vfio_pci_private.h |    8 ++
> > >  2 files changed, 134 insertions(+), 1 deletion(-)
> > > 
> > > diff --git a/drivers/vfio/pci/vfio_pci.c b/drivers/vfio/pci/vfio_pci.c
> > > index 2ec6c31d0ab0..26aea9ac4863 100644
> > > --- a/drivers/vfio/pci/vfio_pci.c
> > > +++ b/drivers/vfio/pci/vfio_pci.c
> > > @@ -466,6 +466,35 @@ static void vfio_pci_disable(struct vfio_pci_device *vdev)
> > >  		vfio_pci_set_power_state(vdev, PCI_D3hot);
> > >  }
> > >  
> > > +static struct pci_driver vfio_pci_driver;
> > > +
> > > +static void vfio_pci_vf_token_user_add(struct vfio_pci_device *vdev, int val)    
> > 
> > Suggestion: call this _user_modify(), and have _user_add() and
> > _user_remove() as wrappers. That said, ...  
> 
> I did consider something along these lines, but it seems overly
> explicit for a helper that's used in two places with only two obvious
> and discrete values.  If this were an exposed API, absolutely I'd agree.
> 
> > > +{
> > > +	struct pci_dev *physfn = pci_physfn(vdev->pdev);
> > > +	struct vfio_device *pf_dev;
> > > +	struct vfio_pci_device *pf_vdev;
> > > +
> > > +	if (!vdev->pdev->is_virtfn)
> > > +		return;
> > > +
> > > +	pf_dev = vfio_device_get_from_dev(&physfn->dev);
> > > +	if (!pf_dev)
> > > +		return;
> > > +
> > > +	if (pci_dev_driver(physfn) != &vfio_pci_driver) {
> > > +		vfio_device_put(pf_dev);
> > > +		return;
> > > +	}
> > > +
> > > +	pf_vdev = vfio_device_data(pf_dev);
> > > +
> > > +	mutex_lock(&pf_vdev->vf_token->lock);
> > > +	pf_vdev->vf_token->users += val;    
> > 
> > ...is this expected to always be >= 0? If yes, it looks like a bug if
> > this is called with val==-n for n > users.  
> 
> I'm not sure if you're inadvertently pointing out the bug in the
> vfio_pci_open() path below where we increment token users before
> vfio_pci_enable() which can fail, or your suggesting a WARN_ON here
> should this go negative.  This is a static helper function, so I
> generally don't try to sanitize the inputs like I would for an exposed
> API.

Yes, if you let users drop below 0, it's an internal error. Still, I
think it's worth checking, so that we catch those internal errors early
on, so a WARN_ON is probably the right thing to do.

>  
> > > +	mutex_unlock(&pf_vdev->vf_token->lock);
> > > +
> > > +	vfio_device_put(pf_dev);
> > > +}
> > > +
> > >  static void vfio_pci_release(void *device_data)
> > >  {
> > >  	struct vfio_pci_device *vdev = device_data;
> > > @@ -475,6 +504,7 @@ static void vfio_pci_release(void *device_data)
> > >  	if (!(--vdev->refcnt)) {
> > >  		vfio_spapr_pci_eeh_release(vdev->pdev);
> > >  		vfio_pci_disable(vdev);
> > > +		vfio_pci_vf_token_user_add(vdev, -1);
> > >  	}
> > >  
> > >  	mutex_unlock(&vdev->reflck->lock);
> > > @@ -493,6 +523,7 @@ static int vfio_pci_open(void *device_data)
> > >  	mutex_lock(&vdev->reflck->lock);
> > >  
> > >  	if (!vdev->refcnt) {
> > > +		vfio_pci_vf_token_user_add(vdev, 1);
> > >  		ret = vfio_pci_enable(vdev);
> > >  		if (ret)
> > >  			goto error;  
> 
> I think we want to include decrementing token users in the error path
> here.

Yes; good that my comment made you spot it, because I didn't notice :)

> 
> > > @@ -1278,11 +1309,86 @@ static void vfio_pci_request(void *device_data, unsigned int count)
> > >  	mutex_unlock(&vdev->igate);
> > >  }
> > >  
> > > +#define VF_TOKEN_ARG "vf_token="
> > > +
> > > +/* Called with vdev->vf_token->lock */
> > > +static int vfio_pci_vf_token_match(struct vfio_pci_device *vdev, char *opts)
> > > +{
> > > +	char *token;
> > > +	uuid_t uuid;
> > > +	int ret;
> > > +
> > > +	if (!opts)
> > > +		return -EINVAL;
> > > +
> > > +	token = strstr(opts, VF_TOKEN_ARG);
> > > +	if (!token)
> > > +		return -EINVAL;
> > > +
> > > +	token += strlen(VF_TOKEN_ARG);
> > > +
> > > +	ret = uuid_parse(token, &uuid);
> > > +	if (ret)
> > > +		return ret;
> > > +
> > > +	if (!uuid_equal(&uuid, &vdev->vf_token->uuid))
> > > +		return -EACCES;
> > > +
> > > +	return 0;

Again, I guess my objections below are a matter of taste; especially
because this function does the key=value parsing, and I'm not sure it's
the right place to do so.

> > > +}
> > > +
> > >  static int vfio_pci_match(void *device_data, char *buf)
> > >  {
> > >  	struct vfio_pci_device *vdev = device_data;
> > > +	char *opts;
> > > +
> > > +	opts = strchr(buf, ' ');
> > > +	if (opts) {
> > > +		*opts = 0;
> > > +		opts++;
> > > +	}
> > > +
> > > +	if (strcmp(pci_name(vdev->pdev), buf))
> > > +		return 0; /* No match */    
> > 
> > Up to here, this function is fine; but below, it gets a bit hard to
> > follow...
> >   
> > > +
> > > +	if (vdev->pdev->is_virtfn) {
> > > +		struct pci_dev *physfn = pci_physfn(vdev->pdev);
> > > +		struct vfio_device *pf_dev;
> > > +		int ret = 0;
> > > +
> > > +		pf_dev = vfio_device_get_from_dev(&physfn->dev);
> > > +		if (pf_dev) {
> > > +			if (pci_dev_driver(physfn) == &vfio_pci_driver) {
> > > +				struct vfio_pci_device *pf_vdev =
> > > +						vfio_device_data(pf_dev);
> > > +
> > > +				mutex_lock(&pf_vdev->vf_token->lock);
> > > +				ret = vfio_pci_vf_token_match(pf_vdev, opts);
> > > +				mutex_unlock(&pf_vdev->vf_token->lock);
> > > +			}
> > > +
> > > +			vfio_device_put(pf_dev);
> > > +
> > > +			if (ret)
> > > +				return -EACCES;
> > > +		}
> > > +	}    
> > 
> > If we are a VF, and the PF is bound to vfio, pass whatever stuff other
> > than the device name that was passed in to an opaque match function.  
> 
> vfio_pci_match() is an opaque match function relative to vfio.c, but
> there's nothing opaque here.  We have a VF where the associated PF is
> bound to vfio-pci, therefore we require that the additional options
> include a vf_token matching the PF and we go looking to verify that.
>  
> > > -	return !strcmp(pci_name(vdev->pdev), buf);
> > > +	if (vdev->vf_token) {
> > > +		int ret = 0;
> > > +
> > > +		mutex_lock(&vdev->vf_token->lock);
> > > +
> > > +		if (vdev->vf_token->users)
> > > +			ret = vfio_pci_vf_token_match(vdev, opts);
> > > +
> > > +		mutex_unlock(&vdev->vf_token->lock);
> > > +
> > > +		if (ret)
> > > +			return -EACCES;
> > > +	}    
> > 
> > If we have a VF token with users, pass whatever stuff other than the
> > device name that was passed in to an opaque match function.  
> 
> This description strays further off course a bit.  If we have a
> vf_token then we are a PF and clearly bound to vfio-pci.  If there are
> existing VF users then we require the user to provide a vf_token
> matching the one currently on the device.

Maybe my wording is just a bit off...

> 
> > What about the following instead:
> > 
> > - parse the passed-in string into device name and key/value pairs
> > - maybe reject anything with an unknown key  
> 
> This is definitely something that we should decided whether or not we
> want to do it.  I think an argument for it is that a user can pick
> arbitrary key=value options that would be ignored with this
> implementation, but later might match a key that gets defined and then
> we break them.  Misuse of the API on the part of the user, but maybe
> better to just prevent it ahead of time.

Yes, it's probably good to do this now.

> 
> > - check the device name
> > - if we're a VF with the PF bound to vfio, require a VF token to be
> >   specified and pass it to a token match function
> > - if we have a VF token with users, require a VF token to be specified
> >   and pass it to a token match function  
> 
> This is essentially what we do above.  Maybe we just disagree about
> whether we're calling an "opaque match function" versus a "token match
> function".

Maybe I should have said "function parsing a string, which might
contain a lot of unrelated stuff" vs "function explicitly handling a
vf_token value".

> 
> > 
> > My main gripes with the current code are:
> > - key=value parsing is done in the match function for vf_token
> > - it looks hard to extend the list of supported key/value pairs
> > - I don't see a good way to find out (as the user) _why_ the VF isn't
> >   matching  
> 
> If we want to reject unknown options, then yes, the parsing should be
> done ahead.  I don't see that it's hard to extend though, each new
> requirement can follow the same methodology to check for it in the
> remaining options string.

If you pre-parse into option/value pairs, you see quite easily if you
managed to obtain a required option, if an option has been specified
more than once, or if an unknown option has been specified. If a new
option is introduced, you just need to handle whatever has been parsed
already. Extending is probably not exactly hard, but pre-parsing likely
makes it easier, as you don't have implicit assumptions.

> 
> The last point is the one I brought up in the cover letter and where
> I'm also not happy with the opaque error condition, but I have no
> thoughts on how to resolve it.  Either we block the user from getting
> the device file descriptor, and they're left scratching their heads
> why, or we give them access to the file descriptor AND we need to
> impose barriers to access mechanisms (ex. block read/write/mmap, limit
> ioctls) AND we need to use VFIO_DEVICE_FEATURE and VFIO_DEVICE_GET_INFO
> as a mechanism for the user to figure out that the device requires
> "something" to get full access.  With the latter, I'm concerned whether
> existing userspace code will fail in predictable ways and that it
> snowballs into an ugly API.  For instance, if we add a flag to device
> info to indicate it's locked, existing code won't know about that flag,
> so we have to cripple device info to report no regions and no irqs to
> make that code fail.  Then a user needs to know which feature to probe
> for to figure out how the device is locked, then once they do we make
> device info report real values?  It's maybe a little more deterministic
> than blocking access to the file descriptor, but I'm not sure it's
> worth it.  We could do a log-once if the match fails for token, but we
> need to be careful not to provide an obvious point where the user could
> spam the logs.  I've also considered if we could write an error back
> into the user's buffer, but the ioctl isn't designed that way and we
> don't know if we'd break how the user consumes that buffer later.

Some extended reporting mechanism is likely to become unwieldy,
especially when we realize we missed something. A simple log message
that a vf_token is required (pointing to a more verbose description, if
possible) looks best (obviously rate limited or only printed once).
Just enough to give the user a hint so that they are not left
completely baffled.

> 
> Ideas greatly welcomed in this space.  Thanks for the review!
> 
> Alex


^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [dpdk-dev] [PATCH 0/7] vfio/pci: SR-IOV support
  2020-02-11 23:05 [dpdk-dev] [PATCH 0/7] vfio/pci: SR-IOV support Alex Williamson
                   ` (6 preceding siblings ...)
  2020-02-11 23:06 ` [dpdk-dev] [PATCH 7/7] vfio/pci: Cleanup .probe() exit paths Alex Williamson
@ 2020-02-14  4:57 ` Alexey Kardashevskiy
  2020-02-14 15:27   ` Alex Williamson
  7 siblings, 1 reply; 19+ messages in thread
From: Alexey Kardashevskiy @ 2020-02-14  4:57 UTC (permalink / raw)
  To: Alex Williamson, kvm
  Cc: linux-pci, linux-kernel, dev, mtosatti, thomas, bluca,
	jerinjacobk, bruce.richardson, cohuck



On 12/02/2020 10:05, Alex Williamson wrote:
> Given the mostly positive feedback from the RFC[1], here's a new
> non-RFC revision.  Changes since RFC:
> 
>  - vfio_device_ops.match semantics refined
>  - Use helpers for struct pci_dev.physfn to avoid breakage without
>    CONFIG_PCI_IOV
>  - Relax to allow SR-IOV configuration changes while PF is opened.
>    There are potentially interesting use cases here, including
>    perhaps QEMU emulating an SR-IOV capability and calling out
>    to a privileged entity to manipulate sriov_numvfs and corral
>    the resulting devices.
>  - Retest vfio_device_feature.argsz to include uuid length.
>  - Add Connie's R-b on 6/7
> 
> I still wish we had a solution to make it less opaque to the user
> why a VFIO_GROUP_GET_DEVICE_FD() has failed if a VF token is
> required, but this is still the best I've been able to come up with.
> If there are objections or better ideas, please raise them now.
> 
> The synopsis of this series is that we have an ongoing desire to drive
> PCIe SR-IOV PFs from userspace with VFIO.  There's an immediate need
> for this with DPDK drivers and potentially interesting future use
> cases in virtualization.  We've been reluctant to add this support
> previously due to the dependency and trust relationship between the
> VF device and PF driver.  Minimally the PF driver can induce a denial
> of service to the VF, but depending on the specific implementation,
> the PF driver might also be responsible for moving data between VFs
> or have direct access to the state of the VF, including data or state
> otherwise private to the VF or VF driver.
> 
> To help resolve these concerns, we introduce a VF token into the VFIO
> PCI ABI, which acts as a shared secret key between drivers.  The
> userspace PF driver is required to set the VF token to a known value
> and userspace VF drivers are required to provide the token to access
> the VF device.  If a PF driver is restarted with VF drivers in use, it
> must also provide the current token in order to prevent a rogue
> untrusted PF driver from replacing a known driver.  The degree to
> which this new token is considered secret is left to the userspace
> drivers, the kernel intentionally provides no means to retrieve the
> current token.
> 
> Note that the above token is only required for this new model where
> both the PF and VF devices are usable through vfio-pci.  Existing
> models of VFIO drivers where the PF is used without SR-IOV enabled
> or the VF is bound to a userspace driver with an in-kernel, host PF
> driver are unaffected.
> 
> The latter configuration above also highlights a new inverted scenario
> that is now possible, a userspace PF driver with in-kernel VF drivers.
> I believe this is a scenario that should be allowed, but should not be
> enabled by default.  This series includes code to set a default
> driver_override for VFs sourced from a vfio-pci user owned PF, such
> that the VFs are also bound to vfio-pci.  This model is compatible
> with tools like driverctl and allows the system administrator to
> decide if other bindings should be enabled.  The VF token interface
> above exists only between vfio-pci PF and VF drivers, once a VF is
> bound to another driver, the administrator has effectively pronounced
> the device as trusted.  The vfio-pci driver will note alternate
> binding in dmesg for logging and debugging purposes.
> 
> Please review, comment, and test.  The example QEMU implementation
> provided with the RFC[2] is still current for this version.  Thanks,


It is a cool feature. One question - what device have you tested it with?

Does not a PF want to control/manage VFs on a PF driver side? I am
thinking of Mellanox CX5 or similar NIC and it acts as an managed
ethernet switch which might want to do something to VFs and VFs may not
work as expected without PF's native driver doing things to it, or this
is not a concern, is it? Thanks,


> 
> Alex
> 
> [1] https://lore.kernel.org/lkml/158085337582.9445.17682266437583505502.stgit@gimli.home/
> [2] https://lore.kernel.org/lkml/20200204161737.34696b91@w520.home/
> ---
> 
> Alex Williamson (7):
>       vfio: Include optional device match in vfio_device_ops callbacks
>       vfio/pci: Implement match ops
>       vfio/pci: Introduce VF token
>       vfio: Introduce VFIO_DEVICE_FEATURE ioctl and first user
>       vfio/pci: Add sriov_configure support
>       vfio/pci: Remove dev_fmt definition
>       vfio/pci: Cleanup .probe() exit paths
> 
> 
>  drivers/vfio/pci/vfio_pci.c         |  312 ++++++++++++++++++++++++++++++++---
>  drivers/vfio/pci/vfio_pci_private.h |   10 +
>  drivers/vfio/vfio.c                 |   20 ++
>  include/linux/vfio.h                |    4 
>  include/uapi/linux/vfio.h           |   37 ++++
>  5 files changed, 355 insertions(+), 28 deletions(-)
> 

-- 
Alexey

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [dpdk-dev] [PATCH 0/7] vfio/pci: SR-IOV support
  2020-02-14  4:57 ` [dpdk-dev] [PATCH 0/7] vfio/pci: SR-IOV support Alexey Kardashevskiy
@ 2020-02-14 15:27   ` Alex Williamson
  0 siblings, 0 replies; 19+ messages in thread
From: Alex Williamson @ 2020-02-14 15:27 UTC (permalink / raw)
  To: Alexey Kardashevskiy
  Cc: kvm, linux-pci, linux-kernel, dev, mtosatti, thomas, bluca,
	jerinjacobk, bruce.richardson, cohuck

On Fri, 14 Feb 2020 15:57:04 +1100
Alexey Kardashevskiy <aik@ozlabs.ru> wrote:

> On 12/02/2020 10:05, Alex Williamson wrote:
> > Given the mostly positive feedback from the RFC[1], here's a new
> > non-RFC revision.  Changes since RFC:
> > 
> >  - vfio_device_ops.match semantics refined
> >  - Use helpers for struct pci_dev.physfn to avoid breakage without
> >    CONFIG_PCI_IOV
> >  - Relax to allow SR-IOV configuration changes while PF is opened.
> >    There are potentially interesting use cases here, including
> >    perhaps QEMU emulating an SR-IOV capability and calling out
> >    to a privileged entity to manipulate sriov_numvfs and corral
> >    the resulting devices.
> >  - Retest vfio_device_feature.argsz to include uuid length.
> >  - Add Connie's R-b on 6/7
> > 
> > I still wish we had a solution to make it less opaque to the user
> > why a VFIO_GROUP_GET_DEVICE_FD() has failed if a VF token is
> > required, but this is still the best I've been able to come up with.
> > If there are objections or better ideas, please raise them now.
> > 
> > The synopsis of this series is that we have an ongoing desire to drive
> > PCIe SR-IOV PFs from userspace with VFIO.  There's an immediate need
> > for this with DPDK drivers and potentially interesting future use
> > cases in virtualization.  We've been reluctant to add this support
> > previously due to the dependency and trust relationship between the
> > VF device and PF driver.  Minimally the PF driver can induce a denial
> > of service to the VF, but depending on the specific implementation,
> > the PF driver might also be responsible for moving data between VFs
> > or have direct access to the state of the VF, including data or state
> > otherwise private to the VF or VF driver.
> > 
> > To help resolve these concerns, we introduce a VF token into the VFIO
> > PCI ABI, which acts as a shared secret key between drivers.  The
> > userspace PF driver is required to set the VF token to a known value
> > and userspace VF drivers are required to provide the token to access
> > the VF device.  If a PF driver is restarted with VF drivers in use, it
> > must also provide the current token in order to prevent a rogue
> > untrusted PF driver from replacing a known driver.  The degree to
> > which this new token is considered secret is left to the userspace
> > drivers, the kernel intentionally provides no means to retrieve the
> > current token.
> > 
> > Note that the above token is only required for this new model where
> > both the PF and VF devices are usable through vfio-pci.  Existing
> > models of VFIO drivers where the PF is used without SR-IOV enabled
> > or the VF is bound to a userspace driver with an in-kernel, host PF
> > driver are unaffected.
> > 
> > The latter configuration above also highlights a new inverted scenario
> > that is now possible, a userspace PF driver with in-kernel VF drivers.
> > I believe this is a scenario that should be allowed, but should not be
> > enabled by default.  This series includes code to set a default
> > driver_override for VFs sourced from a vfio-pci user owned PF, such
> > that the VFs are also bound to vfio-pci.  This model is compatible
> > with tools like driverctl and allows the system administrator to
> > decide if other bindings should be enabled.  The VF token interface
> > above exists only between vfio-pci PF and VF drivers, once a VF is
> > bound to another driver, the administrator has effectively pronounced
> > the device as trusted.  The vfio-pci driver will note alternate
> > binding in dmesg for logging and debugging purposes.
> > 
> > Please review, comment, and test.  The example QEMU implementation
> > provided with the RFC[2] is still current for this version.  Thanks,  
> 
> 
> It is a cool feature. One question - what device have you tested it with?
> 
> Does not a PF want to control/manage VFs on a PF driver side? I am
> thinking of Mellanox CX5 or similar NIC and it acts as an managed
> ethernet switch which might want to do something to VFs and VFs may not
> work as expected without PF's native driver doing things to it, or this
> is not a concern, is it? Thanks,

TBH, I'm starting with the premise that a userspace PF driver already
works.  The DPDK folks have produced some "interesting" code that
allows SR-IOV to be enabled on a PF underneath vfio-pci.  There's also
a non-upstream igb-uio driver associated with DPDK that seems to be
recommended for SR-IOV PF driver use cases, particularly for an FPGA
device.  The testing I've done, and what's provided by the QEMU patch I
reference, is really only unit testing the vf_token support and
DEVICE_FEATURE ioctl provided here.  I used this with an Intel 82576
(igb) where the PF driver doesn't particularly like being assigned to a
VM with SR-IOV enabled.  Likewise, I can prove that the interfaces here
provide the correct restrictions for the VF, but the VF doesn't work in
a VM due to the state of the PF.  I'm hoping we'll have some
confirmation from the DPDK folks that this provides what they need to
abandon the non-upstream drivers and more nefarious hacks.  There's a
lot more virtualization work to be done in QEMU before I'd propose
patch I reference above upstream.

To your specific question regarding CX5, I think there are very few
SR-IOV devices where the PF doesn't act as some kind of packet router
or ring management engine.  The Amazon device listed in the pci-pf-stub
driver seems to be one of the few SR-IOV devices which claim the PF has
no special interfaces other than exposing the SR-IOV capability itself.
So I think we generally expect a device specific SR-IOV aware driver
running on the PF via this interface.  That's certainly the case for
the DPDK code for the FPGA device above.  Thanks,

Alex


^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [dpdk-dev] [PATCH 3/7] vfio/pci: Introduce VF token
  2020-02-13 18:35       ` Cornelia Huck
@ 2020-02-14 23:40         ` Alex Williamson
  0 siblings, 0 replies; 19+ messages in thread
From: Alex Williamson @ 2020-02-14 23:40 UTC (permalink / raw)
  To: Cornelia Huck
  Cc: kvm, linux-pci, linux-kernel, dev, mtosatti, thomas, bluca,
	jerinjacobk, bruce.richardson

On Thu, 13 Feb 2020 19:35:16 +0100
Cornelia Huck <cohuck@redhat.com> wrote:

> On Thu, 13 Feb 2020 10:23:21 -0700
> Alex Williamson <alex.williamson@redhat.com> wrote:
> 
> > On Thu, 13 Feb 2020 12:46:54 +0100
> > Cornelia Huck <cohuck@redhat.com> wrote:
> >   
> > > On Tue, 11 Feb 2020 16:05:42 -0700
> > > Alex Williamson <alex.williamson@redhat.com> wrote:
> > >     
> > > > If we enable SR-IOV on a vfio-pci owned PF, the resulting VFs are not
> > > > fully isolated from the PF.  The PF can always cause a denial of
> > > > service to the VF, if not access data passed through the VF directly.
> > > > This is why vfio-pci currently does not bind to PFs with SR-IOV enabled
> > > > and does not provide access itself to enabling SR-IOV on a PF.  The
> > > > IOMMU grouping mechanism might allow us a solution to this lack of
> > > > isolation, however the deficiency isn't actually in the DMA path, so
> > > > much as the potential cooperation between PF and VF devices.  Also,
> > > > if we were to force VFs into the same IOMMU group as the PF, we severely
> > > > limit the utility of having independent drivers managing PFs and VFs
> > > > with vfio.
> > > > 
> > > > Therefore we introduce the concept of a VF token.  The token is
> > > > implemented as a UUID and represents a shared secret which must be set
> > > > by the PF driver and used by the VF drivers in order to access a vfio
> > > > device file descriptor for the VF.  The ioctl to set the VF token will
> > > > be provided in a later commit, this commit implements the underlying
> > > > infrastructure.  The concept here is to augment the string the user
> > > > passes to match a device within a group in order to retrieve access to
> > > > the device descriptor.  For example, rather than passing only the PCI
> > > > device name (ex. "0000:03:00.0") the user would also pass a vf_token
> > > > UUID (ex. "2ab74924-c335-45f4-9b16-8569e5b08258").  The device match
> > > > string therefore becomes:
> > > > 
> > > > "0000:03:00.0 vf_token=2ab74924-c335-45f4-9b16-8569e5b08258"
> > > > 
> > > > This syntax is expected to be extensible to future options as well, with
> > > > the standard being:
> > > > 
> > > > "$DEVICE_NAME $OPTION1=$VALUE1 $OPTION2=$VALUE2"      
> > > 
> > > Is this designed to be an AND condition? (I read it as such.)    
> > 
> > Not sure I understand, the device name is always required for
> > compatibility, then zero or more key/value pairs may be needed
> > depending on the vfio bus driver and device requirements.  I'm not
> > suggesting that the user would pass multiple vf_token= options and the
> > implementation here would only parse the first.  I'm really only
> > suggesting the parsing format we'd use for multiple options, I'm not
> > trying to dictate how a bus driver might make use of them.  Does that
> > make sense, should I change some wording?  
> 
> Not multiple vf_token= options, but multiple, different options.
> 
> E.g. we have something like "$NAME foo=xyz bar=zyx". What is supposed
> to happen?
> 
> - both the foo= and bar= values have to give a match
> - either foo= or bar= have to match
> - if foo= doesn't match, try bar=
> - foo= and bar= are ignored, if not applicable

I'm not sure how we can make generic predictions like that, as soon as
we start adding options we risk having two options that are
incompatible, or one deprecates another.

> (My understanding is that $NAME matching continues to be mandatory?)

Yes, there can still be multiple devices per group, so the user needs
to ask for a device by name.

> What should happen for vf_token= is reasonably clear, but I'm wondering
> about further extensions, as you already talk about it.

It's gotten a bit more fuzzy for me as I've been working on the next
iteration.  For instance, if we reject rather than ignore unknown
options, should we also reject a vf_token= option when it isn't needed?
For example, if a user provides a vf_token= option because they think
they know the PF driver, but the PF is actually managed by an in-kernel
host driver, it seems like that should fail.

So it seems that sets a user mindset that if the vf_token= isn't
rejected that it's used, so rather than ignoring a vf_token= option
when opening a PF with no VF users, should providing that value
actually be a mechanism for setting the VF token?  If it is, do we need
VFIO_DEVICE_FEATURE?  I still like VFIO_DEVICE_FEATURE from the aspect
that the user doesn't commit to using a VF token when they open a
device, they can set or change it later, but clearly the functionality
has an overlap.

I'm also tempted to look whether there's something we could do with
cgroups to imply this relationship, but I know very little there and
we'd want to be careful not to impose a namespace beyond this PF/VF
collaboration idea.

> > > > The device name must be first and option=value pairs are separated by
> > > > spaces.
> > > > 
> > > > The vf_token option is only required for VFs where the PF device is
> > > > bound to vfio-pci.  There is no change for PFs using existing host
> > > > drivers.
> > > > 
> > > > Note that in order to protect existing VF users, not only is it required
> > > > to set a vf_token on the PF before VFs devices can be accessed, but also
> > > > if there are existing VF users, (re)opening the PF device must also
> > > > provide the current vf_token as authentication.  This is intended to
> > > > prevent a VF driver starting with a trusted PF driver and later being
> > > > replaced by an unknown driver.  A vf_token is not required to open the
> > > > PF device when none of the VF devices are in use by vfio-pci drivers.
> > > > 
> > > > Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
> > > > ---
> > > >  drivers/vfio/pci/vfio_pci.c         |  127 +++++++++++++++++++++++++++++++++++
> > > >  drivers/vfio/pci/vfio_pci_private.h |    8 ++
> > > >  2 files changed, 134 insertions(+), 1 deletion(-)
> > > > 
> > > > diff --git a/drivers/vfio/pci/vfio_pci.c b/drivers/vfio/pci/vfio_pci.c
> > > > index 2ec6c31d0ab0..26aea9ac4863 100644
> > > > --- a/drivers/vfio/pci/vfio_pci.c
> > > > +++ b/drivers/vfio/pci/vfio_pci.c
> > > > @@ -466,6 +466,35 @@ static void vfio_pci_disable(struct vfio_pci_device *vdev)
> > > >  		vfio_pci_set_power_state(vdev, PCI_D3hot);
> > > >  }
> > > >  
> > > > +static struct pci_driver vfio_pci_driver;
> > > > +
> > > > +static void vfio_pci_vf_token_user_add(struct vfio_pci_device *vdev, int val)      
> > > 
> > > Suggestion: call this _user_modify(), and have _user_add() and
> > > _user_remove() as wrappers. That said, ...    
> > 
> > I did consider something along these lines, but it seems overly
> > explicit for a helper that's used in two places with only two obvious
> > and discrete values.  If this were an exposed API, absolutely I'd agree.
> >   
> > > > +{
> > > > +	struct pci_dev *physfn = pci_physfn(vdev->pdev);
> > > > +	struct vfio_device *pf_dev;
> > > > +	struct vfio_pci_device *pf_vdev;
> > > > +
> > > > +	if (!vdev->pdev->is_virtfn)
> > > > +		return;
> > > > +
> > > > +	pf_dev = vfio_device_get_from_dev(&physfn->dev);
> > > > +	if (!pf_dev)
> > > > +		return;
> > > > +
> > > > +	if (pci_dev_driver(physfn) != &vfio_pci_driver) {
> > > > +		vfio_device_put(pf_dev);
> > > > +		return;
> > > > +	}
> > > > +
> > > > +	pf_vdev = vfio_device_data(pf_dev);
> > > > +
> > > > +	mutex_lock(&pf_vdev->vf_token->lock);
> > > > +	pf_vdev->vf_token->users += val;      
> > > 
> > > ...is this expected to always be >= 0? If yes, it looks like a bug if
> > > this is called with val==-n for n > users.    
> > 
> > I'm not sure if you're inadvertently pointing out the bug in the
> > vfio_pci_open() path below where we increment token users before
> > vfio_pci_enable() which can fail, or your suggesting a WARN_ON here
> > should this go negative.  This is a static helper function, so I
> > generally don't try to sanitize the inputs like I would for an exposed
> > API.  
> 
> Yes, if you let users drop below 0, it's an internal error. Still, I
> think it's worth checking, so that we catch those internal errors early
> on, so a WARN_ON is probably the right thing to do.

Added a WARN_ON <0

> > > > +	mutex_unlock(&pf_vdev->vf_token->lock);
> > > > +
> > > > +	vfio_device_put(pf_dev);
> > > > +}
> > > > +
> > > >  static void vfio_pci_release(void *device_data)
> > > >  {
> > > >  	struct vfio_pci_device *vdev = device_data;
> > > > @@ -475,6 +504,7 @@ static void vfio_pci_release(void *device_data)
> > > >  	if (!(--vdev->refcnt)) {
> > > >  		vfio_spapr_pci_eeh_release(vdev->pdev);
> > > >  		vfio_pci_disable(vdev);
> > > > +		vfio_pci_vf_token_user_add(vdev, -1);
> > > >  	}
> > > >  
> > > >  	mutex_unlock(&vdev->reflck->lock);
> > > > @@ -493,6 +523,7 @@ static int vfio_pci_open(void *device_data)
> > > >  	mutex_lock(&vdev->reflck->lock);
> > > >  
> > > >  	if (!vdev->refcnt) {
> > > > +		vfio_pci_vf_token_user_add(vdev, 1);
> > > >  		ret = vfio_pci_enable(vdev);
> > > >  		if (ret)
> > > >  			goto error;    
> > 
> > I think we want to include decrementing token users in the error path
> > here.  
> 
> Yes; good that my comment made you spot it, because I didn't notice :)

Fixed, I opted to move this increment until after the potential failure
as a PF bound to vfio-pci will disable SR-IOV on remove, so if we're a
VF we effectively block the PF from getting removed anyway and there's
no hurry to bump the user count first.


> > > > @@ -1278,11 +1309,86 @@ static void vfio_pci_request(void *device_data, unsigned int count)
> > > >  	mutex_unlock(&vdev->igate);
> > > >  }
> > > >  
> > > > +#define VF_TOKEN_ARG "vf_token="
> > > > +
> > > > +/* Called with vdev->vf_token->lock */
> > > > +static int vfio_pci_vf_token_match(struct vfio_pci_device *vdev, char *opts)
> > > > +{
> > > > +	char *token;
> > > > +	uuid_t uuid;
> > > > +	int ret;
> > > > +
> > > > +	if (!opts)
> > > > +		return -EINVAL;
> > > > +
> > > > +	token = strstr(opts, VF_TOKEN_ARG);
> > > > +	if (!token)
> > > > +		return -EINVAL;
> > > > +
> > > > +	token += strlen(VF_TOKEN_ARG);
> > > > +
> > > > +	ret = uuid_parse(token, &uuid);
> > > > +	if (ret)
> > > > +		return ret;
> > > > +
> > > > +	if (!uuid_equal(&uuid, &vdev->vf_token->uuid))
> > > > +		return -EACCES;
> > > > +
> > > > +	return 0;  
> 
> Again, I guess my objections below are a matter of taste; especially
> because this function does the key=value parsing, and I'm not sure it's
> the right place to do so.

Ok, I've moved the parsing into vfio_pci_match().  The vf_token
validation gets quite a bit hairy by trying to error in cases where the
token is provided but not used though.

> > > > +}
> > > > +
> > > >  static int vfio_pci_match(void *device_data, char *buf)
> > > >  {
> > > >  	struct vfio_pci_device *vdev = device_data;
> > > > +	char *opts;
> > > > +
> > > > +	opts = strchr(buf, ' ');
> > > > +	if (opts) {
> > > > +		*opts = 0;
> > > > +		opts++;
> > > > +	}
> > > > +
> > > > +	if (strcmp(pci_name(vdev->pdev), buf))
> > > > +		return 0; /* No match */      
> > > 
> > > Up to here, this function is fine; but below, it gets a bit hard to
> > > follow...
> > >     
> > > > +
> > > > +	if (vdev->pdev->is_virtfn) {
> > > > +		struct pci_dev *physfn = pci_physfn(vdev->pdev);
> > > > +		struct vfio_device *pf_dev;
> > > > +		int ret = 0;
> > > > +
> > > > +		pf_dev = vfio_device_get_from_dev(&physfn->dev);
> > > > +		if (pf_dev) {
> > > > +			if (pci_dev_driver(physfn) == &vfio_pci_driver) {
> > > > +				struct vfio_pci_device *pf_vdev =
> > > > +						vfio_device_data(pf_dev);
> > > > +
> > > > +				mutex_lock(&pf_vdev->vf_token->lock);
> > > > +				ret = vfio_pci_vf_token_match(pf_vdev, opts);
> > > > +				mutex_unlock(&pf_vdev->vf_token->lock);
> > > > +			}
> > > > +
> > > > +			vfio_device_put(pf_dev);
> > > > +
> > > > +			if (ret)
> > > > +				return -EACCES;
> > > > +		}
> > > > +	}      
> > > 
> > > If we are a VF, and the PF is bound to vfio, pass whatever stuff other
> > > than the device name that was passed in to an opaque match function.    
> > 
> > vfio_pci_match() is an opaque match function relative to vfio.c, but
> > there's nothing opaque here.  We have a VF where the associated PF is
> > bound to vfio-pci, therefore we require that the additional options
> > include a vf_token matching the PF and we go looking to verify that.
> >    
> > > > -	return !strcmp(pci_name(vdev->pdev), buf);
> > > > +	if (vdev->vf_token) {
> > > > +		int ret = 0;
> > > > +
> > > > +		mutex_lock(&vdev->vf_token->lock);
> > > > +
> > > > +		if (vdev->vf_token->users)
> > > > +			ret = vfio_pci_vf_token_match(vdev, opts);
> > > > +
> > > > +		mutex_unlock(&vdev->vf_token->lock);
> > > > +
> > > > +		if (ret)
> > > > +			return -EACCES;
> > > > +	}      
> > > 
> > > If we have a VF token with users, pass whatever stuff other than the
> > > device name that was passed in to an opaque match function.    
> > 
> > This description strays further off course a bit.  If we have a
> > vf_token then we are a PF and clearly bound to vfio-pci.  If there are
> > existing VF users then we require the user to provide a vf_token
> > matching the one currently on the device.  
> 
> Maybe my wording is just a bit off...
> 
> >   
> > > What about the following instead:
> > > 
> > > - parse the passed-in string into device name and key/value pairs
> > > - maybe reject anything with an unknown key    
> > 
> > This is definitely something that we should decided whether or not we
> > want to do it.  I think an argument for it is that a user can pick
> > arbitrary key=value options that would be ignored with this
> > implementation, but later might match a key that gets defined and then
> > we break them.  Misuse of the API on the part of the user, but maybe
> > better to just prevent it ahead of time.  
> 
> Yes, it's probably good to do this now.

Added.

> > > - check the device name
> > > - if we're a VF with the PF bound to vfio, require a VF token to be
> > >   specified and pass it to a token match function
> > > - if we have a VF token with users, require a VF token to be specified
> > >   and pass it to a token match function    
> > 
> > This is essentially what we do above.  Maybe we just disagree about
> > whether we're calling an "opaque match function" versus a "token match
> > function".  
> 
> Maybe I should have said "function parsing a string, which might
> contain a lot of unrelated stuff" vs "function explicitly handling a
> vf_token value".

Next version will pass a pre-parsed uuid pointer to the vf_token
validation function.

> > > My main gripes with the current code are:
> > > - key=value parsing is done in the match function for vf_token
> > > - it looks hard to extend the list of supported key/value pairs
> > > - I don't see a good way to find out (as the user) _why_ the VF isn't
> > >   matching    
> > 
> > If we want to reject unknown options, then yes, the parsing should be
> > done ahead.  I don't see that it's hard to extend though, each new
> > requirement can follow the same methodology to check for it in the
> > remaining options string.  
> 
> If you pre-parse into option/value pairs, you see quite easily if you
> managed to obtain a required option, if an option has been specified
> more than once, or if an unknown option has been specified. If a new
> option is introduced, you just need to handle whatever has been parsed
> already. Extending is probably not exactly hard, but pre-parsing likely
> makes it easier, as you don't have implicit assumptions.

I hope you'll like the next version ;)

> > The last point is the one I brought up in the cover letter and where
> > I'm also not happy with the opaque error condition, but I have no
> > thoughts on how to resolve it.  Either we block the user from getting
> > the device file descriptor, and they're left scratching their heads
> > why, or we give them access to the file descriptor AND we need to
> > impose barriers to access mechanisms (ex. block read/write/mmap, limit
> > ioctls) AND we need to use VFIO_DEVICE_FEATURE and VFIO_DEVICE_GET_INFO
> > as a mechanism for the user to figure out that the device requires
> > "something" to get full access.  With the latter, I'm concerned whether
> > existing userspace code will fail in predictable ways and that it
> > snowballs into an ugly API.  For instance, if we add a flag to device
> > info to indicate it's locked, existing code won't know about that flag,
> > so we have to cripple device info to report no regions and no irqs to
> > make that code fail.  Then a user needs to know which feature to probe
> > for to figure out how the device is locked, then once they do we make
> > device info report real values?  It's maybe a little more deterministic
> > than blocking access to the file descriptor, but I'm not sure it's
> > worth it.  We could do a log-once if the match fails for token, but we
> > need to be careful not to provide an obvious point where the user could
> > spam the logs.  I've also considered if we could write an error back
> > into the user's buffer, but the ioctl isn't designed that way and we
> > don't know if we'd break how the user consumes that buffer later.  
> 
> Some extended reporting mechanism is likely to become unwieldy,
> especially when we realize we missed something. A simple log message
> that a vf_token is required (pointing to a more verbose description, if
> possible) looks best (obviously rate limited or only printed once).
> Just enough to give the user a hint so that they are not left
> completely baffled.

Using pci_info_ratelimited() for now.  I'm curious your opinion on the
above snowball effect of rejecting unknown/duplicate options turning
into rejecting unused options as well, and how that plays into whether
we really need the DEVICE_FEATURE ioctl.  It might be too abstract
until I send out a new version.  Thanks for the comments,

Alex


^ permalink raw reply	[flat|nested] 19+ messages in thread

end of thread, other threads:[~2020-02-14 23:40 UTC | newest]

Thread overview: 19+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-02-11 23:05 [dpdk-dev] [PATCH 0/7] vfio/pci: SR-IOV support Alex Williamson
2020-02-11 23:05 ` [dpdk-dev] [PATCH 1/7] vfio: Include optional device match in vfio_device_ops callbacks Alex Williamson
2020-02-13 10:31   ` Cornelia Huck
2020-02-11 23:05 ` [dpdk-dev] [PATCH 2/7] vfio/pci: Implement match ops Alex Williamson
2020-02-13 11:04   ` Cornelia Huck
2020-02-11 23:05 ` [dpdk-dev] [PATCH 3/7] vfio/pci: Introduce VF token Alex Williamson
2020-02-13 11:46   ` Cornelia Huck
2020-02-13 17:23     ` Alex Williamson
2020-02-13 18:35       ` Cornelia Huck
2020-02-14 23:40         ` Alex Williamson
2020-02-11 23:05 ` [dpdk-dev] [PATCH 4/7] vfio: Introduce VFIO_DEVICE_FEATURE ioctl and first user Alex Williamson
2020-02-13 12:41   ` Cornelia Huck
2020-02-13 17:39     ` Alex Williamson
2020-02-13 18:08       ` Cornelia Huck
2020-02-11 23:06 ` [dpdk-dev] [PATCH 5/7] vfio/pci: Add sriov_configure support Alex Williamson
2020-02-11 23:06 ` [dpdk-dev] [PATCH 6/7] vfio/pci: Remove dev_fmt definition Alex Williamson
2020-02-11 23:06 ` [dpdk-dev] [PATCH 7/7] vfio/pci: Cleanup .probe() exit paths Alex Williamson
2020-02-14  4:57 ` [dpdk-dev] [PATCH 0/7] vfio/pci: SR-IOV support Alexey Kardashevskiy
2020-02-14 15:27   ` Alex Williamson

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).